Re: [HACKERS] Sequence Access Method WIP

2016-01-18 Thread Craig Ringer
Needs rework after the commit of https://commitfest.postgresql.org/8/336/
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-01-18 Thread Amit Kapila
On Mon, Jan 18, 2016 at 10:54 AM, Michael Paquier 
wrote:
>
> On Sun, Jan 17, 2016 at 1:37 PM, Amit Kapila 
wrote:
> > On Sat, Jan 16, 2016 at 6:37 PM, Michael Paquier <
michael.paqu...@gmail.com>
> > wrote:
> >>
> >>
> >>
> >
> > So here if I understand correctly the check related to the last segment
> > forcibly switched is based on the fact that if it is forcibly switched,
then
> > we don't need to log this record? What is the reason of such an
> > assumption?
>
> Yes, the thing is that, as mentioned at the beginning of the thread, a
> low value of archive_timeout causes a segment to be forcibly switched
> at the end of the timeout defined by this parameter. In combination
> with the standby snapshot taken in bgwriter since 9.4, this is causing
> segments to be always switched even if a system is completely idle.
> Before that, in 9.3 and older versions, we would have a segment
> forcibly switched on an idle system only once per checkpoint.
>

Okay, so this will fix the case where the system is idle, but what
slightly worries me is that this check should not cause us to miss
logging the standby snapshot when there is actually some activity in
the system.  Why is it not possible to have a case where the segment is
forcibly switched for a reason other than a checkpoint (say, the user
has requested it) and the current insert LSN is at the beginning of a
new segment due to write activity?  If that is possible, which to me
seems theoretically possible, then I think bgwriter will miss logging
the standby snapshot.

>
> The
> documentation is already clear about that actually. The whole point of
> this patch is to fix this regression, aka switch to a new segment only
> once per checkpoint.
>
> > It is not very clear by reading the comments you have
> > added in patch, may be you can expand comments slightly to explain
> > the reason of same.
>
> OK. Here are the comments that this patch is adding:
> - * only log if enough time has passed and some xlog record has
> - * been inserted.
> + * Only log if enough time has passed and some xlog record has
> + * been inserted on a new segment. On an idle system where
> + * segments can be archived in a fast pace with for example a
> + * low archive_command setting, avoid as well logging a new
> + * standby snapshot if the current insert position is still
> + * at the beginning of the segment that has just been switched.
> + *
> + * It is possible that GetXLogLastSwitchPtr points to the last
> + * position of previous segment or to the first position of the
> + * new segment after the switch, hence take both cases into
> + * account when deciding if a standby snapshot should be taken.
> + * (see comments on top of RequestXLogSwitch for more details).
>   */
> It makes mention of the problem that it is trying to fix, perhaps you
> mean that this should mention the fact that on an idle system standby
> snapshots should happen at worst once per checkpoint?
>

I mean to say that in the part of the comment below, the explanation of
the check related to the insert position is quite clear, whereas why it
is okay to avoid logging a standby snapshot when the segment is not
forcibly switched is not apparent.

* avoid as well logging a new
* standby snapshot if the current insert position is still
* at the beginning of the segment that has just been switched.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-01-18 Thread Andres Freund
On 2015-12-21 16:26:25 +0900, Michael Paquier wrote:
> On Sun, Dec 20, 2015 at 10:14 PM, Michael Paquier
>  wrote:
> > Speaking of which, this patch was registered in this CF, I am moving
> > it to the next as a bug fix.
> 
> I found a stupid bug in my previous patch: when issuing XLOG_SWITCH, it
> is possible that the returned LSN points to the new segment that has
> been forcibly archived if RequestXLogSwitch is called multiple times,
> in which case the subsequent calls are not necessary. Patch updated.

I find this patch rather unsatisfactory. Yes, it kinda solves the
problem of archive timeout, but it leaves the bigger and longer-standing
problem of unnecessary checkpoints with wal_level=hs in place. It's
also somewhat fragile in my opinion.

I think we rather want a per-backend (or perhaps per-WAL-insertion-lock)
flag that says 'last relevant record inserted at', and a way to not set
that during insertion. Then during a checkpoint or the relevant bgwriter
code, we look at whether anything relevant happened in any backend since
the last time we performed a checkpoint/logged a running-xacts snapshot.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Combining Aggregates

2016-01-18 Thread Haribabu Kommi
On Mon, Jan 18, 2016 at 10:32 PM, David Rowley
 wrote:
> On 18 January 2016 at 16:44, Robert Haas  wrote:
>>
>> On Sun, Jan 17, 2016 at 9:26 PM, David Rowley
>>  wrote:
>> > hmm, so wouldn't that mean that the transition function would need to
>> > (for
>> > each input tuple):
>> >
>> > 1. Parse that StringInfo into tokens.
>> > 2. Create a new aggregate state object.
>> > 3. Populate the new aggregate state based on the tokenised StringInfo,
>> > this
>> > would perhaps require that various *_in() functions are called on each
>> > token.
>> > 4. Add the new tuple to the aggregate state.
>> > 5. Build a new StringInfo based on the aggregate state modified in 4.
>> >
>> > ?
>>
>> I don't really know what you mean by parse the StringInfo into tokens.
>> The whole point of the expanded-object interface is to be able to keep
>> things in an expanded internal form so that you *don't* have to
>> repeatedly construct and deconstruct internal data structures.
>
>
> That was a response to Haribabu proposal, although perhaps I misunderstood
> that. However I'm not sure how a PolyNumAggState could be converted to a
> string and back again without doing any string parsing.

I was just thinking of a direct mapping of the structure onto a text
pointer, something like the below.

result = PG_ARGISNULL(0) ? NULL : (text *) PG_GETARG_POINTER(0);
state = (PolyNumAggState *)VARDATA(result);

To handle the big-endian or little-endian, we may need some extra changes.
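
To make that concrete, here is a minimal sketch of the kind of mapping I
have in mind. It assumes a hypothetical flattened state struct with no
pointer members (the real PolyNumAggState holds pointers, so it would
first need a genuinely flat representation); the varlena macros are the
usual backend ones.

#include "postgres.h"

/* Hypothetical flat aggregate state: fixed size, no pointer members. */
typedef struct FlatAggState
{
    int64       N;          /* number of input values */
    int64       sumX;       /* running sum */
} FlatAggState;

/* Wrap the state bytes in a text varlena, byte for byte. */
static text *
aggstate_to_text(const FlatAggState *state)
{
    Size        len = VARHDRSZ + sizeof(FlatAggState);
    text       *result = (text *) palloc(len);

    SET_VARSIZE(result, len);
    memcpy(VARDATA(result), state, sizeof(FlatAggState));
    return result;
}

/* Reinterpret the varlena payload as the state, as in the snippet above. */
static FlatAggState *
text_to_aggstate(text *serialized)
{
    return (FlatAggState *) VARDATA(serialized);
}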

Instead of adding 3 new columns to the pg_aggregate catalog table to handle
the internal types, something like the above, or some other way of handling
the internal types, would be better IMO.

Regards,
Hari Babu
Fujitsu Australia


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Expanded Objects and Order By

2016-01-18 Thread Paul Ramsey
I'm starting to think this might not actually be my mistake, but be a
very narrow issue w/ the expanded object code.

So, I've added support for converting postgis in-memory objects into
expanded outputs, and have overwritten the usual
lwgeom_to_gserialized() function with one that creates an expanded
object. I haven't done anything to actually handle expanded objects on
input, but as I understand it, that's fine, everything is supposed to
work as normal when handed expanded objects. And thus far, that has
been true, almost all regression tests work fine.
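
(For readers following along: the hook the expanded-object machinery
expects from a type is a pair of flatten callbacks registered through
EOH_init_header(). A rough sketch of their shape follows; the
PostGIS-side helper and struct names here are purely illustrative, not
the real ones.)

#include "postgres.h"
#include "utils/expandeddatum.h"

/* Hypothetical expanded in-memory geometry; field names are illustrative. */
typedef struct ExpandedGeom
{
    ExpandedObjectHeader hdr;
    struct LWGEOM *geom;            /* PostGIS in-memory form */
} ExpandedGeom;

static Size
EG_get_flat_size(ExpandedObjectHeader *eohptr)
{
    ExpandedGeom *eg = (ExpandedGeom *) eohptr;

    /* total size of the flat (serialized) varlena, header included */
    return geom_flat_size(eg->geom);        /* hypothetical helper */
}

static void
EG_flatten_into(ExpandedObjectHeader *eohptr, void *result, Size allocated_size)
{
    ExpandedGeom *eg = (ExpandedGeom *) eohptr;

    /* must fill exactly allocated_size bytes */
    geom_flatten_into(eg->geom, result, allocated_size);    /* hypothetical */
}

static const ExpandedObjectMethods EG_methods = {
    EG_get_flat_size,
    EG_flatten_into
};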

But here's my narrow case. This works fine with the old
lwgeom_to_gserialized that uses standard flat varlena outputs. It does
the following with expanded outputs:

SELECT gid
FROM test_table
ORDER BY st_distance(geom, 'POINT(-305 998.5)'::geometry) LIMIT 5;

ERROR:  could not find pathkey item to sort

Tracing through the debugger, I see the lwgeom_to_gserialized()
function get hit once for the geometry literal, via parse_analyze.
After that, the error shows up shortly.

If I change the query ever so slightly, adding a geometry output column:

SELECT gid, geom
FROM test_table
ORDER BY st_distance(geom, 'POINT(-305 998.5)'::geometry) LIMIT 5;

It runs through to completion, five records returned. As long as the
query includes a geometry column in the output list, it works. A
function that uses a geometry but doesn't actually output it will still
fail, with the same pathkey error. This will fail, for example:

SELECT gid, ST_AsText(geom)
FROM test_table
ORDER BY st_distance(geom, 'POINT(-305 998.5)'::geometry) LIMIT 5;

So I'm wondering what I should debug next. My code is hardly being
touched at all at this point, and I'm very confident it's memory
clean, etc (the flattener raises hell if the output is shorter or
longer than the expected allocation).

Any pointers as to what to look for, much appreciated.

P.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dealing with extension dependencies that aren't quite 'e'

2016-01-18 Thread Craig Ringer
On 15 January 2016 at 14:26, Abhijit Menon-Sen  wrote:

> * «DROP EXTENSION ext» won't work without adding CASCADE, which is an
>   (admittedly relatively minor) inconvenience to users.
>
> * More importantly, pg_dump will dump all those trigger definitions,
>   which is inappropriate in this case because the extension will try
>   to create them.
>
>
I dealt with both of those in BDR (and pglogical), where we create TRUNCATE
triggers to capture and replicate table truncation. The triggers are
created either during node creation/join by a SQL function that calls into
C code, or via an event trigger on CREATE TABLE for subsequent creations.

Creating them as tgisinternal gets you both of the properties you need,
IIRC. Certainly it hides them from pg_dump, which was the requirement for me.

You can't easily create a tgisinternal trigger from SQL. You can hack it
but it's ugly. It's simple enough to just create from C. See:

https://github.com/2ndQuadrant/bdr/blob/5567302d8112c5422efc80fc43d79cd347afe09b/bdr_executor.c#L393
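
For readers who want the gist without chasing the link, a condensed
sketch of that approach is below. The CreateTrigStmt fields and the
CreateTrigger() signature shown are from the 9.5-era sources, so treat
it as an outline rather than copy-and-paste material; the trigger and
function names are invented.

#include "postgres.h"
#include "catalog/pg_trigger.h"
#include "commands/trigger.h"
#include "nodes/makefuncs.h"
#include "nodes/parsenodes.h"

/* Create an AFTER ... FOR EACH STATEMENT TRUNCATE trigger marked internal,
 * so pg_dump ignores it and it cannot be dropped on its own. */
static void
create_internal_truncate_trigger(RangeVar *rel, List *funcname)
{
    CreateTrigStmt *tgstmt = makeNode(CreateTrigStmt);

    tgstmt->trigname = "my_truncate_trigger";  /* illustrative name */
    tgstmt->relation = rel;
    tgstmt->funcname = funcname;               /* schema-qualified name list */
    tgstmt->args = NIL;
    tgstmt->row = false;                       /* statement-level */
    tgstmt->timing = TRIGGER_TYPE_AFTER;
    tgstmt->events = TRIGGER_TYPE_TRUNCATE;
    tgstmt->isconstraint = false;
    tgstmt->whenClause = NULL;

    /* isInternal = true is what sets tgisinternal on the new trigger */
    (void) CreateTrigger(tgstmt, NULL, InvalidOid, InvalidOid,
                         InvalidOid, InvalidOid, true);
}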

Other people are doing it the hacky way already, see e.g.:

https://github.com/zombodb/zombodb/commit/c801a2b766bad729a22547e0a26c17cf80ec279e


> Rather than overloading 'e', we could introduce a new dependency type
> that references an extension, but means that the dependent object should
> be dropped when the extension is, but it can also be dropped on its own,
> and pg_dump should ignore it.


So ... kind of like tgisinternal and 'i' dependencies, but without the
restriction on the object being dropped directly?

Is there any particular reason the user needs to be able to drop the
created trigger directly?

Is it reasonable to endorse the use of 'i' dependencies by extensions
instead?

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] extend pgbench expressions with functions

2016-01-18 Thread Fabien COELHO



The basic operator functions also do not check for integer overflows.


This is a feature. I think that they should not check for overflow, as in C,
this is just int64_t arithmetic "as is".


(int64_t is not an available type on Windows btw.)


Possibly. I really meant "64 bits signed integers", whatever its name. 
"int64_t" is the standard C99 name.


Finally, I can think of a good reason to use overflows deliberately, so I
think that would argue against such a change.



Could you show up an example? I am curious about that.


The one I can think of is the use of "SUM" to aggregate hashes for 
computing a hash on a table. If SUM would overflow and stop this would 
break the usage. Now there could be a case for having an overflow 
detection on SUM, but that would be another SUM, preferably with a 
distinct name.



\set cid debug(9223372036854775807 * 9223372036854775807)
debug(script=0,command=3): int 1

All these results are fine from my point of view.
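
(To see why 1 is the expected answer, here is a small standalone demo of
the same wraparound in plain C; it uses unsigned arithmetic so the
overflow is well defined, and is only an illustration, not pgbench code.)

#include <stdint.h>
#include <stdio.h>

int
main(void)
{
    /* (2^63 - 1)^2 = 2^126 - 2^64 + 1, which is 1 modulo 2^64, so the
     * wrapped 64-bit product of INT64_MAX with itself is exactly 1. */
    uint64_t v = INT64_MAX;

    printf("%llu\n", (unsigned long long) (v * v));     /* prints 1 */
    return 0;
}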


On HEAD we are getting similar strange results,


Yep, this is not new.


so I am fine to withdraw but that's still very strange to me.


Arithmetic modulo 2**64 is pretty strange, I can agree with that :-)

The first case is generating -9223372036854775808, the second one 
compiles 9223372036854775807 and the third one generates 1.


This should be the "real" result modulo 2**64, if I'm not mistaken.

Or we make the error checks even more consistent in back-branches;
perhaps that's indeed not worth it for a client though.


Yep, that would be another patch.


And this one generates a core dump:
\set cid debug(-9223372036854775808 / -1)
Floating point exception: 8 (core dumped)


This one is funny, but it is a fact of int64_t life: you cannot divide
INT64_MIN by -1 because the result cannot be represented as an int64_t.
This is probably hardcoded in the processor. I do not think it is worth
doing anything about it for pgbench.
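
(For reference, the check that would be needed looks roughly like the
following; a hedged sketch in plain C with invented names, not the
actual pgbench code.)

#include <stdint.h>
#include <stdbool.h>

/* Return false instead of trapping on the two problematic int64 divisions. */
static bool
checked_div64(int64_t lval, int64_t rval, int64_t *result)
{
    if (rval == 0)
        return false;               /* division by zero */
    if (lval == INT64_MIN && rval == -1)
        return false;               /* -INT64_MIN is not representable */
    *result = lval / rval;
    return true;
}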


This one, on the contrary, is generating an error on HEAD, and your
patch is causing a crash: value "9223372036854775808" is out of range
for type bigint. That's a regression, no?


Hmmm, yes, somehow, but just for this one value, if you try the next:

pgbench 9.4.5: value "-9223372036854775809" is out of range for type bigint

I guess that the implementation before 9.5 parsed
"-9223372036854775808" as a single integer literal, which is INT64_MIN,
so it was fine. Now it is parsed as "operator uminus" applied to
"9223372036854775808", which is not fine because that would be
INT64_MAX+1, which is not representable.


I would prefer just to neglect that as a very small (1/2**64) feature 
change rather than a meaningful regression, especially as the coding 
effort to fix this is significant and the value of handling it differently 
is nought.


I am uncomfortable with leaving such holes in the code, even if it is
only a client application.


This is a 1-in-2**128 probability case which stops pgbench. It is
obviously possible to add a check to catch it and then generate an error
message, but I would rather just ignore it and let pgbench stop on that.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Do we need SQL-level access to amoptions functions?

2016-01-18 Thread Craig Ringer
On 18 January 2016 at 04:13, Tom Lane  wrote:

>
> Does anyone know of client code that's actually doing this?
>
>
site:github.com "ginoptions" -"pg_am.h" -"ginutil.c" -png -typedefs -vim

and the same for "btreeoptions" looks promising.

Nothing seems to spring out on https://searchcode.com either.

It seems quite reasonable to leave such an esoteric corner as "fix it if
somebody screams" territory for a point release.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Sequence Access Method WIP

2016-01-18 Thread Craig Ringer
On 1 January 2016 at 07:51, Petr Jelinek  wrote:

>
> Other than that, this is based on the new am api by Alexander Korotkov
> [1]. It extends it by adding another column called amkind to the pg_am
> which can have either value "i" for index or "S" for sequence (same as
> relkind in pg_class for those).
>

This patch will no longer apply after 65c5fcd353 (
http://github.com/postgres/postgres/commit/65c5fcd353) as outlined in
http://www.postgresql.org/message-id/10804.1453077...@sss.pgh.pa.us .

Setting waiting-on-author in the CF app.

The good news is that the commit of the pg_am rework greatly eases the path
of this patch into core.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Optimizer questions

2016-01-18 Thread Gilles Darold
Le 18/01/2016 03:47, Bruce Momjian a écrit :
> On Tue, Jan  5, 2016 at 05:55:28PM +0300, konstantin knizhnik wrote:
>> Hi hackers, 
>>
>> I want to ask two questions about PostgreSQL optimizer.
>> I have the following query:
>>
>> SELECT o.id as id,s.id as sid,o.owner,o.creator,o.parent_id
>> as dir_id,s.mime_id,m.c_type,s.p_file,s.mtime,o.ctime,o.name
>> ,o.title,o.size,o.deleted,la.otime,la.etime,uo.login as owner_login,uc.login 
>> as
>> creator_login,(CASE WHEN f.user_id IS NULL THEN 0 ELSE 1 END) AS flagged,
>> (select 'userid\\:'||string_agg(user_id,' userid\\:') from 
>> get_authorized_users
>> (o.id)) as acl FROM objects s JOIN objects o ON s.original_file=o.id LEFT 
>> JOIN
>> flags f ON o.id=f.obj_id AND o.owner=f.user_id LEFT JOIN 
>> objects_last_activity
>> la ON o.id = la.obj_id AND o.owner = la.user_id, mime m, users uc , users uo
>> WHERE (s.mime_id=904 or s.mime_id=908) AND m.mime_id = o.mime_id AND o.owner 
>> =
>> uo.user_id AND o.creator = uc.user_id ORDER BY s.mtime LIMIT 9;
> FYI, I could not make any sense out of this query, and I frankly can't
> figure out how others can understand it.  :-O   Anyway, I ran it through
> pgFormatter (https://github.com/darold/pgFormatter), which I am showing
> here because I was impressed with the results:
>
>   SELECT
>   o.id AS id,
>   s.id AS sid,
>   o.owner,
>   o.creator,
>   o.parent_id AS dir_id,
>   s.mime_id,
>   m.c_type,
>   s.p_file,
>   s.mtime,
>   o.ctime,
>   o.name,
>   o.title,
>   o.size,
>   o.deleted,
>   la.otime,
>   la.etime,
>   uo.login AS owner_login,
>   uc.login AS creator_login,
>   (
>   CASE
>   WHEN f.user_id IS NULL THEN 0
>   ELSE 1
>   END ) AS flagged,
>   (
>   SELECT
>   'userid\\:' || string_agg (
>   user_id,
>   ' userid\\:' )
>   FROM
>   get_authorized_users (
>   o.id ) ) AS acl
>   FROM
>   objects s
>   JOIN objects o ON s.original_file = o.id
>   LEFT JOIN flags f ON o.id = f.obj_id
>   AND o.owner = f.user_id
>   LEFT JOIN objects_last_activity la ON o.id = la.obj_id
>   AND o.owner = la.user_id,
>   mime m,
>   users uc,
>   users uo
>   WHERE (
>   s.mime_id = 904
>   OR s.mime_id = 908 )
>   AND m.mime_id = o.mime_id
>   AND o.owner = uo.user_id
>   AND o.creator = uc.user_id
>   ORDER BY
>   s.mtime
>   LIMIT 9;
>


Thanks Bruce for the pointer to this tool; even if it is not perfect, I
think it can be useful. There's also an on-line version at
http://sqlformat.darold.net/ that everyone can use without having to
install it, to format queries up to 20Kb, with options to control the
output format. It can also anonymize SQL queries.

Actually this is the SQL formatter/beautifier used in pgBadger; it has
been extracted as an independent project so that this part can be
improved without having to run pgBadger each time.

-- 
Gilles Darold
Consultant PostgreSQL
http://dalibo.com - http://dalibo.org



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

2016-01-18 Thread Ashutosh Bapat
Hi All,
PFA patches for postgres_fdw join pushdown, taken care of all TODOs in my
last mail.

Here is the list of things that have been improved or newly added
compared to Hanada-san's previous patch at [1].

1. Condition handling for join
The patch in [1] allowed a foreign join to be pushed down only if all the
conditions were safe to push down to the foreign server. This patch
separates those conditions into two categories: 1. conditions to be
applied while joining (ON clause) and 2. conditions to be applied after
joining (WHERE clause). For a join to be safe to push down, only the
conditions in the first category need to be safe to push down. The
conditions in the second category which are not safe to push down can be
applied locally. This opens up more opportunities for join pushdown. For
an INNER join, all the conditions can be applied on the cross product,
hence we can push down an INNER join even if one or more of the
conditions are not safe to push down. This patch includes that
optimization as well.

2. Targetlist handling:
The columns required to evaluate the non-pushable conditions on a join
relation need to be fetched from the foreign server. In the previous
patch the SELECT clauses were built from rel->reltargetlist, which
doesn't contain these columns. This patch includes those columns as well.

3. Column projection:
The earlier patch required another layer of SQL to project the whole-row
attribute from a base relation. This patch takes care of that while
constructing and deparsing the targetlist, which reduces the complexity
and length of the query sent to the foreign server, e.g.

With the projection in previous patch the query looked like
EXPLAIN (COSTS false, VERBOSE)
SELECT t1.ctid, t1, t2, t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)
ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
  QUERY PLAN


... explain output clipped
   Remote SQL: SELECT l.a1, l.a2, l.a3, l.a4, r.a1 FROM (SELECT
l.a7, ROW(l.a10, l.a11, l.a12, l.a13, l.a14, l.a15, l.a16, l.a17), l.a10,
l.a12 FROM (SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7
a16, c8 a17, ctid a7 FROM "S 1"."T 1") l) l (a1, a2, a3, a4) INNER JOIN
(SELECT ROW(r.a9, r.a10, r.a12, r.a13, r.a14, r.a15, r.a16, r.a17), r.a9
FROM (SELECT "C 1" a9, c2 a10, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8
a17 FROM "S 1"."T 1") r) r (a1, a2) ON ((l.a3 = r.a2))

With this patch it looks like
EXPLAIN (COSTS false, VERBOSE)
SELECT t1.ctid, t1, t2, t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)
ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
   QUERY
PLAN

... explain output clipped
   Remote SQL: SELECT l.a3, l.a4, l.a1, l.a2, r.a2 FROM (SELECT
"C 1", c3, ctid, ROW("C 1", c2, c3, c4, c5, c6, c7, c8) FROM "S 1"."T 1") l
(a1, a2, a3, a4) INNER JOIN (SELECT "C 1", ROW("C 1", c2, c3, c4, c5, c6,
c7, c8) FROM "S 1"."T 1") r (a1, a2) ON (TRUE) WHERE ((l.a1 = r.a1))
(9 rows)

4. Local cost estimation
The previous patch had a TODO left for estimating the join cost locally
when use_remote_estimate is false. This patch adds support for that. The
relevant discussion is in the mail thread at [2] and [3].

5. This patch adds a GUC enable_foreignjoin to enable or disable join
pushdown through core.

6. Added more tests covering lateral references and conditions that are
unsafe to push down at various places in the query.

There are also many cosmetic improvements, like adding static function
declarations, improving comments, and making the code more readable.

[1]
http://www.postgresql.org/message-id/CAEZqfEe9KGy=1_wagh2rgzpg0o4pqgd+iauyaj8wtze+cyj...@mail.gmail.com
[2]
http://www.postgresql.org/message-id/cafjfprcqswus+tb5iyp1m3c-w0k3xab6h5mw4+n2q2iuafs...@mail.gmail.com
[3]
http://www.postgresql.org/message-id/CAFjFpRepSC2e3mZ1uYSopJD6R19fOZ0dNNf9Z=gnyksb6wg...@mail.gmail.com

I will be working next on (in that order)
1. eval_plan_qual fix for foreign join. (Considered as a must-have for 9.6)
2. Pushing down ORDER BY clause along with join pushdown
3. Parameterization of foreign join paths (Given the complexity of the
feature this may not make it into 9.6)

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] exposing pg_controldata and pg_config as functions

2016-01-18 Thread Andres Freund
On 2016-01-18 10:18:34 +0900, Michael Paquier wrote:
> We are trying to hide away from non-superusers WAL-related information
> in system views and system function, that's my point to do the same
> here.

We are? pg_current_xlog_insert_location(), pg_current_xlog_location(),
pg_is_xlog_replay_paused(), pg_stat_bgwriter ... are all non-superuser?

> For the data of pg_control, it seems to me that this can give
> away to any authorized users hints regarding the way Postgres is
> built, perhaps letting people know for example which Linux
> distribution is used and which flavor of Postgres is used (we already
> give away some information with version() but that's different than
> the libraries this is linking to), so an attacker may be able to take
> advantage of that to do attacks on potentially outdated packages? And
> I would think that many users are actually going to revoke the access
> of those functions to public if we are going to make them
> world-visible. It is also easier to restrict things first and then
> relax them if necessary than to do the opposite.

Meh, that seems pretty far into pseudo security arguments.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Death by regexp_replace

2016-01-18 Thread Benedikt Grundmann
thanks

On Fri, Jan 15, 2016 at 7:22 PM, Devrim Gündüz  wrote:

> Hi,
>
> That is the version of the *repo* RPM, not PostgreSQL itself. Once you install
> it, you can grab the latest version with
>
> yum install postgresql92-server
>
> Regards, Devrim
>
> On January 15, 2016 7:48:53 PM GMT+02:00, Robert Haas <
> robertmh...@gmail.com> wrote:
>>
>>  Hmm I just wanted to get the rpm for the latest 9.2 release for centos6 but
>>>  it looks like you haven't released at least the link on this page for 9.2
>>>
>>>  http://yum.postgresql.org/repopackages.php
>>>
>>>  says 7 in the filename which is certainly not 14 ;-)
>>>
>>>  
>>> http://yum.postgresql.org/9.2/redhat/rhel-6-x86_64/pgdg-centos92-9.2-7.noarch.rpm
>>>
>>>  Is that expected?
>>>
>>
>> Adding Devrim, who I believe maintains that stuff.
>>
>>
> --
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
>


Re: [HACKERS] extend pgbench expressions with functions

2016-01-18 Thread Fabien COELHO



OK, so I had an extra look at this patch and I am marking it as ready
for committer.


Ok.


- INT64_MIN / -1 throws a core dump, and errors on HEAD. I think this
should be fixed; Fabien does not.


Yep. Another point about this one is that it is not related to this patch 
about functions.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More stable query plans via more predictable column statistics

2016-01-18 Thread Shulgin, Oleksandr
On Wed, Dec 2, 2015 at 10:20 AM, Shulgin, Oleksandr <
oleksandr.shul...@zalando.de> wrote:

> On Tue, Dec 1, 2015 at 7:00 PM, Tom Lane  wrote:
>
>> "Shulgin, Oleksandr"  writes:
>> > This post summarizes a few weeks of research of ANALYZE statistics
>> > distribution on one of our bigger production databases with some
>> real-world
>> > data and proposes a patch to rectify some of the oddities observed.
>>
>> Please add this to the 2016-01 commitfest ...
>>
>
> Added: https://commitfest.postgresql.org/8/434/
>

It would be great if some folks could find a moment to run the queries I
was showing on their data to confirm (or refute) my findings, or to
contribute to the picture in general.

As I was saying, the queries were designed in such a way that even an
unprivileged user can run them (the results will be limited to the stats
data available to that user, obviously; and for custom-tailored statstarget
one still needs superuser to join the pg_statistic table directly).  Also,
on the scale of ~30k attribute statistics records, the queries take only a
few seconds to finish.

Cheers!
--
Alex


Re: [HACKERS] 2016-01 Commitfest

2016-01-18 Thread Alvaro Herrera
Two weeks into the commitfest, things have moved a bit:

 Needs review: 53.
 Waiting on Author: 20.
 Ready for Committer: 10.
 Committed: 16.
Total: 99.  https://commitfest.postgresql.org/8/

We have two committed patches since last report -- not a lot for a whole
week.  We've managed to review a few patches though: WoA went from 14 to
20.  Patch authors have something to do rather than just twiddle their
thumbs^W^W^W review others' patches.

Remember: reviewing patches where somebody else has marked themselves as
reviewer is still welcome.  You could spot things the other person might
miss.

Committers: There are patches for you to push!  Go pick one up right now!
(And remember: marking yourself as committer for a patch makes it
unlikely for anybody else to review/commit it.  If you do not intend to
review+commit such a patch soon, it's probably better to remove yourself
as committer.)

Many patches seem stalled without review:
   Table partition + join pushdown  https://commitfest.postgresql.org/8/325/
   multivariate statistics  https://commitfest.postgresql.org/8/450/
   Implement failover on libpq level   https://commitfest.postgresql.org/8/389/
   Unique Joins https://commitfest.postgresql.org/8/129/
   Replace buffer manager spinlock with atomic operations
https://commitfest.postgresql.org/8/408/

Reviews of those patches would be useful, to keep things moving forward.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-01-18 Thread Robert Haas
On Mon, Jan 18, 2016 at 11:09 AM, Alvaro Herrera
 wrote:
> Amit Kapila wrote:
>
>> The reason for not updating the patch related to this thread is that it is
>> dependent on the work for refactoring the tranches for LWLocks [1]
>> which is now coming towards an end, so I think it is quite reasonable
>> that the patch can be updated for this work during commit fest, so
>> I am moving it to upcoming CF.
>
> Thanks.  I think the tranche reworks are mostly done now, so is anyone
> submitting an updated version of this patch?
>
> Also, it would be very good if someone can provide insight on how this
> patch interacts with the other submitted patch for "waiting for
> replication" https://commitfest.postgresql.org/8/436/
> Andres seems to think that the other patch is completely independent of
> this one, i.e. the "waiting for replication" column needs to exist
> separately and not as part of the "more descriptive" new 'waiting'
> column.

Yeah, I really don't agree with that.  I think that it's much better
to have one column that says what you are waiting for than a bunch of
separate columns, each telling you whether you are waiting for one
particular thing.  I think this patch, which introduces the general
mechanism, should win, and the other patch should then be one client of
that mechanism.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 2016-01 Commitfest

2016-01-18 Thread Atri Sharma
On Mon, Jan 18, 2016 at 11:05 PM, Alvaro Herrera 
wrote:

> Two weeks into the commitfest, things have moved a bit:
>
>  Needs review: 53.
>  Waiting on Author: 20.
>  Ready for Committer: 10.
>  Committed: 16.
> Total: 99.  https://commitfest.postgresql.org/8/
>
> We have two committed patches since last report -- not a lot for a whole
> week.  We've managed to review a few patches though: WoA went from 14 to
> 20.  Patch authors have something to do rather just twiddle their
> thumbs^W^W^W review others' patches.
>
> Remember: reviewing patches where somebody else has marked themselves as
> reviewer is still welcome.  You could spot things the other person might
> miss.
>
> Committers: There are patches for you to push!  Go pick one up right now!
> (And remember: marking yourself as committer for a patch makes it
> unlikely for anybody else to review/commit it.  If you do not intend to
> review+commit such a patch soon, it's probably better to remove yourself
> as committer.)
>
> Many patches seem stalled without review:
>Table partition + join pushdown
> https://commitfest.postgresql.org/8/325/
>multivariate statistics
> https://commitfest.postgresql.org/8/450/
>Implement failover on libpq level
> https://commitfest.postgresql.org/8/389/
>Unique Joins
> https://commitfest.postgresql.org/8/129/
>Replace buffer manager spinlock with atomic operations
>
> https://commitfest.postgresql.org/8/408/
>
>
I am guilty of ignoring the unique joins patch. I am still up for
reviewing it later this week or over the weekend, but I will not stall
anybody else from picking it up.

Apologies!


-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Removing service-related code in pg_ctl for Cygwin

2016-01-18 Thread Alvaro Herrera
Andrew Dunstan wrote:

> I think we can be a bit more adventurous and remove all the Cygwin-specific
> code. See attached patch, which builds fine on buildfarm cockatiel.

Hopefully you also tested that it builds under MSVC :-)

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: add 'waiting for replication' to pg_stat_activity.state

2016-01-18 Thread Alvaro Herrera
Andres Freund wrote:
> On December 4, 2015 9:48:55 AM GMT+01:00, Craig Ringer 
>  wrote:
> >On 3 December 2015 at 22:58, Amit Kapila 
> >wrote:
> >
> >> On Thu, Dec 3, 2015 at 9:02 AM, Craig Ringer 
> >> wrote:
> 
> >http://www.postgresql.org/message-id/CAA4eK1+=5Ex8-5NRr3u94=_t2p65v0kcjZ5rXddVmkx=lwa...@mail.gmail.com
> >>
> >Good point. I'm not sure this shouldn't set 'waiting' anyway, though.
> 
> No. Waiting corresponds to pg locks -- setting it to true for other things
> would be even more confusing...

Robert's opinion elsewhere is relevant for this patch, and contradicts
Andres' opinion here:
http://www.postgresql.org/message-id/ca+tgmoaj+epoo9qobvfh18f5kjg2taexhq8_-vaywhur-za...@mail.gmail.com

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Combining Aggregates

2016-01-18 Thread Tom Lane
Robert Haas  writes:
> Yeah.  The API contract for an expanded object took me quite a while
> to puzzle out, but it seems to be this: if somebody hands you an R/W
> pointer to an expanded object, you're entitled to assume that you can
> "take over" that object and mutate it however you like.  But the
> object might be in some other memory context, so you have to move it
> into your own memory context.

Only if you intend to keep it --- for example, a function that is mutating
and returning an object isn't required to move it somewhere else, if the
input is R/W, and I think it generally shouldn't.

In the context here, I'd think it is the responsibility of nodeAgg.c
not individual datatype functions to make sure that expanded objects
live where it wants them to.

> Well, that's pretty odd.  I guess the plan change must be a result of
> switching the transition type from internal to text, although I'm not
> immediately certain why that would make a difference.

I'm pretty sure there's something in the planner that special-cases
its estimate of space consumed by hashtable entries when the data type
is "internal".  You'd be wanting to fool with that estimate anyway
for something like this ...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Trivial fixes for some IDENTIFICATION comment lines

2016-01-18 Thread Robert Haas
On Mon, Jan 18, 2016 at 12:01 PM, Tom Lane  wrote:
> Andres Freund  writes:
>> On 2016-01-18 12:52:06 +0100, Shulgin, Oleksandr wrote:
>>> I've noticed that in src/backend/replication/logical/reorderbuffer.c, the
>>> IDENTIFICATION comment line is incorrect:
>>>
>>> * IDENTIFICATION
>>> - *src/backend/catalog/dropcmds.c
>>> - *src/backend/replication/logicalfuncs.c
>>> - *src/backend/replication/reorderbuffer.c
>>> - *src/backend/replication/snapbuild.c
>>> - *src/backend/storage/ipc/dsm.c
>>> - *src/backend/utils/cache/relfilenode.c
>>> - *src/backend/utils/cache/evtcache.c
>
>> How about we simply drop them altogether instead? This isn't exactly a
>> seldom-made mistake, and they seem to actually contribute very little
>> value?
>
> Personally I think they're of some value.  Particularly with stuff like
> Makefiles, which are otherwise confusingly similar.

I'm not unwilling to keep them around, but I tend to agree with
Andres.  My editor will happily tell me which file I'm editing, and it
will do so regardless of whether I am on the first page of the file or
the hundredth page.  The IDENTIFICATION comment isn't present in every
file and is occasionally wrong and in any event isn't displayed except
at the top.  So I find those comments a fairly useless nuisance --
it's just one more thing I have to adjust when adding a contrib module
or a directory someplace, or moving something around.  However, if you
want to keep them around, I don't have a big problem with that.  It's
not necessary for everyone to find exactly the same things useful that
I do.

(vim FTW!)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Removing service-related code in pg_ctl for Cygwin

2016-01-18 Thread Andrew Dunstan



On 01/14/2016 12:38 AM, Michael Paquier wrote:

Hi all,

Beginning a new thread seems more adapted regarding $subject but
that's mentioned here as well:
http://www.postgresql.org/message-id/CAB7nPqQXghm_SdB5iniupz1atzMxk=95gv9a8ocdo83sxcn...@mail.gmail.com

Based on some investigation from Andrew, it turns out that Cygwin does
not need to use the service-related code, i.e. the register/unregister
options and similar things that are specific to WIN32; it can rely
instead on cygrunsrv with a SysV-like init file to manage the service.
Based on my tests with Cygwin, I can also see that a server started
within a Cygwin session persists even after the session is closed,
which is kind of nice and I think is an additional reason not to use
the service-related code paths. Hence, what about the following patch,
which makes Cygwin behave like any *nix OS when using pg_ctl?
This simplifies the code a bit.

Marco, as I think you do some packaging for Postgres in Cygwin, and
others, does that sound acceptable?





I think we can be a bit more adventurous and remove all the 
Cygwin-specific code. See attached patch, which builds fine on buildfarm 
cockatiel.


cheers

andrew



diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 919d764..98aa1a0 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -39,13 +39,6 @@
 #include "getopt_long.h"
 #include "miscadmin.h"
 
-#if defined(__CYGWIN__)
-#include 
-#include 
-/* Cygwin defines WIN32 in windows.h, but we don't want it. */
-#undef WIN32
-#endif
-
 /* PID can be negative for standalone backend */
 typedef long pgpid_t;
 
@@ -105,7 +98,7 @@ static char backup_file[MAXPGPATH];
 static char recovery_file[MAXPGPATH];
 static char promote_file[MAXPGPATH];
 
-#if defined(WIN32) || defined(__CYGWIN__)
+#ifdef WIN32
 static DWORD pgctl_start_type = SERVICE_AUTO_START;
 static SERVICE_STATUS status;
 static SERVICE_STATUS_HANDLE hStatus = (SERVICE_STATUS_HANDLE) 0;
@@ -133,7 +126,7 @@ static void do_kill(pgpid_t pid);
 static void print_msg(const char *msg);
 static void adjust_data_dir(void);
 
-#if defined(WIN32) || defined(__CYGWIN__)
+#ifdef WIN32
 #if (_MSC_VER >= 1800)
 #include 
 #else
@@ -165,7 +158,7 @@ static void unlimit_core_size(void);
 #endif
 
 
-#if defined(WIN32) || defined(__CYGWIN__)
+#ifdef WIN32
 static void
 write_eventlog(int level, const char *line)
 {
@@ -207,20 +200,11 @@ write_stderr(const char *fmt,...)
 	va_list		ap;
 
 	va_start(ap, fmt);
-#if !defined(WIN32) && !defined(__CYGWIN__)
+#ifndef WIN32
 	/* On Unix, we just fprintf to stderr */
 	vfprintf(stderr, fmt, ap);
 #else
 
-/*
- * On Cygwin, we don't yet have a reliable mechanism to detect when
- * we're being run as a service, so fall back to the old (and broken)
- * stderr test.
- */
-#ifdef __CYGWIN__
-#define	pgwin32_is_service()	(isatty(fileno(stderr)))
-#endif
-
 	/*
 	 * On Win32, we print to stderr if running on a console, or write to
 	 * eventlog if running as a service
@@ -718,7 +702,7 @@ test_postmaster_connection(pgpid_t pm_pid, bool do_checkpoint)
 #endif
 
 		/* No response, or startup still in process; wait */
-#if defined(WIN32)
+#ifdef WIN32
 		if (do_checkpoint)
 		{
 			/*
@@ -1342,7 +1326,7 @@ do_kill(pgpid_t pid)
 	}
 }
 
-#if defined(WIN32) || defined(__CYGWIN__)
+#ifdef WIN32
 
 #if (_MSC_VER < 1800)
 static bool
@@ -1408,20 +1392,6 @@ pgwin32_CommandLine(bool registration)
 		}
 	}
 
-#ifdef __CYGWIN__
-	/* need to convert to windows path */
-	{
-		char		buf[MAXPGPATH];
-
-#if CYGWIN_VERSION_DLL_MAJOR >= 1007
-		cygwin_conv_path(CCP_POSIX_TO_WIN_A, cmdPath, buf, sizeof(buf));
-#else
-		cygwin_conv_to_full_win32_path(cmdPath, buf);
-#endif
-		strcpy(cmdPath, buf);
-	}
-#endif
-
 	/* if path does not end in .exe, append it */
 	if (strlen(cmdPath) < 4 ||
 		pg_strcasecmp(cmdPath + strlen(cmdPath) - 4, ".exe") != 0)
@@ -1775,10 +1745,8 @@ CreateRestrictedProcess(char *cmd, PROCESS_INFORMATION *processInfo, bool as_ser
 	if (!OpenProcessToken(GetCurrentProcess(), TOKEN_ALL_ACCESS, ))
 	{
 		/*
-		 * Most Windows targets make DWORD a 32-bit unsigned long.  Cygwin
-		 * x86_64, an LP64 target, makes it a 32-bit unsigned int.  In code
-		 * built for Cygwin as well as for native Windows targets, cast DWORD
-		 * before printing.
+		 * Most Windows targets make DWORD a 32-bit unsigned long, but
+		 * in case it doesn't cast DWORD before printing.
 		 */
 		write_stderr(_("%s: could not open process token: error code %lu\n"),
 	 progname, (unsigned long) GetLastError());
@@ -1819,10 +1787,6 @@ CreateRestrictedProcess(char *cmd, PROCESS_INFORMATION *processInfo, bool as_ser
 		return 0;
 	}
 
-#ifndef __CYGWIN__
-	AddUserToTokenDacl(restrictedToken);
-#endif
-
 	r = CreateProcessAsUser(restrictedToken, NULL, cmd, NULL, NULL, TRUE, CREATE_SUSPENDED, NULL, NULL, , processInfo);
 
 	Kernel32Handle = LoadLibrary("KERNEL32.DLL");
@@ -1926,7 +1890,7 @@ CreateRestrictedProcess(char *cmd, PROCESS_INFORMATION *processInfo, bool as_ser
 	 */
 	return r;
 }

Re: [HACKERS] Combining Aggregates

2016-01-18 Thread Robert Haas
On Mon, Jan 18, 2016 at 12:09 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> Yeah.  The API contract for an expanded object took me quite a while
>> to puzzle out, but it seems to be this: if somebody hands you an R/W
>> pointer to an expanded object, you're entitled to assume that you can
>> "take over" that object and mutate it however you like.  But the
>> object might be in some other memory context, so you have to move it
>> into your own memory context.
>
> Only if you intend to keep it --- for example, a function that is mutating
> and returning an object isn't required to move it somewhere else, if the
> input is R/W, and I think it generally shouldn't.
>
> In the context here, I'd think it is the responsibility of nodeAgg.c
> not individual datatype functions to make sure that expanded objects
> live where it wants them to.

That's how I did it in my prototype, but the problem with that is that
spinning up a memory context for every group sucks when there are many
groups with only a small number of elements each - hence the 50%
regression that David Rowley observed.  If we're going to use expanded
objects here, which seems like a good idea in principle, that's going
to have to be fixed somehow.  We're flogging the heck out of malloc by
repeatedly creating a context, doing one or two allocations in it, and
then destroying the context.

I think that, in general, the memory context machinery wasn't really
designed to manage lots of small contexts.  The overhead of spinning
up a new context for just a few allocations is substantial.  That
matters in some other situations too, I think - I've commonly seen
AllocSetContextCreate taking several percent  of runtime in profiles.
But here it's much exacerbated.  I'm not sure whether it's better to
attack that problem at the root and try to make AllocSetContextCreate
cheaper, or whether we should try to figure out some change to the
expanded-object machinery to address the issue.
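
To illustrate the pattern being criticized, here is a hedged sketch using
the 9.5-era AllocSetContextCreate() signature; the context name and the
comment about what gets allocated are purely illustrative.

/* One short-lived context per aggregate group: create, allocate a little,
 * destroy.  Every cycle pays the full context-setup and malloc/free cost.
 * (Requires utils/memutils.h for the ALLOCSET_SMALL_* constants.) */
MemoryContext groupcxt;

groupcxt = AllocSetContextCreate(CurrentMemoryContext,
                                 "per-group aggregate state",
                                 ALLOCSET_SMALL_MINSIZE,
                                 ALLOCSET_SMALL_INITSIZE,
                                 ALLOCSET_SMALL_MAXSIZE);
/* ... one or two pallocs in groupcxt for the expanded state ... */
MemoryContextDelete(groupcxt);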

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Expanded Objects and Order By

2016-01-18 Thread Paul Ramsey
I have a size/flatten callback setup (and they are very careful not to write 
outside their boundaries), so that’s all OK. 
Since you’re not seeing anything “aha” in the error pattern, I’ll go back to 
the mats on memory… is there a good page on valgrinding PostgreSQL? I thought
the memory manager papered over things so much that valgrind couldn’t see what 
was going on under the covers.

Thanks!

P

> On Jan 18, 2016, at 8:36 AM, Tom Lane  wrote:
> 
> Paul Ramsey  writes:
>> So, I've added support for converting postgis in-memory objects into
>> expanded outputs, and have overwritten the usual
>> lwgeom_to_gserialized() function with one that creates an expanded
>> object. I haven't done anything to actually handle expanded objects on
>> input, but as I understand it, that's fine, everything is supposed to
>> work as normal when handed expanded objects. And thus far, that has
>> been true, almost all regression tests work fine.
> 
> Hmm ... you do need to be able to flatten them again.  In the given
> example, the parser is going to want to form a Const node whose Datum
> value is a geometry object, and that Const node needs to be copiable
> by copyObject(), which means datumCopy() has to work, and if you look
> at that it will exercise EOH_get_flat_size/EOH_flatten_into when the
> input routine originally produced an expanded object.
> 
> The error message is very strange; it's hard to see how toying with the
> internal representation of Consts could cause that.  I think the
> hypothesis of a memory clobber is stronger than you give it credit for,
> especially since you see the behavior changing for seemingly unrelated
> reasons.  Valgrind might be a useful tool here.
> 
>   regards, tom lane



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Making plpython 2 and 3 coexist a bit better

2016-01-18 Thread Bruce Momjian
On Mon, Jan 11, 2016 at 10:44:42AM -0500, Tom Lane wrote:
> Commit 803716013dc1350f installed a safeguard against loading plpython2
> and plpython3 at the same time, stating
> 
> +   It is not allowed to use PL/Python based on Python 2 and PL/Python
> +   based on Python 3 in the same session, because the symbols in the
> +   dynamic modules would clash, which could result in crashes of the
> +   PostgreSQL server process.  There is a check that prevents mixing
> +   Python major versions in a session, which will abort the session if
> +   a mismatch is detected.  It is possible, however, to use both
> +   PL/Python variants in the same database, from separate sessions.
> 
> It turns out though that the freedom promised in the last sentence
> is fairly illusory, because if you have functions in both languages
> in one DB and you try a dump-and-restore, it will fail.
> 
> But it gets worse: a report today in pgsql-general points out that
> even if you have the two languages in use *in different databases*,
> pg_upgrade will fail, because pg_upgrade rather overenthusiastically
> loads every .so mentioned anywhere in the source installation into
> one session.

The fix for that would be for pg_upgrade to change
check_loadable_libraries() to start a new session for each LOAD command.
Would you like that done?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [Proposal] Table partition + join pushdown

2016-01-18 Thread Robert Haas
On Tue, Dec 22, 2015 at 8:36 AM, Michael Paquier
 wrote:
> On Fri, Nov 20, 2015 at 9:05 PM, Taiki Kondo  wrote:
>> I created v3 patch for this feature, and v1 patch for regression tests.
>> Please find attached.
>>
>> [blah review and replies]
>>
>> Please find from attached patch.
>
> This new patch did not actually get a review, moved to next CF.

I think this patch is doomed.  Suppose you join A to B on A.x = B.y.
The existence of a constraint on table A which says CHECK(P(x)) does
not imply that only rows from B where P(y) is true will join.  For
example, suppose that x and y are numeric columns and P(x) is
length(x::text) == 3.  Then you could have 1 in one table and 1.0 in
the other; they join, but P is true for one and false for the other.
The fundamental problem is that equality according to the join
operator need not mean equality for all purposes.  1 and 1.0, as
numerics, are equal, but not the same.  Since the whole patch is based
on this idea, I believe that means this patch is dead in the water.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Additional role attributes && superuser review

2016-01-18 Thread Robert Haas
On Sun, Jan 17, 2016 at 6:58 PM, Stephen Frost  wrote:
> I'm not against that idea, though I continue to feel that there are
> common sets of privileges which backup tools could leverage.
>
> The other issue that I'm running into, again, while considering how to
> move back to ACL-based permissions for these objects is that we can't
> grant out the actual permissions which currently exist.  That means we
> either need to break backwards compatibility, which would be pretty
> ugly, in my view, or come up with new functions and then users will have
> to know which functions to use when.
>
> As I don't think we really want to break backwards compatibility or
> remove existing functionality, the only approach which is going to make
> sense is to add additional functions in some cases.  In particular, we
> will need alternate versions of pg_terminate_backend and
> pg_cancel_backend.  One thought I had was to make that
> 'pg_signal_backend', but that sounds like we'd allow any signal sent by
> a user with that right, which seems a bit much to me...

So, this seems like a case where a built-in role would be
well-justified.  I don't really believe in built-in roles as a way of
bundling related permissions; I know you do, but I don't.  I'd rather
see the individual function permissions granted individually.  But
here you are talking about a variable level of access to the same
function, depending on role.  And for that it seems to me that a
built-in role has a lot more to recommend it in that case than it does
in the other.  If you have been granted pg_whack, you can signal any
process on the system; otherwise just your own.  Those checks are
internal to pg_terminate_backend/pg_cancel_backend so GRANT is not a
substitute.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Making plpython 2 and 3 coexist a bit better

2016-01-18 Thread Tom Lane
Bruce Momjian  writes:
> On Mon, Jan 11, 2016 at 10:44:42AM -0500, Tom Lane wrote:
>> But it gets worse: a report today in pgsql-general points out that
>> even if you have the two languages in use *in different databases*,
>> pg_upgrade will fail, because pg_upgrade rather overenthusiastically
>> loads every .so mentioned anywhere in the source installation into
>> one session.

> The fix for that would be for pg_upgrade to change
> check_loadable_libraries() to start a new session for each LOAD command.
> Would you like that done?

Not necessary anymore, at least not for PL/Python; and that solution
sounds a tad expensive.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Expanded Objects and Order By

2016-01-18 Thread Tom Lane
Paul Ramsey  writes:
> Since you’re not seeing anything “aha” in the error pattern, I’ll go 
> back to the mats on memory… is there a good page on valgriding postgresql?

https://wiki.postgresql.org/wiki/Valgrind

> I thought the memory manager papered over things so much that valgrind 
> couldn’t see what was going on under the covers.

That used to be true, but it's a lot better now.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Truncating/vacuuming relations on full tablespaces

2016-01-18 Thread Robert Haas
On Fri, Jan 15, 2016 at 2:16 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> Now, I do think it's a somewhat fair complaint that if you have a
>> tablespace which is totally, 100% full, recovery is difficult.  You
>> can probably DROP the table, but TRUNCATE might fail, and so might
>> VACUUM.  You could argue that DROP is usually a good substitute for
>> TRUNCATE, although there could be dependencies, but DROP is certainly
>> not a good substitute for VACUUM.  We could address the VACUUM case by
>> having an optional argument to VACUUM which tells it to skip VM and
>> FSM maintenance, presumably to be used only in case of emergency.  I'm
>> not sure if it's worth having for what is presumably  a pretty rare
>> case, but it seems like it could be done.
>
> Presumably the hope would be that VACUUM would truncate off some of the
> heap, else there's not much good that's going to happen.  That leaves
> me wondering exactly what the invariant is for the maps, and if it's
> okay to not touch them during a heap truncation.

No, you're missing the point, or at least I think you are.  Suppose
somebody creates a big table and then deletes all the tuples in the
second half, but VACUUM never runs.  When at last VACUUM does run on
that table, it will try to build the VM and FSM forks as it scans the
table, and will only truncate AFTER that has been done.  If building
the VM and FSM forks fails because there is no freespace, we will
never reach the part of the operation that could create some.

The key point is that both the VM and the FSM are optional.  If
there's no VM, vacuum will have to visit every page in the table and
index-only scans will never be index-only, but everything still works.
If there's no FSM, we'll assume the table has no internal freespace,
so inserts will extend the table.  Those consequences are bad, of
course, so we really want vacuum to succeed in creating the VM and
FSM.  However, when a failure creating the FSM or VM causes us not to
reach the truncation step, then there's no way to shrink the table.
That's not good.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Improve spinlock inline assembly for x86.

2016-01-18 Thread Robert Haas
On Sun, Jan 17, 2016 at 6:38 AM, Andreas Seltenreich  wrote:
> I'm currently experimenting with just-in-time compilation using libfirm.
> While discussing issues with its developers, it was pointed out to me
> that our spinlock inline assembly is less than optimal.  Attached is a
> patch that addresses this.
>
> ,
> | Remove the LOCK prefix from the XCHG instruction.  Locking is implicit
> | with XCHG and the prefix wastes a byte.  Also remove the "cc" register
> | from the clobber list as the XCHG instruction does not modify any flags.
> |
> | Reported by Christoph Mallon.
> `

I did a Google search and everybody seems to agree that the LOCK
prefix is redundant.  I found a source agreeing with the idea that it
doesn't clobber registers, too:

http://www.oopweb.com/Assembly/Documents/ArtOfAssembly/Volume/Chapter_6/CH06-1.html#HEADING1-85

So I guess it would be safe to change this.  Scares me slightly, though.
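
For reference, a rough sketch of what the proposed change amounts to
(illustrative only, not the actual s_lock.h code):

/*
 * x86 test-and-set via XCHG.  XCHG with a memory operand is implicitly
 * locked, so the explicit "lock" prefix only wastes a byte, and XCHG
 * does not modify EFLAGS, so "cc" need not be in the clobber list.
 */
static inline int
tas_sketch(volatile char *lock)
{
	char	res = 1;

	__asm__ __volatile__(
		"	xchgb	%0,%1	\n"		/* no "lock" prefix needed */
		: "+q" (res), "+m" (*lock)
		:
		: "memory");				/* "cc" dropped from clobbers */
	return (int) res;
}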

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Proposal: speeding up GIN build with parallel workers

2016-01-18 Thread Robert Haas
On Fri, Jan 15, 2016 at 5:38 PM, Constantin S. Pan  wrote:
> In current state the implementation is just a proof of concept
> and it has all the configuration hardcoded, but it already works as is,
> though it does not speed up the build process more than 4 times on my
> configuration (12 CPUs). There is also a problem with temporary tables,
> for which the parallel mode does not work.

I have yet to see a case where parallel query, with any current or
pending patch, gets more than about a 4x speedup.  I can't think of
any reason that there should be a wall at 4x, and I'm not sure we're
hitting the wall there for the same reason in all cases.  But your
observation corresponds to my experience.

I hope we eventually figure out how to make that much better, but I
wouldn't feel too bad about being at that spot now.  4x faster is
still a lot faster.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Cannot find a working 64-bit integer type

2016-01-18 Thread Robert Haas
On Sun, Jan 17, 2016 at 5:45 PM, Igal @ Lucee.org  wrote:
> UPDATE: when I ran: configure --without-zlib --enable-debug
> CFLAGS="-Wno-cpp"
>
> I did not get an error from configure (though I get an error from "make" but
> that's another issue)
>
> I'm not sure what I'm "losing" by passing the "no-cpp" compiler flag?

According to 'man gcc':

   -Wno-cpp
   (C, Objective-C, C++, Objective-C++ and Fortran only)

   Suppress warning messages emitted by "#warning" directives.

So apparently on your system configure fails the test for a 64-bit
integer type because a #warning is emitted, and compiling with
-Wno-cpp gets rid of that (probably without breaking anything else).
The relevant portion of config.log seems to be this:

configure:13285: gcc -o conftest.exe -Wall -Wmissing-prototypes
-Wpointer-arith -Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -fexcess-precision=standard -O2
-I/home/Admin/sources/postgresql-9.5.0/src/include/port/win32
-DEXEC_BACKEND -Wl,--allow-multiple-definition
-Wl,--disable-auto-import conftest.c -lz -lws2_32 -lm >&5
conftest.c:106:5: warning: no previous prototype for 'does_int64_work'
[-Wmissing-prototypes] int does_int64_work()
conftest.c:120:1: warning: return type defaults to 'int' [-Wimplicit-int]
conftest.c: In function 'main':
conftest.c:121:3: warning: implicit declaration of function 'exit'
[-Wimplicit-function-declaration]
conftest.c:121:3: warning: incompatible implicit declaration of
built-in function 'exit'
conftest.c:121:3: note: include '<stdlib.h>' or provide a declaration of 'exit'
C:/Apps/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/5.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
reopening conftest.exe: Permission denied
C:/Apps/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/5.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
final link failed: Permission denied collect2.exe: error: ld returned
1 exit status
configure:13285: $? = 1 configure: program exited with status 1

I'm a little confused as to why -Wno-cpp fixes any of that, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Fwd: [JDBC] Re: 9.4-1207 behaves differently with server side prepared statements compared to 9.2-1102

2016-01-18 Thread Bruce Momjian
On Wed, Jan 13, 2016 at 10:47:18AM -0500, Tom Lane wrote:
> Vladimir Sitnikov  writes:
> > Note: I state that mixing "kinds" of bind values is a bad application
> > design anyway. In other words, application developer should understand
> > if a query is DWH-like (requires replans) or OLTP-like (does not
> > require replans). Agreed?
> 
> No, not agreed.  As was already pointed out upthread, such information
> is not available in many use-cases for the plancache.
> 
> The real problem here IMO is inaccurate plan cost estimates, and that's
> not something that there is any easy fix for.
> 
> However ... one specific aspect of that is that to some extent, the cost
> estimate made for the generic plan is incommensurate with the estimates
> for the custom plans because the latter are made with more information.
> I don't remember the details of your specific case anymore, but we've
> seen cases where the generic plan is falsely estimated to be cheaper
> than custom plans because of this.

I never understood why we don't just keep the selectivity estimates of
previous plans and just reuse the plan if the selectivity estimates are
similar.  Isn't parameter selectivity the only thing that distinguishes
one plan with parameters from another?

Checking selectivity estimates must be cheaper than replanning.  This
could be done at the second use of the prepared plan, and maybe for all
plan reuses, rather than waiting for five and then perhaps getting this
bad behavior.
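
For readers unfamiliar with the "five" above, the behavior being discussed
is easy to see with a sketch like this (table and values are made up):

-- The plancache builds a custom plan, using the actual parameter value,
-- for the first five executions of a prepared statement; after that it
-- may switch to reusing a cached generic plan if that looks no worse.
PREPARE q(int) AS SELECT count(*) FROM some_table WHERE category = $1;
EXECUTE q(1);   -- executions 1-5 are planned for the concrete value
EXECUTE q(1);
EXECUTE q(1);
EXECUTE q(1);
EXECUTE q(1);
EXECUTE q(1);   -- from the sixth execution on, a generic plan may be reused

The suggestion is essentially to make that switch depend on whether the new
parameters' selectivity estimates resemble those of the already-planned
cases, rather than on the execution count alone.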

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +




Re: [HACKERS] Trivial fixes for some IDENTIFICATION comment lines

2016-01-18 Thread Andres Freund
On 2016-01-18 12:52:06 +0100, Shulgin, Oleksandr wrote:
> I've noticed that in src/backend/replication/logical/reorderbuffer.c, the
> IDENTIFICATION comment line is incorrect:
> 
>  * IDENTIFICATION
> - * src/backend/catalog/dropcmds.c
> - * src/backend/replication/logicalfuncs.c
> - * src/backend/replication/reorderbuffer.c
> - * src/backend/replication/snapbuild.c
> - * src/backend/storage/ipc/dsm.c
> - * src/backend/utils/cache/relfilenode.c
> - * src/backend/utils/cache/evtcache.c

How about we simply drop them altogether instead? This isn't exactly a
seldom-made mistake, and they seem to actually contribute very little
value.

Andres




Re: [HACKERS] Combining Aggregates

2016-01-18 Thread David Rowley
On 18 January 2016 at 16:44, Robert Haas  wrote:

> On Sun, Jan 17, 2016 at 9:26 PM, David Rowley
>  wrote:
> > hmm, so wouldn't that mean that the transition function would need to
> (for
> > each input tuple):
> >
> > 1. Parse that StringInfo into tokens.
> > 2. Create a new aggregate state object.
> > 3. Populate the new aggregate state based on the tokenised StringInfo,
> this
> > would perhaps require that various *_in() functions are called on each
> > token.
> > 4. Add the new tuple to the aggregate state.
> > 5. Build a new StringInfo based on the aggregate state modified in 4.
> >
> > ?
>
> I don't really know what you mean by parse the StringInfo into tokens.
> The whole point of the expanded-object interface is to be able to keep
> things in an expanded internal form so that you *don't* have to
> repeatedly construct and deconstruct internal data structures.


That was a response to Haribabu's proposal, although perhaps I misunderstood
it. However, I'm not sure how a PolyNumAggState could be converted to a
string and back again without doing any string parsing.

> I worked up an example of this approach using string_agg(), which I
> attach here.  This changes the transition type of string_agg() from
> internal to text.  The same code would work for bytea_string_agg(),
> which would allow removal of some other code, but this patch doesn't
> do that, because the point of this is to elucidate the approach.

Many thanks for working up that patch. I was clearly missing what you meant
previously. I understand it much better now. Thank you for taking the time
on that.


> In my tests, this seems to be slightly slower than what we're doing
> today; worst of all, it adds a handful of cycles to
> advance_transition_function() even when the aggregate is not an
> expanded object or, indeed, not even pass-by-reference.  Some of this
> might be able to be fixed by a little massaging - in particular,
> DatumIsReadWriteExpandedObject() seems like it could be partly or
> entirely inlined, and maybe there's some other way to improve the
> coding here.
>

It also seems that an expanded object requires a new memory context which
must be malloc()d and free()d. This has added quite an overhead in my
testing. I assume that we must be doing that so that we can ensure that all
memory is properly free()d once we're done with the expanded-object.

create table ab(a int, b text);
insert into ab select x,'aaa' from
generate_Series(1,100) x(x);
set work_mem='1GB';
vacuum analyze ab;

explain analyze select a%100,length(string_agg(b,',')) from ab group by
1;

Patched
1521.045 ms
1515.905 ms
1530.920 ms

Master
932.457 ms
959.285 ms
991.021 ms

Although performance of the patched version is closer to master when we
have fewer groups, I felt it necessary to show the extreme case. I feel this
puts a bit of a dampener on the future of this idea: I've previously had
patches rejected for adding 2-5% to planning time alone, and adding over 50%
to execution time seems like a dead-end.

I've run profiles on this, and malloc() and free() are both at the top of
the profile in the patched version, with about 30% CPU time each.


> Generally, I think finding a way to pass expanded objects through
> nodeAgg.c would be a good thing to pursue, if we can make it work.
> The immediate impetus for changing things this way would be that we
> wouldn't need to add a mechanism for serializing and deserializing
> internal functions just to pass around partial aggregates.  But
> there's another advantage, too: right now,
> advance_transition_function() does a lot of data copying to move data
> from per-call context to the per-aggregate context.  When an expanded
> object is in use, this can be skipped.


The part I quite liked about the serialize/deserialize is that there's no
need to add any overhead at all for serializing and deserializing the
states when the aggregation is done in a single backend process. We'd
simply just have the planner pass the make_agg()'s serializeStates as false
when we're working within a single backend. This does not appear possible
with your proposed implementation, since it makes changes to each
transition function. It is my understanding that we normally bend over
backwards with new code to try and stop any performance regression. I'm not
quite sure the expanded-object idea works well in this regard, but I do
agree your approach seems neater. I just don't want to waste my time
writing all the code required to replace all INTERNAL aggregate states when
I know the performance regression is going to be unacceptable.

I also witnessed another regression with your patch when testing on another
machine. It caused the plan to change to a HashAgg instead of GroupAgg
causing a significant slowdown.

Unpatched

# explain analyze select a%100,length(string_agg(b,',')) from ab group
by 1;
QUERY PLAN

[HACKERS] Trivial fixes for some IDENTIFICATION comment lines

2016-01-18 Thread Shulgin, Oleksandr
Hello,

I've noticed that in src/backend/replication/logical/reorderbuffer.c, the
IDENTIFICATION comment line is incorrect:

 * IDENTIFICATION
 *  src/backend/replication/reorderbuffer.c

By using a simple find+grep command I can see this is also the case for the
following files:

$ find src -name \*.c -o -name \*.h | xargs grep -A1 IDENTIFICATION | grep
-v -E 'IDENTIFICATION|--' | grep -v '^\(src/.*\.[ch]\)-\s*\*\s*\1\s*$'

src/include/utils/evtcache.h- *   src/backend/utils/cache/evtcache.c
src/backend/utils/cache/relfilenodemap.c- *
src/backend/utils/cache/relfilenode.c
src/backend/utils/adt/version.c- *
src/backend/storage/ipc/dsm_impl.c- * src/backend/storage/ipc/dsm.c
src/backend/replication/logical/reorderbuffer.c- *
 src/backend/replication/reorderbuffer.c
src/backend/replication/logical/snapbuild.c- *
 src/backend/replication/snapbuild.c
src/backend/replication/logical/logicalfuncs.c- *
src/backend/replication/logicalfuncs.c
src/backend/commands/dropcmds.c- *src/backend/catalog/dropcmds.c

The one with version.c is a false positive: there's just an extra blank
line in the comment.

A patch to fix the above is attached.

Another minor thing is that if there is a convention for the order of
Copyright, NOTES and IDENTIFICATION, then it is not followed strictly.
Compare e.g. reorderbuffer.c vs. snapbuild.c.

--
Alex
From 4e71c026be7d1a75710fe89177d0dd37d8d84421 Mon Sep 17 00:00:00 2001
From: Oleksandr Shulgin 
Date: Mon, 18 Jan 2016 12:49:32 +0100
Subject: [PATCH] Assorted IDENTIFICATION comment line fixes.

---
 src/backend/commands/dropcmds.c | 2 +-
 src/backend/replication/logical/logicalfuncs.c  | 2 +-
 src/backend/replication/logical/reorderbuffer.c | 2 +-
 src/backend/replication/logical/snapbuild.c | 2 +-
 src/backend/storage/ipc/dsm_impl.c  | 2 +-
 src/backend/utils/cache/relfilenodemap.c| 2 +-
 src/include/utils/evtcache.h| 2 +-
 7 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/backend/commands/dropcmds.c b/src/backend/commands/dropcmds.c
index 522027a..20b42fe 100644
--- a/src/backend/commands/dropcmds.c
+++ b/src/backend/commands/dropcmds.c
@@ -8,7 +8,7 @@
  *
  *
  * IDENTIFICATION
- *	  src/backend/catalog/dropcmds.c
+ *	  src/backend/commands/dropcmds.c
  *
  *-
  */
diff --git a/src/backend/replication/logical/logicalfuncs.c b/src/backend/replication/logical/logicalfuncs.c
index 56e47e4..562c8f6 100644
--- a/src/backend/replication/logical/logicalfuncs.c
+++ b/src/backend/replication/logical/logicalfuncs.c
@@ -9,7 +9,7 @@
  * Copyright (c) 2012-2016, PostgreSQL Global Development Group
  *
  * IDENTIFICATION
- *	  src/backend/replication/logicalfuncs.c
+ *	  src/backend/replication/logical/logicalfuncs.c
  *-
  */
 
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 78acced..7402f20 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -8,7 +8,7 @@
  *
  *
  * IDENTIFICATION
- *	  src/backend/replication/reorderbuffer.c
+ *	  src/backend/replication/logical/reorderbuffer.c
  *
  * NOTES
  *	  This module gets handed individual pieces of transactions in the order
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 97c1ad4..1dd302b 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -99,7 +99,7 @@
  * Copyright (c) 2012-2016, PostgreSQL Global Development Group
  *
  * IDENTIFICATION
- *	  src/backend/replication/snapbuild.c
+ *	  src/backend/replication/logical/snapbuild.c
  *
  *-
  */
diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index 5d7b46f..297cfdd 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -41,7 +41,7 @@
  *
  *
  * IDENTIFICATION
- *	  src/backend/storage/ipc/dsm.c
+ *	  src/backend/storage/ipc/dsm_impl.c
  *
  *-
  */
diff --git a/src/backend/utils/cache/relfilenodemap.c b/src/backend/utils/cache/relfilenodemap.c
index 894dacb..bef4d5b 100644
--- a/src/backend/utils/cache/relfilenodemap.c
+++ b/src/backend/utils/cache/relfilenodemap.c
@@ -7,7 +7,7 @@
  * Portions Copyright (c) 1994, Regents of the University of California
  *
  * IDENTIFICATION
- *	  src/backend/utils/cache/relfilenode.c
+ *	  src/backend/utils/cache/relfilenodemap.c
  *
  *-
  */
diff --git a/src/include/utils/evtcache.h b/src/include/utils/evtcache.h
index 7c8da74..76d2519 100644
--- 

[HACKERS] jsonb array-style subscription

2016-01-18 Thread Dmitry Dolgov
Hi,

Here is a reworked version of the patch for jsonb subscription.
There weren't many changes in functionality:

=# create TEMP TABLE test_jsonb_subscript (
   id int,
   test_json jsonb
);

=# insert into test_jsonb_subscript values
(1, '{}'),
(2, '{}');

=# update test_jsonb_subscript set test_json['a'] = 42;
=# select * from test_jsonb_subscript;
 id | test_json
----+-----------
  1 | {"a": 42}
  2 | {"a": 42}
(2 rows)

=# select test_json['a'] from test_jsonb_subscript;
 test_json
-----------
 {"a": 42}
 {"a": 42}
(2 rows)

I've cleaned up the code, created a separate JsonbRef node (and there are a
lot of small changes because of that), and abandoned the idea of "deep
nesting" of assignments (it doesn't relate to jsonb subscription, is more
about the "jsonb_set" function, and anyway it's not a good idea). It looks
fine to me, and I need a little guidance - is it OK to propose this feature
for commitfest 2016-03 for review?


jsonb_subscription.patch
Description: Binary data



Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

2016-01-18 Thread Thom Brown
On 18 January 2016 at 10:46, Ashutosh Bapat
 wrote:
> Hi All,
> PFA patches for postgres_fdw join pushdown, taken care of all TODOs in my
> last mail.
>
> Here is the list of things that have been improved/added new as compared to
> Hanada-san's previous patch at [1].
>
> 1. Condition handling for join
> Patch in [1] allowed a foreign join to be pushed down if only all the
> conditions were safe to push down to the foreign server. This patch
> differentiates these conditions into 1. conditions to be applied while
> joining (ON clause) 2. conditions to be applied after joining (WHERE
> clause). For a join to be safe to pushdown, only conditions in 1 need to be
> all safe to pushdown. The conditions in second category, which are not safe
> to be pushed down can be applied locally. This allows more avenue for join
> pushdown. For an INNER join all the conditions can be applied on the cross
> product. Hence we can push down an INNER join even if one or more of the
> conditions are not safe to be pushed down. This patch includes the
> optimization as well.
>
> 2. Targetlist handling:
> The columns required to evaluate the non-pushable conditions on a join
> relation need to be fetched from the foreign server. In previous patch the
> SELECT clauses were built from rel->reltargetlist, which doesn't contain
> these columns. This patch includes those columns as well.
>
> 3. Column projection:
> Earlier patch required another layer of SQL to project whole-row attribute
> from a base relation. This patch takes care of that while constructing and
> deparsing
> targetlist. This reduces the complexity and length of the query to be sent
> to the foreign server e.g.
>
> With the projection in previous patch the query looked like
> EXPLAIN (COSTS false, VERBOSE)
> SELECT t1.ctid, t1, t2, t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)
> ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
>   QUERY PLAN
> ... explain output clipped
>Remote SQL: SELECT l.a1, l.a2, l.a3, l.a4, r.a1 FROM (SELECT
> l.a7, ROW(l.a10, l.a11, l.a12, l.a13, l.a14, l.a15, l.a16, l.a17), l.a10,
> l.a12 FROM (SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7
> a16, c8 a17, ctid a7 FROM "S 1"."T 1") l) l (a1, a2, a3, a4) INNER JOIN
> (SELECT ROW(r.a9, r.a10, r.a12, r.a13, r.a14, r.a15, r.a16, r.a17), r.a9
> FROM (SELECT "C 1" a9, c2 a10, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8
> a17 FROM "S 1"."T 1") r) r (a1, a2) ON ((l.a3 = r.a2))
>
> With this patch it looks like
> EXPLAIN (COSTS false, VERBOSE)
> SELECT t1.ctid, t1, t2, t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)
> ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
>QUERY PLAN
> ... explain output clipped
>Remote SQL: SELECT l.a3, l.a4, l.a1, l.a2, r.a2 FROM (SELECT
> "C 1", c3, ctid, ROW("C 1", c2, c3, c4, c5, c6, c7, c8) FROM "S 1"."T 1") l
> (a1, a2, a3, a4) INNER JOIN (SELECT "C 1", ROW("C 1", c2, c3, c4, c5, c6,
> c7, c8) FROM "S 1"."T 1") r (a1, a2) ON (TRUE) WHERE ((l.a1 = r.a1))
> (9 rows)
>
> 4. Local cost estimation
> Previous patch had a TODO left for estimating join cost locally, when
> use_remote_estimate is false. This patch adds support for the same. The
> relevant
> discussion in mail thread [2], [3].
>
> 5. This patch adds a GUC enable_foreignjoin to enable or disable join
> pushdown through core.
>
> 6. Added more tests to test lateral references, unsafe to push conditions at
> various places in the query,
>
> Many cosmetic improvements like adding static function declarations, comment
> improvements and making code readable.
>
> [1]
> http://www.postgresql.org/message-id/CAEZqfEe9KGy=1_wagh2rgzpg0o4pqgd+iauyaj8wtze+cyj...@mail.gmail.com
> [2]
> http://www.postgresql.org/message-id/cafjfprcqswus+tb5iyp1m3c-w0k3xab6h5mw4+n2q2iuafs...@mail.gmail.com
> [3]
> http://www.postgresql.org/message-id/CAFjFpRepSC2e3mZ1uYSopJD6R19fOZ0dNNf9Z=gnyksb6wg...@mail.gmail.com
>
> I will be working next on (in that order)
> 1. eval_plan_qual fix for foreign join. (Considered as a must-have for 9.6)
> 2. Pushing down ORDER BY clause along with join pushdown
> 3. Parameterization of foreign join paths (Given the complexity of the
> feature this may not make it into 9.6)

It seems you forgot to attach the patch.

Thom




Re: [HACKERS] [PATCH] better systemd integration

2016-01-18 Thread Alvaro Herrera
Peter Eisentraut wrote:
> I have written a couple of patches to improve the integration of the
> postgres daemon with systemd.

Great.  Is anything happening with these patches, or?  They've been
inactive for quite a while now.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Combining Aggregates

2016-01-18 Thread Robert Haas
On Mon, Jan 18, 2016 at 6:32 AM, David Rowley
 wrote:
>> In my tests, this seems to be slightly slower than what we're doing
>> today; worst of all, it adds a handful of cycles to
>> advance_transition_function() even when the aggregate is not an
>> expanded object or, indeed, not even pass-by-reference.  Some of this
>> might be able to be fixed by a little massaging - in particular,
>> DatumIsReadWriteExpandedObject() seems like it could be partly or
>> entirely inlined, and maybe there's some other way to improve the
>> coding here.
>
> It also seems that an expanded object requires a new memory context which
> must be malloc()d and free()d. This has added quite an overhead in my
> testing. I assume that we must be doing that so that we can ensure that all
> memory is properly free()d once we're done with the expanded-object.

Yeah.  The API contract for an expanded object took me quite a while
to puzzle out, but it seems to be this: if somebody hands you an R/W
pointer to an expanded object, you're entitled to assume that you can
"take over" that object and mutate it however you like.  But the
object might be in some other memory context, so you have to move it
into your own memory context.  That's implementing by reparenting the
object's context under your context.  This is nice because it's O(1) -
whereas copying would be O(n) - but it's sort of aggravating, too.  In
this case, the transition function already knows that it wants
everything in the per-agg state, but it can't just create everything
there and be done with it, because nodeAgg.c has to contend with the
possibility that some other piece of code will return an expanded
object that *isn't* parented to the aggcontext.  And in fact one of
the regression tests does exactly that, which caused me lots of
head-banging yesterday.

Making it worse, the transition function isn't required to have the
same behavior every time: the one in the regression tests sometimes
returns an expanded-object pointer, and sometimes doesn't, and if
nodeAgg.c reparents that pointer to the aggcontext, the next call to
the transition function reparents the pointer BACK to the per-tuple
context, so you can't even optimize away the repeated set-parent
calls.  Uggh.
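
To make the contract concrete, here is roughly what "taking over" an R/W
expanded object looks like in code (a simplified sketch, not the actual
nodeAgg.c logic; the function and variable names are made up):

#include "postgres.h"
#include "utils/datum.h"
#include "utils/expandeddatum.h"

/*
 * Sketch: keep a (possibly expanded) transition value alive in aggcontext.
 * A read-write expanded object can simply be reparented, which is O(1);
 * anything else has to be copied the traditional way.
 */
static Datum
keep_transvalue(Datum newval, bool isnull, int16 typlen, bool typbyval,
				MemoryContext aggcontext)
{
	if (!isnull && DatumIsReadWriteExpandedObject(newval, isnull, typlen))
	{
		/* take ownership by reparenting the object's private context */
		TransferExpandedObject(newval, aggcontext);
		return newval;
	}
	if (!isnull && !typbyval)
	{
		/* ordinary pass-by-reference datum: physically copy it */
		MemoryContext oldcxt = MemoryContextSwitchTo(aggcontext);

		newval = datumCopy(newval, typbyval, typlen);
		MemoryContextSwitchTo(oldcxt);
	}
	return newval;
}

The "Uggh" part is precisely that nothing stops the next transition call
from reparenting the same object somewhere else again.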

> create table ab(a int, b text);
> insert into ab select x,'aaa' from
> generate_Series(1,100) x(x);
> set work_mem='1GB';
> vacuum analyze ab;
>
> explain analyze select a%100,length(string_agg(b,',')) from ab group by
> 1;
>
> Patched
> 1521.045 ms
> 1515.905 ms
> 1530.920 ms
>
> Master
> 932.457 ms
> 959.285 ms
> 991.021 ms
>
> Although performance of the patched version is closer to master when we have
> less groups, I felt it necessary to show the extreme case. I feel this puts
> a bit of a dampener on the future of this idea, as I've previously had
> patches rejected for adding 2-5% on planning time alone, adding over 50%
> execution time seems like a dead-end.

Thanks for working up this test case.  I certainly agree that adding
50% execution time is a dead-end, but I wonder if that problem can be
fixed somehow.  It would be a shame to find out that expanded-objects
can't be effectively used for anything other than the purposes for
which they were designed, namely speeding up PL/pgsql array
operations.

> seems neater. I just don't want to waste my time writing all the code
> required to replace all INTERNAL aggregate states when I know the
> performance regression is going to be unacceptable.

No argument there.  If the performance regression isn't fixable, this
approach is doomed.

> I also witnessed another regression with your patch when testing on another
> machine. It caused the plan to change to a HashAgg instead of GroupAgg
> causing a significant slowdown.
>
> Unpatched
>
> # explain analyze select a%100,length(string_agg(b,',')) from ab group
> by 1;
> QUERY PLAN
> ---
>  GroupAggregate  (cost=119510.84..144510.84 rows=100 width=32) (actual
> time=538.938..1015.278 rows=100 loops=1)
>Group Key: ((a % 100))
>->  Sort  (cost=119510.84..122010.84 rows=100 width=32) (actual
> time=538.917..594.194 rows=100 loops=1)
>  Sort Key: ((a % 100))
>  Sort Method: quicksort  Memory: 102702kB
>  ->  Seq Scan on ab  (cost=0.00..19853.00 rows=100 width=32)
> (actual time=0.016..138.964 rows=100 loops=1)
>  Planning time: 0.146 ms
>  Execution time: 1047.511 ms
>
>
> Patched
> # explain analyze select a%100,length(string_agg(b,',')) from ab group
> by 1;
>QUERY PLAN
> 
>  HashAggregate  (cost=24853.00..39853.00 rows=100 

Re: [HACKERS] Refactoring speculative insertion with unique indexes a little

2016-01-18 Thread Alvaro Herrera
Peter Geoghegan wrote:
> On Fri, Dec 18, 2015 at 12:55 PM, Robert Haas  wrote:

> > I only see one patch version on the thread.
> 
> I'm not going to post a revision until I thrash out the tiny issues
> with Heikki. He kind of trailed off. So maybe that kills it
> immediately, which is a shame.

If you refuse to post an updated version of the patch until Heikki
weighs in some more, and given that Heikki has (for the purposes of this
patch) completely vanished, I think we should mark this rejected.

If somebody else is open to reviewing the patch, I think that'd be
another way to move forward, but presumably they would start from a
version with the discussed changes already fixed.  Otherwise it's a
waste of time.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Expanded Objects and Order By

2016-01-18 Thread Tom Lane
Paul Ramsey  writes:
> So, I've added support for converting postgis in-memory objects into
> expanded outputs, and have overwritten the usual
> lwgeom_to_gserialized() function with one that creates an expanded
> object. I haven't done anything to actually handle expanded objects on
> input, but as I understand it, that's fine, everything is supposed to
> work as normal when handed expanded objects. And thus far, that has
> been true, almost all regression tests work fine.

Hmm ... you do need to be able to flatten them again.  In the given
example, the parser is going to want to form a Const node whose Datum
value is a geometry object, and that Const node needs to be copiable
by copyObject(), which means datumCopy() has to work, and if you look
at that it will exercise EOH_get_flat_size/EOH_flatten_into when the
input routine originally produced an expanded object.
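
In case it helps, the flattening hooks referred to here boil down to two
callbacks; a bare-bones sketch (hypothetical names, not PostGIS code):

#include "postgres.h"
#include "utils/expandeddatum.h"

/* A hypothetical expanded geometry: the standard header must come first. */
typedef struct ExpandedGeomHeader
{
	ExpandedObjectHeader hdr;
	/* ... pointers to the in-memory geometry representation ... */
	Size		flat_size;		/* size of the serialized (varlena) form */
} ExpandedGeomHeader;

/* Report exactly how many bytes the flattened form will occupy. */
static Size
EG_get_flat_size(ExpandedObjectHeader *eohptr)
{
	ExpandedGeomHeader *egh = (ExpandedGeomHeader *) eohptr;

	return egh->flat_size;
}

/* Serialize the in-memory form into the caller-supplied buffer. */
static void
EG_flatten_into(ExpandedObjectHeader *eohptr, void *result, Size allocated_size)
{
	ExpandedGeomHeader *egh = (ExpandedGeomHeader *) eohptr;

	Assert(allocated_size == egh->flat_size);
	/* ... write the flat serialized bytes into *result here ... */
	SET_VARSIZE(result, allocated_size);
}

static const ExpandedObjectMethods EG_methods =
{
	EG_get_flat_size,
	EG_flatten_into
};

datumCopy() ends up going through EOH_get_flat_size()/EOH_flatten_into(),
which dispatch through that methods table, so if those callbacks misbehave
the Const that the parser builds will contain garbage.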

The error message is very strange; it's hard to see how toying with the
internal representation of Consts could cause that.  I think the
hypothesis of a memory clobber is stronger than you give it credit for,
especially since you see the behavior changing for seemingly unrelated
reasons.  Valgrind might be a useful tool here.

regards, tom lane




Re: [HACKERS] checkpointer continuous flushing

2016-01-18 Thread Andres Freund
On 2016-01-16 10:01:25 +0100, Fabien COELHO wrote:
> 
> Hello Andres,
> 
> >I measured it in a different number of cases, both on SSDs and spinning
> >rust. I just reproduced it with:
> >
> >postgres-ckpt14 \
> >   -D /srv/temp/pgdev-dev-800/ \
> >   -c maintenance_work_mem=2GB \
> >   -c fsync=on \
> >   -c synchronous_commit=off \
> >   -c shared_buffers=2GB \
> >   -c wal_level=hot_standby \
> >   -c max_wal_senders=10 \
> >   -c max_wal_size=100GB \
> >   -c checkpoint_timeout=30s
> >
> >Using a fresh cluster each time (copied from a "template" to save time)
> >and using
> >pgbench -M prepared -c 16 -j 16 -T 300 -P 1

So, I've analyzed the problem further, and I think I found something
rather interesting. I profiled the kernel looking at where it blocks in
the IO request queues, and found that the wal writer was involved
surprisingly often.

So, in a workload where everything (checkpoint, bgwriter, backend
writes) is flushed: 2995 tps
After I kill the wal writer with -STOP: 10887 tps

Stracing the wal writer shows:

17:29:02.001517 --- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=17857, 
si_uid=1000} ---
17:29:02.001538 rt_sigreturn({mask=[]}) = 0
17:29:02.001582 read(8, 0x7ffea6b6b200, 16) = -1 EAGAIN (Resource temporarily 
unavailable)
17:29:02.001615 write(3, 
"\210\320\5\0\1\0\0\0\0@\330_/\0\0\0w\f\0\0\0\0\0\0\0\4\0\2\t\30\0\372"..., 
49152) = 49152
17:29:02.001671 fdatasync(3)= 0
17:29:02.005022 --- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=17825, 
si_uid=1000} ---
17:29:02.005043 rt_sigreturn({mask=[]}) = 0
17:29:02.005081 read(8, 0x7ffea6b6b200, 16) = -1 EAGAIN (Resource temporarily 
unavailable)
17:29:02.005111 write(3, 
"\210\320\5\0\1\0\0\0\0\0\331_/\0\0\0\7\26\0\0\0\0\0\0T\251\0\0\0\0\0\0"..., 
8192) = 8192
17:29:02.005147 fdatasync(3)= 0
17:29:02.008688 --- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=17866, 
si_uid=1000} ---
17:29:02.008705 rt_sigreturn({mask=[]}) = 0
17:29:02.008730 read(8, 0x7ffea6b6b200, 16) = -1 EAGAIN (Resource temporarily 
unavailable)
17:29:02.008757 write(3, "\210\320\5\0\1\0\0\0\0 
\331_/\0\0\0\267\30\0\0\0\0\0\0"..., 98304) = 98304
17:29:02.008822 fdatasync(3)= 0
17:29:02.016125 --- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=17865, 
si_uid=1000} ---
17:29:02.016141 rt_sigreturn({mask=[]}) = 0
17:29:02.016174 read(8, 0x7ffea6b6b200, 16) = -1 EAGAIN (Resource temporarily 
unavailable)
17:29:02.016204 write(3, 
"\210\320\5\0\1\0\0\0\0\240\332_/\0\0\0s\5\0\0\0\0\0\0\t\30\0\2|8\2u"..., 
57344) = 57344
17:29:02.016281 fdatasync(3)= 0
17:29:02.019181 --- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=17865, 
si_uid=1000} ---
17:29:02.019199 rt_sigreturn({mask=[]}) = 0
17:29:02.019226 read(8, 0x7ffea6b6b200, 16) = -1 EAGAIN (Resource temporarily 
unavailable)
17:29:02.019249 write(3, 
"\210\320\5\0\1\0\0\0\0\200\333_/\0\0\0\307\f\0\0\0\0\0\0"..., 73728) = 
73728
17:29:02.019355 fdatasync(3)= 0
17:29:02.022680 --- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=17865, 
si_uid=1000} ---
17:29:02.022696 rt_sigreturn({mask=[]}) = 0

I.e. we're fdatasync()ing small amounts of pages, roughly 500 times a
second. As soon as the wal writer is stopped, it's much bigger chunks,
on the order of 50-130 pages. And, not that surprisingly, that improves
performance, because there's far fewer cache flushes submitted to the
hardware.


> I'm running some tests similar to those above...

> Do you do some warmup when testing? I guess the answer is "no".

Doesn't make a difference here, I tried both. As long as before/after
benchmarks start from the same state...


> I understand that you have 8 cores/16 threads on your host?

On one of them, 4 cores/8 threads on the laptop.


> Loading scale 800 data for 300 seconds tests takes much more than 300
> seconds (init takes ~360 seconds, vacuum & index are slow). With 30 seconds
> checkpoint cycles and without any warmup, I feel that these tests are really
> on the very short (too short) side, so I'm not sure how much I can trust
> such results as significant. The data I reported were with more real life
> like parameters.

I see exactly the same with 300s or 1000s checkpoint cycles, it just
takes a lot longer to repeat. They're also similar (although obviously
both before/after patch are higher) if I disable full_page_writes,
thereby eliminating a lot of other IO.

Andres




Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-01-18 Thread Thom Brown
On 3 January 2016 at 13:26, Masahiko Sawada  wrote:
> On Fri, Dec 25, 2015 at 7:21 AM, Thomas Munro
>  wrote:
>> On Fri, Dec 25, 2015 at 8:50 AM, Masahiko Sawada  
>> wrote:
>>> On Wed, Dec 23, 2015 at 8:45 AM, Thomas Munro
>>>  wrote:
 On Wed, Dec 23, 2015 at 3:50 PM, Thomas Munro
  wrote:
> If you got rid of SyncRepGetSyncStandbysOnePriority as suggested
> above, then this function could be renamed to SyncRepGetSyncStandbys.
> I think it would be a tiny bit nicer if it also took a Size n argument
> along with the output buffer pointer.
>>>
>>> Sorry, I could not get your point. SyncRepGetSyncStandbysPriority()
>>> function uses synchronous_standby_num which is global variable.
>>> But you mean that the number of synchronous standbys is given as
>>> function argument?
>>
>> Yeah, I was thinking of it as the output buffer size which I would be
>> inclined to make more explicit (I am still coming to terms with the
>> use of global variables in Postgres) but it doesn't matter, please
>> disregard that suggestion.
>>
> As for the body of that function (which I won't paste here), it
> contains an algorithm to find the top K elements in an array of N
> elements.  It does that with a linear search through the top K seen so
> far for each value in the input array, so its worst case is O(KN)
> comparisons.  Some of the sorting gurus on this list might have
> something to say about that but my take is that it seems fine for the
> tiny values of K and N that we're dealing with here, and it's nice
> that it doesn't need any space other than the output buffer, unlike
> some other top-K algorithms which would win for larger inputs.
>>>
>>> Yeah, it's improvement point.
>>> But I'm assumed that the number of synchronous replication is not
>>> large, so I use this algorithm as first version.
>>> And I think that its worst case is O(K(N-K)). Am I missing something?
>>
>> You're right, I was dropping that detail, in the tradition of the
>> hand-wavy school of big-O notation.  (I suppose you could skip the
>> inner loop when the priority is lower than the current lowest
>> priority, giving a O(N) best case when the walsenders are perfectly
>> ordered by coincidence.  Probably a bad idea or just not worth
>> worrying about.)
>
> Thank you for reviewing the patch.
> Yeah, I added the logic that skip the inner loop.
>
>>
>>> Attached latest version patch.
>>
>> +/*
>> + * Obtain currently synced LSN location: write and flush, using priority
>> - * In 9.1 we support only a single synchronous standby, chosen from a
>> - * priority list of synchronous_standby_names. Before it can become the
>> + * In 9.6 we support multiple synchronous standby, chosen from a priority
>>
>> s/standby/standbys/
>>
>> + * list of synchronous_standby_names. Before it can become the
>>
>> s/Before it can become the/Before any standby can become a/
>>
>>   * synchronous standby it must have caught up with the primary; that may
>>   * take some time. Once caught up, the current highest priority standby
>>
>> s/standby/standbys/
>>
>>   * will release waiters from the queue.
>>
>> +bool
>> +SyncRepGetSyncLsnsPriority(XLogRecPtr *write_pos, XLogRecPtr *flush_pos)
>> +{
>> + int sync_standbys[synchronous_standby_num];
>>
>> I think this should be sync_standbys[SYNC_REP_MAX_SYNC_STANDBY_NUM].
>> (Variable sized arrays are a feature of C99 and PostgreSQL is written
>> in C89.)
>>
>> +/*
>> + * Populate a caller-supplied array which much have enough space for
>> + * synchronous_standby_num. Returns position of standbys currently
>> + * considered as synchronous, and its length.
>> + */
>> +int
>> +SyncRepGetSyncStandbys(int *sync_standbys)
>>
>> s/much/must/ (my bad, in previous email).
>>
>> + ereport(ERROR,
>> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>> + errmsg("The number of synchronous standbys must be smaller than the
>> number of listed : %d",
>> + synchronous_standby_num)));
>>
>> How about "the number of synchronous standbys exceeds the length of
>> the standby list: %d"?  Error messages usually start with lower case,
>> ':' is not usually preceded by a space.
>>
>> + ereport(ERROR,
>> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>> + errmsg("The number of synchronous standbys must be between 1 and %d : %d",
>>
>> s/The/the/, s/ : /: /
>
> Fixed the issues you mentioned.
>
> Attached latest v5 patch.
> Please review it.

synchronous_standby_num doesn't appear to be a valid GUC name:

LOG:  unrecognized configuration parameter "synchronous_standby_num"
in file "/home/thom/Development/test/primary/postgresql.conf" line 244

All I did was uncomment it and set it to a value.

Thom




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-01-18 Thread Alvaro Herrera
Amit Kapila wrote:

> The reason for not updating the patch related to this thread is that it is
> dependent on the work for refactoring the tranches for LWLocks [1]
> which is now coming towards an end, so I think it is quite reasonable
> that the patch can be updated for this work during commit fest, so
> I am moving it to upcoming CF.

Thanks.  I think the tranche reworks are mostly done now, so is anyone
submitting an updated version of this patch?

Also, it would be very good if someone can provide insight on how this
patch interacts with the other submitted patch for "waiting for
replication" https://commitfest.postgresql.org/8/436/
Andres seems to think that the other patch is completely independent of
this one, i.e. the "waiting for replication" column needs to exist
separately and not as part of the "more descriptive" new 'waiting'
column.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] statistics for array types

2016-01-18 Thread Alvaro Herrera
Alexander Korotkov wrote:

> The patch implementing my idea above is attached.

What's the status here?  Jeff, did you have a look at Alexander's
version of your patch?  Tomas, does this patch satisfy your concerns?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Combining Aggregates

2016-01-18 Thread Pavel Stehule
>
> > # explain analyze select a%100,length(string_agg(b,',')) from ab
> group
> > by 1;
> > QUERY PLAN
> >
> ---
> >  GroupAggregate  (cost=119510.84..144510.84 rows=100 width=32)
> (actual
> > time=538.938..1015.278 rows=100 loops=1)
> >Group Key: ((a % 100))
> >->  Sort  (cost=119510.84..122010.84 rows=100 width=32) (actual
> > time=538.917..594.194 rows=100 loops=1)
> >  Sort Key: ((a % 100))
> >  Sort Method: quicksort  Memory: 102702kB
> >  ->  Seq Scan on ab  (cost=0.00..19853.00 rows=100 width=32)
> > (actual time=0.016..138.964 rows=100 loops=1)
> >  Planning time: 0.146 ms
> >  Execution time: 1047.511 ms
> >
> >
> > Patched
> > # explain analyze select a%100,length(string_agg(b,',')) from ab
> group
> > by 1;
> >QUERY PLAN
> >
> 
> >  HashAggregate  (cost=24853.00..39853.00 rows=100 width=32) (actual
> > time=8072.346..144424.872 rows=100 loops=1)
> >Group Key: (a % 100)
> >->  Seq Scan on ab  (cost=0.00..19853.00 rows=100 width=32)
> (actual
> > time=0.025..481.332 rows=100 loops=1)
> >  Planning time: 0.164 ms
> >  Execution time: 263288.332 ms
>
> Well, that's pretty odd.  I guess the plan change must be a result of
> switching the transition type from internal to text, although I'm not
> immediately certain why that would make a difference.
>

It is strange - why is the hashaggregate so slow?

Pavel



>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>


Re: [HACKERS] Optimizer questions

2016-01-18 Thread Konstantin Knizhnik
I am sorry for the badly formatted query - I just cut it from a C++
client program.


I have one more question to the community regarding this patch.
The proposed patch fixes the problem specifically for SORT+LIMIT clauses.
In this case, evaluating expressions that are not used in the sort is
always a waste of time.
But I wonder if we should delay evaluation of complex expressions even
when there is no LIMIT?
Quite often a client application doesn't fetch all query results even
when there is no LIMIT clause.
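
A stripped-down illustration of the pattern in question (table and function
names are invented):

-- expensive_fn() is in the target list but not used by the sort, yet it is
-- currently evaluated for every row before sorting, even though only the
-- 9 rows surviving the LIMIT actually need it.
SELECT id, expensive_fn(payload) AS decorated
FROM items
ORDER BY mtime
LIMIT 9;

Delaying such expressions past the sort is clearly a win with LIMIT; without
LIMIT it only helps when the client stops fetching early, as noted above.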




On 18.01.2016 05:47, Bruce Momjian wrote:

On Tue, Jan  5, 2016 at 05:55:28PM +0300, konstantin knizhnik wrote:

Hi hackers,

I want to ask two questions about PostgreSQL optimizer.
I have the following query:

SELECT o.id as id,s.id as sid,o.owner,o.creator,o.parent_id
as dir_id,s.mime_id,m.c_type,s.p_file,s.mtime,o.ctime,o.name
,o.title,o.size,o.deleted,la.otime,la.etime,uo.login as owner_login,uc.login as
creator_login,(CASE WHEN f.user_id IS NULL THEN 0 ELSE 1 END) AS flagged,
(select 'userid\\:'||string_agg(user_id,' userid\\:') from get_authorized_users
(o.id)) as acl FROM objects s JOIN objects o ON s.original_file=o.id LEFT JOIN
flags f ON o.id=f.obj_id AND o.owner=f.user_id LEFT JOIN objects_last_activity
la ON o.id = la.obj_id AND o.owner = la.user_id, mime m, users uc , users uo
WHERE (s.mime_id=904 or s.mime_id=908) AND m.mime_id = o.mime_id AND o.owner =
uo.user_id AND o.creator = uc.user_id ORDER BY s.mtime LIMIT 9;

FYI, I could not make any sense out of this query, and I frankly can't
figure out how others can understand it.  :-O   Anyway, I ran it through
pgFormatter (https://github.com/darold/pgFormatter), which I am showing
here because I was impressed with the results:

SELECT
o.id AS id,
s.id AS sid,
o.owner,
o.creator,
o.parent_id AS dir_id,
s.mime_id,
m.c_type,
s.p_file,
s.mtime,
o.ctime,
o.name,
o.title,
o.size,
o.deleted,
la.otime,
la.etime,
uo.login AS owner_login,
uc.login AS creator_login,
(
CASE
WHEN f.user_id IS NULL THEN 0
ELSE 1
END ) AS flagged,
(
SELECT
'userid\\:' || string_agg (
user_id,
' userid\\:' )
FROM
get_authorized_users (
o.id ) ) AS acl
FROM
objects s
JOIN objects o ON s.original_file = o.id
LEFT JOIN flags f ON o.id = f.obj_id
AND o.owner = f.user_id
LEFT JOIN objects_last_activity la ON o.id = la.obj_id
AND o.owner = la.user_id,
mime m,
users uc,
users uo
WHERE (
s.mime_id = 904
OR s.mime_id = 908 )
AND m.mime_id = o.mime_id
AND o.owner = uo.user_id
AND o.creator = uc.user_id
ORDER BY
s.mtime
LIMIT 9;



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-01-18 Thread Masahiko Sawada
On Mon, Jan 18, 2016 at 1:20 PM, Michael Paquier
 wrote:
> On Sun, Jan 17, 2016 at 11:09 PM, Masahiko Sawada  
> wrote:
>> On Wed, Jan 13, 2016 at 1:54 AM, Alvaro Herrera wrote:
>> * Confirm value of pg_stat_replication.sync_state (sync, async or potential)
>> * Confirm that the data is synchronously replicated to multiple
>> standbys in same cases.
>>   * case 1 : The standby which is not listed in s_s_name, is down
>>   * case 2 : The standby which is listed in s_s_names but potential
>> standby, is down
>>   * case 3 : The standby which is considered as sync standby, is down.
>> * Standby promotion
>>
>> In order to confirm that the commit isn't done in case #3 forever
>> unless new sync standby is up, I think we need the framework that
>> cancels executing query.
>> That is, what I'm planning is,
>> 1. Set up master server (s_s_name = '2, standby1, standby2)
>> 2. Set up two standby servers
>> 3. Standby1 is down
>> 4. Create some contents on master (But transaction is not committed)
>> 5. Cancel the #4 query. (Also confirm that the flush location of only
>> standby2 makes progress)
>
> This will need some thinking and is not as easy as it sounds. There is
> no way to hold on a connection after executing a query in the current
> TAP infrastructure. You are just mentioning case 3, but actually cases
> 1 and 2 are falling into the same need: if there is a failure we want
> to be able to not be stuck in the test forever and have a way to
> cancel a query execution at will. TAP uses psql -c to execute any sql
> queries, but we would need something that is far lower-level, and that
> would be basically using the perl driver for Postgres or an equivalent
> here.
>
> Honestly for those tests I just thought that we could get to something
> reliable by just looking at how each sync replication setup reflects
> in pg_stat_replication as the flow is really getting complicated,
> giving to the user a clear representation at SQL level of what is
> actually occurring in the server depending on the configuration used
> being important here.

I see.
We could check the transitions of sync_state in pg_stat_replication.
I think that means it would test each replication method (state
switching) rather than the synchronization of replication itself.

What I'm planning to have are,
* Confirm value of pg_stat_replication.sync_state (sync, async or potential)
* Standby promotion
* Standby catching up master
And each replication method has above tests.

Are these enough?

Regards,

--
Masahiko Sawada




Re: [HACKERS] jsonb - jsonb operators

2016-01-18 Thread Tom Lane
Dmitry Dolgov <9erthali...@gmail.com> writes:
>> if there's any future intention to add a delete operator that removes
> element/pair matches?

> I think the operator (jsonb - jsonb) is logical because we have a shallow
> concatenation function (something like a "union" operation), but we have
> nothing like "set difference" and "intersection" functions. Actually, I
> thought to implement these functions (at least for jsonbx). But of course
> this function should be quite simple and consider only full key/value
> matching as a target.

I am wary of this proposal because it seems to be taking little
account of the fact that there *already is* a jsonb minus operator,
two of them in fact.  For example

regression=# select '["a","b","c"]'::jsonb - 'b';
  ?column?
------------
 ["a", "c"]
(1 row)

regression=# select '{"a":1, "b":2}'::jsonb - 'b';
 ?column?
----------
 {"a": 1}
(1 row)

The proposed full-match semantics don't seem to me to be consistent with
the way that the existing operator works.

Another rather nasty problem is that the latter case works at all,
ie the parser will decide the unknown literal is "text" so that it can
apply "jsonb - text", there being no other plausible choice.  If there
were a "jsonb - jsonb" operator, the parser would prefer that one, due
to its heuristic about assuming that an unknown literal is of the same
type as the other operator input.  So adding such an operator will almost
certainly break queries that work in 9.5.  Maybe it's worth adding one
anyway, but I don't think the case for its usefulness has been proven
to the point where we should create compatibility issues to get it.
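
To spell out the compatibility hazard: today the unknown literal resolves to
text, but with a jsonb - jsonb operator in place it would resolve to jsonb
instead, and the second example above would start failing because 'b' is not
valid jsonb.  Callers would then have to disambiguate explicitly, e.g.

regression=# select '{"a":1, "b":2}'::jsonb - 'b'::text;
 ?column?
----------
 {"a": 1}
(1 row)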

regards, tom lane




Re: [HACKERS] Cannot find a working 64-bit integer type

2016-01-18 Thread Tom Lane
Robert Haas  writes:
> The relevant portion of config.log seems to be this:

> configure:13285: gcc -o conftest.exe -Wall -Wmissing-prototypes
> -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels
> -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
> -fwrapv -fexcess-precision=standard -O2
> -I/home/Admin/sources/postgresql-9.5.0/src/include/port/win32
> -DEXEC_BACKEND -Wl,--allow-multiple-definition
> -Wl,--disable-auto-import conftest.c -lz -lws2_32 -lm >&5
> conftest.c:106:5: warning: no previous prototype for 'does_int64_work'
> [-Wmissing-prototypes] int does_int64_work()
> conftest.c:120:1: warning: return type defaults to 'int' [-Wimplicit-int]
> conftest.c: In function 'main':
> conftest.c:121:3: warning: implicit declaration of function 'exit'
> [-Wimplicit-function-declaration]
> conftest.c:121:3: warning: incompatible implicit declaration of
> built-in function 'exit'
> conftest.c:121:3: note: include '<stdlib.h>' or provide a declaration of
> 'exit'
> C:/Apps/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/5.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
> reopening conftest.exe: Permission denied
> C:/Apps/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/5.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe:
> final link failed: Permission denied collect2.exe: error: ld returned
> 1 exit status
> configure:13285: $? = 1 configure: program exited with status 1

I do not think configure pays attention to mere warnings for this type of
test.  The real problem here seems to be the "permission denied" errors,
which to me reek of broken Windows antivirus software.  (As far as I'm
aware, the word "broken" is redundant in that phrase.)

> I'm a little confused as to why -Wno-cpp fixes any of that, though.

Most likely, it's pure chance that a retry worked.  Or if it's repeatable,
maybe no-cpp changes the compiler's file access patterns enough that
there's no longer a false trip of the AV check.

Short answer is that I wonder how much of the OP's multiple problems
are being caused by AV bugs.

regards, tom lane




Re: [HACKERS] Fwd: [JDBC] Re: 9.4-1207 behaves differently with server side prepared statements compared to 9.2-1102

2016-01-18 Thread Tom Lane
Bruce Momjian  writes:
> I never understood why we don't just keep the selectivity estimates of
> previous plans and just reuse the plan if the selectivity estimates are
> similar.  Isn't parameter selectivity the only thing that distinguishes
> on plan with parameter from another?

> Checking selectivity estimates must be cheaper than replanning.  This
> could be done at the second use of the prepared plan, and maybe for all
> plan reuses, rather than waiting for five and then perhaps getting this
> bad behavior.

You're imagining that a selectivity recheck could be separated out from
the rest of the planner.  That's nowhere near feasible, IMO.  Even if it
were, what would we do with it?  There's no reliable way to determine
whether X% change in one or another selectivity number would change the
selected plan, other than by redoing practically all of the planning work.

regards, tom lane




Re: [HACKERS] jsonb - jsonb operators

2016-01-18 Thread Glyn Astill


On Mon, 18/1/16, Tom Lane  wrote:

 Subject: Re: [HACKERS] jsonb - jsonb operators
 To: "Dmitry Dolgov" <9erthali...@gmail.com>
 Cc: "Glyn Astill" , "Merlin Moncure" , "pgsql-hackers@postgresql.org" 
 Date: Monday, 18 January, 2016, 16:50

 Dmitry Dolgov <9erthali...@gmail.com> writes:
 >> if there's any future intention to add a delete operator that removes
 >> element/pair matches?

 > I think the operator (jsonb - jsonb) is logical because we have a shallow
 > concatenation function (something like a "union" operation), but we have
 > nothing like "set difference" and "intersection" functions. Actually, I
 > thought to implement these functions (at least for jsonbx). But of course
 > this function should be quite simple and consider only full key/value
 > matching as a target.

 I am wary of this proposal because it seems to be taking little
 account of the fact that there *already is* a jsonb minus operator,
 two of them in fact.  For example

 regression=# select '["a","b","c"]'::jsonb - 'b';
   ?column?
 ------------
  ["a", "c"]
 (1 row)

 regression=# select '{"a":1, "b":2}'::jsonb - 'b';
  ?column?
 ----------
  {"a": 1}
 (1 row)

 The proposed full-match semantics don't seem to me to be consistent with
 the way that the existing operator works.

 Another rather nasty problem is that the latter case works at all,
 ie the parser will decide the unknown literal is "text" so that it can
 apply "jsonb - text", there being no other plausible choice.  If there
 were a "jsonb - jsonb" operator, the parser would prefer that one, due
 to its heuristic about assuming that an unknown literal is of the same
 type as the other operator input.  So adding such an operator will almost
 certainly break queries that work in 9.5.  Maybe it's worth adding one
 anyway, but I don't think the case for its usefulness has been proven
 to the point where we should create compatibility issues to get it.

             regards, tom lane

In that case perhaps there is no need for an operator, but a function would
be useful. Perhaps specifying the depth to delete on, like Dmitry's key
versions do?

I mocked up the top-level version last year; like you say it's trivial, but
I find it useful.  It's here: https://github.com/glynastill/jsonb_delete
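
For anyone who just wants the semantics without an extension, a rough
equivalent of the top-level, full key/value matching delete for objects can
be written in plain SQL (a sketch, not the code in that repository):

-- Remove from a every top-level pair that appears identically in b.
CREATE OR REPLACE FUNCTION jsonb_delete_matching(a jsonb, b jsonb)
RETURNS jsonb LANGUAGE sql IMMUTABLE AS $$
  SELECT COALESCE(jsonb_object_agg(key, value), '{}'::jsonb)
  FROM jsonb_each(a)
  WHERE NOT (b ? key AND b -> key = value);
$$;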




Re: [HACKERS] Cannot find a working 64-bit integer type

2016-01-18 Thread Igal @ Lucee.org

On 1/18/2016 11:09 AM, Tom Lane wrote:
> Robert Haas  writes:
>> The relevant portion of config.log seems to be this:
> I do not think configure pays attention to mere warnings for this type
> of test.  The real problem here seems to be the "permission denied"
> errors, which to me reek of broken Windows antivirus software.  (As far
> as I'm aware, the word "broken" is redundant in that phrase.)

Thank you both for looking into this.  The only A/V-type software that I
have running is the "Microsoft Security Essentials".



I'm a little confused as to why -Wno-cpp fixes any of that, though.

Most likely, it's pure chance that a retry worked.  Or if it's repeatable,
maybe no-cpp changes the compiler's file access patterns enough that
there's no longer a false trip of the AV check.

Short answer is that I wonder how much of the OP's multiple problems
are being caused by AV bugs.
I did not make any changes other than adding the compiler flags between 
those two runs (nor afterwards).


The reason that I decided to try adding the -Wno-error flag was that I 
searched the net for the error message and found this thread from 4 years 
ago: http://postgresql.nabble.com/Setting-Werror-in-CFLAGS-td5118384.html -- 
which showed a similar error message and some experimentation with the 
compiler flags.

I will try to run both forms again and report whether it is repeatable.

Thanks again,


Igal



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Truncating/vacuuming relations on full tablespaces

2016-01-18 Thread Tom Lane
Robert Haas  writes:
> On Fri, Jan 15, 2016 at 2:16 PM, Tom Lane  wrote:
>> Presumably the hope would be that VACUUM would truncate off some of the
>> heap, else there's not much good that's going to happen.  That leaves
>> me wondering exactly what the invariant is for the maps, and if it's
>> okay to not touch them during a heap truncation.

> No, you're missing the point, or at least I think you are.  Suppose
> somebody creates a big table and then deletes all the tuples in the
> second half, but VACUUM never runs.  When at last VACUUM does run on
> that table, it will try to build the VM and FSM forks as it scans the
> table, and will only truncate AFTER that has been done.  If building
> the VM and FSM forks fails because there is no freespace, we will
> never reach the part of the operation that could create some.

No, I follow that perfectly.  I think you missed *my* point, which is:
suppose that we do have a full-length VM and/or FSM fork for a relation,
and VACUUM decides to truncate the relation.  Is it okay to not truncate
the VM/FSM?  If it isn't, we're going to have to have very tricky
semantics for any "don't touch the map forks" option, because it will
have to suppress only some of VACUUM's map updates.

If the map invariants are such that leaving garbage in them is
unconditionally safe, then this isn't a problem; but I'm unsure of that.

> The key point is that both the VM and the FSM are optional.

No, the key point is whether it's okay if they *are* there and contain
lies, or self-inconsistent data.

An alternative approach that might avoid such worries is to have a mode
wherein VACUUM always truncates the map forks to nothing, rather than
attempting to update them.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cannot find a working 64-bit integer type

2016-01-18 Thread Igal @ Lucee.org

It looks like Tom is correct.

I added the directory tree to an exclude list of Microsoft Security 
Essentials and

ran `configure` without any flags and it completed successfully this time.

Thank you both for your time and expertise,


Igal

On 1/18/2016 11:23 AM, Igal @ Lucee.org wrote:

On 1/18/2016 11:09 AM, Tom Lane wrote:

Robert Haas  writes:

The relevant portion of config.log seems to be this:
I do not think configure pays attention to mere warnings for this 
type of test. The real problem here seems to be the "permission 
denied" errors, which to me reek of broken Windows antivirus 
software. (As far as I'm aware, the word "broken" is redundant in 
that phrase.)
Thank you both for looking into this.  The only A/V-type software that 
I have running is the "Microsoft Security Essentials".



I'm a little confused as to why -Wno-cpp fixes any of that, though.
Most likely, it's pure chance that a retry worked.  Or if it's 
repeatable,

maybe no-cpp changes the compiler's file access patterns enough that
there's no longer a false trip of the AV check.

Short answer is that I wonder how much of the OP's multiple problems
are being caused by AV bugs.
I did not make any changes other than adding the compiler flags 
between those two runs (nor afterwards).


The reason that I decided to try to add the -Wno-error flag was that I 
searched the net for the error message, and found this
thread from 4 years ago: 
http://postgresql.nabble.com/Setting-Werror-in-CFLAGS-td5118384.html 
-- which showed

a similar error message and a play of the compiler flags.

I will try to run both forms again and report whether it is repeatable.

Thanks again,


Igal





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactoring speculative insertion with unique indexes a little

2016-01-18 Thread Peter Geoghegan
On Mon, Jan 18, 2016 at 8:36 AM, Alvaro Herrera
 wrote:
> If you refuse to post an updated version of the patch until Heikki
> weighs in some more, and given that Heikki has (for the purposes of this
> patch) completely vanished, I think we should mark this rejected.

I don't refuse. I just don't want to waste anyone's time. I will
follow all of Heikki's feedback immediately, except this:

"I think it'd be better to define it as "like CHECK_UNIQUE_YES, but
return FALSE instead of throwing an error on conflict". The difference
is that the aminsert would not be allowed to return FALSE when there
is no conflict".

That's because I believe this is quite broken, as already pointed out.

> If somebody else is open to reviewing the patch, I think that'd be
> another way to move forward, but presumably they would start from a
> version with the discussed changes already fixed.  Otherwise it's a
> waste of time.

Your premise here is that what Heikki said in passing months ago is
incontrovertibly the right approach. That's ridiculous. I think Heikki
and I could work this out quite quickly, if he engaged, but for
whatever reason he appears unable to. I doubt that Heikki thinks that
about what he said, so why do you?

The point about CHECK_UNIQUE_YES I highlighted above felt like a
temporary misunderstanding to me, and not even what you might call a
real disagreement. It wasn't as if Heikki was insistent at the time. I
pointed out that what he said was broken according to an established
definition of broken (it would result in unprincipled deadlocks). He
didn't respond to that point. I think he didn't get back quickly in
part because I gave him something to think about.

If any other committer wants to engage with me on this, then I will of
course work with them. But that will not be predicated on my first
revising the patch in a way that this other committer does not
understand. That would be profoundly unfair.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactoring speculative insertion with unique indexes a little

2016-01-18 Thread Alvaro Herrera
Peter Geoghegan wrote:
> On Mon, Jan 18, 2016 at 8:36 AM, Alvaro Herrera
>  wrote:
> > If you refuse to post an updated version of the patch until Heikki
> > weighs in some more, and given that Heikki has (for the purposes of this
> > patch) completely vanished, I think we should mark this rejected.
> 
> I don't refuse. I just don't want to waste anyone's time. I will
> follow all of Heikki's feedback immediately, except this:
> 
> "I think it'd be better to define it as "like CHECK_UNIQUE_YES, but
> return FALSE instead of throwing an error on conflict". The difference
> is that the aminsert would not be allowed to return FALSE when there
> is no conflict".
> 
> That's because I believe this is quite broken, as already pointed out.

I think I like your approach better.

> > If somebody else is open to reviewing the patch, I think that'd be
> > another way to move forward, but presumably they would start from a
> > version with the discussed changes already fixed.  Otherwise it's a
> > waste of time.
> 
> Your premise here is that what Heikki said in passing months ago is
> incontrovertibly the right approach. That's ridiculous. I think Heikki
> and I could work this out quite quickly, if he engaged, but for
> whatever reason he appears unable to. I doubt that Heikki thinks that
> about what he said, so why do you?

I don't -- I just think you could have sent a patch that addressed all
the other points, left this one as initially submitted, and noted that
the new submission left it unaddressed because you disagreed.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] custom function for converting human readable sizes to bytes

2016-01-18 Thread Vitaly Burovoy
On 1/4/16, Robert Haas  wrote:
> On Mon, Jan 4, 2016 at 10:17 AM, Pavel Stehule 
> wrote:
>> [ new patch ]
>
> + case '-':
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +  errmsg("size cannot be negative")));
>
> Why not?  I bet if you copy any - sign to the buffer, this will Just Work.

Hmm. The function's name is pg_size_bytes. How can a number of bytes be
negative? How can any length be negative? If anyone puts a '-' sign in the
argument, it is a copy-paste error. I don't see any case where a negative
input value is legitimate.

I prefer an error message instead of getting all relations back from a size
comparison just because of a copy-paste mistake or incomplete checking of
input values at the app level.
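
To make the point concrete, a hypothetical session assuming the proposed
pg_size_bytes() from this patch is installed (it is not part of 9.5):

SELECT pg_size_bytes('1 MB');     -- 1048576
SELECT pg_size_bytes('2.5 GB');   -- 2684354560
SELECT pg_size_bytes('-10 kB');   -- errors with "size cannot be negative"
                                  -- as the patch stands; it would return
                                  -- -10240 if the '-' sign were simply
                                  -- copied through, as Robert suggests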

> ...
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

-- 
Best regards,
Vitaly Burovoy


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY (... tab completion

2016-01-18 Thread Andreas Karlsson

On 01/11/2016 02:01 AM, Peter Eisentraut wrote:

I think this would be a useful addition.  A couple of problems:


Thanks for the review. A new version is attached.


This change in the comment doesn't make sense to me and doesn't seem to
match the code:

-   /* If we have COPY [BINARY] , complete it with "TO" or "FROM" */
+   /* If we have COPY|BINARY , complete it with "TO" or "FROM" */


Fixed. As Tom correctly guessed this was the result of a mistake when 
rebasing.



The list of commands to allow as the "query" inside the parentheses is
documented to be: SELECT, VALUES, INSERT, UPDATE or DELETE; and actually
TABLE should also work.  Your list doesn't include all of those.  So
please adjust that.


Fixed. And TABLE works too.
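
For reference, the documented query forms inside the parentheses that the
completion now covers look like this (table and file names are just
examples):

COPY (SELECT * FROM mytab WHERE active) TO '/tmp/out.csv' WITH (FORMAT csv);
COPY (VALUES (1, 'one'), (2, 'two')) TO STDOUT;
COPY (TABLE mytab) TO STDOUT;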

Andreas
commit 3b7a808e710e613f81abd0207847a3378ec3192c
Author: Andreas Karlsson 
Date:   Sat Dec 12 17:38:19 2015 +0100

Improve COPY completion

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index ad8a580..c928ebf 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -1934,11 +1934,18 @@ psql_completion(const char *text, int start, int end)
 /* COPY */
 
 	/*
-	 * If we have COPY [BINARY] (which you'd have to type yourself), offer
-	 * list of tables (Also cover the analogous backslash command)
+	 * If we have COPY, offer list of tables or "(" (Also cover the analogous
+	 * backslash command).
 	 */
-	else if (Matches1("COPY|\\copy") || Matches2("COPY", "BINARY"))
+	else if (Matches1("COPY|\\copy"))
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables,
+   " UNION ALL SELECT '('");
+	/* If we have COPY BINARY, complete with list of tables */
+	else if (Matches2("COPY", "BINARY"))
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables, NULL);
+	/* If we have COPY (, complete it with legal commands */
+	else if (TailMatches2("COPY|\\copy", "("))
+		COMPLETE_WITH_LIST6("SELECT", "TABLE", "WITH", "INSERT", "UPDATE", "DELETE");
 	/* If we have COPY [BINARY] , complete it with "TO" or "FROM" */
 	else if (Matches2("COPY|\\copy", MatchAny) ||
 			 Matches3("COPY", "BINARY", MatchAny))

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Making plpython 2 and 3 coexist a bit better

2016-01-18 Thread Bruce Momjian
On Mon, Jan 18, 2016 at 01:10:05PM -0500, Tom Lane wrote:
> Bruce Momjian  writes:
> > On Mon, Jan 11, 2016 at 10:44:42AM -0500, Tom Lane wrote:
> >> But it gets worse: a report today in pgsql-general points out that
> >> even if you have the two languages in use *in different databases*,
> >> pg_upgrade will fail, because pg_upgrade rather overenthusiastically
> >> loads every .so mentioned anywhere in the source installation into
> >> one session.
> 
> > The fix for that would be for pg_upgrade to change
> > check_loadable_libraries() to start a new session for each LOAD command.
> > Would you like that done?
> 
> Not necessary anymore, at least not for PL/Python; and that solution
> sounds a tad expensive.

Yes, it would certainly be inefficient.
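
For anyone not following the other thread, the check in question boils down
to pg_upgrade issuing every LOAD in a single session, roughly:

LOAD '$libdir/plpython2';   -- pulls in one libpython
LOAD '$libdir/plpython3';   -- this used to fail, because both Python
                            -- runtimes ended up in the same backend even
                            -- when the two languages live in different
                            -- databases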

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Interesting read on SCM upending software and hardware architecture

2016-01-18 Thread Jim Nasby

On 1/18/16 2:47 PM, Peter Geoghegan wrote:

On Mon, Jan 18, 2016 at 12:31 PM, Robert Haas  wrote:

People keep predicting the death of spinning media, but I think
it's not happening to anywhere near as fast as that people think.
Yes, I'm writing this on a laptop with an SSD, and my personal laptop
also has an SSD, but their immediate predecessors did not, and these
are fairly expensive laptops.  And most customers I talk to are still
using spinning disks.  Meanwhile, main memory is getting so large that
even pretty significant databases can be entirely RAM-cached.  So I
tend to think that this is a lot less exciting than people who are not
me seem to think.


I tend to agree that the case for SSDs as a revolutionary technology
has been significantly overstated. This recent article makes some
interesting points:

http://www.zdnet.com/article/what-we-learned-about-ssds-in-2015/

I think it's much more true that main memory scaling (in particular,
main memory capacity) has had a huge impact, but that trend appears to
now be stalling.


My original article doesn't talk about SSDs; it's talking about 
non-volatile memory architectures (quoted extract below). Fusion IO is 
an example of this, and if NVDIMMs become available we'll see even 
faster non-volatile performance.


To me, the most interesting point the article makes is that systems now 
need much better support for multiple classes of NV storage. I agree 
with your point that spinning rust is here to stay for a long time, 
simply because it's cheap as heck. So systems need to become much better 
at moving data between different layers of NV storage so that you're 
getting the biggest bang for the buck. That will remain critical as long 
as SCMs remain 25x more expensive than rust.


Quote from article:



Flash-based storage devices are not new: SAS and SATA SSDs have been 
available for at least the past decade, and have brought flash memory 
into computers in the same form factor as spinning disks. SCMs reflect a 
maturing of these flash devices into a new, first-class I/O device: SCMs 
move flash off the slow SAS and SATA buses historically used by disks, 
and onto the significantly faster PCIe bus used by more 
performance-sensitive devices such as network interfaces and GPUs. 
Further, emerging SCMs, such as non-volatile DIMMs (NVDIMMs), interface 
with the CPU as if they were DRAM and offer even higher levels of 
performance for non-volatile storage.


Today's PCIe-based SCMs represent an astounding three-order-of-magnitude 
performance change relative to spinning disks (~100K I/O operations per 
second versus ~100). For computer scientists, it is rare that the 
performance assumptions that we make about an underlying hardware 
component change by 1,000x or more. This change is punctuated by the 
fact that the performance and capacity of non-volatile memories continue 
to outstrip CPUs in year-on-year performance improvements, closing and 
potentially even inverting the I/O gap.


The performance of SCMs means that systems must no longer "hide" them 
via caching and data reduction in order to achieve high throughput. 
Unfortunately, however, this increased performance comes at a high 
price: SCMs cost 25x as much as traditional spinning disks ($1.50/GB 
versus $0.06/GB), with enterprise-class PCIe flash devices costing 
between three and five thousand dollars each. This means that the cost 
of the non-volatile storage can easily outweigh that of the CPUs, DRAM, 
and the rest of the server system that they are installed in. The 
implication of this shift is significant: non-volatile memory is in the 
process of replacing the CPU as the economic center of the datacenter.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exposing pg_controldata and pg_config as functions

2016-01-18 Thread Joe Conway
On 01/18/2016 04:16 PM, Joe Conway wrote:
> Please see the attached. Duplication removed.

Actually please see this version instead.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 923fe58..abf9a70 100644
*** a/src/backend/catalog/system_views.sql
--- b/src/backend/catalog/system_views.sql
*** CREATE VIEW pg_timezone_abbrevs AS
*** 433,438 
--- 433,444 
  CREATE VIEW pg_timezone_names AS
  SELECT * FROM pg_timezone_names();
  
+ CREATE VIEW pg_config AS
+ SELECT * FROM pg_config();
+ 
+ REVOKE ALL on pg_config FROM PUBLIC;
+ REVOKE EXECUTE ON FUNCTION pg_config() FROM PUBLIC;
+ 
  -- Statistics views
  
  CREATE VIEW pg_stat_all_tables AS
diff --git a/src/backend/utils/misc/Makefile b/src/backend/utils/misc/Makefile
index 7889101..a0c82c1 100644
*** a/src/backend/utils/misc/Makefile
--- b/src/backend/utils/misc/Makefile
*** include $(top_builddir)/src/Makefile.glo
*** 14,21 
  
  override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS)
  
! OBJS = guc.o help_config.o pg_rusage.o ps_status.o rls.o \
!sampling.o superuser.o timeout.o tzparser.o
  
  # This location might depend on the installation directories. Therefore
  # we can't subsitute it into pg_config.h.
--- 14,21 
  
  override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS)
  
! OBJS = guc.o help_config.o pg_config.o pg_rusage.o \
!ps_status.o rls.o sampling.o superuser.o timeout.o tzparser.o
  
  # This location might depend on the installation directories. Therefore
  # we can't subsitute it into pg_config.h.
diff --git a/src/backend/utils/misc/pg_config.c b/src/backend/utils/misc/pg_config.c
index ...05ee67b .
*** a/src/backend/utils/misc/pg_config.c
--- b/src/backend/utils/misc/pg_config.c
***
*** 0 
--- 1,102 
+ /*-
+  *
+  * pg_config.c
+  *		Expose same output as pg_config except as an SRF
+  *
+  * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+  * Portions Copyright (c) 1994, Regents of the University of California
+  *
+  * IDENTIFICATION
+  *	  src/backend/utils/misc/pg_config.c
+  *
+  */
+ 
+ #include "postgres.h"
+ 
+ #include "funcapi.h"
+ #include "miscadmin.h"
+ #include "catalog/pg_type.h"
+ #include "common/config_info.h"
+ #include "utils/builtins.h"
+ #include "utils/elog.h"
+ #include "port.h"
+ 
+ Datum
+ pg_config(PG_FUNCTION_ARGS)
+ {
+ 	ReturnSetInfo	   *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ 	Tuplestorestate	   *tupstore;
+ 	HeapTuple			tuple;
+ 	TupleDesc			tupdesc;
+ 	AttInMetadata	   *attinmeta;
+ 	MemoryContext		per_query_ctx;
+ 	MemoryContext		oldcontext;
+ 	configdata		   *ConfigData;
+ 	size_t				configdata_len;
+ 	char			   *values[2];
+ 	int	i = 0;
+ 
+ 	/* check to see if caller supports us returning a tuplestore */
+ 	if (!rsinfo || !(rsinfo->allowedModes & SFRM_Materialize))
+ 		ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+  errmsg("materialize mode required, but it is not "
+ 		"allowed in this context")));
+ 
+ 	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+ 	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+ 
+ 	/* get the requested return tuple description */
+ 	tupdesc = CreateTupleDescCopy(rsinfo->expectedDesc);
+ 
+ 	/*
+ 	 * Check to make sure we have a reasonable tuple descriptor
+ 	 */
+ 	if (tupdesc->natts != 2 ||
+ 		tupdesc->attrs[0]->atttypid != TEXTOID ||
+ 		tupdesc->attrs[1]->atttypid != TEXTOID)
+ 		ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+  errmsg("query-specified return tuple and "
+ 		"function return type are not compatible")));
+ 
+ 	/* OK to use it */
+ 	attinmeta = TupleDescGetAttInMetadata(tupdesc);
+ 
+ 	/* let the caller know we're sending back a tuplestore */
+ 	rsinfo->returnMode = SFRM_Materialize;
+ 
+ 	/* initialize our tuplestore */
+ 	tupstore = tuplestore_begin_heap(true, false, work_mem);
+ 
+ 	ConfigData = get_configdata(my_exec_path, &configdata_len);
+ 	for (i = 0; i < configdata_len; i++)
+ 	{
+ 		values[0] = ConfigData[i].name;
+ 		values[1] = ConfigData[i].setting;
+ 
+ 		tuple = BuildTupleFromCStrings(attinmeta, values);
+ 		tuplestore_puttuple(tupstore, tuple);
+ 	}
+ 	
+ 	/*
+ 	 * no longer need the tuple descriptor reference created by
+ 	 * TupleDescGetAttInMetadata()
+ 	 */
+ 	ReleaseTupleDesc(tupdesc);
+ 
+ 	tuplestore_donestoring(tupstore);
+ 	rsinfo->setResult = tupstore;
+ 
+ 	/*
+ 	 * SFRM_Materialize mode expects us to return a NULL Datum. The actual
+ 	 * tuples are in our tuplestore and passed back through
+ 	 * rsinfo->setResult. rsinfo->setDesc is set to the tuple description
+ 	 * that we actually used to build our tuples with, so the caller can
+ 	 * verify we did what it was expecting.
+ 	 */
+ 	rsinfo->setDesc = tupdesc;
+ 	

Re: [HACKERS] exposing pg_controldata and pg_config as functions

2016-01-18 Thread Michael Paquier
On Tue, Jan 19, 2016 at 6:55 AM, Robert Haas  wrote:
> On Mon, Jan 18, 2016 at 4:43 AM, Andres Freund  wrote:
>> On 2016-01-18 10:18:34 +0900, Michael Paquier wrote:
>>> We are trying to hide away from non-superusers WAL-related information
>>> in system views and system function, that's my point to do the same
>>> here.
>>
>> We are? pg_current_xlog_insert_location(), pg_current_xlog_location(),
>> pg_is_xlog_replay_paused(), pg_stat_bgwriter ... are all non-superuser?
>
> Yeah.  There's certainly no need for the WAL positions reported by
> pg_controldata to be any more restricted than other functions that
> give away information about WAL position.  We had some discussion
> about restricting WAL position information in general due to possible
> information leakage, and if we do that, then perhaps this should be
> similarly restricted.  Presumably vulnerabilities here would be harder
> to exploit because the values change much less frequently, so if you
> are trying to learn something the rate at which you can glean
> information will be much lower.  But maybe we should put the same
> restrictions on all of it.

Well, we can still use REVOKE on those functions, so it is not as if a
user cannot restrict access to this information. The current situation
makes it hard for both us and the user to figure out whether an instance
is considered secure or not, so things are unbalanced. Perhaps the best
answer is to add a documentation section telling people how to harden
their database after initdb'ing it, with different subsections aimed at
hardening different things, one of them being the WAL information, and
mentioning in those docs which hardening action covers what. Stephen
mentioned that some time ago; it would still be good.
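
As a sketch of what such a hardening section could suggest, using the
existing WAL-position functions mentioned above (the exact list would be
settled in the docs patch):

REVOKE EXECUTE ON FUNCTION pg_current_xlog_location() FROM PUBLIC;
REVOKE EXECUTE ON FUNCTION pg_current_xlog_insert_location() FROM PUBLIC;
REVOKE EXECUTE ON FUNCTION pg_is_xlog_replay_paused() FROM PUBLIC;
-- and, with Joe's patch, the same idea applied to its new objects:
-- REVOKE ALL ON pg_config FROM PUBLIC;
-- REVOKE EXECUTE ON FUNCTION pg_config() FROM PUBLIC;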

>>> For the data of pg_control, it seems to me that this can give
>>> away to any authorized users hints regarding the way Postgres is
>>> built, perhaps letting people know for example which Linux
>>> distribution is used and which flavor of Postgres is used (we already
>>> give away some information with version() but that's different than
>>> the libraries this is linking to), so an attacker may be able to take
>>> advantage of that to do attacks on potentially outdated packages? And
>>> I would think that many users are actually going to revoke the access
>>> of those functions to public if we are going to make them
>>> world-visible. It is easier as well to restrict things first, and then
>>> relax if necessary, than the opposite as well.
>>
>> Meh, that seems pretty far into pseudo security arguments.
>
> Yeah, I really don't see anything in the pg_controldata output that
> looks sensitive.  The WAL locations are the closest of anything,
> AFAICS.

The system identifier perhaps? I honestly can't think off the top of my
head of a way to exploit this information, but it seems sensitive to leak
at SQL level: that's a unique identifier of a Postgres instance used when
setting up a cluster, after all.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY (... tab completion

2016-01-18 Thread Andreas Karlsson

On 01/19/2016 01:57 AM, Andreas Karlsson wrote:

Thanks for the review. A new version is attached.


Whops, attached the wrong file.

Andreas
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index ad8a580..bc80ed0 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -1934,11 +1934,18 @@ psql_completion(const char *text, int start, int end)
 /* COPY */
 
 	/*
-	 * If we have COPY [BINARY] (which you'd have to type yourself), offer
-	 * list of tables (Also cover the analogous backslash command)
+	 * If we have COPY, offer list of tables or "(" (Also cover the analogous
+	 * backslash command).
 	 */
-	else if (Matches1("COPY|\\copy") || Matches2("COPY", "BINARY"))
+	else if (Matches1("COPY|\\copy"))
+		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables,
+   " UNION ALL SELECT '('");
+	/* If we have COPY BINARY, complete with list of tables */
+	else if (Matches2("COPY", "BINARY"))
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables, NULL);
+	/* If we have COPY (, complete it with legal commands */
+	else if (TailMatches2("COPY|\\copy", "("))
+		COMPLETE_WITH_LIST7("SELECT", "TABLE", "VALUES", "INSERT", "UPDATE", "DELETE", "WITH");
 	/* If we have COPY [BINARY] , complete it with "TO" or "FROM" */
 	else if (Matches2("COPY|\\copy", MatchAny) ||
 			 Matches3("COPY", "BINARY", MatchAny))

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improved tab completion for FDW DDL

2016-01-18 Thread Andreas Karlsson

On 01/11/2016 02:39 AM, Peter Eisentraut wrote:

The second part is not necessary, because there is already code that
completes FDW names after "FOREIGN DATA WRAPPER".  So this already works.


Good spot, thanks. I have no idea why I did not think it worked. Maybe 
it just did not work in 9.2 and I failed when trying to reproduce it on 
master.



- Also complete RENAME TO in ALTER FOREIGN DATA WRAPPER.


Done.


- Also complete OPTIONS in FOREIGN DATA WRAPPER and SERVER commands.


Done.
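
For reference, the statements these completions target (object names and
options below are purely illustrative):

ALTER FOREIGN DATA WRAPPER my_fdw RENAME TO my_other_fdw;
ALTER SERVER my_server VERSION '2.0' OPTIONS (ADD host 'db.example.org');
CREATE FOREIGN DATA WRAPPER my_fdw HANDLER my_handler OPTIONS (debug 'true');
CREATE SERVER my_server FOREIGN DATA WRAPPER postgres_fdw OPTIONS (dbname 'otherdb');
CREATE USER MAPPING FOR current_user SERVER my_server OPTIONS (user 'remote_user');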

Andreas
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index c3c77bd..3d8cdf4 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -1417,7 +1417,7 @@ psql_completion(const char *text, int start, int end)
 
 	/* ALTER FOREIGN DATA WRAPPER  */
 	else if (Matches5("ALTER", "FOREIGN", "DATA", "WRAPPER", MatchAny))
-		COMPLETE_WITH_LIST4("HANDLER", "VALIDATOR", "OPTIONS", "OWNER TO");
+		COMPLETE_WITH_LIST5("HANDLER", "VALIDATOR", "OPTIONS", "OWNER TO", "RENAME TO");
 
 	/* ALTER FOREIGN TABLE  */
 	else if (Matches4("ALTER", "FOREIGN", "TABLE", MatchAny))
@@ -1544,7 +1544,10 @@ psql_completion(const char *text, int start, int end)
 		COMPLETE_WITH_LIST3("MINVALUE", "MAXVALUE", "CYCLE");
 	/* ALTER SERVER  */
 	else if (Matches3("ALTER", "SERVER", MatchAny))
-		COMPLETE_WITH_LIST3("VERSION", "OPTIONS", "OWNER TO");
+		COMPLETE_WITH_LIST4("VERSION", "OPTIONS", "OWNER TO", "RENAME TO");
+	/* ALTER SERVER  VERSION */
+	else if (Matches5("ALTER", "SERVER", MatchAny, "VERSION", MatchAny))
+		COMPLETE_WITH_CONST("OPTIONS");
 	/* ALTER SYSTEM SET, RESET, RESET ALL */
 	else if (Matches2("ALTER", "SYSTEM"))
 		COMPLETE_WITH_LIST2("SET", "RESET");
@@ -1993,7 +1996,7 @@ psql_completion(const char *text, int start, int end)
 
 	/* CREATE FOREIGN DATA WRAPPER */
 	else if (Matches5("CREATE", "FOREIGN", "DATA", "WRAPPER", MatchAny))
-		COMPLETE_WITH_LIST2("HANDLER", "VALIDATOR");
+		COMPLETE_WITH_LIST3("HANDLER", "VALIDATOR", "OPTIONS");
 
 	/* CREATE INDEX --- is allowed inside CREATE SCHEMA, so use TailMatches */
 	/* First off we complete CREATE UNIQUE with "INDEX" */
@@ -2372,6 +2375,10 @@ psql_completion(const char *text, int start, int end)
 	else if (TailMatches3("FOREIGN", "DATA", "WRAPPER") &&
 			 !TailMatches4("CREATE", MatchAny, MatchAny, MatchAny))
 		COMPLETE_WITH_QUERY(Query_for_list_of_fdws);
+	/* applies in CREATE SERVER */
+	else if (TailMatches4("FOREIGN", "DATA", "WRAPPER", MatchAny) &&
+			 HeadMatches2("CREATE", "SERVER"))
+		COMPLETE_WITH_CONST("OPTIONS");
 
 /* FOREIGN TABLE */
 	else if (TailMatches2("FOREIGN", "TABLE") &&
@@ -2816,6 +2823,8 @@ psql_completion(const char *text, int start, int end)
 		COMPLETE_WITH_QUERY(Query_for_list_of_user_mappings);
 	else if (Matches5("CREATE|ALTER|DROP", "USER", "MAPPING", "FOR", MatchAny))
 		COMPLETE_WITH_CONST("SERVER");
+	else if (Matches7("CREATE|ALTER", "USER", "MAPPING", "FOR", MatchAny, "SERVER", MatchAny))
+		COMPLETE_WITH_CONST("OPTIONS");
 
 /*
  * VACUUM [ FULL | FREEZE ] [ VERBOSE ] [ table ]

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exposing pg_controldata and pg_config as functions

2016-01-18 Thread Joe Conway
On 01/17/2016 02:29 PM, Joe Conway wrote:
>> Documentation would be good to have.
> 
> I'm definitely happy to write the docs, but earlier it was not clear
> that there was enough support for this patch at all, and I don't want to
> waste cycles writing docs for a feature that ultimately does not get
> committed. What's the current feel for whether this feature in general
> is a good idea or bad?


Thoughts anyone?

>> ! # don't include subdirectory-path-dependent -I and -L switches
>> ! STD_CPPFLAGS := $(filter-out -I$(top_srcdir)/src/include
>> -I$(top_builddir)/src/include,$(CPPFLAGS))
>> ! STD_LDFLAGS := $(filter-out -L$(top_builddir)/src/port,$(LDFLAGS))
>> ! override CPPFLAGS += -DVAL_CONFIGURE="\"$(configure_args)\""
>> ! override CPPFLAGS += -DVAL_CC="\"$(CC)\""
>> ! override CPPFLAGS += -DVAL_CPPFLAGS="\"$(STD_CPPFLAGS)\""
>> ! override CPPFLAGS += -DVAL_CFLAGS="\"$(CFLAGS)\""
>> ! override CPPFLAGS += -DVAL_CFLAGS_SL="\"$(CFLAGS_SL)\""
>> ! override CPPFLAGS += -DVAL_LDFLAGS="\"$(STD_LDFLAGS)\""
>> ! override CPPFLAGS += -DVAL_LDFLAGS_EX="\"$(LDFLAGS_EX)\""
>> ! override CPPFLAGS += -DVAL_LDFLAGS_SL="\"$(LDFLAGS_SL)\""
>> ! override CPPFLAGS += -DVAL_LIBS="\"$(LIBS)\""
>> This duplication from src/bin/pg_config is a bad idea. Couldn't we do
>> something in src/common instead that sets up values at compilation
>> time in a routine (perhaps set of routines) available for both the
>> frontend and backend?
> 
> Will take a look at it.

Please see the attached. Duplication removed.

Thanks,

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 923fe58..abf9a70 100644
*** a/src/backend/catalog/system_views.sql
--- b/src/backend/catalog/system_views.sql
*** CREATE VIEW pg_timezone_abbrevs AS
*** 433,438 
--- 433,444 
  CREATE VIEW pg_timezone_names AS
  SELECT * FROM pg_timezone_names();
  
+ CREATE VIEW pg_config AS
+ SELECT * FROM pg_config();
+ 
+ REVOKE ALL on pg_config FROM PUBLIC;
+ REVOKE EXECUTE ON FUNCTION pg_config() FROM PUBLIC;
+ 
  -- Statistics views
  
  CREATE VIEW pg_stat_all_tables AS
diff --git a/src/backend/utils/misc/Makefile b/src/backend/utils/misc/Makefile
index 7889101..a0c82c1 100644
*** a/src/backend/utils/misc/Makefile
--- b/src/backend/utils/misc/Makefile
*** include $(top_builddir)/src/Makefile.glo
*** 14,21 
  
  override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS)
  
! OBJS = guc.o help_config.o pg_rusage.o ps_status.o rls.o \
!sampling.o superuser.o timeout.o tzparser.o
  
  # This location might depend on the installation directories. Therefore
  # we can't subsitute it into pg_config.h.
--- 14,21 
  
  override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS)
  
! OBJS = guc.o help_config.o pg_config.o pg_rusage.o \
!ps_status.o rls.o sampling.o superuser.o timeout.o tzparser.o
  
  # This location might depend on the installation directories. Therefore
  # we can't subsitute it into pg_config.h.
diff --git a/src/backend/utils/misc/pg_config.c b/src/backend/utils/misc/pg_config.c
index ...05ee67b .
*** a/src/backend/utils/misc/pg_config.c
--- b/src/backend/utils/misc/pg_config.c
***
*** 0 
--- 1,102 
+ /*-
+  *
+  * pg_config.c
+  *		Expose same output as pg_config except as an SRF
+  *
+  * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
+  * Portions Copyright (c) 1994, Regents of the University of California
+  *
+  * IDENTIFICATION
+  *	  src/backend/utils/misc/pg_config.c
+  *
+  */
+ 
+ #include "postgres.h"
+ 
+ #include "funcapi.h"
+ #include "miscadmin.h"
+ #include "catalog/pg_type.h"
+ #include "common/config_info.h"
+ #include "utils/builtins.h"
+ #include "utils/elog.h"
+ #include "port.h"
+ 
+ Datum
+ pg_config(PG_FUNCTION_ARGS)
+ {
+ 	ReturnSetInfo	   *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ 	Tuplestorestate	   *tupstore;
+ 	HeapTuple			tuple;
+ 	TupleDesc			tupdesc;
+ 	AttInMetadata	   *attinmeta;
+ 	MemoryContext		per_query_ctx;
+ 	MemoryContext		oldcontext;
+ 	configdata		   *ConfigData;
+ 	size_t				configdata_len;
+ 	char			   *values[2];
+ 	int	i = 0;
+ 
+ 	/* check to see if caller supports us returning a tuplestore */
+ 	if (!rsinfo || !(rsinfo->allowedModes & SFRM_Materialize))
+ 		ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+  errmsg("materialize mode required, but it is not "
+ 		"allowed in this context")));
+ 
+ 	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+ 	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+ 
+ 	/* get the requested return tuple description */
+ 	tupdesc = CreateTupleDescCopy(rsinfo->expectedDesc);
+ 
+ 	/*
+ 	 * Check to make sure we have a reasonable tuple descriptor
+ 	 */
+ 	if (tupdesc->natts != 2 ||
+ 		tupdesc->attrs[0]->atttypid != 

Re: [HACKERS] source files without copyright notices

2016-01-18 Thread Michael Paquier
On Tue, Jan 19, 2016 at 5:55 AM, Joe Conway  wrote:
> I never noticed before, but today I came across a header file without
> any copyright notice at all. Turns out there are quite a few:
>
>   grep -riL Copyright src/* --include=*.c --include=*.h
>
> Shouldn't at least some of these get a copyright?

+1 for adding them where needed to be consistent; for example, the files
in snowball don't need them. It would be nice as well to fix the header
format of what is in ecpg... That's boring, repetitive work, and I don't
mind helping out.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exposing pg_controldata and pg_config as functions

2016-01-18 Thread Bruce Momjian
On Mon, Jan 18, 2016 at 01:54:02PM -0800, Joe Conway wrote:
> On 01/18/2016 01:47 PM, Bruce Momjian wrote:
> > On Sun, Jan 17, 2016 at 02:24:46PM -0800, Joe Conway wrote:
> >> On 01/16/2016 06:02 AM, Michael Paquier wrote:
> >>> On Wed, Dec 30, 2015 at 9:08 AM, Joe Conway  wrote:
>  1) Change NextXID output format from "%u/%u" to "%u:%u"
> (see recent hackers thread)
> >>>
> >>> ! printf(_("Latest checkpoint's NextXID:  %u/%u\n"),
> >>>  ControlFile.checkPointCopy.nextXidEpoch,
> >>>  ControlFile.checkPointCopy.nextXid);
> >>>   printf(_("Latest checkpoint's NextOID:  %u\n"),
> >>> --- 646,652 
> >>>  ControlFile.checkPointCopy.ThisTimeLineID);
> >>>   printf(_("Latest checkpoint's full_page_writes: %s\n"),
> >>>  ControlFile.checkPointCopy.fullPageWrites ? _("on") : 
> >>> _("off"));
> >>> ! printf(_("Latest checkpoint's NextXID:  %u:%u\n"),
> >>> This should be definitely a separate patch.
> >>
> >> Ok. Notwithstanding Simon's reply, there seems to be consensus that this
> >> is the way to go. Will commit it this way unless some additional
> >> objections surface in the next day or so.
> > 
> > FYI, this slash-colon change will break pg_upgrade unless it is patched.
> > Dp you want a patch from me?
> 
> Didn't realize that -- yes please.

Sure, attached, and it would be applied only to head, where you change
pg_controldata.  pg_upgrade has to read both the old and the new cluster's
pg_controldata.  We could get more sophisticated by checking the catalog
version number where the format was changed, but that doesn't seem worth
it, and it is overly complex because we get the catalog version number from
pg_controldata, so you would be adding a dependency on the ordering of the
pg_controldata entries.

I can test all supported Postgres versions with pg_upgrade once you apply
the patch, but I think it will be fine.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
new file mode 100644
index 1f7b65e..aaaea7b
*** a/src/bin/pg_upgrade/controldata.c
--- b/src/bin/pg_upgrade/controldata.c
*** get_control_data(ClusterInfo *cluster, b
*** 198,203 
--- 198,206 
  			cluster->controldata.chkpnt_nxtepoch = str2uint(p);
  
  			p = strchr(p, '/');
+ 			/* delimiter changed from '/' to ':' in 9.6 */
+ 			if (p == NULL && GET_MAJOR_VERSION(cluster->major_version) >= 906)
+ 				p = strchr(p, ':');
  			if (p == NULL || strlen(p) <= 1)
  				pg_fatal("%d: controldata retrieval problem\n", __LINE__);
  

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-01-18 Thread Amit Kapila
On Mon, Jan 18, 2016 at 11:06 PM, Robert Haas  wrote:
>
> On Mon, Jan 18, 2016 at 11:09 AM, Alvaro Herrera
>  wrote:
> > Amit Kapila wrote:
> >
> >> The reason for not updating the patch related to this thread is that
it is
> >> dependent on the work for refactoring the tranches for LWLocks [1]
> >> which is now coming towards an end, so I think it is quite reasonable
> >> that the patch can be updated for this work during commit fest, so
> >> I am moving it to upcoming CF.
> >
> > Thanks.  I think the tranche reworks are mostly done now, so is anyone
> > submitting an updated version of this patch?
> >

Before updating the patch, it is better to clarify a few points, as mentioned
below.

>
> > Also, it would be very good if someone can provide insight on how this
> > patch interacts with the other submitted patch for "waiting for
> > replication" https://commitfest.postgresql.org/8/436/
> > Andres seems to think that the other patch is completely independent of
> > this one, i.e. the "waiting for replication" column needs to exist
> > separately and not as part of the "more descriptive" new 'waiting'
> > column.
>
> Yeah, I really don't agree with that.  I think that it's much better
> to have one column that says what you are waiting for than a bunch of
> separate columns that tell you whether you are waiting for individual
> things for which you might be waiting.  I think this patch, which
> introduces the general mechanism, should win: and the other patch
> should then be one client of that mechanism.
>

I agree with what you have said, but I think the bigger question here is the
UI and which place is more appropriate for storing the wait information. I
will try to summarize the options discussed.

Initially, we started with extending the 'waiting' column in
pg_stat_activity, to which some people raised concerns about backward
compatibility. Another option that came up during the discussion was to
retain 'waiting' as it is and have an additional column 'wait_event' in
pg_stat_activity. After that, there was feedback that we should try to
include wait information about background processes as well, which raises a
bigger question: is it any good to expose this information via
pg_stat_activity (which doesn't display information about background
processes), or is it better to have a new view, as discussed here [1]?
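
Purely as an illustration, the two UIs might look like this from SQL
(neither the column name nor the view name below is settled):

-- Option A: an additional column in pg_stat_activity
SELECT pid, state, waiting, wait_event FROM pg_stat_activity WHERE waiting;

-- Option B: a separate view that could also show background processes
SELECT pid, wait_event FROM pg_stat_wait_events;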

The second important and somewhat related point is whether we should save
this information in PGPROC as 4 bytes or keep it in pgBackendStatus. I
think it is better to store it in PGPROC if we want to record wait
information for background processes as well.

I am of the opinion that we should save this information in PGPROC and
expose it via a new view, but I am open to going other ways based on what
others think about this matter.

[1] -
http://www.postgresql.org/message-id/CAA4eK1+=5Ex8-5NRr3u94=_t2p65v0kcjZ5rXddVmkx=lwa...@mail.gmail.com

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Combining Aggregates

2016-01-18 Thread David Rowley
On 19 January 2016 at 06:03, Pavel Stehule  wrote:

>
>
> >
>> > # explain analyze select a%100,length(string_agg(b,',')) from ab
>> group
>> > by 1;
>> > QUERY PLAN
>> >
>> ---
>> >  GroupAggregate  (cost=119510.84..144510.84 rows=100 width=32)
>> (actual
>> > time=538.938..1015.278 rows=100 loops=1)
>> >Group Key: ((a % 100))
>> >->  Sort  (cost=119510.84..122010.84 rows=100 width=32) (actual
>> > time=538.917..594.194 rows=100 loops=1)
>> >  Sort Key: ((a % 100))
>> >  Sort Method: quicksort  Memory: 102702kB
>> >  ->  Seq Scan on ab  (cost=0.00..19853.00 rows=100 width=32)
>> > (actual time=0.016..138.964 rows=100 loops=1)
>> >  Planning time: 0.146 ms
>> >  Execution time: 1047.511 ms
>> >
>> >
>> > Patched
>> > # explain analyze select a%100,length(string_agg(b,',')) from ab
>> group
>> > by 1;
>> >QUERY PLAN
>> >
>> 
>> >  HashAggregate  (cost=24853.00..39853.00 rows=100 width=32) (actual
>> > time=8072.346..144424.872 rows=100 loops=1)
>> >Group Key: (a % 100)
>> >->  Seq Scan on ab  (cost=0.00..19853.00 rows=100 width=32)
>> (actual
>> > time=0.025..481.332 rows=100 loops=1)
>> >  Planning time: 0.164 ms
>> >  Execution time: 263288.332 ms
>>
>> Well, that's pretty odd.  I guess the plan change must be a result of
>> switching the transition type from internal to text, although I'm not
>> immediately certain why that would make a difference.
>>
>
> It is strange, why hashaggregate is too slow?
>

Good question. I looked at this and found my VM was swapping like crazy.
Upon investigation it appears that's because the patch creates a memory
context per aggregated group, and in this case I've got 1 million of them,
so we create 1 million contexts, each ALLOCSET_SMALL_INITSIZE (1KB) in size.
That means about 1GB of memory, which is more than my VM likes.

Setting work_mem = '130MB' does coax the planner into a GroupAggregate plan,
which is faster, but the hash agg executor code gives no regard to work_mem:
if I set work_mem to 140MB (which is more realistic for this VM), the same
swapping problems occur.  Probably setting aggtransspace for this aggregate
to 1024 would help the costing problem, but it would also make hashagg a
less frequently chosen option during planning.
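
For anyone wanting to reproduce this, the test is roughly the following
(the table definition is inferred from the quoted plans, so treat it as an
assumption; timings will of course differ):

CREATE TABLE ab AS
  SELECT g AS a, md5(g::text) AS b
  FROM generate_series(1, 1000000) g;
ANALYZE ab;

SET work_mem = '130MB';  -- enough to coax the planner into GroupAggregate
EXPLAIN ANALYZE
SELECT a % 1000000, length(string_agg(b, ','))
FROM ab
GROUP BY 1;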

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Combining Aggregates

2016-01-18 Thread Robert Haas
On Mon, Jan 18, 2016 at 12:34 PM, Robert Haas  wrote:
> On Mon, Jan 18, 2016 at 12:09 PM, Tom Lane  wrote:
>> Robert Haas  writes:
>>> Yeah.  The API contract for an expanded object took me quite a while
>>> to puzzle out, but it seems to be this: if somebody hands you an R/W
>>> pointer to an expanded object, you're entitled to assume that you can
>>> "take over" that object and mutate it however you like.  But the
>>> object might be in some other memory context, so you have to move it
>>> into your own memory context.
>>
>> Only if you intend to keep it --- for example, a function that is mutating
>> and returning an object isn't required to move it somewhere else, if the
>> input is R/W, and I think it generally shouldn't.
>>
>> In the context here, I'd think it is the responsibility of nodeAgg.c
>> not individual datatype functions to make sure that expanded objects
>> live where it wants them to.
>
> That's how I did it in my prototype, but the problem with that is that
> spinning up a memory context for every group sucks when there are many
> groups with only a small number of elements each - hence the 50%
> regression that David Rowley observed.  If we're going to use expanded
> objects here, which seems like a good idea in principle, that's going
> to have to be fixed somehow.  We're flogging the heck out of malloc by
> repeatedly creating a context, doing one or two allocations in it, and
> then destroying the context.
>
> I think that, in general, the memory context machinery wasn't really
> designed to manage lots of small contexts.  The overhead of spinning
> up a new context for just a few allocations is substantial.  That
> matters in some other situations too, I think - I've commonly seen
> AllocSetContextCreate taking several percent  of runtime in profiles.
> But here it's much exacerbated.  I'm not sure whether it's better to
> attack that problem at the root and try to make AllocSetContextCreate
> cheaper, or whether we should try to figure out some change to the
> expanded-object machinery to address the issue.

Here is a patch that helps a good deal.  I changed things so that when
we create a context, we always allocate at least 1kB.  If that's more
than we need for the node itself and the name, then we use the rest of
the space as a sort of keeper block.  I think there's more that can be
done to improve this, but I'm wimping out for now because it's late
here.

I suspect something like this is a good idea even if we don't end up
using it for aggregate transition functions.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index d26991e..3730a21 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -164,6 +164,14 @@ typedef void *AllocPointer;
 /*
  * AllocSetContext is our standard implementation of MemoryContext.
  *
+ * Note: An AllocSetContext can include one or more "keeper" blocks which
+ * are not freed when the context is reset.  "keeper" should point to the
+ * first of these blocks.  Sometimes, there may be a block which shouldn't
+ * be freed even when the context is deleted (because that space isn't a
+ * separate allocation); if so, "stopper" can point to this block.  "keeper"
+ * must be set whenever "stopper" is set, and must point to the same block
+ * as "stopper" or an earlier one.
+ *
  * Note: header.isReset means there is nothing for AllocSetReset to do.
  * This is different from the aset being physically empty (empty blocks list)
  * because we may still have a keeper block.  It's also different from the set
@@ -181,7 +189,8 @@ typedef struct AllocSetContext
 	Size		maxBlockSize;	/* maximum block size */
 	Size		nextBlockSize;	/* next block size to allocate */
 	Size		allocChunkLimit;	/* effective chunk size limit */
-	AllocBlock	keeper;			/* if not NULL, keep this block over resets */
+	AllocBlock	keeper;		/* on reset, stop freeing when we reach this */
+	AllocBlock	stopper;	/* on delete, stop freeing when we reach this */
 } AllocSetContext;
 
 typedef AllocSetContext *AllocSet;
@@ -248,7 +257,6 @@ typedef struct AllocChunkData
 static void *AllocSetAlloc(MemoryContext context, Size size);
 static void AllocSetFree(MemoryContext context, void *pointer);
 static void *AllocSetRealloc(MemoryContext context, void *pointer, Size size);
-static void AllocSetInit(MemoryContext context);
 static void AllocSetReset(MemoryContext context);
 static void AllocSetDelete(MemoryContext context);
 static Size AllocSetGetChunkSpace(MemoryContext context, void *pointer);
@@ -267,7 +275,6 @@ static MemoryContextMethods AllocSetMethods = {
 	AllocSetAlloc,
 	AllocSetFree,
 	AllocSetRealloc,
-	AllocSetInit,
 	AllocSetReset,
 	AllocSetDelete,
 	AllocSetGetChunkSpace,
@@ -440,13 +447,45 @@ AllocSetContextCreate(MemoryContext parent,
 	  Size maxBlockSize)
 {
 	AllocSet	

Re: [HACKERS] custom function for converting human readable sizes to bytes

2016-01-18 Thread Vitaly Burovoy
On 1/4/16, Pavel Stehule  wrote:
> 2016-01-04 18:13 GMT+01:00 Shulgin, Oleksandr  :
>> On Mon, Jan 4, 2016 at 6:03 PM, Pavel Stehule  
>> wrote:
>> > 2016-01-04 17:48 GMT+01:00 Shulgin, Oleksandr 
>> > :
>> >> On Mon, Jan 4, 2016 at 4:51 PM, Robert Haas  wrote:
>> >>
>> >> I'm also inclined on dropping that explicit check for empty string
>> >> below and let numeric_in() error out on that.  Does this look OK, or can
>> >> it confuse someone:
>> >> postgres=# select pg_size_bytes('');
>> >> ERROR:  invalid input syntax for type numeric: ""
>> >
>> > both fixed
>>
>> Hm...
>>
>> > + switch (*strptr)
>> > + {
>> > + /* ignore plus symbol */
>> > + case '+':
>> > + case '-':
>> > + *bufptr++ = *strptr++;
>> > + break;
>> > + }
>>
>> Well, to that amount you don't need any special checks, I'm just not sure
>> if reported error message is not misleading if we let numeric_in() handle
>> all the errors.  At least it can cope with the leading spaces, +/- and
>> empty input quite well.
>>
>
> I don't would to catch a exception from numeric_in - so I try to solve some
> simple situations, where I can raise possible better error message.

There are several cases where your behavior gives strange errors (see below).

Next batch of notes:

src/include/catalog/pg_proc.h:
---
+ DATA(insert OID = 3317 ( pg_size_bytes...
now oid 3317 is used (by pg_stat_get_wal_receiver), 3318 is free

---
+ DESCR("convert a human readable text with size units to big int bytes");
Maybe the best way is to copy the first sentence from the docs?
("convert a size in human-readable format with size units into bytes")


src/backend/utils/adt/dbsize.c:
+ text *arg = PG_GETARG_TEXT_PP(0);
+ char *str = text_to_cstring(arg);
...
+   /* working buffer cannot be longer than original string */
+   buffer = (char *) palloc(VARSIZE_ANY_EXHDR(arg) + 1);
Is there any reason to get TEXT for only converting it to cstring and
call VARSIZE_ANY_EXHDR instead of strlen?

---
+   text   *arg = PG_GETARG_TEXT_PP(0);
+   char   *str = text_to_cstring(arg);
+   char*strptr = str;
+   char   *buffer;
The variable names are misaligned after their types (throughout the
body of the "pg_size_bytes" function).
See variable declarations in nearby functions (
  >> "make the new code look like the existing code around it"
  http://www.postgresql.org/docs/devel/static/source-format.html
)

---
+errmsg("\"%s\" is not number", str)));
s/is not number/is not a number/
(the second version can be found in a couple places besides translations)

---
+   if (*strptr != '\0')
...
+   while (*strptr && !isspace(*strptr))
Sometimes the code compares to '\0' explicitly, sometimes implicitly.
The common usage is the explicit comparison, and it is preferred because
different compilers handle the implicit conversion to boolean differently.

---
+   /* Skip leading spaces */
...
+   /* ignore plus symbol */
...
+   /* copy digits to working buffer */
...
+   /* allow whitespace between integer and unit */
I'm also inclined to drop the explicit skipping of spaces and checking
for +/- symbols, and instead copy all digits, spaces, dots and '+'/'-'
symbols and let numeric_in() error out on them.

It allows to get correct error messages for something like:
postgres=# select pg_size_bytes('.+912');
ERROR:  invalid input syntax for type numeric: ".+912"
postgres=# select pg_size_bytes('+912+ kB');
ERROR:  invalid input syntax for type numeric: "+912+ "
postgres=# select pg_size_bytes('++123 kB');
ERROR:  invalid input syntax for type numeric: "++123 "

instead of current:
postgres=# select pg_size_bytes('.+912');
ERROR:  invalid input syntax for type numeric: "."
postgres=# select pg_size_bytes('+912+ kB');
ERROR:  invalid unit: "+ kB"
postgres=# select pg_size_bytes('++123 kB');
ERROR:  invalid input syntax for type numeric: "+"

---
+   while (isspace((unsigned char) *strptr))
...
+   while (isspace(*strptr))
...
+   while (*strptr && !isspace(*strptr))
...
+   while (isspace(*strptr))
The first occurrence casts isspace()'s parameter to "unsigned char"
whereas the others do not.
Note:
"The behavior is undefined if the value of ch is not representable as
unsigned char and is not equal to EOF"
Proof:
http://en.cppreference.com/w/c/string/byte/isspace

---
+   pfree(buffer);
+   pfree(str);
pfree-s here are not necessary. See:
http://www.neilconway.org/talks/hacking/hack_slides.pdf (page 17)

-- 
Best regards,
Vitaly Burovoy


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exposing pg_controldata and pg_config as functions

2016-01-18 Thread Michael Paquier
On Tue, Jan 19, 2016 at 1:49 PM, Michael Paquier
 wrote:
> On Tue, Jan 19, 2016 at 11:08 AM, Joe Conway  wrote:
>> On 01/18/2016 04:16 PM, Joe Conway wrote:
>>> Please see the attached. Duplication removed.
>>
>> Actually please see this version instead.
>
> Thanks for the new patch.
>
> +   tuplestore_puttuple(tupstore, tuple);
> +   }
> +
> +   /*
> +* no longer need the tuple descriptor reference created by
> The patch has some whitespaces.
>
> +REVOKE ALL on pg_config FROM PUBLIC;
> +REVOKE EXECUTE ON FUNCTION pg_config() FROM PUBLIC;
> I guess that this portion is still under debate :)
>
> make[1]: Nothing to be done for `all'.
> make -C ../backend submake-errcodes
> make[2]: *** No rule to make target `config_info.o', needed by
> `libpgcommon.a'.  Stop.
> make[2]: *** Waiting for unfinished jobs
> The patch is visibly forgetting to include config_info.c, which should
> be part of src/common.
>
>  /*
> + * This function cleans up the paths for use with either cmd.exe or Msys
> + * on Windows. We need them to use filenames without spaces, for which a
> + * short filename is the safest equivalent, eg:
> + * C:/Progra~1/
> + */
> +void
> +cleanup_path(char *path)
> +{
> Perhaps this refactoring would be useful as a separate patch?

You need as well to update @pgcommonallfiles in Mkvcbuild.pm or the
compilation with MSVC is going to fail.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Combining Aggregates

2016-01-18 Thread David Rowley
On 19 January 2016 at 18:04, Tomas Vondra 
wrote:

> Hi,
>
> On 01/19/2016 05:00 AM, David Rowley wrote:
>
>> On 19 January 2016 at 06:03, Pavel Stehule > > wrote:
>>
>> ...
>
>>
>> It is strange, why hashaggregate is too slow?
>>
>>
>> Good question. I looked at this and found my VM was swapping like crazy.
>> Upon investigation it appears that's because, since the patch creates a
>> memory context per aggregated group, and in this case I've got 1 million
of them, it means we create 1 million contexts, which are
>> ALLOCSET_SMALL_INITSIZE (1KB) in size, which means about 1GB of memory,
>> which is more than my VM likes.
>>
>
> Really? Where do we create the memory context? IIRC string_agg uses the
> aggcontext directly, and indeed that's what I see in string_agg_transfn and
> makeStringAggState.
>
>
Yeah, all this is talk is relating to Robert's expandedstring-v1.patch
which changes string_agg to use text and expanded-objects. This also means
that a memory context is created per group, which is rather a big overhead.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Combining Aggregates

2016-01-18 Thread Tomas Vondra


On 01/19/2016 05:14 AM, Robert Haas wrote:

On Mon, Jan 18, 2016 at 12:34 PM, Robert Haas  wrote:

On Mon, Jan 18, 2016 at 12:09 PM, Tom Lane  wrote:

Robert Haas  writes:

Yeah.  The API contract for an expanded object took me quite a while
to puzzle out, but it seems to be this: if somebody hands you an R/W
pointer to an expanded object, you're entitled to assume that you can
"take over" that object and mutate it however you like.  But the
object might be in some other memory context, so you have to move it
into your own memory context.


Only if you intend to keep it --- for example, a function that is
mutating and returning an object isn't required to move it
somewhere else, if the input is R/W, and I think it generally
shouldn't.

In the context here, I'd think it is the responsibility of
nodeAgg.c not individual datatype functions to make sure that
expanded objects live where it wants them to.
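
As a minimal illustration of that contract (a sketch using the expanded-datum
helpers from utils/expandeddatum.h; the function below is made up and is not
from any of the patches discussed here):

#include "postgres.h"
#include "utils/datum.h"
#include "utils/expandeddatum.h"

/* Keep a non-null varlena datum across calls: a read-write expanded object
 * can simply be reparented into our own long-lived context, anything else
 * has to be copied into it. */
static Datum
keep_datum(Datum d, MemoryContext mycontext)
{
    if (DatumIsReadWriteExpandedObject(d, false, -1))
        TransferExpandedObject(d, mycontext);   /* reparent, no copy */
    else
    {
        MemoryContext oldcxt = MemoryContextSwitchTo(mycontext);

        d = datumCopy(d, false, -1);
        MemoryContextSwitchTo(oldcxt);
    }
    return d;
}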


That's how I did it in my prototype, but the problem with that is that
spinning up a memory context for every group sucks when there are many
groups with only a small number of elements each - hence the 50%
regression that David Rowley observed.  If we're going to use expanded
objects here, which seems like a good idea in principle, that's going
to have to be fixed somehow.  We're flogging the heck out of malloc by
repeatedly creating a context, doing one or two allocations in it, and
then destroying the context.

I think that, in general, the memory context machinery wasn't really
designed to manage lots of small contexts.  The overhead of spinning
up a new context for just a few allocations is substantial.  That
matters in some other situations too, I think - I've commonly seen
AllocSetContextCreate taking several percent  of runtime in profiles.
But here it's much exacerbated.  I'm not sure whether it's better to
attack that problem at the root and try to make AllocSetContextCreate
cheaper, or whether we should try to figure out some change to the
expanded-object machinery to address the issue.


Here is a patch that helps a good deal.  I changed things so that when
we create a context, we always allocate at least 1kB.  If that's more
than we need for the node itself and the name, then we use the rest of
the space as a sort of keeper block.  I think there's more that can be
done to improve this, but I'm wimping out for now because it's late
here.

I suspect something like this is a good idea even if we don't end up
using it for aggregate transition functions.


I dare to claim that if expanded objects require a separate memory 
context per aggregate state (I don't see why would be the case), it's a 
dead end. Not so long ago we've fixed exactly this issue in array_agg(), 
which addressed a bunch of reported OOM issues and improved array_agg() 
performance quite a bit. It'd be rather crazy introducing the same 
problem to all aggregate functions.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Advices on custom data type and extension development

2016-01-18 Thread Luciano Coutinho Barcellos

Dear friends,

I'm planning to develop an extension, and I'm here to get some 
help. But I would like to share the problem I intend to solve. Maybe my 
desired solution is not a good option.


What I have:

* a lot of data being generated every day, which are mainly 
queried by an immutable column of type date or timestamp;
* as a standard, almost every table has a bigserial id column 
as a primary key;
* data is huge enough to demand table partitioning, which is 
implemented as suggested in Postgres documentation, by using triggers 
and table inheritance. A function called by cron deal with creation of 
new partitions.


What I would like to develop first is a custom type (let's call it 
datedserial) for replacing bigserial as the primary key:


* the type would be 8 bytes long, being 4 dedicated to storing 
the Date, and 4 dedicated to storing a serial within that day;
* the text representation of the type would show its date and 
its serial number (something like '2015-10-02.007296' as a canonical 
form, but which could accept inputs like '20151002.007296');
* as a consequence of this internal representation, the serial 
part could not be greater than 4 billion and some;
* support for operator classes allowing the type being used in 
GIN and GIST indexes would be optional for now.


That would allow me to have a compact primary key which I could use 
to partition the table based on the object's date. That would also allow 
me to partition detail tables on the foreign key column having this data 
type. Besides that, just by examining the value, mainly when used as a 
foreign key, I could infer where the record belongs to.
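
To make the idea concrete, a hypothetical C-level sketch of such a layout
(the type and field names below are hypothetical, not an existing PostgreSQL
API):

#include "postgres.h"
#include "utils/date.h"

/* Hypothetical fixed 8-byte layout for the proposed "datedserial" type:
 * 4 bytes of date plus a 4-byte serial generated within that day. */
typedef struct DatedSerial
{
    DateADT     day;        /* days relative to 2000-01-01, as in type date */
    uint32      serial;     /* per-day sequence value, capped at ~4 billion */
} DatedSerial;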


When I have a working custom data type, I would go to the next and 
harder part. I would like to create a new structure like a sequence, and 
it should behave exactly like sequences, but separated by a date space. 
So I would have functions similar to the following:


* createsequencegroup(sequence_group_name text): create a new 
named structure for managing the sequence group;
* nextval(sequence_group_name text, context_date date): return 
next value of the sequence (as a datedserial) belonging to the sequence 
group and associated with the context date. The value returned have the 
context_date in its date part and the next value for that date in the 
sequence part. The first call for a specific date would return 1 for the 
sequence part. Concerning to concurrency and transactions, the function 
behaves exactly like nextval(sequence_group_name text);
* currval(sequence_group_name text, context_date date): the 
currval function counterpart;
* setval(sequence_group_name text, context_date date, int4 
value): the setval function counterpart;
* freeze_before(sequence_group_name text, freeze_date date): 
disallow using the sequence group with context dates before the freeze_date.


I would consider extending the data type to allow including 
information about the cluster which generated the values. This way, the 
user could set a configuration entry defining a byte value for 
identifying the cluster among others involved in replication, so that 
the sequence group could have different sequences not only for different 
dates, but for different nodes as well.


As I've said, I would like to package the resulting work as an 
extension.


For now, I would like some help on where to start. I've 
downloaded the postgres source code and have successfully compiled it 
on my Ubuntu desktop, although I have not tested the resulting binary. 
Should I create a folder in the contrib directory and use another 
extension as a starting point? Is this the recommended path? Or is this 
too much and I should create a separate project?


Thanks in advance.

Best regards,
Luciano Barcellos



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Combining Aggregates

2016-01-18 Thread Tom Lane
Robert Haas  writes:
> Here is a patch that helps a good deal.  I changed things so that when
> we create a context, we always allocate at least 1kB.

That's going to kill performance in some other cases; subtransactions
in particular rely on the subtransaction's TransactionContext not causing
any actual malloc unless something gets put into the TransactionContext.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [JDBC] 9.4-1207 behaves differently with server side prepared statements compared to 9.2-1102

2016-01-18 Thread Pavel Stehule
2016-01-18 23:50 GMT+01:00 Thomas Kellerer :

> Robert Haas wrote:
> > This isn't the first complaint about this mechanism that we've gotten,
> > and it won't be the last.  Way too many of our users are way more
> > aware than they should be that the threshold here is five rather than
> > any other number, which to me is a clear-cut sign that this needs to
> > be improved.  How to improve it is a harder question.  We lack the
> > ability to do any kind of sensitivity analysis on a plan, so we can't
> > know whether there are other parameter values that would have resulted
> > in a different plan, nor can we test whether a particular set of
> > parameter values would have changed the outcome.
>
> (I initially posted that question on the JDBC mailing list)
>
> To be honest: looking at the efforts Oracle has done since 9 up until 12 I
> am not sure this is a problem that can be solved by caching plans.
>
> Even with the new "in-flight" re-planning in Oracle 12 ("cardinality
> feedback") and all the effort that goes into caching plans we are still
> seeing similar problems with (prepared) statements that are suddenly slow.
> And as far as I can tell, the infrastructure around plan caching,
> invalidation, bind variable peeking and all that seems to be a *lot* more
> complex ("sophisticated") in Oracle compared to Postgres. And the results
> don't seem to justify the effort (at least in my experience).
>
> With all the problems I have seen (in Oracle and Postgres) I think that
> maybe a better solution to this problem is to make the planner fast (and
> reliable) enough so that plan caching isn't necessary in the first place.
>
> However I have no idea how feasible that is.
>

For statements like INSERT INTO tab VALUES(..) or UPDATE tab SET x = .. WHERE id
= .., the planner overhead will be significant. But these statements are
relatively simple, and probably some solution exists.

Regards

Pavel



>
>
>
>
>
> --
> View this message in context:
> http://postgresql.nabble.com/Fwd-JDBC-Re-9-4-1207-behaves-differently-with-server-side-prepared-statements-compared-to-9-2-1102-tp5881825p5882835.html
> Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: [HACKERS] exposing pg_controldata and pg_config as functions

2016-01-18 Thread Michael Paquier
On Tue, Jan 19, 2016 at 11:08 AM, Joe Conway  wrote:
> On 01/18/2016 04:16 PM, Joe Conway wrote:
>> Please see the attached. Duplication removed.
>
> Actually please see this version instead.

Thanks for the new patch.

+   tuplestore_puttuple(tupstore, tuple);
+   }
+
+   /*
+* no longer need the tuple descriptor reference created by
The patch has some whitespaces.

+REVOKE ALL on pg_config FROM PUBLIC;
+REVOKE EXECUTE ON FUNCTION pg_config() FROM PUBLIC;
I guess that this portion is still under debate :)

make[1]: Nothing to be done for `all'.
make -C ../backend submake-errcodes
make[2]: *** No rule to make target `config_info.o', needed by
`libpgcommon.a'.  Stop.
make[2]: *** Waiting for unfinished jobs
The patch is visibly forgetting to include config_info.c, which should
be part of src/common.

 /*
+ * This function cleans up the paths for use with either cmd.exe or Msys
+ * on Windows. We need them to use filenames without spaces, for which a
+ * short filename is the safest equivalent, eg:
+ * C:/Progra~1/
+ */
+void
+cleanup_path(char *path)
+{
Perhaps this refactoring would be useful as a separate patch?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] custom function for converting human readable sizes to bytes

2016-01-18 Thread Vitaly Burovoy
On 1/18/16, Vitaly Burovoy  wrote:
> <>
> ---
> + if (*strptr != '\0')
> ...
> + while (*strptr && !isspace(*strptr))
> Sometimes it explicitly compares to '\0', sometimes implicitly.
> The common usage is an explicit comparison, which is preferred because
> different compilers handle the implicit conversion to boolean differently.
>
> ---
> <>

It seems I got distracted by something else... Those lines are OK, skip that block.

-- 
Best regards,
Vitaly Burovoy


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Combining Aggregates

2016-01-18 Thread Tomas Vondra

Hi,

On 01/19/2016 05:00 AM, David Rowley wrote:

On 19 January 2016 at 06:03, Pavel Stehule wrote:


...


It is strange, why hashaggregate is too slow?


Good question. I looked at this and found my VM was swapping like crazy.
Upon investigation it appears that's because, since the patch creates a
memory context per aggregated group, and in this case I've got 1 million
of them, it means we create 1 million contexts, which are
ALLOCSET_SMALL_INITSIZE (1KB) in size, which means about 1GB of memory,
which is more than my VM likes.


Really? Where do we create the memory context? IIRC string_agg uses the 
aggcontext directly, and indeed that's what I see in string_agg_transfn 
and makeStringAggState.


Perhaps you mean that initStringInfo() allocates 1kB buffers by default?



set work_mem = '130MB' does coax the planner into a GroupAggregate plan,
which is faster, but due to the hash agg executor code not giving
any regard to work_mem. If I set work_mem to 140MB (which is more
realistic for this VM), it does cause the same swapping problems to
occur.  Probably setting aggtransspace for this aggregate to 1024 would
help the costing problem, but it would also cause hashagg to be a less
chosen option during planning.


I'm not quite sure I understand - the current code ends up using 8192 
for the transition space (per count_agg_clauses_walker). Are you 
suggesting lowering the value, despite the danger of OOM issues?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Removing service-related code in pg_ctl for Cygwin

2016-01-18 Thread Michael Paquier
On Tue, Jan 19, 2016 at 2:32 AM, Andrew Dunstan  wrote:
>
>
> On 01/14/2016 12:38 AM, Michael Paquier wrote:
>>
>> Hi all,
>>
>> Beginning a new thread seems more adapted regarding $subject but
>> that's mentioned here as well:
>>
>> http://www.postgresql.org/message-id/CAB7nPqQXghm_SdB5iniupz1atzMxk=95gv9a8ocdo83sxcn...@mail.gmail.com
>>
>> It happens based on some investigation from Andrew that cygwin does
>> not need to use the service-related code, aka register/unregister
>> options or similar that are proper to WIN32 and rely instead on
>> cygrunsrv with a SYSV-like init file to manage the service. Based on
>> my tests with cygwin, I am able to see as well that a server started
>> within a cygwin session is able to persist even after it is closed,
>> which is kind of nice and I think that it is a additional reason to
>> not use the service-related code paths. Hence what about the following
>> patch, that makes cygwin behave like any *nix OS when using pg_ctl?
>> This simplifies a bit the code.
>>
>> Marco, as I think you do some packaging for Postgres in Cygwin, and
>> others, does that sound acceptable?
>>
>
>
>
> I think we can be a bit more adventurous and remove all the Cygwin-specific
> code. See attached patch, which builds fine on buildfarm cockatiel.

Ah, OK. I see the difference. It builds as well for me.

-#ifndef __CYGWIN__
-   AddUserToTokenDacl(restrictedToken);
-#endif
[...]
-#if defined(WIN32) || defined(__CYGWIN__)
+#ifdef WIN32
setvbuf(stderr, NULL, _IONBF, 0);
 #endif
Fine for me, those two do not seem to matter much as far as I have tested.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Combining Aggregates

2016-01-18 Thread Tomas Vondra



On 01/19/2016 06:04 AM, Tomas Vondra wrote:

Hi,

On 01/19/2016 05:00 AM, David Rowley wrote:

Good question. I looked at this and found my VM was swapping like crazy.
Upon investigation it appears that's because, since the patch creates a
memory context per aggregated group, and in this case I've got 1 million
of them, it means we create 1 million contexts, which are
ALLOCSET_SMALL_INITSIZE (1KB) in size, which means about 1GB of memory,
which is more than my VM likes.


Really? Where do we create the memory context? IIRC string_agg uses the
aggcontext directly, and indeed that's what I see in string_agg_transfn
and makeStringAggState.

Perhaps you mean that initStringInfo() allocates 1kB buffers by default?


...


I'm not quite sure I understand - the current code ends up using 8192
for the transition space (per count_agg_clauses_walker). Are you
suggesting lowering the value, despite the danger of OOM issues?


Meh, right after sending the message I realized that perhaps this 
relates to Robert's patch and that maybe it changes this. And indeed, it 
changes the type from internal to text, and thus the special case in 
count_agg_clauses_walker no longer applies.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-01-18 Thread Michael Paquier
On Tue, Jan 19, 2016 at 1:40 AM, Masahiko Sawada  wrote:
> On Mon, Jan 18, 2016 at 1:20 PM, Michael Paquier
>  wrote:
>> On Sun, Jan 17, 2016 at 11:09 PM, Masahiko Sawada  
>> wrote:
>>> On Wed, Jan 13, 2016 at 1:54 AM, Alvaro Herrera wrote:
>>> * Confirm value of pg_stat_replication.sync_state (sync, async or potential)
>>> * Confirm that the data is synchronously replicated to multiple
>>> standbys in same cases.
>>>   * case 1 : The standby which is not listed in s_s_name, is down
>>>   * case 2 : The standby which is listed in s_s_names but potential
>>> standby, is down
>>>   * case 3 : The standby which is considered as sync standby, is down.
>>> * Standby promotion
>>>
>>> In order to confirm that the commit isn't done in case #3 forever
>>> unless new sync standby is up, I think we need the framework that
>>> cancels executing query.
>>> That is, what I'm planning is,
>>> 1. Set up master server (s_s_names = '2, standby1, standby2')
>>> 2. Set up two standby servers
>>> 3. Standby1 is down
>>> 4. Create some contents on master (But transaction is not committed)
>>> 5. Cancel the #4 query. (Also confirm that the flush location of only
>>> standby2 makes progress)
>>
>> This will need some thinking and is not as easy as it sounds. There is
>> no way to hold on a connection after executing a query in the current
>> TAP infrastructure. You are just mentioning case 3, but actually cases
>> 1 and 2 are falling into the same need: if there is a failure we want
>> to be able to not be stuck in the test forever and have a way to
>> cancel a query execution at will. TAP uses psql -c to execute any sql
>> queries, but we would need something that is far lower-level, and that
>> would be basically using the perl driver for Postgres or an equivalent
>> here.
>>
>> Honestly for those tests I just thought that we could get to something
>> reliable by just looking at how each sync replication setup reflects
>> in pg_stat_replication as the flow is really getting complicated,
>> giving to the user a clear representation at SQL level of what is
>> actually occurring in the server depending on the configuration used
>> being important here.
>
> I see.
> We could check the transition of sync_state in pg_stat_replication.
> I think it means that it tests for each replication method (switching
> state) rather than synchronization of replication.
>
> What I'm planning to have are,
> * Confirm value of pg_stat_replication.sync_state (sync, async or potential)
> * Standby promotion
> * Standby catching up master
> And each replication method has above tests.
>
> Are these enough?

Does promoting the standby and checking that it caught up really have
value in the context of this patch? What we just want to know is, on a
master, which nodes need to be waited for when s_s_names or any other
method is used, no?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Combining Aggregates

2016-01-18 Thread David Rowley
On 19 January 2016 at 02:44, Haribabu Kommi 
wrote:

> On Mon, Jan 18, 2016 at 10:32 PM, David Rowley
>  wrote:
>
> I just thought like direct mapping of the structure with text pointer.
> something like
> the below.
>
> result = PG_ARGISNULL(0) ? NULL : (text *) PG_GETARG_POINTER(0);
> state = (PolyNumAggState *)VARDATA(result);
>
> To handle the big-endian or little-endian, we may need some extra changes.
>
> Instead of adding 3 new columns to the pg_aggregate catalog table to handle
> the internal types, either something like the above to handle the internal
> types
> or some other way is better IMO.


The problem with that is that most of these internal structs for the
aggregate states have pointers to other memory, so even if we laid those
bytes down into a bytea or something, then doing so is not going to
dereference the pointers to the other memory, and when we dereference those
pointers in the other process, we'll have problems as these addresses
belong to the other process.

For example PolyNumAggState is defined as:

typedef NumericAggState PolyNumAggState;

and NumericAggState has:

NumericVar sumX; /* sum of processed numbers */
NumericVar sumX2; /* sum of squares of processed numbers */

And NumericVar has:

NumericDigit *buf; /* start of palloc'd space for digits[] */
NumericDigit *digits; /* base-NBASE digits */

Both of these point to other memory which won't be in the varlena type.

Serialization is the process of collecting all of this pointed-to data up
into some consecutive bytes.

Of course, that's not to say that there are never aggregate state structs
without any pointers (I've not checked), but in those cases we
could (perhaps) just make the serialize and deserialize function a simple
memcpy() into a bytea array, although in reality, as you mentioned, we'd
likely want to agree on some format that's cross platform for different
byte orders, as we'll probably, one day, want to forward these values over
to some other server to finish off the aggregation.
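
To illustrate what that means in practice, a rough sketch (illustrative only,
ignoring byte order, and not taken from any existing or proposed patch) of
flattening the digit buffer a NumericVar points to into one contiguous chunk:

/* NumericVar and NumericDigit are private to numeric.c, so something like
 * this could only live there; the function name is made up. */
static bytea *
numericvar_flatten(const NumericVar *var)
{
    Size    datalen = var->ndigits * sizeof(NumericDigit);
    Size    total = VARHDRSZ + 4 * sizeof(int) + datalen;
    bytea  *result = (bytea *) palloc(total);
    char   *ptr = VARDATA(result);

    SET_VARSIZE(result, total);
    memcpy(ptr, &var->ndigits, sizeof(int));    ptr += sizeof(int);
    memcpy(ptr, &var->weight, sizeof(int));     ptr += sizeof(int);
    memcpy(ptr, &var->sign, sizeof(int));       ptr += sizeof(int);
    memcpy(ptr, &var->dscale, sizeof(int));     ptr += sizeof(int);
    memcpy(ptr, var->digits, datalen);          /* the pointed-to part */

    return result;
}

Deserialization would do the reverse: palloc a fresh digit buffer in the
receiving process and point a new NumericVar at it.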

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] pgindent-polluted commits

2016-01-18 Thread Noah Misch
On Sat, Jan 16, 2016 at 09:57:45AM +, Simon Riggs wrote:
> On 16 January 2016 at 02:10, Noah Misch  wrote:
> > On Wed, Jan 13, 2016 at 12:13:11PM -0500, Tom Lane wrote:
> > > Basically this is trading off convenience of the committer (all of the
> > > alternatives Noah mentions are somewhat annoying) versus the convenience
> > > of post-commit reviewers.  I'm not sure that his recommendation is the
> > > best trade-off, nor that the situation is precisely comparable to
> > > pre-commit review.  There definitely will be pre-commit review, there
> > > may or may not be any post-commit review.
> >
> > That's a good summary.

> My objective in committing patches to PostgreSQL is to develop the Open
> Source version of PostgreSQL as a standalone product and I encourage others
> to do the same.
> 
> PostgreSQL is open source and therefore usable for various additional
> purposes, one of which is modified versions of PostgreSQL.
> 
> I will not go out of my way to cause problems for the secondary users of
> the code. I will try to implement one of the suggestions for whitespace
> handling, though may make mistakes in that, nobody being perfect.

Thanks.  Clean commits help so many audiences, including immediate post-commit
reviewers, intensive beta testers, fork maintainers, and hackers performing
root cause analysis on the bugs to be discovered in future years.  For what
it's worth, most committers already have been using some mix of strategy 2
(leave pgindent entirely to Bruce) and strategy 1 (neither add nor remove work
for the next whole-tree pgindent to do).  If you're already in that majority,
I advise no change.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: postpone building buckets to the end of Hash (in HashJoin)

2016-01-18 Thread Tomas Vondra

Hi,

On 12/17/2015 10:28 PM, Tomas Vondra wrote:

Hi,

On 12/17/2015 07:20 PM, Robert Haas wrote:
...


If this doesn't regress performance in the case where the number of
buckets is estimated accurately to begin with, then I think this is
a great idea. Can you supply some performance tests results for that
case, and maybe some of the other cases also?


I don't see how it could regress performance, and the benchmarks I've
done confirm that. I'll do more thorough benchmarking and post the
results here, but not now as this patch is in 2016-01 CF and I want to
put all my time into reviewing patches from the open commitfest.


I've finally got to do more thorough benchmarks, and surprisingly there 
seems to be a minor regression. The scripts I've used are attached, 
along with results, but in short it joins two tables, with different 
combinations:


  1M x 10M
  10M x 100M
  5M x 250M

with different work_mem sizes (so that the smallest value uses a bunch 
of batches while the largest value uses a single batch).


The 1x10 and 10x100 are designed to fit into RAM on the machine 
(including the temporary files in batching mode), while 5x250 is 
designed to specifically force the temp files to disk (but it's on fast 
SSD array, so not a big deal).


Average timings for current master and the first two patches of the 
series (0001 postpones the build of buckets and 0002 always starts 
without batching) look like this (the timings are in milliseconds):


size     work_mem    master   postpone   no-batch
---------------------------------------------------
1x10            8      5735       5760       6107
               32      5939       5955       6124
              128      4852       5038       5175
---------------------------------------------------
5x250          64    158512     158429     159008
              256    144046     144584     144994
             1024    111478     117529     116933
---------------------------------------------------
10x100         64     68389      68278      68530
              256     66693      66415      66605
             1024     48606      50654      50490

So compared to master, the differences look like this:

size     work_mem   postpone   no-batch
-----------------------------------------
1x10            8      0.44%      6.50%
               32      0.28%      3.13%
              128      3.83%      6.65%
-----------------------------------------
5x250          64     -0.05%      0.31%
              256      0.37%      0.66%
             1024      5.43%      4.89%
-----------------------------------------
10x100         64     -0.16%      0.21%
              256     -0.42%     -0.13%
             1024      4.21%      3.88%

So for the first patch (postponing building of buckets) with batching, 
the difference is rather negligible, possibly within noise (~0.5%). When 
the hash join runs with a single batch, the difference however gets much 
more significant, at 4-5%.


I'm not going to discuss the second patch for now, but naturally it has 
the same issue and apparently also some additional ones (at least on the 
1x10 data set).


I have spent a fair amount of time profiling this, and I can't wrap my 
head around where the difference comes from, though. Essentially we 
don't do any additional stuff, we've just moved some of the stuff to a 
different place in the code.


It might change the memory access pattern a bit, but the original 
patterns are not exactly sequential or so, as we're moving random 
updates to array of pointers. Or rather moving them to the 
BuildBuckets() method, so if something consumes more time, it should be 
in this method, I believe.


So I've measured how much time the function takes for the 1x10 and 
10x100 datasets, and it's ~25ms and ~290ms respectively. That's way less 
than the actual difference in query runtime, which is ~190ms and ~2050ms.


So even if we entirely skipped the bucket build, we would not recover 
the difference. Not sure what's going on here.


I've also done some profiling using perf, but I haven't really found 
anything suspicious there.


Any ideas what might be the cause of this?

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


hash-scripts.tgz
Description: application/compressed-tar


hashes-delayed.ods
Description: application/vnd.oasis.opendocument.spreadsheet

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-01-18 Thread Amit Kapila
On Mon, Jan 18, 2016 at 7:08 PM, Andres Freund  wrote:
>
> On 2015-12-21 16:26:25 +0900, Michael Paquier wrote:
> > On Sun, Dec 20, 2015 at 10:14 PM, Michael Paquier
> >  wrote:
> > > Speaking of which, this patch was registered in this CF, I am moving
> > > it to the next as a bug fix.
> >
> > I found a stupid bug in my previous patch: when issuing XLOG_SWITCH it
> > is possible that the return LSN pointer is on the new segment that has
> > been forcibly archived if RequestXLogSwitch is called multiple times
> > and that subsequent calls are not necessary. Patch updated.
>
> I find this patch rather unsatisfactory. Yes, it kinda solves the
> problem of archive timeout, but it leaves the bigger and longer standing
> problems of unneccessary checkpoints with wal_level=hs in place. It's
> also somewhat fragile in my opinion.
>
> I think we rather want a per backend (or perhaps per wal insertion lock)
> flag that says 'last relevant record inserted at', and a way to not set
> that during insertion. Then during a checkpoint or the relevant bgwriter
> code, we look wether anything relevant happened in any backend since the
> last time we performed a checkpoint/logged a running xacts snapshot.
>

Sounds like a more robust way of dealing with this problem.  Michael,
if you don't disagree with the above proposal, then we can mark this patch
as Waiting on Author?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Combining Aggregates

2016-01-18 Thread David Rowley
On 19 January 2016 at 17:14, Robert Haas  wrote:

> On Mon, Jan 18, 2016 at 12:34 PM, Robert Haas 
> wrote:
> > On Mon, Jan 18, 2016 at 12:09 PM, Tom Lane  wrote:
> >> Robert Haas  writes:
> >>> Yeah.  The API contract for an expanded object took me quite a while
> >>> to puzzle out, but it seems to be this: if somebody hands you an R/W
> >>> pointer to an expanded object, you're entitled to assume that you can
> >>> "take over" that object and mutate it however you like.  But the
> >>> object might be in some other memory context, so you have to move it
> >>> into your own memory context.
> >>
> >> Only if you intend to keep it --- for example, a function that is
> mutating
> >> and returning an object isn't required to move it somewhere else, if the
> >> input is R/W, and I think it generally shouldn't.
> >>
> >> In the context here, I'd think it is the responsibility of nodeAgg.c
> >> not individual datatype functions to make sure that expanded objects
> >> live where it wants them to.
> >
> > That's how I did it in my prototype, but the problem with that is that
> > spinning up a memory context for every group sucks when there are many
> > groups with only a small number of elements each - hence the 50%
> > regression that David Rowley observed.  If we're going to use expanded
> > objects here, which seems like a good idea in principle, that's going
> > to have to be fixed somehow.  We're flogging the heck out of malloc by
> > repeatedly creating a context, doing one or two allocations in it, and
> > then destroying the context.
> >
> > I think that, in general, the memory context machinery wasn't really
> > designed to manage lots of small contexts.  The overhead of spinning
> > up a new context for just a few allocations is substantial.  That
> > matters in some other situations too, I think - I've commonly seen
> > AllocSetContextCreate taking several percent  of runtime in profiles.
> > But here it's much exacerbated.  I'm not sure whether it's better to
> > attack that problem at the root and try to make AllocSetContextCreate
> > cheaper, or whether we should try to figure out some change to the
> > expanded-object machinery to address the issue.
>
> Here is a patch that helps a good deal.  I changed things so that when
> we create a context, we always allocate at least 1kB.  If that's more
> than we need for the node itself and the name, then we use the rest of
> the space as a sort of keeper block.  I think there's more that can be
> done to improve this, but I'm wimping out for now because it's late
> here.
>
> I suspect something like this is a good idea even if we don't end up
> using it for aggregate transition functions.


Thanks for writing this up, but I think the key issue is not addressed,
which is the memory context per aggregate group just being a performance
killer. I ran the test query again with the attached patch and the hashagg
query still took 300 seconds on my VM with 4GB ram.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] silent data loss with ext4 / all current versions

2016-01-18 Thread Michael Paquier
On Wed, Dec 2, 2015 at 3:24 PM, Michael Paquier
 wrote:
> On Wed, Dec 2, 2015 at 3:23 PM, Michael Paquier
>  wrote:
>> On Wed, Dec 2, 2015 at 7:05 AM, Tomas Vondra
>>  wrote:
>>> Attached is v2 of the patch, that
>>>
>>> (a) adds explicit fsync on the parent directory after all the rename()
>>> calls in timeline.c, xlog.c, xlogarchive.c and pgarch.c
>>>
>>> (b) adds START/END_CRIT_SECTION around the new fsync_fname calls
>>> (except for those in timeline.c, as the START/END_CRIT_SECTION is
>>> not available there)
>>>
>>> The patch is fairly trivial and I've done some rudimentary testing, but I'm
>>> sure I haven't exercised all the modified paths.
>>
>> I would like to have an in-depth look at that after finishing the
>> current CF, I am the manager of this one after all... Could you
>> register it to 2016-01 CF for the time being? I don't mind being
>> beaten by someone else if this someone has some room to look at this
>> patch..
>
> And please feel free to add my name as reviewer.

Tomas, I am planning to have a look at that, because it seems to be
important. In case it becomes lost on my radar, do you mind if I add
it to the 2016-03 CF?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Odd behavior in foreign table modification (Was: Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW)

2016-01-18 Thread Etsuro Fujita

On 2016/01/08 14:08, Etsuro Fujita wrote:

On 2016/01/07 21:50, Etsuro Fujita wrote:

On 2016/01/06 20:37, Thom Brown wrote:



I've run into an issue:

*# UPDATE master_customers SET id = 22 WHERE id = 16 RETURNING
tableoid::regclass;
ERROR:
CONTEXT:  Remote SQL command: UPDATE public.customers SET id = 22
WHERE ((id = 16)) RETURNING NULL



While working on this, I noticed that the existing postgres_fdw system
shows similar behavior, so I changed the subject.

IIUC, the reason for that is that when the local query specifies "RETURNING
tableoid::regclass", the FDW has fmstate->has_returning=false while the
remote query executed at ModifyTable has "RETURNING NULL", as shown in
the above example; that would cause an abnormal exit when executing the
remote query in postgresExecForeignUpdate, since the FDW would get
PGRES_TUPLES_OK as the result of the query while, based on the flag
fmstate->has_returning=false, it would expect PGRES_COMMAND_OK.

Attached is a patch to fix that.


I added this to the next CF.

https://commitfest.postgresql.org/9/483/

Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-01-18 Thread Michael Paquier
On Tue, Jan 19, 2016 at 1:28 PM, Amit Kapila  wrote:
> On Mon, Jan 18, 2016 at 7:08 PM, Andres Freund  wrote:
>> I find this patch rather unsatisfactory. Yes, it kinda solves the
>> problem of archive timeout, but it leaves the bigger and longer standing
>> problems of unneccessary checkpoints with wal_level=hs in place. It's
>> also somewhat fragile in my opinion.

Check.

>> I think we rather want a per backend (or perhaps per wal insertion lock)
>> flag that says 'last relevant record inserted at', and a way to not set
>> that during insertion. Then during a checkpoint or the relevant bgwriter
>> code, we look wether anything relevant happened in any backend since the
>> last time we performed a checkpoint/logged a running xacts snapshot.

And in this case, the last relevant record would be caused by a forced
segment switch or a checkpoint record, right? Doing that per WAL
insertion lock seems more scalable to me. I haven't looked at the code
yet though to see how that would work out.

> Sounds like a more robust way of dealing with this problem.  Michael,
> if you don't disagree with the above proposal, then we can mark this patch
> as Waiting on Author?

Yeah let's do so. I'll think more about this thing.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

2016-01-18 Thread Etsuro Fujita

On 2016/01/18 19:46, Ashutosh Bapat wrote:

PFA patches for postgres_fdw join pushdown, taken care of all TODOs in
my last mail.

Here is the list of things that have been improved/added new as compared
to Hanada-san's previous patch at [1].


Great!  Thank you for working on that!  I'll review the patch.


I will be working next on (in that order)
1. eval_plan_qual fix for foreign join. (Considered as a must-have for 9.6)
2. Pushing down ORDER BY clause along with join pushdown
3. Parameterization of foreign join paths (Given the complexity of the
feature this may not make it into 9.6)


As discussed internally, I think #3 might have some impact on the 
overall design of the EvalPlanQual fix, especially the design of a 
helper function that creates a local join execution path for a foreign 
join path for EvalPlanQual.  So, IMO, I think the order is #1, #3, and 
#2 (or #3, #1, #2).


Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minor code improvements to create_foreignscan_plan/ExecInitForeignScan

2016-01-18 Thread Etsuro Fujita

On 2016/01/15 19:00, Etsuro Fujita wrote:

On 2016/01/12 18:00, Etsuro Fujita wrote:

On 2016/01/12 2:36, Alvaro Herrera wrote:

I wonder,



--- 2166,2213 
   }

   /*
!  * If rel is a base relation, detect whether any system columns
are
!  * requested from the rel.  (If rel is a join relation,
rel->relid will be
!  * 0, but there can be no Var in the target list with relid 0,
so we skip
!  * this in that case.  Note that any such system columns are
assumed to be
!  * contained in fdw_scan_tlist, so we never need fsSystemCol to
be true in
!  * the joinrel case.)  This is a bit of a kluge and might go
away someday,
!  * so we intentionally leave it out of the API presented to FDWs.
*/
! scan_plan->fsSystemCol = false;
! if (scan_relid > 0)
   {
! Bitmapset  *attrs_used = NULL;
! ListCell   *lc;
! inti;

! /*
!  * First, examine all the attributes needed for joins or
final output.
!  * Note: we must look at reltargetlist, not the attr_needed
data,
!  * because attr_needed isn't computed for inheritance child
rels.
!  */
! pull_varattnos((Node *) rel->reltargetlist, scan_relid,
&attrs_used);

! /* Add all the attributes used by restriction clauses. */
! foreach(lc, rel->baserestrictinfo)
   {
! RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
!
! pull_varattnos((Node *) rinfo->clause, scan_relid,
&attrs_used);
   }

! /* Now, are any system columns requested from rel? */
! for (i = FirstLowInvalidHeapAttributeNumber + 1; i < 0; i++)
! {
! if (bms_is_member(i -
FirstLowInvalidHeapAttributeNumber, attrs_used))
! {
! scan_plan->fsSystemCol = true;
! break;
! }
! }
!
! bms_free(attrs_used);
! }

   return scan_plan;
   }



Would it make sense to call pull_varattnos(reltargetlist), then walk the
bitmapset and break if we see a system column, then call
pull_varattnos() on the rinfo->clause?  That way, if the targetlist
request a system column we don't have to walk the RestrictInfos.



Seems like a good idea.  Will update the patch.



Done.  Attached is an updated version of the patch.


On second thought, I noticed that detecting whether we see a system 
column that way needs more cycles in cases where the reltargetlist and 
the restriction clauses don't contain any system columns.  ISTM that 
such cases are rather common, so I'm inclined to keep that code as-is.


Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY (... tab completion

2016-01-18 Thread Michael Paquier
On Tue, Jan 19, 2016 at 10:12 AM, Andreas Karlsson  wrote:
> On 01/19/2016 01:57 AM, Andreas Karlsson wrote:
>>
>> Thanks for the review. A new version is attached.
>
>
> Whops, attached the wrong file.

+/* If we have COPY BINARY, compelete with list of tables */
s/compelete/complete

+else if (TailMatches2("COPY|\\copy", "("))
+COMPLETE_WITH_LIST7("SELECT", "TABLE", "VALUES", "INSERT",
"UPDATE", "DELETE", "WITH");
This one should be Matches, no?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] silent data loss with ext4 / all current versions

2016-01-18 Thread Tomas Vondra



On 01/19/2016 07:44 AM, Michael Paquier wrote:

On Wed, Dec 2, 2015 at 3:24 PM, Michael Paquier
 wrote:

On Wed, Dec 2, 2015 at 3:23 PM, Michael Paquier
 wrote:

On Wed, Dec 2, 2015 at 7:05 AM, Tomas Vondra
 wrote:

Attached is v2 of the patch, that

(a) adds explicit fsync on the parent directory after all the rename()
 calls in timeline.c, xlog.c, xlogarchive.c and pgarch.c

(b) adds START/END_CRIT_SECTION around the new fsync_fname calls
 (except for those in timeline.c, as the START/END_CRIT_SECTION is
 not available there)

The patch is fairly trivial and I've done some rudimentary testing, but I'm
sure I haven't exercised all the modified paths.


I would like to have an in-depth look at that after finishing the
current CF, I am the manager of this one after all... Could you
register it to 2016-01 CF for the time being? I don't mind being
beaten by someone else if this someone has some room to look at this
patch..


And please feel free to add my name as reviewer.


Tomas, I am planning to have a look at that, because it seems to be
important. In case it becomes lost on my radar, do you mind if I add
it to the 2016-03 CF?


Well, what else can I do? I have to admit I'm quite surprised this is 
still rotting here, considering it addresses a rather serious data loss 
/ corruption issue on a pretty common setup.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-01-18 Thread Michael Paquier
On Mon, Jan 18, 2016 at 10:19 PM, Amit Kapila  wrote:
> On Mon, Jan 18, 2016 at 10:54 AM, Michael Paquier
>  wrote:
>>
>> On Sun, Jan 17, 2016 at 1:37 PM, Amit Kapila 
>> wrote:
>> > On Sat, Jan 16, 2016 at 6:37 PM, Michael Paquier
>> > 
>> > wrote:
>> >>
>> >>
>> >>
>> >
>> > So here if I understand correctly the check related to the last segment
>> > forcibly switched is based on the fact that if it is forcibly switched,
>> > then
>> > we don't need to log this record? What is the reason of such an
>> > assumption?
>>
>> Yes, the thing is that, as mentioned at the beginning of the thread, a
>> low value of archive_timeout causes a segment to be forcibly switched
>> at the end of the timeout defined by this parameter. In combination
>> with the standby snapshot taken in bgwriter since 9.4, this is causing
>> segments to be always switched even if a system is completely idle.
>> Before that, in 9.3 and older versions, we would have a segment
>> forcibly switched on an idle system only once per checkpoint.
>>
>
> Okay, so this will fix the case where the system is idle, but what I
> am slightly worried is that it should not miss to log the standby snapshot
> due to this check when there is actually some activity in the system.
> Why is not possible to have a case such that the segment is forcibly
> switched due to reason other than checkpoint (user has requested the
> same) and the current insert LSN is at beginning of a new segment
> due to write activity? If that is possible, which to me theoretically seems
> possible, then I think bgwriter will miss to log the standby snapshot.

Yes, with segments forcibly switched by users this check would make
the bgwriter not log a snapshot if the last action done by the server
was XLOG_SWITCH. Based on the patch I sent, the only alternative would
be to not update forcedSegSwitchLSN in those code paths. Perhaps that's
fine enough for back-branches..

>> The
>> documentation is already clear about that actually. The whole point of
>> this patch is to fix this regression, aka switch to a new segment only
>> once per checkpoint.
>>
>> > It is not very clear by reading the comments you have
>> > added in patch, may be you can expand comments slightly to explain
>> > the reason of same.
>>
>> OK. Here are the comments that this patch is adding:
>> - * only log if enough time has passed and some xlog record
>> has
>> - * been inserted.
>> + * Only log if enough time has passed and some xlog record
>> has
>> + * been inserted on a new segment. On an idle system where
>> + * segments can be archived in a fast pace with for example a
>> + * low archive_command setting, avoid as well logging a new
>> + * standby snapshot if the current insert position is still
>> + * at the beginning of the segment that has just been
>> switched.
>> + *
>> + * It is possible that GetXLogLastSwitchPtr points to the
>> last
>> + * position of previous segment or to the first position of
>> the
>> + * new segment after the switch, hence take both cases into
>> + * account when deciding if a standby snapshot should be
>> taken.
>> + * (see comments on top of RequestXLogSwitch for more
>> details).
>>   */
>> It makes mention of the problem that it is trying to fix, perhaps you
>> mean that this should mention the fact that on an idle system standby
>> snapshots should happen at worst once per checkpoint?
>>
>
> I mean to say that in below part of comment, explanation about the
> the check related to insert position is quite clear whereas why it is
> okay to avoid logging standby snapshot when the segment is not
> forcibly switched is not apparent.
>
> * avoid as well logging a new
> * standby snapshot if the current insert position is still
> * at the beginning of the segment that has just been switched.

Perhaps: "avoid as well logging a new standby snapshot if the current
insert position is at the beginning of a segment that has been
*forcibly* switched at checkpoint or by archive_timeout"?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

