Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-03-04 Thread Kyotaro HORIGUCHI
Hello,

Apologies in advance for the long, hard-to-read write-up..

At Thu, 3 Mar 2016 23:30:49 +0900, Masahiko Sawada wrote:
> Hi,
> 
> Thank you so much for reviewing this patch!
> 
> All review comments regarding the documentation and comments are fixed.
> Attached is the latest v14 patch.
> 
> > This accepts 'abc^Id' as a name, which is wrong behavior (but
> > such application names are not allowed anyway. If you assume so,
> > I'd like to see a comment about that.).
> 
> 'abc^Id' is accepted as application_name, no?
> postgres(1)=# set application_name to 'abc^Id';
> SET
> postgres(1)=# show application_name ;
>  application_name
> ------------------
>  abc^Id
> (1 row)

Sorry, I implicitly used "^" to mean the Ctrl key. So "^I" is
Ctrl-I, that is, a horizontal tab (0x09). The following psql
session shows that.

=# set application_name to E'abc\td';
SET
=# show application_name ;
 application_name 
------------------
 abc?d
(1 row)

The tab is replaced with '?' (literally) at the time of
GUC assignment.
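For reference, the replacement works roughly like this (a minimal sketch
of the behavior described above; the real code is a check hook in the
backend's GUC machinery, and the function name here is illustrative):

static void
clean_application_name(char *name)
{
	char	   *p;

	/* any byte outside printable ASCII (0x20-0x7e) becomes '?' */
	for (p = name; *p; p++)
	{
		if (*p < 32 || *p > 126)
			*p = '?';
	}
}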

> > addlit_xd_string(char *ytext) and addlitchar_xd_string(unsigned
> > char ychar) require different character types. Is there any reason
> > for that?
> 
> Because addlit_xd_string() is for adding a string (char *) to xd_string,
> while addlitchar_xd_string() is for adding just one character to xd_string.

Umm. My question might have been a bit beside the point.

addlitchar_xd_string(unsigned char c) does appendStringInfoChar(xd_string,
c). On the other hand, the signature of the stringinfo function is the
following:

appendStringInfoChar(StringInfo str, char ch);

Of course "char" is equivalent of "signed char" as
default. addlitchar_xd_string assigns the given character in
"unsigned char" to the parameter of AppendStringInfoChar of
"signed char".

These two are incompatible types. Imagine the
following codelet, 

#include <stdio.h>

void hoge(signed char c)
{
  int ch = c;   /* on two's-complement machines, 200 arrives here as -56 */
  fprintf(stderr, "char = %d\n", ch);
}

int main(void)
{
  unsigned char u;

  u = 200;
  hoge(u);      /* implicit unsigned-to-signed char conversion */
  return 0;
}

The result is -56 (200 doesn't fit in a signed char, so it wraps around
to 200 - 256 = -56). So we should generally avoid this kind of signedness
mixture when there's no particular reason for it.

In this case the variable's domain is 0x20-0x7e, so the problem never
actually materializes, but there's still no reason for the signedness
mixture.
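The straightforward fix is just to make the parameter type agree with what
the callee expects, e.g. (a sketch only; the patch's actual function
apparently operates on a file-level StringInfo rather than taking one as
a parameter):

#include "postgres.h"
#include "lib/stringinfo.h"

static void
addlitchar_xd_string(StringInfo xd_string, char ychar)
{
	appendStringInfoChar(xd_string, ychar);		/* types now agree */
}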

> > I personally don't like addlit*string() things for such simple
> > syntax but itself is acceptable enough for me. However it uses
> > StringInfo to hold double-quoted names, which pallocs 1024 bytes
> > of memory chunk for every double-quoted name. The chunks are
> > finally stacked up left uncollected until the current
> > memory context is deleted or reset (it is deleted just after
> > finishing config file processing). In addition to that, setting
> > s_s_names runs the parser twice. It seems to me too greedy and
> > seems that static char [NAMEDATALEN] is enough using the v12 way
> > without palloc/repalloc.
> 
> I thought that the length of a group name could be more than NAMEDATALEN,
> so I used StringInfo.
> Is that not necessary?

Such long names don't seem necessary. Overly long identifiers
no longer work as identifiers for human eyes. We limit
the length of identifiers across the whole database system to
NAMEDATALEN-1, which seems to have been enough, so I don't see any
reason to allow a group name longer than that.
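A fixed buffer in the scanner would then be enough, along these lines (a
minimal sketch under that assumption; the identifiers are illustrative,
not the patch's actual names):

#include "postgres.h"		/* for NAMEDATALEN */

static char	xd_name[NAMEDATALEN];
static int	xd_namelen = 0;

static void
addlitchar_xd_name(char ychar)
{
	/* names longer than NAMEDATALEN-1 bytes are simply truncated */
	if (xd_namelen < NAMEDATALEN - 1)
		xd_name[xd_namelen++] = ychar;
	xd_name[xd_namelen] = '\0';
}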

> > I found that the name SyncGroupName.wait_num is not
> > intuitive. How about sync_num, sync_member_num or
> > sync_standby_num? If the last is preferable, .members should
> > also become .standbys.
> 
> Thanks, sync_num is preferable to me.
> 
> ===
> > I am quite uncomfortable with the existence of
> > WalSnd.sync_standby_priority. It represented the priority in the
> > old linear s_s_names format, but nested groups or even a
> > single-level quorum list obviously doesn't fit it. Can we get rid
> > of sync_standby_priority, even though we implement at most
> > n priorities for now?
> 
> We could get rid of sync_standby_priority.
> But if so, we would no longer be able to see the next sync standby in
> the pg_stat_replication system view.
> Regarding each node's priority, I was thinking that standbys in a quorum
> list have the same priority, and in a nested group each standby is given
> a priority starting from 1.

As far as I can see, the variable is referred to as a boolean to
indicate whether a walsender is connected to a candidate
synchronous standby. So the value is totally useless, at least
for now. However, SyncRepReleaseWaiters uses the value to check whether
the synced LSNs can be advanced by a walsender, so the variable is
useful as a boolean.

In the previous versions, the reason why WalSnd had the priority
value is that a pair of synchronized LSNs is determined by only
one walsender, the one with the highest priority among active
walsenders. So even if a walsender receives a response from a
walreceiver, it doesn't need to do anything if it is not at the
highest priority. It's a simple world.
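Schematically, the old rule was something like this (a sketch for
illustration only, not the actual syncrep.c code; the names are made up):

#include <stdbool.h>

/* Only the active walsender with the highest priority (the lowest
 * priority number) may advance the synced LSNs and release waiters. */
static bool
i_am_the_sync_walsender(int my_priority, int highest_active_priority)
{
	return my_priority == highest_active_priority;
}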

In the quorum commit world, in contrast, what
SyncRepGetSyncStandbysFn should do is return certain private
information

Re: [HACKERS] pgbench small bug fix

2016-03-04 Thread Aleksander Alekseev
> Attached is a v3 which tests integers more logically. I'm a lazy
> programmer who tends to minimize the number of keystrokes.

Well. From what I can tell this patch is Ready for Committer. 




Re: [HACKERS] raw output from copy

2016-03-04 Thread Ildar Musin

Hi Pavel

On 27/02/16 10:26, Pavel Stehule wrote:

Hi

2015-08-06 10:37 GMT+02:00 Pavel Stehule:


Hi,

A psql-based implementation needs new infrastructure (more than a few
lines).

Missing:

* binary mode support
* parametrized query support

I am not against that, but I proposed both points, and both were rejected.

So why not use the current infrastructure? Raw copy is a trivial patch.


I was asked by Daniel Verite about reopening this patch in the open
commitfest.

I am sending a rebased patch.

Regards

Pavel


I am new to reviewing, so here is what I've got. The patch applied 
cleanly to HEAD. I tried to upload and export files in psql, and it works 
as expected. All regression tests pass without problems as well. 
The code looks good to me. There is one line of documentation that 
confuses me a little:


"use any metadata - only row data in network byte order are exported"

Did you mean "only raw data in network byte order is exported"?

And there are two entries for this patch on the commitfest page, in the 
"miscellaneous" and "sql" sections. It's probably better to remove one 
of them to avoid confusion.


--
Ildar Musin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-03-04 Thread Alexander Korotkov
On Fri, Mar 4, 2016 at 7:05 AM, Amit Kapila  wrote:

> > I think the wait event types should be documented - and the wait
> > events too, perhaps.
> >
>
> As discussed upthread, I have added documentation for all the possible
> wait events and an example.  Some of the LWLocks like AsyncQueueLock and
> AsyncCtlLock are used for quite similar purposes, so I have kept their
> explanations the same.
>

Do you think it's worth grouping the rows in the "wait_event Description"
table by wait event type? It would be nice if the user knew a priori which
wait event has which type. It would also improve the readability of this
large table.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] WIP: Upper planner pathification

2016-03-04 Thread Tomas Vondra

Hi,

On 03/03/2016 01:10 AM, Tom Lane wrote:

Hmmm ... I'm now wondering about the "measurement error" theory.
I tried to repeat this measurement locally, focusing on the select-only
number since that should have a higher ratio of planning time to
execution.

Test setup:
cassert-off build
pgbench -i -s 100
sudo cpupower frequency-set --governor performance

repeat 3 times: pgbench -c 4 -j 4 -P 5 -T 60 -S

HEAD:
tps = 32508.217002 (excluding connections establishing)
tps = 33081.402766
tps = 32520.859913
average of 3: 32703 tps

WITH PATCH:
tps = 32815.922160 (excluding connections establishing)
tps = 33312.149718
tps = 32784.527489
average of 3: 32970 tps

(Hardware: dual quad-core Xeon E5-2609, running current RHEL6)

So I see no evidence for a slowdown on pgbench's SELECT queries.
Anybody else want to check performance on simple scan/join queries?


I did a small test today, mostly out of curiosity. And I think that 
while the results are a bit noisy, there's a clear slowdown. But it's 
extremely small, like ~0.5% for median/average, so I'd say it's negligible.


I used the i5-2500k machine I use for this kind of test, and I did 30 
runs of


   pgbench -S -T 60 pgbench

on a scale-10 database (analyzed and frozen beforehand), both with and 
without the patch applied. FWIW the machine is one of the least noisy 
ones when it comes to such benchmarks.


The results look like this:

             master    patched     diff
 ---------------------------------------
 median       16531      16459    -0.4%
 average      16526      16473    -0.3%

It's a bit more obvious when doing a scatter plot of the results (after 
sorting each time series) with master on the x-axis and patched on the 
y-axis. Ideally, we'd get the data points on the diagonal (no change) or 
above it (speedup), but the points are actually below. See the chart 
attached.


But I do agree this is mostly at the noise level, which is pretty good 
for a first cut that intentionally does not include any hacks. It's 
definitely way below the benefits of this patch, so +1 to applying it 
sooner rather than later.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



[HACKERS] Greeting for coming back, and where is PostgreSQL going

2016-03-04 Thread MauMau
Hello,


Long time no see.  I'm back.

Although you may not remember me, I was certainly here more than a year ago, 
submitting tiny patches for bug fixes and trivial functionalities, and 
reviewing/testing patches from others.  That was a fruitful and fun time for 
me.  Thank you a lot for helping me.

After that, I had to stay away from the community for some reason.  Now, I'd be 
happy if I can contribute to PostgreSQL again.  But please excuse me for my 
slow restart, as the blank period needs some rehabilitation.



Let me briefly introduce myself.  I'm MauMau, this is a nickname at home.  
And I'm Takayuki Tsunakawa, a male database engineer who works for Fujitsu in 
Japan.  I'm now able to participate in the community activity at work.

I've been visually impaired since birth, and now I'm almost blind (I can only 
sense light).  I'm using screen reader software to use PCs and smartphones. 
As I'm using pgindent, I'm sure the source code style won't be bad.  But I 
might overlook some styling problems like indentation in the documentation 
patches.  I'd appreciate it if you could recommend a nice editor for editing 
SGML/XML documents.



I'm excited to join the great PostgreSQL community.  I dream that PostgreSQL 
will evolve from the current "most advanced open source database" into the 
"most popular and advanced database".  In that respect, I want to expand the 
PostgreSQL ecosystem (interoperability with other software), as well as add 
new features.  Let me consult you about how to expand the ecosystem in another 
thread soon.

Finally, I'm wondering what direction PostgreSQL is headed in.  Especially, 
I'm curious about whether PostgreSQL will become an MPP database for OLTP and 
analytics by integrating with Postgres-XL/XC.  I can't yet figure out which 
segment PostgreSQL should aim for, now that the Hadoop family is prominent in 
analytics and MySQL is still more popular in Web apps.  I'd like to know what 
the community sees in the future of PostgreSQL.


Regards
MauMau

Re: [HACKERS] ExecGather() + nworkers

2016-03-04 Thread Amit Kapila
On Fri, Mar 4, 2016 at 11:57 AM, Haribabu Kommi wrote:

> On Wed, Jan 13, 2016 at 7:19 PM, Amit Kapila 
> wrote:
> >>
> >
> > Changed the code such that nworkers_launched gets used wherever
> > appropriate instead of nworkers.  This includes places other than
> > pointed out above.
>
> The patch's changes are simple optimizations that are trivial.
> I didn't find any problems with the changes. I think the same
> optimization is required in the "ExecParallelFinish" function as well.
>
>
There is already one such change for ExecParallelFinish() in the patch:

@@ -492,7 +492,7 @@ ExecParallelFinish(ParallelExecutorInfo *pei)

  WaitForParallelWorkersToFinish(pei->pcxt);

  /* Next, accumulate buffer usage. */
- for (i = 0; i < pei->pcxt->nworkers; ++i)
+ for (i = 0; i < pei->pcxt->nworkers_launched; ++i)
  InstrAccumParallelQuery(&pei->buffer_usage[i]);

Can you be slightly more specific about where exactly you are expecting
more changes?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Greeting for coming back, and where is PostgreSQL going

2016-03-04 Thread Michael Paquier
On Fri, Mar 4, 2016 at 8:20 PM, MauMau  wrote:
> Long time no see.  I'm back.

This is a good surprise.

> Although you may not remember me, I was certainly here more than a year ago,
> submitting tiny patches for bug fixes and trivial functionalities, and
> reviewing/testing patches from others.  That was a fruitful and fun time for
> me.  Thank you a lot for helping me.

Glad to see you back.

> After that, I had to stay away from the community for some reason.  Now, I'd
> be happy if I can contribute to PostgreSQL again.  But please excuse me for
> my slow restart, as the blank period needs some rehabilitation.

I don't think you need to explain yourself. Matters of life happen all the time.

> Let me briefly introduce myself.  I'm MauMau, this is a nickname at home.
> And I'm Takayuki Tsunakawa, a male database engineer who works for Fujitsu
> in Japan.  I'm now able to participate in the community activity at work.

Cool to hear that as well. We are pretty close by... よろしくお願いいたします。 (I look forward to working with you.)

> Finally, I'm wondering what direction PostgreSQL is headed in.  Especially,
> I'm curious about whether PostgreSQL will become an MPP database for OLTP and
> analytics by integrating with Postgres-XL/XC.  I can't yet figure out which
> segment PostgreSQL should aim for, now that the Hadoop family is prominent in
> analytics and MySQL is still more popular in Web apps.  I'd like to know
> what the community sees in the future of PostgreSQL.

These days, there is a lot of discussion and activity around making Postgres
better at scaling out. There are discussions about backporting stuff
from XC/XL into core, though that's tough work. This thread is a
good summary of what has been happening lately in this area:
http://www.postgresql.org/message-id/[email protected]

Thanks,
-- 
Michael




Re: [HACKERS] ExecGather() + nworkers

2016-03-04 Thread Haribabu Kommi
On Fri, Mar 4, 2016 at 10:33 PM, Amit Kapila  wrote:
> Can you be slightly more specific about where exactly you are expecting
> more changes?

I missed it when comparing the existing code with the patch.
Everything is fine with the patch. I marked it as ready for committer.

Regards,
Hari Babu
Fujitsu Australia




Re: [HACKERS] ExecGather() + nworkers

2016-03-04 Thread Amit Kapila
On Fri, Mar 4, 2016 at 5:21 PM, Haribabu Kommi 
wrote:

>
> I missed it during the comparison with existing code and patch.
> Everything is fine with the patch. I marked the patch as ready for
> committer.
>
>
Thanks!


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Greeting for coming back, and where is PostgreSQL going

2016-03-04 Thread Tatsuo Ishii
Tsunakawa-san,

Welcome back to PostgreSQL!
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp





Re: [HACKERS] TAP / recovery-test fs-level backups, psql enhancements etc

2016-03-04 Thread Craig Ringer
On 4 March 2016 at 05:08, Alvaro Herrera  wrote:


>
> Okay, so far I have pushed 0001 and 0002 squashed (commit 5bec1ad4648),
> 0003 (commit 7d9a4301c08), 0005 and 0006 squashed (commit 2c83f435a3de).
> In the last one I chose to rename your psql_check to safe_psql and
> tweaked a few other things, not worthy of individual mention.


I've just noticed that I failed to initialize $$timed_out to zero in
PostgresNode::psql if a ref is passed for it.

Fix attached.

The tests are proving useful already; I've shown that timeline following
works great with a filesystem-level copy, but that there are WAL retention
issues (unsurprisingly) when the backup is created without slots and the
slot state is then synced over by an external client.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] TAP / recovery-test fs-level backups, psql enhancements etc

2016-03-04 Thread Craig Ringer
On 4 March 2016 at 20:35, Craig Ringer  wrote:


> Fix attached.
>
>
Apparently I need a "remind me if you see the word attach in an email"
plugin. Sigh.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From b7158678086f137eaa38782aadb51ec16c44871b Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Fri, 4 Mar 2016 20:40:43 +0800
Subject: [PATCH] TAP: Correctly initialized $timed_out if passed

---
 src/test/perl/PostgresNode.pm | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index f0f70ea..ea4fb63 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -1025,6 +1025,8 @@ sub psql
 	  IPC::Run::timeout($params{timeout}, exception => $timeout_exception)
 	  if (defined($params{timeout}));
 
+	${$params{timed_out}} = 0 if defined $params{timed_out};
+
 	# IPC::Run would otherwise append to existing contents:
 	$$stdout = "" if ref($stdout);
 	$$stderr = "" if ref($stderr);
-- 
2.1.0




Re: [HACKERS] Equivalent of --enable-tap-tests in MSVC scripts

2016-03-04 Thread Craig Ringer
On 2 March 2016 at 10:19, Michael Paquier  wrote:

> On Wed, Mar 2, 2016 at 12:40 AM, Andrew Dunstan 
> wrote:
> > On 03/01/2016 08:00 AM, Michael Paquier wrote:
> >> As of now the MSVC scripts control whether TAP tests are enabled
> >> using a boolean flag, $config->{tap_tests}. However, this flag is
> >> only taken into account in vcregress.pl, with the following issues:
> >> 1) config_default.pl does not list tap_tests, so it is unclear to
> >> users how to enable them. People need to look into vcregress.pl as a
> >> starting point.
> >> 2) GetFakeConfigure() does not translate $config->{tap_tests} into
> >> --enable-tap-tests, leading to pg_config not reporting it in
> >> CONFIGURE. This is inconsistent with what is done in ./configure.
> >>
> >> Attached is a patch to address those two issues.
> >
> > Good work. There seem to be some unrelated whitespace changes. Shouldn't
> > this just be two extra lines?
>
> perltidy is telling me the contrary. Is it bad to run it for such a
> change? I thought we cared about the format of the Perl code, and the
> length of the variable name "tap_tests" has an impact on the
> indentation of the whole block per the rules in
> src/tools/pgindent/perltidyrc. (Actually I made a mistake in the
> previous patch: the portion for asserts should not be indented, not
> sure why it was; attached is an updated patch.)
>
>
If it's the result of perltidy changing its mind about the formatting as a
result of this change, I guess we have to eyeroll and live with it. perltidy
leaves the file alone as it is in the tree currently, so be it.

Gripe withdrawn, ready for committer IMO

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] improving GROUP BY estimation

2016-03-04 Thread Tomas Vondra
On Thu, 2016-03-03 at 11:42 -0800, Mark Dilger wrote:
> > On Mar 3, 2016, at 11:27 AM, Alexander Korotkov  
> > wrote:
> > 
> > On Thu, Mar 3, 2016 at 10:16 PM, Tomas Vondra 
> >  wrote:
> > So yes, each estimator works great for exactly the opposite cases. But 
> > notice that typically, the result of the new formula is much higher than 
> > that of the old one, sometimes by two orders of magnitude (and it 
> > shouldn't be difficult to construct examples of much higher differences).
> > 
> > The table also includes the 'average' estimator you propose, but it's 
> > rather obvious that the result is always much closer to the new value, 
> > simply because
> > 
> >    (small number) + (huge number)
> >    ------------------------------
> >                  2
> > 
> > is always much closer to the huge number. We're usually quite happy
> > when the estimates are within the same order of magnitude, so whether
> > it's K or K/2 makes pretty much no difference.
> > 
> > I believe that Mark means the geometric mean, i.e.
> > sqrt((small number) * (huge number)).

Ah, OK. I hadn't realized you meant the geometric mean. With that, it
looks like this:

1) independent

                 10      50     100     500    1000    5000
 -----------------------------------------------------------
 actual         919    3829    6244    9944   10001   10001
 current         10      50     102     516    1018    4996
 new            973    4001    6382    9897    9951    9951
 average        491    2025    3205    5206    5484    7473
 geom. mean      99     447     807    2260    3183    7051

2) dependent

                 10      50     100     500    1000    5000
 -----------------------------------------------------------
 actual          10      50     100     500    1000    5000
 current         10      53     105     508    1016    5014
 new            880    4105    6472    9955   10018   10018
 average        445    2079    3288    5231    5517    7516
 geom. mean      94     466     824    2249    3190    7087

To better illustrate the differences, we may use the "ratio error"
defined as

err(a,b) = MAX(a/b, b/a)

where 'a' is the actual value and 'b' is the estimate. That gives us:

1) independent

                 10      50     100     500    1000    5000
 -----------------------------------------------------------
 current      91.90   76.58   61.22   19.27    9.82    2.00
 new           1.06    1.04    1.02    1.00    1.01    1.01
 average       1.87    1.89    1.95    1.91    1.82    1.34
 geom. mean    9.32    8.56    7.74    4.40    3.14    1.42

2) dependent

                 10      50     100     500    1000    5000
 -----------------------------------------------------------
 current       1.00    1.06    1.05    1.02    1.02    1.00
 new          88.00   82.10   64.72   19.91   10.02    2.00
 average      44.50   41.58   32.88   10.46    5.52    1.50
 geom. mean    9.38    9.33    8.24    4.50    3.19    1.42

So the geometric mean seems to work better than the plain average. But
I don't think we can simply conclude that we should use the geometric mean
of the estimates, as that assumes the risks of over- and under-estimating
the cardinality are the same. Because they aren't.
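To see concretely why the geometric mean sits between the two estimates in
ratio terms, take two estimates two orders of magnitude apart (a
self-contained illustration, not code from the patch):

#include <math.h>
#include <stdio.h>

int
main(void)
{
	double		lo = 100;		/* the under-estimate */
	double		hi = 10000;		/* the over-estimate */

	/* the arithmetic mean stays close to the huge value ... */
	printf("arithmetic mean = %g\n", (lo + hi) / 2.0);	/* 5050 */

	/* ... while the geometric mean splits the ratio error evenly */
	printf("geometric mean  = %g\n", sqrt(lo * hi));	/* 1000 */

	return 0;
}

If the actual value is 100, the arithmetic mean is off by a factor of ~50
and the geometric mean by a factor of 10; if the actual value is 10000, the
factors are ~2 and 10 respectively, which matches the pattern in the
ratio-error tables above.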

> 
> Yes, that is what I proposed upthread.  I'm not wedded to that, though.
> In general, I am with Tomas on this one, believing that his estimate
> will be much better than the current estimate.  But I believe the *best*
> estimate will be somewhere between his and the current one, and I'm
> fishing for any decent way of coming up with a weighted average that
> is closer to his than to the current, but not simply equal to his proposal.
> 
> The reason I want the formula to be closer to Tomas's than to the
> current is that I think that on average, across all tables, across all
> databases, in practice it will be closer to the right estimate than the
> current formula.  That's just my intuition, and so I can't defend it.
> But if my intuition is right, the best formula we can adopt would be one
> that is moderated from his by a little bit, and in the direction of the
> estimate that the current code generates.

I think looking merely at what fraction of data sets (or queries) uses
dependent GROUP BY and WHERE clauses is not sufficient.

The general behavior of the two GROUP BY estimators is that the current
one tends to under-estimate, while the new one tends to over-estimate.
(It's not difficult to construct counter-examples by using more complex
dependencies between the columns etc. but let's ignore that.)

The risk associated with over-estimation is that we may switch from
HashAggregate to GroupAggregate, and generally select paths better
suited for a large number of rows. Not great

Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-03-04 Thread Amit Kapila
On Fri, Mar 4, 2016 at 4:01 PM, Alexander Korotkov 
wrote:

> On Fri, Mar 4, 2016 at 7:05 AM, Amit Kapila 
> wrote:
>
>> > I think the wait event types should be documented - and the wait
>> > events too, perhaps.
>> >
>>
>> As discussed upthread, I have added documentation for all the possible
>> wait events and an example.  Some of the LWLocks like AsyncQueueLock and
>> AsyncCtlLock are used for quite similar purposes, so I have kept their
>> explanations the same.
>>
>
> Do you think it's worth grouping the rows in the "wait_event Description"
> table by wait event type?
>

They are already grouped (implicitly); do you mean that we should add
the wait event type name to that table as well? If so, the only slight
worry is that there will be a lot of repetition in the wait_event_type
column; otherwise it is okay.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-03-04 Thread Thom Brown
On 4 March 2016 at 04:05, Amit Kapila  wrote:
> On Wed, Feb 24, 2016 at 7:14 PM, Robert Haas  wrote:
>>
>> On Mon, Feb 22, 2016 at 10:05 AM, Amit Kapila 
>> wrote:
>> >> I wouldn't bother tinkering with it at this point.  The value isn't
>> >> going to be recorded on disk anywhere, so it will be easy to change
>> >> the way it's computed in the future if we ever need to do that.
>> >>
>> >
>> > Okay. Find the rebased patch attached with this mail.  I have moved
>> > this patch to upcoming CF.
>>
>> I would call the functions pgstat_report_wait_start() and
>> pgstat_report_wait_end() instead of pgstat_report_start_waiting() and
>> pgstat_report_end_waiting().
>>
>
> Changed as per suggestion and made these functions inline.
>
>> I think pgstat_get_wait_event_type should not return HWLock, a term
>> that appears nowhere in our source tree at present.  How about just
>> "Lock"?
>>
>
> Changed as per suggestion.
>
>> I think the wait event types should be documented - and the wait
>> events too, perhaps.
>>
>
> As discussed upthread, I have added documentation for all the possible wait
> events and an example.  Some of the LWLocks like AsyncQueueLock and
> AsyncCtlLock are used for quite similar purposes, so I have kept their
> explanations the same.
>
>> Maybe it's worth having separate wait event type names for lwlocks and
>> lwlock tranches.  We could report LWLockNamed and LWLockTranche and
>> document the difference: "LWLockNamed indicates that the backend is
>> waiting for a specific, named LWLock.  The event name is the name of
>> that lock.  LWLockTranche indicates that the backend is waiting for
>> any one of a group of locks with similar function.  The event name
>> identifies the general purpose of locks in that group."
>>
>
> Changed as per suggestion.
>
>> There's no requirement that every session have every tranche
>> registered.  I think we should consider displaying "extension" for any
>> tranche that's not built-in, or at least for tranches that are not
>> registered (rather than "unknown wait event").
>>
>> +   if (lock->tranche == 0 && lockId < NUM_INDIVIDUAL_LWLOCKS)
>>
>> Isn't the second part automatically true at this point?
>>
>
> Fixed.
>
>> The changes to LockBufferForCleanup() don't look right to me.  Waiting
>> for a buffer pin might be a reasonable thing to define as a wait
>> event, but it shouldn't reported as if we were waiting on the LWLock
>> itself.
>>
>
> As discussed upthread, added a new wait event BufferPin for this case.
>
>> What happens if an error is thrown while we're in a wait?
>>
>
> As discussed upthread, added it in AbortTransaction and wherever
> LWLockReleaseAll() gets called; the point to note is that we can call this
> function only when we are sure there is no further possibility of waiting
> on an LWLock.
>
>> Does this patch hurt performance?
>>
>
> Performance tests are underway.

I've attached a revised version of the patch with the following corrections:

+ 
+  LWLockTranche: The backend is waiting for any one of a
+  group of locks with similar function.  The wait_event
+  name for this type of wait identifies the general purpose of locks
+  in that group.
+ 

s/with similar/with a similar/

+
+ ControlFileLock
+ A server process is waiting to read or update the control file
+ or creation of a new WAL log file.
+

As the L in WAL stands for "log" anyway, I think the extra "log" word
can be removed.

+
+ RelCacheInitLock
+ A server process is waiting to read or write to relation cache
+ initialization file.
+

s/to relation/to the relation/

+
+ BtreeVacuumLock
+  A server process is waiting to read or update vacuum related
+  information for Btree index.
+

s/vacuum related/vacuum-related/
s/for Btree/for a Btree/

+
+ AutovacuumLock
+ A server process which could be autovacuum worker is waiting to
+ update or read current state of autovacuum workers.
+

s/could be autovacuum/could be that an autovacuum/
s/read current/read the current/

(I discussed other sources of wait with Amit offline, and he suggested
the autovacuum launcher, so I've added that in too)

+
+ AutovacuumScheduleLock
+ A server process is waiting on another process to ensure that
+ the table it has selected for vacuum still needs vacuum.
+ 
+

s/for vacuum/for a vacuum/
s/still needs vacuum/still needs vacuuming/

+
+ SyncScanLock
+ A server process is waiting to get the start location of scan
+ on table for synchronized scans.
+

s/of scan/of a scan/
s/on table/on a table/

+
+ SerializableFinishedListLock
+ A server process is waiting to access list of finished
+ serializable transactions.
+

s/to access list/to access the list/

+
+ Seriali

Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-03-04 Thread Alexander Korotkov
On Fri, Mar 4, 2016 at 4:20 PM, Amit Kapila  wrote:

> On Fri, Mar 4, 2016 at 4:01 PM, Alexander Korotkov 
> wrote:
>
>> Do you think it's worth grouping the rows in the "wait_event Description"
>> table by wait event type?
>>
>
> They are already grouped (implicitly); do you mean that we should add
> the wait event type name to that table as well?
>

Yes.


> If so, the only slight worry is that there will be a lot of repetition
> in the wait_event_type column; otherwise it is okay.
>

There is a morerows attribute of the entry tag in DocBook SGML; it
behaves like rowspan in HTML.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-03-04 Thread Thom Brown
On 4 March 2016 at 13:41, Alexander Korotkov  wrote:
> On Fri, Mar 4, 2016 at 4:20 PM, Amit Kapila wrote:
>> They are already grouped (implicitly); do you mean that we should add
>> the wait event type name to that table as well? If so, the only slight
>> worry is that there will be a lot of repetition in the wait_event_type
>> column; otherwise it is okay.
>
> There is a morerows attribute of the entry tag in DocBook SGML; it
> behaves like rowspan in HTML.

+1

Yes, we do this elsewhere in the docs.  And it is difficult to look
through the wait event names at the moment.

I'm also not keen on all the "A server process is waiting" all the way
down the list.

Thom




Re: [HACKERS] WIP: Upper planner pathification

2016-03-04 Thread Greg Stark
On Tue, Mar 1, 2016 at 3:02 PM, Tom Lane  wrote:
> Well, my point is that no such path would have been generated if the
> subquery hadn't had an internal reason to consider sorting on b.id.
> The "accidental" part of this is that the subquery's GROUP BY key
> matches what the outer query needs as a mergejoin key.

Hm. I can't seem to get it to generate such plans here. This is after
disabling hashjoin or else it doesn't want to do a sort at all:

postgres=# explain select * from (select * from v group by i) as v1
natural join (select * from v group by i) as v2;
                           QUERY PLAN
---------------------------------------------------------------------
 Merge Join  (cost=107.04..111.04 rows=200 width=4)
   Merge Cond: (v.i = v_1.i)
   ->  Sort  (cost=53.52..54.02 rows=200 width=4)
 Sort Key: v.i
 ->  HashAggregate  (cost=41.88..43.88 rows=200 width=4)
   Group Key: v.i
   ->  Seq Scan on v  (cost=0.00..35.50 rows=2550 width=4)
   ->  Sort  (cost=53.52..54.02 rows=200 width=4)
 Sort Key: v_1.i
 ->  HashAggregate  (cost=41.88..43.88 rows=200 width=4)
   Group Key: v_1.i
   ->  Seq Scan on v v_1  (cost=0.00..35.50 rows=2550 width=4)
(12 rows)

I'm trying to construct a torture case where it generates lots more
paths than HEAD. I don't think a percent or two on planning time is
significant but if there are cases where the planning time increases
quickly that would be something to code against.

-- 
greg




Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-03-04 Thread Robert Haas
On Fri, Mar 4, 2016 at 12:46 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> A couple of my colleagues have been looking into this.  It's not
>> entirely clear to me what's going on here yet, but it looks like the
>> stats get there if you wait long enough.  Rahila Syed was able to
>> reproduce the problem and says that the attached patch fixes it.  But
>> I don't quite understand why this should fix it.
>
> I don't like this patch much.  While the new function is not bad in
> itself, it looks really weird to call it immediately after the other
> wait function.  And the reason for that, AFAICT, is that somebody dropped
> the entire "truncation stats" test sequence into the middle of unrelated
> tests, evidently in the vain hope that that way they could piggyback
> on the existing wait.  Which these failures are showing us is wrong.
>
> I think we should move all the inserted logic down so that it's not in the
> middle of unrelated testing.

Sure.  If you have an idea what the right thing to do is, please go
ahead.  I actually don't have a clear idea what's going on here.  I
guess it's that the wait_for_stats() guarantees that the stats message
from the index insertion has been received but the status messages
from the "trunc" tables might not have gotten there yet.  I thought
maybe that works without parallelism because all of those messages are
coming from the same backend, and therefore if you have the later one
you must have all of the earlier ones, too.  But if you're running
some of the queries in parallel workers then it's possible for a stats
message from a worker run later to arrive later.

But it's not that after all, because when I run the regression tests
with the pg_sleep removed, I get this:

*** /Users/rhaas/pgsql/src/test/regress/expected/stats.out
2016-03-04 08:55:33.0 -0500
--- /Users/rhaas/pgsql/src/test/regress/results/stats.out
2016-03-04 09:00:29.0 -0500
***
*** 127,140 
   1
  (1 row)

- -- force the rate-limiting logic in pgstat_report_tabstat() to time out
- -- and send a message
- SELECT pg_sleep(1.0);
-  pg_sleep
- --
-
- (1 row)
-
  -- wait for stats collector to update
  SELECT wait_for_stats();
   wait_for_stats
--- 127,132 
***
*** 148,158 
   WHERE relname like 'trunc_stats_test%' order by relname;
relname  | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup
| n_dead_tup
  
---+---+---+---++
!  trunc_stats_test  | 3 | 0 | 0 |  0
|  0
!  trunc_stats_test1 | 4 | 2 | 1 |  1
|  0
!  trunc_stats_test2 | 1 | 0 | 0 |  1
|  0
!  trunc_stats_test3 | 4 | 0 | 0 |  2
|  2
!  trunc_stats_test4 | 2 | 0 | 0 |  0
|  2
  (5 rows)

  SELECT st.seq_scan >= pr.seq_scan + 1,
--- 140,150 
   WHERE relname like 'trunc_stats_test%' order by relname;
relname  | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup
| n_dead_tup
  
---+---+---+---++
!  trunc_stats_test  | 0 | 0 | 0 |  0
|  0
!  trunc_stats_test1 | 0 | 0 | 0 |  0
|  0
!  trunc_stats_test2 | 0 | 0 | 0 |  0
|  0
!  trunc_stats_test3 | 0 | 0 | 0 |  0
|  0
!  trunc_stats_test4 | 0 | 0 | 0 |  0
|  0
  (5 rows)

  SELECT st.seq_scan >= pr.seq_scan + 1,
***
*** 163,169 
   WHERE st.relname='tenk2' AND cl.relname='tenk2';
   ?column? | ?column? | ?column? | ?column?
  --+--+--+--
!  t| t| t| t
  (1 row)

  SELECT st.heap_blks_read + st.heap_blks_hit >= pr.heap_blks + cl.relpages,
--- 155,161 
   WHERE st.relname='tenk2' AND cl.relname='tenk2';
   ?column? | ?column? | ?column? | ?column?
  --+--+--+--
!  f| f| f| f
  (1 row)

  SELECT st.heap_blks_read + st.heap_blks_hit >= pr.heap_blks + cl.relpages,
***
*** 172,178 
   WHERE st.relname='tenk2' AND cl.relname='tenk2';
   ?column? | ?column?
  --+--
!  t| t
  (1 row)

  SELECT pr.snap_ts < pg_stat_get_snapshot_timestamp() as snapshot_newer
--- 164,170 
   WHERE st.relname='tenk2' AND cl.relname='tenk2';
   ?column? | ?column?
  --+--
!  t| f
  (1 row)

  SELECT pr.snap_ts < pg_stat_get_snapshot_timestamp() as snapshot_newer

That looks suspiciously similar to the failure we're getting with the
force_parallel_mode testing, but I'm still confused.

>> BTW, this comment is obsolete:
>
>> -- force the rate-limiting logic in pgstat_report_tabstat() to time out
>> -- and send a message
>> SELECT pg_sleep(1.

Re: [HACKERS] VS 2015 support in src/tools/msvc

2016-03-04 Thread Michael Paquier
On Fri, Mar 4, 2016 at 3:54 PM, Michael Paquier
 wrote:
> I still need to dig into that in more details. For the time being the
> patch attached is useful IMO to plug in VS 2015 with the existing
> infrastructure. So if anybody has a Windows environment, feel free to
> play with it and dig into those problems. I'll update this thread once
> I have a more advanced status.

OK, attached are a set of patches that allowed me to compile Postgres
using VS2015. In more detail:
- 0001, as mentioned by Petr upthread, psed is removed from the core
distribution of Perl in 5.22, so when installing ActivePerl it is not
possible to create probes.h, and the code compilation would fail. I
bumped into that so here is a patch. What I am proposing here is to
replace psed by sed, sed being available in MSYS like bison and flex,
so when building using MSVC the environment to set up is normally
already good to go even with this additional dependency. Now, it is
important to mention that probes.h is not part of a source tarball. I
think that we would want probes.h to be part of a source tarball so that
it would be possible to compile the code on Windows using MSVC without
having to install MSYS. I haven't done that in this patch; thoughts on
the matter are welcome.
- 0002, which adds support for VS2015 in src/tools/scripts
- 0003, to address a compilation failure that I bumped into when
compiling ecpg. In src/port, TIMEZONE_GLOBAL and TZNAME_GLOBAL refer
respectively to timezone and tzname; for win32, however, those should
be _timezone and _tzname (see the sketch after this list). See here:
https://msdn.microsoft.com/en-us/library/htb3tdkc.aspx
- 0004, which addresses the problem of the missing lc_codepage
from locale.h in src/port/. I have been pondering the use of
fancier routines like GetLocaleInfoEx, as mentioned by Petr
upthread. However, I think that we had better avoid any kind of
complication and just fall back to the old code path when _MSC_VER
>= 1900. We could always reuse lc_codepage if it gets reintroduced in
a future version of VS.
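Concretely, the 0003 fix amounts to a mapping along these lines (a sketch
of what is described above, not the patch's literal contents):

/* Map the portability macros to the MSVC spellings on Windows */
#ifdef WIN32
#define TIMEZONE_GLOBAL _timezone
#define TZNAME_GLOBAL	_tzname
#else
#define TIMEZONE_GLOBAL timezone
#define TZNAME_GLOBAL	tzname
#endif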

This set of patches is clearly a work in progress, but I am at the
point where feedback is welcome, and the code compiles. The issue
with psed is something I think could be backpatched a bit further than the
VS2015 core patches, since support for Perl 5.22 is needed on all the
supported branches.
Note that I did not bump into the stdbool issues. Note as well that VS
has complained about a couple of warnings. I am attaching them in the
file named VS2015_warnings.txt. Those are quite interesting as well.
-- 
Michael
"C:\Users\IEUser\git\postgres\pgsql.sln" (default target) (1) ->
"C:\Users\IEUser\git\postgres\ascii_and_mic.vcxproj" (default target) (2) ->
"C:\Users\IEUser\git\postgres\postgres.vcxproj" (default target) (3) ->
(ClCompile target) ->
  src/backend/libpq/ip.c(476): warning C4996: 'WSASocketA': Use WSASocketW() 
instead or define _WINSOCK_DEPRECATED_NO_W
ARNINGS to disable deprecated API warnings 
[C:\Users\IEUser\git\postgres\postgres.vcxproj]
  C:\Program Files (x86)\Windows Kits\8.1\Include\um\dbghelp.h(1544): warning 
C4091: 'typedef ': ignored on left of ''
when no variable is declared (compiling source file 
src/backend/port/win32/crashdump.c) [C:\Users\IEUser\git\postgres\p
ostgres.vcxproj]
  C:\Program Files (x86)\Windows Kits\8.1\Include\um\dbghelp.h(3190): warning 
C4091: 'typedef ': ignored on left of ''
when no variable is declared (compiling source file 
src/backend/port/win32/crashdump.c) [C:\Users\IEUser\git\postgres\p
ostgres.vcxproj]
  src/backend/port/win32/socket.c(247): warning C4996: 'WSASocketA': Use 
WSASocketW() instead or define _WINSOCK_DEPREC
ATED_NO_WARNINGS to disable deprecated API warnings 
[C:\Users\IEUser\git\postgres\postgres.vcxproj]
  src/backend/postmaster/postmaster.c(5892): warning C4996: 
'WSADuplicateSocketA': Use WSADuplicateSocketW() instead or
 define _WINSOCK_DEPRECATED_NO_WARNINGS to disable deprecated API warnings 
[C:\Users\IEUser\git\postgres\postgres.vcxpr
oj]
  src/backend/postmaster/postmaster.c(5919): warning C4996: 'WSASocketA': Use 
WSASocketW() instead or define _WINSOCK_D
EPRECATED_NO_WARNINGS to disable deprecated API warnings 
[C:\Users\IEUser\git\postgres\postgres.vcxproj]
  src/port/getaddrinfo.c(199): warning C4996: 'gethostbyname': Use 
getaddrinfo() or GetAddrInfoW() instead or define _W
INSOCK_DEPRECATED_NO_WARNINGS to disable deprecated API warnings 
[C:\Users\IEUser\git\postgres\postgres.vcxproj]
  src/port/thread.c(134): warning C4996: 'gethostbyname': Use getaddrinfo() or 
GetAddrInfoW() instead or define _WINSOC
K_DEPRECATED_NO_WARNINGS to disable deprecated API warnings 
[C:\Users\IEUser\git\postgres\postgres.vcxproj]


"C:\Users\IEUser\git\postgres\pgsql.sln" (default target) (1) ->
"C:\Users\IEUser\git\postgres\latin2_and_win1250.vcxproj" (default target) (9) 
->
(PrepareForBuild target) ->
  C:\Program Files 
(x86)\MSBuild\Microsoft.Cpp\v4.0\V140\Microsoft.CppBuild.targets(392,5): 
warning MSB8028: The interm
ediate directory (.\Re

Re: [HACKERS] raw output from copy

2016-03-04 Thread Daniel Verite
Corey Huinker wrote:

> So, for me, RAW is the right solution, or at least *a* right solution.

Questions on how to extract from a bytea column come up on a regular
basis, as in [1] [2] [3], or [4] a few days ago, and so far the answers
are to encode the contents in text and decode them in an additional
step, or use COPY BINARY and filter out the headers.

But none of this is as straightforward and efficient as the proposed
COPY RAW.
Also the conversion to text can't be used at all on very large
contents (>512MB), as mentioned in another recent thread [5]
(this is the same reason why pg_dump can't dump such rows),
but COPY RAW doesn't have this limitation.

Technically COPY BINARY should be sufficient, but it seems that
people dislike having to deal with its headers.
Also it's not supported by any of the drivers of popular
script languages that otherwise provide COPY in text format
(DBD::Pg, php, psycopg2...)
Maybe the RAW format would have a better chance to get support
there, because of its simplicity.

[1]
http://www.postgresql.org/message-id/038517CEB6DE43BD8422D7947B6BE8D8@fanlijing

[2] http://www.postgresql.org/message-id/[email protected]

[3] http://stackoverflow.com/questions/6730729

[4] http://www.postgresql.org/message-id/[email protected]

[5] http://www.postgresql.org/message-id/[email protected]


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: [HACKERS] VS 2015 support in src/tools/msvc

2016-03-04 Thread Alvaro Herrera
Michael Paquier wrote:

> What I am proposing here is to replace psed by sed, sed being
> available in MSYS like bison and flex, so when building using MSVC
> the environment to set up is normally already good to go even with
> this additional dependency.

I think the path of least resistance is to change the sed script into a
Perl script.  Should be fairly simple ...

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] pgbench small bug fix

2016-03-04 Thread Alvaro Herrera
Aleksander Alekseev wrote:
> > Attached is a v3 which tests integers more logically. I'm a lazy
> > programmer who tends to minimize the number of keystrokes.
> 
> Well. From what I can tell this patch is Ready for Committer. 

I'm not a fan of this approach either.  Would it be too complicated if
we had a global variable that indicates which thread is the progress
reporter?  We start that with thread 0, but if the reporter thread
finishes its transactions then it elects some other thread which hasn't
yet finished.  For this to work, each thread would have to maintain in a
global variable whether it has finished or not.
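A rough sketch of that idea (hypothetical names and structure, ignoring
locking for brevity; not actual pgbench code):

#include <stdbool.h>

#define MAX_THREADS 64			/* illustrative cap, not pgbench's */

static volatile int reporter_thread = 0;	/* thread 0 reports first */
static volatile bool thread_finished[MAX_THREADS];

/* Called by a thread once it has run all of its transactions. */
static void
mark_finished(int me, int nthreads)
{
	thread_finished[me] = true;
	if (reporter_thread == me)
	{
		/* elect some thread that hasn't finished yet, if any */
		int			i;

		for (i = 0; i < nthreads; i++)
		{
			if (!thread_finished[i])
			{
				reporter_thread = i;
				break;
			}
		}
	}
}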

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-03-04 Thread Tom Lane
Robert Haas  writes:
> Sure.  If you have an idea what the right thing to do is, please go
> ahead.

Yeah, I'll modify the patch and commit sometime later today.

> I actually don't have a clear idea what's going on here.  I
> guess it's that the wait_for_stats() guarantees that the stats message
> from the index insertion has been received but the stats messages
> from the "trunc" tables might not have gotten there yet.

That's what it looks like to me.  I now think that the apparent
connection to parallel query is a mirage.  The reason we've only
seen a few cases so far is that the flapping test is new: it
was added in commit d42358efb16cc811, on 20 Feb.  If we left it
as-is, I think we'd eventually see the same failure without forcing
parallel mode.  In fact, that's pretty much what you describe below,
isn't it?  The pg_sleep is sort of half-bakedly substituting for
a proper wait.

regards, tom lane




Re: [HACKERS] pg_resetxlog reference page reorganization

2016-03-04 Thread Alexander Korotkov
Hi, Peter!

I've assigned myself as reviewer of this patch.

On Tue, Mar 1, 2016 at 2:53 AM, Peter Eisentraut  wrote:

> The pg_resetxlog reference page has grown over the years into an
> unnavigable jungle, so here is a patch that reorganizes it to be more in
> the style of the other ref pages, with a normal options list.
>

Patch applies cleanly on head, documentation compiles with no problem.
pg_resetxlog page definitely looks much better than it was before.
I don't see any problems or issues with this patch.
So, I mark it "Ready for committer".

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] WAL log only necessary part of 2PC GID

2016-03-04 Thread Jesper Pedersen

On 02/29/2016 08:45 AM, Pavan Deolasee wrote:

Hello Hackers,

The maximum size of the GID, used as a 2PC identifier, is currently defined
as 200 bytes (see src/backend/access/transam/twophase.c). The actual GID
used by applications, though, may be much smaller than that. So IMO,
instead of WAL-logging the entire 200 bytes during PREPARE TRANSACTION, we
should just WAL-log strlen(gid) bytes.
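
A compilable sketch of the idea -- the struct here is an illustrative
stand-in, not the actual TwoPhaseFileHeader from twophase.c:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Simplified record layout, for illustration only. */
typedef struct
{
    uint32_t    total_len;      /* length of the whole record */
    uint16_t    gidlen;         /* bytes of GID actually stored */
} SketchHeader;

int
main(void)
{
    const char *gid = "1.42";   /* a typical short application GID */
    SketchHeader hdr;

    /* Log only the used part of the GID, not a fixed 200-byte buffer. */
    hdr.gidlen = (uint16_t) (strlen(gid) + 1);
    hdr.total_len = (uint32_t) (sizeof(hdr) + hdr.gidlen);

    printf("record: %u bytes instead of %u\n",
           hdr.total_len, (unsigned) (sizeof(hdr) + 200));
    return 0;
}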

The attached patch does that. The changes are limited to twophase.c, and
some simple crash recovery tests seem to work OK. In terms of
performance, a quick test shows a marginal improvement in tps using the
script that Stas Kelvich used for his work on speeding up twophase
transactions. The only change I made is to keep the :scale unchanged,
because increasing the :scale in every iteration would result in only a
handful of updates (not sure why Stas had that in his original script)

\set naccounts 10 * :scale
\setrandom from_aid 1 :naccounts
\setrandom to_aid 1 :naccounts
\setrandom delta 1 100
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance - :delta WHERE aid =
:from_aid;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid =
:to_aid;
PREPARE TRANSACTION ':client_id.:scale';
COMMIT PREPARED ':client_id.:scale';

The amount of WAL generated during a 60s run shows a decline of about 25%
with default settings except full_page_writes which is turned off.

HEAD: 861 WAL bytes / transaction
PATCH: 670 WAL bytes / transaction

Actually, the above numbers probably include a lot of WAL generated because
of HOT pruning and page defragmentation. If we just look at the WAL
overhead caused by 2PC, the decline is somewhere close to 50%. I took
numbers using simple 1PC for reference and to understand the overhead of
2PC.

HEAD (1PC): 382 bytes / transaction



I can confirm the marginal speed up in tps due to the new WAL size.

The TWOPHASE_MAGIC constant should be changed, as the file header
definition has changed, right?


Thanks for working on this!

Best regards,
 Jesper





Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-03-04 Thread Alvaro Herrera
Tom Lane wrote:
> Robert Haas  writes:
> > Sure.  If you have an idea what the right thing to do is, please go
> > ahead.
> 
> Yeah, I'll modify the patch and commit sometime later today.
> 
> > I actually don't have a clear idea what's going on here.  I
> > guess it's that the wait_for_stats() guarantees that the stats message
> > from the index insertion has been received but the stats messages
> > from the "trunc" tables might not have gotten there yet.
> 
> That's what it looks like to me.  I now think that the apparent
> connection to parallel query is a mirage.  The reason we've only
> seen a few cases so far is that the flapping test is new: it
> was added in commit d42358efb16cc811, on 20 Feb.  If we left it
> as-is, I think we'd eventually see the same failure without forcing
> parallel mode.  In fact, that's pretty much what you describe below,
> isn't it?  The pg_sleep is sort of half-bakedly substituting for
> a proper wait.

It was added on Feb 20 all right, but of *last year*.  It's been there
working happily for a year now.

The reason I added the trunc test in the middle of the index update
tests is that I dislike tests that sleep for long without real purpose;
it seems pretty reasonable to me to have both sleeps actually be the
same wait.

Instead of adding another sleep function, another possibility is to add
two booleans, one for the index counter and another for the truncate
counters, and only terminate the sleep if both are true.  I don't see
any reason to make this test any slower than it already is.

-- 
Álvaro Herrera        http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] silent data loss with ext4 / all current versions

2016-03-04 Thread Alvaro Herrera
I would like to have a patch for this finalized today, so that we can
apply to master before or during the weekend; with it in the tree for
about a week we can be more confident and backpatch close to next
weekend, so that we see it in the next set of minor releases.  Does that
sound good?

-- 
Álvaro Herrera        http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] syslog configurable line splitting behavior

2016-03-04 Thread Alexander Korotkov
On Sat, Feb 27, 2016 at 6:49 AM, Peter Eisentraut  wrote:

> Writing log messages to syslog caters to ancient syslog implementations
> in two ways:
>
> - sequence numbers
> - line splitting
>
> While these are arguably reasonable defaults, I would like a way to turn
> them off, because they get in the way of doing more interesting things
> with syslog (e.g., logging somewhere that is not just a text file).
>
> So I propose the two attached patches that introduce new configuration
> Boolean parameters syslog_sequence_numbers and syslog_split_lines that
> can toggle these behaviors.
>

Would it be useful to make PG_SYSLOG_LIMIT configurable (-1 to disable it)
instead of introducing booleans?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-03-04 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> That's what it looks like to me.  I now think that the apparent
>> connection to parallel query is a mirage.  The reason we've only
>> seen a few cases so far is that the flapping test is new: it
>> was added in commit d42358efb16cc811, on 20 Feb.

> It was added on Feb 20 all right, but of *last year*.  It's been there
> working happily for a year now.

Wup, you're right, failed to look closely enough at the commit log
entry.  So that puts us back to wondering why exactly parallel query
is triggering this.  Still, Robert's experiment with removing the
pg_sleep seems fairly conclusive: it is possible to get the failure
without parallel query.

> Instead of adding another sleep function, another possibility is to add
> two booleans, one for the index counter and another for the truncate
> counters, and only terminate the sleep if both are true.  I don't see
> any reason to make this test any slower than it already is.

Well, that would make the function more complicated, but maybe it's a
better answer.  On the other hand, we know that the stats updates are
delivered in a deterministic order, so why not simply replace the
existing test in the wait function with one that looks for the truncation
updates?  If we've gotten those, we must have gotten the earlier ones.

In any case, the real answer to making the test less slow is to get rid of
that vestigial pg_sleep.  I'm wondering why we failed to remove that when
we put in the wait_for_stats function...

regards, tom lane




Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-03-04 Thread Robert Haas
On Fri, Mar 4, 2016 at 10:33 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> Sure.  If you have an idea what the right thing to do is, please go
>> ahead.
>
> Yeah, I'll modify the patch and commit sometime later today.

OK, if you're basing that on the patch I sent upthread, please credit
Rahila Syed as the original author of that code.  (I modified it
before posting, but only trivially.)  Of course if you do something
else, then never mind.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Timeline following for logical slots

2016-03-04 Thread Craig Ringer
On 1 March 2016 at 21:00, Craig Ringer  wrote:

> Hi all
>
> Per discussion on the failover slots thread (
> https://commitfest.postgresql.org/9/488/) I'm splitting timeline
> following for logical slots into its own separate patch.
>
>
I've updated the logical decoding timeline following patch to fix a bug,
found during test development, related to how Pg renames the last WAL
segment on the old timeline with a .partial suffix on promotion. The
xlogreader must switch to reading from the newest-timeline version of a
given segment eagerly, for the first page of the segment, since that's the
only one guaranteed to actually exist.

I'd really appreciate some review of the logic there by people who know
timelines well and preferably know the xlogreader. It's really just one
function and two or three comments; the code is simple but the reasoning leading to
it is not.


I've also attached an updated version of the tests posted a few days ago.
The tests depend on the remaining patches from the TAP enhancements tree so
it's easiest to just get the whole tree from
https://github.com/2ndQuadrant/postgres/tree/dev/logical-decoding-timeline-following
(subject to regular rebases and force pushes, do not use as a base).

The tests now include a test module that exposes some slot guts to SQL to
allow the client to sync slot state from master to replica(s) without
needing failover slots and the use of extra WAL as transport. It's very
much for-testing-only.

The new test module is used by a second round of tests to demonstrate the
practicality of failover of a logical replication client to a physical
replica using a base backup taken by pg_basebackup and without the presence
of failover slots. I won't pretend it's pretty.

This proves that the approach works barring unforeseen showstoppers. It also
proves it's pretty ugly - failover slots provide a much, MUCH simpler and
safer way for clients to achieve this with way less custom code needed by
each client to sync slot state.

I've got a bit of cleanup to do in the test suite and a few more tests to
write for cases where the slot on the replica is allowed to fall behind the
slot on the master, but this is mostly waiting on the remaining two TAP test
patches before it can be evaluated for possible push.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 37bd2e654345af65749ccff6ca73d3afebf67072 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Thu, 11 Feb 2016 10:44:14 +0800
Subject: [PATCH 1/2] Allow logical slots to follow timeline switches

Make logical replication slots timeline-aware, so replay can
continue from a historical timeline onto the server's current
timeline.

This is required to make failover slots possible and may also
be used by extensions that CreateReplicationSlot on a standby
and replay from that slot once the replica is promoted.

This does NOT add support for replaying from a logical slot on
a standby or for syncing slots to replicas.
---
 src/backend/access/transam/xlogreader.c|  43 -
 src/backend/access/transam/xlogutils.c | 240 +++--
 src/backend/replication/logical/logicalfuncs.c |  38 +++-
 src/include/access/xlogreader.h|  35 +++-
 src/include/access/xlogutils.h |   2 +
 5 files changed, 323 insertions(+), 35 deletions(-)

diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index fcb0872..5899f44 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -10,6 +10,9 @@
  *
  * NOTES
  *		See xlogreader.h for more notes on this facility.
+ *
+ * 		The xlogreader is compiled as both front-end and backend code so
+ * 		it may not use elog, server-defined static variables, etc.
  *-
  */
 
@@ -116,6 +119,9 @@ XLogReaderAllocate(XLogPageReadCB pagereadfunc, void *private_data)
 		return NULL;
 	}
 
+	/* Will be loaded on first read */
+	state->timelineHistory = NULL;
+
 	return state;
 }
 
@@ -135,6 +141,13 @@ XLogReaderFree(XLogReaderState *state)
 	pfree(state->errormsg_buf);
 	if (state->readRecordBuf)
 		pfree(state->readRecordBuf);
+#ifdef FRONTEND
+	/* FE code doesn't use this and we can't list_free_deep on FE */
+	Assert(state->timelineHistory == NULL);
+#else
+	if (state->timelineHistory)
+		list_free_deep(state->timelineHistory);
+#endif
 	pfree(state->readBuf);
 	pfree(state);
 }
@@ -208,9 +221,11 @@ XLogReadRecord(XLogReaderState *state, XLogRecPtr RecPtr, char **errormsg)
 
 	if (RecPtr == InvalidXLogRecPtr)
 	{
+		/* No explicit start point, read the record after the one we just read */
 		RecPtr = state->EndRecPtr;
 
 		if (state->ReadRecPtr == InvalidXLogRecPtr)
+			/* allow readPageTLI to go backward */
 			randAccess = true;
 
 		/*
@@ -223,6 +238,8 @@ XLogReadRecord(XLogReaderState *state, XLogRecPtr RecPtr, char **errormsg)
 	else
 	

Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-03-04 Thread Robert Haas
On Fri, Mar 4, 2016 at 11:03 AM, Tom Lane  wrote:
> Alvaro Herrera  writes:
>> Tom Lane wrote:
>>> That's what it looks like to me.  I now think that the apparent
>>> connection to parallel query is a mirage.  The reason we've only
>>> seen a few cases so far is that the flapping test is new: it
>>> wad added in commit d42358efb16cc811, on 20 Feb.
>
>> It was added on Feb 20 all right, but of *last year*.  It's been there
>> working happily for a year now.
>
> Wup, you're right, failed to look closely enough at the commit log
> entry.  So that puts us back to wondering why exactly parallel query
> is triggering this.  Still, Robert's experiment with removing the
> pg_sleep seems fairly conclusive: it is possible to get the failure
> without parallel query.
>
>> Instead of adding another sleep function, another possibility is to add
>> two booleans, one for the index counter and another for the truncate
>> counters, and only terminate the sleep if both are true.  I don't see
>> any reason to make this test any slower than it already is.
>
> Well, that would make the function more complicated, but maybe it's a
> better answer.  On the other hand, we know that the stats updates are
> delivered in a deterministic order, so why not simply replace the
> existing test in the wait function with one that looks for the truncation
> updates?  If we've gotten those, we must have gotten the earlier ones.

I'm not sure if that's actually true with parallel mode.  I'm pretty
sure the earlier workers will have terminated before the later ones
start, but is that enough to guarantee that the stats collector sees
the messages in that order?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] silent data loss with ext4 / all current versions

2016-03-04 Thread Tom Lane
Alvaro Herrera  writes:
> I would like to have a patch for this finalized today, so that we can
> apply to master before or during the weekend; with it in the tree for
> about a week we can be more confident and backpatch close to next
> weekend, so that we see it in the next set of minor releases.  Does that
> sound good?

I see no reason to wait before backpatching.  If you're concerned about
having testing, the more branches it is in, the more buildfarm cycles
you will get on it.  And we're not going to cut any releases in between,
so what's the benefit of not having it there?

regards, tom lane




Re: [HACKERS] Equivalent of --enable-tap-tests in MSVC scripts

2016-03-04 Thread Alvaro Herrera
Craig Ringer wrote:

> If it's the result of perltidy changing its mind about the formatting as a
> result of this change I guess we have to eyeroll and live with it. perltidy
> leaves the file alone as it is in the tree currently, so be it.
> 
> Gripe withdrawn, ready for committer IMO

Okay, thanks.  I applied it back to 9.4, which is when
--enable-tap-tests appeared.  I didn't perltidy 9.4's config_default.pl,
though.

-- 
Álvaro Herrera        http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-03-04 Thread Alvaro Herrera
Robert Haas wrote:

> I'm not sure if that's actually true with parallel mode.  I'm pretty
> sure the earlier workers will have terminated before the later ones
> start, but is that enough to guarantee that the stats collector sees
> the messages in that order?

Um.  So if you have two queries that run in sequence, it's possible
for workers of the first query to be still running when workers for the
second query finish?  That would be very strange.

If that's not what you're saying, I don't understand what guarantees you
say we don't have.

-- 
Álvaro Herrera        http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] WIP: Failover Slots

2016-03-04 Thread Craig Ringer
On 24 February 2016 at 18:02, Craig Ringer  wrote:


> I really want to focus on the first patch, timeline following for logical
> slots. That part is much less invasive and is useful stand-alone. I'll move
> it to a separate CF entry and post it to a separate thread as I think it
> needs consideration independently of failover slots.
>

Just an update on the failover slots status: I've moved timeline following
for logical slots into its own patch set and CF entry and added a bunch of
tests.

https://commitfest.postgresql.org/9/488/

Some perl TAP test framework enhancements were needed for that; they're
mostly committed now with a few pending.

https://commitfest.postgresql.org/9/569/

Once some final changes are made to the tests for timeline following I'll
address the checkpoint issue in failover slots by doing the checkpoint of
slots at the start of a checkpoint/restartpoint, while we can still write
WAL. Per the comments in CheckPointReplicationSlots it's mostly done in a
checkpoint currently for convenience.

Then I'll write some TAP tests for failover slots and submit an updated
patch for them, by which time hopefully timeline following for logical
slots will be committed.

In other words, this patch isn't dead; the foundations are just being
rebased out from under it.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Equivalent of --enable-tap-tests in MSVC scripts

2016-03-04 Thread Craig Ringer
On 5 March 2016 at 00:10, Alvaro Herrera  wrote:

> Craig Ringer wrote:
>
> > If it's the result of perltidy changing its mind about the formatting as a
> > result of this change I guess we have to eyeroll and live with it.
> > perltidy leaves the file alone as it is in the tree currently, so be it.
> >
> > Gripe withdrawn, ready for committer IMO
>
> Okay, thanks.  I applied it back to 9.4, which is when
> --enable-tap-tests appeared.  I didn't perltidy 9.4's config_default.pl,
> though.


Thanks very much. It didn't occur to me to backport it, but it seems
harmless.

https://commitfest.postgresql.org/9/566/ marked as committed.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-03-04 Thread Tom Lane
Robert Haas  writes:
> On Fri, Mar 4, 2016 at 11:03 AM, Tom Lane  wrote:
>> Well, that would make the function more complicated, but maybe it's a
>> better answer.  On the other hand, we know that the stats updates are
>> delivered in a deterministic order, so why not simply replace the
>> existing test in the wait function with one that looks for the truncation
>> updates?  If we've gotten those, we must have gotten the earlier ones.

> I'm not sure if that's actually true with parallel mode.  I'm pretty
> sure the earlier workers will have terminated before the later ones
> start, but is that enough to guarantee that the stats collector sees
> the messages in that order?

Huh?  Parallel workers are read-only; what would they be doing sending
any of these messages?

regards, tom lane




Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-03-04 Thread Robert Haas
On Fri, Mar 4, 2016 at 11:17 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Fri, Mar 4, 2016 at 11:03 AM, Tom Lane  wrote:
>>> Well, that would make the function more complicated, but maybe it's a
>>> better answer.  On the other hand, we know that the stats updates are
>>> delivered in a deterministic order, so why not simply replace the
>>> existing test in the wait function with one that looks for the truncation
>>> updates?  If we've gotten those, we must have gotten the earlier ones.
>
>> I'm not sure if that's actually true with parallel mode.  I'm pretty
>> sure the earlier workers will have terminated before the later ones
>> start, but is that enough to guarantee that the stats collector sees
>> the messages in that order?
>
> Huh?  Parallel workers are read-only; what would they be doing sending
> any of these messages?

Mumble.  I have no idea what's happening here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS][PATCH] Supporting +-Infinity values by to_timestamp(float8)

2016-03-04 Thread Anastasia Lubennikova

27.02.2016 09:57, Vitaly Burovoy:

Hello, Hackers!

I worked on a patch[1] allows "EXTRACT(epoch FROM
+-Inf::timestamp[tz])" to return "+-Inf::float8".
There is an opposite function, "to_timestamp(float8)", which is now defined as:
SELECT ('epoch'::timestamptz + $1 * '1 second'::interval)


Hi,
thank you for the patches.
Could you explain whether they depend on each other?


Since intervals do not support infinity values, it is impossible to do
something like:

SELECT to_timestamp('infinity'::float8);

... which is not good.

Support for such a conversion is on the TODO list[2] (under "converting
between infinity timestamp and float8").


You mention intervals here, and the TODO item definitely talks about an
'infinity' interval, while the patch and all the following discussion
concern timestamps.
Is it a typo, or have I misunderstood something important?
I assumed that the following query would work, but it doesn't. Could you
clarify that?

select to_timestamp('infinity'::interval);


The proposed patch implements it.
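
For reference, a minimal sketch of such a mapping (stand-in names;
PostgreSQL's real sentinels are DT_NOBEGIN and DT_NOEND, and the real
function must also apply the epoch offset and the range checks for finite
input, both elided here):

#include <math.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-ins for PostgreSQL's timestamp sentinels DT_NOBEGIN/DT_NOEND. */
#define SKETCH_NOBEGIN  INT64_MIN
#define SKETCH_NOEND    INT64_MAX

/* Map infinities straight to the sentinel values instead of doing
 * interval arithmetic, which cannot represent infinity. */
static int64_t
float8_to_timestamp_sketch(double seconds)
{
    if (isinf(seconds))
        return (seconds < 0) ? SKETCH_NOBEGIN : SKETCH_NOEND;
    return (int64_t) (seconds * 1000000.0);  /* microseconds */
}

int
main(void)
{
    printf("%lld\n", (long long) float8_to_timestamp_sketch(INFINITY));
    return 0;
}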

There is another patch in the 2016-03 CF[3] that implements checking that a
timestamp[tz] is within the allowed range. Since it is wise to fix the
upper boundary of timestamp[tz]s, I've included the file
"src/include/datatype/timestamp.h" from there to check that the input
value and the result are in the allowed range.

There are no changes in the documentation because the allowed range is the
same as the officially supported one[4] (i.e. until 294277 AD).


I think that you should update the documentation, at least the description
of epoch on this page:

http://www.postgresql.org/docs/devel/static/functions-datetime.html

Here is how you can convert an epoch value back to a time stamp:

SELECT TIMESTAMP WITH TIME ZONE 'epoch' + 982384720.12 * INTERVAL '1 second';

(The to_timestamp function encapsulates the above conversion.)


More thoughts about the patch:

1. When I copy the value from the hints for the min and max values (see
examples below), it works fine for min, while max still leads to an error.
It comes from the check "if (seconds >= epoch_ubound)". I wonder
whether you should change the hint message?


select to_timestamp(-210866803200.00);
  to_timestamp
-
 4714-11-24 02:30:17+02:30:17 BC
(1 row)


select to_timestamp(9224318016000.00);
ERROR:  UNIX epoch out of range: "9224318016000.00"
HINT:  Maximal UNIX epoch value is "9224318016000.00"

2. There is a comment about JULIAN_MAXYEAR inaccuracy in timestamp.h:

 * IS_VALID_JULIAN checks the minimum date exactly, but is a bit sloppy
 * about the maximum, since it's far enough out to not be especially
 * interesting.

Maybe you can expand it?
- Does JULIAN_MAXYEAR4STAMPS help to avoid overflow in all possible cases?
- Why do we need to keep both definitions? I suppose it's a matter of
backward compatibility, isn't it?


3. (nitpicking) I'm not sure about the "4STAMPS" suffix. "4" is a nice
abbreviation, but it seems slightly confusing to me.


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] silent data loss with ext4 / all current versions

2016-03-04 Thread Robert Haas
On Fri, Mar 4, 2016 at 11:09 AM, Tom Lane  wrote:
> Alvaro Herrera  writes:
>> I would like to have a patch for this finalized today, so that we can
>> apply to master before or during the weekend; with it in the tree for
>> about a week we can be more confident and backpatch close to next
>> weekend, so that we see it in the next set of minor releases.  Does that
>> sound good?
>
> I see no reason to wait before backpatching.  If you're concerned about
> having testing, the more branches it is in, the more buildfarm cycles
> you will get on it.  And we're not going to cut any releases in between,
> so what's the benefit of not having it there?

Agreed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Freeze avoidance of very large table.

2016-03-04 Thread Robert Haas
On Wed, Mar 2, 2016 at 6:41 PM, Tom Lane  wrote:
> Jim Nasby  writes:
>> On 3/2/16 4:21 PM, Peter Geoghegan wrote:
>>> I think you should commit this. The chances of anyone other than you
>>> and Masahiko recalling that you developed this tool in 3 years are
>>> essentially nil. I think that the cost of committing a developer-level
>>> debugging tool like this is very low. Modules like pg_freespacemap
>>> currently already have no chance of being of use to ordinary users.
>>> All you need to do is restrict the functions to throw an error when
>>> called by non-superusers, out of caution.
>>>
>>> It's a problem that modules like pg_stat_statements and
>>> pg_freespacemap are currently lumped together in the documentation,
>>> but we all know that.
>
>> +1.
>
> Would it make any sense to stick it under src/test/modules/ instead of
> contrib/ ?  That would help make it clear that it's a debugging tool
> and not something we expect end users to use.

I actually think end-users might well want to use it.  Also, I created
it by hacking up pg_freespacemap, so it may make sense to have it in
the same place.

I would also be tempted to add an additional C function that scans the
entire visibility map and returns counts of the total number of bits of
each type that are set, and similarly for the page-level bits.
Presumably that would be much faster than querying the bits for each page
individually.
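
A minimal sketch of the counting idea -- plain bit counting over a buffer;
the real function would walk the map page by page under the proper buffer
locks (__builtin_popcount is a GCC/Clang builtin):

#include <stdint.h>
#include <stdio.h>

/* Count the set bits in a visibility-map-like byte buffer. */
static uint64_t
count_set_bits(const uint8_t *buf, size_t len)
{
    uint64_t    n = 0;

    for (size_t i = 0; i < len; i++)
        n += (uint64_t) __builtin_popcount(buf[i]);
    return n;
}

int
main(void)
{
    uint8_t     map[4] = {0xFF, 0x01, 0x00, 0x80};

    printf("%llu bits set\n",
           (unsigned long long) count_set_bits(map, sizeof(map)));
    return 0;
}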

I am also tempted to change the API to be a bit more friendly,
although I am not sure exactly how.  This was a quick and dirty hack
so that I could test, but the hardest thing about making it not a
quick and dirty hack is probably deciding on a good UI.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Relation extension scalability

2016-03-04 Thread Robert Haas
On Fri, Mar 4, 2016 at 12:06 AM, Dilip Kumar  wrote:
> I have tried the group-extend approach:
>
> 1. We convert the extension lock to a TryLock, and if we get the lock we
> extend by one block.
> 2. If we don't get the lock, we use the group-leader concept, where only
> one process extends for all. A slight change from ProcArrayGroupClear is
> that, besides satisfying the requesting backends, we add some extra blocks
> to the FSM, say GroupSize*10.
> 3. So we cannot measure the exact load, but the group size gives us a
> factor that tracks the contention, and we extend in multiples of that
> (see the sketch below).
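
A toy model of that scheme (illustrative names; the real patch operates on
PostgreSQL's relation extension lock and free space map, not a pthread
mutex):

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t extension_lock = PTHREAD_MUTEX_INITIALIZER;
static int waiters = 0;             /* crude stand-in for the group size */

static void
extend_relation(void)
{
    if (pthread_mutex_trylock(&extension_lock) == 0)
    {
        puts("uncontended: extend by one block");
        pthread_mutex_unlock(&extension_lock);
        return;
    }

    /* Contended: queue up; whoever acquires the lock next acts as group
     * leader and extends for everyone, parking extra blocks in the FSM
     * so that followers can allocate without retaking the lock. */
    __sync_fetch_and_add(&waiters, 1);
    pthread_mutex_lock(&extension_lock);
    if (waiters > 0)
    {
        printf("leader: extend %d blocks (group size * 10)\n", waiters * 10);
        waiters = 0;
    }
    pthread_mutex_unlock(&extension_lock);
}

int
main(void)
{
    extend_relation();
    return 0;
}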

This approach seems good to me, and the performance results look very
positive.  The nice thing about this is that there is not a
user-configurable knob; the system automatically determines when
larger extensions are needed, which will mean that real-world users
are much more likely to benefit from this.  I don't think it matters
that this is a little faster or slower than an approach with a manual
knob; what matters is that it is a huge improvement over unpatched
master, and that it does not need a knob.  The arbitrary constant of
10 is a little unsettling but I think we can live with it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Greeting for coming back, and where is PostgreSQL going

2016-03-04 Thread Joshua D. Drake
On 03/04/2016 03:20 AM, MauMau wrote:

> I've been visually impaired since birth, and now I'm almost blind (can 
> only sense the light).  I'm using screen reader software to use PCs and 
> smartphones.  As I'm using pgindent, I'm sure the source code style 
> won't be bad.  But I might overlook some styling problems like 
> indentation in the documentation patches.  I'd appreciate it if you 
> could introduce a nice editor for editing SGML/XML documents.

Welcome back!

There are quite a few editors that handle SGML/XML well. In the open
source world the two most common are likely:

Emacs
  This is what Practical PostgreSQL was written in.
Bluefish
  This is a GTK-based editor that has some nice touches.

There are others I am sure but those are the two I have experience with.

Sincerely,

JD


-- 
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.




Re: [HACKERS] Relation extension scalability

2016-03-04 Thread Tom Lane
Robert Haas  writes:
> This approach seems good to me, and the performance results look very
> positive.  The nice thing about this is that there is not a
> user-configurable knob; the system automatically determines when
> larger extensions are needed, which will mean that real-world users
> are much more likely to benefit from this.  I don't think it matters
> that this is a little faster or slower than an approach with a manual
> knob; what matters is that it is a huge improvement over unpatched
> master, and that it does not need a knob.  The arbitrary constant of
> 10 is a little unsettling but I think we can live with it.

+1.  "No knob" is a huge win.

regards, tom lane




Re: [HACKERS] Issue with NULLS LAST, with postgres_fdw sort pushdown

2016-03-04 Thread Robert Haas
On Thu, Mar 3, 2016 at 12:08 AM, Tom Lane  wrote:
> Ashutosh Bapat  writes:
>> On Thu, Mar 3, 2016 at 7:27 AM, Michael Paquier 
>> wrote:
>>> Per explain.c, this looks inconsistent to me. Shouldn't NULLS LAST be
>>> applied only if DESC is used in this ORDER BY clause?
>
>> ... In this case we are constructing a query to be
>> sent to the foreign server and it's better not to leave the defaults to be
>> interpreted by the foreign server, in case it interprets them in a
>> different fashion. get_rule_orderby() also explicitly adds these options.
>
> Yeah, I agree that we don't need to go out of our way to make the query
> succinct here.  Explicitness is easier and safer too, so why not?

+1.  So, committed Ashutosh's version.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] GetExistingLocalJoinPath() vs. the docs

2016-03-04 Thread Robert Haas
On Wed, Mar 2, 2016 at 1:12 AM, Ashutosh Bapat
 wrote:
>> I think that you need to take a little broader look at this section.
>> At the top, it says "To use any of these functions, you need to
>> include the header file foreign/foreign.h in your source file", but
>> this function is defined in foreign/fdwapi.h.  It's not clear to me
>> whether we should consider moving the prototype, or just document that
>> this function is someplace else.  The other functions prototyped in
>> fdwapi.h aren't documented at all, except for
>> IsImportableForeignTable, which is mentioned in passing.
>>
>> Further down, the section says "Some object types have name-based
>> lookup functions in addition to the OID-based ones:" and you propose
>> to put the documentation for this function after that.  But this
>> comment doesn't actually describe this particular function.
>>
>>
>> Actually, this function just doesn't seem to fit into this section at
>> all.  It's really quite different from the others listed there.  How
>> about something like the attached instead?
>
> Right. Mentioning the function in the description of the relevant function
> looks better and avoids some duplication.

Cool, committed that way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] psql completion for ids in multibyte string

2016-03-04 Thread Robert Haas
On Wed, Mar 2, 2016 at 8:07 PM, Kyotaro HORIGUCHI
 wrote:
> Hello, thank you for the comments.
>> I think we should leave string_length as it is and use a new variable
>> for character-based length, as in the attached.
>
> Basically agreed but I like byte_length for the previous
> string_length and string_length for string_length_chars. Also
> text_length is renamed in the attached patch.

I committed this and back-patched this but (1) I avoided changing the
other functions for now and (2) I gave both the byte length and the
character length new names to avoid confusion.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] raw output from copy

2016-03-04 Thread Pavel Stehule
2016-03-04 15:54 GMT+01:00 Daniel Verite :

> Corey Huinker wrote:
>
> > So, for me, RAW is the right solution, or at least *a* right solution.
>
> Questions on how to extract from a bytea column come up on a regular
> basis, as in [1] [2] [3], or [4] a few days ago, and so far the answers
> are to encode the contents in text and decode them in an additional
> step, or use COPY BINARY and filter out the headers.
>
> But none of this is as straightforward and efficient as the proposed
> COPY RAW.
> Also the conversion to text can't be used at all on very large
> contents (>512MB), as mentioned in another recent thread [5]
> (this is the same reason why pg_dump can't dump such rows),
> but COPY RAW doesn't have this limitation.
>
> Technically COPY BINARY should be sufficient, but it seems that
> people dislike having to deal with its headers.
>
> Also it's not supported by any of the drivers of popular
> script languages that otherwise provide COPY in text format
> (DBD::Pg, php, psycopg2...)
> Maybe the RAW format would have a better chance to get support
> there, because of its simplicity.
>

Exactly - I would like to decrease the dependency on PostgreSQL internals.
Working with clean content is simple and possible in any environment,
without messy operations.

Regards

Pavel


>
> [1]
>
> http://www.postgresql.org/message-id/038517CEB6DE43BD8422D7947B6BE8D8@fanlijing
>
> [2] http://www.postgresql.org/message-id/[email protected]
>
> [3] http://stackoverflow.com/questions/6730729
>
> [4]
> http://www.postgresql.org/message-id/[email protected]
>
> [5] http://www.postgresql.org/message-id/[email protected]
>
>
> Best regards,
> --
> Daniel Vérité
> PostgreSQL-powered mailer: http://www.manitou-mail.org
> Twitter: @DanielVerite
>


Re: [HACKERS] psql completion for ids in multibyte string

2016-03-04 Thread Robert Haas
On Fri, Mar 4, 2016 at 12:02 PM, Alvaro Herrera
 wrote:
> Robert Haas wrote:
>> On Wed, Mar 2, 2016 at 8:07 PM, Kyotaro HORIGUCHI
>>  wrote:
>> > Hello, thank you for the comments.
>> >> I think we should leave string_length as it is and use a new variable
>> >> for character-based length, as in the attached.
>> >
>> > Basically agreed but I like byte_length for the previous
>> > string_length and string_length for string_length_chars. Also
>> > text_length is renamed in the attached patch.
>>
>> I committed this and back-patched this but (1) I avoided changing the
>> other functions for now and (2) I gave both the byte length and the
>> character length new names to avoid confusion.
>
> These tweaks appear to have been universally disliked by buildfarm
> members.

Crap.  Wasn't careful enough, sorry.  Will fix shortly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] psql completion for ids in multibyte string

2016-03-04 Thread Alvaro Herrera
Robert Haas wrote:
> On Wed, Mar 2, 2016 at 8:07 PM, Kyotaro HORIGUCHI
>  wrote:
> > Hello, thank you for the comments.
> >> I think we should leave string_length as it is and use a new variable
> >> for character-based length, as in the attached.
> >
> > Basically agreed but I like byte_length for the previous
> > string_length and string_length for string_length_chars. Also
> > text_length is renamed in the attached patch.
> 
> I committed this and back-patched this but (1) I avoided changing the
> other functions for now and (2) I gave both the byte length and the
> character length new names to avoid confusion.

These tweaks appear to have been universally disliked by buildfarm
members.

-- 
Álvaro Herrera        http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] psql completion for ids in multibyte string

2016-03-04 Thread Robert Haas
On Fri, Mar 4, 2016 at 12:08 PM, Robert Haas  wrote:
> On Fri, Mar 4, 2016 at 12:02 PM, Alvaro Herrera
>  wrote:
>> Robert Haas wrote:
>>> On Wed, Mar 2, 2016 at 8:07 PM, Kyotaro HORIGUCHI
>>>  wrote:
>>> > Hello, thank you for the comments.
>>> >> I think we should leave string_length as it is and use a new variable
>>> >> for character-based length, as in the attached.
>>> >
>>> > Basically agreed but I like byte_length for the previous
>>> > string_length and string_length for string_length_chars. Also
>>> > text_length is renamed in the attached patch.
>>>
>>> I committed this and back-patched this but (1) I avoided changing the
>>> other functions for now and (2) I gave both the byte length and the
>>> character length new names to avoid confusion.
>>
>> These tweaks appear to have been universally disliked by buildfarm
>> members.
>
> Crap.  Wasn't careful enough, sorry.  Will fix shortly.

Fix pushed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pgbench stats per script & other stuff

2016-03-04 Thread Fabien COELHO


Hello Alvaro,


I looked at 19.d and I think the design has gotten pretty convoluted.  I
think we could simplify with the following changes:

struct script_t gets a new member, of type Command **, which is
initially null.

function process_builtin receives the complete script_t (not individual
members of it), constructs the Command **array and puts it in
script_t's new member; the return value is the same script_t struct it got
(except it's now augmented with the Command **array).

function process_file constructs a new script_t from the string list,
creates its Command **array just like process_builtin and returns the
constructed struct.

function addScript receives script_t instead of individual members of
it, and does the appropriate thing.


Why not. Here are two versions:

  *-20.patch is the initial rebased version

  *-21.patch does what you suggested above, some hidden awkwardness
 but much less than the previous one.

--
Fabien

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index cc80b3f..dd3fb1d 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -262,11 +262,13 @@ pgbench  options  dbname
 
 
  
-  -b scriptname
-  --builtin scriptname
+  -b scriptname[@weight]
+  --builtin=scriptname[@weight]
   

 Add the specified builtin script to the list of executed scripts.
+An optional integer weight after @ allows to adjust the
+probability of drawing the script.  If not specified, it is set to 1.
 Available builtin scripts are: tpcb-like,
 simple-update and select-only.
 Unambiguous prefixes of builtin names are accepted.
@@ -322,12 +324,14 @@ pgbench  options  dbname
  
 
  
-  -f filename
-  --file=filename
+  -f filename[@weight]
+  --file=filename[@weight]
   

 Add a transaction script read from filename to
 the list of executed scripts.
+An optional integer weight after @ allows to adjust the
+probability of drawing the test.
 See below for details.

   
@@ -687,9 +691,13 @@ pgbench  options  dbname
   What is the Transaction Actually Performed in pgbench?
 
   
-   Pgbench executes test scripts chosen randomly from a specified list.
+   pgbench executes test scripts chosen randomly
+   from a specified list.
They include built-in scripts with -b and
user-provided custom scripts with -f.
+   Each script may be given a relative weight specified after a
+   @ so as to change its drawing probability.
+   The default weight is 1.
  
 
   
@@ -1194,12 +1202,11 @@ number of clients: 10
 number of threads: 1
 number of transactions per client: 1000
 number of transactions actually processed: 1/1
+latency average = 15.844 ms
+latency stddev = 2.715 ms
 tps = 618.764555 (including connections establishing)
 tps = 622.977698 (excluding connections establishing)
-SQL script 1: 
- - 1 transactions (100.0% of total, tps = 618.764555)
- - latency average = 15.844 ms
- - latency stddev = 2.715 ms
+script statistics:
  - statement latencies in milliseconds:
 0.004386\set nbranches 1 * :scale
 0.001343\set ntellers 10 * :scale
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 8b0b17a..c131681 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -38,6 +38,7 @@
 #include "portability/instr_time.h"
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -179,6 +180,8 @@ char	   *login = NULL;
 char	   *dbName;
 const char *progname;
 
+#define WSEP '@'/* weight separator */
+
 volatile bool timer_exceeded = false;	/* flag from signal handler */
 
 /* variable definitions */
@@ -300,23 +303,26 @@ typedef struct
 static struct
 {
 	const char *name;
+	int			weight;
 	Command   **commands;
 	StatsData stats;
 }	sql_script[MAX_SCRIPTS];	/* SQL script files */
 static int	num_scripts;		/* number of scripts in sql_script[] */
 static int	num_commands = 0;	/* total number of Command structs */
+static int64 total_weight = 0;
+
 static int	debug = 0;			/* debug flag */
 
 /* Define builtin test scripts */
-#define N_BUILTIN 3
-static struct
+typedef struct script_t
 {
 	char	   *name;			/* very short name for -b ... */
 	char	   *desc;			/* short description */
 	char	   *commands;		/* actual pgbench script */
-}
+} script_t;
 
-			builtin_script[] =
+#define N_BUILTIN 3
+static script_t builtin_script[] =
 {
 	{
 		"tpcb-like",
@@ -392,9 +398,9 @@ usage(void)
 	 "  --tablespace=TABLESPACE  create tables in the specified tablespace\n"
 		   "  --unlogged-tablescreate tables as unlogged tables\n"
 		   "\nOptions to select what to run:\n"
-		   "  -b, --builtin=NAME   add buitin script (use \"-b list\" to display\n"
-		   "   available scripts)\n"
-		   "  -f, --file=FILENAME  add transaction script from FILENAME\n"
+		   "  -b, -

Re: [HACKERS] Incorrect error message in InitializeSessionUserId

2016-03-04 Thread Robert Haas
On Thu, Mar 3, 2016 at 9:23 PM, Michael Paquier
 wrote:
> On Fri, Mar 4, 2016 at 10:45 AM, Haribabu Kommi
>  wrote:
>> On Wed, Mar 2, 2016 at 12:21 AM, Dmitriy Sarafannikov
>>  wrote:
>>> Hi all,
>>>
>>> I have found incorrect error message in InitializeSessionUserId function
>>> if you try to connect to database by role Oid (for example
>>> BackgroundWorkerInitializeConnectionByOid).
>>> If a role has no permission to log in, you will see an error message like this:
>>> FATAL:  role "(null)" is not permitted to log in
>>>
>>> I changed few lines of code and fixed this.
>>> Patch is attached.
>>> I want to add this patch to commitfest.
>>> Any objections?
>>>
>>
>> The patch adds support for taking the role name from the role tuple
>> instead of using the provided rolename variable, because it is possible
>> that the rolename variable is NULL if the connection is from a background
>> worker.
>>
>> The patch is fine, I didn't find any problems, I marked it as ready for
>> committer.
>>
>> IMO this patch may need to backpatch supported branches as it is
>> a bug fix. Committer can decide.
>
> +1 for the backpatch. The current error message, should the rolename be
> undefined in this context, is misleading for users.

Back-patched to 9.5.  I don't think this is relevant for earlier branches.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pgbench small bug fix

2016-03-04 Thread Fabien COELHO


Hello Alvaro,


Attached is a v3 which tests integers more logically. I'm a lazy
programmer who tends to minimize the number of key strokes.


Well. From what I can tell this patch is Ready for Committer.


I'm not a fan of this approach either.  Would it be too complicated if
we had a global variable that indicates which thread is the progress
reporter?  We start that with thread 0, but if the reporter thread
finishes its transactions then it elects some other thread which hasn't
yet finished.  For this to work, each thread would have to maintain in a
global variable whether it has finished or not.


Hmmm.

Probably it is possible, but it will surely need more than one little
condition to be achieved... I do not think that introducing a non-trivial
distributed election algorithm involving locks and so on would be a good
decision for this very small matter.


My advice is "keep it simple".

If this is a blocker, I can surely write such an algorithm when I have some
spare time, but I'm not sure that the purpose is worth it.


--
Fabien.




Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?

2016-03-04 Thread Robert Haas
On Thu, Mar 3, 2016 at 8:54 PM, Kouhei Kaigai  wrote:
> I found one other, tiny problem in implementing the SSD-to-GPU direct
> data transfer feature on top of PostgreSQL storage.
>
> An extension cannot learn the raw file descriptor opened by smgr.
>
> I expect an extension to issue an ioctl(2) on the special device file
> provided by a special kernel driver, to control the P2P DMA.
> This ioctl(2) will pack the file descriptor of the DMA source and
> various other information (like base position, range, destination
> device pointer, ...).
>
> However, the raw file descriptor is hidden inside fd.c behind the
> File handle, and is thus not visible to extensions. Oops...
>
> The attached patch provides a way to obtain the raw file descriptor
> (and relevant flags) of a particular File virtual file descriptor in
> PostgreSQL. (Needless to say, an extension has to treat the raw
> descriptor carefully so as not to adversely affect the storage manager.)
>
> How about this tiny enhancement?

Why not FileDescriptor(), FileFlags(), FileMode() as separate
functions like FilePathName()?
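
A sketch of what those accessors might look like, with minimal stand-ins
for fd.c's internal virtual-fd table (real versions would also need to
validate the File handle first, e.g. with fd.c's FileIsValid):

typedef int File;

typedef struct
{
    int         fd;             /* kernel file descriptor */
    int         fileFlags;      /* open(2) flags */
    int         fileMode;       /* creation mode */
} Vfd;

static Vfd *VfdCache;           /* fd.c's table, simplified */

int
FileDescriptor(File file)
{
    return VfdCache[file].fd;
}

int
FileFlags(File file)
{
    return VfdCache[file].fileFlags;
}

int
FileMode(File file)
{
    return VfdCache[file].fileMode;
}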

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Upper planner pathification

2016-03-04 Thread Tom Lane
OK, here is a version that I think addresses all of the recent comments:

* I refactored the grouping-sets stuff as suggested by Robert and David.
The GroupingSetsPath code is now used *only* when there are grouping sets,
otherwise what you get is a plain AGG_SORTED AggPath.  This allowed
removal of a boatload of weird corner cases in the GroupingSets code path,
so it was a good change.  (Fundamentally, that's cleaning up some
questionable coding in the grouping sets patch rather than fixing anything
directly related to pathification, but I like the code better now.)

* I refactored the handling of targetlists in createplan.c.  After some
reflection I decided that the disuse_physical_tlist callers fell into
three separate categories: those that actually needed the exact requested
tlist to be returned, those that wanted non-bloated tuples because they
were going to put them into sort or hash storage, and those that needed
grouping columns to be properly labeled.  The new approach is to pass down
a "flags" word that specifies which if any of these cases apply at a
specific plan level.  use_physical_tlist now always makes the right
decision to start with, and disuse_physical_tlist is gone entirely, which
should make things a bit faster since we won't uselessly construct and
discard physical tlists.  The missing logic from make_subplanTargetList
and locate_grouping_columns is reincarnated in the physical-tlist code.
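
The three categories might map onto bit flags along these lines
(hypothetical names, one bit per requirement):

/* Illustrative flag bits for the three tlist requirements above. */
#define CP_EXACT_TLIST   0x0001  /* caller needs the exact requested tlist */
#define CP_SMALL_TLIST   0x0002  /* non-bloated tuples for sort/hash storage */
#define CP_LABEL_TLIST   0x0004  /* grouping columns must be properly labeled */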

* Added explicit limit/offset fields to LimitPath, as requested by Teodor.

* Removed SortPath.sortgroupclauses.

* Fixed handling of parallel-query fields in new path node types.
(BTW, I found what seemed to be a couple of pre-existing bugs of
the same kind, eg create_mergejoin_path was different from the
other two kinds of join as to setting parallel_degree.)


What remains to be done, IMV:

* Performance testing as per yesterday's discussion.

* Debug support in outfuncs.c and print_path() for new node types.

* Clean up unfinished work on function header comments.

* Write some documentation about how FDWs might use this.

I'll work on the performance testing next.  Barring unsatisfactory
results from that, I think this could be committable in a couple
of days.

regards, tom lane



upper-planner-pathification-2.patch.gz
Description: upper-planner-pathification-2.patch.gz



[HACKERS] Re[2]: [HACKERS] Incorrect error message in InitializeSessionUserId

2016-03-04 Thread Dmitriy Sarafannikov
>On Fri, Mar 4, 2016, 5:23 +03:00 from Michael Paquier <[email protected]>:
>
>> The patch adds support for taking the role name from the role tuple
>> instead of using the provided rolename variable, because it is possible
>> that the rolename variable is NULL if the connection is from a background
>> worker.
>>
>> The patch is fine, I didn't find any problems, I marked it as ready for
>> committer.
>>
>> IMO this patch may need to backpatch supported branches as it is
>> a bug fix. Committer can decide.
>
>+1 for the backpatch. The current error message, should the rolename be
>undefined in this context, is misleading for users.
>-- 
>Michael

Thanks for feedback.

This patch successfully applies to the 9.5 branch.
In 9.4 and earlier versions, the InitializeSessionUserId function has a different signature:
void InitializeSessionUserId(const char *rolename)
and it is impossible to pass a role Oid to this function.

Thus, the patch is relevant only to the master and 9.5 branches.

Regards,
Dmitriy Sarafannikov


Re: [HACKERS] ExecGather() + nworkers

2016-03-04 Thread Robert Haas
On Fri, Mar 4, 2016 at 6:55 AM, Amit Kapila  wrote:
> On Fri, Mar 4, 2016 at 5:21 PM, Haribabu Kommi 
> wrote:
>>
>> On Fri, Mar 4, 2016 at 10:33 PM, Amit Kapila 
>> wrote:
>> > On Fri, Mar 4, 2016 at 11:57 AM, Haribabu Kommi
>> > 
>> > wrote:
>> >>
>> >> On Wed, Jan 13, 2016 at 7:19 PM, Amit Kapila 
>> >> wrote:
>> >> >>
>> >> >
>> >> > Changed the code such that nworkers_launched gets used wherever
>> >> > appropriate instead of nworkers.  This includes places other than
>> >> > pointed out above.
>> >>
>> >> The changes of the patch are simple optimizations that are trivial.
>> >> I didn't find any problem regarding the changes. I think the same
>> >> optimization is required in "ExecParallelFinish" function also.
>> >>
>> >
>> > There is already one change as below for ExecParallelFinish() in patch.
>> >
>> > @@ -492,7 +492,7 @@ ExecParallelFinish(ParallelExecutorInfo *pei)
>> >
>> >   WaitForParallelWorkersToFinish(pei->pcxt);
>> >
>> >
>> >
>> >   /* Next, accumulate buffer usage. */
>> >
>> > - for (i = 0; i < pei->pcxt->nworkers; ++i)
>> >
>> > + for (i = 0; i < pei->pcxt->nworkers_launched; ++i)
>> >
>> >   InstrAccumParallelQuery(&pei->buffer_usage[i]);
>> >
>> >
>> > Can you be slightly more specific, where exactly you are expecting more
>> > changes?
>>
>> I missed it when comparing the existing code with the patch.
>> Everything is fine with the patch. I marked the patch as ready for
>> committer.
>>
>
> Thanks!

OK, committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pgbench stats per script & other stuff

2016-03-04 Thread Alvaro Herrera
Fabien COELHO wrote:

Hi,

>   *-21.patch does what you suggested above, some hidden awkwardness
>  but much less than the previous one.

Yeah, I think this is much nicer, don't you agree?

However, this is still a bit broken -- you cannot return a stack
variable from process_file, because the stack goes away once the
function returns.  You need to malloc it.

Also, you forgot to update the comments in process_file,
process_builtin, etc.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] pgbench small bug fix

2016-03-04 Thread Alvaro Herrera
Fabien COELHO wrote:

> Probably it is possible, but it will surely need more than one little
> condition to be achieved... I do not think that introducing a non-trivial
> distributed election algorithm involving locks and so on would be a good
> decision for this very little matter.
> 
> My advice is "keep it simple".
> 
> If this is a blocker, I can surely write such an algorithm when I have some
> spare time, but I'm not sure that the purpose is worth it.

You're probably right, but TBH I'm pretty unsure about this whole thing.
I will leave it alone for the time being.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Parallel Aggregate

2016-03-04 Thread Robert Haas
On Thu, Mar 3, 2016 at 11:00 PM, David Rowley
 wrote:
> On 17 February 2016 at 17:50, Haribabu Kommi  wrote:
>> Here I attached a draft patch based on previous discussions. It still needs
>> better comments and optimization.
>
> Over in [1] Tom posted a large change to the grouping planner which
> causes large conflict with the parallel aggregation patch. I've been
> looking over Tom's patch and reading the related thread and I've
> observed 3 things:
>
> 1. Parallel Aggregate will be much easier to write, and less code, when
> based on top of Tom's upper planner changes. The latest patch does
> add a bit of cruft (e.g. create_gather_plan_from_subplan()) which won't
> be required after Tom pushes the changes to the upper planner.
> 2. If we apply parallel aggregate before Tom's upper planner changes
> go in, then Tom needs to reinvent it again when rebasing his patch.
> This seems senseless, so this is why I did this work.
> 3. Based on the thread, most people are leaning towards getting Tom's
> changes in early to allow a bit more settling time before beta, and
> perhaps also to allow other patches (e.g. this one) to go in afterwards.
>
> So, I've done a bit of work and I've rewritten the parallel aggregate
> code to base it on top of Tom's patch posted in [1].

Great!

> 3. The code never attempts to mix and match Grouping Agg and Hash Agg
> plans, e.g. it could be an idea to perform Partial Hash Aggregate ->
> Gather -> Sort -> Finalize Group Aggregate, or to hash in the Finalize
> stage. I just thought doing this is more complex than what's really
> needed, but if someone can think of a case where this would be a great
> win then I'll listen, but you have to remember we don't have any
> pre-sorted partial paths at this stage, so an explicit sort is
> *always* required. This might change if someone invented partial btree
> index scans... but until then...

Actually, Rahila Syed is working on that.  But it's not done yet, so
it presumably will not go into 9.6.

I don't really see the logic of this, though.  Currently, Gather
destroys the input ordering, so it seems preferable for the
finalize-aggregates stage to use a hash aggregate whenever possible,
whatever the partial-aggregate stage did.  Otherwise, we need an
explicit sort.  Anyway, it seems like the two stages should be costed
and decided on their own merits - there's no reason to chain the two
decisions together.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] More stable query plans via more predictable column statistics

2016-03-04 Thread Robert Haas
On Thu, Mar 3, 2016 at 2:48 AM, Shulgin, Oleksandr
 wrote:
> On Wed, Mar 2, 2016 at 7:33 PM, Alvaro Herrera 
> wrote:
>> Shulgin, Oleksandr wrote:
>>
>> > Alright.  I'm attaching the latest version of this patch split in two
>> > parts: the first one is NULLs-related bugfix and the second is the
>> > "improvement" part, which applies on top of the first one.
>>
>> So is this null-related bugfix supposed to be backpatched?  (I assume
>> it's not because it's very likely to change existing plans).
>
> For the better, because cardinality estimates will be more accurate in these
> cases, so yes, I would expect it to be back-patchable.

-1.  I think the cost of changing existing query plans in back
branches is too high.  The people who get a better plan never thank
us, but the people who (by bad luck) get a worse plan always complain.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pgbench stats per script & other stuff

2016-03-04 Thread Fabien COELHO



>> *-21.patch does what you suggested above, some hidden awkwardness
>> but much less than the previous one.

> Yeah, I think this is much nicer, don't you agree?


Yep, I said "less awkwardness than previous", a pretty contrived way to say
better :-)



> However, this is still a bit broken -- you cannot return a stack
> variable from process_file, because the stack goes away once the
> function returns.  You need to malloc it.


That is why the "fs" variable in process_file is declared "static", and
why I wrote "some hidden awkwardness".

I did want to avoid a malloc, because then who would free the struct?
addScript cannot do it systematically because the builtins are static. Or
it would have to create a purpose-built struct, but then that would be
more awkwardness, and malloc/free just to pass arguments between functions
is neither efficient nor very elegant.

So the "static" option looked like the simplest & most elegant version.
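A minimal sketch of the trade-off (hypothetical names, not the actual
pgbench code):

typedef struct ParsedScript
{
    const char *desc;
    int         ncommands;
} ParsedScript;

/* broken: returns a pointer to a stack variable that dies on return */
ParsedScript *process_file_broken(const char *name)
{
    ParsedScript ps;
    ps.desc = name;
    ps.ncommands = 0;
    return &ps;                 /* dangling pointer */
}

/* the "static" workaround: the struct outlives the call, but there is
 * a single shared instance, so each call overwrites the previous one */
ParsedScript *process_file_static(const char *name)
{
    static ParsedScript ps;
    ps.desc = name;
    ps.ncommands = 0;
    return &ps;                 /* valid, but shared state */
}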


> Also, you forgot to update the comments in process_file,
> process_builtin, etc.


Indeed. v22 attached with better comments.

--
Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index cc80b3f..dd3fb1d 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -262,11 +262,13 @@ pgbench  options  dbname
 
 
  
-  -b scriptname
-  --builtin scriptname
+  -b scriptname[@weight]
+  --builtin=scriptname[@weight]
   

 Add the specified builtin script to the list of executed scripts.
+An optional integer weight after @ allows to adjust the
+probability of drawing the script.  If not specified, it is set to 1.
 Available builtin scripts are: tpcb-like,
 simple-update and select-only.
 Unambiguous prefixes of builtin names are accepted.
@@ -322,12 +324,14 @@ pgbench  options  dbname
  
 
  
-  -f filename
-  --file=filename
+  -f filename[@weight]
+  --file=filename[@weight]
   

 Add a transaction script read from filename to
 the list of executed scripts.
+An optional integer weight after @ allows to adjust the
+probability of drawing the test.
 See below for details.

   
@@ -687,9 +691,13 @@ pgbench  options  dbname
   What is the Transaction Actually Performed in pgbench?
 
   
-   Pgbench executes test scripts chosen randomly from a specified list.
+   pgbench executes test scripts chosen randomly
+   from a specified list.
They include built-in scripts with -b and
user-provided custom scripts with -f.
+   Each script may be given a relative weight specified after a
+   @ so as to change its drawing probability.
+   The default weight is 1.
  
 
   
@@ -1194,12 +1202,11 @@ number of clients: 10
 number of threads: 1
 number of transactions per client: 1000
 number of transactions actually processed: 1/1
+latency average = 15.844 ms
+latency stddev = 2.715 ms
 tps = 618.764555 (including connections establishing)
 tps = 622.977698 (excluding connections establishing)
-SQL script 1: 
- - 1 transactions (100.0% of total, tps = 618.764555)
- - latency average = 15.844 ms
- - latency stddev = 2.715 ms
+script statistics:
  - statement latencies in milliseconds:
 0.004386\set nbranches 1 * :scale
 0.001343\set ntellers 10 * :scale
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 8b0b17a..d7af86e 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -38,6 +38,7 @@
 #include "portability/instr_time.h"
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -179,6 +180,8 @@ char	   *login = NULL;
 char	   *dbName;
 const char *progname;
 
+#define WSEP '@'/* weight separator */
+
 volatile bool timer_exceeded = false;	/* flag from signal handler */
 
 /* variable definitions */
@@ -300,23 +303,27 @@ typedef struct
 static struct
 {
 	const char *name;
+	int			weight;
 	Command   **commands;
 	StatsData stats;
 }	sql_script[MAX_SCRIPTS];	/* SQL script files */
 static int	num_scripts;		/* number of scripts in sql_script[] */
 static int	num_commands = 0;	/* total number of Command structs */
+static int64 total_weight = 0;
+
 static int	debug = 0;			/* debug flag */
 
 /* Define builtin test scripts */
-#define N_BUILTIN 3
-static struct
+typedef struct script_t
 {
 	char	   *name;			/* very short name for -b ... */
 	char	   *desc;			/* short description */
-	char	   *commands;		/* actual pgbench script */
-}
+	char	   *script;			/* actual pgbench script */
+	Command   **commands; 		/* temporary intermediate holder */
+} script_t;
 
-			builtin_script[] =
+#define N_BUILTIN 3
+static script_t builtin_script[] =
 {
 	{
 		"tpcb-like",
@@ -334,7 +341,8 @@ static struct
 		"UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;\n"
 		"UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;\n"
 		"INSERT INTO pgbe

Re: [HACKERS] pgbench small bug fix

2016-03-04 Thread Fabien COELHO



>> Probably it is possible, but it will surely need more than one little
>> condition to be achieved... I do not think that introducing a non-trivial
>> distributed election algorithm involving locks and so on would be a good
>> decision for this very little matter.
>>
>> My advice is "keep it simple".
>>
>> If this is a blocker, I can surely write such an algorithm when I have some
>> spare time, but I'm not sure that the purpose is worth it.

> You're probably right, but TBH I'm pretty unsure about this whole thing.


If the question is "is there a bug", then the answer is yes. The progress
report may disappear if thread 0 happens to stop, even if all other threads
go on. Obviously it only concerns slow queries, but there is no reason why
pgbench should not work with slow queries. I can imagine good reasons to do
that, say to check the impact of such queries on an OLTP load.


The bug can be kept instead, and it can be called a feature.


> I will leave it alone for the time being.


Maybe you could consider pushing the first part of the patch, which stops 
if a transaction is scheduled after the end of the run? Or is this part 
bothering you as well?


--
Fabien.




Re: [HACKERS] pgbench stats per script & other stuff

2016-03-04 Thread Alvaro Herrera
Fabien COELHO wrote:

> >However, this is still a bit broken -- you cannot return a stack
> >variable from process_file, because the stack goes away once the
> >function returns.  You need to malloc it.
> 
> That is why the "fs" variable in process_file is declared "static", and why
> I wrote "some hidden awkwardness".
> 
> I did want to avoid a malloc, because then who would free the struct?
> addScript cannot do it systematically because the builtins are static. Or it
> would have to create a purpose-built struct, but then that would be more
> awkwardness, and malloc/free just to pass arguments between functions is
> neither efficient nor very elegant.
> 
> So the "static" option looked like the simplest & most elegant version.

Surely that trick breaks if you have more than one -f switch, no?  Oh, I
see what you're doing: you only use the command list, which is
allocated, so it doesn't matter that the rest of the struct changes
later.  That seems rather nasty to me -- I'd avoid that.

I'm not concerned about freeing the struct; what's the problem with it
surviving until the program terminates?  If somebody specifies thousands
of -f switches, they will waste a few bytes with each, but I'm hardly
concerned about a few dozen kilobytes there ...

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] pgbench small bug fix

2016-03-04 Thread Alvaro Herrera
Fabien COELHO wrote:
> 
> >>Probably it is possible, but it will surely need more than one little
> >>condition to be achieved... I do not think that introducing a non-trivial
> >>distributed election algorithm involving locks and so on would be a good
> >>decision for this very little matter.
> >>
> >>My advice is "keep it simple".
> >>
> >>If this is a blocker, I can surely write such an algorithm when I have some
> >>spare time, but I'm not sure that the purpose is worth it.
> >
> >You're probably right, but TBH I'm pretty unsure about this whole thing.
> 
> If the question is "is there a bug", then the answer is yes. The progress
> report may disappear if thread 0 happens to stop, even if all other threads
> go on. Obviously it only concerns slow queries, but there is no reason why
> pgbench should not work with slow queries. I can imagine good reasons to do
> that, say to check the impact of such queries on an OLTP load.
> 
> The bug can be kept instead, and it can be called a feature.

No, I agree that this looks like a bug and that we should fix it; for
example, if all connections from thread 0 terminate for some reason,
there will be no more reports, even if the other threads continue.
That's bad too.

What I'm unsure about is the proposed fix.

> >I will leave it alone for the time being.
> 
> Maybe you could consider pushing the first part of the patch, which stops if
> a transaction is scheduled after the end of the run? Or is this part
> bothering you as well?

So there are *two* bugs here?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] pgbench stats per script & other stuff

2016-03-04 Thread Fabien COELHO



>> That is why the "fs" variable in process_file is declared "static", and
>> why I wrote "some hidden awkwardness".
>>
>> I did want to avoid a malloc, because then who would free the struct?
>> addScript cannot do it systematically because the builtins are static. Or
>> it would have to create a purpose-built struct, but then that would be
>> more awkwardness, and malloc/free just to pass arguments between functions
>> is neither efficient nor very elegant.
>>
>> So the "static" option looked like the simplest & most elegant version.


> Surely that trick breaks if you have more than one -f switch, no?  Oh, I
> see what you're doing: you only use the command list, which is
> allocated, so it doesn't matter that the rest of the struct changes
> later.


The two fields that matter (desc and commands) are really copied into
sql_scripts, so whatever stays in the struct is overridden if it is used
another time.



> I'm not concerned about freeing the struct; what's the problem with it
> surviving until the program terminates?


It is not referenced anywhere so it is a memory leak.

> If somebody specifies thousands of -f switches, they will waste a few
> bytes with each, but I'm hardly concerned about a few dozen kilobytes
> there ...


Ok, so you prefer a memory leak. I hate it on principle.

Here is a v23 with a memory leak anyway.

--
Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index cc80b3f..dd3fb1d 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -262,11 +262,13 @@ pgbench  options  dbname
 
 
  
-  -b scriptname
-  --builtin scriptname
+  -b scriptname[@weight]
+  --builtin=scriptname[@weight]
   

 Add the specified builtin script to the list of executed scripts.
+An optional integer weight after @ allows to adjust the
+probability of drawing the script.  If not specified, it is set to 1.
 Available builtin scripts are: tpcb-like,
 simple-update and select-only.
 Unambiguous prefixes of builtin names are accepted.
@@ -322,12 +324,14 @@ pgbench  options  dbname
  
 
  
-  -f filename
-  --file=filename
+  -f filename[@weight]
+  --file=filename[@weight]
   

 Add a transaction script read from filename to
 the list of executed scripts.
+An optional integer weight after @ allows to adjust the
+probability of drawing the test.
 See below for details.

   
@@ -687,9 +691,13 @@ pgbench  options  dbname
   What is the Transaction Actually Performed in pgbench?
 
   
-   Pgbench executes test scripts chosen randomly from a specified list.
+   pgbench executes test scripts chosen randomly
+   from a specified list.
They include built-in scripts with -b and
user-provided custom scripts with -f.
+   Each script may be given a relative weight specified after a
+   @ so as to change its drawing probability.
+   The default weight is 1.
  
 
   
@@ -1194,12 +1202,11 @@ number of clients: 10
 number of threads: 1
 number of transactions per client: 1000
 number of transactions actually processed: 1/1
+latency average = 15.844 ms
+latency stddev = 2.715 ms
 tps = 618.764555 (including connections establishing)
 tps = 622.977698 (excluding connections establishing)
-SQL script 1: 
- - 1 transactions (100.0% of total, tps = 618.764555)
- - latency average = 15.844 ms
- - latency stddev = 2.715 ms
+script statistics:
  - statement latencies in milliseconds:
 0.004386\set nbranches 1 * :scale
 0.001343\set ntellers 10 * :scale
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 8b0b17a..5363648 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -38,6 +38,7 @@
 #include "portability/instr_time.h"
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -179,6 +180,8 @@ char	   *login = NULL;
 char	   *dbName;
 const char *progname;
 
+#define WSEP '@'/* weight separator */
+
 volatile bool timer_exceeded = false;	/* flag from signal handler */
 
 /* variable definitions */
@@ -300,23 +303,27 @@ typedef struct
 static struct
 {
 	const char *name;
+	int			weight;
 	Command   **commands;
 	StatsData stats;
 }	sql_script[MAX_SCRIPTS];	/* SQL script files */
 static int	num_scripts;		/* number of scripts in sql_script[] */
 static int	num_commands = 0;	/* total number of Command structs */
+static int64 total_weight = 0;
+
 static int	debug = 0;			/* debug flag */
 
 /* Define builtin test scripts */
-#define N_BUILTIN 3
-static struct
+typedef struct script_t
 {
 	char	   *name;			/* very short name for -b ... */
 	char	   *desc;			/* short description */
-	char	   *commands;		/* actual pgbench script */
-}
+	char	   *script;			/* actual pgbench script */
+	Command   **commands; 		/* temporary intermediate holder */
+} script_t;
 
-			builtin_script[] =
+#define N_BUILTIN 3
+static script_t builtin_script[] =
 {
 	{
 		"tpcb-like",

Re: [HACKERS] Logic problem in SerializeSnapshot()

2016-03-04 Thread Robert Haas
On Tue, Mar 1, 2016 at 12:34 AM, Rushabh Lathia
 wrote:
> During the testing of parallel query (with force_parallel_mode = regress),
> I noticed a random server crash with the stack below:
>
> #0  0x003fc84896d5 in memcpy () from /lib64/libc.so.6
> #1  0x00a36867 in SerializeSnapshot (snapshot=0x1e49f40,
> start_address=0x7f391e9ec728 ) at
> snapmgr.c:1523
> #2  0x00522a20 in InitializeParallelDSM (pcxt=0x1e49ce0) at
> parallel.c:330
> #3  0x006dd256 in ExecInitParallelPlan (planstate=0x1f012b0,
> estate=0x1f00be8, nworkers=1) at execParallel.c:398
> #4  0x006f8abb in ExecGather (node=0x1f00d00) at nodeGather.c:160
> #5  0x006de42e in ExecProcNode (node=0x1f00d00) at
> execProcnode.c:516
> #6  0x006da4fd in ExecutePlan (estate=0x1f00be8,
> planstate=0x1f00d00, use_parallel_mode=1 '\001', operation=CMD_SELECT,
> sendTuples=1 '\001', numberTuples=0,
> direction=ForwardScanDirection, dest=0x1e5e118) at execMain.c:1633
>
> So I started looking into SerializeSnapshot(), and by reading the code I
> found that we skip copying the SubXID array if it has overflowed, unless
> the snapshot was taken during recovery, and for this we set
> serialized_snapshot->subxcnt to 0. But later, while copying the SubXID
> array, we test the condition based on snapshot->subxcnt. We should check
> serialized_snapshot->subxcnt rather than snapshot->subxcnt.
>
> I tried hard to come up with an individual test, but somehow I was unable
> to create a test case.
>
> PFA patch to fix the issue.
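In code form, the change being described is essentially this (a sketch
with a simplified destination pointer, not the literal patch):

/* before (buggy): the test and the copy length come from the source
 * snapshot, whose subxcnt may be nonzero even though the serialized
 * subxcnt was forced to 0 for an overflowed, non-recovery snapshot */
if (snapshot->subxcnt > 0)
    memcpy(dest, snapshot->subxip,
           snapshot->subxcnt * sizeof(TransactionId));

/* after (the fix): test and size from what was actually recorded in
 * the serialized snapshot */
if (serialized_snapshot->subxcnt > 0)
    memcpy(dest, snapshot->subxip,
           serialized_snapshot->subxcnt * sizeof(TransactionId));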

Thanks, that looks right to me.  Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pgbench small bug fix

2016-03-04 Thread Fabien COELHO



>>> You're probably right, but TBH I'm pretty unsure about this whole thing.

>> If the question is "is there a bug", then the answer is yes. The progress
>> report may disappear if thread 0 happens to stop, even if all other threads
>> go on. Obviously it only concerns slow queries, but there is no reason why
>> pgbench should not work with slow queries. I can imagine good reasons to do
>> that, say to check the impact of such queries on an OLTP load.
>>
>> The bug can be kept instead, and it can be called a feature.


> No, I agree that this looks like a bug and that we should fix it; for
> example, if all connections from thread 0 terminate for some reason,
> there will be no more reports, even if the other threads continue.
> That's bad too.
>
> What I'm unsure about is the proposed fix.

>>> I will leave it alone for the time being.

>> Maybe you could consider pushing the first part of the patch, which stops
>> if a transaction is scheduled after the end of the run? Or is this part
>> bothering you as well?

> So there are *two* bugs here?


Hmmm... AFAICR, maybe fixing the first creates the second issue, i.e. 
maybe the second issue is currently hidden by the thread going on after 
the end of the run, so the second is just a latent bug that cannot be 
encountered.


I'm not sure whether I'm very clear :-)

--
Fabien.




Re: [HACKERS] Re: Add generate_series(date, date) and generate_series(date, date, integer)

2016-03-04 Thread David Steele
On 2/21/16 2:24 PM, Vik Fearing wrote:
> On 02/21/2016 07:56 PM, Corey Huinker wrote:
>>
>> Other than that, the only difference is the ::date part. Is it
>> really worth adding extra code just for that? I would say not.
>>
>>
>> I would argue it belongs for the sake of completeness. 
> 
> So would I.
> 
> +1 for adding this missing function.

+1. FWIW, a sample query I wrote for a customer yesterday would have
looked nicer with this function. Here's how the generate_series looked:

generate_series('2016-03-01'::date, '2016-03-31'::date, interval '1 day')::date

But it would have been cleaner to write:

generate_series('2016-03-01'::date, '2016-03-31'::date)

More importantly, though, I don't like that the timestamp version of the
function happily takes date parameters but returns timestamps. I feel
this could lead to some subtle bugs for the unwary.

-- 
-David
[email protected]





Re: [HACKERS] transam README small fix

2016-03-04 Thread Robert Haas
On Tue, Mar 1, 2016 at 4:31 AM, Stas Kelvich  wrote:
> The transaction function call sequence description in transam/README is
> slightly outdated. SELECT is now handled by PortalRunSelect instead of
> ProcessQuery. It is also hard to follow what the indentation there means —
> sometimes it means “function called by function”, sometimes it doesn't. So
> I’ve also changed it to reflect the actual call nesting.

After some study, this looks good to me, so committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] transam README small fix

2016-03-04 Thread Stas Kelvich
Thanks.

Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


> On 04 Mar 2016, at 22:14, Robert Haas  wrote:
> 
> On Tue, Mar 1, 2016 at 4:31 AM, Stas Kelvich  wrote:
>> The transaction function call sequence description in transam/README is
>> slightly outdated. SELECT is now handled by PortalRunSelect instead of
>> ProcessQuery. It is also hard to follow what the indentation there means —
>> sometimes it means “function called by function”, sometimes it doesn't. So
>> I’ve also changed it to reflect the actual call nesting.
> 
> After some study, this looks good to me, so committed.
> 
> -- 
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company





Re: CustomScan in a larger structure (RE: [HACKERS] CustomScan support on readfuncs.c)

2016-03-04 Thread Andres Freund
On 2016-02-12 15:56:45 +0100, Andres Freund wrote:
> Hi,
> 
> 
> On 2016-02-10 23:26:20 -0500, Robert Haas wrote:
> > I think the part about whacking around the FDW API is a little more
> > potentially objectionable to others, so I want to hold off doing that
> > unless a few more people chime in with +1.  Perhaps we could start a
> > new thread to talk about that specific idea.  This is useful even
> > without that, though.
> 
> FWIW, I can delete a couple hundred lines of code from citusdb thanks to
> this...

And I'm now working on doing that.


> why exactly did you expose read/writeBitmapset(), and nothing else?
> Afaics a lot of the other read routines are also pretty necessary from
> the outside?

I'd like to also expose at least outDatum()/readDatum() - they're not
entirely trivial, so it'd be sad to copy them.


What I'm wondering about right now is how an extensible node should
implement the equivalent of
#define WRITE_NODE_FIELD(fldname) \
(appendStringInfo(str, " :" CppAsString(fldname) " "), \
 _outNode(str, node->fldname))

given that _outNode isn't public, that seems to imply having to do
something like

#define WRITE_NODE_FIELD(fldname) \
(appendStringInfo(str, " :" CppAsString(fldname) " "), \
 appendStringInfo(str, nodeToString(node->fldname)))

i.e. essentially doubling memory overhead. Istm we should make
outNode() externally visible?

Greetings,

Andres Freund




[HACKERS] Typo in comment

2016-03-04 Thread Thomas Munro
Hi

Here is a patch to fix a typo in a comment in timestamp.c.

-- 
Thomas Munro
http://www.enterprisedb.com


typo.patch
Description: Binary data



Re: [HACKERS][PATCH] Supporting +-Infinity values by to_timestamp(float8)

2016-03-04 Thread Vitaly Burovoy
On 3/4/16, Anastasia Lubennikova  wrote:
> 27.02.2016 09:57, Vitaly Burovoy:
>> Hello, Hackers!
>>
>> I worked on a patch[1] that allows "EXTRACT(epoch FROM
>> +-Inf::timestamp[tz])" to return "+-Inf::float8".
>> There is an opposite function "to_timestamp(float8)" which now defined
>> as:
>> SELECT ('epoch'::timestamptz + $1 * '1 second'::interval)
>
> Hi,
> thank you for the patches.

Thank you for the review.

> Could you explain, whether they depend on each other?

Only logically. They reverse each other:
postgres=# SELECT v, to_timestamp(v), extract(epoch FROM to_timestamp(v)) FROM
postgres-#   unnest(ARRAY['+inf', '-inf', 0, 65536, 982384720.12]::float8[]) v;
      v       |       to_timestamp        |  date_part
--------------+---------------------------+--------------
     Infinity | infinity                  |     Infinity
    -Infinity | -infinity                 |    -Infinity
            0 | 1970-01-01 00:00:00+00    |            0
        65536 | 1970-01-01 18:12:16+00    |        65536
 982384720.12 | 2001-02-17 04:38:40.12+00 | 982384720.12
(5 rows)


>> Since intervals do not support infinity values, it is impossible to do
>> something like:
>>
>> SELECT to_timestamp('infinity'::float8);
>>
>> ... which is not good.
>>
>> Supporting such a conversion is in the TODO list[2] (under "converting
>> between infinity timestamp and float8").
>
> You mention intervals here, and TODO item definitely says about
> 'infinity' interval,

Yes, it is in the same block. But I wanted to point to the link
"converting between infinity timestamp and float8". There are two-way
conversion examples.

> while patch and all the following discussion concerns to timestamps.
> Is it a typo or I misunderstood something important?

It is just a reason why I rewrote it as an internal function.
I asked whether to just rewrite the function
"pg_catalog.to_timestamp(float8)" as an internal one or to add support
of infinite intervals. Tom Lane answered[5] "you should stay away from
infinite intervals".
So I left intervals as is.

> I assumed that following query will work, but it isn't. Could you
> clarify that?
> select to_timestamp('infinity'::interval);

It is not hard. There is no logical way to convert interval (e.g.
"5minutes") to a timestamp (or date).
There never was a function "to_timestamp(interval)" and never will be.
postgres=# select to_timestamp('5min'::interval);
ERROR:  function to_timestamp(interval) does not exist
LINE 1: select to_timestamp('5min'::interval);
   ^
HINT:  No function matches the given name and argument types. You
might need to add explicit type casts.

>> Proposed patch implements it.
>>
>> There is an other patch in the CF[3] 2016-03 implements checking of
>> timestamp[tz] for being in allowed range. Since it is wise to set
>> (fix) the upper boundary of timestamp[tz]s, I've included the file
>> "src/include/datatype/timestamp.h" from there to check that an input
>> value and a result are in the allowed range.
>>
>> There are no changes in the documentation because the allowed range is
>> the same as officially supported[4] (i.e. until 294277 AD).
>
> I think that you should update documentation. At least description of
> epoch on this page:
> http://www.postgresql.org/docs/devel/static/functions-datetime.html

Thank you very much for pointing out where it is located (I saw only
"to_timestamp(TEXT, TEXT)").
I'll think about how to update it.

> More thoughts about the patch:
>
> 1. When I copy the value from the hints for the min and max values (see
> examples below), it works fine for min, while max still leads to an error.
> It comes from the check "if (seconds >= epoch_ubound)". I wonder
> whether you should change the hint message?
>
> select to_timestamp(-210866803200.00);
>to_timestamp
> -
>   4714-11-24 02:30:17+02:30:17 BC
> (1 row)
>
>
> select to_timestamp(9224318016000.00);
> ERROR:  UNIX epoch out of range: "9224318016000.00"
> HINT:  Maximal UNIX epoch value is "9224318016000.00"

I agree, it is a little confusing. Do you (or anyone) know how to
construct a condensed phrase saying that it is an exclusive upper bound
of the allowed UNIX epoch range?
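One possible shape, as a sketch only (the bound variable names follow
the discussion above, and the message wording is merely illustrative):

if (seconds < epoch_lbound || seconds >= epoch_ubound)
    ereport(ERROR,
            (errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
             errmsg("UNIX epoch out of range: \"%g\"", seconds),
             errhint("Valid UNIX epoch values are greater than or equal "
                     "to %g and less than %g.",
                     epoch_lbound, epoch_ubound)));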

> 2. There is a comment about JULIAN_MAXYEAR inaccuracy in timestamp.h:
>
>   * IS_VALID_JULIAN checks the minimum date exactly, but is a bit sloppy
>   * about the maximum, since it's far enough out to not be especially
>   * interesting.

It is just about the accuracy to the day for a lower bound, and to the
year (not to a day) for an upper bound.

> Maybe you can expand it?
> - Is JULIAN_MAXYEAR4STAMPS helps to avoid overflow in all possible cases?
> - Why do we need to hold both definitions? I suppose, it's a matter of
> backward compatibility, isn't it?

Yes. I tried to be less invasive from the point of view of end users.
They can be sure that if they follow the documentation they won't get
into trouble.

> 3. (nitpicking) I'm not sure about the "4STAMPS" suffix. "4" is a nice
> abbreviation, but it seems slightly 

Re: [HACKERS] Add generate_series(date,date) and generate_series(date,date,integer)

2016-03-04 Thread Corey Huinker
>
>
> I feel rather uneasy about simply removing the 'infinity' checks. Is there
> a way to differentiate those two cases, i.e. when the generate_series is
> called in target list and in the FROM part? If yes, we could do the check
> only in the FROM part, which is the case that does not work (and consumes
> arbitrary amounts of memory).
>
>
It would be simple enough to remove the infinity test on the "stop" and
leave it on the "start". Or yank both. Just waiting for others to agree
which checks should remain.


Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2016-03-04 Thread Robert Haas
On Tue, Feb 23, 2016 at 1:18 AM, Etsuro Fujita
 wrote:
> Thanks again for updating the patch and fixing the issues!

Some comments on the latest version.  I haven't reviewed the
postgres_fdw changes in detail here, so this is just about the core
changes.

I see that show_plan_tlist checks whether the operation is any of
CMD_INSERT, CMD_UPDATE, or CMD_DELETE.  But practically every place
else where a similar test is needed instead tests whether the
operation is *not* CMD_SELECT.  I think this place should do it that
way, too.

+   resultRelInfo = mtstate->resultRelInfo;
for (i = 0; i < nplans; i++)
{
ExecAuxRowMark *aerm;

+   /*
+* ignore subplan if the FDW pushes down the
command to the remote
+* server; the ModifyTable won't have anything
to do except for
+* evaluation of RETURNING expressions
+*/
+   if (resultRelInfo->ri_FdwPushdown)
+   {
+   resultRelInfo++;
+   continue;
+   }
+
subplan = mtstate->mt_plans[i]->plan;
aerm = ExecBuildAuxRowMark(erm, subplan->targetlist);
mtstate->mt_arowmarks[i] =
lappend(mtstate->mt_arowmarks[i], aerm);
+   resultRelInfo++;
}


This kind of thing creates a hazard for future people maintaining this
code.  If somebody adds some code to this loop that needs to execute
even when resultRelInfo->ri_FdwPushdown is true, they have to add two
copies of it.  It's much better to move the three lines of logic that
execute only in the non-pushdown case inside of if
(!resultRelInfo->ri_FdwPushdown).
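In other words, something like the following sketch, reusing the names
from the excerpt above:

resultRelInfo = mtstate->resultRelInfo;
for (i = 0; i < nplans; i++)
{
    /* only non-pushed-down subplans need an aux row mark */
    if (!resultRelInfo->ri_FdwPushdown)
    {
        ExecAuxRowMark *aerm;
        Plan   *subplan = mtstate->mt_plans[i]->plan;

        aerm = ExecBuildAuxRowMark(erm, subplan->targetlist);
        mtstate->mt_arowmarks[i] = lappend(mtstate->mt_arowmarks[i], aerm);
    }
    resultRelInfo++;
}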

This issue crops up elsewhere as well.  The changes to
ExecModifyTable() have the same problem -- in that case, it might be
wise to move the code that's going to have to be indented yet another
level into a separate function.   That code also says this:

+   /* No need to provide scan tuple to
ExecProcessReturning. */
+   slot = ExecProcessReturning(resultRelInfo,
NULL, planSlot);

...but, uh, why not?  The comment says what the code does, but what it
should do is explain why it does it.

On a broader level, I'm not very happy with the naming this patch
uses.  Here's an example:

+
+ If an FDW supports optimizing foreign table updates, it still needs to
+ provide PlanDMLPushdown, BeginDMLPushdown,
+ IterateDMLPushdown and EndDMLPushdown
+ described below.
+

"Optimizing foreign table updates" is both inaccurate (since it
doesn't only optimize updates) and so vague as to be meaningless
unless you already know what it means.  The actual patch uses
terminology like "fdwPushdowns" which is just as bad.  We might push a
lot of things to the foreign side -- sorts, joins, aggregates, limits
-- and this is just one of them.  Worse, "pushdown" is itself
something of a term of art - will people who haven't been following
all of the mammoth, multi-hundred-email threads on this topic know
what that means?  I think we need some better terminology here.

The best thing that I can come up with offhand is "bulk modify".  So
we'd have PlanBulkModify, BeginBulkModify, IterateBulkModify,
EndBulkModify, ExplainBulkModify.  Other suggestions welcome.   The
ResultRelInfo flag could be ri_usesFDWBulkModify.  The documentation
could say something like this:

Some inserts, updates, and deletes to foreign tables can be optimized
by implementing an alternate set of interfaces.  The ordinary
interfaces for inserts, updates, and deletes fetch rows from the
remote server and then modify those rows one at a time.  In some
cases, this row-by-row approach is necessary, but it can be
inefficient.  If it is possible for the foreign server to determine
which rows should be modified without actually retrieving them, and if
there are no local triggers which would affect the operation, then it
is possible to arrange things so that the entire operation is
performed on the remote server.  The interfaces described below make
this possible.

+ Begin executing a foreign table update directly on the remote server.

I think this should say "Prepare to execute a bulk modification
directly on the remote server".  It shouldn't actually begin the
execution phase.

+ End the table update and release resources.  It is normally not important

And I think this one should say "Clean up following a bulk
modification on the remote server".  It's not actually ending the
update; the iterate method already did that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Typo in comment

2016-03-04 Thread Robert Haas
On Fri, Mar 4, 2016 at 2:39 PM, Thomas Munro
 wrote:
> Here is a patch to fix a typo in a comment in timestamp.c.

That looks like a typo, all right.  Committed.

(It's "commit small patches day" for me today, in case anybody hasn't
caught on to that already...)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Relaxing SSL key permission checks

2016-03-04 Thread Alvaro Herrera
Christoph Berg wrote:
> Re: To Tom Lane 2016-02-19 <[email protected]>
> > Updated patch attached.
> 
> *Blush* I thought I had compile-tested the patch, but without
> --enable-openssl it wasn't built :(.
> 
> The attached patch has successfully been included in the 9.6 Debian
> package, passed the regression tests there, and I've also done some
> chmod/chown tests on the filesystem to verify it indeed catches the
> cases laid out by Tom.

This looks like a pretty reasonable change to me.

For the sake of cleanliness, I propose splitting out the checks for
regular file and for owned-by-root-or-us from the actual chmod-level
checks at the same time.  That way we can provide more specific error
messages for each case.  (Furthermore, I'm pretty sure that the check
for superuserness could be applied on Windows also -- in the attached
patch it's still #ifdef'ed out, because I don't know how to write it.)

After doing that change I started to look at the details of the check
and found some mistakes:

* it said "g=w" instead of "g=r", in contradiction with the numeric
specification: g=w means 020 rather than 040.  We want the file to be
group-readable, not group-writeable.

* it failed to check for S_IXUSR, so permissions 0700 were okay, in
contradiction with what the error message indicates.  This is a
preexisting bug actually.  Do we want to fix it by preventing a
user-executable file (possibly breaking compatibility with existing
executable key files), or do we want to document what the restriction
really is?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 1e3dfb6..1330845 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -206,8 +206,30 @@ be_tls_init(void)
 	 errmsg("could not access private key file \"%s\": %m",
 			ssl_key_file)));
 
+		if (!S_ISREG(buf.st_mode))
+			ereport(FATAL,
+	(errcode(ERRCODE_CONFIG_FILE_ERROR),
+	 errmsg("private key file \"%s\" is not a regular file",
+			ssl_key_file)));
+
 		/*
-		 * Require no public access to key file.
+		 * Refuse to load files owned by users other than us or root.
+		 *
+		 * XXX surely we can check this on Windows somehow, too.
+		 */
+#if !defined(WIN32) && !defined(__CYGWIN__)
+		if (buf.st_uid != geteuid() && buf.st_uid != 0)
+			ereport(FATAL,
+	(errcode(ERRCODE_CONFIG_FILE_ERROR),
+	 errmsg("private key file \"%s\" must be owned by the database user or root",
+			ssl_key_file)));
+#endif
+
+		/*
+		 * Require no public access to key file. If the file is owned by us,
+		 * require mode 0600 or less. If owned by root, require 0640 or less
+		 * to allow read access through our gid, or a supplementary gid that
+		 * allows to read system-wide certificates.
 		 *
 		 * XXX temporarily suppress check when on Windows, because there may
 		 * not be proper support for Unix-y file permissions.  Need to think
@@ -215,12 +237,13 @@ be_tls_init(void)
 		 * directory permission check in postmaster.c)
 		 */
 #if !defined(WIN32) && !defined(__CYGWIN__)
-		if (!S_ISREG(buf.st_mode) || buf.st_mode & (S_IRWXG | S_IRWXO))
+		if ((buf.st_uid == geteuid() && buf.st_mode & (S_IXUSR | S_IRWXG | S_IRWXO)) ||
+			(buf.st_uid == 0 && buf.st_mode & (S_IXUSR | S_IWGRP | S_IXGRP | S_IRWXO)))
 			ereport(FATAL,
 	(errcode(ERRCODE_CONFIG_FILE_ERROR),
-  errmsg("private key file \"%s\" has group or world access",
-		 ssl_key_file),
-   errdetail("Permissions should be u=rw (0600) or less.")));
+	 errmsg("private key file \"%s\" has group or world access",
+			ssl_key_file),
+	 errdetail("File must have permissions u=rw (0600) or less if owned by the database user, or permissions u=rw,g=r (0640) or less if owned by root.")));
 #endif
 
 		if (SSL_CTX_use_PrivateKey_file(SSL_context,



Re: [HACKERS] Default Roles

2016-03-04 Thread Robert Haas
On Mon, Feb 29, 2016 at 10:02 PM, Stephen Frost  wrote:
> Attached is a stripped-down version of the default roles patch which
> includes only the 'pg_signal_backend' default role.  This provides the
> framework and structure for other default roles to be added and formally
> reserves the 'pg_' role namespace.  This is split into two patches, the
> first to formally reserve 'pg_', the second to add the new default role.
>
> Will add to the March commitfest for review.

Here is a review of the first patch:

+   if (!IsA(node, RoleSpec))
+   elog(ERROR, "invalid node type %d", node->type);

That looks strange.  Why not just Assert(IsA(node, RoleSpec))?

+
+   return;

Useless return.

@@ -673,6 +673,7 @@ dumpRoles(PGconn *conn)
 "pg_catalog.shobj_description(oid,
'pg_authid') as rolcomment, "
  "rolname =
current_user AS is_current_user "
  "FROM pg_authid "
+ "WHERE rolname !~ '^pg_' "
  "ORDER BY 2");
else if (server_version >= 90100)
printfPQExpBuffer(buf,
@@ -895,6 +896,7 @@ dumpRoleMembership(PGconn *conn)
   "LEFT JOIN pg_authid ur on
ur.oid = a.roleid "
   "LEFT JOIN pg_authid um on
um.oid = a.member "
   "LEFT JOIN pg_authid ug on
ug.oid = a.grantor "
+  "WHERE NOT (ur.rolname ~
'^pg_' AND um.rolname ~ '^pg_')"
   "ORDER BY 1,2,3");

If I'm reading this correctly, when dumping a 9.5 server, we'll
silently drop any roles whose names start with pg_ from the dump, and
hope that doesn't break anything.  When dumping a 9.4 or older server,
we'll include those roles and hope that they miraculously restore on
the new server.  I'm thinking neither of those approaches is going to
work out, and the difference between them seems totally unprincipled.

@@ -631,7 +637,8 @@ check_is_install_user(ClusterInfo *cluster)
res = executeQueryOrDie(conn,
"SELECT rolsuper, oid "
"FROM
pg_catalog.pg_roles "
-   "WHERE rolname
= current_user");
+   "WHERE rolname
= current_user "
+   "AND rolname
!~ '^pg_'");

/*
 * We only allow the install user in the new cluster (see comment below)
@@ -647,7 +654,8 @@ check_is_install_user(ClusterInfo *cluster)

res = executeQueryOrDie(conn,
"SELECT COUNT(*) "
-   "FROM
pg_catalog.pg_roles ");
+   "FROM
pg_catalog.pg_roles "
+   "WHERE rolname
!~ '^pg_'");

if (PQntuples(res) != 1)
pg_fatal("could not determine the number of users\n");

What bad thing would happen without these changes that would be
avoided with these changes?  I can't think of anything.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-03-04 Thread Tom Lane
Robert Haas  writes:
> On Fri, Mar 4, 2016 at 11:17 AM, Tom Lane  wrote:
>> Huh?  Parallel workers are read-only; what would they be doing sending
>> any of these messages?

> Mumble.  I have no idea what's happening here.

OK, after inserting a bunch of debug logging I have figured out what is
happening.  The updates on trunc_stats_test et al, being updates, are
done in the session's main backend.  But we also have these queries:

-- do a seqscan
SELECT count(*) FROM tenk2;
-- do an indexscan
SELECT count(*) FROM tenk2 WHERE unique1 = 1;

These can be, and are, done in parallel worker processes (and not
necessarily the same one, either).  AFAICT, the parallel worker
processes send their stats messages to the stats collector more or
less immediately after processing their queries.  However, because
of the rate-limiting logic in pgstat_report_stat, the main backend
doesn't.  The point of that "pg_sleep(1.0)" (which was actually added
*after* wait_for_stats) is to ensure that the half-second delay in
the rate limiter has been soaked up, and the stats messages sent,
before we start waiting for the results to become visible in the
stats collector's output.

So the sequence of events when we get a failure looks like

1. parallel workers send stats updates for seqscan and indexscan
on tenk2.

2. stats collector emits output files, probably as a result of
an autovacuum request.

3. session's main backend finishes "pg_sleep(1.0)" and sends
stats updates for what it's done lately, including the
updates on trunc_stats_test et al.

4. wait_for_stats() observes that the tenk2 idx_scan count has
already advanced and figures it need not wait at all.

5. We print stale stats for trunc_stats_test et al.

So it appears to me that to make this robust, we need to adjust
wait_for_stats to verify advances on *all three of* the tenk2
seq_scan count, the tenk2 idx_scan count, and at least one of
the trunc_stats_test tables' counters, because those could be
coming from three different backend processes.

If we ever allow parallel workers to do writes, this will really
become a mess.

regards, tom lane




Re: [HACKERS] Performance improvement for joins where outer side is unique

2016-03-04 Thread Alvaro Herrera
I wonder why we have two identical copies of clause_sides_match_join ...

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Publish autovacuum informations

2016-03-04 Thread Julien Rouhaud
On 03/03/2016 10:54, Kyotaro HORIGUCHI wrote:
> Hello,
> 
> At Wed, 2 Mar 2016 17:48:06 -0600, Jim Nasby  wrote 
> in <[email protected]>
>> On 3/2/16 10:48 AM, Julien Rouhaud wrote:
>>> Good point, I don't see a lot of information available with these hooks
>>> that native system statistics couldn't offer. To have the same amount
>>> of information, I think we'd need a pg_stat_autovacuum view that shows
>>> a realtime insight into the workers, and also to add some aggregated
>>> counters to PgStat_StatTabEntry. I wonder if adding counters to
>>> PgStat_StatTabEntry would be accepted though.
>>
>> I would also really like to see a means of logging (auto)vacuum
>> activity in the database itself. We figured out how to do that with
>> pg_stat_statements, which was a lot harder... it seems kinda silly not
>> to offer that for vacuum. Hooks plus shared memory data should allow
>> for that (the only tricky bit is the hook would need to start and then
>> commit a transaction, but that doesn't seem onerous).
>>
>> I think the shared memory structures should be done as well. Having
>> that real-time info is also valuable.
>>
>> I don't see too much point in adding stuff to the stats system for
>> this.
> 
> I wonder why there haven't been discussions so far on what kind
> of information we want from this feature. For example, I'd be happy
> to see, for every table, the time of the last autovacuum attempt and
> the cause if it was skipped. Such information would (maybe)
> naturally be shown in pg_stat_*_tables.
> 
> =
> =# select relid, last_completed_autovacuum, last_completed_autovac_status,
>      last_autovacuum_trial, last_autovac_trial_status from pg_stat_user_tables;
> -[ RECORD 1 ]-----------------+--------------------------------
> relid                         | 16390
> last_completed_autovacuum     | 2016-03-01 01:25:00.349074+09
> last_completed_autovac_status | Completed in 4 seconds. Scanned 434 pages, skipped 23 pages
> last_autovacuum_trial         | 2016-03-03 17:33:04.004322+09
> last_autovac_trial_status     | Canceled by PID 2355. Processed 144/553 pages.
> -[ RECORD 2 ]-----------------+--------------------------------
> ...
> last_autovacuum_trial         | 2016-03-03 07:25:00.349074+09
> last_autovac_trial_status     | Completed in 4 seconds. Scanned 434 pages, skipped 23 pages
> -[ RECORD 3 ]-----------------+--------------------------------
> ...
> last_autovacuum_trial         | 2016-03-03 17:59:12.324454+09
> last_autovac_trial_status     | Processing by PID 42334, 564 / 32526 pages done.
> -[ RECORD 4 ]-----------------+--------------------------------
> ...
> last_autovacuum_trial         | 2016-03-03 17:59:12.324454+09
> last_autovac_trial_status     | Skipped by dead-tuple threshold.
> =
> 
> Apart from the appropriateness of the concrete shape, this would be
> done by extending the current stats system and needs modification
> of some other parts; the hooks and WorkerInfoData are not
> enough. This might be the business of Rahila's "VACUUM Progress
> Checker", which covers some real-time info.
> 
> https://commitfest.postgresql.org/9/545/
> 
> On the other hand, it would live in another place and need another
> method if we want a history, like the current autovacuum
> completion logs (at DEBUG3..), of the 100 latest invocations of the
> autovacuum workers. Anyway, the WorkerInfoData is not enough.
> 
> 
> What kind of information we (will) want to have?
> 

Very good suggestion.

I think the most productive way to work on this is to start a wiki page
to summarize what information is available, what we should store, and
how to represent it.

I'll update this thread as soon as I have a first draft finished.

> 
> regards,
> 


-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org




Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-04 Thread Robert Haas
On Fri, Feb 26, 2016 at 3:28 AM,   wrote:
> Thank you for your comments.
> Please find attached patch addressing following comments.
>
>>As I might have written upthread, transferring the whole string
>>as a progress message is useless at least in this scenario. Since
>>they are a set of fixed messages, each of them can be represented
>>by an identifier, an integer number. I don't see a reason for
>>sending the whole of a string beyond a backend.
> Agreed. I used following macros.
> #define VACUUM_PHASE_SCAN_HEAP  1
> #define VACUUM_PHASE_VACUUM_INDEX_HEAP  2
>
>>I guess num_index_scans could better be reported after all the indexes are
>>done, that is, after the for loop ends.
> Agreed.  I have corrected it.
>
>> CREATE VIEW pg_stat_vacuum_progress AS
>>   SELECT S.s[1] as pid,
>>  S.s[2] as relid,
>>  CASE S.s[3]
>>WHEN 1 THEN 'Scanning Heap'
>>WHEN 2 THEN 'Vacuuming Index and Heap'
>>ELSE 'Unknown phase'
>>  END,
>>
>>   FROM pg_stat_get_command_progress(PROGRESS_COMMAND_VACUUM) as S;
>>
>> # The name of the function could be other than *_command_progress.
> The name of function is updated as pg_stat_get_progress_info() and also 
> updated the function.
> Updated the pg_stat_vacuum_progress view as suggested.

I'm positive I've said this at least once before while reviewing this
patch, and I think more than once: we should be trying to build a
general progress-reporting facility here with vacuum as the first
user.  Therefore, for example, pg_stat_get_progress_info's output
columns should have generic names, not names specific to VACUUM.
pg_stat_vacuum_progress can alias them to a vacuum-specific name.  See
for example the relationship between pg_stats and pg_statistic.

I think VACUUM should have three phases, not two.  lazy_vacuum_index()
and lazy_vacuum_heap() are lumped together right now, but I think they
shouldn't be.

Please create named constants for the first argument to
pgstat_report_progress_update_counter(), maybe with names like
PROGRESS_VACUUM_WHATEVER.
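For instance, a minimal sketch (the constant names and slot assignments
here are illustrative, not taken from the patch):

#define PROGRESS_VACUUM_PHASE              0
#define PROGRESS_VACUUM_TOTAL_HEAP_BLKS    1
#define PROGRESS_VACUUM_HEAP_BLKS_SCANNED  2

pgstat_report_progress_update_counter(PROGRESS_VACUUM_HEAP_BLKS_SCANNED,
                                      blkno + 1);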

+   /* Update current block number of the relation */
+   pgstat_report_progress_update_counter(2, blkno + 1);

Why + 1?

I thought we had a plan to update the counter of scanned index pages
after each index page was vacuumed by the AM.  Doing it only after
vacuuming the entire index is much less granular and generally less
useful.   See 
http://www.postgresql.org/message-id/[email protected]

+   if (blkno == nblocks - 1 && vacrelstats->num_dead_tuples == 0 &&
+       nindexes != 0 && vacrelstats->num_index_scans == 0)
+       total_index_pages = 0;

I'm not sure what this is trying to do, perhaps because there is no
comment explaining it.  Whatever the intent, I suspect that such a
complex test is likely to be fragile.  Perhaps there is a better way?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Equivalent of --enable-tap-tests in MSVC scripts

2016-03-04 Thread Michael Paquier
On Sat, Mar 5, 2016 at 1:16 AM, Craig Ringer  wrote:
> On 5 March 2016 at 00:10, Alvaro Herrera  wrote:
>>
>> Craig Ringer wrote:
>>
>> > If it's the result of perltidy changing its mind about the formatting
>> > as a result of this change, I guess we have to eyeroll and live with it.
>> > perltidy leaves the file alone as it is in the tree currently, so be it.
>> >
>> > Gripe withdrawn, ready for committer IMO
>>
>> Okay, thanks.  I applied it back to 9.4, which is when
>> --enable-tap-tests appeared.  I didn't perltidy 9.4's config_default.pl,
>> though.
>
>
> Thanks very much. It didn't occur to me to backport it, but it seems
> harmless.
>
> https://commitfest.postgresql.org/9/566/ marked as committed.

Thanks!
-- 
Michael




Re: [HACKERS] VS 2015 support in src/tools/msvc

2016-03-04 Thread Michael Paquier
On Sat, Mar 5, 2016 at 12:08 AM, Alvaro Herrera
 wrote:
> Michael Paquier wrote:
>
>> - 0001, as mentioned by Petr upthread, psed is removed from the core
>> distribution of Perl in 5.22, so when installing ActivePerl it is not
>> possible to create probes.h, and the code compilation would fail. I
>> bumped into that so here is a patch. What I am proposing here is to
>> replace psed by sed, sed being available in MSYS like bison and flex,
>> so when building using MSVC the environment to set up is normally
>> already good to go even with this additional dependency. Now, it is
>> important to mention that probes.h is not part of a source tarball. I
>> think that we would want probes.h to be part of a source tarball so that
>> it would be possible to compile the code on Windows using MSVC without
>> having to install MSYS. I haven't done that in this patch, thoughts on
>> the matter are welcome.
>
> I think the path of least resistance is to change the sed script into a
> Perl script.  Should be fairly simple ...

Yes, that's possible as well. It would be better to use the same
process for *nix platforms too.
-- 
Michael




Re: [HACKERS] silent data loss with ext4 / all current versions

2016-03-04 Thread Michael Paquier
On Sat, Mar 5, 2016 at 1:23 AM, Robert Haas  wrote:
> On Fri, Mar 4, 2016 at 11:09 AM, Tom Lane  wrote:
>> Alvaro Herrera  writes:
>>> I would like to have a patch for this finalized today, so that we can
>>> apply to master before or during the weekend; with it in the tree for
>>> about a week we can be more confident and backpatch close to next
>>> weekend, so that we see it in the next set of minor releases.  Does that
>>> sound good?
>>
>> I see no reason to wait before backpatching.  If you're concerned about
>> having testing, the more branches it is in, the more buildfarm cycles
>> you will get on it.  And we're not going to cut any releases in between,
>> so what's the benefit of not having it there?
>
> Agreed.

OK. I could produce that by tonight my time, not before unfortunately.
And FWIW, per the comments of Andres, it is not clear to me what we
gain by having a common routine for link() and rename() knowing that
half the code paths performing a rename do not rely on link(). At
least it sounds dangerous to me to introduce a dependency on link() in
code paths that depend just on rename() for back branches. On HEAD, we
could be more adventurous for sure. Regarding the replacement of
stat() with something relying on OpenTransientFile, I agree though. We
would still want to flush the parent directory in link_safe(), and we
are fine not flushing the parent directory of the old file because the
backend does not move files across paths.
-- 
Michael




Re: [HACKERS] Publish autovacuum informations

2016-03-04 Thread Michael Paquier
On Sat, Mar 5, 2016 at 6:52 AM, Julien Rouhaud
 wrote:
> Very good suggestion.
>
> I think the most productive way to work on this is to start a wiki page
> to summarize what information is available, what we should store, and
> how to represent it.
>
> I'll update this thread as soon as I have a first draft finished.

New design discussions are a little bit late for 9.6, I am afraid :(
Perhaps we should consider this patch as returned with feedback for
the time being? The hook approach is not something I'd wish for if we
can improve an in-core facility that would help users decide better how
to tune autovacuum parameters. The VACUUM progress facility covers a
different need by helping to track how long a scan is still going to
take. What we want here is something that would run on top of that.
Logs at least may be helpful for things like pgbadger.
-- 
Michael




Re: [HACKERS] silent data loss with ext4 / all current versions

2016-03-04 Thread Andres Freund
On 2016-03-04 14:51:50 +0900, Michael Paquier wrote:
> On Fri, Mar 4, 2016 at 4:06 AM, Andres Freund  wrote:
> > Hi,
> 
> Thanks for the review.
> 
> >> +/*
> >> + * rename_safe -- rename of a file, making it on-disk persistent
> >> + *
> >> + * This routine ensures that a renamed file persists in case of a crash
> >> + * by using fsync on the old and new files before and after performing
> >> + * the rename, so this categorizes as an all-or-nothing operation.
> >> + */
> >> +int
> >> +rename_safe(const char *oldfile, const char *newfile)
> >> +{
> >> + struct stat filestats;
> >> +
> >> + /*
> >> +  * First fsync the old entry and the new entry, if this one exists,
> >> +  * to ensure that they are properly persistent on disk. Calling this
> >> +  * routine with an existing new target file is fine, rename() will
> >> +  * first remove it before performing its operation.
> >> +  */
> >> + fsync_fname(oldfile, false);
> >> + if (stat(newfile, &filestats) == 0)
> >> + fsync_fname(newfile, false);
> >
> > I don't think we want any stat()s here. I'd much, much rather check open
> > for ENOENT.
> 
> OK. So you mean more or less that, right?
>
> int fd;
>
> fd = OpenTransientFile(newfile, PG_BINARY | O_RDONLY, 0);
> if (fd < 0)
> {
>     if (errno != ENOENT)
>         return -1;
> }
> else
> {
>     pg_fsync(fd);
>     CloseTransientFile(fd);
> }

Yes. Otherwise the check is racy: The file could be gone by the time you
do the fsync; leading to a spurious ERROR (which often would get
promoted to a PANIC).

> >> +/*
> >> + * link_safe -- make a file hard link, making it on-disk persistent
> >> + *
> >> + * This routine ensures that a hard link created on a file persists on
> >> + * the system in case of a crash by using fsync on the generated link
> >> + * as well as on its parent directory.
> >> + */
> >> +int
> >> +link_safe(const char *oldfile, const char *newfile)
> >> +{
> >
> > If we go for a new abstraction here, I'd rather make it
> > 'replace_file_safe' or something, and move the link/rename code #ifdef
> > into it.
> 
> Hm. OK. I don't see any reason why switching to link() even in code
> paths like KeepFileRestoredFromArchive() or pgarch_archiveDone() would
> be a problem thinking about it. Should HAVE_WORKING_LINK be available
> on a platform, we can combine it with unlink. Is that in line with what
> you think?

I wasn't trying to suggest we should replace all rename codepaths with
the link wrapper, just the ones that already have a HAVE_WORKING_LINK
check. The name of the routine I suggested is bad though...

> >> + if (link(oldfile, newfile) < 0)
> >> + return -1;
> >> +
> >> + /*
> >> +  * Make the link persistent in case of an OS crash, the new entry
> >> +  * generated as well as its parent directory need to be flushed.
> >> +  */
> >> + fsync_fname(newfile, false);
> >> +
> >> + /*
> >> +  * Same for parent directory. This routine is never called to rename
> >> +  * files across directories, but if this proves to become the case,
> >> +  * flushing the parent directory of the old file would be necessary.
> >> +  */
> >> + fsync_parent_path(newfile);
> >> + return 0;
> >
> > I think it's a seriously bad idea to encode that knowledge in such a
> > general sounding routine.  We could however argue that this is about
> > safely replacing the *target* file; not about safely removing the old
> > file.
> 
> Not sure I am following here. Are you referring to the fact that
> having the new file and old file in different directories would make
> this routine unreliable?

Yes.


> Because yes that's the case if we want to make both of them
> persistent, and I think we want to do so.

That's one way.


> Do you suggest correcting this comment to remove the mention of the
> old file's parent directory, because we just care about making the
> new file persistent?

That's one approach, yes. Combined with the fact that you can't
actually reliably rename across directories (the two could be on
different filesystems, after all), that'd be a suitable defense. It
just needs to be properly documented in the function header, not at
the bottom.


Regards,

Andres




Re: [HACKERS] silent data loss with ext4 / all current versions

2016-03-04 Thread Andres Freund
On 2016-03-05 07:29:35 +0900, Michael Paquier wrote:
> OK. I could produce that by tonight my time, not before unfortunately.

I'm switching to this patch, after pushing the pending logical decoding
fixes. Probably not today, but tomorrow PST afternoon should work.

> And FWIW, per the comments of Andres, it is not clear to me what we
> gain by having a common routine for link() and rename() knowing that
> half the code paths performing a rename do not rely on link().

I'm not talking about replacing all renames with this. Just the ones
that currently use link(). There's not much point in introducing
link_safe(), when all the callers have the same duplicated code, with a
fallback to rename().


Andres




Re: [HACKERS] silent data loss with ext4 / all current versions

2016-03-04 Thread Michael Paquier
On Sat, Mar 5, 2016 at 7:35 AM, Andres Freund  wrote:
> On 2016-03-04 14:51:50 +0900, Michael Paquier wrote:
>> On Fri, Mar 4, 2016 at 4:06 AM, Andres Freund  wrote:
>> > I don't think we want any stat()s here. I'd much, much rather check open
>> > for ENOENT.
>>
>> OK. So you mean more or less that, right?
>>
>> int fd;
>>
>> fd = OpenTransientFile(newfile, PG_BINARY | O_RDONLY, 0);
>> if (fd < 0)
>> {
>>     if (errno != ENOENT)
>>         return -1;
>> }
>> else
>> {
>>     pg_fsync(fd);
>>     CloseTransientFile(fd);
>> }
>
> Yes. Otherwise the check is racy: The file could be gone by the time you
> do the fsync; leading to a spurious ERROR (which often would get
> promoted to a PANIC).

Yeah, that makes sense.

>> >> +/*
>> >> + * link_safe -- make a file hard link, making it on-disk persistent
>> >> + *
>> >> + * This routine ensures that a hard link created on a file persists on
>> >> + * the system in case of a crash by using fsync on the generated link
>> >> + * as well as on its parent directory.
>> >> + */
>> >> +int
>> >> +link_safe(const char *oldfile, const char *newfile)
>> >> +{
>> >
>> > If we go for a new abstraction here, I'd rather make it
>> > 'replace_file_safe' or something, and move the link/rename code #ifdef
>> > into it.
>>
>> Hm. OK. I don't see any reason why switching to link() even in code
>> paths like KeepFileRestoredFromArchive() or pgarch_archiveDone() would
>> be a problem thinking about it. Should HAVE_WORKING_LINK be available
>> on a platform, we can combine it with unlink. Is that in line with what
>> you think?
>
> I wasn't trying to suggest we should replace all rename codepaths with
> the link wrapper, just the ones that already have a HAVE_WORKING_LINK
> check. The name of the routine I suggested is bad though...

So we'd introduce a first routine rename_or_link_safe(), say replace_safe().

>> Do you suggest correcting this comment to remove the mention of the
>> old file's parent directory, because we just care about making the
>> new file persistent?
>
> That's one approach, yes. Combined with the fact that you can't
> actually reliably rename across directories (the two could be on
> different filesystems, after all), that'd be a suitable defense. It
> just needs to be properly documented in the function header, not at
> the bottom.

OK. Got it. Or the two could be on the same filesystem. Still, link()
and rename() cannot operate across different filesystems (they fail
with EXDEV).
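
To spell that out, any wrapper we introduce can only ever assume
something like:

    /* sketch: both link() and rename() fail across filesystem boundaries */
    if (link(oldfile, newfile) < 0 && errno == EXDEV)
        return -1;    /* different filesystems, nothing we can do */

so both paths have to live on the same filesystem anyway.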
-- 
Michael




Re: [HACKERS] silent data loss with ext4 / all current versions

2016-03-04 Thread Michael Paquier
On Sat, Mar 5, 2016 at 7:37 AM, Andres Freund  wrote:
> On 2016-03-05 07:29:35 +0900, Michael Paquier wrote:
>> OK. I could produce that by tonight my time, not before unfortunately.
>
> I'm switching to this patch, after pushing the pending logical decoding
> fixes. Probably not today, but tomorrow PST afternoon should work.

OK, so if that's the case there is no need to step on your toes, seen from here.

>> And FWIW, per the comments of Andres, it is not clear to me what we
>> gain by having a common routine for link() and rename() knowing that
>> half the code paths performing a rename do not rely on link().
>
> I'm not talking about replacing all renames with this. Just the ones
> that currently use link(). There's not much point in introducing
> link_safe(), when all the callers have the same duplicated code, with a
> fallback to rename().

Indeed, that's the case. I don't have a better name than replace_safe
though. replace_paranoid is not a very appealing name either.
-- 
Michael




Re: [HACKERS] silent data loss with ext4 / all current versions

2016-03-04 Thread Andres Freund
On 2016-03-05 07:43:00 +0900, Michael Paquier wrote:
> On Sat, Mar 5, 2016 at 7:35 AM, Andres Freund  wrote:
> > On 2016-03-04 14:51:50 +0900, Michael Paquier wrote:
> >> On Fri, Mar 4, 2016 at 4:06 AM, Andres Freund  wrote:
> >> Hm. OK. I don't see any reason why switching to link() even in code
> >> paths like KeepFileRestoredFromArchive() or pgarch_archiveDone() would
> >> be a problem thinking about it. Should HAVE_WORKING_LINK be available
> >> on a platform, we can combine it with unlink. Is that in line with what
> >> you think?
> >
> > I wasn't trying to suggest we should replace all rename codepaths with
> > the link wrapper, just the ones that already have a HAVE_WORKING_LINK
> > check. The name of the routine I suggested is bad though...
>
> So we'd introduce a first routine rename_or_link_safe(), say replace_safe().

Or actually maybe just link_safe(), which falls back to access() &&
rename() if !HAVE_WORKING_LINK.
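
Very roughly, and untested (fsync_fname()/fsync_parent_path() as in
your patch, the exact fallback semantics being open for debate):

    int
    link_safe(const char *oldfile, const char *newfile)
    {
    #ifdef HAVE_WORKING_LINK
        if (link(oldfile, newfile) < 0)
            return -1;
    #else
        /* no working link(): refuse to clobber, then fall back to rename() */
        if (access(newfile, F_OK) == 0)
        {
            errno = EEXIST;
            return -1;
        }
        if (rename(oldfile, newfile) < 0)
            return -1;
    #endif
        /* persist the new entry and its parent directory */
        fsync_fname(newfile, false);
        fsync_parent_path(newfile);
        return 0;
    }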

> > That's one approach, yes. Combined with the fact that you can't
> > actually reliably rename across directories (the two could be on
> > different filesystems, after all), that'd be a suitable defense. It
> > just needs to be properly documented in the function header, not at
> > the bottom.
>
> OK. Got it. Or the two could be on the same filesystem.

> Still, link() and rename() cannot operate across different
> filesystems (they fail with EXDEV).

That's my point ...




Re: [HACKERS] Publish autovacuum informations

2016-03-04 Thread Julien Rouhaud
On 04/03/2016 23:34, Michael Paquier wrote:
> On Sat, Mar 5, 2016 at 6:52 AM, Julien Rouhaud
>  wrote:
>> Very good suggestion.
>>
>> I think the most productive way to work on this is to start a wiki page
>> to summarize what information is available, what we should store, and
>> how to represent it.
>>
>> I'll update this thread as soon as I have a first draft finished.
> 
> New design discussions are a little bit late for 9.6, I am afraid :(
> Perhaps we should consider this patch as returned with feedback for
> the time being? The hook approach is not something I'd wish for if we
> can improve an in-core facility that would help users decide better how
> to tune autovacuum parameters.

Yes, it's clearly not suited for the final commitfest. I just closed the
patch as "returned with feedback".

I'll work on the feedback I already received to document a wiki page, and
wait for this commitfest to be more or less finished before starting a
new thread on autovacuum instrumentation design.


> The VACUUM progress facility covers a
> different need by helping to track how long a scan is still going to
> take. What we want here is something that would run on top of that.
> Logs at least may be helpful for things like pgbadger.
> 



-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org




Re: [HACKERS] Publish autovacuum informations

2016-03-04 Thread Michael Paquier
On Sat, Mar 5, 2016 at 9:21 AM, Julien Rouhaud
 wrote:
> On 04/03/2016 23:34, Michael Paquier wrote:
>> New design discussions are a little bit late for 9.6, I am afraid :(
>> Perhaps we should consider this patch as returned with feedback for
>> the time being? The hook approach is not something I'd wish for if we
>> can improve an in-core facility that would help users decide better how
>> to tune autovacuum parameters.
>
> Yes, it's clearly not suited for the final commitfest. I just closed the
> patch as "returned with feedback".
>
> I'll work on the feedback I already received to document a wiki page, and
> wait for this commitfest to be more or less finished before starting a
> new thread on autovacuum instrumentation design.

OK, thanks.
-- 
Michael



