Re: [HACKERS] Relation cache invalidation on replica

2016-02-26 Thread Konstantin Knizhnik

On 02/27/2016 04:16 AM, Simon Riggs wrote:

On 27 February 2016 at 00:33, Simon Riggs > wrote:

On 27 February 2016 at 00:29, Andres Freund > wrote:

On 2016-02-26 18:05:55 +0300, Konstantin Knizhnik wrote:
> The reason of the problem is that invalidation messages are not 
delivered to
> replica after the end of concurrent create index.
> Invalidation messages are included in xlog as part of transaction 
commit
> record.
> Concurrent index create is split into three transaction, last of 
which is
> just performing inplace update of index tuple, marking it as valid and
> invalidating cache. But as far as this transaction is not assigned 
XID, no
> transaction record is created in WAL and send to replicas. As a 
result,
> replica doesn't receive this invalidation messages.

Ugh, that's a fairly ugly bug.


Looking now.


If the above is true, then the proposed fix wouldn't work either.

No point in sending a cache invalidation message on the standby if you haven't 
also written WAL, since the catalog re-read would just see the old row.

heap_inplace_update() does write WAL, which blows away the starting premise.

So I'm not seeing this as an extant bug in an open source version of 
PostgreSQL, in my current understanding.



Inplace update really creates record in WAL and this is why index is marked as 
valid at replica.
But invalidation messages are sent only with transaction commit record and such 
record is not created in this case,
because there is no assigned XID.

This is a real bug which originally observed by one of our customers with 
different versions of Postgres (last one them have tried was 9.5.1).
Then we reproduced it locally and determined the reason of the problem.
Repro scenario is very simple: you just need to create large enough table 
(pgbench with scale factor 100 works well in my case)
so that "create index concurrently" takes substantial amount of time. If, while 
this statement is in progress, you will execute some query at replica which
uses this index, then it will cache state of relation without index. And after 
even when index is actually constructed, it will never be used in this backend 
(but other backends at replica will use it).

I am not sure about the best way of fixing the problem.
I have not tested Andreas proposal:

if (nrels != 0 || nmsgs != 0 || RelcacheInitFileInval)

if it actually fixes the problem.
Assigning XID in heap_inplace_update definitely should work.
It is better than forcing assignment XID in DefineIndex? I am not sure, because 
this problem seems to be related only with concurrent update
(but may be I am wrong).
At least not all inplace updates should cause catalog invalidation and so 
require XID assignment.




--
Simon Riggs http://www.2ndQuadrant.com/ 
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] The plan for FDW-based sharding

2016-02-26 Thread Konstantin Knizhnik

On 02/27/2016 06:57 AM, Robert Haas wrote:

On Sat, Feb 27, 2016 at 1:49 AM, Konstantin Knizhnik
 wrote:

pg_tsdtm  is based on another approach: it is using system time as CSN and
doesn't require arbiter. In theory there is no limit for scalability. But
differences in system time and necessity to use more rounds of communication
have negative impact on performance.

How do you prevent clock skew from causing serialization anomalies?


If node receives message from "feature" it just needs to wait until this future 
arrive.
Practically we just "adjust" system time in this case, moving it forward 
(certainly system time is not actually changed, we just set correction value which need 
to be added to system time).
This approach was discussed in the article:
http://research.microsoft.com/en-us/people/samehe/clocksi.srds2013.pdf
I hope, in this article algorithm is explained much better than I can do here.

Few notes:
1. I can not prove that our pg_tsdtm absolutely correctly implements approach 
described in this article.
2. I didn't try to formally prove that our implementation can not cause some 
serialization anomalies.
3. We just run various synchronization tests (including simplest debit-credit 
test which breaks old version of Postgtes-XL) during several days and we didn't 
get any inconsistencies.
4. We have tested pg_tsdtm both at single node, blade cluster and geographically distributed nodes (distance more than thousand kilometers: one server was in Vladivostok, another in Kaliningrad). Ping between these two servers takes about 100msec. 
Performance of our benchmark drops about 100 times but there was no inconsistencies.


Also I once again want to notice that primary idea of the proposed patch was 
not pg_tsdtm.
There are well know limitation of this  pg_tsdtm which we will try to address 
in future.
What we want is to include XTM API in PostgreSQL to be able to continue our 
experiments with different transaction managers and implementing multimaster on 
top of it (our first practical goal) without affecting PostgreSQL core.

If XTM patch will be included in 9.6, then we can propose our multimaster as 
PostgreSQL extension and everybody can use it.
Otherwise we have to propose our own fork of Postgres which significantly 
complicates using and maintaining it.


So there is no ideal solution which can work well for all cluster. This is
why it is not possible to develop just one GTM, propose it as a patch for
review and then (hopefully) commit it in Postgres core. IMHO it will never
happen. And I do not think that it is actually needed. What we need is a way
to be able to create own transaction managers as Postgres extension not
affecting its  core.

This seems rather defeatist.  If the code is good and reliable, why
should it not be committed to core?


Two reasons:
1. There is no ideal implementation of DTM which will fit all possible needs 
and be  efficient for all clusters.
2. Even if such implementation exists, still the right way of it integration is 
Postgres should use kind of TM API.
I hope that everybody will agree that doing it in this way:

#ifdef PGXC
/* In Postgres-XC, stop timestamp has to follow the timeline of GTM */
xlrec.xact_time = xactStopTimestamp + GTMdeltaTimestamp;
#else
xlrec.xact_time = xactStopTimestamp;
#endif

or in this way:

xlrec.xact_time = xactUseGTM ? xactStopTimestamp + GTMdeltaTimestamp : 
xactStopTimestamp;

is very very bad idea.
In OO programming we should have abstract TM interface and several 
implementations of this interface, for example
MVCC_TM, 2PL_TM, Distributed_TM...
This is actually what can be done with our XTM API.
As far as Postgres is implemented in C, not in C++ we have to emulate 
interfaces using structures with function pointers.
And please notice that there is completely no need to include DTM 
implementation in core, as far as it is not needed for everybody.
It can be easily distributed as extension.

I have that quite soon we can propose multimaster extension which should provides functionality similar with MySQL Gallera. But even right now we have integrated pg_dtm and pg_tsdtm with pg_shard and postgres_fdw, allowing to provide distributed 
consistency for them.






All arguments against XTM can be applied to any other extension API in
Postgres, for example FDW.
Is it general enough? There are many useful operations which currently are
not handled by this API. For example performing aggregation and grouping at
foreign server side.  But still it is very useful and flexible mechanism,
allowing to implement many wonderful things.

That is true.  And everybody is entitled to an opinion on each new
proposed hook, as to whether that hook is general or not.  We have
both accepted and rejected proposed hooks in the past.




--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list 

Re: [HACKERS] raw output from copy

2016-02-26 Thread Pavel Stehule
Hi

2015-08-06 10:37 GMT+02:00 Pavel Stehule :

> Hi,
>
> Psql based implementation needs new infrastructure (more than few lines)
>
> Missing:
>
> * binary mode support
> * parametrized query support,
>
> I am not against, but both points I proposed, and both was rejected.
>
> So why dont use current infrastructure? Raw copy is trivial patch.
>

I was asked by Daniel Verite about reopening this patch in opened
commitfest.

I am sending rebased patch

Regards

Pavel
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
new file mode 100644
index 07e2f45..68fbfd8
*** a/doc/src/sgml/ref/copy.sgml
--- b/doc/src/sgml/ref/copy.sgml
*** COPY { ta
*** 197,203 
Selects the data format to be read or written:
text,
csv (Comma Separated Values),
!   or binary.
The default is text.
   
  
--- 197,203 
Selects the data format to be read or written:
text,
csv (Comma Separated Values),
!   binary or raw.
The default is text.
   
  
*** OIDs to be shown as null if that ever pr
*** 888,893 
--- 888,925 
  
 

+ 
+   
+  Raw Format
+ 
+
+ The raw format option causes all data to be
+ stored/read as binary format rather than as text. It shares format
+ for data with binary format. This format doesn't
+ use any metadata - only row data in network byte order are exported
+ or imported.
+
+ 
+
+ Because this format doesn't support any delimiter, only one value
+ can be exported or imported. NULL values are not allowed.
+
+
+ The raw format can be used for export or import
+ bytea values.
+ 
+ COPY images(data) FROM '/usr1/proj/img/01.jpg' (FORMAT raw);
+ 
+ It can be used successfully for export XML in different encoding
+ or import valid XML document with any supported encoding:
+ 
+
+   
   
  
   
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
new file mode 100644
index 3201476..beb9152
*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
*** typedef enum EolType
*** 89,94 
--- 89,99 
   * it's faster to make useless comparisons to trailing bytes than it is to
   * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is TRUE
   * when we have to do it the hard way.
+  *
+  * COPY supports three modes: text, binary and raw. The text format is plain
+  * text multiline format with specified delimiter. The binary format holds
+  * metadata (numbers, sizes) and data. The raw format holds data only and
+  * only one non NULL value can be processed.
   */
  typedef struct CopyStateData
  {
*** typedef struct CopyStateData
*** 110,115 
--- 115,121 
  	char	   *filename;		/* filename, or NULL for STDIN/STDOUT */
  	bool		is_program;		/* is 'filename' a program to popen? */
  	bool		binary;			/* binary format? */
+ 	bool		raw;			/* required raw binary? */
  	bool		oids;			/* include OIDs? */
  	bool		freeze;			/* freeze rows on loading? */
  	bool		csv_mode;		/* Comma Separated Value format? */
*** typedef struct CopyStateData
*** 199,204 
--- 205,213 
  	char	   *raw_buf;
  	int			raw_buf_index;	/* next byte to process */
  	int			raw_buf_len;	/* total # of bytes stored */
+ 
+ 	/* field for RAW mode */
+ 	bool		row_processed;		/* true, when first row was processed */
  } CopyStateData;
  
  /* DestReceiver for COPY (query) TO */
*** SendCopyBegin(CopyState cstate)
*** 342,350 
  		/* new way */
  		StringInfoData buf;
  		int			natts = list_length(cstate->attnumlist);
! 		int16		format = (cstate->binary ? 1 : 0);
  		int			i;
  
  		pq_beginmessage(, 'H');
  		pq_sendbyte(, format);		/* overall format */
  		pq_sendint(, natts, 2);
--- 351,366 
  		/* new way */
  		StringInfoData buf;
  		int			natts = list_length(cstate->attnumlist);
! 		int16		format;
  		int			i;
  
+ 		if (cstate->raw)
+ 			format = 2;
+ 		else if (cstate->binary)
+ 			format = 1;
+ 		else
+ 			format = 0;
+ 
  		pq_beginmessage(, 'H');
  		pq_sendbyte(, format);		/* overall format */
  		pq_sendint(, natts, 2);
*** SendCopyBegin(CopyState cstate)
*** 356,362 
  	else if (PG_PROTOCOL_MAJOR(FrontendProtocol) >= 2)
  	{
  		/* old way */
! 		if (cstate->binary)
  			ereport(ERROR,
  	(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
  			errmsg("COPY BINARY is not supported to stdout or from stdin")));
--- 372,378 
  	else if (PG_PROTOCOL_MAJOR(FrontendProtocol) >= 2)
  	{
  		/* old way */
! 		if (cstate->binary && cstate->raw)
  			ereport(ERROR,
  	(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
  			errmsg("COPY BINARY is not supported to stdout or from stdin")));
*** SendCopyBegin(CopyState cstate)
*** 368,374 
  	else
  	{
  		/* very old way */
! 		if (cstate->binary)
  			ereport(ERROR,
  	(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
  			

Re: [HACKERS] proposal: get oldest LSN - function

2016-02-26 Thread Kartyshov Ivan

On 27.02.2016 03:07, Andres Freund wrote

How does it help with any of that?

Hi, thank you for fast answer.
Maybe i wasn't too accurate in terms, because I newbie, but:
We can get information about xlog, using big amout of support function 
(pg_current_xlog_location(), pg_current_xlog_insert_location(), 
pg_xlogfile_name_offset(), pg_xlogfile_name(), 
pg_last_xlog_receive_location(), pg_last_xlog_replay_location(), ... 
etc) they helps to get get useful information about xlog files and its 
content. So, this patch extends this amount of functions.


This function is mostly for debugging purposes.

--
Ivan Kartyshov
Postgres Professional: www.postgrespro.com
Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-02-26 Thread Vinayak Pokale
Hello,

On Fri, Feb 26, 2016 at 6:19 PM, Amit Langote  wrote:

>
> Hi Vinayak,
>
> Thanks for updating the patch! A quick comment:
>
> On 2016/02/26 17:28, poku...@pm.nttdata.co.jp wrote:
> >> CREATE VIEW pg_stat_vacuum_progress AS
> >>   SELECT S.s[1] as pid,
> >>  S.s[2] as relid,
> >>  CASE S.s[3]
> >>WHEN 1 THEN 'Scanning Heap'
> >>WHEN 2 THEN 'Vacuuming Index and Heap'
> >>ELSE 'Unknown phase'
> >>  END,
> >>
> >>   FROM pg_stat_get_command_progress(PROGRESS_COMMAND_VACUUM) as S;
> >>
> >> # The name of the function could be other than *_command_progress.
> > The name of function is updated as pg_stat_get_progress_info() and also
> updated the function.
> > Updated the pg_stat_vacuum_progress view as suggested.
>
> So, pg_stat_get_progress_info() now accepts a parameter to distinguish
> different commands.  I see the following in its definition:
>
> +   /*  Report values for only those backends which are
> running VACUUM
> command */
> +   if (cmdtype == COMMAND_LAZY_VACUUM)
> +   {
> +   /*Progress can only be viewed by role member.*/
> +   if (has_privs_of_role(GetUserId(),
> beentry->st_userid))
> +   {
> +   values[2] =
> UInt32GetDatum(beentry->st_progress_param[0]);
> +   values[3] =
> UInt32GetDatum(beentry->st_progress_param[1]);
> +   values[4] =
> UInt32GetDatum(beentry->st_progress_param[2]);
> +   values[5] =
> UInt32GetDatum(beentry->st_progress_param[3]);
> +   values[6] =
> UInt32GetDatum(beentry->st_progress_param[4]);
> +   values[7] =
> UInt32GetDatum(beentry->st_progress_param[5]);
> +   if (beentry->st_progress_param[1] != 0)
> +   values[8] =
> Float8GetDatum(beentry->st_progress_param[2] * 100 /
> beentry->st_progress_param[1]);
> +   else
> +   nulls[8] = true;
> +   }
> +   else
> +   {
> +   nulls[2] = true;
> +   nulls[3] = true;
> +   nulls[4] = true;
> +   nulls[5] = true;
> +   nulls[6] = true;
> +   nulls[7] = true;
> +   nulls[8] = true;
> +   }
> +   }
>
> How about doing this in a separate function which takes the command id as
> parameter and returns an array of values and the number of values (per
> command id). pg_stat_get_progress_info() then creates values[] and nulls[]
> arrays from that and returns that as result set.  It will be a cleaner
> separation of activities, perhaps.
>
> +1

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-02-26 Thread Amit Kapila
On Sat, Feb 27, 2016 at 10:03 AM, Amit Kapila 
wrote:
>
>
> Here, we can see that there is a gain of ~15% to ~38% at higher client
> count.
>
> The attached document (perf_write_clogcontrollock_data_v6.ods) contains
> data, mainly focussing on single client performance.  The data is for
> multiple runs on different machines, so I thought it is better to present
> in form of document rather than dumping everything in e-mail.  Do let me
> know if there is any confusion in understanding/interpreting the data.
>
>
Forgot to mention that all these tests have been done by
reverting commit-ac1d794.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-02-26 Thread Amit Kapila
On Tue, Feb 23, 2016 at 7:06 PM, Robert Haas  wrote:

> On Sun, Feb 21, 2016 at 7:45 PM, Amit Kapila 
> wrote:
>
>> I mean, my basic feeling is that I would not accept a 2-3% regression in
>>> the single client case to get a 10% speedup in the case where we have 128
>>> clients.
>>>
>>
>>
When I tried by running the pgbench first with patch and then with Head, I
see 1.2% performance increase with patch.  TPS with patch is 976 and with
Head it is 964. For 3, 30 mins TPS data, refer "Patch –
group_clog_update_v5" and before that "HEAD – Commit 481725c0"
in perf_write_clogcontrollock_data_v6.ods attached with this mail.

Nonetheless, I have observed that below new check has been added by the
patch which can effect single client performance.  So I have changed it
such that new check is done only when we there is actually a need of group
update which means when multiple clients tries to update clog at-a-time.

+   if (!InRecovery &&
+   all_trans_same_page &&
+   nsubxids < PGPROC_MAX_CACHED_SUBXIDS &&
+   !IsGXactActive())



> I understand your point.  I think to verify whether it is run-to-run
>> variation or an actual regression, I will re-run these tests on single
>> client multiple times and post the result.
>>
>
> Perhaps you could also try it on a couple of different machines (e.g.
> MacBook Pro and a couple of different large servers).
>


Okay, I have tried latest patch (group_update_clog_v6.patch) on 2 different
big servers and then on Mac-Pro. The detailed data for various runs can be
found in attached document perf_write_clogcontrollock_data_v6.ods.  I have
taken the performance data for higher client-counts with somewhat larger
scale factor (1000) and data for median of same is as below:

M/c configuration
-
RAM - 500GB
8 sockets, 64 cores(Hyperthreaded128 threads total)

Non-default parameters

max_connections = 1000
shared_buffers=32GB
min_wal_size=10GB
max_wal_size=15GB
checkpoint_timeout=35min
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.9
wal_buffers = 256MB


Client_Count/Patch_ver 1 8 64 128 256
HEAD 871 5090 17760 17616 13907
PATCH 900 5110 18331 20277 19263


Here, we can see that there is a gain of ~15% to ~38% at higher client
count.

The attached document (perf_write_clogcontrollock_data_v6.ods) contains
data, mainly focussing on single client performance.  The data is for
multiple runs on different machines, so I thought it is better to present
in form of document rather than dumping everything in e-mail.  Do let me
know if there is any confusion in understanding/interpreting the data.

Thanks to Dilip Kumar for helping me in conducting test of this patch on
MacBook-Pro.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


group_update_clog_v6.patch
Description: Binary data


perf_write_clogcontrollock_data_v6.ods
Description: application/vnd.oasis.opendocument.spreadsheet

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [COMMITTERS] pgsql: Respect TEMP_CONFIG when running contrib regression tests.

2016-02-26 Thread Robert Haas
On Sat, Feb 27, 2016 at 9:00 AM, Andrew Dunstan  wrote:
>> Sure.  Saving three lines of Makefile duplication is hardly a
>> world-shattering event, so I thought there might be some other
>> purpose.  But I'm not against saving three lines of duplication
>> either, if it won't break anything.
> The point is that we should do this for several other test sets as well as
> contrib - isolation tests, PL tests and ecpg tests.

OK, I was wondering about that.  I can try to write a patch, or
someone else can, but if you already understand what needs to be done,
perhaps you should just go ahead.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The plan for FDW-based sharding

2016-02-26 Thread Robert Haas
On Sat, Feb 27, 2016 at 1:49 AM, Konstantin Knizhnik
 wrote:
> pg_tsdtm  is based on another approach: it is using system time as CSN and
> doesn't require arbiter. In theory there is no limit for scalability. But
> differences in system time and necessity to use more rounds of communication
> have negative impact on performance.

How do you prevent clock skew from causing serialization anomalies?

> So there is no ideal solution which can work well for all cluster. This is
> why it is not possible to develop just one GTM, propose it as a patch for
> review and then (hopefully) commit it in Postgres core. IMHO it will never
> happen. And I do not think that it is actually needed. What we need is a way
> to be able to create own transaction managers as Postgres extension not
> affecting its  core.

This seems rather defeatist.  If the code is good and reliable, why
should it not be committed to core?

> All arguments against XTM can be applied to any other extension API in
> Postgres, for example FDW.
> Is it general enough? There are many useful operations which currently are
> not handled by this API. For example performing aggregation and grouping at
> foreign server side.  But still it is very useful and flexible mechanism,
> allowing to implement many wonderful things.

That is true.  And everybody is entitled to an opinion on each new
proposed hook, as to whether that hook is general or not.  We have
both accepted and rejected proposed hooks in the past.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance degradation in commit 6150a1b0

2016-02-26 Thread Andres Freund
On February 26, 2016 7:55:18 PM PST, Amit Kapila  
wrote:
>On Sat, Feb 27, 2016 at 12:41 AM, Andres Freund 
>wrote:
>>
>> Hi,
>>
>> On 2016-02-25 12:56:39 +0530, Amit Kapila wrote:
>> > From past few weeks, we were facing some performance degradation in
>the
>> > read-only performance bench marks in high-end machines.  My
>colleague
>> > Mithun, has tried by reverting commit ac1d794 which seems to
>degrade the
>> > performance in HEAD on high-end m/c's as reported previously[1],
>but
>still
>> > we were getting degradation, then we have done some profiling to
>see
>what
>> > has caused it  and we found that it's mainly caused by spin lock
>when
>> > called via pin/unpin buffer and then we tried by reverting commit
>6150a1b0
>> > which has recently changed the structures in that area and it turns
>out
>> > that reverting that patch, we don't see any degradation in
>performance.
>> > The important point to note is that the performance degradation
>doesn't
>> > occur every time, but if the tests are repeated twice or thrice, it
>> > is easily visible.
>>
>> > m/c details
>> > IBM POWER-8
>> > 24 cores,192 hardware threads
>> > RAM - 492GB
>> >
>> > Non-default postgresql.conf settings-
>> > shared_buffers=16GB
>> > max_connections=200
>> > min_wal_size=15GB
>> > max_wal_size=20GB
>> > checkpoint_timeout=900
>> > maintenance_work_mem=1GB
>> > checkpoint_completion_target=0.9
>> >
>> > scale_factor - 300
>> >
>> > Performance at commit 43cd468cf01007f39312af05c4c92ceb6de8afd8 is
>469002 at
>> > 64-client count and then at
>6150a1b08a9fe7ead2b25240be46dddeae9d98e1, it
>> > went down to 200807.  This performance numbers are median of 3
>15-min
>> > pgbench read-only tests.  The similar data is seen even when we
>revert
>the
>> > patch on latest commit.  We have yet to perform detail analysis as
>to
>why
>> > the commit 6150a1b08a9fe7ead2b25240be46dddeae9d98e1 lead to
>degradation,
>> > but any ideas are welcome.
>>
>> Ugh. Especially the varying performance is odd. Does it vary between
>> restarts, or is it just happenstance?  If it's the former, we might
>be
>> dealing with some alignment issues.
>>
>
>It varies between restarts.
>
>>
>> If not, I wonder if the issue is massive buffer header contention. As
>a
>> LL/SC architecture acquiring the content lock might interrupt buffer
>> spinlock acquisition and vice versa.
>>
>> Does applying the patch from
>http://archives.postgresql.org/message-id/CAPpHfdu77FUi5eiNb%2BjRPFh5S%2B1U%2B8ax4Zw%3DAUYgt%2BCPsKiyWw%40mail.gmail.com
>> change the picture?
>>
>
>Not tried, but if this is alignment issue as you are suspecting above,
>then
>does it make sense to try this out?

It's the other theory I had. And it's additionally useful testing regardless of 
this regression...

--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance degradation in commit 6150a1b0

2016-02-26 Thread Amit Kapila
On Sat, Feb 27, 2016 at 12:41 AM, Andres Freund  wrote:
>
> Hi,
>
> On 2016-02-25 12:56:39 +0530, Amit Kapila wrote:
> > From past few weeks, we were facing some performance degradation in the
> > read-only performance bench marks in high-end machines.  My colleague
> > Mithun, has tried by reverting commit ac1d794 which seems to degrade the
> > performance in HEAD on high-end m/c's as reported previously[1], but
still
> > we were getting degradation, then we have done some profiling to see
what
> > has caused it  and we found that it's mainly caused by spin lock when
> > called via pin/unpin buffer and then we tried by reverting commit
6150a1b0
> > which has recently changed the structures in that area and it turns out
> > that reverting that patch, we don't see any degradation in performance.
> > The important point to note is that the performance degradation doesn't
> > occur every time, but if the tests are repeated twice or thrice, it
> > is easily visible.
>
> > m/c details
> > IBM POWER-8
> > 24 cores,192 hardware threads
> > RAM - 492GB
> >
> > Non-default postgresql.conf settings-
> > shared_buffers=16GB
> > max_connections=200
> > min_wal_size=15GB
> > max_wal_size=20GB
> > checkpoint_timeout=900
> > maintenance_work_mem=1GB
> > checkpoint_completion_target=0.9
> >
> > scale_factor - 300
> >
> > Performance at commit 43cd468cf01007f39312af05c4c92ceb6de8afd8 is
469002 at
> > 64-client count and then at 6150a1b08a9fe7ead2b25240be46dddeae9d98e1, it
> > went down to 200807.  This performance numbers are median of 3 15-min
> > pgbench read-only tests.  The similar data is seen even when we revert
the
> > patch on latest commit.  We have yet to perform detail analysis as to
why
> > the commit 6150a1b08a9fe7ead2b25240be46dddeae9d98e1 lead to degradation,
> > but any ideas are welcome.
>
> Ugh. Especially the varying performance is odd. Does it vary between
> restarts, or is it just happenstance?  If it's the former, we might be
> dealing with some alignment issues.
>

It varies between restarts.

>
> If not, I wonder if the issue is massive buffer header contention. As a
> LL/SC architecture acquiring the content lock might interrupt buffer
> spinlock acquisition and vice versa.
>
> Does applying the patch from
http://archives.postgresql.org/message-id/CAPpHfdu77FUi5eiNb%2BjRPFh5S%2B1U%2B8ax4Zw%3DAUYgt%2BCPsKiyWw%40mail.gmail.com
> change the picture?
>

Not tried, but if this is alignment issue as you are suspecting above, then
does it make sense to try this out?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] The plan for FDW-based sharding

2016-02-26 Thread Robert Haas
On Fri, Feb 26, 2016 at 10:56 PM, Konstantin Knizhnik
 wrote:
> We do not have formal prove that proposed XTM is "general enough" to handle
> all possible transaction manager implementations.
> But there are two general ways of dealing with isolation: snapshot based and
> CSN  based.

I don't believe that for a minute.  For example, consider this article:

https://en.wikipedia.org/wiki/Global_serializability

I think the neutrality of that article is *very* debatable, but it
certainly contradicts the idea that snapshots and CSNs are the only
methods of achieving global serializability.

Or consider this lecture:

http://hssl.cs.jhu.edu/~randal/416/lectures.old/ln5.2.pdf

That's a great introduction to the problem we're trying to solve here,
but again, snapshots are not mentioned, and CSNs certainly aren't
mentioned.

This write-up goes further, explaining three different methods for
ensuring global serializability, none of which mention snapshots or
CSNs:

http://heaven.eee.metu.edu.tr/~vision/LectureNotes/EE442/Ee442ch7.html

Actually, I think the second approach is basically a snapshot/CSN-type
approach, but it doesn't use that terminology and the connection to
what you are proposing is very unclear.

I think you're approaching this problem from a viewpoint that is
entirely too focused on the code that exists in PostgreSQL today.
Lots of people have done lots of academic research on how to solve
this problem, and you can't possibly say that CSNs and snapshots are
the only solution to this problem unless you haven't read any of those
papers.  The articles above aren't exceptional in mentioning neither
of the approaches that you are advocating - they are typical of the
literature in this area.  How can it be that the only solutions to
this problem are ones that are totally different from the approaches
that university professors who spend time doing research on
concurrency have spent time exploring?

I think we need to back up here and examine our underlying design
assumptions.  The goal here shouldn't necessarily be to replace
PostgreSQL's current transaction management with a distributed version
of the same thing.  We might want to do that, but I think the goal is
or should be to provide ACID semantics in a multi-node environment,
and specifically the I in ACID: transaction isolation.  Making the
existing transaction manager into something that can be spread across
multiple nodes is one way of accomplishing that.  Maybe the best one.
Certainly one that's been experimented within Postgres-XC.  But it is
often the case that an algorithm that works tolerably well on a single
machine starts performing extremely badly in a distributed
environment, because the latency of communicating between multiple
systems is vastly higher than the latency of communicating between
CPUs or cores on the same system.  So I don't think we should be
assuming that's the way forward.

For example, consider a table with a million rows spread across any
number of servers.  Consider also a series of update transactions each
of which reads exactly one row and then writes that row.  If we adopt
any solution that involves a central coordinator to arbitrate commit
ordering, this is going to require at least one and probably two
million network round trips, one per transaction to get a snapshot and
a second to commit.  But all of this is completely unnecessary.
Because each transaction touches only a single node, a perfect global
transaction manager doesn't really need to do anything at all in this
case.  The existing PostreSQL mechanisms - snapshot isolation, and SSI
if you have it turned on - will provide just as much transaction
isolation on this workload as they would on a workload that only
touched a single node.  If we design a GTM that does two million
network round trips in this scenario, we have just wasted two million
network round trips.

Now consider another workload where each transaction reads a row one
one server, reads a row on another server, and then updates the second
row.  Here, the GTM has a job to do.  If T1 reads R1, reads R2, writes
R2; and T2 concurrently reads R2, reads R1, and then writes R1, it
could happen that both transactions see the pre-update values of the
row they read first and yet both transactions go on to commit.  That's
not equivalent to any serial history, so transaction isolation is
broken.  A GTM which aims to provide true cluster-wide serializability
must do something to keep that from happening.  If all of this were
happening on a single node, those transactions would succeed if run at
READ COMMITTED but SSI would roll one of them back at SERIALIZABLE.
So maybe the goal for the GTM isn't to provide true serializability
across the cluster but some lesser degree of transaction isolation.
But then exactly which serialization anomalies are we trying to
prevent, and why is it OK to prevent those and not others?

I have seen zero discussion of any of this.  What I think we 

[HACKERS] syslog configurable line splitting behavior

2016-02-26 Thread Peter Eisentraut
Writing log messages to syslog caters to ancient syslog implementations
in two ways:

- sequence numbers
- line splitting

While these are arguably reasonable defaults, I would like a way to turn
them off, because they get in the way of doing more interesting things
with syslog (e.g., logging somewhere that is not just a text file).

So I propose the two attached patches that introduce new configuration
Boolean parameters syslog_sequence_numbers and syslog_split_lines that
can toggle these behaviors.
From e6a17750956e3e6950683bad397a74adb30f30a2 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 26 Feb 2016 22:34:30 -0500
Subject: [PATCH 1/2] Add syslog_sequence_numbers parameter

---
 doc/src/sgml/config.sgml  | 28 +++
 src/backend/utils/error/elog.c| 12 ++--
 src/backend/utils/misc/guc.c  | 10 ++
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 src/include/utils/elog.h  |  1 +
 5 files changed, 50 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index a09ceb2..0d1ae4b 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4218,6 +4218,34 @@ Where To Log

   
 
+  
+   syslog_sequence_numbers (boolean)
+
+ syslog_sequence_numbers configuration parameter
+
+   
+
+   
+
+ When logging to syslog and this is on (the
+ default), then each message will be prefixed by an increasing
+ sequence number (such as [2]).  This circumvents
+ the --- last message repeated N times --- suppression
+ that many syslog implementations perform by default.  In more modern
+ syslog implementations, repeat message suppression can be configured
+ (for example, $RepeatedMsgReduction
+ in rsyslog), so this might not be
+ necessary.  Also, you could turn this off if you actually want to
+ suppress repeated messages.
+
+
+
+ This parameter can only be set in the postgresql.conf
+ file or on the server command line.
+
+  
+ 
+
  
   event_source (string)
   
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 9005b26..0bc96b4 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -106,6 +106,7 @@ int			Log_error_verbosity = PGERROR_VERBOSE;
 char	   *Log_line_prefix = NULL;		/* format for extra log line info */
 int			Log_destination = LOG_DESTINATION_STDERR;
 char	   *Log_destination_string = NULL;
+bool		syslog_sequence_numbers = true;
 
 #ifdef HAVE_SYSLOG
 
@@ -2008,7 +2009,11 @@ write_syslog(int level, const char *line)
 
 			chunk_nr++;
 
-			syslog(level, "[%lu-%d] %s", seq, chunk_nr, buf);
+			if (syslog_sequence_numbers)
+syslog(level, "[%lu-%d] %s", seq, chunk_nr, buf);
+			else
+syslog(level, "[%d] %s", chunk_nr, buf);
+
 			line += buflen;
 			len -= buflen;
 		}
@@ -2016,7 +2021,10 @@ write_syslog(int level, const char *line)
 	else
 	{
 		/* message short enough */
-		syslog(level, "[%lu] %s", seq, line);
+		if (syslog_sequence_numbers)
+			syslog(level, "[%lu] %s", seq, line);
+		else
+			syslog(level, "%s", line);
 	}
 }
 #endif   /* HAVE_SYSLOG */
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index ea5a09a..bc8faa9 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -1632,6 +1632,16 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"syslog_sequence_numbers", PGC_SIGHUP, LOGGING_WHERE,
+			gettext_noop("Add sequence number to syslog messags to avoid duplicate suppression."),
+			NULL
+		},
+		_sequence_numbers,
+		true,
+		NULL, NULL, NULL
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, false, NULL, NULL, NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index ee3d378..a85ba36 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -358,6 +358,7 @@
 # These are relevant when logging to syslog:
 #syslog_facility = 'LOCAL0'
 #syslog_ident = 'postgres'
+#syslog_sequence_numbers = true
 
 # This is only relevant when logging to eventlog (win32):
 #event_source = 'PostgreSQL'
diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
index 326896f..bfbcf96 100644
--- a/src/include/utils/elog.h
+++ b/src/include/utils/elog.h
@@ -396,6 +396,7 @@ extern int	Log_error_verbosity;
 extern char *Log_line_prefix;
 extern int	Log_destination;
 extern char *Log_destination_string;
+extern bool syslog_sequence_numbers;
 
 /* Log destination bitmap */
 #define LOG_DESTINATION_STDERR	 1
-- 
2.7.2

From 72ea7dc222f41ab8246c0aa080d1c1adaf78f4c7 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 26 Feb 2016 

Re: [HACKERS] Sanity checking for ./configure options?

2016-02-26 Thread Peter Eisentraut
On 2/22/16 6:24 PM, Jim Nasby wrote:
> On 2/5/16 10:08 AM, David Fetter wrote:
>> On Wed, Feb 03, 2016 at 06:02:57PM -0600, Jim Nasby wrote:
>>> I just discovered that ./configure will happily accept
>>> '--with-pgport=' (I
>>> was actually doing =$PGPORT, and didn't realize $PGPORT was empty).
>>> What you
>>> end up with is a compile error in guc.c, with no idea why it's
>>> broken. Any
>>> reason not to have configure or at least make puke if pgport isn't
>>> valid?
>>
>> That seems like a good idea.
> 
> Patch attached. I've verified it with --with-pgport=, =0, =7 and =1.
> It catches what you'd expect it to.

Your code and comments suggest that you can specify the port to
configure by setting PGPORT, but that is not the case.

test == is not portable (bashism).

Error messages should have consistent capitalization.

Indentation in configure is two spaces.

> As the comment states, it doesn't catch things like --with-pgport=1a in
> configure, but the compile error you get with that isn't too hard to
> figure out, so I think it's OK.

Passing a non-integer as argument will produce an error message like
(depending on shell)

./configure: line 3107: test: 11a: integer expression expected

but will not actually abort configure.

It would work more robustly if you did something like this

elif test "$default_port" -ge "1" -a "$default_port" -le "65535"; then
  :
else
  AC_MSG_ERROR([port must be between 1 and 65535])
fi

but that still leaks the shell's error message.

There is also the risk of someone specifying a number with a leading
zero, which C would interpret as octal but the shell would not.

To make this really robust, you might need to do pattern matching on the
value.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation cache invalidation on replica

2016-02-26 Thread Andres Freund
On 2016-02-27 01:45:57 +, Simon Riggs wrote:
> Surely then the fix is to make heap_inplace_update() assign an xid? That
> way any catalog change will always generate a commit record containing the
> invalidation that goes with the change. No need to fix up the breakage
> later.

Well, we could, but it'd also break things where we rely on
heap_inplace_update not assigning an xid.  I'm not seing why that's
better than my proposal of doing this
(http://archives.postgresql.org/message-id/20160227002958.peftvmcx4dxwe244%40alap3.anarazel.de),
by emitting a commit record in RecordTransactionCommmit() if nrels != 0
|| nmsgs != 0 || RelcacheInitFileInval
.

> The other heap_insert|update|delete functions (and similar) all assign xid,
> so it is consistent for us to do that for inplace_update also.

I don't think that follows. Inplace updates an absolute special case,
where are *not allowed* to include an xid in the tuple.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATH] Correct negative/zero year in to_date/to_timestamp

2016-02-26 Thread Vitaly Burovoy
On 2/26/16, Shulgin, Oleksandr  wrote:
> On Fri, Feb 26, 2016 at 3:24 PM, Ivan Kartyshov
> 
> wrote:
>
>> The following review has been posted through the commitfest application:
>
>> make installcheck-world:  tested, failed
>> Implements feature:   tested, failed
>> Spec compliant:   tested, failed
>> Documentation:tested, failed
>>
>> Applied this patch, it works well, make what it expected correctly, code
>> style is maintained. Regression tests passed OK. No documentation or
>> comments.
>>
>
> Why does it say "tested, failed" for all points above there? ;-)

I hope Ivan misunderstood that "tested" and "passed" are two different
options, not a single "tested, passed" ;-)

Unfortunately, it seems there should be a discussion about meaning of
"": it seems authors made it as ISO8601-compliant (but there are
still several bugs), but there is no proofs in the documentation[1].

On the one hand there is only "year" mentioned for "", "YYY",
etc., and "ISO 8601 ... year" is "week-numbering", i.e. "IYYY", "IYY",
etc. only for using with "IDDD", "ID" and "IW".

On the other hand "extract" has two different options: "year" and
"isoyear" and only the second one is ISO8601-compliant (moreover it is
week-numbering at the same time):
postgres=# SELECT y src, EXTRACT(year FROM y) AS year, EXTRACT(isoyear
FROM y) AS isoyear
postgres-# FROM unnest(ARRAY[
postgres(# '4713-01-01 BC','4714-12-31 BC','4714-12-29 BC','4714-12-28
BC']::date[]) y;
  src  | year  | isoyear
---+---+-
 4713-01-01 BC | -4713 |   -4712
 4714-12-31 BC | -4714 |   -4712
 4714-12-29 BC | -4714 |   -4712
 4714-12-28 BC | -4714 |   -4713
(4 rows)

So there is lack of consistency: something should be changed: either
"extract(year..." (and the documentation[2], but there is "Keep in
mind there is no 0 AD, ..." for a long time, so it is a change which
breaks compatibility; and the patch will be completely different), or
the patch (it has an influence on "IYYY", so it is obviously wrong).

Now (after the letter[3] from Thomas Munro) I know the patch is not
ready for committing even with minimal changes. But I'm waiting for a
discussion: what part should be changed?

I would change behavior of "to_date" and "to_timestamp" to match with
extract options "year"/"isoyear", but I think getting decision of the
community before modifying the patch is a better idea.

[1]http://www.postgresql.org/docs/devel/static/functions-formatting.html
[2]http://www.postgresql.org/docs/devel/static/functions-datetime.html
[3]http://www.postgresql.org/message-id/CAEepm=0aznvy+frtyni04imdw4tlgzaelljqmhmcjbre+ln...@mail.gmail.com
--
Best regards,
Vitaly Burovoy


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation cache invalidation on replica

2016-02-26 Thread Simon Riggs
On 27 February 2016 at 01:23, Andres Freund  wrote:

> On 2016-02-27 01:16:34 +, Simon Riggs wrote:
> > If the above is true, then the proposed fix wouldn't work either.
> >
> > No point in sending a cache invalidation message on the standby if you
> > haven't also written WAL, since the catalog re-read would just see the
> old
> > row.
> >
> > heap_inplace_update() does write WAL, which blows away the starting
> premise.
>
> I'm not following here. heap_inplace_update() indeed writes WAL, but it
> does *NOT* (and may not) assign an xid. Thus we're not emitting the
> relcache invalidation queued in DefineIndex(), as
> RecordTransactionCommit() currently skips emitting a commit record if
> there's no xid.


OK.

Surely then the fix is to make heap_inplace_update() assign an xid? That
way any catalog change will always generate a commit record containing the
invalidation that goes with the change. No need to fix up the breakage
later.

The other heap_insert|update|delete functions (and similar) all assign xid,
so it is consistent for us to do that for inplace_update also.


diff --git a/src/backend/access/heap/heapam.c
b/src/backend/access/heap/heapam.c
index f443742..94282a0 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6022,6 +6022,7 @@ heap_abort_speculative(Relation relation, HeapTuple
tuple)
 void
 heap_inplace_update(Relation relation, HeapTuple tuple)
 {
+   TransactionId xid = GetCurrentTransactionId();
Buffer  buffer;
Pagepage;
OffsetNumber offnum;

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


consistent_xid_assignment_for_inplace.v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation cache invalidation on replica

2016-02-26 Thread Andres Freund
On 2016-02-27 01:16:34 +, Simon Riggs wrote:
> If the above is true, then the proposed fix wouldn't work either.
> 
> No point in sending a cache invalidation message on the standby if you
> haven't also written WAL, since the catalog re-read would just see the old
> row.
> 
> heap_inplace_update() does write WAL, which blows away the starting premise.

I'm not following here. heap_inplace_update() indeed writes WAL, but it
does *NOT* (and may not) assign an xid. Thus we're not emitting the
relcache invalidation queued in DefineIndex(), as
RecordTransactionCommit() currently skips emitting a commit record if
there's no xid.


> So I'm not seeing this as an extant bug in an open source version of
> PostgreSQL, in my current understanding.

But the first message in the thread demonstrated a reproducible problem?


Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation cache invalidation on replica

2016-02-26 Thread Simon Riggs
On 27 February 2016 at 00:33, Simon Riggs  wrote:

> On 27 February 2016 at 00:29, Andres Freund  wrote:
>
>> On 2016-02-26 18:05:55 +0300, Konstantin Knizhnik wrote:
>> > The reason of the problem is that invalidation messages are not
>> delivered to
>> > replica after the end of concurrent create index.
>> > Invalidation messages are included in xlog as part of transaction commit
>> > record.
>> > Concurrent index create is split into three transaction, last of which
>> is
>> > just performing inplace update of index tuple, marking it as valid and
>> > invalidating cache. But as far as this transaction is not assigned XID,
>> no
>> > transaction record is created in WAL and send to replicas. As a result,
>> > replica doesn't receive this invalidation messages.
>>
>> Ugh, that's a fairly ugly bug.
>
>
> Looking now.
>

If the above is true, then the proposed fix wouldn't work either.

No point in sending a cache invalidation message on the standby if you
haven't also written WAL, since the catalog re-read would just see the old
row.

heap_inplace_update() does write WAL, which blows away the starting premise.

So I'm not seeing this as an extant bug in an open source version of
PostgreSQL, in my current understanding.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] [PATCH] fix DROP OPERATOR to reset links to itself on commutator and negator

2016-02-26 Thread Euler Taveira
On 26-02-2016 17:51, Roma Sokolov wrote:
> Fixed the patch to be more descriptive and to avoid repeating same 
> computation over and over again. See v2 of the patch attached.
> 
Oh, much better.

> Why do you think that? Should I remove them or maybe send as separate
> patch?
> 
Because it is not a common practice to test catalog dependency on
separate tests (AFAICS initial catalogs are tested with oidjoins.sql).
Also, your test case is too huge for such a small use case. If you can
reduce it to a small set of commands using some of the oidjoins.sql
queries, I think it could be sufficient.


-- 
   Euler Taveira   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation cache invalidation on replica

2016-02-26 Thread Simon Riggs
On 27 February 2016 at 00:29, Andres Freund  wrote:

> On 2016-02-26 18:05:55 +0300, Konstantin Knizhnik wrote:
> > The reason of the problem is that invalidation messages are not
> delivered to
> > replica after the end of concurrent create index.
> > Invalidation messages are included in xlog as part of transaction commit
> > record.
> > Concurrent index create is split into three transaction, last of which is
> > just performing inplace update of index tuple, marking it as valid and
> > invalidating cache. But as far as this transaction is not assigned XID,
> no
> > transaction record is created in WAL and send to replicas. As a result,
> > replica doesn't receive this invalidation messages.
>
> Ugh, that's a fairly ugly bug.


Looking now.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Relation cache invalidation on replica

2016-02-26 Thread Andres Freund
On 2016-02-26 18:05:55 +0300, Konstantin Knizhnik wrote:
> The reason of the problem is that invalidation messages are not delivered to
> replica after the end of concurrent create index.
> Invalidation messages are included in xlog as part of transaction commit
> record.
> Concurrent index create is split into three transaction, last of which is
> just performing inplace update of index tuple, marking it as valid and
> invalidating cache. But as far as this transaction is not assigned XID, no
> transaction record is created in WAL and send to replicas. As a result,
> replica doesn't receive this invalidation messages.

Ugh, that's a fairly ugly bug.


> To fix the problem it is just enough to assign XID to transaction.
> It can be done by adding GetCurrentTransactionId() call to the end of
> DefineIdnex function:

I think it'd be a better idea to always create a commit record if
there's pending invalidation messages. It looks to me that that'd be a
simple addition to RecordTransactionCommit.  Otherwise we'd end up with
something rather fragile, relying on people to recognize such
problems. E.g. Vacuum and analyze also have this problem.

Tom, everyone, do you see any reason not to go with such an approach?
Not sending invalidations we've queued up seems like it could cause
rather serious problems with HS and/or logical decoding.

Basically I'm thinking about assigning an xid
if (nrels != 0 || nmsgs != 0 || RelcacheInitFileInval)

since we really don't ever want to miss sending out WAL in those
cases. Strictly speaking we don't need an xid, but it seems rather
fragile to suddenly have commit records without one.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: get oldest LSN - function

2016-02-26 Thread Andres Freund
On 2016-02-26 18:45:14 +0300, Kartyshov Ivan wrote:
> Function pg_oldest_xlog_location gets us the oldest LSN (Log Sequence
> Number) in xlog.
> 
> It is useful additional tool for DBA (we can get replicationSlotMinLSN, so
> why not in master), it can show us, if xlog replication or wal-sender is
> working properly or indicate if replication on startup can get up to date
> with master, or after long turnoff must be recovered from archive.

How does it help with any of that?


- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Generic WAL logical messages

2016-02-26 Thread Andres Freund
Hi,

I'm not really convinced by RegisterStandbyMsgPrefix() et al. There's
not much documentation about what it actually is supposed to
acomplish. Afaics you're basically forced to use
shared_preload_libraries with it right now?  Also, iterating through a
linked list everytime something is logged doesn't seem very satisfying?


On 2016-02-24 18:35:16 +0100, Petr Jelinek wrote:
> +SET synchronous_commit = on;
> +SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 
> 'test_decoding');
> + ?column? 
> +--
> + init
> +(1 row)

> +SELECT 'msg1' FROM pg_logical_send_message(true, 'test', 'msg1');
> + ?column? 
> +--
> + msg1
> +(1 row)

Hm. Somehow 'sending' a message seems wrong here. Maybe 'emit'?

> +  
> +   
> +
> + pg_logical_send_message
> +
> +
> pg_logical_send_message(transactional
>  bool, prefix text, 
> content text)
> +   
> +   
> +void
> +   
> +   
> +Write text logical decoding message. This can be used to pass generic
> +messages to logical decoding plugins through WAL. The parameter
> +transactional specifies if the message should
> +be part of current transaction or if it should be written and decoded
> +immediately. The prefix has to be prefix which
> +was registered by a plugin. The content is
> +content of the message.
> +   
> +  

It's not decoded immediately, even if emitted non-transactionally.

> +
> + Generic Message Callback
> +
> + 
> +  The optional message_cb callback is called 
> whenever
> +  a logical decoding message has been decoded.
> +
> +typedef void (*LogicalDecodeMessageCB) (
> +struct LogicalDecodingContext *,
> +ReorderBufferTXN *txn,
> +XLogRecPtr message_lsn,
> +bool transactional,
> +const char *prefix,
> +Size message_size,
> +const char *message
> +);

We should at least document what txn is set to if not transactional.

> +void
> +logicalmsg_desc(StringInfo buf, XLogReaderState *record)
> +{
> + char   *rec = XLogRecGetData(record);
> + xl_logical_message *xlrec = (xl_logical_message *) rec;
> +
> + appendStringInfo(buf, "%s message size %zu bytes",
> +  xlrec->transactional ? "transactional" 
> : "nontransactional",
> +  xlrec->message_size);
> +}

Shouldn't we check
  uint8 info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
  if XLogRecGetInfo(record) == XLOG_LOGICAL_MESSAGE
here?

> +const char *
> +logicalmsg_identify(uint8 info)
> +{
> + return NULL;
> +}

Huh?


> +void
> +ReorderBufferQueueMessage(ReorderBuffer *rb, TransactionId xid, XLogRecPtr 
> lsn,
> +   bool transactional, const 
> char *prefix, Size msg_sz,
> +   const char *msg)
> +{
> + ReorderBufferTXN *txn = NULL;
> +
> + if (transactional)
> + {
> + ReorderBufferChange *change;
> +
> + txn = ReorderBufferTXNByXid(rb, xid, true, NULL, lsn, true);
> +
> + Assert(xid != InvalidTransactionId);
> + Assert(txn != NULL);
> +
> + change = ReorderBufferGetChange(rb);
> + change->action = REORDER_BUFFER_CHANGE_MESSAGE;
> + change->data.msg.transactional = true;
> + change->data.msg.prefix = pstrdup(prefix);
> + change->data.msg.message_size = msg_sz;
> + change->data.msg.message = palloc(msg_sz);
> + memcpy(change->data.msg.message, msg, msg_sz);
> +
> + ReorderBufferQueueChange(rb, xid, lsn, change);
> + }
> + else
> + {
> + rb->message(rb, txn, lsn, transactional, prefix, msg_sz, msg);
> + }
> +}


This approach prohibts catalog access when processing a nontransaction
message as there's no snapshot set up.

> + case REORDER_BUFFER_CHANGE_MESSAGE:
> + {
> + char   *data;
> + size_t  prefix_size = 
> strlen(change->data.msg.prefix) + 1;
> +
> + sz += prefix_size + 
> change->data.msg.message_size;
> + ReorderBufferSerializeReserve(rb, sz);
> +
> + data = ((char *) rb->outbuf) + 
> sizeof(ReorderBufferDiskChange);
> + memcpy(data, change->data.msg.prefix,
> +prefix_size);
> + memcpy(data + prefix_size, 
> change->data.msg.message,
> +change->data.msg.message_size);
> + break;
> + }
>   case REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT:
>   {
>   Snapshotsnap;
> @@ -2354,6 

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-02-26 Thread Andres Freund
On 2016-02-02 13:12:50 +0300, Alexander Korotkov wrote:
> On Tue, Feb 2, 2016 at 12:43 AM, Andres Freund  wrote:
> 
> > On 2016-02-01 13:06:57 +0300, Alexander Korotkov wrote:
> > > On Mon, Feb 1, 2016 at 11:34 AM, Alexander Korotkov <
> > > a.korot...@postgrespro.ru> wrote:
> > > >> ClientBasePatch
> > > >> 11974419382
> > > >> 8125923126395
> > > >> 3231393151
> > > >> 64387339496830
> > > >> 128306412350610
> > > >>
> > > >> Shared Buffer= 512MB
> > > >> max_connections=150
> > > >> Scale Factor=300
> > > >>
> > > >> ./pgbench  -j$ -c$ -T300 -M prepared -S postgres
> > > >>
> > > >> ClientBasePatch
> > > >> 11716916454
> > > >> 8108547105559
> > > >> 32241619262818
> > > >> 64206868233606
> > > >> 128137084217013
> >
> > So, there's a small regression on low client counts. That's worth
> > addressing.
> >
> 
> Interesting. I'll try to reproduce it.

Any progress here?

Regards,

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The plan for FDW-based sharding

2016-02-26 Thread Simon Riggs
On 26 February 2016 at 22:48, Kevin Grittner  wrote:

> On Fri, Feb 26, 2016 at 2:19 PM, Konstantin Knizhnik
>  wrote:
>
> > pg_tsdtm  is based on another approach: it is using system time
> > as CSN
>
> Which brings up an interesting point, if we want logical
> replication to be free of serialization anomalies for those using
> serializable transactions, we need to support applying transactions
> in an order which may not be the same as commit order -- CSN (as
> such) would be the wrong thing.  If serializable transaction 1 (T1)
> modifies a row and concurrent serializable transaction 2 (T2) reads
> the old version of the row, and modifies something based on that,
> T2 must be applied to a logical replica first even if T1 commits
> before it; otherwise the logical replica could see a state not
> consistent with business rules and which could not have been seen
> (due to SSI) on the source database.


How would SSI allow that commit order?

Surely there is a read-write dependency that would cause T2 to be aborted?


> Any DTM API which does not
> support some mechanism to rearrange the order of transactions from
> commit order to some other order (based on, for example, read-write
> dependencies) is not complete.  If it does support that, it gives
> us a way forward for presenting consistent data on logical
> replicas.
>

You appear to be saying that SSI allows transactions to commit in a
non-serializable order.

Do you have a test case?

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] [COMMITTERS] pgsql: Add a test framework for recovery

2016-02-26 Thread Michael Paquier
On Sat, Feb 27, 2016 at 4:59 AM, Alvaro Herrera
 wrote:
> Alvaro Herrera wrote:
>> Add a test framework for recovery
>>
>> This long-awaited framework is an expansion of the existing PostgresNode
>> stuff to support additional features for recovery testing; the recovery
>> tests included in this commit are a starting point that cover some of
>> the recovery features we have.  More scripts are expected to be added
>> later.
>
> It seems the buildfarm doesn't run these tests.  It'd be good to fix
> that, but not sure what's the best way to go about it.  I surely prefer
> that it didn't involve changing the buildfarm client script ...

The buildfarm script does a per-path test with isolation/ and
regression/, modifying it would be needed anyway for MSVC.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The plan for FDW-based sharding

2016-02-26 Thread Kevin Grittner
On Fri, Feb 26, 2016 at 2:19 PM, Konstantin Knizhnik
 wrote:

> pg_tsdtm  is based on another approach: it is using system time
> as CSN

Which brings up an interesting point, if we want logical
replication to be free of serialization anomalies for those using
serializable transactions, we need to support applying transactions
in an order which may not be the same as commit order -- CSN (as
such) would be the wrong thing.  If serializable transaction 1 (T1)
modifies a row and concurrent serializable transaction 2 (T2) reads
the old version of the row, and modifies something based on that,
T2 must be applied to a logical replica first even if T1 commits
before it; otherwise the logical replica could see a state not
consistent with business rules and which could not have been seen
(due to SSI) on the source database.  Any DTM API which does not
support some mechanism to rearrange the order of transactions from
commit order to some other order (based on, for example, read-write
dependencies) is not complete.  If it does support that, it gives
us a way forward for presenting consistent data on logical
replicas.

To avoid confusion, it might be best to reserve CSN for actual
commit sequence numbers, or at least values which increase
monotonically with each commit.  The term of art for what I
described above is "apparent order of execution", so maybe we want
to use AOE or AOoE for the order we choose to use in a particular
implementation.  It doesn't seem to me to be outright inaccurate
for cases where the system time on the various systems is used.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] In-core regression tests for replication, cascading, archiving, PITR, etc.

2016-02-26 Thread Michael Paquier
On Sat, Feb 27, 2016 at 4:30 AM, Alvaro Herrera
 wrote:
> Craig Ringer wrote:
>> Should be committed ASAP IMO.
>
> Finally pushed it.  Let's see how it does in the buildfarm.  Now let's
> get going and add more tests, I know there's no shortage of people with
> test scripts waiting for this.
>
> Thanks, Michael, for the persistency, and thanks to all reviewers.  (I'm
> sorry we seem to have lost Amir Rohan in the process.  He was doing
> a great job.)

Date of first message of this thread: Mon, 2 Dec 2013 15:40:41 +0900
Date of this message: Fri, 26 Feb 2016 16:30:08 -0300
This has been a long trip. Thanks a lot to all involved. Many people
have reviewed and helped out with this patch.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] fix DROP OPERATOR to reset links to itself on commutator and negator

2016-02-26 Thread Roma Sokolov
Thanks for comments, Euler! 

> ... is hard to understand. Instead, you could separate the conditional
> expression into a variable.

Fixed the patch to be more descriptive and to avoid repeating same 
computation over and over again. See v2 of the patch attached.

> I don't think those are mandatory.


Why do you think that? Should I remove them or maybe send as separate
patch?

Cheers,
Roma



fix_drop_operator_reset_oprcom_and_oprnegate_fields_v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The plan for FDW-based sharding

2016-02-26 Thread Konstantin Knizhnik

On 02/26/2016 09:30 PM, Alvaro Herrera wrote:

Konstantin Knizhnik wrote:


Yes, it is certainly possible to develop cluster by cloning PostgreSQL.
But it cause big problems both for developers, which have to permanently
synchronize their branch with master,
and, what is more important, for customers, which can not use standard
version of PostgreSQL.
It may cause problems with system certification, with running Postgres in
cloud,...
Actually the history of Postgres-XL/XC and Greenplum IMHO shows that it is
wrong direction.

That's not the point, though.  I don't think a Postgres clone with a GTM
solves any particular problem that's not already solved by the existing
forks.  However, if you have a clone at home and you make a GTM work on
it, then you take the GTM as a patch and post it for discussion.
There's no need for hooks for that.  Just make sure your GTM solves the
problem that it is supposed to solve.

Excuse me if I've missed the discussion elsewhere -- why does
PostgresPro have *two* GTMs instead of a single one?


There are many different clusters which require different approaches for 
managing distributed transactions.
Some clusters do no need distributed transactions at all: if you are executing 
OLAP queries on read-only database GTM will  just add extra overhead.

pg_dtm uses centralized arbiter. It is similar with Postgres-XL DTM. Presence of single arbiter signficantly simplify all distributed algorithms: failure detection, global deadlock elimination, ... But at the same time arbiter is SPOF and main factor 
limiting cluster scalability.


pg_tsdtm  is based on another approach: it is using system time as CSN and doesn't require arbiter. In theory there is no limit for scalability. But differences in system time and necessity to use more rounds of communication have negative impact on 
performance.


So there is no ideal solution which can work well for all cluster. This is why it is not possible to develop just one GTM, propose it as a patch for review and then (hopefully) commit it in Postgres core. IMHO it will never happen. And I do not think that 
it is actually needed. What we need is a way to be able to create own transaction managers as Postgres extension not affecting its  core.


All arguments against XTM can be applied to any other extension API in 
Postgres, for example FDW.
Is it general enough? There are many useful operations which currently are not handled by this API. For example performing aggregation and grouping at foreign server side.  But still it is very useful and flexible mechanism, allowing to implement many 
wonderful things.


From my point of view good system should be as open and customizable as 
possible, if it doesn't affect  performance.
Replacing direct function calls with indirect function calls in almost all 
cases can not suffer performance as well as adding hooks.
So without any extra price we get better flexibility. What's wrong with it?






--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The plan for FDW-based sharding

2016-02-26 Thread Bruce Momjian
On Fri, Feb 26, 2016 at 03:30:29PM -0300, Alvaro Herrera wrote:
> That's not the point, though.  I don't think a Postgres clone with a GTM
> solves any particular problem that's not already solved by the existing
> forks.  However, if you have a clone at home and you make a GTM work on
> it, then you take the GTM as a patch and post it for discussion.
> There's no need for hooks for that.  Just make sure your GTM solves the
> problem that it is supposed to solve.
> 
> Excuse me if I've missed the discussion elsewhere -- why does
> PostgresPro have *two* GTMs instead of a single one?

I think the issue is that a GTM that works for a low-latency network
doesn't work well for a high-latency network, so the high-latency GTM
has fewer features and guarantees.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Add a test framework for recovery

2016-02-26 Thread Alvaro Herrera
Alvaro Herrera wrote:
> Add a test framework for recovery
> 
> This long-awaited framework is an expansion of the existing PostgresNode
> stuff to support additional features for recovery testing; the recovery
> tests included in this commit are a starting point that cover some of
> the recovery features we have.  More scripts are expected to be added
> later.

It seems the buildfarm doesn't run these tests.  It'd be good to fix
that, but not sure what's the best way to go about it.  I surely prefer
that it didn't involve changing the buildfarm client script ...

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] In-core regression tests for replication, cascading, archiving, PITR, etc.

2016-02-26 Thread Alvaro Herrera
Craig Ringer wrote:

> Should be committed ASAP IMO.

Finally pushed it.  Let's see how it does in the buildfarm.  Now let's
get going and add more tests, I know there's no shortage of people with
test scripts waiting for this.

Thanks, Michael, for the persistency, and thanks to all reviewers.  (I'm
sorry we seem to have lost Amir Rohan in the process.  He was doing
a great job.)

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance degradation in commit 6150a1b0

2016-02-26 Thread Andres Freund
Hi,

On 2016-02-25 12:56:39 +0530, Amit Kapila wrote:
> From past few weeks, we were facing some performance degradation in the
> read-only performance bench marks in high-end machines.  My colleague
> Mithun, has tried by reverting commit ac1d794 which seems to degrade the
> performance in HEAD on high-end m/c's as reported previously[1], but still
> we were getting degradation, then we have done some profiling to see what
> has caused it  and we found that it's mainly caused by spin lock when
> called via pin/unpin buffer and then we tried by reverting commit 6150a1b0
> which has recently changed the structures in that area and it turns out
> that reverting that patch, we don't see any degradation in performance.
> The important point to note is that the performance degradation doesn't
> occur every time, but if the tests are repeated twice or thrice, it
> is easily visible.

> m/c details
> IBM POWER-8
> 24 cores,192 hardware threads
> RAM - 492GB
> 
> Non-default postgresql.conf settings-
> shared_buffers=16GB
> max_connections=200
> min_wal_size=15GB
> max_wal_size=20GB
> checkpoint_timeout=900
> maintenance_work_mem=1GB
> checkpoint_completion_target=0.9
> 
> scale_factor - 300
> 
> Performance at commit 43cd468cf01007f39312af05c4c92ceb6de8afd8 is 469002 at
> 64-client count and then at 6150a1b08a9fe7ead2b25240be46dddeae9d98e1, it
> went down to 200807.  This performance numbers are median of 3 15-min
> pgbench read-only tests.  The similar data is seen even when we revert the
> patch on latest commit.  We have yet to perform detail analysis as to why
> the commit 6150a1b08a9fe7ead2b25240be46dddeae9d98e1 lead to degradation,
> but any ideas are welcome.

Ugh. Especially the varying performance is odd. Does it vary between
restarts, or is it just happenstance?  If it's the former, we might be
dealing with some alignment issues.

If not, I wonder if the issue is massive buffer header contention. As a
LL/SC architecture acquiring the content lock might interrupt buffer
spinlock acquisition and vice versa.

Does applying the patch from 
http://archives.postgresql.org/message-id/CAPpHfdu77FUi5eiNb%2BjRPFh5S%2B1U%2B8ax4Zw%3DAUYgt%2BCPsKiyWw%40mail.gmail.com
change the picture?

Regards,

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATH] Correct negative/zero year in to_date/to_timestamp

2016-02-26 Thread Ivan Kartyshov
> Why does it say "tested, failed" for all points above there? ;-)

Hi,  I just used Web reviewer form on https://commitfest.postgresql.org to make 
review on patch, but form doesn't work properly unlike the patch.))  

Re: [HACKERS] [PATCH] fix DROP OPERATOR to reset links to itself on commutator and negator

2016-02-26 Thread Euler Taveira
On 26-02-2016 12:46, Roma Sokolov wrote:
> Regression tests are added to check DROP OPERATOR behaves as intended 
> (including
> case with self-commutator and unlikely case with operator being both negator 
> and
> commutator).
> 
I don't think those are mandatory.

> Should this patch be added to CommitFest?
> 
Why not?

I didn't test your patch but

+  if (isDelete ? (t->oprcom == baseId || t->oprnegate == baseId)
+  : !OidIsValid(t->oprcom) || !OidIsValid(t->oprnegate))

... is hard to understand. Instead, you could separate the conditional
expression into a variable.

+ if (isDelete ? t->oprnegate == baseId : !OidIsValid(t->oprnegate))

It could be separate into a variable to be readable (or at least deserve
a comment).

(isDelete ? InvalidOid : ObjectIdGetDatum(baseId))

... and this one too. It is used in 4 places in that function.


-- 
   Euler Taveira   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.

2016-02-26 Thread Alvaro Herrera
Kyotaro HORIGUCHI wrote:

> So, I'd like to propose four (or five) changes to this harness.
> 
>  - prove_check to remove all in tmp_check
> 
>  - TestLib to preserve temporary directories/files if the current
>test fails.
> 
>  - PostgresNode::get_new_node to create data directory with
>meaningful basenames.
> 
>  - PostgresNode::psql to return a list of ($stdout, $stderr) if
>requested. (The previous behavior is not changed)
> 
>  - (recovery/t/00x_* gives test number to node name)
> 
> As a POC, the attached diff will appliy on the 0001 and (fixed)
> 0003 patches.

These changes all seem very reasonable to me.  I'm not so sure about the
last one.  Perhaps the framework ought to generate an appropriate subdir
name using the test file name plus the node name, so that instead of
tmp_ it becomes tmp_001_master_ or something like that?  Having
be a coding convention doesn't look real nice to me.

I didn't try to apply your patch but I'm fairly certain it would
conflict with what's here now; can you please rebase and resend?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The plan for FDW-based sharding

2016-02-26 Thread Alvaro Herrera
Konstantin Knizhnik wrote:

> Yes, it is certainly possible to develop cluster by cloning PostgreSQL.
> But it cause big problems both for developers, which have to permanently
> synchronize their branch with master,
> and, what is more important, for customers, which can not use standard
> version of PostgreSQL.
> It may cause problems with system certification, with running Postgres in
> cloud,...
> Actually the history of Postgres-XL/XC and Greenplum IMHO shows that it is
> wrong direction.

That's not the point, though.  I don't think a Postgres clone with a GTM
solves any particular problem that's not already solved by the existing
forks.  However, if you have a clone at home and you make a GTM work on
it, then you take the GTM as a patch and post it for discussion.
There's no need for hooks for that.  Just make sure your GTM solves the
problem that it is supposed to solve.

Excuse me if I've missed the discussion elsewhere -- why does
PostgresPro have *two* GTMs instead of a single one?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The plan for FDW-based sharding

2016-02-26 Thread Konstantin Knizhnik
We do not have formal prove that proposed XTM is "general enough" to 
handle all possible transaction manager implementations.
But there are two general ways of dealing with isolation: snapshot based 
and CSN  based.

pg_dtm and pg_tsdtm prove that both of them can be implemented using XTM.
If you know some approach to distributed transaction manager 
implementation, please let us know.

Otherwise your statement "is not general enough" is not concrete enough.
Postgres-XL GTM can be in principle implemented as extension based on XTM.

This API is based on existed PostgreSQL TM functions: we do not 
introduce some new abstractions.
Is it possible that some other TM function has to be encapsulated? Yes, 
it is.
But I do not see much problems with adding this function to XTM in 
future if it is actually needed.
It happens with most APIs. It is awful when API functions are changed, 
breaking application based on this API.
But as far as functions encapsulated in XTM are in any case present in 
PostgreSQL core, I do not think
that them will be changed in future unless there are some plans to 
completely rewrite Postgres transaction manager...


Yes, it is certainly possible to develop cluster by cloning PostgreSQL.
But it cause big problems both for developers, which have to permanently 
synchronize their branch with master,
and, what is more important, for customers, which can not use standard 
version of PostgreSQL.
It may cause problems with system certification, with running Postgres 
in cloud,...
Actually the history of Postgres-XL/XC and Greenplum IMHO shows that it 
is wrong direction.




On 26.02.2016 19:06, Robert Haas wrote:

On Fri, Feb 26, 2016 at 7:21 PM, Oleg Bartunov  wrote:

Right now tm is hardcoded and it's doesn't matter  "if other people might
need" at all.  We at least provide developers ("other people")  ability to
work on their implementations and the patch  is safe and doesn't sacrifices
anything in core.

I don't believe that.  When we install APIs into core, we're
committing to keep those APIs around.  And I think that we're far too
early in the development of transaction managers for PostgreSQL to
think that we know what APIs we want to commit to over the long term.


And what makes us think we
really need multiple transaction managers, anyway?

If you brave enough to say that one tm-fits-all and you are able to teach
existed tm to play well  in various clustering environment during
development period, which is short, than probably we don't need  multiple
tms. But It's too perfect to believe and practical solution is to let
multiple groups to work on their solutions.

Nobody's preventing multiple groups for working on their solutions.
That's not the question.  The question is why we should install hooks
in core at this early stage without waiting to see which
implementations prove to be best and whether those hooks are actually
general enough to cater to everything people want to do.  There is
talk of integrating XC/XL work into PostgreSQL; it has a GTM.
Postgres Pro has several GTMs.  Maybe there will be others.

Frankly, I'd like to see a GTM in core at some point because I'd like
everybody who uses PostgreSQL to have access to a GTM.  What I don't
want is for every PostgreSQL company to develop its own GTM and
distribute it separately from everybody else's.  IIUC, MySQL kinda did
that with storage engines and it resulted in the fragmentation of the
community.  We've had the same thing happen with replication tools -
every PostgreSQL company develops their own set.  It would have been
better to have ONE set that was distributed by the core project so
that we didn't all do the same work over again.

I don't understand the argument that without these hooks in core,
people can't continue to work on this.  It isn't hard to work on GTM
without any core changes at all.  You just patch your copy of
PostgreSQL.  We do this all the time, for every patch.  We don't add
hooks for every patch.


dtms.  It's time to start working on dtm, I believe. The fact you don't
think about distributed transactions support doesn't mean there no "other
people", who has different ideas on postgres future.  That's why we propose
this patch, let's play the game !

I don't like to play games with the architecture of PostgreSQL.



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The plan for FDW-based sharding

2016-02-26 Thread Robert Haas
On Fri, Feb 26, 2016 at 10:00 PM, Joshua D. Drake  
wrote:
> Robert, this is all a game. It is a game of who wins the intellectual prize
> to whatever problem. Who gets the market or mind share and who gets to
> pretend they win the Oscar for coolest design.

JD, I don't have a horse in this race.  I am not developing a GTM and
I would be quite happy never to have to develop a GTM.  That doesn't
mean I think we should add these proposed hooks.  I think that's just
freezing the way that potential GTMs have to interact with the rest of
the system before we actually have a solution that the community is
willing to endorse.  I don't know what problem that solves.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The plan for FDW-based sharding

2016-02-26 Thread Joshua D. Drake

On 02/26/2016 08:06 AM, Robert Haas wrote:

On Fri, Feb 26, 2016 at 7:21 PM, Oleg Bartunov  wrote:

Right now tm is hardcoded and it's doesn't matter  "if other people might
need" at all.  We at least provide developers ("other people")  ability to
work on their implementations and the patch  is safe and doesn't sacrifices
anything in core.


I don't believe that.  When we install APIs into core, we're
committing to keep those APIs around.  And I think that we're far too
early in the development of transaction managers for PostgreSQL to
think that we know what APIs we want to commit to over the long term.


Correct.

[snip]



Frankly, I'd like to see a GTM in core at some point because I'd like
everybody who uses PostgreSQL to have access to a GTM.  What I don't
want is for every PostgreSQL company to develop its own GTM and
distribute it separately from everybody else's.  IIUC, MySQL kinda did
that with storage engines and it resulted in the fragmentation of the
community.


No it didn't. It allowed MySQL people to use the tool that best fit 
their needs.



We've had the same thing happen with replication tools -
every PostgreSQL company develops their own set.  It would have been
better to have ONE set that was distributed by the core project so
that we didn't all do the same work over again.


The reason people developed a bunch of external replication tools (and 
continue to) is because .Org has shown a unique lack of leadership in 
providing solutions for the problem. Historically speaking .Org was anti 
replication in core. It wasn't about who was going to be best. It was 
who was going to be best for what problem. The inclusion of the 
replication tools we have now speaks very loudly to the that lack of 
leadership.


The moment .Org showed leadership and developed a reasonable solution to 
80% of the problem, a great majority of people moved to hot standby and 
streaming replication. It is easy. It does not answer all the questions 
but it is default, in core and that gives people piece of mind. This is 
also why once PgLogical is up to -core quality and in -core, the great 
majority of people will work to dump Slony/Londiste/Insertproghere and 
use PgLogical.


If .Org was interested in showing leadership in this area, a few hackers 
would get together with a few other hackers from XL and XC (although as 
I understand it XL is further along), have a few heart to heart, mind to 
mind meetings and determine:


* Are either of these two solutions worth it?
Yes? Then let's start working on an integration plan and get it done.
No? Then let's start working on a .Org plan to solve that problem.

But that likely won't happen because NIH.



I don't understand the argument that without these hooks in core,
people can't continue to work on this.  It isn't hard to work on GTM
without any core changes at all.  You just patch your copy of
PostgreSQL.  We do this all the time, for every patch.  We don't add
hooks for every patch.


dtms.  It's time to start working on dtm, I believe. The fact you don't
think about distributed transactions support doesn't mean there no "other
people", who has different ideas on postgres future.  That's why we propose
this patch, let's play the game !


I don't like to play games with the architecture of PostgreSQL.



Robert, this is all a game. It is a game of who wins the intellectual 
prize to whatever problem. Who gets the market or mind share and who 
gets to pretend they win the Oscar for coolest design.


Sincerely,

jD

--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FDW handling count(*) through AnalyzeForeignTable or other constant time push-down

2016-02-26 Thread Gabe F. Rudy
Ok, I get that.

Really what I am *rooting* for is Aggregate (and Sort By) Push-Down to FDW 
plugins.

I can already internalize conditional filters for most cases, and doing a count 
on the filtered results would be considerably faster in my FDW back-end before 
all the records and Datums have to be constructed for postgres to do the 
counting.

Similarly, I'm very excited about the potential for FDW to advertise a-priori 
sort states, so things like external merge-sorts can pass-through the request 
for sorted data for fields in which sorting is a no-op in my backend.

Importantly my IDs are sorted by definition since they are essentially array 
indexes into the column-store, so joining on them with merge-sort should be 
blazing fast, but currently time is wasted sorting these pre-sorted fields.

Just my 2c, and I'll be tracking the 9.6 progress that includes some of these 
proposals.

Gabe

-Original Message-
From: Tom Lane [mailto:t...@sss.pgh.pa.us] 
Sent: Thursday, February 25, 2016 11:21 PM
To: Gabe F. Rudy 
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] FDW handling count(*) through AnalyzeForeignTable or 
other constant time push-down

"Gabe F. Rudy"  writes:
> Is there any way to convince Postgres FDW to leverage the analyze row counts 
> or even the "double* totalRowCount" returned from the AcquireSampleRows 
> callback from my AnalyzeForeignTable function so that it does not do a 
> full-table scan for a COUNT(*) etc?

No.  In PG's view, ANALYZE-based row counts are imprecise by definition.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: index-only scans with partial indexes

2016-02-26 Thread Robert Haas
On Fri, Feb 26, 2016 at 6:16 PM, Michael Paquier
 wrote:
> On Fri, Feb 26, 2016 at 4:18 PM, Kyotaro HORIGUCHI
>  wrote:
>> I marked this as "ready for commiter" and tried to add me as the
>> *second* author. But the CF app forces certain msyterious order
>> for listed names. Is there any means to arrange the author names
>> in desired order?
>
> Those are automatically classified by alphabetical order.

Doh.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FDW handling count(*) through AnalyzeForeignTable or other constant time push-down

2016-02-26 Thread Simon Riggs
On 25 February 2016 at 09:48, Gabe F. Rudy  wrote:

> Hey all,
>
>
>
> I’m building a FDW around a column-store backend (similar to CStore but
> for genomic data!).
>
>
>
> I have tables in the billions of rows, and have a common query pattern of
> asking for the table size (i.e. SELECT COUNT(*) FROM big_fdw_table; ).
>
>
>
> This is a read-optimized system in which I know in constant time the exact
> dimensions of the table.
>
>
>
> Is there any way to convince Postgres FDW to leverage the analyze row
> counts or even the “double* totalRowCount” returned from the
> AcquireSampleRows callback from my AnalyzeForeignTable function so that it
> does not do a full-table scan for a COUNT(*) etc?
>
>
>
> My current fallback is to export a specialized function that returns the
> table row count for a given FDW table, but that then leaks into the
> user-application driving these queries.
>

Look at TABLESAMPLE, which does mostly what you're asking.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] The plan for FDW-based sharding

2016-02-26 Thread Robert Haas
On Fri, Feb 26, 2016 at 7:21 PM, Oleg Bartunov  wrote:
> Right now tm is hardcoded and it's doesn't matter  "if other people might
> need" at all.  We at least provide developers ("other people")  ability to
> work on their implementations and the patch  is safe and doesn't sacrifices
> anything in core.

I don't believe that.  When we install APIs into core, we're
committing to keep those APIs around.  And I think that we're far too
early in the development of transaction managers for PostgreSQL to
think that we know what APIs we want to commit to over the long term.

>> And what makes us think we
>> really need multiple transaction managers, anyway?
>
> If you brave enough to say that one tm-fits-all and you are able to teach
> existed tm to play well  in various clustering environment during
> development period, which is short, than probably we don't need  multiple
> tms. But It's too perfect to believe and practical solution is to let
> multiple groups to work on their solutions.

Nobody's preventing multiple groups for working on their solutions.
That's not the question.  The question is why we should install hooks
in core at this early stage without waiting to see which
implementations prove to be best and whether those hooks are actually
general enough to cater to everything people want to do.  There is
talk of integrating XC/XL work into PostgreSQL; it has a GTM.
Postgres Pro has several GTMs.  Maybe there will be others.

Frankly, I'd like to see a GTM in core at some point because I'd like
everybody who uses PostgreSQL to have access to a GTM.  What I don't
want is for every PostgreSQL company to develop its own GTM and
distribute it separately from everybody else's.  IIUC, MySQL kinda did
that with storage engines and it resulted in the fragmentation of the
community.  We've had the same thing happen with replication tools -
every PostgreSQL company develops their own set.  It would have been
better to have ONE set that was distributed by the core project so
that we didn't all do the same work over again.

I don't understand the argument that without these hooks in core,
people can't continue to work on this.  It isn't hard to work on GTM
without any core changes at all.  You just patch your copy of
PostgreSQL.  We do this all the time, for every patch.  We don't add
hooks for every patch.

> dtms.  It's time to start working on dtm, I believe. The fact you don't
> think about distributed transactions support doesn't mean there no "other
> people", who has different ideas on postgres future.  That's why we propose
> this patch, let's play the game !

I don't like to play games with the architecture of PostgreSQL.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] In-core regression tests for replication, cascading, archiving, PITR, etc.

2016-02-26 Thread Alvaro Herrera
Michael Paquier wrote:
 
> Attached are rebased patches, split into 3 parts doing the following:
> - 0001, fix default configuration of MSVC builds ignoring TAP tests

BTW you keep submitting this one and I keep ignoring it.  I think you
should start a separate thread for this one, so that some
Windows-enabled committer can look at it and maybe push it.  I still
don't understand why this matters.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] In-core regression tests for replication, cascading, archiving, PITR, etc.

2016-02-26 Thread Alvaro Herrera
Victor Wagner wrote:

> I'll second Stas' suggestion about psql_ok/psql_fail functions.
> 
> 1. psql_ok instead of just psql would provide visual feedback for the
> reader of code. One would see 'here condition is tested, here is
> something ended with _ok/_fail'.
> 
> It would be nice that seeing say "use Test::More tests => 4"
> one can immediately see "Yes, there is three _ok's and one _fail in the
> script'
> 
> 2. I have use case for psql_fail code. In my libpq failover patch there
> is number of cases, where it should be tested that connection is not
> established,
> 
> But this is rather about further evolution of the tap test library, not
> about this set of tests.

This makes sense to me.  Please submit a patch for this.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance degradation in commit 6150a1b0

2016-02-26 Thread Robert Haas
On Fri, Feb 26, 2016 at 8:41 PM, Simon Riggs  wrote:
> Don't understand this. If a problem is caused by one of two things, first
> you check one, then the other.

I don't quite understand how you think that patch can be decomposed
into multiple, independent changes.  It was one commit because every
change in there is interdependent with every other one, at least as
far as I can see.  I don't really understand how you'd split it up, or
what useful information you'd hope to gain from testing a split patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [PATCH] fix DROP OPERATOR to reset links to itself on commutator and negator

2016-02-26 Thread Roma Sokolov
Hello, hackers!

As was mentioned in that message [1], and earlier in [2], DROPping OPERATOR with
negator or commutator doesn't update respective fields (oprnegate and oprcom) on
them. While this issue is probably not very frequent, this behaviour leaves
database in inconsistent state and prevents from using former negator or
commutator.

Proposed patch fixes unwanted behaviour by reseting oprnegate and oprcom fields
to InvalidOid when deleting operator. Code updates tries to update other
operators only if necessary, and only if negator or commutator have target oid
in their respective fields.

To implement desired behaviour with as little code as possible, OperatorUpd
function is extended to accept additional parameter and made externally visible.
It's used from RemoveOperatorById, after checking that update is indeed needed.

Regression tests are added to check DROP OPERATOR behaves as intended (including
case with self-commutator and unlikely case with operator being both negator and
commutator).

Should this patch be added to CommitFest?

[1] — 
http://www.postgresql.org/message-id/ca+tgmoythtzzf6yhnq22szpw7otit68iu9wvvidb2jz50kd...@mail.gmail.com
[2] — http://www.postgresql.org/message-id/15208.1285772...@sss.pgh.pa.us

Cheers,
Roma



fix_drop_operator_reset_oprcom_and_oprnegate_fields_v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] proposal: get oldest LSN - function

2016-02-26 Thread Kartyshov Ivan

Hello, I want to suggest a client-side little function, implemented
in the attached patch.

Function pg_oldest_xlog_location gets us the oldest LSN (Log Sequence 
Number) in xlog.


It is useful additional tool for DBA (we can get replicationSlotMinLSN, 
so why not in master), it can show us, if xlog replication or wal-sender 
is working properly or indicate if replication on startup can get up to 
date with master, or after long turnoff must be recovered from archive.


Anyway, does it look useful enough to be part of postgres?
I guess I should push this to commitfest if that's the case.

Best regards,

--
Ivan Kartyshov
Postgres Professional: www.postgrespro.com
Russian Postgres Company
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index f9eea76..f774233 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -16908,6 +16908,8 @@ SELECT set_config('log_statement_stats', 'off', false);


 pg_current_xlog_location
+   
+pg_oldest_xlog_location


 pg_start_backup
@@ -16981,6 +16983,13 @@ SELECT set_config('log_statement_stats', 'off', false);
   
   

+pg_oldest_xlog_location()
+
+   pg_lsn
+   Get the oldest WAL LSN (log sequence number)
+  
+  
+   
 pg_start_backup(label text , fast boolean )
 
pg_lsn
@@ -17096,6 +17105,7 @@ postgres=# select pg_start_backup('label_goes_here');

 

+pg_oldest_xlog_location displays the oldest WAL LSN.
 pg_current_xlog_location displays the current transaction log write
 location in the same format used by the above functions.  Similarly,
 pg_current_xlog_insert_location displays the current transaction log
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index 6cb690c..5a0e887 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -860,6 +860,8 @@ primary_conninfo = 'host=192.168.1.50 port=5432 user=foo password=foopass'
  The last WAL receive location in the standby is also displayed in the
  process status of the WAL receiver process, displayed using the
  ps command (see  for details).
+ Also we can get the oldest WAL LSN (Log Sequence Number) 
+ pg_oldest_xlog_location, it can give us a useful tool for DBA, additionally it can show us, if xlog replication or wal-sender is working properly or indicate if replication on startup can can get up to date with master, or after long turnoff must be recovered from archive. 
 
 
  You can retrieve a list of WAL sender processes via the
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 94b79ac..067d51c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10669,6 +10669,18 @@ GetXLogWriteRecPtr(void)
 }
 
 /*
+ * Get oldest WAL write pointer
+ */
+XLogRecPtr
+GetXLogOldestLSNPtr(void)
+{
+	XLogRecPtr	result;
+
+	XLogSegNoOffsetToRecPtr(XLogGetLastRemovedSegno()+1, 1, result);
+	return result;
+}
+
+/*
  * Returns the redo pointer of the last checkpoint or restartpoint. This is
  * the oldest point in WAL that we still need, if we have to restart recovery.
  */
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 31cbb01..44e01e1 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -195,6 +195,27 @@ pg_current_xlog_location(PG_FUNCTION_ARGS)
 }
 
 /*
+ * Report the oldest WAL write location (same format as pg_start_backup etc)
+ *
+ * This is useful for determining the first LSN in existing sequences
+ */
+Datum
+pg_oldest_xlog_location(PG_FUNCTION_ARGS)
+{
+	XLogRecPtr	oldest_recptr;
+
+	if (RecoveryInProgress())
+		ereport(ERROR,
+(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("recovery is in progress"),
+ errhint("WAL control functions cannot be executed during recovery.")));
+
+	oldest_recptr = GetXLogOldestLSNPtr();
+
+	PG_RETURN_LSN(oldest_recptr);
+}
+
+/*
  * Report the current WAL insert location (same format as pg_start_backup etc)
  *
  * This function is mostly for debugging purposes.
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index b24e434..3c2cefb 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -5202,6 +5202,10 @@ DESCR("get an individual replication origin's replication progress");
 DATA(insert OID = 6014 ( pg_show_replication_origin_status PGNSP PGUID 12 1 100 0 0 f f f f f t v r 0 0 2249 "" "{26,25,3220,3220}" "{o,o,o,o}" "{local_id, external_id, remote_lsn, local_lsn}" _null_ _null_ pg_show_replication_origin_status _null_ _null_ _null_ ));
 DESCR("get progress for all replication origins");
 
+
+DATA(insert OID = 6015 ( pg_oldest_xlog_location	PGNSP PGUID 12 1 0 0 0 f f f f t f v s 0 0 3220 "" _null_ _null_ _null_ _null_ _null_ pg_oldest_xlog_location _null_ _null_ _null_ ));
+DESCR("pg oldest 

Re: [HACKERS] get current log file

2016-02-26 Thread Armor
I think I know what you are concerned about. May be I did not explain my 
solution very clearly.
(i) Using a variable named last_syslogger_file_time replace 
first_syslogger_file_time in syslogger.c. When postmaster initialize logger 
process,   last_syslogger_file_time will be assign the time stamp when logger 
start, then fork the child process for logger. Later logger will create a log 
file based on last_syslogger_file_time . And   last_syslogger_file_time in the 
postmaster process will be inherited by other  auxiliary processes 

(ii) when pgstat process initialize, it will read  last_syslogger_file_time 
from pg stat file of last time (because pgstat process will write it to pg stat 
file). And then pgstat process will get last_syslogger_file_time inherit from 
postmaster,  if this version of  last_syslogger_file_time is larger then that 
read from the stat file, it means logger create a new log file so use it as the 
latest value; else means pgstat process crashed before, so it need to use the 
value from stat file as the latest.
(iii) when logger rotate a log file, it will assign time stamp to 
last_syslogger_file_time  and send it to pg_stat process. And pg_stat process 
will write last_syslogger_file_time to stat file so can be read by other 
backends.
() Adding a stat function named pg_stat_get_log_file_name, when user call 
it, it will read  last_syslogger_file_time from stat file and construct the log 
file name based on log file name format and last_syslogger_file_time, return 
the log file name eventually.


However, there is a risk for this solution: when logger create a new log file 
and then try to send new last_syslogger_file_time to pg_stat process, and 
pg_stat process crash at this moment, so the new pg_stat process cannot get the 
latest  last_syslogger_file_time. However, I think this case is a corner case. 

--
Jerry Yu
https://github.com/scarbrofair


 




-- Original --
From:  "Tom Lane";;
Date:  Thu, Feb 25, 2016 10:47 PM
To:  "Robert Haas"; 
Cc:  "Euler Taveira"; "Armor"; 
"Alvaro Herrera"; "Pgsql 
Hackers"; 
Subject:  Re: [HACKERS] get current log file



Robert Haas  writes:
> On Thu, Feb 25, 2016 at 1:15 AM, Euler Taveira >> wrote:
>>> To pass last_syslogger_file_time, we have 2 solutions: 1, add a
>>> global variable to record last_syslogger_file_time which shared by
>>> backends and syslogger, so backends can get last_syslogger_file_time
>>> very easily; 2 syslogger process send last_syslogger_file_time to pgstat
>>> process when last_syslogger_file_time changes, just as other auxiliary
>>> processes send stat  message to pgstat process, and  pgstat process will
>>> write  last_syslogger_file_time into stat file so that backend can
>>> get last_syslogger_file_time via reading this stat file.

>> I prefer (1) because (i) logfile name is not statistics and (ii) stats
>> collector could not respond in certain circumstances (and even discard
>> some messages).

> (1) seems like a bad idea, because IIUC, the syslogger process doesn't
> currently touch shared memory.  And in fact, shared memory can be
> reset after a backend exits abnormally, but the syslogger (alone among
> all PostgreSQL processes other than the postmaster) lasts across
> multiple such resets.

Yes, allowing the syslogger to depend on shared memory is right out.
I don't particularly care for having it assume the stats collector
exists, either -- in fact, given the current initialization order
it's physically impossible for syslogger to send to stats collector
because the former is started before the latter's communication
socket is made.

I haven't actually heard a use-case for exposing the current log file name
anyway.  But if somebody convinced me that there is one, I should think
that the way to implement it is to report the actual *name*, not
components out of which you could reconstruct the name only by assuming
that you know everything about the current syslogger configuration and
the code that builds log file names.  That's obviously full of race
conditions and code-maintenance hazards.

regards, tom lane

Re: [HACKERS] Sanity checking for ./configure options?

2016-02-26 Thread Ivan Kartyshov
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   tested, failed
Spec compliant:   tested, failed
Documentation:tested, failed

Tested, I think it`s rather important to make cleanup work on that project.
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol

2016-02-26 Thread Valery Popov



26.02.2016 01:10, Michael Paquier пишет:

On Fri, Feb 26, 2016 at 1:38 AM, Valery Popov  wrote:

Hi, Michael


23.02.2016 10:17, Michael Paquier пишет:

Attached is a set of patches implementing a couple of things that have
been discussed, so let's roll in.

Those 4 patches are aimed at putting in-core basics for the concept I
call password protocol aging, which is a way to allow multiple
password protocols to be defined in Postgres, and aimed at easing
administration as well as retirement of outdated protocols, which is
something that is not doable now in Postgres.

The second set of patch 0005~0008 introduces a new protocol, SCRAM.
9) 0009 is the SCRAM authentication itself

The theme with password checking is interesting for me, and I can give
review for CF for some features.
I think that review of all suggested features will require a lot of time.
Is it possible to make subset of patches concerning only password strength
and its aging?
The patches you have applied are non-independent. They should be apply
consequentially one by one.
Thus the patch 0009 can't be applied without git error  before 0001.
In this conditions all patches were successfully applied and compiled.
All tests successfully passed.

If you want to focus on the password protocol aging, you could just
have a look at 0001~0004.

OK, I will review patches 0001-0004, for starting.

--
Regards,
Valery Popov
Postgres Professional http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance degradation in commit 6150a1b0

2016-02-26 Thread Simon Riggs
On 25 February 2016 at 18:42, Amit Kapila  wrote:

> On Thu, Feb 25, 2016 at 11:38 PM, Simon Riggs 
> wrote:
>
>> On 24 February 2016 at 23:26, Amit Kapila 
>> wrote:
>>
>>> From past few weeks, we were facing some performance degradation in the
>>> read-only performance bench marks in high-end machines.  My colleague
>>> Mithun, has tried by reverting commit ac1d794 which seems to degrade the
>>> performance in HEAD on high-end m/c's as reported previously[1], but still
>>> we were getting degradation, then we have done some profiling to see what
>>> has caused it  and we found that it's mainly caused by spin lock when
>>> called via pin/unpin buffer and then we tried by reverting commit 6150a1b0
>>> which has recently changed the structures in that area and it turns out
>>> that reverting that patch, we don't see any degradation in performance.
>>> The important point to note is that the performance degradation doesn't
>>> occur every time, but if the tests are repeated twice or thrice, it
>>> is easily visible.
>>>
>>
>> Not seen that on the original patch I posted. 6150a1b0 contains multiple
>> changes to the lwlock structures, one written by me, others by Andres.
>>
>> Perhaps we should revert that patch and re-apply the various changes in
>> multiple commits so we can see the differences.
>>
>>
> Yes, thats one choice, other is locally we can narrow down the root cause
> of problem and then try to address the same.  Last time similar issue came
> up on list, agreement [1] was to note down it in PostgreSQL 9.6 open items
> and then work on it.  I think for this problem, we haven't got to the root
> cause of problem, so we can try to investigate it.  If nobody else steps up
> to reproduce and look into problem, in few days, I will look into it.
>
> [1] -
> http://www.postgresql.org/message-id/CA+TgmoYjYqegXzrBizL-Ov7zDsS=GavCnxYnGn9WZ1S=rp8...@mail.gmail.com
>

Don't understand this. If a problem is caused by one of two things, first
you check one, then the other.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Relation cache invalidation on replica

2016-02-26 Thread Konstantin Knizhnik
The reason of the problem is that invalidation messages are not 
delivered to replica after the end of concurrent create index.
Invalidation messages are included in xlog as part of transaction commit 
record.
Concurrent index create is split into three transaction, last of which 
is just performing inplace update of index tuple, marking it as valid 
and invalidating cache. But as far as this transaction is not assigned 
XID, no transaction record is created in WAL and send to replicas. As a 
result, replica doesn't receive this invalidation messages.


To fix the problem it is just enough to assign XID to transaction.
It can be done by adding GetCurrentTransactionId() call to the end of 
DefineIdnex function:


diff --git a/src/backend/commands/indexcmds.c 
b/src/backend/commands/indexcmds.c

index 13b04e6..1024603 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -881,6 +881,12 @@ DefineIndex(Oid relationId,
 CacheInvalidateRelcacheByRelid(heaprelid.relId);

 /*
+ * Force WAL commit record to ensure that replica receives invalidation
+ * messages.
+ */
+GetCurrentTransactionId();
+
+/*
  * Last thing to do is release the session-level lock on the 
parent table.

  */
 UnlockRelationIdForSession(, ShareUpdateExclusiveLock);




On 26.02.2016 15:41, Васильев Дмитрий wrote:
Session opened on replica doesn't see concurrently created indexes at 
this time on master.


We have master and replica:

1. master: pgbench -i -s 10

2. replica:
explain (analyze,verbose) select * from pgbench_accounts where 
abalance = 1;


3. master:
ALTER INDEX pgbench_accounts_abalance_idx RENAME TO 
pgbench_accounts_abalance_idx_delme;
CREATE INDEX CONCURRENTLY pgbench_accounts_abalance_idx ON 
pgbench_accounts USING btree (abalance);

DROP INDEX pgbench_accounts_abalance_idx_delme;

4. at this time on replica:

explain (analyze,verbose) select * from pgbench_accounts where 
abalance = 1;
pgbench=# explain (analyze,verbose) select * from pgbench_accounts 
where abalance = 1;

QUERY PLAN

Index Scan using pgbench_accounts_abalance_idx on 
public.pgbench_accounts (cost=0.42..4.44 rows=1 width=97) (actual 
time=655.781..655.781 rows=0 loops=1)

Output: aid, bid, abalance, filler
Index Cond: (pgbench_accounts.abalance = 1)
Planning time: 9388.259 ms
Execution time: 655.900 ms
(5 rows)

pgbench=# explain (analyze,verbose) select * from pgbench_accounts 
where abalance = 1;

QUERY PLAN
--
Index Scan using pgbench_accounts_abalance_idx_delme on 
public.pgbench_accounts (cost=0.42..4.44 rows=1 width=97) (actual 
time=0.014..0.014 rows=0 loops=1)

Output: aid, bid, abalance, filler
Index Cond: (pgbench_accounts.abalance = 1)
Planning time: 0.321 ms
Execution time: 0.049 ms
(5 rows)

pgbench=# explain (analyze,verbose) select * from pgbench_accounts 
where abalance = 1;

QUERY PLAN

Seq Scan on public.pgbench_accounts (cost=0.00..28894.00 rows=1 
width=97) (actual time=3060.451..3060.451 rows=0 loops=1)

Output: aid, bid, abalance, filler
Filter: (pgbench_accounts.abalance = 1)
Rows Removed by Filter: 100
Planning time: 0.087 ms
Execution time: 3060.484 ms
(6 rows)

pgbench=# \d+ pgbench_accounts
Table "public.pgbench_accounts"
Column | Type | Modifiers | Storage | Stats target | Description
--+---
aid | integer | not null | plain | |
bid | integer | | plain | |
abalance | integer | | plain | |
filler | character(84) | | extended | |
Indexes:
"pgbench_accounts_pkey" PRIMARY KEY, btree (aid)
"pgbench_accounts_abalance_idx" btree (abalance)
Options: fillfactor=100

​New opened session successfully uses this index.
Tested with PostgreSQL 9.5.1.

---
Dmitry Vasilyev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] get current log file

2016-02-26 Thread Tom Lane
Euler Taveira  writes:
> Those are good concerns. Also, we already have emit_log_hook that could
> grab server log messages. A small extension using the hook (there are
> some out there) could be use with a log consuming tool.

Hmmm ... emit_log_hook runs in the process calling elog, no?  That would
not have any special visibility into the state of syslogger.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] get current log file

2016-02-26 Thread Euler Taveira
On 26-02-2016 11:50, Tom Lane wrote:
> This needs to be explained a lot more clearly than it has been so far,
> else we are going to reject this proposed feature as being more code and
> more overhead than is justified.  Exactly why would you need a pointer to
> the current log file, rather than just configuring whatever tool you use
> to vacuum up everything in the pg_log directory?  Why would this use-case
> not suffer from nasty race conditions (ie, what happens when current log
> file changes immediately before or immediately after you look at the
> pointer)?
> 
Those are good concerns. Also, we already have emit_log_hook that could
grab server log messages. A small extension using the hook (there are
some out there) could be use with a log consuming tool.


-- 
   Euler Taveira   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] get current log file

2016-02-26 Thread Tom Lane
Euler Taveira  writes:
> On 26-02-2016 08:03, Robert Haas wrote:
>> But there's one thing I'm slightly baffled about: why would you
>> actually need this?

> The use case I have in mind is consume log file by using a tool like
> logstash. In this case, logstash accepts patterns and you can also use
> syslog for it.

This needs to be explained a lot more clearly than it has been so far,
else we are going to reject this proposed feature as being more code and
more overhead than is justified.  Exactly why would you need a pointer to
the current log file, rather than just configuring whatever tool you use
to vacuum up everything in the pg_log directory?  Why would this use-case
not suffer from nasty race conditions (ie, what happens when current log
file changes immediately before or immediately after you look at the
pointer)?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATH] Correct negative/zero year in to_date/to_timestamp

2016-02-26 Thread Shulgin, Oleksandr
On Fri, Feb 26, 2016 at 3:24 PM, Ivan Kartyshov 
wrote:

> The following review has been posted through the commitfest application:
>


> make installcheck-world:  tested, failed
> Implements feature:   tested, failed
> Spec compliant:   tested, failed
> Documentation:tested, failed
>
> Applied this patch, it works well, make what it expected correctly, code
> style is maintained. Regression tests passed OK. No documentation or
> comments.
>

Why does it say "tested, failed" for all points above there? ;-)

--
Alex


Re: [HACKERS] Sanity checking for ./configure options?

2016-02-26 Thread Tom Lane
David Fetter  writes:
> On Fri, Feb 26, 2016 at 04:55:23PM +0530, Robert Haas wrote:
>> On Wed, Feb 24, 2016 at 4:01 AM, David Fetter  wrote:
>>> I'm thinking that both the GUC check and the configure one should
>>> restrict it to [1024..65535].

>> Doesn't sound like a good idea to me.  If somebody has a reason they
>> want to do that, they shouldn't have to hack the source code and
>> recompile to make it work.

> I'm not sure I understand a use case here.

> On *n*x, we already disallow running as root pretty aggressively,
> using the "have to hack the source code and recompile" level of effort
> you aptly described.  This is just cleanup work on that project, as I
> see it.

> What am I missing?

You're assuming that every system under the sun prevents non-root
processes from opening ports below 1024.  I do not know if that's
true, and even if it is, it doesn't seem to me that it's our job
to enforce it.  I agree with Robert --- restricting to [1,65535]
is plenty good enough.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] get current log file

2016-02-26 Thread Euler Taveira
On 26-02-2016 08:03, Robert Haas wrote:
> I don't think we're going to accept this feature if it might fail in
> corner cases.  And that design seems awfully complex.
> 
Agree.

> The obvious way to implement this, to me at least, seems to be for the
> syslogger to write a file someplace in the data directory containing
> the name of the current log file.  When it switches log files, it
> rewrites that file.  When you want to know what the current logfile
> is, you read that file.
> 
That is not an elegant solution but it is simple. However, it is another
file in PGDATA. I can live with that but if we have consensus, let's do
it optional.

> But there's one thing I'm slightly baffled about: why would you
> actually need this?
> 
The use case I have in mind is consume log file by using a tool like
logstash. In this case, logstash accepts patterns and you can also use
syslog for it.


-- 
   Euler Taveira   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATH] Correct negative/zero year in to_date/to_timestamp

2016-02-26 Thread Ivan Kartyshov
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   tested, failed
Spec compliant:   tested, failed
Documentation:tested, failed

Applied this patch, it works well, make what it expected correctly, code style 
is maintained. Regression tests passed OK. No documentation or comments.
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The plan for FDW-based sharding

2016-02-26 Thread Oleg Bartunov
On Fri, Feb 26, 2016 at 3:50 PM, Robert Haas  wrote:

> On Wed, Feb 24, 2016 at 3:05 PM, Oleg Bartunov 
> wrote:
> > I already several times pointed, that we need XTM to be able to continue
> > development in different directions, since there is no clear winner.
> > Moreover, I think there is no fits-all  solution and while I agree we
> need
> > one built-in in the core, other approaches should have ability to exists
> > without patching.
>
> I don't think I necessarily agree with that.  Transaction management
> is such a fundamental part of the system that I think making it
> pluggable is going to be really hard.  I understand that you've done
> several implementations based on your proposed API, and that's good as
> far as it goes, but how do we know that's really going to be general
> enough for what other people might need?


Right now tm is hardcoded and it's doesn't matter  "if other people might
need" at all.  We at least provide developers ("other people")  ability to
work on their implementations and the patch  is safe and doesn't sacrifices
anything in core.



> And what makes us think we
> really need multiple transaction managers, anyway?



If you brave enough to say that one tm-fits-all and you are able to teach
existed tm to play well  in various clustering environment during
development period, which is short, than probably we don't need  multiple
tms. But It's too perfect to believe and practical solution is to let
multiple groups to work on their solutions.



> Even writing one
> good distributed transaction manager seems like a really hard project
> - why would we want to write two or three or five?
>

again, right now it's simply impossible to any bright person to work on
dtms.  It's time to start working on dtm, I believe. The fact you don't
think about distributed transactions support doesn't mean there no "other
people", who has different ideas on postgres future.  That's why we propose
this patch, let's play the game !



>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [HACKERS] Relation cache invalidation on replica

2016-02-26 Thread Васильев Дмитрий
This is obviously a bug because without "concurrently" create index this do
not reproduce.

---
Dmitry Vasilyev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

2016-02-26 16:36 GMT+03:00 Alexander Korotkov :

> On Fri, Feb 26, 2016 at 3:41 PM, Васильев Дмитрий <
> d.vasil...@postgrespro.ru> wrote:
>
>> Session opened on replica doesn't see concurrently created indexes at
>> this time on master.
>>
>
> As I get, on standby index is visible when you run SQL queries on catalog
> tables (that is what \d+ does), but planner doesn't pick it. Assuming that
> in new session planner picks this index, it seems to be bug for me.
>
> --
> Alexander Korotkov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>


Re: [HACKERS] Relation cache invalidation on replica

2016-02-26 Thread Alexander Korotkov
On Fri, Feb 26, 2016 at 3:41 PM, Васильев Дмитрий  wrote:

> Session opened on replica doesn't see concurrently created indexes at this
> time on master.
>

As I get, on standby index is visible when you run SQL queries on catalog
tables (that is what \d+ does), but planner doesn't pick it. Assuming that
in new session planner picks this index, it seems to be bug for me.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Sanity checking for ./configure options?

2016-02-26 Thread David Fetter
On Fri, Feb 26, 2016 at 04:55:23PM +0530, Robert Haas wrote:
> On Wed, Feb 24, 2016 at 4:01 AM, David Fetter  wrote:
> > I'm thinking that both the GUC check and the configure one should
> > restrict it to [1024..65535].
> 
> Doesn't sound like a good idea to me.  If somebody has a reason they
> want to do that, they shouldn't have to hack the source code and
> recompile to make it work.

I'm not sure I understand a use case here.

On *n*x, we already disallow running as root pretty aggressively,
using the "have to hack the source code and recompile" level of effort
you aptly described.  This is just cleanup work on that project, as I
see it.

What am I missing?

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The plan for FDW-based sharding

2016-02-26 Thread Robert Haas
On Wed, Feb 24, 2016 at 3:05 PM, Oleg Bartunov  wrote:
> I already several times pointed, that we need XTM to be able to continue
> development in different directions, since there is no clear winner.
> Moreover, I think there is no fits-all  solution and while I agree we need
> one built-in in the core, other approaches should have ability to exists
> without patching.

I don't think I necessarily agree with that.  Transaction management
is such a fundamental part of the system that I think making it
pluggable is going to be really hard.  I understand that you've done
several implementations based on your proposed API, and that's good as
far as it goes, but how do we know that's really going to be general
enough for what other people might need?  And what makes us think we
really need multiple transaction managers, anyway?  Even writing one
good distributed transaction manager seems like a really hard project
- why would we want to write two or three or five?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: index-only scans with partial indexes

2016-02-26 Thread Michael Paquier
On Fri, Feb 26, 2016 at 4:18 PM, Kyotaro HORIGUCHI
 wrote:
> I marked this as "ready for commiter" and tried to add me as the
> *second* author. But the CF app forces certain msyterious order
> for listed names. Is there any means to arrange the author names
> in desired order?

Those are automatically classified by alphabetical order.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Relation cache invalidation on replica

2016-02-26 Thread Васильев Дмитрий
Session opened on replica doesn't see concurrently created indexes at this
time on master.

We have master and replica:

1. master: pgbench -i -s 10

2. replica:
explain (analyze,verbose) select * from pgbench_accounts where abalance = 1;

3. master:
ALTER INDEX pgbench_accounts_abalance_idx RENAME TO
pgbench_accounts_abalance_idx_delme;
CREATE INDEX CONCURRENTLY pgbench_accounts_abalance_idx ON pgbench_accounts
USING btree (abalance);
DROP INDEX pgbench_accounts_abalance_idx_delme;

4. at this time on replica:

explain (analyze,verbose) select * from pgbench_accounts where abalance = 1;
pgbench=# explain (analyze,verbose) select * from pgbench_accounts where
abalance = 1;
QUERY PLAN

Index Scan using pgbench_accounts_abalance_idx on public.pgbench_accounts
(cost=0.42..4.44 rows=1 width=97) (actual time=655.781..655.781 rows=0
loops=1)
Output: aid, bid, abalance, filler
Index Cond: (pgbench_accounts.abalance = 1)
Planning time: 9388.259 ms
Execution time: 655.900 ms
(5 rows)

pgbench=# explain (analyze,verbose) select * from pgbench_accounts where
abalance = 1;
QUERY PLAN
--
Index Scan using pgbench_accounts_abalance_idx_delme on
public.pgbench_accounts (cost=0.42..4.44 rows=1 width=97) (actual
time=0.014..0.014 rows=0 loops=1)
Output: aid, bid, abalance, filler
Index Cond: (pgbench_accounts.abalance = 1)
Planning time: 0.321 ms
Execution time: 0.049 ms
(5 rows)

pgbench=# explain (analyze,verbose) select * from pgbench_accounts where
abalance = 1;
QUERY PLAN

Seq Scan on public.pgbench_accounts (cost=0.00..28894.00 rows=1 width=97)
(actual time=3060.451..3060.451 rows=0 loops=1)
Output: aid, bid, abalance, filler
Filter: (pgbench_accounts.abalance = 1)
Rows Removed by Filter: 100
Planning time: 0.087 ms
Execution time: 3060.484 ms
(6 rows)

pgbench=# \d+ pgbench_accounts
Table "public.pgbench_accounts"
Column | Type | Modifiers | Storage | Stats target | Description
--+---
aid | integer | not null | plain | |
bid | integer | | plain | |
abalance | integer | | plain | |
filler | character(84) | | extended | |
Indexes:
"pgbench_accounts_pkey" PRIMARY KEY, btree (aid)
"pgbench_accounts_abalance_idx" btree (abalance)
Options: fillfactor=100

​New opened session successfully uses this index.
Tested with PostgreSQL 9.5.1.

---
Dmitry Vasilyev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


Re: [HACKERS] Sanity checking for ./configure options?

2016-02-26 Thread Robert Haas
On Wed, Feb 24, 2016 at 4:01 AM, David Fetter  wrote:
> I'm thinking that both the GUC check and the configure one should
> restrict it to [1024..65535].

Doesn't sound like a good idea to me.  If somebody has a reason they
want to do that, they shouldn't have to hack the source code and
recompile to make it work.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Prepared Statement support for Parallel query

2016-02-26 Thread Amit Kapila
On Fri, Feb 26, 2016 at 4:37 PM, Robert Haas  wrote:
>
> On Thu, Feb 25, 2016 at 1:09 PM, Robert Haas 
wrote:
> > On Thu, Feb 25, 2016 at 8:53 AM, Amit Kapila 
wrote:
> >>>   But if the user says
> >>> they want to PREPARE the query, they are probably not going to fetch
> >>> all rows.
> >>
> >> After PREPARE, user will execute the statement using EXECUTE and
> >> I don't see how user can decide number of rows to fetch which can
> >> influence the execution.  Can you please elaborate your point more
> >> and what is your expectation for the same?
> >
> > Argh.  I'm getting confused between prepared statements and cursors.
> > So if the user does PREPARE followed by EXECUTE, then that is OK.  The
> > problem is only if they use DECLARE .. CURSOR FOR, which your patch
> > doesn't affect.
> >
> > So, committed.
>
> And, I'm going to revert this part.  If you'd run the regression tests
> under force_parallel_mode=regress, max_parallel_degree>0, you would
> have noticed that this part breaks it, because of CREATE TABLE ... AS
> EXECUTE.
>

I will look into it.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Prepared Statement support for Parallel query

2016-02-26 Thread Robert Haas
On Thu, Feb 25, 2016 at 1:09 PM, Robert Haas  wrote:
> On Thu, Feb 25, 2016 at 8:53 AM, Amit Kapila  wrote:
>>>   But if the user says
>>> they want to PREPARE the query, they are probably not going to fetch
>>> all rows.
>>
>> After PREPARE, user will execute the statement using EXECUTE and
>> I don't see how user can decide number of rows to fetch which can
>> influence the execution.  Can you please elaborate your point more
>> and what is your expectation for the same?
>
> Argh.  I'm getting confused between prepared statements and cursors.
> So if the user does PREPARE followed by EXECUTE, then that is OK.  The
> problem is only if they use DECLARE .. CURSOR FOR, which your patch
> doesn't affect.
>
> So, committed.

And, I'm going to revert this part.  If you'd run the regression tests
under force_parallel_mode=regress, max_parallel_degree>0, you would
have noticed that this part breaks it, because of CREATE TABLE ... AS
EXECUTE.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] get current log file

2016-02-26 Thread Robert Haas
On Fri, Feb 26, 2016 at 8:31 AM, Armor  wrote:
> I think I know what you are concerned about. May be I did not explain my
> solution very clearly.
> (i) Using a variable named last_syslogger_file_time replace
> first_syslogger_file_time in syslogger.c. When postmaster initialize logger
> process,   last_syslogger_file_time will be assign the time stamp when
> logger start, then fork the child process for logger. Later logger will
> create a log file based on last_syslogger_file_time . And
> last_syslogger_file_time in the postmaster process will be inherited by
> other  auxiliary processes
> (ii) when pgstat process initialize, it will read  last_syslogger_file_time
> from pg stat file of last time (because pgstat process will write it to pg
> stat file). And then pgstat process will get last_syslogger_file_time
> inherit from postmaster,  if this version of  last_syslogger_file_time is
> larger then that read from the stat file, it means logger create a new log
> file so use it as the latest value; else means pgstat process crashed
> before, so it need to use the value from stat file as the latest.
> (iii) when logger rotate a log file, it will assign time stamp to
> last_syslogger_file_time  and send it to pg_stat process. And pg_stat
> process will write last_syslogger_file_time to stat file so can be read by
> other backends.
> () Adding a stat function named pg_stat_get_log_file_name, when user
> call it, it will read  last_syslogger_file_time from stat file and construct
> the log file name based on log file name format and
> last_syslogger_file_time, return the log file name eventually.
>
> However, there is a risk for this solution: when logger create a new log
> file and then try to send new last_syslogger_file_time to pg_stat process,
> and pg_stat process crash at this moment, so the new pg_stat process cannot
> get the latest  last_syslogger_file_time. However, I think this case is a
> corner case.

I don't think we're going to accept this feature if it might fail in
corner cases.  And that design seems awfully complex.

The obvious way to implement this, to me at least, seems to be for the
syslogger to write a file someplace in the data directory containing
the name of the current log file.  When it switches log files, it
rewrites that file.  When you want to know what the current logfile
is, you read that file.

But there's one thing I'm slightly baffled about: why would you
actually need this?  I mean, it seems like a good idea to set
log_filename to a pattern that makes the name of the current logfile
pretty well predictable.  If not, maybe you should just fix that.
Also, if not on Windows, if you do get confused about which logfile is
active, you could just use lsof on the log_directory to figure out
which file the syslogger has open.  I just can't really remember
having a problem with this, and I'm wondering why someone would.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-02-26 Thread Robert Haas
On Wed, Feb 24, 2016 at 12:59 PM, Thomas Munro
 wrote:
> On Wed, Feb 24, 2016 at 5:48 PM, Thomas Munro
>  wrote:
>> Here is a first pass at that. [...]
>
> On Wed, Feb 24, 2016 at 1:23 AM, Robert Haas  wrote:
>> file_fdw is parallel-safe, ...
>
> And here is a patch to apply on top of the last one, to make file_fdw
> return true.  But does it really work correctly under parallelism?

Seems like it.  Running the regression tests for file_fdw under
force_parallel_mode=regress, max_parallel_degree>0 passes; you can
verify that's actually doing something by using
force_parallel_mode=on, which will result in some predictable
failures.  From a theoretical point of view, there's no reason I can
see why reading a file shouldn't work just as well from a parallel
worker as from the leader.  They both have the same view of the
filesystem, and in neither case are we trying to write any data; we're
just trying to read it.

Committed these patches after revising the comment you wrote and
adding documentation.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_filedump patch for 9.5

2016-02-26 Thread Pavel Raiskup
On Saturday 08 of August 2015 20:38:38 Satoshi Nagayasu wrote:
> I have created a patch for pg_filedump to work with 9.5.
> Here is a list of changes.
> 
>  * Fix to rename CRC32 macros to work with 9.5.
>  * Fix to add missing DBState: DB_SHUTDOWNED_IN_RECOVERY.
>  * Fix to add missing page flags for Btree and GIN.
>  * Update copyright date.
> 
> Please take a look. Any comments are welcome.

Thanks for the patch;  it helps with building against 9.5.

Hints I can give ATM:

  * copyright is outdated now 
  * to allow the build with 'make
PGSQL_INCLUDE_DIR=/usr/include/pgsql/server' the following patch is
needed:

--- a/Makefile
+++ b/Makefile
@@ -18,7 +18,7 @@ DISTFILES= README.pg_filedump Makefile Makefile.contrib \
 all: pg_filedump
 
 pg_filedump: pg_filedump.o
-   ${CC} ${CFLAGS} -o pg_filedump pg_filedump.o
+   ${CC} ${CFLAGS} -o pg_filedump pg_filedump.o -lpgport

Pavel



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] POC: Cache data in GetSnapshotData()

2016-02-26 Thread Robert Haas
On Thu, Feb 25, 2016 at 12:57 PM, Mithun Cy  wrote:
> I have fixed all of the issues reported by regress test. Also now when
> backend try to cache the snapshot we also try to store the self-xid and sub
> xid, so other backends can use them.
>
> I also did some read-only perf tests.

I'm not sure what you are doing that keeps breaking threading for
Gmail, but this thread is getting split up for me into multiple
threads with only a few messages in each.  The same seems to be
happening in the community archives.  Please try to figure out what is
causing that and stop doing it.

I notice you seem to not to have implemented this suggestion by Andres:

http://www.postgresql.org//message-id/20160104085845.m5nrypvmmpea5...@alap3.anarazel.de

Also, you should test this on a machine with more than 2 cores.
Andres's original test seems to have been on a 4-core system, where
this would be more likely to work out.

Also, Andres suggested testing this on an 80-20 write mix, where as
you tested it on 100% read-only.  In that case there is no blocking
ProcArrayLock, which reduces the chances that the patch will benefit.

I suspect, too, that the chances of this patch working out have
probably been reduced by 0e141c0fbb211bdd23783afa731e3eef95c9ad7a,
which seems to be targeting mostly the same problem.  I mean it's
possible that this patch could win even when no ProcArrayLock-related
blocking is happening, but the original idea seems to have been that
it would help mostly with that case.  You could try benchmarking it on
the commit just before that one and then on current sources and see if
you get the same results on both, or if there was more benefit before
that commit.

Also, you could consider repeating the LWLOCK_STATS testing that Amit
did in his original reply to Andres.  That would tell you whether the
performance is not increasing because the patch doesn't reduce
ProcArrayLock contention, or whether the performance is not increasing
because the patch DOES reduce ProcArrayLock contention but then the
contention shifts to CLogControlLock or elsewhere.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-02-26 Thread Amit Langote

Hi Vinayak,

Thanks for updating the patch! A quick comment:

On 2016/02/26 17:28, poku...@pm.nttdata.co.jp wrote:
>> CREATE VIEW pg_stat_vacuum_progress AS 
>>   SELECT S.s[1] as pid, 
>>  S.s[2] as relid, 
>>  CASE S.s[3] 
>>WHEN 1 THEN 'Scanning Heap' 
>>WHEN 2 THEN 'Vacuuming Index and Heap' 
>>ELSE 'Unknown phase' 
>>  END, 
>> 
>>   FROM pg_stat_get_command_progress(PROGRESS_COMMAND_VACUUM) as S; 
>>
>> # The name of the function could be other than *_command_progress.
> The name of function is updated as pg_stat_get_progress_info() and also 
> updated the function.
> Updated the pg_stat_vacuum_progress view as suggested.

So, pg_stat_get_progress_info() now accepts a parameter to distinguish
different commands.  I see the following in its definition:

+   /*  Report values for only those backends which are running 
VACUUM
command */
+   if (cmdtype == COMMAND_LAZY_VACUUM)
+   {
+   /*Progress can only be viewed by role member.*/
+   if (has_privs_of_role(GetUserId(), beentry->st_userid))
+   {
+   values[2] = 
UInt32GetDatum(beentry->st_progress_param[0]);
+   values[3] = 
UInt32GetDatum(beentry->st_progress_param[1]);
+   values[4] = 
UInt32GetDatum(beentry->st_progress_param[2]);
+   values[5] = 
UInt32GetDatum(beentry->st_progress_param[3]);
+   values[6] = 
UInt32GetDatum(beentry->st_progress_param[4]);
+   values[7] = 
UInt32GetDatum(beentry->st_progress_param[5]);
+   if (beentry->st_progress_param[1] != 0)
+   values[8] = 
Float8GetDatum(beentry->st_progress_param[2] * 100 /
beentry->st_progress_param[1]);
+   else
+   nulls[8] = true;
+   }
+   else
+   {
+   nulls[2] = true;
+   nulls[3] = true;
+   nulls[4] = true;
+   nulls[5] = true;
+   nulls[6] = true;
+   nulls[7] = true;
+   nulls[8] = true;
+   }
+   }

How about doing this in a separate function which takes the command id as
parameter and returns an array of values and the number of values (per
command id). pg_stat_get_progress_info() then creates values[] and nulls[]
arrays from that and returns that as result set.  It will be a cleaner
separation of activities, perhaps.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-02-26 Thread pokurev
Hello,

Thank you for your comments.
Please find attached patch addressing following comments.

>As I might have written upthread, transferring the whole string 
>as a progress message is useless at least in this scenario. Since 
>they are a set of fixed messages, each of them can be represented 
>by an identifier, an integer number. I don't see a reason for 
>sending the whole of a string beyond a backend. 
Agreed. I used following macros.
#define VACUUM_PHASE_SCAN_HEAP  1 
#define VACUUM_PHASE_VACUUM_INDEX_HEAP  2

>I guess num_index_scans could better be reported after all the indexes are 
>done, that is, after the for loop ends.
Agreed.  I have corrected it.

> CREATE VIEW pg_stat_vacuum_progress AS 
>   SELECT S.s[1] as pid, 
>  S.s[2] as relid, 
>  CASE S.s[3] 
>WHEN 1 THEN 'Scanning Heap' 
>WHEN 2 THEN 'Vacuuming Index and Heap' 
>ELSE 'Unknown phase' 
>  END, 
> 
>   FROM pg_stat_get_command_progress(PROGRESS_COMMAND_VACUUM) as S; 
> 
> # The name of the function could be other than *_command_progress.
The name of function is updated as pg_stat_get_progress_info() and also updated 
the function.
Updated the pg_stat_vacuum_progress view as suggested.

Regards,
Vinayak


Vacuum_progress_checker_v12.patch
Description: Vacuum_progress_checker_v12.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers