Re: [HACKERS] Too many autovacuum workers spawned during forced auto-vacuum

2017-01-19 Thread Michael Paquier
On Fri, Jan 20, 2017 at 4:11 AM, Alvaro Herrera
 wrote:
> Robert Haas wrote:
>
>> After sleeping on this, I'm inclined to go with Amit's fix for now.
>> It seems less likely to break anything in the back-branches than any
>> other option I can think up.
>
> Yeah, no objections here.

+1.
-- 
Michael




Re: [HACKERS] increasing the default WAL segment size

2017-01-19 Thread Michael Paquier
On Fri, Jan 20, 2017 at 12:49 AM, Robert Haas  wrote:
> On Wed, Jan 18, 2017 at 12:42 PM, Robert Haas  wrote:
>> Problems 2-4 actually have to do with a DestReceiver of type
>> DestRemote really, really wanting to have an associated Portal and
>> database connection, so one approach is to create a stripped-down
>> DestReceiver that doesn't care about those things and then passing
>> that to GetPGVariable.
>
> I tried that and it worked out pretty well, so I'm inclined to go with
> this approach.  Proposed patches attached.  0001 adds the new
> DestReceiver type, and 0002 is a revised patch to implement the SHOW
> command itself.
>
> Thoughts, comments?

This looks like a sensible approach to me. DestRemoteSimple could be
useful for background workers that are not connected to a database as
well. Isn't there a problem with PGC_REAL parameters?
-- 
Michael




Re: [HACKERS] pg_hba_file_settings view patch

2017-01-19 Thread Ashutosh Bapat
On Fri, Jan 20, 2017 at 12:46 PM, Michael Paquier
 wrote:
> On Fri, Jan 20, 2017 at 10:56 AM, Haribabu Kommi
>  wrote:
>> The Assert can be hit only when new options are added for display
>> through the view without updating the macro that defines the maximum
>> number of options.
>>
>> Updated patch attached including reverting of file leak changes.
>
> OK, thanks for the new version. I am marking this version as ready for
> committer.

I do intend to make a pass ASAP.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] Declarative partitioning - another take

2017-01-19 Thread Amit Langote
Hi Andres,

On 2017/01/20 15:15, Andres Freund wrote:
> On 2017-01-19 14:18:23 -0500, Robert Haas wrote:
>> Committed.
> 
> One of the patches around this topic committed recently seems to cause
> valgrind failures like
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-01-19%2008%3A40%3A02
> :
> ==24969== Conditional jump or move depends on uninitialised value(s)
> ==24969==at 0x4C38DA: btint4cmp (nbtcompare.c:97)
> ==24969==by 0x83860C: FunctionCall2Coll (fmgr.c:1318)
> ==24969==by 0x536643: partition_bounds_equal (partition.c:627)
> ==24969==by 0x820864: equalPartitionDescs (relcache.c:1203)
> ==24969==by 0x82423A: RelationClearRelation (relcache.c:2553)
> ==24969==by 0x8248BA: RelationFlushRelation (relcache.c:2662)
> ==24969==by 0x824983: RelationCacheInvalidateEntry (relcache.c:2714)
> ==24969==by 0x81D9D6: LocalExecuteInvalidationMessage (inval.c:568)
> ==24969==by 0x81CB0D: ProcessInvalidationMessages (inval.c:444)
> ==24969==by 0x81D3CB: CommandEndInvalidationMessages (inval.c:1056)
> ==24969==by 0x4F6735: AtCCI_LocalCache (xact.c:1374)
> ==24969==by 0x4F8249: CommandCounterIncrement (xact.c:957)
> ==24969==  Uninitialised value was created by a heap allocation
> ==24969==at 0x85AA83: palloc (mcxt.c:914)
> ==24969==by 0x53648E: RelationBuildPartitionDesc (partition.c:528)
> ==24969==by 0x823F93: RelationBuildDesc (relcache.c:1348)
> ==24969==by 0x8241DB: RelationClearRelation (relcache.c:2524)
> ==24969==by 0x8248BA: RelationFlushRelation (relcache.c:2662)
> ==24969==by 0x824983: RelationCacheInvalidateEntry (relcache.c:2714)
> ==24969==by 0x81D9D6: LocalExecuteInvalidationMessage (inval.c:568)
> ==24969==by 0x81CB0D: ProcessInvalidationMessages (inval.c:444)
> ==24969==by 0x81D3CB: CommandEndInvalidationMessages (inval.c:1056)
> ==24969==by 0x4F6735: AtCCI_LocalCache (xact.c:1374)
> ==24969==by 0x4F8249: CommandCounterIncrement (xact.c:957)
> ==24969==by 0x82538B: RelationSetNewRelfilenode (relcache.c:3490)
> ==24969== 
> ==24969== VALGRINDERROR-END

Thanks for the report.  This being my first time reading a valgrind report
on the buildfarm, is it correct to assume that the command immediately
preceding VALGRINDERROR-BEGIN is what triggered the failure?

... LOG:  statement: truncate list_parted;
==24969== VALGRINDERROR-BEGIN
==24969== Conditional jump or move depends on uninitialised value(s)
==24969==at 0x4C38DA: btint4cmp (nbtcompare.c:97)

So in this case: truncate list_parted?

Thanks,
Amit






Re: [HACKERS] pg_hba_file_settings view patch

2017-01-19 Thread Michael Paquier
On Fri, Jan 20, 2017 at 10:56 AM, Haribabu Kommi
 wrote:
> The Assert can be hit only when new options are added for display
> through the view without updating the macro that defines the maximum
> number of options.
>
> Updated patch attached including reverting of file leak changes.

OK, thanks for the new version. I am marking this version as ready for
committer.
-- 
Michael




Re: [HACKERS] Patch to implement pg_current_logfile() function

2017-01-19 Thread Michael Paquier
On Fri, Jan 20, 2017 at 3:29 AM, Karl O. Pinc  wrote:
> On Thu, 19 Jan 2017 12:12:18 -0300
> Alvaro Herrera  wrote:
>
>> Karl O. Pinc wrote:
>> > On Wed, 18 Jan 2017 19:27:40 -0300
>> > Alvaro Herrera  wrote:
>>
>> > > I thought this part was odd -- I mean, why is SysLogger_Start()
>> > > being called if the collector is not enabled?  Turns out we do it
>> > > and return early if not enabled.  But not in all cases -- there
>> > > is one callsite in postmaster.c that avoids the call if the
>> > > collector is disabled. That needs to be changed if we want this
>> > > to work reliably.
>> >
>> > Is this an argument for having the current_logfiles always exist
>> > and be empty when there is no in-filesystem logfile?  It always felt
>> > to me that the code would be simpler that way.
>>
>> I don't know.  I am just saying that you need to patch postmaster.c
>> line 1726 to remove the second arm of the &&.
>
> Gilles,
>
> I'm not available just now.  Can you do this or enlist Michael?

Okay, I just did it. At the same time, the check for ferror() is not
necessary, as fgets() returns NULL on an error as well, so that's dead
code. I have also removed the useless call to FreeFile().
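
As a side note on the fgets() point, here is a generic C illustration (not
the patch's code) of why a single NULL check already covers the error case;
ferror() is only needed afterwards if end-of-file must be distinguished
from a read error:

#include <stdio.h>

static int
read_first_line(const char *path, char *buf, size_t len)
{
    FILE   *fp = fopen(path, "r");
    int     ret;

    if (fp == NULL)
        return -1;
    if (fgets(buf, (int) len, fp) == NULL)
        ret = ferror(fp) ? -1 : 0;  /* read error vs. empty file */
    else
        ret = 1;                    /* got a line */
    fclose(fp);
    return ret;
}
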
-- 
Michael
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 07afa3c77a..427980d77c 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4218,6 +4218,11 @@ SELECT * FROM parent WHERE key = 2400;
   where to log
  
 
+ 
+   current_logfiles
+   and the log_destination configuration parameter
+ 
+
  
 
  
@@ -4248,6 +4253,27 @@ SELECT * FROM parent WHERE key = 2400;
  must be enabled to generate
 CSV-format log output.

+   
+When either stderr or
+csvlog are included, the file
+current_logfiles is created to record the location
+of the log file(s) currently in use by the logging collector and the
+associated logging destination. This provides a convenient way to
+find the logs currently in use by the instance. Here is an example of
+this file's content:
+
+stderr pg_log/postgresql.log
+csvlog pg_log/postgresql.csv
+
+
+current_logfiles is recreated when a new log file
+is created as an effect of rotation, and
+when log_destination is reloaded.  It is removed when
+neither stderr
+nor csvlog are included
+in log_destination, and when the logging collector is
+disabled.
+   
 

 
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 2504a466e6..5b0be26e0a 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -15449,6 +15449,13 @@ SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS 
t(ls,n);
   
 
   
+   
pg_current_logfile(text)
+   text
+   Primary log file name, or log in the requested format,
+   currently in use by the logging collector
+  
+
+  

pg_my_temp_schema()
oid
OID of session's temporary schema, or 0 if none
@@ -15667,6 +15674,45 @@ SET search_path TO schema , 
schema, ..

 

+pg_current_logfile
+   
+
+   
+Logging
+pg_current_logfile function
+   
+
+   
+ current_logfiles
+ and the pg_current_logfile function
+   
+
+   
+Logging
+current_logfiles file and the pg_current_logfile
+function
+   
+
+   
+pg_current_logfile returns, as text,
+the path of the log file(s) currently in use by the logging collector.
+The path includes the  directory
+and the log file name.  Log collection must be enabled or the return value
+is NULL.  When multiple log files exist, each in a
+different format, pg_current_logfile called
+without arguments returns the path of the file having the first format
+found in the ordered list: stderr, csvlog.
+NULL is returned when no log file has any of these
+formats.  To request a specific file format supply, as text,
+either csvlog or stderr as the value of the
+optional parameter. The return value is NULL when the
+log format requested is not a configured
+.  The
pg_current_logfile function reflects the contents of the
+current_logfiles file.
+   
+
+   
 pg_my_temp_schema

 
diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index 5c52824dfc..9865751aa5 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -36,6 +36,10 @@ these required items, the cluster configuration files
 PGDATA, although it is possible to place them elsewhere.
 
 
+
+  current_logfiles
+
+
 
 Contents of PGDATA
 
@@ -61,6 +65,12 @@ Item
 
 
 
+ current_logfiles
+ File recording the log file(s) currently written to by the logging
+  collector
+
+
+
  global
  Subdirectory containing cluster-wide tables, such as
  pg_database
diff --git a/src/backend/catalog/system_views.sql 
b/src/backend/catalog/system_views.sql
index 07f291b7cd..cbeeab80b5 100644
--- a/src/backend/catalog/system_views.sql

Re: [HACKERS] Too many autovacuum workers spawned during forced auto-vacuum

2017-01-19 Thread Amit Khandekar
On 18 January 2017 at 02:32, Robert Haas  wrote:
> On Fri, Jan 13, 2017 at 8:45 AM, Alvaro Herrera
>  wrote:
>> I think this is the same problem as reported in
>> https://www.postgresql.org/message-id/CAMkU=1yE4YyCC00W_GcNoOZ4X2qxF7x5DUAR_kMt-Ta=ypy...@mail.gmail.com
>
> If I understand correctly, and it's possible that I don't, the issues
> are distinct.  I think that the issue in that thread has to do with
> the autovacuum launcher starting workers over and over again in a
> tight loop, whereas this issue seems to be about autovacuum workers
> restarting the launcher over and over again in a tight loop.  In that
> thread, it's the autovacuum launcher that is looping, which can only
> happen when autovacuum=on.  In this thread, the autovacuum launcher is
> repeatedly exiting and getting restarted, which can only happen when
> autovacuum=off.
Yes, that's true: in the other thread, autovacuum is on. Although I
haven't been able to figure out why there would be a storm of workers
spawned when autovacuum is on. When it is on, the launcher starts a
worker only when it's time to start one.

>
> I would be tempted to install something directly in postmaster.c.  If
> CheckPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER) && Shutdown ==
> NoShutdown but we last set start_autovac_launcher = true less than 10
> seconds ago, don't do it again.

My impression was that the postmaster is supposed to do just the minimal
work of starting the autovacuum launcher if it is not already running, and
that ensuring everything keeps going is the job of the autovacuum launcher.

> That limits us to launching the
> autovacuum launcher at most six times a minute when autovacuum = off.
> You could argue that defeats the point of the SendPostmasterSignal in
> SetTransactionIdLimit, but I don't think so.  If vacuuming the oldest
> database took less than 10 seconds, then we won't vacuum the
> next-oldest database until we hit the next 64kB transaction ID
> boundary, but that can only cause a problem if we've got so many
> databases that we don't get to them all before we run out of
> transaction IDs, which is almost unthinkable.  If you had a ten
> million tiny databases that all crossed the threshold at the same
> instant, it would take you 640 million transaction IDs to visit them
> all.  If you also had autovacuum_freeze_max_age set very close to the
> upper limit for that variable, you could conceivably have the system
> shut down before all of those databases were reached.  But that's a
> pretty artificial scenario.  If someone has that scenario, perhaps
> they should consider more sensible configuration choices.

Yeah, this logic makes sense ...

But from looking at the code, I guess it was carefully made sure that, in
the case of autovacuum being off, we clean up all databases as fast as
possible, with multiple workers cleaning up multiple tables in parallel.

Instead of the autovacuum launcher and worker together making sure that
the cycle of iterations keeps running, I was thinking the autovacuum
launcher itself should make sure it does not spawn another worker on the
same database if the previous one did nothing. But that seemed pretty
invasive.
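
For illustration, a rough sketch of the rate-limiting idea quoted above
(this is not actual postmaster.c code; last_avlauncher_request is an
invented name):

/* In the postmaster's SIGUSR1 handling, hypothetically: */
if (CheckPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER) &&
    Shutdown == NoShutdown)
{
    static pg_time_t last_avlauncher_request = 0;
    pg_time_t   now = (pg_time_t) time(NULL);

    /* Re-arm the launcher at most once every 10 seconds. */
    if (now - last_avlauncher_request >= 10)
    {
        start_autovac_launcher = true;
        last_avlauncher_request = now;
    }
}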




Re: [HACKERS] Declarative partitioning - another take

2017-01-19 Thread Andres Freund
On 2017-01-19 14:18:23 -0500, Robert Haas wrote:
> Committed.

One of the patches around this topic committed recently seems to cause
valgrind failures like
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-01-19%2008%3A40%3A02
:
==24969== Conditional jump or move depends on uninitialised value(s)
==24969==at 0x4C38DA: btint4cmp (nbtcompare.c:97)
==24969==by 0x83860C: FunctionCall2Coll (fmgr.c:1318)
==24969==by 0x536643: partition_bounds_equal (partition.c:627)
==24969==by 0x820864: equalPartitionDescs (relcache.c:1203)
==24969==by 0x82423A: RelationClearRelation (relcache.c:2553)
==24969==by 0x8248BA: RelationFlushRelation (relcache.c:2662)
==24969==by 0x824983: RelationCacheInvalidateEntry (relcache.c:2714)
==24969==by 0x81D9D6: LocalExecuteInvalidationMessage (inval.c:568)
==24969==by 0x81CB0D: ProcessInvalidationMessages (inval.c:444)
==24969==by 0x81D3CB: CommandEndInvalidationMessages (inval.c:1056)
==24969==by 0x4F6735: AtCCI_LocalCache (xact.c:1374)
==24969==by 0x4F8249: CommandCounterIncrement (xact.c:957)
==24969==  Uninitialised value was created by a heap allocation
==24969==at 0x85AA83: palloc (mcxt.c:914)
==24969==by 0x53648E: RelationBuildPartitionDesc (partition.c:528)
==24969==by 0x823F93: RelationBuildDesc (relcache.c:1348)
==24969==by 0x8241DB: RelationClearRelation (relcache.c:2524)
==24969==by 0x8248BA: RelationFlushRelation (relcache.c:2662)
==24969==by 0x824983: RelationCacheInvalidateEntry (relcache.c:2714)
==24969==by 0x81D9D6: LocalExecuteInvalidationMessage (inval.c:568)
==24969==by 0x81CB0D: ProcessInvalidationMessages (inval.c:444)
==24969==by 0x81D3CB: CommandEndInvalidationMessages (inval.c:1056)
==24969==by 0x4F6735: AtCCI_LocalCache (xact.c:1374)
==24969==by 0x4F8249: CommandCounterIncrement (xact.c:957)
==24969==by 0x82538B: RelationSetNewRelfilenode (relcache.c:3490)
==24969== 
==24969== VALGRINDERROR-END


Regards,

Andres




Re: [HACKERS] postgres_fdw bug in 9.6

2017-01-19 Thread Ashutosh Bapat
On Thu, Jan 19, 2017 at 10:38 PM, Robert Haas  wrote:
> On Wed, Jan 18, 2017 at 10:26 PM, Ashutosh Bapat
>  wrote:
>>> Yes, I think that's broadly the approach Tom was recommending.
>>
>> I don't have problem with that approach, but the latest patch has
>> following problems.
>> 1. We are copying chunks of path creation logic, instead of reusing
>> corresponding functions.
>
> Exactly which functions do you think we ought to be reusing that the
> patch currently doesn't reuse?
>
>> 2. There are many cases where the new function would return no local
>> path and hence postgres_fdw doesn't push down the join [1]. This will
>> be performance regression compared to 9.6.
>
> Some, or many?  How many?  Which ones?  At least some of the problems
> you were complaining about look like basic validity checks that were
> copied from joinpath.c, so they're probably necessary for correctness.
> It's worth remembering that we're trying to fix a bug here; if that
> makes some cases perform less well, that's sad, but not as sad as
> throwing a bogus error, which is what's happening right now.

AFAIU, the problem, especially with parameterized paths, is this: if rel1
is required to be parameterized by rel2 (because of lateral references?),
a nested loop join with rel2 as the outer relation and rel1 as the inner
relation is possible. postgres_fdw, and hence this function, however, does
not consider all the possible join combinations, and thus when this
function is presented with rel1 as outer and rel2 as inner it would refuse
to create a path. More generally, while creating local paths, we try many
combinations of participating relations, some of which produce local paths
and some of which do not (AFAIK this happens in the case of lateral
references, but there might be other cases). postgres_fdw, however,
considers only a single combination, which may or may not have produced a
local path. In such a case, postgres_fdw would create a foreign join path
but won't get a local path, and thus bail out. Obviously, we don't want
CreateLocalJoinPath() to try out all the combinations. That is the
simplicity of copying an existing path: all combinations have been tried
already. If we really want a nested loop join, we may try pulling the
inner and outer paths from an existing path and building a nested loop
join (that's just a suggestion).

Take, for example:
/*
 * If the cheapest-total outer path is parameterized by the
 * inner rel, we can't generate a nestloop path.  (There's no
 * use looking for alternative outer paths, since it should
 * already be the least-parameterized available path.)
 */
if (PATH_PARAM_BY_REL(outer_path, innerrel))
return NULL;

We may be able to create a nest loop path if we switch inner and outer
relations.

I have provided a patch in [1] to fix the bogus error. But if we want to
replace one approach (when there's a way to fix the bug in that approach)
with another (to fix that bug and improve things), the other approach
should be equally efficient.

>
> I'm a bit sketchy about this kind of thing, though:
>
> ! if (required_outer)
>   {
> ! bms_free(required_outer);
> ! return NULL;
>   }
>
> I don't understand what would stop that from making this fail to
> generate parameterized paths in some cases in which it would be legal
> to do so, and the comment is very brief.

I am not so much worried about this though :).
GetExistingLocalJoinPath() also does not handle that case. The new
function is not making it worse in this case.
731 /* Skip parameterised paths. */
732 if (path->param_info != NULL)
733 continue;

[1] 
https://www.postgresql.org/message-id/cafjfprd9ozekur9aormh2toxq4nyuusw2kb9s%2bo%3duvz4xcr%3...@mail.gmail.com

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] New SQL counter statistics view (pg_stat_sql)

2017-01-19 Thread vinayak


On 2017/01/18 14:15, Haribabu Kommi wrote:



On Sat, Jan 14, 2017 at 2:58 AM, Dilip Kumar wrote:


On Wed, Jan 11, 2017 at 7:47 PM, Dilip Kumar
<dilipbal...@gmail.com> wrote:
> +void
> +pgstat_count_sqlstmt(const char *commandTag)
> +{
> + PgStat_SqlstmtEntry *htabent;
> + bool found;
> +
> + if (!pgstat_track_sql)
> + return
>
> Callers of pgstat_count_sqlstmt are always ensuring that
> pgstat_track_sql is true, seems it's redundant here.

I have done testing of the feature; it's mostly working as expected.


Thanks for the review and test.

The use case for this view is to find out the number of different SQL
statements that are getting executed successfully on an instance by the
application/user.
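
For example, assuming the view ends up exposing the command tag and a
counter (the column names below are a guess, since the definition is still
under discussion), the intended usage would be something like:

-- Hypothetical column names; see the patch for the actual definition.
SELECT tag, count
  FROM pg_stat_sql
 ORDER BY count DESC;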


I have few comments/questions.


If you execute the query with EXECUTE, then the commandTag will be
EXECUTE; that way we will not show the actual query type. I mean, all the
statements will get the common tag "EXECUTE".

You can try this.
PREPARE fooplan (int) AS INSERT INTO t VALUES($1);
EXECUTE fooplan(1);

--


Yes, that's correct. Currently the count is incremented based on the
base command; because of this, EXECUTE is incremented instead of the
actual operation.


+ /* Destroy the SQL statement counter stats HashTable */
+ hash_destroy(pgstat_sql);
+
+ /* Create SQL statement counter Stats HashTable */

I think in the patch we have used a mix of "Stats HashTable" and
"stats HashTable"; I think it is better to be consistent throughout the
patch. Check other similar instances.




Corrected.


@@ -1102,6 +1102,12 @@ exec_simple_query(const char *query_string)

  PortalDrop(portal, false);

+ /*
+ * Count SQL statement for pg_stat_sql view
+ */
+ if (pgstat_track_sql)
+ pgstat_count_sqlstmt(commandTag);

We are only counting successful SQL statements. Is that
intentional?


Yes, I thought of counting the successful SQL operations that changed the
database over a period of time. I am not sure whether there would be many
failed operations on an instance that are worth counting.

--
I have noticed that pgstat_count_sqlstmt is called from
exec_simple_query and exec_execute_message. Don't we want to track the
statements executed from functions?
---

The view is currently designed to count user/application-initiated SQL
statements, but not the internal SQL queries that get executed from
functions, etc.


>>+void
>>+pgstat_count_sqlstmt(const char *commandTag)
>>+{
>>+ PgStat_SqlstmtEntry *htabent;
>>+ bool found;
>>+
>>+ if (!pgstat_track_sql)
>>+ return
>
>Callers of pgstat_count_sqlstmt are always ensuring that
>pgstat_track_sql is true, seems it's redundant here.

Removed the check in pgstat_count_sqlstmt and added it to the
exec_execute_message function, where the check was missing.

Updated patch attached.

I have reviewed the patch. All the test cases work fine.

Regards,
Vinayak Pokale
NTT Open Source Software Center


Re: [HACKERS] postgres_fdw bug in 9.6

2017-01-19 Thread Etsuro Fujita

On 2017/01/19 12:26, Ashutosh Bapat wrote:

On Thu, Jan 19, 2017 at 2:14 AM, Robert Haas  wrote:

On Fri, Jan 13, 2017 at 6:22 AM, Etsuro Fujita
 wrote:

My biggest concern about GetExistingLocalJoinPath is that might not be
extendable to the case of foreign-join paths with parameterization; in which
case, fdw_outerpath for a given foreign-join path would need to have the
same parameterization as the foreign-join path, but there might not be any
existing paths with the same parameterization in the path list.



I agree that this is a problem.



Effectively, it means that foreign join path creation will have a
parameterization different (per add_path()) from any local join
produced. But why would it be so?


I think it's better to give the FDW a chance to do that because the FDW 
might have more knowledge about the parameterization for joinrels than core.



The parameterization bubbles up from
the base relation. The process for creating parameterized local and
foreign paths for a base relation is the same. Thus we will have the same
parameterizations considered for foreign as well as local joins. Those
different parameterizations will be retained by add_path(), so we should
find one there.


Is that right?  I think there would be cases where we can't find one 
because add_path removes paths dominated by others.


Best regards,
Etsuro Fujita






Re: [HACKERS] Declarative partitioning - another take

2017-01-19 Thread Keith Fiske
While testing things out in pg_partman for native sub-partitioning, I ran
into what is a bug for me that I know I have to fix, but I'm curious
whether this can be prevented in the first place within the native
partitioning code itself. The output below shows a sub-partitioning set
where the sub-partition has a constraint range that is outside the range
of its parent. If the columns were different I could see why this would be
allowed, but the same column is used throughout the levels of
sub-partitioning. It's understandable if that may be too complex to check
for, but I figured I'd bring it up as something I accidentally ran into,
in case you see an easy way to prevent it.

This was encountered as of commit ba61a04bc7fefeee03416d9911eb825c4897c223.
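
A minimal way to reproduce the same shape from scratch (table names here
are made up, not pg_partman's):

CREATE TABLE tt (col3 timestamptz NOT NULL) PARTITION BY RANGE (col3);

CREATE TABLE tt_p2015 PARTITION OF tt
    FOR VALUES FROM ('2015-01-01') TO ('2016-01-01')
    PARTITION BY RANGE (col3);

-- Accepted even though this range lies entirely outside tt_p2015's own
-- bounds, so no row routed through tt can ever reach it:
CREATE TABLE tt_p2015_p2017_03 PARTITION OF tt_p2015
    FOR VALUES FROM ('2017-03-01') TO ('2017-04-01');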

keith@keith=# \d+ partman_test.time_taptest_table
  Table "partman_test.time_taptest_table"
 Column |   Type   | Collation | Nullable | Default |
Storage  | Stats target | Description
+--+---+--+-+--+--+-
 col1   | integer  |   |  | |
plain|  |
 col2   | text |   |  | |
extended |  |
 col3   | timestamp with time zone |   | not null | now()   |
plain|  |
Partition key: RANGE (col3)
Partitions: partman_test.time_taptest_table_p2015 FOR VALUES FROM
('2015-01-01 00:00:00-05') TO ('2016-01-01 00:00:00-05'),
partman_test.time_taptest_table_p2016 FOR VALUES FROM
('2016-01-01 00:00:00-05') TO ('2017-01-01 00:00:00-05'),
partman_test.time_taptest_table_p2017 FOR VALUES FROM
('2017-01-01 00:00:00-05') TO ('2018-01-01 00:00:00-05'),
partman_test.time_taptest_table_p2018 FOR VALUES FROM
('2018-01-01 00:00:00-05') TO ('2019-01-01 00:00:00-05'),
partman_test.time_taptest_table_p2019 FOR VALUES FROM
('2019-01-01 00:00:00-05') TO ('2020-01-01 00:00:00-05')

keith@keith=# \d+ partman_test.time_taptest_table_p2015
   Table "partman_test.time_taptest_table_p2015"
 Column |   Type   | Collation | Nullable | Default |
Storage  | Stats target | Description
+--+---+--+-+--+--+-
 col1   | integer  |   |  | |
plain|  |
 col2   | text |   |  | |
extended |  |
 col3   | timestamp with time zone |   | not null | now()   |
plain|  |
Partition of: partman_test.time_taptest_table FOR VALUES FROM ('2015-01-01
00:00:00-05') TO ('2016-01-01 00:00:00-05')
Partition key: RANGE (col3)
Partitions: partman_test.time_taptest_table_p2015_p2016_11 FOR VALUES FROM
('2016-11-01 00:00:00-04') TO ('2016-12-01 00:00:00-05'),
partman_test.time_taptest_table_p2015_p2016_12 FOR VALUES FROM
('2016-12-01 00:00:00-05') TO ('2017-01-01 00:00:00-05'),
partman_test.time_taptest_table_p2015_p2017_01 FOR VALUES FROM
('2017-01-01 00:00:00-05') TO ('2017-02-01 00:00:00-05'),
partman_test.time_taptest_table_p2015_p2017_02 FOR VALUES FROM
('2017-02-01 00:00:00-05') TO ('2017-03-01 00:00:00-05'),
partman_test.time_taptest_table_p2015_p2017_03 FOR VALUES FROM
('2017-03-01 00:00:00-05') TO ('2017-04-01 00:00:00-04')

keith@keith=# \d+ partman_test.time_taptest_table_p2015_p2017_03
   Table
"partman_test.time_taptest_table_p2015_p2017_03"
 Column |   Type   | Collation | Nullable | Default |
Storage  | Stats target | Description
+--+---+--+-+--+--+-
 col1   | integer  |   |  | |
plain|  |
 col2   | text |   |  | |
extended |  |
 col3   | timestamp with time zone |   | not null | now()   |
plain|  |
Partition of: partman_test.time_taptest_table_p2015 FOR VALUES FROM
('2017-03-01 00:00:00-05') TO ('2017-04-01 00:00:00-04')

--
Keith Fiske
Database Administrator
OmniTI Computer Consulting, Inc.
http://www.keithf4.com


Re: [HACKERS] SearchSysCache, SysCacheGetAttr, and heap_getattr()

2017-01-19 Thread Tom Lane
Stephen Frost  writes:
> There's some inconsistency when it comes to if we actually use
> SysCacheGetAttr() when pulling an attribute for a tuple we got via
> SearchSysCache(), or if we use heap_getattr().

> Maybe I'm missing something, but that seems less than ideal.

Well, SysCacheGetAttr just invokes heap_getattr using a tuple descriptor
obtained from the syscache entry.  AFAICT the point of it is that callers
need not lay their hands on a tuple descriptor for the relevant system
catalog some other way.
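
For illustration, a minimal sketch of the two call patterns side by side,
using a hypothetical pg_proc lookup (not taken from any particular caller):

#include "postgres.h"
#include "access/htup_details.h"
#include "catalog/pg_proc.h"
#include "utils/syscache.h"

static Datum
get_prosrc(Oid funcid, bool *isnull)
{
    HeapTuple   tup = SearchSysCache1(PROCOID, ObjectIdGetDatum(funcid));
    Datum       result;

    if (!HeapTupleIsValid(tup))
        elog(ERROR, "cache lookup failed for function %u", funcid);

    /* Syscache form: no need to obtain pg_proc's tuple descriptor. */
    result = SysCacheGetAttr(PROCOID, tup, Anum_pg_proc_prosrc, isnull);

    /*
     * Equivalent heap_getattr() form, usable when the caller already has
     * the catalog's TupleDesc (say, from a heap_open'd Relation rel):
     *
     *   result = heap_getattr(tup, Anum_pg_proc_prosrc,
     *                         RelationGetDescr(rel), isnull);
     */
    ReleaseSysCache(tup);
    return result;
}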

> I've generally been under the belief that using heap_getattr() is 'ok' when
> we've already opened and locked the relation, but there are some other
> checks done through SysCacheGetAttr() that you don't get with
> heap_getattr()...

Basically only that you supplied a valid cacheID, AFAICS.

> In short, should we be fixing these cases to always use
> SysCacheGetAttr() when working with a tuple returned by
> SearchSysCache()?

I can't get excited about it unless the caller is heap_open'ing
the catalog just to get a tupdesc for this purpose.  Then it'd
be worth changing so you could remove the heap_open/heap_close.

If the caller has the catalog opened because it's going to do an
insert/update/delete, you could argue about whether it's stylistically
better to use a tupdesc from the syscache or one from the relation.
I think that might be a case-by-case decision, but I'd lean to using
a tupdesc from the relation when preparing tuples to be stored there.

regards, tom lane




Re: [HACKERS] postgres_fdw bug in 9.6

2017-01-19 Thread Etsuro Fujita

On 2017/01/18 20:34, Ashutosh Bapat wrote:

On Wed, Jan 18, 2017 at 8:18 AM, Etsuro Fujita
 wrote:

Done.  Attached is the new version.  I also adjusted the code a bit further.



With this patch there are multiple cases where CreateLocalJoinPath()
would return NULL and hence postgres_fdw would not push a join down,
for example
/*
 * (1) if either cheapest-total path is parameterized by the
 * other rel, we can't generate a hashjoin/mergejoin path, and
 * (2) proposed hashjoin/mergejoin path is still parameterized
 * (ie, the required_outer set calculated by
 * calc_non_nestloop_required_outer isn't NULL), it's not what
 * we want; which means that both the cheapest-total paths
 * should be unparameterized.
 */
if (outer_path->param_info || inner_path->param_info)
return NULL;
or
/*
 * If the cheapest-total outer path is parameterized by the
 * inner rel, we can't generate a nestloop path.  (There's no
 * use looking for alternative outer paths, since it should
 * already be the least-parameterized available path.)
 */
if (PATH_PARAM_BY_REL(outer_path, innerrel))
return NULL;
/* If proposed path is still parameterized, don't return it. */
required_outer = calc_nestloop_required_outer(outer_path,
  inner_path);
if (required_outer)
{
bms_free(required_outer);
return NULL;
}

Am I right?

The earlier version of this function GetExistingLocalJoinPath() used
to return a local path in those cases and hence postgres_fdw was able
to push down corresponding queries. So, we are introducing a
performance regression.


Really?  My understanding is: postgres_fdw will never have those cases 
because it can always get the cheapest-total paths that are unparameterized.


Best regards,
Etsuro Fujita






[HACKERS] SearchSysCache, SysCacheGetAttr, and heap_getattr()

2017-01-19 Thread Stephen Frost
Greetings,

There's some inconsistency when it comes to if we actually use
SysCacheGetAttr() when pulling an attribute for a tuple we got via
SearchSysCache(), or if we use heap_getattr().

Maybe I'm missing something, but that seems less than ideal.  I've
generally been under the belief that using heap_getattr() is 'ok' when
we've already opened and locked the relation, but there are some other
checks done through SysCacheGetAttr() that you don't get with
heap_getattr()...

In short, should we be fixing these cases to always use
SysCacheGetAttr() when working with a tuple returned by
SearchSysCache()?

Thanks!

Stephen




Re: [HACKERS] Declarative partitioning - another take

2017-01-19 Thread Amit Langote
On 2017/01/20 4:18, Robert Haas wrote:
> On Thu, Jan 19, 2017 at 12:15 AM, Amit Langote wrote:
>> 0002-Set-ecxt_scantuple-correctly-for-tuple-routing.patch
>>
>> In 2ac3ef7a01df859c62d0a02333b646d65eaec5ff, we changed things so that
>> it's possible for a different TupleTableSlot to be used for partitioned
>> tables at successively lower levels.  If we do end up changing the slot
>> from the original, we must update ecxt_scantuple to point to the new one
>> for partition key of the tuple to be computed correctly.
>>
>> Last posted here:
>> https://www.postgresql.org/message-id/99fbfad6-b89b-9427-a6ca-197aad98c48e%40lab.ntt.co.jp
> 
> Why does FormPartitionKeyDatum need this?  Could that be documented in
> a comment in here someplace, perhaps the header comment to
> FormPartitionKeyDatum?

FormPartitionKeyDatum() computes partition key expressions (if any) and
the expression evaluation machinery expects ecxt_scantuple to point to the
tuple to extract attribute values from.

FormPartitionKeyDatum() already has a tiny comment, which seems to be the
only thing we could say about this there:

* the ecxt_scantuple slot of estate's per-tuple expr context must point to
* the heap tuple passed in.

In get_partition_for_tuple() which calls FormPartitionKeyDatum(), the
patch adds this comment (changed it a little from the last version):

+ /*
+  * Extract partition key from tuple. Expression evaluation machinery
+  * that FormPartitionKeyDatum() invokes expects ecxt_scantuple to
+  * point to the correct tuple slot.  The slot might have changed from
+  * what was used for the parent table if the table of the current
+  * partitioning level has different tuple descriptor from the parent.
+  * So update ecxt_scantuple accordingly.
+  */
+ ecxt->ecxt_scantuple = slot;
FormPartitionKeyDatum(parent, slot, estate, values, isnull);

It says why we need to change which slot ecxt_scantuple points to.

>> 0004-Fix-some-issues-with-views-and-partitioned-tables.patch
>>
>> Automatically updatable views failed to handle partitioned tables.
>> Once that's fixed, WITH CHECK OPTIONS wouldn't work correctly without
>> the WCO expressions having been suitably converted for each partition
>> (think applying map_partition_varattnos to Vars in the WCO expressions
>> just like with partition constraint expressions).
> 
> The changes to execMain.c contain a hunk which has essentially the
> same code twice.  That looks like a mistake.  Also, the patch doesn't
> compile because convert_tuples_by_name() takes 3 arguments, not 4.

Actually, I realized that and sent the updated version [1] of this patch
that fixed this issue.  In the updated version, I removed that code block
(the 2 copies of it), because we are still discussing what to do about
showing tuples in constraint violation (in this case, WITH CHECK OPTION
violation) messages.  Anyway, attached here again.

>> 0006-Avoid-tuple-coversion-in-common-partitioning-cases.patch
>>
>> Currently, the tuple conversion is performed after a tuple is routed,
>> even if the attributes of a target leaf partition map one-to-one with
>> those of the root table, which is wasteful.  Avoid that by making
>> convert_tuples_by_name() return a NULL map for such cases.
> 
> +Assert(!consider_typeid && indesc->tdhasoid == outdesc->tdhasoid);
> 
> I think you mean Assert(consider_typeid || indesc->tdhasoid ==
> outdesc->tdhasoid);

Ah, you're right.

> But I wonder why we don't instead just change this function to
> consider tdhasoid rather than tdtypeid.  I mean, if the only point of
> comparing the type OIDs is to find out whether the table-has-OIDs
> setting matches, we could instead test that directly and avoid needing
> to pass an extra argument.  I wonder if there's some other reason this
> code is there which is not documented in the comment...

With the following patch, regression tests run fine:

  if (indesc->natts == outdesc->natts &&
- indesc->tdtypeid == outdesc->tdtypeid)
+ indesc->tdhasoid != outdesc->tdhasoid)
 {

If checking tdtypeid (instead of tdhasoid directly) has some other
consideration, I would have seen at least some tests broken by this
change.  So, if we are to go with this, I too prefer it over my previous
proposal to add an argument to convert_tuples_by_name().  Attached 0003
implements the above approach.

> Phew.  Thanks for all the patches, sorry I'm having trouble keeping up.

Thanks a lot for taking time to look through them and committing.

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/92fe2a71-5eb7-ee8d-53ef-cfd5a65dfc3d%40lab.ntt.co.jp
>From c7088290221a9fe0818139145b7bdf6731bc419a Mon Sep 17 00:00:00 2001
From: amit 
Date: Wed, 28 Dec 2016 10:10:26 +0900
Subject: [PATCH 1/3] Set ecxt_scantuple correctly for tuple-routing

In 2ac3ef7a01df859c62d0a02333b646d65eaec5ff, we changed things so that
it's possible for a different TupleTableSlot to be used for partitioned
tables at successively lower levels.  If we do end up changing the slot
from the original, we must update ecxt_scantuple to point to the new one
for partition key of the tuple to be computed correctly.

Re: [HACKERS] Possible issue with expanded object infrastructure on Postgres 9.6.1

2017-01-19 Thread Tom Lane
Peter Geoghegan  writes:
> A customer is on 9.6.1, and complains of a segfault observed at least
> 3 times. In all cases, the logs look like this:
> ...
> I can use GDB to get details of the instruction pointer that appeared
> in the kernel trap error, which shows a function from the expanded
> value representation infrastructure:

It would help to know the data types of the columns involved in this
query; but just eyeballing it, it doesn't look like it involves any
array operations, so it's pretty hard to believe that the expanded-object
code could have gotten invoked intentionally.  (The mere presence of
an array column somewhere in the vicinity would not do that; you'd
need to invoke an array-ish operation, or at least pass the array into
a plpgsql function.)

If I had to bet on the basis of this much info, I would bet that the
parallel-query infrastructure is dropping the ball somewhere and
transmitting a corrupted datum that accidentally looks like it is
an expanded-object reference.

If $customer wants a quick fix, I'd suggest seeing whether disabling
parallel query makes the problem go away.  That might be a good first
step anyway, just to narrow down where the problem lies.
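
(For reference, one way to do that on 9.6 at the session level, if that is
easiest for them to test:

SET max_parallel_workers_per_gather = 0;

The same setting can be put in postgresql.conf to disable it instance-wide.)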

regards, tom lane




Re: [HACKERS] [WIP]Vertical Clustered Index (columnar store extension)

2017-01-19 Thread Haribabu Kommi
On Wed, Jan 18, 2017 at 2:25 PM, Peter Eisentraut
<peter.eisentr...@2ndquadrant.com> wrote:

> On 12/29/16 10:55 PM, Haribabu Kommi wrote:
> > Fujitsu was interested in developing a columnar storage extension with
> > minimal
> > changes the server backend.
> >
> > The columnar store is implemented as an extension using index access
> > methods.
> > This can be easily enhanced with pluggable storage methods once they are
> > available.
> >
> > A new index method (VCI) is added to create columnar index on the table.
>
> I'm confused.  You say that you are adding an index access method, for
> which we have a defined extension mechanism, but the code doesn't do
> that.  Instead, it sprinkles a bunch of hooks through the table access
> code.  So you are really adding ways to add alternatives to heap
> storage, except we have no way to know whether these hooks have been
> designed with any kind of generality in mind.  So is it an index access
> method or a table access method?
>

Yes, it is a mix of both index and table access methods. The current
design of the Vertical Clustered Index needs both access methods, which is
why we used both.

> Either way, you shouldn't need a new relkind.  Note that all indexes
> have the same relkind, even if they use different access methods.
>
> I think there are two ways to integrate column storage into PostgreSQL:
> One is to use the FDW interface.  That has been done before, see
> cstore_fdw.  The other is to define a storage manager extension
> interface.  That has been tried but has not been completed yet.  Adding
> a bunch of custom hooks all over the place seems worse than both of those.
>

Thanks for your suggestion. Yes, I also agree that the best way to
integrate column storage for better performance is through a storage
manager extension interface.

It would be better to first finish the pluggable storage interface and
then integrate this columnar store on top of it.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?

2017-01-19 Thread Kyotaro HORIGUCHI
Hello,

At Thu, 19 Jan 2017 18:37:31 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20170119.183731.223893446.horiguchi.kyot...@lab.ntt.co.jp>
> > > - Delaying recycling a segment until the last partial record on it
> > >   completes. This seems doable in page-wise (coarse resolution)
> > >   but would cost additional reading of past xlog files (page
> > >   header of past pages is required).
> > 
> > Hm, yes. That looks like the least invasive way to go. At least that
> > looks more correct than the others.
> 
> The attached patch does that. Usually it reads page headers only
> on segment boundaries, but once continuation record found (or
> failed to read the next page header, that is, the first record on
> the first page in the next segment has not been replicated), it
> becomes to happen on every page boundary until non-continuation
> page comes.
> 
> I leave a debug info (at LOG level) in the attached file shown on
> every state change of keep pointer. At least for pgbench, the
> cost seems ignorable.

I revised it. It became neater and less invasive.

 - Removed the added keep member from struct WalSnd. It is never
   referenced from other processes; it is a static variable now.

 - Restore keepPtr from the replication slot on starting.

 - Moved the main part to a more appropriate position.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index f3082c3..0270474 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -185,6 +185,12 @@ static volatile sig_atomic_t replication_active = false;
 static LogicalDecodingContext *logical_decoding_ctx = NULL;
 static XLogRecPtr logical_startptr = InvalidXLogRecPtr;
 
+/*
+ * Segment keep pointer for physical slots. Has a valid value only when it
+ * differs from the current flush pointer.
+ */
+static XLogRecPtr	   keepPtr = InvalidXLogRecPtr;
+
 /* Signal handlers */
 static void WalSndSigHupHandler(SIGNAL_ARGS);
 static void WalSndXLogSendHandler(SIGNAL_ARGS);
@@ -217,7 +223,7 @@ static void WalSndPrepareWrite(LogicalDecodingContext *ctx, XLogRecPtr lsn, Tran
 static void WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid, bool last_write);
 static XLogRecPtr WalSndWaitForWal(XLogRecPtr loc);
 
-static void XLogRead(char *buf, XLogRecPtr startptr, Size count);
+static bool XLogRead(char *buf, XLogRecPtr startptr, Size count, bool noutfoundok);
 
 
 /* Initialize walsender process before entering the main command loop */
@@ -538,6 +544,9 @@ StartReplication(StartReplicationCmd *cmd)
 			ereport(ERROR,
 	(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 	 (errmsg("cannot use a logical replication slot for physical replication";
+
+		/* Restore keepPtr from replication slot */
+		keepPtr = MyReplicationSlot->data.restart_lsn;
 	}
 
 	/*
@@ -553,6 +562,10 @@ StartReplication(StartReplicationCmd *cmd)
 	else
 		FlushPtr = GetFlushRecPtr();
 
+	/* Set InvalidXLogRecPtr if catching up */
+	if (keepPtr == FlushPtr)
+		keepPtr = InvalidXLogRecPtr;
+	
 	if (cmd->timeline != 0)
 	{
 		XLogRecPtr	switchpoint;
@@ -774,7 +787,7 @@ logical_read_xlog_page(XLogReaderState *state, XLogRecPtr targetPagePtr, int req
 		count = flushptr - targetPagePtr;
 
 	/* now actually read the data, we know it's there */
-	XLogRead(cur_page, targetPagePtr, XLOG_BLCKSZ);
+	XLogRead(cur_page, targetPagePtr, XLOG_BLCKSZ, false);
 
 	return count;
 }
@@ -1551,7 +1564,7 @@ static void
 ProcessStandbyReplyMessage(void)
 {
 	XLogRecPtr	writePtr,
-flushPtr,
+flushPtr, oldFlushPtr,
 applyPtr;
 	bool		replyRequested;
 
@@ -1580,6 +1593,7 @@ ProcessStandbyReplyMessage(void)
 		WalSnd	   *walsnd = MyWalSnd;
 
 		SpinLockAcquire(&walsnd->mutex);
+		oldFlushPtr = walsnd->flush;
 		walsnd->write = writePtr;
 		walsnd->flush = flushPtr;
 		walsnd->apply = applyPtr;
@@ -1597,7 +1611,78 @@ ProcessStandbyReplyMessage(void)
 		if (SlotIsLogical(MyReplicationSlot))
 			LogicalConfirmReceivedLocation(flushPtr);
 		else
-			PhysicalConfirmReceivedLocation(flushPtr);
+		{
+			/*
+			 * On recovery, a continuation record must be available from a
+			 * single WAL source. So the physical replication slot should stay in
+			 * the first segment for a continuation record spanning multiple
+			 * segments. Since this doesn't look into individual record,
+			 * keepPtr may stay a bit too behind.
+			 *
+			 * Since the objective is avoiding the removal of required segments,
+			 * checking every segment is enough. But once keepPtr goes behind,
+			 * check every page for quick restoration.
+			 *
+			 * keepPtr has a valid value only when it is behind flushPtr.
+			 */
+			if (oldFlushPtr != InvalidXLogRecPtr &&
+(keepPtr == InvalidXLogRecPtr ?
+ oldFlushPtr / XLOG_SEG_SIZE != flushPtr / XLOG_SEG_SIZE :
+ keepPtr / XLOG_BLCKSZ != flushPtr / XLOG_BLCKSZ))
+			{
+XLogRecPtr rp;
+XLogRecPtr oldKeepPtr = keepPtr; /* for debug */

[HACKERS] Failure on sittella

2017-01-19 Thread Andres Freund
Hi Andrew, All,

There's a buildfarm failure (just on sittella so far) that I can't quite
interpret from here. Timing wise it looks like it might be correlated
with ea15e18677fc2eff3135023e27f69ed8821554ed - "Remove obsoleted code
relating to targetlist SRF evaluation." but I'm not quite seeing the
connection.

Any chance you could get a backtrace for the failure?

Greetings,

Andres Freund




Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctl start without --wait

2017-01-19 Thread Stephen Frost
* Andres Freund (and...@anarazel.de) wrote:
> On 2017-01-19 20:45:57 -0500, Stephen Frost wrote:
> > * Andres Freund (and...@anarazel.de) wrote:
> > > On 2017-01-19 10:06:09 -0500, Stephen Frost wrote:
> > > > WAL replay does do more work, generally speaking (the WAL has to be
> > > > read, the checksum validated on it, and then the write has to go out,
> > > > while the checkpointer just writes the page out from memory), but it's
> > > > also dealing with less contention on the system (there aren't a bunch of
> > > > backends hammering the disks to pull data in with reads when you're
> > > > doing crash recovery...).
> > > 
> > > There's a huge difference though: WAL replay is single threaded, whereas
> > > generating WAL is not.  
> > 
> > I'm aware- but *checkpointing* is still single-threaded, unless, as I
> > mentioned, you end up with backends pushing out their own changes to the
> > heap to make room for new pages to come in.
> 
> Sure, but buffer checkpointing isn't necessarily that large a portion of
> the work done in one checkpoint cycle, in comparison to all the WAL
> being generated.  Quite commonly a lot of the buffers will already have
> been flushed to disk by backend and/or bgwriter, and are clean by the
> time checkpointer gets to them.  So I don't think checkpointer being
> single threaded necessarily means much WRT replay performance.

Yes, good point, we also have the bgwriter going through and helping.

> > > Especially if there's synchronous IO required
> > > (most commonly reading in data, because more data was modified in the
> > > current checkpoint than fit in shared buffers, so FPIs don't pre-fill
> > > buffers), you can be significantly slower than generating the WAL.
> > 
> > That is an interesting point, if I'm following what you're saying
> > correctly- during the replay we can end up having more pages modified
> > than fit in shared buffers, which means that we have to read back in
> > pages that we pushed out to implement the non-FPI WAL changes to that
> > page.
> 
> Right. (And not just during replay obviously, also during the intial WAL
> generation).

Sure.

> > I wonder if we should have a way to configure the amount of memory
> > allowed to be used for WAL replay, independent of shared_buffers?
> 
> I don't quite see how that'd work, especially with HS.  We just use the
> normal shared buffers code etc, and there we can't just resize the
> amount of shared_buffers allocated after doing crash recovery.

It wouldn't work with HS (or, at least, I have no idea how it would).  I
was specifically thinking about *just* during crash recovery there
(sorry that I didn't make that clear), and my thought was that we'd just
allocate the memory locally, not as shared memory, and then drop the
whole thing and allocate shared_buffers after crash recovery was done.

Obviously, this is a lot of hand-waving, but that's what I was
thinking.

> > That said, I wonder if our eviction algorithm could be
> > improved/changed when performing WAL replay too to reduce the chances
> > that we'll have to read a page back in.
> 
> I don't think that's that promising an angle of attack. Having a separate
> pre-fetching backend that parses the WAL and pre-reads everything
> necessary seems more promising.

I agree, that would be helpful and could help with HS too, which I agree
is an important piece.

Thanks!

Stephen




Re: [HACKERS] Parallel Index Scans

2017-01-19 Thread Haribabu Kommi
On Thu, Jan 19, 2017 at 1:18 AM, Amit Kapila 
wrote:

> On Wed, Jan 18, 2017 at 6:25 AM, Haribabu Kommi
>  wrote:
> >
> >
> > On Mon, Jan 16, 2017 at 11:11 PM, Amit Kapila 
> > wrote:
> >>
> >
> > + * index_beginscan_parallel - join parallel index scan
> >
> > The name and the description doesn't sync properly, any better
> description?
> >
>
> This can be called by both the worker and leader of parallel index
> scan.  What problem do you see with it.  heap_beginscan_parallel has
> similar description, so not sure changing here alone makes sense.
>

Ok.


>
> >
> > +extern BlockNumber _bt_parallel_seize(IndexScanDesc scan, bool
> *status);
> > +extern void _bt_parallel_release(IndexScanDesc scan, BlockNumber
> > scan_page);
> >
> > Any better names for the above functions, as these function will
> provide/set
> > the next page that needs to be read.
> >
>
> These functions also set the state of scan.  IIRC, these names were
> being agreed between Robert and Rahila as well (suggested offlist by
> Robert).  I am open to change if you or others have any better
> suggestions.
>

I didn't find any better names other than the following,

_bt_get_next_parallel_page
_bt_set_next_parallel_page


> >
> > + /* reset (parallel) index scan */
> > + if (node->iss_ScanDesc)
> > + {
> >
> > Why this if check required? There is an assert check in later function
> > calls.
> >
>
> This is required because we don't initialize the scan descriptor for
> parallel-aware nodes during ExecInitIndexScan.  It got initialized
> later at the time of execution when we initialize dsm.  Now, it is
> quite possible that Gather node can occur on inner side of join in
> which case Rescan can be called before even execution starts. This is
> the reason why we have similar check in ExecReScanSeqScan which is
> added during parallel sequential scans (f0661c4e). Does that answer
> your question?


Thanks for the details. got it.


Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] pg_hba_file_settings view patch

2017-01-19 Thread Haribabu Kommi
On Thu, Jan 19, 2017 at 11:28 PM, Ashutosh Bapat
<ashutosh.ba...@enterprisedb.com> wrote:

> On Thu, Jan 19, 2017 at 1:26 PM, Michael Paquier
>  wrote:
> > On Thu, Jan 19, 2017 at 4:25 PM, Haribabu Kommi
> >  wrote:
> >> Added the cleanup mechanism. The tokenize_file() function call is
> >> present in many places, but in one flow it is still possible to leak a
> >> file descriptor because of the pg_hba_rules view; because of this, I
> >> added the cleanup everywhere.
> >
> > Oops, sorry. Actually you don't need that. AllocateFile() registers
> > the fd opened with the sub-transactions it is involved with... So if
> > there is an ERROR nothing leaks.
>
> I agree. If we need any fix, it should be a separate patch.
>
> The patch is in much better shape than previous versions. Thanks for
> working on it.
>

Thanks for the review.

Here are some more review comments.
> 'indicates' should be used instead of 'indicating'
> +  
> +   If the configuration file contains any problems,
> error field
> +   indicating the problem of that rule. Following is the sample
> output of the view.
> +  
> The first sentence may be rewritten as
> error field, if not NULL, describes problem in
> the rule on the line line_number.
>

Changed accordingly.


> Instead of showing same values like {all}, trust on multiple lines, you may
> show an example with different values on different lines.
> +
> + line_number | type  | database | user_name | auth_method
> +-+---+--+---+-
> +  84 | local | {all}| {all} | trust
> +  86 | host  | {all}| {all} | trust
> +  88 | host  | {all}| {all} | trust
> +(3 rows)
> +
>

Added more rows with different options.

getauthmethod() deparses the authentication tokens parsed in
> parse_hba_line()
> starting with /* Get the authentication method */. There is less chance
> that
> those tokens would be changed later, but we might need adjustments when new
> methods are added or method names are changed. Instead, we might want to
> create
> an array of token where nth token indicates auth_method = n. The code
> block in
> parse_hba_line() can be changed to look up this array and assign index of
> the
> token if found to auth_method. Token which are enabled by compiler flags
> will
> be part of the array only when that flag is enabled, otherwise they will be
> NULL.
> #ifdef ENABLE_GSS
> parsedline->auth_method = uaGSS;
> #else
> unsupauth = "gss";
> #endif
> If we do that getauthmethod() simply fetches the token by referencing array
> with auth_method as index, with some special handling for uaImplicitReject.
> This will take away any future maintenance needed. Something similar can be
> done to conntype.
>

Thanks for the improvement suggestion.
I am wondering whether it is really required, because we rarely change the
name of an authentication option that is already exposed, and newly added
options can easily be found by the compiler in case one is missed.
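
For what it's worth, a rough sketch of the array-lookup idea being
discussed (the names below are invented, not from the patch):

/* Hypothetical sketch only; array indexed by the UserAuth enum. */
static const char *const auth_method_names[] = {
    [uaReject] = "reject",
    [uaTrust] = "trust",
    [uaMD5] = "md5",
#ifdef ENABLE_GSS
    [uaGSS] = "gss",
#endif
    /* ... one entry per UserAuth value ... */
};

static const char *
get_auth_method_name(UserAuth method)
{
    /* uaImplicitReject has no pg_hba.conf keyword of its own. */
    if (method == uaImplicitReject)
        return "reject";
    return auth_method_names[method];
}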



> This is not going to help in binary without CASSERT i.e. for most users, if
> they provide more than 12 options, albeit resulting in an error. Please
> convert
> this into an elog() or another error that hba parser throws.
> +Assert(noptions <= MAX_OPTIONS);


No. If the user provides more than 12 options, they are invalid; the
parsing itself identifies the invalid options and the error string is
stored in the error field.

The Assert can be hit only when new options are added for display through
the view without updating the macro that defines the maximum number of
options.

Updated patch attached including reverting of file leak changes.

Regards,
Hari Babu
Fujitsu Australia


pg_hba_rules_10.patch
Description: Binary data



Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctl start without --wait

2017-01-19 Thread Andres Freund
On 2017-01-19 20:45:57 -0500, Stephen Frost wrote:
> * Andres Freund (and...@anarazel.de) wrote:
> > On 2017-01-19 10:06:09 -0500, Stephen Frost wrote:
> > > WAL replay does do more work, generally speaking (the WAL has to be
> > > read, the checksum validated on it, and then the write has to go out,
> > > while the checkpointer just writes the page out from memory), but it's
> > > also dealing with less contention on the system (there aren't a bunch of
> > > backends hammering the disks to pull data in with reads when you're
> > > doing crash recovery...).
> > 
> > There's a huge difference though: WAL replay is single threaded, whereas
> > generating WAL is not.  
> 
> I'm aware- but *checkpointing* is still single-threaded, unless, as I
> mentioned, you end up with backends pushing out their own changes to the
> heap to make room for new pages to come in.

Sure, but buffer checkpointing isn't necessarily that large a portion of
the work done in one checkpoint cycle, in comparison to all the WAL
being generated.  Quite commonly a lot of the buffers will already have
been flushed to disk by backend and/or bgwriter, and are clean by the
time checkpointer gets to them.  So I don't think checkpointer being
single threaded necessarily means much WRT replay performance.

> > Especially if there's synchronous IO required
> > (most commonly reading in data, because more data was modified in the
> > current checkpoint than fit in shared buffers, so FPIs don't pre-fill
> > buffers), you can be significantly slower than generating the WAL.
> 
> That is an interesting point, if I'm following what you're saying
> correctly- during the replay we can end up having more pages modified
> than fit in shared buffers, which means that we have to read back in
> pages that we pushed out to implement the non-FPI WAL changes to that
> page.

Right. (And not just during replay obviously, also during the intial WAL
generation).

> I wonder if we should have a way to configure the amount of memory
> allowed to be used for WAL replay, independent of shared_buffers?

I don't quite see how that'd work, especially with HS.  We just use the
normal shared buffers code etc, and there we can't just resize the
amount of shared_buffers allocated after doing crash recovery.

> That said, I wonder if our eviction algorithm could be
> improved/changed when performing WAL replay too to reduce the chances
> that we'll have to read a page back in.

I don't think that's that promising an angle of attack. Having a separate
pre-fetching backend that parses the WAL and pre-reads everything
necessary seems more promising.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctl start without --wait

2017-01-19 Thread Stephen Frost
* Andres Freund (and...@anarazel.de) wrote:
> On 2017-01-19 10:06:09 -0500, Stephen Frost wrote:
> > WAL replay does do more work, generally speaking (the WAL has to be
> > read, the checksum validated on it, and then the write has to go out,
> > while the checkpointer just writes the page out from memory), but it's
> > also dealing with less contention on the system (there aren't a bunch of
> > backends hammering the disks to pull data in with reads when you're
> > doing crash recovery...).
> 
> There's a huge difference though: WAL replay is single threaded, whereas
> generating WAL is not.  

I'm aware- but *checkpointing* is still single-threaded, unless, as I
mentioned, you end up with backends pushing out their own changes to the
heap to make room for new pages to come in.  Or is there some other way
the checkpoint ends up being performed with multiple processes?

> Especially if there's synchronous IO required
> (most commonly reading in data, because more data was modified in the
> current checkpoint than fit in shared buffers, so FPIs don't pre-fill
> buffers), you can be significantly slower than generating the WAL.

That is an interesting point, if I'm following what you're saying
correctly- during the replay we can end up having more pages modified
than fit in shared buffers, which means that we have to read back in
pages that we pushed out to implement the non-FPI WAL changes to that
page.  I wonder if we should have a way to configure the amount of
memory allowed to be used for WAL replay, independent of shared_buffers?

I mean, really, during crash recovery on a dedicated database box, you'd
probably want to say "ALL the memory can be used if it makes crash
recovery faster!".  That said, I wonder if our eviction algorithm could
be improved/changed when performing WAL replay too to reduce the chances
that we'll have to read a page back in.

Very interesting.

Thanks!

Stephen


signature.asc
Description: Digital signature


[HACKERS] Possible issue with expanded object infrastructure on Postgres 9.6.1

2017-01-19 Thread Peter Geoghegan
A customer is on 9.6.1, and complains of a segfault observed at least
3 times. In all cases, the logs look like this:

Jan 11 16:11:07 ip-10-0-118-82 kernel: [41913.530453] traps:
postgres[6561] general protection ip:55fcf08b0491 sp:7ffc17dfa650
error:0 in postgres[55fcf0557000+638000]
Jan 11 16:11:07 ip-10-0-118-82 postgresql[7]: [4-1] LOG:  server
process (PID 643) was terminated by signal 11: Segmentation fault
Jan 11 16:11:07 ip-10-0-118-82 postgresql[7]: [4-2] DETAIL:  Failed
process was running: SELECT COUNT(count_column) FROM (SELECT  1 AS
count_column FROM "albums" INNER JOIN "album_photo_assignments" ON
"albums"."id" = "album_photo_assignments"."album_id" WHERE
"albums"."deleted_at" IS NULL AND "album_photo_assignments"."photo_id"
= $1 LIMIT 1000 OFFSET 0) subquery_for_count
Jan 11 16:11:07 ip-10-0-118-82 postgresql[7]: [5-1] LOG:  terminating
any other active server processes
Jan 11 16:11:07 ip-10-0-118-82 postgresql[80]: [6-1] WARNING:
terminating connection because of crash of another server process
Jan 11 16:11:07 ip-10-0-118-82 postgresql[80]: [6-2] DETAIL:  The
postmaster has commanded this server process to roll back the current
transaction and exit, because another server process exited abnormally
and possibly corrupted shared memory.
Jan 11 16:11:07 ip-10-0-118-82 postgresql[80]: [6-3] HINT:  In a
moment you should be able to reconnect to the database and repeat your
command.

The "general protection ip:55fcf08b0491 sp:7ffc17dfa650 error:0 in
postgres[55fcf0557000+638000]" details are identical in all 3
segfaults observed. Note that the kernel in question is
"4.4.0-57-generic #78~14.04.1-Ubuntu". These crashes were days apart,
but I have yet to isolate the problem. The customer tells me that this
is a representative query plan:

explain SELECT COUNT(count_column)
FROM (SELECT  1 AS count_column
FROM "albums"
INNER JOIN "album_photo_assignments" ON "albums"."id" =
"album_photo_assignments"."album_id"
WHERE "albums"."deleted_at" IS NULL AND
"album_photo_assignments"."photo_id" =
'fd0bcbb3-9c4c-4036-954a-cf01935d004c'
LIMIT 1000 OFFSET 0)
subquery_for_count ;
                                          QUERY PLAN
─────────────────────────────────────────────────────────────────────────────────────────────
 Aggregate  (cost=16419.94..16419.94 rows=1 width=8)
   ->  Limit  (cost=1000.06..16419.94 rows=1 width=4)
         ->  Nested Loop  (cost=1000.06..16419.94 rows=1 width=4)
               ->  Gather  (cost=1000.00..16415.87 rows=1 width=16)
                     Workers Planned: 1
                     ->  Parallel Seq Scan on album_photo_assignments  (cost=0.00..15415.77 rows=1 width=16)
                           Filter: (photo_id = 'fd0bcbb3-9c4c-4036-954a-cf01935d004c'::uuid)
               ->  Index Scan using albums_pkey on albums  (cost=0.06..4.06 rows=1 width=16)
                     Index Cond: (id = album_photo_assignments.album_id)
                     Filter: (deleted_at IS NULL)
(10 rows)

Note that this isn't a prepared statement, unlike the problematic
query we see from the logs. And, I've not yet managed to verify that
the problem case involved execution of this particular "$1 substitute
constant", since that isn't displayed in the logs. So, while this
query plan might well be representative of the problem, it may also
not be.

I can use GDB to get details of the instruction pointer that appeared
in the kernel trap error, which shows a function from the expanded
value representation infrastructure:

(gdb) info symbol 0x55fcf08b0491
EOH_get_flat_size + 1 in section .text of /usr/lib/postgresql/9.6/bin/postgres
(gdb) info symbol 0x55fcf08b0490
EOH_get_flat_size in section .text of /usr/lib/postgresql/9.6/bin/postgres
(gdb) disassemble 0x55fcf08b0490
Dump of assembler code for function EOH_get_flat_size:
   0x55fcf08b0490 <+0>: push   %rbp
   0x55fcf08b0491 <+1>: mov0x8(%rdi),%rax
   0x55fcf08b0495 <+5>: mov%rsp,%rbp
   0x55fcf08b0498 <+8>: pop%rbp
   0x55fcf08b0499 <+9>: mov(%rax),%rax
   0x55fcf08b049c <+12>: jmpq   *%rax
End of assembler dump.

(I believe that Linux outputs VMA-wise addresses in the kernel trap
error quoted above, an address from the userland program (Postgres)
address space, so GDB does in fact display interesting disassembly --
the documentation on this seems extremely limited.)
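
For context, the function at the faulting address is essentially a one-line
dispatch through the expanded object's method table; paraphrasing the 9.6
sources from memory (the real definition lives in
src/backend/utils/adt/expandeddatum.c), it is roughly:

```c
/*
 * Sketch of EOH_get_flat_size(); treat as a paraphrase, not a verbatim copy.
 * Requires postgres.h and utils/expandeddatum.h in the real backend.
 */
Size
EOH_get_flat_size(ExpandedObjectHeader *eohptr)
{
	/*
	 * "mov 0x8(%rdi),%rax" in the disassembly above is this load of
	 * eoh_methods; a general protection fault right there suggests that
	 * eohptr itself pointed at unmapped memory, i.e. the expanded datum
	 * passed in was already garbage.
	 */
	return eohptr->eoh_methods->get_flat_size(eohptr);
}
```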

Note that there is a text array column in the table "albums". I note
that there have been fixes in this area already, that are not in 9.6.1
(commit 77cd0dc, for one). I know next to nothing about this code, and
would appreciate some guidance.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctl start without --wait

2017-01-19 Thread Stephen Frost
* Michael Paquier (michael.paqu...@gmail.com) wrote:
> On Fri, Jan 20, 2017 at 12:06 AM, Stephen Frost  wrote:
> > We did make the WAL checksum routines a lot
> > faster with 9.6, as I recall, so perhaps there's been some change there
> > too.
> 
> 9.5, commit 5028f22f with Abhijit's and Heikki's work on CRC-32 computations.

Oh, was it 9.5 where we added the code to use CPU OPs to speed up the
calculations..?  If so then that surprises me a bit, as, if I recall
correctly, I had a 9.5-based server where the replica simply couldn't
keep up with replay and was entirely CPU-bound..

I had thought those changes went into 9.6.  If they were in 9.5 then
that's possibly a bit disappointing.

Of course, this was a while ago, perhaps that was actually a 9.4
server..  mmm, will have to go find out.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctl start without --wait

2017-01-19 Thread Andres Freund
On 2017-01-19 10:06:09 -0500, Stephen Frost wrote:
> WAL replay does do more work, generally speaking (the WAL has to be
> read, the checksum validated on it, and then the write has to go out,
> while the checkpointer just writes the page out from memory), but it's
> also dealing with less contention on the system (there aren't a bunch of
> backends hammering the disks to pull data in with reads when you're
> doing crash recovery...).

There's a huge difference though: WAL replay is single threaded, whereas
generating WAL is not.  Especially if there's synchronous IO required
(most commonly reading in data, because more data was modified in the
current checkpoint than fit in shared buffers, so FPIs don't pre-fill
buffers), you can be significantly slower than generating the WAL.
Especially when you deal with SSDs, which can handle a lot of IO in
parallel, you can easily run into such issues.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctl start without --wait

2017-01-19 Thread Michael Paquier
On Fri, Jan 20, 2017 at 12:06 AM, Stephen Frost  wrote:
> We did make the WAL checksum routines a lot
> faster with 9.6, as I recall, so perhaps there's been some change there
> too.

9.5, commit 5028f22f with Abhijit's and Heikki's work on CRC-32 computations.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch to implement pg_current_logfile() function

2017-01-19 Thread Michael Paquier
On Fri, Jan 20, 2017 at 12:08 AM, Karl O. Pinc  wrote:
> Is this an argument for having the current_logfiles always exist
> and be empty when there is no in-filesystem logfile?  It always felt
> to me that the code would be simpler that way.

Well, you'll need to do something in any case if the logging_collector
is found disabled and the syslogger process is restarted. So just
removing it looked cleaner to me. I am not strongly attached to one
way of doing or the other though.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] SEGFAULT in HEAD with replication

2017-01-19 Thread Tom Lane
I wrote:
> Hmm ... that line was last touched by ab1f0c822, so I'm betting that
> I broke it somehow, but I'm not sure how.
> It looks like S_3 might have been parsed from a totally empty source
> string?  But if that's the trigger, I'd think it'd be entirely trivial
> to reproduce.

Oh, duh: the reason it's not trivial to reproduce is you have to try
to bind an empty prepared statement *in an already-aborted transaction*.

Will push a fix in a bit.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] SEGFAULT in HEAD with replication

2017-01-19 Thread Jorge Solórzano
Yes, in fact it's a totally empty string:
https://github.com/pgjdbc/pgjdbc/blob/master/pgjdbc/src/main/java/org/postgresql/jdbc/PgConnection.java#L1317

Jorge Solórzano
me.jorsol.com

On Thu, Jan 19, 2017 at 3:57 PM, Tom Lane  wrote:

> =?UTF-8?Q?Jorge_Sol=C3=B3rzano?=  writes:
> > After many tries I get this:
> > #0  0x00836950 in exec_bind_message
> (input_message=0x7ffce81048a0)
> > at postgres.c:1562
>
> Hmm ... that line was last touched by ab1f0c822, so I'm betting that
> I broke it somehow, but I'm not sure how.
>
> It looks like S_3 might have been parsed from a totally empty source
> string?  But if that's the trigger, I'd think it'd be entirely trivial
> to reproduce.
>
> regards, tom lane
>


Re: [HACKERS] Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)

2017-01-19 Thread Tom Lane
Andres Freund  writes:
> Maybe we ought to remove the paranoia bit about retset
> though - it's not paranoia if it has an effect.

Exactly, and I already did that in my version.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] delta relations in AFTER triggers

2017-01-19 Thread Nico Williams
On Sat, Dec 17, 2016 at 08:15:49PM -0600, Kevin Grittner wrote:
> On Sun, Dec 4, 2016 at 11:35 PM, Haribabu Kommi
>  wrote:
> > Moved to next CF with "waiting on author" status.
> 
> [...]

I hope what I've done about delta relations will be mostly irrelevant
given your patch (which I've not looked at in detail), but just FYI,
I've built an alternate, all-SQL-coded materialized view system that
captures deltas between refreshes and deltas from direct DMLs of the
materialized view:

https://github.com/twosigma/postgresql-contrib/blob/master/pseudo_mat_views.sql

There are some good ideas there, IMO, even if that implementation were
useless because of your patch.

Incidentally, it's really nice that PG has some "higher order" SQL
features that make this sort of thing easier.  In particular, here, row
values and record types, and being able to refer to a table as a column
of the table's record type.

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Causal reads take II

2017-01-19 Thread Thomas Munro
On Fri, Jan 20, 2017 at 3:01 AM, Ants Aasma  wrote:
> Yes there is a race even with all transactions having the same
> synchronization level. But nobody will mind if we some day fix that
> race. :)

We really should fix that!

> With different synchronization levels it is much trickier to
> fix as either async commits must wait behind sync commits before
> becoming visible, return without becoming visible or visibility order
> must differ from commit record LSN order. The first makes the async
> commit feature useless, second seems a no-go for semantic reasons,
> third requires extra information sent to standby's so they know the
> actual commit order.

Thought experiment:

1.  Log commit and make visible atomically (so DO and REDO agree on
visibility order).
2.  Introduce flag 'not yet visible' to commit record for sync rep commits.
3.  Introduce a new log record 'make all invisible commits up to LSN X
visible', which is inserted when enough sync rep standbys reply.  Log
this + make visible on primary atomically (again, so DO and REDO agree
on visibility order).
4.  Teach GetSnapshotData to deal with this using .

Now standby and primary agree on visibility order of async and sync
transactions, and no standby will allow you to see transactions that
the primary doesn't yet consider to be durable (ie flushed on a quorum
of standbys etc).  But... sync rep has to flush xlog twice on primary,
and standby has to wait to make things visible, and remote_apply would
either need to be changed or supplemented with a new level
remote_apply_and_visible, and it's not obvious how to actually do
atomic visibility + logging (I heard ProcArrayLock is kinda hot...).
Hmm.  Doesn't sound too popular...

>>> In CSN based snapshot
>>> discussions we came to the conclusion that to make standby visibility
>>> order match master while still allowing for async transactions to
>>> become visible before they are durable we need to make the commit
>>> sequence a vector clock and transmit extra visibility ordering
>>> information to standby's. Having one more level of delay between wal
>>> logging of commit and making it visible would make the problem even
>>> worse.
>>
>> I'd like to read that... could you please point me at the right bit of
>> that discussion?
>
> Some of that discussion was face to face at pgconf.eu, some of it is
> here: 
> https://www.postgresql.org/message-id/CA+CSw_vbt=cwluogr7gxdpnho_y4cz7x97+o_bh-rfo7ken...@mail.gmail.com
>
> Let me know if you have any questions.

Thanks!  That may take me some time...

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Index Scans

2017-01-19 Thread Robert Haas
On Wed, Jan 18, 2017 at 8:03 AM, Amit Kapila  wrote:
>> Why do we need separate AM support for index_parallelrescan()?  Why
>> can't index_rescan() cover that case?
>
> The reason is that sometime index_rescan is called when we have to
> just update runtime scankeys in index and we don't want to reset
> parallel scan for that.

Why not?  I see the code, but the comments don't seem to offer any
justification for that.  And it seems wrong to me.  If the scan keys
change, surely that only makes sense if we restart the scan.  You
can't just blindly continue the same scan if the keys have changed,
can you?

I think the reason that this isn't breaking for you is that it's
difficult or impossible to get a parallel index scan someplace where
the keys would change at runtime.  Normally, the parallel scan is on
the driving relation, and so there are no runtime keys.  We currently
have no way for a parallel scan to occur on the inner side of a nested
loop unless there's an intervening Gather node - and in that case the
keys can't change without relaunching the workers.  It's hard to see
how it could work otherwise.  For example, suppose you had something
like this:

Gather
-> Nested Loop
  -> Parallel Seq Scan on a
  -> Hash Join
-> Seq Scan on b
-> Parallel Shared Hash
  -> Parallel Index Scan on c
  Index Cond: c.x = a.x

Well, the problem here is that there's nothing to ensure that various
workers who are cooperating to build the hash table all have the same
value for a.x, nor is there any thing to ensure that they'll all get
done with the shared hash table at the same time.  So this is just
chaos.  I think we have to disallow this kind of plan as nonsensical.
Given that, I'm not sure a reset of the runtime keys can ever really
happen.  Have you investigated this?

I extracted the generic portions of this infrastructure (i.e. not the
btree-specific stuff) and spent some time working on it today.  The
big thing I fixed was the documentation, which you added in a fairly
illogical part of the file.  You had all of the new functions for
supporting parallel scans in a section that explicitly says it's for
mandatory functions, and which precedes the section on regular
non-parallel scans.  I moved the documentation for all three new
methods to the same place, added some explanation of parallel scans in
general, and rewrote the descriptions for the individual functions to
be more clear.  Also, in indexam.c, I adjusted things to use
RELATION_CHECKS in a couple of places, did some work on comments and
coding style, and fixed a place that should have used the new
OffsetToPointer macro but instead hand-rolled the thing with the casts
backwards.  Adding an integer to a value of type "void *" does not
work on all compilers.  The patch I ended up with is attached.
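
For anyone following along, the portability point is the usual one about
void-pointer arithmetic; a minimal sketch (not taken from the patch, helper
name invented here):

```c
#include <stddef.h>

/*
 * Arithmetic on "void *" is only a compiler extension, so portable code
 * casts to "char *" before adding a byte offset -- which is what a macro
 * like OffsetToPointer() wraps up.
 */
static void *
offset_to_pointer(void *base, size_t offset)
{
	return (void *) ((char *) base + offset);	/* portable */
	/* return base + offset;  -- compiles only where void * arithmetic
	 * is allowed as an extension */
}
```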

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


parallel-generic-index-scan.patch
Description: invalid/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)

2017-01-19 Thread Andres Freund
On 2017-01-19 13:06:20 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2017-01-18 16:56:46 -0500, Tom Lane wrote:
> >> Andres Freund  writes:
> >> I have not actually looked at 0003 at all yet.  So yeah, please post
> >> for review after you're done rebasing.
>
> > Here's a rebased and lightly massaged version.
>
> I've read through this and made some minor improvements, mostly additional
> comment cleanup.

Thanks!


> One thing I wanted to ask about:
>
> @@ -4303,7 +4303,7 @@ inline_function(Oid funcid, Oid result_type, Oid 
> result_collid,
>
>  /*
>   * Forget it if the function is not SQL-language or has other showstopper
> - * properties.  (The nargs check is just paranoia.)
> + * properties.  (The nargs and retset checks are just paranoia.)
>   */
>  if (funcform->prolang != SQLlanguageId ||
>  funcform->prosecdef ||
>
> I thought this change was simply wrong, and removed it;

Hm. I made that change a while ago.  It might have been a holdover from
the old approach, where it'd indeed have been impossible to see any
tSRFs here.  Or it might have been because we check
querytree->hasTargetSRFs below (which should prevent inlining a function
that actually returns multiple rows).  I agree it's better to leave the
check there. Maybe we ought to remove the paranoia bit about retset
though - it's not paranoia if it has an effect.


> Other than that possible point, I think the attached is committable.

Will do so in a bit, after a s/and retset checks are/check is/. And then
fix that big-endian ordering issue.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] SEGFAULT in HEAD with replication

2017-01-19 Thread Tom Lane
=?UTF-8?Q?Jorge_Sol=C3=B3rzano?=  writes:
> After many tries I get this:
> #0  0x00836950 in exec_bind_message (input_message=0x7ffce81048a0)
> at postgres.c:1562

Hmm ... that line was last touched by ab1f0c822, so I'm betting that
I broke it somehow, but I'm not sure how.

It looks like S_3 might have been parsed from a totally empty source
string?  But if that's the trigger, I'd think it'd be entirely trivial
to reproduce.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improving RLS planning

2017-01-19 Thread Tom Lane
Andreas Seltenreich  writes:
> sqlsmith doesn't seem to like 215b43cdc:

> select 1 from information_schema.usage_privileges
> where information_schema._pg_keysequal(
>(select null::smallint[]),
>'{16,25,23}');

> -- TRAP: FailedAssertion("!(!and_clause((Node *) clause))", File: 
> "restrictinfo.c", Line: 81)

Thanks, I'll take a look.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Fix minor race in commit_ts SLRU truncation vs lookups

2017-01-19 Thread Peter Eisentraut
On 1/19/17 10:06 AM, Alvaro Herrera wrote:
>> Also, I wonder whether we should not in vacuum.c change the order of the
>> calls of SetTransactionIdLimit() and SetMultiXactIdLimit() as well, just
>> to keep everything consistent.
> 
> I am wary of doing that.  The current coding is well battle-tested by
> now, but doing things in the opposite order, not at all.  Pending some
> testing to show that there is no problem with a change, I would leave
> things as they are.

I appreciate this hesitation.

The issue the patch under discussion is addressing is that we have a
user-space interface to read information about commit timestamps.  So if
we truncate away old information before we update the mark where the
good information starts, then we get the race.  We don't have a
user-space interface reading, say, the clog, but it doesn't seem
implausible for that to exist at some point.  How would this code have
to be structured then?

> What I fear is: what
> happens if a concurrent checkpoint reads the values between the two
> operations, and a crash occurs?  I think that the checkpoint might save
> the updated values, so after crash recovery the truncate would not be
> executed, possibly leaving files around.  Leaving files around might be
> dangerous for multixacts at least (it's probably harmless for xids).

Why is leaving files around dangerous?  If this is a problem, why is it
not a problem for commit timestamps?  I don't understand why these
different SLRU uses are different.

Yeah, we could go ahead with this patch as is and it might be fine, but
I feel like we don't understand the broader issue completely yet.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] SEGFAULT in HEAD with replication

2017-01-19 Thread Jorge Solórzano
After many tries I get this:

#0  0x00836950 in exec_bind_message (input_message=0x7ffce81048a0)
at postgres.c:1562
#1  0x0083a23d in PostgresMain (argc=1, argv=0x2bd93d8,
dbname=0x2bd9130 "test", username=0x2bd9110 "test") at postgres.c:4113
#2  0x007aacd4 in BackendRun (port=0x2bd02d0) at postmaster.c:4300
#3  0x007aa3fe in BackendStartup (port=0x2bd02d0) at
postmaster.c:3972
#4  0x007a6a6c in ServerLoop () at postmaster.c:1712
#5  0x007a6058 in PostmasterMain (argc=3, argv=0x2bac970) at
postmaster.c:1320
#6  0x006ee046 in main (argc=3, argv=0x2bac970) at main.c:228


*


#0  0x00836950 in exec_bind_message (input_message=0x7ffce81048a0)
> at postgres.c:1562
> portal_name = 0x2c248c8 ""
> stmt_name = 0x2c248c9 "S_3"
> numPFormats = 0
> pformats = 0x0
> numParams = 0
> numRFormats = 0
> rformats = 0x0
> psrc = 0x2c698a8
> cplan = 0x0
> portal = 0x2bacc50
> query_string = 0x0
> saved_stmt_name = 0x0
> params = 0x0
> oldContext = 0x7ffce81047a0
> save_log_statement_stats = 0 '\000'
> snapshot_set = 0 '\000'
> msec_str =
> "\000H\020\350\374\177\000\000\345\b\224\000\000\000\000\000\020H\020\350\374\177\000\000\277ӿ\373w\351\001"
> __func__ = "exec_bind_message"
> #1  0x0083a23d in PostgresMain (argc=1, argv=0x2bd93d8,
> dbname=0x2bd9130 "test", username=0x2bd9110 "test") at postgres.c:4113
> firstchar = 66
> input_message = {data = 0x2c248c8 "", len = 11, maxlen = 1024,
> cursor = 9, long_ok = 0 '\000'}
> local_sigjmp_buf = {{__jmpbuf = {0, -8329243729337072925, 4625744,
> 140724201868624, 0, 0, -8329243729320295709, 8327513958265701091},
> __mask_was_saved = 1, __saved_mask = {__val = {0, 0,
> 15404064, 45969600, 51539607562, 140724201867680,
> 10064130, 0, 11951218, 11944554, 51539611844, 72198318239795616, 10292727,
> 15403032, 15402848, 15402848
> send_ready_for_query = 0 '\000'
> disable_idle_in_transaction_timeout = 0 '\000'
> __func__ = "PostgresMain"
> #2  0x007aacd4 in BackendRun (port=0x2bd02d0) at postmaster.c:4300
> av = 0x2bd93d8
> maxac = 2
> ac = 1
> secs = 538176510
> usecs = 721824
> i = 1
> __func__ = "BackendRun"
> #3  0x007aa3fe in BackendStartup (port=0x2bd02d0) at
> postmaster.c:3972
> bn = 0x2bd0490
> pid = 0
> __func__ = "BackendStartup"
> #4  0x007a6a6c in ServerLoop () at postmaster.c:1712
> port = 0x2bd02d0
> i = 0
> rmask = {fds_bits = {8, 0 }}
> selres = 1
> now = 1484861310
> readmask = {fds_bits = {56, 0 }}
> nSockets = 6
> last_lockfile_recheck_time = 1484861307
> last_touch_time = 1484861007
> __func__ = "ServerLoop"
> #5  0x007a6058 in PostmasterMain (argc=3, argv=0x2bac970) at
> postmaster.c:1320
> opt = -1
> status = 0
> userDoption = 0x2bce450 "/usr/local/pgsql/data"
> listen_addr_saved = 1 '\001'
> i = 64
> output_config_variable = 0x0
> __func__ = "PostmasterMain"
> #6  0x006ee046 in main (argc=3, argv=0x2bac970) at main.c:228
>

I hope this helps.

Jorge Solórzano
me.jorsol.com

On Thu, Jan 19, 2017 at 11:13 AM, Robert Haas  wrote:

> On Thu, Jan 19, 2017 at 12:09 PM, Dave Cramer 
> wrote:
> > I would have expected more, but this is what I have
> >
> >  bt full
> > #0  InitPredicateLocks () at predicate.c:1250
> > i = 
> > info = {num_partitions = 1, ssize = 140731424825288, dsize = 1,
> >   max_dsize = 0, ffactor = 140731424836952, keysize =
> > 140356326474085,
> >   entrysize = 140728909791233, hash = 0x7ffe96960d58,
> >   match = 0x16da2d1, keycopy = 0x7ffe96960d58, alloc = 0x1703af0,
> >   hcxt = 0x16da2d0, hctl = 0x0}
> > max_table_size = 117899280
> > requestSize = 
> > found = 0 '\000'
>
> I would say that's not a valid stack trace.  There hasn't been a
> change made to that file since October of last year, and the crash is
> apparently recent; also, line 1250 in that file doesn't look like
> something that can crash.  I would guess that you're using an
> executable which doesn't match the core dump, or perhaps that you
> don't have complete debug symbols.  Building with -O0 might help.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [HACKERS] [PATCH] Fix minor race in commit_ts SLRU truncation vs lookups

2017-01-19 Thread Alvaro Herrera
Peter Eisentraut wrote:
> On 12/29/16 4:28 AM, Craig Ringer wrote:
> > On 29 December 2016 at 16:51, Craig Ringer  wrote:
> >> On 28 December 2016 at 22:16, Petr Jelinek  
> >> wrote:
> >>
> >>> About the patch, it looks good to me for master with the minor exception
> >>> that:
>  + appendStringInfo(buf, "pageno %d, xid %u",
>  + trunc.pageno, trunc.oldestXid);
> >>>
> >>> This should probably say oldestXid instead of xid in the text description.
> >>
> >> Agreed.
> > 
> > Slightly amended attached.
> 
> I've looked over this.  It looks correct to me in principle.

Thanks, pushed.  I added a comment in vacuum.c, as well as removing the
memcpy()s of the xlog record, which were unnecessary now that there's an
intermediate struct.  WAL version bumped.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improving RLS planning

2017-01-19 Thread Andreas Seltenreich
Tom Lane writes:

> Thanks for reviewing --- I'll do something with that test case and
> push it.

sqlsmith doesn't seem to like 215b43cdc:

select 1 from information_schema.usage_privileges
where information_schema._pg_keysequal(
   (select null::smallint[]),
   '{16,25,23}');

-- TRAP: FailedAssertion("!(!and_clause((Node *) clause))", File: 
"restrictinfo.c", Line: 81)

regards,
Andreas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] delta relations in AFTER triggers

2017-01-19 Thread Kevin Grittner
Attached is v9 which fixes bitrot from v8.  No other changes.

Still needs review.

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


transition-v9.diff.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SERIALIZABLE on standby servers

2017-01-19 Thread Thomas Munro
On Wed, Nov 16, 2016 at 9:26 AM, Thomas Munro
 wrote:
> On Tue, Nov 8, 2016 at 5:56 PM, Thomas Munro
>  wrote:
> [..]  Another solution
> could be to have recovery on the standby detect tokens (CSNs
> incremented by PreCommit_CheckForSerializationFailure) arriving out of
> order, but I don't know what exactly it should do about that when it
> is detected: you shouldn't respect an out-of-order claim of safety,
> but then what should you wait for?  Perhaps if the last replayed
> commit record before that was marked SNAPSHOT_SAFE then it's OK to
> leave it that way, and if it was marked SNAPSHOT_SAFETY_UNKNOWN then
> you have to wait for that one to be resolved by a follow-up snapshot
> safety message and then rince-and-repeat (take a new snapshot etc).  I
> think that might work, but it seems strange to allow random races on
> the primary to create extra delays on the standby.  Perhaps there is
> some much simpler way to do all this that I'm missing.
>
> Another detail is that standbys that start up from a checkpoint and
> don't see any SSI transactions commit don't yet have any snapshot
> safety information, but defaulting to assuming that this point is safe
> doesn't seem right, so I suspect it needs to be in checkpoints.
>
> Attached is a tidied up version which doesn't try to address the above
> problems yet.  When time permits I'll come back to this.

I haven't looked at this again yet but a nearby thread reminded me of
another problem with this which I wanted to restate explicitly here in
the context of this patch.  Even without replication in the picture,
there is a race to reach ProcArrayEndTransaction() after
RecordTransactionCommit() runs, which means that the DO history
(normal primary server) and REDO history (recovery) don't always agree
on the order that transactions become visible.  With this patch, this
kind of diverging DO and REDO could allow undetectable read only
serialization anomalies.  I think that ProcArrayEndTransaction() and
RecordTransactionCommit() need to be made atomic in the simple case so
that DO and REDO agree.  Synchronous replication can make that more
likely and it seems like some other approach is probably needed to
delay visibility of not-yet-durable transactions while keeping the
order that transactions become visible the same on all nodes.

Aside from the problems I mentioned in my earlier message (race
between snapshot safety decision and logging order, and lack of
checkpointing of snapshot safety information), it seems like the two
DO vs REDO problems (race to ProcArrayEndTransaction, and deliberately
delayed visibility in syncrep) also need to be addressed before
SERIALIZABLE DEFERRABLE on standbys could make a water tight
guarantee.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench - allow backslash continuations in \set expressions

2017-01-19 Thread Tom Lane
Rafia Sabih  writes:
> Okay, I am marking it as 'Ready for committer' with the following note
> to committer,
> I am getting whitespace errors in v3 of patch, which I corrected in
> v4, however, Fabien is of the opinion that v3 is clean and is showing
> whitespace errors because of downloader, etc. issues in my setup.

FWIW, what I see is that v3 saves out with Windows-style newlines (\r\n)
while v4 saves out with Unix newlines.  It wouldn't really matter, because
"patch" converts Windows newlines (at least mine does), but Unix newlines
are our project style.

As for the substance of the patch ... the fix for continuations in
the INITIAL state is not this simple.  You have rules for

{nonspace}+
{space}+
{newline}
{continuation}

Remember that flex always takes the rule that produces the longest match
starting at the current point.  {space}+ and {newline} don't conflict with
continuations, but {nonspace}+ does: suppose that the next four characters
are "a" "b" "\" newline.  flex will decide that the token is "ab\", rather
than taking the token as being "ab" and letting the "\" wait till next
time when it can be parsed as {continuation}.

In other words, it works to write backslash-newline immediately after a
token in the EXPR state (mainly because backslash isn't a legal part of
any token EXPR state currently allows), but not to write it immediately
after a token in INITIAL state.   I think this is surprising and
inconsistent.

Probably the easiest fix is to add a rule that explicitly matches this
situation:

{nonspace}+{continuation}  { ... throw back 2 chars and return the rest ... }

The "throw back" part is easily done with "yyless(yyleng - 2)"; compare
some rules in the core lexer.  I have not actually tested this,
but I think it will work, because when it applies it would always be
the longest possible match.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] SEGFAULT in HEAD with replication

2017-01-19 Thread Dave Cramer
Not sure you can get the exec. We are working on producing the BT from
Travis-CI or I will build and run the test locally and get the trace

Dave Cramer

On 19 January 2017 at 15:31, Craig Ringer 
wrote:

>
>
> On 20 Jan. 2017 04:13, "Robert Haas"  wrote:
>
> On Thu, Jan 19, 2017 at 12:09 PM, Dave Cramer 
> wrote:
> > I would have expected more, but this is what I have
> >
> >  bt full
> > #0  InitPredicateLocks () at predicate.c:1250
> > i = 
> > info = {num_partitions = 1, ssize = 140731424825288, dsize = 1,
> >   max_dsize = 0, ffactor = 140731424836952, keysize =
> > 140356326474085,
> >   entrysize = 140728909791233, hash = 0x7ffe96960d58,
> >   match = 0x16da2d1, keycopy = 0x7ffe96960d58, alloc = 0x1703af0,
> >   hcxt = 0x16da2d0, hctl = 0x0}
> > max_table_size = 117899280
> > requestSize = 
> > found = 0 '\000'
>
> I would say that's not a valid stack trace.  There hasn't been a
> change made to that file since October of last year, and the crash is
> apparently recent; also, line 1250 in that file doesn't look like
> something that can crash.  I would guess that you're using an
> executable which doesn't match the core dump, or perhaps that you
> don't have complete debug symbols.
>
>
> You really need the same compiler flags, configure opts and preferably
> much the same compiler. Similar or same C library etc.
>
> Can't we get the executables from Travis CI or from whoever produced the
> core? Or get them to obtain a bt ?
>


Re: [HACKERS] [JDBC] SEGFAULT in HEAD with replication

2017-01-19 Thread Craig Ringer
On 20 Jan. 2017 04:13, "Robert Haas"  wrote:

On Thu, Jan 19, 2017 at 12:09 PM, Dave Cramer  wrote:
> I would have expected more, but this is what I have
>
>  bt full
> #0  InitPredicateLocks () at predicate.c:1250
> i = 
> info = {num_partitions = 1, ssize = 140731424825288, dsize = 1,
>   max_dsize = 0, ffactor = 140731424836952, keysize =
> 140356326474085,
>   entrysize = 140728909791233, hash = 0x7ffe96960d58,
>   match = 0x16da2d1, keycopy = 0x7ffe96960d58, alloc = 0x1703af0,
>   hcxt = 0x16da2d0, hctl = 0x0}
> max_table_size = 117899280
> requestSize = 
> found = 0 '\000'

I would say that's not a valid stack trace.  There hasn't been a
change made to that file since October of last year, and the crash is
apparently recent; also, line 1250 in that file doesn't look like
something that can crash.  I would guess that you're using an
executable which doesn't match the core dump, or perhaps that you
don't have complete debug symbols.


You really need the same compiler flags, configure opts and preferably much
the same compiler. Similar or same C library etc.

Can't we get the executables from Travis CI or from whoever produced the
core? Or get them to obtain a bt ?


[HACKERS] [PATCH] Change missleading comment for tm_mon field of pg_tm structure

2017-01-19 Thread Dmitry Fedin
According to its usages, the tm_mon (month) field has origin 1, not 0. This
differs from the C-library version of struct tm, which really does use
origin 0.
For example:

```C
/* from src/backend/utils/adt/datetime.c */
if (DateOrder == DATEORDER_DMY)
    sprintf(str, "%02d/%02d", tm->tm_mday, tm->tm_mon);
else
    sprintf(str, "%02d/%02d", tm->tm_mon, tm->tm_mday);
```
---
 src/include/pgtime.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/include/pgtime.h b/src/include/pgtime.h
index dcd0730..52b54b9 100644
--- a/src/include/pgtime.h
+++ b/src/include/pgtime.h
@@ -28,7 +28,7 @@ struct pg_tm
  int tm_min;
  int tm_hour;
  int tm_mday;
- int tm_mon; /* origin 0, not 1 */
+ int tm_mon; /* origin 1, not 0! */
  int tm_year; /* relative to 1900 */
  int tm_wday;
  int tm_yday;


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Index Scans

2017-01-19 Thread Robert Haas
On Thu, Jan 19, 2017 at 4:26 AM, Amit Kapila  wrote:
> Fixed.

I think that the whole idea of GatherSupportsBackwardScan is wrong.
Gather doesn't support a backwards scan under any circumstances,
whether the underlying node does or not.  You can read the tuples
once, in order, and you can't back up.  That's what supporting a
backward scan means: you can back up and then move forward again.
It's there to support cursor operations.

Also note this comment in ExecSupportsBackwardsScan, which seems just
as relevant to parallel index scans as anything else:

/*
 * Parallel-aware nodes return a subset of the tuples in each worker, and
 * in general we can't expect to have enough bookkeeping state to know
 * which ones we returned in this worker as opposed to some other worker.
 */
if (node->parallel_aware)
return false;

If all of that were no issue, the considerations in
TargetListSupportsBackwardScan could be a problem, too.  But I think
there shouldn't be any issue having Gather just continue to return
false.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning - another take

2017-01-19 Thread Robert Haas
On Thu, Jan 19, 2017 at 12:15 AM, Amit Langote
 wrote:
> 0001-Fix-a-bug-of-insertion-into-an-internal-partition.patch
>
> Since implicit partition constraints are not inherited, an internal
> partition's constraint was not being enforced when targeted directly.
> So, include such constraint when setting up leaf partition result
> relations for tuple-routing.

Committed.

> 0002-Set-ecxt_scantuple-correctly-for-tuple-routing.patch
>
> In 2ac3ef7a01df859c62d0a02333b646d65eaec5ff, we changed things so that
> it's possible for a different TupleTableSlot to be used for partitioned
> tables at successively lower levels.  If we do end up changing the slot
> from the original, we must update ecxt_scantuple to point to the new one
> for partition key of the tuple to be computed correctly.
>
> Last posted here:
> https://www.postgresql.org/message-id/99fbfad6-b89b-9427-a6ca-197aad98c48e%40lab.ntt.co.jp

Why does FormPartitionKeyDatum need this?  Could that be documented in
a comment in here someplace, perhaps the header comment to
FormPartitionKeyDatum?

> 0003-Fix-RETURNING-to-work-correctly-after-tuple-routing.patch
>
> In ExecInsert(), do not switch back to the root partitioned table
> ResultRelInfo until after we finish ExecProcessReturning(), so that
> RETURNING projection is done using the partition's descriptor.  For
> the projection to work correctly, we must initialize the same for
> each leaf partition during ModifyTableState initialization.

Committed.

> 0004-Fix-some-issues-with-views-and-partitioned-tables.patch
>
> Automatically updatable views failed to handle partitioned tables.
> Once that's fixed, WITH CHECK OPTIONS wouldn't work correctly without
> the WCO expressions having been suitably converted for each partition
> (think applying map_partition_varattnos to Vars in the WCO expressions
> just like with partition constraint expressions).

The changes to execMain.c contain a hunk which has essentially the
same code twice.  That looks like a mistake.  Also, the patch doesn't
compile because convert_tuples_by_name() takes 3 arguments, not 4.

> 0005-Fix-some-wrong-thinking-in-check_new_partition_bound.patch
>
> Because a given range bound in the PartitionBoundInfo.datums array
> is sometimes a range lower bound and upper bound at other times, we
> must be careful when assuming which, especially when interpreting
> the result of partition_bound_bsearch which returns the index of the
> greatest bound that is less than or equal to probe.  Due to an error
> in thinking about the same, the relevant code in
> check_new_partition_bound() caused invalid partition (index = -1)
> to be chosen as the partition being overlapped.
>
> Last posted here:
> https://www.postgresql.org/message-id/603acb8b-5dec-31e8-29b0-609a68aac50f%40lab.ntt.co.jp

 }
+/*
+ * If equal has been set to true or if there is no "gap"
+ * between the bound at off1 and that at off1 + 1, the new
+ * partition will overlap some partition.  In the former
+ * case, the new lower bound is found to be equal to the
+ * bound at off1, which could only ever be true if the
+ * latter is the lower bound of some partition.  It's
+ * clear in such a case that the new partition overlaps
+ * that partition, whose index we get using its upper
+ * bound (that is, using the bound at off1 + 1).
+ */
 else

Stylistically, we usually avoid this, or at least I do.  The comment
should go inside the "else" block.  But it looks OK apart from that,
so committed with a little rephrasing and reformatting of the comment.

> 0006-Avoid-tuple-coversion-in-common-partitioning-cases.patch
>
> Currently, the tuple conversion is performed after a tuple is routed,
> even if the attributes of a target leaf partition map one-to-one with
> those of the root table, which is wasteful.  Avoid that by making
> convert_tuples_by_name() return a NULL map for such cases.

+Assert(!consider_typeid && indesc->tdhasoid == outdesc->tdhasoid);

I think you mean Assert(consider_typeid || indesc->tdhasoid ==
outdesc->tdhasoid);

But I wonder why we don't instead just change this function to
consider tdhasoid rather than tdtypeid.  I mean, if the only point of
comparing the type OIDs is to find out whether the table-has-OIDs
setting matches, we could instead test that directly and avoid needing
to pass an extra argument.  I wonder if there's some other reason this
code is there which is not documented in the comment...

> 0007-Avoid-code-duplication-in-map_partition_varattnos.patch
>
> Code to map attribute numbers in map_partition_varattnos() duplicates
> what convert_tuples_by_name_map() does.  Avoid that as pointed out by
> Álvaro Herrera.
>
> Last posted here:
> https://www.postgresql.org/message-id/9ce97382-54c8-deb3-9ee9-a

Re: [HACKERS] Too many autovacuum workers spawned during forced auto-vacuum

2017-01-19 Thread Alvaro Herrera
Robert Haas wrote:

> After sleeping on this, I'm inclined to go with Amit's fix for now.
> It seems less likely to break anything in the back-branches than any
> other option I can think up.

Yeah, no objections here.

Note typo "imporatant" in the comment.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Fix minor race in commit_ts SLRU truncation vs lookups

2017-01-19 Thread Alvaro Herrera
Robert Haas wrote:
> On Thu, Jan 19, 2017 at 10:06 AM, Alvaro Herrera
>  wrote:
> > Peter Eisentraut wrote:
> >
> >> Also, I wonder whether we should not in vacuum.c change the order of the
> >> calls of SetTransactionIdLimit() and SetMultiXactIdLimit() as well, just
> >> to keep everything consistent.
> >
> > I am wary of doing that.  The current coding is well battle-tested by
> > now, but doing things in the opposite order, not at all.  Pending some
> > testing to show that there is no problem with a change, I would leave
> > things as they are.  Probably said testing is too onerous for the
> > benefit (which is just a little consistency).  What I fear is: what
> > happens if a concurrent checkpoint reads the values between the two
> > operations, and a crash occurs?  I think that the checkpoint might save
> > the updated values, so after crash recovery the truncate would not be
> > executed, possibly leaving files around.  Leaving files around might be
> > dangerous for multixacts at least (it's probably harmless for xids).
> 
> Don't both SLRUs eventually wrap?  If so, leaving file around seems
> dangerous for either in equal measure.

Well, multixact uses the whole 2^32 space as valid, while xid only uses
half of it.  I think you would have to crash on every checkpoint just
between those two points for two billion xacts for there to be a problem
for xids.  Anyway this just reinforces my argument that we shouldn't
move those calls -- moving only the commit_ts one is good.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch to implement pg_current_logfile() function

2017-01-19 Thread Karl O. Pinc
On Thu, 19 Jan 2017 12:12:18 -0300
Alvaro Herrera  wrote:

> Karl O. Pinc wrote:
> > On Wed, 18 Jan 2017 19:27:40 -0300
> > Alvaro Herrera  wrote:  
> 
> > > I thought this part was odd -- I mean, why is SysLogger_Start()
> > > being called if the collector is not enabled?  Turns out we do it
> > > and return early if not enabled.  But not in all cases -- there
> > > is one callsite in postmaster.c that avoids the call if the
> > > collector is disabled. That needs to be changed if we want this
> > > to work reliably.  
> > 
> > Is this an argument for having the current_logfiles always exist
> > and be empty when there is no in-filesystem logfile?  It always felt
> > to me that the code would be simpler that way.  
> 
> I don't know.  I am just saying that you need to patch postmaster.c
> line 1726 to remove the second arm of the &&.

Gilles,

I'm not available just now.  Can you do this or enlist Michael?

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical Replication WIP - FailedAssertion, File: "array_typanalyze.c", Line: 340

2017-01-19 Thread Erik Rijkers

On 2017-01-19 19:12, Petr Jelinek wrote:

On 19/01/17 18:44, Erik Rijkers wrote:


Could probably be whittled down to something shorter but I hope it's
still easily reproduced.



Just analyze on the pg_subscription is enough.



heh. Ah well, I did find it :)


Can you give the current patch set? I am failing to get a compilable 
set.


In the following order they apply, but then fail during compile.

0001-Add-PUBLICATION-catalogs-and-DDL-v18.patch
0002-Add-SUBSCRIPTION-catalog-and-DDL-v18.patch
0003-Define-logical-replication-protocol-and-output-plugi-v18.patch
0004-Add-logical-replication-workers-v18fixed.patch
0006-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION.patch
0001-Logical-replication-support-for-initial-data-copy-v3.patch
pg_subscription-analyze-fix.diff

The compile fails with:

In file included from ../../../../src/include/postgres.h:47:0,
 from worker.c:27:
worker.c: In function ‘create_estate_for_relation’:
../../../../src/include/c.h:203:14: warning: passing argument 4 of 
‘InitResultRelInfo’ makes pointer from integer without a cast 
[-Wint-conversion]

 #define true ((bool) 1)
  ^
worker.c:187:53: note: in expansion of macro ‘true’
  InitResultRelInfo(resultRelInfo, rel->localrel, 1, true, NULL, 0);
 ^~~~
In file included from ../../../../src/include/funcapi.h:21:0,
 from worker.c:31:
../../../../src/include/executor/executor.h:189:13: note: expected 
‘Relation {aka struct RelationData *}’ but argument is of type ‘char’

 extern void InitResultRelInfo(ResultRelInfo *resultRelInfo,
 ^
worker.c:187:59: warning: passing argument 5 of ‘InitResultRelInfo’ 
makes integer from pointer without a cast [-Wint-conversion]

  InitResultRelInfo(resultRelInfo, rel->localrel, 1, true, NULL, 0);
   ^~~~
In file included from ../../../../src/include/funcapi.h:21:0,
 from worker.c:31:
../../../../src/include/executor/executor.h:189:13: note: expected ‘int’ 
but argument is of type ‘void *’

 extern void InitResultRelInfo(ResultRelInfo *resultRelInfo,
 ^
worker.c:187:2: error: too many arguments to function 
‘InitResultRelInfo’

  InitResultRelInfo(resultRelInfo, rel->localrel, 1, true, NULL, 0);
  ^
In file included from ../../../../src/include/funcapi.h:21:0,
 from worker.c:31:
../../../../src/include/executor/executor.h:189:13: note: declared here
 extern void InitResultRelInfo(ResultRelInfo *resultRelInfo,
 ^
make[4]: *** [worker.o] Error 1
make[4]: *** Waiting for unfinished jobs
make[3]: *** [logical-recursive] Error 2
make[2]: *** [replication-recursive] Error 2
make[2]: *** Waiting for unfinished jobs
make[1]: *** [all-backend-recurse] Error 2
make: *** [all-src-recurse] Error 2




but perhaps that patchset itself is incorrect, or the order in which I 
applied them.


Can you please put them in the right order?  (I tried already a few...)


thanks,


Erik Rijkers



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical Replication WIP - FailedAssertion, File: "array_typanalyze.c", Line: 340

2017-01-19 Thread Petr Jelinek
On 19/01/17 18:44, Erik Rijkers wrote:
> 
> Could probably be whittled down to something shorter but I hope it's
> still easily reproduced.
> 

Just analyze on the pg_subscription is enough. Looks like it's the
name[] type, when I change it to text[] like in the attached patch it
works fine for me.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/include/catalog/pg_subscription.h b/src/include/catalog/pg_subscription.h
index ccb8880..0ad7b0e 100644
--- a/src/include/catalog/pg_subscription.h
+++ b/src/include/catalog/pg_subscription.h
@@ -41,7 +41,7 @@ CATALOG(pg_subscription,6100) BKI_SHARED_RELATION BKI_ROWTYPE_OID(6101) BKI_SCHE
 	text		subconninfo;	/* Connection string to the publisher */
 	NameData	subslotname;	/* Slot name on publisher */
 
-	name		subpublications[1];	/* List of publications subscribed to */
+	text		subpublications[1];	/* List of publications subscribed to */
 #endif
 } FormData_pg_subscription;
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)

2017-01-19 Thread Tom Lane
Andres Freund  writes:
> On 2017-01-18 16:56:46 -0500, Tom Lane wrote:
>> Andres Freund  writes:
>> I have not actually looked at 0003 at all yet.  So yeah, please post
>> for review after you're done rebasing.

> Here's a rebased and lightly massaged version.

I've read through this and made some minor improvements, mostly additional
comment cleanup.  One thing I wanted to ask about:

@@ -4303,7 +4303,7 @@ inline_function(Oid funcid, Oid result_type, Oid 
result_collid,
 
 /*
  * Forget it if the function is not SQL-language or has other showstopper
- * properties.  (The nargs check is just paranoia.)
+ * properties.  (The nargs and retset checks are just paranoia.)
  */
 if (funcform->prolang != SQLlanguageId ||
 funcform->prosecdef ||

I thought this change was simply wrong, and removed it; AFAIK it's
perfectly possible to get here for set-returning functions, since
the planner does expression simplification long before it worries
about splitting out SRFs.  Did you have a reason to think differently?

Other than that possible point, I think the attached is committable.

regards, tom lane



no-srfs-in-tlists-cleanup-2.patch.gz
Description: no-srfs-in-tlists-cleanup-2.patch.gz

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_background contrib module proposal

2017-01-19 Thread Andrey Borodin
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, failed

I’ll summarize here my thoughts as a reviewer on the current state of
pg_background:
1. The current version of the code [1] is fine, from my point of view. I have no
suggestions for improving it. There is no documentation, but the code is commented.
2. The patch depends on background sessions from the same commitfest.
3. More features could be added, but for v1 there are surely enough.
4. There is some controversy about where the implemented feature should live: in a
separate extension (as in this patch), in dblink, in some PL API, in FDW or somewhere
else. I think that a new extension is an appropriate place for the feature. But
I’m not certain.
Summarizing these points, the appropriate statuses for the patch are ‘Ready for
Committer’ or ‘Rejected’. Between these two I choose ‘Ready for Committer’; I
think the patch is committable (after background sessions).

Best regards, Andrey Borodin.

The new status of this patch is: Ready for Committer

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical Replication WIP - FailedAssertion, File: "array_typanalyze.c", Line: 340

2017-01-19 Thread Erik Rijkers

On 2017-01-19 01:02, Petr Jelinek wrote:

This causes the replica to crash:

#--
#!/bin/bash

# 2 instances on 6972 (master) and 6973 (replica)
# initially without publication or subscription

# clean logs
#echo > 
/var/data1/pg_stuff/pg_installations/pgsql.logical_replication/logfile.logical_replication
#echo > 
/var/data1/pg_stuff/pg_installations/pgsql.logical_replication2/logfile.logical_replication2


SLEEP=1
bail=0
pub_count=$( echo "select count(*) from pg_publication" | psql -qtAXp 
6972 )

if  [[ $pub_count -ne 0 ]]
then
  echo "pub_count -ne 0 - deleting pub1 & bailing out"
  echo "drop publication if exists pub1" | psql -Xp 6972
  bail=1
fi
sub_count=$( echo "select count(*) from pg_subscription" | psql -qtAXp 
6973 )

if [[ $sub_count -ne 0 ]]
 then
  echo "sub_count -ne 0 - deleting sub1 & bailing out"
  echo "drop subscription if exists sub1" | psql -Xp 6973
  bail=1
fi

if [[ $bail -eq 1 ]]
then
   exit -1
fi

echo "drop table if exists testt;" | psql -qXap 6972
echo "drop table if exists testt;" | psql -qXap 6973

echo "-- on master  (port 6972):
create table testt(id serial primary key, n integer, c text);
create publication pub1 for all tables; " | psql -qXap 6972

echo "-- on replica (port 6973):
create table testt(id serial primary key, n integer, c text);
create subscription sub1 connection 'port=6972' publication pub1 with 
(disabled);

alter  subscription sub1 enable; "| psql -qXap 6973

sleep $SLEEP

echo "table testt /*limit 3*/; select current_setting('port'), count(*) 
from testt;" | psql -qXp 6972
echo "table testt /*limit 3*/; select current_setting('port'), count(*) 
from testt;" | psql -qXp 6973


echo "-- now crash:
analyze pg_subscription" | psql -qXp 6973
#--



-- log of the replica:

2017-01-19 17:54:09.163 CET 224200 LOG:  starting logical replication 
worker for subscription "sub1"
2017-01-19 17:54:09.166 CET 21166 LOG:  logical replication apply for 
subscription sub1 started
2017-01-19 17:54:09.169 CET 21166 LOG:  starting logical replication 
worker for subscription "sub1"
2017-01-19 17:54:09.172 CET 21171 LOG:  logical replication sync for 
subscription sub1, table testt started
2017-01-19 17:54:09.190 CET 21171 LOG:  logical replication 
synchronization worker finished processing
TRAP: FailedAssertion("!(((array)->elemtype) == extra_data->type_id)", 
File: "array_typanalyze.c", Line: 340)
2017-01-19 17:54:20.110 CET 224190 LOG:  server process (PID 21183) was 
terminated by signal 6: Aborted
2017-01-19 17:54:20.110 CET 224190 DETAIL:  Failed process was running: 
autovacuum: ANALYZE pg_catalog.pg_subscription
2017-01-19 17:54:20.110 CET 224190 LOG:  terminating any other active 
server processes
2017-01-19 17:54:20.110 CET 224198 WARNING:  terminating connection 
because of crash of another server process
2017-01-19 17:54:20.110 CET 224198 DETAIL:  The postmaster has commanded 
this server process to roll back the current transaction and exit, 
because another server process exited abnormally and possibly corrupted 
shared memory.
2017-01-19 17:54:20.110 CET 224198 HINT:  In a moment you should be able 
to reconnect to the database and repeat your command.
2017-01-19 17:54:20.111 CET 224190 LOG:  all server processes 
terminated; reinitializing
2017-01-19 17:54:20.143 CET 21184 LOG:  database system was interrupted; 
last known up at 2017-01-19 17:38:48 CET
2017-01-19 17:54:20.179 CET 21184 LOG:  recovered replication state of 
node 1 to 0/2CEBF08
2017-01-19 17:54:20.179 CET 21184 LOG:  database system was not properly 
shut down; automatic recovery in progress

2017-01-19 17:54:20.181 CET 21184 LOG:  redo starts at 0/2513E88
2017-01-19 17:54:20.184 CET 21184 LOG:  invalid record length at 
0/2546980: wanted 24, got 0

2017-01-19 17:54:20.184 CET 21184 LOG:  redo done at 0/2546918
2017-01-19 17:54:20.184 CET 21184 LOG:  last completed transaction was 
at log time 2017-01-19 17:54:09.191697+01
2017-01-19 17:54:20.191 CET 21184 LOG:  MultiXact member wraparound 
protections are now enabled
2017-01-19 17:54:20.193 CET 224190 LOG:  database system is ready to 
accept connections

2017-01-19 17:54:20.193 CET 21188 LOG:  autovacuum launcher started
2017-01-19 17:54:20.194 CET 21190 LOG:  logical replication launcher 
started
2017-01-19 17:54:20.194 CET 21190 LOG:  starting logical replication 
worker for subscription "sub1"
2017-01-19 17:54:20.202 CET 21191 LOG:  logical replication apply for 
subscription sub1 started




Could probably be whittled down to something shorter but I hope it's 
still easily reproduced.



thanks,

Erik Rijkers





setup of the 2 instances:


# ./instances.sh
#!/bin/bash
port1=6972
port2=6973
project1=logical_replication
project2=logical_replication2
# pg_stuff_dir=$HOME/pg_stuff
  pg_stuff_dir=/var/data1/pg_stuff
PATH1=$pg_stuff_dir/pg_installations/pgsql.$project1/bin:$PATH
PATH2=$pg_stuff_dir/pg_installations/pgsql.$project2/bin:$PATH
server_dir1=$pg_stuff_dir/pg_installations/pgsql.$proj

Re: [HACKERS] [JDBC] SEGFAULT in HEAD with replication

2017-01-19 Thread Robert Haas
On Thu, Jan 19, 2017 at 12:09 PM, Dave Cramer  wrote:
> I would have expected more, but this is what I have
>
>  bt full
> #0  InitPredicateLocks () at predicate.c:1250
> i = 
> info = {num_partitions = 1, ssize = 140731424825288, dsize = 1,
>   max_dsize = 0, ffactor = 140731424836952, keysize =
> 140356326474085,
>   entrysize = 140728909791233, hash = 0x7ffe96960d58,
>   match = 0x16da2d1, keycopy = 0x7ffe96960d58, alloc = 0x1703af0,
>   hcxt = 0x16da2d0, hctl = 0x0}
> max_table_size = 117899280
> requestSize = 
> found = 0 '\000'

I would say that's not a valid stack trace.  There hasn't been a
change made to that file since October of last year, and the crash is
apparently recent; also, line 1250 in that file doesn't look like
something that can crash.  I would guess that you're using an
executable which doesn't match the core dump, or perhaps that you
don't have complete debug symbols.  Building with -O0 might help.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] SEGFAULT in HEAD with replication

2017-01-19 Thread Dave Cramer
I would have expected more, but this is what I have

 bt full
#0  InitPredicateLocks () at predicate.c:1250
i = 
info = {num_partitions = 1, ssize = 140731424825288, dsize = 1,
  max_dsize = 0, ffactor = 140731424836952, keysize =
140356326474085,
  entrysize = 140728909791233, hash = 0x7ffe96960d58,
  match = 0x16da2d1, keycopy = 0x7ffe96960d58, alloc = 0x1703af0,
  hcxt = 0x16da2d0, hctl = 0x0}
max_table_size = 117899280
requestSize = 
found = 0 '\000'


Dave Cramer

On 19 January 2017 at 12:05, Dave Cramer  wrote:

> I'll try to get the stack trace from the core dump, have to build master
> first
>
> Dave Cramer
>
> On 19 January 2017 at 12:01, Jorge Solórzano  wrote:
>
>> Robert, the logs I get from postgres (at least the section that matters)
>> is here:
>> If you need something else just ask...
>>
>> 2017-01-19 08:54:57.319 CST [31734] DEBUG:  CommitTransaction(1) name:
>>> unnamed; blockState: STARTED; state: INPROGR, xid/subid/cid: 0/1/0
>>> 2017-01-19 08:54:57.321 CST [31734] DEBUG:  parse S_1: DROP TABLE
>>> rollbacktest CASCADE
>>> 2017-01-19 08:54:57.321 CST [31734] STATEMENT:  DROP TABLE rollbacktest
>>> CASCADE
>>> 2017-01-19 08:54:57.321 CST [31734] DEBUG:  StartTransaction(1) name:
>>> unnamed; blockState: DEFAULT; state: INPROGR, xid/subid/cid: 0/1/0
>>> 2017-01-19 08:54:57.321 CST [31734] STATEMENT:  DROP TABLE rollbacktest
>>> CASCADE
>>> 2017-01-19 08:54:57.321 CST [31734] LOG:  duration: 0.289 ms
>>> 2017-01-19 08:54:57.321 CST [31734] DEBUG:  bind  to S_1
>>> 2017-01-19 08:54:57.321 CST [31734] LOG:  duration: 0.073 ms
>>> 2017-01-19 08:54:57.321 CST [31734] LOG:  execute S_1: DROP TABLE
>>> rollbacktest CASCADE
>>> 2017-01-19 08:54:57.321 CST [31734] ERROR:  table "rollbacktest" does
>>> not exist
>>> 2017-01-19 08:54:57.321 CST [31734] STATEMENT:  DROP TABLE rollbacktest
>>> CASCADE
>>> 2017-01-19 08:54:57.322 CST [31734] DEBUG:  parse S_2: CREATE TABLE
>>> rollbacktest (a int, str text)
>>> 2017-01-19 08:54:57.322 CST [31734] STATEMENT:  CREATE TABLE
>>> rollbacktest (a int, str text)
>>> 2017-01-19 08:54:57.322 CST [31734] DEBUG:  StartTransaction(1) name:
>>> unnamed; blockState: DEFAULT; state: INPROGR, xid/subid/cid: 0/1/0
>>> 2017-01-19 08:54:57.322 CST [31734] STATEMENT:  CREATE TABLE
>>> rollbacktest (a int, str text)
>>> 2017-01-19 08:54:57.322 CST [31734] LOG:  duration: 0.279 ms
>>> 2017-01-19 08:54:57.323 CST [31734] DEBUG:  bind  to S_2
>>> 2017-01-19 08:54:57.323 CST [31734] LOG:  duration: 0.163 ms
>>> 2017-01-19 08:54:57.323 CST [31734] LOG:  execute S_2: CREATE TABLE
>>> rollbacktest (a int, str text)
>>> 2017-01-19 08:54:57.324 CST [31734] DEBUG:  building index
>>> "pg_toast_163960_index" on table "pg_toast_163960"
>>> 2017-01-19 08:54:57.324 CST [31734] STATEMENT:  CREATE TABLE
>>> rollbacktest (a int, str text)
>>> 2017-01-19 08:54:57.327 CST [31734] LOG:  duration: 4.232 ms
>>> 2017-01-19 08:54:57.327 CST [31734] DEBUG:  CommitTransaction(1) name:
>>> unnamed; blockState: STARTED; state: INPROGR, xid/subid/cid: 270571/1/5
>>> 2017-01-19 08:54:57.329 CST [31734] DEBUG:  parse S_3:
>>> 2017-01-19 08:54:57.329 CST [31734] STATEMENT:
>>> 2017-01-19 08:54:57.329 CST [31734] DEBUG:  StartTransaction(1) name:
>>> unnamed; blockState: DEFAULT; state: INPROGR, xid/subid/cid: 0/1/0
>>> 2017-01-19 08:54:57.329 CST [31734] STATEMENT:
>>> 2017-01-19 08:54:57.329 CST [31734] LOG:  duration: 0.148 ms
>>> 2017-01-19 08:54:57.329 CST [31734] DEBUG:  bind  to S_3
>>> 2017-01-19 08:54:57.329 CST [31734] LOG:  duration: 0.088 ms
>>> 2017-01-19 08:54:57.329 CST [31734] DEBUG:  CommitTransaction(1) name:
>>> unnamed; blockState: STARTED; state: INPROGR, xid/subid/cid: 0/1/0
>>> 2017-01-19 08:54:57.330 CST [31734] DEBUG:  parse S_4: BEGIN
>>> 2017-01-19 08:54:57.330 CST [31734] STATEMENT:  BEGIN
>>> 2017-01-19 08:54:57.330 CST [31734] DEBUG:  StartTransaction(1) name:
>>> unnamed; blockState: DEFAULT; state: INPROGR, xid/subid/cid: 0/1/0
>>> 2017-01-19 08:54:57.330 CST [31734] STATEMENT:  BEGIN
>>> 2017-01-19 08:54:57.331 CST [31734] LOG:  duration: 0.189 ms
>>> 2017-01-19 08:54:57.331 CST [31734] DEBUG:  bind  to S_4
>>> 2017-01-19 08:54:57.331 CST [31734] LOG:  duration: 0.105 ms
>>> 2017-01-19 08:54:57.331 CST [31734] LOG:  execute S_4: BEGIN
>>> 2017-01-19 08:54:57.331 CST [31734] LOG:  duration: 0.154 ms
>>> 2017-01-19 08:54:57.331 CST [31734] DEBUG:  parse S_5: insert into
>>> rollbacktest(a, str) values (0, 'test')
>>> 2017-01-19 08:54:57.331 CST [31734] STATEMENT:  insert into
>>> rollbacktest(a, str) values (0, 'test')
>>> 2017-01-19 08:54:57.331 CST [31734] LOG:  duration: 0.277 ms
>>> 2017-01-19 08:54:57.332 CST [31734] DEBUG:  bind  to S_5
>>> 2017-01-19 08:54:57.332 CST [31734] LOG:  duration: 0.168 ms
>>> 2017-01-19 08:54:57.332 CST [31734] LOG:  execute S_5: insert into
>>> rollbacktest(a, str) values (0, 'test')
>>> 2017-01-19 08:54:57.332 CST [31734] LOG:  duration: 0.233 ms
>>> 2017-01-19 08:54:

Re: [HACKERS] [JDBC] SEGFAULT in HEAD with replication

2017-01-19 Thread Robert Haas
On Thu, Jan 19, 2017 at 12:01 PM, Jorge Solórzano  wrote:
> Robert, the logs I get from postgres (at least the section that matters) is
> here:
> If you need something else just ask...

That shows that there is a crash, but there's no stack trace in there.
Someone needs to get a core dump and use "gdb" to get a stack trace.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-01-19 Thread Robert Haas
On Wed, Jan 18, 2017 at 10:26 PM, Ashutosh Bapat
 wrote:
>> Yes, I think that's broadly the approach Tom was recommending.
>
> I don't have problem with that approach, but the latest patch has
> following problems.
> 1. We are copying chunks of path creation logic, instead of reusing
> corresponding functions.

Exactly which functions do you think we ought to be reusing that the
patch currently doesn't reuse?

> 2. There are many cases where the new function would return no local
> path and hence postgres_fdw doesn't push down the join [1]. This will
> be performance regression compared to 9.6.

Some, or many?  How many?  Which ones?  At least some of the problems
you were complaining about look like basic validity checks that were
copied from joinpath.c, so they're probably necessary for correctness.
It's worth remembering that we're trying to fix a bug here; if that
makes some cases perform less well, that's sad, but not as sad as
throwing a bogus error, which is what's happening right now.

I'm a bit sketchy about this kind of thing, though:

! if (required_outer)
  {
! bms_free(required_outer);
! return NULL;
  }

I don't understand what would stop that from making this fail to
generate parameterized paths in some cases in which it would be legal
to do so, and the comment is very brief.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] SEGFAULT in HEAD with replication

2017-01-19 Thread Dave Cramer
I'll try to get the stack trace from the core dump, have to build master
first

Dave Cramer

On 19 January 2017 at 12:01, Jorge Solórzano  wrote:

> Robert, the logs I get from postgres (at least the section that matters)
> is here:
> If you need something else just ask...
>
> 2017-01-19 08:54:57.319 CST [31734] DEBUG:  CommitTransaction(1) name:
>> unnamed; blockState: STARTED; state: INPROGR, xid/subid/cid: 0/1/0
>> 2017-01-19 08:54:57.321 CST [31734] DEBUG:  parse S_1: DROP TABLE
>> rollbacktest CASCADE
>> 2017-01-19 08:54:57.321 CST [31734] STATEMENT:  DROP TABLE rollbacktest
>> CASCADE
>> 2017-01-19 08:54:57.321 CST [31734] DEBUG:  StartTransaction(1) name:
>> unnamed; blockState: DEFAULT; state: INPROGR, xid/subid/cid: 0/1/0
>> 2017-01-19 08:54:57.321 CST [31734] STATEMENT:  DROP TABLE rollbacktest
>> CASCADE
>> 2017-01-19 08:54:57.321 CST [31734] LOG:  duration: 0.289 ms
>> 2017-01-19 08:54:57.321 CST [31734] DEBUG:  bind  to S_1
>> 2017-01-19 08:54:57.321 CST [31734] LOG:  duration: 0.073 ms
>> 2017-01-19 08:54:57.321 CST [31734] LOG:  execute S_1: DROP TABLE
>> rollbacktest CASCADE
>> 2017-01-19 08:54:57.321 CST [31734] ERROR:  table "rollbacktest" does not
>> exist
>> 2017-01-19 08:54:57.321 CST [31734] STATEMENT:  DROP TABLE rollbacktest
>> CASCADE
>> 2017-01-19 08:54:57.322 CST [31734] DEBUG:  parse S_2: CREATE TABLE
>> rollbacktest (a int, str text)
>> 2017-01-19 08:54:57.322 CST [31734] STATEMENT:  CREATE TABLE rollbacktest
>> (a int, str text)
>> 2017-01-19 08:54:57.322 CST [31734] DEBUG:  StartTransaction(1) name:
>> unnamed; blockState: DEFAULT; state: INPROGR, xid/subid/cid: 0/1/0
>> 2017-01-19 08:54:57.322 CST [31734] STATEMENT:  CREATE TABLE rollbacktest
>> (a int, str text)
>> 2017-01-19 08:54:57.322 CST [31734] LOG:  duration: 0.279 ms
>> 2017-01-19 08:54:57.323 CST [31734] DEBUG:  bind  to S_2
>> 2017-01-19 08:54:57.323 CST [31734] LOG:  duration: 0.163 ms
>> 2017-01-19 08:54:57.323 CST [31734] LOG:  execute S_2: CREATE TABLE
>> rollbacktest (a int, str text)
>> 2017-01-19 08:54:57.324 CST [31734] DEBUG:  building index
>> "pg_toast_163960_index" on table "pg_toast_163960"
>> 2017-01-19 08:54:57.324 CST [31734] STATEMENT:  CREATE TABLE rollbacktest
>> (a int, str text)
>> 2017-01-19 08:54:57.327 CST [31734] LOG:  duration: 4.232 ms
>> 2017-01-19 08:54:57.327 CST [31734] DEBUG:  CommitTransaction(1) name:
>> unnamed; blockState: STARTED; state: INPROGR, xid/subid/cid: 270571/1/5
>> 2017-01-19 08:54:57.329 CST [31734] DEBUG:  parse S_3:
>> 2017-01-19 08:54:57.329 CST [31734] STATEMENT:
>> 2017-01-19 08:54:57.329 CST [31734] DEBUG:  StartTransaction(1) name:
>> unnamed; blockState: DEFAULT; state: INPROGR, xid/subid/cid: 0/1/0
>> 2017-01-19 08:54:57.329 CST [31734] STATEMENT:
>> 2017-01-19 08:54:57.329 CST [31734] LOG:  duration: 0.148 ms
>> 2017-01-19 08:54:57.329 CST [31734] DEBUG:  bind  to S_3
>> 2017-01-19 08:54:57.329 CST [31734] LOG:  duration: 0.088 ms
>> 2017-01-19 08:54:57.329 CST [31734] DEBUG:  CommitTransaction(1) name:
>> unnamed; blockState: STARTED; state: INPROGR, xid/subid/cid: 0/1/0
>> 2017-01-19 08:54:57.330 CST [31734] DEBUG:  parse S_4: BEGIN
>> 2017-01-19 08:54:57.330 CST [31734] STATEMENT:  BEGIN
>> 2017-01-19 08:54:57.330 CST [31734] DEBUG:  StartTransaction(1) name:
>> unnamed; blockState: DEFAULT; state: INPROGR, xid/subid/cid: 0/1/0
>> 2017-01-19 08:54:57.330 CST [31734] STATEMENT:  BEGIN
>> 2017-01-19 08:54:57.331 CST [31734] LOG:  duration: 0.189 ms
>> 2017-01-19 08:54:57.331 CST [31734] DEBUG:  bind  to S_4
>> 2017-01-19 08:54:57.331 CST [31734] LOG:  duration: 0.105 ms
>> 2017-01-19 08:54:57.331 CST [31734] LOG:  execute S_4: BEGIN
>> 2017-01-19 08:54:57.331 CST [31734] LOG:  duration: 0.154 ms
>> 2017-01-19 08:54:57.331 CST [31734] DEBUG:  parse S_5: insert into
>> rollbacktest(a, str) values (0, 'test')
>> 2017-01-19 08:54:57.331 CST [31734] STATEMENT:  insert into
>> rollbacktest(a, str) values (0, 'test')
>> 2017-01-19 08:54:57.331 CST [31734] LOG:  duration: 0.277 ms
>> 2017-01-19 08:54:57.332 CST [31734] DEBUG:  bind  to S_5
>> 2017-01-19 08:54:57.332 CST [31734] LOG:  duration: 0.168 ms
>> 2017-01-19 08:54:57.332 CST [31734] LOG:  execute S_5: insert into
>> rollbacktest(a, str) values (0, 'test')
>> 2017-01-19 08:54:57.332 CST [31734] LOG:  duration: 0.233 ms
>> 2017-01-19 08:54:57.333 CST [31734] DEBUG:  parse S_6: select * from
>> rollbacktest
>> 2017-01-19 08:54:57.333 CST [31734] STATEMENT:  select * from rollbacktest
>> 2017-01-19 08:54:57.333 CST [31734] LOG:  duration: 0.093 ms
>> 2017-01-19 08:54:57.333 CST [31734] DEBUG:  bind  to S_6
>> 2017-01-19 08:54:57.333 CST [31734] LOG:  duration: 0.188 ms
>> 2017-01-19 08:54:57.333 CST [31734] LOG:  execute S_6: select * from
>> rollbacktest
>> 2017-01-19 08:54:57.333 CST [31734] LOG:  duration: 0.094 ms
>> 2017-01-19 08:54:57.334 CST [31734] DEBUG:  parse S_7: select 1/0
>> 2017-01-19 08:54:57.334 CST [31734] STATEMENT:  select 1/0
>> 2017-01-19 08:54:57.334 CST [31734] LOG:  duration: 0.174 ms

Re: [HACKERS] [JDBC] SEGFAULT in HEAD with replication

2017-01-19 Thread Jorge Solórzano
Robert, the logs I get from postgres (at least the section that matters) is
here:
If you need something else just ask...

2017-01-19 08:54:57.319 CST [31734] DEBUG:  CommitTransaction(1) name:
> unnamed; blockState: STARTED; state: INPROGR, xid/subid/cid: 0/1/0
> 2017-01-19 08:54:57.321 CST [31734] DEBUG:  parse S_1: DROP TABLE
> rollbacktest CASCADE
> 2017-01-19 08:54:57.321 CST [31734] STATEMENT:  DROP TABLE rollbacktest
> CASCADE
> 2017-01-19 08:54:57.321 CST [31734] DEBUG:  StartTransaction(1) name:
> unnamed; blockState: DEFAULT; state: INPROGR, xid/subid/cid: 0/1/0
> 2017-01-19 08:54:57.321 CST [31734] STATEMENT:  DROP TABLE rollbacktest
> CASCADE
> 2017-01-19 08:54:57.321 CST [31734] LOG:  duration: 0.289 ms
> 2017-01-19 08:54:57.321 CST [31734] DEBUG:  bind  to S_1
> 2017-01-19 08:54:57.321 CST [31734] LOG:  duration: 0.073 ms
> 2017-01-19 08:54:57.321 CST [31734] LOG:  execute S_1: DROP TABLE
> rollbacktest CASCADE
> 2017-01-19 08:54:57.321 CST [31734] ERROR:  table "rollbacktest" does not
> exist
> 2017-01-19 08:54:57.321 CST [31734] STATEMENT:  DROP TABLE rollbacktest
> CASCADE
> 2017-01-19 08:54:57.322 CST [31734] DEBUG:  parse S_2: CREATE TABLE
> rollbacktest (a int, str text)
> 2017-01-19 08:54:57.322 CST [31734] STATEMENT:  CREATE TABLE rollbacktest
> (a int, str text)
> 2017-01-19 08:54:57.322 CST [31734] DEBUG:  StartTransaction(1) name:
> unnamed; blockState: DEFAULT; state: INPROGR, xid/subid/cid: 0/1/0
> 2017-01-19 08:54:57.322 CST [31734] STATEMENT:  CREATE TABLE rollbacktest
> (a int, str text)
> 2017-01-19 08:54:57.322 CST [31734] LOG:  duration: 0.279 ms
> 2017-01-19 08:54:57.323 CST [31734] DEBUG:  bind  to S_2
> 2017-01-19 08:54:57.323 CST [31734] LOG:  duration: 0.163 ms
> 2017-01-19 08:54:57.323 CST [31734] LOG:  execute S_2: CREATE TABLE
> rollbacktest (a int, str text)
> 2017-01-19 08:54:57.324 CST [31734] DEBUG:  building index
> "pg_toast_163960_index" on table "pg_toast_163960"
> 2017-01-19 08:54:57.324 CST [31734] STATEMENT:  CREATE TABLE rollbacktest
> (a int, str text)
> 2017-01-19 08:54:57.327 CST [31734] LOG:  duration: 4.232 ms
> 2017-01-19 08:54:57.327 CST [31734] DEBUG:  CommitTransaction(1) name:
> unnamed; blockState: STARTED; state: INPROGR, xid/subid/cid: 270571/1/5
> 2017-01-19 08:54:57.329 CST [31734] DEBUG:  parse S_3:
> 2017-01-19 08:54:57.329 CST [31734] STATEMENT:
> 2017-01-19 08:54:57.329 CST [31734] DEBUG:  StartTransaction(1) name:
> unnamed; blockState: DEFAULT; state: INPROGR, xid/subid/cid: 0/1/0
> 2017-01-19 08:54:57.329 CST [31734] STATEMENT:
> 2017-01-19 08:54:57.329 CST [31734] LOG:  duration: 0.148 ms
> 2017-01-19 08:54:57.329 CST [31734] DEBUG:  bind  to S_3
> 2017-01-19 08:54:57.329 CST [31734] LOG:  duration: 0.088 ms
> 2017-01-19 08:54:57.329 CST [31734] DEBUG:  CommitTransaction(1) name:
> unnamed; blockState: STARTED; state: INPROGR, xid/subid/cid: 0/1/0
> 2017-01-19 08:54:57.330 CST [31734] DEBUG:  parse S_4: BEGIN
> 2017-01-19 08:54:57.330 CST [31734] STATEMENT:  BEGIN
> 2017-01-19 08:54:57.330 CST [31734] DEBUG:  StartTransaction(1) name:
> unnamed; blockState: DEFAULT; state: INPROGR, xid/subid/cid: 0/1/0
> 2017-01-19 08:54:57.330 CST [31734] STATEMENT:  BEGIN
> 2017-01-19 08:54:57.331 CST [31734] LOG:  duration: 0.189 ms
> 2017-01-19 08:54:57.331 CST [31734] DEBUG:  bind  to S_4
> 2017-01-19 08:54:57.331 CST [31734] LOG:  duration: 0.105 ms
> 2017-01-19 08:54:57.331 CST [31734] LOG:  execute S_4: BEGIN
> 2017-01-19 08:54:57.331 CST [31734] LOG:  duration: 0.154 ms
> 2017-01-19 08:54:57.331 CST [31734] DEBUG:  parse S_5: insert into
> rollbacktest(a, str) values (0, 'test')
> 2017-01-19 08:54:57.331 CST [31734] STATEMENT:  insert into
> rollbacktest(a, str) values (0, 'test')
> 2017-01-19 08:54:57.331 CST [31734] LOG:  duration: 0.277 ms
> 2017-01-19 08:54:57.332 CST [31734] DEBUG:  bind  to S_5
> 2017-01-19 08:54:57.332 CST [31734] LOG:  duration: 0.168 ms
> 2017-01-19 08:54:57.332 CST [31734] LOG:  execute S_5: insert into
> rollbacktest(a, str) values (0, 'test')
> 2017-01-19 08:54:57.332 CST [31734] LOG:  duration: 0.233 ms
> 2017-01-19 08:54:57.333 CST [31734] DEBUG:  parse S_6: select * from
> rollbacktest
> 2017-01-19 08:54:57.333 CST [31734] STATEMENT:  select * from rollbacktest
> 2017-01-19 08:54:57.333 CST [31734] LOG:  duration: 0.093 ms
> 2017-01-19 08:54:57.333 CST [31734] DEBUG:  bind  to S_6
> 2017-01-19 08:54:57.333 CST [31734] LOG:  duration: 0.188 ms
> 2017-01-19 08:54:57.333 CST [31734] LOG:  execute S_6: select * from
> rollbacktest
> 2017-01-19 08:54:57.333 CST [31734] LOG:  duration: 0.094 ms
> 2017-01-19 08:54:57.334 CST [31734] DEBUG:  parse S_7: select 1/0
> 2017-01-19 08:54:57.334 CST [31734] STATEMENT:  select 1/0
> 2017-01-19 08:54:57.334 CST [31734] LOG:  duration: 0.174 ms
> 2017-01-19 08:54:57.334 CST [31734] DEBUG:  bind  to S_7
> 2017-01-19 08:54:57.334 CST [31734] ERROR:  division by zero
> 2017-01-19 08:54:57.334 CST [31734] STATEMENT:  select 1/0
> 2017-01-19 08:54:57.335 CST [31734] DEBUG:  bind  t

Re: [HACKERS] [COMMITTERS] pgsql: Generate fmgr prototypes automatically

2017-01-19 Thread Tom Lane
Peter Eisentraut  writes:
> Here is a patch that moves two blocks from builtins.h into separate
> header files.  That avoids having to include nodes/pg_list.h and
> utils/sortsupport.h.

Seems like a reasonable solution.

> The remaining inclusion of nodes/nodes.h is for the oidparse() function.
>  I think that could be moved out of oid.c and put somewhere near parser/
> or objectaddress.c.  But the practical savings from avoiding nodes.h is
> probably near zero, so I haven't done anything about that here.

Agreed, not much value in that.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] SEGFAULT in HEAD with replication

2017-01-19 Thread Robert Haas
On Thu, Jan 19, 2017 at 10:59 AM, Jorge Solórzano  wrote:
> I have isolated the tests run by the pgjdbc project. I have disabled
> replication (wal_level = minimal) and the error is still present, so it seems
> that this error is not related to replication. The test that causes the
> error is AutoRollbackTestSuite. I have enabled DEBUG messages in postgresql
> and pgjdbc and this is what I get:
>
> https://drive.google.com/drive/folders/0ByHbu-sR29gda2d2LUFpV0lPdUk

Can somebody paste the stack trace into an email, here?  There's
reference to it upthread, but it's not obvious where it actually is.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Fix minor race in commit_ts SLRU truncation vs lookups

2017-01-19 Thread Robert Haas
On Thu, Jan 19, 2017 at 10:06 AM, Alvaro Herrera
 wrote:
> Peter Eisentraut wrote:
>
>> Also, I wonder whether we should not in vacuum.c change the order of the
>> calls of SetTransactionIdLimit() and SetMultiXactIdLimit() as well, just
>> to keep everything consistent.
>
> I am wary of doing that.  The current coding is well battle-tested by
> now, but doing things in the opposite order, not at all.  Pending some
> testing to show that there is no problem with a change, I would leave
> things as they are.  Probably said testing is too onerous for the
> benefit (which is just a little consistency).  What I fear is: what
> happens if a concurrent checkpoint reads the values between the two
> operations, and a crash occurs?  I think that the checkpoint might save
> the updated values, so after crash recovery the truncate would not be
> executed, possibly leaving files around.  Leaving files around might be
> dangerous for multixacts at least (it's probably harmless for xids).

Don't both SLRUs eventually wrap?  If so, leaving files around seems
dangerous for either in equal measure.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Too many autovacuum workers spawned during forced auto-vacuum

2017-01-19 Thread Robert Haas
On Tue, Jan 17, 2017 at 4:02 PM, Robert Haas  wrote:
> Amit's chosen yet another possible place to insert the guard: teach
> autovacuum that if a worker skips at least one table due to concurrent
> autovacuum activity AND ends up vacuuming no tables, don't call
> vac_update_datfrozenxid().  Since there is or was another worker
> running, vac_update_datfrozenxid() either already has been called or
> will be when that worker finishes.  So that seems safe.  If his patch
> were changed to skip vac_update_datfrozenxid() in all cases where we
> do nothing rather than only when we skip a table due to concurrent
> activity, we'd reintroduce the dropped-database problem that was fixed
> by 794e3e81a0e8068de2606015352c1254cb071a78.

After sleeping on this, I'm inclined to go with Amit's fix for now.
It seems less likely to break anything in the back-branches than any
other option I can think up.

An updated version of that patch is attached.  I changed the "if"
statement to avoid having an empty "if" clause and a non-empty "else"
clause, and I rewrote the comment based on my previous analysis.

Barring objections, I'll commit and back-patch this version.
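
For readers skimming the archives, the guard boils down to something like
the fragment below.  This is only a sketch; "did_vacuum" and
"found_concurrent_worker" are illustrative names, not necessarily the ones
used in the attached patch.

    /*
     * Sketch of the guard at the end of do_autovacuum(): only update
     * pg_database.datfrozenxid if this worker actually vacuumed something,
     * or if it never skipped a table because of a concurrent worker.  In
     * the remaining case the concurrent worker has called, or will call,
     * vac_update_datfrozenxid() itself.
     */
    if (did_vacuum || !found_concurrent_worker)
        vac_update_datfrozenxid();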

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


prevent-useless-autovac-v2.patch
Description: invalid/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Add pg_sequence system catalog

2017-01-19 Thread Stephen Frost
Peter,

* Peter Eisentraut (pete...@gmx.net) wrote:
> Add pg_sequence system catalog

The query this added to dumpSequence() seems to assume that sequence
names are unique across the whole database:

appendPQExpBuffer(query,
  "SELECT seqstart, seqincrement, "
  "CASE WHEN seqincrement > 0 AND seqmax = %s THEN NULL "
  " WHEN seqincrement < 0 AND seqmax = -1 THEN NULL "
  " ELSE seqmax "
  "END AS seqmax, "
  "CASE WHEN seqincrement > 0 AND seqmin = 1 THEN NULL "
  " WHEN seqincrement < 0 AND seqmin = %s THEN NULL "
  " ELSE seqmin "
  "END AS seqmin, "
  "seqcache, seqcycle "
  "FROM pg_class c "
  "JOIN pg_sequence s ON (s.seqrelid = c.oid) "
  "WHERE relname = ",
  bufx, bufm);
appendStringLiteralAH(query, tbinfo->dobj.name, fout);

Note that tbinfo->dobj.name contains just the name, and trying to match
based on just 'relname' isn't going to work since sequences with the
same name can exist in different schemas.

I'd suggest using our usual approach in pg_dump, which is matching based
on the OID, like so:

WHERE c.oid = '%u'::oid

The OID is in: tbinfo->dobj.catId.oid
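
To make that concrete, the query could be built roughly like this (just a
sketch, not the actual patch; the CASE expressions from the original query
are left out for brevity):

    /*
     * Sketch only: look the sequence up by its OID instead of by bare
     * relname, so same-named sequences in different schemas cannot be
     * confused.
     */
    appendPQExpBuffer(query,
                      "SELECT seqstart, seqincrement, seqmax, seqmin, "
                      "seqcache, seqcycle "
                      "FROM pg_catalog.pg_sequence "
                      "WHERE seqrelid = '%u'::pg_catalog.oid",
                      tbinfo->dobj.catId.oid);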

Also, you should move the selectSourceSchema() into the per-version
branches and set it to 'pg_catalog' for PG10 and up, which would allow
you to avoid having to qualify the table names, et al.  As is, this
query will also fail if a user has created a 'pg_class' or
'pg_sequence' table in their schema.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] [JDBC] SEGFAULT in HEAD with replication

2017-01-19 Thread Jorge Solórzano
I have isolated the tests run by the pgjdbc project. I have disabled
replication (wal_level = minimal) and the error is still present, so it
seems that this error is not related to replication. The test that
causes the error is AutoRollbackTestSuite. I have enabled DEBUG messages
in postgresql and pgjdbc and this is what I get:

https://drive.google.com/drive/folders/0ByHbu-sR29gda2d2LUFpV0lPdUk


On Thu, Jan 19, 2017 at 8:42 AM, Vladimir Gordiychuk 
wrote:

> Core dump was extracted from this build https://travis-ci.org/
> Gordiychuk/pgjdbc/builds/193318018
>
>
> 2017-01-19 17:38 GMT+03:00 Vladimir Gordiychuk :
>
>> Last success build from postgresql HEAD was 20 days ago
>> https://travis-ci.org/pgjdbc/pgjdbc/builds/187764274
>> We detect this problem 3 days ago.
>>
>> 2017-01-19 17:22 GMT+03:00 Dave Cramer :
>>
>>> The travis job https://travis-ci.org/pgjdbc/pgjdbc/jobs/192517342 is
>>> seeing a segfault when we are testing against HEAD with REPLICATION turned
>>> on.
>>>
>>> Logs can be found here https://drive.google.com/
>>> drive/folders/0B-Heg5ZYCWbreEE4Uk5LdnJ5eWM?usp=sharing
>>>
>>> Regards,
>>>
>>> Dave Cramer
>>>
>>
>>
>


Re: [HACKERS] increasing the default WAL segment size

2017-01-19 Thread Robert Haas
On Wed, Jan 18, 2017 at 12:42 PM, Robert Haas  wrote:
> Problems 2-4 actually have to do with a DestReceiver of type
> DestRemote really, really wanting to have an associated Portal and
> database connection, so one approach is to create a stripped-down
> DestReceiver that doesn't care about those things and then passing
> that to GetPGVariable.

I tried that and it worked out pretty well, so I'm inclined to go with
this approach.  Proposed patches attached.  0001 adds the new
DestReceiver type, and 0002 is a revised patch to implement the SHOW
command itself.

Thoughts, comments?
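
For anyone not familiar with the DestReceiver machinery: a receiver is just
a struct of four callbacks plus a CommandDest tag, so a catalog-independent
one can be quite skeletal.  The fragment below is only a sketch modeled on
the do-nothing receiver in dest.c, assuming the declarations in tcop/dest.h;
the real 0001 patch additionally has to emit RowDescription/DataRow
messages and presumably defines its own CommandDest value.

#include "postgres.h"
#include "tcop/dest.h"

/* Sketch of a minimal DestReceiver with no Portal or catalog dependencies. */
static bool
simpleReceiveSlot(TupleTableSlot *slot, DestReceiver *self)
{
    /* a real remote-simple receiver would serialize the tuple here */
    return true;
}

static void
simpleStartup(DestReceiver *self, int operation, TupleDesc typeinfo)
{
    /* a real receiver would send the row description here */
}

static void
simpleShutdown(DestReceiver *self)
{
}

static void
simpleDestroy(DestReceiver *self)
{
}

static DestReceiver simpleDR = {
    simpleReceiveSlot, simpleStartup, simpleShutdown, simpleDestroy,
    DestNone                    /* illustrative only */
};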

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


0001-Add-new-DestReceiver-type-for-use-when-we-lack-catal.patch
Description: invalid/octet-stream


0002-Add-a-SHOW-command-to-the-walsender-command-language.patch
Description: invalid/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ISO/IEC 9075-2:2016 for postgres community

2017-01-19 Thread Albe Laurenz
Wolfgang Wilhelm wrote:
> are you looking for something like this:
> jtc1sc32.org/doc/N2301-2350/32N2311T-text_for_ballot-CD_9075-2.pdf ?

That looks like it is from 2013...

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ISO/IEC 9075-2:2016 for postgres community

2017-01-19 Thread Peter Eisentraut
On 1/19/17 10:03 AM, Wolfgang Wilhelm wrote:
> are you looking for something like this:
> jtc1sc32.org/doc/N2301-2350/32N2311T-text_for_ballot-CD_9075-2.pdf ?

That's not very recent.


-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch to implement pg_current_logfile() function

2017-01-19 Thread Alvaro Herrera
Karl O. Pinc wrote:
> On Wed, 18 Jan 2017 19:27:40 -0300
> Alvaro Herrera  wrote:

> > I thought this part was odd -- I mean, why is SysLogger_Start() being
> > called if the collector is not enabled?  Turns out we do it and return
> > early if not enabled.  But not in all cases -- there is one callsite
> > in postmaster.c that avoids the call if the collector is disabled.
> > That needs to be changed if we want this to work reliably.
> 
> Is this an argument for having the current_logfiles always exist
> and be empty when there is no in-filesystem logfile?  It always felt
> to me that the code would be simpler that way.

I don't know.  I am just saying that you need to patch postmaster.c line
1726 to remove the second arm of the &&.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch to implement pg_current_logfile() function

2017-01-19 Thread Karl O. Pinc
On Wed, 18 Jan 2017 19:27:40 -0300
Alvaro Herrera  wrote:

> Karl O. Pinc wrote:
> 
> > @@ -511,10 +519,16 @@ int
> >  SysLogger_Start(void)
> >  {
> > pid_t   sysloggerPid;
> > -   char   *filename;
> >  
> > +   /*
> > +* Logging collector is not enabled. We don't know where
> > messages are
> > +* logged.  Remove outdated file holding the current log
> > filenames.
> > +*/
> > if (!Logging_collector)
> > +   {
> > +   unlink(LOG_METAINFO_DATAFILE);
> > return 0;
> > +   }  
> 
> I thought this part was odd -- I mean, why is SysLogger_Start() being
> called if the collector is not enabled?  Turns out we do it and return
> early if not enabled.  But not in all cases -- there is one callsite
> in postmaster.c that avoids the call if the collector is disabled.
> That needs to be changed if we want this to work reliably.

Is this an argument for having the current_logfiles always exist
and be empty when there is no in-filesystem logfile?  It always felt
to me that the code would be simpler that way.

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctl start without --wait

2017-01-19 Thread Stephen Frost
* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> On 1/19/17 9:53 AM, Stephen Frost wrote:
> > Sure, but we're talking about replaying WAL vs. doing a checkpoint, not
> > about writing WAL vs. replaying WAL.  Replaying WAL and doing a
> > checkpoint both require writing to lots of different places across the
> > filesystem, of course.
> 
> Yeah, but they are each doing different things, so you can't say that
> one will take the same amount of time or strictly less than the other.
> It might be a good first estimation, but my practical experience is that
> it's not when it really matters. :-/

You've found that WAL replay following a crash takes longer than
checkpointing and therefore crash recovery requires more time than
checkpoint_timeout is set to?

WAL replay does do more work, generally speaking (the WAL has to be
read, the checksum validated on it, and then the write has to go out,
while the checkpointer just writes the page out from memory), but it's
also dealing with less contention on the system (there aren't a bunch of
backends hammering the disks to pull data in with reads when you're
doing crash recovery...).  We did make the WAL checksum routines a lot
faster with 9.6, as I recall, so perhaps there's been some change there
too.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] [PATCH] Fix minor race in commit_ts SLRU truncation vs lookups

2017-01-19 Thread Alvaro Herrera
Peter Eisentraut wrote:

> Also, I wonder whether we should not in vacuum.c change the order of the
> calls of SetTransactionIdLimit() and SetMultiXactIdLimit() as well, just
> to keep everything consistent.

I am wary of doing that.  The current coding is well battle-tested by
now, but doing things in the opposite order, not at all.  Pending some
testing to show that there is no problem with a change, I would leave
things as they are.  Probably said testing is too onerous for the
benefit (which is just a little consistency).  What I fear is: what
happens if a concurrent checkpoint reads the values between the two
operations, and a crash occurs?  I think that the checkpoint might save
the updated values, so after crash recovery the truncate would not be
executed, possibly leaving files around.  Leaving files around might be
dangerous for multixacts at least (it's probably harmless for xids).

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ISO/IEC 9075-2:2016 for postgres community

2017-01-19 Thread Wolfgang Wilhelm
Hi,
are you looking for something like this: 
jtc1sc32.org/doc/N2301-2350/32N2311T-text_for_ballot-CD_9075-2.pdf ?
Kind Regards,
Wolfgang


Oleg Bartunov  wrote at 16:39 on Tuesday, 17 January 2017:

On Tue, Jan 17, 2017 at 6:26 PM, Alvaro Herrera  
wrote:

Oleg Bartunov wrote:
> Hi there,
>
> I just bought ISO/IEC 9075-2:2016
> http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=63556
> to satisfy my interest on json support in SQL.  I am not ready to discuss
> here implementation details, but there is one problem with the status of
> this document, which is copyright protected.
>
> "All rights reserved. Unless otherwise specified, no part of this
> publication may be reproduced or utilized otherwise in any form or by any
> means, electronic or mechanical, including photocopying, or posting on the
> internet or an intranet, without prior written permission. Permission can
> be requested from either ISO at the address below or ISO’s member body in
> the country of the requester."
>
> My question is should we ask ISO for permission to use my copy in our
> community or  should we buy at least one another copy (for reviewer) ?

With previous versions, we have relied on draft versions
"indistinguishable from latest" posted by third parties.  I couldn't
find such a version for sql2016 in an (admittedly very quick) search.


I found several mentions of such a draft, but failed to find any.  I think
we can wait until we have a prototype or somebody shares a link.
 
I doubt ISO would give the community permission to use your copy, but it
wouldn't hurt to ask them. 



--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




   

Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctl start without --wait

2017-01-19 Thread Peter Eisentraut
On 1/18/17 3:47 PM, Robert Haas wrote:
> Anybody who has got a script that runs pg_ctl unattended mode likely
> now has to go update that script to add --no-wait.

The state of init scripts and other start scripts out there is such a
mess, it's hard to make general statements like this.  Many start
scripts still start the postmaster directly and have confusing or
outdated advice about whether or not to use pg_ctl.  Some implement
their own waiting logic after starting.  Some just ignore the issue and
do wrong or inconsistent things.

With this change, together with the systemd support that is already out
there with 9.6, and with the new promote wait support, we'll at least
have a consistent approach going forward and have a better shot at
sorting out the current mess.

Someone who really wants the old behavior can add the command-line
option in a backward-compatible way.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctl start without --wait

2017-01-19 Thread Stephen Frost
Peter,

* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> On 1/18/17 3:12 PM, Stephen Frost wrote:
> > I don't understand what I'm missing when it comes to checkpoint_timeout
> > and the time required to recover from a crash.  You aren't the first
> > person to question that association, but it seems pretty clear to me.
> > 
> > When doing recovery, we have to replay everything since the last
> > checkpoint.  If we are checkpointing at least every 5 minutes then we
> > can't have any more than 5 minutes worth of WAL to replay, right?
> 
> But writing WAL and replaying WAL are two entirely different operations.

Sure, but we're talking about replaying WAL vs. doing a checkpoint, not
about writing WAL vs. replaying WAL.  Replaying WAL and doing a
checkpoint both require writing to lots of different places across the
filesystem, of course.

There can be cases where individual backends have to evict pages to
clear space for new pages to be pulled in, which effectively spreads the
checkpoint's writes across multiple processes, but that's a bit of a
different situation.  Of course, it would be nice if we could make our
replay of WAL multi-process, and checkpointing too, for that matter.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctl start without --wait

2017-01-19 Thread Peter Eisentraut
On 1/19/17 9:53 AM, Stephen Frost wrote:
> Sure, but we're talking about replaying WAL vs. doing a checkpoint, not
> about writing WAL vs. replaying WAL.  Replaying WAL and doing a
> checkpoint both require writing to lots of different places across the
> filesystem, of course.

Yeah, but they are each doing different things, so you can't say that
one will take the same amount of time or strictly less than the other.
It might be a good first estimate, but in my practical experience it
doesn't hold when it really matters. :-/

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctl start without --wait

2017-01-19 Thread Stephen Frost
* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> On 1/18/17 3:47 PM, Robert Haas wrote:
> > Anybody who has got a script that runs pg_ctl unattended mode likely
> > now has to go update that script to add --no-wait.
> 
> The state of init scripts and other start scripts out there is such a
> mess, it's hard to make general statements like this.  Many start
> scripts still start the postmaster directly and have confusing or
> outdated advice about whether or not to use pg_ctl.  Some implement
> their own waiting logic after starting.  Some just ignore the issue and
> do wrong or inconsistent things.
> 
> With this change, together with the systemd support that is already out
> there with 9.6, and with the new promote wait support, we'll at least
> have a consistent approach going forward and have a better shot at
> sorting out the current mess.

+1.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] [JDBC] SEGFAULT in HEAD with replication

2017-01-19 Thread Vladimir Gordiychuk
Core dump was extracted from this build
https://travis-ci.org/Gordiychuk/pgjdbc/builds/193318018


2017-01-19 17:38 GMT+03:00 Vladimir Gordiychuk :

> Last success build from postgresql HEAD was 20 days ago
> https://travis-ci.org/pgjdbc/pgjdbc/builds/187764274
> We detect this problem 3 days ago.
>
> 2017-01-19 17:22 GMT+03:00 Dave Cramer :
>
>> The travis job https://travis-ci.org/pgjdbc/pgjdbc/jobs/192517342 is
>> seeing a segfault when we are testing against HEAD with REPLICATION turned
>> on.
>>
>> Logs can be found here https://drive.google.com/
>> drive/folders/0B-Heg5ZYCWbreEE4Uk5LdnJ5eWM?usp=sharing
>>
>> Regards,
>>
>> Dave Cramer
>>
>
>


Re: [HACKERS] [JDBC] SEGFAULT in HEAD with replication

2017-01-19 Thread Vladimir Gordiychuk
Last success build from postgresql HEAD was 20 days ago
https://travis-ci.org/pgjdbc/pgjdbc/builds/187764274
We detect this problem 3 days ago.

2017-01-19 17:22 GMT+03:00 Dave Cramer :

> The travis job https://travis-ci.org/pgjdbc/pgjdbc/jobs/192517342 is
> seeing a segfault when we are testing against HEAD with REPLICATION turned
> on.
>
> Logs can be found here https://drive.google.com/drive/folders/0B-
> Heg5ZYCWbreEE4Uk5LdnJ5eWM?usp=sharing
>
> Regards,
>
> Dave Cramer
>


Re: [HACKERS] Re: Clarifying "server starting" messaging in pg_ctl start without --wait

2017-01-19 Thread Peter Eisentraut
On 1/18/17 3:12 PM, Stephen Frost wrote:
> I don't understand what I'm missing when it comes to checkpoint_timeout
> and the time required to recover from a crash.  You aren't the first
> person to question that association, but it seems pretty clear to me.
> 
> When doing recovery, we have to replay everything since the last
> checkpoint.  If we are checkpointing at least every 5 minutes then we
> can't have any more than 5 minutes worth of WAL to replay, right?

But writing WAL and replaying WAL are two entirely different operations.
 Writing a WAL record involves writing a few bytes sequentially.
Replaying a WAL record might involve hopping all over the system and
applying the changes that the WAL record describes.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Generate fmgr prototypes automatically

2017-01-19 Thread Peter Eisentraut
On 1/17/17 3:07 PM, Tom Lane wrote:
> Alternatively ... is there a specific reason why you chose to make
> builtins.h the key inclusion file for this change, rather than having
> callers include fmgrprotos.h directly?  It seems like the stuff remaining
> in builtins.h is just a laundry list of random utility functions.
> Maybe dispersing those to other headers is the thing to do.

Here is a patch that moves two blocks from builtins.h into separate
header files.  That avoids having to include nodes/pg_list.h and
utils/sortsupport.h.

The remaining inclusion of nodes/nodes.h is for the oidparse() function.
 I think that could be moved out of oid.c and put somewhere near parser/
or objectaddress.c.  But the practical savings from avoiding nodes.h is
probably near zero, so I haven't done anything about that here.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From a4f700198f19ce5a09cde8d58870187ea0e05012 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 18 Jan 2017 12:00:00 -0500
Subject: [PATCH] Move some things from builtins.h to new header files

This avoids that builtins.h has to include additional header files.
---
 contrib/bloom/blvalidate.c |  1 +
 contrib/citext/citext.c|  1 +
 contrib/dblink/dblink.c|  1 +
 contrib/fuzzystrmatch/fuzzystrmatch.c  |  1 +
 contrib/pageinspect/btreefuncs.c   |  1 +
 contrib/pageinspect/rawpage.c  |  1 +
 contrib/pgrowlocks/pgrowlocks.c|  1 +
 contrib/pgstattuple/pgstatindex.c  |  1 +
 contrib/pgstattuple/pgstattuple.c  |  1 +
 contrib/postgres_fdw/option.c  |  1 +
 contrib/tsearch2/tsearch2.c|  1 +
 contrib/unaccent/unaccent.c|  1 +
 src/backend/access/brin/brin_validate.c|  1 +
 src/backend/access/gin/ginvalidate.c   |  1 +
 src/backend/access/gist/gistvalidate.c |  1 +
 src/backend/access/hash/hashvalidate.c |  1 +
 src/backend/access/nbtree/nbtvalidate.c|  1 +
 src/backend/access/spgist/spgtextproc.c|  1 +
 src/backend/access/spgist/spgvalidate.c|  1 +
 src/backend/catalog/namespace.c|  1 +
 src/backend/catalog/objectaddress.c|  1 +
 src/backend/catalog/pg_proc.c  |  1 +
 src/backend/commands/extension.c   |  1 +
 src/backend/commands/indexcmds.c   |  1 +
 src/backend/commands/sequence.c|  1 +
 src/backend/commands/tablespace.c  |  1 +
 src/backend/commands/variable.c|  1 +
 src/backend/parser/parse_relation.c|  1 +
 src/backend/postmaster/postmaster.c|  1 +
 src/backend/replication/logical/logicalfuncs.c |  1 +
 src/backend/tsearch/dict_thesaurus.c   |  1 +
 src/backend/tsearch/wparser.c  |  1 +
 src/backend/utils/adt/acl.c|  1 +
 src/backend/utils/adt/jsonb_gin.c  |  1 +
 src/backend/utils/adt/jsonb_util.c |  1 +
 src/backend/utils/adt/regexp.c |  1 +
 src/backend/utils/adt/regproc.c|  2 ++
 src/backend/utils/adt/ruleutils.c  |  1 +
 src/backend/utils/adt/selfuncs.c   |  1 +
 src/backend/utils/adt/tid.c|  1 +
 src/backend/utils/adt/tsvector_op.c|  1 +
 src/backend/utils/adt/varchar.c|  1 +
 src/backend/utils/adt/varlena.c|  1 +
 src/backend/utils/cache/ts_cache.c |  1 +
 src/backend/utils/fmgr/funcapi.c   |  1 +
 src/backend/utils/init/miscinit.c  |  1 +
 src/backend/utils/misc/guc.c   |  1 +
 src/backend/utils/misc/rls.c   |  1 +
 src/include/utils/builtins.h   | 33 +--
 src/include/utils/regproc.h| 28 +++
 src/include/utils/varlena.h| 37 ++
 src/pl/plpgsql/src/pl_comp.c   |  1 +
 src/pl/plpgsql/src/pl_handler.c|  1 +
 53 files changed, 117 insertions(+), 32 deletions(-)
 create mode 100644 src/include/utils/regproc.h
 create mode 100644 src/include/utils/varlena.h

diff --git a/contrib/bloom/blvalidate.c b/contrib/bloom/blvalidate.c
index ab8b848b49..cb75d23199 100644
--- a/contrib/bloom/blvalidate.c
+++ b/contrib/bloom/blvalidate.c
@@ -21,6 +21,7 @@
 #include "catalog/pg_type.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
+#include "utils/regproc.h"
 #include "utils/syscache.h"
 
 #include "bloom.h"
diff --git a/contrib/citext/citext.c b/contrib/citext/citext.c
index 1174b70aa7..04f604b15f 100644
--- a/contrib/citext/citext.c
+++ b/contrib/citext/citext.c
@@ -7,6 +7,7 @@
 #include "catalog/pg_collation.h"
 #include "utils/builtins.h"
 #include "utils/formatting.h"
+#include

[HACKERS] SEGFAULT in HEAD with replication

2017-01-19 Thread Dave Cramer
The travis job https://travis-ci.org/pgjdbc/pgjdbc/jobs/192517342 is seeing
a segfault when we are testing against HEAD with REPLICATION turned on.

Logs can be found here
https://drive.google.com/drive/folders/0B-Heg5ZYCWbreEE4Uk5LdnJ5eWM?usp=sharing

Regards,

Dave Cramer


[HACKERS] comment ends in the middle of a

2017-01-19 Thread Robert Haas
ProcessStartupPacket() in src/backend/postmaster/postmaster.c contains
the following comment:

                /*
                 * Due to backward compatibility concerns the replication
                 * parameter is a hybrid beast which allows the value to be
                 * either boolean or the string 'database'. The latter
                 * connects to a specific database which is e.g. required for
                 * logical decoding while.
                 */

I'm not sure if the word "while" just needs to be removed there, or if
it's supposed to have more words after it.  But it doesn't seem
correct as-is.
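
If the trailing word is simply a leftover, one possible cleanup (a guess at
the intended wording, nothing more) would be:

                /*
                 * Due to backward compatibility concerns the replication
                 * parameter is a hybrid beast which allows the value to be
                 * either boolean or the string 'database'.  The latter
                 * connects to a specific database which is e.g. required
                 * for logical decoding.
                 */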

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to extract bytes from a bit/bit(n) Datum pointer?

2017-01-19 Thread valeriof
Thank you Ashutosh! I've been looking everywhere for this info 





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PoC: Grouped base relation

2017-01-19 Thread Ashutosh Bapat
>
>> Yes, it's hard, but I think without having a separate RelOptInfo the
>> design won't be complete. Is there a subset of problem that can be
>> solved by using a separate RelOptInfo e.g. pushing aggregates down
>> child relations or anything else.
>
> I'm still not convinced that all the fields of RelOptInfo (typically relids)
> need to be duplicated. If the current concept should be improved, I'd move all
> the grouping related fields to a separate structure, e.g. GroupPathInfo, and
> let RelOptInfo point to it. Similar to ParamPathInfo, which contains
> parameterization-specific information, GroupPathInfo would contain the
> grouping-specific information: target, row count, width, maybe path lists too.
>

I didn't think about this option. Still not very clean, but may be acceptable.


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Causal reads take II

2017-01-19 Thread Ants Aasma
On Thu, Jan 19, 2017 at 2:22 PM, Thomas Munro
 wrote:
> On Thu, Jan 19, 2017 at 8:11 PM, Ants Aasma  wrote:
>> On Tue, Jan 3, 2017 at 3:43 AM, Thomas Munro
>>  wrote:
>>> Long term, I think it would be pretty cool if we could develop a set
>>> of features that give you distributed sequential consistency on top of
>>> streaming replication.  Something like (this | causality-tokens) +
>>> SERIALIZABLE-DEFERRABLE-on-standbys[3] +
>>> distributed-dirty-read-prevention[4].
>>
>> Is it necessary that causal writes wait for replication before making
>> the transaction visible on the master? I'm asking because the per tx
>> variable wait time between logging commit record and making
>> transaction visible makes it really hard to provide matching
>> visibility order on master and standby.
>
> Yeah, that does seem problematic.  Even with async replication or no
> replication, isn't there already a race in CommitTransaction() where
> two backends could reach RecordTransactionCommit() in one order but
> ProcArrayEndTransaction() in the other order?  AFAICS using
> synchronous replication in one of the transactions just makes it more
> likely you'll experience such a visibility difference between the DO
> and REDO histories (!), by making RecordTransactionCommit() wait.
> Nothing prevents you getting a snapshot that can see t2 but not t1 in
> the DO history, while someone doing PITR or querying an asynchronous
> standby gets a snapshot that can see t1 but not t2 because those
> replay the REDO history.

Yes, there is a race even with all transactions having the same
synchronization level. But nobody will mind if we some day fix that
race. :) With different synchronization levels it is much trickier to
fix: either async commits must wait behind sync commits before becoming
visible, or return without becoming visible, or the visibility order
must differ from commit record LSN order. The first makes the async
commit feature useless, the second seems a no-go for semantic reasons,
and the third requires extra information to be sent to standbys so they
know the actual commit order.

>> In CSN based snapshot
>> discussions we came to the conclusion that to make standby visibility
>> order match master while still allowing for async transactions to
>> become visible before they are durable we need to make the commit
>> sequence a vector clock and transmit extra visibility ordering
>> information to standby's. Having one more level of delay between wal
>> logging of commit and making it visible would make the problem even
>> worse.
>
> I'd like to read that... could you please point me at the right bit of
> that discussion?

Some of that discussion was face to face at pgconf.eu, some of it is
here: 
https://www.postgresql.org/message-id/CA+CSw_vbt=cwluogr7gxdpnho_y4cz7x97+o_bh-rfo7ken...@mail.gmail.com

Let me know if you have any questions.

>> It seems that fixing that would
>> require either keeping some per client state or a global agreement on
>> what snapshots are safe to provide, both of which you tried to avoid
>> for this feature.
>
> Agreed.  You briefly mentioned this problem in the context of pairs of
> read-only transactions a while ago[1].  As you said then, it does seem
> plausible to do that with a token system that gives clients the last
> commit LSN from the snapshot used by a read only query, so that you
> can ask another standby to make sure that LSN has been replayed before
> running another read-only transaction.  This could be handled
> explicitly by a motivated client that is talking to multiple nodes.  A
> more general problem is client A telling client B to go and run
> queries and expecting B to see all transactions that A has seen; it
> now has to pass the LSN along with that communication, or rely on some
> kind of magic proxy that sees all transactions, or a radically
> different system with a GTM.

If/when we do CSN based snapshots, adding a GTM could be relatively
straightforward. It's basically not all that far from what Spanner is
doing by using a timestamp as the snapshot. But this is all relatively
independent of this patch.
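
To make the token-passing idea discussed above concrete, here is a
minimal sketch using libpq (an illustration only, not part of this
patch; it assumes the 9.6-era function names pg_current_xlog_location()
and pg_last_xlog_replay_location() and a simple polling loop):

/*
 * Illustration only: client-driven causality token.  After client A
 * commits on the primary, it captures a token with
 *     SELECT pg_current_xlog_location();
 * and hands it to client B, which calls wait_for_token() against the
 * standby before running its read-only queries.
 */
#include <string.h>
#include <unistd.h>
#include <libpq-fe.h>

static void
wait_for_token(PGconn *standby, const char *token)
{
	for (;;)
	{
		const char *values[1] = {token};
		PGresult   *res;
		int			caught_up;

		res = PQexecParams(standby,
						   "SELECT pg_last_xlog_replay_location() >= $1::pg_lsn",
						   1, NULL, values, NULL, NULL, 0);
		caught_up = (PQresultStatus(res) == PGRES_TUPLES_OK &&
					 strcmp(PQgetvalue(res, 0, 0), "t") == 0);
		PQclear(res);
		if (caught_up)
			break;				/* standby has replayed past the token */
		usleep(10 * 1000);		/* poll every 10 ms */
	}
}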

Regards,
Ants Aasma


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Microvacuum support for Hash Index

2017-01-19 Thread Jesper Pedersen

Hi Ashutosh,

On 01/10/2017 08:40 AM, Jesper Pedersen wrote:

On 01/10/2017 05:24 AM, Ashutosh Sharma wrote:

Thanks for reporting this problem. It is basically happening because I
forgot to unpin the bucketbuf in hash_xlog_vacuum_one_page(). Please
find the attached v5 patch that fixes the issue.



The crash is now fixed, but the

--- test.sql ---
\set id random(1,10)
\set val random(0,10)
BEGIN;
UPDATE test SET val = :val WHERE id = :id;
COMMIT;
--- test.sql ---

case gives

client 6 aborted in command 3 of script 0; ERROR:  deadlock detected
DETAIL:  Process 14608 waits for ShareLock on transaction 1444620;
blocked by process 14610.
Process 14610 waits for ShareLock on transaction 1444616; blocked by
process 14608.
HINT:  See server log for query details.
CONTEXT:  while rechecking updated tuple (12,3) in relation "test"
...

using pgbench -M prepared -c 10 -j 10 -T 300 -f test.sql test



I'm not seeing this deadlock with just the WAL v8 patch applied.

Best regards,
 Jesper



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Fix documentation typo

2017-01-19 Thread Ishii Ayumi
Hi,

Here is a patch that adds a description of the temporary column to the
pg_replication_slots documentation.

Regards,
--
Ayumi Ishii


fix_document_typo.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch to implement pg_current_logfile() function

2017-01-19 Thread Karl O. Pinc
On Thu, 19 Jan 2017 16:59:10 +0900
Michael Paquier  wrote:

> On Thu, Jan 19, 2017 at 7:27 AM, Alvaro Herrera
>  wrote:

> > I don't think the "abstract names" stuff is an improvement (just
> > look at the quoting mess in ConfigureNamesString).  I think we
> > should do without that; at least as part of this patch.  If you
> > think there's code that can get better because of the idea, let's
> > see it.  

Fair enough.

FWIW, when I first wrote the "abstract names" stuff there were another
half dozen or so occurrences of the constants.

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-01-19 Thread Pavan Deolasee
Hi Alvaro,

On Tue, Jan 17, 2017 at 8:41 PM, Alvaro Herrera 
wrote:

> Reading through the track_root_lp patch now.
>
>
Thanks for the review.


> > + /*
> > +  * For HOT (or WARM) updated tuples, we store the offset
> of the root
> > +  * line pointer of this chain in the ip_posid field of the
> new tuple.
> > +  * Usually this information will be available in the
> corresponding
> > +  * field of the old tuple. But for aborted updates or
> pg_upgraded
> > +  * databases, we might be seeing the old-style CTID chains
> and hence
> > +  * the information must be obtained by hard way
> > +  */
> > + if (HeapTupleHeaderHasRootOffset(oldtup.t_data))
> > + root_offnum = HeapTupleHeaderGetRootOffset(o
> ldtup.t_data);
> > + else
> > + heap_get_root_tuple_one(page,
> > + ItemPointerGetOffsetNumber(&(
> oldtup.t_self)),
> > + &root_offnum);
>
> Hmm.  So the HasRootOffset tests the HEAP_LATEST_TUPLE bit, which is
> reset temporarily during an update.  So that case shouldn't occur often.
>

Right. The root offset is stored only in those tuples where
HEAP_LATEST_TUPLE is set. This flag should generally be set on the tuples
that are being updated, except for the case when the last update failed and
the flag was cleared. The other common case is a pg_upgraded cluster, where
none of the existing tuples will have this flag set. So in those cases, we
must find the root line pointer the hard way.


>
> Oh, I just noticed that HeapTupleHeaderSetNextCtid also clears the flag.
>

Yes, but this should happen only during updates and unless the update
fails, the next-to-be-updated tuple should have the flag set.


>
> > @@ -4166,10 +4189,29 @@ l2:
> >   HeapTupleClearHotUpdated(&oldtup);
> >   HeapTupleClearHeapOnly(heaptup);
> >   HeapTupleClearHeapOnly(newtup);
> > + root_offnum = InvalidOffsetNumber;
> >   }
> >
> > - RelationPutHeapTuple(relation, newbuf, heaptup, false);
>  /* insert new tuple */
> > + /* insert new tuple */
> > + RelationPutHeapTuple(relation, newbuf, heaptup, false,
> root_offnum);
> > + HeapTupleHeaderSetHeapLatest(heaptup->t_data);
> > + HeapTupleHeaderSetHeapLatest(newtup->t_data);
> >
> > + /*
> > +  * Also update the in-memory copy with the root line pointer
> information
> > +  */
> > + if (OffsetNumberIsValid(root_offnum))
> > + {
> > + HeapTupleHeaderSetRootOffset(heaptup->t_data,
> root_offnum);
> > + HeapTupleHeaderSetRootOffset(newtup->t_data, root_offnum);
> > + }
> > + else
> > + {
> > + HeapTupleHeaderSetRootOffset(heaptup->t_data,
> > + ItemPointerGetOffsetNumber(&h
> eaptup->t_self));
> > + HeapTupleHeaderSetRootOffset(newtup->t_data,
> > + ItemPointerGetOffsetNumber(&h
> eaptup->t_self));
> > + }
>
> This is repetitive.  I think after RelationPutHeapTuple it'd be better
> to assign root_offnum = &heaptup->t_self, so that we can just call
> SetRootOffset() on each tuple without the if().
>

Fixed. I actually ripped out HeapTupleHeaderSetRootOffset() completely and
pushed the setting of the root line pointer into HeapTupleHeaderSetHeapLatest().
That seems much cleaner because the system expects to find the root line
pointer whenever the HEAP_LATEST_TUPLE flag is set. Hence it makes sense to
set them together.
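
Roughly, the combined macro could look like the sketch below. This is a
hypothetical illustration only: the two-argument signature, the flag word
holding HEAP_LATEST_TUPLE, and the use of the ip_posid field of t_ctid
are assumptions based on the discussion above, not the actual patch.

/*
 * Hypothetical sketch only -- the real macro in the WARM patch may differ.
 * Marks the tuple as the latest in its update chain and records the root
 * line pointer offset in the ip_posid field of t_ctid, so both pieces of
 * state are always set together.
 */
#define HeapTupleHeaderSetHeapLatest(tup, root_offnum) \
do { \
	(tup)->t_infomask2 |= HEAP_LATEST_TUPLE; \
	ItemPointerSetOffsetNumber(&(tup)->t_ctid, (root_offnum)); \
} while (0)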


>
>
> > + HeapTupleHeaderSetHeapLatest((HeapTupleHeader) item);
> > + if (OffsetNumberIsValid(root_offnum))
> > + HeapTupleHeaderSetRootOffset((HeapTupleHeader)
> item,
> > + root_offnum);
> > + else
> > + HeapTupleHeaderSetRootOffset((HeapTupleHeader)
> item,
> > + offnum);
>
> Just a matter of style, but this reads nicer IMO:
>
> HeapTupleHeaderSetRootOffset((HeapTupleHeader) item,
> OffsetNumberIsValid(root_offnum) ? root_offnum : offnum);
>

Understood. This code no longer exists in the new patch since
HeapTupleHeaderSetRootOffset is merged with HeapTupleHeaderSetHeapLatest.


>
>
> > @@ -740,8 +742,9 @@ heap_page_prune_execute(Buffer buffer,
> >   * holds a pin on the buffer. Once pin is released, a tuple might be
> pruned
> >   * and reused by a completely unrelated tuple.
> >   */
> > -void
> > -heap_get_root_tuples(Page page, OffsetNumber *root_offsets)
> > +static void
> > +heap_get_root_tuples_internal(Page page, OffsetNumber target_offnum,
> > + OffsetNumber *root_offsets)
> >  {
> >   OffsetNumber offnum,
>
> I think this function deserves more/better/updated commentary.
>

Sure. I added more commentary. I also reworked the function so that the
caller can pass

Re: [HACKERS] PoC: Grouped base relation

2017-01-19 Thread Antonin Houska
Ashutosh Bapat  wrote:

> >> 1. Pushing down aggregates/groups down join tree, so that the number of 
> >> rows
> >> to be joined decreases.  This might be a good optimization to have. However
> >> there are problems in the current patch. Every path built for a relation
> >> (join or base) returns the same result expressed by the relation or its
> >> subset restricted by parameterization or unification. But this patch 
> >> changes
> >> that. It creates paths which represent grouping in the base relation.  I
> >> think, we need a separate relation to represent that result and hold paths
> >> which produce that result. That itself would be a sizable patch.
> >
> > Whether a separate relation (RelOptInfo) should be created for grouped
> > relation is an important design decision indeed. More important than your
> > argument about the same result ("partial path", used to implement parallel
> > nodes actually does not fit this criterion perfectly - it only returns part 
> > of
> > the set) is the fact that the data type (target) differs.
> > I even spent some time coding a prototype where separate RelOptInfo is 
> > created
> > for the grouped relation but it was much more invasive. In particular, if 
> > only
> > some relations are grouped, it's hard to join them with non-grouped ones w/o
> > changing make_rel_from_joinlist and subroutines substantially. (Decision
> > whether the plain or the grouped relation should be involved in joining 
> > makes
> > little sense at the leaf level of the join tree.)
> >
> > So I took the approach that resembles the partial paths - separate pathlists
> > within the same RelOptInfo.

> Yes, it's hard, but I think without having a separate RelOptInfo the
> design won't be complete. Is there a subset of problem that can be
> solved by using a separate RelOptInfo e.g. pushing aggregates down
> child relations or anything else.

I'm still not convinced that all the fields of RelOptInfo (typically relids)
need to be duplicated. If the current concept should be improved, I'd move all
the grouping related fields to a separate structure, e.g. GroupPathInfo, and
let RelOptInfo point to it. Similar to ParamPathInfo, which contains
parameterization-specific information, GroupPathInfo would contain the
grouping-specific information: target, row count, width, maybe path lists too.
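
To illustrate what I have in mind, a rough sketch (the struct and field
names are only assumptions, following the ParamPathInfo analogy; this is
not a proposal of the exact layout):

/*
 * Rough sketch only: grouping-specific information hung off RelOptInfo,
 * analogous to how ParamPathInfo holds parameterization-specific
 * information.
 */
#include "postgres.h"
#include "nodes/relation.h"

typedef struct GroupPathInfo
{
	NodeTag		type;
	PathTarget *gpi_target;		/* grouped target: grouping keys + Aggrefs */
	double		gpi_rows;		/* estimated number of groups */
	int			gpi_width;		/* estimated width of a grouped row */
	List	   *gpi_pathlist;	/* paths producing the grouped result */
} GroupPathInfo;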

> >
> >> 2. Try to push down aggregates based on the equivalence classes, where
> >> grouping properties can be transferred from one relation to the other using
> >> EC mechanism.
> >
> > I don't think the EC part should increase the patch complexity a lot. 
> > Unless I
> > missed something, it's rather isolated to the part where target of the 
> > grouped
> > paths is assembled. And I think it's important even for initial version of 
> > the
> > patch.
> >
> >> This seems to require solving the problem of combining aggregates across 
> >> the
> >> relations. But there might be some usecases which could benefit without
> >> solving this problem.
> >
> > If "combining aggregates ..." refers to joining grouped relations, then I
> > insist on doing this in the initial version of the new feature too. 
> > Otherwise
> > it'd only work if exactly one base relation of the query is grouped.
> 
> No. "combining aggregates" refers to what aggtransmultifn does. But,
> possibly that problem needs to be solved in the first step itself.

ok. As the discussion goes on, I see that this part could be more useful than
I originally thought. I'll consider it.

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree

2017-01-19 Thread Andrew Borodin
Hi, Jeff!

Thanks for review!

Here's a patch corrected according to your suggestions.

2017-01-19 11:48 GMT+05:00 Jeff Davis :
> https://www.postgresql.org/message-id/CAJEAwVE4UAmm8fr%2BNW8XTnKV6M--ACoNhL3ES8yoKL2sKhbaiw%40mail.gmail.com
>
> Let me re-summarize what's been done here to make sure I understand:
>
> Each key in GIN has its own posting tree, which is itself a btree,
> holding all of the tuples that contain that key. This posting tree is
> really just a set of tuples, and searches always scan an entire
> posting tree (because all the tuples contain the key, so all are a
> potential match).
>
> Currently, the cleanup process requires blocking all reads and writes
> in the posting list. I assume this is only a problem when a key is
> common enough that its posting list is quite large (how large?).
The posting tree must be at least 3 levels high for this patch to be
useful. I guess that is at least 1 postings of a single term
(assuming an average of a hundred tuples per page).

> This patch makes a couple changes to improve concurrency:
>
> 1. Vacuum leaves first, without removing any pages. Instead, just keep
> track of pages which are completely empty (more generally, subtrees
> that contain empty pages).
>
> 2. Free the empty pages in those subtrees. That requires blocking
> reads and writes to that subtree, but not the entire posting list.
> Blocking writes is easy: acquiring a cleanup lock on the root of any
> subtree always blocks any writes to that subtree, because the write
> keeps a pin on all pages in that path. Blocking reads is accomplished
> by locking the page to be deleted, then locking/unlocking the page one
> left of that (which may be outside the subtree?).
No, the leftmost page within the subtree. It may be referenced by a
rightlink from outside the subtree; that's the reason we keep it alive
even if it's empty.

> Maybe a basic question, but why is the posting list a btree at all,
> and not, say a doubly-linked list?
When GIN executes searches, it usually has to fetch documents that are in
the intersection of several posting lists. It is faster to intersect
B-trees if common elements are rare.
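A simplified illustration of that point (my own sketch, not GIN code):
with ordered structures the intersection can skip ahead instead of
scanning every item, which wins when matches are rare.

/*
 * Simplified sketch, not actual GIN code: intersect two sorted arrays of
 * item "pointers" (here just integers).  Because both inputs are ordered,
 * we can binary-search forward instead of walking item by item -- the same
 * reason ordered posting trees intersect cheaply when common elements are
 * rare.
 */
#include <stddef.h>

static size_t
lower_bound(const int *a, size_t lo, size_t hi, int key)
{
	while (lo < hi)
	{
		size_t		mid = lo + (hi - lo) / 2;

		if (a[mid] < key)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

static size_t
intersect_sorted(const int *a, size_t na, const int *b, size_t nb, int *out)
{
	size_t		i = 0, j = 0, n = 0;

	while (i < na && j < nb)
	{
		if (a[i] == b[j])
		{
			out[n++] = a[i];
			i++;
			j++;
		}
		else if (a[i] < b[j])
			i = lower_bound(a, i + 1, na, b[j]);	/* skip ahead in a */
		else
			j = lower_bound(b, j + 1, nb, a[i]);	/* skip ahead in b */
	}
	return n;
}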

>
> Review of the code itself:
>
> * Nice and simple, with a net line count of only +45.
> * Remove commented-out code.
I've removed the code, though its purpose was to highlight the tricky
concurrency places that changed.
> * In the README, "leaf-to-right" should be left-to-right; and "takinf"
> should be "taking".
Fixed
> * When you say the leftmost page is never deleted; do you mean the
> leftmost of the entire posting tree, or leftmost of the subtree that
> you are removing pages from?
The leftmost page of the subtree that contains the totally empty pages.
It's not necessarily the leftmost page of the whole posting tree.
> * In order to keep out concurrent reads, you need to lock/unlock the
> left page while holding exclusive lock on the page being deleted, but
> I didn't see how that happens exactly in the code. Where does that
> happen?
In the function ginDeletePage():
LockBuffer(lBuffer, GIN_EXCLUSIVE);
We have to mark this page dirty, since its rightlink is changed. We
cannot mark it dirty without locking it, even if we know for sure that
no reader can reach it (due to the cleanup lock at the root of the
cleaned subtree).

Thank you for your suggestions. Do not hesitate to ask any questions;
concurrency and GIN are both very interesting topics.

Best regards, Andrey Borodin.
diff --git a/src/backend/access/gin/README b/src/backend/access/gin/README
index fade0cb..990b5ff 100644
--- a/src/backend/access/gin/README
+++ b/src/backend/access/gin/README
@@ -314,10 +314,17 @@ deleted.
 The previous paragraph's reasoning only applies to searches, and only to
 posting trees. To protect from inserters following a downlink to a deleted
 page, vacuum simply locks out all concurrent insertions to the posting tree,
-by holding a super-exclusive lock on the posting tree root. Inserters hold a
-pin on the root page, but searches do not, so while new searches cannot begin
-while root page is locked, any already-in-progress scans can continue
-concurrently with vacuum. In the entry tree, we never delete pages.
+by holding a super-exclusive lock on the parent page of subtree with deletable
+pages. Inserters hold a pin on the root page, but searches do not, so while
+new searches cannot begin while root page is locked, any already-in-progress
+scans can continue concurrently with vacuum in corresponding subtree of
+posting tree. To exclude interference with readers vacuum takes exclusive
+locks in a depth-first scan in left-to-right order of page tuples. Leftmost
+page is never deleted. Thus before deleting any page we obtain exclusive
+lock on any left page, effectively excluding deadlock with any reader, despite
+taking parent lock before current and left lock after current. We take left
+lock not for a concurrency reasons, but rather in need to mark page dirty.
+In the entry tree, we never delete pages.
 
 (This is quite different from the mechanism the btree i

Re: [HACKERS] How to extract bytes from a bit/bit(n) Datum pointer?

2017-01-19 Thread Ashutosh Bapat
On Thu, Jan 19, 2017 at 5:53 PM, valeriof  wrote:
> Hi,
>
> This may come from my lack of experience with Postgres, but I'm trying to
> extract the byte portion of a Datum that is of type VarBit - bit/bit(n). I
> see that the Datum pointer contains the value content of the bytes (after a
> few bytes for the header) but I would need to point to the actual value
> (plus also the number of bytes to be picked up).


>
> I currently have a similar issue with BYTEAOID columns. I have a Datum
> pointer but can't get the actual content.

If you have not looked at DatumGetVarBitP() and DatumGetByteaP(), those
will get you the corresponding structure pointers from a Datum. Then check
src/backend/utils/adt/varbit.c and the bytea_* functions in
src/backend/utils/adt/varlena.c to understand how those structures are
used.
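
For example, something along these lines (an untested sketch; see
utils/varbit.h and postgres.h for the macros used):

/*
 * Untested sketch: pulling the raw bytes out of a bit/varbit Datum and a
 * bytea Datum.  DatumGetVarBitP()/DatumGetByteaP() detoast if necessary.
 */
#include "postgres.h"
#include "fmgr.h"
#include "utils/varbit.h"

static void
inspect_datums(Datum bit_datum, Datum bytea_datum)
{
	VarBit	   *vb = DatumGetVarBitP(bit_datum);
	bits8	   *bit_bytes = VARBITS(vb);		/* the bit-string bytes */
	int			nbits = VARBITLEN(vb);			/* number of valid bits */
	int			nbytes = VARBITBYTES(vb);		/* bytes holding those bits */

	bytea	   *ba = DatumGetByteaP(bytea_datum);
	char	   *data = VARDATA(ba);				/* raw byte content */
	int			len = VARSIZE(ba) - VARHDRSZ;	/* content length in bytes */

	elog(DEBUG1, "varbit: %d bits in %d bytes; bytea: %d bytes",
		 nbits, nbytes, len);
	(void) bit_bytes;
	(void) data;
}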

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: function xmltable

2017-01-19 Thread Pavel Stehule
2017-01-19 13:35 GMT+01:00 Alvaro Herrera :

> Pavel Stehule wrote:
> > Hi
> >
> > New update - rebase after yesterday changes.
> >
> > What you want to change?
>
> I think the problem might come from the still pending patch on that
> series, which Andres posted in
> https://www.postgresql.org/message-id/20170118221154.
> aldebi7yyjvds...@alap3.anarazel.de
> As far as I understand, minor details of that patch might change before
> commit, but it is pretty much in definitive form.
>

OK, we have to wait. Please check the XML part and let me know if it is good for you.

Regards

Pavel


>
> --
> Álvaro Herrerahttps://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


  1   2   >