Re: [HACKERS] Declarative partitioning

2016-02-19 Thread Amit Langote
On Sat, Feb 20, 2016 at 1:41 PM, Peter Eisentraut  wrote:
> On 2/16/16 9:56 PM, Amit Langote wrote:
>> From now on, instead of attaching multiple files like in the previous
>> message, I will send a single tar.gz which will contain patches created by
>> git-format-patch.
>
> Please don't do that.

OK, will remember. Sorry about that.

Thanks,
Amit




Re: [HACKERS] [JDBC] JDBC behaviour

2016-02-19 Thread Sridhar N Bamandlapally
Hi All,

I understand your point; perhaps I didn't understand everyone, or everyone
didn't understand me.

One PostgreSQL feature has been implemented in terms of a different feature
in Java (my subject is PostgreSQL::autocommit vs. JDBC::setAutoCommit):
PostgreSQL's "set autocommit to FALSE" is implemented by the driver as
JDBC::"BEGIN--END".

Currently, PostgreSQL itself does not support "set autocommit to FALSE".

Say in the future PostgreSQL comes up with proper support for "set
autocommit to FALSE"; will the JDBC team then change the code to JDBC::"set
autocommit to FALSE"? And what about applications that depend on the
existing behavior?

This could have been handled differently, for example by blog posts advising
users to add "BEGIN--END" to their JDBC connection queries, with a warning.

Put simply: if the PostgreSQL database does not support it, the PostgreSQL
JDBC driver should not either. If JDBC still wants to support it, it should
do so with the expected behavior; why was a different feature used to
implement this one?

Basically, the decision/review seems to be wrong; maybe the bug is in the
decision.

And the reason we keep this thread going:

1. Every application developer's expected behavior matches; only the
PostgreSQL JDBC team is not in sync.
2. Every organisation wants its applications to be multi-database
compatible; only the PostgreSQL JDBC team disagrees.

However, I am looping in -hackers and ending this thread.

Sorry for using hard words (if any), but as open source we need to be
completely transparent.


Thanks
Sridhar





On Thu, Feb 18, 2016 at 11:03 PM, Kevin Wooten  wrote:

> Using ‘psql’ executing your example would yield the same result, a command
> error would cause a required rollback before proceeding.  This tells you
> that this is how PostgreSQL, the database, is designed to work. It has
> nothing to do with the Java driver implementation.
>
> You are asking the creators of a client driver implementation to change a
> fundamental behavior of the database.  Repeatedly people have suggested you
> take this up with those creating the actual database (that’s the request to
> move this to the ‘-hackers’ list); yet you persist.
>
> I’m only chiming in because it’s getting quite annoying to have you keep
> this thread alive when the situation has been made quite clear to you.
>
> On Feb 18, 2016, at 9:57 AM, Sridhar N Bamandlapally <
> sridhar@gmail.com> wrote:
>
> There are many reasons why this is required:
>
> 1. The percentage of clients migrating to Postgres is high.
>
> 2. To application developers this looks like a bug in Postgres, as it throws
> an exception for the next statement even when the current exception was
> suppressed/handled.
>
> 3. Most non-financial or data-warehouse applications have a batch
> transaction process where successful transactions go into data tables
> and failed transactions go into error-log tables.
>
> This is a very common requirement.
>
> We cannot afford a situation where a client considers rolling back to the
> old database or feels the requirements are not met -- please ignore if so.
>
>
>
> On Thu, Feb 18, 2016 at 7:06 PM, Mark Rotteveel 
> wrote:
>
>> On Thu, 18 Feb 2016 13:48:04 +0100 (CET), Andreas Joseph Krogh
>>  wrote:
>> >  I understand that and indeed this isn't something that should be
>> handled
>> >  by the driver, however some of the response in this thread seem to
>> think
>> >  it
>> >  is an absurd expectation from the OP that failure of one statement
>> should
>> >  still allow a commit. Which it isn't if you look at what other database
>> >  systems do.
>> >
>> >  Mark
>> >
>> > If that one failed statement doesn't raise an exception, how does the
>> > client
>> > (code) know that it failed? If it does raise an exception, then what
>> > standard
>> > specifies that that specific exceptions is to be treated as "don't
>> > rollback for
>> > this type of error"?
>>
>> Of course an exception is raised, but the exact handling could then be
>> left to the client. For example the client could catch the exception,
>> decide based on the specific error to execute another statement to "fix"
>> the error condition and then commit. Think of INSERT, duplicate key, then
>> UPDATE before the existence of 'UPSERT'-like statements; if the occurrence
>> of a duplicate key is rare, this can be cheaper than first doing a SELECT to
>> check for existence and then INSERT or UPDATE, or doing an UPDATE followed by
>> an INSERT when the update count = 0. Another situation could be where the
>> failure is not important (e.g. it was only a log entry that is considered
>> supporting, not required), so the exception is ignored and the transaction
>> as a whole is committed.
>>
>> Sure, in most cases it is abusing exceptions for flow control and likely
>> an example of bad design, but the point is that it is not outlandish to
>> allow execution of other statements and eventually a commit of a
>> transaction even if one or more statements failed in that transaction; as
>> demonstrated by systems that do allow this (for SQL Server you need to set
>> 
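For reference, the catch-and-recover pattern described above is achievable in
PostgreSQL today using savepoints, which reset the transaction's error state
without abandoning it. A minimal JDBC sketch (table t and its columns are
hypothetical):

import java.sql.*;

public class InsertOrUpdate {
    // Try INSERT; on unique_violation, roll back to a savepoint and UPDATE,
    // leaving the surrounding transaction committable.
    static void upsert(Connection conn, int id, String val) throws SQLException {
        conn.setAutoCommit(false);
        Savepoint sp = conn.setSavepoint();
        try (PreparedStatement ins =
                 conn.prepareStatement("INSERT INTO t (id, val) VALUES (?, ?)")) {
            ins.setInt(1, id);
            ins.setString(2, val);
            ins.executeUpdate();
        } catch (SQLException e) {
            if (!"23505".equals(e.getSQLState()))   // 23505 = unique_violation
                throw e;
            conn.rollback(sp);                      // transaction stays usable
            try (PreparedStatement upd =
                     conn.prepareStatement("UPDATE t SET val = ? WHERE id = ?")) {
                upd.setString(1, val);
                upd.setInt(2, id);
                upd.executeUpdate();
            }
        }
        conn.commit();
    }
}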

Re: [HACKERS] Declarative partitioning

2016-02-19 Thread Peter Eisentraut
On 2/16/16 9:56 PM, Amit Langote wrote:
> From now on, instead of attaching multiple files like in the previous
> message, I will send a single tar.gz which will contain patches created by
> git-format-patch.

Please don't do that.





[HACKERS] GCC 6 warning fixes

2016-02-19 Thread Peter Eisentraut
Here are three patches to fix new warnings in GCC 6.

0001 is apparently a typo.

0002 was just (my?) stupid code to begin with.

0003 is more of a workaround.  There could be other ways to address this, too.
From 1e5bf0bdcd86b807d881ea82245275389083ec75 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 19 Feb 2016 23:07:46 -0500
Subject: [PATCH 1/3] ecpg: Fix typo

GCC 6 points out the redundant conditions, which were apparently typos.
---
 src/interfaces/ecpg/test/compat_informix/describe.pgc| 2 +-
 src/interfaces/ecpg/test/expected/compat_informix-describe.c | 2 +-
 src/interfaces/ecpg/test/expected/sql-describe.c | 2 +-
 src/interfaces/ecpg/test/sql/describe.pgc| 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/interfaces/ecpg/test/compat_informix/describe.pgc b/src/interfaces/ecpg/test/compat_informix/describe.pgc
index 1836ac3..6f6 100644
--- a/src/interfaces/ecpg/test/compat_informix/describe.pgc
+++ b/src/interfaces/ecpg/test/compat_informix/describe.pgc
@@ -150,7 +150,7 @@ exec sql end declare section;
 	exec sql describe st_id2 using descriptor sqlda2;
 	exec sql describe st_id2 into sqlda3;
 
-	if (sqlda1 == NULL || sqlda1 == NULL || sqlda2 == NULL)
+	if (sqlda1 == NULL || sqlda2 == NULL || sqlda3 == NULL)
 		exit(1);
 
 	strcpy(msg, "get descriptor");
diff --git a/src/interfaces/ecpg/test/expected/compat_informix-describe.c b/src/interfaces/ecpg/test/expected/compat_informix-describe.c
index 5951ae6..9eb176e 100644
--- a/src/interfaces/ecpg/test/expected/compat_informix-describe.c
+++ b/src/interfaces/ecpg/test/expected/compat_informix-describe.c
@@ -362,7 +362,7 @@ if (sqlca.sqlcode < 0) exit (1);}
 #line 151 "describe.pgc"
 
 
-	if (sqlda1 == NULL || sqlda1 == NULL || sqlda2 == NULL)
+	if (sqlda1 == NULL || sqlda2 == NULL || sqlda3 == NULL)
 		exit(1);
 
 	strcpy(msg, "get descriptor");
diff --git a/src/interfaces/ecpg/test/expected/sql-describe.c b/src/interfaces/ecpg/test/expected/sql-describe.c
index 356f587..d0e39e9 100644
--- a/src/interfaces/ecpg/test/expected/sql-describe.c
+++ b/src/interfaces/ecpg/test/expected/sql-describe.c
@@ -360,7 +360,7 @@ if (sqlca.sqlcode < 0) exit (1);}
 #line 151 "describe.pgc"
 
 
-	if (sqlda1 == NULL || sqlda1 == NULL || sqlda2 == NULL)
+	if (sqlda1 == NULL || sqlda2 == NULL || sqlda3 == NULL)
 		exit(1);
 
 	strcpy(msg, "get descriptor");
diff --git a/src/interfaces/ecpg/test/sql/describe.pgc b/src/interfaces/ecpg/test/sql/describe.pgc
index cd52c82..b95ab35 100644
--- a/src/interfaces/ecpg/test/sql/describe.pgc
+++ b/src/interfaces/ecpg/test/sql/describe.pgc
@@ -150,7 +150,7 @@ exec sql end declare section;
 	exec sql describe st_id2 using descriptor sqlda2;
 	exec sql describe st_id2 into sqlda3;
 
-	if (sqlda1 == NULL || sqlda1 == NULL || sqlda2 == NULL)
+	if (sqlda1 == NULL || sqlda2 == NULL || sqlda3 == NULL)
 		exit(1);
 
 	strcpy(msg, "get descriptor");
-- 
2.7.1

From 996a5d27a2bc79e1e0b2ee0aa39cfdc7615e874c Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 19 Feb 2016 23:07:46 -0500
Subject: [PATCH 2/3] psql: Fix some strange code in SQL help creation

Struct QL_HELP used to be defined as static in the sql_help.h header
file, which is included in sql_help.c and help.c, thus creating two
separate instances of the struct.  This causes a warning from GCC 6,
because the struct is not used in sql_help.c.

Instead, declare the struct as extern in the header file and define it
in sql_help.c.  This also allows making a bunch of functions static
because they are no longer needed outside of sql_help.c.
---
 src/bin/psql/create_help.pl | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/src/bin/psql/create_help.pl b/src/bin/psql/create_help.pl
index fedcc47..b9b8e87 100644
--- a/src/bin/psql/create_help.pl
+++ b/src/bin/psql/create_help.pl
@@ -59,8 +59,6 @@
 #ifndef $define
 #define $define
 
-#define N_(x) (x) /* gettext noop */
-
 #include \"postgres_fe.h\"
 #include \"pqexpbuffer.h\"
 
@@ -72,6 +70,7 @@
 	int			nl_count;	/* number of newlines in syntax (for pager) */
 };
 
+extern const struct _helpStruct QL_HELP[];
 ";
 
 print CFILE "/*
@@ -83,6 +82,8 @@
  *
  */
 
+#define N_(x) (x) /* gettext noop */
+
 #include \"$hfile\"
 
 ";
@@ -170,8 +171,7 @@
 	$synopsis =~ s/\\n/\\n"\n$prefix"/g;
 	my @args =
 	  ("buf", $synopsis, map("_(\"$_\")", @{ $entries{$_}{params} }));
-	print HFILE "extern void sql_help_$id(PQExpBuffer buf);\n";
-	print CFILE "void
+	print CFILE "static void
 sql_help_$id(PQExpBuffer buf)
 {
 \tappendPQExpBuffer(" . join(",\n$prefix", @args) . ");
@@ -180,15 +180,14 @@
 ";
 }
 
-print HFILE "
-
-static const struct _helpStruct QL_HELP[] = {
+print CFILE "
+const struct _helpStruct QL_HELP[] = {
 ";
 foreach (sort keys %entries)
 {
 	my $id = $_;
 	$id =~ s/ /_/g;
-	print HFILE "{ \"$_\",
+	print CFILE "{ \"$_\",
   N_(\"$entries{$_}{cmddesc}\"),
   

Re: [HACKERS] exposing pg_controldata and pg_config as functions

2016-02-19 Thread Joe Conway
On 01/17/2016 04:10 PM, Joe Conway wrote:
> On 01/16/2016 06:02 AM, Michael Paquier wrote:
>> On Wed, Dec 30, 2015 at 9:08 AM, Joe Conway  wrote:
>>> 3) Adds new functions, more or less in line with previous discussions:
>>>* pg_checkpoint_state()
>>>* pg_controldata_state()
>>>* pg_recovery_state()
>>>* pg_init_state()
>>
>> Taking the opposite direction of Josh upthread, why is this split
>> actually necessary? Isn't the idea to provide a SQL interface of what
>> pg_controldata shows? If this split proves to be useful, shouldn't we
>> do it as well for pg_controldata?
> 
> The last discussion moved strongly in the direction of separate
> functions, and that being different from pg_controldata was not a bad
> thing. That said, I'm still of the opinion that there are legitimate
> reasons to want the command line pg_controldata and the SQL functions to
> produce equivalent, if not identical, results. I just wish we could get
> a clear consensus one way or the other.

I've assumed that we are sticking with the separate functions. As such,
here is a rebased patch, with documentation and other fixes such as
Copyright year, Mkvcbuild support, and some cruft removal.

>> I think that those functions should be superuser-only. They provide
>> information about the system globally.
> 
> The tail of this thread seems to be headed away from this direction.
> I'll take another look and propose something concrete.

I've looked at existing functions that seem similar, and as far as I can
see none are superuser-only. I'm certainly happy to make them so if
that's the consensus, but currently they are wide open. Opinions?

For convenience in answering that question, here is what information is
included in the output of each function (\df so you can see the data
types, plus SELECT output for a more readable example):

8<-
postgres=# \x
Expanded display is on.

postgres=# \df pg_checkpoint_state
Name                | pg_checkpoint_state
Result data type    | record
Argument data types | OUT checkpoint_location pg_lsn, OUT prior_location
pg_lsn, OUT redo_location pg_lsn, OUT redo_wal_file text, OUT
timeline_id integer, OUT prev_timeline_id integer, OUT full_page_writes
boolean, OUT next_xid text, OUT next_oid oid, OUT next_multixact_id xid,
OUT next_multi_offset xid, OUT oldest_xid xid, OUT oldest_xid_dbid oid,
OUT oldest_active_xid xid, OUT oldest_multi_xid xid, OUT
oldest_multi_dbid oid, OUT oldest_commit_ts_xid xid, OUT
newest_commit_ts_xid xid, OUT checkpoint_time timestamp with time zone

postgres=# select * from pg_checkpoint_state();
-[ RECORD 1 ]--------+------------------------
checkpoint_location  | 0/14CD368
prior_location   | 0/14CD0D0
redo_location| 0/14CD368
redo_wal_file| 00010001
timeline_id  | 1
prev_timeline_id | 1
full_page_writes | t
next_xid | 0:576
next_oid | 12415
next_multixact_id| 1
next_multi_offset| 0
oldest_xid   | 568
oldest_xid_dbid  | 1
oldest_active_xid| 0
oldest_multi_xid | 1
oldest_multi_dbid| 1
oldest_commit_ts_xid | 0
newest_commit_ts_xid | 0
checkpoint_time  | 2016-02-19 18:44:51-08

postgres=# \df pg_controldata_state
Name                | pg_controldata_state
Result data type    | record
Argument data types | OUT pg_control_version integer, OUT
catalog_version_no integer, OUT system_identifier bigint, OUT
pg_control_last_modified timestamp with time zone

postgres=# select * from pg_controldata_state();
-[ RECORD 1 ]------------+-----------------------
pg_control_version   | 942
catalog_version_no   | 201602171
system_identifier| 6253198751269127743
pg_control_last_modified | 2016-02-19 18:44:58-08

postgres=# \df pg_init_state
Name                | pg_init_state
Result data type    | record
Argument data types | OUT max_data_alignment integer, OUT
database_block_size integer, OUT blocks_per_segment integer, OUT
wal_block_size integer, OUT bytes_per_wal_segment integer, OUT
max_identifier_length integer, OUT max_index_columns integer, OUT
max_toast_chunk_size integer, OUT large_object_chunk_size integer, OUT
bigint_timestamps boolean, OUT float4_pass_by_value boolean, OUT
float8_pass_by_value boolean, OUT data_page_checksum_version integer

postgres=# select * from pg_init_state();
-[ RECORD 1 ]--------------+-----------------
max_data_alignment | 8
database_block_size| 8192
blocks_per_segment | 131072
wal_block_size | 8192
bytes_per_wal_segment  | 16777216
max_identifier_length  | 64
max_index_columns  | 32
max_toast_chunk_size   | 1996
large_object_chunk_size| 2048
bigint_timestamps  | t
float4_pass_by_value   | t
float8_pass_by_value   | t
data_page_checksum_version | 0

postgres=# \df pg_recovery_state
Name                | pg_recovery_state
Result data type    | record
Argument data types | OUT min_recovery_end_location pg_lsn, OUT
min_recovery_end_timeline integer, 

[HACKERS] psql metaqueries with \gexec

2016-02-19 Thread Corey Huinker
Often, I'm faced with a long .sql script that builds some objects, then
builds things on top of them.

This means that some of the queries I wish to run are dependent on the
state of things that are unknown at the time of writing the script.

I could give up, and make a python script that mostly just strings together
SQL statements. That's ugly and cumbersome.

I could do some wizardry like this:

$ create table foo( a integer, b text, c date);
$ select coalesce( ( select string_agg(format('create index on
foo(%I);',attname),E'\n')
   from pg_attribute
   where attrelid = 'foo'::regclass
   and attnum > 0 order by attnum),
 '') as sql_statements
\gset
:sql_statements


For those of you not willing to parse that, that's a dictionary query with
a 1-column result set formatted into sql with a ';' appended, string
aggregated with a newline delimiter, with the final result set coalesced
with an empty string because \gset will error on an empty result set. I
then immediately put that psql variable back into the command buffer, where
I hope that I meta-wrote valid SQL. If it hurt to read, you can imagine
what it was like to write.

I could use \g and pipe the results to another psql session...but that will
happen in another transaction where my objects might not exist yet.

I would also like the log to show what commands were run.

For that reason, I created the psql command \gexec

It is like \g and \gset in the sense that it executes the query currently
in the buffer. However, it treats every cell in the result set as a query
which itself should be immediately executed.

$ create temporary table gexec_temp( a int, b text, c date, d float);
CREATE TABLE
$ select format('create index on gexec_temp(%I)',attname)
from pg_attribute
where attrelid = 'gexec_temp'::regclass
and attnum > 0
order by attnum

\gexec

create index on gexec_temp(a)
CREATE INDEX
create index on gexec_temp(b)
CREATE INDEX
create index on gexec_temp(c)
CREATE INDEX
create index on gexec_temp(d)
CREATE INDEX



Execution order of the statements is top to bottom, left to right.

$ select 'select 1 as ones', 'select x.y, x.y*2 as double from
generate_series(1,4) as x(y)'
union all
select 'select true as is_true', 'select ''2000-01-01''::date as party_over'
\gexec
 ones 
------
    1
(1 row)

 y | double 
---+--------
 1 |      2
 2 |      4
 3 |      6
 4 |      8
(4 rows)

 is_true 
---------
 t
(1 row)

 party_over 
------------
 01-01-2000
(1 row)



Empty result sets do nothing:

$ select 'select 1 as expect_zero_rows ' where false
\gexec


The results are just strings which are sent to SendQuery(), where they
succeed or fail on their own merits.

$ select 'do $$ begin raise notice ''plpgsql block executed''; end;$$' as
block
from generate_series(1,2)

\gexec

do $$ begin raise notice 'plpgsql block executed'; end;$$
NOTICE:  plpgsql block executed
DO
do $$ begin raise notice 'plpgsql block executed'; end;$$
NOTICE:  plpgsql block executed
DO


I am not sure that "gexec" is the right name for this command. Others
considered were \execute_each, \meta, \gmeta, \geach, as well as adding a
"<" parameter to the \g command.

Many thanks to Pavel Stěhule for giving me some direction in this endeavor,
though he might not agree with the design.
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 9750a5b..5ca769f 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -849,6 +849,13 @@ exec_command(const char *cmd,
status = PSQL_CMD_SEND;
}
 
+   /* \gexec -- send query and treat every result cell as a query to be executed */
+   else if (strcmp(cmd, "gexec") == 0)
+   {
+   pset.gexec_flag = true;
+   status = PSQL_CMD_SEND;
+   }
+
/* \gset [prefix] -- send query and store result into variables */
else if (strcmp(cmd, "gset") == 0)
{
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 2cb2e9b..35bbeb9 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -710,6 +710,39 @@ StoreQueryTuple(const PGresult *result)
return success;
 }
 
+/*
+ * ExecQueryTuples: assuming query result is OK, execute every query
+ * result as its own statement
+ *
+ * Returns true if successful, false otherwise.
+ */
+static bool
+ExecQueryTuples(const PGresult *result)
+{
+   boolsuccess = true;
+   int nrows = PQntuples(result);
+   int ncolumns = PQnfields(result);
+   int r, c;
+
+   for (r = 0; r < nrows; r++)
+   {
+   for (c = 0; c < ncolumns; c++)
+   {
+   if (! PQgetisnull(result, r, c))
+   {
+   if ( ! SendQuery(PQgetvalue(result, r, c)) )
+   {
+   success = false;
+   }
+   }
+   }
+  

Re: [HACKERS] checkpointer continuous flushing - V16

2016-02-19 Thread Michael Paquier
On Sat, Feb 20, 2016 at 5:08 AM, Fabien COELHO  wrote:
>> Kernel 3.2 is extremely bad for Postgresql, as the vm seems to amplify IO
>> somehow. The difference to 3.13 (the latest LTS kernel for 12.04) is huge.
>>
>>
>> https://medium.com/postgresql-talk/benchmarking-postgresql-with-different-linux-kernel-versions-on-ubuntu-lts-e61d57b70dd4#.6dx44vipu
>
>
> Interesting! To summarize it, 25% performance degradation from best kernel
> (2.6.32) to worst (3.2.0), that is indeed significant.

As far as I recall, the OS cache eviction is very aggressive in 3.2,
so it would be possible that data from the FS cache that was just read
could be evicted even if it was not used yet. This represents a large
difference when the database does not fit in RAM.
-- 
Michael




Re: [HACKERS] checkpointer continuous flushing - V18

2016-02-19 Thread Andres Freund
On 2016-02-19 22:46:44 +0100, Fabien COELHO wrote:
> 
> Hello Andres,
> 
> >Here's the next two (the most important) patches of the series:
> >0001: Allow to trigger kernel writeback after a configurable number of 
> >writes.
> >0002: Checkpoint sorting and balancing.
> 
> I will look into these two in depth.
> 
> Note that I would have ordered them in reverse because sorting is nearly
> always very beneficial, and "writeback" (formerly called flushing) is then
> nearly always very beneficial on sorted buffers.

I had it that way earlier. I actually saw pretty large regressions from
sorting alone in some cases as well, apparently because the kernel
submits much larger IOs to disk; although that probably only shows on
SSDs.  This way the modifications imo look a trifle better ;). I'm
intending to commit both at the same time, keep them separate only
because they're easier to understand separately.

Andres




Re: [HACKERS] checkpointer continuous flushing - V18

2016-02-19 Thread Fabien COELHO


Hello Andres,


Here's the next two (the most important) patches of the series:
0001: Allow to trigger kernel writeback after a configurable number of writes.
0002: Checkpoint sorting and balancing.


I will look into these two in depth.

Note that I would have ordered them in reverse because sorting is nearly 
always very beneficial, and "writeback" (formerly called flushing) is then
nearly always very beneficial on sorted buffers.


--
Fabien.




Re: [HACKERS] proposal: make NOTIFY list de-duplication optional

2016-02-19 Thread Catalin Iacob
On 2/9/16, Tom Lane  wrote:
> FWIW, I think it would be a good thing if the NOTIFY statement syntax were
> not remarkably different from the syntax used in the pg_notify() function
> call.  To do otherwise would certainly be confusing.  So on the whole
> I'd go with the "NOTIFY channel [ , payload [ , mode ] ]" option.

I'm quite interested in getting this addressed in time for 9.6 as I'll
be using NOTIFY extensively in a project and I agree with Craig that
the deduplication is frustrating both because you sometimes want every
event and because it can apparently cause O(n^2) behaviour (which I
didn't know before this thread). If another use case for suppressing
deduplication is needed, consider publishing events like "inserted
tuple", "deleted tuple" from triggers and a transaction that does
"insert, delete, insert" which the client then sees as "insert,
delete, oops nothing else".

Tom's proposal allows for more flexible modes than just the ALL and
DISTINCT keywords and accommodates the concern that DISTINCT will lead
to bug reports about not really being distinct due to savepoints.

Android has a similar thing for push notifications to mobile devices
which they call collapse:
https://developers.google.com/cloud-messaging/concept-options, search
for collapse_key.

So I propose NOTIFY channel [ , payload [ , collapse_mode ] ] with
collapse mode being:

* 'never'
  ** Filip's proposed behaviour for the ALL option
  ** if specified, every notification is queued regardless what's in the queue

* 'maybe'
  ** vague word allowing for flexibility in what the server decides to do
  ** current behaviour
  ** improves performance for big transactions if a row trigger
creates the same payload over and over one after the other due to the
current optimization of checking the tail of the list
  ** has performance problems O(n^2) for big transactions with
different payloads
  *** the performance problems can be addressed by a different
patch which uses a hash table, or decides to collapse less
aggressively (Tom's check last 100 idea), or whatever else
  *** in the meantime the 'never' mode acts as a good workaround

In the future we might support an 'always' collapse_mode which would
really be always, including across savepoints. Or an
'only_inside_savepoints' which guarantees the current behaviour.
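To make that concrete, here is how the proposed syntax might look (purely
illustrative; none of this is implemented):

NOTIFY jobs, 'inserted tuple', 'never';       -- queue every notification
NOTIFY jobs, 'inserted tuple', 'maybe';       -- server may collapse duplicates
SELECT pg_notify('jobs', 'inserted tuple');   -- existing two-argument form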

Filip, do you agree with Tom's proposal? Do you plan to rework the
patch on these lines? If you are I'll try to review it, if not I could
give it a shot as I'm interested in having this in 9.6.




Re: [HACKERS] proposal: PL/Pythonu - function ereport

2016-02-19 Thread Pavel Stehule
2016-02-18 17:59 GMT+01:00 Catalin Iacob :

> On 2/18/16, Pavel Stehule  wrote:
> > it doesn't look badly. Is there any possibility how to emulate it with
> > Python2 ? What do you think about some similar implementation on Python2?
>
> The example code I gave works as is in Python2.
>
> The Python3 keyword only arguments are only syntactic sugar. See
> https://www.python.org/dev/peps/pep-3102 for the details. But, as the
> PEP notes,
>
> def f(a, b, *, key1, key2)
>
> is similar to doing this, which also works in Python2 (keyword-only
> syntax does not exist there, so the keyword arguments are read from **kw):
>
> def f(a, b, *ignore, **kw):
>     if ignore:
>         raise TypeError('too many positional arguments')
>     key1 = kw.pop('key1')
>     key2 = kw.pop('key2')
>
> For our case, we want to accept any number of positional arguments due
> to compatibility so we don't need or want the check for 'too many
> positional arguments'.
>
> Note that in both Python 2 and 3, invoking f(1, 2, key1='k1',
> key2='k2') is just syntactic sugar for constructing the (1, 2) tuple
> and {'key1': 'k1', 'key2': 'k2'} dict and passing those to f which
> then unpacks them into a, b, key1 and key2. You see that reflected in
> the C API where you get PyObject* args and PyObject* kw and you unpack
> them explicitly with PyArg_ParseTupleAndKeywords or just use tuple and
> dict calls to peek inside.
>
> What you loose by not having Python3 is that you can't use
> PyArg_ParseTupleAndKeywords and tell it that detail and so on are
> keywork only arguments. But because we don't want to unpack args we
> probably couldn't use that anyway even if we wouldn't support Python2.
>
> Therefore you need to look inside the kw dictionary manually as my
> example shows. What we could also do is check that the kw dictionary
> *only* contains detail, hint and so on and raise a TypeError if it has
> more things to avoid silently accepting stuff like:
> plpy.error('message', some_param='abc'). In Python3
> PyArg_ParseTupleAndKeywords would ensure that, but we need to do it
> manually.
>

It looks like a good idea. The last version does not break compatibility,
and I think it can work.

I wrote code that works on both Python 2 and Python 3.
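In outline, the manual keyword check looks like this (a simplified Python
sketch; the patch does the equivalent at the C level, and the allowed set
shown here is illustrative):

ALLOWED = {'detail', 'hint', 'sqlstate', 'schema', 'table',
           'column', 'datatype', 'constraint'}

def error(*args, **kw):
    # reject unknown keyword arguments instead of silently ignoring them,
    # mimicking what PyArg_ParseTupleAndKeywords enforces in Python 3
    unknown = set(kw) - ALLOWED
    if unknown:
        raise TypeError('invalid keyword arguments: %s'
                        % ', '.join(sorted(unknown)))
    # ... build and raise the actual plpy.Error here ...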

Regards

Pavel
diff --git a/doc/src/sgml/plpython.sgml b/doc/src/sgml/plpython.sgml
new file mode 100644
index 015bbad..7d31342
*** a/doc/src/sgml/plpython.sgml
--- b/doc/src/sgml/plpython.sgml
*** $$ LANGUAGE plpythonu;
*** 1341,1360 
Utility Functions

 The plpy module also provides the functions
!plpy.debug(msg),
!plpy.log(msg),
!plpy.info(msg),
!plpy.notice(msg),
!plpy.warning(msg),
!plpy.error(msg), and
!plpy.fatal(msg).
 plpy.error and
 plpy.fatal actually raise a Python exception
 which, if uncaught, propagates out to the calling query, causing
 the current transaction or subtransaction to be aborted.
 raise plpy.Error(msg) and
 raise plpy.Fatal(msg) are
!equivalent to calling
 plpy.error and
 plpy.fatal, respectively.
 The other functions only generate messages of different
--- 1341,1360 
Utility Functions

 The plpy module also provides the functions
!plpy.debug(exception_params),
!plpy.log(exception_params),
!plpy.info(exception_params),
!plpy.notice(exception_params),
!plpy.warning(exception_params),
!plpy.error(exception_params), and
!plpy.fatal(exception_params).
 plpy.error and
 plpy.fatal actually raise a Python exception
 which, if uncaught, propagates out to the calling query, causing
 the current transaction or subtransaction to be aborted.
 raise plpy.Error(msg) and
 raise plpy.Fatal(msg) are
!partially equivalent to calling
 plpy.error and
 plpy.fatal, respectively.
 The other functions only generate messages of different
*** $$ LANGUAGE plpythonu;
*** 1367,1372 
--- 1367,1397 

  

+ 
+The exception_params are
+[ message [, detail [, hint [, sqlstate [, schema [, table [, column [, datatype [, constraint ]]]]]]]]].
+These parameters can be entered as keyword parameters.
+The message, detail, hint
+are automatically cast to string; the others should be strings.
+ 
+ 
+ CREATE FUNCTION raise_custom_exception() RETURNS void AS $$
+ plpy.error("custom exception message", detail = "some info about exception", hint = "hint for users")
+ $$ LANGUAGE plpythonu;
+ 
+ postgres=# select raise_custom_exception();
+ ERROR:  XX000: plpy.Error: custom exception message
+ DETAIL:  some info about exception
+ HINT:  hint for users
+ CONTEXT:  Traceback (most recent call last):
+   PL/Python function "raise_custom_exception", line 2, in <module>
+ plpy.error("custom exception message", detail = "some info about exception", hint = "hint for users")
+ PL/Python function "raise_custom_exception"
+ LOCATION:  PLy_elog, plpy_elog.c:132
+ 
+   
+ 
+   
 Another set of utility functions are
 plpy.quote_literal(string),
 

Re: [HACKERS] pg_ctl promote wait

2016-02-19 Thread Tom Lane
Andres Freund  writes:
> On 2016-02-19 15:09:58 -0500, Tom Lane wrote:
>> I see no need for an additional mechanism.  Just watch pg_control until
>> you see DB_IN_PRODUCTION state there, then switch over to the same
>> connection probing that "pg_ctl start -w" uses.

> That's afaics not sufficient if the standby was using hot standby, as
> that'll let -w succeed immediately, no?

Oh, now I get Fujii-san's point.  Yes, that wouldn't prove anything
about whether you could do a write transaction immediately.

regards, tom lane




Re: [HACKERS] pg_ctl promote wait

2016-02-19 Thread Andres Freund
On 2016-02-19 15:09:58 -0500, Tom Lane wrote:
> Peter Eisentraut  writes:
> > On 2/19/16 10:06 AM, Fujii Masao wrote:
> >> One concern is that there can be a "time" after the pg_control's state
> >> is changed to DB_IN_PRODUCTION and before the server is able to
> >> start accepting normal (not read-only) connections. So if users try to
> >> start write transaction just after pg_ctl promote -w ends, they might
> >> get an error because the server is still in recovery, i.e., the startup
> >> process is running.
> 
> > I think that window would be acceptable.
> 
> > If not, then the way to deal with it would seem to be to create an
> > entirely new mechanism to communicate with pg_ctl (e.g., a file) and
> > call that at the very end of StartupXLOG().  I'm not sure that that is
> > worth it.
> 
> I see no need for an additional mechanism.  Just watch pg_control until
> you see DB_IN_PRODUCTION state there, then switch over to the same
> connection probing that "pg_ctl start -w" uses.

That's afaics not sufficient if the standby was using hot standby, as
that'll let -w succeed immediately, no?

Andres




Re: [HACKERS] checkpointer continuous flushing - V18

2016-02-19 Thread Andres Freund
On 2016-02-04 16:54:58 +0100, Andres Freund wrote:
> Hi,
> 
> Fabien asked me to post a new version of the checkpoint flushing patch
> series. While this isn't entirely ready for commit, I think we're
> getting closer.
> 
> I don't want to post a full series right now, but my working state is
> available on
> http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/checkpoint-flush
> git://git.postgresql.org/git/users/andresfreund/postgres.git checkpoint-flush

I've updated the git tree.

Here's the next two (the most important) patches of the series:
0001: Allow to trigger kernel writeback after a configurable number of writes.
0002: Checkpoint sorting and balancing.

For 0001 I've recently changed:
* Don't schedule writeback after smgrextend() - that defeats linux
  delayed allocation mechanism, increasing fragmentation noticeably.
* Add docs for the new GUC variables
* comment polishing
* BackendWritebackContext now isn't dynamically allocated anymore


I think this patch primarily needs:
* review of the docs, not sure if they're easy enough to
  understand. Some language polishing might also be needed.
* review of the writeback API, combined with the smgr/md.c changes.
* Currently *_flush_after can be set to a nonzero value, even if there's
  no support for flushing on that platform. Imo that's ok, but perhaps
  other people's opinion differ.


For 0002 I've recently changed:
* Removed the sort timing information, we've proven sufficiently that
  it doesn't take a lot of time.
* Minor comment polishing.

I think this patch primarily needs:
* Benchmarking on FreeBSD/OSX to see whether we should enable the
  mmap()/msync(MS_ASYNC) method by default. Unless somebody does so, I'm
  inclined to leave it off till then.


Regards,

Andres
>From 58aee659417372f3dda4420d8f2a4f4d41c56d31 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Fri, 19 Feb 2016 12:13:05 -0800
Subject: [PATCH 1/4] Allow to trigger kernel writeback after a configurable
 number of writes.

Currently writes to the main data files of postgres all go through the
OS page cache. This means that currently several operating systems can
end up collecting a large number of dirty buffers in their respective
page caches.  When these dirty buffers are flushed to storage rapidly,
be it because of fsync(), timeouts, or dirty ratios, latency for other
writes can increase massively.  This is the primary reason for regular
massive stalls observed in real world scenarios and artificial
benchmarks; on rotating disks stalls on the order of hundreds of seconds
have been observed.

On linux it is possible to control this by reducing the global dirty
limits significantly, reducing the above problem. But global
configuration is rather problematic because it'll affect other
applications; also PostgreSQL itself doesn't always generally want this
behavior, e.g. for temporary files it's undesirable.

Several operating systems allow some control over the kernel page
cache. Linux has sync_file_range(2), several posix systems have msync(2)
and posix_fadvise(2). sync_file_range(2) is preferable because it
requires no special setup, whereas msync() requires the to-be-flushed
range to be mmap'ed. For the purpose of flushing dirty data
posix_fadvise(2) is the worst alternative, as flushing dirty data is
just a side-effect of POSIX_FADV_DONTNEED, which also removes the pages
from the page cache.  Thus the feature is enabled by default only on
linux, but can be enabled on all systems that have any of the above
APIs.
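For illustration, the core of the sync_file_range(2) approach is a call like
the following (a minimal sketch, not the patch's actual code; error handling
elided):

#define _GNU_SOURCE
#include <fcntl.h>

/*
 * Ask the kernel to start writeback of [offset, offset + nbytes)
 * without blocking and without evicting the pages from cache.
 */
static void
hint_writeback(int fd, off_t offset, off_t nbytes)
{
	(void) sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE);
}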

With the infrastructure added, writes made via checkpointer, bgwriter
and normal user backends can be flushed after a configurable number of
writes. Each of these sources of writes controlled by a separate GUC,
checkpointer_flush_after, bgwriter_flush_after and backend_flush_after
respectively; they're separate because the number of flushes that are
good are separate, and because the performance considerations of
controlled flushing for each of these are different.

A later patch will add checkpoint sorting - after that flushes from the
checkpoint will almost always be desirable. Bgwriter flushes are most of
the time going to be random, which are slow on lots of storage hardware.
Flushing in backends works well if the storage and bgwriter can keep up,
but if not it can have negative consequences.  This patch is likely to
have negative performance consequences without checkpoint sorting, but
unfortunately so has sorting without flush control.

TODO:
* verify msync codepath
* properly detect mmap() && msync(MS_ASYNC) support, use it by default
  if available and sync_file_range is *not* available

Discussion: alpine.DEB.2.10.150601132.28433@sto
Author: Fabien Coelho and Andres Freund
---
 doc/src/sgml/config.sgml  |  81 +++
 doc/src/sgml/wal.sgml |  13 +++
 src/backend/postmaster/bgwriter.c |   8 +-
 src/backend/storage/buffer/buf_init.c |   5 +
 

Re: [HACKERS] FDW: should GetFdwRoutine be called when drop table?

2016-02-19 Thread Tom Lane
Andres Freund  writes:
> On 2016-02-19 14:18:19 -0500, Peter Eisentraut wrote:
>> On 2/19/16 12:21 PM, Feng Tian wrote:
>>> I have an fdw that each foreign table will acquire some persisted resource.

>> But foreign data wrappers are meant to be wrappers around data managed
>> elsewhere, not their own storage managers (although that is clearly
>> tempting), so there might well be other places where this breaks down.

> Sounds like even a BEGIN;DROP TABLE foo;ROLLBACK; will break this
> approach.

Yes, that's exactly the problem: you'd need some sort of atomic commit
mechanism to make this work safely.

It's possible we could give FDWs a bunch of hooks that would let them
manage post-commit cleanup the same way smgr does, but it's a far larger
project than it might have seemed.

regards, tom lane




Re: [HACKERS] Declarative partitioning

2016-02-19 Thread Corey Huinker
On Thu, Feb 18, 2016 at 12:41 AM, Amit Langote <
langote_amit...@lab.ntt.co.jp> wrote:

> START [ EXCL ] (startval) END [ INCL ] (endval)
>
> That is, in range type notation, '[startval, endval)' is the default
> behavior. So for each partition, there is at least the following pieces of
> metadata:
>

This is really close, and if it is what we ended up with we would be able
to use it.

I suggest that the range notation can be used even when no suitable range
type exists.

I assume the code for parsing a range spec regardless of data type already
exists, but in case it doesn't, take a range spec of unknown type:

[x,y)

x and y are either going to be raw strings or doublequoted strings with
possible doublequote escapes, each of which would be coercible into the
type of the partition column.

In other words, if your string values were 'blah , blah ' and 'fabizzle',
the [) range spec would be ["blah , blah ",fabizzle).

Using regular range specs syntax also allows for the range to be unbounded
in either or both directions, which is a possibility, especially in newer
tables where the expected distribution of data is unknown.
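To illustrate, a partition definition under the proposed notation might look
like this (hypothetical syntax only, riffing on the START/END proposal quoted
above; nothing here is committed):

CREATE TABLE measurement_y2016q1 PARTITION OF measurement
    FOR VALUES START ('2016-01-01') END ('2016-04-01');
-- i.e. the half-open range ['2016-01-01', '2016-04-01')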

p.s. Sorry I haven't been able to kick the tires just yet. We have a very
good use case for this, it's just a matter of getting a machine and the
time to devote to it.


Re: [HACKERS] pg_ctl promote wait

2016-02-19 Thread Tom Lane
Peter Eisentraut  writes:
> On 2/19/16 10:06 AM, Fujii Masao wrote:
>> One concern is that there can be a "time" after the pg_control's state
>> is changed to DB_IN_PRODUCTION and before the server is able to
>> start accepting normal (not read-only) connections. So if users try to
>> start write transaction just after pg_ctl promote -w ends, they might
>> get an error because the server is still in recovery, i.e., the startup
>> process is running.

> I think that window would be acceptable.

> If not, then the way to deal with it would seem to be to create an
> entirely new mechanism to communicate with pg_ctl (e.g., a file) and
> call that at the very end of StartupXLOG().  I'm not sure that that is
> worth it.

I see no need for an additional mechanism.  Just watch pg_control until
you see DB_IN_PRODUCTION state there, then switch over to the same
connection probing that "pg_ctl start -w" uses.
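In pseudo-C, that wait loop would be roughly (read_control_file() is a
hypothetical helper that reads pg_control and verifies its CRC;
ControlFileData, DB_IN_PRODUCTION, and pg_usleep() are real names):

for (;;)
{
	bool			crc_ok;
	ControlFileData *cf = read_control_file(datadir, &crc_ok);	/* hypothetical */

	/* retry on CRC failure: we may have raced with a concurrent write */
	if (crc_ok && cf->state == DB_IN_PRODUCTION)
		break;
	pg_usleep(100 * 1000);		/* 100ms between polls */
}
/* then fall through to the usual connection probing */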

regards, tom lane




Re: [HACKERS] checkpointer continuous flushing - V16

2016-02-19 Thread Fabien COELHO


Hallo Patric,

Kernel 3.2 is extremely bad for Postgresql, as the vm seems to amplify 
IO somehow. The difference to 3.13 (the latest LTS kernel for 12.04) is 
huge.


https://medium.com/postgresql-talk/benchmarking-postgresql-with-different-linux-kernel-versions-on-ubuntu-lts-e61d57b70dd4#.6dx44vipu


Interesting! To summarize it, 25% performance degradation from best kernel 
(2.6.32) to worst (3.2.0), that is indeed significant.


You might consider upgrading your kernel to 3.13 LTS. It's quite easy 
[...]


There are other stuff running on the hardware that I do not wish to touch, 
so upgrading the particular host is currently not an option, otherwise I 
would have switched to trusty.


Thanks for the pointer.

--
Fabien.




Re: [HACKERS] pg_ctl promote wait

2016-02-19 Thread Tom Lane
Peter Eisentraut  writes:
> Is it safe to read pg_control externally without a lock?  pg_controldata
> will just report a CRC error and proceed, and if you're not sure you can
> just run it again.  But if a promotion fails every so often because of
> concurrent pg_control writes, that would make this feature annoying.

Just retry the read till you don't get a CRC error.

regards, tom lane




Re: [HACKERS] [DOCS] The number of bytes is stored in index_size of pgstatindex() ?

2016-02-19 Thread Peter Geoghegan
On Fri, Feb 19, 2016 at 12:05 PM, Alvaro Herrera
 wrote:
> But those pages are supposed to be used as the index grows.  So unless
> they are forgotten by the FSM, they shouldn't accumulate.  (Except where
> the table doesn't grow but only shrinks, so there's no need for new
> index pages, but I don't think that's an interesting case.)

Sure. I'm talking about a narrow issue around how things are
represented in pgstatindex() only.

-- 
Peter Geoghegan




Re: [HACKERS] [DOCS] The number of bytes is stored in index_size of pgstatindex() ?

2016-02-19 Thread Alvaro Herrera
Peter Geoghegan wrote:
> On Thu, Feb 18, 2016 at 4:53 PM, Tom Lane  wrote:

> >> there are usage patterns where half-dead pages might accumulate.
> >
> > Other than a usage pattern of "randomly SIGKILL backends every few
> > seconds", I don't see how that would happen.
> 
> I meant where pages could accumulate without being recycled.

But those pages are supposed to be used as the index grows.  So unless
they are forgotten by the FSM, they shouldn't accumulate.  (Except where
the table doesn't grow but only shrinks, so there's no need for new
index pages, but I don't think that's an interesting case.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [DOCS] The number of bytes is stored in index_size of pgstatindex() ?

2016-02-19 Thread Peter Geoghegan
On Thu, Feb 18, 2016 at 4:53 PM, Tom Lane  wrote:
> Only a physical-order scan, ie vacuum, would visit a dead page
> (ignoring transient corner cases like a page getting deleted while an
> indexscan is in flight to it).  So I think treating it as part of the
> fragmentation measure is completely wrong: the point of that measure,
> AFAICS, is to model how close an index-order traversal is to linear.
> Half-dead pages are also normally very transient --- the only way they
> persist is if there's a crash partway through a page deletion.  So I think
> it's appropriate to assume that future indexscans won't visit those,
> either.

Okay.

>> there are usage patterns where half-dead pages might accumulate.
>
> Other than a usage pattern of "randomly SIGKILL backends every few
> seconds", I don't see how that would happen.

I meant where pages could accumulate without being recycled.

-- 
Peter Geoghegan




Re: [HACKERS] Restructuring Paths to allow per-Path targetlist info

2016-02-19 Thread Tom Lane
Alvaro Herrera  writes:
>> So, the attached patch just bites the bullet and adds explicit output
>> tlist information to struct Path.

> Hmm, I wonder if this can be used to attack the problem here in a more
> sensible manner:
> https://github.com/2ndQuadrant/postgres/commit/e7c5df6b614b542d55588a483dd2ddba3892a0f6

As far as I gather from that commit message, it wouldn't help much, because
everything discussed there happens before any Paths have been built.

regards, tom lane




Re: [HACKERS] FDW: should GetFdwRoutine be called when drop table?

2016-02-19 Thread Andres Freund
On 2016-02-19 14:18:19 -0500, Peter Eisentraut wrote:
> On 2/19/16 12:21 PM, Feng Tian wrote:
> > I have an fdw that each foreign table will acquire some persisted resource.
> > In my case, some files in file system.   To drop the table cleanly, I
> > have written
> > an object_access_hook that remove those files.  The hook is installed in
> > _PG_init.  
> > 
> > It all worked well except one case. Suppose a user login, the very first
> > command is 
> > drop foreign table.  Drop foreign table will not load the module, so
> > that the hook 
> > is not installed and the files are not properly cleaned up.
> 
> You could load your library with one of the *_library_preload settings
> to make sure the hook is always present.
> 
> But foreign data wrappers are meant to be wrappers around data managed
> elsewhere, not their own storage managers (although that is clearly
> tempting), so there might well be other places where this breaks down.

Sounds like even a BEGIN;DROP TABLE foo;ROLLBACK; will break this
approach.

Andres




Re: [HACKERS] pg_ctl promote wait

2016-02-19 Thread Andres Freund
On 2016-02-20 00:06:09 +0900, Fujii Masao wrote:
> One concern is that there can be a "time" after the pg_control's state
> is changed to DB_IN_PRODUCTION and before the server is able to
> start accepting normal (not read-only) connections. So if users try to
> start write transaction just after pg_ctl promote -w ends, they might
> get an error because the server is still in recovery, i.e., the startup
> process is running.

It might make sense to have one more state type, which is used when
ending recovery, but before full access is allowed.

- Andres




Re: [HACKERS] FDW: should GetFdwRoutine be called when drop table?

2016-02-19 Thread Peter Eisentraut
On 2/19/16 12:21 PM, Feng Tian wrote:
> I have an fdw that each foreign table will acquire some persisted resource.
> In my case, some files in file system.   To drop the table cleanly, I
> have written
> an object_access_hook that remove those files.  The hook is installed in
> _PG_init.  
> 
> It all worked well except one case. Suppose a user login, the very first
> command is 
> drop foreign table.  Drop foreign table will not load the module, so
> that the hook 
> is not installed and the files are not properly cleaned up.

You could load your library with one of the *_library_preload settings
to make sure the hook is always present.

But foreign data wrappers are meant to be wrappers around data managed
elsewhere, not their own storage managers (although that is clearly
tempting), so there might well be other places where this breaks down.





Re: [HACKERS] pg_ctl promote wait

2016-02-19 Thread Andres Freund
On 2016-02-19 13:48:52 -0500, Peter Eisentraut wrote:
> Is it safe to read pg_control externally without a lock?  pg_controldata
> will just report a CRC error and proceed, and if you're not sure you can
> just run it again.  But if a promotion fails every so often because of
> concurrent pg_control writes, that would make this feature annoying.

Yes, the OS should give sufficient guarantees here:

   If write() is interrupted by a signal before it writes any data, it 
shall return −1 with errno set to [EINTR].

   If write() is interrupted by a signal after it successfully writes some 
data, it shall return the number of bytes written.

   If the value of nbyte is greater than {SSIZE_MAX}, the result is 
implementation-defined.

   After a write() to a regular file has successfully returned:

*  Any successful read() from each byte position in the file that was 
modified by that write shall return the data specified
   by the write() for that position until such byte positions are again 
modified.

We currently assume that all "small" writes succeed in one go (and throw
errors if not). Thus the guarantees by read/write are sufficient.

Regards,

Andres




Re: [HACKERS] pg_ctl promote wait

2016-02-19 Thread Peter Eisentraut
On 2/19/16 10:06 AM, Fujii Masao wrote:
> One concern is that there can be a "time" after the pg_control's state
> is changed to DB_IN_PRODUCTION and before the server is able to
> start accepting normal (not read-only) connections. So if users try to
> start write transaction just after pg_ctl promote -w ends, they might
> get an error because the server is still in recovery, i.e., the startup
> process is running.

I think that window would be acceptable.

If not, then the way to deal with it would seem to be to create an
entirely new mechanism to communicate with pg_ctl (e.g., a file) and
call that at the very end of StartupXLOG().  I'm not sure that that is
worth it.





Re: [HACKERS] pg_ctl promote wait

2016-02-19 Thread Peter Eisentraut
On 2/18/16 3:33 AM, Andres Freund wrote:
> Hi,
> 
> On 2016-02-17 21:47:50 -0500, Peter Eisentraut wrote:
>> It would be nice if pg_ctl promote supported the -w (wait) option.
>>
>> How could pg_ctl determine when the promotion has finished?
> 
> How about looking into pg_control? ControlFileData->state ought to have
> the correct information.

Is it safe to read pg_control externally without a lock?  pg_controldata
will just report a CRC error and proceed, and if you're not sure you can
just run it again.  But if a promotion fails every so often because of
concurrent pg_control writes, that would make this feature annoying.





Re: [HACKERS] about google summer of code 2016

2016-02-19 Thread Josh berkus

On 02/19/2016 10:10 AM, Álvaro Hernández Tortosa wrote:


 Hi.

 Oleg and I discussed recently that a really good addition to a GSoC
item would be to study whether it's convenient to have a binary
serialization format for jsonb over the wire. Some argue this should be
benchmarked first. So the scope for this project would be to benchmark
and analyze the potential improvements and then agree on which format
jsonb could be serialized to (apart from the current on-disk format,
there are many json or nested k-v formats that could be used for sending
over the wire).

 I would like to mentor this project with Oleg.


+1


--
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




Re: [HACKERS] checkpointer continuous flushing - V16

2016-02-19 Thread Patric Bechtel

Hi Fabien,

Fabien COELHO schrieb am 19.02.2016 um 16:04:
> 
>>> [...] Ubuntu 12.04 LTS (precise)
>> 
>> That's with 12.04's standard kernel?
> 
> Yes.

Kernel 3.2 is extremely bad for Postgresql, as the vm seems to amplify IO 
somehow. The difference
to 3.13 (the latest LTS kernel for 12.04) is huge.

https://medium.com/postgresql-talk/benchmarking-postgresql-with-different-linux-kernel-versions-on-ubuntu-lts-e61d57b70dd4#.6dx44vipu

You might consider upgrading your kernel to 3.13 LTS. It's quite easy normally:

https://wiki.ubuntu.com/Kernel/LTSEnablementStack

/Patric




Re: [HACKERS] about google summer of code 2016

2016-02-19 Thread Álvaro Hernández Tortosa


Hi.

Oleg and I discussed recently that a really good addition to a GSoC 
item would be to study whether it's convenient to have a binary 
serialization format for jsonb over the wire. Some argue this should be 
benchmarked first. So the scope for this project would be to benchmark 
and analyze the potential improvements and then agree on which format 
jsonb could be serialized to (apart from the current on-disk format, 
there are many json or nested k-v formats that could be used for sending 
over the wire).


I would like to mentor this project with Oleg.

Thanks,

Álvaro


--
Álvaro Hernández Tortosa


---
8Kdata




On 17/02/16 08:40, Amit Langote wrote:

Hi Shubham,

On 2016/02/17 16:27, Shubham Barai wrote:

Hello everyone,

I am currently pursuing my bachelor of engineering in computer science
at Maharashtra
Institute of Technology, Pune, India. I am very excited about contributing
to postgres through google summer of code program.

Is Postgres applying for GSoC 2016 as a mentoring organization?

I think it does.  Track this page for updates:
http://www.postgresql.org/developer/summerofcode/

You can contact one of the people listed on that page for the latest.

I didn't find for 2016 but here is the PostgreSQL wiki page for the last
year's GSoC page: https://wiki.postgresql.org/wiki/GSoC_2015#Project_Ideas

Thanks,
Amit










[HACKERS] Google Summer of Code

2016-02-19 Thread piyush patil
Mentoring organization application deadline is 19 February 19:00 UTC


[HACKERS] FDW: should GetFdwRoutine be called when drop table?

2016-02-19 Thread Feng Tian
Hi, Hackers,

I have an FDW in which each foreign table acquires some persistent
resources, in my case some files in the file system. To drop such a table
cleanly, I have written an object_access_hook that removes those files.
The hook is installed in _PG_init.

It all worked well except one case. Suppose a user login, the very first
command is
drop foreign table.  Drop foreign table will not load the module, so that
the hook
is not installed and the files are not properly cleaned up.
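
For illustration, a minimal sketch of this kind of registration; the
cleanup routine here is a hypothetical stub:

#include "postgres.h"
#include "fmgr.h"
#include "catalog/objectaccess.h"
#include "catalog/pg_class.h"

PG_MODULE_MAGIC;

static object_access_hook_type prev_object_access_hook = NULL;

/* hypothetical: unlink the files backing the given foreign table */
static void
remove_fdw_files_for(Oid relid)
{
}

static void
fdw_object_access(ObjectAccessType access, Oid classId,
                  Oid objectId, int subId, void *arg)
{
    if (prev_object_access_hook)
        prev_object_access_hook(access, classId, objectId, subId, arg);

    /* on DROP of a whole relation, clean up the persisted files */
    if (access == OAT_DROP && classId == RelationRelationId && subId == 0)
        remove_fdw_files_for(objectId);
}

void
_PG_init(void)
{
    prev_object_access_hook = object_access_hook;
    object_access_hook = fdw_object_access;
}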

Should DROP FOREIGN TABLE call GetFdwRoutine?  _PG_init is the only
entry point that I know of for registering hooks, so I feel we need to
trigger a load of the module for all DML/DDL on FDWs, including DROP.
Does this make sense?

Thanks,
Feng


Re: [HACKERS] PostgreSQL Audit Extension

2016-02-19 Thread Bruce Momjian
On Fri, Feb 19, 2016 at 08:20:31AM -0800, Andres Freund wrote:
> On 2016-02-19 10:14:47 -0500, Bruce Momjian wrote:
> > We have already hesitated to record DDL changes for
> > logical replication because of the code size, maintenance overhead, and
> > testing required.
> 
> I'm not sure what you're referring to here? It'd be some relatively
> minor code surgery to also pass catalog changes to output plugins. It's
> just questionable what'd that bring to the table.

The complaint we got about recording DDL changes for logical replication
was the requirement to track every new DDL option we add.  If that
overhead can be done for both logical replication and auditing, it is a
win.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +




Re: [HACKERS] PostgreSQL Audit Extension

2016-02-19 Thread Bruce Momjian
On Fri, Feb 19, 2016 at 11:20:13AM -0500, David Steele wrote:
> On 2/19/16 10:54 AM, Alvaro Herrera wrote:
> > Bruce Momjian wrote:
> > 
> >> Understood.  My point is that there is a short list of read events, and
> >> many DDL events.  We have already hesitated to record DDL changes for
> >> logical replication because of the code size, maintenance overhead, and
> >> testing required.
> > 
> > DDL is already captured using the event triggers mechanism (which is
> > what it was invented for in the first place).  The only thing we don't
> > have is a hardcoded mechanism to transform it from C struct format to
> > SQL language.
> 
> Since DDL event triggers only cover database-level DDL they miss a lot
> that is very important to auditing, e.g. CREATE/ALTER/DROP ROLE,
> GRANT/REVOKE, CREATE/ALTER/DROP DATABASE, etc.

Well, we need to enhance them then.

> I would like to see a general mechanism that allows event triggers,
> logical replication, and audit to all get the information they need
> without them being tied to each other directly.

I think the reporting of DDL would be produced in a way that could be
used by auditing or logical replication, as I already stated.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +




Re: [HACKERS] about google summer of code 2016

2016-02-19 Thread Alvaro Herrera
Atri Sharma wrote:

> I agree, there might be scope for non core projects and PL/Java sounds like
> a good area.

We've hosted MADlib-based projects in the past, so why not.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] PostgreSQL Audit Extension

2016-02-19 Thread Andres Freund
On 2016-02-19 10:14:47 -0500, Bruce Momjian wrote:
> We have already hesitated to record DDL changes for
> logical replication because of the code size, maintenance overhead, and
> testing required.

I'm not sure what you're referring to here? It'd be some relatively
minor code surgery to also pass catalog changes to output plugins. It's
just questionable what'd that bring to the table.

Andres




Re: [HACKERS] PostgreSQL Audit Extension

2016-02-19 Thread David Steele
On 2/19/16 10:54 AM, Alvaro Herrera wrote:
> Bruce Momjian wrote:
> 
>> Understood.  My point is that there is a short list of read events, and
>> many DDL events.  We have already hesitated to record DDL changes for
>> logical replication because of the code size, maintenance overhead, and
>> testing required.
> 
> DDL is already captured using the event triggers mechanism (which is
> what it was invented for in the first place).  The only thing we don't
> have is a hardcoded mechanism to transform it from C struct format to
> SQL language.

Since DDL event triggers only cover database-level DDL they miss a lot
that is very important to auditing, e.g. CREATE/ALTER/DROP ROLE,
GRANT/REVOKE, CREATE/ALTER/DROP DATABASE, etc.

I would like to see a general mechanism that allows event triggers,
logical replication, and audit to all get the information they need
without them being tied to each other directly.
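
(For what it's worth, a sketch of the mechanism auditing extensions use
today to see statements that event triggers miss, such as CREATE ROLE;
this assumes the 9.5-era ProcessUtility hook signature, and the log line
is purely illustrative:)

#include "postgres.h"
#include "fmgr.h"
#include "tcop/utility.h"

PG_MODULE_MAGIC;

static ProcessUtility_hook_type prev_ProcessUtility = NULL;

static void
audit_ProcessUtility(Node *parsetree, const char *queryString,
                     ProcessUtilityContext context, ParamListInfo params,
                     DestReceiver *dest, char *completionTag)
{
    /* every utility statement passes through here, including role DDL */
    ereport(LOG, (errmsg("AUDIT: %s", queryString)));

    if (prev_ProcessUtility)
        prev_ProcessUtility(parsetree, queryString, context,
                            params, dest, completionTag);
    else
        standard_ProcessUtility(parsetree, queryString, context,
                                params, dest, completionTag);
}

void
_PG_init(void)
{
    prev_ProcessUtility = ProcessUtility_hook;
    ProcessUtility_hook = audit_ProcessUtility;
}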

-- 
-David
da...@pgmasters.net





Re: [HACKERS] PostgreSQL Audit Extension

2016-02-19 Thread Bruce Momjian
On Fri, Feb 19, 2016 at 12:54:17PM -0300, Alvaro Herrera wrote:
> Bruce Momjian wrote:
> 
> > Understood.  My point is that there is a short list of read events, and
> > many DDL events.  We have already hesitated to record DDL changes for
> > logical replication because of the code size, maintenance overhead, and
> > testing required.
> 
> DDL is already captured using the event triggers mechanism (which is
> what it was invented for in the first place).  The only thing we don't
> have is a hardcoded mechanism to transform it from C struct format to
> SQL language.

Right, which is I think where the maintenance/testing overhead will come
from, which we are trying to avoid.  Having logical replication and
auditing share that burden would be a win.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +




Re: [HACKERS] PostgreSQL Audit Extension

2016-02-19 Thread Alvaro Herrera
Bruce Momjian wrote:

> Understood.  My point is that there is a short list of read events, and
> many DDL events.  We have already hesitated to record DDL changes for
> logical replication because of the code size, maintenance overhead, and
> testing required.

DDL is already captured using the event triggers mechanism (which is
what it was invented for in the first place).  The only thing we don't
have is a hardcoded mechanism to transform it from C struct format to
SQL language.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Restructuring Paths to allow per-Path targetlist info

2016-02-19 Thread Alvaro Herrera
Tom Lane wrote:

> So, the attached patch just bites the bullet and adds explicit output
> tlist information to struct Path.  I did set things up so that the cost
> is only one pointer in each Path in the typical case where Paths emit
> the set of Vars needed from their relation; in that case, they just
> point to a default PathTarget struct embedded in the parent RelOptInfo.
> A Path emitting something else will need its own PathTarget struct.

Hmm, I wonder if this can be used to attack the problem here in a more
sensible manner:
https://github.com/2ndQuadrant/postgres/commit/e7c5df6b614b542d55588a483dd2ddba3892a0f6

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] PostgreSQL Audit Extension

2016-02-19 Thread David Steele
On 2/19/16 10:14 AM, Bruce Momjian wrote:
> On Fri, Feb 19, 2016 at 09:19:58AM -0500, David Steele wrote:
>>> I was suggesting we could track write events via logical replication and
>>> then we only have to figure out auditing of read events, which would be
>>> easier.
>>
>> I agree that DDL/DML audit logging requires a lot of the same
>> information as logical replication but I don't think reading the logical
>> WAL stream is a good way to do audit logging even for writes.
>>
>> For instance there are some "writes" that are not WAL logged such as SET
>> or ALTER SYSTEM.  Determining attribution would be difficult or
>> impossible, such as which function called an update statement (or vice
>> versa).  Tying together the read and write information would be tricky
>> as well since the WAL stream contains information on transactions but
>> not user sessions, if I understand it correctly.
>>
>> The end result is that it would be very difficult to record a coherent,
>> attributed, and sequential audit log and in fact represent a step
>> backward from pgaudit's current capabilities.
> 
> Understood.  My point is that there is a short list of read events, and
> many DDL events.  We have already hesitated to record DDL changes for
> logical replication because of the code size, maintenance overhead, and
> testing required.  If we could use the same facility for both logical
> replication and auditing, the cost overhead is less per feature.  For
> example, we don't need to read the WAL to do the auditing, but the same
> facility could be used for both, e.g. output some JSON standard format
> that both logical replication and auditing could understand.

Agreed, and this was discussed up thread.  In my mind putting a more
generic structure on top of logical replication and DDL auditing is a
promising solution but I have not looked at it closely enough to know if
it will work as expected or address maintenance concerns.

-- 
-David
da...@pgmasters.net





Re: [HACKERS] PostgreSQL Audit Extension

2016-02-19 Thread Bruce Momjian
On Fri, Feb 19, 2016 at 09:19:58AM -0500, David Steele wrote:
> > I was suggesting we could track write events via logical replication and
> > then we only have to figure out auditing of read events, which would be
> > easier.
> 
> I agree that DDL/DML audit logging requires a lot of the same
> information as logical replication but I don't think reading the logical
> WAL stream is a good way to do audit logging even for writes.
> 
> For instance there are some "writes" that are not WAL logged such as SET
> or ALTER SYSTEM.  Determining attribution would be difficult or
> impossible, such as which function called an update statement (or vice
> versa).  Tying together the read and write information would be tricky
> as well since the WAL stream contains information on transactions but
> not user sessions, if I understand it correctly.
> 
> The end result is that it would be very difficult to record a coherent,
> attributed, and sequential audit log and in fact represent a step
> backward from pgaudit's current capabilities.

Understood.  My point is that there is a short list of read events, and
many DDL events.  We have already hesitated to record DDL changes for
logical replication because of the code size, maintenance overhead, and
testing required.  If we could use the same facility for both logical
replication and auditing, the cost overhead is less per feature.  For
example, we don't need to read the WAL to do the auditing, but the same
facility could be used for both, e.g. output some JSON standard format
that both logical replication and auditing could understand.

The bottom line is we need to crack the record-DDL nut somehow, and at
least in this case, we have two use-cases for it.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +




Re: [HACKERS] pg_ctl promote wait

2016-02-19 Thread Fujii Masao
On Thu, Feb 18, 2016 at 5:45 PM, Simon Riggs  wrote:
> On 18 February 2016 at 08:33, Andres Freund  wrote:
>>
>> Hi,
>>
>> On 2016-02-17 21:47:50 -0500, Peter Eisentraut wrote:
>> > It would be nice if pg_ctl promote supported the -w (wait) option.

+1

>> > How could pg_ctl determine when the promotion has finished?
>>
>> How about looking into pg_control? ControlFileData->state ought to have
>> the correct information.
>
>
> +1

One concern is that there can be a window after pg_control's state
is changed to DB_IN_PRODUCTION and before the server is able to
start accepting normal (not read-only) connections. So if users try to
start a write transaction just after pg_ctl promote -w ends, they might
get an error because the server is still in recovery, i.e., the startup
process is still running.
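
A hedged sketch of how a client could close that gap itself, by polling
pg_is_in_recovery() over libpq (the helper names are hypothetical):

#include <stdbool.h>
#include <string.h>
#include <unistd.h>
#include "libpq-fe.h"

/* true once the server reports it has left recovery,
 * i.e. it will accept read-write transactions */
static bool
promotion_complete(PGconn *conn)
{
    PGresult   *res = PQexec(conn, "SELECT pg_is_in_recovery()");
    bool        done = false;

    if (PQresultStatus(res) == PGRES_TUPLES_OK)
        done = (strcmp(PQgetvalue(res, 0, 0), "f") == 0);
    PQclear(res);
    return done;
}

/* poll once per second until promotion finishes or the timeout expires */
static bool
wait_for_promotion(PGconn *conn, int timeout_secs)
{
    int         i;

    for (i = 0; i < timeout_secs; i++)
    {
        if (promotion_complete(conn))
            return true;
        sleep(1);
    }
    return false;
}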

Regards,

-- 
Fujii Masao




Re: [HACKERS] checkpointer continuous flushing - V16

2016-02-19 Thread Fabien COELHO


Hello.


> Based on these results I think 32 will be a good default for
> checkpoint_flush_after? There's a few cases where 64 showed to be
> beneficial, and some where 32 is better. I've seen 64 perform a bit
> better in some cases here, but the differences were not too big.

Yes, all these runs show that 32 is basically as good or better than 64.

I'll do some runs with 16/48 to have some more data.

> I gather that you didn't play with
> backend_flush_after/bgwriter_flush_after, i.e. you left them at their
> default values? Especially backend_flush_after can have a significant
> positive and negative performance impact.

Indeed, non-reported configuration options have their default values.
There were also minor changes in the default options for logging (prefix,
checkpoint, ...), but nothing significant, and always the same for all
runs.

>> [...] Ubuntu 12.04 LTS (precise)
>
> That's with 12.04's standard kernel?

Yes.

>> checkpoint_flush_after = { none, 0, 32, 64 }
>
> Did you re-initdb between the runs?

Yes, all runs are from scratch (initdb, pgbench -i, some warmup...).

> I've seen massively varying performance differences due to autovacuum
> triggered analyzes. It's not completely deterministic when those run,
> and on bigger scale clusters analyze can take ages, while holding a
> snapshot.

Yes, I agree that probably the performance changes on long vs short runs
(andres00c vs andres00b) is due to autovacuum.


--
Fabien.




Re: [HACKERS] Relaxing SSL key permission checks

2016-02-19 Thread Stephen Frost
Tom,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Now, I have heard it argued that the OpenSSH/L authors are a bunch of
> idiots who know nothing about security.  But it's not like insisting
> on restrictive permissions on key files is something we invented out
> of the blue.  It's pretty standard practice, AFAICT.

I certainly was not intending to imply that anyone was an 'idiot'.

Nor am I arguing that we must remove all such checks from every part of
the system.

I do contend that relying on such checks that happen on a relatively
infrequent basis in daemon processes creates a false sense of security
and does not practically improve security, today.  As I mentioned
previously, perhaps before distributions were as cognizant about
security concerns or about considering how a particular daemon should
be set up, or when users were still frequently installing from source,
such checks were more valuable.  However, when they get in the way of
entirely reasonable system policies and require distributions to patch
the source code, they're a problem.  That doesn't mean we necessarily
have to remove them, but we should be flexible.

Similar checks in client utilities, such as the ssh example, or in psql,
are more useful and, from a practical standpoint, haven't been an issue
for system policies.

Further, we do more than check key files but also check permissions on
the data directory and don't provide any way for users to configure the
permissions on new files, which could be seen as akin to sshd requiring
user home directories to be 700 and forcing umask to 077.

Note that all of this only actually applies to OpenSSH, not to OpenSSL.
Certainly, as evidenced by the question which sparked this discussion,
the packages which are configured to use the local snakeoil cert on
Debian-based systems (which also includes postfix and Apache, offhand)
do not have a problem with the group read permissions that are being
asked for.  I don't find that to be offensive or unacceptable in the
least, nor do I feel that Debian is flawed for taking this approach, or
that OpenSSL is flawed for not having such a check.

Thanks!

Stephen




Re: [HACKERS] [PATCH v5] GSSAPI encryption support

2016-02-19 Thread David Steele
On 2/15/16 12:45 PM, Robbie Harwood wrote:
> David Steele  writes:
>
>> 1) It didn't apply cleanly to HEAD.  It did apply cleanly on a455878
>> which I figured was recent enough for testing.  I didn't bisect to find
>> the exact commit that broke it.
> 
> It applied to head of master (57c932475504d63d8f8a68fc6925d7decabc378a)
> for me (`patch -p1 < v4-GSSAPI-encryption-support.patch`).  I rebased it
> anyway and cut a v5 anyway, just to be sure.  It's attached, and
> available on github as well:
> https://github.com/frozencemetery/postgres/commit/dc10e3519f0f6c67f79abd157dc8ff1a1c293f53

It could have been my mistake.  I'll give it another try when you have a
new patch.

>> 2) While I was able to apply the patch and get it compiled it seemed
>> pretty flaky - I was only able to logon about 1 in 10 times on average.
>>  Here was my testing methodology:
> 
> What I can't tell from looking at your methodology is whether both the
> client and server were running my patches or no.  There's no fallback
> here (I'd like to talk about how that should work, with example from
> v1-v3, if people have ideas).  This means that both the client and the
> server need to be running my patches for the moment.  Is this your
> setup?

I was testing on a system with no version of PostgreSQL installed.  I
applied your patch to master and then ran both server and client from
that patched version.  Is there something I'm missing?

-- 
-David
da...@pgmasters.net





Re: [HACKERS] PostgreSQL Audit Extension

2016-02-19 Thread David Steele
On 2/17/16 10:25 PM, Bruce Momjian wrote:
> On Wed, Feb 17, 2016 at 01:59:09PM +0530, Robert Haas wrote:
>> On Wed, Feb 17, 2016 at 5:20 AM, Bruce Momjian  wrote:
>>> On Fri, Feb  5, 2016 at 01:16:25PM -0500, Stephen Frost wrote:
 Looking at pgaudit and the other approaches to auditing which have been
 developed (eg: applications which sit in front of PG and essentially
 have to reimplement large bits of PG to then audit the commands sent
 before passing them to PG, or hacks which try to make sense out of log
 files full of SQL statements) make it quite clear, in my view, that
 attempts to bolt-on auditing to PG result in a poorer solution, from a
 technical perspective, than what this project is known for and capable
 of.  To make true progress towards that, however, we need to get past
 the thinking that auditing doesn't need to be in-core or that it should
 be a second-class citizen feature or that we don't need it in PG.
>>>
>>> Coming in late here, but the discussion around how to maintain the
>>> auditing code seems very similar to how to handle the logical
>>> replication of DDL commands.  First, have we looked into hooking
>>> auditing into scanning logical replication contents, and second, how are
>>> we handling the logical replication of DDL and could we use the same
>>> approach for auditing?
>>
>> Auditing needs to trace read-only events, which aren't reflected in
>> logical replication in any way.  I think it's a good idea to try to
>> drive auditing off of existing machinery instead of inventing
>> something new - I suggested plan invalidation items upthread.  But I
>> doubt that logical replication is the thing to attach it to.
> 
> I was suggesting we could track write events via logical replication and
> then we only have to figure out auditing of read events, which would be
> easier.

I agree that DDL/DML audit logging requires a lot of the same
information as logical replication but I don't think reading the logical
WAL stream is a good way to do audit logging even for writes.

For instance there are some "writes" that are not WAL logged such as SET
or ALTER SYSTEM.  Determining attribution would be difficult or
impossible, such as which function called an update statement (or vice
versa).  Tying together the read and write information would be tricky
as well since the WAL stream contains information on transactions but
not user sessions, if I understand it correctly.

The end result is that it would be very difficult to record a coherent,
attributed, and sequential audit log and in fact represent a step
backward from pgaudit's current capabilities.

-- 
-David
da...@pgmasters.net



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] MinGW / Windows / printf format specifiers

2016-02-19 Thread Tom Lane
Craig Ringer  writes:
> On 19 February 2016 at 12:15, Chapman Flack  wrote:
>> Have issues like this been dealt with in PostgreSQL code before, and did
>> a favorite approach emerge?

> INT64_FORMAT and UINT64_FORMAT

Yeah.  Note in particular the convention to avoid using those in
translatable strings.  Where necessary, we manage that by using
sprintf(INT64_FORMAT) into a buffer variable and then printing
the buffer with %s in the translatable message.  Grotty, but there
seems no other way that doesn't result in platform-dependent
translatable strings.
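
A minimal sketch of that convention; the variable and message text are
hypothetical:

int64       rows_processed = 42;    /* hypothetical value */
char        buf[32];

snprintf(buf, sizeof(buf), INT64_FORMAT, rows_processed);
ereport(NOTICE,
        (errmsg("processed %s rows", buf)));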

regards, tom lane




Re: [HACKERS] Typo in bufmgr.c that result in waste of memory

2016-02-19 Thread Andres Freund
Hi,

Nice catch!

On February 19, 2016 2:42:08 PM GMT+01:00, Tom Lane  wrote:
>> I think we should fix it, but not backpatch.
>
>I don't think that's particularly good policy.  It's a clear bug, why
>would we not fix it?  Leaving it as-is in the back branches can have
>no good effect, and what it does do is create a merge hazard for other
>back-patchable bug fixes in the same area.

Agreed. Unless somebody beats me to it, I'll do so in a couple of hours
(on an 11h flight now).

Andres

--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.




Re: [HACKERS] Typo in bufmgr.c that result in waste of memory

2016-02-19 Thread Tom Lane
Simon Riggs  writes:
> I see the problem, but I don't buy the argument that it wastes large
> amounts of memory. Or do you have some evidence that it does?

Agreed, it seems unlikely that that hash table gets large enough for
this to be really significant.  Still ...

> I think we should fix it, but not backpatch.

I don't think that's particularly good policy.  It's a clear bug, why
would we not fix it?  Leaving it as-is in the back branches can have
no good effect, and what it does do is create a merge hazard for other
back-patchable bug fixes in the same area.

regards, tom lane




Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-19 Thread Amit Kapila
On Fri, Feb 12, 2016 at 6:57 PM, Michael Paquier 
wrote:
>
>
> OK, here is attached a new version that I hope addresses all the
> points raised until now. The following things are changed:
> - Extend XLogInsert with a new uint8 argument to have flags. As of now
> there is one flag: XLOG_INSERT_NO_PROGRESS that can be set to not
> update the progress. By default, the progress LSN is updated.
> - Add extra check in bgwriter to not log a snapshot to be sure that no
> snapshots are logged when there is no activity since last snapshot
> taken, and since last checkpoint.
>

You don't seem to have taken care of the typo below in your patch, as
pointed out by me earlier.

+ * to not rely on taking an exclusive lock an all the WAL insertion locks,

/an all/on all

Does this in any way take care of the case where there is activity
on unlogged tables?
I think avoiding to perform checkpoints in an idle system is introduced
in commit 4d14fe0048cf80052a3ba2053560f8aab1bb1b22 whereas
unlogged relation is introduced by commit
53dbc27c62d8e1b6c5253feba04a5094cb8fe046 which is much later.
Now, I think one might consider it okay to not do anything for
unlogged tables as it is not done previously and this patch is
anyway improving the current situation, but I feel if we agree
that checkpoints will get skipped if the only operations that
are happening in the system are on unlogged relations, then
it is better to add it in comments as an improvement even if we
don't want to address it as part of this patch.

+ elog(LOG, "Not a forced or shutdown checkpoint: progress_lsn %X/%X, ckpt
%X/%X",
+ (uint32) (progress_lsn >> 32), (uint32) progress_lsn,
+ (uint32) (ControlFile->checkPoint >> 32), (uint32)
ControlFile->checkPoint);

Are you proposing to have the newly introduced elog messages
as part of the commit, or are these just for debugging purposes? If you
are proposing them for commit, then I think it is worth justifying the
need for them and discussing the appropriate log level;
otherwise, I think you can keep these in an additional patch just for
verification, as the main patch is now very close to being
ready for committer.

Also, I think it is worth taking performance data once for
write tests (using a 30-minute pgbench run or some other way) with
the minimum checkpoint_timeout (i.e. 30s) to see if the additional locking
has any impact on performance.  I think taking locks at intervals
of 15s or 30s should not matter much, but it is better to be safe.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] WAL logging problem in 9.4.3?

2016-02-19 Thread Michael Paquier
On Fri, Feb 19, 2016 at 4:33 PM, Michael Paquier
 wrote:
> On Thu, Feb 18, 2016 at 4:27 PM, Michael Paquier
>  wrote:
>> On Thu, Feb 4, 2016 at 3:24 PM, Heikki Linnakangas  wrote:
>>> I dropped the ball on this one back in July, so here's an attempt to revive
>>> this thread.
>>>
>>> I spent some time fixing the remaining issues with the prototype patch I
>>> posted earlier, and rebased that on top of current git master. See attached.
>>>
>>> Some review of that would be nice. If there are no major issues with it, I'm
>>> going to create backpatchable versions of this for 9.4 and below.
>>
>> I am going to look into that very soon. For now and to not forget
>> about this bug, I have added an entry in the CF app:
>> https://commitfest.postgresql.org/9/528/
>
> Worth noting that this patch does not address the problem with index
> relations when a TRUNCATE is used in the same transaction as its
> CREATE TABLE; take this example, with wal_level = minimal:
> 1) Run transaction
> =# begin;
> BEGIN
> =# create table ab (a int primary key);
> CREATE TABLE
> =# truncate ab;
> TRUNCATE TABLE
> =# commit;
> COMMIT
> 2) Restart server with immediate mode.
> 3) Failure:
> =# table ab;
> ERROR:  XX001: could not read block 0 in file "base/16384/16388": read
> only 0 of 8192 bytes
> LOCATION:  mdread, md.c:728
>
> The case where a COPY is issued after TRUNCATE is fixed though, so
> that's still an improvement.
>
> Here are other comments.
>
> +   /* Flush updates to relations that we didn't WAL-logged */
> +   smgrDoPendingSyncs(true);
> "Flush updates to relations that were not WAL-logged"?
>
> +void
> +FlushRelationBuffersWithoutRelCache(RelFileNode rnode, bool islocal)
> +{
> +   FlushRelationBuffers_common(smgropen(rnode, InvalidBackendId), islocal);
> +}
> islocal is always set to false; I'd rather remove this argument
> from FlushRelationBuffersWithoutRelCache.
>
> for (i = 0; i < nrels; i++)
> +   {
> smgrclose(srels[i]);
> +   }
> Looks like noise.
>
> +   if (!found)
> +   {
> +   pending->truncated_to = InvalidBlockNumber;
> +   pending->sync_above = nblocks;
> +
> +   elog(DEBUG2, "registering new pending sync for rel %u/%u/%u at
> block %u",
> +rnode.spcNode, rnode.dbNode, rnode.relNode, nblocks);
> +
> +   }
> +   else if (pending->sync_above == InvalidBlockNumber)
> +   {
> +   elog(DEBUG2, "registering pending sync for rel %u/%u/%u at block %u",
> +rnode.spcNode, rnode.dbNode, rnode.relNode, nblocks);
> +   pending->sync_above = nblocks;
> +   }
> +   else
> Here couldn't it be possible that when (sync_above !=
> InvalidBlockNumber), nblocks can be higher than sync_above? In which
> case we had better increase sync_above to nblocks, no?
>
> +   if (!pendingSyncs)
> +   createPendingSyncsHash();
> +   pending = (PendingRelSync *) hash_search(pendingSyncs,
> +(void *) &rel->rd_node,
> +HASH_ENTER, &found);
> This is lacking comments.
>
> -   if (XLogHintBitIsNeeded() && (bufHdr->flags & BM_PERMANENT))
> +   BufferGetTag(buffer, , , );
> +   if (XLogHintBitIsNeeded() && (bufHdr->flags & BM_PERMANENT) &&
> +   !smgrIsSyncPending(rnode, blknum))
> Here as well explaining in more details why the buffer does not need
> to go through XLogSaveBufferForHint would be nice.

An additional one:
-   XLogRegisterBuffer(0, newbuf, bufflags);
-   if (oldbuf != newbuf)
-   XLogRegisterBuffer(1, oldbuf, REGBUF_STANDARD);
In log_heap_update, the new buffer is now conditionally logged
depending on if the heap needs WAL or not.

Now during replay the following thing is done:
-   oldaction = XLogReadBufferForRedo(record, (oldblk == newblk) ? 0 : 1,
- &obuffer);
+   if (oldblk == newblk)
+   oldaction = XLogReadBufferForRedo(record, 0, &obuffer);
+   else if (XLogRecHasBlockRef(record, 1))
+   oldaction = XLogReadBufferForRedo(record, 1, &obuffer);
+   else
+   oldaction = BLK_DONE;
Shouldn't we check for XLogRecHasBlockRef(record, 0) when the tuple is
updated on the same page?
-- 
Michael




Re: [HACKERS] checkpointer continuous flushing - V16

2016-02-19 Thread Andres Freund
Hi,

On 2016-02-19 10:16:41 +0100, Fabien COELHO wrote:
> Below the results of a lot of tests with pgbench to exercise checkpoints on
> the above version when fetched.

Wow, that's a great test series.


> Overall comments:
>  - sorting & flushing is basically always a winner
>  - benchmarking with short runs on large databases is a bad idea
>the results are very different if a longer run is used
>(see andres00b vs andres00c)

Based on these results I think 32 will be a good default for
checkpoint_flush_after? There's a few cases where 64 showed to be
beneficial, and some where 32 is better. I've seen 64 perform a bit
better in some cases here, but the differences were not too big.

I gather that you didn't play with
backend_flush_after/bgwriter_flush_after, i.e. you left them at their
default values? Especially backend_flush_after can have a significant
positive and negative performance impact.


>  16 GB 2 cpu 8 cores
>  200 GB RAID1 HDD, ext4 FS
>  Ubuntu 12.04 LTS (precise)

That's with 12.04's standard kernel?



>  postgresql.conf:
>shared_buffers = 1GB
>max_wal_size = 1GB
>checkpoint_timeout = 300s
>checkpoint_completion_target = 0.8
>checkpoint_flush_after = { none, 0, 32, 64 }

Did you re-initdb between the runs?


I've seen massively varying performance differences due to autovacuum
triggered analyzes. It's not completely deterministic when those run,
and on bigger scale clusters analyze can take ages, while holding a
snapshot.


> Hmmm, interesting: maintenance_work_mem seems to have some influence on
> performance, although it is not too consistent between settings, probably
> because as the memory is used to its limit the performance is quite
> sensitive to the available memory.

That's probably because of differing behaviour of autovacuum/vacuum,
which sometime will have to do several scans of the tables if there are
too many dead tuples.


Regards,

Andres




Re: [HACKERS] Relaxing SSL key permission checks

2016-02-19 Thread Christoph Berg
Re: Tom Lane 2016-02-18 <27423.1455809...@sss.pgh.pa.us>
> I did have a thought though: could we allow two distinct permissions
> configurations?  That is, allow either:
> 
> * file is owned by us, mode 0600 or less
> 
> * file is owned by root, mode 0640 or less
> 
> The first case is what we allow today.  (We don't need an explicit
> ownership check; if the mode is 0600 and we can read it, we must be
> the owner.)  The second case is what Debian wants.  We already know
> we are not root, so if we can read the file, we must be part of the
> group that root has allowed to read the file, and at that point it's
> on root's head whether or not that group is secure.  I don't have a
> problem with trusting root's judgment on security matters --- if the
> root admin is incompetent, there are probably holes everywhere anyway.

Makes sense to me.

> The problem with the proposed patch is that it's conflating these
> distinct cases, but that's easily fixed.

Updated patch attached.

Christoph
-- 
c...@df7cb.de | http://www.df7cb.de/
diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
new file mode 100644
index 1e3dfb6..1f61601
*** a/src/backend/libpq/be-secure-openssl.c
--- b/src/backend/libpq/be-secure-openssl.c
*** be_tls_init(void)
*** 207,213 
  			ssl_key_file)));
  
  		/*
! 		 * Require no public access to key file.
  		 *
  		 * XXX temporarily suppress check when on Windows, because there may
  		 * not be proper support for Unix-y file permissions.  Need to think
--- 207,217 
  			ssl_key_file)));
  
  		/*
! 		 * Require no public access to the key file. If the file is owned by
! 		 * us, require mode 0600 or less. If owned by root, require 0640 or
! 		 * less, to allow read access through our gid or through a
! 		 * supplementary gid that lets us read system-wide certificates.
! 		 * Refuse to load files owned by other users.
  		 *
  		 * XXX temporarily suppress check when on Windows, because there may
  		 * not be proper support for Unix-y file permissions.  Need to think
*** be_tls_init(void)
*** 215,226 
  		 * directory permission check in postmaster.c)
  		 */
  #if !defined(WIN32) && !defined(__CYGWIN__)
! 		if (!S_ISREG(buf.st_mode) || buf.st_mode & (S_IRWXG | S_IRWXO))
  			ereport(FATAL,
  	(errcode(ERRCODE_CONFIG_FILE_ERROR),
    errmsg("private key file \"%s\" has group or world access",
  		 ssl_key_file),
!    errdetail("Permissions should be u=rw (0600) or less.")));
  #endif
  
  		if (SSL_CTX_use_PrivateKey_file(SSL_context,
--- 219,233 
  		 * directory permission check in postmaster.c)
  		 */
  #if !defined(WIN32) && !defined(__CYGWIN__)
! 		if (!S_ISREG(buf.st_mode) ||
! (buf.st_uid == geteuid() && buf.st_mode & (S_IRWXG | S_IRWXO)) ||
! (buf.st_uid == 0 && buf.st_mode & (S_IWGRP | S_IXGRP | S_IRWXO)) ||
! (buf.st_uid != geteuid() && buf.st_uid != 0))
  			ereport(FATAL,
  	(errcode(ERRCODE_CONFIG_FILE_ERROR),
    errmsg("private key file \"%s\" has group or world access",
  		 ssl_key_file),
!    errdetail("File must be owned by the database user and have permissions u=rw (0600) or less, or owned by root and have permissions u=rw,g=r (0640) or less.")));
  #endif
  
  		if (SSL_CTX_use_PrivateKey_file(SSL_context,



Re: [HACKERS] proposal: function parse_ident

2016-02-19 Thread Pavel Stehule
Hi

2016-02-18 4:59 GMT+01:00 Pavel Stehule :

> select
>> parse_ident(E'X\rXX');
>
>
I am sending an updated patch - I used the json function for correct
escaping, so the escaping behaviour is the same.

Regards

Pavel
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
new file mode 100644
index f9eea76..c1c113a
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***
*** 1821,1826 
--- 1821,1847 

 
  
+  parse_ident
+ 
+ parse_ident(str text,
+[ strictmode boolean DEFAULT true ] )
+
+text[]
+Split a qualified identifier into its component parts.
+By default, extra characters after the last identifier are considered an error.
+When strictmode is false, such trailing characters are ignored instead.
+This is useful for parsing identifiers for objects like functions and arrays that may have trailing
+characters. Note that this function
+does not truncate quoted identifiers. If you care about that you should cast the result of this
+function to name[].
+
+parse_ident('"SomeSchema".someTable')
+"SomeSchema,sometable"
+   
+ 
+   
+
+ 
   pg_client_encoding
  
  pg_client_encoding()
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
new file mode 100644
index abf9a70..38af138
*** a/src/backend/catalog/system_views.sql
--- b/src/backend/catalog/system_views.sql
*** RETURNS jsonb
*** 971,973 
--- 971,980 
  LANGUAGE INTERNAL
  STRICT IMMUTABLE
  AS 'jsonb_set';
+ 
+ CREATE OR REPLACE FUNCTION
+   parse_ident(str text, strict boolean DEFAULT true)
+ RETURNS text[]
+ LANGUAGE INTERNAL
+ STRICT IMMUTABLE
+ AS 'parse_ident';
diff --git a/src/backend/parser/scansup.c b/src/backend/parser/scansup.c
new file mode 100644
index 2b4ab20..7aa5b76
*** a/src/backend/parser/scansup.c
--- b/src/backend/parser/scansup.c
*** scanstr(const char *s)
*** 130,135 
--- 130,144 
  char *
  downcase_truncate_identifier(const char *ident, int len, bool warn)
  {
+ 	return downcase_identifier(ident, len, warn, true);
+ }
+ 
+ /*
+  * a workhorse for downcase_truncate_identifier
+  */
+ char *
+ downcase_identifier(const char *ident, int len, bool warn, bool truncate)
+ {
  	char	   *result;
  	int			i;
  	bool		enc_is_single_byte;
*** downcase_truncate_identifier(const char
*** 158,169 
  	}
  	result[i] = '\0';
  
! 	if (i >= NAMEDATALEN)
  		truncate_identifier(result, i, warn);
  
  	return result;
  }
  
  /*
   * truncate_identifier() --- truncate an identifier to NAMEDATALEN-1 bytes.
   *
--- 167,179 
  	}
  	result[i] = '\0';
  
! 	if (i >= NAMEDATALEN && truncate)
  		truncate_identifier(result, i, warn);
  
  	return result;
  }
  
+ 
  /*
   * truncate_identifier() --- truncate an identifier to NAMEDATALEN-1 bytes.
   *
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
new file mode 100644
index 43f36db..d11581a
*** a/src/backend/utils/adt/misc.c
--- b/src/backend/utils/adt/misc.c
***
*** 21,32 
--- 21,35 
  #include 
  
  #include "access/sysattr.h"
+ #include "access/htup_details.h"
  #include "catalog/catalog.h"
+ #include "catalog/namespace.h"
  #include "catalog/pg_tablespace.h"
  #include "catalog/pg_type.h"
  #include "commands/dbcommands.h"
  #include "funcapi.h"
  #include "miscadmin.h"
+ #include "parser/scansup.h"
  #include "parser/keywords.h"
  #include "postmaster/syslogger.h"
  #include "rewrite/rewriteHandler.h"
***
*** 38,44 
--- 41,49 
  #include "utils/ruleutils.h"
  #include "tcop/tcopprot.h"
  #include "utils/acl.h"
+ #include "utils/array.h"
  #include "utils/builtins.h"
+ #include "utils/json.h"
  #include "utils/timestamp.h"
  
  #define atooid(x)  ((Oid) strtoul((x), NULL, 10))
*** pg_column_is_updatable(PG_FUNCTION_ARGS)
*** 719,721 
--- 724,912 
  
  	PG_RETURN_BOOL((events & REQ_EVENTS) == REQ_EVENTS);
  }
+ 
+ 
+ /*
+  * These simple parser utilities are compatible with the lexer
+  * implementation; they are used only in the parse_ident function.
+  */
+ static bool
+ is_ident_start(unsigned char c)
+ {
+ 	if (c == '_')
+ 		return true;
+ 	if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
+ 		return true;
+ 
+ 	if (c >= 0200 && c <= 0377)
+ 		return true;
+ 
+ 	return false;
+ }
+ 
+ static bool
+ is_ident_cont(unsigned char c)
+ {
+ 	if (c >= '0' && c <= '9')
+ 		return true;
+ 
+ 	return is_ident_start(c);
+ }
+ 
+ /*
+  * Sanitize enhanced string for using in error message.
+  */
+ static char *
+ sanitize_str(const char *str)
+ {
+ 	StringInfoData	ds;
+ 
+ 	/* we share same escaping rules with json escaping function */
+ 	initStringInfo(&ds);
+ 	escape_json(&ds, str);
+ 
+ 	return ds.data;

Re: [HACKERS] Typo in bufmgr.c that result in waste of memory

2016-02-19 Thread Takashi Horikawa

> I see the problem, but I don't buy the argument that it wastes large amounts
> of memory. Or do you have some evidence that it does?
No, I don't have any trouble caused by it.
I did not mean to say it wastes a 'large' amount of memory, only 'a few' bytes.

> I think we should fix it, but not backpatch.
I agree that backpatching is unnecessary.
I hope it will be fixed in the current development version
at some suitable opportunity.

Best regards,
Takashi Horikawa
NEC Corporation
Knowledge Discovery Research Laboratories


> -Original Message-
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Simon Riggs
> Sent: Friday, February 19, 2016 6:24 PM
> To: Horikawa Takashi(堀川 隆)
> Cc: PostgreSQL Hackers
> Subject: Re: [HACKERS] Typo in bufmgr.c that result in waste of memory
> 
> On 19 February 2016 at 02:58, Takashi Horikawa 
> wrote:
> 
> 
> 
> 	I have just found a typo in the source code (not in a comment) of bufmgr.c
> 	that results in a waste of memory. It might be a 'bug', but it does not
> 	result in any incorrect operation; it just wastes a small amount of memory.
> 
> 	As sizeof(PrivateRefCountArray) found in InitBufferPoolAccess() is 64 and
> 	sizeof(PrivateRefCountEntry), which should be used here, is 8, this typo
> 	produces 56 bytes of unused memory per PrivateRefCount entry in the
> 	hash table. I think this results not only in wasted memory but also in a
> 	reduced cache hit ratio.
> 
> 	X: hash_ctl.entrysize = sizeof(PrivateRefCountArray);
> 	O: hash_ctl.entrysize = sizeof(PrivateRefCountEntry);
> 
> 
> 
> I see the problem, but I don't buy the argument that it wastes large amounts
> of memory. Or do you have some evidence that it does?
> 
> I think we should fix it, but not backpatch.
> 
> --
> 
> Simon Riggs                http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] plpgsql - DECLARE - cannot to use %TYPE or %ROWTYPE for composite types

2016-02-19 Thread Artur Zakirov

It seems all fixes are done. I tested the patch and regression tests passed.

On 27.01.2016 20:58, Pavel Stehule wrote:



 > --- 1681,1687 
 >* --
 >*/
 >   PLpgSQL_type *
 > ! plpgsql_parse_wordtype(char *ident, int reftype_mode)
 >   {
 >   PLpgSQL_type *dtype;
 >   PLpgSQL_nsitem *nse;

Use the typedef'ed enum, as above.

 > --- 1699,1721 
 >   switch (nse->itemtype)
 >   {
 >   case PLPGSQL_NSTYPE_VAR:
 > ! {
 > ! dtype = ((PLpgSQL_var *)
(plpgsql_Datums[nse->itemno]))->datatype;
 > ! return derive_type(dtype,
reftype_mode);
 > ! }
 >
 > ! case PLPGSQL_NSTYPE_ROW:
 > ! {
 > ! dtype = ((PLpgSQL_row *)
(plpgsql_Datums[nse->itemno]))->datatype;
 > ! return derive_type(dtype,
reftype_mode);
 > ! }
 >
 > + /*
 > +  * XXX perhaps allow REC here? Probably it
has not any sense, because
 > +  * in this moment, because PLpgSQL doesn't
support rec parameters, so
 > +  * there should not be any rec polymorphic
parameter, and any work can
 > +  * be done inside function.
 > +  */

I think you should remove from the "?" onwards in that comment, i.e.
just keep what was already in the original comment (minus the ROW)


I tried to fix it, not sure if understood well.


I think here Alvaro means that you should keep original comment without 
the ROW. Like this:


/* XXX perhaps allow REC here? */



 > *** extern bool plpgsql_parse_dblword(char *
 > *** 961,968 
 > PLwdatum *wdatum, PLcword
*cword);
 >   extern bool plpgsql_parse_tripword(char *word1, char *word2,
char *word3,
 >  PLwdatum *wdatum,
PLcword *cword);
 > ! extern PLpgSQL_type *plpgsql_parse_wordtype(char *ident);
 > ! extern PLpgSQL_type *plpgsql_parse_cwordtype(List *idents);
 >   extern PLpgSQL_type *plpgsql_parse_wordrowtype(char *ident);
 >   extern PLpgSQL_type *plpgsql_parse_cwordrowtype(List *idents);
 >   extern PLpgSQL_type *plpgsql_build_datatype(Oid typeOid, int32
typmod,
 > --- 973,980 
 > PLwdatum *wdatum, PLcword
*cword);
 >   extern bool plpgsql_parse_tripword(char *word1, char *word2,
char *word3,
 >  PLwdatum *wdatum,
PLcword *cword);
 > ! extern PLpgSQL_type *plpgsql_parse_wordtype(char *ident, int
reftype_mode);
 > ! extern PLpgSQL_type *plpgsql_parse_cwordtype(List *idents, int
reftype_mode);
 >   extern PLpgSQL_type *plpgsql_parse_wordrowtype(char *ident);
 >   extern PLpgSQL_type *plpgsql_parse_cwordrowtype(List *idents);
 >   extern PLpgSQL_type *plpgsql_build_datatype(Oid typeOid, int32
typmod,

By the way, these functions are misnamed after this patch.  They are
called "wordtype" and "cwordtype" originally because they accept
"word%TYPE" and "compositeword%TYPE", but after the patch they not only
accept TYPE at the right of the percent sign but also ELEMENTTYPE and
ARRAYTYPE.  Not sure that this is something we want to be too strict
about.


Understand - used name ***reftype instead type


I am not sure, but it seems the new names are a little worse. I think the
original names are good too. They accept a word and return the
PLpgSQL_type structure.




Thank you for your comment

Attached updated patch

Regards

Pavel



--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




I noticed a little typo in the comment in the derive_type():
/* Return base_type, when it is a array already */

should be:
/* Return base_type, when it is an array already */

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company




Re: [HACKERS] Bug in StartupSUBTRANS

2016-02-19 Thread Simon Riggs
On 14 February 2016 at 00:03, Jeff Janes  wrote:


> I've attached a new version, incorporating comments from Tom and Michael.
>

Applied, thanks.

-- 
Simon Riggs                http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Typo in bufmgr.c that result in waste of memory

2016-02-19 Thread Simon Riggs
On 19 February 2016 at 02:58, Takashi Horikawa 
wrote:


> I have just found a typo in the source code (not in a comment) of bufmgr.c
> that results in a waste of memory. It might be a 'bug', but it does not
> result in any incorrect operation; it just wastes a small amount of memory.
>
> As sizeof(PrivateRefCountArray) found in InitBufferPoolAccess() is 64 and
> sizeof(PrivateRefCountEntry), which should be used here, is 8, this typo
> produces 56 bytes of unused memory per PrivateRefCount entry in the
> hash table. I think this results not only in wasted memory but also in a
> reduced cache hit ratio.
>
> X: hash_ctl.entrysize = sizeof(PrivateRefCountArray);
> O: hash_ctl.entrysize = sizeof(PrivateRefCountEntry);
>

I see the problem, but I don't buy the argument that it wastes large
amounts of memory. Or do you have some evidence that it does?

I think we should fix it, but not backpatch.
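
For reference, a sketch of the corrected setup; the surrounding lines
follow InitBufferPoolAccess() in bufmgr.c from memory, so treat the
details as an assumption:

HASHCTL     hash_ctl;

MemSet(&hash_ctl, 0, sizeof(hash_ctl));
hash_ctl.keysize = sizeof(int32);
hash_ctl.entrysize = sizeof(PrivateRefCountEntry);  /* was ...Array */

PrivateRefCountHash = hash_create("PrivateRefCount", 100, &hash_ctl,
                                  HASH_ELEM | HASH_BLOBS);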

-- 
Simon Riggs                http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] checkpointer continuous flushing - V16

2016-02-19 Thread Fabien COELHO


Hello Andres,


I don't want to post a full series right now, but my working state is
available on
http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/checkpoint-flush
git://git.postgresql.org/git/users/andresfreund/postgres.git checkpoint-flush


Below the results of a lot of tests with pgbench to exercise checkpoints 
on the above version when fetched.


Overall comments:
 - sorting & flushing is basically always a winner
 - benchmarking with short runs on large databases is a bad idea
   the results are very different if a longer run is used
   (see andres00b vs andres00c)

# HOST/SOFT

 16 GB 2 cpu 8 cores
 200 GB RAID1 HDD, ext4 FS
 Ubuntu 12.04 LTS (precise)

# ABOUT THE REPORTED STATISTICS

 tps: the "excluding connection" time tps, the higher the better
 1-sec tps: average of the measured per-second tps
   note - it should be the same as the previous one, but due to various
  hazards in the trace, especially when things go badly and pg gets
  stuck, it may differ. Such hazards also explain why there
  may be some non-integer tps reported for some seconds.
 stddev: standard deviation, the lower the better
 the five figures in brackets give a feel for the distribution:
 - min: minimal per-second tps seen in the trace
 - q1: first-quartile per-second tps seen in the trace
 - med: median per-second tps seen in the trace
 - q3: third-quartile per-second tps seen in the trace
 - max: maximal per-second tps seen in the trace
 the last percentage, dubbed "<=10.0", is the percentage of seconds where
   performance is below 10 tps: this measures how unresponsive pg was during the run
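
For the record, a sketch of how those bracketed figures can be derived
from a per-second tps trace (index-based quartiles; the sample values are
taken from the TINY2 head row below):

#include <stdio.h>
#include <stdlib.h>

static int
cmp_double(const void *a, const void *b)
{
    double      d = *(const double *) a - *(const double *) b;

    return (d > 0) - (d < 0);
}

/* print [min, q1, med, q3, max] and the share of seconds at or below 10 tps */
static void
report(double *tps, int n)
{
    int         i,
                slow = 0;

    qsort(tps, n, sizeof(double), cmp_double);
    for (i = 0; i < n; i++)
        if (tps[i] <= 10.0)
            slow++;

    printf("[%.1f, %.1f, %.1f, %.1f, %.1f] <=10.0: %.1f%%\n",
           tps[0], tps[n / 4], tps[n / 2], tps[(3 * n) / 4], tps[n - 1],
           100.0 * slow / n);
}

int
main(void)
{
    double      trace[] = {229.0, 2570.1, 2721.9, 2746.1, 2857.2};

    report(trace, 5);
    return 0;
}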

## TINY2

 pgbench -M prepared -N -P 1 -T 4000 -j 2 -c 4
   with scale = 10 (~ 200 MB)

 postgresql.conf:
   shared_buffers = 1GB
   max_wal_size = 1GB
   checkpoint_timeout = 300s
   checkpoint_completion_target = 0.8
   checkpoint_flush_after = { none, 0, 32, 64 }

 opts # |   tps / 1-sec tps ± stddev [ min q1 med q3 max ] <=10.0

 head 0 | 2574.1 / 2574.3 ± 367.4 [229.0, 2570.1, 2721.9, 2746.1, 2857.2] 0.0%
  1 | 2575.0 / 2575.1 ± 359.3 [  1.0, 2595.9, 2712.0, 2732.0, 2847.0] 0.1%
  2 | 2602.6 / 2602.7 ± 359.5 [ 54.0, 2607.1, 2735.1, 2768.1, 2908.0] 0.0%

0 0 | 2583.2 / 2583.7 ± 296.4 [164.0, 2580.0, 2690.0, 2717.1, 2833.8] 0.0%
  1 | 2596.6 / 2596.9 ± 307.4 [296.0, 2590.5, 2707.9, 2738.0, 2847.8] 0.0%
  2 | 2604.8 / 2605.0 ± 300.5 [110.9, 2619.1, 2712.4, 2738.1, 2849.1] 0.0%

   32 0 | 2625.5 / 2625.5 ± 250.5 [  1.0, 2645.9, 2692.0, 2719.9, 2839.0] 0.1%
  1 | 2630.2 / 2630.2 ± 243.1 [301.8, 2654.9, 2697.2, 2726.0, 2837.4] 0.0%
  2 | 2648.3 / 2648.4 ± 236.7 [570.1, 2664.4, 2708.9, 2739.0, 2844.9] 0.0%

   64 0 | 2587.8 / 2587.9 ± 306.1 [ 83.0, 2610.1, 2680.0, 2731.0, 2857.1] 0.0%
  1 | 2591.1 / 2591.1 ± 305.2 [455.9, 2608.9, 2680.2, 2734.1, 2859.0] 0.0%
  2 | 2047.8 / 2046.4 ± 925.8 [  0.0, 1486.2, 2592.6, 2691.1, 3001.0] 0.2% ?

Pretty small setup, all data fit in buffers. Good tps performance all around
(best for 32 flushes), and flushing shows a noticeable (360 -> 240) reduction
in tps stddev.

## SMALL

 pgbench -M prepared -N -P 1 -T 4000 -j 2 -c 4
   with scale = 120 (~ 2 GB)

 postgresql.conf:
   shared_buffers = 2GB
   checkpoint_timeout = 300s
   checkpoint_completion_target = 0.8
   checkpoint_flush_after = { none, 0, 32, 64 }

 opts # |   tps / 1-sec tps ± stddev [ min q1 med q3 max ] <=10.0

 head 0 | 209.2 / 204.2 ± 516.5 [0.0,   0.0,   4.0,5.0, 2251.0] 82.3%
  1 | 207.4 / 204.2 ± 518.7 [0.0,   0.0,   4.0,5.0, 2245.1] 82.3%
  2 | 217.5 / 211.0 ± 530.3 [0.0,   0.0,   3.0,5.0, 2255.0] 82.0%
  3 | 217.8 / 213.2 ± 531.7 [0.0,   0.0,   4.0,6.0, 2261.9] 81.7%
  4 | 230.7 / 223.9 ± 542.7 [0.0,   0.0,   4.0,7.0, 2282.0] 80.7%

0 0 | 734.8 / 735.5 ± 879.9 [0.0,   1.0,  16.5, 1748.3, 2281.1] 47.0%
  1 | 694.9 / 693.0 ± 849.0 [0.0,   1.0,  29.5, 1545.7, 2428.0] 46.4%
  2 | 735.3 / 735.5 ± 888.4 [0.0,   0.0,  12.0, 1781.2, 2312.1] 47.9%
  3 | 736.0 / 737.5 ± 887.1 [0.0,   1.0,  16.0, 1794.3, 2317.0] 47.5%
  4 | 734.9 / 735.1 ± 885.1 [0.0,   1.0,  15.5, 1781.0, 2297.1] 47.2%

   32 0 | 738.1 / 737.9 ± 415.8 [0.0, 553.0, 679.0,  753.0, 2312.1]  0.2%
  1 | 730.5 / 730.7 ± 413.2 [0.0, 546.5, 671.0,  744.0, 2319.0]  0.1%
  2 | 741.9 / 741.9 ± 416.5 [0.0, 556.0, 682.0,  756.0, 2331.0]  0.2%
  3 | 744.1 / 744.1 ± 414.4 [0.0, 555.5, 685.2,  758.0, 2285.1]  0.1%
  4 | 746.9 / 746.9 ± 416.6 [0.0, 566.6, 685.0,  759.0, 2308.1]  0.1%

   64 0 | 743.0 / 743.1 ± 416.5 [1.0, 555.0, 683.0,  759.0, 2353.0]  0.1%
  1 | 742.5 / 742.5 ± 415.6 [0.0, 558.2, 680.0,  758.2, 2296.0]  0.1%
  2 | 742.5 / 742.5 ± 415.9 [0.0, 559.0, 681.1,  757.0, 2310.0]  0.1%
  3 | 529.0 / 526.6 ± 410.9 [0.0, 245.0, 444.0,  701.0, 2380.9]  1.5% ??
  4 | 734.8 / 735.0 ± 414.1 [0.0, 550.0, 673.0,  754.0, 2298.0]  0.1%

Sorting brings ×3.3 tps; flushing significantly reduces the tps stddev

[PoC] WaitLatchOrSocketMulti (Re: [HACKERS] Performance degradation in commit ac1d794)

2016-02-19 Thread Kyotaro HORIGUCHI
Hello. I don't see how ac1d794 will be dealt with, but I tried an
example implementation of a multi-socket version of WaitLatchOrSocket
using callbacks, on top of the current master where ac1d794 has
not been removed yet.

At Thu, 14 Jan 2016 13:46:44 -0500, Robert Haas  wrote 
in 
> On Thu, Jan 14, 2016 at 12:14 PM, Andres Freund  wrote:
> > On 2016-01-14 12:07:23 -0500, Robert Haas wrote:
> >> > Do we want to provide a backward compatible API for all this? I'm fine
> >> > either way.
> >>
> >> How would that work?
> >
> > I'm thinking of something like;
> >
> > int WaitOnLatchSet(LatchEventSet *set, int wakeEvents, long timeout);
> >
> > int
> > WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,long 
> > timeout)
> > {
> > LatchEventSet set;
> >
> > LatchEventSetInit(&set, latch);
> >
> > if (sock != PGINVALID_SOCKET)
> >LatchEventSetAddSock(&set, sock);
> >
> > return WaitOnLatchSet(set, wakeEvents, timeout);
> > }
> >
> > I think we'll need to continue having wakeEvents and timeout parameters
> > for WaitOnLatchSet, we quite frequently want to wait socket
> > readability/writability, not wait on the socket, or have/not have
> > timeouts.
> 
> Well, if we ever wanted to support multiple FDs, we'd need the
> readability/writeability thing to be per-fd, not per-set.
> 
> Overall, if this is what you have in mind for backward compatibility,
> I rate it M for Meh.  Let's just break compatibility and people will
> have to update their code.  That shouldn't be hard, and if we don't
> make people do it when we make the change, then we'll be stuck with
> the backward-compatibility interface for a decade.  I doubt it's worth
> it.

The API is similar to what Robert suggested, but different, because
that would be a bit too complicated for most cases. So this example
implementation has an intermediate style between the current API and
Robert's suggestion, and uses callbacks as I proposed.

int WaitLatchOrSocketMulti(pgwaitobject *wobjs, int nobjs, long timeout);

This is implemented only for poll, not for select.

A sample usage is seen in secure_read().

> pgwaitobject objs[3];
...
> InitWaitLatch(objs[0], MyLatch);
> InitWaitPostmasterDeath(objs[1]);
> InitWaitSocket(objs[2], port->sock, waitfor);
> 
> w = WaitLatchOrSocketMulti(objs, 3, 0);
> //w = WaitLatchOrSocket(MyLatch,
> //WL_LATCH_SET | WL_POSTMASTER_DEATH | waitfor,
> //port->sock, 0);


The core of the function looks like the following. It runs the
callbacks for all fired events.

>rc = poll(pfds, nfds, (int) cur_timeout);
...
>if (rc < 0)
...
>else
>{
>  for (i = 0 ; i < nfds ; i++)
>  {
>wobjs[i].retEvents = 0;
>if (pfds[i].revents && wobjs[i].cb)
>  result |= wobjs[i].cb(&wobjs[i], pfds[i].revents);
>
>if (result & WL_IMMEDIATELY_BREAK)
>  break;
>  }
>}

In the above part, poll()'s events are passed to the callbacks, so
the callbacks may need a different implementation for select().

Sockets also get a callback. The initializer could be as
follows.

> InitWaitSocketCB(wait_obj, sock, event, your_callback);


If we want to have the waiting-object array independent from
specific functions, to achieve asynchronous handling of socket
events, it could be realised by providing a set of wrapper
functions, exactly as Robert said above.

Does this make sense?
Does anyone have any opinion? or thoughts?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From b7cc9939ea61654fae98c4fe958c8c67df9f3758 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 19 Feb 2016 16:34:49 +0900
Subject: [PATCH] [PoC] Add mult-socket version of WaitLatchOrSocket.

---
 src/backend/libpq/be-secure.c |  12 +-
 src/backend/port/unix_latch.c | 268 ++
 src/include/storage/latch.h   |  29 +
 3 files changed, 306 insertions(+), 3 deletions(-)

diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index ac709d1..d99a983 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -141,12 +141,18 @@ retry:
 	if (n < 0 && !port->noblock && (errno == EWOULDBLOCK || errno == EAGAIN))
 	{
 		int			w;
+		pgwaitobject objs[3];
 
 		Assert(waitfor);
 
-		w = WaitLatchOrSocket(MyLatch,
-			  WL_LATCH_SET | WL_POSTMASTER_DEATH | waitfor,
-			  port->sock, 0);
+		InitWaitLatch(objs[0], MyLatch);
+		InitWaitPostmasterDeath(objs[1]);
+		InitWaitSocket(objs[2], port->sock, waitfor);
+
+		w = WaitLatchOrSocketMulti(objs, 3, 0);
+//		w = WaitLatchOrSocket(MyLatch,
+//			  WL_LATCH_SET | WL_POSTMASTER_DEATH | waitfor,
+//			  port->sock, 0);
 
 		/*
 		 * If the postmaster has died, it's not safe to continue running,
diff --git a/src/backend/port/unix_latch.c