Re: format of pg_upgrade loadable_libraries warning

2019-10-04 Thread Tom Lane
Bruce Momjian  writes:
> On Fri, Oct  4, 2019 at 05:40:08PM -0500, Justin Pryzby wrote:
>> I would argue for including this in 12.1, since 12 is what most everyone
>> will use for upgrades, and a patch in .1 will help people upgrading for 11
>> of the next 12 months.  (But your patch is more general than mine.)

> No, there might be tools that depend on the existing format, and this is
> the first report of confusion I have read.

Translations will also lag behind any such change.  Speaking of which,
it might be a good idea to include some translator: annotations to
help translators understand what context these fragmentary phrases
are used in.  I'd actually say that my biggest concern with these
messages is whether they can translate into something sane.

regards, tom lane




Re: Change atoi to strtol in same place

2019-10-04 Thread Ashutosh Sharma
Hi Joe,

On a quick look, the patch seems to be going in a good direction,
although there is quite a bit of pending work still to be done.

One suggestion:
The start value for the port number is set to 1; however, port numbers
in the range 1-1023 are reserved and can't be used.  So, is it possible
to have the start value be 1024 instead of 1?

Further, I encountered a compilation error (INT_MAX undeclared) because
the header file "limits.h" has not been included in postgres_fe.h or
option.h.
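As a rough sketch of the direction being discussed (the helper name and exact checks here are my own, not the patch's), a strtol-based parser can reject non-numeric input, trailing garbage, and the reserved port range all at once:

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper (not the patch's actual code): parse str as a
 * TCP port with strtol, rejecting trailing garbage, overflow, and the
 * reserved range 1-1023. */
static bool
parse_port(const char *str, int *port)
{
	char	   *endptr;
	long		val;

	errno = 0;
	val = strtol(str, &endptr, 10);

	if (errno != 0 || endptr == str || *endptr != '\0')
		return false;			/* not a clean decimal integer */
	if (val < 1024 || val > 65535)
		return false;			/* reserved or out of range */

	*port = (int) val;
	return true;
}
```

Unlike atoi, this distinguishes "hoge" (no digits) and "5432x" (trailing characters) from a valid value, which is the core motivation for the change.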

--
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com

On Fri, Oct 4, 2019 at 9:04 PM Alvaro Herrera  wrote:
>
> On 2019-Oct-03, Joe Nelson wrote:
>
> > Kyotaro Horiguchi wrote:
> > > > pg_standby: -k keepfiles could not parse 'hoge' as integer
> > >
> > > I didn't check closely, but pg_standby's -k message looks
> > > somewhat strange. Does it need a separator?
> >
> > Good point, how about this:
> >
> >   pg_standby: -k keepfiles: 
>
> The wording is a bit strange.  How about something like
> pg_standby: invalid argument to -k: %s
>
> where the %s is the error message produced like you propose:
>
> > I could have pg_strtoint64_range() wrap its error messages in _() so
> > that translators could customize the messages prior to concatenation.
> >
> >   *error = psprintf(_("could not parse '%s' as integer"), str);
>
> ... except that they would rather be more explicit about what the
> problem is.  "insufficient digits" or "extraneous character", etc.
>
> > Would this suffice?
>
> I hope that no callers would like to have the messages not translated,
> because that seems like it would become a mess.
>
> --
> Álvaro Herrera  https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
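A hedged sketch of the division of labor Alvaro suggests (the function and message wording here are mine, not from the patch): the parser returns a specific detail string such as "insufficient digits" or "extraneous character", and the caller supplies the option context:

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser: returns NULL on success, otherwise a specific
 * detail string describing what went wrong. */
static const char *
parse_long_checked(const char *str, long *result)
{
	char	   *endptr;

	errno = 0;
	*result = strtol(str, &endptr, 10);
	if (endptr == str)
		return "insufficient digits";
	if (*endptr != '\0')
		return "extraneous character";
	if (errno == ERANGE)
		return "value out of range";
	return NULL;				/* success */
}

/* Caller-side wording, producing e.g.
 *   pg_standby: invalid argument to -k: extraneous character */
static void
report_bad_option(const char *progname, char opt, const char *detail)
{
	fprintf(stderr, "%s: invalid argument to -%c: %s\n",
			progname, opt, detail);
}
```

For translation, the detail strings would each be wrapped in _() so translators see whole phrases rather than concatenated fragments.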




Re: refactoring - share str2*int64 functions

2019-10-04 Thread Ashutosh Sharma
On Fri, Oct 4, 2019 at 8:58 PM Andres Freund  wrote:
>
> Hi,
>
> On 2019-10-04 14:27:44 +0530, Ashutosh Sharma wrote:
> > Is there any specific reason for hard coding the *base* of a number
> > representing the string in strtouint64(). I understand that currently
> > strtouint64() is being used just to convert an input string to decimal
> > unsigned value but what if we want it to be used for hexadecimal
> > values or may be some other values, in that case it can't be used.
>
> It's a lot slower if the base is variable, because the compiler cannot
> replace the division by shifts.
>

Thanks, Andres, for the reply.  I didn't know that the compiler can't
replace the division with shift operations when the base is variable,
and it's true that this would make things a lot slower.
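To illustrate the point (a toy sketch, not PostgreSQL's actual strtouint64): with the base fixed at compile time, the compiler can strength-reduce the multiply by 10 into shifts and adds, whereas a variable base forces a genuine multiply (and a genuine divide in any overflow check):

```c
#include <stdint.h>

/* Base fixed at compile time: the compiler can turn val * 10 into
 * (val << 3) + (val << 1), avoiding multiply/divide instructions. */
static uint64_t
str_to_u64_base10(const char *s)
{
	uint64_t	val = 0;

	while (*s >= '0' && *s <= '9')
		val = val * 10 + (uint64_t) (*s++ - '0');
	return val;
}

/* Variable base: the multiply must be executed for real on every
 * iteration, which is considerably slower. */
static uint64_t
str_to_u64_base(const char *s, unsigned base)
{
	uint64_t	val = 0;

	for (;;)
	{
		unsigned	d;

		if (*s >= '0' && *s <= '9')
			d = (unsigned) (*s - '0');
		else if (*s >= 'a' && *s <= 'f')
			d = (unsigned) (*s - 'a') + 10;
		else
			break;
		if (d >= base)
			break;
		val = val * base + d;
		s++;
	}
	return val;
}
```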

--
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com




Updated some links which are not working with new links

2019-10-04 Thread vignesh C
Hi,

There are some links referenced in the source files that are currently
not working.

The below link:

is updated with:


The below links:
http://www-01.ibm.com/support/knowledgecenter/SSGH2K_11.1.0/com.ibm.xlc111.aix.doc/language_ref/function_attributes.html
http://www-01.ibm.com/support/knowledgecenter/SSGH2K_11.1.0/com.ibm.xlc111.aix.doc/language_ref/type_attrib.html
are updated with:
http://www-01.ibm.com/support/knowledgecenter/SSGH2K_13.1.2/com.ibm.xlc131.aix.doc/language_ref/function_attributes.html
http://www-01.ibm.com/support/knowledgecenter/SSGH2K_13.1.2/com.ibm.xlc131.aix.doc/language_ref/type_attrib.html

In c.h the link was not updated, but in generic-xlc.h the link had been
updated earlier.

Attached patch contains the fix with the updated links.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com


0001-Updated-the-links-that-are-not-working.patch
Description: Binary data


Re: format of pg_upgrade loadable_libraries warning

2019-10-04 Thread Bruce Momjian
On Fri, Oct  4, 2019 at 05:40:08PM -0500, Justin Pryzby wrote:
> On Fri, Oct 04, 2019 at 05:37:46PM -0400, Bruce Momjian wrote:
> > On Wed, Oct  2, 2019 at 12:23:37PM -0500, Justin Pryzby wrote:
> > > Regarding the previous thread and commit here:
> > > https://www.postgresql.org/message-id/flat/20180713162815.GA3835%40momjian.us
> > > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=60e3bd1d7f92430b24b710ecf0559656eb8ed499
> > > 
> > > I'm suggesting to reformat the warning, which I found to be misleading:
> > 
> > Understood.  This is a general problem with the way pg_upgrade displays
> > errors and the databases/objects associated with them.  The attached
> > patch fixes the output text to say "in database", e.g.:
> > 
> >   Could not load library "$libdir/pgfincore": ERROR:  could not access file 
> > "$libdir/pgfincore": No such file or directory
> >   in database: postgres
> >   in database: too
> > 
> > Would indenting help too?  I am inclined to fix this only in head, and not
> > to backpatch the change.
> 
> Yes, indenting would also help.
> 
> I would argue for including this in 12.1, since 12 is what most everyone
> will use for upgrades, and a patch in .1 will help people upgrading for 11
> of the next 12 months.  (But your patch is more general than mine.)

No, there might be tools that depend on the existing format, and this is
the first report of confusion I have read.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-04 Thread Bruce Momjian
On Sat, Oct  5, 2019 at 12:54:35AM +0200, Tomas Vondra wrote:
> On Fri, Oct 04, 2019 at 06:06:10PM -0400, Bruce Momjian wrote:
> > For full-cluster TDE with AES-NI enabled, the performance impact is
> > usually ~4%, so doing anything more granular doesn't seem useful.  See
> > this PGCon presentation with charts:
> > 
> > https://www.youtube.com/watch?v=TXKoo2SNMzk#t=27m50s
> > 
> > Having anything more fine-grained than all-cluster didn't seem worth it.
> > Using per-user keys is useful, but also much harder to implement.
> > 
> 
> Not sure I follow. I thought you were asking why Oracle apparently does
> not leverage AES-NI for column-level encryption (at least according to
> the document I linked)? And I don't know why that's the case.

No, I read it as Oracle saying that there isn't much value to per-column
encryption if you have crypto hardware acceleration, because the
all-cluster encryption overhead is so minor.

> FWIW performance is just one (supposed) benefit of column encryption;
> even if all-cluster encryption is just as fast, there might be other
> reasons to support it.

Well, there is per-user/db encryption, but I think that needs to be done
at the SQL level.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com





Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-04 Thread Tomas Vondra

On Fri, Oct 04, 2019 at 06:06:10PM -0400, Bruce Momjian wrote:

On Fri, Oct  4, 2019 at 11:48:19PM +0200, Tomas Vondra wrote:

On Fri, Oct 04, 2019 at 04:58:14PM -0400, Bruce Momjian wrote:
> On Fri, Oct  4, 2019 at 10:46:57PM +0200, Tomas Vondra wrote:
> > Oracle also has a handy "TDE best practices" document [2], which says
> > when to use column-level encryption - let me quote a couple of points:
> >
> > * Location of sensitive information is known
> >
> > * Less than 5% of all application columns are encryption candidates
> >
> > * Encryption candidates are not foreign-key columns
> >
> > * Indexes over encryption candidates are normal B-tree indexes (this
> >  also means no support for indexes on expressions, and likely partial
> >  indexes)
> >
> > * No support from hardware crypto acceleration.
>
> Aren't all modern systems going to have hardware crypto acceleration,
> i.e., AES-NI CPU extensions?  Does that mean there is no value in
> partial encryption on such systems?  Looking at the overhead numbers I
> have seen for AES-NI-enabled systems, I believe it.
>


That's a good question, I don't know the answer. You're right most
systems have CPUs with AES-NI these days, and I'm not sure why the
column encryption does not leverage that.

Maybe it's because column encryption has to encrypt/decrypt much smaller
chunks of data, and AES-NI is not efficient for that? I don't know.


For full-cluster TDE with AES-NI enabled, the performance impact is
usually ~4%, so doing anything more granular doesn't seem useful.  See
this PGCon presentation with charts:

https://www.youtube.com/watch?v=TXKoo2SNMzk#t=27m50s

Having anything more fine-grained than all-cluster didn't seem worth it.
Using per-user keys is useful, but also much harder to implement.



Not sure I follow. I thought you were asking why Oracle apparently does
not leverage AES-NI for column-level encryption (at least according to
the document I linked)? And I don't know why that's the case.

FWIW performance is just one (supposed) benefit of column encryption;
even if all-cluster encryption is just as fast, there might be other
reasons to support it.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: format of pg_upgrade loadable_libraries warning

2019-10-04 Thread Justin Pryzby
On Fri, Oct 04, 2019 at 05:37:46PM -0400, Bruce Momjian wrote:
> On Wed, Oct  2, 2019 at 12:23:37PM -0500, Justin Pryzby wrote:
> > Regarding the previous thread and commit here:
> > https://www.postgresql.org/message-id/flat/20180713162815.GA3835%40momjian.us
> > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=60e3bd1d7f92430b24b710ecf0559656eb8ed499
> > 
> > I'm suggesting to reformat the warning, which I found to be misleading:
> 
> Understood.  This is a general problem with the way pg_upgrade displays
> errors and the databases/objects associated with them.  The attached
> patch fixes the output text to say "in database", e.g.:
> 
>   Could not load library "$libdir/pgfincore": ERROR:  could not access file 
> "$libdir/pgfincore": No such file or directory
>   in database: postgres
>   in database: too
> 
> Would indenting help too?  I am inclined to fix this only in head, and not
> to backpatch the change.

Yes, indenting would also help.

I would argue for including this in 12.1, since 12 is what most everyone
will use for upgrades, and a patch in .1 will help people upgrading for 11
of the next 12 months.  (But your patch is more general than mine.)

Thanks,
Justin




Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-04 Thread Bruce Momjian
On Fri, Oct  4, 2019 at 11:48:19PM +0200, Tomas Vondra wrote:
> On Fri, Oct 04, 2019 at 04:58:14PM -0400, Bruce Momjian wrote:
> > On Fri, Oct  4, 2019 at 10:46:57PM +0200, Tomas Vondra wrote:
> > > Oracle also has a handy "TDE best practices" document [2], which says
> > > when to use column-level encryption - let me quote a couple of points:
> > > 
> > > * Location of sensitive information is known
> > > 
> > > * Less than 5% of all application columns are encryption candidates
> > > 
> > > * Encryption candidates are not foreign-key columns
> > > 
> > > * Indexes over encryption candidates are normal B-tree indexes (this
> > >  also means no support for indexes on expressions, and likely partial
> > >  indexes)
> > > 
> > > * No support from hardware crypto acceleration.
> > 
> > Aren't all modern systems going to have hardware crypto acceleration,
> > i.e., AES-NI CPU extensions?  Does that mean there is no value in
> > partial encryption on such systems?  Looking at the overhead numbers I
> > have seen for AES-NI-enabled systems, I believe it.
> > 
> 
> 
> That's a good question, I don't know the answer. You're right most
> systems have CPUs with AES-NI these days, and I'm not sure why the
> column encryption does not leverage that.
> 
> Maybe it's because column encryption has to encrypt/decrypt much smaller
> chunks of data, and AES-NI is not efficient for that? I don't know.

For full-cluster TDE with AES-NI enabled, the performance impact is
usually ~4%, so doing anything more granular doesn't seem useful.  See
this PGCon presentation with charts:

https://www.youtube.com/watch?v=TXKoo2SNMzk#t=27m50s

Having anything more fine-grained than all-cluster didn't seem worth it.
Using per-user keys is useful, but also much harder to implement.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com





Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-04 Thread Bruce Momjian
On Fri, Oct  4, 2019 at 11:31:00PM +0200, Tomas Vondra wrote:
> On Fri, Oct 04, 2019 at 03:57:32PM -0400, Bruce Momjian wrote:
> > The current approach is to encrypt anything that contains user data,
> > which includes heap, index, and WAL files.  I think replication slots
> > and logical replication might also fall into that category, which is why
> > I started this thread.
> 
> Yes, I think those bits have to be encrypted too.
> 
> BTW I'm not sure why you list replication slots and logical replication
> independently, those are mostly the same thing I think. For physical
> slots we probably don't need to encrypt anything, but for logical slots
> we may spill decoded data to files (so those will contain user data).

In this thread, I am really looking for experts who can explain exactly
where sensitive data is stored in PGDATA.  Oh, pgsql_tmp must be
encrypted too.  I would say we know which things must be encrypted, but
we now need to go through the rest of PGDATA to determine which parts
are safe to leave unencrypted, and which must be encrypted.

> > I can see some saying that all cluster files should be encrypted, and I
> > can respect that argument.  However, as outlined in the diagram linked
> > to from the blog entry:
> > 
> > https://momjian.us/main/blogs/pgblog/2019.html#September_27_2019
> > 
> > I feel that TDE, since it has limited value, and can't really avoid all
> > information leakage, should strive to find the intersection of ease of
> > implementation, security, and compliance.  If people don't think that
> > limited file encryption is secure, I get it.  However, encrypting most
> > or all files I think would lead us into such a "difficult to implement"
> > scope that I would no longer be able to work on this feature.  I think
> > the code complexity, fragility, potential unreliability, and even
> > overhead of trying to encrypt most/all files would lead TDE to be
> > greatly delayed or never implemented.  I just couldn't recommend it.
> > Now, I might be totally wrong, and encryption of everything might be
> > just fine, but I have to pick my projects, and such an undertaking seems
> > far too risky for me.
> > 
> 
> I agree some trade-offs will be needed, to make the implementation at
> all possible (irrespective of the exact design). But I think those
> trade-offs need to be conscious, based on some technical arguments why
> it's OK to consider a particular information leak acceptable, etc. For
> example it may be fine when assuming the attacker only gets a single
> static copy of the data directory, but not when having the ability to
> observe changes made by a running instance.

Yes, we need to be explicit in what we don't encrypt --- that it is
reasonably safe.

> In a way, my concern is somewhat the opposite of yours - that we'll end
> up with a feature (which necessarily adds complexity) that however does
> not provide sufficient security for various use cases.

Yep, if we can't do it safely, there is no point in doing it.

> And I don't know where exactly the middle ground is, TBH.

We spent a lot of time figuring out exactly how to safely encrypt WAL,
heap, index, and pgsql_tmp files.  The idea of doing this for another
20 types of files --- finding a safe nonce, making sure a file rewrite
doesn't reuse the nonce, figuring out the API, crash recovery, forensics,
the tool interface --- is something I would like to avoid.  I want to
avoid it not because I don't like work, but because I am afraid the code
impact and fragility will doom the feature.

> > Just for some detail, we have solved the block-level encryption problem
> > by using CTR mode in most cases, but there is still a requirement for a
> > nonce for every encryption operation.  You can use derived keys too, but
> > you need to set up those keys for every write to encrypt files.  Maybe
> > it is possible to set up a write API that handles this transparently in
> > the code, but I don't know how to do that cleanly, and I doubt if the
> > value of encrypting everything is worth it.
> > 
> > As far as encrypting the log file, I can see us adding documentation to
> > warn about that, and even issue a server log message if encryption is
> > enabled and syslog is not being used.  (I don't know how to test if
> > syslog is being shipped to a remote server.)
> > 
> 
> Not sure. I wonder if it's possible to set up syslog so that it encrypts
> the data on storage, and if that would be a suitable solution e.g. for
> PCI DSS purposes. (It seems at least rsyslogd supports that.)

Well, users don't want the data visible in a mounted file system, which
is why we were thinking a remote secure server would help.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com





Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-04 Thread Tomas Vondra

On Fri, Oct 04, 2019 at 04:58:14PM -0400, Bruce Momjian wrote:

On Fri, Oct  4, 2019 at 10:46:57PM +0200, Tomas Vondra wrote:

Oracle also has a handy "TDE best practices" document [2], which says
when to use column-level encryption - let me quote a couple of points:

* Location of sensitive information is known

* Less than 5% of all application columns are encryption candidates

* Encryption candidates are not foreign-key columns

* Indexes over encryption candidates are normal B-tree indexes (this
 also means no support for indexes on expressions, and likely partial
 indexes)

* No support from hardware crypto acceleration.


Aren't all modern systems going to have hardware crypto acceleration,
i.e., AES-NI CPU extensions?  Does that mean there is no value in
partial encryption on such systems?  Looking at the overhead numbers I
have seen for AES-NI-enabled systems, I believe it.




That's a good question, I don't know the answer. You're right most
systems have CPUs with AES-NI these days, and I'm not sure why the
column encryption does not leverage that.

Maybe it's because column encryption has to encrypt/decrypt much smaller
chunks of data, and AES-NI is not efficient for that? I don't know.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: Proposal: Make use of C99 designated initialisers for nulls/values arrays

2019-10-04 Thread Andres Freund
Hi,

On 2019-10-04 17:08:29 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2019-10-04 16:31:29 -0400, Bruce Momjian wrote:
> >> Yeah, it is certainly weird that you have to assign the first array
> >> element to get the rest to be zeros.  By using a macro, we can document
> >> this behavior in one place.
> 
> > IDK, to me this seems like something one just has to learn about C, with
> > the macro just obfuscating that already required knowledge. It's not
> > like this only applies to stack variables initialized with {0}. It's
> > also true of global variables, or function-local static ones, for
> > example.
> 
> Huh?  For both those cases, the *default* behavior, going back to K&R C,
> is that the variable initializes to all-bits-zero.  There's no need to
> write anything extra.

What I mean is that if there's any initialization, it's to all zeroes,
except for the parts that are explicitly initialized.  And all the {0}
does is ensure the rest of the fields are initialized the way other such
initialization happens.

There are plenty of places where we don't initialize every part, e.g. a
struct member, of global variables (and for stack-allocated data as
well, increasingly so).  One has to be able to make sense of things like
somevar = {.foo = bar /* field blub is not initialized */};
or things like guc.c, where we rely on zero initialization of most
fields of config_generic.
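A small self-contained illustration of the behaviors under discussion (the struct and names here are hypothetical, not from the PostgreSQL tree): statics are zero-initialized with no initializer at all, {0} on an automatic array zeroes the remaining elements, and a designated initializer zeroes every unnamed field:

```c
#include <stddef.h>

struct config
{
	int			id;
	const char *name;
	double		scale;
};

/* File-scope variables are zero-initialized by default -- no
 * initializer needed, as Tom points out. */
static int	global_counter;

/* Returns 1 iff every implicitly-initialized location is zero. */
static int
zero_init_demo(void)
{
	/* {0} explicitly sets the first element; the remaining seven are
	 * implicitly zeroed, like any partial initialization. */
	int			nulls[8] = {0};

	/* A designated initializer names one field; id and scale are
	 * implicitly zeroed -- the guc.c-style pattern Andres mentions. */
	struct config c = {.name = "some_setting"};

	if (global_counter != 0)
		return 0;
	for (int i = 0; i < 8; i++)
		if (nulls[i] != 0)
			return 0;
	return c.id == 0 && c.scale == 0.0;
}
```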


> If some people are writing {0} there, I think
> we should discourage that on the grounds that it results in inconsistent
> coding style.

Yea, I'm not advocating that.

Greetings,

Andres Freund




Re: format of pg_upgrade loadable_libraries warning

2019-10-04 Thread Bruce Momjian
On Wed, Oct  2, 2019 at 12:23:37PM -0500, Justin Pryzby wrote:
> Regarding the previous thread and commit here:
> https://www.postgresql.org/message-id/flat/20180713162815.GA3835%40momjian.us
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=60e3bd1d7f92430b24b710ecf0559656eb8ed499
> 
> I'm suggesting to reformat the warning, which I found to be misleading:
> 
> |could not load library "$libdir/pgfincore": ERROR:  could not access file 
> "$libdir/pgfincore": No such file or directory
> |Database: postgres
> |Database: too
> 
> To me that reads as "error message" followed by successful processing of two
> named databases, and not "error message followed by list of databases for
> which that error was experienced".  Essentially, the database names are
> themselves the "error", and the message is a prefix indicating the library
> version; but normally, error-looking things are output without a "prefix",
> since they weren't anticipated.
> 
> The existing check is optimized to check each library once, but then outputs
> each database which would try to load it.  That's an implementation detail,
> but it adds to the confusion, since it shows a single error-looking thing
> which might apply to multiple DBs (it's not obvious to me that it's
> associated with a DB at all).  That leads me to believe that after I "DROP
> EXTENSION" once, I can reasonably expect the upgrade to complete, which has
> a good chance of being wrong, and is exactly what the patch was intended to
> avoid :(

Understood.  This is a general problem with the way pg_upgrade displays
errors and the databases/objects associated with them.  The attached
patch fixes the output text to say "in database", e.g.:

  Could not load library "$libdir/pgfincore": ERROR:  could not access file 
"$libdir/pgfincore": No such file or directory
  in database: postgres
  in database: too

Would indenting help too?  I am inclined to fix this only in head, and not
to backpatch the change.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
new file mode 100644
index 617270f..1bbb57c
*** a/src/bin/pg_upgrade/check.c
--- b/src/bin/pg_upgrade/check.c
*** check_for_isn_and_int8_passing_mismatch(
*** 858,864 
  		 output_path, strerror(errno));
  			if (!db_used)
  			{
! fprintf(script, "Database: %s\n", active_db->db_name);
  db_used = true;
  			}
  			fprintf(script, "  %s.%s\n",
--- 858,864 
  		 output_path, strerror(errno));
  			if (!db_used)
  			{
! fprintf(script, "in database: %s\n", active_db->db_name);
  db_used = true;
  			}
  			fprintf(script, "  %s.%s\n",
*** check_for_tables_with_oids(ClusterInfo *
*** 937,943 
  		 output_path, strerror(errno));
  			if (!db_used)
  			{
! fprintf(script, "Database: %s\n", active_db->db_name);
  db_used = true;
  			}
  			fprintf(script, "  %s.%s\n",
--- 937,943 
  		 output_path, strerror(errno));
  			if (!db_used)
  			{
! fprintf(script, "in database: %s\n", active_db->db_name);
  db_used = true;
  			}
  			fprintf(script, "  %s.%s\n",
*** check_for_reg_data_type_usage(ClusterInf
*** 1046,1052 
  		 output_path, strerror(errno));
  			if (!db_used)
  			{
! fprintf(script, "Database: %s\n", active_db->db_name);
  db_used = true;
  			}
  			fprintf(script, "  %s.%s.%s\n",
--- 1046,1052 
  		 output_path, strerror(errno));
  			if (!db_used)
  			{
! fprintf(script, "in database: %s\n", active_db->db_name);
  db_used = true;
  			}
  			fprintf(script, "  %s.%s.%s\n",
*** check_for_jsonb_9_4_usage(ClusterInfo *c
*** 1137,1143 
  		 output_path, strerror(errno));
  			if (!db_used)
  			{
! fprintf(script, "Database: %s\n", active_db->db_name);
  db_used = true;
  			}
  			fprintf(script, "  %s.%s.%s\n",
--- 1137,1143 
  		 output_path, strerror(errno));
  			if (!db_used)
  			{
! fprintf(script, "in database: %s\n", active_db->db_name);
  db_used = true;
  			}
  			fprintf(script, "  %s.%s.%s\n",
diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
new file mode 100644
index 0c66d1c..506fd34
*** a/src/bin/pg_upgrade/function.c
--- b/src/bin/pg_upgrade/function.c
*** check_loadable_libraries(void)
*** 256,262 
  		}
  
  		if (was_load_failure)
! 			fprintf(script, _("Database: %s\n"),
  	old_cluster.dbarr.dbs[os_info.libraries[libnum].dbnum].db_name);
  	}
  
--- 256,262 
  		}
  
  		if (was_load_failure)
! 			fprintf(script, _("in database: %s\n"),
  	old_cluster.dbarr.dbs[os_info.libraries[libnum].dbnum].db_name);
  	}
  
diff --git a/src/bin/pg_upgrade/version.c b/src/bin/pg_upgrade/version.c
new file mode 100644
index 

Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-04 Thread Tomas Vondra

On Fri, Oct 04, 2019 at 03:57:32PM -0400, Bruce Momjian wrote:

On Fri, Oct  4, 2019 at 09:18:58AM -0400, Robert Haas wrote:

I think everyone would agree that if you have no information about a
database other than the contents of pg_clog, that's not a meaningful
information leak. You would be able to tell which transactions
committed and which transactions aborted, but since you know nothing
about the data inside those transactions, it's of no use to you.
However, in that situation, you probably wouldn't be attacking the
database in the first place. Most likely you have some knowledge about
what it contains. Maybe there's a stream of sensor data that flows
into the database, and you can see that stream.  By watching pg_clog,
you can see when a particular bit of data is rejected. That could be
valuable.


It is certainly true that seeing activity in _any_ cluster file could
leak information.  However, even if we encrypted all the cluster files,
bad actors could still get information by analyzing the file sizes and
size changes of relation files, and the speed of WAL creation, and even
monitor WAL for write activity (WAL file byte changes).  I would think
that would leak more information than clog.



Yes, those information leaks seem unavoidable. 


I am not sure how you could secure against that information leak.  While
file system encryption might do that at the storage layer, it doesn't do
anything at the mounted file system layer.



That's because FDE is only meant to protect against a passive attacker,
essentially one who steals the device. It's useless when someone gains access
to a mounted disk, so these information leaks are irrelevant.

(I'm only talking about encryption at the block device level. I'm not
sure about details e.g. for the encryption built into ext4, etc.)


The current approach is to encrypt anything that contains user data,
which includes heap, index, and WAL files.  I think replication slots
and logical replication might also fall into that category, which is why
I started this thread.



Yes, I think those bits have to be encrypted too.

BTW I'm not sure why you list replication slots and logical replication
independently, those are mostly the same thing I think. For physical
slots we probably don't need to encrypt anything, but for logical slots
we may spill decoded data to files (so those will contain user data).


I can see some saying that all cluster files should be encrypted, and I
can respect that argument.  However, as outlined in the diagram linked
to from the blog entry:

https://momjian.us/main/blogs/pgblog/2019.html#September_27_2019

I feel that TDE, since it has limited value, and can't really avoid all
information leakage, should strive to find the intersection of ease of
implementation, security, and compliance.  If people don't think that
limited file encryption is secure, I get it.  However, encrypting most
or all files I think would lead us into such a "difficult to implement"
scope that I would no longer be able to work on this feature.  I think
the code complexity, fragility, potential unreliability, and even
overhead of trying to encrypt most/all files would lead TDE to be
greatly delayed or never implemented.  I just couldn't recommend it.
Now, I might be totally wrong, and encryption of everything might be
just fine, but I have to pick my projects, and such an undertaking seems
far too risky for me.



I agree some trade-offs will be needed, to make the implementation at
all possible (irrespective of the exact design). But I think those
trade-offs need to be conscious, based on some technical arguments why
it's OK to consider a particular information leak acceptable, etc. For
example it may be fine when assuming the attacker only gets a single
static copy of the data directory, but not when having the ability to
observe changes made by a running instance.

In a way, my concern is somewhat the opposite of yours - that we'll end
up with a feature (which necessarily adds complexity) that however does
not provide sufficient security for various use cases.

And I don't know where exactly the middle ground is, TBH.


Just for some detail, we have solved the block-level encryption problem
by using CTR mode in most cases, but there is still a requirement for a
nonce for every encryption operation.  You can use derived keys too, but
you need to set up those keys for every write to encrypt files.  Maybe
it is possible to set up a write API that handles this transparently in
the code, but I don't know how to do that cleanly, and I doubt if the
value of encrypting everything is worth it.
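As a toy illustration of why the nonce requirement matters (this uses a throwaway keystream function standing in for AES-CTR; it is not real cryptography and not PostgreSQL code): in CTR mode, encryption is XOR with a keystream derived from the nonce and a counter, so reusing a nonce repeats the keystream and the XOR of two ciphertexts reveals the XOR of the plaintexts:

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for an AES-CTR keystream -- a real implementation
 * would encrypt (nonce || counter) with AES.  Illustration only. */
static uint8_t
toy_keystream_byte(uint64_t nonce, uint64_t counter)
{
	uint64_t	x = nonce ^ (counter * UINT64_C(0x9E3779B97F4A7C15));

	x ^= x >> 33;
	x *= UINT64_C(0xFF51AFD7ED558CCD);
	x ^= x >> 33;
	return (uint8_t) x;
}

/* CTR-style encryption is just XOR with the keystream; decryption is
 * the identical operation with the same nonce. */
static void
toy_ctr_crypt(uint64_t nonce, const uint8_t *in, uint8_t *out, size_t len)
{
	for (size_t i = 0; i < len; i++)
		out[i] = in[i] ^ toy_keystream_byte(nonce, (uint64_t) i);
}
```

If two different messages are ever encrypted under the same nonce, then c1[i] ^ c2[i] == p1[i] ^ p2[i] -- the keystream cancels out -- which is why a rewritten file must never reuse its nonce.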

As far as encrypting the log file, I can see us adding documentation to
warn about that, and even issue a server log message if encryption is
enabled and syslog is not being used.  (I don't know how to test if
syslog is being shipped to a remote server.)



Not sure. I wonder if it's possible to set up syslog so that it encrypts
the data on storage, and if that would be a suitable 

Re: Proposal: Make use of C99 designated initialisers for nulls/values arrays

2019-10-04 Thread Tom Lane
Andres Freund  writes:
> On 2019-10-04 16:31:29 -0400, Bruce Momjian wrote:
>> Yeah, it is certainly weird that you have to assign the first array
>> element to get the rest to be zeros.  By using a macro, we can document
>> this behavior in one place.

> IDK, to me this seems like something one just has to learn about C, with
> the macro just obfuscating that already required knowledge. It's not
> like this only applies to stack variables initialized with {0}. It's
> also true of global variables, or function-local static ones, for
> example.

Huh?  For both those cases, the *default* behavior, going back to K&R C,
is that the variable initializes to all-bits-zero.  There's no need to
write anything extra.  If some people are writing {0} there, I think
we should discourage that on the grounds that it results in inconsistent
coding style.

Note that I'm not proposing a rule against, say,

static MyNodeType *my_variable = NULL;

That's perfectly sensible and adds no cognitive load that I can see.
But in cases where you have to indulge in type punning or reliance on
obscure language features to get the result that would happen if you'd
just not written anything, I think you should just not write anything.

regards, tom lane




Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-04 Thread Bruce Momjian
On Fri, Oct  4, 2019 at 10:46:57PM +0200, Tomas Vondra wrote:
> Oracle also has a handy "TDE best practices" document [2], which says
> when to use column-level encryption - let me quote a couple of points:
> 
> * Location of sensitive information is known
> 
> * Less than 5% of all application columns are encryption candidates
> 
> * Encryption candidates are not foreign-key columns
> 
> * Indexes over encryption candidates are normal B-tree indexes (this
>  also means no support for indexes on expressions, and likely partial
>  indexes)
> 
> * No support from hardware crypto acceleration.

Aren't all modern systems going to have hardware crypto acceleration,
i.e., AES-NI CPU extensions?  Does that mean there is no value in
partial encryption on such systems?  Looking at the overhead numbers I
have seen for AES-NI-enabled systems, I believe it.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-04 Thread Tomas Vondra

On Thu, Oct 03, 2019 at 01:26:55PM -0400, Stephen Frost wrote:

Greetings,

* Tomas Vondra (tomas.von...@2ndquadrant.com) wrote:

On Thu, Oct 03, 2019 at 11:51:41AM -0400, Stephen Frost wrote:
>* Tomas Vondra (tomas.von...@2ndquadrant.com) wrote:
>>On Thu, Oct 03, 2019 at 10:40:40AM -0400, Stephen Frost wrote:
>>>People who are looking for 'encrypt all the things' should and will be
>>>looking at filesytem-level encryption options.  That's not what this
>>>feature is about.
>>
>>That's almost certainly not true, at least not universally.
>>
>>It may be true for some people, but a lot of the people asking for
>>in-database encryption essentially want to do filesystem encryption but
>>can't use it for various reasons. E.g. because they're running in
>>environments that make filesystem encryption impossible to use (OS not
>>supporting it directly, no access to the block device, lack of admin
>>privileges, ...). Or maybe they worry about people with fs access.
>
>Anyone coming from other database systems isn't asking for that though
>and it wouldn't be a comparable offering to other systems.

I don't think that's quite accurate. In the previous message you claimed
(1) this isn't what other database systems do and (2) people who want to
encrypt everything should just use fs encryption, because that's not
what TDE is about.

Regarding (1), I'm pretty sure Oracle TDE does pretty much exactly this,
at least in the mode with tablespace-level encryption. It's true there
is also column-level mode, but from my experience it's far less used
because it has a number of annoying limitations.


We're probably being too general and that's ending up with us talking
past each other.  Yes, Oracle provides tablespace and column level
encryption, but neither case results in *everything* being encrypted.



Possibly. There are far too many different TDE definitions in all those
various threads.


So I'm somewhat puzzled by your claim that people coming from other
systems are asking for the column-level mode. At least I'm assuming
that's what they're asking for, because I don't see other options.


I've seen asks for tablespace, table, and column-level, but it's always
been about the actual data.  Something like clog is an entirely internal
structure that doesn't include the actual data.  Yes, it's possible it
could somehow be used for a side-channel attack, as could other things,
such as WAL, and as such I'm not sure that forcing a policy of "encrypt
everything" is actually a sensible approach and it definitely adds
complexity and makes it a lot more difficult to come up with a sensible
solution.



IMHO the proven design principle is "deny all" by default, i.e. we
should start with the assumption that clog is encrypted and then present
arguments why it's not needed. Maybe it's 100% fine and we don't need to
encrypt it, or maybe it's a minor information leak and is not worth the
extra complexity, or maybe it's not needed for v1. But how do you know?
I don't think that discussion happened anywhere in those threads.



>>If you look at how the two threads discussing the FDE design, both of
>>them pretty much started as "let's do FDE in the database".
>
>And that's how some folks continue to see it- let's just encrypt all the
>things, until they actually look at it and start thinking about what
>that means and how to implement it.

This argument also works the other way, though. On Oracle, people often
start with the column-level encryption because it seems naturally
superior (hey, I can encrypt just the columns I want, ...) and then they
start running into the various limitations and eventually just switch to
the tablespace-level encryption.

Now, maybe we'll be able to solve those limitations - but I think it's
pretty unlikely, because those limitations seem quite inherent to how
encryption affects indexes etc.


It would probably be useful to discuss the specific limitations that
you've seen causes people to move away from column-level encryption.

I definitely agree that figuring out how to make things work with
indexes is a non-trivial challenge, though I'm hopeful that we can come
up with something sensible.



Hope is hardly something we should use to drive design decisions ...

As for the limitations, the column-level limitations in Oracle, this is
what the docs [1] say:

----------------------------------------
Do not use TDE column encryption with the following database features:

   Index types other than B-tree

   Range scan search through an index

   Synchronous change data capture

   Transportable tablespaces

   Columns that have been created as identity columns

In addition, you cannot use TDE column encryption to encrypt columns
used in foreign key constraints.
----------------------------------------

Now, some of that is obviously specific to Oracle, but at least some of
it seems to affect us too - certainly range scans through indexes,
possibly data capture (I believe that's mostly logical decoding),
non-btree indexes and identity columns.

Oracle also has a 

Re: Proposal: Make use of C99 designated initialisers for nulls/values arrays

2019-10-04 Thread Andres Freund
Hi,

On 2019-10-04 16:31:29 -0400, Bruce Momjian wrote:
> On Fri, Oct  4, 2019 at 02:05:41PM -0400, Chapman Flack wrote:
> > On 10/4/19 1:44 PM, Ashwin Agrawal wrote:
> > 
> > > macro exist in first place will be hard to remember. So, irrespective
> > > in long run, {0} might get used in code and hence seems better
> > > to just use {0} from start itself instead of macro/wrapper on top.

It already is in somewhat frequent use, fwiw.


> > > Plus, even if someone starts out with thought {1} sets them all to ones,
> > > I feel will soon realize by exercising the code isn't the reality.
> > 
> > I wish ISO C had gone the same place gcc (and C++ ?) went, and allowed
> > the initializer {}, which would eliminate any chance of it misleading
> > a casual reader.
> > 
> > If that were the case, I would be +1 on just using the {} syntax.
> > 
> > But given that the standard is stuck on requiring a first element,
> > I am +1 on using the macro, just to avoid giving any wrong impressions,
> > even fleeting ones.
> 
> Yeah, it is certainly weird that you have to assign the first array
> element to get the rest to be zeros.  By using a macro, we can document
> this behavior in one place.

IDK, to me this seems like something one just has to learn about C, with
the macro just obfuscating that already required knowledge. It's not
like this only applies to stack variables initialized with {0}. It's
also true of global variables, or function-local static ones, for
example.

Greetings,

Andres Freund




Re: Tighten error control for OpenTransientFile/CloseTransientFile

2019-10-04 Thread Andres Freund
Hi,

On 2019-03-07 10:56:25 +0900, Michael Paquier wrote:
> diff --git a/src/backend/access/heap/rewriteheap.c 
> b/src/backend/access/heap/rewriteheap.c
> index f5cf9ffc9c..bce4274362 100644
> --- a/src/backend/access/heap/rewriteheap.c
> +++ b/src/backend/access/heap/rewriteheap.c
> @@ -1202,7 +1202,10 @@ heap_xlog_logical_rewrite(XLogReaderState *r)
>errmsg("could not fsync file \"%s\": %m", 
> path)));
>   pgstat_report_wait_end();
>
> - CloseTransientFile(fd);
> + if (CloseTransientFile(fd))
> + ereport(ERROR,
> + (errcode_for_file_access(),
> +  errmsg("could not close file \"%s\": %m", 
> path)));
>  }
...
> diff --git a/src/backend/access/transam/twophase.c 
> b/src/backend/access/transam/twophase.c
> index 64679dd2de..21986e48fe 100644
> --- a/src/backend/access/transam/twophase.c
> +++ b/src/backend/access/transam/twophase.c
> @@ -1297,7 +1297,11 @@ ReadTwoPhaseFile(TransactionId xid, bool missing_ok)
>   }
>
>   pgstat_report_wait_end();
> - CloseTransientFile(fd);
> +
> + if (CloseTransientFile(fd))
> + ereport(ERROR,
> + (errcode_for_file_access(),
> +  errmsg("could not close file \"%s\": %m", 
> path)));
>
>   hdr = (TwoPhaseFileHeader *) buf;
>   if (hdr->magic != TWOPHASE_MAGIC)
> diff --git a/src/backend/access/transam/xlog.c 
> b/src/backend/access/transam/xlog.c
> index 0fdd82a287..c7047738b6 100644
> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -3469,7 +3469,10 @@ XLogFileCopy(XLogSegNo destsegno, TimeLineID srcTLI, 
> XLogSegNo srcsegno,
>   (errcode_for_file_access(),
>errmsg("could not close file \"%s\": %m", 
> tmppath)));
>
> - CloseTransientFile(srcfd);
> + if (CloseTransientFile(srcfd))
> + ereport(ERROR,
> + (errcode_for_file_access(),
> +  errmsg("could not close file \"%s\": %m", 
> path)));
>
>   /*
>* Now move the segment into place with its final name.
...

This seems like an odd set of changes to me. What is this supposed to
buy us?  The commit message says:
2) When opening transient files, it is up to the caller to close the
file descriptors opened.  In error code paths, CloseTransientFile() gets
called to clean up things before issuing an error.  However in normal
exit paths, a lot of callers of CloseTransientFile() never actually
reported errors, which could leave a file descriptor open without
knowing about it.  This is an issue I complained about a couple of
times, but never had the courage to write and submit a patch, so here we
go.

but that reasoning seems bogus to me. For one, on just about any
platform close always closes the fd, even when returning an error
(unless you pass in a bad fd, in which case it obviously doesn't). So
the reasoning that this fixes unnoticed fd leaks doesn't really seem to
make sense. For another, even if it did, throwing an ERROR seems to
achieve very little: We continue with a leaked fd *AND* we cause the
operation to error out.

I can see reasoning for:
- LOG, so it can be noticed, but operations continue to work
- FATAL, to fix the leak
- PANIC, so we recover from the problem, in case of the close indicating
  a durability issue

  commit 9ccdd7f66e3324d2b6d3dec282cfa9ff084083f1
  Author: Thomas Munro 
  Date:   2018-11-19 13:31:10 +1300

  PANIC on fsync() failure.

but ERROR seems to have very little going for it.

The durability argument doesn't seem to apply for the cases where we
previously fsynced the file, a significant fraction of the locations you
touched.

And if your goal was just to achieve consistency, I also don't
understand, because you left plenty of close() calls unchecked? E.g. you added
an error check in get_controlfile(), but not one in
ReadControlFile(). alterSystemSetConfigFile() writes, but you didn't add
one.

Greetings,

Andres Freund




Re: Proposal: Make use of C99 designated initialisers for nulls/values arrays

2019-10-04 Thread Bruce Momjian
On Fri, Oct  4, 2019 at 02:05:41PM -0400, Chapman Flack wrote:
> On 10/4/19 1:44 PM, Ashwin Agrawal wrote:
> 
> > macro exist in first place will be hard to remember. So, irrespective
> > in long run, {0} might get used in code and hence seems better
> > to just use {0} from start itself instead of macro/wrapper on top.
> > 
> > Plus, even if someone starts out with thought {1} sets them all to ones,
> > I feel will soon realize by exercising the code isn't the reality.
> 
> I wish ISO C had gone the same place gcc (and C++ ?) went, and allowed
> the initializer {}, which would eliminate any chance of it misleading
> a casual reader.
> 
> If that were the case, I would be +1 on just using the {} syntax.
> 
> But given that the standard is stuck on requiring a first element,
> I am +1 on using the macro, just to avoid giving any wrong impressions,
> even fleeting ones.

Yeah, it is certainly weird that you have to assign the first array
element to get the rest to be zeros.  By using a macro, we can document
this behavior in one place.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: Memory Accounting

2019-10-04 Thread Robert Haas
On Tue, Sep 24, 2019 at 2:47 PM Melanie Plageman
 wrote:
> I think it would be helpful if we could repeat the performance tests
> Robert did on that machine with the current patch (unless this version
> of the patch is exactly the same as the ones he tested previously).

I still have access to a POWER machine, but it's currently being used
by somebody else for a benchmarking project, so I can't test this
immediately.

It's probably worth noting that, in addition to whatever's changed in
this patch, tuplesort.c's memory management has been altered
significantly since 2015 - see
0011c0091e886b874e485a46ff2c94222ffbf550 and
e94568ecc10f2638e542ae34f2990b821bbf90ac. I'm not immediately sure how
that would affect the REINDEX case that I tested back then, but it
seems at least possible that they would have the effect of making
palloc overhead less significant. More generally, so much of the
sorting machinery has been overhauled by Peter Geoghegan since then
that what happens now may just not be very comparable to what happened
back then.

I do agree that this approach looks pretty light weight. Tomas's point
about the difference between updating only the current context and
updating all parent contexts seems right on target to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-04 Thread Tomas Vondra

On Fri, Oct 04, 2019 at 07:52:48AM +0200, Magnus Hagander wrote:

On Fri, Oct 4, 2019 at 3:42 AM Stephen Frost  wrote:



> It doesn't seem like it would require
> much work at all to construct an argument that a hacker might enjoy
> having unfettered access to pg_clog even if no other part of the
> database can be read.

The question isn't about what hackers would like to have access to, it's
about what would actually provide them with a channel to get information
that's sensitive, and at what rate.  Perhaps there's an argument to be
made that clog would provide a high enough rate of information that
could be used to glean sensitive information, but that's certainly not
an argument that's been put forth, instead it's the knee-jerk reaction
of "oh goodness, if anything isn't encrypted then hackers will be able
to get access to everything" and that's just not a real argument.



Huh. That is *exactly* the argument I made. Though granted the example was
on multixact primarily, because I think that is much more likely to leak
interesting information, but the basis certainly applies to all the
metadata.



IMHO we should treat everything as a serious side-channel by default,
and only consider not encrypting it after presenting arguments why
that's not the case. So we shouldn't be starting with unencrypted clog
and waiting for folks to come up with attacks leveraging that.

Of course, it's impossible to prove that something is not a serious
side-channel (it might be safe on its own, but not necessarily when
combined with other side-channels). And it's not black-and-white, i.e.
the side-channel may be leaking so little information it's not worth
bothering with. And ultimately it's a trade-off between complexity of
implementation and severity of the side-channel.

But without at least trying to quantify the severity of the side-channel
we can't really have a discussion whether it's OK not to encrypt clog,
whether it can be omitted from v1 etc.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-04 Thread Bruce Momjian
On Fri, Oct  4, 2019 at 09:18:58AM -0400, Robert Haas wrote:
> I think everyone would agree that if you have no information about a
> database other than the contents of pg_clog, that's not a meaningful
> information leak. You would be able to tell which transactions
> committed and which transactions aborted, but since you know nothing
> about the data inside those transactions, it's of no use to you.
> However, in that situation, you probably wouldn't be attacking the
> database in the first place. Most likely you have some knowledge about
> what it contains. Maybe there's a stream of sensor data that flows
> into the database, and you can see that stream.  By watching pg_clog,
> you can see when a particular bit of data is rejected. That could be
> valuable.

It is certainly true that seeing activity in _any_ cluster file could
leak information.  However, even if we encrypted all the cluster files,
bad actors could still get information by analyzing the file sizes and
size changes of relation files, and the speed of WAL creation, and even
monitor WAL for write activity (WAL file byte changes).  I would think
that would leak more information than clog.

I am not sure how you could secure against that information leak.  While
file system encryption might do that at the storage layer, it doesn't do
anything at the mounted file system layer.

The current approach is to encrypt anything that contains user data,
which includes heap, index, and WAL files.  I think replication slots
and logical replication might also fall into that category, which is why
I started this thread.

I can see some saying that all cluster files should be encrypted, and I
can respect that argument.  However, as outlined in the diagram linked
to from the blog entry:

https://momjian.us/main/blogs/pgblog/2019.html#September_27_2019

I feel that TDE, since it has limited value, and can't really avoid all
information leakage, should strive to find the intersection of ease of
implementation, security, and compliance.  If people don't think that
limited file encryption is secure, I get it.  However, encrypting most
or all files I think would lead us into such a "difficult to implement"
scope that I would no longer be able to work on this feature.  I think
the code complexity, fragility, potential unreliability, and even
overhead of trying to encrypt most/all files would lead TDE to be
greatly delayed or never implemented.  I just couldn't recommend it. 
Now, I might be totally wrong, and encryption of everything might be
just fine, but I have to pick my projects, and such an undertaking seems
far too risky for me.

Just for some detail, we have solved the block-level encryption problem
by using CTR mode in most cases, but there is still a requirement for a
nonce for every encryption operation.  You can use derived keys too, but
you need to set up those keys for every write to encrypt files.  Maybe
it is possible to set up a write API that handles this transparently in
the code, but I don't know how to do that cleanly, and I doubt if the
value of encrypting everything is worth it.

As far as encrypting the log file, I can see us adding documentation to
warn about that, and even issue a server log message if encryption is
enabled and syslog is not being used.  (I don't know how to test if
syslog is being shipped to a remote server.)

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: Problem with repalloc downsizing patch

2019-10-04 Thread Tom Lane
I wrote:
> I'm also wondering a bit whether there's anyplace *else* that is
> cheating by assuming that a downsizing repalloc doesn't move the block.
> We could investigate that by testing with a modified form of
> AllocSetRealloc that always moves the block, but of course that won't
> find bugs in untested code paths.  So another possibility is to revert
> c477f3e44 and then document that AllocSetRealloc does not move a block
> when reducing its size.  That does not seem very attractive though.

I did that testing, and found that check-world does not expose any other
trouble spots beyond the one in enlarge_list().  So I think this option
should be rejected.

That leaves us with needing to decide whether we should or shouldn't
forcibly split off the initial ListCell array if it's large.  I'm
kind of leaning to not doing so, because doing that would add an
extra test (at least) to each list creation, and the frequency of
being able to reclaim space seems like it'd be pretty small.  You
need a large initial list, *plus* a request to make it even larger.

(I haven't been able to reproduce skink's failure though, so maybe
there's something I'm missing.)

Thoughts?

regards, tom lane




Re: Memory Accounting

2019-10-04 Thread Tom Lane
Peter Geoghegan  writes:
> On Fri, Oct 4, 2019 at 7:32 AM Jeff Davis  wrote:
>> The patch has been floating around for a very long time, so I don't
>> remember exactly why I chose a signed value. Sorry.

> I am reminded of the fact that int64 is used to size buffers within
> tuplesort.c, because it needs to support negative availMem sizes --
> when huge allocations were first supported, tuplesort.c briefly used
> "Size", which didn't work. Perhaps it had something to do with that.

I wonder if we should make that use ssize_t instead.  Probably
not worth the trouble.

regards, tom lane




Re: Memory Accounting

2019-10-04 Thread Peter Geoghegan
On Fri, Oct 4, 2019 at 7:32 AM Jeff Davis  wrote:
> The patch has been floating around for a very long time, so I don't
> remember exactly why I chose a signed value. Sorry.

I am reminded of the fact that int64 is used to size buffers within
tuplesort.c, because it needs to support negative availMem sizes --
when huge allocations were first supported, tuplesort.c briefly used
"Size", which didn't work. Perhaps it had something to do with that.

--
Peter Geoghegan




Re: Memory Accounting

2019-10-04 Thread Tom Lane
I wrote:
> Just to make things even more mysterious, prairiedog finally showed
> the Assert failure on its fourth run with c477f3e449 included:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2019-10-04%2012%3A35%3A41
> It's also now apparent that lapwing and locust were failing only
> sometimes, as well.  I totally don't understand why that failure
> would've been only partially reproducible.  Maybe we should dig
> a bit harder, rather than just deciding that we fixed it.

I did that digging, reproducing the problem on florican's host
(again intermittently).  Here's a stack trace from the spot where
we sometimes downsize a large chunk:

#5  0x0851c70a in AllocSetRealloc (context=0x31b35000, pointer=0x319e5020, 
size=1224) at aset.c:1158
#6  0x085232eb in repalloc (pointer=0x319e5020, size=1224) at mcxt.c:1082
#7  0x31b69591 in resize_intArrayType (num=300, a=0x319e5020)
at _int_tool.c:268
#8  resize_intArrayType (a=0x319e5020, num=300) at _int_tool.c:250
#9  0x31b6995d in _int_unique (r=0x319e5020) at _int_tool.c:329
#10 0x31b66a00 in g_int_union (fcinfo=0xffbfcc5c) at _int_gist.c:146
#11 0x084ff9c4 in FunctionCall2Coll (flinfo=0x319e2bb4, collation=100, 
arg1=834250780, arg2=4290759884) at fmgr.c:1162
#12 0x080db3a3 in gistMakeUnionItVec (giststate=0x319e2820, itvec=0x31bae4a4, 
len=15, attr=0xffbfce5c, isnull=0xffbfcedc) at gistutil.c:204
#13 0x080e410d in gistunionsubkeyvec (giststate=giststate@entry=0x319e2820, 
itvec=itvec@entry=0x31bb5ed4, gsvp=gsvp@entry=0xffbfcd4c) at gistsplit.c:64
#14 0x080e416f in gistunionsubkey (giststate=giststate@entry=0x319e2820, 
itvec=itvec@entry=0x31bb5ed4, spl=spl@entry=0xffbfce3c) at gistsplit.c:91
#15 0x080e4456 in gistSplitByKey (r=<optimized out>, page=<optimized out>, 
itup=<optimized out>, len=<optimized out>, giststate=<optimized out>, 
v=<optimized out>, attno=<optimized out>) at gistsplit.c:689
#16 0x080d8797 in gistSplit (r=0x31bbb424, page=0x297e0b80 "", 
itup=0x31bb5ed4, len=16, giststate=0x319e2820) at gist.c:1432
#17 0x080d8d9c in gistplacetopage (rel=<optimized out>, 
freespace=<optimized out>, giststate=<optimized out>, 
buffer=<optimized out>, itup=<optimized out>, ntup=<optimized out>, 
oldoffnum=<optimized out>, newblkno=<optimized out>, 
leftchildbuf=<optimized out>, splitinfo=<optimized out>, 
markfollowright=<optimized out>, heapRel=<optimized out>, 
is_build=<optimized out>) at gist.c:299

So the potential downsize is expected, triggered by _int_unique
being able to remove some duplicate entries from a GIST union key.
AFAICT the fact that it happens only intermittently must boil down
to the randomized insertion choices that gistchoose() sometimes makes.

In short: nothing to see here, move along.

regards, tom lane




Re: tableam vs. TOAST

2019-10-04 Thread Robert Haas
On Fri, Sep 6, 2019 at 10:59 AM Robert Haas  wrote:
> On Thu, Sep 5, 2019 at 4:07 PM Andres Freund  wrote:
> > Yea, makes sense to me.
>
> OK, done.  Here's the remaining patches again, with a slight update to
> the renaming patch (now 0002).  In the last version, I renamed
> toast_insert_or_update to heap_toast_insert_or_update but did not
> rename toast_delete to heap_toast_delete.  Actually, I'm not seeing
> any particular reason not to go ahead and push the renaming patch at
> this point also.

And, hearing no objections, done.

Here's the last patch back, rebased over that renaming. Although I
think that Andres (and Tom) are probably right that there's room for
improvement here, I currently don't see a way around the issues I
wrote about in 
http://postgr.es/m/ca+tgmoa0zfcacpojcsspollgpztvfsyvcvb-uss8yokzmo5...@mail.gmail.com
-- so not quite sure where to go next. Hopefully Andres or someone
else will give me a quick whack with the cluebat if I'm missing
something obvious.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


v7-0001-Allow-TOAST-tables-to-be-implemented-using-table-.patch
Description: Binary data


Re: Proposal: Make use of C99 designated initialisers for nulls/values arrays

2019-10-04 Thread Chapman Flack
On 10/4/19 1:44 PM, Ashwin Agrawal wrote:

> macro exist in first place will be hard to remember. So, irrespective
> in long run, {0} might get used in code and hence seems better
> to just use {0} from start itself instead of macro/wrapper on top.
> 
> Plus, even if someone starts out with thought {1} sets them all to ones,
> I feel will soon realize by exercising the code isn't the reality.

I wish ISO C had gone the same place gcc (and C++ ?) went, and allowed
the initializer {}, which would eliminate any chance of it misleading
a casual reader.

If that were the case, I would be +1 on just using the {} syntax.

But given that the standard is stuck on requiring a first element,
I am +1 on using the macro, just to avoid giving any wrong impressions,
even fleeting ones.

Regards,
-Chap




Re: Proposal: Make use of C99 designated initialisers for nulls/values arrays

2019-10-04 Thread Ashwin Agrawal
On Fri, Oct 4, 2019 at 8:49 AM Tom Lane  wrote:

> Jacob Champion  writes:
> > On Fri, Oct 4, 2019 at 7:51 AM Tom Lane  wrote:
> >> I concur with Joe here.  The reason why some of the existing
> >> memset's use "false" is for symmetry with other places where we use
> >> "memset(p, true, n)" to set an array of bools to all-true.
>
> > Why introduce a macro at all for the universal zero initializer, if it
> > seems to encourage the construction of other (incorrect) macros?
>
> Well, the argument is that some people might think that if {0} is enough
> to set all array elements to 0, then maybe {1} sets them all to ones
> (as, indeed, one could argue would be a far better specification than
> what the C committee actually wrote).  Using a separate macro and then
> discouraging direct use of the incomplete-initializer syntax should help
> to avoid that error.
>

It seems like avoidable overhead to have to remind folks that the macro
exists. Plus, for such a thing, the fact that a macro exists in the first
place will be hard to remember. So, in the long run, {0} might get used in
the code anyway, and hence it seems better to just use {0} from the start
instead of a macro/wrapper on top.

Plus, even if someone starts out thinking that {1} sets them all to ones, I
feel they will soon realize by exercising the code that this isn't the
reality. If such code is written and nothing fails, that itself seems like
the bigger issue.


Problem with repalloc downsizing patch

2019-10-04 Thread Tom Lane
skink just found what seems like a serious problem with commit
c477f3e44:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2019-10-04%2010%3A15%3A05

performing post-bootstrap initialization ... TRAP: FailedAssertion("newlist == list", File: "/home/andres/build/buildfarm/HEAD/pgsql.build/../pgsql/src/backend/nodes/list.c", Line: 200)
Aborted

That assert is here:

/*
 * Currently, asking aset.c to reduce the allocated size of the List
 * header is pointless in terms of reclaiming space, unless the list
 * is very long.  However, it seems worth doing anyway to cause the
 * no-longer-needed initial_elements[] space to be cleared in
 * debugging builds.
 */
newlist = (List *) repalloc(list, offsetof(List, initial_elements));

/* That better not have failed, nor moved the list header */
Assert(newlist == list);

If we invoke realloc() then it's very much within its rights to return
a block that's not where the original block was.  I'm a bit surprised
that initdb runs any SQL commands that create Lists long enough to
reach the threshold where c477f3e44 would make a difference (~1000
initial elements would be needed), but there you have it.

This is a bit sticky to resolve.  I'm inclined to think that the fault
is on list.c, as it really shouldn't assume that repalloc doesn't move
the block.  (In fact, the quoted comment is self-contradictory, since
it evidently thinks that space *could* be given back when a large
enough block is involved; which is exactly what didn't happen before
c477f3e44.)  But I do not think we can give up the invariant that List
headers don't move around.  Hence we basically can't do repalloc here
after all.  Perhaps, to limit the cost of that, we should eat the cost
of separately palloc'ing the ListCell array when the initial list length
exceeds some threshold.

I'm also wondering a bit whether there's anyplace *else* that is
cheating by assuming that a downsizing repalloc doesn't move the block.
We could investigate that by testing with a modified form of
AllocSetRealloc that always moves the block, but of course that won't
find bugs in untested code paths.  So another possibility is to revert
c477f3e44 and then document that AllocSetRealloc does not move a block
when reducing its size.  That does not seem very attractive though.

Any opinions about this?

regards, tom lane




Re: dropdb --force

2019-10-04 Thread Pavel Stehule
On Thu, Oct 3, 2019 at 7:48 PM, vignesh C  wrote:

> On Wed, Oct 2, 2019 at 10:21 PM Pavel Stehule 
> wrote:
> >
> > Thank you for careful review. I hope so all issues are out.
> >
> >
> Thanks Pavel for fixing the comments.
> Few comments:
> The else part cannot be hit in DropDatabase function as gram.y expects
> FORCE.
> +
> + if (strcmp(opt->defname, "force") == 0)
> + force = true;
> + else
> + ereport(ERROR,
> + (errcode(ERRCODE_SYNTAX_ERROR),
> + errmsg("unrecognized DROP DATABASE option \"%s\"", opt->defname),
> + parser_errposition(pstate, opt->location)));
> + }
> +
>

I know - but somebody can call the DropDatabase function from outside the
parser, so it is better to check all possibilities.


> We should change gram.y to accept any keyword and throw error from
> DropDatabase function.
> + */
> +drop_option: FORCE
> + {
> + $$ = makeDefElem("force", NULL, @1);
> + }
> + ;
>

I spent some time thinking about it, and I think this variant (with the
keyword) is quite readable and illustrative. That would be lost with the
generic variant.

Since the keyword FORCE already exists, I prefer the current state.


>
> "This will also fail" should be "This will fail"
> + 
> +  This will fail, if current user has no permissions to terminate
> other
> +  connections. Required permissions are the same as with
> +  pg_terminate_backend, described
> +  in .
> +
> +  This will also fail if we are not able to terminate connections or
> +  when there are active prepared transactions or active logical
> replication
> +  slots.
> + 
>

fixed


>
> Can we add few tests for this feature.
>

there aren't any other tests for DROP DATABASE

We can check the syntax later in the second patch (for the -f option of the
dropdb command)

Regards

Pavel

>
> Regards,
> Vignesh
> EnterpriseDB: http://www.enterprisedb.com
>
diff --git a/doc/src/sgml/ref/drop_database.sgml b/doc/src/sgml/ref/drop_database.sgml
index 3ac06c984a..9b9cabe71c 100644
--- a/doc/src/sgml/ref/drop_database.sgml
+++ b/doc/src/sgml/ref/drop_database.sgml
@@ -21,7 +21,11 @@ PostgreSQL documentation
 
  
 
-DROP DATABASE [ IF EXISTS ] name
+DROP DATABASE [ IF EXISTS ] name [ [ WITH ] ( option [, ...] ) ] 
+
+where option can be one of:
+
+FORCE
 
  
 
@@ -32,9 +36,11 @@ DROP DATABASE [ IF EXISTS ] name
DROP DATABASE drops a database. It removes the
catalog entries for the database and deletes the directory
containing the data.  It can only be executed by the database owner.
-   Also, it cannot be executed while you or anyone else are connected
-   to the target database.  (Connect to postgres or any
-   other database to issue this command.)
+   Also, it cannot be executed while you are connected to the target database.
+   (Connect to postgres or any other database to issue this
+   command.)
+   If anyone else is connected to the target database, this command will fail
+   unless you use the FORCE option described below.
   
 
   
@@ -64,6 +70,28 @@ DROP DATABASE [ IF EXISTS ] name
  
 

+
+   
+FORCE
+
+ 
+  Attempt to terminate all existing connections to the target database.
+  It doesn't terminate prepared transactions or active logical replication
+  slot(s).
+ 
+ 
+  This will fail, if current user has no permissions to terminate other
+  connections. Required permissions are the same as with 
+  pg_terminate_backend, described
+  in .
+
+  This will fail if we are not able to terminate connections or
+  when there are active prepared transactions or active logical replication
+  slots.
+ 
+
+   
+
   
  
 
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 01d66212e9..8e62359b4d 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -811,7 +811,7 @@ createdb_failure_callback(int code, Datum arg)
  * DROP DATABASE
  */
 void
-dropdb(const char *dbname, bool missing_ok)
+dropdb(const char *dbname, bool missing_ok, bool force)
 {
 	Oid			db_id;
 	bool		db_istemplate;
@@ -896,6 +896,9 @@ dropdb(const char *dbname, bool missing_ok)
   nslots_active, nslots_active)));
 	}
 
+	if (force)
+		TerminateOtherDBBackends(db_id);
+
 	/*
 	 * Check for other backends in the target database.  (Because we hold the
 	 * database lock, no new ones can start after this.)
@@ -1003,6 +1006,30 @@ dropdb(const char *dbname, bool missing_ok)
 	ForceSyncCommit();
 }
 
+/*
+ * Process options and call dropdb function.
+ */
+void
+DropDatabase(ParseState *pstate, DropdbStmt *stmt)
+{
+	bool		force = false;
+	ListCell   *lc;
+
+	foreach(lc, stmt->options)
+	{
+		DefElem*opt = (DefElem *) lfirst(lc);
+
+		if (strcmp(opt->defname, "force") == 0)
+			force = true;
+		else
+			ereport(ERROR,
+	(errcode(ERRCODE_SYNTAX_ERROR),
+	 errmsg("unrecognized DROP DATABASE option \"%s\"", opt->defname),
+	 parser_errposition(pstate, opt->location)));
+	}
+
+	dropdb(stmt->dbname, 

Re: How to retain lesser paths at add_path()?

2019-10-04 Thread Robert Haas
On Mon, Aug 12, 2019 at 12:28 AM Kohei KaiGai  wrote:
> For implementation of the concept, this patch puts a hook on add_path
> / add_partial_path
> to override the path removal decision by extensions, according to its
> own viewpoint.

I don't think this hook is a very useful approach to this problem, and
I'm concerned that it might have a measurable performance cost.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Proposal: Make use of C99 designated initialisers for nulls/values arrays

2019-10-04 Thread Tom Lane
Jacob Champion  writes:
> On Fri, Oct 4, 2019 at 7:51 AM Tom Lane  wrote:
>> I concur with Joe here.  The reason why some of the existing
>> memset's use "false" is for symmetry with other places where we use
>> "memset(p, true, n)" to set an array of bools to all-true.

> Why introduce a macro at all for the universal zero initializer, if it
> seems to encourage the construction of other (incorrect) macros?

Well, the argument is that some people might think that if {0} is enough
to set all array elements to 0, then maybe {1} sets them all to ones
(as, indeed, one could argue would be a far better specification than
what the C committee actually wrote).  Using a separate macro and then
discouraging direct use of the incomplete-initializer syntax should help
to avoid that error.
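
A short illustration of the gotcha (plain C, not PostgreSQL code):

```c
#include <assert.h>

static void
demo_partial_initializer(void)
{
	int			zeros[4] = {0};	/* every element is 0 */
	int			ones[4] = {1};	/* only ones[0] is 1! */

	assert(zeros[0] == 0 && zeros[3] == 0);
	assert(ones[0] == 1);
	assert(ones[1] == 0 && ones[3] == 0);	/* remainder is zero-filled, not 1 */
}
```

Per C99, elements not covered by an initializer list are zero-initialized, so {1} sets only the first element.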

> IMO
> the use of {0} as an initializer is well understood in the C developer
> community, and I'm used to it showing up verbatim in code.

Yeah, if we were all 100% familiar with every sentence in the C standard,
we could argue like that.  But we get lots of submissions from people
for whom C is not their main language.  The fewer gotchas there are in
our agreed-on subset of C, the better.

regards, tom lane




Re: Memory Accounting

2019-10-04 Thread Tom Lane
Tomas Vondra  writes:
> On Fri, Oct 04, 2019 at 12:36:01AM -0400, Tom Lane wrote:
>> What I think is happening is that c477f3e449 allowed this bit in
>> AllocSetRealloc:
>> context->mem_allocated += blksize - oldblksize;
>> to be executed in situations where blksize < oldblksize, where before
>> that was not possible.
>> ...
>> (I'm not quite sure why we're not seeing this failure on *all* the
>> 32-bit machines; maybe there's some other factor involved?)

> Interesting failure mode (especially that it does *not* fail on some
> 32-bit machines).

Just to make things even more mysterious, prairiedog finally showed
the Assert failure on its fourth run with c477f3e449 included:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2019-10-04%2012%3A35%3A41

It's also now apparent that lapwing and locust were failing only
sometimes, as well.  I totally don't understand why that failure
would've been only partially reproducible.  Maybe we should dig
a bit harder, rather than just deciding that we fixed it.

regards, tom lane




Re: Change atoi to strtol in same place

2019-10-04 Thread Alvaro Herrera
On 2019-Oct-03, Joe Nelson wrote:

> Kyotaro Horiguchi wrote:
> > > pg_standby: -k keepfiles could not parse 'hoge' as integer
> >
> > I didn't checked closely, but -k of pg_standby's message looks
> > somewhat strange. Needs a separator?
> 
> Good point, how about this:
> 
>   pg_standby: -k keepfiles: 

The wording is a bit strange.  How about something like
pg_standby: invalid argument to -k: %s

where the %s is the error message produced like you propose:

> I could have pg_strtoint64_range() wrap its error messages in _() so
> that translators could customize the messages prior to concatenation.
> 
>   *error = psprintf(_("could not parse '%s' as integer"), str);

... except that they would rather be more explicit about what the
problem is.  "insufficient digits" or "extraneous character", etc.
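
A sketch of what such distinct error cases could look like (hypothetical names, not the proposed pg_strtoint64_range API), built on strtol's endptr/errno contract:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

typedef enum
{
	PARSE_OK,
	PARSE_NO_DIGITS,			/* e.g. "hoge" -- "insufficient digits" */
	PARSE_TRAILING_GARBAGE,		/* e.g. "12x" -- "extraneous character" */
	PARSE_OUT_OF_RANGE
} ParseResult;

static ParseResult
parse_long(const char *str, long *result)
{
	char	   *endptr;

	errno = 0;
	*result = strtol(str, &endptr, 10);
	if (endptr == str)
		return PARSE_NO_DIGITS;	/* no digits were consumed at all */
	if (errno == ERANGE)
		return PARSE_OUT_OF_RANGE;
	if (*endptr != '\0')
		return PARSE_TRAILING_GARBAGE;	/* digits, then junk */
	return PARSE_OK;
}
```

Each enum value maps to one translatable message, avoiding the concatenation problem.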

> Would this suffice?

I hope that no callers would like to have the messages not translated,
because that seems like it would become a mess.

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Proposal: Make use of C99 designated initialisers for nulls/values arrays

2019-10-04 Thread Jacob Champion
On Fri, Oct 4, 2019 at 7:51 AM Tom Lane  wrote:
> I concur with Joe here.  The reason why some of the existing
> memset's use "false" is for symmetry with other places where we use
> "memset(p, true, n)" to set an array of bools to all-true.  That
> coding is unfortunately a bit dubious --- it would sort-of fail if
> bool weren't of width 1, in that the bools would still test as true
> but they wouldn't contain the standard bit pattern for true.
> I don't want to change those places, but we shouldn't make the
> mechanism proposed by this patch look like it can do anything but
> initialize to zeroes.
>
> regards, tom lane

Why introduce a macro at all for the universal zero initializer, if it
seems to encourage the construction of other (incorrect) macros? IMO
the use of {0} as an initializer is well understood in the C developer
community, and I'm used to it showing up verbatim in code. Similar to
{}'s role as the C++ universal initializer.

--Jacob




Re: refactoring - share str2*int64 functions

2019-10-04 Thread Andres Freund
Hi,

On 2019-10-04 14:27:44 +0530, Ashutosh Sharma wrote:
> Is there any specific reason for hard coding the *base* of a number
> representing the string in strtouint64(). I understand that currently
> strtouint64() is being used just to convert an input string to decimal
> unsigned value but what if we want it to be used for hexadecimal
> values or may be some other values, in that case it can't be used.

It's a lot slower if the base is variable, because the compiler cannot
replace the division by shifts.
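
A simplified sketch of the point (not the actual strtouint64 code): with the base hard-coded to 10, both the multiply and the division in the overflow check are compiled to shifts/adds and a multiply-high; a runtime-variable base would force a genuine division instruction per digit.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t
sketch_strtouint64(const char *s)
{
	uint64_t	val = 0;

	while (*s >= '0' && *s <= '9')
	{
		uint64_t	digit = (uint64_t) (*s - '0');

		/* Division by the CONSTANT 10 becomes multiply-and-shift. */
		if (val > (UINT64_MAX - digit) / 10)
			return UINT64_MAX;	/* saturate on overflow, in this sketch */
		val = val * 10 + digit;	/* constant multiplier, likewise cheap */
		s++;
	}
	return val;
}
```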

Greetings,

Andres Freund




Re: Proposal: Make use of C99 designated initialisers for nulls/values arrays

2019-10-04 Thread Tom Lane
Joe Nelson  writes:
> One might argue that INIT_ALL_ELEMS_FALSE as a synonym for
> INIT_ALL_ELEMS_ZERO is good for readability in the same way that "false"
> is for 0. However I want to avoid creating the impression that there is,
> or can be, a collection of INIT_ALL_ELEMS_xxx macros invoking different
> initializer behavior.

I concur with Joe here.  The reason why some of the existing
memset's use "false" is for symmetry with other places where we use
"memset(p, true, n)" to set an array of bools to all-true.  That
coding is unfortunately a bit dubious --- it would sort-of fail if
bool weren't of width 1, in that the bools would still test as true
but they wouldn't contain the standard bit pattern for true.
I don't want to change those places, but we shouldn't make the
mechanism proposed by this patch look like it can do anything but
initialize to zeroes.

regards, tom lane




Re: Memory Accounting

2019-10-04 Thread Tom Lane
Jeff Davis  writes:
> On Fri, 2019-10-04 at 10:26 +0200, Tomas Vondra wrote:
>> Yeah, I think that's an oversight. Maybe there's a reason why Jeff
>> used int64, but I can't think of any.

> I had chosen a 64-bit value to account for the situation Tom mentioned:
> that, in theory, Size might not be large enough to represent all
> allocations in a memory context. Apparently, that theoretical situation
> is not worth being concerned about.

Well, you could also argue it the other way: maybe in our children's
time, int64 won't be as wide as Size.  (Yeah, I know that sounds
ridiculous, but needing pointers wider than 32 bits was a ridiculous
idea too when I started in this business.)

The committed fix seems OK to me except that I think you should've
also changed MemoryContextMemAllocated() to return Size.

regards, tom lane




Remove some code for old unsupported versions of MSVC

2019-10-04 Thread Peter Eisentraut
As of d9dd406fe281d22d5238d3c26a7182543c711e74, we require MSVC 2013,
which means _MSC_VER >= 1800.  This means that conditionals about
older versions of _MSC_VER can be removed or simplified.

Previous code was also in some cases handling MinGW, where _MSC_VER is
not defined at all, incorrectly, such as in pg_ctl.c and win32_port.h,
leading to some compiler warnings.  This should now be handled better.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From d93b420940fe162e833d2008c681f6403d9e7b7b Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 4 Oct 2019 16:26:06 +0200
Subject: [PATCH] Remove some code for old unsupported versions of MSVC

As of d9dd406fe281d22d5238d3c26a7182543c711e74, we require MSVC 2013,
which means _MSC_VER >= 1800.  This means that conditionals about
older versions of _MSC_VER can be removed or simplified.

Previous code was also in some cases handling MinGW, where _MSC_VER is
not defined at all, incorrectly, such as in pg_ctl.c and win32_port.h,
leading to some compiler warnings.  This should now be handled better.
---
 src/backend/utils/adt/pg_locale.c |  23 +-
 src/backend/utils/adt/selfuncs.c  |  13 -
 src/bin/pg_ctl/pg_ctl.c   |  31 --
 src/include/pg_config.h.win32 |  33 +-
 src/include/port/win32.h  |   2 +-
 src/include/port/win32_port.h |  12 -
 src/include/utils/float.h |   4 +-
 .../ecpg/test/expected/pgtypeslib-nan_test.c  | 107 +++---
 .../test/expected/pgtypeslib-nan_test.stderr  | 354 +-
 .../ecpg/test/pgtypeslib/nan_test.pgc |   7 -
 src/port/chklocale.c  |   4 +-
 src/tools/msvc/Solution.pm|   2 -
 12 files changed, 235 insertions(+), 357 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index b2f08ead45..2a076a3dfd 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -973,7 +973,7 @@ cache_locale_time(void)
 static char *
 IsoLocaleName(const char *winlocname)
 {
-#if (_MSC_VER >= 1400) /* VC8.0 or later */
+#ifdef _MSC_VER
static char iso_lc_messages[32];
_locale_t   loct = NULL;
 
@@ -987,7 +987,6 @@ IsoLocaleName(const char *winlocname)
loct = _create_locale(LC_CTYPE, winlocname);
if (loct != NULL)
{
-#if (_MSC_VER >= 1700) /* Visual Studio 2012 or later */
size_t  rc;
char   *hyphen;
 
@@ -1014,28 +1013,10 @@ IsoLocaleName(const char *winlocname)
hyphen = strchr(iso_lc_messages, '-');
if (hyphen)
*hyphen = '_';
-#else
-   charisolang[32],
-   isocrty[32];
-   LCIDlcid;
-
-   lcid = loct->locinfo->lc_handle[LC_CTYPE];
-   if (lcid == 0)
-   lcid = MAKELCID(MAKELANGID(LANG_ENGLISH, 
SUBLANG_ENGLISH_US), SORT_DEFAULT);
-   _free_locale(loct);
-
-   if (!GetLocaleInfoA(lcid, LOCALE_SISO639LANGNAME, isolang, sizeof(isolang)))
-   return NULL;
-   if (!GetLocaleInfoA(lcid, LOCALE_SISO3166CTRYNAME, isocrty, sizeof(isocrty)))
-   return NULL;
-   snprintf(iso_lc_messages, sizeof(iso_lc_messages) - 1, "%s_%s", isolang, isocrty);
-#endif
return iso_lc_messages;
}
-   return NULL;
-#else
+#endif /* _MSC_VER */
	return NULL;/* Not supported on this version of msvc/mingw */
-#endif /* _MSC_VER >= 1400 */
 }
 #endif /* WIN32 && LC_MESSAGES */
 
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 17101298fb..35a8995f62 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -4092,20 +4092,7 @@ convert_string_datum(Datum value, Oid typid, Oid collid, bool *failure)
 * crashes since it will only give an estimation error and nothing
 * fatal.
 */
-#if _MSC_VER == 1400   /* VS.Net 2005 */
-
-   /*
-*
-* http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=99694
-*/
-   {
-   charx[1];
-
-   xfrmlen = strxfrm(x, val, 0);
-   }
-#else
xfrmlen = strxfrm(NULL, val, 0);
-#endif
 #ifdef WIN32
 
/*
diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index dd76be6dd2..a3fd002ac4 100644
--- 

Re: Memory Accounting

2019-10-04 Thread Jeff Davis
On Fri, 2019-10-04 at 10:26 +0200, Tomas Vondra wrote:
> On Fri, Oct 04, 2019 at 12:36:01AM -0400, Tom Lane wrote:
> > So ... why exactly did this patch define
> > MemoryContextData.mem_allocated
> > as int64?  That seems to me to be doubly wrong: it is not the right
> > width
> > on 32-bit machines, and it is not the right signedness anywhere.  I
> > think
> > that field ought to be of type Size (a/k/a size_t, but memnodes.h
> > always
> > calls it Size).
> > 
> 
> Yeah, I think that's an oversight. Maybe there's a reason why Jeff
> used
> int64, but I can't think of any.

I had chosen a 64-bit value to account for the situation Tom mentioned:
that, in theory, Size might not be large enough to represent all
allocations in a memory context. Apparently, that theoretical situation
is not worth being concerned about.

The patch has been floating around for a very long time, so I don't
remember exactly why I chose a signed value. Sorry.

Regards,
Jeff Davis






Re: Memory Accounting

2019-10-04 Thread Tomas Vondra

On Fri, Oct 04, 2019 at 10:26:44AM +0200, Tomas Vondra wrote:

On Fri, Oct 04, 2019 at 12:36:01AM -0400, Tom Lane wrote:

I haven't chased down exactly what else would need to change.
It might be that s/int64/Size/g throughout the patch is
sufficient, but I haven't analyzed it.



I think so too, but I'll take a closer look in the afternoon, unless you
beat me to it.



I've pushed a fix changing the type to Size, splitting the mem_allocated
to two separate updates (to prevent any underflows in the subtraction).
Hopefully this fixes the 32-bit machines ...
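
A sketch of the 32-bit failure mode and the shape of the fix, using uint32_t as a stand-in for a 32-bit build's Size (hypothetical function names, not the committed code):

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Size32;		/* stand-in for Size on a 32-bit build */

/*
 * Failure mode: with a 64-bit accumulator but 32-bit block sizes,
 * (blksize - oldblksize) wraps in 32-bit unsigned arithmetic when the
 * block shrinks, and the wrapped value is widened to 64 bits without
 * sign extension.
 */
static int64_t
naive_update(int64_t mem_allocated, Size32 oldblksize, Size32 blksize)
{
	return mem_allocated + (blksize - oldblksize);
}

/* The fix: two separate updates, each operand widened correctly. */
static int64_t
split_update(int64_t mem_allocated, Size32 oldblksize, Size32 blksize)
{
	mem_allocated -= oldblksize;
	mem_allocated += blksize;
	return mem_allocated;
}
```

On a shrinking realloc the naive form overshoots by 2^32, which is exactly the kind of accounting corruption the Assert caught.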

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Proposal: Make use of C99 designated initialisers for nulls/values arrays

2019-10-04 Thread Joe Nelson
Amit Kapila wrote:
> > How about I just define them both the same?
> > #define INIT_ALL_ELEMS_ZERO {0}
> > #define INIT_ALL_ELEMS_FALSE{0}
>
> I think using one define would be preferred, but you can wait and see
> if others prefer defining different macros for the same thing.

+1 on using INIT_ALL_ELEMS_ZERO everywhere, even for bool[].  You may
worry, "Should we really be assigning 0 to something of type bool? Is
that the wrong type or a detail that varies by implementation?" It's
safe though, the behavior is guaranteed to be correct by section 7.16 of
the C99 spec, which says that bool, true, and false are always macros
for _Bool, 1, and 0 respectively.
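
Concretely, that guarantee is why a single zero-initializer macro suffices for bool arrays too (a standalone C99 illustration; INIT_ALL_ELEMS_ZERO is the macro name proposed upthread):

```c
#include <assert.h>
#include <stdbool.h>

#define INIT_ALL_ELEMS_ZERO {0}

static void
demo_bool_init(void)
{
	bool		nulls[8] = INIT_ALL_ELEMS_ZERO;
	int			i;

	/* Every element is a valid 'false', since false is the macro 0. */
	for (i = 0; i < 8; i++)
		assert(nulls[i] == false);
}
```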

One might argue that INIT_ALL_ELEMS_FALSE as a synonym for
INIT_ALL_ELEMS_ZERO is good for readability in the same way that "false"
is for 0. However I want to avoid creating the impression that there is,
or can be, a collection of INIT_ALL_ELEMS_xxx macros invoking different
initializer behavior.




Re: [HACKERS] Block level parallel vacuum

2019-10-04 Thread Masahiko Sawada
On Fri, Oct 4, 2019 at 2:31 PM Amit Kapila  wrote:
>
> On Fri, Oct 4, 2019 at 10:28 AM Masahiko Sawada  wrote:
>>
>> On Thu, Oct 3, 2019 at 9:06 PM Dilip Kumar  wrote:
>> >
>> > On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada  
>> > wrote:
>> >
>> > + else
>> > + {
>> > + if (for_cleanup)
>> > + {
>> > + if (lps->nworkers_requested > 0)
>> > + appendStringInfo(,
>> > + ngettext("launched %d parallel vacuum worker for index cleanup
>> > (planned: %d, requested %d)",
>> > +   "launched %d parallel vacuum workers for index cleanup (planned:
>> > %d, requsted %d)",
>> > +   lps->pcxt->nworkers_launched),
>> > + lps->pcxt->nworkers_launched,
>> > + lps->pcxt->nworkers,
>> > + lps->nworkers_requested);
>> > + else
>> > + appendStringInfo(,
>> > + ngettext("launched %d parallel vacuum worker for index cleanup (planned: 
>> > %d)",
>> > +   "launched %d parallel vacuum workers for index cleanup (planned: %d)",
>> > +   lps->pcxt->nworkers_launched),
>> > + lps->pcxt->nworkers_launched,
>> > + lps->pcxt->nworkers);
>> > + }
>> > + else
>> > + {
>> > + if (lps->nworkers_requested > 0)
>> > + appendStringInfo(,
>> > + ngettext("launched %d parallel vacuum worker for index vacuuming
>> > (planned: %d, requested %d)",
>> > +   "launched %d parallel vacuum workers for index vacuuming (planned:
>> > %d, requested %d)",
>> > +   lps->pcxt->nworkers_launched),
>> > + lps->pcxt->nworkers_launched,
>> > + lps->pcxt->nworkers,
>> > + lps->nworkers_requested);
>> > + else
>> > + appendStringInfo(,
>> > + ngettext("launched %d parallel vacuum worker for index vacuuming
>> > (planned: %d)",
>> > +   "launched %d parallel vacuum workers for index vacuuming (planned: 
>> > %d)",
>> > +   lps->pcxt->nworkers_launched),
>> > + lps->pcxt->nworkers_launched,
>> > + lps->pcxt->nworkers);
>> > + }
>> >
>> > Multiple places I see a lot of duplicate code for for_cleanup is true
>> > or false.  The only difference is in the error message whether we give
>> > index cleanup or index vacuuming otherwise complete code is the same
>> > for
>> > both the cases.  Can't we create some string and based on the value of
>> > the for_cleanup and append it in the error message that way we can
>> > avoid duplicating this at many places?
>>
>> I think it's necessary for translation. IIUC if we construct the
>> message it cannot be translated.
>>
>
> Do we really need to log all those messages?  The other places where we 
> launch parallel workers doesn't seem to be using such messages.  Why do you 
> think it is important to log the messages here when other cases don't use it?

Well, I would rather think that parallel create index doesn't log
enough messages. A parallel maintenance operation is invoked manually
by the user. I can imagine that a DBA wants to cancel and retry the
operation later if not enough workers were launched, but there is no
convenient way to confirm how many parallel workers were planned and
actually launched; we have to look at ps or pg_stat_activity. That's
why I think a log message would be helpful for users.
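
For reference, the plural handling the quoted code relies on can be sketched in miniature (sketch_ngettext is a stub standing in for gettext's ngettext, which additionally consults the translation catalog; the translator: comment is the annotation style suggested upthread):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stub: picks singular vs. plural form by count. */
static const char *
sketch_ngettext(const char *singular, const char *plural, unsigned long n)
{
	return (n == 1) ? singular : plural;
}

static void
report_launched(char *buf, size_t buflen, int launched, int planned)
{
	/* translator: both %d are numbers of parallel vacuum workers */
	snprintf(buf, buflen,
			 sketch_ngettext("launched %d parallel vacuum worker (planned: %d)",
							 "launched %d parallel vacuum workers (planned: %d)",
							 launched),
			 launched, planned);
}
```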

Regards,

--
Masahiko Sawada




Re: Two pg_rewind patches (auto generate recovery conf and ensure clean shutdown)

2019-10-04 Thread Alexey Kondratov

On 04.10.2019 11:37, Michael Paquier wrote:

On Thu, Oct 03, 2019 at 12:43:37PM +0300, Alexey Kondratov wrote:

On 03.10.2019 6:07, Michael Paquier wrote:

I have reworked your first patch as per the attached.  What do you
think about it?  The part with the control file needs to go down to
v12, and I would likely split that into two commits on HEAD: one for
the control file and a second for the recovery.conf portion with the
fix for --no-ensure-shutdown to keep a cleaner history.

It looks fine for me excepting the progress reporting part. It now adds
PG_CONTROL_FILE_SIZE to fetch_done. However, I cannot find that control file
is either included into filemap and fetch_size or counted during
calculate_totals(). Maybe I've missed something, but now it looks like we
report something that wasn't planned for progress reporting, doesn't
it?

Right.  The pre-12 code actually handles that incorrecly as it assumed
that any files written through file_ops.c should be part of the
progress.  So I went with the simplest solution, and backpatched this
part with 6f3823b.  I have also committed the set of fixes for the new
options so as we have a better base of work than what's on HEAD
currently.


Great, thanks.



Regarding the tests, adding a --dry-run command is a good idea.
However I think that there is more value to automate the use of the
single user mode automatically in the tests as that's more critical
from the point of view of rewind run, and stopping the cluster with
immediate mode causes, as expected, the next --dry-run command to
fail.

Another thing is that I think that we should use -F with --single.
This makes recovery faster, and the target data folder is synced
at the end of pg_rewind anyway.

Using the long option names makes the tests easier to follow in this
case, so I have switched -R to --write-recovery-conf.

Some comments and the docs have been using some confusing wording, so
I have reworked what I found (like many "it" in a single sentence
referring different things).


I agree with all the points. Shutting down target server using 
'immediate' mode is a good way to test ensureCleanShutdown automatically.



Regarding all the set of incompatible options, we have much more of
that after the initial option parsing so I think that we should group
all the cheap ones together.  Let's tackle that as a separate patch.
We can also just check after --no-ensure-shutdown directly in
RewindTest.pm as I have switched the cluster to not be cleanly shut
down anymore to stress the automatic recovery path, and trigger that
before running pg_rewind for the local and remote mode.

Attached is an updated patch with all I found.  What do you think?


I've checked your patch, but it seems that it cannot be applied as is,
since it e.g. adds a comment to 005_same_timeline.pl without actually
changing the test. So I've slightly modified your patch and tried to fit
both dry-run and ensureCleanShutdown testing together. It works just
fine and fails immediately if any of the recent fixes is reverted. I
still think that dry-run testing is worth adding, since it helped to
catch this v12 refactoring issue, but feel free to throw it away if it
isn't committable right now, of course.


As for testing the incompatible options and sanity checks, yes, I agree
that it is a matter for a different patch. I've attached it as a
separate WIP patch just for the record. Maybe I will try to gather more
cases there later.


--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

>From 6e5667edcad6b037004288635a7ae0eda40d4262 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Fri, 4 Oct 2019 17:14:12 +0300
Subject: [PATCH v3 1/2] Improve functionality, docs and tests of -R,
 --no-ensure-shutdown and --dry-run options

Branch: pg-rewind-fixes
---
 doc/src/sgml/ref/pg_rewind.sgml| 10 +--
 src/bin/pg_rewind/pg_rewind.c  | 19 +++---
 src/bin/pg_rewind/t/001_basic.pl   |  2 +-
 src/bin/pg_rewind/t/002_databases.pl   |  2 +-
 src/bin/pg_rewind/t/003_extrafiles.pl  |  2 +-
 src/bin/pg_rewind/t/004_pg_xlog_symlink.pl |  2 +-
 src/bin/pg_rewind/t/005_same_timeline.pl   | 32 +++---
 src/bin/pg_rewind/t/RewindTest.pm  | 71 +-
 8 files changed, 103 insertions(+), 37 deletions(-)

diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index fbf454803b..42d29edd4e 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -169,12 +169,14 @@ PostgreSQL documentation
   --no-ensure-shutdown
   

-pg_rewind verifies that the target server
-is cleanly shutdown before rewinding; by default, if it isn't, it
-starts the server in single-user mode to complete crash recovery.
+pg_rewind requires that the target server
+is cleanly shut down before rewinding. By default, if the target server
+is not shut down cleanly, pg_rewind starts
+

Re: [HACKERS] Block level parallel vacuum

2019-10-04 Thread Masahiko Sawada
On Fri, Oct 4, 2019 at 2:02 PM Amit Kapila  wrote:
>
> On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada  wrote:
>>
>> On Sat, Sep 21, 2019 at 9:31 PM Amit Kapila  wrote:
>> > *
>> > In function compute_parallel_workers, don't we want to cap the number
>> > of workers based on maintenance_work_mem as we do in
>> > plan_create_index_workers?
>> >
>> > The basic point is how do we want to treat maintenance_work_mem for
>> > this feature.  Do we want all workers to use at max the
>> > maintenance_work_mem or each worker is allowed to use
>> > maintenance_work_mem?  I would prefer earlier unless we have good
>> > reason to follow the later strategy.
>> >
>> > Accordingly, we might need to update the below paragraph in docs:
>> > "Note that parallel utility commands should not consume substantially
>> > more memory than equivalent non-parallel operations.  This strategy
>> > differs from that of parallel query, where resource limits generally
>> > apply per worker process.  Parallel utility commands treat the
>> > resource limit maintenance_work_mem as a limit to
>> > be applied to the entire utility command, regardless of the number of
>> > parallel worker processes."
>>
>> I'd also prefer to use maintenance_work_mem at max during parallel
>> vacuum regardless of the number of parallel workers. This is current
>> implementation. In lazy vacuum the maintenance_work_mem is used to
>> record itempointer of dead tuples. This is done by leader process and
>> worker processes just refers them for vacuuming dead index tuples.
>> Even if user sets a small amount of maintenance_work_mem the parallel
>> vacuum would be helpful as it still would take a time for index
>> vacuuming. So I thought we should cap the number of parallel workers
>> by the number of indexes rather than maintenance_work_mem.
>>
>
> Isn't that true only if we never use maintenance_work_mem during index 
> cleanup?  However, I think we are using during index cleanup, see forex. 
> ginInsertCleanup.  I think before reaching any conclusion about what to do 
> about this, first we need to establish whether this is a problem.  If I am 
> correct, then only some of the index cleanups (like gin index) use 
> maintenance_work_mem, so we need to consider that point while designing a 
> solution for this.
>

I got your point. Currently a single-process lazy vacuum could consume
up to (maintenance_work_mem * 2) memory, because we do index cleanup
while holding the dead tuple space, as you mentioned. And
ginInsertCleanup is also called at the beginning of ginbulkdelete. In
the current parallel lazy vacuum, each parallel vacuum worker could
consume additional memory apart from the memory used by the heap scan,
depending on the implementation of the target index AM. Given the
current single and parallel vacuum implementations, it would be better
to control the total amount of memory rather than the number of
parallel workers. So one approach I came up with is to make all vacuum
workers use (maintenance_work_mem / # of participants) as their new
maintenance_work_mem. It might be too small in some cases, but it
doesn't consume more memory than a single lazy vacuum, as long as the
index AM doesn't consume more memory regardless of
maintenance_work_mem. I think it really depends on the implementation
of the index AM.

>>
>> > *
>> > + keys++;
>> > +
>> > + /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
>> > + maxtuples = compute_max_dead_tuples(nblocks, true);
>> > + est_deadtuples = MAXALIGN(add_size(sizeof(LVDeadTuples),
>> > +mul_size(sizeof(ItemPointerData), maxtuples)));
>> > + shm_toc_estimate_chunk(>estimator, est_deadtuples);
>> > + keys++;
>> > +
>> > + shm_toc_estimate_keys(>estimator, keys);
>> > +
>> > + /* Finally, estimate VACUUM_KEY_QUERY_TEXT space */
>> > + querylen = strlen(debug_query_string);
>> > + shm_toc_estimate_chunk(>estimator, querylen + 1);
>> > + shm_toc_estimate_keys(>estimator, 1);
>> >
>> > The code style looks inconsistent here.  In some cases, you are
>> > calling shm_toc_estimate_keys immediately after shm_toc_estimate_chunk
>> > and in other cases, you are accumulating keys.  I think it is better
>> > to call shm_toc_estimate_keys immediately after shm_toc_estimate_chunk
>> > in all cases.
>>
>> Fixed. But there is some code that calls shm_toc_estimate_keys for
>> multiple keys, for example in nbtsort.c and parallel.c. What is the
>> difference?
>>
>
> We can do it, either way, depending on the situation.  For example, in 
> nbtsort.c, there is an if check based on which 'number of keys' can vary.  I 
> think here we should try to write in a way that it should not confuse the 
> reader why it is done in a particular way.  This is the reason I told you to 
> be consistent.

Understood. Thank you for explanation!

>
>>
>> >
>> > *
>> > +void
>> > +heap_parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
>> > +{
>> > ..
>> > + /* Open table */
>> > + onerel = heap_open(lvshared->relid, 

Re: Close stdout and stderr in syslogger

2019-10-04 Thread Robert Haas
On Thu, Oct 3, 2019 at 8:30 AM Святослав Ермилин
 wrote:
> There is another way to solve this problem:
> We can give users the opportunity to leave or close descriptors.
> I.e. config switch for this. But I think that it's too complicated.

The typical solution to this problem is to send the stdout/stderr to a
logfile that isn't rotated because it never gets very large, and
subsequent output to the real logfile. Seems like that would be the
low-stress way to go here, rather than trying to change the code.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-04 Thread Robert Haas
On Thu, Oct 3, 2019 at 9:42 PM Stephen Frost  wrote:
> > It doesn't seem like it would require
> > much work at all to construct an argument that a hacker might enjoy
> > having unfettered access to pg_clog even if no other part of the
> > database can be read.
>
> The question isn't about what hackers would like to have access to, it's
> about what would actually provide them with a channel to get information
> that's sensitive, and at what rate.  Perhaps there's an argument to be
> made that clog would provide a high enough rate of information that
> could be used to glean sensitive information, but that's certainly not
> an argument that's been put forth, instead it's the knee-jerk reaction
> of "oh goodness, if anything isn't encrypted then hackers will be able
> to get access to everything" and that's just not a real argument.

Well, I gather that you didn't much like my characterization of your
argument as "making it up as you go along," which is probably fair,
but I doubt that the people who are arguing that we should encrypt
anything will appreciate your characterization of their argument as
"knee-jerk" any better.

I think everyone would agree that if you have no information about a
database other than the contents of pg_clog, that's not a meaningful
information leak. You would be able to tell which transactions
committed and which transactions aborted, but since you know nothing
about the data inside those transactions, it's of no use to you.
However, in that situation, you probably wouldn't be attacking the
database in the first place. Most likely you have some knowledge about
what it contains. Maybe there's a stream of sensor data that flows
into the database, and you can see that stream.  By watching pg_clog,
you can see when a particular bit of data is rejected. That could be
valuable.

To take a highly artificial example, suppose that the database is fed
by secret video cameras which identify the faces of everyone who
boards a commercial aircraft and records all of those names in a
database, but high-ranking government officials are exempt from the
program and there's a white-list of people whose names can't be
inserted. When the system tries, a constraint violation occurs and the
transaction aborts.  Now, if you see a transaction abort show up in
pg_clog, you know that either a high-ranking government official just
tried to walk onto a plane, or the system is broken.  If you see a
whole bunch of aborts within a few hours of each other, separated by
lots of successful insertions, maybe you can infer a cabinet meeting.
I don't know. That's a little bit of a stretch, but I don't see any
reason why something like that can't happen. There are probably more
plausible examples.

The point is that it's unreasonable, at least in my view, to decide
that the knowledge of which transactions commit and which transactions
abort isn't sensitive. Yeah, on a lot of systems it won't be, but on
some systems it might be, so it should be encrypted.

What I really find puzzling here is that Cybertec had a patch that
encrypted -- well, I don't remember whether it encrypted this, but it
encrypted a lot of stuff, and it spent a lot of time being concerned
about these exact kinds of issues.  I know for example that they
thought about the stats file, which is an even more clear vector for
information leakage than we're talking about here.  They thought about
logical decoding spill files, also a clear vector for information
leakage. Pretty sure they also thought about WAL. That's all really
important stuff, and one thing I learned from reading that patch is
that you can't solve those problems in a trivial, mechanical way. Some
of those systems currently write data byte-by-byte, and converting
them to work block-by-block makes encrypting them a lot easier. So it
seems to me that even if you think that patch had the dumbest key
management system in the history of the universe, you ought to be
embracing some of the ideas that are in that patch because they'll
make any future encryption project easier. Instead of arguing about
whether these side-channel attacks are important -- and I seem not to
be alone here in believing that they are -- we could be working to get
code that has already been written to help solve those problems
committed.

I ask again -- why are you so opposed to a single-key,
encrypt-everything approach? Even if you think multiple-key,
encrypt-only-some-things is better, they don't have to block each
other.

> Which database systems have you looked at which have the properties
> you're describing above that we should be working hard towards?

I haven't studied other database systems much myself. I have, however,
talked with coworkers of mine who are trying to convince people to use
PostgreSQL and/or Advanced Server, and I've heard a lot from them
about what the customers with whom they work would like to see. I base
my comments on those conversations. What I hear from them is basically
that anything we 

Re: fairywren failures

2019-10-04 Thread Andrew Dunstan


On 10/3/19 4:13 PM, Tom Lane wrote:
> Andres Freund  writes:
>> * It's certainly curious that the failures so far only have happended as
>>   part of pg_upgradeCheck, rather than the plain regression tests.
> Isn't it though.  We spent a long time wondering why we saw parallel
> plan instability mostly in pg_upgradeCheck, too [1].  We eventually
> decided that the cause of that instability was chance timing collisions
> with bgwriter/checkpointer, but nobody ever really explained why
> pg_upgradeCheck should be more prone to hit those windows than the plain
> tests are.  I feel like there's something still to be understood there.
>
> Whether this is related, who's to say.  But given your thought about
> stack alignment, I'm half thinking that the crash is seen when we get a
> signal (e.g. SIGUSR1 from sinval processing) at the wrong time, allowing
> the stack to become unaligned, and that the still-unexplained timing
> difference in pg_upgradeCheck accounts for that test being more prone to
> show it.
>
>   regards, tom lane
>
> [1] 
> https://www.postgresql.org/message-id/20190605050037.ga33...@rfd.leadboat.com



Yes, that's very puzzling. But what do we actually do differently in the
pg_upgrade checks that might account for it? Nothing that is at all
obvious to me.


Another data point: the new Visual Studio 2019 instance drongo running
on the same machine is not exhibiting these problems. Yes, it's not
running test.sh, but vcregress.pl does pretty much the same thing. So
that does seem to point to the toolset. I'll see if I can get the same
toolset jacana is using installed and try that.


cheers


andrew

-- 
Andrew Dunstan  https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: HashTable KeySize

2019-10-04 Thread Tomas Vondra

On Fri, Oct 04, 2019 at 05:06:47PM +0530, Natarajan R wrote:

typedef struct HashTableKey
{
 Oid dbId; // 4 bytes
 int64 productid; // 8 bytes
}HashTableKey; (total size - 12 bytes)

typedef struct HashTableEntry
{
 HashTableKey key;
 ProductInfo *pdt;
}HashTableEntry;

HASHCTL hashInfo;
hashInfo.keysize = sizeof(HashTableKey);
hashInfo.entrysize = sizeof(HashTableEntry);
SampleHashTable = ShmemInitHash("productid vs product struct HashTable",
size, size, &hashInfo, HASH_ELEM | HASH_SHARED_MEM | HASH_BLOBS);

while printing keysize: elog(LOG,"Keysize = %d",sizeof(HashTableKey));

I am getting Keysize = 16. How? What do I need to do in order to have
keysize = 12?


That's likely due to alignment. The second field is a 64-bit value and will
be aligned at an 8-byte boundary, so in memory the struct will look like
this:

 dbId -- 4 bytes
 padding -- 4 bytes
 productId -- 8 bytes

See

 https://en.wikipedia.org/wiki/Data_structure_alignment

and there's also a tool to show the memory layout:

 https://linux.die.net/man/1/pahole


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Consider low startup cost in add_partial_path

2019-10-04 Thread Robert Haas
On Wed, Oct 2, 2019 at 10:22 AM James Coleman  wrote:
> In all cases I've been starting with:
>
> set enable_hashjoin = off;
> set enable_nestloop = off;
> set max_parallel_workers_per_gather = 4;
> set min_parallel_index_scan_size = 0;
> set min_parallel_table_scan_size = 0;
> set parallel_setup_cost = 0;
> set parallel_tuple_cost = 0;
>
> I've also tried various combinations of random_page_cost,
> cpu_index_tuple_cost, cpu_tuple_cost.
>
> Interestingly I've noticed plans joining two relations that look like:
>
>  Limit
>->  Merge Join
>  Merge Cond: (t1.pk = t2.pk)
>  ->  Gather Merge
>Workers Planned: 4
>->  Parallel Index Scan using t_pkey on t t1
>  ->  Gather Merge
>Workers Planned: 4
>->  Parallel Index Scan using t_pkey on t t2
>
> Where I would have expected a Gather Merge above a parallelized merge
> join. Is that reasonable to expect?

Well, you told the planner that parallel_setup_cost = 0, so starting
workers is free. And you told the planner that parallel_tuple_cost =
0, so shipping tuples from the worker to the leader is also free. So
it is unclear why it should prefer a single Gather Merge over two
Gather Merges: after all, the Gather Merge is free!

If you use give those things some positive cost, even if it's smaller
than the default, you'll probably get a saner-looking plan choice.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: max_parallel_workers question

2019-10-04 Thread Robert Haas
On Sat, Sep 28, 2019 at 1:36 PM Jeff Davis  wrote:
> In that case, PGC_SIGHUP seems most appropriate.

Yeah.

> It also might make more sense to rename it to reserved_worker_processes
> and invert the meaning. To me, that would be more clear that it's
> designed to prevent parallel query from interfering with other uses of
> worker processes.

I don't think that would work as well.  Some day we might have another
class of worker processes with its own independent limit, and then
this terminology would get confusing.  It makes sense to say that you
can have up to 10 worker processes of which at most 4 can be used for
parallel query and at most 3 can be used for logical replication, but
it doesn't make nearly as much sense to say that you can have up to 10
worker processes of which 6 can't be used for parallel query and of
which 7 can't be used for logical replication.  That leaves, uh, how
many?

> Another option would be to make it two pools, one for parallel workers
> and one for everything else, and each one would be controlled by a
> PGC_POSTMASTER setting. But it seems like some thought went into trying
> to share the pool of workers[1], so I assume there was a good reason
> you wanted to do that.

Here again, I imagine that in the future we might have various
different worker classes that need to share the total number of
workers, but not necessarily via a hard partition. For example, you
could sensibly say that there are 3 purposes for workers and 10
workers, and no single purpose can consume more than 4 workers. Even
though 4 * 3 > 10, it's a completely reasonable configuration. The
early bird gets the juiciest worm, and the late bird doesn't starve to
death.  Even a more extreme configuration where you limit each purpose
to, say, 7 workers could be reasonable. Here there is a risk of
starvation, but you may know that in your environment it's not likely
to last for very long.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: let's kill AtSubStart_Notify

2019-10-04 Thread Robert Haas
On Fri, Sep 27, 2019 at 5:41 AM Jeevan Ladhe
 wrote:
> I have reviewed your patch, and it seems to correctly implement the
> actions per subtransaction using a stack. At least I could not find
> any flaw in your implementation here.

Thanks for the review.  Based on this and other positive comments made
on this thread, I have committed the patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: WIP/PoC for parallel backup

2019-10-04 Thread Robert Haas
On Fri, Oct 4, 2019 at 7:02 AM Asif Rehman  wrote:
> Based on my understanding your main concern is that the files won't be 
> distributed fairly i.e one worker might get a big file and take more time 
> while others get done early with smaller files? In this approach I have 
> created a list of files in descending order based on their sizes so all the 
> big size files will come at the top. The maximum file size in PG is 1GB so if 
> we have four workers who are picking up file from the list one by one, the 
> worst case scenario is that one worker gets a file of 1GB to process while 
> others get files of smaller size. However with this approach of descending 
> files based on size and handing it out to workers one by one, there is a very 
> high likelihood of workers getting work evenly. Does this address your 
> concerns?

Somewhat, but I'm not sure it's good enough. There are lots of reasons
why two processes that are started at the same time with the same
amount of work might not finish at the same time.

I'm also not particularly excited about having the server do the
sorting based on file size.  Seems like that ought to be the client's
job, if the client needs the sorting.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: recovery_min_apply_delay in archive recovery causes assertion failure in latch

2019-10-04 Thread Fujii Masao
On Mon, Sep 30, 2019 at 12:49 AM Fujii Masao  wrote:
>
> Hi,
>
> I got the following assertion failure when I enabled recovery_min_apply_delay
> and started archive recovery (i.e., I put only recovery.signal not
> standby.signal).
>
> TRAP: FailedAssertion("latch->owner_pid == MyProcPid", File:
> "latch.c", Line: 522)
>
> Here is the example to reproduce the issue:
>
> 
> initdb -D data
> pg_ctl -D data start
> psql -c "alter system set recovery_min_apply_delay to '60s'"
> psql -c "alter system set archive_mode to on"
> psql -c "alter system set archive_command to 'cp %p ../arch/%f'"
> psql -c "alter system set restore_command to 'cp ../arch/%f %p'"
> mkdir arch
> pg_basebackup -D bkp -c fast
> pgbench -i
> pgbench -t 1000
> pg_ctl -D data -m i stop
> rm -rf bkp/pg_wal
> mv data/pg_wal bkp
> rm -rf data
> mv bkp data
> touch data/recovery.signal
> pg_ctl -D data -W start
> 
>
> The latch that causes this assertion failure is recoveryWakeupLatch.
> The ownership of this latch is taken only when standby mode is
> requested. But this latch can be used when starting archive recovery
> with recovery_min_apply_delay set even though it's unowned.
> So the assertion failure happened.
>
> Attached patch fixes this issue by making archive recovery always ignore
> recovery_min_apply_delay. This change is OK because
> recovery_min_apply_delay was introduced for standby mode, I think.

No, I found the following description in the doc.

--
This parameter is intended for use with streaming replication deployments;
however, if the parameter is specified it will be honored in all cases
--

So archive recovery with recovery_min_apply_delay enabled would be
an intended configuration. My patch that changes archive recovery so that
it always ignores the setting might be in the wrong direction. Maybe we should
make recovery_min_apply_delay work properly even in archive recovery.
Thoughts?

Regards,

-- 
Fujii Masao




Re: [HACKERS] Block level parallel vacuum

2019-10-04 Thread vignesh C
On Fri, Oct 4, 2019 at 4:18 PM Amit Kapila  wrote:
>
> On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada  wrote:
>>
>> On Sat, Sep 21, 2019 at 9:31 PM Amit Kapila  wrote:
>> >
One comment:
We can check that parallel_workers is within range, i.e. not more than
MAX_PARALLEL_WORKER_LIMIT.
+ int parallel_workers = 0;
+
+ if (optarg != NULL)
+ {
+ parallel_workers = atoi(optarg);
+ if (parallel_workers <= 0)
+ {
+ pg_log_error("number of parallel workers must be at least 1");
+ exit(1);
+ }
+ }

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: pg_upgrade fails with non-standard ACL

2019-10-04 Thread Anastasia Lubennikova

27.09.2019 15:51, Bruce Momjian wrote:

On Fri, Sep 27, 2019 at 04:22:15PM +0900, Michael Paquier wrote:

On Thu, Sep 26, 2019 at 04:19:38PM -0400, Bruce Momjian wrote:

On Thu, Sep 26, 2019 at 05:16:19PM -0300, Alvaro Herrera wrote:

On 2019-Sep-26, Bruce Momjian wrote:

Well, right now, pg_upgrade --check succeeds, but the upgrade fails.  I
am proposing, at a minimum, that pg_upgrade --check fails in such cases,

Agreed, that should be a minimum fix.

Yes.

Agreed as well here.  At least the latest patch proposed has the merit
to track automatically functions not existing anymore from the
source's version to the target's version, so patching --check offers a
good compromise.  Bruce, are you planning to look more at the patch
posted at [1]?

I did look at it.  It has some TODO items listed in it still, and some
C++ comments, but if everyone likes it I can apply it.


Cool. It seems that everyone agrees on this patch.

I attached the updated version. Now it prints a better error message
and generates an SQL script instead of multiple warnings. The attached 
test script shows that.


Patches for 10, 11, and 12 slightly differ due to merge conflicts, so I 
attached multiple versions.


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

commit dc6d9246c6255bed2bdb2e850a21bb1fc5e0c2fc
Author: Anastasia 
Date:   Fri Oct 4 14:39:54 2019 +0300

pg_upgrade_ACL_check_v2

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index e28b661..bfe9bbb 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -90,6 +90,7 @@ check_and_dump_old_cluster(bool live_check)
 
 	get_loadable_libraries();
 
+	get_catalog_procedures(&old_cluster);
 
 	/*
 	 * Check for various failure cases
@@ -149,6 +150,16 @@ check_new_cluster(void)
 
 	check_loadable_libraries();
 
+	/*
+	 * When restoring non-default ACL from old cluster we may attempt to apply
+	 * GRANT for functions whose signatures have changed significantly.
+	 *
+	 * Compare function lists of old cluster and new cluster to catch
+	 * the incompatibility early and report it to user with a nice error message
+	 */
+	get_catalog_procedures(&new_cluster);
+	check_catalog_procedures(&old_cluster, &new_cluster);
+
 	if (user_opts.transfer_mode == TRANSFER_MODE_LINK)
 		check_hard_link();
 
diff --git a/src/bin/pg_upgrade/function.c b/src/bin/pg_upgrade/function.c
index 063a94f..0fce929 100644
--- a/src/bin/pg_upgrade/function.c
+++ b/src/bin/pg_upgrade/function.c
@@ -13,6 +13,7 @@
 
 #include "access/transam.h"
 #include "catalog/pg_language.h"
+#include "fe_utils/string_utils.h"
 
 
 /*
@@ -275,3 +276,204 @@ check_loadable_libraries(void)
 	else
 		check_ok();
 }
+
+/*
+ * qsort comparator for procedure signatures
+ */
+static int
+proc_compare_sig(const void *p1, const void *p2)
+{
+	ProcInfo *proc1 = (ProcInfo *) p1;
+	ProcInfo *proc2 = (ProcInfo *) p2;
+
+	return strcmp(proc1->procsig, proc2->procsig);
+}
+
+/*
+ * Fetch the signatures and ACL of cluster's system procedures.
+ */
+void
+get_catalog_procedures(ClusterInfo *cluster)
+{
+	int			dbnum;
+
+	/*
+	 * Fetch all procedure signatures and ACL.
+	 * Each procedure may have different ACL in different database.
+	 */
+	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *dbinfo = >dbarr.dbs[dbnum];
+		PGconn	   *conn = connectToServer(cluster, dbinfo->db_name);
+		PGresult   *res;
+		int			num_procs;
+		int			rowno;
+
+		/*
+		 * Fetch procedure signatures and ACL of functions that have
+		 * some non default ACL.
+		 */
+		if (cluster->major_version >= 11)
+		{
+			res = executeQueryOrDie(conn,
+		"select proname::text || '('"
+		" || pg_get_function_arguments(pg_proc.oid)::text"
+		" || ')' as funsig,"
+		" array (SELECT unnest(pg_proc.proacl)"
+		" EXCEPT SELECT unnest(pg_init_privs.initprivs))"
+		" from pg_proc join pg_init_privs"
+		" on pg_proc.oid = pg_init_privs.objoid"
+		" where prokind='f' and proacl != initprivs;");
+		}
+		else
+		{
+			res = executeQueryOrDie(conn,
+	"select proname::text || '('"
+	" || pg_get_function_arguments(pg_proc.oid)::text"
+	" || ')' as funsig,"
+	" array (SELECT unnest(pg_proc.proacl)"
+	" EXCEPT SELECT unnest(pg_init_privs.initprivs))"
+	" from pg_proc join pg_init_privs"
+	" on pg_proc.oid = pg_init_privs.objoid"
+	" where proisagg = false and proacl != initprivs;");
+		}
+
+		num_procs = PQntuples(res);
+		dbinfo->proc_arr.nprocs = num_procs;
+		dbinfo->proc_arr.procs = (ProcInfo *) pg_malloc(sizeof(ProcInfo) * num_procs);
+
+		for (rowno = 0; rowno < num_procs; rowno++)
+		{
+			ProcInfo   *curr = &dbinfo->proc_arr.procs[rowno];
+			char	   *procsig = PQgetvalue(res, rowno, 0);
+			char	   *procacl = PQgetvalue(res, rowno, 1);
+
+			curr->procsig = pg_strdup(procsig);
+			curr->procacl = pg_strdup(procacl);
+		}
+
+		qsort((void *) dbinfo->proc_arr.procs, dbinfo->proc_arr.nprocs,
+			  sizeof(ProcInfo), 

HashTable KeySize

2019-10-04 Thread Natarajan R
typedef struct HashTableKey
{
  Oid dbId; // 4 bytes
  int64 productid; // 8 bytes
}HashTableKey; (total size - 12 bytes)

typedef struct HashTableEntry
{
  HashTableKey key;
  ProductInfo *pdt;
}HashTableEntry;

HASHCTL hashInfo;
hashInfo.keysize = sizeof(HashTableKey);
hashInfo.entrysize = sizeof(HashTableEntry);
SampleHashTable = ShmemInitHash("productid vs product struct HashTable",
size, size, &hashInfo, HASH_ELEM | HASH_SHARED_MEM | HASH_BLOBS);

while printing keysize: elog(LOG,"Keysize = %d",sizeof(HashTableKey));

I am getting Keysize = 16. How? What do I need to do in order to have
keysize = 12?


Re: WIP/PoC for parallel backup

2019-10-04 Thread Asif Rehman
On Thu, Oct 3, 2019 at 6:40 PM Robert Haas  wrote:

> On Fri, Sep 27, 2019 at 12:00 PM Asif Rehman 
> wrote:
> >> > - SEND_FILES_CONTENTS (file1, file2,...) - returns the files in given
> list.
> >> > pg_basebackup will then send back a list of filenames in this
> command. This commands will be send by each worker and that worker will be
> getting the said files.
> >>
> >> Seems reasonable, but I think you should just pass one file name and
> >> use the command multiple times, once per file.
> >
> > I considered this approach initially,  however, I adopted the current
> strategy to avoid multiple round trips between the server and clients and
> save on query processing time by issuing a single command rather than
> multiple ones. Further fetching multiple files at once will also aid in
> supporting the tar format by utilising the existing ReceiveTarFile()
> function and will be able to create a tarball for per tablespace per worker.
>
> I think that sending multiple filenames on a line could save some time
> when there are lots of very small files, because then the round-trip
> overhead could be significant.
>
> However, if you've got mostly big files, I think this is going to be a
> loser. It'll be fine if you're able to divide the work exactly evenly,
> but that's pretty hard to do, because some workers may succeed in
> copying the data faster than others for a variety of reasons: some
> data is in memory, some data has to be read from disk, different data
> may need to be read from different disks that run at different speeds,
> not all the network connections may run at the same speed. Remember
> that the backup's not done until the last worker finishes, and so
> there may well be a significant advantage in terms of overall speed in
> putting some energy into making sure that they finish as close to each
> other in time as possible.
>
> To put that another way, the first time all the workers except one get
> done while the last one still has 10GB of data to copy, somebody's
> going to be unhappy.
>

I have updated the patch (see the attached patch) to include tablespace
support, tar format support and all other backup base backup options to
work in parallel mode as well. As previously suggested, I have removed
BASE_BACKUP [PARALLEL] and have added START_BACKUP instead to start the
backup. The tar format will write multiple tar files depending upon the
number of workers specified. Also made all commands
(START_BACKUP/SEND_FILES_CONTENT/STOP_BACKUP) to accept the
base_backup_opt_list. This way the command-line options can also be
provided to these commands. Since the command-line options don't change
once the backup initiates, I went this way instead of storing them in
shared state.

The START_BACKUP command will now return a sorted list of files in
descending order based on file sizes. This way, the larger files will be on
top of the list; hence these files will be assigned to workers one by one,
making it so that the larger files will be copied before other files.

Based on my understanding your main concern is that the files won't be
distributed fairly i.e one worker might get a big file and take more time
while others get done early with smaller files? In this approach I have
created a list of files in descending order based on their sizes so all the
big size files will come at the top. The maximum file size in PG is 1GB so
if we have four workers who are picking up file from the list one by one,
the worst case scenario is that one worker gets a file of 1GB to process
while others get files of smaller size. However with this approach of
descending files based on size and handing it out to workers one by one,
there is a very high likelihood of workers getting work evenly. Does this
address your concerns?

Furthermore the patch also includes the regression test. As t/
010_pg_basebackup.pl test-case is testing base backup comprehensively, so I
have duplicated it to "t/040_pg_basebackup_parallel.pl" and added the parallel
option in all of its tests, to make sure parallel mode works as expected.
The one thing that differs from base backup is the file checksum reporting.
In parallel mode, the total number of checksum failures is not reported
correctly; however, the backup will abort whenever a checksum failure
occurs. This is because the processes do not maintain any shared state. I
assume that reporting the total number of failures is less important than
noticing the failure and aborting.


--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca


0001-parallel-backup.patch
Description: Binary data


Re: [HACKERS] Block level parallel vacuum

2019-10-04 Thread Amit Kapila
On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada 
wrote:

> On Sat, Sep 21, 2019 at 9:31 PM Amit Kapila 
> wrote:
> >
> > *
> > +end_parallel_vacuum(LVParallelState *lps, Relation *Irel, int nindexes)
> > {
> > ..
> > + /* Shutdown worker processes and destroy the parallel context */
> > + WaitForParallelWorkersToFinish(lps->pcxt);
> > ..
> > }
> >
> > Do we really need to call WaitForParallelWorkersToFinish here as it
> > must have been called in lazy_parallel_vacuum_or_cleanup_indexes
> > before this time?
>
> No, removed.
>

+ /* Shutdown worker processes and destroy the parallel context */
+ DestroyParallelContext(lps->pcxt);

But you forgot to update the comment.

Few more comments:

1.
+/*
+ * Parallel Index vacuuming and index cleanup routine used by both the
leader
+ * process and worker processes. Unlike single process vacuum, we don't
update
+ * index statistics after cleanup index since it is not allowed during
+ * parallel mode, instead copy index bulk-deletion results from the local
+ * memory to the DSM segment and update them at the end of parallel lazy
+ * vacuum.
+ */
+static void
+do_parallel_vacuum_or_cleanup_indexes(Relation *Irel, int nindexes,
+  IndexBulkDeleteResult **stats,
+  LVShared *lvshared,
+  LVDeadTuples *dead_tuples)
+{
+ /* Loop until all indexes are vacuumed */
+ for (;;)
+ {
+ int idx;
+
+ /* Get an index number to process */
+ idx = pg_atomic_fetch_add_u32(&(lvshared->nprocessed), 1);
+
+ /* Done for all indexes? */
+ if (idx >= nindexes)
+ break;
+
+ /*
+ * Update the pointer to the corresponding bulk-deletion result
+ * if someone has already updated it.
+ */
+ if (lvshared->indstats[idx].updated &&
+ stats[idx] == NULL)
+ stats[idx] = &(lvshared->indstats[idx].stats);
+
+ /* Do vacuum or cleanup one index */
+ if (!lvshared->for_cleanup)
+ lazy_vacuum_index(Irel[idx], &stats[idx], dead_tuples,
+  lvshared->reltuples);
+ else
+ lazy_cleanup_index(Irel[idx], &stats[idx], lvshared->reltuples,
+   lvshared->estimated_count);

It seems we always run index cleanup via parallel worker which seems
overkill because the cleanup index generally scans the index only when
bulkdelete was not performed.  In some cases, like for the hash index, it
doesn't do anything even when bulk delete is not called.  OTOH, for the brin index,
it does the main job during cleanup but we might be able to always allow
index cleanup by parallel worker for brin indexes if we remove the
allocation in brinbulkdelete which I am not sure is of any use.

I think we shouldn't call cleanup via parallel worker unless bulkdelete
hasn't been performed on the index.

2.
- for (i = 0; i < nindexes; i++)
- lazy_vacuum_index(Irel[i],
-  &indstats[i],
-  vacrelstats);
+ lazy_vacuum_or_cleanup_indexes(vacrelstats, Irel, nindexes,
+   indstats, lps, false);

Indentation is not proper.  You might want to run pgindent.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: dropdb --force

2019-10-04 Thread vignesh C
On Thu, Oct 3, 2019 at 11:18 PM vignesh C  wrote:
>
> On Wed, Oct 2, 2019 at 10:21 PM Pavel Stehule  wrote:
> >
> > Thank you for careful review. I hope so all issues are out.
> >
> >
> Thanks Pavel for fixing the comments.
> Few comments:
> The else part cannot be hit in DropDatabase function as gram.y expects FORCE.
> +
> + if (strcmp(opt->defname, "force") == 0)
> + force = true;
> + else
> + ereport(ERROR,
> + (errcode(ERRCODE_SYNTAX_ERROR),
> + errmsg("unrecognized DROP DATABASE option \"%s\"", opt->defname),
> + parser_errposition(pstate, opt->location)));
> + }
> +
>
> We should change gram.y to accept any keyword and throw error from
> DropDatabase function.
> + */
> +drop_option: FORCE
> + {
> + $$ = makeDefElem("force", NULL, @1);
> + }
> + ;
>
> "This will also fail" should be "This will fail"
> + 
> +  This will fail, if current user has no permissions to terminate other
> +  connections. Required permissions are the same as with
> +  pg_terminate_backend, described
> +  in .
> +
> +  This will also fail if we are not able to terminate connections or
> +  when there are active prepared transactions or active logical 
> replication
> +  slots.
> + 
>
> Can we add few tests for this feature.
>
Couple of minor comments:
DBNAME can be included
- * DROP DATABASE [ IF EXISTS ]
+ * DROP DATABASE [ IF EXISTS ] [ [ WITH ] ( options ) ]
can be
- * DROP DATABASE [ IF EXISTS ]
+ * DROP DATABASE [ IF EXISTS ] DBNAME [ [ WITH ] ( options ) ]

Should we include dbname in the below also?
+DROP DATABASE [ IF EXISTS ] name [ [ WITH ] ( option [, ...] ) ]
+
+where option can
be one of:

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: Attempt to consolidate reading of XLOG page

2019-10-04 Thread Antonin Houska
This is the next version.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com

>From 01f5cc8b0c1e6133e16242ec99957a78551058a7 Mon Sep 17 00:00:00 2001
From: Antonin Houska 
Date: Fri, 4 Oct 2019 12:07:22 +0200
Subject: [PATCH 1/2] Use only xlogreader.c:XLogRead()

The implementations in xlogutils.c and walsender.c are just renamed now, to be
removed by the following diff.
---
 src/backend/access/transam/xlogreader.c | 141 ++-
 src/backend/access/transam/xlogutils.c  |  61 --
 src/backend/replication/walsender.c | 143 +++-
 src/bin/pg_waldump/pg_waldump.c |  60 +-
 src/include/access/xlogreader.h |  42 +++
 5 files changed, 428 insertions(+), 19 deletions(-)

diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index c8b0d2303d..3e2167ca5a 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -17,6 +17,8 @@
  */
 #include "postgres.h"
 
+#include 
+
 #include "access/transam.h"
 #include "access/xlogrecord.h"
 #include "access/xlog_internal.h"
@@ -27,6 +29,7 @@
 
 #ifndef FRONTEND
 #include "miscadmin.h"
+#include "pgstat.h"
 #include "utils/memutils.h"
 #endif
 
@@ -626,7 +629,13 @@ ReadPageInternal(XLogReaderState *state, XLogRecPtr pageptr, int reqLen)
 	if (!XLogReaderValidatePageHeader(state, pageptr, (char *) hdr))
 		goto err;
 
-	/* update read state information */
+	/*
+	 * Update read state information.
+	 *
+	 * Note that XLogRead(), if used, should have updated the "seg" too for
+	 * its own reasons, however we cannot rely on ->read_page() to call
+	 * XLogRead().
+	 */
 	state->seg.ws_segno = targetSegNo;
 	state->seg.ws_off = targetPageOff;
 	state->readLen = readLen;
@@ -1016,6 +1025,136 @@ out:
 
 #endif			/* FRONTEND */
 
+/*
+ * Read 'count' bytes from WAL fetched from timeline 'tli' into 'buf',
+ * starting at location 'startptr'. 'seg' is the last segment used,
+ * 'openSegment' is a callback that opens the next segment if needed and
+ * 'segcxt' is additional segment info that does not fit into 'seg'.
+ *
+ * 'errinfo' should point to XLogReadError structure which will receive error
+ * details in case the read fails.
+ *
+ * Returns true if succeeded, false if failed.
+ *
+ * XXX probably this should be improved to suck data directly from the
+ * WAL buffers when possible.
+ */
+bool
+XLogRead(char *buf, XLogRecPtr startptr, Size count, TimeLineID tli,
+		 WALOpenSegment *seg, WALSegmentContext *segcxt,
+		 WALSegmentOpen openSegment, XLogReadError *errinfo)
+{
+	char	   *p;
+	XLogRecPtr	recptr;
+	Size		nbytes;
+
+	p = buf;
+	recptr = startptr;
+	nbytes = count;
+
+	while (nbytes > 0)
+	{
+		int			segbytes;
+		int			readbytes;
+
+		seg->ws_off = XLogSegmentOffset(recptr, segcxt->ws_segsize);
+
+		if (seg->ws_file < 0 ||
+			!XLByteInSeg(recptr, seg->ws_segno, segcxt->ws_segsize) ||
+			tli != seg->ws_tli)
+		{
+			XLogSegNo	nextSegNo;
+
+			/* Switch to another logfile segment */
+			if (seg->ws_file >= 0)
+close(seg->ws_file);
+
+			XLByteToSeg(recptr, nextSegNo, segcxt->ws_segsize);
+
+			/* Open the next segment in the caller's way. */
+			openSegment(nextSegNo, segcxt, &tli, &seg->ws_file);
+
+			/* Update the current segment info. */
+			seg->ws_tli = tli;
+			seg->ws_segno = nextSegNo;
+			seg->ws_off = 0;
+		}
+
+		/* How many bytes are within this segment? */
+		if (nbytes > (segcxt->ws_segsize - seg->ws_off))
+			segbytes = segcxt->ws_segsize - seg->ws_off;
+		else
+			segbytes = nbytes;
+
+#ifndef FRONTEND
+		pgstat_report_wait_start(WAIT_EVENT_WAL_READ);
+#endif
+
+		/*
+		 * Failure to read the data does not necessarily imply non-zero errno.
+		 * Set it to zero so that caller can distinguish the failure that does
+		 * not affect errno.
+		 */
+		errno = 0;
+
+		readbytes = pg_pread(seg->ws_file, p, segbytes, seg->ws_off);
+
+#ifndef FRONTEND
+		pgstat_report_wait_end();
+#endif
+
+		if (readbytes <= 0)
+		{
+			errinfo->read_errno = errno;
+			errinfo->readbytes = readbytes;
+			errinfo->reqbytes = segbytes;
+			errinfo->seg = seg;
+			return false;
+		}
+
+		/* Update state for read */
+		recptr += readbytes;
+		nbytes -= readbytes;
+		p += readbytes;
+
+		/* Update the current segment info. */
+		seg->ws_off += readbytes;
+	}
+
+	return true;
+}
+
+#ifndef FRONTEND
+/*
+ * Backend-specific convenience code to handle read errors encountered by
+ * XLogRead().
+ */
+void
+XLogReadProcessError(XLogReadError *errinfo)
+{
+	WALOpenSegment *seg = errinfo->seg;
+
+	if (errinfo->readbytes < 0)
+	{
+		errno = errinfo->read_errno;
+		ereport(ERROR,
+(errcode_for_file_access(),
+ errmsg("could not read from log segment %s, offset %u, length %zu: %m",
+		XLogFileNameP(seg->ws_tli, seg->ws_segno),
+		seg->ws_off, (Size) errinfo->reqbytes)));
+	}
+	else
+	{
+		ereport(ERROR,
+(errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("could not read from log segment %s, offset %u: length %zu",
+		

Re: [HACKERS] Block level parallel vacuum

2019-10-04 Thread Dilip Kumar
On Fri, Oct 4, 2019 at 11:01 AM Amit Kapila  wrote:
>
> On Fri, Oct 4, 2019 at 10:28 AM Masahiko Sawada  wrote:
>>
Some more comments..
1.
+ for (idx = 0; idx < nindexes; idx++)
+ {
+ if (!for_cleanup)
+ lazy_vacuum_index(Irel[idx], &stats[idx], vacrelstats->dead_tuples,
+   vacrelstats->old_live_tuples);
+ else
+ {
+ /* Cleanup one index and update index statistics */
+ lazy_cleanup_index(Irel[idx], &stats[idx], vacrelstats->new_rel_tuples,
+vacrelstats->tupcount_pages < vacrelstats->rel_pages);
+
+ lazy_update_index_statistics(Irel[idx], stats[idx]);
+
+ if (stats[idx])
+ pfree(stats[idx]);
+ }

I think instead of checking the for_cleanup variable for every index in
the loop, we'd better move the loop inside the branch, as shown below:

if (!for_cleanup)
for (idx = 0; idx < nindexes; idx++)
lazy_vacuum_index(Irel[idx], &stats[idx], vacrelstats->dead_tuples,
else
for (idx = 0; idx < nindexes; idx++)
{
lazy_cleanup_index
lazy_update_index_statistics
...
}

2.
+static void
+lazy_vacuum_or_cleanup_indexes(LVRelStats *vacrelstats, Relation *Irel,
+int nindexes, IndexBulkDeleteResult **stats,
+LVParallelState *lps, bool for_cleanup)
+{
+ int idx;
+
+ Assert(!IsParallelWorker());
+
+ /* no job if the table has no index */
+ if (nindexes <= 0)
+ return;

Wouldn't it be a good idea to call this function only if nindexes > 0?

3.
+/*
+ * Vacuum or cleanup indexes with parallel workers. This function must be used
+ * by the parallel vacuum leader process.
+ */
+static void
+lazy_parallel_vacuum_or_cleanup_indexes(LVRelStats *vacrelstats,
Relation *Irel,
+ int nindexes, IndexBulkDeleteResult **stats,
+ LVParallelState *lps, bool for_cleanup)

If you look at this function, there is not much common code between the
for_cleanup and non-for_cleanup paths except these 3-4 statements:
LaunchParallelWorkers(lps->pcxt);
/* Create the log message to report */
initStringInfo();
...
/* Wait for all vacuum workers to finish */
WaitForParallelWorkersToFinish(lps->pcxt);

Other than that you have got a lot of checks like this
+ if (!for_cleanup)
+ {
+ }
+ else
+ {
}

I think the code would be much more readable if we had 2 functions, one
for vacuum (lazy_parallel_vacuum_indexes) and another for
cleanup (lazy_parallel_cleanup_indexes).

4.
 * of index scans performed.  So we don't use maintenance_work_mem memory for
  * the TID array, just enough to hold as many heap tuples as fit on one page.
  *
+ * Lazy vacuum supports parallel execution with parallel worker processes. In
+ * parallel lazy vacuum, we perform both index vacuuming and index cleanup with
+ * parallel worker processes. Individual indexes are processed by one vacuum

Spacing after the "." is not uniform: the previous comment uses 2 spaces
and the newly added one uses 1 space.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: pgsql: Remove pqsignal() from libpq's official exports list.

2019-10-04 Thread Christoph Berg
Re: Tom Lane 2018-09-28 
> Remove pqsignal() from libpq's official exports list.
> 
> Client applications should get this function, if they need it, from
> libpgport.
> 
> The fact that it's exported from libpq is a hack left over from before
> we set up libpgport.  It's never been documented, and there's no good
> reason for non-PG code to be calling it anyway, so hopefully this won't
> cause any problems.  Moreover, with the previous setup it was not real
> clear whether our clients that use the function were getting it from
> libpgport or libpq, so this might actually prevent problems.
> 
> The reason for changing it now is that in the wake of commit ea53100d5,
> some linkers won't export the symbol, apparently because it's coming from
> a .a library instead of a .o file.  We could get around that by continuing
> to symlink pqsignal.c into libpq as before; but unless somebody complains
> very hard, I don't want to adopt such a kluge.

This is starting to hurt in several places:

04 11:41  mha@xindi:~$ psql
04 11:41  /usr/lib/postgresql/9.2/bin/psql: symbol lookup error:
   /usr/lib/postgresql/9.2/bin/psql: undefined symbol: pqsignal

pg_repack linked against libpq5 11 breaks with libpq5 12:

--- 
/tmp/autopkgtest-lxc.evololie/downtmp/build.XvF/src/regress/expected/repack-run.out
 2018-10-18 11:30:46.0 +
+++ 
/tmp/autopkgtest-lxc.evololie/downtmp/build.XvF/src/regress/results/repack-run.out
  2019-10-03 21:24:29.049576631 +
@@ -2,19 +2,8 @@
 -- do repack
 --
 \! pg_repack --dbname=contrib_regression --table=tbl_cluster
-INFO: repacking table "public.tbl_cluster"
+pg_repack: symbol lookup error: pg_repack: undefined symbol: pqsignal

https://ci.debian.net/data/autopkgtest/testing/amd64/p/pg-repack/3070831/log.gz

Christoph




Shared memory

2019-10-04 Thread Natarajan R
Why does Postgres not provide a way to free shared memory?


Re: Standby accepts recovery_target_timeline setting?

2019-10-04 Thread Michael Paquier
On Wed, Oct 02, 2019 at 03:30:38AM -0400, Stephen Frost wrote:
> * David Steele (da...@pgmasters.net) wrote:
>> On 9/28/19 1:26 PM, Fujii Masao wrote:
>>> There might be some recovery parameters that we can safely use
>>> even in crash recovery, e.g., maybe recovery_end_command
>>> (now, you can see that recovery_end_command is executed in
>>> crash recovery in v12). But at this stage of v12, it's worth thinking to
>>> just cause crash recovery to exit with an error when any recovery
>>> parameter is set. Thought?
>> 
>> I dislike the idea of crash recovery throwing fatal errors because there
>> are recovery settings in postgresql.auto.conf.  Since there is no
>> defined mechanism for cleaning out old recovery settings we have to
>> assume that they will persist (and accumulate) more or less forever.

Yeah, I don't think that's a good thing either.  That's a recipe to
make the user experience more confusing.

>>> Or if that change is overkill, alternatively we can make crash recovery
>>> "ignore" any recovery parameters, e.g., by forcibly disabling
>>> the parameters.
>> 
>> I'd rather load recovery settings *only* if recovery.signal or
>> standby.signal is present and do this only after crash recovery is
>> complete, i.e. in the absence of backup_label.
>> 
>> I think blindly loading recovery settings then trying to ignore them
>> later is pretty much why we are having these issues in the first place.
>>  I'd rather not extend that pattern if possible.
> 
> Agreed.

That would mean that you need to create some new special handling,
while making sure that the process reading the parameters is not
confused either by default values.  It seems to me that if we are not
in recovery, the best thing was can do now is just to not process
anything those parameters trigger, instead of "ignoring" these (you
are referring to use SetConfigOption in the startup process here?).
--
Michael


signature.asc
Description: PGP signature


Re: refactoring - share str2*int64 functions

2019-10-04 Thread Ashutosh Sharma
Is there any specific reason for hard-coding the *base* of the number
represented by the string in strtouint64()?  I understand that currently
strtouint64() is used just to convert an input string to an unsigned
decimal value, but what if we want to use it for hexadecimal values, or
maybe some other base?  In that case it can't be used.  Further, the
function name is strtouint64() but the comment atop its definition says
it's pg_strtouint64().  That needs to be corrected.

At a few places, I could see that the function call to
pg_strtoint32_check() is followed by error handling.  Isn't that already
being done in the pg_strtoint32_check function itself?  For example, in
refint.c the call to pg_strtoint32_check is followed by an if condition
that checks for an error, which I assume shouldn't be there as it is
already done by pg_strtoint32_check.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

On Wed, Sep 18, 2019 at 6:43 AM Michael Paquier  wrote:
>
> On Tue, Sep 17, 2019 at 11:29:13AM +0900, Michael Paquier wrote:
> > In order to unify the parsing interface and put all the conversion
> > routines in a single place, I still think that the patch has value so
> > I would still keep it (with a fix for the queryId parsing of course),
> > but there is much more to it.
>
> As of now, here is an updated patch which takes the path to not
> complicate the refactored APIs and fixes the issue with queryID in
> readfuncs.c.  Thoughts?
> --
> Michael




Re: dropping column prevented due to inherited index

2019-10-04 Thread Michael Paquier
On Thu, Oct 03, 2019 at 09:18:12AM -0300, Alvaro Herrera wrote:
> Hmm.  I wonder if we shouldn't adopt the coding pattern we've used
> elsewhere of collecting all columns to be dropped first into an
> ObjectAddresses array, then use performMultipleDeletions.

+1.  That's the common pattern these days, because that's more
performant.  I think that the patch should have regression tests.
--
Michael


signature.asc
Description: PGP signature


Re: Include RELKIND_TOASTVALUE in get_relkind_objtype

2019-10-04 Thread Michael Paquier
On Thu, Oct 03, 2019 at 09:52:34AM -0400, Tom Lane wrote:
> FWIW, I really dislike this patch, mainly because it is based on the 
> assumption (as John said) that get_relkind_objtype is used only
> in aclcheck_error calls.  However it's not obvious why that should
> be true, and there certainly is no documentation suggesting that
> it needs to be true.  That's mainly because get_relkind_objtype has no
> documentation period, which if you ask me is flat out unacceptable
> for a globally-exposed function.  (Same comment about its wrapper
> get_object_type.)

Yes, I agree that the expectations that the caller of this function
can have are hard to guess.  So we could take this occasion to add
more comments.  I could try to come up with a better patch.  Or
perhaps you have already your mind on it?

> The patch also falsifies the comment just a few lines away that
> 
> /*
>  * other relkinds are not supported here because they don't map to
>  * OBJECT_* values
>  */
> 
> without doing anything about that.

That's actually what I was referring to in my previous email.

> I'm inclined to think that we should redefine the charter of
> get_relkind_objtype/get_object_type to be that they'll produce
> some OBJECT_* value for any relkind whatever, on the grounds
> that throwing an error here isn't a particularly useful behavior;
> we'd rather come out with a possibly-slightly-inaccurate generic
> message about a "table".  And they need to be documented that way.

This is tempting.

> Alternatively, instead of mapping other relkinds to OBJECT_TABLE,
> we could invent a new enum entry OBJECT_RELATION.  There's precedent
> for that in OBJECT_ROUTINE ... but I don't know that we want to
> build out all the other infrastructure for a new ObjectType right now.

I am too lazy to check the thread that led to 8b9e964, but I recall
that Peter wanted to get rid of OBJECT_RELATION because that's
confusing as that's not a purely exclusive object type, and it mapped
with other object types.
--
Michael


signature.asc
Description: PGP signature


Re: Two pg_rewind patches (auto generate recovery conf and ensure clean shutdown)

2019-10-04 Thread Michael Paquier
On Thu, Oct 03, 2019 at 12:43:37PM +0300, Alexey Kondratov wrote:
> On 03.10.2019 6:07, Michael Paquier wrote:
>> I have reworked your first patch as per the attached.  What do you
>> think about it?  The part with the control file needs to go down to
>> v12, and I would likely split that into two commits on HEAD: one for
>> the control file and a second for the recovery.conf portion with the
>> fix for --no-ensure-shutdown to keep a cleaner history.
> 
> It looks fine for me excepting the progress reporting part. It now adds
> PG_CONTROL_FILE_SIZE to fetch_done. However, I cannot find that control file
> is either included into filemap and fetch_size or counted during
> calculate_totals(). Maybe I've missed something, but now it looks like we
> report something that wasn't planned for progress reporting, doesn't
> it?

Right.  The pre-12 code actually handles that incorrectly as it assumed
that any files written through file_ops.c should be part of the
progress.  So I went with the simplest solution, and backpatched this
part with 6f3823b.  I have also committed the set of fixes for the new
options so as we have a better base of work than what's on HEAD
currently.

>> +   # Check that incompatible options error out.
>> +   command_fails(
>> +   [
>> +   'pg_rewind', "--debug",
>> +   "--source-pgdata=$standby_pgdata",
>> +   "--target-pgdata=$master_pgdata", "-R",
>> +   "--no-ensure-shutdown"
>> +   ],
>> +   'pg_rewind local with -R');
>> Incompatible options had better be checked within a separate perl
>> script?  We generally do that for the other binaries.
> 
> Yes, it makes sense. I've reworked the patch with tests and added a couple
> of extra cases.

Regarding the tests, adding a --dry-run command is a good idea.
However, I think that there is more value in automating the use of
single-user mode in the tests, as that's more critical from the point
of view of the rewind run, and stopping the cluster with
immediate mode causes, as expected, the next --dry-run command to
fail.

Another thing is that I think that we should use -F with --single.
This makes recovery faster, and the target data folder is synced
at the end of pg_rewind anyway.

Using the long option names makes the tests easier to follow in this
case, so I have switched -R to --write-recovery-conf.

Some comments and the docs have been using some confusing wording, so
I have reworked what I found (like many "it" in a single sentence
referring to different things).

+command_fails(
+[
+'pg_rewind', "--debug",
+"--source-pgdata=$standby_pgdata",
+"--target-pgdata=$master_pgdata",
+"--no-ensure-shutdown"
+],
+'pg_rewind local without source shutdown');
Regarding all the set of incompatible options, we have much more of
that after the initial option parsing so I think that we should group
all the cheap ones together.  Let's tackle that as a separate patch.
We can also just check after --no-ensure-shutdown directly in
RewindTest.pm as I have switched the cluster to not be cleanly shut
down anymore to stress the automatic recovery path, and trigger that
before running pg_rewind for the local and remote mode.  

Attached is an updated patch with all I found.  What do you think?
--
Michael
diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index fbf454803b..42d29edd4e 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -169,12 +169,14 @@ PostgreSQL documentation
   --no-ensure-shutdown
   

-pg_rewind verifies that the target server
-is cleanly shutdown before rewinding; by default, if it isn't, it
-starts the server in single-user mode to complete crash recovery.
+pg_rewind requires that the target server
+is cleanly shut down before rewinding. By default, if the target server
+is not shut down cleanly, pg_rewind starts
+the target server in single-user mode to complete crash recovery first,
+and stops it.
 By passing this option, pg_rewind skips
 this and errors out immediately if the server is not cleanly shut
-down.  Users are expected to handle the situation themselves in that
+down. Users are expected to handle the situation themselves in that
 case.

   
diff --git a/src/bin/pg_rewind/pg_rewind.c b/src/bin/pg_rewind/pg_rewind.c
index fe1468b771..875a43b219 100644
--- a/src/bin/pg_rewind/pg_rewind.c
+++ b/src/bin/pg_rewind/pg_rewind.c
@@ -270,11 +270,12 @@ main(int argc, char **argv)
 	pg_free(buffer);
 
 	/*
-	 * If the target instance was not cleanly shut down, run a single-user
-	 * postgres session really quickly and reload the control file to get the
-	 * new state. Note if no_ensure_shutdown is 

Re: Memory Accounting

2019-10-04 Thread Tomas Vondra

On Fri, Oct 04, 2019 at 12:36:01AM -0400, Tom Lane wrote:

> So ... why exactly did this patch define MemoryContextData.mem_allocated
> as int64?  That seems to me to be doubly wrong: it is not the right width
> on 32-bit machines, and it is not the right signedness anywhere.  I think
> that field ought to be of type Size (a/k/a size_t, but memnodes.h always
> calls it Size).



Yeah, I think that's an oversight. Maybe there's a reason why Jeff used
int64, but I can't think of any.


> I let this pass when the patch went in, but now I'm on the warpath
> about it, because since c477f3e449 went in, some of the 32-bit buildfarm
> members are failing with
> 
> 2019-10-04 00:41:56.569 CEST [66916:86] pg_regress/_int LOG:  statement: CREATE 
> INDEX text_idx on test__int using gist ( a gist__int_ops );
> TRAP: FailedAssertion("total_allocated == context->mem_allocated", File: 
> "aset.c", Line: 1533)
> 2019-10-04 00:42:25.505 CEST [63836:11] LOG:  server process (PID 66916) was 
> terminated by signal 6: Abort trap
> 2019-10-04 00:42:25.505 CEST [63836:12] DETAIL:  Failed process was running: 
> CREATE INDEX text_idx on test__int using gist ( a gist__int_ops );
> 
> What I think is happening is that c477f3e449 allowed this bit in
> AllocSetRealloc:
> 
> context->mem_allocated += blksize - oldblksize;
> 
> to be executed in situations where blksize < oldblksize, where before
> that was not possible.  Of course blksize and oldblksize are of type
> Size, hence unsigned, so the subtraction result underflows in this
> case.  If mem_allocated is of the same width as Size then this does
> not matter because the final result wraps around to the proper value,
> but if we're going to allow mem_allocated to be wider than Size then
> we will need more logic here to add or subtract as appropriate.
> 
> (I'm not quite sure why we're not seeing this failure on *all* the
> 32-bit machines; maybe there's some other factor involved?)



Interesting failure mode (especially that it does *not* fail on some
32-bit machines).


> I see no value in defining mem_allocated to be wider than Size.
> Yes, the C standard contemplates the possibility that the total
> available address space is larger than the largest chunk you can
> ever ask malloc for; but nobody has built a platform like that in
> this century, and they sucked to program on back in the dark ages
> when they did exist.  (I speak from experience.)  I do not think
> we need to design Postgres to allow for that.
> 
> Likewise, there's no evident value in allowing mem_allocated
> to be negative.
> 
> I haven't chased down exactly what else would need to change.
> It might be that s/int64/Size/g throughout the patch is
> sufficient, but I haven't analyzed it.



I think so too, but I'll take a closer look in the afternoon, unless you
beat me to it.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: Proposal: Make use of C99 designated initialisers for nulls/values arrays

2019-10-04 Thread Amit Kapila
On Fri, Oct 4, 2019 at 12:10 PM Smith, Peter 
wrote:

> From: Tom Lane  Sent: Friday, 4 October 2019 2:08 PM
>
> >> #define INIT_ALL_ELEMS_ZERO  {0}
> >> #define INIT_ALL_ELEMS_FALSE {false}
>
> >I would say that's 100% wrong.  The entire point here is that it's
> memset-equivalent, and therefore writes zeroes, regardless of what the
> datatype is.
>
> I agree it is memset-equivalent.
>
> All examples of the memset code that INIT_ALL_ELEMS_ZERO replaces looked
> like this:
> memset(values, 0, sizeof(values));
>
> Most examples of the memset code that INIT_ALL_ELEMS_FALSE replaces looked
> like this:
> memset(nulls, false, sizeof(nulls));
>
> ~
>
> I made the 2nd macro because I anticipate the same folk that don't like
> setting 0 to a bool will also not like setting something called
> INIT_ALL_ELEMS_ZERO to a bool array.
>
> How about I just define them both the same?
> #define INIT_ALL_ELEMS_ZERO {0}
> #define INIT_ALL_ELEMS_FALSE{0}
>
>
I think using one define would be preferred, but you can wait and see if
others prefer defining different macros for the same thing.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


RE: Proposal: Make use of C99 designated initialisers for nulls/values arrays

2019-10-04 Thread Smith, Peter
From: Tom Lane  Sent: Friday, 4 October 2019 2:08 PM

>> #define INIT_ALL_ELEMS_ZERO  {0}
>> #define INIT_ALL_ELEMS_FALSE {false}

>I would say that's 100% wrong.  The entire point here is that it's 
>memset-equivalent, and therefore writes zeroes, regardless of what the 
>datatype is.  

I agree it is memset-equivalent.

All examples of the memset code that INIT_ALL_ELEMS_ZERO replaces looked like 
this:
memset(values, 0, sizeof(values));

Most examples of the memset code that INIT_ALL_ELEMS_FALSE replaces looked like 
this:
memset(nulls, false, sizeof(nulls));

~

I made the 2nd macro because I anticipate the same folk that don't like setting 
0 to a bool will also not like setting something called INIT_ALL_ELEMS_ZERO to 
a bool array.

How about I just define them both the same?
#define INIT_ALL_ELEMS_ZERO {0}
#define INIT_ALL_ELEMS_FALSE{0}

>As a counterexample, the coding you have above looks a lot like it would work 
>to add
>
>#define INIT_ALL_ELEMS_TRUE{true}
> which as previously noted will *not* work.  So I think the one-size-fits-all 
> approach is what to use.

I agree it looks that way; in my previous email I should have provided more 
context to the code.
Below is the full fragment of the last shared patch, which included a note to 
prevent anybody from doing such a thing.

~~

/*
 * Macros for C99 designated-initialiser syntax to set all array elements to 
0/false.
 *
 * Use these macros in preference to explicit {0} syntax to avoid giving a 
misleading
 * impression that the same value is always used for all elements.
 * e.g.
 * bool foo[2] = {false}; // sets both elements false
 * bool foo[2] = {true}; // does NOT set both elements true
 *
 * Reference: C99 [$6.7.8/21] If there are fewer initializers in a 
brace-enclosed list than there
 * are elements or members of an aggregate, or fewer characters in a string 
literal used to
 * initialize an array of known size than there are elements in the array, the 
remainder of the
 * aggregate shall be initialized implicitly the same as objects that have 
static storage duration
 */
#define INIT_ALL_ELEMS_ZERO {0}
#define INIT_ALL_ELEMS_FALSE{false} 

~~


Kind Regards,
--
Peter Smith
Fujitsu Australia