Re: adding partitioned tables to publications

2020-01-28 Thread Amit Langote
On Tue, Jan 28, 2020 at 6:11 PM Peter Eisentraut
 wrote:
> This structure looks good now.

Thanks for taking a look.

> However, it does seem unfortunate that in pg_get_publication_tables() we
> need to postprocess the result of GetPublicationRelations().  Since
> we're already changing the API of GetPublicationRelations(), couldn't we
> also make it optionally not include partitioned tables?

Hmm, okay.  We really need GetPublicationRelations() to handle
partitioned tables in 3 ways:

1. Don't expand and return them as-is
2. Expand and return only leaf partitions
3. Expand and return all partitions

I will try that in the new patch.
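To make that concrete, here is a toy, self-contained sketch of how a single enum argument could drive the three behaviors.  The enum, struct, and function names are hypothetical illustrations, not actual pg_publication.c code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical option controlling partition expansion. */
typedef enum PublicationPartOpt
{
	PUBLICATION_PART_ROOT,		/* 1. don't expand; return tables as-is */
	PUBLICATION_PART_LEAF,		/* 2. expand; return only leaf partitions */
	PUBLICATION_PART_ALL		/* 3. expand; return all partitions */
} PublicationPartOpt;

/* Toy stand-in for a relation in a partition hierarchy. */
typedef struct Rel
{
	const char *name;
	int			nchildren;
	const struct Rel *children[4];
} Rel;

/* Collect relations reachable from "rel" according to "opt"; returns count. */
static int
collect(const Rel *rel, PublicationPartOpt opt, const Rel **out, int n)
{
	if (rel->nchildren == 0)
	{
		out[n++] = rel;				/* leaf partitions always qualify */
		return n;
	}
	if (opt == PUBLICATION_PART_ROOT)
	{
		out[n++] = rel;				/* don't expand partitioned tables */
		return n;
	}
	if (opt == PUBLICATION_PART_ALL)
		out[n++] = rel;				/* include intermediate/root tables too */
	for (int i = 0; i < rel->nchildren; i++)
		n = collect(rel->children[i], opt, out, n);
	return n;
}

/* Sample hierarchy: root -> {sub -> {p1, p2}, p3}. */
static const Rel p1 = {"p1", 0, {NULL}};
static const Rel p2 = {"p2", 0, {NULL}};
static const Rel p3 = {"p3", 0, {NULL}};
static const Rel sub = {"sub", 2, {&p1, &p2}};
static const Rel root = {"root", 2, {&sub, &p3}};
```

With such an argument, pg_get_publication_tables() could ask for exactly the set it wants instead of postprocessing the result.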

> For the test, perhaps add test cases where partitions are attached and
> detached so that we can see whether their publication relcache
> information is properly updated.  (I'm not doubting that it works, but
> it would be good to have a test for, in case of future restructuring.)

Okay, I will add some to publication.sql.

Will send updated patches after addressing Rafia's comments.

Thanks,
Amit




Re: Add %x to PROMPT1 and PROMPT2

2020-01-28 Thread Fabien COELHO


Hello Vik,


Aren't there examples in the documentation which use the default prompts?

If so, should they be updated accordingly?


Good catch!
I thought about the documentation but not the examples therein.

Updated patch attached.


Ok.

Only one transaction prompt example in the whole documentation :-(
No test is troubled by the change :-(
Sigh…

Patch applies and compiles cleanly, global and psql make check ok.

Doc build ok.

Works for me.

I'd be in favor of adding a non-trivial session example in the psql 
documentation at the end of the prompt section, something like:


postgres=# BEGIN;
postgres=*# CREATE TABLE
postgres-*#   Word(data TEXT PRIMARY KEY);
postgres=*# COPY Word(data) FROM STDIN;
>> hello
>> \.
postgres=*# SELECT 2+;
ERROR:  syntax error at or near ";"
postgres=!# ROLLBACK;

but this is not necessary for this patch.

--
Fabien.

Re: [PATCH] Windows port, fix some resources leaks

2020-01-28 Thread Michael Paquier
On Tue, Jan 28, 2020 at 04:11:47PM -0500, Robert Haas wrote:
> On Tue, Jan 28, 2020 at 4:06 PM Alvaro Herrera  
> wrote:
>> I don't think we have ever expressed it as such, but certainly we prefer
>> the postmaster to be super robust ... rather live with a few-hundred-byte
>> leak than have it die and take the whole database service down for
>> what's essentially a fringe bug that has bothered no one in a decade
>> and a half.
> 
> Well, yeah. I mean, I'm not saying it's a good idea in this instance
> to FATAL here. I'm just saying that I don't think there is a general
> rule that code which does FATAL in the postmaster is automatically
> wrong, which is what I took Michael to be suggesting.

Re-reading the thread, I can see your point that my previous email may
read like a rule applying to the postmaster, so sorry for the
confusion.

Anyway, I was referring to the point mentioned in three places of
pgwin32_ReserveSharedMemoryRegion() about not using FATAL in this
routine.  The issue with the order of DLL loading is hard to miss.
--
Michael




Re: adding partitioned tables to publications

2020-01-28 Thread Rafia Sabih
Hi Amit,

Once again I went through this patch set and here are my few comments,

On Thu, 23 Jan 2020 at 11:10, Amit Langote  wrote:
>
> On Wed, Jan 22, 2020 at 2:38 PM Amit Langote  wrote:
> > Other than that, the updated patch contains following significant changes:
> >
> > * Changed pg_publication.c: GetPublicationRelations() so that any
> > published partitioned tables are expanded as needed
> >
> > * Since the pg_publication_tables view is backed by
> > GetPublicationRelations(), subscriptioncmds.c: fetch_table_list() no
> > longer needs to craft a query to include partitions when needed,
> > because partitions are included at the source.  That seems better,
> > because it allows us to limit the complexity surrounding publication
> > of partitioned tables to the publication side.
> >
> > * Fixed the publication table DDL to spot more cases of tables being
> > added to a publication in a duplicative manner.  For example, a
> > partition being added to a publication that already contains its
> > ancestor, and a partitioned table being added to a publication
> > (implying all of its partitions are added) that already contains one
> > of its partitions
>
> On second thought, this seems like overkill.  It might be OK after
> all for both a partitioned table and its partitions to be explicitly
> added to a publication without complaining of duplication.  IOW, it's
> the user's call whether it makes sense to do that or not.
>
> > Only attaching 0001.
>
> Attached updated 0001 considering the above and the rest of the
> patches that add support for replicating partitioned tables using
> their own identity and schema.  I have reorganized the other patches
> as follows:
>
> 0002: refactoring of logical/worker.c without any functionality
> changes (contains much less churn than in earlier versions)
>
> 0003: support logical replication into partitioned tables on the
> subscription side (allows replicating from a non-partitioned table on
> publisher node into a partitioned table on subscriber node)
>
> 0004: support optionally replicating partitioned table changes (and
> changes directly made to partitions) using root partitioned table
> identity and schema

+ cannot replicate from a regular table into a partitioned able or vice
There is a missing 't' in "table" here.

+ 
+  When a partitioned table is added to a publication, all of its existing
+  and future partitions are also implicitly considered to be part of the
+  publication.  So, even operations that are performed directly on a
+  partition are also published via its ancestors' publications.

Now this is confusing: does it mean that partitions added to the table
later will be replicated too?  I think not, because you need to first
create them manually on the subscriber side, don't you?

+ /* Must be a regular or partitioned table */
+ if (RelationGetForm(targetrel)->relkind != RELKIND_RELATION &&
+ RelationGetForm(targetrel)->relkind != RELKIND_PARTITIONED_TABLE)
  ereport(ERROR,
  (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
  errmsg("\"%s\" is not a table",

IMHO the error message and details should be modified here to
something along the lines of 'is neither a regular nor a partitioned
table'.

+ * published via an ancestor and when a partitioned tables's partitions
tables's --> tables'

+ if (get_rel_relispartition(relid))
+ {
+ List*ancestors = get_partition_ancestors(relid);

Now, this is just for my understanding: why do the ancestors have to
be a list?  I always assumed that a partition could only have one
ancestor -- the root table.  Is there something more to it that I am
totally missing, or is it to cover the scenario of having partitions
of partitions?

Here I also want to clarify one thing: if a partitioned table is
dropped from a publication, are all of its partitions also implicitly
dropped?  As far as my understanding goes, that doesn't happen, so
shouldn't there be some notice about it?

-GetPublicationRelations(Oid pubid)
+GetPublicationRelations(Oid pubid, bool include_partitions)

How about having an enum here with INCLUDE_PARTITIONS,
INCLUDE_PARTITIONED_REL, and SKIP_PARTITIONS to address the three
possibilities and avoid re-iterating through the list in
pg_get_publication_tables()?

-- 
Regards,
Rafia Sabih




Re: Physical replication slot advance is not persistent

2020-01-28 Thread Michael Paquier
On Tue, Jan 28, 2020 at 06:06:06PM +0300, Alexey Kondratov wrote:
> On 28.01.2020 15:14, Kyotaro Horiguchi wrote:
>> I agree not to save slots immediately. The code is written as described
>> above. The TAP test is correct.
> 
> +1, removing this broken saving code path from pg_replication_slot_advance
> and marking slot as dirty looks good to me. It solves the issue and does not
> add any unnecessary complexity.

Ok, good.  So I am seeing no objections on that part :D

>> But the doc part looks a bit too detailed to me. Couldn't we explain
>> that without the word 'dirty'?
>> 
>> -and it will not be moved beyond the current insert location.  
>> Returns
>> -name of the slot and real position to which it was advanced to.
>> +and it will not be moved beyond the current insert location. Returns
>> +name of the slot and real position to which it was advanced to. The
>> +updated slot is marked as dirty if any advancing is done, with its
>> +information being written out at the follow-up checkpoint. In the
>> +event of a crash, the slot may return to an earlier position.
>> 
>> and it will not be moved beyond the current insert location. Returns
>> name of the slot and real position to which it was advanced to. The
>> information of the updated slot is scheduled to be written out at the
>> follow-up checkpoint if any advancing is done. In the event of a
>> crash, the slot may return to an earlier position.
> 
> Just searched through the *.sgml files; we already use the terms 'dirty'
> and 'flush' for writing out pages during checkpoints.  Here we are trying
> to describe a very similar process, but in relation to replication slots,
> so it looks fine to me.  At the same time, the term 'schedule' is used for
> VACUUM, constraint checks, and checkpoints themselves.

Honestly, I was a bit on the fence for the term "dirty" when typing
this paragraph, so I kind of agree with Horiguchi-san's point that it
could be confusing when applied to replication slots, because there is
no other reference in the docs about the link between the two
concepts.  So, I would go for a more simplified sentence for the first
part, keeping the second sentence intact:
"The information of the updated slot is written out at the follow-up
checkpoint if any advancing is done.  In the event of a crash, the
slot may return to an earlier position."
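As an aside, the semantics being documented here can be modeled in miniature.  This is toy code: the struct fields and function names are made up for illustration, and the real logic lives in slot.c:

```c
#include <assert.h>
#include <stdint.h>

/* Toy replication slot: an in-memory position plus its flushed copy. */
typedef struct Slot
{
	uint64_t	restart_lsn;	/* in-memory position */
	uint64_t	on_disk_lsn;	/* what a crash would rewind to */
	int			dirty;
} Slot;

/* Advancing only marks the slot dirty; nothing is written out yet. */
static void
slot_advance(Slot *s, uint64_t target)
{
	if (target > s->restart_lsn)
	{
		s->restart_lsn = target;
		s->dirty = 1;
	}
}

/* The follow-up checkpoint flushes dirty slots. */
static void
checkpoint(Slot *s)
{
	if (s->dirty)
	{
		s->on_disk_lsn = s->restart_lsn;
		s->dirty = 0;
	}
}

/* A crash discards in-memory state, returning to the flushed position. */
static void
crash_restart(Slot *s)
{
	s->restart_lsn = s->on_disk_lsn;
	s->dirty = 0;
}
```

The key point of the proposed wording falls out of the model: an advance that is not followed by a checkpoint is lost on crash, while one that is survives.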
--
Michael




Re: Append with naive multiplexing of FDWs

2020-01-28 Thread Movead Li
Hello Kyotaro,



>"Parallel scan" at the moment means multiple workers fetch unique
>blocks from *one* table in an arbitrated manner. In this sense
>"parallel FDW scan" means multiple local workers fetch unique bundles
>of tuples from *one* foreign table, which means it is running on a
>single session.  That doesn't offer an advantage.

It may not be "parallel FDW scan"; it can be "parallel shards scan",
where the local workers each pick a foreign partition to scan. I have
drawn a picture of that; you can see it at the link below.

https://www.highgo.ca/2019/08/22/parallel-foreign-scan-of-postgresql/

I think the "parallel shards scan" makes sense in this way.



>If parallel query processing worked in worker-per-table mode,
>especially on partitioned tables, maybe the current FDW would work
>without much modification. But I believe asynchronous append on
>foreign tables on a single process is far more resource-effective and
>moderately faster than parallel append.

As the test results show, the current patch cannot gain more
performance when it returns a huge number of tuples. The "parallel
shards scan" method can work well there, because "parallel" can make
full use of the CPUs while "asynchronous" can't.

Highgo Software (Canada/China/Pakistan)
URL : http://www.highgo.ca/
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca

Re: BufFileRead() error signalling

2020-01-28 Thread Michael Paquier
On Tue, Jan 28, 2020 at 03:51:54PM -0500, Robert Haas wrote:
> I quickly reread that thread and I don't see that there's any firm
> consensus there in favor of "read %d of %zu" over "read only %d of %zu
> bytes". Now, if most people prefer the former, so be it, but I don't
> think that's clear from that thread.

The argument of consistency falls in favor of the former on HEAD:
$ git grep "could not read" | grep "read %d of %zu" | wc -l
59
$ git grep "could not read" | grep "read only %d of %zu" | wc -l
0
--
Michael




Re: Setting min/max TLS protocol in clientside libpq

2020-01-28 Thread Michael Paquier
On Tue, Jan 28, 2020 at 11:29:39AM +0100, Daniel Gustafsson wrote:
> You don't really qualify as the target audience for basic, yet not always
> universally known/understood, sections in the documentation though =)

Likely I don't.

> I've heard enough complaints that it's complicated to set up that I
> think we need to make the docs more digestible, but if no one else
> +1's then let's drop it.

Sure.  Now I am pretty sure that we would need a bit more than just
saying that the SSL protocol is negotiated between the backend and
libpq if we add a new section.
--
Michael




Re: Some incorrect option sizes for PQconninfoOption in libpq

2020-01-28 Thread Michael Paquier
On Tue, Jan 28, 2020 at 11:57:19AM +0100, Daniel Gustafsson wrote:
> Nice catch! +1 on the attached patch.

Thanks, fixed and backpatched down to 12, where gssencmode has been
added.
--
Michael




Re: making the backend's json parser work in frontend code

2020-01-28 Thread Andrew Dunstan

On 1/28/20 5:28 PM, Mark Dilger wrote:
>
>
>> +# There doesn't seem to be any easy way to get TestLib to use the binaries from
>> +# our directory, so we hack up a path to our binary and run that directly.  This
>> +# seems brittle enough that some other solution should be found, if possible.
>> +
>> +my $test_json = join('/', $ENV{TESTDIR}, 'test_json');
>>
>> I don't know what the right thing to do here is. Perhaps someone more
>> familiar with TAP testing can comment.
> Yeah, I was hoping that might get a comment from Andrew.  I think if it works 
> as-is on windows, we could just use it this way until it causes a problem on 
> some platform or other.  It’s not a runtime issue, being only a build-time 
> test, and only then when tap tests are enabled *and* running check-world, so 
> nobody should really be adversely affected.  I’ll likely get around to 
> testing this on Windows, but I don’t have any Windows environments set up 
> yet, as that is still on my todo list.
>


I think using TESTDIR is Ok, but we do need a little more on Windows,
because the executable name will be different. See attached revised
version of the test script.



We also need some extra stuff for MSVC. Something like the attached
change to src/tools/msvc/Mkvcbuild.pm. Also, the Makefile will need a
line like:


PROGRAM = test_json


I'm still not 100% sure about the location of the test. I think the way the
msvc suite works, this should be in its own dedicated directory, e.g.
src/test/json_parse.


cheers


andrew


-- 
Andrew Dunstan            https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index c735d529ca..3c7be61e74 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -32,9 +32,9 @@ my @unlink_on_exit;
 
 # Set of variables for modules in contrib/ and src/test/modules/
 my $contrib_defines = { 'refint' => 'REFINT_VERBOSE' };
-my @contrib_uselibpq = ('dblink', 'oid2name', 'postgres_fdw', 'vacuumlo');
+my @contrib_uselibpq = ('test_json', 'dblink', 'oid2name', 'postgres_fdw', 'vacuumlo');
 my @contrib_uselibpgport   = ('oid2name', 'pg_standby', 'vacuumlo');
-my @contrib_uselibpgcommon = ('oid2name', 'pg_standby', 'vacuumlo');
+my @contrib_uselibpgcommon = ('test_json','oid2name', 'pg_standby', 'vacuumlo');
 my $contrib_extralibs  = undef;
 my $contrib_extraincludes = { 'dblink' => ['src/backend'] };
 my $contrib_extrasource = {
@@ -121,7 +121,7 @@ sub mkvcbuild
 
 	our @pgcommonallfiles = qw(
 	  base64.c config_info.c controldata_utils.c d2s.c encnames.c exec.c
-	  f2s.c file_perm.c ip.c
+	  f2s.c file_perm.c ip.c jsonapi.c
 	  keywords.c kwlookup.c link-canary.c md5.c
 	  pg_lzcompress.c pgfnames.c psprintf.c relpath.c rmtree.c
 	  saslprep.c scram-common.c string.c stringinfo.c unicode_norm.c username.c
@@ -486,6 +486,11 @@ sub mkvcbuild
 		closedir($D);
 	}
 
+	if (-d "src/test/bin")
+	{
+		AddContrib("src/test","bin");
+	}
+
 	# Build Perl and Python modules after contrib/ modules to satisfy some
 	# dependencies with transform contrib modules, like hstore_plpython
 	# ltree_plpython and hstore_plperl.




Re: standby apply lag on inactive servers

2020-01-28 Thread Kyotaro Horiguchi
At Tue, 28 Jan 2020 11:18:14 -0300, Alvaro Herrera  
wrote in 
> > xlog.c:7329: (similar code exists at line 9332)
> > >ereport(LOG,
> > >   (errmsg("redo done at %X/%X",
> > >   (uint32) (ReadRecPtr >> 32), (uint32) 
> > > ReadRecPtr)));
> > >xtime = GetLatestXTime();
> > >if (xtime)
> > >   ereport(LOG,
> > >   (errmsg("last completed transaction was at log 
> > > time %s",
> > >   timestamptz_to_str(xtime;
> > 
> > This code assumes (and the name GetLatestXTime() suggests; I first
> > noticed that here) that the timestamp comes from commit/abort records,
> > so otherwise it shows a wrong timestamp.  We shouldn't update the
> > variable from any other kind of record.
> 
> Hmm, that's terrible.  GetLatestXTime() being displayed user-visibly for
> "last transaction completion" but having it include unrelated things
> such as restore points is terrible.  One idea is to should split it in
> two: one exclusively for transaction commit/abort, and another for all
> WAL activity.  That way, the former can be used for that message, and
> the latter for standby replay reports.  However, that might be
> overengineering, if the only thing the former is used for is that one LOG
> message; instead changing the log message to state that the time is for
> other activity, as you suggest, is simpler and has no downside that I
> can see.

Perhaps we can use ControlData->checkPointCopy.time instead. It misses
checkpoint records intermittently but works in general.

But as a more significant issue, nowadays PostgreSQL doesn't run a
checkpoint if it is really inactive (that is, if no "important" WAL
records have been issued).

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: standby apply lag on inactive servers

2020-01-28 Thread Kyotaro Horiguchi
At Tue, 28 Jan 2020 11:18:50 -0300, Alvaro Herrera  
wrote in 
> On 2020-Jan-27, Alvaro Herrera wrote:
> 
> > Actually looking again, getRecordTimestamp is looking pretty strange.
> > It looks much more natural by using nested switch/case blocks, as with
> > this diff.  I think the compiler does a better job this way too.
> 
> I hadn't noticed I forgot to attach the diff here :-(

Yeah, that patch is based on the apply-lag patch :) and contains
XLOG_CHECKPOINT_*. But otherwise it looks good to me.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Is custom MemoryContext prohibited?

2020-01-28 Thread Kohei KaiGai
On Wed, Jan 29, 2020 at 11:06 Thomas Munro  wrote:
>
> On Wed, Jan 29, 2020 at 2:49 PM Kohei KaiGai  wrote:
> > On Wed, Jan 29, 2020 at 4:27 Thomas Munro  wrote:
> > > On Tue, Jan 28, 2020 at 2:55 PM Kohei KaiGai  wrote:
> > > FWIW the code in https://commitfest.postgresql.org/26/2325/ ran into
> > > exactly the same problem while making nearly exactly the same kind of
> > > thing (namely, a MemoryContext backed by space in the main shm area,
> > > in this case reusing the dsa.c allocator).
> > >
> > Thanks. I had seen the "Shared Memory Context" thread on pgsql-hackers a
> > few months ago; however, I missed it just now.
> >
> > The main point of difference from this approach is the portability
> > of pointers to shared memory chunks (if I understand correctly).
> > PG-Strom reserves logical address space, but no physical pages, at
> > startup time, then maps shared memory segments at fixed addresses on
> > demand.  So, pointers are portable across all the backend processes,
> > and thus suitable for building tree structures on shared memory as
> > well.
>
> Very interesting.  PostgreSQL's DSM segments could potentially have
> been implemented that way (whereas today they are not mapped with
> MAP_FIXED), but I suppose people were worried about portability
> problems and ASLR.  The Windows port struggles with that stuff.
>
Yes. I'm not certain whether Windows can support behavior equivalent
to mmap(..., PROT_NONE) and SIGSEGV/SIGBUS handling.
That is also a reason why PG-Strom (which is Linux-only) wants to have
its own shared memory management logic, at least right now.

Thanks,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: Hash join not finding which collation to use for string hashing

2020-01-28 Thread Mark Dilger



> On Jan 28, 2020, at 7:38 PM, Mark Dilger  wrote:
> 
> 
> 
>> On Jan 28, 2020, at 6:46 PM, Tom Lane  wrote:
>> 
>> Mark Dilger  writes:
>>> While reviewing the partition-wise join patch, I ran into an issue that 
>>> exists in master, so rather than responding to that patch, I’m starting 
>>> this new thread.
>>> I noticed that this seems similar to the problem that was supposed to have 
>>> been fixed in the "Re: COLLATE: Hash partition vs UPDATE” thread.  As such, 
>>> I’ve included Tom and Amit in the CC list.
>> 
>> Hm, I don't see any bug here.  You're asking it to join
>> 
 CREATE TABLE alpha (a TEXT COLLATE "ja_JP", b TEXT COLLATE "sv_SE");
 CREATE TABLE beta (a TEXT COLLATE "tr_TR", b TEXT COLLATE "en_US");
>> 
 SELECT t1.a, t2.a FROM alpha t1 INNER JOIN beta t2 ON (t1.a = t2.a) WHERE 
 t1.a IN ('äbç', 'ὀδυσσεύς');
>> 
>> so t1.a and t2.a have different collations, and the system can't resolve
>> which to use for the comparison.
>> 
>> Now, I'd be the first to agree that this error could be reported better.
>> The parser knows that it couldn't resolve a collation for t1.a = t2.a, but
>> what it does *not* know is whether the '=' operator cares for collation.
>> Throwing an error when the operator wouldn't care at runtime isn't going
>> to make many people happy.  On the other hand, when the operator finally
>> does run and can't get a collation, all it knows is that it didn't get a
>> collation, not why.  So we can't produce an error message as specific as
>> "ja_JP and tr_TR collations conflict".
>> 
>> Now that the collations feature has settled in, it'd be nice to go back
>> and see if we can't improve that somehow.  Not sure how.
>> 
>> (BTW, before v12 the text '=' operator indeed did not care for collation,
>> so this example would've worked.  But the change in behavior is a
>> necessary consequence of having invented nondeterministic collations,
>> not a bug.)
> 
> I contemplated that for a while before submitting the report.  I agree that 
> for strings that are not binary equal, some collations might say the two 
> strings are equal, and other collations may say that they are not. But when 
> does any collation say that a string is not equal to itself?  All the strings 
> in these columns were loaded from the same source table, and they should 
> always equal themselves, so the only problem I am aware of is if some of them 
> equal others of them under one of the collations in question, where the other 
> collation doesn’t think so.  I’m pretty sure that does not exist in this 
> concrete example.
> 
> I guess I’m arguing that the system is giving up too soon, saying, “In theory 
> there might be values I don’t know how to compare, so I’m going to give up 
> now and not look”.
> 
> I think what is happening here is that the system thinks, “Hey, I can use a 
> hash join for this”, and then later realizes, “Oh, no, I can’t” and instead 
> of falling back to something other than hash join, it gives up.
> 
> Is there some more fundamental reason this query couldn’t correctly be 
> completed?  I don’t mind being enlightened about the part that I’m missing.

If the answer here is just that you’d rather it always fail at planning time 
because that’s more deterministic than having it sometimes succeed and 
sometimes fail at runtime depending on which data has been loaded, ok, I can 
understand that.  If so, then let’s put this error string into the docs, 
because right now, if you google

    site:postgresql.org "could not determine which collation to use for string hashing"

you don’t get anything from the docs telling you that this is an expected 
outcome.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Hash join not finding which collation to use for string hashing

2020-01-28 Thread Mark Dilger



> On Jan 28, 2020, at 6:46 PM, Tom Lane  wrote:
> 
> Mark Dilger  writes:
>> While reviewing the partition-wise join patch, I ran into an issue that 
>> exists in master, so rather than responding to that patch, I’m starting this 
>> new thread.
>> I noticed that this seems similar to the problem that was supposed to have 
>> been fixed in the "Re: COLLATE: Hash partition vs UPDATE” thread.  As such, 
>> I’ve included Tom and Amit in the CC list.
> 
> Hm, I don't see any bug here.  You're asking it to join
> 
>>> CREATE TABLE alpha (a TEXT COLLATE "ja_JP", b TEXT COLLATE "sv_SE");
>>> CREATE TABLE beta (a TEXT COLLATE "tr_TR", b TEXT COLLATE "en_US");
> 
>>> SELECT t1.a, t2.a FROM alpha t1 INNER JOIN beta t2 ON (t1.a = t2.a) WHERE 
>>> t1.a IN ('äbç', 'ὀδυσσεύς');
> 
> so t1.a and t2.a have different collations, and the system can't resolve
> which to use for the comparison.
> 
> Now, I'd be the first to agree that this error could be reported better.
> The parser knows that it couldn't resolve a collation for t1.a = t2.a, but
> what it does *not* know is whether the '=' operator cares for collation.
> Throwing an error when the operator wouldn't care at runtime isn't going
> to make many people happy.  On the other hand, when the operator finally
> does run and can't get a collation, all it knows is that it didn't get a
> collation, not why.  So we can't produce an error message as specific as
> "ja_JP and tr_TR collations conflict".
> 
> Now that the collations feature has settled in, it'd be nice to go back
> and see if we can't improve that somehow.  Not sure how.
> 
> (BTW, before v12 the text '=' operator indeed did not care for collation,
> so this example would've worked.  But the change in behavior is a
> necessary consequence of having invented nondeterministic collations,
> not a bug.)

I contemplated that for a while before submitting the report.  I agree that for 
strings that are not binary equal, some collations might say the two strings 
are equal, and other collations may say that they are not.  But when does any 
collation say that a string is not equal to itself?  All the strings in these 
columns were loaded from the same source table, and they should always equal 
themselves, so the only problem I am aware of is if some of them equal others 
of them under one of the collations in question, where the other collation 
doesn’t think so.  I’m pretty sure that does not exist in this concrete example.

I guess I’m arguing that the system is giving up too soon, saying, “In theory 
there might be values I don’t know how to compare, so I’m going to give up now 
and not look”.

I think what is happening here is that the system thinks, “Hey, I can use a 
hash join for this”, and then later realizes, “Oh, no, I can’t” and instead of 
falling back to something other than hash join, it gives up.

Is there some more fundamental reason this query couldn’t correctly be 
completed?  I don’t mind being enlightened about the part that I’m missing.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Is custom MemoryContext prohibited?

2020-01-28 Thread Peter Geoghegan
On Tue, Jan 28, 2020 at 6:53 PM Tom Lane  wrote:
> > FWIW, I noticed that GDB becomes much better at this when you add "set
> > print symbol on" to your .gdbinit dot file about a year ago.
>
> Interesting.  But I bet there are platform and version dependencies
> in the mix, too.  I'd still not wish to rely on this for debugging.

I agree that there are a lot of moving pieces here. I wouldn't like to
have to rely on this working myself.

-- 
Peter Geoghegan




Don't try fetching future segment of a TLI.

2020-01-28 Thread Kyotaro Horiguchi
Hello, I added (moved to) -hackers.

At Tue, 28 Jan 2020 19:13:32 +0300, Pavel Suderevsky  
wrote in 
> But for me it still seems that PostgreSQL has enough information to check
> that no WALs exist for the new timeline to omit searching all the
> possibly-existing WALs.
> 
> It can just look through the first received new-timeline's WAL and ensure
> the timeline switch occurred in this WAL.  Finally, it can check the archive
> for the only possibly-existing previous WAL.

Right. The timeline history file tells where a timeline ends.

> Regarding influence: the issue is not about the large amount of WAL to apply
> but in searching for non-existing WALs on the remote storage; each such
> search can take 5-10 seconds while obtaining an existing WAL takes
> milliseconds.

Wow. I didn't know of a file system that takes that many seconds
trying non-existent files. Although I still think this is not a bug,
avoiding that actually leads to a big win on such systems.

On reflection, I think it's safe and effectively doable to let
XLogFileReadAnyTLI() refrain from trying WAL segments of too-high
TLIs.  Some garbage archive files out of the range of a timeline might
be seen, for example, after reusing an archive directory without
clearing its files.  However, fetching such garbage just to fail doesn't
contribute to durability or reliability at all, I think.

The attached does that. 

Any thoughts?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 1b9211740175b7f9cb6810c822a67d4065ca9cf0 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Wed, 29 Jan 2020 11:17:56 +0900
Subject: [PATCH v1] Don't try fetching out-of-timeline segments.

XLogFileReadAnyTLI scans known TLIs in descending order from the
largest one while searching for the target segment. Even if we know
that the segment belongs to a lower TLI, it tries opening the segment
on the larger TLIs just to fail. Under certain circumstances such
accesses to non-existent files take a long time and make recovery
significantly longer.

Although a segment beyond the end of a TLI suggests that the
XLOG/archive files may be broken, we can safely ignore such files as
long as recovery proceeds.
---
 src/backend/access/transam/xlog.c | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6e09ded597..415288f50d 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3738,11 +3738,27 @@ XLogFileReadAnyTLI(XLogSegNo segno, int emode, int source)
 
 	foreach(cell, tles)
 	{
-		TimeLineID	tli = ((TimeLineHistoryEntry *) lfirst(cell))->tli;
+		TimeLineHistoryEntry *hent = (TimeLineHistoryEntry *) lfirst(cell);
+		TimeLineID	tli = hent->tli;
 
 		if (tli < curFileTLI)
 			break;				/* don't bother looking at too-old TLIs */
 
+		/* Skip segments not belonging to the TLI */
+		if (hent->begin != InvalidXLogRecPtr)
+		{
+			XLogSegNo	beginseg = 0;
+
+			XLByteToSeg(hent->begin, beginseg, wal_segment_size);
+
+			/*
+			 * We are scanning TLIs in descending order. It is sufficient to
+			 * check only the upper boundary.
+			 */
+			if (segno < beginseg)
+				continue;		/* don't bother looking at future TLIs */
+		}
+
 		if (source == XLOG_FROM_ANY || source == XLOG_FROM_ARCHIVE)
 		{
 			fd = XLogFileRead(segno, emode, tli,
-- 
2.18.2
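The boundary check the patch adds can be exercised in isolation.  Here is a toy model with made-up TLIs and segment numbers standing in for the XLByteToSeg() arithmetic; it is not the actual xlog.c code:

```c
#include <assert.h>
#include <stdint.h>

/* Toy timeline history entry, ordered by descending TLI. */
typedef struct TimeLineEnt
{
	uint32_t	tli;
	uint64_t	beginseg;		/* first segment of this TLI; 0 = unknown */
} TimeLineEnt;

/*
 * Return the highest TLI whose files are worth trying for "segno", or 0 if
 * none.  Entries whose begin segment lies beyond segno are skipped,
 * mirroring the patch: there is no point opening a file on a timeline that
 * did not yet exist when the segment was written.
 */
static uint32_t
pick_tli(const TimeLineEnt *tles, int ntles, uint64_t segno)
{
	for (int i = 0; i < ntles; i++)
	{
		if (tles[i].beginseg != 0 && segno < tles[i].beginseg)
			continue;			/* segment predates this TLI: don't try it */
		return tles[i].tli;
	}
	return 0;
}
```

Since the entries are scanned in descending TLI order, checking only the lower boundary (the TLI's begin segment) is sufficient, which is the observation the patch's comment makes.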



Re: Is custom MemoryContext prohibited?

2020-01-28 Thread Tom Lane
Peter Geoghegan  writes:
> On Tue, Jan 28, 2020 at 10:09 AM Tom Lane  wrote:
>> No.  No.  Just NO.  (A) that's overly complex for developers to use,
>> and (B) you have far too much faith in the debugger producing something
>> useful.  (My experience is that it'll fail to render function pointers
>> legibly on an awful lot of platforms.)  Plus, you won't actually save
>> any space by removing both of those fields.

> FWIW, I noticed that GDB becomes much better at this when you add "set
> print symbol on" to your .gdbinit dot file about a year ago.

Interesting.  But I bet there are platform and version dependencies
in the mix, too.  I'd still not wish to rely on this for debugging.

regards, tom lane




Re: Hash join not finding which collation to use for string hashing

2020-01-28 Thread Tom Lane
Mark Dilger  writes:
> While reviewing the partition-wise join patch, I ran into an issue that 
> exists in master, so rather than responding to that patch, I’m starting this 
> new thread.
> I noticed that this seems similar to the problem that was supposed to have 
> been fixed in the "Re: COLLATE: Hash partition vs UPDATE” thread.  As such, 
> I’ve included Tom and Amit in the CC list.

Hm, I don't see any bug here.  You're asking it to join

>> CREATE TABLE alpha (a TEXT COLLATE "ja_JP", b TEXT COLLATE "sv_SE");
>> CREATE TABLE beta (a TEXT COLLATE "tr_TR", b TEXT COLLATE "en_US");

>> SELECT t1.a, t2.a FROM alpha t1 INNER JOIN beta t2 ON (t1.a = t2.a) WHERE 
>> t1.a IN ('äbç', 'ὀδυσσεύς');

so t1.a and t2.a have different collations, and the system can't resolve
which to use for the comparison.

Now, I'd be the first to agree that this error could be reported better.
The parser knows that it couldn't resolve a collation for t1.a = t2.a, but
what it does *not* know is whether the '=' operator cares for collation.
Throwing an error when the operator wouldn't care at runtime isn't going
to make many people happy.  On the other hand, when the operator finally
does run and can't get a collation, all it knows is that it didn't get a
collation, not why.  So we can't produce an error message as specific as
"ja_JP and tr_TR collations conflict".

Now that the collations feature has settled in, it'd be nice to go back
and see if we can't improve that somehow.  Not sure how.

(BTW, before v12 the text '=' operator indeed did not care for collation,
so this example would've worked.  But the change in behavior is a
necessary consequence of having invented nondeterministic collations,
not a bug.)
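For what it's worth, a user hitting this error today can resolve the ambiguity explicitly with a COLLATE clause, as the HINT suggests. An illustrative example against Mark's tables (the choice of "en_US" here is arbitrary):

```sql
-- Attach an explicit collation to one side of the join condition, so the
-- hash/comparison operator has a definite collation to work with.
SELECT t1.a, t2.a
FROM alpha t1
INNER JOIN beta t2 ON (t1.a = t2.a COLLATE "en_US")
WHERE t1.a IN ('äbç', 'ὀδυσσεύς');
```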

regards, tom lane




Re: Autovacuum on partitioned table

2020-01-28 Thread yuzuko
Hello,

> Besides the complexity of
> getting that infrastructure in place, an important question is whether
> the current system of applying threshold and scale factor to
> changes_since_analyze should be used as-is for inheritance parents
> (partitioned tables), because if users set those parameters similarly
> to for regular tables, autovacuum might analyze partitioned tables
> more than necessary.  We'll either need a different formula, or some
> commentary in the documentation about how partitioned tables might
> need different setting, or maybe both.
>
I'm not sure but I think we need new autovacuum parameters for
partitioned tables (autovacuum, autovacuum_analyze_threshold,
autovacuum_analyze_scale_factor) because whether it's necessary
to run autovacuum on partitioned tables will depend on users.
What do you think?

> How are you going to track changes_since_analyze of partitioned table?
> It's just an idea but we can accumulate changes_since_analyze of
> partitioned table by adding child tables's value after analyzing each
> child table. And compare the partitioned tables value to the threshold
> that is computed by (autovacuum_analyze_threshold  + total rows
> including all child tables * autovacuum_analyze_scale_factor).
>
The idea Sawada-san mentioned is similar to mine.  Also, to track
changes_since_analyze, we have to maintain statistics for partitioned
tables themselves.  To do that, we can invent a new
PgStat_StatPartitionedTabEntry based on PgStat_StatTabEntry.  From
discussion with Amit, I think the new structure
needs the following members:

tableid
changes_since_analyze
analyze_timestamp
analyze_count
autovac_analyze_timestamp
autovac_analyze_count

Vacuum doesn't run on partitioned tables, so I think members related to
(auto) vacuum need not be contained in the structure.

I'm still writing a patch.  I'll send it this week.
-- 
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center




Re: Is custom MemoryContext prohibited?

2020-01-28 Thread Peter Geoghegan
On Tue, Jan 28, 2020 at 10:09 AM Tom Lane  wrote:
> No.  No.  Just NO.  (A) that's overly complex for developers to use,
> and (B) you have far too much faith in the debugger producing something
> useful.  (My experience is that it'll fail to render function pointers
> legibly on an awful lot of platforms.)  Plus, you won't actually save
> any space by removing both of those fields.

FWIW, I noticed about a year ago that GDB becomes much better at this
when you add "set print symbol on" to your .gdbinit file. In theory
you shouldn't need to do that to print the symbol that a function
pointer points to, I think. At least that's what the documentation
says. But in practice this seems to help a lot.

I don't recall figuring out a reason for this. Could have been due to
GDB being fussier about the declared type of a pointer than it needs
to be, or something along those lines.
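For anyone wanting to try this, the .gdbinit addition is just the following (the second setting is an optional, commonly paired companion, not part of the suggestion above):

```
# ~/.gdbinit
set print symbol on
# optional companion setting for more readable struct output
set print pretty on
```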

-- 
Peter Geoghegan




Re: Is custom MemoryContext prohibited?

2020-01-28 Thread Thomas Munro
On Wed, Jan 29, 2020 at 2:49 PM Kohei KaiGai  wrote:
> On Wed, Jan 29, 2020 at 4:27 AM, Thomas Munro wrote:
> > On Tue, Jan 28, 2020 at 2:55 PM Kohei KaiGai  wrote:
> > FWIW the code in https://commitfest.postgresql.org/26/2325/ ran into
> > exactly the same problem while making nearly exactly the same kind of
> > thing (namely, a MemoryContext backed by space in the main shm area,
> > in this case reusing the dsa.c allocator).
> >
> Thanks. I had seen the "Shared Memory Context" title on pgsql-hackers a few
> months ago, but had missed that thread until now.
>
> The main point of difference from this approach is the portability of
> pointers to shared memory chunks (if I understand correctly).
> PG-Strom reserves the logical address space, but no physical pages, at
> startup time, then maps the shared memory segment at a fixed address
> on demand.
> So pointers are portable across all the backend processes, and thus
> suitable for building tree structures in shared memory as well.

Very interesting.  PostgreSQL's DSM segments could potentially have
been implemented that way (whereas today they are not mapped with
MAP_FIXED), but I suppose people were worried about portability
problems and ASLR.  The Windows port struggles with that stuff.

Actually the WIP code in that patch reserves a chunk of space up front
in the postmaster, and then puts a DSA allocator inside it.  Normally,
DSA allocators create/destroy new DSM segments on demand and deal with
the address portability stuff through a lot of extra work (this
probably makes Parallel Hash Join slightly slower than it should be),
but as a special case a DSA allocator can be created in preexisting
memory and then not allowed to grow.  In exchange for accepting a
fixed space, you get normal shareable pointers.  This means that you
can use the resulting weird MemoryContext for stuff like building
query plans that can be used by other processes, but when you run out
of memory, allocations begin to fail.  That WIP code is experimenting
with caches that can tolerate running out of memory (or at least that
is the intention), so a fixed sized space is OK for that.

This article below introduces how our "shared memory context" works.
It was originally written in Japanese, and Google Translate may generate
unnatural English; however, its figures should explain how it works.
> https://translate.google.co.jp/translate?hl=ja=auto=en=http%3A%2F%2Fkaigai.hatenablog.com%2Fentry%2F2016%2F06%2F19%2F095127

Thanks!




Re: Is custom MemoryContext prohibited?

2020-01-28 Thread Kohei KaiGai
On Wed, Jan 29, 2020 at 4:27 AM, Thomas Munro wrote:
>
> On Tue, Jan 28, 2020 at 2:55 PM Kohei KaiGai  wrote:
> > I noticed MemoryContextIsValid() called by various kinds of memory context
> > routines checks its node-tag as follows:
> >
> > #define MemoryContextIsValid(context) \
> > ((context) != NULL && \
> >  (IsA((context), AllocSetContext) || \
> >   IsA((context), SlabContext) || \
> >   IsA((context), GenerationContext)))
> >
> > It allows only "known" memory context methods, even though the memory 
> > context
> > mechanism enables to implement custom memory allocator by extensions.
> > Here is a node tag nobody used: T_MemoryContext.
> > It looks to me T_MemoryContext is a neutral naming for custom memory 
> > context,
> > and here is no reason why memory context functions prevents custom methods.
> >
> >
> > https://github.com/heterodb/pg-strom/blob/master/src/shmbuf.c#L1243
> > I recently implemented a custom memory context for shared memory allocation
> > with portable pointers. It shall be used for cache of pre-built gpu
> > binary code and
> > metadata cache of apache arrow files.
> > However, the assertion check above requires extension to set a fake node-tag
> > to avoid backend crash. Right now, it is harmless to set T_AllocSetContext, 
> > but
> > feel a bit bad.
>
> FWIW the code in https://commitfest.postgresql.org/26/2325/ ran into
> exactly the same problem while making nearly exactly the same kind of
> thing (namely, a MemoryContext backed by space in the main shm area,
> in this case reusing the dsa.c allocator).
>
Thanks. I had seen the "Shared Memory Context" title on pgsql-hackers a few
months ago, but had missed that thread until now.

The main point of difference from this approach is the portability of
pointers to shared memory chunks (if I understand correctly).
PG-Strom reserves the logical address space, but no physical pages, at
startup time, then maps the shared memory segment at a fixed address
on demand.
So pointers are portable across all the backend processes, and thus
suitable for building tree structures in shared memory as well.

This article below introduces how our "shared memory context" works.
It was originally written in Japanese, and Google Translate may generate
unnatural English; however, its figures should explain how it works.
https://translate.google.co.jp/translate?hl=ja=auto=en=http%3A%2F%2Fkaigai.hatenablog.com%2Fentry%2F2016%2F06%2F19%2F095127

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: Enabling B-Tree deduplication by default

2020-01-28 Thread Peter Geoghegan
On Thu, Jan 16, 2020 at 12:05 PM Peter Geoghegan  wrote:
> > It does seem odd to me to treat them differently, but it's possible
> > that this is a reflection of my own lack of understanding. What do
> > other database systems do?
>
> Other database systems treat unique indexes very differently, albeit
> in a way that we're not really in a position to take too much away
> from -- other than the general fact that unique indexes can be thought
> of as very different things.

I should point out here that I've just posted v31 of the patch, which
changes things for unique indexes. Our strategy during deduplication
is now the same for unique indexes, since the original,
super-incremental approach doesn't seem to make sense anymore. Further
optimization work in the patch eliminated problems that made this
approach seem like it might be worthwhile.

Note, however, that v31 changes nothing about how we think about
deduplication in unique indexes in general, nor how it is presented to
users. There are still special criteria for how deduplication is
*triggered* in unique indexes. We continue to trigger a deduplication
pass based on seeing a duplicate within _bt_check_unique() +
_bt_findinsertloc() -- otherwise we never attempt deduplication in a
unique index (same as before). Plus the GUC still doesn't affect
unique indexes, unique index deduplication still isn't really
documented in the user docs (it just gets a passing mention in B-Tree
internals section), etc. This seems like the right way to go, since
deduplication in unique indexes can only make sense on leaf pages
where most or all new items are duplicates of existing items, a
situation that is already easy to detect.

It wouldn't be that bad if we always attempted deduplication in a
unique index, but it's easy to only do it when we're pretty confident
we'll get a benefit -- why not save a few cycles?

--
Peter Geoghegan




Re: Is custom MemoryContext prohibited?

2020-01-28 Thread Kohei KaiGai
On Wed, Jan 29, 2020 at 2:18 AM, Robert Haas wrote:
>
> On Tue, Jan 28, 2020 at 11:24 AM Tom Lane  wrote:
> > I strongly object to having the subtype field be just "char".
> > I want it to be declared "MemoryContextType", so that gdb will
> > still be telling me explicitly what struct type this really is.
> > I realize that you probably did that for alignment reasons, but
> > maybe we could shave the magic number down to 2 bytes, and then
> > rearrange the field order?  Or just not sweat so much about
> > wasting a longword here.  Having those bools up at the front
> > is pretty ugly anyway.
>
> I kind of dislike having the magic number not be the first thing in
> the struct on aesthetic grounds, and possibly on the grounds that
> somebody might be examining the initial bytes manually to try to
> figure out what they've got, and having the magic number there makes
> it easier. Regarding space consumption, I guess this structure is
> already over 64 bytes and not close to 128 bytes, so adding another 8
> bytes probably isn't very meaningful, but I don't love it. One thing
> that concerns me a bit is that if somebody adds their own type of
> memory context, then MemoryContextType will have a value that is not
> actually in the enum. If compilers are optimizing the code on the
> assumption that this cannot occur, do we need to worry about undefined
> behavior?
>
> Actually, I have what I think is a better idea. I notice that the
> "type" field is actually completely unused. We initialize it, and then
> nothing in the code ever checks it or relies on it for any purpose.
> So, we could have a bug in the code that initializes that field with
> the wrong value, for a new context type or in general, and only a
> developer with a debugger would ever notice. That's because the thing
> that really matters is the 'methods' array. So what I propose is that
> we nuke the type field from orbit. If a developer wants to figure out
> what type of context they've got, they should print
> context->methods[0]; gdb will tell you the function names stored
> there, and then you'll know *for real* what type of context you've
> got.
>
> Here's a v2 that approaches the problem that way.
>
How about having a "const char *name" field in MemoryContextMethods?
It is more human-readable for debugging than raw function pointers.
We already have a similar field to identify the methods in
CustomScanMethods (it is also used for EXPLAIN, not only for debugging...).

I like the idea of identifying the memory-context type with a single
identifier rather than two. If we had a sub-field id and the
memory-context methods separately, we would probably need an Assert() to
check their consistency, wouldn't we?

Thanks,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




RE: [Patch]: Documentation of ALTER TABLE re column type changes on binary-coercible fields

2020-01-28 Thread k.jami...@fujitsu.com
On Wednesday, January 29, 2020 3:56 AM (GMT+9), Mike Lissner wrote:
> Hi all, I didn't get any replies to this. Is this the right way to send in
> a patch to the docs?

Hello,
Yes, although your current patch does not apply when I tried it on my machine.
But you can still rebase it.
For the reviewers/committers to keep track of this, I think it might be
better to register your patch in the commitfest app:
https://commitfest.postgresql.org/27/,
and you may put it under the "Documentation" topic.

There's also a CFbot to check online whether your patch still applies cleanly
and passes the tests, especially after several commits in the source code.
Current CF: http://commitfest.cputube.org/index.html
Next CF: http://commitfest.cputube.org/next.html

Regards,
Kirk Jamison


> On Thu, Jan 23, 2020 at 11:01 PM Mike Lissner wrote:
>
> Hi, first patch here and first post to pgsql-hackers. Here goes.
>
> Enclosed please find a patch to tweak the documentation of the ALTER TABLE
> page. I believe this patch is ready to be applied to master and backported
> all the way to 9.2.
>
> On the ALTER TABLE page, it currently notes that if you change the type of
> a column, even to a binary coercible type:
>
> > any indexes on the affected columns must still be rebuilt.
>
> It appears this hasn't been true for about eight years, since 367bc426a.
>
> Here's the discussion of the topic from earlier today and yesterday:
>
> https://www.postgresql.org/message-id/flat/CAMp9%3DExXtH0NeF%2BLTsNrew_oXycAJTNVKbRYnqgoEAT01t%3D67A%40mail.gmail.com
>
> I haven't run tests, but I presume they'll be unaffected by a documentation
> change.
>
> I've made an effort to follow the example of other people's patches I
> looked at, but I haven't contributed here before. Happy to take another
> stab at this if this doesn't hit the mark — though I hope it does. I love
> and appreciate Postgresql and hope that I can do my little part to make it
> better.
>
> For the moment, I haven't added this to commitfest. I don't know what it
> is, but I suspect this is small enough somebody will just pick it up.
>
> Mike



Re: Is custom MemoryContext prohibited?

2020-01-28 Thread Kohei KaiGai
On Wed, Jan 29, 2020 at 12:56 AM, Tomas Vondra wrote:
>
> On Tue, Jan 28, 2020 at 11:32:49PM +0900, Kohei KaiGai wrote:
> On Tue, Jan 28, 2020 at 11:09 PM, Tomas Vondra wrote:
> >>
> >> On Tue, Jan 28, 2020 at 10:55:11AM +0900, Kohei KaiGai wrote:
> >> >Hello,
> >> >
> >> >I noticed MemoryContextIsValid() called by various kinds of memory context
> >> >routines checks its node-tag as follows:
> >> >
> >> >#define MemoryContextIsValid(context) \
> >> >((context) != NULL && \
> >> > (IsA((context), AllocSetContext) || \
> >> >  IsA((context), SlabContext) || \
> >> >  IsA((context), GenerationContext)))
> >> >
> >> >It allows only "known" memory context methods, even though the memory 
> >> >context
> >> >mechanism enables to implement custom memory allocator by extensions.
> >> >Here is a node tag nobody used: T_MemoryContext.
> >> >It looks to me T_MemoryContext is a neutral naming for custom memory 
> >> >context,
> >> >and here is no reason why memory context functions prevents custom 
> >> >methods.
> >> >
> >>
> >> Good question. I don't think there's an explicit reason not to allow
> >> extensions to define custom memory contexts, and using T_MemoryContext
> >> seems like a possible solution. It's a bit weird though, because all the
> >> actual contexts are kinda "subclasses" of MemoryContext. So maybe adding
> >> T_CustomMemoryContext would be a better choice, but that only works in
> >> master, of course.
> >>
> From the standpoint of an extension author, reusing T_MemoryContext and
> back-patching the change to MemoryContextIsValid() would make us happy. :)
> However, even if we add a new node-tag here, custom memory contexts can
> stop using a fake node-tag a few years later. It's better than nothing.
> >
>
> Oh, right. I forgot we still need to backpatch this bit. But that seems
> like a fairly small amount of code, so it should work.
>
> I think we can't backpatch the addition of T_CustomMemoryContext anyway
> as it essentially breaks ABI, as it changes the values assigned to T_
> constants.
>
> >> Also, it won't work if we need to add memory contexts to equalfuncs.c
> >> etc. but maybe won't need that - it's more a theoretical issue.
> >>
> >Right now, none of nodes/XXXfuncs.c support these class of nodes related to
> >MemoryContext. It shall not be a matter.
> >
>
> Yes. I did not really mean it as argument against the patch, it was
> meant more like "This could be an issue, but it actually is not." Sorry
> if that wasn't clear.
>
> >> >https://github.com/heterodb/pg-strom/blob/master/src/shmbuf.c#L1243
> >> >I recently implemented a custom memory context for shared memory 
> >> >allocation
> >> >with portable pointers. It shall be used for cache of pre-built gpu
> >> >binary code and
> >> >metadata cache of apache arrow files.
> >> >However, the assertion check above requires extension to set a fake 
> >> >node-tag
> >> >to avoid backend crash. Right now, it is harmless to set 
> >> >T_AllocSetContext, but
> >> >feel a bit bad.
> >> >
> >>
> >> Interesting. Does that mean the hared memory contexts are part of the
> >> same hierarchy as "normal" contexts? That would be a bit confusing, I
> >> think.
> >>
> >If this shared memory context is a child of normal context, likely, it 
> >should be
> >reset or deleted as usual. However, if this shared memory context performs
> >as a parent of normal context, it makes a problem when different process 
> >tries
> >to delete this context, because its child (normal context) exists at the 
> >creator
> >process only. So, it needs to be used carefully.
> >
>
> Yeah, handling life cycle of a mix of those contexts may be quite tricky.
>
> But my concern was a bit more general - is it a good idea to hide the
> nature of the memory context behind the same API. If you call palloc()
> shouldn't you really know whether you're allocating the stuff in regular
> or shared memory context?
>
> Maybe it makes perfect sense, but my initial impression is that those
> seem rather different, so maybe we should keep them separate (in
> separate hierarchies or something). But I admit I don't have much
> experience with use cases that would require such shmem contexts.
>
Yeah, as you mentioned, we currently have no way to distinguish whether a
particular memory chunk is private or shared memory.
It is the responsibility of software developers, and I assume shared-memory
chunks will be used only in limited cases where there is a specific reason
the data should be shared.
On the other hand, the situation is the same even for private memory:
we should pay attention to which memory context a chunk is allocated from.
For example, a state object that must remain valid during query execution
must be allocated from estate->es_query_cxt. If someone allocates it from
CurrentMemoryContext and it is then implicitly released, we would consider
that a bug.

Thanks,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Hash join not finding which collation to use for string hashing

2020-01-28 Thread Mark Dilger
While reviewing the partition-wise join patch, I ran into an issue that exists 
in master, so rather than responding to that patch, I’m starting this new 
thread.

I noticed that this seems similar to the problem that was supposed to have been 
fixed in the "Re: COLLATE: Hash partition vs UPDATE” thread.  As such, I’ve 
included Tom and Amit in the CC list.

Notice the "ERROR:  could not determine which collation to use for string
hashing"

The following is extracted from the output from the test:

> CREATE TABLE raw_data (a text);
> INSERT INTO raw_data (a) VALUES ('Türkiye'),
> ('TÜRKIYE'),
> ('bıt'),
> ('BIT'),
> ('äbç'),
> ('ÄBÇ'),
> ('aaá'),
> ('coté'),
> ('Götz'),
> ('ὀδυσσεύς'),
> ('ὈΔΥΣΣΕΎΣ'),
> ('を読み取り用'),
> ('にオープンできませんでした');
> -- Create unpartitioned tables for test
> CREATE TABLE alpha (a TEXT COLLATE "ja_JP", b TEXT COLLATE "sv_SE");
> CREATE TABLE beta (a TEXT COLLATE "tr_TR", b TEXT COLLATE "en_US");
> INSERT INTO alpha (SELECT a, a FROM raw_data);
> INSERT INTO beta (SELECT a, a FROM raw_data);
> ANALYZE alpha;
> ANALYZE beta;
> EXPLAIN (COSTS OFF)
> SELECT t1.a, t2.a FROM alpha t1 INNER JOIN beta t2 ON (t1.a = t2.a) WHERE 
> t1.a IN ('äbç', 'ὀδυσσεύς');
>  QUERY PLAN
> 
>  Hash Join
>Hash Cond: ((t2.a)::text = (t1.a)::text)
>->  Seq Scan on beta t2
>->  Hash
>  ->  Seq Scan on alpha t1
>Filter: (a = ANY ('{äbç,ὀδυσσεύς}'::text[]))
> (6 rows)
> 
> SELECT t1.a, t2.a FROM alpha t1 INNER JOIN beta t2 ON (t1.a = t2.a) WHERE 
> t1.a IN ('äbç', 'ὀδυσσεύς');
> ERROR:  could not determine which collation to use for string hashing
> HINT:  Use the COLLATE clause to set the collation explicitly.
> 


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



0001-WIP-patch-to-demonstrate-problem.patch
Description: Binary data



Re: Complete data erasure

2020-01-28 Thread Tomas Vondra

On Tue, Jan 28, 2020 at 02:34:07PM -0500, Stephen Frost wrote:

Greetings,

* asaba.takan...@fujitsu.com (asaba.takan...@fujitsu.com) wrote:

From: Stephen Frost 
> * asaba.takan...@fujitsu.com (asaba.takan...@fujitsu.com) wrote:
> > This feature erases data area just before it is returned to the OS (“erase”
> means that overwrite data area to hide its contents here)
> > because there is a risk that the data will be restored by attackers if it 
is returned
> to the OS without being overwritten.
> > The erase timing is when DROP, VACUUM, TRUNCATE, etc. are executed.
>
> Looking at this fresh, I wanted to point out that I think Tom's right-
> we aren't going to be able to reasonbly support this kind of data
> erasure on a simple DROP TABLE or TRUNCATE.
>
> > I want users to be able to customize the erasure method for their security
> policies.
>
> There's also this- but I think what it means is that we'd probably have
> a top-level command that basically is "ERASE TABLE blah;" or similar
> which doesn't operate during transaction commit but instead marks the
> table as "to be erased" and then perhaps "erasure in progress" and then
> "fully erased" (or maybe just back to 'normal' at that point).  Making
> those updates will require the command to perform its own transaction
> management which is why it can't be in a transaction itself but also
> means that the data erasure process doesn't need to be done during
> commit.
>
> > My idea is adding a new parameter erase_command to postgresql.conf.
>
> Yeah, I don't think that's really a sensible option or even approach.

I think erase_command can also manage the state of a table.
The exit status of the configured command shows it (0 is "fully erased" or
"normal", 1 is "erasure in progress").
erase_command is executed not during a transaction but when unlink() is
executed.


I really don't see what the advantage of having this be configurable is.
In addition, an external command's actions wouldn't be put through the
WAL meaning that replicas would have to be dealt with in some other way
beyind regular WAL and that seems like it'd just be ugly.


(for example, after a transaction that has done DROP TABLE)


We certainly can't run external commands during transaction COMMIT, so
this can't be part of a regular DROP TABLE.



IMO the best solution would be that the DROP TABLE does everything as
usual, but instead of deleting the relfilenode it moves it to some sort
of queue. And then a background worker would "erase" these relfilenodes
outside the COMMIT.

And yes, we need to do this in a way that works with replicas, i.e. we
need to WAL-log it somehow. And it should be done in a way that works
when the replica is on a different type of filesystem.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Delaying/avoiding BTreeTupleGetNAtts() call within _bt_compare()

2020-01-28 Thread Peter Geoghegan
On Tue, Jan 28, 2020 at 1:34 PM Floris Van Nee  wrote:
> With Andres' instructions I ran a couple of tests. With your patches I can 
> reproduce a speedup of ~3% on single core tests reliably on a dual-socket 
> 36-core machine for the pgbench select-only test case.

Thanks for testing!

Attached is v2 of patch series, which makes the changes to 0001-*
requested by Andres. I restructured the loop in a way that allows the
compiler to assume that there will always be at least one loop
iteration -- so I'm not quite as aggressive as I was with v1. We don't
actually delay the call to BTreeTupleGetNAtts() as such in v2.

Can you test this version, Floris? The second two patches are probably
not helping here, so it would be useful if you could just test 0001-*,
and then test all three together. I can toss the latter two patches if
there is no additional speedup.

If we're lucky, then Andres will have been right to suspect that there
might be a smaller stall caused by the new branch in the loop that
existed in v1. Maybe this will show up at higher client counts.

I should point out that the deduplication patch changes the definition
of BTreeTupleGetNAtts(), making it slightly more complicated. With the
deduplication patch, we have to check that the tuple isn't a posting
list tuple, which uses the INDEX_ALT_TID_MASK/INDEX_AM_RESERVED_BIT
bit to indicate a non-standard tuple header format, just like the
current pivot tuple format (we need to check a BT_RESERVED_OFFSET_MASK
bit to further differentiate posting list tuples from pivot tuples).
This increase in the complexity of BTreeTupleGetNAtts() will probably
further tip things in favor of this patch.

There are no changes in either 0002-* or 0003-* patches for v2. I'm
including the same patches here a second time for completeness.

--
Peter Geoghegan
From 8f55bcedaa9c109543627e9c785dab0ffb0e5c75 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan 
Date: Wed, 22 Jan 2020 20:59:57 -0800
Subject: [PATCH v2 3/3] Remove "negative infinity" check from _bt_compare().

---
 src/backend/access/nbtree/nbtsearch.c | 32 ++-
 1 file changed, 22 insertions(+), 10 deletions(-)

diff --git a/src/backend/access/nbtree/nbtsearch.c b/src/backend/access/nbtree/nbtsearch.c
index dd56fdaaea..b2a2605c47 100644
--- a/src/backend/access/nbtree/nbtsearch.c
+++ b/src/backend/access/nbtree/nbtsearch.c
@@ -348,6 +348,8 @@ _bt_binsrch(Relation rel,
 high;
 	int32		result,
 cmpval;
+	bool		isleaf;
+	OffsetNumber noff;
 
 	page = BufferGetPage(buf);
 	opaque = (BTPageOpaque) PageGetSpecialPointer(page);
@@ -359,6 +361,7 @@ _bt_binsrch(Relation rel,
 
 	low = P_FIRSTDATAKEY(opaque);
 	high = PageGetMaxOffsetNumber(page);
+	isleaf = P_ISLEAF(opaque);
 
 	/*
 	 * If there are no keys on the page, return the first available slot. Note
@@ -386,13 +389,20 @@ _bt_binsrch(Relation rel,
 
 	cmpval = key->nextkey ? 0 : 1;	/* select comparison value */
 
+	if (isleaf)
+		noff = InvalidOffsetNumber;
+	else
+		noff = P_FIRSTDATAKEY(opaque);
 	while (high > low)
 	{
 		OffsetNumber mid = low + ((high - low) / 2);
 
 		/* We have low <= mid < high, so mid points at a real slot */
 
-		result = _bt_compare_inl(rel, key, page, mid);
+		if (mid == noff)
+			result = 1;
+		else
+			result = _bt_compare_inl(rel, key, page, mid);
 
 		if (result >= cmpval)
 			low = mid + 1;
@@ -407,7 +417,7 @@ _bt_binsrch(Relation rel,
 	 * On a leaf page, we always return the first key >= scan key (resp. >
 	 * scan key), which could be the last slot + 1.
 	 */
-	if (P_ISLEAF(opaque))
+	if (isleaf)
 		return low;
 
 	/*
@@ -536,6 +546,16 @@ _bt_compare(Relation rel,
 			Page page,
 			OffsetNumber offnum)
 {
+	BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+
+	/*
+	 * Force result ">" if target item is first data item on an internal page
+	 * --- see NOTE above.
+	 */
+	if (!P_ISLEAF(opaque) && offnum == P_FIRSTDATAKEY(opaque))
+		return 1;
+
+
 	return _bt_compare_inl(rel, key, page, offnum);
 }
 
@@ -568,7 +588,6 @@ _bt_compare_inl(Relation rel,
 OffsetNumber offnum)
 {
 	TupleDesc	itupdesc = RelationGetDescr(rel);
-	BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 	IndexTuple	itup;
 	ItemPointer heapTid;
 	int			ski;
@@ -580,13 +599,6 @@ _bt_compare_inl(Relation rel,
 	Assert(key->keysz <= IndexRelationGetNumberOfKeyAttributes(rel));
 	Assert(key->heapkeyspace || key->scantid == NULL);
 
-	/*
-	 * Force result ">" if target item is first data item on an internal page
-	 * --- see NOTE above.
-	 */
-	if (!P_ISLEAF(opaque) && offnum == P_FIRSTDATAKEY(opaque))
-		return 1;
-
 	itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offnum));
 	ntupatts = BTreeTupleGetNAtts(itup, rel);
 
-- 
2.17.1

From aa6d3365da2bcd6fdeaaffb7a4ac0f783538b522 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan 
Date: Wed, 22 Jan 2020 20:35:04 -0800
Subject: [PATCH v2 2/3] Inline _bt_compare().

---
 src/backend/access/nbtree/nbtsearch.c | 27 +++
 1 file 

Re: [HACKERS] kqueue

2020-01-28 Thread Thomas Munro
On Sat, Jan 25, 2020 at 11:29 AM Thomas Munro  wrote:
> On Thu, Jan 23, 2020 at 9:38 AM Rui DeSousa  wrote:
> > Here is two charts comparing a patched and unpatched system.
> > These systems are very large and have just shy of thousand
> > connections each with averages of 20 to 30 active queries concurrently
> > running at times including hundreds if not thousand of queries hitting
> > the database in rapid succession.  The effect is the unpatched system
> > generates a lot of system load just handling idle connections where as
> > the patched version is not impacted by idle sessions or sessions that
> > have already received data.
>
> Thanks.  I can reproduce something like this on an Azure 72-vCPU
> system, using pgbench -S -c800 -j32.  The point of those settings is
> to have many backends, but they're all alternating between work and
> sleep.  That creates a stream of poll() syscalls, and system time goes
> through the roof (all CPUs pegged, but it's ~half system).  Profiling
> the kernel with dtrace, I see the most common stack (by a long way) is
> in a poll-related lock, similar to a profile Rui sent me off-list from
> his production system.  Patched, there is very little system time and
> the TPS number goes from 539k to 781k.

If there are no further objections, I'm planning to commit this sooner
rather than later, so that it gets plenty of air time on developer and
build farm machines.  If problems are discovered on a particular
platform, there's a pretty good escape hatch: you can define
WAIT_USE_POLL, and if it turns out to be necessary, we could always do
something in src/template similar to what we do for semaphores.




Re: making the backend's json parser work in frontend code

2020-01-28 Thread Mark Dilger
Thanks for your review.  I considered all of this, along with your review 
comments in another email, before sending v7 in response to that other email a 
few minutes ago.


> On Jan 28, 2020, at 7:17 AM, Robert Haas  wrote:
> 
> On Mon, Jan 27, 2020 at 3:05 PM Mark Dilger
>  wrote:
>> I’m attaching a new patch set with these three changes including Mahendra’s 
>> patch posted elsewhere on this thread.
>> 
>> Since you’ve committed your 0004 and 0005 patches, this v6 patch set is now 
>> based on a fresh copy of master.
> 
> OK, so I think this is getting close.
> 
> What is now 0001 manages to have four (4) conditionals on FRONTEND at
> the top of the file. This seems like at least one too many. 

You are referencing this section, copied here from the patch:

> #ifndef FRONTEND
> #include "postgres.h"
> #else
> #include "postgres_fe.h"
> #endif
> 
> #include "common/jsonapi.h"
> 
> #ifdef FRONTEND
> #include "common/logging.h"
> #endif
> 
> #include "mb/pg_wchar.h"
> 
> #ifndef FRONTEND
> #include "miscadmin.h"
> #endif

I merged these a bit.  See v7-0001 for details.
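
For illustration, one way the four conditionals can collapse to one (a sketch only; the actual arrangement in v7-0001 may differ, and this glosses over the project's alphabetized include ordering within each branch):

```c
#ifndef FRONTEND
#include "postgres.h"
#include "miscadmin.h"
#else
#include "postgres_fe.h"
#include "common/logging.h"
#endif

#include "common/jsonapi.h"
#include "mb/pg_wchar.h"
```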

> Also, the preprocessor police are on their way to your house now to
> arrest you for that first one. You need to write it like this:
> 
> #define json_log_and_abort(...) \
>do { pg_log_fatal(__VA_ARGS__); exit(1); } while (0)

Yes, right, I had done that and somehow didn’t get it into the patch.  I’ll 
have coffee and donuts waiting.

> {
> - JsonLexContext *lex = palloc0(sizeof(JsonLexContext));
> + JsonLexContext *lex;
> +
> +#ifndef FRONTEND
> + lex = palloc0(sizeof(JsonLexContext));
> +#else
> + lex = (JsonLexContext*) malloc(sizeof(JsonLexContext));
> + memset(lex, 0, sizeof(JsonLexContext));
> +#endif
> 
> Instead of this, how making no change at all here?

Yes, good point.  I had split that into frontend vs backend because I was using 
palloc0fast for the backend, which seems to me the preferred function when the 
size is compile-time known, like it is here, and there is no palloc0fast in 
fe_memutils.h for frontend use.  I then changed back to palloc0 when I noticed 
that pretty much nowhere else similar to this in the project uses palloc0fast.  
I neglected to change back completely, which left what you are quoting.

Out of curiosity, why is palloc0fast not used in more places?

> - default:
> - elog(ERROR, "unexpected json parse state: %d", ctx);
>  }
> +
> + /* Not reached */
> + json_log_and_abort("unexpected json parse state: %d", ctx);
> 
> This, too, seems unnecessary.

This was in response to Mahendra’s report of a compiler warning, which I didn’t 
get on my platform. The code in master changed a bit since v6 was written, so 
v7 just goes with how the newer committed code does this.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Removing pg_pltemplate and creating "trustable" extensions

2020-01-28 Thread Tom Lane
Stephen Frost  writes:
> On Tue, Jan 28, 2020 at 16:17 Tom Lane  wrote:
>> On the other hand, there's the point that lots of people have probably
>> given out schema-CREATE privilege to users whom they wouldn't necessarily
>> wish to trust with INSTALL privilege.  Schema-CREATE is a pretty harmless
>> privilege, INSTALL much less so.

> CREATE doesn’t just control the ability to create schemas these days- it
> was extended to cover publications also not that long ago.

Oh really ... hm, that does make it a much bigger deal than I was
thinking.  Given that, I don't think there's any huge objection to
attaching this to CREATE, at least till we get around to a more
significant redesign.

regards, tom lane




Re: making the backend's json parser work in frontend code

2020-01-28 Thread Mark Dilger


> On Jan 28, 2020, at 8:32 AM, Robert Haas  wrote:
> 
> On Mon, Jan 27, 2020 at 3:05 PM Mark Dilger
>  wrote:
>> Since you’ve committed your 0004 and 0005 patches, this v6 patch set is now 
>> based on a fresh copy of master.
> 
> I think the first question for 0005 is whether want this at all.
> Initially, you proposed NOT committing it, but then Andrew reviewed it
> as if it were for commit. I'm not sure whether he was actually saying
> that it ought to be committed, though, or whether he just missed your
> remarks on the topic. Nobody else has really taken a position. I'm not
> 100% convinced that it's necessary to include this, but I'm also not
> particularly opposed to it. It's a fairly small amount of code, which
> is nice, and perhaps useful as a demonstration of how to use the JSON
> parser in a frontend application, which someone also might find nice.

Once Andrew reviewed it, I started thinking about it as something that might 
get committed.  In that context, I think there should be a lot more tests in 
this new src/test/bin directory for other common code, but adding those as part 
of this patch would just confuse it.

In addition to adding frontend tests for code already in src/common, the 
conversation in another thread about adding frontend versions of elog and 
ereport seem like candidates for tests in this location.  Sure, you can add an 
elog into a real frontend tool, such as pg_ctl, and update the tests for that 
program to expect that elog’s output, but what if you just want to exhaustively 
test the elog infrastructure in the frontend spanning multiple locales, 
encodings, whatever?  You’ve also recently mentioned the possibility of having 
memory contexts in frontend code.  Testing those seems like a good fit, too.

I decided to leave this in the next version of the patch set, v7.  v6 had three 
files, the second being something that already got committed in a different 
form, so this is now in v7-0002 whereas it had been in v6-0003.  v6-0002 has no 
equivalent in v7.

> Anyone else want to express an opinion?
> 
> Meanwhile, here is a round of nitp^H^H^H^Hreview:
> 
> -# installcheck and install should not recurse into the subdirectory 
> "modules".
> +# installcheck and install should not recurse into the subdirectory "modules"
> +# nor "bin".
> 
> I would probably have just changed this to:
> 
> # installcheck and install should not recurse into "modules" or "bin"
> 
> The details are arguable, but you definitely shouldn't say "the
> subdirectory" and then list two of them.

I read that as “nor [the subdirectory] bin” with the [the subdirectory] portion 
elided, and it doesn’t sound anomalous to me, but your formulation is more 
compact.  I have used it in v7 of the patch set.  Thanks.

> 
> +This directory contains a set of programs that exercise functionality 
> declared
> +in src/include/common and defined in src/common.  The purpose of these 
> programs
> +is to verify that code intended to work both from frontend and backend code 
> do
> +indeed work when compiled and used in frontend code.  The structure of this
> +directory makes no attempt to test that such code works in the backend, as 
> the
> +backend has its own tests already, and presumably those tests sufficiently
> +exercide the code as used by it.
> 
> "exercide" is not spelled correctly, but I also disagree with giving
> the directory so narrow a charter. I think you should just say
> something like:
> 
> This directory contains programs that are built and executed for
> testing purposes,
> but never installed. It may be used, for example, to test that code in
> src/common
> works in frontend environments.

Your formulation sounds fine, and I’ve used it in v7.

> +# There doesn't seem to be any easy way to get TestLib to use the binaries 
> from
> +# our directory, so we hack up a path to our binary and run that
> directly.  This
> +# seems brittle enough that some other solution should be found, if possible.
> +
> +my $test_json = join('/', $ENV{TESTDIR}, 'test_json');
> 
> I don't know what the right thing to do here is. Perhaps someone more
> familiar with TAP testing can comment.

Yeah, I was hoping that might get a comment from Andrew.  I think if it works 
as-is on windows, we could just use it this way until it causes a problem on 
some platform or other.  It’s not a runtime issue, being only a build-time 
test, and only then when tap tests are enabled *and* running check-world, so 
nobody should really be adversely affected.  I’ll likely get around to testing 
this on Windows, but I don’t have any Windows environments set up yet, as that 
is still on my todo list.

> + set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_test_json"));
> 
> Do we need this? I guess we're not likely to bother with translations
> for a test program.

Removed.

> + /*
> + * Make stdout unbuffered to match stderr; and ensure stderr is unbuffered
> + * too, which it should already be everywhere except sometimes in 

Re: Allow to_date() and to_timestamp() to accept localized names

2020-01-28 Thread Juan José Santamaría Flecha
On Tue, Jan 28, 2020 at 5:21 PM Alvaro Herrera 
wrote:

> On 2020-Jan-28, Peter Eisentraut wrote:
>
> > On 2020-01-28 04:05, Mark Dilger wrote:
> > > German uses both Sonnabend and Samstag for Saturday, so don’t you have
> to compare to a list of values anyway?
> >
> > Yeah, good point.  If it doesn't accept both "Sonnabend" and "Samstag",
> then
> > it's not really usable.
>
> The string "Sonnabend" never appears in the glibc sources, so that will
> certainly not work.  I vote not to care about that, but of course my
> language is not one that has alternate weekday/month names.  I guess if
> we're intent on recognizing alternate names, we'll have to build our own
> list of them :-(
>
> I don't have the ICU sources here to check the status there.
>
>
"Sonnabend" is not available in ICU either.

What is available are both genitive and nominative forms for months, as
reported up thread by Peter. See formats "M" and "L" in:

http://userguide.icu-project.org/formatparse/datetime

Regards,

Juan José Santamaría Flecha


Re: Removing pg_pltemplate and creating "trustable" extensions

2020-01-28 Thread Stephen Frost
Greetings,

On Tue, Jan 28, 2020 at 16:17 Tom Lane  wrote:

> Robert Haas  writes:
> > On Tue, Jan 28, 2020 at 3:52 PM Tom Lane  wrote:
> >> I continue to think that allowing DB owners to decide this is, if not
> >> fundamentally the wrong thing, at least not a feature that anybody has
> >> asked for in the past.  The feature *I* want in this area is for the
> >> superuser to be able to decide who's got install privilege.  Making
> >> it a DB-level privilege doesn't serve that goal, more the opposite.
>
> > I agree.
>
> >> Still, if we can compromise by making this part of DB "CREATE" privilege
> >> for the time being, I'm willing to take that compromise.  It's certainly
> >> better than failing to get rid of pg_pltemplate.
>
> > Doesn't that have exactly the issue you describe above?
> > bob=> grant create on database bob to fred;
> > GRANT
>
> Either of them do, in that a DB owner can always grant his whole role;
> "grant bob to fred" will give fred install privileges (in bob's DBs)
> regardless of which of these choices we adopt.  And that was true before
> (with respect to trusted PLs), too.  Attaching the ability to the CREATE
> bit would at least allow DB owners to be a bit more selective about how
> they give it out.


Right.

The reason I'm happier about doing this with CREATE than inventing
> a separate INSTALL bit is that once we do the latter, we're more or
> less bound to keep supporting that ability forever.  If we extend
> the definition of CREATE in v13, and then narrow it again in some
> future release, that seems less likely to cause problems than taking
> away a named privilege bit would do.


I would like to segregate these privileges more than just “install” vs
“other stuff” in the future anyway, so mixing it with the existing CREATE
isn’t that big of a deal in my view.

On the other hand, there's the point that lots of people have probably
> given out schema-CREATE privilege to users whom they wouldn't necessarily
> wish to trust with INSTALL privilege.  Schema-CREATE is a pretty harmless
> privilege, INSTALL much less so.


CREATE doesn’t just control the ability to create schemas these days- it
was extended to cover publications also not that long ago.  We never said
we wouldn’t extend CREATE to cover more objects and we’ve already extended
it recently, without anyone being up in arms about it that I can recall, so
this doesn’t feel like a huge issue or concern to me.  Note that, again,
these are trusted extensions which means, in their regular install, they
shouldn’t be able to “break outside the box” any more than a plpgsql
function is able to.

Thanks,

Stephen


RE: Delaying/avoiding BTreeTupleGetNAtts() call within _bt_compare()

2020-01-28 Thread Floris Van Nee

> 
> I could do some tests with the patch on some larger machines. What exact
> tests do you propose? Are there some specific postgresql.conf settings and
> pgbench initialization you recommend for this? And was the test above just
> running 'pgbench -S' select-only with specific -T, -j and -c parameters?
> 

With Andres' instructions I ran a couple of tests. With your patches I can 
reproduce a speedup of ~3% on single core tests reliably on a dual-socket 
36-core machine for the pgbench select-only test case. When using the full 
scale test my results are way too noisy even for large runs unfortunately. I 
also tried some other queries (for example select's that return 10 or 100 rows 
instead of just 1), but can't see much of a speed-up there either, although it 
also doesn't hurt.

So I guess the most noticeable one is the select-only benchmark for 1 core:


transaction type: 
scaling factor: 300
query mode: prepared
number of clients: 1
number of threads: 1
duration: 600 s
number of transactions actually processed: 30255419
latency average = 0.020 ms
latency stddev = 0.001 ms
tps = 50425.693234 (including connections establishing)
tps = 50425.841532 (excluding connections establishing)


transaction type: 
scaling factor: 300
query mode: prepared
number of clients: 1
number of threads: 1
duration: 600 s
number of transactions actually processed: 31363398
latency average = 0.019 ms
latency stddev = 0.001 ms
tps = 52272.326597 (including connections establishing)
tps = 52272.476380 (excluding connections establishing)

This is the one with 40 clients, 40 threads. Not really an improvement, and 
still quite noisy.

transaction type: 
scaling factor: 300
query mode: prepared
number of clients: 40
number of threads: 40
duration: 600 s
number of transactions actually processed: 876846915
latency average = 0.027 ms
latency stddev = 0.015 ms
tps = 1461407.539610 (including connections establishing)
tps = 1461422.084486 (excluding connections establishing)


transaction type: 
scaling factor: 300
query mode: prepared
number of clients: 40
number of threads: 40
duration: 600 s
number of transactions actually processed: 872633979
latency average = 0.027 ms
latency stddev = 0.038 ms
tps = 1454387.326179 (including connections establishing)
tps = 1454396.879195 (excluding connections establishing)

For tests that don't use the full machine (eg. 10 clients, 10 threads) I see 
speed-ups as well, but not as high as the single-core run. It seems there are 
other bottlenecks (on the machine) coming into play.

-Floris



Re: [PATCH] Windows port, fix some resources leaks

2020-01-28 Thread Ranier Vilela
Em ter., 28 de jan. de 2020 às 18:06, Alvaro Herrera <
alvhe...@2ndquadrant.com> escreveu:

> On 2020-Jan-28, Robert Haas wrote:
>
> > On Fri, Jan 24, 2020 at 2:13 AM Michael Paquier 
> wrote:
> > > No, that's not right.  I think that it is possible to loop over
> > > ShmemProtectiveRegion in some cases.  And actually, your patch is dead
> > > wrong because this is some code called by the postmaster and it cannot
> > > use FATAL.
> >
> > Uh, really? I am not aware of such a rule.
>
> I don't think we have ever expressed it as such, but certainly we prefer
> postmaster to be super robust ... we'd rather live with a leak of a few
> hundred bytes than have it die and take the whole database service down
> for what's essentially a fringe bug that has bothered no one in a decade
> and a half.
>
Maybe it didn't bother anyone because the Windows port is much less used.
Anyway, I believe that freeing the memory before returning false will not
bring down the service, so I am changing the patch to use LOG instead of
FATAL. The primary error in the patch was using FATAL.

regards,
Ranier Vilela


Re: Removing pg_pltemplate and creating "trustable" extensions

2020-01-28 Thread Tom Lane
Robert Haas  writes:
> On Tue, Jan 28, 2020 at 3:52 PM Tom Lane  wrote:
>> I continue to think that allowing DB owners to decide this is, if not
>> fundamentally the wrong thing, at least not a feature that anybody has
>> asked for in the past.  The feature *I* want in this area is for the
>> superuser to be able to decide who's got install privilege.  Making
>> it a DB-level privilege doesn't serve that goal, more the opposite.

> I agree.

>> Still, if we can compromise by making this part of DB "CREATE" privilege
>> for the time being, I'm willing to take that compromise.  It's certainly
>> better than failing to get rid of pg_pltemplate.

> Doesn't that have exactly the issue you describe above?
> bob=> grant create on database bob to fred;
> GRANT

Either of them do, in that a DB owner can always grant his whole role;
"grant bob to fred" will give fred install privileges (in bob's DBs)
regardless of which of these choices we adopt.  And that was true before
(with respect to trusted PLs), too.  Attaching the ability to the CREATE
bit would at least allow DB owners to be a bit more selective about how
they give it out.

The reason I'm happier about doing this with CREATE than inventing
a separate INSTALL bit is that once we do the latter, we're more or
less bound to keep supporting that ability forever.  If we extend
the definition of CREATE in v13, and then narrow it again in some
future release, that seems less likely to cause problems than taking
away a named privilege bit would do.

On the other hand, there's the point that lots of people have probably
given out schema-CREATE privilege to users whom they wouldn't necessarily
wish to trust with INSTALL privilege.  Schema-CREATE is a pretty harmless
privilege, INSTALL much less so.

I do like your point about how maybe we shouldn't change the status quo
without more consensus than we've got ... but in the end I just want
to get this done and move on.

regards, tom lane




Re: [PATCH] Windows port, fix some resources leaks

2020-01-28 Thread Robert Haas
On Tue, Jan 28, 2020 at 4:06 PM Alvaro Herrera  wrote:
> I don't think we have ever expressed it as such, but certainly we prefer
> postmaster to be super robust ... we'd rather live with a leak of a few
> hundred bytes than have it die and take the whole database service down
> for what's essentially a fringe bug that has bothered no one in a decade
> and a half.

Well, yeah. I mean, I'm not saying it's a good idea in this instance
to FATAL here. I'm just saying that I don't think there is a general
rule that code which does FATAL in the postmaster is automatically
wrong, which is what I took Michael to be suggesting.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [PATCH] Windows port, fix some resources leaks

2020-01-28 Thread Ranier Vilela
Em ter., 28 de jan. de 2020 às 17:54, Robert Haas 
escreveu:

> On Fri, Jan 24, 2020 at 2:13 AM Michael Paquier 
> wrote:
> > No, that's not right.  I think that it is possible to loop over
> > ShmemProtectiveRegion in some cases.  And actually, your patch is dead
> > wrong because this is some code called by the postmaster and it cannot
> > use FATAL.
>
> Uh, really? I am not aware of such a rule.
>
Well, postmaster.c has a structure that makes use of the variable
ShmemProtectiveRegion; I think it is related to the function in
src/backend/port/win32_shmem.c.
On line 575 of src/backend/port/win32_shmem.c, there is a comment
that says not to use FATAL:
"Don't use FATAL since we're running in the postmaster."

regards,
Ranier Vilela


Re: [PATCH] Windows port, fix some resources leaks

2020-01-28 Thread Alvaro Herrera
On 2020-Jan-28, Robert Haas wrote:

> On Fri, Jan 24, 2020 at 2:13 AM Michael Paquier  wrote:
> > No, that's not right.  I think that it is possible to loop over
> > ShmemProtectiveRegion in some cases.  And actually, your patch is dead
> > wrong because this is some code called by the postmaster and it cannot
> > use FATAL.
> 
> Uh, really? I am not aware of such a rule.

I don't think we have ever expressed it as such, but certainly we prefer
postmaster to be super robust ... we'd rather live with a leak of a few
hundred bytes than have it die and take the whole database service down
for what's essentially a fringe bug that has bothered no one in a decade
and a half.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: VALUES ROW(...)

2020-01-28 Thread Tom Lane
Markus Winand  writes:
> PostgreSQL does not accept the following standard conforming statement:
>VALUES ROW(1,2), ROW(3,4)
> There is a comment about this in the source code [0]:

> /*
> * We should allow ROW '(' expr_list ')' too, but that seems to require
> * making VALUES a fully reserved word, which will probably break more apps
> * than allowing the noise-word is worth.
> */

> The latest release of MySQL (8.0.19) introduced table value constructors 
> (VALUES), but **requires** the keyword ROW [1]. Of the 9 systems I tested, 
> only MySQL and H2 accept ROW in VALUES [2].

> Is it worth re-visiting this decision in order to improve standard 
> conformance and MySQL (and H2) compability?

I'd still say that making VALUES fully reserved is a bridge too far:
you will make many more people unhappy from that than you make happy
because we can now spell this more than one way.  (I'm somewhat
guessing that some people are using "values" as a column or table name,
but I doubt that that's an unreasonable guess.)

If you want to see some movement on this, look into whether we can
find a way to allow it without that.  I don't recall exactly what
the stumbling block is there, but maybe there's a way around it.

regards, tom lane




Re: Making psql error out on output failures

2020-01-28 Thread David Zhang



On 2020-01-28 4:14 a.m., Daniel Verite wrote:

David Zhang wrote:


The error "No space left on device" can be logged by fprintf() which is
redefined as pg_fprintf() when print_aligned_text() is called

Are you sure? I don't find that redefinition. Besides
print_aligned_text() also calls putc and puts.
Yes, below is the gdb output from the first time psql detects the 
error "No space left on device". Test case: "postgres=# select 
repeat('111', 100) \g /mnt/ramdisk/file"

bt
#0  flushbuffer (target=0x7ffd6a709ad0) at snprintf.c:313
#1  0x55ba90b88b6c in dopr_outchmulti (c=32, slen=447325, 
target=0x7ffd6a709ad0) at snprintf.c:1435
#2  0x55ba90b88d5e in trailing_pad (padlen=-147, 
target=0x7ffd6a709ad0) at snprintf.c:1514
#3  0x55ba90b87f36 in fmtstr (value=0x55ba90bb4f9a "", leftjust=1, 
minlen=147, maxwidth=0, pointflag=0, target=0x7ffd6a709ad0) at 
snprintf.c:994
#4  0x55ba90b873c6 in dopr (target=0x7ffd6a709ad0, 
format=0x55ba90bb5083 "%s%-*s", args=0x7ffd6a709f40) at snprintf.c:675
#5  0x55ba90b865b5 in pg_vfprintf (stream=0x55ba910cf240, 
fmt=0x55ba90bb507f "%-*s%s%-*s", args=0x7ffd6a709f40) at snprintf.c:257
#6  0x55ba90b866aa in pg_fprintf (stream=0x55ba910cf240, 
fmt=0x55ba90bb507f "%-*s%s%-*s") at snprintf.c:270
#7  0x55ba90b75d22 in print_aligned_text (cont=0x7ffd6a70a210, 
fout=0x55ba910cf240, is_pager=false) at print.c:937
#8  0x55ba90b7ba19 in printTable (cont=0x7ffd6a70a210, 
fout=0x55ba910cf240, is_pager=false, flog=0x0) at print.c:3378
#9  0x55ba90b7bedc in printQuery (result=0x55ba910f9860, 
opt=0x7ffd6a70a2c0, fout=0x55ba910cf240, is_pager=false, flog=0x0) at 
print.c:3496
#10 0x55ba90b39560 in PrintQueryTuples (results=0x55ba910f9860) at 
common.c:874
#11 0x55ba90b39d55 in PrintQueryResults (results=0x55ba910f9860) at 
common.c:1262
#12 0x55ba90b3a343 in SendQuery (query=0x55ba910f2590 "select 
repeat('111', 100) ") at common.c:1446
#13 0x55ba90b51f36 in MainLoop (source=0x7f1623a9ea00 
<_IO_2_1_stdin_>) at mainloop.c:505
#14 0x55ba90b5c4da in main (argc=3, argv=0x7ffd6a70a738) at 
startup.c:445

(gdb) l
308 size_t  written;
309
310 written = fwrite(target->bufstart, 1, nc, target->stream);
311 target->nchars += written;
312 if (written != nc)
313 target->failed = true;
314 }
315 target->bufptr = target->bufstart;
316 }
317
(gdb) p written
$2 = 1023
(gdb) p nc
$3 = 1024
(gdb) p strerror(errno)
$4 = 0x7f16238672c9 "No space left on device"
(gdb)



Will that be a better solution if redefine fputs similar to fprintf and
log the exact error when first time discovered?

I think we can assume it's not acceptable to have pg_fprintf()
print anything to stderr, or to set a flag through a global
variable. So even if using pg_fprintf() for all the printing, no matter
how (through #defines or otherwise), there's still the problem that the
error needs to be propagated up the call chain to be processed by psql.


The concern is that if we can't provide more accurate
information to the end user when an error happens, the
end user might get confused.

It's better to have a more informative message, but I'm for
not having the best be the enemy of the good.
The first concern is that at the moment, we have no error
report at all in the case when the output can be opened
but the error happens later along the writes.


Best regards,

--
David

Software Engineer
Highgo Software Inc. (Canada)
www.highgo.ca





Re: [PoC] Non-volatile WAL buffer

2020-01-28 Thread Robert Haas
On Tue, Jan 28, 2020 at 3:28 AM Takashi Menjo
 wrote:
> I think our concerns are roughly classified into two:
>
>  (1) Performance
>  (2) Consistency
>
> And your "different concern" is rather into (2), I think.

Actually, I think it was mostly a performance concern (writes
triggering lots of reading) but there might be a consistency issue as
well.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Removing pg_pltemplate and creating "trustable" extensions

2020-01-28 Thread Robert Haas
On Tue, Jan 28, 2020 at 3:52 PM Tom Lane  wrote:
> I continue to think that allowing DB owners to decide this is, if not
> fundamentally the wrong thing, at least not a feature that anybody has
> asked for in the past.  The feature *I* want in this area is for the
> superuser to be able to decide who's got install privilege.  Making
> it a DB-level privilege doesn't serve that goal, more the opposite.

I agree.

> Still, if we can compromise by making this part of DB "CREATE" privilege
> for the time being, I'm willing to take that compromise.  It's certainly
> better than failing to get rid of pg_pltemplate.

Doesn't that have exactly the issue you describe above?

bob=> grant create on database bob to fred;
GRANT

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [PATCH] Windows port, fix some resources leaks

2020-01-28 Thread Robert Haas
On Fri, Jan 24, 2020 at 2:13 AM Michael Paquier  wrote:
> No, that's not right.  I think that it is possible to loop over
> ShmemProtectiveRegion in some cases.  And actually, your patch is dead
> wrong because this is some code called by the postmaster and it cannot
> use FATAL.

Uh, really? I am not aware of such a rule.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Removing pg_pltemplate and creating "trustable" extensions

2020-01-28 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> The minimum committable patch seems like it would just grant the
>> "can install trusted extensions" ability to DB owners, full stop.

> If you're alright with making it something a DB owner can do, what is
> the issue with making it part of the CREATE right on the database?

Um, well, people were complaining that it should be a distinct privilege,
which I for one wasn't sold on.

I continue to think that allowing DB owners to decide this is, if not
fundamentally the wrong thing, at least not a feature that anybody has
asked for in the past.  The feature *I* want in this area is for the
superuser to be able to decide who's got install privilege.  Making
it a DB-level privilege doesn't serve that goal, more the opposite.

Still, if we can compromise by making this part of DB "CREATE" privilege
for the time being, I'm willing to take that compromise.  It's certainly
better than failing to get rid of pg_pltemplate.

regards, tom lane




Re: BufFileRead() error signalling

2020-01-28 Thread Robert Haas
On Mon, Jan 27, 2020 at 9:03 PM Michael Paquier  wrote:
> That's actually not the best fit, because this does not take care of
> the pluralization of the second message if you have only 1 byte to
> read ;)

But ... if you have only one byte to read, you cannot have a short read.

> A second point to take into account is that the unification of error
> messages makes for less translation work, which is always welcome.
> Those points have been discussed on this thread:
> https://www.postgresql.org/message-id/2018052522.gb1...@paquier.xyz

I quickly reread that thread and I don't see that there's any firm
consensus there in favor of "read %d of %zu" over "read only %d of %zu
bytes". Now, if most people prefer the former, so be it, but I don't
think that's clear from that thread.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Removing pg_pltemplate and creating "trustable" extensions

2020-01-28 Thread Robert Haas
On Tue, Jan 28, 2020 at 3:29 PM Stephen Frost  wrote:
> I get that you want to push forward with making this part of the DB
> owner, and I said up-thread that I'd be able to live with that, but I
> still don't understand what the argument is against making it part of
> CREATE instead.

It's a change from the status quo. If we're going to change how it works,
we should try to agree on how it ought to work. Tom's proposal dodges
that by leaving things exactly as they are, deferring any actual
modifications to who can do what to a future patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: making the backend's json parser work in frontend code

2020-01-28 Thread Tom Lane
Robert Haas  writes:
> On Tue, Jan 28, 2020 at 2:29 PM Tom Lane  wrote:
>> Well, yeah, that's exactly my point.  But in my book, "refuse to do
>> anything" should be "elog(ERROR)", not "invoke undefined behavior".
>> An actual abort() call might be all right here, in that at least
>> we'd know what would happen and we could debug it once we got hold
>> of a stack trace.  But pg_unreachable() is not that.  Basically, if
>> there's *any* circumstance, broken code or not, where control could
>> reach a pg_unreachable() call, you did it wrong.

> I don't really agree. I think such defensive coding is more than
> justified when the input is coming from a file on disk or some other
> external source where it might have been corrupted.

There's certainly an argument to be made that an elog() call is an
unjustified expenditure of code space and we should just do an abort()
(but still not pg_unreachable(), IMO).  However, what I'm really on about
here is that CreateDestReceiver is out of step with nigh-universal project
practice.  If it's not worth having an elog() here, then there are
literally hundreds of other elog() calls that we ought to be nuking on
the same grounds.  I don't really want to run around and have a bunch
of debates about exactly which extant elog() calls are effectively
unreachable and which are not.  That's not always very clear, and even
if it is clear today it might not be tomorrow.  The minute somebody calls
CreateDestReceiver with a non-constant argument, the issue becomes open
again.  And I'd rather not have to stop and think hard about the tradeoff
between elog() and abort() when I write such functions in future.

So basically, my problem with this is that I don't think it's a coding
style we want to encourage, because it's too fragile.  And there's no
good argument (like performance) to leave it that way.  I quite agree
with you that there are places like tuple deforming where we're taking
more chances than I'd like --- but there is a noticeable performance
cost to being paranoid there.

regards, tom lane




Re: Allow to_date() and to_timestamp() to accept localized names

2020-01-28 Thread Peter Eisentraut

On 2020-01-28 16:47, Juan José Santamaría Flecha wrote:
> This patch targets to do something symmetrical to to_char(), which will
> just return a single value.


I didn't fully realize while reading this thread that to_char() already 
supports localized output and this patch indeed just wants to do the 
opposite.


So I'm withdrawing my concerns with respect to this patch.  As long as 
it can do a roundtrip conversion with to_char(), it's fine.


It's pretty clear that this interface cannot be useful for producing or 
parsing fully general localized dates.  But since it exists now (and 
it's quasi SQL standard now), we might as well fill in this gap.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




VALUES ROW(...)

2020-01-28 Thread Markus Winand
Hi!

PostgreSQL does not accept the following standard conforming statement:

   VALUES ROW(1,2), ROW(3,4)

There is a comment about this in the source code [0]:

/*
* We should allow ROW '(' expr_list ')' too, but that seems to require
* making VALUES a fully reserved word, which will probably break more apps
* than allowing the noise-word is worth.
*/

The latest release of MySQL (8.0.19) introduced table value constructors 
(VALUES), but **requires** the keyword ROW [1]. Of the 9 systems I tested, only 
MySQL and H2 accept ROW in VALUES [2].

Is it worth revisiting this decision in order to improve standard conformance 
and MySQL (and H2) compatibility?

-markus

Refs:
[0] src/backend/parser/gram.y - 
https://github.com/postgres/postgres/blob/30012a04a6c8127397a8ab71e160d9c7e7fbe874/src/backend/parser/gram.y#L11893
[1] https://dev.mysql.com/doc/refman/8.0/en/values.html
[2] Not supporting it: Db2 11.5, MariaDB 10.4, Oracle 19c, SQL Server 2019, 
SQLite 3.30.0, Derby 10.15.1.3.
Some results published here: 
https://modern-sql.com/feature/values#compatibility





Re: Removing pg_pltemplate and creating "trustable" extensions

2020-01-28 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> >> The patch as I'm proposing it has nothing to do with "CREATE" rights.
> >> You're attacking something different from what I actually want to do.
> 
> > Yes, as an aside, I'm argueing that we should split up the general
> > CREATE privileges, but I also said that's not required for this.
> 
> So how do we move this forward?  I really don't want this patch to be
> blocked by what's fundamentally a side point about permissions.
> 
> The minimum committable patch seems like it would just grant the
> "can install trusted extensions" ability to DB owners, full stop.

If you're alright with making it something a DB owner can do, what is
the issue with making it part of the CREATE right on the database?  That
doesn't require any additional permission bits, in a default setup
doesn't change who is able to create extensions (in either your proposal
or mine, it's the DB owner), and in either proposal means that people
who couldn't create extensions with PG12 will be able to create them in
PG13.

We've added other things to the DB-level CREATE rights rather recently
too, so it's not like we've historically avoided that.

The argument you presented previously against that idea was because it
would mean the DB owner would still be able to exercise that right,
which is what you're now proposing anyway and which I was always
advocating for and wasn't trying to say wouldn't be the case with that
approach.

So I'm at a loss for what the actual argument is against making it part
of DB-level CREATE.

> This is small, it's exactly the same as our historical behavior for
> trusted PLs, and it's upward compatible with either of two possible
> future extensions:
> 
> * adding a predefined role (which'd let superusers give out the install
> privilege, in addition to DB owners having it)

Uh, just to be clear, even with your approach, a DB owner could 'GRANT'
the necessary right for another user to install extensions by simply
GRANT'ing their own role to that user.  Obviously, that conveys other
privileges with it, but we have that problem at any level as long as we
constrain ourselves to a single set of 32 bits for representing
privileges.  I see it as being manifestly better to lump it in with the
DB-level CREATE privilege though.

> * converting DB owners' hard-wired privilege to a grantable privilege
> (which'd let DB owners give out the install privilege, if the privilege
> is attached to the DBs themselves; but maybe there's some other way?)

In either of these proposals, we could split up the bounded-together
privileges down the road, and, sure, there might be more than one way to
do that, but I really don't want to go down a road where every privilege
ends up being split up into a separate default-role (or predefined role
or whatever we want to call those things today).

> Given the lack of consensus about either of those being what we want,
> it doesn't seem like we're going to come to an agreement in a
> reasonable timeframe on a patch that includes either.  So I'd like
> to get this done and move on to the next problem (ie, what is it
> we're actually going to do about the python 2/3 mess).

I get that you want to push forward with making this part of the DB
owner, and I said up-thread that I'd be able to live with that, but I
still don't understand what the argument is against making it part of
CREATE instead.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: making the backend's json parser work in frontend code

2020-01-28 Thread Robert Haas
On Tue, Jan 28, 2020 at 2:29 PM Tom Lane  wrote:
> Well, yeah, that's exactly my point.  But in my book, "refuse to do
> anything" should be "elog(ERROR)", not "invoke undefined behavior".
> An actual abort() call might be all right here, in that at least
> we'd know what would happen and we could debug it once we got hold
> of a stack trace.  But pg_unreachable() is not that.  Basically, if
> there's *any* circumstance, broken code or not, where control could
> reach a pg_unreachable() call, you did it wrong.

I don't really agree. I think such defensive coding is more than
justified when the input is coming from a file on disk or some other
external source where it might have been corrupted. For instance, I
think the fact that the code which deforms heap tuples will cheerfully
sail off the end of the buffer or seg fault if the tuple descriptor
doesn't match the tuple is a seriously bad thing. It results in actual
production crashes that could be avoided with more defensive coding.
Admittedly, there would likely be a performance cost, which might not
be a reason to do it, but if that cost is small I would probably vote
for paying it, because this is something that actually happens to
users on a pretty regular basis.

In the case at hand, though, there are no constants of type
CommandDest that come from any place other than a constant in the
program text, and it seems unlikely that this will ever be different
in the future. So, how could we ever end up with a value that's not in
the enum? I guess the program text itself could be corrupted, but we
cannot defend against that.

Mind you, I'm not going to put up a huge stink if you're bound and
determined to go change this. I prefer it the way that it is, and I
think that preference is well-justified by facts on the ground, but I
don't think it's worth fighting about.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [Patch]: Documentation of ALTER TABLE re column type changes on binary-coercible fields

2020-01-28 Thread Mark Dilger



> On Jan 28, 2020, at 10:55 AM, Mike Lissner  
> wrote:
> 
> Hi all, I didn't get any replies to this. Is this the right way to send in a 
> patch to the docs?
> 

Yes, your patch has been received, thanks.  I don’t know if anybody is 
reviewing it, but typically you don’t hear back on a patch until somebody has 
reviewed it and has an opinion to share with you.

The feeling, “Hey, I submitted something, and for all I know it got lost in the 
mail” is perhaps common for first time submitters.  We’d like to see you around 
the project again, so please don’t take the silence as a cold shoulder.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Allow to_date() and to_timestamp() to accept localized names

2020-01-28 Thread Tom Lane
Mark Dilger  writes:
>> On Jan 28, 2020, at 9:30 AM, Tom Lane  wrote:
>> A brute-force answer, if there are few enough cases, is to recognize
>> them regardless of the specific value of LC_TIME.  Do we think
>> anybody would notice or care if to_date('Sonnabend', 'TMDay') works
>> even when in a non-German locale?

> I think this only becomes a problem if there are weekday or month name 
> collisions between languages where they have different meanings.  As an 
> absurd hypothetical, if “Sunday” in Tagalog means what “Tuesday” means in 
> English, then you’ve got a problem.

> This does happen for month abbreviations.  “Mar” is short for “March” and 
> variations in a number of languages, but short for “November” in Finnish.

> For day abbreviations, “Su” collides between fi_FI and hr_HR, and “tor” 
> collides between sl_SL and no_NO.

But none of those cases are alternative names, so we wouldn't have
entries for them in this hypothetical list.  They'd only be recognized
when strftime() returns them.  I suspect also that the abbreviated
names have few if any alternatives, so we may only need this whole hack
for full names.

A possible way of tightening things up without explicitly decoding the
locale name is to make the entries of the list be string pairs: "if
strftime() returned this, then also consider that".  That puts a bit
of a premium on knowing which names strftime considers primary, but
I think we'd have had to know that anyway.

regards, tom lane




Re: Complete data erasure

2020-01-28 Thread Stephen Frost
Greetings,

* asaba.takan...@fujitsu.com (asaba.takan...@fujitsu.com) wrote:
> From: Stephen Frost 
> > * asaba.takan...@fujitsu.com (asaba.takan...@fujitsu.com) wrote:
> > > This feature erases data area just before it is returned to the OS 
> > > (“erase”
> > means that overwrite data area to hide its contents here)
> > > because there is a risk that the data will be restored by attackers if it 
> > > is returned
> > to the OS without being overwritten.
> > > The erase timing is when DROP, VACUUM, TRUNCATE, etc. are executed.
> > 
> > Looking at this fresh, I wanted to point out that I think Tom's right-
> > we aren't going to be able to reasonbly support this kind of data
> > erasure on a simple DROP TABLE or TRUNCATE.
> > 
> > > I want users to be able to customize the erasure method for their security
> > policies.
> > 
> > There's also this- but I think what it means is that we'd probably have
> > a top-level command that basically is "ERASE TABLE blah;" or similar
> > which doesn't operate during transaction commit but instead marks the
> > table as "to be erased" and then perhaps "erasure in progress" and then
> > "fully erased" (or maybe just back to 'normal' at that point).  Making
> > those updates will require the command to perform its own transaction
> > management which is why it can't be in a transaction itself but also
> > means that the data erasure process doesn't need to be done during
> > commit.
> > 
> > > My idea is adding a new parameter erase_command to postgresql.conf.
> > 
> > Yeah, I don't think that's really a sensible option or even approach.
> 
> I think erase_command can also manage the state of a table.
> The exit status of a configured command shows it.( 0 is "fully erased" or 
> "normal", 1 is "erasure in progress") 
> erase_command is executed not during a transaction but when unlink() is 
> executed. 

I really don't see what the advantage of having this be configurable is.
In addition, an external command's actions wouldn't be put through the
WAL, meaning that replicas would have to be dealt with in some other way
beyond regular WAL, and that seems like it'd just be ugly.

> (for example, after a transaction that has done DROP TABLE)

We certainly can't run external commands during transaction COMMIT, so
this can't be part of a regular DROP TABLE.

> > > When erase_command is set, VACUUM does not truncate a file size to 
> > > non-zero
> > > because it's safer for users to return the entire file to the OS than to 
> > > return part
> > of it.
> > 
> > There was discussion elsewhere about preventing VACUUM from doing a
> > truncate on a file because of the lock it requires and problems with
> > replicas..  I'm not sure where that ended up, but, in general, I don't
> > think this feature and VACUUM should really have anything to do with
> > each other except for the possible case that a user might be told to
> > configure their system to not allow VACUUM to truncate tables if they
> > care about this case.
> 
> I think that if ftruncate(fd, 0) is executed in VACUUM, 
> data area allocated to a file is returned to the OS, so that area must be 
> overwritten.

As I mentioned, there was already talk of making that disallowed in
VACUUM, so that would just need to be configured (and be able to be
configured) when running in an environment which requires this.

> > As mentioned elsewhere, you do also have to consider that the sensitive
> > data will end up in the WAL and on replicas.  I don't believe that means
> > this feature is without use, but it means that users of this feature
> > will also need to understand and be able to address WAL and replicas
> > (along with backups and such too, of course).
> 
> I see.
> I can't think of it right away, but I will deal with it.

It's something that will need to be understood and dealt with.  A simple
answer might be "the user must also destroy all WAL and all backups in
an approved manner, and ensure all replicas have replayed all WAL or
been destroyed".

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: making the backend's json parser work in frontend code

2020-01-28 Thread Tom Lane
Robert Haas  writes:
> On Tue, Jan 28, 2020 at 1:32 PM Tom Lane  wrote:
>> BTW, now that I think about it, CreateDestReceiver is not up to project
>> standards anyway, in that it fails to provide reasonable behavior in
>> the case where what's passed is not a legal value of the enum.

> Well, I might be responsible for the CreateDestReceiver thing -- or I
> might not, I haven't checked -- but I do think that style is a bit
> cleaner and more elegant. I think it's VERY unlikely that anyone would
> ever manage to call it with something that's not a legal value of the
> enum, and if they do, I think the chances of surviving are basically
> nil, and frankly, I'd rather die. If you asked me where you want me to
> store my output and I tell you to store it in the sdklgjsdjgslkdg, you
> really should refuse to do anything at all, not just stick my output
> someplace-or-other and hope for the best.

Well, yeah, that's exactly my point.  But in my book, "refuse to do
anything" should be "elog(ERROR)", not "invoke undefined behavior".
An actual abort() call might be all right here, in that at least
we'd know what would happen and we could debug it once we got hold
of a stack trace.  But pg_unreachable() is not that.  Basically, if
there's *any* circumstance, broken code or not, where control could
reach a pg_unreachable() call, you did it wrong.

regards, tom lane




Re: Is custom MemoryContext prohibited?

2020-01-28 Thread Thomas Munro
On Tue, Jan 28, 2020 at 2:55 PM Kohei KaiGai  wrote:
> I noticed MemoryContextIsValid() called by various kinds of memory context
> routines checks its node-tag as follows:
>
> #define MemoryContextIsValid(context) \
> ((context) != NULL && \
>  (IsA((context), AllocSetContext) || \
>   IsA((context), SlabContext) || \
>   IsA((context), GenerationContext)))
>
> It allows only "known" memory context methods, even though the memory context
> mechanism enables to implement custom memory allocator by extensions.
> Here is a node tag nobody used: T_MemoryContext.
> It looks to me T_MemoryContext is a neutral naming for custom memory context,
> and here is no reason why memory context functions prevents custom methods.
>
>
> https://github.com/heterodb/pg-strom/blob/master/src/shmbuf.c#L1243
> I recently implemented a custom memory context for shared memory allocation
> with portable pointers. It shall be used for cache of pre-built gpu
> binary code and
> metadata cache of apache arrow files.
> However, the assertion check above requires extension to set a fake node-tag
> to avoid backend crash. Right now, it is harmless to set T_AllocSetContext, 
> but
> feel a bit bad.

FWIW the code in https://commitfest.postgresql.org/26/2325/ ran into
exactly the same problem while making nearly exactly the same kind of
thing (namely, a MemoryContext backed by space in the main shm area,
in this case reusing the dsa.c allocator).




Re: Allow to_date() and to_timestamp() to accept localized names

2020-01-28 Thread Mark Dilger



> On Jan 28, 2020, at 9:30 AM, Tom Lane  wrote:
> 
> A brute-force answer, if there are few enough cases, is to recognize
> them regardless of the specific value of LC_TIME.  Do we think
> anybody would notice or care if to_date('Sonnabend', 'TMDay') works
> even when in a non-German locale?

I think this only becomes a problem if there are weekday or month name 
collisions between languages where they have different meanings.  As an absurd 
hypothetical, if “Sunday” in Tagalog means what “Tuesday” means in English, 
then you’ve got a problem.

This does happen for month abbreviations.  “Mar” is short for “March” and 
variations in a number of languages, but short for “November” in Finnish.

For day abbreviations, “Su” collides between fi_FI and hr_HR, and “tor” 
collides between sl_SL and no_NO.

I don’t see any collisions with full month or full weekday names, but I’m also 
only looking at the 53 locales installed in /usr/share/locale/.*/LC_TIME on my 
mac.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: making the backend's json parser work in frontend code

2020-01-28 Thread Robert Haas
On Tue, Jan 28, 2020 at 1:32 PM Tom Lane  wrote:
> BTW, now that I think about it, CreateDestReceiver is not up to project
> standards anyway, in that it fails to provide reasonable behavior in
> the case where what's passed is not a legal value of the enum.
> What you'll get, if you're lucky, is a SIGABRT crash with no
> indication of the cause --- or if you're not lucky, some very
> hard-to-debug crash elsewhere as a result of the function returning
> a garbage pointer.  So independently of whether the current coding
> suppresses compiler warnings reliably, I think we ought to replace it
> with elog()-and-return-NULL.  Admittedly, that's wasting a few bytes
> on a case that should never happen ... but we haven't ever hesitated
> to do that elsewhere, if it'd make the problem more diagnosable.

Well, I might be responsible for the CreateDestReceiver thing -- or I
might not, I haven't checked -- but I do think that style is a bit
cleaner and more elegant. I think it's VERY unlikely that anyone would
ever manage to call it with something that's not a legal value of the
enum, and if they do, I think the chances of surviving are basically
nil, and frankly, I'd rather die. If you asked me where you want me to
store my output and I tell you to store it in the sdklgjsdjgslkdg, you
really should refuse to do anything at all, not just stick my output
someplace-or-other and hope for the best.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Is custom MemoryContext prohibited?

2020-01-28 Thread Robert Haas
On Tue, Jan 28, 2020 at 1:08 PM Tom Lane  wrote:
> > That's because the thing
> > that really matters is the 'methods' array. So what I propose is that
> > we nuke the type field from orbit. If a developer wants to figure out
> > what type of context they've got, they should print
> > context->methods[0]; gdb will tell you the function names stored
> > there, and then you'll know *for real* what type of context you've
> > got.
>
> No.  No.  Just NO.  (A) that's overly complex for developers to use,
> and (B) you have far too much faith in the debugger producing something
> useful.  (My experience is that it'll fail to render function pointers
> legibly on an awful lot of platforms.)  Plus, you won't actually save
> any space by removing both of those fields.

I half-expected this reaction, but I think it's unjustified. Two
sources of truth are not better than one, and I don't think that any
other place where we use a vtable-type approach includes a redundant
type field just for decoration. Can you think of a counterexample?

After scrounging around the source tree a bit, the most direct
parallel I can find is probably TupleTableSlot, which contains a
pointer to a TupleTableSlotOps, but not a separate field identifying
the slot type. I don't remember you or anybody objecting that
TupleTableSlot should contain both "const TupleTableSlotOps *const
tts_ops" and also "enum TupleTableSlotType", and I don't think that
such a suggestion would have been looked upon very favorably, not only
because it would have made the struct bigger and served no necessary
purpose, but also because having a centralized list of all
TupleTableSlot types flies in the face of the essential goal of the
table AM interface, which is to allow adding a new table type (and a
slot type for it) without having to modify core code. That exact
consideration is also relevant here: KaiGai wants to be able to add
his own memory context type in third-party code without having to
modify core code. I've wanted to do that in the past, too. Having to
list all the context types in an enum means that you really can't do
that, which sucks, unless you're willing to lie about the context type
and hope that nobody adds code that cares about it. Is there an
alternate solution that you can propose that does not prevent that?

You might be entirely correct that there are some debuggers that can't
print function pointers correctly. I have not run across one, if at
all, in a really long time, but I also work mostly on MacOS and Linux
these days, and those are pretty mainstream platforms where such
problems are less likely. However, I suspect that there are very few
developers who do the bulk of their work on obscure platforms with
poorly-functioning debuggers. The only time it's likely to come up is
if a buggy commit makes things crash on some random buildfarm critter
and we need to debug from a core dump or whatever. But, if that does
happen, we're not dead. The only possible values of the "methods"
pointer -- if only core code is at issue -- are &AllocSetMethods,
&SlabContextMethods, and &GenerationMethods, so somebody can just print out
those values and compare it to the pointer they find. That is a lot
less convenient than being able to just print context->methods[0] and
see everything, but if it only comes up when debugging irreproducible
crashes on obscure platforms, it seems OK to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[Patch]: Documentation of ALTER TABLE re column type changes on binary-coercible fields

2020-01-28 Thread Mike Lissner
Hi all, I didn't get any replies to this. Is this the right way to send in
a patch to the docs?

Thanks,


Mike

On Thu, Jan 23, 2020 at 11:01 PM Mike Lissner <
mliss...@michaeljaylissner.com> wrote:

> Hi, first patch here and first post to pgsql-hackers. Here goes.
>
> Enclosed please find a patch to tweak the documentation of the ALTER TABLE
> page. I believe this patch is ready to be applied to master and backported
> all the way to 9.2.
>
> On the ALTER TABLE page, it currently notes that if you change the type of
> a column, even to a binary coercible type:
>
> > any indexes on the affected columns must still be rebuilt.
>
> It appears this hasn't been true for about eight years, since 367bc426a.
>
> Here's the discussion of the topic from earlier today and yesterday:
>
>
> https://www.postgresql.org/message-id/flat/CAMp9%3DExXtH0NeF%2BLTsNrew_oXycAJTNVKbRYnqgoEAT01t%3D67A%40mail.gmail.com
>
> I haven't run tests, but I presume they'll be unaffected by a
> documentation change.
>
> I've made an effort to follow the example of other people's patches I
> looked at, but I haven't contributed here before. Happy to take another
> stab at this if this doesn't hit the mark — though I hope it does. I love
> and appreciate Postgresql and hope that I can do my little part to make it
> better.
>
> For the moment, I haven't added this to commitfest. I don't know what it
> is, but I suspect this is small enough somebody will just pick it up.
>
> Mike
>
Index: doc/src/sgml/ref/alter_table.sgml
===================================================================
--- doc/src/sgml/ref/alter_table.sgml	(revision 6de7bcb76f6593dcd107a6bfed645f2142bf3225)
+++ doc/src/sgml/ref/alter_table.sgml	(revision 9a813e0896e828900739d95f78b5e4be10dac365)
@@ -1225,10 +1225,9 @@
 existing column, if the USING clause does not change
 the column contents and the old type is either binary coercible to the new
 type or an unconstrained domain over the new type, a table rewrite is not
-needed; but any indexes on the affected columns must still be rebuilt.
-Table and/or index rebuilds may take a
-significant amount of time for a large table; and will temporarily require
-as much as double the disk space.
+needed. Table and/or index rebuilds may take a significant amount of time
+for a large table; and will temporarily require as much as double the disk
+space.

 



Re: VACUUM memory management

2020-01-28 Thread Ibrar Ahmed
On Wed, Jan 22, 2020 at 11:17 AM k.jami...@fujitsu.com <
k.jami...@fujitsu.com> wrote:

> Hi Ibrar,
>
>
>
> Are you still working on this patch?
>
> Currently the patch does not apply mainly because of
>
> recent commits for parallel vacuum have updated the files in this patch.
>
> Kindly rebase it and change the status to "Needs Review" after.
>
>
>
> Upon quick scan of another thread [1] mentioned above,
>
> I believe the people involved had consensus on the same direction
>
> of allocating mem in chunks, and dynamically alloc when
>
> needed. A point for discussion was the size of chunk allocation.
>
>
>
> After a brief look of your patch, there's a typo between
>
> declaration and definition of lazy_vacuum_page():
>
> arryindex --> arrindex
>
>
>
> static int   lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
> -            int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer);
> +            int arryindex, int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer);
>
> static int
> lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
> -            int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer)
> +            int arrindex, int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer)
>
>
>
> Unnecessary change:
>
> -   long  maxtuples;
> -   int   vac_work_mem = IsAutoVacuumWorkerProcess() &&
> +  long  maxtuples;
> +  int   vac_work_mem = IsAutoVacuumWorkerProcess() &&
>
>
>
> Other typo:
>
> + * pg_bsearch() -- bsearch algorithem for two dimention array.
>
> algorithem --> algorithm
>
> dimention --> dimension
>
>
>
> I might have missed something more,
>
> but I'll continue reviewing after the rebased patch.
>
>
>
> Regards,
>
> Kirk Jamison
>
>
>
> [1]
> https://www.postgresql.org/message-id/flat/CAGTBQpbDCaR6vv9%3DscXzuT8fSbckf%3Da3NgZdWFWZbdVugVht6Q%40mail.gmail.com
>
Hi,
Yes, I am working on that. I will send the rebased and updated patch.


-- 
Ibrar Ahmed


Re: making the backend's json parser work in frontend code

2020-01-28 Thread Tom Lane
I wrote:
> Robert Haas  writes:
>> Is the example of CreateDestReceiver() sufficient to show that this is
>> not a problem in practice?

> Dunno.  I don't see any warnings about that in the buildfarm, but
> that's not a very large sample of non-gcc compilers.

BTW, now that I think about it, CreateDestReceiver is not up to project
standards anyway, in that it fails to provide reasonable behavior in
the case where what's passed is not a legal value of the enum.
What you'll get, if you're lucky, is a SIGABRT crash with no
indication of the cause --- or if you're not lucky, some very
hard-to-debug crash elsewhere as a result of the function returning
a garbage pointer.  So independently of whether the current coding
suppresses compiler warnings reliably, I think we ought to replace it
with elog()-and-return-NULL.  Admittedly, that's wasting a few bytes
on a case that should never happen ... but we haven't ever hesitated
to do that elsewhere, if it'd make the problem more diagnosable.

IOW, there's a good reason why there are exactly no other uses
of that coding pattern.

regards, tom lane




Re: making the backend's json parser work in frontend code

2020-01-28 Thread Tom Lane
Robert Haas  writes:
> On Tue, Jan 28, 2020 at 11:35 AM Tom Lane  wrote:
>> 3. Some compilers still don't understand that elog(ERROR) doesn't
>> return, so you need a dummy return.  Perhaps pg_unreachable()
>> would do as well, but project style has been the dummy return for
>> a long time ... and I'm not entirely convinced by the assumption
>> that every compiler understands pg_unreachable(), anyway.

> Is the example of CreateDestReceiver() sufficient to show that this is
> not a problem in practice?

Dunno.  I don't see any warnings about that in the buildfarm, but
that's not a very large sample of non-gcc compilers.

Another angle here is that on non-gcc compilers, pg_unreachable()
is going to expand to an abort() call, which is likely to eat more
code space than a dummy "return 0".

regards, tom lane




Re: Is custom MemoryContext prohibited?

2020-01-28 Thread Tom Lane
Robert Haas  writes:
> On Tue, Jan 28, 2020 at 11:24 AM Tom Lane  wrote:
>> I strongly object to having the subtype field be just "char".
>> I want it to be declared "MemoryContextType", so that gdb will
>> still be telling me explicitly what struct type this really is.
>> I realize that you probably did that for alignment reasons, but
>> maybe we could shave the magic number down to 2 bytes, and then
>> rearrange the field order?  Or just not sweat so much about
>> wasting a longword here.  Having those bools up at the front
>> is pretty ugly anyway.

> I kind of dislike having the magic number not be the first thing in
> the struct on aesthetic grounds,

Huh?  I didn't propose that.  I was thinking either

uint16  magic;
bool    isReset;
bool    allowInCritSection;
enum    type;
... 64-bit fields follow ...

or

uint32  magic;
enum    type;
bool    isReset;
bool    allowInCritSection;
... 64-bit fields follow ...

where the latter wastes space unless the compiler chooses to fit the
enum into 16 bits, but it's not really our fault if it doesn't.  Besides,
what's the reason to think we'll never add any more bools here?  I don't
think we need to be that excited about the padding.

> So, we could have a bug in the code that initializes that field with
> the wrong value, for a new context type or in general, and only a
> developer with a debugger would ever notice.

Right, but that is a pretty important use-case.

> That's because the thing
> that really matters is the 'methods' array. So what I propose is that
> we nuke the type field from orbit. If a developer wants to figure out
> what type of context they've got, they should print
> context->methods[0]; gdb will tell you the function names stored
> there, and then you'll know *for real* what type of context you've
> got.

No.  No.  Just NO.  (A) that's overly complex for developers to use,
and (B) you have far too much faith in the debugger producing something
useful.  (My experience is that it'll fail to render function pointers
legibly on an awful lot of platforms.)  Plus, you won't actually save
any space by removing both of those fields.

If we were going to conclude that we don't really need a magic number,
I'd opt for replacing the NodeTag with an enum MemoryContextType field
that's decoupled from NodeTag.  But I don't feel tremendously happy
about not having a magic number.  That'd make it noticeably harder
to recognize cases where you're referencing an invalid context pointer.

In the end, trying to shave a couple of bytes from context headers
seems pretty penny-wise and pound-foolish.  There are lots of other
structs with significantly higher usage where we've not stopped to
worry about alignment padding, so why here?  Personally I'd just
put the bools back at the end where they used to be.

regards, tom lane




Re: JIT performance bug/regression & JIT EXPLAIN

2020-01-28 Thread Robert Haas
On Mon, Jan 27, 2020 at 4:18 PM Tom Lane  wrote:
> Robert Haas  writes:
> >>> I do not think that the readability-vs-usefulness tradeoff is going
> >>> to be all that good there, anyway.  Certainly for testing purposes
> >>> it's going to be more useful to examine portions of a structured output.
>
> > I intensely dislike having information that we can't show in the text
> > format, or really, that we can't show in every format.
>
> Well, if it's relegated to a "jit = detail" option or some such,
> the readability objection could be overcome.  But I'm still not clear
> on how you'd physically wedge it into the output, at least not in a way
> that matches up with the proposal that non-text modes handle this stuff
> by producing sub-nodes for the existing types of expression fields.

Well, remember that the text format was the original format. The whole
idea of "groups" was an anachronism that I imposed on the text format
to make it possible to add other formats. It wasn't entirely natural,
because the text format basically indicated nesting by indentation,
and that wasn't going to work for XML or JSON. The text format also
felt free to repeat elements and assume the reader would figure it
out; repeating elements is OK in XML in general, but in JSON it's only
OK if the surrounding context is an array rather than an object.
Anyway, the point is that I (necessarily) started with whatever we had
and found a way to fit it into a structure. It seems like it ought to
be possible to go the other direction also, and figure out how to make
the structured data look OK as text.

Here's Andres's original example:

"Filter": {
  "Expr": "(lineitem.l_shipdate <= '1998-09-18 00:00:00'::timestamp
without time zone)",
  "JIT-Expr": "evalexpr_0_2",
  "JIT-Deform-Scan": "deform_0_3",
  "JIT-Deform-Outer": null,
  "JIT-Deform-Inner": null
}

Right now we show:

Filter: (lineitem.l_shipdate <= '1998-09-18 00:00:00'::timestamp
without time zone)

Andres proposed:

Filter: (lineitem.l_shipdate <= '1998-09-18 00:00:00'::timestamp
without time zone); JIT-Expr: evalexpr_0_2, JIT-Deform-Scan:
deform_0_3

That's not ideal because it's all on one line, but that could be changed:

Filter: (lineitem.l_shipdate <= '1998-09-18 00:00:00'::timestamp
without time zone)
  JIT-Expr: evalexpr_0_2
  JIT-Deform-Scan: deform_0_3

I would propose either including null all the time or omitting it all
the time, so that we would either change the JSON output to...

"Filter": {
  "Expr": "(lineitem.l_shipdate <= '1998-09-18 00:00:00'::timestamp
without time zone)",
  "JIT-Expr": "evalexpr_0_2",
  "JIT-Deform-Scan": "deform_0_3"
}

Or the text output to:

Filter: (lineitem.l_shipdate <= '1998-09-18 00:00:00'::timestamp
without time zone)
  JIT-Expr: evalexpr_0_2
  JIT-Deform-Scan: deform_0_3
  JIT-Deform-Outer: null
  JIT-Deform-Inner: null

You could argue that this is inconsistent because the JSON format
shows a bunch of keys that are essentially parallel, and this text
format makes the Expr key essentially the primary value and the others
secondary. But since the text format is for human beings, and since
human beings are likely to find the Expr key to be the primary piece
of information, maybe that's totally fine.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [Proposal] Global temporary tables

2020-01-28 Thread Pavel Stehule
On Tue, Jan 28, 2020 at 6:13 PM, Pavel Stehule wrote:

>
>
> On Tue, Jan 28, 2020 at 6:12 PM, 曾文旌(义从) wrote:
>
>>
>>
>> On Jan 29, 2020, at 12:40 AM, Pavel Stehule wrote:
>>
>>
>>
>> On Tue, Jan 28, 2020 at 5:01 PM, 曾文旌(义从) wrote:
>>
>>>
>>>
>>> On Jan 24, 2020, at 4:47 AM, Robert Haas wrote:
>>>
>>> On Sat, Jan 11, 2020 at 8:51 PM Tomas Vondra
>>>  wrote:
>>>
>>> I proposed just ignoring those new indexes because it seems much simpler
>>> than alternative solutions that I can think of, and it's not like those
>>> other solutions don't have other issues.
>>>
>>>
>>> +1.
>>>
>>> I have completed the implementation of this feature.
>>> When a session x creates an index idx_a on GTT A:
>>> For session x, idx_a is valid once the index is created.
>>> For session y, if GTT A already had data before session x finished
>>> creating the index, then idx_a is invalid.
>>> For session z, if GTT A had no data before session x finished creating
>>> the index, then idx_a is valid.
>>>
>>>
>>> For example, I've looked at the "on demand" building as implemented in
>>> global_private_temp-8.patch, and I kinda doubt it; adding a bunch of index
>>> build calls into various places in the index code seems somewhat suspicious.
>>>
>>>
>>> +1. I can't imagine that's a safe or sane thing to do.
>>>
>>> --
>>> Robert Haas
>>> EnterpriseDB: http://www.enterprisedb.com
>>> The Enterprise PostgreSQL Company
>>>
>>>
>>> Opinion by Pavel
>>> + rel->rd_islocaltemp = true;  <<< if this is valid, then the name
>>> of field "rd_islocaltemp" is not probably best
>>> I renamed rd_islocaltemp
>>>
>>
>> I don't see any change?
>>
>> Rename rd_islocaltemp to rd_istemp
>> in global_temporary_table_v8-pg13.patch
>>
>
> ok :)
>

I found a bug

postgres=# create global temp table x(a int);
CREATE TABLE
postgres=# insert into x values(1);
INSERT 0 1
postgres=# create index on x (a);
CREATE INDEX
postgres=# create index on x((a + 1));
CREATE INDEX
postgres=# analyze x;
WARNING:  oid 16468 not a relation
ANALYZE

other behave looks well for me.

Regards

Pavel


> Pavel
>
>>
>>
>> Wenjing
>>
>>
>>
>>
>>
>>
>>> Opinion by Konstantin Knizhnik
>>> 1 Fixed comments
>>> 2 Fixed assertion
>>>
>>>
>>> Please help me review.
>>>
>>>
>>> Wenjing
>>>
>>>
>>


Re: Allow to_date() and to_timestamp() to accept localized names

2020-01-28 Thread Tom Lane
Mark Dilger  writes:
> But then the manual page goes on to say:

>> %E* %O*
>> POSIX locale extensions.  The sequences %Ec %EC %Ex %EX %Ey %EY %Od %Oe %OH 
>> %OI %Om %OM %OS %Ou %OU %OV %Ow %OW %Oy are supposed to provide alternate 
>> representations.
>> 
>> Additionally %OB implemented to represent alternative months names (used 
>> standalone, without day mentioned).

> This is the part I haven’t played with, but it sounds like it can handle at 
> least one alternate name.  Perhaps you can get the alternates this way?

This sounded promising, but the POSIX strftime spec doesn't mention %OB,
so I'm afraid we can't count on it to do much.  At this point I'm not
really convinced that there are no languages with more than two forms,
anyway :-(.

I also wondered whether we could get any further by using strptime() to
convert localized month and day names on-the-fly, rather than the patch's
current approach of re-using strftime() results.  If strptime() fails
to support alternative names, it's their bug not ours.  Unfortunately,
glibc has got said bug (AFAICS anyway), so in practice this would only
offer us plausible deniability and not much of any real functionality.

In the end it seems like we could only handle alternative names by
keeping our own lists of them.  There are probably few enough cases
that that wouldn't be a tremendous maintenance problem, but what
I'm not quite seeing is how we'd decide which list to use when.
Right now, locale identifiers are pretty much opaque to us ... do
we really want to get into the business of recognizing that such a
name refers to German, or Greek?

A brute-force answer, if there are few enough cases, is to recognize
them regardless of the specific value of LC_TIME.  Do we think
anybody would notice or care if to_date('Sonnabend', 'TMDay') works
even when in a non-German locale?

regards, tom lane




Re: making the backend's json parser work in frontend code

2020-01-28 Thread Robert Haas
On Tue, Jan 28, 2020 at 11:35 AM Tom Lane  wrote:
> 3. Some compilers still don't understand that elog(ERROR) doesn't
> return, so you need a dummy return.  Perhaps pg_unreachable()
> would do as well, but project style has been the dummy return for
> a long time ... and I'm not entirely convinced by the assumption
> that every compiler understands pg_unreachable(), anyway.

Is the example of CreateDestReceiver() sufficient to show that this is
not a problem in practice?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Is custom MemoryContext prohibited?

2020-01-28 Thread Robert Haas
On Tue, Jan 28, 2020 at 11:24 AM Tom Lane  wrote:
> I strongly object to having the subtype field be just "char".
> I want it to be declared "MemoryContextType", so that gdb will
> still be telling me explicitly what struct type this really is.
> I realize that you probably did that for alignment reasons, but
> maybe we could shave the magic number down to 2 bytes, and then
> rearrange the field order?  Or just not sweat so much about
> wasting a longword here.  Having those bools up at the front
> is pretty ugly anyway.

I kind of dislike having the magic number not be the first thing in
the struct on aesthetic grounds, and possibly on the grounds that
somebody might be examining the initial bytes manually to try to
figure out what they've got, and having the magic number there makes
it easier. Regarding space consumption, I guess this structure is
already over 64 bytes and not close to 128 bytes, so adding another 8
bytes probably isn't very meaningful, but I don't love it. One thing
that concerns me a bit is that if somebody adds their own type of
memory context, then MemoryContextType will have a value that is not
actually in the enum. If compilers are optimizing the code on the
assumption that this cannot occur, do we need to worry about undefined
behavior?

Actually, I have what I think is a better idea. I notice that the
"type" field is actually completely unused. We initialize it, and then
nothing in the code ever checks it or relies on it for any purpose.
So, we could have a bug in the code that initializes that field with
the wrong value, for a new context type or in general, and only a
developer with a debugger would ever notice. That's because the thing
that really matters is the 'methods' array. So what I propose is that
we nuke the type field from orbit. If a developer wants to figure out
what type of context they've got, they should print
context->methods[0]; gdb will tell you the function names stored
there, and then you'll know *for real* what type of context you've
got.

Here's a v2 that approaches the problem that way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


v2-0001-Teach-MemoryContext-infrastructure-not-to-depend-.patch
Description: Binary data


Re: [Proposal] Global temporary tables

2020-01-28 Thread Pavel Stehule
On Tue, Jan 28, 2020 at 6:12 PM, 曾文旌(义从) wrote:

>
>
> On Jan 29, 2020, at 12:40 AM, Pavel Stehule wrote:
>
>
>
> On Tue, Jan 28, 2020 at 5:01 PM, 曾文旌(义从) wrote:
>
>>
>>
>> On Jan 24, 2020, at 4:47 AM, Robert Haas wrote:
>>
>> On Sat, Jan 11, 2020 at 8:51 PM Tomas Vondra
>>  wrote:
>>
>> I proposed just ignoring those new indexes because it seems much simpler
>> than alternative solutions that I can think of, and it's not like those
>> other solutions don't have other issues.
>>
>>
>> +1.
>>
>> I have completed the implementation of this feature.
>> When a session x creates an index idx_a on GTT A:
>> For session x, idx_a is valid once the index is created.
>> For session y, if GTT A already had data before session x finished
>> creating the index, then idx_a is invalid.
>> For session z, if GTT A had no data before session x finished creating
>> the index, then idx_a is valid.
>>
>>
>> For example, I've looked at the "on demand" building as implemented in
>> global_private_temp-8.patch, and I kinda doubt it; adding a bunch of index
>> build calls into various places in the index code seems somewhat suspicious.
>>
>>
>> +1. I can't imagine that's a safe or sane thing to do.
>>
>> --
>> Robert Haas
>> EnterpriseDB: http://www.enterprisedb.com
>> The Enterprise PostgreSQL Company
>>
>>
>> Opinion by Pavel
>> + rel->rd_islocaltemp = true;  <<< if this is valid, then the name of
>> field "rd_islocaltemp" is not probably best
>> I renamed rd_islocaltemp
>>
>
> I don't see any change?
>
> Rename rd_islocaltemp to rd_istemp
> in global_temporary_table_v8-pg13.patch
>

ok :)

Pavel

>
>
> Wenjing
>
>
>
>
>
>
>> Opinion by Konstantin Knizhnik
>> 1 Fixed comments
>> 2 Fixed assertion
>>
>>
>> Please help me review.
>>
>>
>> Wenjing
>>
>>
>


Re: pause recovery if pitr target not reached

2020-01-28 Thread Leif Gunnar Erlandsen
Great job with the patch, Peter; it is even cleaner than before now that you
have moved the check.


> "Peter Eisentraut"  skrev 27. januar 2020 
> kl. 12:16:




Re: making the backend's json parser work in frontend code

2020-01-28 Thread Julien Rouhaud
On Tue, Jan 28, 2020 at 5:26 PM Tom Lane  wrote:
>
> Robert Haas  writes:
> > On Tue, Jan 28, 2020 at 10:19 AM Julien Rouhaud  wrote:
> >> FTR this has unfortunately the same result on Thomas' automatic patch
> >> tester, e.g. 
> >> https://travis-ci.org/postgresql-cfbot/postgresql/builds/642634195#L1968
>
> > That's unfortunate ... but presumably Tom's changes took care of this?
>
> Probably the cfbot just hasn't retried this build since that fix.
> I don't know what its algorithm is for retrying failed builds, but it
> does seem to do so after awhile.

Yes, I seem to remember that Thomas put in some rules to avoid
rebuilding everything all the time.  Patches that were rebuilt since
then are indeed starting to go back to green, so it's all good!




Re: [Proposal] Global temporary tables

2020-01-28 Thread Pavel Stehule
On Tue, Jan 28, 2020 at 5:01 PM, 曾文旌(义从) wrote:

>
>
> On Jan 24, 2020, at 4:47 AM, Robert Haas wrote:
>
> On Sat, Jan 11, 2020 at 8:51 PM Tomas Vondra
>  wrote:
>
> I proposed just ignoring those new indexes because it seems much simpler
> than alternative solutions that I can think of, and it's not like those
> other solutions don't have other issues.
>
>
> +1.
>
> I have completed the implementation of this feature.
> When a session x creates an index idx_a on GTT A:
> For session x, idx_a is valid once the index is created.
> For session y, if GTT A already had data before session x finished
> creating the index, then idx_a is invalid.
> For session z, if GTT A had no data before session x finished creating
> the index, then idx_a is valid.
>
>
> For example, I've looked at the "on demand" building as implemented in
> global_private_temp-8.patch, and I kinda doubt it; adding a bunch of index
> build calls into various places in the index code seems somewhat suspicious.
>
>
> +1. I can't imagine that's a safe or sane thing to do.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
> Opinion by Pavel
> + rel->rd_islocaltemp = true;  <<< if this is valid, then the name of
> field "rd_islocaltemp" is not probably best
> I renamed rd_islocaltemp
>

I don't see any change?



> Opinion by Konstantin Knizhnik
> 1 Fixed comments
> 2 Fixed assertion
>
>
> Please help me review.
>
>
> Wenjing
>
>


Re: making the backend's json parser work in frontend code

2020-01-28 Thread Tom Lane
Robert Haas  writes:
> On Tue, Jan 28, 2020 at 10:30 AM Mahendra Singh Thalor
>  wrote:
>> Tom Lane already fixed this and committed 
>> yesterday(4589c6a2a30faba53d0655a8e).

> Oops. OK, thanks.

Yeah, there were multiple issues here:

1. If a switch is expected to cover all values of an enum type,
we now prefer not to have a default: case, so that we'll get
compiler warnings if somebody adds an enum value and fails to
update the switch.

2. Without a default:, though, you need to have after-the-switch
code to catch the possibility that the runtime value was not a
legal enum element.  Some compilers are trusting and assume that
that's not a possible case, but some are not (and Coverity will
complain about it too).

3. Some compilers still don't understand that elog(ERROR) doesn't
return, so you need a dummy return.  Perhaps pg_unreachable()
would do as well, but project style has been the dummy return for
a long time ... and I'm not entirely convinced by the assumption
that every compiler understands pg_unreachable(), anyway.

(I know Robert knows all this stuff, even if he momentarily
forgot.  Just summarizing for onlookers.)

regards, tom lane




Re: making the backend's json parser work in frontend code

2020-01-28 Thread Robert Haas
On Mon, Jan 27, 2020 at 3:05 PM Mark Dilger
 wrote:
> Since you’ve committed your 0004 and 0005 patches, this v6 patch set is now 
> based on a fresh copy of master.

I think the first question for 0005 is whether want this at all.
Initially, you proposed NOT committing it, but then Andrew reviewed it
as if it were for commit. I'm not sure whether he was actually saying
that it ought to be committed, though, or whether he just missed your
remarks on the topic. Nobody else has really taken a position. I'm not
100% convinced that it's necessary to include this, but I'm also not
particularly opposed to it. It's a fairly small amount of code, which
is nice, and perhaps useful as a demonstration of how to use the JSON
parser in a frontend application, which someone also might find nice.
Anyone else want to express an opinion?

Meanwhile, here is a round of nitp^H^H^H^Hreview:

-# installcheck and install should not recurse into the subdirectory "modules".
+# installcheck and install should not recurse into the subdirectory "modules"
+# nor "bin".

I would probably have just changed this to:

# installcheck and install should not recurse into "modules" or "bin"

The details are arguable, but you definitely shouldn't say "the
subdirectory" and then list two of them.

+This directory contains a set of programs that exercise functionality declared
+in src/include/common and defined in src/common.  The purpose of these programs
+is to verify that code intended to work both from frontend and backend code do
+indeed work when compiled and used in frontend code.  The structure of this
+directory makes no attempt to test that such code works in the backend, as the
+backend has its own tests already, and presumably those tests sufficiently
+exercide the code as used by it.

"exercide" is not spelled correctly, but I also disagree with giving
the directory so narrow a charter. I think you should just say
something like:

This directory contains programs that are built and executed for
testing purposes,
but never installed. It may be used, for example, to test that code in
src/common
works in frontend environments.

+# There doesn't seem to be any easy way to get TestLib to use the binaries from
+# our directory, so we hack up a path to our binary and run that
directly.  This
+# seems brittle enough that some other solution should be found, if possible.
+
+my $test_json = join('/', $ENV{TESTDIR}, 'test_json');

I don't know what the right thing to do here is. Perhaps someone more
familiar with TAP testing can comment.

+ set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_test_json"));

Do we need this? I guess we're not likely to bother with translations
for a test program.

+ /*
+ * Make stdout unbuffered to match stderr; and ensure stderr is unbuffered
+ * too, which it should already be everywhere except sometimes in Windows.
+ */
+ setbuf(stdout, NULL);
+ setbuf(stderr, NULL);

Do we need this? If so, why?

+ char *json;
+ unsigned int json_len;
+ JsonLexContext *lex;
+ int client_encoding;
+ JsonParseErrorType parse_result;
+
+ json_len = (unsigned int) strlen(str);
+ client_encoding = PQenv2encoding();
+
+ json = strdup(str);
+ lex = makeJsonLexContextCstringLen(json, strlen(json),
client_encoding, true /* need_escapes */);
+ parse_result = pg_parse_json(lex, );
+ fprintf(stdout, _("%s\n"), (JSON_SUCCESS == parse_result ? "VALID" :
"INVALID"));
+ return;

json_len is set but not used.

Not entirely sure why we are using PQenv2encoding() here.

The trailing return is unnecessary.

I think it would be a good idea to use json_errdetail() in the failure
case, print the error, and have the tests check that we got the
expected error.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: making the backend's json parser work in frontend code

2020-01-28 Thread Tom Lane
Robert Haas  writes:
> On Tue, Jan 28, 2020 at 10:19 AM Julien Rouhaud  wrote:
>> FTR this has unfortunately the same result on Thomas' automatic patch
>> tester, e.g. 
>> https://travis-ci.org/postgresql-cfbot/postgresql/builds/642634195#L1968

> That's unfortunate ... but presumably Tom's changes took care of this?

Probably the cfbot just hasn't retried this build since that fix.
I don't know what its algorithm is for retrying failed builds, but it
does seem to do so after awhile.

regards, tom lane




Re: Is custom MemoryContext prohibited?

2020-01-28 Thread Tom Lane
Robert Haas  writes:
> On Tue, Jan 28, 2020 at 10:35 AM Tom Lane  wrote:
>> I don't actually believe that private context types in extensions is
>> a very likely use-case, so on the whole I'd just as soon leave this
>> alone.  If we did want to do something, I'd vote for one NodeTag
>> code T_MemoryContext and then a secondary ID field that's an enum
>> over specific memory context types.

> I generally like this idea, but I'd like to propose that we instead
> replace the NodeTag with a 4-byte magic number.

Yeah, there's something to be said for that.  It's unlikely that it'd
ever make sense for us to have copy/equal/write/read/etc support for
memory context headers, so having them be part of the Node taxonomy
doesn't seem very necessary.

> Along with that, I think we could also change MemoryContextIsValid()
> to just check the magic number and not validate the type field.

Right, that's isomorphic to what I was imagining: there'd be just
one check not N.

> Proposed patch attached.

I strongly object to having the subtype field be just "char".
I want it to be declared "MemoryContextType", so that gdb will
still be telling me explicitly what struct type this really is.
I realize that you probably did that for alignment reasons, but
maybe we could shave the magic number down to 2 bytes, and then
rearrange the field order?  Or just not sweat so much about
wasting a longword here.  Having those bools up at the front
is pretty ugly anyway.

regards, tom lane




Re: Allow to_date() and to_timestamp() to accept localized names

2020-01-28 Thread Alvaro Herrera
On 2020-Jan-28, Peter Eisentraut wrote:

> On 2020-01-28 04:05, Mark Dilger wrote:
> > German uses both Sonnabend and Samstag for Saturday, so don’t you have to 
> > compare to a list of values anyway?
> 
> Yeah, good point.  If it doesn't accept both "Sonnabend" and "Samstag", then
> it's not really usable.

The string "Sonnabend" never appears in the glibc sources, so that will
certainly not work.  I vote not to care about that, but of course my
language is not one that has alternate weekday/month names.  I guess if
we're intent on recognizing alternate names, we'll have to build our own
list of them :-(

I don't have the ICU sources here to check the status there.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: making the backend's json parser work in frontend code

2020-01-28 Thread Robert Haas
On Tue, Jan 28, 2020 at 10:19 AM Julien Rouhaud  wrote:
> FTR this has unfortunately the same result on Thomas' automatic patch
> tester, e.g. 
> https://travis-ci.org/postgresql-cfbot/postgresql/builds/642634195#L1968

That's unfortunate ... but presumably Tom's changes took care of this?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Is custom MemoryContext prohibited?

2020-01-28 Thread Robert Haas
On Tue, Jan 28, 2020 at 10:35 AM Tom Lane  wrote:
> I don't actually believe that private context types in extensions is
> a very likely use-case, so on the whole I'd just as soon leave this
> alone.  If we did want to do something, I'd vote for one NodeTag
> code T_MemoryContext and then a secondary ID field that's an enum
> over specific memory context types.

I generally like this idea, but I'd like to propose that we instead
replace the NodeTag with a 4-byte magic number. I was studying how
feasible it would be to make memory contexts available in frontend
code, and it doesn't look all that bad, but one of the downsides is
that nodes/memnodes.h includes nodes/nodes.h, and I think it would not
be a good idea to make frontend code depend on nodes/nodes.h, which
seems like it's really a backend-only piece of infrastructure. Using a
magic number would allow us to avoid that, and it doesn't really cost
anything, because the memory context nodes really don't participate in
any of that infrastructure anyway.

Along with that, I think we could also change MemoryContextIsValid()
to just check the magic number and not validate the type field. I
think the spirit of the code is just to check that we've got some kind
of memory context rather than random garbage in memory, and checking
the magic number is good enough for that. It's possibly a little
better than what we have now, since a node tag is a small integer,
which might be more likely to occur at a random pointer address. At
any rate I think it's no worse.

Proposed patch attached.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


v1-0001-Teach-MemoryContext-infrastructure-not-to-depend-.patch
Description: Binary data


Re: Allow to_date() and to_timestamp() to accept localized names

2020-01-28 Thread Mark Dilger



> On Jan 28, 2020, at 7:47 AM, Juan José Santamaría Flecha 
>  wrote:
> 
> This looks like a POSIX feature that some systems might not like (Windows 
> [1]). But if this is something that the patch should aim to, I am fine with a 
> RWF and give it another try in the future.

As long as this implementation doesn’t create a backward-compatibility problem 
when doing a more complete implementation later, I’m fine with this patch not 
tackling the whole problem.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Is custom MemoryContext prohibited?

2020-01-28 Thread Tomas Vondra

On Tue, Jan 28, 2020 at 11:32:49PM +0900, Kohei KaiGai wrote:

On Tue, Jan 28, 2020 at 11:09 PM, Tomas Vondra wrote:


On Tue, Jan 28, 2020 at 10:55:11AM +0900, Kohei KaiGai wrote:
>Hello,
>
>I noticed that MemoryContextIsValid(), called by various kinds of memory
>context routines, checks its node-tag as follows:
>
>#define MemoryContextIsValid(context) \
>((context) != NULL && \
> (IsA((context), AllocSetContext) || \
>  IsA((context), SlabContext) || \
>  IsA((context), GenerationContext)))
>
>It allows only "known" memory context methods, even though the memory context
>mechanism lets extensions implement custom memory allocators.
>There is a node tag nobody uses: T_MemoryContext.
>It looks to me like T_MemoryContext is a neutral name for a custom memory
>context, and there is no reason why the memory context functions should
>prevent custom methods.
>

Good question. I don't think there's an explicit reason not to allow
extensions to define custom memory contexts, and using T_MemoryContext
seems like a possible solution. It's a bit weird though, because all the
actual contexts are kinda "subclasses" of MemoryContext. So maybe adding
T_CustomMemoryContext would be a better choice, but that only works in
master, of course.


From the standpoint of an extension author, reusing T_MemoryContext and
back-patching the change to MemoryContextIsValid() would make us happy. :)
However, even if we only add a new node-tag here, custom memory contexts
would be able to stop using a fake node-tag a few years from now. It's
better than nothing.



Oh, right. I forgot we still need to backpatch this bit. But that seems
like a fairly small amount of code, so it should work.

I think we can't backpatch the addition of T_CustomMemoryContext anyway
as it essentially breaks ABI, as it changes the values assigned to T_
constants.


Also, it won't work if we need to add memory contexts to equalfuncs.c
etc., but maybe we won't need that - it's more of a theoretical issue.


Right now, none of nodes/XXXfuncs.c support these class of nodes related to
MemoryContext. It shall not be a matter.



Yes. I did not really mean it as argument against the patch, it was
meant more like "This could be an issue, but it actually is not." Sorry
if that wasn't clear.


>https://github.com/heterodb/pg-strom/blob/master/src/shmbuf.c#L1243
>I recently implemented a custom memory context for shared memory allocation
>with portable pointers. It will be used to cache pre-built GPU binary code
>and to cache metadata of Apache Arrow files.
>However, the assertion check above requires extension to set a fake node-tag
>to avoid backend crash. Right now, it is harmless to set T_AllocSetContext, but
>feel a bit bad.
>

Interesting. Does that mean the shared memory contexts are part of the
same hierarchy as "normal" contexts? That would be a bit confusing, I
think.


If this shared memory context is a child of a normal context, likely it should be
reset or deleted as usual. However, if this shared memory context acts
as a parent of a normal context, it causes a problem when a different process tries
to delete this context, because its child (the normal context) exists in the creator
process only. So, it needs to be used carefully.



Yeah, handling life cycle of a mix of those contexts may be quite tricky.

But my concern was a bit more general - is it a good idea to hide the
nature of the memory context behind the same API. If you call palloc()
shouldn't you really know whether you're allocating the stuff in regular
or shared memory context?

Maybe it makes perfect sense, but my initial impression is that those
seem rather different, so maybe we should keep them separate (in
separate hierarchies or something). But I admit I don't have much
experience with use cases that would require such shmem contexts.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Allow to_date() and to_timestamp() to accept localized names

2020-01-28 Thread Juan José Santamaría Flecha
On Tue, Jan 28, 2020 at 4:06 PM Mark Dilger 
wrote:

>
> I’m not insisting, just asking about the design.  If it only works with
> one name supported per weekday per language, and per month per language,
> I’m not going to object.  I was just curious if you were going to support
> multiple names anyway, and if that made the question about the Greek case
> folding less pressing.
>
>
This patch aims to do something symmetrical to to_char(), which will
just return a single value.

The issue with the Greek locale is that it cannot preserve this
equivalence, as in:

postgres=# select to_date(to_char(now(), 'TMMONTH'), 'TMMONTH');
ERROR:  invalid value "ΙΑΝΟΥΆΡΙΟΣ" for "MONTH"

Because of the particular behaviour of sigma (Σσς) casing, which is also
user visible with lower().


>
> >  %E* %O*
> >     POSIX locale extensions.  The sequences %Ec %EC %Ex %EX %Ey
> > %EY %Od %Oe %OH %OI %Om %OM %OS %Ou %OU %OV %Ow %OW %Oy are supposed to
> > provide alternate representations.
> >
> >     Additionally %OB is implemented to represent alternative month
> > names (used standalone, without a day mentioned).
>
> This is the part I haven’t played with, but it sounds like it can handle
> at least one alternate name.  Perhaps you can get the alternates this way?
>
>
This looks like a POSIX feature that some systems might not support
(Windows [1]). But if this is something the patch should aim for, I am
fine with an RWF and giving it another try in the future.

[1]
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/strftime-wcsftime-strftime-l-wcsftime-l?view=vs-2019

Regards,

Juan José Santamaría Flecha


Re: Is custom MemoryContext prohibited?

2020-01-28 Thread Tom Lane
Tomas Vondra  writes:
> On Tue, Jan 28, 2020 at 10:55:11AM +0900, Kohei KaiGai wrote:
>> I noticed MemoryContextIsValid() called by various kinds of memory context
>> routines checks its node-tag as follows:
>> #define MemoryContextIsValid(context) \
>> ((context) != NULL && \
>> (IsA((context), AllocSetContext) || \
>> IsA((context), SlabContext) || \
>> IsA((context), GenerationContext)))

> Good question. I don't think there's an explicit reason not to allow
> extensions to define custom memory contexts, and using T_MemoryContext
> seems like a possible solution. It's a bit weird though, because all the
> actual contexts are kinda "subclasses" of MemoryContext. So maybe adding
> T_CustomMemoryContext would be a better choice, but that only works in
> master, of course.

I think the original reason for having distinct node types was to let
individual mcxt functions verify that they were passed the right type
of node ... but I don't see any of them actually doing so, and the
context-methods API makes it a bit hard to credit that we need to.
So there's certainly a case for replacing all three node typecodes
with just MemoryContext.  On the other hand, it'd be ugly and it would
complicate debugging: you could not be totally sure which struct type
to cast a pointer-to-MemoryContext to when trying to inspect the
contents.  We have a roughly comparable situation with respect to
Value --- the node type codes we use with it, such as T_String, don't
have anything to do with the struct typedef name.  I've always found
that very confusing when debugging.  When gdb tells me that
"*(Node*) address" is T_String, the first thing I want to do is write
"p *(String*) address" and of course that doesn't work.

I don't actually believe that private context types in extensions is
a very likely use-case, so on the whole I'd just as soon leave this
alone.  If we did want to do something, I'd vote for one NodeTag
code T_MemoryContext and then a secondary ID field that's an enum
over specific memory context types.  But that doesn't really improve
matters for debugging extension contexts, because they still don't
have a way to add elements to the secondary enum.

regards, tom lane




Re: making the backend's json parser work in frontend code

2020-01-28 Thread Robert Haas
On Tue, Jan 28, 2020 at 10:30 AM Mahendra Singh Thalor
 wrote:
> Tom Lane already fixed this and committed 
> yesterday (4589c6a2a30faba53d0655a8e).

Oops. OK, thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: making the backend's json parser work in frontend code

2020-01-28 Thread Mahendra Singh Thalor
On Tue, 28 Jan 2020 at 20:36, Robert Haas  wrote:
>
> On Mon, Jan 27, 2020 at 2:02 PM Mahendra Singh Thalor
>  wrote:
> > I can see one warning on HEAD.
> >
> > jsonapi.c: In function ‘json_errdetail’:
> > jsonapi.c:1068:1: warning: control reaches end of non-void function
> > [-Wreturn-type]
> >  }
> >  ^
> >
> > Attaching a patch to fix warning.
>
> Hmm, I don't get a warning there. This function is a switch over an
> enum type with a case for every value of the enum, and every branch
> either does a "return" or an "elog," so any code after the switch
> should be unreachable. It's possible your compiler is too dumb to know
> that, but I thought there were other places in the code base where we
> assumed that if we handled every defined value of enum, that was good
> enough.
>
> But maybe not. I found similar coding in CreateDestReceiver(), and
> that ends with:
>
> /* should never get here */
> pg_unreachable();
>
> So perhaps we need the same thing here. Does adding that fix it for you?
>

Hi Robert,
Tom Lane already fixed this and committed yesterday (4589c6a2a30faba53d0655a8e).

-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com




Re: making the backend's json parser work in frontend code

2020-01-28 Thread Julien Rouhaud
On Tue, Jan 28, 2020 at 4:06 PM Robert Haas  wrote:
>
> On Mon, Jan 27, 2020 at 2:02 PM Mahendra Singh Thalor
>  wrote:
> > I can see one warning on HEAD.
> >
> > jsonapi.c: In function ‘json_errdetail’:
> > jsonapi.c:1068:1: warning: control reaches end of non-void function
> > [-Wreturn-type]
> >  }
> >  ^
> >
> > Attaching a patch to fix warning.
>
> Hmm, I don't get a warning there. This function is a switch over an
> enum type with a case for every value of the enum, and every branch
> either does a "return" or an "elog," so any code after the switch
> should be unreachable. It's possible your compiler is too dumb to know
> that, but I thought there were other places in the code base where we
> assumed that if we handled every defined value of enum, that was good
> enough.
>
> But maybe not. I found similar coding in CreateDestReceiver(), and
> that ends with:
>
> /* should never get here */
> pg_unreachable();
>
> So perhaps we need the same thing here. Does adding that fix it for you?

FTR this has unfortunately the same result on Thomas' automatic patch
tester, e.g. 
https://travis-ci.org/postgresql-cfbot/postgresql/builds/642634195#L1968




Re: making the backend's json parser work in frontend code

2020-01-28 Thread Robert Haas
On Mon, Jan 27, 2020 at 3:05 PM Mark Dilger
 wrote:
> I’m attaching a new patch set with these three changes including Mahendra’s 
> patch posted elsewhere on this thread.
>
> Since you’ve committed your 0004 and 0005 patches, this v6 patch set is now 
> based on a fresh copy of master.

OK, so I think this is getting close.

What is now 0001 manages to have four (4) conditionals on FRONTEND at
the top of the file. This seems like at least one too many. I am OK
with this being separate:

+#ifndef FRONTEND
 #include "postgres.h"
+#else
+#include "postgres_fe.h"
+#endif

postgres(_fe).h has pride of place among includes, so it's reasonable
to put this in its own section like this.

+#ifdef FRONTEND
+#define check_stack_depth()
+#define json_log_and_abort(...) pg_log_fatal(__VA_ARGS__); exit(1);
+#else
+#define json_log_and_abort(...) elog(ERROR, __VA_ARGS__)
+#endif

OK, so here we have a section entirely devoted to our own file-local
macros. Also reasonable. But in between, you have both an #ifdef
FRONTEND and an #ifndef FRONTEND for other includes, and I really
think that should be like #ifdef FRONTEND .. #else .. #endif.

Also, the preprocessor police are on their way to your house now to
arrest you for that first one. You need to write it like this:

#define json_log_and_abort(...) \
do { pg_log_fatal(__VA_ARGS__); exit(1); } while (0)

Otherwise, hilarity ensues if somebody writes if (my_code_is_buggy)
json_log_and_abort("oops").

 {
- JsonLexContext *lex = palloc0(sizeof(JsonLexContext));
+ JsonLexContext *lex;
+
+#ifndef FRONTEND
+ lex = palloc0(sizeof(JsonLexContext));
+#else
+ lex = (JsonLexContext*) malloc(sizeof(JsonLexContext));
+ memset(lex, 0, sizeof(JsonLexContext));
+#endif

Instead of this, how about making no change at all here?

- default:
- elog(ERROR, "unexpected json parse state: %d", ctx);
  }
+
+ /* Not reached */
+ json_log_and_abort("unexpected json parse state: %d", ctx);

This, too, seems unnecessary.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Physical replication slot advance is not persistent

2020-01-28 Thread Alexey Kondratov

On 28.01.2020 15:14, Kyotaro Horiguchi wrote:

At Tue, 28 Jan 2020 17:45:47 +0800, Craig Ringer  wrote 
in

On Tue, 28 Jan 2020 at 16:01, Michael Paquier  wrote:

So attached is an updated patch which addresses the problem just by
marking a physical slot as dirty if any advancing is done.  Some
documentation is added about the fact that an advance is persistent
only at the follow-up checkpoint.  And the tests are fixed to not use
a fake LSN but instead advance to the latest LSN position produced.

Any objections?

LGTM. Thankyou.

I agree not to save slots immediately. The code is written as described
above. The TAP test is correct.


+1, removing this broken saving code path from 
pg_replication_slot_advance and marking slot as dirty looks good to me. 
It solves the issue and does not add any unnecessary complexity.




But the doc part looks a bit too detailed to me. Couldn't we explain
that without the word 'dirty'?

-and it will not be moved beyond the current insert location.  Returns
-name of the slot and real position to which it was advanced to.
+and it will not be moved beyond the current insert location. Returns
+name of the slot and real position to which it was advanced to. The
+updated slot is marked as dirty if any advancing is done, with its
+information being written out at the follow-up checkpoint. In the
+event of a crash, the slot may return to an earlier position.

and it will not be moved beyond the current insert location. Returns
name of the slot and real position to which it was advanced to. The
information of the updated slot is scheduled to be written out at the
follow-up checkpoint if any advancing is done. In the event of a
crash, the slot may return to an earlier position.


Just searched through the *.sgml files, we already use terms 'dirty' and 
'flush' applied to writing out pages during checkpoints. Here we are 
trying to describe the very similar process, but in relation to 
replication slots, so it looks fine to me. At the same time, the term 
'schedule' is used for VACUUM, constraint checks or the checkpoint itself.



Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company





Re: making the backend's json parser work in frontend code

2020-01-28 Thread Robert Haas
On Mon, Jan 27, 2020 at 2:02 PM Mahendra Singh Thalor
 wrote:
> I can see one warning on HEAD.
>
> jsonapi.c: In function ‘json_errdetail’:
> jsonapi.c:1068:1: warning: control reaches end of non-void function
> [-Wreturn-type]
>  }
>  ^
>
> Attaching a patch to fix warning.

Hmm, I don't get a warning there. This function is a switch over an
enum type with a case for every value of the enum, and every branch
either does a "return" or an "elog," so any code after the switch
should be unreachable. It's possible your compiler is too dumb to know
that, but I thought there were other places in the code base where we
assumed that if we handled every defined value of enum, that was good
enough.

But maybe not. I found similar coding in CreateDestReceiver(), and
that ends with:

/* should never get here */
pg_unreachable();

So perhaps we need the same thing here. Does adding that fix it for you?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Allow to_date() and to_timestamp() to accept localized names

2020-01-28 Thread Mark Dilger



> On Jan 28, 2020, at 6:51 AM, Tom Lane  wrote:
> 
> Peter Eisentraut  writes:
>> On 2020-01-28 04:05, Mark Dilger wrote:
>>> German uses both Sonnabend and Samstag for Saturday, so don’t you have to 
>>> compare to a list of values anyway?
> 
>> Yeah, good point.  If it doesn't accept both "Sonnabend" and "Samstag", 
>> then it's not really usable.
> 
> If we're going to insist on that, then the entire patch is junk.

I’m not insisting, just asking about the design.  If it only works with one 
name supported per weekday per language, and per month per language, I’m not 
going to object.  I was just curious if you were going to support multiple 
names anyway, and if that made the question about the Greek case folding less 
pressing.

> Indeed, I don't even know where we could get the knowledge of which
> name(s) to accept, because strftime is surely only going to tell us
> one of them.

I haven’t played with this yet, but per the manual page for strftime_l:

>  %A    is replaced by national representation of the full weekday name.
> 
>  %a    is replaced by national representation of the abbreviated weekday
> name.
> 
>  %B    is replaced by national representation of the full month name.
> 
>  %b    is replaced by national representation of the abbreviated month
> name.

Which I think gives you just the preferred name, by whatever means the library 
decides which of Sonnabend/Samstag is the preferred name.  But then the manual 
page goes on to say:

>  %E* %O*
>     POSIX locale extensions.  The sequences %Ec %EC %Ex %EX %Ey %EY
> %Od %Oe %OH %OI %Om %OM %OS %Ou %OU %OV %Ow %OW %Oy are supposed to provide
> alternate representations.
> 
>     Additionally %OB is implemented to represent alternative month names
> (used standalone, without a day mentioned).

This is the part I haven’t played with, but it sounds like it can handle at 
least one alternate name.  Perhaps you can get the alternates this way?

-- 
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Allow to_date() and to_timestamp() to accept localized names

2020-01-28 Thread Tom Lane
Peter Eisentraut  writes:
> On 2020-01-28 04:05, Mark Dilger wrote:
>> German uses both Sonnabend and Samstag for Saturday, so don’t you have to 
>> compare to a list of values anyway?

> Yeah, good point.  If it doesn't accept both "Sonnabend" and "Samstag", 
> then it's not really usable.

If we're going to insist on that, then the entire patch is junk.
Indeed, I don't even know where we could get the knowledge of which
name(s) to accept, because strftime is surely only going to tell us
one of them.

regards, tom lane




Re: Is custom MemoryContext prohibited?

2020-01-28 Thread Kohei KaiGai
On Tue, Jan 28, 2020 at 11:09 PM Tomas Vondra  wrote:
>
> On Tue, Jan 28, 2020 at 10:55:11AM +0900, Kohei KaiGai wrote:
> >Hello,
> >
> >I noticed MemoryContextIsValid() called by various kinds of memory context
> >routines checks its node-tag as follows:
> >
> >#define MemoryContextIsValid(context) \
> >((context) != NULL && \
> > (IsA((context), AllocSetContext) || \
> >  IsA((context), SlabContext) || \
> >  IsA((context), GenerationContext)))
> >
> >It allows only "known" memory context methods, even though the memory context
> >mechanism enables extensions to implement custom memory allocators.
> >Here is a node tag nobody uses: T_MemoryContext.
> >It looks to me that T_MemoryContext is a neutral naming for a custom memory context,
> >and there is no reason why memory context functions should prevent custom methods.
> >
>
> Good question. I don't think there's an explicit reason not to allow
> extensions to define custom memory contexts, and using T_MemoryContext
> seems like a possible solution. It's a bit weird though, because all the
> actual contexts are kinda "subclasses" of MemoryContext. So maybe adding
> T_CustomMemoryContext would be a better choice, but that only works in
> master, of course.
>
From the standpoint of an extension author, reuse of T_MemoryContext and
back patching the change on MemoryContextIsValid() makes us happy. :)
However, even if we add a new node-tag here, the custom memory context
can stop using the fake node-tag a few years later. It's better than nothing.

> Also, it won't work if we need to add memory contexts to equalfuncs.c
> etc. but maybe won't need that - it's more a theoretical issue.
>
Right now, none of nodes/XXXfuncs.c supports this class of nodes related to
MemoryContext. It should not be a problem.

> >https://github.com/heterodb/pg-strom/blob/master/src/shmbuf.c#L1243
> >I recently implemented a custom memory context for shared memory allocation
> >with portable pointers. It shall be used for cache of pre-built gpu
> >binary code and
> >metadata cache of apache arrow files.
> >However, the assertion check above requires extension to set a fake node-tag
> >to avoid backend crash. Right now, it is harmless to set T_AllocSetContext, 
> >but
> >feel a bit bad.
> >
>
> Interesting. Does that mean the hared memory contexts are part of the
> same hierarchy as "normal" contexts? That would be a bit confusing, I
> think.
>
If this shared memory context is a child of a normal context, likely it should be
reset or deleted as usual. However, if this shared memory context acts
as a parent of a normal context, it causes a problem when a different process tries
to delete this context, because its child (the normal context) exists in the creator
process only. So, it needs to be used carefully.

Thanks,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 




Re: standby apply lag on inactive servers

2020-01-28 Thread Alvaro Herrera
On 2020-Jan-27, Alvaro Herrera wrote:

> Actually looking again, getRecordTimestamp is looking pretty strange.
> It looks much more natural by using nested switch/case blocks, as with
> this diff.  I think the compiler does a better job this way too.

I hadn't noticed I forgot to attach the diff here :-(

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 0bc3d8cec5..1a165701ae 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5566,33 +5566,43 @@ exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog)
 static bool
 getRecordTimestamp(XLogReaderState *record, TimestampTz *recordXtime)
 {
-	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
-	uint8		xact_info = info & XLOG_XACT_OPMASK;
-	uint8		rmid = XLogRecGetRmid(record);
+	RmgrId		rmid = XLogRecGetRmid(record);
+	uint8		info;
 
-	if (rmid == RM_XLOG_ID && info == XLOG_RESTORE_POINT)
+	switch (rmid)
 	{
-		*recordXtime = ((xl_restore_point *) XLogRecGetData(record))->rp_time;
-		return true;
-	}
-	if (rmid == RM_XLOG_ID && (info == XLOG_CHECKPOINT_ONLINE ||
-			   info == XLOG_CHECKPOINT_SHUTDOWN))
-	{
-		*recordXtime =
-			time_t_to_timestamptz(((CheckPoint *) XLogRecGetData(record))->time);
-		return true;
-	}
-	if (rmid == RM_XACT_ID && (xact_info == XLOG_XACT_COMMIT ||
-			   xact_info == XLOG_XACT_COMMIT_PREPARED))
-	{
-		*recordXtime = ((xl_xact_commit *) XLogRecGetData(record))->xact_time;
-		return true;
-	}
-	if (rmid == RM_XACT_ID && (xact_info == XLOG_XACT_ABORT ||
-			   xact_info == XLOG_XACT_ABORT_PREPARED))
-	{
-		*recordXtime = ((xl_xact_abort *) XLogRecGetData(record))->xact_time;
-		return true;
+		case RM_XLOG_ID:
+			info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
+
+			switch (info)
+			{
+				case XLOG_RESTORE_POINT:
+					*recordXtime = ((xl_restore_point *) XLogRecGetData(record))->rp_time;
+					return true;
+				case XLOG_CHECKPOINT_ONLINE:
+				case XLOG_CHECKPOINT_SHUTDOWN:
+					*recordXtime =
+						time_t_to_timestamptz(((CheckPoint *) XLogRecGetData(record))->time);
+					return true;
+			}
+			break;
+
+		case RM_XACT_ID:
+			info = XLogRecGetInfo(record) & XLOG_XACT_OPMASK;
+
+			switch (info)
+			{
+				case XLOG_XACT_COMMIT:
+				case XLOG_XACT_COMMIT_PREPARED:
+					*recordXtime = ((xl_xact_commit *) XLogRecGetData(record))->xact_time;
+					return true;
+
+				case XLOG_XACT_ABORT:
+				case XLOG_XACT_ABORT_PREPARED:
+					*recordXtime = ((xl_xact_abort *) XLogRecGetData(record))->xact_time;
+					return true;
+			}
+			break;
 	}
 	return false;
 }


Re: standby apply lag on inactive servers

2020-01-28 Thread Alvaro Herrera
On 2020-Jan-28, Kyotaro Horiguchi wrote:

> At Mon, 27 Jan 2020 18:06:27 -0300, Alvaro Herrera  
> wrote in 
> > Actually looking again, getRecordTimestamp is looking pretty strange.
> > It looks much more natural by using nested switch/case blocks, as with
> > this diff.  I think the compiler does a better job this way too.
> 
> Agreed.  Anyway I looked at the latest version.
> 
> The aim of the patch seem reasonable. XLOG_END_OF_RECOVERY and
> XLOG_XACT_PREPARE also have a timestamp but it doesn't help much. (But
> could be included for consistency.)

Hello, thanks for looking.

> The timestamp of a checkpoint record is the start time of a checkpoint
> (and doesn't have subseconds part, but it's not a problem.). That
> means that the timestamp is usually far behind the time at the record
> has been inserted. As the result the stored timestamp can go back by a
> significant internval. I don't think that causes an actual problem but
> the movement looks wierd as the result of
> pg_last_xact_replay_timestamp().

Ouch ... yeah, it should be set only if it doesn't go backwards.

> xlog.c:7329: (similar code exists at line 9332)
> >ereport(LOG,
> > (errmsg("redo done at %X/%X",
> > (uint32) (ReadRecPtr >> 32), (uint32) 
> > ReadRecPtr)));
> >xtime = GetLatestXTime();
> >if (xtime)
> > ereport(LOG,
> > (errmsg("last completed transaction was at log time %s",
> > timestamptz_to_str(xtime;
> 
> This code assumes (and the name GetLatestXTime() suggests, I first
> noticed that here..) that the timestamp comes from commit/abort logs,
> so otherwise it shows a wrong timestamp.  We shouldn't update the
> variable by other than that kind of records.

Hmm, that's terrible.  GetLatestXTime() being displayed user-visibly for
"last transaction completion" but having it include unrelated things
such as restore points is terrible.  One idea is to should split it in
two: one exclusively for transaction commit/abort, and another for all
WAL activity.  That way, the former can be used for that message, and
the latter for standby replay reports.  However, that might be
overengineering, if the only thing that uses the former is that one LOG
message; instead changing the log message to state that the time is for
other activity, as you suggest, is simpler and has no downside that I
can see.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Is custom MemoryContext prohibited?

2020-01-28 Thread Tomas Vondra

On Tue, Jan 28, 2020 at 10:55:11AM +0900, Kohei KaiGai wrote:

Hello,

I noticed MemoryContextIsValid() called by various kinds of memory context
routines checks its node-tag as follows:

#define MemoryContextIsValid(context) \
    ((context) != NULL && \
     (IsA((context), AllocSetContext) || \
      IsA((context), SlabContext) || \
      IsA((context), GenerationContext)))

It allows only "known" memory context methods, even though the memory context
mechanism enables extensions to implement custom memory allocators.
Here is a node tag nobody uses: T_MemoryContext.
It looks to me that T_MemoryContext is a neutral naming for a custom memory context,
and there is no reason why memory context functions should prevent custom methods.



Good question. I don't think there's an explicit reason not to allow
extensions to define custom memory contexts, and using T_MemoryContext
seems like a possible solution. It's a bit weird though, because all the
actual contexts are kinda "subclasses" of MemoryContext. So maybe adding
T_CustomMemoryContext would be a better choice, but that only works in
master, of course.

Also, it won't work if we need to add memory contexts to equalfuncs.c
etc. but maybe won't need that - it's more a theoretical issue.



https://github.com/heterodb/pg-strom/blob/master/src/shmbuf.c#L1243
I recently implemented a custom memory context for shared memory allocation
with portable pointers. It shall be used for cache of pre-built gpu
binary code and
metadata cache of apache arrow files.
However, the assertion check above requires extension to set a fake node-tag
to avoid backend crash. Right now, it is harmless to set T_AllocSetContext, but
feel a bit bad.



Interesting. Does that mean the shared memory contexts are part of the
same hierarchy as "normal" contexts? That would be a bit confusing, I
think.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



