Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2019-09-13 Thread Amit Kapila
On Sat, Sep 14, 2019 at 10:10 AM Michael Paquier  wrote:
>
> On Sat, Sep 14, 2019 at 09:25:31AM +0530, Amit Kapila wrote:
> > On Fri, Sep 13, 2019 at 5:31 PM Alvaro Herrera  
> > wrote:
> >> A thought I had as I fell asleep last night is to include the derivate
> >> flags in a separate output column altogether.  So
> >> heap_tuple_infomask_flags() could be made to return two columns, one
> >> with the straight one-flag-per-bit, and another one with the compound
> >> flags.
> >
> > So, in most cases, the compound column will be empty, but in some
> > cases like HEAP_XMIN_FROZEN, HEAP_XMAX_SHR_LOCK, etc. the flag will be
> > displayed.  I like this idea; though it will be a bit of noise in some
> > cases, it is neat.  Another benefit is that one doesn't need to
> > invoke this function twice to see the compound flags.
>
> Hmmm.  Doesn't it become less user-friendly to invoke the function
> then?  You would need to pass it down to the FROM clause after
> fetching the raw page and then parsing its tuple items to have
> t_infomask and t_infomask2 passed down as arguments to the new
> function.  The one-column version has the advantage to be more
> consistent with tuple_data_split() after getting all the values parsed
> by heap_page_items().
>

Won't the LATERAL clause be helpful here?  The patch already contains it
in one of its tests.
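
For instance, something like this (a sketch assuming the two-argument
signature from the patch; 'mytable' is just a placeholder):

  SELECT p.lp, f.*
  FROM heap_page_items(get_raw_page('mytable', 0)) AS p,
       LATERAL heap_tuple_infomask_flags(p.t_infomask, p.t_infomask2) AS f;

That avoids invoking heap_page_items() once per argument.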

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: pgbench - allow to create partitioned tables

2019-09-13 Thread Amit Kapila
On Fri, Sep 13, 2019 at 6:36 PM Alvaro Herrera  wrote:
>
> On 2019-Sep-13, Amit Kapila wrote:
>
> > I would like to take inputs from others as well for the display part
> > of this patch.  After this patch, for a simple-update pgbench test,
> > the changed output will be as follows (note: partition method and
> > partitions):
>
> > pgbench.exe -c 4 -j 4 -T 10 -N postgres
> > starting vacuum...end.
> > transaction type: <builtin: simple update>
> > scaling factor: 1
> > partition method: hash
> > partitions: 3
> > query mode: simple
> > number of clients: 4
> > number of threads: 4
> > duration: 10 s
> > number of transactions actually processed: 14563
> > latency average = 2.749 ms
> > tps = 1454.899150 (including connections establishing)
> > tps = 1466.689412 (excluding connections establishing)
> >
> > What do others think about this?  This will be the case when the user
> > has used --partitions option in pgbench, otherwise, it won't change.
>
> I wonder what's the intended usage of this output ... it seems to be
> getting a bit too long.  Is this intended for machine processing?  I
> would rather have more things per line in a more compact header.
> But then I'm not the kind of person who automates multiple pgbench runs.
> Maybe we can get some input from Tomas, who does -- how do you automate
> extracting data from collected pgbench output, or do you instead just
> redirect the output to a file whose path/name indicates the parameters
> that were used?  (I do the latter.)
>
> I mean, if we changed it like this (and I'm not proposing to do it in
> this patch, this is only an example), would it bother anyone?
>
> $ pgbench -x -y -z ...
> starting vacuum...end.
> scaling factor: 1  partition method: hash   partitions: 1
> transaction type: <builtin: simple update>   query mode: simple
> number of clients: 4   number of threads: 4 duration: 10s
> number of transactions actually processed: 14563
> latency average = 2.749 ms
> tps = 1454.899150 (including connections establishing)
> tps = 1466.689412 (excluding connections establishing)
>
>
> If this output doesn't bother people, then I suggest that this patch
> should put the partition information in the line together with scaling
> factor.
>

IIUC, there are two things here: (a) you seem to be fine with displaying
the 'partitions' and 'partition method' information, and (b) you would
prefer to put it on the same line as 'scaling factor'.

I personally prefer each parameter to be displayed on a separate line,
but I am fine if more people would like to see multiple parameters on a
single line.  I think it is better to do that (point (b)) as a separate
patch even if we agree on changing the display format.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2019-09-13 Thread Michael Paquier
On Sat, Sep 14, 2019 at 09:25:31AM +0530, Amit Kapila wrote:
> On Fri, Sep 13, 2019 at 5:31 PM Alvaro Herrera  
> wrote:
>> A thought I had as I fell asleep last night is to include the derivate
>> flags in a separate output column altogether.  So
>> heap_tuple_infomask_flags() could be made to return two columns, one
>> with the straight one-flag-per-bit, and another one with the compound
>> flags.
> 
> So, in most cases, the compound column will be empty, but in some
> cases like HEAP_XMIN_FROZEN, HEAP_XMAX_SHR_LOCK, etc. the flag will be
> displayed.  I like this idea; though it will be a bit of noise in some
> cases, it is neat.  Another benefit is that one doesn't need to
> invoke this function twice to see the compound flags.

Hmmm.  Doesn't it become less user-friendly to invoke the function
then?  You would need to pass it down to the FROM clause after
fetching the raw page and then parsing its tuple items to have
t_infomask and t_infomask2 passed down as arguments to the new
function.  The one-column version has the advantage to be more
consistent with tuple_data_split() after getting all the values parsed
by heap_page_items().

>>  That way we always have the raw ones available, and we avoid any
>> confusion about strange cases such as LOCK_UPGRADED and IS_LOCKED_ONLY.
> 
> Yeah, but I am not sure if we want to display the LOCK_UPGRADED stuff
> in the compound column, as that is not directly comparable to the other
> flags we want to display there, like HEAP_XMIN_FROZEN and
> HEAP_XMAX_SHR_LOCK.

Yep, I agree that this one ought not to be considered a proper
combination.  The other three are fine, though.
--
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2019-09-13 Thread Michael Paquier
On Sat, Sep 14, 2019 at 09:51:33AM +0530, Amit Kapila wrote:
> Yeah, but I think we should also try to see what we want to do about
> the 'decode_combined' flag-related point; maybe we can adapt to what
> Alvaro has proposed?

Thanks, I'll keep note of this patch.  I was just going to comment on
the other point raised :)
--
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] CLUSTER command progress monitor

2019-09-13 Thread Michael Paquier
On Sat, Sep 14, 2019 at 01:06:32PM +0900, Tattsu Yama wrote:
> Thanks! I can review your patch to fix it.
> However, I had already started fixing the problem on the last day of
> PGConf.Asia (11 Sep).
> The attached file is a WIP patch.  In my patch, I added a "command id" to
> all APIs of progress reporting to isolate commands.  Therefore, it doesn't
> allow cascaded updates of the system views.  My patch is still WIP, so it
> needs clean-up and testing.
> I share it anyway. :)

+   if (cmdtype == PROGRESS_COMMAND_INVALID || beentry->st_progress_command == cmdtype)
+   {
+   PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+   beentry->st_progress_param[index] = val;
+   PGSTAT_END_WRITE_ACTIVITY(beentry);
+   }
You basically don't need the progress reports if the command ID is
invalid, no?

Another note is that you don't actually fix the problems related to
the calls of pgstat_progress_end_command() which have been added for
REINDEX reporting, so a progress report started for CLUSTER can get
ended earlier than expected, preventing the follow-up progress updates
from showing up.
--
Michael


signature.asc
Description: PGP signature


Re: BF failure: could not open relation with OID XXXX while querying pg_views

2019-09-13 Thread Tom Lane
Tomas Vondra  writes:
> On Wed, Aug 14, 2019 at 05:24:26PM +1200, Thomas Munro wrote:
>> On Wed, Aug 14, 2019 at 5:06 PM Tom Lane  wrote:
>>> Oh, hmm --- yeah, that should mean it's safe.  Maybe somebody incautiously
>>> changed one of the other tests that run concurrently with "rules"?

>> Looks like stats_ext.sql could be the problem.  It creates and drops
>> priv_test_view, not in a schema.  Adding Dean, author of commit
>> d7f8d26d.

> Yeah, that seems like it might be the cause. I'll take a look at fixing
> this, probably by creating the view in a different schema.

Ping?  We're still getting intermittent failures of this ilk, eg

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dragonet&dt=2019-09-14%2003%3A37%3A03

With v12 release approaching, I'd like to not have failures
like this in a released branch.

regards, tom lane




Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2019-09-13 Thread Amit Kapila
On Fri, Sep 13, 2019 at 10:42 AM Michael Paquier  wrote:
>
> On Fri, Sep 13, 2019 at 09:59:40AM +0530, Amit Kapila wrote:
> > I think that is what we have not done in one of the cases pointed by me.
>
> Thinking more about it, I see your point now.  HEAP_LOCKED_UPGRADED is
> not a direct combination of the other flags and depends on other
> conditions, so we cannot make a combination of it with other things.
> The three others don't have that problem.
>
> Attached is a patch to address your suggestions.  This also removes the
> use of HEAP_XMAX_IS_LOCKED_ONLY, which did not make complete sense
> either as a "raw" flag.  While on it, the order of the flags can be
> improved to better match the order of htup_details.h.
>
> Does this patch address your concerns?
>

Yeah, but I think we should also try to see what we want to do about
the 'decode_combined' flag-related point; maybe we can adapt to what
Alvaro has proposed?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: pg_rewind docs correction

2019-09-13 Thread Michael Paquier
On Fri, Sep 13, 2019 at 01:47:03PM -0400, James Coleman wrote:
> So I've attached a patch to summarize more correctly as well as
> document clearly the state of the cluster after the operation and also
> the operation sequencing dangers caused by copying configuration
> files from the source.

+   After a successful rewind, the target data directory is equivalent to the
+   state of the data directory at the point at which the source and target
+   diverged plus the current state on the source of any blocks changed on the
+   target after that divergence. While only changed blocks from relation files
+   are copied, all other files are copied in full, including configuration
+   files and WAL segments. The advantage of pg_rewind over taking a new base
+   backup, or tools like rsync, is that pg_rewind does not require comparing
+   or copying unchanged relation blocks in the cluster. As such the rewind
+   operation is significantly faster than other approaches when the database
+   is large and only a small fraction of blocks differ between the clusters.

The point of divergence could be defined as the LSN position where WAL
has forked on the new timeline, but the block diffs are actually copied
from the last checkpoint just before WAL forked.  So this new
paragraph brings confusion about the actual divergence point.

Regarding the relation files, if a file does not exist on the target
but does exist on the source, it is also copied in full, so the second
sentence is wrong here, as relation files can also be copied fully.
--
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] CLUSTER command progress monitor

2019-09-13 Thread Tattsu Yama
Hi Alvaro!



> Hello Tatsuro,
> On 2019-Aug-13, Tatsuro Yamada wrote:
> > On 2019/08/02 3:43, Alvaro Herrera wrote:
> > > Hmm, I'm trying this out now and I don't see the index_rebuild_count
> > > ever go up.  I think it's because the indexes are built using parallel
> > > index build ... or maybe it was the table AM changes that moved things
> > > around, not sure.  There's a period at the end when the CLUSTER command
> > > keeps working, but it's gone from pg_stat_progress_cluster.
> >
> > Thanks for your report.
> > I'll investigate it. :)
>


> I have fixed it.  Can you please verify?


Thanks! I can review your patch to fix it.
However, I had already started fixing the problem on the last day of
PGConf.Asia (11 Sep).
The attached file is a WIP patch.  In my patch, I added a "command id" to
all APIs of progress reporting to isolate commands.  Therefore, it doesn't
allow cascaded updates of the system views.  My patch is still WIP, so it
needs clean-up and testing.
I share it anyway. :)

Here is a test result of my patch.
The last column index_rebuild count is increased.

postgres=# select * from pg_stat_progress_cluster ; \watch 0.001;
11636|13591|postgres|16384|CLUSTER|initializing|0|0|0|0|0|0
11636|13591|postgres|16384|CLUSTER|index scanning heap|16389|251|251|0|0|0
...
11636|13591|postgres|16384|CLUSTER|index scanning heap|16389|1|1|0|0|0
11636|13591|postgres|16384|CLUSTER|rebuilding index|16389|1|1|0|0|0
...
11636|13591|postgres|16384|CLUSTER|rebuilding index|16389|1|1|0|0|1
...
11636|13591|postgres|16384|CLUSTER|rebuilding index|16389|1|1|0|0|2
...
11636|13591|postgres|16384|CLUSTER|rebuilding index|16389|1|1|0|0|3
...
11636|13591|postgres|16384|CLUSTER|rebuilding index|16389|1|1|0|0|4
...
11636|13591|postgres|16384|CLUSTER|performing final cleanup|16389|1|1|0|0|5


Thanks,
Tatsuro Yamada


v1_fix_progress_report_for_cluster.patch
Description: Binary data


Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2019-09-13 Thread Amit Kapila
On Fri, Sep 13, 2019 at 5:31 PM Alvaro Herrera  wrote:
>
> On 2019-Sep-13, Michael Paquier wrote:
>
> > Attached is a patch to address your suggestions.  This also removes the
> > use of HEAP_XMAX_IS_LOCKED_ONLY, which did not make complete sense
> > either as a "raw" flag.  While on it, the order of the flags can be
> > improved to better match the order of htup_details.h.
>
> A thought I had as I fell asleep last night is to include the derivate
> flags in a separate output column altogether.  So
> heap_tuple_infomask_flags() could be made to return two columns, one
> with the straight one-flag-per-bit, and another one with the compound
> flags.
>

So, in most cases, the compound column will be empty, but in some
cases like HEAP_XMIN_FROZEN, HEAP_XMAX_SHR_LOCK, etc. the flag will be
displayed.  I like this idea; though it will be a bit of noise in some
cases, it is neat.  Another benefit is that one doesn't need to
invoke this function twice to see the compound flags.

>  That way we always have the raw ones available, and we avoid any
> confusion about strange cases such as LOCK_UPGRADED and IS_LOCKED_ONLY.
>

Yeah, but I am not sure if we want to display the LOCK_UPGRADED stuff
in the compound column, as that is not directly comparable to the other
flags we want to display there, like HEAP_XMIN_FROZEN and
HEAP_XMAX_SHR_LOCK.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] CLUSTER command progress monitor

2019-09-13 Thread Michael Paquier
On Fri, Sep 13, 2019 at 12:48:40PM -0400, Robert Haas wrote:
> On Fri, Sep 13, 2019 at 12:03 PM Alvaro Herrera
>  wrote:
>> Ummm ... I've been operating --in this thread-- under the assumption
>> that it is REINDEX to blame for this problem, not CREATE INDEX, because
>> my recollection is that I tested CREATE INDEX together with CLUSTER and
>> it worked fine.  Has anybody done any actual research that the problem
>> is to blame on CREATE INDEX and not REINDEX?
> 
> I am not sure. I think, though, that the point is that all three
> commands rebuild indexes. So unless they all expect the same things in
> terms of which counters get set during that process, things will not
> work correctly.

I have provided a short summary of the two issues on the open item
page (https://wiki.postgresql.org/wiki/PostgreSQL_12_Open_Items) as
the open item was too evasive.  Here is a copy-paste for the
archives of what I wrote:
1) A progress report may be started while another one is already in
progress.  Hence, if the new one gets stopped, the previously-started
state is removed, causing all follow-up updates to not happen.
2) Progress updates happening in a code path shared between those
three commands may clobber a previously-set state.

Regarding 1) and based on what I found in the code, you can blame
REINDEX reporting which has added progress_start calls in code paths
which are also taken by CREATE INDEX and CLUSTER, causing their
progress reporting to go to the void.  In order to fix this one we
could do what I summarized in [1].

As mentioned by Robert, the problem summarized in 2) is much more
complex using the current infrastructure, and one could blame all the
commands per the way they do not share the same set of progress
phases.  There are a couple of potential solutions which have been
discussed on the thread:
- Allow commands to share the same set of phases, which requires some
kind of mapping between the phases (?).
- Allow progress reports to become a stack.  This would also take care
of any such issues in 1) for the future, though it can cause the
incorrect command to be reported in pg_stat_activity if one is not
careful.
- Allow reporting only for a given command ID, which would basically
require passing down the command ID to the progress update APIs and
bypassing an update if the command ID provided by the caller does not
match the existing one started (?).

1) is pretty easy to fix based on the current state of the code; 2)
requires much more consideration, and that's not material for v12.  It
could perfectly well be that we have another solution not yet discussed
on this thread.

[1]: https://www.postgresql.org/message-id/20190905010316.gb14...@paquier.xyz
--
Michael


signature.asc
Description: PGP signature


Re: Create collation reporting the ICU locale display name

2019-09-13 Thread Michael Paquier
On Fri, Sep 13, 2019 at 11:09:52AM -0400, Tom Lane wrote:
> I think that's a useful function, but it's a different function from
> the one first proposed, which was to tell you the properties of a
> collation you already installed (which might not be ICU, even).
> Perhaps we should have both.

Perhaps.  Having a default description for the collations imported by
initdb is nice to have, but because of the gap with collations defined
after initialization it seems to me that there is an argument to
switch to that function for psql instead of grepping the default
description added to pg_description.  Enforcing a comment for a
collation manually created based on what libicu tells us does not
feel right either, as we don't enforce a comment for the creation of
other objects.
--
Michael


signature.asc
Description: PGP signature


Re: refactoring - share str2*int64 functions

2019-09-13 Thread Andres Freund
On 2019-09-10 12:05:25 +0900, Michael Paquier wrote:
> On Mon, Sep 09, 2019 at 05:27:04AM -0700, Andres Freund wrote:
> > On 2019-09-09 20:57:46 +0900, Michael Paquier wrote:
> > But ISTM all of them ought to just use the C types, rather than the SQL
> > types however. Since in the above proposal the caller determines the
> > type names, if you want a different type - like the SQL input routines -
> > can just invoke pg_strtoint_error() themselves (or just have it open
> > coded).
> 
> Yep, that was my line of thoughts.
> 
> >> And for errors which should never happen we could just use
> >> elog().  For the input functions of int2/4/8 we still need the
> >> existing errors of course.
> > 
> > Right, there it makes sense to continue to refer the SQL level types.
> 
> Actually, I found your suggestion of using a noreturn function for the
> error reporting to be a very clean alternative.  I didn't know though
> that gcc is not able to detect that a function does not return if you
> don't have a default in the switch for all the status codes.  And this
> even if all the values of the enum for the switch are listed.

As I proposed they'd be in different translation units, so the compiler
wouldn't see the definition of the function, just the declaration.


> >> Not sure about that.  I would keep the scope of the patch simple as of
> >> now, where we make sure that we have the right interface for
> >> everything.  There are a couple of extra improvements which could be
> >> done afterwards, and if we move everything in the same place that
> >> should be easier to move on with more improvements.  Hopefully.
> > 
> > The only reason for thinking about it now is that we'd then avoid
> > changing the API twice.
> 
> What I think we would be looking for here is an extra argument for the
> low-level routines to control the behavior of the function in an
> extensible way, say a bits16 for a set of flags, with one flag to
> ignore checks for trailing and leading whitespace.

That'd probably be a bad idea, for performance reasons.



> Attached is an updated patch.  How does it look?  I have left the
> parts of readfuncs.c for now as there are more issues behind that than
> doing a single switch, short reads are one, long reads a second.

Hm? I don't know what you mean by those issues.


> And the patch already does a lot.  There could be also an argument for
> having extra _check wrappers for the unsigned portions but these would
> be mostly unused in the backend code, so I have left that out on
> purpose.

I'd value consistency higher here.




> diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
> index 2c0ae395ba..8e75d52b06 100644
> --- a/src/backend/executor/spi.c
> +++ b/src/backend/executor/spi.c
> @@ -21,6 +21,7 @@
>  #include "catalog/heap.h"
>  #include "catalog/pg_type.h"
>  #include "commands/trigger.h"
> +#include "common/string.h"
>  #include "executor/executor.h"
>  #include "executor/spi_priv.h"
>  #include "miscadmin.h"
> @@ -2338,8 +2339,7 @@ _SPI_execute_plan(SPIPlanPtr plan, ParamListInfo paramLI,
>  			CreateTableAsStmt *ctastmt = (CreateTableAsStmt *) stmt->utilityStmt;
>  
>  			if (strncmp(completionTag, "SELECT ", 7) == 0)
> -				_SPI_current->processed =
> -					pg_strtouint64(completionTag + 7, NULL, 10);
> +				(void) pg_strtouint64(completionTag + 7, &_SPI_current->processed);

I'd just use the checked version here, seems like a good thing to check
for, and I can't imagine it matters performance wise.


> @@ -63,8 +63,16 @@ Datum
>  int2in(PG_FUNCTION_ARGS)
>  {
>   char   *num = PG_GETARG_CSTRING(0);
> + int16   res;
> + pg_strtoint_status status;
>  
> - PG_RETURN_INT16(pg_strtoint16(num));
> + /* Use a custom set of error messages here adapted to the data type */
> +	status = pg_strtoint16(num, &res);

I don't know what that comment is supposed to mean?

> +/*
> + * pg_strtoint64_check
> + *
> + * Convert input string to a signed 64-bit integer.
> + *
> + * This throws ereport() upon bad input format or overflow.
> + */
> +int64
> +pg_strtoint64_check(const char *s)
> +{
> + int64   result;
+	pg_strtoint_status status = pg_strtoint64(s, &result);
> +
> + if (unlikely(status != PG_STRTOINT_OK))
> + pg_strtoint_error(status, s, "int64");
> + return result;
> +}

I think I'd just put these as inlines in the header.



Greetings,

Andres Freund




Re: Duplicated LSN in ReorderBuffer

2019-09-13 Thread Andres Freund
Hi,

On 2019-09-13 16:42:47 -0300, Alvaro Herrera wrote:
> On 2019-Sep-10, Alvaro Herrera from 2ndQuadrant wrote:
> 
> > Here's a couple of patches.
> > 
> > always_decode_assignment.patch is Masahiko Sawada's patch, which has
> > been confirmed to fix the assertion failure.
> 
> I pushed this one to all branches.  Thanks Ildar for reporting and
> Sawada-san for fixing, and reviewers.
> 
> If you (Andres) want to propose a different fix for this, be my guest.
> We can always revert this one if you have a different better fix.

I'm bit surprised to see this go in just now, after I asked for the
changes you were reporting as not working for three weeks, and you sent
them out three days ago (during which I was at the linux plumbers
conference)...

Greetings,

Andres Freund




Re: allow_system_table_mods stuff

2019-09-13 Thread Alvaro Herrera
On 2019-Jun-28, Peter Eisentraut wrote:

> Here is a new patch after the discussion.
> 
> - Rename allow_system_table_mods to allow_system_table_ddl.
> 
> (This makes room for a new allow_system_table_dml, but it's not
> implemented here.)
> 
> - Make allow_system_table_ddl SUSET.
> 
> - Add regression test.
> 
> - Remove the behavior that allow_system_table_mods allowed
> non-superusers to do DML on catalog tables without further access checking.
> 
> I think there was general agreement on all these points.

I think this patch is at a point where it merits closer review from
fellow committers, so I marked it RfC for now.  I hope non-committers
would also look at it some more, though.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Psql patch to show access methods info

2019-09-13 Thread Alvaro Herrera
On 2019-Aug-06, Alexander Korotkov wrote:

> Revised patch is attached.  Changes to \dA+ command are reverted.  It
> also contains some minor improvements.
> 
> Second patch looks problematic for me, because it provides index
> description alternative to \d+.  IMHO, if there is something really
> useful to display about index, we should keep it in \d+.  So, I
> propose to postpone this.

Are you saying that we should mark this entire CF entry as Returned with
Feedback?  Or do you see a subset of your latest 0001 as a commitable
patch?

Thanks

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH][PROPOSAL] Add enum releation option type

2019-09-13 Thread Alvaro Herrera
On 2019-Sep-13, Nikolay Shaplov wrote:

> I've played around with it and did some googling, but without much success.
> If we are moving this way (and this way seems to be a good one), I need your
> help, because this thing is beyond my C knowledge; I will not manage it
> myself.

Well, you need to change the definition of the struct correspondingly
also, which your patch doesn't show you doing.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Allow to_date() and to_timestamp() to accept localized names

2019-09-13 Thread Alvaro Herrera
On 2019-Aug-22, Juan José Santamaría Flecha wrote:

> On Sun, Aug 18, 2019 at 10:42 AM Juan José Santamaría Flecha
>  wrote:
> >
> > Going through the current items in the wiki's todo list, I have been
> > looking into: "Allow to_date () and to_timestamp () to accept
> > localized month names".
> 
> I have gone through a second take on this, trying to give it a better
> shape but it would surely benefit from some review, so I will open an
> item in the commitfest.

I'm confused why we acquire the MONTH_DIM / etc definitions.  Can't we
just use lengthof() of the corresponding array?  AFAICS it should work
just as well.

I wonder if the "compare first char" thing (seq_search_localized) really
works when there are multibyte chars in the day/month names.  I think
the code compares just the first char ... but what if the original
string uses those funny Unicode non-normalized letters and the locale
translation uses normalized letters?  My guess is that the first-char
comparison will fail, but surely you'll want the name to match.
(There's no month/day name in Spanish that doesn't start with an ASCII
letter, but I bet there are some in other languages.)  I think the
localized comparison should avoid the first-char optimization, just
compare the whole string all the time, and avoid possible weird issues.

But then, I'm not 100% sure of this code.  formatting.c is madness
distilled.

Thanks,

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [PATCH] Generic type subscripting

2019-09-13 Thread Dmitry Dolgov
> On Thu, Sep 12, 2019 at 3:58 AM Alvaro Herrera  
> wrote:
> Can you please send an updated version?

Sure, I'll send it in a few days.




Re: [HACKERS] CLUSTER command progress monitor

2019-09-13 Thread Alvaro Herrera
On 2019-Sep-13, Robert Haas wrote:

> On Fri, Sep 13, 2019 at 2:49 AM Michael Paquier  wrote:
> > On Fri, Sep 06, 2019 at 08:10:58AM -0400, Robert Haas wrote:
> > > It's fine if things are updated as well -- it's just you need to make
> > > sure that those places know whether or not they are supposed to be
> > > doing those updates. Again, why can't we just pass down a value
> > > telling them "do reindex-style progress updates" or "do cluster-style
> > > progress updates" or "do no progress updates"?
> >
> > That's invasive.  CREATE INDEX reporting goes pretty deep into the
> > tree, with steps dedicated to the builds and scans of btree (nbtsort.c
> > for example) and hash index AMs.

> Well, if CREATE INDEX progress reporting can't be reasonably done
> within the current infrastructure, then it should be reverted for v12
> and not committed again until somebody upgrades the infrastructure.

Ummm ... I've been operating --in this thread-- under the assumption
that it is REINDEX to blame for this problem, not CREATE INDEX, because
my recollection is that I tested CREATE INDEX together with CLUSTER and
it worked fine.  Has anybody done any actual research that the problem
is to blame on CREATE INDEX and not REINDEX?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Improve performance of NOTIFY over many databases (v2)

2019-09-13 Thread Tom Lane
Martijn van Oosterhout  writes:
> Here is the rebased second patch.

This throws multiple compiler warnings for me:

async.c: In function 'asyncQueueUnregister':
async.c:1293: warning: unused variable 'advanceTail'
async.c: In function 'asyncQueueAdvanceTail':
async.c:2153: warning: 'slowbackendpid' may be used uninitialized in this function

Also, I don't exactly believe this bit:

+/* If we are advancing to a new page, remember this so after the
+ * transaction commits we can attempt to advance the tail
+ * pointer, see ProcessCompletedNotifies() */
+if (QUEUE_POS_OFFSET(QUEUE_HEAD) == 0)
+backendTryAdvanceTail = true;

It seems unlikely that insertion would stop exactly at a page boundary,
but that seems to be what this is looking for.

But, really ... do we need the backendTryAdvanceTail flag at all?
I'm dubious, because it seems like asyncQueueReadAllNotifications
would have already covered the case if we're listening.  If we're
not listening, but we signalled some other listeners, it falls
to them to kick us if we're the slowest backend.  If we're not the
slowest backend then doing asyncQueueAdvanceTail isn't useful.

I agree with getting rid of the asyncQueueAdvanceTail call in
asyncQueueUnregister; on reflection doing that there seems pretty unsafe,
because we're not necessarily in a transaction and hence anything that
could possibly error is a bad idea.  However, it'd be good to add a
comment explaining that we're not doing that and why it's ok not to.

I'm fairly unimpressed with the "kick a random slow backend" logic.
There can be no point in kicking any but the slowest backend, ie
one whose pointer is exactly the oldest.  Since we're already computing
the min pointer in that loop, it would actually take *less* logic inside
the loop to remember the/a backend that had that pointer value, and then
decide afterwards whether it's slow enough to merit a kick.
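
For illustration, that loop shape could look roughly like this (the names
are modeled loosely on async.c rather than copied from it;
QUEUE_POS_PRECEDES_OR_EQUAL and SignalSlowBackend are invented stand-ins,
and the page arithmetic ignores wraparound):

	QueuePosition	min = QUEUE_HEAD;
	BackendId		slowest = InvalidBackendId;

	for (BackendId i = 1; i <= MaxBackends; i++)
	{
		if (QUEUE_BACKEND_PID(i) == InvalidPid)
			continue;			/* not listening */
		if (QUEUE_POS_PRECEDES_OR_EQUAL(QUEUE_BACKEND_POS(i), min))
		{
			min = QUEUE_BACKEND_POS(i);
			slowest = i;		/* remember a backend holding the minimum */
		}
	}

	/* afterwards, kick only if the minimum lags the head by enough pages */
	if (slowest != InvalidBackendId &&
		QUEUE_POS_PAGE(QUEUE_HEAD) - QUEUE_POS_PAGE(min) >= QUEUE_CLEANUP_DELAY)
		SignalSlowBackend(slowest);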

regards, tom lane




Re: Modest proposal for making bpchar less inconsistent

2019-09-13 Thread Pavel Stehule
On Fri, Sep 13, 2019 at 16:43 Tom Lane wrote:

> It struck me that the real reason that we keep getting gripes about
> the weird behavior of CHAR(n) is that these functions (and, hence,
> their corresponding operators) fail to obey the "trailing blanks
> aren't significant" rule:
>
>            regprocedure          |    prosrc
> ---------------------------------+----------------
>  bpcharlike(character,text)      | textlike
>  bpcharnlike(character,text)     | textnlike
>  bpcharicregexeq(character,text) | texticregexeq
>  bpcharicregexne(character,text) | texticregexne
>  bpcharregexeq(character,text)   | textregexeq
>  bpcharregexne(character,text)   | textregexne
>  bpchariclike(character,text)    | texticlike
>  bpcharicnlike(character,text)   | texticnlike
>
> They're just relying on binary compatibility of bpchar to text ...
> but of course textlike etc. think trailing blanks are significant.
>
> Every other primitive operation we have for bpchar correctly ignores
> the trailing spaces.
>
> We could fix this, and save some catalog space too, if we simply
> deleted these functions/operators and let such calls devolve
> into implicit casts to text.
>
> This might annoy people who are actually writing trailing spaces
> in their patterns to make such cases work.  But I think there
> are probably not too many such people, and having real consistency
> here is worth something.
>

makes sense

Pavel

>
> regards, tom lane
>
>
>
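
For reference, the inconsistency is easy to demonstrate (assuming stock
behavior; the equality ignores the padding while the pattern match does not):

  select 'ab'::char(4) = 'ab';     -- true: bpchar equality ignores trailing blanks
  select 'ab'::char(4) like 'ab';  -- false: bpcharlike delegates to textlike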


Re: Duplicated LSN in ReorderBuffer

2019-09-13 Thread Alvaro Herrera
On 2019-Sep-10, Alvaro Herrera from 2ndQuadrant wrote:

> Here's a couple of patches.
> 
> always_decode_assignment.patch is Masahiko Sawada's patch, which has
> been confirmed to fix the assertion failure.

I pushed this one to all branches.  Thanks Ildar for reporting and
Sawada-san for fixing, and reviewers.

If you (Andres) want to propose a different fix for this, be my guest.
We can always revert this one if you have a different better fix.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Write visibility map during CLUSTER/VACUUM FULL

2019-09-13 Thread Alexander Korotkov
On Thu, Sep 12, 2019 at 4:55 PM Alexander Korotkov
 wrote:
> On Wed, Sep 11, 2019 at 3:30 PM Amit Kapila  wrote:
> > On Sun, Sep 1, 2019 at 1:37 PM Alexander Korotkov
> >  wrote:
> > > I found it weird that CLUSTER/VACUUM FULL don't write visibility map.
> > > Attached patch implements writing visibility map in
> > > heapam_relation_copy_for_cluster().
> > >
> > > I've studied previous attempt to implement this [1].  The main problem
> > > of that attempt was usage of existing heap_page_is_all_visible() and
> > > visibilitymap_set() functions.  These functions works through buffer
> > > manager, while heap rewriting is made bypass buffer manager.
> > >
> > > In my patch visibility map pages are handled in the same way as heap
> > > pages are.
> > >
> >
> > I haven't studied this patch in detail, but while glancing I observed
> > that this doesn't try to sync the vm pages as we do for heap pages in
> > the end (during end_heap_rewrite).  Am I missing something?
>
> You haven't missed anything.  Yes, the VM needs a sync.  Will fix this.  And
> I just noticed I need a closer look at what is going on with TOAST.

Attached patch syncs the VM during end_heap_rewrite().

However, the VM for TOAST still isn't written.  It appears to be much
more difficult to write the VM for TOAST, because it's written by
inserting tuples one-by-one.  Although it seems to fill TOAST heap pages
sequentially (assuming no FSM exists yet), it's quite hard to handle the
page-switching event with a reasonable level of abstraction.
Nevertheless, I find this patch useful in its current shape.  Even if we
don't write the VM for TOAST, it's still useful to do it for the regular
heap.  Additionally, one of the key advantages of having the VM is
index-only scans, which don't work for TOAST anyway.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


0001-write-vm-during-cluster-3.patch
Description: Binary data


range test for hash index?

2019-09-13 Thread Paul A Jungwirth
Hello,

I noticed the tests for range types do this:

create table numrange_test2(nr numrange);
create index numrange_test2_hash_idx on numrange_test2 (nr);

Does that need a `using hash`? It seems like that's the intention. We
only use that table for equality comparisons. The script already
creates a table with a btree index further up. If I don't drop the
table I can see it's not a hash index:

regression=# \d numrange_test2
   Table "public.numrange_test2"
 Column |   Type   | Collation | Nullable | Default
+--+---+--+-
 nr | numrange |   |  |
Indexes:
"numrange_test2_hash_idx" btree (nr)

Everything else passes if I change just that one line in the
{sql,expected} files.
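
That is, presumably:

  create index numrange_test2_hash_idx on numrange_test2 using hash (nr);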

Regards,
Paul




Re: [DOC] Document auto vacuum interruption

2019-09-13 Thread James Coleman
On Sat, Aug 31, 2019 at 10:51 PM Amit Kapila  wrote:
>
> On Fri, Jul 26, 2019 at 1:45 AM James Coleman  wrote:
> >
> > We've discussed this internally many times, but today finally decided
> > to write up a doc patch.
> >
>
> Thanks, I think something on the lines of what you have written can
> help some users to understand the behavior in this area and there
> doesn't seem to be any harm in giving such information to the user.
>
> > Autovacuum holds a SHARE UPDATE EXCLUSIVE lock, but other processes
> > can cancel autovacuum if blocked by that lock unless the autovacuum is
> > to prevent wraparound.  This can result in very surprising behavior:
> > imagine a system that needs to run ANALYZE manually before batch jobs
> > to ensure reasonable query plans. That ANALYZE will interrupt attempts
> > to run autovacuum, and pretty soon the table is far more bloated than
> > expected, and query plans (ironically) degrade further.
> >
>
> +If a process attempts to acquire a SHARE UPDATE EXCLUSIVE lock
> +(the lock type held by autovacuum), lock acquisition will interrupt
> +the autovacuum.
>
> I think it is not only a process that tries to acquire a lock in
> SHARE UPDATE EXCLUSIVE mode; rather, it is any process that tries to
> acquire a lock mode that conflicts with SHARE UPDATE EXCLUSIVE.  For the
> conflicting lock modes, you can refer to the docs [1] (See Table 13.2,
> Conflicting Lock Modes).
>
> [1] - https://www.postgresql.org/docs/devel/explicit-locking.html

Updated patch attached. I changed the wording to be about conflicting
locks rather than a single lock type, added a link to the conflicting
locks table, and fixed a few sgml syntax issues in the original.

James Coleman


autovacuum-interruption-v2.patch
Description: Binary data


pg_rewind docs correction

2019-09-13 Thread James Coleman
The pg_rewind docs assert that the state of the target's data directory
after rewind is equivalent to the source's data directory. But that
isn't true both because the base state is further back in time and
because the target's data directory will include the current state on
the source of any copied blocks.

So I've attached a patch to summarize more correctly as well as
document clearly the state of the cluster after the operation and also
the operation sequencing dangers caused by copying configuration
files from the source.

James Coleman


001_pg_rewind_explanation_doc_fixes.patch
Description: Binary data


Re: [HACKERS] CLUSTER command progress monitor

2019-09-13 Thread Alvaro Herrera
Hello Tatsuro,

On 2019-Aug-13, Tatsuro Yamada wrote:

> On 2019/08/02 3:43, Alvaro Herrera wrote:
> > Hmm, I'm trying this out now and I don't see the index_rebuild_count
> > ever go up.  I think it's because the indexes are built using parallel
> > index build ... or maybe it was the table AM changes that moved things
> > around, not sure.  There's a period at the end when the CLUSTER command
> > keeps working, but it's gone from pg_stat_progress_cluster.
> 
> Thanks for your report.
> I'll investigate it. :)

I have fixed it.  Can you please verify?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: psql - improve test coverage from 41% to 88%

2019-09-13 Thread Fabien COELHO



Hello Alvaro,

>>> I think the TestLib.pm changes should be done separately, not together
>>> with the rest of the hacking in this patch.
>>>
>>> Mostly, because I think they're going to cause trouble.  Adding a
>>> parameter in the middle of the list may cause trouble for third-party
>>> users of TestLib.

>> That is also what I thought, however, see below.

> I see.  But you seem to have skipped my suggestion without considering
> it.

I did understand it, but as Tom did not want simple hocus-pocus, ISTM that
dynamically checking the argument type would not be considered a very good
idea either.

> I think the current API of these functions where they just receive a
> plain array of arguments, and all callers have to be patched in unison,
> is not very convenient.

I agree, but the no-diff solution was rejected.  I can bring one back, but
going against Tom's views has not proven a good move in the past.

>>> Also, I *think* your new icommand_checks method is the same as
>>> command_checks_all, except that you also have the "init" part.

>> Nope, it is an interactive version based on Expect, which sends input and
>> waits for output; the process is quite different from a simple one-shot,
>> no-timeout exec version.

> So you're duplicating code because the original doesn't have
> functionality you need?

Yes, I'm creating an interactive validation variant.

> But why do that, if you could have *one* function that does both things?
> If some callers don't have the "init" part, just omit it from the
> parameters.

Although it could be abstracted somehow, I do not think that having one
function behaving so differently under the hood is a good idea.  It is not
just a question of the init part.

> (Whether it's implemented using Expect or not should not matter.  Either
> Expect works everywhere, and we can use it, or it doesn't and we can't.)

For me the question is not about the Expect dependency; it is more about
how the test behaves.

--
Fabien.




Re: pgbench - allow to create partitioned tables

2019-09-13 Thread Fabien COELHO


Hello Amit,

>>> + res = PQexec(con,
>>> + "select p.partstrat, count(*) "
>>> + "from pg_catalog.pg_class as c "
>>> + "left join pg_catalog.pg_partitioned_table as p on (p.partrelid = c.oid) "
>>> + "left join pg_catalog.pg_inherits as i on (c.oid = i.inhparent) "
>>> + "where c.relname = 'pgbench_accounts' "
>>> + "group by 1, c.oid");
>>>
>>> Can't we write this query with inner join instead of left join?  What
>>> additional purpose are you trying to serve by using left join?

>> I'm ensuring that there is always a one-line answer, whether it is
>> partitioned or not.  Maybe the count(*) should be count(something in p) to
>> get 0 instead of 1 on non-partitioned tables, though, but this is hidden
>> in the display anyway.

> Sure, but I feel the code will be simplified.  I see no reason for
> using left join here.

Without a left join, the query result is empty if there are no partitions,
whereas there is one line with it.  This fact simplifies managing the query
result afterwards, because we are always expecting 1 row in the "normal"
case, whether partitioned or not.
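
For instance, a sketch of that count variant (it assumes only the catalogs
already used by the patch):

  select p.partstrat, count(i.inhrelid) as partitions
  from pg_catalog.pg_class as c
  left join pg_catalog.pg_partitioned_table as p on (p.partrelid = c.oid)
  left join pg_catalog.pg_inherits as i on (c.oid = i.inhparent)
  where c.relname = 'pgbench_accounts'
  group by 1, c.oid;

This still returns exactly one row for a plain table (with a NULL
partstrat), but counting a non-null column of pg_inherits yields 0 there
instead of 1.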



>>> +partition_method_t partition_method = PART_NONE;
>>>
>>> It is better to make this static.

>> I do agree, but this would depart from all other global variables around,
>> which are currently not static.

> Check QueryMode.

Indeed, there is a mix of static (about 8) and non-static (29 cases).  I
think static is better anyway, so why not.


Attached is a v7.

--
Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index c857aa3cba..e3a0abb4c7 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -306,6 +306,31 @@ pgbench  options  d
   
  
 
+ 
+  --partitions=NUM
+  
+   
+Create a partitioned pgbench_accounts table with
+NUM partitions of nearly equal size for
+the scaled number of accounts.
+Default is 0, meaning no partitioning.
+   
+  
+ 
+
+ 
+  --partition-method=NAME
+  
+   
+Create a partitioned pgbench_accounts table with
+NAME method.
+Expected values are range or hash.
+This option requires that --partitions is set to non-zero.
+If unspecified, default is range.
+   
+  
+ 
+
  
   --tablespace=tablespace
   
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index ed7652bfbf..648a0c9865 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -186,6 +186,15 @@ int64		latency_limit = 0;
 char	   *tablespace = NULL;
 char	   *index_tablespace = NULL;
 
+/* partitioning for pgbench_accounts table, 0 for no partitioning */
+static int 		partitions = 0;
+
+typedef enum { PART_NONE, PART_RANGE, PART_HASH, PART_UNKNOWN }
+  partition_method_t;
+
+static partition_method_t partition_method = PART_NONE;
+static const char *PARTITION_METHOD[] = { "none", "range", "hash", "unknown" };
+
 /* random seed used to initialize base_random_sequence */
 int64		random_seed = -1;
 
@@ -617,6 +626,9 @@ usage(void)
 		   "  --foreign-keys   create foreign key constraints between tables\n"
 		   "  --index-tablespace=TABLESPACE\n"
 		   "   create indexes in the specified tablespace\n"
+		   "  --partitions=NUM partition pgbench_accounts in NUM parts (default: 0)\n"
+		   "  --partition-method=(range|hash)\n"
+		   "   partition pgbench_accounts with this method (default: range)\n"
 		   "  --tablespace=TABLESPACE  create tables in the specified tablespace\n"
 		   "  --unlogged-tablescreate tables as unlogged tables\n"
 		   "\nOptions to select what to run:\n"
@@ -3601,6 +3613,17 @@ initDropTables(PGconn *con)
 	 "pgbench_tellers");
 }
 
+/*
+ * add fillfactor percent option if not 100.
+ */
+static void
+append_fillfactor(char *opts, int len)
+{
+	if (fillfactor < 100)
+		snprintf(opts + strlen(opts), len - strlen(opts),
+ " with (fillfactor=%d)", fillfactor);
+}
+
 /*
  * Create pgbench's standard tables
  */
@@ -3664,9 +3687,15 @@ initCreateTables(PGconn *con)
 
 		/* Construct new create table statement. */
 		opts[0] = '\0';
-		if (ddl->declare_fillfactor)
+
+		/* Partition pgbench_accounts table */
+		if (partitions >= 1 && strcmp(ddl->table, "pgbench_accounts") == 0)
 			snprintf(opts + strlen(opts), sizeof(opts) - strlen(opts),
-	 " with (fillfactor=%d)", fillfactor);
+	 " partition by %s (aid)", PARTITION_METHOD[partition_method]);
+		else if (ddl->declare_fillfactor)
+			/* fillfactor is only expected on actual tables */
+			append_fillfactor(opts, sizeof(opts));
+
 		if (tablespace != NULL)
 		{
 			char	   *escape_tablespace;
@@ -3686,6 +3715,57 @@ initCreateTables(PGconn *con)
 
 		executeStatement(con, buffer);
 	}
+
+	/* if needed, pgbench_accounts partitions must be created manually */
+	if (partitions >= 1)
+	{
+		char		ff[64];
+
+		ff[0] = '\0';
+		append_fillfactor(ff, sizeof(ff));
+
+		fprintf(stderr, "creating %d partitions...\n", partitions);
+
+		for (int p = 

Re: Bug in GiST paring heap comparator

2019-09-13 Thread Alexander Korotkov
On Fri, Sep 13, 2019 at 5:23 PM Nikita Glukhov  wrote:
> I have moved handling of NULL ordering keys from opclasses to the common
> SP-GiST code, but really I don't like how it is implemented now. Maybe it's
> worth moving handling of NULL order-by keys to an even higher level, so
> that the AM doesn't have to worry about NULLs?

Yes, the optimizer could remove "col op NULL" clauses from ORDER BY
if op is a strict operator.  And then such clauses wouldn't be passed to
the AM.  But I see this as a future improvement.  For backpatching we
should solve this on the AM side.

> Also I left usages of IndexOrderByDistance in opclasses. I think that
> may help to minimize opclass changes in the future.

Could you please extract this as a separate patch?  We can consider
this for master, but we shouldn't backpatch this.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: block-level incremental backup

2019-09-13 Thread vignesh C
On Mon, Sep 9, 2019 at 4:51 PM Jeevan Chalke 
wrote:
>
>
>
> On Tue, Aug 27, 2019 at 4:46 PM vignesh C  wrote:
>>
>> Few comments:
>> Comment:
>> + buf = (char *) malloc(statbuf->st_size);
>> + if (buf == NULL)
>> + ereport(ERROR,
>> + (errcode(ERRCODE_OUT_OF_MEMORY),
>> + errmsg("out of memory")));
>> +
>> + if ((cnt = fread(buf, 1, statbuf->st_size, fp)) > 0)
>> + {
>> + Bitmapset  *mod_blocks = NULL;
>> + int nmodblocks = 0;
>> +
>> + if (cnt % BLCKSZ != 0)
>> + {
>>
>> We can use the same size as the full page size.
>> After pg_start_backup, full-page writes will be enabled.
>> We can use the same file size to maintain data consistency.
>
>
> Can you please explain which size?
> The aim here is to read the entire file in memory, and thus
> statbuf->st_size is used.
>
Instead of reading the whole file here, we can read the file page by page.
There is a possibility of data inconsistency if the data is not read page
by page; the data will be consistent if it is read page by page, since
full-page writes will be enabled at this time.
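
Roughly like this (a sketch only; scan_page() and filename stand in for the
actual per-page processing and error context):

	char	page[BLCKSZ];
	size_t	cnt;

	while ((cnt = fread(page, 1, BLCKSZ, fp)) == BLCKSZ)
		scan_page(page);		/* process one full page at a time */

	if (ferror(fp) || cnt % BLCKSZ != 0)
		ereport(ERROR,
				(errcode_for_file_access(),
				 errmsg("could not read file \"%s\"", filename)));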

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com


Re: Create collation reporting the ICU locale display name

2019-09-13 Thread Robert Haas
On Fri, Sep 13, 2019 at 11:12 AM Tom Lane  wrote:
> That argument presupposes (a) manual execution of the creation query,
> and (b) that the user pays close attention to the NOTICE output.
> Unfortunately, I think our past over-use of notices has trained
> people to ignore them.

Our past overuse aside, it's just easy to ignore chatter. It often
happens to me that I realize 10 minutes after I did something that I
didn't look carefully enough at the output ... which is usually
followed by an attempt to scroll back through my terminal buffer to
find it. But after a few thousand lines of subsequent output that's
hard. So I like the idea of making the information available
on-demand, rather than only at creation time.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] CLUSTER command progress monitor

2019-09-13 Thread Robert Haas
On Fri, Sep 13, 2019 at 12:03 PM Alvaro Herrera
 wrote:
> Ummm ... I've been operating --in this thread-- under the assumption
> that it is REINDEX to blame for this problem, not CREATE INDEX, because
> my recollection is that I tested CREATE INDEX together with CLUSTER and
> it worked fine.  Has anybody done any actual research that the problem
> is to blame on CREATE INDEX and not REINDEX?

I am not sure. I think, though, that the point is that all three
commands rebuild indexes. So unless they all expect the same things in
terms of which counters get set during that process, things will not
work correctly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: logical decoding : exceeded maxAllocatedDescs for .spill files

2019-09-13 Thread Robert Haas
On Fri, Sep 13, 2019 at 12:14 PM Tom Lane  wrote:
> Amit Khandekar  writes:
> > You mean tracking excess kernel fds right ? Yeah, we can use VFDs so
> > that excess fds are automatically closed. But Alvaro seems to be
> > talking in context of tracking of file seek position. VFD  does not
> > have a mechanism to track file offsets if one of the vfd cached file
> > is closed and reopened.
>
> Hm.  It used to, but somebody got rid of that on the theory that
> we could use pread/pwrite instead.  I'm inclined to think that that
> was the right tradeoff, but it'd mean that getting logical decoding
> to adhere to the VFD API requires extra work to track file position
> on the caller side.

Oops.  I forgot that we'd removed that.

> Again, though, the advice that's been given here is that we should
> fix logical decoding to use the VFD API as it stands, not change
> that API.  I concur with that.

A reasonable position.  So I guess logical decoding has to track the
file position itself, but perhaps use the VFD layer for managing FD
pooling.
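
Something along these lines, perhaps (a sketch against the v12 fd.h API,
where FileRead() takes an explicit offset; SpillFile and spill_read are
invented names for illustration):

	typedef struct SpillFile
	{
		File	vfd;			/* VFD handle; fd.c may close/reopen the kernel fd */
		off_t	curOffset;		/* read position, tracked by the caller */
	} SpillFile;

	static int
	spill_read(SpillFile *sf, char *buf, int len)
	{
		int		nread;

		nread = FileRead(sf->vfd, buf, len, sf->curOffset,
						 WAIT_EVENT_REORDER_BUFFER_READ);
		if (nread > 0)
			sf->curOffset += nread;
		return nread;
	}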

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: errbacktrace

2019-09-13 Thread Alvaro Herrera
On 2019-Aug-20, Peter Eisentraut wrote:

> The memory management of that seems too complicated.  The "extra"
> mechanism of the check/assign hooks only supports one level of malloc.
> Using a List seems impossible.  I don't know if you can safely do a
> malloc-ed array of malloc-ed strings either.

Here's an idea -- have the check/assign hooks create a different
representation, which is a single guc_malloc'ed chunk that is made up of
every function name listed in the GUC, separated by \0.  That can be
scanned at error time comparing the function name with each piece.
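
A minimal sketch of that scan (with invented names; the chunk is assumed
to end with an empty string):

	static char *backtrace_function_list;	/* "name\0name\0...\0\0" */

	static bool
	matches_backtrace_functions(const char *funcname)
	{
		const char *p = backtrace_function_list;

		if (p == NULL || funcname == NULL)
			return false;
		while (*p != '\0')
		{
			if (strcmp(funcname, p) == 0)
				return true;
			p += strlen(p) + 1;	/* step over this name and its terminator */
		}
		return false;
	}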

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 7341310cd75722fbfe7cb7996516b0b50d83bb1f Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Fri, 13 Sep 2019 12:51:13 -0300
Subject: [PATCH v5] Add backtrace support

Add some support for automatically showing backtraces in certain error
situations in the server.  Backtraces are shown on assertion failure.
A new setting backtrace_functions can be set to a list of C function
names, and all ereport()s and elog()s from the functions will have
backtraces generated.  Finally, the function errbacktrace() can be
manually added to an ereport() call to generate a backtrace for that
call.

Discussion: https://www.postgresql.org/message-id/flat/5f48cb47-bf1e-05b6-7aae-3bf2cd015...@2ndquadrant.com
---
 configure                        |  61 +++-
 configure.in                     |   4 ++
 doc/src/sgml/config.sgml         |  26 +++
 src/backend/utils/error/assert.c |  13 
 src/backend/utils/error/elog.c   | 117 +++
 src/backend/utils/misc/guc.c     |  67 ++
 src/include/pg_config.h.in       |   6 ++
 src/include/utils/elog.h         |   3 +
 src/include/utils/guc.h          |   2 +
 9 files changed, 297 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index b3c92764be..900b83dc4e 100755
--- a/configure
+++ b/configure
@@ -11606,6 +11606,63 @@ if test "$ac_res" != no; then :
 
 fi
 
+# *BSD:
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for library containing backtrace_symbols" >&5
+$as_echo_n "checking for library containing backtrace_symbols... " >&6; }
+if ${ac_cv_search_backtrace_symbols+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  ac_func_search_save_LIBS=$LIBS
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+/* Override any GCC internal prototype to avoid an error.
+   Use char because int might match the return type of a GCC
+   builtin and then its argument prototype would still apply.  */
+#ifdef __cplusplus
+extern "C"
+#endif
+char backtrace_symbols ();
+int
+main ()
+{
+return backtrace_symbols ();
+  ;
+  return 0;
+}
+_ACEOF
+for ac_lib in '' execinfo; do
+  if test -z "$ac_lib"; then
+ac_res="none required"
+  else
+ac_res=-l$ac_lib
+LIBS="-l$ac_lib  $ac_func_search_save_LIBS"
+  fi
+  if ac_fn_c_try_link "$LINENO"; then :
+  ac_cv_search_backtrace_symbols=$ac_res
+fi
+rm -f core conftest.err conftest.$ac_objext \
+conftest$ac_exeext
+  if ${ac_cv_search_backtrace_symbols+:} false; then :
+  break
+fi
+done
+if ${ac_cv_search_backtrace_symbols+:} false; then :
+
+else
+  ac_cv_search_backtrace_symbols=no
+fi
+rm conftest.$ac_ext
+LIBS=$ac_func_search_save_LIBS
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_search_backtrace_symbols" >&5
+$as_echo "$ac_cv_search_backtrace_symbols" >&6; }
+ac_res=$ac_cv_search_backtrace_symbols
+if test "$ac_res" != no; then :
+  test "$ac_res" = "none required" || LIBS="$ac_res $LIBS"
+
+fi
+
 
 if test "$with_readline" = yes; then
 
@@ -12704,7 +12761,7 @@ $as_echo "#define HAVE_STDBOOL_H 1" >>confdefs.h
 fi
 
 
-for ac_header in atomic.h copyfile.h fp_class.h getopt.h ieeefp.h ifaddrs.h langinfo.h mbarrier.h poll.h sys/epoll.h sys/ipc.h sys/prctl.h sys/procctl.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/sockio.h sys/tas.h sys/un.h termios.h ucred.h utime.h wchar.h wctype.h
+for ac_header in atomic.h copyfile.h execinfo.h fp_class.h getopt.h ieeefp.h ifaddrs.h langinfo.h mbarrier.h poll.h sys/epoll.h sys/ipc.h sys/prctl.h sys/procctl.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/sockio.h sys/tas.h sys/un.h termios.h ucred.h utime.h wchar.h wctype.h
 do :
   as_ac_Header=`$as_echo "ac_cv_header_$ac_header" | $as_tr_sh`
 ac_fn_c_check_header_mongrel "$LINENO" "$ac_header" "$as_ac_Header" "$ac_includes_default"
@@ -15087,7 +15144,7 @@ fi
 LIBS_including_readline="$LIBS"
 LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
 
-for ac_func in cbrt clock_gettime copyfile fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memset_s memmove poll posix_fallocate ppoll pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open strchrnul strsignal symlink sync_file_range uselocale utime utimes wcstombs_l
+for ac_func in backtrace_symbols cbrt 

Re: logical decoding : exceeded maxAllocatedDescs for .spill files

2019-09-13 Thread Tom Lane
Amit Khandekar  writes:
> You mean tracking excess kernel fds, right? Yeah, we can use VFDs so
> that excess fds are automatically closed. But Alvaro seems to be
> talking in the context of tracking the file seek position. VFD does not
> have a mechanism to track file offsets if one of the vfd-cached files
> is closed and reopened.

Hm.  It used to, but somebody got rid of that on the theory that
we could use pread/pwrite instead.  I'm inclined to think that that
was the right tradeoff, but it'd mean that getting logical decoding
to adhere to the VFD API requires extra work to track file position
on the caller side.

Again, though, the advice that's been given here is that we should
fix logical decoding to use the VFD API as it stands, not change
that API.  I concur with that.

regards, tom lane
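
A minimal sketch of the caller-side position tracking Tom describes,
assuming nothing beyond POSIX (the struct and helper names are
hypothetical, not fd.c API): because a virtual file descriptor may be
closed and reopened behind the caller's back, the caller keeps its own
offset and always does positioned I/O with pread():

#include <sys/types.h>
#include <unistd.h>

/* Hypothetical caller-side state for a reorderbuffer spill file. */
typedef struct SpillFileState
{
	int		fd;			/* may be transparently closed/reopened */
	off_t	curOffset;	/* position tracked by the caller, not the kernel */
} SpillFileState;

static ssize_t
spill_read(SpillFileState *f, void *buf, size_t len)
{
	/* pread() takes an explicit offset, so the kernel's (possibly reset)
	 * file position never matters. */
	ssize_t		nread = pread(f->fd, buf, len, f->curOffset);

	if (nread > 0)
		f->curOffset += nread;
	return nread;
}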




Re: logical decoding : exceeded maxAllocatedDescs for .spill files

2019-09-13 Thread Amit Khandekar
On Thu, 12 Sep 2019 at 19:11, Robert Haas  wrote:
>
> On Thu, Sep 12, 2019 at 5:31 AM Tomas Vondra
>  wrote:
> > I don't see how the current API could do that transparently - it does
> > track the files, but the user only gets a file descriptor. With just a
> > file descriptor, how could the code know to do reopen/seek when it's going
> > just through the regular fopen/fclose?
> >
> > Anyway, I agree we need to do something, to fix this corner case (many
> > serialized in-progress transactions). ISTM we have two options - either do
> > something in the context of reorderbuffer.c, or extend the transient file
> > API somehow. I'd say the second option is the right thing going forward,
> > because it does allow doing it transparently and without leaking details
> > about maxAllocatedDescs etc. There are two issues, though - it does
> > require changes / extensions to the API, and it's not backpatchable.
> >
> > So maybe we should start with the localized fix in reorderbuffer, and I
> > agree tracking offset seems reasonable.
>

> We've already got code that knows how to track this sort of thing.

You mean tracking excess kernel fds, right? Yeah, we can use VFDs so
that excess fds are automatically closed. But Alvaro seems to be
talking in the context of tracking the file seek position. VFD does not
have a mechanism to track file offsets if one of the vfd-cached files
is closed and reopened. Robert, are you suggesting adding this
capability to VFD? I agree that we could do it, but for
back-patching, offhand I couldn't think of a simpler way.



-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2019-09-13 Thread James Coleman
On Fri, Sep 13, 2019 at 10:54 AM Tomas Vondra
 wrote:
>
> On Thu, Sep 12, 2019 at 11:54:06AM -0400, James Coleman wrote:
> >> OK, so we have that now.  I suppose this spreadsheet now tells us which
> >> places are useful and which aren't, at least for the queries that you've
> >> tested.  Does that mean that we want to get the patch to consider adding
> >> paths in only the places that your spreadsheet says are useful?  I'm not
> >> sure what the next steps are for this patch.
> >
> >I wanted to note here that I haven't abandoned this patch, but ended
> >up needing to use my extra time for working on a conference talk. That
> >talk is today, so I'm hoping to be able to catch up on this again
> >soon.
> >
>
> Good! I'm certainly looking forward to a new patch version.
>
> As discussed in the past, this patch is pretty sensitive (large, touches
> planning, ...), so we should try getting most of it in not too late in
> the cycle. For example 2019-11 would be nice.

Completely agree; originally I'd hoped to have it in rough draft
finished form to get serious review in the September CF...but...




Re: Create collation reporting the ICU locale display name

2019-09-13 Thread Tom Lane
"Daniel Verite"  writes:
> This output tends to reveal mistakes with tags, which is why I thought
> to expose it as a NOTICE. It addresses the case of a user
> who wouldn't suspect an error, so the "in-your-face" effect is
> intentional. With the function approach, the user must be
> proactive.

That argument presupposes (a) manual execution of the creation query,
and (b) that the user pays close attention to the NOTICE output.
Unfortunately, I think our past over-use of notices has trained
people to ignore them.

regards, tom lane




Re: Create collation reporting the ICU locale display name

2019-09-13 Thread Tom Lane
Robert Haas  writes:
> On Fri, Sep 13, 2019 at 9:57 AM Daniel Verite  wrote:
>> An advantage of this approach is that you may execute the
>> function before creating the collation, instead of creating the
>> collation, realizing there was something wrong in your
>> locale/lc_collate argument, dropping the collation and trying again.

> That would be really nice.

I think that's a useful function, but it's a different function from
the one first proposed, which was to tell you the properties of a
collation you already installed (which might not be ICU, even).
Perhaps we should have both.

regards, tom lane




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2019-09-13 Thread Tomas Vondra

On Thu, Sep 12, 2019 at 11:54:06AM -0400, James Coleman wrote:

OK, so we have that now.  I suppose this spreadsheet now tells us which
places are useful and which aren't, at least for the queries that you've
tested.  Does that mean that we want to get the patch to consider adding
paths in only the places that your spreadsheet says are useful?  I'm not
sure what the next steps are for this patch.


I wanted to note here that I haven't abandoned this patch, but ended
up needing to use my extra time for working on a conference talk. That
talk is today, so I'm hoping to be able to catch up on this again
soon.



Good! I'm certainly looking forward to a new patch version.

As discussed in the past, this patch is pretty sensitive (large, touches
planning, ...), so we should try getting most of it in not too late in
the cycle. For example 2019-11 would be nice.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: Create collation reporting the ICU locale display name

2019-09-13 Thread Robert Haas
On Fri, Sep 13, 2019 at 9:57 AM Daniel Verite  wrote:
> An advantage of this approach is that you may execute the
> function before creating the collation, instead of creating the
> collation, realizing there was something wrong in your
> locale/lc_collate argument, dropping the collation and trying again.

That would be really nice.

I also think that the documentation is entirely inadequate in this
area.  https://www.postgresql.org/docs/11/collation.html#COLLATION-CREATE
gives some examples, but those don't help you understand the general
principles very much, and it has some links to the ICU documentation,
which helps less than one might think. For example it links to
http://userguide.icu-project.org/locale which describes locales like
en_IE@currency=IEP and es__TRADITIONAL, but if you can figure out what
all the valid possibilities are by reading that page, you are much
smarter than me.  Then, too, according to the PostgreSQL documentation
you ought to prefer forms using the newer syntax, which looks like a
bunch of dash-separated things, e.g. de-u-co-phonebk. But neither the
PostgreSQL documentation itself nor either of the links to ICU
included there actually describe the rules for that syntax. They just
say things like 'use BCP-47', which doesn't help at all.  So I am just
reduced to trying a bunch of things and seeing which collations appear
to behave differently when I use them.

This proposal wouldn't fix the problem that I have to guess what
strings to use, but at least it might be clearer when I have or have
not guessed correctly.

I seriously hate this stuff with a fiery passion that cannot be
quenched. How does anybody manage to use software that seems to have
no usable documentation and doesn't even tell whether or not you
supplied it with input that it thinks is valid?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2019-09-13 Thread Tomas Vondra

On Thu, Sep 12, 2019 at 12:49:29PM -0300, Alvaro Herrera wrote:

On 2019-Jul-30, Tomas Vondra wrote:


On Sun, Jul 21, 2019 at 01:34:22PM +0200, Tomas Vondra wrote:
>
> I wonder if we're approaching this wrong. Maybe we should not reverse
> engineer queries for the various places, but just start with a set of
> queries that we want to optimize, and then identify which places in the
> planner need to be modified.


[...]


I've decided to do a couple of experiments, trying to make up my mind about
which modified places matter to different queries. But instead of trying
to reverse engineer the queries, I've taken a different approach - I've
compiled a list of queries that I think are sensible and relevant, and
then planned them with incremental sort enabled in different places.


[...]


The list of queries (synthetic, but hopefully sufficiently realistic)
and a couple of scripts to collect the plans is in this repository:

 https://github.com/tvondra/incremental-sort-tests-2

There's also a spreadsheet with a summary of results, with a visual
representation of which GUCs affect which queries.


OK, so we have that now.  I suppose this spreadsheet now tells us which
places are useful and which aren't, at least for the queries that you've
tested.  Does that mean that we want to get the patch to consider adding
paths in only the places that your spreadsheet says are useful?  I'm not
sure what the next steps are for this patch.



Yes. I think the spreadsheet can help us with answering two things:

1) places actually affecting the plan (all but three do)

2) redundant places (there are some cases where two GUCs produce the
same plan in the end)

Of course, this does assume the query set makes sense and is somewhat
realistic, but I've tried to construct queries where that is true. We
may extend it over time, of course.

I think we've agreed to add incremental sort paths different places in
separate patches, to make review easier. So this may be a useful way to
decide which places to address first. I'd probably do it in this order:

- create_ordered_paths
- create_ordered_paths (parallel part)
- add_paths_to_grouping_rel
- ... not sure ...

but that's just a proposal. It'd give us most of the benefits, I think,
and we could also focus on the rest of the patch.

Also, regarding the three GUCs that don't affect any of the queries, we
can't really add them as we wouldn't be able to test them. If we manage
to construct a query that'd benefit from them, we can revisit this.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Modest proposal for making bpchar less inconsistent

2019-09-13 Thread Tom Lane
It struck me that the real reason that we keep getting gripes about
the weird behavior of CHAR(n) is that these functions (and, hence,
their corresponding operators) fail to obey the "trailing blanks
aren't significant" rule:

           regprocedure            |    prosrc
-----------------------------------+---------------
 bpcharlike(character,text)        | textlike
 bpcharnlike(character,text)       | textnlike
 bpcharicregexeq(character,text)   | texticregexeq
 bpcharicregexne(character,text)   | texticregexne
 bpcharregexeq(character,text)     | textregexeq
 bpcharregexne(character,text)     | textregexne
 bpchariclike(character,text)      | texticlike
 bpcharicnlike(character,text)     | texticnlike

They're just relying on binary compatibility of bpchar to text ...
but of course textlike etc. think trailing blanks are significant.

Every other primitive operation we have for bpchar correctly ignores
the trailing spaces.

We could fix this, and save some catalog space too, if we simply
deleted these functions/operators and let such calls devolve
into implicit casts to text.

This might annoy people who are actually writing trailing spaces
in their patterns to make such cases work.  But I think there
are probably not too many such people, and having real consistency
here is worth something.

regards, tom lane
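
For readers unfamiliar with the rule in question: PostgreSQL implements
the trailing-blank-insensitive comparisons with bcTruelen() in varchar.c.
A minimal standalone sketch of the idea, not the actual backend code:

#include <stdbool.h>
#include <string.h>

/* Length of a bpchar value ignoring trailing spaces (cf. bcTruelen). */
static int
bpchar_true_len(const char *s, int len)
{
	while (len > 0 && s[len - 1] == ' ')
		len--;
	return len;
}

/* Equality under the "trailing blanks aren't significant" rule:
 * 'abc ' and 'abc' compare equal, whereas textlike sees the trailing
 * space as significant. */
static bool
bpchar_eq(const char *a, int alen, const char *b, int blen)
{
	int		la = bpchar_true_len(a, alen);
	int		lb = bpchar_true_len(b, blen);

	return la == lb && memcmp(a, b, la) == 0;
}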




Re: Bug in GiST paring heap comparator

2019-09-13 Thread Nikita Glukhov


On 12.09.2019 16:45, Alexander Korotkov wrote:

On Wed, Sep 11, 2019 at 3:34 AM Nikita Glukhov  wrote:

On 09.09.2019 22:47, Alexander Korotkov wrote:

On Mon, Sep 9, 2019 at 8:32 PM Nikita Glukhov  wrote:

On 08.09.2019 22:32, Alexander Korotkov wrote:

On Fri, Sep 6, 2019 at 5:44 PM Alexander Korotkov
 wrote:

I'm going to push both if no objections.

So, pushed!

Two years ago there was a similar patch for this issue:
https://www.postgresql.org/message-id/1499c9d0-075a-3014-d2aa-ba59121b3728%40postgrespro.ru

Sorry that I looked at this thread too late.

I see, thanks.

Your patch seems a bit less cumbersome thanks to using the GISTDistance
struct.  You don't have to introduce a separate array with null flags.
But it's harder to keep index_store_float8_orderby_distances() used
by both GiST and SP-GiST.

What do you think?  You can rebase your patch and propose that as refactoring.

Rebased patch with refactoring is attached.

During this rebase I found that the SP-GiST code has similar problems with NULLs.
None of the SP-GiST functions check the SK_ISNULL flag of ordering ScanKeys, and
that leads to segfaults.  In the following test, the index is searched with a
non-NULL key first and then with a NULL key, which leads to a crash:

CREATE TABLE t AS SELECT point(x, y) p
FROM generate_series(0, 100) x, generate_series(0, 100) y;

CREATE INDEX ON t USING spgist (p);

-- crash
SELECT (SELECT p FROM t ORDER BY p <-> pt LIMIT 1)
FROM (VALUES (point '1,2'), (NULL)) pts(pt);


After adding SK_ISNULL checks and starting to produce NULL distances, we need to
store NULL flags for distances somewhere.  The existing singleton flag for the
whole SPGistSearchItem is not sufficient anymore.


So, I introduced structure IndexOrderByDistance and used it everywhere in the
GiST and SP-GiST code instead of raw double distances.


SP-GiST order-by-distance code can be further refactored so that user functions
do not have to worry about SK_ISNULL checks.

It doesn't seem that the SP-GiST inner_consistent and leaf_consistent functions
can handle NULL scan keys for now.  So should we let them handle NULL
orderby keys?  If we assume that NULL arguments produce NULL
distances, we can avoid changing the code of the opclasses.

I have moved the handling of NULL ordering keys from the opclasses to the
common SP-GiST code, but really I don't like how it is implemented now.
Maybe it's worth moving the handling of NULL order-by keys to an even higher
level, so that the AM doesn't have to worry about NULLs?

Also, I have left the usages of IndexOrderByDistance in the opclasses. I think
that may help to minimize opclass changes in the future.


Also, I've noticed IndexOrderByDistance is missed in typedefs.list.

Fixed.

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
>From edff2142fd8239bbc9cef31a885a55344e20a649 Mon Sep 17 00:00:00 2001
From: Nikita Glukhov 
Date: Tue, 10 Sep 2019 17:26:26 +0300
Subject: [PATCH 1/4] Fix GiST and SP-GiST ordering by distance for NULLs

---
 src/backend/access/gist/gistget.c |  68 ---
 src/backend/access/gist/gistscan.c|  18 ++-
 src/backend/access/index/indexam.c|  22 ++--
 src/backend/access/spgist/spgkdtreeproc.c |   2 +-
 src/backend/access/spgist/spgproc.c   |   9 +-
 src/backend/access/spgist/spgquadtreeproc.c   |   2 +-
 src/backend/access/spgist/spgscan.c   | 136 +-
 src/backend/utils/adt/geo_spgist.c|  18 +--
 src/include/access/genam.h|  10 +-
 src/include/access/gist_private.h |  27 +
 src/include/access/spgist.h   |   4 +-
 src/include/access/spgist_private.h   |  22 ++--
 src/test/regress/expected/create_index_spgist.out |  10 ++
 src/test/regress/sql/create_index_spgist.sql  |   5 +
 src/tools/pgindent/typedefs.list  |   1 +
 15 files changed, 212 insertions(+), 142 deletions(-)

diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c
index db633a9..22d790d 100644
--- a/src/backend/access/gist/gistget.c
+++ b/src/backend/access/gist/gistget.c
@@ -112,9 +112,8 @@ gistkillitems(IndexScanDesc scan)
  * Similarly, *recheck_distances_p is set to indicate whether the distances
  * need to be rechecked, and it is also ignored for non-leaf entries.
  *
- * If we are doing an ordered scan, so->distancesValues[] and
- * so->distancesNulls[] is filled with distance data from the distance()
- * functions before returning success.
+ * If we are doing an ordered scan, so->distances[] is filled with distance
+ * data from the distance() functions before returning success.
  *
  * We must decompress the key in the IndexTuple before passing it to the
  * sk_funcs (which actually are the opclass Consistent or Distance methods).
@@ -135,8 +134,7 @@ gistindex_keytest(IndexScanDesc scan,
 	GISTSTATE  *giststate = so->giststate;
 	ScanKey		key = scan->keyData;
 	int			
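
The central addition here, per the genam.h diffstat entry and the
discussion above, is roughly the following struct, which replaces the
parallel distance-value/null-flag arrays (a sketch of the shape, not the
verbatim patch):

/* A single ORDER BY distance: a value plus a null flag, so that a NULL
 * ordering key can produce a NULL distance instead of crashing. */
typedef struct IndexOrderByDistance
{
	double		value;
	bool		isnull;
} IndexOrderByDistance;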

Re: Create collation reporting the ICU locale display name

2019-09-13 Thread Daniel Verite
Michael Paquier wrote:

> Or could it make sense to provide a system function which returns a
> collation description for at least an ICU-provided one?  We could make
> use of that in psql for example.

If we prefer having a function over the instant feedback effect of
the NOTICE, the function might look like icu_collation_attributes() [1]
from the icu_ext extension. It returns a set of (attribute,value)
tuples, among which the displayname is one of the values.

An advantage of this approach is that you may execute the
function before creating the collation, instead of creating the
collation, realizing there was something wrong in your
locale/lc_collate argument, dropping the collation and trying again.

Another advantage would be the possibility of localizing the
display name, leaving the localization as a choice to the user.
Currently get_icu_locale_comment() forces "en" as the language because
it wants results in US-ASCII, but a user-callable function could have the
language code as an optional argument. When not being forced, the
language has a default value obtained by ICU from the environment
(so that would be from where the postmaster is started in our case),
and is also settable with uloc_setDefault().

Example with icu_ext functions:

test=> select icu_set_default_locale('es');
 icu_set_default_locale
------------------------
 es

test=> select value from icu_collation_attributes('en-US-u-ka-shifted')
   where attribute='displayname';
                   value
--------------------------------------------
 inglés (Estados Unidos, alternate=shifted)

This output tends to reveal mistakes with tags, which is why I thought
to expose it as a NOTICE. It addresses the case of a user
who wouldn't suspect an error, so the "in-your-face" effect is
intentional. With the function approach, the user must be
proactive.

An example of a mistake I found myself making is forgetting the '-u-' before
the collation tags, which doesn't error out but is detected relatively
easily with the display name.

-- wrong
test=> select value from icu_collation_attributes('de-DE-ks-level1')  
   where attribute='displayname';
            value
-----------------------------
 German (Germany, KS_LEVEL1)

-- right
test=> select value from icu_collation_attributes('de-DE-u-ks-level1')  
   where attribute='displayname';
                 value
---------------------------------------
 German (Germany, colstrength=primary)


[1] https://github.com/dverite/icu_ext#icu_collation_attributes


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
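
For the curious, such a display-name lookup is presumably a thin wrapper
around ICU's C API; a minimal standalone sketch (error handling
abbreviated), not the icu_ext source:

#include <stdio.h>
#include <unicode/uloc.h>
#include <unicode/ustring.h>

/* Print the display name of a locale/collation tag, localized into the
 * language named by displayLocale ("en", "es", ...). */
static void
print_display_name(const char *tag, const char *displayLocale)
{
	UChar		ubuf[256];
	char		out[512];
	UErrorCode	status = U_ZERO_ERROR;
	int32_t		len;

	len = uloc_getDisplayName(tag, displayLocale, ubuf, 256, &status);
	if (U_FAILURE(status))
		return;
	u_strToUTF8(out, sizeof(out), NULL, ubuf, len, &status);
	if (U_SUCCESS(status))
		printf("%s -> %s\n", tag, out);
}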




Re: [PATCH] Speedup truncates of relation forks

2019-09-13 Thread Fujii Masao
On Fri, Sep 13, 2019 at 9:51 PM Alvaro Herrera  wrote:
>
> On 2019-Sep-13, Fujii Masao wrote:
>
> > On Mon, Sep 9, 2019 at 3:52 PM Jamison, Kirk  
> > wrote:
>
> > > > Please add a preliminary patch that removes the function.  Dead code is 
> > > > good,
> > > > as long as it is gone.  We can get it pushed ahead of the rest of this.
> > >
> > > Alright. I've attached a separate patch removing the smgrdounlinkfork.
> >
> > Per the past discussion, some people want to keep this "dead" function
> > for some reasons. So, in my opinion, it's better to just enclose the 
> > function
> > with #if NOT_USED and #endif, to keep the function itself as it is, and then
> > to start a new discussion on hackers about the removal of that separately
> > from this patch.
>
> I searched for anybody requesting to keep the function.  I couldn't find
> anything.  Tom said in 2012:
> https://www.postgresql.org/message-id/1471.1339106...@sss.pgh.pa.us

Yes. And I found Andres.
https://www.postgresql.org/message-id/20180621174129.hogefyopje4xa...@alap3.anarazel.de

> > As committed, the smgrdounlinkfork case is actually dead code; it's
> > never called from anywhere.  I left it in place just in case we want
> > it someday.
>
> but if no use has appeared in 7 years, I say it's time to kill it.

+1

Regards,

-- 
Fujii Masao




Primary keepalive message not appearing in Logical Streaming Replication

2019-09-13 Thread Virendra Negi
I have implemented Logical Streaming Replication and things are working fine;
I see the XLogData messages appearing and I'm able to parse them.

But I haven't seen any "Primary keepalive message" yet. I have tried setting
*tcp_keepalive_interval* and *tcp_keepalives_idle*, both as client runtime
parameters as well as in postgresql.conf, but still no sign of it.

Any information around it?
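
Two notes that may help here: the primary sends keepalives based on the
server's wal_sender_timeout, not on the TCP-level tcp_keepalives_*
settings; and when one does arrive, it is a CopyData payload whose body
starts with 'k', per the streaming replication protocol docs. A minimal
sketch of decoding one:

#include <stddef.h>
#include <stdint.h>

/* Big-endian 64-bit read, as used on the wire. */
static uint64_t
read_be64(const unsigned char *p)
{
	uint64_t	v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Primary keepalive: 'k', int64 walEnd, int64 sendTime (microseconds
 * since 2000-01-01), byte replyRequested.  Returns 1 on success. */
static int
parse_keepalive(const unsigned char *msg, size_t len,
				uint64_t *walEnd, uint64_t *sendTime, int *replyRequested)
{
	if (len < 18 || msg[0] != 'k')
		return 0;
	*walEnd = read_be64(msg + 1);
	*sendTime = read_be64(msg + 9);
	*replyRequested = msg[17];
	return 1;
}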


Re: pgbench - allow to create partitioned tables

2019-09-13 Thread Alvaro Herrera
On 2019-Sep-13, Amit Kapila wrote:

> I would like to take inputs from others as well for the display part
> of this patch.  After this patch, for a simple-update pgbench test,
> the changed output will be as follows (note: partition method and
> partitions):

> pgbench.exe -c 4 -j 4 -T 10 -N postgres
> starting vacuum...end.
> transaction type: 
> scaling factor: 1
> partition method: hash
> partitions: 3
> query mode: simple
> number of clients: 4
> number of threads: 4
> duration: 10 s
> number of transactions actually processed: 14563
> latency average = 2.749 ms
> tps = 1454.899150 (including connections establishing)
> tps = 1466.689412 (excluding connections establishing)
> 
> What do others think about this?  This will be the case when the user
> has used --partitions option in pgbench, otherwise, it won't change.

I wonder what's the intended usage of this output ... it seems to be
getting a bit too long.  Is this intended for machine processing?  I
would rather have more things per line in a more compact header.
But then I'm not the kind of person who automates multiple pgbench runs.
Maybe we can get some input from Tomas, who does -- how do you automate
extracting data from collected pgbench output, or do you instead just
redirect the output to a file whose path/name indicates the parameters
that were used?  (I do the latter.)

I mean, if we changed it like this (and I'm not proposing to do it in
this patch, this is only an example), would it bother anyone?

$ pgbench -x -y -z ...
starting vacuum...end.
scaling factor: 1  partition method: hash   partitions: 1
transaction type:   query mode: simple
number of clients: 4   number of threads: 4 duration: 10s
number of transactions actually processed: 14563
latency average = 2.749 ms
tps = 1454.899150 (including connections establishing)
tps = 1466.689412 (excluding connections establishing)


If this output doesn't bother people, then I suggest that this patch
should put the partition information in the line together with scaling
factor.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pgbench - allow to create partitioned tables

2019-09-13 Thread Alvaro Herrera
On 2019-Sep-13, Amit Kapila wrote:

> On Fri, Sep 13, 2019 at 1:50 PM Fabien COELHO  wrote:
> > >>> Is there a reason why we treat "partitions = 0" as a valid value?
> > >>
> > >> Yes. It is an explicit "do not create partitioned tables", which differs
> > >> from 1, which says "create a partitioned table with just one partition".
> > >
> > > Why would anyone want to use --partitions option in the first case
> > > ("do not create partitioned tables")?
> >
> > Having an explicit value for the default is generally a good idea, eg for
> > a script to tests various partitioning settings:
> >
> >for nparts in 0 1 2 3 4 5 6 7 8 9 ; do
> >  pgbench -i --partitions=$nparts ... ;
> >  ...
> >done
> >
> > Otherwise you would need significant kludging to add/remove the option.
> > Allowing 0 does not harm anyone.
> >
> > Now if the consensus is to remove an explicit 0, it is simple enough to
> > change it, but my opinion is that it is better to have it.
> 
> Fair enough, let us see if anyone else wants to weigh in.

It seems convenient UI -- I vote to keep it.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Speedup truncates of relation forks

2019-09-13 Thread Alvaro Herrera
On 2019-Sep-13, Fujii Masao wrote:

> On Mon, Sep 9, 2019 at 3:52 PM Jamison, Kirk  wrote:

> > > Please add a preliminary patch that removes the function.  Dead code is 
> > > good,
> > > as long as it is gone.  We can get it pushed ahead of the rest of this.
> >
> > Alright. I've attached a separate patch removing the smgrdounlinkfork.
> 
> Per the past discussion, some people want to keep this "dead" function
> for some reasons. So, in my opinion, it's better to just enclose the function
> with #if NOT_USED and #endif, to keep the function itself as it is, and then
> to start a new discussion on hackers about the removal of that separately
> from this patch.

I searched for anybody requesting to keep the function.  I couldn't find
anything.  Tom said in 2012:
https://www.postgresql.org/message-id/1471.1339106...@sss.pgh.pa.us

> As committed, the smgrdounlinkfork case is actually dead code; it's
> never called from anywhere.  I left it in place just in case we want
> it someday.

but if no use has appeared in 7 years, I say it's time to kill it.

In absence of objections, I'll commit a patch to remove it later today.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: psql - improve test coverage from 41% to 88%

2019-09-13 Thread Alvaro Herrera
On 2019-Sep-13, Fabien COELHO wrote:

> Hello Alvaro,
> 
> > I think the TestLib.pm changes should be done separately, not together
> > with the rest of the hacking in this patch.
> > 
> > Mostly, because I think they're going to cause trouble.  Adding a
> > parameter in the middle of the list may cause trouble for third-party
> > users of TestLib.
> 
> That is also what I thought, however, see below.

I see.  But you seem to have skipped my suggestion without considering
it.

I think the current API of these functions where they just receive a
plain array of arguments, and all callers have to be patched in unison,
is not very convenient.  Also, I *think* your new icommand_checks method
is the same as command_checks_all, except that you also have the "init"
part.  So you're duplicating code because the original doesn't have
functionality you need?  But why do that, if you could have *one*
function that does both things?  If some callers don't have the "init"
part, just omit it from the parameters.

(Whether it's implemented using Expect or not should not matter.  Either
Expect works everywhere, and we can use it, or it doesn't and we can't.)

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Speedup truncates of relation forks

2019-09-13 Thread Fujii Masao
On Mon, Sep 9, 2019 at 3:52 PM Jamison, Kirk  wrote:
>
> On Friday, September 6, 2019 11:51 PM (GMT+9), Alvaro Herrera wrote:
>
> Hi Alvaro,
> Thank you very much for the review!
>
> > On 2019-Sep-05, Jamison, Kirk wrote:
> >
> > > I also mentioned it from my first post if we can just remove this dead 
> > > code.
> > > If not, it would require to modify the function because it would also
> > > need nforks as input argument when calling DropRelFileNodeBuffers. I
> > > kept my changes in the latest patch.
> > > So should I remove the function now or keep my changes?
> >
> > Please add a preliminary patch that removes the function.  Dead code is 
> > good,
> > as long as it is gone.  We can get it pushed ahead of the rest of this.
>
> Alright. I've attached a separate patch removing the smgrdounlinkfork.

Per the past discussion, some people want to keep this "dead" function
for some reasons. So, in my opinion, it's better to just enclose the function
with #if NOT_USED and #endif, to keep the function itself as it is, and then
to start a new discussion on hackers about the removal of that separately
from this patch.

Regards,

-- 
Fujii Masao




Re: Leakproofness of texteq()/textne()

2019-09-13 Thread Robert Haas
On Thu, Sep 12, 2019 at 5:19 PM Tom Lane  wrote:
> However, if there is some character C that makes ICU misbehave like
> that, we are going to have problems with indexing strings containing C,
> whether we think varstr_cmp is leaky or not.  So I'm not sure that
> focusing our attention on leakiness is a helpful way to proceed.

That seems like a compelling argument to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] CLUSTER command progress monitor

2019-09-13 Thread Robert Haas
On Fri, Sep 13, 2019 at 2:49 AM Michael Paquier  wrote:
> On Fri, Sep 06, 2019 at 08:10:58AM -0400, Robert Haas wrote:
> > It's fine if things are updated as well -- it's just you need to make
> > sure that those places know whether or not they are supposed to be
> > doing those updates. Again, why can't we just pass down a value
> > telling them "do reindex-style progress updates" or "do cluster-style
> > progress updates" or "do no progress updates"?
>
> That's invasive.  CREATE INDEX reporting goes pretty deep into the
> tree, with steps dedicated to the builds and scans of btree (nbtsort.c
> for example) and hash index AMs.  In this case we have something that
> does somewhat what you are looking for with report_progress which gets
> set to true only for VACUUM.  If we were to do something like that, we
> would also need to keep some sort of mapping regarding which command
> ID (as defined by ProgressCommandType) is able to use which set of
> parameter flags (as defined by the arguments of
> pgstat_progress_update_param() and its multi_* cousin).  Then comes
> the issue that some parameters may be used by multiple command types,
> while other don't (REINDEX and CREATE INDEX have some shared
> mapping).

Well, if CREATE INDEX progress reporting can't be reasonably done
within the current infrastructure, then it should be reverted for v12
and not committed again until somebody upgrades the infrastructure.

I admit that I was a bit suspicious about that commit, but I figured
Alvaro knew what he was doing and didn't study it in any depth.  And
perhaps he did know what he was doing and will disagree with your
assessment. But so far I haven't heard an idea for evolving the
infrastructure that sounds more than half-baked.

It's generally clear, though, that the existing infrastructure is not
well-suited to progress reporting for code that bounces all over the
tree.  It's not clear that *any* infrastructure is *entirely*
well-suited to do that; such problems are inherently complex. But this
is just a very simple system which was designed to be simple and very
low cost to use, and it may be that it's been stretched outside of its
comfort zone.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
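
A minimal sketch of the "pass down a value" approach Robert outlines,
written against the existing pgstat progress API (the enum and function
below are illustrative, not from any posted patch):

#include "postgres.h"
#include "commands/progress.h"
#include "pgstat.h"

/* Hypothetical mode flag threaded down through shared index-build code. */
typedef enum ProgressReportMode
{
	PROGRESS_REPORT_NONE,
	PROGRESS_REPORT_CREATE_INDEX,
	PROGRESS_REPORT_CLUSTER
} ProgressReportMode;

/* Shared code updates whichever command's counters apply, or none,
 * instead of guessing which command started the progress report. */
static void
report_tuples_done(ProgressReportMode mode, int64 tuples_done)
{
	if (mode == PROGRESS_REPORT_CREATE_INDEX)
		pgstat_progress_update_param(PROGRESS_CREATEIDX_TUPLES_DONE,
									 tuples_done);
	else if (mode == PROGRESS_REPORT_CLUSTER)
		pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED,
									 tuples_done);
}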




Re: [PATCH] Speedup truncates of relation forks

2019-09-13 Thread Fujii Masao
On Thu, Sep 5, 2019 at 5:53 PM Jamison, Kirk  wrote:
>
> On Tuesday, September 3, 2019 9:44 PM (GMT+9), Fujii Masao wrote:
> > Thanks for the patch!
>
> Thank you as well for the review!
>
> > -smgrdounlinkfork(SMgrRelation reln, ForkNumber forknum, bool isRedo)
> > +smgrdounlinkfork(SMgrRelation reln, ForkNumber *forknum, int nforks,
> > bool isRedo)
> >
> > smgrdounlinkfork() is dead code. Per the discussion [1], this unused 
> > function
> > was left intentionally. But it's still dead code since 2012, so I'd like to
> > remove it. Or, even if we decide to keep that function for some reasons, I
> > don't think that we need to update that so that it can unlink multiple forks
> > at once. So, what about keeping
> > smgrdounlinkfork() as it is?
> >
> > [1]
> > https://www.postgresql.org/message-id/1471.1339106...@sss.pgh.pa.us
>
> I also mentioned it from my first post if we can just remove this dead code.
> If not, it would require to modify the function because it would also
> need nforks as input argument when calling DropRelFileNodeBuffers. I kept my
> changes in the latest patch.
> So should I remove the function now or keep my changes?
>
>
> > + for (int i = 0; i < nforks; i++)
> >
> > The variable "i" should not be declared in for loop per PostgreSQL coding
> > style.
>
> Fixed.
>
>
> > + /* Check with the lower bound block number and skip the loop */ if
> > + (bufHdr->tag.blockNum < minBlock) continue; /* skip checking the
> > + buffer pool scan */
> >
> > Because of the above code, the following source comment in bufmgr.c should
> > be updated.
> >
> > * We could check forkNum and blockNum as well as the rnode, but the
> > * incremental win from doing so seems small.
> >
> > And, first of all, is this check really useful for performance?
> > Since firstDelBlock for FSM fork is usually small, minBlock would also be
> > small. So I'm not sure how much this is helpful for performance.
>
> This was a suggestion from Sawada-san in the previous email,
> but he also thought that the performance benefit might be small..
> so I just removed the related code block in this patch.
>
>
> > When relation is completely truncated at all (i.e., the number of block to
> > delete first is zero), can RelationTruncate() and smgr_redo()  just call
> > smgrdounlinkall() like smgrDoPendingDeletes() does, instead of calling
> > MarkFreeSpaceMapTruncateRel(), visibilitymap_truncate_prepare() and
> > smgrtruncate()? ISTM that smgrdounlinkall() is faster and simpler.
>
> I haven't applied this in my patch yet.
> If my understanding is correct, smgrdounlinkall() is used for deleting
> relation forks. However, we only truncate (not delete) relations
> in RelationTruncate() and smgr_redo(). I'm not sure if it's correct to
> use it here. Could you expound more your idea on using smgrdounlinkall?

This comment of mine is pointless, so please ignore it. Sorry for the noise.

Here are other comments for the latest patch:

+ block = visibilitymap_truncate_prepare(rel, 0);
+ if (BlockNumberIsValid(block))
+ fork = VISIBILITYMAP_FORKNUM;
+
+ smgrtruncate(rel->rd_smgr, &fork, 1, &block);

If visibilitymap_truncate_prepare() returns InvalidBlockNumber,
smgrtruncate() should not be called.

+ FreeSpaceMapVacuumRange(rel, first_removed_nblocks, InvalidBlockNumber);

FreeSpaceMapVacuumRange() should be called only when FSM exists,
like the original code does?

Regards,

-- 
Fujii Masao
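
Spelled out, the first comment asks for something like this (a sketch
reusing the variables from the quoted snippet and the patch's
array-taking smgrtruncate() signature; not the actual fix):

	ForkNumber	forks[MAX_FORKNUM + 1];
	BlockNumber	blocks[MAX_FORKNUM + 1];
	int			nforks = 0;
	BlockNumber	block;

	/* Only pass forks that actually have something to truncate. */
	block = visibilitymap_truncate_prepare(rel, 0);
	if (BlockNumberIsValid(block))
	{
		forks[nforks] = VISIBILITYMAP_FORKNUM;
		blocks[nforks] = block;
		nforks++;
	}
	if (nforks > 0)
		smgrtruncate(rel->rd_smgr, forks, nforks, blocks);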




Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2019-09-13 Thread Alvaro Herrera
On 2019-Sep-13, Michael Paquier wrote:

> Attached is a patch to fix your suggestions.  This also removes the
> use of HEAP_XMAX_IS_LOCKED_ONLY which did not make completely sense
> either as a "raw" flag.  While on it, the order of the flags can be
> improved to match more the order of htup_details.h

A thought I had as I fell asleep last night is to include the derivate
flags in a separate output column altogether.  So
heap_tuple_infomask_flags() could be made to return two columns, one
with the straight one-flag-per-bit, and another one with the compound
flags.  That way we always have the raw ones available, and we avoid any
confusion about strange cases such as LOCK_UPGRADED and IS_LOCKED_ONLY.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Minimal logical decoding on standbys

2019-09-13 Thread Amit Khandekar
On Mon, 9 Sep 2019 at 16:06, Amit Khandekar  wrote:
>
> On Tue, 3 Sep 2019 at 23:10, Alvaro Herrera  wrote:
> >
> > On 2019-Jul-19, Amit Khandekar wrote:
> >
> > > Attached are the split patches. Included is an additional patch that
> > > has doc changes. Here is what I have added in the docs. Pasting it
> > > here so that all can easily spot how it is supposed to behave, and to
> > > confirm that we are all on the same page :
> >
> > ... Apparently, this patch was not added to the commitfest for some
> > reason; and another patch that *is* in the commitfest has been said to
> > depend on this one (Petr's https://commitfest.postgresql.org/24/1961/
> > which hasn't been updated in quite a while and has received no feedback
> > at all, not even from the listed reviewer Shaun Thomas).  To make
> > matters worse, Amit's patchset no longer applies.
> >
> > What I would like to do is add a link to this thread to CF's 1961 entry
> > and then update all these patches, in order to get things moving.
>
> Hi Alvaro,
>
> Thanks for notifying about this. Will work this week on rebasing this
> patchset and putting it into the 2019-11 commit fest.

Rebased patch set attached.

Added in the Nov commitfest : https://commitfest.postgresql.org/25/2283/



-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


logicaldecodng_standby_v1_rebased.tar.gz
Description: GNU Zip compressed data


Re: pgbench - allow to create partitioned tables

2019-09-13 Thread Amit Kapila
On Fri, Sep 13, 2019 at 1:50 PM Fabien COELHO  wrote:
> >>> Is there a reason why we treat "partitions = 0" as a valid value?
> >>
> >> Yes. It is an explicit "do not create partitioned tables", which differs
> >> from 1, which says "create a partitioned table with just one partition".
> >
> > Why would anyone want to use --partitions option in the first case
> > ("do not create partitioned tables")?
>
> Having an explicit value for the default is generally a good idea, eg for
> a script to tests various partitioning settings:
>
>for nparts in 0 1 2 3 4 5 6 7 8 9 ; do
>  pgbench -i --partitions=$nparts ... ;
>  ...
>done
>
> Otherwise you would need significant kludging to add/remove the option.
> Allowing 0 does not harm anyone.
>
> Now if the consensus is to remove an explicit 0, it is simple enough to
> change it, but my opinion is that it is better to have it.
>

Fair enough, let us see if anyone else wants to weigh in.

> >>> I think we should print the information about partitions in
> >>> printResults.  It can help users while analyzing results.
> >
> > + res = PQexec(con,
> > + "select p.partstrat, count(*) "
> > + "from pg_catalog.pg_class as c "
> > + "left join pg_catalog.pg_partitioned_table as p on (p.partrelid = c.oid) "
> > + "left join pg_catalog.pg_inherits as i on (c.oid = i.inhparent) "
> > + "where c.relname = 'pgbench_accounts' "
> > + "group by 1, c.oid");
> >
> > Can't we write this query with inner join instead of left join?  What
> > additional purpose you are trying to serve by using left join?
>
> I'm ensuring that there is always a one line answer, whether it is
> partitioned or not. Maybe the count(*) should be count(something in p) to
> get 0 instead of 1 on non partitioned tables, though, but this is hidden
> in the display anyway.
>

Sure, but I feel the code will be simplified.  I see no reason for
using left join here.

> > +partition_method_t partition_method = PART_NONE;
> >
> > It is better to make this static.
>
> I do agree, but this would depart from all other global variables around
> which are currently not static.
>

Check QueryMode.

> Maybe a separate patch could turn them all
> as static, but ISTM that this patch should not change the current style.
>

No need to change others, but we can do it for this one.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
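
For illustration, the inner-join variant being suggested might look like
the sketch below (not the patch itself). Note the trade-off Fabien
mentions: with an inner join, an unpartitioned pgbench_accounts yields
zero rows, so the caller has to treat an empty result as "not
partitioned":

#include <libpq-fe.h>

/* One row (partstrat, partition count) iff pgbench_accounts is
 * partitioned; zero rows otherwise. */
static PGresult *
query_partition_info(PGconn *con)
{
	return PQexec(con,
				  "select p.partstrat, count(i.inhrelid) "
				  "from pg_catalog.pg_partitioned_table as p "
				  "join pg_catalog.pg_class as c on (c.oid = p.partrelid) "
				  "left join pg_catalog.pg_inherits as i "
				  "  on (i.inhparent = c.oid) "
				  "where c.relname = 'pgbench_accounts' "
				  "group by 1");
}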




Re: Why overhead of SPI is so large?

2019-09-13 Thread Pavel Stehule
Hi

I am testing a very simple function:

create or replace function f1(int) returns int as $$ declare i int = 0;
begin while i < $1 loop i = i + 1; end loop; return i; end $$ language
plpgsql immutable;

profile - when function is marked as immutable

   8,65%  postgres    [.] ExecInterpExpr
   8,59%  postgres    [.] AcquireExecutorLocks
   6,95%  postgres    [.] OverrideSearchPathMatchesCurrent
   5,72%  plpgsql.so  [.] plpgsql_param_eval_var
   5,15%  postgres    [.] AcquirePlannerLocks
   4,54%  postgres    [.] RevalidateCachedQuery
   4,52%  postgres    [.] GetCachedPlan
   3,82%  postgres    [.] ResourceArrayRemove
   2,87%  postgres    [.] SPI_plan_get_cached_plan
   2,80%  plpgsql.so  [.] exec_eval_expr
   2,70%  plpgsql.so  [.] exec_assign_value
   2,55%  plpgsql.so  [.] exec_stmt
   2,53%  postgres    [.] recomputeNamespacePath
   2,39%  plpgsql.so  [.] exec_cast_value
   2,19%  postgres    [.] int4pl
   2,13%  postgres    [.] int4lt
   1,98%  postgres    [.] CheckCachedPlan

volatile

   7,21%  postgres  [.] GetSnapshotData
   6,92%  plpgsql.so[.] exec_eval_simple_expr
   5,79%  postgres  [.] AcquireExecutorLocks
   5,57%  postgres  [.] ExecInterpExpr
   4,12%  postgres  [.] LWLockRelease
   3,68%  postgres  [.] OverrideSearchPathMatchesCurrent
   3,64%  postgres  [.] PopActiveSnapshot
   3,36%  plpgsql.so[.] plpgsql_param_eval_var
   3,31%  postgres  [.] LWLockAttemptLock
   3,13%  postgres  [.] AllocSetAlloc
   2,91%  postgres  [.] GetCachedPlan
   2,79%  postgres  [.] MemoryContextAlloc
   2,76%  postgres  [.] AcquirePlannerLocks
   2,70%  postgres  [.] ResourceArrayRemove
   2,45%  postgres  [.] PushActiveSnapshot
   2,44%  postgres  [.] RevalidateCachedQuery
   2,29%  postgres  [.] SPI_plan_get_cached_plan
   2,18%  postgres  [.] CopySnapshot
   1,95%  postgres  [.] AllocSetFree
   1,81%  postgres  [.] LWLockAcquire
   1,71%  plpgsql.so[.] exec_assign_value
   1,61%  plpgsql.so[.] exec_stmt
   1,59%  plpgsql.so[.] exec_eval_expr
   1,48%  postgres  [.] int4pl
   1,48%  postgres  [.] CheckCachedPlan
   1,40%  plpgsql.so[.] exec_cast_value
   1,39%  postgres  [.] int4lt
   1,38%  postgres  [.] recomputeNamespacePath
   1,25%  plpgsql.so[.] exec_eval_cleanup
   1,08%  postgres  [.] ScanQueryForLocks
   1,01%  plpgsql.so[.] exec_eval_boolean
   1,00%  postgres  [.] pfree

For the tested function, almost all CPU time should be spent in the int4pl
and int4lt functions - but together they account for only about 4%. I think
almost all of

   8,59%  postgres    [.] AcquireExecutorLocks
   6,95%  postgres    [.] OverrideSearchPathMatchesCurrent
   5,72%  plpgsql.so  [.] plpgsql_param_eval_var
   5,15%  postgres    [.] AcquirePlannerLocks
   4,54%  postgres    [.] RevalidateCachedQuery
   4,52%  postgres    [.] GetCachedPlan
   3,82%  postgres    [.] ResourceArrayRemove
   2,87%  postgres    [.] SPI_plan_get_cached_plan
   2,53%  postgres    [.] recomputeNamespacePath

can be reduced if we know that we are calling only built-in immutable V1
functions.

My example is an extreme - when you use any embedded SQL, the profile
changes significantly. But in some cases there can be a nice, significant
speedup for expression-only functions (like PostGIS).

Regards

Pavel
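
The snapshot-related entries in the volatile profile (GetSnapshotData,
Push/PopActiveSnapshot, CopySnapshot) come from plpgsql's simple-expression
path. A sketch, loosely after exec_eval_simple_expr() in pl_exec.c, of the
kind of skip being discussed; the immutable_only flag is an assumption
here, not an existing field:

#include "postgres.h"
#include "access/xact.h"
#include "executor/executor.h"
#include "utils/snapmgr.h"

/* immutable_only: hypothetical "expression uses only immutable
 * built-ins" property; readonly_func mirrors estate->readonly_func. */
static Datum
eval_simple_expr_sketch(ExprState *exprstate, ExprContext *econtext,
						bool *isnull, bool readonly_func, bool immutable_only)
{
	Datum		result;
	bool		need_snapshot = !readonly_func && !immutable_only;

	/* This per-evaluation snapshot is what the profile charges to
	 * GetSnapshotData and friends for volatile functions. */
	if (need_snapshot)
	{
		CommandCounterIncrement();
		PushActiveSnapshot(GetTransactionSnapshot());
	}

	result = ExecEvalExpr(exprstate, econtext, isnull);

	if (need_snapshot)
		PopActiveSnapshot();

	return result;
}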


Re: [PATCH] Tab completion for CREATE OR REPLACE

2019-09-13 Thread Fujii Masao
On Tue, Sep 3, 2019 at 11:04 PM Fujii Masao  wrote:
>
> On Thu, Aug 22, 2019 at 3:06 PM Wang, Shenhao
>  wrote:
> >
> > Hello, hackers:
> >
> > I created a patch about tab completion for command CREATE OR REPLACE in psql
> > includes:
> > CREATE [ OR REPLACE ] FUNCTION
> > CREATE [ OR REPLACE ] PROCEDURE
> > CREATE [ OR REPLACE ] LANGUAGE
> > CREATE [ OR REPLACE ] RULE name AS ON event
> > CREATE [ OR REPLACE ] VIEW AS SELECT
> > CREATE [ OR REPLACE ] AGGREGATE
> > CREATE [ OR REPLACE ] TRANSFORM
>
> Thanks for the patch! The patch looks good to me.

Committed. Thanks!

Regards,

-- 
Fujii Masao




Re: pgbench - allow to create partitioned tables

2019-09-13 Thread Dilip Kumar
On Fri, Sep 13, 2019 at 2:05 PM Fabien COELHO  wrote:
>
>
> Hello Dilip,
>
> > + /* For RANGE, we use open-ended partitions at the beginning and end */
> > + if (p == 1)
> > + sprintf(minvalue, "minvalue");
> > + else
> > + sprintf(minvalue, INT64_FORMAT, (p-1) * part_size + 1);
> > +
> > + if (p < partitions)
> > + sprintf(maxvalue, INT64_FORMAT, p * part_size + 1);
> > + else
> > + sprintf(maxvalue, "maxvalue");
> >
> > I do not understand the reason why the first partition needs to be
> > open-ended.  Because we are clear that the minimum value of the aid is 1
> > in pgbench_accounts.  So if you directly use sprintf(minvalue,
> > INT64_FORMAT, (p-1) * part_size + 1);  then also it will give 1 as
> > minvalue for the first partition and that will be the right thing to do.
> > Am I missing something here?

>
> This is simply for the principle that any value allowed for the primary
> key type has a corresponding partition, and also that it exercises these
> special values.

IMHO, the primary key values for the pgbench_accounts table are always
within the defined range minvalue=1 and maxvalue=scale*10, right?
>
> It also probably reduces the cost of checking whether a value belongs to
> the first partition because one test is removed, so there is a small
> additional performance benefit beyond principle and coverage.

Ok,  I agree that it will slightly reduce the cost for the tuple
falling in the first and the last partition.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Why overhead of SPI is so large?

2019-09-13 Thread Konstantin Knizhnik



On 13.09.2019 10:16, Pavel Stehule wrote:



On Fri, 13 Sep 2019 at 9:09, Konstantin Knizhnik
<k.knizh...@postgrespro.ru> wrote:




On 24.08.2019 19:13, Pavel Stehule wrote:



On Sat, 24 Aug 2019 at 18:01, David Fetter <da...@fetter.org> wrote:

On Fri, Aug 23, 2019 at 11:10:28AM +0200, Pavel Stehule wrote:
> On Fri, 23 Aug 2019 at 11:05, Konstantin Knizhnik
> <k.knizh...@postgrespro.ru> wrote:
>
> >
> >
> > On 22.08.2019 18:56, Pavel Stehule wrote:
> >
> >
> >
> > On Thu, 22 Aug 2019 at 17:51, Konstantin Knizhnik
> > <k.knizh...@postgrespro.ru> wrote:
> >
> >> Some more information...
> >> First of all I found out that marking a PL/pgSQL function as immutable
> >> significantly increases the speed of its execution:
> >> 19808 ms vs. 27594. It happens because exec_eval_simple_expr takes a
> >> snapshot if the function is volatile (the default).
> >> I wonder if the PL/pgSQL compiler can detect that the evaluated
> >> expression itself is actually immutable and there is no need to take a
> >> snapshot for each invocation of this function. Also I have tried yet
> >> another PL language - JavaScript, which is now the new outsider, despite
> >> the fact that the v8 JIT compiler is very good.
> >>
> >
> > I have a plan to do some work in this direction. A snapshot is not
> > necessary for almost all built-in functions. If the expr calls only
> > built-in functions, then it can probably be evaluated without a snapshot
> > and without any work with the plan cache.
> >
> >
> > I wonder if the following simple patch is correct?
> >
>
> You cannot trust user-defined functions that the immutable flag is
> correct. Only built-in functions are 100% correct.
>
> CREATE OR REPLACE FUNCTION foo()
> RETURNS int AS $$
> SELECT count(*) FROM pg_class;
> $$ LANGUAGE sql IMMUTABLE;
>
> is working.

No, it's lying to the RDBMS, so it's pilot error. The problem of
determining from the function itself whether it is in fact immutable
is, in general, equivalent to the Halting Problem, so no, we can't
figure it out. We do need to trust our users not to lie to us, and we
do not need to protect them from the consequences when they do.


I don't have any problem with fixing this behavior once there is an
alternative.

I can imagine a new special flag that can be used for STABLE functions,
that enforces one-shot plans and can be optimized similarly to IMMUTABLE
functions now - using the result at planning time.

The users lie because they must - there is no alternative. There is no
other solution - and estimation errors related to joins are a fundamental
issue.



Pavel, I wonder if I can put my patch (with a fix that performs
this optimization only for built-in functions) into the commitfest, or
do you prefer to do it yourself in some other way and propose your
own solution?


I think your patch is good enough for the commitfest.

It doesn't remove all the overhead - I think there is still a lot of
overhead related to the plan cache - but it is a step in the right
direction.

Probably the final target for these expressions is a cached JIT - but
nobody knows when that will be. I won't have time for my experiments
before October.




This is the profile of the execution of a PL/pgSQL function with my patch:

   5.39%  postgres  plpgsql.so [.] exec_assign_value
   5.10%  postgres  postgres   [.] ExecInterpExpr
   4.70%  postgres  postgres   [.] tts_buffer_heap_getsomeattrs
   4.56%  postgres  plpgsql.so [.] exec_move_row_from_fields
   3.87%  postgres  postgres   [.] ExecScan
   3.74%  postgres  plpgsql.so [.] exec_eval_expr
   3.64%  postgres  postgres   [.] heap_form_tuple
   3.13%  postgres  postgres   [.] heap_fill_tuple
   3.07%  postgres  postgres   [.] heapgettup_pagemode
   2.95%  postgres  postgres   [.] heap_deform_tuple
   2.92%  postgres  plpgsql.so [.] plpgsql_param_eval_var
   2.64%  postgres  postgres   [.] HeapTupleSatisfiesVisibility
   2.61%  postgres  postgres   [.] AcquirePlannerLocks
   2.58%  postgres  postgres   [.] AcquireExecutorLocks
   2.43%  postgres  postgres   [.] GetCachedPlan
   2.26%  postgres  plpgsql.so [.] exec_stmt
   2.23%  postgres  plpgsql.so [.] exec_cast_value
   1.89%  postgres  postgres   [.] AllocSetAlloc
   1.75%  postgres  postgres   [.] palloc0
   1.73%  postgres  plpgsql.so  

Re: pgbench - allow to create partitioned tables

2019-09-13 Thread Fabien COELHO



Hello Dilip,


+ /* For RANGE, we use open-ended partitions at the beginning and end */
+ if (p == 1)
+ sprintf(minvalue, "minvalue");
+ else
+ sprintf(minvalue, INT64_FORMAT, (p-1) * part_size + 1);
+
+ if (p < partitions)
+ sprintf(maxvalue, INT64_FORMAT, p * part_size + 1);
+ else
+ sprintf(maxvalue, "maxvalue");

I do not understand the reason why the first partition needs to be
open-ended.  Because we are clear that the minimum value of the aid is 1
in pgbench_accounts.  So if you directly use sprintf(minvalue,
INT64_FORMAT, (p-1) * part_size + 1);  then also it will give 1 as 
minvalue for the first partition and that will be the right thing to do. 
Am I missing something here?


This is simply for the principle that any value allowed for the primary 
key type has a corresponding partition, and also that it exercises these
special values.


It also probably reduces the cost of checking whether a value belongs to 
the first partition because one test is removed, so there is a small 
additional performance benefit beyond principle and coverage.


--
Fabien.




Re: [PATCH][PROPOSAL] Add enum releation option type

2019-09-13 Thread Michael Paquier
On Fri, Sep 13, 2019 at 10:16:30AM +0300, Nikolay Shaplov wrote:
> I also attached a diff of what I have done to get these warnings. It should 
> be 
> applied to latest version of patch we are discussing.

If you do that I think that the CF bot is not going to appreciate and
will result in failures trying to apply the patch.
--
Michael




Re: pgbench - allow to create partitioned tables

2019-09-13 Thread Dilip Kumar
On Fri, Sep 13, 2019 at 1:35 PM Fabien COELHO  wrote:

Thanks for the updated version of the patch.

> > Generally, we give one blank line between the variable declaration and
> > the first statement of the block.
>
> Ok.
>
> > (p-1) -> (p - 1)
>
> Ok.
>
> > I am just wondering will it be a good idea to expand it to support
> > multi-level partitioning?
>
> ISTM that how the user could specify multi-level parameters is pretty
> unclear, so I would let that as a possible extension if someone wants it
> enough.
Ok
>
> Attached v6 implements the two cosmetic changes outlined above.

+ /* For RANGE, we use open-ended partitions at the beginning and end */
+ if (p == 1)
+ sprintf(minvalue, "minvalue");
+ else
+ sprintf(minvalue, INT64_FORMAT, (p-1) * part_size + 1);
+
+ if (p < partitions)
+ sprintf(maxvalue, INT64_FORMAT, p * part_size + 1);
+ else
+ sprintf(maxvalue, "maxvalue");

I do not understand the reason why the first partition needs to be
open-ended.  Because we are clear that the minimum value of the aid is
1 in pgbench_accounts.  So if you directly use
sprintf(minvalue, INT64_FORMAT, (p-1) * part_size + 1);  then also it
will give 1 as minvalue for the first partition and that will be the
right thing to do.  Am I missing something here?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: pgbench - allow to create partitioned tables

2019-09-13 Thread Fabien COELHO



Hello Amit,


Is there a reason why we treat "partitions = 0" as a valid value?


Yes. It is an explicit "do not create partitioned tables", which differs
from 1, which says "create a partitioned table with just one partition".


Why would anyone want to use --partitions option in the first case
("do not create partitioned tables")?


Having an explicit value for the default is generally a good idea, eg for 
a script to tests various partitioning settings:


  for nparts in 0 1 2 3 4 5 6 7 8 9 ; do
pgbench -i --partitions=$nparts ... ;
...
  done

Otherwise you would need significant kludging to add/remove the option.
Allowing 0 does not harm anyone.

Now if the consensus is to remove an explicit 0, it is simple enough to 
change it, but my opinion is that it is better to have it.



I think we should print the information about partitions in
printResults.  It can help users while analyzing results.


+ res = PQexec(con,
+ "select p.partstrat, count(*) "
+ "from pg_catalog.pg_class as c "
+ "left join pg_catalog.pg_partitioned_table as p on (p.partrelid = c.oid) "
+ "left join pg_catalog.pg_inherits as i on (c.oid = i.inhparent) "
+ "where c.relname = 'pgbench_accounts' "
+ "group by 1, c.oid");

Can't we write this query with inner join instead of left join?  What
additional purpose you are trying to serve by using left join?


I'm ensuring that there is always a one line answer, whether it is 
partitioned or not. Maybe the count(*) should be count(something in p) to 
get 0 instead of 1 on non partitioned tables, though, but this is hidden 
in the display anyway.



+partition_method_t partition_method = PART_NONE;

It is better to make this static.


I do agree, but this would depart from all other global variables around 
which are currently not static. Maybe a separate patch could turn them all 
as static, but ISTM that this patch should not change the current style.


--
Fabien.




Re: proposal - patch: psql - sort_by_size

2019-09-13 Thread Pavel Stehule
On Fri, 13 Sep 2019 at 9:35, Pavel Stehule wrote:

>
>
> On Thu, 12 Sep 2019 at 0:01, Alvaro Herrera from 2ndQuadrant
> <alvhe...@alvh.no-ip.org> wrote:
>
>> On 2019-Jul-31, Rafia Sabih wrote:
>>
>> > I had a look at this patch, seems like a useful thing to have.
>>
>> So the two initial questions for this patch are
>>
>> 1. Is this a feature we want?
>> 2. Is the user interface correct?
>>
>> I think the feature is useful, and Rafia also stated as much.  Therefore
>> ISTM we're okay on that front.
>>
>> As for the UI, Fabien thinks the patch adopts one that's far too
>> simplistic, and I agree.  Fabien has proposed a number of different UIs,
>> but doesn't seem convinced of any of them.  One of them was to have
>> "options" in the command,
>>  \dt+ [-o 1d,2a]
>>
>
>> Another idea is to use variables in a more general form.  So instead of
>> Pavel's proposal of SORT_BY_SIZE=on we could do something like
>> SORT_BY=[list]
>> where the list after the equal sign consists of predetermined elements
>> (say SIZE, NAME, SCHEMA and so on) and indicates a specific column to
>> sort by.  This is less succint than Fabien's idea, and in particular you
>> can't specify it in the command itself but have to set the variable
>> beforehand instead.
>>
>
> For a more generic design you probably need to redesign psql's report
> system. You cannot just use ORDER BY 1,2 on some columns; you need to
> produce (and later hide) some content (for size).
>
> So it can be an unpleasantly complex patch. I finished sorting inside
> pspg, and it looks like a better solution than increasing complexity
> (and reducing maintainability, due to supporting old releases).
>

I changed the status of this patch to withdrawn.

I like the idea of enhancing \dt with some clauses like "\dt+ [-o 1d,2a]".
But it probably needs a significant redesign of the describe.c module.
Maybe implementing a simple query generator for the queries against the
system catalogs would be good.

Surely this should be implemented from scratch - I am not a volunteer for
that.
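
To illustrate why ORDER BY on the displayed columns is not enough: the size
column \dt+ prints is pretty-printed text, so a generator would have to
order on the raw byte count while displaying the pretty value, roughly along
these lines (a simplified sketch, not the actual describe.c query):

  SELECT n.nspname AS "Schema",
         c.relname AS "Name",
         pg_catalog.pg_size_pretty(pg_catalog.pg_table_size(c.oid)) AS "Size"
  FROM pg_catalog.pg_class c
       JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
  WHERE c.relkind = 'r'
  ORDER BY pg_catalog.pg_table_size(c.oid) DESC, 1, 2;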

Pavel


> Regards
>
> Pavel
>
>
>> --
>> Álvaro Herrera                https://www.2ndQuadrant.com/
>> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>>
>


Re: pgbench - allow to create partitioned tables

2019-09-13 Thread Fabien COELHO


Hello Dilip,


Generally, we give one blank line between the variable declaration and
the first statement of the block.


Ok.


(p-1) -> (p - 1)


Ok.

I am just wondering whether it would be a good idea to expand it to support 
multi-level partitioning?


ISTM that how the user could specify multi-level parameters is pretty 
unclear, so I would leave that as a possible extension if someone wants it 
enough.


Attached v6 implements the two cosmetic changes outlined above.

--
Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index c857aa3cba..e3a0abb4c7 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -306,6 +306,31 @@ pgbench  options  d
   
  
 
+ 
+  --partitions=NUM
+  
+   
+Create a partitioned pgbench_accounts table with
+NUM partitions of nearly equal size for
+the scaled number of accounts.
+Default is 0, meaning no partitioning.
+   
+  
+ 
+
+ 
+  --partition-method=NAME
+  
+   
+Create a partitioned pgbench_accounts table with
+NAME method.
+Expected values are range or hash.
+This option requires that --partitions is set to non-zero.
+If unspecified, default is range.
+   
+  
+ 
+
  
   --tablespace=tablespace
   
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index ed7652bfbf..2b9fd07561 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -186,6 +186,14 @@ int64		latency_limit = 0;
 char	   *tablespace = NULL;
 char	   *index_tablespace = NULL;
 
+/* partitioning for pgbench_accounts table, 0 for no partitioning */
+int 		partitions = 0;
+
+typedef enum { PART_NONE, PART_RANGE, PART_HASH, PART_UNKNOWN }
+  partition_method_t;
+
+partition_method_t partition_method = PART_NONE;
+
 /* random seed used to initialize base_random_sequence */
 int64		random_seed = -1;
 
@@ -617,6 +625,9 @@ usage(void)
 		   "  --foreign-keys   create foreign key constraints between tables\n"
 		   "  --index-tablespace=TABLESPACE\n"
 		   "   create indexes in the specified tablespace\n"
+		   "  --partitions=NUM partition pgbench_accounts in NUM parts (default: 0)\n"
+		   "  --partition-method=(range|hash)\n"
+		   "   partition pgbench_accounts with this method (default: range)\n"
 		   "  --tablespace=TABLESPACE  create tables in the specified tablespace\n"
 		   "  --unlogged-tablescreate tables as unlogged tables\n"
 		   "\nOptions to select what to run:\n"
@@ -3601,6 +3612,17 @@ initDropTables(PGconn *con)
 	 "pgbench_tellers");
 }
 
+/*
+ * add fillfactor percent option if not 100.
+ */
+static void
+append_fillfactor(char *opts, int len)
+{
+	if (fillfactor < 100)
+		snprintf(opts + strlen(opts), len - strlen(opts),
+ " with (fillfactor=%d)", fillfactor);
+}
+
 /*
  * Create pgbench's standard tables
  */
@@ -3664,9 +3686,18 @@ initCreateTables(PGconn *con)
 
 		/* Construct new create table statement. */
 		opts[0] = '\0';
-		if (ddl->declare_fillfactor)
+
+		/* Partition pgbench_accounts table */
+		if (partitions >= 1 && strcmp(ddl->table, "pgbench_accounts") == 0)
+		{
 			snprintf(opts + strlen(opts), sizeof(opts) - strlen(opts),
-	 " with (fillfactor=%d)", fillfactor);
+	 " partition by %s (aid)",
+	 partition_method == PART_RANGE ? "range" : "hash");
+		}
+		else if (ddl->declare_fillfactor)
+			/* fillfactor is only expected on actual tables */
+			append_fillfactor(opts, sizeof(opts));
+
 		if (tablespace != NULL)
 		{
 			char	   *escape_tablespace;
@@ -3686,6 +3717,57 @@ initCreateTables(PGconn *con)
 
 		executeStatement(con, buffer);
 	}
+
+	/* if needed, pgbench_accounts partitions must be created manually */
+	if (partitions >= 1)
+	{
+		char		ff[64];
+
+		ff[0] = '\0';
+		append_fillfactor(ff, sizeof(ff));
+
+		fprintf(stderr, "creating %d partitions...\n", partitions);
+
+		for (int p = 1; p <= partitions; p++)
+		{
+			char		query[256];
+
+			if (partition_method == PART_RANGE)
+			{
+int64		part_size = (naccounts * (int64) scale + partitions - 1) / partitions;
+char		minvalue[32], maxvalue[32];
+
+/* For RANGE, we use open-ended partitions at the beginning and end */
+if (p == 1)
+	sprintf(minvalue, "minvalue");
+else
+	sprintf(minvalue, INT64_FORMAT, (p - 1) * part_size + 1);
+
+if (p < partitions)
+	sprintf(maxvalue, INT64_FORMAT, p * part_size + 1);
+else
+	sprintf(maxvalue, "maxvalue");
+
+snprintf(query, sizeof(query),
+		 "create%s table pgbench_accounts_%d\n"
+		 "  partition of pgbench_accounts\n"
+		 "  for values from (%s) to (%s)%s\n",
+		 unlogged_tables ? " unlogged" : "", p,
+		 minvalue, maxvalue, ff);
+			}
+			else if (partition_method == PART_HASH)
+snprintf(query, sizeof(query),
+		 "create%s table pgbench_accounts_%d\n"
+		 "  partition of 

Re: proposal - patch: psql - sort_by_size

2019-09-13 Thread Pavel Stehule
On Thu, Sep 12, 2019 at 12:01 AM Alvaro Herrera from 2ndQuadrant <
alvhe...@alvh.no-ip.org> wrote:

> On 2019-Jul-31, Rafia Sabih wrote:
>
> > I had a look at this patch, seems like a useful thing to have.
>
> So the two initial questions for this patch are
>
> 1. Is this a feature we want?
> 2. Is the user interface correct?
>
> I think the feature is useful, and Rafia also stated as much.  Therefore
> ISTM we're okay on that front.
>
> As for the UI, Fabien thinks the patch adopts one that's far too
> simplistic, and I agree.  Fabien has proposed a number of different UIs,
> but doesn't seem convinced of any of them.  One of them was to have
> "options" in the command,
>  \dt+ [-o 1d,2a]
>

> Another idea is to use variables in a more general form.  So instead of
> Pavel's proposal of SORT_BY_SIZE=on we could do something like
> SORT_BY=[list]
> where the list after the equal sign consists of predetermined elements
> (say SIZE, NAME, SCHEMA and so on) and indicates a specific column to
> sort by.  This is less succint than Fabien's idea, and in particular you
> can't specify it in the command itself but have to set the variable
> beforehand instead.
>

For a more generic design you probably need to redesign the psql report
system. You cannot just use ORDER BY 1,2 on some columns; you need to produce
(and later hide) some content (for the size).

So it could turn into an unpleasantly complex patch. I finished sort inside
pspg, and it looks to be a better solution than increasing complexity (and
reducing maintainability, due to supporting old releases).

Regards

Pavel


> --
> Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: Why overhead of SPI is so large?

2019-09-13 Thread Pavel Stehule
On Fri, Sep 13, 2019 at 9:09 AM Konstantin Knizhnik <
k.knizh...@postgrespro.ru> wrote:

>
>
> On 24.08.2019 19:13, Pavel Stehule wrote:
>
>
>
>> On Sat, Aug 24, 2019 at 6:01 PM David Fetter  wrote:
>
>> On Fri, Aug 23, 2019 at 11:10:28AM +0200, Pavel Stehule wrote:
>> > On Fri, Aug 23, 2019 at 11:05 AM Konstantin Knizhnik <
>> > k.knizh...@postgrespro.ru> wrote:
>> >
>> > >
>> > >
>> > > On 22.08.2019 18:56, Pavel Stehule wrote:
>> > >
>> > >
>> > >
>> > > On Thu, Aug 22, 2019 at 5:51 PM Konstantin Knizhnik <
>> > > k.knizh...@postgrespro.ru> wrote:
>> > >
>> > >> Some more information...
>> > >> First of all I found out that marking a PL/pgSQL function as
>> > >> immutable significantly increases the speed of its execution:
>> > >> 19808 ms vs. 27594 ms. It happens because exec_eval_simple_expr takes
>> > >> a snapshot if the function is volatile (the default).
>> > >> I wonder if the PL/pgSQL compiler can detect that the evaluated
>> > >> expression itself is actually immutable, so that there is no need to
>> > >> take a snapshot for each invocation of this function. Also I have
>> > >> tried yet another PL language - JavaScript, which is now the new
>> > >> outsider, despite the fact that the v8 JIT compiler is very good.
>> > >>
>> > >
>> > > I have a plan to do some work in this direction. A snapshot is not
>> > > necessary for almost all built-in functions. If an expression calls
>> > > only built-in functions, then it can probably be called without a
>> > > snapshot and without any work with the plan cache.
>> > >
>> > >
>> > > I wonder if the following simple patch is correct?
>> > >
>> >
>> > You cannot trust user-defined functions, so the immutable flag may not
>> > be correct. Only built-in functions are 100% correct.
>> >
>> > CREATE OR REPLACE FUNCTION foo()
>> > RETURNS int AS $$
>> > SELECT count(*) FROM pg_class;
>> > $$ LANGUAGE sql IMMUTABLE;
>> >
>> > is working.
>>
>> No, it's lying to the RDBMS, so it's pilot error. The problem of
>> determining from the function itself whether it is in fact immutable
>> is, in general, equivalent to the Halting Problem, so no, we can't
>> figure it out. We do need to trust our users not to lie to us, and we
>> do not need to protect them from the consequences when they do.
>>
>
> I have no problem with fixing this behavior when there is an
> alternative.
>
> I can imagine a new special flag for STABLE functions that enforces
> one-shot plans and can be optimized similarly to how IMMUTABLE
> functions are now - using the result at planning time.
>
> The users lie because they must - there is no alternative. There is no
> other solution - and estimation errors related to joins are a
> fundamental issue.
>
>
> Pavel, I wonder if I can put my patch (with a fix which performs this
> optimization only for built-in functions) into the commitfest, or do you
> prefer to do it yourself in some other way and propose your own solution?
>

I think your patch is good enough for the commitfest.

It doesn't remove all the overhead - I think there is a lot of overhead
related to the plan cache - but it goes in a good direction.

Probably the final target for these expressions is using a cached JIT, but
nobody knows when that will be. I'll not have time for my experiments
before October.
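
For anyone who wants to reproduce the effect under discussion, a hedged
micro-benchmark sketch (function names are made up; absolute timings will
vary):

CREATE FUNCTION f_volatile(a int) RETURNS int AS $$
BEGIN
  RETURN a + 1;
END;
$$ LANGUAGE plpgsql;  -- VOLATILE by default: a snapshot is taken per call

CREATE FUNCTION f_immutable(a int) RETURNS int IMMUTABLE AS $$
BEGIN
  RETURN a + 1;
END;
$$ LANGUAGE plpgsql;  -- lets exec_eval_simple_expr skip the snapshot

-- Compare with \timing enabled in psql:
SELECT sum(f_volatile(i))  FROM generate_series(1, 1000000) AS i;
SELECT sum(f_immutable(i)) FROM generate_series(1, 1000000) AS i;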



>
>
> Konstantin Knizhnik
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>


Re: [PATCH][PROPOSAL] Add enum releation option type

2019-09-13 Thread Nikolay Shaplov
In a message of Thursday, September 5, 2019, 11:42:27 MSK, Alvaro 
Herrera from 2ndQuadrant wrote:
> After looking closer once again, I don't like this patch.
> 
> I think the FOOBAR_ENUM_DEF defines serve no purpose, other than
> source-code placement next to the enum value definitions.  I think for
> example check_option, living in reloptions.c, should look like this:

This sounds like a good idea. I tried it, but did not succeed.

When I do as you suggest I get a bunch of warnings, and moreover, the tests 
do not pass afterwards.

reloptions.c:447:3: warning: braces around scalar initializer
   {
   ^
reloptions.c:447:3: warning: (near initialization for ‘enumRelOpts[0].members’)
reloptions.c:448:4: warning: braces around scalar initializer
{ "on",  GIST_OPTION_BUFFERING_ON },
^
reloptions.c:448:4: warning: (near initialization for ‘enumRelOpts[0].members’)
reloptions.c:448:4: warning: initialization from incompatible pointer type
reloptions.c:448:4: warning: (near initialization for ‘enumRelOpts[0].members’)

and so on, see full warning list attached.

I've played around with it and did some googling, but without much success.
If we are moving this way (and this way seems to be a good one), I need your 
help, because this thing is beyond my C knowledge; I will not manage it myself.
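
For what it's worth, "braces around scalar initializer" is what a compiler
emits when a braced list is used to initialize a pointer member rather than
an array member. A minimal standalone sketch (made-up names, not PostgreSQL
code) of the failure mode and a possible C99 workaround using a compound
literal:

typedef struct
{
	const char *name;
	int			value;
} elt_def;

typedef struct
{
	elt_def    *members;		/* pointer member: a braced list can't fill it */
	int			default_val;
} enum_opt;

/* This triggers "braces around scalar initializer":
 *   static enum_opt bad = { { {"on", 1}, {NULL, 0} }, 1 };
 * whereas a C99 compound literal (static storage at file scope) works:
 */
static enum_opt good = {
	(elt_def[]) { {"on", 1}, {"off", 0}, {"auto", 2}, {NULL, 0} },
	2
};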

I also attached a diff of what I have done to get these warnings. It should be 
applied to the latest version of the patch we are discussing.
reloptions.c:447:3: warning: braces around scalar initializer
   {
   ^
reloptions.c:447:3: warning: (near initialization for ‘enumRelOpts[0].members’)
reloptions.c:448:4: warning: braces around scalar initializer
{"on",  GIST_OPTION_BUFFERING_ON },
^
reloptions.c:448:4: warning: (near initialization for ‘enumRelOpts[0].members’)
reloptions.c:448:4: warning: initialization from incompatible pointer type
reloptions.c:448:4: warning: (near initialization for ‘enumRelOpts[0].members’)
reloptions.c:448:4: warning: excess elements in scalar initializer
reloptions.c:448:4: warning: (near initialization for ‘enumRelOpts[0].members’)
reloptions.c:449:4: warning: braces around scalar initializer
{"off",  GIST_OPTION_BUFFERING_OFF },
^
reloptions.c:449:4: warning: (near initialization for ‘enumRelOpts[0].members’)
reloptions.c:449:4: warning: initialization from incompatible pointer type
reloptions.c:449:4: warning: (near initialization for ‘enumRelOpts[0].members’)
reloptions.c:449:4: warning: excess elements in scalar initializer
reloptions.c:449:4: warning: (near initialization for ‘enumRelOpts[0].members’)
reloptions.c:449:4: warning: excess elements in scalar initializer
reloptions.c:449:4: warning: (near initialization for ‘enumRelOpts[0].members’)
reloptions.c:450:4: warning: braces around scalar initializer
{"auto", GIST_OPTION_BUFFERING_AUTO },
^
reloptions.c:450:4: warning: (near initialization for ‘enumRelOpts[0].members’)
reloptions.c:450:4: warning: initialization from incompatible pointer type
reloptions.c:450:4: warning: (near initialization for ‘enumRelOpts[0].members’)
reloptions.c:450:4: warning: excess elements in scalar initializer
reloptions.c:450:4: warning: (near initialization for ‘enumRelOpts[0].members’)
reloptions.c:450:4: warning: excess elements in scalar initializer
reloptions.c:450:4: warning: (near initialization for ‘enumRelOpts[0].members’)
reloptions.c:451:4: warning: braces around scalar initializer
{ NULL }
^
reloptions.c:451:4: warning: (near initialization for ‘enumRelOpts[0].members’)
reloptions.c:451:4: warning: excess elements in scalar initializer
reloptions.c:451:4: warning: (near initialization for ‘enumRelOpts[0].members’)
diff --git a/.gitignore b/.gitignore
index 794e35b..37331c2 100644
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index fffab3a..fcf4766 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -433,8 +433,6 @@ static relopt_real realRelOpts[] =
 	{{NULL}}
 };
 
-static relopt_enum_elt_def gist_buffering_enum_def[] =
-GIST_OPTION_BUFFERING_ENUM_DEF;
 static relopt_enum_elt_def view_check_option_enum_def[] =
 VIEW_OPTION_CHECK_OPTION_ENUM_DEF;
 static relopt_enum enumRelOpts[] =
@@ -446,8 +444,13 @@ static relopt_enum enumRelOpts[] =
 RELOPT_KIND_GIST,
 AccessExclusiveLock
 		},
-			gist_buffering_enum_def,
-			GIST_OPTION_BUFFERING_AUTO
+		{
+			{"on",		GIST_OPTION_BUFFERING_ON },
+			{"off",		GIST_OPTION_BUFFERING_OFF },
+			{"auto",	GIST_OPTION_BUFFERING_AUTO },
+			{ NULL }
+		},
+		GIST_OPTION_BUFFERING_AUTO
 	},
 	{
 		{
diff --git a/src/include/access/gist_private.h b/src/include/access/gist_private.h
index d586b04..047490a 100644
--- a/src/include/access/gist_private.h
+++ b/src/include/access/gist_private.h
@@ -379,24 +379,15 @@ typedef struct GISTBuildBuffers
 } GISTBuildBuffers;
 
 /*
- * Buffering is an enum option
- * gist_option_buffering_numeric_values defines a numeric 

Re: pgbench - allow to create partitioned tables

2019-09-13 Thread Amit Kapila
On Wed, Sep 11, 2019 at 6:08 PM Fabien COELHO  wrote:
>
>

I would like to take inputs from others as well for the display part
of this patch.  After this patch, for a simple-update pgbench test,
the changed output will be as follows (note: partition method and
partitions):
pgbench.exe -c 4 -j 4 -T 10 -N postgres
starting vacuum...end.
transaction type: <builtin: simple update>
scaling factor: 1
partition method: hash
partitions: 3
query mode: simple
number of clients: 4
number of threads: 4
duration: 10 s
number of transactions actually processed: 14563
latency average = 2.749 ms
tps = 1454.899150 (including connections establishing)
tps = 1466.689412 (excluding connections establishing)

What do others think about this?  This will be the case when the user
has used --partitions option in pgbench, otherwise, it won't change.

>
> > + case 11: /* partitions */
> > + initialization_option_set = true;
> > + partitions = atoi(optarg);
> > + if (partitions < 0)
> > + {
> > + fprintf(stderr, "invalid number of partitions: \"%s\"\n",
> > + optarg);
> > + exit(1);
> > + }
> > + break;
> >
> > Is there a reason why we treat "partitions = 0" as a valid value?
>
> Yes. It is an explicit "do not create partitioned tables", which differs
> from 1, which says "create a partitioned table with just one partition".
>

Why would anyone want to use the --partitions option in the first case
("do not create partitioned tables")?

>
> > I think we should print the information about partitions in
> > printResults.  It can help users while analyzing results.
>
> Hmmm. Why not, with some hocus-pocus to get the information out of
> pg_catalog, and trying to fail gracefully in case pgbench is run against a
> version without partitioning support.
>

+ res = PQexec(con,
+ "select p.partstrat, count(*) "
+ "from pg_catalog.pg_class as c "
+ "left join pg_catalog.pg_partitioned_table as p on (p.partrelid = c.oid) "
+ "left join pg_catalog.pg_inherits as i on (c.oid = i.inhparent) "
+ "where c.relname = 'pgbench_accounts' "
+ "group by 1, c.oid");

Can't we write this query with an inner join instead of a left join?  What
additional purpose are you trying to serve by using a left join?
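
For reference, the inner-join variant would look something like this hedged,
untested sketch; the trade-off is that it returns no row at all for a
non-partitioned table:

  select p.partstrat, count(*)
  from pg_catalog.pg_class as c
  join pg_catalog.pg_partitioned_table as p on (p.partrelid = c.oid)
  join pg_catalog.pg_inherits as i on (c.oid = i.inhparent)
  where c.relname = 'pgbench_accounts'
  group by 1, c.oid;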

> > *
> > +enum { PART_NONE, PART_RANGE, PART_HASH }
> > + partition_method = PART_NONE;
> > +
> >
> > I think it is better to follow the style of QueryMode enum by using
> > typedef here, that will make look code in sync with nearby code.
>
> Hmmm. Why not. This means inventing a used-once type name for
> partition_method. My great creativity led to partition_method_t.
>

+partition_method_t partition_method = PART_NONE;

It is better to make this static.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Extending range type operators to cope with elements

2019-09-13 Thread Esteban Zimanyi
Dear all

While developing MobilityDB we needed to extend the range type operators so
they cope with elements. In the same way that currently the range types
support both
- @> contains range/element
- <@ element/range is contained by
we extended the left (<<), overleft (&<), right (>>), and overright (&>)
operators so they can cope with both elements and ranges on the left- or
right-hand side. These can be seen on GitHub:
https://github.com/ULB-CoDE-WIT/MobilityDB/blob/master/src/rangetypes_ext.c
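
For illustration, a hedged sketch of the intended semantics; the
range-vs-range and containment forms already exist in core, while the
element variants of <<, &<, >>, and &> are the proposed additions and do not
work on a stock server:

  -- existing behavior in core PostgreSQL:
  SELECT int4range(1, 5) << int4range(10, 20);  -- true: strictly left of
  SELECT int4range(10, 20) @> 15;               -- true: contains element

  -- proposed element variants (as in MobilityDB's rangetypes_ext.c):
  -- SELECT 5 << int4range(10, 20);             -- element strictly left of range
  -- SELECT int4range(1, 5) &< 7;               -- range does not extend to the right of element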

If you think that these extensions could be useful for the community at
large, I can prepare a PR. Please let me know.

Esteban


Re: [HACKERS] CLUSTER command progress monitor

2019-09-13 Thread Michael Paquier
On Fri, Sep 06, 2019 at 08:10:58AM -0400, Robert Haas wrote:
> It's fine if things are updated as well -- it's just you need to make
> sure that those places know whether or not they are supposed to be
> doing those updates. Again, why can't we just pass down a value
> telling them "do reindex-style progress updates" or "do cluster-style
> progress updates" or "do no progress updates"?

That's invasive.  CREATE INDEX reporting goes pretty deep into the
tree, with steps dedicated to the builds and scans of the btree
(nbtsort.c for example) and hash index AMs.  In this case we have
something that does roughly what you are looking for with
report_progress, which gets set to true only for VACUUM.  If we were to
do something like that, we would also need to keep some sort of mapping
regarding which command ID (as defined by ProgressCommandType) is able
to use which set of parameter flags (as defined by the arguments of
pgstat_progress_update_param() and its multi_* cousin).  Then comes
the issue that some parameters may be used by multiple command types,
while others aren't (REINDEX and CREATE INDEX have some shared
mapping).
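
For the sake of discussion, a hypothetical sketch of the shape being asked
for; the ProgressStyle enum and the PLACEHOLDER_* slot numbers are made up,
and only pgstat_progress_update_param() is the real backend API (this
assumes backend context, i.e. postgres.h and pgstat.h):

/* Callers say which command's progress view they report under,
 * instead of the callee guessing from context. */
typedef enum
{
	PROGRESS_STYLE_NONE,
	PROGRESS_STYLE_CREATE_INDEX,
	PROGRESS_STYLE_CLUSTER
} ProgressStyle;

#define PLACEHOLDER_CLUSTER_BLKS_SCANNED	1	/* stand-in slot numbers */
#define PLACEHOLDER_CREATEIDX_BLKS_SCANNED	2

static void
report_blocks_scanned(ProgressStyle style, int64 blocks_done)
{
	if (style == PROGRESS_STYLE_NONE)
		return;					/* caller asked for no reporting */

	/* Each command type would own its own mapping of parameter slots. */
	pgstat_progress_update_param(style == PROGRESS_STYLE_CLUSTER
								 ? PLACEHOLDER_CLUSTER_BLKS_SCANNED
								 : PLACEHOLDER_CREATEIDX_BLKS_SCANNED,
								 blocks_done);
}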
--
Michael




Re: Close stdout and stderr in syslogger

2019-09-13 Thread Святослав Ермилин
Hi Tom,

Thank you for the quick reply.

> I'm quite certain that the current behavior is intentional, if only
> because closing the syslogger's stderr would make it impossible to
> debug problems inside the syslogger.

A developer who debugs the syslogger can probably remove the fclose()
calls.  Maybe we can have a switch for this?  As far as I understand,
there is no stdout printing in the syslogger anyway.

> Why is it a problem that we leave it open?

For us it is a problem because logfiles are rotated by size, and the size
of a file can be on the order of a few GBs.  We can unlink the logfile, as
logrotate does, but the file will not be deleted from the file system.  We
can use copy-truncate log rotation, which solves the problem, but such a
solution seems outdated.

> I don't believe either that the file will grow much (in normal cases anyway)

We are not in control of what users write to the logs.  We provide managed
PostgreSQL, and some users set log_statement = 'all' and get tons of logs.
Our goal is to make the logging system accountable to free-space monitoring
and efficient (no log copy, no extra occupied space).

We could also avoid pointing the pg_ctl log to the same logfile the
syslogger writes to.  But that is how postgresql-common works, and it is
easier for our users to see this output among the usual logs to understand
what was going on with the database.

> In any case, the proposed patch almost certainly introduces new
> problems, in that you dropped the fcloses's into code that
> executes repeatedly.

Probably I should place the fclose() calls right after a successful
syslogger start?

To reproduce the problem, one can follow these steps:

1) Use these settings:
   log_destination = 'stderr,csvlog'
   logging_collector = on
   log_filename = 'postgresql-11-data.log'
   log_file_mode = '0640'
   log_truncate_on_rotation = off

2) pg_ctl -D DemoDb --log=/private/tmp/pg/postgresql-11-data.log start

3) Check the open descriptors (look for types 1w - stdout and 2w - stderr):

   lsof | grep postgres | grep log
   postgres   7968 munakoiso    1w   REG   1,4   1170 3074156 /private/tmp/pg/postgresql-11-data.log
   postgres   7968 munakoiso    2w   REG   1,4   1170 3074156 /private/tmp/pg/postgresql-11-data.log
   postgres   7968 munakoiso   12w   REG   1,4   1170 3074156 /private/tmp/pg/postgresql-11-data.log

4) Delete /private/tmp/pg/postgresql-11-data.log to imitate logrotate, then:
   psql postgres -c "select pg_rotate_logfile();"

5) Check the open descriptors again.  Without this patch you should find
   lines with 1w and 2w here; with it you should not.

   lsof | grep postgres | grep log
   postgres   8082 munakoiso    5w   REG   1,4   2451 3074156 /private/tmp/pg/postgresql-11-data.log

--
Sviatoslav Ermilin
Yandex

12.09.2019, 18:33, "Tom Lane" :
> Святослав Ермилин  writes:
>> Hi! A few months ago we encountered a situation where some quite big
>> open log files were kept open by Postgres despite being deleted.  This
>> affects free-space calculation in our managed PostgreSQL instances.
>> Currently I'm investigating this issue.  We traced some roots to
>> unclosed descriptors in the Perl code of postgresql-common [0], but we
>> still face some unclosed descriptors pointing to the log file.  The
>> Debian tools from postgresql-common start pg_ctl piped to the logfile.
>> The descriptor is piped to the logfile and blocks it from deletion.
>> That is why we can't delete logfile.1 after logrotate.  And after a
>> second logrotate, logfile.1 is in "deleted" status but in fact can't be
>> deleted.  Here I attach a patch with a possible solution: in this patch
>> the stdout and stderr pipes are simply closed in the syslogger.
>>
>> --
>> Sviatoslav Ermilin
>> Yandex
>>
>> [0] https://salsa.debian.org/postgresql/postgresql-common/commit/580aa0677edc222ebaf6e1031cf3929f847f27fb
>
> I'm quite certain that the current behavior is intentional, if only
> because closing the syslogger's stderr would make it impossible to
> debug problems inside the syslogger.  Why is it a problem that we
> leave it open?  I don't believe either that the file will grow much
> (in normal cases anyway), or that it's impossible to unlink it
> (except on Windows, but you didn't say anything about Windows).
>
> In any case, the proposed patch almost certainly introduces new
> problems, in that you dropped the fcloses's into code that
> executes repeatedly.
>
> regards, tom lane

Re: pgsql: Use data directory inode number, not port, to select SysV resour

2019-09-13 Thread Noah Misch
On Sun, Sep 08, 2019 at 05:54:12PM -0400, Andrew Dunstan wrote:
> I'm going to disable this test (src/test/recovery/t/017_shm.pl) on
> Windows on the back branches too unless there's a violent objection. The
> reason is that the script runs "postgres --single" and that fails on
> Windows when run by an administrative account. We've carefully enabled
> postgres and its tests to run safely under an admin account. I
> discovered this as part of my msys2 testing.

I'm reading that as the test having falsified the assertion that we've enabled
postgres to run safely under an admin account.  Enabling safe use of admin accounts
entails fixing single-user mode.  (Alternately, one could replace the "vacuum
that database in single-user mode" errhint with a reference to some
not-yet-built alternative.  That sounds harder.)