Re: row filtering for logical replication

2019-09-22 Thread movead li
Hello

I found several problems, listed below, while testing the patches:

1. There are some regression test failures after applying 0001.patch through
   0005.patch.  The failures are resolved by 0006.patch.
2. Data is wrong after creating a subscription if the relation has an
   inheritance child table, for example:
   ##
   The Tables:
   CREATE TABLE cities (
       name        text,
       population  float,
       altitude    int
   );
   CREATE TABLE capitals (
   state   char(2)
   ) INHERITS (cities);
   
   Do on publication:
   insert into cities values('aaa',123, 134);
   insert into capitals values('bbb',123, 134);
   create publication pub_tc for table cities where (altitude > 100 and 
altitude < 200);
   postgres=# select * from cities ;
    name | population | altitude 
   ------+------------+----------
    aaa  |        123 |      134
    bbb  |        123 |      134
   (2 rows)
   
   Do on subscription:
   create subscription sub_tc connection 'host=localhost port=5432 
dbname=postgres' publication pub_tc;
   postgres=# select * from cities ;
    name | population | altitude 
   ------+------------+----------
    aaa  |        123 |      134
    bbb  |        123 |      134
    bbb  |        123 |      134
   (3 rows)
   ##
   An unexpected row appears.
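
   A stock-PostgreSQL way to see where the duplicated row physically
   lives (independent of the patch) is to compare parent-only and child
   storage:

   select * from only cities;  -- rows stored in the parent table itself
   select * from capitals;     -- rows stored in the child table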
   
3. I am puzzled by what I see when testing updates.
  Using the tables from problem 2, test as below:
  #
  On publication:
  postgres=# insert into cities values('t1',123, 34);
  INSERT 0 1
  postgres=# update cities SET altitude = 134 where altitude = 34;
  UPDATE 1
  postgres=# select * from cities ;
   name | population | altitude 
  ------+------------+----------
   t1   |        123 |      134
  (1 row)
  On subscription:
  postgres=# select * from cities ;
   name | population | altitude 
  ------+------------+----------
  (0 rows)
  
  On publication:
  insert into cities values('t1',1,'135');
  update cities set altitude=300 where altitude=135;
  postgres=# table cities ;
   name | population | altitude 
  ------+------------+----------
   t1   |        123 |      134
   t1   |          1 |      300
  (2 rows)
  
  On subscription:
  postgres=# table cities ;
   name | population | altitude 
  ------+------------+----------
   t1   |          1 |      135
  (1 row)
  #
  Result 1: Update a row that does not satisfy the publication condition so
  that it does; the subscription changes nothing.
  Result 2: Update a row that satisfies the publication condition so that it
  no longer does; the subscription changes nothing.
  Is this a bug?  Or should there be an explanation of this behavior?

4. The SQL string-building code in the fetch_remote_table_info() function is too long.

---
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead...@highgo.ca

The new status of this patch is: Waiting on Author


Re: pgbench - allow to create partitioned tables

2019-09-22 Thread Amit Kapila
On Sun, Sep 22, 2019 at 12:22 PM Fabien COELHO  wrote:
> >>sh> pgbench -T 10
> >>...
> >>partitions: 0
> >
> > I am not sure how many users would be able to make out that it is a
> > run where actual partitions are not present unless they beforehand
> > know and detect such a condition in their scripts.
>
> > What is the use of such a run which completes without actual updates?
>
> Why should we decide that they cannot do that?
>
> The user could be testing the overhead of no-op updates, which is
> something interesting, and check what happens with partitioning in this
> case. For that, they may delete pgbench_accounts contents or its
> partitions for partitioned version, or only some partitions, or whatever.
>
> A valid (future) case is that hopefully dynamic partitioning could be
> implemented, thus no partitions would be a perfectly legal state even with
> the standard benchmarking practice. Maybe the user just wrote a clever
> extension to do that with a trigger and wants to test the performance
> overhead with pgbench. Fine!
>

It is better for a user to write a custom script for such cases,
because otherwise the built-in "select-only" or "simple-update" scripts
don't make any sense.  In the "select-only" case, why would anyone want
to test fetching zero rows; similarly, in the "simple-update" case, 2
out of 3 statements will be no-ops.  In the "tpcb-like" script, 2 out
of 5 queries will be no-ops, so it won't be the purely no-op updates
you describe.  Having said that, I see your point and don't mind
allowing such cases, as long as we don't have to write special checks
in the code to support them.  Now, we could add a detailed comment in
printResults to explain why we have a different check there compared to
other code paths, or change the other code paths to have a similar
check as printResults, but I am not convinced by either of those
options.
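
For illustration, a minimal custom script of that kind (file name and
contents are a sketch, not something from this thread) could be:

    -- noop_update.sql: exercise only the UPDATE path
    \set aid random(1, 100000 * :scale)
    UPDATE pgbench_accounts SET abalance = abalance + 1 WHERE aid = :aid;

run as "pgbench -n -f noop_update.sql -T 10" (pass -s to match the
initialized scale, since :scale defaults to 1 for custom scripts).
Against emptied partitions the UPDATE simply matches zero rows.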


> >>
> >> I did not buy moving the condition inside the fillfactor function.
> >
> > I also don't agree with your position.  My main concern here is that
> > we can't implicitly assume that fillfactor need to be appended.
>
> Sure.
>
> > At the very least we should have a comment saying why we are always
> > appending the fillfactor for partitions
>
> The patch does not do that; the condition is just before the call, not
> inside it with a boolean passed as an argument. AFAICS the behavior of v14
> is exactly the same as your version and as the initial code.
>

Here, I am talking about the call to append_fillfactor in the
createPartitions() function.  See, in my version, there are some
comments.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Index Skip Scan

2019-09-22 Thread Alvaro Herrera
On 2019-Sep-22, Dmitry Dolgov wrote:

> > I think multiplying two ScanDirections to watch for a negative result is
> > pretty ugly:
> 
> Probably, but the only alternative I see to check if directions are
> opposite is to check that directions come in pairs (back, forth),
> (forth, back). Is there an easier way?

Maybe use the ^ operator?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: strong memory leak in plpgsql from handled rollback and lazy cast

2019-09-22 Thread Andres Freund
Hi,

On 2019-09-22 20:16:15 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2019-09-22 18:43:23 -0400, Tom Lane wrote:
> >> I'm not convinced that it'd be safe to re-use an ExprState after a
> >> previous execution failed (though perhaps Andres has a different
> >> opinion?)
> 
> > I don't immediately see why it'd be problematic to reuse at a later
> > time, as long as it's guaranteed that a) there's only one execution
> > happening at a time b) the failure wasn't in the middle of writing a
> > value.  a) would be problematic regardless of reuse-after-failure, and
> > b) should be satisfied by only failing at ereport etc.
> 
> I think you're considering a much smaller slice of the system than
> what seems to me to be at risk here.

Yea, I was only referencing the expression eval logic itself, as I
understood your question to aim mainly at that...


> As an example, an ExprState tree would also include any
> fn_extra-linked state that individual functions might've set up.
> We've got very little control over what the validity requirements are
> for those or how robust the code that creates them is.  I *think* that
> most of the core code that makes such things is written in a way that
> it doesn't leave partially-valid cache state if setup fails partway
> through ... but I wouldn't swear that it all is, and I'd certainly bet
> money on there being third-party code that isn't careful about that.

Hm. I'd be kinda willing to just declare such code broken. But given
that the memory leak situation, as you say, still exists, I don't think
it matters for now.


> > Hm. Interestingly, I am working on a patch that merges all the memory
> > allocations done for an ExprState into one or two allocations (by
> > basically doing the traversal twice). Then it'd be feasible to just
> > pfree() the memory, if that helps.
> 
> Again, that'd do nothing to clean up subsidiary fn_extra state.
> If we want no leaks, we need a separate context for the tree
> to live in.

Well, it'd presumably leak a lot less :/.

Greetings,

Andres Freund




Re: strong memory leak in plpgsql from handled rollback and lazy cast

2019-09-22 Thread Tom Lane
Andres Freund  writes:
> On 2019-09-22 18:43:23 -0400, Tom Lane wrote:
>> I'm not convinced that it'd be safe to re-use an ExprState after a
>> previous execution failed (though perhaps Andres has a different
>> opinion?)

> I don't immediately see why it'd be problematic to reuse at a later
> time, as long as it's guaranteed that a) there's only one execution
> happening at a time b) the failure wasn't in the middle of writing a
> value.  a) would be problematic regardless of reuse-after-failure, and
> b) should be satisfied by only failing at ereport etc.

I think you're considering a much smaller slice of the system than
what seems to me to be at risk here.  As an example, an ExprState
tree would also include any fn_extra-linked state that individual
functions might've set up.  We've got very little control over what
the validity requirements are for those or how robust the code that
creates them is.  I *think* that most of the core code that makes
such things is written in a way that it doesn't leave partially-valid
cache state if setup fails partway through ... but I wouldn't swear
that it all is, and I'd certainly bet money on there being third-party
code that isn't careful about that.

> Hm. Interestingly, I am working on a patch that merges all the memory
> allocations done for an ExprState into one or two allocations (by
> basically doing the traversal twice). Then it'd be feasible to just
> pfree() the memory, if that helps.

Again, that'd do nothing to clean up subsidiary fn_extra state.
If we want no leaks, we need a separate context for the tree
to live in.

regards, tom lane




Re: strong memory leak in plpgsql from handled rollback and lazy cast

2019-09-22 Thread Andres Freund
Hi,

On 2019-09-22 18:43:23 -0400, Tom Lane wrote:
> I'm not convinced that it'd be safe to re-use an ExprState after a
> previous execution failed (though perhaps Andres has a different
> opinion?)

I don't immediately see why it'd be problematic to reuse at a later
time, as long as it's guaranteed that a) there's only one execution
happening at a time b) the failure wasn't in the middle of writing a
value.  a) would be problematic regardless of reuse-after-failure, and
b) should be satisfied by only failing at ereport etc.

Most memory writes during ExprState evaluation are redone from scratch
every execution, and the remaining things should only be things like
tupledesc's being cached at first execution. And that even uses an
ExprContext callback to reset the cache on context shutdown.

The other piece is that on the first execution of an expression we use
ExecInterpExprStillValid, and we don't on later executions. Not sure if
that's relevant here?


> so I think the only way to avoid the intratransaction memory leak would
> be to set up each new cast ExprState in its own memory context that we
> could free.  That seems like adding quite a lot of overhead to get rid
> of a leak that we've been living with for ages.

Hm. Interestingly, I am working on a patch that merges all the memory
allocations done for an ExprState into one or two allocations (by
basically doing the traversal twice). Then it'd be feasible to just
pfree() the memory, if that helps.

Greetings,

Andres Freund




Re: WAL recycled despite logical replication slot

2019-09-22 Thread Andres Freund
Hi,

On 2019-09-22 11:45:05 -0400, Jeff Janes wrote:
> On Fri, Sep 20, 2019 at 6:25 PM Andres Freund  wrote:
> > Hi,
> > >Is there an innocent explanation for this?  I thought logical
> > >replication slots provided an iron-clad guarantee that WAL would be
> > >retained until it was no longer needed.  I am just using pub/sub,
> > >none of the lower level stuff.
> >
> > It indeed should. What's the content of
> > pg_replication_slots for that slot?
> >
> 
> Unfortunately I don't think I have that preserved.  If I can reproduce the
> issue, would preserving data/pg_replslot/sub/state help as well?

Can't hurt. Best together with other slots, if they exist.

Could you describe the system a bit?
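
For reference, something along these lines captures the relevant state
from the standard view (columns trimmed to the ones that matter for WAL
retention):

    SELECT slot_name, plugin, slot_type, active,
           restart_lsn, confirmed_flush_lsn
    FROM pg_replication_slots;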

Greetings,

Andres Freund




Re: JSONPATH documentation

2019-09-22 Thread Alexander Korotkov
On Mon, Sep 23, 2019 at 1:03 AM Tom Lane  wrote:
> Alexander Korotkov  writes:
> > On Sun, Sep 22, 2019 at 9:18 PM Jeff Janes  wrote:
> > Currently the description of jsonpath is divided between the datatypes
> > section and the functions-and-operators section.  And yes, this looks
> > cumbersome.
>
> Agreed, but ...
>
> > I think we should move the whole description to the one section.
> > Probably we should move jsonpath description to datatypes section
> > (assuming jsonpath is a datatype) leaving functions and operators
> > section with just SQL-level functions and operators.  What do you
> > think?
>
> ... I don't think that's an improvement.  We don't document detailed
> behavior of a datatype's functions in datatype.sgml, and this seems
> like it would be contrary to that layout.  If anything, I'd merge
> the other way, with only a very minimal description of jsonpath
> (perhaps none?) in datatype.sgml.
>
> While we're whining about this, I find it very off-putting that
> the jsonpath stuff was inserted in the JSON functions section
> ahead of the actual JSON functions.  I think it should have
> gone after them, because it feels like a barely-related interjection
> as it stands.  Maybe there's even a case that it should be
> its own <sect1>, after the "functions-json" section.

Yes, I think moving the jsonpath description to its own <sect1> is a good
idea.  But I still think we should have the complete jsonpath description
in a single place.  What about joining the jsonpath descriptions from both
the datatypes section and the functions-and-operators section into this
<sect1>, leaving the datatypes section with something very brief?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: strong memory leak in plpgsql from handled rollback and lazy cast

2019-09-22 Thread Tom Lane
Pavel Stehule  writes:
> While testing a hypothesis I wrote some buggy code. It was a surprise how
> fast I lost all free memory:

> do $$
> begin
>   for i in 1..300
>   loop
>     begin
>       -- do some error
>       if i then end if;
>     exception when others then
>       -- do nothing
>     end;
>   end loop;
> end;
> $$;

Yeah, this is because an error gets thrown inside the cast-to-boolean.
It's intentional that the execution state tree gets thrown away if that
happens, per the comment in get_cast_hashentry:

 * Prepare the expression for execution, if it's not been done already in
 * the current transaction; also, if it's marked busy in the current
 * transaction, abandon that expression tree and build a new one, so as to
 * avoid potential problems with recursive cast expressions and failed
 * executions.  (We will leak some memory intra-transaction if that
 * happens a lot, but we don't expect it to.)  It's okay to update the

I'm not convinced that it'd be safe to re-use an ExprState after a
previous execution failed (though perhaps Andres has a different opinion?)
so I think the only way to avoid the intratransaction memory leak would
be to set up each new cast ExprState in its own memory context that we
could free.  That seems like adding quite a lot of overhead to get rid
of a leak that we've been living with for ages.
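
A quick cross-check of that diagnosis (my sketch, derived from Pavel's
example above): make the boolean test valid so the lazy cast never
fails, and the same loop should hold memory flat:

do $$
begin
  for i in 1..300
  loop
    begin
      if i <> 0 then end if;  -- valid boolean test: the cast never fails
    exception when others then
      -- do nothing
    end;
  end loop;
end;
$$;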

Maybe we could pay the extra overhead only after the expression has
failed at least once.  Seems a bit messy though, and I'm afraid that
we'd have to add PG_TRY overhead in any case.

regards, tom lane




Re: scorpionfly needs more semaphores

2019-09-22 Thread Thomas Munro
On Wed, Sep 18, 2019 at 4:55 PM jungle boogie  wrote:
> $ sysctl | ag kern.seminfo.semmni
> kern.seminfo.semmni=100

It still seems to be happening.  Perhaps you need to increase semmns too?

> > Scorpionfly also seems to be having problems with its git repo breaking on
> > a regular basis.  I have no idea what's up with that.
>
> That is a mystery to me as well. 9.4 stable seems to be the branch with
> the most problems:
> https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=scorpionfly&br=REL9_4_STABLE
>
>
> My cronjobs:
> 0 */6 * * * cd /home/pgbuilder/bin/REL10 && ./run_build.pl  --verbose
> 0 */12 * * * cd /home/pgbuilder/bin/REL10 && ./run_branches.pl  --run-all

I think you need just the run_branches.pl entry.

BTW I'm sorry for my flippant tone about sem_init() earlier, which
someone pointed out to me was not great cross-project open source.
What I really meant to say was: if you have OpenBSD developer
contacts, it would be cool if you could highlight that issue as
something that would make the PostgreSQL-on-OpenBSD experience nicer
for end users (and I suspect other multi-process software too).  On
Linux and FreeBSD we now use sem_init()
(PREFERRED_SEMAPHORES=UNNAMED_POSIX) so users never need to worry
about configuring that kernel resource.  On at least AIX, we still
have to use SysV, but there the default limits are high enough that
(according to our manual) no adjustment is required.  Also, as I
speculated in that other thread: based on a quick peek at the
implementation, you might get better performance on very large busy
PostgreSQL clusters from our cache line padded sem_init() array than
you do with your more densely packed SysV semas (I could be totally
wrong about that, but we saw an effect like that on some other
operating system).

-- 
Thomas Munro
https://enterprisedb.com




Re: JSONPATH documentation

2019-09-22 Thread Tom Lane
Alexander Korotkov  writes:
> On Sun, Sep 22, 2019 at 9:18 PM Jeff Janes  wrote:
> Currently the description of jsonpath is divided between the datatypes
> section and the functions-and-operators section.  And yes, this looks
> cumbersome.

Agreed, but ...

> I think we should move the whole description to the one section.
> Probably we should move jsonpath description to datatypes section
> (assuming jsonpath is a datatype) leaving functions and operators
> section with just SQL-level functions and operators.  What do you
> think?

... I don't think that's an improvement.  We don't document detailed
behavior of a datatype's functions in datatype.sgml, and this seems
like it would be contrary to that layout.  If anything, I'd merge
the other way, with only a very minimal description of jsonpath
(perhaps none?) in datatype.sgml.

While we're whining about this, I find it very off-putting that
the jsonpath stuff was inserted in the JSON functions section
ahead of the actual JSON functions.  I think it should have
gone after them, because it feels like a barely-related interjection
as it stands.  Maybe there's even a case that it should be
its own <sect1>, after the "functions-json" section.

regards, tom lane




Re: Efficient output for integer types

2019-09-22 Thread David Fetter
On Sat, Sep 21, 2019 at 07:29:25AM +0100, Andrew Gierth wrote:
> > "David" == David Fetter  writes:
> 
>  David> +static inline uint32
>  David> +decimalLength64(const uint64_t v)
> 
> Should be uint64, not uint64_t.

Fixed.

> Also return an int, not a uint32.

Fixed.

> For int vs. int32, my own inclination is to use "int" where the value is
> just a (smallish) number, especially one that will be used as an index
> or loop count, and use "int32" when it actually matters that it's 32
> bits rather than some other size. Other opinions may differ.

Done with int.

>  David> +{
>  David> + uint32  t;
>  David> + static uint64_t PowersOfTen[] = {
> 
> uint64 not uint64_t here too.

Fixed.

>  David> +int32
>  David> +pg_ltoa_n(uint32 value, char *a)
> 
> If this is going to handle only unsigned values, it should probably be
> named pg_ultoa_n.

It does signed values now.

>  David> + uint32  i = 0, adjust = 0;
> 
> "adjust" is not assigned anywhere else. Presumably that's from previous
> handling of negative numbers?

It was, and now it's gone.

>  David> + memcpy(a, "0", 1);
> 
>  *a = '0';  would suffice.

Fixed.

>  David> + i += adjust;
> 
> Superfluous?

Yep. Gone.

>  David> + uint32_t   uvalue = (uint32_t)value;
> 
> uint32 not uint32_t.

Fixed.

>  David> + int32   len;
> 
> See above re. int vs. int32.

Done that way.

>  David> + uvalue = (uint32_t)0 - (uint32_t)value;
> 
> Should be uint32 not uint32_t again.

Done.

> For anyone wondering, I suggested this to David in place of the ugly
> special casing of INT32_MIN. This method avoids the UB of doing (-value)
> where value==INT32_MIN, and is nevertheless required to produce the
> correct result:
> 
> 1. If value < 0, then ((uint32)value) is (value + UINT32_MAX + 1)
> 2. (uint32)0 - (uint32)value
>   becomes  (UINT32_MAX+1)-(value+UINT32_MAX+1)
>   which is (-value) as required
> 
>  David> +int32
>  David> +pg_lltoa_n(uint64_t value, char *a)
> 
> Again, if this is doing unsigned, then it should be named pg_ulltoa_n

Renamed to allow the uint64s that de-special-casing INT32_MIN/INT64_MIN 
requires.

>  David> + if (value == PG_INT32_MIN)
> 
> This being inconsistent with the others is not nice.

Fixed.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
>From 8045adc25343314e089d83fe9fbb91dbd1fb71e1 Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Sun, 15 Sep 2019 00:06:29 -0700
Subject: [PATCH v11] Make int4 and int8 operations more efficient
To: hackers


- Output routines now do more digits per iteration, and
- Code determines the number of decimal digits in int4/int8 efficiently
- Split off pg_ltoa_n from pg_ltoa
- Use same to make other functions shorter

diff --git a/src/backend/access/common/printsimple.c b/src/backend/access/common/printsimple.c
index 651ade14dd..5c5b6d33b2 100644
--- a/src/backend/access/common/printsimple.c
+++ b/src/backend/access/common/printsimple.c
@@ -112,7 +112,7 @@ printsimple(TupleTableSlot *slot, DestReceiver *self)
 			case INT8OID:
 {
 	int64		num = DatumGetInt64(value);
-	char		str[23];	/* sign, 21 digits and '\0' */
+	char		str[MAXINT8LEN + 1];
 
 	pg_lltoa(num, str);
 	pq_sendcountedtext(, str, strlen(str), false);
diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c
index 0ff9394a2f..6230807906 100644
--- a/src/backend/utils/adt/int8.c
+++ b/src/backend/utils/adt/int8.c
@@ -27,8 +27,6 @@
 #include "utils/builtins.h"
 
 
-#define MAXINT8LEN		25
-
 typedef struct
 {
 	int64		current;
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index 70138feb29..1fdf9caadd 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -20,6 +20,68 @@
 
 #include "common/int.h"
 #include "utils/builtins.h"
+#include "port/pg_bitutils.h"
+
+/*
+ * A table of all two-digit numbers. This is used to speed up decimal digit
+ * generation by copying pairs of digits into the final output.
+ */
+static const char DIGIT_TABLE[200] =
+"00" "01" "02" "03" "04" "05" "06" "07" "08" "09"
+"10" "11" "12" "13" "14" "15" "16" "17" "18" "19"
+"20" "21" "22" "23" "24" "25" "26" "27" "28" "29"
+"30" "31" "32" "33" "34" "35" "36" "37" "38" "39"
+"40" "41" "42" "43" "44" "45" "46" "47" "48" "49"
+"50" "51" "52" "53" "54" "55" "56" "57" "58" "59"
+"60" "61" "62" "63" "64" "65" "66" "67" "68" "69"
+"70" "71" "72" "73" "74" "75" "76" "77" "78" "79"
+"80" "81" "82" "83" "84" "85" "86" "87" "88" "89"
+"90" "91" "92" "93" "94" "95" "96" "97" "98" "99";
+
+/*
+ * Adapted from 

Re: JSONPATH documentation

2019-09-22 Thread Jeff Janes
On Sun, Sep 22, 2019 at 2:18 PM Jeff Janes  wrote:

> I find the documentation in
> https://www.postgresql.org/docs/12/functions-json.html very confusing.
>
> In table 9.44 take the first entry,
>
> Example JSON
>  {"x": [2.85, -14.7, -9.4]}
>
> Example Query
>   + $.x.floor()
>
> Result
> 2, -15, -10
>
> There are no end to end examples here. How do I apply the example query to
> the example json to obtain the given result?
>

OK, never mind here.  After digging in the regression tests, I did find
jsonb_path_query and friends, and they are in the docs with examples in
table 9.49.  I don't know how I overlooked that in the first place, I guess
I was fixated on operators.  Or maybe by the time I was down in those
functions, I thought I had cycled back up and was looking at 9.44 again.
But I think it would make sense to move the description of jsonpath to its
own page.  It is confusing to have operators within the jsonpath language,
and operators which apply to jsonpath "from the outside", together in the
same page.
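
For anyone else who lands here, the end-to-end form of the first 9.44
entry using those functions looks like this:

    SELECT jsonb_path_query('{"x": [2.85, -14.7, -9.4]}', '$.x.floor()');
    -- returns three rows: 2, -15, -10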

Cheers,

Jeff



Re: [bug fix??] Fishy code in tts_cirtual_copyslot()

2019-09-22 Thread Tom Lane
"Tsunakawa, Takayuki"  writes:
> In the following code in execTuples.c, shouldn't srcdesc point to the source
> slot's tuple descriptor?  The attached fix passes make check.  What kind of 
> failure could this cause?

Yeah, sure looks like a typo to me too.

I temporarily changed the Assert to be "==" rather than "<=", and
it still passed check-world, so evidently we are not testing any
cases where the descriptors are of different lengths.  This explains
the lack of symptoms.  It's still a bug though, so pushed.

> BTW, I thought that in PostgreSQL coding convention, local variables should 
> be defined at the top of blocks, but this function writes "for (int natts;".

Yeah, we've agreed to join the 21st century to the extent of allowing
local for-loop variables.

Thanks for the report!

regards, tom lane




JSONPATH documentation

2019-09-22 Thread Jeff Janes
I find the documentation in
https://www.postgresql.org/docs/12/functions-json.html very confusing.

In table 9.44 take the first entry,

Example JSON
 {"x": [2.85, -14.7, -9.4]}

Example Query
  + $.x.floor()

Result
2, -15, -10

There are no end to end examples here. How do I apply the example query to
the example json to obtain the given result?

Table 9.47 only gives two operators which apply a jsonpath to a json(b)
object: @? and @@; and neither one of those yield the indicated result from
the first line in 9.44. What does?

Also, I can't really figure out what the descriptions of @? and @@ mean.
Does @? return true if an item exists, even if the value of that item is
false, while @@ returns the truth value of the existing item?
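
If that reading is right, a minimal check would be (a sketch of the
documented semantics, with a value chosen to expose the difference):

    SELECT '{"a": false}'::jsonb @? '$.a';  -- true: an item exists at the path
    SELECT '{"a": false}'::jsonb @@ '$.a';  -- false: the boolean value of that item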

https://www.postgresql.org/docs/12/datatype-json.html#DATATYPE-JSONPATH

"The SQL/JSON path language is fully integrated into the SQL engine". What
does that mean?  If it were only partially integrated, what would that
mean?  Is this providing me with any useful information?  Is this just
saying that this is not a contrib extension module?

What is the difference between "SQL/JSON Path Operators And Methods" and
and "jsonpath Accessors" and why are they not described in the same place,
or at least nearby each other?

Cheers,

Jeff


Re: Wrong sentence in the README?

2019-09-22 Thread Tom Lane
"Daniel Westermann (DWE)"  writes:
> in the README, top level, there is this:

> PostgreSQL has many language interfaces, many of which are listed here:
> https://www.postgresql.org/download

> I don't think the download page lists any language interfaces, or am I
> missing something?

Not directly on that page, though if you drill down into the "software
catalogue" you can find them.

Since there's already a link to that same page a bit further down in
the file, I'm inclined to just remove the quoted sentence.  Maybe
add something about "and related software" to the later link.  Certainly
"language interfaces" is just one small part of that.

regards, tom lane




Re: [PATCH] Improve performance of NOTIFY over many databases (v2)

2019-09-22 Thread Tom Lane
Martijn van Oosterhout  writes:
> On Mon, 16 Sep 2019 at 15:33, Tom Lane  wrote:
>> But do we care?  With asyncQueueAdvanceTail gone from the listeners,
>> there's no longer an exclusive lock for them to contend on.  And,
>> again, I failed to see any significant contention even in HEAD as it
>> stands; so I'm unconvinced that you're solving a live problem.

> You're right, they only acquire a shared lock which is much less of a
> problem. And I forgot that we're still reducing the load from a few
> hundred signals and exclusive locks per NOTIFY to perhaps a dozen
> shared locks every thousand messages. You'd be hard pressed to
> demonstrate there's a real problem here.

> So I think your patch is fine as is.

OK, pushed.

> Looking at the release cycle it looks like the earliest either of
> these patches will appear in a release is PG13, right?

Right.

regards, tom lane




Re: WAL recycled despite logical replication slot

2019-09-22 Thread Jeff Janes
On Fri, Sep 20, 2019 at 6:25 PM Andres Freund  wrote:

> Hi,
>
> On September 20, 2019 5:45:34 AM PDT, Jeff Janes 
> wrote:
> >While testing something else (whether "terminating walsender process
> >due to replication timeout" was happening spuriously), I had logical
> >replication set up streaming a default pgbench transaction load, with
> >the publisher being 13devel-e1c8743 and subscriber being 12BETA4.
> >Eventually I started getting errors about requested wal segments being
> >already removed:
> >
> >10863 sub idle 0 2019-09-19 17:14:58.140 EDT LOG:  starting logical
> >decoding for slot "sub"
> >10863 sub idle 0 2019-09-19 17:14:58.140 EDT DETAIL:  Streaming
> >transactions committing after 79/EB0B17A0, reading WAL from
> >79/E70736A0.
> >10863 sub idle 58P01 2019-09-19 17:14:58.140 EDT ERROR:  requested WAL
> >segment 0001007900E7 has already been removed
> >10863 sub idle 0 2019-09-19 17:14:58.144 EDT LOG:  disconnection:
> >session time: 0:00:00.030 user=jjanes database=jjanes host=10.0.2.2
> >port=40830
> >
> >It had been streaming for about 50 minutes before the error showed up,
> >and it showed right when streaming was restarting after one of the
> >replication timeouts.
> >
> >Is there an innocent explanation for this?  I thought logical
> >replication slots provided an iron-clad guarantee that WAL would be
> >retained until it was no longer needed.  I am just using pub/sub, none
> >of the lower level stuff.
> >
> It indeed should. What's the content of
> pg_replication_slots for that slot?
>

Unfortunately I don't think I have that preserved.  If I can reproduce the
issue, would preserving data/pg_replslot/sub/state help as well?

Cheers,

Jeff


Re: WAL recycled despite logical replication slot

2019-09-22 Thread Jeff Janes
On Fri, Sep 20, 2019 at 11:27 AM Tomas Vondra 
wrote:

> >
> >Is there an innocent explanation for this?  I thought logical replication
> >slots provided an iron-clad guarantee that WAL would be retained until it
> >was no longer needed.  I am just using pub/sub, none of the lower level
> >stuff.
> >
>
> I think you're right - this should not happen with replication slots.
> Can you provide more detailed setup instructions, so that I can try to
> reproduce and investigate the issue?
>

It is a bit messy, because this isn't what I was trying to test.

The basic set up is pretty simple:

On master:

pgbench -i -s 100
create publication pgbench for table pgbench_accounts,  pgbench_branches,
pgbench_history , pgbench_tellers;
pgbench -R200 -c4 -j4 -P60 -T36 -n

on replica:

pgbench -i -s 1
truncate pgbench_history , pgbench_accounts, pgbench_branches,
pgbench_tellers;
create subscription sub CONNECTION 'host=192.168.0.15' publication pgbench;

The messy part:  It looked like the synch was never going to finish, so
first I cut the rate down to -R20.  Then what I thought I did was drop the
primary key on pgbench_accounts (manually doing a kill -15 on the synch
worker to release the lock), wait for the copy to start again and then
finish and then start getting "ERROR:  logical replication target relation
"public.pgbench_accounts" has neither REPLICA IDENTITY index nor PRIMARY
KEY and published relation does not have REPLICA IDENTITY FULL" log
messages, then I re-added the primary key.  Then I increased the -R back to
200, and about 50 minutes later got the WAL already removed error.

But now I can't seem to reproduce this, as the next time I tried to do the
synch with no primary key there doesn't seem to be a commit after the COPY
finishes, so once it tries to replay the first update, it hits the above "no
primary key" error and then rolls back **the entire COPY** as well as the
single-row update, and starts the entire COPY over again before you have a
chance to intervene and build the index.  So I'm guessing now that either
the lack of a commit (which itself seems like a spectacularly bad idea) is
situation dependent, or the very slow COPY had finished between the time I
decided to drop the primary key and the time I actually implemented the
drop.

Perhaps important here is that the replica is rather underpowered.  Write
IO and fsyncs periodically become painfully slow, which is probably why
there are replication timeouts, and since the problem happened when trying
to reestablish after a timeout I would guess that that is critical to the
issue.

I was running the master with fsync=off, but since the OS never crashed
that should not be the source of corruption.


I'll try some more to reproduce this, but I wanted to make sure there was
actually something here to reproduce, and not just my misunderstanding of
how things are supposed to work.

Cheers,

Jeff


Re: The flinfo->fn_extra question, from me this time.

2019-09-22 Thread Tom Lane
Dent John  writes:
> On 21 Jul 2019, at 22:54, Tom Lane  wrote:
>> Chapman Flack  writes:
>>> But looking in the code, I'm getting the impression that those
>>> benefits are only theoretical future ones, as ExecMakeTableFunctionResult
>>> implements SFRM_ValuePerCall mode by ... repeatedly calling the function
>>> to build up a whole tuplestore in advance.

>> Yes, that's the case for a SRF in FROM.  A SRF in the targetlist
>> actually does get the chance to pipeline, if it implements ValuePerCall.
>> The FROM case could be improved perhaps, if somebody wanted to put
>> time into it.

> By any chance, do either of you know if there are initiatives to make the 
> changes mentioned?

I don't know of anybody working on it.

>> You'd still need to be prepared to build a tuplestore,
>> in case of rescan or backwards fetch; but […]

> I’m also interested in your comment here. If the function was STABLE, could 
> not the function scan simply be restarted? (Rather than needing to create the 
> tuplestore for all cases.)
> I guess perhaps the backwards scan is where it falls down though...

My point was that you can't simply remove the tuplestore-building code
path.  The exact boundary conditions for that might be negotiable.
But I'd be very dubious of an assumption that re-running the function
would be cheaper than building a tuplestore, regardless of whether it's
safe.
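
For context, the two placements under discussion are easy to see with a
stock SRF (generate_series is just a convenient stand-in here):

    -- SRF in the targetlist: ValuePerCall output is pipelined,
    -- so the LIMIT can stop the function early
    SELECT generate_series(1, 1000000) LIMIT 10;

    -- SRF in FROM: built up in a tuplestore first, even though
    -- only 10 rows are consumed
    SELECT * FROM generate_series(1, 1000000) LIMIT 10;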

> Does the planner have any view on this?

cost_functionscan and cost_rescan would likely need some adjustment if
possible.  However, I'm not sure that the planner has any way to know
whether a given SRF will support ValuePerCall or not.

regards, tom lane




Re: Index Skip Scan

2019-09-22 Thread Dmitry Dolgov
> On Thu, Sep 5, 2019 at 9:41 PM Alvaro Herrera from 2ndQuadrant 
>  wrote:
>
> On 2019-Sep-05, Dmitry Dolgov wrote:
>
> > Here is the version in which stepping between the pages works better.
> > It seems sufficient to fix the case you've mentioned before, but for
> > that we need to propagate keepPrev logic through `_bt_steppage` &
> > `_bt_readnextpage`, and I can't say I like this solution. I have an
> > idea that maybe it would be simpler to teach the code after index_skip
> > to not do `_bt_next` right after one skip happened before. It should
> > immediately eliminate several hacks from index skip itself, so I'll
> > try to pursue this idea.
>
> Cool.

Here it is. Since the code after index_skip now knows whether to do
index_getnext or not, it's possible to use unmodified `_bt_readpage` /
`_bt_steppage`. To achieve that, there is a flag that indicates whether or
not we were skipping to the current item (I guess it's possible to
implement it without such a flag, but the end result looked uglier to me).
Along the way I've simplified a few things, and all the tests we
accumulated before still pass. I'm almost sure some parts of the code
could be implemented more elegantly, but I don't yet see how.

> I think multiplying two ScanDirections to watch for a negative result is
> pretty ugly:

Probably, but the only alternative I see to check if directions are opposite is
to check that directions come in pairs (back, forth), (forth, back). Is there
an easier way?

> I think "scanstart" needs more documentation, both in the SGML docs as
> well as the code comments surrounding it.

I was able to remove it after another round of simplification.


v26-0001-Index-skip-scan.patch
Description: Binary data


Re: Allow to_date() and to_timestamp() to accept localized names

2019-09-22 Thread Juan José Santamaría Flecha
On Wed, Sep 18, 2019 at 11:09 AM Juan José Santamaría Flecha
 wrote:
>
> On Fri, Sep 13, 2019 at 10:31 PM Alvaro Herrera
>  wrote:
> >
> Thanks for taking a look at this.
>
> > I'm confused why we acquire the MONTH_DIM / etc definitions.  Can't we
> > just use lengthof() of the corresponding array?  AFAICS it should work
> > just as well.
> >
>
> It was because of the length difference between the ASCII-name arrays,
> which were all NULL-terminated, and the localized-name arrays. The
> attached version uses lengthof().
>
> > I wonder if the "compare first char" thing (seq_search_localized) really
> > works when there are multibyte chars in the day/month names.  I think
> > the code compares just the first char ... but what if the original
> > string uses those funny Unicode non-normalized letters and the locale
> > translation uses normalized letters?  My guess is that the first-char
> > comparison will fail, but surely you'll want the name to match.
> > (There's no month/day name in Spanish that doesn't start with an ASCII
> > letter, but I bet there are some in other languages.)  I think the
> > localized comparison should avoid the first-char optimization, just
> > compare the whole string all the time, and avoid possible weird issues.
>
> The localized search is reformulated in this version to deal with
> multibyte normalization. A regression test for this issue is included.

This version is updated to reduce the need for dynamic allocation.
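
For reference, the user-visible behavior this patch is after, as I
understand it (assuming an es_ES locale is installed), is that the
TM-prefixed patterns accept localized names on input:

    SET lc_time = 'es_ES.utf8';
    SELECT to_date('junio 2019', 'TMMONTH YYYY');  -- 2019-06-01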


> Regards,
>
> Juan José Santamaría Flecha
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 4e3e213..c470c0e 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -6415,8 +6415,17 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
  
   
TM does not include trailing blanks.
+  
+ 
+
+ 
+  
to_timestamp and to_date ignore
-   the TM modifier.
+   the case when receiving names as an input.  For example, either
+   to_timestamp('2000-JUN', 'YYYY-MON') or
+   to_timestamp('2000-Jun', 'YYYY-MON') or
+   to_timestamp('2000-JUN', 'yyyy-mon') work and return
+   the same output.
   
  
 
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index 755ca6e..39d2d11 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -96,6 +96,7 @@
 #include "utils/memutils.h"
 #include "utils/numeric.h"
 #include "utils/pg_locale.h"
+#include "common/unicode_norm.h"
 
 /* --
  * Routines type
@@ -174,18 +175,17 @@ typedef struct
 #define CLOCK_24_HOUR		0
 #define CLOCK_12_HOUR		1
 
-
 /* --
  * Full months
  * --
  */
 static const char *const months_full[] = {
 	"January", "February", "March", "April", "May", "June", "July",
-	"August", "September", "October", "November", "December", NULL
+	"August", "September", "October", "November", "December"
 };
 
 static const char *const days_short[] = {
-	"Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", NULL
+	"Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"
 };
 
 /* --
@@ -217,8 +217,8 @@ static const char *const days_short[] = {
  * matches for BC have an odd index.  So the boolean value for BC is given by
  * taking the array index of the match, modulo 2.
  */
-static const char *const adbc_strings[] = {ad_STR, bc_STR, AD_STR, BC_STR, NULL};
-static const char *const adbc_strings_long[] = {a_d_STR, b_c_STR, A_D_STR, B_C_STR, NULL};
+static const char *const adbc_strings[] = {ad_STR, bc_STR, AD_STR, BC_STR};
+static const char *const adbc_strings_long[] = {a_d_STR, b_c_STR, A_D_STR, B_C_STR};
 
 /* --
  * AM / PM
@@ -244,8 +244,8 @@ static const char *const adbc_strings_long[] = {a_d_STR, b_c_STR, A_D_STR, B_C_S
  * matches for PM have an odd index.  So the boolean value for PM is given by
  * taking the array index of the match, modulo 2.
  */
-static const char *const ampm_strings[] = {am_STR, pm_STR, AM_STR, PM_STR, NULL};
-static const char *const ampm_strings_long[] = {a_m_STR, p_m_STR, A_M_STR, P_M_STR, NULL};
+static const char *const ampm_strings[] = {am_STR, pm_STR, AM_STR, PM_STR};
+static const char *const ampm_strings_long[] = {a_m_STR, p_m_STR, A_M_STR, P_M_STR};
 
 /* --
  * Months in roman-numeral
@@ -254,10 +254,10 @@ static const char *const ampm_strings_long[] = {a_m_STR, p_m_STR, A_M_STR, P_M_S
  * --
  */
 static const char *const rm_months_upper[] =
-{"XII", "XI", "X", "IX", "VIII", "VII", "VI", "V", "IV", "III", "II", "I", NULL};
+{"XII", "XI", "X", "IX", "VIII", "VII", "VI", "V", "IV", "III", "II", "I"};
 
 static const char *const rm_months_lower[] =
-{"xii", "xi", "x", "ix", "viii", "vii", "vi", "v", "iv", "iii", "ii", "i", NULL};
+{"xii", "xi", "x", "ix", "viii", "vii", "vi", "v", "iv", "iii", "ii", "i"};
 
 /* --
  * Roman numbers
@@ -975,7 +975,7 @@ static void parse_format(FormatNode *node, const char *str, const KeyWord *kw,
 
 static void DCH_to_char(FormatNode *node, bool 

Re: Unwanted expression simplification in PG12b2

2019-09-22 Thread Komяpa
Hi,

On Fri, Sep 20, 2019 at 11:14 PM Robert Haas  wrote:

> On Wed, Jul 17, 2019 at 5:20 PM Darafei "Komяpa" Praliaskouski
>  wrote:
> > Indeed, it seems I failed to minimize my example.
> >
> > Here is the actual one, on 90GB table with 16M rows:
> > https://gist.github.com/Komzpa/8d5b9008ad60f9ccc62423c256e78b4c
> >
> > I can share the table on request if needed, but hope that plan may be
> enough.
>
> What I taught the planner to do here had to do with making the costing
> more accurate for cases like this. It now figures out that if it's
> going to stick a Gather in at that point, computing the expressions
> below the Gather rather than above the Gather makes a difference to
> the cost of that plan vs. other plans. However, it still doesn't
> consider any more paths than it did before; it just costs them more
> accurately. In your first example, I believe that the planner should
> be able to consider both GroupAggregate -> Gather Merge -> Sort ->
> Parallel Seq Scan and GroupAggregate -> Sort -> Gather -> Parallel Seq
> Scan, but I think it's got a fixed idea about which fields should be
> fed into the Sort. In particular, I believe it thinks that sorting
> more data is so undesirable that it doesn't want to carry any
> unnecessary baggage through the Sort for any reason. To solve this
> problem, I think it would need to cost the second plan with projection
> done both before the Sort and after the Sort and decide which one was
> cheaper.
>
> This class of problem is somewhat annoying in that the extra planner
> cycles and complexity to deal with getting this right would be useless
> for many queries, but at the same time, there are a few cases where it
> can win big. I don't know what to do about that.
>

A heuristic I believe would help my case (and I can hardly imagine how it
could break others) is that, in the presence of a Gather, all parallel-safe
function calls should be pushed below it.
In a perfect future this query wouldn't even need the subquery that I
extracted for the sake of the OFFSET 0 demo. This could probably be a
single pass that, when a Gather is present, tries to push down every
parallel-safe part of the nested function calls.
If we go as far as starting more workers, it really makes sense to load
them with actual work and not just have them wait for the master process.
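
For readers without the gist handy, the demo query has roughly this
shape (table and function names are hypothetical stand-ins, not the
actual schema):

    SELECT mvt_geom
    FROM (
        SELECT expensive_parallel_safe_fn(geom) AS mvt_geom  -- heavy, parallel safe
        FROM big_table
        OFFSET 0  -- optimization fence keeping the call inside the subquery
    ) AS sub;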


-- 
Darafei Praliaskouski
Support me: http://patreon.com/komzpa