date:20171203

Errands around AllocateDir()

2017-12-03 Thread Michael Paquier

Hi all,

On the recent following thread problems around the use of
AllocateDir() have been diagnosed with its use in the backend code:
https://www.postgresql.org/message-id/20171127093107.1473.70...@wrigleys.postgresql.org

I had a close look at all the callers of AllocateDir() and noticed a
couple of unwelcome things (Tom noticed some of those in the thread
mentioned above, I found others):
- Some elog() instead of ereport(), which is bad for the error code
and translation.
- Some use ereport(LOG), and could take advantage of ReadDirExtended,
which could get exposed instead of being contained in fd.c. Those
elog(LOG) can be removed when there is no further operation in the
routine where the folder read is happening.
- perform_base_backup() makes the mistake of not saving errno before
CheckXLogRemoved() when AllocateDir returns NULL, which can lead to an
incorrect error message.
- restoreTwoPhaseData() calls ReadDir() with LWLockAcquire()
in-between, which can lead to an incorrect errno used.
- Some code paths can simply rely on the error generated by ReadDir(),
so some code is useless.
- Some code paths use non-generic error messages, like
RemoveOldXlogFiles(). Here more consistent error string would I think
make sense.

Some issues are real bugs, like what's in perform_base_backup() and
restoreTwoPhaseData(), and some incorrect error codes. Missing
translation of some messages is also wrong IMO. Making error messages
more consistent is nice and cosmetic. My monitoring of all those
issues is leading me to the attached patch for HEAD. Thoughts?
-- 
Michael


allocatedir-errands.patch
Description: Binary data

Re: Would a BGW need shmem_access or database_connection to enumerate databases?

2017-12-03 Thread Amit Kapila

On Sat, Dec 2, 2017 at 12:41 AM, Robert Haas  wrote:
> On Fri, Dec 1, 2017 at 10:04 AM, Chapman Flack  wrote:
>> Can I call RegisterDynamicBackgroundWorker when not in the postmaster,
>> but also not in a "regular backend", but rather another BGW?
>
> I believe that doing it from another BGW works fine.
>

I think we are already doing it that way in autoprewarm
(src/contrib/pg_prewarm/autoprewarm.c).

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Do we actually need an ItemId array on nbtree pages containing fixed-width tuples?

2017-12-03 Thread Andrey Borodin

Hi, Peter!

> 4 дек. 2017 г., в 4:55, Peter Geoghegan  написал(а):
> Thoughts?

I like the idea of more compact B-tree.
Chances are that I didn't understood all your ideas. 

But ItemId's let you insert a tuple among two existing tuples without data 
movement. New tuple is places wherever free space starts. You just shift bytes 
in ItemId array.
And you always have to insert tuple in specific position, since B-tree relies 
on tuple order.

Best regards, Andrey Borodin.

Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2017-12-03 Thread Michael Paquier

On Mon, Dec 4, 2017 at 2:38 PM, Claudio Freire  wrote:
> They did apply at the time, but I think major work on vacuum was
> pushed since then, and also I was traveling so out of reach.
>
> It may take some time to rebase them again. Should I move to needs
> review myself after that?

Sure, if you can get into this state, please feel free to update the
status of the patch yourself.
-- 
Michael

Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2017-12-03 Thread Claudio Freire

On Tue, Nov 28, 2017 at 10:37 PM, Michael Paquier
 wrote:
> On Mon, Oct 2, 2017 at 11:02 PM, Claudio Freire  
> wrote:
>> Rebased version of the patches attached
>
> The status of the patch is misleading:
> https://commitfest.postgresql.org/15/844/. This was marked as waiting
> on author but a new version has been published. Let's be careful.
>
> The last patches I am aware of, aka those from
> https://www.postgresql.org/message-id/CAGTBQpZHTf2JtShC=ijc9wzeipo3xokwqhx+8wip7zjpc3f...@mail.gmail.com,
> do not apply. I am moving the patch to the next commit fest with a
> waiting on author status, as this should be reviewed, but those need a
> rebase.

They did apply at the time, but I think major work on vacuum was
pushed since then, and also I was traveling so out of reach.

It may take some time to rebase them again. Should I move to needs
review myself after that?

Re: Partition pruning for Star Schema

2017-12-03 Thread Mark Kirkwood


On 04/12/17 16:08, Ashutosh Bapat wrote:


On Sun, Dec 3, 2017 at 5:56 AM, legrand legrand
 wrote:

Hello,

I have a typical star schema, having dimension tables "product", "calendar"
and "country" and a fact table "sales".
This fact table is partitionned by time (range by month) and country (list).

Will query like:

select product.name, calendar.month, sum(sales.net_price)
from sales
  inner join product on (product.id = sales.cust_id)
  inner join country on (country.id = sales.country_id)
  inner join calendar on (calendar.id = sales.calendar_id)
where
  country.name = 'HERE'
  and calendar.year = '2017'
group by product.name,calendar.month

be able to identify needed partitions ?


AFAIU partition pruning, it works only with the partition key columns.
So, if country.name and calendar.year are the partition keys partition
pruning would identify the needed partitions from those tables. But
planner doesn't know that calendar.year is somehow related to
calendar.id and then transfer that knowledge so that partitions of
sales can be identified.



If you can get your code to perform a star transformation on this type 
of query, then you might see some partition pruning.


Cheers

Mark

Re: [HACKERS] [POC] Faster processing at Gather node

2017-12-03 Thread Amit Kapila

On Fri, Dec 1, 2017 at 8:04 PM, Robert Haas  wrote:
> On Sun, Nov 26, 2017 at 3:15 AM, Amit Kapila  wrote:
>> Yeah and I think something like that can happen after your patch
>> because now the memory for tuples returned via TupleQueueReaderNext
>> will be allocated in ExecutorState and that can last for long.   I
>> think it is better to free memory, but we can leave it as well if you
>> don't feel it important.  In any case, I have written a patch, see if
>> you think it makes sense.
>
> Well, I don't really know.  My intuition is that in most cases after
> ExecShutdownGatherMergeWorkers() we will very shortly thereafter call
> ExecutorEnd() and everything will go away.
>

I thought there are some cases (though less) where we want to Shutdown
the nodes (ExecShutdownNode) earlier and release the resources sooner.
However, if you are not completely sure about this change, then we can
leave it as it.  Thanks for sharing your thoughts.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Partition pruning for Star Schema

2017-12-03 Thread Ashutosh Bapat

On Sun, Dec 3, 2017 at 5:56 AM, legrand legrand
 wrote:
> Hello,
>
> I have a typical star schema, having dimension tables "product", "calendar"
> and "country" and a fact table "sales".
> This fact table is partitionned by time (range by month) and country (list).
>
> Will query like:
>
> select product.name, calendar.month, sum(sales.net_price)
> from sales
>  inner join product on (product.id = sales.cust_id)
>  inner join country on (country.id = sales.country_id)
>  inner join calendar on (calendar.id = sales.calendar_id)
> where
>  country.name = 'HERE'
>  and calendar.year = '2017'
> group by product.name,calendar.month
>
> be able to identify needed partitions ?
>

AFAIU partition pruning, it works only with the partition key columns.
So, if country.name and calendar.year are the partition keys partition
pruning would identify the needed partitions from those tables. But
planner doesn't know that calendar.year is somehow related to
calendar.id and then transfer that knowledge so that partitions of
sales can be identified.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: pl/perl extension fails on Windows

2017-12-03 Thread Noah Misch

On Wed, Nov 29, 2017 at 08:14:41PM -0800, Noah Misch wrote:
> 1. If $Config{gccversion} is nonempty, add _USE_32BIT_TIME_T.  This will do
>the wrong thing if MinGW changes its default to match modern MSVC.  It will
>do the wrong thing for a Perl built with "gcc -D__MINGW_USE_VC2005_COMPAT".
> 
> 2. When configuring the build, determine whether to add _USE_32BIT_TIME_T by
>running a test program built with and without that symbol.  Perhaps have
>the test program store and retrieve a PL_modglobal value.  (PL_modglobal
>maps to a PerlInterpreter field that follows the fields sensitive to
>_USE_32BIT_TIME_T, and perlapi documents it since 5.8.0 or earlier.)  This
>is more principled than (1), but it will be more code and may have weird
>interactions with rare Perl build options.
> 
> I am inclined toward (2) if it takes no more than roughly a hundred lines of
> code, else (1).  Opinions?  I regret investing in 32-bit Windows.  If there's
> any OS where a 32-bit PostgreSQL server makes sense today, it's not Windows.

Here's an implementation of (2).  This is more intricate than I hoped.  One
could argue for (1), but I estimate (2) wins by a nose.  I successfully tested
http://strawberryperl.com/download/5.14.4.1/strawberry-perl-5.14.4.1-32bit.msi
(Perl 5.14.4; MinGW-built; must have -D_USE_32BIT_TIME_T) and
http://get.enterprisedb.com/languagepacks/edb-languagepack-10-3-windows.exe
(Perl 5.24.0; MSVC-built; must not have -D_USE_32BIT_TIME_T).  I also tried
http://strawberryperl.com/download/5.8.9/strawberry-perl-5.8.9.5.msi, which
experienced a StackHash_0a9e crash during PERL_SYS_INIT3(), with or without
-D_USE_32BIT_TIME_T.  I expect that breakage is orthogonal.  I didn't have
ready access to obsolete MSVC-built Perl, so it will be interesting to see how
the buildfarm likes this.
MSVC: Test whether 32-bit Perl needs -D_USE_32BIT_TIME_T.

Commits 5a5c2feca3fd858e70ea348822595547e6fa6c15 and
b5178c5d08ca59e30f9d9428fa6fdb2741794e65 introduced support for modern
MSVC-built, 32-bit Perl, but they broke use of MinGW-built, 32-bit Perl
distributions like Strawberry Perl and modern ActivePerl.  Perl has no
robust means to report whether it expects a -D_USE_32BIT_TIME_T ABI, so
test this.  Back-patch to 9.3 (all supported versions).

The chief alternative was a heuristic of adding -D_USE_32BIT_TIME_T when
$Config{gccversion} is nonempty.  That banks on every gcc-built Perl
using the same ABI.  gcc could change its default ABI the way MSVC once
did, and one could build Perl with gcc and the non-default ABI.

The GNU make build system could benefit from a similar test, without
which it does not support MSVC-built Perl.  For now, just add a comment.
Most users taking the special step of building Perl with MSVC probably
build PostgreSQL with MSVC.

Discussion: https://postgr.es/m/20171130041441.ga3161...@rfd.leadboat.com


diff --git a/config/perl.m4 b/config/perl.m4
index 76b1a92..caefb07 100644
--- a/config/perl.m4
+++ b/config/perl.m4
@@ -48,19 +48,23 @@ AC_DEFUN([PGAC_CHECK_PERL_CONFIGS],
 
 # PGAC_CHECK_PERL_EMBED_CCFLAGS
 # -
-# We selectively extract stuff from $Config{ccflags}.  We don't really need
-# anything except -D switches, and other sorts of compiler switches can
-# actively break things if Perl was compiled with a different compiler.
-# Moreover, although Perl likes to put stuff like -D_LARGEFILE_SOURCE and
-# -D_FILE_OFFSET_BITS=64 here, it would be fatal to try to compile PL/Perl
-# to a different libc ABI than core Postgres uses.  The available information
-# says that all the symbols that affect Perl's own ABI begin with letters,
-# so it should be sufficient to adopt -D switches for symbols not beginning
-# with underscore.  An exception is that we need to let through
-# -D_USE_32BIT_TIME_T if it's present.  (We probably could restrict that to
-# only get through on Windows, but for the moment we let it through always.)
-# For debugging purposes, let's have the configure output report the raw
-# ccflags value as well as the set of flags we chose to adopt.
+# We selectively extract stuff from $Config{ccflags}.  For debugging purposes,
+# let's have the configure output report the raw ccflags value as well as the
+# set of flags we chose to adopt.  We don't really need anything except -D
+# switches, and other sorts of compiler switches can actively break things if
+# Perl was compiled with a different compiler.  Moreover, although Perl likes
+# to put stuff like -D_LARGEFILE_SOURCE and -D_FILE_OFFSET_BITS=64 here, it
+# would be fatal to try to compile PL/Perl to a different libc ABI than core
+# Postgres uses.  The available information says that most symbols that affect
+# Perl's own ABI begin with letters, so it's almost sufficient to adopt -D
+# switches for symbols not beginning with underscore.  Some exceptions are the
+# Windows-specific -D_USE_32BIT_TIME_T and -D__MINGW_USE_VC2005_COMPAT; see
+# Mkvcbuild.pm for details.  We absorb t

Re: [HACKERS] Partition-wise aggregation/grouping

2017-12-03 Thread Ashutosh Bapat

On Sat, Dec 2, 2017 at 4:08 AM, Robert Haas  wrote:
> On Fri, Dec 1, 2017 at 7:41 AM, Ashutosh Bapat
>  wrote:
>> This code creates plans where there are multiple Gather nodes under an Append
>> node.
>
> We should avoid that.  Starting and stopping workers is inefficient,
> and precludes things like turning the Append into a Parallel Append.

Ah, I didn't think about it. Thanks for bringing it up.

>
>> AFAIU, the workers assigned to one gather node can be reused until that
>> Gather node finishes. Having multiple Gather nodes under an Append mean that
>> every worker will be idle from the time that worker finishes the work till 
>> the
>> last worker finishes the work.
>
> No, workers will exit as soon as they finish.  They don't hang around idle.

Sorry, I think I used wrong word "idle". I meant that if a worker
finishes and exists, the query can't use it that worker slot until the
next Gather node starts. But as you pointed out, starting and stopping
a worker is costlier than the cost of not using the slot. So we should
avoid such plans.

>
>> index b422050..1941468 100644
>> --- a/src/tools/pgindent/typedefs.list
>> +++ b/src/tools/pgindent/typedefs.list
>> @@ -2345,6 +2345,7 @@ UnlistenStmt
>>  UnresolvedTup
>>  UnresolvedTupData
>>  UpdateStmt
>> +UpperPathExtraData
>>  UpperRelationKind
>>  UpperUniquePath
>>  UserAuth
>>
>> Do we commit this file as part of the feature?
>
> Andres and I regularly commit such changes; Tom rejects them.
>

We will leave it to the committer to decide what to do with this hunk.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] Runtime Partition Pruning

2017-12-03 Thread Robert Haas

On Sat, Dec 2, 2017 at 3:33 AM, Beena Emerson  wrote:
>  Append  (cost=0.00..395.10 rows=9 width=8) (actual time=0.119..0.119
> rows=0 loops=1) (run-time partition pruning: on)

If we can, it would be better to show something a bit more precise,
like the table being used for run-time pruning, or the expression
being used for pruning.

Also, we shouldn't use an ad-hoc format like "(run-time
partition-pruning: on)"; rather, we should display something using one
of the ExplainPropertyBlah functions in explain.c.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-12-03 Thread Ashutosh Bapat

On Sat, Dec 2, 2017 at 2:13 AM, Robert Haas  wrote:
> On Fri, Dec 1, 2017 at 1:36 AM, Rajkumar Raghuwanshi
>  wrote:
>> On Tue, Oct 31, 2017 at 2:45 PM, Robert Haas  wrote:
 OK, committed.  This is a good example of how having good code
>>> coverage doesn't necessarily mean you've found all the bugs.  :-)
>>>
>> As of now partition_join.sql is not having test cases covering cases
>> where partition table have default partition, attaching a small test
>> case patch to cover those.
>
> That's not that small, and to me it looks like overkill.
>

I agree, the patch looks longer than expected. I think, it's important
to have some testcases to test partition-wise join with default
partitions. I think we need at least one test for range default
partitions, one test for list partitioning, one for multi-level
partitioning and one negative testcase with default partition missing
from one side of join.

May be we could reduce the number of SQL commands and queries in the
patch by adding default partition to every table that participates in
partition-wise join (leave the tables participating in negative tests
aside.). But that's going to increase the size of EXPLAIN outputs and
query results. The negative test may simply drop the default partition
from one of the tables.

For every table being tested, the patch adds two ALTER TABLE commands,
one for detaching an existing partition and then attach the same as
default partition. Alternative to that is just add a new default
partition without detaching and existing partition. But then the
default partition needs to populated with some data, which requires 1
INSERT statement at least. That doesn't reduce the size of patch, but
increases the output of query and EXPLAIN plan.

May be in case of multi-level partitioning test, we don't need to add
DEFAULT in every partitioned relation; adding to one of them would be
enough. May be add it to the parent, but that too can be avoided. That
would reduce the size of patch a bit.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] Support to COMMENT ON DATABASE CURRENT_DATABASE

2017-12-03 Thread Jing Wang

Hi,

I have rebased the patch on the latest version.

Because the CURRENT_DATABASE can not only being used on COMMENT ON
statement but also on other statements as following list so the patch name
is renamed to "support_CURRENT_DATABASE_keyword_vxx.patch".


1. COMMENT ON DATABASE CURRENT_DATABASE is ...
2. ALTER DATABASE CURRENT_DATABASE OWNER to ...
3. ALTER DATABASE CURRENT_DATABASE SET parameter ...
4. ALTER DATABASE CURRENT_DATABASE RESET parameter ...
5. ALTER DATABASE CURRENT_DATABASE RESET ALL
6. SELECT CURRENT_DATABASE
7. SECURITY LABEL ON DATABASE CURRENT_DATABASE
8. ALTER ROLE ... IN DATABASE CURRENT_DATABASE SET/RESET
configuration_parameter
9. GRANT ... ON DATABASE CURRENT_DATABASE TO role_specification ...
10. REVOKE ... ON DATABASE CURRENT_DATABASE FROM ...


Regards,
Jing Wang
Fujitsu Australia


support_CURRENT_DATABASE_keyword_v4.5.patch
Description: Binary data

Do we actually need an ItemId array on nbtree pages containing fixed-width tuples?

2017-12-03 Thread Peter Geoghegan

I've been working on adding a new feature to pg_hexedit [1] to allow
it to work with GIN indexes. This led to an idea about how we could do
better within nbtree, which I will now describe.

I noticed that GIN's internal posting tree pages (B-Trees over TIDs)
do away with the need for an ItemId array, despite more or less
working in the same way as classic B-Tree pages (posting tree *leaf*
pages are a bit weird OTOH, but that's not important). This is
possible because internal posting list pages store fixed-width
PostingItem structs instead of IndexTuples. A posting tree only needs
to work with TID key values -- not values of some arbitrary data type,
that could be a variable-width type. This raises the question: since
we really just use ItemIds to deal with variable-width tuples, can we
do away with the need for an ItemId array at the start of nbtree pages
in some important, common cases?

Applicability
=

When there is a single attribute within an ItemId array, that we know
will always be 16 bytes (on 64-bit systems), ISTM that it should be
possible to not have any ItemId array. Simple pointer arithmetic on
the page, based on a tuple width that is known at compile time can be
used instead -- no need to go off of an lp_off value -- see
GinDataPageGetPostingItem() for an example of this. All ItemId
metadata becomes redundant, and we add IndexTuples from the start of
the page, just after the page header, and continue to add them until
the special area at the end of the page is reached (again, see the GIN
posting tree code).

Note that ItemId is already quasi-redundant within indexes. In
general, nbtree indexes only actually need the LP_DEAD bit, as well as
lp_off. (lp_len is simply redundant, even for variable-width index
tuples [2].)

Advantages
==

Before I go into more depth on implementation issues, let me first
list off the ways in which I'd expect this to improve performance:

* We currently fit 367 IndexTuples on to a leaf page of a
single-int4-or-int8-attribute nbtree index (with the default leaf
fillfactor and BLCKSZ) after many SERIAL/BIGSERIAL style sequential
inserts (workloads where all leaf page splits are 10:90 splits of the
rightmost page). If we removed the ItemId array, we'd be able to fit
458 IndexTuples on the left half post-split (while still leaving the
same amount of freespace on the page as before). IOW, we'd make
indexes that avail of the optimization about 20% smaller, thereby
reducing random I/O and so on.

The benefits for internal pages would also be significant, because we
win in at least 3 places:

* The first win comes from the fact that we can store more IndexTuples
(downlinks) now, because the optimization works just as well for
internal pages as it does for leaf pages. This much is obvious.

* The second win comes from the fact that there are now about 20%
fewer downlinks/internal tuples needed in the first place, because
that directly corresponds to the number of pages one level down. The
advantages *compound*; our savings in the total number of internal
pages are close to 40% (it's less than a full 40% because of the 30:70
splits we do on internal pages).

* The third win comes from the fact that we will make more efficient
use of CPU cache within internal pages, by packing IndexTuples
together. Binary searches with many cache misses are something that
has been identified as a bottleneck for some workloads [3].

I haven't bothered to apply the classic "2/3 of a page" utilization
rule for my free space accounting in this example, just to keep things
simple. (That rule applies to workloads with random insertions.)

The general approach here should be to keep things simple by changing
the physical layout of B-Tree pages (where the optimization applies)
without changing their logical structure. So, very few changes ought
to be required within amcheck to make it work with this, and they
should all be fairly mechanical. Similarly, nothing would change about
the interface of the nbtree pageinspect functions. We never make any
IndexTuple bigger, so we can mix the old and new formats when
pg_upgrade is used without needing to be very defensive all over the
place.

Issues
==

I'm not planning to work on this myself, so I encourage other
contributors to think about it. These tricky issues will need to be
worked through:

* In the case of GIN, routines like GinDataPageAddPostingItem() are
used instead of the generic PageAddItem() routine. This optimization
requires similar new code that performs page modifications that must
care about page space management (code that must memmove() ranges of
tuples to make space for a new tuples, and so on).

The implementation should probably generalize its approach, so that
this optimization could in principle be used by other index AMs, such
as the hash AM. The PageAddItem() code (and similar bufpage.c code)
that is now used by nbtree would only be used in pages that still need
an ItemId array. The new fixed-width-aware cod

Re: [HACKERS] Constraint exclusion for partitioned tables

2017-12-03 Thread Ashutosh Bapat

On Sat, Dec 2, 2017 at 1:11 AM, Robert Haas  wrote:
> On Fri, Dec 1, 2017 at 12:21 AM, Michael Paquier
>  wrote:
>> On Wed, Sep 13, 2017 at 4:07 PM, Ashutosh Bapat
>>  wrote:
>>> For a partitioned table, this patch saves the time to run constraint
>>> exclusion on all the partitions if constraint exclusion succeeds on
>>> the partitioned table. If constraint exclusion fails, we have wasted
>>> CPU cycles on one run of constraint exclusion. The difference between
>>> the time spent in the two scenarios increases with the number of
>>> partitions. Practically, users will have a handful partitions rather
>>> than a couple and thus running overhead of running constraint
>>> exclusion on partitioned table would be justified given the time it
>>> will save when CE succeeds.
>>
>> Moved patch to next CF.
>
> Committed after adding a comment.  Generally, code changes should be
> accompanied by comment updates.

Thanks for committing the patch. Sorry for not including the comments.
Your comment looks good.

>
> I tested this and found out that this is quite useful for cases where
> multiple levels of partitioning are in use.  Consider creating 100
> partitions like this:
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> print "create table foo (a int, b int, c text) partition by list (a);\n";
> for $a (1..10)
> {
> print "create table foo$a partition of foo for values in ($a)
> partition by list (b);\n";
> for $b (1..10)
> {
> print "create table foo${a}_$b partition of foo$a for values
> in ($b);\n";
> }
> }
>
> Then consider this query: select * from foo where a = 5;
>
> Without this patch, we have to reject 90 leaf partitions individually,
> but with the patch, we can reject the intermediate partitioned tables;
> each time we do, it substitutes for rejecting 10 children
> individually.  This seems to me to be a case that is quite likely to
> come up in the real world.
>

Right. Thanks for the testing.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Bitmap scan is undercosted?

2017-12-03 Thread Tom Lane

I wrote:
> I tried creating multiple-column statistics using the v10 facility for
> that:
> regression=# create statistics s1 on num, flag from aaa;
> CREATE STATISTICS
> regression=# analyze aaa;
> ANALYZE
> but that changed the estimate not at all, which surprised me because
> dependency statistics are supposed to fix exactly this type of problem.
> I suspect there may be something in the extended-stats code that causes it
> not to work right for boolean columns --- this wouldn't be excessively
> surprising because of the way the planner messes around with converting
> "flag = true" to just "flag" and sometimes back again.  But I've not
> looked closer yet.

After looking, I found that indeed dependency_is_compatible_clause()
rejects expressions like "flag" or "NOT flag", which it needn't since
those are fully equivalent to "flag = true" or "flag = false"
respectively.  Moreover there's nothing else in
dependencies_clauselist_selectivity that depends on the exact form of
the clause under test, only on the semantics that it's an equality
condition on some Var.  Hence I propose the attached patch, which
fixes the rowcount estimate for the example discussed in this thread:

create table aaa as select (id%100)::int num, (id%10=1)::bool flag from 
generate_series(1, 1000) id;
create index i1 on aaa  (num);
create index i2 on aaa  (flag);
create statistics s1 on num, flag from aaa;
analyze aaa;

explain analyze select count(*) from aaa where num = 1 and flag = true;

Without patch:

 Aggregate  (cost=43236.73..43236.74 rows=1 width=8) (actual time=349.365..349.3
65 rows=1 loops=1)
   ->  Bitmap Heap Scan on aaa  (cost=20086.40..43212.94 rows=9514 width=0) (act
ual time=101.308..337.985 rows=10 loops=1)
 Recheck Cond: (num = 1)
 Filter: flag
 Heap Blocks: exact=44248
 ->  BitmapAnd  (cost=20086.40..20086.40 rows=9514 width=0) (actual time
=92.214..92.214 rows=0 loops=1)
   ->  Bitmap Index Scan on i1  (cost=0.00..1776.43 rows=96000 width
=0) (actual time=17.236..17.236 rows=10 loops=1)
 Index Cond: (num = 1)
   ->  Bitmap Index Scan on i2  (cost=0.00..18304.96 rows=991003 wid
th=0) (actual time=72.168..72.168 rows=100 loops=1)
 Index Cond: (flag = true)
 Planning time: 0.254 ms
 Execution time: 350.796 ms

With patch:

 Aggregate  (cost=43496.19..43496.20 rows=1 width=8) (actual time=359.195..359.1
95 rows=1 loops=1)
   ->  Bitmap Heap Scan on aaa  (cost=20129.64..43256.19 rows=96000 width=0) (ac
tual time=99.750..347.353 rows=10 loops=1)
 Recheck Cond: (num = 1)
 Filter: flag
 Heap Blocks: exact=44248
 ->  BitmapAnd  (cost=20129.64..20129.64 rows=9514 width=0) (actual time
=90.671..90.671 rows=0 loops=1)
   ->  Bitmap Index Scan on i1  (cost=0.00..1776.43 rows=96000 width
=0) (actual time=16.946..16.946 rows=10 loops=1)
 Index Cond: (num = 1)
   ->  Bitmap Index Scan on i2  (cost=0.00..18304.96 rows=991003 wid
th=0) (actual time=70.898..70.898 rows=100 loops=1)
 Index Cond: (flag = true)
 Planning time: 0.218 ms
 Execution time: 360.608 ms

That's the right overall rowcount estimate for the scan, given the stats
it's working from.  There's apparently still something wrong with bitmap
costing, since it's still estimating this as cheaper than the single-index
case --- noting the bogus rowcount estimate for the BitmapAnd, I suspect
that bitmap costing is taking shortcuts rather than using
clauselist_selectivity to estimate the overall selectivity of the bitmap
conditions.  But whatever is causing that, it's independent of this
deficiency.

In addition to the bugfix proper, I improved some comments, got rid of
a NumRelids() test that's redundant with the preceding bms_membership()
test, and fixed dependencies_clauselist_selectivity so that
estimatedclauses actually is a pure output argument as stated by its
API contract.

regards, tom lane

diff --git a/src/backend/statistics/dependencies.c b/src/backend/statistics/dependencies.c
index 9756fb8..ae0f304 100644
*** a/src/backend/statistics/dependencies.c
--- b/src/backend/statistics/dependencies.c
*** pg_dependencies_send(PG_FUNCTION_ARGS)
*** 736,826 
   * dependency_is_compatible_clause
   *		Determines if the clause is compatible with functional dependencies
   *
!  * Only OpExprs with two arguments using an equality operator are supported.
!  * When returning True attnum is set to the attribute number of the Var within
!  * the supported clause.
!  *
!  * Currently we only support Var = Const, or Const = Var. It may be possible
!  * to expand on this later.
   */
  static bool
  dependency_is_compatible_clause(Node *clause, Index relid, AttrNumber *attnum)
  {
  	RestrictInfo *rinfo = (RestrictInfo *) clause;
  
  	if (!IsA(rinfo, RestrictInfo))
  		return false;
  
! 	/* Pseudoconstants are not really inte

Re: pg_dumpall -r -c try to drop user postgres

2017-12-03 Thread Michael Paquier

On Sun, Dec 3, 2017 at 3:21 PM, Pavel Stehule  wrote:
> I am not sure if user postgres should be removed, so it is probably bug
>
> pg_dumpall -r -c | grep postgres
>
> DROP ROLE postgres;
> CREATE ROLE postgres;

You are looking for this bit of code:
/*
 * If asked to --clean, do that first.  We can avoid detailed
 * dependency analysis because databases never depend
on each other,
 * and tablespaces never depend on each other.  Roles could have
 * grants to each other, but DROP ROLE will clean
those up silently.
 */
if (output_clean)
{
if (!globals_only && !roles_only && !tablespaces_only)
dropDBs(conn);

if (!roles_only && !no_tablespaces)
dropTablespaces(conn);

if (!tablespaces_only)
dropRoles(conn);
}
Could you clarify what you think is wrong here?
-- 
Michael

Re: [RFC] What would be difficult to make data models pluggable for making PostgreSQL a multi-model database?

2017-12-03 Thread Deep-Impact

From: Tom Lane
It sounds like what you want is to replace all of Postgres except
the name.  I'm not clear on the point.


The point is to make PostgreSQL a versatile database suitable even for
niche use cases.  I want more people to love and rely on PostgreSQL.
Ideally, I want to see various data models incorporated in core from
the beginning, but it would be difficult.  So, this pluggable data
model is necessary to foster data model development both inside and
outside the PostgreSQL community.  With that said, I hope PostgreSQL
will someday absorb those data models, just like PL/pgSQL is now in
core.

But for now, I think the relational data model will continue to play a
central role.  I don't think it's worth making the current relational
data model implementation a plugged module.  It will depend on the
software design and source code cleanness whether to do that.

I don't understand yet how painful it would be to support other data
models as first-class citizens, and I may be a reckless man who
doesn't know fear.  So I wish you could help and pave this difficult
way together.

Regards
MauMau

Re: [HACKERS] postgres_fdw super user checks

2017-12-03 Thread Stephen Frost

Robert, all,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Fri, Dec 1, 2017 at 12:31 AM, Michael Paquier
>  wrote:
> > I am moving this patch to next CF 2018-01.
> 
> There now seems to be a consensus for superuser -> superuser_arg
> rather than what Jeff did originally; that approach has 4 votes and
> nothing else has more than 1. So, here's a patch that does it that
> way.

I've taken a quick look and this looks good to me.

> I tried to see if some documentation update was needed, but I think
> the documentation already reflects the proposed new behavior.  It
> says:
> 
>
> Only superusers may connect to foreign servers without password
> authentication, so always specify the password option
> for user mappings belonging to non-superusers.
>
> 
> Currently, however, that's not accurate.  Right now you need to
> specify the password option for user mappings that will be *used by*
> non-superusers, not user mappings *belonging to* non-superusers.  So
> this patch is, I think, just making the actual behavior match the
> documented behavior.  Not sure if anyone has any other suggestions
> here.  I think this is definitely a master-only change; should we try
> to insert some kind of warning into the back-branch docs?  I
> definitely think this should be called out in the v11 release notes.

I'm not a fan of having *only* warning in the back-branches.  What I
would think we'd do here is correct the back-branch documentation to be
correct, and then add a warning that it changes in v11.

You didn't suggest an actual change wrt the back-branch warning, but it
seems to me like it'd end up being morally equivilant to "ok, forget
what we just said, what really happens is X, but we fix it in v11."

Thanks!

Stephen


signature.asc
Description: Digital signature

Re: [HACKERS] Small improvement to compactify_tuples

2017-12-03 Thread Юрий Соколов

hi,

On Wed, Nov 29, 2017 at 8:00 AM, Peter Geoghegan  wrote:
> On Tue, Nov 28, 2017 at 2:41 PM, Andres Freund  wrote:
>> Maybe it's a stupid question. But would we still want to have this after
>> the change? These should be just specializations of the template version
>> imo.

"generic" version operates on bytes, and it will be a bit hard to combine
it with
templated version. Not impossible, but it will look ugly.

> I also wonder why regression test output has changed. Wasn't this
> supposed to be a mechanical change in how the templating is
> implemented? Why would the behavior of the algorithm change, even if
> the change is only a change in the output order among equal elements?

I did some change to algorithm then. But I reverted changes, and now no need
in test fixes.

> Also, is that one last raw CHECK_FOR_INTERRUPTS() in the template
> definition supposed to be there?

There was error. Fixed.

In attach fixed qsort_template version.
And version for compactify_tuples with bucket_sort and templated qsort.

With regards,
Sokolov Yura.

0001-Header-for-customized-qsort.patch.gz
Description: GNU Zip compressed data

0002-Bucket-sort-for-compactify_tuples.patch.gz
Description: GNU Zip compressed data

Re: [HACKERS] postgres_fdw bug in 9.6

2017-12-03 Thread Tom Lane

Robert Haas  writes:
> But have a look at this:
> http://postgr.es/m/561e12d4.7040...@lab.ntt.co.jp
> That shows a case where, before
> 5fc4c26db5120bd90348b6ee3101fcddfdf54800, a query that required the
> foreign table to do an EPQ recheck produced an unambiguously wrong
> answer; the query stipulates that inttab.a = ft1.a but the row
> returned lacks that property.  I think this clearly shows that EPQ
> rechecks are needed at least for FDW paths that are parameterized.

Certainly; that's what I said before.

> However, I guess that doesn't actually refute your point, which IIUC
> is that maybe we don't need these checks for FDW join paths, because
> we don't build parameterized paths for those yet. Now I think it would
> still be a good idea to have a way of handling that case, because
> somebody's likely going want to fix that /* XXX Consider parameterized
> paths for the join relation */ eventually, but I guess for the
> back-branches just setting epq_path = NULL instead of calling
> GetExistingLocalJoinPath() might be the way to go... assuming that
> your contention that this won't break anything is indeed correct.

I thought some more about this.  While it seems clear that we don't
actually need to recheck anything within a postgres_fdw foreign join,
there's still a problem: we have to be able to regurgitate the join
row during postgresRecheckForeignScan.  Since, in the current system
design, the only available data is scan-level rowmarks (that is,
whole-row values from each base foreign relation), there isn't any
very good way to produce the join row except to insert the rowmark
values into dummy scan nodes and then re-execute the join locally.
So really, that is what the outerPlan infrastructure is doing for us.

We could imagine that, instead of the scan-level rowmarks, the foreign
join node could produce a ROW() expression containing the values it
actually has to output, and then use that as a "join-level rowmark" to
reconstitute the join row without any of the outerPlan infrastructure.
This would have some pretty significant advantages, notably that we
could (in principle) avoid fetching all the columns of remote tables
we no longer needed scan-level rowmarks for.

However, to do that, we'd have to dynamically decide which rowmark
expressions actually need to wind up getting computed in the final join
plan, based on which joins we choose to do remotely.  The current scheme
of statically adding rowmarks to the query during base relation setup
doesn't cut it.  So that's sounding like a nontrivial rewrite (and one
that would break the related FDW APIs, although the patch at hand does
that too).

As for the question of a future extension to parameterized paths,
I think that it would likely work to recheck the parameterized qual
at the join node, making it basically just like rechecks for scan-level
nodes.  The objection to this is that it might not produce the same
answer if the parameterized qual references relations that are on the
insides of outer joins ... but I think that in practice we could just
blow off that case (ie, reject producing parameterized paths involving
such quals).  The reason is that I think such cases arise only very
rarely.  It could only happen for non-strict qual expressions, because
a strict qual would have resulted in the outer join getting strength
reduced to a regular join.

Not sure where that leaves us for the patch at hand.  It feels to me
like it's a temporary band-aid for code that we want to get rid of
in the not-too-long run.  As such, it'd be okay if it were smaller,
but it seems bigger and more invasive than I could wish for a band-aid.

regards, tom lane

Re: [HACKERS] <> join selectivity estimate question

2017-12-03 Thread Tom Lane

Thomas Munro  writes:
> So, in that plan we saw anti-join estimate 1 row but really there were
> 13462.  If you remove most of Q21 and keep just the anti-join between
> l1 and l3, then you try removing different quals, you can see the the
> problem is not the <> qual:

>   select count(*)
> from lineitem l1
>where not exists (
> select *
>   from lineitem l3
>  where l3.l_orderkey = l1.l_orderkey
>and l3.l_suppkey <> l1.l_suppkey
>and l3.l_receiptdate > l3.l_commitdate
> )
>   => estimate=1 actual=8998304

ISTM this is basically another variant of ye olde column correlation
problem.  That is, we know there's always going to be an antijoin match
for the l_orderkey equality condition, and that there's always going to
be matches for the l_suppkey inequality, but what we don't know is that
l_suppkey is correlated with l_orderkey so that the two conditions aren't
satisfied at the same time.  The same thing is happening on a smaller
scale with the receiptdate/commitdate comparison.

I wonder whether the extended stats machinery could be brought to bear
on this problem.

regards, tom lane

Re: [RFC] What would be difficult to make data models pluggable for making PostgreSQL a multi-model database?

2017-12-03 Thread MauMau

From: Henry
Would this require a the new pluggable storage which is currently in
development or would the existing storage engine be sufficient? I am
just wondering if there are any rough design/plans for this...

I'm sorry for the long interval.  The graph model can be implemented
on top of the relational storage engine, like Oracle and SQL Server
stores graph data in relational tables.  But the optimal storage
engine for the graph model is required to maximize performance for
graph traversal, which introduces direct pointers between graph nodes.
So I think new level of pluggable storage is necessary.

Regards
MauMau

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2017-12-03 Thread Ashutosh Bapat

On Fri, Oct 13, 2017 at 7:59 AM, Ashutosh Bapat
 wrote:
> On Thu, Oct 12, 2017 at 9:46 PM, Robert Haas  wrote:
>> On Wed, Oct 11, 2017 at 7:08 AM, Ashutosh Bapat
>>  wrote:
>>> Here's updated patch set based on the basic partition-wise join
>>> committed. The patchset applies on top of the patch to optimize the
>>> case of dummy partitioned tables [1].
>>>
>>> Right now, the advanced partition matching algorithm bails out when
>>> either of the joining relations has a default partition.
>>
>> So is that something you are going to fix?
>>
>
> Yes, if time permits. I had left the patch unattended while basic
> partition-wise join was getting committed. Now that it's committed, I
> rebased it. It still has TODOs and some work is required to improve
> it. But for the patch to be really complete, we have to deal with the
> problem of missing partitions described before. I am fine
> collaborating if someone else wants to pick it up.
>

Here's patchset which support advanced partition matching for
partition bounds with default partition. The patchset is rebased on
the latest head.

When a list value is present in one of the joining relations and not
the other, and the other relation has default partition, match (join)
the partition containing that list value with the default partition,
since the default partition may contain rows with that list value. If
the default partition happens to be on the outer side of the join, the
resulting join partition acts as a default partition as it will
contain all the values from the default partition. If the partition
containing the list value happens to be on the outer side of the join,
the resulting join partition is associated with the list value, since
no other partition key value from the default partition makes it to
the join result.

When a range is present (completely or partly) in one of the joining
relations and not the other, and the other relation has default
partition, match (join) the partition corresponding to that range with
the default partition. If the default partition happens to be on the
outer side of the join, the resulting join partition acts as a default
partition as it will contain all the values from the default
partition. If the non-partition corresponding to the range happens to
be on the outer side of the join, the resulting join partition is
associated with that range, since partition key values from the
default partition outside that range won't make it to the join result.

If both the relations have default partition, match (join) the default
partition with each other and deem the resulting join partition as
default partition. If one of the relations has default partition but
not the other, and the default partition happens to be on the outer
side of the join, all its rows will make it to the join.  Such a
default partition may get joined to a non-default partition from the
inner side, if inner side has a range missing in the outer side.

If any of the above causes multiple partitions from one side to match
with one or more partitions on the other side, we won't use
partition-wise join as discussed in the first mail of this thread.

I have tested the patches for two-way join, but haven't added any test
involving default partitions to the patch itself. It needs to be
tested for N-way join as well. So, for now I have kept the two patches
supporting the default partition in case of range and list resp.
separate. Also, some of the code duplication in partition matching
functions can be avoided using macros. I will merge those patches into
the main patch and add macros once they are tested appropriately.

For hash partitioned table, we haven't implemented the advanced
partition matching, since it would be rare that somebody has hash
partitioned tables with holes (even if they are allowed).

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

pg_adv_dp_join_patches_v2.tar.gz
Description: GNU Zip compressed data

Re: [HACKERS] pow support for pgbench

2017-12-03 Thread Fabien COELHO




The fact that the return type is not consistently of one type bothers
me.  I'm not sure pgbench's expression language is a good place to
runtime polymorphism -- SQL doesn't work that way.


Sure.

Pg has a NUMERIC adaptative precision version, which is cheating, because it 
can return kind of an "int" or a "float", depending on whether there are 
digits after the decimal point or not.


Pgbench does not have support for NUMERIC, just INT & DOUBLE, so the current 
version is an approximation of that.


Now it is always possible to just do DOUBLE version, but this won't match SQL 
behavior either.


Another point I forgot: pgbench functions and operators are notably 
interesting to generate/transform keys in tables, which are usually 
integers, so having int functions when possible/appropriate is desirable, 
which explain why I pushed for having an int version for POW.


Also, pgbench does not have a static typing model because variable types 
are not declared "\set i ...", so the type is somehow "guessed" based on 
the string values, although if in doubt it is always possible to convert 
(with "int" & "double" functions).


So for me the philosophy is to have expression match SQL behavior when 
possible, as closely as possible, but it is not an exact match.


--
Fabien.

Errands around AllocateDir()

Re: Would a BGW need shmem_access or database_connection to enumerate databases?

Re: Do we actually need an ItemId array on nbtree pages containing fixed-width tuples?

Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

Re: Partition pruning for Star Schema

Re: [HACKERS] [POC] Faster processing at Gather node

Re: Partition pruning for Star Schema

Re: pl/perl extension fails on Windows

Re: [HACKERS] Partition-wise aggregation/grouping

Re: [HACKERS] Runtime Partition Pruning

Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

Re: [HACKERS] Support to COMMENT ON DATABASE CURRENT_DATABASE

Do we actually need an ItemId array on nbtree pages containing fixed-width tuples?

Re: [HACKERS] Constraint exclusion for partitioned tables

Re: Bitmap scan is undercosted?

Re: pg_dumpall -r -c try to drop user postgres

Re: [RFC] What would be difficult to make data models pluggable for making PostgreSQL a multi-model database?

Re: [HACKERS] postgres_fdw super user checks

Re: [HACKERS] Small improvement to compactify_tuples

Re: [HACKERS] postgres_fdw bug in 9.6

Re: [HACKERS] <> join selectivity estimate question

Re: [RFC] What would be difficult to make data models pluggable for making PostgreSQL a multi-model database?

Re: [HACKERS] advanced partition matching algorithm for partition-wise join

Re: [HACKERS] pow support for pgbench

25 matches

Site Navigation

Mail list logo

Footer information