Re: ALTER TABLE ADD COLUMN fast default

2018-02-19 Thread Andrew Dunstan
On Tue, Feb 20, 2018 at 5:03 PM, Andrew Dunstan
 wrote:
> http://www.publix.com/myaccount/verify?validationCode=889fd4cb-6dbb-4f93-9d4f-c701410cd8a2


Wow, my c was working overtime. Good thing this won't do anyone any good.

cheers

andrew


-- 
Andrew Dunstan  https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [WIP PATCH] Index scan offset optimisation using visibility map

2018-02-19 Thread Andrey Borodin
Hi!

I've played with the patch. I observe that in some expected scenarios it
significantly reduces the number of buffers read.

> On 14 Feb 2018, at 0:04, Michail Nikolaev wrote:
> Patch updated + rebased on master. check-world is passing.

Minor spots:
There is some trailing whitespace at line ends.
> Offset cannot be optimized because parallel execution
I'd replace with 
> Offset cannot be optimized [in spite of | due to] parallel execution

A more important thing: nodeindexonlyscan.c and nodeindexscan.c now share more
similar code and comments. I do not think this strictly has to be avoided,
but we certainly have to answer the question:
Is it possible to refactor the code to avoid the duplication?
Currently the patch is not invasive and touches only the relevant code. If
refactoring would turn it into shotgun surgery, I think it is better to avoid it.


> Still not sure about questions 0, 2, 3, 4, 5, and 6 from initial mail (about 
> explain, explain analyze, documentation and optimiser).

I think it would be cool if EXPLAIN ANALYZE printed heap fetch information when
the "index-only" path is used. But we already have this debug information in the
form of a reduced count in "Buffers". There is nothing to add to plain EXPLAIN,
in my opinion.
From my point of view, you should add a few words about this to
https://www.postgresql.org/docs/current/static/indexes-index-only-scans.html
and, if the patch is accepted, to
https://wiki.postgresql.org/wiki/Index-only_scans

I do not know whether it is possible to take this optimization into account in
cost estimation.
Does the Limit node include the cost of the scan in its startup cost?
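
For what it's worth, a quick way to eyeball the effect (a sketch only, assuming
a table t(a int, b int) with an index on (a) and a freshly VACUUMed heap so the
visibility map is set):

  EXPLAIN (ANALYZE, BUFFERS)
    SELECT a, b FROM t WHERE a > 0 ORDER BY a OFFSET 10000 LIMIT 10;
  -- with the patch, rows skipped by OFFSET on all-visible pages should not
  -- touch the heap, so the "Buffers: shared hit=..." figure goes down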

Thanks!

Best regards, Andrey Borodin.


Re: ALTER TABLE ADD COLUMN fast default

2018-02-19 Thread Andres Freund
Hi,

On 2018-02-17 00:23:40 +0100, Tomas Vondra wrote:
> Anyway, I consider the performance to be OK. But perhaps Andres could
> comment on this too, as he requested the benchmarks.

My performance concerns were less about CREATE TABLE related things than
about analytics workloads or such, where deforming is the primary
bottleneck.  I think it should be ok, but doing a before/after tpc-h of
scale 5-10 or so wouldn't be a bad thing to verify.
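
A cheap stand-in, if a full TPC-H run is too much hassle, might be a
deform-bound aggregate over a wide table (just a sketch, table name made up):

  CREATE TABLE wide AS
    SELECT g AS c1, g AS c2, g AS c3, g AS c4, g AS c5, g AS c6, g AS c7, g AS c8
    FROM generate_series(1, 5000000) g;
  VACUUM ANALYZE wide;
  \timing on
  SELECT sum(c8) FROM wide;  -- summing the last column deforms all leading atts

Timing that before and after the patch should show whether the deforming path
got noticeably slower.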

Greetings,

Andres Freund



Re: ALTER TABLE ADD COLUMN fast default

2018-02-19 Thread Andrew Dunstan
http://www.publix.com/myaccount/verify?validationCode=889fd4cb-6dbb-4f93-9d4f-c701410cd8a2

On Mon, Feb 19, 2018 at 1:18 PM, David Rowley
 wrote:
> On 19 February 2018 at 13:48, Andrew Dunstan
>  wrote:
>> On Sun, Feb 18, 2018 at 2:52 PM, David Rowley
>>  wrote:
>>> I'll try to spend some time reviewing the code in detail soon.
>>>
>>
>> Thanks. I'll fix the typos in the next version of the patch, hopefully
>> after your review.
>
> Here's a more complete review... I reserve the right to be wrong about
> a few things and be nitpicky on a few more...

I've dealt with most of these issues. The only change of substance is
the suggested change in slot_getmissingattrs().


> Once that's fixed, you could ditch the Min() call in the following code:
>
> attno = Min(attno, attnum);
>
> slot_deform_tuple(slot, attno);
>
> /*
> * If tuple doesn't have all the atts indicated by tupleDesc, read the
> * rest as NULLs or missing values
> */
> if (attno < attnum)
> {
> slot_getmissingattrs(slot, attno);
> }
>
> And change it to something more like:
>
> /*
> * If tuple doesn't have all the atts indicated by tupleDesc, deform as
> * much as possible, then fill the remaining required atts with nulls or
> * missing values.
> */
> if (attno < attnum)
> {
> slot_deform_tuple(slot, attno);
> slot_getmissingattrs(slot, attno, attnum); // <-- (includes new parameter)
> }
> else
> slot_deform_tuple(slot, attnum);
>
> Which should result in a small performance increase due to not having
> to compare attno < attnum twice. Although, perhaps the average compiler
> may optimise this anyway. I've not checked.
>


Maybe it would make a difference, but it seems unlikely to be significant.


> 11. Is there a reason why the following code in getmissingattr() can't
> be made into a bsearch?
>
[...]
> As far as I can see, the attrmiss is sorted by attnum. But maybe I just
> failed to find a case where it's not.
>
> I'd imagine doing this would further improve Tomas' benchmark score
> for the patched version, since he was performing a sum() on the final
> column.
>
> Although, If done, I'm actually holding some regard to the fact that
> users may one day complain that their query became slower after a
> table rewrite :-)
>
> Update: I've stumbled upon the following code:
>
> + /*
> + * We have a dependency on the attrdef array being filled in in
> + * ascending attnum order. This should be guaranteed by the index
> + * driving the scan. But we want to be double sure
> + */
> + if (!(attp->attnum > attnum))
> + elog(ERROR, "attribute numbers not ascending");
>
> and the code below this seems to also assemble the missing values in
> attnum order.
>
> While I'm here, I'd rather see this if test written as: if
> (attp->attnum <= attnum)
>
> Since the attnum order in the missing values appears to be well
> defined in ascending order, you can also simplify the comparison logic
> in equalTupleDescs(). The inner-most nested loop shouldn't be
> required.
>


Well, this comment in the existing code in equalTupleDescs() suggests
that in fact we can't rely on the attrdef items being in attnum order:

/*
 * We can't assume that the items are always read from the system
 * catalogs in the same order; so use the adnum field to identify
 * the matching item to compare.
 */

And in any case the code is out of date, since the code no longer ties
missing values to default values. So I've removed the comment and the
error check.

There are probably a few optimisations we can make if we can assume
that the missing values in the tuple descriptor are in attnum
order. But it's unclear to me why they should be when the attrdef
entries are not.



>
> 15. Why not: slot->tts_values[missattnum] = (Datum) 0;
>
> + slot->tts_values[missattnum] = PointerGetDatum(NULL);
>
> There are a few other places you've done this. I'm unsure if there's
> any sort of standard to follow.
>

This is a pretty widely used idiom in the source.


>
> I'll await the next version before looking again.
>


Thanks

attached.

cheers

andrew



-- 
Andrew Dunstan  https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


fast_default-v11.patch
Description: Binary data


RE: [doc fix] Correct calculation of vm.nr_hugepages

2018-02-19 Thread Tsunakawa, Takayuki
From: Justin Pryzby [mailto:pry...@telsasoft.com]
> One can do with fewer greps:
> pryzbyj@pryzbyj:~$ sudo pmap `pgrep -P 1 -u postgres` |awk '/rw-s/&&/zero/{print $2}'  # check PPID=1
> 144848K

Thanks, I'd like to take this.

Regards
Takayuki Tsunakawa



hugepage_size_doc_v2.patch
Description: hugepage_size_doc_v2.patch


Re: Changing default value of wal_sync_method to open_datasync on Linux

2018-02-19 Thread Andres Freund
On 2018-02-20 01:56:17 +, Tsunakawa, Takayuki wrote:
> Disabling the filesystem barrier is a valid tuning method as the PG manual 
> says:

I don't think it says that:

> https://www.postgresql.org/docs/devel/static/wal-reliability.html
> 
> [Excerpt]
> Recent SATA drives (those following ATAPI-6 or later) offer a drive cache 
> flush command (FLUSH CACHE EXT), while SCSI drives have long supported a 
> similar command SYNCHRONIZE CACHE. These commands are not directly accessible 
> to PostgreSQL, but some file systems (e.g., ZFS, ext4) can use them to flush 
> data to the platters on write-back-enabled drives. Unfortunately, such file 
> systems behave suboptimally when combined with battery-backup unit (BBU) disk 
> controllers. In such setups, the synchronize command forces all data from the 
> controller cache to the disks, eliminating much of the benefit of the BBU. 
> You can run the pg_test_fsync program to see if you are affected. If you are 
> affected, the performance benefits of the BBU can be regained by turning off 
> write barriers in the file system or reconfiguring the disk controller, if 
> that is an option. If write barriers are turned off, make sure the battery 
> remains functional; a faulty battery can potentially lead to data loss. 
> Hopefully file system and disk controller designers will eventually address 
> this suboptimal behavior.

Note it's only valid if running with a BBU. In which case the
performance measurements you're doing aren't particularly meaningful
anyway, as you'd test BBU performance rather than disk performance.


Greetings,

Andres Freund



Re: SHA-2 functions

2018-02-19 Thread Michael Paquier
On Mon, Feb 19, 2018 at 08:43:44AM -0500, Peter Eisentraut wrote:
> I also noticed while working on some SSL code that we have perfectly
> good SHA-2 functionality in the server already, but it has no test
> coverage outside the SCRAM tests.

Yep, the refactoring in src/common/ has been done for SCRAM.  As the
first versions of the patch were for SCRAM-SHA-1, only SHA-1 functions
were moved.  With SCRAM-SHA-256, the full set of APIs for 256, 384 and
512 has been moved, as they share much code.

> So I suggest these patches that expose the new functions sha224(),
> sha256(), sha384(), sha512().  That allows us to make the SSL and SCRAM
> tests more robust, and it will allow them to be used in general purpose
> contexts over md5().

Huge +1 as well.  This also makes sure that the non-SSL implementation
carried in Postgres is consistent with what OpenSSL has.  This would
matter as well if new SSL implementations are added in the future.

+   sha224('abc')
+   
\x23097d223405d8228642a477bda255b32aadbce4bda0b3f7e36c9da7
Some bytea characters from the hash are not able to show up correctly?
This does not result in spaces.

+   Note that for historic reasons, the function md5
+   returns a hex-encoded value of type text whereas the SHA-2
+   functions return type bytea.  Use the functions
+   encode and decode to convert
+   between the two.
Adding an example would be nice.
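Something along these lines, perhaps (a sketch; the digests are the usual
"abc" test vectors):

  SELECT sha256('abc');
  -- \xba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
  SELECT encode(sha256('abc'), 'hex');
  -- ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
  SELECT md5('abc');
  -- 900150983cd24fb0d6963f7d28e17f72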

varlena.c is already large and messy.  I would suggest splitting all the
user-facing cryptographic functions, including the md5 and hex functions,
into a new file, say src/backend/utils/adt/crypt.c.
--
Michael


signature.asc
Description: PGP signature


Re: Server crash in pg_replication_slot_advance function

2018-02-19 Thread Alvaro Herrera
Petr Jelinek wrote:

> >  Attached patch proposes a required fix.
> > 
> 
> Looks correct to me.

Masahiko Sawada wrote:

> The change looks good to me. Thank you.

Thanks, pushed.


-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Typo in procarray.c

2018-02-19 Thread Masahiko Sawada
Hi,

Attached patch for fixing $subject.

s/replicaton/replication/

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


fix_typo_in_procarray_c.patch
Description: Binary data


RE: Changing default value of wal_sync_method to open_datasync on Linux

2018-02-19 Thread Tsunakawa, Takayuki
From: Andres Freund [mailto:and...@anarazel.de]
> Indeed. My past experience with open_datasync on linux shows it to be slower
> by roughly an order of magnitude. Even if that would turn out not to be
> the case anymore, I'm *extremely* hesitant to make such a change.

Thanks for the quick feedback.  An order of magnitude is surprising.  Can
you share the environment (Linux distro version, kernel version, filesystem,
mount options, workload, etc.)?  Can you think of anything that explains the
degradation?  I think it is reasonable that open_datasync is faster than
fdatasync because:

* Short transactions like pgbench's require fewer system calls:
write()+fdatasync() vs. write().
* fdatasync() probably has to scan the page cache for dirty pages.

The above differences should be invisible on slow disks, but they will show up 
on faster storage.  I guess that's why Robert said open_datasync was much 
faster on NVRAM.

The manual says that pg_test_fsync is a tool for selecting the wal_sync_method
value, and it indicates open_datasync is better.  Why is fdatasync the default
value only on Linux?  As a user, I don't understand why PostgreSQL does this
special handling.  If the current behavior of choosing fdatasync by default is
due to some deficiency of an old kernel and/or filesystem, I think we can change
the default so that most users don't have to change wal_sync_method.
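
(For what it's worth, experimenting with this doesn't need a patched build; as
far as I know wal_sync_method only requires a reload, so something like the
following is enough to try it on a given system.  The question here is only
about the built-in default.)

  ALTER SYSTEM SET wal_sync_method = 'open_datasync';
  SELECT pg_reload_conf();
  SHOW wal_sync_method;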


Regards
Takayuki Tsunakawa







Re: Changing default value of wal_sync_method to open_datasync on Linux

2018-02-19 Thread Mark Kirkwood

On 20/02/18 13:27, Tsunakawa, Takayuki wrote:


Hello,

I propose changing the default value of wal_sync_method from fdatasync to 
open_datasync on Linux.  The patch is attached.  I'm feeling this may be 
controversial, so I'd like to hear your opinions.

The reason for change is better performance.  Robert Haas said open_datasync 
was much faster than fdatasync with NVRAM in this thread:

https://www.postgresql.org/message-id/flat/c20d38e97bcb33dad59e...@lab.ntt.co.jp#c20d38e97bcb33dad59e...@lab.ntt.co.jp

pg_test_fsync shows higher figures for open_datasync:

[SSD on bare metal, ext4 volume mounted with noatime,nobarrier,data=ordered]
--
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
     open_datasync                      50829.597 ops/sec      20 usecs/op
     fdatasync                          42094.381 ops/sec      24 usecs/op
     fsync                              42209.972 ops/sec      24 usecs/op
     fsync_writethrough                               n/a
     open_sync                          48669.605 ops/sec      21 usecs/op
--


[HDD on VM, ext4 volume mounted with noatime,nobarrier,data=writeback]
(the figures seem oddly high, though; this may be due to some VM configuration)
--
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
 open_datasync 34648.778 ops/sec  29 usecs/op
 fdatasync 31570.947 ops/sec  32 usecs/op
 fsync 27783.283 ops/sec  36 usecs/op
 fsync_writethrough  n/a
 open_sync 35238.866 ops/sec  28 usecs/op
--


pgbench only shows marginally better results, although the difference is within 
an error range.  The following is the tps of the default read/write workload of 
pgbench.  I ran the test with all the tables and indexes preloaded with 
pg_prewarm (except pgbench_history), and the checkpoint not happening.  I ran a 
write workload before running the benchmark so that no new WAL file would be 
created during the benchmark run.

[SSD on bare metal, ext4 volume mounted with noatime,nobarrier,data=ordered]
--
                   1      2      3      avg
fdatasync      17610  17164  16678    17150
open_datasync  17847  17457  17958    17754 (+3%)

[HDD on VM, ext4 volume mounted with noatime,nobarrier,data=writeback]
(the figures seem oddly high, though; this may be due to some VM configuration)
--
                  1     2     3     avg
fdatasync      4911  5225  5198    5111
open_datasync  4996  5284  5317    5199 (+1%)


As the removed comment describes, when wal_sync_method is open_datasync (or 
open_sync), open() fails with errno=EINVAL if the ext4 volume is mounted with 
data=journal.  That's because open() specifies O_DIRECT in that case.  I don't 
think that's a problem in practice, because data=journal will not be used for 
performance, and wal_level needs to be changed from its default replica to 
minimal and max_wal_senders must be set to 0 for O_DIRECT to be used.




I think the use of 'nobarrier' is probably disabling most/all reliable
writing to the devices. What do the numbers look like if you remove this
option?


regards

Mark



Re: master check fails on Windows Server 2008

2018-02-19 Thread Tom Lane
Marina Polyakova  writes:
> On 16-02-2018 19:31, Tom Lane wrote:
>> Weird.  AFAICS the cost estimates for those two plans should be quite
>> different, so this isn't just a matter of the estimates maybe being
>> a bit platform-dependent.  (And that test has been there nearly a
>> year without causing reported problems.)

> Thank you very much! Your test showed that hash aggregation was not even 
> added to the possible paths (see windows_regression.diffs attached). 
> Exploring this, I found that not allowing float8 to pass by value in 
> config.pl was crucial for the size of the hash table used in this query 
> (see diff.patch attached):

Ah-hah.  I can reproduce the described failure if I configure with
--disable-float8-byval on an otherwise 64-bit machine.  It appears that
the minimum work_mem setting that will allow this query to use a hashagg
plan on such a configuration is about 155kB (which squares with the
results you show).  On the other hand, in a normal 64-bit configuration,
work_mem settings of 160kB or more cause other failures (due to other
plans switching to hashagg), and on a 32-bit machine I see such failures
with work_mem of 150kB or more.  So there's basically no setting of
work_mem that would make all these tests pass everywhere.
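
The kind of flip involved can be seen on any handy installation with something
like this (illustrative only; the exact thresholds depend on the data and the
configuration, as described above):

  CREATE TEMP TABLE gt AS SELECT (g % 50000) AS k FROM generate_series(1, 100000) g;
  ANALYZE gt;
  SET work_mem = '64kB';
  EXPLAIN (COSTS OFF) SELECT k, count(*) FROM gt GROUP BY k;  -- hash table too big, no HashAggregate
  SET work_mem = '16MB';
  EXPLAIN (COSTS OFF) SELECT k, count(*) FROM gt GROUP BY k;  -- should switch to HashAggregate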

I see several things we could do about this:

1. Nothing; just say "sorry, we don't promise that the regression tests
pass with no plan differences on nonstandard configurations".  Given that
--disable-float8-byval has hardly any real-world use, there is not a lot
of downside to that.

2. Decide that --disable-float8-byval, and for that matter
--disable-float4-byval, have no real-world use at all and take them out.
There was some point in those options back when we cared about binary
compatibility with version-zero C functions, but now I'm not sure why
anyone would use them.

3. Drop that one test case from stats_ext.sql; I'm not sure how much
additional test value it's actually bringing.

4. Try to tweak the stats_ext.sql test conditions in some more refined
way to get the test to pass everywhere.  This'd be a lot of work with
no guarantee of success, so I'm not too excited about it.

5. Something else?

regards, tom lane



Re: Changing default value of wal_sync_method to open_datasync on Linux

2018-02-19 Thread Andres Freund
Hi,

On 2018-02-20 00:27:47 +, Tsunakawa, Takayuki wrote:
> I propose changing the default value of wal_sync_method from fdatasync
> to open_datasync on Linux.  The patch is attached.  I'm feeling this
> may be controversial, so I'd like to hear your opinions.

Indeed. My past experience with open_datasync on linux shows it to be
slower by roughly an order of magnitude. Even if that would turn out not
to be the case anymore, I'm *extremely* hesitant to make such a change.

> [HDD on VM, ext4 volume mounted with noatime,nobarrier,data=writeback]
> (the figures seem oddly high, though; this may be due to some VM
> configuration)

These numbers clearly aren't reliable, there's absolutely no way an hdd
can properly do ~30k syncs/sec.  Until there's reliable numbers this
seems moot.

Greetings,

Andres Freund



Changing default value of wal_sync_method to open_datasync on Linux

2018-02-19 Thread Tsunakawa, Takayuki
Hello,

I propose changing the default value of wal_sync_method from fdatasync to 
open_datasync on Linux.  The patch is attached.  I'm feeling this may be 
controversial, so I'd like to hear your opinions.

The reason for change is better performance.  Robert Haas said open_datasync 
was much faster than fdatasync with NVRAM in this thread:

https://www.postgresql.org/message-id/flat/c20d38e97bcb33dad59e...@lab.ntt.co.jp#c20d38e97bcb33dad59e...@lab.ntt.co.jp

pg_test_fsync shows higher figures for open_datasync:

[SSD on bare metal, ext4 volume mounted with noatime,nobarrier,data=ordered]
--
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
     open_datasync                      50829.597 ops/sec      20 usecs/op
     fdatasync                          42094.381 ops/sec      24 usecs/op
     fsync                              42209.972 ops/sec      24 usecs/op
     fsync_writethrough                               n/a
     open_sync                          48669.605 ops/sec      21 usecs/op
--


[HDD on VM, ext4 volume mounted with noatime,nobarrier,data=writeback]
(the figures seem oddly high, though; this may be due to some VM configuration)
--
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
open_datasync 34648.778 ops/sec  29 usecs/op
fdatasync 31570.947 ops/sec  32 usecs/op
fsync 27783.283 ops/sec  36 usecs/op
fsync_writethrough  n/a
open_sync 35238.866 ops/sec  28 usecs/op
--


pgbench only shows marginally better results, although the difference is within 
an error range.  The following is the tps of the default read/write workload of 
pgbench.  I ran the test with all the tables and indexes preloaded with 
pg_prewarm (except pgbench_history), and the checkpoint not happening.  I ran a 
write workload before running the benchmark so that no new WAL file would be 
created during the benchmark run.
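
For reference, the preloading step can be done roughly like this (a sketch, not
necessarily the exact commands used):

  CREATE EXTENSION IF NOT EXISTS pg_prewarm;
  SELECT relname, pg_prewarm(oid::regclass)
    FROM pg_class
   WHERE relname LIKE 'pgbench_%' AND relname <> 'pgbench_history';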

[SSD on bare metal, ext4 volume mounted with noatime,nobarrier,data=ordered]
--
                   1      2      3      avg
fdatasync      17610  17164  16678    17150
open_datasync  17847  17457  17958    17754 (+3%)

[HDD on VM, ext4 volume mounted with noatime,nobarrier,data=writeback]
(the figures seem oddly high, though; this may be due to some VM configuration)
--
                  1     2     3     avg
fdatasync      4911  5225  5198    5111
open_datasync  4996  5284  5317    5199 (+1%)


As the removed comment describes, when wal_sync_method is open_datasync (or 
open_sync), open() fails with errno=EINVAL if the ext4 volume is mounted with 
data=journal.  That's because open() specifies O_DIRECT in that case.  I don't 
think that's a problem in practice, because data=journal will not be used for 
performance, and wal_level needs to be changed from its default replica to 
minimal and max_wal_senders must be set to 0 for O_DIRECT to be used.


Regards
Takayuki Tsunakawa



default_wal_sync_method.patch
Description: default_wal_sync_method.patch


Re: SHA-2 functions

2018-02-19 Thread Michael Paquier
On Mon, Feb 19, 2018 at 03:02:02PM -0500, Peter Eisentraut wrote:
> On 2/19/18 09:06, Aleksander Alekseev wrote:
>>> So I suggest these patches that expose the new functions sha224(),
>>> sha256(), sha384(), sha512().  That allows us to make the SSL and SCRAM
>>> tests more robust, and it will allow them to be used in general purpose
>>> contexts over md5().
>> 
>> Nice patch. I wonder though whether tests should include hashing
>> non-ASCII characters as well.
> 
> The input is of type bytea, so the concept of non-ASCII characters
> doesn't quite apply.

Encoding issues are a reason to use bytea output in some cases.  For
logical decoding this is for example also why an interface like
pg_logical_slot_peek_binary_changes() exists...  Back to the patch.
--
Michael


signature.asc
Description: PGP signature


Re: [PROPOSAL] Nepali Snowball dictionary

2018-02-19 Thread Oleg Bartunov
On Tue, Feb 20, 2018 at 12:01 AM, Arthur Zakirov
 wrote:
> Thank you for your answer!
>
> 2018-02-19 18:43 GMT+03:00 Tom Lane :
>> We are not the upstream for the snowball stuff, and lack the expertise
>> to decide whether proposed changes are any good.  To get anything
>> changed there, you'd have to get it approved by the snowball group.
>>
>> As best I know, the original list
>> http://lists.tartarus.org/mailman/listinfo/snowball-discuss
>> is moribund, but there's a fork at
>> http://snowballstem.org
>> that has at least some activity.
>
> From the original list it seems that http://snowballstem.org is
> frozen. But development work continues at
> https://github.com/snowballstem by other people.
> I'll try to send them a pull request.
>
>> Probably the first step ought to involve syncing our copy with the
>> current state of that upstream, something that's not been done in a
>> very long time :-(
>
> I think I will try to sync snowball dictionaries with snowballstem.org
> algorithms, it may be useful. Or maybe it is better to sync with the
> github repository. I'm not aware of how they differ yet, though.

That may be dangerous!

>
> --
> Arthur Zakirov
> Postgres Professional: http://www.postgrespro.com
> Russian Postgres Company
>



Re: pgsql: Allow UNIQUE indexes on partitioned tables

2018-02-19 Thread David G. Johnston
I found the following change to be confusing.

/doc/src/sgml/ref/alter_table.sgml
+   
+Additional restrictions apply when unique indexes are applied to
+partitioned tables; see .
+   

That paragraph appears in the section covering "ALTER TABLE name ADD
table_constraint_using_index"

However, the code says:

/src/backend/commands/tablecmds.c
+   /*
+    * Doing this on partitioned tables is not a simple feature to implement,
+    * so let's punt for now.
+    */
+   if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+       ereport(ERROR,
+               (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                errmsg("ALTER TABLE / ADD CONSTRAINT USING INDEX is not supported on partitioned tables")));

I was expecting the doc for ADD CONSTRAINT USING INDEX to note the
limitation explicitly - in lieu of the above paragraph.

Also, I cannot reason out what the following limitation means:

/doc/src/sgml/ref/create_table.sgml
+  If any partitions are in turn partitioned, all columns of each partition
+  key are considered at each level below the UNIQUE
+  constraint.
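
If I read that right, it seems to describe something like this (untested guess
on my part):

  CREATE TABLE p (a int, b int, c text) PARTITION BY RANGE (a);
  CREATE TABLE p1 PARTITION OF p
    FOR VALUES FROM (0) TO (10) PARTITION BY RANGE (b);
  CREATE UNIQUE INDEX ON p (a);     -- rejected: p1's partition key column b is not covered
  CREATE UNIQUE INDEX ON p (a, b);  -- accepted: covers the key columns of both levels

If that's the intent, an example like the above in the docs would help.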


As an aside, adding a link to "Data Definition/Table Partitioning" from at
least CREATE TABLE ... PARTITION BY, and swapping "PARTITION BY" and
"PARTITION OF" in the Parameters section of that page - one must create a
table partitioned BY something before one can create a partition OF it (and
the synopsis also lists them in the BY-before-OF order) - would be helpful.

David J.


On Mon, Feb 19, 2018 at 1:40 PM, Alvaro Herrera 
wrote:

> Allow UNIQUE indexes on partitioned tables
>
> If we restrict unique constraints on partitioned tables so that they
> must always include the partition key, then our standard approach to
> unique indexes already works --- each unique key is forced to exist
> within a single partition, so enforcing the unique restriction in each
> index individually is enough to have it enforced globally.  Therefore we
> can implement unique indexes on partitions by simply removing a few
> restrictions (and adding others.)
>
> Discussion: https://postgr.es/m/2017112921.hi6hg6pem2w2t36z@alvherre.pgsql
> Discussion: https://postgr.es/m/20171229230607.3iib6b62fn3uaf47@alvherre.pgsql
> Reviewed-by: Simon Riggs, Jesper Pedersen, Peter Eisentraut, Jaime
> Casanova, Amit Langote
>
> Branch
> --
> master
>
> Details
> ---
> https://git.postgresql.org/pg/commitdiff/eb7ed3f3063401496e4aa4bd68fa33f0be31a72f
>
> Modified Files
> --
> doc/src/sgml/ddl.sgml |   9 +-
> doc/src/sgml/ref/alter_table.sgml |  15 +-
> doc/src/sgml/ref/create_index.sgml|   5 +
> doc/src/sgml/ref/create_table.sgml|  18 +-
> src/backend/bootstrap/bootparse.y |   2 +
> src/backend/catalog/index.c   |  50 -
> src/backend/catalog/pg_constraint.c   |  76 +++
> src/backend/catalog/toasting.c|   4 +-
> src/backend/commands/indexcmds.c  | 125 +--
> src/backend/commands/tablecmds.c  |  71 ++-
> src/backend/parser/analyze.c  |   7 +
> src/backend/parser/parse_utilcmd.c|  31 +--
> src/backend/tcop/utility.c|   1 +
> src/bin/pg_dump/t/002_pg_dump.pl  |  65 ++
> src/include/catalog/index.h   |   5 +-
> src/include/catalog/pg_constraint_fn.h|   4 +-
> src/include/commands/defrem.h |   1 +
> src/include/parser/parse_utilcmd.h|   3 +-
> src/test/regress/expected/alter_table.out |   8 -
> src/test/regress/expected/create_index.out|   6 +
> src/test/regress/expected/create_table.out|  12 --
> src/test/regress/expected/indexing.out| 294 +-
> src/test/regress/expected/insert_conflict.out |   2 +-
> src/test/regress/sql/alter_table.sql  |   2 -
> src/test/regress/sql/create_index.sql |   6 +
> src/test/regress/sql/create_table.sql |   8 -
> src/test/regress/sql/indexing.sql | 172 ++-
> 27 files changed, 907 insertions(+), 95 deletions(-)
>
>


Re: [PROPOSAL] Nepali Snowball dictionary

2018-02-19 Thread Arthur Zakirov
Thank you for your answer!

2018-02-19 18:43 GMT+03:00 Tom Lane :
> We are not the upstream for the snowball stuff, and lack the expertise
> to decide whether proposed changes are any good.  To get anything
> changed there, you'd have to get it approved by the snowball group.
>
> As best I know, the original list
> http://lists.tartarus.org/mailman/listinfo/snowball-discuss
> is moribund, but there's a fork at
> http://snowballstem.org
> that has at least some activity.

From the original list it seems that http://snowballstem.org is
frozen. But development work continues at
https://github.com/snowballstem by other people.
I'll try to send them a pull request.

> Probably the first step ought to involve syncing our copy with the
> current state of that upstream, something that's not been done in a
> very long time :-(

I think I will try to sync snowball dictionaries with snowballstem.org
algorithms, it may be useful. Or maybe it is better to sync with the
github repository. I'm not aware of how they differ yet, though.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: unique indexes on partitioned tables

2018-02-19 Thread Alvaro Herrera
I pushed this now, with fixes for the last few remaining comments.

Peter Eisentraut wrote:

> I don't understand the variable name "third".  I don't see a "first" or
> "second" nearby.

Hah.  That was referring to variables "myself" and "referenced".  I
changed the variable name to "parentConstr".

> I find some of the columns in pg_constraint confusing.  For a primary
> key on a partitioned table, for the PK on the partition I get
> 
> conislocal = false, coninhcount = 1, connoinherit = true
> 
> The last part is confusing to me.

Yeah, I think it was patently wrong.  I changed it so that connoinherit
becomes true in this case.

Alvaro Herrera wrote:
> Jaime Casanova wrote:

> > also noted that if you:
> > 
> > """
> > create table t1(i int) partition by hash (i);
> > create table t1_0 partition of t1 for values with (modulus 2, remainder 0);
> > create table t1_1 partition of t1 for values with (modulus 2, remainder 1);
> > create unique index on t1(i);
> > alter table t1 add primary key using index t1_i_idx ;
> > """
> > 
> > the ALTER TABLE ADD PK does not recurse to partitions, which maybe is
> > perfectly fine because i'm using USING INDEX but it feels like an
> > oversight to me
> 
> Ouch.  Yeah, this is a bug.  I'll try to come up with something.

After looking at it for a few minutes I determined that adding this
feature requires some more work: you need to iterate on all partitions,
obtain the corresponding index, cons up a few fake parse nodes, then
recurse to create the PK in the children.  I think this should be doable
with a couple dozen lines of code, but it's a refinement that can be
added on top.  Care to submit a patch?  In the meantime, I added an
ereport(ERROR) to avoid leaving the system in an inconsistent state
(that probably would not even be reproduced correctly by pg_dump).

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: SHA-2 functions

2018-02-19 Thread Peter Eisentraut
On 2/19/18 09:06, Aleksander Alekseev wrote:
>> So I suggest these patches that expose the new functions sha224(),
>> sha256(), sha384(), sha512().  That allows us to make the SSL and SCRAM
>> tests more robust, and it will allow them to be used in general purpose
>> contexts over md5().
> 
> Nice patch. I wonder though whether tests should include hashing
> non-ASCII characters as well.

The input is of type bytea, so the concept of non-ASCII characters
doesn't quite apply.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: JIT compiling with LLVM v10.1

2018-02-19 Thread Jesper Pedersen

Hi,

On 02/14/2018 01:17 PM, Andres Freund wrote:

On 2018-02-07 06:54:05 -0800, Andres Freund wrote:

I've pushed v10.0. The big (and pretty painful to make) change is that
now all the LLVM specific code lives in src/backend/jit/llvm, which is
built as a shared library which is loaded on demand.



I thought

https://db.in.tum.de/~leis/papers/adaptiveexecution.pdf?lang=en

was relevant for this thread.

Best regards,
 Jesper



Re: unique indexes on partitioned tables

2018-02-19 Thread Alvaro Herrera
Jaime Casanova wrote:

> Hi Álvaro,
> 
> attached a tiny patch (on top of yours) that silences two "variables
> uninitialized" warnings.

Thanks!  Applied.

> also noted that if you:
> 
> """
> create table t1(i int) partition by hash (i);
> create table t1_0 partition of t1 for values with (modulus 2, remainder 0);
> create table t1_1 partition of t1 for values with (modulus 2, remainder 1);
> create unique index on t1(i);
> alter table t1 add primary key using index t1_i_idx ;
> """
> 
> the ALTER TABLE ADD PK does not recurse to partitions, which maybe is
> perfectly fine because i'm using USING INDEX but it feels like an
> oversight to me

Ouch.  Yeah, this is a bug.  I'll try to come up with something.

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: unique indexes on partitioned tables

2018-02-19 Thread Jaime Casanova
On 12 February 2018 at 15:26, Alvaro Herrera  wrote:
> Hello,
>
> Thanks, Peter, Jesper, Amit, for reviewing the patch.  Replying to
> all review comments at once:
>

[... v5 of patch attached ...]

Hi Álvaro,

attached a tiny patch (on top of yours) that silences two "variables
uninitialized" warnings.

also noted that if you:

"""
create table t1(i int) partition by hash (i);
create table t1_0 partition of t1 for values with (modulus 2, remainder 0);
create table t1_1 partition of t1 for values with (modulus 2, remainder 1);
create unique index on t1(i);
alter table t1 add primary key using index t1_i_idx ;
"""

the ALTER TABLE ADD PK does not recurse to partitions, which maybe is
perfectly fine because i'm using USING INDEX but it feels like an
oversight to me

-- 
Jaime Casanova  www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 19233b68cb..677e9cabf8 100644
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
***
*** 14177,14183  AttachPartitionEnsureIndexes(Relation rel, Relation attachrel)
  		 */
  		for (i = 0; i < list_length(attachRelIdxs); i++)
  		{
! 			Oid		cldConstrOid;
  
  			/* does this index have a parent?  if so, can't use it */
  			if (has_superclass(RelationGetRelid(attachrelIdxRels[i])))
--- 14177,14183 
  		 */
  		for (i = 0; i < list_length(attachRelIdxs); i++)
  		{
! 			Oid		cldConstrOid = InvalidOid;
  
  			/* does this index have a parent?  if so, can't use it */
  			if (has_superclass(RelationGetRelid(attachrelIdxRels[i])))
***
*** 14475,14481  ATExecAttachPartitionIdx(List **wqueue, Relation parentIdx, RangeVar *name)
  		int			i;
  		PartitionDesc partDesc;
  		Oid			constraintOid,
! 	cldConstrId;
  
  		/*
  		 * If this partition already has an index attached, refuse the operation.
--- 14475,14481 
  		int			i;
  		PartitionDesc partDesc;
  		Oid			constraintOid,
! 	cldConstrId = InvalidOid;
  
  		/*
  		 * If this partition already has an index attached, refuse the operation.


Re: pgbench - allow to specify scale as a size

2018-02-19 Thread Fabien COELHO


Hello Mark,


What if we consider using ascii (utf8?) text file sizes as a reference
point, something independent from the database?


Why not.

TPC-B basically specifies that rows (accounts, tellers, branches) are all
padded to 100 bytes, thus we could consider (i.e. document) that
--scale=SIZE refers to the amount of useful data held, and warn that
actual storage would add various overheads for page and row headers, free
space at the end of pages, indexes...


Then one scale step is 100,000 accounts, 10 tellers and 1 branch, i.e.
100,011 * 100 ~ 9.5 MiB of useful data per scale step.


I realize even if a flat file size can be used as a more consistent
reference across platforms, it doesn't help with accurately determining
the final data file sizes due to any architecture specific nuances or
changes in the backend.  But perhaps it might still offer a little more
meaning to be able to say "50 gigabytes of bank account data" rather
than "10 million rows of bank accounts" when thinking about size over
cardinality.


Yep.

Now the overhead is really 60-65%. Although the specification is
unambiguous, we still need some maths to know whether it fits in
buffers or memory... The point of Kaarel's regression is to take this into
account.
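
E.g. after "pgbench -i -s 100" the documented useful data would be about
100 * 9.5 MiB ~ 950 MiB, and the actual footprint can be compared with
something like (sketch):

  SELECT pg_size_pretty(sum(pg_total_relation_size(oid::regclass)))
    FROM pg_class
   WHERE relname IN ('pgbench_accounts', 'pgbench_branches',
                     'pgbench_tellers', 'pgbench_history');

pg_total_relation_size() includes indexes and toast, which is where much of
the 60-65% overhead shows up, the rest being page and row headers and free
space.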


Also, whether this option would be more admissible to Tom is still an open 
question. Tom?


--
Fabien.



Re: extern keyword incorrectly used in some function definitions

2018-02-19 Thread Tom Lane
David Rowley  writes:
> While poking around partition.c I noticed that one of the functions
> there is *defined* as "extern". Normally we'd only do this in the
> declaration of the function. I don't really see why it's needed in the
> definition.
> Anyway, I removed it. I then thought I'd better look around for any
> others too...
> A patch is attached which removes them all, apart from the snowball
> ones. I didn't touch those.

Agreed.  The snowball stuff is just imported from elsewhere, so we're
not going to try to make it conform to project style.  But the rest
of these look good; pushed.

regards, tom lane



Re: pgbench - allow to specify scale as a size

2018-02-19 Thread Mark Wong
On Sat, Feb 17, 2018 at 12:22:37PM -0500, Alvaro Hernandez wrote:
> 
> 
> On 17/02/18 12:17, Tom Lane wrote:
> > Alvaro Hernandez  writes:
> >> On 17/02/18 11:26, Tom Lane wrote:
> >>> Fabien COELHO  writes:
>  Here is a attempt at extending --scale so that it can be given a size.
> >>> I do not actually find this to be a good idea.  It's going to be
> >>> platform-dependent, or not very accurate, or both, and thereby
> >>> contribute to confusion by making results less reproducible.
> >>>
> >>> Plus, what do we do if the backend changes table representation in
> >>> some way that invalidates Kaarel's formula altogether?  More confusion
> >>> would be inevitable.
> >>       Why not then insert a "few" rows, measure size, truncate the table,
> >> compute the formula and then insert to the desired user requested size?
> >> (or insert what should be the minimum, scale 1, measure, and extrapolate
> >> what's missing). It doesn't sound too complicated to me, and targeting a
> >> size is something that I believe it's quite good for user.
> > Then you'd *really* have irreproducible results.
> >
> > regards, tom lane
> 
>      You also have irreproducible results today, according to your 
> criteria. Either you agree on the number of rows but may not agree on 
> the size (today), or you agree on the size but may not agree on the 
> number of rows. Right now you can only pick the former, while I think 
> people would significantly appreciate the latter. If neither is correct, 
> let's at least provide the choice.

What if we consider using ascii (utf8?) text file sizes as a reference
point, something independent from the database?

I realize even if a flat file size can be used as a more consistent
reference across platforms, it doesn't help with accurately determining
the final data file sizes due to any architecture specific nuances or
changes in the backend.  But perhaps it might still offer a little more
meaning to be able to say "50 gigabytes of bank account data" rather
than "10 million rows of bank accounts" when thinking about size over
cardinality.

Regards,
Mark

-- 
Mark Wong  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: autovacuum: change priority of the vacuumed tables

2018-02-19 Thread Tomas Vondra


On 02/19/2018 03:38 PM, Ildus Kurbangaliev wrote:
> On Fri, 16 Feb 2018 21:48:14 +0900
> Masahiko Sawada  wrote:
> 
>> On Fri, Feb 16, 2018 at 7:50 PM, Ildus Kurbangaliev
>>  wrote:
>>> On Fri, 16 Feb 2018 17:42:34 +0900
>>> Masahiko Sawada  wrote:
>>>  
 On Thu, Feb 15, 2018 at 10:16 PM, Grigory Smolkin
  wrote:  
> On 02/15/2018 09:28 AM, Masahiko Sawada wrote:
>  
>> Hi,
>>
>> On Thu, Feb 8, 2018 at 11:01 PM, Ildus Kurbangaliev
>>  wrote:  
>>>
>>> Hi,
>>>
>>> Attached patch adds 'autovacuum_table_priority' to the current
>>> list of automatic vacuuming settings. It's used in sorting of
>>> vacuumed tables in autovacuum worker before actual vacuum.
>>>
>>> The idea is to give possibility to the users to prioritize
>>> their tables in autovacuum process.
>>>  
>> Hmm, I couldn't understand the benefit of this patch. Would you
>> elaborate it a little more?
>>
>> Multiple autovacuum worker can work on one database. So even if
>> a table that you want to vacuum first is the back of the list
>> and there other worker would pick up it. If the vacuuming the
>> table gets delayed due to some big tables are in front of that
>> table I think you can deal with it by increasing the number of
>> autovacuum workers.
>>
>> Regards,
>>
>> --
>> Masahiko Sawada
>> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
>> NTT Open Source Software Center
>>  
>
> Database can contain thousands of tables and often
> updates/deletes concentrate mostly in only a handful of tables.
> Going through thousands of less bloated tables can take ages.
> Currently autovacuum know nothing about prioritizing it`s work
> with respect to user`s understanding of his data and
> application.  

 Understood. I have a question; please imagine the following case.

 Suppose that there are 1000 tables in a database, and one table of
 them (table-A) has the highest priority while other 999 tables have
 same priority. Almost tables (say 800 tables) including table-A
 need to get vacuumed at some point, so with your patch an AV
 worker listed 800 tables and table-A will be at the head of the
 list. Table-A will get vacuumed first but this AV worker has to
 vacuum other 799 tables even if table-A requires vacuum later
 again.

 If an another AV worker launches during table-A being vacuumed, the
 new AV worker would include table-A but would not process it
 because concurrent AV worker is processing it. So it would vacuum
 other tables instead. Similarly, this AV worker can not get the
 new table list until finish to vacuum all other tables. (Note that
 it might skip some tables if they are already vacuumed by other AV
 worker.) On the other hand, if another new AV worker launches
 after table-A got vacuumed and requires vacuuming again, the new
 AV worker puts the table-A at the head of list. It processes
 table-A first but, again, it has to vacuum other tables before
 getting new table list next time that might include table-A.

 Is this the expected behavior? I'd rather expect postgres to
 vacuum it before other lower priority tables whenever the table
 having the highest priority requires vacuuming, but it wouldn't.  
>>>
>>> Yes, this is the expected behavior. The patch is the way to give the
>>> user at least some control of the sorting, later it could be
>>> extended with something more sophisticated.
>>>  
>>
>> Since user doesn't know that each AV worker processes tables based on
>> its table list that is different from lists that other worker has, I
>> think it's hard for user to understand this parameter. I'd say that
>> user would expect that high priority table can get vacuumed any time.
> 
> Yes, very good point. It could be strange for the user in cases like
> that.
> 
>>
>> I think what you want to solve is to vacuum some tables preferentially
>> if there are many tables requiring vacuuming. Right? If so, I think
>> the prioritizing table only in the list would not solve the
>> fundamental issue. In the example, table-A will still need to wait for
>> other 799 tables to get vacuumed. Table-A will be bloating during
>> vacuuming other tables. To deal with it, I think we need something
>> queue on the shmem per database in order to control the order of
>> tables waiting for vacuuming and need to use it with a smart
>> algorithm. Thoughts?
> 
> Agree, it would require some shared queue for the autovacuum workers if
> we want to prioritize the table across all of them. I will look into
> this, and maybe will come up with something.
> 
> Masahiko, are you working on this too or just interested with the idea?
> 

Hello,

I have a hard time 

Re: NEXT VALUE FOR sequence

2018-02-19 Thread Tom Lane
Laurenz Albe  writes:
> The SQL standard has the expression "NEXT VALUE FOR asequence" to do
> what we traditionally do with "nextval('asequence')".
> This is an attempt to implement this on top of the recently introduced
> NextValueExpr node.

This has been proposed repeatedly, and rejected repeatedly, because in
fact the standard's semantics for NEXT VALUE FOR are *not* like nextval().
See SQL:2011 4.22.2 "Operations involving sequence generators":

If there are multiple instances of <next value expression>s specifying
the same sequence generator within a single SQL-statement, all those
instances return the same value for a given row processed by that
SQL-statement.

This is not terribly exact --- what is a "processed row" in a join query,
for instance?  But it's certainly not supposed to act like independent
executions of nextval() or NextValueExpr would.  Pending somebody doing
the legwork to produce something that at least arguably conforms to the
spec's semantics, we've left the syntax unimplemented.
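
To make the difference concrete (a sketch; the second statement is hypothetical
precisely because the syntax is unimplemented):

  CREATE SEQUENCE s;
  SELECT nextval('s'), nextval('s');             -- two calls, two values: 1, 2
  -- per the spec, both references within one "processed row" would have to
  -- yield the same value:
  -- SELECT NEXT VALUE FOR s, NEXT VALUE FOR s;  -- would need to return 1, 1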

regards, tom lane



Re: autovacuum: change priority of the vacuumed tables

2018-02-19 Thread Ildus Kurbangaliev
On Fri, 16 Feb 2018 21:48:14 +0900
Masahiko Sawada  wrote:

> On Fri, Feb 16, 2018 at 7:50 PM, Ildus Kurbangaliev
>  wrote:
> > On Fri, 16 Feb 2018 17:42:34 +0900
> > Masahiko Sawada  wrote:
> >  
> >> On Thu, Feb 15, 2018 at 10:16 PM, Grigory Smolkin
> >>  wrote:  
> >> > On 02/15/2018 09:28 AM, Masahiko Sawada wrote:
> >> >  
> >> >> Hi,
> >> >>
> >> >> On Thu, Feb 8, 2018 at 11:01 PM, Ildus Kurbangaliev
> >> >>  wrote:  
> >> >>>
> >> >>> Hi,
> >> >>>
> >> >>> Attached patch adds 'autovacuum_table_priority' to the current
> >> >>> list of automatic vacuuming settings. It's used in sorting of
> >> >>> vacuumed tables in autovacuum worker before actual vacuum.
> >> >>>
> >> >>> The idea is to give possibility to the users to prioritize
> >> >>> their tables in autovacuum process.
> >> >>>  
> >> >> Hmm, I couldn't understand the benefit of this patch. Would you
> >> >> elaborate it a little more?
> >> >>
> >> >> Multiple autovacuum worker can work on one database. So even if
> >> >> a table that you want to vacuum first is the back of the list
> >> >> and there other worker would pick up it. If the vacuuming the
> >> >> table gets delayed due to some big tables are in front of that
> >> >> table I think you can deal with it by increasing the number of
> >> >> autovacuum workers.
> >> >>
> >> >> Regards,
> >> >>
> >> >> --
> >> >> Masahiko Sawada
> >> >> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
> >> >> NTT Open Source Software Center
> >> >>  
> >> >
> >> > Database can contain thousands of tables and often
> >> > updates/deletes concentrate mostly in only a handful of tables.
> >> > Going through thousands of less bloated tables can take ages.
> >> > Currently autovacuum know nothing about prioritizing it`s work
> >> > with respect to user`s understanding of his data and
> >> > application.  
> >>
> >> Understood. I have a question; please imagine the following case.
> >>
> >> Suppose that there are 1000 tables in a database, and one table of
> >> them (table-A) has the highest priority while other 999 tables have
> >> same priority. Almost tables (say 800 tables) including table-A
> >> need to get vacuumed at some point, so with your patch an AV
> >> worker listed 800 tables and table-A will be at the head of the
> >> list. Table-A will get vacuumed first but this AV worker has to
> >> vacuum other 799 tables even if table-A requires vacuum later
> >> again.
> >>
> >> If an another AV worker launches during table-A being vacuumed, the
> >> new AV worker would include table-A but would not process it
> >> because concurrent AV worker is processing it. So it would vacuum
> >> other tables instead. Similarly, this AV worker can not get the
> >> new table list until finish to vacuum all other tables. (Note that
> >> it might skip some tables if they are already vacuumed by other AV
> >> worker.) On the other hand, if another new AV worker launches
> >> after table-A got vacuumed and requires vacuuming again, the new
> >> AV worker puts the table-A at the head of list. It processes
> >> table-A first but, again, it has to vacuum other tables before
> >> getting new table list next time that might include table-A.
> >>
> >> Is this the expected behavior? I'd rather expect postgres to
> >> vacuum it before other lower priority tables whenever the table
> >> having the highest priority requires vacuuming, but it wouldn't.  
> >
> > Yes, this is the expected behavior. The patch is the way to give the
> > user at least some control of the sorting, later it could be
> > extended with something more sophisticated.
> >  
> 
> Since user doesn't know that each AV worker processes tables based on
> its table list that is different from lists that other worker has, I
> think it's hard for user to understand this parameter. I'd say that
> user would expect that high priority table can get vacuumed any time.

Yes, very good point. It could be strange for the user in cases like
that.

> 
> I think what you want to solve is to vacuum some tables preferentially
> if there are many tables requiring vacuuming. Right? If so, I think
> the prioritizing table only in the list would not solve the
> fundamental issue. In the example, table-A will still need to wait for
> other 799 tables to get vacuumed. Table-A will be bloating during
> vacuuming other tables. To deal with it, I think we need something
> queue on the shmem per database in order to control the order of
> tables waiting for vacuuming and need to use it with a smart
> algorithm. Thoughts?

Agree, it would require some shared queue for the autovacuum workers if
we want to prioritize the table across all of them. I will look into
this, and maybe will come up with something.

Masahiko, are you working on this too, or just interested in the idea?
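
(For anyone trying the patch: presumably the knob is set as a per-table storage
parameter, along the lines of

  alter table hot_table set (autovacuum_table_priority = 100);

where a higher-priority table merely sorts earlier in a worker's private list,
as discussed above.)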

-- 
---
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian 

Re: CURRENT OF causes an error when IndexOnlyScan is used

2018-02-19 Thread Anastasia Lubennikova

01.02.2018 05:12, Tom Lane:

Yugo Nagata  writes:

I'm sorry the patch attached in the previous mail is broken and
not raises a compile error. I attached the fixed patch.

This patch is almost certainly wrong: you can't assume that the scan-level
state matches the tuple we are currently processing at top level.  Any
sort of delaying action, for instance a sort or materialize node in
between, would break it.

We need to either fix this aspect:


IndexOnlyScan returns a virtual tuple that doesn't have system
column, so we can not get ctid in the same way of other plans.


I'd like to propose a patch that fixes the issue.
We already have a way to return a heap tuple from an IndexOnlyScan,
but it was not applied to b-tree for some reason.

The attached patch solves the reported bug.
Moreover, it will come in handy for the "index with included attributes"
feature [1], where we can store long (and even TOASTed) attributes in the
index tuple.

[1] https://commitfest.postgresql.org/17/1350/
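
For reference, the kind of case that trips the reported error looks roughly
like this (a sketch; the table and settings are only there to force an
index-only scan plan for the cursor):

  CREATE TABLE t (a int PRIMARY KEY);
  INSERT INTO t SELECT generate_series(1, 1000);
  VACUUM t;                                -- make the pages all-visible
  SET enable_seqscan = off;
  BEGIN;
  DECLARE c CURSOR FOR SELECT a FROM t WHERE a > 100;
  FETCH NEXT FROM c;
  UPDATE t SET a = a WHERE CURRENT OF c;   -- fails without the fix
  COMMIT;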

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 8158508..db8a55c 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -374,6 +374,8 @@ btbeginscan(Relation rel, int nkeys, int norderbys)
 	so->currTuples = so->markTuples = NULL;
 
 	scan->xs_itupdesc = RelationGetDescr(rel);
+	scan->xs_hitupdesc = NULL;
+	scan->xs_hitup = NULL;
 
 	scan->opaque = so;
 
diff --git a/src/backend/access/nbtree/nbtsearch.c b/src/backend/access/nbtree/nbtsearch.c
index 51dca64..dd3e8b2 100644
--- a/src/backend/access/nbtree/nbtsearch.c
+++ b/src/backend/access/nbtree/nbtsearch.c
@@ -25,6 +25,7 @@
 #include "utils/tqual.h"
 
 
+static HeapTuple _bt_fetch_tuple(IndexScanDesc scandesc);
 static bool _bt_readpage(IndexScanDesc scan, ScanDirection dir,
 			 OffsetNumber offnum);
 static void _bt_saveitem(BTScanOpaque so, int itemIndex,
@@ -38,6 +39,31 @@ static bool _bt_endpoint(IndexScanDesc scan, ScanDirection dir);
 static void _bt_drop_lock_and_maybe_pin(IndexScanDesc scan, BTScanPos sp);
 static inline void _bt_initialize_more_data(BTScanOpaque so, ScanDirection dir);
 
+/*
+ * Fetch all keys in tuple.
+ * Returns a new HeapTuple containing the originally-indexed data.
+ */
+static HeapTuple
+_bt_fetch_tuple(IndexScanDesc scandesc)
+{
+	Relation index = scandesc->indexRelation;
+	Datum		fetchatt[INDEX_MAX_KEYS];
+	bool		isnull[INDEX_MAX_KEYS];
+	int			i;
+	HeapTuple htuple;
+
+	for (i = 0; i < index->rd_att->natts; i++)
+	{
+		fetchatt[i] = index_getattr(scandesc->xs_itup, i + 1,
+	scandesc->xs_itupdesc, &isnull[i]);
+	}
+
+	htuple = heap_form_tuple(scandesc->xs_hitupdesc, fetchatt, isnull);
+	htuple->t_tableOid = scandesc->heapRelation->rd_id;
+
+
+	return htuple;
+}
 
 /*
  *	_bt_drop_lock_and_maybe_pin()
@@ -1105,8 +1131,32 @@ readcomplete:
 	/* OK, itemIndex says what to return */
	currItem = &so->currPos.items[so->currPos.itemIndex];
 	scan->xs_ctup.t_self = currItem->heapTid;
+
 	if (scan->xs_want_itup)
+	{
+		if (!scan->xs_hitupdesc)
+		{
+			int natts = RelationGetNumberOfAttributes(scan->indexRelation);
+			int attno;
+			/*
+			* The storage type of the index can be different from the original
+			* datatype being indexed, so we cannot just grab the index's tuple
+			* descriptor. Instead, construct a descriptor with the original data
+			* types.
+			*/
+			scan->xs_hitupdesc = CreateTemplateTupleDesc(natts, false);
+			for (attno = 1; attno <= natts; attno++)
+			{
+TupleDescInitEntry(scan->xs_hitupdesc, attno, NULL,
+scan->indexRelation->rd_opcintype[attno - 1],
+-1, 0);
+			}
+		}
+
 		scan->xs_itup = (IndexTuple) (so->currTuples + currItem->tupleOffset);
+		scan->xs_hitup = _bt_fetch_tuple(scan);
+		scan->xs_hitup->t_self = currItem->heapTid;
+	}
 
 	return true;
 }
@@ -1155,8 +1205,34 @@ _bt_next(IndexScanDesc scan, ScanDirection dir)
 	/* OK, itemIndex says what to return */
	currItem = &so->currPos.items[so->currPos.itemIndex];
 	scan->xs_ctup.t_self = currItem->heapTid;
+
 	if (scan->xs_want_itup)
+	{
+		if (!scan->xs_hitupdesc)
+		{
+			int natts = RelationGetNumberOfAttributes(scan->indexRelation);
+			int attno;
+			/*
+			* The storage type of the index can be different from the original
+			* datatype being indexed, so we cannot just grab the index's tuple
+			* descriptor. Instead, construct a descriptor with the original data
+			* types.
+			*/
+			scan->xs_hitupdesc = CreateTemplateTupleDesc(natts, false);
+			for (attno = 1; attno <= natts; attno++)
+			{
+TupleDescInitEntry(scan->xs_hitupdesc, attno, NULL,
+scan->indexRelation->rd_opcintype[attno - 1],
+-1, 0);
+			}
+		}
+
 		scan->xs_itup = (IndexTuple) (so->currTuples + currItem->tupleOffset);
+		if (scan->xs_hitup)
+			pfree(scan->xs_hitup);
+		scan->xs_hitup = _bt_fetch_tuple(scan);
+		scan->xs_hitup->t_self = currItem->heapTid;
+	}
 
 	

Re: SHA-2 functions

2018-02-19 Thread Joe Conway
On 02/19/2018 08:43 AM, Peter Eisentraut wrote:
> I also noticed while working on some SSL code that we have perfectly
> good SHA-2 functionality in the server already, but it has no test
> coverage outside the SCRAM tests.
> 
> So I suggest these patches that expose the new functions sha224(),
> sha256(), sha384(), sha512().  That allows us to make the SSL and SCRAM
> tests more robust, and it will allow them to be used in general purpose
> contexts over md5().

I didn't look closely at the patch itself, but +1 on the concept.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





[PROPOSAL] Nepali Snowball dictionary

2018-02-19 Thread Arthur Zakirov
Hello hackers,

I would like to propose a Nepali snowball dictionary patch.

Nepali is an inflectional and derivational language, and it can be stemmed.

initdb is also patched, so that it can determine the default text search
configuration.

Examples:

=# select ts_lexize('nepali_stem', 'लेख्');
 ts_lexize 
---
 {लेख्}

=# select ts_lexize('nepali_stem', 'लेखछेस्');
 ts_lexize 
---
 {लेख}

=# select ts_lexize('nepali_stem', 'लेखे');
 ts_lexize 
---
 {लेखे}
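
Just to illustrate how the dictionary could be used (a sketch only, not
part of the patch; the configuration name below is purely illustrative),
one could map word tokens to the new stemmer like this:

=# CREATE TEXT SEARCH CONFIGURATION nepali_demo (COPY = simple);
=# ALTER TEXT SEARCH CONFIGURATION nepali_demo
     ALTER MAPPING FOR asciiword, word WITH nepali_stem;
=# SELECT to_tsvector('nepali_demo', 'लेखछेस्');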

Authors:
- Oleg Bartunov
- Nepali NLP Group

Is it appropriate to add new snowball dictionaries? I'm not sure about the
policy for including them.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
diff --git a/doc/src/sgml/textsearch.sgml b/doc/src/sgml/textsearch.sgml
index 610b7bf033..c9c7de52ad 100644
--- a/doc/src/sgml/textsearch.sgml
+++ b/doc/src/sgml/textsearch.sgml
@@ -3723,6 +3723,7 @@ Parser: "pg_catalog.default"
  pg_catalog | german_stem | snowball stemmer for german language
  pg_catalog | hungarian_stem  | snowball stemmer for hungarian language
  pg_catalog | italian_stem| snowball stemmer for italian language
+ pg_catalog | nepali_stem | snowball stemmer for nepali language
  pg_catalog | norwegian_stem  | snowball stemmer for norwegian language
  pg_catalog | portuguese_stem | snowball stemmer for portuguese language
  pg_catalog | romanian_stem   | snowball stemmer for romanian language
diff --git a/src/backend/snowball/Makefile b/src/backend/snowball/Makefile
index 50cbace41d..c29f4184f2 100644
--- a/src/backend/snowball/Makefile
+++ b/src/backend/snowball/Makefile
@@ -40,6 +40,7 @@ OBJS= $(WIN32RES) dict_snowball.o api.o utilities.o \
stem_UTF_8_german.o \
stem_UTF_8_hungarian.o \
stem_UTF_8_italian.o \
+   stem_UTF_8_nepali.o \
stem_UTF_8_norwegian.o \
stem_UTF_8_porter.o \
stem_UTF_8_portuguese.o \
@@ -62,6 +63,7 @@ LANGUAGES=  \
german  german  \
hungarian   hungarian   \
italian italian \
+   nepali  nepali  \
norwegian   norwegian   \
portuguese  portuguese  \
romanianromanian\
diff --git a/src/backend/snowball/dict_snowball.c 
b/src/backend/snowball/dict_snowball.c
index 78c9f73ef0..d96c849118 100644
--- a/src/backend/snowball/dict_snowball.c
+++ b/src/backend/snowball/dict_snowball.c
@@ -49,6 +49,7 @@
 #include "snowball/libstemmer/stem_UTF_8_german.h"
 #include "snowball/libstemmer/stem_UTF_8_hungarian.h"
 #include "snowball/libstemmer/stem_UTF_8_italian.h"
+#include "snowball/libstemmer/stem_UTF_8_nepali.h"
 #include "snowball/libstemmer/stem_UTF_8_norwegian.h"
 #include "snowball/libstemmer/stem_UTF_8_porter.h"
 #include "snowball/libstemmer/stem_UTF_8_portuguese.h"
@@ -102,6 +103,7 @@ static const stemmer_module stemmer_modules[] =
{"german", PG_UTF8, german_UTF_8_create_env, german_UTF_8_close_env, 
german_UTF_8_stem},
{"hungarian", PG_UTF8, hungarian_UTF_8_create_env, 
hungarian_UTF_8_close_env, hungarian_UTF_8_stem},
{"italian", PG_UTF8, italian_UTF_8_create_env, italian_UTF_8_close_env, 
italian_UTF_8_stem},
+   {"nepali", PG_UTF8, nepali_UTF_8_create_env, nepali_UTF_8_close_env, 
nepali_UTF_8_stem},
{"norwegian", PG_UTF8, norwegian_UTF_8_create_env, 
norwegian_UTF_8_close_env, norwegian_UTF_8_stem},
{"porter", PG_UTF8, porter_UTF_8_create_env, porter_UTF_8_close_env, 
porter_UTF_8_stem},
{"portuguese", PG_UTF8, portuguese_UTF_8_create_env, 
portuguese_UTF_8_close_env, portuguese_UTF_8_stem},
diff --git a/src/backend/snowball/libstemmer/stem_UTF_8_nepali.c 
b/src/backend/snowball/libstemmer/stem_UTF_8_nepali.c
new file mode 100644
index 00..f4f6e656ad
--- /dev/null
+++ b/src/backend/snowball/libstemmer/stem_UTF_8_nepali.c
@@ -0,0 +1,440 @@
+/* This file was generated automatically by the Snowball to ISO C compiler */
+/* http://snowballstem.org/ */
+
+#include "header.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+extern int nepali_UTF_8_stem(struct SN_env * z);
+#ifdef __cplusplus
+}
+#endif
+static int r_remove_category_2(struct SN_env * z);
+static int r_remove_category_3(struct SN_env * z);
+static int r_check_category_2(struct SN_env * z);
+static int r_remove_category_1(struct SN_env * z);
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+
+extern struct SN_env * nepali_UTF_8_create_env(void);
+extern void nepali_UTF_8_close_env(struct SN_env * z);
+
+
+#ifdef __cplusplus
+}
+#endif
+static const symbol s_0_0[6] = { 0xE0, 0xA4, 0x95, 0xE0, 0xA5, 0x80 };
+static const symbol s_0_1[9] = { 0xE0, 0xA4, 0xB2, 0xE0, 0xA4, 0xBE, 0xE0, 
0xA4, 0x87 };
+static const symbol s_0_2[6] = { 0xE0, 0xA4, 0xB2, 0xE0, 0xA5, 0x87 };
+static const symbol s_0_3[9] = { 0xE0, 0xA4, 0xB2, 0xE0, 0xA4, 0xBE, 0xE0, 
0xA4, 0x88 };
+static const symbol s_0_4[6] = { 0xE0, 0xA4, 0x95, 0xE0, 0xA5, 0x88 };
+static const 

Re: SHA-2 functions

2018-02-19 Thread Aleksander Alekseev
Hello Peter,

> So I suggest these patches that expose the new functions sha224(),
> sha256(), sha384(), sha512().  That allows us to make the SSL and SCRAM
> tests more robust, and it will allow them to be used in general purpose
> contexts over md5().

Nice patch. I wonder though whether tests should include hashing
non-ASCII characters as well.
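
For instance, something along these lines (just a sketch, hashing raw
UTF-8 bytes so the result does not depend on the server encoding; I have
not computed the expected output here):

SELECT encode(sha256('\xc3a9'::bytea), 'hex');  -- U+00E9 in UTF-8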

-- 
Best regards,
Aleksander Alekseev




SHA-2 functions

2018-02-19 Thread Peter Eisentraut
There was a complaint recently about the documentation using the widely
frowned-upon md5() function in an unrelated context as an example hash
function.  This is quite common in many examples, such as hashing row
values to compare them, or hashing datums if they don't fit into an
index.  But there is nothing we can easily replace md5 with, without
going to things like pgcrypto.

I also noticed while working on some SSL code that we have perfectly
good SHA-2 functionality in the server already, but it has no test
coverage outside the SCRAM tests.

So I suggest these patches that expose the new functions sha224(),
sha256(), sha384(), sha512().  That allows us to make the SSL and SCRAM
tests more robust, and it will allow them to be used in general purpose
contexts over md5().
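
For illustration, usage looks like this (the digest value is the one shown
in the documentation examples added by the patch):

SELECT sha256('abc');
 \xba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad

SELECT encode(sha256('abc'), 'hex');
 ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad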

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 52edfab8f1175c69ba791139fde26feb9279364a Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 6 Feb 2018 21:46:46 -0500
Subject: [PATCH 1/2] Add user-callable SHA-2 functions

Add the user-callable functions sha224, sha256, sha384, sha512.  We
already had these in the C code to support SCRAM, but there was no test
coverage outside of the SCRAM tests.  Adding these as user-callable
functions allows writing some tests.  Also, we have a user-callable md5
function but no more modern alternative, which led to wide use of md5 as
a general-purpose hash function, which leads to occasional complaints
about using md5.

Also mark the existing md5 functions as leak-proof.
---
 doc/src/sgml/func.sgml   |  70 -
 src/backend/utils/adt/varlena.c  | 101 +++
 src/include/catalog/pg_proc.h|  12 +++-
 src/test/regress/expected/opr_sanity.out |   6 ++
 src/test/regress/expected/strings.out|  53 
 src/test/regress/sql/strings.sql |  18 ++
 6 files changed, 257 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 1e535cf215..bab828d507 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3640,7 +3640,7 @@ Other Binary String Functions
returning the result in hexadecimal
   
   md5(E'Th\\000omas'::bytea)
-  8ab2d3c9689aaf18 b4958c334c82d8b1
+  
8ab2d3c9689aaf18b4958c334c82d8b1
  
 
   
@@ -3674,6 +3674,66 @@ Other Binary String Functions
set_byte(E'Th\\000omas'::bytea, 4, 64)
Th\000o@as
   
+
+  
+   
+
+ sha224
+
+sha224(bytea)
+   
+   bytea
+   
+SHA-224 hash
+   
+   sha224('abc')
+   
\x23097d223405d8228642a477bda255b32aadbce4bda0b3f7e36c9da7
+  
+
+  
+   
+
+ sha256
+
+sha256(bytea)
+   
+   bytea
+   
+SHA-256 hash
+   
+   sha256('abc')
+   
\xba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
+  
+
+  
+   
+
+ sha384
+
+sha384(bytea)
+   
+   bytea
+   
+SHA-384 hash
+   
+   sha384('abc')
+   
\xcb00753f45a35e8bb5a03d699ac65007272c32ab0eded1631a8b605a43ff5bed8086072ba1e7cc2358baeca134c825a7
+  
+
+  
+   
+
+ sha512
+
+sha512(bytea)
+   
+   bytea
+   
+SHA-512 hash
+   
+   sha512('abc')
+   
\xddaf35a193617abacc417349ae20413112e6fa4e89a97ea20a964b55d39a2192992a274fc1a836ba3c23a3feebbd454d4423643ce80e2a9ac94fa54ca49f
+  
 

   
@@ -3686,6 +3746,14 @@ Other Binary String Functions
the first byte, and bit 15 is the most significant bit of the second byte.
   
 
+  
+   Note that for historic reasons, the function md5
+   returns a hex-encoded value of type text whereas the SHA-2
+   functions return type bytea.  Use the functions
+   encode and decode to convert
+   between the two.
+  
+
   
See also the aggregate function string_agg in
 and the large object functions
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 304cb26691..d1266f26f5 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -23,6 +23,7 @@
 #include "catalog/pg_type.h"
 #include "common/int.h"
 #include "common/md5.h"
+#include "common/sha2.h"
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
 #include "miscadmin.h"
@@ -4611,6 +4612,106 @@ md5_bytea(PG_FUNCTION_ARGS)
PG_RETURN_TEXT_P(cstring_to_text(hexsum));
 }
 
+/*
+ * SHA-2 variants
+ */
+
+Datum
+sha224_bytea(PG_FUNCTION_ARGS)
+{
+   bytea  *in = PG_GETARG_BYTEA_PP(0);
+   const uint8 *data;
+   size_t  len;
+   pg_sha224_ctx ctx;
+   unsigned char buf[PG_SHA224_DIGEST_LENGTH];
+   bytea  *result;
+
+   len = VARSIZE_ANY_EXHDR(in);
+   data = (unsigned char *) VARDATA_ANY(in);

Re: [HACKERS] path toward faster partition pruning

2018-02-19 Thread David Rowley
On 19 February 2018 at 22:19, Amit Langote
 wrote:
> Attached updated patches.  Thanks again!

Thanks for making those changes.  I've made another pass of v28 and
have a few more comments.

The patch is starting to look good, but there are some new changes in
recent versions which still don't look quite right.

1. This does not fully make sense:

/*
 * Remove the indexes of partitions excluded due to each of
 * those partitions' *all* of allowed datums appearing in
 * keys->ne_datums, that is compared to the partition key
 * using <> operator.
 */

 "each of those partitions' *all* of allowed" is not correct.

Maybe better to write:

/*
 * Remove the indexes of any partitions which cannot possibly
 * contain rows matching the clauses due to key->ne_datums containing
 * all datum values which are allowed in the given partition.  This
 * is only possible to do in LIST partitioning as it's the only
 * partitioning strategy which allows the specification of exact values.
 */
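
(Just to illustrate the case being described, not something for the
comment itself; the table and partition names are of course only
illustrative:

CREATE TABLE lp (x int) PARTITION BY LIST (x);
CREATE TABLE lp12 PARTITION OF lp FOR VALUES IN (1, 2);
CREATE TABLE lp34 PARTITION OF lp FOR VALUES IN (3, 4);

-- lp12 can be pruned here, since x <> 1 AND x <> 2 together exclude
-- every value it is allowed to contain.
SELECT * FROM lp WHERE x <> 1 AND x <> 2;
)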

2. Mine does not, but some compilers may complain about the result
variable in get_partitions_for_keys() being uninitialised. Probably the
easiest fix would be to just assign it to NULL in the default case.

3. Did you mean to put this Assert() inside the loop?

memset(keyisnull, false, sizeof(keyisnull));
i = -1;
while ((i = bms_next_member(keys->keyisnull, i)) >= 0)
{
keys->n_eqkeys++;
keyisnull[i] = true;
}
Assert(i < partnatts);

i will always be -2 at the end of the loop. Seems like a useless
Assert in its current location.

4. Can you add a comment here to say: "Note: LIST partitioning only
supports a single partition key, therefore this function requires no
looping over the partition keys"

/*
 * get_partitions_for_keys_list
 * Return partitions of a list partitioned table for requested keys
 *
 * This interprets the keys and looks up partitions in the partition bound
 * descriptor using the list partitioning semantics.
 */

5. The following comment contains a bit of duplication with the comment
which comes after it. Maybe the following:

/*
* If the query is looking for null keys, there can only be one such
* partition.  Return the same if one exists.
*/

can be changed to:

/* Handle clauses requesting a NULL valued partition key */

6. The following comment does not quite make sense:

/* Exactly matching datum exists. */

Probably better to write:

/* An exact matching datum exists. */

7. "If it's not equal (<)" I think you mean (>), not (<), in:

 * The bound at minoff is <= minkeys, given the way
 * partition_list_bsearch() works.  If it's not equal (<), then
 * increment minoff to make it point to the datum on the right
 * that necessarily satisfies minkeys.  Also do the same if it is
 * equal but minkeys is exclusive.

However, the comment is a bit clumsy. Maybe the following is better?

/*
 * partition_list_bsearch returning a positive number means that
 * minkeys[0] must be greater than or equal to the smallest datum.
 * If we didn't find an exact matching datum (!is_equal) or if the
 * operator used was non-inclusive (>), then in both of these
 * cases we're not interested in the datum pointed to by minoff,
 * but we may start getting matches in the partition which the
 * next datum belongs to, so point to that one instead.  (This may
 * be beyond the last datum in the array, but we'll detect that
 * later.)
 */

8. The following comment could be improved:

* minkeys is greater than the datums of all non-default partitions,
* meaning there isn't one to return.  Return the default partition if
* one exists.

how about:

* The value of minkeys[0] is greater than all of the datums we have
* partitions for.  The only possible partition that could contain a
* match is the default partition.  Return that, if it exists.

9. The following could also be improved:

* The bound at maxoff is <= maxkeys, given the way
* partition_list_bsearch works.  If the bound at maxoff exactly
* matches maxkey (is_equal), but the maxkey is exclusive, then
* decrement maxoff to point to the bound on the left.

how about:

* partition_list_bsearch returning a positive number means that
* maxkeys[0] must be greater than or equal to the smallest datum.
* If the match found is an equal match, but the operator used is
* non-inclusive of that value (<), then the partition belonging
* to maxoff cannot match, so we'll decrement maxoff to point to
* the partition belonging to the previous datum.  We might end up
* decrementing maxoff down to -1, but we'll handle that later.

10. Can you append "  This may not technically be true for some data
types (e.g. integer types), however, we currently lack any sort of
infrastructure to provide us with proofs that would allow us to do
anything smarter here." to:

* For range queries, always include the default list partition,
* because list partitions divide the key space in a discontinuous
* manner, not all values in the given range will have a partition
* assigned.

11. 

Re: NEXT VALUE FOR sequence

2018-02-19 Thread Daniel Verite
Laurenz Albe wrote:

> The SQL standard has the expression "NEXT VALUE FOR asequence" to do
> what we traditionally do with "nextval('asequence')".

The behavior mandated by the standard is that several invocations
of NEXT VALUE on the same sequence on the same output row
must produce the same value. That is:

CREATE SEQUENCE s;
SELECT NEXT VALUE FOR s, NEXT VALUE FOR s
 UNION
SELECT NEXT VALUE FOR s, NEXT VALUE FOR s

should produce
(1,1)
(2,2)

It makes sense that the value does not depend on which output column the
expression appears in.

The trouble of course is that the equivalent with
nextval() would produce instead
(1,2)
(3,4)
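
(One can emulate the per-row behaviour today by forcing a single
nextval() call per row, e.g.

SELECT n, n
  FROM (SELECT nextval('s') AS n FROM generate_series(1, 2) OFFSET 0) sub;

which yields (1,1) and (2,2); but that is of course not what a naive
rewrite of NEXT VALUE FOR into nextval() would give you.)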


There have been previous discussions on the standard syntax, concluding
that when it gets into Postgres, it should come with the
standard-conforming semantics.
I guess that would be a much more difficult patch.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Tracking object modification time using event triggers

2018-02-19 Thread Alexander Levsha
Hi all.
I'm the lead developer of pgCodeKeeper, a tool for PostgreSQL database
schema comparison.

In our tool we have a pg_dump-like schema reader for comparing against live
DB instances.
This reader consumes the majority of the time a comparison operation takes,
and we had an idea to speed it up.
To do this we need to be able to track the last modification time of every
DB object, and an extension with event triggers seems like a suitable tool
for this.
The extension we've implemented is available, for anyone interested:
https://github.com/pgcodekeeper/pg_dbo_timestamp/

However, we've discovered that event triggers provide almost no data for
GRANT/REVOKE commands; in particular, there's no way to find out which
objects were altered by these commands.
pg_event_trigger_ddl_commands() does provide a pg_ddl_command value which
seems to contain the list of objects for GRANT, but it appears to be
completely inaccessible from plpgsql.
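
For context, the core of the extension boils down to something like this
(a simplified sketch with illustrative names, not the actual extension
code):

CREATE TABLE dbo_timestamps (
    classid       oid,
    objid         oid,
    objsubid      integer,
    last_modified timestamptz NOT NULL DEFAULT now(),
    PRIMARY KEY (classid, objid, objsubid)
);

CREATE FUNCTION track_ddl() RETURNS event_trigger
LANGUAGE plpgsql AS $$
DECLARE
    r record;
BEGIN
    -- remember when each object touched by the command was last modified
    FOR r IN SELECT classid, objid, objsubid
               FROM pg_event_trigger_ddl_commands() LOOP
        INSERT INTO dbo_timestamps (classid, objid, objsubid)
        VALUES (r.classid, r.objid, r.objsubid)
        ON CONFLICT (classid, objid, objsubid)
        DO UPDATE SET last_modified = now();
    END LOOP;
END;
$$;

CREATE EVENT TRIGGER track_ddl_trigger ON ddl_command_end
    EXECUTE PROCEDURE track_ddl();

This is the part that falls short for GRANT/REVOKE, as described above.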

This leads to my question: do we need to dive into writing a C function for
our extension to access pg_ddl_command or some other lower-level
representation? Or can we use something else to solve our task, maybe
avoiding event triggers entirely?

Thanks.
Alexander Levsha


Re: [doc fix] Correct calculation of vm.nr_hugepages

2018-02-19 Thread Justin Pryzby
On Mon, Feb 19, 2018 at 07:05:47AM +, Tsunakawa, Takayuki wrote:
> The current PostgreSQL documentation overestimates the number of huge pages
> (vm.nr_hugepages) because the calculation uses the maximum virtual address
> space.  In practice, huge pages are only used for the anonymous shared memory
> segment.  The attached patch fixes the documentation.

+ pmap 4170 | grep rw-s | grep zero | awk '{print $2}'

One can do with fewer greps:
pryzbyj@pryzbyj:~$ sudo pmap `pgrep -P 1 -u postgres` |awk '/rw-s/&&/zero/{print $2}'  # check PPID=1
144848K

or:
pryzbyj@pryzbyj:~$ sudo pmap `pgrep -u postgres |sed q` |awk '/rw-s/&&/zero/{print $2}'  # check any backend, not just postmaster parent
144848K

Or (this is even less clean but gives an alternative which continues to use
/proc directly):
pryzbyj@pryzbyj:~$ sudo cat /proc/`pgrep -P 1 -u postgres`/maps |awk --non-decimal-data 'BEGIN{FS="[ -]"} /zero/{a="0x"$1;b="0x"$2;print (b-a)/1024"kB"}'
144848kB
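
For the segment above, that works out to ceil(144848 kB / 2048 kB) = 71
huge pages of the usual 2MB size, i.e. vm.nr_hugepages = 71 plus whatever
margin one wants, rather than the much larger number the current wording
leads to.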

BTW when huge_pages=try, I've been verifying hugepages are in use by running:
pryzbyj@pryzbyj:~$ sudo sh -c 'cat /proc/`pgrep -u postgres -P1`/smaps |grep -c "KernelPageSize: *2048 kB"' || echo NOT FOUND
0
NOT FOUND

Justin



Re: Prefix operator for text and spgist support

2018-02-19 Thread Arthur Zakirov
Hello,

On Fri, Feb 02, 2018 at 06:03:27PM +0300, Ildus Kurbangaliev wrote:
> Hi,
> 
> Attached patch introduces prefix operator ^@ for text type. For 'a ^@ b'
> it returns true if 'a' starts with 'b'. Also there is spgist index
> support for this operator.
> 
> It could be useful as an alternative for LIKE for 'something%'
> templates. Or even used in LIKE queries instead of ~>=~ and ~<~ in the
> future. But it would require new strategy for btree.

I've looked at the patch. It applies cleanly and the tests pass.
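
For instance, the basic behaviour described above (just illustrating the
operator, not an exhaustive test):

=# SELECT 'alphabet'::text ^@ 'alph';   -- true
=# SELECT 'alphabet'::text ^@ 'bet';    -- false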

I have a couple comments:

> + if (ptype == Pattern_Type_Prefix)
> + {
> + char*s = TextDatumGetCString(constval);
> + prefix = string_to_const(s, vartype);
> + pstatus = Pattern_Prefix_Partial;
> + rest_selec = 1.0;   /* all */
> + pfree(s);
> + }
> + else
> + pstatus = pattern_fixed_prefix(patt, ptype, collation,
> +&prefix, &rest_selec);

I think it is better to put Pattern_Type_Prefix processing into
pattern_fixed_prefix() as another case entry.

Secondly, it is worth fixing the documentation, at least here [1]. Maybe
there are other places where the documentation should be fixed too.


1 - https://www.postgresql.org/docs/current/static/spgist-builtin-opclasses.html

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



NEXT VALUE FOR sequence

2018-02-19 Thread Laurenz Albe
The SQL standard has the expression "NEXT VALUE FOR asequence" to do
what we traditionally do with "nextval('asequence')".

This is an attempt to implement this on top of the recently introduced
NextValueExpr node.

If there is no obvious reason why we would not want that, I'll add it
to the next commitfest.

Is this behavior ok:

test=> CREATE SEQUENCE testseq;
test=> PREPARE x AS SELECT NEXT VALUE FOR testseq;
test=> EXECUTE x;
 ?column? 
--
1
(1 row)

test=> DROP SEQUENCE testseq;
DROP SEQUENCE
test=> EXECUTE x;
ERROR:  could not open relation with OID 24836

If not, what could be done about it?

Yours,
Laurenz Albe

From 57e42f0a32be39eea541808979a41ab6feac6b73 Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Mon, 19 Feb 2018 12:17:59 +0100
Subject: [PATCH] Add support for NEXT VALUE FOR 

NEXT VALUE FOR is the standard SQL expression for getting the next
value from a sequence, which in PostgreSQL traditionally has been
the task of the "nextval" function.

This completes the implementation of SQL standard feature T176.
---
 doc/src/sgml/func.sgml | 10 
 doc/src/sgml/ref/create_sequence.sgml  |  3 ++-
 doc/src/sgml/syntax.sgml   | 18 ++
 src/backend/commands/tablecmds.c   |  2 ++
 src/backend/executor/execExpr.c|  1 +
 src/backend/executor/execExprInterp.c  |  3 ++-
 src/backend/nodes/copyfuncs.c  |  2 ++
 src/backend/nodes/equalfuncs.c |  2 ++
 src/backend/nodes/outfuncs.c   |  2 ++
 src/backend/nodes/readfuncs.c  |  2 ++
 src/backend/parser/gram.y  |  6 +
 src/backend/parser/parse_expr.c| 43 ++
 src/backend/rewrite/rewriteHandler.c   |  2 ++
 src/backend/utils/adt/ruleutils.c  | 12 +++---
 src/include/executor/execExpr.h|  1 +
 src/include/nodes/primnodes.h  |  7 --
 src/test/regress/expected/sequence.out | 32 -
 src/test/regress/sql/sequence.sql  | 12 +-
 18 files changed, 145 insertions(+), 15 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 487c7ff750..e135a05314 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -12095,6 +12095,16 @@ nextval('foo'::text)  foo is looked up at
 This function requires USAGE
 or UPDATE privilege on the sequence.

+
+   
+
+ PostgreSQL also offers the SQL standard compliant expression
+ NEXT VALUE FOR sequence_name
+ which is the same as
+ nextval('sequence_name').
+ See  for details.
+
+   
   
  
 
diff --git a/doc/src/sgml/ref/create_sequence.sgml b/doc/src/sgml/ref/create_sequence.sgml
index 3e0d339c85..2671e80f12 100644
--- a/doc/src/sgml/ref/create_sequence.sgml
+++ b/doc/src/sgml/ref/create_sequence.sgml
@@ -51,7 +51,8 @@ CREATE [ TEMPORARY | TEMP ] SEQUENCE [ IF NOT EXISTS ] 
+   NEXT VALUE FOR
+
+   
+NEXT VALUE FOR
+   
+
+   
+This expression is used to obtain the next value for a sequence.
+
+NEXT VALUE FOR sequence_name
+
+
+It is equivalent to calling the nextval function.
+   
+
+  
+
   
Expression Evaluation Rules
 
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 89454d8e80..34a1b0684d 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -5496,6 +5496,8 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel,
 
 			nve->seqid = RangeVarGetRelid(colDef->identitySequence, NoLock, false);
 			nve->typeId = typeOid;
+			/* no need to check permissions on the implicit sequence */
+			nve->checkperms = false;
 
 			defval = (Expr *) nve;
 		}
diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index c6eb3ebacf..2f6b536d90 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -2104,6 +2104,7 @@ ExecInitExprRec(Expr *node, ExprState *state,
 scratch.opcode = EEOP_NEXTVALUEEXPR;
 scratch.d.nextvalueexpr.seqid = nve->seqid;
 scratch.d.nextvalueexpr.seqtypid = nve->typeId;
+scratch.d.nextvalueexpr.checkperms = nve->checkperms;
 
 				ExprEvalPushStep(state, &scratch);
 break;
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index f646fd9c51..1b067eb491 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -2355,7 +2355,8 @@ ExecEvalCurrentOfExpr(ExprState *state, ExprEvalStep *op)
 void
 ExecEvalNextValueExpr(ExprState *state, ExprEvalStep *op)
 {
-	int64		newval = nextval_internal(op->d.nextvalueexpr.seqid, false);
+	int64		newval = nextval_internal(op->d.nextvalueexpr.seqid,
+		  op->d.nextvalueexpr.checkperms);
 
 	switch (op->d.nextvalueexpr.seqtypid)
 	{
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index bafe0d1071..aaadc8b54b 100644
--- a/src/backend/nodes/copyfuncs.c

extern keyword incorrectly used in some function definitions

2018-02-19 Thread David Rowley
While poking around partition.c I noticed that one of the functions
there is *defined* as "extern". Normally we'd only do this in the
declaration of the function. I don't really see why it's needed in the
definition.

Anyway, I removed it. I then thought I'd better look around for any
others too...

A patch is attached which removes them all, apart from the snowball
ones. I didn't touch those.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


remove_extern_func_defs.patch
Description: Binary data


Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2018-02-19 Thread David Rowley
On 19 February 2018 at 18:01, David Rowley  wrote:
> On 19 February 2018 at 15:11, Tomas Vondra  
> wrote:
>> and perhaps we should do s/isproxy/is_proxy/ which seems like the usual
>> naming for boolean variables.
>
> You're right. I'll change that and post an updated patch. I'll also
> reprocess your email again and try to improve some comments to make
> the intent of the code more clear.

I've made a few changes to the patch. "isproxy" is now "is_proxy".
I've made the comments a bit more verbose at the top of the definition
of struct AppendPath. Also shuffled some comments around in allpaths.c

I've attached both a delta patch with just these changes, and also a
completely new patch based on a recent master.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


remove_singleton_appends_2018-02-19_delta.patch
Description: Binary data


remove_singleton_appends_2018-02-19.patch
Description: Binary data