Re: ALTER TABLE ADD COLUMN fast default
On Tue, Feb 20, 2018 at 5:03 PM, Andrew Dunstan wrote:
> http://www.publix.com/myaccount/verify?validationCode=889fd4cb-6dbb-4f93-9d4f-c701410cd8a2

Wow, my c was working overtime. Good thing this won't do anyone any good.

cheers

andrew

--
Andrew Dunstan  https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [WIP PATCH] Index scan offset optimisation using visibility map
Hi! I've played with the patch. I observe that in some expected scenarios it reduces read buffers significantly.

> On 14 Feb 2018, at 0:04, Michail Nikolaev wrote:
> Patch updated + rebased on master. check-world is passing.

Minor spots: there are some trailing whitespaces at line ends.

> Offset cannot be optimized because parallel execution

I'd replace this with

> Offset cannot be optimized [in spite of | due to] parallel execution

More importantly: nodeindexonlyscan.c and nodeindexscan.c now share more similar code and comments. I do not think that is strictly necessary to avoid, but we certainly have to answer the question: is it possible to refactor the code to avoid duplication? Currently the patch is not invasive and touches only the relevant code. If refactoring would turn it into shotgun surgery, I think it is better to avoid it.

> Still not sure about questions 0, 2, 3, 4, 5, and 6 from initial mail (about
> explain, explain analyze, documentation and optimiser).

I think it'd be cool if EXPLAIN ANALYZE printed heap fetch information when the "Index-Only" path was used. But we already have this debug information in the form of a reduced count in "Buffers". There is nothing to add to plain EXPLAIN, in my opinion.

From my point of view, you should add some words to https://www.postgresql.org/docs/current/static/indexes-index-only-scans.html and, if the patch is accepted, to https://wiki.postgresql.org/wiki/Index-only_scans

I do not know if it is possible to take this optimization into account in cost estimation. Does the Limit node take the cost of scanning into startup cost?

Thanks!

Best regards, Andrey Borodin.
Re: ALTER TABLE ADD COLUMN fast default
Hi, On 2018-02-17 00:23:40 +0100, Tomas Vondra wrote: > Anyway, I consider the performance to be OK. But perhaps Andres could > comment on this too, as he requested the benchmarks. My performance concerns were less about CREATE TABLE related things than about analytics workloads or such, where deforming is the primary bottleneck. I think it should be ok, but doing a before/after tpc-h of scale 5-10 or so wouldn't be a bad thing to verify. Greetings, Andres Freund
Re: ALTER TABLE ADD COLUMN fast default
http://www.publix.com/myaccount/verify?validationCode=889fd4cb-6dbb-4f93-9d4f-c701410cd8a2

On Mon, Feb 19, 2018 at 1:18 PM, David Rowley wrote:
> On 19 February 2018 at 13:48, Andrew Dunstan wrote:
>> On Sun, Feb 18, 2018 at 2:52 PM, David Rowley wrote:
>>> I'll try to spend some time reviewing the code in detail soon.
>>
>> Thanks. I'll fix the typos in the next version of the patch, hopefully
>> after your review.
>
> Here's a more complete review... I reserve the right to be wrong about
> a few things and be nitpicky on a few more...

I've dealt with most of these issues. The only change of substance is the suggested change in slot_getmissingattrs().

> Once that's fixed, you could ditch the Min() call in the following code:
>
> attno = Min(attno, attnum);
> slot_deform_tuple(slot, attno);
>
> /*
>  * If tuple doesn't have all the atts indicated by tupleDesc, read the
>  * rest as NULLs or missing values
>  */
> if (attno < attnum)
> {
>     slot_getmissingattrs(slot, attno);
> }
>
> And change it to something more like:
>
> /*
>  * If tuple doesn't have all the atts indicated by tupleDesc, deform as
>  * much as possible, then fill the remaining required atts with nulls or
>  * missing values.
>  */
> if (attno < attnum)
> {
>     slot_deform_tuple(slot, attno);
>     slot_getmissingattrs(slot, attno, attnum); // <-- (includes new parameter)
> }
> else
>     slot_deform_tuple(slot, attnum);
>
> Which should result in a small performance increase due to not having
> to compare attno < attnum twice. Although, perhaps the average compiler
> may optimise this anyway. I've not checked.

Maybe it would make a difference, but it seems unlikely to be significant.

> 11. Is there a reason why the following code in getmissingattr() can't
> be made into a bsearch?
> [...]
> As far as I can see, the attrmiss is sorted by attnum. But maybe I just
> failed to find a case where it's not.
> I'd imagine doing this would further improve Tomas' benchmark score
> for the patched version, since he was performing a sum() on the final
> column.
>
> Although, if done, I'm actually holding some regard to the fact that
> users may one day complain that their query became slower after a
> table rewrite :-)
>
> Update: I've stumbled upon the following code:
>
> + /*
> +  * We have a dependency on the attrdef array being filled in in
> +  * ascending attnum order. This should be guaranteed by the index
> +  * driving the scan. But we want to be double sure
> +  */
> + if (!(attp->attnum > attnum))
> +     elog(ERROR, "attribute numbers not ascending");
>
> and the code below this seems to also assemble the missing values in
> attnum order.
>
> While I'm here, I'd rather see this if test written as:
> if (attp->attnum <= attnum)
>
> Since the attnum order in the missing values appears to be well
> defined in ascending order, you can also simplify the comparison logic
> in equalTupleDescs(). The inner-most nested loop shouldn't be
> required.

Well, this comment in the existing code in equalTupleDescs() suggests that in fact we can't rely on the attrdef items being in attnum order:

/*
 * We can't assume that the items are always read from the system
 * catalogs in the same order; so use the adnum field to identify
 * the matching item to compare.
 */

And in any case the code is out of date, since it no longer ties missing values to default values. So I've removed the comment and the error check. There are probably a few optimisations we can make if we can assume that the missing values in the TupleDesc are in attnum order. But it's unclear to me why they should be when the attrdef entries are not.

> 15. Why not: slot->tts_values[missattnum] = (Datum) 0;
>
> + slot->tts_values[missattnum] = PointerGetDatum(NULL);
>
> There are a few other places you've done this. I'm unsure if there's
> any sort of standard to follow.
This is a pretty widely used idiom in the source.

> I'll await the next version before looking again.

Thanks, attached.

cheers

andrew

--
Andrew Dunstan  https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

fast_default-v11.patch
Description: Binary data
RE: [doc fix] Correct calculation of vm.nr_hugepages
From: Justin Pryzby [mailto:pry...@telsasoft.com]
> One can do with fewer greps:
> pryzbyj@pryzbyj:~$ sudo pmap `pgrep -P 1 -u postgres` |awk '/rw-s/&&/zero/{print $2}'   # check PPID=1
> 144848K

Thanks, I'd like to take this.

Regards
Takayuki Tsunakawa

hugepage_size_doc_v2.patch
Description: hugepage_size_doc_v2.patch
Re: Changing default value of wal_sync_method to open_datasync on Linux
On 2018-02-20 01:56:17 +, Tsunakawa, Takayuki wrote: > Disabling the filesystem barrier is a valid tuning method as the PG manual > says: I don't think it says that: > https://www.postgresql.org/docs/devel/static/wal-reliability.html > > [Excerpt] > Recent SATA drives (those following ATAPI-6 or later) offer a drive cache > flush command (FLUSH CACHE EXT), while SCSI drives have long supported a > similar command SYNCHRONIZE CACHE. These commands are not directly accessible > to PostgreSQL, but some file systems (e.g., ZFS, ext4) can use them to flush > data to the platters on write-back-enabled drives. Unfortunately, such file > systems behave suboptimally when combined with battery-backup unit (BBU) disk > controllers. In such setups, the synchronize command forces all data from the > controller cache to the disks, eliminating much of the benefit of the BBU. > You can run the pg_test_fsync program to see if you are affected. If you are > affected, the performance benefits of the BBU can be regained by turning off > write barriers in the file system or reconfiguring the disk controller, if > that is an option. If write barriers are turned off, make sure the battery > remains functional; a faulty battery can potentially lead to data loss. > Hopefully file system and disk controller designers will eventually address > this suboptimal behavior. Note it's only valid if running with a BBU. In which case the performance measurements you're doing aren't particularly meaningful anyway, as you'd test BBU performance rather than disk performance. Greetings, Andres Freund
Re: SHA-2 functions
On Mon, Feb 19, 2018 at 08:43:44AM -0500, Peter Eisentraut wrote:
> I also noticed while working on some SSL code that we have perfectly
> good SHA-2 functionality in the server already, but it has no test
> coverage outside the SCRAM tests.

Yep, the refactoring in src/common/ has been done for SCRAM. As the first versions of the patch were for SCRAM-SHA-1, only SHA-1 functions were moved. With SCRAM-SHA-256, the full set of APIs for 256, 384 and 512 has been moved as they share much code.

> So I suggest these patches that expose the new functions sha224(),
> sha256(), sha384(), sha512(). That allows us to make the SSL and SCRAM
> tests more robust, and it will allow them to be used in general purpose
> contexts over md5().

Huge +1 as well. This also makes sure that the non-SSL implementation carried in Postgres is consistent with what OpenSSL has. This would matter as well if new SSL implementations are added in the future.

+ sha224('abc')
+ \x23097d223405d8228642a477bda255b32aadbce4bda0b3f7e36c9da7

Some bytea characters from the hash are not able to show up correctly? This does not result in spaces.

+ Note that for historic reasons, the function md5
+ returns a hex-encoded value of type text whereas the SHA-2
+ functions return type bytea. Use the functions
+ encode and decode to convert
+ between the two.

Adding an example would be nice.

varlena.c is already large and messy. I would suggest splitting all the user-facing cryptographic functions, including md5 and hex functions, into a new file, say src/backend/utils/adt/crypt.c.

--
Michael

signature.asc
Description: PGP signature
Re: Server crash in pg_replication_slot_advance function
Petr Jelinek wrote:
>> Attached patch proposes a required fix.
>
> Looks correct to me.

Masahiko Sawada wrote:
> The change looks good to me. Thank you.

Thanks, pushed.

--
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Typo in procarray.c
Hi, Attached patch for fixing $subject. s/replicaton/replication/ Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center fix_typo_in_procarray_c.patch Description: Binary data
RE: Changing default value of wal_sync_method to open_datasync on Linux
From: Andres Freund [mailto:and...@anarazel.de]
> Indeed. My past experience with open_datasync on linux shows it to be slower
> by roughly an order of magnitude. Even if that would turn out not to be
> the case anymore, I'm *extremely* hesitant to make such a change.

Thanks for such quick feedback. An order of magnitude is surprising. Can you share the environment (Linux distro version, kernel version, filesystem, mount options, workload, etc.)? Do you think of anything that explains the degradation?

I think it is reasonable that open_datasync is faster than fdatasync because:

* Short transactions like pgbench's require fewer system calls: write()+fdatasync() vs a single write().
* fdatasync() probably has to scan the page cache for dirty pages.

The above differences should be invisible on slow disks, but they will show up on faster storage. I guess that's why Robert said open_datasync was much faster on NVRAM.

The manual says that pg_test_fsync is a tool for selecting the wal_sync_method value, and it indicates open_datasync is better. Why is fdatasync the default value only on Linux? I don't understand, as a user, why PostgreSQL does this special handling. If the current behavior of choosing fdatasync by default is due to some deficiency of old kernels and/or filesystems, I think we can change the default so that most users don't have to change wal_sync_method.

Regards
Takayuki Tsunakawa
Re: Changing default value of wal_sync_method to open_datasync on Linux
On 20/02/18 13:27, Tsunakawa, Takayuki wrote:

Hello, I propose changing the default value of wal_sync_method from fdatasync to open_datasync on Linux. The patch is attached. I'm feeling this may be controversial, so I'd like to hear your opinions.

The reason for the change is better performance. Robert Haas said open_datasync was much faster than fdatasync with NVRAM in this thread:
https://www.postgresql.org/message-id/flat/c20d38e97bcb33dad59e...@lab.ntt.co.jp#c20d38e97bcb33dad59e...@lab.ntt.co.jp

pg_test_fsync shows higher figures for open_datasync:

[SSD on bare metal, ext4 volume mounted with noatime,nobarrier,data=ordered]
--
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
        open_datasync          50829.597 ops/sec      20 usecs/op
        fdatasync              42094.381 ops/sec      24 usecs/op
        fsync                  42209.972 ops/sec      24 usecs/op
        fsync_writethrough            n/a
        open_sync              48669.605 ops/sec      21 usecs/op
--

[HDD on VM, ext4 volume mounted with noatime,nobarrier,data=writeback]
(the figures seem oddly high, though; this may be due to some VM configuration)
--
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
        open_datasync          34648.778 ops/sec      29 usecs/op
        fdatasync              31570.947 ops/sec      32 usecs/op
        fsync                  27783.283 ops/sec      36 usecs/op
        fsync_writethrough            n/a
        open_sync              35238.866 ops/sec      28 usecs/op
--

pgbench only shows marginally better results, although the difference is within the error range. The following is the tps of the default read/write workload of pgbench. I ran the test with all the tables and indexes preloaded with pg_prewarm (except pgbench_history), and with no checkpoint happening. I ran a write workload before running the benchmark so that no new WAL file would be created during the benchmark run.
[SSD on bare metal, ext4 volume mounted with noatime,nobarrier,data=ordered]
--
                   1      2      3      avg
fdatasync      17610  17164  16678  17150
open_datasync  17847  17457  17958  17754  (+3%)
--

[HDD on VM, ext4 volume mounted with noatime,nobarrier,data=writeback]
(the figures seem oddly high, though; this may be due to some VM configuration)
--
                   1      2      3      avg
fdatasync       4911   5225   5198   5111
open_datasync   4996   5284   5317   5199  (+1%)
--

As the removed comment describes, when wal_sync_method is open_datasync (or open_sync), open() fails with errno=EINVAL if the ext4 volume is mounted with data=journal. That's because open() specifies O_DIRECT in that case. I don't think that's a problem in practice, because data=journal will not be used for performance, and wal_level needs to be changed from its default replica to minimal and max_wal_senders must be set to 0 for O_DIRECT to be used.

I think the use of 'nobarrier' is probably disabling most/all reliable writing to the devices. What do the numbers look like if you remove this option?

regards
Mark
Re: master check fails on Windows Server 2008
Marina Polyakova writes:
> On 16-02-2018 19:31, Tom Lane wrote:
>> Weird. AFAICS the cost estimates for those two plans should be quite
>> different, so this isn't just a matter of the estimates maybe being
>> a bit platform-dependent. (And that test has been there nearly a
>> year without causing reported problems.)

> Thank you very much! Your test showed that hash aggregation was not even
> added to the possible paths (see windows_regression.diffs attached).
> Exploring this, I found that not allowing float8 to pass by value in
> config.pl was crucial for the size of the hash table used in this query
> (see diff.patch attached):

Ah-hah. I can reproduce the described failure if I configure with --disable-float8-byval on an otherwise 64-bit machine. It appears that the minimum work_mem setting that will allow this query to use a hashagg plan on such a configuration is about 155kB (which squares with the results you show). On the other hand, in a normal 64-bit configuration, work_mem settings of 160kB or more cause other failures (due to other plans switching to hashagg), and on a 32-bit machine I see such failures with work_mem of 150kB or more. So there's basically no setting of work_mem that would make all these tests pass everywhere.

I see several things we could do about this:

1. Nothing; just say "sorry, we don't promise that the regression tests pass with no plan differences on nonstandard configurations". Given that --disable-float8-byval has hardly any real-world use, there is not a lot of downside to that.

2. Decide that --disable-float8-byval, and for that matter --disable-float4-byval, have no real-world use at all and take them out. There was some point in those options back when we cared about binary compatibility with version-zero C functions, but now I'm not sure why anyone would use them.

3. Drop that one test case from stats_ext.sql; I'm not sure how much additional test value it's actually bringing.

4. Try to tweak the stats_ext.sql test conditions in some more refined way to get the test to pass everywhere. This'd be a lot of work with no guarantee of success, so I'm not too excited about it.

5. Something else?

regards, tom lane
Re: Changing default value of wal_sync_method to open_datasync on Linux
Hi, On 2018-02-20 00:27:47 +, Tsunakawa, Takayuki wrote: > I propose changing the default value of wal_sync_method from fdatasync > to open_datasync on Linux. The patch is attached. I'm feeling this > may be controversial, so I'd like to hear your opinions. Indeed. My past experience with open_datasync on linux shows it to be slower by roughly an order of magnitude. Even if that would turn out not to be the case anymore, I'm *extremely* hesitant to make such a change. > [HDD on VM, ext4 volume mounted with noatime,nobarrier,data=writeback] > (the figures seem oddly high, though; this may be due to some VM > configuration) These numbers clearly aren't reliable, there's absolutely no way an hdd can properly do ~30k syncs/sec. Until there's reliable numbers this seems moot. Greetings, Andres Freund
Changing default value of wal_sync_method to open_datasync on Linux
Hello, I propose changing the default value of wal_sync_method from fdatasync to open_datasync on Linux. The patch is attached. I'm feeling this may be controversial, so I'd like to hear your opinions.

The reason for the change is better performance. Robert Haas said open_datasync was much faster than fdatasync with NVRAM in this thread:
https://www.postgresql.org/message-id/flat/c20d38e97bcb33dad59e...@lab.ntt.co.jp#c20d38e97bcb33dad59e...@lab.ntt.co.jp

pg_test_fsync shows higher figures for open_datasync:

[SSD on bare metal, ext4 volume mounted with noatime,nobarrier,data=ordered]
--
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
        open_datasync          50829.597 ops/sec      20 usecs/op
        fdatasync              42094.381 ops/sec      24 usecs/op
        fsync                  42209.972 ops/sec      24 usecs/op
        fsync_writethrough            n/a
        open_sync              48669.605 ops/sec      21 usecs/op
--

[HDD on VM, ext4 volume mounted with noatime,nobarrier,data=writeback]
(the figures seem oddly high, though; this may be due to some VM configuration)
--
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
        open_datasync          34648.778 ops/sec      29 usecs/op
        fdatasync              31570.947 ops/sec      32 usecs/op
        fsync                  27783.283 ops/sec      36 usecs/op
        fsync_writethrough            n/a
        open_sync              35238.866 ops/sec      28 usecs/op
--

pgbench only shows marginally better results, although the difference is within the error range. The following is the tps of the default read/write workload of pgbench. I ran the test with all the tables and indexes preloaded with pg_prewarm (except pgbench_history), and with no checkpoint happening. I ran a write workload before running the benchmark so that no new WAL file would be created during the benchmark run.
[SSD on bare metal, ext4 volume mounted with noatime,nobarrier,data=ordered]
--
                   1      2      3      avg
fdatasync      17610  17164  16678  17150
open_datasync  17847  17457  17958  17754  (+3%)
--

[HDD on VM, ext4 volume mounted with noatime,nobarrier,data=writeback]
(the figures seem oddly high, though; this may be due to some VM configuration)
--
                   1      2      3      avg
fdatasync       4911   5225   5198   5111
open_datasync   4996   5284   5317   5199  (+1%)
--

As the removed comment describes, when wal_sync_method is open_datasync (or open_sync), open() fails with errno=EINVAL if the ext4 volume is mounted with data=journal. That's because open() specifies O_DIRECT in that case. I don't think that's a problem in practice, because data=journal will not be used for performance, and wal_level needs to be changed from its default replica to minimal and max_wal_senders must be set to 0 for O_DIRECT to be used.

Regards
Takayuki Tsunakawa

default_wal_sync_method.patch
Description: default_wal_sync_method.patch
Re: SHA-2 functions
On Mon, Feb 19, 2018 at 03:02:02PM -0500, Peter Eisentraut wrote:
> On 2/19/18 09:06, Aleksander Alekseev wrote:
>>> So I suggest these patches that expose the new functions sha224(),
>>> sha256(), sha384(), sha512(). That allows us to make the SSL and SCRAM
>>> tests more robust, and it will allow them to be used in general purpose
>>> contexts over md5().
>>
>> Nice patch. I wonder though whether tests should include hashing
>> non-ASCII characters as well.
>
> The input is of type bytea, so the concept of non-ASCII characters
> doesn't quite apply.

Encoding issues are a reason to use bytea output in some cases. For logical decoding this is for example also why an interface like pg_logical_slot_peek_binary_changes() exists... Back to the patch.

--
Michael

signature.asc
Description: PGP signature
Re: [PROPOSAL] Nepali Snowball dictionary
On Tue, Feb 20, 2018 at 12:01 AM, Arthur Zakirov wrote:
> Thank you for your answer!
>
> 2018-02-19 18:43 GMT+03:00 Tom Lane:
>> We are not the upstream for the snowball stuff, and lack the expertise
>> to decide whether proposed changes are any good. To get anything
>> changed there, you'd have to get it approved by the snowball group.
>>
>> As best I know, the original list
>> http://lists.tartarus.org/mailman/listinfo/snowball-discuss
>> is moribund, but there's a fork at
>> http://snowballstem.org
>> that has at least some activity.
>
> From the original list it seems that http://snowballstem.org is
> frozen. But development work continues at
> https://github.com/snowballstem by other people.
> I'll try to send them a pull request.
>
>> Probably the first step ought to involve syncing our copy with the
>> current state of that upstream, something that's not been done in a
>> very long time :-(
>
> I think I will try to sync snowball dictionaries with snowballstem.org
> algorithms, it may be useful. Or maybe it is better to sync with the
> github repository. I'm not aware how they differ yet, though.

That may be dangerous!

> --
> Arthur Zakirov
> Postgres Professional: http://www.postgrespro.com
> Russian Postgres Company
Re: pgsql: Allow UNIQUE indexes on partitioned tables
I found the following change to be confusing.

/doc/src/sgml/ref/alter_table.sgml
+
+ Additional restrictions apply when unique indexes are applied to
+ partitioned tables; see .
+

That paragraph appears in the section covering "ALTER TABLE name ADD table_constraint_using_index". However, the code says:

/src/backend/commands/tablecmds.c
+ /*
+  * Doing this on partitioned tables is not a simple feature to implement,
+  * so let's punt for now.
+  */
+ if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+     ereport(ERROR,
+             (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+              errmsg("ALTER TABLE / ADD CONSTRAINT USING INDEX is not supported on partitioned tables")));

I was expecting the doc for ADD CONSTRAINT USING INDEX to note the limitation explicitly, in lieu of the above paragraph.

Also, I cannot reason out what the following limitation means:

/doc/src/sgml/ref/create_table.sgml
+ If any partitions are in turn partitioned, all columns of each partition
+ key are considered at each level below the UNIQUE
+ constraint.

As an aside, it would be helpful to add a link to "Data Definition/Table Partitioning" from at least CREATE TABLE ... PARTITION BY, and to swap "PARTITION BY" and "PARTITION OF" in the Parameters section of that page - a table must be created with PARTITION BY before another can be created as a PARTITION OF it (and the synopsis also lists them in the BY-before-OF order).

David J.

On Mon, Feb 19, 2018 at 1:40 PM, Alvaro Herrera wrote:
> Allow UNIQUE indexes on partitioned tables
>
> If we restrict unique constraints on partitioned tables so that they
> must always include the partition key, then our standard approach to
> unique indexes already works --- each unique key is forced to exist
> within a single partition, so enforcing the unique restriction in each
> index individually is enough to have it enforced globally. Therefore we
> can implement unique indexes on partitions by simply removing a few
> restrictions (and adding others.)
>
> Discussion: https://postgr.es/m/2017112921.hi6hg6pem2w2t36z@alvherre.pgsql
> Discussion: https://postgr.es/m/20171229230607.3iib6b62fn3uaf47@alvherre.pgsql
> Reviewed-by: Simon Riggs, Jesper Pedersen, Peter Eisentraut, Jaime
> Casanova, Amit Langote
>
> Branch
> --
> master
>
> Details
> ---
> https://git.postgresql.org/pg/commitdiff/eb7ed3f3063401496e4aa4bd68fa33f0be31a72f
>
> Modified Files
> --
> doc/src/sgml/ddl.sgml                         |   9 +-
> doc/src/sgml/ref/alter_table.sgml             |  15 +-
> doc/src/sgml/ref/create_index.sgml            |   5 +
> doc/src/sgml/ref/create_table.sgml            |  18 +-
> src/backend/bootstrap/bootparse.y             |   2 +
> src/backend/catalog/index.c                   |  50 -
> src/backend/catalog/pg_constraint.c           |  76 +++
> src/backend/catalog/toasting.c                |   4 +-
> src/backend/commands/indexcmds.c              | 125 +--
> src/backend/commands/tablecmds.c              |  71 ++-
> src/backend/parser/analyze.c                  |   7 +
> src/backend/parser/parse_utilcmd.c            |  31 +--
> src/backend/tcop/utility.c                    |   1 +
> src/bin/pg_dump/t/002_pg_dump.pl              |  65 ++
> src/include/catalog/index.h                   |   5 +-
> src/include/catalog/pg_constraint_fn.h        |   4 +-
> src/include/commands/defrem.h                 |   1 +
> src/include/parser/parse_utilcmd.h            |   3 +-
> src/test/regress/expected/alter_table.out     |   8 -
> src/test/regress/expected/create_index.out    |   6 +
> src/test/regress/expected/create_table.out    |  12 --
> src/test/regress/expected/indexing.out        | 294 +-
> src/test/regress/expected/insert_conflict.out |   2 +-
> src/test/regress/sql/alter_table.sql          |   2 -
> src/test/regress/sql/create_index.sql         |   6 +
> src/test/regress/sql/create_table.sql         |   8 -
> src/test/regress/sql/indexing.sql             | 172 ++-
> 27 files changed, 907 insertions(+), 95 deletions(-)
Re: [PROPOSAL] Nepali Snowball dictionary
Thank you for your answer!

2018-02-19 18:43 GMT+03:00 Tom Lane:
> We are not the upstream for the snowball stuff, and lack the expertise
> to decide whether proposed changes are any good. To get anything
> changed there, you'd have to get it approved by the snowball group.
>
> As best I know, the original list
> http://lists.tartarus.org/mailman/listinfo/snowball-discuss
> is moribund, but there's a fork at
> http://snowballstem.org
> that has at least some activity.

From the original list it seems that http://snowballstem.org is frozen. But development work continues at https://github.com/snowballstem by other people. I'll try to send them a pull request.

> Probably the first step ought to involve syncing our copy with the
> current state of that upstream, something that's not been done in a
> very long time :-(

I think I will try to sync snowball dictionaries with snowballstem.org algorithms, it may be useful. Or maybe it is better to sync with the github repository. I'm not aware how they differ yet, though.

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Re: unique indexes on partitioned tables
I pushed this now, with fixes for the last few comments there were.

Peter Eisentraut wrote:
> I don't understand the variable name "third". I don't see a "first" or
> "second" nearby.

Hah. That was referring to variables "myself" and "referenced". I changed the variable name to "parentConstr".

> I find some of the columns in pg_constraint confusing. For a primary
> key on a partitioned table, for the PK on the partition I get
>
> conislocal = false, coninhcount = 1, connoinherit = true
>
> The last part is confusing to me.

Yeah, I think it was patently wrong. I changed it so that connoinherit becomes true in this case.

Alvaro Herrera wrote:
> Jaime Casanova wrote:
>> also noted that if you:
>>
>> """
>> create table t1(i int) partition by hash (i);
>> create table t1_0 partition of t1 for values with (modulus 2, remainder 0);
>> create table t1_1 partition of t1 for values with (modulus 2, remainder 1);
>> create unique index on t1(i);
>> alter table t1 add primary key using index t1_i_idx ;
>> """
>>
>> the ALTER TABLE ADD PK does not recurse to partitions, which maybe is
>> perfectly fine because i'm using USING INDEX but it feels like an
>> oversight to me
>
> Ouch. Yeah, this is a bug. I'll try to come up with something.

After looking at it for a few minutes I determined that adding this feature requires some more work: you need to iterate over all partitions, obtain the corresponding index, cons up a few fake parse nodes, then recurse to create the PK in the children. I think this should be doable with a couple dozen lines of code, but it's a refinement that can be added on top. Care to submit a patch?

In the meantime, I added an ereport(ERROR) to avoid leaving the system in an inconsistent state (which probably would not even be reproduced correctly by pg_dump).

--
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: SHA-2 functions
On 2/19/18 09:06, Aleksander Alekseev wrote: >> So I suggest these patches that expose the new functions sha224(), >> sha256(), sha384(), sha512(). That allows us to make the SSL and SCRAM >> tests more robust, and it will allow them to be used in general purpose >> contexts over md5(). > > Nice patch. I wonder though whether tests should include hashing > non-ASCII characters as well. The input is of type bytea, so the concept of non-ASCII characters doesn't quite apply. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: JIT compiling with LLVM v10.1
Hi,

On 02/14/2018 01:17 PM, Andres Freund wrote:
> On 2018-02-07 06:54:05 -0800, Andres Freund wrote:
>> I've pushed v10.0. The big (and pretty painful to make) change is that
>> now all the LLVM specific code lives in src/backend/jit/llvm, which is
>> built as a shared library which is loaded on demand.

I thought https://db.in.tum.de/~leis/papers/adaptiveexecution.pdf?lang=en was relevant for this thread.

Best regards,
Jesper
Re: unique indexes on partitioned tables
Jaime Casanova wrote:
> Hi Álvaro,
>
> attached a tiny patch (on top of yours) that silences two "variable
> uninitialized" warnings.

Thanks! Applied.

> also noted that if you:
>
> """
> create table t1(i int) partition by hash (i);
> create table t1_0 partition of t1 for values with (modulus 2, remainder 0);
> create table t1_1 partition of t1 for values with (modulus 2, remainder 1);
> create unique index on t1(i);
> alter table t1 add primary key using index t1_i_idx ;
> """
>
> the ALTER TABLE ADD PK does not recurse to partitions, which maybe is
> perfectly fine because i'm using USING INDEX but it feels like an
> oversight to me

Ouch. Yeah, this is a bug. I'll try to come up with something.

--
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: unique indexes on partitioned tables
On 12 February 2018 at 15:26, Alvaro Herrerawrote: > Hello, > > Thanks, Peter, Jesper, Amit, for reviewing the patch. Replying to > all review comments at once: > [... v5 of patch attached ...] Hi Álvaro, attached a tiny patch (on top of yours) that silence two "variables uninitilized" warnings. also noted that if you: """ create table t1(i int) partition by hash (i); create table t1_0 partition of t1 for values with (modulus 2, remainder 0); create table t1_1 partition of t1 for values with (modulus 2, remainder 1); create unique index on t1(i); alter table t1 add primary key using index t1_i_idx ; """ the ALTER TABLE ADD PK does not recurse to partitions, which maybe is perfectly fine because i'm using USING INDEX but it feels like an oversight to me -- Jaime Casanova www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 19233b68cb..677e9cabf8 100644 *** a/src/backend/commands/tablecmds.c --- b/src/backend/commands/tablecmds.c *** *** 14177,14183 AttachPartitionEnsureIndexes(Relation rel, Relation attachrel) */ for (i = 0; i < list_length(attachRelIdxs); i++) { ! Oid cldConstrOid; /* does this index have a parent? if so, can't use it */ if (has_superclass(RelationGetRelid(attachrelIdxRels[i]))) --- 14177,14183 */ for (i = 0; i < list_length(attachRelIdxs); i++) { ! Oid cldConstrOid = InvalidOid; /* does this index have a parent? if so, can't use it */ if (has_superclass(RelationGetRelid(attachrelIdxRels[i]))) *** *** 14475,14481 ATExecAttachPartitionIdx(List **wqueue, Relation parentIdx, RangeVar *name) int i; PartitionDesc partDesc; Oid constraintOid, ! cldConstrId; /* * If this partition already has an index attached, refuse the operation. --- 14475,14481 int i; PartitionDesc partDesc; Oid constraintOid, ! cldConstrId = InvalidOid; /* * If this partition already has an index attached, refuse the operation.
Re: pgbench - allow to specify scale as a size
Hello Mark, > What if we consider using ascii (utf8?) text file sizes as a reference point, something independent from the database? Why not. TPC-B basically specifies that rows (accounts, tellers, branches) are all padded to 100 bytes, thus we could consider (i.e. document) that --scale=SIZE refers to the amount of useful data held, and warn that actual storage would add various overheads for page and row headers, free space at the end of pages, indexes... Then one scale step is 100,000 accounts, 10 tellers and 1 branch, i.e. 100,011 * 100 ~ 9.5 MiB of useful data per scale step. > I realize even if a flat file size can be used as a more consistent reference across platforms, it doesn't help with accurately determining the final data file sizes due to any architecture specific nuances or changes in the backend. But perhaps it might still offer a little more meaning to be able to say "50 gigabytes of bank account data" rather than "10 million rows of bank accounts" when thinking about size over cardinality. Yep. Now the overhead is really 60-65%. Although the specification is unambiguous, we still need some maths to know whether it fits in buffers or memory... The point of Kaarel's regression is to take this into account. Also, whether this option would be more admissible to Tom is still an open question. Tom? -- Fabien.
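The arithmetic above is easy to sketch. This is an illustrative calculation only: the row layout follows the TPC-B-style padding described in this thread, and the ~60-65% storage overhead is the rough figure quoted here, not a measured constant.

```python
# Useful data per pgbench scale step, per the layout discussed above:
# 100,000 accounts + 10 tellers + 1 branch, each row padded to 100 bytes.
ROWS_PER_SCALE = 100_000 + 10 + 1
ROW_BYTES = 100

def useful_bytes(scale: int) -> int:
    """Raw 'useful' data for a given --scale, before storage overhead."""
    return scale * ROWS_PER_SCALE * ROW_BYTES

def scale_for_size(target_bytes: int, overhead: float = 0.625) -> int:
    """Rough scale whose on-disk size (useful data plus an assumed ~60-65%
    overhead for page/row headers, free space, indexes) reaches target_bytes.
    The overhead factor is a guess and is platform- and version-dependent."""
    per_step = ROWS_PER_SCALE * ROW_BYTES * (1 + overhead)
    return max(1, round(target_bytes / per_step))

print(f"one scale step ~ {useful_bytes(1) / (1024 * 1024):.2f} MiB of useful data")
print(f"scale for ~50 GiB on disk: {scale_for_size(50 * 1024**3)}")
```

With the 62.5% midpoint overhead this lands around scale 3300 for a 50 GiB target, which is exactly the kind of back-of-the-envelope maths the --scale=SIZE option would do for the user.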
Re: extern keyword incorrectly used in some function definitions
David Rowley writes: > While poking around partition.c I noticed that one of the functions > there is *defined* as "extern". Normally we'd only do this in the > declaration of the function. I don't really see why it's needed in the > definition. > Anyway, I removed it. I then thought I'd better look around for any > others too... > A patch is attached which removes them all, apart from the snowball > ones. I didn't touch those. Agreed. The snowball stuff is just imported from elsewhere, so we're not going to try to make it conform to project style. But the rest of these look good; pushed. regards, tom lane
Re: pgbench - allow to specify scale as a size
On Sat, Feb 17, 2018 at 12:22:37PM -0500, Alvaro Hernandez wrote: > > > On 17/02/18 12:17, Tom Lane wrote: > > Alvaro Hernandez writes: > >> On 17/02/18 11:26, Tom Lane wrote: > >>> Fabien COELHO writes: > Here is a attempt at extending --scale so that it can be given a size. > >>> I do not actually find this to be a good idea. It's going to be > >>> platform-dependent, or not very accurate, or both, and thereby > >>> contribute to confusion by making results less reproducible. > >>> > >>> Plus, what do we do if the backend changes table representation in > >>> some way that invalidates Kaarel's formula altogether? More confusion > >>> would be inevitable. > >> Why not then insert a "few" rows, measure size, truncate the table, > >> compute the formula and then insert to the desired user requested size? > >> (or insert what should be the minimum, scale 1, measure, and extrapolate > >> what's missing). It doesn't sound too complicated to me, and targeting a > >> size is something that I believe it's quite good for user. > > Then you'd *really* have irreproducible results. > > > > regards, tom lane > > You also have irreproducible results today, according to your > criteria. Either you agree on the number of rows but may not agree on > the size (today), or you agree on the size but may not agree on the > number of rows. Right now you can only pick the former, while I think > people would significantly appreciate the latter. If neither is correct, > let's at least provide the choice. What if we consider using ascii (utf8?) text file sizes as a reference point, something independent from the database? I realize even if a flat file size can be used as a more consistent reference across platforms, it doesn't help with accurately determining the final data file sizes due to any architecture specific nuances or changes in the backend.
But perhaps it might still offer a little more meaning to be able to say "50 gigabytes of bank account data" rather than "10 million rows of bank accounts" when thinking about size over cardinality. Regards, Mark -- Mark Wong http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: autovacuum: change priority of the vacuumed tables
On 02/19/2018 03:38 PM, Ildus Kurbangaliev wrote: > On Fri, 16 Feb 2018 21:48:14 +0900 > Masahiko Sawadawrote: > >> On Fri, Feb 16, 2018 at 7:50 PM, Ildus Kurbangaliev >> wrote: >>> On Fri, 16 Feb 2018 17:42:34 +0900 >>> Masahiko Sawada wrote: >>> On Thu, Feb 15, 2018 at 10:16 PM, Grigory Smolkin wrote: > On 02/15/2018 09:28 AM, Masahiko Sawada wrote: > >> Hi, >> >> On Thu, Feb 8, 2018 at 11:01 PM, Ildus Kurbangaliev >> wrote: >>> >>> Hi, >>> >>> Attached patch adds 'autovacuum_table_priority' to the current >>> list of automatic vacuuming settings. It's used in sorting of >>> vacuumed tables in autovacuum worker before actual vacuum. >>> >>> The idea is to give possibility to the users to prioritize >>> their tables in autovacuum process. >>> >> Hmm, I couldn't understand the benefit of this patch. Would you >> elaborate it a little more? >> >> Multiple autovacuum worker can work on one database. So even if >> a table that you want to vacuum first is the back of the list >> and there other worker would pick up it. If the vacuuming the >> table gets delayed due to some big tables are in front of that >> table I think you can deal with it by increasing the number of >> autovacuum workers. >> >> Regards, >> >> -- >> Masahiko Sawada >> NIPPON TELEGRAPH AND TELEPHONE CORPORATION >> NTT Open Source Software Center >> > > Database can contain thousands of tables and often > updates/deletes concentrate mostly in only a handful of tables. > Going through thousands of less bloated tables can take ages. > Currently autovacuum know nothing about prioritizing it`s work > with respect to user`s understanding of his data and > application. Understood. I have a question; please imagine the following case. Suppose that there are 1000 tables in a database, and one table of them (table-A) has the highest priority while other 999 tables have same priority. 
Almost tables (say 800 tables) including table-A need to get vacuumed at some point, so with your patch an AV worker listed 800 tables and table-A will be at the head of the list. Table-A will get vacuumed first but this AV worker has to vacuum other 799 tables even if table-A requires vacuum later again. If an another AV worker launches during table-A being vacuumed, the new AV worker would include table-A but would not process it because concurrent AV worker is processing it. So it would vacuum other tables instead. Similarly, this AV worker can not get the new table list until finish to vacuum all other tables. (Note that it might skip some tables if they are already vacuumed by other AV worker.) On the other hand, if another new AV worker launches after table-A got vacuumed and requires vacuuming again, the new AV worker puts the table-A at the head of list. It processes table-A first but, again, it has to vacuum other tables before getting new table list next time that might include table-A. Is this the expected behavior? I'd rather expect postgres to vacuum it before other lower priority tables whenever the table having the highest priority requires vacuuming, but it wouldn't. >>> >>> Yes, this is the expected behavior. The patch is the way to give the >>> user at least some control of the sorting, later it could be >>> extended with something more sophisticated. >>> >> >> Since user doesn't know that each AV worker processes tables based on >> its table list that is different from lists that other worker has, I >> think it's hard for user to understand this parameter. I'd say that >> user would expect that high priority table can get vacuumed any time. > > Yes, very good point. It could be strange for the user in cases like > that. > >> >> I think what you want to solve is to vacuum some tables preferentially >> if there are many tables requiring vacuuming. Right? 
If so, I think >> the prioritizing table only in the list would not solve the >> fundamental issue. In the example, table-A will still need to wait for >> other 799 tables to get vacuumed. Table-A will be bloating during >> vacuuming other tables. To deal with it, I think we need something >> queue on the shmem per database in order to control the order of >> tables waiting for vacuuming and need to use it with a smart >> algorithm. Thoughts? > > Agree, it would require some shared queue for the autovacuum workers if > we want to prioritize the table across all of them. I will look into > this, and maybe will come up with something. > > Masahiko, are you working on this too or just interested with the idea? > Hello, I have a hard time
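The shared queue being discussed above can be modelled in a few lines. This is a hypothetical sketch of the scheduling policy only: a real implementation would live in shared memory with locking and would be written in C; the class and table names here are invented for illustration.

```python
import heapq

class VacuumQueue:
    """Toy model of a per-database queue that always hands an autovacuum
    worker the highest-priority table currently needing vacuum, instead of
    each worker walking its own private table list."""

    def __init__(self):
        self._heap = []     # (-priority, seq, table); seq keeps FIFO order on ties
        self._seq = 0
        self._queued = set()

    def request(self, table: str, priority: int) -> None:
        if table in self._queued:       # already waiting; don't queue twice
            return
        heapq.heappush(self._heap, (-priority, self._seq, table))
        self._seq += 1
        self._queued.add(table)

    def next_table(self):
        if not self._heap:
            return None
        _, _, table = heapq.heappop(self._heap)
        self._queued.discard(table)
        return table

q = VacuumQueue()
q.request("t_low", 0)
q.request("table_a", 10)    # the high-priority table from the example above
q.request("t_low2", 0)
print(q.next_table())       # table_a is handed out first
```

The point of the sketch is that with a shared queue, "table-A" can jump ahead of the 799 lower-priority tables whenever it re-enters the queue, which the per-worker sorted list cannot guarantee.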
Re: NEXT VALUE FOR sequence
Laurenz Albe writes: > The SQL standard has the expression "NEXT VALUE FOR asequence" to do > what we traditionally do with "nextval('asequence')". > This is an attempt to implement this on top of the recently introduced > NextValueExpr node. This has been proposed repeatedly, and rejected repeatedly, because in fact the standard's semantics for NEXT VALUE FOR are *not* like nextval(). See SQL:2011 4.22.2 "Operations involving sequence generators": If there are multiple instances of <next value expression>s specifying the same sequence generator within a single SQL-statement, all those instances return the same value for a given row processed by that SQL-statement. This is not terribly exact --- what is a "processed row" in a join query, for instance? But it's certainly not supposed to act like independent executions of nextval() or NextValueExpr would. Pending somebody doing the legwork to produce something that at least arguably conforms to the spec's semantics, we've left the syntax unimplemented. regards, tom lane
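The distinction Tom describes — all instances of NEXT VALUE FOR yielding one value per processed row, versus nextval() advancing on every call — can be shown with a toy model. This is illustrative only; the class and its per-row cache are invented here to mimic the spec's wording, not PostgreSQL code.

```python
import itertools

class Sequence:
    """Toy model contrasting nextval() with the spec's NEXT VALUE FOR:
    nextval() advances the generator on every call, while NEXT VALUE FOR
    advances it at most once per processed row, however many instances
    appear in the statement."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._row_cache = {}        # row id -> value fetched for that row

    def nextval(self) -> int:
        return next(self._counter)

    def next_value_for(self, row_id) -> int:
        if row_id not in self._row_cache:
            self._row_cache[row_id] = next(self._counter)
        return self._row_cache[row_id]

s = Sequence()
# Two nextval() calls within one "row" yield different values...
print(s.nextval(), s.nextval())                            # 1 2
# ...but two NEXT VALUE FOR instances in one row yield the same value.
print(s.next_value_for("row1"), s.next_value_for("row1"))  # 3 3
print(s.next_value_for("row2"))                            # 4
```

The hard part Tom points at is, of course, defining what `row_id` means for an arbitrary join query — this model simply assumes one exists.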
Re: autovacuum: change priority of the vacuumed tables
On Fri, 16 Feb 2018 21:48:14 +0900 Masahiko Sawadawrote: > On Fri, Feb 16, 2018 at 7:50 PM, Ildus Kurbangaliev > wrote: > > On Fri, 16 Feb 2018 17:42:34 +0900 > > Masahiko Sawada wrote: > > > >> On Thu, Feb 15, 2018 at 10:16 PM, Grigory Smolkin > >> wrote: > >> > On 02/15/2018 09:28 AM, Masahiko Sawada wrote: > >> > > >> >> Hi, > >> >> > >> >> On Thu, Feb 8, 2018 at 11:01 PM, Ildus Kurbangaliev > >> >> wrote: > >> >>> > >> >>> Hi, > >> >>> > >> >>> Attached patch adds 'autovacuum_table_priority' to the current > >> >>> list of automatic vacuuming settings. It's used in sorting of > >> >>> vacuumed tables in autovacuum worker before actual vacuum. > >> >>> > >> >>> The idea is to give possibility to the users to prioritize > >> >>> their tables in autovacuum process. > >> >>> > >> >> Hmm, I couldn't understand the benefit of this patch. Would you > >> >> elaborate it a little more? > >> >> > >> >> Multiple autovacuum worker can work on one database. So even if > >> >> a table that you want to vacuum first is the back of the list > >> >> and there other worker would pick up it. If the vacuuming the > >> >> table gets delayed due to some big tables are in front of that > >> >> table I think you can deal with it by increasing the number of > >> >> autovacuum workers. > >> >> > >> >> Regards, > >> >> > >> >> -- > >> >> Masahiko Sawada > >> >> NIPPON TELEGRAPH AND TELEPHONE CORPORATION > >> >> NTT Open Source Software Center > >> >> > >> > > >> > Database can contain thousands of tables and often > >> > updates/deletes concentrate mostly in only a handful of tables. > >> > Going through thousands of less bloated tables can take ages. > >> > Currently autovacuum know nothing about prioritizing it`s work > >> > with respect to user`s understanding of his data and > >> > application. > >> > >> Understood. I have a question; please imagine the following case. 
> >> > >> Suppose that there are 1000 tables in a database, and one table of > >> them (table-A) has the highest priority while other 999 tables have > >> same priority. Almost tables (say 800 tables) including table-A > >> need to get vacuumed at some point, so with your patch an AV > >> worker listed 800 tables and table-A will be at the head of the > >> list. Table-A will get vacuumed first but this AV worker has to > >> vacuum other 799 tables even if table-A requires vacuum later > >> again. > >> > >> If an another AV worker launches during table-A being vacuumed, the > >> new AV worker would include table-A but would not process it > >> because concurrent AV worker is processing it. So it would vacuum > >> other tables instead. Similarly, this AV worker can not get the > >> new table list until finish to vacuum all other tables. (Note that > >> it might skip some tables if they are already vacuumed by other AV > >> worker.) On the other hand, if another new AV worker launches > >> after table-A got vacuumed and requires vacuuming again, the new > >> AV worker puts the table-A at the head of list. It processes > >> table-A first but, again, it has to vacuum other tables before > >> getting new table list next time that might include table-A. > >> > >> Is this the expected behavior? I'd rather expect postgres to > >> vacuum it before other lower priority tables whenever the table > >> having the highest priority requires vacuuming, but it wouldn't. > > > > Yes, this is the expected behavior. The patch is the way to give the > > user at least some control of the sorting, later it could be > > extended with something more sophisticated. > > > > Since user doesn't know that each AV worker processes tables based on > its table list that is different from lists that other worker has, I > think it's hard for user to understand this parameter. I'd say that > user would expect that high priority table can get vacuumed any time. Yes, very good point. 
It could be strange for the user in cases like that. > > I think what you want to solve is to vacuum some tables preferentially > if there are many tables requiring vacuuming. Right? If so, I think > the prioritizing table only in the list would not solve the > fundamental issue. In the example, table-A will still need to wait for > other 799 tables to get vacuumed. Table-A will be bloating during > vacuuming other tables. To deal with it, I think we need something > queue on the shmem per database in order to control the order of > tables waiting for vacuuming and need to use it with a smart > algorithm. Thoughts? Agree, it would require some shared queue for the autovacuum workers if we want to prioritize the table across all of them. I will look into this, and maybe will come up with something. Masahiko, are you working on this too or just interested with the idea? -- --- Ildus Kurbangaliev Postgres Professional: http://www.postgrespro.com Russian
Re: CURRENT OF causes an error when IndexOnlyScan is used
01.02.2018 05:12, Tom Lane: Yugo Nagatawrites: I'm sorry the patch attached in the previous mail is broken and not raises a compile error. I attached the fixed patch. This patch is almost certainly wrong: you can't assume that the scan-level state matches the tuple we are currently processing at top level. Any sort of delaying action, for instance a sort or materialize node in between, would break it. We need to either fix this aspect: IndexOnlyScan returns a virtual tuple that doesn't have system column, so we can not get ctid in the same way of other plans. I'd like to propose the patch that fixes the issue. We already have a way to return heaptuple from IndexOnlyScan, but it was not applied to b-tree for some reason. Attached patch solves the reported bug. Moreover, it will come in handy for "index with included attributes" feature [1], where we can store long (and even TOASTed) attributes in indextuple. [1] https://commitfest.postgresql.org/17/1350/ -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c index 8158508..db8a55c 100644 --- a/src/backend/access/nbtree/nbtree.c +++ b/src/backend/access/nbtree/nbtree.c @@ -374,6 +374,8 @@ btbeginscan(Relation rel, int nkeys, int norderbys) so->currTuples = so->markTuples = NULL; scan->xs_itupdesc = RelationGetDescr(rel); + scan->xs_hitupdesc = NULL; + scan->xs_hitup = NULL; scan->opaque = so; diff --git a/src/backend/access/nbtree/nbtsearch.c b/src/backend/access/nbtree/nbtsearch.c index 51dca64..dd3e8b2 100644 --- a/src/backend/access/nbtree/nbtsearch.c +++ b/src/backend/access/nbtree/nbtsearch.c @@ -25,6 +25,7 @@ #include "utils/tqual.h" +static HeapTuple _bt_fetch_tuple(IndexScanDesc scandesc); static bool _bt_readpage(IndexScanDesc scan, ScanDirection dir, OffsetNumber offnum); static void _bt_saveitem(BTScanOpaque so, int itemIndex, @@ -38,6 +39,31 @@ static bool 
_bt_endpoint(IndexScanDesc scan, ScanDirection dir); static void _bt_drop_lock_and_maybe_pin(IndexScanDesc scan, BTScanPos sp); static inline void _bt_initialize_more_data(BTScanOpaque so, ScanDirection dir); +/* + * Fetch all keys in tuple. + * Returns a new HeapTuple containing the originally-indexed data. + */ +static HeapTuple +_bt_fetch_tuple(IndexScanDesc scandesc) +{ + Relation index = scandesc->indexRelation; + Datum fetchatt[INDEX_MAX_KEYS]; + bool isnull[INDEX_MAX_KEYS]; + int i; + HeapTuple htuple; + + for (i = 0; i < index->rd_att->natts; i++) + { + fetchatt[i] = index_getattr(scandesc->xs_itup, i + 1, + scandesc->xs_itupdesc, &isnull[i]); + } + + htuple = heap_form_tuple(scandesc->xs_hitupdesc, fetchatt, isnull); + htuple->t_tableOid = scandesc->heapRelation->rd_id; + + + return htuple; +} /* * _bt_drop_lock_and_maybe_pin() @@ -1105,8 +1131,32 @@ readcomplete: /* OK, itemIndex says what to return */ currItem = &so->currPos.items[so->currPos.itemIndex]; scan->xs_ctup.t_self = currItem->heapTid; + if (scan->xs_want_itup) + { + if (!scan->xs_hitupdesc) + { + int natts = RelationGetNumberOfAttributes(scan->indexRelation); + int attno; + /* + * The storage type of the index can be different from the original + * datatype being indexed, so we cannot just grab the index's tuple + * descriptor. Instead, construct a descriptor with the original data + * types.
+ */ + scan->xs_hitupdesc = CreateTemplateTupleDesc(natts, false); + for (attno = 1; attno <= natts; attno++) + { +TupleDescInitEntry(scan->xs_hitupdesc, attno, NULL, +scan->indexRelation->rd_opcintype[attno - 1], +-1, 0); + } + } + scan->xs_itup = (IndexTuple) (so->currTuples + currItem->tupleOffset); + scan->xs_hitup = _bt_fetch_tuple(scan); + scan->xs_hitup->t_self = currItem->heapTid; + } return true; } @@ -1155,8 +1205,34 @@ _bt_next(IndexScanDesc scan, ScanDirection dir) /* OK, itemIndex says what to return */ currItem = &so->currPos.items[so->currPos.itemIndex]; scan->xs_ctup.t_self = currItem->heapTid; + if (scan->xs_want_itup) + { + if (!scan->xs_hitupdesc) + { + int natts = RelationGetNumberOfAttributes(scan->indexRelation); + int attno; + /* + * The storage type of the index can be different from the original + * datatype being indexed, so we cannot just grab the index's tuple + * descriptor. Instead, construct a descriptor with the original data + * types. + */ + scan->xs_hitupdesc = CreateTemplateTupleDesc(natts, false); + for (attno = 1; attno <= natts; attno++) + { +TupleDescInitEntry(scan->xs_hitupdesc, attno, NULL, +scan->indexRelation->rd_opcintype[attno - 1], +-1, 0); + } + } + scan->xs_itup = (IndexTuple) (so->currTuples + currItem->tupleOffset); + if (scan->xs_hitup) + pfree(scan->xs_hitup); + scan->xs_hitup = _bt_fetch_tuple(scan); + scan->xs_hitup->t_self = currItem->heapTid; + }
Re: SHA-2 functions
On 02/19/2018 08:43 AM, Peter Eisentraut wrote: > I also noticed while working on some SSL code that we have perfectly > good SHA-2 functionality in the server already, but it has no test > coverage outside the SCRAM tests. > > So I suggest these patches that expose the new functions sha224(), > sha256(), sha384(), sha512(). That allows us to make the SSL and SCRAM > tests more robust, and it will allow them to be used in general purpose > contexts over md5(). I didn't look closely at the patch itself, but +1 on the concept. Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development
[PROPOSAL] Nepali Snowball dictionary
Hello hackers, I would like to propose nepali snowball dictionary patch. Nepali is inflectional and derivational language. And it can be stemmed. initdb also patched, so it can determine default text search configuration. Examples: =# select ts_lexize('nepali_stem', 'लेख्'); ts_lexize --- {लेख्} =# select ts_lexize('nepali_stem', 'लेखछेस्'); ts_lexize --- {लेख} =# select ts_lexize('nepali_stem', 'लेखे'); ts_lexize --- {लेखे} Authors: - Oleg Bartunov - Nepali NLP Group Is it appropriate to add new snowball dictionaries? I'm not sure about policy of including new snowball dictionaries. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company diff --git a/doc/src/sgml/textsearch.sgml b/doc/src/sgml/textsearch.sgml index 610b7bf033..c9c7de52ad 100644 --- a/doc/src/sgml/textsearch.sgml +++ b/doc/src/sgml/textsearch.sgml @@ -3723,6 +3723,7 @@ Parser: "pg_catalog.default" pg_catalog | german_stem | snowball stemmer for german language pg_catalog | hungarian_stem | snowball stemmer for hungarian language pg_catalog | italian_stem| snowball stemmer for italian language + pg_catalog | nepali_stem | snowball stemmer for nepali language pg_catalog | norwegian_stem | snowball stemmer for norwegian language pg_catalog | portuguese_stem | snowball stemmer for portuguese language pg_catalog | romanian_stem | snowball stemmer for romanian language diff --git a/src/backend/snowball/Makefile b/src/backend/snowball/Makefile index 50cbace41d..c29f4184f2 100644 --- a/src/backend/snowball/Makefile +++ b/src/backend/snowball/Makefile @@ -40,6 +40,7 @@ OBJS= $(WIN32RES) dict_snowball.o api.o utilities.o \ stem_UTF_8_german.o \ stem_UTF_8_hungarian.o \ stem_UTF_8_italian.o \ + stem_UTF_8_nepali.o \ stem_UTF_8_norwegian.o \ stem_UTF_8_porter.o \ stem_UTF_8_portuguese.o \ @@ -62,6 +63,7 @@ LANGUAGES= \ german german \ hungarian hungarian \ italian italian \ + nepali nepali \ norwegian norwegian \ portuguese portuguese \ romanianromanian\ diff --git 
a/src/backend/snowball/dict_snowball.c b/src/backend/snowball/dict_snowball.c index 78c9f73ef0..d96c849118 100644 --- a/src/backend/snowball/dict_snowball.c +++ b/src/backend/snowball/dict_snowball.c @@ -49,6 +49,7 @@ #include "snowball/libstemmer/stem_UTF_8_german.h" #include "snowball/libstemmer/stem_UTF_8_hungarian.h" #include "snowball/libstemmer/stem_UTF_8_italian.h" +#include "snowball/libstemmer/stem_UTF_8_nepali.h" #include "snowball/libstemmer/stem_UTF_8_norwegian.h" #include "snowball/libstemmer/stem_UTF_8_porter.h" #include "snowball/libstemmer/stem_UTF_8_portuguese.h" @@ -102,6 +103,7 @@ static const stemmer_module stemmer_modules[] = {"german", PG_UTF8, german_UTF_8_create_env, german_UTF_8_close_env, german_UTF_8_stem}, {"hungarian", PG_UTF8, hungarian_UTF_8_create_env, hungarian_UTF_8_close_env, hungarian_UTF_8_stem}, {"italian", PG_UTF8, italian_UTF_8_create_env, italian_UTF_8_close_env, italian_UTF_8_stem}, + {"nepali", PG_UTF8, nepali_UTF_8_create_env, nepali_UTF_8_close_env, nepali_UTF_8_stem}, {"norwegian", PG_UTF8, norwegian_UTF_8_create_env, norwegian_UTF_8_close_env, norwegian_UTF_8_stem}, {"porter", PG_UTF8, porter_UTF_8_create_env, porter_UTF_8_close_env, porter_UTF_8_stem}, {"portuguese", PG_UTF8, portuguese_UTF_8_create_env, portuguese_UTF_8_close_env, portuguese_UTF_8_stem}, diff --git a/src/backend/snowball/libstemmer/stem_UTF_8_nepali.c b/src/backend/snowball/libstemmer/stem_UTF_8_nepali.c new file mode 100644 index 00..f4f6e656ad --- /dev/null +++ b/src/backend/snowball/libstemmer/stem_UTF_8_nepali.c @@ -0,0 +1,440 @@ +/* This file was generated automatically by the Snowball to ISO C compiler */ +/* http://snowballstem.org/ */ + +#include "header.h" + +#ifdef __cplusplus +extern "C" { +#endif +extern int nepali_UTF_8_stem(struct SN_env * z); +#ifdef __cplusplus +} +#endif +static int r_remove_category_2(struct SN_env * z); +static int r_remove_category_3(struct SN_env * z); +static int r_check_category_2(struct SN_env * z); +static 
int r_remove_category_1(struct SN_env * z); +#ifdef __cplusplus +extern "C" { +#endif + + +extern struct SN_env * nepali_UTF_8_create_env(void); +extern void nepali_UTF_8_close_env(struct SN_env * z); + + +#ifdef __cplusplus +} +#endif +static const symbol s_0_0[6] = { 0xE0, 0xA4, 0x95, 0xE0, 0xA5, 0x80 }; +static const symbol s_0_1[9] = { 0xE0, 0xA4, 0xB2, 0xE0, 0xA4, 0xBE, 0xE0, 0xA4, 0x87 }; +static const symbol s_0_2[6] = { 0xE0, 0xA4, 0xB2, 0xE0, 0xA5, 0x87 }; +static const symbol s_0_3[9] = { 0xE0, 0xA4, 0xB2, 0xE0, 0xA4, 0xBE, 0xE0, 0xA4, 0x88 }; +static const symbol s_0_4[6] = { 0xE0, 0xA4, 0x95, 0xE0, 0xA5, 0x88 }; +static const
Re: SHA-2 functions
Hello Peter, > So I suggest these patches that expose the new functions sha224(), > sha256(), sha384(), sha512(). That allows us to make the SSL and SCRAM > tests more robust, and it will allow them to be used in general purpose > contexts over md5(). Nice patch. I wonder though whether tests should include hashing non-ASCII characters as well. -- Best regards, Aleksander Alekseev
SHA-2 functions
There was a complaint recently about the documentation using the widely frowned-upon md5() function in an unrelated context as an example hash function. This is quite common in many examples, such as hashing row values to compare them, or hashing datums if they don't fit into an index. But there is nothing we can easily replace md5 with, without going to things like pgcrypto. I also noticed while working on some SSL code that we have perfectly good SHA-2 functionality in the server already, but it has no test coverage outside the SCRAM tests. So I suggest these patches that expose the new functions sha224(), sha256(), sha384(), sha512(). That allows us to make the SSL and SCRAM tests more robust, and it will allow them to be used in general purpose contexts over md5(). -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services From 52edfab8f1175c69ba791139fde26feb9279364a Mon Sep 17 00:00:00 2001 From: Peter EisentrautDate: Tue, 6 Feb 2018 21:46:46 -0500 Subject: [PATCH 1/2] Add user-callable SHA-2 functions Add the user-callable functions sha224, sha256, sha384, sha512. We already had these in the C code to support SCRAM, but there was no test coverage outside of the SCRAM tests. Adding these as user-callable functions allows writing some tests. Also, we have a user-callable md5 function but no more modern alternative, which led to wide use of md5 as a general-purpose hash function, which leads to occasional complaints about using md5. Also mark the existing md5 functions as leak-proof. 
--- doc/src/sgml/func.sgml | 70 - src/backend/utils/adt/varlena.c | 101 +++ src/include/catalog/pg_proc.h| 12 +++- src/test/regress/expected/opr_sanity.out | 6 ++ src/test/regress/expected/strings.out| 53 src/test/regress/sql/strings.sql | 18 ++ 6 files changed, 257 insertions(+), 3 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 1e535cf215..bab828d507 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -3640,7 +3640,7 @@ Other Binary String Functions returning the result in hexadecimal md5(E'Th\\000omas'::bytea) - 8ab2d3c9689aaf18 b4958c334c82d8b1 + 8ab2d3c9689aaf18b4958c334c82d8b1 @@ -3674,6 +3674,66 @@ Other Binary String Functions set_byte(E'Th\\000omas'::bytea, 4, 64) Th\000o@as + + + + + sha224 + +sha224(bytea) + + bytea + +SHA-224 hash + + sha224('abc') + \x23097d223405d8228642a477bda255b32aadbce4bda0b3f7e36c9da7 + + + + + + sha256 + +sha256(bytea) + + bytea + +SHA-256 hash + + sha256('abc') + \xba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad + + + + + + sha384 + +sha384(bytea) + + bytea + +SHA-384 hash + + sha384('abc') + \xcb00753f45a35e8bb5a03d699ac65007272c32ab0eded1631a8b605a43ff5bed8086072ba1e7cc2358baeca134c825a7 + + + + + + sha512 + +sha512(bytea) + + bytea + +SHA-512 hash + + sha512('abc') + \xddaf35a193617abacc417349ae20413112e6fa4e89a97ea20a9eeee64b55d39a2192992a274fc1a836ba3c23a3feebbd454d4423643ce80e2a9ac94fa54ca49f + @@ -3686,6 +3746,14 @@ Other Binary String Functions the first byte, and bit 15 is the most significant bit of the second byte. + + Note that for historic reasons, the function md5 + returns a hex-encoded value of type text whereas the SHA-2 + functions return type bytea. Use the functions + encode and decode to convert + between the two.
+ + See also the aggregate function string_agg in and the large object functions diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c index 304cb26691..d1266f26f5 100644 --- a/src/backend/utils/adt/varlena.c +++ b/src/backend/utils/adt/varlena.c @@ -23,6 +23,7 @@ #include "catalog/pg_type.h" #include "common/int.h" #include "common/md5.h" +#include "common/sha2.h" #include "lib/hyperloglog.h" #include "libpq/pqformat.h" #include "miscadmin.h" @@ -4611,6 +4612,106 @@ md5_bytea(PG_FUNCTION_ARGS) PG_RETURN_TEXT_P(cstring_to_text(hexsum)); } +/* + * SHA-2 variants + */ + +Datum +sha224_bytea(PG_FUNCTION_ARGS) +{ + bytea *in = PG_GETARG_BYTEA_PP(0); + const uint8 *data; + size_t len; + pg_sha224_ctx ctx; + unsigned char buf[PG_SHA224_DIGEST_LENGTH]; + bytea *result; + + len = VARSIZE_ANY_EXHDR(in); + data = (unsigned char *) VARDATA_ANY(in);
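The digests shown in the documentation diff above are the standard 'abc' test vectors, so they can be cross-checked against any stock SHA-2 implementation; a quick sketch in Python (not part of the patch):

```python
import hashlib

# Cross-check of the example digests in the documentation diff above.
# The SQL functions take bytea, so the natural Python analogue is bytes input.
vectors = {
    "sha224": "23097d223405d8228642a477bda255b32aadbce4bda0b3f7e36c9da7",
    "sha256": "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad",
    "sha384": "cb00753f45a35e8bb5a03d699ac65007272c32ab0eded163"
              "1a8b605a43ff5bed8086072ba1e7cc2358baeca134c825a7",
    "sha512": "ddaf35a193617abacc417349ae20413112e6fa4e89a97ea20a9eeee64b55d39a"
              "2192992a274fc1a836ba3c23a3feebbd454d4423643ce80e2a9ac94fa54ca49f",
}
for name, expected in vectors.items():
    assert hashlib.new(name, b"abc").hexdigest() == expected
print("all documented SHA-2 test vectors verified")

# The md5/SHA-2 return-type note above corresponds, in Python terms, to the
# difference between hexdigest() (hex text) and digest() (raw bytes).
print(hashlib.md5(b"Th\x00omas").hexdigest())
```

This also illustrates the point from the review discussion that the input is bytea: hashing operates on raw bytes, so "non-ASCII characters" only enter the picture via whatever encoding produced those bytes.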
Re: [HACKERS] path toward faster partition pruning
On 19 February 2018 at 22:19, Amit Langote wrote: > Attached updated patches. Thanks again! Thanks for making those changes. I've made another pass of v28 and have a few more comments. The patch is starting to look good, but there are some new changes in recent versions which still don't look quite right. 1. This does not fully make sense: /* * Remove the indexes of partitions excluded due to each of * those partitions' *all* of allowed datums appearing in * keys->ne_datums, that is compared to the partition key * using <> operator. */ "each of those partitions' *all* of allowed" is not correct. Maybe better to write: /* * Remove the indexes of any partitions which cannot possibly * contain rows matching the clauses due to key->ne_datums containing * all datum values which are allowed in the given partition. This * is only possible to do in LIST partitioning as it's the only * partitioning strategy which allows the specification of exact values. */ 2. Mine does not, but some compilers may complain about the get_partitions_for_keys() result variable being uninitialised. Probably the easiest fix would be to just assign to NULL in the default case. 3. Did you mean to put this Assert() inside the loop? memset(keyisnull, false, sizeof(keyisnull)); i = -1; while ((i = bms_next_member(keys->keyisnull, i)) >= 0) { keys->n_eqkeys++; keyisnull[i] = true; } Assert(i < partnatts); i will always be -2 at the end of the loop. Seems like a useless Assert in its current location. 4. Can you add a comment here to say: "Note: LIST partitioning only supports a single partition key, therefore this function requires no looping over the partition keys" /* * get_partitions_for_keys_list * Return partitions of a list partitioned table for requested keys * * This interprets the keys and looks up partitions in the partition bound * descriptor using the list partitioning semantics. */ 5.
The following comment contains a bit of duplication to the comment which
comes after it. Maybe the following:

/*
 * If the query is looking for null keys, there can only be one such
 * partition. Return the same if one exists.
 */

can be changed to:

/* Handle clauses requesting a NULL valued partition key */

6. The following comment does not quite make sense:

/* Exactly matching datum exists. */

Probably better to write:

/* An exact matching datum exists. */

7. "If it's not equal (<)" I think you mean (>), not (<), in:

 * The bound at minoff is <= minkeys, given the way
 * partition_list_bsearch() works. If it's not equal (<), then
 * increment minoff to make it point to the datum on the right
 * that necessarily satisfies minkeys. Also do the same if it is
 * equal but minkeys is exclusive.

However, the comment is a bit clumsy. Maybe the following is better?

/*
 * partition_list_bsearch returning a positive number means that
 * minkeys[0] must be greater than or equal to the smallest datum.
 * If we didn't find an exact matching datum (!is_equal) or if the
 * operator used was non-inclusive (>), then in both of these
 * cases we're not interested in the datum pointed to by minoff,
 * but we may start getting matches in the partition which the
 * next datum belongs to, so point to that one instead. (This may
 * be beyond the last datum in the array, but we'll detect that
 * later.)
 */

8. The following comment could be improved:

 * minkeys is greater than the datums of all non-default partitions,
 * meaning there isn't one to return. Return the default partition if
 * one exists.

how about:

 * The value of minkeys[0] is greater than all of the datums we have
 * partitions for. The only possible partition that could contain a
 * match is the default partition. Return that, if it exists.

9. The following could also be improved:

 * The bound at maxoff is <= maxkeys, given the way
 * partition_list_bsearch works.
If the bound at maxoff exactly
 * matches maxkey (is_equal), but the maxkey is exclusive, then
 * decrement maxoff to point to the bound on the left.

how about:

 * partition_list_bsearch returning a positive number means that
 * maxkeys[0] must be greater than or equal to the smallest datum.
 * If the match found is an equal match, but the operator used is
 * non-inclusive of that value (<), then the partition belonging
 * to maxoff cannot match, so we'll decrement maxoff to point to
 * the partition belonging to the previous datum. We might end up
 * decrementing maxoff down to -1, but we'll handle that later.

10. Can you append " This may not technically be true for some data types
(e.g. integer types), however, we currently lack any sort of
infrastructure to provide us with proofs that would allow us to do
anything smarter here." to:

 * For range queries, always include the default list partition,
 * because list partitions divide the key space in a discontinuous
 * manner, not all values in the given range will have a partition
 * assigned.

11.
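The minoff/maxoff adjustments discussed in points 7-9 can be illustrated with a small Python sketch. This is not the actual partition_list_bsearch code, just a simplified model of the lower-bound case using the standard bisect module (the function name is made up):

```python
from bisect import bisect_left

def prune_list_lower_bound(bound_datums, minkey, min_inclusive):
    """Return the index of the first sorted bound datum that can satisfy
    the lower-bound key.  If the search lands on an exact match but the
    operator is exclusive (>), skip past that datum.  The result may be
    len(bound_datums), i.e. beyond the last datum, which the caller must
    detect (mirroring the "we'll detect that later" note above)."""
    minoff = bisect_left(bound_datums, minkey)
    is_equal = minoff < len(bound_datums) and bound_datums[minoff] == minkey
    if is_equal and not min_inclusive:
        minoff += 1
    return minoff

print(prune_list_lower_bound([10, 20, 30], 20, True))   # 1: >= 20 matches from datum 20
print(prune_list_lower_bound([10, 20, 30], 20, False))  # 2: > 20 matches from datum 30
print(prune_list_lower_bound([10, 20, 30], 35, True))   # 3: past the end of the array
```

The upper-bound (maxoff) case is symmetric, decrementing past an exact match when the operator is non-inclusive (<), possibly down to -1.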
Re: NEXT VALUE FOR sequence
Laurenz Albe wrote:
> The SQL standard has the expression "NEXT VALUE FOR asequence" to do
> what we traditionally do with "nextval('asequence')".

The behavior mandated by the standard is that several invocations of NEXT
VALUE on the same sequence on the same output row must produce the same
value. That is:

CREATE SEQUENCE s;
SELECT NEXT VALUE FOR s, NEXT VALUE FOR s
UNION
SELECT NEXT VALUE FOR s, NEXT VALUE FOR s

should produce

(1,1)
(2,2)

It makes sense that the value does not depend on the position of the
expression in the column list. The trouble of course is that the
equivalent with nextval() would instead produce

(1,2)
(3,4)

Previous discussions on the standard syntax concluded that when it gets
into Postgres, it should come with the standard-conforming semantics. I
guess that would make for a much more difficult patch.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
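The difference between the two semantics can be modelled outside SQL. A sketch in Python, treating the sequence as a counter (purely illustrative, not PostgreSQL code):

```python
import itertools

# nextval() semantics: every reference advances the sequence, so two
# references in the same output row see different values.
seq = itertools.count(1)
rows_nextval = [(next(seq), next(seq)) for _ in range(2)]

# Standard NEXT VALUE FOR semantics: the sequence advances once per row;
# every reference within that row sees the same value.
seq2 = itertools.count(1)
rows_next_value_for = []
for _ in range(2):
    v = next(seq2)
    rows_next_value_for.append((v, v))

print(rows_nextval)         # [(1, 2), (3, 4)]
print(rows_next_value_for)  # [(1, 1), (2, 2)]
```

Implementing the second behavior requires evaluating the sequence once per row and sharing the result among all references, which is why it would be the harder patch.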
Tracking object modification time using event triggers
Hi all. I'm the lead developer of pgCodeKeeper, a tool for PostgreSQL
database schema comparison. In our tool we have a pg_dump-like schema
reader for comparing against live DB instances. This reader consumes the
majority of the time a comparison operation takes, and we had an idea to
speed it up. To do this we need to be able to track the last modification
time of every DB object, and an extension with event triggers seems like
a suitable tool for this. The extension we've implemented is available,
for anyone interested:
https://github.com/pgcodekeeper/pg_dbo_timestamp/

However, we've discovered that event triggers provide almost no data for
GRANT/REVOKE commands; in particular, there's no way to find out which
objects were altered by these commands.
pg_event_trigger_ddl_commands() does provide pg_ddl_command data, which
seems to contain the object list for GRANT, but it appears to be totally
inaccessible from plpgsql.

This leads to my question: do we need to go as far as writing a C
function for our extension to access pg_ddl_command or some other
lower-level representation? Or can we use something else to solve our
task, maybe avoiding event triggers entirely?

Thanks.
Alexander Levsha
Re: [doc fix] Correct calculation of vm.nr_hugepages
On Mon, Feb 19, 2018 at 07:05:47AM +0000, Tsunakawa, Takayuki wrote:
> The current PostgreSQL documentation overestimates the number of huge pages
> (vm.nr_hugepages) because the calculation uses the maximum virtual address
> space. In practice, huge pages are only used for the anonymous shared memory
> segment. The attached patch fixes the documentation.

+ pmap 4170 | grep rw-s | grep zero | awk '{print $2}'

One can do with fewer greps:

pryzbyj@pryzbyj:~$ sudo pmap `pgrep -P 1 -u postgres` |awk '/rw-s/&&/zero/{print $2}' # check PPID=1
144848K

or:

pryzbyj@pryzbyj:~$ sudo pmap `pgrep -u postgres |sed q` |awk '/rw-s/&&/zero/{print $2}' # check any backend, not just postmaster parent
144848K

Or (this is even less clean but gives an alternative which continues to
use /proc directly):

pryzbyj@pryzbyj:~$ sudo cat /proc/`pgrep -P 1 -u postgres`/maps |awk --non-decimal-data 'BEGIN{FS="[ -]"} /zero/{a="0x"$1;b="0x"$2;print (b-a)/1024"kB"}'
144848kB

BTW, when huge_pages=try, I've been verifying that hugepages are in use
by running:

pryzbyj@pryzbyj:~$ sudo sh -c 'cat /proc/`pgrep -u postgres -P1`/smaps |grep -c "KernelPageSize: *2048 kB"' || echo NOT FOUND
0
NOT FOUND

Justin
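Once the anonymous shared memory size is measured, the value for vm.nr_hugepages is just that size divided by the huge page size, rounded up. A sketch of the arithmetic (144848 kB is the pmap figure from the session above; 2048 kB is the usual x86_64 huge page size, shown as Hugepagesize in /proc/meminfo):

```shell
#!/bin/sh
# Anonymous shared memory segment size reported by pmap, in kB
# (value taken from the example session above).
shmem_kb=144848
# Huge page size in kB; check "Hugepagesize:" in /proc/meminfo.
hugepage_kb=2048
# Round up to whole huge pages; this is the value for vm.nr_hugepages.
echo $(( (shmem_kb + hugepage_kb - 1) / hugepage_kb ))
```

With the numbers above this prints 71, so sysctl vm.nr_hugepages=71 would cover the segment, rather than the larger value the current documentation's calculation yields.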
Re: Prefix operator for text and spgist support
Hello,

On Fri, Feb 02, 2018 at 06:03:27PM +0300, Ildus Kurbangaliev wrote:
> Hi,
>
> Attached patch introduces prefix operator ^@ for text type. For 'a ^@ b'
> it returns true if 'a' starts with 'b'. Also there is spgist index
> support for this operator.
>
> It could be useful as an alternative for LIKE for 'something%'
> templates. Or even used in LIKE queries instead of ~>=~ and ~<~ in the
> future. But it would require a new strategy for btree.

I've looked at the patch. It applies and the tests pass. I have a couple
of comments:

> +	if (ptype == Pattern_Type_Prefix)
> +	{
> +		char	   *s = TextDatumGetCString(constval);
> +		prefix = string_to_const(s, vartype);
> +		pstatus = Pattern_Prefix_Partial;
> +		rest_selec = 1.0;	/* all */
> +		pfree(s);
> +	}
> +	else
> +		pstatus = pattern_fixed_prefix(patt, ptype, collation,
> +									   &prefix, &rest_selec);

I think it is better to put the Pattern_Type_Prefix processing into
pattern_fixed_prefix() as another case entry.

Secondly, it is worth fixing the documentation. At least here [1]. Maybe
there are other places where the documentation should be fixed.

1 - https://www.postgresql.org/docs/current/static/spgist-builtin-opclasses.html

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
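The semantics being proposed are plain prefix matching, which is simpler than the LIKE equivalent because no pattern escaping is involved. A Python sketch of the intended behaviour (the function name is made up; this is not PostgreSQL code):

```python
def prefix_match(a: str, b: str) -> bool:
    """Hypothetical analogue of the proposed "a ^@ b" operator:
    true if a starts with b."""
    return a.startswith(b)

# Unlike the LIKE 'prefix%' idiom, no escaping is needed when the prefix
# itself contains pattern metacharacters such as % or _.
print(prefix_match("100% done", "100%"))    # True
print(prefix_match("something more", "so")) # True
print(prefix_match("so", "something"))      # False
```

This also hints at why the operator is indexable: a prefix test maps directly onto an SP-GiST trie descent, with no pattern preprocessing step.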
NEXT VALUE FOR sequence
The SQL standard has the expression "NEXT VALUE FOR asequence" to do what
we traditionally do with "nextval('asequence')".

This is an attempt to implement this on top of the recently introduced
NextValueExpr node.

If there is no obvious reason why we would not want that, I'll add it to
the next commitfest.

Is this behavior ok:

test=> CREATE SEQUENCE testseq;
test=> PREPARE x AS SELECT NEXT VALUE FOR testseq;
test=> EXECUTE x;
 ?column?
----------
        1
(1 row)

test=> DROP SEQUENCE testseq;
DROP SEQUENCE
test=> EXECUTE x;
ERROR:  could not open relation with OID 24836

If not, what could be done about it?

Yours,
Laurenz Albe

From 57e42f0a32be39eea541808979a41ab6feac6b73 Mon Sep 17 00:00:00 2001
From: Laurenz Albe
Date: Mon, 19 Feb 2018 12:17:59 +0100
Subject: [PATCH] Add support for NEXT VALUE FOR

NEXT VALUE FOR is the standard SQL expression for getting the next value
from a sequence, which in PostgreSQL traditionally has been the task of
the "nextval" function.

This completes the implementation of SQL standard feature T176.
---
 doc/src/sgml/func.sgml                 | 10
 doc/src/sgml/ref/create_sequence.sgml  |  3 ++-
 doc/src/sgml/syntax.sgml               | 18 ++
 src/backend/commands/tablecmds.c       |  2 ++
 src/backend/executor/execExpr.c        |  1 +
 src/backend/executor/execExprInterp.c  |  3 ++-
 src/backend/nodes/copyfuncs.c          |  2 ++
 src/backend/nodes/equalfuncs.c         |  2 ++
 src/backend/nodes/outfuncs.c           |  2 ++
 src/backend/nodes/readfuncs.c          |  2 ++
 src/backend/parser/gram.y              |  6 +
 src/backend/parser/parse_expr.c        | 43 ++
 src/backend/rewrite/rewriteHandler.c   |  2 ++
 src/backend/utils/adt/ruleutils.c      | 12 +++---
 src/include/executor/execExpr.h        |  1 +
 src/include/nodes/primnodes.h          |  7 -
 src/test/regress/expected/sequence.out | 32 -
 src/test/regress/sql/sequence.sql      | 12 +-
 18 files changed, 145 insertions(+), 15 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 487c7ff750..e135a05314 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -12095,6 +12095,16 @@
 nextval('foo'::text)      foo is looked up at
    This function requires USAGE or UPDATE privilege on the sequence.
+
+   PostgreSQL also offers the SQL standard compliant expression
+   NEXT VALUE FOR sequence_name
+   which is the same as
+   nextval('sequence_name').
+   See for details.
+
diff --git a/doc/src/sgml/ref/create_sequence.sgml b/doc/src/sgml/ref/create_sequence.sgml
index 3e0d339c85..2671e80f12 100644
--- a/doc/src/sgml/ref/create_sequence.sgml
+++ b/doc/src/sgml/ref/create_sequence.sgml
@@ -51,7 +51,8 @@ CREATE [ TEMPORARY | TEMP ] SEQUENCE [ IF NOT EXISTS ]

+ NEXT VALUE FOR
+
+ NEXT VALUE FOR
+
+ This expression is used to obtain the next value for a sequence.
+
+ NEXT VALUE FOR sequence_name
+
+ It is equivalent to calling the nextval function.
+
+
 Expression Evaluation Rules

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 89454d8e80..34a1b0684d 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -5496,6 +5496,8 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, Relation rel,
 			nve->seqid = RangeVarGetRelid(colDef->identitySequence,
 										  NoLock, false);
 			nve->typeId = typeOid;
+			/* no need to check permissions on the implicit sequence */
+			nve->checkperms = false;

 			defval = (Expr *) nve;
 		}
diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index c6eb3ebacf..2f6b536d90 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -2104,6 +2104,7 @@ ExecInitExprRec(Expr *node, ExprState *state,
 				scratch.opcode = EEOP_NEXTVALUEEXPR;
 				scratch.d.nextvalueexpr.seqid = nve->seqid;
 				scratch.d.nextvalueexpr.seqtypid = nve->typeId;
+				scratch.d.nextvalueexpr.checkperms = nve->checkperms;

 				ExprEvalPushStep(state, &scratch);
 				break;
diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index f646fd9c51..1b067eb491 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -2355,7 +2355,8 @@ ExecEvalCurrentOfExpr(ExprState *state, ExprEvalStep *op)
 void
 ExecEvalNextValueExpr(ExprState *state, ExprEvalStep *op)
 {
-	int64		newval = nextval_internal(op->d.nextvalueexpr.seqid, false);
+	int64		newval = nextval_internal(op->d.nextvalueexpr.seqid,
+										  op->d.nextvalueexpr.checkperms);

 	switch (op->d.nextvalueexpr.seqtypid)
 	{
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index bafe0d1071..aaadc8b54b 100644
--- a/src/backend/nodes/copyfuncs.c
extern keyword incorrectly used in some function definitions
While poking around partition.c I noticed that one of the functions there is *defined* as "extern". Normally we'd only do this in the declaration of the function. I don't really see why it's needed in the definition. Anyway, I removed it. I then thought I'd better look around for any others too... A patch is attached which removes them all, apart from the snowball ones. I didn't touch those. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services remove_extern_func_defs.patch Description: Binary data
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On 19 February 2018 at 18:01, David Rowley wrote:
> On 19 February 2018 at 15:11, Tomas Vondra wrote:
>> and perhaps we should do s/isproxy/is_proxy/ which seems like the usual
>> naming for boolean variables.
>
> You're right. I'll change that and post an updated patch. I'll also
> reprocess your email again and try to improve some comments to make
> the intent of the code more clear.

I've made a few changes to the patch. "isproxy" is now "is_proxy". I've
made the comments a bit more verbose at the top of the definition of
struct AppendPath. Also shuffled some comments around in allpaths.c.

I've attached both a delta patch with just these changes, and also a
completely new patch based on a recent master.

--
David Rowley
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

remove_singleton_appends_2018-02-19_delta.patch
Description: Binary data

remove_singleton_appends_2018-02-19.patch
Description: Binary data