Re: [HACKERS] Checksums by default?
On 01/23/2017 08:30 AM, Amit Kapila wrote:
> On Sun, Jan 22, 2017 at 3:43 PM, Tomas Vondra wrote:
>> That being said, I'm ready to do some benchmarking on this, so that we
>> have at least some numbers to argue about. Can we agree on a set of
>> workloads that we want to benchmark in the first round?
>
> I think if we can get data for a pgbench read-write workload when the
> data doesn't fit in shared buffers but fits in RAM, that can give us
> some indication. We can try varying the ratio of shared buffers to data
> size. This should exercise the checksum code both when buffers are
> evicted and at the next read. I think it also makes sense to check the
> WAL data size for each of those runs.

Yes, I'm thinking that's pretty much the worst case for an OLTP-like
workload, because it has to evict buffers from shared buffers, generating
a continuous stream of writes. Doing that on good storage (e.g. a PCIe
SSD, or possibly tmpfs) will further limit the storage overhead, making
the time spent computing checksums much more significant. Makes sense?

What about other types of workload? I think we should not look only at
write-heavy workloads - I wonder what the overhead of verifying checksums
is in read-only workloads (again, with data that fits into RAM). What
about large data loads simulating OLAP, and exports (e.g. pg_dump)?

That leaves us with 4 workload types, I guess:

1) read-write OLTP (shared buffers < data < RAM)
2) read-only OLTP (shared buffers < data < RAM)
3) large data loads (COPY)
4) large data exports (pg_dump)

Anything else?

The other question is of course hardware - IIRC there are differences
between CPUs. I do have a new E5-2620 v4, but perhaps it'd be good to
also do some testing on a POWER machine, or an older Intel CPU.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
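[A minimal pgbench sketch of workloads 1 and 2 above; the scale factor,
client counts and durations are placeholders, to be tuned so that
shared_buffers < data < RAM on the machine at hand:

    pgbench -i -s 1000 bench            # ~15 GB of data at scale 1000

    # 1) read-write OLTP
    pgbench -M prepared -c 16 -j 16 -T 1800 bench

    # 2) read-only OLTP (select-only)
    pgbench -M prepared -S -c 16 -j 16 -T 1800 bench
]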
Re: [HACKERS] Checksums by default?
On Sun, Jan 22, 2017 at 3:43 PM, Tomas Vondra wrote:
>
> That being said, I'm ready to do some benchmarking on this, so that we
> have at least some numbers to argue about. Can we agree on a set of
> workloads that we want to benchmark in the first round?
>

I think if we can get data for a pgbench read-write workload when the
data doesn't fit in shared buffers but fits in RAM, that can give us some
indication. We can try varying the ratio of shared buffers to data size.
This should exercise the checksum code both when buffers are evicted and
at the next read. I think it also makes sense to check the WAL data size
for each of those runs.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support
On Thu, Jan 19, 2017 at 7:46 AM, Haribabu Kommi wrote:
>
> Updated patch attached.
>

I've looked at the updated patch. There are still some changes that need
to be done:

1. Add macaddr8 support for btree_gist, with test cases.
2. The implementation of macaddr8 support for btree_gin is incomplete.
   You need to modify the contrib/btree_gin/*.sql files as well. Test
   cases need to be added too.

Other than that, I've tested the basic implementation of the feature. It
looks good.

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
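[For reference, the kind of coverage being asked for - a sketch of a
minimal test, assuming the macaddr8 opclasses the patch is expected to
add to both contrib modules:

    CREATE EXTENSION btree_gist;
    CREATE TABLE m (addr macaddr8);
    CREATE INDEX m_gist_idx ON m USING gist (addr);

    -- and similarly for btree_gin:
    CREATE EXTENSION btree_gin;
    CREATE INDEX m_gin_idx ON m USING gin (addr);
]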
Re: [HACKERS] Checksums by default?
> On 21 Jan 2017, at 18:18, Petr Jelinek wrote:
>
> On 21/01/17 11:39, Magnus Hagander wrote:
>> Is it time to enable checksums by default, and give initdb a switch to
>> turn it off instead?

+1

> I'd like to see a benchmark first, both in terms of CPU and in terms of
> produced WAL (= network traffic), given that it turns on logging of
> hint bits.

Here are the results of my testing for 9.4 in December 2014. The
benchmark was done on a production-like use case where all the data fits
in memory (~ 20 GB) but doesn't fit into shared_buffers (it was
intentionally small - 128 MB on the stress test stand), so that
shared_blk_read / shared_blk_hit = 1/4. The workload was a typical OLTP
one, but with mostly write queries (80% writes, 20% reads). Here are the
numbers of WAL files written during the test:

    defaults         263
    wal_log_hints    410
    checksums        367

I couldn't find the answer to why WAL write amplification is even worse
for wal_log_hints than for checksums, but I checked several times and it
always reproduced.

As for CPU, I couldn't see the overhead [1], and perf top showed me less
than 2% spent in calculating CRCs.

For all new DBs we now enable checksums at initdb time, and several dozen
of our shards use checksums now. I don't see any performance difference
for them compared with non-checksummed clusters. And we have already had
one case where we caught data corruption with checksums.

[1] https://yadi.sk/i/VAiWjv6t3AQCs2?lang=en

--
May the force be with you…
https://simply.name
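[One way to reproduce the WAL-volume comparison without counting segment
files, using psql - a sketch with the 9.4-era function names:

    -- capture the starting position before the benchmark
    SELECT pg_current_xlog_location() AS start_lsn \gset

    -- ... run the workload ...

    -- amount of WAL generated by the run
    SELECT pg_size_pretty(
        pg_xlog_location_diff(pg_current_xlog_location(), :'start_lsn'));
]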
Re: [HACKERS] Assignment of valid collation for SET operations on queries with UNKNOWN types.
On Mon, Jan 23, 2017 at 7:53 AM, Tom Lane wrote:
>> 2. I didn't do anything about docs, either, though maybe no change
>> is needed. One user-visible change from this is that queries should
>> never return any "unknown"-type columns to clients anymore. But I
>> think that is not documented now, so maybe there's nothing to change.
>
> The only thing I could find in the SGML docs that directly addresses
> unknown-type columns was a remark in the CREATE VIEW man page, which
> I've updated. I also added a section to typeconv.sgml to document
> the behavior.

This looks good.

>> 3. We need to look at whether pg_upgrade is affected. I think we
>> might be able to let it just ignore the issue for views, since they'd
>> get properly updated during the dump and reload anyway. But if someone
>> had an actual table (or matview) with an "unknown" column, that would
>> be a pg_upgrade hazard, because "unknown" doesn't store like "text".
>
> I tested and found that simple views with unknown columns seem to
> update sanely if we just let pg_upgrade dump and restore them. So I
> suggest we allow that to happen. There might be cases where dependent
> views behave unexpectedly after such a conversion, but I think that
> would be rare enough that it's not worth forcing users to fix these
> views by hand before updating. However, tables with unknown columns
> would fail the upgrade (since we'd reject the CREATE TABLE command),
> while matviews would be accepted but then the DDL wouldn't match the
> physical data storage. So I added code to pg_upgrade to check for those
> cases and refuse to proceed. This is almost a straight copy-and-paste
> of the existing pg_upgrade code for checking for "line" columns.
>
> I think this is committable now; the other loose ends can be dealt
> with in follow-on patches. Does anyone want to review it?

I have spent a couple of hours looking at it, and it looks in pretty good
shape. The concept of doing the checks in the parser is far cleaner than
what was proposed upthread to tweak the column lists, and more generic.

One thing though: even with this patch, it is still possible to define a
domain with unknown as the underlying type and have a table grab it:

    create domain name as unknown;
    create table foo_name (a name);

I think that this case should be restricted as well, and pg_upgrade had
better complain with a lookup at typbasetype in pg_type.

--
Michael
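[For context, the behavior change in question - a sketch of the cases the
patch is described as affecting:

    -- previously created an "unknown"-typed column; with the patch the
    -- untyped literal is resolved to text instead:
    CREATE TABLE t AS SELECT 'foo' AS a;

    -- declaring the pseudo-type explicitly is rejected outright:
    CREATE TABLE u (a unknown);  -- ERROR expected with the patch
]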
Re: [HACKERS] [PATCH] Fix minor race in commit_ts SLRU truncation vs lookups
On 23 January 2017 at 12:34, Craig Ringer wrote:
> On 20 January 2017 at 21:40, Alvaro Herrera wrote:
>
>> One option would be to add another limit Xid which advances before the
>> truncation but which is not used for other decisions other than
>> limiting what users can consult.
>
> This could be useful for other things, but it's probably heavier than
> needed.
>
> What I've done in the latest revision of the txid_status() patch is
> simply to advance OldestXid _before_ truncating the clog. The rest of
> the xid info is advanced after. Currently this is incorporated into
> the txid_status patch, but can be separated if desired.
>
> Relevant commit message portion:
>
>     There was previously no way to look up an arbitrary xid without
>     running the risk of having clog truncated out from under you. This
>     hasn't been a problem because anything looking up xids in clog
>     knows they're protected by datminxid, but that's not the case for
>     arbitrary user-supplied XIDs. clog was truncated before we advance
>     oldestXid, so taking XidGenLock was insufficient. There's no way to
>     look up an SLRU with soft-failure. To address this, increase
>     oldestXid under XidGenLock before we truncate clog rather than
>     after, so concurrent access is safe.
>
>     Note that while oldestXid is advanced before clog truncation, the
>     xid limits are advanced _after_ it. If we advanced the xid limits
>     before truncation too, we'd theoretically run the risk of
>     allocating an xid from the clog section we're about to truncate,
>     which would be no fun. (In practice it can't really happen since we
>     only use 1/2 the available space at a time.)
>
> Moving the lower bound up, truncating, and moving the upper bound up
> is the way to go IMO.

Patch with explanation in commit msg:

https://www.postgresql.org/message-id/camsr+yhodeduua5vw1_rjps_assvemejn_34rqd3pzhf5of...@mail.gmail.com

--
Craig Ringer                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
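[A minimal sketch of the resulting ordering in clog truncation - names
abbreviated and details elided; an illustration of the described
approach, not the literal patch:

    /* 1. advance the lower bound first, under XidGenLock */
    LWLockAcquire(XidGenLock, LW_EXCLUSIVE);
    ShmemVariableCache->oldestXid = oldestXact;
    LWLockRelease(XidGenLock);

    /* 2. now it is safe to truncate the clog SLRU */
    SimpleLruTruncate(ClogCtl, cutoffPage);

    /* 3. xid wraparound limits are advanced only after truncation */
    SetTransactionIdLimit(oldestXact, oldestXactDB);
]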
Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree
On Sat, Jan 21, 2017 at 4:25 AM, Andrew Borodin wrote:
> Hello Jeff!
>
>> Review of the code itself:
>
> What do you think - is there anything else to improve in that patch, or
> can we mark it as 'Ready for committer'?
>
> I've checked the concurrency algorithm against the original work of
> Lehman and Yao on the B-link-tree. To me everything looks fine (safe,
> deadlock free, and no increased probability of livelock due to the
> LockBufferForCleanup call).

Hi,

I looked at the patch some more. I have some reservations about the
complexity and novelty of the approach. I don't think I've seen an
approach quite like this one before. It seems like the main reason you
are using this approach is because the btree structure was too simple (no
left links); is that right? My gut is telling me we should either leave
it simple, or use a real concurrent btree implementation.

One idea I had that might be simpler is to use a two-stage page delete.
The first stage would remove the link from the parent and mark the page
deleted, but leave the right link intact and prevent recycling. The
second stage would follow the chain of right links along each level,
removing the right links to deleted pages and freeing the pages to be
recycled.

Another idea is to partition the posting tree so that the root page
actually points to K posting trees. Scans would need to merge them, but
it would allow inserts to progress in one tree while another is being
vacuumed. I think this would give good concurrency even for K=2. I just
had this idea now, so I didn't think it through very well. What do you
think?

Regards,
Jeff Davis
Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)
On 28 December 2016 at 18:00, Craig Ringer wrote:
> On 23 December 2016 at 18:00, Craig Ringer wrote:
>
>> I'll have to follow up with a patch soon, as it's Toddler o'Clock.
>
> Here we go.
>
> This patch advances oldestXid, under XidGenLock, _before_ truncating
> clog.
>
> txid_status() holds XidGenLock from when it tests oldestXid until it's
> done looking up clog, thus eliminating the race.
>
> CLOG_TRUNCATE records now contain the oldestXid, so they can advance
> oldestXid on a standby, or when we've truncated clog since the most
> recent checkpoint on the master during recovery. It's advanced under
> XidGenLock during redo to protect against this race on standby.
>
> As outlined in my prior mail I think this is the right approach. I
> don't like taking XidGenLock twice, but we don't advance datfrozenxid
> much so it's not a big concern. While a separate ClogTruncationLock
> could be added like in my earlier patch, oldestXid is currently under
> XidGenLock and I'd rather not change that.
>
> The biggest change here is that oldestXid is advanced separately from
> the vac limits in the rest of ShmemVariableCache. As far as I can tell
> we don't prevent two manual VACUUMs on different DBs from trying to
> concurrently run vac_truncate_clog, so this has to be safe against two
> invocations racing each other. Rather than try to lock out such
> concurrency, the patch ensures that oldestXid can never go backwards.
> It doesn't really matter if the vac limits go backwards; it's no worse
> than what can already happen in the current code.
>
> We cannot advance the vacuum limits before we truncate the clog away,
> in case someone tries to access a very new xid (if we're near
> wraparound).
>
> I'm pretty sure that commit timestamps suffer from the same flaw as
> Robert identified upthread with clog. This patch fixes the clog race,
> but not the similar one in commit timestamps. Unlike the clog race
> with txid_status(), the commit timestamps one is already potentially
> user-visible since we allow arbitrary xids to be looked up for commit
> timestamps. I'll address that separately.

Rebased patch attached. I've split the clog changes out from
txid_status() itself.

There is relevant discussion on the commit timestamp truncation fix
thread, where the similar fix for commit_ts got committed.

https://www.postgresql.org/message-id/flat/979ff13d-0b8e-4937-01e8-2925c0adc306%402ndquadrant.com#979ff13d-0b8e-4937-01e8-2925c0adc...@2ndquadrant.com

--
Craig Ringer                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From f8e5e89145fee7afa9c629c80a3d578fc31e21b4 Mon Sep 17 00:00:00 2001
From: Craig Ringer
Date: Mon, 23 Jan 2017 13:25:30 +0800
Subject: [PATCH 1/2] Fix race between clog truncation and lookup

There was previously no way to look up an arbitrary xid without running
the risk of having clog truncated out from under you. This hasn't been a
problem because anything looking up xids in clog knows they're protected
by datminxid, but that's not the case for arbitrary user-supplied XIDs.
clog was truncated before we advanced oldestXid under XidGenLock, so
holding XidGenLock during a clog lookup was insufficient to prevent the
race. There's no way to look up an SLRU with soft-failure; attempting a
lookup produces an I/O error. There's also no safe way to trap and
swallow the SLRU lookup error, due mainly to locking issues.

To address this, increase oldestXid under XidGenLock before we truncate
clog rather than after, so concurrent lookups of arbitrary XIDs are safe
if they are done under XidGenLock. The rest of the xid limits are still
advanced after clog truncation, to ensure there's no chance of a new xid
trying to access an about-to-be-truncated clog page. In practice this
can't happen anyway since we use only half the xid space at any time, but
additional guards against future change are warranted with something this
crucial.

This race also exists in a worse form on standby servers. On a standby we
only advance oldestXid when we replay the next checkpoint, so there's a
much larger window between clog truncation and the subsequent updating of
the limit. Fix this by recording the oldest xid in clog truncation
records and applying the oldestXid under XidGenLock before replaying the
clog truncation.

Note that there's no need to take XidGenLock for normal clog lookups
protected by datfrozenxid, only if accepting arbitrary XIDs that might
not be protected by vacuum thresholds.
---
 src/backend/access/rmgrdesc/clogdesc.c | 12 +--
 src/backend/access/transam/clog.c      | 33 +
 src/backend/access/transam/varsup.c    | 38 --
 src/backend/access/transam/xlog.c      | 17 +++
 src/backend/commands/vacuum.c          | 13
 src/include/access/clog.h              |  5 +
 src/include/access/transam.h           |  2 ++
 7 files changed, 103 insertions(+), 17 deletions(-)

diff --git a/src
Re: [HACKERS] Parallel Index Scans
On Fri, Jan 20, 2017 at 7:29 AM, Haribabu Kommi wrote:
>
> On Thu, Jan 19, 2017 at 1:18 AM, Amit Kapila wrote:
>>>
>>> +extern BlockNumber _bt_parallel_seize(IndexScanDesc scan, bool
>>> *status);
>>> +extern void _bt_parallel_release(IndexScanDesc scan, BlockNumber
>>> scan_page);
>>>
>>> Any better names for the above functions, as these functions will
>>> provide/set the next page that needs to be read?
>>>
>>
>> These functions also set the state of the scan. IIRC, these names were
>> agreed between Robert and Rahila as well (suggested offlist by
>> Robert). I am open to change if you or others have any better
>> suggestions.
>
> I didn't find any better names other than the following:
>
> _bt_get_next_parallel_page
> _bt_set_next_parallel_page
>

I am not sure using *_next_* here will convey the message, because for
backward scans we set the last page. I am open to changing the names of
the functions if the committer and/or others prefer the names suggested
by you.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] increasing the default WAL segment size
Hello,

Please find attached an updated WIP patch. I have incorporated almost all
comments. This is to be applied on top of Robert's patches. I will post
performance results later on.

1. Shift (>>) and AND (&) operations: the assign hook of wal_segment_size
sets WalModMask and WalShiftBit. All the modulo and division operations
using XLogSegSize have been replaced with these. However, there are many
preprocessor macros in xlog_internal.h which divide by XLogSegSize. I
have not changed these, because it would mean I would have to reassign
WalShiftBit along with XLogSegSize in all the modules which use these
macros, which does not seem to be a good idea. As a consequence, the
shift operator can be used in only a couple of places.

2. pg_standby: it deals with WAL files, so I have used the file size to
set XLogSegSize (similar to pg_xlogdump). Also, the macro
MaxSegmentsPerLogFile using XLOG_SEG_SIZE is now defined in
SetWALFileNameForCleanup, where it is used. Since XLOG_SEG_SIZE is no
longer preset, the code which throws a message if the file size is
greater than XLOG_SEG_SIZE had to be removed.

3. XLOGChooseNumBuffers: this function, called during the creation of the
shared memory segment, requires XLogSegSize, which is set from the
ControlFile. Hence, temporarily read the ControlFile in XLOGShmemSize
before invoking XLOGChooseNumBuffers. The ControlFile is read again and
stored in shared memory later on.

4. IsValidXLogSegSize: a macro to verify the XLogSegSize. It is used in
initdb, pg_xlogdump and pg_standby.

5. Macro for power of 2: there were a couple of ideas to make it
centralised. For now, I have just defined it in xlog_internal.h.

6. The CRC is used to verify the ControlFile before reading all the
contents from it as is, so I do not see the need to re-verify
xlog_seg_size separately.

7. min/max_wal_size still take values in KB units and internally store
them as segment counts, though the calculation is now shifted to their
respective assign hooks, as the GUC_UNIT_XSEGS unit had to be removed.

8. Declaring XLogSegSize: there are two internal variables for the same
parameter. In the original code, XLOG_SEG_SIZE is defined in the
auto-generated file src/include/pg_config.h, and xlog_internal.h defines:

    #define XLogSegSize ((uint32) XLOG_SEG_SIZE)

To avoid renaming all parts of the code, I made the following change in
xlog_internal.h:

    +extern uint32 XLogSegSize;
    +#define XLOG_SEG_SIZE XLogSegSize

Would it be better to just use the one variable XLogSegSize everywhere?
A few external modules could be using XLOG_SEG_SIZE. Thoughts?

9. Documentation will be added in the next version of the patch.

--
Beena Emerson

On Sat, Jan 21, 2017 at 5:30 AM, Michael Paquier wrote:
> On Sat, Jan 21, 2017 at 4:50 AM, Robert Haas wrote:
>> On Fri, Jan 20, 2017 at 2:34 AM, Michael Paquier wrote:
>>> On Fri, Jan 20, 2017 at 12:49 AM, Robert Haas wrote:
>>>> On Wed, Jan 18, 2017 at 12:42 PM, Robert Haas wrote:
>>>>> Problems 2-4 actually have to do with a DestReceiver of type
>>>>> DestRemote really, really wanting to have an associated Portal and
>>>>> database connection, so one approach is to create a stripped-down
>>>>> DestReceiver that doesn't care about those things and then passing
>>>>> that to GetPGVariable.
>>>>
>>>> I tried that and it worked out pretty well, so I'm inclined to go
>>>> with this approach. Proposed patches attached. 0001 adds the new
>>>> DestReceiver type, and 0002 is a revised patch to implement the
>>>> SHOW command itself.
>>>>
>>>> Thoughts, comments?
>>>
>>> This looks like a sensible approach to me. DestRemoteSimple could be
>>> useful for background workers that are not connected to a database
>>> as well. Isn't there a problem with PGC_REAL parameters?
>>
>> No, because the output of SHOW is always of type text, regardless of
>> the type of the GUC.
>
> Thinking about that over night, that looks pretty nice. What would be
> nicer though would be to add INT8OID and BYTEAOID to the list, and
> convert the other replication commands as well. At the end, I think
> that we should finish by being able to remove all pq_* routine
> dependencies in walsender.c, saving quite a couple of lines.
> --
> Michael

--
Thank you,
Beena Emerson

Have a Great Day!

initdb-walsegsize-v2_WIP.patch
Description: Binary data
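[A sketch of the shift/AND replacements described in point 1 above -
WalModMask and WalShiftBit are the names from the patch; pg_ceil_log2 is
a hypothetical helper used here only for illustration:

    /* set once in the assign hook of wal_segment_size */
    WalShiftBit = pg_ceil_log2(XLogSegSize);  /* log2 of segment size */
    WalModMask  = XLogSegSize - 1;            /* size is a power of 2 */

    /* then, instead of ptr % XLogSegSize and ptr / XLogSegSize: */
    offset = ptr & WalModMask;
    segno  = ptr >> WalShiftBit;
]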
Re: [HACKERS] Parallel Index Scans
On Sat, Jan 21, 2017 at 12:23 PM, Amit Kapila wrote:
> On Sat, Jan 21, 2017 at 1:15 AM, Robert Haas wrote:
>>
>> I think it's a good idea to put all three of those functions together
>> in the listing, similar to what we did in
>> 69d34408e5e7adcef8ef2f4e9c4f2919637e9a06 for FDWs. After all they are
>> closely related in purpose, and it may be easiest to understand if
>> they are next to each other in the listing. I suggest that we move
>> them to the end in IndexAmRoutine similar to the way FdwRoutine was
>> done; in other words, my idea is to make the structure consistent
>> with the way that I revised the documentation instead of making the
>> documentation consistent with the order you picked for the structure
>> members. What I like about that is that it gives a good opportunity
>> to include some general remarks on parallel index scans in a central
>> place, as I did in that patch. Also, it makes it easier for people
>> who care about parallel index scans to find all of the related things
>> (since they are together) and for people who don't care about them to
>> ignore it all (for the same reason). What do you think about that
>> approach?
>>
>
> Sounds sensible. Updated patch based on that approach is attached.
>

In spite of being careful, I missed reorganizing the functions in
genam.h, which I have done in the attached patch.

> I will rebase the remaining work based on this patch and send them
> separately.
>

Rebased patches are attached. I have fixed a few review comments in these
patches.

parallel_index_scan_v6 - Changed the function btparallelrescan so that it
always expects a valid parallel scan descriptor.

parallel_index_opt_exec_support_v6 - Removed the usage of
GatherSupportsBackwardScan. Expanded the comments in ExecReScanIndexScan.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

parallel-generic-index-scan.2.patch
Description: Binary data

parallel_index_scan_v6.patch
Description: Binary data

parallel_index_opt_exec_support_v6.patch
Description: Binary data
Re: [HACKERS] Declarative partitioning - another take
On 2017/01/21 6:29, Robert Haas wrote:
> On Fri, Jan 20, 2017 at 1:15 AM, Andres Freund wrote:
>> On 2017-01-19 14:18:23 -0500, Robert Haas wrote:
>>> Committed.
>>
>> One of the patches around this topic committed recently seems to cause
>> valgrind failures like
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-01-19%2008%3A40%3A02
>
> Tom Lane independently reported this same issue. I believe I've fixed
> it. Sorry for not noticing and crediting this report also.

Thanks Robert for looking at this and committing the fix.

Regards,
Amit
Re: [HACKERS] Valgrind-detected bug in partitioning code
On 2017/01/21 9:01, Tom Lane wrote:
> Robert Haas writes:
>> The difference is that those other equalBLAH functions call a
>> carefully limited amount of code whereas, in looking over the
>> backtrace you sent, I realized that equalPartitionDescs is calling
>> partition_bounds_equal which does this:
>>     cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[j],
>>                                              key->partcollation[j],
>>                                              b1->datums[i][j],
>>                                              b2->datums[i][j]))
>
> Ah, gotcha.
>
>> That's of course opening up a much bigger can of worms. But apart
>> from the fact that it's unsafe, I think it's also wrong, as I said
>> upthread. I think calling datumIsEqual() there should be better all
>> around. Do you think that's unsafe here?
>
> That sounds like a plausible solution. It is safe in the sense of
> being a bounded amount of code. It would return "false" in various
> interesting cases like toast pointer versus detoasted equivalent,
> but I think that would be fine in this application.

Sorry for jumping in late. The attached patch replaces the call to the
partitioning-specific comparison function with a call to datumIsEqual().

I wonder if it is safe to assume that datumIsEqual() would return true
for a datum and a copy of it made using datumCopy(). The latter is used
to copy a single datum from a bound's Const node (which is what is stored
in the catalog for every bound).

> It would probably be a good idea to add something to datumIsEqual's
> comment to the effect that trying to make it smarter would be a bad
> idea, because some callers rely on it being stupid.

I assume "making datumIsEqual() smarter" here means making it account for
toasting of varlena datums, which is not a good idea because some of its
callers may be working in the context of an aborted transaction. I tried
to update the header comment along these lines, though please feel free
to rewrite it.

Thanks,
Amit

diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index ad95b1bc55..8c9a373f1f 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -639,12 +639,14 @@ partition_bounds_equal(PartitionKey key,
 				continue;
 			}
 
-			/* Compare the actual values */
-			cmpval = DatumGetInt32(FunctionCall2Coll(&key->partsupfunc[j],
-													 key->partcollation[j],
-													 b1->datums[i][j],
-													 b2->datums[i][j]));
-			if (cmpval != 0)
+			/*
+			 * Compare the actual values; note that we do not invoke the
+			 * partitioning specific comparison operator here, because we
+			 * could be in an aborted transaction.
+			 */
+			if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j],
+							  key->parttypbyval[j],
+							  key->parttyplen[j]))
 				return false;
 		}
 
diff --git a/src/backend/utils/adt/datum.c b/src/backend/utils/adt/datum.c
index 535e4277cc..071a7d4db1 100644
--- a/src/backend/utils/adt/datum.c
+++ b/src/backend/utils/adt/datum.c
@@ -209,6 +209,10 @@ datumTransfer(Datum value, bool typByVal, int typLen)
  * of say the representation of zero in one's complement arithmetic).
  * Also, it will probably not give the answer you want if either
  * datum has been "toasted".
+ *
+ * Do not try to make this any smarter than it currently is with respect
+ * to "toasted" datums, because some of the callers could be working in
+ * the context of an aborted transaction.
  *-------------------------------------------------------------------
  */
 bool
Re: [HACKERS] Logical replication launcher's bgworker enabled by default, and max_logical_replication_workers
On 23 January 2017 at 13:19, Jaime Casanova wrote:
> On 22 January 2017 at 23:37, Michael Paquier wrote:
>> Hi all,
>>
>> When spawning a new instance, I found the following thing, which is
>> surprising at first sight:
>> postgres: bgworker: logical replication launcher
>
> This is because the downstream needs it
> https://www.postgresql.org/message-id/CAMsr%2BYHH2XRUeqWT6pn_X0tFpP5ci7Fsrsn67TDXbFLeMknhBA%40mail.gmail.com

... and the launcher is responsible for launching workers for
downstreams.

We could probably have the postmaster check whether any logical
replication downstreams exist anywhere and avoid starting the launcher,
but that means the postmaster has to start poking into the logical
replication catalog tables. That seems unnecessarily risky.

--
Craig Ringer                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Logical replication launcher's bgworker enabled by default, and max_logical_replication_workers
On 22 January 2017 at 23:37, Michael Paquier wrote:
> Hi all,
>
> When spawning a new instance, I found the following thing, which is
> surprising at first sight:
> postgres: bgworker: logical replication launcher
>

This is because the downstream needs it
https://www.postgresql.org/message-id/CAMsr%2BYHH2XRUeqWT6pn_X0tFpP5ci7Fsrsn67TDXbFLeMknhBA%40mail.gmail.com

> In the same range of thoughts, it is also surprising to have the
> default value of max_logical_replication_workers set to 4, I would
> have thought that for a feature that has created more than 13k of code
> diffs, it would be disabled by default.
>

+1, we should set that to 0 before release

--
Jaime Casanova                www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal
On Mon, Jan 23, 2017 at 2:06 PM, Venkata B Nagothi wrote:
>
> On Sat, Jan 14, 2017 at 5:10 AM, Vladimir Rusinov wrote:
>>
>> Attached are two new versions of the patch: one keeps aliases, one
>> doesn't.
>
> Both the patches (with and without aliases) are not getting applied to
> the latest master. Below is the error -
>
> Perhaps you used the wrong -p or --strip option?
> The text leading up to this was:
> --
> |diff --git a/src/include/access/xlog_fn.h b/src/include/access/xlog_fn.h
> |index a013d047a2..a46b2f995c 100644
> |--- a/src/include/access/xlog_fn.h
> |+++ b/src/include/access/xlog_fn.h
> --
> File to patch:

xlog_fn.h is not necessary anymore, as all the declarations of functions
defined in pg_proc.h are now generated automatically, so the conflict is
simple enough to resolve.

--
Michael
Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal
On Sat, Jan 14, 2017 at 5:10 AM, Vladimir Rusinov wrote:
> Attached are two new versions of the patch: one keeps aliases, one
> doesn't. Also, remove a stray reference to an xlog function in one of
> the tests.
>
> I've lost vote count. Should we create a form to calculate which one of
> the patches should be committed?
>
> If we decide to go the extension way, perhaps it can be maintained
> outside of core Postgres?

My take on having aliases to the functions: in my opinion as a DBA, I
agree with having no aliases. Having functions doing the same thing under
two different names could be confusing.

I have been doing quite a number of PostgreSQL upgrades over the past few
years, and I do not see the function name changes as a major issue or a
road-blocker in the upgrade exercise. If the function name has changed,
it has changed. It would be a serious thing to consider during the
upgrade only if there were a change in the functionality of a particular
function, as that needs testing. Changing the function names in scripts
and custom tools has to be done anyway, and might just take up a bit of
extra time.

IMHO, it is not a good idea to keep aliases, as this would make DBAs
lazier about changing the function names promptly in automated
scripts/jobs/tools during upgrades. Especially in bigger and more complex
environments, where database environments are handled by multiple
different groups of DBAs, it is possible that both *xlog* functions and
*wal* functions would end up used in the scripts, all of them working
fine - and once the *xlog* function names are completely removed, some of
the jobs/scripts would start failing.

I would vote for "no aliases".

Regards,

Venkata Balaji N
Database Consultant
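[For reference, if the extension route were taken, the aliases could be
plain SQL wrappers - a sketch, not the actual proposed extension:

    CREATE FUNCTION pg_switch_xlog() RETURNS pg_lsn
        LANGUAGE sql AS 'SELECT pg_switch_wal()';
]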
Re: [HACKERS] Commit fest 2017-01 will begin soon!
Michael, all,

* Michael Paquier (michael.paqu...@gmail.com) wrote:
> There is one week left for this commit fest, and now is the time to
> push for things that have a chance to get in. As of now, there is a
> total of 25 patches waiting for some love from a committer.

Many thanks for your work as CFM this month!

> In the category of Clients, there are some as well:
> - pgbench - more operators and functions.

I don't mind working to review these, but my recollection is that there's
actually still a question about just what operators, what functions, and
how we want to handle them. In other words, this particular patch, IMV,
could really use a 'summary' email, most likely from the author, covering
the various opinions raised and the different viewpoints presented.
Otherwise, I'm not sure it's really going to make much more progress.

> - pgpassfile connection option.
> - pg_dump, pg_dumpall and data durability.

I'm interested in these. I had thought the data durability one had
another committer already interested and involved, but if not, I may get
to it.

> Now for Miscellaneous:
> - Rename pg_switch_xlog to pg_switch_wal, the name is misleading here
> as this is basically renaming all the system functions including the
> term "xlog" prefix to "wal".

I'm a definite +1 on that happening but, for as much as it's marked
'ready for committer', it's entirely without consensus. I'm not sure how
to resolve that - we could ask some committee to investigate it further
and provide a final determination (core? some other set of folks?), or
simply tally the votes cast and decide to go with whatever the result of
that is, or change the options (and then figure out how to decide between
whatever the new options are..).

> For Monitoring & Control:
> - amcheck (B-Tree integrity checking tool). This patch is in such a
> state for a rather long time...

I'm a big +1 on having this included. In my view, this is similar to the
'enable checksums by default' question, but without nearly *all* of the
downside. There's no performance impact and there's no usability issue;
the only concerns that I see off-hand are serious bugs that screw things
up and the possibility that it throws an error when nothing is wrong,
neither of which looks very likely considering it's been used extensively
by a number of people and quite a bit at Heroku, as I understand it. At
this point, I tend to doubt that we're going to see such errors without
the broader community embracing and using it, which means, to me at
least, that it's ready to go into core/contrib.

> For server features:
> - Forbid use of LF and CR characters in database and role names.

I'll try and take a look at this. In other words, for me, this comes
before the pgpassfile/pg_dump data durability stuff. I have a couple of
other things to clear off my plate, but I expect to get those taken care
of and still have time to handle this too.

> As of the current score of the CF, for a total of 155 patches, 52 have
> been committed, 3 rejected and 7 marked as returned with feedback. It
> may look low, but actually it is not that bad by experience looking at
> those numbers.

We've certainly had some really big patches land during this CF. I'm
hopeful that we won't have any "really big" patches landing for the last
CF, or PG10 is going to be difficult to wrestle out the door.

Thanks again, Michael!

Stephen
Re: [HACKERS] Parallel bitmap heap scan
On Thu, Jan 19, 2017 at 12:26 AM, Robert Haas wrote:
>> Patches 0001 and 0003 required a rebase on the latest head. 0002 is
>> still the same.
>
> I've committed the first half of 0001.

Thanks. 0001 and 0003 required rebasing after this commit.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

0001-opt-parallelcost-refactoring-v12.patch
Description: Binary data

0002-hash-support-alloc-free-v12.patch
Description: Binary data

0003-parallel-bitmap-heap-scan-v12.patch
Description: Binary data
[HACKERS] Logical replication launcher's bgworker enabled by default, and max_logical_replication_workers
Hi all,

When spawning a new instance, I found the following thing, which is
surprising at first sight:

    postgres: bgworker: logical replication launcher

There is perhaps no problem with keeping that enabled by default until
release 10 wraps, to give it some buildfarm coverage, similarly to what
was done last year for parallel query. What surprises me is that this
gets started even if wal_level is *not* logical. I think that something
like the attached patch is needed, so that ApplyLauncherRegister() is a
no-op if wal_level < logical.

In the same range of thoughts, it is also surprising to have the default
value of max_logical_replication_workers set to 4. I would have thought
that for a feature that has created more than 13k of code diffs, it would
be disabled by default.

Thanks,
--
Michael

logirep-bgworker.patch
Description: application/download
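[A sketch of the guard the attached patch presumably adds -
ApplyLauncherRegister() is the existing registration function; the exact
patch contents are not shown here:

    void
    ApplyLauncherRegister(void)
    {
        BackgroundWorker bgw;

        /* skip registration when the launcher could never do anything */
        if (max_logical_replication_workers == 0 ||
            wal_level < WAL_LEVEL_LOGICAL)
            return;

        /* ... fill in bgw and call RegisterBackgroundWorker(&bgw) ... */
    }
]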
Re: [HACKERS] [PATCH] Fix minor race in commit_ts SLRU truncation vs lookups
On 20 January 2017 at 21:40, Alvaro Herrera wrote:
> One option would be to add another limit Xid which advances before the
> truncation but which is not used for other decisions other than
> limiting what users can consult.

This could be useful for other things, but it's probably heavier than
needed.

What I've done in the latest revision of the txid_status() patch is
simply to advance OldestXid _before_ truncating the clog. The rest of the
xid info is advanced after. Currently this is incorporated into the
txid_status patch, but can be separated if desired.

Relevant commit message portion:

    There was previously no way to look up an arbitrary xid without
    running the risk of having clog truncated out from under you. This
    hasn't been a problem because anything looking up xids in clog knows
    they're protected by datminxid, but that's not the case for arbitrary
    user-supplied XIDs. clog was truncated before we advance oldestXid,
    so taking XidGenLock was insufficient. There's no way to look up an
    SLRU with soft-failure. To address this, increase oldestXid under
    XidGenLock before we truncate clog rather than after, so concurrent
    access is safe.

    Note that while oldestXid is advanced before clog truncation, the xid
    limits are advanced _after_ it. If we advanced the xid limits before
    truncation too, we'd theoretically run the risk of allocating an xid
    from the clog section we're about to truncate, which would be no fun.
    (In practice it can't really happen since we only use 1/2 the
    available space at a time.)

Moving the lower bound up, truncating, and moving the upper bound up is
the way to go IMO.

> Another option is not to implement direct reads from the clog.

I think there's a pretty decent argument for having clog lookups;
txid_status(...) serves as a useful halfway position between accepting
indeterminate commit status on connection loss and using full 2PC.

> Yet another option is that before we add such interface somebody
> produces proof that the problem does not, in fact, exist.

It does exist.

--
Craig Ringer                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
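[For context, the intended use of the function - a sketch; the return
values are those proposed in the txid_status() patch, and the xid shown
is a placeholder for one captured earlier with txid_current():

    -- after losing the connection mid-commit, ask what happened:
    SELECT txid_status(123456);
    -- expected: 'committed', 'aborted' or 'in progress'; xids older
    -- than what clog retains are exactly the case the truncation race
    -- is about
]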
Re: [HACKERS] Commit fest 2017-01 will begin soon!
From: pgsql-hackers-ow...@postgresql.org
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Michael Paquier
> - Cascading standby cannot catch up and get stuck emitting the same
> message repeatedly, a 9.3-only bug fix.

Thank you for your hard work as a CFM. For a trivial note, this is for
9.2 only, not 9.3.

Regards
Takayuki Tsunakawa
Re: [HACKERS] [PATCH] Fix minor race in commit_ts SLRU truncation vs lookups
On 20 January 2017 at 05:32, Peter Eisentraut wrote:
> On 1/19/17 10:06 AM, Alvaro Herrera wrote:
>>> Also, I wonder whether we should not in vacuum.c change the order of
>>> the calls of SetTransactionIdLimit() and SetMultiXactIdLimit() as
>>> well, just to keep everything consistent.
>>
>> I am wary of doing that. The current coding is well battle-tested by
>> now, but doing things in the opposite order, not at all. Pending some
>> testing to show that there is no problem with a change, I would leave
>> things as they are.
>
> I appreciate this hesitation.
>
> The issue the patch under discussion is addressing is that we have a
> user-space interface to read information about commit timestamps. So
> if we truncate away old information before we update the mark where
> the good information starts, then we get the race. We don't have a
> user-space interface reading, say, the clog, but it doesn't seem
> implausible for that to exist at some point. How would this code have
> to be structured then?

I have a patch in the current commitfest that exposes just such a user
interface, txid_status(). See https://commitfest.postgresql.org/12/730/ .
Robert was about to commit when he identified this race in commit status
lookup, which led me to identify the same race addressed here for commit
timestamps.

>> What I fear is: what happens if a concurrent checkpoint reads the
>> values between the two operations, and a crash occurs? I think that
>> the checkpoint might save the updated values, so after crash recovery
>> the truncate would not be executed, possibly leaving files around.
>> Leaving files around might be dangerous for multixacts at least (it's
>> probably harmless for xids).
>
> Why is leaving files around dangerous?

As far as I can tell, leaving the files around as such isn't dangerous.
The problem arises if we wrap around and start treating old SLRU contents
as new again without first truncating them away.

--
Craig Ringer                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Fix a comment in freelist.c
> I've found a mistake in a comment of StrategyNotifyBgWriter
> in freelist.c. bgwriterLatch was replaced by bgwprocno in
> the following commit, but it is still mentioned in the comment.
>
> commit d72731a70450b5e7084991b9caa15cb58a2820df
> Author: Andres Freund
> Date: Thu Dec 25 18:24:20 2014 +0100
>
>     Lockless StrategyGetBuffer clock sweep hot path.

Looks good to me. I will commit/push the patch if there's no objection.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [HACKERS] Commit fest 2017-01 will begin soon!
On Wed, Jan 4, 2017 at 4:02 PM, Michael Paquier wrote:
> Here are now the current stats of the patches:
> - Needs review: 82.
> - Waiting on Author: 18.
> - Ready for Committer: 18.
> Meaning that some effort is required from committers and reviewers to
> get into a better balance. For the authors of the patches waiting for
> their input, please avoid letting your patch sit in a latent state for
> too long.
>
> And of course, please double-check that the status of your patch on
> the CF app (https://commitfest.postgresql.org/12/) is set according to
> the status of the thread.
>
> Let's have a productive CF for Postgres 10! And the show begins..

Thanks. There is one week left for this commit fest, and now is the time
to push for things that have a chance to get in. As of now, there is a
total of 25 patches waiting for some love from a committer.

In the category of Bug Fixes, there are the following things that deserve
attention:
- pgwin32_is_service not checking if SECURITY_SERVICE_SID is disabled,
  a Windows-only patch and a bug fix.
- Cascading standby cannot catch up and gets stuck emitting the same
  message repeatedly, a 9.3-only bug fix.
- Potential data loss of 2PC files, a bug fix to address the fact that
  pg_twophase is never fsync()'d.
All those patches apply to multiple branches.

In the category of Clients, there are some as well:
- pgbench - more operators and functions.
- pgpassfile connection option.
- pg_dump, pg_dumpall and data durability.

Now for Miscellaneous:
- Slab-like memory allocators for reorderbuffer.
- Rename pg_switch_xlog to pg_switch_wal; the name is misleading here,
  as this is basically renaming all the system functions that include
  the "xlog" term to use "wal".
- Add GUCs for predicate lock promotion thresholds.
- An isolation test for SERIALIZABLE READ ONLY DEFERRABLE.

For Monitoring & Control:
- amcheck (B-Tree integrity checking tool). This patch has been in such
  a state for a rather long time...

For Performance:
- Group mode for CLOG updates.
- Vacuum: allow usage of more than 1GB of work mem.
- Parallel Index Scans.
- Unique Joins.

For Replication & Recovery:
- Write Ahead Logging for Hash Indexes.
- WAL consistency check facility.
- Reset slot xmin when HS feedback is turned off while standby is shut
  down.

For server features:
- Forbid use of LF and CR characters in database and role names.
- XMLTABLE function.
- Background sessions.
- pg_background contrib module proposal.
- A contrib extension to help prevent common errors in DML.
- Add pg_current_logfile() function to obtain the current log file used
  by the syslogger.
- pg_xlogdump follow into the future when specifying a filename.
  Magnus, are you going to take care of that?

As of the current score of the CF, for a total of 155 patches, 52 have
been committed, 3 rejected and 7 marked as returned with feedback. It may
look low, but in my experience it is actually not that bad, looking at
those numbers. There is also a very high number of patches waiting for
input from the author, so please be sure to update your patch status
correctly, or send a new version to address the comments from reviewers.

--
Michael
Re: [HACKERS] pg_dump, pg_dumpall and data durability
On Tue, Nov 29, 2016 at 1:30 PM, Michael Paquier wrote:
> On Mon, Nov 14, 2016 at 6:07 PM, Michael Paquier wrote:
>> On Mon, Nov 14, 2016 at 6:04 PM, Albe Laurenz wrote:
>>> Michael Paquier wrote:
>>>> Meh. I forgot docs and --help output updates.
>>>
>>> Looks good, except that you left the "N" option in the getopt_long
>>> call for pg_dumpall. I fixed that in the attached patch.
>>
>> No, v5 has removed it, but it does not matter much now...
>>
>>> I'll mark the patch "ready for committer".
>>
>> Thanks!
>
> Moved to CF 2017-01.

Moved to CF 2017-03 with the same status, ready for committer. Perhaps
there is some interest in this feature? v5 of the patch still applies,
with a couple of hunks, so v6 is attached.

--
Michael

pgdump-sync-v6.patch
Description: application/download
Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol
On Wed, Jan 18, 2017 at 2:46 PM, Michael Paquier wrote:
> FWIW, this patch is in a "waiting on author" state and that's right.
> As the discussion on SASLprepare() and the decisions regarding the way
> to implement it, or at least have it, are still pending, I am not
> planning to move on with any implementation until we have a plan about
> what to do. Just using libidn (LGPL) for a first shot is rather
> painless but... I am not alone here.

With decisions on this matter pending, I am marking this patch as
"returned with feedback". If there is a consensus on what to do, I'll be
happy to do the implementation with the last CF in March in sight. If
not, that would mean that this feature will not be part of PG 10.

--
Michael
[HACKERS] Enabling replication connections by default in pg_hba.conf
Hi all,

As wal_level = replica has now become the default for Postgres 10, could
we consider making replication connections enabled by default in
pg_hba.conf as well? This requires just uncommenting a couple of lines in
pg_hba.conf.sample.

Thanks,
--
Michael

pghba_rep_default.patch
Description: application/download
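[For reference, the commented-out lines in question in pg_hba.conf.sample
look roughly like this once uncommented - with "trust" standing in here
for the authentication method substituted by initdb:

    local   replication     all                             trust
    host    replication     all     127.0.0.1/32            trust
    host    replication     all     ::1/128                 trust
]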
Re: [HACKERS] Checksums by default?
From: pgsql-hackers-ow...@postgresql.org
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Magnus Hagander
> Is it time to enable checksums by default, and give initdb a switch to
> turn it off instead?
>
> I keep running into situations where people haven't enabled it, because
> (a) they didn't know about it, or (b) their packaging system ran initdb
> for them so they didn't even know they could. And of course they
> usually figure this out once the db has enough data and traffic that
> the only way to fix it is to set up something like
> slony/bucardo/pglogical and a whole new server to deal with it.. (Which
> is something that would also be good to fix -- but having the default
> changed would be useful as well)

+10

I was wondering why the community had decided to turn it off by default.
IIRC, the reason was that the performance overhead was 20-30% when the
entire data directory was placed on tmpfs, but that's not as important as
having data protection by default.

Regards
Takayuki Tsunakawa
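[For anyone who wants this behavior today: checksums can only be chosen
at cluster creation time, e.g.

    initdb --data-checksums -D /path/to/data

The proposal would invert the default rather than change the mechanism;
the name of the opt-out switch has not been settled in this thread.]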
Re: [HACKERS] Logical Replication WIP
On 20/01/17 22:30, Petr Jelinek wrote:
> Since it's not exactly straightforward to find when these need to be
> initialized based on commands, I decided to move the initialization
> code to exec_replication_command(), since that's always called before
> anything else, which makes it much less error prone (patch 0003).
>
> The 0003 should be backpatched all the way to 9.4 where multiple
> commands started using those buffers.

Actually there is a better place: WalSndInit().

Just to make it easier for PeterE (or whichever committer picks this up),
I attached all the logical replication follow-up fix/polish patches:

0001 - Changes libpqrcv_connect to use the async libpq API so that it
won't get stuck forever in case the connect is stuck. This is a
preexisting bug that also affects walreceiver, but it's less visible
there as there is no SQL interface to initiate a connection.

0002 - Closes the replication connection when CREATE SUBSCRIPTION gets
canceled (otherwise the walsender on the other side may stay in "idle in
transaction" state).

0003 - Fixes buffer initialization in walsender that I found when testing
the above two. This one should be back-patched to 9.4 since it's broken
since then.

0004 - Fixes the foreign key issue reported by Thom Brown and also adds
tests for FK and trigger handling.

0005 - Adds support for renaming publications and subscriptions.

All rebased on top of current master (90992e0).

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From e2c30b258e97fbba51c8d0f6e12ac557cafb129a Mon Sep 17 00:00:00 2001
From: Petr Jelinek
Date: Fri, 20 Jan 2017 21:20:56 +0100
Subject: [PATCH 1/5] Use asynchronous connect API in libpqwalreceiver
---
 src/backend/postmaster/pgstat.c                    |  4 +-
 .../libpqwalreceiver/libpqwalreceiver.c            | 58 +-
 src/include/pgstat.h                               |  2 +-
 3 files changed, 59 insertions(+), 5 deletions(-)

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 7176cf1..19ad6b5 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3340,8 +3340,8 @@ pgstat_get_wait_client(WaitEventClient w)
 		case WAIT_EVENT_WAL_RECEIVER_WAIT_START:
 			event_name = "WalReceiverWaitStart";
 			break;
-		case WAIT_EVENT_LIBPQWALRECEIVER_READ:
-			event_name = "LibPQWalReceiverRead";
+		case WAIT_EVENT_LIBPQWALRECEIVER:
+			event_name = "LibPQWalReceiver";
 			break;
 		case WAIT_EVENT_WAL_SENDER_WAIT_WAL:
 			event_name = "WalSenderWaitForWAL";
 			break;
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 7df3698..8dc51d4 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -112,6 +112,7 @@ libpqrcv_connect(const char *conninfo, bool logical, const char *appname,
 				 char **err)
 {
 	WalReceiverConn *conn;
+	PostgresPollingStatusType status;
 	const char *keys[5];
 	const char *vals[5];
 	int			i = 0;
@@ -138,7 +139,60 @@ libpqrcv_connect(const char *conninfo, bool logical, const char *appname,
 	vals[i] = NULL;
 
 	conn = palloc0(sizeof(WalReceiverConn));
-	conn->streamConn = PQconnectdbParams(keys, vals, /* expand_dbname = */ true);
+	conn->streamConn = PQconnectStartParams(keys, vals,
+											/* expand_dbname = */ true);
+
+	/* Check for conn status. */
+	if (PQstatus(conn->streamConn) == CONNECTION_BAD)
+	{
+		*err = pstrdup(PQerrorMessage(conn->streamConn));
+		return NULL;
+	}
+
+	/* Poll connection. */
+	do
+	{
+		int			rc;
+		int			extra_flag;
+
+		/* Determine which function we should use for Polling. */
+		status = PQconnectPoll(conn->streamConn);
+
+		/* Next action based upon status value. */
+		switch (status)
+		{
+			case PGRES_POLLING_READING:
+				extra_flag = WL_SOCKET_READABLE;
+				/* pass through */
+			case PGRES_POLLING_WRITING:
+				extra_flag = WL_SOCKET_WRITEABLE;
+
+				ResetLatch(&MyProc->procLatch);
+				rc = WaitLatchOrSocket(&MyProc->procLatch,
+									   WL_POSTMASTER_DEATH |
+									   WL_LATCH_SET | extra_flag,
+									   PQsocket(conn->streamConn),
+									   0,
+									   WAIT_EVENT_LIBPQWALRECEIVER);
+				if (rc & WL_POSTMASTER_DEATH)
+					exit(1);
+
+				/* Interrupted. */
+				if (rc & WL_LATCH_SET)
+				{
+					CHECK_FOR_INTERRUPTS();
+					break;
+				}
+				break;
+
+			/* Either can continue polling or finished one way or another. */
+			default:
+				break;
+		}
+
+		/* Loop until we have OK or FAILED status. */
+	} while (status != PGRES_POLLING_OK && status != PGRES_POLLING_FAILED);
+
+	/* Check the status. */
 	if (PQstatus(conn->streamConn) != CONNECTION_OK)
 	{
 		*err = pstrdup(PQerrorMessage(conn->streamConn));
@@ -508,7 +562,7 @@ libpqrcv_PQexec(PGconn *streamConn, const char *query)
 							   WL_LATCH_SET,
 							   PQsocket(streamConn),
 							   0,
-
\if, \elseif, \else, \endif (was Re: [HACKERS] PSQL commands: \quit_if, \quit_unless)
> > > Fabien is pressed for time, so I've been speaking with him out-of-thread > about how I should go about implementing it. > > The v1 patch will be \if expr, \elseif expr, \else, \endif, where expr > will be naively evaluated via ParseVariableBool(). > > \ifs and \endifs must be in the same "file" (each MainLoop will start a > new if-stack). This is partly for sanity (you can see the pairings unless > the programmer is off in \gset meta-land), partly for ease of design (data > structures live in MainLoop), but mostly because it would be an absolute > requirement if we ever got around to doing \while. > > I hope to have something ready for the next commitfest. > > As for the fate of \quit_if, I can see it both ways. On the one hand, > it's super-simple, already written, and handy. > > On the other hand, it's easily replaced by > > \if expr > \q > \endif > > > So I'll leave that as a separate reviewable patch. > > As for loops, I don't think anyone was pushing for implementing \while > now, only to have a decision about what it would look like and how it would > work. There's a whole lot of recording infrastructure (the input could be a > stream) needed to make it happen. Moreover, I think \gexec scratched a > lot of the itches that would have been solved via a psql looping structure. > And here's the patch. I've changed the subject line and will be submitting a new entry to the commitfest. diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index 9915731..2c3fccd 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -2007,6 +2007,83 @@ hello 10 + +\if expr +\elseif expr +\else +\endif + + +This group of commands implements nestable conditional blocks, like +this: + + +\if true +\if 1 +\if yes +\if on +\echo 'all true' +all true +\endif +\endif +\endif +\endif +\if false +\elseif 0 +\elseif no +\elseif off +\else +\echo 'all false' +all false +\endif +\if true +\else +\echo 'should not print #1' +\endif +\if false +\elseif true +\else +\echo 'should not print #2' +\endif + + +Conditional blocks must begin with a \if and end +with an \endif, and the pairs must be found in +the same source file. If an EOF is reached on the main file or an +\include-ed file before all +\if-\endif pairs are matched, then +psql will raise an error. + + +The \if and \elseif commands +read the rest of the line and evaluate it as a boolean expression. +Currently, expressions are limited to the same values allowed for +other option booleans (true, false, 1, 0, on, off, yes, no, etc). + + +Queries within a false branch of a conditional block will not be +sent to the server. + + +Non-conditional \-commands within a false branch +of a conditional block will not be evaluated for correctness. The +command will be ignored along with all remaining input to the end +of the line. + + +Expressions on \if and \elseif +commands within a false branch of a conditional block will not be +evaluated. + + +A conditional block can contain at most one \else command. + + +The \elseif command cannot follow the +\else command. + + + \ir or \include_relative filename diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index 4139b77..d4e0bb8 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -49,6 +49,7 @@ #include "psqlscanslash.h" #include "settings.h" #include "variables.h" +#include "fe_utils/psqlscan_int.h" /* * Editable database object types. 
@@ -132,7 +133,7 @@ HandleSlashCmds(PsqlScanState scan_state, status = PSQL_CMD_ERROR; } - if (status != PSQL_CMD_ERROR) + if ((status != PSQL_CMD_ERROR) && psqlscan_branch_active(scan_state)) { /* eat any remaining arguments after a valid command */ /* note we suppress evaluation of backticks here */ @@ -194,6 +195,32 @@ read_connect_arg(PsqlScanState scan_state) return result; } +/* + * Read and interpret argument as a boolean expression. + */ +static bool +read_boolean_expression(PsqlScanState scan_state, const char *action) +{ + bool result = false; + char *expr = psql_scan_slash_option(scan_state, + OT_NORMAL, NULL, false); + if (expr) + { + if (ParseVariableBool(expr, action)) + result = true; + free(expr); + } + return result; +} + +static bool +is_branching_command(const char *cmd) +{ +
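To make the v1 semantics concrete, here is a sketch of the guard-clause pattern this enables, assuming the behavior described above. The table name is invented for illustration, and the ::text cast makes the fetched boolean print as 'true'/'false' so that ParseVariableBool accepts it:

-- a sketch, assuming the proposed v1 \if semantics
SELECT (EXISTS (SELECT 1 FROM pg_class WHERE relname = 'important_table'))::text AS has_table
\gset
\if :has_table
\echo 'important_table found, continuing'
\else
\echo 'important_table missing, quitting'
\q
\endif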
Re: [HACKERS] [WIP]Vertical Clustered Index (columnar store extension)
On 1/16/17 10:09 PM, Haribabu Kommi wrote: Yes, that' correct. Currently with this approach, it is not possible to ditch the heap completely. This approach is useful for the cases, where the user wants to store only some columns as part of clustered index. Ahh, that's unfortunate. Billion row+ tables are becoming rather common, and that 24GB of overhead starts becoming very painful. It's actually a lot worse considering there will be at least one index on the table, so 100GB+ of overhead isn't that uncommon. Another complication is that one of the big advantages of a CSTORE is allowing analysis to be done efficiently on a column-by-column (as opposed to row-by-row) basis. Does your patch by chance provide that? Not the base patch that I shared. But the further patches provides the data access column-by-column basis using the custom plan methods. Great, that's something else that a column store really needs to be successful. Something else I suspect is necessary is a faster/better way to eliminate chunks of rows from scans. Just as an example, with my simple array-based approach, you can store a range type along with each array that contains the min and max values for the array. That means any query that wants values between 50 and 100 can include a clause that filters on range types that overlap with [50,100]. That can be indexed very efficiently and is fast to run checks against. Generally speaking, I do think the idea of adding support for this as an "index" is a really good starting point, since that part of ... as discussed elsewhere in the thread, adding a bunch of hooks is probably not a good way to do this. :/ That would be a great way to gain knowledge on what users would want to see in a column store, something else I suspect we need. It would also be far less code than what you or Alvaro are proposing. When it comes to large changes that don't have crystal-clear requirements, I think that's really important. The main use case of this patch is to support mixed load environments, where both OLTP and OLAP queries are possible. The advantage of proposed patch design is, providing good performance to OLAP queries without affecting OLTP. Yeah, that's a big part of what I was envisioning with my array-based approach. In simple terms, there would be a regular row-based table, and an array-based table, with a view that allows seamless querying into both (re-presenting the array-storage on a per-row basis). There would be a periodic process that moves entire sets of rows from the row storage into the array storage. If you updated or deleted a row that was part of an array, the contents of the entire array could be moved back into row-based storage. After a period of time, rows would get moved back into array storage. Or the array could be modified in place, but you need to be very careful about bloating the array storage if you do that. The big missing piece here is getting the planner to intelligently handle a mixed row/column store. As I mentioned, you can easily add range type fields to greatly increase performance, but they won't do any good unless the appropriate filters get added. It's not THAT hard to do that by hand, but it'd be great if there was a more automated method. Such a method might also be very useful for transforming expressions like date_part('quarter', ...) into something that could use existing indexes. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! 
http://BlueTreble.com 855-TREBLE2 (855-873-2532) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
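As a rough SQL sketch of the array-plus-range idea described above (all table and column names are invented for illustration; this is not the proposed patch), each "chunk" of rows is packed into arrays alongside a range column holding the chunk's min/max, which a GiST index can use to skip chunks:

CREATE TABLE metrics_cstore (
    chunk_id  bigserial PRIMARY KEY,
    val_range int4range NOT NULL,  -- min/max of the values packed in this chunk
    vals      integer[] NOT NULL
);
CREATE INDEX ON metrics_cstore USING gist (val_range);

-- a query for values between 50 and 100 can skip every chunk whose
-- range cannot overlap the requested interval
SELECT chunk_id FROM metrics_cstore WHERE val_range && int4range(50, 100, '[]');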
Re: [HACKERS] Assignment of valid collation for SET operations on queries with UNKNOWN types.
I wrote: > I spent some time fooling with this today and came up with the attached > patch. I think this is basically the direction we should go in, but > there are various loose ends: Here's an updated patch that's rebased against today's HEAD and addresses most of the loose ends: > 1. I adjusted a couple of existing regression tests whose results are > affected, but didn't do anything towards adding new tests showing the > desirable results of this change (as per the examples in this thread). Added some regression test cases. These are mostly designed to prove that coercion to text occurs where expected, but I did include a couple of queries that would have failed outright before. > 2. I didn't do anything about docs, either, though maybe no change > is needed. One user-visible change from this is that queries should > never return any "unknown"-type columns to clients anymore. But I > think that is not documented now, so maybe there's nothing to change. The only thing I could find in the SGML docs that directly addresses unknown-type columns was a remark in the CREATE VIEW man page, which I've updated. I also added a section to typeconv.sgml to document the behavior. > 3. We need to look at whether pg_upgrade is affected. I think we > might be able to let it just ignore the issue for views, since they'd > get properly updated during the dump and reload anyway. But if someone > had an actual table (or matview) with an "unknown" column, that would > be a pg_upgrade hazard, because "unknown" doesn't store like "text". I tested and found that simple views with unknown columns seem to update sanely if we just let pg_upgrade dump and restore them. So I suggest we allow that to happen. There might be cases where dependent views behave unexpectedly after such a conversion, but I think that would be rare enough that it's not worth forcing users to fix these views by hand before updating. However, tables with unknown columns would fail the upgrade (since we'd reject the CREATE TABLE command) while matviews would be accepted but then the DDL wouldn't match the physical data storage. So I added code to pg_upgrade to check for those cases and refuse to proceed. This is almost a straight copy-and-paste of the existing pg_upgrade code for checking for "line" columns. I think this is committable now; the other loose ends can be dealt with in follow-on patches. Does anyone want to review it? regards, tom lane diff --git a/doc/src/sgml/ref/create_view.sgml b/doc/src/sgml/ref/create_view.sgml index 8641e19..a83d956 100644 *** a/doc/src/sgml/ref/create_view.sgml --- b/doc/src/sgml/ref/create_view.sgml *** CREATE VIEW [ schema . ] *** 251,259 CREATE VIEW vista AS SELECT 'Hello World'; ! is bad form in two ways: the column name defaults to ?column?, ! and the column data type defaults to unknown. If you want a ! string literal in a view's result, use something like: CREATE VIEW vista AS SELECT text 'Hello World' AS hello; --- 251,260 CREATE VIEW vista AS SELECT 'Hello World'; ! is bad form because the column name defaults to ?column?; ! also, the column data type defaults to text, which might not ! be what you wanted. Better style for a string literal in a view's ! 
result is something like: CREATE VIEW vista AS SELECT text 'Hello World' AS hello; diff --git a/doc/src/sgml/typeconv.sgml b/doc/src/sgml/typeconv.sgml index c031c01..63d41f0 100644 *** a/doc/src/sgml/typeconv.sgml --- b/doc/src/sgml/typeconv.sgml *** domain's base type for all subsequent st *** 984,990 If all inputs are of type unknown, resolve as type text (the preferred type of the string category). ! Otherwise, unknown inputs are ignored. --- 984,991 If all inputs are of type unknown, resolve as type text (the preferred type of the string category). ! Otherwise, unknown inputs are ignored for the purposes ! of the remaining rules. *** but integer can be implicitly c *** 1076,1081 --- 1077,1129 result type is resolved as real. + + + + SELECT Output Columns + + + SELECT + determination of result type + + + + The rules given in the preceding sections will result in assignment + of non-unknown data types to all expressions in a SQL query, + except for unspecified-type literals that appear as simple output + columns of a SELECT command. For example, in + + + SELECT 'Hello World'; + + + there is nothing to identify what type the string literal should be + taken as. In this situation PostgreSQL will fall back + to resolving the literal's type as text. + + + + When the SELECT is one arm of a UNION + (or INTERSECT or EXCEPT) construct, or when it + appears within INSERT ... SELECT, this rule is not applied + since rules given in preceding sections take precedence. The type of an + unspecified-type literal can be taken from the other UNION
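For illustration, the user-visible effect on the CREATE VIEW example can be checked like this (behavior as proposed by the patch, not the behavior at the time of writing):

CREATE VIEW vista AS SELECT 'Hello World' AS hello;
-- previously the column was of type unknown; with the patch applied
-- the literal is resolved as text:
SELECT pg_typeof(hello) FROM vista;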
Re: [HACKERS] Logical replication failing when foreign key present
On 22/01/17 18:50, Thom Brown wrote: > Hi, > > There's an issue which I haven't seen documented as expected > behaviour, where replicating data to a table which has a foreign key > results in a replication failure. This produces the following log > entries: > > LOG: starting logical replication worker for subscription "contacts_sub" > LOG: logical replication apply for subscription "contacts_sub" has started > ERROR: AfterTriggerSaveEvent() called outside of query > LOG: worker process: logical replication worker for subscription > 16408 (PID 19201) exited with exit code 1 > > Hi, thanks for report. Looks like I missed AfterTriggerBeginQuery/AfterTriggerEndQuery when moving the executor stuff around. Attached should fix it. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services From b6d5a1830da5412520cb50287ee71ea312f689d6 Mon Sep 17 00:00:00 2001 From: Petr Jelinek Date: Sun, 22 Jan 2017 23:16:57 +0100 Subject: [PATCH] Fix after trigger execution in logical replication --- src/backend/replication/logical/worker.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c index 7d86736..6229bef 100644 --- a/src/backend/replication/logical/worker.c +++ b/src/backend/replication/logical/worker.c @@ -176,6 +176,9 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel) if (resultRelInfo->ri_TrigDesc) estate->es_trig_tuple_slot = ExecInitExtraTupleSlot(estate); + /* Prepare to catch AFTER triggers. */ + AfterTriggerBeginQuery(); + return estate; } @@ -536,6 +539,10 @@ apply_handle_insert(StringInfo s) /* Cleanup. */ ExecCloseIndices(estate->es_result_relation_info); PopActiveSnapshot(); + + /* Handle queued AFTER triggers. */ + AfterTriggerEndQuery(estate); + ExecResetTupleTable(estate->es_tupleTable, false); FreeExecutorState(estate); @@ -676,6 +683,10 @@ apply_handle_update(StringInfo s) /* Cleanup. */ ExecCloseIndices(estate->es_result_relation_info); PopActiveSnapshot(); + + /* Handle queued AFTER triggers. */ + AfterTriggerEndQuery(estate); + EvalPlanQualEnd(&epqstate); ExecResetTupleTable(estate->es_tupleTable, false); FreeExecutorState(estate); @@ -763,6 +774,10 @@ apply_handle_delete(StringInfo s) /* Cleanup. */ ExecCloseIndices(estate->es_result_relation_info); PopActiveSnapshot(); + + /* Handle queued AFTER triggers. */ + AfterTriggerEndQuery(estate); + EvalPlanQualEnd(&epqstate); ExecResetTupleTable(estate->es_tupleTable, false); FreeExecutorState(estate); -- 2.7.4 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Online enabling of page level checksums
On 1/22/17 5:13 AM, Magnus Hagander wrote: If the system is interrupted before the background worker is done, it starts over from the beginning. Previously touched blocks will be read and verified, but not written (because their checksum is already correct). This will take time, but not re-generate the WAL. Another option would be to store a watermark of the largest block ID that had been verified for each relation/fork. That would be used to determine if checksums could be trusted, and more usefully would allow you to start where you left off if the verification process got interrupted. I think the actual functions and background worker could go in an extension that's installed and loaded only by those who need it. But the core functionality of being able to have "checksum in progress" would have to be in the core codebase. If it was in contrib I think that'd be fine. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
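A minimal sketch of what such a watermark might look like if the extension kept it in an ordinary table (the table and column names are invented for illustration):

CREATE TABLE checksum_verify_progress (
    relid      oid,
    forknum    int,
    last_block bigint,  -- highest block ID verified so far
    PRIMARY KEY (relid, forknum)
);
-- after a restart, the worker would resume each fork at last_block + 1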
Re: [HACKERS] Protect syscache from bloating with negative cache entries
Jim Nasby writes: >> Ahh, I hadn't considered that. So one idea would be to only track >> negative entries on caches where we know they're actually useful. That >> might make the performance hit of some of the other ideas more >> tolerable. Presumably you're much less likely to pollute the namespace >> cache than some of the others. > Ok, after reading the code I see I only partly understood what you were > saying. In any case, it might still be useful to do some testing with > CATCACHE_STATS defined to see if there's caches that don't accumulate a > lot of negative entries. There definitely are, according to my testing, but by the same token it's not clear that a shutoff check would save anything. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Protect syscache from bloating with negative cache entries
On 1/22/17 4:41 PM, Jim Nasby wrote: On 1/21/17 8:54 PM, Tom Lane wrote: Jim Nasby writes: The other (possibly naive) question I have is how useful negative entries really are? Will Postgres regularly incur negative lookups, or will these only happen due to user activity? It varies depending on the particular syscache, but in at least some of them, negative cache entries are critical for performance. See for example RelnameGetRelid(), which basically does a RELNAMENSP cache lookup for each schema down the search path until it finds a match. Ahh, I hadn't considered that. So one idea would be to only track negative entries on caches where we know they're actually useful. That might make the performance hit of some of the other ideas more tolerable. Presumably you're much less likely to pollute the namespace cache than some of the others. Ok, after reading the code I see I only partly understood what you were saying. In any case, it might still be useful to do some testing with CATCACHE_STATS defined to see if there are caches that don't accumulate a lot of negative entries. Attached is a patch that tries to document some of this. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) diff --git a/src/include/utils/catcache.h b/src/include/utils/catcache.h index 253c7b53ed..d718d82d80 100644 --- a/src/include/utils/catcache.h +++ b/src/include/utils/catcache.h @@ -101,7 +101,12 @@ typedef struct catctup * * A negative cache entry is an assertion that there is no tuple matching * a particular key. This is just as useful as a normal entry so far as -* avoiding catalog searches is concerned. Management of positive and +* avoiding catalog searches is concerned. In particular, caching negative +* entries is critical for performance of some caches. For example, current +* code will produce a negative RELNAMENSP entry for every unqualified +* table lookup with the default search_path that puts the pg_catalog +* schema first. This effect can be observed by defining CATCACHE_STATS and +* observing the log at backend exit. Management of positive and * negative entries is identical. */ int refcount; /* number of active references */ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
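To see why the RELNAMENSP negative entries matter, consider a search path where the table lives in a later schema (schema and table names invented for the example); every unqualified lookup first misses in the earlier schema, and the negative entry absorbs those repeated misses:

CREATE SCHEMA a;
CREATE SCHEMA b;
CREATE TABLE b.t (id int);
SET search_path = a, b;
SELECT * FROM t;  -- probes a.t (miss, cached negatively), then finds b.t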
Re: [HACKERS] Protect syscache from bloating with negative cache entries
On 1/21/17 8:54 PM, Tom Lane wrote: Jim Nasby writes: The other (possibly naive) question I have is how useful negative entries really are? Will Postgres regularly incur negative lookups, or will these only happen due to user activity? It varies depending on the particular syscache, but in at least some of them, negative cache entries are critical for performance. See for example RelnameGetRelid(), which basically does a RELNAMENSP cache lookup for each schema down the search path until it finds a match. Ahh, I hadn't considered that. So one idea would be to only track negative entries on caches where we know they're actually useful. That might make the performance hit of some of the other ideas more tolerable. Presumably you're much less likely to pollute the namespace cache than some of the others. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Too many autovacuum workers spawned during forced auto-vacuum
On 1/20/17 12:40 AM, Amit Khandekar wrote: My impression was that postmaster is supposed to just do the minimal work of starting the auto-vacuum launcher if it's not already running. And, the work of ensuring all the things keep going is the job of the auto-vacuum launcher. There's already a ton of logic in the launcher... ISTM it'd be nice to not start adding additional logic to the postmaster. If we had a generic need for rate limiting launching of things maybe it wouldn't be that bad, but AFAIK we don't. That limits us to launching the autovacuum launcher at most six times a minute when autovacuum = off. You could argue that defeats the point of the SendPostmasterSignal in SetTransactionIdLimit, but I don't think so. If vacuuming the oldest database took less than 10 seconds, then we won't vacuum the next-oldest database until we hit the next 64kB transaction ID boundary, but that can only cause a problem if we've got so many databases that we don't get to them all before we run out of transaction IDs, which is almost unthinkable. If you had ten million tiny databases that all crossed the threshold at the same instant, it would take you 640 million transaction IDs to visit them all. If you also had autovacuum_freeze_max_age set very close to the upper limit for that variable, you could conceivably have the system shut down before all of those databases were reached. But that's a pretty artificial scenario. If someone has that scenario, perhaps they should consider more sensible configuration choices. Yeah this logic makes sense ... I'm not sure that's true in the case of a significant number of databases and a very high XID rate, but I might be missing something. In any case I agree it's not worth worrying about. If you've disabled autovac you're already running with scissors. But I guess, from looking at the code, it seems that care was taken to ensure that in case of auto-vacuum off, we clean up all databases as fast as possible, with multiple workers cleaning up multiple tables in parallel. Instead of autovacuum launcher and worker together making sure that the cycle of iterations keeps on running, I was thinking the auto-vacuum launcher itself should make sure it does not spawn another worker on the same database if it did nothing. But that seemed pretty invasive. IMHO we really need some more sophistication in scheduling for both launcher and worker. Somewhere on my TODO is allowing the worker to call a user-defined SELECT to get a prioritized list, but since the launcher doesn't connect to a database that wouldn't work. What we could do rather simply is honor adl_next_worker in the logic that looks for freeze, something like the attached. On another note, does anyone else find the database selection logic rather difficult to trace through? The logic is kinda spread throughout several functions. The naming of rebuild_database_list() and get_database_list() is rather confusing too. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c index 264298e8a9..34969b058f 100644 --- a/src/backend/postmaster/autovacuum.c +++ b/src/backend/postmaster/autovacuum.c @@ -1066,6 +1066,43 @@ db_comparator(const void *a, const void *b) } /* + * Returns true if database has been processed recently (less than + * autovacuum_naptime seconds ago). 
+ */ +static bool +recent_activity(Oid datid) +{ + dlist_iter iter; + TimestampTz current_time = GetCurrentTimestamp(); + + dlist_reverse_foreach(iter, &DatabaseList) + { + avl_dbase *dbp = dlist_container(avl_dbase, adl_node, iter.cur); + + if (dbp->adl_datid == datid) + { + /* +* Skip this database if its next_worker value falls between +* the current time and the current time plus naptime. +*/ + if (!TimestampDifferenceExceeds(dbp->adl_next_worker, + current_time, 0) && + !TimestampDifferenceExceeds(current_time, + dbp->adl_next_worker, + autovacuum_naptime * 1000)) + return true; + return false; + + } + } + + /* +* Didn't find the database on the internal list, so it must be new. Skip +* it for now. +*/ + return true; +} + +/* * do_start_worker * * Bare-bones procedure for starting an autovacuum worker from the launcher. @@ -1162,24 +1199,33 @@ do_start_worker(void) foreach(
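For reference, the wraparound side of the launcher's database selection boils down to ordering databases by frozen-xid age, which can be inspected with a query like:

SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;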
Re: [HACKERS] PATCH: recursive json_populate_record()
Nikita Glukhov writes: > On 01/08/2017 09:52 PM, Tom Lane wrote: >> ... attndims is really syntactic sugar only and doesn't affect anything >> meaningful semantically. > I have fixed the first patch: when the number of dimensions is unknown > we determine it simply by the number of successive opening brackets at > the start of a json array. But I'm still using for verification non-zero > ndims values that can be taken from the composite type attribute (attndims) or > from the domain array type (typndims) through the specially added function > get_type_ndims(). Apparently I didn't make my point forcefully enough. attndims is not semantically significant in Postgres; the fact that we even store it is just a historical artifact, I think. There might be an argument to start enforcing it, but we'd have to do that across the board, and I think most likely any such proposal would fail because of backwards compatibility concerns. (There would also be a lot of discussion as to whether, say, "int foo[3]" shouldn't enforce that the array length is 3, not just that it be one-dimensional.) In the meantime, we are *not* going to have attndims be semantically significant in just this one function. That would not satisfy the principle of least astonishment. Therefore, please remove this patch's dependence on attndims. I looked through the patch briefly and have some other comments: 1. It's pretty bizarre that you need dummy forward struct declarations for ColumnIOData and RecordIOData. The reason evidently is that somebody ignored project style and put a bunch of typedefs after the local function declarations in jsonfuncs.c. Project style of course is to put the typedefs first, precisely because they may be needed in the local function declarations. I will go move those declarations right now, as a single-purpose patch, so that you have something a bit cleaner to modify. 2. The business with isstring, saved_scalar_is_string, etc makes me itch. I'm having a hard time explaining exactly why, but it just seems like a single-purpose kluge. Maybe it would seem less so if you saved the original JsonTokenType instead. But also I'm wondering why the ultimate consumers are concerned with string-ness as such. It seems like mainly what they're trying to do is forbid converting a string into a non-scalar Postgres type, but (a) why, and (b) if you are going to restrict it, wouldn't it be more sensible to frame it as you can't convert any JSON scalar to a non-scalar type? As to (a), it seems like you're just introducing unnecessary compatibility breakage if you insist on that. As to (b), really it seems like an appropriate restriction of that sort would be like "if target type is composite, source must be a JSON object, and if target type is array, source must be a JSON array". Personally I think what you ought to do here is not change the semantics of cases that are allowed today, but only change the semantics in the cases of JSON object being assigned to PG composite and JSON array being assigned to PG array; both of those fail today, so there's nobody depending on the existing behavior. But if you reject string-to-composite cases then you will be breaking existing apps, and I doubt people will thank you for it. 3. I think you'd be better off declaring ColumnIOData.type_category as an actual enum type, not "char" with some specific values documented only by a comment. Also, could you fold the domain case in as a fourth type_category? Also, why does ColumnIOData have both an ndims field and another one buried in io.array.ndims? 4. 
populate_array_report_expected_array() violates our error message guidelines, first by using elog rather than ereport for a user-facing error, and second by assembling the message from pieces, which would make it untranslatable even if it were being reported through ereport. I'm not sure if this needs to be fixed though; will the function even still be needed once you remove the attndims dependency? Even if there are still callers, you wouldn't necessarily need such a function if you scale back your ambitions a bit as to the amount of detail required. I'm not sure you really need to report what you think the dimensions are when complaining that an array is nonrectangular. 5. The business with having some arguments that are only for json and others that are only for jsonb, eg in populate_scalar(), also makes me itch. I wonder whether this wouldn't all get a lot simpler and more readable if we gave up on trying to share code between the two cases. In populate_scalar(), for instance, the amount of code actually shared between the two cases amounts to a whole two lines, which are dwarfed by the amount of crud needed to deal with trying to serve both cases in the one function. There are places where there's more sharable code than that, but it seems like a better design might be to refactor the sharable code out into subroutines called by separate functions for json and jsonb. 6. I'm not too exci
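For reference, the nested cases the patch is about would look roughly like this (type names invented for the example; before the patch, assigning a JSON object to a composite field or a JSON array to an array field simply fails):

CREATE TYPE point2 AS (x int, y int);
CREATE TYPE figure AS (name text, center point2, tags text[]);

SELECT *
FROM json_populate_record(NULL::figure,
    '{"name": "circle", "center": {"x": 1, "y": 2}, "tags": ["round", "closed"]}');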
[HACKERS] Logical replication failing when foreign key present
Hi, There's an issue which I haven't seen documented as expected behaviour, where replicating data to a table which has a foreign key results in a replication failure. This produces the following log entries: LOG: starting logical replication worker for subscription "contacts_sub" LOG: logical replication apply for subscription "contacts_sub" has started ERROR: AfterTriggerSaveEvent() called outside of query LOG: worker process: logical replication worker for subscription 16408 (PID 19201) exited with exit code 1 Reproducible test case: On both instances: CREATE TABLE b (bid int PRIMARY KEY); CREATE TABLE a (id int PRIMARY KEY, bid int REFERENCES b (bid)); INSERT INTO b (bid) VALUES (1); First instance: CREATE PUBLICATION a_pub FOR TABLE a; Second instance: CREATE SUBSCRIPTION a_sub CONNECTION 'host=127.0.0.1 port=5530 user=thom dbname=postgres' PUBLICATION a_pub; First instance: INSERT INTO a (id, bid) VALUES (1,1); Regards Thom -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new autovacuum criterion for visible pages
Amit, * Amit Kapila (amit.kapil...@gmail.com) wrote: > On Sun, Jan 22, 2017 at 3:27 AM, Stephen Frost wrote: > > * Simon Riggs (si...@2ndquadrant.com) wrote: > >> On 12 August 2016 at 01:01, Tom Lane wrote: > >> > Michael Paquier writes: > >> >> In short, autovacuum will need to scan by itself the VM of each > >> >> relation and decide based on that. > >> > > >> > That seems like a worthwhile approach to pursue. The VM is supposed to > >> > be > >> > small, and if you're worried it isn't, you could sample a few pages of > >> > it. > >> > I do not think any of the ideas proposed so far for tracking the > >> > visibility percentage on-the-fly are very tenable. > >> > >> Sounds good, but we can't scan the VM for every table, every minute. > >> We need to record something that will tell us how many VM bits have > >> been cleared, which will then allow autovac to do a simple SELECT to > >> decide what needs vacuuming. > >> > >> Vik's proposal to keep track of the rows inserted seems like the best > >> approach to this issue. > > > > I tend to agree with Simon on this. I'm also worried that an approach > > which was based off of a metric like "% of table not all-visible" might > > result in VACUUM running over and over on a table because it isn't able > > to actually make any progress towards improving that percentage. We'd > > have to have some kind of "cool-off" period or something. > > > > Tracking INSERTs and then kicking off a VACUUM based on them seems to > > address that in a natural way and also seems like something that users > > would generally understand as it's very similar to what we do for > > UPDATEs and DELETEs. > > > > Tracking the INSERTs as a reason to VACUUM is also very natural when you > > consider the need to update BRIN indexes. > > Another possible advantage of tracking INSERTs is for hash indexes > where after split we need to remove tuples from buckets that underwent > split recently. That's a good point also, and for clearing GIN pending lists too, now that I think about it. We really need to get this fixed for PG10. Thanks! Stephen signature.asc Description: Digital signature
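The raw material for such an INSERT-based criterion is already collected by the statistics collector; for example (note that n_tup_ins is cumulative since the last stats reset, not per vacuum cycle):

SELECT relname, n_tup_ins, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_tup_ins DESC;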
[HACKERS] Online enabling of page level checksums
So, that post I made about checksums certainly spurred a lot of discussion :) One summary is that life would be a lot easier if we could turn checksums on (and off) without re-initdbing. I'm breaking out this question into this thread to talk about it separately. I've been toying with a branch to work on this, but haven't had time to get it even to a compiling state. But instead of waiting until I have some code to show, let me outline the idea I had. My general idea is this: Take one bit in the checksum version field and make it mean "in progress". That means that checksums can now be "on", "off", or "in progress". When checksums are "in progress", PostgreSQL will compute and write checksums whenever it writes out a buffer, but it will *not* verify checksums on read. This state would be set by calling a function (or an external command with the system shut down if need be - I can accept a restart for this, but I'd rather avoid it if possible). This function would also launch a background worker. This worker would enumerate the entire database block by block. Read a block, verify if the checksum is set and correct. If it is, ignore it (because any further updates will keep it in state ok when we're in state "in progress"). If not, then mark it as dirty and write it out through regular means, which will include computing and writing the checksum since we're "in progress". With something similar to vacuum cost delay to control how quickly it writes. Yes, this means the entire db will end up in the transaction log since everything is rewritten. That's not great, but for a lot of people that will be a trade they're willing to make since it's a one-time thing. Yes, this background process might take days or weeks - that's OK as long as it happens online. Once the background worker is done, it flips the checksum state to "on", and the system starts verifying checksums as well. If the system is interrupted before the background worker is done, it starts over from the beginning. Previously touched blocks will be read and verified, but not written (because their checksum is already correct). This will take time, but not re-generate the WAL. I think the actual functions and background worker could go in an extension that's installed and loaded only by those who need it. But the core functionality of being able to have "checksum in progress" would have to be in the core codebase. So, is there something obviously missing in this plan? Or just the code to do it :) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
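For context, the checksum version field that would gain the "in progress" bit is already visible from SQL as of 9.6:

SELECT data_page_checksum_version FROM pg_control_init();
-- currently 0 (checksums off) or 1 (on); the proposal adds an
-- "in progress" state alongside these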
Re: [HACKERS] Checksums by default?
On Sat, Jan 21, 2017 at 8:16 PM, Ants Aasma wrote: > On Sat, Jan 21, 2017 at 7:39 PM, Petr Jelinek > wrote: > > So in summary "postgresql.conf options are easy to change" while "initdb > > options are hard to change", I can see this argument used both for > > enabling or disabling checksums by default. As I said I would be less > > worried if it was easy to turn off, but we are not there afaik. And even > > then I'd still want benchmark first. > > Adding support for disabling checksums is almost trivial as it only > requires flipping a value in the control file. And I have somewhere > sitting around a similarly simple tool for turning on checksums while > the database is offline. FWIW, based on customers and fellow > conference goers I have talked to most would gladly take the > performance hit, but not the downtime to turn it on on an existing > database. > This is exactly the scenario I've been exposed to over and over again. If it can be turned on/off online, then the default matters a lot less. But it has to be online. Yes, you can set up a replica (which today requires third party products like slony, bucardo or pglogical -- at least we'll hopefully have pglogical fully in 10, but it's still a very expensive way to fix the problem). If we can make it cheap and easy to turn them off, that makes a change of the default a lot cheaper. Did you have a tool for that sitting around as well? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Checksums by default?
On 01/21/2017 04:18 PM, Petr Jelinek wrote: On 21/01/17 11:39, Magnus Hagander wrote: Is it time to enable checksums by default, and give initdb a switch to turn it off instead? I'd like to see benchmark first, both in terms of CPU and in terms of produced WAL (=network traffic) given that it turns on logging of hint bits. ... and those hint bits may easily trigger full-page writes, resulting in significant write amplification. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Checksums by default?
On 01/21/2017 05:53 PM, Stephen Frost wrote: * Tom Lane (t...@sss.pgh.pa.us) wrote: Stephen Frost writes: * Petr Jelinek (petr.jeli...@2ndquadrant.com) wrote: The change of wal_level was supported by benchmark, I think it's reasonable to ask for this to be as well. No, it wasn't, it was that people felt the cases where changing wal_level would seriously hurt performance didn't outweigh the value of making the change to the default. It was "supported" in the sense that somebody took the trouble to measure the impact, so that we had some facts on which to base the value judgment that the cost was acceptable. In the case of checksums, you seem to be in a hurry to arrive at a conclusion without any supporting evidence. Exactly. No, no one measured the impact in the cases where wal_level=minimal makes a big difference, that I saw, at least. We already knew we can construct data-loading workloads relying on the wal_level=minimal optimization and demonstrating pretty arbitrary benefits, so there was no point in benchmarking them. That does not mean those cases were not considered, though. > Further info with links to what was done are in my reply to Petr. As for checksums, I do see value in them and I'm pretty sure that the author of that particular feature did as well, or we wouldn't even have it as an option. You seem to be of the opinion that we might as well just rip all of that code and work out as being useless. I do see value in them too, and if turning them off again would be as simple as reverting back to wal_level=minimal, I would be strongly in favor of switching them to 'on' by default (after a bit of benchmarking, similar to what we did in the wal_level=minimal case). The fact that disabling them requires initdb makes the reasoning much trickier, though. That being said, I'm ready to do some benchmarking on this, so that we have at least some numbers to argue about. Can we agree on a set of workloads that we want to benchmark in the first round? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
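As a starting point for the WAL-size measurements, something like this works on 9.6 (the function names are the pre-10 xlog ones):

SELECT pg_current_xlog_location() AS start_lsn \gset
-- ... run the workload being benchmarked ...
SELECT pg_xlog_location_diff(pg_current_xlog_location(), :'start_lsn') AS wal_bytes;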
Re: [HACKERS] Checksums by default?
On 01/21/2017 05:51 PM, Stephen Frost wrote: * Petr Jelinek (petr.jeli...@2ndquadrant.com) wrote: On 21/01/17 17:31, Stephen Frost wrote: This is just changing the *default*, not requiring checksums to always be enabled. We do not hold the same standards for our defaults as we do for always-enabled code, for clear reasons - not every situation is the same and that's why we have defaults that people can change. I can buy that. If it's possible to turn checksums off without recreating the data directory then I think it would be okay to have the default on. I'm glad to hear that. The change of wal_level was supported by benchmark, I think it's reasonable to ask for this to be as well. No, it wasn't, it was that people felt the cases where changing wal_level would seriously hurt performance didn't outweigh the value of making the change to the default. Really? Yes. https://www.postgresql.org/message-id/d34ce5b5-131f-66ce-f7c5-eb406dbe0...@2ndquadrant.com From the above link: So while it'd be trivial to construct workloads demonstrating the optimizations in wal_level=minimal (e.g. initial loads doing CREATE TABLE + COPY + CREATE INDEX in a single transaction), but that would be mostly irrelevant I guess. Instead, I've decided to run regular pgbench TPC-B-like workload on a bunch of different scales, and measure throughput + some xlog stats with each of the three wal_level options. In other words, there was no performance testing of the cases where wal_level=minimal (the old default) optimizations would have been compared against wal_level > minimal. I'm quite sure that the performance numbers for the CREATE TABLE + COPY case with wal_level=minimal would have been *far* better than for wal_level > minimal. That case was entirely punted on as "mostly irrelevant" even though there are known production environments where those optimizations make a huge difference. Those are OLAP cases though, and not nearly enough folks around here seem to care one bit about them, which I continue to be disappointed by. You make it look as if we swept that case under the carpet, despite there being quite a bit of relevant discussion in that thread. We might argue how many deployments benefit from the wal_level=minimal optimization (I'm sure some of our customers do benefit from it too), and whether it makes it 'common workload' or not. It's trivial to construct a workload demonstrating pretty arbitrary performance advantage of the wal_level=minimal case. Hence no point in wasting time on demonstrating it, making the case rather irrelevant for the benchmarking. Moreover, there are quite a few differences between enabling checksums by default, and switching to wal_level=minimal: - There are reasonable workarounds that give you wal_level=minimal back (UNLOGGED tables). And those work even when you actually need a standby, which is pretty common these days. With checksums there's nothing like that. - You can switch between wal_level values by merely restarting the cluster, while checksums may only be enabled/disabled by initdb. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Checksums by default?
On 01/21/2017 05:35 PM, Tom Lane wrote: Stephen Frost writes: * Tom Lane (t...@sss.pgh.pa.us) wrote: Have we seen *even one* report of checksums catching problems in a useful way? This isn't the right question. I disagree. If they aren't doing something useful for people who have turned them on, what's the reason to think they'd do something useful for the rest? I believe Stephen is right. The fact that you don't see something, e.g. reports about checksums catching something in production deployments, proves nothing because of "survivorship bias" discovered by Abraham Wald during WW II [1]. Not seeing bombers with bullet holes in engines does not mean you don't need to armor engines. Quite the opposite. [1] https://medium.com/@penguinpress/an-excerpt-from-how-not-to-be-wrong-by-jordan-ellenberg-664e708cfc3d#.j9d9c35mb Applied to checksums, we're quite unlikely to see reports about data corruption caught by checksums because "ERROR: invalid page in block X" is such a clear sign of data corruption that people don't even ask us about that. Combine that with the fact that most people are running with defaults (i.e. no checksums) and that data corruption is a rare event by nature, and we're bound to have no such reports. What we got, however, are reports about strange errors from instances without checksums enabled, that were either determined to be data corruption, or disappeared after dump/restore or reindexing. It's hard to say for sure whether those were cases of data corruption (where checksums might have helped) or some other bug (resulting in a corrupted page with the checksum computed on the corrupted page). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Add function to import operating system collations
On Sun, Jan 22, 2017 at 5:28 AM, Tom Lane wrote: > Peter Eisentraut writes: >> On 1/21/17 12:50 PM, Tom Lane wrote: >>> I have to question the decision to make "no locales" a hard error. >>> What's the point of that? In fact, should we even be bothering with >>> a warning, considering how often initdb runs unattended these days? > >> Hmm, it was a warning in initdb, so making it an error now is probably a >> mistake. We should change it back to a warning at least. > > I'd drop it altogether I think ... it was useful debug back in the day > but now I doubt it is worth much. > >> Also, if we add ICU initialization to this, then it's not clear how we >> would report if one provider provided zero locales but another did >> provide some. > > Would it help to redefine the function as returning the number of locale > entries it successfully added? It would be nice at least to avoid an error; still, even if we decide to keep it an error, I can add a couple of locales on hamster and dangomushi and we are good to go in the buildfarm. Regarding the warning, I have found it useful a couple of times on ArchLinux where no locales are enabled by default. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
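For reference, a call to the function under discussion would look roughly like this, assuming the signature in the commit being discussed (under Tom's suggestion the same call would return the number of collations added rather than void):

-- re-import operating system locales, skipping ones already present
SELECT pg_import_system_collations(true, 'pg_catalog');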