Re: Libpq support to connect to standby server as priority
On Thu, Apr 11, 2019 at 9:13 AM Haribabu Kommi wrote:
> I fixed all the comments that you have raised above and attached the
> updated patches.

Rebased patches are attached.

Regards,
Haribabu Kommi
Fujitsu Australia

0001-New-pg_basebackup-g-option-to-control-the-group-acce.patch
0002-New-TargetSessionAttrsType-enum.patch
0003-Make-transaction_read_only-as-GUC_REPORT-varaible.patch
0004-New-prefer-read-target_session_attrs-type.patch
0005-New-read-only-target_session_attrs-type.patch
0006-Primary-prefer-standby-and-standby-options.patch
0007-New-function-to-rejecting-the-checked-write-connecti.patch
0008-Server-recovery-mode-handling.patch
Re: Print baserestrictinfo for varchar fields
Donald Dong writes:
> I noticed that debug_print_rel outputs "unknown expr" when the fields
> in baserestrictinfo are typed as varchar.
> ...
> I wonder if this is a proper way of fixing it?

It's hard to muster much enthusiasm for extending print_expr(),
considering how incomplete and little-used it is.  I'd rather spend
effort on ripping it out in favor of using the far more complete, and
better-tested, code in ruleutils.c.

			regards, tom lane
Re: [PATCH] Simple typos fix
On Sun, Jun 02, 2019 at 04:42:57PM -0500, Justin Pryzby wrote:
> Thanks for finding these; I think a few hunks are false positives and
> should be removed.

Yes, some of them are the changes in imath.c and snowball/, which we
include in Postgres but in reality are independent projects, so these
should be fixed upstream instead, and Postgres will include those fixes
when merging with newer versions.  If we were to fix those issues
ourselves, then we would likely create conflicts when merging with newer
versions of the upstream modules.

> A few more are debatable and could be correct either way:
>
> alloced - not an English word, but a technical one;

Indeed.  The current wording is fine by me.

> delink - "unlink" is better for the filesystem operation, but I
> don't think "delink" is wrong for a list operation.
> dependees (?)

These terms could be used in programming.

> This'd
> define'd

Don't think it is much of a big deal to keep these as well.  "invokable"
can be used in programming, and "cachable" is an alternate spelling of
"cacheable" based on some research.

> On Tue, May 28, 2019 at 08:17:18PM +0200, Andrea Gelmini wrote:
>> diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
>> index de0a98f6d9..ff13b0c9e7 100644
>> --- a/contrib/amcheck/verify_nbtree.c
>> +++ b/contrib/amcheck/verify_nbtree.c
>> @@ -1278,7 +1278,7 @@ bt_right_page_check_scankey(BtreeCheckState *state)
>>   * Routines like _bt_search() don't require *any* page split interlock
>>   * when descending the tree, including something very light like a buffer
>>   * pin.  That's why it's okay that we don't either.  This avoidance of any
>> - * need to "couple" buffer locks is the raison d' etre of the Lehman & Yao
>> + * need to "couple" buffer locks is the reason d'etre of the Lehman & Yao
>
> I think this is wrong.  The French phrase is "raison d'etre".

French here.  Note that an accent is missing on the first 'e' (être) but
we don't want non-ASCII characters in the code.  So the current wording
is fine in my opinion.

> I think this is wrong.  It should say "though".  Or perhaps:
>  * store at segment to which its start lsn belongs, but don't split over
>  * multiple segments

I would replace it by "though"; "tho" is not incorrect tho ;)

>> @@ -907,7 +907,7 @@ write_relmap_file(bool shared, RelMapFile *newmap,
>>   * Make sure that the files listed in the map are not deleted if the outer
>>   * transaction aborts.  This had better be within the critical section
>>   * too: it's not likely to fail, but if it did, we'd arrive at transaction
>> - * abort with the files still vulnerable.  PANICing will leave things in a
>> + * abort with the files still vulnerable.  Panicking will leave things in a
>
> Wrong ?

Yes, the suggestion is wrong.  The comment refers to the elog level.

The original patch proposed 63 diffs.  After the false positives are
removed, 21 remain, which I have now committed.  You have done good work
in catching all these, by the way.  Thanks for taking the time to do so.
--
Michael
Re: psql completion bugs with access methods
On Sat, Jun 01, 2019 at 12:44:05PM -0700, Andres Freund wrote:
> "I'm fine with adding those for 12", so no, I'm not strongly opposed.

OK, fixed this one for now.
--
Michael
Re: initdb recommendations
On Tue, May 28, 2019 at 12:15:35PM -0400, Magnus Hagander wrote:
> On Fri, May 24, 2019 at 11:24 AM Noah Misch wrote:
> > On Thu, May 23, 2019 at 06:56:49PM +0200, Magnus Hagander wrote:
> > > On Thu, May 23, 2019, 18:54 Peter Eisentraut wrote:
> > > > To recap, the idea here was to change the default authentication
> > > > methods that initdb sets up, in place of "trust".
> > > >
> > > > I think the ideal scenario would be to use "peer" for local and
> > > > some appropriate password method (being discussed elsewhere) for
> > > > host.
> > > >
> > > > Looking through the buildfarm, I gather that the only platforms
> > > > that don't support peer are Windows, AIX, and HP-UX.  I think we
> > > > can probably figure out some fallback or alternative default for
> > > > the latter two platforms without anyone noticing.  But what should
> > > > the defaults be on Windows?  It doesn't have local sockets, so the
> > > > lack of peer wouldn't matter.  But is it OK to default to a
> > > > password method, or would that upset people particularly?
> > >
> > > I'm sure password would be fine there.  It's what "everybody else"
> > > does (well, sqlserver also does integrated security, but people are
> > > used to it).
> >
> > Our sspi auth is a more-general version of peer auth, and it works
> > over TCP.  It would be a simple matter of programming to support
> > "peer" on Windows, consisting of sspi auth with an implicit pg_ident
> > map.  Nonetheless, I agree password would be fine.
>
> I hope you don't mean "make peer use sspi on windows".  I think that's a
> really bad idea from a confusion perspective.

I don't mean "make peer an alias for SSPI", but I do mean "implement peer
on Windows as a special case of sspi, using the same Windows APIs".  To
the client, "peer" would look like "sspi".  If that's confusion-prone,
what's confusing about it?

> However, what we could do there is have the default pg_hba.conf file
> contain a "reasonable setup using sspi"; that's a different story.

That's another way to do it.  Currently, to behave like "peer" behaves,
one hard-codes the machine's SSPI realm into pg_ident.conf.  If we
introduced pg_ident.conf syntax to remove that need (e.g.
%MACHINE_REALM%), that approach would work.

> But I wonder if that isn't better implemented at the installer level.
> I think we're better off doing something like scram as the config when
> you build from source, and then encourage installers to do other things
> based on the fact that they know more information about the setup (such
> as usernames actually used).

If initdb has the information needed to configure the recommended
authentication, that's the best place to do it, since there's one initdb
and many installers.  So far, I haven't seen a default auth configuration
proposal involving knowledge of OS usernames or other information initdb
lacks.
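To make the "reasonable setup using sspi" idea concrete, here is a hedged
sketch of what such a default pg_hba.conf and pg_ident.conf could look
like on Windows.  The %MACHINE_REALM% token is only the hypothetical
syntax floated above, not anything PostgreSQL currently understands; the
map name is likewise made up.

    # Sketch of a default pg_hba.conf for Windows (no local sockets
    # there), restricting the sspi-as-peer behavior to loopback.
    # TYPE  DATABASE  USER  ADDRESS       METHOD
    host    all       all   127.0.0.1/32  sspi  map=local_peer
    host    all       all   ::1/128       sspi  map=local_peer

    # Sketch of the matching pg_ident.conf entry.  This is where the
    # proposed %MACHINE_REALM% substitution would remove the need to
    # hard-code the machine's SSPI realm by hand.
    # MAPNAME     SYSTEM-USERNAME              PG-USERNAME
    local_peer    /^(.*)@%MACHINE_REALM%$      \1

The regex form of the SYSTEM-USERNAME field and the \1 back-reference are
existing pg_ident.conf features; only the realm substitution is new.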
Re: [PATCH] Simple typos fix
Thanks for finding these; I think a few hunks are false positives and
should be removed.

A few more are debatable and could be correct either way:

Kazakstan
alloced - not an English word, but a technical one;
delink - "unlink" is better for the filesystem operation, but I
don't think "delink" is wrong for a list operation.
dependees (?)
This'd
define'd

On Tue, May 28, 2019 at 08:17:18PM +0200, Andrea Gelmini wrote:
> diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
> index de0a98f6d9..ff13b0c9e7 100644
> --- a/contrib/amcheck/verify_nbtree.c
> +++ b/contrib/amcheck/verify_nbtree.c
> @@ -1278,7 +1278,7 @@ bt_right_page_check_scankey(BtreeCheckState *state)
>   * Routines like _bt_search() don't require *any* page split interlock
>   * when descending the tree, including something very light like a buffer
>   * pin.  That's why it's okay that we don't either.  This avoidance of any
> - * need to "couple" buffer locks is the raison d' etre of the Lehman & Yao
> + * need to "couple" buffer locks is the reason d'etre of the Lehman & Yao

I think this is wrong.  The French phrase is "raison d'etre".

> diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
> index e7c32f2a13..20bb928016 100644
> --- a/src/backend/replication/logical/reorderbuffer.c
> +++ b/src/backend/replication/logical/reorderbuffer.c
> @@ -2279,7 +2279,7 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
>
>  	/*
>  	 * store in segment in which it belongs by start lsn, don't split over
> -	 * multiple segments tho
> +	 * multiple segments to

I think this is wrong.  It should say "though".  Or perhaps:
 * store at segment to which its start lsn belongs, but don't split over
 * multiple segments

> diff --git a/src/backend/utils/cache/relmapper.c b/src/backend/utils/cache/relmapper.c
> index 3b4f21bc54..403435df52 100644
> --- a/src/backend/utils/cache/relmapper.c
> +++ b/src/backend/utils/cache/relmapper.c
> @@ -146,7 +146,7 @@ static void perform_relmap_update(bool shared, const RelMapFile *updates);
>  /*
>   * RelationMapOidToFilenode
>   *
> - * The raison d' etre ... given a relation OID, look up its filenode.
> + * The reason d'etre... given a relation OID, look up its filenode.

Wrong

> @@ -907,7 +907,7 @@ write_relmap_file(bool shared, RelMapFile *newmap,
>   * Make sure that the files listed in the map are not deleted if the outer
>   * transaction aborts.  This had better be within the critical section
>   * too: it's not likely to fail, but if it did, we'd arrive at transaction
> - * abort with the files still vulnerable.  PANICing will leave things in a
> + * abort with the files still vulnerable.  Panicking will leave things in a

Wrong ?

Thanks,
Justin
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Hi James,

On Fri, May 31, 2019 at 03:51:57PM -0400, James Coleman wrote:
> I've rebased the patch on master and confirmed make check world passes.

Thanks for the rebase!  I think the patch is in pretty good shape - I'm
sure we'll find ways to make it more efficient etc. but IMO that's fine
and should not prevent getting it committed.

The planning/costing logic may need further discussion and improvements,
though.  IIRC this was the main reason why the patch missed PG11, because
at that point it simply used incremental sort whenever the input was
already presorted by a pathkey prefix, but that may be slower than
regular sort in some cases (unexpectedly large groups, etc.).

I see the current patch partially improves this by creating both paths
(full and incremental sort).  That's good, because it means the decision
is cost-based (as it should be).  The question however is how accurate
the costing model is - and per discussion in the thread, it may need some
improvements, to handle skewed distributions better.

Currently, the costing logic (cost_incremental_sort) assumes all groups
have the same size, which is fine for uniform distributions.  But for
skewed distributions, that may be an issue.

Consider for example a table with 1M rows, two columns, 100 groups in
each column, and an index on the first one.

    CREATE TABLE t (a INT, b INT);

    INSERT INTO t SELECT 100*random(), 100*random()
      FROM generate_series(1,1000000);

Now, let's do a simple limit query to find the first row:

    SELECT * FROM t ORDER BY a, b LIMIT 1;

In this case the current costing logic is fine - the groups are close to
average size, and we only need to sort the first group, i.e. 1% of data.

Now, let's say the first group is much larger:

    INSERT INTO t SELECT 0, 100*random()
      FROM generate_series(1,900000) s(i);

    INSERT INTO t SELECT 100*random(), 100*random()
      FROM generate_series(1,100000);

That is, the first group is roughly 90% of data, but the number of groups
is still the same - so we need to scan 90% of data.  But the average
group size is still the same, so the current cost model is oblivious to
this.

But I think we can improve this by looking at the MCV lists (either
per-column or multi-column) to see if some groups are much larger, and
consider that when computing the costs.

In particular, I think we should estimate the size of the first group,
because that's important for startup cost - we need to process the whole
first group before producing the first tuple, and that matters for LIMIT
queries etc.

For example, let's say the first (already sorted) column has an MCV list.
Then we can see how large the first group (by value, not frequency) is,
and use that instead of the average group size.  E.g. in the above
example we'd know the first group is ~90%.

And we could do the same for multiple columns, either by looking at
multi-column MCV lists (if there's one), or by using the minimum from
each per-column MCV list.

Of course, these are only the groups that made it to the MCV list, and
there may be other (smaller) groups before this large one.  For example
there could be a group with "-1" value and a single row.

For a moment I thought we could/should look at the histogram, because
that could tell us if there are groups "before" the first MCV one, but I
don't think we should do that, for two reasons.  Firstly, rare values may
not get to the histogram anyway, which makes this rather unreliable and
might introduce sudden plan changes, because the cost would vary wildly
depending on whether we happened to sample the rare row or not.  And
secondly, the rare row may be easily filtered out by a WHERE condition or
something, at which point we'll have to deal with the large group anyway.

So I think we should look at the MCV list, and use that information when
computing the startup/total cost.  I think using the first/largest group
to compute the startup cost, and the average group size for total cost,
would do the trick.

I don't think we can do much better than this during planning.  There
will inevitably be cases where the costing model will push us to do the
wrong thing, in either direction.  The question is how serious an issue
that is, and whether we could remedy that during execution somehow.

When we "incorrectly" pick full sort (when the incremental sort would be
faster), that's obviously sad but I think it's mostly fine because it's
not a regression.

The opposite direction (picking incremental sort, while full sort would
be faster) is probably more serious, because it's a regression between
releases.

I don't think we can fully fix that by refining the cost model.  We have
two basic options:

1) Argue/show that this is not an issue in practice, because (a) such
cases are very rare, and/or (b) the regression is limited.  In short, the
benefits of the patch outweigh the losses.

2) Provide some fallback at execution time.  For example, we might watch
the size of the group, and if we run into an unexpectedly large one, fall
back to a full sort at that point.
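As a thought experiment on the MCV-based costing idea above, here is a
small standalone C sketch - not PostgreSQL code, all names and constants
are made up - contrasting a startup cost derived from the largest MCV
group with one derived from the average group size.  In the backend the
largest-group fraction would presumably come from the statistics slots
(e.g. via get_attstatsslot()), and the real sort model in cost_sort() is
more elaborate; none of that machinery is shown here.

    /*
     * Sketch: startup cost from the largest MCV group vs. from the
     * average group size, using a simplified n*log2(n) sort model.
     * Compile with: cc sketch.c -lm
     */
    #include <stdio.h>
    #include <math.h>

    static double
    sort_cost(double tuples)
    {
        /* simplified comparison-count model of sorting one group */
        return tuples <= 1.0 ? 0.0 : tuples * log2(tuples);
    }

    int
    main(void)
    {
        double  ntuples = 1000000.0;    /* table rows */
        double  ngroups = 100.0;        /* distinct sort-key prefixes */
        double  mcv_max_freq = 0.9;     /* largest MCV frequency ("0" group) */

        double  avg_group = ntuples / ngroups;
        double  first_group = mcv_max_freq * ntuples;

        /* startup: the whole first group is sorted before the first tuple */
        double  startup = sort_cost(first_group);

        /* total: remaining groups assumed to be of average size */
        double  total = startup + (ngroups - 1) * sort_cost(avg_group);

        printf("startup=%.0f total=%.0f (avg-group startup would be %.0f)\n",
               startup, total, sort_cost(avg_group));
        return 0;
    }

With these numbers the MCV-aware startup cost comes out roughly five
orders of magnitude larger than the average-group estimate, which is
exactly the gap that makes the skewed LIMIT query above mis-planned
today.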
unused_oids script should advertise reserved OID range
The unused_oids script has gone from being something of interest to
everybody that wants to write a patch that creates a new catalog entry,
to something that patch authors could do without in many cases.  I think
that its output should prominently advertise that patches should use
random OIDs in the range 8000 - 9999.  Commit
a6417078c4140e51cfd717448430f274b449d687 established that this should be
standard practice for patch authors.

Actually, maybe it should even suggest a *particular* random OID in that
range, so that the choice of OID is reliably random -- why even require
patch authors to pick a number at random?

It also looks like pg_proc.dat should be updated, since it still mentions
the old custom of trying to use contiguous OIDs.  It also discourages the
practice of picking OIDs at random, which is almost the opposite of what
it should say.

--
Peter Geoghegan
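The real unused_oids is a Perl script, but the suggestion logic is small
enough to sketch standalone; in this hedged C illustration the used_oids
array is placeholder data standing in for whatever the script parses out
of the catalog .dat files.

    /*
     * Sketch: print one random OID from the 8000-9999 range reserved
     * for development patches, skipping OIDs already in use.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <stdbool.h>
    #include <time.h>

    static bool
    oid_in_use(int oid, const int *used, int nused)
    {
        for (int i = 0; i < nused; i++)
            if (used[i] == oid)
                return true;
        return false;
    }

    int
    main(void)
    {
        int     used_oids[] = {8001, 9500};     /* placeholder data */
        int     nused = 2;
        int     oid;

        srandom((unsigned int) time(NULL));
        do
        {
            oid = 8000 + (int) (random() % 2000);       /* 8000..9999 */
        } while (oid_in_use(oid, used_oids, nused));

        printf("Patches should use a random OID in the reserved range; try %d\n",
               oid);
        return 0;
    }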
Residual cpluspluscheck issues
cpluspluscheck's expanded coverage is now passing cleanly for me on the
macOS laptop I was testing it with at PGCon.  But on returning home, I
find there's still some issues on some other boxes:

* On Linux (at least Fedora and RHEL), I get variants of this:

/usr/include/arpa/inet.h:84: error: declaration of 'char* inet_net_ntop(int, const void*, int, char*, size_t) throw ()' throws different exceptions
/home/postgres/pgsql/src/include/port.h:506: error: from previous declaration 'char* inet_net_ntop(int, const void*, int, char*, size_t)'

That's because /usr/include/arpa/inet.h declares it as

extern char *inet_net_ntop (int __af, const void *__cp, int __bits,
			    char *__buf, size_t __len) __THROW;

and of course when a C++ compiler reads that, __THROW will expand as
something nonempty.

One possible fix for that is to teach configure to test whether
arpa/inet.h provides a declaration, and not compile our own declaration
when it does.  This would require being sure that we include arpa/inet.h
anywhere we use that function, but there are few enough callers that
that's not much of a hardship.

Alternatively, we could rename our function to pg_inet_net_ntop to dodge
the conflict.  This might be a good idea anyway to avoid confusion, since
our function doesn't necessarily recognize the same address-family codes
that libc would.

* On FreeBSD 12, I get

/home/tgl/pgsql/src/include/utils/hashutils.h:23:23: warning: 'register' storage class specifier is deprecated and incompatible with C++17 [-Wdeprecated-register]
extern Datum hash_any(register const unsigned char *k, register int keylen);
                      ^
/home/tgl/pgsql/src/include/utils/hashutils.h:23:56: warning: 'register' storage class specifier is deprecated and incompatible with C++17 [-Wdeprecated-register]
extern Datum hash_any(register const unsigned char *k, register int keylen);
                                                        ^
/home/tgl/pgsql/src/include/utils/hashutils.h:24:32: warning: 'register' storage class specifier is deprecated and incompatible with C++17 [-Wdeprecated-register]
extern Datum hash_any_extended(register const unsigned char *k,
                               ^
/home/tgl/pgsql/src/include/utils/hashutils.h:25:11: warning: 'register' storage class specifier is deprecated and incompatible with C++17 [-Wdeprecated-register]
           register int keylen,
           ^

which I'm inclined to think means we should drop those register keywords.

(The FreeBSD box shows another couple of complaints too, but I think the
fixes for those are uncontroversial.)

Comments?

			regards, tom lane
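For concreteness, a hedged sketch of what the two header fixes discussed
above might look like.  The HAVE_DECL_INET_NET_NTOP guard assumes a
configure test along the lines of autoconf's AC_CHECK_DECLS (which
defines that symbol to 1 or 0) that does not exist yet; Datum and uint64
are the usual PostgreSQL typedefs from the tree's own headers.

    /* src/include/port.h: declare it only when libc's arpa/inet.h doesn't */
    #if !HAVE_DECL_INET_NET_NTOP
    extern char *inet_net_ntop(int af, const void *src, int bits,
                               char *dst, size_t size);
    #endif

    /* src/include/utils/hashutils.h: same prototypes, register dropped */
    extern Datum hash_any(const unsigned char *k, int keylen);
    extern Datum hash_any_extended(const unsigned char *k,
                                   int keylen, uint64 seed);

Dropping register is behavior-neutral on any modern compiler, which
treats the keyword as a no-op hint anyway.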
Optimize partial TOAST decompression
Hi, hackers!

I'm a student participating in GSoC 2019 and my project is related to
TOAST slices.  While getting familiar with the PostgreSQL codebase, I
found that PG_DETOAST_DATUM_SLICE, when run on a compressed TOAST entry,
will fetch all compressed data chunks and then extract the relevant
slice.  Obviously, this is unnecessary; we only need to fetch the data
chunks we need.  The patch optimizes partial TOAST decompression.

For an example of the improvement possible, this trivial example:

---------------------------------------------------------------
create table slicingtest (
    id serial primary key,
    a text
);
insert into slicingtest (a) select
    repeat('1234567890-=abcdefghijklmnopqrstuvwxyz', 100) as a
    from generate_series(1,100);
\timing
select sum(length(substr(a, 0, 20))) from slicingtest;
---------------------------------------------------------------

environment: Linux 4.15.0-33-generic #36~16.04.1-Ubuntu x86_64 GNU/Linux

On master, I get
    Time: 28.123 ms (average of ten runs)
With the patch, I get
    Time: 2.306 ms (average of ten runs)

This seems to be a 10x improvement.  If the number of TOAST data chunks
is larger, I believe the patch can play a greater role; there are about
200 related TOAST data chunks for each entry in this case.

Related discussion:
https://www.postgresql.org/message-id/flat/CACowWR07EDm7Y4m2kbhN_jnys%3DBBf9A6768RyQdKm_%3DNpkcaWg%40mail.gmail.com

Best regards,
Binguo Bao.

From a7c99439ffe309526b57fe26ab367e4b7bf62f39 Mon Sep 17 00:00:00 2001
From: BBG
Date: Sun, 2 Jun 2019 19:18:46 +0800
Subject: [PATCH] Optimize partial TOAST decompression

---
 src/backend/access/heap/tuptoaster.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index 55d6e91..7d30538 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -273,8 +273,11 @@ heap_tuple_untoast_attr_slice(struct varlena *attr,
 		if (!VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
 			return toast_fetch_datum_slice(attr, sliceoffset, slicelength);
 
-		/* fetch it back (compressed marker will get set automatically) */
-		preslice = toast_fetch_datum(attr);
+		/*
+		 * Be sure to fetch enough of the compressed slice (the
+		 * compressed marker will get set automatically).
+		 */
+		preslice = toast_fetch_datum_slice(attr, 0, sliceoffset + slicelength + 1);
 	}
 	else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
 	{
@@ -2031,7 +2034,8 @@ toast_fetch_datum(struct varlena *attr)
  *	Reconstruct a segment of a Datum from the chunks saved
  *	in the toast relation
  *
- *	Note that this function only supports non-compressed external datums.
+ *	Note that this function supports non-compressed external datums
+ *	and compressed external datum slices at the start of the object.
  * ----------
  */
 static struct varlena *
@@ -2072,10 +2076,9 @@ toast_fetch_datum_slice(struct varlena *attr, int32 sliceoffset, int32 length)
 	VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
 
 	/*
-	 * It's nonsense to fetch slices of a compressed datum -- this isn't lo_*
-	 * we can't return a compressed datum which is meaningful to toast later
+	 * It's only meaningful to fetch slices at the start of a compressed datum.
 	 */
-	Assert(!VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer));
+	Assert(!VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer) || 0 == sliceoffset);
 
 	attrsize = toast_pointer.va_extsize;
 	totalchunks = ((attrsize - 1) / TOAST_MAX_CHUNK_SIZE) + 1;
@@ -2091,7 +2094,11 @@ toast_fetch_datum_slice(struct varlena *attr, int32 sliceoffset, int32 length)
 
 	result = (struct varlena *) palloc(length + VARHDRSZ);
 
-	SET_VARSIZE(result, length + VARHDRSZ);
+	if (VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer)) {
+		SET_VARSIZE_COMPRESSED(result, length + VARHDRSZ);
+	} else {
+		SET_VARSIZE(result, length + VARHDRSZ);
+	}
 
 	if (length == 0)
 		return result;			/* Can save a lot of work at this point! */
-- 
2.7.4
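To make the fetch-only-what-you-need idea concrete, here is a minimal
standalone sketch (placeholder values, not PostgreSQL code) of the chunk
arithmetic the optimization relies on: a TOAST value is stored as
consecutive chunks of TOAST_MAX_CHUNK_SIZE bytes, so a slice only needs
the chunks its byte range overlaps.  It also leans on the patch's
simplifying assumption that sliceoffset + slicelength + 1 compressed
bytes suffice to decompress slicelength bytes, which deserves its own
scrutiny.

    /*
     * Sketch of the chunk range a slice request actually needs.  The
     * chunk size below is a placeholder; the real value depends on the
     * build's block size.
     */
    #include <stdio.h>

    #define TOAST_MAX_CHUNK_SIZE 1996       /* placeholder, not exact */

    int
    main(void)
    {
        int     attrsize = 400000;  /* on-disk size of the whole datum */
        int     sliceoffset = 0;    /* patch handles offset 0 when compressed */
        int     slicelength = 21;   /* requested bytes, plus one for safety */

        int     totalchunks = (attrsize - 1) / TOAST_MAX_CHUNK_SIZE + 1;
        int     startchunk = sliceoffset / TOAST_MAX_CHUNK_SIZE;
        int     endchunk = (sliceoffset + slicelength - 1) / TOAST_MAX_CHUNK_SIZE;

        printf("fetch chunks %d..%d out of %d\n",
               startchunk, endchunk, totalchunks);
        return 0;
    }

With these placeholder numbers the program reports fetching chunk 0 of
roughly 200 chunks, which lines up with the ~10x speedup reported above
for the substr() test.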