Re: [HACKERS] Pluggable storage
On Tue, Aug 8, 2017 at 2:21 PM, Amit Kapila wrote:
> On Tue, Jun 13, 2017 at 7:20 AM, Haribabu Kommi wrote:
> > On Fri, Oct 14, 2016 at 7:26 AM, Alvaro Herrera <alvhe...@2ndquadrant.com> wrote:
> >> I have sent the partial patch I have to Hari Babu Kommi. We expect that
> >> he will be able to further this goal some more.
> >
> > Thanks Alvaro for sharing your development patch.
> >
> > Most of the patch design is the same as described by Alvaro in the first
> > mail [1]. I will detail the modifications, pending items and open items
> > (needing discussion) to implement proper pluggable storage.
> >
> > Here I attached WIP patches to support pluggable storage. The patches in
> > the series may not work individually. Many things are still under
> > development. These patches are just to share the approach of the current
> > development.
>
> +typedef struct StorageAmRoutine
> +{
>
> In this structure, you have already covered most of the APIs that a
> new storage module needs to provide, but I think there could be more.
> One such API could be heap_hot_search. This seems specific to the current
> heap, where we have the provision of HOT. I think we can provide a new
> API tuple_search or something like that.

Thanks for the review. Yes, the StorageAmRoutine needs more function
pointers. Currently I am adding all the functions that are present in
heapam.h and some slot-related functions from tuptable.h. Once I
stabilize the code and the APIs that are currently added, I will further
enhance it with the remaining functions that are necessary to support
the pluggable storage API.

Regards,
Hari Babu
Fujitsu Australia
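The dispatch pattern under discussion — a struct of function pointers that upper layers call through instead of calling heap routines directly — can be sketched as a standalone toy in C. The struct name below echoes the WIP patches, but the fields and helper names are invented for illustration; the real StorageAmRoutine carries many more entries (scan, insert, slot callbacks, visibility checks, ...).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, much-reduced stand-in for a table AM dispatch table. */
typedef struct StorageAmRoutine
{
	const char *name;
	/* returns 1 when a matching tuple was found and copied to out */
	int			(*tuple_search) (const char *key, char *out, size_t outlen);
} StorageAmRoutine;

/* One possible "storage engine": a single hard-coded tuple. */
static int
toy_tuple_search(const char *key, char *out, size_t outlen)
{
	if (strcmp(key, "42") == 0)
	{
		snprintf(out, outlen, "tuple-for-42");
		return 1;
	}
	return 0;
}

static const StorageAmRoutine toy_am = {"toyam", toy_tuple_search};

/*
 * Upper layers only ever see the routine table, which is what makes
 * the storage implementation pluggable.
 */
static int
storage_search(const StorageAmRoutine *am, const char *key,
			   char *out, size_t outlen)
{
	return am->tuple_search(key, out, outlen);
}
```

A second implementation would simply supply different functions in its own table; callers such as storage_search() stay unchanged.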
Re: [HACKERS] Pluggable storage
On Mon, Aug 7, 2017 at 11:12 PM, Amit Kapila wrote:
> On Tue, Aug 1, 2017 at 1:56 PM, Haribabu Kommi wrote:
> > On Sun, Jul 23, 2017 at 4:10 PM, Amit Kapila wrote:
> >> > 1. Design an API that returns values/nulls array and convert that into a
> >> > HeapTuple whenever it is required in the upper layers. All the existing
> >> > heap form/deform tuples are used for every tuple with some adjustments.
> >>
> >> So, this would have the additional cost of form/deform. Also, how
> >> would it have lesser changes as compared to what you have described
> >> earlier?
> >
> > Yes, it has the additional cost of form/deform. It is the same approach
> > that is described earlier. But I had an intention of modifying everywhere
> > the HeapTuple is accessed. With the other prototype changes of removing
> > HeapTuple usage from triggers, I realized that it needs some clear design
> > to proceed further, instead of combining those changes with the pluggable
> > storage API.
> >
> > - The heap_getnext function is kept as it is, and it is used only for
> >   system table access.
> > - A heap_getnext_slot function is introduced to return the slot whenever
> >   data is found, otherwise NULL. This function is used in all the places
> >   from the executor and so on.
> >
> > - The TupleTableSlot structure is modified to contain a void * tuple
> >   instead of a HeapTuple. It also contains the storage handler functions.
> > - heap_insert and similar functions can take a slot as an argument and
> >   perform the insert operation.
> >
> > In the cases where a TupleTableSlot cannot be sent, form a HeapTuple from
> > the data and send it, and also note down that it is HeapTuple data, not
> > the tuple from the storage.
> ..
> >
> >> > 3. Design an API that returns StorageTuple(void *) but the necessary
> >> > format information of that tuple can be obtained from the tupledesc.
> >> > Wherever the tuple is present, there exists a tupledesc in most of the
> >> > cases. How about adding some kind of information in tupledesc to find
> >> > out the tuple format and call the necessary functions?
> >
> > heap_getnext returns a StorageTuple instead of a HeapTuple. The tuple
> > type information is available in the TupleDesc structure.
> >
> > All heap_insert and similar functions accept a TupleTableSlot as input
> > and perform the insert operation. This approach is almost the same as the
> > first approach, except that the storage handler functions are stored in
> > the TupleDesc.
>
> Why do we need to store the handler function in TupleDesc? As of now, the
> above patch series has it available in RelationData and TupleTableSlot; I
> am not sure if keeping it in TupleDesc instead is a good idea. Which
> places require TupleDesc to contain the handler? If those are few places,
> can we think of passing it as a parameter?

Till now I have been able to proceed without adding any storage handler
functions to the TupleDesc structure. Sure, I will try the way of passing
it as a parameter when there is a need for it.

During the progress of the patch, I am facing problems in designing the
storage API regarding the Buffer. For example, to replace
HeapTupleSatisfiesMVCC and related functions with function pointers: in
the HeapTuple format, the tuple may belong to one buffer, so the buffer is
passed to the HeapTupleSatisfies* functions along with the tuple. But with
other storage formats, a single buffer may not contain the actual data.
This buffer is used to set the hint bits and mark the buffer as dirty. If
the buffer is not available, performance may suffer for the following
queries because the hint bits are not set.

A Buffer is also obtained from heap_fetch, heap_lock_tuple and related
functions to check tuple visibility, but currently returning a buffer from
the above heap_* functions is not possible for other formats. Also, for
HeapTuple data, the tuple data is copied into a palloc'd buffer instead of
pointing directly to the page. So, is returning a Buffer valid here or
not?

Currently I am proceeding to remove the Buffer as a parameter in the API;
in case that affects performance, we need to find a different approach to
handling the hint bits.

Comments?

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] [BUGS] Replication to Postgres 10 on Windows is broken
On Thu, Aug 10, 2017 at 09:59:40PM -0400, Tom Lane wrote:
> Noah Misch writes:
> > On Tue, Aug 08, 2017 at 07:25:37PM -0400, Peter Eisentraut wrote:
> >> I don't think I can usefully contribute to this. Could someone else
> >> take it?

This PostgreSQL 10 open item is past due for your status update. Kindly
send a status update within 24 hours, and include a date for your
subsequent status update. Refer to the policy on open item ownership:
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

> > If nobody volunteers, you could always resolve this by reverting 1e8a850
> > and successors.
>
> I think you're blaming the victim. Our current theory about the cause
> of this is that on Windows, WaitLatchOrSocket cannot be used to wait for
> completion of a nonblocking connect() call. That seems pretty broken
> independently of whether libpqwalreceiver needs the capability.

Yes, the theorized defect lies in the APIs commit 1e8a850 used, not in the
commit itself. Nonetheless, commit 1e8a850 promoted the defect from one
reachable only by writing C code to one reachable by merely configuring
replication on Windows according to the documentation. For that, its
committer owns this open item. Besides the one approach I mentioned, there
exist several other fine ways to implement said ownership.

> In any case, we have a draft patch, so what we should be pressing for
> is for somebody to test it.

Now done. (Thanks, Jobin.)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] POC: Sharing record typmods between backends
Hi,

On 2017-08-11 20:39:13 +1200, Thomas Munro wrote:
> Please find attached a new patch series. I apologise in advance for
> 0001 and note that the patchset now weighs in at ~75kB compressed.
> Here are my in-line replies to your two reviews:

Replying to a few points here, then I'll do a pass through your
submission...

> On Tue, Jul 25, 2017 at 10:09 PM, Andres Freund wrote:
> > It does concern me that we're growing yet another somewhat different
> > hashtable implementation. Yet I don't quite see how we could avoid
> > it. dynahash relies on proper pointers, simplehash doesn't do locking
> > (and shouldn't) and also relies on pointers, although to a much lesser
> > degree. All the open coded tables aren't a good match either. So I
> > don't quite see an alternative, but I'd love one.
>
> Yeah, I agree. To deal with data structures with different pointer
> types, locking policy, inlined hash/eq functions etc, perhaps there is
> a way we could eventually do 'policy based design' using the kind of
> macro trickery you started where we generate N different hash table
> variations but only have to maintain source code for one chaining hash
> table implementation? Or perl scripts that effectively behave as a
> cfront^H^H^H nevermind. I'm not planning to investigate that for this
> cycle.

Whaaa, what have I done! But more seriously, I'm doubtful it's worth
going there.

> > + * level. However, when a resize operation begins, all partition locks must
> > + * be acquired simultaneously for a brief period. This is only expected to
> > + * happen a small number of times until a stable size is found, since growth is
> > + * geometric.
> >
> > I'm a bit doubtful that we need partitioning at this point, and that it
> > doesn't actually *degrade* performance for your typmod case.
>
> Yeah, partitioning not needed for this case, but this is supposed to
> be more generally useful. I thought about making the number of
> partitions a construction parameter, but it doesn't really hurt, does
> it?

Well, using multiple locks and such certainly isn't free. An exclusively
owned cacheline mutex is nearly an order of magnitude faster than one
that's currently shared, not to speak of modified. Also it does increase
the size overhead, which might end up mattering for a few other cases.

> > + * Resizing is done incrementally so that no individual insert operation pays
> > + * for the potentially large cost of splitting all buckets.
> >
> > I'm not sure this is a reasonable tradeoff for the use-case suggested so
> > far, it doesn't exactly make things simpler. We're not going to grow
> > much.
>
> Yeah, designed to be more generally useful. Are you saying you would
> prefer to see the DHT patch split into an initial submission that does
> the simplest thing possible, so that the unlucky guy who causes the
> hash table to grow has to do all the work of moving buckets to a
> bigger hash table? Then we could move the more complicated
> incremental growth stuff to a later patch.

Well, most of the potential usecases for dsmhash I've heard about so far
don't actually benefit much from incremental growth. In nearly all the
implementations I've seen, incremental moves end up requiring more total
cycles than doing it at once, and for parallelism-type usecases the stall
isn't really an issue. So yes, I think this is something worth
considering. If we were to actually use DHT for shared caches or such,
this'd be different, but that seems darned far off.

> This is complicated, and in the category that I would normally want a
> stack of heavy unit tests for. If you don't feel like making
> decisions about this now, perhaps iteration (and incremental resize?)
> could be removed, leaving only the most primitive get/put hash table
> facilities -- just enough for this purpose? Then a later patch could
> add them back, with a set of really convincing unit tests...

I'm inclined to go for that, yes.

> > +/*
> > + * Detach from a hash table. This frees backend-local resources associated
> > + * with the hash table, but the hash table will continue to exist until it is
> > + * either explicitly destroyed (by a backend that is still attached to it), or
> > + * the area that backs it is returned to the operating system.
> > + */
> > +void
> > +dht_detach(dht_hash_table *hash_table)
> > +{
> > +	/* The hash table may have been destroyed. Just free local memory. */
> > +	pfree(hash_table);
> > +}
> >
> > Somewhat inclined to add a debugging refcount - seems like bugs around
> > that might be annoying to find. Maybe also add an assert ensuring that
> > no locks are held?
>
> Added an assert that no locks are held.
>
> In an earlier version I had reference counts. Then I realised that it
> wasn't really helping anything. The state of being 'attached' to a
> dht_hash_table isn't really the same as holding a heavyweight resource
> like a DSM segment or a
[HACKERS] pg_stat_statements query normalization, and the 'in' operator
Hello there,

Given the following list of queries:

create table foo (id serial, bar integer);
select * from foo where id in (1);
select * from foo where id in (2,3);
select * from foo where id in (1,3,5);
select * from foo where id in (select id from foo);

would it be possible to have the first three select queries normalized
into a single one, so that 'select query from pg_stat_statements' returns
something like:

select * from foo where id in (...);
select * from foo where id in (select id from foo);
(2 rows)

instead of:

select * from foo where id in (?,?);
select * from foo where id in (?,?,?);
select * from foo where id in (?);
select * from foo where id in (select id from foo);
(4 rows)

?
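What is being asked for is a fingerprint that ignores the length of an IN-list. As a rough standalone illustration of the desired mapping (pg_stat_statements itself fingerprints the parse tree; it does not rewrite query strings like this, and the function name here is invented), a post-processing step over the already-normalized text could look like:

```c
#include <stdio.h>
#include <string.h>

/*
 * Collapse normalized IN-lists such as "(?,?,?)" down to "(...)" so
 * that queries differing only in the number of IN-list elements map to
 * one entry. Subqueries like "(select ...)" are left untouched because
 * they do not start with a '?' placeholder.
 */
static void
collapse_in_lists(const char *src, char *dst, size_t dstlen)
{
	size_t		j = 0;

	while (*src && j + 6 < dstlen)
	{
		/* does a run of '?' placeholders separated by commas start here? */
		if (src[0] == '(' && src[1] == '?')
		{
			const char *p = src + 1;

			while (*p == '?' || *p == ',')
				p++;
			if (*p == ')')
			{
				memcpy(dst + j, "(...)", 5);
				j += 5;
				src = p + 1;	/* skip past the closing paren */
				continue;
			}
		}
		dst[j++] = *src++;
	}
	dst[j] = '\0';
}
```

With this, both "in (?)" and "in (?,?,?)" collapse to "in (...)", while "in (select id from foo)" passes through unchanged.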
[HACKERS] additional contrib test suites
Here are some small test suites for some contrib modules, as well as for
pg_archivecleanup, that didn't have one previously, plus one patch to
improve code coverage in a module. Will add to the commit fest.

Testing on different platforms and with different build configurations
would be useful.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From 65b09d5b15b43c7eab24aae2d2e7e7a7979d57f3 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Fri, 11 Aug 2017 21:04:04 -0400
Subject: [PATCH 1/7] adminpack: Add test suite

---
 contrib/adminpack/.gitignore             |   4 +
 contrib/adminpack/Makefile               |   2 +
 contrib/adminpack/expected/adminpack.out | 144 +++
 contrib/adminpack/sql/adminpack.sql      |  56
 4 files changed, 206 insertions(+)
 create mode 100644 contrib/adminpack/.gitignore
 create mode 100644 contrib/adminpack/expected/adminpack.out
 create mode 100644 contrib/adminpack/sql/adminpack.sql

diff --git a/contrib/adminpack/.gitignore b/contrib/adminpack/.gitignore
new file mode 100644
index 00..5dcb3ff972
--- /dev/null
+++ b/contrib/adminpack/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/

diff --git a/contrib/adminpack/Makefile b/contrib/adminpack/Makefile
index f065f84bfb..89c249bc0d 100644
--- a/contrib/adminpack/Makefile
+++ b/contrib/adminpack/Makefile
@@ -8,6 +8,8 @@
 EXTENSION = adminpack
 DATA = adminpack--1.0.sql
 PGFILEDESC = "adminpack - support functions for pgAdmin"
 
+REGRESS = adminpack
+
 ifdef USE_PGXS
 PG_CONFIG = pg_config
 PGXS := $(shell $(PG_CONFIG) --pgxs)

diff --git a/contrib/adminpack/expected/adminpack.out b/contrib/adminpack/expected/adminpack.out
new file mode 100644
index 00..83cbb741da
--- /dev/null
+++ b/contrib/adminpack/expected/adminpack.out
@@ -0,0 +1,144 @@
+CREATE EXTENSION adminpack;
+-- create new file
+SELECT pg_file_write('test_file1', 'test1', false);
+ pg_file_write
+---------------
+             5
+(1 row)
+
+SELECT pg_read_file('test_file1');
+ pg_read_file
+--------------
+ test1
+(1 row)
+
+-- append
+SELECT pg_file_write('test_file1', 'test1', true);
+ pg_file_write
+---------------
+             5
+(1 row)
+
+SELECT pg_read_file('test_file1');
+ pg_read_file
+--------------
+ test1test1
+(1 row)
+
+-- error, already exists
+SELECT pg_file_write('test_file1', 'test1', false);
+ERROR:  file "test_file1" exists
+SELECT pg_read_file('test_file1');
+ pg_read_file
+--------------
+ test1test1
+(1 row)
+
+-- disallowed file paths
+SELECT pg_file_write('../test_file0', 'test0', false);
+ERROR:  path must be in or below the current directory
+SELECT pg_file_write('/tmp/test_file0', 'test0', false);
+ERROR:  absolute path not allowed
+SELECT pg_file_write(current_setting('data_directory') || '/test_file4', 'test4', false);
+ pg_file_write
+---------------
+             5
+(1 row)
+
+SELECT pg_file_write(current_setting('data_directory') || '/../test_file4', 'test4', false);
+ERROR:  reference to parent directory ("..") not allowed
+-- rename file
+SELECT pg_file_rename('test_file1', 'test_file2');
+ pg_file_rename
+----------------
+ t
+(1 row)
+
+SELECT pg_read_file('test_file1');  -- not there
+ERROR:  could not stat file "test_file1": No such file or directory
+SELECT pg_read_file('test_file2');
+ pg_read_file
+--------------
+ test1test1
+(1 row)
+
+-- error
+SELECT pg_file_rename('test_file1', 'test_file2');
+WARNING:  file "test_file1" is not accessible: No such file or directory
+ pg_file_rename
+----------------
+ f
+(1 row)
+
+-- rename file and archive
+SELECT pg_file_write('test_file3', 'test3', false);
+ pg_file_write
+---------------
+             5
+(1 row)
+
+SELECT pg_file_rename('test_file2', 'test_file3', 'test_file3_archive');
+ pg_file_rename
+----------------
+ t
+(1 row)
+
+SELECT pg_read_file('test_file2');  -- not there
+ERROR:  could not stat file "test_file2": No such file or directory
+SELECT pg_read_file('test_file3');
+ pg_read_file
+--------------
+ test1test1
+(1 row)
+
+SELECT pg_read_file('test_file3_archive');
+ pg_read_file
+--------------
+ test3
+(1 row)
+
+-- unlink
+SELECT pg_file_unlink('test_file1');  -- does not exist
+ pg_file_unlink
+----------------
+ f
+(1 row)
+
+SELECT pg_file_unlink('test_file2');  -- does not exist
+ pg_file_unlink
+----------------
+ f
+(1 row)
+
+SELECT pg_file_unlink('test_file3');
+ pg_file_unlink
+----------------
+ t
+(1 row)
+
+SELECT pg_file_unlink('test_file3_archive');
+ pg_file_unlink
+----------------
+ t
+(1 row)
+
+SELECT pg_file_unlink('test_file4');
+ pg_file_unlink
+----------------
+ t
+(1 row)
+
+-- superuser checks
+CREATE USER regress_user1;
+SET ROLE regress_user1;
+SELECT pg_file_write('test_file0', 'test0', false);
+ERROR:  only superuser may access generic file functions
+SELECT pg_file_rename('test_file0', 'test_file0');
+ERROR:  only
Re: [HACKERS] POC: Sharing record typmods between backends
On 2017-08-11 11:14:44 -0400, Robert Haas wrote:
> On Fri, Aug 11, 2017 at 4:39 AM, Thomas Munro wrote:
> > OK. Now it's ds_hash_table.{c,h}, where "ds" stands for "dynamic
> > shared". Better? If we were to do other data structures in DSA
> > memory they could follow that style: ds_red_black_tree.c, ds_vector.c,
> > ds_deque.c etc and their identifier prefix would be drbt_, dv_, dd_
> > etc.
> >
> > Do you want to see a separate patch to rename dsa.c? Got a better
> > name? You could have spoken up earlier :-) It does sound a bit
> > like the thing from crypto or perhaps a scary secret government
> > department.

I, and I bet a lot of other people, kind of missed dsa being merged for a
while...

> I doubt that we really want to have accessor functions with names like
> dynamic_shared_hash_table_insert or ds_hash_table_insert. Long names
> are fine, even desirable, for APIs that aren't too widely used,
> because they're relatively self-documenting, but a 30-character
> function name gets annoying in a hurry if you have to call it very
> often, and this is intended to be reusable for other things that want
> a dynamic shared memory hash table. I think we should (a) pick some
> reasonably short prefix for all the function names, like dht or dsht
> or ds_hash, but not ds_hash_table or dynamic_shared_hash_table and (b)
> also use that prefix as the name for the .c and .h files.

Yea, I agree with this. Something like dsmhash_{insert,...} seems like
it'd kinda work without being too ambiguous like dht imo is, while still
being reasonably short.

> Right now, we've got a situation where the most widely-used hash table
> implementation uses dynahash.c for the code, hsearch.h for the
> interface, and "hash" as the prefix for the names, and that's really
> hard to remember. I think having a consistent naming scheme
> throughout would be a lot better.

Yea, that situation still occasionally confuses me, a good 10 years after
starting to look at pg... There's even a dynahash.h, except it's useless.
And dynahash.c doesn't even include hsearch.h directly (included via
shmem.h)! Personally I'd actually be in favor of moving the hsearch.h
stuff into dynahash.h and leaving hsearch.h as a wrapper.

- Andres
Re: [HACKERS] WIP: Failover Slots
On 2017-08-02 16:35:17 -0400, Robert Haas wrote:
> I actually think failover slots are quite desirable, especially now
> that we've got logical replication in core. In a review of this
> thread I don't see anyone saying otherwise. The debate has really
> been about the right way of implementing that.

Given that I presumably was one of the people pushing back more strongly:
I agree with that. Besides disagreeing with the proposed implementation,
our disagreements solely seem to have been about prioritization.

I still think we should have a halfway agreed upon *design* for logical
failover before we introduce a concept that's quite possibly going to be
incompatible with it. But that doesn't mean it has to be submitted/merged
to core.

> - When a standby connects to a master, it can optionally supply a list
>   of slot names that it cares about.
> - The master responds by periodically notifying the standby of changes
>   to the slot contents using some new replication sub-protocol message.
> - The standby applies those updates to its local copies of the slots.
>
> So, you could create a slot on a standby with an "uplink this" flag of
> some kind, and it would then try to keep it up to date using the
> method described above. It's not quite clear to me how to handle the
> case where the corresponding slot doesn't exist on the master, or
> initially does but then it's later dropped, or it initially doesn't
> but it's later created.

I think there's a couple of design goals we need to agree upon before
going into the weeds of how exactly we want this to work. Some of the
axes I can think of are:

- How do we want to deal with cascaded setups; do slots have to be
  available everywhere, or not?
- What kind of PITR integration do we want? Note that simple WAL-based
  slots do *NOT* provide proper PITR support; there's not enough
  interlock easily available (you'd have to save slots at the end, then
  increment minRecoveryLSN to a point later than the slot saving).
- How much divergence are we going to accept between logical decoding on
  standbys and failover slots? I'm probably a lot closer to zero than
  Craig is.
- How much divergence are we going to accept between infrastructure for
  logical failover, and logical failover via failover slots (or however
  we're naming this)? Again, I'm probably a lot closer to zero than
  Craig is.

Greetings,

Andres Freund
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Hi,

On 2017-07-21 19:32:02 +0300, Marina Polyakova wrote:
> Here is the third version of the patch for pgbench, thanks to Fabien
> Coelho's comments. As in the previous one, transactions with
> serialization and deadlock failures are rolled back and retried until
> they end successfully or their number of tries reaches the maximum.

Just had a need for this feature, and took it for a short test drive.
Some comments:

- It'd be useful to display a retry percentage of all transactions,
  similar to what's displayed for failed transactions.
- It appears that we now unconditionally do not disregard a connection
  after a serialization / deadlock failure. Good. But that's useful far
  beyond just deadlocks / serialization errors, and should probably be
  exposed.
- It'd be useful to also conveniently display the number of retried
  transactions, rather than just the total number of retries.

Nice feature!

- Andres
Re: [HACKERS] Parallel COPY FROM execution
On Fri, Aug 11, 2017 at 9:55 AM, Alex K wrote:
> - I have used both Latch and ConditionVariable for the same purpose --
>   wait until some signal occurs -- and for me as an end user they
>   perform quite similarly. I looked into the condition_variable.c code
>   and it uses a Latch and a SpinLock under the hood. So what are the
>   differences and dis-/advantages between Latch and ConditionVariable?

A ConditionVariable lets you signal the processes that are waiting
without needing to know in advance exactly which processes those are. If
you use latches directly, you'll have to somehow keep track of which
processes need to be signaled.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
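The property Robert describes can be seen with plain POSIX condition variables, which PostgreSQL's ConditionVariable resembles: the signaller wakes every waiter without keeping any list of who is waiting — the condition variable does that bookkeeping, which is exactly what a raw latch would force the signaller to do by hand. A standalone pthreads sketch (not PostgreSQL code; names invented):

```c
#include <pthread.h>

#define NWAITERS 4

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int ready = 0;
static int woken = 0;

static void *
waiter(void *arg)
{
	(void) arg;
	pthread_mutex_lock(&lock);
	while (!ready)				/* re-check the condition after each wakeup */
		pthread_cond_wait(&cond, &lock);
	woken++;
	pthread_mutex_unlock(&lock);
	return NULL;
}

static int
run_broadcast_demo(void)
{
	pthread_t	threads[NWAITERS];

	for (int i = 0; i < NWAITERS; i++)
		pthread_create(&threads[i], NULL, waiter, NULL);

	/* The signaller needs no knowledge of the individual waiters. */
	pthread_mutex_lock(&lock);
	ready = 1;
	pthread_cond_broadcast(&cond);
	pthread_mutex_unlock(&lock);

	for (int i = 0; i < NWAITERS; i++)
		pthread_join(threads[i], NULL);
	return woken;				/* all NWAITERS observed the wakeup */
}
```

Because each waiter rechecks `ready` under the mutex, there is no lost-wakeup race even if the broadcast happens before a thread reaches pthread_cond_wait — the same reason ConditionVariableSleep is always wrapped in a condition-recheck loop.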
Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)
On Thu, Aug 10, 2017 at 11:12 AM, Alexander Korotkov wrote:
> These results look very cool!
> I think CSN is eventually inevitable, but it's a long distance feature.
> Thus, this optimization could serve us well before we have CSN.
> Do you expect there are cases when this patch causes a slowdown? What
> about the case when we scan each xip array only once (not sure how to
> simulate that using pgbench)?

Just a random thought here:

The statements pgbench executes are pretty simple: they touch one row in
one table. You wouldn't expect them to scan the xip array very many
times. If even those statements touch the array enough for this to win,
it seems like it might be hard to construct an even worse case. I might
be missing something, though.

We're not going to accept code like this, though:

+d = xip[i] >> 6;
+j = k = xip[i] & mask;
+while (xiphash[j] != InvalidTransactionId)
+{
+    j = (k + d) & mask;
+    d = d*5 + 1;
+}

Single-character variable names, hard-coded constants, and no comments!

I kind of doubt as a general point that we really want another open-coded
hash table -- I wonder if this could be made to use simplehash.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
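For contrast, here is one way the quoted probe loop could look with named constants and comments, as a standalone toy (TransactionId reduced to uint32, fixed table size) rather than the patch's actual code. The step derivation differs slightly from the quoted snippet: it is forced odd, so it is coprime with the power-of-two table size and the probe sequence provably visits every slot.

```c
#include <stdint.h>

/* open-addressing table with double hashing; size must be a power of two */
#define XIP_HASH_BITS	6
#define XIP_HASH_SIZE	(1U << XIP_HASH_BITS)
#define XIP_HASH_MASK	(XIP_HASH_SIZE - 1)
#define InvalidXid		0		/* 0 marks an empty slot */

typedef uint32_t ToyXid;

static ToyXid xiphash[XIP_HASH_SIZE];

/* derive an odd probe step from the high bits of the xid */
static uint32_t
xip_hash_step(ToyXid xid)
{
	return ((xid >> XIP_HASH_BITS) << 1) | 1;
}

/* caller must keep the table below 100% load, or lookups would not terminate */
static void
xip_hash_insert(ToyXid xid)
{
	uint32_t	slot = xid & XIP_HASH_MASK;
	uint32_t	step = xip_hash_step(xid);

	while (xiphash[slot] != InvalidXid)
		slot = (slot + step) & XIP_HASH_MASK;
	xiphash[slot] = xid;
}

static int
xip_hash_lookup(ToyXid xid)
{
	uint32_t	slot = xid & XIP_HASH_MASK;
	uint32_t	step = xip_hash_step(xid);

	/* an empty slot proves the xid is absent */
	while (xiphash[slot] != InvalidXid)
	{
		if (xiphash[slot] == xid)
			return 1;
		slot = (slot + step) & XIP_HASH_MASK;
	}
	return 0;
}
```

Whether open coding like this or reusing simplehash is preferable is exactly the question raised above; the sketch only shows that the readability complaints are cheap to address.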
Re: [HACKERS] reload-through-the-top-parent switch the partition table
On Fri, Aug 11, 2017 at 5:36 AM, Rushabh Lathia wrote:
> Please find attached a patch with the changes.

I found the way that you had the logic structured in flagInhTables() to
be quite hard to follow, so I rewrote it in the attached version. This
version also has a bunch of minor cosmetic changes.

Barring objections, I'll commit it once the tree opens for v11
development.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

load-via-partition-root-rmh.patch
Description: Binary data
Re: [HACKERS] Arrays of domains
I wrote:
> Probably a better answer is to start supporting arrays over domain
> types. That was left unimplemented in the original domains patch,
> but AFAICS not for any better reason than lack of round tuits.

Attached is a patch series that allows us to create arrays of domain
types. I haven't tested this in great detail, so there might be some
additional corners of the system that need work, but it passes basic
sanity checks. I believe it's independent of the other patch I have in
the commitfest for domains over composites, but I haven't tested for
interactions there either.

01-rationalize-coercion-APIs.patch cleans up the APIs of
coerce_to_domain() and some internal functions in parse_coerce.c so that
we consistently pass around a CoercionContext along with CoercionForm.
Previously, we sometimes passed an "isExplicit" boolean flag instead,
which is strictly less information; and coerce_to_domain() didn't even
get that, but instead had to reverse-engineer isExplicit from
CoercionForm. That's contrary to the documentation in primnodes.h that
says that CoercionForm only affects display and not semantics. I don't
think this change fixes any live bugs, but it makes things more
consistent. The main reason for doing it, though, is that now
build_coercion_expression() receives ccontext, which it needs in order to
be able to recursively invoke coerce_to_target_type(), as required by the
next patch.

02-reimplement-ArrayCoerceExpr.patch is the core of the patch. It
changes ArrayCoerceExpr so that the node does not directly know any
details of what has to be done to the individual array elements while
performing the array coercion. Instead, the per-element processing is
represented by a sub-expression whose input is a source array element and
whose output is a target array element. This simplifies life in
parse_coerce.c, because it can build that sub-expression by a recursive
invocation of coerce_to_target_type(), and it allows the executor to
handle the per-element processing as a compiled expression instead of
hard-wired code.

This is probably about a wash or a small loss performance-wise for the
simplest case, where we just need to invoke one coercion function for
each element. However, there are many cases where the existing code ends
up generating two nested ArrayCoerceExprs, one to do the type conversion
and one to apply a typmod (length) coercion function. In the new code
there will be just one ArrayCoerceExpr, saving one deconstruction and
reconstruction of the array. If I hadn't done it like this, adding
domains into the mix could have produced as many as three
ArrayCoerceExprs, where the top one would have only checked domain
constraints; that did not sound nice performance-wise, and it would have
required a lot of code duplication as well.

Finally, 03-support-arrays-of-domains.patch simply turns on the spigot by
creating an array type in DefineDomain(), and adds some test cases.
Given the new method of handling ArrayCoerceExpr, everything Just Works.

I'll add this to the next commitfest.
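With patch 03 in place, usage of the new capability would presumably look something like the following (an untested sketch against the patched tree; the domain and table names are invented):

```sql
-- A domain with a constraint, and an array over it.
CREATE DOMAIN posint AS int CHECK (VALUE > 0);

CREATE TABLE widgets (id serial PRIMARY KEY, counts posint[]);

INSERT INTO widgets (counts) VALUES (ARRAY[1, 2, 3]);  -- accepted
INSERT INTO widgets (counts) VALUES (ARRAY[1, -2]);    -- rejected by the CHECK
```

The second INSERT is where the new ArrayCoerceExpr machinery matters: each element is coerced to the domain, and its constraints checked, by the compiled per-element sub-expression.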
regards, tom lane

diff --git a/src/backend/optimizer/prep/preptlist.c b/src/backend/optimizer/prep/preptlist.c
index afc733f..277ca8b 100644
*** a/src/backend/optimizer/prep/preptlist.c
--- b/src/backend/optimizer/prep/preptlist.c
*************** expand_targetlist(List *tlist, int comma
*** 306,314 ****
  				new_expr = coerce_to_domain(new_expr,
  											InvalidOid, -1,
  											atttype,
  											COERCE_IMPLICIT_CAST,
  											-1,
- 											false,
  											false);
  			}
  			else
--- 306,314 ----
  				new_expr = coerce_to_domain(new_expr,
  											InvalidOid, -1,
  											atttype,
+ 											COERCION_IMPLICIT,
  											COERCE_IMPLICIT_CAST,
  											-1,
  											false);
  			}
  			else
diff --git a/src/backend/parser/parse_coerce.c b/src/backend/parser/parse_coerce.c
index 0bc7dba..c406cea 100644
*** a/src/backend/parser/parse_coerce.c
--- b/src/backend/parser/parse_coerce.c
***************
*** 34,48 ****
  static Node *coerce_type_typmod(Node *node,
  					Oid targetTypeId, int32 targetTypMod,
! 					CoercionForm cformat, int location,
! 					bool isExplicit, bool hideInputCoercion);
  static void hide_coercion_node(Node *node);
  static Node *build_coercion_expression(Node *node,
  						  CoercionPathType pathtype,
  						  Oid funcId,
  						  Oid targetTypeId, int32 targetTypMod,
! 						  CoercionForm cformat, int location,
! 						  bool isExplicit);
  static Node *coerce_record_to_complex(ParseState *pstate, Node *node,
  						 Oid targetTypeId,
  						 CoercionContext ccontext,
--- 34,49 ----
  static Node *coerce_type_typmod(Node *node,
  					Oid targetTypeId, int32 targetTypMod,
! 					CoercionContext ccontext, CoercionForm cformat,
! 					int location,
! 					bool hideInputCoercion);
  static void hide_coercion_node(Node *node);
  static Node
Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)
On Thu, 10 Aug 2017 18:12:34 +0300, Alexander Korotkov writes: > On Thu, Aug 10, 2017 at 3:30 PM, Sokolov Yura > wrote: > > > On 2017-07-31 18:56, Sokolov Yura wrote: > > > >> Good day, everyone. > >> > >> In an attempt to improve performance of YCSB on a zipfian distribution, > >> it was found that significant time is spent in XidInMVCCSnapshot > >> scanning the snapshot->xip array. While the overall CPU time is not too > >> noticeable, it has a measurable impact on scalability. > >> > >> First I tried to sort snapshot->xip in GetSnapshotData, and search > >> in a sorted array. But since snapshot->xip is not touched if no > >> transaction contention occurs, always sorting xip is not the best > >> option. > >> > >> Then I sorted the xip array on demand in XidInMVCCSnapshot, only if a > >> search in snapshot->xip occurs (i.e. lazy sorting). It performs much > >> better, but since it is O(N log N), the sort's execution time is > >> noticeable for a large number of clients. > >> > >> The third approach (present in the attached patch) is building a hash table > >> lazily on the first search in the xip array. > >> > >> Note: the hash table is not built if the number of "in-progress" xids is > >> less than 60. Tests show there is no big benefit from doing so > >> (at least on Intel Xeon).
> >> > >> With regards, > > > > Here is a new, more correct patch version, and results for pgbench > > with zipfian distribution (50% read 50% write) (same as Alik's > > tests at > > https://www.postgresql.org/message-id/BF3B6F54-68C3-417A-BFA > > b-fb4d66f2b...@postgrespro.ru )
> >
> >  clients | master | hashsnap2 | hashsnap2_lwlock
> > ---------+--------+-----------+-----------------
> >       10 | 203384 |    203813 |           204852
> >       20 | 334344 |    334268 |           363510
> >       40 | 228496 |    231777 |           383820
> >       70 | 146892 |    148173 |           221326
> >      110 |  99741 |    111580 |           157327
> >      160 |  65257 |     81230 |           112028
> >      230 |  38344 |     56790 |            77514
> >      310 |  22355 |     39249 |            55907
> >      400 |  13402 |     26899 |            39742
> >      500 |   8382 |     17855 |            28362
> >      650 |   5313 |     11450 |            17497
> >      800 |   3352 |      7816 |            11030
> >
> > (Note: I'm not quite sure why my numbers for master are lower than > > Alik's. Probably, our test methodology differs a bit. Perhaps it is > > because I don't recreate tables between client number changes, but > > between branch switches.)
> These results look very cool! > I think CSN is eventually inevitable, but it's a long-distance > feature. Thus, this optimization could serve us well before we > have CSN. Do you expect there are cases when this patch causes a > slowdown? What about the case when we scan each xip array only once (not > sure how to simulate that using pgbench)?
The patched version allocates a bit more memory per process (if the number of backends is greater than 60), and for a copied snapshot (if the number of concurrent transactions is greater than 60). I have no strong evidence the patch could cause a slowdown. When the xip array is scanned only once, XidInMVCCSnapshot usually consumes 0.2-0.5% of CPU in total, and scanning the xip array even less. So even though building the hash table slows the first scan three- to four-fold, it could increase overall CPU usage by just 0.5-1%, and that is not necessarily directly reflected in tps.
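The lazy-build idea described above can be sketched in miniature. This is an illustration only, not the patch's actual C code; the threshold of 60 mirrors the note in the patch description, and Python's `set` stands in for the real hash table:

```python
class Snapshot:
    """Toy model of snapshot->xip with a lazily built lookup table.

    Mirrors the approach above: keep the plain xip array cheap to
    build in GetSnapshotData, and only pay for a hash table on the
    first membership search, and only when the array is large enough
    for the build cost to be worthwhile.
    """
    HASH_THRESHOLD = 60  # below this, a linear scan is cheap enough

    def __init__(self, xip):
        self.xip = list(xip)   # in-progress xids, unsorted
        self._xip_set = None   # hash table, built lazily on first search

    def xid_in_snapshot(self, xid):
        # Small arrays: scan linearly, no setup cost at all.
        if len(self.xip) < self.HASH_THRESHOLD:
            return xid in self.xip
        # First search over a large array: build the hash table once;
        # every later search is O(1) instead of O(N).
        if self._xip_set is None:
            self._xip_set = set(self.xip)
        return xid in self._xip_set
```

The key property is that snapshots whose xip array is never searched (the common uncontended case) pay nothing beyond the plain array copy.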
-- With regards, Sokolov Yura aka funny_falcon Postgres Professional: https://postgrespro.ru The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Request more documentation for incompatibility of parallelism and plpgsql exec_run_select
On Thu, Aug 10, 2017 at 6:21 PM, Mark Dilger wrote: > That misses the point I was making. I was suggesting a syntax where > the caller promises to use all rows without stopping short, and the > database performance problems of having a bunch of parallel workers > suspended in mid query is simply the caller's problem if the caller does > not honor the contract. I understand. My point is: that isn't sufficient to solve the problem. It's not a question of whether the caller uses all the tuples, but whether the caller gets to do anything else while the tuples are being generated, so to make it work, you'd have to first run the parallel query to completion and buffer all the tuples in memory and *only then* begin iterating over them and running the user-supplied code. You can't run the parallel query until it produces a tuple, then do the code inside the loop, then run it until the next tuple shows up, etc. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
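The two orderings being contrasted above can be sketched abstractly. This is a toy model, not PostgreSQL internals: `produce` stands in for the parallel query and `user_code` for the plpgsql loop body; the point is only the ordering of the two:

```python
def interleaved(produce, user_code):
    # What an ordinary cursor loop does: user code runs between
    # tuples, so the tuple producer stays suspended mid-query while
    # arbitrary user code executes. For a parallel query this would
    # leave parallel workers parked indefinitely.
    for row in produce():
        user_code(row)

def buffered(produce, user_code):
    # The only workable ordering described above: run the query to
    # completion, buffer every tuple in memory, and only then start
    # iterating and running the user-supplied code.
    rows = list(produce())   # producer finishes entirely here
    for row in rows:
        user_code(row)       # producer is already done; safe now
```

Both produce the same rows in the same order; the difference is purely when the producer is allowed to be suspended, which is the crux of Robert's objection.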
Re: [HACKERS] Patches I'm thinking of pushing shortly
Robert Haas writes: > On Fri, Aug 11, 2017 at 11:24 AM, Tom Lane wrote: >> 3. remove-pgbench-option-ordering-constraint.patch from >> https://www.postgresql.org/message-id/20559.1501703...@sss.pgh.pa.us >> >> That one was already informally reviewed by two people, so I don't >> think it needs another look. > I'd vote for putting this fix into v10, but maybe that's just me. ... OK by me, any objections? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patches I'm thinking of pushing shortly
On Fri, Aug 11, 2017 at 11:24 AM, Tom Lane wrote: > 3. remove-pgbench-option-ordering-constraint.patch from > https://www.postgresql.org/message-id/20559.1501703...@sss.pgh.pa.us > > That one was already informally reviewed by two people, so I don't > think it needs another look. I'd vote for putting this fix into v10, but maybe that's just me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] scan on inheritance parent with no children in current session
On Sun, Aug 6, 2017 at 10:56 PM, Amit Langote wrote: > Good catch. I agree that getting an Append node after that commit is > unintentional and we should fix so that we don't get an Append. So, +1 to > your patch. I looked at the patch and the code fix seems to do what we want. So, I also agree that this is a good fix, but I don't think it fixes the whole problem. Consider:

rhaas=# create table parent (a int) partition by list (a);
CREATE TABLE
rhaas=# create temp table child partition of parent for values in (1);
CREATE TABLE
rhaas=# explain verbose select * from parent;
                               QUERY PLAN
-------------------------------------------------------------------------
 Append  (cost=0.00..35.50 rows=2550 width=4)
   ->  Seq Scan on pg_temp_3.child  (cost=0.00..35.50 rows=2550 width=4)
         Output: child.a
(3 rows)

But the comments say:

 * A childless table is never considered to be an inheritance set; therefore
 * a parent RTE must always have at least two associated AppendRelInfos.

Yet, not. So at least the comments need to be updated; not sure if we want to try to eliminate the Append node in this case also. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Patches I'm thinking of pushing shortly
I have some patches sitting around in my workspace that I think are non-controversial, and so I was considering just pushing them once the tree opens for v11 development. If anyone thinks they need further review, I'll put them into the September commitfest, but otherwise we might as well skip the overhead. These are: 1. check-hash-bucket-size-against-work_mem-2.patch from https://www.postgresql.org/message-id/13698.1487283...@sss.pgh.pa.us That discussion sort of trailed off, but there wasn't really anyone saying not to commit it, and no new ideas have surfaced. 2. simplify-simple-expresssion-checking.patch from https://www.postgresql.org/message-id/2257.1498412...@sss.pgh.pa.us This is the patch to get rid of plpgsql's exec_simple_check_node() as Robert suggested. It's never been anything but a maintenance nuisance. 3. remove-pgbench-option-ordering-constraint.patch from https://www.postgresql.org/message-id/20559.1501703...@sss.pgh.pa.us That one was already informally reviewed by two people, so I don't think it needs another look. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] POC: Sharing record typmods between backends
On Fri, Aug 11, 2017 at 4:39 AM, Thomas Munro wrote: > OK. Now it's ds_hash_table.{c,h}, where "ds" stands for "dynamic > shared". Better? If we were to do other data structures in DSA > memory they could follow that style: ds_red_black_tree.c, ds_vector.c, > ds_deque.c etc and their identifier prefix would be drbt_, dv_, dd_ > etc. > > Do you want to see a separate patch to rename dsa.c? Got a better > name? You could have spoken up earlier :-) It does sound a bit > like the thing from crypto or perhaps a scary secret government > department. I doubt that we really want to have accessor functions with names like dynamic_shared_hash_table_insert or ds_hash_table_insert. Long names are fine, even desirable, for APIs that aren't too widely used, because they're relatively self-documenting, but a 30-character function name gets annoying in a hurry if you have to call it very often, and this is intended to be reusable for other things that want a dynamic shared memory hash table. I think we should (a) pick some reasonably short prefix for all the function names, like dht or dsht or ds_hash, but not ds_hash_table or dynamic_shared_hash_table and (b) also use that prefix as the name for the .c and .h files. Right now, we've got a situation where the most widely-used hash table implementation uses dynahash.c for the code, hsearch.h for the interface, and "hash" as the prefix for the names, and that's really hard to remember. I think having a consistent naming scheme throughout would be a lot better. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel COPY FROM execution
Greetings pgsql-hackers, I have completed the general infrastructure for parallel COPY FROM execution, consisting of a main (master) process and multiple BGWorkers connected to the master using a personal message queue (shm_mq).

The master process does:
- Dynamic shared memory allocation with parallel state across BGWorkers and master
- Attaching every worker to its personal message queue (shm_mq)
- Waiting for worker initialization using a Latch
- Reading raw text lines using CopyReadLine and putting them into the shm_mq's via round-robin to balance queue load
- When EOF is reached, sending a zero-length message; workers safely shut down when they receive it
- Waiting for the workers until they complete their jobs using a ConditionVariable

Each BGWorker does:
- Signal the master on initialization via a Latch
- Receive raw text lines over its personal shm_mq and put them into the log (for now)
- Reinitialize the db connection using the same db_id and user_id as the main process
- Signal the master via a ConditionVariable on job done

All parallel state modifications are done under LWLocks. You can find the actual code here https://github.com/ololobus/postgres/pull/2/files (it is still in progress, so it has a lot of duplications and comments which are to-be-deleted).

To go forward I have to overcome some obstacles:
- Currently all copy.c methods are designed to work with one giant structure, CopyState. It includes buffers, many initial parameters which stay unchanged, and a few variables which vary during COPY FROM execution. Since I need all these parameters, I have to obtain them somehow inside each BGWorker process. I see two possible solutions here:
1) Perform BeginCopyFrom initialization inside the master and put the required parameters into shared memory. However, many of them are arrays of variable size (e.g. partition_tupconv_maps, force_notnull_flags), so I cannot put them into shmem inside one single struct. The best idea I have is to put each parameter under a personal shmem TOC key, which seems quite ugly.
2) Perform BeginCopyFrom initialization inside each BGWorker. However, it also opens the input file/pipe for read, which is not necessary for the workers and may cause some interference with the master, but I can modify BeginCopyFrom.
- I have used both Latch and ConditionVariable for the same purpose (wait until some signal occurs), and for me as an end user they perform quite similarly. I looked into the condition_variable.c code and it uses a Latch and a SpinLock under the hood. So what are the differences and dis-/advantages between Latch and ConditionVariable?

I will be glad if someone helps me find an answer to my question; also, any comments and remarks on the overall COPY FROM processing architecture are very welcome. Alexey -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
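The master/worker flow described above can be sketched with ordinary threads and queues. This is an illustrative model only, with Python primitives standing in for PostgreSQL's shm_mq, Latch and ConditionVariable; a `None` message plays the role of the zero-length shutdown message:

```python
import queue
import threading

def run_parallel_copy(lines, nworkers, handle_line):
    """Toy model of the architecture above: the master reads raw lines
    and round-robins them to per-worker queues; an empty (None) message
    tells each worker to shut down, and the master then waits for all
    workers to finish their jobs."""
    queues = [queue.Queue() for _ in range(nworkers)]

    def worker(q):
        while True:
            line = q.get()
            if line is None:     # zero-length message: safe shutdown
                break
            handle_line(line)    # stand-in for per-row COPY processing

    workers = [threading.Thread(target=worker, args=(q,)) for q in queues]
    for w in workers:
        w.start()
    # Master: distribute lines round-robin to balance queue load.
    for i, line in enumerate(lines):
        queues[i % nworkers].put(line)
    for q in queues:             # EOF reached: send shutdown sentinel
        q.put(None)
    for w in workers:            # wait until all jobs are done
        w.join()
```

Unlike the real patch, this model sidesteps the CopyState problem entirely because threads share the master's address space; the shared-memory parameter-passing question above has no analogue here.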
Re: [HACKERS] Thoughts on unit testing?
Peter Eisentraut writes: > On 8/10/17 17:53, Thomas Munro wrote: >> The current regression tests, isolation tests and TAP tests are very >> good (though I admit my experience with TAP is limited), but IMHO we >> are lacking support for C-level unit testing. Complicated, fiddly >> things with many states, interactions, edge cases etc can be hard to >> get full test coverage on from the outside. Consider >> src/backend/utils/mmgr/freepage.c as a case in point. > I don't have a good idea how to address this, but I agree that something > in this area would be widely useful. I don't think anyone doubts it would be useful. The question is can we get to something useful with an acceptable level of effort and overhead. So far I've not seen anything promising in that sense. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Funny WAL corruption issue
On Fri, Aug 11, 2017 at 1:33 PM, Greg Stark wrote: > On 10 August 2017 at 15:26, Chris Travers wrote: > > > > > > The bitwise comparison is interesting. Remember the error was: > > > > pg_xlogdump: FATAL: error in WAL record at 1E39C/E1117FB8: unexpected > > pageaddr 1E375/61118000 in log segment 0001E39C00E1, offset > > 1146880 > ... > > Since this didn't throw a checksum error (we have data checksums > disabled but wal records ISTR have a separate CRC check), would this > perhaps indicate that the checksum operated over incorrect data? > > No checksum error and this "unexpected pageaddr" doesn't necessarily > mean data corruption. It could mean that when the database stopped logging > it was reusing a wal file and the old wal stream had a record boundary > on the same byte position. So the previous record checksum passed and > the following record checksum passes but the record header is for a > different wal stream position. > > I think you could actually hack xlogdump to ignore this condition and > keep outputting and you'll see whether the records that follow appear > to be old wal log data. I haven't actually tried this though. > For better or worse, I get a different error at the same spot if I try this (doing so involved disabling the check in the backend WAL reader): pg_xlogdump: FATAL: error in WAL record at 1E39C/E1117FB8: invalid contrecord length 4509 at 1E39C/E1117FF8 If I hack it to ignore all errors on that record, no further records come out, though it does run over the same records. This leads me to conclude there are no further valid records. > > -- > greg > -- Best Wishes, Chris Travers Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor lock-in. http://www.efficito.com/learn_more
Re: [HACKERS] Thoughts on unit testing?
On 8/10/17 17:53, Thomas Munro wrote: > The current regression tests, isolation tests and TAP tests are very > good (though I admit my experience with TAP is limited), but IMHO we > are lacking support for C-level unit testing. Complicated, fiddly > things with many states, interactions, edge cases etc can be hard to > get full test coverage on from the outside. Consider > src/backend/utils/mmgr/freepage.c as a case in point. I don't have a good idea how to address this, but I agree that something in this area would be widely useful. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SCRAM protocol documentation
On 8/11/17 09:06, Álvaro Hernández Tortosa wrote: > Strictly speaking the RFC assumes that the username is at least 1 > character. I understand this was precisely Peter's original comment. Well, my main point was that the documentation, the code, and the code comments all say slightly different things. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SCRAM protocol documentation
On 8/11/17 07:18, Michael Paquier wrote: > The problem is where a username includes characters as a comma or '=', > which can be avoided if the string is in UTF-8 as the username is > prepared with SASLprep before being used in the SASL exchange, but we > have no way now to be sure now that the string is actually in UTF-8. > If at some point we decide that only things using UTF-8 are good to be > used during authentication, using the username in the exchange > messages instead of the one in the startup packet would be fine and > actually better IMO in the long term. Please note that the > specification says that both the username and the password must be > encoded in UTF-8, so we are not completely compliant here. If there is > something to address, that would be this part. So we already handle passwords. Can't we handle user names the same way? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
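For reference, RFC 5802 addresses the comma/'=' problem at the wire level independently of the encoding question above: a username carried in client-first-message must be sent as a "saslname", with '=' escaped as "=3D" and ',' as "=2C". A rough sketch of that escaping (the UTF-8/SASLprep issue discussed above is a separate matter):

```python
def encode_saslname(username):
    """Escape a username for SCRAM's client-first-message per RFC 5802:
    '=' becomes '=3D' and ',' becomes '=2C'. Escape '=' first so the
    '=' characters introduced for commas are not re-escaped."""
    return username.replace("=", "=3D").replace(",", "=2C")

def decode_saslname(saslname):
    # Reverse order of the encoder: expand commas first, then '='.
    return saslname.replace("=2C", ",").replace("=3D", "=")
```

(The RFC additionally forbids any other occurrence of '=' in a received saslname; a real implementation would reject such input rather than decode it leniently as this sketch does.)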
Re: [HACKERS] Page Scan Mode in Hash Index
> > Comments on the latest patch: Thank you once again for reviewing my patches. > > 1. > /* > + * We remember prev and next block number along with current block > + * number so that if fetching the tuples using cursor we know the > + * page from where to begin. This is for the case where we have > + * reached the end of bucket chain without finding any matching > + * tuples. > + */ > + if (!BlockNumberIsValid(opaque->hasho_nextblkno)) > + prev_blkno = opaque->hasho_prevblkno; > > This doesn't seem to be the right place for this comment as you are > not saving next or current block number here. I am not sure whether > you really need this comment at all. Will it be better if you move > this to else part of the loop and reword it as: > > "Remember next and previous block numbers for scrollable cursors to > know the start position." Shifted the comments to else part of the loop. > > 2. The code in _hash_readpage looks quite bloated. I think we can > make it better if we do something like below. > a. > +_hash_readpage(IndexScanDesc scan, Buffer *bufP, ScanDirection dir) > { > .. > + if (so->currPos.buf == so->hashso_bucket_buf || > + so->currPos.buf == so->hashso_split_bucket_buf) > + { > + so->currPos.prevPage = InvalidBlockNumber; > + so->currPos.nextPage = opaque->hasho_nextblkno; > + LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK); > + } > + else > + { > + so->currPos.prevPage = opaque->hasho_prevblkno; > + so->currPos.nextPage = opaque->hasho_nextblkno; > + _hash_relbuf(rel, so->currPos.buf); > + so->currPos.buf = InvalidBuffer; > + } > .. > } > > This code chunk is same for both forward and backward scans. I think > you can have single copy of this code by moving it out of if-else > loop. Done. > > b. > +_hash_readpage(IndexScanDesc scan, Buffer *bufP, ScanDirection dir) > { > .. 
> + /* new page, locate starting position by binary search */ > + offnum = _hash_binsearch(page, so->hashso_sk_hash); > + > + itemIndex = _hash_load_qualified_items(scan, page, offnum, dir); > + > + while (itemIndex == 0) > + { > + /* > + * Could not find any matching tuples in the current page, move to > + * the next page. Before leaving the current page, also deal with > + * any killed items. > + */ > + if (so->numKilled > 0) > + _hash_kill_items(scan); > + > + /* > + * We remember prev and next block number along with current block > + * number so that if fetching the tuples using cursor we know the > + * page from where to begin. This is for the case where we have > + * reached the end of bucket chain without finding any matching > + * tuples. > + */ > + if (!BlockNumberIsValid(opaque->hasho_nextblkno)) > + prev_blkno = opaque->hasho_prevblkno; > + > + _hash_readnext(scan, , , ); > + if (BufferIsValid(buf)) > + { > + so->currPos.buf = buf; > + so->currPos.currPage = BufferGetBlockNumber(buf); > + so->currPos.lsn = PageGetLSN(page); > + offnum = _hash_binsearch(page, so->hashso_sk_hash); > + itemIndex = _hash_load_qualified_items(scan, page, > + offnum, dir); > .. > } > > Have just one copy of code search the offset and load qualified items. > Change the above while loop to do..while loop and have a check in > between break the loop when item index is not zero. Done that way. > > 3. > - * Find the first item in the index that > - * satisfies the qualification associated with the scan descriptor. On > - * success, the page containing the current index tuple is read locked > - * and pinned, and the scan's opaque data entry is updated to > - * include the buffer. > + * We find the first item (or, if backward scan, the last item) in > + * the index that satisfies the qualification associated with the > + * scan descriptor. 
On success, the page containing the current > + * index tuple is read locked and pinned, and data about the > + * matching tuple(s) on the page has been loaded into so->currPos, > + * scan->xs_ctup.t_self is set to the heap TID of the current tuple. > + * > + * If there are no matching items in the index, we return FALSE, > + * with no pins or locks held. > */ > bool > _hash_first(IndexScanDesc scan, ScanDirection dir) > @@ -229,15 +297,9 @@ _hash_first(IndexScanDesc scan, ScanDirection dir) > > I don't think on success, the lock or pin is held anymore, after this > patch the only pin will be retained and that too for bucket page. > Also, there is no need to modify the part of the comment which is not > related to change in this patch. Corrected. > > I don't see scan->xs_ctup.t_self getting set anywhere in this > function. I think you are setting it in hashgettuple. It is better > to move that assignment from hashgettuple to _hash_first so as to be > consistent with _hash_next. Done that way. > > 4. > + * On successful exit, scan->xs_ctup.t_self is set to the TID > + * of the next heap tuple, and if requested, scan->xs_itup > + * points to a copy of the index tuple. so->currPos is updated > + * as needed. > + * > + * On failure exit (no more tuples), we return FALSE with no > + *
Re: [HACKERS] SCRAM protocol documentation
On 11/08/17 15:00, Michael Paquier wrote: On Fri, Aug 11, 2017 at 9:31 PM, Álvaro Hernández Tortosawrote: On 11/08/17 13:18, Michael Paquier wrote: On Fri, Aug 11, 2017 at 3:50 PM, Álvaro Hernández Tortosa wrote: Relatedly, the SCRAM specification doesn't appear to allow omitting the user name in this manner. Why don't we just send the actual user name, even though it's redundant with the startup message? The problem is where a username includes characters as a comma or '=', which can be avoided if the string is in UTF-8 as the username is prepared with SASLprep before being used in the SASL exchange, but we have no way now to be sure now that the string is actually in UTF-8. If at some point we decide that only things using UTF-8 are good to be used during authentication, using the username in the exchange messages instead of the one in the startup packet would be fine and actually better IMO in the long term. Please note that the specification says that both the username and the password must be encoded in UTF-8, so we are not completely compliant here. If there is something to address, that would be this part. The reason why the username is ignored, unless I'm wrong, is not exactly that it is already sent. It is that Postgres does not restrict usernames to be UTF-8 only, while SCRAM does. As such, if a username would not be UTF-8, it will not be sent reliably over SCRAM. That's basically the point I was making. Note that I would not be against Postgres forcing strings to be in UTF-8. Now things are fuzzy because of the lack of restrictions. I'm +1 for that. But I guess that involves a protocol change, and that's a completely different can of worms If there's a clear meaning about ignoring the user here, why not settle on something like the "*"? It's not going to change the world sending a few bytes less on initialization, but I guess it doesn't hurt either... I am not sure either that '*' would be that much helpful. 
Requiring that things are in UTF-8 would be more compliant with the original RFC. But we really don't need to send the username, since Postgres already knows it (and that accommodates for non UTF-8 usernames). So why bother? Just sending something like "*" (which is UTF-8 and produces the same value under Saslprep) should be enough. I think the idea of ignoring the username is pretty neat, but maybe a "standard" like "send me an asterisk here" could be even better than leaving it empty. Personally I don't see much difference between both, so I'd rather leave things as they are now. Strictly speaking the RFC assumes that the username is at least 1 character. I understand this was precisely Peter's original comment. Álvaro -- Álvaro Hernández Tortosa --- <8K>data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SCRAM protocol documentation
On Fri, Aug 11, 2017 at 9:31 PM, Álvaro Hernández Tortosawrote: > On 11/08/17 13:18, Michael Paquier wrote: >> On Fri, Aug 11, 2017 at 3:50 PM, Álvaro Hernández Tortosa >> wrote: Relatedly, the SCRAM specification doesn't appear to allow omitting the user name in this manner. Why don't we just send the actual user name, even though it's redundant with the startup message? >> >> The problem is where a username includes characters as a comma or '=', >> which can be avoided if the string is in UTF-8 as the username is >> prepared with SASLprep before being used in the SASL exchange, but we >> have no way now to be sure now that the string is actually in UTF-8. >> If at some point we decide that only things using UTF-8 are good to be >> used during authentication, using the username in the exchange >> messages instead of the one in the startup packet would be fine and >> actually better IMO in the long term. Please note that the >> specification says that both the username and the password must be >> encoded in UTF-8, so we are not completely compliant here. If there is >> something to address, that would be this part. > > The reason why the username is ignored, unless I'm wrong, is not exactly > that it is already sent. It is that Postgres does not restrict usernames to > be UTF-8 only, while SCRAM does. As such, if a username would not be UTF-8, > it will not be sent reliably over SCRAM. That's basically the point I was making. Note that I would not be against Postgres forcing strings to be in UTF-8. Now things are fuzzy because of the lack of restrictions. >>> If there's a clear meaning about ignoring the user here, why not >>> settle >>> on something like the "*"? It's not going to change the world sending a >>> few >>> bytes less on initialization, but I guess it doesn't hurt either... >> >> I am not sure either that '*' would be that much helpful. Requiring >> that things are in UTF-8 would be more compliant with the original >> RFC. 
> > But we really don't need to send the username, since Postgres already > knows it (and that accommodates for non UTF-8 usernames). So why bother? > Just sending something like "*" (which is UTF-8 and produces the same value > under Saslprep) should be enough. I think the idea of ignoring the username > is pretty neat, but maybe a "standard" like "send me an asterisk here" could > be even better than leaving it empty. Personally I don't see much difference between both, so I'd rather leave things as they are now. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Funny WAL corruption issue
On Fri, Aug 11, 2017 at 1:33 PM, Greg Stark wrote: > On 10 August 2017 at 15:26, Chris Travers wrote: > > > > > > The bitwise comparison is interesting. Remember the error was: > > > > pg_xlogdump: FATAL: error in WAL record at 1E39C/E1117FB8: unexpected > > pageaddr 1E375/61118000 in log segment 0001E39C00E1, offset > > 1146880 > ... > > Since this didn't throw a checksum error (we have data checksums > disabled but wal records ISTR have a separate CRC check), would this > perhaps indicate that the checksum operated over incorrect data? > > No checksum error and this "unexpected pageaddr" doesn't necessarily > mean data corruption. It could mean that when the database stopped logging > it was reusing a wal file and the old wal stream had a record boundary > on the same byte position. So the previous record checksum passed and > the following record checksum passes but the record header is for a > different wal stream position. > I expect to test this theory shortly. Assuming it is correct, what can we do to prevent restarts of slaves from running into it? > I think you could actually hack xlogdump to ignore this condition and > keep outputting and you'll see whether the records that follow appear > to be old wal log data. I haven't actually tried this though. > > -- > greg > -- Best Wishes, Chris Travers Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor lock-in. http://www.efficito.com/learn_more
Re: [HACKERS] [BUGS] Replication to Postgres 10 on Windows is broken
It appears that the patch is not helping. Errors like the below are still appearing in the log:
===
2017-08-11 12:22:35 UTC [2840]: [1-1] user=,db=,app=,client= FATAL: could not connect to the primary server: could not send data to server: Socket is not connected (0x2749/10057) could not send startup packet: Socket is not connected (0x2749/10057)
2017-08-11 12:22:40 UTC [1964]: [1-1] user=,db=,app=,client= FATAL: could not connect to the primary server: could not send data to server: Socket is not connected (0x2749/10057) could not send startup packet: Socket is not connected (0x2749/10057)
2017-08-11 12:22:45 UTC [248]: [1-1] user=,db=,app=,client= FATAL: could not connect to the primary server: could not send data to server: Socket is not connected (0x2749/10057) could not send startup packet: Socket is not connected (0x2749/10057)
===
On Fri, Aug 11, 2017 at 7:28 AM, Augustine, Jobin < jobin.august...@openscg.com> wrote: > I am in an effort to create an independent build environment on Windows to > test the patch from Andres. > I shall come back with a result possibly in another 24 hours. > > -Jobin > > On Fri, Aug 11, 2017 at 7:25 AM, Andres Freund wrote: >> On 2017-08-10 18:51:07 -0700, Noah Misch wrote: >> > On Tue, Aug 08, 2017 at 07:25:37PM -0400, Peter Eisentraut wrote: >> > > On 8/7/17 21:06, Noah Misch wrote: >> > > >> That would fit. Until v10 (commit 1e8a850), PQconnectStart() had >> no in-tree >> > > >> callers outside of libpq itself. >> > > > [Action required within three days. This is a generic >> notification.] >> > > > >> > > > The above-described topic is currently a PostgreSQL 10 open item. >> Peter, >> > > > since you committed the patch believed to have created it, you own >> this open >> > > > item. >> > > >> > > I don't think I can usefully contribute to this. Could someone else >> > > take it? >> > >> > If nobody volunteers, you could always resolve this by reverting >> 1e8a850 and >> > successors. >> >> I've volunteered a fix nearby, just can't test it.
I don't think we can >> require committers to work on windows. >> >> - Andres -- Jobin Augustine Architect : Production Database Operations OpenSCG phone : +91 9989932600
Re: [HACKERS] SCRAM protocol documentation
On 11/08/17 13:18, Michael Paquier wrote: On Fri, Aug 11, 2017 at 3:50 PM, Álvaro Hernández Tortosawrote: On 11/08/17 03:57, Peter Eisentraut wrote: The SCRAM protocol documentation (https://www.postgresql.org/docs/devel/static/sasl-authentication.html) states "To avoid confusion, the client should use pg_same_as_startup_message as the username in the client-first-message." However, the client implementation in libpq doesn't actually do that, it sends an empty string for the user name. I find no other reference to "pg_same_as_startup_message" in the sources. Should the documentation be updated? Yes, definitely. I think that we should mention that the server uses the username of the startup packet and ignores the data sent by the frontend potentially provided in client-first-message. But it already says so the documentation: "The username that was already sent in the startup message is used instead." Relatedly, the SCRAM specification doesn't appear to allow omitting the user name in this manner. Why don't we just send the actual user name, even though it's redundant with the startup message? The problem is where a username includes characters as a comma or '=', which can be avoided if the string is in UTF-8 as the username is prepared with SASLprep before being used in the SASL exchange, but we have no way now to be sure now that the string is actually in UTF-8. If at some point we decide that only things using UTF-8 are good to be used during authentication, using the username in the exchange messages instead of the one in the startup packet would be fine and actually better IMO in the long term. Please note that the specification says that both the username and the password must be encoded in UTF-8, so we are not completely compliant here. If there is something to address, that would be this part. The reason why the username is ignored, unless I'm wrong, is not exactly that it is already sent. 
It is that Postgres does not restrict usernames to be UTF-8 only, while SCRAM does. As such, if a username is not UTF-8, it cannot be sent reliably over SCRAM. If there's a clear meaning about ignoring the user here, why not settle on something like "*"? It's not going to change the world sending a few bytes less on initialization, but I guess it doesn't hurt either... I am not sure either that '*' would be that much helpful. Requiring that things are in UTF-8 would be more compliant with the original RFC. But we really don't need to send the username, since Postgres already knows it (and that accommodates non-UTF-8 usernames). So why bother? Just sending something like "*" (which is UTF-8 and produces the same value under SASLprep) should be enough. I think the idea of ignoring the username is pretty neat, but maybe a "standard" like "send me an asterisk here" could be even better than leaving it empty. Álvaro -- Álvaro Hernández Tortosa --- 8Kdata -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
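For reference, the username field under discussion lives in RFC 5802's client-first-message. The sketch below is hypothetical illustration code — not libpq's or pgjdbc's actual implementation — showing both the saslname escaping that handles the ',' and '=' characters mentioned above (',' becomes "=2C", '=' becomes "=3D"), and what the message looks like when a "*" placeholder is sent instead of the real username:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Escape a username for use as a SCRAM "saslname" (RFC 5802):
 * ',' becomes "=2C" and '=' becomes "=3D".  Caller frees the result.
 */
static char *
scram_escape_username(const char *raw)
{
    size_t  len = strlen(raw);
    char   *out = malloc(len * 3 + 1);  /* worst case: every char escaped */
    char   *p = out;

    for (size_t i = 0; i < len; i++)
    {
        if (raw[i] == ',')
        {
            memcpy(p, "=2C", 3);
            p += 3;
        }
        else if (raw[i] == '=')
        {
            memcpy(p, "=3D", 3);
            p += 3;
        }
        else
            *p++ = raw[i];
    }
    *p = '\0';
    return out;
}

/*
 * Build a client-first-message: the gs2 header "n,," (no channel
 * binding, no authzid), then "n=<saslname>,r=<client-nonce>".
 */
static char *
scram_client_first(const char *username, const char *nonce)
{
    char   *user = scram_escape_username(username);
    size_t  need = strlen("n,,n=,r=") + strlen(user) + strlen(nonce) + 1;
    char   *msg = malloc(need);

    snprintf(msg, need, "n,,n=%s,r=%s", user, nonce);
    free(user);
    return msg;
}
```

With username "*" and a client nonce this yields `n,,n=*,r=<nonce>`, while the real username still travels only in the startup packet — which is the trade-off the thread is weighing.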
Re: [HACKERS] Funny WAL corruption issue
On 10 August 2017 at 15:26, Chris Travers wrote: > > > The bitwise comparison is interesting. Remember the error was: > > pg_xlogdump: FATAL: error in WAL record at 1E39C/E1117FB8: unexpected > pageaddr 1E375/61118000 in log segment 0001E39C00E1, offset > 1146880 ... > Since this didn't throw a checksum error (we have data checksums disabled, but > WAL records ISTR have a separate CRC check), would this perhaps indicate that > the checksum operated over incorrect data? No checksum error and this "unexpected pageaddr" doesn't necessarily mean data corruption. It could mean that when the database stopped logging it was reusing a WAL file, and the old WAL stream had a record boundary at the same byte position. So the previous record's checksum passed and the following record's checksum passes, but the record header is for a different WAL stream position. I think you could actually hack xlogdump to ignore this condition and keep outputting, and you'll see whether the records that follow appear to be old WAL data. I haven't actually tried this though. -- greg
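The recycled-segment explanation can be made concrete with a little arithmetic. Assuming the default 16 MB segment and 8 kB page sizes, a hypothetical sketch (not pg_xlogdump's actual code) of the pageaddr check it performs — using the very numbers from the error message above:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed defaults: 16 MB WAL segments, 8 kB WAL pages. */
#define WAL_SEG_SIZE   (16 * 1024 * 1024)
#define WAL_PAGE_SIZE  8192

/*
 * The address a WAL page *should* carry in its header (xlp_pageaddr),
 * given where in the segment we are reading.  "unexpected pageaddr" is
 * reported when the header disagrees -- which also happens harmlessly
 * when a recycled segment still contains pages from the old WAL stream
 * past the true end of WAL.
 */
static uint64_t
expected_pageaddr(uint64_t segno, uint32_t offset)
{
    return segno * WAL_SEG_SIZE + (offset - offset % WAL_PAGE_SIZE);
}

/*
 * Segment number encoded in a file name like 0001E39C00E1
 * (timeline 0001, then two hex fields hi/lo).
 */
static uint64_t
segno_from_name(uint32_t hi, uint32_t lo)
{
    /* 0x100000000 / WAL_SEG_SIZE = 256 segments per "xlogid" */
    return (uint64_t) hi * 256 + lo;
}
```

For segment 0001E39C00E1 at offset 1146880 this computes the expected pageaddr 1E39C/E1118000, whereas the page actually read carried 1E375/61118000 — consistent with a leftover page from an older WAL stream rather than corruption.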
Re: [HACKERS] SCRAM protocol documentation
On Fri, Aug 11, 2017 at 3:50 PM, Álvaro Hernández Tortosawrote: > On 11/08/17 03:57, Peter Eisentraut wrote: >> The SCRAM protocol documentation >> (https://www.postgresql.org/docs/devel/static/sasl-authentication.html) >> states >> >> "To avoid confusion, the client should use pg_same_as_startup_message as >> the username in the client-first-message." >> >> However, the client implementation in libpq doesn't actually do that, it >> sends an empty string for the user name. I find no other reference to >> "pg_same_as_startup_message" in the sources. Should the documentation >> be updated? Yes, definitely. I think that we should mention that the server uses the username of the startup packet and ignores the data sent by the frontend potentially provided in client-first-message. >> Relatedly, the SCRAM specification doesn't appear to allow omitting the >> user name in this manner. Why don't we just send the actual user name, >> even though it's redundant with the startup message? The problem is where a username includes characters as a comma or '=', which can be avoided if the string is in UTF-8 as the username is prepared with SASLprep before being used in the SASL exchange, but we have no way now to be sure now that the string is actually in UTF-8. If at some point we decide that only things using UTF-8 are good to be used during authentication, using the username in the exchange messages instead of the one in the startup packet would be fine and actually better IMO in the long term. Please note that the specification says that both the username and the password must be encoded in UTF-8, so we are not completely compliant here. If there is something to address, that would be this part. > If there's a clear meaning about ignoring the user here, why not settle > on something like the "*"? It's not going to change the world sending a few > bytes less on initialization, but I guess it doesn't hurt either... I am not sure either that '*' would be that much helpful. 
Requiring that things are in UTF-8 would be more compliant with the original RFC. -- Michael
Re: [HACKERS] reload-through-the-top-parent switch the partition table
On Thu, Aug 10, 2017 at 8:26 PM, Robert Haaswrote: > On Thu, Aug 10, 2017 at 3:47 AM, Rushabh Lathia > wrote: > >> (1) seems like a pretty arbitrary restriction, so I don't like that > >> option. (2) would hurt performance in some use cases. Do we have an > >> option (3)? > > > > How about protecting option 2) with the load-via-partition-root > protection. > > Means > > load the parents information even dump is not set only when > > load-via-partition-root > > & ispartition. > > > > This won't hurt performance for the normal cases. > > Yes, that seems like the right approach. > > +Dump data via the top-most partitioned table (rather than > partition > +table) when dumping data for the partition table. > > I think we should phrase this a bit more clearly, something like this: > When dumping a COPY or INSERT statement for a partitioned table, > target the root of the partitioning hierarchy which contains it rather > than the partition itself. This may be useful when reloading data on > a server where rows do not always fall into the same partitions as > they did on the original server. This could happen, for example, if > the partitioning column is of type text and the two system have > different definitions of the collation used to partition the data. > > Done. > +printf(_(" --load-via-partition-rootload partition table via > the root relation\n")); > > "relation" seems odd to me here. root table? > > Done. > /* Don't bother computing anything for non-target tables, either > */ > if (!tblinfo[i].dobj.dump) > +{ > +/* > + * Load the parents information for the partition table when > + * the load-via-partition-root option is set. As we need the > + * parents information to get the partition root. > + */ > +if (dopt->load_via_partition_root && > +tblinfo[i].ispartition) > +findParentsByOid([i], inhinfo, numInherits); > continue; > +} > > Duplicating the call to findParentsByOid seems less then ideal. 
How > about doing something like this: > > if (tblinfo[i].dobj.dump) > { > find_parents = true; > mark_parents = true; > } > else if (dopt->load_via_partition_root && tblinfo[i].ispartition) > find_parents = true; > > if (find_parents) > findParentsByOid([i], inhinfo, numInherits); > > etc. > > Done changes to avoid duplicate call to findParentsByOid(). > The comments for this function also need some work - e.g. the function > header comment deserves some kind of update for these changes. > > Done. > +static TableInfo * > +getRootTableInfo(TableInfo *tbinfo) > +{ > +Assert(tbinfo->ispartition); > +Assert(tbinfo->numParents == 1); > +if (tbinfo->parents[0]->ispartition) > +return getRootTableInfo(tbinfo->parents[0]); > + > +return tbinfo->parents[0]; > +} > > This code should iterate, not recurse, to avoid any risk of blowing > out the stack. > > Done. Please find attach patch with the changes. Thanks, Rushabh Lathia www.EnterpriseDB.com diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml index bafa031..de8297a 100644 --- a/doc/src/sgml/ref/pg_dump.sgml +++ b/doc/src/sgml/ref/pg_dump.sgml @@ -889,6 +889,21 @@ PostgreSQL documentation + --load-via-partition-root + + +When dumping a COPY or INSERT statement for a partitioned table, +target the root of the partitioning hierarchy which contains it rather +than the partition itself. This will be useful when reloading data on +a server where rows do not always fall into the same partitions as +they did on the original server. This could happen, for example, if +the partitioning column is of type text and the two system have +different definitions of the collation used to partition the data. 
+ + + + + --section=sectionname diff --git a/doc/src/sgml/ref/pg_dumpall.sgml b/doc/src/sgml/ref/pg_dumpall.sgml index aa944a2..dc1d3cc 100644 --- a/doc/src/sgml/ref/pg_dumpall.sgml +++ b/doc/src/sgml/ref/pg_dumpall.sgml @@ -431,6 +431,21 @@ PostgreSQL documentation + --load-via-partition-root + + +When dumping a COPY or INSERT statement for a partitioned table, +target the root of the partitioning hierarchy which contains it rather +than the partition itself. This will be useful when reloading data on +a server where rows do not always fall into the same partitions as +they did on the original server. This could happen, for example, if +the partitioning column is of type text and the two system have +different definitions of the collation used to partition the data. + + + + + --use-set-session-authorization
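For illustration, here is roughly what the iterative getRootTableInfo() Robert asked for might look like — a sketch using simplified stand-in structs rather than pg_dump's real TableInfo, with a loop instead of recursion so a deep partition tree cannot blow the stack:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for pg_dump's real structs. */
typedef struct TableInfo
{
    bool               ispartition;
    int                numParents;
    struct TableInfo **parents;
} TableInfo;

/*
 * Walk up the partitioning hierarchy to the topmost (root) partitioned
 * table.  Partitions always have exactly one parent, so we just follow
 * the parent link until we reach a table that is not itself a partition.
 */
static TableInfo *
getRootTableInfo(TableInfo *tbinfo)
{
    TableInfo *parent;

    assert(tbinfo->ispartition);
    assert(tbinfo->numParents == 1);

    parent = tbinfo->parents[0];
    while (parent->ispartition)
    {
        assert(parent->numParents == 1);
        parent = parent->parents[0];
    }
    return parent;
}
```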
Re: [HACKERS] Foreign tables privileges not shown in information_schema.table_privileges
On Thu, Aug 10, 2017 at 6:30 PM, Nicolas Thauvinwrote: > Hello, > > The information_schema.table_privileges view filters on regular tables > and views. Foreign tables are not shown in this view but they are in > other views of the information_schema like tables or column_privileges. > > Is it intentional? A patch is attached if not. The line was first added by 596652d6 and updated by 262e821d to include partitioned tables. Looks like we have forgot to add tables added in between i.e. foreign tables and materialized views. column_privileges doesn't have materialized views. Attached patch adds materialized views to column_privileges view along with your changes. Please add this to the next commitfest so that it doesn't get forgotten. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql index 98bcfa0..fbb5460 100644 --- a/src/backend/catalog/information_schema.sql +++ b/src/backend/catalog/information_schema.sql @@ -573,7 +573,7 @@ CREATE VIEW column_privileges AS pr_c.relowner FROM (SELECT oid, relname, relnamespace, relowner, (aclexplode(coalesce(relacl, acldefault('r', relowner.* FROM pg_class - WHERE relkind IN ('r', 'v', 'f', 'p') + WHERE relkind IN ('r', 'v', 'f', 'p', 'm') ) pr_c (oid, relname, relnamespace, relowner, grantor, grantee, prtype, grantable), pg_attribute a WHERE a.attrelid = pr_c.oid @@ -595,7 +595,7 @@ CREATE VIEW column_privileges AS ) pr_a (attrelid, attname, grantor, grantee, prtype, grantable), pg_class c WHERE pr_a.attrelid = c.oid - AND relkind IN ('r', 'v', 'f', 'p') + AND relkind IN ('r', 'v', 'f', 'p', 'm') ) x, pg_namespace nc, pg_authid u_grantor, @@ -1868,7 +1868,7 @@ CREATE VIEW table_privileges AS ) AS grantee (oid, rolname) WHERE c.relnamespace = nc.oid - AND c.relkind IN ('r', 'v', 'p') + AND c.relkind IN ('r', 'v', 'f', 'p', 'm') AND c.grantee = grantee.oid AND c.grantor = u_grantor.oid AND 
c.prtype IN ('INSERT', 'SELECT', 'UPDATE', 'DELETE', 'TRUNCATE', 'REFERENCES', 'TRIGGER')
Re: [HACKERS] POC: Sharing record typmods between backends
Hi, Please find attached a new patch series. I apologise in advance for 0001 and note that the patchset now weighs in at ~75kB compressed. Here are my in-line replies to your two reviews: On Tue, Jul 25, 2017 at 10:09 PM, Andres Freundwrote: > It does concern me that we're growing yet another somewhat different > hashtable implementation. Yet I don't quite see how we could avoid > it. dynahash relies on proper pointers, simplehash doesn't do locking > (and shouldn't) and also relies on pointers, although to a much lesser > degree. All the open coded tables aren't a good match either. So I > don't quite see an alternative, but I'd love one. Yeah, I agree. To deal with data structures with different pointer types, locking policy, inlined hash/eq functions etc, perhaps there is a way we could eventually do 'policy based design' using the kind of macro trickery you started where we generate N different hash table variations but only have to maintain source code for one chaining hash table implementation? Or perl scripts that effectively behave as a cfront^H^H^H nevermind. I'm not planning to investigate that for this cycle. > > diff --git a/src/backend/lib/dht.c b/src/backend/lib/dht.c > new file mode 100644 > index 000..2fec70f7742 > --- /dev/null > +++ b/src/backend/lib/dht.c > > FWIW, not a big fan of dht as a filename (nor of dsa.c). For one DHT > usually refers to distributed hash tables, which this is not, and for > another the abbreviation is so short it's not immediately > understandable, and likely to conflict further. I think it'd possibly > ok to have dht as symbol prefixes, but rename the file to be longer. OK. Now it's ds_hash_table.{c,h}, where "ds" stands for "dynamic shared". Better? If we were to do other data structures in DSA memory they could follow that style: ds_red_black_tree.c, ds_vector.c, ds_deque.c etc and their identifier prefix would be drbt_, dv_, dd_ etc. Do you want to see a separate patch to rename dsa.c? Got a better name? 
You could have spoken up earlier :-) It does sound like a bit like the thing from crypto or perhaps a scary secret government department. > + * To deal with currency, it has a fixed size set of partitions, each of > which > + * is independently locked. > > s/currency/concurrency/ I presume. Fixed. > + * Each bucket maps to a partition; so insert, find > + * and iterate operations normally only acquire one lock. Therefore, good > + * concurrency is achieved whenever they don't collide at the lock partition > > s/they/operations/? Fixed. > + * level. However, when a resize operation begins, all partition locks must > + * be acquired simultaneously for a brief period. This is only expected to > + * happen a small number of times until a stable size is found, since growth > is > + * geometric. > > I'm a bit doubtful that we need partitioning at this point, and that it > doesn't actually *degrade* performance for your typmod case. Yeah, partitioning not needed for this case, but this is supposed to be more generally useful. I thought about making the number of partitions a construction parameter, but it doesn't really hurt does it? > + * Resizing is done incrementally so that no individual insert operation pays > + * for the potentially large cost of splitting all buckets. > > I'm not sure this is a reasonable tradeoff for the use-case suggested so > far, it doesn't exactly make things simpler. We're not going to grow > much. Yeah, designed to be more generally useful. Are you saying you would prefer to see the DHT patch split into an initial submission that does the simplest thing possible, so that the unlucky guy who causes the hash table to grow has to do all the work of moving buckets to a bigger hash table? Then we could move the more complicated incremental growth stuff to a later patch. > +/* The opaque type used for tracking iterator state. */ > +struct dht_iterator; > +typedef struct dht_iterator dht_iterator; > > Isn't it actually the iterator state? 
Rather than tracking it? Also, why > is it opaque given you're actually defining it below? Guess you'd moved > it at some point. Improved comment. The iterator state is defined below in the .h, but with a warning that client code mustn't access it; it exists in the header only because it's very useful to be able to but dht_iterator on the stack which requires the client code to have its definition, but I want to reserve the right to change it arbitrarily in future. > +/* > + * The set of parameters needed to create or attach to a hash table. The > + * members tranche_id and tranche_name do not need to be initialized when > + * attaching to an existing hash table. > + */ > +typedef struct > +{ > + Size key_size; /* Size of the key (initial > bytes of entry) */ > + Size entry_size;/* Total size of entry */ > > Let's use size_t, like we kind of concluded in the thread you
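To make the partitioning scheme being discussed concrete: each bucket deterministically maps to one of a fixed set of partition locks, so insert/find/iterate acquire only their own partition's lock, while a resize must briefly take all of them. A minimal hypothetical sketch — the constant and the mapping are assumptions for illustration, not dht.c's actual code:

```c
#include <assert.h>
#include <stdint.h>

#define DHT_NUM_PARTITIONS 128  /* fixed at table creation; assumed value */

/*
 * Map a hash value to a bucket.  nbuckets would be a power of two in
 * practice, making the modulo a cheap bit-mask.
 */
static inline uint32_t
bucket_for_hash(uint32_t hash, uint32_t nbuckets)
{
    return hash % nbuckets;
}

/*
 * Map a bucket to the partition lock that protects it.  Operations on
 * buckets in different partitions never contend; only a resize needs
 * all DHT_NUM_PARTITIONS locks at once.
 */
static inline uint32_t
partition_for_bucket(uint32_t bucket)
{
    return bucket % DHT_NUM_PARTITIONS;
}
```

The point of the fixed partition count is that the bucket-to-partition mapping stays stable across resizes, so a lookup can always compute which single lock to take without coordination.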
Re: [HACKERS] SCRAM protocol documentation
On 11/08/17 03:57, Peter Eisentraut wrote: The SCRAM protocol documentation (https://www.postgresql.org/docs/devel/static/sasl-authentication.html) states "To avoid confusion, the client should use pg_same_as_startup_message as the username in the client-first-message." However, the client implementation in libpq doesn't actually do that; it sends an empty string for the user name. I find no other reference to "pg_same_as_startup_message" in the sources. Should the documentation be updated? Relatedly, the SCRAM specification doesn't appear to allow omitting the user name in this manner. Why don't we just send the actual user name, even though it's redundant with the startup message? Hi Peter. You are absolutely right; I was also surprised by this when I was doing the JDBC implementation. Actually, I chose to send an asterisk ("*"), see https://github.com/pgjdbc/pgjdbc/pull/842/files#diff-c52128420a3882543ffa20a48964abe4R88, as it is (likely) shorter than the username. I don't like the empty string either, and the SCRAM library built for JDBC and used in pgjdbc explicitly disallows the use of an empty username. If there's a clear meaning about ignoring the user here, why not settle on something like "*"? It's not going to change the world sending a few bytes less on initialization, but I guess it doesn't hurt either... Álvaro -- Álvaro Hernández Tortosa --- 8Kdata