Re: [HACKERS] Proposal: Incremental Backup
On Thu, Aug 7, 2014 at 6:29 PM, Gabriele Bartolini < gabriele.bartol...@2ndquadrant.it> wrote:
> Hi Marco,
>
> > With the current full backup procedure they are backed up, so I think
> > that having them backed up with an rsync-like algorithm is what a user
> > would expect for an incremental backup.
>
> Exactly. I think a simple, flexible and robust method for file-based
> incremental backup is all we need. I am confident it could be done for
> 9.5.
>
> I would like to quote every single word Simon said. Block-level
> incremental backup (with Robert's proposal) is definitely the ultimate
> goal for effective and efficient physical backups. I see file-level
> incremental backup as a very good "compromise", a sort of intermediate
> release which could nonetheless produce a lot of benefits to our user
> base, for years to come.
>
> Thanks,
> Gabriele

I haven't been following this discussion closely at all, but at Jane Street we have been using pg_start_backup together with rsync --link-dest (onto a big NFS) to achieve incrementally stored backups. In our experience this works very well; it is, however, advisable to look into whatever is used to serve the NFS, as we had to set some options to increase the maximum number of hardlinks.

Cheers,
Bene

> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
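The --link-dest trick Bene describes is easy to model: unchanged files are hardlinked against the previous backup, so every backup directory looks complete while consuming space only for changed files (hence the NFS hardlink limit he mentions). A minimal Python sketch of the idea (a hypothetical helper, not Jane Street's actual tooling, and not a substitute for rsync's delta transfer):

```python
import filecmp
import os
import shutil
import tempfile
from pathlib import Path

def incremental_backup(src, dest, link_dest=None):
    """Copy the tree at src into dest. Files identical to their counterpart
    under link_dest are hardlinked instead of copied, mimicking
    rsync --link-dest: each backup directory is self-contained, but
    unchanged files share storage with the previous backup."""
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target_dir = os.path.join(dest, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            src_file = os.path.join(root, name)
            dest_file = os.path.join(target_dir, name)
            prev_file = os.path.join(link_dest, rel, name) if link_dest else None
            if prev_file and os.path.exists(prev_file) and \
                    filecmp.cmp(src_file, prev_file, shallow=False):
                os.link(prev_file, dest_file)      # unchanged: hardlink, no extra space
            else:
                shutil.copy2(src_file, dest_file)  # new or modified: full copy

# Demo: full backup, change one file, then an incremental backup.
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "pgdata")
os.makedirs(src)
Path(src, "stable.dat").write_text("never changes")
Path(src, "hot.dat").write_text("version 1")

backup1 = os.path.join(tmp, "backup1")
incremental_backup(src, backup1)                   # full copy

Path(src, "hot.dat").write_text("version 2")
backup2 = os.path.join(tmp, "backup2")
incremental_backup(src, backup2, link_dest=backup1)

# The unchanged file now has two directory entries but a single inode.
links_of_stable = os.stat(os.path.join(backup2, "stable.dat")).st_nlink
```

Restoring from any backup directory is then a plain copy, which is much of the appeal of this scheme over delta-only formats.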
Re: [HACKERS] jsonb format is pessimal for toast compression
On Fri, Aug 8, 2014 at 10:48 AM, Stephen Frost wrote:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
> > I looked into the issue reported in bug #11109. The problem appears to be
> > that jsonb's on-disk format is designed in such a way that the leading
> > portion of any JSON array or object will be fairly incompressible, because
> > it consists mostly of a strictly-increasing series of integer offsets.
> > This interacts poorly with the code in pglz_compress() that gives up if
> > it's found nothing compressible in the first first_success_by bytes of a
> > value-to-be-compressed. (first_success_by is 1024 in the default set of
> > compression parameters.)
>
> I haven't looked at this in any detail, so take this with a grain of
> salt, but what about teaching pglz_compress about using an offset
> farther into the data, if the incoming data is quite a bit larger than
> 1k? This is just a test to see if it's worthwhile to keep going, no? I
> wonder if this might even be able to be provided as a type-specific
> option, to avoid changing the behavior for types other than jsonb in
> this regard.

+1 for the offset. Or sample the data at the beginning, middle and end. Obviously one could always come up with a worst case for any sampling scheme, but that seems acceptable.

> (I'm imagining a boolean saying "pick a random sample", or perhaps a
> function which can be called that'll return "here's where you wanna test
> if this thing is gonna compress at all")
>
> I'm rather disinclined to change the on-disk format because of this
> specific test, that feels a bit like the tail wagging the dog to me,
> especially as I do hope that some day we'll figure out a way to use a
> better compression algorithm than pglz.
>
> Thanks,
>
> Stephen

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] Introducing coarse grain parallelism by postgres_fdw.
On Fri, Aug 8, 2014 at 8:54 AM, Kyotaro HORIGUCHI < horiguchi.kyot...@lab.ntt.co.jp> wrote:
> Hi, thank you for the comment.
>
> > Hi Kyotaro,
> > I looked at the patches and felt that the approach taken here is too
> > intrusive, considering that the feature is only for foreign scans.
>
> I agree with you, premising that it's only for foreign scans, but I
> regard it as an example of parallel execution planning.
>
> > There are quite a few members added to the generic Path, Plan structures,
> > whose use is induced only through foreign scans. Each path now stores
> > two sets of costs, one with parallelism and one without. The parallel
> > values will make sense only when there is a foreign scan, which uses
> > parallelism, in the plan tree. So, those costs are maintained unnecessarily,
> > or the memory for those members is wasted, in most of the cases, where the
> > tables involved are not foreign. Also, not many foreign tables will be able
> > to use the parallelism, e.g. file_fdw. Although, that's my opinion; I would
> > like to hear from others.
>
> I intended to discuss what the estimation and planning for
> parallel execution (not limited to foreign scans) would be
> like. A background worker would be able to take on executing some
> portion of a path tree in 'parallel'. The postgres_fdw in this
> patch is simply one case of planning parallel
> execution. Although, as you see, it only chooses whether
> to go parallel for a path constructed without regard to parallel
> execution, because considering all the possible alternative paths of
> parallel execution would cost too much.
>
> Limiting this discussion to parallel scans, the overall gain
> from multiple simultaneous scans distributed in the path/plan tree
> won't be known before cost counting is done up to the root node
> (more precisely, the common parent of them).
This patch foolishly
> does a bucket brigade of the parallel cost up to the root node, but there
> should be a smarter way to shortcut that: for example, simply
> picking up parallelizable nodes by scanning the completed path/plan
> tree, calculating the probably-eliminable costs from them, and then
> subtracting that from, or comparing it to, the total (non-parallel) cost. This
> might be more acceptable for everyone than the current implementation.
>

Planning for parallel execution would be a much harder problem to solve. Just to give a glimpse: how many worker backends can be spawned depends entirely on the load at the time of execution. For prepared queries, the load condition can change between planning and execution, and thus the number of parallel backends, which would decide the actual time of execution and hence the cost, cannot be estimated at planning time. Mixing that parallelism with the FDW's parallelism would make things even more complicated. I think those two problems are to be solved in different ways.

> > Instead, an FDW which can use parallelism can add two paths, one with and
> > one without parallelism, with appropriate costs, and let the logic choosing
> > the cheapest path take care of the actual choice. In fact, I thought
> > parallelism would always be faster than the non-parallel one, except when
> > the foreign server is too heavily loaded. But we won't be able to check that
> > anyway. Can you point out a case where the parallelism may not win over
> > serial execution?
>
> It always wins against serial execution if parallel execution can be
> launched with no extra cost. But actually it costs extra resources,
> so I thought that parallel execution should be curbed when the gain is
> small. That is what the two GUCs added by this patch and
> choose_parallel_scans() do, although in a non-automated way. The
> overloading issue is not a matter confined to parallel execution,
> but surely it will be more severe there, since it is less visible and
> controllable by users.
However, it would anyhow have to
> fall back to manual configuration in the end.
>

I am not sure whether the manual control this patch provides is really effective or ineffective without understanding the full impact. Do we have any numbers showing the cases when parallelism would be effective, when it would not, and how those GUCs help choose the effective one?

> > BTW, the name parallelism seems to be misleading here. All it will be able
> > to do is fire the queries (or data fetch requests) asynchronously. So, we
> > might want to change the naming appropriately.
>
> That is right regarding what I did exactly to postgres_fdw. But by not
> allowing all intermediate tuples from child execution nodes running in
> parallel to be piled up in memory without restriction, I suppose
> all 'parallel' execution to be a kind of this 'asynchronous'
> startup/fetching. As for postgres_fdw, it would look more like
> 'parallel' (and perhaps more efficient) to process queries
> using libpq's single-row mode instead of a cursor, but similar
> processing takes place under system calls even in that case.
>

By single-row mode, do you mean executing FETCH for every row? That wouldn't be
Re: [HACKERS] postgresql.auto.conf and reload
On Fri, Aug 8, 2014 at 1:19 PM, Amit Kapila wrote:
> On Thu, Aug 7, 2014 at 12:36 PM, Fujii Masao wrote:
>> On Thu, Aug 7, 2014 at 12:36 PM, Amit Kapila wrote:
>> > On Wed, Aug 6, 2014 at 1:32 PM, Fujii Masao wrote:
>> >> What about picking up only data_directory at the first pass?
>> >
>> > I think that will work out, and for that I think we might need to add
>> > a few checks during ProcessConfigFile. Do you want to address it
>> > during parsing of the config file or during setting of values?
>>
>> I prefer "during parsing". The attached patch does that. In this patch,
>> the first call of ProcessConfigFile() picks up only data_directory. One
>> concern about this patch is that the logic is a bit ugly.
>
> I think handling 'data_directory' in ParseConfigFp() looks a bit out of
> place, as this API is used to parse other config files as well, like
> recovery.conf. I agree that for all other paths DataDir might be
> already set, due to which this new path will never be executed; still,
> from a code-maintenance point of view it would have been better if we
> could find a way to handle it in a place where we are processing
> the server's main config file (postgresql.conf).

Yep, right. ParseConfigFp() is not a good place to pick up data_directory. What about the attached patch, which makes ProcessConfigFile() instead of ParseConfigFp() pick up data_directory just after the configuration file is parsed?

Regards,

--
Fujii Masao

*** a/src/backend/utils/misc/guc-file.l
--- b/src/backend/utils/misc/guc-file.l
***************
*** 162,167 **** ProcessConfigFile(GucContext context)
--- 162,209 ----
  	}
  
+ 	/*
+ 	 * Pick up only the data_directory if DataDir is not set, which
+ 	 * means that the configuration file is read for the first time and
+ 	 * PG_AUTOCONF_FILENAME file cannot be read yet. In this case,
+ 	 * we should not pick up all the settings except the data_directory
+ 	 * from postgresql.conf yet because they might be overwritten
+ 	 * with the settings in PG_AUTOCONF_FILENAME file which will be
+ 	 * read later.
+ 	 * OTOH, since it's ensured that data_directory doesn't
+ 	 * exist in PG_AUTOCONF_FILENAME file, it will never be overwritten
+ 	 * later. Now only the data_directory needs to be picked up to
+ 	 * read PG_AUTOCONF_FILENAME file later.
+ 	 */
+ 	if (DataDir == NULL)
+ 	{
+ 		ConfigVariable *prev = NULL;
+ 
+ 		for (item = head; item;)
+ 		{
+ 			ConfigVariable *ptr = item;
+ 
+ 			item = item->next;
+ 			if (strcmp(ptr->name, "data_directory") != 0)
+ 			{
+ 				if (prev == NULL)
+ 					head = ptr->next;
+ 				else
+ 				{
+ 					prev->next = ptr->next;
+ 					/*
+ 					 * On removing the last item in the list, we need to
+ 					 * update tail to ensure that the list will be maintained.
+ 					 */
+ 					if (prev->next == NULL)
+ 						tail = prev;
+ 				}
+ 				pfree(ptr);
+ 			}
+ 			else
+ 				prev = ptr;
+ 		}
+ 	}
+ 
  	/*
  	 * Mark all extant GUC variables as not present in the config file.
  	 * We need this so that we can tell below which ones have been removed
  	 * from the file since we last processed it.

*** a/src/backend/utils/misc/guc.c
--- b/src/backend/utils/misc/guc.c
***************
*** 4342,4347 **** SelectConfigFiles(const char *userDoption, const char *progname)
--- 4342,4354 ----
  		return false;
  	}
  
+ 	/*
+ 	 * Read the configuration file for the first time. This time only the
+ 	 * data_directory parameter is picked up to determine the data directory,
+ 	 * so that we can read PG_AUTOCONF_FILENAME file next time. Then don't
+ 	 * forget to read the configuration file again later to pick up all the
+ 	 * parameters.
+ 	 */
  	ProcessConfigFile(PGC_POSTMASTER);
  
  	/*
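The two-pass control flow in the patch (a first ProcessConfigFile() pass that keeps only data_directory, then a full pass once postgresql.auto.conf can be located) can be sketched in miniature. This is an illustration with hypothetical helper names, not PostgreSQL's actual parser:

```python
def parse_config(text):
    """Toy parser: 'name = value' lines into (name, value) pairs,
    ignoring comments and blank lines."""
    items = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line:
            name, _, value = line.partition("=")
            items.append((name.strip(), value.strip().strip("'")))
    return items

def process_config_file(text, data_dir_known):
    """First pass (data_dir_known=False): keep only data_directory, because
    every other setting might later be overridden by postgresql.auto.conf,
    which cannot be located until the data directory is known.
    Second pass: keep everything."""
    items = parse_config(text)
    if not data_dir_known:
        items = [(n, v) for n, v in items if n == "data_directory"]
    return dict(items)

conf = "data_directory = '/pgdata'\nwork_mem = '64MB'\nshared_buffers = '128MB'\n"
first_pass = process_config_file(conf, data_dir_known=False)
second_pass = process_config_file(conf, data_dir_known=True)
```

The point of filtering rather than early-exit is the same as in the patch: the file is parsed once, and only the retained settings are applied.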
Re: [HACKERS] Specifying the unit in storage parameter
On Fri, Aug 8, 2014 at 2:12 PM, Alvaro Herrera wrote:
> Fujii Masao wrote:
>> On Fri, Aug 8, 2014 at 12:56 PM, Alvaro Herrera wrote:
>
>> > Hm, what's with the parse_int signature change and the hintmsg thing?
>> > Is it just me or is the patch incomplete?
>>
>> Sorry, I probably failed to see your point. You mean that the signature
>> of parse_int needs to be changed so that it includes hintmsg as an
>> argument? If so, there is no problem. Both the signature and the function
>> body of parse_int have already included hintmsg as an argument.
>> Am I missing something?
>
> I just mean that the parse_int function body is not touched by your
> patch, unless I am failing to see something.

Yes, my patch doesn't change the parse_int function at all, because I didn't think such a change was required for the purpose (i.e., just allowing us to specify the unit in the setting of storage parameters). But you might have found a reason why it needs to be changed?

Regards,

--
Fujii Masao
Re: [HACKERS] Enhancing pgbench parameter checking
Fabien,

> Hello Tatsuo-san,
>
>> Thanks for the review. I have registered it in the August commitfest.
>> https://commitfest.postgresql.org/action/patch_view?id=1532
>>
>>> I'm not sure of the variable name "is_non_init_parameter_set". I would
>>> suggest "benchmarking_option_set"?
>>
>> Ok, I will replace the variable name as you suggested.
>>
>>> Also, to be consistent, maybe it should check that no
>>> initialization-specific option is set when benchmarking.
>>
>> Good suggestion. Here is the v2 patch.
>
> I applied it without problem and tested it.
>
> * It seems that -c is ignored; the atoi() line has been removed.
>
> * Option -q is initialization-specific, but not detected as such like
>   the others, although there is a specific detection later. I think that
>   it would be better to use the systematic approach, and to remove the
>   specific check.
>
> * I would name the second boolean "initialization_option_set", as it is
>   described like that in the documentation.
>
> * I would suggest the following error messages:
>   "some options cannot be used in initialization (-i) mode\n" and
>   "some options cannot be used in benchmarking mode\n".
>   Although these messages are rough, I think that they are enough and
>   avoid running something unexpected, which is your purpose.
>
> Find attached a patch which adds these changes to your current
> version.

Thank you for the review and patch. Looks good. I changed the status to "Ready for Committer". I will wait for a few days, and if there's no objection, I will commit the patch.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
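The cross-check under review (reject benchmarking-only options in -i mode, and initialization-only options otherwise) reduces to two booleans and the two error messages Fabien proposes. A sketch with hypothetical option sets (the real lists live in pgbench.c):

```python
# Hypothetical classification for illustration only.
INIT_ONLY_OPTIONS = {"-F", "-q", "--foreign-keys"}   # only valid with -i
BENCH_ONLY_OPTIONS = {"-c", "-j", "-T", "-S"}        # only valid without -i

def check_mode_options(argv):
    """Return an error message if options from the wrong mode are present,
    None otherwise; mirrors the two booleans discussed in the review."""
    is_init_mode = "-i" in argv
    initialization_option_set = any(a in INIT_ONLY_OPTIONS for a in argv)
    benchmarking_option_set = any(a in BENCH_ONLY_OPTIONS for a in argv)
    if is_init_mode and benchmarking_option_set:
        return "some options cannot be used in initialization (-i) mode"
    if not is_init_mode and initialization_option_set:
        return "some options cannot be used in benchmarking mode"
    return None
```

The systematic approach Fabien argues for amounts to setting the boolean at the point each option is parsed, rather than special-casing options like -q later.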
Re: [HACKERS] jsonb format is pessimal for toast compression
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> I looked into the issue reported in bug #11109. The problem appears to be
> that jsonb's on-disk format is designed in such a way that the leading
> portion of any JSON array or object will be fairly incompressible, because
> it consists mostly of a strictly-increasing series of integer offsets.
> This interacts poorly with the code in pglz_compress() that gives up if
> it's found nothing compressible in the first first_success_by bytes of a
> value-to-be-compressed. (first_success_by is 1024 in the default set of
> compression parameters.)

I haven't looked at this in any detail, so take this with a grain of salt, but what about teaching pglz_compress about using an offset farther into the data, if the incoming data is quite a bit larger than 1k? This is just a test to see if it's worthwhile to keep going, no? I wonder if this might even be able to be provided as a type-specific option, to avoid changing the behavior for types other than jsonb in this regard.

(I'm imagining a boolean saying "pick a random sample", or perhaps a function which can be called that'll return "here's where you wanna test if this thing is gonna compress at all")

I'm rather disinclined to change the on-disk format because of this specific test; that feels a bit like the tail wagging the dog to me, especially as I do hope that some day we'll figure out a way to use a better compression algorithm than pglz.

Thanks,

Stephen
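Tom's diagnosis, and the sampling idea above, are easy to reproduce with any LZ-family compressor. The sketch below fakes a jsonb-like datum (a header of strictly increasing 32-bit offsets followed by repetitive JSON text) and compares a first-kilobyte probe against probes elsewhere; zlib stands in for pglz here, which is an assumption, though pglz is if anything weaker on this kind of input:

```python
import struct
import zlib

# Fake datum: ~2000 strictly increasing offsets, each element 0x34 bytes
# apart as in the example dump from the bug thread, then a repetitive body.
header = b"".join(struct.pack("<I", 0x34 * i) for i in range(2000))
body = b'{"key": "value", "n": 1} ' * 400
datum = header + body

def ratio(data):
    """Compressed size over raw size; lower means more compressible."""
    return len(zlib.compress(data)) / len(data)

probe_first_kb = ratio(datum[:1024])   # what a pglz-style leading probe sees
probe_middle = ratio(datum[len(datum) // 2:len(datum) // 2 + 1024])
whole = ratio(datum)
# The offset header compresses much worse than the body, so a probe
# confined to the first kilobyte underestimates how compressible the
# datum as a whole really is; a probe farther in does not.
```

This is exactly why sampling at an offset (or at the beginning, middle and end, per Ashutosh) would keep pglz from bailing out on values that are, overall, quite compressible.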
Re: [HACKERS] Specifying the unit in storage parameter
Fujii Masao wrote:
> On Fri, Aug 8, 2014 at 12:56 PM, Alvaro Herrera wrote:
> > Hm, what's with the parse_int signature change and the hintmsg thing?
> > Is it just me or is the patch incomplete?
>
> Sorry, I probably failed to see your point. You mean that the signature
> of parse_int needs to be changed so that it includes hintmsg as an
> argument? If so, there is no problem. Both the signature and the function
> body of parse_int have already included hintmsg as an argument.
> Am I missing something?

I just mean that the parse_int function body is not touched by your patch, unless I am failing to see something.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Hokey wrong versions of libpq in apt.postgresql.org
JD,

* Joshua D. Drake (j...@commandprompt.com) wrote:
> But this is just plain wrong. I don't care that the FAQ (on the
> wiki) says we are doing it wrong for good reasons. When I (or anyone
> else) pulls postgresql-$version-dev, I want the libpq for my
> version. I do not want 9.3.

No, it isn't wrong. If you'd like the specific version, follow what's in the FAQ to get the specific version. Otherwise, we're going to follow the Debian guidelines, which are quite sensible and more-or-less say "make new builds go against the latest version".

> There can be unintended consequences on machines when you mix and
> match like that. Can we please do some proper packaging on this?

This *is* the proper packaging. If you want the specific version, update your deb line. Don't complain because you used the Debian repo that follows the Debian guidelines and didn't like what you got; that's not going to change.

Thanks,

Stephen
Re: [HACKERS] Specifying the unit in storage parameter
On Fri, Aug 8, 2014 at 12:56 PM, Alvaro Herrera wrote:
> Fujii Masao wrote:
>> Hi,
>>
>> We can specify the unit when setting the autovacuum_vacuum_cost_delay
>> GUC as follows.
>>
>> ALTER SYSTEM SET autovacuum_vacuum_cost_delay TO '80ms';
>>
>> OTOH we cannot specify the unit when setting autovacuum_vacuum_cost_delay
>> as a storage parameter as follows.
>>
>> CREATE TABLE test (col1 int) WITH (autovacuum_vacuum_cost_delay = '80ms');
>> ERROR: invalid value for integer option "autovacuum_vacuum_cost_delay": 80ms
>>
>> This is not user-friendly.
>
> No disagreement here.
>
>> I'd like to propose the attached patch, which
>> introduces the infrastructure that allows us to specify the unit when
>> setting an INTEGER storage parameter like autovacuum_vacuum_cost_delay.
>> Comments? Review?
>
> Hm, what's with the parse_int signature change and the hintmsg thing?
> Is it just me or is the patch incomplete?

Sorry, I probably failed to see your point. You mean that the signature of parse_int needs to be changed so that it includes hintmsg as an argument? If so, there is no problem. Both the signature and the function body of parse_int have already included hintmsg as an argument. Am I missing something?

Regards,

--
Fujii Masao
Re: [HACKERS] postgresql.auto.conf and reload
On Thu, Aug 7, 2014 at 12:36 PM, Fujii Masao wrote:
> On Thu, Aug 7, 2014 at 12:36 PM, Amit Kapila wrote:
> > On Wed, Aug 6, 2014 at 1:32 PM, Fujii Masao wrote:
> >> What about picking up only data_directory at the first pass?
> >
> > I think that will work out, and for that I think we might need to add
> > a few checks during ProcessConfigFile. Do you want to address it
> > during parsing of the config file or during setting of values?
>
> I prefer "during parsing". The attached patch does that. In this patch,
> the first call of ProcessConfigFile() picks up only data_directory. One
> concern about this patch is that the logic is a bit ugly.

I think handling 'data_directory' in ParseConfigFp() looks a bit out of place, as this API is used to parse other config files as well, like recovery.conf. I agree that for all other paths DataDir might be already set, due to which this new path will never be executed; still, from a code-maintenance point of view it would have been better if we could find a way to handle it in a place where we are processing the server's main config file (postgresql.conf).

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] jsonb format is pessimal for toast compression
Apologies if this is a ridiculous suggestion, but I believe that swapping out the compression algorithm (for Snappy, for example) has been discussed in the past. I wonder if that algorithm is sufficiently different that it would produce a better result, and if that might not be preferable to some of the other options. On Thu, Aug 7, 2014 at 11:17 PM, Tom Lane wrote: > I looked into the issue reported in bug #11109. The problem appears to be > that jsonb's on-disk format is designed in such a way that the leading > portion of any JSON array or object will be fairly incompressible, because > it consists mostly of a strictly-increasing series of integer offsets. > This interacts poorly with the code in pglz_compress() that gives up if > it's found nothing compressible in the first first_success_by bytes of a > value-to-be-compressed. (first_success_by is 1024 in the default set of > compression parameters.) > > As an example, here's gdb's report of the bitwise representation of the > example JSON value in the bug thread: > > 0x2ab85ac: 0x2005 0x0004 0x50003098 0x309f > 0x2ab85bc: 0x30ae 0x30b8 0x30cf 0x30da > 0x2ab85cc: 0x30df 0x30ee 0x3105 0x6b6e756a > 0x2ab85dc: 0x40de 0x0034 0x0068 0x009c > 0x2ab85ec: 0x00d0 0x0104 0x0138 0x016c > 0x2ab85fc: 0x01a0 0x01d4 0x0208 0x023c > 0x2ab860c: 0x0270 0x02a4 0x02d8 0x030c > 0x2ab861c: 0x0340 0x0374 0x03a8 0x03dc > 0x2ab862c: 0x0410 0x0444 0x0478 0x04ac > 0x2ab863c: 0x04e0 0x0514 0x0548 0x057c > 0x2ab864c: 0x05b0 0x05e4 0x0618 0x064c > 0x2ab865c: 0x0680 0x06b4 0x06e8 0x071c > 0x2ab866c: 0x0750 0x0784 0x07b8 0x07ec > 0x2ab867c: 0x0820 0x0854 0x0888 0x08bc > 0x2ab868c: 0x08f0 0x0924 0x0958 0x098c > 0x2ab869c: 0x09c0 0x09f4 0x0a28 0x0a5c > 0x2ab86ac: 0x0a90 0x0ac4 0x0af8 0x0b2c > 0x2ab86bc: 0x0b60 0x0b94 0x0bc8 0x0bfc > 0x2ab86cc: 0x0c30 0x0c64 0x0c98 0x0ccc > 0x2ab86dc: 0x0d00 0x0d34 0x0d68 0x0d9c > 0x2ab86ec: 0x0dd0 0x0e04 0x0e38 0x0e6c > 0x2ab86fc: 0x0ea0 0x0ed4 0x0f08 0x0f3c > 0x2ab870c: 0x0f70 0x0fa4 0x0fd8 0x100c > 0x2ab871c: 
0x1040 0x1074 0x10a8 0x10dc > 0x2ab872c: 0x1110 0x1144 0x1178 0x11ac > 0x2ab873c: 0x11e0 0x1214 0x1248 0x127c > 0x2ab874c: 0x12b0 0x12e4 0x1318 0x134c > 0x2ab875c: 0x1380 0x13b4 0x13e8 0x141c > 0x2ab876c: 0x1450 0x1484 0x14b8 0x14ec > 0x2ab877c: 0x1520 0x1554 0x1588 0x15bc > 0x2ab878c: 0x15f0 0x1624 0x1658 0x168c > 0x2ab879c: 0x16c0 0x16f4 0x1728 0x175c > 0x2ab87ac: 0x1790 0x17c4 0x17f8 0x182c > 0x2ab87bc: 0x1860 0x1894 0x18c8 0x18fc > 0x2ab87cc: 0x1930 0x1964 0x1998 0x19cc > 0x2ab87dc: 0x1a00 0x1a34 0x1a68 0x1a9c > 0x2ab87ec: 0x1ad0 0x1b04 0x1b38 0x1b6c > 0x2ab87fc: 0x1ba0 0x1bd4 0x1c08 0x1c3c > 0x2ab880c: 0x1c70 0x1ca4 0x1cd8 0x1d0c > 0x2ab881c: 0x1d40 0x1d74 0x1da8 0x1ddc > 0x2ab882c: 0x1e10 0x1e44 0x1e78 0x1eac > 0x2ab883c: 0x1ee0 0x1f14 0x1f48 0x1f7c > 0x2ab884c: 0x1fb0 0x1fe4 0x2018 0x204c > 0x2ab885c: 0x2080 0x20b4 0x20e8 0x211c > 0x2ab886c: 0x2150 0x2184 0x21b8 0x21ec > 0x2ab887c: 0x2220 0x2254 0x2288 0x22bc > 0x2ab888c: 0x22f0 0x2324 0x2358 0x238c > 0x2ab889c: 0x23c0 0x23f4 0x2428 0x245c > 0x2ab88ac: 0x2490 0x24c4 0x24f8 0x252c > 0x2ab88bc: 0x2560 0x2594 0x25c8 0x25fc > 0x2ab88cc: 0x2630 0x2664 0x2698 0x26cc > 0x2ab88dc: 0x2700
Re: [HACKERS] Specifying the unit in storage parameter
Fujii Masao wrote:
> Hi,
>
> We can specify the unit when setting the autovacuum_vacuum_cost_delay
> GUC as follows.
>
> ALTER SYSTEM SET autovacuum_vacuum_cost_delay TO '80ms';
>
> OTOH we cannot specify the unit when setting autovacuum_vacuum_cost_delay
> as a storage parameter as follows.
>
> CREATE TABLE test (col1 int) WITH (autovacuum_vacuum_cost_delay = '80ms');
> ERROR: invalid value for integer option "autovacuum_vacuum_cost_delay": 80ms
>
> This is not user-friendly.

No disagreement here.

> I'd like to propose the attached patch, which
> introduces the infrastructure that allows us to specify the unit when
> setting an INTEGER storage parameter like autovacuum_vacuum_cost_delay.
> Comments? Review?

Hm, what's with the parse_int signature change and the hintmsg thing? Is it just me or is the patch incomplete?

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Use unique index for longer pathkeys.
Hello,

> > Although, yes, you're right, irrespective of the "common
> > something", and even if the dropped index was i_t1_pkey_2, which
> > is on t1(a, b), the result tuples are sorted as expected only by
> > the pathkey (t.a = t1.a, t1.b). It is because both t and t1 are
> > still unique, so the joined tuples are also unique, and the unique
> > key of the result tuples is the merged pkey (t.a, t1.a, t1.b),
> > which can be transformed to (t.a, t1.b) using the equality
> > between t.a and t1.a. And considering that the inner relation (t1) is
> > already sorted by (a, b), the sort itself could be eliminated from
> > the plan.
>
> I think if we could eliminate t1.c, t1.d considering indexes on
> individual relations (maybe by using the technique I have mentioned
> upthread or some other way), then the current code itself will
> eliminate the ORDER BY clause. I have tried that by using a query
> without t1.c, t1.d in the ORDER BY clause:
>
> explain (costs off, analyze off) select * from t,t1 where t.a=t1.a order by t1.a,t1.b;
>               QUERY PLAN
> ------------------------------------------
>  Merge Join
>    Merge Cond: (t1.a = t.a)
>    ->  Index Scan using i_t1_pkey_2 on t1
>    ->  Index Scan using i_t_pkey on t
> (4 rows)

Ya, that elimination seems quite fascinating to me :)

> Okay, I think you want to handle the elimination of ORDER BY clauses
> at a much broader level, which might handle most of the simple cases as
> well. However, I think eliminating clauses as mentioned above is itself
> a good optimization for many cases, and more so if that can be done in
> a much simpler way.

Yes, if it can be done in a "much" simpler way. I guess it could be approached from the opposite side, that is, from the equivalence classes; anyway, I'll try it.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
[HACKERS] Specifying the unit in storage parameter
Hi, We can specify the unit when setting autovacuum_vacuum_cost_delay GUC as follows. ALTER SYSTEM SET autovacuum_vacuum_cost_delay TO '80ms'; OTOH we cannot specify the unit when setting autovacuum_vacuum_cost_delay as storage parameter as follows. CREATE TABLE test (col1 int) WITH (autovacuum_vacuum_cost_delay = '80ms'); ERROR: invalid value for integer option "autovacuum_vacuum_cost_delay": 80ms This is not user-friendly. I'd like to propose the attached patch which introduces the infrastructure which allows us to specify the unit when setting INTEGER storage parameter like autovacuum_vacuum_cost_delay. Comment? Review? Regards, -- Fujii Masao *** a/src/backend/access/common/reloptions.c --- b/src/backend/access/common/reloptions.c *** *** 105,111 static relopt_int intRelOpts[] = "Packs btree index pages only to this percentage", RELOPT_KIND_BTREE }, ! BTREE_DEFAULT_FILLFACTOR, BTREE_MIN_FILLFACTOR, 100 }, { { --- 105,111 "Packs btree index pages only to this percentage", RELOPT_KIND_BTREE }, ! BTREE_DEFAULT_FILLFACTOR, BTREE_MIN_FILLFACTOR, 100, 0 }, { { *** *** 113,119 static relopt_int intRelOpts[] = "Packs hash index pages only to this percentage", RELOPT_KIND_HASH }, ! HASH_DEFAULT_FILLFACTOR, HASH_MIN_FILLFACTOR, 100 }, { { --- 113,119 "Packs hash index pages only to this percentage", RELOPT_KIND_HASH }, ! HASH_DEFAULT_FILLFACTOR, HASH_MIN_FILLFACTOR, 100, 0 }, { { *** *** 121,127 static relopt_int intRelOpts[] = "Packs gist index pages only to this percentage", RELOPT_KIND_GIST }, ! GIST_DEFAULT_FILLFACTOR, GIST_MIN_FILLFACTOR, 100 }, { { --- 121,127 "Packs gist index pages only to this percentage", RELOPT_KIND_GIST }, ! GIST_DEFAULT_FILLFACTOR, GIST_MIN_FILLFACTOR, 100, 0 }, { { *** *** 129,135 static relopt_int intRelOpts[] = "Packs spgist index pages only to this percentage", RELOPT_KIND_SPGIST }, ! SPGIST_DEFAULT_FILLFACTOR, SPGIST_MIN_FILLFACTOR, 100 }, { { --- 129,135 "Packs spgist index pages only to this percentage", RELOPT_KIND_SPGIST }, ! 
SPGIST_DEFAULT_FILLFACTOR, SPGIST_MIN_FILLFACTOR, 100, 0 }, { { *** *** 137,143 static relopt_int intRelOpts[] = "Minimum number of tuple updates or deletes prior to vacuum", RELOPT_KIND_HEAP | RELOPT_KIND_TOAST }, ! -1, 0, INT_MAX }, { { --- 137,143 "Minimum number of tuple updates or deletes prior to vacuum", RELOPT_KIND_HEAP | RELOPT_KIND_TOAST }, ! -1, 0, INT_MAX, 0 }, { { *** *** 145,151 static relopt_int intRelOpts[] = "Minimum number of tuple inserts, updates or deletes prior to analyze", RELOPT_KIND_HEAP }, ! -1, 0, INT_MAX }, { { --- 145,151 "Minimum number of tuple inserts, updates or deletes prior to analyze", RELOPT_KIND_HEAP }, ! -1, 0, INT_MAX, 0 }, { { *** *** 153,159 static relopt_int intRelOpts[] = "Vacuum cost delay in milliseconds, for autovacuum", RELOPT_KIND_HEAP | RELOPT_KIND_TOAST }, ! -1, 0, 100 }, { { --- 153,159 "Vacuum cost delay in milliseconds, for autovacuum", RELOPT_KIND_HEAP | RELOPT_KIND_TOAST }, ! -1, 0, 100, GUC_UNIT_MS }, { { *** *** 161,167 static relopt_int intRelOpts[] = "Vacuum cost amount available before napping, for autovacuum", RELOPT_KIND_HEAP | RELOPT_KIND_TOAST }, ! -1, 1, 1 }, { { --- 161,167 "Vacuum cost amount available before napping, for autovacuum", RELOPT_KIND_HEAP | RELOPT_KIND_TOAST }, ! -1, 1, 1, 0 }, { { *** *** 169,175 static relopt_int intRelOpts[] = "Minimum age at which VACUUM should freeze a table row, for autovacuum", RELOPT_KIND_HEAP | RELOPT_KIND_TOAST }, ! -1, 0, 10 }, { { --- 169,175 "Minimum age at which VACUUM should freeze a table row, for autovacuum", RELOPT_KIND_HEAP | RELOPT_KIND_TOAST }, ! -1, 0, 10, 0 }, { { *** *** 177,183 static relopt_int intRelOpts[] = "Minimum multixact age at which VACUUM should freeze a row multixact's, for autovacuum", RELOPT_KIND_HEAP | RELOPT_KIND_TOAST }, ! -1, 0, 10 }, { { --- 177,183 "Minimum multixact age at which VACUUM should freeze a row multixact's, for autovacuum", RELOPT_KIND_HEAP | RELOPT_KIND_TOAST }, ! 
-1, 0, 10, 0 }, { { *** *** 185,191 static relopt_int intRelOpts[] = "Age at which to autovacuum a table to prevent transaction ID wraparound", RELOPT_KIND_HEAP | RELOPT_KIND_TOAST }, ! -1, 1, 20 }, {
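The behavior the patch wires into the reloptions machinery (reusing GUC-style unit suffixes when parsing an integer storage parameter) can be sketched as follows. Names here are hypothetical illustrations, not the actual parse_int in guc.c; milliseconds is the base unit, mirroring the GUC_UNIT_MS flag the patch attaches to autovacuum_vacuum_cost_delay:

```python
# Conversion factors into the base unit (milliseconds), as an assumption
# for illustration; GUC time units also include h and d.
MS_UNITS = {"ms": 1, "s": 1000, "min": 60 * 1000}

def parse_int_with_unit(value, unit_factors):
    """Return (ok, result, hint), loosely modelled on parse_int's
    out-parameters: ok=False with a hint when the unit is unknown."""
    s = value.strip()
    i = len(s)
    while i > 0 and not s[i - 1].isdigit():
        i -= 1                      # peel the unit suffix off the right
    digits, unit = s[:i], s[i:].strip()
    if not digits:
        return False, None, None    # no number at all
    if not unit:
        return True, int(digits), None          # bare integer, base unit
    if unit in unit_factors:
        return True, int(digits) * unit_factors[unit], None
    hint = "Valid units for this parameter are %s." % ", ".join(sorted(unit_factors))
    return False, None, hint

ok, delay_ms, _ = parse_int_with_unit("80ms", MS_UNITS)
```

With this in place, `WITH (autovacuum_vacuum_cost_delay = '80ms')` parses the same way `ALTER SYSTEM SET ... TO '80ms'` already does, which is the whole point of the proposal.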
Re: [HACKERS] Introducing coarse grain parallelism by postgres_fdw.
Hi, thank you for the comment.

> Hi Kyotaro,
> I looked at the patches and felt that the approach taken here is too
> intrusive, considering that the feature is only for foreign scans.

I agree with you, premising that it's only for foreign scans, but I regard it as an example of parallel execution planning.

> There are quite a few members added to the generic Path, Plan structures,
> whose use is induced only through foreign scans. Each path now stores
> two sets of costs, one with parallelism and one without. The parallel
> values will make sense only when there is a foreign scan, which uses
> parallelism, in the plan tree. So, those costs are maintained unnecessarily
> or the memory for those members is wasted in most of the cases, where the
> tables involved are not foreign. Also, not many foreign tables will be able
> to use the parallelism, e.g. file_fdw. Although, that's my opinion; I would
> like to hear from others.

I intended to discuss what the estimation and planning for parallel execution (not limited to foreign scans) would be like. A background worker would be able to take on executing some portion of a path tree in 'parallel'. The postgres_fdw in this patch is simply one case of planning parallel execution. Although, as you see, it only chooses whether to go parallel for a path constructed without regard to parallel execution, because considering all the possible alternative paths of parallel execution would cost too much.

Limiting this discussion to parallel scans, the overall gain from multiple simultaneous scans distributed in the path/plan tree won't be known before cost counting is done up to the root node (more precisely, the common parent of them).
This patch foolishly does a bucket brigade of parallel cost up to the root node, but there should be a smarter way to shortcut that: for example, simply picking up the parallelizable nodes by scanning the completed path/plan tree, calculating the probably-eliminable costs from them, and then subtracting that from (or comparing it to) the total (nonparallel) cost. This might be more acceptable to everyone than the current implementation. > Instead, an FDW which can use parallelism can add two paths one with and > one without parallelism with appropriate costs and let the logic choosing > the cheapest path take care of the actual choice. In fact, I thought, > parallelism would be always faster than the non-parallel one, except when > the foreign server is too much loaded. But we won't be able to check that > anyway. Can you point out a case where the parallelism may not win over > serial execution? It always wins against serial execution if parallel execution can be launched at no extra cost. But in practice it costs extra resources, so I thought parallel execution should be curbed when the gain is small. That is what the two GUCs added by this patch and choose_parallel_scans() provide, although in a non-automated way. The overloading issue is not confined to parallel execution, but it will surely be more severe here, since it is less visible to and controllable by users. However, it would have to come down to manual configuration in the end anyway. > BTW, the name parallelism seems to be misleading here. All, it will be able > to do is fire the queries (or data fetch requests) asynchronously. So, we > might want to change the naming appropriately. That is true regarding what I did to postgres_fdw. But since we cannot allow the intermediate tuples from child execution nodes running in parallel to pile up in memory without restriction, I suppose all 'parallel' execution is a kind of this 'asynchronous' startup/fetching. 
As for postgres_fdw, it would look more like 'parallel' (and perhaps be more efficient) if it processed queries using libpq's single-row mode instead of a cursor, though similar buffering takes place beneath the system calls even in that case. Well, I will try to make a version that does not include the parallel costs in the plan/path structs, and that uses single-row mode for postgres_fdw. I hope it will lead somewhere. regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
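The shortcut Horiguchi sketches above (scan the completed serial plan tree for parallelizable nodes, estimate the cost their overlap could eliminate, and compare against the serial total) can be illustrated with a toy model; the dict-based node shape and the "all but the slowest scan overlaps away" assumption are illustrative, not anything from the patch:

```python
def collect_parallelizable(node, out):
    """Gather the costs of nodes flagged parallelizable in a finished plan tree."""
    if node.get("parallelizable"):
        out.append(node["cost"])
    for child in node.get("children", []):
        collect_parallelizable(child, out)
    return out

def shortcut_parallel_cost(root, total_serial_cost):
    """Estimate total cost if parallelizable scans overlap, without
    propagating parallel costs through every Path node."""
    costs = collect_parallelizable(root, [])
    if len(costs) < 2:
        return total_serial_cost          # nothing to overlap
    saved = sum(costs) - max(costs)       # all but the slowest scan are hidden
    return total_serial_cost - saved

# Two foreign scans under a join; the serial total includes both scans.
plan = {"cost": 100.0, "children": [
    {"cost": 40.0, "parallelizable": True},
    {"cost": 30.0, "parallelizable": True},
]}
print(shortcut_parallel_cost(plan, 100.0))  # -> 70.0
```

Comparing this estimate against the serial total is the "subtract it from or compare to the total (nonparallel) cost" step; a real planner would additionally charge the startup overhead that the patch's GUCs are meant to guard against.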
Re: [HACKERS] Reporting the commit LSN at commit time
On 08/08/2014 09:51 AM, Tom Lane wrote: >> AFAIK we don't _have_ a fancy negotiation system in the protocol, with >> back-and-forth exchanges of capabilities information. > > Maybe it's time to invent that. It would be positively foolish to > create any such behavior without a protocol version bump anyway. I was hoping it'd be easier to sneak a new message type in without a full protocol bump. As you can imagine that makes it a ... rather larger job. Still, if it's unsafe to do it that way... > Although as I said, I don't think embedding knowledge of LSNs at the > protocol level is a good thing to begin with. As I said upthread, it need not necessarily be an LSN. A commit timestamp would do the job too, if information about the last-replayed commit timestamp was accessible on the downstream node. It needs to be a sequence identifier that can be matched against pg_replication_slots / pg_stat_replication or passed to a function on the downstream end to say "wait until we're replayed to this point". For streaming replication there's only one upstream, so there's no need to identify it. For BDR you'd also have to identify the upstream node of interest - probably by slot ID, or by (sysid, tlid, dboid) tuple. In the end, it can be an opaque magic cookie. It really doesn't matter, so long as what the client receives is a value it can pass to another Pg instance and say "wait until you've replayed up to this, please" or "have you replayed up to this yet?". > Is it really necessary that this information be pushed to the client on every commit, as opposed to the client asking for it occasionally? I think so, yes, though I'd be glad to be proved wrong. For the purpose of transparent failover (BDR) at least, the server currently being written to can go away at any moment, and you should know exactly what you're up to in order to make it safe to continue on another server. 
Consider, for a multi-master configuration where two servers replicate to each other: On a connection to server1: INSERT INTO bird(id, parrot) VALUES (1, 'African Grey'); [client grabs magic cookie for server replay state] INSERT INTO bird(id, parrot) VALUES (2, 'Lorikkeet'); [server1 sends changes to server2, which is behind on replay and still working on catching up] [server1 dies abruptly] [client drops connection to server1, connects to server2] -- Correct spelling UPDATE bird SET parrot = 'Lorikeet' WHERE id = 2; If the INSERT from server1 hasn't replayed on server2 yet this will fail. Other anomalies can be worse and cause lost updates, etc. To protect against this the client needs a way to wait, after connecting to server2, until it's caught up with the state of server1. That's what I'm talking about here. In this case, if you used a periodic progress indicator requested by the client, you'd still get the same error, because you'd wait until the first INSERT but not the second. So yes, the client needs this info at every commit. That means that enabling client-side fail-over won't be free, especially for lots of small transactions. It'll be cheaper if Pg can push the info with the commit confirmation instead of the client having to request it afterwards though. (Note that the client risks waiting forever if server1 didn't send the required commits before it died, but that's where application policy decisions come in). -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
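The "wait until you've replayed up to this, please" step in the failover scenario above amounts to a client-side polling loop; this is a hand-rolled sketch in which get_replay_position stands in for whatever query the client would run against the new server (it is not an existing client API):

```python
import time

def wait_for_replay(get_replay_position, target, timeout=30.0, poll_interval=0.5):
    """Block until the new server's replay position reaches `target`,
    the magic cookie captured from the old server at commit time.
    Returns True once caught up, False on timeout (the policy decision:
    bail out and try another server, or give up)."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_replay_position() >= target:
            return True
        time.sleep(poll_interval)
    return False

# Simulated server that has replayed positions 10, 20, 30 as we poll it.
positions = iter([10, 20, 30])
print(wait_for_replay(lambda: next(positions), 25, timeout=5.0, poll_interval=0))  # -> True
```

The "waiting forever if server1 died before sending the required commits" hazard noted above is exactly what the timeout models.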
[HACKERS] jsonb format is pessimal for toast compression
I looked into the issue reported in bug #11109. The problem appears to be that jsonb's on-disk format is designed in such a way that the leading portion of any JSON array or object will be fairly incompressible, because it consists mostly of a strictly-increasing series of integer offsets. This interacts poorly with the code in pglz_compress() that gives up if it's found nothing compressible in the first first_success_by bytes of a value-to-be-compressed. (first_success_by is 1024 in the default set of compression parameters.) As an example, here's gdb's report of the bitwise representation of the example JSON value in the bug thread:
0x2ab85ac: 0x2005 0x0004 0x50003098 0x309f
0x2ab85bc: 0x30ae 0x30b8 0x30cf 0x30da
0x2ab85cc: 0x30df 0x30ee 0x3105 0x6b6e756a
0x2ab85dc: 0x40de 0x0034 0x0068 0x009c
0x2ab85ec: 0x00d0 0x0104 0x0138 0x016c
0x2ab85fc: 0x01a0 0x01d4 0x0208 0x023c
0x2ab860c: 0x0270 0x02a4 0x02d8 0x030c
0x2ab861c: 0x0340 0x0374 0x03a8 0x03dc
0x2ab862c: 0x0410 0x0444 0x0478 0x04ac
0x2ab863c: 0x04e0 0x0514 0x0548 0x057c
0x2ab864c: 0x05b0 0x05e4 0x0618 0x064c
0x2ab865c: 0x0680 0x06b4 0x06e8 0x071c
0x2ab866c: 0x0750 0x0784 0x07b8 0x07ec
0x2ab867c: 0x0820 0x0854 0x0888 0x08bc
0x2ab868c: 0x08f0 0x0924 0x0958 0x098c
0x2ab869c: 0x09c0 0x09f4 0x0a28 0x0a5c
0x2ab86ac: 0x0a90 0x0ac4 0x0af8 0x0b2c
0x2ab86bc: 0x0b60 0x0b94 0x0bc8 0x0bfc
0x2ab86cc: 0x0c30 0x0c64 0x0c98 0x0ccc
0x2ab86dc: 0x0d00 0x0d34 0x0d68 0x0d9c
0x2ab86ec: 0x0dd0 0x0e04 0x0e38 0x0e6c
0x2ab86fc: 0x0ea0 0x0ed4 0x0f08 0x0f3c
0x2ab870c: 0x0f70 0x0fa4 0x0fd8 0x100c
0x2ab871c: 0x1040 0x1074 0x10a8 0x10dc
0x2ab872c: 0x1110 0x1144 0x1178 0x11ac
0x2ab873c: 0x11e0 0x1214 0x1248 0x127c
0x2ab874c: 0x12b0 0x12e4 0x1318 0x134c
0x2ab875c: 0x1380 0x13b4 0x13e8 0x141c
0x2ab876c: 0x1450 0x1484 0x14b8 0x14ec
0x2ab877c: 0x1520 0x1554 0x1588 0x15bc
0x2ab878c: 0x15f0 0x1624 0x1658 0x168c
0x2ab879c: 0x16c0 0x16f4 0x1728 0x175c
0x2ab87ac: 0x1790 0x17c4 0x17f8 0x182c
0x2ab87bc: 0x1860 0x1894 0x18c8 0x18fc
0x2ab87cc: 0x1930 0x1964 0x1998 0x19cc
0x2ab87dc: 0x1a00 0x1a34 0x1a68 0x1a9c
0x2ab87ec: 0x1ad0 0x1b04 0x1b38 0x1b6c
0x2ab87fc: 0x1ba0 0x1bd4 0x1c08 0x1c3c
0x2ab880c: 0x1c70 0x1ca4 0x1cd8 0x1d0c
0x2ab881c: 0x1d40 0x1d74 0x1da8 0x1ddc
0x2ab882c: 0x1e10 0x1e44 0x1e78 0x1eac
0x2ab883c: 0x1ee0 0x1f14 0x1f48 0x1f7c
0x2ab884c: 0x1fb0 0x1fe4 0x2018 0x204c
0x2ab885c: 0x2080 0x20b4 0x20e8 0x211c
0x2ab886c: 0x2150 0x2184 0x21b8 0x21ec
0x2ab887c: 0x2220 0x2254 0x2288 0x22bc
0x2ab888c: 0x22f0 0x2324 0x2358 0x238c
0x2ab889c: 0x23c0 0x23f4 0x2428 0x245c
0x2ab88ac: 0x2490 0x24c4 0x24f8 0x252c
0x2ab88bc: 0x2560 0x2594 0x25c8 0x25fc
0x2ab88cc: 0x2630 0x2664 0x2698 0x26cc
0x2ab88dc: 0x2700 0x2734 0x2768 0x279c
0x2ab88ec: 0x27d0 0x2804 0x2838 0x286c
0x2ab88fc: 0x28a0 0x28d4 0x2908 0x293c
0x2ab890c: 0x2970 0x29a4 0x29d8 0x2a0c
0x2ab891c: 0x2a40 0x2a74 0x2aa8 0x2adc
0x2ab892c: 0x2b10 0x2b44 0x2b78 0x2bac
0x2ab893c: 0x2be0 0x2c14 0x2c48 0x2c7c
0x
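Stephen's suggestion of probing farther into the datum (or sampling the beginning, middle and end, per Ashutosh) can be sketched with zlib standing in for pglz; the function name and the probing policy are illustrative only, not proposed code:

```python
import os
import zlib

def looks_compressible(data, probe_size=1024):
    """Model of the proposed heuristic: rather than testing only the first
    probe_size bytes (pglz's first_success_by), try samples from the
    beginning, middle and end, and give up only if none of them shrink."""
    starts = sorted({0,
                     max(0, len(data) // 2 - probe_size // 2),
                     max(0, len(data) - probe_size)})
    for start in starts:
        sample = data[start:start + probe_size]
        if len(zlib.compress(sample, 1)) < len(sample):
            return True
    return False

# A datum with an incompressible leading region but compressible payload
# would pass this test on the middle/end samples even when the prefix
# defeats a front-only probe.
print(looks_compressible(os.urandom(4096)))  # -> False (incompressible everywhere)
print(looks_compressible(b"x" * 4096))       # -> True
```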
Re: [HACKERS] Reporting the commit LSN at commit time
On Fri, Aug 8, 2014 at 11:58 AM, Fujii Masao wrote: > On Fri, Aug 8, 2014 at 9:50 AM, Craig Ringer wrote: > ISTM that the proper solution to that problem is the introduction of > new synchronous replication mode which makes the transaction wait for > its WAL to be replayed by the standby. If this mode is used, a client > doesn't need to track the LSN of each transaction and check whether > the committed transaction has already replayed by the standby or not. Don't you need to combine that with the possibility to wait for N targets instead of 1 in synchronous_standby_names? You may want to be sure that the commit is done on a series of standbys before considering any further operations after this transaction commit. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reporting the commit LSN at commit time
On 08/08/2014 10:58 AM, Fujii Masao wrote: > ISTM that the proper solution to that problem is the introduction of > new synchronous replication mode which makes the transaction wait for > its WAL to be replayed by the standby. If this mode is used, a client > doesn't need to track the LSN of each transaction and check whether > the committed transaction has already replayed by the standby or not. I'm not convinced of that. That pushes the penalty onto the writer - which now has to wait until replicas catch up. It has to pay this for every commit, even if actually failing over to another node is unlikely. It'd be better to just enable sync rep instead, or it would if we had all-nodes sync rep. IMO any waiting needs to be done on the other side, i.e. "Wait until I am caught up before proceeding" rather than "wait for the other end to catch up before returning". Doing it the way you describe would make it nearly useless for enabling client-side failover in BDR, where half the point is that it can deal with high latency or intermittently available links to downstream replicas. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reporting the commit LSN at commit time
On Fri, Aug 8, 2014 at 9:50 AM, Craig Ringer wrote: > On 08/08/2014 03:54 AM, Tom Lane wrote: >> Craig Ringer writes: >>> Hi all >>> To support transparent client-side failover in BDR, it's necessary to >>> know what the LSN of a node was at the time a transaction committed and >>> keep track of that in the client/proxy. >> >>> I'm thinking about adding a new message type in the protocol that gets >>> sent immediately before CommandComplete, containing the LSN of the >>> commit. Clients would need to enable the sending of this message with a >>> GUC that they set when they connect, so it doesn't confuse clients that >>> aren't expecting it or aware of it. >> >> FWIW, I think it's a seriously bad idea to expose LSNs in the protocol >> at all. What happens five years from now when we switch to some other >> implementation that doesn't have LSNs? > > Everyone who's relying on them already via pg_stat_replication, etc, breaks. > > They're _already_ exposed to users. That ship has sailed. > > That's not to say I'm stuck to LSNs as the way to do this, just that I > don't think that particular argument is relevant. > >> I don't mind if you expose some other way to inquire about LSNs, but >> let's *not* embed it into the wire protocol. Not even as an option. > > Well, the underlying need here is to have the client able to keep track > of what point in the server's time-line it must see on a replica before > it proceeds to use that replica. > > So if the client does some work on server A, then for some reason needs > to / must use server B, it can ask server B "are you replayed up to the > last transaction I performed on server A yet?" and wait until it is. ISTM that the proper solution to that problem is the introduction of new synchronous replication mode which makes the transaction wait for its WAL to be replayed by the standby. 
If this mode is used, a client doesn't need to track the LSN of each transaction and check whether the committed transaction has already been replayed by the standby or not. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minmax indexes
On 08/07/2014 05:52 PM, Michael Paquier wrote: > On Fri, Aug 8, 2014 at 9:47 AM, Josh Berkus wrote: >> On 08/07/2014 08:38 AM, Oleg Bartunov wrote: >>> +1 for BRIN ! >>> >>> On Thu, Aug 7, 2014 at 6:16 PM, Simon Riggs wrote: On 7 August 2014 14:53, Robert Haas wrote: A better description would be "block range index" since we are indexing a range of blocks (not just one block). Perhaps a better one would be simply "range index", which we could abbreviate to RIN or BRIN. >> >> How about Block Range Dynamic indexes? >> >> Or Range Usage Metadata indexes? >> >> You see what I'm getting at: >> >> BRanDy >> >> RUM >> >> ... to keep with our "new indexes" naming scheme ... > Not the best fit for kids, fine for grad students. But, it goes perfectly with our GIN and VODKA indexes. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reporting the commit LSN at commit time
On 08/08/2014 09:02 AM, Tom Lane wrote: > Craig Ringer writes: >> On 08/08/2014 03:54 AM, Tom Lane wrote: >>> FWIW, I think it's a seriously bad idea to expose LSNs in the protocol >>> at all. What happens five years from now when we switch to some other >>> implementation that doesn't have LSNs? > >> Everyone who's relying on them already via pg_stat_replication, etc, breaks. >> They're _already_ exposed to users. That ship has sailed. > > They're exposed to replication tools, yeah, but embedding them in the > wire protocol would be moving the goalposts a long way past that. As an > example of something that doubtless seemed like a good idea at the time, > consider the business about how an INSERT command completion tag includes > the OID of the inserted row. We're stuck with that obsolete idea > *forever* because it's embedded in the protocol for all clients. That makes sense. >> Well, I'd prefer to be able to negotiate with the server and establish >> requirements during the protocol handshake. > > Sure, but a GUC is not that. The problem with a GUC for changing > wire-level behavior is that it might be set by code far removed from > the wire, possibly breaking lower code levels that expected different > behavior. The multitude of ways that we offer for setting GUCs is > an active evil in this particular context. I'd value your input and suggestions then. I thought this is what PGC_BACKEND GUCs were for - set them in the startup packet and you can't mess with them afterwards. I can see that the ability of someone to cause that to be set in (e.g.) PGOPTIONS could be a problem though. AFAIK we don't _have_ a fancy negotiation system in the protocol, with back-and-forth exchanges of capabilities information. So is the appropriate thing to do here to set a non-GUC 'options' field in the startup packet? 
-- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fixed redundant i18n strings in json
On Thu, Aug 7, 2014 at 5:53 PM, Tom Lane wrote: > David G Johnston writes: > > Tom Lane-2 wrote > >> Surely that was meant to read "invalid number OF arguments". The > errhint > >> is only charitably described as English, as well. I'd suggest something > >> like "Arguments of json_build_object() must be pairs of keys and > values." > >> --- but maybe someone else can phrase that better. > > > The user documentation is worth emulating here: > > http://www.postgresql.org/docs/9.4/interactive/functions-json.html > > > errmsg("argument count must be divisible by 2") > > errhint("The argument list consists of alternating names and values") > > Seems reasonable to me. > > > Note that I s/keys/names/ to match said documentation. > > Hm. The docs aren't too consistent either: there are several other nearby > places that say "keys". Notably, the functions json[b]_object_keys() have > that usage embedded in their names, where we can't readily change it. > > I'm inclined to think we should s/names/keys/ in the docs instead. > Thoughts? > > Agreed - have the docs match the common API term usage in our implementation. Not sure it's worth a thorough hunt, but at least fix them as they are noticed. David J.
Re: [HACKERS] Reporting the commit LSN at commit time
On 08/07/2014 11:42 PM, Robert Haas wrote: > I doubt whether it makes sense to do this without a broader > understanding of how the client-side failover mechanism would work. > If we're going to add something like this, it should include libpq > support for actually doing something useful with it. I'm currently interested in targeting PgBouncer and PgJDBC, not libpq, though I can see that exposing helpers for it in libpq could be useful. The goal here is to permit a client to safely switch from one server to another - either in a multimaster async replication system like BDR, or routing read-only queries to hot standbys with streaming replication - and know for sure that its last commit is visible on the server it is now connected to. For hot standby that means it can avoid running queries that won't see the latest work it did if the standby is lagging - deciding to run them on the upstream instead, or wait, as appropriate. For BDR it'll permit the client to safely perform transparent failover to another node and resume write operations without risking conflicts with its own prior transactions. (I wrote some explanations about this on -general in the thread here: http://www.postgresql.org/message-id/84184aef-887d-49df-8f47-6377b1d6e...@gmail.com). Broadly, what I'm thinking of is: * Whenever a client issues a transaction that gets a txid assigned, and that tx commits, the server reports the LSN that includes the commit. * The client keeps track of which server it is connected to using the server's (sysid, timelineid, databaseoid) or a similar identifier - probably specific to the replication protocol in use, unless something generic proves practical. * When the client has to switch to a new server or chooses to do so, it checks pg_stat_replication or pg_replication_slots, finds the server it was previously connected to, and checks to see if the new server has replayed up to the last write transaction this client performed on the previous server. 
If not, it can make a policy-driven decision: wait until replay catchup, wait for a while then bail out, etc. This is admittedly all a bit hand-wavey. I'm looking at ways to do it, not a firm implementation plan. Notably, the LSN (and timelineID) aren't the only way to keep track of the replay progress of a server and check it from another server. If the commit timestamps work is merged and the timestamp of the last replayed commit record is exposed in pg_replication_slots, the client could use the server-reported commit timestamp to the same effect. In the above you'll note that the client has to make some choices. The client might be picking different servers for failover, read load spreading, or other things I haven't thought of. It might be retaining the old connection and making new ones it wants to be consistent up to a certain point on the old connection (read spreading), or it might be dropping the old connection and making a new one (failover). If the new server to connect to isn't caught up yet it might want to wait indefinitely, wait a short while, or bail out immediately and try a different server. There's a lot of client/application specific policy going to be involved here, so I'm not sure it makes sense to try to make it transparent in libpq. I can see it being useful to expose some tools in libpq for it, without a doubt, so clients can do these sorts of things usefully. (There's also another whole new question: how do you pick which alternative server to connect to? But that's not really within the scope of this.) -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
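The client-side bookkeeping in the bullet points above (remember the last commit cookie per server identity, then apply a policy-driven wait on switchover) might look roughly like this; the class, the (sysid, timelineid, databaseoid) tuples and the replayed_from callback are illustrative stand-ins for protocol-specific machinery:

```python
import time

class FailoverTracker:
    """Track the last commit position seen on each upstream server."""

    def __init__(self):
        self.last_commit = {}  # (sysid, timelineid, databaseoid) -> cookie

    def record_commit(self, server_id, cookie):
        self.last_commit[server_id] = cookie

    def ready_to_switch(self, old_server_id, replayed_from, timeout, poll=0.1):
        """Policy: wait up to `timeout` seconds for the new server to have
        replayed past our last commit on the old server, then bail out.
        `replayed_from(server_id)` returns the new server's replay position
        for that upstream (e.g. read from pg_replication_slots)."""
        target = self.last_commit.get(old_server_id)
        if target is None:
            return True  # we never wrote to that server; nothing to wait for
        deadline = time.monotonic() + timeout
        while True:
            if replayed_from(old_server_id) >= target:
                return True
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll)

tracker = FailoverTracker()
tracker.record_commit(("sysid-A", 1, 16384), 100)
# New server has already replayed past our last commit on sysid-A:
print(tracker.ready_to_switch(("sysid-A", 1, 16384),
                              lambda s: 150, timeout=1.0, poll=0))  # -> True
```

Other policies (wait indefinitely, or bail out immediately and try another server) are just different timeout choices in the same loop.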
Re: [HACKERS] Reporting the commit LSN at commit time
Craig Ringer writes: > On 08/08/2014 03:54 AM, Tom Lane wrote: >> FWIW, I think it's a seriously bad idea to expose LSNs in the protocol >> at all. What happens five years from now when we switch to some other >> implementation that doesn't have LSNs? > Everyone who's relying on them already via pg_stat_replication, etc, breaks. > They're _already_ exposed to users. That ship has sailed. They're exposed to replication tools, yeah, but embedding them in the wire protocol would be moving the goalposts a long way past that. As an example of something that doubtless seemed like a good idea at the time, consider the business about how an INSERT command completion tag includes the OID of the inserted row. We're stuck with that obsolete idea *forever* because it's embedded in the protocol for all clients. >> This position also obviates the need to complain about having a GUC >> that changes the protocol-level behavior, which is also a seriously >> bad idea. > Well, I'd prefer to be able to negotiate with the server and establish > requirements during the protocol handshake. Sure, but a GUC is not that. The problem with a GUC for changing wire-level behavior is that it might be set by code far removed from the wire, possibly breaking lower code levels that expected different behavior. The multitude of ways that we offer for setting GUCs is an active evil in this particular context. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minmax indexes
On Thu, Aug 7, 2014 at 7:58 AM, Robert Haas wrote: > range index might get confused with range types; block range index > seems better. I like summary, but I'm fine with block range index or > block filter index, too. +1 -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fixed redundant i18n strings in json
David G Johnston writes: > Tom Lane-2 wrote >> Surely that was meant to read "invalid number OF arguments". The errhint >> is only charitably described as English, as well. I'd suggest something >> like "Arguments of json_build_object() must be pairs of keys and values." >> --- but maybe someone else can phrase that better. > The user documentation is worth emulating here: > http://www.postgresql.org/docs/9.4/interactive/functions-json.html > errmsg("argument count must be divisible by 2") > errhint("The argument list consists of alternating names and values") Seems reasonable to me. > Note that I s/keys/names/ to match said documentation. Hm. The docs aren't too consistent either: there are several other nearby places that say "keys". Notably, the functions json[b]_object_keys() have that usage embedded in their names, where we can't readily change it. I'm inclined to think we should s/names/keys/ in the docs instead. Thoughts? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minmax indexes
On Fri, Aug 8, 2014 at 9:47 AM, Josh Berkus wrote: > On 08/07/2014 08:38 AM, Oleg Bartunov wrote: >> +1 for BRIN ! >> >> On Thu, Aug 7, 2014 at 6:16 PM, Simon Riggs wrote: >>> On 7 August 2014 14:53, Robert Haas wrote: >>> A better description would be "block range index" since we are >>> indexing a range of blocks (not just one block). Perhaps a better one >>> would be simply "range index", which we could abbreviate to RIN or >>> BRIN. > > How about Block Range Dynamic indexes? > > Or Range Usage Metadata indexes? > > You see what I'm getting at: > > BRanDy > > RUM > > ... to keep with our "new indexes" naming scheme ... Not the best fit for kids, fine for grad students. BRIN seems to be a perfect consensus, so +1 for it. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reporting the commit LSN at commit time
On 08/08/2014 03:54 AM, Tom Lane wrote: > Craig Ringer writes: >> Hi all >> To support transparent client-side failover in BDR, it's necessary to >> know what the LSN of a node was at the time a transaction committed and >> keep track of that in the client/proxy. > >> I'm thinking about adding a new message type in the protocol that gets >> sent immediately before CommandComplete, containing the LSN of the >> commit. Clients would need to enable the sending of this message with a >> GUC that they set when they connect, so it doesn't confuse clients that >> aren't expecting it or aware of it. > > FWIW, I think it's a seriously bad idea to expose LSNs in the protocol > at all. What happens five years from now when we switch to some other > implementation that doesn't have LSNs? Everyone who's relying on them already via pg_stat_replication, etc, breaks. They're _already_ exposed to users. That ship has sailed. That's not to say I'm stuck to LSNs as the way to do this, just that I don't think that particular argument is relevant. > I don't mind if you expose some other way to inquire about LSNs, but > let's *not* embed it into the wire protocol. Not even as an option. Well, the underlying need here is to have the client able to keep track of what point in the server's time-line it must see on a replica before it proceeds to use that replica. So if the client does some work on server A, then for some reason needs to / must use server B, it can ask server B "are you replayed up to the last transaction I performed on server A yet?" and wait until it is. That's useful for streaming replication (to permit consistent queries against read replicas) but it's much more so for BDR, where it's necessary to avoid a variety of multi-master replication anomalies and conflicts. 
I considered LSNs to be the logical mechanism for this as they're already user-visible, exposed in pg_stat_replication, they can already be used for just this purpose by hand (just with an extra round-trip), etc. An obvious alternative is to merge the commit timestamp work, then expose the timestamp of the last commit replayed in pg_stat_replication. Then all the client needs to keep track of is the server time of the last commit. > This position also obviates the need to complain about having a GUC > that changes the protocol-level behavior, which is also a seriously > bad idea. Well, I'd prefer to be able to negotiate with the server and establish requirements during the protocol handshake. As far as I know there isn't an explicit protocol negotiation with capabilities fields (just a plain version field), but we do have the startup packet's 'options' field. So I was thinking that requesting the setting of a PGC_BACKEND GUC in the startup packet would be a logical way for the client to request use of a protocol extension. Looking at ProcessStartupPacket(...) in postmaster.c I see that there's room for special-casing options. Do you think it would be more appropriate to add a new connection option that's sent by a client to request reporting of commit timestamps / LSNs / whatever by the server at commit time? If not, do you have an alternative suggestion? I can't imagine that extending the CommandComplete message is a desirable option. It seems like it'd be useful to expose this as a read-only GUC anyway, so I don't really see why a PGC_BACKEND GUC isn't exactly the right thing to use for this, but I'm happy to listen to suggestions. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
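For what it's worth, libpq already lets a client ship backend GUC settings in the startup packet through the `options` connection keyword (or the PGOPTIONS environment variable), so a PGC_BACKEND GUC of the kind discussed could be requested at connect time with a conninfo string like the one below; the setting name `report_commit_lsn` is purely illustrative, not an existing GUC:

```
host=server1 dbname=app options='-c report_commit_lsn=on'
```

Whether that amounts to real capability negotiation, rather than a GUC that merely happens to be settable only at startup, is exactly the question under discussion here.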
Re: [HACKERS] Minmax indexes
On 08/07/2014 08:38 AM, Oleg Bartunov wrote: > +1 for BRIN ! > > On Thu, Aug 7, 2014 at 6:16 PM, Simon Riggs wrote: >> On 7 August 2014 14:53, Robert Haas wrote: >> A better description would be "block range index" since we are >> indexing a range of blocks (not just one block). Perhaps a better one >> would be simply "range index", which we could abbreviate to RIN or >> BRIN. How about Block Range Dynamic indexes? Or Range Usage Metadata indexes? You see what I'm getting at: BRanDy RUM ... to keep with our "new indexes" naming scheme ... -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Quick doc fix
Guillaume Lelarge writes: > Still translating the 9.4 manual, and found another typo. Patch attached. Applied, thanks! regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostrgeSQL vs oracle doing 1 million sqrts am I doing it wrong?
On 08/07/2014 04:48 PM, Tom Lane wrote: > plpgsql is not efficient at all about coercions performed as a side > effect of assignments; if memory serves, it always handles them by > converting to text and back. So basically the added cost here came > from float8out() and float4in(). There has been some talk of trying > to do such coercions via SQL casts, but nothing's been done for fear > of compatibility problems. Yeah, that's a weeks-long project for someone. And would require a lot of tests ... -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostrgeSQL vs oracle doing 1 million sqrts am I doing it wrong?
James Cloos writes: > "ST" == Shaun Thomas writes: > ST> That said, the documentation here says FLOAT4 is an alias for REAL, > ST> so it's somewhat nonintuitive for FLOAT4 to be so much slower than > ST> FLOAT8, which is an alias for DOUBLE PRECISION. > There are some versions of glibc where doing certain math on double is > faster than doing it on float, depending on how things are compiled. > Maybe this is one of them? No, it isn't. The problem here is that the result of SQRT() is float8 (a/k/a double precision) while the variable that it is to be assigned to is float4 (a/k/a real). As was noted upthread, changing the variable's declared type to eliminate the run-time type coercion removes just about all the discrepancy between PG and Oracle runtimes. The original comparison is not apples-to-apples because the Oracle coding required no type coercions. (Or at least, so I assume; I'm not too familiar with Oracle's math functions.) plpgsql is not efficient at all about coercions performed as a side effect of assignments; if memory serves, it always handles them by converting to text and back. So basically the added cost here came from float8out() and float4in(). There has been some talk of trying to do such coercions via SQL casts, but nothing's been done for fear of compatibility problems. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
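The cost of going "to text and back", as Tom describes plpgsql's assignment coercions, can be felt with a stand-in experiment outside the database; this mimics the float8out()/float4in() round-trip with Python's repr()/float(), so only the ratio is meaningful, not the absolute numbers:

```python
import timeit

x = 1234.5678
n = 200_000

# Direct assignment (no coercion) vs. a text round-trip on every
# assignment, which is roughly what plpgsql does when a float8 result
# is assigned to a float4 variable.
direct = timeit.timeit("y = x", globals={"x": x}, number=n)
via_text = timeit.timeit("y = float(repr(x))", globals={"x": x}, number=n)

print(f"direct: {direct:.4f}s  via text: {via_text:.4f}s  "
      f"slowdown: {via_text / direct:.1f}x")
```

This is why simply declaring the plpgsql variable as the same type as the expression, as noted upthread, removes nearly all of the discrepancy.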
[HACKERS] Hokey wrong versions of libpq in apt.postgresql.org
Hello, I know this has been brought up before: http://www.postgresql.org/message-id/20140724080902.ga28...@msg.df7cb.de But this is just plain wrong. I don't care that the FAQ (on the wiki) says we are doing it wrong for good reasons. When I (or anyone else) pulls postgresql-$version-dev, I want the libpq for my version. I do not want 9.3. Yes, it "should" (because of protocol compatibility) work but it doesn't always (as stated in that email and in a similar problem we just ran into). There can be unintended consequences on machines when you mix and match like that. Can we please do some proper packaging on this? Sincerely, Joshua D. Drake -- Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, @cmdpromptinc "If we send our children to Caesar for their education, we should not be surprised when they come back as Romans."
Re: [HACKERS] Append to a GUC parameter ?
Alvaro Herrera writes: > Fabrízio de Royes Mello wrote: > >> On Tue, Aug 5, 2014 at 10:55 PM, Tom Lane wrote: >> >> > Josh Berkus writes: >> > > BTW, while there's unlikely to be a good reason to put search_path in >> > > pg.conf with appends, there are a LOT of reasons to want to be able to >> > > append to it during a session. >> > >> > [shrug...] You can do that today with current_setting()/set_config(). >> >> With a very simple statement you can do that: > > Of course, this doesn't solve the requirement that started this thread, > which is about having "includeable" pg.conf fragments to enable > extensions. ISTM the idea of a token in the value string that would expand to an existing setting should be general purpose enough to allow for prepend/append and not require adding a new operator such as += or whatever. I say this without knowing exactly what the implementation effort is, but just to reiterate the original intent. I think someone already suggested this upthread. shared_preload_libraries = '%,more_libs' shared_preload_libraries = 'more_libs,%' At the conclusion of file processing, stripping off an unnecessary delimiter at the beginning or end of the string would be a nice aesthetic feature and/or might be required depending on whether an empty list value is legal. Thanks > -- > Álvaro Herrera http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training & Services -- Jerry Sievers Postgres DBA/Development Consulting e: postgres.consult...@comcast.net p: 312.241.7800
Re: [HACKERS] A worst case for qsort
On Thu, Aug 7, 2014 at 2:34 PM, Rod Taylor wrote: > This one is frequently sorted as batch operations against the files are > performed in alphabetical order to reduce conflict issues that a random > ordering may cause between jobs. Sure. There are cases out there. But, again, I have a hard time imagining why you'd expect those to be pre-sorted in practice, and particularly why you'd feel justified in expecting that to sort much faster than equivalent though slightly imperfectly correlated data. Without that, the fmgr-elision aspect of sort support appears to offer enough for us to still win on balance [1], assuming 9.4 is our basis of comparison. [1] http://www.postgresql.org/message-id/cam3swzqhjxiyrsqbs5w3u-vtj_jt2hp8o02big5wyb4m9lp...@mail.gmail.com -- Peter Geoghegan
Re: [HACKERS] A worst case for qsort
Sigh. Found another example. A table with 15 million entries and a unique key on filesystem location for things users created via a web interface. Entries all begin with /usr/home/ ... This one is frequently sorted as batch operations against the files are performed in alphabetical order to reduce conflict issues that a random ordering may cause between jobs. regards, Rod On Thu, Aug 7, 2014 at 5:23 PM, Rod Taylor wrote: > > On Thu, Aug 7, 2014 at 3:06 PM, Peter Geoghegan wrote: > >> I think that pre-sorted, all-unique text datums, that have all >> differences beyond the first 8 bytes, that the user happens to >> actually want to sort are fairly rare. > > > While I'm sure it's not common, I've seen a couple of ten-million tuple > tables having a URL column as primary key where 98% of the entries begin > with 'http://www.' > > So, that exact scenario is out there. >
Re: [HACKERS] A worst case for qsort
On Thu, Aug 7, 2014 at 2:23 PM, Rod Taylor wrote: > While I'm sure it's not common, I've seen a couple of ten-million tuple > tables having a URL column as primary key where 98% of the entries begin > with 'http://www.' > > So, that exact scenario is out there. Sure, that scenario is relatively common. My point was that that scenario, alongside a perfect logical/physical heap correlation, and alongside a frequent need to sort is uncommon. If you're able to get away with a "bubble sort best case" there, then the sort is going to be extremely fast relative to any other database system anyway, and you don't have too much to complain about. It's exactly the same wasted cost as in the general case (where the tuples aren't in order), except that it looks more expensive next to your unrealistically fast sort that you were very lucky to be able to get away with. I think the special pre-check for already sorted tuples within qsort() is at best only justified by scenarios like where a serial primary key index is reindexed. That's about it. You can totally alter the outcome of this case by inserting a single tuple, so the top-level pre-check fails. -- Peter Geoghegan
Re: [HACKERS] Fixed redundant i18n strings in json
Tom Lane wrote > Robert Haas <robertmhaas@> writes: >> On Sat, Aug 2, 2014 at 9:15 AM, Daniele Varrazzo <daniele.varrazzo@> wrote: >>> I'd definitely replace /arg/argument/. Furthermore I'd avoid the form >>> "argument 1: something is wrong": the string is likely to end up in >>> sentences with other colons so I'd rather use "something is wrong at >>> argument 1". >>> >>> Is the patch attached better? > >> Looks OK to me. I thought someone else might comment, but since no >> one has, committed. > > It looks to me like this is still wrong: > > if (nargs % 2 != 0) > ereport(ERROR, > (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > -errmsg("invalid number or arguments: object must be > matched key value pairs"))); > +errmsg("invalid number or arguments"), > +errhint("Object must be matched key value pairs."))); > > Surely that was meant to read "invalid number OF arguments". The errhint > is only charitably described as English, as well. I'd suggest something > like "Arguments of json_build_object() must be pairs of keys and values." > --- but maybe someone else can phrase that better. The user documentation is worth emulating here: http://www.postgresql.org/docs/9.4/interactive/functions-json.html errmsg("argument count must be divisible by 2") errhint("The argument list consists of alternating names and values") Note that I s/keys/names/ to match said documentation. David J.
Re: [HACKERS] A worst case for qsort
On Thu, Aug 7, 2014 at 3:06 PM, Peter Geoghegan wrote: > I think that pre-sorted, all-unique text datums, that have all > differences beyond the first 8 bytes, that the user happens to > actually want to sort are fairly rare. While I'm sure it's not common, I've seen a couple of ten-million tuple tables having a URL column as primary key where 98% of the entries begin with 'http://www.' So, that exact scenario is out there.
Re: [HACKERS] A worst case for qsort
On Thu, Aug 7, 2014 at 12:06 PM, Peter Geoghegan wrote: > I think that pre-sorted, all-unique text datums, that have all > differences beyond the first 8 bytes, that the user happens to > actually want to sort are fairly rare. Actually, you could use that case to justify not doing strxfrm() transformation of very large lists in the more general case, where strxfrm() blobs aren't truncated/abbreviated, but our qsort() is used, which goes against the strong recommendation of, just to pick an example at random, the glibc documentation. It states: "Here is an example of how you can use strxfrm when you plan to do many comparisons. It does the same thing as the previous example, but much faster, because it has to transform each string only once, no matter how many times it is compared with other strings. Even the time needed to allocate and free storage is much less than the time we save, when there are many strings." [1] Sure, that would still be better than the equivalent abbreviated keys case, since no strcoll() tie-breakers are required, but it would still be bad, and all because of our pre-check for sorted input. [1] https://www.gnu.org/software/libc/manual/html_node/Collation-Functions.html -- Peter Geoghegan
[HACKERS] Quick doc fix
Hi, Still translating the 9.4 manual, and found another typo. Patch attached. Thanks. -- Guillaume. http://blog.guillaume.lelarge.info http://www.dalibo.com diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml index d3fcb82..cf174f0 100644 --- a/doc/src/sgml/maintenance.sgml +++ b/doc/src/sgml/maintenance.sgml @@ -528,7 +528,7 @@ the relfrozenxid column of a table's pg_class row contains the freeze cutoff XID that was used by the last whole-table VACUUM for that table. All rows -inserted by transactions with XIDs XIDs older than this cutoff XID are +inserted by transactions with XIDs older than this cutoff XID are guaranteed to have been frozen. Similarly, the datfrozenxid column of a database's pg_database row is a lower bound on the unfrozen XIDs
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Thu, Aug 7, 2014 at 12:17 PM, Robert Haas wrote: > Gah. Hit send to soon. Also, as much as I'd prefer to avoid > relitigating the absolutely stupid debate about how to expand the > buffers, this is no good: > > + tss->buflen1 = Max(len1 + 1, tss->buflen1 * 2); > > If the first expansion is for a string >512MB and the second string is > longer than the first, this will exceed MaxAllocSize and error out. Fair point. I think this problem is already present in a few existing places, but it shouldn't be. I suggest this remediation: > + tss->buflen1 = Max(len1 + 1, Min(tss->buflen1 * 2, (int) > MaxAllocSize)); I too would very much prefer to not repeat that debate. :-) -- Peter Geoghegan
Re: [HACKERS] Reporting the commit LSN at commit time
Craig Ringer writes: > Hi all > To support transparent client-side failover in BDR, it's necessary to > know what the LSN of a node was at the time a transaction committed and > keep track of that in the client/proxy. > I'm thinking about adding a new message type in the protocol that gets > sent immediately before CommandComplete, containing the LSN of the > commit. Clients would need to enable the sending of this message with a > GUC that they set when they connect, so it doesn't confuse clients that > aren't expecting it or aware of it. FWIW, I think it's a seriously bad idea to expose LSNs in the protocol at all. What happens five years from now when we switch to some other implementation that doesn't have LSNs? I don't mind if you expose some other way to inquire about LSNs, but let's *not* embed it into the wire protocol. Not even as an option. This position also obviates the need to complain about having a GUC that changes the protocol-level behavior, which is also a seriously bad idea. regards, tom lane
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Thu, Aug 7, 2014 at 12:15 PM, Robert Haas wrote: > In my original patch, I wrote NUL, as in the NUL character. You've > changed it to NULL, but the original was correct. NULL is a pointer > value that is not relevant here; the character with value 0 is NUL. "NULL-terminated string" seems like acceptable usage (e.g. [1]), but I'll try to use the term NUL in reference to '\0' in the future to avoid confusion. [1] https://en.wikipedia.org/wiki/Null-terminated_string -- Peter Geoghegan
Re: [HACKERS] psql: show only failed queries
2014-08-07 7:10 GMT+02:00 Fujii Masao : > On Thu, Aug 7, 2014 at 6:26 AM, Pavel Stehule > wrote: > > Hello > > > > updated version patch in attachment > > Thanks! But ISTM you forgot to attached the patch. > grr .. I am sorry > > >> +/* all psql known variables are included in list by default */ > >> +for (known_varname = known_varnames; *known_varname; > known_varname++) > >> +varnames[nvars++] = pg_strdup(*known_varname); > >> > >> Don't we need to append both prefix and suffix to even known variables? > > > > > > ??? I am not sure if I understand well - probably system "read only" > > variables as DBNAME, USER, VERSION should not be there > > I had that question because complete_from_variables() is also called by the > tab-completion of "\echo :" and it shows half-baked variables list. So > I thought probably we need to change complete_from_variables() more to > fix the problem. > I understand now. I fixed it. I have a question.\echo probably should not to show empty known variable. data for autocomplete for \echo should be same as result of "\set" > > >> +else if (strcmp(prev2_wd, "\\set") == 0) > >> +{ > >> +if (strcmp(prev_wd, "AUTOCOMMIT") == 0) > >> > >> ISTM that some psql variables like IGNOREEOF are not there. Why not? > > > > > > yes, there are not complete for DBNAME, ENCODING, FETCH_COUNT, > HISTCONTROL, > > HISTFILE, HISTSIZE, HOST, IGNOREEOFF, PROMPT*,USER, VERSION > > > > There are more reasons: > > > > * paremeter is not a enum (string, number or both): FETCH_COUNT, PROMPT, > > HISTSIZE, .. > > > > * variable is pseudo read only - change has not any effect: DBNAME, > > ENCODING, VERSION > > So HISTCONTROL should be there because it doesn't have such reasons at all? 
> > yes Pavel > Regards, > > -- > Fujii Masao > commit bacfdc0e03219265aeee34b78f7ec9d272d49f72 Author: Pavel Stehule Date: Wed Aug 6 23:05:42 2014 +0200 warnings and typo fixed diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c index 24e60b7..b94ed63 100644 --- a/src/bin/psql/tab-complete.c +++ b/src/bin/psql/tab-complete.c @@ -3608,6 +3608,79 @@ psql_completion(const char *text, int start, int end) { matches = complete_from_variables(text, "", ""); } + else if (strcmp(prev2_wd, "\\set") == 0) + { + if (strcmp(prev_wd, "AUTOCOMMIT") == 0) + { + static const char *const my_list[] = + {"on", "off", "interactive", NULL}; + + COMPLETE_WITH_LIST_CS(my_list); + } + else if (strcmp(prev_wd, "COMP_KEYWORD_CASE") == 0) + { + static const char *const my_list[] = + {"lower", "upper", "preserve-lower", "preserve-upper", NULL}; + + COMPLETE_WITH_LIST_CS(my_list); + } + else if (strcmp(prev_wd, "ECHO") == 0) + { + static const char *const my_list[] = + {"none", "errors", "queries", "all", NULL}; + + COMPLETE_WITH_LIST_CS(my_list); + } + else if (strcmp(prev_wd, "ECHO_HIDDEN") == 0) + { + static const char *const my_list[] = + {"noexec", "off", "on", NULL}; + + COMPLETE_WITH_LIST_CS(my_list); + } + else if (strcmp(prev_wd, "ON_ERROR_ROLLBACK") == 0) + { + static const char *const my_list[] = + {"on", "off", "interactive", NULL}; + + COMPLETE_WITH_LIST_CS(my_list); + } + else if (strcmp(prev_wd, "ON_ERROR_STOP") == 0) + { + static const char *const my_list[] = + {"on", "off", NULL}; + + COMPLETE_WITH_LIST_CS(my_list); + } + else if (strcmp(prev_wd, "QUIET") == 0) + { + static const char *const my_list[] = + {"on", "off", NULL}; + + COMPLETE_WITH_LIST_CS(my_list); + } + else if (strcmp(prev_wd, "SINGLELINE") == 0) + { + static const char *const my_list[] = + {"on", "off", NULL}; + + COMPLETE_WITH_LIST_CS(my_list); + } + else if (strcmp(prev_wd, "SINGLESTEP") == 0) + { + static const char *const my_list[] = + {"on", "off", NULL}; + + 
COMPLETE_WITH_LIST_CS(my_list); + } + else if (strcmp(prev_wd, "VERBOSITY") == 0) + { + static const char *const my_list[] = + {"default", "verbose", "terse", NULL}; + + COMPLETE_WITH_LIST_CS(my_list); + } + } else if (strcmp(prev_wd, "\\sf") == 0 || strcmp(prev_wd, "\\sf+") == 0) COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_functions, NULL); else if (strcmp(prev_wd, "\\cd") == 0 || @@ -4062,28 +4135,61 @@ complete_from_variables(const char *text, const char *prefix, const char *suffix { char **matches; char **varnames; + static const char **known_varname; int nvars = 0; int maxvars = 100; int i; struct _variable *ptr; + static const char *known_varnames[] = { + "AUTOCOMMIT", "COMP_KEYWORD_CASE", "DBNAME", "ECHO", "ECHO_HIDDEN", "ENCODING", + "FETCH_COUNT", "HISTCONTROL", "HISTFILE", "HISTSIZE", "HOST", "IGNOREEOFF", + "LASTOID", "ON_ERROR_ROLLBACK", "ON_ERROR_STOP", "PORT", "PROMPT1", "PROMPT2", + "PROMPT3", "QUIET", "SINGLELINE", "SINGLESTEP", "USER", "VERBOSITY", + NULL + }; + varnames = (char **) pg_malloc((maxvars + 1) * sizeof(char *)); + /* all psql known variables are included in list by default
Re: [HACKERS] Fixed redundant i18n strings in json
Robert Haas writes: > On Sat, Aug 2, 2014 at 9:15 AM, Daniele Varrazzo > wrote: >> I'd definitely replace /arg/argument/. Furthermore I'd avoid the form >> "argument 1: something is wrong": the string is likely to end up in >> sentences with other colons so I'd rather use "something is wrong at >> argument 1". >> >> Is the patch attached better? > Looks OK to me. I thought someone else might comment, but since no > one has, committed. It looks to me like this is still wrong: if (nargs % 2 != 0) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), -errmsg("invalid number or arguments: object must be matched key value pairs"))); +errmsg("invalid number or arguments"), +errhint("Object must be matched key value pairs."))); Surely that was meant to read "invalid number OF arguments". The errhint is only charitably described as English, as well. I'd suggest something like "Arguments of json_build_object() must be pairs of keys and values." --- but maybe someone else can phrase that better. regards, tom lane
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Thu, Aug 7, 2014 at 3:15 PM, Robert Haas wrote: > On Wed, Aug 6, 2014 at 7:18 PM, Peter Geoghegan wrote: >> On Wed, Aug 6, 2014 at 1:11 PM, Robert Haas wrote: >>> I've committed the patch I posted yesterday. I did not see a good >>> reason to meld that together in a single commit with the first of the >>> patches you posted; I'll leave you to revise that patch to conform >>> with this approach. >> >> Okay. Attached is the same patch set, rebased on top on your commit >> with appropriate amendments. > > Two things: > > +* result. Also, since strxfrm()/strcoll() require > NULL-terminated inputs, > > In my original patch, I wrote NUL, as in the NUL character. You've > changed it to NULL, but the original was correct. NULL is a pointer > value that is not relevant here; the character with value 0 is NUL. Gah. Hit send too soon. Also, as much as I'd prefer to avoid relitigating the absolutely stupid debate about how to expand the buffers, this is no good: + tss->buflen1 = Max(len1 + 1, tss->buflen1 * 2); If the first expansion is for a string >512MB and the second string is longer than the first, this will exceed MaxAllocSize and error out. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Wed, Aug 6, 2014 at 7:18 PM, Peter Geoghegan wrote: > On Wed, Aug 6, 2014 at 1:11 PM, Robert Haas wrote: >> I've committed the patch I posted yesterday. I did not see a good >> reason to meld that together in a single commit with the first of the >> patches you posted; I'll leave you to revise that patch to conform >> with this approach. > > Okay. Attached is the same patch set, rebased on top on your commit > with appropriate amendments. Two things: +* result. Also, since strxfrm()/strcoll() require NULL-terminated inputs, In my original patch, I wrote NUL, as in the NUL character. You've changed it to NULL, but the original was correct. NULL is a pointer value that is not relevant here; the character with value 0 is NUL. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] A worst case for qsort
On Thu, Aug 7, 2014 at 11:33 AM, Robert Haas wrote: > I think that's actually not a very unrealistic case at all. In > general, I think that if a particular data distribution is a > reasonable scenario, that data distribution plus "it's already sorted" > is also reasonable. Data gets presorted in all kinds of ways: > sometimes it gets loaded in sorted (or nearly-sorted) order from > another data source; sometimes we do an index scan that produces data > in order by column A and then later sort by A, B; sometimes somebody > clusters the table. Respecting the right of other people to have > different opinions, I don't think I'll ever be prepared to concede the > presorted case as either uncommon or unimportant. I think that pre-sorted, all-unique text datums, that have all differences beyond the first 8 bytes, that the user happens to actually want to sort are fairly rare. All I'm saying is: if that's a case that has to lose out more than your more limited case in order to get a large number of representative cases 3 - 7 times faster, and marginal cases a good bit faster, that's probably worth it. I am willing to fix any case that I can - as you say, it's often easier to write the code than to argue - but let's have a sense of proportion about these things. Even your quite reasonable case is going to lose out a bit, because what I've done is fundamentally about deciding at intervals during copying tuples if it's worth it. If it isn't, then clearly that whole process went to waste, and we've merely cut our losses. We could add a GUC to control it as suggested by Simon, or even put something in attoptions per attribute to indicate that the optimization should be avoided, but that's something that we've historically resisted doing. > That's not to prejudge anything that may or may not be in your patch, > which I have not studied in enormous detail. It's just what I think > about the subject in general. Fair enough.
At the risk of sounding trite, I'll add: everything is a trade-off. -- Peter Geoghegan
Re: [HACKERS] Pg_upgrade and toast tables bug discovered
On Tue, Aug 5, 2014 at 07:31:21PM -0400, Bruce Momjian wrote: > On Thu, Jul 10, 2014 at 06:38:26PM -0400, Bruce Momjian wrote: > > On Thu, Jul 10, 2014 at 06:17:14PM -0400, Bruce Momjian wrote: > > > Well, we are going to need to call internal C functions, often bypassing > > > their typical call sites and the assumption about locking, etc. Perhaps > > > this could be done from a plpgsql function. We could add and drop a > > > dummy column to force TOAST table creation --- we would then only need a > > > way to detect if a function _needs_ a TOAST table, which was skipped in > > > binary upgrade mode previously. > > > > > > That might be a minimalistic approach. > > > > I have thought some more on this. I thought I would need to open > > pg_class in C and do complex backend stuff, but I now realize I can do > > it from libpq, and just call ALTER TABLE and I think that always > > auto-checks if a TOAST table is needed. All I have to do is query > > pg_class from libpq, then construct ALTER TABLE commands for each item, > > and it will optionally create the TOAST table if needed. I just have to > > use a no-op ALTER TABLE command, like SET STATISTICS. > > Attached is a completed patch which handles oid conflicts in pg_class > and pg_type for TOAST tables that were not needed in the old cluster but > are needed in the new one. I was able to recreate a failure case and > this fixed it. > > The patch need to be backpatched because I am getting more-frequent bug > reports from users using pg_upgrade to leave now-end-of-life'ed 8.4. > There is not a good work-around for pg_upgrade failures without this > fix, but at least pg_upgrade throws an error. Patch applied through 9.3, with an additional Assert check. 9.2 code was different enough that there was too high a risk for backpatching. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. 
Re: [HACKERS] A worst case for qsort
On Thu, Aug 7, 2014 at 1:16 PM, Peter Geoghegan wrote: > On Thu, Aug 7, 2014 at 8:07 AM, Robert Haas wrote: >> So here. You may not agree that the mitigation strategies for which >> others are asking for are worthwhile, but you can't expect everyone >> else to agree with your assessment of which cases are likely to occur >> in practice. The case of a cohort of strings to be sorted which share >> a long fixed prefix and have different stuff at the end does not seem >> particularly pathological to me. It doesn't, in other words, require >> an adversary: some real-world data sets will look like that. I will >> forebear repeating examples I've given before, but I've seen that kind >> of thing more than once in real data sets that people (well, me, >> anyway) actually wanted to put into a PostgreSQL database. So I'm >> personally of the opinion that the time you've put into trying to >> protect against those cases is worthwhile. I understand that you may >> disagree with that, and that's fine: we're not all going to agree on >> everything. > > I actually agree with you here. /me pops cork. > Sorting text is important, so we > should spend a lot of time avoiding regressions. Your example is > reasonable - I believe that people do that to a non-trivial degree. > The fact is, I probably can greatly ameliorate that case. However, to > give an example of a case I have less sympathy for, I'll name the case > you mention, *plus* the strings are already in logical order, and so > our qsort() can (dubiously) take advantage of that, so that there is a > 1:1 ratio of wasted strxfrm() calls and strcoll() tie-breaker calls. > There might be other cases like that that crop up. I think that's actually not a very unrealistic case at all. In general, I think that if a particular data distribution is a reasonable scenario, that data distribution plus "it's already sorted" is also reasonable. 
Data gets presorted in all kinds of ways: sometimes it gets loaded in sorted (or nearly-sorted) order from another data source; sometimes we do an index scan that produces data in order by column A and then later sort by A, B; sometimes somebody clusters the table. Respecting the right of other people to have different opinions, I don't think I'll ever be prepared to concede the presorted case as either uncommon or unimportant. That's not to prejudge anything that may or may not be in your patch, which I have not studied in enormous detail. It's just what I think about the subject in general. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Minmax indexes
On 07/08/14 16:16, Simon Riggs wrote: A better description would be "block range index" since we are indexing a range of blocks (not just one block). Perhaps a better one would be simply "range index", which we could abbreviate to RIN or BRIN. +1 for block range index (BRIN) -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Minmax indexes
2014-08-07 Oleg Bartunov : > +1 for BRIN ! +1, rolls off the tongue smoothly and captures the essence :-). Nicolas -- A. Because it breaks the logical sequence of discussion. Q. Why is top posting bad?
Re: [HACKERS] Proposal: Incremental Backup
Hi Marco, > With the current full backup procedure they are backed up, so I think > that having them backed up with a rsync-like algorithm is what an user > would expect for an incremental backup. Exactly. I think a simple, flexible and robust method for file based incremental backup is all we need. I am confident it could be done for 9.5. I would like to quote every single word Simon said. Block level incremental backup (with Robert's proposal) is definitely the ultimate goal for effective and efficient physical backups. I see file level incremental backup as a very good "compromise", a sort of intermediate release which could nonetheless produce a lot of benefits to our user base, for years to come too. Thanks, Gabriele
Re: [HACKERS] A worst case for qsort
On Thu, Aug 7, 2014 at 8:07 AM, Robert Haas wrote: > So here. You may not agree that the mitigation strategies for which > others are asking for are worthwhile, but you can't expect everyone > else to agree with your assessment of which cases are likely to occur > in practice. The case of a cohort of strings to be sorted which share > a long fixed prefix and have different stuff at the end does not seem > particularly pathological to me. It doesn't, in other words, require > an adversary: some real-world data sets will look like that. I will > forebear repeating examples I've given before, but I've seen that kind > of thing more than once in real data sets that people (well, me, > anyway) actually wanted to put into a PostgreSQL database. So I'm > personally of the opinion that the time you've put into trying to > protect against those cases is worthwhile. I understand that you may > disagree with that, and that's fine: we're not all going to agree on > everything. I actually agree with you here. Sorting text is important, so we should spend a lot of time avoiding regressions. Your example is reasonable - I believe that people do that to a non-trivial degree. The fact is, I probably can greatly ameliorate that case. However, to give an example of a case I have less sympathy for, I'll name the case you mention, *plus* the strings are already in logical order, and so our qsort() can (dubiously) take advantage of that, so that there is a 1:1 ratio of wasted strxfrm() calls and strcoll() tie-breaker calls. There might be other cases like that that crop up. I also thought the paper was pretty cool, and thought it might be interesting because of the fact that it was authored by McIlroy, and he refers to weaknesses in his and our very own implementation specifically. -- Peter Geoghegan
Re: [HACKERS] A worst case for qsort
On Thu, Aug 7, 2014 at 11:07 AM, Robert Haas wrote: > On Tue, Aug 5, 2014 at 8:15 PM, Peter Geoghegan wrote: > > "The adversarial method works for almost any polymorphic program > > recognizable as quicksort. The subject quicksort may copy values at > > will, or work with lists rather than arrays. It may even pick the > > pivot at random. The quicksort will be vulnerable provided only that > > it satisfies some mild assumptions that are met by every > > implementation I have seen". > > > > IMHO, while worst case performance is a very useful tool for analyzing > > algorithms (particularly their worst case time complexity), a worst > > case should be put in its practical context. For example, if we had > > reason to be concerned about *adversarial* inputs, I think that there > > is a good chance that our qsort() actually would be problematic to the > > point of driving us to prefer some generally slower alternative. > If one is concerned about adversarial inputs, then as mentioned earlier, two main steps need to be taken. 1. Don't permit potential adversaries to define the comparison function used by qsort(). 2. Perform random selection of the pivot to prevent precomputed lists of data that invoke quadratic behavior in qsort. And before the argument that a random pivot will be less efficient than the median of medians that's currently being used, you need to look at what is currently being used. 1. If a "small" partition is being sorted, the element at n/2 is selected as the pivot with n being the size of the partition. 2. If a "medium" partition is being sorted, then the pivot selected is the median of the elements at 0, n/2, and n. 3. And finally, if a "large" partition is being sorted, potential pivots are selected from 0, n/8, n/4, 3n/8, n/2, 5n/8, 3n/4, 7n/8, and n. Of those 9 candidates, the median of medians is selected as the pivot. There is nothing in the world that would prevent the following. 1. Short partition - select a pivot at random. 2. 
Medium partition - select the median of 3 randomly selected candidates. 3. Large partition - select the median of medians of 9 randomly selected candidates. Using the above random pivot selection methods, the performance of a hardened qsort should be close to that of an unhardened qsort. The only real difference is the cost of generating random numbers to select the candidates for the median tests. However, if a malicious compare function is permitted to be defined, it would still be vulnerable to that attack, but it would be pretty much immune to a precomputed data attack. Or perhaps it just might be simpler to detect an attack and fall back to a sort algorithm that doesn't have quadratic behavior. What I mean by that is the qsort in "Engineering a Sort Function" has a big O of approximately 1.1 n ln n comparisons. Suppose that, upon starting, qsort computes an upper bound on the number of comparisons it's willing to perform, say 3 n ln n or so (1.386 n log n is the estimate for a randomly selected pivot). Then if the upper bound is reached, the qsort function backs out leaving the array in a partially sorted order, and then calls a slower sort that doesn't exhibit quadratic behavior to actually finish the sort. The result of doing that would be most, if not all, sorts being handled by qsort in a timely fashion. But if bad luck or an adversary strikes, the qsort bails out after things look bad and passes the task to a different sort. I'll be the first to admit that the hybrid approach would be slower in those bad cases than it would be if the alternate sort was selected at the beginning, but it would be faster than letting qsort continue with quadratic behavior. -- There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies. — C.A.R. Hoare
Re: [HACKERS] replication commands and log_statements
At 2014-08-07 23:22:43 +0900, masao.fu...@gmail.com wrote: > > That is, we log replication commands only when log_statement is set to > all. Neither new parameter is introduced nor log_statement is > redefined as a list. That sounds good to me. -- Abhijit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Incremental Backup
Il 07/08/14 17:25, Bruce Momjian ha scritto: > On Thu, Aug 7, 2014 at 08:35:53PM +0900, Michael Paquier wrote: >> On Thu, Aug 7, 2014 at 8:11 PM, Fujii Masao wrote: >>> There are some data which don't have LSN, for example, postgresql.conf. >>> When such data has been modified since last backup, they also need to >>> be included in incremental backup? Probably yes. >> Definitely yes. That's as well the case of paths like pg_clog, >> pg_subtrans and pg_twophase. > > I assumed these would be unconditionally backed up during an incremental > backup because they're relatively small and you don't want to make a mistake. > You could decide to always copy files which don't have an LSN, but you don't know what the user could put inside PGDATA. I would avoid any assumptions about files which are not owned by Postgres. With the current full backup procedure they are backed up, so I think that having them backed up with an rsync-like algorithm is what a user would expect for an incremental backup. Regards, Marco -- Marco Nenciarini - 2ndQuadrant Italy PostgreSQL Training, Services and Support marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it
Re: [HACKERS] Proposal: Incremental Backup
Il 07/08/14 17:29, Bruce Momjian ha scritto: > I am a little worried that many users will not realize this until they > try it and are disappointed, e.g. "Why is PG writing to my static data > so often?" --- then we get beaten up about our hint bits and freezing > behavior. :-( > > I am just trying to set realistic expectations. > Our experience is that for big databases (size over about 50GB) the file-level approach is often enough to halve the size of the backup. Users who run Postgres as a data warehouse will surely benefit from it, so we could present it as a DWH-oriented feature. Regards, Marco -- Marco Nenciarini - 2ndQuadrant Italy PostgreSQL Training, Services and Support marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it
Re: [HACKERS] Reporting the commit LSN at commit time
On Wed, Aug 6, 2014 at 9:15 PM, Craig Ringer wrote: > To support transparent client-side failover in BDR, it's necessary to > know what the LSN of a node was at the time a transaction committed and > keep track of that in the client/proxy. I doubt whether it makes sense to do this without a broader understanding of how the client-side failover mechanism would work. If we're going to add something like this, it should include libpq support for actually doing something useful with it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minmax indexes
+1 for BRIN ! On Thu, Aug 7, 2014 at 6:16 PM, Simon Riggs wrote: > On 7 August 2014 14:53, Robert Haas wrote: >> On Wed, Aug 6, 2014 at 4:06 PM, Nicolas Barbier >> wrote: >>> 2014-08-06 Claudio Freire : >>> So, I like blockfilter a lot. I change my vote to blockfilter ;) >>> >>> +1 for blockfilter, because it stresses the fact that the "physical" >>> arrangement of rows in blocks matters for this index. >> >> I don't like that quite as well as summary, but I'd prefer either to >> the current naming. > > Yes, "summary index" isn't good. I'm not sure where the block or the > filter part comes in though, so -1 to "block filter", not least > because it doesn't have a good abbreviation (bfin??). > > A better description would be "block range index" since we are > indexing a range of blocks (not just one block). Perhaps a better one > would be simply "range index", which we could abbreviate to RIN or > BRIN. > > -- > Simon Riggs http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training & Services > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Incremental Backup
On Thu, Aug 7, 2014 at 11:03:40AM +0100, Simon Riggs wrote: > Well, there is a huge difference between file-level and block-level backup. > > Designing, writing and verifying block-level backup to the point that > it is acceptable is a huge effort. (Plus, I don't think accumulating > block numbers as they are written will be "low overhead". Perhaps > there was a misunderstanding there and what is being suggested is to > accumulate file names that change as they are written, since we > already do that in the checkpointer process, which would be an option > between 2 and 3 on the above list). > > What is being proposed here is file-level incremental backup that > works in a general way for various backup management tools. It's the > 80/20 first step on the road. We get most of the benefit, it can be > delivered in this release as robust, verifiable code. Plus, that is > all we have budget for, a fairly critical consideration. > > Big features need to be designed incrementally across multiple > releases, delivering incremental benefit (or at least that is what I > have learned). Yes, working block-level backup would be wonderful, but > if we hold out for that as the first step then we'll get nothing > anytime soon. That is fine. I just wanted to point out that as features are added, file-level incremental backups might not be useful. In fact, I think there are a lot of users for which file-level incremental backups will never be useful, i.e. you have to have a lot of frozen/static data for file-level incremental backups to be useful. I am a little worried that many users will not realize this until they try it and are disappointed, e.g. "Why is PG writing to my static data so often?" --- then we get beaten up about our hint bits and freezing behavior. :-( I am just trying to set realistic expectations. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. 
+ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Incremental Backup
On Thu, Aug 7, 2014 at 08:35:53PM +0900, Michael Paquier wrote: > On Thu, Aug 7, 2014 at 8:11 PM, Fujii Masao wrote: > > There are some data which don't have LSN, for example, postgresql.conf. > > When such data has been modified since last backup, they also need to > > be included in incremental backup? Probably yes. > Definitely yes. That's as well the case of paths like pg_clog, > pg_subtrans and pg_twophase. I assumed these would be unconditionally backed up during an incremental backup because they're relatively small and you don't want to make a mistake. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Enhancing pgbench parameter checking
Hello Tatsuo-san, Thanks for the review. I have registered it in the August commitfest. https://commitfest.postgresql.org/action/patch_view?id=1532 I'm not sure of the variable name "is_non_init_parameter_set". I would suggest "benchmarking_option_set"? Ok, I will replace the variable name as you suggested. Also, to be consistent, maybe it should check that no initialization-specific options are set when benchmarking. Good suggestion. Here is the v2 patch. I applied it without problem and tested it. * It seems that -c is ignored, the atoi() line has been removed. * Option -q is initialization specific, but not detected as such like the others, although there is a specific detection later. I think that it would be better to use the systematic approach, and to remove the specific check. * I would name the second boolean "initialization_option_set", as it is described that way in the documentation. * I would suggest the following error messages: "some options cannot be used in initialization (-i) mode\n" and "some options cannot be used in benchmarking mode\n". Although these messages are rough, I think that they are enough and avoid running something unexpected, which is your purpose. Find attached a patch which adds these changes to your current version. 
--
Fabien.

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 67d7960..2f7d80e 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -2521,7 +2521,7 @@ main(int argc, char **argv)
 	bool		scale_given = false;
 	bool		benchmarking_option_set = false;
-	bool		initializing_option_set = false;
+	bool		initialization_option_set = false;
 	CState	   *state;			/* status of clients */
 	TState	   *threads;		/* array of thread */
@@ -2610,6 +2610,7 @@
 				break;
 			case 'c':
 				benchmarking_option_set = true;
+				nclients = atoi(optarg);
 				if (nclients <= 0 || nclients > MAXCLIENTS)
 				{
 					fprintf(stderr, "invalid number of clients: %d\n", nclients);
@@ -2695,6 +2696,7 @@
 				use_log = true;
 				break;
 			case 'q':
+				initialization_option_set = true;
 				use_quiet = true;
 				break;
 			case 'f':
@@ -2722,7 +2724,7 @@
 				}
 				break;
 			case 'F':
-				initializing_option_set = true;
+				initialization_option_set = true;
 				fillfactor = atoi(optarg);
 				if ((fillfactor < 10) || (fillfactor > 100))
 				{
@@ -2776,14 +2778,14 @@
 			case 0:
 				/* This covers long options which take no argument. */
 				if (foreign_keys || unlogged_tables)
-					initializing_option_set = true;
+					initialization_option_set = true;
 				break;
 			case 2:				/* tablespace */
-				initializing_option_set = true;
+				initialization_option_set = true;
 				tablespace = pg_strdup(optarg);
 				break;
 			case 3:				/* index-tablespace */
-				initializing_option_set = true;
+				initialization_option_set = true;
 				index_tablespace = pg_strdup(optarg);
 				break;
 			case 4:
@@ -2835,7 +2837,7 @@
 	{
 		if (benchmarking_option_set)
 		{
-			fprintf(stderr, "some parameters cannot be used in initializing mode\n");
+			fprintf(stderr, "some options cannot be used in initialization (-i) mode\n");
 			exit(1);
 		}
@@ -2844,9 +2846,9 @@
 	}
 	else
 	{
-		if (initializing_option_set)
+		if (initialization_option_set)
 		{
-			fprintf(stderr, "some parameters cannot be used in benchmarking mode\n");
+			fprintf(stderr, "some options cannot be used in benchmarking mode\n");
 			exit(1);
 		}
 	}
@@ -2868,13 +2870,6 @@
 		exit(1);
 	}
-	/* -q may be used only with -i */
-	if (use_quiet && !is_init_mode)
-	{
-		fprintf(stderr, "quiet-logging is allowed only in initialization mode (-i)\n");
-		exit(1);
-	}
-
 	/* --sampling-rate may must not be used with --aggregate-interval */
 	if (sample_rate > 0.0 && agg_interval > 0)
 	{

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] A worst case for qsort
On Tue, Aug 5, 2014 at 8:15 PM, Peter Geoghegan wrote: > "The adversarial method works for almost any polymorphic program > recognizable as quicksort. The subject quicksort may copy values at > will, or work with lists rather than arrays. It may even pick the > pivot at random. The quicksort will be vulnerable provided only that > it satisfies some mild assumptions that are met by every > implementation I have seen". > > IMHO, while worst case performance is a very useful tool for analyzing > algorithms (particularly their worst case time complexity), a worst > case should be put in its practical context. For example, if we had > reason to be concerned about *adversarial* inputs, I think that there > is a good chance that our qsort() actually would be problematic to the > point of driving us to prefer some generally slower alternative. I completely agree with this, and I think everyone else does, too. Where we don't all necessarily agree is which worst cases are realistic and which worst cases are simply pathological cases with which we need not be concerned in practice. For example, when I proposed the patch to use MVCC catalog snapshots, Andres invented a test case which I thought was far more brutal than anything likely to be encountered in the real world. But Andres didn't agree; he thought the test case was realistic. So, I worked on the patch some more and came up with a solution that performs acceptably even if those brutal conditions are encountered in the world. As a result, the patch as committed is better than the one originally submitted. I could certainly have spent more time debating whether that effort was worthwhile, and maybe I would have won the argument, but it was a better use of that time to instead try to improve the patch. So here. You may not agree that the mitigation strategies for which others are asking are worthwhile, but you can't expect everyone else to agree with your assessment of which cases are likely to occur in practice. 
The case of a cohort of strings to be sorted which share a long fixed prefix and have different stuff at the end does not seem particularly pathological to me. It doesn't, in other words, require an adversary: some real-world data sets will look like that. I will forbear repeating examples I've given before, but I've seen that kind of thing more than once in real data sets that people (well, me, anyway) actually wanted to put into a PostgreSQL database. So I'm personally of the opinion that the time you've put into trying to protect against those cases is worthwhile. I understand that you may disagree with that, and that's fine: we're not all going to agree on everything. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minmax indexes
On Thu, Aug 7, 2014 at 10:16 AM, Simon Riggs wrote: > On 7 August 2014 14:53, Robert Haas wrote: >> On Wed, Aug 6, 2014 at 4:06 PM, Nicolas Barbier >> wrote: >>> 2014-08-06 Claudio Freire : >>> So, I like blockfilter a lot. I change my vote to blockfilter ;) >>> >>> +1 for blockfilter, because it stresses the fact that the "physical" >>> arrangement of rows in blocks matters for this index. >> >> I don't like that quite as well as summary, but I'd prefer either to >> the current naming. > > Yes, "summary index" isn't good. I'm not sure where the block or the > filter part comes in though, so -1 to "block filter", not least > because it doesn't have a good abbreviation (bfin??). > > A better description would be "block range index" since we are > indexing a range of blocks (not just one block). Perhaps a better one > would be simply "range index", which we could abbreviate to RIN or > BRIN. range index might get confused with range types; block range index seems better. I like summary, but I'm fine with block range index or block filter index, too. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minmax indexes
Simon Riggs wrote: > On 7 August 2014 14:53, Robert Haas wrote: > > On Wed, Aug 6, 2014 at 4:06 PM, Nicolas Barbier > > wrote: > >> 2014-08-06 Claudio Freire : > >> > >>> So, I like blockfilter a lot. I change my vote to blockfilter ;) > >> > >> +1 for blockfilter, because it stresses the fact that the "physical" > >> arrangement of rows in blocks matters for this index. > > > > I don't like that quite as well as summary, but I'd prefer either to > > the current naming. > > Yes, "summary index" isn't good. I'm not sure where the block or the > filter part comes in though, so -1 to "block filter", not least > because it doesn't have a good abbreviation (bfin??). I was thinking just "blockfilter" (I did show a sample command). Claudio explained the name downthread; personally, of all the options suggested thus far, it's the one I like the most (including minmax). At this point, the naming issue is what is keeping me from committing this patch, so the quicker we can solve it, the merrier. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostrgeSQL vs oracle doing 1 million sqrts am I doing it wrong?
On Thu, Aug 7, 2014 at 5:12 AM, Craig Ringer wrote: > New Intel hardware supports IEEE 754:2008 decimal floating point in > hardware, and I'm quite interested in implementing DECFLOAT(n) for > PostgreSQL to take advantage of that. +1 merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] replication commands and log_statements
On Wed, Jul 2, 2014 at 4:26 AM, Abhijit Menon-Sen wrote: > Hi. > > Do we have any consensus about what to do with these two patches? > > 1. Introduce a "log_replication_command" setting. > 2. Change log_statement to be a list of tokens. > > If I understand correctly, there weren't any strong objections to the > former, but the situation is less clear when it comes to the second. On second thought, I'd like to propose a third option. This is the same idea as Andres suggested upthread. That is, we log replication commands only when log_statement is set to all. No new parameter is introduced, and log_statement is not redefined as a list. Firstly, since log_statement=all means that all statements are logged, it's basically natural and intuitive to log even replication commands in that setting. Secondly, statements other than DDL and data-modifying statements, e.g., VACUUM, CHECKPOINT, BEGIN, etc., are logged only when log_statement=all, so controlling the logging of DDL and maintenance commands orthogonally is already impossible for now. Therefore, even if the logging of replication commands, which are neither DDL nor data-modifying statements, also cannot be controlled orthogonally, I don't think users will be very surprised. Of course, I have no objection to addressing the issue in the future, e.g., by enabling such orthogonal logging control. Thoughts? What about introducing that simple and not-so-confusing change first? Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] delta relations in AFTER triggers
Thanks for your review and comments, Amit! Amit Khandekar wrote: > On 21 June 2014 23:36, Kevin Grittner wrote: >> Kevin Grittner wrote: >> I didn't change the tuplestores to TID because it seemed to me that >> it would preclude using transition relations with FDW triggers, and >> it seemed bad not to support that. Does anyone see a way around >> that, or feel that it's OK to not support FDW triggers in this >> regard? > > I think it is ok to use tuplestores for now, but as mentioned by you > somewhere else in the thread, later on we should change this to using > tids as an optimization. Well, the optimization would probably be to use a tuplestore of tids referencing modified tuples in the base table, rather than a tuplestore of the data itself. But I think we're in agreement. >> Does this look good otherwise, as far as it goes? > > I didn't yet extensively go through the patch, but before that, just a > few quick comments: > > I see that the tuplestores for transition tables are stored per query > depth. If the > DML involves a table that has multiple child tables, it seems as though > a single tuplestore would be shared among all these tables. That means > if we define > such triggers using the transition table clause for all the child tables, then > the trigger function for a child table will see tuples from other child tables > as well. Is that true? I don't think so. I will make a note of the concern to confirm by testing. > If it is, it does not make sense. I agree. > I tried to google some SQLs that use REFERENCING clause with triggers. > It looks like in some database systems, even the WHEN clause of CREATE TRIGGER > can refer to a transition table, just like how it refers to NEW and > OLD row variables. > > For e.g. 
: > CREATE TRIGGER notify_dept > AFTER UPDATE ON weather > REFERENCING NEW_TABLE AS N_TABLE > NEW AS N_ROW > FOR EACH ROW > WHEN ((SELECT AVG (temperature) FROM N_TABLE) > 10) > BEGIN > notify_department(N_ROW.temperature, N_ROW.city); > END > > Above, it is used to get an aggregate value of all the changed rows. I think > we do not currently support aggregate expressions in the where clause, but > with > transition tables, it makes more sense to support it later if not now. Interesting point; I had not thought about that. Will see if I can include support for that in the patch for the next CF; failing that; I will at least be careful to not paint myself into a corner where it is unduly hard to do later. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minmax indexes
On Thu, Aug 7, 2014 at 11:16 AM, Simon Riggs wrote: > On 7 August 2014 14:53, Robert Haas wrote: >> On Wed, Aug 6, 2014 at 4:06 PM, Nicolas Barbier >> wrote: >>> 2014-08-06 Claudio Freire : >>> So, I like blockfilter a lot. I change my vote to blockfilter ;) >>> >>> +1 for blockfilter, because it stresses the fact that the "physical" >>> arrangement of rows in blocks matters for this index. >> >> I don't like that quite as well as summary, but I'd prefer either to >> the current naming. > > Yes, "summary index" isn't good. I'm not sure where the block or the > filter part comes in though, so -1 to "block filter", not least > because it doesn't have a good abbreviation (bfin??). Block filter would refer to the index property that selects blocks, not tuples, and it does so through a "filter function" (for min-max, it's a range check, but for other opclasses it could be anything). -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minmax indexes
On 7 August 2014 14:53, Robert Haas wrote: > On Wed, Aug 6, 2014 at 4:06 PM, Nicolas Barbier > wrote: >> 2014-08-06 Claudio Freire : >> >>> So, I like blockfilter a lot. I change my vote to blockfilter ;) >> >> +1 for blockfilter, because it stresses the fact that the "physical" >> arrangement of rows in blocks matters for this index. > > I don't like that quite as well as summary, but I'd prefer either to > the current naming. Yes, "summary index" isn't good. I'm not sure where the block or the filter part comes in though, so -1 to "block filter", not least because it doesn't have a good abbreviation (bfin??). A better description would be "block range index" since we are indexing a range of blocks (not just one block). Perhaps a better one would be simply "range index", which we could abbreviate to RIN or BRIN. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Append to a GUC parameter ?
Fabrízio de Royes Mello wrote: > On Tue, Aug 5, 2014 at 10:55 PM, Tom Lane wrote: > > > Josh Berkus writes: > > > BTW, while there's unlikely to be a good reason to put search_path in > > > pg.conf with appends, there are a LOT of reasons to want to be able to > > > append to it during a session. > > > > [shrug...] You can do that today with current_setting()/set_config(). > > With a very simple statement you can do that: Of course, this doesn't solve the requirement that started this thread, which is about having "includeable" pg.conf fragments to enable extensions. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minmax indexes
On Wed, Aug 6, 2014 at 4:06 PM, Nicolas Barbier wrote: > 2014-08-06 Claudio Freire : > >> So, I like blockfilter a lot. I change my vote to blockfilter ;) > > +1 for blockfilter, because it stresses the fact that the "physical" > arrangement of rows in blocks matters for this index. I don't like that quite as well as summary, but I'd prefer either to the current naming. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fixed redundant i18n strings in json
On Wed, Aug 6, 2014 at 7:20 PM, Jeff Janes wrote: > On Wed, Aug 6, 2014 at 8:35 AM, Robert Haas wrote: >> >> On Tue, Aug 5, 2014 at 6:05 PM, Jeff Janes wrote: >> > I think you missed one of the regression tests, see attached >> >> Woops. Thanks, committed. > > Thanks. > > It needs to go into 9_4_STABLE as well. Sheesh, where is my brain? Done, thanks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning performance: cache stringToNode() of pg_constraint.ccbin
On Wed, Aug 6, 2014 at 9:35 PM, Bruce Momjian wrote: > On Sun, Jan 12, 2014 at 12:53:40PM -0500, Noah Misch wrote: >> On Sat, Jan 11, 2014 at 02:10:01PM -0500, Bruce Momjian wrote: >> > On Mon, Jun 3, 2013 at 03:07:27PM -0400, Noah Misch wrote: >> > > A colleague, Korry Douglas, observed a table partitioning scenario where >> > > deserializing pg_constraint.ccbin is a hot spot. The following test >> > > case, a >> > > simplification of a typical partitioning setup, spends 28% of its time in >> > > stringToNode() and callees thereof: >> > >> > Noah, what is the status on this? >> >> Further study revealed a defect in the patch's memory management, and I have >> not gotten around to correcting that. > > I talked to Noah and he can't continue on this item. Can someone else > work on it? Well, it would be helpful if he could describe the defect he found, so that the next person doesn't have to guess. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] HEAD crashes with assertion and LWLOCK_STATS enabled
On Fri, May 23, 2014 at 9:38 PM, Robert Haas wrote: > On Tue, May 20, 2014 at 4:02 AM, Yuto HAYAMIZU wrote: >> The failing assertion is for prohibiting memory allocation in a critical >> section, which is introduced by commit 4a170ee9 on 2014-04-04. This issue is still in open item list for 9.4. But the commit which had caused this issue was reverted by d27d493. So I removed this from the Open Item list. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wraparound limits
On 08/07/2014 01:34 PM, Teodor Sigaev wrote: Hi! I have a question about setting the transaction wraparound limits. Function SetTransactionIdLimit() in access/transam/varsup.c: 1) xidWrapLimit = oldest_datfrozenxid + (MaxTransactionId >> 1); if (xidWrapLimit < FirstNormalTransactionId) xidWrapLimit += FirstNormalTransactionId; Isn't it a problem if oldest_datfrozenxid > MaxTransactionId/2? Don't think so. What problem do you see? 2) xidStopLimit = xidWrapLimit - 100; if (xidStopLimit < FirstNormalTransactionId) xidStopLimit -= FirstNormalTransactionId; xidWarnLimit = xidStopLimit - 1000; if (xidWarnLimit < FirstNormalTransactionId) xidWarnLimit -= FirstNormalTransactionId; Why does it use '-' instead of '+' if variable < FirstNormalTransactionId? In this case it is easy to get xidStopLimit > xidWrapLimit or xidWarnLimit > xidStopLimit... Remember that the limits are compared with XIDs using the wraparound-aware functions TransactionIdPrecedes and TransactionIdFollows, not the regular < and > operators. The "<" checks above only test whether the XID hit one of the special TransactionIds, and if so, increase/decrease it to get back into the normal range. - Heikki
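Heikki's point is easier to see with the comparison rule itself. A minimal sketch of wraparound-aware XID comparison, modeled on TransactionIdPrecedes() in access/transam/transam.c (the real function also special-cases the permanent XIDs below FirstNormalTransactionId, which this sketch omits):

```c
#include <assert.h>
#include <stdint.h>
#include <stdbool.h>

typedef uint32_t TransactionId;

/*
 * Wraparound-aware comparison: id1 precedes id2 iff the signed 32-bit
 * difference is negative.  The XID space is treated as a circle, so each
 * XID has about two billion predecessors and two billion successors;
 * "xidStopLimit > xidWrapLimit" as plain unsigned integers is therefore
 * not a contradiction.
 */
static bool
xid_precedes(TransactionId id1, TransactionId id2)
{
	int32_t		diff = (int32_t) (id1 - id2);

	return diff < 0;
}
```

This is why the apparent inversion Teodor noticed is harmless: on the circle, "greater than" in the plain unsigned sense carries no meaning.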
Re: [HACKERS] pg_shmem_allocations view
On Thu, May 8, 2014 at 10:28 PM, Andres Freund wrote: > Well, we have to live with it for now :) I just had a look at the first patch and have some comments: 1) Instead of using an assertion here, wouldn't it be better to error out if name is NULL, and truncate the name if it is longer than SHMEM_INDEX_KEYSIZE - 1 (including '\0')? scanstr in scansup.c? Assert(IsUnderPostmaster); + Assert(name != NULL && strlen(name) > 0 && + strlen(name) < SHMEM_INDEX_KEYSIZE - 1); 2) The addition of a field to track the size of a dsm should be explicitly mentioned; it is useful for the 2nd patch. 3) The refactoring done in dsm_create to find an unused slot should be a separate patch, for clarity. 4) Using '\0' here would be more appropriate: + item->name[SHMEM_INDEX_KEYSIZE - 1] = 0; Regards, -- Michael
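Points 1 and 4 amount to replacing the Assert with a truncating, always-terminated copy. An illustrative sketch under invented names (the SHMEM_INDEX_KEYSIZE value here is arbitrary; the real constant lives in the shmem code):

```c
#include <assert.h>
#include <string.h>

#define SHMEM_INDEX_KEYSIZE 48	/* illustrative value only */

static char g_key[SHMEM_INDEX_KEYSIZE];

/*
 * Copy a caller-supplied name into a fixed-size hash key, truncating
 * over-long input instead of asserting on it, and always writing the
 * terminating '\0'.  Returns the number of bytes stored, excluding the
 * terminator.
 */
static size_t
copy_shmem_key(char *dst, const char *src)
{
	size_t		n = strnlen(src, SHMEM_INDEX_KEYSIZE - 1);

	memcpy(dst, src, n);
	dst[n] = '\0';
	return n;
}
```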
Re: [HACKERS] delta relations in AFTER triggers
On 21 June 2014 23:36, Kevin Grittner wrote: > Kevin Grittner wrote: > I didn't change the tuplestores to TID because it seemed to me that > it would preclude using transition relations with FDW triggers, and > it seemed bad not to support that. Does anyone see a way around > that, or feel that it's OK to not support FDW triggers in this > regard? I think it is ok to use tuplestores for now, but as mentioned by you somewhere else in the thread, later on we should change this to using tids as an optimization. > > Does this look good otherwise, as far as it goes? I didn't yet extensively go through the patch, but before that, just a few quick comments: I see that the tupelstores for transition tables are stored per query depth. If the DML involves a table that has multiple child tables, it seems as though a single tuplestore would be shared among all these tables. That means if we define such triggers using transition table clause for all the child tables, then the trigger function for a child table will see tuples from other child tables as well. Is that true ? If it is, it does not make sense. For fdw tuplestore, this issue does not arise because the DML won't involve multiple target tables I suppose. --- I tried to google some SQLs that use REFERENCING clause with triggers. It looks like in some database systems, even the WHEN clause of CREATE TRIGGER can refer to a transition table, just like how it refers to NEW and OLD row variables. For e.g. : CREATE TRIGGER notify_dept AFTER UPDATE ON weather REFERENCING NEW_TABLE AS N_TABLE NEW AS N_ROW FOR EACH ROW WHEN ((SELECT AVG (temperature) FROM N_TABLE) > 10) BEGIN notify_department(N_ROW.temperature, N_ROW.city); END Above, it is used to get an aggregate value of all the changed rows. I think we do not currently support aggregate expressions in the where clause, but with transition tables, it makes more sense to support it later if not now. 
Re: [HACKERS] Proposal: Incremental Backup
On Thu, Aug 7, 2014 at 8:11 PM, Fujii Masao wrote: > There are some data which don't have LSN, for example, postgresql.conf. > When such data has been modified since last backup, they also need to > be included in incremental backup? Probably yes. Definitely yes. That's also the case for paths like pg_clog, pg_subtrans and pg_twophase. -- Michael
Re: [HACKERS] Proposal: Incremental Backup
On Thu, Aug 7, 2014 at 12:20 AM, Bruce Momjian wrote: > On Wed, Aug 6, 2014 at 06:48:55AM +0100, Simon Riggs wrote: >> On 6 August 2014 03:16, Bruce Momjian wrote: >> > On Wed, Aug 6, 2014 at 09:17:35AM +0900, Michael Paquier wrote: >> >> On Wed, Aug 6, 2014 at 9:04 AM, Simon Riggs wrote: >> >> > >> >> > On 5 August 2014 22:38, Claudio Freire wrote: >> >> > Thinking some more, there seems like this whole store-multiple-LSNs >> >> > thing is too much. We can still do block-level incrementals just by >> >> > using a single LSN as the reference point. We'd still need a complex >> >> > file format and a complex file reconstruction program, so I think that >> >> > is still "next release". We can call that INCREMENTAL BLOCK LEVEL. >> >> >> >> Yes, that's the approach taken by pg_rman for its block-level >> >> incremental backup. Btw, I don't think that the CPU cost to scan all >> >> the relation files added to the one to rebuild the backups is worth >> >> doing it on large instances. File-level backup would cover most of the >> > >> > Well, if you scan the WAL files from the previous backup, that will tell >> > you what pages that need incremental backup. >> >> That would require you to store that WAL, which is something we hope >> to avoid. Plus if you did store it, you'd need to retrieve it from >> long term storage, which is what we hope to avoid. > > Well, for file-level backups we have: > > 1) use file modtime (possibly inaccurate) > 2) use file modtime and checksums (heavy read load) > > For block-level backups we have: > > 3) accumulate block numbers as WAL is written > 4) read previous WAL at incremental backup time > 5) read data page LSNs (high read load) > > The question is which of these do we want to implement? There are some data which don't have LSN, for example, postgresql.conf. When such data has been modified since last backup, they also need to be included in incremental backup? Probably yes. 
So implementing only block-level backup does not seem to be a complete solution. It needs file-level backup as an infrastructure for such data. This makes me think that it's more reasonable to implement file-level backup first. Regards, -- Fujii Masao
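The division of labor described in the last two mails can be stated in a few lines: blocks that carry a page LSN can be filtered against a reference point, while everything else must be copied whole. An illustrative sketch with invented names (the real page header, PageHeaderData, holds far more than one field):

```c
#include <assert.h>
#include <stdint.h>
#include <stdbool.h>

typedef uint64_t XLogRecPtr;

/* stand-in for the page header */
typedef struct
{
	XLogRecPtr	pd_lsn;		/* LSN of the last WAL record touching the page */
} PageSketch;

/* a block belongs in the incremental backup iff it changed after the
 * reference point, e.g. the start LSN of the previous backup */
static bool
block_needs_backup(const PageSketch *page, XLogRecPtr ref_lsn)
{
	return page->pd_lsn > ref_lsn;
}

/* files without page LSNs (postgresql.conf, pg_clog, pg_twophase, ...)
 * cannot be filtered this way and must always be copied whole */
static bool
file_needs_full_copy(bool has_page_lsns)
{
	return !has_page_lsns;
}
```

The second function is the whole of Fujii's argument: block-level filtering needs a file-level fallback anyway, so the file-level machinery has to exist first.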
Re: [HACKERS] posix_fadvise() and pg_receivexlog
On Thu, Aug 7, 2014 at 5:02 PM, Heikki Linnakangas wrote: > On 08/07/2014 10:10 AM, Mitsumasa KONDO wrote: >> >> 2014-08-07 13:47 GMT+09:00 Fujii Masao : >> >>> On Thu, Aug 7, 2014 at 3:59 AM, Heikki Linnakangas >>> wrote: On 08/06/2014 08:39 PM, Fujii Masao wrote: > > The WAL files that pg_receivexlog writes will not be re-read soon > basically, > so we can advise the OS to release any cached pages when WAL file is > closed. I feel inclined to change pg_receivexlog that way. Thought? -1. The OS should be smart enough to not thrash the cache by files that >>> >>> are written sequentially and never read. >>> >>> >> OS's buffer strategy is optimized for general situation. Do you forget OS >> hackers discussion last a half of year? >> >>> Yep, the OS should be so smart, but I'm not sure if it actually is. Maybe >>> not, >>> so I was thinking that posix_fadvise is called when the server closes WAL >>> file. >> >> >> That's right. > > > Well, I'd like to hear someone from the field complaining that > pg_receivexlog is thrashing the cache and thus reducing the performance of > some other process. Or a least a synthetic test case that demonstrates that > happening. Yeah, I will test that by seeing the performance of PostgreSQL which is running in the same server as pg_receivexlog is running. We can just compare that performance with normal pg_receivexlog and that with the patched one (i.e., posix_fadvise is called). > > >> By the way, does pg_receivexlog process have fsync() in every WAL commit? > > > It fsync's each file after finishing to write it. Ie. each WAL file is > fsync'd once. > > >> If yes, I think that we need no or less fsync() option for the better >> performance. It is general in NOSQL storages. >> If no, we need fsync() option for more getting reliability and data >> integrarity. > > > Hmm. An fsync=off style option might make sense, although I doubt the one > fsync at end of file is causing a performance problem for anyone in > practice. 
Haven't heard any complaints, anyway. > > An option to fsync after every commit record might make sense if you use > pg_receivexlog with synchronous replication. Doing that would require > parsing the WAL, though, to see where the commit records are. But then > again, the fsync's wouldn't need to correspond to commit records. We could > fsync just before we go to sleep to wait for more WAL to be received. That's what Furuya-san proposed in last CommitFest. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
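In code terms the change Fujii is inclined to make is small. An illustrative sketch (the function name is invented, not the actual pg_receivexlog code):

```c
#define _XOPEN_SOURCE 600

#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Sketch of the proposed change: after the one fsync() that pg_receivexlog
 * already issues per finished WAL segment, hint that the file's cached
 * pages will not be re-read soon.  POSIX_FADV_DONTNEED is advisory only,
 * so correctness never depends on the kernel honoring it.
 */
static int
close_wal_segment(int fd)
{
	if (fsync(fd) != 0)
		return -1;
	/* len = 0 means "to the end of the file" */
	(void) posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
	return close(fd);
}
```

Note that this is precisely what Heikki's secondary-backup-site objection targets: the hint is wrong whenever something else reads the finished segment right away, which is why the thread leans toward making it a user-configurable option.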
Re: [HACKERS] Reporting the commit LSN at commit time
On Thu, Aug 7, 2014 at 4:15 AM, Craig Ringer wrote: > I'm thinking about adding a new message type in the protocol that gets > sent immediately before CommandComplete, containing the LSN of the > commit. Clients would need to enable the sending of this message with a > GUC that they set when they connect, so it doesn't confuse clients that > aren't expecting it or aware of it. > > Is this something you can see being useful for other non-BDR purposes? I have been thinking about something similar. For regular streaming replication you could keep track of the commit LSN on a per client basis and automatically redirect read queries to a standby if standby apply location is larger than the commit LSN of this particular client. This can be done mostly transparently for the application while not running into the issue that recent modifications disappear for a while after commit. Regards, Ants Aasma -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
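Ants's routing idea reduces to two comparisons per client. A minimal sketch with invented names (illustrative only; the commit LSN would come from Craig's proposed protocol message, the apply LSN from the standby):

```c
#include <assert.h>
#include <stdint.h>
#include <stdbool.h>

typedef uint64_t XLogRecPtr;

typedef struct
{
	XLogRecPtr	last_commit_lsn;	/* from the proposed protocol message */
} ClientSession;

static ClientSession demo_session;

/* remember the newest commit LSN this client has produced */
static void
record_commit(ClientSession *s, XLogRecPtr commit_lsn)
{
	if (commit_lsn > s->last_commit_lsn)
		s->last_commit_lsn = commit_lsn;
}

/* the standby is safe for this client's reads only once it has applied
 * everything up to the client's last commit */
static bool
standby_ok_for_read(const ClientSession *s, XLogRecPtr standby_apply_lsn)
{
	return standby_apply_lsn >= s->last_commit_lsn;
}
```

This gives read-your-writes semantics per client without making recent commits "disappear" on the standby for a while, which is the usual complaint about naive read routing.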
[HACKERS] [GSoC] kmedoids status report
Hi! Here is a report of what has been discussed yesterday on IRC. The kmedoids module now seems to work correctly on basic datasets. I've also implemented its variants with different seeding methods: random initial medoids, and initial medoids distributed among the points (similar to kmeans++ [0]). The next steps are: - Making better tests (1-2d) - Writing the documentation (1d) - Adapting my code to GP and HAWQ -- btw, are default parameters now available in GP and HAWQ? (1-2d) - Refactoring kmedoids and kmeans, as there is code duplication between those two. For this step, I don't know if I'll have time to create a clustering module, and make kmeans and kmedoids submodules of it. If yes, then it's perfect; otherwise, I'll just rename the common functions in kmeans, and have kmedoids call them from there. Hai also helped me setup (once more) the VM where GreenPlum and HAWQ are installed, so that I can test my code on these DBMS. As a reminder, I'm supposed to stop coding next Monday, and then the last week is dedicated to documentation, tests, refactoring and polishing. Regards, Maxence [0] https://en.wikipedia.org/wiki/K-means%2B%2B
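For readers unfamiliar with the seeding step mentioned above, here is a small illustrative sketch (not MADlib code, and simplified to 1-D points): k-means++ picks each new seed with probability proportional to its squared distance from the nearest already-chosen seed. To keep the sketch deterministic, it uses the greedy farthest-point variant of the same idea instead of sampling.

```c
#include <assert.h>
#include <stddef.h>

static double demo_pts[5] = {0.0, 0.1, 5.0, 5.1, 10.0};
static size_t demo_out[3];

/*
 * Farthest-point seeding for k-medoids: starting from a given first
 * medoid, repeatedly add the point whose squared distance to its nearest
 * chosen medoid is largest.  Proper k-means++ instead samples
 * proportionally to that squared distance.
 */
static void
seed_medoids(const double *pts, size_t n, size_t k, size_t first, size_t *out)
{
	out[0] = first;
	for (size_t chosen = 1; chosen < k; chosen++)
	{
		size_t	best = 0;
		double	best_d2 = -1.0;

		for (size_t i = 0; i < n; i++)
		{
			double	d2 = -1.0;	/* squared distance to nearest medoid */

			for (size_t m = 0; m < chosen; m++)
			{
				double	d = pts[i] - pts[out[m]];

				if (d2 < 0.0 || d * d < d2)
					d2 = d * d;
			}
			if (d2 > best_d2)
			{
				best_d2 = d2;
				best = i;
			}
		}
		out[chosen] = best;
	}
}
```

Seeding this way spreads the initial medoids across the clusters, which is what makes kmeans++-style initialization converge faster than purely random medoids.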
[HACKERS] Wraparound limits
Hi! I have a question about setting the transaction wraparound limits. Function SetTransactionIdLimit() in access/transam/varsup.c: 1) xidWrapLimit = oldest_datfrozenxid + (MaxTransactionId >> 1); if (xidWrapLimit < FirstNormalTransactionId) xidWrapLimit += FirstNormalTransactionId; Isn't it a problem if oldest_datfrozenxid > MaxTransactionId/2? 2) xidStopLimit = xidWrapLimit - 100; if (xidStopLimit < FirstNormalTransactionId) xidStopLimit -= FirstNormalTransactionId; xidWarnLimit = xidStopLimit - 1000; if (xidWarnLimit < FirstNormalTransactionId) xidWarnLimit -= FirstNormalTransactionId; Why does it use '-' instead of '+' if variable < FirstNormalTransactionId? In this case it is easy to get xidStopLimit > xidWrapLimit or xidWarnLimit > xidStopLimit... Thank you. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/
Re: [HACKERS] PostrgeSQL vs oracle doing 1 million sqrts am I doing it wrong?
On 08/05/2014 10:44 PM, Shaun Thomas wrote: > On 08/05/2014 12:56 AM, Mark Kirkwood wrote: > >> The moral of the story for this case is that mapping Oracle to Postgres >> datatypes can require some careful thought. Using 'native' types (like >> integer, float8 etc) will generally give vastly quicker performance. > > We've seen a lot of this ourselves. Oracle's NUMERIC is a native type, > whereas ours is emulated. I'm not sure what you mean by "native" vs "emulated" here. PostgreSQL's NUMERIC is binary-coded decimal with mathematical operations performed in software. According to the docs, my impression is that Oracle's NUMBER is stored more like a decfloat: http://docs.oracle.com/cd/B28359_01/server.111/b28318/datatype.htm#i22289 but my Oracle expertise is admittedly lacking. New Intel hardware supports IEEE 754:2008 decimal floating point in hardware, and I'm quite interested in implementing DECFLOAT(n) for PostgreSQL to take advantage of that. A DECFLOAT type would also be more compatible with things like the C# "Decimal" type than our current NUMERIC is. > At least you used INT though. I've seen too many Oracle shops using > NUMERIC in PostgreSQL because it's there, and suffering major > performance hits because of it. In retrospect it might be a bit of a loss that the numeric type format doesn't reserve a couple of bits for short-value flags, so we could store and work with native integers for common values. There's NumericShort and NumericLong, but no NumericNative or NumericInt32 or whatever. OTOH, by the time you handle alignment and padding requirements and the cost of deciding which numeric format the input is, it might not've been much faster. Presumably it was looked at during the introduction of NumericShort. 
-- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
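The "emulated" nature under discussion comes down to NUMERIC's representation: PostgreSQL stores base-10000 digit arrays (NBASE and NumericDigit in utils/adt/numeric.c) and loops over them in software, whereas an int or float8 operation is a single hardware instruction. A minimal illustrative sketch of such digit-array addition — not the actual numeric.c code, which also tracks weight, sign, and display scale:

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define NBASE 10000		/* PostgreSQL's NUMERIC digit base */

static int16_t demo_a[1] = {9999};
static int16_t demo_b[1] = {1};
static int16_t demo_r[4];

/*
 * Add two little-endian base-10000 digit arrays, returning the result
 * length.  Each step is an add, a compare, and a conditional subtract in
 * software, repeated per digit.  The maximum per-step sum is
 * 9999 + 9999 + 1, so the carry is always 0 or 1.
 */
static size_t
digit_add(const int16_t *a, size_t na, const int16_t *b, size_t nb, int16_t *r)
{
	size_t		n = (na > nb) ? na : nb;
	int			carry = 0;

	for (size_t i = 0; i < n; i++)
	{
		int			sum = carry + (i < na ? a[i] : 0) + (i < nb ? b[i] : 0);

		carry = (sum >= NBASE);
		r[i] = (int16_t) (sum - (carry ? NBASE : 0));
	}
	if (carry)
		r[n++] = 1;
	return n;
}
```

Multiplication and division are worse still, which is why switching Oracle-style NUMBER columns to int/bigint/float8 pays off so visibly.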
Re: [HACKERS] Proposal: Incremental Backup
On 6 August 2014 17:27, Bruce Momjian wrote: > On Wed, Aug 6, 2014 at 01:15:32PM -0300, Claudio Freire wrote: >> On Wed, Aug 6, 2014 at 12:20 PM, Bruce Momjian wrote: >> > >> > Well, for file-level backups we have: >> > >> > 1) use file modtime (possibly inaccurate) >> > 2) use file modtime and checksums (heavy read load) >> > >> > For block-level backups we have: >> > >> > 3) accumulate block numbers as WAL is written >> > 4) read previous WAL at incremental backup time >> > 5) read data page LSNs (high read load) >> > >> > The question is which of these do we want to implement? #1 is very easy >> > to implement, but incremental _file_ backups are larger than block-level >> > backups. If we have #5, would we ever want #2? If we have #3, would we >> > ever want #4 or #5? >> >> You may want to implement both #3 and #2. #3 would need a config >> switch to enable updating the bitmap. That would make it optional to >> incur the I/O cost of updating the bitmap. When the bitmap isn't >> there, the backup would use #2. Slow, but effective. If slowness is a >> problem for you, you enable the bitmap and do #3. >> >> Sounds reasonable IMO, and it means you can start by implementing #2. > > Well, Robert Haas had the idea of a separate process that accumulates > the changed WAL block numbers, making it low overhead. I question > whether we need #2 just to handle cases where they didn't enable #3 > accounting earlier. If that is the case, just do a full backup and > enable #3. Well, there is a huge difference between file-level and block-level backup. Designing, writing and verifying block-level backup to the point that it is acceptable is a huge effort. (Plus, I don't think accumulating block numbers as they are written will be "low overhead". 
Perhaps there was a misunderstanding there and what is being suggested is to accumulate file names that change as they are written, since we already do that in the checkpointer process, which would be an option between 2 and 3 on the above list). What is being proposed here is file-level incremental backup that works in a general way for various backup management tools. It's the 80/20 first step on the road. We get most of the benefit, it can be delivered in this release as robust, verifiable code. Plus, that is all we have budget for, a fairly critical consideration. Big features need to be designed incrementally across multiple releases, delivering incremental benefit (or at least that is what I have learned). Yes, working block-level backup would be wonderful, but if we hold out for that as the first step then we'll get nothing anytime soon. I would also point out that the more specific we make our backup solution the less likely it is to integrate with external backup providers. Oracle's RMAN requires specific support in external software. 10 years after Postgres PITR we still see many vendors showing "PostgreSQL Backup Supported" as meaning pg_dump only. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
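At its core, option 3 ("accumulate block numbers as WAL is written") is a changed-block bitmap: a WAL-watching process sets one bit per touched block, and the incremental backup copies only the set blocks. A minimal illustrative sketch with invented names and a fixed-size map (a real implementation would size this per relation fork and worry about crash safety):

```c
#include <assert.h>
#include <stdint.h>
#include <stdbool.h>
#include <string.h>

#define MAX_BLOCKS 4096		/* fixed size for the sketch only */

typedef struct
{
	uint8_t		bits[MAX_BLOCKS / 8];	/* one bit per relation block */
} ChangedBlockMap;

static ChangedBlockMap demo_map;

/* a WAL-watching process would call this for each block reference */
static void
cbm_mark(ChangedBlockMap *m, uint32_t blkno)
{
	m->bits[blkno / 8] |= (uint8_t) (1u << (blkno % 8));
}

/* the incremental backup copies only blocks whose bit is set ... */
static bool
cbm_is_changed(const ChangedBlockMap *m, uint32_t blkno)
{
	return (m->bits[blkno / 8] >> (blkno % 8)) & 1;
}

/* ... and resets the map once the backup has succeeded */
static void
cbm_clear(ChangedBlockMap *m)
{
	memset(m->bits, 0, sizeof m->bits);
}
```

The per-file variant Simon alludes to is the same structure with one bit per file instead of per block — far coarser, but essentially free given that the checkpointer already tracks which files change.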
Re: [HACKERS] Enhancing pgbench parameter checking
Fabien, > I have not tested, but the patch looks ok in principle. Thanks for the review. I have registered it in the August CommitFest. https://commitfest.postgresql.org/action/patch_view?id=1532 > I'm not sure of the variable name "is_non_init_parameter_set". I would > suggest "benchmarking_option_set"? Ok, I will replace the variable name as you suggested. > Also, to be consistent, maybe it should check that no initialization-specific > options are set when benchmarking. Good suggestion. Here is the v2 patch. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c index c0e5e24..67d7960 100644 --- a/contrib/pgbench/pgbench.c +++ b/contrib/pgbench/pgbench.c @@ -2520,6 +2520,9 @@ main(int argc, char **argv) char *filename = NULL; bool scale_given = false; + bool benchmarking_option_set = false; + bool initializing_option_set = false; + CState *state; /* status of clients */ TState *threads; /* array of thread */ @@ -2599,12 +2602,14 @@ main(int argc, char **argv) break; case 'S': ttype = 1; +benchmarking_option_set = true; break; case 'N': ttype = 2; +benchmarking_option_set = true; break; case 'c': -nclients = atoi(optarg); +benchmarking_option_set = true; if (nclients <= 0 || nclients > MAXCLIENTS) { fprintf(stderr, "invalid number of clients: %d\n", nclients); @@ -2629,6 +2634,7 @@ main(int argc, char **argv) #endif /* HAVE_GETRLIMIT */ break; case 'j': /* jobs */ +benchmarking_option_set = true; nthreads = atoi(optarg); if (nthreads <= 0) { @@ -2637,9 +2643,11 @@ main(int argc, char **argv) } break; case 'C': +benchmarking_option_set = true; is_connect = true; break; case 'r': +benchmarking_option_set = true; is_latencies = true; break; case 's': @@ -2652,6 +2660,7 @@ main(int argc, char **argv) } break; case 't': +benchmarking_option_set = true; if (duration > 0) { fprintf(stderr, "specify either a number of transactions (-t) or 
a duration (-T), not both.\n"); @@ -2665,6 +2674,7 @@ main(int argc, char **argv) } break; case 'T': +benchmarking_option_set = true; if (nxacts > 0) { fprintf(stderr, "specify either a number of transactions (-t) or a duration (-T), not both.\n"); @@ -2681,12 +2691,14 @@ main(int argc, char **argv) login = pg_strdup(optarg); break; case 'l': +benchmarking_option_set = true; use_log = true; break; case 'q': use_quiet = true; break; case 'f': +benchmarking_option_set = true; ttype = 3; filename = pg_strdup(optarg); if (process_file(filename) == false || *sql_files[num_files - 1] == NULL) @@ -2696,6 +2708,8 @@ main(int argc, char **argv) { char *p; + benchmarking_option_set = true; + if ((p = strchr(optarg, '=')) == NULL || p == optarg || *(p + 1) == '\0') { fprintf(stderr, "invalid variable definition: %s\n", optarg); @@ -2708,6 +2722,7 @@ main(int argc, char **argv) } break; case 'F': +initializing_option_set = true; fillfactor = atoi(optarg); if ((fillfactor < 10) || (fillfactor > 100)) { @@ -2716,6 +2731,7 @@ main(int argc, char **argv) } break; case 'M': +benchmarking_option_set = true; if (num_files > 0) { fprintf(stderr, "query mode (-M) should be specifiled before transaction scripts (-f)\n"); @@ -2731,6 +2747,7 @@ main(int argc, char **argv) } break; case 'P': +benchmarking_option_set = true; progress = atoi(optarg); if (progress <= 0) { @@ -2745,6 +2762,8 @@ main(int argc, char **argv) /* get a double from the beginning of option value */ double throttle_value = atof(optarg); + benchmarking_option_set = true; + if (throttle_value <= 0.0) { fprintf(stderr, "invalid rate limit: %s\n", optarg); @@ -2756,14 +2775,19 @@ main(int argc, char **argv) break; case 0: /* This covers long options which take no argument. 
*/ +if (foreign_keys || unlogged_tables) + initializing_option_set = true; break; case 2:/* tablespace */ +initializing_option_set = true; tablespace = pg_strdup(optarg); break; case 3:/* index-tablespace */ +initializing_option_set = true; index_tablespace = pg_strdup(optarg); break; case 4: +benchmarking_option_set = true; sample_rate = atof(optarg); if (sample_rate <= 0.0 || sample_rate > 1.0) { @@ -2776,6 +2800,7 @@ main(int argc, char **argv) fprintf(stderr, "--aggregate-interval is not currently supported on Windows"); exit(1); #else +benchmarking_option_set = true; agg_interval = atoi(optarg); if (agg_interval <= 0)
Re: [HACKERS] pg_receivexlog add synchronous mode
> Okay, applied the patch. > > I heavily modified your patch based on the master branch to which the refactoring > patch has been applied. Attached is that modified version. Could you > review that? Thank you for the patch. I reviewed it and found no problems. The patch changes the behavior after ProcessXLogDataMsg() returns true: instead of jumping back to the top of the while(1) loop, execution now continues in the while(r != 0) loop. Since still_sending is false at that point, the skip processing is performed, but the result of the operation does not change. Regards, -- Furuya Osamu
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Wed, Aug 6, 2014 at 10:36 PM, Peter Geoghegan wrote: > This *almost* applies to patched Postgres if you pick a benchmark that > is very sympathetic to my patch. To my surprise, work_mem = '10MB' > (which results in an external tape sort) is sometimes snapping at the > heels of a work_mem = '5GB' setting (which results in an in-memory > quicksort). Note that this was with a default temp_tablespaces setting that wrote temp files on my home partition/SSD. With a /dev/shm/ temp tablespace, tape sort edges ahead, and gains a couple of hundred milliseconds on quicksort for this test case. It's actually faster. -- Peter Geoghegan
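The optimization named in the subject line rests on one C library guarantee: strcmp() on strxfrm() output orders exactly like strcoll() on the original strings. Paying the transform cost once per string turns every later comparison in the sort into a cheap byte comparison. A minimal sketch of the idea (illustrative only; the patch itself works with abbreviated keys inside the tuplesort code, not whole transformed strings):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build a sort key for s in the current locale: strcmp() on two such
 * keys gives the same ordering as strcoll() on the originals.
 */
static char *
make_sort_key(const char *s)
{
	size_t		n = strxfrm(NULL, s, 0) + 1;	/* required buffer size */
	char	   *key = malloc(n);

	if (key != NULL)
		strxfrm(key, s, n);
	return key;
}
```

For n strings the transform runs n times, while a comparison sort performs O(n log n) comparisons, which is where the win comes from.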
Re: [HACKERS] posix_fadvise() and pg_receivexlog
On 08/07/2014 10:10 AM, Mitsumasa KONDO wrote: 2014-08-07 13:47 GMT+09:00 Fujii Masao : On Thu, Aug 7, 2014 at 3:59 AM, Heikki Linnakangas wrote: On 08/06/2014 08:39 PM, Fujii Masao wrote: The WAL files that pg_receivexlog writes will not be re-read soon basically, so we can advise the OS to release any cached pages when WAL file is closed. I feel inclined to change pg_receivexlog that way. Thought? -1. The OS should be smart enough to not thrash the cache by files that are written sequentially and never read. OS's buffer strategy is optimized for general situation. Do you forget OS hackers discussion last a half of year? Yep, the OS should be so smart, but I'm not sure if it actually is. Maybe not, so I was thinking that posix_fadvise is called when the server closes WAL file. That's right. Well, I'd like to hear someone from the field complaining that pg_receivexlog is thrashing the cache and thus reducing the performance of some other process. Or a least a synthetic test case that demonstrates that happening. By the way, does pg_receivexlog process have fsync() in every WAL commit? It fsync's each file after finishing to write it. Ie. each WAL file is fsync'd once. If yes, I think that we need no or less fsync() option for the better performance. It is general in NOSQL storages. If no, we need fsync() option for more getting reliability and data integrarity. Hmm. An fsync=off style option might make sense, although I doubt the one fsync at end of file is causing a performance problem for anyone in practice. Haven't heard any complaints, anyway. An option to fsync after every commit record might make sense if you use pg_receivexlog with synchronous replication. Doing that would require parsing the WAL, though, to see where the commit records are. But then again, the fsync's wouldn't need to correspond to commit records. We could fsync just before we go to sleep to wait for more WAL to be received. 
- Heikki
Re: [HACKERS] A worst case for qsort
Hello John, [...] In fact, the mentioned paper says this about the subject: "Moreover, if worst-case performance is important, Quicksort is the wrong algorithm." I fully agree with this conclusion. -- Fabien
Re: [HACKERS] A worst case for qsort
IMHO, while worst-case performance is a very useful tool for analyzing algorithms (particularly their worst-case time complexity), a worst case should be put in its practical context. For example, if we had reason to be concerned about *adversarial* inputs, I think that there is a good chance that our qsort() actually would be problematic to the point of driving us to prefer some generally slower alternative. ISTM that you raise two distinct questions wrt PostgreSQL, which are, is the worst-case performance really an issue: (1) in general (2) wrt adversarial inputs The answer could be (1) "mostly no" and (2) "maybe yes". It suggests that where qsort is used, an administrator wary of (2) could be allowed to use an alternate implementation, maybe some merge sort, say by tweaking a configuration option in "postgresql.conf". -- Fabien.
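The adversarial concern can be made concrete: whenever the pivot rule is known, an adversary can construct the quadratic worst case. A sketch with a textbook first-element-pivot quicksort, for which already-sorted input is that worst case (illustrative only; PostgreSQL's qsort uses median-of-three and other defenses, which raise the bar for an attacker but do not eliminate bad cases):

```c
#include <assert.h>
#include <stddef.h>

/* textbook quicksort with a first-element pivot; returns the number of
 * element comparisons performed while sorting a[lo..hi) */
static long
qsort_cmps(int *a, size_t lo, size_t hi)
{
	long		cmps = 0;
	size_t		store = lo + 1;
	int			pivot,
				tmp;

	if (hi - lo < 2)
		return 0;
	pivot = a[lo];
	for (size_t i = lo + 1; i < hi; i++)
	{
		cmps++;
		if (a[i] < pivot)
		{
			tmp = a[i];
			a[i] = a[store];
			a[store] = tmp;
			store++;
		}
	}
	tmp = a[lo];
	a[lo] = a[store - 1];
	a[store - 1] = tmp;
	cmps += qsort_cmps(a, lo, store - 1);
	cmps += qsort_cmps(a, store, hi);
	return cmps;
}

/* sorted input makes every partition maximally lopsided, so the
 * comparison count is exactly n(n-1)/2 */
static long
worst_case_cmps(size_t n)
{
	int			a[256];

	for (size_t i = 0; i < n && i < 256; i++)
		a[i] = (int) i;
	return qsort_cmps(a, 0, n);
}
```

On random input the expected count is about 1.39 n log2 n, so the gap to n(n-1)/2 is what an adversary, or an unlucky data distribution, buys.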
Re: [HACKERS] posix_fadvise() and pg_receivexlog
Hi, 2014-08-07 13:47 GMT+09:00 Fujii Masao : > On Thu, Aug 7, 2014 at 3:59 AM, Heikki Linnakangas > wrote: > > On 08/06/2014 08:39 PM, Fujii Masao wrote: > >> The WAL files that pg_receivexlog writes will not be re-read soon > >> basically, > >> so we can advise the OS to release any cached pages when WAL file is > >> closed. I feel inclined to change pg_receivexlog that way. Thought? > > > > > > -1. The OS should be smart enough to not thrash the cache by files that > are > > written sequentially and never read. > The OS's buffer strategy is optimized for the general case. Have you forgotten the discussion with the OS hackers half a year ago? > Yep, the OS should be so smart, but I'm not sure if it actually is. Maybe > not, > so I was thinking that posix_fadvise is called when the server closes WAL > file. That's right. > > If we go down this path, we'd need to > > sprinkle posix_fadvises into many, many places. > Why aim for perfection from the beginning? That is how Postgres itself has evolved, so I don't think that concern holds. > > Anyway, who are we to say that they won't be re-read soon? You might e.g > > have a secondary backup site where you copy the files received by > > pg_receivexlog, as soon as they're completed. > > So whether posix_fadvise is called or not needs to be exposed as a > user-configurable option. We would need to measure how useful exposing > that is, though. By the way, does the pg_receivexlog process call fsync() on every WAL commit? If yes, I think we need a no-fsync (or reduced-fsync) option for better performance, as is common in NoSQL storage engines. If no, we need an fsync() option to gain more reliability and data integrity. Best regards, -- Mitsumasa KONDO
Re: [HACKERS] postgresql.auto.conf and reload
On Thu, Aug 7, 2014 at 12:28 PM, Amit Kapila wrote: > On Wed, Aug 6, 2014 at 11:39 AM, Fujii Masao wrote: >> >> On Tue, Aug 5, 2014 at 12:49 PM, Fujii Masao >> wrote: >> > On Mon, Aug 4, 2014 at 11:52 PM, Tom Lane wrote: >> >> Fujii Masao writes: >> >>> The patch chooses the last settings for every parameters and ignores >> >>> the >> >>> former settings. But I don't think that every parameters need to be >> >>> processed >> >>> this way. That is, we can change the patch so that only PGC_POSTMASTER >> >>> parameters are processed that way. The invalid settings in the >> >>> parameters >> >>> except PGC_POSTMASTER can be checked by pg_ctl reload as they are now. >> >>> Also this approach can reduce the number of comparison to choose the >> >>> last setting, i.e., "n" in O(n^2) is the number of uncommented >> >>> *PGC_POSTMASTER* >> >>> parameters (not every parameters). Thought? >> >> >> >> I don't find that to be a particularly good idea. In the first place, >> >> it introduces extra complication and a surprising difference in the >> >> behavior of different GUCs. In the second place, I thought part of the >> >> point of this patch was to suppress log messages complaining about >> >> invalid values that then weren't actually used for anything. That >> >> issue >> >> exists just as much for non-POSTMASTER variables. (IOW, "value cannot >> >> be changed now" is not the only log message we're trying to suppress.) >> > >> > Yep, sounds reasonable. This makes me think that we can suppress >> > such invalid message of the parameters which are actually not used >> > at not only conf file reload but also *postmaster startup*. That's >> > another >> > story, though. Anyway, barring any objection, I will commit Amit's >> > patch. >> >> Applied the slightly-modified version! > > Thanks. There is a commitfest entry [1] for this patch, do you > want some thing more to be addressed or shall we mark that as > committed. 
> [1] https://commitfest.postgresql.org/action/patch_view?id=1500

Yeah, let's mark this as committed, because your patch has been committed and the originally-reported problem has been fixed. We are now discussing another patch, but I will add that as a new CF entry.

Regards,
-- Fujii Masao
Re: [HACKERS] postgresql.auto.conf and reload
On Thu, Aug 7, 2014 at 12:36 PM, Amit Kapila wrote:
> On Wed, Aug 6, 2014 at 1:32 PM, Fujii Masao wrote:
>> On Tue, Jul 29, 2014 at 9:26 PM, Amit Kapila wrote:
>> >
>> > The reason is that during startup DataDir is not set by the time the
>> > server processes the config file, due to which we process the .auto.conf
>> > file in a second pass. I think ideally it should ignore the invalid
>> > setting in such a case as it does during reload; however, currently
>> > there is no good way to process .auto.conf in case DataDir is not set,
>> > so we handle it separately in a second pass.
>>
>> What about picking up only data_directory in the first pass?
>
> I think that will work out, and for that I think we might need to add
> a few checks during ProcessConfigFile. Do you want to address it
> during parsing of the config file or during setting of values?

I prefer "during parsing". The attached patch does that. In this patch, the first call of ProcessConfigFile() picks up only data_directory. One concern with this patch is that the logic is a bit ugly. Do you have any better idea?

Regards,
-- Fujii Masao

*** a/src/backend/utils/misc/guc-file.l
--- b/src/backend/utils/misc/guc-file.l
***************
*** 648,653 **** ParseConfigFp(FILE *fp, const char *config_file, int depth, int elevel,
--- 648,672 ----
  	else
  	{
  		/*
+ 		 * Pick up only the data_directory if DataDir is not set, which
+ 		 * means that the configuration file is read for the first time and
+ 		 * PG_AUTOCONF_FILENAME file cannot be read yet. In this case,
+ 		 * we should not pick up all the settings except the data_directory
+ 		 * from postgresql.conf yet because they might be overwritten
+ 		 * with the settings in PG_AUTOCONF_FILENAME file which will be
+ 		 * read later. OTOH, since it's ensured that data_directory doesn't
+ 		 * exist in PG_AUTOCONF_FILENAME file, it will never be overwritten
+ 		 * later. Now only the data_directory needs to be picked up to
+ 		 * read PG_AUTOCONF_FILENAME file later.
+ 		 */
+ 		if (DataDir == NULL && strcmp(opt_name, "data_directory") != 0)
+ 		{
+ 			if (token == 0)
+ 				break;
+ 			continue;
+ 		}
+
  		/*
  		 * ordinary variable, append to list. For multiple items of
  		 * same parameter, retain only which comes later.
  		 */

*** a/src/backend/utils/misc/guc.c
--- b/src/backend/utils/misc/guc.c
***************
*** 4342,4347 **** SelectConfigFiles(const char *userDoption, const char *progname)
--- 4342,4354 ----
  		return false;
  	}

+ 	/*
+ 	 * Read the configuration file for the first time. This time only
+ 	 * data_directory parameter is picked up to determine the data directory
+ 	 * so that we can read PG_AUTOCONF_FILENAME file next time. Then don't
+ 	 * forget to read the configuration file again later to pick up all the
+ 	 * parameters.
+ 	 */
  	ProcessConfigFile(PGC_POSTMASTER);

  	/*