Re: [HACKERS] [PATCH] backend: compare word-at-a-time in bcTruelen
Hi,

On 26 June 09 at 05:20, Jeremy Kerr wrote:
> > Unfortunately, the cases with lots of padding spaces are probably much
> > less probable than the cases with fewer. It would be unpleasant for
> > example if this patch resulted in a severe performance degradation for
> > a canonical example of char(n) being used properly, such as char(2)
> > for US state abbreviations.
>
> Yep, makes sense. The other consideration is stock-ticker symbols; I
> assume they may also be stored in CHAR(small n) columns.

Could this optimisation only kick in when n is big enough? I don't know whether this part of the code knows the typmod...

Regards,
--
dim

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Query progress indication - an implementation
Hi all,

Following the link in http://wiki.postgresql.org/wiki/Query_progress_indication but mostly: http://www.postech.ac.kr/~swhwang/progress2.pdf [1] I'm trying to write an implementation of the dne method in PostgreSQL. I added another column to the pg_stat_get_activity function to report the percentage of work done for the query (of course, any other method could be used; the way the percentage is reported to the user can be easily changed).

I attached a first patch (just to see if anyone is interested; the work is by no means finished). I guess I made a lot of mistakes, since I don't know anything about the PostgreSQL code...

1) the progress indicator can be disabled at runtime; this could be done with another runtime flag (at the moment it is always on)
2) I added a new structure (Progress) to PlanState to keep all the info about execution progress
3) I needed a pointer to the root of the PlanStates, to be able to calculate the total progress of the query tree (I bet this pointer was already available somewhere, but I couldn't find where...)
4) sub-plans are not included yet (well, to be honest, I don't really know what PostgreSQL means by those... :) )
5) the percentage is updated at most every second (can be easily changed)
6) the methods to adjust upper/lower bounds in [1] are not implemented yet (but that shouldn't be a problem)
7) the spilled-tuples handling in [1] is not supported yet
8) only hash join, loop join, aggregate, and sequential scans are implemented at the moment
9) I added another flag (EXEC_FLAG_DRIVER_BRANCH) in executor.h to signal to the sub-nodes whether they are part of a branch that will contain a driver node (for example, the inner subtree of a Nested Loops join is not a driver branch). I guess this could be done better at Plan level (instead of PlanState), but this way less code has to be changed
10) at the moment all driver nodes have the same work_per_tuple = 1, but this could be changed (for example, CPU-intensive driver nodes could have a smaller work_per_tuple value)

Well, some (very early) tests on a tpcd db showed it works as expected (well, I only did very few tests...)

Hope someone is interested

Index: src/include/pgstat.h
===================================================================
RCS file: /projects/cvsroot/pgsql/src/include/pgstat.h,v
retrieving revision 1.83
diff -r1.83 pgstat.h
568a569,571
> 	/* current percentage of progress */
> 	float		st_progress_perc;
646a650,651
> extern void pgstat_report_progress_percentage(double perc);
Index: src/backend/executor/Makefile
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/executor/Makefile,v
retrieving revision 1.29
diff -r1.29 Makefile
25c25
< 	nodeWindowAgg.o tstoreReceiver.o spi.o
---
> 	nodeWindowAgg.o tstoreReceiver.o spi.o progress.o
Index: src/backend/executor/execProcnode.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/executor/execProcnode.c,v
retrieving revision 1.65
diff -r1.65 execProcnode.c
109a110,111
> #include "executor/progress.h"
> #include "pgstat.h"
111a114
> void ProgressUpdate(PlanState *node, double *tot_operations_expected, double *tot_operations_so_far);
132a136
> 	bool		is_driver_node_candidate = false;
175a180,181
> 		is_driver_node_candidate = true;
261a268
> 		is_driver_node_candidate = true;
314a322,325
> 	/* Set up progress info for this node if requested */
> 	if (result->state->es_progress)
> 		ProgressSetInfo(result, node, eflags, is_driver_node_candidate);
328a340,343
> 	struct timeval t;
> 	double		tot_operations_expected = 0;
> 	double		tot_operations_so_far = 0;
462a478,493
> 	// progress calcs (only if required)
> 	if (node->state->es_progress && node->progress != NULL &&
> 		node->progress->is_driver_node)
> 	{
> 		node->progress->operations_so_far++;
> 		gettimeofday(&t, NULL);
> 		if (t.tv_sec > node->state->es_progress_last_update.tv_sec)
> 		{
> 			ProgressUpdate(node->state->es_root_planstate,
> 						   &tot_operations_expected, &tot_operations_so_far);
> 			if (tot_operations_expected != 0)
> 			{
> 				pgstat_report_progress_percentage(tot_operations_so_far * 100 / tot_operations_expected);
> 				node->state->es_progress_last_update = t;
> 			}
> 		}
> 	}
466a498,524
> void
> ProgressUpdate(PlanState *node, double *tot_operations_expected, double *tot_operations_so_far)
> {
> 	// TODO here a switch (nodeTag(node)) is needed in case we want upper/lower limit update
> 	if (node->progress->is_driver_node)
> 	{
> 		*tot_operations_expected += node->progress->lower_bound;
> 		*tot_operations_so_far +=
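For readers trying to follow the patch's bookkeeping, here is a toy, self-contained sketch of the percentage computation it performs. All names below are made up for illustration (the real patch hangs a Progress struct off PlanState and walks from es_root_planstate): the expected total is the sum of the driver nodes' lower bounds, and the reported percentage is work done over that sum.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the patch's progress bookkeeping; every name here is
 * invented. Each driver node carries a lower bound on its expected
 * operations and a count of operations done so far. */
typedef struct ToyPlanState
{
    int     is_driver_node;
    double  lower_bound;          /* expected operations */
    double  operations_so_far;
    struct ToyPlanState *left;
    struct ToyPlanState *right;
} ToyPlanState;

/* Walk the tree from the root, accumulating totals over driver nodes,
 * in the spirit of the patch's ProgressUpdate(). */
static void
progress_update(const ToyPlanState *node, double *expected, double *so_far)
{
    if (node == NULL)
        return;
    if (node->is_driver_node)
    {
        *expected += node->lower_bound;
        *so_far += node->operations_so_far;
    }
    progress_update(node->left, expected, so_far);
    progress_update(node->right, expected, so_far);
}

static double
progress_percentage(const ToyPlanState *root)
{
    double expected = 0.0, so_far = 0.0;

    progress_update(root, &expected, &so_far);
    return expected > 0.0 ? so_far * 100.0 / expected : 0.0;
}

/* A join (not a driver node) over two driver scans:
 * 2000 operations expected in total, 500 done so far. */
static double
demo_percentage(void)
{
    ToyPlanState scan1 = {1, 1000.0, 300.0, NULL, NULL};
    ToyPlanState scan2 = {1, 1000.0, 200.0, NULL, NULL};
    ToyPlanState join  = {0, 0.0, 0.0, &scan1, &scan2};

    return progress_percentage(&join);
}
```

Rate-limiting the report (item 5 above) then just means skipping this walk unless a second has elapsed since the last update, which is what the gettimeofday() check in the patch does.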
[HACKERS] Proposal: More portable way to support 64bit platforms
Proposal: More portable way to support 64bit platforms

Short description:

The current PostgreSQL implementation has some portability issues in supporting 64bit platforms: pointer calculation using long is not portable, for example on the Windows x64 platform. We propose to use intptr_t instead of long, which appears in C99.

Details:

intptr_t is defined in stdint.h. The configure script already has HAVE_STDINT_H but never uses it. This needs to be enabled. Please note that Windows/VC++ defines intptr_t in crtdefs.h.

Included is a conceptual patch to use intptr_t. Comments are welcome.

Some notes for the patches:

access/common/heaptuple.c:
  Casting using (long) is removed. It is no longer necessary if we introduce intptr_t.

include/c.h:
  Many alignment macros which use long are rewritten to use intptr_t.

The patches are against PostgreSQL 8.4beta2. The regression tests passed. Windows x64 is OK even with shared_buffers = 3000MB.

Tested platforms are as follows:
  Windows Server 2008 SP1 x64 + Visual Studio 2005
  RHEL 4 x86_64 + gcc 3.4.6
  FreeBSD 7.1 i386 + gcc 4.2.1

TODO:
Some problems may occur on older platforms which do not have stdint.h. In this case we need to add something like the below to include/port/*.h.

/* LP64, ILP64, ILP32, LP32 */
typedef long intptr_t;
typedef unsigned long uintptr_t;

/* LLP64 */
typedef long long intptr_t;
typedef unsigned long long uintptr_t;

Thanks,
--
Tsutomu Yamada // tsut...@sraoss.co.jp
SRA OSS, Inc. Japan

diff -cwbr postgresql-8.4beta2-orig/src/backend/access/common/heaptuple.c postgresql-8.4beta2-winx64/src/backend/access/common/heaptuple.c
*** postgresql-8.4beta2-orig/src/backend/access/common/heaptuple.c	2009-03-30 13:08:43.0 +0900
--- postgresql-8.4beta2-winx64/src/backend/access/common/heaptuple.c	2009-06-19 16:26:01.0 +0900
***************
*** 192,198 ****
  		if (att[i]->attbyval)
  		{
  			/* pass-by-value */
! 			data = (char *) att_align_nominal((long) data, att[i]->attalign);
  			store_att_byval(data, values[i], att[i]->attlen);
  			data_length = att[i]->attlen;
  		}
--- 192,198 ----
  		if (att[i]->attbyval)
  		{
  			/* pass-by-value */
! 			data = (char *) att_align_nominal(data, att[i]->attalign);
  			store_att_byval(data, values[i], att[i]->attlen);
  			data_length = att[i]->attlen;
  		}
***************
*** 226,232 ****
  		else
  		{
  			/* full 4-byte header varlena */
! 			data = (char *) att_align_nominal((long) data, att[i]->attalign);
  			data_length = VARSIZE(val);
  			memcpy(data, val, data_length);
--- 226,232 ----
  		else
  		{
  			/* full 4-byte header varlena */
! 			data = (char *) att_align_nominal(data, att[i]->attalign);
  			data_length = VARSIZE(val);
  			memcpy(data, val, data_length);
***************
*** 243,249 ****
  		else
  		{
  			/* fixed-length pass-by-reference */
! 			data = (char *) att_align_nominal((long) data, att[i]->attalign);
  			Assert(att[i]->attlen > 0);
  			data_length = att[i]->attlen;
  			memcpy(data, DatumGetPointer(values[i]), data_length);
--- 243,249 ----
  		else
  		{
  			/* fixed-length pass-by-reference */
! 			data = (char *) att_align_nominal(data, att[i]->attalign);
  			Assert(att[i]->attlen > 0);
  			data_length = att[i]->attlen;
  			memcpy(data, DatumGetPointer(values[i]), data_length);
diff -cwbr postgresql-8.4beta2-orig/src/backend/access/hash/hashfunc.c postgresql-8.4beta2-winx64/src/backend/access/hash/hashfunc.c
*** postgresql-8.4beta2-orig/src/backend/access/hash/hashfunc.c	2009-02-10 06:18:28.0 +0900
--- postgresql-8.4beta2-winx64/src/backend/access/hash/hashfunc.c	2009-06-18 22:37:46.0 +0900
***************
*** 319,325 ****
  	a = b = c = 0x9e3779b9 + len + 3923095;
  	/* If the source pointer is word-aligned, we use word-wide fetches */
! 	if (((long) k & UINT32_ALIGN_MASK) == 0)
  	{
  		/* Code path for aligned source data */
  		register const uint32 *ka = (const uint32 *) k;
--- 319,325 ----
  	a
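To make the proposal concrete, here is a small, self-contained sketch of the kind of macro change involved. The macro name is invented (the real ones live in include/c.h as TYPEALIGN and friends); the point is that alignment arithmetic goes through uintptr_t, which is always pointer-width, instead of long, which is only 32 bits on LLP64 platforms such as Windows x64.

```c
#include <assert.h>
#include <stdint.h>

/* Round PTR up to the next multiple of ALIGNVAL (a power of two), doing
 * the arithmetic in uintptr_t so it is as wide as the pointer itself.
 * Casting through long instead would truncate the upper 32 bits of the
 * address on LLP64 platforms. */
#define TYPEALIGN_PORTABLE(ALIGNVAL, PTR) \
	(((uintptr_t) (PTR) + ((ALIGNVAL) - 1)) & ~((uintptr_t) ((ALIGNVAL) - 1)))
```

For example, aligning address 9 to an 8-byte boundary yields 16, and an already-aligned address is unchanged.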
Re: [HACKERS] [PATCH] backend: compare word-at-a-time in bcTruelen
* Dimitri Fontaine (dfonta...@hi-media.com) wrote:
> On 26 June 09 at 05:20, Jeremy Kerr wrote:
> > > Unfortunately, the cases with lots of padding spaces are probably
> > > much less probable than the cases with fewer. It would be unpleasant
> > > for example if this patch resulted in a severe performance
> > > degradation for a canonical example of char(n) being used properly,
> > > such as char(2) for US state abbreviations.
> >
> > Yep, makes sense. The other consideration is stock-ticker symbols; I
> > assume they may also be stored in CHAR(small n) columns.
>
> Could this optimisation only kick in when n is big enough? I don't know
> whether this part of the code knows the typmod...

Is it just the size that matters, or is it when there are few spaces at the end? We do know the overall length, but I didn't see anything definitive saying that if it's larger than X words, doing the by-word comparison is a win regardless of how many actual spaces are at the end (apologies to Jeremy if it's in his more detailed report; I haven't had a chance to look yet).

Thanks,

Stephen
Re: [HACKERS] [PATCH] backend: compare word-at-a-time in bcTruelen
Stephen,

> Is it just the size that matters, or is it when there are few spaces at
> the end?

It's the number of spaces at the end. If we knew this number, then we wouldn't have to do any comparisons at all :)

Cheers,

Jeremy
Re: [HACKERS] [PATCH] backend: compare word-at-a-time in bcTruelen
* Jeremy Kerr (j...@ozlabs.org) wrote:
> > Is it just the size that matters, or is it when there are few spaces
> > at the end?
>
> It's the number of spaces at the end. If we knew this number, then we
> wouldn't have to do any comparisons at all :)

I meant in terms of affecting the performance of this function. We know the total length of the string, including spaces, coming into the function. If the updated function is always faster when the overall string is at least, say, 16 characters long, then Dimitri's suggestion to just check the overall length coming in and decide which approach to use might make sense. If the new function is always slower, regardless of overall string length, when there's only 1 extra space at the end, then chances are it's not worth it.

Thanks,

Stephen
Re: [HACKERS] [PATCH] backend: compare word-at-a-time in bcTruelen
Stephen,

> If the updated function is always faster when the overall string is at
> least, say, 16 characters long,

But that's not the case - the cost of the function (and the speedup from the previous version) depends on the number of spaces that there are at the end. For the new function to be faster, we need to know that there are more than 6 (on average, depending on alignment) trailing spaces. The number of non-space characters in the string won't affect performance (other than that there is probably a correlation between total string length and number of spaces, but we can't be certain of that).

> If the new function is always slower, regardless of overall string
> length, when there's only 1 extra space at the end

Yes, that is correct.

Cheers,

Jeremy
[HACKERS] gettext version problem exposed by buildfarm failures on canary
I see buildfarm member canary (NetBSD 1.6) was recently resurrected, but is failing to build HEAD, because msgfmt fails like this:

po/es.po:8039: keyword msgid_plural unknown
po/es.po:8036: missing `msgstr' section
po/es.po:8039: parse error

This indicates (1) canary needs a newer version of gettext, and (2) our configure test for a sufficiently new gettext installation is not in fact correct, because it didn't complain.

I asked about the configure issue some time ago, and was told that the existing test for bind_textdomain_codeset() in libintl was sufficient because that was introduced after ngettext(). It could be that the library's history is different on NetBSD, or perhaps canary has a recent libintl.so and a not-so-recent msgfmt.

I doubt it's worth trying to fix this before the 8.4 release, but it should get looked into eventually.

			regards, tom lane
Re: [HACKERS] [PATCH] backend: compare word-at-a-time in bcTruelen
Jeremy Kerr j...@ozlabs.org writes:
> Stephen,
> > If the updated function is always faster when the overall string is
> > at least, say, 16 characters long,
>
> But that's not the case - the cost of the function (and the speedup
> from the previous version) depends on the number of spaces that there
> are at the end.

Right, but there are certainly not more spaces than there are string characters ;-)

I think Dimitri's idea is eminently worth trying. In a string of less than, say, 16 bytes, the prospects of being able to win anything get much smaller compared to the prospects of wasting the extra loop overhead.

There is also a DBA psychology angle to it. If you've got CHAR(n) for very small n, it's likely that the type is being used in the canonical fashion and there won't be many trailing blanks. The case where we can hope to win is where we have CHAR(255) or some other plucked-from-the-air limit.

			regards, tom lane
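To make the proposed cutoff concrete, here is a rough, runnable sketch of the idea under discussion. The threshold value and the helper name are invented, and the real patch reads aligned machine words rather than calling memcmp; this only illustrates the shape: scan backwards a word at a time only when the total length clears a threshold, then finish byte by byte.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical cutoff; the real value would have to come from
 * benchmarks such as Jeremy's. */
#define WORD_SCAN_CUTOFF 16

/* Length of the space-padded string s (len bytes) with trailing blanks
 * ignored, i.e. what bcTruelen computes. */
static int
bc_truelen_sketch(const char *s, int len)
{
	int			i = len;

	if (len >= WORD_SCAN_CUTOFF)
	{
		/* Step back four bytes at a time while they are all spaces. */
		while (i >= 4 && memcmp(s + i - 4, "    ", 4) == 0)
			i -= 4;
	}

	/* Short strings (or the word-scan remainder) go byte by byte, so a
	 * CHAR(2) state code never pays the word-loop overhead. */
	while (i > 0 && s[i - 1] == ' ')
		i--;
	return i;
}

/* Sample inputs: "hello" padded with blanks, and an all-blank field. */
static const char padded[] = "hello           ";
static const char blanks[] = "                ";
```

With a length-based switch like this, the worst case for short properly-used CHAR(n) columns is just one extra length comparison.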
Re: [HACKERS] query cancel issues in contrib/dblink
On Thu, Jun 25, 2009 at 10:41 PM, Itagaki Takahiro itagaki.takah...@oss.ntt.co.jp wrote:
> Hi, contrib/dblink seems to have no treatment of query cancels. It
> causes the following issues:
>
> (1) Users need to wait for completion of the remote query. Requests
> for query cancel won't be delivered to remote servers.
> (2) PGresult objects will be leaked. The result is not released when
> the query is cancelled; it is released only when the dblink function
> is called max_calls times.
>
> They are long-standing issues (not only in 8.4), but I hope we will
> fix them to make dblink more robust. For (1), asynchronous libpq
> functions should be used instead of blocking ones, and wait for the
> remote query using a loop with CHECK_FOR_INTERRUPTS().

How would you structure this loop exactly?

merlin
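One possible shape is sketched below. It cannot run against a live server here, so the real moving parts — PQconsumeInput()/PQisBusy() after a PQsendQuery(), and the backend's CHECK_FOR_INTERRUPTS() — are modeled by toy stand-in functions; only the control flow is the point.

```c
#include <assert.h>

/* Toy stand-ins: in dblink, "busy" would be PQisBusy() after
 * PQconsumeInput(), and the interrupt check would be the backend's
 * CHECK_FOR_INTERRUPTS(), arranged so any PGresult can still be freed
 * before erroring out (issue (2) in the mail above). */
static int	polls_until_ready;
static int	interrupt_requested;
static int	cancel_sent;

static int
remote_is_busy(void)
{
	return polls_until_ready-- > 0;
}

static void
check_for_interrupts(void)
{
	if (interrupt_requested)
		cancel_sent = 1;	/* real code: send a cancel request, then bail */
}

/* Core loop shape: after starting the query asynchronously, alternate
 * between checking for local interrupts and polling the remote side.
 * Returns 1 if the query completed, 0 if it was cancelled. */
static int
wait_for_remote(void)
{
	while (remote_is_busy())
	{
		check_for_interrupts();
		if (cancel_sent)
			return 0;
		/* real code would select()/poll() on the connection's socket
		 * here, with a timeout, instead of spinning */
	}
	return 1;	/* real code: collect the result with PQgetResult() */
}

static int
demo_completes(void)
{
	polls_until_ready = 3;
	interrupt_requested = 0;
	cancel_sent = 0;
	return wait_for_remote();
}

static int
demo_cancelled(void)
{
	polls_until_ready = 1000;
	interrupt_requested = 1;
	cancel_sent = 0;
	return wait_for_remote();
}
```

The key property is that a cancel request arriving mid-query breaks the loop on the next iteration instead of being deferred until the remote query finishes.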
Re: [HACKERS] Proposal: More portable way to support 64bit platforms
On Friday 26 June 2009 12:07:24 Tsutomu Yamada wrote:
> Proposal: More portable way to support 64bit platforms
>
> Short description: Current PostgreSQL implementation has some
> portability issues to support 64bit platforms: pointer calculations
> using long is not portable, for example on Windows x64 platform. We
> propose to use intptr_t instead of long, which appears in C99.

This makes sense. You can also review the archives for previous iterations of this discussion (search for intptr_t). You might want to add your patch to the next commit fest.
Re: [HACKERS] gettext version problem exposed by buildfarm failures on canary
On Fri, June 26, 2009 9:34 am, Tom Lane wrote:
> I see buildfarm member canary (NetBSD 1.6) was recently resurrected,
> but is failing to build HEAD, because msgfmt fails like this:
>
> po/es.po:8039: keyword msgid_plural unknown
> po/es.po:8036: missing `msgstr' section
> po/es.po:8039: parse error
>
> This indicates (1) canary needs a newer version of gettext, and (2)
> our configure test for a sufficiently new gettext installation is not
> in fact correct, because it didn't complain. I asked about the
> configure issue some time ago, and was told that the existing test for
> bind_textdomain_codeset() in libintl was sufficient because that was
> introduced after ngettext(). It could be that the library's history is
> different on NetBSD, or perhaps canary has a recent libintl.so and a
> not so recent msgfmt. I doubt it's worth trying to fix this before 8.4
> release, but it should get looked into eventually.

yes ... but presumably we don't want canary upgraded until we fix this, unless we get some other test platform for it.

cheers

andrew
Re: [HACKERS] gettext version problem exposed by buildfarm failures on canary
Andrew Dunstan and...@dunslane.net writes:
> On Fri, June 26, 2009 9:34 am, Tom Lane wrote:
> > I doubt it's worth trying to fix this before 8.4 release, but it
> > should get looked into eventually.
>
> yes ... but presumably we don't want canary upgraded until we fix
> this, unless we get some other test platform for it.

What I'd suggest Kris do is temporarily disable the NLS option on that build, so we can get at least some useful testing from the machine. After a proposed configure fix has been installed, re-enable NLS, verify configure fails as expected, *then* upgrade msgfmt.

			regards, tom lane
Re: [HACKERS] [PATCH] backend: compare word-at-a-time in bcTruelen
On 26 June 09 at 14:47, Jeremy Kerr wrote:
> For the new function to be faster, we need to know that there are more
> than 6 (on average, depending on alignment) trailing spaces.

It's becoming somewhat tricky, but maybe the test to do for the optimisation to get used is n >= threshold && str[n-6] == 0x20, à la Boyer/Moore?

I call it tricky because you could have a space here which isn't followed by spaces, but still, if it's not a space here, you're saying we should not even try the optimisation.
--
dim
Re: [HACKERS] [PATCH] backend: compare word-at-a-time in bcTruelen
On Fri, Jun 26, 2009 at 05:03:11PM +0200, Dimitri Fontaine wrote:
> On 26 June 09 at 14:47, Jeremy Kerr wrote:
> > For the new function to be faster, we need to know that there are
> > more than 6 (on average, depending on alignment) trailing spaces.
>
> It's becoming somewhat tricky, but maybe the test to do for the
> optimisation to get used is n >= threshold && str[n-6] == 0x20, à la
> Boyer/Moore?

That's cute. What about comparing the last aligned word which completely fits in the buffer? Something along the lines of (assuming four-byte words)

  *(int *) (4 * ((int) &buf[0] / 4))

(now that's an ugly one, but you know what I mean?)

Regards
-- tomás
Re: [HACKERS] [PATCH] backend: compare word-at-a-time in bcTruelen
to...@tuxteam.de writes:
> On Fri, Jun 26, 2009 at 05:03:11PM +0200, Dimitri Fontaine wrote:
> > It's becoming somewhat tricky, but maybe the test to do for the
> > optimisation to get used is n >= threshold && str[n-6] == 0x20, à la
> > Boyer/Moore?
>
> That's cute. What about comparing the last aligned word which
> completely fits in the buffer? Something along the lines of (assuming
> four-byte words)
>
>   *(int *) (4 * ((int) &buf[0] / 4))

We're trying to avoid adding cycles to the optimization-is-useless case. The more expensive this test gets, the slower the unoptimizable case becomes.

			regards, tom lane
Re: [HACKERS] [PATCH] backend: compare word-at-a-time in bcTruelen
On Fri, June 26, 2009 11:39 am, Tom Lane wrote:
> to...@tuxteam.de writes:
> > On Fri, Jun 26, 2009 at 05:03:11PM +0200, Dimitri Fontaine wrote:
> > > It's becoming somewhat tricky, but maybe the test to do for the
> > > optimisation to get used is n >= threshold && str[n-6] == 0x20,
> > > à la Boyer/Moore?
> >
> > That's cute. What about comparing the last aligned word which
> > completely fits in the buffer? Something along the lines of
> > (assuming four-byte words)
> >
> >   *(int *) (4 * ((int) &buf[0] / 4))
>
> We're trying to avoid adding cycles to the optimization-is-useless
> case. The more expensive this test gets, the slower the unoptimizable
> case becomes.

Yeah. Like you, I like the idea of a switch based on string length. I would suggest a cutoff of something like 36 (the length of the string representation of a UUID). But maybe that will miss lots of optimisable cases like address fields. I guess those people should really be using varchar(n) anyway.

cheers

andrew
Re: [HACKERS] gettext version problem exposed by buildfarm failures on canary
Tom Lane wrote:
> I see buildfarm member canary (NetBSD 1.6) was recently resurrected,
> but is failing to build HEAD, because msgfmt fails like this:
>
> po/es.po:8039: keyword msgid_plural unknown
> po/es.po:8036: missing `msgstr' section
> po/es.po:8039: parse error
>
> This indicates (1) canary needs a newer version of gettext, and (2)
> our configure test for a sufficiently new gettext installation is not
> in fact correct, because it didn't complain.

NetBSD has an odd mix of a BSD libintl and GNU msgfmt, so the library provides both ngettext and bind_textdomain_codeset, but that has no implications about what msgfmt supports.

pgf...@netbsd:~$ msgfmt --version
msgfmt (GNU gettext) 0.10.35

Also, the man page for libintl says, "bind_textdomain_codeset() does not work at this moment (always fail)." So perhaps this platform is just a lost cause.
Re: [HACKERS] gettext version problem exposed by buildfarm failures on canary
Tom Lane wrote:
> What I'd suggest Kris do is temporarily disable the NLS option on that
> build, so we can get at least some useful testing from the machine.
> After a proposed configure fix has been installed, re-enable NLS,
> verify configure fails as expected, *then* upgrade msgfmt.

I disabled NLS, got a clean test run, and have now re-enabled NLS so that people can test things if they care.

Kris Jurka
[HACKERS] Join optimization for inheritance tables
Hi all,

We are working with Aster for the summer and we would like to bounce around some ideas that we are having for possible PostgreSQL extensions. In order to describe our ideas we will use the following example:

create table msg( msg_id int, msg text );
create table receiver( msg_id int, user_id int, ts timestamp );

create table msg_100( check ( 1 <= msg_id and msg_id < 100 ) ) inherits (msg);
create table msg_200( check ( 100 <= msg_id and msg_id < 200 ) ) inherits (msg);
create table msg_300( check ( 200 <= msg_id and msg_id < 300 ) ) inherits (msg);
create table receiver_100( check ( 1 <= msg_id and msg_id < 100 ) ) inherits (receiver);
create table receiver_200( check ( 100 <= msg_id and msg_id < 200 ) ) inherits (receiver);
create table receiver_300( check ( 200 <= msg_id and msg_id < 300 ) ) inherits (receiver);

When we issue queries on one of the parent tables, like

SELECT * FROM msg WHERE msg_id BETWEEN 50 AND 70;

PostgreSQL is smart enough to filter out child tables whose check constraints are refuted by the filter conditions. In this example, the optimizer will pick a plan that only considers the parent table 'msg' and one of the child tables, 'msg_100':

Result
  -> Append
       -> Seq Scan on msg
            Filter: ((msg_id >= 50) AND (msg_id <= 70))
       -> Seq Scan on msg_100 msg
            Filter: ((msg_id >= 50) AND (msg_id <= 70))

(Plan costs are removed for simplicity of presentation.) Now, if we issue a join query between the two parent tables, like

SELECT * FROM msg m JOIN receiver r ON m.msg_id = r.msg_id;

the execution plan will be:

Merge Join
  Merge Cond: (m.msg_id = r.msg_id)
  -> Sort
       Sort Key: m.msg_id
       -> Append
            -> Seq Scan on msg m
            -> Seq Scan on msg_100 m
            -> Seq Scan on msg_200 m
            -> Seq Scan on msg_300 m
  -> Sort
       Sort Key: r.msg_id
       -> Append
            -> Seq Scan on receiver r
            -> Seq Scan on receiver_100 r
            -> Seq Scan on receiver_200 r
            -> Seq Scan on receiver_300 r

During the planning phase, the optimizer treats an entire hierarchy as a single entity. Hence, it first considers the most efficient way to create the append paths for the two hierarchies, and then the best way to join them. However, some optimizations are possible here, similar to the table filtering described above. In particular, instead of joining the two appends, we could push the join down to the child relations - that is, create pairwise joins between the children and then append the join results together. Based on the check conditions of the children and the join predicate, it is possible to filter out joins that cannot produce any results. For example, joining 'msg_100' with 'receiver_300' is redundant, since the check constraints of these two tables do not overlap: tuples in 'msg_100' have 'msg_id' between 1 and 100, whereas tuples in 'receiver_300' have 'msg_id' between 200 and 300. Therefore, no tuples can be produced by this join. A plan with such optimizations could be:

Result
  -> Append
       -> Hash Join
            Hash Cond: (msg.msg_id = receiver.msg_id)
            -> Seq Scan on msg msg
            -> Hash
                 -> Seq Scan on receiver receiver
       -> Hash Join
            Hash Cond: (msg.msg_id = receiver.msg_id)
            -> Seq Scan on msg msg
            -> Hash
                 -> Seq Scan on receiver_100 receiver
       -> Hash Join
            Hash Cond: (msg.msg_id = receiver.msg_id)
            -> Seq Scan on msg msg
            -> Hash
                 -> Seq Scan on receiver_200 receiver
       -> Hash Join
            Hash Cond: (msg.msg_id = receiver.msg_id)
            -> Seq Scan on msg msg
            -> Hash
                 -> Seq Scan on receiver_300 receiver
       -> Hash Join
            Hash Cond: (msg.msg_id = receiver.msg_id)
            -> Seq Scan on msg_100 msg
            -> Hash
                 -> Seq Scan on receiver receiver
       -> Hash Join
            Hash Cond: (msg.msg_id = receiver.msg_id)
            -> Seq Scan on msg_100 msg
            -> Hash
                 -> Seq Scan on receiver_100 receiver
       -> Hash Join
            Hash Cond: (msg.msg_id = receiver.msg_id)
            -> Seq Scan on msg_200 msg
            -> Hash
                 -> Seq Scan on receiver receiver
       -> Hash Join
            Hash Cond: (msg.msg_id = receiver.msg_id)
            -> Seq Scan on msg_200 msg
            -> Hash
                 -> Seq Scan on receiver_200 receiver
       -> Hash Join
            Hash Cond: (msg.msg_id = receiver.msg_id)
            -> Seq Scan on msg_300 msg
            -> Hash
                 -> Seq Scan on
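The pruning test the mail describes is just interval overlap on the child check constraints. A minimal sketch (names invented; real planner code would have to derive the ranges from the constraint expressions via its predicate-refutation machinery):

```c
#include <assert.h>

/* A child's CHECK constraint modeled as a half-open range [lo, hi),
 * e.g. msg_100 carries [1, 100). */
typedef struct
{
	int			lo;
	int			hi;
} ChildRange;

/* An equi-join on msg_id between two children can produce rows only if
 * their ranges intersect; otherwise that pairwise join is pruned. */
static int
join_pair_needed(ChildRange a, ChildRange b)
{
	return a.lo < b.hi && b.lo < a.hi;
}
```

So of the nine child-child pairs in the example, only the three aligned ones (msg_100 with receiver_100, and so on) survive pruning; the parent tables, carrying no constraint, must still be joined against everything, which is what the sketched plan above shows.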
Re: [HACKERS] 8.4 open item: copy performance regression?
Tom Lane t...@sss.pgh.pa.us wrote:
> Kevin Grittner kevin.gritt...@wicourts.gov writes:
> > The checkpoint_segments seems dramatic enough to be real. I wonder
> > if the test is short enough that it never got around to re-using any
> > of them, so it was doing extra writes for the initial creation
> > during the test?
>
> That's exactly what I was about to suggest. Are you starting each run
> from a fresh initdb? If so, try running the load long enough that the
> number of WAL files stabilizes (should happen at 2x
> checkpoint_segments) and then start the test measurement.

default conf (xlogs not populated)
real	3m49.604s
real	3m47.225s
real	3m45.831s

default conf (xlogs populated)
real	3m45.603s
real	3m45.284s
real	3m45.906s

default conf + checkpoint_segments = 100 (xlogs not populated)
real	4m27.629s
real	4m24.496s
real	4m22.832s

default conf + checkpoint_segments = 100 (xlogs populated)
real	3m52.746s
real	3m52.619s
real	3m50.418s

I used ten times the number of rows, to get more meaningful results. To get the "populated" times, I just dropped the target table and created it again; otherwise the runs were identical. Clearly, pre-populating the xlog files reduces run time, especially for a large number of xlog files; however, I still got better performance with a smaller set of xlog files.

Regarding the fact that even with the xlog files pre-populated, the smaller set of xlog files is faster: I'm only guessing, but I suspect the battery-backed RAID controller is what's defeating conventional wisdom here. By writing to the same, relatively small, set of xlog files repeatedly, some of the actual disk writes probably evaporate in the BBU cache. More frequent checkpoints from the smaller number of xlog files might also have caused data to start streaming to the disk a little sooner, minimizing write gluts later. I've often seen similar benefits of the BBU cache, which cause some of the frequently-given advice here to have no discernible effect or be counter-productive in our environment. (I know that some doubted my claim that my aggressive background writer settings didn't increase disk writes, but I couldn't even measure a difference there in the writes from OS cache to the controller cache, much less anything which indicated they actually increased physical disk writes.)

By the way, the number of xlog files seemed to always go to two above 2x checkpoint_segments.

-Kevin
Re: [HACKERS] 8.4 open item: copy performance regression?
Kevin Grittner kevin.gritt...@wicourts.gov writes:
> Regarding the fact that even with the xlog files pre-populated, the
> smaller set of xlog files is faster: I'm only guessing, but I suspect
> the battery backed RAID controller is what's defeating conventional
> wisdom here. By writing to the same, relatively small, set of xlog
> files repeatedly, some of the actual disk writes probably evaporate in
> the BBU cache.

Yeah, sounds plausible. How big is your BBU cache?

			regards, tom lane
Re: [HACKERS] 8.4 open item: copy performance regression?
On Fri, June 26, 2009 4:13 pm, Kevin Grittner wrote:
> By the way, the number of xlog files seemed to always go to two above
> 2x checkpoint_segments.

The docs say:

    There will always be at least one WAL segment file, and will
    normally not be more than (2 + checkpoint_completion_target) *
    checkpoint_segments + 1 files.

cheers

andrew
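That bound is easy to sanity-check. A trivial sketch of the documented formula (the function name is made up):

```c
#include <assert.h>

/* Documented upper bound on the number of WAL segment files:
 * (2 + checkpoint_completion_target) * checkpoint_segments + 1 */
static int
max_wal_segments(int checkpoint_segments, double completion_target)
{
	return (int) ((2.0 + completion_target) * checkpoint_segments) + 1;
}
```

With the default checkpoint_completion_target of 0.5 and checkpoint_segments = 3, this gives 8 files, i.e. two above 2x checkpoint_segments, consistent with Kevin's observation for the default configuration; at checkpoint_segments = 100 the bound is 251, though the observed count can sit well below the bound.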
Re: [HACKERS] 8.4 open item: copy performance regression?
Tom Lane t...@sss.pgh.pa.us wrote:
> How big is your BBU cache?

On this machine, I guess it is 512MB. (Possibly 1GB, but I'm having trouble finding the right incantation to check it at the moment, so I'm going by what the hardware tech remembers.)

-Kevin
Re: [HACKERS] [PATCH] [v8.5] Security checks on largeobjects
--On 26 June 2009 13:08:37 +0900 KaiGai Kohei kai...@ak.jp.nec.com wrote:
> * Is the named large object (including the fully qualified one) worth
>   it? It will enable us to specify a large object with a human-readable
>   identifier string.

I don't understand the notion of this. Does this mean you can create a LO with an identifier string, generated from (e.g.) your application?

--
Thanks

Bernd
Re: [HACKERS] Join optimization for inheritance tables
Nedyalko Borisov nedya...@asterdata.com writes:
> In summary, we are making two suggestions:
> 1. Extend the optimizer to consider joins between child tables when
> hierarchies are joined together.

We already handle this for the case where the join is a nestloop with inner index scan, and I'm not convinced that there's any real gain to be had for other join types.

> 2. Add the Empty Check Constraint, which would enforce that a
> particular table is to remain empty.

The trouble with that is that a constraint that doesn't propagate to its child tables is a weird beast that I'd just as soon not invent. We are currently thinking about inventing an explicit notion of partitioned tables. If we had that, it would be reasonable to have a special kind of parent table for a partitioned set and refuse to allow any data in that relation. But I'm not excited about contorting the general constraint mechanism in the way that would be necessary to express this as a constraint.

			regards, tom lane