Re: [PATCHES] [HACKERS] Fix linking of OpenLDAP libraries
Kevin Brown wrote:
>>> Let me expand a little on some of the peculiarities of shared
>>> libraries on AIX:
>>> - A normal AIX shared library is called libXX.a
>>>   It is an 'ar' archive that contains the shared object(s).
>> Ah, so the problem really boils down to funny naming conventions.
>> If they use .a for both shared and static libraries, how does anyone
>> tell the difference?
> It sounds to me like there is no difference. Notice how his example
> ldd output shows dependencies on specific .o entries within the
> various .a files that reside on the system, rather than on the .a
> files as a whole. If those entries had been statically linked then
> they wouldn't have shown up in the ldd output at all.

That is not entirely true. The difference between a static and a
shared library on AIX is that the *.o files in a dynamic library are
dynamic objects, produced by the linker (what is called *.so on
Linux), and the *.o files in a static library are the output of the
compiler (what is called *.o on Linux).

What IS true is that you can do a static build against a dynamic
library. Against a static library you can only do static builds.

Yours,
Laurenz Albe

---(end of broadcast)---
TIP 4: Have you searched our list archives?
       http://archives.postgresql.org
Re: [HACKERS] Lock partitions
On Mon, 2006-09-11 at 11:29 -0400, Tom Lane wrote:
> Mark Wong [EMAIL PROTECTED] writes:
>> Tom Lane wrote:
>>> It would be nice to see some results from the OSDL tests with, say,
>>> 4, 8, and 16 lock partitions before we forget about the point
>>> though. Anybody know whether OSDL is in a position to run tests for
>>> us?
>> Yeah, I can run some dbt2 tests in the lab. I'll get started on it.
>> We're still a little bit away from getting the automated testing for
>> PostgreSQL going again though.
> Great, thanks. The thing to twiddle is LOG2_NUM_LOCK_PARTITIONS in
> src/include/storage/lwlock.h. You need a full backend recompile
> after changing it, but you shouldn't need to initdb, if that helps.

IIRC we did that already and the answer was 16...

-- 
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
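For readers unfamiliar with the knob Tom mentions: the LOG2 constant fixes the partition count at a power of two, and each lock tag's hash code is reduced to a partition number. A minimal illustrative sketch in Python (not PostgreSQL source; the real macro lives in lwlock.h/lock.c):

```python
# Illustrative sketch: how a LOG2-style setting maps a lock tag's hash
# code onto one of 2**LOG2 partitions. Power-of-two counts let the
# modulo reduction be done with a simple bit mask.
LOG2_NUM_LOCK_PARTITIONS = 4                  # the default Tom refers to
NUM_LOCK_PARTITIONS = 1 << LOG2_NUM_LOCK_PARTITIONS

def lock_hash_partition(hashcode: int) -> int:
    # hashcode % NUM_LOCK_PARTITIONS, written as a mask
    return hashcode & (NUM_LOCK_PARTITIONS - 1)

print(NUM_LOCK_PARTITIONS)               # 16
print(lock_hash_partition(0xDEADBEEF))   # 15
```

Changing LOG2_NUM_LOCK_PARTITIONS from 4 to 7 is what turns "16 partitions" into the "128 partitions" discussed later in this thread.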
Re: [HACKERS] Optimizer improvements: to do or not to do?
On Mon, 2006-09-11 at 06:20 -0700, Say42 wrote:
> I intend to play with some optimizer aspects. Just for fun.

Cool. If you think it's fun (it is), you're halfway there.

> I'm a novice in DBMS development so I can not promise any usable
> results, but if it can be useful even as yet another failed attempt,
> I will try.

This type of work is 90% analysis, 10% coding. You'll need to do a lot
of investigation, lots of discussion and listening.

> That's what I want to do:
> 1. Replace the not-very-useful indexCorrelation with indexClustering.

An opinion such as "not very useful" isn't considered sufficient
explanation or justification for a change around here.

> 2. Consider caching of the inner table in a nested-loop join during
>    estimation of the total cost of the join.
>
> More details:
> 1. During ANALYZE we have sample rows. For every N-th sample row we
>    can scan the indexes on a qual like 'value = index_first_column'
>    and fetch the first N row TIDs. Estimating the count of fetched
>    heap pages is not hard. To get the index clustering value, just
>    divide the page count by the sample row count.
> 2. This is much harder and may be beyond me entirely. The main ideas:
>    - split page-fetch cost and CPU cost into different variables and
>      don't sum them before join estimation.
>    - final path cost estimation should be done in the join cost
>      estimation and take into account the number of inner table
>      accesses (= K). CPU cost is directly proportional to K, but page
>      fetches can be estimated by the Mackert-Lohman formula using the
>      total tuple count (K * inner_table_selectivity *
>      inner_table_total_tuples).

I'd work on one thing at a time and go into it deeply.

-- 
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
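Say42's first proposal is easy to sketch numerically: look at the heap pages touched by the TIDs an index probe returns, and divide distinct pages by TIDs fetched. A toy Python version (all names hypothetical; this is an illustration of the idea, not ANALYZE code):

```python
# Toy sketch of the proposed "indexClustering" statistic: for sampled
# index probes, count the distinct heap pages the returned TIDs touch,
# divided by the number of TIDs fetched. A value near 1/tuples_per_page
# means well clustered; a value near 1.0 means fully scattered.
def index_clustering(tids):
    # tids: list of (block_number, offset) pairs in index order
    pages = {blk for blk, _ in tids}
    return len(pages) / len(tids)

# 12 TIDs packed into 2 heap pages: well clustered.
clustered = [(0, i) for i in range(6)] + [(1, i) for i in range(6)]
# 12 TIDs spread over 12 pages: every fetch hits a new page.
scattered = [(i, 0) for i in range(12)]

print(index_clustering(clustered))   # 0.1666...
print(index_clustering(scattered))   # 1.0
```

Unlike correlation, this statistic is insensitive to the *order* of the pages, which is exactly the distinction debated later in this thread.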
Re: [HACKERS] Fixed length data types issue
On Mon, 2006-09-11 at 14:25 -0400, Tom Lane wrote:
> Simon Riggs [EMAIL PROTECTED] writes:
>> Is this an 8.2 thing?
> You are joking, no?

Confirming, using an open question, and a smile.

-- 
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Buildfarm vs. Linux Distro classification
Alvaro Herrera [EMAIL PROTECTED] writes:
> Christopher Browne wrote:
>> It seems to me that there is some value in putting together a script
>> that tries to identify some of the interesting bits of the
>> toolchain.
> Yeah; but why not just a bunch of commands, some of which are
> expected to work on any particular machine, and save the whole output
> as a single string? It's not very clean, but should get the important
> details. To support a new machine, just add more commands to the
> script. A simple version of this, based on your Mark 0, could be:
>
>   uname -a
>   $CC --version
>   $CC -V
>   $CC -v
>   ls -l /lib/libc.so*
>
> No need to comment/uncomment anything.

I would have said "ldd postgres" would work on any ELF system and show
you all the library .so versions it depends on. I guess that only
helps if it actually builds and then fails the regression tests -- not
if the build fails.

On Debian it would be useful to do something like the below. Though
note that a) this depends on having a postgres package installed,
which the build machines may not have, and b) it shows the libraries
that package depends on, not the versions of the *-dev packages
installed.

[EMAIL PROTECTED]:~$ reportbug --offline --template postgresql-8.1 2>/dev/null | sed '1,/^-- System Information/d'
Debian Release: testing/unstable
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: i386 (i686)
Shell: /bin/sh linked to /bin/bash
Kernel: Linux 2.6.17.7-swsusp2
Locale: LANG=en_GB, LC_CTYPE=en_GB (charmap=ISO-8859-1)

Versions of packages postgresql-8.1 depends on:
ii  libc6                  2.3.6.ds1-4  GNU C Library: Shared libraries
ii  libcomerr2             1.39-1       common error description library
ii  libkrb53               1.4.3-8      MIT Kerberos runtime libraries
ii  libpam0g               0.79-3.1     Pluggable Authentication Modules l
ii  libpq4                 8.1.4-5      PostgreSQL C client library
ii  libssl0.9.8            0.9.8b-2     SSL shared libraries
ii  postgresql-client-8.1  8.1.4-4      front-end programs for PostgreSQL
ii  postgresql-common      57           manager for PostgreSQL database cl

postgresql-8.1 recommends no packages.

-- no debconf information

-- 
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
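Alvaro's "just run a bunch of commands and keep whatever works" idea is easy to sketch. A minimal Python version (the command list comes from his message; the script itself is illustrative, not the buildfarm's actual code):

```python
# Run a fixed list of probe commands, keeping output from whichever
# ones happen to exist on this machine; failures are recorded in the
# report rather than aborting it. Supporting a new platform means
# appending more commands to PROBES.
import os
import subprocess

PROBES = [
    "uname -a",
    "$CC --version",
    "$CC -V",
    "$CC -v",
    "ls -l /lib/libc.so*",
]

def toolchain_report(cc="cc"):
    env = dict(os.environ, CC=cc)
    parts = []
    for cmd in PROBES:
        try:
            out = subprocess.run(cmd, shell=True, env=env, timeout=30,
                                 capture_output=True, text=True)
            body = (out.stdout + out.stderr).strip() or "(no output)"
        except Exception as exc:
            body = f"(failed: {exc})"
        parts.append(f"== {cmd} ==\n{body}")
    return "\n\n".join(parts)

print(toolchain_report())
```

The whole report is one string, per Alvaro's suggestion, so it can be stored alongside a buildfarm result without any per-platform parsing.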
[HACKERS] Bitmap index status
Hi,

What's the status of the bitmap index patch? Have you worked on it
since the last posted patch
(http://archives.postgresql.org/pgsql-patches/2006-08/msg3.php)?

I've started to review it, to get it into CVS early in the 8.3 cycle.
I just want to make sure that I'm working on the latest version.

Besides the issues already discussed, I found two minor bugs:

* pg_am says that the bitmap am supports unique indexes, while it
  doesn't.
* race condition in _bitmap_inserttuple if two backends try to insert
  the same, new value. If they both find that there's no lov item for
  the key, and try to create one, one backend will get a duplicate key
  error on the lov index.

Also, vacuum actually does a reindex, which seems awfully wasteful.
That needs to be looked at.

-- 
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Optimizer improvements: to do or not to do?
Simon Riggs [EMAIL PROTECTED] writes:
>> That's what I want to do:
>> 1. Replace not very useful indexCorrelation with indexClustering.
> An opinion such as "not very useful" isn't considered sufficient
> explanation or justification for a change around here.

There's been some previous discussion about how correlation was not
really what we wanted to be measuring. But that discussion was in
regard to cross-column correlation. In that case we're trying to
predict how selective a clause will be: if we read x% of the table due
to a restriction on X, what percentage of the values of Y will be
represented?

In this case I think we do need to know correlation or something like
it. That's because what we're trying to predict is how close to
sequential the I/O accesses will be. If there's no correlation between
index order and disk order then they'll be random. If they're highly
correlated then accesses will be close to sequential.

It's possible there's some sort of block-wise correlation measure
which would be even better for our needs. We don't care if all the
high values are towards the start and low values towards the end, as
long as each section is in order, for example.

It's also possible that we could use something like what you describe
to predict how many physical I/Os will happen altogether. If the table
is highly clustered but disordered then the I/O will be random access,
but the cache will be more effective than if the table were highly
correlated but not clustered (though it would take a large table to
make that possible, I think).

In short, I think what's needed is for someone to review a lot of
different stats metrics for correlation and clustering and do some
analysis of how each would be useful for cost modelling.

-- 
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
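Gregory's point that row-level correlation can misjudge how sequential the I/O really is shows up in a toy calculation. Here plain Pearson correlation is computed by hand in Python (an illustration of the statistic, not the planner's code):

```python
# Pearson correlation between index order and heap position. A heap
# where each block is internally ordered but the blocks themselves are
# swapped reads as two sequential runs, yet scores poorly on plain
# row-level correlation -- Gregory's argument for a block-wise measure.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

index_order = list(range(6))
perfectly_ordered = [0, 1, 2, 3, 4, 5]   # fully sequential heap
blocks_swapped    = [3, 4, 5, 0, 1, 2]   # two in-order runs, swapped

print(pearson(index_order, perfectly_ordered))  # 1.0
print(pearson(index_order, blocks_swapped))     # about -0.54
```

The second table would be read almost as cheaply as the first, but its correlation score suggests near-random access, which is exactly the kind of misestimate being discussed.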
Re: [HACKERS] Fixed length data types issue
Alvaro Herrera [EMAIL PROTECTED] writes:
> Gregory Stark wrote:
>> Well, "char" doesn't have quite the same semantics as CHAR(1). If
>> that's the consensus though, then I can work on either fixing "char"
>> semantics to match CHAR(1) or adding a separate type instead.
> What semantics?

The main bit that comes to mind is that 32::CHAR(1) gives you '3' but
32::"char" gives you ' '. Really it makes more sense if you think of
"char" as a 1-byte integer type with some extra text casts and
operators to make C programmers happy, not a 1-byte character type.

-- 
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
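The difference is easy to mimic outside SQL: CHAR(1) keeps the first character of the text rendering of 32, while "char" keeps the byte with value 32, i.e. the character at that code point. A Python analogy (an illustration only, not PostgreSQL's cast code):

```python
# 32::CHAR(1) renders 32 as text, then truncates to one character: '3'.
# 32::"char" stores the byte value 32, which *is* the space character.
n = 32
as_char1 = str(n)[0]   # text cast, then truncate -> '3'
as_byte  = chr(n)      # integer reinterpreted as a byte -> ' '

print(repr(as_char1))  # '3'
print(repr(as_byte))   # ' '
```

This is why Gregory describes "char" as a 1-byte integer with text trimmings: the cast from an integer goes through the byte value, not the decimal digits.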
Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?
Tom Lane wrote:
> Stefan Kaltenbrunner [EMAIL PROTECTED] writes:
>> btw - the "hashjoin is bad" was more or less based on the
>> observation that nearly all of the CPU is burned in hash-related
>> functions in the profile (when profiling over a longer period of
>> time those accumulate even more % of the time than in the short
>> profile I included in the original report)
> [ shrug... ] Two out of the three functions you mentioned are not
> used by hash join, and anyway the other plan probably has a
> comparable execution density in sort-related functions; does that
> make it bad?

hmm, sorry for that - I should have checked the source before I made
that assumption :-(

> It's possible that the large time for ExecScanHashBucket has
> something to do with skewed usage of the hash buckets due to an
> unfortunate data distribution, but that's theorizing far in advance
> of the data.

http://www.kaltenbrunner.cc/files/4/ has preliminary data from the
dbt3/scaling 10 run I did, which seems to imply we have at least 4
queries in there that take an excessive amount of time (query 5 is the
one I started the complaint with).
However, those results have to be taken with a grain of salt, since
there is an apparent bug in the dbt3 code which seems to rely on
add_missing_from=on (as can be seen in some of the error logs of the
database), and towards the end of the throughput run I did some of the
explain analyzes for the report (those are the small 100% spikes in
the graph, due to the box using the second CPU to run them).

I will redo those tests later this week though ...

Stefan
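Tom's "skewed usage of the hash buckets" theory is simple to model: when one join-key value dominates, its bucket chain grows long, and every probe of that bucket degrades to a linear scan. A toy Python sketch (bucket count, "hash function", and data are all made up for illustration):

```python
# Model a hash join's bucket table: rows are distributed into buckets
# by key. A heavily skewed key distribution piles most rows into one
# bucket, which is what would make ExecScanHashBucket-style chain
# walking show up hot in a profile.
from collections import Counter

NBUCKETS = 8

def bucket_sizes(keys):
    return Counter(k % NBUCKETS for k in keys)   # toy hash: modulo

uniform = list(range(104))                # 104 distinct keys
skewed  = [7] * 100 + list(range(4))      # one value dominates

print(max(bucket_sizes(uniform).values()))  # 13  (104/8, evenly spread)
print(max(bucket_sizes(skewed).values()))   # 100 (everything in one bucket)
```

With uniform keys each probe scans ~13 entries; with the skewed data, probes for the common value scan 100, so the same number of probes burns far more CPU.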
Re: [HACKERS] Optimizer improvements: to do or not to do?
Simon Riggs wrote:
> This type of work is 90% analysis, 10% coding. You'll need to do a
> lot of investigation, lots of discussion and listening.

I absolutely agree with you, and I am not about to rush into coding
right now. First of all I'm going to dig a lot in the PG sources,
READMEs and so on. It's a good school of coding and of understanding
DBMS internals.

>> That's what I want to do:
>> 1. Replace not very useful indexCorrelation with indexClustering.
> An opinion such as "not very useful" isn't considered sufficient
> explanation or justification for a change around here.

Sometimes indexCorrelation is even misleading. There are many examples
of overestimation of index scan cost (data well clustered but not
ordered - correlation is low) and some cases of underestimation, when
tuples look well ordered with a high degree of correlation but the
index scan actually causes random page fetches (1-3-2-4-6-5, for
example; on a server without RAID it is VERY slow, 25 times slower
than a bitmap index scan).

If we have a dedicated clustering measure, we can estimate page counts
more precisely. The next step could be to introduce 'ordering' as a
measure of how sequential the page accesses are. Without the
'ordering', all we can do is assume that pages are fetched in random
order. Anyhow, if index access cost is overestimated we can set
random_page_cost=2. (Is that true in a production database with a
smart RAID?)

Moreover, I think the problem is more complex. With the assumption
that index access is always random, we run into another problem:
overestimation of the master table index scan. If it is small enough,
PG can choose a seq scan instead of an index scan even if the latter
is actually much cheaper because of caching. That is why caching
should be taken into account during join cost calculation.

>> 2. Consider caching of the inner table in a nested-loop join during
>>    estimation of the total cost of the join.
> I'd work on one thing at a time and go into it deeply.

Good news. So I'm very interested in what you think about my ideas.
Is it wrong, or too naive?
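The Mackert-Lohman estimate mentioned earlier in this thread can be sketched like this. It approximates how many physical page fetches N tuple fetches against a T-page table cost when b pages of cache are available; the shape below follows the approximation PostgreSQL's cost model uses, but variable names are mine and the exact code should be checked against costsize.c:

```python
# Approximate Mackert-Lohman page-fetch estimate: T heap pages, N total
# tuple fetches across all inner-table probes, cache of b pages.
# Repeated probes re-hit cached pages, so fetches grow sublinearly in N;
# once the table no longer fits in cache, evicted pages get re-read and
# the count can exceed T.
def index_pages_fetched(N, T, b):
    if N <= 0:
        return 0.0
    if T <= b:                       # table fits in cache
        return min(2.0 * T * N / (2.0 * T + N), float(T))
    lim = 2.0 * T * b / (2.0 * T - b)
    if N <= lim:
        return 2.0 * T * N / (2.0 * T + N)
    return b + (N - lim) * (T - b) / T

# 100-page inner table probed for 1000 tuples, 1000-page cache:
# everything ends up cached, so at most T = 100 physical reads.
print(index_pages_fetched(1000, 100, 1000))   # 100.0
# Same workload with only a 10-page cache: many re-reads.
print(index_pages_fetched(1000, 100, 10))
```

This is exactly why Say42 wants the join estimator to feed the *total* tuple count (K * selectivity * tuples) into one formula, instead of costing each of the K inner scans as if the cache were cold.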
Re: [PATCHES] [HACKERS] Fix linking of OpenLDAP libraries
Tom Lane wrote:
>> In our case, we have libpq.a and libpq.so in the same directory, so
>> unless you link with -brtl you will get a static link (because
>> libpq.a is a static library).
> I wonder whether we ought to suppress building (or at least
> installing) our .a libraries at all on AIX. Adding -brtl to LDFLAGS
> would help within the context of our own build, but external clients
> that link to libpq without saying that are going to get undesirable
> results. I think there's a reasonable argument that by installing a
> .a file that isn't a shared library, we are violating the platform's
> conventions.

The natural way in AIX would be:
- Create libpq.so
- Create libpq.a by 'rm -f libpq.a; ar -rc libpq.a libpq.so'
- Install only libpq.a

For a static build on AIX, you have to specify all the libraries and
give the linker -bstatic and -bI:/lib/syscalls.exp

>> Should -brtl be added to src/template/aix?
> Sounds that way, but that'll only help for psql and other stuff built
> within our build. Could you try this against CVS tip:
> * add -brtl to LDFLAGS in the template
> * Remove the AIX-specific hack on $(libpq) at lines 349-354 of
>   src/Makefile.global.in
> * see if it configures and builds

I have done that (see the attached patch) and it works fine.
I don't have the native AIX C compiler, so I could only test it with
gcc.

I have taken the liberty to modify the static link line in
Makefile.global.in to contain the LDAP libraries; I hope that's
appropriate.

Yours,
Laurenz Albe

aix.link.patch
Description: aix.link.patch
[HACKERS] dump / restore functionality
At the risk of being chastised for reviving old issues, I was
wondering: what are the chances of getting the dump / restore
selectivity into 8.2?

I am referring to the idea that, instead of the current 2 parts, a
dump could be broken up into 3 parts, namely tables, data and
everything else, so that data from one dump could be mixed and matched
with schema defs from another dump easily and scriptably.

I think the previous discussion concluded that the functionality would
be best implemented as a selective restore, rather than a breakable
dump, due to the risk of inconsistent restores; so you could restore
just the tables, data or "everything else" components from a given
dump.

Did this item make it onto the to-do list? If so, did anyone pick this
up, or will I be waiting until a future as-yet-undefined date? More
generally, is there a publicly accessible place where one can see the
to-do items, who has adopted which ones, and what the status is on
them? Sorry for asking this, but I am still a rather new participant
here.
Re: [HACKERS] Lock partitions
Simon Riggs [EMAIL PROTECTED] writes:
> On Mon, 2006-09-11 at 11:29 -0400, Tom Lane wrote:
>> Great, thanks. The thing to twiddle is LOG2_NUM_LOCK_PARTITIONS in
>> src/include/storage/lwlock.h. You need a full backend recompile
>> after changing it, but you shouldn't need to initdb, if that helps.
> IIRC we did that already and the answer was 16...

No, no one has shown me any numbers from any real tests (anything more
than pgbench on a Dell PC ...).

			regards, tom lane
AIX shared libraries (was Re: [PATCHES] [HACKERS] Fix linking of OpenLDAP libraries)
Albe Laurenz [EMAIL PROTECTED] writes:
> Tom Lane wrote:
>> I think there's a reasonable argument that by installing a .a file
>> that isn't a shared library, we are violating the platform's
>> conventions.
> The natural way in AIX would be:
> - Create libpq.so
> - Create libpq.a by 'rm -f libpq.a; ar -rc libpq.a libpq.so'
> - Install only libpq.a

Hm. This seems possible with some moderate hacking on Makefile.shlib
(certainly it'd be no more invasive than the existing Windows-specific
platform variants). However, looking at what's already in
Makefile.shlib for AIX makes me doubt the above claim a bit, because
AFAICS libpq.so is produced from libpq.a on that platform. Is it
possible that the rules have changed across AIX versions, and that the
code in there now is needful for older versions?

Another issue with installing only .a is that there's no provision for
versioning in .a library names ... what happens to someone who needs
two generations of libpq on his machine?

			regards, tom lane
[HACKERS] currentItemData currentMarkData
It has bothered me for some time that we have currentItemData and
currentMarkData fields in IndexScanDescData, while they're really the
indexam's private data. They should be in the indexam opaque
structure.

At the moment, they're used in the gist and hash access methods, and
they're not used in b-tree and gin. The new bitmap index code uses
them, but only as an internal boolean flag indicating that the scan
has been initialized, so that should be fixed anyway.

How about removing those fields from IndexScanDesc? There's also the
is_multiscan field, which is set but never read.

-- 
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Bitmap index status
Heikki Linnakangas [EMAIL PROTECTED] writes:
> What's the status of the bitmap index patch? Have you worked on it
> since the last posted patch
> (http://archives.postgresql.org/pgsql-patches/2006-08/msg3.php)?

Gavin and Jie have made major changes since that version (or at least
they'd better have something to show for the month since then ;-)).
I wouldn't recommend reviewing the patch until they post something
current ...

> Also, vacuum actually does a reindex, which seems awfully wasteful.
> That needs to be looked at.

Yikes. I imagine they've not tried to do anything about that; if you
want to help, maybe you could take that subproblem?

			regards, tom lane
Re: [HACKERS] dump / restore functionality
On Tuesday, 12 September 2006 15:22, Naz Gassiep wrote:
> At risk of being chastised for reviving old issues, I was wondering,
> what are the chances of getting the dump / restore selectivity into
> 8.2?

Zero, because feature freeze is over.

> Did this item make it onto the to-do list? If so, did anyone pick
> this up or will I be waiting until a future as-yet-undefined date?

If you find this feature interesting, you are free to drive the
development yourself, independent of it appearing on any list. To
avoid tears later on, look for a consensus about the merit of the
feature first, though.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
Re: [HACKERS] Optimizer improvements: to do or not to do?
On Tuesday, 12 September 2006 12:48, Say42 wrote:
> That is why caching should be taken into account during join cost
> calculation.

If you know of a more effective way to do that beyond the
effective_cache_size parameter that we have now, let us know.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
Re: [HACKERS] Optimizer improvements: to do or not to do?
Gregory Stark [EMAIL PROTECTED] writes:
> It's possible there's some sort of block-wise correlation measure
> which would be even better for our needs.

Actually, it seems obvious to me that the correlation measure ought to
ignore within-block ordering, but right now it does not. OTOH it's not
clear how important that is, as on a decent-size table you'll probably
not have more than one sample row in a block anyway.

			regards, tom lane
Re: [HACKERS] dump / restore functionality
> Zero, because feature freeze is over.

Aah yes, fair enough.

> If you find this feature interesting, you are free to drive the
> development yourself, independent of it appearing on any list. To
> avoid tears later on, look for a consensus about the merit of the
> feature first, though.

This has been discussed already, and there was a not insignificant
amount of support for it; IIRC Tom Lane agreed that such functionality
would be useful. Tom, are you aware if this item made it onto the
to-do list?
Re: [HACKERS] dump / restore functionality
Naz Gassiep [EMAIL PROTECTED] writes:
> At risk of being chastised for reviving old issues, I was wondering,
> what are the chances of getting the dump / restore selectivity into
> 8.2?

None, but feel free to start coding for 8.3.

> I am referring to the idea that, instead of the current 2 parts, a
> dump could be broken up into 3 parts, namely tables, data and
> everything else, so that data from one dump could be mixed and
> matched with schema defs from another dump easily and scriptably.

That seems like a rather spectacular overstatement of the likely
benefits, not to mention a misdescription of what was discussed.
AFAIR what was discussed was separating
- schema stuff needed before loading data
- table data
- schema stuff needed after loading data
where the last category boils down to indexes and then foreign keys.
All the other stuff such as functions really needs to be in the first
part ... or at least there's no visible benefit to delaying loading
it.

			regards, tom lane
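Tom's three-way split can be sketched as a classifier over dump entries. The object kinds below are illustrative (chosen to resemble pg_dump archive entry labels, but not taken from the 2006 code):

```python
# Classify dump objects into Tom's three restore phases: definitions
# needed before the data, the data itself, and the index/foreign-key
# work that should wait until the data is loaded. Object kinds are
# illustrative, not an exhaustive or exact list.
DATA      = {"TABLE DATA", "SEQUENCE SET", "LARGE OBJECT"}
POST_DATA = {"INDEX", "CONSTRAINT", "FK CONSTRAINT", "TRIGGER"}

def restore_phase(kind):
    if kind in DATA:
        return "data"
    if kind in POST_DATA:
        return "post-data"
    return "pre-data"   # everything else (tables, types, functions...)
                        # goes up front, as Tom says

print(restore_phase("TABLE"))          # pre-data
print(restore_phase("TABLE DATA"))     # data
print(restore_phase("FK CONSTRAINT"))  # post-data
```

Defaulting everything unrecognized to pre-data matches Tom's point that there is no visible benefit to delaying anything except the index and foreign-key work.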
Re: [HACKERS] currentItemData currentMarkData
Heikki Linnakangas [EMAIL PROTECTED] writes:
> It has bothered me for some time that we have currentItemData and
> currentMarkData fields in IndexScanDescData, while it's really
> indexam's private data. They should be in the indexam opaque
> structure.

Can't get very excited about it, but if you want.

> There's also is_multiscan field which is set but never read.

It has been needed in the past and could plausibly be needed again
sometime --- an index AM's rescan method might want to know what kind
of scan it's setting up for. Of course, if we get rid of amgetmulti
entirely as has been discussed, this'd go away too.

			regards, tom lane
Re: [HACKERS] dump / restore functionality
> That seems like a rather spectacular overstatement of the likely
> benefits, not to mention a misdescription of what was discussed.
> AFAIR what was discussed was separating

Yes, that is what was discussed.

> - schema stuff needed before loading data
> - table data
> - schema stuff needed after loading data
> where the last category boils down to indexes and then foreign keys.
> All the other stuff such as functions really needs to be in the first
> part ... or at least there's no visible benefit to delaying loading
> it.

Right. I still think this breakdown would be useful. An additional
item that would be useful is to allow pg_restore to restore plain text
dumps.

Sincerely,

Joshua D. Drake

-- 
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
Re: [HACKERS] Lock partitions
We can pass on what we've seen when running tests here with different
BufMapping and LockMgr partition sizes. We use a TPC-C inspired
benchmark. Currently it is configured to run 25 backend processes. The
test runs for 16 minutes, as this is the minimum amount of time we can
run and obtain useful information. This gives us 24,000 seconds
(25 * 16 * 60) of processing time.

The following timings have been rounded to the nearest second and
represent the amount of time amongst all backend processes to acquire
and release locks. For example, a value of 2500 seconds would mean
each of the 25 backend processes took ~100 seconds to acquire or
release a lock. Although, in reality, the time spent locking or
releasing each partition entry is not uniform, and there are some
definite hotspot entries. We can pass on some of the lock output if
anyone is interested.

When using 16 buffer and 16 lock partitions, we see that BufMapping
takes 809 seconds to acquire locks and 174 seconds to release locks.
The LockMgr takes 362 seconds to acquire locks and 26 seconds to
release locks.

When using 128 buffer and 128 lock partitions, we see that BufMapping
takes 277 seconds (a 532 second improvement) to acquire locks and 78
seconds (a 96 second improvement) to release locks. The LockMgr takes
235 seconds (a 127 second improvement) to acquire locks and 22 seconds
(a 4 second improvement) to release locks.

Overall, 128 BufMapping partitions improve locking/releasing by 628
seconds, and 128 LockMgr partitions improve locking/releasing by 131
seconds.

With the improvements in the various locking times, one might expect
an improvement in the overall benchmark result. However, a 16
partition run produces a result of 198.74 TPS and a 128 partition run
produces a result of 203.24 TPS.

Part of the time saved from the BufMapping and LockMgr partitions is
absorbed into the WALInsertLock lock. For a 16 partition run, the
total time to lock/release the WALInsertLock lock is 5845 seconds.
For 128 partitions, the WALInsertLock lock takes 6172 seconds, an
increase of 327 seconds. Perhaps we have our WAL configured
incorrectly?

Other static locks are also affected, but not as much as the
WALInsertLock lock. For example, the ProcArrayLock lock increases from
337 seconds to 348 seconds. The SInvalLock lock increases from 317
seconds to 331 seconds.

Due to the expansion of time in other locks, a 128 partition run only
spends 403 seconds less in locking than a 16 partition run. We can
generate some OProfile statistics, but most of the time saved is
probably absorbed into functions such as HeapTupleSatisfiesSnapshot
and PinBuffer, which seem to have a very high overhead.

David

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Simon
Riggs
Sent: Tuesday, September 12, 2006 1:37 AM
To: Tom Lane
Cc: Mark Wong; Bruce Momjian; PostgreSQL-development
Subject: Re: [HACKERS] Lock partitions
[quoted text snipped -- identical to Simon's message earlier in this
digest]
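David's aggregate timings are easier to interpret per backend; a quick check of the arithmetic in Python (all numbers taken from the message above):

```python
# Sanity-check the figures in the report: total processing time, the
# savings from going to 128 partitions, and what WALInsertLock costs
# per backend.
BACKENDS, MINUTES = 25, 16
total_seconds = BACKENDS * MINUTES * 60
print(total_seconds)                      # 24000

# BufMapping improvement going from 16 to 128 partitions
# (acquire + release):
buf_saving = (809 - 277) + (174 - 78)
print(buf_saving)                         # 628

# LockMgr improvement:
lock_saving = (362 - 235) + (26 - 22)
print(lock_saving)                        # 131

# WALInsertLock at 16 partitions: 5845 s spread over 25 backends in a
# 960 s run means each backend spends roughly 24% of its time there.
wal_share = 5845 / BACKENDS / (MINUTES * 60)
print(round(wal_share, 2))                # 0.24
```

Seen this way, the partition gains (628 + 131 s out of 24,000 s of backend time, about 3%) line up with the modest TPS change, and WALInsertLock stands out as the dominant remaining lock cost.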
Re: AIX shared libraries (was Re: [PATCHES] [HACKERS] Fix linking of OpenLDAP libraries)
Tom Lane wrote:
>> The natural way in AIX would be:
>> - Create libpq.so
>> - Create libpq.a by 'rm -f libpq.a; ar -rc libpq.a libpq.so'
>> - Install only libpq.a
> Hm. This seems possible with some moderate hacking on Makefile.shlib
> (certainly it'd be no more invasive than the existing
> Windows-specific platform variants). However, looking at what's
> already in Makefile.shlib for AIX makes me doubt the above claim a
> bit, because AFAICS libpq.so is produced from libpq.a on that
> platform. Is it possible that the rules have changed across AIX
> versions, and that the code in there now is needful for older
> versions?

I don't think that this behaviour has changed. I remember it from AIX
4.3.2.

Of course libpq.so is created from (the static) libpq.a. But once you
have the dynamic library, you can link statically against it.

> Another issue with installing only .a is that there's no provision
> for versioning in .a library names ... what happens to someone who
> needs two generations of libpq on his machine?

Use different directories and set LIBPATH? I don't know if there is a
canonical way to do that. I'll investigate.

Yours,
Laurenz Albe
Re: [HACKERS] Bitmap index status
Hi Heikki,

Gavin and I are trying to merge our changes together this week. We
will post a new patch by the end of this week. This patch will include
some style fixes, bug fixes, and the stream bitmap implementation.
I will look into the problems you have mentioned in this email.

Yes, vacuum currently does a reindex. Gavin and I just talked about
this yesterday. We are looking into ways to improve this. One way is
not to do a reindex for each vacuum: we maintain a list of updated
tids along with the bitmap index, and only when this list grows past a
certain point does vacuum rebuild the index.

Thanks,
Jie

On 9/12/06 2:43 AM, Heikki Linnakangas [EMAIL PROTECTED] wrote:
> [quoted text snipped -- identical to Heikki's message earlier in this
> digest]
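The deferred-rebuild scheme Jie describes (accumulate updated TIDs, rebuild only past a threshold) can be sketched like this; everything here is a hypothetical illustration of the idea, not the patch's code:

```python
# Sketch of the proposed vacuum strategy for the bitmap index: instead
# of reindexing on every vacuum, buffer updated TIDs next to the index
# and rebuild only once the buffer passes a threshold.
class BitmapIndexVacuum:
    def __init__(self, rebuild_threshold=1000):
        self.rebuild_threshold = rebuild_threshold
        self.updated_tids = []        # TIDs changed since last rebuild
        self.rebuilds = 0             # stand-in for real reindex work

    def note_update(self, tid):
        self.updated_tids.append(tid)

    def vacuum(self):
        # Cheap vacuum: do nothing unless the update list is large.
        if len(self.updated_tids) >= self.rebuild_threshold:
            self.rebuilds += 1
            self.updated_tids.clear()

idx = BitmapIndexVacuum(rebuild_threshold=3)
idx.note_update((0, 1)); idx.vacuum()
print(idx.rebuilds)   # 0 -- below threshold, vacuum skips the reindex
idx.note_update((0, 2)); idx.note_update((0, 3)); idx.vacuum()
print(idx.rebuilds)   # 1 -- threshold reached, index rebuilt
```

The trade-off is that index scans then have to consult the pending-update list as well as the bitmap proper, so the threshold bounds how much extra per-scan work accumulates between rebuilds.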
Re: [HACKERS] Lock partitions
Strong, David [EMAIL PROTECTED] writes: When using 16 buffer and 16 lock partitions, we see that BufMapping takes 809 seconds to acquire locks and 174 seconds to release locks. The LockMgr takes 362 seconds to acquire locks and 26 seconds to release locks. When using 128 buffer and 128 lock partitions, we see that BufMapping takes 277 seconds (532 seconds improvement) to acquire locks and 78 seconds (96 seconds improvement) to release locks. The LockMgr takes 235 seconds (127 seconds improvement) to acquire locks and 22 seconds (4 seconds improvement) to release locks. While I don't see any particular penalty to increasing NUM_BUFFER_PARTITIONS, increasing NUM_LOCK_PARTITIONS carries a very significant penalty (increasing PGPROC size as well as the work needed during LockReleaseAll, which is executed at every transaction end). I think 128 lock partitions is probably verging on the ridiculous ... particularly if your benchmark only involves touching half a dozen tables. I'd be more interested in comparisons between 4 and 16 lock partitions. Also, please vary the two settings independently rather than confusing the issue by changing them both at once. With the improvements in the various locking times, one might expect an improvement in the overall benchmark result. However, a 16 partition run produces a result of 198.74 TPS and a 128 partition run produces a result of 203.24 TPS. Part of the time saved from BufMapping and LockMgr partitions is absorbed into the WALInsertLock lock. For a 16 partition run, the total time to lock/release the WALInsertLock lock is 5845 seconds. For 128 partitions, the WALInsertLock lock takes 6172 seconds, an increase of 327 seconds. Perhaps we have our WAL configured incorrectly? I fear this throws your entire measurement procedure into question. For a fixed workload the number of acquisitions of WALInsertLock ought to be fixed, so you shouldn't see any more contention for WALInsertLock if the transaction rate didn't change materially. 
regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
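The knob being varied here, LOG2_NUM_LOCK_PARTITIONS, fixes the partition count to a power of two so that a lock's partition can be picked by masking its hash code. A minimal sketch of that mapping (the mask idiom is the point; the hash values are stand-ins):

```python
# Partition selection with a power-of-two partition count (sketch).
LOG2_NUM_LOCK_PARTITIONS = 4                        # 16 partitions
NUM_LOCK_PARTITIONS = 1 << LOG2_NUM_LOCK_PARTITIONS
LOCK_PARTITION_MASK = NUM_LOCK_PARTITIONS - 1

def lock_partition(hashcode: int) -> int:
    # A power-of-two count lets the modulo reduce to a bit mask.
    return hashcode & LOCK_PARTITION_MASK

# Any hash value maps to a partition index in [0, NUM_LOCK_PARTITIONS).
print(lock_partition(0), lock_partition(12345), lock_partition(2**31 - 1))
```

This is also why the penalty Tom mentions scales with the setting: structures such as per-backend held-lock bookkeeping and the LockReleaseAll walk grow with the number of partitions, so more partitions is not free.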
Re: [HACKERS] Lock partitions
Tom, Thanks for the feedback. We'll run a few tests with differing buffer and lock partition sizes in the range you're interested in and we'll let you know what we see. Our workload is not fixed, however. Our benchmark does not follow the strict TPC-C guideline of using think times etc. We throw as many transactions at the database as we can. So, when any time is freed up, we will fill it with another transaction. We simply want to stress as much as we can. As one bottleneck is removed, the time saved obviously flows to the next. Postgres 8.2 moves some of the time that used to be consumed by single BufMappingLock and LockMGRLock locks to the WALInsertLock lock. We have run tests where we made XLogInsert a NOP, because we wanted to see where the next bottleneck would be, and some of the time occupied by WALInsertLock lock was absorbed by the SInvalLock lock. We have not tried to remove the SInvalLock lock to see where time flows to next, but we might. David -Original Message- From: Tom Lane [mailto:[EMAIL PROTECTED] Sent: Tuesday, September 12, 2006 9:40 AM To: Strong, David Cc: PostgreSQL-development Subject: Re: [HACKERS] Lock partitions Strong, David [EMAIL PROTECTED] writes: When using 16 buffer and 16 lock partitions, we see that BufMapping takes 809 seconds to acquire locks and 174 seconds to release locks. The LockMgr takes 362 seconds to acquire locks and 26 seconds to release locks. When using 128 buffer and 128 lock partitions, we see that BufMapping takes 277 seconds (532 seconds improvement) to acquire locks and 78 seconds (96 seconds improvement) to release locks. The LockMgr takes 235 seconds (127 seconds improvement) to acquire locks and 22 seconds (4 seconds improvement) to release locks. While I don't see any particular penalty to increasing NUM_BUFFER_PARTITIONS, increasing NUM_LOCK_PARTITIONS carries a very significant penalty (increasing PGPROC size as well as the work needed during LockReleaseAll, which is executed at every transaction end). 
I think 128 lock partitions is probably verging on the ridiculous ... particularly if your benchmark only involves touching half a dozen tables. I'd be more interested in comparisons between 4 and 16 lock partitions. Also, please vary the two settings independently rather than confusing the issue by changing them both at once. With the improvements in the various locking times, one might expect an improvement in the overall benchmark result. However, a 16 partition run produces a result of 198.74 TPS and a 128 partition run produces a result of 203.24 TPS. Part of the time saved from BufMapping and LockMgr partitions is absorbed into the WALInsertLock lock. For a 16 partition run, the total time to lock/release the WALInsertLock lock is 5845 seconds. For 128 partitions, the WALInsertLock lock takes 6172 seconds, an increase of 327 seconds. Perhaps we have our WAL configured incorrectly? I fear this throws your entire measurement procedure into question. For a fixed workload the number of acquisitions of WALInsertLock ought to be fixed, so you shouldn't see any more contention for WALInsertLock if the transaction rate didn't change materially. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Simplifying standby mode
On Wed, 2006-09-06 at 12:01 -0400, Bruce Momjian wrote: Simon Riggs wrote: 1. Notes on restartable recovery Previously submitted 2. Notes on standby functionality 3. discussion on rolling your own record-level polling using pg_xlogfile_name_offset() Given below, but not in SGML yet. Looking for general pointers/feedback before I drop those angle-brackets in place. Warm Standby Servers for High Availability == Overview Continuous Archiving can also be used to create a High Availability (HA) cluster configuration with one or more Standby Servers ready to take over operations in the case that the Primary Server fails. This capability is more widely known as Warm Standby Log Shipping. The Primary and Standby Server work together to provide this capability, though the servers are only loosely coupled. The Primary Server operates in Continuous Archiving mode, while the Standby Server operates in a continuous Recovery mode, reading the WAL files from the Primary. No changes to the database tables are required to enable this capability, so it offers a low administration overhead in comparison with other replication approaches. This configuration also has a very low performance impact on the Primary server. Directly moving WAL or log records from one database server to another is typically described as Log Shipping. PostgreSQL implements file-based Log Shipping, meaning WAL records are batched one file at a time. WAL files can be shipped easily and cheaply over any distance, whether it be to an adjacent system, another system on the same site or another system on the far side of the globe. The bandwidth required for this technique varies according to the transaction rate of the Primary Server. Record-based Log Shipping is also possible with custom-developed procedures, discussed in a later section. Future developments are likely to include options for synchronous and/or integrated record-based log shipping. It should be noted that the log shipping is asynchronous, i.e. 
the WAL records are shipped after transaction commit. As a result there can be a small window of data loss, should the Primary Server suffer a catastrophic failure. The window of data loss is minimised by the use of the archive_timeout parameter, which can be set as low as a few seconds if required. A very low setting can increase the bandwidth requirements for file shipping. The Standby server is not available for access, since it is continually performing recovery processing. Recovery performance is sufficiently good that the Standby will typically be only minutes away from full availability once it has been activated. As a result, we refer to this capability as a Warm Standby configuration that offers High Availability. Restoring a server from an archived base backup and rollforward can take considerably longer and so that technique only really offers a solution for Disaster Recovery, not HA. Other mechanisms for High Availability replication are available, both commercially and as open-source software. In general, log shipping between servers running different release levels will not be possible. However, it may be possible for servers running different minor release levels e.g. 8.2.1 and 8.2.2 to inter-operate successfully. No formal support for that is offered and there may be minor releases where that is not possible, so it is unwise to rely on that capability. Planning On the Standby server all tablespaces and paths will refer to similarly named mount points, so it is important to create the Primary and Standby servers so that they are as similar as possible, at least from the perspective of the database server. Furthermore, any CREATE TABLESPACE commands will be passed across as-is, so any new mount points must be created on both servers before they are used on the Primary. 
Hardware need not be the same, but experience shows that maintaining two identical systems is easier than maintaining two dissimilar ones over the whole lifetime of the application and system. There is no special mode required to enable a Standby server. The operations that occur on both Primary and Standby servers are entirely normal continuous archiving and recovery tasks. The primary point of contact between the two database servers is the archive of WAL files that both share: Primary writing to the archive, Standby reading from the archive. Care must be taken to ensure that WAL archives for separate servers do not become mixed together or confused. The magic that makes the two loosely coupled servers work together is simply a restore_command that waits for the next WAL file to be archived from the Primary. The restore_command is specified in the recovery.conf file on the Standby Server. Normal recovery processing would request a file from the WAL archive, causing an error if the file was unavailable. For Standby processing it is normal for the next file to be unavailable, so we
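The waiting restore_command the draft describes can be sketched as a small shell script. Paths and names below are hypothetical, and a production version would also need a timeout and a trigger-file check for failover:

```shell
# Hypothetical waiting restore_command; in recovery.conf it would be used as:
#   restore_command = '/usr/local/bin/wait_restore %f %p'
set -e
cat > /tmp/wait_restore <<'EOF'
#!/bin/sh
archive=/tmp/wal_archive          # shared WAL archive (assumed location)
while [ ! -f "$archive/$1" ]; do  # wait for the file instead of failing
    sleep 1
done
cp "$archive/$1" "$2"
EOF
chmod +x /tmp/wait_restore
# Demo: pretend the Primary archived a segment, then "restore" it.
mkdir -p /tmp/wal_archive
echo demo > /tmp/wal_archive/000000010000000000000001
/tmp/wait_restore 000000010000000000000001 /tmp/restored_segment
cat /tmp/restored_segment
```

The only difference from ordinary recovery is the wait loop: where normal recovery treats a missing file as the end of the log, the Standby treats it as "not shipped yet" and keeps polling.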
Re: [HACKERS] Simplifying standby mode
Simon Riggs wrote: In general, log shipping between servers running different release levels will not be possible. However, it may be possible for servers running different minor release levels e.g. 8.2.1 and 8.2.2 to inter-operate successfully. No formal support for that is offered and there may be minor releases where that is not possible, so it is unwise to rely on that capability. My memory is lousy at the best of times, but when have we had a minor release that would have broken this due to changed format? OTOH, the Primary and Backup servers need the same config settings (e.g. --enable-integer-datetimes), architecture, compiler, etc, do they not? Probably working from an identical set of binaries would be ideal. cheers andrew ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Simplifying standby mode
On Tue, 2006-09-12 at 13:25 -0400, Andrew Dunstan wrote: Simon Riggs wrote: In general, log shipping between servers running different release levels will not be possible. However, it may be possible for servers running different minor release levels e.g. 8.2.1 and 8.2.2 to inter-operate successfully. No formal support for that is offered and there may be minor releases where that is not possible, so it is unwise to rely on that capability. My memory is lousy at the best of times, but when have we had a minor release that would have broken this due to changed format? OTOH, the Primary and Backup servers need the same config settings (e.g. --enable-integer-datetimes), architecture, compiler, etc, do they not? Probably working from an identical set of binaries would be ideal. Not often, which is why I mention the possibility of having interoperating minor release levels at all. If it was common, I'd just put a blanket warning on doing that. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[HACKERS] Release notes
I again will not be able to complete the release notes today as promised. My next target date is Monday, August 18. Sorry. -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Release notes
Bruce Momjian wrote: I again will not be able to complete the release notes today as promised. My next target date is Monday, August 18. Sorry. Will that be in a few years, or are you traveling backwards in time? ;-) cheers andrew ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Release notes
Andrew Dunstan wrote: Bruce Momjian wrote: I again will not be able to complete the release notes today as promised. My next target date is Monday, August 18. Sorry. Will that be in a few years, or are you traveling backwards in time? ;-) Sorry, September 18. I will probably be done before then, but it seems best to set a date I know I will hit. -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Release notes
On Tue, Sep 12, 2006 at 02:31:22PM -0400, Bruce Momjian wrote: I again will not be able to complete the release notes today as promised. My next target date is Monday, August 18. Sorry. The next Monday, August 18, is in 2008. Surely that'll be enough time ;-) -- Michael Fuhr ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Release notes
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Michael Fuhr Sent: 12 September 2006 19:57 To: Bruce Momjian Cc: PostgreSQL-development Subject: Re: [HACKERS] Release notes On Tue, Sep 12, 2006 at 02:31:22PM -0400, Bruce Momjian wrote: I again will not be able to complete the release notes today as promised. My next target date is Monday, August 18. Sorry. The next Monday, August 18, is in 2008. Surely that'll be enough time ;-) Someone will have to speak to Denis about getting Bruce more community time :-) Regards, Dave. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Release notes
Dave Page wrote: -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Michael Fuhr Sent: 12 September 2006 19:57 To: Bruce Momjian Cc: PostgreSQL-development Subject: Re: [HACKERS] Release notes On Tue, Sep 12, 2006 at 02:31:22PM -0400, Bruce Momjian wrote: I again will not be able to complete the release notes today as promised. My next target date is Monday, August 18. Sorry. The next Monday, August 18, is in 2008. Surely that'll be enough time ;-) Someone will have to speak to Denis about getting Bruce more community time :-) It is more family activity that is causing my delays. I was hoping to carve out last weekend to work on it, but I couldn't. I wish I could blame Denis. ;-) -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Simplifying standby mode
Simon Riggs [EMAIL PROTECTED] writes: My memory is lousy at the best of times, but when have we had a minor release that would have broken this due to changed format? OTOH, the Primary and Backup servers need the same config settings (e.g. --enable-integer-datetimes), architecture, compiler, etc, do they not? Probably working from an identical set of binaries would be ideal. Not often, which is why I mention the possibility of having interoperating minor release levels at all. If it was common, I'd just put a blanket warning on doing that. I don't know that it's happened in the past but I wouldn't be surprised. Consider that the bug being fixed in the point release may well be a bug in WAL log formatting. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Simplifying standby mode
Gregory Stark [EMAIL PROTECTED] writes: Simon Riggs [EMAIL PROTECTED] writes: My memory is lousy at the best of times, but when have we had a minor release that would have broken this due to changed format? Not often, which is why I mention the possibility of having interoperating minor release levels at all. If it was common, I'd just put a blanket warning on doing that. I don't know that it's happened in the past but I wouldn't be surprised. Consider that the bug being fixed in the point release may well be a bug in WAL log formatting. This would be the exception, not the rule, and should not be documented as if it were the rule. It's not really different from telling people to expect a forced initdb at a minor release: you are simply misrepresenting the project's policy. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Release notes
Bruce Momjian wrote: Dave Page wrote: -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Michael Fuhr Sent: 12 September 2006 19:57 To: Bruce Momjian Cc: PostgreSQL-development Subject: Re: [HACKERS] Release notes On Tue, Sep 12, 2006 at 02:31:22PM -0400, Bruce Momjian wrote: I again will not be able to complete the release notes today as promised. My next target date is Monday, August 18. Sorry. The next Monday, August 18, is in 2008. Surely that'll be enough time ;-) Someone will have to speak to Denis about getting Bruce more community time :-) It is more family activity that is causing my delays. I was hoping to carve out last weekend to work on it, but I couldn't. I wish I could blame Denis. ;-) Bah!! who needs family ;) -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[HACKERS] UUID datatype progress
Folks, During the past days I have been coding the UUID datatype, which has been fun and a great learning curve for me. Here is the current progress: - basic i/o functions are done. - various records for pg_class, pg_operator, pg_amop, etc. are added - the additions pass the regression check :) - btree and hash indexes go as expected (as far as I could test) - currently 3 inputs and 1 general output are supported (as suggested by Tom) remaining uuid todo list: - binary receive and send functions are still in development. - additional testing, testing and again testing. - cast and convert functions have to be done - develop some kind of a new_guid() function (if required) - develop a SERIAL-like type (has to be discussed yet) If everything goes okay, I should be able to submit a beta patch for reviewing during the next two weeks (and get shot at :)) Regards, Gevik ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
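The "3 inputs and 1 general output" behaviour Gevik mentions might look like the sketch below. The patch's actual accepted formats are not specified in this message, so the three forms here (plain 32-hex-digit, hyphenated, and brace-wrapped) are assumptions, with the hyphenated form as the single canonical output:

```python
import re

def parse_uuid(text: str) -> str:
    """Accept three assumed input forms -- plain hex, hyphenated, and
    brace-wrapped -- and emit one canonical hyphenated lowercase form."""
    s = text.strip().lower()
    if s.startswith("{") and s.endswith("}"):
        s = s[1:-1]
    digits = s.replace("-", "")
    if not re.fullmatch(r"[0-9a-f]{32}", digits):
        raise ValueError(f"invalid uuid: {text!r}")
    return "-".join((digits[0:8], digits[8:12], digits[12:16],
                     digits[16:20], digits[20:32]))

print(parse_uuid("{A0EEBC99-9C0B-4EF8-BB6D-6BB9BD380A11}"))
# a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11
```

Normalising every input to a single internal/output form is what makes btree and hash indexing straightforward: equal UUIDs written differently still compare equal.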
Re: [HACKERS] Simplifying standby mode
Tom Lane [EMAIL PROTECTED] writes: This would be the exception, not the rule, and should not be documented as if it were the rule. It's not really different from telling people to expect a forced initdb at a minor release: you are simply misrepresenting the project's policy. Well it's never been a factor before so I'm not sure there is a policy. Is there now a policy that WAL files like database formats are as far as possible not going to be changed in minor versions? This means if there's a bug fix that affects WAL records the new point release will generally have to be patched to recognise the broken WAL records and process them correctly rather than simply generate corrected records. That could be quite a burden. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Simplifying standby mode
Gregory Stark [EMAIL PROTECTED] writes: Well it's never been a factor before so I'm not sure there is a policy. Is there now a policy that WAL files like database formats are as far as possible not going to be changed in minor versions? This means if there's a bug fix that affects WAL records the new point release will generally have to be patched to recognise the broken WAL records and process them correctly rather than simply generate corrected records. That could be quite a burden. Let's see, so if we needed a bug fix that forced a tuple header layout change or datatype representation change or page header change, your position would be what exactly? The project policy has always been that we don't change on-disk formats in minor releases. I'm not entirely clear why you are so keen on carving out an exception for WAL data. While I can imagine bugs severe enough to make us violate that policy, our track record of not having to is pretty good. And I don't see any reason at all to suppose that such a bug would be more likely to affect WAL (and only WAL) than any other part of our on-disk structures. But having said all that, I'm not sure why we are arguing about it in this context. There was an upthread mention that we ought to recommend using identical executables on master and slave PITR systems, and I think that's a pretty good recommendation in any case, because of the variety of ways in which you could screw yourself through configuration differences. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] dump / restore functionality
None, but feel free to start coding for 8.3. My coding skills are still pretty nascent, but I shall do my best. That seems like a rather spectacular overstatement of the likely benefits, not to mention a misdescription of what was discussed. Once again I get pulled over by the semantics police :) Yes, you are right, that's what was discussed, and that is the functionality I am hoping for, as it would allow scripting the merging of a schema from one database with the table data from another. Did this make it into the to-do list for 8.3 ? ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] dump / restore functionality
Naz Gassiep wrote: None, but feel free to start coding for 8.3. My coding skills are still pretty nascent, but I shall do my best. That seems like a rather spectacular overstatement of the likely benefits, not to mention a misdescription of what was discussed. Once again I get pulled over by the semantics police :) Yes, you are right, that's what was discussed, and that is the functionality I am hoping for, as it would allow scripting the merging of a schema from one database with the table data from another. Did this make it into the to-do list for 8.3 ? Don't worry about the to-do list too much. If you care about it, post a patch; if you keep a link to the archives pointing at this discussion, you can later bang us over our heads if we reject the patch. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] dump / restore functionality
Alvaro Herrera [EMAIL PROTECTED] writes: Naz Gassiep wrote: Did this make it into the to-do list for 8.3 ? Don't worry about the to-do list too much. In particular, if you're imagining that being in the TODO list will in itself cause anyone to work on it, you're much mistaken about how this community operates. Scratching your own itch is the general rule. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] UUID datatype progress
If everything goes okay, I should be able to submit a beta patch for reviewing during next two weeks (and get shot at :)) /me starts loading... -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?
Stefan Kaltenbrunner [EMAIL PROTECTED] writes: Tom Lane wrote: Could we see the actual EXPLAIN ANALYZE results for the slow plan? http://www.kaltenbrunner.cc/files/dbt3_explain_analyze.txt Well, indeed it seems that the hash join is just an innocent bystander: the bulk of the runtime (all but about 120 sec in fact) is spent here: - Nested Loop (cost=13.65..1719683.85 rows=12000672 width=49) (actual time=60.325..24923860.713 rows=11897899 loops=1) - Merge Join (cost=0.00..10248.66 rows=2 width=41) (actual time=16.654..2578.060 rows=19837 loops=1) ... - Bitmap Heap Scan on lineitem (cost=13.65..77.16 rows=665 width=16) (actual time=13.492..1254.535 rows=600 loops=19837) Recheck Cond: (lineitem.l_suppkey = supplier.s_suppkey) - Bitmap Index Scan on i_l_suppkey (cost=0.00..13.65 rows=665 width=0) (actual time=10.662..10.662 rows=600 loops=19837) Index Cond: (lineitem.l_suppkey = supplier.s_suppkey) I suppose that the profile result you showed was taken during the startup transient where it was computing the hashtables that this loop's results are joined to ... but that's not where the problem is. The problem is repeating that bitmap scan on lineitem for nearly 20000 different l_suppkeys. Apparently we've made the planner a bit too optimistic about the savings that can be expected from repeated indexscans occurring on the inside of a join. The other plan uses a different join order and doesn't try to join lineitem until it's got orders.o_orderkey, whereupon it does a mergejoin against an indexscan on lineitem: - Index Scan using i_l_orderkey on lineitem (cost=0.00..2715943.34 rows=60003360 width=16) (actual time=32.868..123668.380 rows=59991868 loops=1) The runtimes for the remainders of the plans are roughly comparable, so it's the cost of joining lineitem that is hurting here. Is lineitem sorted (or nearly sorted) by l_orderkey? Part of the problem could be overestimating the cost of this indexscan. 
What are the physical sizes of lineitem and its indexes, and how do those compare to your RAM? What are you using for planner settings (particularly effective_cache_size)? regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
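The observation above falls straight out of the plan's own numbers: EXPLAIN ANALYZE reports actual times per loop for inner nodes, so the inner bitmap heap scan's per-loop time multiplied by its loop count recovers nearly the whole Nested Loop total:

```python
# Per-loop time of the inner Bitmap Heap Scan, times the number of outer
# rows driving it, accounts for almost all of the Nested Loop's runtime.
loops = 19837                 # loops= from the Bitmap Heap Scan node
per_loop_ms = 1254.535        # its per-loop "actual time" upper bound, in ms
nested_loop_total_ms = 24923860.713   # the Nested Loop node's total

total_ms = loops * per_loop_ms
print(round(total_ms))        # roughly 24.9 million ms, matching the node total
```

In other words, the slow plan's cost is dominated by re-running one inner scan nearly twenty thousand times, which is exactly the repeated-indexscan savings estimate Tom is questioning.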
Re: [HACKERS] pgbench is badly broken since July
I have just realized that the recent patches in pgbench have altered its behavior in a way that destroys reproducibility of results --- I'm seeing reported TPS numbers about twice what they were before that. I'd love to say we did something in the past month that made the backend 2X faster, but sadly, tain't so :-( I've only done limited investigation but what seems to be happening is that the -s scale factor is no longer being applied to adjust the range of the random variables, leading to only part of the database being touched. For example, with -s = 10, the older code updates all 10 branches rows but CVS HEAD only seems to touch bid = 1. As far as I'm concerned the bug is there, but in a different way from what you said. It seems -s works. The problem is that pgbench does not get the scaling factor from the database (that is, the number of rows in the branches table), and instead always uses the default scaling factor of 1. I have fixed the problem in CVS HEAD. I see that the code still thinks it is scaling the numbers, but the mechanism for doing that has changed drastically, and I suspect that's the problem. I assume there's some garden-variety bug in there. At a more fundamental level, the recent patches seem to mean that pgbench is doing noticeably more work to issue its queries than it was before, and I wonder whether this isn't going to destroy cross-version reproducibility in its own way. Have you done any timing studies to verify that the new and old code have comparable performance? I don't mind enhancing pgbench's functionality for non-default tests, but I think it's important that the default case remain comparable over time. If there is a noticeable speed difference then I'd vote for going back to the klugy old code for adjusting the default script's values. Let me do further performance testing. And at the nitpicking level, is :tps a good name for the variable that reflects the scaling factor? It seems awfully easy to confuse that with the TPS numbers that pgbench reports. 
Perhaps :scale or some such would be better. I replaced all occurrences of :tps with :scale. -- Tatsuo Ishii SRA OSS, Inc. Japan ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
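For reference, the scaling factor under discussion multiplies pgbench's standard TPC-B-style table sizes, which is why reading it back from the row count of branches works. A quick sketch of those documented defaults:

```python
# pgbench's standard tables scale linearly with -s: per scale unit there is
# 1 branches row, 10 tellers rows, and 100000 accounts rows (history starts empty).
def table_rows(scale: int) -> dict:
    return {
        "branches": 1 * scale,
        "tellers": 10 * scale,
        "accounts": 100000 * scale,
    }

print(table_rows(10))  # -s 10: 10 branches, 100 tellers, 1,000,000 accounts
```

This is also why misreading the scale as 1, as in the bug above, confines all branch updates to bid = 1 and inflates the reported TPS: the random ranges shrink to a fraction of the real database.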
Re: [HACKERS] Release notes
Bruce Momjian wrote: Dave Page wrote: -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Michael Fuhr Sent: 12 September 2006 19:57 To: Bruce Momjian Cc: PostgreSQL-development Subject: Re: [HACKERS] Release notes On Tue, Sep 12, 2006 at 02:31:22PM -0400, Bruce Momjian wrote: I again will not be able to complete the release notes today as promised. My next target date is Monday, August 18. Sorry. The next Monday, August 18, is in 2008. Surely that'll be enough time ;-) Someone will have to speak to Denis about getting Bruce more community time :-) It is more family activity that is causing my delays. I was hoping to carve out last weekend to work on it, but I couldn't. I wish I could blame Denis. ;-) The family is more important than PostgreSQL. Having fun with the family indeed gives energy to someone to work. So, go family fun! Best Regards, Carlo Florendo Astra Philippines Inc. www.astra.ph ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Optimizer improvements: to do or not to do?
Peter Eisentraut wrote: If you know of a more effective way to do that beyond the effective_cache_size parameter that we have now, let us know. I don't know of a better way, and that is not my goal at all. I am thinking about more accurate cost estimation of nested loop joins and subqueries. A usual case in a query is joining a detail table and some master tables into a single relation. Often the master tables are small, and after some nested loop iterations they are well (perhaps wholly) cached. Cost estimation of the tables' access paths does not take such caching into account and causes overestimation. In some cases this can lead to choosing a plan that is not the best. An example from real life: the following query returns the count of national calls from the call registration table. select count(*) from conn.conn20060803 c where exists (select code from trunk_codes tc where c.bnum = tc.code and c.bnum like tc.code || '%' order by tc.code desc limit 1) enable_seqscan = off: Aggregate (cost=103185258.68..103185258.69 rows=1 width=0) (actual time=13385.674..13385.676 rows=1 loops=1) - Seq Scan on conn20060803 c (cost=1.00..103184640.52 rows=247264 width=0) (actual time=0.409..13307.254 rows=38739 loops=1) Filter: (subplan) SubPlan - Limit (cost=0.00..6.42 rows=1 width=10) (actual time=0.020..0.020 rows=0 loops=494527) - Index Scan Backward using belg_mobile_pkey on belg_mobile tc (cost=0.00..6.42 rows=1 width=10) (actual time=0.012..0.012 rows=0 loops=494527) Index Cond: (($0)::text = (code)::text) Filter: (($0)::text ~~ ((code)::text || '%'::text)) Total runtime: 13385.808 ms enable_seqscan = on: Aggregate (cost=1101623.47..1101623.48 rows=1 width=0) (actual time=63724.508..63724.509 rows=1 loops=1) - Seq Scan on conn20060803 c (cost=0.00..1101005.30 rows=247264 width=0) (actual time=2.244..63640.413 rows=38739 loops=1) Filter: (subplan) SubPlan - Limit (cost=2.20..2.20 rows=1 width=10) (actual time=0.121..0.121 rows=0 loops=494527) - Sort (cost=2.20..2.20 rows=1 width=10) (actual time=0.114..0.114 rows=0 
loops=494527) Sort Key: code - Seq Scan on belg_mobile tc (cost=0.00..2.19 rows=1 width=10) (actual time=0.096..0.099 rows=0 loops=494527) Filter: ((($0)::text = (code)::text) AND (($0)::text ~~ ((code)::text || '%'::text))) Total runtime: 63724.630 ms ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
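The overestimation described above comes from the planner charging roughly the same cost for every probe of the small inner table, even though after a few iterations that table is fully cached. A minimal numerical sketch of that gap (an illustrative model only; the function names, warm-up count, and per-probe costs are invented for this example and are not PostgreSQL internals):

```python
# Illustrative model: planner-style estimate vs. cache-aware actual cost
# for ~494,527 probes of a tiny inner table (loop count taken from the
# EXPLAIN ANALYZE output above). Costs are in arbitrary planner units.

def planner_cost(loops, per_probe_cost):
    """Planner-style estimate: every probe is charged the full cost."""
    return loops * per_probe_cost

def cached_cost(loops, per_probe_cost, cached_probe_cost, warmup):
    """Cache-aware model: the first `warmup` probes fault pages in;
    the remaining probes hit the buffer cache and are nearly free."""
    cold = min(loops, warmup)
    warm = loops - cold
    return cold * per_probe_cost + warm * cached_probe_cost

loops = 494527                        # loops=494527 from the subplan above
estimate = planner_cost(loops, 2.20)  # 2.20 = per-iteration cost in the plan
actual = cached_cost(loops, 2.20, 0.01, warmup=10)
print(f"estimated: {estimate:.2f}, cache-aware: {actual:.2f}")
```

In this toy model the planner-style figure ends up more than two orders of magnitude above the cache-aware one, which is the kind of gap that can push the planner toward the slower plan shown in the second EXPLAIN.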
Re: [HACKERS] Optimizer improvements: to do or not to do?
Say42 [EMAIL PROTECTED] writes: A usual case in a data request is joining a detail table and some master tables into a single relation. Optimizing on the basis of only one example is seldom a good idea... and I think you are falling into that trap by supposing that there is a usual case. regards, tom lane
Re: [HACKERS] Optimizer improvements: to do or not to do?
Say42 wrote: select count(*) from conn.conn20060803 c where exists (select code from belg_mobile tc ... Correction: replace 'trunk_codes' with 'belg_mobile'.
Re: [HACKERS] Getting a move on for 8.2 beta
On Fri, Sep 01, 2006 at 03:28:36PM -0400, Stephen Frost wrote: Overall, I really think 8.2 is going to be an excellent release. I wish autovacuum could have been enabled by default. I'd just like to ask now (and I'll try to remember to ask again once 8.2 is out): please let's turn it on by default for 8.3, and early on, so we get some good testing of it. Can someone put this on the TODO, just so we (hopefully) don't forget about it? -- Jim C. Nasby, Database Architect [EMAIL PROTECTED] 512.569.9461 (cell) http://jim.nasby.net
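For anyone trying this out on the 8.1/8.2 series in the meantime, note that autovacuum also needs the row-level statistics collector. A minimal postgresql.conf sketch (parameter names as of the 8.1/8.2 era; check the docs for your release before relying on them):

```
# postgresql.conf -- enable autovacuum on 8.1/8.2
stats_start_collector = on   # autovacuum depends on the stats collector
stats_row_level = on         # ...and on row-level statistics
autovacuum = on
```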
Re: [HACKERS] Optimizer improvements: to do or not to do?
Tom Lane wrote: Optimizing on the basis of only one example is seldom a good idea... and I think you are falling into that trap by supposing that there is a usual case. Perhaps I am wrong, but I assume normalization is a usual case, and small master (parent) tables are not very rare either. Yes, my example is unusual, but it is _real_ and demonstrates the PG optimizer's inaccuracy. Why don't we make the PG optimizer match reality more closely if we can? Is this so needless, and am I making a mountain out of a molehill?