Re: [HACKERS] 8.3 GSS Issues
On Thu, Oct 25, 2007 at 05:39:37PM -0700, Henry B. Hotz wrote: On Oct 25, 2007, at 3:27 PM, Stephen Frost wrote: * Henry B. Hotz ([EMAIL PROTECTED]) wrote: What you're asking for is basically a krb_match_realm parameter, or do I understand you wrong? I'm asking for name matching to be done in accordance with the GSSAPI recommendations. That's all I want, but it's actually necessary for this feature to be at all usable in my environment. If we don't, then I suggest we pull this feature until it can be done correctly. I know what you want, that's fairly obvious. I'm only asking about *how* to do it the best way. What the krb5 method does is IMO a documented bug. The realm name is part of the name. As I explained at some length, you cannot assume the username (first component of the principal) has any meaning by itself, except in small deployments with no external trust agreements. Kerberos (and AD) are designed to support larger infrastructures with multiple organizations. This isn't unexpected for PG as the current krb5 support does this. I'm not a big fan of it, but at the same time I don't feel it's justification to drop it from 8.3. Having it only allow the default realm would be an option which could work in 8.3, imv. I don't think the fact that the existing krb5 code does the wrong thing (and can't be used in an environment with cross-realm agreements) is justification for doing the wrong thing in a new capability. The code in my original patch would do the latter (default realm only). More precisely: if you do a gss_import_name() on smith and [EMAIL PROTECTED] you get the same internal representation, and gss_compare_name() will tell you they're the same. Also gss_compare_name() will tell you [EMAIL PROTECTED] is different from either of the first two. Wouldn't using a specific parameter like krb_match_realm=YOUR.REALM that you set in the config file be more flexible?
In that it actually allows scenarios like server/resource domains (not sure how common they are in unix krb setups, but they're certainly not unfamiliar in the Windows AD world)? If we don't use gss_compare_name(), or some similar mechanism, to compare connection names to PG usernames, then I don't think GSSAPI support should be included in 8.3. I think that's a horrible idea, given that it works perfectly fine the way it is now for the vast majority of users. That said, we should certainly fix it in one way or another for 8.3. But if that fails, I see no reason at all to pull the feature. Longer term (since it's likely too late to be accepted now), as I think has been discussed in the past, PG could really use a .k5login-esque, either admin-only (a la pg_hba.conf / ident map) or per-user (some sort of ALTER ROLE that a user could do on himself?), mapping functionality. There has been discussion of a general mapping layer between authentication names and authorization/role names. I think that's the way to go. I haven't thought about who or where the administration of the mapping ought to be. Yeah, I agree that something like that would be a good long-term solution. For a proper discussion of this topic I recommend the section starting on page 64 of Sun's Security for Developers Guide, document 816-4863. Note that there is a discussion of how to do compares efficiently. IIRC my patch did things the easy way described on page 67. In the long run it's possible we'd want to do it the fast way described on page 69, but that's merely an optimization and might not be needed. Do you have a URL for this? Or is it a book one has to buy? //Magnus ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] module archive
On 10/25/07, Andrew Dunstan [EMAIL PROTECTED] wrote: From time to time people have raised the idea of a CPAN-like mechanism for downloading, building and installing extensions and the like (types, functions, sample dbs, anything not requiring Postgres itself to be rebuilt), and I have been thinking on this for the last few days. What sort of requirements would people have of such a mechanism? How do people envision it working? From a user's point of view, it would work like this:

$ pgsql-pkg list
Installed packages:
--
pgpool-II v.1.2.1
plR v.8.2.0.5
$ cd /usr/src/pgsql-packages/
$ cvs update
$ cd languages/plR
$ make upgrade
$ cd ../../compatibility/orafce
$ make install
$ pgsql-pkg list
Installed packages:
--
pgpool-II v.1.2.1
plR v.8.2.0.7
orafce v.2.1.2

A starting point worth considering could be pkgsrc, the packaging system used by NetBSD ( http://www.netbsd.org/docs/software/packages.html ). Pkgsrc is not limited to NetBSD: it works on other OSes as well (Linux, *BSD, Solaris, HP-UX, ...). Pkgsrc is based on a directory tree organized in categories. For each package there is a directory that contains the Makefile, the description, and sometimes patches. The user can do: make build, make install, make upgrade, etc. The sources are downloaded from the server where the project is originally hosted: NetBSD is not duplicating the source archives on their server. If necessary, the build system can apply patches before compiling the package. This allows the NetBSD folks to adapt the package without requiring modifications to the original sources. So, the mechanism for PostgreSQL extensions could work this way:
(A) The source tree would be an adapted version of pkgsrc:
- the extensions are organized into directories (based on categories)
- each extension would be a directory that basically contains a Makefile with enough information for downloading the source tarball, optionally applying patches, and installing the package (extension).
- probably it will be necessary to create a separate source tree for each PostgreSQL version, so that the build system wouldn't be cluttered with keeping information about which package version is compatible with which PostgreSQL version.
- with such a system it is possible to include an extension even if the maintainer of that particular project doesn't modify anything to make it compatible with the extensions system.
(B) Some extensions are only sql scripts that install new objects into a database, while others could also contain binaries, libraries, or maybe other stuff as well. Therefore, the installation probably has to be two-step:
step 1: 'make install' would copy the files into a new subdirectory called extensions:
/usr/local/pgsql/extensions/bin
/usr/local/pgsql/extensions/lib
/usr/local/pgsql/extensions/install_scripts
/usr/local/pgsql/extensions/remove_scripts
/usr/local/pgsql/extensions/examples
and add the extension to the list of 'installed extensions'.
step 2: to activate the extension inside a particular database, the user would execute the extension's install_script.
The first step is installing the extension at the database cluster level, while the second step is installing the extension inside a particular database. I think that this distinction is an important one: the user will need full control to install the extensions only in the databases where she/he wants to.
(C) It would be nice to have a table containing information about the extensions currently installed.
Cheers, Adrian Maier
Re: [PERFORM] [HACKERS] 8.3beta1 testing on Solaris
Josh Berkus [EMAIL PROTECTED] writes: Actually, 32 made a significant difference as I recall ... do you still have the figures for that, Jignesh? Well it made a difference but it didn't remove the bottleneck, it just moved it. IIRC under that benchmark Jignesh was able to run with x sessions efficiently with 8 clog buffers, x + 100 or so sessions with 16 clog buffers and x + 200 or so sessions with 32 clog buffers. It happened that x + 200 was the number of sessions he wanted to run the benchmark at so it helped the benchmark results quite a bit. But that was just an artifact of how many sessions the benchmark needed. A user who needs 1200 sessions or who has a different transaction load might find he needs more clog buffers to alleviate the bottleneck. And of course most (all?) normal users use far fewer sessions and won't run into this bottleneck at all. Raising NUM_CLOG_BUFFERS just moves around the arbitrary bottleneck. This benchmark is useful in that it gives us an idea where the bottleneck lies for various values of NUM_CLOG_BUFFERS but it doesn't tell us what value realistic users are likely to bump into. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] PostgreSQL 8.3, libpq and WHERE CURRENT OF
Sebastien FLAESCH wrote: Just looked at the new features of 8.3 and realized that positioned updates/deletes is now possible with this new release... We would use that if we could define the cursor name with a libpq function. I don't understand. When you open a cursor with DECLARE CURSOR, you give it a name. Doesn't that do what you want? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] module archive
On Thu, Oct 25, 2007 at 12:31:30PM -0400, Andrew Dunstan wrote: From time to time people have raised the idea of a CPAN-like mechanism for downloading, building and installing extensions and the like (types, functions, sample dbs, anything not requiring Postgres itself to be rebuilt), and I have been thinking on this for the last few days. What sort of requirements would people have of such a mechanism? How do people envision it working? There's been some discussion before: http://archives.postgresql.org/pgsql-patches/2006-05/msg00327.php The most useful feature IMHO would be if the installed modules could be recognised as a block (using pg_depend) so that: 1. pg_dump can see them and emit only INSTALL module 2. Uninstall removes everything that was installed This only covers installation though, not building or compiling. Have a nice day, -- Martijn van Oosterhout [EMAIL PROTECTED] http://svana.org/kleptog/ From each according to his ability. To each according to his ability to litigate.
Re: [HACKERS] module archive
On Thursday, October 25, 2007, Andrew Dunstan wrote: From time to time people have raised the idea of a CPAN-like mechanism for downloading, building and installing extensions and the like (types, functions, sample dbs, anything not requiring Postgres itself to be rebuilt), and I have been thinking on this for the last few days. What sort of requirements would people have of such a mechanism? How do people envision it working? Downloading, building, and installing extensions is actually fairly standardized already (well, perhaps there are 2 or 3 standards, but CPAN has that as well). I think the inhibitions relate more to the management of what is installed. I imagine we need a package manager inside of PostgreSQL to manage installation, setup, removal, dependencies and so on. Much like rpm or dpkg really. That should replace the current run this .sql file mechanism, much like rpm and dpkg replaced the run make install and trust me mechanism. I have some of this mapped out in my head if there is interest. We'd also need easy integration with the real rpm and dpkg, so that distribution packages can be built easily and I can run apt-get install postgresql extension1 extension2 and it's all set up. -- Peter Eisentraut http://developer.postgresql.org/~petere/
[HACKERS] PostgreSQL 8.3, libpq and WHERE CURRENT OF
Hi hackers, Just looked at the new features of 8.3 and realized that positioned updates/deletes are now possible with this new release... We would use that if we could define the cursor name with a libpq function. Something similar to ODBC's SQLSetCursorName() function... For now we must use OIDs to emulate WHERE CURRENT OF, but since 8.1, OIDs are no longer created by default and require additional configuration when setting up a PostgreSQL server. If I missed something, please point me to the docs where I can find this. Understand this is not an isolated project: We are a Development Tool vendor and have a bunch of customers migrating legacy applications from Informix to PostgreSQL... Thanks a lot! Sebastien FLAESCH Database Interfaces Four J's Development Tools
Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes
Gokulakannan Somasundaram wrote: As far as Load Test is concerned, I have tried to provide all the relevant details. Please inform me, if I have left any. Thanks! How large were the tables? Did you run all the queries concurrently? At this point, I think it'd be better to run them separately so that you can look at the impact on each kind of operation in isolation. What kind of an I/O system does the server have? It'd be interesting to get the cache hit/miss ratios, as well as the output of iostat (or similar) during the test. How much of the benefit is due to reduced random I/O? What do the numbers look like if the tables are small enough to fit in RAM? You should do some tuning; the PostgreSQL default configuration is not tuned for maximum performance. At least increase checkpoint_segments and checkpoint_timeout and shared_buffers. Though I noticed that you're running on Windows; I don't think anyone's done any serious performance testing or tuning on Windows yet, so I'm not sure how you should tune that. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes
On 10/26/07, Heikki Linnakangas [EMAIL PROTECTED] wrote: Gokulakannan Somasundaram wrote: As far as Load Test is concerned, I have tried to provide all the relevant details. Please inform me, if I have left any. Thanks! How large were the tables? It is in the Performance test report. They contain 2 million records, 6 columns wide, 3 text and 3 numeric. The same set of tables was used for both tests, after a refresh from a file. Did you run all the queries concurrently? At this point, I think it'd be better to run them separately so that you can look at the impact on each kind of operation in isolation. Performance tests are run against a workload, and I have taken the workload of a small scale partitioning setup. Running the queries individually has already been done and the count of logical reads has been verified. I have already suggested that. For some reason, I have not been able to convince anyone that for simple index scans, logical reads are a good measure of performance. What kind of an I/O system does the server have? It's a normal desktop system. The model no. is ST3400633A, 7200 RPM. It'd be interesting to get the cache hit/miss ratios, as well as the output of iostat (or similar) during the test. How much of the benefit is due to reduced random I/O? Good suggestion. I have run the test against Windows. Let me try perfmon in the next performance test, to monitor the run. What do the numbers look like if the tables are small enough to fit in RAM? I don't know whether this is a valid production setup, against which we need to benchmark. But if you insist, I will do that and get back to you next time. You should do some tuning; the PostgreSQL default configuration is not tuned for maximum performance. At least increase checkpoint_segments and checkpoint_timeout and shared_buffers. Though I noticed that you're running on Windows; I don't think anyone's done any serious performance testing or tuning on Windows yet, so I'm not sure how you should tune that.
What we are trying to do here is to try and compare the performance of two indexing structures. AFAIK, a performance test done to compare two software implementations should not have parameter settings favorable to one. I have not done any settings change favorable to the thick index. But I have a limited setup, from which I am trying to contribute. So please don't ask me to run the tests against large scale servers. I think a better idea would be to form a performance testing workload mix (taking into account the QoS parameters used in the normal database, purging frequency, and typical workload models used in the industry), with freedom in the choice of hardware/software. That might solve some of the load test riddles. -- Thanks, Gokul. CertoSQL Project, Allied Solution Groups. (www.alliedgroups.com)
Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes
Gokulakannan Somasundaram wrote: On 10/26/07, Heikki Linnakangas [EMAIL PROTECTED] wrote: Gokulakannan Somasundaram wrote: As far as Load Test is concerned, I have tried to provide all the relevant details. Please inform me, if I have left any. Thanks! How large were the tables? It is in the Performance test report. They contain 2 million records, 6 columns wide, 3 text and 3 numeric. The same set of tables was used for both tests, after a refresh from a file. I meant in megabytes. How wide is the data in the text and numeric fields? Did you run all the queries concurrently? At this point, I think it'd be better to run them separately so that you can look at the impact on each kind of operation in isolation. Performance tests are run against a workload, and I have taken the workload of a small scale partitioning setup. Running the queries individually has already been done and the count of logical reads has been verified. I have already suggested that. For some reason, I have not been able to convince anyone that for simple index scans, logical reads are a good measure of performance. I wouldn't expect any performance gain for simple, not index-only, scans. They have to hit the heap anyway. What do the numbers look like if the tables are small enough to fit in RAM? I don't know whether this is a valid production setup, against which we need to benchmark. But if you insist, I will do that and get back to you next time. A lot of people run databases that fit in RAM. And a lot of people don't. Both cases are interesting. I'm particularly curious about that because you've argued that the number of logical reads is important, even if they don't become physical reads. Hannu also suggested that swapping pages in/out of shared_buffers is relatively expensive; if that's the case, we should see index-only scans performing much better than regular index scans, even when there's no physical I/O. You should do some tuning; the PostgreSQL default configuration is not tuned for maximum performance.
At least increase checkpoint_segments and checkpoint_timeout and shared_buffers. Though I noticed that you're running on Windows; I don't think anyone's done any serious performance testing or tuning on Windows yet, so I'm not sure how you should tune that. What we are trying to do here is to try and compare the performance of two indexing structures. AFAIK, a performance test done to compare two software implementations should not have parameter settings favorable to one. I have not done any settings change favorable to the thick index. The tuning I suggested is just basic tuning any knowledgeable Postgres DBA will do. It's not particularly in favor of any indexing scheme. With the default checkpoint settings, for example, the system is going to be busy doing checkpoints all the time if you have a reasonable rate of updates. But I have a limited setup, from which I am trying to contribute. So please don't ask me to run the tests against large scale servers. Understood. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
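For reference, the basic tuning Heikki mentions amounts to a few lines of postgresql.conf. The values below are illustrative 8.3-era starting points only, not recommendations for this particular benchmark:

```
shared_buffers = 256MB        # 8.3 default is 32MB
checkpoint_segments = 32      # default is 3
checkpoint_timeout = 15min    # default is 5min
```

Raising checkpoint_segments and checkpoint_timeout spreads checkpoints out, which matters for any update-heavy comparison regardless of the indexing scheme being tested.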
Re: [HACKERS] 8.2.3: Server crashes on Windows using Eclipse/Junit
On Mon, Oct 22, 2007 at 01:19:24PM -0700, Trevor Talbot wrote: On 10/22/07, Magnus Hagander [EMAIL PROTECTED] wrote: Trevor Talbot wrote: I'd probably take the approach of combining win32_waitpid() and threads. You'd end up with 1 thread per 64 backends; when something interesting happens the thread could push the info onto a queue, which the new win32_waitpid() would check. Use APCs to add new backends to threads with free slots. I was planning to make it even easier and let Windows do the job for us, just using RegisterWaitForSingleObject(). Does the same - one thread per 64 backends, but we don't have to deal with the queueing ourselves. Oh, good call -- I keep forgetting the native thread pool exists. Taking this one to -hackers once and for all now... Can you try the attached patch? See how many backends you can get up to. This patch changes from using a single thread for each backend started to using the builtin threadpool functionality. It also replaces the pid/handle arrays with an i/o completion port. The net result is also, imho, much more readable code :-) Beware - there's still plenty of debugging code in there :-) //Magnus Index: src/backend/postmaster/postmaster.c === RCS file: /projects/cvsroot/pgsql/src/backend/postmaster/postmaster.c,v retrieving revision 1.542 diff -c -r1.542 postmaster.c *** src/backend/postmaster/postmaster.c 26 Sep 2007 22:36:30 - 1.542 --- src/backend/postmaster/postmaster.c 26 Oct 2007 11:46:45 - *** *** 331,344 #ifdef EXEC_BACKEND #ifdef WIN32 - static void win32_AddChild(pid_t pid, HANDLE handle); - static void win32_RemoveChild(pid_t pid); static pid_t win32_waitpid(int *exitstatus); ! static DWORD WINAPI win32_sigchld_waiter(LPVOID param); ! static pid_t *win32_childPIDArray; ! static HANDLE *win32_childHNDArray; ! static unsigned long win32_numChildren = 0; HANDLEPostmasterHandle; #endif --- 331,347 #ifdef EXEC_BACKEND #ifdef WIN32 static pid_t win32_waitpid(int *exitstatus); ! 
static void WINAPI pgwin32_deadchild_callback(PVOID lpParameter, BOOLEAN TimerOrWaitFired); ! static HANDLE win32ChildQueue; ! ! typedef struct ! { ! HANDLE waitHandle; ! HANDLE procHandle; ! DWORD procId; ! } win32_deadchild_waitinfo; HANDLEPostmasterHandle; #endif *** *** 899,914 #ifdef WIN32 /* !* Initialize the child pid/HANDLE arrays for signal handling. */ ! win32_childPIDArray = (pid_t *) ! malloc(mul_size(NUM_BACKENDARRAY_ELEMS, sizeof(pid_t))); ! win32_childHNDArray = (HANDLE *) ! malloc(mul_size(NUM_BACKENDARRAY_ELEMS, sizeof(HANDLE))); ! if (!win32_childPIDArray || !win32_childHNDArray) ereport(FATAL, ! (errcode(ERRCODE_OUT_OF_MEMORY), !errmsg(out of memory))); /* * Set up a handle that child processes can use to check whether the --- 902,913 #ifdef WIN32 /* !* Initialize I/O completion port used to deliver list of dead children. */ ! win32ChildQueue = CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, 1); ! if (win32ChildQueue == NULL) ereport(FATAL, ! (errmsg(could not create I/O completion port for child queue))); /* * Set up a handle that child processes can use to check whether the *** *** 2072,2083 #define LOOPHEADER() (exitstatus = status.w_status) #else /* WIN32 */ #define LOOPTEST()((pid = win32_waitpid(exitstatus)) 0) ! /* !* We need to do this here, and not in CleanupBackend, since this is !* to be called on all children when we are done with them. Could move !* to LogChildExit, but that seems like asking for future trouble... !*/ ! #define LOOPHEADER() (win32_RemoveChild(pid)) #endif /* WIN32 */ #endif /* HAVE_WAITPID */ --- 2071,2077 #define LOOPHEADER() (exitstatus = status.w_status) #else /* WIN32 */ #define LOOPTEST()((pid = win32_waitpid(exitstatus)) 0) ! 
#define LOOPHEADER() #endif /* WIN32 */ #endif /* HAVE_WAITPID */ *** *** 3332,3343 int i; int j; charcmdLine[MAXPGPATH * 2]; - HANDLE childHandleCopy; - HANDLE waiterThread; HANDLE paramHandle; BackendParameters *param; SECURITY_ATTRIBUTES sa; charparamHandleStr[32]; /* Make sure caller set up argv properly */ Assert(argc = 3); --- 3326,3336 int i; int j; charcmdLine[MAXPGPATH * 2];
Re: [HACKERS] 8.2.3: Server crashes on Windows using Eclipse/Junit
On Fri, Oct 26, 2007 at 05:25:39AM -0700, Trevor Talbot wrote: On 10/26/07, Magnus Hagander [EMAIL PROTECTED] wrote: Can you try the attached patch? See how many backends you can get up to. This patch changes from using a single thread for each backend started to using the builtin threadpool functionality. It also replaces the pid/handle arrays with an i/o completion port. The net result is also, imho, much more readable code :-) The patch looks good; I'm not set up to build yet, but I should be able to test it sometime in the next week. I've uploaded a set of binary files to http://www.hagander.net/pgsql/pgsql_83_snapshot_win32child.zip. You'll need to get the dependency DLLs elsewhere, but you may have them already. //Magnus
Re: [PATCHES] [HACKERS] Including Snapshot Info with Indexes
On 10/26/07, Heikki Linnakangas [EMAIL PROTECTED] wrote: Gokulakannan Somasundaram wrote: On 10/26/07, Heikki Linnakangas [EMAIL PROTECTED] wrote: Gokulakannan Somasundaram wrote: As far as Load Test is concerned, I have tried to provide all the relevant details. Please inform me, if I have left any. Thanks! How large were the tables? It is in the Performance test report. They contain 2 million records, 6 columns wide, 3 text and 3 numeric. The same set of tables was used for both tests, after a refresh from a file. I meant in megabytes. How wide is the data in the text and numeric fields? I have observed the size of the PGDATA\base folder for size details:
Size of Tables : 367 MB
Size of Tables + thin indexes : 616 MB
Size of Tables + thick indexes : 720 MB
The numbers were simply running between 1 and 2 million in a serial fashion. I think I made a mistake here. This would have helped thin indexes in range scans: since the data is clustered in the table, the bitmap heap scan would have been more effective. So I hope thick indexes will be more effective if I uncluster the data, since the thin index has to go to more table buffers. The test columns are approx 10 characters in length. Did you run all the queries concurrently? At this point, I think it'd be better to run them separately so that you can look at the impact on each kind of operation in isolation. Performance tests are run against a workload, and I have taken the workload of a small scale partitioning setup. Running the queries individually has already been done and the count of logical reads has been verified. I have already suggested that. For some reason, I have not been able to convince anyone that for simple index scans, logical reads are a good measure of performance. I wouldn't expect any performance gain for simple, not index-only, scans. They have to hit the heap anyway. I just feel the above test didn't do much I/O, and yet the index-only scans are faster with thick indexes.
Since the size of RAM is 1GB and the size of the data is only 616MB, I hope most of it might have been OS cached. Maybe I am missing something here. What do the numbers look like if the tables are small enough to fit in RAM? I don't know whether this is a valid production setup, against which we need to benchmark. But if you insist, I will do that and get back to you next time. A lot of people run databases that fit in RAM. And a lot of people don't. Both cases are interesting. I'm particularly curious about that because you've argued that the number of logical reads is important, even if they don't become physical reads. Hannu also suggested that swapping pages in/out of shared_buffers is relatively expensive; if that's the case, we should see index-only scans performing much better than regular index scans, even when there's no physical I/O. So the above test has fit into the RAM. Now do we need a test with tables that won't fit into RAM? I feel that if the thick indexes were effective with data that would fit into RAM, then they will definitely be more effective with data that wouldn't fit into RAM. There is one performance bug with updates, where the caching strategy for the BTStack wasn't effective for the Varlena structures. I will fix that bug next time. Also, calls to HOT related stuff can be avoided if it happens to be a thick index. I think these two changes, if made, would further improve the performance of thick indexes. You should do some tuning; the PostgreSQL default configuration is not tuned for maximum performance. At least increase checkpoint_segments and checkpoint_timeout and shared_buffers. Though I noticed that you're running on Windows; I don't think anyone's done any serious performance testing or tuning on Windows yet, so I'm not sure how you should tune that. What we are trying to do here is to try and compare the performance of two indexing structures.
AFAIK, a performance test done to compare two software implementations should not have parameter settings favorable to one. I have not done any settings change favorable to the thick index. The tuning I suggested is just basic tuning any knowledgeable Postgres DBA will do. It's not particularly in favor of any indexing scheme. With the default checkpoint settings, for example, the system is going to be busy doing checkpoints all the time if you have a reasonable rate of updates. The inserts and updates were at the rate of 10 every 2 seconds (it's in the performance report), and the update was affecting two rows. I haven't got any warning to increase the checkpoint settings during the test. But my doubt is: if checkpoints had caused as much overhead as we think, how could the performance of thick indexes exceed thin indexes in index-only scans? As you might have observed, all the statistics (even the 90th and 95th percentile/median) were in milliseconds. So that might give a hint about the stress on the
Re: [HACKERS] 8.2.3: Server crashes on Windows using Eclipse/Junit
On 10/26/07, Magnus Hagander [EMAIL PROTECTED] wrote: Can you try the attached patch? See how many backends you can get up to. This patch changes from using a single thread for each backend started to using the builtin threadpool functionality. It also replaces the pid/handle arrays with an i/o completion port. The net result is also, imho, much more readable code :-) The patch looks good; I'm not set up to build yet, but I should be able to test it sometime in the next week.
Re: [HACKERS] PANIC caused by open_sync on Linux
Greg Smith [EMAIL PROTECTED] writes: On Fri, 26 Oct 2007, ITAGAKI Takahiro wrote: Mixed usage of buffered and direct i/o is legal, but enforces complexity on kernels. If we simplify it, things would be more relaxed. For example, dropping zero-filling and only using direct i/o. Is it possible? It's possible, but performance suffers considerably. I played around with this at one point when looking into doing all database writes as sync writes. Having to wait until the entire 16MB WAL segment made its way to disk before more WAL could be written can cause a nasty pause in activity, even with direct I/O sync writes. Even the current buffered zero-filled write of that size can be a bit of a drag on performance for the clients that get caught behind it; making it any sort of sync write will be far worse. This ties into a loose end we didn't get to yet: being more aggressive about creating future WAL segments. ISTM there is no good reason for clients ever to have to wait for WAL segment creation --- the bgwriter, or possibly the walwriter, ought to handle that in the background. But we only check for the case once per checkpoint and we don't create a segment unless there's very little space left. regards, tom lane
Re: [HACKERS] PostgreSQL 8.3, libpq and WHERE CURRENT OF
Thanks Heikki for this quick answer. Actually we do the following libpq calls: - PQprepare(... SELECT ... FROM ... FOR UPDATE ... ) - PQexecPrepared(...) - PQntuples(...) / PQgetvalue(...) i.e. we don't use the DECLARE CURSOR instruction, we just prepare/execute the plain SELECT statement (with potential parameters)... I can't remember why, but there was some limitation or problem with using DECLARE CURSOR in our context... must dig in my mail archive to give you more details... FYI, we actually develop a PostgreSQL driver for our runtime VM... Does a simple PQprepare() with a SELECT statement not create a cursor on the server side? If yes, would it not be possible to pass a cursor name as in ODBC? Best regards, Seb Heikki Linnakangas wrote: Sebastien FLAESCH wrote: Just looked at the new features of 8.3 and realized that positioned updates/deletes is now possible with this new release... We would use that if we could define the cursor name with a libpq function. I don't understand. When you open a cursor with DECLARE CURSOR, you give it a name. Doesn't that do what you want? ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] [HACKERS] 8.3beta1 testing on Solaris
I agree with Tom.. somehow I think increasing NUM_CLOG_BUFFERS just pushes the symptom out to a larger value.. I promise to look more into it before making any recommendations to increase NUM_CLOG_BUFFERS. Because though iGen showed improvements in that area from increasing num_clog_buffers, EAStress showed no improvement.. Plus the reason I think this is not the problem in 8.3beta1 is that the lock output clearly does not show CLOGControlFile as the issue, which I had seen in the earlier case. So I don't think that increasing NUM_CLOG_BUFFERS will change things here. Now, I don't understand the code very well yet, but I see three hotspots and am not sure if they are related to each other: * ProcArrayLock waits - causing waits as reported by the 83_lockwait.d script * SimpleLRUReadPage - causing read IOs as reported by iostat/rsnoop.d * GetSnapshotData - causing CPU utilization as reported by hotuser But I will shut up and do more testing. Regards, Jignesh Tom Lane wrote: Josh Berkus [EMAIL PROTECTED] writes: Actually, 32 made a significant difference as I recall ... do you still have the figures for that, Jignesh? I'd want to see a new set of test runs backing up any call for a change in NUM_CLOG_BUFFERS --- we've changed enough stuff around this area that benchmarks using code from a few months back shouldn't carry a lot of weight. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] PostgreSQL 8.3, libpq and WHERE CURRENT OF
Tom Lane wrote: Sebastien FLAESCH [EMAIL PROTECTED] writes: Does a simple PQPrepare() with a SELECT statement not create a cursor on the server side? No. A prepared statement is just a query plan, not a query-in-progress. Yes of course, I meant PQprepare() + PQexecPrepared() ... The Bind/Execute messages sent by PQexecPrepared create something akin to a cursor, but libpq doesn't expose any API for fetching one row at a time in that context, so there's no way to use the current row anyway. OK... that makes total sense (sorry, I jump from one database to another and sometimes I forget implementation details of a specific driver). Sounds like I need to rework my driver to enable row-by-row fetching with the DECLARE CURSOR + FETCH commands... right? As I wrote: I knew these commands before, but for some reason I did not use that solution because I faced problems. I started with PostgreSQL 7.1, so maybe there was some limitation that no longer exists in 8.x ... Maybe it was because there is no real OPEN command, so you can't distinguish the preparation phase from the execution phase with the DECLARE CURSOR instruction... Or maybe because cursors could only exist inside a transaction block? Just found this in the 7.1 doc: http://www.postgresql.org/docs/7.1/static/sql-declare.html ERROR: DECLARE CURSOR may only be used in begin/end transaction blocks This error occurs if the cursor is not declared within a transaction block. Anyway... Thanks a lot guys for these quick answers. That's a change from Oracle or SQL Server support. Seb ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] [HACKERS] 8.3beta1 testing on Solaris
The problem I saw was first highlighted by EAStress runs with PostgreSQL on Solaris with 120-150 users. I have since replicated it via the smaller internal benchmark that we use here to recreate that problem. EAStress should be just fine to highlight it.. Just put pg_clog on O_DIRECT or something so that all IOs go to disk, making it easier to observe. In the meantime I will try to get more information. Regards, Jignesh Tom Lane wrote: Gregory Stark [EMAIL PROTECTED] writes: Didn't we already go through this? He and Simon were pushing to bump up NUM_CLOG_BUFFERS and you were arguing that the test wasn't representative and that clog.c would have to be reengineered to scale well to larger values. AFAIR we never did get any clear explanation of what the test case is. I guess it must be write-mostly, else lazy XID assignment would have helped this by reducing the rate of XID consumption. It's still true that I'm leery of a large increase in the number of buffers without reengineering slru.c. That code was written on the assumption that there were few enough buffers that a linear search would be fine. I'd hold still for 16, or maybe even 32, but I dunno how much impact that will have for such a test case. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] PANIC caused by open_sync on Linux
On 10/26/07, Tom Lane [EMAIL PROTECTED] wrote: This ties into a loose end we didn't get to yet: being more aggressive about creating future WAL segments. ISTM there is no good reason for clients ever to have to wait for WAL segment creation --- the bgwriter, or possibly the walwriter, ought to handle that in the background. Agreed. -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation| fax: 732.331.1301 499 Thornall Street, 2nd Floor | [EMAIL PROTECTED] Edison, NJ 08837| http://www.enterprisedb.com/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] PostgreSQL 8.3, libpq and WHERE CURRENT OF
Forget this one, I was just missing the WITH HOLD option... Must teach myself a bit more before sending further mails. Seb Sebastien FLAESCH wrote: Ok... Tested with 8.2.3: Actually you can't DECLARE a cursor outside a transaction: test1= declare c1 cursor for select * from dbit2; ERROR: DECLARE CURSOR may only be used in transaction blocks That's the main reason why we don't use DECLARE CURSOR... I understand we could use DECLARE CURSOR when a FOR UPDATE is detected, to allow UPDATE/DELETE WHERE CURRENT OF - these only make sense inside a transaction... But for normal cursors, we need to support multiple active result sets that can last outside a transaction block. Basically, we need everything you can do with ODBC cursors. Anyway, thanks for your help. Seb Tom Lane wrote: Sebastien FLAESCH [EMAIL PROTECTED] writes: Does a simple PQPrepare() with a SELECT statement not create a cursor on the server side? No. A prepared statement is just a query plan, not a query-in-progress. The Bind/Execute messages sent by PQexecPrepared create something akin to a cursor, but libpq doesn't expose any API for fetching one row at a time in that context, so there's no way to use the current row anyway. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] PostgreSQL 8.3, libpq and WHERE CURRENT OF
Ok... Tested with 8.2.3: Actually you can't DECLARE a cursor outside a transaction: test1= declare c1 cursor for select * from dbit2; ERROR: DECLARE CURSOR may only be used in transaction blocks That's the main reason why we don't use DECLARE CURSOR... I understand we could use DECLARE CURSOR when a FOR UPDATE is detected, to allow UPDATE/DELETE WHERE CURRENT OF - these only make sense inside a transaction... But for normal cursors, we need to support multiple active result sets that can last outside a transaction block. Basically, we need everything you can do with ODBC cursors. Anyway, thanks for your help. Seb Tom Lane wrote: Sebastien FLAESCH [EMAIL PROTECTED] writes: Does a simple PQPrepare() with a SELECT statement not create a cursor on the server side? No. A prepared statement is just a query plan, not a query-in-progress. The Bind/Execute messages sent by PQexecPrepared create something akin to a cursor, but libpq doesn't expose any API for fetching one row at a time in that context, so there's no way to use the current row anyway. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] PostgreSQL 8.3, libpq and WHERE CURRENT OF
Sebastien FLAESCH [EMAIL PROTECTED] writes: Does a simple PQPrepare() with a SELECT statement not create a cursor on the server side? No. A prepared statement is just a query plan, not a query-in-progress. The Bind/Execute messages sent by PQexecPrepared create something akin to a cursor, but libpq doesn't expose any API for fetching one row at a time in that context, so there's no way to use the current row anyway. regards, tom lane ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] [PERFORM] 8.3beta1 testing on Solaris
Hi George, I actually first saw the 4M/sec problem during an EAStress-type run with only 150 connections. I just replicated that via my smaller internal benchmark that we use here to recreate that problem. I will try to do more of the testing that Tom has requested today. Regards, Jignesh Gregory Stark wrote: Jignesh K. Shah [EMAIL PROTECTED] writes: CLOG data is not cached in any PostgreSQL shared memory segments and hence becomes the bottleneck as it has to constantly go to the filesystem to get the read data. This is the same bottleneck you discussed earlier. CLOG reads are cached in the Postgres shared memory segment, but only NUM_CLOG_BUFFERS of them, which defaults to 8 buffers of 8kb each. With 1,000 clients and the transaction rate you're running, you needed a larger number of buffers. Using the filesystem buffer cache is also an entirely reasonable solution though. That's surely part of the logic behind not trying to keep more of the clog in shared memory. Do you have any measurements of how much time is being spent just doing the logical I/O to the buffer cache for the clog pages? 4MB/s seems like it's not insignificant, but your machine is big enough that perhaps I'm thinking at the wrong scale. I'm really curious whether you see any benefit from the vxid read-only transactions. I'm not sure how to get an apples to apples comparison though. Ideally just comparing it to CVS HEAD from immediately prior to the vxid patch going in. Perhaps calling some function which forces an xid to be allocated and seeing how much it slows down the benchmark would be a good substitute. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] [HACKERS] 8.3beta1 testing on Solaris
Tom, Here is what I did: I started aggregating all read information. First I also added a group by pid (arg0, arg1, pid) and the counts were all coming out as 1. Then I just grouped by filename and location (arg0, arg1 of reads) and the counts came back as:

# cat read.d
#!/usr/sbin/dtrace -s
syscall::read:entry
/execname=="postgres"/
{
        @read[fds[arg0].fi_pathname, arg1] = count();
}

# ./read.d
dtrace: script './read.d' matched 1 probe
^C

  /export/home0/igen/pgdata/pg_clog/0014   -2753028293472    1
  /export/home0/igen/pgdata/pg_clog/0014   -2753028277088    1
  /export/home0/igen/pgdata/pg_clog/0015   -2753028244320    2
  /export/home0/igen/pgdata/pg_clog/0015   -2753028268896   14
  /export/home0/igen/pgdata/pg_clog/0015   -2753028260704   25
  /export/home0/igen/pgdata/pg_clog/0015   -2753028252512   27
  /export/home0/igen/pgdata/pg_clog/0015   -2753028277088   28
  /export/home0/igen/pgdata/pg_clog/0015   -2753028293472   37

FYI I pressed ctrl-c within less than a second. So to me this seems that multiple processes are reading the same page from different pids (this was with about 600 users active). Apparently we do have a problem in that we are reading the same buffer address again (same as it not being cached anywhere, or not being found in any cache). I reran the lock wait script on a couple of processes and did not see CLogControlFileLock as a problem..

# ./83_lwlock_wait.d 14341

              Lock Id            Mode           Count
        WALInsertLock       Exclusive               1
        ProcArrayLock       Exclusive              16

              Lock Id   Combined Time (ns)
        WALInsertLock               383109
        ProcArrayLock            198866236

# ./83_lwlock_wait.d 14607

              Lock Id            Mode           Count
        WALInsertLock       Exclusive               2
        ProcArrayLock       Exclusive              15

              Lock Id   Combined Time (ns)
        WALInsertLock                55243
        ProcArrayLock             69700140
#

What will help you find out why it is reading the same page again? -Jignesh Jignesh K. Shah wrote: I agree with Tom.. somehow I think increasing NUM_CLOG_BUFFERS just pushes the symptom out to a larger value.. I promise to look more into it before making any recommendations to increase NUM_CLOG_BUFFERS. 
Because though iGen showed improvements in that area from increasing num_clog_buffers, EAStress showed no improvement.. Plus the reason I think this is not the problem in 8.3beta1 is that the lock output clearly does not show CLOGControlFile as the issue, which I had seen in the earlier case. So I don't think that increasing NUM_CLOG_BUFFERS will change things here. Now, I don't understand the code very well yet, but I see three hotspots and am not sure if they are related to each other: * ProcArrayLock waits - causing waits as reported by the 83_lockwait.d script * SimpleLRUReadPage - causing read IOs as reported by iostat/rsnoop.d * GetSnapshotData - causing CPU utilization as reported by hotuser But I will shut up and do more testing. Regards, Jignesh Tom Lane wrote: Josh Berkus [EMAIL PROTECTED] writes: Actually, 32 made a significant difference as I recall ... do you still have the figures for that, Jignesh? I'd want to see a new set of test runs backing up any call for a change in NUM_CLOG_BUFFERS --- we've changed enough stuff around this area that benchmarks using code from a few months back shouldn't carry a lot of weight. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] 8.2.3: Server crashes on Windows using Eclipse/Junit
Taking this one to -hackers once and for all now... Can you try the attached patch? See how many backends you can get up to. Regression tests run just fine, and I've run multiple pgbench runs with 3 and 4 sessions of 100 connections each*, with pgAdmin monitoring things at the same time. Saw up to 403 simultaneous connections in pg_stat_activity, and the system remained stable and responsive, albeit somewhat slower than normal. What was the memory space consumption of the postmaster process, and compared to without the patch? VM size in taskmgr should show that I think, and should show a much smaller footprint now.. /Magnus ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] 8.2.3: Server crashes on Windows using Eclipse/Junit
Magnus Hagander wrote: Taking this one to -hackers once and for all now... Can you try the attached patch? See how many backends you can get up to. Regression tests run just fine, and I've run multiple pgbench runs with 3 and 4 sessions of 100 connections each*, with pgAdmin monitoring things at the same time. Saw up to 403 simultaneous connections in pg_stat_activity, and the system remained stable and responsive, albeit somewhat slower than normal. So, 400 connections on a 2.33GHz MacBook Pro running XP Pro with 2GB RAM - that's not too shabby :-) /D * For those that weren't peering over Magnus' or Greg's shoulder during various IM discussions over the last few days, I've found that the ~125 connection ceiling I was hitting when running from a command prompt was actually an as yet unsolved problem in pgbench, not the server. Multiple pgbench sessions seem to run just fine if kept to around 100 connections each. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] [HACKERS] 8.3beta1 testing on Solaris
Also, to give perspective on the equivalent writes on CLOG, I used the following script, which runs for 10 sec to track all writes to the clog directory, and here is what it came up with... (This is with 500 users running.)

# cat write.d
#!/usr/sbin/dtrace -s
syscall::write:entry
/execname=="postgres" && dirname(fds[arg0].fi_pathname)=="/export/home0/igen/pgdata/pg_clog"/
{
        @write[fds[arg0].fi_pathname, arg1] = count();
}
tick-10sec
{
        exit(0);
}

# ./write.d
dtrace: script './write.d' matched 2 probes
CPU     ID                    FUNCTION:NAME
  3   1026                      :tick-10sec

  /export/home0/igen/pgdata/pg_clog/001E   -2753028277088    1
#

I modified read.d to do a 5-sec read:

# ./read.d
dtrace: script './read.d' matched 3 probes
CPU     ID                    FUNCTION:NAME
  0      1                           :BEGIN
  0   1027                       :tick-5sec

  /export/home0/igen/pgdata/pg_clog/001F   -2753028268896    1
  /export/home0/igen/pgdata/pg_clog/001F   -2753028252512    1
  /export/home0/igen/pgdata/pg_clog/001F   -2753028285280    2
  /export/home0/igen/pgdata/pg_clog/001F   -2753028277088    3
  /export/home0/igen/pgdata/pg_clog/001F   -2753028236128    3
  /export/home0/igen/pgdata/pg_clog/001E   -2753028285280    5
  /export/home0/igen/pgdata/pg_clog/001E   -2753028236128    9
  /export/home0/igen/pgdata/pg_clog/001E   -2753028277088   13
  /export/home0/igen/pgdata/pg_clog/001E   -2753028268896   15
  /export/home0/igen/pgdata/pg_clog/001E   -2753028252512   27
#

So the ratio of reads vs writes to clog files is pretty huge.. -Jignesh Jignesh K. 
Shah wrote: Tom, Here is what I did: I started aggregating all read information. First I also added a group by pid (arg0, arg1, pid) and the counts were all coming out as 1. Then I just grouped by filename and location (arg0, arg1 of reads) and the counts came back as:

# cat read.d
#!/usr/sbin/dtrace -s
syscall::read:entry
/execname=="postgres"/
{
        @read[fds[arg0].fi_pathname, arg1] = count();
}

# ./read.d
dtrace: script './read.d' matched 1 probe
^C

  /export/home0/igen/pgdata/pg_clog/0014   -2753028293472    1
  /export/home0/igen/pgdata/pg_clog/0014   -2753028277088    1
  /export/home0/igen/pgdata/pg_clog/0015   -2753028244320    2
  /export/home0/igen/pgdata/pg_clog/0015   -2753028268896   14
  /export/home0/igen/pgdata/pg_clog/0015   -2753028260704   25
  /export/home0/igen/pgdata/pg_clog/0015   -2753028252512   27
  /export/home0/igen/pgdata/pg_clog/0015   -2753028277088   28
  /export/home0/igen/pgdata/pg_clog/0015   -2753028293472   37

FYI I pressed ctrl-c within less than a second. So to me this seems that multiple processes are reading the same page from different pids (this was with about 600 users active). Apparently we do have a problem in that we are reading the same buffer address again (same as it not being cached anywhere, or not being found in any cache). I reran the lock wait script on a couple of processes and did not see CLogControlFileLock as a problem..

# ./83_lwlock_wait.d 14341

              Lock Id            Mode           Count
        WALInsertLock       Exclusive               1
        ProcArrayLock       Exclusive              16

              Lock Id   Combined Time (ns)
        WALInsertLock               383109
        ProcArrayLock            198866236

# ./83_lwlock_wait.d 14607

              Lock Id            Mode           Count
        WALInsertLock       Exclusive               2
        ProcArrayLock       Exclusive              15

              Lock Id   Combined Time (ns)
        WALInsertLock                55243
        ProcArrayLock             69700140
#

What will help you find out why it is reading the same page again? -Jignesh Jignesh K. Shah wrote: I agree with Tom.. somehow I think increasing NUM_CLOG_BUFFERS just pushes the symptom out to a larger value.. I promise to look more into it before making any recommendations to increase NUM_CLOG_BUFFERS. 
Because though iGen showed improvements in that area from increasing num_clog_buffers, EAStress showed no improvement.. Plus the reason I think this is not the problem in 8.3beta1 is that the lock output clearly does not show CLOGControlFile as the issue, which I had seen in the earlier case. So I don't think that increasing NUM_CLOG_BUFFERS will change things here. Now, I don't understand the code very well yet, but I see three hotspots and am not sure if they are related to each other: * ProcArrayLock waits - causing waits as reported by the 83_lockwait.d script
Re: [HACKERS] 8.3 GSS Issues
On Oct 26, 2007, at 12:56 AM, Magnus Hagander wrote: On Thu, Oct 25, 2007 at 05:39:37PM -0700, Henry B. Hotz wrote: On Oct 25, 2007, at 3:27 PM, Stephen Frost wrote: * Henry B. Hotz ([EMAIL PROTECTED]) wrote: What you're asking for is basically a krb_match_realm parameter, or do I understand you wrong? I'm asking for name matching to be done i.a.w. the gssapi recommendations. That's all I want, but it's actually necessary for this feature to be at all usable in my environment. If we don't then I suggest we pull this feature until it can be done correctly. I know what you want, that's fairly obvious. I'm only asking about *how* to do it the best way. What the krb5 method does is IMO a documented bug. The realm name is part of the name. As I explained at some length you cannot assume the username (first component of the principal) has any meaning by itself, except in small deployments with no external trust agreements. Kerberos (and AD) are designed to support larger infrastructures with multiple organizations. This isn't unexpected for PG as the current krb5 support does this. I'm not a big fan of it but at the same time I don't feel it's justification to drop it from 8.3. Having it only allow the default realm would be an option which could work in 8.3, imv. I don't think the fact that the existing krb5 code does the wrong thing (and can't be used in an environment with cross-realm agreements) is justification for doing the wrong thing in a new capability. The code in my original patch would do the latter (default realm only). More precisely: if you do a gss_import_name() on smith and [EMAIL PROTECTED] you get the same internal representation, and gss_compare_name() will tell you they're the same. Also gss_compare_name() will tell you [EMAIL PROTECTED] is different from either of the first two. Wouldn't using a specific parameter like krb_match_realm=YOUR.REALM that you set in the config file be more flexible? 
In that it actually allows scenarios like server/resource domains (not sure how common they are in unix krb setups, but they're certainly not unfamiliar in the Windows AD world)? Yes and no. It certainly would have made it easier to test my original patch since the server was in a test realm and I couldn't use my normal production identity. I'd imagine deployments where the users are in a different realm from the servers are somewhat common. The counter is that (if done naively) it would prevent you from supporting users from multiple realms at all. I never completely tested this, but I think with my original patch you could define both smith (== [EMAIL PROTECTED]) and [EMAIL PROTECTED] as users to PG. They wouldn't be the same user (which you might want), but you could support both of them. Is there any (other) code in PG that would barf on long usernames that contain @ and/or .? If we don't use gss_compare_name(), or some similar mechanism, to compare connection names to PG usernames, then I don't think GSSAPI support should be included in 8.3. I think that's a horrible idea, given that it works perfectly fine the way it is now for the vast majority of users. That said, we should certainly fix it in one way or another for 8.3. But if that fails, I see no reason at all to pull the feature. If this isn't fixed then PG will never be a supported infrastructure service at JPL the way MySQL currently is. I had hoped to use the GSSAPI support as a feature to pry some people away from MySQL, but without the ability to integrate into a multi-realm infrastructure this won't fly. Of course even with proper support it still may never happen, so that isn't a threat. Longer term (since it's likely too late to be accepted now), as I think has been discussed in the past, PG could really use a .k5login-esque, either admin-only (ala pg_hba.conf / ident map) or per-user (some sort of ALTER ROLE that a user could do on himself?), mapping functionality. 
There has been discussion of a general mapping layer between authentication names and authorization/role names. I think that's the way to go. I haven't thought about who or where the administration of the mapping ought to be. Yeah, I agree that something like that would be a good long-term solution. For those on the periphery: the PG protocol already carries the PG username, and the Kerberos and GSSAPI auth methods carry the authentication name. If you define the PG username to be the authorization name then you have a pretty standard architecture. If anyone has used a kerberized ssh then you have a similar split. You can kinit as A and then ssh [EMAIL PROTECTED]. If you have put A into ~B/.k5login then you get in without a password prompt. For a proper discussion of this topic I recommend the section starting on page 64 of Sun's Security for Developers Guide, document 816-4863. Note that there is a discussion of
Re: [HACKERS] 8.2.3: Server crashes on Windows using Eclipse/Junit
Magnus Hagander wrote: VM size in taskmgr should show that I think, and should show a much smaller footprint now.. With patch: 4,492K. Without patch: 28,224K. That's with 3 x 100 pgbench connections. /D ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] 8.2.3: Server crashes on Windows using Eclipse/Junit
Dave Page wrote: Magnus Hagander wrote: VM size in taskmgr should show that I think, and should show a much smaller footprint now.. With patch: 4,492K. Without patch: 28,224K. That's with 3 x 100 pgbench connections. That's nice! But. That can't be address space usage, it has to be actual memory usage. Since each thread should chew up 4Mb of address space, and there's at least two threads in there :-) So looking at the VM column was obviously not correct. *looks up some docs* Right. You need to look at VM size in *process explorer*. VM size in task manager has nothing to do with VM size, it's the private bytes :-S And there is no way to see that info from task manager, I think. PE is your friend. Anyway. Other than a refresher on those, I'd be interested in two other important parts: * How many threads does it reach when you have 300 active backends? * Is there a handle leak? Meaning, once your 300 backends have exited, does the number of handles in the process drop down to the same value it had before? (sorry, wish I was in a position to run these tests myself, but I'm not right now) //Magnus ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] [HACKERS] 8.3beta1 testing on Solaris
Jignesh K. Shah [EMAIL PROTECTED] writes: So the ratio of reads vs writes to clog files is pretty huge.. It looks to me that the issue is simply one of not having quite enough CLOG buffers. Your first run shows 8 different pages being fetched and the second shows 10. Bearing in mind that we pin the latest CLOG page into buffers, there are only NUM_CLOG_BUFFERS-1 buffers available for older pages, so what we've got here is thrashing for the available slots. Try increasing NUM_CLOG_BUFFERS to 16 and see how it affects this test. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] 8.2.3: Server crashes on Windows using Eclipse/Junit
Dave Page wrote: Magnus Hagander wrote: Right. You need to look at VM size in *process explorer*. VM size in task manager has nothing to do with VM size, it's the private bytes :-S And there is no way to see that info from task manager, I think. PE is your friend. Anyway. Other than a refresher on those, I'd be interested in two other important parts: * How many threads does it reach when you have 300 active backends? * Is there a handle leak? meaning once your 300 backends have exited, does the number of handles in the process drop down to the same value it had before? Without patch: VM: 1,322,792K Idle threads: 6 Peak threads: 306 Handles at start: 576 Handles at end: 576 With patch: VM: 98,088K Idle threads: 3 Peak threads: 7 Handles at start: 576 Handles at end: 585 (585 again after second run). Ah, now we're talking. That's the kind of reduction I was looking for :-) I think the difference in handles is because the threadpool keeps some things around. As long as it stays at 585 and comes back down after a second run, we're fine at that - there's no leak. Attached is an updated version of the patch, currently being tested by both me and Dave. If it passes our tests, I'll apply this so it gets included for broader testing in beta2. //Magnus

Index: src/backend/postmaster/postmaster.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/postmaster/postmaster.c,v
retrieving revision 1.542
diff -c -r1.542 postmaster.c
*** src/backend/postmaster/postmaster.c 26 Sep 2007 22:36:30 -0000  1.542
--- src/backend/postmaster/postmaster.c 26 Oct 2007 20:09:35 -0000
***************
*** 331,344 ****
  #ifdef EXEC_BACKEND

  #ifdef WIN32
- static void win32_AddChild(pid_t pid, HANDLE handle);
- static void win32_RemoveChild(pid_t pid);
  static pid_t win32_waitpid(int *exitstatus);
! static DWORD WINAPI win32_sigchld_waiter(LPVOID param);
!
! static pid_t *win32_childPIDArray;
! static HANDLE *win32_childHNDArray;
! static unsigned long win32_numChildren = 0;

  HANDLE PostmasterHandle;
  #endif
--- 331,347 ----
  #ifdef EXEC_BACKEND

  #ifdef WIN32
  static pid_t win32_waitpid(int *exitstatus);
! static void WINAPI pgwin32_deadchild_callback(PVOID lpParameter, BOOLEAN TimerOrWaitFired);
!
! static HANDLE win32ChildQueue;
!
! typedef struct
! {
!     HANDLE waitHandle;
!     HANDLE procHandle;
!     DWORD procId;
! } win32_deadchild_waitinfo;

  HANDLE PostmasterHandle;
  #endif
***************
*** 899,914 ****

  #ifdef WIN32
      /*
!      * Initialize the child pid/HANDLE arrays for signal handling.
       */
!     win32_childPIDArray = (pid_t *)
!         malloc(mul_size(NUM_BACKENDARRAY_ELEMS, sizeof(pid_t)));
!     win32_childHNDArray = (HANDLE *)
!         malloc(mul_size(NUM_BACKENDARRAY_ELEMS, sizeof(HANDLE)));
!     if (!win32_childPIDArray || !win32_childHNDArray)
          ereport(FATAL,
!                 (errcode(ERRCODE_OUT_OF_MEMORY),
!                  errmsg("out of memory")));

      /*
       * Set up a handle that child processes can use to check whether the
--- 902,913 ----

  #ifdef WIN32
      /*
!      * Initialize I/O completion port used to deliver list of dead children.
       */
!     win32ChildQueue = CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, 1);
!     if (win32ChildQueue == NULL)
          ereport(FATAL,
!                 (errmsg("could not create I/O completion port for child queue")));

      /*
       * Set up a handle that child processes can use to check whether the
***************
*** 2072,2083 ****
  #define LOOPHEADER() (exitstatus = status.w_status)
  #else /* WIN32 */
  #define LOOPTEST() ((pid = win32_waitpid(&exitstatus)) > 0)
!     /*
!      * We need to do this here, and not in CleanupBackend, since this is
!      * to be called on all children when we are done with them. Could move
!      * to LogChildExit, but that seems like asking for future trouble...
!      */
! #define LOOPHEADER() (win32_RemoveChild(pid))
  #endif /* WIN32 */
  #endif /* HAVE_WAITPID */
--- 2071,2077 ----
  #define LOOPHEADER() (exitstatus = status.w_status)
  #else /* WIN32 */
  #define LOOPTEST() ((pid = win32_waitpid(&exitstatus)) > 0)
! #define LOOPHEADER()
  #endif /* WIN32 */
  #endif /* HAVE_WAITPID */
***************
*** 3332,3343 ****
      int i;
      int j;
      char cmdLine[MAXPGPATH * 2];
-     HANDLE childHandleCopy;
-     HANDLE waiterThread;
      HANDLE paramHandle;
      BackendParameters *param;
      SECURITY_ATTRIBUTES sa;
      char paramHandleStr[32];

      /* Make sure caller set up argv properly */
      Assert(argc >= 3);
--- 3326,3336 ----
      int i;
      int j;
      char cmdLine[MAXPGPATH * 2];
      HANDLE paramHandle;
      BackendParameters *param;
      SECURITY_ATTRIBUTES sa;
      char paramHandleStr[32];
+     win32_deadchild_waitinfo *childinfo;

      /* Make sure caller set up argv properly */
      Assert(argc >= 3);
***************
*** 3345,3359
Re: [HACKERS] 8.2.3: Server crashes on Windows using Eclipse/Junit
Magnus Hagander wrote: Right. You need to look at VM size in *process explorer*. VM size in task manager has nothing to do with VM size, it's the private bytes :-S And there is no way to see that info from task manager, I think. PE is your friend. Anyway. Other than a refresher on those, I'd be interested in two other important parts: * How many threads does it reach when you have 300 active backends? * Is there a handle leak? Meaning, once your 300 backends have exited, does the number of handles in the process drop down to the same value it had before?

Without patch: VM: 1,322,792K, Idle threads: 6, Peak threads: 306, Handles at start: 576, Handles at end: 576

With patch: VM: 98,088K, Idle threads: 3, Peak threads: 7, Handles at start: 576, Handles at end: 585 (585 again after second run).

/D ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] win32 threads patch vs beta2 - what to do?
Magnus Hagander wrote: Dave Page wrote: Magnus Hagander wrote: Right. You need to look at VM size in *process explorer*. VM size in task manager has nothing to do with VM size, it's the private bytes :-S And there is no way to see that info from task manager, I think. PE is your friend. Anyway. Other than a refresher on those, I'd be interested in two other important parts: * How many threads does it reach when you have 300 active backends? * Is there a handle leak? meaning once your 300 backends have exited, does the number of handles in the process drop down to the same value it had before? Without patch: VM: 1,322,792K Idle threads:6 Peak threads:306 Handles at start:576 Handles at end: 576 With patch: VM: 98,088K Idle threads:3 Peak threads:7 Handles at start:576 Handles at end: 585 (585 again after second run). Ah, now we're talking. That's the kind of reduction I was looking for :-) I think the difference in handles is because the threadpool keeps some things around. As long as it stays at 585 and comes back down after a second run, we're fine at that - there's no leak. Attached is an updated version of the patch, currently being tested by both me and Dave. If it passes our tests, I'll apply this so it gets included for broader testing in beta2. So of course, for all good patches, problems turn up :-( This patch doesn't work on mingw, at least not on all versions. The reason is that, as usual, the mingw libraries are not complete. We've dealt with this before, by dynamically loading the functions. We know this works. But I don't have time to fix that tonight, and I'll be offline much of this weekend. Now, given these great improvements, I'd very much like this in beta2, so it can get proper testing. This leaves us with a couple of choices: 1) Delay beta2 until the beginning of next week. I'll get this fixed sunday evening or monday at the latest. 
I know how to fix it, it's just that I don't have the time right now :( (This assumes that the plan still is to wrap beta2 today) 2) Apply the patch as-is. Then beta2 will work fine with the msvc build, which is used for the binary distribution. But it's broken on mingw until fixed, which of course includes the buildfarm stuff. Again, we know how to fix this. 2b) I apply this as-is, and someone else cleans up mingw before beta2 is wrapped. 3) We don't apply this, and wait until beta3 to have it tested. Depending on how many betas we end up with, that may leave us with very little testing before release. 2b is of course the best here, but then someone has to step up and volunteer to do that. I'm leaning towards applying the patch now, and hoping for 2b to happen. If it doesn't happen, the choice between 1 and 2 can be made when the time to wrap the beta approaches (at which point I will be offline, so I escape :-P) The patch that would go in is the one previously posted plus a couple of minor edits in comments as suggested on IM by Alvaro. Comments? //Magnus ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] win32 threads patch vs beta2 - what to do?
Magnus Hagander wrote: I'm leaning towards applying the patch now, and hoping for 2b to happen. I think we should live with the mingw BF breakage for a day or two. The patch is clearly an important improvement, but it should be as widely tested as possible. /D ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] win32 threads patch vs beta2 - what to do?
On Fri, 26 Oct 2007 21:58:03 +0100 Dave Page [EMAIL PROTECTED] wrote: Magnus Hagander wrote: I'm leaning towards applying the patch now, and hoping for 2b to happen. I think we should live with the mingw BF breakage for a day or two. The patch is clearly an important improvement, but it should be as widely tested as possible. I think this makes sense. I certainly don't want to hold back Beta2 and this patch so far is an obvious improvement. Sincerely, Joshua D. Drake /D ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 24x7/Emergency: +1.800.492.2240 PostgreSQL solutions since 1997 http://www.commandprompt.com/ UNIQUE NOT NULL Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/
Re: [HACKERS] win32 threads patch vs beta2 - what to do?
Dave Page [EMAIL PROTECTED] writes: Magnus Hagander wrote: I'm leaning towards applying the patch now, and hoping for 2b to happen. I think we should live with the mingw BF breakage for a day or two. The patch is clearly an important improvement, but it should be as widely tested as possible. If Dave is intending to build the Windows installer as soon as beta2 is wrapped, then I agree with this plan. But my understanding was we'd missed his window for today and so that wouldn't happen till Monday. Assuming that's true, the idea Bruce and I had discussed on the phone was: 1. Wrap official beta2 tonight, so that other packagers can work from it over the weekend; 2. Magnus fixes his patch Sunday and applies it then; 3. Dave builds the Windows installer Monday *from CVS tip*. So the Windows version would be beta2-plus-a-little but we'd neither hold up work on other platforms nor break anything in buildfarm. Just an alternative thought. In any case I agree that we want Windows testing of beta2 to include this patch. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] [HACKERS] 8.3beta1 testing on Solaris
I changed CLOG buffers to 16. Running the test again:

# ./read.d
dtrace: script './read.d' matched 2 probes
CPU     ID                    FUNCTION:NAME
  0   1027                       :tick-5sec
/export/home0/igen/pgdata/pg_clog/0024 -27530282192961
/export/home0/igen/pgdata/pg_clog/0025 -27530282111041

# ./read.d
dtrace: script './read.d' matched 2 probes
CPU     ID                    FUNCTION:NAME
  1   1027                       :tick-5sec

# ./read.d
dtrace: script './read.d' matched 2 probes
CPU     ID                    FUNCTION:NAME
  1   1027                       :tick-5sec

# ./read.d
dtrace: script './read.d' matched 2 probes
CPU     ID                    FUNCTION:NAME
  0   1027                       :tick-5sec
/export/home0/igen/pgdata/pg_clog/0025 -27530281947201

So Tom seems to be correct that it is a case of CLOG buffer thrashing. And since I saw the same problem with two different workloads, I think the chance of people hitting this problem is pretty high. I am also a bit surprised that CLogControlFile did not show up as being hot... maybe because not many writes are going on, or maybe because I did not trace all 500 users to see their hot lock status. Dmitri has another workload to test; I might try that out later to see whether it causes a similar impact. Of course, I haven't seen my throughput go up yet, since I am already CPU bound... But this is good, since the number of IOPS to the disk is reduced (and hence system calls). If I take this as my baseline number, I can then proceed to hunt other bottlenecks. What's the view of the community? Hunt down CPU utilization or lock waits next? Your votes are crucial on where I put my focus. Another thing Josh B told me to check out was the wal_writer_delay setting. I have done two runs with almost equal performance (with the CLOG 16 setting), one at 100ms and the other at the default 200ms. Based on the runs, 100ms seemed slightly better than the default (plus the risk of losing data is reduced from 600ms to 300ms). Thanks. Regards, Jignesh

Tom Lane wrote: Jignesh K. Shah [EMAIL PROTECTED] writes: So the ratio of reads vs writes to clog files is pretty huge..
It looks to me that the issue is simply one of not having quite enough CLOG buffers. Your first run shows 8 different pages being fetched and the second shows 10. Bearing in mind that we pin the latest CLOG page into buffers, there are only NUM_CLOG_BUFFERS-1 buffers available for older pages, so what we've got here is thrashing for the available slots. Try increasing NUM_CLOG_BUFFERS to 16 and see how it affects this test. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] 8.2.3: Server crashes on Windows using Eclipse/Junit
Magnus Hagander [EMAIL PROTECTED] writes: Attached is an updated version of the patch, currently being tested by both me and Dave. If it passes our tests, I'll apply this so it gets included for broader testing in beta2. One question: what's this about? + #define _WIN32_WINNT 0x0500 This looks like it might be tightening our assumptions about which Windows flavors we can run on. I'm not necessarily against that, but it should be publicly discussed if it's happening. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] win32 threads patch vs beta2 - what to do?
Tom Lane wrote: Dave Page [EMAIL PROTECTED] writes: Magnus Hagander wrote: I'm leaning towards applying the patch now, and hoping for 2b to happen. I think we should live with the mingw BF breakage for a day or two. The patch is clearly an important improvement, but it should be as widely tested as possible. If Dave is intending to build the Windows installer as soon as beta2 is wrapped, then I agree with this plan. But my understanding was we'd missed his window for today and so that wouldn't happen till Monday. Assuming that's true, the idea Bruce and I had discussed on the phone was: 1. Wrap official beta2 tonight, so that other packagers can work from it over the weekend; 2. Magnus fixes his patch Sunday and applies it then; 3. Dave builds the Windows installer Monday *from CVS tip*. So the Windows version would be beta2-plus-a-little but we'd neither hold up work on other platforms nor break anything in buildfarm. Just an alternative thought. In any case I agree that we want Windows testing of beta2 to include this patch. If we do that, then what we label as beta2 in the installer is *not* the same as someone who has built beta2 from source. Can't say I like that one, and I know Dave doesn't like it (he said so before going offline). I'd rather see msvc working and mingw broken for beta2 really, because then we *know* that if someone says they're doing beta2 on mingw they're misinformed... //Magnus ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] 8.2.3: Server crashes on Windows using Eclipse/Junit
Tom Lane wrote: Magnus Hagander [EMAIL PROTECTED] writes: Attached is an updated version of the patch, currently being tested by both me and Dave. If it passes our tests, I'll apply this so it gets included for broader testing in beta2. One question: what's this about? + #define _WIN32_WINNT 0x0500 This looks like it might be tightening our assumptions about which Windows flavors we can run on. I'm not necessarily against that, but it should be publicly discussed if it's happening. It enables Windows 2000-specific headers. We already require Windows 2000 to run, so it doesn't restrict us anymore than we already are. It just exposes those parts of the header files. //Magnus ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] PANIC caused by open_sync on Linux
On Fri, Oct 26, 2007 at 08:34:49AM -0400, Tom Lane wrote: we only check for the case once per checkpoint and we don't create a segment unless there's very little space left. Sort of a filthy hack, but what about always having an _extra_ segment around? The bgwriter could do that, no? A -- Andrew Sullivan | [EMAIL PROTECTED] ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] win32 threads patch vs beta2 - what to do?
Joshua D. Drake wrote: On Fri, 26 Oct 2007 21:58:03 +0100 Dave Page [EMAIL PROTECTED] wrote: Magnus Hagander wrote: I'm leaning towards applying the patch now, and hoping for 2b to happen. I think we should live with the mingw BF breakage for a day or two. The patch is clearly an important improvement, but it should be as widely tested as possible. I think this makes sense. I certainly don't want to hold back Beta2 and this patch so far is an obvious improvement. I read the consensus of this thread as apply the patch as-is and let's fix mingw as soon as we can, so this is what I've done. If I got it wrong, feel free to back out :-) //Magnus ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] [GENERAL] WAL archiving idle database
Jeff Davis [EMAIL PROTECTED] writes: What's happening is that you have a checkpoint_timeout of 5 minutes, and that checkpoint must write a checkpoint record in the WAL, prompting the archiving. If you want it to happen less frequently, it's often safe to have checkpoint timeout set to something larger by a reasonable amount. I think you're confusing checkpoint_timeout and archive_timeout... I seem to recall this behavior having been discussed before, but I can't find it in the archives right now. What is happening is that after each checkpoint_timeout, we test to see if we need to write a new checkpoint; which is determined by whether anything's been inserted into WAL since the start of the last checkpoint. And after each archive_timeout, we test to see if we need to flush the current WAL segment out to the archive; which is determined by whether the write pointer is currently exactly at the start of a segment or not. Which would be fine, except that the has anything been inserted since last checkpoint test is actually done by seeing if the WAL insert pointer has moved. Which it will have, if we did an archive flush. And that means that each of these activities makes it look to the other one like something has happened, and so you get a checkpoint record every checkpoint_timeout seconds, and then we flush the entire WAL file (containing only that record), even if the database is in reality completely idle. Lather, rinse, repeat. In the prior discussion that I seem to remember, we didn't think of a decent solution, and it kinda fell off the radar since zero-activity isn't too interesting to a lot of folks. However, chewing on it again I think I've come up with a good idea that will fix this and actually simplify the code a bit: * Add a boolean flag insertedXLog to XLogCtlInsert, which means at least one WAL record has been inserted since start of last checkpoint. 
Also add a flag completedCkpt somewhere in XLogCtlData, which means checkpoint successfully completed; this second flag is only used by checkpoint so it can be considered as being protected by the CheckpointLock. At startup we can initialize insertedXLog = false, completedCkpt = true. * XLogInsert sets insertedXLog to true while holding WALInsertLock, *except* when inserting either a checkpoint record or an xlog switch record; in those cases it doesn't change the flag. * CreateCheckpoint replaces its current rather complex test (lines 5693-5703 in CVS-tip xlog.c) with if insertedXLog is clear and completedCkpt is set, we need not checkpoint. If it does have to perform a checkpoint, it clears both flags before releasing WALInsertLock. * After successful completion of a checkpoint, completedCkpt gets set. Because insertedXLog is cleared at the same time the checkpoint's REDO pointer is determined, this will correctly implement the requirement of detecting whether anything has been inserted since the last REDO point. This replaces the current indirect test involving comparing the last checkpoint's REDO pointer to its own address. However we have to not set insertedXLog when we finally do insert the checkpoint record, thus the special case is needed in XLogInsert. The other special case of ignoring xlog switch is what's needed to fix the bug, and is obviously OK because an xlog switch doesn't represent a checkpointable change. The reason we need the completedCkpt flag is that if a checkpoint fails partway through, it would nonetheless have cleared insertedXLog, and we don't want that to cause us to not retry the checkpoint next time. This is slightly warty but it certainly seems a lot clearer than the current test in lines 5693-5703. The couple of lines to be added to XLogInsert should have negligible performance impact. Comments? regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] [GENERAL] WAL archiving idle database
Jeff Davis [EMAIL PROTECTED] writes: Keep in mind that even in the current system, your configuration is variable based on the checkpoint_timeout setting. Yeah, and he has to keep this less than archive_timeout in order for it to work the way he wants, which is probably not good for performance. (Sane settings of checkpoint_timeout are probably higher, not lower, than what people are likely to use for archive_timeout.) I think my recommendation to Kevin would be to force some trivial transaction to occur a little before each expected archive_timeout, so that there will be something to be archived. This would have the additional advantage that the monitor is checking that the database is actually responding to queries, whereas just noting that it's spitting out WAL files doesn't really prove that --- especially not if mere no-op checkpoints can cause WAL files to be emitted. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] [GENERAL] WAL archiving idle database
On Fri, Oct 26, 2007 at 5:47 PM, in message [EMAIL PROTECTED], Tom Lane [EMAIL PROTECTED] wrote: And after each archive_timeout, we test to see if we need to flush the current WAL segment out to the archive; which is determined by whether the write pointer is currently exactly at the start of a segment or not. Hmmm... We would actually prefer to get the WAL file at the specified interval. We have software to ensure that the warm standby instances are not getting stale, and that's pretty simple with the current behavior. We don't have a bandwidth or storage space issue because we zero out the unused portion of the WAL file and gzip it -- an empty file's about 16 KB. Checking that the whole system is healthy gets a lot more complicated if we stop sending empty WAL files. Could this at least be a configurable option? -Kevin ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] [GENERAL] WAL archiving idle database
Kevin Grittner [EMAIL PROTECTED] writes: On Fri, Oct 26, 2007 at 5:47 PM, in message [EMAIL PROTECTED], Tom Lane [EMAIL PROTECTED] wrote: And after each archive_timeout, we test to see if we need to flush the current WAL segment out to the archive; which is determined by whether the write pointer is currently exactly at the start of a segment or not. Hmmm... We would actually prefer to get the WAL file at the specified interval. Well, if it's a feature not a bug, that's fine with me. I wonder though how predictable the behavior will really be with 8.3's distributed checkpoints ... you might need to find another way anyhow. regards, tom lane ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] [GENERAL] WAL archiving idle database
On Fri, 2007-10-26 at 18:06 -0500, Kevin Grittner wrote: On Fri, Oct 26, 2007 at 5:47 PM, in message [EMAIL PROTECTED], Tom Lane [EMAIL PROTECTED] wrote: And after each archive_timeout, we test to see if we need to flush the current WAL segment out to the archive; which is determined by whether the write pointer is currently exactly at the start of a segment or not. Hmmm... We would actually prefer to get the WAL file at the specified interval. We have software to ensure that the warm standby instances are not getting stale, and that's pretty simple with the current behavior. We don't have a bandwidth or storage space issue because we zero out the unused portion of the WAL file and gzip it -- an empty file's about 16 KB. Checking that the whole system is healthy gets a lot more complicated if we stop sending empty WAL files. Could this at least be a configurable option? A good point. Keep in mind that even in the current system, your configuration is variable based on the checkpoint_timeout setting. Regards, Jeff Davis ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] [GENERAL] WAL archiving idle database
On Fri, 2007-10-26 at 18:47 -0400, Tom Lane wrote: Jeff Davis [EMAIL PROTECTED] writes: What's happening is that you have a checkpoint_timeout of 5 minutes, and that checkpoint must write a checkpoint record in the WAL, prompting the archiving. If you want it to happen less frequently, it's often safe to have checkpoint timeout set to something larger by a reasonable amount. I think you're confusing checkpoint_timeout and archive_timeout... Thanks for clarifying it. The user-visible behavior, as I understand it, is that the time between archiving on an idle database is: MAX(archive_timeout,checkpoint_timeout) [ of course, there's no guarantee that the archive_command succeeds in that time ] It looks like checkpoint_timeout was the limiting factor, in his case. Regards, Jeff Davis ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] PANIC caused by open_sync on Linux
On Fri, 26 Oct 2007, Andrew Sullivan wrote: Sort of a filthy hack, but what about always having an _extra_ segment around? The bgwriter could do that, no? Now it could. The bgwriter in <= 8.2 stops executing when there's a checkpoint going on, and needing more WAL segments because a checkpoint is taking too long is one of the major failure cases where proactively creating additional segments would be most helpful. The 8.3 bgwriter keeps running even during checkpoints, so it's feasible to add such a feature now. But that only became true well into the 8.3 feature freeze, after some changes Heikki made just before the load distributed checkpoint patch was committed. Before that, it was hard to implement this feature; afterwards, it was too late to fit the change into the 8.3 release. Should be easy enough to add to 8.4 one day. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] [GENERAL] WAL archiving idle database
On Fri, 2007-10-26 at 18:06 -0500, Kevin Grittner wrote: Hmmm... We would actually prefer to get the WAL file at the specified interval. We have software to ensure that the warm standby instances are not getting stale, and that's pretty simple with the current behavior. We don't have a bandwidth or storage Another thought: when you say it's pretty simple, what do you do now? My monitoring scripts for this particular situation employ some pretty ugly code. I think if this did get changed, I would change my script to monitor the pg_current_xlog_location() of the primary database and compare to the last restored log file... entry in the standby database's log. I would think if the current location does not end in all zeros, you should expect a new WAL segment to be archived soon. Although this assumes that an idle database would not advance that location at all, and I'm still trying to understand Tom's proposal well enough to know whether that would be true or not. If this doesn't get changed, I think we should archive every archive_timeout seconds, rather than MAX(archive_timeout,checkpoint_timeout), which is less obvious. Regards, Jeff Davis ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] [GENERAL] WAL archiving idle database
Jeff Davis [EMAIL PROTECTED] writes: I would think if the current location does not end in all zeros, you should expect a new WAL segment to be archived soon. Although this assumes that an idle database would not advance that location at all, and I'm still trying to understand Tom's proposal well enough to know whether that would be true or not. With my proposal, after the last activity, you'd get a checkpoint, and then at the next archive_timeout we'd advance the pointer to a segment boundary and archive the old segment, and then nothing more would happen until the next WAL-loggable update. So yeah, the master's pg_current_xlog_location could be expected to sit at a segment boundary while it was idle. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] PANIC caused by open_sync on Linux
Greg Smith [EMAIL PROTECTED] writes: The 8.3 bgwriter keeps running even during checkpoints, so it's feasible to add such a feature now. I wonder though whether the walwriter wouldn't be a better place for it. regards, tom lane ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Feature Freeze date for 8.4
Tom, This seems pretty entirely orthogonal to the commit-fest proposal. I see no reason to think that snapshots taken at those times would be any better than any other nightly snapshot, nor any reason to memorialize them in an archive. I can see that. And it would be pretty hard to keep members of the public from treating the persistent snapshots as releases, and then flooding the lists when they break. So, scratch an idea with unintended consequences. I'll just have to keep the snapshots on our CVS slave. I do hope that the commit-fests will lead to more testing, though, just because people waiting for a specific feature will want to try out the initial commit of that feature. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco ---(end of broadcast)--- TIP 6: explain analyze is your friend
[HACKERS] Avoiding planning redundant backwards merges
While fooling around with the planner performance bug reported here: http://archives.postgresql.org/pgsql-bugs/2007-10/msg00173.php I noticed that even after fixing the bug, 8.3 seemed to plan this many-way join about 50% slower than 8.2 :-(. After some digging I understand the reason: 8.3 will almost always consider both a forward and a backward indexscan on each potentially useful index. This results in twice as many pre-sorted paths available to match_unsorted_outer(), and therefore about twice as much work generating paths that are exactly equivalent cost-wise but generate opposite output orders. These extra paths are not without use, now that we have the ability to declare DESC index columns: the only way to produce an indexed mergejoin between an ASC and a DESC index is to scan one of them backwards. So I don't want to lobotomize the code entirely. But we're paying a pretty high price for that capability. The idea I'm toying with is to make pathkeys_useful_for_merging() consider only ASC pathkeys as useful for merging --- that is, only pathkeys with pk_strategy = BTLessStrategyNumber. This would mean that only forward scans on ASC indexes and backward scans on DESC indexes would be considered to have interesting sort orders, and therefore in cases without any ORDER BY clause to worry about, the other indexscan path would not survive the initial competition in add_path. It'd be seen as having the same cost and worse ordering, and would be dropped. Now the tricky point in this is that if there's an ORDER BY on the query, then you might want a backwards mergejoin after all, if that means you come out with the right ordering for the ORDER BY. However, if there is such an ORDER BY, then pathkeys_useful_for_ordering will pick up on it and the opposite-direction indexscan will survive after all. (The number of pathkeys we keep for a path is the larger of these two functions' estimates, and paths with unequal pathkeys do not compete in add_path.) 
We'll plan out all the alternatives from both starting points, and eventually figure out at the top level that one of them avoids a final sort and is therefore cheaper. So in this not-too-common case, we'll be slower than 8.2 but also produce a better plan; I don't feel too bad about that. I admit this seems a bit Rube Goldbergian, but then again there is a whole lot of the planner's behavior that is emergent from the interplay of little pieces rather than being explicitly coded in one place. Comments, better ideas? regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] PANIC caused by open_sync on Linux
On Fri, 26 Oct 2007, Tom Lane wrote: The 8.3 bgwriter keeps running even during checkpoints, so it's feasible to add such a feature now. I wonder though whether the walwriter wouldn't be a better place for it. I do, too, but that wasn't available until too late in the 8.3 cycle to consider adding this feature to there either. There's a couple of potential to-do list ideas that build on the changes in this area in 8.3: -Aggressively pre-allocate WAL segments -Space out checkpoint fsync requests in addition to disk writes -Consider re-inserting a smarter bgwriter all-scan that writes sorted by usage count during idle periods -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[HACKERS] URGENT HELP about 'duration' stats
Hi,

I am doing my master's degree research using PostgreSQL 8.2.5, and I would appreciate any help, ASAP, with the following question: how can I obtain the time spent by the PostgreSQL EXECUTOR during a given time interval?

I am trying to get this information in the following way: using the BenchmarkSQL software, I submit random transactions (simulating the TPC-C benchmark) over a time interval (say 30 seconds). PostgreSQL then generates statistics for each phase in which a SQL command is executed. The excerpt below shows the duration of the EXECUTE phase of some of these commands:

. . . .
2007-10-11 15:49:05.221 BRT [46994] LOG:  duration: 0.012 ms  execute S_1: BEGIN
2007-10-11 15:49:05.430 BRT [46994] LOG:  duration: 94.073 ms  execute <unnamed>: SELECT c_discount, c_last, c_credit, w_tax FROM customer, warehouse WHERE w_id = $1 AND w_id = c_w_id AND c_d_id = $2 AND c_id = $3
2007-10-11 15:49:05.462 BRT [46994] LOG:  duration: 13.662 ms  execute <unnamed>: SELECT d_next_o_id, d_tax FROM district WHERE d_id = $1 AND d_w_id = $2 FOR UPDATE
2007-10-11 15:49:05.562 BRT [46994] LOG:  duration: 98.818 ms  execute <unnamed>: INSERT INTO NEW_ORDER (no_o_id, no_d_id, no_w_id) VALUES ($1, $2, $3)
2007-10-11 15:49:05.579 BRT [46994] LOG:  duration: 15.459 ms  execute <unnamed>: UPDATE district SET d_next_o_id = d_next_o_id + 1 WHERE d_id = $1 AND d_w_id = $2
2007-10-11 15:49:05.748 BRT [46994] LOG:  duration: 122.387 ms  execute <unnamed>: INSERT INTO OORDER (o_id, o_d_id, o_w_id, o_c_id, o_entry_d, o_ol_cnt, o_all_local) VALUES ($1, $2, $3, $4, $5, $6, $7)
2007-10-11 15:49:05.779 BRT [46994] LOG:  duration: 29.787 ms  execute <unnamed>: SELECT i_price, i_name, i_data FROM item WHERE i_id = $1
2007-10-11 15:49:05.846 BRT [46994] LOG:  duration: 64.963 ms  execute <unnamed>: SELECT s_quantity, s_data, s_dist_01, s_dist_02, s_dist_03, s_dist_04, s_dist_05, s_dist_06, s_dist_07, s_dist_08, s_dist_09, s_dist_10 FROM stock WHERE s_i_id = $1 AND s_w_id = $2 FOR UPDATE
2007-10-11 15:49:05.882 BRT [46994] LOG:  duration: 33.865 ms  execute <unnamed>: SELECT i_price, i_name, i_data FROM item WHERE i_id = $1
2007-10-11 15:49:05.931 BRT [46994] LOG:  duration: 48.075 ms  execute <unnamed>: SELECT s_quantity, s_data, s_dist_01, s_dist_02, s_dist_03, s_dist_04, s_dist_05, s_dist_06, s_dist_07, s_dist_08, s_dist_09, s_dist_10 FROM stock WHERE s_i_id = $1 AND s_w_id = $2 FOR UPDATE
2007-10-11 15:49:05.962 BRT [46994] LOG:  duration: 28.782 ms  execute <unnamed>: SELECT i_price, i_name, i_data FROM item WHERE i_id = $1
2007-10-11 15:49:06.002 BRT [46994] LOG:  duration: 38.026 ms  execute <unnamed>: SELECT s_quantity, s_data, s_dist_01, s_dist_02, s_dist_03, s_dist_04, s_dist_05, s_dist_06, s_dist_07, s_dist_08, s_dist_09, s_dist_10 FROM stock WHERE s_i_id = $1 AND s_w_id = $2 FOR UPDATE
2007-10-11 15:49:06.027 BRT [46994] LOG:  duration: 23.220 ms  execute <unnamed>: SELECT i_price, i_name, i_data FROM item WHERE i_id = $1
2007-10-11 15:49:06.043 BRT [46994] LOG:  duration: 14.308 ms  execute <unnamed>: SELECT s_quantity, s_data, s_dist_01, s_dist_02, s_dist_03, s_dist_04, s_dist_05, s_dist_06, s_dist_07, s_dist_08, s_dist_09, s_dist_10 FROM stock WHERE s_i_id = $1 AND s_w_id = $2 FOR UPDATE
. . . .

To answer my initial question, I am doing something like this:

1. From the statements' timestamps, I calculate the total time interval over which all the statements were executed: END TIMESTAMP (2007-10-11 15:49:06.043) - BEGIN TIMESTAMP (2007-10-11 15:49:05.221).

2. I then sum the 'duration' fields of the EXECUTE phases of all the statements executed. In theory, this gives me the amount of time the executor was in use during that total interval.

The problem I have encountered is that the sum of the executor's duration times is *sometimes* bigger than the total time interval in which the statements were executed!! And this makes no sense!
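As a cross-check of the two steps described above, here is a minimal Python sketch that parses duration lines in the log format shown in the excerpt and compares the wall-clock interval (last timestamp minus first) against the summed EXECUTE durations. The sample lines are abbreviated stand-ins for the real log, and the regex assumes the `timestamp TZ [pid] LOG: duration: N ms execute` layout seen above:

```python
import re
from datetime import datetime

# Abbreviated sample lines in the same layout as the log excerpt above.
LOG_LINES = [
    "2007-10-11 15:49:05.221 BRT [46994] LOG:  duration: 0.012 ms  execute S_1: BEGIN",
    "2007-10-11 15:49:05.430 BRT [46994] LOG:  duration: 94.073 ms  execute <unnamed>: SELECT ...",
    "2007-10-11 15:49:06.043 BRT [46994] LOG:  duration: 14.308 ms  execute <unnamed>: SELECT ...",
]

# Timestamp, timezone, backend pid, then the duration in milliseconds.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) \w+ \[\d+\] "
    r"LOG:\s+duration: ([\d.]+) ms\s+execute"
)

def summarize(lines):
    """Return (wall-clock interval in ms, sum of logged durations in ms)."""
    stamps, total = [], 0.0
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip non-duration log lines
        stamps.append(datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f"))
        total += float(m.group(2))
    interval_ms = (max(stamps) - min(stamps)).total_seconds() * 1000.0
    return interval_ms, total

interval_ms, summed_ms = summarize(LOG_LINES)
print(f"interval: {interval_ms:.3f} ms, summed durations: {summed_ms:.3f} ms")
```

With a single backend this summed figure can never exceed the interval; the comparison only becomes surprising when several backends log durations over the same wall-clock window.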
In some tests I have done, I found these results:

- Total time interval during which BenchmarkSQL submitted queries = 30 seconds
- Sum of the executor's duration times over this period = 2 minutes and 36 seconds!! (5x longer than the total time)

How could the sum of the executor's duration times be longer than the duration of the interval in which the statements ran? Can anyone please help me solve this?

Many thanks,
Camilo
Re: [HACKERS] URGENT HELP about 'duration' stats
Camilo Porto [EMAIL PROTECTED] writes:
> The problem I have encountered is that the sum of the executor's
> duration times is *sometimes* bigger than the total time interval in
> which the statements were executed!! And this makes no sense!

Umm ... why not? If you have, say, two queries executing in parallel for 1 second, they'll each report a duration: of 1 second, thus summing to 2 seconds, but the elapsed time was only 1 second. If you don't always see that, then your benchmark program isn't trying very hard to run more than one query in parallel ...

			regards, tom lane
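The effect Tom describes can be reproduced with a toy sketch (not PostgreSQL itself): two workers each busy for about one second in parallel report summed durations of about two seconds, while the elapsed wall-clock time stays near one second. The `run_query` function here is a hypothetical stand-in for a backend executing a statement:

```python
import threading
import time

durations = []
lock = threading.Lock()

def run_query(seconds):
    """Stand-in for one backend running a query; records its own duration."""
    start = time.monotonic()
    time.sleep(seconds)  # pretend the executor is busy this long
    with lock:
        durations.append(time.monotonic() - start)

wall_start = time.monotonic()
workers = [threading.Thread(target=run_query, args=(1.0,)) for _ in range(2)]
for w in workers:
    w.start()
for w in workers:
    w.join()
wall_elapsed = time.monotonic() - wall_start

# Two overlapping 1-second queries: ~2 s of summed duration in ~1 s of wall time.
print(f"summed durations: {sum(durations):.2f} s, wall clock: {wall_elapsed:.2f} s")
```

The same arithmetic explains the 30-second run reporting 2 minutes 36 seconds of executor time: roughly five sessions' worth of execution overlapped on average.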