Re: [PATCHES] [HACKERS] ARC Memory Usage analysis
On Tue, 26 Oct 2004, Greg Stark wrote:

> I see mmap or O_DIRECT being the only viable long-term stable states. My
> natural inclination was the former, but after the latest thread on the
> subject I suspect it'll be forever out of reach. That makes O_DIRECT and a
> Postgres-managed cache the only real choice. Having both caches is just a
> waste of memory and a waste of CPU cycles.

I don't see why mmap is any more out of reach than O_DIRECT; it's not all
that much harder to implement, and mmap (and madvise!) is more widely
available.

But if using two caches is only costing us 1% in performance, there's not
really much point.

cjs
--
Curt Sampson  [EMAIL PROTECTED]  +81 90 7737 2974  http://www.NetBSD.org
    Make up enjoying your city life...produced by BIC CAMERA
Re: [HACKERS] Using ALTER TABLESPACE in pg_dump
Dear Tom,

ISTM that the core business of a database is to help organize and protect
data, and this is plainly part of that. You just hope you won't need it, so
it feels somewhat abstract; but when and if you do need it, it is not
second-order at all ;-) and by then it is much too late to redo the dump.

> So you create some tablespaces by hand. Big deal.

I agree that it is doable this way, although it is not really nice.

> This objection is not strong enough to justify an ugly, klugy definition
> for where tables get created.

I also agree about this. My real point is that, reading the thread quickly,
I was afraid the problem would not be addressed at all in the coming
release. It seems I was wrong, as that does not look to be the case. Any
fix, rather than nothing, is fair enough for me.

Thanks for your answer, have a nice day,

--
Fabien Coelho - [EMAIL PROTECTED]
Re: [PATCHES] [HACKERS] ARC Memory Usage analysis
On Mon, 2004-10-25 at 16:34, Jan Wieck wrote:
> The problem is, with a too small directory ARC cannot guesstimate what
> might be in the kernel buffers. Nor can it guesstimate what recently was
> in the kernel buffers and got pushed out from there. That results in a
> way too small B1 list, and therefore we don't get B1 hits when in fact
> the data was found in memory. B1 hits are what increases the T1 target,
> and since we are missing them with a too small directory size, our
> implementation of ARC is probably using a T2 size larger than the working
> set. That is not optimal.

I think I have seen that the T1 list shrinks too much, but I need more
tests... with some good test results.

The effectiveness of ARC relies upon the balance between the often
conflicting requirements of recency and frequency. It seems possible, even
likely, that pgsql's version of ARC may need some subtle changes to
rebalance it - if we are unlucky enough to find cases where it genuinely is
out of balance. Many performance tests are required, together with a few
ideas on extra parameters to include; hence my support of Jan's ideas.

That's also why I called the B1+B2 hit ratio "turbulence", because it
relates to how much oscillation is happening between T1 and T2. In physical
systems we expect the oscillations to be damped, but there is no guarantee
that we have a nearly critically damped oscillator. (Note that the absence
of turbulence doesn't imply that T1+T2 is optimally sized, just that it is
balanced.)

[...and although the discussion has wandered away from my original
patch... would anybody like to commit, or decline, the patch?]

> If we would replace the dynamic T1 buffers with a max_backends*2 area of
> shared buffers, use a C value representing the effective cache size and
> limit the T1 target on the lower bound to effective cache size - shared
> buffers, then we have basically moved the T1 cache into the OS buffers.

Limiting the minimum size of T1len to 2*max_backends sounds like an easy
way to prevent overbalancing of T2, but I would like to follow up on ways
to have T1 naturally stay larger. I'll do a patch with this idea in, for
testing. I'll call this "T1 minimum size" so we can discuss it. Any other
patches are welcome...

It could be that B1 is too small, and so we could use a larger value of C
to keep track of more blocks. I think what is being suggested is two GUCs:
shared_buffers (as is), plus another, larger one, which would allow us to
track what is in shared_buffers and what is in the OS cache. I have
comments on effective_cache_size below.

On Mon, 2004-10-25 at 17:03, Tom Lane wrote:
> Jan Wieck [EMAIL PROTECTED] writes:
> > This all only holds water if the OS is allowed to swap out shared
> > memory. And that was my initial question: how likely is it to find this
> > to be true these days?
>
> I think it's more likely than not that the OS will consider shared memory
> to be potentially swappable. On some platforms there is a shmctl call you
> can make to lock your shmem in memory, but (a) we don't use it and (b) it
> may well require privileges we haven't got anyway.

Are you saying we shouldn't, or that we don't yet? I simply assumed that we
did use that function - surely it must be at least an option? RHEL supports
this, at least.

It may well be that we don't have those privileges, in which case we turn
off the option. Often, we (or I?) will want to install a dedicated server,
so we should have all the permissions we need, in which case...
> This has always been one of the arguments against making shared_buffers
> really large, of course --- if the buffers aren't all heavily used, and
> the OS decides to swap them to disk, you are worse off than you would
> have been with a smaller shared_buffers setting.

Not really; it is just an argument against making them *too* large. Large
*and* utilised is OK, so we need ways of judging optimal sizing.

> However, I'm still really nervous about the idea of using
> effective_cache_size to control the ARC algorithm. That number is usually
> entirely bogus. Right now it is only a second-order influence on certain
> planner estimates, and I am afraid to rely on it any more heavily than
> that.

...ah yes, effective_cache_size. The manual describes effective_cache_size
as if it had something to do with the OS, and some of this discussion has
picked up on that.

effective_cache_size is used in only two places in the code (both in the
planner), as an estimate for calculating the cost of a) nonsequential
access and b) index access, mainly as a way of avoiding overestimates of
access costs for small tables. There is absolutely no implication in the
code that effective_cache_size measures anything in the OS; what it gives
is an estimate of the number of blocks that will be available from
*somewhere* in memory (i.e. in shared_buffers OR the OS cache) for one
particular table (the one currently being considered by the planner).
Crucially, the size referred to is the size of the *estimate*, not the size
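For readers wondering what kind of estimate a parameter like
effective_cache_size typically feeds, here is a minimal sketch of the
classic Mackert-Lohman approximation for the number of distinct page
fetches under a limited cache. It is only an illustration of the idea, not
the planner code discussed above; the function name and arguments are
invented.

    /*
     * Mackert-Lohman style estimate of distinct pages fetched when x
     * tuples are retrieved at random from a table of T pages with
     * roughly b pages' worth of cache available ("b" is where a
     * cache-size guess such as effective_cache_size would come in).
     */
    static double
    pages_fetched(double x, double T, double b)
    {
        double  Y;

        if (T <= b)
        {
            Y = (2.0 * T * x) / (2.0 * T + x);
            if (Y > T)
                Y = T;          /* cannot fetch more pages than exist */
        }
        else
        {
            double  lim = (2.0 * T * b) / (2.0 * T - b);

            if (x <= lim)
                Y = (2.0 * T * x) / (2.0 * T + x);
            else
                Y = b + (x - lim) * (T - b) / T;
        }
        return Y;
    }

The larger the assumed cache, the fewer repeated fetches the estimate
charges for, which is exactly the "second-order influence" on index costs
being debated above.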
Re: [PATCHES] [HACKERS] ARC Memory Usage analysis
On Tue, 2004-10-26 at 06:53, Tom Lane wrote:
> Greg Stark [EMAIL PROTECTED] writes:
> > Tom Lane [EMAIL PROTECTED] writes:
> > > Another issue is what we do with the effective_cache_size value once
> > > we have a number we trust. We can't readily change the size of the
> > > ARC lists on the fly.
> >
> > Huh? I thought effective_cache_size was just used as a factor in the
> > cost estimation equation.
>
> Today, that is true. Jan is speculating about using it as a parameter of
> the ARC cache management algorithm ... and that worries me.

ISTM that we should be optimizing the use of shared_buffers, not what's
outside it. Didn't you (Tom) already say that?

BTW, very good ideas on how to proceed, but why bother? For me, if the
sysadmin didn't give shared_buffers to PostgreSQL, it's because the memory
is intended for use by something else and so is not available at all - at
least not dependably. The argument against large shared_buffers because of
swapping applies to that assumption as well... the OS cache is too volatile
to attempt to gauge sensibly.

There's an argument for improving performance for people who haven't set
their parameters correctly, but that's got to be a secondary consideration
anyhow.

--
Best Regards, Simon Riggs
[HACKERS]
[sorry if this mail appears more than once, but it seems the others didn't
make it through the list]

This is a short preview of the view update code I'm currently working on.
It is far from being ready, but I want to share the current code and get
some hints on what things have to be made better:

http://www.oopsware.de/pgsql/pgsql-view_update.tar.gz

The patchfile is against beta4; view_update.h belongs to
src/include/rewrite, view_update.c to src/backend/rewrite. The code
requires an initdb; however, the catalog change contained in this code is
used but not very useful yet.

The code has the following known problems:

- User-defined update rules on views are created without any intervention
  from the code. This is why the rule regression tests (and others) fail,
  because there suddenly are two INSERT/UPDATE/DELETE rules.
- Indexed array fields in a view's column list cause SIGSEGV (due to some
  experiments I am doing with array subscripting operations).
- There are problems with NULL fields in WHERE conditions.
- gram.y is only an ugly hack to get the CHECK OPTION working. It needs
  deeper effort, because it makes WITH a reserved keyword.

The following items need deeper discussion, I think:

- DEFAULT values on the underlying base tables need to be specified
  explicitly on the view itself (via ALTER TABLE); another problem is
  SERIALs ...
- What should happen if someone throws a pg_dump or SQL script at the
  backend that holds its own update rules for a view? In this case the
  implicit ones should be removed or deactivated.
- The code only supports cascaded rules, and all rules are created on the
  base relations only. So if one underlying base relation is not
  updateable, the view itself is not updateable either.

--
TIA Bernd
Re: [HACKERS] plans for bitmap indexes?
On K, 2004-10-20 at 03:03, Simon Riggs wrote:
> Well, that's the best one yet. That's the solution, if ever I heard it.
> The reduction in bitmap size makes their use much safer. Size matters,
> since we're likely to start using these techniques on very large
> databases, which obviously imply very large CTID lists.
>
> The problem with guessing the number of rows is you're never too sure
> whether it's worth the startup overhead of using the bitmap technique. My
> next question was going to be: so how will you know when to use the
> technique? Hmmm... think... you'd need to be clear that the cost of
> scanning a block didn't make the whole thing impractical.
>
> Generally, since we're using this technique to access infrequent row
> combinations, we'd be looking at no more than one row per block usually
> anyway. So the technique is still I/O bound - a bit of extra post-I/O CPU
> work won't hurt much. OK, cool.

I still think that an initial implementation could be done with a plain
"bitmap" kind of bitmap, if we are content with storing one bit per page
only - a simple page bitmap for a 1 TB table with 8 kB pages takes only
16 MB, and that's backend-private memory, not the more scarce shared
memory. It takes only 4 MB for 32 kB pages. I guess that anyone working
with terabyte-size tables can afford a few megabytes of RAM for query
processing.

Hannu
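The arithmetic behind the sizes quoted above is easy to check; the
throwaway C snippet below (illustrative only, not part of any patch)
reproduces the 16 MB and 4 MB figures for one bit per page.

    #include <stdio.h>

    int
    main(void)
    {
        long long table_bytes = 1LL << 40;               /* 1 TB table */
        long long page_sizes[] = { 8 * 1024, 32 * 1024 };

        for (int i = 0; i < 2; i++)
        {
            long long pages = table_bytes / page_sizes[i]; /* one bit per page */

            printf("%lld kB pages: %lld pages -> %lld MB bitmap\n",
                   page_sizes[i] / 1024, pages, pages / 8 / (1024 * 1024));
        }
        return 0;   /* prints 16 MB for 8 kB pages, 4 MB for 32 kB pages */
    }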
Re: [HACKERS] Automatic view update rules
--On Tuesday, October 26, 2004 13:02:27 +0200 Bernd Helmle [EMAIL PROTECTED] wrote:

> [sorry if this mail appears more than once, but it seems the others
> didn't make it through the list]
>
> This is a short preview of the view update code I'm currently working on.
> It is far from being ready, but I want to share the current code and get
> some hints on what things have to be made better:
>
> http://www.oopsware.de/pgsql/pgsql-view_update.tar.gz
>
> The patchfile is against beta4; view_update.h belongs to
> src/include/rewrite, view_update.c to src/backend/rewrite. The code
> requires an initdb; however, the catalog change contained in this code is
> used but not very useful yet.
>
> The code has the following known problems:
>
> - User-defined update rules on views are created without any intervention
>   from the code. This is why the rule regression tests (and others) fail,
>   because there suddenly are two INSERT/UPDATE/DELETE rules.
> - Indexed array fields in a view's column list cause SIGSEGV (due to some
>   experiments I am doing with array subscripting operations).
> - There are problems with NULL fields in WHERE conditions.
> - gram.y is only an ugly hack to get the CHECK OPTION working. It needs
>   deeper effort, because it makes WITH a reserved keyword.
>
> The following items need deeper discussion, I think:
>
> - DEFAULT values on the underlying base tables need to be specified
>   explicitly on the view itself (via ALTER TABLE); another problem is
>   SERIALs ...
> - What should happen if someone throws a pg_dump or SQL script at the
>   backend that holds its own update rules for a view? In this case the
>   implicit ones should be removed or deactivated.
> - The code only supports cascaded rules, and all rules are created on the
>   base relations only. So if one underlying base relation is not
>   updateable, the view itself is not updateable either.

Sorry, forgot the subject :(

--
Bernd
Re: [HACKERS] plans for bitmap indexes?
On K, 2004-10-20 at 01:52, Mark Kirkwood wrote:
> I don't believe that read-only is required. The update/insert performance
> impact of bitmap indexes is, however, very high (in Oracle's
> implementation anyway) - to the point where many sites drop them before
> adding in new data and recreate 'em afterwards! In the event that there
> is a benefit from the small on-disk footprint, the insert/update
> throughput implications will need to be taken into account.

I repeat here my earlier proposal of making the bitmap indexes page-level
and clustering data automatically on the AND of all defined bitmap indexes.

This would mostly solve this problem too, as there will be only one insert
per page per index (when the first tuple is inserted) and one delete (when
the page becomes empty). This has the downside of suboptimal space usage,
but that should not (tm) be an issue for large tables, where most
combinations of bits will get enough hits to fill several pages.

Such clustering would also help (probably a lot) all queries actually using
these indexes.

- Hannu
Re: [PATCHES] [HACKERS] ARC Memory Usage analysis
On Tue, 2004-10-26 at 09:49, Simon Riggs wrote:
> On Mon, 2004-10-25 at 16:34, Jan Wieck wrote:
> > The problem is, with a too small directory ARC cannot guesstimate what
> > might be in the kernel buffers. Nor can it guesstimate what recently
> > was in the kernel buffers and got pushed out from there. That results
> > in a way too small B1 list, and therefore we don't get B1 hits when in
> > fact the data was found in memory. B1 hits are what increases the T1
> > target, and since we are missing them with a too small directory size,
> > our implementation of ARC is probably using a T2 size larger than the
> > working set. That is not optimal.
>
> I think I have seen that the T1 list shrinks too much, but I need more
> tests... with some good test results.
>
> > If we would replace the dynamic T1 buffers with a max_backends*2 area
> > of shared buffers, use a C value representing the effective cache size
> > and limit the T1 target on the lower bound to effective cache size -
> > shared buffers, then we have basically moved the T1 cache into the OS
> > buffers.
>
> Limiting the minimum size of T1len to 2*max_backends sounds like an easy
> way to prevent overbalancing of T2, but I would like to follow up on ways
> to have T1 naturally stay larger. I'll do a patch with this idea in, for
> testing. I'll call this "T1 minimum size" so we can discuss it.

Don't know whether you've seen this latest update on the ARC idea:

Sorav Bansal and Dharmendra S. Modha, "CAR: Clock with Adaptive
Replacement", in Proceedings of the USENIX Conference on File and Storage
Technologies (FAST), pages 187-200, March 2004.
[I picked up the .pdf here: http://citeseer.ist.psu.edu/bansal04car.html]

In that paper Bansal and Modha introduce an update to ARC called CART,
which they say is more appropriate for databases. Their idea is to
introduce a "temporal locality window" as a way of making sure that blocks
accessed twice within a short period don't fall out of T1, though they
don't make it into T2 either. Strangely enough, the temporal locality
window is made by increasing the size of T1... in an adaptive way, of
course.

If we were going to put a limit on the minimum size of T1, then this would
put a minimal temporal locality window in place, rather than the increased
complexity they go to in order to make T1 larger.

I note test results from both the ARC and CAR papers showing that T2
usually represents most of C, so the observation that T1 is very small is
not atypical. That implies that the cost of managing the temporal locality
window in CART is usually wasted, even though it does pay off as an overall
benefit: the results show that CART is better than ARC over the whole range
of cache sizes tested (16 MB to 4 GB) and workloads (apart from 1 out of
22).

If we were to implement a minimum size of T1, related as suggested to the
number of users, then this would provide a reasonable approximation of the
temporal locality window. This wouldn't prevent T1 from adapting to be
higher than this when required.

Jan has already optimised ARC for PostgreSQL by the addition of a special
lookup on transactionId, required to optimise for the double cache lookup
of select/update that occurs on a T1 hit. That seems likely to be removable
as a result of having a larger T1.

I'd suggest limiting T1 to a value of:

    shared_buffers <= 1000    T1limit = max_backends * 0.75
    shared_buffers <= 2000    T1limit = max_backends
    shared_buffers <= 5000    T1limit = max_backends * 1.5
    shared_buffers >  5000    T1limit = max_backends * 2

I'll try some tests with both:
- minimum size of T1
- update optimisation removed

Thoughts?
--
Best Regards, Simon Riggs
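For concreteness, the "T1 minimum size" clamp proposed above might look
roughly like the sketch below. All the names here (clamp_t1_target,
num_shared_buffers) are invented for illustration; this is not a patch
against the actual ARC code, just the shape of the idea, with the
thresholds taken from the table in the message.

    extern int MaxBackends;     /* assumed available, as in the backend */

    static int
    clamp_t1_target(int t1_target, int num_shared_buffers)
    {
        double  factor;
        int     t1_min;

        /* thresholds follow the table proposed above */
        if (num_shared_buffers <= 1000)
            factor = 0.75;
        else if (num_shared_buffers <= 2000)
            factor = 1.0;
        else if (num_shared_buffers <= 5000)
            factor = 1.5;
        else
            factor = 2.0;

        t1_min = (int) (MaxBackends * factor);

        /* never let the adaptive target drop below the per-backend floor */
        return (t1_target < t1_min) ? t1_min : t1_target;
    }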
Re: [HACKERS] Possible make_oidjoins_check Security Issue
Tom Lane wrote:
> Neil Conway [EMAIL PROTECTED] writes:
> > On Wed, 2004-10-20 at 06:18, Rod Taylor wrote:
> > > http://secunia.com/advisories/12860/
> >
> > This seems like a rather inconsequential problem,
>
> Indeed, since ordinary users have no use for make_oidjoins_check. It's
> surely very implausible that anyone would run it as root; and even if
> someone did, the attacker cannot control what gets written.
>
> > but it should be fixed. The first two ideas that come to mind: use
> > temporary files in $PWD rather than /tmp, or create a subdirectory in
> > /tmp to use for the temporary files.
>
> I believe that the subdirectory idea is also vulnerable without great
> care.

I believe the proper way to handle this is a new directory under /tmp. I
use this in my local scripts:

    TMP=/tmp/$$
    OMASK=`umask`
    umask 077
    if ! mkdir $TMP
    then    echo "Can't create temporary directory $TMP." 1>&2
            exit 1
    fi
    umask $OMASK
    unset OMASK

It basically makes sure it creates a new directory under /tmp with a umask
that guarantees no one else can create a file in that directory. All temp
files are accessed as $TMP/XXX.

--
Bruce Momjian  |  http://candle.pha.pa.us
[EMAIL PROTECTED]  |  (610) 359-1001
+  If your life is a hard drive,  |  13 Roberts Road
+  Christ can be your backup.     |  Newtown Square, Pennsylvania 19073
Re: [HACKERS] rmtree() failure on Windows
Here is some more info. Below is a trace from dropdb. There is a loop
around the rmdir() calls which I have set to time out at 600 seconds. The
call eventually succeeds after around 300 seconds (I've seen this several
times). It looks like we are the victim of some caching - the directory
still thinks it has some of the files it has told us we have deleted
successfully.

Bottom line: this is a real mess. Surely postgres is not the only
application in the world that wants to be able to delete a directory tree
reliably on Windows. What do other apps do?

cheers

andrew

2004-10-26 10:26:35 [2496] LOG: connection received: host=127.0.0.1 port=1918
2004-10-26 10:26:35 [2496] LOG: connection authorized: user=pgrunner database=template1
2004-10-26 10:26:35 [2496] DEBUG: unlinking C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/1247
2004-10-26 10:26:35 [2496] DEBUG: unlinking C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/1249
2004-10-26 10:26:35 [2496] DEBUG: unlinking C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/1255
2004-10-26 10:26:35 [2496] DEBUG: unlinking C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/1259
2004-10-26 10:26:35 [2496] DEBUG: unlinking C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16384
2004-10-26 10:26:35 [2496] DEBUG: unlinking C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16386
2004-10-26 10:26:35 [2496] DEBUG: unlinking C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16388
2004-10-26 10:26:35 [2496] DEBUG: unlinking C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16390
2004-10-26 10:26:35 [2496] DEBUG: unlinking C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16392
2004-10-26 10:26:35 [2496] DEBUG: unlinking C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16394
2004-10-26 10:26:35 [2496] DEBUG: unlinking C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16396
2004-10-26 10:26:35 [2496] DEBUG: unlinking C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16398
2004-10-26 10:26:35 [2496] DEBUG: unlinking C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16400
2004-10-26 10:26:35 [2496] DEBUG: unlinking C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16402
2004-10-26 10:26:35 [2496] DEBUG: unlinking C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16404
2004-10-26 10:26:35 [2496] DEBUG: unlinking C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16406
2004-10-26 10:26:35 [2496] DEBUG: unlinking C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16408
2004-10-26 10:26:35 [2496] DEBUG: unlinking C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16410
2004-10-26 10:26:35 [2496] DEBUG: unlinking C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16412
2004-10-26 10:26:35 [2496] DEBUG: unlinking C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16414
2004-10-26 10:26:35 [2496] DEBUG: unlinking C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16416
2004-10-26 10:26:35 [2496] DEBUG: unlinking C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16418
2004-10-26 10:26:35 [2496] DEBUG: unlinking C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16672
2004-10-26 10:26:35 [2496] DEBUG: unlinking C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16674
2004-10-26 10:26:35 [2496] DEBUG: unlinking C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16676
2004-10-26 10:26:35 [2496] DEBUG: unlinking C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16678
2004-10-26 10:26:35 [2496] DEBUG: unlinking C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16679
2004-10-26 10:26:35 [2496] DEBUG: unlinking C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16680
2004-10-26 10:26:35 [2496] DEBUG: unlinking C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16681
2004-10-26 10:26:35 [2496] DEBUG: unlinking C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16682
2004-10-26 10:26:35 [2496] DEBUG: unlinking C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16683
2004-10-26 10:26:35 [2496] DEBUG: unlinking C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16684
2004-10-26 10:26:35 [2496] DEBUG: unlinking C:/msys/1.0/home/pgrunner/pgbuildfarm/root/HEAD/inst.blurfl/data/base/17230/16685
2004-10-26 10:26:35 [2496]
Re: [HACKERS] Possible make_oidjoins_check Security Issue
Bruce Momjian [EMAIL PROTECTED] writes:
> I believe the proper way to handle this is a new directory under /tmp.

It's definitely not worth the trouble. I looked at what configure does to
make /tmp subdirectories portably, and it is spectacularly ugly (not to
mention long). If make_oidjoins_check were a user-facing tool that would be
one thing, but it isn't ...

regards, tom lane
[HACKERS] making pdf of docs
Is there something wrong that makes it impossible to build the doc as a
PDF? I started a build 4 hours ago, and it has still not finished (stuck at
100% CPU on my old 800 MHz, 1 GB RAM machine). I know that openjade is very
slow, so for the first 3 hours I didn't worry. Now I'm starting to think
that it will never finish.

--
/Dennis Björklund
Re: [HACKERS] making pdf of docs
On Tue, 26 Oct 2004, Dennis Bjorklund wrote:

> Is there something wrong that makes it impossible to build the doc as a
> pdf?

My experience is that the latest openjade crashes. The latest jade takes
about 10 days on an Athlon 1600, but I can build it in a very reasonable
timeframe with an older version of jade.

Well, scratch that. I just checked the jade versions and they are
identical; perhaps different stylesheets or something? For now all I know
is that it works on Debian stable but takes forever on unstable. I'll have
to do some more investigating to try and find the real difference.

Kris Jurka
Re: [PATCHES] [HACKERS] ARC Memory Usage analysis
Curt Sampson [EMAIL PROTECTED] writes:
> On Tue, 26 Oct 2004, Greg Stark wrote:
> > I see mmap or O_DIRECT being the only viable long-term stable states.
> > My natural inclination was the former, but after the latest thread on
> > the subject I suspect it'll be forever out of reach. That makes
> > O_DIRECT and a Postgres-managed cache the only real choice. Having both
> > caches is just a waste of memory and a waste of CPU cycles.
>
> I don't see why mmap is any more out of reach than O_DIRECT; it's not all
> that much harder to implement, and mmap (and madvise!) is more widely
> available.

Because there's no way to prevent a write-out from occurring and no way to
be notified by mmap before a write-out occurs, and Postgres wants to do its
WAL logging at that time if it hasn't already happened.

> But if using two caches is only costing us 1% in performance, there's not
> really much point.

Well, firstly it depends on the work profile. It can probably get much
higher than we saw in that profile if your workload is causing more fresh
buffers to be fetched.

Secondly, it also reduces the amount of cache available. If you have 256 MB
of RAM with about 200 MB free, and 40 MB of RAM set aside for Postgres's
buffer cache, then you really only get 160 MB. It's costing you 20% of your
cache, and reducing the cache hit rate accordingly.

Thirdly, the kernel doesn't know as much as Postgres about the load.
Postgres could optimize its use of cache based on whether it knows the data
is being loaded by a vacuum or sequential scan rather than an index lookup.
In practice Postgres has gone with ARC, which I suppose a kernel could
implement anyway, but AFAIK neither Linux nor BSD chooses to do anything
like it.

--
greg
Re: [HACKERS] making pdf of docs
Dennis Bjorklund wrote:
> Is there something wrong that makes it impossible to build the doc as a
> pdf? I started a build 4 hours ago, and it has still not finished (stuck
> at 100% CPU on my old 800 MHz, 1 GB RAM machine). I know that openjade is
> very slow so for the first 3 hours I didn't worry. Now I'm starting to
> think that it will never finish.

I've never tried building PDF from PostgreSQL DocBook source. However, in
other DocBook documents, I've found that if there is an embedded image that
is too large to fit on a single page, various PDF renderers will paginate
the image onto the next page, discover it is too large to fit on the next
page, generate a page break, and the process continues ad infinitum. Maybe
a recent large image was added to the docs?

FWIW,

Mike Mascari
Re: [HACKERS] plans for bitmap indexes?
Hannu Krosing [EMAIL PROTECTED] writes:
> I repeat here my earlier proposal of making the bitmap indexes page-level
> and clustering data automatically on AND of all defined bitmap indexes.

The problem with page-level bitmaps is that they could be much less
effective. Consider a query like 'WHERE foo = ? AND bar = ? AND baz = ?'
where each of those matches about 1% of your tuples. If you have 100 tuples
per page, then each of those bitmaps will find a tuple in about half the
pages. So the resulting AND will find about 1/8th of the pages as
candidates. In reality the number of pages it should have to fetch should
be more like 1 in a million.

The other problem is that for persistent on-disk indexes they require more
work to update. You would have to recheck every other tuple in the page to
recalculate the bit value, instead of just being able to flip one bit.

--
greg
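The "about half the pages" and "1/8th" above are rounded versions of a
quick calculation; the snippet below (illustrative only) spells it out for
1% selectivity and 100 tuples per page.

    #include <stdio.h>
    #include <math.h>

    int
    main(void)
    {
        double  sel = 0.01;         /* each predicate matches 1% of tuples */
        int     per_page = 100;     /* tuples per page */

        /* probability that a page contains at least one matching tuple */
        double  page_hit = 1.0 - pow(1.0 - sel, per_page);

        printf("fraction of pages set per bitmap: %.2f\n", page_hit);           /* ~0.63 */
        printf("fraction surviving a 3-way AND:   %.4f\n", pow(page_hit, 3.0));  /* ~0.25 */
        printf("true tuple-level selectivity:     %g\n", pow(sel, 3.0));         /* 1e-06 */
        return 0;
    }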
[HACKERS] Postgres performs a Seq Scan instead of an Index Scan!
Hello,

I have a little question. Why does PostgreSQL perform a Seq Scan in the
following SELECT statement instead of an index read? I have an index on
(ATTR1, ATTR2, ATTR3), so why is PostgreSQL not performing an index-keyed
read in the SELECT? I agree that the tables are empty, so maybe this
influences the decision to do a seq scan, but my app uses a DB with around
100,000 records and it still does a seq scan.

CREATE TABLE TESTTABLE (ATTR1 INT8, ATTR2 INT4, ATTR3 TIMESTAMP);
CREATE UNIQUE INDEX TESTINDEX ON TESTTABLE(ATTR1, ATTR2, ATTR3);
EXPLAIN SELECT * FROM TESTTABLE WHERE ATTR1=1 AND ATTR2=2 AND ATTR3='2004-01-01';

Results in:

                             QUERY PLAN
 ----------------------------------------------------------------------
 Seq Scan on testtable  (cost=0.00..27.50 rows=1 width=20)
   Filter: ((attr1 = 1) AND (attr2 = 2) AND (attr3 = '2004-01-01 00:00:00'::timestamp without time zone))
(2 rows)

If I add an INDEXHELPER it helps a bit, but it's not a 100% index scan:

CREATE TABLE TESTTABLE (ATTR1 INT8, ATTR2 INT4, ATTR3 TIMESTAMP);
CREATE UNIQUE INDEX TESTINDEX ON TESTTABLE(ATTR1, ATTR2, ATTR3);
CREATE INDEX INDEXHELPER ON TESTTABLE(ATTR3);
EXPLAIN SELECT * FROM TESTTABLE WHERE ATTR1=1 AND ATTR2=2 AND ATTR3='2004-01-01';

                             QUERY PLAN
 ----------------------------------------------------------------------
 Index Scan using indexhelper on testtable  (cost=0.00..17.09 rows=1 width=20)
   Index Cond: (attr3 = '2004-01-01 00:00:00'::timestamp without time zone)
   Filter: ((attr1 = 1) AND (attr2 = 2))
(3 rows)

Changing from TIMESTAMP to DATE doesn't help (I am not using the time
component in my app):

CREATE TABLE TESTTABLE (ATTR1 INT8, ATTR2 INT4, ATTR3 DATE);
CREATE UNIQUE INDEX TESTINDEX ON TESTTABLE(ATTR1, ATTR2, ATTR3);
EXPLAIN SELECT * FROM TESTTABLE WHERE ATTR1=1 AND ATTR2=2 AND ATTR3='2004-01-01';

                             QUERY PLAN
 ----------------------------------------------------------------------
 Seq Scan on testtable  (cost=0.00..27.50 rows=1 width=16)
   Filter: ((attr1 = 1) AND (attr2 = 2) AND (attr3 = '2004-01-01'::date))
(2 rows)

Thanks in advance,

Jos
Re: [HACKERS] Postgres performs a Seq Scan instead of an Index Scan!
On Oct 26, 2004, at 12:12 PM, Jos van Roosmalen wrote:
> ATTR1 INT8

Looks like your column is int8, yet your query is sending in an int4, so
the index is not used. This is fixed in PG 8.0. In the meantime, you can:

SELECT * FROM TESTTABLE WHERE ATTR1=1::INT8 ...

which explicitly casts the literal int4 to an int8, making the int8
column's index usable.

James Robinson
Socialserve.com
Re: [HACKERS] Postgres performs a Seq Scan instead of an Index Scan!
On Tue, 26 Oct 2004 18:12:36 +0200, Jos van Roosmalen wrote:
> I have a little question. Why does PostgreSQL perform a Seq Scan in the
> following SELECT statement instead of an index read?

That is a FAQ: http://www.postgresql.org/docs/faqs/FAQ.html#4.8

Please direct any further questions of this nature that are not covered in
the FAQ or the documentation to the pgsql-performance list.

Jochem
Re: [HACKERS] Postgres performs a Seq Scan instead of an Index Scan!
On Tue, Oct 26, 2004 at 06:12:36PM +0200, Jos van Roosmalen wrote:
> CREATE TABLE TESTTABLE (ATTR1 INT8, ATTR2 INT4, ATTR3 TIMESTAMP);
> CREATE UNIQUE INDEX TESTINDEX ON TESTTABLE(ATTR1, ATTR2, ATTR3);
> EXPLAIN SELECT * FROM TESTTABLE WHERE ATTR1=1 AND ATTR2=2 AND ATTR3='2004-01-01';

Try:

explain select * from testtable where attr1=1::int8 and attr2=2 and attr3='2004-01-01';

Or change the int8 to bigint. Isn't this in some FAQ yet? It will not do an
index scan if the types don't match.

Kurt
Re: [HACKERS] rmtree() failure on Windows
Andrew Dunstan [EMAIL PROTECTED] writes:
> Here is some more info. Below is a trace from dropdb. There is a loop
> around the rmdir() calls which I have set to time out at 600 seconds. The
> call eventually succeeds after around 300 seconds (I've seen this several
> times). It looks like we are the victim of some caching - the directory
> still thinks it has some of the files it has told us we have deleted
> successfully.

If you rescan the directory after deleting the files, does it show as
empty?

> Bottom line, this is a real mess. Surely postgres is not the only
> application in the world that wants to be able to delete a directory tree
> reliably on Windows. What do other apps do?

I'm wondering if this is a side effect of the way win32_open does things.
It's hard to believe that rmdir is that bogus in general, but perhaps
win32_open is causing us to exercise a corner case?

regards, tom lane
Re: [HACKERS] to_char/to_number loses sign
Karel Zak [EMAIL PROTECTED] writes:
> Yes, you're right. It's strange, but NUM_S is missing there. The
> conversion from string to number is a less stable part of formatting.c...
> The patch is in the attachment.

This patch causes the regression tests to fail. I think you need to
consider the to_char() side of it more carefully.

regards, tom lane
Re: [HACKERS] Postresql 8.0 Beta 3 - SELECT ... FOR UPDATE
There is a statement_timeout that will control how long a statement can
execute before being cancelled. We have never agreed that controlling how
long we wait for an individual lock is valuable.

---------------------------------------------------------------------------

Robert Treat wrote:
> On Thursday 21 October 2004 06:44, you wrote:
> > Hi. Was the timeout on SELECT ... FOR UPDATE (a non-blocking lock)
> > already implemented, and/or is it possible to know whether a lock
> > exists on the specified ROW before executing the SELECT?
>
> No.
>
> > Please note: our need is the timeout/check at the ROW level, not at the
> > table level. Is this already available? Is it on the TODO list? Can you
> > suggest an alternative method? Thank you.
>
> You would need a more extensive implementation of row-level locks than
> PostgreSQL currently offers. There have been discussions about this in
> the past, but AFAIK no one is actively working on it. You can probably
> find more info in the archives about it; also, I believe it is on the
> TODO list, so you might find some more detail by looking there.
>
> --
> Robert Treat
> Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL

--
Bruce Momjian  |  http://candle.pha.pa.us
[EMAIL PROTECTED]  |  (610) 359-1001
+  If your life is a hard drive,  |  13 Roberts Road
+  Christ can be your backup.     |  Newtown Square, Pennsylvania 19073
[HACKERS] New compile warnings in CVS
I just updated my CVS copy and am seeing four new warnings from pgstat.c:

pgstat.c:2352: warning: variable `dbentry' might be clobbered by `longjmp' or `vfork'
pgstat.c:2360: warning: variable `havebackends' might be clobbered by `longjmp' or `vfork'
pgstat.c:2362: warning: variable `use_mcxt' might be clobbered by `longjmp' or `vfork'
pgstat.c:2363: warning: variable `mcxt_flags' might be clobbered by `longjmp' or `vfork'

Any ideas on this?

--
Bruce Momjian  |  http://candle.pha.pa.us
[EMAIL PROTECTED]  |  (610) 359-1001
+  If your life is a hard drive,  |  13 Roberts Road
+  Christ can be your backup.     |  Newtown Square, Pennsylvania 19073
Re: [HACKERS] New compile warnings in CVS
Bruce Momjian wrote:
> I just updated my CVS copy and am seeing four new warnings from pgstat.c:
>
> pgstat.c:2352: warning: variable `dbentry' might be clobbered by `longjmp' or `vfork'
> pgstat.c:2360: warning: variable `havebackends' might be clobbered by `longjmp' or `vfork'
> pgstat.c:2362: warning: variable `use_mcxt' might be clobbered by `longjmp' or `vfork'
> pgstat.c:2363: warning: variable `mcxt_flags' might be clobbered by `longjmp' or `vfork'
>
> Any ideas on this?

Oh, this is BSD/OS 4.3 using gcc:

gcc version 2.95.3 20010315 (release)

--
Bruce Momjian  |  http://candle.pha.pa.us
[EMAIL PROTECTED]  |  (610) 359-1001
+  If your life is a hard drive,  |  13 Roberts Road
+  Christ can be your backup.     |  Newtown Square, Pennsylvania 19073
Re: [HACKERS] rmtree() failure on Windows
Tom Lane wrote:
> Andrew Dunstan [EMAIL PROTECTED] writes:
> > Here is some more info. Below is a trace from dropdb. There is a loop
> > around the rmdir() calls which I have set to time out at 600 seconds.
> > The call eventually succeeds after around 300 seconds (I've seen this
> > several times). It looks like we are the victim of some caching - the
> > directory still thinks it has some of the files it has told us we have
> > deleted successfully.
>
> If you rescan the directory after deleting the files, does it show as
> empty?

No! That's how I got the list of files it still thinks are there. Gross,
eh?

> > Bottom line, this is a real mess. Surely postgres is not the only
> > application in the world that wants to be able to delete a directory
> > tree reliably on Windows. What do other apps do?
>
> I'm wondering if this is a side effect of the way win32_open does things.
> It's hard to believe that rmdir is that bogus in general, but perhaps
> win32_open is causing us to exercise a corner case?

I don't know. I tried to reproduce it in a simple case using fopen/fclose
and wasn't able to.

cheers

andrew
Re: [HACKERS] New compile warnings in CVS
Bruce Momjian [EMAIL PROTECTED] writes:
> I just updated my CVS copy and am seeing four new warnings from pgstat.c:
> pgstat.c:2352: warning: variable `dbentry' might be clobbered by `longjmp' or `vfork'
> pgstat.c:2360: warning: variable `havebackends' might be clobbered by `longjmp' or `vfork'
> pgstat.c:2362: warning: variable `use_mcxt' might be clobbered by `longjmp' or `vfork'
> pgstat.c:2363: warning: variable `mcxt_flags' might be clobbered by `longjmp' or `vfork'

No doubt this is from the PG_TRY that Neil added a couple of days ago. I
think he is going to take it out again in favor of using AllocateFile, so
ignore the warnings for now (they're obviously bogus anyway).

regards, tom lane
[HACKERS] 8.0b4: COMMIT outside of a transaction echoes ROLLBACK
Just wondering:

test=> select version();
                                        version
------------------------------------------------------------------------------------------
 PostgreSQL 8.0.0beta4 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3.3 (SuSE Linux)
(1 row)

test=> begin;
BEGIN
test=> commit;
COMMIT
test=> commit;
WARNING:  there is no transaction in progress
ROLLBACK

Is there any reason ROLLBACK and not COMMIT is echoed here?

Ian Barwick
[EMAIL PROTECTED]
Re: [HACKERS] timestamp with time zone a la sql99
Added to TODO:

* Once we expand timestamptz to bigger than 8 bytes, there's essentially

---------------------------------------------------------------------------

Tom Lane wrote:
> Dennis Bjorklund [EMAIL PROTECTED] writes:
> > So if I understand you correctly you are planning to extend the current
> > timestamp type to work with both named time zones and HH:MM ones? I
> > didn't think you wanted the last one, since your plan was to store a
> > UTC+OID where the OID pointed to a named time zone. And I guess that
> > you don't plan to add 00:00, 00:01, 00:02, ... as named zones with an
> > OID.
>
> I missed getting back to you on this, but I think we can do both. Some
> random points:
>
> * Once we expand timestamptz to bigger than 8 bytes, there's essentially
> zero cost to making it 12 bytes, and for that matter we could go to 16
> without much penalty, because of alignment considerations. So there's
> plenty of space.
>
> * What we need is to be able to represent either a fixed offset from UTC
> or a reference of some kind to a zic database entry. The most
> bit-splurging way of doing the former is a signed offset in seconds from
> Greenwich, which would take 17 bits. It'd be good enough to represent the
> offset in minutes, which needs only 11 bits.
>
> * I suggested OIDs for referencing zic entries, but we don't have to do
> that; any old mapping table will do. 16 bits would surely be plenty to
> assign a unique label to every present and future zic entry.
>
> * My inclination therefore is to extend timestamptz with two 16-bit
> fields, one being the offset from UTC (in minutes) and one being the zic
> identifier. If the identifier is zero then it's a straight numeric offset
> from UTC and the offset field is all you need (this is the SQL spec
> compatible case). If the identifier is not zero then it gives you an
> index to look up the timezone rules. However, there is no need for the
> offset field to go to waste; we should store the offset anyway, since
> that might save a trip to the zic database in some cases.
>
> * It's not clear to me yet whether the stored offset in the second case
> should be the zone's standard UTC offset (thus always the same for a
> given zone ID) or the current-time offset for the timestamp (thus
> different if the timestamp is in daylight-savings or standard time).
>
> * If we store the current-time offset then it almost doesn't matter
> whether the timestamp itself is stored as a UTC or local time value; you
> can trivially translate either to the other by adding or subtracting the
> offset (*60). But I'm inclined to store UTC for consistency with past
> practice, and because it will make comparisons a bit faster: you can
> compare the timestamps without adjusting first. Generally I think
> comparisons ought to be the best-optimized operations in a Postgres
> datatype, because index operations will do a ton of 'em. (We definitely
> do NOT want to have to visit the zic database in order to compare two
> timestamptz values.)
>
> regards, tom lane

--
Bruce Momjian  |  http://candle.pha.pa.us
[EMAIL PROTECTED]  |  (610) 359-1001
+  If your life is a hard drive,  |  13 Roberts Road
+  Christ can be your backup.     |  Newtown Square, Pennsylvania 19073
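As a reading aid, the layout sketched above (an 8-byte timestamp plus two
16-bit fields) could be pictured roughly as below. The struct and field
names are invented for illustration; this is not the actual PostgreSQL
datatype definition.

    #include <stdint.h>

    typedef struct TimestampTzExt
    {
        int64_t time;           /* the timestamp proper, stored as UTC */
        int16_t utc_offset;     /* offset from UTC in minutes (SQL-spec case) */
        int16_t zone_id;        /* index into a zic-entry mapping table; 0 = plain offset */
    } TimestampTzExt;           /* 12 bytes of payload; 16 with 8-byte alignment padding */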
Re: [HACKERS]
On T, 2004-10-26 at 14:02, Bernd Helmle wrote:
> - gram.y is only an ugly hack to get the CHECK OPTION working. It needs
>   deeper effort, because it makes WITH a reserved keyword.

IMHO it should be a reserved keyword. I once wrangled to fit ANSI SQL
recursive queries into the postgres grammar, and the only way I could do
that also forced me to make WITH a reserved keyword. Thus I think that a
reserved keyword is what it is meant to be in the first place ;)

--
Hannu
[HACKERS] DBT-3 Query 2 EXPLAIN ANALYZE differences
I was doing some testing with DBT-3 on our 8-way STP systems and noticed a
significant difference in the execution of Query 2 using 8.0beta3. Here is
the query template we're using:

select s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment
from part, supplier, partsupp, nation, region
where p_partkey = ps_partkey
  and s_suppkey = ps_suppkey
  and p_size = :1
  and p_type like '%:2'
  and s_nationkey = n_nationkey
  and n_regionkey = r_regionkey
  and r_name = ':3'
  and ps_supplycost = (
        select min(ps_supplycost)
        from partsupp, supplier, nation, region
        where p_partkey = ps_partkey
          and s_suppkey = ps_suppkey
          and s_nationkey = n_nationkey
          and n_regionkey = r_regionkey
          and r_name = ':3'
      )
order by s_acctbal desc, n_name, s_name, p_partkey

This first run executes it pretty fast:
http://khack.osdl.org/stp/298338/
1555.414 ms

This second run executes it relatively slowly:
http://khack.osdl.org/stp/298340/
42532.855 ms

The plans are different and I suspect that's where the differences lie. For
brevity (and readability) I won't copy the plans here, but I'll provide the
links. Search for 'PERF1.POWER.Q2' in the file; it's the second query
executed, and you'll notice the differences under the SubPlan:

The first run:
http://khack.osdl.org/stp/298338/results/run/power_query.result
The second run:
http://khack.osdl.org/stp/298340/results/run/power_query.result

I know Jenny has previously presented a problem that was solved by doing a
setseed(0), but I noticed the kit doesn't do that anymore. Anyone know if
this might be the same or a related issue?

Thanks,
Mark
Re: [HACKERS] plans for bitmap indexes?
On T, 2004-10-26 at 18:42, Greg Stark wrote:
> Hannu Krosing [EMAIL PROTECTED] writes:
> > I repeat here my earlier proposal of making the bitmap indexes
> > page-level and clustering data automatically on AND of all defined
> > bitmap indexes.
>
> The problem with page-level bitmaps is that they could be much less
> effective. Consider a query like 'WHERE foo = ? AND bar = ? AND baz = ?'
> where each of those matches about 1% of your tuples. If you have 100
> tuples per page then each of those bitmaps will find a tuple in about
> half the pages. So the resulting AND will find about 1/8th of the pages
> as candidates. In reality the number of pages it should have to fetch
> should be more like 1 in a million.

Read again ;) The per-page clustering would make sure that all the matching
tuples would be on 1 (or a few) pages.

And as for updating the index, you have to do it only once per 100 pages ;)

Hannu
Re: [HACKERS] plans for bitmap indexes?
On T, 2004-10-26 at 23:53, Hannu Krosing wrote:
> On T, 2004-10-26 at 18:42, Greg Stark wrote:
> > Hannu Krosing [EMAIL PROTECTED] writes:
> > > I repeat here my earlier proposal of making the bitmap indexes
> > > page-level and clustering data automatically on AND of all defined
> > > bitmap indexes.
> >
> > The problem with page-level bitmaps is that they could be much less
> > effective. Consider a query like 'WHERE foo = ? AND bar = ? AND baz = ?'
> > where each of those matches about 1% of your tuples. If you have 100
> > tuples per page then each of those bitmaps will find a tuple in about
> > half the pages. So the resulting AND will find about 1/8th of the pages
> > as candidates. In reality the number of pages it should have to fetch
> > should be more like 1 in a million.

Another way to solve the unequal-number-of-tuples-per-page problem would be
to have a fixed-length bitmap with rollover (mod length) for each page. So
your 100-tuples-per-page-on-average table would get a 32 (or 64) bits per
page bitmap, where bits 1, 33, 65 and 97 would all be in the same position
(for 32 bits), but one could still do fast ANDs and ORs with a high degree
of accuracy.

I guess the per-page clustering idea described in my previous mail could
even be extended inside the pages (i.e. cluster on the same bits in a
2/4/8/16/32-bit page bitmap) if simple per-page bitmaps would waste too
much space (many different values, few actual rows - i.e. not a good
candidate for real bitmap indexes ;-p ).

> > The other problem is that for persistent on-disk indexes they require
> > more work to update. You would have to recheck every other tuple in
> > the page to recalculate the bit value instead of just being able to
> > flip one bit.
>
> Read again ;) The per-page clustering would make sure that all the
> matching tuples would be on 1 (or a few) pages.
>
> And as for updating the index, you have to do it only once per 100
> pages ;)

This kind of clustering index works best when created on an empty table, so
all tuples can be inserted on their rightful pages. If this kind of BM
index is created on a table with some data, we need an additional bitmap
for "gray" pages - that is, pages containing tuples matching several
combinations of index bits.

The way to sharpen a table with gray pages would be either a CLUSTER
command or VACUUM (which could check for same-bit-combination-ness). At
least an empty page would be initially (or after becoming empty during
vacuum) marked non-gray, and it should also never become gray unless a new
bitmap index is added.

- Hannu
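A minimal sketch of the "fixed-length bitmap with rollover" idea described
above: one 32-bit word per heap page, with tuple offset N mapped to bit
N mod 32, so offsets 1, 33, 65 and 97 share a bit. Everything here is
invented for illustration; it is not an actual index implementation.

    #include <stdint.h>

    #define PAGE_BITMAP_BITS 32

    /* mark the bit for a tuple at the given offset within its page's word */
    static inline void
    page_bitmap_set(uint32_t *word, unsigned tuple_offset)
    {
        *word |= (uint32_t) 1 << (tuple_offset % PAGE_BITMAP_BITS);
    }

    /* test whether that (possibly shared) bit is set */
    static inline int
    page_bitmap_test(uint32_t word, unsigned tuple_offset)
    {
        return (word >> (tuple_offset % PAGE_BITMAP_BITS)) & 1;
    }

    /*
     * ANDing the per-page words of two such indexes still prunes pages
     * quickly; false positives arise only where different tuples happen to
     * share a bit position, which is the accuracy trade-off mentioned above.
     */
    static inline uint32_t
    page_bitmap_and(uint32_t a, uint32_t b)
    {
        return a & b;
    }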
Re: [HACKERS] plans for bitmap indexes?
Hannu Krosing wrote:
> The per-page clustering would make sure that all the matching tuples
> would be on 1 (or a few) pages.

I understand that you can cluster on one column, but how do you do it for
indexes on other columns? BTW, lossy variants also lose the ability to
answer count() or GROUP BY from the index alone.

> And as for updating the index, you have to do it only once per 100
> pages ;)

Sorry, how does that work? If I update foo = 'bar' -> 'baz', I can flip the
'baz' bit on right away, but I have to check every other row to see if I
can turn the 'bar' bit off.

Andre
Re: [HACKERS] Unixware 714 pthreads
The only help I can be is that on Unixware (only) the backend is compiled
with threading enabled. This might be showing some thread bugs.

---------------------------------------------------------------------------

[EMAIL PROTECTED] wrote:
> Hi everyone,
>
> I need help to debug a problem I have on Unixware 714 and beta3.
> "make check" hangs on the plpgsql test when PostgreSQL is compiled with
> --enable-thread-safety. It hangs on "select block_me();". This select
> should be cancelled by the "set statement_timeout=1000"; instead, the
> backend is 100% CPU bound and only kill -9 can kill it.
>
> It works OK when compiled without --enable-thread-safety.
>
> I've tried almost everything I could think of, but not knowing so much
> about threads and the PG source code, I ask that someone help me find a
> way to debug this. It worked up until beta2, but I'm not sure block_me()
> was there. I really need someone to tell me where to begin.
>
> TIA
>
> --
> Olivier PRENANT                 Tel: +33-5-61-50-97-00 (Work)
> 6, Chemin d'Harraud Turrou           +33-5-61-50-97-01 (Fax)
> 31190 AUTERIVE                       +33-6-07-63-80-64 (GSM)
> FRANCE                          Email: [EMAIL PROTECTED]
> --
> Make your life a dream, make your dream a reality. (St Exupery)

--
Bruce Momjian  |  http://candle.pha.pa.us
[EMAIL PROTECTED]  |  (610) 359-1001
+  If your life is a hard drive,  |  13 Roberts Road
+  Christ can be your backup.     |  Newtown Square, Pennsylvania 19073
Re: [HACKERS] rmtree() failure on Windows
Andrew Dunstan wrote:
> Here is some more info. Below is a trace from dropdb. There is a loop
> around the rmdir() calls which I have set to time out at 600 seconds. The
> call eventually succeeds after around 300 seconds (I've seen this several
> times). It looks like we are the victim of some caching - the directory
> still thinks it has some of the files it has told us we have deleted
> successfully.

A 300-second (!) filesystem timeout is really broken. It looks more like a
locking or network timeout issue. What error codes does unlink(3) return?

Why don't you use DeleteFileA() instead of unlink()? Or, even better, why
don't you use this "delete on close" snippet instead:

    HANDLE h;

    h = CreateFile (win32_name, 0, FILE_SHARE_READ, &sec_none_nih,
                    OPEN_EXISTING, FILE_FLAG_DELETE_ON_CLOSE, 0);
    if (h != INVALID_HANDLE_VALUE)
      {
        (void) SetFileAttributes (win32_name, (DWORD) win32_name);
        BOOL res = CloseHandle (h);
        //syscall_printf ("%d = CloseHandle (%p)", res, h);
        if (GetFileAttributes (win32_name) == INVALID_FILE_ATTRIBUTES)
          {
            //syscall_printf ("CreateFile (FILE_FLAG_DELETE_ON_CLOSE) succeeded");
            goto ok;
          }
        else
          {
            //syscall_printf ("CreateFile (FILE_FLAG_DELETE_ON_CLOSE) failed");
            SetFileAttributes (win32_name,
                               (DWORD) win32_name & ~(FILE_ATTRIBUTE_READONLY | FILE_ATTRIBUTE_SYSTEM));
          }
      }

    /* Try a delete with attributes reset */
    if (DeleteFile (win32_name))
      {
        syscall_printf ("DeleteFile after CreateFile/CloseHandle succeeded");
        goto ok;
      }

Such a long timeout should only happen with ERROR_SHARING_VIOLATION on NT
systems, which would make this a concurrency problem. Win95 will not return
ERROR_SHARING_VIOLATION, only ERROR_ACCESS_DENIED ...

> 2004-10-26 10:31:09 [2496] WARNING: rmtree: rmdir took 274 secs/loops
> 2004-10-26 10:31:09 [2496] LOG: disconnection: session time: 0:04:34.11 user=pgrunner database=template1 host=127.0.0.1 port=1918

--
Reini Urban
http://xarch.tu-graz.ac.at/home/rurban/
[HACKERS] Should bgwriter log checkpoint start/end?
In previous releases it was possible to observe whether an automatic
checkpoint was in progress by looking to see if there was a postmaster
child process doing one. In 8.0 this will not work because the bgwriter is
always there.

I am thinking that for tasks such as performance debugging it would be a
good idea if the bgwriter could emit postmaster log messages at start and
end of a checkpoint. However, this should probably not happen at the
default LOG level, since it would clutter the logs with perfectly routine
messages. Any opinions about what elog level to use for this?

regards, tom lane
Re: [HACKERS] DBT-3 Query 2 EXPLAIN ANALYZE differences
Mark,

> The plans are different and I suspect that's where the differences lie.
> For brevity (and readability) I won't copy the plans here, but I'll
> provide the links. Search for 'PERF1.POWER.Q2' in the file; it's the
> second query executed, and you'll notice the differences under the
> SubPlan:

Yeah, the difference seems to be an arbitrary choice of which table to join
on first. The error is only +10 ms, but when it's being executed 3000
times ...

Will delve into this when I can re-format it, and try to figure out why the
planner is choosing what it is.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] [PATCHES] [HACKERS] ARC Memory Usage analysis
Thomas,

> As a result, I was intending to inflate the value of effective_cache_size
> to closer to the amount of unused RAM on some of the machines I admin
> (once I've verified that they all have a unified buffer cache). Is that
> correct?

Currently, yes. Right now, e_c_s is used just to inform the planner and
make index vs. table scan and join order decisions.

The problem which Simon is bringing up is part of a discussion about doing
*more* with the information supplied by e_c_s. He points out that it's not
really related to the *real* probability of any particular table being
cached. At least, if I'm reading him right.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
Re: [HACKERS] New compile warnings in CVS
On Wed, 2004-10-27 at 03:57, Tom Lane wrote:
> No doubt this is from the PG_TRY that Neil added a couple days ago. I
> think he is going to take it out again in favor of using AllocateFile, so
> ignore the warnings for now (they're obviously bogus anyway).

Sorry, I didn't see those compile warnings locally (using GCC 3.3). I'll
send the patch for AllocateFile() to -patches shortly.

-Neil
Re: [PATCHES] [HACKERS] ARC Memory Usage analysis
On Wed, 26 Oct 2004, Greg Stark wrote:

> > I don't see why mmap is any more out of reach than O_DIRECT; it's not
> > all that much harder to implement, and mmap (and madvise!) is more
> > widely available.
>
> Because there's no way to prevent a write-out from occurring and no way
> to be notified by mmap before a write-out occurs, and Postgres wants to
> do its WAL logging at that time if it hasn't already happened.

I already described a solution to that problem in a post earlier in this
thread (a write queue on the block). I may even have described it on this
list a couple of years ago, that being about the time I thought it up.
(The mmap idea just won't die, but at least I wasn't the one to bring it up
this time. :-))

> Well firstly it depends on the work profile. It can probably get much
> higher than we saw in that profile

True, but 1% is much, much lower than I'd expected. That tells me that my
intuitive idea of the performance model is wrong, which means, for me at
least, it's time to shut up or put up some benchmarks.

> Secondly it also reduces the amount of cache available. If you have 256M
> of ram with about 200M free, and 40Mb of ram set aside for Postgres's
> buffer cache then you really only get 160Mb. It's costing you 20% of your
> cache, and reducing the cache hit rate accordingly.

Yeah, no question about that.

> Thirdly the kernel doesn't know as much as Postgres about the load.
> Postgres could optimize its use of cache based on whether it knows the
> data is being loaded by a vacuum or sequential scan rather than an index
> lookup. In practice Postgres has gone with ARC which I suppose a kernel
> could implement anyways, but afaik neither linux nor BSD choose to do
> anything like it.

madvise().

cjs
--
Curt Sampson  [EMAIL PROTECTED]  +81 90 7737 2974  http://www.NetBSD.org
    Make up enjoying your city life...produced by BIC CAMERA
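For anyone who hasn't used it, this is the sort of hint madvise() can give
the kernel about an mmap'ed file. A minimal, self-contained example follows;
the file path is made up, and this is of course not Postgres code.

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int
    main(void)
    {
        int         fd = open("/path/to/some/datafile", O_RDONLY);
        struct stat st;
        void       *p;

        if (fd < 0 || fstat(fd, &st) != 0)
        {
            perror("open/fstat");
            return 1;
        }

        p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED)
        {
            perror("mmap");
            return 1;
        }

        /* sequential scan: ask for aggressive read-ahead and early reclaim */
        madvise(p, st.st_size, MADV_SEQUENTIAL);

        /* ... read through the mapping here; random index probes would
           use MADV_RANDOM instead ... */

        munmap(p, st.st_size);
        close(fd);
        return 0;
    }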
Re: [HACKERS] 8.0b4: COMMIT outside of a transaction echoes ROLLBACK
Ian Barwick wrote:
> Just wondering:
>
> test=> select version();
>                                         version
> ------------------------------------------------------------------------------------------
>  PostgreSQL 8.0.0beta4 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3.3 (SuSE Linux)
> (1 row)
>
> test=> begin;
> BEGIN
> test=> commit;
> COMMIT
> test=> commit;
> WARNING:  there is no transaction in progress
> ROLLBACK
>
> Is there any reason ROLLBACK and not COMMIT is echoed here?

Because the transaction was not committed, but rather rolled back.

--
Bruce Momjian  |  http://candle.pha.pa.us
[EMAIL PROTECTED]  |  (610) 359-1001
+  If your life is a hard drive,  |  13 Roberts Road
+  Christ can be your backup.     |  Newtown Square, Pennsylvania 19073
Re: [PATCHES] [HACKERS] ARC Memory Usage analysis
On Mon, 2004-10-25 at 23:53, Tom Lane wrote:
> Greg Stark [EMAIL PROTECTED] writes:
> > Tom Lane [EMAIL PROTECTED] writes:
> > > Another issue is what we do with the effective_cache_size value once
> > > we have a number we trust. We can't readily change the size of the
> > > ARC lists on the fly.
> >
> > Huh? I thought effective_cache_size was just used as a factor in the
> > cost estimation equation.
>
> Today, that is true. Jan is speculating about using it as a parameter of
> the ARC cache management algorithm ... and that worries me.

Because it's so often set wrong, I take it. But if it's set right, and it
makes the database faster to pay attention to it, then I'd be in favor of
it. Or at least of having a switch to turn on the ARC buffer's ability to
look at it.

Or is it some other issue, having to do with whether knowing the effective
cache size would actually have a positive overall effect on the ARC
algorithm?
[HACKERS] pg_dump test success
Hi guys,

I just thought I'd let you know that I just dumped our production database
with 8.0's pg_dumpall and reloaded it into a test 8.0 database. It worked.
No errors. For the first time in our company's history with PostgreSQL, we
can upgrade without editing the dump file!!!

I feel like my life's work is complete :D

Chris