Re: [HACKERS] [COMMITTERS] psqlodbc - psqlodbc: Put Autotools-generated files into subdirectory
Hiroshi Inoue wrote: I've not seen your reply yet. You keep sending your emails to randomly invented addresses, so I don't get them. I think your questions have been answered in the meantime. If not, please address them to me. -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Allow use of stable functions with constraint exclusion
Tom Lane [EMAIL PROTECTED] wrote: Marshall, Steve [EMAIL PROTECTED] writes: I have developed a small patch to optimizer/util/plancat.c that eliminates one of the caveats associated with constraint exclusion, namely the inability to avoid searching tables based on the results of stable functions. Do you not understand why this is completely unsafe? I think the proposal itself is very useful, because time-based partitioning is commonly used and functions like now() or CURRENT_TIMESTAMP are marked as stable. I'm not clear on why the optimization is unsafe. I'm confused by the definition of stable functions in our documentation. Which is required for stable functions: 'stable within a single table scan' or 'stable within a SQL statement'? If the latter definition is true, can we use them in constraint exclusion? | STABLE indicates that the function cannot modify the database, and | that within a single table scan it will consistently return the same | result for the same argument values, but that its result could change | across SQL statements. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
Re: [HACKERS] [COMMITTERS] psqlodbc - psqlodbc: Put Autotools-generated files into subdirectory
On Tuesday, 8 May 2007 01:45, Hiroshi Inoue wrote: Must I mail them directly to you in the first place ? Yes. Please note that the question is to your recent sudden change to psqlodbc. These are the same files that psqlodbc has been shipping for years. Just now they are in the CVS repository, to address Tom Lane's concern about being able to reproduce the build. -- Peter Eisentraut http://developer.postgresql.org/~petere/
[HACKERS] Seq scans roadmap
Here's my roadmap for the scan-resistant buffer cache and synchronized scans patches.

1. Fix the current vacuum behavior of throwing dirty buffers to the freelist, forcing a lot of WAL flushes. Instead, use a backend-private ring of shared buffers that are recycled. This is what Simon's scan-resistant buffer manager did. The theory here is that if a page is read in by vacuum, it's unlikely to be accessed in the near future, so it should be recycled. If vacuum doesn't dirty the page, it's best to reuse the buffer immediately for the next page. However, if the buffer is dirty (and not just because we set hint bits), we ought to delay writing it to disk until the corresponding WAL record has been flushed to disk. Simon's patch used a fixed-size ring of buffers that are recycled, but I think the ring should be dynamically sized. Start with a small ring, and whenever you need to do a WAL flush to write a dirty buffer, increase the ring size. On every full iteration through the ring, decrease its size to trim down an unnecessarily large ring. This only alters the behavior of vacuums, and it's pretty safe to say it won't get worse than what we have now. In the future, we can use the buffer ring for seqscans as well; more on that in step 3.

2. Implement the list/table of last/ongoing seq scan positions. This is Jeff's synchronized scans patch. When a seq scan starts on a table larger than some threshold, it starts from where the previous seq scan currently is, or where it ended. This will synchronize the scans, so that for two concurrent scans the total I/O is halved in the best case. There should be no other effect on performance. If you have a partitioned table, a union of multiple tables, or any other plan where multiple seq scans are performed in arbitrary order, this change won't change the order the partitions are scanned in, and won't therefore ensure they will be synchronized.
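The dynamically sized ring from step 1 can be sketched like this. This is purely illustrative Python, not PostgreSQL source; the class name, constants, and the doubling/decrement policy are all assumptions for the sketch: grow the ring whenever writing a dirty buffer forced a WAL flush, and trim it a little on every full pass.

```python
# Sketch of the dynamically sized buffer ring described in step 1.
# BufferRing, MIN_RING and MAX_RING are invented names for illustration;
# the real patch would recycle actual shared-buffer slots.

MIN_RING = 8     # start small
MAX_RING = 256   # don't let the ring grow without bound


class BufferRing:
    def __init__(self):
        self.size = MIN_RING
        self.pos = 0

    def next_buffer(self, needed_wal_flush):
        """Advance to the next ring slot, resizing as we go."""
        if needed_wal_flush:
            # Writing this dirty buffer stalled on a WAL flush:
            # a bigger ring gives the WAL writer more headroom.
            self.size = min(self.size * 2, MAX_RING)
        self.pos += 1
        if self.pos >= self.size:
            # Completed a full cycle: trim an unnecessarily large ring.
            self.pos = 0
            self.size = max(self.size - 1, MIN_RING)
        return self.pos
```

The growth and shrink rates here are placeholders; the point is only the shape of the feedback loop (grow on WAL-flush stalls, decay per cycle).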
Now that we have both pieces of the puzzle in place, it's time to consider what more we can do with them:

3A. To take advantage of the cache trail of a previous seq scan, scan backwards from where the previous seq scan ended, until you hit a buffer that's not in cache. This will allow taking advantage of the buffer cache even if the table doesn't fit completely in RAM. That can make a big difference if the table size is just slightly bigger than RAM, and can avoid the nasty surprise when a table grows beyond RAM size and queries start taking minutes instead of seconds. This should be a non-controversial change on its own from a performance point of view. No query should get slower, and some will become faster. But see step 3B:

3B. Currently, a sequential scan of a large table spoils the buffer cache by evicting other pages from the cache. In CVS HEAD, as soon as the table is larger than shared_buffers, the pages in the buffer won't be used to speed up running the same query again, and there's no reason to believe the pages read in would be more useful than any other page in the database, and in particular the pages that were in the buffer cache before the huge seq scan. If the table being scanned is > 5 * shared_buffers, the scan will evict every other page from the cache if there's no other activity in the database (max usage_count is 5). If the table is much larger than shared_buffers, say 10 times as large, then even with the change in 3A to read the pages that are in cache first, using all of shared_buffers to cache the table will only speed up the query by 10%. We should not spoil the cache for such a small gain; we should use the local buffer ring strategy instead. It's better to make queries that are slow anyway a little bit slower than to make queries that are normally really fast slow.

As you may notice, 3A and 3B are at odds with each other. We can implement both, but you can't use both strategies in the same scan.
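The backward walk in 3A can be sketched in a few lines. Illustrative Python only: the function name is invented, and the `in_cache` set stands in for probing the real shared-buffer lookup table.

```python
def cached_trail_start(last_end, in_cache):
    """Walk backwards from the page where the previous seq scan ended
    until we hit a page that is not cached. The pages from the returned
    start position up to last_end can then be read without physical I/O.
    in_cache is a set of page numbers, a stand-in for a buffer-table probe."""
    page = last_end
    while page > 0 and (page - 1) in in_cache:
        page -= 1
    return page
```

The new scan would read pages [start, last_end) from cache first, then continue with physical reads, which is exactly why it conflicts with the ring strategy of 3B (the trail only exists if the previous scan polluted the cache).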
Therefore we need decision logic of some kind to figure out which strategy is optimal. A simple heuristic is to decide based on the table size:

table < 0.1 * shared_buffers: start from page 0, keep pages in cache (like we do now)
table < 5 * shared_buffers: start from the last read page, keep pages in cache
table > 5 * shared_buffers: start from the last read page, use the buffer ring

I'm not sure about the constants; we might need to make them GUC variables, as Simon argued, but that would be the general approach. In the future, I'm envisioning a smarter algorithm to size the local buffer ring. Take all buffers with usage_count=0, plus a few with usage_count=1 from the clock sweep. That way, if there are a lot of buffers in the buffer cache that are seldom used, we'll use more buffers to cache the large scan, and vice versa. And no matter how large the scan, it wouldn't blow all buffers from the cache. But if you execute the same query again, the buffers left in the cache from the last scan were apparently useful, so we use a bigger ring this time. I'm going to do this
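The heuristic above can be sketched as a tiny decision function. Illustrative Python; the 0.1x and 5x thresholds are the provisional constants from the proposal (possibly GUCs later), and the function and return values are invented for the sketch.

```python
def choose_scan_strategy(table_pages, shared_buffers):
    """Pick a seq scan strategy from the table-size heuristic in the
    roadmap. Returns (start position, caching behavior) as strings,
    purely for illustration."""
    if table_pages < 0.1 * shared_buffers:
        # Small table: scanning it can't spoil much cache.
        return ("start at page 0", "keep in cache")
    elif table_pages < 5 * shared_buffers:
        # Medium table: sync with other scans, still worth caching.
        return ("start at last read page", "keep in cache")
    else:
        # Huge table: sync with other scans, protect the cache.
        return ("start at last read page", "use buffer ring")
```

With shared_buffers of 1000 pages, a 50-page table scans from page 0 as today, a 2000-page table joins the synchronized scan but keeps pages cached, and a 10000-page table joins the synchronized scan using the ring.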
Re: [HACKERS] Seq scans roadmap
Heikki, On 3A: In practice, the popular modern OSes (BSD/Linux/Solaris/etc.) implement dynamic I/O caching. Experiments have shown that the benefit of re-using the PG buffer cache on large sequential scans is vanishingly small when the buffer cache size is small compared to the system memory. Since this is a normal and recommended situation (the OS I/O cache is auto-tuning and easy to administer, etc.), IMO the effort to optimize buffer cache reuse for seq scans > 1 x buffer cache size is not worthwhile. On 3B: The scenario described is multiple readers seq scanning a large table and sharing the bufcache, but in practice this is not a common situation. The common situation is multiple queries joining several small tables to one or more large tables that are > 1 x bufcache. In the common scenario, the dominant factor is the ability to keep the small tables in bufcache (or I/O cache for that matter) while running the I/O-bound large table scans as fast as possible. To that point - an important factor in achieving the max I/O rate for large tables (> 1 x bufcache) is avoiding pollution of the CPU L2 cache. This is commonly in the range of 512KB - 2MB, which is only important when considering a bound on the size of the ring buffer. The effect has been demonstrated to be significant - in the 20%+ range. Another thing to consider is the use of readahead inside the heapscan, in which case sizes >= 32KB are very effective. We've implemented both ideas (ring buffer, readahead) and see very significant improvements in I/O and query speeds on DSS workloads. I would expect benefits on OLTP as well.
The modifications you suggest here may not have the following properties: - don't pollute the bufcache for seqscans of tables > 1 x bufcache - for tables > 1 x bufcache, use a ring buffer for I/O that is ~32KB to minimize L2 cache pollution - Luke
Re: [HACKERS] Allow use of immutable functions operating on constants with constraint exclusion
ITAGAKI Takahiro wrote: Tom Lane [EMAIL PROTECTED] wrote: Marshall, Steve [EMAIL PROTECTED] writes: I have developed a small patch to optimizer/util/plancat.c that eliminates one of the caveats associated with constraint exclusion, namely the inability to avoid searching tables based on the results of stable functions. Do you not understand why this is completely unsafe? I think the proposal itself is very useful, because time-based partitioning is commonly used and functions like now() or CURRENT_TIMESTAMP are marked as stable. I'm not clear on why the optimization is unsafe. I'm confused by the definition of stable functions in our documentation. Which is required for stable functions: 'stable within a single table scan' or 'stable within a SQL statement'? If the latter definition is true, can we use them in constraint exclusion? | STABLE indicates that the function cannot modify the database, and | that within a single table scan it will consistently return the same | result for the same argument values, but that its result could change | across SQL statements The lack of safety comes from prepared statements. If the above optimization were used, the value of the stable function would be used when the statement was prepared, and the query plan would then be set using the stable function value as though it were a constant. For partitioned tables, this could result in a failure to scan tables needed to meet the query constraints. I think the optimization could work if you could exclude prepared statements. However, I looked at the planning code and found no clear way to distinguish between a statement being prepared for later execution and a statement being planned only for immediate execution. As a result, I don't think stable functions can (or should) be expanded to help optimize queries using constraint exclusion. However, I think it would be possible to expand immutable functions operating on constants to optimize constraint exclusion.
Immutable functions will always return the same result given the same inputs, so this would be safe. Currently, immutable functions are not expanded during planning, such that the first query below would be optimized using constraint exclusion, while the second query would not:

SELECT * FROM test_bulletins WHERE created_at > '2006-09-09 05:00:00+00'::timestamptz;
SELECT * FROM test_bulletins WHERE created_at > '2006-09-09 05:00:00+00'::timestamptz + '0 days'::interval;

See the attached SQL file for table creation and other SQL examples. The real question here is whether the optimization is worth the effort. Personally, I commonly use queries of this sort, and so would be in favor of this expansion of immutable functions operating on constants as an optimization. I find it convenient to use the database to do the time manipulation (e.g. adding intervals to timestamps). However, the logic to manipulate times can be pushed into application code if need be. I've found I have to do a lot of explaining to developers as to why two queries that look so similar perform very differently.

-- Create and connect to the test database
CREATE DATABASE test_ce_db;
ALTER DATABASE test_ce_db OWNER TO postgres;
\connect test_ce_db

-- Make a parent table and three child tables partitioned by time using the created_at column.
CREATE TABLE test_bulletins (
    created_at timestamp with time zone NOT NULL,
    data text NOT NULL
);

CREATE TABLE test_bulletins_20060908 (
    CONSTRAINT time_partition_limits CHECK (
        (created_at >= '2006-09-08 00:00:00+00'::timestamp with time zone)
        AND (created_at < '2006-09-09 00:00:00+00'::timestamp with time zone))
) INHERITS (test_bulletins);

CREATE TABLE test_bulletins_20060909 (
    CONSTRAINT time_partition_limits CHECK (
        (created_at >= '2006-09-09 00:00:00+00'::timestamp with time zone)
        AND (created_at < '2006-09-10 00:00:00+00'::timestamp with time zone))
) INHERITS (test_bulletins);

CREATE TABLE test_bulletins_20060910 (
    CONSTRAINT time_partition_limits CHECK (
        (created_at >= '2006-09-10 00:00:00+00'::timestamp with time zone)
        AND (created_at < '2006-09-11 00:00:00+00'::timestamp with time zone))
) INHERITS (test_bulletins);

-- Set up the environment for the queries.
SET constraint_exclusion = on;
\pset footer off

-- Do the test case queries.
SELECT 'This query should avoid use of table test_bulletins_20060908, even with the OLD CE code' as "Test case 1";
EXPLAIN SELECT * FROM test_bulletins WHERE created_at > '2006-09-09 05:00:00+00'::timestamptz;

SELECT 'This query should avoid use of table test_bulletins_20060908, ONLY with the NEW CE code' as "Test case 2";
EXPLAIN SELECT * FROM test_bulletins WHERE created_at > '2006-09-09 05:00:00+00'::timestamptz + '0 days'::interval;

SELECT 'This query uses a stable function; it should NOT be optimized' as "Test case 3";
EXPLAIN SELECT * FROM test_bulletins WHERE created_at > now() - (now() - '2006-09-09 05:00:00+00'::timestamptz);
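The prepared-statement hazard described above can be made concrete with a toy simulation. Illustrative Python, not PostgreSQL code: the partition bounds mirror the test tables, and `prepare_plan` mimics the unsafe constant-folding of a stable function (today's date) at prepare time.

```python
from datetime import date

# One (lower inclusive, upper exclusive) bound per daily partition,
# mirroring the test_bulletins_2006090x tables.
partitions = [
    (date(2006, 9, 8), date(2006, 9, 9)),
    (date(2006, 9, 9), date(2006, 9, 10)),
    (date(2006, 9, 10), date(2006, 9, 11)),
]

def prepare_plan(today):
    """Fold a 'stable' current-date function into a constant at plan
    time and exclude partitions whose lower bound is in the future.
    This is exactly the unsafe step: the constant is frozen into the plan."""
    return [p for p in partitions if p[0] <= today]

# Statement prepared on Sept 9: the Sept 10 partition is planned away.
plan = prepare_plan(date(2006, 9, 9))

# If the cached plan is executed the next day, rows created on Sept 10
# live in a partition the plan never scans -> silently wrong results.
missing = [p for p in partitions if p not in plan]
```

Immutable functions on constants avoid this entirely, because their folded value can never go stale between prepare and execute.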
Re: [HACKERS] Seq scans roadmap
Luke Lonergan wrote: On 3A: In practice, the popular modern OSes (BSD/Linux/Solaris/etc.) implement dynamic I/O caching. Experiments have shown that the benefit of re-using the PG buffer cache on large sequential scans is vanishingly small when the buffer cache size is small compared to the system memory. Since this is a normal and recommended situation (the OS I/O cache is auto-tuning and easy to administer, etc.), IMO the effort to optimize buffer cache reuse for seq scans > 1 x buffer cache size is not worthwhile. That's interesting. Care to share the results of the experiments you ran? I was thinking of running tests of my own with varying table sizes. The main motivation here is to avoid the sudden drop in performance when a table grows big enough not to fit in RAM. See the attached diagram for what I mean. Maybe you're right and the effect isn't that bad in practice. I'm thinking of attacking 3B first anyway, because it seems much simpler to implement. On 3B: The scenario described is multiple readers seq scanning a large table and sharing the bufcache, but in practice this is not a common situation. The common situation is multiple queries joining several small tables to one or more large tables that are > 1 x bufcache. In the common scenario, the dominant factor is the ability to keep the small tables in bufcache (or I/O cache for that matter) while running the I/O-bound large table scans as fast as possible. How is that different from what I described? To that point - an important factor in achieving the max I/O rate for large tables (> 1 x bufcache) is avoiding pollution of the CPU L2 cache. This is commonly in the range of 512KB - 2MB, which is only important when considering a bound on the size of the ring buffer. The effect has been demonstrated to be significant - in the 20%+ range. Another thing to consider is the use of readahead inside the heapscan, in which case sizes >= 32KB are very effective. Yeah, I remember the discussion on the L2 cache a while back.
What do you mean by using readahead inside the heapscan? Starting an async read request? The modifications you suggest here may not have the following properties: - don't pollute the bufcache for seqscans of tables > 1 x bufcache - for tables > 1 x bufcache, use a ring buffer for I/O that is ~32KB to minimize L2 cache pollution So the difference is that you don't want the 3A (take advantage of pages already in the buffer cache) strategy at all, and want the buffer ring strategy to kick in earlier instead. Am I reading you correctly? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Seq scans roadmap
Heikki, That's interesting. Care to share the results of the experiments you ran? I was thinking of running tests of my own with varying table sizes. Yah - it may take a while - you might get there faster. There are some interesting effects to look at between I/O cache performance and the PG bufcache, and at those speeds the only tool I've found that actually measures scan rate in PG is VACUUM. SELECT COUNT(*) measures CPU consumption in the aggregation node, not scan rate. Note that the copy from I/O cache to PG bufcache is where the L2 effect is seen. The main motivation here is to avoid the sudden drop in performance when a table grows big enough not to fit in RAM. See attached diagram for what I mean. Maybe you're right and the effect isn't that bad in practice. There are going to be two performance drops: the first when the table doesn't fit into the PG bufcache, the second when it doesn't fit in bufcache + I/O cache. The second is severe, the first is almost insignificant (for common queries). How is that different from what I described? My impression of your descriptions is that they overvalue the case where there are multiple scanners of a large (> 1x bufcache) table such that they can share the first load of the bufcache, e.g. your 10% benefit for table = 10x bufcache argument. I think this is a non-common workload; rather, there are normally many small tables and several large tables, such that sharing the PG bufcache is irrelevant to the query speed. Yeah I remember the discussion on the L2 cache a while back. What do you mean with using readahead inside the heapscan? Starting an async read request? Nope - just reading N buffers ahead for seqscans. Subsequent calls use previously read pages. The objective is to issue contiguous reads to the OS in sizes greater than the PG page size (which is much smaller than what is needed for fast sequential I/O).
The modifications you suggest here may not have the following properties: - don't pollute the bufcache for seqscans of tables > 1 x bufcache - for tables > 1 x bufcache, use a ring buffer for I/O that is ~32KB to minimize L2 cache pollution So the difference is that you don't want the 3A (take advantage of pages already in the buffer cache) strategy at all, and want the buffer ring strategy to kick in earlier instead. Am I reading you correctly? Yes, I think the ring buffer strategy should be used when the table size is > 1 x bufcache, and the ring buffer should be of a fixed size smaller than the L2 cache (32KB - 128KB seems to work well). - Luke
Re: [HACKERS] Seq scans roadmap
Luke Lonergan wrote: What do you mean by using readahead inside the heapscan? Starting an async read request? Nope - just reading N buffers ahead for seqscans. Subsequent calls use previously read pages. The objective is to issue contiguous reads to the OS in sizes greater than the PG page size (which is much smaller than what is needed for fast sequential I/O). Are you filling multiple buffers in the buffer cache with a single read call? The OS should be doing readahead for us anyway, so I don't see how just issuing multiple ReadBuffers one after another helps. Yes, I think the ring buffer strategy should be used when the table size is > 1 x bufcache, and the ring buffer should be of a fixed size smaller than the L2 cache (32KB - 128KB seems to work well). I think we want to let the ring grow larger than that for updating transactions and vacuums, though, to avoid the WAL flush problem. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Allow use of immutable functions operating on constants with constraint exclusion
On Tue, May 08, 2007 at 08:08:28AM -0400, Marshall, Steve wrote: However, I think it would be possible to expand immutable functions operating on constants to optimize constraint exclusion. Immutable functions will always return the same result given the same inputs, so this would be safe. Currently, immutable functions are not expanded during planning, such that the first query would be optimized using constraint exclusion, while the second query would not: The problem with this is that at planning time you don't necessarily have an active transaction snapshot. Prepared statements are the obvious example, but I think even in one-off statements there's no snapshot until after the planner has completed. This is also one of the problems with type input/output functions looking up stuff in tables. There was discussion about handling type input/output functions and casts as a sort of InitExpr that is executed once and then inserted into the tree. However, that would still be too late to affect the planning. Have a nice day, -- Martijn van Oosterhout [EMAIL PROTECTED] http://svana.org/kleptog/ From each according to his ability. To each according to his ability to litigate.
[HACKERS] Re: [COMMITTERS] psqlodbc - psqlodbc: Put Autotools-generated files into subdirectory
Peter Eisentraut wrote: On Tuesday, 8 May 2007 01:45, Hiroshi Inoue wrote: Must I mail them directly to you in the first place ? Yes. Oh, I seem to have been away from the community too long. Could you please tell me where I can find the rule ? regards, Hiroshi Inoue
Re: [HACKERS] Allow use of immutable functions operating on constants with constraint exclusion
Marshall, Steve [EMAIL PROTECTED] writes: Currently, immutable functions are not expanded during planning Incorrect, see eval_const_expressions(). regards, tom lane
[HACKERS] Feature lists for 8.3 and 8.4
Hi, I am migrating from Sybase to Postgres and trying to get a hold on the issue of future releases' feature lists. 1. Where can I find the feature list for 8.3? When is it expected? 2. And for 8.4? 3. Who is the person to ask about a feature like startup migrate in ORACLE? Thanks Danny
Re: [HACKERS] Feature lists for 8.3 and 8.4
Abraham, Danny wrote: Hi, I am migrating from Sybase to Postgres. trying to get a hold on the issue of future releases feature list. 1. Where can I find the feature list for 8.3 ? When is it expected? 2. And for 8.4? There are various places to look for this kind of information, but probably the easiest to digest is looking at the todo lists on the wiki: http://developer.postgresql.org/index.php/Todo:Contents regards, Lukas PS: Since this is a fairly new effort, if you have any hints on how to improve things, let us know.
Re: [HACKERS] Managing the community information stream
On Mon, May 07, 2007 at 07:36:55AM -0500, Jim Nasby wrote: Instead, if all feature requests are tracked then users can vote on what's most important to them. I am sympathetic to the issues you and Andrew are describing (I understand Bruce's stream analogy, but I think Andrew is right that from the user's point of view, it's not usable). But I am not convinced that users voting on desired features will get us the users' desired features. The features we get are mostly the features that have attracted developers. The method by which that attraction happens is interesting, but I don't think it's democratic. A -- Andrew Sullivan | [EMAIL PROTECTED] Information security isn't a technological problem. It's an economics problem. --Bruce Schneier
Re: [HACKERS] [COMMITTERS] psqlodbc - psqlodbc: Put Autotools-generated files into subdirectory
On Tuesday, 8 May 2007 15:12, Hiroshi Inoue wrote: Oh, I seem to have been away from the community too long. Could you please tell me where I can find the rule ? The only rule there is, is that if you want to talk to person X, you write to person X. That rule is as old as communications. -- Peter Eisentraut http://developer.postgresql.org/~petere/
Re: [HACKERS] Seq scans roadmap
Nope - just reading N buffers ahead for seqscans. Subsequent calls use previously read pages. The objective is to issue contiguous reads to the OS in sizes greater than the PG page size (which is much smaller than what is needed for fast sequential I/O). The problem here is that either you issue the large read into throwaway private memory and hope that when you later read 8k you get the page from the OS buffer cache, or you need scatter/gather IO and a way to grab 32 buffers at once. Yes, I think the ring buffer strategy should be used when the table size is > 1 x bufcache and the ring buffer should be of a fixed size smaller than the L2 cache (32KB - 128KB seems to work well). How do you merge those two objectives? It seems the ring buffer needs to be at least as large as the contiguous read size. Thus you would need at least a 256k ring buffer. Better yet, have twice the IO size as the ring buffer size, so two sessions can alternately take the lead while the other session still blocks on a prev page. Modern L2 cache is 8 MB, so 512k seems no problem? Andreas
Re: [HACKERS] Seq scans roadmap
What do you mean by using readahead inside the heapscan? Starting an async read request? Nope - just reading N buffers ahead for seqscans. Subsequent calls use previously read pages. The objective is to issue contiguous reads to the OS in sizes greater than the PG page size (which is much smaller than what is needed for fast sequential I/O). Are you filling multiple buffers in the buffer cache with a single read call? Yes; it needs vector or scatter/gather IO. The OS should be doing readahead for us anyway, so I don't see how just issuing multiple ReadBuffers one after another helps. Last time I looked, OS readahead was only comparable to 32k blocked reads. 256k blocked reads still perform way better. Also, when the OS is confronted with an IO storm, the 256k reads perform way better than OS readahead. Andreas
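The first of Andreas's two options, a large read into throwaway private memory from which subsequent 8k page requests are served, can be sketched like this. Illustrative Python only; the class name, constants, and window policy are invented, and a real implementation would sit inside the heapscan code, not on a Python file object.

```python
PAGE = 8192             # PG page size
READAHEAD = 32 * PAGE   # 256 KB contiguous reads, per the discussion


class ReadaheadScanner:
    """Sketch of heapscan-level readahead: issue one large contiguous
    read into a private window, then serve subsequent 8 KB page
    requests from that window until it is exhausted."""

    def __init__(self, f):
        self.f = f
        self.window = b""
        self.window_start = 0  # file offset of window[0]

    def read_page(self, pageno):
        off = pageno * PAGE
        in_window = (self.window_start <= off and
                     off + PAGE <= self.window_start + len(self.window))
        if not in_window:
            # One big sequential IO instead of 32 small ones.
            self.f.seek(off)
            self.window = self.f.read(READAHEAD)
            self.window_start = off
        rel = off - self.window_start
        return self.window[rel:rel + PAGE]
```

This is the "throwaway private memory" variant; the scatter/gather alternative would instead fill 32 shared-buffer slots directly from one read, avoiding the extra copy at the cost of needing vectored IO support.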
Re: [HACKERS] Managing the community information stream
Andrew Sullivan wrote: On Mon, May 07, 2007 at 07:36:55AM -0500, Jim Nasby wrote: Instead, if all feature requests are tracked then users can vote on what's most important to them. I am sympathetic to the issues you and Andrew are describing (I understand Bruce's stream analogy, but I think Andrew is right that from the user's point of view, it's not usable). But I am not convinced that users voting on desired features will get us the users' desired features. The features we get are mostly the features that have attracted developers. The method by which that attraction happens is interesting, but I don't think it's democratic. Getting votes might provide a useful point of information, not a way of making decisions, though. I certainly don't regard it as a must-have feature. cheers andrew
Re: [HACKERS] Managing the community information stream
On May 8, 2007, at 9:50 AM, Andrew Sullivan wrote: On Mon, May 07, 2007 at 07:36:55AM -0500, Jim Nasby wrote: Instead, if all feature requests are tracked then users can vote on what's most important to them. I am sympathetic to the issues you and Andrew are describing (I understand Bruce's stream analogy, but I think Andrew is right that from the user's point of view, it's not usable). But I am not convinced that users voting on desired features will get us the users' desired features. The features we get are mostly the features that have attracted developers. The method by which that attraction happens is interesting, but I don't think it's democratic. It may... it may not. If a high-demand feature sits around long enough it could well attract someone capable of working on it, but who isn't a current contributor. Or it could attract a bounty. I'm also not sure if PostgreSQL is quite the same as other OSS projects. My impression is that we have quite a few developers who no longer do much if any database development (ie: they're not serious users); they continue to contribute because of other reasons. I suspect developers like that are not unlikely to scratch an itch that isn't their own. -- Jim Nasby [EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Re: [HACKERS] Seq scans roadmap
Zeugswetter Andreas ADI SD [EMAIL PROTECTED] writes: Are you filling multiple buffers in the buffer cache with a single read-call? yes, needs vector or ScatterGather IO. I would expect that to get only moderate improvement. To get the full benefit I would think you would want to either fire off a separate thread to do the read-ahead, use libaio, or funnel the read-ahead requests to a separate thread like our bgwriter, only it would be a bgreader or something like that. The OS should be doing readahead for us anyway, so I don't see how just issuing multiple ReadBuffers one after another helps. Last time I looked, OS readahead was only comparable to 32k blocked reads. 256k blocked reads still perform way better. Also, when the OS is confronted with an IO storm, the 256k reads perform way better than OS readahead. Well, that's going to depend on the OS. Last I checked, Linux's readahead logic is pretty straightforward, doesn't try to do any better than 32k readahead, and is easily fooled. However, I wouldn't be surprised if that's changed. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Managing the community information stream
Jim Nasby wrote: On May 8, 2007, at 9:50 AM, Andrew Sullivan wrote: On Mon, May 07, 2007 at 07:36:55AM -0500, Jim Nasby wrote: Instead, if all feature requests are tracked then users can vote on what's most important to them. I am sympathetic to the issues you and Andrew are describing (I understand Bruce's stream analogy, but I think Andrew is right that from the user's point of view, it's not usable). But I am not convinced that users voting on desired features will get us the users' desired features. The features we get are mostly the features that have attracted developers. The method by which that attraction happens is interesting, but I don't think it's democratic. It may... it may not. If a high-demand feature sits around long enough it could well attract someone capable of working on it, but who isn't a current contributor. Or it could attract a bounty. Also keep in mind that many of the developers are working for companies that ensure that resources get allocated according to what users need and not only by what developers are motivated to work on. That being said, it seems obvious that so far PostgreSQL has been mainly driven by what developers feel like implementing. I think this is also what ensured the high level of standards compliance of PostgreSQL, since features were tailored for experienced DBA types, rather than end users that are less experienced in how to leverage these standards. regards, Lukas ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Patch: Allocation of kind codes for spatial type.
Ale Raza [EMAIL PROTECTED] writes: Attached is the modified pg_statistic.h file for this patch. As you mentioned, 200-299 is reserved for ESRI st_geometry type. Out of these values I am using code 200 for st_geometry. Done. For future reference, what people normally mean by a patch is diff -c output. Sending the whole modified file is not convenient because it can't be dropped in if there have been any changes since the version you worked from (as indeed there have been, in this case). regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
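For anyone unfamiliar with the convention Tom describes, a context diff is produced like this. The file names and contents below are throwaway stand-ins, not the actual pg_statistic.h change:

```shell
# Generic sketch with throwaway files (not the actual patch):
# keep a pristine copy, edit the working copy, then diff the two.
printf 'line one\nline two\n' > pg_statistic.h.orig
printf 'line one\nline 2.0\n' > pg_statistic.h

# -c produces context format; diff exits nonzero when files differ.
diff -c pg_statistic.h.orig pg_statistic.h > kindcode.patch || true

# A context diff starts with "*** old-file" / "--- new-file" headers.
grep -q '^\*\*\* ' kindcode.patch && echo "context diff created"
```

Against a CVS checkout of the era, `cvs diff -c` produces the same format directly, without keeping a pristine copy by hand.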
Re: [HACKERS] [COMMITTERS] pgsql: Teach tuplesort.c about top N sorting, in which only the first
Magnus Hagander wrote: Alvaro Herrera wrote: Jim Nasby wrote: If we really want to make the logfile the approved method for monitoring performance, then why do we have the stats infrastructure at all? It could all be replaced with logging output and pgfouine. First we'd have to fix the usability problem of our redirect_stderr stuff for pgfouine (multiline messages from different backends are mixed, according to Guillaume). I'd like to sign on to the list of people saying that logging isn't the best way to do performance measuring. Providing a way to get at the counters in realtime for monitoring or graphing or whatever is what AFAIK everybody else does, and it's for a good reason - it fits into existing monitoring/management solutions. It may be that our current stats system isn't the best way to do this, but if it isn't, that just means we have to come up with a better way. Speaking of which, it might be interesting to actually show these values in the stats collector. I was thinking three cols for each database (probably the best level?) that count each of those three counters. If you have a lot of sorts (percentage-wise) spilling to disk, it is often something you want to investigate, so exposing it that way seems like a good thing. Comments on that? Oh, and is it too late to sneak this into 8.3, claiming it's an extension to the addition of the new sort feature? ;-) //Magnus ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Managing the community information stream
Hi, guess I missed hackers on my initial reply, so I am re-sending the reply I sent to Joshua in regard to a hackers@ posting. Read below. regards, Lukas Joshua D. Drake wrote: That being said, it seems obvious that so far PostgreSQL has been mainly driven by what developers feel like implementing. I think this is also what ensured the high level of standards compliance of PostgreSQL, since features were tailored for experienced DBA types, rather than end users who are less experienced in how to leverage these standards. PostgreSQL has *never* been developed with the DBA in mind. Keep in mind that most of the postgresql developers have *zero* real world experience. Nor do they run postgresql themselves in real world production environments. Well, certainly more with a DBA in mind than a middle tier developer? regards, Lukas PS: Did you mean to only reply to me? ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] problem using semaphores
On Wed, 2007-09-05 at 01:07 +0530, Raja Agrawal wrote: We are using the following piece of code for updating a list synchronously i.e. no two threads would update the list at a time. Do you mean threads or processes? Is the following way of using semaphores not correct in the postgres environment? Why can't you use LWLocks? -Neil ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[HACKERS] savepoints and upgrading locks
This thread here became a TODO item: http://archives.postgresql.org/pgsql-hackers/2006-11/msg01011.php During that discussion a patch was produced that nobody seemed to have objections to. The main problem seemed to be that it wouldn't always downgrade the lock on a ROLLBACK TO (as I understand it), which doesn't seem like a problem to me. Is there a reason this isn't a part of 8.3, or was it just forgotten? Regards, Jeff Davis ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PATCHES] [HACKERS] Patch: Allocation of kind codes for spatial type.
Ale Raza wrote: OK, thanks. Which release will it be in, PostgreSQL 8.2.5? 8.3, but you can actually use the numbers in whatever release you please, as it is unlikely that they would have been used by anyone else (much less somebody with whom you would be sharing a database). -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Seq scans roadmap
On Tue, 2007-05-08 at 11:40 +0100, Heikki Linnakangas wrote: I'm going to do this incrementally, and we'll see how far we get for 8.3. We might push 3A and/or 3B to 8.4. First, I'm going to finish up Simon's patch (step 1), run some performance tests with vacuum, and submit a patch for that. Then I'll move to Jeff's patch (step 2). I think it's best to postpone 3A (one aspect of my patch). There are a couple reasons: 1) The results didn't show enough benefit yet. 2) Complex interactions with Simon's patch. I think it's an area of research for the future, but for now I just want to concentrate on the most important aspect of my patch: the synchronized scanning ( #2 in your list ). Regards, Jeff Davis ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Seq scans roadmap
On Tue, 2007-05-08 at 07:47 -0400, Luke Lonergan wrote: Heikki, On 3A: In practice, the popular modern OS'es (BSD/Linux/Solaris/etc) implement dynamic I/O caching. The experiments have shown that the benefit of re-using PG buffer cache on large sequential scans is vanishingly small when the buffer cache size is small compared to the system memory. Since this is a normal and recommended situation (OS I/O cache is auto-tuning and easy to administer, etc), IMO the effort to optimize buffer cache reuse for seq scans > 1 x buffer cache size is not worthwhile. I think 3A is still an interesting idea, but I agree that it needs some results to back it up. Let's save 3A for the next release so that we have more time to see. To that point - an important factor in achieving max I/O rate for large tables (> 1 x bufcache) is avoiding the pollution of the CPU L2 cache. This is commonly in the range of 512KB - 2MB, which is only important when considering a bound on the size of the ring buffer. The effect has been demonstrated to be significant - in the 20%+ range. Another thing to consider is the use of readahead inside the heapscan, in which case sizes >= 32KB are very effective. One thing I'd like us to keep in mind is to have a reasonable number of buffers active for a sequential scan. If the number is too small, my sync scans might not even work. Right now my patch only reports every 16 pages, so 32KB (=4 pages) is not going to work for sync scans (I suppose only testing will tell). Regards, Jeff Davis ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[HACKERS] Re: [COMMITTERS] psqlodbc - psqlodbc: Put Autotools-generated files into subdirectory
Peter Eisentraut wrote: On Tuesday, 8 May 2007 15:12, Hiroshi Inoue wrote: Oh, I seem to have been away from the community too long. Could you please tell me where I can find the rule? The only rule there is is that if you want to talk to person X, you write to person X. That rule is as old as communications. Hmmm, have I misunderstood mailing lists? Does the rule hold in the case of communications via mailing lists? For example, I know a ML which doesn't use direct reply mails at all. As for PostgreSQL, a few years ago AFAIR it was left to each member whether he sends direct reply mails as well. I don't know if it is still so currently. regards, Hiroshi Inoue ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org