Re: [HACKERS] Bug: Buffer cache is not scan resistant
Simon Riggs [EMAIL PROTECTED] wrote:

> I've implemented buffer recycling, as previously described, patch being
> posted now to -patches as scan_recycle_buffers.
>
> - for VACUUMs of any size, with the objective of reducing WAL thrashing
> whilst keeping VACUUM's behaviour of not spoiling the buffer cache (as
> originally suggested by Itagaki-san, just with a different
> implementation).

I tested your patch with VACUUM FREEZE. The performance was improved when
I set scan_recycle_buffers > 32. I used VACUUM FREEZE to increase WAL
traffic, but this patch should be useful for normal VACUUMs with
background jobs!

   N | time  | WAL flush(*)
-----+-------+-------------
   0 | 58.7s |  0.01%
   1 | 80.3s | 81.76%
   8 | 73.4s | 16.73%
  16 | 64.2s |  9.24%
  32 | 59.0s |  4.88%
  64 | 56.7s |  2.63%
 128 | 55.1s |  1.41%

(*) WAL flush is the fraction of buffer recycles that required a WAL fsync.

# SET scan_recycle_buffers = 0;
# UPDATE accounts SET aid = aid WHERE random() < 0.005;
# CHECKPOINT;
# SET scan_recycle_buffers = N;
# VACUUM FREEZE accounts;

BTW, does the patch change the default usage of buffers in vacuum? From
what I've seen, scan_recycle_buffers = 1 is the same as before. With the
default value of scan_recycle_buffers (= 0), VACUUM seems to use all of
the buffers in the pool, just like existing sequential scans. Is this
intended?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at
       http://www.postgresql.org/about/donate
Re: [HACKERS] Bug: Buffer cache is not scan resistant
On Mon, 2007-03-12 at 16:21 +0900, ITAGAKI Takahiro wrote:
> Simon Riggs [EMAIL PROTECTED] wrote:
>> I've implemented buffer recycling, as previously described, patch being
>> posted now to -patches as scan_recycle_buffers.
>>
>> - for VACUUMs of any size, with the objective of reducing WAL thrashing
>> whilst keeping VACUUM's behaviour of not spoiling the buffer cache (as
>> originally suggested by Itagaki-san, just with a different
>> implementation).
>
> I tested your patch with VACUUM FREEZE. The performance was improved
> when I set scan_recycle_buffers > 32. I used VACUUM FREEZE to increase
> WAL traffic, but this patch should be useful for normal VACUUMs with
> background jobs!

Thanks.

>    N | time  | WAL flush(*)
> -----+-------+-------------
>    0 | 58.7s |  0.01%
>    1 | 80.3s | 81.76%
>    8 | 73.4s | 16.73%
>   16 | 64.2s |  9.24%
>   32 | 59.0s |  4.88%
>   64 | 56.7s |  2.63%
>  128 | 55.1s |  1.41%
>
> (*) WAL flush is the fraction of buffer recycles that required a WAL fsync.

Do you have the same measurement without the patch applied? I'd be
interested in the current state also (the N=0 path is modified as well
for VACUUM, in this patch).

> # SET scan_recycle_buffers = 0;
> # UPDATE accounts SET aid = aid WHERE random() < 0.005;
> # CHECKPOINT;
> # SET scan_recycle_buffers = N;
> # VACUUM FREEZE accounts;

Very good results, thanks. I'd be interested in the same thing for just
VACUUM and for varying ratios of dirty/clean blocks during vacuum.

> BTW, does the patch change the default usage of buffers in vacuum? From
> what I've seen, scan_recycle_buffers = 1 is the same as before.

That was the intention.

> With the default value of scan_recycle_buffers (= 0), VACUUM seems to
> use all of the buffers in the pool, just like existing sequential
> scans. Is this intended?

Yes, but it's not very useful for testing to have done that. I'll do
another version within the hour that sets N=0 (only) back to current
behaviour for VACUUM.

One of the objectives of the patch was to prevent VACUUM from tripping
up other backends.
I'm confident that it will improve that situation for OLTP workloads
running regular concurrent VACUUMs, but we will need to wait a couple of
days to get those results in also.

--
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
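The mechanism under discussion can be illustrated with a toy model (this is not the patch code; the function, the WAL-writer interval, and all constants are illustrative assumptions): a scan recycles a fixed ring of N buffers, and recycling a buffer whose WAL record has not yet been flushed forces an fsync, while a larger ring gives the background WAL writer time to catch up.

```python
# Toy model of why a larger recycle ring reduces forced WAL flushes.
# All names and constants here are illustrative, not from the patch.

def forced_flush_ratio(ring_size, total_blocks=10000, wal_writer_interval=64):
    """Fraction of buffer recycles that force a WAL flush."""
    flushed_upto = 0        # WAL for blocks < flushed_upto is on disk
    forced = 0
    for blk in range(total_blocks):
        if blk % wal_writer_interval == 0:
            flushed_upto = blk          # periodic background WAL flush
        victim = blk - ring_size        # block whose buffer we now recycle
        if victim >= flushed_upto:      # victim's WAL not yet flushed...
            forced += 1                 # ...so the recycle must fsync WAL
            flushed_upto = blk + 1      # a flush writes all WAL so far
    return forced / total_blocks
```

Under this simplified model the forced-flush ratio falls as the ring grows, matching the qualitative trend in Itagaki-san's table (N=1 worst, larger N progressively better).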
Re: [HACKERS] Bug: Buffer cache is not scan resistant
On Mon, 2007-03-12 at 09:14 +, Simon Riggs wrote:
> On Mon, 2007-03-12 at 16:21 +0900, ITAGAKI Takahiro wrote:
>> With the default value of scan_recycle_buffers (= 0), VACUUM seems to
>> use all of the buffers in the pool, just like existing sequential
>> scans. Is this intended?
> Yes, but it's not very useful for testing to have done that. I'll do
> another version within the hour that sets N=0 (only) back to current
> behaviour for VACUUM.

New test version enclosed, where scan_recycle_buffers = 0 doesn't change
existing VACUUM behaviour.

--
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] autovacuum next steps, take 3
Alvaro Herrera wrote:
> worker to-do list
> -----------------
> It removes from its to-do list the tables being processed. Finally, it
> writes the list to disk.

I am worried about the worker-to-do-list in your proposal. I think the
worker isn't suitable to maintain any vacuum task list; instead it is
better to maintain a unified vacuum task queue in autovacuum shared
memory. Here are the basic ideas:

* Why is such a task queue needed?

- The launcher might schedule all vacuum tasks by such a queue. It
provides a facility to schedule tasks smartly for further autovacuum
improvement.
- Also, such a task list can be viewed easily from a system view. This
can be implemented easily in 8.3 by the task queue.

* VACUUM task queue

VACUUM tasks of the cluster are maintained in a unified cluster-wide
queue in the shared memory of autovacuum:

    global shared TaskInfo tasks[];

It can be viewed as:

SELECT * FROM pg_autovacuum_tasks;
 dbid | relid | group | worker
------+-------+-------+--------
    2 | 20001 |     0 |   1001
    2 | 20002 |     0 |
    3 | 30001 |     0 |   1002

VACUUM tasks belonging to the same database might be divided into
several groups. One worker might be assigned to process one specific
task group.

The task queue might be filled by a dedicated task-gathering worker, or
it might be filled by an *external task gatherer*. This allows an
external program to implement a more sophisticated vacuum scheme. Based
on previous discussion, it appears that it is difficult to implement an
all-purpose algorithm to satisfy the requirements of all applications.
It is better to allow users to develop their own vacuum strategies. A
*user-defined external program* might fill the task queue and schedule
tasks by its own strategy. The launcher would then be responsible for
coordinating workers only. This pluggable-vacuum-strategy approach
seems a good solution.
* Status of workers

It is also convenient to allow the user to monitor the status of vacuum
workers by a system view. The snapshot of the workers can be viewed as:

SELECT * FROM pg_autovacuum_workers;
 pid  | dbid | relid | group
------+------+-------+-------
 1001 |    2 | 20001 |     0
 1002 |    3 | 30001 |     0

Best Regards
Galy Lee
lee.galy _at_ oss.ntt.co.jp
NTT Open Source Software Center
Re: [HACKERS] Grouped Index Tuples / Clustered Indexes
Simon Riggs wrote:
> Better thought: say that CLUSTER requires an order-defining index. That
> better explains the point that it is the table being clustered, using
> the index to define the physical order of the rows in the heap. We then
> use the word "clustered" to refer to what has happened to the table,
> and with this patch, for the index also. That way we can have new
> syntax for CLUSTER:
>
>   CLUSTER table ORDER BY indexname
>
> which is then the preferred syntax, rather than the perverse
>
>   CLUSTER index ON table
>
> which gives the wrong impression about what is happening, since it is
> the table that is changed, not the index.

I like that; "order-defining index" conveys the point pretty well.

> - Are you suggesting that we have an explicit new syntax CREATE [UNIQUE]
> CLUSTERED INDEX [CONCURRENTLY] fooidx ON foo () ... or just that we
> refer to this feature as "Clustered Indexes"?

I'm not proposing new syntax, just a WITH-parameter. It makes more sense
to me that way; the clusteredness has no user-visible effects except
performance, and it's b-tree specific (though I guess you could apply
the same concept to other indexams as well).

> - Do you think that all Primary Keys should be clustered?

No. There's a significant CPU overhead when the index and table are in
memory and you're doing simple one-row lookups. And there's no promise
that a table is physically in primary key order anyway.

There might be some interesting cases where we could enable it
automatically. I've been thinking that if you explicitly CLUSTER a
table, the order-defining index would definitely benefit from being a
clustered index. If it's small enough that it fits in memory, there's no
point in running CLUSTER in the first place. And if you run CLUSTER, we
know it's in order. That seems like a pretty safe bet.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] Synchronized Scan update
On Fri, 2007-03-02 at 15:03 -0800, Jeff Davis wrote:
> Is there any consensus about whether to include these two parameters as
> GUCs or constants if my patch is to be accepted?
>
> (1) sync_scan_threshold: Use synchronized scanning for tables greater
> than this many pages; smaller tables will not be affected.
>
> (2) sync_scan_offset: Start a new scan this many pages before a
> currently running scan to take advantage of the pages that are likely
> already in cache.
>
> Right now they are just constants defined in a header, but a GUC might
> make sense. I'd like to know which version is more acceptable when I
> submit my final patch.

I'm looking at ways of helping Synch Scan and scan_recycle_buffers to
work well together. I've had a look through the synch scan patch in more
detail to help this along. ISTM they can play nicely together.

Results from tests show that with a single scan we get better
performance with scan_recycle_buffers = ~32. That gives a +25%
performance bonus. Synch scan has the capability to reduce I/O by 50%
across two concurrent scans, so that would be likely to outweigh the
gain from L2 cache reduction, unless we can get both together.

It's possible that multiple scans would use the same CPU, or at least
use multiple cores on the same chip and hence the same L2 cache. That's
more likely if the synch scan works well, since the second scan can
process the buffer while the lead process performs I/O. So it's likely
that on dual- and quad-core CPUs we'll be able to get both benefits in
most cases.

So based on those thoughts, sync_scan_offset should be fixed at 16,
rather than being variable. In addition, ss_report_loc() should only
report its position every 16 blocks, rather than doing this every time,
which will reduce the overhead of this call. To match that,
scan_recycle_buffers should be fixed at 32. So GUCs for sync_scan_offset
and scan_recycle_buffers would not be required at all. IMHO we can also
remove sync_scan_threshold and just use NBuffers instead.
That way we get the benefit of both patches or neither, making it easier
to understand what's going on. In my patch, I'll make buffer recycling
only work for SeqScans and VACUUMs.

If need be, the value of scan_recycle_buffers can be varied upwards
should the scans drift apart, as a way of bringing them back together.
We aren't tracking whether they are together or apart, so I would like
to see some debug output from synch scans to allow us to assess how far
behind the second scan is as it progresses, e.g.

    LOG: synch scan currently on block N, trailing pathfinder by M blocks

issued every 128 blocks as we go through the scans.

Thoughts?

--
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
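The throttling suggestion above can be sketched as follows (an illustrative Python model, not the patch code; the function name and interval are assumptions standing in for ss_report_loc() being called every 16 blocks instead of on every block read):

```python
REPORT_INTERVAL = 16   # report position every 16 blocks, per the proposal

def scan_with_reports(nblocks):
    """Simulate a sequential scan that publishes its current block to
    shared memory only every REPORT_INTERVAL blocks, instead of after
    every single block read."""
    reports = []
    for blk in range(nblocks):
        if blk % REPORT_INTERVAL == 0:
            reports.append(blk)   # stands in for ss_report_loc(blk)
    return reports
```

A new scan joining in would then start sync_scan_offset (16) pages before the last reported position, so reporting at the same granularity as the offset keeps the follower within one reporting interval of the leader while cutting the reporting calls by a factor of 16.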
Re: [HACKERS] Bug: Buffer cache is not scan resistant
ITAGAKI Takahiro [EMAIL PROTECTED] writes:
> I tested your patch with VACUUM FREEZE. The performance was improved
> when I set scan_recycle_buffers > 32. I used VACUUM FREEZE to increase
> WAL traffic, but this patch should be useful for normal VACUUMs with
> background jobs!

Proving that you can see a difference in a worst-case scenario is not
the same as proving that the patch is useful in normal cases.

regards, tom lane
Re: [HACKERS] autovacuum next steps, take 3
Galy Lee [EMAIL PROTECTED] writes:
> I am worried about the worker-to-do-list in your proposal. I think the
> worker isn't suitable to maintain any vacuum task list; instead it is
> better to maintain a unified vacuum task queue in autovacuum shared
> memory.

Shared memory is fixed-size.

regards, tom lane
Re: [HACKERS] Auto creation of Partitions
On Saturday 10 March 2007 00:13, NikhilS wrote:
> Hi,
>
>>> Given that Simon wants to do away with having the master table
>>> APPENDed in the planning phase, this would be better.
>>
>> ISTM you're trading appending the master table for appending the DUMP
>> partition, which afaict would give you no gain.
>
> If there are entries in the master table, I think it would get appended
> for all queries regardless of whether we need to examine its contents
> or not. Segregating dump data into a partition will avoid that. I have
> seen examples in some other databases wherein a partition specifies a
> range of someval - MAXINT, for instance, to catch such cases. That
> again means that the onus is on the partition creator most of the time.

*shrug*... we can do that now in pgsql.

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Re: [HACKERS] Updating large postgresql database with blobs
CAJ CAJ wrote:
> Hello,
>
> I didn't get any response on the GENERAL list so I'm escalating this.
>
> We have several independent database servers with ~50GB+ databases
> running postgres 8.0.x. We are planning to upgrade these databases to
> postgres 8.2.x over the weekend.
>
> We plan to use the following steps to upgrade each server:
>
> 1. Dump the 8.0.x database cluster using 8.2.x pg_dumpall
>    % ./pg_dumpall > pgdumpall_backup.sql
>
> 2. Dump the 8.0.x database including large objects in compressed custom
>    format using 8.2.x pg_dump
>    % ./pg_dump -Fc -b -Z9 dbname > pgdump_lobs_backup
>
> Restoring database:
>
> 1. Initialize 8.2.x database
>    % initdb -D /data/pgdata
>
> 2. Restore template1 database from cluster dump
>    % ./psql -d template1 < pgdumpall_backup.sql
>
> 3. Delete database dbname, else restoring will give an error about the
>    existing dbname
>    % dropdb dbname
>
> 4. Create fresh dbname
>    % createdb -O dbowner dbname
>
> 5. Restore database with lobs
>    % ./pg_restore -v -Fc -d dbname -e -U dbowner pgdumpall_lobs_backup
>
> Some of the problems we have are:
>
> 1. We are not sure if all of the data will be available after
>    dump/restore with the above process.
>
> 2. The dump and restore process is very, very slow to complete over the
>    weekend (takes approx 1GB/hr to dump on a dual G5 PPC 2GHz with 1GB
>    RAM and RAID 1 disks).
>
> What is the fastest way to upgrade postgres for large databases that
> have binary objects?
>
> Thanks for all your help.

Your procedure dumps and restores the databases twice. This seems less
than sound. My prediction is that you could get a 50% speed improvement
by fixing that ...

The only thing you really need pg_dumpall for is the global tables. I
would just use "pg_dumpall -g" to get those, and then use pg_dump -Fc +
pg_restore for each actual database.

Another thing is to make sure that pg_dump/pg_restore are not competing
with postgres for access to the same disk(s).
One way to do that is to run them from a different machine - they don't
have to be run on the server machine - of course then the network can
become a bottleneck, so YMMV.

cheers

andrew
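Andrew's suggested procedure can be sketched as a command sequence (a hypothetical helper; the directory, file names, and the assumption that each target database already exists before pg_restore runs are all illustrative, and custom-format dumps include large objects by default):

```python
# Build the upgrade command sequence: dump globals once with
# "pg_dumpall -g", then dump and restore each database separately in
# custom format. Paths and names are illustrative only.

def upgrade_commands(dbnames, dump_dir="/backup"):
    cmds = ["pg_dumpall -g > %s/globals.sql" % dump_dir]
    for db in dbnames:
        cmds.append("pg_dump -Fc %s > %s/%s.dump" % (db, dump_dir, db))
    for db in dbnames:
        # assumes the database has been created on the new cluster first
        cmds.append("pg_restore -d %s %s/%s.dump" % (db, dump_dir, db))
    return cmds
```

Each database is dumped exactly once, which is the point of Andrew's advice: the 2x dump in the original procedure disappears.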
Re: [HACKERS] Synchronized Scan update
Simon,

On 3/12/07 6:21 AM, Simon Riggs [EMAIL PROTECTED] wrote:
> So based on those thoughts, sync_scan_offset should be fixed at 16,
> rather than being variable. In addition, ss_report_loc() should only
> report its position every 16 blocks, rather than do this every time,
> which will reduce overhead of this call.

And for N concurrent scans?

I think there is actually no need to synchronize the shared buffers at
all for synchronized scans. The OS I/O cache will do that for us, and
we're just going to interfere and pessimize by trying to divide up a
local buffer.

I suggest that this be proven or disproved by running this test: measure
the performance of syncscan with the non-polluting buffer change, then
measure with Jeff's patch and non-polluting with multiple scans, then
measure with your suggested changes to synchronize the buffers.
Somewhere in that progression we'll learn more about how the multi-level
buffering really works. I think we'll get all the shared I/O cache we
need.

- Luke
Re: [HACKERS] Synchronized Scan update
On Mon, 2007-03-12 at 08:42 -0700, Luke Lonergan wrote:
> On 3/12/07 6:21 AM, Simon Riggs [EMAIL PROTECTED] wrote:
>> So based on those thoughts, sync_scan_offset should be fixed at 16,
>> rather than being variable. In addition, ss_report_loc() should only
>> report its position every 16 blocks, rather than do this every time,
>> which will reduce overhead of this call.
>
> And for N concurrent scans? I think there is actually no need to
> synchronize the shared buffers at all for synchronized scans. The OS
> I/O cache will do that for us and we're just going to interfere and
> pessimize by trying to divide up a local buffer.

I think you've misunderstood my comment slightly. In Jeff's patch,
ss_report_loc() is called after every block is read, AFAICS. I was
suggesting that we don't do it that frequently, to reduce the overhead
of reporting the location. That has nothing to do with re-synchronising
the two scans mid-way.

--
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] Updating large postgresql database with blobs
<snip>
>> What is the fastest way to upgrade postgres for large databases that
>> have binary objects?
>
> Your procedure dumps and restores the databases twice. This seems less
> than sound. My prediction is that you could get a 50% speed improvement
> by fixing that ...

Thanks for the response. This'd be wonderful if I can get my process
right. My assumption (probably incorrect) was that pg_dump has to be
executed twice on a database with blobs: once to get the data and once
to get the blobs (using the -b flag).

> The only thing you really need pg_dumpall for is the global tables. I
> would just use "pg_dumpall -g" to get those, and then use pg_dump -Fc +
> pg_restore for each actual database.

This makes sense :) I assume that running pg_dump with -b will get all
of the data including the blobs?

> Another thing is to make sure that pg_dump/pg_restore are not competing
> with postgres for access to the same disk(s). One way to do that is to
> run them from a different machine - they don't have to be run on the
> server machine - of course then the network can become a bottleneck,
> so YMMV.

We are using separate servers for dump and restore.

Thanks again for your suggestions. This helps immensely.
[HACKERS] To connect a debbuger...
Hello!

I'm trying to add some features to PostgreSQL, but I just can't figure
out how to make gdb work with it. I attach gdb to /usr/.../bin/postgres,
then I put a breakpoint. Whenever postgres stops on that breakpoint, it
just kills/restarts the server instead of asking me what to do.

It's surely a noob's question, but how can I use a debugger with
PostgreSQL?

Regards,
Jonathan Scher

(gdb) file /usr/local/pgsql/bin/postgres
Reading symbols from /usr/local/pgsql/bin/postgres...(no debugging symbols found)...done.
(gdb) set args -D /home/heziva/pgsql/pgsql_data/
(gdb) b transformFromClause
Breakpoint 1 at 0x80e371e
(gdb) r
Starting program: /usr/local/pgsql/bin/postgres -D /home/heziva/pgsql/pgsql_data/
(no debugging symbols found)...(no debugging symbols found)...
LOG: database system was interrupted; last known up at 2007-03-12 18:04:04 UTC
LOG: checkpoint record is at 0/481504
LOG: redo record is at 0/481504; shutdown TRUE
LOG: next transaction ID: 0/710; next OID: 16392
LOG: next MultiXactId: 1; next MultiXactOffset: 0
LOG: database system was not properly shut down; automatic recovery in progress
LOG: record with zero length at 0/481544
LOG: redo is not required
LOG: database system is ready to accept connections
LOG: autovacuum launcher started

= on another shell =
psql
\d
Re: [HACKERS] Updating large postgresql database with blobs
CAJ CAJ wrote:
> Thanks for the response. This'd be wonderful if I can get my process
> right. My assumption (probably incorrect) was that pg_dump has to be
> executed twice on a database with blobs: once to get the data and once
> to get the blobs (using the -b flag).

Why do you assume that? The pg_dump manual says:

  -b
  --blobs
    Include large objects in the dump. This is the default behavior
    except when --schema, --table, or --schema-only is specified, so
    the -b switch is only useful to add large objects to selective
    dumps.

IOW, in most cases you will just get the large objects for free, and
don't even need to use this flag to get them. No need to run twice.

cheers

andrew
Re: [HACKERS] To connect a debbuger...
Jonathan Scher [EMAIL PROTECTED] writes:
> I attach gdb to /usr/.../bin/postgres, then I put a breakpoint.
> Whenever postgres stops on that breakpoint, it just kills/restarts the
> server instead of asking me what to do.

Not sure why it's crashing, but you don't want to run postgres itself
under the debugger, at least not usually. What you have to do is start
postgres up normally, connect to it with psql or whatever client you
want, then in a separate shell run something like:

  (gdb) shell ps auxww | grep [i]dle
  stark  3724  0.0  0.1  36180  2044 ?  Ts  18:07  0:00 postgres: stark postgres [local] idle
  (gdb) attach 3724
  Attaching to process 3724
  Reading symbols from /usr/local/pgsql/bin/postgres...done.
  ...

Also, you'll want to configure with --enable-debug or else your gdb
session isn't going to be very enlightening, and you'll keep seeing

  Reading symbols from /usr/local/pgsql/bin/postgres...(no debugging symbols found)...done.

Personally I suggest:

  CFLAGS='-O0 -g' ./configure --enable-debug --enable-depend --enable-cassert

--
Gregory Stark
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] Bug: Buffer cache is not scan resistant
On Mon, 2007-03-12 at 10:30 -0400, Tom Lane wrote:
> ITAGAKI Takahiro [EMAIL PROTECTED] writes:
>> I tested your patch with VACUUM FREEZE. The performance was improved
>> when I set scan_recycle_buffers > 32. I used VACUUM FREEZE to increase
>> WAL traffic, but this patch should be useful for normal VACUUMs with
>> background jobs!
>
> Proving that you can see a difference in a worst-case scenario is not
> the same as proving that the patch is useful in normal cases.

I agree, but I think that this VACUUM FREEZE test does actually
represent the normal case; here's why:

The poor buffer manager behaviour occurs if the block is dirty as a
result of WAL-logged changes. It only takes the removal of a single row
for us to have WAL-logged this and dirtied the block.

If we invoke VACUUM from autovacuum, we do this by default when 20% of
the rows have been updated, which means that with many distributions of
UPDATEs we'll have touched a very large proportion of blocks before we
VACUUM. That isn't true for *all* distributions of UPDATEs, but it soon
will be: Dead Space Map will deliver only dirty blocks for us.

So running a VACUUM FREEZE is a reasonable simulation of running a large
VACUUM on a real production system with default autovacuum enabled, as
will be the case for 8.3. It is possible that we run VACUUM when fewer
dirty blocks are generated, but this won't be the common situation and
not something we should optimise for.

--
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
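The "very large proportion of blocks" claim is easy to check with a quick expected-value calculation (an illustrative sketch assuming updates land on rows uniformly at random, which real workloads rarely do; the function and parameters are hypothetical):

```python
def fraction_blocks_dirtied(nblocks, rows_per_block, update_fraction):
    """Expected fraction of heap blocks touched when update_fraction of
    all rows are updated, with updates spread uniformly at random.
    P(block untouched) = (1 - 1/nblocks) ** number_of_updates."""
    nupdates = nblocks * rows_per_block * update_fraction
    untouched = (1.0 - 1.0 / nblocks) ** nupdates
    return 1.0 - untouched
```

With, say, 100 rows per block and the default 20% update threshold, essentially every block has been dirtied before autovacuum fires, which is what makes the VACUUM FREEZE test a fair stand-in for the common case.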
Re: [HACKERS] autovacuum next steps, take 3
Galy Lee wrote:
> Alvaro Herrera wrote:
>> worker to-do list
>> -----------------
>> It removes from its to-do list the tables being processed. Finally, it
>> writes the list to disk.
>
> I am worried about the worker-to-do-list in your proposal. I think the
> worker isn't suitable to maintain any vacuum task list; instead it is
> better to maintain a unified vacuum task queue in autovacuum shared
> memory.

Galy,

Thanks for your comments. I like the idea of having a global task queue,
but sadly it doesn't work for a simple reason: the launcher does not
have enough information to build it. This is because we need access to
catalogs in the database: pg_class and pg_autovacuum in the current
code, and the catalogs related to the maintenance-window feature when we
implement it in the (hopefully near) future.

Another point to be made, though of less importance, is that we cannot
keep such a task list in shared memory, because we aren't able to grow
that memory after postmaster start. It is of lesser importance because
we could keep the task list in plain files on disk; this is merely a
SMOP. The functions to expose the task list to SQL queries would just
need to read those files. It would be slower than shared memory,
certainly, but I don't think it's a showstopper (given the amount of
work VACUUM takes, anyway).

Not having access to the catalogs is a much more serious problem for the
scheduling. One could think about dumping catalogs to plain files that
are readable to the launcher, but this is not very workable: how do you
dump pg_class and have it up to date all the time? You'd have to be
writing that file pretty frequently, which doesn't sound a very good
idea.

Another idea I had was having a third kind of autovacuum process, namely
a schedule builder, which would connect to the database, read catalogs,
compute needed vacuuming, write to disk, and exit. This seems similar to
your task-gathering worker. The launcher could then dispatch regular
workers as appropriate.
Furthermore, the launcher could create a global schedule, based on the
combination of the schedules for all databases. I dismissed this idea
because a schedule gets out of date very quickly as tables continue to
be used by regular operation. A worker starting at t0 may find that a
task list built at t0 - 5 min is not very relevant. So it needs to build
a new task list anyway, which then begs the question of why not just let
the worker itself build its task list. Also, combining schedules is
complicated, and you end up asking the DBA to give each database a
priority, which is annoying.

So the idea I am currently playing with is to have workers determine the
task list at start, by looking at both the catalogs and the task lists
of other workers. I think this is the natural evolution of the other
ideas -- the worker is just smarter to start with, and the whole thing
is a lot simpler.

> The task queue might be filled by a dedicated task-gathering worker, or
> it might be filled by an *external task gatherer*.

The idea of an external task gatherer is an interesting one which I
think would make sense to implement in the future. I think it is not
very difficult to implement once the proposal we're currently discussing
is done, because it just means we have to modify the part where each
worker decides what needs to be done, and the times at which the
launcher decides to start a worker on each database. The rest of the
stuff I'm working on is just infrastructure to make it happen.

So I think your basic idea here is still workable, just not right now.
Let's discuss it again as soon as I'm done with the current stuff.

--
Alvaro Herrera   http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] autovacuum next steps, take 3
Alvaro Herrera wrote: worker to-do list - When each worker starts, it determines which tables to process in the usual fashion: get pg_autovacuum and pgstat data and compute the equations. The worker then takes a snapshot of what's currently going on in the database, by storing worker PIDs, the corresponding table OID that's being currently worked, and the to-do list for each worker. It removes from its to-do list the tables being processed. Finally, it writes the list to disk. The table list will be written to a file in PGDATA/vacuum/database-oid/todo.worker-pid The file will consist of table OIDs, in the order in which they are going to be vacuumed. At this point, vacuuming can begin. Before processing each table, it scans the WorkerInfos to see if there's a new worker, in which case it reads its to-do list to memory. Then it again fetches the tables being processed by other workers in the same database, and for each other worker, removes from its own in-memory to-do all those tables mentioned in the other lists that appear earlier than the current table being processed (inclusive). Then it picks the next non-removed table in the list. All of this must be done with the Autovacuum LWLock grabbed in exclusive mode, so that no other worker can pick the same table (no I/O takes place here, because the whole lists were saved in memory at the start.) Sorry, I confused matters here by not clarifying on-disk to-do lists versus in-memory. When we write the to-do list to file, that's the to-do lists that other workers will see. It will not change; when I say remove a table from the to-do list, it will be removed from the to-do list in memory, but the file will not get rewritten. Note that a worker will not remove from its list a table that's in the to-do list of another worker but not yet processed. It will only remove those tables that are currently being processed (i.e. 
they appear in the shared memory entry for that worker), and any tables that appear _before that one_ in that particular worker's file. So this behaves very much like what Tom describes in an email downthread, not like what Matthew is thinking. In fact I'm thinking that the above is needlessly complex, and that Tom's proposal is simpler and achieves pretty much the same effect, so I'll have a look at evolving from that instead. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
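The pruning rule described above can be sketched as a toy Python model (all names here are invented for illustration; this is not the autovacuum C code): a worker drops from its in-memory to-do list every table that appears in another worker's on-disk list at or before the table that worker is currently processing.

```python
# Toy sketch of the to-do list pruning rule (invented names, not the
# actual autovacuum implementation).

def prune_todo(my_todo, other_lists, other_current):
    """my_todo: this worker's list of table OIDs.
    other_lists: {worker_pid: [oid, ...]} as read from the to-do files.
    other_current: {worker_pid: OID currently being vacuumed, or None}."""
    claimed = set()
    for pid, todo in other_lists.items():
        cur = other_current.get(pid)
        if cur is None or cur not in todo:
            continue
        # everything up to and including the table being processed
        # counts as handled by that worker
        claimed.update(todo[: todo.index(cur) + 1])
    return [oid for oid in my_todo if oid not in claimed]
```

For example, if another worker's file lists tables 2, 3, 9 and it is currently on table 3, our worker would drop 2 and 3 but keep 9 (not yet processed) in its own list.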
Re: [HACKERS] CLUSTER and MVCC
Heikki Linnakangas wrote: Tom Lane wrote: The reason it's not trivial is that you also have to preserve the t_ctid links of update chains. If you look into VACUUM FULL, a very large part of its complexity is that it moves update chains as a unit to make that possible. (BTW, I believe the problem Pavan Deolasee reported yesterday is a bug somewhere in there --- it looks to me like sometimes the same update chain is getting copied multiple times.) Ah, that's it. Thanks. The easiest solution I can think of is to skip newer versions of updated rows when scanning the old relation, and to fetch and copy all tuples in the update chain to the new relation whenever you encounter the first tuple in the chain. To get a stable view of what's the first tuple in the chain, you need to get the oldest xmin once at the beginning, and use that throughout the operation. Since we take an exclusive lock on the table, no one can insert new updated tuples during the operation, and all updaters are finished before the lock is granted. I've been thinking about this some more, and I think the above would work. The tricky part is to recognize the first tuple in a chain, given the recent bug in vacuum full. In each chain, there must be at least one non-dead tuple with xmin < OldestXmin. Otherwise we're already missing tuples; there would be no version visible to a transaction with xid between OldestXmin and the min xmin present in the chain. That tuple is the root of the chain. There might be a dead tuple in the middle of the chain. Dead tuples have xmax < OldestXmin, which means there must be another non-dead tuple in the chain with xmax >= OldestXmin. For CLUSTER's purposes, that other non-dead tuple is also considered a root. Dead tuples and chains ending in a dead tuple don't need to be stored in the new table. This picture helped me a lot: http://community.enterprisedb.com/updatechain.gif Arrows represent tuples, beginning at xmin and ending at xmax. 
OldestXmin is represented by the vertical bar; everything to the left is smaller than OldestXmin and everything to the right is larger than OldestXmin. The chain must begin with a tuple with xmin on the left side of OldestXmin, and the last tuple in the chain must end on the right side. Does anyone see a problem with this? If not, I'm going to write a patch. One potential issue I'm seeing is that if we rely on an unbroken chain starting before OldestXmin, and that root tuple isn't there because of a bug, for example, the later versions of the tuple are skipped and the row is lost. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
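Heikki's root-finding rule can be illustrated with a toy Python model (not the real heap-tuple code; tuples are simplified to (xmin, xmax) pairs, with xmax None for a never-updated version, and "dead" reduced to "xmax committed before OldestXmin"):

```python
# Toy model of the chain-root rule above: a tuple is a root if it is
# non-dead and either starts the chain or follows a dead tuple.

def is_dead(tup, oldest_xmin):
    xmin, xmax = tup
    return xmax is not None and xmax < oldest_xmin

def chain_roots(chain, oldest_xmin):
    """chain: an update chain, oldest version first.
    Returns indexes of the tuples CLUSTER would treat as roots."""
    roots = []
    for i, tup in enumerate(chain):
        if is_dead(tup, oldest_xmin):
            continue  # dead tuples need not be copied at all
        if i == 0 or is_dead(chain[i - 1], oldest_xmin):
            roots.append(i)  # chain start, or first non-dead after a dead one
    return roots
```

With OldestXmin = 100, the chain (10,50) -> (50,120) -> (120,None) has one dead tuple followed by two live versions; the middle tuple is the root from which copying would start.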
[HACKERS] Inconsistent behavior on select * from void_function()?
Folks, This seems wrong to me:

postgres=# create table test1 ( testy int );
CREATE TABLE
postgres=# insert into test1 values ( 5 );
INSERT 0 1
postgres=# create function void_func ( IN theval int ) returns void as $f$
postgres$# update test1 set testy = $1;
postgres$# $f$ language sql;
CREATE FUNCTION
postgres=# select * from void_func( 9 );
 void_func
-----------
 
(1 row)

postgres=# select void_func( 10 ) is null;
 ?column?
----------
 t
(1 row)

postgres=# create function void_func2( IN theval int )
postgres-# returns void as $f$
postgres$# begin
postgres$# update test1 set testy = theval;
postgres$# return;
postgres$# end;$f$ language plpgsql;
CREATE FUNCTION
postgres=# select * from void_func2(19);
 void_func2
------------
 
(1 row)

postgres=# select void_func2(19) is null;
 ?column?
----------
 f
(1 row)

Why is a function which returns void returning a row? Why is that row NULL if it's a SQL function and empty if it's a PLPGSQL function? (version 8.2.3) -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco ---(end of broadcast)--- TIP 6: explain analyze is your friend
[HACKERS] possible de-optimization of multi-column index plans in 8.3
All, One of the Sun benchmarking guys ran across this issue, using a fairly recent (within the last 4 weeks) 8.3 snapshot. It appears to me that the 8.3 planner is unable to tell that, if the value for the first column of an index scan is a constant, the second column doesn't need to be sorted. This was working in 8.2. Am I interpreting this wrong? All data was ANALYZED.

== 8.3 plan

postgres @ bigleaf% psql tpce -c '\i tl3_1.sql'
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Limit  (cost=988.46..988.47 rows=20 width=64) (actual time=9.444..9.457 rows=20 loops=1)
   ->  Sort  (cost=988.46..988.62 rows=647 width=64) (actual time=9.440..9.443 rows=20 loops=1)
         Sort Key: t_dts
         ->  Index Scan using idx_t_s_symb_dts on trade  (cost=0.00..985.44 rows=647 width=64) (actual time=0.166..6.629 rows=845 loops=1)
               Index Cond: (((t_s_symb)::text = 'SYMPRA'::text) AND (t_dts >= '2006-02-15 00:00:00'::timestamp without time zone) AND (t_dts <= '2007-06-28 00:00:00'::timestamp without time zone))
               Filter: ((t_ca_id = 435::bigint) AND (t_st_id = 'CMPT'::bpchar))
 Total runtime: 9.679 ms
(7 rows)

8.2.1 plan... 
postgres @ bigleaf% psql tpce -c '\i tl3_1.sql'
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Limit  (cost=0.00..30.48 rows=20 width=64) (actual time=0.217..0.343 rows=20 loops=1)
   ->  Index Scan using idx_t_s_symb_dts on trade  (cost=0.00..944.86 rows=620 width=64) (actual time=0.212..0.332 rows=20 loops=1)
         Index Cond: (((t_s_symb)::text = 'SYMPRA'::text) AND (t_dts >= '2006-02-15 00:00:00'::timestamp without time zone) AND (t_dts <= '2007-06-28 00:00:00'::timestamp without time zone))
         Filter: ((t_st_id = 'CMPT'::bpchar) AND (t_ca_id = 435::bigint))
 Total runtime: 0.644 ms
(5 rows)

EXPLAIN ANALYZE
SELECT t_id, t_ca_id, t_exec_name, t_trade_price, t_qty, t_dts, t_tt_id, t_is_cash
FROM trade
WHERE t_s_symb = 'SYMPRA'
  AND t_st_id = 'CMPT'
  AND t_dts >= '2006-02-15'
  AND t_dts <= '2007-06-28'
  AND t_ca_id = 435 -- This test is not required for a reportable run
ORDER BY t_dts ASC
LIMIT 20;

tpce=# \d+ trade
                          Table "public.trade"
    Column     |            Type             | Modifiers | Description
---------------+-----------------------------+-----------+-------------
 t_id          | bigint                      | not null  |
 t_dts         | timestamp without time zone | not null  |
 t_st_id       | character(4)                | not null  |
 t_tt_id       | character(3)                | not null  |
 t_is_cash     | boolean                     | not null  |
 t_s_symb      | character varying(15)       | not null  |
 t_qty         | integer                     | not null  |
 t_bid_price   | numeric(8,2)                | not null  |
 t_ca_id       | bigint                      | not null  |
 t_exec_name   | character varying(64)       | not null  |
 t_trade_price | numeric(8,2)                |           |
 t_chrg        | numeric(10,2)               |           |
 t_comm        | numeric(10,2)               | not null  |
 t_tax         | numeric(10,2)               | not null  |
 t_lifo        | boolean                     | not null  |
Indexes:
    "trade_pkey" PRIMARY KEY, btree (t_id)
    "idx_t_ca_id_dts" btree (t_ca_id, t_dts)
    "idx_t_s_symb_dts" btree (t_s_symb, t_dts)
Check constraints:
    "trade_t_bid_price_check" CHECK (t_bid_price > 0::numeric)
    "trade_t_chrg_check" CHECK (t_chrg >= 0::numeric)
    "trade_t_comm_check" CHECK (t_comm >= 0::numeric)
    "trade_t_qty_check" CHECK (t_qty > 0)
    "trade_t_tax_check" CHECK (t_tax >= 0::numeric)
Has OIDs: no

-- --Josh Josh Berkus PostgreSQL @ Sun San Francisco ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Inconsistent behavior on select * from void_function()?
Josh Berkus josh@agliodbs.com writes: postgres=# select * from void_func2(19); void_func2 (1 row) postgres=# select void_func2(19) is null; ?column? -- f (1 row) Why is a function which returns void returning a row? Why is that row NULL if it's a SQL function and empty if it's a PLPGSQL function? Generally you can treat functions that return a data type as if they returned a set of rows of that data type. I get the impression this is considered a quirk of the implementation and not an advertised feature, though:

postgres=# create function foo() returns integer as 'select 1' language sql;
CREATE FUNCTION
postgres=# select foo();
 foo
-----
   1
(1 row)

postgres=# select * from foo();
 foo
-----
   1
(1 row)

I can't speak to the handling of IS NULL though. It is a bit curious. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] My honours project - databases using dynamically attached entity-properties
On Mar 11, 12:47 pm, [EMAIL PROTECTED] (Josh Berkus) wrote: No matter how much Heikki hates them, I think he'd agree that EAV tables are better than having the application execute DDL at runtime. EAV moves the structure that is typically in the design of the tables into the contents of the tables. With an EAV database you have effectively destroyed the semantic difference between DML and DDL. I'm willing to concede that there may be situations where EAV is actually the right answer to a design problem. I have yet to encounter one, but Josh has more experience, and more varied experience, than I do. To me, EAV is a perfect example of ignoring the YAGNI principle. http://c2.com/xp/YouArentGonnaNeedIt.html What I'd basically like to know is a) Is this problem worth solving? I think you're solving a symptom, not the root cause of the problem. Clarify the root cause, and then you have an interesting project. Andrew ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] My honours project - databases using dynamically attached entity-properties
Andrew, I have yet to encounter one, but Josh has more experience, and more varied experience than I do. To me, EAV is a perfect example of ignoring the YAGNI principle. I've done plenty of applications where part of the specification for the application was User Defined Fields allowing the users to customize the data structure at runtime. This is a very, very common thing; of the 15 or so commercial applications I implemented from scratch as a consultant, probably 10 of them had it. I really don't see any way you could implement UDFs other than EAV that wouldn't be immensely awkward, or result in executing DDL at runtime. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] My honours project - databases using dynamically attached entity-properties
Josh Berkus wrote: Andrew, I have yet to encounter one, but Josh has more experience, and more varied experience than I do. To me, EAV is a perfect example of ignoring the YAGNI principle. I've done plenty of applications where part of the specification for the application was User Defined Fields allowing the users to customize the data structure at runtime. This is a very, very common thing; of the 15 or so commercial applications I implemented from scratch as a consultant probably 10 of them had it. Just to add a note from YAJ, custom fields are unfortunately a requirement in most apps I have seen, from a CMS to Quickbooks. The reality is, you don't know everything the person wants to know about a particular set of data. Joshua D. Drake I really don't see any way you could implement UDFs other than EAV that wouldn't be immensely awkward, or result in executing DDL at runtime. -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] My honours project - databases using dynamically attached entity-properties
Josh Berkus wrote: I really don't see any way you could implement UDFs other than EAV that wouldn't be immensely awkward, or result in executing DDL at runtime. What's so horrible about DDL at runtime? Obviously, you're only going to allow specific additions to specific schemas/tables, but why not? -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] My honours project - databases using dynamically attached entity-properties
On 3/12/07, Richard Huxton dev@archonet.com wrote: Josh Berkus wrote: I really don't see any way you could implement UDFs other than EAV that wouldn't be immensely awkward, or result in executing DDL at runtime. What's so horrible about DDL at runtime? Obviously, you're only going to allow specific additions to specific schemas/tables, but why not? More to the point, since EAV effectively smears the semantics of DDL into DML, which, if any, of the arguments against doing DDL at runtime don't apply equally to EAV? Well, aside from being able to say "hey, I'm not executing DDL at runtime." :) I see the issue as one of cost: it's substantially harder to implement DDL at runtime than to work around the problem using EAV. If that analysis is reasonable, then it would be a very interesting research project to see how to cut down that cost of implementation. Andrew ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] My honours project - databases using dynamically attached entity-properties
Andrew Hammond wrote: On 3/12/07, Richard Huxton dev@archonet.com wrote: Josh Berkus wrote: I really don't see any way you could implement UDFs other than EAV that wouldn't be immensely awkward, or result in executing DDL at runtime. What's so horrible about DDL at runtime? Obviously, you're only going to allow specific additions to specific schemas/tables, but why not? More to the point, since EAV is effectively smearing the semantics of DDL with DML, what, if any of the arguments against doing DDL at runtime don't apply equally to EAV? Well, aside from being able to say hey, I'm not executing DDL at runtime. :) I see the issue as one of cost: it's substantially harder to implement DDL at runtime than to work around the problem using EAV. If that analysis is reasonable, then it would be a very interesting research project to see how to cut down that cost of implementation. Well the cost depends on where/how complex the extra fields are. If you're just talking about adding columns usercol01..NN with different types and possibly a lookup to a single client_attributes table, it's not difficult. Of course, if inheritance worked fully, you could just have core and user versions of relevant tables. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] [COMMITTERS] pgsql: Make configuration parameters fall back to their default values
[EMAIL PROTECTED] (Peter Eisentraut) writes: Make configuration parameters fall back to their default values when they are removed from the configuration file. It appears that this patch has broken custom GUC variables; at the very least it's broken plperl. regards, tom lane ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] possible de-optimization of multi-column index plans in 8.3
Josh Berkus josh@agliodbs.com writes: One of the Sun benchmarking guys ran across this issue, using a fairly recent (within the last 4 weeks) 8.3 snapshot. It appears to me that the 8.3 planner is unable to tell that, if the value for the first column of an index scan is a constant, the second column doesn't need to be sorted. Works for me. Care to provide a self-contained test case? regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] My honours project - databases using dynamically attached entity-properties
Richard Huxton dev@archonet.com writes: Well the cost depends on where/how complex the extra fields are. If you're just talking about adding columns usercol01..NN with different types and possibly a lookup to a single client_attributes table, it's not difficult. And then what? dynamically construct all your SQL queries? Sure, sounds like a simple solution to me... -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] [COMMITTERS] pgsql: Make configuration parameters fall back to their default values
Tom Lane [EMAIL PROTECTED] writes: [EMAIL PROTECTED] (Peter Eisentraut) writes: Make configuration parameters fall back to their default values when they are removed from the configuration file. It appears that this patch has broken custom GUC variables; at the very least it's broken plperl. Huh, it occurs to me that I haven't seen any plperl regression tests fly by when I've been running regression tests myself. What do I have to do to test if plperl, plpython, etc work with the packed varlena patch? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Inconsistent behavior on select * from void_function()?
Josh Berkus josh@agliodbs.com writes: Why is a function which returns void returning a row? Returning a scalar result that happens to be of type VOID is an entirely different thing from returning a set result that contains no rows. Why is that row NULL if it's a SQL function and empty if it's a PLPGSQL function? I'd say that the SQL function is probably doing the right thing. It appears that plpgsql has hacked this specially for backward compatibility:

	/*
	 * Special hack for function returning VOID: instead of NULL, return a
	 * non-null VOID value. This is of dubious importance but is kept for
	 * backwards compatibility. Note that the only other way to get here is
	 * to have written RETURN NULL in a function returning tuple.
	 */
	if (estate->fn_rettype == VOIDOID)
	{
		estate->retval = (Datum) 0;
		estate->retisnull = false;
		estate->rettype = VOIDOID;
	}

I haven't tested, but I think that diking out this section would make the result be a null (still of type void). regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] [COMMITTERS] pgsql: Make configuration parameters fall back to their default values
Gregory Stark wrote: Tom Lane [EMAIL PROTECTED] writes: [EMAIL PROTECTED] (Peter Eisentraut) writes: Make configuration parameters fall back to their default values when they are removed from the configuration file. It appears that this patch has broken custom GUC variables; at the very least it's broken plperl. Huh, it occurs to me that I haven't seen any plperl regression tests fly by when I've been running regression tests myself. What do I have to do to test if plperl, plpython, etc work with the packed varlena patch? cd src/pl; make installcheck cheers andrew ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] [COMMITTERS] pgsql: Make configuration parameters fall back to their default values
Gregory Stark [EMAIL PROTECTED] writes: Huh, it occurs to me that I haven't seen any plperl regression tests fly by when I've been running regression tests myself. What do I have to do to test if plperl, plpython, etc work with the packed varlena patch? cd to $TOP/src/pl, run make installcheck. Make sure you have configured --with all the PLs you want to test. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] CLUSTER and MVCC
Heikki Linnakangas [EMAIL PROTECTED] writes: In each chain, there must be at least one non-dead tuple with xmin < OldestXmin. Huh? Typically *all* the tuples but the last are dead, for varying values of "dead". Please be more specific about what you mean. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] My honours project - databases using dynamically attached entity-properties
And then what? dynamically construct all your SQL queries? Sure, sounds like a simple solution to me... Not to mention DB security issues. How do you secure your database when your web client has DDL access? So, Edward, the really *interesting* idea would be to come up with a secure, normalized way to do UDFs *without* EAV tables. People would be very impressed. BTW, Google Summer of Code opens Wednesday: http://www.postgresql.org/developer/summerofcode.html -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] autovacuum next steps, take 3
Alvaro Herrera [EMAIL PROTECTED] writes: Tom Lane wrote:

1. Grab the AutovacSchedule LWLock exclusively.
2. Check to see if another worker is currently processing that table; if so drop LWLock and go to next list entry.
3. Recompute whether table needs vacuuming; if not, drop LWLock and go to next entry. (This test covers the case where someone vacuumed the table since you made your list.)
4. Put table OID into shared memory, drop LWLock, then vacuum table.
5. Clear current-table OID from shared memory, then repeat for next list entry.

The point I'm not very sure about is that this proposal means we need to do I/O with the AutovacSchedule LWLock grabbed, to obtain up-to-date stats.

True. You could probably drop the lock while rechecking stats, at the cost of having to recheck for collision (repeat step 2) afterwards. Or recheck stats before you start, but if collisions are likely then that's a waste of time. But on the third hand, does it matter? Rechecking the stats should be much cheaper than a vacuum operation, so I'm not seeing that there's going to be a problem. It's not like there are going to be hundreds of workers contending for that lock... regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
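Tom's five-step protocol above can be sketched as a toy Python model, with a plain mutex standing in for the AutovacSchedule LWLock and still_needs_vacuum standing in for the step-3 pgstat recheck (all names are invented; this is not the actual C implementation):

```python
# Toy model of the claim protocol: steps 1-3 happen under the lock,
# the vacuum itself (step 4's I/O) happens with the lock released.

import threading

sched_lock = threading.Lock()   # stand-in for the AutovacSchedule LWLock
current_table = {}              # worker id -> table OID being processed

def run_worker(wid, todo, still_needs_vacuum, vacuum):
    for oid in todo:
        with sched_lock:                       # step 1
            if oid in current_table.values():
                continue                       # step 2: claimed elsewhere
            if not still_needs_vacuum(oid):
                continue                       # step 3: vacuumed since listing
            current_table[wid] = oid           # step 4: advertise the claim
        vacuum(oid)                            # I/O without the lock held
        with sched_lock:
            current_table.pop(wid, None)       # step 5: clear the claim
```

Note this model keeps the step-3 recheck under the lock, which is exactly the point Alvaro raises: as written, any I/O the recheck needs would happen with the lock held.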
Re: [HACKERS] autovacuum next steps, take 3
Hi, Alvaro Alvaro Herrera wrote: keep such a task list in shared memory, because we aren't able to grow that memory after postmaster start. We can use fixed-size shared memory to maintain such a queue. The maximum number of tasks is the number of tables, so the queue can be sized like max_fsm_relations, which is usually larger than the number of tables and indexes in the cluster. This is sufficient to contain most of the vacuum tasks. Even if the queue overflows, since the task gatherer scans the whole cluster every autovacuum_naptime, it can quickly pick those tasks up again. We don't need to write anything to an external file, so there is no problem with using fixed-size shared memory to maintain a global queue. Another idea I had was having a third kind of autovacuum process, namely a schedule builder If we have such a global queue, the task-gathering worker can connect to every database every naptime to gather tasks in time. The task-gathering worker won't build the schedule; the launcher or an external program is responsible for that. How to dispatch tasks to workers is just a scheduling problem; a good dispatching algorithm needs to ensure each worker can finish its tasks on time, which might resolve the troublesome hot-table problem. But this is a further issue to be discussed after 8.3. Best Regards Galy Lee lee.galy _at_ oss.ntt.co.jp NTT Open Source Software Center ---(end of broadcast)--- TIP 6: explain analyze is your friend
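The overflow-tolerant fixed-size queue Galy describes can be sketched as follows (a toy Python model with invented names, not actual shared-memory code): dropped tasks are simply not an error, because the next naptime scan of the cluster re-enqueues them.

```python
# Toy sketch of a bounded vacuum task queue sized by analogy with
# max_fsm_relations; overflow is tolerated rather than persisted.

from collections import deque

class VacuumTaskQueue:
    def __init__(self, max_tasks):
        self.q = deque()
        self.max_tasks = max_tasks
        self.dropped = 0              # picked up again by a later scan

    def push(self, table_oid):
        if len(self.q) >= self.max_tasks:
            self.dropped += 1         # overflow: the next naptime scan
            return False              # of the cluster will re-add it
        if table_oid not in self.q:   # don't queue the same table twice
            self.q.append(table_oid)
        return True

    def pop(self):
        return self.q.popleft() if self.q else None
```

This is the design choice at issue in Tom's reply downthread: the scheme only works if dropped tasks really do get another chance soon, i.e. if the gatherer rescans frequently enough that no relation is ignored indefinitely.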
Re: [HACKERS] My honours project - databases using dynamically attached entity-properties
Gregory Stark wrote: Richard Huxton dev@archonet.com writes: Well the cost depends on where/how complex the extra fields are. If you're just talking about adding columns usercol01..NN with different types and possibly a lookup to a single client_attributes table, it's not difficult. And then what? dynamically construct all your SQL queries? Sure, sounds like a simple solution to me... No different to dynamically constructing a query for a report. Simpler, since in my experience most of these user-defined fields are just slots for extra codes/tags/notes rather than something you'd join on. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Log levels for checkpoint/bgwriter monitoring
On Fri, 9 Mar 2007, Tom Lane wrote: It strikes me that the patch would be more useful if it produced a histogram of the observed usage_counts I don't have something worth releasing yet, but I did code a first rev of this today. The results are quite instructive and well worth looking at. The main server I work on has shared_buffers=6 for pgbench testing and the background writers turned way up (in hopes of reducing checkpoint times; that's a whole other topic). I run pgbench with s=100 (~1.6GB database). Here's what I get as statistics on the buffer pool after a scan when the server is happy, from a run with 20 clients:

writes=38.3MB (8.2%) pinned+used=38.3MB (8.2%)
dirty buffer usage count histogram: 0=0.1% 1=0.3% 2=26% 3=17% 4=21% 5+=36%

Basically, everything that's left dirty is also heavily used; as I noted before, when I inspect with pg_buffercache these are mostly index blocks. I note that I should probably generate a second histogram that covers the clean data. The all scan is writing pages out as soon as they're dirtied; the LRU background writer is lucky if it can find anything to do. (Note that I don't rely on the LRU writer more because that causes a significant lull in writes after a checkpoint. By the time it gets going again it's harder to catch up, and delaying PostgreSQL writes also aggravates issues with the way Linux caches writes.) What I found really interesting was comparing a sad section where performance breaks down. This is from a minute later:

writes=441.6MB (94.2%) pinned+used=356.2MB (76.0%)
dirty buffer usage count histogram: 0=18.7% 1=26.4% 2=31% 3=11% 4=9% 5+=4%

Note how the whole buffer distribution has shifted toward lower usage counts. The breakdown seems to involve evicting the index blocks to make room for the recently dirtied data when it can't be written out fast enough. As the index blocks go poof, things slow further, and into the vicious circle you go. 
Eventually you can end up blocked on a combination of buffer evictions and disk seeks for uncached data that are fighting with the writes. The bgwriter change this suggested to me is defining a triage behavior where the all scan switches to acting like the LRU one:

- Each sample period, note what % of the usage_count=0 records are dirty.
- If that number is above a tunable threshold, switch to only writing usage_count=0 records until it isn't anymore.

On my system a first guess for that tunable would be 2-5%, based on what values I see on either side of the sad periods. No doubt some systems would set that much higher; I haven't tested my system at home yet to have a guideline for a more typical PC. As for why this behavior matters so much to me: I actually have a prototype auto-tuning background writer design that was hung up on this particular problem. It notes how often you write out max_pages, uses that to model the average percentage of the buffer cache you're traversing each pass, then runs a couple of weighted-average feedback loops to aim for a target seconds/sweep. The problem was that it went berserk when the whole buffer cache was dirty (I hope someone appreciates that I've been calling this "Felix Unger on crack" mode in my notes). I needed some way to prioritize which buffers to concentrate on when that happens, and so far the above has been a good first-cut way to help with that. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
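Greg's two-step triage rule can be sketched as a toy Python model. One reading assumption up front: "what % of the usage_count=0 records are dirty" is taken here to mean the dirty fraction among usage_count=0 buffers; the function name, the (usage_count, is_dirty) representation, and the 5% default threshold are all invented for illustration.

```python
# Toy sketch of the proposed triage behavior: above the threshold, the
# all scan restricts itself to usage_count==0 buffers (LRU-like);
# otherwise it writes any dirty buffer as it does today.

def pick_targets(buffers, threshold=0.05):
    """buffers: list of (usage_count, is_dirty) pairs.
    Returns the buffers the all scan should write this round."""
    zeros = [b for b in buffers if b[0] == 0]
    dirty_zeros = [b for b in zeros if b[1]]
    if zeros and len(dirty_zeros) / len(zeros) > threshold:
        return dirty_zeros                  # triage mode: act like the LRU scan
    return [b for b in buffers if b[1]]     # normal mode: write all dirty
```

The 2-5% guess from the text would be the tuning range for `threshold` on a system like Greg's.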
Re: [HACKERS] autovacuum next steps, take 3
Galy Lee [EMAIL PROTECTED] writes: We can use the fix-size share memory to maintain such a queue. The maximum task size is the number of all tables. So the size of the queue can be the same with max_fsm_relations which is usually larger than the numbers of tables and indexes in the cluster. The trouble with that analogy is that the system can still operate reasonably sanely when max_fsm_relations is exceeded (at least, the excess relations behave no worse than they did before we had FSM). If there are relations that autovacuum ignores indefinitely because they don't fit in a fixed-size work queue, that will be a big step backward from prior behavior. In any case, I still haven't seen a good case made why a global work queue will provide better behavior than each worker keeping a local queue. The need for small hot tables to be visited more often than big tables suggests to me that a global queue will actually be counterproductive, because you'll have to contort the algorithm in some hard-to-understand way to get it to do that. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Log levels for checkpoint/bgwriter monitoring
Greg Smith [EMAIL PROTECTED] writes:
> Here's what I get as statistics on the buffer pool after a scan when
> the server is happy, from a run with 20 clients:
> writes=38.3MB (8.2%) pinned+used=38.3MB (8.2%)
> dirty buffer usage count histogram: 0=0.1% 1=0.3% 2=26% 3=17% 4=21% 5+=36%

Interesting, but not real meaningful by itself --- what's the situation for non-dirty buffers? And how many are there of each?

It might also be interesting to know exactly how many buffers were pinned at the time the scan passed over them. In theory it should be a small fraction, but maybe it isn't ...

regards, tom lane
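The histogram being discussed can be computed from raw buffer state along these lines. This is a sketch with assumed field names; the only structural point it encodes is the clamping of usage counts into a single "5+" bucket, as in the figures quoted above.

```python
from collections import Counter

# Sketch: bucket dirty buffers by usage_count, clamping 5 and above
# into a single "5+" bucket, and report each bucket as a percentage
# of the dirty buffers. Field names are illustrative.

def dirty_usage_histogram(buffers):
    """Return {bucket_label: percentage} over the dirty buffers only."""
    counts = Counter(min(b['usage_count'], 5)
                     for b in buffers if b['dirty'])
    total = sum(counts.values())
    if not total:
        return {}
    return {('5+' if k == 5 else str(k)): 100.0 * v / total
            for k, v in counts.items()}
```

Tom's follow-up question amounts to running the same tally with the `b['dirty']` filter removed, so dirty and clean distributions can be compared.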
Re: [HACKERS] autovacuum next steps, take 3
Tom Lane [EMAIL PROTECTED] wrote:
> In any case, I still haven't seen a good case made why a global work
> queue will provide better behavior than each worker keeping a local
> queue. The need for small hot tables to be visited more often than big
> tables suggests to me that a global queue will actually be
> counterproductive, because you'll have to contort the algorithm in
> some hard-to-understand way to get it to do that.

If we have some external vacuum schedulers, we need to see and touch the contents of the work queue. That's why he suggested the shared work queue.

I think the present strategy of autovacuum is not enough in some heavily-used cases and needs more sophisticated schedulers, even if the optimization for hot tables is added. Also, the best vacuum strategies depend heavily on the system, so I don't think we can supply one monolithic strategy that fits all purposes. That was a proposal of the infrastructure for interaction between autovacuum and user-land vacuum schedulers.

Of course, we can supply a simple scheduler for not-so-high-load systems, but I need a kind of autovacuum that can be controlled from an external program that knows the user application well. We could use a completely separate autovacuum daemon like contrib/pg_autovacuum of 8.0, but I think it is good for us to share some of the code between a built-in scheduler and external ones.

Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
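A minimal sketch of the shared, bounded work queue under discussion, showing both the overflow case Tom objects to (a relation that doesn't fit is simply refused) and the hook an external scheduler would use to see and reorder the queue. All names here are hypothetical; nothing like this API exists in autovacuum.

```python
from collections import deque

# Sketch of a fixed-size shared vacuum work queue. An external
# scheduler could inspect and reorder .tasks; push() returning False
# models Tom's objection that overflow means a relation gets skipped.

class VacuumQueue:
    def __init__(self, max_tasks):
        # analogous to sizing the queue by max_fsm_relations
        self.max_tasks = max_tasks
        self.tasks = deque()

    def push(self, relation):
        if len(self.tasks) >= self.max_tasks:
            return False  # queue full: this relation is not scheduled
        self.tasks.append(relation)
        return True

    def reorder(self, key):
        """An external scheduler imposes its own policy on the queue."""
        self.tasks = deque(sorted(self.tasks, key=key))

    def pop(self):
        return self.tasks.popleft() if self.tasks else None
```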
Re: [HACKERS] autovacuum next steps, take 3
ITAGAKI Takahiro [EMAIL PROTECTED] writes:
> Tom Lane [EMAIL PROTECTED] wrote:
>> In any case, I still haven't seen a good case made why a global work
>> queue will provide better behavior than each worker keeping a local
>> queue.
> If we have some external vacuum schedulers, we need to see and touch
> the content of work queue.

Who said anything about external schedulers? I remind you that this is AUTOvacuum. If you want to implement manual scheduling you can still use plain 'ol vacuum commands.

regards, tom lane
Re: [HACKERS] autovacuum next steps, take 3
Tom Lane [EMAIL PROTECTED] wrote:
> Who said anything about external schedulers? I remind you that this is
> AUTOvacuum. If you want to implement manual scheduling you can still
> use plain 'ol vacuum commands.

I think we can split autovacuum into two (or more?) functions: task gatherers and task workers. We don't have to stick with the monolithic style of the current autovacuum.

Galy said:
> The task queue might be filled by a dedicated task-gathering worker,
> or it might be filled by an *external task gatherer*.

Alvaro said:
> The idea of an external task gatherer is an interesting one which I
> think would make sense to implement in the future. I think it is not
> very difficult to implement once the proposal we're currently
> discussing is done.

I said:
> We could use a completely separate autovacuum daemon like
> contrib/pg_autovacuum of 8.0, but I think it is good for us to share
> some of the code between a built-in scheduler and external ones.

Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
Re: [HACKERS] Bug: Buffer cache is not scan resistant
Simon Riggs [EMAIL PROTECTED] wrote:
>> With the default value of scan_recycle_buffers (=0), VACUUM seems to
>> use all of the buffers in the pool, just like existing sequential
>> scans. Is this intended?
> New test version enclosed, where scan_recycle_buffers = 0 doesn't
> change existing VACUUM behaviour.

This is a result with scan_recycle_buffers.v3.patch. I used a normal VACUUM with background load from a slowed-down pgbench this time. I believe the patch is useful in normal cases, not only for VACUUM FREEZE.

   N | time   | WAL flush(*)
-----+--------+--------------
   0 | 112.8s | 44.3%
   1 | 148.9s | 52.1%
   8 | 105.1s | 17.6%
  16 |  96.9s |  8.7%
  32 | 103.9s |  6.3%
  64 |  89.4s |  6.6%
 128 |  80.0s |  3.8%

Regards, --- ITAGAKI Takahiro NTT Open Source Software Center
Re: [HACKERS] Bug: Buffer cache is not scan resistant
Simon,

You may know we've built something similar and have seen similar gains. We're planning a modification that I think you should consider: when there is a sequential scan of a table larger than the size of shared_buffers, we are allowing the scan to write through the shared_buffers cache.

The hypothesis is that if a relation is of a size equal to or less than the size of shared_buffers, it is cacheable and should use the standard LRU approach to provide for reuse.

- Luke

On 3/12/07 3:08 AM, Simon Riggs [EMAIL PROTECTED] wrote:
> On Mon, 2007-03-12 at 09:14 +0000, Simon Riggs wrote:
>> On Mon, 2007-03-12 at 16:21 +0900, ITAGAKI Takahiro wrote:
>>> With the default value of scan_recycle_buffers (=0), VACUUM seems to
>>> use all of buffers in pool, just like existing sequential scans. Is
>>> this intended?
>> Yes, but it's not very useful for testing to have done that. I'll do
>> another version within the hour that sets N=0 (only) back to current
>> behaviour for VACUUM.
> New test version enclosed, where scan_recycle_buffers = 0 doesn't
> change existing VACUUM behaviour.
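The hypothesis Luke states reduces to a single size comparison at scan start. A sketch under assumed names (the strategy labels are mine, not PostgreSQL identifiers):

```python
# Sketch of the proposed heuristic: a sequential scan of a relation
# larger than shared_buffers writes through (bypasses) the cache, since
# it cannot fit anyway and would only evict the working set; anything
# that fits uses the standard LRU path so it can be reused.

def scan_strategy(relation_pages, shared_buffers_pages):
    if relation_pages > shared_buffers_pages:
        return 'bypass-cache'  # uncacheable: don't spoil the buffer pool
    return 'normal-lru'        # cacheable: keep it resident for reuse
```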
Re: [HACKERS] My honours project - databases using dynamically attached entity-properties
And then what? Make the search box on www.postgresql.org able to handle an email address as search text without throwing a shoe? Search for [EMAIL PROTECTED] or any other 'email' address from the postgres home page. Barfage every time. Easy for some isn't easy for all, apparently. Left that out as a test case, did we? Someone searching a mailing list for an email address? Who wudda thunk it? It works without the . -- don't know why, but then I also don't know why someone hasn't tried that before me.

"Sure, sounds like a simple solution to me..." Richard said sarcastically. Would be funnier if the search on the website wasn't broken in a completely stupid, almost ironic way. Ah, irony and sarcasm -- the ugly twins.

Yeah, we have to dynamically generate queries day in and day out. But then some of us actually work for a living. Since we already have to do that, maybe someone could make that easier? Isn't that really the point here? Someone asked if something would be useful, and the people who use the database to do real work said YES, and here's how I might use it. Like full text search and recursive queries, user-defined (fields|attributes|properties) and the ability to manage them would be BUTTER!

Is it a difficult problem? YES, but if it wasn't, why should it be worth an advanced degree? Or maybe 'we' only solve the trivial and obvious problems, like searching a mailing list for an email address?

Sarcastically yours, Sean

- Original Message -
From: Richard Huxton dev@archonet.com
To: Gregory Stark [EMAIL PROTECTED]
Cc: Andrew Hammond [EMAIL PROTECTED]; josh@agliodbs.com; pgsql-hackers@postgresql.org
Sent: Monday, March 12, 2007 7:30 PM
Subject: Re: [HACKERS] My honours project - databases using dynamically attached entity-properties

Gregory Stark wrote:
> Richard Huxton dev@archonet.com writes:
>> Well the cost depends on where/how complex the extra fields are. If
>> you're just talking about adding columns usercol01..NN with different
>> types and possibly a lookup to a single client_attributes table, it's
>> not difficult.
> And then what? Dynamically construct all your SQL queries? Sure,
> sounds like a simple solution to me...

No different to dynamically constructing a query for a report. Simpler, since in my experience most of these user-defined fields are just slots for extra codes/tags/notes rather than something you'd join on.

--
Richard Huxton Archonet Ltd
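For the usercol01..NN scheme being debated, the "dynamically construct your SQL" step might look like the following. This is an illustrative sketch only; the table name, slot names, and lookup shape are hypothetical, and quoting is simplified.

```python
# Sketch: build a report SELECT over fixed user-defined column slots.
# client_attrs plays the role of the client_attributes lookup mentioned
# above, mapping a slot like 'usercol01' to its display label.

def build_report_query(table, client_attrs):
    """Return a SELECT listing only the slots this client has defined."""
    cols = ['id'] + [f'{slot} AS "{label}"'
                     for slot, label in sorted(client_attrs.items())]
    return f'SELECT {", ".join(cols)} FROM {table}'
```

Richard's point is that this is no harder than the dynamic query generation report code already does, since these slots are rarely join targets.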