Re: [HACKERS] Covering Indexes
Hi all,

> On Thu, Jun 28, 2012 at 5:16 AM, David E. Wheeler wrote:
> I don't see the virtue of this in this case. Since the index is not unique, why not just put the index on (a,b,c,d) and be done with it? Is there some advantage to be had in inventing a way to store c and d in the index without having them usable for indexing?

Why not restrict it to UNIQUE indexes? For non-unique indexes it has no advantage (it could in fact be indexed on all columns anyway as an implementation detail). For the unique case the problem of many identical entries mentioned by Tom is not relevant, so the additional data will only bloat the index but not otherwise affect the index performance.

Would this get close enough to an index-covered table? _That_ would be interesting - I have a really big table (table/index size: 370G/320G, ~8*10^9 rows) which basically uses double the space because its primary key covers all columns of the table.

Cheers,
Csaba.
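P.S. A sketch of what I have in mind, with made-up table and column names and a purely hypothetical INCLUDE-style syntax (nothing like this exists today):

    CREATE TABLE t (a int NOT NULL, b int NOT NULL, c text, d text);

    -- today: either a plain unique index on (a, b), with lookups going back
    -- to the heap for c and d, or a non-unique index on all four columns
    CREATE UNIQUE INDEX t_ab_key ON t (a, b);

    -- hypothetical: carry c and d as payload in the unique index, so a lookup
    -- on (a, b) could return them without a separate heap structure
    CREATE UNIQUE INDEX t_ab_covering ON t (a, b) INCLUDE (c, d);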
Re: [HACKERS] estimating # of distinct values
Hi Tomas,

On Wed, 2011-01-19 at 23:13 +0100, Tomas Vondra wrote:
> No, the multi-column statistics do not require constant updating. There are cases where a sampling is perfectly fine, although you may need a bit larger sample. Generally if you can use a multi-dimensional histogram, you don't need to scan the whole table.

In the cases where sampling is enough, you can do the same for the updates too: sample the changes, so that only every Nth change makes it to the estimator. If you can also tune N dynamically - growing it as the statistics stabilize, and lowering it when you detect high variance - even better.

If the analyze process could be decoupled from the backends, and maybe just get the data passed over to be processed asynchronously, that could be a feasible way to have always up-to-date statistics when the bottleneck is I/O and CPU power is in excess. If that then leads to better plans, it could really be a win that exceeds the overhead.

If this analyze process (or several of them) could also just get the data from the modified buffers in a cyclic way, so that the backends have nothing extra to do, then I don't see any performance disadvantage other than possible extra lock contention on the buffers and non-determinism of the actual time when a change makes it into the statistics. Then you just need more CPU power and higher memory bandwidth to pay for the accurate statistics.

Cheers,
Csaba.
Re: [HACKERS] ALTER TYPE 0: Introduction; test cases
On Tue, 2011-01-11 at 07:14 -0500, Noah Misch wrote:
> On Tue, Jan 11, 2011 at 09:24:46AM +, Simon Riggs wrote:
> > I have a concern that by making the ALTER TABLE more complex that we might not be able to easily tell if a rewrite happens, or not.

What about adding EXPLAIN support to it? Then whoever wants to know what it does can just run EXPLAIN on it. I have no idea how hard that would be to implement, or whether it makes sense at all, but I sure would like to see a plan for DDL too, to estimate how long it would take.

Cheers,
Csaba.
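P.S. Purely hypothetical syntax, only to illustrate what I mean - EXPLAIN accepts nothing like this today, and the table/column names are made up:

    -- would ideally report whether a full table rewrite is required,
    -- and perhaps a cost/row estimate for it
    EXPLAIN ALTER TABLE accounts ALTER COLUMN balance TYPE numeric(12,2);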
Re: [HACKERS] estimating # of distinct values
On Fri, 2011-01-07 at 12:32 +0100, t...@fuzzy.cz wrote:
> the problem is you will eventually need to drop the results and rebuild it, as the algorithms do not handle deletes (ok, Florian mentioned an algorithm L_0 described in one of the papers, but I'm not sure we can use it).

Yes, but even then you start with much better cards if you already have an estimate of what it looks like, based on the fact that you kept updating it continuously. For example you'll have a pretty good estimate of the bounds on the number of distinct values, while if you really start from scratch you have nothing to start with and must assume you have to cope with the complete range, from all values being distinct down to only a few of them.

> I'm not sure a constantly running background process is a good idea. I'd prefer storing an info about the modified tuples somewhere, and starting analyze only when a given threshold is reached. I'm not sure how to do that, though.
>
> Another thing I'm not sure about is where to store those intermediate stats (used to get the current estimate, updated incrementally).

The forks implementation proposed in other responses is probably the best idea, if it is usable. It would also solve the problem of memory limitations, at the expense of more resources used if the structure grows too big, but it will still probably be fast enough if it stays small and in cache.

Cheers,
Csaba.
Re: [HACKERS] estimating # of distinct values
On Thu, 2010-12-30 at 21:02 -0500, Tom Lane wrote:
> How is an incremental ANALYZE going to work at all?

How about a kind of continuous analyze? Instead of analyzing just once and then dropping the intermediate results, keep them on disk for all tables, then piggyback on the background writer (or have a dedicated process if that's not algorithmically feasible) and, before writing stuff out, update the statistics based on the values found in the modified buffers. It could probably take a random sample of buffers to minimize overhead, but if it is done by a background process the overhead could be minimal anyway on multi-core systems.

Not sure this makes sense at all, but if it does, it would deliver the most up-to-date statistics you can think of.

Cheers,
Csaba.
Re: [HACKERS] ALTER TABLE ... REPLACE WITH
On Tue, 2010-12-14 at 14:36 -0500, Robert Haas wrote:
> > Well, you have to do that for DROP TABLE as well, and I don't see any way around doing it for REPLACE WITH.
>
> Sure, but in Simon's proposal you can load the data FIRST and then take a lock just long enough to do the swap. That's very different from needing to hold the lock during the whole data load.

Except that Simon's original proposal has this line in it:

* "new_table" is TRUNCATEd.

I guess Simon mixed up "new_table" and "old_table", and the one which should get truncated is the replaced table, not the replacement - otherwise it doesn't make sense to me.

BTW, I would also have used such a feature on multiple occasions in the past, and I expect I will in the future too.

Cheers,
Csaba.
Re: [HACKERS] ALTER TABLE ... REPLACE WITH
On Wed, 2010-12-15 at 10:39 +, Simon Riggs wrote:
> Perhaps a more useful definition would be
>
> EXCHANGE TABLE target WITH source;
>
> which just swaps the heap and indexes of each table.
> You can then use TRUNCATE if you want to actually destroy data.

Yes please, that's exactly what I would have needed on many occasions.

But one problem would be when the replaced table is the _parent_ of a foreign key relationship. I don't think that constraint can be pre-verified on the replacement table, and simply replacing the content could leave the child relations with orphans.

Cheers,
Csaba.
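P.S. For clarity, the workflow I would use with such a command (the EXCHANGE syntax is the proposal quoted above; the table names and file path are made up):

    -- build the replacement offline, without holding any long lock on "prices"
    CREATE TABLE prices_new (LIKE prices INCLUDING ALL);
    COPY prices_new FROM '/tmp/prices.csv' CSV;

    -- swap heaps and indexes under a short exclusive lock
    EXCHANGE TABLE prices WITH prices_new;

    -- the old data now lives in prices_new; destroy it if no longer needed
    TRUNCATE prices_new;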
Re: [HACKERS] Another proposal for table synonyms
Hi all,

On Tue, 2010-11-30 at 12:05 -0800, Josh Berkus wrote:
> Can you explain, for our benefit, the use case for this? Specifically, what can be done with synonyms which can't be done with search_path and VIEWs?

I had a few cases where synonyms for user/database names would have helped me slightly (not something I couldn't work around). The actual use case was when I wanted to change the name of a database and of some user names (just a configuration coherency thing) - with a synonym I could have done it without downtime, by creating the synonym first and then reconfiguring each application machine individually (they are redundant and can be restarted transparently). Without synonyms, I had to wait for the next full downtime (which we do quite rarely) and reconfigure the DB and all application boxes at the same time.

Ideally the user/DB name synonym would be like a hard link, equivalent in all respects to the original name, so that you can delete the original name and the synonym continues to work. Likely pg_hba.conf would still need to distinguish between the DB/user names and their synonyms - not sure whether that would be useful or a PITA.

Of course this has nothing to do with table synonyms - there I haven't yet had any reason to use one...

Cheers,
Csaba.
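P.S. Purely made-up syntax, just to show the kind of renaming dance I mean (the database names are invented, and no such command exists):

    -- the database is currently called "appdb"; "billing" is the desired name
    CREATE SYNONYM billing FOR DATABASE appdb;   -- hypothetical command
    -- ...reconfigure the application machines one by one to connect as
    -- "billing", then rename the database for real at the next convenient
    -- moment and drop the synonym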
Re: [HACKERS] DELETE with LIMIT (or my first hack)
Hi Robert,

On Tue, 2010-11-30 at 09:19 -0500, Robert Haas wrote:
> That's a very elegant hack, but not exactly obvious to a novice user or, say, me. So I think it'd be nicer to have the obvious syntax work.

I fully agree - but first you have to convince the core hackers that this is not just a foot-gun. This was discussed many times in the past, and patches were also offered (perhaps not complete ones, but proving that there is an itch being scratched):

http://archives.postgresql.org/pgsql-patches/2002-09/msg00255.php

The reaction:

http://archives.postgresql.org/pgsql-patches/2002-09/msg00256.php

There are other discussions too; if I remember correctly, Tom once admitted that the core of implementing the feature would likely consist of just letting it work, as the infrastructure to do it is there but is actively disabled. I can't find the mail now though.

So it is really an ideological thing, not a lack of demand or of implementation attempts... I myself can't write working C code anyway, so I have made my peace with the workaround - I wish you good luck arguing with Tom :-)

Cheers,
Csaba.
Re: [HACKERS] DELETE with LIMIT (or my first hack)
Hi all,

The workaround recommended some time ago by Tom is:

DELETE FROM residents_of_athens
WHERE ctid = any(array(SELECT ctid
                       FROM residents_of_athens
                       ORDER BY ostracism_votes DESC
                       LIMIT 1));

It is about as efficient as the requested feature would be, just uglier to write down. I use it all the time when batch-deleting something large (to avoid long-running transactions and to avoid crashing Slony). It also helps to vacuum frequently if you do that on large amounts of data...

Cheers,
Csaba.

On Tue, 2010-11-30 at 00:05 -0500, Robert Haas wrote:
> On Mon, Nov 29, 2010 at 11:25 PM, Andrew Dunstan wrote:
> >
> > On 11/29/2010 10:19 PM, Robert Haas wrote:
> > > For example, suppose we're trying to govern an ancient Greek democracy:
> > >
> > > http://en.wikipedia.org/wiki/Ostracism
> > >
> > > DELETE FROM residents_of_athens ORDER BY ostracism_votes DESC LIMIT 1;
> >
> > I'm not sure this is a very good example. Assuming there isn't a tie, I'd do it like this:
> >
> > DELETE FROM residents_of_athens
> > WHERE ostracism_votes >= 6000
> >   and ostracism_votes =
> >     (SELECT max(ostracism_votes)
> >      FROM residents_of_athens);
>
> That might be a lot less efficient, though, and sometimes it's not OK to delete more than one record. Imagine, for example, wanting to dequeue the work item with the highest priority. Sure, you can use SELECT ... LIMIT to identify one and then DELETE it by some other key, but DELETE .. ORDER BY .. RETURNING .. LIMIT would be cool, and would let you do it with just one scan.
>
> > I can't say I'd be excited by this feature. In quite a few years of writing SQL I don't recall ever wanting such a gadget.
>
> It's something I've wanted periodically, though not badly enough to do the work to make it happen.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
Re: [HACKERS] Configuring synchronous replication
On Thu, 2010-09-23 at 11:43 -0400, Tom Lane wrote:
> > What other problems are there that mean we *must* have a file?
>
> Well, for one thing, how do you add a new slave? If its configuration comes from a system catalog, it seems that it has to already be replicating before it knows what its configuration is.

Or the slave gets a connection string to the master and reads the configuration from there - it has to connect there anyway... The ideal bootstrap for creating a slave would be: get the parameters to connect to the master plus the replica ID, and the rest should be done by connecting to the master and getting all the needed things from there, including the configuration.

Maybe you see some merit in this idea: it wouldn't hurt to design the interfaces so that the master could be impersonated by some WAL repository serving a PITR snapshot, and so that the same WAL repository could connect to the master as a slave and, instead of replaying the WAL stream, archive it. Such a WAL repository could possibly connect to multiple masters and could also take regular snapshots. This would provide a nice complement to WAL replication as a PITR solution, using the same protocols as the WAL standby. I have no idea whether this would be easy to implement or useful for anybody.

Cheers,
Csaba.
Re: [HACKERS] Configuring synchronous replication
On Thu, 2010-09-23 at 16:18 +0300, Heikki Linnakangas wrote:
> There's a program called pg_readahead somewhere on pgfoundry by NTT that will help if it's the single-threadedness of I/O. Before handing the WAL file to the server, it scans it through and calls posix_fadvise for all the blocks that it touches. When the server then replays it, the data blocks are already being fetched by the OS, using the whole RAID array.

That sounds useful, thanks for the hint! But couldn't this also be built directly into the WAL recovery process? It would probably help a lot for recovering from a crash too. We recently had a crash and it took hours to recover. I will try it out as soon as I get the time to set it up...

[searching pgfoundry]

Unfortunately I can't find it, and google is also not very helpful. Do you happen to have a link to it?

Cheers,
Csaba.
Re: [HACKERS] Configuring synchronous replication
On Thu, 2010-09-23 at 12:02 +0300, Heikki Linnakangas wrote:
> On 23/09/10 11:34, Csaba Nagy wrote:
> > In the meantime our DBs are not able to keep in sync via WAL replication, that would need some kind of parallel WAL restore on the slave I guess, or I'm not able to configure it properly - in any case now we use slony which is working.
>
> It would be interesting to debug that case a bit more. Was it bottlenecked by CPU or I/O, or network capacity perhaps?

Unfortunately it was quite a long time ago that we last tried, and I don't remember exactly what was bottlenecked. Our application is quite write-intensive; the ratio of writes to reads that actually reach the disk is about 50-200% (according to the disk stats - yes, sometimes we write more to the disk than we read, probably due to the relatively large amount of RAM installed).

If I remember correctly, the standby was about the same as the master in terms of I/O and CPU power, but it was not able to process the WAL files as fast as they were coming in, which at least rules out the network as the bottleneck. What I suppose actually happens is that the single process applying the WAL on the slave is not able to match the full I/O the master is able to do with all its processors.

If you're interested, I could try to set up another test, but it would be on 8.3.7 (that's what we still run). On 9.x it would also be interesting, but that would be a test system and I can't possibly get the load we have on production there...

Cheers,
Csaba.
Re: [HACKERS] Configuring synchronous replication
Hi all,

Some time ago I was also interested in this feature, and at that time I also thought about making the complete setup possible via postgres connections, meaning the transfer of the files and all configuration/slave registration would be done through normal backend connections.

In the meantime our DBs are not able to keep in sync via WAL replication - that would need some kind of parallel WAL restore on the slave, I guess, or I'm not able to configure it properly - so in any case we now use slony, which is working. In fact the way slony does its configuration could be a good place to look...

On Wed, 2010-09-22 at 13:16 -0400, Robert Haas wrote:
> > I guarantee you there is a way around the cascade slave problem.
>
> And that would be...?

* restrict the local file configuration to a replication ID;
* make all configuration refer to the replica ID;
* keep all configuration in a shared catalog: it can be kept exactly the same on all replicas, as each replication "node" will only care about the configuration concerning its own replica ID;
* added advantage: after take-over the slave will change the configured master to its own replica ID, and if the old master ever connects again, it can easily notice that and give up;

Cheers,
Csaba.
Re: [HACKERS] Differential backup
Hi all,

On Tue, 2010-04-27 at 11:07 -0400, Merlin Moncure wrote:
> The block level case seems pretty much covered by the hot standby feature.

One use case we would have is to dump only the changes since the last backup of a single table. This table takes 30% of the DB disk space, it is on the order of ~400GB, and it is only ever inserted into, never updated, with the old entries archived after ~1 year. There are ~10M new entries daily in this table. If the backup were smart enough to read only the changed blocks (in this case only those for newly inserted records), it would be a fairly big win...

Cheers,
Csaba.
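P.S. To make the use case concrete, the kind of incremental dump I mean, expressed at the SQL level rather than the block level (table/column names and the path are made up; it assumes a monotonically increasing key and knowledge of where the previous backup stopped):

    -- the previous backup ended at id = 123456789
    COPY (SELECT * FROM event_log WHERE id > 123456789)
        TO '/backup/event_log_delta.copy';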
Re: [HACKERS] [pgadmin-hackers] Feature request: limited deletions
Hi all,

On Thu, 2010-04-08 at 07:45 -0400, Robert Haas wrote:
> >> 2010/4/8 Thom Brown:
> >> > So you could write:
> >> >
> >> > DELETE FROM massive_table WHERE id < 4000 LIMIT 1;
>
> I've certainly worked around the lack of this syntax more than once. And I bet it's not even that hard to implement.

The fact that it's not implemented has nothing to do with its complexity (in fact it is probably just a matter of enabling it) - you'll have a hard time convincing some of the old-time hackers on this list that the non-determinism inherent in this kind of query is acceptable ;-)

There is a workaround, which in fact works quite well:

delete from massive_table where ctid = any(array(select ctid from massive_table WHERE id < 4000 LIMIT 1));

Just run an explain on it and you'll see it won't get any better, but beware that it might be less optimal than you think: you will likely be sequentially scanning the table for each chunk unless you put some selective where conditions on it too - and even then you'll still scan the whole deleted part and not just the next chunk. The deleted records won't get out of the way magically, you need to vacuum, and that's probably a problem too on a big table. So most likely it will help you less than you think on a massive table; the run time per chunk will increase with each chunk unless you're able to vacuum efficiently. In any case you need to balance the chunk size against the scanned portion of the table so that you get a reasonable run time per chunk and not too much overhead from the whole chunking process...

Cheers,
Csaba.
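P.S. A minimal sketch of the chunked batch-delete variant I mean (the table name, column name and numbers are made up), to be repeated from the application until a chunk deletes zero rows, with a VACUUM every few chunks:

    -- one chunk of at most 10000 rows
    DELETE FROM massive_table
    WHERE ctid = any(array(SELECT ctid FROM massive_table
                           WHERE created < now() - interval '1 year'
                           LIMIT 10000));

    -- every few chunks, so the dead rows get out of the way:
    VACUUM massive_table;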
Re: [HACKERS] Getting to beta1
Hi all,

On Thu, 2010-03-18 at 10:18 -0700, Josh Berkus wrote:
> Or, let's put it another way: I've made my opinion clear in the past that I think that we ought to ship with a minimal postgresql.conf with maybe 15 items in it. If we are going to continue to ship with the postgresql.conf "kitchen sink" version, however, it should include vacuum_defer_cleanup_age.

But considering that these are just samples anyway, what is preventing us from shipping 2 versions, one labeled "minimal" and one labeled "full"? The minimal one should contain only the absolutely must-change items, and the full version should contain everything. As simple as that. I don't think there's any value in anything in between...

Cheers,
Csaba.
Re: [HACKERS] A thought on Index Organized Tables
Hi all,

On Mon, 2010-02-22 at 10:29 +, Greg Stark wrote:
> On Mon, Feb 22, 2010 at 8:18 AM, Gokulakannan Somasundaram wrote:
> > a) IOT has both table and index in one structure. So no duplication of data
> > b) With visibility maps, we have three structures a) Table b) Index c) Visibility map. So the disk footprint of the same data will be higher in postgres ( 2x + size of the visibility map).
>
> These sound like the same point to me. I don't think we're concerned with footprint -- only with how much of that footprint actually needs to be scanned.

For some data the disk footprint actually matters: in our databases we have one table which has exactly 2 fields, both of which are part of its primary key, and there's no other index. The table is write-only, never updated and rarely deleted from. The disk footprint of this table is 30%-50% of the total disk space used by the DB (depending on the other data). This amounts to about 1.5-2TB if I count it across all of our DBs, and it has to be on fast disk too, as the table is heavily used... so disk space does matter for some.

And yes, I put the older entries in an archive partition on slower disks, but that only halves the problem - the data is growing exponentially, and about half of it is always in use. I guess our developers are just about ready to get this table out of postgres and up to hadoop...

Cheers,
Csaba.
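P.S. To make the duplication concrete (names made up, but the same shape as our table):

    CREATE TABLE pair (
        a bigint NOT NULL,
        b bigint NOT NULL,
        PRIMARY KEY (a, b)
    );

    -- every (a, b) pair is stored twice: once in the heap and once in the
    -- primary key btree, so the index roughly doubles the footprint; an
    -- index-organized layout would store it only once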
Re: [HACKERS] Thoughts on statistics for continuously advancing columns
On Wed, 2009-12-30 at 17:16 +0100, Tom Lane wrote:
> I think the cleanest solution to this would be to make ANALYZE cheaper, perhaps by finding some way for it to work incrementally.

What if, when inserting/deleting tuples, a random sample of them were passed into an auto-analyze buffer? A special process (the auto-analyze daemon) would then process them and update the statistics incrementally based on the new values found (which may or may not be mathematically feasible).

The overhead for each backend process would be kept within limits by the rate at which changes are randomly sent (or not sent) to the analyze buffer. The processing overhead would be kept within limits by the processing rate of the auto-analyze process, which could be made to sleep periodically, or could be spread over multiple processes (on multiprocessor systems). If the buffer is full, you skip putting entries into it... so it could also auto-tune itself to a sustainable rate.

As with all my other posts on hackers, this is mostly hand-waving; I have no clue about the feasibility of all this with regard to the current state of the code (which I haven't read - I unfortunately find myself hating to read C code beyond reason, and my writing of it has so far amounted to copy-paste-modify).

Cheers,
Csaba.
Re: [HACKERS] Deadlock error in INSERT statements
On Wed, 2009-10-07 at 16:34 +0200, Gnanam wrote:
> NOTE: I've seen "deadlock" errors in UPDATE statement but why it is throwing in INSERT statements.

It is because of the foreign key. Inserting a child row will lock the corresponding parent row, and if you insert multiple rows with different parents in the same transaction, and do that in concurrent transactions but with the parent rows in a different order, you can get a deadlock. If you keep in mind that the parent row is locked on the insert of a child row, you will figure out what's happening...

BTW, I don't think the "hackers" list is the right one for this kind of question; better use the "general" list...

Cheers,
Csaba.
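P.S. One concrete variant of the parent-row locking interaction described above (all names are made up). Here the parent rows are also updated, which makes the lock conflict reproducible on current releases and lets the deadlock be reported on an INSERT:

    CREATE TABLE parent (id int PRIMARY KEY, note text);
    CREATE TABLE child (id serial PRIMARY KEY,
                        parent_id int NOT NULL REFERENCES parent(id));
    INSERT INTO parent VALUES (1, ''), (2, '');

    -- session 1                                -- session 2
    BEGIN;                                      BEGIN;
    UPDATE parent SET note='x' WHERE id=1;      UPDATE parent SET note='y' WHERE id=2;
    INSERT INTO child(parent_id) VALUES (2);    INSERT INTO child(parent_id) VALUES (1);

    -- each INSERT needs a lock on the other session's parent row for the FK
    -- check, so one of the two sessions is aborted with a deadlock error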
Re: [HACKERS] moving system catalogs to another tablespace
Hi all,

On Tue, 2009-10-06 at 16:58 +0200, Tom Lane wrote:
> Yeah, I have sometimes thought that pg_largeobject shouldn't be considered a system catalog at all. It's more nearly like a toast table, ie, it's storing "out of line" user data.

pg_largeobject in its current form has serious limitations, the biggest one being that it can't have triggers, and thus it can't be replicated by trigger-based replication like slony. I ended up rolling my own large object table, modeling exactly the behavior of pg_largeobject but on the client side, except that I can replicate it... plus a few other simple things, like easily duplicating an entry from client-side code, and easier control of the large object ID ranges.

BTW, OID is not the best data type for a client-visible primary key; BIGINT is better. OID is unsigned, and in Java for example it won't map cleanly to any data type (a Java long is twice as big as needed, and int is signed and won't work for all OID values - we finally had to use long, and then BIGINT is the better match anyway). Considering that the postgres manual says "using a user-created table's OID column as a primary key is discouraged", I don't see why OID should be the primary key of a table which can potentially outgrow the OID range.

The backup is also not a special case now; it just dumps the table. I don't know what the reasons were for special-casing pg_largeobject, but from a usability POV it is fairly bad.

Cheers,
Csaba.
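P.S. A sketch of what such a client-side replacement table might look like (names made up, details simplified):

    CREATE TABLE app_largeobject (
        lo_id   bigint  NOT NULL,   -- client-visible id, BIGINT instead of OID
        pageno  integer NOT NULL,   -- chunk number within the object
        data    bytea   NOT NULL,   -- one chunk of the object's contents
        PRIMARY KEY (lo_id, pageno)
    );

    -- being an ordinary table, slony triggers work on it, pg_dump treats it
    -- like any other data, and duplicating an object is just an
    -- INSERT ... SELECT of its rows under a new lo_id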
Re: [HACKERS] Streaming Replication patch for CommitFest 2009-09
On Thu, 2009-09-17 at 10:08 +0200, Heikki Linnakangas wrote:
> Robert Haas suggested a while ago that walreceiver could be a stand-alone utility, not requiring postmaster at all. That would allow you to set up streaming replication as another way to implement WAL archiving. Looking at how the processes interact, there really isn't much communication between walreceiver and the rest of the system, so that sounds pretty attractive.

Just a small comment in this direction: what if the archive were itself a postgres DB, which would collect the WAL in some special place (together with some metadata, snapshots, etc.), and a slave could then connect to it just like to any other master? (except that it could perhaps specify which snapshot to start with, and possibly choose between different archived WAL streams). Maybe what I'm saying is completely stupid, but I see the archive as just another form of postgres server, with the same protocol from the POV of a slave. While I don't have the knowledge to implement such a thing, I thought it might be an interesting idea while the walsender/walreceiver interface is being discussed...

Cheers,
Csaba.
Re: [HACKERS] machine-readable explain output v4
On Wed, 2009-08-12 at 18:07 +0200, Andrew Dunstan wrote:
> Csaba Nagy wrote:
> > On Wed, 2009-08-12 at 17:11 +0200, Andrew Dunstan wrote:
> Well, the right solution would actually be NOT to use CDATA but to replace a literal linefeed with the XML numeric escape, but I really don't think it's necessary.

Yes, you're right about the extraction too...

> The extraction tools will be simple whether or not we put everything on one line.

Well, I normally avoid anything containing caffeine, and today I had my first coffee of the year and 2 cokes too. I would say ignore the whole discussion on the grounds of abnormal hyperactivity on my part...

Cheers,
Csaba.
Re: [HACKERS] machine-readable explain output v4
On Wed, 2009-08-12 at 17:41 +0200, Andrew Dunstan wrote:
> Csaba Nagy wrote:
> > Then why bother calling it "machine readable" at all? Would you really read your auto-explain output on the DB server? I doubt that's the common usage scenario, I would expect that most people would let a tool extract/summarize it and definitely process it somewhere else than on the DB machine, with the proper tool set.
> Sure I would. I look at log files almost every day to find out things. Why should I have to wade through a pile of utterly unreadable crap to find it?

I look at log files every day too (not almost, but literally every day), but I really can't imagine myself looking at an explain output directly in the log file. Even with the output of an 8.2 server, which has less detail than I think the new formats have, I always copy the text from the psql prompt to some friendlier tool where I can play around with it, delete parts of it if needed for the sake of a clear overview, and easily switch line-wrapping on and off and such. I simply don't believe that a remotely human-readable presentation of the thing is worth the slightest compromise in machine readability. That said, I would like to finish this discussion here because it is getting pointless; let's agree to disagree :-)

> Auto-explain lets you have *one* output format. To follow your approach, I will have to change that, and have two log files, one machine processable and one human readable. Triple bleah.

By ad-hoc I didn't mean reading the auto-explain log; that's surely no ad-hoc operation... I would make that a mandatory daily routine, which is better handled by a tool that directly serves you the plans of the worst-performing queries, sorted by runtime for example, and highlights the obvious planner mis-estimates. By ad-hoc I mean a query you examine on the psql command line, and there you can of course expect human-readable output without any consideration for machine readability.

> I have not suggested anything that would break the machine readability. You seem to think that making the machine readable output remotely human friendly is somehow going to detract from its machine processability. But that's just silly, frankly. I do not want and should not have to choose between a format that is machine readable and one that is to some extent human readable.

OK, if you can do both it's fine... but I'm afraid it will lead to compromises on the machine-readability side and will only delay the whole thing.

Cheers,
Csaba.
Re: [HACKERS] machine-readable explain output v4
On Wed, 2009-08-12 at 17:31 +0200, Csaba Nagy wrote:
> On Wed, 2009-08-12 at 17:11 +0200, Andrew Dunstan wrote:
> > That will just make things worse. And it will break if the XML includes any expression that contains a line break.
>
> Then escape the expressions using CDATA or such... I'm sure it would be possible to make sure it's one line and rely on that. That's part of being machine readable, being able to rely on getting it at all without too much parsing magic...

Looks like today I'm talking consistently stupid... CDATA of course won't help in avoiding line breaks, but it would still be possible to define an XML entity for that. This is getting too complicated anyway, so it's probably not worth it.

In any case I still think that making it easy for a tool to extract/parse the information should be a higher priority than human readability - because the information itself is not really human-readable without external help except for simple queries, and I doubt the main use case is simple queries.

Cheers,
Csaba.
Re: [HACKERS] machine-readable explain output v4
On Wed, 2009-08-12 at 17:11 +0200, Andrew Dunstan wrote:
> That will just make things worse. And it will break if the XML includes any expression that contains a line break.

Then escape the expressions using CDATA or such... I'm sure it would be possible to make sure it's one line and rely on that. That's part of being machine readable: being able to rely on getting it at all without too much parsing magic...

> I repeat, I want to be able to have a log file that is both machine processable and not utterly unreadable by a human. And I do not accept at all that this is impossible. Nor do I accept I should need some extra processing tool to read the machine processable output without suffering brain damage. If we were to adopt your approach I bet you would find that nobody in their right mind would use the machine readable formats.

Then why bother calling it "machine readable" at all? Would you really read your auto-explain output on the DB server? I doubt that's the common usage scenario; I would expect that most people would let a tool extract/summarize it, and definitely process it somewhere other than on the DB machine, with the proper tool set. For ad-hoc explain analysis, which I think is the typical use case for on-the-DB-server inspection of text-format explain output, you would surely use something other than what is called the "machine readable" format...

Cheers,
Csaba.
Re: [HACKERS] machine-readable explain output v4
On Wed, 2009-08-12 at 16:51 +0200, Csaba Nagy wrote:
> I argue that a sufficiently complicated explain output will never be easily navigated in a text browser, however much you would like it. If you do a where clause with 100 nested ANDs (which occasionally happens here), I don't think you'll be able to cleanly show that in a text browser - it will simply not fit in no matter how you format it. But using the right GUI tool (even a generic XML one) it would be easy to just temporarily collapse the whole top AND and have a clean view of the rest.

Just a small note: I don't know what the machine-readable auto-explain output actually looks like, so I just assumed individual conditions will have their own XML node. But even if they don't, and the AND thing is flat in the explain output, the same argument applies to even a few tens of joins or sub-selects (and we have those too). Sometimes collapsing parts of the plan would help big time in getting a clean picture, which (starting with a certain query complexity) you will have a hard time doing in plain text regardless of the formatting. I have plenty of explain outputs (from 8.2 postgres) which don't fit on my screen in text form even without the further details I gather you can get in the machine-readable form.

Cheers,
Csaba.
Re: [HACKERS] machine-readable explain output v4
On Wed, 2009-08-12 at 15:42 +0200, Andrew Dunstan wrote:
> Have you actually looked at a logfile with this in it? A simple stylesheet won't do at all. What you get is not an XML document but a text document with little bits of XML embedded in it. So you would need a program to parse that file and either turn it into a single legal XML document or pass each piece of XML individually to your XSLT processor. Bleah.

I'm pretty sure you will never find a human-readable format which is easily extracted from the logs by a program. But if you format the XML in a (very human-unreadable) one-line-without-breaks format, then it is a lot easier for a program to extract it and format it as you wish.

> And all this because you pose a false dichotomy between correctness and completeness on one hand and human readability on the other. I don't accept that at all. I think we can and should improve human readability without sacrificing anything on the correctness and completeness front. In fact, that also needs improving, and we can do them both at the same time.

I really, really doubt that. I would go here with the UNIX approach of piping things through the right tools, each one doing a simple and good job for its single, well-defined purpose. So let explain spit out a line of XML without much thought about formatting but with the focus on completeness, making it easy for tools to get that line, and then let the tools do the formatting depending on what you want to do with the information. Each part will be simpler than what you would put into a directly human-readable XML approach (if that's possible at all), which will anyway not cover all uses and tastes.

> I want to be able to have machine readable explain output, but I also want to be able to browse the logs without wasting more brain cells than necessary and without having to use external tools other than by standard text browser (less). As Pooh Bear said, "Both please!"

I argue that a sufficiently complicated explain output will never be easily navigated in a text browser, however much you would like it. If you do a where clause with 100 nested ANDs (which occasionally happens here), I don't think you'll be able to cleanly show that in a text browser - it will simply not fit no matter how you format it. But using the right GUI tool (even a generic XML one) it would be easy to just temporarily collapse the whole top AND and have a clean view of the rest.

Cheers,
Csaba.
Re: [HACKERS] machine-readable explain output v4
On Tue, 2009-08-11 at 23:58 +0200, Andrew Dunstan wrote:
> Well, I don't think that the fact that we are producing machine-readable output means we can just ignore the human side of it. It is more than likely that such output will be read by both machines and humans. Obviously, we need to make sure that it meets machine processing needs first, but once we have done that we should not ignore the human requirements.

XML is easy to transform into about anything else. I would vote for the XML output to focus only on machine readability and completeness, and then, if needed, provide style sheets to make it human readable. Then anybody could have their preferred style for looking at it. If there were a tool to format the XML according to a style sheet (to make it easy for those who don't care about XML and style sheets), I would volunteer to provide XSL style sheets for different formats on request.

Cheers,
Csaba.
Re: [HACKERS] Automating Partitions in PostgreSQL - Query on syntax
On Tue, 2009-04-21 at 11:43 -0400, Robert Haas wrote:
> This doesn't sound like a very good idea, because the planner cannot then rely on the overflow table not containing tuples that ought to be within some other partition.
>
> The big win that is associated with table partitioning is using constraint exclusion to avoid unnecessary partition scans.

Well, it could always check 2 partitions: the overflow and the one selected by constraint exclusion. If the overflow is kept empty by setting up the partitions properly, so that all insertions always go to one of the active partitions, that would be cheap enough while still providing a way to catch unexpected data. Then when a new partition is defined there is no need to shuffle data around immediately; there could be a maintenance command to clean up the overflow... not to mention that you could define a trigger to create the new partition once something lands in the overflow (how cool would that be if it worked?).

Cheers,
Csaba.
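P.S. A sketch of the layout I mean, using old-style inheritance partitioning (names made up); the overflow child has no CHECK constraint, so it catches anything the active partitions don't:

    CREATE TABLE measurements (ts timestamptz NOT NULL, value numeric);

    CREATE TABLE measurements_2009_04
        (CHECK (ts >= '2009-04-01' AND ts < '2009-05-01'))
        INHERITS (measurements);

    CREATE TABLE measurements_overflow () INHERITS (measurements);

    -- with constraint_exclusion enabled, a query for April only has to scan
    -- measurements_2009_04 plus the (normally empty) overflow table, since
    -- the overflow has no constraint and can never be excluded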
Re: [HACKERS] Block-level CRC checks
On Wed, 2008-10-01 at 16:57 +0100, Gregory Stark wrote:
> I wonder if we could do something clever here though. Only one process is busy calculating the checksum -- it just has to know if anyone fiddles the hint bits while it's busy.

What if the hint bits are added to the checksum at the very end, with an exclusive lock on them? Then the exclusive lock should be short enough... though it might be deadlock-prone, as any lock upgrade is...

Cheers,
Csaba.
Re: [HACKERS] Synchronous Log Shipping Replication
On Fri, 2008-09-12 at 17:24 +0300, Hannu Krosing wrote:
> On Fri, 2008-09-12 at 17:08 +0300, Heikki Linnakangas wrote:
> > Hmm, built-in rsync capability would be cool. Probably not in the first phase, though..
>
> We have it for WAL shipping, in form of GUC "archive_command" :)
>
> Why not add full_backup_command ?

I see the current design is all master-push centered, i.e. the master is in control of everything WAL-related. That makes it hard to create a slave which is simply pointed at the server and takes all its data from there...

Why not have a design where the slave is in control of its own data? I mean the slave could ask for the base files (possibly through a special function deployed on the master), then ask for the WAL stream, and so on. That would also easily let a slave cascade, as it could relay the WAL stream and serve the base backup too... or there could be a special WAL repository with the same interface as a normal master, but offering a choice of base backups and WAL streams. Plus, a slave-in-control approach would also allow multiple slaves at the same time for a given master...

The way it would work would be something like:

* configure the slave with a postgres connection to the master;
* the slave connects and sets up some metadata on the master identifying itself and telling the master to keep the WAL needed by this slave, and also gets some metadata about the master's details if needed;
* the slave calls a special function on the master and asks for the base backup to be streamed (potentially compressed, with special knowledge of postgres internals);
* once the base backup is streamed, or possibly in parallel, it asks for the WAL stream;
* when the base backup is finished, it starts applying the WAL stream, which was cached in the meantime and whose streaming continues;
* it keeps the master updated about the state of the slave, so the master knows whether it needs to keep WAL files which have not yet been streamed;
* in case of a network error, the slave connects again and resumes streaming the WAL from where it left off;
* in case of an extended network outage, the master could decide to unsubscribe the slave after a certain timeout;
* when the slave finds itself unsubscribed after a longer disconnection, it could ask for a new base backup based on differences only... some kind of built-in rsync thingy;

The only downside of this approach is that the slave machine needs a full postgres superuser connection to the master. That could be a security problem in certain scenarios. The master-centric scenario needs a connection in the other direction, which might be seen as more secure, I don't know for sure...

Cheers,
Csaba.
Re: [HACKERS] Transaction Snapshots and Hot Standby
On Fri, 2008-09-12 at 15:08 +0300, Hannu Krosing wrote:
> > * how will the buffers keep 2 different versions of the same page ?
>
> As the FS snapshot is mounted as a different directory, it will have it's own buffer pages.

That question only shows my ignorance about the implementation of the page buffers...

> > * how will you handle the creation of snapshots ?
>
> probably an external command, possibly shell script.
> similar to current "archive_command" for wal copying
>
> maybe 'create_fs_snapshot_command' and 'destroy_fs_snapshot_command'

[snip]

> Yes, the simplest one being external command. As FS snapshots are supposed to happen not-too often, at least not every second, just having external commands may be enough.

You could restrict the creation of snapshots to some minimum amount of time between them, and maybe also restrict the maximum number of concurrent snapshots. Then, if the time limit (counted from the last open snapshot) is not yet met, any new query could simply reuse that last snapshot. BTW, the time intervals do not need to be evenly distributed; they could be a function of the number of already open snapshots, e.g. the minimum interval could grow exponentially with the number of open snapshots. That would help catch more long-running queries with just a few snapshots.

> > I hope my continuous questioning is not too annoying...
>
> On the contrary, much appreciated. :)

Ok, then I'll continue :-) I would like to see this feature succeed, but there's a slim chance I'll ever code well in C...

Cheers,
Csaba.
Re: [HACKERS] Transaction Snapshots and Hot Standby
> I think that enabling long-running queries this way is both low-hanging fruit (or at least medium-height-hanging ;) ) and also consistent to PostgreSQL philosophy of not replication effort. As an example we trust OS's file system cache and don't try to write our own.

I again have questions (unfortunately I usually only have questions):

* how will the buffers keep 2 different versions of the same page?
* how will you handle the creation of snapshots? I guess there's no portable and universal API for that (just guessing), or is there some POSIX thing which is supported or not by the specific FS? So if the FS does not support it, you skip the snapshot step? And if there's no universal API, will it be handled by plugins providing a specified API for snapshotting the FS?

I hope my continuous questioning is not too annoying...

Cheers,
Csaba.
Re: [HACKERS] Transaction Snapshots and Hot Standby
On Fri, 2008-09-12 at 12:31 +0100, Richard Huxton wrote:
> There was a suggestion (Simon - from you?) of a transaction voluntarily restricting itself to a set of tables.

While thinking about how easy it would be for the DBA to specify the set of tables a single query is accessing, I first thought it should be straightforward enough to look at the query itself for that. Then I thought: what about views, rules, triggers, user functions etc.? All of those have the potential to access more than you see in the query itself.

And then the actually interesting question: what will the slave do with views, rules and triggers? I guess executing triggers is out of the question; what about rules? They probably must also be ignored... user functions will probably get errors if they try to update something... views should probably work correctly. So in any case the functionality available for querying slaves would be less than on the primary. This is probably good enough for most purposes...

Cheers,
Csaba.
Re: [HACKERS] Transaction Snapshots and Hot Standby
On Fri, 2008-09-12 at 09:38 +0100, Simon Riggs wrote:
> If you request a block, we check to see whether there is a lookaside copy of it prior to the tuple removals. We then redirect the block request to a viewpoint relation's block. Each viewpoint gets a separate relfilenode. We do the switcheroo while holding cleanup lock on block.

Wouldn't it make sense to also have a hint bit on the pages which are copied away? Then instead of looking up a hash table, you would first check that bit, and if it is not set you don't look up the hash table at all. When you clean up the "lookaside copies" you clear those bits too... That would probably perform somewhat better for reading than always looking up a potentially big hash table, and the cost of setting the hint is probably a lot less than copying away the page in the first place. Resetting the hint bit might be a bit more expensive.

Cheers,
Csaba.
Re: [HACKERS] Transaction Snapshots and Hot Standby
On Thu, 2008-09-11 at 15:33 +0200, Dimitri Fontaine wrote:
> What would forbid the slave to choose to replay all currently lagging WALs each time it's given the choice to advance a little?

Well, now that I think I understand what Heikki meant, I also think the problem is that there's no choice to advance at all, because new queries will simply get the same snapshot as the currently running ones as long as WAL replay is blocked... further blocking the WAL replay. When saying this I assume that the snapshot is in fact based on the last recovered XID, and not on any slave-local XID. In that case, once WAL recovery is blocked, the snapshot is stalled too, further blocking WAL recovery, and so on...

Cheers,
Csaba.
Re: [HACKERS] Transaction Snapshots and Hot Standby
On Thu, 2008-09-11 at 16:19 +0300, Heikki Linnakangas wrote:
> Well, yes, but you can fall behind indefinitely that way. Imagine that each transaction on the slave lasts, say 10 minutes, with a new transaction starting every 5 minutes. On the master, there's a table that's being vacuumed (or HOT-updated) frequently, say after each transaction for simplicity. What can happen is that every transaction that finishes on the slave will only let the WAL replay advance by one XID before blocking on the snapshot of the next slave transaction. The WAL replay will advance at a rate of 0.2 TPM, while the master is generating 1.0 TPM.

Aha, now I see where I was mistaken... I was thinking in terms of time and not transaction IDs. So the time distance between the slave transactions does not matter at all; only the distance in recovered XIDs matters for the "blocking horizon"... and if WAL recovery is blocked, the "blocking horizon" is stalled as well, so the next transaction on the slave will in fact require the same "blocking horizon" as all currently running ones. Now I get it... and that in fact means that with continuously overlapping small transactions, the "blocking horizon" could even be blocked forever, as there will always be a query running, and the new queries will always have the snapshot of the currently running ones because WAL recovery is stalled... or at least that's what I understand from the whole thing...

Cheers,
Csaba.
Re: [HACKERS] Transaction Snapshots and Hot Standby
On Thu, 2008-09-11 at 15:42 +0300, Heikki Linnakangas wrote:
> One problem with this, BTW, is that if there's a continuous stream of medium-length transaction in the slave, each new snapshot taken will prevent progress in the WAL replay, so the WAL replay will advance in "baby steps", and can fall behind indefinitely.

Why would it fall behind indefinitely? It should only fall behind to the "blocking horizon", which should be the start of the longest currently running transaction... which should be continually advancing and not too far in the past if only medium-length transactions are involved. Isn't normal WAL recovery also doing baby steps, one WAL record at a time? ;-)

Cheers,
Csaba.
Re: [HACKERS] Transaction Snapshots and Hot Standby
On Thu, 2008-09-11 at 15:23 +0300, Heikki Linnakangas wrote:
> I'd imagine that even if applying the WAL on the slave is blocked, it's still streamed from the master to the slave, and in case of failover the slave will fast-forward before starting up as the new master.

Which begs the question: what happens to a query which is running on the slave at the moment the slave switches out of recovery mode and starts up? Should the running queries be canceled if they are blocking the application of WAL, to allow start-up, or should they be left to finish?

Cheers,
Csaba.
Re: [HACKERS] Synchronous Log Shipping Replication
On Tue, 2008-09-09 at 20:59 +0200, Zeugswetter Andreas OSB sIT wrote:
> All in all a useful streamer seems like a lot of work.

I mentioned an alternative idea some time ago: have the slave connect through a normal SQL connection and call a function which streams the WAL file from the point requested by the slave... wouldn't that be feasible? All the connection handling would already be there; only the streaming function would have to be implemented. It could even use SSL connections if needed.

Then you would have one normal backend per slave, and they would access either the files directly or possibly some shared area where the WAL is buffered for this purpose... the streaming function could also take care of signaling the "up-to-dateness" of the slaves in the case of synchronous replication. There could also be some system table infrastructure to track the slaves.

There could also be functions to stream the DB's files through normal backends, so a slave could be bootstrapped all the way from copying the files over a simple postgres backend connection... that would make for the easiest possible setup of a slave: configure a connection to the master and hit "run"... and, last but not least, the same interface could be used by a PITR repository client for archiving the WAL stream and occasional file system snapshots.

Cheers,
Csaba.
Re: [HACKERS] Switching between terminals
On Thu, 2008-07-03 at 19:56 +0530, cinu wrote:
> Could anyone please tell me where I am going wrong and if there is a
> way I can get the same behaviour that I am getting while I am
> executing the through psql prompt.

Your mistake is that you think a transaction is related to your terminal, but it is in fact tied to the psql session you are running... Your first example is running one psql instance per terminal, hence one transaction per terminal, while in your second example the transaction is terminated each time psql finishes running.

Basically what you're asking for is to keep a transaction opened by one session (the first psql execution), connect to it with the second session (the second psql call) and continue the transaction which was opened by the first one... which I'm pretty sure is the wrong thing to want. It is likely possible to do (using PREPARE TRANSACTION), but even more likely that it is the wrong thing to do in normal circumstances.

If you say what you really want to do, I bet you'll get a lot more useful advice...

Cheers, Csaba.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
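For completeness, the PREPARE TRANSACTION route looks roughly like the sketch below (hypothetical table name, and max_prepared_transactions must be set above zero). Note that a second session can only commit or roll back the prepared transaction - it cannot keep working inside it - which is why it is usually not what people are really after:

    -- session 1:
    BEGIN;
    INSERT INTO t (id) VALUES (1);    -- hypothetical table
    PREPARE TRANSACTION 'mytx';       -- the transaction now survives this session

    -- session 2 (a completely separate psql invocation):
    COMMIT PREPARED 'mytx';           -- or: ROLLBACK PREPARED 'mytx';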
Re: [HACKERS] Command execution
On Thu, 2008-07-03 at 23:15 +1000, Aaron Spiteri wrote: > Inside foo there was a INSERT and UPDATE, and the INSERT failed but > the UPDATE succeeded would the UPDATE be rolled back? Just to add to the other answers, if the INSERT is before the UPDATE in the function, the function execution stops when the INSERT fails, and so the UPDATE will never be executed in the first place... Cheers, Csaba. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Runtime checking of MCV (Was: ... histogram bucket numdistinct statistics)
On Tue, 2008-06-10 at 19:03 -0400, Tom Lane wrote:
> Given such an MCV list, the planner will always make the right choice
> of whether to do index or seqscan ... as long as it knows the value
> being searched for, that is. Parameterized plans have a hard time here,
> but that's not really the fault of the statistics.

This is maybe the best example of where multiple (sub)plans could be glued together with some kind of plan fork node, so that for parameterized plans the actual plan to be executed is decided based on the parameter values, checking the statistics at runtime instead of at plan time... so the planner creates alternative (sub)plans (e.g. seqscan vs index scan) for the cases where the parameters are MCVs or not, and then places them in different branches of a runtime check of the parameter values against the statistics.

Of course the number of branches must be limited; this would be the challenge of such a feature... to cover the parameter space with the minimal number of plan branches so that disastrous plans for special parameter values are avoided. It would perhaps also be possible to gradually grow the number of alternatives as a reaction to the actual parameter values used by queries, so that only the parameter space actually in use by queries is covered.

In fact I would be interested in experimenting with this. Would it be possible to add new planner behavior as external code? I would expect not, as the planner is also in charge of the correctness of the results and any external code would put that correctness at risk I guess... in any case, I'll go and check the source.

BTW, there was a discussion about global prepared statements/caching of query plans, has there been any progress on that? Thorough planning would make the most sense in that context, possibly by using a special syntax for the application to signal the need for such planning for the most problematic (not necessarily the most used though) queries.

Cheers, Csaba.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
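A sketch of the kind of query this is about, with a hypothetical table and value distribution - the best plan for the prepared statement depends entirely on whether $1 is one of the few very frequent values or one of the many rare ones, which is only known at execution time:

    CREATE TABLE orders (order_id bigint, customer_id bigint);
    CREATE INDEX orders_customer_idx ON orders (customer_id);

    -- suppose a handful of customer_ids (the MCVs) cover most of the table,
    -- while the remaining values occur only a few times each
    PREPARE by_customer(bigint) AS
        SELECT * FROM orders WHERE customer_id = $1;

    -- $1 = a frequent value -> a seqscan wins
    -- $1 = a rare value     -> an index scan wins
    -- a "plan fork" node would check $1 against the MCV list at runtime
    -- and pick the matching subplan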
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Wed, 2008-06-04 at 11:13 +0300, Heikki Linnakangas wrote: > Hmm, WAL version compatibility is an interesting question. Most minor > releases hasn't changed the WAL format, and it would be nice to allow > running different minor versions in the master and slave in those cases. > But it's certainly not unheard of to change the WAL format. Perhaps we > should introduce a WAL version number, similar to catalog version? Would that also cover possible differences in page size, 32bit OS vs. 64bit OS, different timestamp flavour, etc. issues ? AFAIR, all these things can have an influence on how the data is written and possibly make the WAL incompatible with other postgres instances, even if the exact same version... Cheers, Csaba. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Fwd: Re: [HACKERS] Core team statement on replication in PostgreSQL
[Looks like this mail missed the hackers list on reply to all, I wonder how it could happen... so I forward it]

On Thu, 2008-05-29 at 17:00 +0100, Dave Page wrote:
> Yes, we're talking real-time streaming (synchronous) log shipping.

Is there already a design for how this would be implemented?

Some time ago I was interested in this subject and thought about having a normal postgres connection where the slave would issue a query to a special view which would simply stream WAL records as bytea from a requested point, without ever finishing. This would have the advantage (against a custom socket streaming solution) that it can reuse the complete infrastructure of connection making/managing/security (think SSL for no sniffing) of the postgres server. It would also be a public interface, which could be used by other possible tools too (think PITR management application/WAL stream repository).

Another advantage would be that a PITR solution could serve as a source for the WAL stream too, so the slave could either get the real time stream from the master, or rebuild a PITR state from a WAL repository server, using the same interface...

Probably some kind of WAL subscription management should also be implemented, so that the slave can signal the master which WAL records it has already applied and which can be recycled on the master, and it would be nice if there could be multiple subscribers at the same time. Some subscriber time-out could also be implemented, marking the subscription as timed out, so that the slave knows it has to rebuild itself...

Cheers, Csaba.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] constraint exclusion analysis caching
On Fri, 2008-05-09 at 08:47 -0400, Andrew Dunstan wrote: > However, I wondered if we couldn't mitigate this by caching the results > of constraint exclusion analysis for a particular table + condition. I > have no idea how hard this would be, but in principle it seems silly to > keep paying the same penalty over and over again. This would be a perfect candidate for the plan-branch based on actual parameters capability, in association with globally cached plans mentioned here: http://archives.postgresql.org/pgsql-hackers/2008-04/msg00920.php Cheers, Csaba. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cached Query Plans (was: global prepared statements)
On Mon, 2008-04-14 at 17:08 +0200, PFC wrote:
> Those "Decision" nodes could potentially lead to lots of decisions
> (ahem).
> What if you have 10 conditions in the Where, plus some joined ones ? That
> would make lots of possibilities...

Yes, that's true, but most of them are likely not relevant for the end result. In any real life query there are a few parameters which are really important for what plan you should choose... the key here is that you can spend more time on finding the possibilities for a cached plan than you do for a one-shot query. In principle one-shot planning should be the default, and caching should be something the user has to choose deliberately.

I would really like a special command to plan and cache a query without actually executing it, possibly having a parameter for how hard to try... e.g. you could expend the extra cycles to eliminate all redundancies from boolean expressions and IN lists, to get the parse tree into a canonical format - all things which can make planning easier. All of these lose for one-shot queries, but once you cache you can really do a lot of smart things which were a no-no before...

> Consider several types of queries :
>
> - The small, quick query which returns one or a few rows : in this case,
> planning overhead is large relative to execution time, but I would venture
> to guess that the plans always end up being the same.

Consider a 'select a from t where b like $1' -> the parameter $1 will considerably affect the query plan. A query can't go much simpler...

> - The query that takes a while : in this case, planning overhead is nil
> compared to execution time, better replan every time with the params.

I guess these queries are not the ones targeted by this feature. In fact for these queries it really doesn't matter if you cache or not, except: if you know you're going to cache, you'll expend more effort planning right, and that could still matter for a query which runs long. Note that if you don't cache, planning harder will lose in the long run; only once you cache can you afford to plan harder...

> - The complex query that still executes fast because it doesn't process a
> lot of rows and postgres finds a good plan (for instance, a well optimized
> search query). Those would benefit from reducing the planning overhead,
> but those also typically end up having many different plans depending on
> the search parameters. Besides, those queries are likely to be dynamically
> generated. So, would it be worth it to add all those features just to
> optimize those ? I don't know...

We have here dynamically generated queries which are specifically chunked to be executed in small increments, so that none of the queries runs too long (they would block vacuuming of vital tables otherwise). Those chunks would greatly benefit from properly planned and cached plans...

A really smart system would store canonical plan fragments as responses to (also canonicalized) parse tree fragments, and then assemble the plan out of those fragments, but that would indeed be really complex (to design; the resulting code might be simpler than one thinks) ;-)

Cheers, Csaba.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
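To make the LIKE example concrete (hypothetical table; text_pattern_ops only so the index is usable for LIKE independently of locale): planned with the actual value known, the two executions below want completely different plans, while a single cached generic plan has to settle for the seqscan:

    CREATE TABLE t (a int, b text);
    CREATE INDEX t_b_idx ON t (b text_pattern_ops);

    PREPARE p(text) AS SELECT a FROM t WHERE b LIKE $1;

    EXECUTE p('foo%');   -- anchored prefix: an index range scan would be possible
    EXECUTE p('%foo');   -- not anchored: only a seqscan can do it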
Re: [HACKERS] Cached Query Plans
On Mon, 2008-04-14 at 10:55 -0400, Mark Mielke wrote: > The other ideas about automatically deciding between plans based on > ranges and such strike me as involving enough complexity and logic, that > to do properly, it might as well be completely re-planned from the > beginning to get the most benefit. ... except if you hard-wire the most common alternative plans, you still get the benefit of cached plan for a wider range of parameter values. Not to mention that if you know you'll cache the plan, you can try harder planning it right, getting possibly better plans for complex queries... you could argue that complex queries tend not to be repeated, but we do have here some which are in fact repeated a lot in batches, then discarded. So I guess a cached plan discard/timeout mechanism would also be nice. Cheers, Csaba. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cached Query Plans (was: global prepared statements)
> ... or plan the query with the actual parameter value you get, and also
> record the range of the parameter values you expect the plan to be valid
> for. If at execution time the parameter happens to be out of that range,
> replan, and possibly add a new subplan covering the extra range. This
> could still work with prepared queries (where you don't get any
> parameter values to start with) by estimating the most probable
> parameter range (whatever that could mean), and planning for that.

Another thought: if the cached plans get their own table (as was suggested), then you could also start gathering parameter range statistics meaningfully... and on the next replan you know what to optimize your planning efforts for.

Cheers, Csaba.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cached Query Plans (was: global prepared statements)
On Mon, 2008-04-14 at 16:10 +0200, Csaba Nagy wrote:
> ... or plan the query with the actual parameter value you get, and also
> record the range of the parameter values you expect the plan to be valid
> for. If at execution time the parameter happens to be out of that range,
> replan, and possibly add a new subplan covering the extra range. This
> could still work with prepared queries (where you don't get any
> parameter values to start with) by estimating the most probable
> parameter range (whatever that could mean), and planning for that.

More on that: recording the assumptions under which the (cached!) plan is thought to be valid would also facilitate setting up dependencies against statistics, to be checked when you analyze tables... and if a key value which your query depends on has changed, the analyze process could possibly replan it in the background.

Cheers, Csaba.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cached Query Plans (was: global prepared statements)
On Mon, 2008-04-14 at 16:54 +0300, Heikki Linnakangas wrote:
> Figuring out the optimal "decision points" is hard, and potentially very
> expensive. There is one pretty simple scenario though: enabling the use
> of partial indexes, preparing one plan where a partial index can be
> used, and another one where it can't. Another such case is "col LIKE ?"
> queries, where ? is actually a prefix query, "foo%".

Another point is when the cardinality distribution of some key's values is very skewed, with some values very frequent and the majority of values being unique. There you could check the stats at execution time just to decide whether to go for the low-cardinality plan or the high-cardinality one...

> As an optimization, we could decide the decision points on the prepare
> message, and delay actually planning the queries until they're needed.
> That way we wouldn't waste time planning queries for combinations of
> parameters that are never used.

... or plan the query with the actual parameter value you get, and also record the range of the parameter values you expect the plan to be valid for. If at execution time the parameter happens to be out of that range, replan, and possibly add a new subplan covering the extra range. This could still work with prepared queries (where you don't get any parameter values to start with) by estimating the most probable parameter range (whatever that could mean), and planning for that.

Cheers, Csaba.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
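Heikki's partial index case, spelled out against a hypothetical table - whether the partial index is usable at all depends on one parameter, so a cached plan would need one branch per outcome:

    CREATE TABLE parcels (parcel_id bigint, delivered boolean, sent_on date);
    CREATE INDEX parcels_open_idx ON parcels (sent_on) WHERE NOT delivered;

    PREPARE open_parcels(date, boolean) AS
        SELECT * FROM parcels WHERE sent_on > $1 AND delivered = $2;

    -- $2 = false : the partial index covers the rows, an index scan is possible
    -- $2 = true  : the partial index is useless, a different plan is needed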
Re: [HACKERS] Cached Query Plans (was: global prepared statements)
> The hairiness is in the plan dependence (or independence) on parameter > values, ideally we only want to cache plans that would be good for all > parameter values, only the user knows that precisely. Although it could be > possible to examine the column histograms... If cached plans would be implemented, the dependence on parameter values could be solved too: use special "fork" nodes in the plan which execute different sub-plans depending on special parameter values/ranges, possibly looking up the stats at runtime, so that the plan is in a compiled state with the "decision points" wired in. This of course would mean a lot heavier planning and possibly a lot bigger plans, but you could afford that if you cache the plan. You could even have a special command to plan a query this way. Cheers, Csaba. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS][OT] Concurrent psql API
> I find myself doing this frequently with any long-running command,
> but currently it's a PITA because I'd doing it at the shell level and
> firing up a new psql: more work than should be necessary, and psql
> sometimes gets confused when you resume it from the background in
> interactive mode (stops echoing characters, though maybe this has
> been fixed).

I would recommend trying out the 'screen' utility (see my other post too). And here you find a nice .screenrc too which will show you a status bar of your active sessions; I find it super cool (and it's well commented if you don't like it as it is): http://home.insightbb.com/~bmsims1/Scripts/Screenrc.html

The man page has all the commands you need; the ones I use most:

  Ctrl-a Ctrl-c -> open a new session;
  Ctrl-a A      -> name the session (it will show up with that name in the status bar; note that the second key is a capital A, not a lowercase a);
  Ctrl-a Ctrl-a -> switch to the last viewed session;
  Ctrl-a <n>    -> switch to the <n>th session, where <n> is a digit 0-9

I usually leave the screen sessions running and detach only the terminal, and then I can connect again to the already set up sessions using "screen -R". It's a real time saver.

It has many more facilities, and creating a new psql session is just Ctrl-a Ctrl-c and then typing in psql... and you're good to go... I don't think you can beat that by a large margin with psql-internal commands (you still need to type in something extra), and you do have the added benefits of clearly separated workflows and a nice overview of them.

Cheers, Csaba.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Concurrent psql API
On Thu, 2008-04-10 at 05:03 +0930, Shane Ambler wrote: > I do think it is useful for more than typo's in the \join command. What > about a slip where you forget to \g& the command. Or you start a query > that seems to be taking too long, background it and look into what is > happening. This would be more helpful to those that ssh into a machine > then run psql from there. For interactive use in the above mentioned scenario you can use the 'screen' command and start as many psqls as needed ('man screen' to see what it can do). I would probably always use screen instead of psql's multisession capability in interactive use. I do want to instantly see what is currently running, and a psql screen cluttered with multiple results will not make that easier. Even a list method of what is running will only help if it actually shows the complete SQL for all running sessions and that will be a PITA if the SQLs are many and big. Multiple screens are much better at that. So from my POV scripting should be the main case for such a feature... and there it would be welcome if it would be made easy to synchronize the different sessions. Cheers, Csaba. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COPY Transform support
On Thu, 2008-04-03 at 16:44 +0200, PFC wrote:
> CREATE FLATFILE READER mydump (
>     id   INTEGER,
>     date TEXT,
>     ...
> ) FROM file 'dump.txt'
> (followed by delimiter specification syntax identical to COPY, etc)
> ;

Very cool idea, but why would you need to create a reader object first? You should be able to use COPY directly with the target table omitted, meaning the copy would not pump its result into the target but would be equivalent to a select... and it could be used in any place where a select can be used. This would need absolutely no new syntax, just the rules changed...

Now that I've had a second look, you actually need the field definitions to meaningfully interpret the file, but then why not use a record specification instead of the table in the normal COPY command? I'm not sure if there's any existing syntax for that, but I would guess yes...

In any case, such a feature would help a lot in processing input files based also on other existing data in the DB.

Cheers, Csaba.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
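Roughly what that would look like, as invented syntax only (neither the COPY-as-row-source form nor the items table exists - this is just to make the idea concrete): the record specification replaces the target table, and the whole COPY behaves as a row source inside a query:

    INSERT INTO items (id, added_on)
    SELECT r.id, r.date::date
      FROM (COPY (id integer, date text) FROM 'dump.txt' DELIMITER E'\t') AS r
     WHERE NOT EXISTS (SELECT 1 FROM items i WHERE i.id = r.id);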
Re: [HACKERS] Permanent settings
On Tue, 2008-02-19 at 16:41 +0100, Magnus Hagander wrote:
> The end result wouldn't be "as clean" as some would expect, but it would
> certainly be easier code-wise. For example, I'm sure someone would get the
> suggestion to go edit postgresql.conf to change a config value, and be
> surprised when it didn't show up as a changed setting because it was
> overridden from another file..

Yes, but at least the override part would be nicely separated in a file, which could suggestively be named something like postgresql.conf.override, and hopefully it would stick out sufficiently for those who edit the config file directly to wonder about its purpose... and of course it would always be editable directly too, so you can easily fix by hand any foot-shooting mistakes made from the admin interface. It would simply be rewritten each time you change something, without regard to the manual changes, and possibly ignored altogether if your manual changes violate its expected layout.

Cheers, Csaba.

---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Permanent settings
> Are you suggesting we keep appending? So if I set the same parameter 100
> times, it would show up on 100 rows?

What about not touching the config file at all, but writing to a separate file which is completely under the control of postgres, and including that at the end of the config file? You just said includes are a new feature which could complicate things, so why not actually use them to your advantage ;-)

That way disabling the overrides would be as simple as commenting out the inclusion of the postgres-controlled config file. And it would separate the user-writable and machine-writable configuration...

Cheers, Csaba.

---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Postgresql Materialized views
On Mon, 2008-01-14 at 09:22 +0000, Simon Riggs wrote:
> So I support Mark Mielke's views on writing code. Anybody who wants to
> code, can. There's probably a project of a size and complexity that's
> right for your first project.

The main problem is that usually that initial thing is not what you desperately need today... so the motivation will be pretty low unless you just have loads of time to start off playing with the code.

> Apparently the guy that invented the new
> scheduling algorithms for Linux wasn't even a coder, but he sat down and
> worked it out.
> This is Hackers: Write some code today, everybody. You *can*.

Certainly everybody can write code, but the barrier to getting it accepted is pretty high in the postgres community. So you'd better be a damn good coder if you expect your code to be accepted... and even then expect a considerable fight to justify the use case for your feature ;-)

This is all good for a stable product, but it really makes the barrier between simple users and hackers pretty high.

Cheers, Csaba.

---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Storage Model for Partitioning
> Which is roughly what Simon's original "Dynamic Partitioning" would be > if it became visible at the planner level (unless I've misunderstood). I > was picturing it in my head as a two-dimensional bitmap with > value-ranges along one axis and block-ranges along the other. Of course, > unlike other indexes it needs visibility information to be of any use. But why not have it as a normal index of ranges ? I'm not familiar with the GIST extensions, but this sounds like a set of records (segments in Simon's terms) indexed by their interval position on a line... isn't that covered by some GIST index type ? > Thinking about it, I'm not sure how my thinking would affect a > full-table seq-scan. You'd not get blocks back in-order throughout the > scan - would that matter? That could be covered by something like the bitmap scan, just on coarser level, the bits covering segments instead of blocks. Cheers, Csaba. ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Storage Model for Partitioning
On Fri, 2008-01-11 at 11:34 +, Richard Huxton wrote: > 1. Make an on-disk "chunk" much smaller (e.g. 64MB). Each chunk is a > contigous range of blocks. > 2. Make a table-partition (implied or explicit constraints) map to > multiple "chunks". > That would reduce fragmentation (you'd have on average 32MB's worth of > blocks wasted per partition) and allow for stretchy partitions at the > cost of an extra layer of indirection. This sounds almost like some kind of "clustering index", where the index contains ranges pointing to blocks of data... if the same index is also used for inserting (i.e. the free space map is a partial "cluster index" on blocks with free space), that would be a coarse clustering solution I guess... Cheers, Csaba. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Dynamic Partitioning using Segment Visibility Maps
On Mon, 2008-01-07 at 14:20 +0100, Markus Schiltknecht wrote:
> Why is that? AFAIUI, Segment Exclusion combines perfectly well with
> clustering. Or even better, with an upcoming feature to maintain
> clustered ordering. Where do you see disadvantages such an optimization
> for sequential scans?

Well, as I understood it, this would be some kind of special case of clustering, where the cluster key is expected to be ever increasing in time and new rows are not randomly distributed over the complete possible range.

In theory you could also have each segment in turn clustered on some criteria other than the partitioning criteria, so indexed access could also be better on the main selection criteria, which could be different from the partitioning criteria.

All this is of course just speculation - but I guess that's what you expected in this discussion too :-)

Cheers, Csaba.

---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Dynamic Partitioning using Segment Visibility Maps
On Mon, 2008-01-07 at 13:59 +0100, Markus Schiltknecht wrote:
> However, for tables which don't fit the use case of SE, people certainly
> don't want such a fill factor to bloat their tables.

Sure, but it could be configurable and should only be enabled if the table is marked as partitioned on some condition... I think it would be a bad idea anyway if the DB started partitioning on some arbitrary criteria based on analyzing the table, so the DBA should be the one to decide what criteria to partition on. In particular it could occasionally be a bad idea to partition on the clustering criteria for tables which were clustered once.

Cheers, Csaba.

---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Dynamic Partitioning using Segment Visibility Maps
On Wed, 2008-01-02 at 17:56 +0000, Simon Riggs wrote:
> Like it?

Very cool :-)

One additional thought: what about a kind of "segment fill factor"? Meaning: each segment has some free space reserved for future updates/inserts of records in the same range of its partitioning constraint. And when inserting/updating you put the new record into the corresponding segment... just like a very coarse clustering. Then you could vacuum the segments separately to keep the free space from running out.

For active segments you would then fix the partitioning constraint range once the fill factor is reached, to allow keeping the constraint even when heavily updating (heavily vacuuming too as a response to that), and create a new segment for the unbounded range for new inserts... this would work fine for tables where the constraint is based on ever-increasing keys and there are only accidental inserts in old ranges (which do happen occasionally in real life).

When the change rate of old segments goes down, the segments could be reorganized to have a smaller fill factor, so that you still allow for accidental updates but keep space usage efficient. This would be a similar action to clustering, but hopefully not a blocking one (which might be a hard thing to do)... and later again you could mark some of the really old things as read-only and put them in special segments with no wasted space.

One problem would be when a segment's free space runs out, so you must put records from the same constraint range in multiple segments - but that could still work, you would just have multiple segments covering the same range; if the "segment fill factor" is chosen properly this should not normally be the case... you could normally maintain a set of non-overlapping segments in terms of the partitioning constraint.

Cheers, Csaba.

---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] WORM and Read Only Tables (v0.1)
On Tue, 2007-12-11 at 14:58 +0200, Hannu Krosing wrote:
> On Tue, 2007-12-11 at 13:44, Csaba Nagy wrote:
> > Then put the active chunk on a high performance file system and the
> > archive tablespace on a compressed/slow/cheap file system and you're
> > done. Allow even the archive chunk to be updateable, and put new tuple
> > data in the active chunk. It would work just fine for cases where the
> > old data is rarely updated/deleted...
>
> You can't update a table on a read-only (write-once) partition, at least
> not with current header structure.

OK, but that's what I'm challenging: why do you need a write-once partition? Do you mean tapes by that? OK, then it means I was thinking of completely different usage scenarios...

Cheers, Csaba.

---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] WORM and Read Only Tables (v0.1)
On Tue, 2007-12-11 at 13:44 +0100, Csaba Nagy wrote: > Another advantage I guess would be that active data would more likely > stay in cache, as updated records would stay together and not spread > over the inactive. And I forgot to mention that vacuum could mostly skip the archive part, and only vacuum the active part, which would drastically reduce the cost of vacuuming big & active tables. Cheers, Csaba. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] WORM and Read Only Tables (v0.1)
On Tue, 2007-12-11 at 11:12 +0000, Simon Riggs wrote:
> Features
> - Read Only Tables
> - Compressed Tablespaces

I wonder whether, instead of read-only tables, it wouldn't be better to have some kind of automatic partitioning which permits having different chunks of the table data in different tablespaces, plus a freeze command which effectively moves the data from the (normally small) active chunk to the archive chunk when its transaction id is older than a predefined threshold.

Then put the active chunk on a high performance file system and the archive tablespace on a compressed/slow/cheap file system and you're done. Allow even the archive chunk to be updateable, and put new tuple data in the active chunk. It would work just fine for cases where the old data is rarely updated/deleted...

Another advantage I guess would be that active data would be more likely to stay in cache, as updated records would stay together and not be spread over the inactive ones.

Cheers, Csaba.

---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Ordered Append Node
On Fri, 2007-11-23 at 12:36 +0000, Gregory Stark wrote:
> I also did an optimization similar to the bounded-sort case where we check if
> the next tuple from the same input which last contributed the result record
> comes before the top element of the heap. That avoids having to do an insert
> and siftup only to pull out the same record you just inserted. In theory this
> is not an optimization but in practice I think partitioned tables will often
> contain contiguous blocks of key values and queries will often be joining
> against that key and therefore often want to order by it.

If it is an option, you could also do this with a new method on the heap which adds a new entry and removes the resulting new head in one atomic operation. That would handle the case where the new entry sorts before the current head with one single comparison, and it would not need to repeat that comparison when the check fails. Also it could directly remove the head and rebalance the tree in one go.

Cheers, Csaba.

---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Feature Freeze date for 8.4
On Tue, 2007-10-23 at 11:00 +0200, Rafael Martinez wrote:
> We are always 1 year back the main release. We are testing and planing
> the move to 8.2 now, and it won't happen until desember. In a 6 month
> cycle we will have to jump over every second release.

We here are also just in the process of upgrading to 8.2, so in principle I agree... except that nobody is forced to go through ALL versions - skipping every second release is perfectly OK... but: shorter release cycles would allow people to install only the releases which actually bring performance/functionality improvements they really need. All other releases can be skipped... in fact you could possibly go for installing a specific release only on some of the DBs, based on what you need there.

The main question is whether shorter releases would impact quality - from a user point of view (and I can only give my opinion in this respect) I don't see any other possible disadvantage...

Cheers, Csaba.

---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Including Snapshot Info with Indexes
On Tue, 2007-10-09 at 11:22 -0400, Andrew Dunstan wrote:
>
> Csaba Nagy wrote:
> > You mean postgres should check your function if it is really immutable ?
> > I can't imagine any way to do it correctly in reasonable time :-)
> I would say that in the general case it's analogous to the halting
> problem, not solvable at all let alone in any reasonable time.

In the light of Florian's mail, I would say that in the context of a language which can check each of its constructs for immutability, a procedure using only immutable constructs should itself be immutable... the halting problem is avoided in that you don't really need to know if/how the procedure works, you only need to know that it will always work the same ;-)

The problem is that in the general case the languages don't have checks available for this kind of thing, so either you restrict the immutability check to simple languages ("static enough" as Florian would say) or you must allow the user to decide whether the function is immutable or not. In the general case I assume the users will want the power to decide (and potentially be wrong), and will expect that if they do make a mistake, the result won't be catastrophic.

I guess this is the same conclusion as in previous threads about the subject...

Cheers, Csaba.

---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Including Snapshot Info with Indexes
> I think you're overly pessimistic here ;-) This classification can be done > quite > efficiently as long as your language is "static enough". The trick is not to > execute the function, but to scan the code to find all other functions and > SQL > statements a given function may possibly call. If your function calls no SQL > statements, and only other functions already marked IMMUTABLE, then it must > be > IMMUTABLE itself. OK, I have a "black-box" mindset right now due to the problem I'm currently working on, so I didn't even think about checking the source code of the function (which is the right thing to do if you have the source code)... in which case you're right, I was overly pessimistic :-) Cheers, Csaba. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Including Snapshot Info with Indexes
[snip]
> In the case of User-Defined functions, the user should be defining it
> as Deterministic.

The user CAN already define his functions as "Deterministic=IMMUTABLE"... the problem is that many of us will define functions as immutable when in fact they are not. And do that by mistake... and there's nothing postgres can do about that.

> Can we frame a set of guidelines, or may be some test procedure, which
> can declare a certain function as deterministic?

You mean postgres should check your function to see if it is really immutable? I can't imagine any way to do that correctly in reasonable time :-)

Imagine a function of 10 parameters which returns the sum of the parameters all the time, except that when the parameters are all 1 it will randomly return a different value _once in a thousand executions_... please find a generic algorithm which spots this function as not immutable in reasonable execution time ;-) This example is a bit extreme, but don't underestimate the user ;-)

> I am just saying from the top of my mind. Even otherwise, if we can
> even restrict this indexing to only Built-in deterministic functions.,
> don't you think it would help the cause of a majority? I have just
> made the proposal to create the index with snapshot a optional one.

Restrictions like this are always confusing for the end user (i.e. why can I use built-ins here and not my own?). I leave it to the actual coders to say anything about code maintenance concerns...

Cheers, Csaba.

---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
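Scaled down to two parameters, such a function is trivial to write, and the IMMUTABLE label is nothing more than the user's (here wrong) claim:

    CREATE FUNCTION sneaky_sum(a int, b int) RETURNS int AS $$
    BEGIN
        IF a = 1 AND b = 1 AND random() < 0.001 THEN
            RETURN 42;       -- once in ~1000 calls it lies
        END IF;
        RETURN a + b;        -- otherwise it looks perfectly deterministic
    END;
    $$ LANGUAGE plpgsql IMMUTABLE;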
Re: [HACKERS] Including Snapshot Info with Indexes
On Mon, 2007-10-08 at 09:40 +0100, Heikki Linnakangas wrote:
> This idea has been discussed to death many times before. Please search
> the archives.

Somewhat related to the "visibility in index" thing: would it be possible to have a kind of index-table ? We do have here some tables which have 2-4 fields, and the combination of them forms the primary key of the table. There are usually no other indexes on the table, and the net result is that the PK index duplicates the heap. So it would be cool if the index would be THE heap somehow...

The most prominent example of this in our DBs is this table:

db> \d table_a
      Table "public.table_a"
 Column |  Type  | Modifiers
--------+--------+-----------
 id_1   | bigint | not null
 id_2   | bigint | not null
Indexes:
    "pk_table_a" PRIMARY KEY, btree (id_1, id_2)

db> select reltuples::bigint, relpages from pg_class where relname='table_a';
 reltuples | relpages
-----------+----------
 445286464 |   710090
(1 row)

db> select reltuples::bigint, relpages from pg_class where relname='pk_table_a';
 reltuples | relpages
-----------+----------
 445291072 |   599848
(1 row)

This postgres install is compiled with 32K page size (for the ones who wonder about the page counts). In any case, it is clear that the index basically duplicates the heap...

Cheers, Csaba.

---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
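What that wish amounts to, in invented syntax only (nothing like this exists), is that the primary key btree simply is the table:

    -- hypothetical "index-organized" table: the heap and pk_table_a above
    -- would be one and the same btree structure
    CREATE TABLE table_a (
        id_1 bigint NOT NULL,
        id_2 bigint NOT NULL,
        PRIMARY KEY (id_1, id_2)
    ) ORGANIZE BY PRIMARY KEY;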
Re: [HACKERS] GUC variable renaming, redux
On Mon, 2007-09-24 at 10:55 -0700, Joshua D. Drake wrote: > IMO, monitor_ seems weird versus track_. To me monitor implies actions > to be taken when thresholds are met. PostgreSQL doesn't do that. > PostgreSQL tracks/stores information for application to monitor or > interact with and those application may doing something based on the > tracked information. Not that my opinion would count too much on hackers, but "track" sounds better to me too, seems more to the point, not to mention it's shorter too. +1 "track" Cheers, Csaba. ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
> In other words, if I can assure that data exported and then imported
> will always, under all circumstances, compare the same to the original,
> would that be enough of a requirement? In other words, if I offer a
> format that is assured of preserving both mantissa and exponent
> precision and range, as well as all extra attributes (+/-Infinity and
> NaN), but does not guarantee that the semantically identical constructs
> are told apart (+0 vs. -0, and the different NaNs), would that format be
> acceptable?

If you care about the +/- for +/-Infinity, you must also care about +/-0, so that you get the right type of infinity if you divide by 0... so +0 and -0 are far from being semantically identical.

Cheers, Csaba.

---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Feature freeze progress report
On Thu, 2007-05-03 at 13:51, Bruce Momjian wrote:
> I believe the problem is not that there isn't enough information, but
> not enough people able to do the work. Seeking solutions in areas that
> aren't helping was the illustration.

Yes Bruce, but you're failing to see that a more structured information infrastructure would attract more people to do the work, which could eventually solve the problem you're having in the first place (contradicting your argument that it won't help), at the cost of some possible additional work (of which I think you're overestimating the amount - it's probably more a learning curve than actual additional work).

The fact that there is enough information is not relevant; it must be in the right place too - too much information or hard-to-find information is sometimes worse than no information.

Cheers, Csaba.

---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Feature freeze progress report
> We have _ample_ evidence that the problem is lack of people able to
> review patches, and yet there is this discussion to track patches
> better. It reminds me of someone who has lost their keys in an alley,
> but is looking for them in the street because the light is better there.

Bruce, I guess the analogy fails on the fact that you're not looking for a key, but for people, and I thought "better light" would attract people to find you instead of you needing to search... I'm an outsider regarding postgres development, so my opinion does not count, but my gut feeling is that more light attracts better people. Not to mention it can help people get better...

Cheers, Csaba.

---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Last chance to object to MVCC-safe CLUSTER
On Sat, 2007-04-07 at 18:09, Tom Lane wrote: > Awhile back Csaba Nagy <[EMAIL PROTECTED]> wrote: > > Making cluster MVCC-safe will kill my back-door of clustering a hot > > table while I run a full DB backup. > > Are we agreed that the TRUNCATE-based workaround shown here > http://archives.postgresql.org/pgsql-hackers/2007-03/msg00606.php > is an adequate response to this objection? That workaround should actually work. It is more work but the desired goal is achieved. Cheers, Csaba. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
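For the archives, the general shape of such a TRUNCATE-based cleanup is something like the sketch below (hypothetical table name, and not necessarily the exact recipe from the referenced mail); since the table itself is kept, cached plans and FK references survive:

    BEGIN;
    LOCK TABLE task_queue IN ACCESS EXCLUSIVE MODE;           -- hypothetical queue table
    CREATE TEMP TABLE task_live AS SELECT * FROM task_queue;  -- keep the live rows
    TRUNCATE task_queue;                                      -- throws away all the bloat
    INSERT INTO task_queue SELECT * FROM task_live;
    DROP TABLE task_live;
    COMMIT;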
Re: [HACKERS] CREATE INDEX and HOT - revised design
> speaking with pavan off list he seems to think that only 'create
> index' is outside transaction, not the other ddl flavors of it because
> they are generally acquiring a excl lock. so, in that sense it is
> possibly acceptable to me although still a pretty tough pill to
> swallow (thinking, guc time). It would also preclude ever integrating
> vanilla 'create index' to create table command, fwiw.

Just to signal that it is in use: we did occasionally use create index in transactions when we had to do DB schema upgrades on production systems, for application upgrades which spanned multiple versions of our application (normally we upgrade versions one by one, but we have some systems which are upgraded rarely). On these occasions it was riskier than usual to run the accumulated upgrade scripts outside a transaction block.

But that was mostly a convenience feature; we could always rearrange our upgrade scripts to do all the rest first and then all the index creation at the end if all the rest succeeded... but if implicit index creation fails inside the transaction (e.g. when adding a new field which also happens to be a primary key), that would hurt... mostly in more work/more risk of extended downtime, but it would have a factor of inconvenience.

Cheers, Csaba.

---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Stats for multi-column indexes
This should read ("PK", not "FK", in the first clause):

> Considering that the PK part is unique, the skewness in the relationship
> is completely determined by the FK part's histogram. That would give at
> least a lower/upper bound and MCVs to the relationship.

Cheers, Csaba.

---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Stats for multi-column indexes
On Tue, 2007-03-20 at 18:12, Josh Berkus wrote: > Tom, > > > Actually, I think you don't particularly need stats for that in most > > cases --- if the planner simply took note that the FK relationship > > exists, it would know that each row of the FK side joins to exactly > > one row of the PK side, which in typical cases is sufficient. > > Is it? What about the other direction? Currently, doesn't the planner > assume that the rowcount relationship is 1 to ( child total rows / > parent total rows) ? That's ok for tables with relatively even > distribution, but not for skewed ones. Wouldn't that be improved if the MCVs/histogram of the FK column are taken into account ? Considering that the FK part is unique, the skewness in the relationship is completely determined by the FK parts histogram. That would give at least a lower/upper bound and MCVs to the relationship. Cheers, Csaba. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] My honours project - databases using dynamically attached entity-properties
On Thu, 2007-03-15 at 17:01, A.M. wrote:
> It seems to me that postgresql is especially well-suited to run DDL
> at runtime, so what's the issue?

The issue is that some applications are not well suited to running DDL at runtime :-) As I already mentioned in another post in this thread, our application also has a requirement for user-defined fields in one table. The problem is, that table is so heavily accessed in peak hours that it is simply impossible to take an exclusive lock on it without causing extended perceived downtime of the application. And guess what, users will always want to add new fields in peak hours...

We did solve this in our case with some application logic, but a generic solution would be nice ;-)

Cheers, Csaba.

---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] My honours project - databases using dynamically attached entity-properties
On Wed, 2007-03-14 at 16:50, David Fetter wrote: > On Wed, Mar 14, 2007 at 02:28:03PM +, Gregory Stark wrote: > > "David Fetter" <[EMAIL PROTECTED]> writes: > > > > > CREATE TABLE symptom ( > > > symptom_id SERIAL PRIMARY KEY, /* See above. */ > > > ... > > > ); > > > > > > CREATE TABLE patient_presents_with ( > > > patient_id INTEGER NOT NULL REFERENCES patient(patient_id), > > > symptom_id INTEGER NOT NULL REFERENCES symptom(symptom_id), > > > UNIQUE(patient_id, symptom_id) > > > ); > > > > I'm just glad I don't have your doctor. I hope mine doesn't think > > symptoms are all boolean values. > > What's in the symptom table is up to the doctor. OK, and here's the problem: each doctor might want to put something else in the symptom table. Each doctor might want to do it in a type safe way, e.g. so that the application enforces an enumeration of "high/moderate/low" for the symptom fever (or maybe another doctor wants it in exact degrees)... you can all stuff it in a string field, but you know how reliable that will be. Cheers, Csaba. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] My honours project - databases using dynamically attached entity-properties
On Wed, 2007-03-14 at 16:08, [EMAIL PROTECTED] wrote:
> On Wed, Mar 14, 2007 at 02:28:03PM +0000, Gregory Stark wrote:
> > "David Fetter" <[EMAIL PROTECTED]> writes:
> > > CREATE TABLE symptom (
> > > symptom_id SERIAL PRIMARY KEY, /* See above. */
> > > ...
> > > );
> > >
> > > CREATE TABLE patient_presents_with (
> > > patient_id INTEGER NOT NULL REFERENCES patient(patient_id),
> > > symptom_id INTEGER NOT NULL REFERENCES symptom(symptom_id),
> > > UNIQUE(patient_id, symptom_id)
> > > );
> >
> > I'm just glad I don't have your doctor. I hope mine doesn't think symptoms are
> > all boolean values.
>
> Where is the boolean above? It is M:N, with each having whatever data
> is required.

The boolean is assumed in the symptoms table. In any case, even if it's not a boolean value, even if the symptoms table is a complex one on its own, it is still one single type for all symptoms of all patients.

The real problem is that in some real world applications you have a mix of wildly varying types of attributes a user might want to use, and you can't know what those will be beforehand... the symptoms thing is simple to solve in the way David did it, but there really are other situations which a simple m:n can't easily cover.

How would you handle a database of user settings for 10K different applications and 100M different users, where each application must be able to store its own (type safe !!) settings in the same structure, and applications come and go with their own settings? Come up with a good solution to this, combined with queries like "give me all the users who have this set of settings set to these values" running fast, and then you're talking.

Cheers, Csaba.

---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
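A sketch of the kind of structure and query I mean, with hypothetical names - note how the values have to be degraded to text, and how every additional setting tested costs another self-join:

    CREATE TABLE user_setting (
        user_id bigint  NOT NULL,
        app_id  integer NOT NULL,
        name    text    NOT NULL,
        value   text    NOT NULL,     -- type safety is lost here
        PRIMARY KEY (user_id, app_id, name)
    );

    -- "all the users who have this set of settings set to these values"
    SELECT s1.user_id
      FROM user_setting s1
      JOIN user_setting s2 USING (user_id, app_id)
     WHERE s1.app_id = 7
       AND s1.name = 'theme'    AND s1.value = 'dark'
       AND s2.name = 'language' AND s2.value = 'de';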
Re: [HACKERS] My honours project - databases using dynamically attached entity-properties
On Tue, 2007-03-13 at 00:43, Richard Huxton wrote:
> Josh Berkus wrote:
> > I really don't see any way you could implement UDFs other than EAV that
> > wouldn't be immensely awkward, or result in executing DDL at runtime.
> What's so horrible about DDL at runtime? Obviously, you're only going to
> allow specific additions to specific schemas/tables, but why not?

Well, exclusively locking the table for DDL is not always possible in production systems. For a short while we also had a solution where we added new columns on the fly, and we had to ditch it... it simply wasn't working. The users were usually adding columns in the peak business hours, and in those hours it was simply impossible to take an exclusive lock on that table. I think DDL will actually also exclusively lock parent tables of FK relationships (I might be mistaken here, but I think I have seen such a lock), which is even worse.

After it caused extensive perceived downtime for hours, we simply redesigned the feature so that the users now need to announce beforehand how many different types of new columns they will need; we just create a few extra of them and assign them to the users' needs as required... the unused columns stay null and hopefully don't have too big an overhead, but it's the simplest solution we found which actually works. When the users run out of spare columns, the admin will create some new spares in quiet hours.

Cheers, Csaba.

---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
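The spare-column workaround, roughly, with hypothetical names - the ALTER runs ahead of time in quiet hours, and at runtime "adding a field" is just a row in a small mapping table:

    -- done ahead of time, in quiet hours:
    ALTER TABLE customer
        ADD COLUMN user_text_1   text,
        ADD COLUMN user_text_2   text,
        ADD COLUMN user_number_1 numeric;

    -- at runtime, handing a spare column out to a user-defined field:
    CREATE TABLE customer_field_map (
        field_name  text PRIMARY KEY,   -- what the user calls it
        column_name text NOT NULL       -- e.g. 'user_text_1'
    );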
Re: [HACKERS] Interaction of PITR backups and Bulk operationsavoiding WAL
On Fri, 2007-03-09 at 17:47, Tom Lane wrote:
> I don't think that people are very likely to need to turn archiving on
> and off on-the-fly.

We did occasionally need to turn archiving on on the fly. It did happen that I started up a new DB machine and did not yet have the log archive available, so I had to postpone configuring that, but the machine went on-line before the archive machine was ready... and then later I had to switch on archiving. It was very convenient that I could do it without a restart.

It's true that this has been a rare occasion; more often you just need to change the archive command (e.g. to archive to a different location if the archive repository goes down). It's somewhat moot for us as we switched to Slony (which is a heavy beast, but once it works it's great).

Cheers, Csaba.

---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] CLUSTER and MVCC
> Hmm. You could use something along these lines instead:
>
> 0. LOCK TABLE queue_table
> 1. SELECT * INTO queue_table_new FROM queue_table
> 2. DROP TABLE queue_table
> 3. ALTER TABLE queue_table_new RENAME queue_table
>
> After all, it's not that you care about the clustering of the table, you
> just want to remove old tuples.

... and then restart the app so all my pooled connections drop their cached plans ;-)

Seriously, that won't work. If a session tries to insert a new row after I lock the table to clean it up, I still want it to be able to insert after the cleanup is finished... if I drop the table it tries to insert into, it will fail.

> As a long term solution, it would be nice if we had more fine-grained
> bookkeeping of snapshots that are in use in the system. In your case,
> there's a lot of tuples that are not visible to pg_dump because xmin is
> too new, and also not visible to any other transaction because xmax is
> too old. If we had a way to recognize situations like that, and vacuum
> those tuples, much of the problem with long-running transactions would
> go away.

In the general case that won't work either in a strict MVCC sense... if you have an old transaction, you should never clean up a dead tuple which could still be visible to it.

> > Wouldn't be possible to do it like Simon (IIRC) suggested, and add a
> > parameter to enable/disable the current behavior, and use the MVCC
> > behavior as default ?
>
> I guess we could, but I don't see why should encourage using CLUSTER for
> that. A more aggressive, MVCC-breaking version of VACUUM would make more
> sense to me, but I don't like the idea of adding "break-MVCC" flags to
> any commands.

Well, if there were any other way to avoid the table bloat, I would agree.

Cheers, Csaba.

---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] CLUSTER and MVCC
On Fri, 2007-03-09 at 14:00, Alvaro Herrera wrote: > But I'm not really seeing the problem here. Why isn't Csaba's problem > fixed by the fact that HOT reduces the number of dead tuples in the > first place? If it does, then he no longer needs the CLUSTER > workaround, or at least, he needs it to a much lesser extent. Is this actually true in the case of HOT + long running transactions ? I was supposing HOT has the same problems in the presence of long running transactions... Cheers, Csaba. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] CLUSTER and MVCC
On Fri, 2007-03-09 at 13:42, Gregory Stark wrote:
> "Csaba Nagy" <[EMAIL PROTECTED]> writes:
>
> > Wouldn't be possible to do it like Simon (IIRC) suggested, and add a
> > parameter to enable/disable the current behavior, and use the MVCC
> > behavior as default ?
>
> Doing it in CLUSTER would be weird. However perhaps it would be useful to have
> some sort of stand-alone tool that just bumped all the xmin/xmax's. It would
> have to be super-user-only and carry big warning labels saying it breaks MVCC.

Well, the current behavior of CLUSTER is just perfect for what I'm using it for. If anything else would do the job, I would be happy to use it instead...

> But it would be useful any time you have a table that you want to exempt a
> particular table from serializable snapshots. Basically a per-table way to
> force a read-committed snapshot on. Though, actually it's not quite a
> read-committed snapshot is it? Anyone using an old serializable snapshot will
> see what, no tuples at all?

I'm afraid what I need has nothing to do with serializable snapshots... I still want the table to be completely transactional, except that if somebody can get an exclusive lock on it, it can be compacted regardless of other running transactions. I'm not sure how else to express this... it means something like: no transaction cares about the content of the table until it gets some kind of lock on it. In other words, the table's state is not connected with the state of other tables until I actually do something with it...

Cheers, Csaba.

---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] CLUSTER and MVCC
On Fri, 2007-03-09 at 12:29, Heikki Linnakangas wrote: > Csaba, you mentioned recently > (http://archives.postgresql.org/pgsql-hackers/2007-03/msg00027.php) that > you're actually using the MVCC-violation to clean up tables during a > backup. Can you tell us a bit more about that? Would you be upset if we > shut that backdoor? My use case: a queue-like table (in fact a 'task' table) which is very frequently inserted into, updated and deleted from. This table tends to be bloated in the presence of any long running transaction... the only transactional behavior we need from this table is to make sure that when we insert something in this table in a transaction (possibly together with other actions) and then commit/rollback, it commits/rolls back the insert. CLUSTER's violation of MVCC does not affect this, as CLUSTER will not be able to lock the table if another transaction inserted something in it (the inserting transaction will have a lock on the table). Selects on this table are not critical for us; it just doesn't matter which job processor gets which task and in what order... (actually it does matter, but CLUSTER won't affect that either). So what I do is execute CLUSTER on this table once every 5 minutes. This works just fine, and keeps the table size small even if I have long running transactions in progress. The DB backup is one such unavoidable long running transaction, and I use the table exclusion switch to exclude this task table from the backup so it won't get locked by the backup and CLUSTER can still do its job (I had a rudimentary patch to do this even before the feature was introduced to pg_dump). The table can be dumped separately, which is a brief operation, but I would have to clear it on a crash anyway... Now I could try to disable the CLUSTER cron job and see if I get problems; the last time it was disabled we were on postgres 7.4, and maybe something changed in between... but I can tell for sure that the last time I enabled it, it really fixed our load on the DB server... Wouldn't it be possible to do it like Simon (IIRC) suggested, and add a parameter to enable/disable the current behavior, and use the MVCC behavior as default ? Cheers, Csaba. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
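A minimal sketch of this setup, with purely illustrative object names, assuming the task table has a primary key index to cluster on (CLUSTER takes the index name first in this era's syntax) and that pg_dump's table exclusion switch is available:

    -- run from cron every 5 minutes; CLUSTER needs an exclusive lock, so it
    -- simply waits while a transaction that inserted into the table is open
    CLUSTER task_table_pkey ON task_table;

    -- the backup excludes the task table, so its long-running snapshot never
    -- locks it and CLUSTER can keep reclaiming space during the backup
    pg_dump -T task_table mydb > mydb.dump

    -- the task table itself is dumped separately, which is a brief operation
    pg_dump -t task_table mydb > task_table.dump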
Re: [HACKERS] Revitalising VACUUM FULL for 8.3
On Thu, 2007-03-01 at 13:56, Simon Riggs wrote: > > Wouldn't this be deadlock prone ? What if a non-utility transaction > > (which could even be started before the vacuum full) blocks on the table > > being vacuumed, then if the vacuum wants to wait until all non-utility > > transactions finish will deadlock. > > Exactly the same as CREATE INDEX CONCURRENTLY, which has a much more > frequent use case than VACUUM FULL does, even after I've made the > proposed changes. > > The situation, as I understand it, would be that the utility command > waits on another transaction to complete. As soon as that other > transaction touches the locked table it will detect a simple deadlock > and the non-utility statement will abort. Fair enough. > > > The utility transaction concept would make new VACUUM FULL MVCC-safe and > > > would also make most executions of CLUSTER MVCC-safe also (the implicit > > > top-level transaction cases). > > > > Making cluster MVCC-safe will kill my back-door of clustering a hot > > table while I run a full DB backup. > > Wow. I'll take that as a request for a NOWAIT option on utility > commands, rather than a suggestion that we shouldn't strive to make > things safe in the default case. Yes please... if the current behavior can still be triggered (by that NOWAIT, for example), it would actually be good to have the MVCC behavior as the default. Thanks, Csaba. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Revitalising VACUUM FULL for 8.3
On Thu, 2007-03-01 at 13:02, Simon Riggs wrote: > I would like to introduce the concept of utility transactions. This is > any transaction that touches only one table in a transaction and is not > returning or modifying data. All utility transactions wait until they > are older than all non-utility transactions before they commit. A > utility transaction would currently be any VACUUM, VACUUM FULL and > CREATE INDEX CONCURRENTLY. That is safe because each of those commands > executes in its own transaction and doesn't touch more than one table at > a time. Once each knows there is no chance of being interfered with, it > can continue its work and commit. This technique is already in use for > CREATE INDEX CONCURRENTLY, so just needs to be extended to all other > utilities - but in a way that allows them to recognise each other. This > extends upon the thought that VACUUMs already recognise other VACUUMs > and avoid using them as part of their Snapshot. Wouldn't this be deadlock prone ? What if a non-utility transaction (which could even have been started before the vacuum full) blocks on the table being vacuumed? Then, if the vacuum waits until all non-utility transactions finish, it will deadlock. > The utility transaction concept would make new VACUUM FULL MVCC-safe and > would also make most executions of CLUSTER MVCC-safe also (the implicit > top-level transaction cases). Making CLUSTER MVCC-safe will kill my back-door of clustering a hot table while I run a full DB backup. Cheers, Csaba. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
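A sketch of the deadlock scenario just described, with two sessions and illustrative table names:

    -- session A: an ordinary (non-utility) transaction, started first
    BEGIN;
    UPDATE other_table SET counter = counter + 1;

    -- session B: the utility command; it acquires its lock on queue_table,
    -- then waits for all older non-utility transactions (A) to complete
    VACUUM FULL queue_table;

    -- session A: now touches the table B has locked, and blocks on B
    SELECT count(*) FROM queue_table;

    -- A waits for B's lock, B waits for A to finish: under the proposal this
    -- is detected as a simple deadlock and the non-utility statement aborts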
Re: [HACKERS] New feature request: FlashBack Query
> Do 97% of transactions commit because Oracle has slow rollbacks and > developers are working around that performance issue, or because they > really commit? > > I have watched several developers that would prefer to issue numerous > selects to verify things like foreign keys in the application in > order to avoid a rollback. Most of our code cannot afford a rollback, because the failing statement can be part of a much bigger transaction, and retrying that carries a much higher performance penalty than a simple rollback would. And you know that in postgres you can't roll back just the last insert; you abort the whole transaction with it... and retrying is simply a performance bottleneck in a high contention scenario (which is usually the case for us). So I would say we don't avoid rollbacks because of the cost of the rollback, but because of the cost of the retry... Cheers, Csaba. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
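A minimal sketch of the two styles being contrasted, assuming a child table with a foreign key to parent(id); names are illustrative:

    -- style 1: verify the parent up front so the INSERT never fails and the
    -- enclosing (possibly large) transaction is never forced into a retry
    SELECT 1 FROM parent WHERE id = 42;
    -- the application issues the INSERT only if the row above was found
    INSERT INTO child (parent_id, payload) VALUES (42, 'work item');

    -- style 2: just insert and treat the FK violation as the signal; the
    -- error aborts the whole transaction, so everything done so far in it
    -- has to be retried, which is the cost being discussed above
    INSERT INTO child (parent_id, payload) VALUES (42, 'work item');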
Re: [HACKERS] autovacuum next steps
> One option that I've heard before is to have vacuum after a single iteration > (ie, after it fills maintenance_work_mem and does the index cleanup and the > second heap pass), remember where it was and pick up from that point next > time. From my experience this is not acceptable... I have tables for which the index cleanup takes hours, so no matter how low I set maintenance_work_mem (in fact I set it high enough that there's only one iteration), it will take too much time and the queue tables will get overly bloated (not that this happens anymore; those tables now get special "cluster" treatment). Cheers, Csaba. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
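A rough sketch of the sizing involved, assuming the usual figure of roughly 6 bytes per dead-tuple pointer collected by lazy VACUUM; the numbers and the table name are illustrative:

    -- in postgresql.conf, or per session before a manual VACUUM
    SET maintenance_work_mem = '512MB';
    -- VACUUM collects dead-tuple pointers up to this limit, then scans every
    -- index once and does the second heap pass; 512MB holds roughly 90
    -- million dead tuples, so a table with more dead tuples than that needs
    -- additional index-scan iterations
    VACUUM VERBOSE big_queue_table;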
Re: [HACKERS] Reducing likelihood of deadlocks (was referential Integrity and SHARE locks)
On Thu, 2007-02-08 at 17:47, Marc Munro wrote: > [snip] One of the causes of deadlocks in Postgres is that its referential > integrity triggers can take locks in inconsistent orders. Generally a > child record will be locked before its parent, but not in all cases. [snip] The problem is that eliminating the deadlock is still not the whole story... the interlocking still remains, possibly leading to degraded performance under high contention on very common parent rows. The real solution would be for an update on the parent table's non-referenced fields not to interlock at all with updates of the child rows... and I think there were some proposals to do that. In fact one possibility to avoid this problem is vertical partitioning, i.e. separating the non-key columns into a parallel table and updating them there. However this is only an option when you know about it beforehand and you're not inheriting the schema from other DBs... Cheers, Csaba. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
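A minimal sketch of the vertical partitioning workaround mentioned above, with illustrative names:

    -- the referenced key stays in the parent proper
    CREATE TABLE customer (
        id    bigint PRIMARY KEY,
        name  text NOT NULL
    );

    -- frequently updated, non-referenced columns move to a parallel table
    CREATE TABLE customer_stats (
        customer_id  bigint PRIMARY KEY REFERENCES customer (id),
        last_login   timestamp,
        login_count  integer
    );

    -- child rows reference customer, so RI checks take their SHARE locks on
    -- customer rows only; this update touches customer_stats instead and no
    -- longer interlocks with concurrent inserts of child rows
    UPDATE customer_stats SET login_count = login_count + 1
     WHERE customer_id = 42;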
Re: [HACKERS] Referential Integrity and SHARE locks
> You say below the cut that you're not updating keys, so presumably it's > other columns. Which leads me to something I've wondered for a while - > why do we lock the whole row? Is it just a matter of "not optimised that > yet" or is there a good reason why locking just some columns isn't > practical. For the conditions that generate the deadlock, see: http://archives.postgresql.org/pgsql-general/2006-12/msg00029.php The reason for the occasional orphan rows is not completely clear to me, but it must be some kind of race condition while concurrently inserting/deleting (and possibly updating) rows in the parent/child tables. Cheers, Csaba. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Referential Integrity and SHARE locks
On Fri, 2007-02-02 at 10:51, Simon Riggs wrote: [snip] > Why do we need a SHARE lock at all, on the **referenc(ed)** table? > > It sounds like if we don't put a SHARE lock on the referenced table then > we can end the transaction in an inconsistent state if the referenced > table has concurrent UPDATEs or DELETEs. BUT those operations do impose > locking rules back onto the referencing tables that would not be granted > until after any changes to the referencing table complete, whereupon > they would restrict or cascade. So an inconsistent state doesn't seem > possible to me. > > What am I missing? Well, here we do have a patch (deployed on production servers) which does not put the shared lock on the referenced table, and it occasionally lets rows into the referencing tables which have no parent rows in the referenced table. I'm not sure what the mechanism is, but it does happen, I can assure you. It happens rarely enough that it is not disturbing for us, compared to the deadlocks which happen without the patch - but that's another matter... In our application we never update any key ids, so only deletes/inserts come into play, and I guess it happens when a referenced row is deleted just between the moment a newly inserted child row checks that the parent row exists and the moment the row is really inserted. Or something like that... Cheers, Csaba. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
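A sketch of the suspected race under the no-SHARE-lock patch, as a timeline of two sessions with illustrative names:

    -- session A: insert a child row; with the patch, the RI trigger checks
    -- that parent id 42 exists but takes no SHARE lock on that row
    BEGIN;
    INSERT INTO child (parent_id) VALUES (42);

    -- session B: delete the parent before A commits; it sees no committed
    -- child rows referencing 42, so the delete succeeds
    BEGIN;
    DELETE FROM parent WHERE id = 42;
    COMMIT;

    -- session A commits afterwards and the child row is left an orphan
    COMMIT;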
Re: [HACKERS] savepoint improvements
On Sat, 2007-01-20 at 18:08, Merlin Moncure wrote: [snip] > To be honest, I'm not a huge fan of psql tricks (error recovery being > another example) but this could provide a solution. in your opnion, > how would you use \if to query the transaction state? Wouldn't it make sense to instead introduce something like: \set language plpgsql ... and then redirect everything you type to plpgsql ? That would let you execute things in your favorite language directly from psql without creating a function. Cheers, Csaba. ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
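A sketch of how such a psql session might look; this syntax is purely hypothetical, illustrating the proposal rather than anything psql actually supports:

    \set language plpgsql
    -- from here on, input would be wrapped and executed as plpgsql
    DECLARE n integer;
    BEGIN
        SELECT count(*) INTO n FROM pending_jobs;
        IF n > 1000 THEN
            RAISE NOTICE 'queue is backed up: % jobs', n;
        END IF;
    END;
    \set language sql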
Re: [HACKERS] Frequent Update Project: Design Overview ofHOTUpdates
[snip] > IMHO *most* UPDATEs occur on non-indexed fields. [snip] > > If my assumption is badly wrong on that then perhaps HOT would not be > useful after all. If we find that the majority of UPDATEs meet the HOT > pre-conditions, then I would continue to advocate it. Just to confirm that the scenario is valid: in our application almost all updates affect only non-indexed columns. There are a few exceptions, but the vast majority are on non-indexed columns, and that holds for execution frequency too, not just for the count of tables/queries. Cheers, Csaba. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
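A minimal example of the kind of statement that meets the HOT pre-condition being discussed, with illustrative names:

    CREATE TABLE task (
        id      bigint PRIMARY KEY,   -- indexed
        state   text,                 -- not indexed
        worker  text                  -- not indexed
    );

    -- only non-indexed columns change, so this is the HOT-friendly case
    UPDATE task SET state = 'done', worker = 'node-3' WHERE id = 42;

    -- this one modifies an indexed column and would not qualify
    UPDATE task SET id = id + 1000000 WHERE id = 42;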
Re: [HACKERS] Deadlock with pg_dump?
On Fri, 2006-10-27 at 09:23, Albe Laurenz wrote: > > [ Memo to hackers: why is it that log_min_error_statement = error > > isn't the default? ] > > To avoid spamming the logs with every failed SQL statement? And it would hurt applications where query failure is treated as a valid code path (such as inserting first and updating if the insert fails)... Cheers, Csaba. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
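A sketch of the "insert first, update if it fails" pattern referred to above, where the expected unique_violation error is part of the normal code path (names are illustrative; a savepoint keeps the failure from aborting the surrounding transaction):

    BEGIN;
    SAVEPOINT try_insert;
    INSERT INTO counters (key, value) VALUES ('page_hits', 1);
    -- if the INSERT raises unique_violation, the application runs instead:
    ROLLBACK TO SAVEPOINT try_insert;
    UPDATE counters SET value = value + 1 WHERE key = 'page_hits';
    COMMIT;

With log_min_error_statement = error as the default, every such expected failure would land in the server log, which is the spamming concern raised above.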