Re: [HACKERS] Proposal: Commit timestamp
On Thu, 8 Feb 2007, Joshua D. Drake wrote:

> Well how deep are we talking here? My understanding of what Jan wants to do is simple.
>
> Be able to declare which triggers are fired depending on the state of the cluster.
>
> In Jan's terms, the Origin or Subscriber. In Replicator terms the Master or Slave.
>
> This is useful because I may have a trigger on the Master and the same trigger on the Slave. You do not want the trigger to fire on the Slave because we are doing data replication. In short, we replicate the result, not the action.
>
> However, you may want triggers that are on the Slave to fire separately. A reporting server that generates materialized views is a good example. Don't tie up the Master with what a Slave can do.

It'd be great if Jan considers the blending of replication; any given DB instance shouldn't be only a master/originator or only a slave/subscriber. A solution that lets you blend replication strategies in a single db is, from my point of view, very important.

> I have no clue what got you into what you are doing here.

Jan, some sleep now and then might be helpful to your public disposition. -smile-

peace, Richard

-- Richard Troy, Chief Scientist Science Tools Corporation 510-924-1363 or 202-747-1263 [EMAIL PROTECTED], http://ScienceTools.com/

---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] [PATCHES] Full page writes improvement
full_page_compress is not intended for use with a PITR slave, but for the case of keeping both an online backup and archive logs for archive recovery, which is a very common PostgreSQL operation now. I've just posted my evaluation of the patch as a reply to another thread on the same proposal (sorry, I created a new thread because the old one seemed not good). It compares log compression with the gzip case. Also, our proposal can be combined with gzip. Its overall overhead is slightly less than just copying WAL using cat, so my proposal does not add serious overhead. Please refer to the thread "Archive log compression keeping physical log available in the crash recovery". I'd appreciate further opinions/comments on this, and would welcome suggestions on which evaluations would be useful. I've posted two (archive and restore) commands and a small patch. These two commands can be treated as contrib, and the patch itself does work if WAL is simply copied to the archive directory. Regards; Koichi Suzuki

Tom Lane wrote: > Koichi Suzuki <[EMAIL PROTECTED]> writes: >> Tom Lane wrote: >>> Doesn't this break crash recovery on PITR slaves? > >> Compressed archive log contains the same data as the full_page_writes off >> case. So the influence on PITR slaves is the same as full_page_writes off. > > Right. So what is the use-case for running your primary database with > full_page_writes on and the slaves with it off? It doesn't seem like > a very sensible combination to me. > > Also, it seems to me that some significant performance hit would be > taken by having to grovel through the log files to remove and re-add the > full-page data. Plus you are actually writing *more* WAL data out of > the primary, not less, because you have to save both the full-page > images and the per-tuple data they normally replace. Do you have > numbers showing that there's actually any meaningful savings overall? 
> regards, tom lane

-- Koichi Suzuki
Re: [HACKERS] [PATCHES] How can I use 2GB of shared buffers on Windows?
From: "Magnus Hagander" <[EMAIL PROTECTED]> > hnetcfg.dll is a part of Windows. "Home Networking Configuration > Manager". LPK.DLL is also a part of Windows - it's the language pack. Thank you for the information. > On Thu, Feb 08, 2007 at 09:50:26PM +0900, Takayuki Tsunakawa wrote: >> When I try to start PostgreSQL 8.2.1 on Windows 2003 Server with >> shared_buffers=1024MB, I get the following error messages in the Event >> Log (with log_min_messages=debug5) and can't start PostgreSQL: > > Is this for testing, or for production? From what I've heard, you would > normally never want that much shared memory - I've seen more reports > that you should keep it as low as possible, really. For performance > reasons. For testing. I wanted to place all data in shared buffers to eliminate reads from disk while I run pgbench repeatedly (actually most reads should come from the kernel cache, though.) Does PostgreSQL for Windows have any problem when using a large database cache, unlike the UNIX versions? I'm excited about your current great work to enable building all of PostgreSQL with MSVC. I thought you were aiming at making PostgreSQL 64-bit on Windows in the near future (though you may not have said so on the list.) I'm afraid MinGW will not extend to 64-bit (for x64 and Itanium), at least not reliably or soon, due to the difference in data model -- 'long' is still 32-bit in 64-bit applications on Windows. I thought Magnus-san was concerned about this and started the effort to switch completely to MSVC. BTW, the current PostgreSQL for Windows is very slow, isn't it? I compared the performance of PostgreSQL 8.2.x on Linux (RHEL4 for x86, kernel 2.6.x) and Windows Server 2003. I ran 'pgbench -c32 -t500' on the same machine with the same disk layout for data files and WAL, i.e. they are stored on separate disks. 
The settings in postgresql.conf are the same, except for wal_sync_method -- it is set to open_sync on Linux and open_datasync on Windows, because those are the best for each platform. The Linux version shows 1100 tps, but the Windows version shows only 450 tps.
Re: [HACKERS] Archive log compression keeping physical log available in the crash recovery
Sorry for the late response. Gzip can reduce the archive log size to about one fourth. My point is that this can still be large. Removing physical log records (by replacing them with logical log records) from the archive log will shrink it to about one twentieth, in the case of a pgbench test of about ten hours (3,600,000 transactions) with a database size of about 2GB. In the case of gzip, maybe because of the higher CPU load, total throughput is less than just copying WAL to the archive. In our case, throughput seems to be slightly higher than just copying (preserving the physical log) or gzip. I'll gather the measurement results and try to post them. The size of the archive log seems not to be affected by the size of the database, but just by the number of transactions. In the case of full_page_writes=on and full_page_compress=on, the compressed archive log size seems to depend only on the number of transactions and the transaction characteristics. Our evaluation results are as follows:

Database size: 2GB
WAL size (after a 10-hour pgbench run): 48.3GB
gzipped size: 8.8GB
removal of the physical log: 2.36GB
full_page_writes=off log size: 2.42GB

The reason the archive log in our case is slightly smaller than with full_page_writes=off is that we remove not only the physical logs but also each page header and the dummy part at the tail of each log segment. Further, we can apply gzip to this archive (2.36GB); the final size is 0.75GB, less than one sixtieth of the original WAL. The overall duration to gzip the WAL (48.3GB to 8.8GB) was about 4000sec, while our compression to 2.36GB needed about 1010sec, slightly less than a plain cat (1386sec). When gzip is combined with our compression (48.3GB to 0.75GB), the total duration was about 1330sec. This shows that physical log removal is a good choice for the following case: 1) you need the same crash recovery capability as full_page_writes=on, and 2) you need to shrink the archive log for longer-term storage. 
Of course, if we care about crash recovery on a PITR slave, we still need physical log records in the archive log. In that case, because the archive log is not intended to be kept long, its size will not be an issue. I'm planning to do archive log size evaluation with other benchmarks such as DBT-2 as well. Materials for this have already been posted to HACKERS and PATCHES. I hope you try this.

Jim Nasby wrote: I thought the drive behind full_page_writes = off was to reduce the amount of data being written to pg_xlog, not to shrink the size of a PITR log archive. ISTM that if you want to shrink a PITR log archive you'd be able to get good results by (b|g)zip'ing the WAL files in the archive. A quick test on my laptop shows over a 4x reduction in size. Presumably that'd be even larger if you increased the size of WAL segments.

On Jan 29, 2007, at 2:15 AM, Koichi Suzuki wrote: This is a proposal for archive log compression keeping the physical log in WAL. In PostgreSQL 8.2, the full_page_writes option came back to cut the physical log out of both WAL and the archive log. To deal with partial writes during an online backup, the physical log is written only during the online backup. Although this dramatically reduces the log size, it can put crash recovery at risk: if any page is inconsistent because of a fault, crash recovery doesn't work, because full page images are necessary to recover the page in such a case. For critical use, especially commercial use, we don't want to give up the chance of crash recovery, while reducing the archive log size will be crucial too for larger databases. WAL size itself may be less critical, because WAL segments are reused cyclically. Here, I have a simple idea to reduce archive log size while keeping the physical log in xlog: 1. Create a new GUC: full_page_compress. 2. Turn on both full_page_writes and full_page_compress: the physical log will be written to WAL at the first write to a page after a checkpoint, just as with conventional full_page_writes on. 3. 
Unless the physical log was written during an online backup, it can be removed from the archive log. One bit in XLR_BKP_BLOCK_MASK (XLR_BKP_REMOVABLE) is available to indicate this (out of four, only three of them are in use), and this mark can be set in XLogInsert(). With both full_page_writes and full_page_compress on, both the logical log and the physical log will be written to WAL with the XLR_BKP_REMOVABLE flag on. Having both the physical and logical log in the same WAL is not harmful to crash recovery: in crash recovery, the physical log is used if it's available, while the logical log is used in archive recovery, as the corresponding physical log will have been removed. 4. The archive command (a separate binary) removes physical logs if the XLR_BKP_REMOVABLE flag is on. Physical logs will be replaced by a minimal piece of information of very small size, which is used to restore the physical log and to keep other log records' LSNs consistent. 5. The restore command (sep
Re: [HACKERS] Variable length varlena headers redux
Uh, I thought the approach was to create type-specific in/out functions, and add casting so every time they were referenced, they would expand to a varlena structure in memory. --- Gregory Stark wrote: > > I've been looking at this again and had a few conversations about it. This may > be easier than I had originally thought but there's one major issue that's > bugging me. Do you see any way to avoid having every user function everywhere > use a new macro api instead of VARDATA/VARATT_DATA and VARSIZE/VARATT_SIZEP? > > The two approaches I see are either > > a) To have two sets of macros, one of which, VARATT_DATA and VARATT_SIZEP, is > for constructing new tuples and behaves exactly as it does now. So you always > construct a four-byte header datum. Then in heap_form*tuple we check if you > can use a shorter header and convert. VARDATA/VARSIZE would be for looking at > existing datums and would interpret the header bits. > > This seems very fragile since one stray call site using VARATT_DATA to find > the data in an existing datum would cause random bugs that only occur rarely > in certain circumstances. It would even work as long as the size is filled in > with VARATT_SIZEP first, which it usually is, but fail if someone changes the > order of the statements. > > or > > b) throw away VARATT_DATA and VARATT_SIZEP and make all user functions > everywhere change over to a new macro api. That seems like a pretty big > burden. It's safer but means every contrib module would have to be updated and > so on. > > I'm hoping I'm missing something and there's a way to do this without breaking > the api for every user function. 
> > -- Start of included mail From: Tom Lane <[EMAIL PROTECTED]> > To: Gregory Stark <[EMAIL PROTECTED]> > cc: Gregory Stark <[EMAIL PROTECTED]>, Bruce Momjian <[EMAIL PROTECTED]>, > Peter Eisentraut <[EMAIL PROTECTED]>, > pgsql-hackers@postgresql.org, > Martijn van Oosterhout > Subject: Re: [HACKERS] Fixed length data types issue > Date: Mon, 11 Sep 2006 13:15:43 -0400 > Lines: 64 > Xref: stark.xeocode.com work.enterprisedb:683 > Gregory Stark <[EMAIL PROTECTED]> writes: > > In any case it seems a bit backwards to me. Wouldn't it be better to > > preserve bits in the case of short length words where they're precious > > rather than long ones? If we make 0xxx the 1-byte case it means ... > > Well, I don't find that real persuasive: you're saying that it's > important to have a 1-byte not 2-byte header for datums between 64 and > 127 bytes long. Which is by definition less than a 2% savings for those > values. I think it's more important to pick bitpatterns that reduce > the number of cases heap_deform_tuple has to think about while decoding > the length of a field --- every "if" in that inner loop is expensive. > > I realized this morning that if we are going to preserve the rule that > 4-byte-header and compressed-header cases can be distinguished from the > data alone, there is no reason to be very worried about whether the > 2-byte cases can represent the maximal length of an in-line datum. > If you want to do 16K inline (and your page is big enough for that) > you can just fall back to the 4-byte-header case. So there's no real > disadvantage if the 2-byte headers can only go up to 4K or so. This > gives us some more flexibility in the bitpattern choices. > > Another thought that occurred to me is that if we preserve the > convention that a length word's value includes itself, then for a > 1-byte header the bit pattern 1000 is meaningless --- the count > has to be at least 1. 
So one trick we could play is to take over > this value as the signal for "toast pointer follows", with the > assumption that the tuple-decoder code knows a-priori how big a > toast pointer is. I am not real enamored of this, because it certainly > adds one case to the inner heap_deform_tuple loop and it'll give us > problems if we ever want more than one kind of toast pointer. But > it's a possibility. > > Anyway, a couple of encodings that I'm thinking about now involve > limiting uncompressed data to 1G (same as now), so that we can play > with the first 2 bits instead of just 1: > > 00xx 4-byte length word, aligned, uncompressed data (up to 1G) > 01xx 4-byte length word, aligned, compressed data (up to 1G) > 100x 1-byte length word, unaligned, TOAST pointer > 1010 2-byte length word, unaligned, uncompressed data (up to 4K) > 1011 2-byte length word, unaligned, compressed data (up to 4K) > 11xx 1-byte length word, unaligned, uncompressed data (up to 63b) > > or > > 00xx 4-byte length word, aligned, uncompressed data (up to 1G) > 010x 2-byte length word, unaligned, uncompressed data (up to 8K) > 011x 2-byte length word, unaligned, compressed data (up to 8K) > 1000 1-byte length word, u
Re: [HACKERS] [PATCHES] [pgsql-patches] Phantom CommandIDs,updated patch
On Thu, 2007-02-08 at 22:15 -0500, Tom Lane wrote: > I'm testing the patch currently. I was a bit surprised to find the > without_oid test failing, but it makes sense because I'm using a > MAXALIGN=8 machine. I suppose Heikki tested on MAXALIGN=4. That's definitely strange. The patch has been performance tested on a MAXALIGN=8 system on tables without OIDs, but I'm not sure we ran the make check itself on that server. Will do that in future. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Proposal: Commit timestamp
I just want an outline of what each option is supposed to control. If that information is in a documentation patch, then fine, he can just post that and tell people to read the patch documentation. --- Joshua D. Drake wrote: > Jan Wieck wrote: > > On 2/8/2007 3:32 PM, Bruce Momjian wrote: > >> Alvaro Herrera wrote: > >>> > > Is this a new policy that after discussion, all patches must be > > >>> > resubmitted with a summary and conclusions of the discussion? I can > >>> > > certainly do that for you, but just tell me if you are going to > >>> ask the > > same from everyone. > >>> > > No, I am asking only this time because I feel there was too much > >>> > disconnect between the patch and the extensive replication discussion > >>> > that few community members would see the connection. > >>> > >>> FYI, in my opinion the trigger addition is clearly useful to Mammoth > >>> Replicator as well. In fact, it's so obviously useful that I didn't see > >>> a need to state that in the original thread where it was discussed. > >> > >> Right, I know it is useful too, but I would like a layout of what it > >> does and why so everyone is clear on it. > > Well how deep are we talking here? My understanding of what Jan wants to > do is simple. > > Be able to declare which triggers are fired depending on the state of > the cluster. > > In Jan's terms, the Origin or Subscriber. In Replicator terms the Master > or Slave. > > This is useful because I may have a trigger on the Master and the same > trigger on the Slave. You do not want the trigger to fire on the Slave > because we are doing data replication. In short, we replicate the > result, not the action. > > However, you may want triggers that are on the Slave to fire separately. > A reporting server that generates materialized views is a good example. > Don't tie up the Master with what a Slave can do. > > Sincerely, > > Joshua D. Drake > > > > > > > I have no clue what got you into what you are doing here. 
But that shall > > not be my real concern. If you feel the need to do this sort of thing, > > be my guest. I will add the remaining changes to pg_rewrite, including > > the new support commands and changes to psql as well as pg_dump and > > resubmit the new patch with explanations that will hopefully help you to > > comprehend what and how this relatively small and fully backward > > compatible change in the trigger and rule firing mechanism will work and > > what existing problems it will solve. > > > > Regards, > > Jan > > > -- > > === The PostgreSQL Company: Command Prompt, Inc. === > Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 > Providing the most comprehensive PostgreSQL solutions since 1997 > http://www.commandprompt.com/ > > Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate > PostgreSQL Replication: http://www.commandprompt.com/products/ -- Bruce Momjian <[EMAIL PROTECTED]> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] Proposal: Commit timestamp
Jan Wieck wrote: > On 2/8/2007 3:32 PM, Bruce Momjian wrote: >> Alvaro Herrera wrote: >>> > > Is this a new policy that after discussion, all patches must be > >>> > resubmitted with a summary and conclusions of the discussion? I can >>> > > certainly do that for you, but just tell me if you are going to >>> ask the > > same from everyone. >>> > > No, I am asking only this time because I feel there was too much >>> > disconnect between the patch and the extensive replication discussion >>> > that few community members would see the connection. >>> >>> FYI, in my opinion the trigger addition is clearly useful to Mammoth >>> Replicator as well. In fact, it's so obviously useful that I didn't see >>> a need to state that in the original thread where it was discussed. >> >> Right, I know it is useful too, but I would like a layout of what it >> does and why so everyone is clear on it. Well how deep are we talking here? My understanding of what Jan wants to do is simple. Be able to declare which triggers are fired depending on the state of the cluster. In Jan's terms, the Origin or Subscriber. In Replicator terms the Master or Slave. This is useful because I may have a trigger on the Master and the same trigger on the Slave. You do not want the trigger to fire on the Slave because we are doing data replication. In short, we replicate the result, not the action. However, you may want triggers that are on the Slave to fire separately. A reporting server that generates materialized views is a good example. Don't tie up the Master with what a Slave can do. Sincerely, Joshua D. Drake > > I have no clue what got you into what you are doing here. But that shall > not be my real concern. If you feel the need to do this sort of thing, > be my guest. 
I will add the remaining changes to pg_rewrite, including > the new support commands and changes to psql as well as pg_dump and > resubmit the new patch with explanations that will hopefully help you to > comprehend what and how this relatively small and fully backward > compatible change in the trigger and rule firing mechanism will work and > what existing problems it will solve. > > > Regards, > Jan > -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/
Re: [HACKERS] [PATCHES] [pgsql-patches] Phantom Command IDs,updated patch
Alvaro Herrera <[EMAIL PROTECTED]> writes: > Simon Riggs wrote: >> Combo is OK, because it's a *combination* of two CommandIds. >> >> That means they are ComboCommandIds or CCIs. > CCI is CommandCounterIncrement to me, so let's not use that > abbreviation. Agreed. I looked for a bit at adding a separate typedef ComboCommandId to be used where a cid was definitely a combo cid, but it didn't seem to be very useful to do that... I'm testing the patch currently. I was a bit surprised to find the without_oid test failing, but it makes sense because I'm using a MAXALIGN=8 machine. I suppose Heikki tested on MAXALIGN=4. regards, tom lane
Re: [HACKERS] [PATCHES] [pgsql-patches] Phantom CommandIDs,updated patch
On Fri, 2007-02-09 at 00:08 -0300, Alvaro Herrera wrote: > Simon Riggs wrote: > > On Thu, 2007-02-08 at 15:32 -0500, Tom Lane wrote: > > > Bruce Momjian <[EMAIL PROTECTED]> writes: > > > > Tom Lane wrote: > > > >> Packed doesn't seem to have quite the right connotation either --- it > > > >> sounds like it means there are two separable fields in the CID value. > > > >> > > > >> Maybe "composite cid"? > > > > > > > At one point I was thinking "combo". but "composite" sounds good. > > > > Combo is OK, because it's a *combination* of two CommandIds. > > > > That means they are ComboCommandIds or CCIs. > > CCI is CommandCounterIncrement to me, so let's not use that > abbreviation. True; given the similar context that would be a mistake. Just ComboIds then? I was worried that was cid also. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] [PATCHES] [pgsql-patches] Phantom Command IDs,updated patch
Simon Riggs wrote: > On Thu, 2007-02-08 at 15:32 -0500, Tom Lane wrote: > > Bruce Momjian <[EMAIL PROTECTED]> writes: > > > Tom Lane wrote: > > >> Packed doesn't seem to have quite the right connotation either --- it > > >> sounds like it means there are two separable fields in the CID value. > > >> > > >> Maybe "composite cid"? > > > > > At one point I was thinking "combo". but "composite" sounds good. > > Combo is OK, because it's a *combination* of two CommandIds. > > That means they are ComboCommandIds or CCIs. CCI is CommandCounterIncrement to me, so let's not use that abbreviation. -- Alvaro Herrera  http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Re: [HACKERS] [PATCHES] [pgsql-patches] Phantom Command IDs,updated patch
On Thu, 2007-02-08 at 15:32 -0500, Tom Lane wrote: > Bruce Momjian <[EMAIL PROTECTED]> writes: > > Tom Lane wrote: > >> Packed doesn't seem to have quite the right connotation either --- it > >> sounds like it means there are two separable fields in the CID value. > >> > >> Maybe "composite cid"? > > > At one point I was thinking "combo". but "composite" sounds good. Combo is OK, because it's a *combination* of two CommandIds. That means they are ComboCommandIds or CCIs. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] [PATCHES] Bad bug in fopen() wrapper code
Tom Lane wrote: > "Magnus Hagander" <[EMAIL PROTECTED]> writes: > > That is part of the original open() code that Claudio did back for 8.0, > > so it has definitely been working since then. > > Hm, maybe best not to touch it, but still... > > > I haven't really read into > > the code, though... But a quick look doesn't show me any place where it > > does ignore O_EXCL - which combination would that be? > > What's bugging me is that 0 and O_EXCL give the same answer, and > O_TRUNC and O_TRUNC | O_EXCL give the same answer, but O_CREAT and > O_CREAT | O_EXCL give different answers, as do O_CREAT | O_TRUNC > and O_CREAT | O_TRUNC | O_EXCL. I'm also pretty suspicious of > both O_CREAT | O_EXCL and O_CREAT | O_TRUNC | O_EXCL giving the > same answer. However, I have no idea what the semantics are of > the symbols the function is mapping into, so maybe it's OK. I am CC'ing Claudio Natoli on this question about open.c::openFlagsToCreateFileFlags(), but in looking at the code, it seems OK because:

 o O_EXCL doesn't have any meaning unless O_CREAT is used
 o O_TRUNC has no meaning when O_CREAT | O_EXCL are used, because you are guaranteed to be creating a new file

Claudio, the function is here, at the top of the file: http://developer.postgresql.org/cvsweb.cgi/pgsql/src/port/open.c?rev=1.18;content-type=text%2Fx-cvsweb-markup -- Bruce Momjian <[EMAIL PROTECTED]> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] Hierarchical Queries--Status
Who is working on this item? --- Martijn van Oosterhout wrote: -- Start of PGP signed section. > On Mon, Sep 04, 2006 at 05:15:57PM +0100, Mark Cave-Ayland wrote: > > 3) Add planner support so that WITH clauses are mapped to a new type of > > node that utilises two tuplestores - an output tuplestore and a working > > tuplestore. The output tuple store will in effect be the contents of the > > table expression while the working tuplestore holds the results of the > > last iteration if recursive. Also implement some kind of WithState node > > which keeps track of the recursion state. > > That's basically what I came up with. Basically you have a sort of loop > in the execution plan where tuples that come out are copied into a > tuplestore and run through a particular part of the executor again. The > top-down approach of the executor makes it a bit trickier... > > > Having spent some more time today looking at 1) and also at the SQL 2003 > > spec, it would seem that other databases offer the WITH clause within > > subqueries and also as part of a view. I'd be interested to hear > > feedback from other developers as to whether it would be possible to > > achieve this level of support within PostgreSQL. > > Absolutely possible. The question is how much work :) > > Incidentally, if you find a way to support common subplans (where a part > of the executor is shared between two executions) that might provide a > way to solve some of the trickier multiple evaluation problems with > rules. Again, it would just be a tuplestore that stored the results for > multiple executions. > > Have a nice day, > -- > Martijn van Oosterhout http://svana.org/kleptog/ > > From each according to his ability. To each according to his ability to > > litigate. -- End of PGP section, PGP failed! -- Bruce Momjian <[EMAIL PROTECTED]> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. 
+
Re: [HACKERS] Proposal: Commit timestamp
Jan Wieck wrote: > On 2/8/2007 3:32 PM, Bruce Momjian wrote: > > Alvaro Herrera wrote: > >> > > Is this a new policy that after discussion, all patches must be > >> > > resubmitted with a summary and conclusions of the discussion? I can > >> > > certainly do that for you, but just tell me if you are going to ask > >> > > the > >> > > same from everyone. > >> > > >> > No, I am asking only this time because I feel there was too much > >> > disconnect between the patch and the extensive replication discussion > >> > that few community members would see the connection. > >> > >> FYI, in my opinion the trigger addition is clearly useful to Mammoth > >> Replicator as well. In fact, it's so obviously useful that I didn't see > >> a need to state that in the original thread where it was discussed. > > > > Right, I know it is useful too, but I would like a layout of what it > > does and why so everyone is clear on it. > > I have no clue what got you into what you are doing here. But that shall > not be my real concern. If you feel the need to do this sort of thing, > be my guest. I will add the remaining changes to pg_rewrite, including > the new support commands and changes to psql as well as pg_dump and > resubmit the new patch with explanations that will hopefully help you to > comprehend what and how this relatively small and fully backward > compatible change in the trigger and rule firing mechanism will work and > what existing problems it will solve. Yep, that's what I want everyone to see. This is standard procedure for everyone in the community, and core is not immune. -- Bruce Momjian <[EMAIL PROTECTED]> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] Proposal: Commit timestamp
On 2/8/2007 3:32 PM, Bruce Momjian wrote: Alvaro Herrera wrote: > > Is this a new policy that after discussion, all patches must be > > resubmitted with a summary and conclusions of the discussion? I can > > certainly do that for you, but just tell me if you are going to ask the > > same from everyone. > > No, I am asking only this time because I feel there was too much > disconnect between the patch and the extensive replication discussion > that few community members would see the connection. FYI, in my opinion the trigger addition is clearly useful to Mammoth Replicator as well. In fact, it's so obviously useful that I didn't see a need to state that in the original thread where it was discussed. Right, I know it is useful too, but I would like a layout of what it does and why so everyone is clear on it. I have no clue what got you into what you are doing here. But that shall not be my real concern. If you feel the need to do this sort of thing, be my guest. I will add the remaining changes to pg_rewrite, including the new support commands and changes to psql as well as pg_dump and resubmit the new patch with explanations that will hopefully help you to comprehend what and how this relatively small and fully backward compatible change in the trigger and rule firing mechanism will work and what existing problems it will solve. Regards, Jan -- #==# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #== [EMAIL PROTECTED] #
Re: [HACKERS] HOT for PostgreSQL 8.3
On Wed, 2007-02-07 at 14:17 -0500, Tom Lane wrote: > "Simon Riggs" <[EMAIL PROTECTED]> writes: > > The basic idea is that when a tuple is UPDATEd we can, in certain > > circumstances, avoid inserting index tuples for a tuple. Such tuples are > > marked HEAP_ONLY_TUPLE, but their storage is otherwise identical to > > other tuples. > > What is VACUUM FULL going to do when it wants to move one of these things? In addition to others suggested, one option is to rework VACUUM FULL: The use case for VACUUM FULL is very limited these days. It has an appallingly long execution time. This can be sped up by dropping and re-creating indexes, says the manual, but it is still lengthy. It is even faster to drop the indexes, do a CREATE TABLE AS SELECT * FROM table, drop the old table and then rebuild the indexes. When moving into the new relation the space requirement is not double, since we only copy useful data/space and we also do this using the space the indexes occupied, so the actual space overhead isn't that high. VACUUM FULL also generates lots of WAL and ties up lots of memory while it operates - and it uses memory without any constraint. The CTAS technique doesn't carry across all of the visible tuple chains, but then to be brutally frank, by the time VACUUM FULL has actually finished executing, it is very frequently the oldest transaction anyway, so we needn't really have gone to all the trouble of moving the tuple chains. So the main use case for VACUUM FULL is when the space to be freed inside the table is low enough to make defragmenting the table quicker than a CTAS, yet still high enough that we were worried enough to do a VACUUM FULL. That's a very narrow use case, and if it exists at all there's a narrow time window associated with it - only a heavily updated/deleted table needs vacuuming anyway - which means we'd most often be performing the VF when we're already in the zone where the CTAS approach is quicker. 
VACUUM FULL also forces us to handle various failure cases that leave half-moved tuples scattered across tables. (Incomplete VACUUM FULLs are actually fairly common because of their incredibly long run times.) So the radical suggestion is to continue to support the VACUUM FULL command, but using a newly coded technique that is both higher performance and more robust. We can either make VACUUM FULL wait until it actually is the oldest transaction, or we can mark the table in some way so that a lock cannot be obtained upon it by an earlier Xid. We should be able to compact the files of a large table one physical file at a time, so the space overhead is only ever MAX_PHYSICAL_FILESIZE and the space overhead may become a net space gain as the VF continues. This is of course (almost) identical to the approach already in use for CLUSTER, so it seems like that should be acceptable. As a result, it's really not that much code and can still be accomplished on time. Note also that we would not have to drop and re-add Foreign Keys, since nothing can have changed while we have the table locked. Doing this also frees up two heap info bits and simplifies much of the HeapTupleSatisfies code, which could probably use the helping hand. Tuples moved to the new files would retain their info bit settings as they are copied across. So overall, it seems a lot easier to completely replace VF than to fight through its complexities and failure cases. If we do the above, then we'll speed up VACUUM FULL and we'll be able to handle HOT tuples easily. > > CREATE INDEX requires some careful work to allow it to identify and > > correctly index HEAP_ONLY_TUPLEs that need to become ~HEAP_ONLY_TUPLE as > > a result of the new index. > > I think you've glossed over the CREATE INDEX problem much too easily. 
> The difficulty with that is going to be that de-HOT-ifying a tuple > is going to require multiple updates that can't possibly be put into > a single WAL record, and I don't think that WAL replay can clean up > after an incomplete update (since it can't run user-defined functions > and hence cannot be expected to compute index entries for itself). > So I don't think you can do that while preserving crash safety. No intention to gloss, just wanted to get past first post and onto the really complex stuff. You're absolutely right that this is where much of the thinking/work needs to take place. > > Removing the root tuple will require a VACUUM *FULL*. > > That seems unacceptable ... it won't take too long for your table to > fill up with stubs, and we don't want to return to the bad old days > when periodic VACUUM FULL was unavoidable. Completely agree. I wanted to start right at the very beginning, so everybody would understand the issues, rather than jump straight in again with additional complexity. > ISTM we could fix that by extending the index VACUUM interface to > include two concepts: aside from "remove these TIDs when you find them", > there could be "replace these TIDs with those TIDs when you find them". > This would allow
Re: [HACKERS] [PATCHES] plpgsql, return can contains any expression
OK, where are we on this patch? --- Pavel Stehule wrote: > > > > > >"Pavel Stehule" <[EMAIL PROTECTED]> writes: > > >> This patch doesn't seem to cope with cases where the supplied tuple has > > >> the wrong number of columns, and it doesn't look like it's being > >careful > > >> about dropped columns either. Also, that's a mighty bizarre-looking > > >> choice of cache memory context in coerce_to_tuple ... but then again, > > >> why are you bothering with a cache at all for temporary arrays? > > > > > I am sorry, Tom. But I don't understand. I can check the number of columns, > > > of course, and I'll do it. What cache for temporary arrays do you mean? > > > >I thought that making coerce_to_tuple depend on estate->err_func was > >pretty bizarre, and that there was no need for any "cache" at all for > >arrays that need only live as long as the function runs. All you are > >saving here is a palloc/pfree cycle, which is not worth the obscurantism > >and risk of bugs (are you sure natts can never change?). > > No, the cache there is ugly. But I don't have an idea for a more effective > implementation of it :-(. The first version of this patch was cleaner, and a > little bit slower. This cache saves 10%. > > > > >BTW, if you want this patch to make it into 8.2, it needs to be fixed > >and resubmitted *very* soon. > > I understand, but I am not able to work on it in the next four days. And I need > help with it from Neil. It will be for 8.3. > > Thank you > Pavel > > _ > Emoticons and backgrounds for MSN Messenger will liven up your conversation. > http://messenger.msn.cz/ > > > ---(end of broadcast)--- > TIP 5: don't forget to increase your free space map settings -- Bruce Momjian <[EMAIL PROTECTED]> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] [PATCHES] Updatable views
Here is the more recent email I have on this feature work. --- Bernd Helmle wrote: > --On Friday, September 01, 2006 11:41:16 -0400 Tom Lane > <[EMAIL PROTECTED]> wrote: > > > > > So in other words, views on serial columns don't work? I don't think > > that's going to be acceptable. > > > > They work as long as no one puts a volatile > function in an update query > > > > > Not really worse than what the rewriter is doing already --- in fact, > > I think it's isomorphic to what would happen to the rule qual > > expressions in your existing patch. > > > > Currently you don't have to rewrite the rule conditions themselves every > time you apply them to the query tree since they are stored in pg_rewrite > matching all the various (reversed) varattno's, resno's and whatever. > If I understand correctly you need to do that with constraints every time > you fire an update query on a view for each underlying relation > > > > > I'm about to propose that we should try to go beta next week (see > > forthcoming message). If you can strip down your patch to avoid the > > multi-eval problems in the next couple of days, I'm still willing to > > Depends on how many days a "couple of days" are. If you mean the next > three days, > then definitely not, since I'm afk for the whole upcoming weekend. Bad > timing, but > it's not deferrable :( > > > consider it, but at the moment I'm assuming that it needs to be held > > for 8.3. > > Well, I'll see what I can do next week. It's a little bit disappointing > that these problems > arise so late, but that's no one's fault since there are many side effects > of the rewriting > system involved > > Many thanks for your comments. 
> > Bernd > > > ---(end of broadcast)--- > TIP 1: if posting/reading through Usenet, please send an appropriate >subscribe-nomail command to [EMAIL PROTECTED] so that your >message can get through to the mailing list cleanly -- Bruce Momjian <[EMAIL PROTECTED]> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] [PATCHES] Updatable views
Where are we on this feature? --- Bernd Helmle wrote: > --On Wednesday, August 30, 2006 12:01:25 -0400 Tom Lane <[EMAIL PROTECTED]> > wrote: > > > Bernd Helmle <[EMAIL PROTECTED]> writes: > >> [ latest views patch ] > > > > This is the first time I've actually looked at this patch, and I am > > dismayed. viewUpdate.c looks like nothing so much as a large program > > with a small program struggling to get out. What is all the stuff about > > handling multiple base rels? SQL92, at least, does not say that a join > > is updatable, and AFAICT this patch is rejecting that too ... though > > it's hard to tell with the conditions for allowing the join to be > > updatable scattered through a lot of different functions. And some of > > the code seems to be expecting multiple implicit rules and other parts > > not. I get the impression that a lot of this code is left over from a > > more ambitious first draft and ought to be removed in the name of > > readability/maintainability. > > > > I'm not sure what parts of the code you are referring to exactly, but I admit > that > there are code parts that could deal with multiple base relations and > rules. > get_base_base_relation() is an example; it is used to create lookup tables > for reversed columns so we could break them down to the correct position in > their base tables. Restricting that to only one base relation wouldn't make > any > difference. Furthermore, SQL99 allows at least updatable views with joined > relations which preserve their keys in the view definition. So I don't > think it's that > bad to leave parts of the code that way for future improvements. > > > I'm unclear as to why you've got DO INSTEAD NOTHING rules in there --- > > the spec says that a WITH CHECK OPTION violation results in an error, > > not in nothing happening, so it doesn't seem to me that we should need > > any NOTHING rules to implement the spec. 
It would probably help if > > Well, instead of something like > > "ERROR: cannot insert into a view > HINT: You need an unconditional ON INSERT DO INSTEAD rule." > > you will get > > "ERROR: view update commands violates rule condition" > > with the correct error code set, because the view update check function is > fired beforehand. > The first one isn't very useful for someone who simply wants to insert data > into the > view which isn't allowed to get in. You never get the view update check > function fired > without the DO INSTEAD rule applied to a view created with a check option. > > > there were some header documentation that explained exactly how the > > module intends to transform a SELECT to create the various action rules. > > > > I agree with you; maybe it's a good idea to add a README to src/backend/rewrite? > > > The pg_dump changes seem pretty odd too. Why wouldn't you just > > ignore implicit rules during a dump, expecting the system to > > regenerate them when the view is reloaded? > > Uhm, you're right. It's easier to exclude them in the SELECT query directly > instead > of selecting them, iterating over them and filtering them out. I'll fix that. > (Looks like this is a > "cannot see the wood for the trees" mistake) > > > -- > Thanks > > Bernd > > ---(end of broadcast)--- > TIP 1: if posting/reading through Usenet, please send an appropriate >subscribe-nomail command to [EMAIL PROTECTED] so that your >message can get through to the mailing list cleanly -- Bruce Momjian <[EMAIL PROTECTED]> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[HACKERS] Re-ordering of OR conditions
If I run the following with the a < 2900 condition first, the more expensive EXISTS only gets executed when needed, but if I change the order of the ORs, the EXISTS is always executed. It would be good if the optimizer could re-order the OR conditions based on estimated cost (granted, this wouldn't work very well if you've got functions in the OR, but it'd still be useful):

select * from a where a < 2900 or exists (select * from b where b.a = a.a);

Here's a full example. Note the loops count for the SubPlan between both cases:

decibel=# create table a as select * from generate_series(1,3000) a;
SELECT
decibel=# create table b as select a,b from a, generate_series(1,100) b where a > 10;
SELECT
decibel=# create index b__a on b(a);
CREATE INDEX
decibel=# explain analyze select * from a where a < 2900 or exists (select * from b where b.a = a.a);
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Seq Scan on a  (cost=0.00..8080.41 rows=1997 width=4) (actual time=0.014..1.784 rows=3000 loops=1)
   Filter: ((a < 2900) OR (subplan))
   SubPlan
     ->  Index Scan using b__a on b  (cost=0.00..4006.44 rows=1495 width=8) (actual time=0.009..0.009 rows=1 loops=101)
           Index Cond: (a = $0)
 Total runtime: 2.151 ms
(6 rows)

decibel=# explain analyze select * from a where exists (select * from b where b.a = a.a) or a < 2000;
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Seq Scan on a  (cost=0.00..8080.41 rows=1997 width=4) (actual time=0.067..37.011 rows=3000 loops=1)
   Filter: ((subplan) OR (a < 2000))
   SubPlan
     ->  Index Scan using b__a on b  (cost=0.00..4006.44 rows=1495 width=8) (actual time=0.011..0.011 rows=1 loops=3000)
           Index Cond: (a = $0)
 Total runtime: 37.497 ms
(6 rows)

decibel=#

(This is on HEAD as of a few minutes ago) -- Jim Nasby [EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell) ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[HACKERS] TODO: Allow SQL functions to reference parameters by name
Can someone add this to the TODO? patch Description: Binary data -- Jim Nasby [EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell) ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] [PATCHES] [pgsql-patches] Phantom Command IDs, updated patch
Tom Lane wrote: > Bruce Momjian <[EMAIL PROTECTED]> writes: > > The way combo cid is supposed to work is that you are deleting a row > > created in your same transaction by a previous command id, so you look > > in the combo cid array to see if a match for that pair exists --- if > > not, you create a new entry and put the two cids on it. > > > So, with the combo lock cid, you do the same process, and lookups of who > > holds the lock looks at the cid combo, and if the second subtransaction > > was aborted, the first one is the lock holder. If you again lock the > > row, you create a new combo cid and use the original cid there because > > the second cid was aborted. > > No, because no process other than the originator can see the combo-cid > data structure, and for locking situations you really need other > backends to be able to know whether the tuple is locked and how. Oh, OK, I forgot pg_subtrans is visible to all backends. > But I think my proposal of extending MultiXact would fix it; please look > at that. Sounds good. -- Bruce Momjian <[EMAIL PROTECTED]> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] [PATCHES] [pgsql-patches] Phantom Command IDs, updated patch
Bruce Momjian <[EMAIL PROTECTED]> writes: > The way combo cid is supposed to work is that you are deleting a row > created in your same transaction by a previous command id, so you look > in the combo cid array to see if a match for that pair exists --- if > not, you create a new entry and put the two cids on it. > So, with the combo lock cid, you do the same process, and lookups of who > holds the lock looks at the cid combo, and if the second subtransaction > was aborted, the first one is the lock holder. If you again lock the > row, you create a new combo cid and use the original cid there because > the second cid was aborted. No, because no process other than the originator can see the combo-cid data structure, and for locking situations you really need other backends to be able to know whether the tuple is locked and how. But I think my proposal of extending MultiXact would fix it; please look at that. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] [PATCHES] [pgsql-patches] Phantom Command IDs, updated patch
Alvaro Herrera <[EMAIL PROTECTED]> writes: > Humm, sorry, obviously this makes no sense at all because I mentally > mixed the Xid locker and the Cids. After thinking a bit, I have a sketch of a solution. Assume that we extend the MultiXact infrastructure so that it can track whether each member of a MultiXact holds shared or exclusive lock. (There are a couple ways you could do that --- add a parallel bit-array, or separate the members into two groups. Details not important for now.) The only way you could have both shared- and exclusive-lock members is if they are subtransactions of the same backend, but that fact isn't really relevant to the MultiXact code. Then extend MultiXactIdWait() so that you can tell it to wait for all members to die, or just the exclusive members. Then the representation of the problem situation would be that a locked tuple would have as XMAX a MultiXact containing the upper XID as shared locker and the subtransaction as exclusive locker. Onlookers could wait for one or both to die as appropriate depending on what kind of lock they needed to get. HEAP_XMAX_EXCL_LOCK would have to be a hint rather than the truth, ie, once all the exclusive-lock members of the MultiXact are dead it's really only a shared lock, but I don't see that this poses any real difficulty. I don't particularly want to go implement this now; I just want a proof-of-concept sketch proving that we don't need separate cmin and cmax to support this. As for what I think we *should* do near-term, I'm pretty strongly tempted to suggest that we just throw an error if a subtransaction tries to upgrade an upper transaction's shared lock to exclusive. When and if we get a ton of complaints about that, it'd be time to put forth effort to fix it. I suspect the situation doesn't really arise much in practice, else we'd have heard complaints from the field about the fact that the shared lock can become lost. Comments? 
regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [PATCHES] [pgsql-patches] Phantom Command IDs, updated patch
Tom Lane wrote: > > At one point I was thinking "combo". but "composite" sounds good. > > I like "combo" --- nice and short. > > >> Another issue that we need to think about before we go too far with this > >> is the problem that we punted on before 8.2 release: how to deal with > >> rolling back an upgrade of a row-level lock from shared to exclusive > >> within a subtransaction. I'm a bit nervous about committing to merging > >> cmin and cmax before we have an idea how we're going to solve that --- > >> it might foreclose a solution. Or maybe we could piggyback on phantom/ > >> composite/whatever CIDs to solve it, which would be great, but let's > >> try to sketch out a solution now. > > > Good point. Right now we put our new cid on top of the old lock cid, > > making rollback impossible to the old lock. What if instead of > > overwriting our old cid with a new one, we create a composite cid, and > > if we roll back, we look up the composite pair and put the old cid back. > > It would only work with two cids, but that seems sufficient. > > Yeah, that's more or less what I was thinking. The problem is that the > composite CID isn't going to be enough info to tell you *where* you have > to put things back. And we don't want to try to remember per-row state > in memory. Is there a way to generalize either the composite CID or the > MultiXact mechanism to support this situation without that? Uh, well, hmmm. The way combo cid is supposed to work is that you are deleting a row created in your same transaction by a previous command id, so you look in the combo cid array to see if a match for that pair exists --- if not, you create a new entry and put the two cids on it. So, with the combo lock cid, you do the same process, and lookups of who holds the lock looks at the cid combo, and if the second subtransaction was aborted, the first one is the lock holder. 
If you again lock the row, you create a new combo cid and use the original cid there because the second cid was aborted. I don't see how any of this is per-row for locks any more than it is per-row for insert/delete. -- Bruce Momjian <[EMAIL PROTECTED]> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [PATCHES] [pgsql-patches] Phantom Command IDs, updated patch
Alvaro Herrera wrote: > This starts to look awfully similar to MultiXactIds. And probably using > such a mechanism would allow you to "rollback" any number of row locks: > take the current members of the "multicid", subtract the one that > rolled back and use that as the new multicid. The main difference is that > you'd need to store both the locker Cid and the mode (shared/exclusive). Humm, sorry, obviously this makes no sense at all because I mentally mixed the Xid locker and the Cids. > The other difference is that multicids can be stored locally to a > backend, no need to have SLRUs etc. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] [PATCHES] [pgsql-patches] Phantom Command IDs, updated patch
Bruce Momjian wrote: > Tom Lane wrote: > > > > Another issue that we need to think about before we go too far with this > > is the problem that we punted on before 8.2 release: how to deal with > > rolling back an upgrade of a row-level lock from shared to exclusive > > within a subtransaction. I'm a bit nervous about committing to merging > > cmin and cmax before we have an idea how we're going to solve that --- > > it might foreclose a solution. Or maybe we could piggyback on phantom/ > > composite/whatever CIDs to solve it, which would be great, but let's > > try to sketch out a solution now. > > Good point. Right now we put our new cid on top of the old lock cid, > making rollback impossible to the old lock. What if instead of > overwriting our old cid with a new one, we create a composite cid, and > if we roll back, we look up the composite pair and put the old cid back. > It would only work with two cids, but that seems sufficient. This starts to look awfully similar to MultiXactIds. And probably using such a mechanism would allow you to "rollback" any number of row locks: take the current members of the "multicid", subtract the one that rolled back and use that as the new multicid. The main difference is that you'd need to store both the locker Cid and the mode (shared/exclusive). The other difference is that multicids can be stored locally to a backend, no need to have SLRUs etc. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] [PATCHES] [pgsql-patches] Phantom Command IDs, updated patch
Bruce Momjian <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> Packed doesn't seem to have quite the right connotation either --- it >> sounds like it means there are two separable fields in the CID value. >> >> Maybe "composite cid"? > At one point I was thinking "combo", but "composite" sounds good. I like "combo" --- nice and short. >> Another issue that we need to think about before we go too far with this >> is the problem that we punted on before 8.2 release: how to deal with >> rolling back an upgrade of a row-level lock from shared to exclusive >> within a subtransaction. I'm a bit nervous about committing to merging >> cmin and cmax before we have an idea how we're going to solve that --- >> it might foreclose a solution. Or maybe we could piggyback on phantom/ >> composite/whatever CIDs to solve it, which would be great, but let's >> try to sketch out a solution now. > Good point. Right now we put our new cid on top of the old lock cid, > making rollback impossible to the old lock. What if instead of > overwriting our old cid with a new one, we create a composite cid, and > if we roll back, we look up the composite pair and put the old cid back. > It would only work with two cids, but that seems sufficient. Yeah, that's more or less what I was thinking. The problem is that the composite CID isn't going to be enough info to tell you *where* you have to put things back. And we don't want to try to remember per-row state in memory. Is there a way to generalize either the composite CID or the MultiXact mechanism to support this situation without that? regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Proposal: Commit timestamp
Alvaro Herrera wrote: > > > Is this a new policy that after discussion, all patches must be > > > resubmitted with a summary and conclusions of the discussion? I can > > > certainly do that for you, but just tell me if you are going to ask the > > > same from everyone. > > > > No, I am asking only this time because I feel there was too much > > disconnect between the patch and the extensive replication discussion > > that few community members would see the connection. > > FYI, in my opinion the trigger addition is clearly useful to Mammoth > Replicator as well. In fact, it's so obviously useful that I didn't see > a need to state that in the original thread where it was discussed. Right, I know it is useful too, but I would like a layout of what it does and why so everyone is clear on it. -- Bruce Momjian <[EMAIL PROTECTED]> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] referential Integrity and SHARE locks
On Thu, 2007-08-02 at 12:24 -0800, Stephan Szabo wrote: > On Thu, 8 Feb 2007, Marc Munro wrote: > > > I don't think this does stop the second from continuing before the > > first. What will stop it, is the eventual lock that is taken on the > > child (triggering) record. > > But at that point, you've already had to compose the new row in order to > call the trigger for the ri check, right? So, one of those new rows will > be out of date by the time it actually gets the lock. Presumably that > means that you need to recalculate the new row, but you've already done a > check and gotten a lock based on the old new row. Yes. That is tricky. For my proposed scheme to work, I guess we'd have to be able to drop those locks which were just acquired by the RI triggers. Not too simple, I guess. > Also, another big problem is the fact that SQL requires that the action > already have happened before the check in cases where the constraint > references the same table. The row being updated or inserted might > reference a row that will be updated or inserted by a later action of the > same statement. Hmmm. That does seem to be the final nail in the coffin. Consider the proposal withdrawn, and thanks for explaining it all to me. -- Marc
Re: [HACKERS] [PATCHES] [pgsql-patches] Phantom Command IDs, updated patch
Tom Lane wrote: > [ time to move this thread to -hackers ] > > Bruce Momjian <[EMAIL PROTECTED]> writes: > > Tom Lane wrote: > >> Alvaro Herrera <[EMAIL PROTECTED]> writes: > >>> Heikki Linnakangas wrote: > Tom Lane wrote: > > BTW, I don't care much for the terminology "phantom cid" ... there's > > nothing particularly "phantom" about them, seeing they get onto disk. > > Can anyone think of a better name? The best I can do offhand is > > "merged cid" or "cid pair", which aren't inspiring. > > MultiCid, like the MultiXacts? Maybe not, they're quite different > beasts... > > >>> Dual cid? Double cid? > > >> "Double cid" doesn't sound too bad. Another thought that just came to > >> mind is "cid interval" or some variant of that. > > > I don't like "double cid" because it is really just one cid, but > > represents two. I am thinking "packed cid" is the right wording. It > > doesn't have the same impact as "phantom", but it is probably better. > > Packed doesn't seem to have quite the right connotation either --- it > sounds like it means there are two separable fields in the CID value. > > Maybe "composite cid"? At one point I was thinking "combo", but "composite" sounds good. > Another issue that we need to think about before we go too far with this > is the problem that we punted on before 8.2 release: how to deal with > rolling back an upgrade of a row-level lock from shared to exclusive > within a subtransaction. I'm a bit nervous about committing to merging > cmin and cmax before we have an idea how we're going to solve that --- > it might foreclose a solution. Or maybe we could piggyback on phantom/ > composite/whatever CIDs to solve it, which would be great, but let's > try to sketch out a solution now. Good point. Right now we put our new cid on top of the old lock cid, making rollback impossible to the old lock. 
What if instead of overwriting our old cid with a new one, we create a composite cid, and if we roll back, we look up the composite pair and put the old cid back. It would only work with two cids, but that seems sufficient. -- Bruce Momjian <[EMAIL PROTECTED]> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] referential Integrity and SHARE locks
On Thu, 8 Feb 2007, Marc Munro wrote: > On Thu, 2007-08-02 at 10:06 -0800, Stephan Szabo wrote: > > On Thu, 8 Feb 2007, Marc Munro wrote: > . . . > > > > > > That other transaction, T1, would have run the same RI triggers and so > > > would have the same parent records locked. > > > > That's not true in the case of delete, since the referencing table > > triggers are on insert and update. . . . > > Let me see if I have this scenario right: > > Transaction T1 updates child record C1, with RI causing the parent P1 to > be locked before the child. > > In the meantime transaction T2, successfully deletes C1 as it has not > yet been locked. > > (Please tell me if I have misunderstood what you are saying) > > Yes in this case, T1 must abort because the record it was going to > update has disappeared from underneath it. I don't see how this is > significantly different from the same race for the record if the table > had no RI constraints. The only difference that I can see, is that T1 > now has some locks that it must relinquish as the transaction aborts. > > > . . . Second, the parent record locks are not > > exclusive which means that both can be granted, so I don't see how this > > stops the second from continuing before the first. > > I don't think this does stop the second from continuing before the > first. What will stop it, is the eventual lock that is taken on the > child (triggering) record. But at that point, you've already had to compose the new row in order to call the trigger for the ri check, right? So, one of those new rows will be out of date by the time it actually gets the lock. Presumably that means that you need to recalculate the new row, but you've already done a check and gotten a lock based on the old new row. Also, another big problem is the fact that SQL requires that the action already have happened before the check in cases where the constraint references the same table. 
The row being updated or inserted might reference a row that will be updated or inserted by a later action of the same statement. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Proposal: Commit timestamp
Bruce Momjian wrote: > Jan Wieck wrote: > > On 2/7/2007 11:12 PM, Bruce Momjian wrote: > > > Jan Wieck wrote: > > >> On 2/7/2007 10:35 PM, Bruce Momjian wrote: > > >> > > >> > Also, I saw the trigger patch with no explaination of why it was > > >> > important or who would use it --- that also isn't going to fly well. > > >> > > >> You didn't respond to my explanation how the current Slony > > >> implementation could improve and evolve using it. Are you missing > > >> something? I am discussing this very issue with our own QA department, > > >> and thus far, I think I have a majority of "would use a pg_trigger > > >> backpatched PostgreSQL" vs. "No, I prefer a system that knows exactly > > >> how it corrupted my system catalog". > > > > > > No, I _now_ understand the use case, but when the patch was posted, the > > > use case was missing. I would like to see a repost with the patch, and > > > a description of its use so we can all move forward on that. > > > > Is this a new policy that after discussion, all patches must be > > resubmitted with a summary and conclusions of the discussion? I can > > certainly do that for you, but just tell me if you are going to ask the > > same from everyone. > > No, I am asking only this time because I feel there was too much > disconnect between the patch and the extensive replication discussion > that few community members would see the connection. FYI, in my opinion the trigger addition is clearly useful to Mammoth Replicator as well. In fact, it's so obviously useful that I didn't see a need to state that in the original thread where it was discussed. Not sure about the timestamp stuff, because Replicator is not multi-master, so there's no conflict resolution to take care of. -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. 
Re: [HACKERS] [PATCHES] [pgsql-patches] Phantom Command IDs, updated patch
[ time to move this thread to -hackers ] Bruce Momjian <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> Alvaro Herrera <[EMAIL PROTECTED]> writes: >>> Heikki Linnakangas wrote: Tom Lane wrote: > BTW, I don't care much for the terminology "phantom cid" ... there's > nothing particularly "phantom" about them, seeing they get onto disk. > Can anyone think of a better name? The best I can do offhand is > "merged cid" or "cid pair", which aren't inspiring. MultiCid, like the MultiXacts? Maybe not, they're quite different beasts... >>> Dual cid? Double cid? >> "Double cid" doesn't sound too bad. Another thought that just came to >> mind is "cid interval" or some variant of that. > I don't like "double ctid" because it is really just one ctid, but > represents two. I am thinking "packed ctid" is the right wording. It > doesn't have the same impact as "phantom", but it is probably better. Packed doesn't seem to have quite the right connotation either --- it sounds like it means there are two separable fields in the CID value. Maybe "composite cid"? Another issue that we need to think about before we go too far with this is the problem that we punted on before 8.2 release: how to deal with rolling back an upgrade of a row-level lock from shared to exclusive within a subtransaction. I'm a bit nervous about committing to merging cmin and cmax before we have an idea how we're going to solve that --- it might foreclose a solution. Or maybe we could piggyback on phantom/ composite/whatever CIDs to solve it, which would be great, but let's try to sketch out a solution now. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] referential Integrity and SHARE locks
On Thu, 2007-08-02 at 14:33 -0500, Tom Lane wrote: > Marc Munro <[EMAIL PROTECTED]> writes: > > Yes in this case, T1 must abort because the record it was going to > > update has disappeared from underneath it. I don't see how this is > > significantly different from the same race for the record if the table > > had no RI constraints. The only difference that I can see, is that T1 > > now has some locks that it must relinquish as the transaction aborts. > > No, the difference is there would have been no error at all before; > if the record were deleted before T1 got to it then it wouldn't have > attempted to update it. I really don't think you can make it work > to perform updates or deletes on a record you have not yet locked. The record would be locked before the update or delete is attempted, however it would not be locked until the referential integrity constraints have succeeded in acquiring their locks. It is becoming clear to me that I am missing something but I still don't know what it is. If anyone can see it and explain it I'd really appreciate it. __ Marc
Re: [HACKERS] Proposal: Commit timestamp
Jan Wieck wrote: > On 2/7/2007 11:12 PM, Bruce Momjian wrote: > > Jan Wieck wrote: > >> On 2/7/2007 10:35 PM, Bruce Momjian wrote: > >> > I find the term "logical proof of it's correctness" too restrictive. It > >> > sounds like some formal academic process that really doesn't work well > >> > for us. > >> > >> Thank you. > >> > >> > Also, I saw the trigger patch with no explaination of why it was > >> > important or who would use it --- that also isn't going to fly well. > >> > >> You didn't respond to my explanation how the current Slony > >> implementation could improve and evolve using it. Are you missing > >> something? I am discussing this very issue with our own QA department, > >> and thus far, I think I have a majority of "would use a pg_trigger > >> backpatched PostgreSQL" vs. "No, I prefer a system that knows exactly > >> how it corrupted my system catalog". > > > > No, I _now_ understand the use case, but when the patch was posted, the > > use case was missing. I would like to see a repost with the patch, and > > a description of its use so we can all move forward on that. > > Is this a new policy that after discussion, all patches must be > resubmitted with a summary and conclusions of the discussion? I can > certainly do that for you, but just tell me if you are going to ask the > same from everyone. No, I am asking only this time because I feel there was too much disconnect between the patch and the extensive replication discussion that few community members would see the connection. I would also like to know what your new features does for each supported option. I have not seen that spelled out yet at all. -- Bruce Momjian <[EMAIL PROTECTED]> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] FreeBSD 6.2 ./configure plays oddly when building 8.2.2 from tarball
On Feb 8, 11:28 am, [EMAIL PROTECTED] (Alvaro Herrera) wrote: > Tom Lane wrote: > > I thought it might be coming from your --with-libs switch somehow, > > but when I add that to my configure command it does not change this > > output at all. Is it possible you've got environment variables > > (like CFLAGS) that configure might be picking up? > > A stout I'll bet: it's $LIBS Bingo. I'll pay up next time I see you guys. Or you're welcome to drop by Santa Barbara area to collect. Andrew ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] referential Integrity and SHARE locks
Marc Munro <[EMAIL PROTECTED]> writes: > Yes in this case, T1 must abort because the record it was going to > update has disappeared from underneath it. I don't see how this is > significantly different from the same race for the record if the table > had no RI constraints. The only difference that I can see, is that T1 > now has some locks that it must relinquish as the transaction aborts. No, the difference is there would have been no error at all before; if the record were deleted before T1 got to it then it wouldn't have attempted to update it. I really don't think you can make it work to perform updates or deletes on a record you have not yet locked. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] FreeBSD 6.2 ./configure plays oddly when building 8.2.2 from tarball
Tom Lane wrote: > I thought it might be coming from your --with-libs switch somehow, > but when I add that to my configure command it does not change this > output at all. Is it possible you've got environment variables > (like CFLAGS) that configure might be picking up? A stout I'll bet: it's $LIBS -- Alvaro Herrera  http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] FreeBSD 6.2 ./configure plays oddly when building 8.2.2 from tarball
"Andrew Hammond" <[EMAIL PROTECTED]> writes: > I'm trying to build PostgreSQL 8.2.2 outside the ports system on a > FreeBSD 6.2 amd64 box. Note you *definitely* want 8.2.3 not 8.2.2. > configure:2209: checking for C compiler default output file name > configure:2212: gcc -O3 -funroll-loops -m64 conftest.c /usr/local/lib >&5 > /usr/local/lib: file not recognized: File format not recognized That's really strange. For reference, I get this: configure:2209: checking for C compiler default output file name configure:2212: gcc conftest.c >&5 configure:2215: $? = 0 configure:2261: result: a.out I thought it might be coming from your --with-libs switch somehow, but when I add that to my configure command it does not change this output at all. Is it possible you've got environment variables (like CFLAGS) that configure might be picking up? regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] FreeBSD 6.2 ./configure plays oddly when building 8.2.2 from tarball
Andrew Hammond wrote: > This file contains any messages produced by compilers while > running configure, to aid debugging if configure makes a mistake. > > It was created by PostgreSQL configure 8.2.2, which was > generated by GNU Autoconf 2.59. Invocation command line was > > $ ./configure --prefix=/usr/local/pgsql822_sl126_amd64_FreeBSD62 > --with-includes=/usr/local/include --with-libs=/usr/local/lib > --enable-thread-saftey --with-openssl --enable-debug CC=gcc CFLAGS=-O3 > -funroll-loops -m64 [...] > configure:2209: checking for C compiler default output file name > configure:2212: gcc -O3 -funroll-loops -m64 conftest.c /usr/local/lib >&5 > /usr/local/lib: file not recognized: File format not recognized Strange. I tried the same args on 8.2.3, and that config.log extract shows this: configure:2209: checking for C compiler default output file name configure:2212: gcc -O3 -funroll-loops -m64 conftest.c >&5 [works] Note no misplaced libs there. The configure args were ./configure --prefix=/tmp --with-includes=/usr/local/include --with-libs=/usr/local/lib --enable-thread-saftey --with-openssl --enable-debug CC=gcc CFLAGS="-O3 -funroll-loops -m64" (including the thread-saftey typo) I noticed that config.log has unquoted CFLAGS -- I'd say that's a bug, but maybe it's of no consequence. -- Alvaro Herrera  http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
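For readers following this thread: a rough illustration (not autoconf's actual code; `build_link_command` is a made-up helper) of why a stray LIBS environment variable produces exactly the failing command shown above — configure splices $LIBS verbatim onto the link line, so a bare directory path is handed to gcc as if it were an input file.

```python
# Mimic how configure assembles its test link command. A LIBS value
# that is a bare directory (instead of e.g. "-L/usr/local/lib") ends
# up on the gcc command line as a pseudo input file, which gcc then
# rejects with "file not recognized: File format not recognized".

def build_link_command(cflags, libs):
    """Assemble a configure-style link line from env-derived pieces."""
    parts = ["gcc", *cflags.split(), "conftest.c", *libs.split()]
    return " ".join(parts)

# LIBS set to a bare directory, as in the report:
cmd = build_link_command("-O3 -funroll-loops -m64", "/usr/local/lib")
assert cmd == "gcc -O3 -funroll-loops -m64 conftest.c /usr/local/lib"

# What the variable presumably should have contained:
ok = build_link_command("-O3 -funroll-loops -m64", "-L/usr/local/lib")
assert ok.endswith("conftest.c -L/usr/local/lib")
```

Unsetting LIBS (or prefixing the path with -L) makes the test link succeed, which matches Andrew's "Bingo" above.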
Re: [HACKERS] Proposal: Commit timestamp
On 2/7/2007 11:12 PM, Bruce Momjian wrote: Jan Wieck wrote: On 2/7/2007 10:35 PM, Bruce Momjian wrote: > I find the term "logical proof of it's correctness" too restrictive. It > sounds like some formal academic process that really doesn't work well > for us. Thank you. > Also, I saw the trigger patch with no explaination of why it was > important or who would use it --- that also isn't going to fly well. You didn't respond to my explanation how the current Slony implementation could improve and evolve using it. Are you missing something? I am discussing this very issue with our own QA department, and thus far, I think I have a majority of "would use a pg_trigger backpatched PostgreSQL" vs. "No, I prefer a system that knows exactly how it corrupted my system catalog". No, I _now_ understand the use case, but when the patch was posted, the use case was missing. I would like to see a repost with the patch, and a description of its use so we can all move forward on that. Is this a new policy that after discussion, all patches must be resubmitted with a summary and conclusions of the discussion? I can certainly do that for you, but just tell me if you are going to ask the same from everyone. Jan -- #==# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #== [EMAIL PROTECTED] # ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] FreeBSD 6.2 ./configure plays oddly when building 8.2.2 from tarball
--enable-thread-saftey is surely misspelled. not sure about the rest. cheers andrew Andrew Hammond wrote: I'm trying to build PostgreSQL 8.2.2 outside the ports system on a FreeBSD 6.2 amd64 box. The databases/postgresql81-server port builds 8.1.7 just fine on the same box. My configure fails. I'm new to FreeBSD so I expect I'm missing something pretty obvious. config.log follows. Line 2212 is very odd since it appears to be giving gcc the library prefix as a file name. This file contains any messages produced by compilers while running configure, to aid debugging if configure makes a mistake. It was created by PostgreSQL configure 8.2.2, which was generated by GNU Autoconf 2.59. Invocation command line was $ ./configure --prefix=/usr/local/pgsql822_sl126_amd64_FreeBSD62 --with-includes=/usr/local/include --with-libs=/usr/local/lib --enable-thread-saftey --with-openssl --enable-debug CC=gcc CFLAGS=-O3 -funroll-loops -m64 ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Chatter on DROP SOMETHING IF EXISTS
Peter Eisentraut <[EMAIL PROTECTED]> writes: > I'm honestly looking for some practical use of this. We have debated > other NOTICE messages over the years, but they at least tell you > something you can use after the command. The objection I had to the original patch (which didn't return a notice) was that this seemed actively misleading: foo=> DROP TABLE IF EXISTS not_there; DROP TABLE foo=> I would be satisfied if the returned command tag were something else, maybe "NO OPERATION". regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] referential Integrity and SHARE locks
On Thu, 2007-08-02 at 10:06 -0800, Stephan Szabo wrote: > On Thu, 8 Feb 2007, Marc Munro wrote: . . . > > > > That other transaction, T1, would have run the same RI triggers and so > > would have the same parent records locked. > > That's not true in the case of delete, since the referencing table > triggers are on insert and update. . . . Let me see if I have this scenario right: Transaction T1 updates child record C1, with RI causing the parent P1 to be locked before the child. In the meantime transaction T2, successfully deletes C1 as it has not yet been locked. (Please tell me if I have misunderstood what you are saying) Yes in this case, T1 must abort because the record it was going to update has disappeared from underneath it. I don't see how this is significantly different from the same race for the record if the table had no RI constraints. The only difference that I can see, is that T1 now has some locks that it must relinquish as the transaction aborts. > . . . Second, the parent record locks are not > exclusive which means that both can be granted, so I don't see how this > stops the second from continuing before the first. I don't think this does stop the second from continuing before the first. What will stop it, is the eventual lock that is taken on the child (triggering) record. I am not proposing reducing the number of locks taken, but rather changing the order in which the locks are taken. What am I missing? __ Marc
Re: [HACKERS] BuildFarm: Do we need another FreeBSD/amd64 member?
I can set up build farm on it if yall need it. It's running 6.2/amd64 --- Original Message --- From: "Andrew Hammond" <[EMAIL PROTECTED]> To: " Larry Rosenman " Sent: 2/8/07, 12:19:07 PM Subject: Re: BuildFarm: Do we need another FreeBSD/amd64 member? On Jan 19 2006, 9:36 pm, [EMAIL PROTECTED] ("Larry Rosenman") wrote: > Tom Lane wrote: > > "Larry Rosenman" <[EMAIL PROTECTED]> writes: > >> I've got a fast FreeBSD/amd64 server available to run Buildfarm > >> on. > > >> However, I see we already have a couple of others running it. > > >> My questions are: > >> 1) do we need another one? > >> 2) if yes, what options need coverage? > > > Looks like we're fairly well covered on freebsd already. Are you > > willing to consider running some less-popular OS on it? > > Not particularly, as I want it to do other things. However, I might be > willing > To get VMWare running on it, and run something else in a VM. > > What were you thinking? FreeBSD 6.2 was just released. I don't see any coverage for it yet. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[HACKERS] FreeBSD 6.2 ./configure plays oddly when building 8.2.2 from tarball
I'm trying to build PostgreSQL 8.2.2 outside the ports system on a FreeBSD 6.2 amd64 box. The databases/postgresql81-server port builds 8.1.7 just fine on the same box. My configure fails. I'm new to FreeBSD so I expect I'm missing something pretty obvious. config.log follows. Line 2212 is very odd since it appears to be giving gcc the library prefix as a file name. This file contains any messages produced by compilers while running configure, to aid debugging if configure makes a mistake. It was created by PostgreSQL configure 8.2.2, which was generated by GNU Autoconf 2.59. Invocation command line was $ ./configure --prefix=/usr/local/pgsql822_sl126_amd64_FreeBSD62 --with-includes=/usr/local/include --with-libs=/usr/local/lib --enable-thread-saftey --with-openssl --enable-debug CC=gcc CFLAGS=-O3 -funroll-loops -m64 ## - ## ## Platform. ## ## - ## hostname = ahammond.ecoffice.experclick.com uname -m = amd64 uname -r = 6.2-RELEASE uname -s = FreeBSD uname -v = FreeBSD 6.2-RELEASE #0: Fri Jan 12 08:43:30 UTC 2007 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/SMP /usr/bin/uname -p = amd64 /bin/uname -X = unknown /bin/arch = unknown /usr/bin/arch -k = unknown /usr/convex/getsysinfo = unknown hostinfo = unknown /bin/machine = unknown /usr/bin/oslevel = unknown /bin/universe = unknown PATH: /sbin PATH: /bin PATH: /usr/sbin PATH: /usr/bin PATH: /usr/games PATH: /usr/local/sbin PATH: /usr/local/bin PATH: /usr/X11R6/bin PATH: /home/ahammond/bin ## --- ## ## Core tests. 
## ## --- ## configure:1408: checking build system type configure:1426: result: x86_64-unknown-freebsd6.2 configure:1434: checking host system type configure:1448: result: x86_64-unknown-freebsd6.2 configure:1458: checking which template to use configure:1558: result: freebsd configure:1700: checking whether to build with 64-bit integer date/time support configure:1732: result: no configure:1739: checking whether NLS is wanted configure:1774: result: no configure:1782: checking for default port number configure:1812: result: 5432 configure:2170: checking for C compiler version configure:2173: gcc --version </dev/null >&5 gcc (GCC) 3.4.6 [FreeBSD] 20060305 Copyright (C) 2006 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. configure:2176: $? = 0 configure:2178: gcc -v </dev/null >&5 Using built-in specs. Configured with: FreeBSD/amd64 system compiler Thread model: posix gcc version 3.4.6 [FreeBSD] 20060305 configure:2181: $? = 0 configure:2183: gcc -V </dev/null >&5 gcc: `-V' option must have argument configure:2186: $? = 1 configure:2209: checking for C compiler default output file name configure:2212: gcc -O3 -funroll-loops -m64 conftest.c /usr/local/lib >&5 /usr/local/lib: file not recognized: File format not recognized configure:2215: $? = 1 ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] BuildFarm: Do we need another FreeBSD/amd64 member?
On Jan 19 2006, 9:36 pm, [EMAIL PROTECTED] ("Larry Rosenman") wrote: > Tom Lane wrote: > > "Larry Rosenman" <[EMAIL PROTECTED]> writes: > >> I've got a fast FreeBSD/amd64 server available to run Buildfarm > >> on. > > >> However, I see we already have a couple of others running it. > > >> My questions are: > >> 1) do we need another one? > >> 2) if yes, what options need coverage? > > > Looks like we're fairly well covered on freebsd already. Are you > > willing to consider running some less-popular OS on it? > > Not particularly, as I want it to do other things. However, I might be > willing > To get VMWare running on it, and run something else in a VM. > > What were you thinking? FreeBSD 6.2 was just released. I don't see any coverage for it yet. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] referential Integrity and SHARE locks
On Thu, 8 Feb 2007, Marc Munro wrote: > Oops, forgot to include pgsql-hackers when I responded to this the first > time. > > On Tue, 2007-06-02 at 20:53 -0500, Tom Lane wrote: > > Marc Munro <[EMAIL PROTECTED]> writes: > > > The RI triggers currently fire when a record is updated. Under my > > > proposal they would fire in the same way but before the record is > locked > > > rather than after. Or am I missing your point? > > > > IOW, some other transaction could update or delete the tuple > meanwhile? > > Doesn't seem very promising. > > > > That other transaction, T1, would have run the same RI triggers and so > would have the same parent records locked. That's not true in the case of delete, since the referencing table triggers are on insert and update. Second, the parent record locks are not exclusive which means that both can be granted, so I don't see how this stops the second from continuing before the first. ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Chatter on DROP SOMETHING IF EXISTS
Peter Eisentraut wrote: The downside is that while I wanted to use the IF EXISTS form to reduce the chatter at the beginning of schema loading scripts, this just gives me a different spelling of that same chatter. There is possibly a good case for dropping the message level. cheers andrew ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Chatter on DROP SOMETHING IF EXISTS
Andrew Dunstan wrote: > Peter Eisentraut wrote: > > What is the practical purpose of the notices emitted by DROP > > SOMETHING IF EXISTS when the object in fact does not exist? > > It was asked for ... The argument was that MySQL does the same. Which is valid but not overriding. I'm honestly looking for some practical use of this. We have debated other NOTICE messages over the years, but they at least tell you something you can use after the command. In this case, it just tells you that the object which you wanted removed no matter what didn't exist in the first place, but the state after the command (which is the interesting side) is always the same: "gone". The only use case I see is informing about typos, but the system generally doesn't cater to that. The downside is that while I wanted to use the IF EXISTS form to reduce the chatter at the beginning of schema loading scripts, this just gives me a different spelling of that same chatter. -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Reducing likelihood of deadlocks (was referential Integrity and SHARE locks)
On Thu, 2007-08-02 at 18:06 +0100, Csaba Nagy wrote: > The problem is that eliminating the deadlock is still not the complete > cake... the interlocking still remains, possibly leading to degraded > performance on high contention on very common parent rows. The real > solution would be when an update on the parent table's non-referenced > fields is not interlocking at all with updates of the child rows... and > I think there were some proposals to do that. Agreed. There are two issues here, unnecessary blocking and deadlock. These can be tackled separately. My proposal deals only with the deadlock issue. Even if contention is reduced, for instance by implementing column-level locking, there will still be the potential for deadlock arising from inconsistent ordering of locks. I continue to stand by my proposal. __ Marc
Re: [HACKERS] Reducing likelihood of deadlocks (was referential Integrity and SHARE locks)
On Thu, 2007-02-08 at 17:47, Marc Munro wrote: > [snip] One of the causes of deadlocks in Postgres is that its referential > integrity triggers can take locks in inconsistent orders. Generally a > child record will be locked before its parent, but not in all cases. [snip] The problem is that eliminating the deadlock is still not the complete cake... the interlocking still remains, possibly leading to degraded performance on high contention on very common parent rows. The real solution would be when an update on the parent table's non-referenced fields is not interlocking at all with updates of the child rows... and I think there were some proposals to do that. In fact one possibility to avoid this problem is vertical partitioning, i.e. separating the non-key columns in a parallel table and updating them there. However this is a choice only when you know it beforehand and you're not inheriting the schema from other DBs... Cheers, Csaba. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[HACKERS] Reducing likelihood of deadlocks (was referential Integrity and SHARE locks)
I am going to restate my earlier proposal, to clarify it and in the hope of stimulating more discussion. One of the causes of deadlocks in Postgres is that its referential integrity triggers can take locks in inconsistent orders. Generally a child record will be locked before its parent, but not in all cases. My proposal modifies the order in which locks are taken by referential integrity triggers, so that parents are always locked before their children. The proposal is that referential integrity triggers should fire before locking the tuple from which they are triggered. I guess a new sort of trigger is required for this, a before-lock trigger. If this is a dumb idea, please tell me why. If it would cause more problems than it solves, ditto. If it would be difficult to implement, let's discuss and try to find solutions. __ Marc
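Marc's ordering argument can be modeled in a few lines (an illustrative toy, not PostgreSQL code): two transactions can deadlock only if they acquire some pair of locks in opposite orders, so locking parents before children everywhere removes the inversion.

```python
# Toy check for the lock-ordering claim in the proposal above:
# a deadlock between two transactions requires some pair of locks
# acquired in opposite orders by the two transactions.

def can_deadlock(order_a, order_b):
    """True if the two acquisition orders invert some pair of locks."""
    pos_a = {lock: i for i, lock in enumerate(order_a)}
    pos_b = {lock: i for i, lock in enumerate(order_b)}
    shared = [l for l in order_a if l in pos_b]
    return any(
        (pos_a[x] < pos_a[y]) != (pos_b[x] < pos_b[y])
        for i, x in enumerate(shared) for y in shared[i + 1:]
    )

# Today: T1's RI trigger locks child C1 then parent P1, while T2
# takes the locks the other way round -- a deadlock is possible.
assert can_deadlock(["C1", "P1"], ["P1", "C1"])

# Under the proposal, both lock parent before child -- no inversion.
assert not can_deadlock(["P1", "C1"], ["P1", "C1"])
```

This only models the deadlock half of the discussion; as Csaba notes elsewhere in the thread, the blocking/contention issue is separate.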
Re: [HACKERS] referential Integrity and SHARE locks
Oops, forgot to include pgsql-hackers when I responded to this the first time. On Tue, 2007-06-02 at 20:53 -0500, Tom Lane wrote: > Marc Munro <[EMAIL PROTECTED]> writes: > > The RI triggers currently fire when a record is updated. Under my > > proposal they would fire in the same way but before the record is locked > > rather than after. Or am I missing your point? > > IOW, some other transaction could update or delete the tuple meanwhile? > Doesn't seem very promising. > That other transaction, T1, would have run the same RI triggers and so would have the same parent records locked. The blocked transaction, T2, once T1 has committed, would fail. I don't see this as being much different from the current case, where T1 locks and deletes or updates a row, and T2 then tries to manipulate the same row. In both cases, locks manage the race for the row, and MVCC ensures that T2 fails. __ Marc signature.asc Description: This is a digitally signed message part
Re: [HACKERS] Bitmap index thoughts
On Thu, 8 Feb 2007, Heikki Linnakangas wrote: > Gavin Sherry wrote: > > I will update the code tomorrow. The focus will be cleaning up the > > executor modifications. Please look else where for now. > > I'm getting a segfault with this test script: > > > CREATE TABLE bmtest (i int); > > INSERT INTO bmtest SELECT 1 FROM generate_series(1,10) a; > INSERT INTO bmtest SELECT 10 FROM generate_series(1,100) a; > DELETE FROM bmtest WHERE i = 1; > VACUUM bmtest; > > CREATE INDEX i_bmtest ON bmtest USING bitmap (i); > > INSERT INTO bmtest SELECT 10 FROM generate_series(1,10) a; > > Hmm... this triggers a bug in the newly rewritten update code I think. I'll post a fix soon. Thanks for testing! Gavin ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Re: [COMMITTERS] pgsql: Reduce WAL activity for page splits: > Currently, an index split
Alvaro Herrera wrote: Heikki Linnakangas wrote: Stefan Kaltenbrunner wrote: Bruce Momjian wrote: Log Message: --- Reduce WAL activity for page splits: Currently, an index split writes all the data on the split page to WAL. That's a lot of WAL traffic. The tuples that are copied to the right page need to be WAL logged, but the tuples that stay on the original page don't. this patch killed both of my OpenBSD buildfarm members Please note that zebra died for a different reason the second time (not enough shared memory). yeah that's because initdb is leaving the shm-segments there on a failure. This resulted in the out-of-shm failure on the following run which I have manually cleared now ... Stefan ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Bitmap index thoughts
Gavin Sherry wrote: I will update the code tomorrow. The focus will be cleaning up the executor modifications. Please look else where for now. I'm getting a segfault with this test script: CREATE TABLE bmtest (i int); INSERT INTO bmtest SELECT 1 FROM generate_series(1,10) a; INSERT INTO bmtest SELECT 10 FROM generate_series(1,100) a; DELETE FROM bmtest WHERE i = 1; VACUUM bmtest; CREATE INDEX i_bmtest ON bmtest USING bitmap (i); INSERT INTO bmtest SELECT 10 FROM generate_series(1,10) a; -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] HOT for PostgreSQL 8.3
Tom Lane wrote: "Simon Riggs" <[EMAIL PROTECTED]> writes: The basic idea is that when a tuple is UPDATEd we can, in certain circumstances, avoid inserting index tuples for a tuple. Such tuples are marked HEAP_ONLY_TUPLE, but their storage is otherwise identical to other tuples. What is VACUUM FULL going to do when it wants to move one of these things? I suppose it could move the whole chain to the same target page, if there is one with enough space to accommodate the whole chain. Or we could just cop out and not move tuples marked with HEAP_ONLY_TUPLE. I think that would be acceptable; after the last transaction that can see the old tuple is finished, the old tuple is dead. After that, VACUUM FULL can remove the old tuple and move the remaining tuple as usual. CREATE INDEX requires some careful work to allow it to identify and correctly index HEAP_ONLY_TUPLEs that need to become ~HEAP_ONLY_TUPLE as a result of the new index. I think you've glossed over the CREATE INDEX problem much too easily. The difficulty with that is going to be that de-HOT-ifying a tuple is going to require multiple updates that can't possibly be put into a single WAL record, and I don't think that WAL replay can clean up after an incomplete update (since it can't run user-defined functions and hence cannot be expected to compute index entries for itself). So I don't think you can do that while preserving crash safety. Yeah, chilling tuples from HOT state to normal tuples is not easy. One solution I thought of is to add another flag to heap tuples, CHILL_IN_PROGRESS. To chill a tuple, you would: 1. Mark heap tuple with CHILL_IN_PROGRESS 2. Insert missing index entries 3. Clear CHILL_IN_PROGRESS and HEAP_ONLY_TUPLE flags Index scans would ignore tuples with CHILL_IN_PROGRESS and directly pointed to from the index. That way if we crash in the middle of step 2, scans and updates would work normally after replay, as if the index entries weren't there. 
CREATE INDEX would have to fail if there's any CHILL_IN_PROGRESS tuples, because we wouldn't know which index entries need to be inserted; some might already be there. To clear the CHILL_IN_PROGRESS flag, a vacuum would be needed. Vacuum would remove all index entries for those tuples, but instead of removing the heap tuple in the 2nd scan it would just clear the CHILL_IN_PROGRESS flag, bringing us back to where we started. However, the easiest solution would be to make CREATE INDEX wait until the old tuple is dead. That should be ok at least for concurrent CREATE INDEX, because it already has that kind of a wait between 1st and 2nd phase. Removing the root tuple will require a VACUUM *FULL*. That seems unacceptable ... it won't take too long for your table to fill up with stubs, and we don't want to return to the bad old days when periodic VACUUM FULL was unavoidable. Agreed. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Re: [COMMITTERS] pgsql: Reduce WAL activity for page splits: > Currently, an index split
Heikki Linnakangas wrote: > Stefan Kaltenbrunner wrote: > >Bruce Momjian wrote: > >>Log Message: > >>--- > >>Reduce WAL activity for page splits: > >> > >>>Currently, an index split writes all the data on the split page to > >>>WAL. That's a lot of WAL traffic. The tuples that are copied to the > >>>right page need to be WAL logged, but the tuples that stay on the > >>>original page don't. > > > > > >this patch killed both of my OpenBSD buildfarm members Please note that zebra died for a different reason the second time (not enough shared memory). > lopaque was referenced after pfreeing the temp page it pointed to. Also > later in the function the LSN of the left page was set, but again using > a pointer to the pfreed temp copy instead of the real shared memory buffer. Applied, thanks. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] compilation of pg_config fails
Markus Schiltknecht wrote: > since str(n?)cat got replaced with strlcat, I fail to build Since the buildfarm thinks it's OK, I'd first of all check whether you have a complete and consistent checkout. Note that the default cvs options will probably not get you one. -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] compilation of pg_config fails
Markus Schiltknecht wrote: Hi, since str(n?)cat got replaced with strlcat, I fail to build PostgreSQL (current CVS HEAD). HAVE_DECL_STRLCAT is not set, so AFAIK, the strlcat() function from src/port should be used. However, I've read the README there, but still don't quite know what's wrong. The linker throws:

gcc -O1 -Wall -pg -Wall -Wmissing-prototypes -Wpointer-arith -Winline -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -g pg_config.o -L../../../src/port -Wl,-rpath,'/usr/local/pgsql/lib' -lpgport -lz -lreadline -lcrypt -ldl -lm -o pg_config
pg_config.o: In function `show_pgxs':
/home/markus/projects/pgsql/sources/trunk/src/bin/pg_config/pg_config.c:216: undefined reference to `strlcat'

Even though objdump confirms that strlcat.o is compiled in ../../../src/port/libpgport.o:

In archive ../../../src/port/libpgport.a:
strlcpy.o: file format elf32-i386
SYMBOL TABLE:
l df *ABS* strlcpy.c
l d .text .text
l d .data .data
l d .bss .bss
l d .debug_abbrev .debug_abbrev
l d .debug_info .debug_info
l d .debug_line .debug_line
l d .debug_frame .debug_frame
l d .debug_loc .debug_loc
l d .debug_pubnames .debug_pubnames
l d .debug_aranges .debug_aranges
l d .debug_str .debug_str
l d .note.GNU-stack .note.GNU-stack
l d .comment .comment
g F .text 0045 strlcpy
*UND* mcount

That dump shows strl*cpy*, not strlcat.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
[HACKERS] compilation of pg_config fails
Hi, since str(n?)cat got replaced with strlcat, I fail to build PostgreSQL (current CVS HEAD). HAVE_DECL_STRLCAT is not set, so AFAIK, the strlcat() function from src/port should be used. However, I've read the README there, but still don't quite know what's wrong. The linker throws:

gcc -O1 -Wall -pg -Wall -Wmissing-prototypes -Wpointer-arith -Winline -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -g pg_config.o -L../../../src/port -Wl,-rpath,'/usr/local/pgsql/lib' -lpgport -lz -lreadline -lcrypt -ldl -lm -o pg_config
pg_config.o: In function `show_pgxs':
/home/markus/projects/pgsql/sources/trunk/src/bin/pg_config/pg_config.c:216: undefined reference to `strlcat'

Even though objdump confirms that strlcat.o is compiled in ../../../src/port/libpgport.o:

In archive ../../../src/port/libpgport.a:
strlcpy.o: file format elf32-i386
SYMBOL TABLE:
l df *ABS* strlcpy.c
l d .text .text
l d .data .data
l d .bss .bss
l d .debug_abbrev .debug_abbrev
l d .debug_info .debug_info
l d .debug_line .debug_line
l d .debug_frame .debug_frame
l d .debug_loc .debug_loc
l d .debug_pubnames .debug_pubnames
l d .debug_aranges .debug_aranges
l d .debug_str .debug_str
l d .note.GNU-stack .note.GNU-stack
l d .comment .comment
g F .text 0045 strlcpy
*UND* mcount

...

# uname -a: Linux grml 2.6.20 #1 SMP PREEMPT Tue Feb 6 14:48:26 PST 2007 i686 GNU/Linux

Any idea? Regards Markus ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Re: [COMMITTERS] pgsql: Reduce WAL activity for page splits: > Currently, an index split
Heikki Linnakangas wrote: Stefan Kaltenbrunner wrote: Bruce Momjian wrote: Log Message: --- Reduce WAL activity for page splits: Currently, an index split writes all the data on the split page to WAL. That's a lot of WAL traffic. The tuples that are copied to the right page need to be WAL logged, but the tuples that stay on the original page don't. this patch killed both of my OpenBSD buildfarm members lopaque was referenced after pfreeing the temp page it pointed to. Also later in the function the LSN of the left page was set, but again using a pointer to the pfreed temp copy instead of the real shared memory buffer. Here's a fix. confirmed - with that patch -HEAD passes a full regression test run at least on emu. Stefan ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Re: [COMMITTERS] pgsql: Reduce WAL activity for page splits: > Currently, an index split
Stefan Kaltenbrunner wrote: Bruce Momjian wrote: Log Message: --- Reduce WAL activity for page splits: Currently, an index split writes all the data on the split page to WAL. That's a lot of WAL traffic. The tuples that are copied to the right page need to be WAL logged, but the tuples that stay on the original page don't. this patch killed both of my OpenBSD buildfarm members Here's a fix. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com

Index: src/backend/access/nbtree/nbtinsert.c
===
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/nbtree/nbtinsert.c,v
retrieving revision 1.150
diff -c -r1.150 nbtinsert.c
*** src/backend/access/nbtree/nbtinsert.c 8 Feb 2007 05:05:53 - 1.150
--- src/backend/access/nbtree/nbtinsert.c 8 Feb 2007 09:36:05 -
***************
*** 953,959 ****
  xlrec.rightsib = BufferGetBlockNumber(rbuf);
  xlrec.firstright = firstright;
  xlrec.rnext = ropaque->btpo_next;
! xlrec.level = lopaque->btpo.level;
  rdata[0].data = (char *) &xlrec;
  rdata[0].len = SizeOfBtreeSplit;
--- 953,959 ----
  xlrec.rightsib = BufferGetBlockNumber(rbuf);
  xlrec.firstright = firstright;
  xlrec.rnext = ropaque->btpo_next;
! xlrec.level = ropaque->btpo.level;
  rdata[0].data = (char *) &xlrec;
  rdata[0].len = SizeOfBtreeSplit;
***************
*** 962,968 ****
  lastrdata = &rdata[0];
  /* Log downlink on non-leaf pages. */
! if (lopaque->btpo.level > 0)
  {
  lastrdata->next = lastrdata + 1;
  lastrdata++;
--- 962,968 ----
  lastrdata = &rdata[0];
  /* Log downlink on non-leaf pages. */
! if (ropaque->btpo.level > 0)
  {
  lastrdata->next = lastrdata + 1;
  lastrdata++;
***************
*** 1040,1047 ****
  recptr = XLogInsert(RM_BTREE_ID, xlinfo, rdata);
! PageSetLSN(leftpage, recptr);
! PageSetTLI(leftpage, ThisTimeLineID);
  PageSetLSN(rightpage, recptr);
  PageSetTLI(rightpage, ThisTimeLineID);
  if (!P_RIGHTMOST(ropaque))
--- 1040,1047 ----
  recptr = XLogInsert(RM_BTREE_ID, xlinfo, rdata);
! PageSetLSN(origpage, recptr);
! PageSetTLI(origpage, ThisTimeLineID);
  PageSetLSN(rightpage, recptr);
  PageSetTLI(rightpage, ThisTimeLineID);
  if (!P_RIGHTMOST(ropaque))

---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Proposal: Commit timestamp
> Yes, yes, and yes ... but aside from the problem that you use the very
> ambiguous word "timestamp" (which somehow suggests using a "clock" of
> some sort), isn't the "begin" timestamp of a long running transaction

imho a begin timestamp is near useless

> worse than the "commit" timestamp, when all its work got visible to the
> outside world instantaneously?

This is one of the areas I am still worried about. Is one commit Lamport timestamp enough? I think for some conflict resolutions we need to look at the row level, and resolve conflicts per row and not per transaction (yes, this means that a tx might get partially replicated). What I am trying to get at is: maybe an infrastructure to produce Wieck/Lamport timestamps, which can be used in different places like commit hooks and column defaults, would be of more general use. Maybe such a column could be a system column that is not visible with "select *", for those cases where commit is not enough. And a commit hook could insert it into clog-like storage. Andreas ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Ooops ... seems we need a re-release pronto
On Tue, Feb 06, 2007 at 02:16:05PM -0500, Tom Lane wrote: > Jim Nasby <[EMAIL PROTECTED]> writes: > > On Feb 6, 2007, at 12:40 PM, Tom Lane wrote: > >> ... massive expansion of the tests doesn't seem justified > > > What about the idea that's been floated in the past about a -- > > extensive mode for regression testing that would (generally) only be > > used by the build farm. That would mean others wouldn't have to > > suffer through extremely long make check's. > > > Or is there another reason not to expand the tests? > > I'm not concerned so much about the runtime as the development and > maintenance effort... I can see development... but are there enough changes where the maintenance would be an issue? -- Jim Nasby [EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell) ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Re: [COMMITTERS] pgsql: Reduce WAL activity for page splits: > Currently, an index split
Stefan Kaltenbrunner wrote: Bruce Momjian wrote: Log Message: --- Reduce WAL activity for page splits: Currently, an index split writes all the data on the split page to WAL. That's a lot of WAL traffic. The tuples that are copied to the right page need to be WAL logged, but the tuples that stay on the original page don't. this patch killed both of my OpenBSD buildfarm members

Sorry about that.. I'll take a look at it.

a backtrace of the crash shows:

(gdb) bt
#0 0x1c055504 in _bt_split (rel=0x835fcab0, buf=53, firstright=50, newitemoff=54, newitemsz=76, newitem=0x8210a364, newitemonleft=0 '\0') at nbtinsert.c:956
#1 0x1c05468b in _bt_insertonpg (rel=0x835fcab0, buf=53, stack=0x8210a078, keysz=2, scankey=0x8210a3f0, itup=0x8210a364, afteritem=0, split_only_page=0 '\0') at nbtinsert.c:538
#2 0x1c053fae in _bt_doinsert (rel=0x835fcab0, itup=0x8210a364, index_is_unique=1 '\001', heapRel=0x7ec0ee7c) at nbtinsert.c:141
#3 0x1c05a1c4 in btinsert (fcinfo=0x2) at nbtree.c:224
#4 0x1c2028fc in FunctionCall6 (flinfo=0x8bc6d1f0, arg1=2, arg2=2, arg3=2, arg4=2, arg5=2, arg6=2) at fmgr.c:1267
#5 0x1c052d20 in index_insert (indexRelation=0x835fcab0, values=0xcf7defd0, isnull=0xcf7df050 "", heap_t_ctid=0x7df53e50, heapRelation=0x7ec0ee7c, check_uniqueness=1 '\001') at indexam.c:196
#6 0x1c08f87e in CatalogIndexInsert (indstate=0x2, heapTuple=0x8210a224) at indexing.c:124
#7 0x1c08f902 in CatalogUpdateIndexes (heapRel=0x7ec0ee7c, heapTuple=0x7df53e4c) at indexing.c:149
#8 0x1c17255c in SetRelationRuleStatus (relationId=10951, relHasRules=1 '\001', relIsBecomingView=0 '\0') at rewriteSupport.c:80
#9 0x1c16f074 in DefineQueryRewrite (stmt=0x82ca7e78) at rewriteDefine.c:432
#10 0x1c18c822 in PortalRunUtility (portal=0x82cb201c, query=0x81897484, dest=0x3c0c8ff4, completionTag=0xcf7df2a0 "") at pquery.c:1063
#11 0x1c18cb12 in PortalRunMulti (portal=0x82cb201c, dest=0x3c0c8ff4, altdest=0x3c0c8ff4, completionTag=0xcf7df2a0 "") at pquery.c:1131
#12 0x1c18c28a in PortalRun (portal=0x82cb201c, count=2147483647, dest=0x3c0c8ff4, altdest=0x3c0c8ff4, completionTag=0xcf7df2a0 "") at pquery.c:700
#13 0x1c187d13 in exec_simple_query (query_string=0x84a2501c "/*\n * PostgreSQL System Views\n *\n * Copyright (c) 1996-2007, PostgreSQL Global Development Group\n *\n * $PostgreSQL: pgsql/src/backend/catalog/system_views.sql,v 1.35 2007/01/05 22:19:25 momjian Exp $\n"...) at postgres.c:939
#14 0x1c18abb2 in PostgresMain (argc=1, argv=0xcf7df480, username=0x80ba55b0 "mastermind") at postgres.c:3423
#15 0x1c122102 in main (argc=10, argv=0xcf7df47c) at main.c:186

Stefan ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[HACKERS] Re: [COMMITTERS] pgsql: Reduce WAL activity for page splits: > Currently, an index split
Bruce Momjian wrote: Log Message: --- Reduce WAL activity for page splits: Currently, an index split writes all the data on the split page to WAL. That's a lot of WAL traffic. The tuples that are copied to the right page need to be WAL logged, but the tuples that stay on the original page don't. this patch killed both of my OpenBSD buildfarm members

a backtrace of the crash shows:

(gdb) bt
#0 0x1c055504 in _bt_split (rel=0x835fcab0, buf=53, firstright=50, newitemoff=54, newitemsz=76, newitem=0x8210a364, newitemonleft=0 '\0') at nbtinsert.c:956
#1 0x1c05468b in _bt_insertonpg (rel=0x835fcab0, buf=53, stack=0x8210a078, keysz=2, scankey=0x8210a3f0, itup=0x8210a364, afteritem=0, split_only_page=0 '\0') at nbtinsert.c:538
#2 0x1c053fae in _bt_doinsert (rel=0x835fcab0, itup=0x8210a364, index_is_unique=1 '\001', heapRel=0x7ec0ee7c) at nbtinsert.c:141
#3 0x1c05a1c4 in btinsert (fcinfo=0x2) at nbtree.c:224
#4 0x1c2028fc in FunctionCall6 (flinfo=0x8bc6d1f0, arg1=2, arg2=2, arg3=2, arg4=2, arg5=2, arg6=2) at fmgr.c:1267
#5 0x1c052d20 in index_insert (indexRelation=0x835fcab0, values=0xcf7defd0, isnull=0xcf7df050 "", heap_t_ctid=0x7df53e50, heapRelation=0x7ec0ee7c, check_uniqueness=1 '\001') at indexam.c:196
#6 0x1c08f87e in CatalogIndexInsert (indstate=0x2, heapTuple=0x8210a224) at indexing.c:124
#7 0x1c08f902 in CatalogUpdateIndexes (heapRel=0x7ec0ee7c, heapTuple=0x7df53e4c) at indexing.c:149
#8 0x1c17255c in SetRelationRuleStatus (relationId=10951, relHasRules=1 '\001', relIsBecomingView=0 '\0') at rewriteSupport.c:80
#9 0x1c16f074 in DefineQueryRewrite (stmt=0x82ca7e78) at rewriteDefine.c:432
#10 0x1c18c822 in PortalRunUtility (portal=0x82cb201c, query=0x81897484, dest=0x3c0c8ff4, completionTag=0xcf7df2a0 "") at pquery.c:1063
#11 0x1c18cb12 in PortalRunMulti (portal=0x82cb201c, dest=0x3c0c8ff4, altdest=0x3c0c8ff4, completionTag=0xcf7df2a0 "") at pquery.c:1131
#12 0x1c18c28a in PortalRun (portal=0x82cb201c, count=2147483647, dest=0x3c0c8ff4, altdest=0x3c0c8ff4, completionTag=0xcf7df2a0 "") at pquery.c:700
#13 0x1c187d13 in exec_simple_query (query_string=0x84a2501c "/*\n * PostgreSQL System Views\n *\n * Copyright (c) 1996-2007, PostgreSQL Global Development Group\n *\n * $PostgreSQL: pgsql/src/backend/catalog/system_views.sql,v 1.35 2007/01/05 22:19:25 momjian Exp $\n"...) at postgres.c:939
#14 0x1c18abb2 in PostgresMain (argc=1, argv=0xcf7df480, username=0x80ba55b0 "mastermind") at postgres.c:3423
#15 0x1c122102 in main (argc=10, argv=0xcf7df47c) at main.c:186

Stefan ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match