Re: [HACKERS] On partitioning
From: Amit Kapila [mailto:amit.kapil...@gmail.com] > > > How would you distinguish values in list partition for multiple > > > columns? I mean for range partition, we are sure there will > > > be either one value for each column, but for list it could > > > be multiple and not fixed for each partition, so I think it will not > > > be easy to support the multicolumn partition key for list > > > partitions. > > >Irrespective of difficulties of representing it using pg_node_tree, it seems > >to me that multicolumn list partitioning is not widely used. > > So I think it is better to be clear why we are not planning to > support it, is it that because it is not required by users or > is it due to the reason that code seems to be tricky or is it due > to both of the reasons. It might help us if anyone raises this > during the development of this patch or in general if someone > requests such a feature. Coming back to how the pg_node_tree representation for list partitions would work: for each column in a multicolumn list partition key, a value would look like a dumped Node for a List of Consts (all allowed values in a given list partition). The whole key would then be a List of such Nodes (a dump thereof). That's perhaps pretty verbose, but it's meant to be only a catalog representation. During relcache building, we turn this back into a collection of structs that let us efficiently locate the partition of interest, whatever the method of doing that ends up being (based on the partition type). The relcache step ensures that we have decoupled the concern of quickly locating an interesting partition from its catalog representation. Of course, there may be flaws in this picture that would only reveal themselves when actually trying to implement it, unless they are pointed out in advance. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
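As a concrete illustration of the relcache idea sketched above, the decoded structure could look something like the following; every name here is made up for illustration and is not taken from any actual patch.

    #include "postgres.h"

    /*
     * Illustrative sketch only: a per-relation structure that relcache code
     * could build by running stringToNode() on the stored pg_node_tree value
     * (a List of per-partition Lists of Const nodes).
     */
    typedef struct ListPartitionDesc
    {
        int         nparts;     /* number of list partitions */
        Oid        *partoids;   /* OID of each partition relation */
        int        *nvalues;    /* number of allowed key values per partition */
        Datum     **values;     /* allowed key values, one array per partition */
    } ListPartitionDesc;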
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
On Mon, Dec 8, 2014 at 3:42 PM, Rahila Syed wrote: > >>The important point to consider for this patch is the use of the additional >> 2-bytes as uint16 in the block information structure to save the length of a >> compressed >>block, which may be compressed without its hole to achieve a double level >> of compression (image compressed without its hole). We may use a simple flag >> on >>one or two bits using for example a bit from hole_length, but in this case >> we would need to always compress images with their hole included, something >> more > >expensive as the compression would take more time. > As you have mentioned here the idea to use bits from existing fields rather > than adding additional 2 bytes in header, > FWIW elaborating slightly on the way it was done in the initial patches, > We can use the following struct > > unsigned hole_offset:15, > compress_flag:2, > hole_length:15; > > Here compress_flag can be 0 or 1 depending on status of compression. We can > reduce the compress_flag to just 1 bit flag. Just adding that this is fine as the largest page size that can be set is 32k. > IIUC, the purpose of adding compress_len field in the latest patch is to > store length of compressed blocks which is used at the time of decoding the > blocks. > > With this approach, length of compressed block can be stored in hole_length > as, > > hole_length = BLCKSZ - compress_len. > > Thus, hole_length can serve the purpose of storing length of a compressed > block without the need of additional 2-bytes. In DecodeXLogRecord, > hole_length can be used for tracking the length of data received in cases of > both compressed as well as uncompressed blocks. > > As you already mentioned, this will need compressing images with hole but > we can MemSet hole to 0 in order to make compression of hole less expensive > and effective. Thanks for coming back to this point in more detail, this is very important. The additional 2 bytes make compression less expensive by letting us ignore the hole, at the cost of a bit more data in each record. Using uint16 also makes for cleaner code, more in line with the other fields, but that's a personal opinion ;) Switching from one approach to the other is easy enough though, so let's see what others think. Regards, -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
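To make the trade-off easier to follow, the two layouts under discussion can be sketched side by side as below; both structs are illustrative only (PostgreSQL's uint16 from c.h assumed) and are not the actual definitions used by the patch.

    /*
     * Option A (current patch): an explicit 2-byte compressed length, which
     * allows compressing the image with its hole removed.
     */
    typedef struct BlkImageHeaderWithLen
    {
        uint16      hole_offset;
        uint16      hole_length;
        uint16      compress_len;   /* 0 means the image is not compressed */
    } BlkImageHeaderWithLen;

    /*
     * Option B (bit-field flag): no extra bytes, but the image must be
     * compressed with its hole included; when compress_flag is set,
     * hole_length is reused to store BLCKSZ - compress_len.  15 bits per
     * field are enough because the largest supported page size is 32k.
     */
    typedef struct BlkImageHeaderWithFlag
    {
        unsigned    hole_offset:15,
                    compress_flag:1,
                    hole_length:15;
    } BlkImageHeaderWithFlag;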
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
>The important point to consider for this patch is the use of the additional 2-bytes as uint16 in the block information structure to save the length of a compressed >block, which may be compressed without its hole to achieve a double level of compression (image compressed without its hole). We may use a simple flag on >one or two bits using for example a bit from hole_length, but in this case we would need to always compress images with their hole included, something more >expensive as the compression would take more time. As you have mentioned here the idea to use bits from existing fields rather than adding additional 2 bytes in header, FWIW elaborating slightly on the way it was done in the initial patches, We can use the following struct unsignedhole_offset:15, compress_flag:2, hole_length:15; Here compress_flag can be 0 or 1 depending on status of compression. We can reduce the compress_flag to just 1 bit flag. IIUC, the purpose of adding compress_len field in the latest patch is to store length of compressed blocks which is used at the time of decoding the blocks. With this approach, length of compressed block can be stored in hole_length as, hole_length = BLCKSZ - compress_len. Thus, hole_length can serve the purpose of storing length of a compressed block without the need of additional 2-bytes. In DecodeXLogRecord, hole_length can be used for tracking the length of data received in cases of both compressed as well as uncompressed blocks. As you already mentioned, this will need compressing images with hole but we can MemSet hole to 0 in order to make compression of hole less expensive and effective. Thank you, Rahila Syed On Sat, Dec 6, 2014 at 7:37 PM, Michael Paquier wrote: > > On Sat, Dec 6, 2014 at 12:17 AM, Andres Freund > wrote: > >> On 2014-12-06 00:10:11 +0900, Michael Paquier wrote: >> > On Sat, Dec 6, 2014 at 12:06 AM, Michael Paquier < >> michael.paqu...@gmail.com> >> > wrote: >> > >> > > >> > > >> > > >> > > On Fri, Dec 5, 2014 at 11:10 PM, Rahila Syed > > >> > > wrote: >> > > >> > >> I attempted quick review and could not come up with much except this >> > >> >> > >> + /* >> > >> +* Calculate the amount of FPI data in the record. Each backup >> block >> > >> +* takes up BLCKSZ bytes, minus the "hole" length. >> > >> +* >> > >> +* XXX: We peek into xlogreader's private decoded backup blocks >> for >> > >> the >> > >> +* hole_length. It doesn't seem worth it to add an accessor >> macro for >> > >> +* this. >> > >> +*/ >> > >> + fpi_len = 0; >> > >> + for (block_id = 0; block_id <= record->max_block_id; block_id++) >> > >> + { >> > >> + if (XLogRecHasCompressedBlockImage(record, block_id)) >> > >> + fpi_len += BLCKSZ - >> record->blocks[block_id].compress_len; >> > >> >> > >> >> > >> IIUC, fpi_len in case of compressed block image should be >> > >> >> > >> fpi_len = record->blocks[block_id].compress_len; >> > >> >> > > Yep, true. Patches need a rebase btw as Heikki fixed a commit related >> to >> > > the stats of pg_xlogdump. >> > > >> > >> > In any case, any opinions to switch this patch as "Ready for committer"? >> >> Needing a rebase is a obvious conflict to that... But I guess some wider >> looks afterwards won't hurt. >> > > Here are rebased versions, which are patches 1 and 2. And I am switching > as well the patch to "Ready for Committer". 
The important point to consider > for this patch is the use of the additional 2-bytes as uint16 in the block > information structure to save the length of a compressed block, which may > be compressed without its hole to achieve a double level of compression > (image compressed without its hole). We may use a simple flag on one or two > bits using for example a bit from hole_length, but in this case we would > need to always compress images with their hole included, something more > expensive as the compression would take more time. > > Robert wrote: > > What I would suggest is instrument the backend with getrusage() at > > startup and shutdown and have it print the difference in user time and > > system time. Then, run tests for a fixed number of transactions and > > see how the total CPU usage for the run differs. > That's a nice idea, which is done with patch 3 as a simple hack calling > twice getrusage at the beginning of PostgresMain and before proc_exit, > calculating the difference time and logging it for each process (used as > well log_line_prefix with %p). > > Then I just did a small test with a load of a pgbench-scale-100 database > on fresh instances: > 1) Compression = on: > Stop LSN: 0/487E49B8 > getrusage: proc 11163: LOG: user diff: 63.071127, system diff: 10.898386 > pg_xlogdump: FPI size: 122296653 [90.52%] > 2) Compression = off > Stop LSN: 0/4E54EB88 > Result: proc 11648: LOG: user diff: 43.855212, system diff: 7.857965 > pg_xlogdump: FPI size: 204359192 [94.10%] > And the CPU consumption is showing quite some d
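The getrusage() instrumentation described above can be sketched roughly as follows; this is not the actual hack in patch 3, just an illustration of the measurement, assuming PostgreSQL's elog() is available.

    #include "postgres.h"

    #include <sys/time.h>
    #include <sys/resource.h>

    static struct rusage start_usage;

    /* call once near the top of PostgresMain() */
    static void
    usage_start(void)
    {
        getrusage(RUSAGE_SELF, &start_usage);
    }

    /* call once just before proc_exit() */
    static void
    usage_report(void)
    {
        struct rusage end_usage;
        double      user_diff;
        double      sys_diff;

        getrusage(RUSAGE_SELF, &end_usage);
        user_diff = (end_usage.ru_utime.tv_sec - start_usage.ru_utime.tv_sec) +
            (end_usage.ru_utime.tv_usec - start_usage.ru_utime.tv_usec) / 1000000.0;
        sys_diff = (end_usage.ru_stime.tv_sec - start_usage.ru_stime.tv_sec) +
            (end_usage.ru_stime.tv_usec - start_usage.ru_stime.tv_usec) / 1000000.0;

        elog(LOG, "user diff: %f, system diff: %f", user_diff, sys_diff);
    }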
Re: [HACKERS] On partitioning
On Mon, Dec 8, 2014 at 11:01 AM, Amit Langote wrote: > From: Amit Kapila [mailto:amit.kapil...@gmail.com] > Sent: Saturday, December 06, 2014 5:00 PM > To: Robert Haas > Cc: Amit Langote; Andres Freund; Alvaro Herrera; Bruce Momjian; Pg Hackers > Subject: Re: [HACKERS] On partitioning > > On Fri, Dec 5, 2014 at 10:03 PM, Robert Haas wrote: > > On Tue, Dec 2, 2014 at 10:43 PM, Amit Langote > > wrote: > > > > > I wonder if your suggestion of pg_node_tree plays well here. This then could be a list of CONSTs or some such... And I am thinking it's a concern only for range partitions, no? (that is, a multicolumn partition key) > > > > I guess you could list or hash partition on multiple columns, too. > > > > How would you distinguish values in list partition for multiple > > columns? I mean for range partition, we are sure there will > > be either one value for each column, but for list it could > > be multiple and not fixed for each partition, so I think it will not > > be easy to support the multicolumn partition key for list > > partitions. > > Irrespective of difficulties of representing it using pg_node_tree, it seems to me that multicolumn list partitioning is not widely used. So I think it is better to be clear why we are not planning to support it, is it that because it is not required by users or is it due to the reason that code seems to be tricky or is it due to both of the reasons. It might help us if anyone raises this during the development of this patch or in general if someone requests such a feature. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Role Attribute Bitmask Catalog Representation
On Mon, Dec 8, 2014 at 12:34 PM, Adam Brightwell wrote: > Michael, > >> This patch (https://commitfest.postgresql.org/action/patch_view?id=1616) >> has not been updated in the commitfest app for two months, making its >> progress hard to track. > > > I believe that the mentioned patch should be considered 'on hold' or > 'dependent' upon the acceptance of the work that is being done in this > thread. > > Also, the changes proposed by this thread have already been added to the > next commitfest > (https://commitfest.postgresql.org/action/patch_view?id=1651). > >> However, even if some progress has been made >> things are still not complete (documentation, etc.). Opinions about >> marking that as returned with feedback for the current commit fest and >> create a new entry for the next CF if this work is going to be >> pursued? >> >> I guess that it would be fine simply move it to the next CF, but it >> seems I cannot do that myself in the CF app. > > This work will certainly continue to be pursued. If a simple move is > possible/acceptable, then I think that would be the best option. I can > handle that as it would appear that I am capable of moving it, if that is > acceptable. Yes please. Actually I could have done it, just found the option to do so. Let's see what shows up with your work. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] inherit support for foreign tables
On Sat, Dec 6, 2014 at 9:21 PM, Noah Misch wrote: > On Thu, Dec 04, 2014 at 10:00:14AM +0530, Ashutosh Bapat wrote: > > On Thu, Dec 4, 2014 at 9:05 AM, Etsuro Fujita < > fujita.ets...@lab.ntt.co.jp> wrote: > > > (2014/12/03 19:35), Ashutosh Bapat wrote: > > >> On Tue, Dec 2, 2014 at 8:29 AM, Etsuro Fujita > > >> mailto:fujita.ets...@lab.ntt.co.jp>> > wrote: > > > IIUC, even the transactions over the local and the *single* remote > > >> server are not guaranteed to be executed atomically in the current > > >> form. It is possible that the remote transaction succeeds and the > > >> local one fails, for example, resulting in data inconsistency > > >> between the local and the remote. > > >> > > > > > > IIUC, while committing transactions involving a single remote server, > > >> the steps taken are as follows > > >> 1. the local changes are brought to PRE-COMMIT stage, which means that > > >> the transaction *will* succeed locally after successful completion of > > >> this phase, > > >> 2. COMMIT message is sent to the foreign server > > >> 3. If step two succeeds, local changes are committed and successful > > >> commit is conveyed to the client > > >> 4. if step two fails, local changes are rolled back and abort status > is > > >> conveyed to the client > > >> 5. If step 1 itself fails, the remote changes are rolled back. > > >> This is as per one phase commit protocol which guarantees ACID for > > >> single foreign data source. So, the changes involving local and a > single > > >> foreign server seem to be atomic and consistent. > > >> > > > > > > Really? Maybe I'm missing something, but I don't think the current > > > implementation for committing transactions has such a mechanism stated > in > > > step 1. So, I think it's possible that the local transaction fails in > > > step3 while the remote transaction succeeds, as mentioned above. > > > > > > > > PFA a script attached which shows this. You may want to check the code in > > pgfdw_xact_callback() for actions taken by postgres_fdw on various > events. > > CommitTransaction() for how those events are generated. The code there > > complies with the sequence above. > > While postgres_fdw delays remote commit to eliminate many causes for having > one server commit while another aborts, other causes remain. The local > transaction could still fail due to WAL I/O problems or a system crash. > 2PC > is the reliable answer, but that was explicitly out of scope for the > initial > postgres_fdw write support. Does this inheritance patch add any atomicity > problem that goes away when one breaks up the inheritance hierarchy and > UPDATEs each table separately? If not, this limitation is okay. > If the UPDATES crafted after breaking up the inheritance hierarchy are needed to be run within the same transaction (as the UPDATE on inheritance hierarchy would do), yes, there is atomicity problem. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
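As a rough illustration of the ordering described in the quoted steps (and only that; the real logic lives in pgfdw_xact_callback() and CommitTransaction()), the single-remote-server case boils down to something like this sketch, with error handling elided.

    #include <stdbool.h>
    #include "libpq-fe.h"

    /*
     * Sketch only: one-phase commit ordering for a transaction touching the
     * local server plus a single foreign server.  Not the actual
     * postgres_fdw code.
     */
    static bool
    commit_with_single_remote(PGconn *remote, bool local_precommit_ok)
    {
        PGresult   *res;
        bool        ok;

        if (!local_precommit_ok)
        {
            /* step 5: local PRE-COMMIT failed, roll back the remote changes */
            PQclear(PQexec(remote, "ABORT TRANSACTION"));
            return false;
        }

        /* step 2: local changes are at PRE-COMMIT; ask the remote to commit */
        res = PQexec(remote, "COMMIT TRANSACTION");
        ok = (PQresultStatus(res) == PGRES_COMMAND_OK);
        PQclear(res);

        /*
         * step 3: on success, the local commit follows and success is reported;
         * step 4: on failure, the local changes are rolled back instead.
         */
        return ok;
    }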
Re: [HACKERS] On partitioning
From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Amit Kapila Sent: Saturday, December 06, 2014 5:06 PM To: Robert Haas Cc: Amit Langote; Andres Freund; Alvaro Herrera; Bruce Momjian; Pg Hackers Subject: Re: [HACKERS] On partitioning On Fri, Dec 5, 2014 at 10:12 PM, Robert Haas wrote: > On Fri, Dec 5, 2014 at 2:18 AM, Amit Kapila wrote: > > Do we really need to support dml or pg_dump for individual partitions? > > I think we do. It's quite reasonable for a DBA (or developer or > whatever) to want to dump all the data that's in a single partition; > for example, maybe they have the table partitioned, but also spread > across several servers. When the data on one machine grows too big, > they want to dump that partition, move it to a new machine, and drop > the partition from the old machine. That needs to be easy and > efficient. > > More generally, with inheritance, I've seen the ability to reference > individual inheritance children be a real life-saver on any number of > occasions. Now, a new partitioning system that is not as clunky as > constraint exclusion will hopefully be fast enough that people don't > need to do it very often any more. But I would be really cautious > about removing the option. That is the equivalent of installing a new > fire suppression system and then boarding up the emergency exit. > Yeah, you *hope* the new fire suppression system is good enough that > nobody will ever need to go out that way any more. But if you're > wrong, people will die, so getting rid of it isn't prudent. The > stakes are not quite so high here, but the principle is the same. > > > Sure, I don't feel we should not provide anyway to take dump > for individual partition but not at level of independent table. > May be something like --table > --partition . > This does sound cleaner. > In general, I think we should try to avoid exposing that partitions are > individual tables as that might hinder any future enhancement in that > area (example if we someone finds a different and better way to > arrange the partition data, then due to the currently exposed syntax, > we might feel blocked). Sounds like a concern. I guess you are referring to whether we allow a partition relation to be included in the range table and then some other cases. In the former case we could allow referring to individual partitions by some additional syntax if it doesn’t end up looking too ugly or invite a bunch of other issues. This seems to have been discussed a little bit upthread (for example, see "Open Questions" in Alvaro's original proposal and Hannu Krosing's reply). Regards, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On partitioning
From: Amit Kapila [mailto:amit.kapil...@gmail.com] Sent: Saturday, December 06, 2014 5:00 PM To: Robert Haas Cc: Amit Langote; Andres Freund; Alvaro Herrera; Bruce Momjian; Pg Hackers Subject: Re: [HACKERS] On partitioning On Fri, Dec 5, 2014 at 10:03 PM, Robert Haas wrote: > On Tue, Dec 2, 2014 at 10:43 PM, Amit Langote > wrote: > > > I wonder if your suggestion of pg_node_tree plays well here. This then > > could be a list of CONSTs or some such... And I am thinking it's a concern > > only for range partitions, no? (that is, a multicolumn partition key) > > I guess you could list or hash partition on multiple columns, too. > > How would you distinguish values in list partition for multiple > columns? I mean for range partition, we are sure there will > be either one value for each column, but for list it could > be multiple and not fixed for each partition, so I think it will not > be easy to support the multicolumn partition key for list > partitions. Irrespective of difficulties of representing it using pg_node_tree, it seems to me that multicolumn list partitioning is not widely used. It is used in combination with range or hash partitioning as composite partitioning. So, perhaps we need not worry about that. Regards, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] On partitioning
Hi Robert, > From: Robert Haas [mailto:robertmh...@gmail.com] > On Tue, Dec 2, 2014 at 10:43 PM, Amit Langote > wrote: > >> So, we're going to support exactly two levels of partitioning? > >> partitions with partissub=false and subpartitions with partissub=true? > >> Why not support only one level of partitioning here but then let the > >> children have their own pg_partitioned_rel entries if they are > >> subpartitioned? That seems like a cleaner design and lets us support > >> an arbitrary number of partitioning levels if we ever need them. > > > > Yeah, that's what I thought at some point in favour of dropping partissub > altogether. However, not that this design solves it, there is one question - > if > we would want to support defining for a table both partition key and sub- > partition key in advance? That is, without having defined a first level > partition > yet; in that case, what level do we associate sub-(sub-) partitioning key with > or more to the point where do we keep it? > > Do we really need to allow that? I think you let people partition a > toplevel table, and then partition its partitions once they've been > created. I'm not sure there's a good reason to associate the > subpartitioning scheme with the toplevel table. For one thing, that > forces all subpartitions to be partitioned the same way - do we want > to insist on that? If we do, then I agree that we need to think a > little harder here. > To me, it sounds better if we insist on a uniform subpartitioning scheme across all partitions. It seems that's how it's done elsewhere. It would be interesting to hear what others think though. > > That would be a default partition. That is, where the tuples that don't > belong elsewhere (other defined partitions) go. VALUES clause of the > definition for such a partition would look like: > > > > (a range partition) ... VALUES LESS THAN MAXVALUE > > (a list partition) ... VALUES DEFAULT > > > > There has been discussion about whether there shouldn't be such a place > for tuples to go. That is, it should generate an error if a tuple can't go > anywhere (or support auto-creating a new one like in interval partitioning?) > > I think Alvaro's response further down the thread is right on target. > But to go into a bit more detail, let's consider the three possible > cases: > > - Hash partitioning. Every key value gets hashed to some partition. > The concept of an overflow or default partition doesn't even make > sense. > > - List partitioning. Each key for which the user has defined a > mapping gets sent to the corresponding partition. The keys that > aren't mapped anywhere can either (a) cause an error or (b) get mapped > to some default partition. It's probably useful to offer both > behaviors. But I don't think it requires a partitionisoverflow > column, because you can represent it some other way, such as by making > partitionvalues NULL, which is otherwise meaningless. > > - Range partitioning. In this case, what you've basically got is a > list of partition bounds and a list of target partitions. Suppose > there are N partition bounds; then there will be N+1 targets. Some of > those targets can be undefined, meaning an attempt to insert a key > with that value will error out. For example, suppose the user defines > a partition for values 1-3 and 10-13. 
Then your list of partition > bounds looks like this: > > 1,3,10,13 > > And your list of destinations looks like this: > > undefined,firstpartition,undefined,secondpartition,undefined > > More commonly, the ranges will be contiguous, so that there are no > gaps. If you have everything <10 in the first partition, everything > 10-20 in the second partition, and everything else in a third > partition, then you have bounds 10,20 and destinations > firstpartition,secondpartition,thirdpartition. If you want values > greater than 20 to error out, then you have bounds 10,20 and > destinations firstpartition,secondpartition,undefined. > > In none of this do you really have "an overflow partition". Rather, > the first and last destinations, if defined, catch everything that has > a key lower than the lowest key or higher than the highest key. If > not defined, you error out. So just to clarify, first and last destinations are considered "defined" if you have something like: ... PARTITION p1 VALUES LESS THAN 10 PARTITION p2 VALUES BETWEEN 10 AND 20 PARTITION p3 VALUES GREATER THAN 20 ... And "not defined" if: ... PARTITION p1 VALUES BETWEEN 10 AND 20 ... In the second case, because no explicit definitions for values less than 10 and greater than 20 are in place, rows with that value error out? If so, that makes sense. > > > I wonder if your suggestion of pg_node_tree plays well here. This then > could be a list of CONSTs or some such... And I am thinking it's a concern > only > for range partitions, no? (that is, a multicolumn partition key) > > I guess you could list or hash partition on multiple colum
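To make the bounds/destinations picture concrete, locating a partition could be sketched as below; data types, boundary inclusivity and error reporting are simplified for illustration and are not from any patch.

    #include "postgres.h"

    /*
     * Sketch only: given N sorted partition bounds and N+1 destinations,
     * find the partition for a key.  InvalidOid marks an "undefined"
     * destination, which makes the insertion error out.
     */
    static Oid
    range_partition_for_key(int32 key, const int32 *bounds,
                            const Oid *dests, int nbounds)
    {
        int         lo = 0;
        int         hi = nbounds;

        /* binary search: count how many bounds are less than or equal to key */
        while (lo < hi)
        {
            int         mid = (lo + hi) / 2;

            if (key < bounds[mid])
                hi = mid;
            else
                lo = mid + 1;
        }

        /* dests[i] covers keys in [bounds[i - 1], bounds[i]) */
        if (!OidIsValid(dests[lo]))
            elog(ERROR, "no partition defined for key %d", key);

        return dests[lo];
    }

With the example bounds 1,3,10,13 and destinations undefined,firstpartition,undefined,secondpartition,undefined, a key of 2 lands in firstpartition, 12 in secondpartition, and 0, 5 or 20 error out.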
Re: [HACKERS] Parallel Seq Scan
On Sat, Dec 6, 2014 at 5:37 PM, Stephen Frost wrote: > > * Amit Kapila (amit.kapil...@gmail.com) wrote: > > 1. As the patch currently stands, it just shares the relevant > > data (like relid, target list, block range each worker should > > perform on etc.) to the worker and then worker receives that > > data and form the planned statement which it will execute and > > send the results back to master backend. So the question > > here is do you think it is reasonable or should we try to form > > the complete plan for each worker and then share the same > > and may be other information as well like range table entries > > which are required. My personal gut feeling in this matter > > is that for long term it might be better to form the complete > > plan of each worker in master and share the same, however > > I think the current way as done in patch (okay that needs > > some improvement) is also not bad and quite easier to implement. > > For my 2c, I'd like to see it support exactly what the SeqScan node > supports and then also what Foreign Scan supports. That would mean we'd > then be able to push filtering down to the workers which would be great. > Even better would be figuring out how to parallelize an Append node > (perhaps only possible when the nodes underneath are all SeqScan or > ForeignScan nodes) since that would allow us to then parallelize the > work across multiple tables and remote servers. > > One of the big reasons why I was asking about performance data is that, > today, we can't easily split a single relation across multiple i/o > channels. Sure, we can use RAID and get the i/o channel that the table > sits on faster than a single disk and possibly fast enough that a single > CPU can't keep up, but that's not quite the same. The historical > recommendations for Hadoop nodes is around one CPU per drive (of course, > it'll depend on workload, etc, etc, but still) and while there's still a > lot of testing, etc, to be done before we can be sure about the 'right' > answer for PG (and it'll also vary based on workload, etc), that strikes > me as a pretty reasonable rule-of-thumb to go on. > > Of course, I'm aware that this won't be as easy to implement.. > > > 2. Next question related to above is what should be the > > output of ExplainPlan, as currently worker is responsible > > for forming its own plan, Explain Plan is not able to show > > the detailed plan for each worker, is that okay? > > I'm not entirely following this. How can the worker be responsible for > its own "plan" when the information passed to it (per the above > paragraph..) is pretty minimal? Because for a simple sequence scan that much information is sufficient, basically if we have scanrelid, target list, qual and then RTE (primarily relOid), then worker can form and perform sequence scan. > In general, I don't think we need to > have specifics like "this worker is going to do exactly X" because we > will eventually need some communication to happen between the worker and > the master process where the worker can ask for more work because it's > finished what it was tasked with and the master will need to give it > another chunk of work to do. I don't think we want exactly what each > worker process will do to be fully formed at the outset because, even > with the best information available, given concurrent load on the > system, it's not going to be perfect and we'll end up starving workers. 
> The plan, as formed by the master, should be more along the lines of > "this is what I'm gonna have my workers do" along w/ how many workers, > etc, and then it goes and does it. I think here you want to say that work allocation for workers should be dynamic rather fixed which I think makes sense, however we can try such an optimization after some initial performance data. > Perhaps for an 'explain analyze' we > return information about what workers actually *did* what, but that's a > whole different discussion. > Agreed. > > 3. Some places where optimizations are possible: > > - Currently after getting the tuple from heap, it is deformed by > > worker and sent via message queue to master backend, master > > backend then forms the tuple and send it to upper layer which > > before sending it to frontend again deforms it via slot_getallattrs(slot). > > If this is done as I was proposing above, we might be able to avoid > this, but I don't know that it's a huge issue either way.. The bigger > issue is getting the filtering pushed down. > > > - Master backend currently receives the data from multiple workers > > serially. We can optimize in a way that it can check other queues, > > if there is no data in current queue. > > Yes, this is pretty critical. In fact, it's one of the recommendations > I made previously about how to change the Append node to parallelize > Foreign Scan node work. > > > - Master backend is just responsible for coordination among workers > > It shares the required information to workers and then f
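For concreteness, the "relevant data" shared with each worker, as described above, could be imagined as something like the struct below; the names and fields are purely illustrative and are not the structures actually used by the patch.

    #include "postgres.h"
    #include "storage/block.h"

    /*
     * Illustrative sketch only: minimal per-worker task description for a
     * parallel sequential scan; the serialized node trees are assumed to be
     * produced with nodeToString().
     */
    typedef struct WorkerSeqScanTask
    {
        Oid         relid;          /* relation to scan */
        BlockNumber start_block;    /* first block assigned to this worker */
        BlockNumber end_block;      /* one past the last assigned block */
        char       *targetlist;     /* serialized target list */
        char       *qual;           /* serialized quals, if filtering is pushed down */
    } WorkerSeqScanTask;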
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Mon, Dec 8, 2014 at 7:33 AM, Dilip kumar wrote: > On 06 December 2014 20:01 Amit Kapila Wrote > > >I wanted to understand what exactly the above loop is doing. > > > > >a. > > >first of all the comment on top of it says "Some of the slot > > >are free, ...", if some slot is free, then why do you want > > >to process the results? (Do you mean to say that *None* of > > >the slot is free?) > > > > This comment is wrong, I will remove this. > I suggest rather than removing, edit the comment to indicate the idea behind code at that place. > >b. > > >IIUC, you have called function select_loop(maxFd, &slotset) > > >to check if socket descriptor is readable, if yes then why > > >in do..while loop the same maxFd is checked always, don't > > >you want to check different socket descriptors? I am not sure > > >if I am missing something here > > > > select_loop(maxFd, &slotset) > > > So it’s not only for a maxFd, it’s for all the descriptors. And it’s in do..while loop, because it possible that select_loop come out because of some intermediate message on any of the socket but still query is not complete, > Okay, I think this part of code is somewhat similar to what is done in pg_dump/parallel.c with some differences related to handling of inAbort. One thing I have noticed here which could lead to a problem is that caller of select_loop() function assumes that return value is less than zero only if there is a cancel request which I think is wrong, because select system call could also return -1 in case of error. I am referring below code in above context: + if (i < 0) + { + /* + * This can only happen if user has sent the cancel request using + * Ctrl+C, Cancel is handled by 0th slot, so fetch the error result. + */ + + GetQueryResult(pSlot[0].connection, dbname, progname, +completedb); With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
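One possible way to address that, loosely modeled on pg_dump/parallel.c and sketched below, is to let the caller distinguish a cancel request from a genuine select() failure; the cancel flag and its handling are assumptions made for this illustration.

    #include <errno.h>
    #include <stdbool.h>
    #include <sys/select.h>

    /* assumed to be set by a SIGINT handler when the user presses Ctrl+C */
    extern volatile bool cancel_requested;

    /*
     * Sketch only: return the select() result, retrying on unrelated signals,
     * and report through *aborted whether a user cancel caused the negative
     * return.  A negative return with *aborted == false is a real error.
     */
    static int
    select_loop(int maxFd, fd_set *workerset, bool *aborted)
    {
        fd_set      saveSet = *workerset;
        int         i;

        for (;;)
        {
            *workerset = saveSet;
            i = select(maxFd + 1, workerset, NULL, NULL, NULL);

            if (i < 0 && errno == EINTR)
            {
                if (cancel_requested)
                {
                    *aborted = true;    /* user cancel, not a select() failure */
                    return -1;
                }
                continue;               /* some other signal; retry */
            }

            *aborted = false;
            return i;
        }
    }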
Re: [HACKERS] Role Attribute Bitmask Catalog Representation
Michael, This patch (https://commitfest.postgresql.org/action/patch_view?id=1616) > has not been updated in the commitfest app for two months, making its > progress hard to track. I believe that the mentioned patch should be considered 'on hold' or 'dependent' upon the acceptance of the work that is being done in this thread. Also, the changes proposed by this thread have already been added to the next commitfest (https://commitfest.postgresql.org/action/patch_view?id=1651 ). However, even if some progress has been made > things are still not complete (documentation, etc.). Opinions about > marking that as returned with feedback for the current commit fest and > create a new entry for the next CF if this work is going to be > pursued? I guess that it would be fine simply move it to the next CF, but it > seems I cannot do that myself in the CF app. > This work will certainly continue to be pursued. If a simple move is possible/acceptable, then I think that would be the best option. I can handle that as it would appear that I am capable of moving it, if that is acceptable. Thanks, Adam -- Adam Brightwell - adam.brightw...@crunchydatasolutions.com Database Engineer - www.crunchydatasolutions.com
Re: [HACKERS] Compression of full-page-writes
On Mon, Dec 8, 2014 at 11:30 AM, Simon Riggs wrote: > * parameter should be SUSET - it doesn't *need* to be set only at > server start since all records are independent of each other Check. > * ideally we'd like to be able to differentiate the types of usage. > which then allows the user to control the level of compression > depending upon the type of action. My first cut at what those settings > should be are ALL > LOGICAL > PHYSICAL > VACUUM. > VACUUM - only compress while running vacuum commands > PHYSICAL - only compress while running physical DDL commands (ALTER > TABLE set tablespace, CREATE INDEX), i.e. those that wouldn't > typically be used for logical decoding > LOGICAL - compress FPIs for record types that change tables > ALL - all user commands > (each level includes all prior levels) Well, that's clearly an optimization so I don't think this should be done for a first shot but those are interesting fresh ideas. Technically speaking, note that we would need to support such things with a new API to switch a new context flag in registered_buffers of xloginsert.c for each block, and decide if the block is compressed based on this context flag, and the compression level wanted. > * name should not be wal_compression - we're not compressing all wal > records, just fpis. There is no evidence that we even want to compress > other record types, nor that our compression mechanism is effective at > doing so. Simple => keep name as compress_full_page_writes > Though perhaps we should have it called wal_compression_level I don't really like those new names, but I'd prefer wal_compression_level if we go down that road with 'none' as default value. We may still decide in the future to support compression at the record level instead of context level, particularly if we have an API able to do palloc_return_null_at_oom, so the idea of WAL compression is not related only to FPIs IMHO. Regards, -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Testing DDL deparsing support
On 14/12/07 12:43, Bruce Momjian wrote: > On Tue, Dec 2, 2014 at 03:13:07PM -0300, Alvaro Herrera wrote: >> Robert Haas wrote: >>> On Thu, Nov 27, 2014 at 11:43 PM, Ian Barwick wrote: >> A simple schedule to demonstrate this is available; execute from the src/test/regress/ directory like this: ./pg_regress \ --temp-install=./tmp_check \ --top-builddir=../../.. \ --dlpath=. \ --schedule=./schedule_ddl_deparse_demo >>> >>> I haven't read the code, but this concept seems good to me. >> >> Excellent, thanks. >> >>> It has the unfortunate weakness that a difference could exist during >>> the *middle* of the regression test run that is gone by the *end* of >>> the run, but our existing pg_upgrade testing has the same weakness, so >>> I guess we can view this as one more reason not to be too aggressive >>> about having regression tests drop the unshared objects they create. >> >> Agreed. Not dropping objects also helps test pg_dump itself; the normal >> procedure there is run the regression tests, then pg_dump the regression >> database. Objects that are dropped never exercise their corresponding >> pg_dump support code, which I think is a bad thing. I think we should >> institute a policy that regression tests must keep the objects they >> create; maybe not all of them, but at least a sample large enough to >> cover all interesting possibilities. > > This causes creation DDL is checked if it is used in the regression > database, but what about ALTER and DROP? pg_dump doesn't issue those, > except in special cases like inheritance. Sure, pg_dump won't contain ALTER/DROP DDL; we are using pg_dump after replaying the DDL commands to compare the actual state of the database with the expected state. As I'm in the middle of writing these tests, before I go any further do you accept the tests need to be included? Regards Ian Barwick -- Ian Barwick http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, RemoteDBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Compression of full-page-writes
> On Thu, Dec 4, 2014 at 8:37 PM, Michael Paquier wrote > I pondered something that Andres mentioned upthread: we may not do the >compression in WAL record only for blocks, but also at record level. Hence >joining the two ideas together I think that we should definitely have >a different >GUC to control the feature, consistently for all the images. Let's call it >wal_compression, with the following possible values: >- on, meaning that a maximum of compression is done, for this feature >basically full_page_writes = on. >- full_page_writes, meaning that full page writes are compressed >- off, default value, to disable completely the feature. >This would let room for another mode: 'record', to completely compress >a record. For now though, I think that a simple on/off switch would be >fine for this patch. Let's keep things simple. +1 for a separate parameter for compression Some changed thoughts to the above * parameter should be SUSET - it doesn't *need* to be set only at server start since all records are independent of each other * ideally we'd like to be able to differentiate the types of usage. which then allows the user to control the level of compression depending upon the type of action. My first cut at what those settings should be are ALL > LOGICAL > PHYSICAL > VACUUM. VACUUM - only compress while running vacuum commands PHYSICAL - only compress while running physical DDL commands (ALTER TABLE set tablespace, CREATE INDEX), i.e. those that wouldn't typically be used for logical decoding LOGICAL - compress FPIs for record types that change tables ALL - all user commands (each level includes all prior levels) * name should not be wal_compression - we're not compressing all wal records, just fpis. There is no evidence that we even want to compress other record types, nor that our compression mechanism is effective at doing so. Simple => keep name as compress_full_page_writes Though perhaps we should have it called wal_compression_level -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
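If the parameter ends up as an enum GUC, the wiring could look roughly like the sketch below; the value names simply mirror the levels proposed above and do not come from any committed patch.

    #include "postgres.h"
    #include "utils/guc.h"

    /* Illustrative only: possible levels for wal_compression_level. */
    typedef enum WalCompressionLevel
    {
        WAL_COMPRESSION_NONE = 0,
        WAL_COMPRESSION_VACUUM,     /* only vacuum commands */
        WAL_COMPRESSION_PHYSICAL,   /* plus physical DDL commands */
        WAL_COMPRESSION_LOGICAL,    /* plus record types that change tables */
        WAL_COMPRESSION_ALL         /* all user commands */
    } WalCompressionLevel;

    static const struct config_enum_entry wal_compression_level_options[] = {
        {"none", WAL_COMPRESSION_NONE, false},
        {"vacuum", WAL_COMPRESSION_VACUUM, false},
        {"physical", WAL_COMPRESSION_PHYSICAL, false},
        {"logical", WAL_COMPRESSION_LOGICAL, false},
        {"all", WAL_COMPRESSION_ALL, false},
        {NULL, 0, false}
    };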
Re: [HACKERS] inherit support for foreign tables
(2014/12/07 2:02), David Fetter wrote: On Thu, Dec 04, 2014 at 12:35:54PM +0900, Etsuro Fujita wrote: But I think there would be another idea. An example will be shown below. We show the update commands below the ModifyTable node, not above the corresponding ForeignScan nodes, so maybe less confusing. If there are no objections of you and others, I'll update the patch this way. postgres=# explain verbose update parent set a = a * 2 where a = 5; QUERY PLAN - Update on public.parent (cost=0.00..280.77 rows=25 width=10) On public.ft1 Remote SQL: UPDATE public.mytable_1 SET a = $2 WHERE ctid = $1 ^^ It occurs to me that the command generated by the FDW might well not be SQL at all, as is the case with file_fdw and anything else that talks to a NoSQL engine. Would it be reasonable to call this "Remote command" or something similarly generic? Yeah, but I'd like to propose that this line is shown by the FDW API (ie, ExplainForeignModify) as in non-inherited update cases, so that the FDW developer can choose the right name. As for "On public.ft1", I'd like to propose that the FDW API also show that by calling a function for that introduced into the PG core (Would it be better to use "For" rather than "On"?). Sorry, my explanation was not enough. Best regards, Etsuro Fujita -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
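As a sketch of how an FDW could emit such a line itself through the existing EXPLAIN callback, something like the following would do; the fdw_private layout and the generic "Remote command" label are assumptions made for this example.

    #include "postgres.h"
    #include "commands/explain.h"
    #include "nodes/execnodes.h"
    #include "nodes/pg_list.h"
    #include "nodes/value.h"

    /*
     * Sketch only: an FDW's ExplainForeignModify callback printing a generic
     * "Remote command" line in verbose mode.  The position of the command
     * string inside fdw_private is an assumption made for this example.
     */
    static void
    myExplainForeignModify(ModifyTableState *mtstate,
                           ResultRelInfo *rinfo,
                           List *fdw_private,
                           int subplan_index,
                           ExplainState *es)
    {
        if (es->verbose)
        {
            char       *cmd = strVal(list_nth(fdw_private, 0));

            ExplainPropertyText("Remote command", cmd, es);
        }
    }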
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 06 December 2014 20:01 Amit Kapila Wrote >I wanted to understand what exactly the above loop is doing. >a. >first of all the comment on top of it says "Some of the slot >are free, ...", if some slot is free, then why do you want >to process the results? (Do you mean to say that *None* of >the slot is free?) This comment is wrong, I will remove this. >b. >IIUC, you have called function select_loop(maxFd, &slotset) >to check if socket descriptor is readable, if yes then why >in do..while loop the same maxFd is checked always, don't >you want to check different socket descriptors? I am not sure >if I am missing something here select_loop(maxFd, &slotset) maxFd is the max descriptor among all SETS, and slotset contains all the descriptor, so if any of the descriptor get some message select_loop will come out, and once select loop come out, we need to check how many descriptor have got the message from server so we loop and process the results. So it’s not only for a maxFd, it’s for all the descriptors. And it’s in do..while loop, because it possible that select_loop come out because of some intermediate message on any of the socket but still query is not complete, and if none of the socket is still free (that we check in below for loop), then go to select_loop again. >c. >After checking the socket descriptor for maxFd why you want >to run run the below for loop for all slots? >for (i = 0; i < max_slot; i++) After Select loop is out, it’s possible that we might have got result on multiple connections, so consume input and check if still busy, then nothing to do, but if finished process the result and mark the connection free. And if any of the connection is free, then we will break the do..while loop. From: Amit Kapila [mailto:amit.kapil...@gmail.com] Sent: 06 December 2014 20:01 To: Dilip kumar Cc: Magnus Hagander; Alvaro Herrera; Jan Lentfer; Tom Lane; PostgreSQL-development; Sawada Masahiko; Euler Taveira Subject: Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ] On Mon, Dec 1, 2014 at 12:18 PM, Dilip kumar mailto:dilip.ku...@huawei.com>> wrote: > > On 24 November 2014 11:29, Amit Kapila Wrote, > I have verified that all previous comments are addressed and the new version is much better than previous version. > > here we are setting each target once and doing for all the tables.. > Hmm, theoretically I think new behaviour could lead to more I/O in certain cases as compare to existing behaviour. The reason for more I/O is that in the new behaviour, while doing Analyze for a particular table at different targets, in-between it has Analyze of different table as well, so the pages in shared buffers or OS cache for a particular table needs to be reloded again for a new target whereas currently it will do all stages of Analyze for a particular table in one-go which means that each stage of Analyze could get benefit from the pages of a table loaded by previous stage. If you agree, then we should try to avoid this change in new behaviour. > > Please provide you opinion. I have few questions regarding function GetIdleSlot() + static int + GetIdleSlot(ParallelSlot *pSlot, int max_slot, const char *dbname, +const char *progname, bool completedb) { .. 
+		/*
+		 * Some of the slot are free, Process the results for slots whichever
+		 * are free
+		 */
+		do
+		{
+			SetCancelConn(pSlot[0].connection);
+
+			i = select_loop(maxFd, &slotset);
+
+			ResetCancelConn();
+
+			if (i < 0)
+			{
+				/*
+				 * This can only happen if user has sent the cancel request using
+				 * Ctrl+C, Cancel is handled by 0th slot, so fetch the error result.
+				 */
+				GetQueryResult(pSlot[0].connection, dbname, progname,
+							   completedb);
+				return NO_SLOT;
+			}
+
+			Assert(i != 0);
+
+			for (i = 0; i < max_slot; i++)
+			{
+				if (!FD_ISSET(pSlot[i].sock, &slotset))
+					continue;
+
+				PQconsumeInput(pSlot[i].connection);
+				if (PQisBusy(pSlot[i].connection))
+					continue;
+
+				pSlot[i].isFree = true;
+
+				if (!GetQueryResult(pSlot[i].connection, dbname, progname,
+									completedb))
+					return NO_SLOT;
+
+				if (firstFree < 0)
+					firstFree = i;
+			}
+		} while (firstFree < 0);
}

I wanted to understand what exactly the above loop is doing.

a. first of all the comment on
Re: [HACKERS] Fractions in GUC variables
On Mon, Dec 8, 2014 at 4:48 AM, John Gorman wrote: > The attached patch applies cleanly against master and passes all regression > tests including two new tests in guc.sql. Please be sure to register your patch to the upcoming commit fest, this way it will not fall into oblivion and will get some feedback: https://commitfest.postgresql.org/action/commitfest_view?id=25 -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tracking commit timestamps
On 08/12/14 00:56, Noah Misch wrote: On Wed, Dec 03, 2014 at 11:54:38AM -0300, Alvaro Herrera wrote: Pushed with some extra cosmetic tweaks. The commit_ts test suite gives me the attached diff on a 32-bit MinGW build running on 64-bit Windows Server 2003. I have not checked other Windows configurations; the suite does pass on GNU/Linux. Hmm, I wonder if "< now()" needs to be changed to "<= now()" in those queries to make them work correctly on that platform. I don't have a machine with that environment handy right now, so I would appreciate it if you could try that; in case you don't have time, I will try to set up something later... -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add shutdown_at_recovery_target option to recovery.conf
On Mon, Dec 8, 2014 at 10:18 AM, Petr Jelinek wrote: > On 08/12/14 02:06, Petr Jelinek wrote: >> Simon actually already committed something similar, so no need. >> > > ...except for the removal of pause_at_recovery_target it seems, so I > attached just that Thanks! Removal of this parameter has at least two votes in favor, and nobody has spoken against it, so IMO we should remove it now rather than later, or else I'm sure we would simply forget about it. My 2c. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Status of Commit fest 2014-10
Hi all, I have been through a certain number of patches and marked the ones that needed some love from their authors as returned with feedback (at least the ones marked as such in the CF app), but you may have noticed that :) This email is a call to move on and close soon the CF currently open and to move to the next one, which normally begins in one week. Hence, to committers, there are actually 11 patches waiting for some attention: - Use faster, higher precision timer API GetSystemTimeAsFileTime on windows - Compression of Full Page Writes - REINDEX SCHEMA - Add IF NOT EXISTS to CREATE TABLE AS and CREATE MATERIALIZED VIEW - pgcrypto: support PGP signatures - Point to polygon distance operator - Jsonb generator functions - json strip nulls functions - pg_basebackup vs. Windows and tablespaces - New operators for pgbench expressions, with modulo - Refactor of functions related to quoting from builtins.h to utils/quote.h To patch authors: please ping your reviewers (IF NOT NULL) if you think that you did not get enough feedback. To patch reviewers: please update the status of the patch on the CF app if you have done your part of the work. At this point I think that it may be preferable to move the work-in-progress items to the next CF. Thanks, -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add shutdown_at_recovery_target option to recovery.conf
On 08/12/14 02:06, Petr Jelinek wrote: On 08/12/14 02:03, Michael Paquier wrote: On Mon, Dec 8, 2014 at 9:59 AM, Petr Jelinek wrote: Ok this patch does that, along with the rename to recovery_target_action and addition to the recovery.conf.sample. This needs a rebase as at least da71632 and b8e33a8 are conflicting. Simon actually already committed something similar, so no need. ...except for the removal of pause_at_recovery_target it seems, so I attached just that -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml index 31473cd..07b4856 100644 --- a/doc/src/sgml/recovery-config.sgml +++ b/doc/src/sgml/recovery-config.sgml @@ -280,26 +280,6 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"' # Windows - - pause_at_recovery_target (boolean) - -pause_at_recovery_target recovery parameter - - - - -Alias for recovery_target_action, true is same as -recovery_target_action = pause and false -is same as recovery_target_action = promote. - - -This setting has no effect if is not -enabled, or if no recovery target is set. - - - - recovery_target_action (enum) diff --git a/doc/src/sgml/release-9.1.sgml b/doc/src/sgml/release-9.1.sgml index 79a8b07..5cbfc4a 100644 --- a/doc/src/sgml/release-9.1.sgml +++ b/doc/src/sgml/release-9.1.sgml @@ -6301,8 +6301,8 @@ -Add recovery.conf setting pause_at_recovery_target +Add recovery.conf setting +pause_at_recovery_target to pause recovery at target (Simon Riggs) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 0f09add..6370aed 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -4653,7 +4653,6 @@ readRecoveryCommandFile(void) ConfigVariable *item, *head = NULL, *tail = NULL; - bool recoveryPauseAtTargetSet = false; bool recoveryTargetActionSet = false; @@ -4699,25 +4698,6 @@ readRecoveryCommandFile(void) (errmsg_internal("archive_cleanup_command = '%s'", archiveCleanupCommand))); } - else if (strcmp(item->name, "pause_at_recovery_target") == 0) - { - bool recoveryPauseAtTarget; - - if (!parse_bool(item->value, &recoveryPauseAtTarget)) -ereport(ERROR, - (errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("parameter \"%s\" requires a Boolean value", "pause_at_recovery_target"))); - - ereport(DEBUG2, - (errmsg_internal("pause_at_recovery_target = '%s'", - item->value))); - - recoveryTargetAction = recoveryPauseAtTarget ? - RECOVERY_TARGET_ACTION_PAUSE : - RECOVERY_TARGET_ACTION_PROMOTE; - - recoveryPauseAtTargetSet = true; - } else if (strcmp(item->name, "recovery_target_action") == 0) { if (strcmp(item->value, "pause") == 0) @@ -4902,17 +4882,6 @@ readRecoveryCommandFile(void) RECOVERY_COMMAND_FILE))); } - /* - * Check for mutually exclusive parameters - */ - if (recoveryPauseAtTargetSet && recoveryTargetActionSet) - ereport(ERROR, -(errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("cannot set both \"%s\" and \"%s\" recovery parameters", - "pause_at_recovery_target", - "recovery_target_action"), - errhint("The \"pause_at_recovery_target\" is deprecated."))); - /* * Override any inconsistent requests. Not that this is a change -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add shutdown_at_recovery_target option to recovery.conf
On 08/12/14 02:03, Michael Paquier wrote: On Mon, Dec 8, 2014 at 9:59 AM, Petr Jelinek wrote: Ok this patch does that, along with the rename to recovery_target_action and addition to the recovery.conf.sample. This needs a rebase as at least da71632 and b8e33a8 are conflicting. Simon actually already committed something similar, so no need. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add shutdown_at_recovery_target option to recovery.conf
On Mon, Dec 8, 2014 at 9:59 AM, Petr Jelinek wrote: > Ok this patch does that, along with the rename to recovery_target_action and > addition to the recovery.conf.sample. This needs a rebase as at least da71632 and b8e33a8 are conflicting. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: multivariate statistics / proof of concept
On Sun, Nov 16, 2014 at 3:35 AM, Tomas Vondra wrote: > Thanks for the link. I've been looking for a good dataset with such > data, and this one is by far the best one. > > The current version of the patch supports only data types passed by > value (i.e. no varlena types - text, ), which means it's impossible to > build multivariate stats on some of the interesting columns (state, > city, ...). > > I guess it's time to start working on removing this limitation. Tomas, what's your status on this patch? Are you planning to extend it further? For now I have switched it to "Needs Review" because even your first version (which is quite big, btw) has not received an in-depth review. I guess that we should move it to the next CF. Regards, -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
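For readers unfamiliar with the limitation, the difference comes down to how a Datum for the column has to be handled when the statistics are built and stored; a minimal sketch using the existing datumCopy() and detoasting support is below, with the varlena branch being the extra work the patch does not do yet.

    #include "postgres.h"
    #include "fmgr.h"
    #include "utils/datum.h"

    /*
     * Sketch only: copying a sampled value into statistics storage.  By-value
     * types are trivial; varlena types (text, etc.) must be detoasted and
     * copied into memory the statistics code controls.
     */
    static Datum
    copy_stats_value(Datum value, bool typbyval, int16 typlen)
    {
        if (typbyval)
            return value;           /* the Datum itself carries the value */

        if (typlen == -1)           /* varlena */
            value = PointerGetDatum(PG_DETOAST_DATUM(value));

        return datumCopy(value, typbyval, typlen);
    }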
Re: [HACKERS] Add shutdown_at_recovery_target option to recovery.conf
On 05/12/14 16:49, Robert Haas wrote: On Thu, Dec 4, 2014 at 8:45 AM, Michael Paquier wrote: Here is patch which renames action_at_recovery_target to recovery_target_action everywhere. Thanks, Looks good to me. A couple of things that would be good to document as well in recovery-config.sgml: - the fact that pause_at_recovery_target is deprecated. Why not just remove it altogether? I don't think the backward-compatibility break is enough to get excited about, or to justify the annoyance of carrying an extra parameter that does (part of) the same thing. Ok this patch does that, along with the rename to recovery_target_action and addition to the recovery.conf.sample. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml index b4959ac..519a0ce 100644 --- a/doc/src/sgml/recovery-config.sgml +++ b/doc/src/sgml/recovery-config.sgml @@ -280,31 +280,11 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"' # Windows - - pause_at_recovery_target (boolean) + + recovery_target_action (enum) -pause_at_recovery_target recovery parameter - - - - -Alias for action_at_recovery_target, true is same as -action_at_recovery_target = pause and false -is same as action_at_recovery_target = promote. - - -This setting has no effect if is not -enabled, or if no recovery target is set. - - - - - - action_at_recovery_target (enum) - -action_at_recovery_target recovery parameter +recovery_target_action recovery parameter @@ -336,7 +316,7 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"' # Windows Note that because recovery.conf will not be renamed when -action_at_recovery_target is set to shutdown, +recovery_target_action is set to shutdown, any subsequent start will end with immediate shutdown unless the configuration is changed or the recovery.conf is removed manually. diff --git a/doc/src/sgml/release-9.1.sgml b/doc/src/sgml/release-9.1.sgml index 79a8b07..4fcc0b3 100644 --- a/doc/src/sgml/release-9.1.sgml +++ b/doc/src/sgml/release-9.1.sgml @@ -6301,8 +6301,7 @@ -Add recovery.conf setting pause_at_recovery_target +Add recovery.conf setting pause_at_recovery_target to pause recovery at target (Simon Riggs) diff --git a/src/backend/access/transam/recovery.conf.sample b/src/backend/access/transam/recovery.conf.sample index 7657df3..42da62b 100644 --- a/src/backend/access/transam/recovery.conf.sample +++ b/src/backend/access/transam/recovery.conf.sample @@ -94,12 +94,13 @@ #recovery_target_timeline = 'latest' # # -# If pause_at_recovery_target is enabled, recovery will pause when -# the recovery target is reached. The pause state will continue until -# pg_xlog_replay_resume() is called. This setting has no effect if -# hot standby is not enabled, or if no recovery target is set. +# The recovery_target_action option can be set to either promote, paused +# or shutdown and decides the behaviour of the server once the recovery_target +# is reached. Promote will promote the server to standalone one. Pause will +# pause the recovery, which can be then resumed by pg_xlog_replay_resume(). +# And shutdown will stop the server. 
# -#pause_at_recovery_target = true +#recovery_target_action = 'pause' # #--- # STANDBY SERVER PARAMETERS diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index da28de9..469a83b 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -229,7 +229,7 @@ static char *recoveryEndCommand = NULL; static char *archiveCleanupCommand = NULL; static RecoveryTargetType recoveryTarget = RECOVERY_TARGET_UNSET; static bool recoveryTargetInclusive = true; -static RecoveryTargetAction actionAtRecoveryTarget = RECOVERY_TARGET_ACTION_PAUSE; +static RecoveryTargetAction recoveryTargetAction = RECOVERY_TARGET_ACTION_PAUSE; static TransactionId recoveryTargetXid; static TimestampTz recoveryTargetTime; static char *recoveryTargetName; @@ -4653,8 +4653,7 @@ readRecoveryCommandFile(void) ConfigVariable *item, *head = NULL, *tail = NULL; - bool recoveryPauseAtTargetSet = false; - bool actionAtRecoveryTargetSet = false; + bool recoveryTargetActionSet = false; fd = AllocateFile(RECOVERY_COMMAND_FILE, "r"); @@ -4699,45 +4698,26 @@ readRecoveryCommandFile(void) (errmsg_internal("archive_cleanup_command = '%s'", archiveCleanupCommand))); } - else if (strcmp(item->name, "pause_at_recovery_target") == 0) - { - bool recoveryPauseAtTarget; - - if
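The quoted diff is cut off above, but to make the shape of the change concrete, here is a minimal standalone C sketch of the value handling implied by the renamed option: mapping the recovery_target_action string onto the RecoveryTargetAction enum and rejecting anything else. This is an illustration only, not the patch; the real logic lives in readRecoveryCommandFile() in xlog.c and reports bad values through ereport().

#include <stdio.h>
#include <string.h>

typedef enum
{
	RECOVERY_TARGET_ACTION_PAUSE,
	RECOVERY_TARGET_ACTION_PROMOTE,
	RECOVERY_TARGET_ACTION_SHUTDOWN
} RecoveryTargetAction;

static int
parse_recovery_target_action(const char *value, RecoveryTargetAction *action)
{
	if (strcmp(value, "pause") == 0)
		*action = RECOVERY_TARGET_ACTION_PAUSE;
	else if (strcmp(value, "promote") == 0)
		*action = RECOVERY_TARGET_ACTION_PROMOTE;
	else if (strcmp(value, "shutdown") == 0)
		*action = RECOVERY_TARGET_ACTION_SHUTDOWN;
	else
		return -1;				/* invalid value for recovery_target_action */
	return 0;
}

int
main(void)
{
	RecoveryTargetAction action;

	if (parse_recovery_target_action("shutdown", &action) == 0)
		printf("recovery_target_action = %d\n", (int) action);
	return 0;
}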
Re: [HACKERS] detect custom-format dumps in psql and emit a useful error
On Fri, Oct 24, 2014 at 7:23 PM, Andres Freund wrote: > On 2014-10-24 07:18:55 -0300, Alvaro Herrera wrote: >> Yeah, this patch is a lot more debatable than the other one. I have >> pushed the first one without changing the error message. > > We could just test for toc.dat and then emit the warning... One patch has been committed, and the second is debatable. Hence marking this entry as returned with feedback in the CF app. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Sat, Dec 6, 2014 at 9:01 PM, Amit Kapila wrote: > If you agree, then we should try to avoid this change in new behaviour. Still seeing many concerns about this patch, so marking it as returned with feedback. If possible, switching it to the next CF would be fine I guess as this work is still being continued. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Role Attribute Bitmask Catalog Representation
On Sun, Dec 7, 2014 at 1:50 AM, Stephen Frost wrote: > * Adam Brightwell (adam.brightw...@crunchydatasolutions.com) wrote: >> > I don't see any changes to the regression test files, were they >> > forgotten in the patch? I would think that at least the view definition >> > changes would require updates to the regression tests, though perhaps >> > nothing else. >> >> Hmmm... :-/ The regression tests that changed were in >> 'src/test/regress/expected/rules.out' and should be near the bottom of the >> patch. > > Hah, looked just like changes to the system_views, sorry for the > confusion. :) > >> > Overall, I'm pretty happy with the patch and would suggest moving on to >> > writing up the documentation changes to go along with the code changes. >> > I'll continue to play around with it but it all seems pretty clean to >> > me and will allow us to easily add the additional role attributes being >> > discussed. >> >> Sounds good. I'll start on those changes next. This patch (https://commitfest.postgresql.org/action/patch_view?id=1616) has not been updated in the commitfest app for two months, making its progress hard to track. However, even if some progress has been made, things are still not complete (documentation, etc.). Opinions about marking it as returned with feedback for the current commit fest and creating a new entry for the next CF if this work is going to be pursued? I guess that it would be fine to simply move it to the next CF, but it seems I cannot do that myself in the CF app. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
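As an aside, since the thread is about folding the per-attribute boolean columns into a single bitmask catalog representation, here is a tiny standalone C sketch of that general technique. The flag names and the role_has_attribute() helper are hypothetical stand-ins, not the patch's actual definitions.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define ROLE_ATTR_SUPERUSER   (UINT64_C(1) << 0)
#define ROLE_ATTR_CREATEROLE  (UINT64_C(1) << 1)
#define ROLE_ATTR_CREATEDB    (UINT64_C(1) << 2)
#define ROLE_ATTR_REPLICATION (UINT64_C(1) << 3)

/* Test one attribute bit in the role's attribute mask. */
static bool
role_has_attribute(uint64_t rolattr, uint64_t attribute)
{
	return (rolattr & attribute) != 0;
}

int
main(void)
{
	uint64_t rolattr = ROLE_ATTR_CREATEDB | ROLE_ATTR_REPLICATION;

	printf("createdb: %d, superuser: %d\n",
		   role_has_attribute(rolattr, ROLE_ATTR_CREATEDB),
		   role_has_attribute(rolattr, ROLE_ATTR_SUPERUSER));
	return 0;
}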
Re: [HACKERS] 9.5: Better memory accounting, towards memory-bounded HashAgg
On Tue, Dec 2, 2014 at 2:14 PM, Jeff Davis wrote: > On Sun, 2014-11-30 at 17:49 -0800, Peter Geoghegan wrote: >> On Mon, Nov 17, 2014 at 11:39 PM, Jeff Davis wrote: >> > I can also just move isReset there, and keep mem_allocated as a uint64. >> > That way, if I find later that I want to track the aggregated value for >> > the child contexts as well, I can split it into two uint32s. I'll hold >> > off on any such optimizations until I see some numbers from HashAgg >> > though. >> >> I took a quick look at memory-accounting-v8.patch. >> >> Is there some reason why mem_allocated is a uint64? All other things >> being equal, I'd follow the example of tuplesort.c's >> MemoryContextAllocHuge() API, which (following bugfix commit >> 79e0f87a1) uses int64 variables to track available memory and so on. > > No reason. New version attached; that's the only change. Note that I am marking this patch back to the "Needs Review" state. It doesn't seem that this patch has been reviewed completely. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
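For readers following the design question above (uint64 vs int64 for mem_allocated), here is a rough standalone sketch of the per-context accounting idea: a signed 64-bit running total adjusted on every allocation and free. Field and function names are illustrative only; the actual patch wires the accounting into mcxt.c/aset.c rather than wrapping malloc.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct SketchMemoryContext
{
	int64_t		mem_allocated;	/* bytes currently tracked by this context */
	int			isReset;		/* kept next to the other small fields */
} SketchMemoryContext;

static void *
sketch_alloc(SketchMemoryContext *cxt, size_t size)
{
	cxt->mem_allocated += (int64_t) size;
	return malloc(size);
}

static void
sketch_free(SketchMemoryContext *cxt, void *ptr, size_t size)
{
	cxt->mem_allocated -= (int64_t) size;
	free(ptr);
}

int
main(void)
{
	SketchMemoryContext cxt = {0, 0};
	void	   *p = sketch_alloc(&cxt, 1024);

	printf("allocated: %lld bytes\n", (long long) cxt.mem_allocated);
	sketch_free(&cxt, p, 1024);
	printf("allocated: %lld bytes\n", (long long) cxt.mem_allocated);
	return 0;
}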
Re: [HACKERS] Support UPDATE table SET(*)=...
On Wed, Nov 26, 2014 at 4:26 AM, Tom Lane wrote: > I think what's likely missing here is a clear design for the raw parse > tree representation (what's returned by the bison grammar). The patch > seems to be trying to skate by without creating any new parse node types > or fields, but that may well be a bad idea. At the very least there > needs to be some commentary added to parsenodes.h explaining what the > representation actually is (cf commentary there for MultiAssignRef). > > Also, I think it's a mistake not to be following the MultiAssignRef > model for the case of "(*) = ctext_row". We optimize the ROW-source > case at the grammar stage when there's a fixed number of target columns, > because that's a very easy transformation --- but you can't do it like > that when there's not. It's possible that that optimization should be > delayed till later even in the existing case; in general, optimizing > in gram.y is a bad habit that's better avoided ... Marking as "returned with feedback" based on those comments. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Selectivity estimation for inet operators
On Wed, Dec 3, 2014 at 6:14 AM, Emre Hasegeli wrote: >> Actually, there's a second large problem with this patch: blindly >> iterating through all combinations of MCV and histogram entries makes the >> runtime O(N^2) in the statistics target. I made up some test data (by >> scanning my mail logs) and observed the following planning times, as >> reported by EXPLAIN ANALYZE: >> >> explain analyze select * from relays r1, relays r2 where r1.ip = r2.ip; >> explain analyze select * from relays r1, relays r2 where r1.ip && r2.ip; >>
>> stats target    eqjoinsel    networkjoinsel
>>
>> 100             0.27 ms      1.85 ms
>> 1000            2.56 ms      167.2 ms
>> 10000           56.6 ms      13987.1 ms
>>
>> I don't think it's necessary for network selectivity to be quite as >> fast as eqjoinsel, but I doubt we can tolerate 14 seconds planning >> time for a query that might need just milliseconds to execute :-( >> >> It seemed to me that it might be possible to reduce the runtime by >> exploiting knowledge about the ordering of the histograms, but >> I don't have time to pursue that right now. > > I made it break the loop once we have passed the last possible match. Patch > attached. It reduces the runtime by almost 50% with large histograms. > > We can also make it use only some elements of the MCV and histogram > for join estimation. I have experimented with reducing the right and > the left hand side of the lists in the previous versions. I remember > it was better to reduce only the left hand side. I think it would be > enough to use log(n) elements of the right hand side MCV and histogram. > I can make the change, if you think the selectivity estimation function > is the right place for this optimization. Marking as "Returned with feedback" as more work needs to be done. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
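To make the "break the loop after the last possible match" idea concrete, here is a simplified standalone C sketch, with integer ranges standing in for the sorted inet MCV/histogram entries. It only illustrates the early-exit trick over sorted inputs, not the actual networkjoinsel code.

#include <stdio.h>

typedef struct Range
{
	int			lo;
	int			hi;
} Range;

/* Both arrays must be sorted by their lower bound. */
static long
count_overlaps(const Range *a, int na, const Range *b, int nb)
{
	long		matches = 0;

	for (int i = 0; i < na; i++)
	{
		for (int j = 0; j < nb; j++)
		{
			if (b[j].lo > a[i].hi)
				break;			/* sorted input: nothing further can match */
			if (b[j].hi >= a[i].lo)
				matches++;
		}
	}
	return matches;
}

int
main(void)
{
	Range		a[] = {{0, 10}, {20, 30}, {40, 50}};
	Range		b[] = {{5, 7}, {25, 45}, {60, 70}};

	printf("overlapping pairs: %ld\n", count_overlaps(a, 3, b, 3));
	return 0;
}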
Re: [HACKERS] Add CREATE support to event triggers
On Thu, Nov 27, 2014 at 10:16 AM, Bruce Momjian wrote: > On Sat, Nov 8, 2014 at 05:56:00PM +0100, Andres Freund wrote: >> On 2014-11-08 11:52:43 -0500, Tom Lane wrote: >> > Adding a similar >> > level of burden to support a feature with a narrow use-case seems like >> > a nonstarter from here. >> >> I don't understand this statement. In my experience the lack of a usable >> replication solution that allows temporary tables and major version >> differences is one of the most, if not *the* most, frequent criticisms >> of postgres I hear. How is this a narrow use case? > > How would replicating DDL handle cases where the master and slave > servers have different major versions and the DDL is only supported by > the Postgres version on the master server? If it would fail, does this > limit the idea that logical replication allows major version-different > replication? Marking this patch as "Returned with feedback". Even leaving aside the more fundamental argument about whether such a replication solution should live in core (I am skeptical as well, btw), on the code level alone I don't think that this patch is acceptable as-is without more jsonb infrastructure for in-memory manipulation of containers (i.e. removing the ObjTree stuff). Regards, -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
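As a rough idea of what "more jsonb infrastructure for in-memory manipulation" could look like with the existing backend API, here is a small backend-only sketch that assembles a jsonb object with pushJsonbValue(). It assumes utils/jsonb.h and would be compiled as part of the server or an extension, not standalone; the key name and helper are invented for illustration, and this is not the event-trigger patch's code.

#include "postgres.h"
#include "utils/jsonb.h"

/* Build {"command_tag": tag} directly as an in-memory jsonb value. */
static Jsonb *
make_command_tag_jsonb(const char *tag)
{
	JsonbParseState *state = NULL;
	JsonbValue	key;
	JsonbValue	val;
	JsonbValue *res;

	key.type = jbvString;
	key.val.string.val = "command_tag";
	key.val.string.len = strlen("command_tag");

	val.type = jbvString;
	val.val.string.val = (char *) tag;
	val.val.string.len = strlen(tag);

	pushJsonbValue(&state, WJB_BEGIN_OBJECT, NULL);
	pushJsonbValue(&state, WJB_KEY, &key);
	pushJsonbValue(&state, WJB_VALUE, &val);
	res = pushJsonbValue(&state, WJB_END_OBJECT, NULL);

	return JsonbValueToJsonb(res);
}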
Re: [HACKERS] gist vacuum gist access
On Tue, Sep 9, 2014 at 12:47 AM, Heikki Linnakangas wrote: > On 09/08/2014 03:26 PM, Alexander Korotkov wrote: >> >> On Mon, Sep 8, 2014 at 12:51 PM, Heikki Linnakangas >> >> >>> wrote: >> >> >>> On 09/08/2014 11:19 AM, Alexander Korotkov wrote: >>> On Mon, Sep 8, 2014 at 12:08 PM, Alexander Korotkov > wrote: On Mon, Sep 8, 2014 at 11:13 AM, Heikki Linnakangas < > > hlinnakan...@vmware.com> wrote: > > In the b-tree code, we solved that problem back in 2006, so it can be >> >> done but requires a bit more code. In b-tree, we solved it with a >> "vacuum >> cycle ID" number that's set on the page halves when a page is split. >> That >> allows VACUUM to identify pages that have been split concurrently sees >> them, and "jump back" to vacuum them too. See commit >> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h= >> 5749f6ef0cc1c67ef9c9ad2108b3d97b82555c80. It should be possible to do >> something similar in GiST, and in fact you might be able to reuse the >> NSN >> field that's already set on the page halves on split, instead of >> adding >> a >> new "vacuum cycle ID". >> > ... Another note. Assuming we have NSN which can play the role of "vacuum cycle ID", can we implement sequential (with possible "jump back") index scan for GiST? >>> >>> >>> Yeah, I think it would work. It's pretty straightforward, the page split >>> code already sets the NSN, just when we need it. Vacuum needs to memorize >>> the current NSN when it begins, and whenever it sees a page with a higher >>> NSN (or the FOLLOW_RIGHT flag is set), follow the right-link if it points >>> to lower-numbered page. >> >> >> I mean "full index scan" feature for SELECT queries might be implemented >> as >> well as sequential VACUUM. > > > Oh, sorry, I missed that. If you implement a full-index scan like that, you > might visit some tuples twice, so you'd have to somehow deal with the > duplicates. For a bitmap index scan it would be fine. This patch has been in a "Wait on Author" state for quite a long time and Heikki has provided comments on it. Switching it to "Returned with feedback". -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
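To sketch the scheme Heikki describes (remember the NSN when vacuum begins, and when a page turns out to have been split after that point, follow its right-link back to an already-scanned, lower-numbered block), here is a toy standalone C illustration. The structures are invented for clarity; real code would read the NSN and FOLLOW_RIGHT flag from the GiST page opaque data.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct ToyGistPage
{
	uint64_t	nsn;			/* set when the page was last split */
	bool		follow_right;	/* right half not yet linked from the parent */
	int			rightlink;		/* block number of the right sibling, or -1 */
} ToyGistPage;

static void
vacuum_chain(ToyGistPage *pages, int blkno, uint64_t start_nsn)
{
	int			cur = blkno;

	for (;;)
	{
		printf("vacuuming block %d\n", cur);

		/*
		 * If this page was split after vacuum started, also visit its right
		 * sibling when that sibling lives at a lower (already scanned) block
		 * number; otherwise the tuples moved by the split would be missed.
		 */
		if ((pages[cur].nsn > start_nsn || pages[cur].follow_right) &&
			pages[cur].rightlink >= 0 && pages[cur].rightlink < cur)
			cur = pages[cur].rightlink;
		else
			break;
	}
}

int
main(void)
{
	/* Block 2 was split after vacuum started; its right half is block 1. */
	ToyGistPage pages[3] = {
		{10, false, -1},
		{25, false, -1},
		{25, true, 1},
	};

	for (int blkno = 0; blkno < 3; blkno++)
		vacuum_chain(pages, blkno, 20);
	return 0;
}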
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On Wed, Dec 3, 2014 at 4:03 PM, Michael Paquier wrote: > Ping? This patch is in a stale state for a couple of weeks and still > marked as waiting on author for this CF. Marked as returned with feedback. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: Log inability to lock pages during vacuum
On 20 October 2014 at 10:57, Jim Nasby wrote: > Currently, a non-freeze vacuum will punt on any page it can't get a cleanup > lock on, with no retry. Presumably this should be a rare occurrence, but I > think it's bad that we just assume that and won't warn the user if something > bad is going on. (I'm having email problems, so I can't see later mails on this thread, so replying here.) Logging patch looks fine, but I would rather not add a line of text for each VACUUM, just in case this is non-zero. I think we should add that log line only if the blocks skipped > 0. What I'm more interested in is what you plan to do with the information once we get it? The assumption that skipping blocks is something bad is strange. I added it because VACUUM could and did regularly hang on busy tables, which resulted in bloat because other blocks that needed cleaning didn't get any attention. Which is better, spend time obsessively trying to vacuum particular blocks, or to spend the time on other blocks that are in need of cleaning and are available to be cleaned? Which is better, have autovacuum or system wide vacuum progress on to other tables that need cleaning, or spend lots of effort retrying? How do we know what is the best next action? I'd really want to see some analysis of those things before we spend even more cycles on this. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
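On the logging side, the behaviour Simon suggests (emit the message only when the skip count is non-zero) is easy to picture with a small standalone sketch. try_cleanup_lock() below is just a stand-in for the backend's ConditionalLockBufferForCleanup(), and a real patch would report through ereport()/the vacuum log line rather than printf.

#include <stdbool.h>
#include <stdio.h>

/* Pretend every fifth page is pinned by someone else. */
static bool
try_cleanup_lock(int blkno)
{
	return (blkno % 5) != 0;
}

static void
toy_vacuum(int nblocks)
{
	int			skipped = 0;

	for (int blkno = 0; blkno < nblocks; blkno++)
	{
		if (!try_cleanup_lock(blkno))
		{
			skipped++;			/* punt on this page and move on */
			continue;
		}
		/* ... prune and clean the page here ... */
	}

	/* Only mention the skips when there were any. */
	if (skipped > 0)
		printf("skipped %d pinned pages\n", skipped);
}

int
main(void)
{
	toy_vacuum(100);
	return 0;
}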
Re: [HACKERS] tracking commit timestamps
On Wed, Dec 03, 2014 at 11:54:38AM -0300, Alvaro Herrera wrote: > Pushed with some extra cosmetic tweaks. The commit_ts test suite gives me the attached diff on a 32-bit MinGW build running on 64-bit Windows Server 2003. I have not checked other Windows configurations; the suite does pass on GNU/Linux. *** Z:/nm/postgresql/src/test/modules/commit_ts/expected/commit_timestamp.out 2014-12-05 05:43:01.07442 + --- Z:/nm/postgresql/src/test/modules/commit_ts/results/commit_timestamp.out 2014-12-05 08:24:13.094705200 + *** *** 19,27 ORDER BY id; id | ?column? | ?column? | ?column? +--+--+-- ! 1 | t| t| t ! 2 | t| t| t ! 3 | t| t| t (3 rows) DROP TABLE committs_test; --- 19,27 ORDER BY id; id | ?column? | ?column? | ?column? +--+--+-- ! 1 | t| f| t ! 2 | t| f| t ! 3 | t| f| t (3 rows) DROP TABLE committs_test; *** *** 34,39 SELECT x.xid::text::bigint > 0, x.timestamp > '-infinity'::timestamptz, x.timestamp < now() FROM pg_last_committed_xact() x; ?column? | ?column? | ?column? --+--+-- ! t| t| t (1 row) --- 34,39 SELECT x.xid::text::bigint > 0, x.timestamp > '-infinity'::timestamptz, x.timestamp < now() FROM pg_last_committed_xact() x; ?column? | ?column? | ?column? --+--+-- ! t| t| f (1 row) == -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fractions in GUC variables
On 12/07/2014 11:48 AM, John Gorman wrote: > > This patch implements the first wiki/Todo Configuration Files item > "Consider normalizing fractions in postgresql.conf, perhaps using '%'". > > The "Fractions in GUC variables" discussion is here. > Oh, this is nice! Thanks for working on it. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Fractions in GUC variables
This patch implements the first wiki/Todo Configuration Files item "Consider normalizing fractions in postgresql.conf, perhaps using '%'". The "Fractions in GUC variables" discussion is here. http://www.postgresql.org/message-id/467132cf.9020...@enterprisedb.com This patch implements expressing GUC variables as percents in postgresql.conf.

autovacuum_vacuum_scale_factor = 20%   # percent of table size before vacuum
autovacuum_analyze_scale_factor = 10%  # percent of table size before analyze

As you can see, the postgresql.conf file and the documentation read more naturally. I added a regression test to guc.sql. The SQL interface also accepts both numeric and percent forms, although the percent form must be quoted because '%' is an operator.

show cursor_tuple_fraction;        --> 10%
set cursor_tuple_fraction = .15;   --> 15%
set cursor_tuple_fraction = '33%'; --> 33%

I tagged four configuration variables to display as percents. The attached patch applies cleanly against master and passes all regression tests including two new tests in guc.sql. guc_percent-v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
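For readers curious how a '%' suffix might be handled, here is a small standalone C sketch that accepts both plain fractions and percent forms for a value. It is only an illustration of the parsing idea, not the patch itself, which presumably does this inside guc.c's input/show routines for the variables tagged as percents.

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

static bool
parse_fraction(const char *value, double *result)
{
	char	   *end;
	double		val = strtod(value, &end);

	if (end == value)
		return false;			/* no number at all */

	/* Allow whitespace, then an optional '%' suffix. */
	while (*end == ' ' || *end == '\t')
		end++;
	if (*end == '%')
	{
		val /= 100.0;
		end++;
	}
	while (*end == ' ' || *end == '\t')
		end++;

	if (*end != '\0')
		return false;			/* trailing garbage */
	*result = val;
	return true;
}

int
main(void)
{
	const char *inputs[] = {"0.2", "20%", ".15", "33%"};

	for (int i = 0; i < 4; i++)
	{
		double		frac;

		if (parse_fraction(inputs[i], &frac))
			printf("%-5s -> %.2f\n", inputs[i], frac);
	}
	return 0;
}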
Re: [HACKERS] alter user set local_preload_libraries.
Peter Eisentraut writes: > My radical proposal therefore would have been to embrace this > inconsistency and get rid of PGC_BACKEND and PGC_SU_BACKEND altogether, > relying on users interpreting the parameter names to indicate that > changing them later may or may not have an effect. Unfortunately, the > concerns about ignore_system_indexes prevent that. Yeah, I think making ignore_system_indexes USERSET would be a pretty bad idea. People would expect that they can frob it back and forth with no impact other than performance, and I'm doubtful that that's true. If we wanted to make a push to get rid of PGC_BACKEND, I could see changing ignore_system_indexes to SUSET category, and assuming that superusers are adults who won't push a button just to see what it does. But having said that, I don't really think that PGC_BACKEND is a useless category. It provides a uniform way of documenting that changing a particular setting post-session-start is useless. Therefore I'm not on board with getting rid of it. To the extent that we can make ALTER ROLE/DATABASE control these settings, that would be a good thing. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] alter user set local_preload_libraries.
On 11/12/14 1:01 PM, Fujii Masao wrote: > On Wed, Nov 5, 2014 at 1:22 AM, Peter Eisentraut wrote: >> On 10/9/14 1:58 PM, Fujii Masao wrote: >>> Also I think that it's useful to allow ALTER ROLE/DATABASE SET to >>> set PGC_BACKEND and PGC_SU_BACKEND parameters. So, what >>> about applying the attached patch? This patch allows that and >>> changes the context of session_preload_libraries to PGC_SU_BACKEND. >> >> After looking through this again, I wonder whether there is any reason >> why ignore_system_indexes cannot be plain PGC_USERSET. With this >> change, we'd allow setting it via ALTER ROLE, but the access to >> pg_db_role_setting happens before it. > > Even without the patch, we can set ignore_system_indexes > at the startup of the connection because it's defined with > PGC_BACKEND context, but the access to system tables > can happen before that. Am I missing something? Let's think about what would happen if we allowed PGC_BACKEND settings to be changed by ALTER ROLE. Here is the current set of PGC_BACKEND and PGC_SU_BACKEND settings: - ignore_system_indexes Reason: "interesting" consequences if changed later - post_auth_delay Reason: changing later would have no effect - local_preload_libraries Reason: changing later would have no effect - log_connections Reason: changing later would have no effect - log_disconnections Reason: dubious / consistency with log_connections? Only ignore_system_indexes is really in the spirit of the original PGC_BACKEND setting, namely for settings that unprivileged users can set at the beginning of a session but should not change later. We used to have "fsync" in that category, for example, because changing that was not considered safe at some time. We used to have a lot more of these, but not many stood the test of time. The other settings are really just things that take effect during session start but don't hurt when changed later. The problem is that the order of these relative to ALTER ROLE processing is really an implementation detail or a best effort type of thing. For example, it looks as though we are making an effort to process post_auth_delay really late, after ALTER ROLE processing (even though we currently don't allow it to be set that way; strange), but there is no reason why that has to be so. One could reasonably think that "post auth" is really early, before catalog access starts. On the other hand, log_connections is processed really early, so ALTER ROLE would have no effect. This is going to end up inconsistent one way or the other. My radical proposal therefore would have been to embrace this inconsistency and get rid of PGC_BACKEND and PGC_SU_BACKEND altogether, relying on users interpreting the parameter names to indicate that changing them later may or may not have an effect. Unfortunately, the concerns about ignore_system_indexes prevent that. We could think of another way to address those concerns, e.g., with an ad hoc way in an assign hook. The other proposal would be to keep PGC_BACKEND and PGC_SU_BACKEND as special-case warts, perhaps document them as such, but change everything to use something else as much as possible, namely post_auth_delay -> PGC_USERSET local_preload_libraries -> PGC_USERSET log_disconnections -> PGC_SUSET -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Misunderstanding on the FSM README file
On 12/07/2014 02:03 PM, Guillaume Lelarge wrote: Hi, I've been reading the FSM README file lately (src/backend/storage/freespace/README), and I'm puzzled by one of the graphs (the binary tree structure of an FSM file). Here it is:

        4
    4       2
  3   4   0   2    <- This level represents heap pages

Shouldn't the last line be: 4 3 2 0 (ie, highest number of free space on the left node, lowest on the right one)? Probably just nitpicking, but still, I'm wondering if I missed something out. No, that's not how it works. Each number at the bottom level corresponds to a particular heap page. The first number would be heap page #0 (which has 3 units of free space), the second heap page #1 (with 4 units of free space) and so forth. Each node on the upper levels stores the maximum of its two children. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
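A tiny standalone C sketch of that rule (each upper node stores the maximum of its two children) rebuilds the README's example tree from the bottom-level values:

#include <stdio.h>

int
main(void)
{
	int			leaves[4] = {3, 4, 0, 2};	/* free space of heap pages #0..#3 */
	int			mid[2];
	int			root;

	/* Each upper node is the maximum of its two children. */
	for (int i = 0; i < 2; i++)
		mid[i] = leaves[2 * i] > leaves[2 * i + 1] ? leaves[2 * i] : leaves[2 * i + 1];
	root = mid[0] > mid[1] ? mid[0] : mid[1];

	printf("        %d\n", root);
	printf("    %d       %d\n", mid[0], mid[1]);
	printf("  %d   %d   %d   %d\n", leaves[0], leaves[1], leaves[2], leaves[3]);
	return 0;
}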
[HACKERS] Misunderstanding on the FSM README file
Hi, I've been reading the FSM README file lately (src/backend/storage/freespace/README), and I'm puzzled by one of the graphs (the binary tree structure of an FSM file). Here it is:

        4
    4       2
  3   4   0   2    <- This level represents heap pages

Shouldn't the last line be: 4 3 2 0 (ie, highest number of free space on the left node, lowest on the right one)? Probably just nitpicking, but still, I'm wondering if I missed something out. Thanks. -- Guillaume. http://blog.guillaume.lelarge.info http://www.dalibo.com
Re: [HACKERS] Removing INNER JOINs
On 4 December 2014 at 12:24, Simon Riggs wrote: > On 3 December 2014 at 12:18, Atri Sharma wrote: > >> So the planner keeps all possibility satisfying plans, or it looks at the >> possible conditions (like presence of foreign key for this case, for eg) and >> then lets executor choose between them? > > I'm suggesting the planner keeps 2 plans: One with removable joins, > one without the removable joins. I only just noticed the thread moved on while I was flying. So it looks like Tom and I said the same thing, or close enough for me to +1 Tom. Another idea would be to only skip Hash and Merge Joins, since the tests for those are fairly easy to put into the Init call. That sounds slightly easier than the proposal with the Option/Choice/Switch node. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
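To illustrate the general shape of the "plan both, decide at executor startup" idea being discussed (whichever node type ends up carrying it), here is a toy standalone C sketch; joins_still_removable() and both plan functions are invented stand-ins, not proposed APIs.

#include <stdbool.h>
#include <stdio.h>

typedef void (*ExecPlanFunc) (void);

static void
exec_full_plan(void)
{
	printf("running plan with all joins\n");
}

static void
exec_reduced_plan(void)
{
	printf("running plan with removable joins skipped\n");
}

/* Stand-in for a runtime test performed when the executor initializes. */
static bool
joins_still_removable(bool fks_valid, bool pending_fk_triggers)
{
	return fks_valid && !pending_fk_triggers;
}

int
main(void)
{
	ExecPlanFunc chosen;

	/* Foreign keys valid, no pending trigger work: the cheaper plan runs. */
	chosen = joins_still_removable(true, false) ? exec_reduced_plan : exec_full_plan;
	chosen();

	/* Pending FK trigger work: fall back to the plan with all joins. */
	chosen = joins_still_removable(true, true) ? exec_reduced_plan : exec_full_plan;
	chosen();
	return 0;
}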