Re: [HACKERS] Streaming base backups
On 7.1.2011 15:45, Magnus Hagander wrote:
> On Fri, Jan 7, 2011 at 02:15, Simon Riggs <si...@2ndquadrant.com> wrote:
>> One very useful feature will be some way of confirming the number and
>> size of files to transfer, so that the base backup client can find
>> out the progress.
>
> The patch already does this. Or rather, as it's coded it does this
> once per tablespace. It'll give you an approximation only of course,
> that can change,

In this case you actually could send exact numbers, as you need to only
transfer the files up to the size they were when starting the base
backup. The rest will be taken care of by WAL replay, but it should be
enough for the purposes of a progress indication.

>> It would also be good to avoid writing a backup_label file at all on
>> the master, so there was no reason why multiple concurrent backups
>> could not be taken. The current coding allows for the idea that the
>> start and stop might be in different sessions, whereas here we know
>> we are in one session.
>
> Yeah, I have that on the todo list suggested by Heikki. I consider it
> a later phase though.

--
Hannu Krosing
Senior Consultant, Infinite Scalability & Performance
http://www.2ndQuadrant.com/books/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Range Types
On Sat, 2011-01-08 at 21:58 -0500, Robert Haas wrote:
> I mean, one semi-obvious possibility is to write one set of C
> functions that can have multiple SQL-level definitions bound to it.
> Then when the function is called, it can peek at flinfo->fn_oid to
> figure out which incarnation was called and then get the type info
> from there. That's ugly, though.

That would work, but it is pretty ugly. It means it would be very
difficult for users to write new generic functions, because they would
need to add a new catalog entry for every existing range type.

Then again, wasting 4 bytes per row is not ideal, either. And maybe
users could still write some useful generic functions if they were a
combination of other generic functions, using the polymorphic system.

> It'd be really nice if we could just arrange for the info on which
> type anyrange actually is at the moment to be available in the right
> place. Storing it on disk to work around that is pretty horrible, but
> maybe there's no other reasonable option.

I was surprised when I saw the solution for records. Maybe we should
consider something like that as a last resort (if it's possible for
non-record types)? I'd rather give up typmod than anyrange.

It also might be worth figuring out why input functions get the type
oid and output functions do not. I see this comment above
getTypeIOParam():

 * As of PostgreSQL 8.1, output functions receive only the value itself
 * and not any auxiliary parameters, so the name of this routine is now
 * a bit of a misnomer ... it should be getTypeInputParam.

So, why was it eliminated? If the type output function got the type
OID, would it be enough to use fn_expr_get_argtype() for the other
generic functions?

Regards,
	Jeff Davis
Re: [HACKERS] Streaming base backups
On Sun, Jan 9, 2011 at 09:55, Hannu Krosing <ha...@2ndquadrant.com> wrote:
> On 7.1.2011 15:45, Magnus Hagander wrote:
>> On Fri, Jan 7, 2011 at 02:15, Simon Riggs <si...@2ndquadrant.com> wrote:
>>> One very useful feature will be some way of confirming the number
>>> and size of files to transfer, so that the base backup client can
>>> find out the progress.
>>
>> The patch already does this. Or rather, as it's coded it does this
>> once per tablespace. It'll give you an approximation only of course,
>> that can change,
>
> In this case you actually could send exact numbers, as you need to
> only transfer the files up to the size they were when starting the
> base backup. The rest will be taken care of by WAL replay

It will still be an estimate, because files can get smaller, and even
go away completely. But we really don't need more than an estimate...

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
Re: [HACKERS] Streaming base backups
On 9.1.2011 10:44, Magnus Hagander wrote:
> On Sun, Jan 9, 2011 at 09:55, Hannu Krosing <ha...@2ndquadrant.com> wrote:
>> On 7.1.2011 15:45, Magnus Hagander wrote:
>>> On Fri, Jan 7, 2011 at 02:15, Simon Riggs <si...@2ndquadrant.com> wrote:
>>>> One very useful feature will be some way of confirming the number
>>>> and size of files to transfer, so that the base backup client can
>>>> find out the progress.
>>>
>>> The patch already does this. Or rather, as it's coded it does this
>>> once per tablespace. It'll give you an approximation only of
>>> course, that can change,
>>
>> In this case you actually could send exact numbers, as you need to
>> only transfer the files up to the size they were when starting the
>> base backup. The rest will be taken care of by WAL replay
>
> It will still be an estimate, because files can get smaller, and even
> go away completely.

Sure. Just wanted to remind the fact that you don't need to send the
tail part of the file which was added after the start of backup. And
you can give the worst case estimate for space needed by base backup.

OTOH, streaming the WAL files in parallel can still fill up all
available space :P

> But we really don't need more than an estimate...

Agreed.

--
Hannu Krosing
Senior Consultant, Infinite Scalability & Performance
http://www.2ndQuadrant.com/books/
Re: [HACKERS] Streaming base backups
On Sun, Jan 9, 2011 at 12:05, Hannu Krosing <ha...@2ndquadrant.com> wrote:
> On 9.1.2011 10:44, Magnus Hagander wrote:
>> On Sun, Jan 9, 2011 at 09:55, Hannu Krosing <ha...@2ndquadrant.com> wrote:
>>> On 7.1.2011 15:45, Magnus Hagander wrote:
>>>> On Fri, Jan 7, 2011 at 02:15, Simon Riggs <si...@2ndquadrant.com> wrote:
>>>>> One very useful feature will be some way of confirming the number
>>>>> and size of files to transfer, so that the base backup client can
>>>>> find out the progress.
>>>>
>>>> The patch already does this. Or rather, as it's coded it does this
>>>> once per tablespace. It'll give you an approximation only of
>>>> course, that can change,
>>>
>>> In this case you actually could send exact numbers, as you need to
>>> only transfer the files up to the size they were when starting the
>>> base backup. The rest will be taken care of by WAL replay
>>
>> It will still be an estimate, because files can get smaller, and
>> even go away completely.
>
> Sure. Just wanted to remind the fact that you don't need to send the
> tail part of the file which was added after the start of backup.

True - but that's a PITA to keep track of. We do this if the file
changes during the transmission of that *file*, since otherwise the
tar header would specify an incorrect size, but not through the whole
backup.

> And you can give the worst case estimate for space needed by base
> backup.
>
> OTOH, streaming the WAL files in parallel can still fill up all
> available space :P

Yeah. I don't think it's worth the extra complexity of having to
enumerate and keep records for every individual file before the
streaming starts.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
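For illustration, the worst-case estimate being discussed can be as simple as summing the per-file sizes captured once at backup start; this is a hypothetical sketch, not the patch's actual code (function and variable names are invented):

```c
#include <stdint.h>

/* Hypothetical sketch of the progress estimate discussed above: capture
 * file sizes once when the backup starts and treat their sum as a
 * worst-case total.  Files that later shrink or vanish only make the
 * estimate pessimistic, which is fine for a progress indication. */
static int64_t
estimate_backup_total(const int64_t *file_sizes, int nfiles)
{
	int64_t		total = 0;

	for (int i = 0; i < nfiles; i++)
		total += file_sizes[i];
	return total;
}

/* Percent complete, clamped at 100 since files can shrink mid-backup. */
static int
backup_progress_pct(int64_t bytes_sent, int64_t total)
{
	if (total <= 0)
		return 100;
	if (bytes_sent >= total)
		return 100;
	return (int) (bytes_sent * 100 / total);
}
```

The clamp matters precisely because of the point above: the sender may finish earlier (files shrank) or transfer more bytes than estimated for a given file, so the estimate must never be treated as exact.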
Re: [HACKERS] system views for walsender activity
On Fri, Jan 7, 2011 at 22:21, Robert Haas <robertmh...@gmail.com> wrote:
> On Fri, Jan 7, 2011 at 1:46 PM, Josh Berkus <j...@agliodbs.com> wrote:
>> To my way of thinking, pg_stat_walsender and pg_stat_walreceiver
>> would be more clear than pg_stat_replication_master and
>> pg_stat_replication_slave.
>
> Let's commit it so that some of us can get a look at the data it
> contains, and then we can fix the name during beta.

Well, the first half is committed, under the name pg_stat_replication.
So go look at that, for starters...

One thing I noticed is that it gives an interesting property to my
patch for streaming base backups - they now show up in
pg_stat_replication, with a streaming location of 0/0. If the view is
named pg_stat_replication, we probably want to filter that out
somehow. But then, do we want a separate view listing the walsenders
that are busy sending base backups?

For that matter, do we want an indication that separates a walsender
not sending data from one sending that happens to be at location 0/0?
Most will leave 0/0 really quickly, but a walsender can be idle (not
received a command yet), or it can be running IDENTIFY_SYSTEM for
example.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
Re: [HACKERS] estimating # of distinct values
> A resource fork? Not sure what you mean, could you describe it in
> more detail?

Ooops, resource forks are a filesystem thing; we call them relation
forks. From src/backend/storage/smgr/README:

  Relation Forks
  ==============

  Since 8.4, a single smgr relation can be comprised of multiple
  physical files, called relation forks. This allows storing additional
  metadata like Free Space information in additional forks, which can
  be grown and truncated independently of the main data file, while
  still treating it all as a single physical relation in system
  catalogs.

  It is assumed that the main fork, fork number 0 or MAIN_FORKNUM,
  always exists. Fork numbers are assigned in
  src/include/storage/relfilenode.h. Functions in smgr.c and md.c take
  an extra fork number argument, in addition to relfilenode and block
  number, to identify which relation fork you want to access. Since
  most code wants to access the main fork, a shortcut version of
  ReadBuffer that accesses MAIN_FORKNUM is provided in the buffer
  manager for convenience.

--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell)                http://jim.nasby.net
Re: [HACKERS] Streaming base backups
2011/1/7 Garick Hamlin <gham...@isc.upenn.edu>:
> On Thu, Jan 06, 2011 at 07:47:39PM -0500, Cédric Villemain wrote:
>> 2011/1/5 Magnus Hagander <mag...@hagander.net>:
>>> On Wed, Jan 5, 2011 at 22:58, Dimitri Fontaine <dimi...@2ndquadrant.fr> wrote:
>>>> Magnus Hagander <mag...@hagander.net> writes:
>>>>> * Stefan mentioned it might be useful to put some
>>>>> posix_fadvise(POSIX_FADV_DONTNEED) in the process that streams all
>>>>> the files out. Seems useful, as long as that doesn't kick them out
>>>>> of the cache *completely*, for other backends as well. Do we know
>>>>> if that is the case?
>>>>
>>>> Maybe have a look at pgfincore to only tag DONTNEED for blocks that
>>>> are not already in SHM?
>>>
>>> I think that's way more complex than we want to go here.
>>
>> DONTNEED will remove the block from the OS buffer every time. It
>> should not be that hard to implement a snapshot (it needs mincore())
>> and to restore the previous state. I don't know how basebackup is
>> performed exactly... so perhaps I am wrong. posix_fadvise support is
>> already in postgresql core... we can start by just doing a snapshot
>> of the files before starting, or at some point in the basebackup; it
>> will need only 256kB per GB of data...
>
> It is actually possible to be more scalable than the simple solution
> you outline here (although that solution works pretty well).

Yes, I suggest something pretty simple to go with as a first shot.

> I've written a program that synchronizes the OS cache state using
> mmap()/mincore() between two computers. I haven't actually tested its
> impact on performance yet, but I was surprised by how fast it actually
> runs and how compact cache maps can be. If one encodes the data so one
> remembers the number of zeros between 1s, storage scales by the amount
> of memory on each side rather than the dataset size. I actually played
> with doing that, then doing huffman encoding of that. I get around
> 1.2-1.3 bits / page of _physical memory_ on my tests. I don't have my
> notes handy, but here are some numbers from memory...

That is interesting; even if I didn't have issues with the size of the
maps so far, I thought that a simple zlib compression should be enough.

> The obvious worst cases are 1 bit per page of _dataset_ or ~19 bits
> per page of physical memory in the machine. The latter limit gets
> better, however, since there are 1024 symbols possible for the encoder
> (since in this case symbols are spans of zeros that need to fit in a
> file that is 1 GB in size). So the actual worst case is much closer to
> 1 bit per page of the dataset or ~10 bits per page of physical memory.
> The real performance I see with huffman is more like 1.3 bits per page
> of physical memory. All the encoding/decoding is actually very fast.
> zlib would actually compress even better than huffman, but the huffman
> encoder/decoder is actually pretty good and very straightforward code.

pgfincore currently holds this information in a flat file. The on-going
dev is simpler and provides the data as bits, so you can store it in a
table, restore it on your slave thanks to SR, and use it on the slave.

> I would like to integrate something like this into PG or perhaps even
> into something like rsync, but it was written as a proof of concept
> and I haven't had time to work on it recently.
>
> Garick

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support
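For the record, the zeros-between-1s encoding Garick describes can be sketched in a few lines; this is a hypothetical illustration (names invented), not his proof-of-concept code:

```c
/* Hypothetical sketch of the encoding described above: walk a mincore()
 * page map and record, for each resident page, the run of non-resident
 * pages since the previous one.  Output size scales with the pages
 * actually in memory, not with dataset size; a Huffman or zlib pass
 * over the run lengths compresses it further. */
static int
encode_cache_map(const unsigned char *pagemap, int npages,
				 int *runs, int maxruns)
{
	int			nruns = 0;
	int			zeros = 0;

	for (int i = 0; i < npages; i++)
	{
		if (pagemap[i] & 1)		/* mincore() sets bit 0 for resident pages */
		{
			if (nruns < maxruns)
				runs[nruns] = zeros;
			nruns++;
			zeros = 0;
		}
		else
			zeros++;
	}
	return nruns;				/* one run length per resident page */
}
```

Trailing non-resident pages produce no output at all, which is exactly why the map size tracks resident memory rather than file size.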
Re: [HACKERS] system views for walsender activity
On Sun, 2011-01-09 at 12:52 +0100, Magnus Hagander wrote:
> One thing I noticed is that it gives an interesting property to my
> patch for streaming base backups - they now show up in
> pg_stat_replication, with a streaming location of 0/0. If the view is
> named pg_stat_replication, we probably want to filter that out
> somehow. But then, do we want a separate view listing the walsenders
> that are busy sending base backups?
>
> For that matter, do we want an indication that separates a walsender
> not sending data from one sending that happens to be at location 0/0?
> Most will leave 0/0 really quickly, but a walsender can be idle (not
> received a command yet), or it can be running IDENTIFY_SYSTEM for
> example.

I think we need a status enum: ('BACKUP', 'CATCHUP', 'STREAM') for the
3 phases of replication.

--
 Simon Riggs           http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
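A minimal sketch of such a state enum (hypothetical names; an extra state for a walsender that has not received a command yet seems needed too, per Magnus's point about idle walsenders):

```c
/* Hypothetical sketch: a per-walsender state the view could expose
 * instead of making clients guess from a 0/0 streaming location. */
typedef enum WalSndState
{
	WALSND_STARTUP,				/* connected, no command received yet */
	WALSND_BACKUP,				/* sending a streaming base backup */
	WALSND_CATCHUP,				/* streaming WAL, behind the insert point */
	WALSND_STREAMING			/* streaming WAL, caught up */
} WalSndState;

/* Text form for display in a system view. */
static const char *
walsnd_state_name(WalSndState state)
{
	switch (state)
	{
		case WALSND_STARTUP:
			return "startup";
		case WALSND_BACKUP:
			return "backup";
		case WALSND_CATCHUP:
			return "catchup";
		case WALSND_STREAMING:
			return "streaming";
	}
	return "unknown";
}
```

With something like this, filtering base backups out of a replication view becomes a simple WHERE clause rather than a special case on the location column.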
Re: [HACKERS] SSI patch(es)
On 09.01.2011 05:06, Robert Haas wrote:
> On Sat, Jan 8, 2011 at 4:10 PM, Kevin Grittner
> <kevin.gritt...@wicourts.gov> wrote:
>> Splitting out those three would leave src/backend/ and src/include/
>> which comes in at a svelte 5891 lines. With a little more work I
>> could split the three new files (predicate.c, predicate.h, and
>> predicate_internals.h) out from the changes scattered around the
>> rest of the code. That's 4346 lines and 1545 lines, respectively.
>> Now, these numbers are likely to change a little in the next few
>> days, but not much as a percentage outside the documentation.
>> Thoughts?
>
> Well, my first thought is - I'm not sure it's realistic to think
> we're going to get this committed to 9.1. But that's not a very
> helpful thought. I just don't know who is going to review 7700 lines
> of code in the next month, and it doesn't sound like it can be
> decomposed into independent subfeatures that can be committed
> independently. I'm tempted to raise my hand and volunteer, but I
> don't want to make promises I might not be able to keep. But I'll
> definitely at least take another look at it.

I don't think pushing this to 9.2 helps at all. This patch has gone
through several rounds of reviews already, and unless someone sees a
major issue with it, it's not going to get any more ready by
postponing it. And it wouldn't be fair to Kevin and others who've
worked hard on it. We'll just have to slurp it in somehow.

> Splitting it up by directory isn't really all that helpful.

Agreed. If it can't easily be split into increments by functionality,
then we'll just have to deal with it as one big patch.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] [COMMITTERS] pgsql: Properly install gram.h on MSVC builds
Magnus Hagander <mag...@hagander.net> writes:
> Properly install gram.h on MSVC builds
>
> This file is now needed by pgAdmin builds, which started failing
> since it was missing in the installer builds.

I'd like to protest this patch as misguided. AFAICS it is a
*seriously* bad idea for pgAdmin to be including gram.h --- we don't
even want most of the backend to include that, let alone external
pieces of code. It's not stable enough. See the note in gramparse.h.

To my way of thinking, not installing that file is a fine idea. What
we really need to be asking is why the pgAdmin folks think they should
be including it.

			regards, tom lane
Re: [HACKERS] [COMMITTERS] pgsql: Properly install gram.h on MSVC builds
On Sun, Jan 9, 2011 at 17:31, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Magnus Hagander <mag...@hagander.net> writes:
>> Properly install gram.h on MSVC builds
>>
>> This file is now needed by pgAdmin builds, which started failing
>> since it was missing in the installer builds.
>
> I'd like to protest this patch as misguided. AFAICS it is a
> *seriously*

Uh, we install the file on Unix, so we should do the same on Windows.

> bad idea for pgAdmin to be including gram.h --- we don't even want
> most of the backend to include that, let alone external pieces of
> code. It's not stable enough. See the note in gramparse.h.

Whether that's misguided is another thing :-)

> To my way of thinking, not installing that file is a fine idea. What
> we really need to be asking is why the pgAdmin folks think they
> should be including it.

It is required in order to pull in kwlist.h, which is used to
determine which keywords to highlight in the SQL editor (and other
places that do syntax highlighting). It never actually uses the
values, but it needs the file for kwlist.h to compile.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
Re: [HACKERS] Support for negative index values in array fetching
On 1/5/11 6:19 PM, Florian Pflug wrote:
>> Sorry, but it isn't too intuitive. Minimally for me. Why don't you
>> think about simple functions with only positive arguments? There are
>> only four combinations. I don't think we must have only one super
>> function. We need functionality for:
>> a) get first n items
>> b) get items without last n items
>> c) get last n items
>> d) skip first n items
>
> Now you've moved the goalpost - the OP wanted to access individual
> elements, not slices! To support slices, a three-argument version of
> array_relative() would be required, with the signature
>
>   array_relative(some_array anyarray, first int[], last int[])
>
> Your requirements (a) to (d) are then easily satisfied:
>
> a) array_relative(ary, array[0], array[n-1])
> b) array_relative(ary, array[0], array[-n-1])
> c) array_relative(ary, array[-n], array[-1])
> d) array_relative(ary, array[n], array[-1])
>
> The individual-function approach might be a tad more readable for
> one-dimensional arrays, but it doesn't scale well to the general
> case. Maybe the OP could comment on whether any of these solutions
> would fit his needs?

Hi,

(sorry for the late reply, got lost in my Inbox)

For my main use case I just needed the last element of the array, but
I could see myself needing a slice as well (i.e. give me the last 5
items in an array). So in that sense, yes, this would fit the bill.

Hannu Valtonen
Lead Software Architect
Technology Office
F-Secure Corporation   http://www.F-Secure.com
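The index-resolution rule behind the proposed array_relative() can be sketched like this (hypothetical helper, matching the convention in Florian's examples where 0 addresses the first element and -1 the last):

```c
/* Resolve a relative index (0-based from the front, negative from the
 * back, -1 being the last element) to a plain 0-based offset.
 * Returns -1 when the index falls outside the array. */
static int
resolve_relative_index(int relidx, int nelems)
{
	if (relidx >= 0)
		return relidx < nelems ? relidx : -1;
	return nelems + relidx >= 0 ? nelems + relidx : -1;
}
```

Under this rule, "the last element" is simply resolve_relative_index(-1, n), and a slice for "the last 5 items" runs from -5 to -1, which is case (c) above.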
Re: [HACKERS] [COMMITTERS] pgsql: Properly install gram.h on MSVC builds
Magnus Hagander <mag...@hagander.net> writes:
> On Sun, Jan 9, 2011 at 17:31, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> I'd like to protest this patch as misguided. AFAICS it is a
>> *seriously*
>
> Uh, we install the file on Unix, so we should do the same on Windows.

Well, my idea of how to fix that would be the other way 'round.

>> What we really need to be asking is why the pgAdmin folks think they
>> should be including it.
>
> It is required in order to pull in kwlist.h,

No, it is not required. What they should be doing is #define'ing
PG_KEYWORD() in a way that ignores its second argument. See pg_dump's
keywords.c for an example of safe usage.

If we allow this to stand then we are going to find people complaining
that we've broken ABI anytime we make a minor change in the grammar.
I repeat: it's a SERIOUSLY bad idea.

			regards, tom lane
Re: [HACKERS] [COMMITTERS] pgsql: Properly install gram.h on MSVC builds
On Sun, Jan 9, 2011 at 17:49, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Magnus Hagander <mag...@hagander.net> writes:
>> On Sun, Jan 9, 2011 at 17:31, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> I'd like to protest this patch as misguided. AFAICS it is a
>>> *seriously*
>>
>> Uh, we install the file on Unix, so we should do the same on Windows.
>
> Well, my idea of how to fix that would be the other way 'round.

Sure, then it's at least consistent...

>>> What we really need to be asking is why the pgAdmin folks think
>>> they should be including it.
>>
>> It is required in order to pull in kwlist.h,
>
> No, it is not required. What they should be doing is #define'ing
> PG_KEYWORD() in a way that ignores its second argument. See pg_dump's
> keywords.c for an example of safe usage.

Ahh, good point. And yes, that seems to work for pgAdmin. I'll commit
a patch there as soon as I've finished testing, at which point it
won't be required anymore.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
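The trick Tom points at works roughly like this: kwlist.h is just a list of PG_KEYWORD(...) invocations, so a client can define the macro to discard the grammar token value and never needs gram.h at all. A sketch, with the keyword list inlined instead of #include'd (the token and category names mirror the real kwlist.h, but this is an illustration, not pgAdmin's code):

```c
/* Define PG_KEYWORD to keep only the keyword name and drop the grammar
 * token value and category.  Because the macro discards its second
 * argument before the compiler sees it, the bison token (ABORT_P etc.)
 * never needs to be defined -- hence no dependency on gram.h. */
#define PG_KEYWORD(kwname, value, category) kwname,

static const char *const keyword_names[] = {
	/* a real client would put: #include "parser/kwlist.h" here */
	PG_KEYWORD("abort", ABORT_P, UNRESERVED_KEYWORD)
	PG_KEYWORD("select", SELECT, RESERVED_KEYWORD)
	PG_KEYWORD("table", TABLE, RESERVED_KEYWORD)
};

#undef PG_KEYWORD
```

This is the same pattern pg_dump's keywords.c uses, and it is immune to grammar token renumbering since the values are never emitted into the client's object code.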
[HACKERS] SSI and 2PC
In going back through old emails to see what issues might have been
raised but not yet addressed for the SSI patch, I found the subject
issue described in a review by Jeff Davis here:

http://archives.postgresql.org/pgsql-hackers/2010-10/msg01159.php

I think this is already handled based on feedback from Heikki:

http://archives.postgresql.org/pgsql-hackers/2010-09/msg00789.php

Basically, the PREPARE TRANSACTION statement plays the same role for
the prepared transaction that COMMIT does for other transactions --
final conflict checking is done and the transaction becomes immune to
later serialization_failure rollback. A transaction which starts after
PREPARE TRANSACTION executes is not considered to overlap with the
prepared transaction. In Jeff's example, if Session2 runs a query
before Session1 executes PREPARE TRANSACTION, one of them will fail.
(I tested to make sure.)

Does that sound sane, or is something else needed here?

-Kevin
[HACKERS] Compatibility GUC for serializable
There's an issue where we don't seem to have consensus yet, so I
figured I'd bounce it off the list.

If the SSI patch were to be accepted as is, REPEATABLE READ would
continue to provide the exact same snapshot isolation behavior which
both it and SERIALIZABLE do through 9.0, and SERIALIZABLE would always
use SSI on top of the snapshot isolation to prevent serialization
anomalies. In his review, Jeff argued for a compatibility GUC which
could be changed to provide legacy behavior for SERIALIZABLE
transactions -- if set, SERIALIZABLE would fall back to working the
same as REPEATABLE READ. In an off-list exchange with me, David Fetter
expressed opposition to this, as a foot-gun.

I'm not sure where anyone else stands on this. Personally, I don't
care a whole lot because it's trivial to add, so that seems to leave
the vote at 1 to 1. Anyone else care to tip the scales?

-Kevin
Re: [HACKERS] Compatibility GUC for serializable
On Sun, Jan 09, 2011 at 12:07:49PM -0600, Kevin Grittner wrote:
> There's an issue where we don't seem to have consensus yet, so I
> figured I'd bounce it off the list.
>
> If the SSI patch were to be accepted as is, REPEATABLE READ would
> continue to provide the exact same snapshot isolation behavior which
> both it and SERIALIZABLE do through 9.0, and SERIALIZABLE would
> always use SSI on top of the snapshot isolation to prevent
> serialization anomalies. In his review, Jeff argued for a
> compatibility GUC which could be changed to provide legacy behavior
> for SERIALIZABLE transactions -- if set, SERIALIZABLE would fall
> back to working the same as REPEATABLE READ. In an off-list exchange
> with me, David Fetter expressed opposition to this, as a foot-gun.
>
> I'm not sure where anyone else stands on this. Personally, I don't
> care a whole lot because it's trivial to add, so that seems to leave
> the vote at 1 to 1. Anyone else care to tip the scales?

For what it's worth, that exchange started with my proposing a
separate SNAPSHOT isolation level, but since we'll already be
providing that isolation level and calling it REPEATABLE READ, I
figured we didn't need an extra one that did the exact same thing. :)

Cheers,
David.
--
David Fetter <da...@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
[HACKERS] multiset patch review
Patching:

    patching file doc/src/sgml/func.sgml
    Hunk #6 succeeded at 10567 (offset 1 line).
    Hunk #7 succeeded at 10621 (offset 1 line).
    Hunk #8 succeeded at 10721 (offset 1 line).
    Hunk #9 succeeded at 10775 (offset 1 line).
    patching file src/backend/nodes/makefuncs.c
    patching file src/backend/parser/gram.y
    patching file src/backend/utils/adt/array_userfuncs.c
    patching file src/include/catalog/pg_aggregate.h
    patching file src/include/catalog/pg_proc.h
    patching file src/include/nodes/makefuncs.h
    patching file src/include/parser/kwlist.h
    patching file src/include/utils/array.h
    Hunk #1 succeeded at 281 (offset 1 line).
    patching file src/test/regress/expected/arrays.out
    Hunk #1 succeeded at 1558 (offset 272 lines).
    patching file src/test/regress/sql/arrays.sql
    Hunk #1 succeeded at 438 (offset 12 lines).

Compilation: there are no warnings related to the patch.

Regress tests failed:

    *** 1639,1647 ****
    ...
      SELECT ARRAY[1, 2] SUBMULTISET OF ARRAY[1, NULL],
             ARRAY[1, 2] SUBMULTISET OF ARRAY[3, NULL];
    !  submultiset_of | submultiset_of ^M
    ! ----------------+----------------^M
    !                 | f^M
      (1 row)
    ...
      SELECT ARRAY[1, NULL] SUBMULTISET OF ARRAY[1, NULL];
    --- 1639,1647 ----
    ...
      SELECT ARRAY[1, 2] SUBMULTISET OF ARRAY[1, NULL],
             ARRAY[1, 2] SUBMULTISET OF ARRAY[3, NULL];
    !  submultiset_of | submultiset_of
    ! ----------------+----------------
    !                 | f
      (1 row)
    ...
      SELECT ARRAY[1, NULL] SUBMULTISET OF ARRAY[1, NULL];

There is an often-used fragment:

    +		fn.arg[0] = values1[n1];
    +		fn.arg[1] = values2[n2];
    +		fn.argnull[0] = false;
    +		fn.argnull[1] = false;
    +		fn.isnull = false;
    +		r = DatumGetInt32(FunctionCallInvoke(&fn));

Can it be moved to a procedure?

The docs and regress tests are otherwise OK. I see only one issue:
it isn't documented what a MULTISET is.

Regards,
Pavel Stehule
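The factoring suggested above could look something like this; a standalone sketch with invented stand-in types, since FunctionCallInfoData and FunctionCallInvoke aren't reproduced here:

```c
#include <stdbool.h>

/* Stand-ins for the fields the repeated fragment touches; the real
 * code operates on a FunctionCallInfoData and goes through the fmgr
 * machinery.  These names are hypothetical. */
typedef long Datum;
typedef struct CmpCallData
{
	Datum		arg[2];
	bool		argnull[2];
	bool		isnull;
	Datum		(*func) (Datum, Datum);
} CmpCallData;

/* The six repeated lines from the patch, moved into one helper:
 * set up both arguments as non-null and invoke the comparison. */
static int
invoke_element_cmp(CmpCallData *fn, Datum a, Datum b)
{
	fn->arg[0] = a;
	fn->arg[1] = b;
	fn->argnull[0] = false;
	fn->argnull[1] = false;
	fn->isnull = false;
	return (int) fn->func(fn->arg[0], fn->arg[1]);
}

/* Trivial equality function for demonstration. */
static Datum
demo_int_eq(Datum a, Datum b)
{
	return a == b;
}
```

Each call site in the patch would then shrink to a single line, which also makes it harder to forget resetting isnull between invocations.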
Re: [HACKERS] GiST insert algorithm rewrite
On 09.01.2011 07:05, Tom Lane wrote:
> I just found out that the benchmark test script in
> contrib/intarray/bench/ crashes HEAD in gistdoinsert() --- it looks
> like it's trying to pop to a stack entry that isn't there. Run it per
> the instructions in the intarray documentation:
>
>     $ createdb TEST
>     $ psql TEST < ../_int.sql
>     ...
>     $ ./create_test.pl | psql TEST
>     CREATE TABLE
>     CREATE TABLE
>     CREATE INDEX
>     CREATE INDEX
>     server closed the connection unexpectedly
>             This probably means the server terminated abnormally
>             before or while processing the request.
>     connection to server was lost
>
> The script generates randomized data, so possibly it won't fail every
> time, but it failed three out of three times for me. The changes I'm
> about to commit in intarray don't seem to make any difference.

Thanks, fixed. Apparently my testing never hit the case where an
update, rather than an insert, into the root page causes it to split.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
[HACKERS] hstore ?& operator versus mathematics
hstore's "hstore ?& text[]" operator is defined as "contains all", ie,
it will return true if all the key names found in the text array are
present in the hstore. ISTM that a sane definition of this operator
would provide that if the array is empty, it returns true: every set
contains the empty set. However, the code goes out of its way to
return false instead.

Perhaps this was done intentionally because there was no way to make
GIN index searches work compatibly ... but now there is, so I see no
reason to remain bug-compatible with this behavior. Barring
objections, I'm going to fix it.

			regards, tom lane
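The vacuous-truth argument in concrete form: a contains-all loop over an empty key array never finds a missing key, so the natural result is true. This is a hypothetical helper for illustration, not hstore's actual implementation:

```c
#include <stdbool.h>
#include <string.h>

/* "Contains all": true iff every requested key appears among the keys
 * present.  With nkeys == 0 the outer loop body never runs, so the
 * function falls through to true -- every hstore contains the empty
 * set of keys, matching the set-theoretic definition argued for above. */
static bool
contains_all(const char *const *keys, int nkeys,
			 const char *const *present, int npresent)
{
	for (int i = 0; i < nkeys; i++)
	{
		bool		found = false;

		for (int j = 0; j < npresent; j++)
		{
			if (strcmp(keys[i], present[j]) == 0)
			{
				found = true;
				break;
			}
		}
		if (!found)
			return false;		/* some requested key is missing */
	}
	return true;				/* vacuously true when nkeys == 0 */
}
```

Returning false for the empty array, as the pre-fix code did, requires an explicit special case; the straightforward loop already gives the mathematically sane answer.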
[HACKERS] ALTER TYPE 1: recheck index-based constraints
When ALTER TABLE rewrites a table, it reindexes, but the reindex does not revalidate UNIQUE/EXCLUDE constraints. This behaves badly in cases like this, neglecting to throw an error on the new UNIQUE violation: CREATE TABLE t (c numeric UNIQUE); INSERT INTO t VALUES (1.1),(1.2); ALTER TABLE t ALTER c TYPE int; The comment gave a reason for skipping the checks: it would cause deadlocks when we rewrite a system catalog. So, this patch changes things to only skip the check for system catalogs. *** a/src/backend/catalog/index.c --- b/src/backend/catalog/index.c *** *** 2544,2552 reindex_index(Oid indexId, bool skip_constraint_checks) * catalog indexes while collecting the list.) * * We also skip rechecking uniqueness/exclusion constraint properties if ! * heap_rebuilt is true. This avoids likely deadlock conditions when doing ! * VACUUM FULL or CLUSTER on system catalogs. REINDEX should be used to ! * rebuild an index if constraint inconsistency is suspected. * * Returns true if any indexes were rebuilt. Note that a * CommandCounterIncrement will occur after each index rebuild. --- 2544,2554 * catalog indexes while collecting the list.) * * We also skip rechecking uniqueness/exclusion constraint properties if ! * heap_rebuilt is true and the relation is a system catalog. This avoids ! * likely deadlock conditions when doing VACUUM FULL or CLUSTER. We trust that ! * constraints will remain consistent across a user-issued ALTER TABLE against a ! * system catalog. REINDEX should be used to rebuild an index if constraint ! * inconsistency is suspected. * * Returns true if any indexes were rebuilt. Note that a * CommandCounterIncrement will occur after each index rebuild. *** *** 2629,2635 reindex_relation(Oid relid, bool toast_too, bool heap_rebuilt) if (is_pg_class) RelationSetIndexList(rel, doneIndexes, InvalidOid); ! reindex_index(indexOid, heap_rebuilt); CommandCounterIncrement(); --- 2631,2637 if (is_pg_class) RelationSetIndexList(rel, doneIndexes, InvalidOid); ! 
reindex_index(indexOid, heap_rebuilt && IsSystemRelation(rel)); CommandCounterIncrement(); *** a/src/test/regress/expected/alter_table.out --- b/src/test/regress/expected/alter_table.out *** *** 1774,1779 DEBUG: Rebuilding index t_strarr_idx --- 1774,1781 DEBUG: Rebuilding index t_square_idx DEBUG: Rebuilding index t_expr_idx DEBUG: Rebuilding index t_touchy_f_idx + ERROR: could not create unique index "t_touchy_f_idx" + DETAIL: Key (touchy_f(constraint1))=(100) is duplicated. ALTER TABLE t ALTER constraint2 TYPE trickint; -- noop-e DEBUG: Rewriting table t DEBUG: Rebuilding index t_constraint4_key *** *** 1792,1816 DEBUG: Rebuilding index t_strarr_idx DEBUG: Rebuilding index t_square_idx DEBUG: Rebuilding index t_touchy_f_idx DEBUG: Rebuilding index t_expr_idx ! -- Temporary fixup until behavior of the previous two improves. ! ALTER TABLE t ALTER constraint1 TYPE int, ALTER constraint2 TYPE int; ! DEBUG: Rewriting table t ! DEBUG: Rebuilding index t_constraint4_key ! DEBUG: Rebuilding index t_integral_key ! DEBUG: Rebuilding index t_rational_key ! DEBUG: Rebuilding index t_daytimetz_key ! DEBUG: Rebuilding index t_daytime_key ! DEBUG: Rebuilding index t_stamptz_key ! DEBUG: Rebuilding index t_stamp_key ! DEBUG: Rebuilding index t_timegap_key ! DEBUG: Rebuilding index t_bits_key ! DEBUG: Rebuilding index t_network_key ! DEBUG: Rebuilding index t_string_idx ! DEBUG: Rebuilding index t_string_idx1 ! DEBUG: Rebuilding index t_strarr_idx ! DEBUG: Rebuilding index t_square_idx ! DEBUG: Rebuilding index t_touchy_f_idx ! DEBUG: Rebuilding index t_expr_idx -- Change a column with an outgoing foreign key constraint. ALTER TABLE t ALTER constraint3 TYPE numeric(8,1); -- rewrite, FK error DEBUG: Rewriting table t --- 1794,1801 DEBUG: Rebuilding index t_square_idx DEBUG: Rebuilding index t_touchy_f_idx DEBUG: Rebuilding index t_expr_idx ! ERROR: could not create unique index "t_expr_idx" ! DETAIL: Key ((1))=(1) is duplicated.
-- Change a column with an outgoing foreign key constraint. ALTER TABLE t ALTER constraint3 TYPE numeric(8,1); -- rewrite, FK error DEBUG: Rewriting table t *** a/src/test/regress/sql/alter_table.sql --- b/src/test/regress/sql/alter_table.sql *** *** 1246,1253 FROM pg_trigger WHERE tgrelid = 't'::regclass ORDER BY tgname; ALTER TABLE t ALTER constraint0 TYPE trickint; -- verify-e ALTER TABLE t ALTER constraint1 TYPE trickint; -- noop-e ALTER TABLE
[HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
This patch removes ALTER TYPE rewrites in cases we can already prove valid. I add a function GetCoerceExemptions() that walks an Expr according to rules discussed in the design thread, simplified slightly pending additions in the next patch. See the comment at that function for a refresher. I use it to populate two new bools in AlteredTableInfo, new_bits and mayerror. new_bits is a superset of new_changeoids, so I subsume that. I change ATRewriteTable to act on those and support the notion of evaluating the transformation expressions when we're not rewriting the table. As unintended fallout, it's no longer an error to add oids or a column with a default value to a table whose rowtype is used in columns elsewhere. This seems best. Defaults on the origin table do not even apply to new inserts into such a column, and the rowtype does not gain an OID column via its table. This helps on conversions like varchar(X)->text, xml->text, and conversions between domains and their base types. *** a/src/backend/commands/tablecmds.c --- b/src/backend/commands/tablecmds.c *** *** 71,76 --- 71,77 #include "storage/smgr.h" #include "utils/acl.h" #include "utils/builtins.h" + #include "utils/datum.h" #include "utils/fmgroids.h" #include "utils/inval.h" #include "utils/lsyscache.h" *** *** 140,147 typedef struct AlteredTableInfo /* Information saved by Phases 1/2 for Phase 3: */ List *constraints; /* List of NewConstraint */ List *newvals; /* List of NewColumnValue */ bool new_notnull; /* T if we added new NOT NULL constraints */ - bool new_changeoids; /* T if we added/dropped the OID column */ Oid newTableSpace; /* new tablespace; 0 means no change */ /* Objects to rebuild after completing ALTER TYPE operations */ List *changedConstraintOids; /* OIDs of constraints to rebuild */ --- 141,149 /* Information saved by Phases 1/2 for Phase 3: */ List *constraints; /* List of NewConstraint */ List *newvals; /* List of NewColumnValue */ + bool new_bits; /* Could stored tuple or OID bits change? 
*/ + bool mayerror; /* Could the newval expressions throw errors? */ bool new_notnull; /* T if we added new NOT NULL constraints */ Oid newTableSpace; /* new tablespace; 0 means no change */ /* Objects to rebuild after completing ALTER TYPE operations */ List *changedConstraintOids; /* OIDs of constraints to rebuild */ *** *** 3184,3190 ATRewriteTables(List **wqueue, LOCKMODE lockmode) * We only need to rewrite the table if at least one column needs to * be recomputed, or we are adding/removing the OID column. */ ! if (tab->newvals != NIL || tab->new_changeoids) { /* Build a temporary relation and copy data */ Relation OldHeap; --- 3186,3192 * We only need to rewrite the table if at least one column needs to * be recomputed, or we are adding/removing the OID column. */ ! if (tab->new_bits) { /* Build a temporary relation and copy data */ Relation OldHeap; *** *** 3250,3256 ATRewriteTables(List **wqueue, LOCKMODE lockmode) * Test the current data within the table against new constraints * generated by ALTER TABLE commands, but don't rebuild data. */ ! if (tab->constraints != NIL || tab->new_notnull) ATRewriteTable(tab, InvalidOid, lockmode); /* --- 3252,3258 * Test the current data within the table against new constraints * generated by ALTER TABLE commands, but don't rebuild data. */ ! if (tab->mayerror || tab->constraints != NIL || tab->new_notnull) ATRewriteTable(tab, InvalidOid, lockmode); /* *** *** 3310,3316 ATRewriteTables(List **wqueue, LOCKMODE lockmode) /* * ATRewriteTable: scan or rewrite one table * ! * OIDNewHeap is InvalidOid if we don't need to rewrite */ static void ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode) --- 3312,3319 /* * ATRewriteTable: scan or rewrite one table * ! * OIDNewHeap is InvalidOid if we don't need to rewrite. If the only new ! * constraints are foreign key constraints, we do nothing here. */ static void
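For readers skimming the diff, the Phase 3 decision this patch arrives at can be restated as a tiny standalone sketch. The struct, enum, and function names here are hypothetical; only the two if-tests mirror the patched ATRewriteTables():

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the flags this patch keeps in AlteredTableInfo. */
typedef struct AlterFlags
{
	bool	new_bits;		/* could stored tuple or OID bits change? */
	bool	mayerror;		/* could the newval expressions throw errors? */
	bool	has_constraints;	/* tab->constraints != NIL */
	bool	new_notnull;		/* added NOT NULL constraints */
} AlterFlags;

enum Phase3Action
{
	SKIP_TABLE,			/* provably no work: neither scan nor rewrite */
	VERIFY_SCAN,			/* read-only pass to catch errors/violations */
	REWRITE_TABLE			/* full copy into a new heap */
};

/* Restates the two tests in the patched ATRewriteTables(). */
static enum Phase3Action
phase3_action(const AlterFlags *tab)
{
	if (tab->new_bits)
		return REWRITE_TABLE;
	if (tab->mayerror || tab->has_constraints || tab->new_notnull)
		return VERIFY_SCAN;
	return SKIP_TABLE;
}
```

A varchar(10)->text conversion, for instance, would leave all four flags false and fall through to SKIP_TABLE, which is exactly the rewrite this patch avoids.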
[HACKERS] ALTER TYPE 4: temporal data types
Add exemptor functions to avoid rewrites for conversions involving the temporal data types. I needed a find-last-set function for the interval_scale exemptor function, so I imported one from FreeBSD. To improve timestamp-timestamptz when the timezone is UTC/GMT, I compare the current timezone definition to the gmt_timezone we keep handy. To make that actually work, I've changed pg_tzset to zero its unused bytes. *** a/configure --- b/configure *** *** 20269,20275 LIBS=`echo $LIBS | sed -e 's/-ledit//g' -e 's/-lreadline//g'` ! for ac_func in crypt erand48 getopt getrusage inet_aton random rint srandom strdup strerror strlcat strlcpy strtol strtoul do as_ac_var=`$as_echo ac_cv_func_$ac_func | $as_tr_sh` { $as_echo $as_me:$LINENO: checking for $ac_func 5 --- 20269,20276 ! ! for ac_func in crypt erand48 fls getopt getrusage inet_aton random rint srandom strdup strerror strlcat strlcpy strtol strtoul do as_ac_var=`$as_echo ac_cv_func_$ac_func | $as_tr_sh` { $as_echo $as_me:$LINENO: checking for $ac_func 5 *** a/configure.in --- b/configure.in *** *** 1288,1294 fi pgac_save_LIBS=$LIBS LIBS=`echo $LIBS | sed -e 's/-ledit//g' -e 's/-lreadline//g'` ! AC_REPLACE_FUNCS([crypt erand48 getopt getrusage inet_aton random rint srandom strdup strerror strlcat strlcpy strtol strtoul]) case $host_os in --- 1288,1294 pgac_save_LIBS=$LIBS LIBS=`echo $LIBS | sed -e 's/-ledit//g' -e 's/-lreadline//g'` ! AC_REPLACE_FUNCS([crypt erand48 fls getopt getrusage inet_aton random rint srandom strdup strerror strlcat strlcpy strtol strtoul]) case $host_os in *** a/src/backend/utils/adt/date.c --- b/src/backend/utils/adt/date.c *** *** 1187,1192 timetypmodout(PG_FUNCTION_ARGS) --- 1187,1203 } + /* time_exemptor() + * Identify superfluous calls to time_scale. Also suitable for timetz_scale. 
+ */ + Datum + time_exemptor(PG_FUNCTION_ARGS) + { + PG_RETURN_INT32(TemporalExemptor(MAX_TIME_PRECISION, + PG_GETARG_INT32(0), + PG_GETARG_INT32(1))); + } + /* time_scale() * Adjust time type for specified scale factor. * Used by PostgreSQL type system to stuff columns. *** a/src/backend/utils/adt/datetime.c --- b/src/backend/utils/adt/datetime.c *** *** 4135,4140 CheckDateTokenTables(void) --- 4135,4158 } /* + * Helper for temporal exemptor functions. Under !HAVE_INT64_TIMESTAMP and the + * wrong floating point implementation, rounding could make a fib of our + * COERCE_EXEMPT_NOCHANGE. Genuine users will appreciate this as a feature: we + * avoid gratuitous data churn due to rounding anomalies in the cast. This + * would yield an Assert failure in a suitably enabled build, at which point we + * might need to weaken that Assert or get serious about the precise truth here. + */ + int32 + TemporalExemptor(int32 max_precis, int32 old_precis, int32 new_precis) + { + if (new_precis == -1 || + new_precis == max_precis || + (old_precis != -1 && new_precis >= old_precis)) + return COERCE_EXEMPT_NOCHANGE | COERCE_EXEMPT_NOERROR; + return COERCE_EXEMPT_NOERROR; + } + + /* * This function gets called during timezone config file load or reload * to create the final array of timezone tokens. The argument array * is already sorted in name order. This data is in a temporary memory *** a/src/backend/utils/adt/timestamp.c --- b/src/backend/utils/adt/timestamp.c *** *** 308,313 timestamptypmodout(PG_FUNCTION_ARGS) --- 308,329 } + /* timestamp_exemptor() + * Identify superfluous calls to timestamp_scale. Also suitable for + * timestamptz_scale. + */ + Datum + timestamp_exemptor(PG_FUNCTION_ARGS) + { + /* +* timestamp_scale throws an error when the typmod is out of range, but we +* can't get there from a cast: our typmodin will have caught it already. 
+*/ + PG_RETURN_INT32(TemporalExemptor(MAX_TIMESTAMP_PRECISION, + PG_GETARG_INT32(0), + PG_GETARG_INT32(1))); + } + /* timestamp_scale() * Adjust time type for specified scale factor. * Used by PostgreSQL type system to stuff columns. *** *** 745,750 interval_send(PG_FUNCTION_ARGS) --- 761,779 PG_RETURN_BYTEA_P(pq_endtypsend(buf)); } + /* + * The interval typmod stores a range in its high 16 bits and a precision in + * its low 16 bits. Both address the resolution of the type, with the range + * affecting resolution larger than one second, and precision affecting + * resolution smaller than one second. The representation is sufficient to + * represent all
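For anyone who wants to poke at the precision logic outside the backend, here is a minimal standalone restatement of the TemporalExemptor() helper above. The COERCE_EXEMPT_* values are assumptions for the sketch; the real flags are defined earlier in this patch series:

```c
#include <assert.h>

#define COERCE_EXEMPT_NOCHANGE	0x01	/* assumed values; the real flags */
#define COERCE_EXEMPT_NOERROR	0x02	/* come from the earlier patch */

/*
 * A precision of -1 means "unconstrained".  Raising (or dropping) the
 * precision can never change stored data; lowering it rounds, so only
 * NOERROR can be promised in that case.
 */
static int
temporal_exemptor(int max_precis, int old_precis, int new_precis)
{
	if (new_precis == -1 ||
		new_precis == max_precis ||
		(old_precis != -1 && new_precis >= old_precis))
		return COERCE_EXEMPT_NOCHANGE | COERCE_EXEMPT_NOERROR;
	return COERCE_EXEMPT_NOERROR;
}
```

So timestamp(2)->timestamp(4) is exempt outright, while timestamp(4)->timestamp(2) keeps only the no-error promise and still forces an evaluation pass.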
[HACKERS] ALTER TYPE 5: varbit and bit
Add exemptor functions for bit and varbit. These are probably the simplest examples of the full range of optimizations. I would have used them as the test case in the initial exemptor function patch if it were a more mainstream use case. *** a/src/backend/utils/adt/varbit.c --- b/src/backend/utils/adt/varbit.c *** *** 18,23 --- 18,24 #include "access/htup.h" #include "libpq/pqformat.h" + #include "nodes/primnodes.h" #include "utils/array.h" #include "utils/varbit.h" *** *** 337,342 bit_send(PG_FUNCTION_ARGS) --- 338,359 } /* + * bit_exemptor() + * Identify superfluous calls to our length coercion function. + */ + Datum + bit_exemptor(PG_FUNCTION_ARGS) + { + bool isExplicit = PG_GETARG_BOOL(2); + + /* +* We could add COERCE_EXEMPT_NOERROR when the old and new lengths are +* identical, but that has zero practical value. +*/ + PG_RETURN_INT32(isExplicit ? 0 : COERCE_EXEMPT_NOCHANGE); + } + + /* * bit() * Converts a bit() type to a specific internal length. * len is the bitlength specified in the column definition. *** *** 645,650 varbit_send(PG_FUNCTION_ARGS) --- 662,683 } /* + * varbit_exemptor() + * Identify superfluous calls to our length coercion function. + */ + Datum + varbit_exemptor(PG_FUNCTION_ARGS) + { + int32 old_max = PG_GETARG_INT32(0); + int32 new_max = PG_GETARG_INT32(1); + bool isExplicit = PG_GETARG_BOOL(2); + + if (new_max <= 0 || (old_max > 0 && new_max >= old_max)) + PG_RETURN_INT32(COERCE_EXEMPT_NOCHANGE | COERCE_EXEMPT_NOERROR); + PG_RETURN_INT32(isExplicit ? 0 : COERCE_EXEMPT_NOCHANGE); + } + + /* * varbit() * Converts a varbit() type to a specific internal length. * len is the maximum bitlength specified in the column definition. *** a/src/include/catalog/catversion.h --- b/src/include/catalog/catversion.h *** *** 53,58 */ /* yyyymmddN */ ! #define CATALOG_VERSION_NO 201101102 #endif --- 53,58 */ /* yyyymmddN */ ! 
#define CATALOG_VERSION_NO201101103 #endif *** a/src/include/catalog/pg_cast.h --- b/src/include/catalog/pg_cast.h *** *** 355,362 DATA(insert ( 1114 1114 1961 3542 i f )); DATA(insert ( 1184 1184 1967 3542 i f )); DATA(insert ( 1186 1186 1200 3543 i f )); DATA(insert ( 1266 1266 1969 3541 i f )); ! DATA(insert ( 1560 1560 1685 0 i f )); ! DATA(insert ( 1562 1562 1687 0 i f )); DATA(insert ( 1700 1700 1703 0 i f )); #endif /* PG_CAST_H */ --- 355,362 DATA(insert ( 1184 1184 1967 3542 i f )); DATA(insert ( 1186 1186 1200 3543 i f )); DATA(insert ( 1266 1266 1969 3541 i f )); ! DATA(insert ( 1560 1560 1685 3817 i f )); ! DATA(insert ( 1562 1562 1687 3819 i f )); DATA(insert ( 1700 1700 1703 0 i f )); #endif /* PG_CAST_H */ *** a/src/include/catalog/pg_proc.h --- b/src/include/catalog/pg_proc.h *** *** 2433,2440 DESCR(int4 to bitstring); --- 2433,2444 DATA(insert OID = 1684 ( int4PGNSP PGUID 12 1 0 0 f f f t f i 1 0 23 1560 _null_ _null_ _null_ _null_ bittoint4 _null_ _null_ _null_ )); DESCR(bitstring to int4); + DATA(insert OID = 3817 ( bit_exemptor PGNSP PGUID 12 1 0 0 f f f t f i 3 0 23 23 23 16 _null_ _null_ _null_ _null_ bit_exemptor _null_ _null_ _null_ )); + DESCR(bit cast exemptor); DATA(insert OID = 1685 ( bitPGNSP PGUID 12 1 0 0 f f f t f i 3 0 1560 1560 23 16 _null_ _null_ _null_ _null_ bit _null_ _null_ _null_ )); DESCR(adjust bit() to typmod length); + DATA(insert OID = 3819 ( varbit_exemptor PGNSP PGUID 12 1 0 0 f f f t f i 3 0 23 23 23 16 _null_ _null_ _null_ _null_ varbit_exemptor _null_ _null_ _null_ )); + DESCR(varbit cast exemptor); DATA(insert OID = 1687 ( varbit PGNSP PGUID 12 1 0 0 f f f t f i 3 0 1562 1562 23 16 _null_ _null_ _null_ _null_ varbit _null_ _null_ _null_ )); DESCR(adjust varbit() to typmod length); *** a/src/include/utils/varbit.h --- b/src/include/utils/varbit.h *** *** 71,77 extern Datum varbit_recv(PG_FUNCTION_ARGS); --- 71,79 extern Datum varbit_send(PG_FUNCTION_ARGS); extern Datum varbittypmodin(PG_FUNCTION_ARGS); extern Datum 
varbittypmodout(PG_FUNCTION_ARGS); + extern Datum bit_exemptor(PG_FUNCTION_ARGS); extern Datum bit(PG_FUNCTION_ARGS); + extern Datum varbit_exemptor(PG_FUNCTION_ARGS); extern Datum varbit(PG_FUNCTION_ARGS); extern Datum biteq(PG_FUNCTION_ARGS); extern Datum bitne(PG_FUNCTION_ARGS); *** a/src/test/regress/expected/alter_table.out --- b/src/test/regress/expected/alter_table.out *** *** 2454,2478 DEBUG: Rebuilding index t_timegap_key ALTER TABLE t ALTER timegap TYPE interval(2);
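The varbit decision above is compact enough to restate standalone. The flag values are assumed for the sketch (the real ones come from the earlier exemptor patch); the conditional mirrors varbit_exemptor():

```c
#include <assert.h>
#include <stdbool.h>

#define COERCE_EXEMPT_NOCHANGE	0x01	/* assumed values; the real flags */
#define COERCE_EXEMPT_NOERROR	0x02	/* come from the earlier patch */

/*
 * A max length <= 0 means "unconstrained".  Widening (or dropping) the
 * limit can neither change data nor error.  Narrowing never changes
 * stored bits for an implicit cast, which raises a too-long error
 * instead of truncating, so NOERROR is withheld; an explicit cast
 * truncates, so nothing at all can be promised for it.
 */
static int
varbit_exempt(int old_max, int new_max, bool is_explicit)
{
	if (new_max <= 0 || (old_max > 0 && new_max >= old_max))
		return COERCE_EXEMPT_NOCHANGE | COERCE_EXEMPT_NOERROR;
	return is_explicit ? 0 : COERCE_EXEMPT_NOCHANGE;
}
```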
[HACKERS] ALTER TYPE 6: numeric
Add an exemptor function for numeric. We store the scale in every datum, making numeric(7,2)->numeric(8,3) unoptimizable. Precision changes work, though. *** a/src/backend/utils/adt/numeric.c --- b/src/backend/utils/adt/numeric.c *** *** 712,717 numeric_send(PG_FUNCTION_ARGS) --- 712,754 } + Datum + numeric_exemptor(PG_FUNCTION_ARGS) + { + int32 old_typmod = PG_GETARG_INT32(0); + int32 new_typmod = PG_GETARG_INT32(1); + int old_scale; + int new_scale; + int old_precis; + int new_precis; + + /* Destination is unconstrained: never any work to do. */ + if (new_typmod < VARHDRSZ) + PG_RETURN_INT32(COERCE_EXEMPT_NOCHANGE | COERCE_EXEMPT_NOERROR); + + /* +* Source is unconstrained, and destination is constrained. A value like +* 10^-1001 will always get truncated in this situation, so we can't help. +*/ + if (old_typmod < VARHDRSZ) + PG_RETURN_INT32(0); + + /* Each value stores its scale. If the scale changes, we can't help. */ + old_scale = (old_typmod - VARHDRSZ) & 0xffff; + new_scale = (new_typmod - VARHDRSZ) & 0xffff; + if (old_scale != new_scale) + PG_RETURN_INT32(0); + + old_precis = ((old_typmod - VARHDRSZ) >> 16) & 0xffff; + new_precis = ((new_typmod - VARHDRSZ) >> 16) & 0xffff; + if (new_precis >= old_precis) + /* Precision increases with scale unchanged: never any work to do. */ + PG_RETURN_INT32(COERCE_EXEMPT_NOCHANGE | COERCE_EXEMPT_NOERROR); + + /* Precision falls with scale unchanged: might error, never changes data. */ + PG_RETURN_INT32(COERCE_EXEMPT_NOCHANGE); + } + /* * numeric() - * *** a/src/include/catalog/catversion.h --- b/src/include/catalog/catversion.h *** *** 53,58 */ /* yyyymmddN */ ! #define CATALOG_VERSION_NO 201101103 #endif --- 53,58 */ /* yyyymmddN */ ! #define CATALOG_VERSION_NO 201101104 #endif *** a/src/include/catalog/pg_cast.h --- b/src/include/catalog/pg_cast.h *** *** 357,362 DATA(insert ( 1186 1186 1200 3543 i f )); DATA(insert ( 1266 1266 1969 3541 i f )); DATA(insert ( 1560 1560 1685 3817 i f )); DATA(insert ( 1562 1562 1687 3819 i f )); ! 
DATA(insert ( 1700 1700 1703 0 i f )); #endif /* PG_CAST_H */ --- 357,362 DATA(insert ( 1266 1266 1969 3541 i f )); DATA(insert ( 1560 1560 1685 3817 i f )); DATA(insert ( 1562 1562 1687 3819 i f )); ! DATA(insert ( 1700 1700 1703 3818 i f )); #endif /* PG_CAST_H */ *** a/src/include/catalog/pg_proc.h --- b/src/include/catalog/pg_proc.h *** *** 2595,2600 DATA(insert OID = 2917 ( numerictypmodinPGNSP PGUID 12 1 0 0 f f f t f i 1 0 --- 2595,2602 DESCR(I/O typmod); DATA(insert OID = 2918 ( numerictypmodoutPGNSP PGUID 12 1 0 0 f f f t f i 1 0 2275 23 _null_ _null_ _null_ _null_ numerictypmodout _null_ _null_ _null_ )); DESCR(I/O typmod); + DATA(insert OID = 3818 ( numeric_exemptor PGNSP PGUID 12 1 0 0 f f f t f i 3 0 23 23 23 16 _null_ _null_ _null_ _null_ numeric_exemptor _null_ _null_ _null_ )); + DESCR(numeric cast exemptor); DATA(insert OID = 1703 ( numeric PGNSP PGUID 12 1 0 0 f f f t f i 2 0 1700 1700 23 _null_ _null_ _null_ _null_ numeric _null_ _null_ _null_ )); DESCR(adjust numeric to typmod precision/scale); DATA(insert OID = 1704 ( numeric_abs PGNSP PGUID 12 1 0 0 f f f t f i 1 0 1700 1700 _null_ _null_ _null_ _null_ numeric_abs _null_ _null_ _null_ )); *** a/src/include/utils/builtins.h --- b/src/include/utils/builtins.h *** *** 878,883 extern Datum numeric_recv(PG_FUNCTION_ARGS); --- 878,884 extern Datum numeric_send(PG_FUNCTION_ARGS); extern Datum numerictypmodin(PG_FUNCTION_ARGS); extern Datum numerictypmodout(PG_FUNCTION_ARGS); + extern Datum numeric_exemptor(PG_FUNCTION_ARGS); extern Datum numeric (PG_FUNCTION_ARGS); extern Datum numeric_abs(PG_FUNCTION_ARGS); extern Datum numeric_uminus(PG_FUNCTION_ARGS); *** a/src/test/regress/expected/alter_table.out --- b/src/test/regress/expected/alter_table.out *** *** 1804,1845 DEBUG: Rebuilding index t_touchy_f_idx DEBUG: Rebuilding index t_expr_idx DEBUG: Validating foreign key constraint t_constraint3_fkey ALTER TABLE t ALTER constraint3 TYPE numeric(7,2); -- verify; FK noop ! DEBUG: Rewriting table t ! 
DEBUG: Rebuilding index t_constraint4_key ! DEBUG: Rebuilding index t_integral_key ! DEBUG: Rebuilding index t_rational_key ! DEBUG: Rebuilding index t_daytimetz_key ! DEBUG: Rebuilding index t_daytime_key ! DEBUG: Rebuilding index t_stamptz_key ! DEBUG:
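The numeric case is easiest to check by hand with the typmod packing in view: numerictypmodin() stores ((precision << 16) | scale) + VARHDRSZ, with -1 meaning "unconstrained". A standalone sketch of the decision (flag values assumed, function names hypothetical):

```c
#include <assert.h>
#include <stdint.h>

#define VARHDRSZ 4			/* matches PostgreSQL's definition */
#define COERCE_EXEMPT_NOCHANGE	0x01	/* assumed values; the real flags */
#define COERCE_EXEMPT_NOERROR	0x02	/* come from the earlier patch */

/* Pack (precision, scale) the way numerictypmodin() does. */
static int32_t
numeric_typmod(int precision, int scale)
{
	return ((precision << 16) | scale) + VARHDRSZ;
}

/* Standalone restatement of numeric_exemptor()'s decision. */
static int
numeric_exempt(int32_t old_typmod, int32_t new_typmod)
{
	if (new_typmod < VARHDRSZ)	/* destination unconstrained */
		return COERCE_EXEMPT_NOCHANGE | COERCE_EXEMPT_NOERROR;
	if (old_typmod < VARHDRSZ)	/* source unconstrained: may truncate */
		return 0;
	if (((old_typmod - VARHDRSZ) & 0xffff) !=
		((new_typmod - VARHDRSZ) & 0xffff))
		return 0;		/* scale change rewrites every datum */
	if ((((new_typmod - VARHDRSZ) >> 16) & 0xffff) >=
		(((old_typmod - VARHDRSZ) >> 16) & 0xffff))
		return COERCE_EXEMPT_NOCHANGE | COERCE_EXEMPT_NOERROR;
	return COERCE_EXEMPT_NOCHANGE;	/* narrower precision: may error only */
}
```

numeric(7,2)->numeric(8,2) comes out fully exempt; numeric(7,2)->numeric(8,3) gets nothing, matching the "we store the scale in every datum" caveat above.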
Re: [HACKERS] Streaming base backups
2011/1/7 Magnus Hagander mag...@hagander.net: On Fri, Jan 7, 2011 at 01:47, Cédric Villemain cedric.villemain.deb...@gmail.com wrote: 2011/1/5 Magnus Hagander mag...@hagander.net: On Wed, Jan 5, 2011 at 22:58, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Magnus Hagander mag...@hagander.net writes: * Stefan mentiond it might be useful to put some posix_fadvise(POSIX_FADV_DONTNEED) in the process that streams all the files out. Seems useful, as long as that doesn't kick them out of the cache *completely*, for other backends as well. Do we know if that is the case? Maybe have a look at pgfincore to only tag DONTNEED for blocks that are not already in SHM? I think that's way more complex than we want to go here. DONTNEED will remove the block from OS buffer everytime. Then we definitely don't want to use it - because some other backend might well want the file. Better leave it up to the standard logic in the kernel. Looking at the patch, it is (very) easy to add the support for that in basebackup.c That supposed allowing mincore(), so mmap(), and so probably switch the fopen() to an open() (or add an open() just for mmap requirement...) Let's go ? It should not be that hard to implement a snapshot(it needs mincore()) and to restore previous state. I don't know how basebackup is performed exactly...so perhaps I am wrong. Uh, it just reads the files out of the filesystem. Just like you'd to today, except it's now integrated and streams the data across a regular libpq connection. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] GIN indexscans versus equality selectivity estimation
Whilst fooling around with GIN over the past few days, I noticed the following rather surprising behavior: regression=# create table t1 (f1 int[]); CREATE TABLE regression=# insert into t1 values (array[42]); INSERT 0 1 regression=# create index ti1 on t1 using gin (f1); CREATE INDEX regression=# set enable_seqscan to 0; SET regression=# explain select * from t1 where f1 = array[42]; QUERY PLAN --- Seq Scan on t1 (cost=10000000000.00..10000000001.01 rows=1 width=32) Filter: (f1 = '{42}'::integer[]) (2 rows) The system absolutely will not use the index in this state, even though the GIN array opclass does support equality. I dug around and eventually figured out why: 1. We don't have any pg_stats statistics in this state; but we do know reltuples = 1, since the CREATE INDEX helpfully updated the pg_class row with that information. Without stats, eqsel() falls back to a selectivity estimate of 1/numdistinct; and without stats, get_variable_numdistinct estimates the number of distinct values as equal to reltuples, if that's a small number. The upshot is that we get a selectivity estimate of exactly 1.0 for the equality condition. 2. In indxpath.c, we have the following comment and code about selecting which paths to generate bitmap scan paths for: * ... Also, pick out the ones that might be useful * as bitmap scans. For that, we must discard indexes that don't support * bitmap scans, and we also are only interested in paths that have some * selectivity; we should discard anything that was generated solely for * ordering purposes. if (ipath->indexinfo->amhasgetbitmap && ipath->indexselectivity < 1.0 && !ScanDirectionIsBackward(ipath->indexscandir)) bitindexpaths = lappend(bitindexpaths, ipath); Since GIN doesn't support plain indexscans, only bitmap scans, this means that the planner simply won't use a GIN index unless the estimated selectivity is less than 1.0. There are several things we might choose to do about this: 1. Do nothing. 
The issue seems quite unlikely to affect anyone in the field, since in fact "use a seqscan" is probably the right answer anytime reltuples <= 1; and anyway using a GIN index for plain equality is a corner case to begin with. However, it could confuse people who were doing testing (it confused me!). 2. Hack the selectivity estimators so that we don't get exactly 1.0 for this sort of situation. It might be plausible for get_variable_numdistinct to set a floor of 2 on its result, for example; or we could hack eqsel() to bound the no-stats estimate to a bit less than 1. 3. Change the indxpath.c code to have some different way of determining whether a path is sensible to use as a bitmap scan. We could for instance make the test be "path has some indexquals" and/or "index is partial". Or maybe there should be an exception in the case where the index type doesn't support plain indexscan. I'm not really sold on any of these alternatives. Thoughts? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
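Option 2 is easy to visualize: with a floor of 2 on the no-stats ndistinct estimate, eqsel()'s 1/numdistinct fallback can never reach exactly 1.0, so the indxpath.c filter would no longer reject the GIN path. A hypothetical sketch of that interaction, not the actual planner code:

```c
#include <assert.h>

/*
 * No-stats equality selectivity with the proposed floor of 2 on the
 * ndistinct estimate.  With reltuples = 1 this yields 0.5 instead of
 * exactly 1.0, which is the value that made the bitmap-path test in
 * indxpath.c (indexselectivity < 1.0) discard the GIN path.
 */
static double
no_stats_eq_selectivity(double reltuples)
{
	double	ndistinct = reltuples;

	if (ndistinct < 2.0)
		ndistinct = 2.0;	/* proposed floor */
	return 1.0 / ndistinct;	/* always strictly below 1.0 now */
}
```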
Re: [HACKERS] Compatibility GUC for serializable
Kevin Grittner kevin.gritt...@wicourts.gov writes: If the SSI patch were to be accepted as is, REPEATABLE READ would continue to provide the exact same snapshot isolation behavior which both it and SERIALIZABLE do through 9.0, and SERIALIZABLE would always use SSI on top of the snapshot isolation to prevent serialization anomalies. In his review, Jeff argued for a compatibility GUC which could be changed to provide legacy behavior for SERIALIZABLE transactions -- if set, SERIALIZABLE would fall back to working the same as REPEATABLE READ. In an off-list exchange with me, David Fetter expressed opposition to this, as a foot-gun. I think we've learned over the years that GUCs that significantly change semantics can be foot-guns. I'm not sure exactly how dangerous this one would be, but on the whole I'd prefer to avoid introducing a GUC here. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN indexscans versus equality selectivity estimation
On 1/9/11 3:38 PM, Tom Lane wrote: 1. Do nothing. The issue seems quite unlikely to affect anyone in the field, since in fact "use a seqscan" is probably the right answer anytime reltuples <= 1; and anyway using a GIN index for plain equality is a corner case to begin with. However, it could confuse people who were doing testing (it confused me!). +1. It's an unexpected result, but not actually a bad one. It just doesn't seem worth messing with code which works in production just to help testing. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Compatibility GUC for serializable
On Sun, Jan 9, 2011 at 7:47 PM, Tom Lane t...@sss.pgh.pa.us wrote: Kevin Grittner kevin.gritt...@wicourts.gov writes: If the SSI patch were to be accepted as is, REPEATABLE READ would continue to provide the exact same snapshot isolation behavior which both it and SERIALIZABLE do through 9.0, and SERIALIZABLE would always use SSI on top of the snapshot isolation to prevent serialization anomalies. In his review, Jeff argued for a compatibility GUC which could be changed to provide legacy behavior for SERIALIZABLE transactions -- if set, SERIALIZABLE would fall back to working the same as REPEATABLE READ. In an off-list exchange with me, David Fetter expressed opposition to this, as a foot-gun. I think we've learned over the years that GUCs that significantly change semantics can be foot-guns. I'm not sure exactly how dangerous this one would be, but on the whole I'd prefer to avoid introducing a GUC here. I agree. I think we should assume that existing code which asks for serializable behavior wants serializable behavior, not broken serializable behavior. There certainly could be cases where the opposite is true (the code wants, specifically, our traditional definition of serializability rather than actual serializability) but I bet there's not a whole lot of them, and changing such code to ask for REPEATABLE READ probably isn't extremely difficult. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN indexscans versus equality selectivity estimation
On Sun, Jan 9, 2011 at 6:38 PM, Tom Lane t...@sss.pgh.pa.us wrote: or we could hack eqsel() to bound the no-stats estimate to a bit less than 1. This seems like a pretty sensible thing to do. I can't immediately imagine a situation in which 1.0 is a sensible selectivity estimate in the no-stats case and 0.90 (say) is a major regression. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hstore ?& operator versus mathematics
On Sun, Jan 09, 2011 at 04:10:25PM -0500, Tom Lane wrote: hstore's "hstore ?& text[]" operator is defined as "contains all", ie, it will return true if all the key names found in the text array are present in the hstore. ISTM that a sane definition of this operator would provide that if the array is empty, it returns true: every set contains the empty set. However, the code goes out of its way to return false instead. Perhaps this was done intentionally because there was no way to make GIN index searches work compatibly ... but now there is, so I see no reason to remain bug-compatible with this behavior. Barring objections, I'm going to fix it. +1 for fixing it. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
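The mathematics being +1'd here is plain vacuous truth: a "contains all" loop over zero requested keys never finds a missing one, so returning true is the behavior that needs no special case. A plain-C sketch of the semantics (string arrays standing in for hstore's actual representation):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * True iff every key in want[] appears in have[].  When nwant == 0 the
 * outer loop body never runs and we fall straight through to true; the
 * old hstore code had to go out of its way to return false instead.
 */
static bool
contains_all(const char **have, int nhave, const char **want, int nwant)
{
	for (int i = 0; i < nwant; i++)
	{
		bool	found = false;

		for (int j = 0; j < nhave; j++)
			if (strcmp(want[i], have[j]) == 0)
			{
				found = true;
				break;
			}
		if (!found)
			return false;
	}
	return true;			/* vacuously true for the empty array */
}
```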