[HACKERS] WIP: default values for function parameters
Hello,

I have a problem with sending the patch, so I am sending a link instead:
http://www.pgsql.cz/patches/defaults.diff.gz

Example:

postgres=# create function fx(a int, b int default 30, c int default 40)
postgres-# returns int as $$ select $1 + $2 + $3; $$
postgres-# language sql;
CREATE FUNCTION
postgres=# select fx();
ERROR:  function fx() does not exist
LINE 1: select fx();
               ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
postgres=# select fx(10);
 fx
----
 80
(1 row)

postgres=# select fx(10,11);
 fx
----
 61
(1 row)

postgres=# select fx(10,11,12);
 fx
----
 33
(1 row)

Known bugs: blind ambiguous-call detection.

Comments, ideas?

Regards,
Pavel Stehule

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] new correlation metric
Currently, we use correlation to estimate the I/O costs of an index scan. However, this has some problems:

http://archives.postgresql.org/pgsql-hackers/2008-01/msg01084.php

I worked with Nathan Boley to come up with what we think is a better metric for measuring this cost. It is based on the number of times in the ordered sample that you have to physically backtrack (i.e. the data value increases, but the physical position is earlier).

For example, if the table's physical order is

  6 7 8 9 10 1 2 3 4 5

then the new metric will see one backtrack -- after reading 5 it backtracks to the beginning to read 6.

The old code would have produced a correlation near -0.5; the new code will yield a correlation of 8/9. The old code would have produced a run_cost of 1/4*min_IO_cost + 3/4*max_IO_cost, while the new code produces a run_cost of 8/9*min_IO_cost + 1/9*max_IO_cost.

Attached is a patch. We replaced correlation but left the name the same (even though that's now a misnomer). We can change it if needed.

Comments?

Regards,
Jeff Davis

Attachment: correlation.diff.gz
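[Editorial note: the proposed metric can be sketched as a small simulation. This is hypothetical illustration code, not taken from the patch; the function name and the normalization (non-backtracking transitions divided by total transitions, matching the 8/9 figure above) are assumptions based on the description in the mail.]

```python
def backtrack_correlation(physical_order):
    """Toy version of the proposed metric: visit the values in sorted
    order and count how often the physical position moves backward."""
    # physical position of each value, listed in value order
    positions = sorted(range(len(physical_order)),
                       key=lambda i: physical_order[i])
    backtracks = sum(1 for prev, nxt in zip(positions, positions[1:])
                     if nxt < prev)
    transitions = len(physical_order) - 1
    return (transitions - backtracks) / transitions

# The example from the mail: one backtrack over 9 transitions -> 8/9
print(backtrack_correlation([6, 7, 8, 9, 10, 1, 2, 3, 4, 5]))
```

A perfectly ordered table scores 1.0 and a fully reversed one scores 0.0, which is what the interpolation between min_IO_cost and max_IO_cost above relies on.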
Re: [HACKERS] new correlation metric
On Sun, Oct 26, 2008 at 01:38:02AM -0700, Jeff Davis wrote:
> I worked with Nathan Boley to come up with what we think is a better
> metric for measuring this cost. It is based on the number of times in
> the ordered sample that you have to physically backtrack (i.e. the
> data value increases, but the physical position is earlier). For
> example, if the table's physical order is
>
>   6 7 8 9 10 1 2 3 4 5

How does it deal with a case like the following:

  1 6 2 7 3 8 4 9 5 10   (interleaving)

ISTM that your code will overestimate the cost, whereas the old code wouldn't have done too badly.

I think the code is in the right direction, but I think what you want is some kind of estimate of "given I've looked for tuple X, how many tuples in the next k pages are near this one". Unfortunately I don't see a way of calculating it other than a full simulation.

Have a nice day,
--
Martijn van Oosterhout   [EMAIL PROTECTED]   http://svana.org/kleptog/
Please line up in a tree and maintain the heap invariant while boarding. Thank you for flying nlogn airlines.
Re: [HACKERS] new correlation metric
Martijn van Oosterhout [EMAIL PROTECTED] writes:
> On Sun, Oct 26, 2008 at 01:38:02AM -0700, Jeff Davis wrote:
>> I worked with Nathan Boley to come up with what we think is a better
>> metric for measuring this cost.
>
> I think the code is in the right direction, but I think what you want
> is some kind of estimate of "given I've looked for tuple X, how many
> tuples in the next k pages are near this one".

ISTM that some experimental studies would be required to justify any proposal in this area.

Having said that ... one of the things I never liked about the existing code is that it pays no attention to block-boundary effects. It doesn't matter to an indexscan how badly tuples within a single block are misordered --- what matters is how many block reads you have to do. So I think that any changes here ought to include measuring the correlation on the basis of block numbers, not tuple numbers. But what's not too clear to me is whether our sampling methods would mess that up.

			regards, tom lane
Re: [HACKERS] new correlation metric
Martijn van Oosterhout wrote:
> On Sun, Oct 26, 2008 at 01:38:02AM -0700, Jeff Davis wrote:
>> I worked with Nathan Boley to come up with what we think is a better
>> metric for measuring this cost. It is based on the number of times in
>> the ordered sample that you have to physically backtrack (i.e. the
>> data value increases, but the physical position is earlier). For
>> example, if the table's physical order is
>>
>>   6 7 8 9 10 1 2 3 4 5
>
> How does it deal with a case like the following:
>
>   1 6 2 7 3 8 4 9 5 10   (interleaving)
>
> ISTM that your code will overestimate the cost whereas the old code
> wouldn't have done too badly.

Another similar case is where the table is almost in order, but not quite. For example, take an ordered table, but swap every other value:

  2 1 4 3 6 5 7 8 10 9

Distributions like that would happen naturally if you have a logging application that receives timestamped events from elsewhere. The events would come in roughly in timestamp order, but not quite, because of differing delays, for example. For all practical purposes, that is just as good as a perfectly sorted table.

However, the patch actually operates on the *sample*, not the real physical order. So I think it would actually work well for my example, because if you take just a few values from that sequence at random, the sample is likely to be in good order. Not sure about the interleaved case.

I wonder how the proposed measurement behaves wrt. the (sample size)/(table size) ratio. Intuitively, it feels like it becomes less sensitive to disorder as the table grows and (sample size)/(table size) becomes smaller. Which is not good, I believe.

> I think the code is in the right direction, but I think what you want
> is some kind of estimate of "given I've looked for tuple X, how many
> tuples in the next k pages are near this one". Unfortunately I don't
> see a way of calculating it other than a full simulation.

Yeah, it would be nice to figure out something, as the current method sure isn't perfect.
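[Editorial note: Heikki's point about the sample versus the full table can be checked with a quick simulation. This is hypothetical illustration code, not from the patch: it counts backtracks, as described earlier in the thread, on a large "every adjacent pair swapped" table and on a sparse sample of it.]

```python
import random

def count_backtracks(physical_order):
    """Count how often the physical position moves backward when the
    values are visited in sorted order."""
    positions = sorted(range(len(physical_order)),
                       key=lambda i: physical_order[i])
    return sum(1 for prev, nxt in zip(positions, positions[1:]) if nxt < prev)

# "Almost in order": an ordered table with every adjacent pair swapped.
n = 10000
table = list(range(n))
for i in range(0, n - 1, 2):
    table[i], table[i + 1] = table[i + 1], table[i]

# The full table shows one backtrack per swapped pair (n/2 of them),
# even though the table is, for I/O purposes, essentially sorted.
print(count_backtracks(table))

# A sparse sample (rows kept in physical order) rarely picks up both
# halves of a swapped pair, so it usually looks perfectly ordered.
random.seed(0)
sample = [table[i] for i in sorted(random.sample(range(n), 100))]
print(count_backtracks(sample))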
--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] new correlation metric
I haven't looked at the patch yet -- I'm actually on a train now. I'm sorry if these questions are answered in the patch.

I think there are three questions here:

A) what metric are we going to use
B) how do we estimate the metric given a sample
C) how do we draw the conclusions we need based on this metric

I looked into this topic recently and found papers on estimating two metrics based on samples: longest sorted subsequence and number of sorted subsequences. The latter sounds like it might be equivalent to what you have. I didn't find any papers on drawing conclusions from either of these metrics, though.

I think we can just look at block numbers to avoid intra-block disorder confusing us. It does mean we effectively have a much smaller sample, though.

I don't think the other objection is worth worrying about, though. If our sample has 1 5 2 6 3 7 4 8, then it may be that readahead will save us in that particular instance, but it's clear the table isn't well clustered, and it's just luck that those blocks were intermixed in that very particular way. The rest of the table is unlikely to be so lucky.

greg

On 26 Oct 2008, at 03:51 PM, Tom Lane [EMAIL PROTECTED] wrote:
> Martijn van Oosterhout [EMAIL PROTECTED] writes:
>> On Sun, Oct 26, 2008 at 01:38:02AM -0700, Jeff Davis wrote:
>>> I worked with Nathan Boley to come up with what we think is a better
>>> metric for measuring this cost.
>>
>> I think the code is in the right direction, but I think what you want
>> is some kind of estimate of "given I've looked for tuple X, how many
>> tuples in the next k pages are near this one".
>
> ISTM that some experimental studies would be required to justify any
> proposal in this area.
>
> Having said that ... one of the things I never liked about the existing
> code is that it pays no attention to block-boundary effects. It doesn't
> matter to an indexscan how badly tuples within a single block are
> misordered --- what matters is how many block reads you have to do.
> So I think that any changes here ought to include measuring the
> correlation on the basis of block numbers, not tuple numbers. But
> what's not too clear to me is whether our sampling methods would mess
> that up.
>
> 			regards, tom lane
Re: [HACKERS] new correlation metric
Martijn van Oosterhout [EMAIL PROTECTED] writes:
> I think the code is in the right direction, but I think what you want
> is some kind of estimate of "given I've looked for tuple X, how many
> tuples in the next k pages are near this one". Unfortunately I don't
> see a way of calculating it other than a full simulation.

I wonder whether we ought to rethink the problem entirely. In particular, the notion of associating correlation stats with particular columns doesn't seem amazingly useful when you get right down to it. We're wasting our time calculating the correlation of a column that has no index; and if the column is part of a multicolumn index, it's not that easy to figure out what the index correlation is from the per-column numbers. Not to mention functional and partial indexes.

So it occurs to me that maybe we should forget about per-column correlations altogether, and instead try directly to calculate *per-index* correlations. You could do this now by doing an index-only scan and looking at the series of tuple block numbers that come back. However, there's no obvious way to do any sampling in that approach --- you can't read random subsets of the index without cooperation from the index AM.

So the direction I'm headed in is to imagine that we should add an "analyze" entry point to the index AM API, with the definition that it will be called once per index per ANALYZE, and is in charge of putting useful stats into someplace or other. We might need to invent some other catalog besides pg_statistic if we want to represent per-index properties like correlation. A minimal solution would be to add a correlation column to pg_class or pg_index, but that doesn't scale well if you imagine that different index AMs might want different sorts of stats.

			regards, tom lane
[HACKERS] array_agg and array_accum (patch)
Here is a patch to support two aggregate functions:

1) ARRAY_AGG() -- SQL 2008 standard behavior: returns NULL on no input, and skips NULL inputs.

2) ARRAY_ACCUM() -- returns an empty array on no input, and includes NULL inputs.

These accumulate the result in a memory context that lives across calls to the state function, so it's reasonably efficient. On my old laptop it takes about 5s to generate an array of 1M elements -- not great, but at least it's linear.

Although array_agg is the standard behavior, array_accum is important because otherwise you always lose the NULLs, and that's difficult to work around even with COALESCE.

I added them as new native functions because ARRAY_AGG is in the standard, but if others think they should live elsewhere that's fine. I think that they are generally pretty useful functions for people using arrays.

This patch is contributed by Truviso.

Regards,
Jeff Davis

Attachment: array_agg.diff.gz
Re: [HACKERS] BufferAccessStrategy for bulk insert
Robert Haas [EMAIL PROTECTED] writes:
> I am kind of inclined to define flags like this:
>
> #define HEAP_INSERT_SKIP_WAL  0x0001
> #define HEAP_INSERT_SKIP_FSM  0x0002
> #define HEAP_INSERT_BULK      0x0004  /* do we even need this one? */
>
> And then:
>
> Oid heap_insert(Relation relation, HeapTuple tup, CommandId cid,
>                 unsigned options, BulkInsertState *bistate);
> BulkInsertState *GetBulkInsertState(void);
> void FreeBulkInsertState(BulkInsertState *);

Seems sane to me. I don't see the point of the HEAP_INSERT_BULK flag bit --- providing or not providing bistate would cover that, and if you have a bit as well then you have to define what the inconsistent combinations mean. I concur with making all-zeroes be the typical state of the flag bits, too.

FWIW, we generally declare bitmask flag variables as int, unless there's some really good reason to do otherwise.

			regards, tom lane
Re: [HACKERS] new correlation metric
On Sun, 2008-10-26 at 12:44 -0400, Tom Lane wrote:
> I wonder whether we ought to rethink the problem entirely. [...]

What you say makes a lot of sense. We would have to take a sample of index leaf pages, but I think we could get a really useful number from it. For a BTree, we can just read the values in order and maintain 3 counts:

* same_count: this TID points to the same page as the last one did
* near_count: points to a page that is close enough that readahead or caching will help
* far_count: points to a page where we don't have a reason to think readahead or caching will be a major factor

Then we could use a formula like:

  run_cost = min_IO_cost
           + (max_IO_cost - min_IO_cost) * (far_count/total_count)
           + small_factor * (max_IO_cost - min_IO_cost) * (near_count/total_count)

small_factor should be one minus the fraction of the near_count reads that we expect to find in cache (because they were recently read, or due to readahead).

I think these numbers take most of the concerns into account. Heikki's almost-in-order case is solved because we could recognize that the pages are close enough to still be in cache. Interleaving might still cause a lower estimate, but it seems like any optimistic estimate we could make for that case is pretty risky: if the caching effects or readahead aren't as helpful as we expected, it would mean very bad performance. Of course, I agree that we need some empirical testing.

> useful stats into someplace or other. We might need to invent some
> other catalog besides pg_statistic if we want to represent per-index
> properties like correlation. A minimal solution would be to add a
> correlation column to pg_class or pg_index, but that doesn't scale
> well if you imagine that different index AMs might want different
> sorts of stats.

Why can't we just use pg_statistic with the starelid set to the index oid?
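[Editorial note: the proposed cost formula can be written out as a small function. This is a hypothetical sketch; the near_cache_hit_frac parameter is an assumed tunable standing in for the cache-hit fraction mentioned above, not anything from the thread or the patch.]

```python
def run_cost(min_io_cost, max_io_cost, same_count, near_count, far_count,
             near_cache_hit_frac=0.8):
    """Interpolate between best- and worst-case I/O cost using the
    three per-index counts described above."""
    total_count = same_count + near_count + far_count
    spread = max_io_cost - min_io_cost
    # small_factor: fraction of "near" reads we expect NOT to find in cache
    small_factor = 1.0 - near_cache_hit_frac
    return (min_io_cost
            + spread * (far_count / total_count)
            + small_factor * spread * (near_count / total_count))

# Perfectly clustered index: cost collapses to min_IO_cost
print(run_cost(1.0, 100.0, 100, 0, 0))
# Fully scattered index: cost rises to max_IO_cost
print(run_cost(1.0, 100.0, 0, 0, 100))
```

Note how "near" pages are charged only small_factor of the spread, which is what lets the almost-in-order case stay cheap.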
Regards,
Jeff Davis
Re: [HACKERS] array_agg and array_accum (patch)
I think array_agg also keeps nulls -- although the draft standard I have seems to contradict itself about this...

Ian

2008/10/26 Jeff Davis [EMAIL PROTECTED]:
> Here is a patch to support two aggregate functions:
>
> 1) ARRAY_AGG() -- SQL 2008 standard behavior, returns NULL on no input,
> and skips NULL inputs.
>
> 2) ARRAY_ACCUM() -- Returns empty array on no input, and includes NULL
> inputs.
>
> These accumulate the result in a memory context that lives across calls
> to the state function, so it's reasonably efficient. On my old laptop it
> takes about 5s to generate an array of 1M elements -- not great, but at
> least it's linear.
>
> Although array_agg is the standard behavior, array_accum is important
> because otherwise you always lose the NULLs, and that's difficult to
> work around even with COALESCE.
>
> I added them as new native functions because ARRAY_AGG is in the
> standard, but if others think they should live elsewhere that's fine. I
> think that they are generally pretty useful functions for people using
> arrays.
>
> This patch is contributed by Truviso.
>
> Regards,
> Jeff Davis
Re: [HACKERS] new correlation metric
Jeff Davis [EMAIL PROTECTED] writes:
> On Sun, 2008-10-26 at 12:44 -0400, Tom Lane wrote:
>> We might need to invent some other catalog besides pg_statistic if we
>> want to represent per-index properties like correlation.
>
> Why can't we just use pg_statistic with the starelid set to the index oid?

Well, because pg_statistic is built for per-column stats. You'd have to invent some value for staattnum, which would be problematic for views like pg_stats that expect it to join to a valid pg_attribute row; and you'd have useless columns like stanullfrac and stadistinct.

There's no problem with using pg_statistic for stats that correspond to individual index columns (and in fact we do that already); but ISTM the point here is that correlation/ordering is about the index as a whole, not any one column of it.

			regards, tom lane
Re: [HACKERS] array_agg and array_accum (patch)
Jeff,

* Jeff Davis ([EMAIL PROTECTED]) wrote:
> 2) ARRAY_ACCUM() -- Returns empty array on no input, and includes NULL
> inputs.

Excellent. I added it the easy way (from the online docs) for psql to use with the column-level privs patch, but that's clearly not at all efficient, and I was going to try to fix it. It'd be great to use a more efficient mechanism like this, and to remove adding it from my patch (erm, it's only one line currently, but it would have been a lot more eventually :).

I haven't actually reviewed the code at all, but +1 in general to adding this to core.

Thanks,
Stephen
Re: [HACKERS] BufferAccessStrategy for bulk insert
> Seems sane to me. I don't see the point of the HEAP_INSERT_BULK flag
> bit --- providing or not providing bistate would cover that, and if you
> have a bit as well then you have to define what the inconsistent
> combinations mean. I concur with making all-zeroes be the typical state
> of the flag bits, too.

Thanks for the design review. I had thought to make the inconsistent combinations fail an assertion, but I'm just as happy to leave it out altogether.

> FWIW, we generally declare bitmask flag variables as int, unless
> there's some really good reason to do otherwise.

OK, thanks for the tip.

...Robert
[HACKERS] WIP patch: convert SQL-language functions to return tuplestores
We have an open TODO item to support SQL-language functions that return the output of a RETURNING clause attached to an INSERT/UPDATE/DELETE query within the function. This is something that was left undone in the 8.2 development cycle after this thread analyzed the problem:

http://archives.postgresql.org/pgsql-hackers/2006-10/msg00665.php

The basic conclusion was that the only sane way to do it is to have the SQL function execute the DML command to completion and then return the emitted rows in a tuplestore. Which is fine, but it would be pretty messy to do unless we make set-returning SQL functions return tuplestores all the time, and there was worry that that might lose performance compared to the existing value-per-call protocol.

Attached is a draft patch that converts set-returning SQL functions to return tuplestores. It's pretty incomplete --- it doesn't actually add the RETURNING feature, and there's a lot of ugly stuff to clean up --- but it passes regression tests and it's close enough for performance testing. What I find is that the performance hit doesn't seem too bad. The test case I'm using looks like this:

regression=# create function foo(int) returns setof int as
regression-# 'select generate_series(1,$1)' language sql;
CREATE FUNCTION
regression=# select count(*) from (select foo(NNN)) ss;

This example is chosen with malice aforethought to stress the tuplestore performance as much as possible. The internal generate_series() call is about as cheap a set-generating function as possible, and it returns through the value-per-call mechanism, so there is no added tuplestore in the way. In the outer query we again avoid the tuplestore that would be created by nodeFunctionscan.c, and we use an upper count(*) to avoid shipping all the rows to the client. I should note also that the function is intentionally declared VOLATILE to prevent its being inlined into the calling query.
What I find on my development machine is that CVS HEAD processes this query at about 1.33 microsec/row. With the attached patch, the speed is about 1.0 usec/row if the tuplestore stays within work_mem; about 1.3 usec/row if it spills to disk but doesn't overflow available kernel disk cache; and about 1.56 usec/row in cases considerably larger than available RAM, when we actually have to write the data to disk and read it back. This is on my development workstation, which is a dual 2.8GHz Xeon EM64T with your typical junk consumer-grade single ATA disk drive, running Fedora 9. (BTW, the test seems to be mostly CPU-bound even when spilling to disk.)

So I'm concluding that we can easily afford to switch to tuplestore-always operation, especially if we are willing to put any effort into tuplestore optimization. (I note that the current tuplestore code writes 24 bytes per row for this example, which is a shade on the high side for only 4 bytes of payload. It looks like it would be pretty easy to knock 10 bytes off that, for a 40% savings in I/O volume.)

I'm putting up this patch mostly so that anyone who's worried about the performance issue can do their own tests. It's definitely not meant for style or completeness critiques ;-)

BTW, the patch also removes the existing limitation of not being able to call set-returning plpgsql functions in a SELECT targetlist...

			regards, tom lane

Attachment: functions-via-tuplestore-1.patch.gz
Re: [HACKERS] array_agg and array_accum (patch)
Ian Caulfield [EMAIL PROTECTED] writes:
> I think array_agg also keeps nulls -- although the draft standard I
> have seems to contradict itself about this...

The SQL:2008 draft I have says, in 10.9 <aggregate function> general rule 8g:

  NOTE 267 -- Null values are not eliminated when computing <array
  aggregate function>. This, plus the optional <sort specification
  list>, sets <array aggregate function> apart from general set
  functions.

So that seems to make it perfectly clear that nulls aren't eliminated, and furthermore to be an intentional override of any other part of the spec that you might think says nulls should be eliminated. If you have an argument to read it otherwise, please say exactly what.

A larger objection to Jeff's draft patch is that it doesn't implement the <sort specification list>. I'm entirely happy about not doing that --- the current SQL committee's willingness to invent random new syntax and nonorthogonal behavior for every function they can think of will be the death of SQL yet --- but it's something for which we at least need to document the workaround.

			regards, tom lane