Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape
On 9/19/15 10:46 AM, Tom Lane wrote: > Peter Eisentrautwrites: >> On 7/23/15 6:39 PM, Tom Lane wrote: >>> + 2202HEERRCODE_INVALID_TABLESAMPLE_ARGUMENT >>>invalid_tablesample_argument >>> + 2202GEERRCODE_INVALID_TABLESAMPLE_REPEAT >>>invalid_tablesample_repeat > >> Where did you get these error codes from? The constants in the SQL >> standard would map to > >> ERRCODE_INVALID_SAMPLE_SIZE >> ERRCODE_INVALID_REPEAT_ARGUMENT_IN_A_SAMPLE_CLAUSE > >> Were you looking at a different standard, or did you intentionally >> choose to rephrase? > > I was looking at SQL:2011. My concern in naming them that way was that > I wanted to have errcodes that would be general enough for any tablesample > extension to use, but still be tablesample-specific, ie I don't want them > to have to fall back on say ERRCODE_INVALID_PARAMETER_VALUE. Makes sense. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape
Peter Eisentrautwrites: > On 7/23/15 6:39 PM, Tom Lane wrote: >> + 2202HEERRCODE_INVALID_TABLESAMPLE_ARGUMENT >> invalid_tablesample_argument >> + 2202GEERRCODE_INVALID_TABLESAMPLE_REPEAT >> invalid_tablesample_repeat > Where did you get these error codes from? The constants in the SQL > standard would map to > ERRCODE_INVALID_SAMPLE_SIZE > ERRCODE_INVALID_REPEAT_ARGUMENT_IN_A_SAMPLE_CLAUSE > Were you looking at a different standard, or did you intentionally > choose to rephrase? I was looking at SQL:2011. My concern in naming them that way was that I wanted to have errcodes that would be general enough for any tablesample extension to use, but still be tablesample-specific, ie I don't want them to have to fall back on say ERRCODE_INVALID_PARAMETER_VALUE. Is your concern that we shouldn't be extending the meaning of these standard SQLSTATE numbers in that way, or that I didn't slavishly follow the standard's wording while naming the macros, or what exactly? It's certainly not too late to change this, but we need to agree on what would be better. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape
On 7/23/15 6:39 PM, Tom Lane wrote: > + 2202HEERRCODE_INVALID_TABLESAMPLE_ARGUMENT > invalid_tablesample_argument > + 2202GEERRCODE_INVALID_TABLESAMPLE_REPEAT > invalid_tablesample_repeat Where did you get these error codes from? The constants in the SQL standard would map to ERRCODE_INVALID_SAMPLE_SIZE ERRCODE_INVALID_REPEAT_ARGUMENT_IN_A_SAMPLE_CLAUSE Were you looking at a different standard, or did you intentionally choose to rephrase? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape
Petr Jelinek p...@2ndquadrant.com writes: The only major difference that I see so far and I'd like you to incorporate that into your patch is that I renamed the SampleScanCost to SampleScanGetRelSize because that reflects much better the use of it, it isn't really used for costing, but for getting the pages and tuples of the baserel. Good suggestion. I was feeling vaguely uncomfortable with that name as well, given what the functionality ended up being. (In previous drafts of my patch I'd given the costing function more responsibility, but that ends up making TSMs responsible to know what nodeSamplescan.c does, so it's wrong.) Will do. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape
Petr Jelinek p...@2ndquadrant.com writes: Ok, attached are couple of cosmetic changes - what I wrote above, plus comment about why we do float8 + hashing for REPEATABLE (it's not obvious IMHO) and one additional test query. OK, thanks. Do you want to do the contrib changes yourself as well or should I take it? I'm working on them already. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape
On 2015-07-25 00:36, Tom Lane wrote: I wrote: Petr Jelinek p...@2ndquadrant.com writes: The only major difference that I see so far and I'd like you to incorporate that into your patch is that I renamed the SampleScanCost to SampleScanGetRelSize because that reflects much better the use of it, it isn't really used for costing, but for getting the pages and tuples of the baserel. Good suggestion. I was feeling vaguely uncomfortable with that name as well, given what the functionality ended up being. After further thought it seemed like the right name to use is SampleScanGetSampleSize, so as to avoid confusion between the size of the relation and the size of the sample. (The planner is internally not making such a distinction right now, but that doesn't mean we should propagate that fuzzy thinking into the API spec.) Right, sounds good to me. Attached is a more-or-less-final version of the proposed patch. Major changes since yesterday: * I worked over the contrib modules and docs. * I thought of a reasonably easy way to do something with nonrepeatable sampling methods inside join queries: we can simply wrap the SampleScan plan node in a Materialize node, which will guard it against being executed more than once. There might be a few corner cases where this doesn't work fully desirably, but in testing it seemed to do the right thing (and I added some regression tests about that). This seems certainly a better answer than either throwing an error or ignoring the problem. Seems reasonable. I was wondering if we should perhaps cache the output of GetTsmRoutine as we call it up to 4 times in the planner now but it's relatively cheap call (basically just makeNode) so it's probably not worth it. In any case it's not something that should stop you from committing the patch. Thanks for all your work on this. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape
Petr Jelinek p...@2ndquadrant.com writes: I was wondering if we should perhaps cache the output of GetTsmRoutine as we call it up to 4 times in the planner now but it's relatively cheap call (basically just makeNode) so it's probably not worth it. Yeah, I was wondering about that too. The way to do it would probably be to add a TsmRoutine pointer to RelOptInfo. I'm not concerned at all about the makeNode/fill-the-struct cost, but the syscache lookup involved in getting from the function OID to the function might be worth worrying about. As things stand, it didn't quite seem worth the trouble, but if we add any more planner lookups of the TsmRoutine then I'd want to do it. Another place for future improvement is to store the sample-size outputs separately in RelOptInfo instead of overwriting pages/tuples. I'm not sure it's worth the complication right now, but if we ever support doing sampling with more than one scan plan type (eg bernoulli filtering in an indexscan), we'd pretty much have to do that in order to be able to compute costs sanely. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape
On 2015-07-24 00:39, Tom Lane wrote: I wrote: OK, so InitSampleScan for a function called at ExecInitSampleScan time (which we might as well make optional), and then we'll use BeginSampleScan for the function that gets the parameters. The restart/ReScan function goes away since BeginSampleScan will take its place. Here's a WIP patch implementing things this way. I've also taken the time to do a complete code review, and fixed quite a large number of things, some cosmetic and some not so much. I have not yet touched the tsm contrib modules (so they won't even compile...), but I'm reasonably happy with the state of the core code now. Eh, I was just writing email with my rewrite :) I'll do review of this instead then. The only major difference that I see so far and I'd like you to incorporate that into your patch is that I renamed the SampleScanCost to SampleScanGetRelSize because that reflects much better the use of it, it isn't really used for costing, but for getting the pages and tuples of the baserel. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape
On 2015-07-24 01:26, Petr Jelinek wrote: On 2015-07-24 00:39, Tom Lane wrote: I wrote: OK, so InitSampleScan for a function called at ExecInitSampleScan time (which we might as well make optional), and then we'll use BeginSampleScan for the function that gets the parameters. The restart/ReScan function goes away since BeginSampleScan will take its place. Here's a WIP patch implementing things this way. I've also taken the time to do a complete code review, and fixed quite a large number of things, some cosmetic and some not so much. I have not yet touched the tsm contrib modules (so they won't even compile...), but I'm reasonably happy with the state of the core code now. Eh, I was just writing email with my rewrite :) I'll do review of this instead then. The only major difference that I see so far and I'd like you to incorporate that into your patch is that I renamed the SampleScanCost to SampleScanGetRelSize because that reflects much better the use of it, it isn't really used for costing, but for getting the pages and tuples of the baserel. Ok, attached are couple of cosmetic changes - what I wrote above, plus comment about why we do float8 + hashing for REPEATABLE (it's not obvious IMHO) and one additional test query. Do you want to do the contrib changes yourself as well or should I take it? -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services diff --git a/src/backend/access/tablesample/bernoulli.c b/src/backend/access/tablesample/bernoulli.c index 97d6407..e0a1568 100644 --- a/src/backend/access/tablesample/bernoulli.c +++ b/src/backend/access/tablesample/bernoulli.c @@ -46,7 +46,7 @@ typedef struct } BernoulliSamplerData; -static void bernoulli_samplescancost(PlannerInfo *root, +static void bernoulli_samplescangetrelsize(PlannerInfo *root, RelOptInfo *baserel, List *paramexprs, BlockNumber *pages, @@ -73,7 +73,7 @@ tsm_bernoulli_handler(PG_FUNCTION_ARGS) tsm-parameterTypes = list_make1_oid(FLOAT4OID); tsm-repeatable_across_queries = true; tsm-repeatable_across_scans = true; - tsm-SampleScanCost = bernoulli_samplescancost; + tsm-SampleScanGetRelSize = bernoulli_samplescangetrelsize; tsm-InitSampleScan = bernoulli_initsamplescan; tsm-BeginSampleScan = bernoulli_beginsamplescan; tsm-NextSampleBlock = NULL; @@ -87,11 +87,11 @@ tsm_bernoulli_handler(PG_FUNCTION_ARGS) * Cost estimation. */ static void -bernoulli_samplescancost(PlannerInfo *root, - RelOptInfo *baserel, - List *paramexprs, - BlockNumber *pages, - double *tuples) +bernoulli_samplescangetrelsize(PlannerInfo *root, + RelOptInfo *baserel, + List *paramexprs, + BlockNumber *pages, + double *tuples) { Node *pctnode; float4 samplesize; diff --git a/src/backend/access/tablesample/system.c b/src/backend/access/tablesample/system.c index 7cece29..1cdb859 100644 --- a/src/backend/access/tablesample/system.c +++ b/src/backend/access/tablesample/system.c @@ -48,7 +48,7 @@ typedef struct } SystemSamplerData; -static void system_samplescancost(PlannerInfo *root, +static void system_samplescangetrelsize(PlannerInfo *root, RelOptInfo *baserel, List *paramexprs, BlockNumber *pages, @@ -76,7 +76,7 @@ tsm_system_handler(PG_FUNCTION_ARGS) tsm-parameterTypes = list_make1_oid(FLOAT4OID); tsm-repeatable_across_queries = true; tsm-repeatable_across_scans = true; - tsm-SampleScanCost = system_samplescancost; + tsm-SampleScanGetRelSize = system_samplescangetrelsize; tsm-InitSampleScan = system_initsamplescan; tsm-BeginSampleScan = system_beginsamplescan; tsm-NextSampleBlock = system_nextsampleblock; @@ -90,11 +90,11 @@ tsm_system_handler(PG_FUNCTION_ARGS) * Cost estimation. */ static void -system_samplescancost(PlannerInfo *root, - RelOptInfo *baserel, - List *paramexprs, - BlockNumber *pages, - double *tuples) +system_samplescangetrelsize(PlannerInfo *root, + RelOptInfo *baserel, + List *paramexprs, + BlockNumber *pages, + double *tuples) { Node *pctnode; float4 samplesize; diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c index 32db05e..144d2db 100644 --- a/src/backend/executor/nodeSamplescan.c +++ b/src/backend/executor/nodeSamplescan.c @@ -325,6 +325,12 @@ tablesample_init(SampleScanState *scanstate) * suitable seed. For regression-testing purposes, that has the * convenient property that REPEATABLE(0) gives a machine-independent * result. + * + * The reson for using float8 for REPEATABLE at the SQL level is that + * it will work as expected both for users used to databases which + * accept only integers in REPEATABLE clause and for those who might + * expect that REPEATABLE works same way as setseed() (float in the + * range from -1 to 1). */ seed =
Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape
Petr Jelinek p...@2ndquadrant.com writes: On 2015-07-23 02:01, Tom Lane wrote: This needs to work more like LIMIT, which doesn't try to compute the limit parameters until the first fetch. So what we need is an Init function that does very darn little indeed (maybe we don't even need it at all), and then a ParamInspect function that is called at first fetch or during a ReScan, and that one is the one that gets to look at the evaluated parameter values. If we replace the Begin and ReScan interfaces by single interface the Init would definitely be optional (all it would do so far is palloc the tsmdata which can be done easily in the new interface if tsmdata is NULL). I don't like the ParamInspect name as that function needs to setup the state for the sampling method (and that's true no matter if we have Init or not), I think something like BeginScan works better, but it must be clearly documented that it's called for ReScan as well. OK, so InitSampleScan for a function called at ExecInitSampleScan time (which we might as well make optional), and then we'll use BeginSampleScan for the function that gets the parameters. The restart/ReScan function goes away since BeginSampleScan will take its place. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape
I wrote: We could alternatively provide two scan-initialization callbacks, one that analyzes the parameters before we do heap_beginscan, and another that can do additional setup afterwards. Actually, that second call would really not be meaningfully different from the ReScan call, so another solution would be to just automatically invoke ReScan after we've created the HeapScanDesc. TSMs could work around not having this by complicating their NextBlock function a bit (so that it would do some initialization on first call) but perhaps it would be easier to have the additional init call. Actually, there's another reason why this has to be redesigned: evaluating the parameter expressions of TABLESAMPLE during ExecInitSampleScan is not OK. Consider this: regression=# select * from (values (0::float4),(100)) v(pct), lateral (select count(*) from tenk1 tablesample bernoulli (pct)) ss; pct | count -+--- 0 | 0 100 | 0 (2 rows) Surely the second execution of the subquery should've given me 1. It doesn't because the TABLESAMPLE parameters aren't recomputed during a rescan. Even if you're minded to claim that that's all right, this is definitely not acceptable: regression=# select * from (values (0::float4),(100)) v(pct), lateral (select * from tenk1 tablesample bernoulli (pct)) ss; server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. That one's crashing because pct is a Var that refers to an outer scan tuple that doesn't exist yet. I'm not real sure how come the case with an aggregate manages not to crash, actually. This needs to work more like LIMIT, which doesn't try to compute the limit parameters until the first fetch. So what we need is an Init function that does very darn little indeed (maybe we don't even need it at all), and then a ParamInspect function that is called at first fetch or during a ReScan, and that one is the one that gets to look at the evaluated parameter values. If we wanted to let the method inspect the HeapScanDesc during setup it would need still a third callback. I'm not exactly sure if that's worth the trouble. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape
On 2015-07-20 17:23, Tom Lane wrote: Doesn't look like it to me: heap_beginscan_sampling always passes allow_sync = false to heap_beginscan_internal. Oh, right. More to the point, the existing coding of all these methods is such that they would fail to be reproducible if syncscan were enabled, because the scan start point wouldn't be the same. That's fixable, but it'll take more work than just dealing with the above oversight. In any case, given that Simon has stated he wants support for sample methods that don't try to be reproducible, we need the method to be able to say whether syncscan is allowed. I am not completely clear on how we do this tbh, do you mean we use the info about repeatability for this? Another thing that's not clear to me after playing with this is how do we want to detect if to do pagemode scan or not. I understand that it's neat optimization to say pagemode = true in bernoulli when percentage is high and false when it's low but then this would have to come from the BeginSampleScan() in the proposed API, but then BeginSampleScan would not have access to HeapScanDesc as it would not be inited yet when it's called. The info that BeginSampleScan() needs can be obtained directly from ss_currentRelation I guess, but it's somewhat strange to pass semi-initialized SampleScanState to the BeginSampleScan(). -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape
Petr Jelinek p...@2ndquadrant.com writes: Another thing that's not clear to me after playing with this is how do we want to detect if to do pagemode scan or not. I understand that it's neat optimization to say pagemode = true in bernoulli when percentage is high and false when it's low but then this would have to come from the BeginSampleScan() in the proposed API, but then BeginSampleScan would not have access to HeapScanDesc as it would not be inited yet when it's called. Right. The info that BeginSampleScan() needs can be obtained directly from ss_currentRelation I guess, but it's somewhat strange to pass semi-initialized SampleScanState to the BeginSampleScan(). Doesn't seem like a big problem from here. The HeapScanDesc pointer will be null at that point, so it's not like attempts to access it would escape notice. We could alternatively provide two scan-initialization callbacks, one that analyzes the parameters before we do heap_beginscan, and another that can do additional setup afterwards. Actually, that second call would really not be meaningfully different from the ReScan call, so another solution would be to just automatically invoke ReScan after we've created the HeapScanDesc. TSMs could work around not having this by complicating their NextBlock function a bit (so that it would do some initialization on first call) but perhaps it would be easier to have the additional init call. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape
On 2015-07-19 22:56, Tom Lane wrote: Since I'm not observing any movement on the key question of redesigning the tablesample method API, and I think that's something that's absolutely necessary to fix for 9.5, attached is an attempt to respecify the API. Sorry, I got something similar to what you posted written as well, but I didn't want to submit before I have more working code done. * I got rid of the TableSampleDesc struct altogether in favor of giving the execution functions access to the whole SampleScanState executor state node. If we add support for filtering at the join level, filtering in indexscan nodes, etc, those would become separate sets of API functions in my view; there is no need to pretend that this set of API functions works for anything except the SampleScan case. This way is more parallel to the FDW precedent, too. In particular it lets tablesample code get at the executor's EState, which the old API did not, but which might be necessary for some scenarios. Ok. * You might have expected me to move the tsmseqscan and tsmpagemode flags into the TsmRoutine struct, but instead this API puts equivalent flags into the SampleScanState struct. The reason for that is that it lets the settings be determined at runtime after inspecting the TABLESAMPLE parameters, which I think would be useful. For example, whether to use the bulkread strategy should probably depend on what the sampling percentage is. I think this ignores one aspect of the old API. That is, we allowed the sampling method to specify which kind of input parameters it accepts. This is why for example the tsm_system_rows accepts integer while system and bernoulli both accept float8. This part of the API is certainly not needed for bernoulli and system sampling but if we ever want to do something more sophisticated like stratified sampling which Simon mentioned several times, specifying just percentages is not going to be enough. * As written, this still allows TABLESAMPLE parameters to have null values, but I'm pretty strongly tempted to get rid of that: remove the paramisnull[] argument and make the core code reject null values. I can't see any benefit in allowing null values that would justify the extra code and risks-of-omission involved in making every tablesample method check this for itself. I am for not allowing NULLs. * I specified that omitting NextSampleBlock is allowed and causes the core code to do a standard seqscan, including syncscan support, which is a behavior that's impossible with the current API. If we fix the bernoulli code to have history-independent sampling behavior, I see no reason that syncscan shouldn't be enabled for it. Umm, we were actually doing syncscan as well before. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape
On 2015-07-20 12:18, Petr Jelinek wrote: On 2015-07-19 22:56, Tom Lane wrote: * You might have expected me to move the tsmseqscan and tsmpagemode flags into the TsmRoutine struct, but instead this API puts equivalent flags into the SampleScanState struct. The reason for that is that it lets the settings be determined at runtime after inspecting the TABLESAMPLE parameters, which I think would be useful. For example, whether to use the bulkread strategy should probably depend on what the sampling percentage is. I think this ignores one aspect of the old API. That is, we allowed the sampling method to specify which kind of input parameters it accepts. This is why for example the tsm_system_rows accepts integer while system and bernoulli both accept float8. This part of the API is certainly not needed for bernoulli and system sampling but if we ever want to do something more sophisticated like stratified sampling which Simon mentioned several times, specifying just percentages is not going to be enough. Actually I see the tsmapi has support for this, so please ignore the noise. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape
to handle DROP dependency behaviors properly. (On reflection, maybe better if it's bernoulli(internal) returns tablesample_handler, so as to guarantee that such a function doesn't create a conflict with any user-defined function of the same name.) The probability of conflict seems high with the system() so yeah we'd need some kind of differentiator. Maybe it would be even better to have something like bernoulli(tablesample) where tablesample defined as pseudo-type like trigger. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape
Petr Jelinek p...@2ndquadrant.com writes: On 2015-07-19 22:56, Tom Lane wrote: * I specified that omitting NextSampleBlock is allowed and causes the core code to do a standard seqscan, including syncscan support, which is a behavior that's impossible with the current API. If we fix the bernoulli code to have history-independent sampling behavior, I see no reason that syncscan shouldn't be enabled for it. Umm, we were actually doing syncscan as well before. Doesn't look like it to me: heap_beginscan_sampling always passes allow_sync = false to heap_beginscan_internal. More to the point, the existing coding of all these methods is such that they would fail to be reproducible if syncscan were enabled, because the scan start point wouldn't be the same. That's fixable, but it'll take more work than just dealing with the above oversight. In any case, given that Simon has stated he wants support for sample methods that don't try to be reproducible, we need the method to be able to say whether syncscan is allowed. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape
On 19 July 2015 at 21:56, Tom Lane t...@sss.pgh.pa.us wrote: Since I'm not observing any movement Apologies if nothing visible was occurring. Petr and I had arranged to meet and discuss Mon/Tues. -- Simon Riggshttp://www.2ndQuadrant.com/ http://www.2ndquadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape
Since I'm not observing any movement on the key question of redesigning the tablesample method API, and I think that's something that's absolutely necessary to fix for 9.5, attached is an attempt to respecify the API. I haven't actually written any code, but I've written a tsmapi.h file modeled on fdwapi.h, and rewritten tablesample-method.sgml to match; those two files are attached. Some notes: * This is predicated on the assumption that we'll get rid of the pg_tablesample_method catalog and instead have a single FDW-style handler function for each sample method. That fixes the problem with the contrib modules being broken in DROP/pg_upgrade cases. * I got rid of the TableSampleDesc struct altogether in favor of giving the execution functions access to the whole SampleScanState executor state node. If we add support for filtering at the join level, filtering in indexscan nodes, etc, those would become separate sets of API functions in my view; there is no need to pretend that this set of API functions works for anything except the SampleScan case. This way is more parallel to the FDW precedent, too. In particular it lets tablesample code get at the executor's EState, which the old API did not, but which might be necessary for some scenarios. * You might have expected me to move the tsmseqscan and tsmpagemode flags into the TsmRoutine struct, but instead this API puts equivalent flags into the SampleScanState struct. The reason for that is that it lets the settings be determined at runtime after inspecting the TABLESAMPLE parameters, which I think would be useful. For example, whether to use the bulkread strategy should probably depend on what the sampling percentage is. * I got rid of the ExamineTuple function altogether. As I said before, I think what that is mostly doing is encouraging people to do unsound things. But in any case, there is no need for it because NextSampleTuple *can* look at the HeapScanDesc state if it really wants to: that lets it identify visible tuples, or even inspect the tuple contents. In the attached, I documented the visible-tuples aspect but did not suggest examining tuple contents. * As written, this still allows TABLESAMPLE parameters to have null values, but I'm pretty strongly tempted to get rid of that: remove the paramisnull[] argument and make the core code reject null values. I can't see any benefit in allowing null values that would justify the extra code and risks-of-omission involved in making every tablesample method check this for itself. * I specified that omitting NextSampleBlock is allowed and causes the core code to do a standard seqscan, including syncscan support, which is a behavior that's impossible with the current API. If we fix the bernoulli code to have history-independent sampling behavior, I see no reason that syncscan shouldn't be enabled for it. Barring objections, I'll press forward with turning this into code over the next few days. regards, tom lane /*- * * tsmapi.h * API for tablesample methods * * Copyright (c) 2015, PostgreSQL Global Development Group * * src/include/access/tsmapi.h * *- */ #ifndef TSMAPI_H #define TSMAPI_H #include nodes/execnodes.h #include nodes/relation.h /* * Callback function signatures --- see tablesample-method.sgml for more info. */ typedef void (*SampleScanCost_function) (PlannerInfo *root, RelOptInfo *baserel, List *paramexprs, BlockNumber *pages, double *tuples); typedef void (*BeginSampleScan_function) (SampleScanState *node, int eflags, uint32 seed, Datum *params, bool *paramisnull, int nparams); typedef BlockNumber (*NextSampleBlock_function) (SampleScanState *node); typedef OffsetNumber (*NextSampleTuple_function) (SampleScanState *node, BlockNumber blockno, OffsetNumber maxoffset); typedef void (*ReScanSampleScan_function) (SampleScanState *node); typedef void (*EndSampleScan_function) (SampleScanState *node); /* * TsmRoutine is the struct returned by a tablesample method's handler * function. It provides pointers to the callback functions needed by the * planner and executor, as well as additional information about the method. * * More function pointers are likely to be added in the future. * Therefore it's recommended that the handler initialize the struct with * makeNode(TsmRoutine) so that all fields are set to NULL. This will * ensure that no fields are accidentally left undefined. */ typedef struct TsmRoutine { NodeTag type; /* List of datatype OIDs for the arguments of the TABLESAMPLE clause */ List *parameterTypes; /* Can method produce repeatable samples across, or even within, queries? */ bool repeatable_across_queries; bool
Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape
On 2015-07-16 17:08, Tom Lane wrote: Petr Jelinek p...@2ndquadrant.com writes: On 2015-07-16 15:59, Tom Lane wrote: I'm not clear on whether sequence AMs would need explicit catalog representation, or could be folded down to just a single SQL function with special signature as I suggested for tablesample handlers. Is there any need for a sequence AM to have additional catalog infrastructure like index AMs need? That depends on the route we will choose to take with the storage there. If we allow custom columns for sequence AMs (which is what both Heikki and me seem to be inclined to do) then I think it will still need catalog, plus it's also easier to just reuse the relam behavior than coming up up with something completely new IMHO. Hm. I've not been following the sequence AM stuff carefully, but if you want to use relam to point at a sequence's AM then really sequence AMs have to be represented in pg_am. (It would be quite broken from a relational theory standpoint if relam could point to either of two catalogs; not to mention that we have no way to guarantee OID uniqueness across multiple catalogs.) Well not necessarily, it would just mean that relam has different meaning depending on relkind so the OID uniqueness is not needed at all. As things stand today, putting both kinds of AM into one catalog would be pretty horrible, but it seems not hard to make it work if we did this sort of refactoring first. We'd end up with three columns in pg_am: amname, amkind (index or sequence), and amhandler. The type and contents of the struct returned by amhandler could be different depending on amkind, which means that the APIs could be as different as we need, despite sharing just one catalog. The only real restriction is that index and sequence AMs could not have the same names, which doesn't seem like much of a problem from here. Yes, this would be better. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape
On 2015-07-13 00:36, Tom Lane wrote: We have a far better model to follow already, namely the foreign data wrapper API. In that, there's a single SQL-visible function that returns a dummy datatype indicating that it's an FDW handler, and when called, it hands back a struct containing pointers to all the other functions that the particular wrapper needs to supply (and, if necessary, the struct could have non-function-pointer fields containing other info the core system might need to know about the handler). We could similarly invent a pseudotype tablesample_handler and represent each tablesample method by a single SQL function returning tablesample_handler. Any future changes in the API for tablesample handlers would then appear as changes in the C definition of the struct returned by the handler, which requires no SQL-visible thrashing, hence creates no headaches for pg_upgrade, and it is pretty easy to design it so that failure to update an external module that implements the API results in C compiler errors and/or sane fallback behavior. (back from vacation, going over this thread) Yes this sounds very sane (and we use something similar also for logical decoding plugins, not just FDWs). I wish this has occurred to me before, I would not have to spend time on the DDL support which didn't even get in. Once we've done that, I think we don't even need a special catalog representing tablesample methods. Given FROM foo TABLESAMPLE bernoulli(...), the parser could just look for a function bernoulli() returning tablesample_handler, and it's done. The querytree would have an ordinary function dependency on that function, which would be sufficient It seems possible that we might not need catalog indeed. This would also simplify the parser part which currently contains specialized function search code as we could most likely just reuse the generic code. to handle DROP dependency behaviors properly. (On reflection, maybe better if it's bernoulli(internal) returns tablesample_handler, so as to guarantee that such a function doesn't create a conflict with any user-defined function of the same name.) The probability of conflict seems high with the system() so yeah we'd need some kind of differentiator. PS: now that I've written this rant, I wonder why we don't redesign the index AM API along the same lines. It probably doesn't matter much at the moment, but if we ever get serious about supporting index AM extensions, I think we ought to consider doing that. +1 I think this is very relevant to the proposed sequence am patch as well. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape
Petr Jelinek wrote: On 2015-07-13 00:36, Tom Lane wrote: PS: now that I've written this rant, I wonder why we don't redesign the index AM API along the same lines. It probably doesn't matter much at the moment, but if we ever get serious about supporting index AM extensions, I think we ought to consider doing that. +1 I think this is very relevant to the proposed sequence am patch as well. Hmm, how would this work? Would we have index AM implementation run some function that register their support methods somehow at startup? Hopefully we're not going to have the index AMs become shared libraries. In any case, if indexes AMs and sequence AMs go this route, that probably means the column store AM we're working on will probably have to go the same route too. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape
On 2015-07-13 15:39, Tom Lane wrote: Datum tsm_system_rows_init(PG_FUNCTION_ARGS) { TableSampleDesc *tsdesc = (TableSampleDesc *) PG_GETARG_POINTER(0); uint32 seed = PG_GETARG_UINT32(1); int32 ntuples = PG_ARGISNULL(2) ? -1 : PG_GETARG_INT32(2); This is rather curious coding. Why should this function check only one of its arguments for nullness? If it needs to defend against any of them being null, it really needs to check all. But in point of fact, this function is declared STRICT, which means it's a violation of the function call protocol if the core code ever passes a null to it, and so this test ought to be dead code. A similar pattern of ARGISNULL checks in declared-strict functions exists in all the tablesample modules, not just this one, showing that this is an overall design error not just a thinko here. My inclination would be to make the core code enforce non-nullness of all tablesample arguments so as to make it unnecessary to check strictness of the tsm*init functions, but in any case it is Not Okay to just pass nulls willy-nilly to strict C functions. The reason for this ugliness came from having to have balance between modularity and following the SQL Standard error codes for BERNOULLI and SYSTEM, which came as issue during reviews (the original code did the checks before calling the sampling method's functions but produced just generic error code about wrong parameter). I considered it as okayish mainly because those functions are not SQL callable and the code which calls them knows how to handle it correctly, but I understand why you don't. I guess if we did what you proposed in another email in this thread - don't have the API exposed on SQL level, we could send the additional parameters as List * and leave the validation completely to the function. (And maybe don't allow NULLs at all) Also, I find this coding pretty sloppy even without the strictness angle, because the net effect of this way of dealing with nulls is that an argument-must-not-be-null complaint is reported as out of range, which is both confusing and the wrong ERRCODE. if (ntuples 1) ereport(ERROR, (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), errmsg(invalid sample size), errhint(Sample size must be positive integer value.))); I don't find this to be good error message style. The secondary comment is not a hint, it's an ironclad statement of what you did wrong, so if we wanted to phrase it like this it should be an errdetail not errhint. But the whole thing is overly cute anyway because there is no reason at all not to just say what we mean in the primary error message, eg ereport(ERROR, (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), errmsg(sample size must be greater than zero))); Same as above, although now that I re-read the standard I am sure I misunderstand it the first time - it says: If S is the null value or if S 0 (zero) or if S 100, then an exception condition is raised: data exception — invalid sample size. I took it as literal error message originally but they just mean status code by it. While we're on the subject, what's the reason for disallowing a sample size of zero? Seems like a reasonable edge case. Yes that's a bug. /* All blocks have been read, we're done */ if (sampler-doneblocks sampler-nblocks || sampler-donetuples = sampler-ntuples) PG_RETURN_UINT32(InvalidBlockNumber); Okay, I lied, I *am* going to complain about this comment. Comments that do not accurately describe the code they're attached to are worse than useless. That's copy-pasto from the tsm_system_time. In short, I'd do something more like uint32 r; /* Safety check to avoid infinite loop or zero result for small n. */ if (n = 1) return 1; /* * This should only take 2 or 3 iterations as the probability of 2 numbers * being relatively prime is ~61%; but just in case, we'll include a * CHECK_FOR_INTERRUPTS in the loop. */ do { CHECK_FOR_INTERRUPTS(); r = (uint32) (sampler_random_fract(randstate) * n); } while (r == 0 || gcd(r, n) 1); Note however that this coding would result in an unpredictable number of iterations of the RNG, which might not be such a good thing if we're trying to achieve repeatability. It doesn't matter in the context of this module since the RNG is not used after initialization, but it would matter if we then went on to do Bernoulli-style sampling. (Possibly that could be dodged by reinitializing the RNG after the initialization steps.) Bernoulli-style sampling does not need this kind of code so it's not really an issue. That is unless you'd like to combine the linear probing and bernoulli of course, but I don't see any benefit in doing that. -- Petr Jelinek http://www.2ndQuadrant.com/
Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape
On 2015-07-13 18:00, Tom Lane wrote: And here's that. I do *not* claim that this is a complete list of design issues with the patch, it's just things I happened to notice in the amount of time I've spent so far (which is already way more than I wanted to spend on TABLESAMPLE right now). I'm not sure that we need an extensible sampling-method capability at all, let alone that an API for that needs to be the primary focus of a patch. Certainly the offered examples of extension modules aren't inspiring: tsm_system_time is broken by design (more about that below) and nobody would miss tsm_system_rows if it weren't there. What is far more important is to get sampling capability into indexscans, and designing an API ahead of time seems like mostly a distraction from that. I'd think seriously about tossing the entire executor-stage part of the patch, creating a stateless (history-independent) sampling filter that just accepts or rejects TIDs, and sticking calls to that into all the table scan node types. Once you had something like that working well it might be worth considering whether to try to expose an API to generalize it. But even then it's not clear that we really need any more options than true-Bernoulli and block-level sampling. I think this is not really API issue as much as opposite approach on what to implement first. I prioritized in first iteration for the true block sampling support as that's what I've been mostly asked for. But my plan was to have at same later stage (9.6+) also ability to do subquery scans etc. Basically new SamplingFilter executor node which would pass the tuples to examinetuple() which would then decide what to do with it. The selection between using nextblock/nexttuple and examinetuple was supposed to be extension of the API where the sampling method would say if it supports examinetuple or nextblock/nexttuple or both. And eventually I wanted to rewrite bernoulli to just use the examinetuple() on top of whatever scan once this additional support was in. I think I explained this during the review stage. The IBM paper I linked to in the other thread mentions that their optimizer will sometimes choose to do Bernoulli sampling even if SYSTEM was requested. Probably that happens when it decides to do a simple indexscan, because then there's no advantage to trying to cluster the Yeah it happens when there is an index which is used in WHERE clause and has bigger selectivity than the percentage specified in the TABLESAMPLE clause. This of course breaks one of the common use-case though: SELECT count(*) * 100 FROM table TABLESAMPLE SYSTEM(1) WHERE foo = bar; sampled rows. But in the case of a bitmap scan, you could very easily do either true Bernoulli or block-level sampling simply by adjusting the rules about which bits you keep or clear in the bitmap (given that you apply the filter between collection of the index bitmap and accessing the heap, which seems natural). The only case where a special scan type really seems to be needed is if you want block-level sampling, the query would otherwise use a seqscan, *and* the sampling percentage is pretty low --- if you'd be reading every second or third block anyway, you're likely better off with a plain seqscan so that the kernel sees sequential access and does prefetching. The current API doesn't seem to make it possible to switch between seqscan and read-only-selected-blocks based on the sampling percentage, but I think that could be an interesting optimization. Well you can do that if you write your own sampling method. We don't do that in optimizer and that's design choice, because you can't really do that on high level like that if you want to keep extensibility. And given the amount of people that asked if they can do their own sampling when I talked to them about this during the design stage, I consider the extensibility as more important. Especially if extensibility gives you the option to do the switching anyway, albeit on lower-level and not out of the box. (Another bet that's been missed is having the samplescan logicrequest prefetching when it is doing selective block reads. The current API can't support that AFAICS, since there's no expectation that nextblock calls could be done asynchronously from nexttuple calls.) Not sure I follow, would not it be possible to achieve this using the tsmseqscan set to true (it's a misnomer then, I know)? Another issue with the API as designed is the examinetuple() callback. Allowing sample methods to see invisible tuples is quite possibly the worst idea in the whole patch. They can *not* do anything with such tuples, or they'll totally break reproducibility: if the tuple is invisible to your scan, it might well be or soon become invisible to everybody, whereupon it would be subject to garbage collection at the drop of a hat. So if an invisible tuple affects the sample method's behavior at all, repeated scans in the same query would
Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape
Petr Jelinek wrote: On 2015-07-13 15:39, Tom Lane wrote: I don't find this to be good error message style. The secondary comment is not a hint, it's an ironclad statement of what you did wrong, so if we wanted to phrase it like this it should be an errdetail not errhint. But the whole thing is overly cute anyway because there is no reason at all not to just say what we mean in the primary error message, eg ereport(ERROR, (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), errmsg(sample size must be greater than zero))); Same as above, although now that I re-read the standard I am sure I misunderstand it the first time - it says: If S is the null value or if S 0 (zero) or if S 100, then an exception condition is raised: data exception — invalid sample size. I took it as literal error message originally but they just mean status code by it. Yes, must use a new errcode ERRCODE_INVALID_SAMPLE_SIZE defined to 2202H here. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape
On 2015-07-16 15:59, Tom Lane wrote: Alvaro Herrera alvhe...@2ndquadrant.com writes: Petr Jelinek wrote: On 2015-07-13 00:36, Tom Lane wrote: PS: now that I've written this rant, I wonder why we don't redesign the index AM API along the same lines. It probably doesn't matter much at the moment, but if we ever get serious about supporting index AM extensions, I think we ought to consider doing that. I think this is very relevant to the proposed sequence am patch as well. Hmm, how would this work? Would we have index AM implementation run some function that register their support methods somehow at startup? Well, registration of new index AMs is an unsolved question ATM anyhow. But what I'm imagining is that pg_am would reduce to about two columns, amname and a handler function OID, and everything else that constitutes the API for AMs would get moved down to the C level. We have to keep that catalog because we still need index AMs to have OIDs that will represent them in pg_opclass etc; but we don't need to nail the exact set of AM interface functions into the catalog. (I'm not sure whether we'd want to remove all the bool columns from pg_am. At the C level it would be about as convenient to have them in a struct returned by the handler function. But it's occasionally useful to have those properties visible to SQL queries.) This is along the lines of how I was thinking also (when I read your previous email). I think the properties of the index will have to be decided on individual basis once somebody actually starts working on this. But functions can clearly go into C struct if they are called only from C anyway. I'm not clear on whether sequence AMs would need explicit catalog representation, or could be folded down to just a single SQL function with special signature as I suggested for tablesample handlers. Is there any need for a sequence AM to have additional catalog infrastructure like index AMs need? That depends on the route we will choose to take with the storage there. If we allow custom columns for sequence AMs (which is what both Heikki and me seem to be inclined to do) then I think it will still need catalog, plus it's also easier to just reuse the relam behavior than coming up up with something completely new IMHO. In any case, if indexes AMs and sequence AMs go this route, that probably means the column store AM we're working on will probably have to go the same route too. It's worth considering anyway. The FDW API has clearly been far more successful than the index AM API in terms of being practically usable by extensions. Yep, I now consider it to be a clear mistake that I modeled both sequence am and tablesample after indexes given that I targeted both at extensibility. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape
On Thu, Jul 16, 2015 at 10:33 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Petr Jelinek wrote: On 2015-07-13 00:36, Tom Lane wrote: PS: now that I've written this rant, I wonder why we don't redesign the index AM API along the same lines. It probably doesn't matter much at the moment, but if we ever get serious about supporting index AM extensions, I think we ought to consider doing that. +1 I think this is very relevant to the proposed sequence am patch as well. Hmm, how would this work? Would we have index AM implementation run some function that register their support methods somehow at startup? Hopefully we're not going to have the index AMs become shared libraries. I recall a proposal by Alexander Korotkov about extensible access methods although his proposal also included a CREATE AM command that would add a pg_am row so that perhaps differs from what Tom seems to allude to here. http://www.postgresql.org/message-id/capphfdsxwzmojm6dx+tjnpyk27kt4o7ri6x_4oswcbyu1rm...@mail.gmail.com Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape
Amit Langote amitlangot...@gmail.com writes: On Thu, Jul 16, 2015 at 10:33 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Hmm, how would this work? Would we have index AM implementation run some function that register their support methods somehow at startup? I recall a proposal by Alexander Korotkov about extensible access methods although his proposal also included a CREATE AM command that would add a pg_am row so that perhaps differs from what Tom seems to allude to here. I think we'd still need to invent CREATE AM if we wanted to allow index AMs to be created as extensions; we'd still have to have the pg_am catalog, and extensions still couldn't write rows directly into that, for the same reasons I pointed out with respect to tablesample methods. However, if the contents of pg_am could be boiled down to just a name and a handler function, then that would represent a simple and completely stable definition for CREATE AM's arguments, which would be a large improvement over trying to reflect the current contents of pg_am directly in a SQL statement. We add new columns to pg_am all the time, and that would create huge backward-compatibility headaches if we had to modify the behavior of a CREATE AM statement every time. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape
Petr Jelinek p...@2ndquadrant.com writes: On 2015-07-16 15:59, Tom Lane wrote: I'm not clear on whether sequence AMs would need explicit catalog representation, or could be folded down to just a single SQL function with special signature as I suggested for tablesample handlers. Is there any need for a sequence AM to have additional catalog infrastructure like index AMs need? That depends on the route we will choose to take with the storage there. If we allow custom columns for sequence AMs (which is what both Heikki and me seem to be inclined to do) then I think it will still need catalog, plus it's also easier to just reuse the relam behavior than coming up up with something completely new IMHO. Hm. I've not been following the sequence AM stuff carefully, but if you want to use relam to point at a sequence's AM then really sequence AMs have to be represented in pg_am. (It would be quite broken from a relational theory standpoint if relam could point to either of two catalogs; not to mention that we have no way to guarantee OID uniqueness across multiple catalogs.) As things stand today, putting both kinds of AM into one catalog would be pretty horrible, but it seems not hard to make it work if we did this sort of refactoring first. We'd end up with three columns in pg_am: amname, amkind (index or sequence), and amhandler. The type and contents of the struct returned by amhandler could be different depending on amkind, which means that the APIs could be as different as we need, despite sharing just one catalog. The only real restriction is that index and sequence AMs could not have the same names, which doesn't seem like much of a problem from here. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape
On 13 July 2015 at 14:39, Tom Lane t...@sss.pgh.pa.us wrote: Michael Paquier michael.paqu...@gmail.com writes: Regarding the fact that those two contrib modules can be part of a -contrib package and could be installed, nuking those two extensions from the tree and preventing the creating of custom tablesample methods looks like a correct course of things to do for 9.5. TBH, I think the right thing to do at this point is to revert the entire patch and send it back for ground-up rework. I think the high-level design is wrong in many ways and I have about zero confidence in most of the code details as well. Based on the various comments here, I don't see that as the right course of action at this point. There are no issues relating to security or data loss, just various fixable issues in a low-impact feature, which in my view is an important feature also. If it's to stay, it *must* get a line-by-line review from some committer-level person; and I think there are other more important things for us to be doing for 9.5. Honestly, I am very surprised by this. My feeling was the code was neat, clear and complete, much more so than many patches I review. If I had thought the patch or its implementation was in any way contentious I would not have committed it. I take responsibility for the state of the code and will put time into addressing the concerns mentioned and others. If we cannot resolve them in reasonable time, a revert is possible: there is nothing riding on this from me. -- Simon Riggshttp://www.2ndQuadrant.com/ http://www.2ndquadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape
On 14 July 2015 at 15:32, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: On 13 July 2015 at 14:39, Tom Lane t...@sss.pgh.pa.us wrote: TBH, I think the right thing to do at this point is to revert the entire patch and send it back for ground-up rework. I think the high-level design is wrong in many ways and I have about zero confidence in most of the code details as well. There are no issues relating to security or data loss, just various fixable issues in a low-impact feature, which in my view is an important feature also. There is a *very large* amount of work needed here, and I do not hear you promising to do it. I thought I had done so clearly enough, happy to do so again. I promise that either work will be done, or the patch will be reverted. Since I have more time now, I view that as a realistic prospect. What I'm hearing is stonewalling, and I am not happy. I'm not sure what you mean by that but it sounds negative and is almost certainly not justified, in this or other cases. -- Simon Riggshttp://www.2ndQuadrant.com/ http://www.2ndquadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape
Simon Riggs si...@2ndquadrant.com writes: On 13 July 2015 at 14:39, Tom Lane t...@sss.pgh.pa.us wrote: TBH, I think the right thing to do at this point is to revert the entire patch and send it back for ground-up rework. I think the high-level design is wrong in many ways and I have about zero confidence in most of the code details as well. There are no issues relating to security or data loss, just various fixable issues in a low-impact feature, which in my view is an important feature also. There is a *very large* amount of work needed here, and I do not hear you promising to do it. What I'm hearing is stonewalling, and I am not happy. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape
On Tue, Jul 14, 2015 at 11:14:55AM +0100, Simon Riggs wrote: On 13 July 2015 at 14:39, Tom Lane t...@sss.pgh.pa.us wrote: TBH, I think the right thing to do at this point is to revert the entire patch and send it back for ground-up rework. I think the high-level design is wrong in many ways and I have about zero confidence in most of the code details as well. Based on the various comments here, I don't see that as the right course of action at this point. There are no issues relating to security or data loss, just various fixable issues in a low-impact feature, which in my view is an important feature also. If it's to stay, it *must* get a line-by-line review from some committer-level person; and I think there are other more important things for us to be doing for 9.5. Honestly, I am very surprised by this. Tom's partial review found quite a crop of unvarnished bugs: 1. sample node can give different tuples across rescans within an executor run 2. missing dependency machinery to restrict dropping a sampling extension 3. missing pg_dump --binary-upgrade treatment 4. potential core dumps due to dereferencing values that could be null 5. factually incorrect comments 6. null argument checks in strict functions 7. failure to check for constisnull 8. failure to sanity-check ntuples 9. arithmetic errors in random_relative_prime() (That's after sifting out design counterproposals, feature requests, and other topics of regular disagreement. I erred on the side of leaving things off that list.) Finding one or two like that during a complete post-commit review would be business as usual. Finding nine in a partial review diagnoses a critical shortfall in pre-commit review vigilance. Fixing the bugs found to date will not cure that shortfall. A qualified re-review could cure it. nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape
On 15 July 2015 at 05:58, Noah Misch n...@leadboat.com wrote: If it's to stay, it *must* get a line-by-line review from some committer-level person; and I think there are other more important things for us to be doing for 9.5. Honestly, I am very surprised by this. Tom's partial review found quite a crop of unvarnished bugs: 1. sample node can give different tuples across rescans within an executor run 2. missing dependency machinery to restrict dropping a sampling extension 3. missing pg_dump --binary-upgrade treatment 4. potential core dumps due to dereferencing values that could be null 5. factually incorrect comments 6. null argument checks in strict functions 7. failure to check for constisnull 8. failure to sanity-check ntuples 9. arithmetic errors in random_relative_prime() (That's after sifting out design counterproposals, feature requests, and other topics of regular disagreement. I erred on the side of leaving things off that list.) Finding one or two like that during a complete post-commit review would be business as usual. Finding nine in a partial review diagnoses a critical shortfall in pre-commit review vigilance. Fixing the bugs found to date will not cure that shortfall. A qualified re-review could cure it. Thank you for the summary of points. I agree with that list. I will work on the re-review as you suggest. 1 and 4 relate to the sample API exposed, which needs some rework. We'll see how big that is; at this time I presume not that hard, but I will wait for Petr's opinion also when he returns on Friday. -- Simon Riggshttp://www.2ndQuadrant.com/ http://www.2ndquadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape
On Mon, Jul 13, 2015 at 7:36 AM, Tom Lane t...@sss.pgh.pa.us wrote: I wrote: The two contrib modules this patch added are nowhere near fit for public consumption. They cannot clean up after themselves when dropped: ... Raw inserts into system catalogs just aren't a sane thing to do in extensions. I had some thoughts about how we might fix that, without going to the rather tedious lengths of creating a complete set of DDL infrastructure for CREATE/DROP TABLESAMPLE METHOD. First off, the extension API designed for the tablesample patch is evidently modeled on the index AM API, which was not a particularly good precedent --- it's not a coincidence that index AMs can't be added or dropped on-the-fly. Modeling a server internal API as a set of SQL-visible functions is problematic when the call signatures of those functions can't be accurately described by SQL datatypes, and it's rather pointless and inefficient when none of the functions in question is meant to be SQL-callable. It's even less attractive if you don't think you've got a completely stable API spec, because adding, dropping, or changing signature of any one of the API functions then involves a pile of easy-to-mess-up catalog changes on top of the actually useful work. Not to mention then having to think about backwards compatibility of your CREATE command's arguments. We have a far better model to follow already, namely the foreign data wrapper API. In that, there's a single SQL-visible function that returns a dummy datatype indicating that it's an FDW handler, and when called, it hands back a struct containing pointers to all the other functions that the particular wrapper needs to supply (and, if necessary, the struct could have non-function-pointer fields containing other info the core system might need to know about the handler). We could similarly invent a pseudotype tablesample_handler and represent each tablesample method by a single SQL function returning tablesample_handler. Any future changes in the API for tablesample handlers would then appear as changes in the C definition of the struct returned by the handler, which requires no SQL-visible thrashing, hence creates no headaches for pg_upgrade, and it is pretty easy to design it so that failure to update an external module that implements the API results in C compiler errors and/or sane fallback behavior. That's elegant. Once we've done that, I think we don't even need a special catalog representing tablesample methods. Given FROM foo TABLESAMPLE bernoulli(...), the parser could just look for a function bernoulli() returning tablesample_handler, and it's done. The querytree would have an ordinary function dependency on that function, which would be sufficient to handle DROP dependency behaviors properly. (On reflection, maybe better if it's bernoulli(internal) returns tablesample_handler, so as to guarantee that such a function doesn't create a conflict with any user-defined function of the same name.) Thoughts? Regarding the fact that those two contrib modules can be part of a -contrib package and could be installed, nuking those two extensions from the tree and preventing the creating of custom tablesample methods looks like a correct course of things to do for 9.5. When looking at this patch I was as well surprised that the BERNOUILLI method can only be applied on a physical relation and was not able to fire on a materialized set of tuples, say the result of a WITH clause for example. PS: now that I've written this rant, I wonder why we don't redesign the index AM API along the same lines. It probably doesn't matter much at the moment, but if we ever get serious about supporting index AM extensions, I think we ought to consider doing that. Any API redesign looks to be clearly 9.6 work IMO at this point. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape
Michael Paquier michael.paqu...@gmail.com writes: Regarding the fact that those two contrib modules can be part of a -contrib package and could be installed, nuking those two extensions from the tree and preventing the creating of custom tablesample methods looks like a correct course of things to do for 9.5. TBH, I think the right thing to do at this point is to revert the entire patch and send it back for ground-up rework. I think the high-level design is wrong in many ways and I have about zero confidence in most of the code details as well. I'll send a separate message about high-level issues, but as far as code details go, I started to do some detailed code review last night and only got through contrib/tsm_system_rows/tsm_system_rows.c before deciding it was hopeless. Let's have a look at my notes: * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group Obsolete copyright date. * IDENTIFICATION *contrib/tsm_system_rows_rowlimit/tsm_system_rows.c Wrong filename. (For the moment, I'll refrain from any value judgements about the overall adequacy or quality of the comments in this patch, and just point out obvious errors that should have been caught in review.) typedef struct { SamplerRandomState randstate; uint32 seed; /* random seed */ BlockNumber nblocks;/* number of block in relation */ int32 ntuples;/* number of tuples to return */ int32 donetuples; /* tuples already returned */ OffsetNumber lt;/* last tuple returned from current block */ BlockNumber step; /* step size */ BlockNumber lb; /* last block visited */ BlockNumber doneblocks; /* number of already returned blocks */ } SystemSamplerData; This same typedef name is defined in three different places in the patch (tablesample/system.c, tsm_system_rows.c, tsm_system_time.c). While that might not amount to a bug, it's sure a recipe for confusion, especially since the struct definitions are all different. Datum tsm_system_rows_init(PG_FUNCTION_ARGS) { TableSampleDesc *tsdesc = (TableSampleDesc *) PG_GETARG_POINTER(0); uint32 seed = PG_GETARG_UINT32(1); int32 ntuples = PG_ARGISNULL(2) ? -1 : PG_GETARG_INT32(2); This is rather curious coding. Why should this function check only one of its arguments for nullness? If it needs to defend against any of them being null, it really needs to check all. But in point of fact, this function is declared STRICT, which means it's a violation of the function call protocol if the core code ever passes a null to it, and so this test ought to be dead code. A similar pattern of ARGISNULL checks in declared-strict functions exists in all the tablesample modules, not just this one, showing that this is an overall design error not just a thinko here. My inclination would be to make the core code enforce non-nullness of all tablesample arguments so as to make it unnecessary to check strictness of the tsm*init functions, but in any case it is Not Okay to just pass nulls willy-nilly to strict C functions. Also, I find this coding pretty sloppy even without the strictness angle, because the net effect of this way of dealing with nulls is that an argument-must-not-be-null complaint is reported as out of range, which is both confusing and the wrong ERRCODE. if (ntuples 1) ereport(ERROR, (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), errmsg(invalid sample size), errhint(Sample size must be positive integer value.))); I don't find this to be good error message style. The secondary comment is not a hint, it's an ironclad statement of what you did wrong, so if we wanted to phrase it like this it should be an errdetail not errhint. But the whole thing is overly cute anyway because there is no reason at all not to just say what we mean in the primary error message, eg ereport(ERROR, (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), errmsg(sample size must be greater than zero))); While we're on the subject, what's the reason for disallowing a sample size of zero? Seems like a reasonable edge case. /* All blocks have been read, we're done */ if (sampler-doneblocks sampler-nblocks || sampler-donetuples = sampler-ntuples) PG_RETURN_UINT32(InvalidBlockNumber); Okay, I lied, I *am* going to complain about this comment. Comments that do not accurately describe the code they're attached to are worse than useless. /* * Cleanup method. */ Datum tsm_system_rows_end(PG_FUNCTION_ARGS) { TableSampleDesc *tsdesc = (TableSampleDesc *) PG_GETARG_POINTER(0); pfree(tsdesc-tsmdata); PG_RETURN_VOID(); } This cleanup method is a waste of code space. There is no need to pfree individual allocations at query execution end. limitnode = linitial(args); limitnode = estimate_expression_value(root, limitnode);
Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape
On 11 July 2015 at 21:28, Tom Lane t...@sss.pgh.pa.us wrote: What are we going to do about this? I will address the points you raise, one by one. -- Simon Riggshttp://www.2ndQuadrant.com/ http://www.2ndquadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape
I wrote: TBH, I think the right thing to do at this point is to revert the entire patch and send it back for ground-up rework. I think the high-level design is wrong in many ways and I have about zero confidence in most of the code details as well. I'll send a separate message about high-level issues, And here's that. I do *not* claim that this is a complete list of design issues with the patch, it's just things I happened to notice in the amount of time I've spent so far (which is already way more than I wanted to spend on TABLESAMPLE right now). I'm not sure that we need an extensible sampling-method capability at all, let alone that an API for that needs to be the primary focus of a patch. Certainly the offered examples of extension modules aren't inspiring: tsm_system_time is broken by design (more about that below) and nobody would miss tsm_system_rows if it weren't there. What is far more important is to get sampling capability into indexscans, and designing an API ahead of time seems like mostly a distraction from that. I'd think seriously about tossing the entire executor-stage part of the patch, creating a stateless (history-independent) sampling filter that just accepts or rejects TIDs, and sticking calls to that into all the table scan node types. Once you had something like that working well it might be worth considering whether to try to expose an API to generalize it. But even then it's not clear that we really need any more options than true-Bernoulli and block-level sampling. The IBM paper I linked to in the other thread mentions that their optimizer will sometimes choose to do Bernoulli sampling even if SYSTEM was requested. Probably that happens when it decides to do a simple indexscan, because then there's no advantage to trying to cluster the sampled rows. But in the case of a bitmap scan, you could very easily do either true Bernoulli or block-level sampling simply by adjusting the rules about which bits you keep or clear in the bitmap (given that you apply the filter between collection of the index bitmap and accessing the heap, which seems natural). The only case where a special scan type really seems to be needed is if you want block-level sampling, the query would otherwise use a seqscan, *and* the sampling percentage is pretty low --- if you'd be reading every second or third block anyway, you're likely better off with a plain seqscan so that the kernel sees sequential access and does prefetching. The current API doesn't seem to make it possible to switch between seqscan and read-only-selected-blocks based on the sampling percentage, but I think that could be an interesting optimization. (Another bet that's been missed is having the samplescan logic request prefetching when it is doing selective block reads. The current API can't support that AFAICS, since there's no expectation that nextblock calls could be done asynchronously from nexttuple calls.) Another issue with the API as designed is the examinetuple() callback. Allowing sample methods to see invisible tuples is quite possibly the worst idea in the whole patch. They can *not* do anything with such tuples, or they'll totally break reproducibility: if the tuple is invisible to your scan, it might well be or soon become invisible to everybody, whereupon it would be subject to garbage collection at the drop of a hat. So if an invisible tuple affects the sample method's behavior at all, repeated scans in the same query would not produce identical results, which as mentioned before is required both by spec and for minimally sane query behavior. Moreover, examining the contents of the tuple is unsafe (it might contain pointers to TOAST values that no longer exist); and even if it were safe, what's the point? Sampling that pays attention to the data is the very definition of biased. So if we do re-introduce an API like the current one, I'd definitely lose this bit and only allow sample methods to consider visible tuples. On the point of reproducibility: the tsm_system_time method is utterly incapable of producing identical results across repeated scans, because there is no reason to believe it would get exactly as far in the same amount of time each time. This might be all right across queries if the method could refuse to work with REPEATABLE clauses (but there's no provision for such a restriction in the current API). But it's not acceptable within a query. Another problem with tsm_system_time is that its cost/rowcount estimation is based on nothing more than wishful thinking, and can never be based on anything more than wishful thinking, because planner cost units are not time-based. Adding a comment that says we'll nonetheless pretend they are milliseconds isn't a solution. So that sampling method really has to go away and never come back, whatever we might ultimately salvage from this patch otherwise. (I'm not exactly convinced that the system or tsm_system_rows methods are adequately
Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape
On 13 July 2015 at 17:00, Tom Lane t...@sss.pgh.pa.us wrote: I wrote: TBH, I think the right thing to do at this point is to revert the entire patch and send it back for ground-up rework. I think the high-level design is wrong in many ways and I have about zero confidence in most of the code details as well. I'll send a separate message about high-level issues, And here's that. I do *not* claim that this is a complete list of design issues with the patch, it's just things I happened to notice in the amount of time I've spent so far (which is already way more than I wanted to spend on TABLESAMPLE right now). Interesting that a time-based sample does indeed yield useful results. Good to know. That is exactly what this patch provides: a time-based sample, with reduced confidence in the accuracy of the result. And other things too. I'm not sure that we need an extensible sampling-method capability at all, let alone that an API for that needs to be the primary focus of a patch. Certainly the offered examples of extension modules aren't inspiring: tsm_system_time is broken by design (more about that below) and nobody would miss tsm_system_rows if it weren't there. Time based sampling isn't broken by design. It works exactly according to the design. Time-based sampling has been specifically requested by data visualization developers who are trying to work out how to display anything useful from a database beyond a certain size. The feature for PostgreSQL implements a similar mechanism to that deployed already with BlinkDB, demonstrated at VLDB 2012. I regard it as an Advanced feature worthy of deployment within PostgreSQL, based upon user request. Row based sampling offers the capability to retrieve a sample of a fixed size however big the data set. I shouldn't need to point out that this is already in use within the ANALYZE command. Since it implements a technique already in use within PostgreSQL, I see no reason not to expose that to users. As I'm sure you're aware, there is rigorous math backing up the use of fixed size sampling, with recent developments from my research colleagues at Manchester University that further emphasises the usefulness of this feature for us. What is far more important is to get sampling capability into indexscans, and designing an API ahead of time seems like mostly a distraction from that. This has come up as a blocker on TABLESAMPLE before. I've got no evidence to show anyone cares about that. I can't imagine a use for it myself; it was not omitted from this patch because its difficult, it was omitted because its just useless. If anyone ever cares, they can add it in a later release. I'd think seriously about tossing the entire executor-stage part of the patch, creating a stateless (history-independent) sampling filter that just accepts or rejects TIDs, and sticking calls to that into all the table scan node types. Once you had something like that working well it might be worth considering whether to try to expose an API to generalize it. But even then it's not clear that we really need any more options than true-Bernoulli and block-level sampling. See above. The IBM paper I linked to in the other thread mentions that their optimizer will sometimes choose to do Bernoulli sampling even if SYSTEM was requested. That sounds broken to me. This patch gives the user what the user asks for. BERNOULLI or SYSTEM. If you particularly like the idea of mixing the two sampling methods, you can do so *because* the sampling method API is extensible for the user. So Mr.DB2 can get it his way if he likes and he can call it SYSNOULLI if he likes; others can get it the way they like also. No argument required. It was very, very obvious that whatever sampling code anybody dreamt up would be objected to. Clearly, we need a way to allow people to implement whatever technique they wish. Stratified sampling, modified sampling, new techniques. Whatever. Probably that happens when it decides to do a simple indexscan, because then there's no advantage to trying to cluster the sampled rows. But in the case of a bitmap scan, you could very easily do either true Bernoulli or block-level sampling simply by adjusting the rules about which bits you keep or clear in the bitmap (given that you apply the filter between collection of the index bitmap and accessing the heap, which seems natural). The only case where a special scan type really seems to be needed is if you want block-level sampling, the query would otherwise use a seqscan, *and* the sampling percentage is pretty low --- if you'd be reading every second or third block anyway, you're likely better off with a plain seqscan so that the kernel sees sequential access and does prefetching. The current API doesn't seem to make it possible to switch between seqscan and read-only-selected-blocks based on the sampling percentage, but I think that could be an interesting optimization.
Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape
I wrote: The two contrib modules this patch added are nowhere near fit for public consumption. They cannot clean up after themselves when dropped: ... Raw inserts into system catalogs just aren't a sane thing to do in extensions. I had some thoughts about how we might fix that, without going to the rather tedious lengths of creating a complete set of DDL infrastructure for CREATE/DROP TABLESAMPLE METHOD. First off, the extension API designed for the tablesample patch is evidently modeled on the index AM API, which was not a particularly good precedent --- it's not a coincidence that index AMs can't be added or dropped on-the-fly. Modeling a server internal API as a set of SQL-visible functions is problematic when the call signatures of those functions can't be accurately described by SQL datatypes, and it's rather pointless and inefficient when none of the functions in question is meant to be SQL-callable. It's even less attractive if you don't think you've got a completely stable API spec, because adding, dropping, or changing signature of any one of the API functions then involves a pile of easy-to-mess-up catalog changes on top of the actually useful work. Not to mention then having to think about backwards compatibility of your CREATE command's arguments. We have a far better model to follow already, namely the foreign data wrapper API. In that, there's a single SQL-visible function that returns a dummy datatype indicating that it's an FDW handler, and when called, it hands back a struct containing pointers to all the other functions that the particular wrapper needs to supply (and, if necessary, the struct could have non-function-pointer fields containing other info the core system might need to know about the handler). We could similarly invent a pseudotype tablesample_handler and represent each tablesample method by a single SQL function returning tablesample_handler. Any future changes in the API for tablesample handlers would then appear as changes in the C definition of the struct returned by the handler, which requires no SQL-visible thrashing, hence creates no headaches for pg_upgrade, and it is pretty easy to design it so that failure to update an external module that implements the API results in C compiler errors and/or sane fallback behavior. Once we've done that, I think we don't even need a special catalog representing tablesample methods. Given FROM foo TABLESAMPLE bernoulli(...), the parser could just look for a function bernoulli() returning tablesample_handler, and it's done. The querytree would have an ordinary function dependency on that function, which would be sufficient to handle DROP dependency behaviors properly. (On reflection, maybe better if it's bernoulli(internal) returns tablesample_handler, so as to guarantee that such a function doesn't create a conflict with any user-defined function of the same name.) Thoughts? regards, tom lane PS: now that I've written this rant, I wonder why we don't redesign the index AM API along the same lines. It probably doesn't matter much at the moment, but if we ever get serious about supporting index AM extensions, I think we ought to consider doing that. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] TABLESAMPLE patch is really in pretty sad shape
The two contrib modules this patch added are nowhere near fit for public consumption. They cannot clean up after themselves when dropped: regression=# create extension tsm_system_rows; CREATE EXTENSION regression=# create table big as select i, random() as x from generate_series(1,100) i; SELECT 100 regression=# create view v1 as select * from big tablesample system_rows(10); CREATE VIEW regression=# drop extension tsm_system_rows; DROP EXTENSION The view is still there, but is badly broken: regression=# select * from v1; ERROR: cache lookup failed for function 46379 Potentially this is a security issue, since a malicious user could probably manage to create a Trojan horse function having the now-vacated OID, whereupon use of the view would invoke that function. Worse still, the broken pg_tablesample_method row is still there: regression=# create extension tsm_system_rows; ERROR: duplicate key value violates unique constraint pg_tablesample_method_name_index DETAIL: Key (tsmname)=(system_rows) already exists. And even if we fixed that, these modules will not survive a pg_upgrade cycle, because pg_upgrade has no idea that it would need to create a pg_tablesample_method row (remember that we do *not* replay the extension script during binary upgrade). Raw inserts into system catalogs just aren't a sane thing to do in extensions. Some of the risks here come from what seems like a fundamentally wrong decision to copy all of the info about a tablesample method out of the pg_tablesample_method catalog *at parse time* and store it permanently in the query parse tree. This makes any sort of alter tablesample method DDL operation impossible in principle, because any views/rules referencing the method have already copied the data. On top of that, I find the basic implementation design rather dubious, because it supposes that the tablesample filtering step must always come first; the moment you say TABLESAMPLE you can kiss goodbye the idea that the query will use any indexes. For example: d2=# create table big as select i, random() as x from generate_series(1,1000) i; SELECT 1000 d2=# create index on big(i); CREATE INDEX d2=# analyze big; ANALYZE d2=# explain analyze select * from big where i between 100 and 200; QUERY PLAN Index Scan using big_i_idx on big (cost=0.43..10.18 rows=87 width=12) (actual time=0.022..0.088 rows=101 loops=1) Index Cond: ((i = 100) AND (i = 200)) Planning time: 0.495 ms Execution time: 0.141 ms (4 rows) d2=# explain analyze select * from big tablesample bernoulli(10) where i between 100 and 200; QUERY PLAN Sample Scan (bernoulli) on big (cost=0.00..54055.13 rows=9 width=12) (actual time=0.028..970.051 rows=13 loops=1) Filter: ((i = 100) AND (i = 200)) Rows Removed by Filter: 999066 Planning time: 0.182 ms Execution time: 970.093 ms (5 rows) Now, maybe I don't understand the use-case for this feature, but I should think it's meant for dealing with tables that are so big that you can't afford to scan all the data. So, OK, a samplescan is hopefully cheaper than a pure seqscan, but that doesn't mean that fetching 999079 rows and discarding 999066 of them is a good plan design. There needs to be an operational mode whereby we can use an index and do random sampling of the TIDs the index returns. I do not insist that that has to appear in version one of the feature --- but I am troubled by the fact that, by exposing an oversimplified API for use by external modules, this patch is doubling down on the assumption that no such operational mode will ever need to be implemented. There are a whole lot of lesser sins, such as documentation that was clearly never copy-edited by a native speaker of English, badly designed planner APIs (Paths really ought not have rowcounts different from the underlying RelOptInfo), potential core dumps due to dereferencing values that could be null (the guards for null values are in the wrong places entirely), etc etc. While there's nothing here that couldn't be fixed by nuking the contrib modules and putting a week or two of concentrated work into fixing the core code, I for one certainly don't have time to put that kind of effort into TABLESAMPLE right now. Nor do I really have the interest; I find this feature of pretty dubious value. What are we going to do about this? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription:
Re: [HACKERS] TABLESAMPLE patch
On 19/04/15 01:24, Michael Paquier wrote: On Sat, Apr 18, 2015 at 8:38 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Fri, Apr 17, 2015 at 10:54 PM, Petr Jelinek wrote: On 10/04/15 06:46, Michael Paquier wrote: 13) Some regression tests with pg_tablesample_method would be welcome. Not sure what you mean by that. I meant a sanity check on pg_tablesample_method to be sure that tsminit, tsmnextblock and tsmnexttuple are always defined as they are mandatory functions. So the idea is to add a query like and and to be sure that it returns no rows: SELECT tsmname FROM pg_tablesample_method WHERE tsminit IS NOT NULL OR tsmnextblock IS NOT NULL OR tsmnexttuple IS NOT NULL; Yesterday was a long day. I meant IS NULL and not IS NOT NULL, but I am sure you guessed it that way already.. Yes I guessed that and it's very reasonable request, I guess it should look like the attached (I don't want to send new version of everything just for this). -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services From d059c792a1e864e38f321cea51169ea0b3c5caab Mon Sep 17 00:00:00 2001 From: Petr Jelinek pjmo...@pjmodos.net Date: Wed, 22 Apr 2015 21:28:29 +0200 Subject: [PATCH 7/7] tablesample: add catalog regression test --- src/test/regress/expected/tablesample.out | 15 +++ src/test/regress/sql/tablesample.sql | 13 + 2 files changed, 28 insertions(+) diff --git a/src/test/regress/expected/tablesample.out b/src/test/regress/expected/tablesample.out index 271638d..04e5eb8 100644 --- a/src/test/regress/expected/tablesample.out +++ b/src/test/regress/expected/tablesample.out @@ -209,6 +209,21 @@ SELECT q.* FROM (SELECT * FROM test_tablesample) as q TABLESAMPLE BERNOULLI (5); ERROR: syntax error at or near TABLESAMPLE LINE 1: ...CT q.* FROM (SELECT * FROM test_tablesample) as q TABLESAMPL... ^ +-- catalog sanity +SELECT * +FROM pg_tablesample_method +WHERE tsminit IS NULL + OR tsmseqscan IS NULL + OR tsmpagemode IS NULL + OR tsmnextblock IS NULL + OR tsmnexttuple IS NULL + OR tsmend IS NULL + OR tsmreset IS NULL + OR tsmcost IS NULL; + tsmname | tsmseqscan | tsmpagemode | tsminit | tsmnextblock | tsmnexttuple | tsmexaminetuple | tsmend | tsmreset | tsmcost +-++-+-+--+--+-++--+- +(0 rows) + -- done DROP TABLE test_tablesample CASCADE; NOTICE: drop cascades to 2 other objects diff --git a/src/test/regress/sql/tablesample.sql b/src/test/regress/sql/tablesample.sql index 2f4b7de..7b3eb9b 100644 --- a/src/test/regress/sql/tablesample.sql +++ b/src/test/regress/sql/tablesample.sql @@ -57,5 +57,18 @@ SELECT * FROM query_select TABLESAMPLE BERNOULLI (5.5) REPEATABLE (1); SELECT q.* FROM (SELECT * FROM test_tablesample) as q TABLESAMPLE BERNOULLI (5); +-- catalog sanity + +SELECT * +FROM pg_tablesample_method +WHERE tsminit IS NULL + OR tsmseqscan IS NULL + OR tsmpagemode IS NULL + OR tsmnextblock IS NULL + OR tsmnexttuple IS NULL + OR tsmend IS NULL + OR tsmreset IS NULL + OR tsmcost IS NULL; + -- done DROP TABLE test_tablesample CASCADE; -- 1.9.1 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch
On Thu, Apr 23, 2015 at 4:31 AM, Petr Jelinek p...@2ndquadrant.com wrote: On 19/04/15 01:24, Michael Paquier wrote: On Sat, Apr 18, 2015 at 8:38 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Fri, Apr 17, 2015 at 10:54 PM, Petr Jelinek wrote: On 10/04/15 06:46, Michael Paquier wrote: 13) Some regression tests with pg_tablesample_method would be welcome. Not sure what you mean by that. I meant a sanity check on pg_tablesample_method to be sure that tsminit, tsmnextblock and tsmnexttuple are always defined as they are mandatory functions. So the idea is to add a query like and and to be sure that it returns no rows: SELECT tsmname FROM pg_tablesample_method WHERE tsminit IS NOT NULL OR tsmnextblock IS NOT NULL OR tsmnexttuple IS NOT NULL; Yesterday was a long day. I meant IS NULL and not IS NOT NULL, but I am sure you guessed it that way already.. Yes I guessed that and it's very reasonable request, I guess it should look like the attached (I don't want to send new version of everything just for this). Thanks. That's exactly the idea. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch
On Fri, Apr 17, 2015 at 10:54 PM, Petr Jelinek wrote: On 10/04/15 06:46, Michael Paquier wrote: 13) Some regression tests with pg_tablesample_method would be welcome. Not sure what you mean by that. I meant a sanity check on pg_tablesample_method to be sure that tsminit, tsmnextblock and tsmnexttuple are always defined as they are mandatory functions. So the idea is to add a query like and and to be sure that it returns no rows: SELECT tsmname FROM pg_tablesample_method WHERE tsminit IS NOT NULL OR tsmnextblock IS NOT NULL OR tsmnexttuple IS NOT NULL; - Added two sample contrib modules demonstrating row limited and time limited sampling. I am using linear probing for both of those as the builtin block sampling is not well suited for row limited or time limited sampling. For row limited I originally thought of using the Vitter's reservoir sampling but that does not fit well with the executor as it needs to keep the reservoir of all the output tuples in memory which would have horrible memory requirements if the limit was high. The linear probing seems to work quite well for the use case of give me 500 random rows from table. Patch 4 is interesting, it shows a direct use of examinetuple to filter the output. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch
On Sat, Apr 11, 2015 at 12:56 AM, Peter Eisentraut pete...@gmx.net wrote: On 4/9/15 8:58 PM, Petr Jelinek wrote: Well, you can have two approaches to this, either allow some specific set of keywords that can be used to specify limit, or you let sampling methods interpret parameters, I believe the latter is more flexible. There is nothing stopping somebody writing sampling method which takes limit as number of rows, or anything else. Also for example for BERNOULLI to work correctly you'd need to convert the number of rows to fraction of table anyway (and that's exactly what the one database which has this feature does internally) and then it's no different than passing (SELECT 100/reltuples*number_of_rows FROM tablename) as a parameter. What is your intended use case for this feature? I know that give me 100 random rows from this table quickly is a common use case, but that's kind of cumbersome if you need to apply formulas like that. I'm not sure what the use of a percentage is. Presumably, the main use of this features is on large tables. But then you might not even know how large it really is, and even saying 0.1% might be more than you wanted to handle. The use case for specifying number of rows for sample scan is valid and can be achieved by other means if required as suggested by Petr Jelinek, however the current proposed syntax (w.r.t to Sample Percentage [1]) seems to comply with SQL standard, so why not go for it and then extend it based on more use-cases? [1] SQL Standard (2003) w.r.t Sample Percentage sample clause ::= TABLESAMPLE sample method left paren sample percentage right paren [ repeatable clause ] sample percentage ::= numeric value expression With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] TABLESAMPLE patch
On 10 April 2015 at 15:26, Peter Eisentraut pete...@gmx.net wrote: What is your intended use case for this feature? Likely use cases are: * Limits on numbers of rows in sample. Some research colleagues have published a new mathematical analysis that will allow a lower limit than previously considered. * Time limits on sampling. This allows data visualisation approaches to gain approximate answers in real time. * Stratified sampling. Anything with some kind of filtering, lifting or bias. Allows filtering out known incomplete data. * Limits on sample error Later use cases would allow custom aggregates to work together with custom sampling methods, so we might work our way towards i) an SUM() function that provides the right answer even when used with a sample scan, ii) custom aggregates that report the sample error, allowing you to get both AVG() and AVG_STDERR(). That would be technically possible with what we have here, but I think a lot more thought required yet. These have all come out of detailed discussions with two different groups of data mining researchers. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, RemoteDBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch
On 10/04/15 21:26, Peter Eisentraut wrote: On 4/9/15 8:58 PM, Petr Jelinek wrote: Well, you can have two approaches to this, either allow some specific set of keywords that can be used to specify limit, or you let sampling methods interpret parameters, I believe the latter is more flexible. There is nothing stopping somebody writing sampling method which takes limit as number of rows, or anything else. Also for example for BERNOULLI to work correctly you'd need to convert the number of rows to fraction of table anyway (and that's exactly what the one database which has this feature does internally) and then it's no different than passing (SELECT 100/reltuples*number_of_rows FROM tablename) as a parameter. What is your intended use case for this feature? I know that give me 100 random rows from this table quickly is a common use case, but that's kind of cumbersome if you need to apply formulas like that. I'm not sure what the use of a percentage is. Presumably, the main use of this features is on large tables. But then you might not even know how large it really is, and even saying 0.1% might be more than you wanted to handle. My main intended use-case is analytics on very big tables. The percentages of population vs confidence levels are pretty well mapped there and you can get quite big speedups if you are fine with getting results with slightly smaller confidence (ie you care about ballpark figures). But this was not really my point, the BERNOULLI just does not work well with row-limit by definition, it applies probability on each individual row and while you can get probability from percentage very easily (just divide by 100), to get it for specific target number of rows you have to know total number of source rows and that's not something we can do very accurately so then you won't get 500 rows but approximately 500 rows. In any case for give me 500 somewhat random rows from table quickly you want probably SYSTEM sampling anyway as it will be orders of magnitude faster on big tables and yes even 0.1% might be more than you wanted in that case. I am not against having row limit input for methods which can work with it like SYSTEM but then that's easily doable by adding separate sampling method which accepts rows (even if sampling algorithm itself is same). In current approach all you'd have to do is write different init function for the sampling method and register it under new name (yes it won't be named SYSTEM but for example SYSTEM_ROWLIMIT then). -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch
On 04/10/15 21:57, Petr Jelinek wrote: On 10/04/15 21:26, Peter Eisentraut wrote: But this was not really my point, the BERNOULLI just does not work well with row-limit by definition, it applies probability on each individual row and while you can get probability from percentage very easily (just divide by 100), to get it for specific target number of rows you have to know total number of source rows and that's not something we can do very accurately so then you won't get 500 rows but approximately 500 rows. It's actually even trickier. Even if you happen to know the exact number of rows in the table, you can't just convert that into a percentage like that and use it for BERNOULLI sampling. It may give you different number of result rows, because each row is sampled independently. That is why we have Vitter's algorithm for reservoir sampling, which works very differently from BERNOULLI. -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch
On 10/04/15 22:16, Tomas Vondra wrote: On 04/10/15 21:57, Petr Jelinek wrote: On 10/04/15 21:26, Peter Eisentraut wrote: But this was not really my point, the BERNOULLI just does not work well with row-limit by definition, it applies probability on each individual row and while you can get probability from percentage very easily (just divide by 100), to get it for specific target number of rows you have to know total number of source rows and that's not something we can do very accurately so then you won't get 500 rows but approximately 500 rows. It's actually even trickier. Even if you happen to know the exact number of rows in the table, you can't just convert that into a percentage like that and use it for BERNOULLI sampling. It may give you different number of result rows, because each row is sampled independently. That is why we have Vitter's algorithm for reservoir sampling, which works very differently from BERNOULLI. Hmm this actually gives me idea - perhaps we could expose Vitter's reservoir sampling as another sampling method for people who want the give me 500 rows from table fast then? We already have it implemented it's just matter of adding the glue. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch
On Thu, Apr 9, 2015 at 5:12 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Thu, Apr 9, 2015 at 4:30 AM, Jeff Janes jeff.ja...@gmail.com wrote: On Mon, Apr 6, 2015 at 11:02 AM, Petr Jelinek p...@2ndquadrant.com wrote: On 06/04/15 14:30, Petr Jelinek wrote: On 06/04/15 11:02, Simon Riggs wrote: Are we ready for a final detailed review and commit? I plan to send v12 in the evening with some additional changes that came up from Amit's comments + some improvements to error reporting. I think it will be ready then. Ok so here it is. Changes vs v11: - changed input parameter list to expr_list - improved error reporting, particularly when input parameters are wrong - fixed SELECT docs to show correct syntax and mention that there can be more sampling methods - added name of the sampling method to the explain output - I don't like the code much there as it has to look into RTE but on the other hand I don't want to create new scan node just so it can hold the name of the sampling method for explain - made views containing TABLESAMPLE clause not autoupdatable - added PageIsAllVisible() check before trying to check for tuple visibility - some typo/white space fixes Compiler warnings: explain.c: In function 'ExplainNode': explain.c:861: warning: 'sname' may be used uninitialized in this function Docs spellings: PostgreSQL distrribution extra r. The optional parameter REPEATABLE acceps accepts. But I don't know that 'accepts' is the right word. It makes the seed value sound optional to REPEATABLE. each block having same chance should have the before same. Both of those sampling methods currently I think it should be these not those, as this sentence is immediately after their introduction, not at a distance. ...tuple contents and decides if to return in, or zero if none Something here is confusing. return it, not return in? Other comments: Do we want tab completions for psql? (I will never remember how to spell BERNOULLI). Yes. I think so. Needs a Cat version bump. The committer who will pick up this patch will normally do it. Patch 1 is simple enough and looks fine to me. Regarding patch 2... I found for now some typos: + titlestructnamepg_tabesample_method/structname/title +productnamePostgreSQL/productname distrribution: Also, I am wondering if the sampling logic based on block analysis is actually correct, for example for now this fails and I think that we should support it: =# with query_select as (select generate_series(1, 10) as a) select query_select.a from query_select tablesample system (100.0/11) REPEATABLE (); ERROR: 42P01: relation query_select does not exist How does the SQL spec define exactly TABLESAMPLE? Shouldn't we get a sample from a result set? Thoughts? Just to be clear, the example above being misleading... Doing table sampling using SYSTEM at physical level makes sense. In this case I think that we should properly error out when trying to use this method on something not present at physical level. But I am not sure that this restriction applies to BERNOUILLI: you may want to apply it on other things than physical relations, like views or results of WITH clauses. Also, based on the fact that we support custom sampling methods, I think that it should be up to the sampling method to define on what kind of objects it supports sampling, and where it supports sampling fetching, be it page-level fetching or analysis from an existing set of tuples. Looking at the patch, TABLESAMPLE is just allowed on tables and matviews, this limitation is too restrictive IMO. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch
On 4/9/15 5:02 AM, Michael Paquier wrote: Just to be clear, the example above being misleading... Doing table sampling using SYSTEM at physical level makes sense. In this case I think that we should properly error out when trying to use this method on something not present at physical level. But I am not sure that this restriction applies to BERNOUILLI: you may want to apply it on other things than physical relations, like views or results of WITH clauses. Also, based on the fact that we support custom sampling methods, I think that it should be up to the sampling method to define on what kind of objects it supports sampling, and where it supports sampling fetching, be it page-level fetching or analysis from an existing set of tuples. Looking at the patch, TABLESAMPLE is just allowed on tables and matviews, this limitation is too restrictive IMO. In the SQL standard, the TABLESAMPLE clause is attached to a table expression (table primary), which includes table functions, subqueries, CTEs, etc. In the proposed patch, it is attached to a table name, allowing only an ONLY clause. So this is a significant deviation. Obviously, doing block sampling on a physical table is a significant use case, but we should be clear about which restrictions and tradeoffs were are making now and in the future, especially if we are going to present extension interfaces. The fact that physical tables are interchangeable with other relation types, at least in data-reading contexts, is a feature worth preserving. It may be worth thinking about some examples of other sampling methods, in order to get a better feeling for whether the interfaces are appropriate. Earlier in the thread, someone asked about supporting specifying a number of rows instead of percents. While not essential, that seems pretty useful, but I wonder how that could be implemented later on if we take the approach that the argument to the sampling method can be an arbitrary quantity that is interpreted only by the method. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch
On 09/04/15 11:37, Simon Riggs wrote: On 9 April 2015 at 04:52, Simon Riggs si...@2ndquadrant.com wrote: TABLESAMPLE BERNOULLI could work in this case, or any other non-block based sampling mechanism. Whether it does work yet is another matter. This query should be part of the test suite and should generate a useful message or work correctly. The SQL Standard does allow the WITH query given. It makes no mention of the obvious point that SYSTEM-defined mechanisms might not work, but that is for the implementation to define, AFAICS. Yes SQL Standard allows this and the reason why they don't define what happens with SYSTEM is that they actually don't define how SYSTEM should behave except that it should return approximately given percentage of rows, but the actual behavior is left to the DBMS. The reason why other dbs like MSSQL or DB2 have chosen this to be block sampling is that it makes most sense (and is fastest) on tables and those databases don't support TABLESAMPLE on anything else at all. On balance, in this release, I would be happier to exclude sampled results from queries, and only allow sampling against base tables. I think so too, considering how late in the last CF we are. Especially given my note about MSSQL and DB2 above. In any case I don't see any fundamental issues with extending the current implementation with the subquery support. I think most of the work there is actually in parser/analyzer and planner. The sampling methods will just not receive the request for next blockid and tupleid from that block when source of the data is subquery and if they want to support subquery as source of sampling they will have to provide the examinetuple interface (which is already there and optional, the test/example custom sampling method is using it). -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch
On Thu, Apr 9, 2015 at 5:52 PM, Simon Riggs si...@2ndquadrant.com wrote: On 9 April 2015 at 04:12, Michael Paquier michael.paqu...@gmail.com wrote: Also, I am wondering if the sampling logic based on block analysis is actually correct, for example for now this fails and I think that we should support it: =# with query_select as (select generate_series(1, 10) as a) select query_select.a from query_select tablesample system (100.0/11) REPEATABLE (); ERROR: 42P01: relation query_select does not exist How does the SQL spec define exactly TABLESAMPLE? Shouldn't we get a sample from a result set? Thoughts? Good catch. The above query doesn't make any sense. TABLESAMPLE SYSTEM implies system-defined sampling mechanism, which is block level sampling. So any block level sampling method should be barred from operating on a result set in this way... i.e. should generate an ERROR inappropriate sampling method specified TABLESAMPLE BERNOULLI could work in this case, or any other non-block based sampling mechanism. Whether it does work yet is another matter. This query should be part of the test suite and should generate a useful message or work correctly. Ahah, you just beat me on that ;) See a more precise reply below. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch
On 9 April 2015 at 04:52, Simon Riggs si...@2ndquadrant.com wrote: TABLESAMPLE BERNOULLI could work in this case, or any other non-block based sampling mechanism. Whether it does work yet is another matter. This query should be part of the test suite and should generate a useful message or work correctly. The SQL Standard does allow the WITH query given. It makes no mention of the obvious point that SYSTEM-defined mechanisms might not work, but that is for the implementation to define, AFAICS. The SQL Standard goes on to talk about possibly non-deterministic issues. Which in Postgres relates to the point that the results of a SampleScan will never be IMMUTABLE. That raises the possibility of planner issues. We must, for example, never do inner join removal on a sampled relation - we don't do that yet, but something to watch for. On balance, in this release, I would be happier to exclude sampled results from queries, and only allow sampling against base tables. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, RemoteDBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch
On Thu, Apr 9, 2015 at 4:30 AM, Jeff Janes jeff.ja...@gmail.com wrote: On Mon, Apr 6, 2015 at 11:02 AM, Petr Jelinek p...@2ndquadrant.com wrote: On 06/04/15 14:30, Petr Jelinek wrote: On 06/04/15 11:02, Simon Riggs wrote: Are we ready for a final detailed review and commit? I plan to send v12 in the evening with some additional changes that came up from Amit's comments + some improvements to error reporting. I think it will be ready then. Ok so here it is. Changes vs v11: - changed input parameter list to expr_list - improved error reporting, particularly when input parameters are wrong - fixed SELECT docs to show correct syntax and mention that there can be more sampling methods - added name of the sampling method to the explain output - I don't like the code much there as it has to look into RTE but on the other hand I don't want to create new scan node just so it can hold the name of the sampling method for explain - made views containing TABLESAMPLE clause not autoupdatable - added PageIsAllVisible() check before trying to check for tuple visibility - some typo/white space fixes Compiler warnings: explain.c: In function 'ExplainNode': explain.c:861: warning: 'sname' may be used uninitialized in this function Docs spellings: PostgreSQL distrribution extra r. The optional parameter REPEATABLE acceps accepts. But I don't know that 'accepts' is the right word. It makes the seed value sound optional to REPEATABLE. each block having same chance should have the before same. Both of those sampling methods currently I think it should be these not those, as this sentence is immediately after their introduction, not at a distance. ...tuple contents and decides if to return in, or zero if none Something here is confusing. return it, not return in? Other comments: Do we want tab completions for psql? (I will never remember how to spell BERNOULLI). Yes. I think so. Needs a Cat version bump. The committer who will pick up this patch will normally do it. Patch 1 is simple enough and looks fine to me. Regarding patch 2... I found for now some typos: + titlestructnamepg_tabesample_method/structname/title +productnamePostgreSQL/productname distrribution: Also, I am wondering if the sampling logic based on block analysis is actually correct, for example for now this fails and I think that we should support it: =# with query_select as (select generate_series(1, 10) as a) select query_select.a from query_select tablesample system (100.0/11) REPEATABLE (); ERROR: 42P01: relation query_select does not exist How does the SQL spec define exactly TABLESAMPLE? Shouldn't we get a sample from a result set? Thoughts? -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch
On 9 April 2015 at 04:12, Michael Paquier michael.paqu...@gmail.com wrote: Also, I am wondering if the sampling logic based on block analysis is actually correct, for example for now this fails and I think that we should support it: =# with query_select as (select generate_series(1, 10) as a) select query_select.a from query_select tablesample system (100.0/11) REPEATABLE (); ERROR: 42P01: relation query_select does not exist How does the SQL spec define exactly TABLESAMPLE? Shouldn't we get a sample from a result set? Thoughts? Good catch. The above query doesn't make any sense. TABLESAMPLE SYSTEM implies system-defined sampling mechanism, which is block level sampling. So any block level sampling method should be barred from operating on a result set in this way... i.e. should generate an ERROR inappropriate sampling method specified TABLESAMPLE BERNOULLI could work in this case, or any other non-block based sampling mechanism. Whether it does work yet is another matter. This query should be part of the test suite and should generate a useful message or work correctly. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, RemoteDBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch
On Mon, Apr 6, 2015 at 11:02 AM, Petr Jelinek p...@2ndquadrant.com wrote: On 06/04/15 14:30, Petr Jelinek wrote: On 06/04/15 11:02, Simon Riggs wrote: Are we ready for a final detailed review and commit? I plan to send v12 in the evening with some additional changes that came up from Amit's comments + some improvements to error reporting. I think it will be ready then. Ok so here it is. Changes vs v11: - changed input parameter list to expr_list - improved error reporting, particularly when input parameters are wrong - fixed SELECT docs to show correct syntax and mention that there can be more sampling methods - added name of the sampling method to the explain output - I don't like the code much there as it has to look into RTE but on the other hand I don't want to create new scan node just so it can hold the name of the sampling method for explain - made views containing TABLESAMPLE clause not autoupdatable - added PageIsAllVisible() check before trying to check for tuple visibility - some typo/white space fixes Compiler warnings: explain.c: In function 'ExplainNode': explain.c:861: warning: 'sname' may be used uninitialized in this function Docs spellings: PostgreSQL distrribution extra r. The optional parameter REPEATABLE acceps accepts. But I don't know that 'accepts' is the right word. It makes the seed value sound optional to REPEATABLE. each block having same chance should have the before same. Both of those sampling methods currently I think it should be these not those, as this sentence is immediately after their introduction, not at a distance. ...tuple contents and decides if to return in, or zero if none Something here is confusing. return it, not return in? Other comments: Do we want tab completions for psql? (I will never remember how to spell BERNOULLI). Needs a Cat version bump. Cheers, Jeff
Re: [HACKERS] TABLESAMPLE patch
On Sat, Apr 4, 2015 at 8:25 PM, Petr Jelinek p...@2ndquadrant.com wrote: On 04/04/15 14:57, Amit Kapila wrote: 1. +tablesample_clause: +TABLESAMPLE ColId '(' func_arg_list ')' opt_repeatable_clause It seems to me that you want to allow it to make it extendable to user defined Tablesample methods, but not sure if it is right to use func_arg_list for the same because sample method arguments can have different definition than function arguments. Now even if we want to keep sample method arguments same as function arguments that sounds like a separate discussion. Yes I want extensibility here. And I think the tablesample method arguments are same thing as function arguments given that in the end they are arguments for the init function of tablesampling method. I would be ok with just expr_list, naming parameters here isn't overly important, but ability to have different types and numbers of parameters for custom TABLESAMPLE methods *is* important. Yeah, named parameters is one reason which I think won't be required for sample methods and neither the same is mentioned in docs (if user has to use, what is the way he can pass the same) and another is number of arguments for sampling methods which is currently seems to be same as FUNC_MAX_ARGS, I think that is sufficient but do we want to support that many arguments for sampling method. I have shared my thoughts regarding this point with you, if you don't agree with the same, then proceed as you think is the best way. 2. postgres=# explain update test_tablesample TABLESAMPLE system(30) set id = 2; ERROR: syntax error at or near TABLESAMPLE LINE 1: explain update test_tablesample TABLESAMPLE system(30) set i... postgres=# Delete from test_tablesample TABLESAMPLE system(30); ERROR: syntax error at or near TABLESAMPLE LINE 1: Delete from test_tablesample TABLESAMPLE system(30); Isn't TABLESAMPLE clause suppose to work with Update/Delete statements? It's supported in the FROM part of UPDATE and USING part of DELETE. I think that that's sufficient. But I think the Update on target table with sample scan is supported via views which doesn't seem to be the right thing in case you just want to support it via FROM/USING, example postgres=# create view vw_test As select * from test_tablesample TABLESAMPLE sys tem(30); postgres=# explain update vw_test set id = 4; QUERY PLAN --- Update on test_tablesample (cost=0.00..4.04 rows=4 width=210) - Sample Scan on test_tablesample (cost=0.00..4.04 rows=4 width=210) (2 rows) Standard is somewhat useless for UPDATE and DELETE as it only defines quite limited syntax there. From what I've seen when doing research MSSQL also only supports it in their equivalent of FROM/USING list, Oracle does not seem to support their SAMPLING clause outside of SELECTs at all and if I got the cryptic DB2 manual correctly I think they don't support it outside of (sub)SELECTs either. By the way, what is the usecase to support sample scan in Update or Delete statement? Also, isn't it better to mention in the docs for Update and Delete incase we are going to support tablesample clause for them? 7. ParseTableSample() { .. arg = coerce_to_specific_type(pstate, arg, INT4OID, REPEATABLE); .. } What is the reason for coercing value of REPEATABLE clause to INT4OID when numeric value is expected for the clause. If user gives the input value as -2.3, it will generate a seed which doesn't seem to be right. Because the REPEATABLE is numeric expression so it can produce whatever number but we need int4 internally (well float4 would also work just the code would be slightly uglier). Okay, I understand that part. Here the real point is why not just expose it as int4 to user rather than telling in docs that it is numeric and actually we neither expect nor use it as numberic. Even Oracle supports supports it as int with below description. The seed_value must be an integer between 0 and 4294967295 And we do this type of coercion even for table data (you can insert -2.3 into integer column and it will work) so I don't see what's wrong with it here. I am not sure we can compare it with column of a table. I think we can support it within a valid range (similar to tablesample method) and if user inputs value outside the range, then return error. 8. +DATA(insert OID = 3295 ( tsm_system_initPGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 2281 23 700 _null_ _null_ _null_ _null_tsm_system_init _null_ _null_ _null_ )); +DATA(insert OID = 3301 ( tsm_bernoulli_initPGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 2281 23 700 _null_ _null_ _null_ _null_tsm_bernoulli_init _null_ _null_ _null_ )); Datatype for second argument is kept as 700 (Float4), shouldn't it be 1700 (Numeric)? Why is that? As we are exposing it as numeric. Given the sampling error I think the float4 is enough for
Re: [HACKERS] TABLESAMPLE patch
On 2 April 2015 at 17:36, Petr Jelinek p...@2ndquadrant.com wrote: so here is version 11. Looks great. Comment on docs: The SELECT docs refer only to SYSTEM and BERNOULLI. It doesn't mention that if other methods are available they could be used also. The phrasing was sampling method can be one of list. Are we ready for a final detailed review and commit? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, RemoteDBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch
On 06/04/15 12:33, Amit Kapila wrote: On Sat, Apr 4, 2015 at 8:25 PM, Petr Jelinek p...@2ndquadrant.com mailto:p...@2ndquadrant.com wrote: Yes I want extensibility here. And I think the tablesample method arguments are same thing as function arguments given that in the end they are arguments for the init function of tablesampling method. I would be ok with just expr_list, naming parameters here isn't overly important, but ability to have different types and numbers of parameters for custom TABLESAMPLE methods *is* important. Yeah, named parameters is one reason which I think won't be required for sample methods and neither the same is mentioned in docs (if user has to use, what is the way he can pass the same) and another is number of arguments for sampling methods which is currently seems to be same as FUNC_MAX_ARGS, I think that is sufficient but do we want to support that many arguments for sampling method. I think I'll go with simple list of a_exprs. The reason for that is that I can foresee sampling methods that need multiple parameters, but I don't think naming them is very important. Also adding support for naming parameters can be done in the future if we decide so without breaking compatibility. Side benefit is that it makes hinting about what is wrong with input somewhat easier. I don't think we need to come up with different limit from FUNC_MAX_ARGS. I don't think any sampling method would need that many parameters but I also don't see what would additional smaller limit give us. 2. postgres=# explain update test_tablesample TABLESAMPLE system(30) set id = 2; ERROR: syntax error at or near TABLESAMPLE LINE 1: explain update test_tablesample TABLESAMPLE system(30) set i... postgres=# Delete from test_tablesample TABLESAMPLE system(30); ERROR: syntax error at or near TABLESAMPLE LINE 1: Delete from test_tablesample TABLESAMPLE system(30); Isn't TABLESAMPLE clause suppose to work with Update/Delete statements? It's supported in the FROM part of UPDATE and USING part of DELETE. I think that that's sufficient. But I think the Update on target table with sample scan is supported via views which doesn't seem to be the right thing in case you just want to support it via FROM/USING, example postgres=# create view vw_test As select * from test_tablesample TABLESAMPLE sys tem(30); postgres=# explain update vw_test set id = 4; QUERY PLAN --- Update on test_tablesample (cost=0.00..4.04 rows=4 width=210) - Sample Scan on test_tablesample (cost=0.00..4.04 rows=4 width=210) (2 rows) Right, I'll make those views not auto-updatable. Standard is somewhat useless for UPDATE and DELETE as it only defines quite limited syntax there. From what I've seen when doing research MSSQL also only supports it in their equivalent of FROM/USING list, Oracle does not seem to support their SAMPLING clause outside of SELECTs at all and if I got the cryptic DB2 manual correctly I think they don't support it outside of (sub)SELECTs either. By the way, what is the usecase to support sample scan in Update or Delete statement? Well for the USING/FROM part the use-case is same as for SELECT - providing sample of the data for the query (it can be useful also for getting pseudo random rows fast). And if we didn't support it, it could still be done using sub-select so why not have it directly. Also, isn't it better to mention in the docs for Update and Delete incase we are going to support tablesample clause for them? Most of other clauses that we support in FROM are not mentioned in UPDATE/DELETE docs, both of those commands just say something like refer to the SELECT FROM docs for more info. Do you think TABLESAMPLE deserves special treatment in this regard? 7. ParseTableSample() { .. arg = coerce_to_specific_type(pstate, arg, INT4OID, REPEATABLE); .. } What is the reason for coercing value of REPEATABLE clause to INT4OID when numeric value is expected for the clause. If user gives the input value as -2.3, it will generate a seed which doesn't seem to be right. Because the REPEATABLE is numeric expression so it can produce whatever number but we need int4 internally (well float4 would also work just the code would be slightly uglier). Okay, I understand that part. Here the real point is why not just expose it as int4 to user rather than telling in docs that it is numeric and actually we neither expect nor use it as numberic. Even Oracle supports supports it as int with below description. The seed_value must be an integer between 0 and 4294967295 Well the thing with SQL Standard's numeric value expression is that it actually does not mean numeric data type, it's just simple arithmetic expression with some given rules (by the standard), but the output data type can be either implementation
Re: [HACKERS] TABLESAMPLE patch
On 06/04/15 15:07, Amit Kapila wrote: On Mon, Apr 6, 2015 at 5:56 PM, Petr Jelinek p...@2ndquadrant.com mailto:p...@2ndquadrant.com wrote: On 06/04/15 12:33, Amit Kapila wrote: But I think the Update on target table with sample scan is supported via views which doesn't seem to be the right thing in case you just want to support it via FROM/USING, example postgres=# create view vw_test As select * from test_tablesample TABLESAMPLE sys tem(30); postgres=# explain update vw_test set id = 4; QUERY PLAN --- Update on test_tablesample (cost=0.00..4.04 rows=4 width=210) - Sample Scan on test_tablesample (cost=0.00..4.04 rows=4 width=210) (2 rows) Right, I'll make those views not auto-updatable. Standard is somewhat useless for UPDATE and DELETE as it only defines quite limited syntax there. From what I've seen when doing research MSSQL also only supports it in their equivalent of FROM/USING list, Oracle does not seem to support their SAMPLING clause outside of SELECTs at all and if I got the cryptic DB2 manual correctly I think they don't support it outside of (sub)SELECTs either. By the way, what is the usecase to support sample scan in Update or Delete statement? Well for the USING/FROM part the use-case is same as for SELECT - providing sample of the data for the query (it can be useful also for getting pseudo random rows fast). And if we didn't support it, it could still be done using sub-select so why not have it directly. I can understand why someone wants to read sample data via SELECT, but not clearly able to understand, why some one wants to Update or Delete random data in table and if there is a valid case, then why just based on sub-selects used in where clause or table reference in FROM/USING list. Can't we keep it simple such that either we support to Update/Delete based on Tablesample clause or prohibit it in all cases? Well, I don't understand why would somebody do it either, but then again during research of this feature I've found questions on stack overflow and similar sites about how to do it, so people must have use-cases. And in any case as you say sub-select would work there so there is no reason to explicitly disable it. Plus there is already difference between what can be the target table in DELETE/UPDATE versus what can be in the FROM/USING clause and I think the TABLESAMPLE behavior follows that separation nicely - it's well demonstrated by the fact that we would have to add explicit exception to some places in code to disallow it. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch
On Mon, Apr 6, 2015 at 5:56 PM, Petr Jelinek p...@2ndquadrant.com wrote: On 06/04/15 12:33, Amit Kapila wrote: But I think the Update on target table with sample scan is supported via views which doesn't seem to be the right thing in case you just want to support it via FROM/USING, example postgres=# create view vw_test As select * from test_tablesample TABLESAMPLE sys tem(30); postgres=# explain update vw_test set id = 4; QUERY PLAN --- Update on test_tablesample (cost=0.00..4.04 rows=4 width=210) - Sample Scan on test_tablesample (cost=0.00..4.04 rows=4 width=210) (2 rows) Right, I'll make those views not auto-updatable. Standard is somewhat useless for UPDATE and DELETE as it only defines quite limited syntax there. From what I've seen when doing research MSSQL also only supports it in their equivalent of FROM/USING list, Oracle does not seem to support their SAMPLING clause outside of SELECTs at all and if I got the cryptic DB2 manual correctly I think they don't support it outside of (sub)SELECTs either. By the way, what is the usecase to support sample scan in Update or Delete statement? Well for the USING/FROM part the use-case is same as for SELECT - providing sample of the data for the query (it can be useful also for getting pseudo random rows fast). And if we didn't support it, it could still be done using sub-select so why not have it directly. I can understand why someone wants to read sample data via SELECT, but not clearly able to understand, why some one wants to Update or Delete random data in table and if there is a valid case, then why just based on sub-selects used in where clause or table reference in FROM/USING list. Can't we keep it simple such that either we support to Update/Delete based on Tablesample clause or prohibit it in all cases? Also, isn't it better to mention in the docs for Update and Delete incase we are going to support tablesample clause for them? Most of other clauses that we support in FROM are not mentioned in UPDATE/DELETE docs, both of those commands just say something like refer to the SELECT FROM docs for more info. Do you think TABLESAMPLE deserves special treatment in this regard? Nothing too important, just as I got confused while using, someone else can also get confused, but I think we can leave it. And we do this type of coercion even for table data (you can insert -2.3 into integer column and it will work) so I don't see what's wrong with it here. I am not sure we can compare it with column of a table. I think we can support it within a valid range (similar to tablesample method) and if user inputs value outside the range, then return error. But that's not what standard says, it says any numeric value expression is valid. The fact that Oracle limits it to some range should not make us do the same. I think most important thing here is that using -2.3 will produce same results if called repeatedly (if there are no changes to data, vacuum etc). Yes passing -2 will produce same results, I don't know if that is a problem. The main reason why I have the coercion there is so that users don't have to explicitly typecast expression results. Actually, not a big point, but I felt it will be clear if there is a valid range and actually we are not doing anything with negative (part) of seed input by the user. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] TABLESAMPLE patch
On 06/04/15 11:02, Simon Riggs wrote: On 2 April 2015 at 17:36, Petr Jelinek p...@2ndquadrant.com wrote: so here is version 11. Looks great. Comment on docs: The SELECT docs refer only to SYSTEM and BERNOULLI. It doesn't mention that if other methods are available they could be used also. The phrasing was sampling method can be one of list. Will reword. Are we ready for a final detailed review and commit? I plan to send v12 in the evening with some additional changes that came up from Amit's comments + some improvements to error reporting. I think it will be ready then. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch
On Fri, Apr 3, 2015 at 3:06 AM, Petr Jelinek p...@2ndquadrant.com wrote: Hi, so here is version 11. Thanks, patch looks much better, but I think still few more things needs to discussed/fixed. 1. +tablesample_clause: + TABLESAMPLE ColId '(' func_arg_list ')' opt_repeatable_clause Why do you want to allow func_arg_list? Basically if user tries to pass multiple arguments in TABLESAMPLE method's clause like (10,20), then I think that should be caught in grammer level as an error. SQL - 2003 specs says that argument to REPAEATABLE and TABLESAMPLE method is same numeric value expr It seems to me that you want to allow it to make it extendable to user defined Tablesample methods, but not sure if it is right to use func_arg_list for the same because sample method arguments can have different definition than function arguments. Now even if we want to keep sample method arguments same as function arguments that sounds like a separate discussion. In general, I feel you already have good basic infrastructure for supportting other sample methods, but it is better to keep the new DDL's for doing the same as a separate patch than this patch, as that way we can reduce the scope of this patch, OTOH if you or others feel that it is mandatory to have new DDL's support for other tablesample methods then we have to deal with this now itself. 2. postgres=# explain update test_tablesample TABLESAMPLE system(30) set id = 2; ERROR: syntax error at or near TABLESAMPLE LINE 1: explain update test_tablesample TABLESAMPLE system(30) set i... postgres=# Delete from test_tablesample TABLESAMPLE system(30); ERROR: syntax error at or near TABLESAMPLE LINE 1: Delete from test_tablesample TABLESAMPLE system(30); Isn't TABLESAMPLE clause suppose to work with Update/Delete statements? 3. +static RangeTblEntry * +transformTableSampleEntry(ParseState *pstate, RangeTableSample *r) .. + parser_errposition(pstate, + exprLocation((Node *) r; Better to align exprLocation() with pstate 4. SampleTupleVisible() { .. else { /* No pagemode, we have to check the tuple itself. */ Snapshot snapshot = scan-rs_snapshot; Buffer buffer = scan-rs_cbuf; bool visible = HeapTupleSatisfiesVisibility(tuple, snapshot, buffer); .. } I think it is better to check if PageIsAllVisible() in above code before visibility check as that can avoid visibility checks. 5. ParseTableSample(ParseState *pstate, char *samplemethod, Node *repeatable, List *sampleargs) { .. if (con-val.type == T_Null) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg(REPEATABLE clause must be NOT NULL numeric value), parser_errposition (pstate, con-location))); .. } InitSamplingMethod(SampleScanState *scanstate, TableSampleClause *tablesample) { .. if (fcinfo.argnull[1]) ereport(ERROR, (errcode (ERRCODE_NULL_VALUE_NOT_ALLOWED), errmsg(REPEATABLE clause cannot be NULL))); .. } I think it would be better if we can have same error message and probably the same error code in both of the above cases. 6. extern TableSampleClause * ParseTableSample(ParseState *pstate, char *samplemethod, Node *repeatable, List *sampleargs) It is better to expose function (usage of extern) via header file. Is there a need to mention extern here? 7. ParseTableSample() { .. arg = coerce_to_specific_type(pstate, arg, INT4OID, REPEATABLE); .. } What is the reason for coercing value of REPEATABLE clause to INT4OID when numeric value is expected for the clause. If user gives the input value as -2.3, it will generate a seed which doesn't seem to be right. 8. +DATA(insert OID = 3295 ( tsm_system_init PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 2281 23 700 _null_ _null_ _null_ _null_ tsm_system_init _null_ _null_ _null_ )); +DATA(insert OID = 3301 ( tsm_bernoulli_init PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 2281 23 700 _null_ _null_ _null_ _null_ tsm_bernoulli_init _null_ _null_ _null_ )); Datatype for second argument is kept as 700 (Float4), shouldn't it be 1700 (Numeric)? 9. postgres=# explain SELECT t1.id FROM test_tablesample as t1 TABLESAMPLE SYSTEM ( 10), test_tablesample as t2 TABLESAMPLE BERNOULLI (20); QUERY PLAN Nested Loop (cost=0.00..4.05 rows=100 width=4) - Sample Scan on test_tablesample t1 (cost=0.00..0.01 rows=1 width=4) - Sample Scan on test_tablesample t2 (cost=0.00..4.02 rows=2 width=0) (3 rows) Isn't it better to display sample method name in explain. I think it can help in case of join queries. We can use below format to display: Sample Scan (System) on test_tablesample ... 10. Bernoulli.c +/* State */ +typedef struct +{ + uint32 seed; /* random seed */ + BlockNumber startblock; /* starting block, we use ths for syncscan support */ typo. /ths/this With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] TABLESAMPLE patch
On 04/04/15 14:57, Amit Kapila wrote: 1. +tablesample_clause: +TABLESAMPLE ColId '(' func_arg_list ')' opt_repeatable_clause Why do you want to allow func_arg_list? Basically if user tries to pass multiple arguments in TABLESAMPLE method's clause like (10,20), then I think that should be caught in grammer level as an error. It will be reported as error during parse analysis if the TABLESAMPLE method does not accept two parameters, same as when the expression used wrong type for example. SQL - 2003 specs says that argument to REPAEATABLE and TABLESAMPLE method is same numeric value expr It seems to me that you want to allow it to make it extendable to user defined Tablesample methods, but not sure if it is right to use func_arg_list for the same because sample method arguments can have different definition than function arguments. Now even if we want to keep sample method arguments same as function arguments that sounds like a separate discussion. Yes I want extensibility here. And I think the tablesample method arguments are same thing as function arguments given that in the end they are arguments for the init function of tablesampling method. I would be ok with just expr_list, naming parameters here isn't overly important, but ability to have different types and numbers of parameters for custom TABLESAMPLE methods *is* important. In general, I feel you already have good basic infrastructure for supportting other sample methods, but it is better to keep the new DDL's for doing the same as a separate patch than this patch, as that way we can reduce the scope of this patch, OTOH if you or others feel that it is mandatory to have new DDL's support for other tablesample methods then we have to deal with this now itself. Well I did attach it as separate diff mainly for that reason. I agree that DDL does not have to be committed immediately with the rest of the patch (although it's the simplest part of the patch IMHO). 2. postgres=# explain update test_tablesample TABLESAMPLE system(30) set id = 2; ERROR: syntax error at or near TABLESAMPLE LINE 1: explain update test_tablesample TABLESAMPLE system(30) set i... postgres=# Delete from test_tablesample TABLESAMPLE system(30); ERROR: syntax error at or near TABLESAMPLE LINE 1: Delete from test_tablesample TABLESAMPLE system(30); Isn't TABLESAMPLE clause suppose to work with Update/Delete statements? It's supported in the FROM part of UPDATE and USING part of DELETE. I think that that's sufficient. Standard is somewhat useless for UPDATE and DELETE as it only defines quite limited syntax there. From what I've seen when doing research MSSQL also only supports it in their equivalent of FROM/USING list, Oracle does not seem to support their SAMPLING clause outside of SELECTs at all and if I got the cryptic DB2 manual correctly I think they don't support it outside of (sub)SELECTs either. 4. SampleTupleVisible() { .. else { /* No pagemode, we have to check the tuple itself. */ Snapshot snapshot = scan-rs_snapshot; Bufferbuffer = scan-rs_cbuf; bool visible = HeapTupleSatisfiesVisibility(tuple, snapshot, buffer); .. } I think it is better to check if PageIsAllVisible() in above code before visibility check as that can avoid visibility checks. It's probably even better to do that one level up in the samplenexttup() and only call the SampleTupleVisible if page is not allvisible (PageIsAllVisible() is cheap). 6. extern TableSampleClause * ParseTableSample(ParseState *pstate, char *samplemethod, Node *repeatable, List *sampleargs) It is better to expose function (usage of extern) via header file. Is there a need to mention extern here? Eh, stupid copy-paste error when copying function name from header to actual source file. 7. ParseTableSample() { .. arg = coerce_to_specific_type(pstate, arg, INT4OID, REPEATABLE); .. } What is the reason for coercing value of REPEATABLE clause to INT4OID when numeric value is expected for the clause. If user gives the input value as -2.3, it will generate a seed which doesn't seem to be right. Because the REPEATABLE is numeric expression so it can produce whatever number but we need int4 internally (well float4 would also work just the code would be slightly uglier). And we do this type of coercion even for table data (you can insert -2.3 into integer column and it will work) so I don't see what's wrong with it here. 8. +DATA(insert OID = 3295 ( tsm_system_initPGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 2281 23 700 _null_ _null_ _null_ _null_tsm_system_init _null_ _null_ _null_ )); +DATA(insert OID = 3301 ( tsm_bernoulli_initPGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 2281 23 700 _null_ _null_ _null_ _null_tsm_bernoulli_init _null_ _null_ _null_ )); Datatype for second argument is kept as 700 (Float4), shouldn't it be 1700 (Numeric)? Why is that? Given the sampling error I think the float4 is enough for specifying the percentage and it makes the
Re: [HACKERS] TABLESAMPLE patch
On 03/15/15 16:21, Petr Jelinek wrote: I also did all the other adjustments we talked about up-thread and rebased against current master (there was conflict with 31eae6028). Hi, I did a review of the version submitted on 03/15 today, and only found a few minor issues: 1) The documentation of the pg_tablesample_method catalog is missing documentation of the 'tsmpagemode' column, which was added later. 2) transformTableEntry() in parse_clause modifies the comment, in a way that made sense before part of the code was moved to a separate function. I suggest to revert the comment changes, and instead add the comment to transformTableSampleEntry() 3) The shared parts of the block sampler in sampling.c (e.g. in BlockSampler_Next) reference Vitter's algorithm (both the code and comments) which is a bit awkward as the only part that uses it is analyze.c. The other samplers using this code (system / bernoulli) don't use Vitter's algorithm. I don't think it's possible to separate this piece of code, though. It simply has to be in there somewhere, so I'd recommend adding here a simple comment explaining that it's needed because of analyze.c. Otherwise I think the patch is OK. I'll wait for Petr to fix these issues, and then mark it as ready for committer. What do you think, Amit? (BTW you should probably add yourself as reviewer in the CF app, as you've provided a lot of feedback here.) -- Tomas Vondra http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch
On Wed, Apr 1, 2015 at 6:31 PM, Tomas Vondra tomas.von...@2ndquadrant.com wrote: On 03/15/15 16:21, Petr Jelinek wrote: I also did all the other adjustments we talked about up-thread and rebased against current master (there was conflict with 31eae6028). Hi, I did a review of the version submitted on 03/15 today, and only found a few minor issues: 1) The documentation of the pg_tablesample_method catalog is missing documentation of the 'tsmpagemode' column, which was added later. 2) transformTableEntry() in parse_clause modifies the comment, in a way that made sense before part of the code was moved to a separate function. I suggest to revert the comment changes, and instead add the comment to transformTableSampleEntry() 3) The shared parts of the block sampler in sampling.c (e.g. in BlockSampler_Next) reference Vitter's algorithm (both the code and comments) which is a bit awkward as the only part that uses it is analyze.c. The other samplers using this code (system / bernoulli) don't use Vitter's algorithm. I don't think it's possible to separate this piece of code, though. It simply has to be in there somewhere, so I'd recommend adding here a simple comment explaining that it's needed because of analyze.c. Otherwise I think the patch is OK. I'll wait for Petr to fix these issues, and then mark it as ready for committer. What do you think, Amit? (BTW you should probably add yourself as reviewer in the CF app, as you've provided a lot of feedback here.) I am still not sure whether it is okay to move REPEATABLE from unreserved to other category. In-fact last weekend I have spent some time to see the exact reason for shift/reduce errors and tried some ways but didn't find a way to get away with the same. Now I am planning to spend some more time on the same probably in next few days and then still if I cannot find a way, I will share my findings and then once re-review the changes made by Petr in last version. I think overall the patch is in good shape now although I haven't looked into DDL support part of the patch which I thought could be done in a separate patch as well. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] TABLESAMPLE patch
On 01/04/15 17:52, Robert Haas wrote: On Wed, Apr 1, 2015 at 9:49 AM, Amit Kapila amit.kapil...@gmail.com wrote: I am still not sure whether it is okay to move REPEATABLE from unreserved to other category. In-fact last weekend I have spent some time to see the exact reason for shift/reduce errors and tried some ways but didn't find a way to get away with the same. Now I am planning to spend some more time on the same probably in next few days and then still if I cannot find a way, I will share my findings and then once re-review the changes made by Petr in last version. I think overall the patch is in good shape now although I haven't looked into DDL support part of the patch which I thought could be done in a separate patch as well. That seems like a legitimate concern. We usually try not to make keywords more reserved in PostgreSQL than they are in the SQL standard, and REPEATABLE is apparently non-reserved there: http://www.postgresql.org/docs/devel/static/sql-keywords-appendix.html This also makes method an unreserved keyword, which I'm not wild about either. Adding new keyword doesn't cost *much*, but is this SQL-mandated syntax or something we created? If the latter, can we find something to call it that doesn't require new keywords? REPEATABLE is mandated by standard. I did try for quite some time to make it unreserved but was not successful (I can only make it unreserved if I make it mandatory but that's not a solution). I haven't been in fact even able to find out what it actually conflicts with... METHOD is something I added. I guess we could find a way to name this differently if we really tried. The reason why I picked METHOD was that I already added the same unreserved keyword in the sequence AMs patch and in that one any other name does not really make sense. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch
On Wed, Apr 1, 2015 at 9:49 AM, Amit Kapila amit.kapil...@gmail.com wrote: I am still not sure whether it is okay to move REPEATABLE from unreserved to other category. In-fact last weekend I have spent some time to see the exact reason for shift/reduce errors and tried some ways but didn't find a way to get away with the same. Now I am planning to spend some more time on the same probably in next few days and then still if I cannot find a way, I will share my findings and then once re-review the changes made by Petr in last version. I think overall the patch is in good shape now although I haven't looked into DDL support part of the patch which I thought could be done in a separate patch as well. That seems like a legitimate concern. We usually try not to make keywords more reserved in PostgreSQL than they are in the SQL standard, and REPEATABLE is apparently non-reserved there: http://www.postgresql.org/docs/devel/static/sql-keywords-appendix.html This also makes method an unreserved keyword, which I'm not wild about either. Adding new keyword doesn't cost *much*, but is this SQL-mandated syntax or something we created? If the latter, can we find something to call it that doesn't require new keywords? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch
On 01/04/15 18:38, Robert Haas wrote: On Wed, Apr 1, 2015 at 12:15 PM, Petr Jelinek p...@2ndquadrant.com wrote: REPEATABLE is mandated by standard. I did try for quite some time to make it unreserved but was not successful (I can only make it unreserved if I make it mandatory but that's not a solution). I haven't been in fact even able to find out what it actually conflicts with... Yeah, that can be hard to figure out. Did you run bison with -v and poke around in gram.output? Oh, no I didn't (I didn't know gram.output will be generated). This helped quite a bit. Thanks. I now found the reason, it's conflicting with alias but that's my mistake - alias should be before TABLESAMPLE clause as per standard and I put it after in parser. Now that I put it at correct place REPEATABLE can be unreserved keyword. This change requires making TABLESAMPLE a type_func_name_keyword but that's probably not really an issue as TABLESAMPLE is reserved keyword per standard. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch
On Wed, Apr 1, 2015 at 12:15 PM, Petr Jelinek p...@2ndquadrant.com wrote: REPEATABLE is mandated by standard. I did try for quite some time to make it unreserved but was not successful (I can only make it unreserved if I make it mandatory but that's not a solution). I haven't been in fact even able to find out what it actually conflicts with... Yeah, that can be hard to figure out. Did you run bison with -v and poke around in gram.output? METHOD is something I added. I guess we could find a way to name this differently if we really tried. The reason why I picked METHOD was that I already added the same unreserved keyword in the sequence AMs patch and in that one any other name does not really make sense. I see. Well, I guess if we've got more than one use for it it's not so bad. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch
On 10/03/15 10:54, Amit Kapila wrote: On Tue, Mar 10, 2015 at 3:03 PM, Petr Jelinek p...@2ndquadrant.com mailto:p...@2ndquadrant.com wrote: Ok now I think I finally understand what you are suggesting - you are saying let's go over whole page while tsmnexttuple returns something, and do the visibility check and other stuff in that code block under the buffer lock and cache the resulting valid tuples in some array and then return those tuples one by one from that cache? Yes, this is what I am suggesting. And if the caller will try to do it in one step and cache the visibility info then we'll end up with pretty much same structure as rs_vistuples - there isn't saner way to cache this info other than ordered vector of tuple offsets, unless we assume that most pages have close to MaxOffsetNumber of tuples which they don't, so why not just use the heapgetpage directly and do the binary search over rs_vistuples. The downside of doing it via heapgetpage is that it will do visibility test for tuples which we might not even need (I think we should do visibility test for tuples retrurned by tsmnexttuple). Well, heapgetpage can either read visibility data for whole page or not, depending on if we want pagemode reading or not. So we can use the pagemode for sampling methods where it's feasible (like system) and not use pagemode where it's not (like bernoulli) and then either use the rs_vistuples or call HeapTupleSatisfiesVisibility individually again depending if the method is using pagemode or not. Yeah, but as mentioned above, this has some downside, but go for it only if you feel that above suggestion is making code complex, which I think should not be the case as we are doing something similar in acquire_sample_rows(). I think your suggestion is actually simpler code wise, I am just somewhat worried by the fact that no other scan node uses that kind of caching and there is probably reason for that. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch
On 10/03/15 04:43, Amit Kapila wrote: On Mon, Mar 9, 2015 at 3:08 PM, Petr Jelinek p...@2ndquadrant.com mailto:p...@2ndquadrant.com wrote: On 09/03/15 04:51, Amit Kapila wrote: On Sat, Mar 7, 2015 at 10:37 PM, Petr Jelinek p...@2ndquadrant.com mailto:p...@2ndquadrant.com Double checking for tuple visibility is the only downside I can think of. That will happen if we use heapgetpage and the way currently code is written in patch, however we can easily avoid double checking if we don't call heapgetpage and rather do the required work at caller's place. What's the point of pagemode then if the caller code does the visibility checks still one by one on each call. I thought one of the points of pagemode was to do this in one step (and one buffer lock). You only need one buffer lock for doing at caller's location as well something like we do in acquire_sample_rows(). Ok now I think I finally understand what you are suggesting - you are saying let's go over whole page while tsmnexttuple returns something, and do the visibility check and other stuff in that code block under the buffer lock and cache the resulting valid tuples in some array and then return those tuples one by one from that cache? And if the caller will try to do it in one step and cache the visibility info then we'll end up with pretty much same structure as rs_vistuples - there isn't saner way to cache this info other than ordered vector of tuple offsets, unless we assume that most pages have close to MaxOffsetNumber of tuples which they don't, so why not just use the heapgetpage directly and do the binary search over rs_vistuples. The downside of doing it via heapgetpage is that it will do visibility test for tuples which we might not even need (I think we should do visibility test for tuples retrurned by tsmnexttuple). Well, heapgetpage can either read visibility data for whole page or not, depending on if we want pagemode reading or not. So we can use the pagemode for sampling methods where it's feasible (like system) and not use pagemode where it's not (like bernoulli) and then either use the rs_vistuples or call HeapTupleSatisfiesVisibility individually again depending if the method is using pagemode or not. This is how sequential scan does it afaik. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch
On Tue, Mar 10, 2015 at 3:03 PM, Petr Jelinek p...@2ndquadrant.com wrote: On 10/03/15 04:43, Amit Kapila wrote: On Mon, Mar 9, 2015 at 3:08 PM, Petr Jelinek p...@2ndquadrant.com mailto:p...@2ndquadrant.com wrote: On 09/03/15 04:51, Amit Kapila wrote: On Sat, Mar 7, 2015 at 10:37 PM, Petr Jelinek p...@2ndquadrant.com mailto:p...@2ndquadrant.com Double checking for tuple visibility is the only downside I can think of. That will happen if we use heapgetpage and the way currently code is written in patch, however we can easily avoid double checking if we don't call heapgetpage and rather do the required work at caller's place. What's the point of pagemode then if the caller code does the visibility checks still one by one on each call. I thought one of the points of pagemode was to do this in one step (and one buffer lock). You only need one buffer lock for doing at caller's location as well something like we do in acquire_sample_rows(). Ok now I think I finally understand what you are suggesting - you are saying let's go over whole page while tsmnexttuple returns something, and do the visibility check and other stuff in that code block under the buffer lock and cache the resulting valid tuples in some array and then return those tuples one by one from that cache? Yes, this is what I am suggesting. And if the caller will try to do it in one step and cache the visibility info then we'll end up with pretty much same structure as rs_vistuples - there isn't saner way to cache this info other than ordered vector of tuple offsets, unless we assume that most pages have close to MaxOffsetNumber of tuples which they don't, so why not just use the heapgetpage directly and do the binary search over rs_vistuples. The downside of doing it via heapgetpage is that it will do visibility test for tuples which we might not even need (I think we should do visibility test for tuples retrurned by tsmnexttuple). Well, heapgetpage can either read visibility data for whole page or not, depending on if we want pagemode reading or not. So we can use the pagemode for sampling methods where it's feasible (like system) and not use pagemode where it's not (like bernoulli) and then either use the rs_vistuples or call HeapTupleSatisfiesVisibility individually again depending if the method is using pagemode or not. Yeah, but as mentioned above, this has some downside, but go for it only if you feel that above suggestion is making code complex, which I think should not be the case as we are doing something similar in acquire_sample_rows(). With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] TABLESAMPLE patch
On Mon, Mar 9, 2015 at 3:08 PM, Petr Jelinek p...@2ndquadrant.com wrote: On 09/03/15 04:51, Amit Kapila wrote: On Sat, Mar 7, 2015 at 10:37 PM, Petr Jelinek p...@2ndquadrant.com Double checking for tuple visibility is the only downside I can think of. That will happen if we use heapgetpage and the way currently code is written in patch, however we can easily avoid double checking if we don't call heapgetpage and rather do the required work at caller's place. What's the point of pagemode then if the caller code does the visibility checks still one by one on each call. I thought one of the points of pagemode was to do this in one step (and one buffer lock). You only need one buffer lock for doing at caller's location as well something like we do in acquire_sample_rows(). And if the caller will try to do it in one step and cache the visibility info then we'll end up with pretty much same structure as rs_vistuples - there isn't saner way to cache this info other than ordered vector of tuple offsets, unless we assume that most pages have close to MaxOffsetNumber of tuples which they don't, so why not just use the heapgetpage directly and do the binary search over rs_vistuples. The downside of doing it via heapgetpage is that it will do visibility test for tuples which we might not even need (I think we should do visibility test for tuples retrurned by tsmnexttuple). With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] TABLESAMPLE patch
On 09/03/15 04:51, Amit Kapila wrote: On Sat, Mar 7, 2015 at 10:37 PM, Petr Jelinek p...@2ndquadrant.com mailto:p...@2ndquadrant.com wrote: On 05/03/15 09:21, Amit Kapila wrote: On Tue, Feb 17, 2015 at 3:29 AM, Petr Jelinek p...@2ndquadrant.com mailto:p...@2ndquadrant.com mailto:p...@2ndquadrant.com mailto:p...@2ndquadrant.com wrote: I didn't add the whole page visibility caching as the tuple ids we get from sampling methods don't map well to the visibility info we get from heapgetpage (it maps to the values in the rs_vistuples array not to to its indexes). Commented about it in code also. I think we should set pagemode for system sampling as it can have dual benefit, one is it will allow us caching tuples and other is it can allow us pruning of page which is done in heapgetpage(). Do you see any downside to it? Double checking for tuple visibility is the only downside I can think of. That will happen if we use heapgetpage and the way currently code is written in patch, however we can easily avoid double checking if we don't call heapgetpage and rather do the required work at caller's place. What's the point of pagemode then if the caller code does the visibility checks still one by one on each call. I thought one of the points of pagemode was to do this in one step (and one buffer lock). And if the caller will try to do it in one step and cache the visibility info then we'll end up with pretty much same structure as rs_vistuples - there isn't saner way to cache this info other than ordered vector of tuple offsets, unless we assume that most pages have close to MaxOffsetNumber of tuples which they don't, so why not just use the heapgetpage directly and do the binary search over rs_vistuples. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch
On Sat, Mar 7, 2015 at 10:37 PM, Petr Jelinek p...@2ndquadrant.com wrote: On 05/03/15 09:21, Amit Kapila wrote: On Tue, Feb 17, 2015 at 3:29 AM, Petr Jelinek p...@2ndquadrant.com mailto:p...@2ndquadrant.com wrote: I didn't add the whole page visibility caching as the tuple ids we get from sampling methods don't map well to the visibility info we get from heapgetpage (it maps to the values in the rs_vistuples array not to to its indexes). Commented about it in code also. I think we should set pagemode for system sampling as it can have dual benefit, one is it will allow us caching tuples and other is it can allow us pruning of page which is done in heapgetpage(). Do you see any downside to it? Double checking for tuple visibility is the only downside I can think of. That will happen if we use heapgetpage and the way currently code is written in patch, however we can easily avoid double checking if we don't call heapgetpage and rather do the required work at caller's place. Plus some added code complexity of course. I guess we could use binary search on rs_vistuples (it's already sorted) so that info won't be completely useless. Not sure if worth it compared to normal visibility check, but not hard to do. It seems to me that it is better to avoid locking/unlocking buffer wherever possible. I personally don't see the page pruning in TABLESAMPLE as all that important though, considering that in practice it will only scan tiny portion of a relation and usually one that does not get many updates (in practice the main use-case is BI). In that case, I think it should be acceptable either way, because if there are less updates then anyway it won't incur any cost of doing the pruning. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] TABLESAMPLE patch
On 05/03/15 09:21, Amit Kapila wrote: On Tue, Feb 17, 2015 at 3:29 AM, Petr Jelinek p...@2ndquadrant.com mailto:p...@2ndquadrant.com wrote: I didn't add the whole page visibility caching as the tuple ids we get from sampling methods don't map well to the visibility info we get from heapgetpage (it maps to the values in the rs_vistuples array not to to its indexes). Commented about it in code also. I think we should set pagemode for system sampling as it can have dual benefit, one is it will allow us caching tuples and other is it can allow us pruning of page which is done in heapgetpage(). Do you see any downside to it? Double checking for tuple visibility is the only downside I can think of. Plus some added code complexity of course. I guess we could use binary search on rs_vistuples (it's already sorted) so that info won't be completely useless. Not sure if worth it compared to normal visibility check, but not hard to do. I personally don't see the page pruning in TABLESAMPLE as all that important though, considering that in practice it will only scan tiny portion of a relation and usually one that does not get many updates (in practice the main use-case is BI). Few other comments: 1. Current patch fails to apply, minor change is required: patching file `src/backend/utils/misc/Makefile' Hunk #1 FAILED at 15. 1 out of 1 hunk FAILED -- saving rejects to src/backend/utils/misc/Makefile.rej Ah bitrot over time. 2. Few warnings in code (compiled on windows(msvc)) 1src\backend\utils\tablesample\bernoulli.c(217): warning C4305: '=' : truncation from 'double' to 'float4' 1src\backend\utils\tablesample\system.c(177): warning C4305: '=' : truncation from 'double' to 'float4' I think this is just compiler stupidity but hopefully fixed (I don't have msvc to check for it and other compilers I tried don't complain). 3. +static void +InitScanRelation(SampleScanState *node, EState *estate, int eflags, +TableSampleClause *tablesample) { .. +/* +* Page at a time mode is useless for us as we need to check visibility +* of tuples individually because tuple offsets returned by sampling +* methods map to rs_vistuples values and not to its indexes. +*/ +node-ss.ss_currentScanDesc-rs_pageatatime = false; .. } Modifying scandescriptor in nodeSamplescan.c looks slightly odd, Do we modify this way at anyother place? I think it is better to either teach heap_beginscan_* api's about it or expose it via new API in heapam.c Yeah I agree, I taught the heap_beginscan_strat about it as that one is the advanced API. 4. +Datum +tsm_system_cost(PG_FUNCTION_ARGS) { .. +*tuples = path-rows * samplesize; } It seems above calculation considers all rows in table are of equal size and hence multiplying directly with samplesize will give estimated number of rows for sample method, however if row size varies then this calculation might not give right results. I think if possible we should consider the possibility of rows with varying sizes else we can at least write a comment to indicate the possibility of improvement for future. I am not sure how we would know what size would the tuples have in the random blocks that we are going to read later. That said, I am sure that costing can be improved even if I don't know how myself. 6. @@ -13577,6 +13608,7 @@ reserved_keyword: | PLACING | PRIMARY | REFERENCES +| REPEATABLE Have you tried to investigate the reason why it is giving shift/reduce error for unreserved category and if we are not able to resolve that, then at least we can try to make it in some less restrictive category. I tried (on windows) by putting it in (type_func_name_keyword:) and it seems to be working. To investigate, you can try with information at below link: http://www.gnu.org/software/bison/manual/html_node/Understanding.html Basically I think we should try some more before concluding to change the category of REPEATABLE and especially if we want to make it a RESERVED keyword. Yes it can be moved to type_func_name_keyword which is not all that much better but at least something. I did try to play with this already during first submission but could not find a way to move it to something less restrictive. I could not even pinpoint what exactly is the shift/reduce issue except that it has something to do with the fact that the REPEATABLE clause is optional (at least I didn't have the problem when it wasn't optional). On a related note, it seems you have agreed upthread with Kyotaro-san for below change, but I don't see the same in latest patch: diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 4578b5e..8cf09d5 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -10590,7 +10590,7 @@ tablesample_clause: ; opt_repeatable_clause: - REPEATABLE '(' AexprConst ')' { $$ = (Node *) $3; } + REPEATABLE '(' a_expr ')' { $$ = (Node *) $3; }
Re: [HACKERS] TABLESAMPLE patch
On Tue, Feb 17, 2015 at 3:29 AM, Petr Jelinek p...@2ndquadrant.com wrote: I didn't add the whole page visibility caching as the tuple ids we get from sampling methods don't map well to the visibility info we get from heapgetpage (it maps to the values in the rs_vistuples array not to to its indexes). Commented about it in code also. I think we should set pagemode for system sampling as it can have dual benefit, one is it will allow us caching tuples and other is it can allow us pruning of page which is done in heapgetpage(). Do you see any downside to it? Few other comments: 1. Current patch fails to apply, minor change is required: patching file `src/backend/utils/misc/Makefile' Hunk #1 FAILED at 15. 1 out of 1 hunk FAILED -- saving rejects to src/backend/utils/misc/Makefile.rej 2. Few warnings in code (compiled on windows(msvc)) 1src\backend\utils\tablesample\bernoulli.c(217): warning C4305: '=' : truncation from 'double' to 'float4' 1src\backend\utils\tablesample\system.c(177): warning C4305: '=' : truncation from 'double' to 'float4' 3. +static void +InitScanRelation(SampleScanState *node, EState *estate, int eflags, + TableSampleClause *tablesample) { .. + /* + * Page at a time mode is useless for us as we need to check visibility + * of tuples individually because tuple offsets returned by sampling + * methods map to rs_vistuples values and not to its indexes. + */ + node-ss.ss_currentScanDesc-rs_pageatatime = false; .. } Modifying scandescriptor in nodeSamplescan.c looks slightly odd, Do we modify this way at anyother place? I think it is better to either teach heap_beginscan_* api's about it or expose it via new API in heapam.c 4. +Datum +tsm_system_cost(PG_FUNCTION_ARGS) { .. + *tuples = path-rows * samplesize; } It seems above calculation considers all rows in table are of equal size and hence multiplying directly with samplesize will give estimated number of rows for sample method, however if row size varies then this calculation might not give right results. I think if possible we should consider the possibility of rows with varying sizes else we can at least write a comment to indicate the possibility of improvement for future. 5. gram.y - /* * Given UPDATE foo set set ..., we have to decide without looking any Unrelated line removed. 6. @@ -13577,6 +13608,7 @@ reserved_keyword: | PLACING | PRIMARY | REFERENCES + | REPEATABLE Have you tried to investigate the reason why it is giving shift/reduce error for unreserved category and if we are not able to resolve that, then at least we can try to make it in some less restrictive category. I tried (on windows) by putting it in (type_func_name_keyword:) and it seems to be working. To investigate, you can try with information at below link: http://www.gnu.org/software/bison/manual/html_node/Understanding.html Basically I think we should try some more before concluding to change the category of REPEATABLE and especially if we want to make it a RESERVED keyword. On a related note, it seems you have agreed upthread with Kyotaro-san for below change, but I don't see the same in latest patch: diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 4578b5e..8cf09d5 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -10590,7 +10590,7 @@ tablesample_clause: ; opt_repeatable_clause: - REPEATABLE '(' AexprConst ')' { $$ = (Node *) $3; } + REPEATABLE '(' a_expr ')' { $$ = (Node *) $3; } | /*EMPTY*/ { $$ = NULL; } With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] TABLESAMPLE patch
Hi, On 22.2.2015 18:57, Petr Jelinek wrote: Tomas noticed that the patch is missing error check when TABLESAMPLE is used on view, so here is a new version that checks it's only used against table or matview. No other changes. Curious question - could/should this use page prefetch, similar to what bitmap heap scan does? I believe the answer is 'yes'. With SYSTEM that should be rather straightforward to implement, because it already works at page level, and it's likely to give significant performance speedup, similar to bitmap index scan: http://www.postgresql.org/message-id/CAHyXU0yiVvfQAnR9cyH=HWh1WbLRsioe=mzRJTHwtr=2azs...@mail.gmail.com With BERNOULLI that might be more complex to implement because of the page/tuple sampling, and the benefit is probably much lower than for SYSTEM because it's likely that at least one tuple will be sampled. I'm not saying it has to be done in this CF (or that it makes the patch uncommitable). For example, this seems like a very nice project for the GSoC (clear scope, not too large, ...). -- Tomas Vondrahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch
On Fri, Jan 23, 2015 at 5:39 AM, Petr Jelinek p...@2ndquadrant.com wrote: On 19/01/15 07:08, Amit Kapila wrote: On Sun, Jan 18, 2015 at 12:46 AM, Petr Jelinek p...@2ndquadrant.com mailto:p...@2ndquadrant.com wrote: I think that's actually good to have, because we still do costing and the partial index might help produce better estimate of number of returned rows for tablesample as well. I don't understand how will it help, because for tablesample scan it doesn't consider partial index at all as per patch. CREATE TABLE test_tablesample (id INT, name text) WITH (fillfactor=10); INSERT INTO test_tablesample SELECT i, repeat(i::text, 200) FROM generate_series(0, 9) s(i) ORDER BY i; INSERT INTO test_tablesample values(generate_series(10,1); create index idx_tblsample on test_tablesample(id) where id; Analyze test_tablesample; postgres=# explain SELECT id FROM test_tablesample where id ; QUERY PLAN --- Index Only Scan using idx_tblsample on test_tablesample (cost=0.13..8.14 rows= 1 width=4) Index Cond: (id ) (2 rows) postgres=# explain SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (80) wh ere id ; QUERY PLAN Sample Scan on test_tablesample (cost=0.00..658.00 rows=8000 width=4) Filter: (id ) (2 rows) The above result also clearly indicate that when TABLESAMPLE clause is used, it won't use partial index. Well similar, not same as we are not always fetching whole page or doing visibility checks on all tuples in the page, etc. But I don't see why it can't be inside nodeSamplescan. If you look at bitmap heap scan, that one is also essentially somewhat modified sequential scan and does everything within the node nodeBitmapHeapscan because the algorithm is not used anywhere else, just like sample scan. I don't mind doing everything in nodeSamplescan, however if you can split the function, it will be easier to read and understand, if you see in nodeBitmapHeapscan, that also has function like bitgetpage(). This is just a suggestion and if you think that it can be splitted, then it's okay, otherwise leave it as it is. Refer parameter (HeapScanDesc-rs_syncscan) and syncscan.c. Basically during sequiantial scan on same table by different backends, we attempt to keep them synchronized to reduce the I/O. Ah this, yes, it makes sense for bernoulli, not for system though. I guess it should be used for sampling methods that use SAS_SEQUENTIAL strategy. Have you taken care of this in your latest patch? I don't know how else to explain, if we loop over every tuple in the relation and return it with equal probability then visibility checks don't matter as the percentage of visible tuples will be same in the result as in the relation. For example if you have 30% visible tuples and you are interested in 10% of tuples overall it will return 10% of all tuples and since every tuple has 10% chance of being returned, 30% of those should be visible (if we assume smooth distribution of random numbers generated). So in the end you are getting 10% of visible tuples because the scan will obviously skip the invisible ones and that's what you wanted. As I said problem of analyze is that it uses tuple limit instead of probability. Okay, got the point. Yes the differences is smaller (in relative numbers) for bigger tables when I test this. On 1k row table the spread of 20 runs was between 770-818 and on 100k row table it's between 79868-80154. I think that's acceptable variance and it's imho indeed the random generator that produces this. Sure, I think this is acceptable. Oh and BTW when I delete 50k of tuples (make them invisible) the results of 20 runs are between 30880 and 40063 rows. This is between 60% to 80%, lower than what is expected, but I guess we can't do much for this except for trying with reverse order for visibility test and sample tuple call, you can decide if you want to try that once just to see if that is better. I am sure it could be done with sequence scan. Not sure if it would be pretty and more importantly, the TABLESAMPLE is *not* sequential scan. The fact that bernoulli method looks similar should not make us assume that every other method does as well, especially when system method is completely different. Okay, lets keep it as separate. Anyway, attached is new version with some updates that you mentioned (all_visible, etc). I also added optional interface for the sampling method to access the tuple contents as I can imagine sampling methods that will want to do that. + /* + * Let the sampling method examine the actual tuple and decide if we + * should return it. + * + * Note that we let it examine even invisible tuples. + */ + if
Re: [HACKERS] TABLESAMPLE patch
On 31/01/15 14:27, Amit Kapila wrote: On Fri, Jan 23, 2015 at 5:39 AM, Petr Jelinek p...@2ndquadrant.com mailto:p...@2ndquadrant.com wrote: On 19/01/15 07:08, Amit Kapila wrote: On Sun, Jan 18, 2015 at 12:46 AM, Petr Jelinek p...@2ndquadrant.com mailto:p...@2ndquadrant.com mailto:p...@2ndquadrant.com mailto:p...@2ndquadrant.com wrote: I think that's actually good to have, because we still do costing and the partial index might help produce better estimate of number of returned rows for tablesample as well. I don't understand how will it help, because for tablesample scan it doesn't consider partial index at all as per patch. Oh I think we were talking abut different things then, I thought you were talking about the index checks that planner/optimizer sometimes does to get more accurate statistics. I'll take another look then. Well similar, not same as we are not always fetching whole page or doing visibility checks on all tuples in the page, etc. But I don't see why it can't be inside nodeSamplescan. If you look at bitmap heap scan, that one is also essentially somewhat modified sequential scan and does everything within the node nodeBitmapHeapscan because the algorithm is not used anywhere else, just like sample scan. I don't mind doing everything in nodeSamplescan, however if you can split the function, it will be easier to read and understand, if you see in nodeBitmapHeapscan, that also has function like bitgetpage(). This is just a suggestion and if you think that it can be splitted, then it's okay, otherwise leave it as it is. Yeah I can split it to separate function within the nodeSamplescan, that sounds reasonable. Refer parameter (HeapScanDesc-rs_syncscan) and syncscan.c. Basically during sequiantial scan on same table by different backends, we attempt to keep them synchronized to reduce the I/O. Ah this, yes, it makes sense for bernoulli, not for system though. I guess it should be used for sampling methods that use SAS_SEQUENTIAL strategy. Have you taken care of this in your latest patch? Not yet. I think I will need to make the strategy property of the sampling method instead of returning it by costing function so that the info can be used by the scan. Oh and BTW when I delete 50k of tuples (make them invisible) the results of 20 runs are between 30880 and 40063 rows. This is between 60% to 80%, lower than what is expected, but I guess we can't do much for this except for trying with reverse order for visibility test and sample tuple call, you can decide if you want to try that once just to see if that is better. No, that's because I can't write properly, the lower number was supposed to be 39880 which is well within the tolerance, sorry for the confusion (9 and 0 are just too close). Anyway, attached is new version with some updates that you mentioned (all_visible, etc). I also added optional interface for the sampling method to access the tuple contents as I can imagine sampling methods that will want to do that. +/* +* Let the sampling method examine the actual tuple and decide if we +* should return it. +* +* Note that we let it examine even invisible tuples. +*/ +if (OidIsValid(node-tsmreturntuple.fn_oid)) +{ +found = DatumGetBool(FunctionCall4(node-tsmreturntuple, + PointerGetDatum (node), + UInt32GetDatum (blockno), + PointerGetDatum (tuple), + BoolGetDatum (visible))); +/* XXX: better error */ +if (found !visible) +elog(ERROR, Sampling method wanted to return invisible tuple); +} You have mentioned in comment that let it examine invisible tuple, but it is not clear why you want to allow examining invisible tuple and then later return error, why can't it skips invisible tuple. Well I think returning invisible tuples to user is bad idea and that's why the check, but I also think it might make sense to examine the invisible tuple for the sampling function in case it wants to create some kind of stats about the scan and wants to use those for making the decision about returning other tuples. The interface should be probably called tsmexaminetuple instead to make it more clear what the intention is. 1. How about statistics (pgstat_count_heap_getnext()) during SampleNext as we do in heap_getnext? Right, will add. 2. +static TupleTableSlot * +SampleNext(SampleScanState *node) +{ .. +/* +* Lock the buffer so we can safely assess tuple +* visibility later. +*/ +LockBuffer(buffer, BUFFER_LOCK_SHARE); .. } When is this content lock released, shouldn't we release it after checking visibility of tuple? Here, + if (!OffsetNumberIsValid(tupoffset)) + { + UnlockReleaseBuffer(buffer); but yes you are correct, it should be just released there and we can unlock already after visibility check. 3. +static TupleTableSlot * +SampleNext(SampleScanState
Re: [HACKERS] TABLESAMPLE patch
On Thu, Jan 29, 2015 at 11:08:55AM -0500, Robert Haas wrote: On Wed, Jan 28, 2015 at 5:19 AM, Petr Jelinek p...@2ndquadrant.com wrote: Yes, that's my view too. I would generally be for that change also and it would be worth it if the code was used in more than one place, but as it is it seems like it will just add code/complexity for no real benefit. It would make sense in case we used sequential scan node instead of the new node as Amit also suggested, but I remain unconvinced that mixing sampling and sequential scan into single scan node would be a good idea. Based on previous experience, I expect that any proposal to merge those nodes would get shot down by Tom with his laser-guided atomic bazooka faster than you can say -1 from me regards tom lane. Do we get illustrations with that? ;-) I want a poster for my wall! -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch
On Wed, Jan 28, 2015 at 5:19 AM, Petr Jelinek p...@2ndquadrant.com wrote: Yes, that's my view too. I would generally be for that change also and it would be worth it if the code was used in more than one place, but as it is it seems like it will just add code/complexity for no real benefit. It would make sense in case we used sequential scan node instead of the new node as Amit also suggested, but I remain unconvinced that mixing sampling and sequential scan into single scan node would be a good idea. Based on previous experience, I expect that any proposal to merge those nodes would get shot down by Tom with his laser-guided atomic bazooka faster than you can say -1 from me regards tom lane. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch
On 1/29/15 10:44 AM, Bruce Momjian wrote: On Thu, Jan 29, 2015 at 11:08:55AM -0500, Robert Haas wrote: On Wed, Jan 28, 2015 at 5:19 AM, Petr Jelinek p...@2ndquadrant.com wrote: Yes, that's my view too. I would generally be for that change also and it would be worth it if the code was used in more than one place, but as it is it seems like it will just add code/complexity for no real benefit. It would make sense in case we used sequential scan node instead of the new node as Amit also suggested, but I remain unconvinced that mixing sampling and sequential scan into single scan node would be a good idea. Based on previous experience, I expect that any proposal to merge those nodes would get shot down by Tom with his laser-guided atomic bazooka faster than you can say -1 from me regards tom lane. Do we get illustrations with that? ;-) I want a poster for my wall! +1. It should also be the tshirt for the next pgCon. ;) -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch
On 28/01/15 09:41, Kyotaro HORIGUCHI wrote: As an issue related to size esmation, I got a explain result as following, =# explain (analyze on, buffers on) select a from t1 tablesample system(10) where a 5; QUERY PLAN Sample Scan on t1 (cost=0.00..301.00 rows=1 width=4) (actual time=0.015..2 .920 rows=4294 loops=1) Filter: (a 5) Rows Removed by Filter: 5876 Buffers: shared hit=45 actual rows is large as twice as the estimated. tsm_system_cost estimates the number of the result rows using baserel-tuples, not using baserel-rows so it doesn't reflect the scan quals. Did the code come from some other intention? No, that's a bug. 4. SampleNext() a. Isn't it better to code SampleNext() similar to SeqNext() and IndexNext(), which encapsulate the logic to get the tuple in another function(tbs_next() or something like that)? Possibly, my thinking was that unlike the index_getnext() and heap_getnext(), this function would not be called from any other place so adding one more layer of abstraction didn't seem useful. And it would mean creating new ScanDesc struct, etc. I think adding additional abstraction would simplify the function and reduce the chance of discrepency, I think we need to almost create a duplicate version of code for heapgetpage() method. Yeah, I agree that we need to build structure like ScanDesc, but still it will be worth to keep code consistent. Well similar, not same as we are not always fetching whole page or doing visibility checks on all tuples in the page, etc. But I don't see why it can't be inside nodeSamplescan. If you look at bitmap heap scan, that one is also essentially somewhat modified sequential scan and does everything within the node nodeBitmapHeapscan because the algorithm is not used anywhere else, just like sample scan. As a general opinion, I'll vote for Amit's comment, since three or four similar instances seems to be a enough reason to abstract it. On the other hand, since the scan processes are distributed in ExecProcNode by the nodeTag of scan nodes, reunioning of the control by abstraction layer after that could be said to introducing useless annoyance. In short, bastraction at the level of *Next() would bring the necessity of additional changes around the execution node distribution. Yes, that's my view too. I would generally be for that change also and it would be worth it if the code was used in more than one place, but as it is it seems like it will just add code/complexity for no real benefit. It would make sense in case we used sequential scan node instead of the new node as Amit also suggested, but I remain unconvinced that mixing sampling and sequential scan into single scan node would be a good idea. How will it get smoothen for cases when let us say more than 50% of tuples are not visible. I think its due to Postgres non-overwritting storage architecture that we maintain multiple version of rows in same heap, otherwise for different kind of architecture (like mysql/oracle) where multiple row versions are not maintained in same heap, the same function (with same percentage) can return entirely different number of rows. I don't know how else to explain, if we loop over every tuple in the relation and return it with equal probability then visibility checks don't matter as the percentage of visible tuples will be same in the result as in the relation. Surely it migh yield the effectively the same result. Even so, this code needs exaplation comment about the nature aroud the code, or write code as MMVC people won't get confused, I suppose. Yes it does, but as I am failing to explain even here, it's not clear to me what to write there. From my point of view it's just effect of the essential property of bernoulli sampling which is that every element has equal probability of being included in the sample. It comes from the fact that we do bernoulli trial (in the code it's the while (sampler_random_fract(sampler-randstate) probability) loop) on every individual tuple. This means that the ratio of visible and invisible tuples should be same in the sample as it is in the relation. We then just skip the invisible tuples and get the correct percentage of the visible ones (this has performance benefit of not having to do visibility check on all tuples). If that wasn't true than the bernoulli sampling would just simply not work as intended as the same property is the reason why it's used in statistics - the trends should look the same in sample as they look in the source population. Obviously there is some variation in practice as we don't have perfect random generator but that's independent of the algorithm. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7
Re: [HACKERS] TABLESAMPLE patch
Hello, On 19/01/15 07:08, Amit Kapila wrote: On Sun, Jan 18, 2015 at 12:46 AM, Petr Jelinek p...@2ndquadrant.com mailto:p...@2ndquadrant.com wrote: No issues, but it seems we should check other paths where different handling could be required for tablesample scan. In set_rel_size(), it uses normal path for heapscan (set_plain_rel_size()) for rel size estimates where it checks the presence of partial indexes and that might effect the size estimates and that doesn't seem to be required when we have to perform scan based on TableSample method. I think that's actually good to have, because we still do costing and the partial index might help produce better estimate of number of returned rows for tablesample as well. As an issue related to size esmation, I got a explain result as following, =# explain (analyze on, buffers on) select a from t1 tablesample system(10) where a 5; QUERY PLAN Sample Scan on t1 (cost=0.00..301.00 rows=1 width=4) (actual time=0.015..2 .920 rows=4294 loops=1) Filter: (a 5) Rows Removed by Filter: 5876 Buffers: shared hit=45 actual rows is large as twice as the estimated. tsm_system_cost estimates the number of the result rows using baserel-tuples, not using baserel-rows so it doesn't reflect the scan quals. Did the code come from some other intention? 4. SampleNext() a. Isn't it better to code SampleNext() similar to SeqNext() and IndexNext(), which encapsulate the logic to get the tuple in another function(tbs_next() or something like that)? Possibly, my thinking was that unlike the index_getnext() and heap_getnext(), this function would not be called from any other place so adding one more layer of abstraction didn't seem useful. And it would mean creating new ScanDesc struct, etc. I think adding additional abstraction would simplify the function and reduce the chance of discrepency, I think we need to almost create a duplicate version of code for heapgetpage() method. Yeah, I agree that we need to build structure like ScanDesc, but still it will be worth to keep code consistent. Well similar, not same as we are not always fetching whole page or doing visibility checks on all tuples in the page, etc. But I don't see why it can't be inside nodeSamplescan. If you look at bitmap heap scan, that one is also essentially somewhat modified sequential scan and does everything within the node nodeBitmapHeapscan because the algorithm is not used anywhere else, just like sample scan. As a general opinion, I'll vote for Amit's comment, since three or four similar instances seems to be a enough reason to abstract it. On the other hand, since the scan processes are distributed in ExecProcNode by the nodeTag of scan nodes, reunioning of the control by abstraction layer after that could be said to introducing useless annoyance. In short, bastraction at the level of *Next() would bring the necessity of additional changes around the execution node distribution. Another one is PageIsAllVisible() check. Done. Another thing is don't we want to do anything for sync scans for these method's as they are doing heap scan? I don't follow this one tbh. Refer parameter (HeapScanDesc-rs_syncscan) and syncscan.c. Basically during sequiantial scan on same table by different backends, we attempt to keep them synchronized to reduce the I/O. Ah this, yes, it makes sense for bernoulli, not for system though. I guess it should be used for sampling methods that use SAS_SEQUENTIAL strategy. c. for bernoulli method, it will first get the tupoffset with the help of function and then apply visibility check, it seems that could reduce the sample set way lower than expected in case lot of tuples are not visible, shouldn't it be done in reverse way(first visibility check, then call function to see if we want to include it in the sample)? I think something similar is done in acquire_sample_rows which seems right to me. I don't think so, the way bernoulli works is that it returns every tuple with equal probability, so the visible tuples have same chance of being returned as the invisible ones so the issue should be smoothed away automatically (IMHO). How will it get smoothen for cases when let us say more than 50% of tuples are not visible. I think its due to Postgres non-overwritting storage architecture that we maintain multiple version of rows in same heap, otherwise for different kind of architecture (like mysql/oracle) where multiple row versions are not maintained in same heap, the same function (with same percentage) can return
Re: [HACKERS] TABLESAMPLE patch
On 28/01/15 08:23, Kyotaro HORIGUCHI wrote: Hi, I took a look on this and found nice. By the way, the parameter for REPEATABLE seems allowing to be a expression in ParseTableSample but the grammer rejects it. It wasn't my intention to support it, but you are correct, the code is generic enough that we can support it. The following change seems enough. Seems about right, thanks. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch
Hi, I took a look on this and found nice. By the way, the parameter for REPEATABLE seems allowing to be a expression in ParseTableSample but the grammer rejects it. The following change seems enough. diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 4578b5e..8cf09d5 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -10590,7 +10590,7 @@ tablesample_clause: ; opt_repeatable_clause: - REPEATABLE '(' AexprConst ')' { $$ = (Node *) $3; } + REPEATABLE '(' a_expr ')' { $$ = (Node *) $3; } | /*EMPTY*/ { $$ = NULL; } ; regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch
On Sun, Jan 18, 2015 at 12:46 AM, Petr Jelinek p...@2ndquadrant.com wrote: On 17/01/15 13:46, Amit Kapila wrote: 3. @@ -332,6 +334,11 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, /* Foreign table */ set_foreign_pathlist(root, rel, rte); } +else if (rte-tablesample != NULL) +{ +/* Build sample scan on relation */ +set_tablesample_rel_pathlist(root, rel, rte); +} Have you consider to have a different RTE for this? I assume you mean different RTEKind, yes I did, but the fact is that it's still a relation, just with different scan type and I didn't want to modify every piece of code which deals with RTE_RELATION to also deal with the new RTE for sample scan as it seems unnecessary. I am not strongly opinionated about this one though. No issues, but it seems we should check other paths where different handling could be required for tablesample scan. In set_rel_size(), it uses normal path for heapscan (set_plain_rel_size()) for rel size estimates where it checks the presence of partial indexes and that might effect the size estimates and that doesn't seem to be required when we have to perform scan based on TableSample method. I think we can once check other places where some separate handling for (rte-inh) is done to see if we need some different handing for Tablesample scan. 4. SampleNext() a. Isn't it better to code SampleNext() similar to SeqNext() and IndexNext(), which encapsulate the logic to get the tuple in another function(tbs_next() or something like that)? Possibly, my thinking was that unlike the index_getnext() and heap_getnext(), this function would not be called from any other place so adding one more layer of abstraction didn't seem useful. And it would mean creating new ScanDesc struct, etc. I think adding additional abstraction would simplify the function and reduce the chance of discrepency, I think we need to almost create a duplicate version of code for heapgetpage() method. Yeah, I agree that we need to build structure like ScanDesc, but still it will be worth to keep code consistent. b. Also don't we want to handle pruning of page while scanning (heap_page_prune_opt()) and I observed in heap scan API's after visibility check we do check for serializable conflict (CheckForSerializableConflictOut()). Both good points, will add. Another one is PageIsAllVisible() check. Another thing is don't we want to do anything for sync scans for these method's as they are doing heap scan? I don't follow this one tbh. Refer parameter (HeapScanDesc-rs_syncscan) and syncscan.c. Basically during sequiantial scan on same table by different backends, we attempt to keep them synchronized to reduce the I/O. c. for bernoulli method, it will first get the tupoffset with the help of function and then apply visibility check, it seems that could reduce the sample set way lower than expected in case lot of tuples are not visible, shouldn't it be done in reverse way(first visibility check, then call function to see if we want to include it in the sample)? I think something similar is done in acquire_sample_rows which seems right to me. I don't think so, the way bernoulli works is that it returns every tuple with equal probability, so the visible tuples have same chance of being returned as the invisible ones so the issue should be smoothed away automatically (IMHO). How will it get smoothen for cases when let us say more than 50% of tuples are not visible. I think its due to Postgres non-overwritting storage architecture that we maintain multiple version of rows in same heap, otherwise for different kind of architecture (like mysql/oracle) where multiple row versions are not maintained in same heap, the same function (with same percentage) can return entirely different number of rows. The acquire_sample_rows has limit on number of rows it returns so that's why it has to do the visibility check before as the problem you described applies there. Even if in tablesample method, the argument value is in percentage that doesn't mean not to give any consideration to number of rows returned. The only difference I could see is with tablesample method the number of rows returned will not be accurate number. I think it is better to consider only visible rows. The reason for using the probability instead of tuple limit is the fact that there is no way to accurately guess number of tuples in the table at the beginning of scan. This method should actually be better at returning the correct number of tuples without dependence on how many of them are visible or not and how much space in blocks is used. 5. So above test yield 60% rows first time and 100% rows next time, when the test has requested 80%. I understand that sample percentage will result an approximate number of rows, however I just wanted that we should check if the implementation has any problem or not? I think this is caused by random
Re: [HACKERS] TABLESAMPLE patch
On 17/01/15 13:46, Amit Kapila wrote: On Sun, Jan 11, 2015 at 1:29 AM, Petr Jelinek p...@2ndquadrant.com mailto:p...@2ndquadrant.com wrote: In second patch which implements the TABLESAMPLE itself I changed the implementation of random generator because when I looked at the code again I realized the old one would produce wrong results if there were multiple TABLESAMPLE statements in same query or multiple cursors in same transaction. I have looked into this patch and would like to share my findings with you. That's a lot for this. 1. +static void +set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte) { .. +/* +* There is only one plan to consider but we still need to set +* parameters for RelOptInfo. +*/ +set_cheapest(rel); } It seems we already call set_cheapest(rel) in set_rel_pathlist() which is a caller of set_tablesample_rel_pathlist(), so why do we need it inside set_tablesample_rel_pathlist()? Ah, this changed after I started working on this patch and I didn't notice - previously all the set_something_rel_pathlist called set_cheapest() individually. I will change the code. 2. +static void +set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte) { .. +/* We only do sample scan if it was requested */ +add_path(rel, (Path *) create_samplescan_path(root, rel, required_outer)); } Do we need to add_path, if there is only one path? Good point, we can probably just set the pathlist directly in this case. 3. @@ -332,6 +334,11 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, /* Foreign table */ set_foreign_pathlist(root, rel, rte); } +else if (rte-tablesample != NULL) +{ +/* Build sample scan on relation */ +set_tablesample_rel_pathlist(root, rel, rte); +} Have you consider to have a different RTE for this? I assume you mean different RTEKind, yes I did, but the fact is that it's still a relation, just with different scan type and I didn't want to modify every piece of code which deals with RTE_RELATION to also deal with the new RTE for sample scan as it seems unnecessary. I am not strongly opinionated about this one though. 4. SampleNext() a. Isn't it better to code SampleNext() similar to SeqNext() and IndexNext(), which encapsulate the logic to get the tuple in another function(tbs_next() or something like that)? Possibly, my thinking was that unlike the index_getnext() and heap_getnext(), this function would not be called from any other place so adding one more layer of abstraction didn't seem useful. And it would mean creating new ScanDesc struct, etc. b. Also don't we want to handle pruning of page while scanning (heap_page_prune_opt()) and I observed in heap scan API's after visibility check we do check for serializable conflict (CheckForSerializableConflictOut()). Both good points, will add. Another thing is don't we want to do anything for sync scans for these method's as they are doing heap scan? I don't follow this one tbh. c. for bernoulli method, it will first get the tupoffset with the help of function and then apply visibility check, it seems that could reduce the sample set way lower than expected in case lot of tuples are not visible, shouldn't it be done in reverse way(first visibility check, then call function to see if we want to include it in the sample)? I think something similar is done in acquire_sample_rows which seems right to me. I don't think so, the way bernoulli works is that it returns every tuple with equal probability, so the visible tuples have same chance of being returned as the invisible ones so the issue should be smoothed away automatically (IMHO). The acquire_sample_rows has limit on number of rows it returns so that's why it has to do the visibility check before as the problem you described applies there. The reason for using the probability instead of tuple limit is the fact that there is no way to accurately guess number of tuples in the table at the beginning of scan. This method should actually be better at returning the correct number of tuples without dependence on how many of them are visible or not and how much space in blocks is used. 5. CREATE TABLE test_tablesample (id INT, name text) WITH (fillfactor=10); INSERT INTO test_tablesample SELECT i, repeat(i::text, 200) FROM generate_series(0, 9) s(i) ORDER BY i; postgres=# SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (80); id 1 3 4 7 8 9 (6 rows) postgres=# SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (80); id 0 1 2 3 4 5 6 7 8 9 (10 rows) So above test yield 60% rows first time and 100% rows next time, when the test has requested 80%. I understand that sample percentage will result an approximate number of rows, however I just wanted that we should check if the implementation has any problem or not? I think this is caused by random generator not producing smooth enough random
Re: [HACKERS] TABLESAMPLE patch
On Sun, Jan 11, 2015 at 1:29 AM, Petr Jelinek p...@2ndquadrant.com wrote: In second patch which implements the TABLESAMPLE itself I changed the implementation of random generator because when I looked at the code again I realized the old one would produce wrong results if there were multiple TABLESAMPLE statements in same query or multiple cursors in same transaction. I have looked into this patch and would like to share my findings with you. 1. +static void +set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte) { .. + /* + * There is only one plan to consider but we still need to set + * parameters for RelOptInfo. + */ + set_cheapest(rel); } It seems we already call set_cheapest(rel) in set_rel_pathlist() which is a caller of set_tablesample_rel_pathlist(), so why do we need it inside set_tablesample_rel_pathlist()? 2. +static void +set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte) { .. + /* We only do sample scan if it was requested */ + add_path(rel, (Path *) create_samplescan_path(root, rel, required_outer)); } Do we need to add_path, if there is only one path? 3. @@ -332,6 +334,11 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, /* Foreign table */ set_foreign_pathlist(root, rel, rte); } + else if (rte-tablesample != NULL) + { + /* Build sample scan on relation */ + set_tablesample_rel_pathlist(root, rel, rte); + } Have you consider to have a different RTE for this? 4. SampleNext() a. Isn't it better to code SampleNext() similar to SeqNext() and IndexNext(), which encapsulate the logic to get the tuple in another function(tbs_next() or something like that)? b. Also don't we want to handle pruning of page while scanning (heap_page_prune_opt()) and I observed in heap scan API's after visibility check we do check for serializable conflict (CheckForSerializableConflictOut()). Another thing is don't we want to do anything for sync scans for these method's as they are doing heap scan? c. for bernoulli method, it will first get the tupoffset with the help of function and then apply visibility check, it seems that could reduce the sample set way lower than expected in case lot of tuples are not visible, shouldn't it be done in reverse way(first visibility check, then call function to see if we want to include it in the sample)? I think something similar is done in acquire_sample_rows which seems right to me. 5. CREATE TABLE test_tablesample (id INT, name text) WITH (fillfactor=10); INSERT INTO test_tablesample SELECT i, repeat(i::text, 200) FROM generate_series(0, 9) s(i) ORDER BY i; postgres=# SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (80); id 1 3 4 7 8 9 (6 rows) postgres=# SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (80); id 0 1 2 3 4 5 6 7 8 9 (10 rows) So above test yield 60% rows first time and 100% rows next time, when the test has requested 80%. I understand that sample percentage will result an approximate number of rows, however I just wanted that we should check if the implementation has any problem or not? In this regard, I have one question related to below code: +Datum +tsm_bernoulli_nexttuple(PG_FUNCTION_ARGS) +{ .. + /* Every tuple has percent chance of being returned */ + while (sampler_random_fract(sampler-randstate) samplesize) + { + tupoffset++; + + if (tupoffset maxoffset) + break; + } Why are we not considering tuple in above code if tupoffset is less than maxoffset? 6. One larger question about the approach used in patch, why do you think it is better to have a new node (SampleScan/SamplePath) like you have used in patch instead of doing it as part of Sequence Scan. I agree that we need some changes in different parts of Sequence Scan execution, but still sounds like a viable way. One simple thing that occurs to me is that in ExecSeqScan(), we can use something like SampleSeqNext instead of SeqNext to scan heap in a slightly different way, probably doing it as part of sequence scan will have advantage that we can use most of the existing infrastructure (sequence node path) and have less discrepancies as mentioned in point-4. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] TABLESAMPLE patch
On Fri, Jan 9, 2015 at 1:10 AM, Petr Jelinek p...@2ndquadrant.com wrote: On 06/01/15 14:22, Petr Jelinek wrote: On 06/01/15 08:51, Michael Paquier wrote: On Tue, Dec 23, 2014 at 5:21 AM, Petr Jelinek p...@2ndquadrant.com wrote: Attached is v3 which besides the fixes mentioned above also includes changes discussed with Tomas (except the CREATE/DROP TABLESAMPLE METHOD), fixes for crash with FETCH FIRST and is rebased against current master. This patch needs a rebase, there is a small conflict in parallel_schedule. Sigh, I really wish we had automation that checks this automatically for patches in CF. Here is rebase against current master. Thanks! Structurally speaking, I think that the tsm methods should be added in src/backend/utils and not src/backend/access which is more high-level as tsm_bernoulli.c and tsm_system.c contain only a set of new I am not sure if I parsed this correctly, do you mean to say that only low-level access functions belong to src/backend/access? Makes sense. Made this change. procedure functions. Having a single header file tsm.h would be also a good thing. Moved into single tablesample.h file. Regarding the naming, is tsm (table sample method) really appealing? Wouldn't it be better to use simply tablesample_* for the file names and the method names? I created the src/backend/tablesample and files are named just system.c and bernoulli.c, but I kept tsm_ prefix for methods as they become too long for my taste when prefixing with tablesample_. This is a large patch... Wouldn't sampling.[c|h] extracted from ANALYZE live better as a refactoring patch? This would limit a bit bug occurrences on the main patch. That's a good idea, I'll split it into patch series. I've split the sampling.c/h into separate patch. I also wrote basic CREATE/DROP TABLESAMPLE METHOD support, again as separate patch in the attached patch-set. This also includes modules test with simple custom tablesample method. There are some very minor cleanups in the main tablesample code itself but no functional changes. Some comments about the 1st patch: 1) Nitpicking: + * Block sampling routines shared by ANALYZE and TABLESAMPLE. TABLESAMPLE is not added yet. You may as well mention simplify this description with something like Sampling routines for relation blocks. 2) file_fdw and postgres_fdw do not compile correctly as they still use anl_random_fract. This function is still mentioned in vacuum.h as well. 3) Not really an issue of this patch, but I'd think that this comment should be reworked: + * BlockSampler is used for stage one of our new two-stage tuple + * sampling mechanism as discussed on pgsql-hackers 2004-04-02 (subject + * Large DB). It selects a random sample of samplesize blocks out of + * the nblocks blocks in the table. If the table has less than + * samplesize blocks, all blocks are selected. 4) As a refactoring patch, why is the function providing a random value changed? Shouldn't sample_random_fract be consistent with anl_random_fract? 5) The seed numbers can be changed to RAND48_SEED_0, RAND48_SEED_1 and RAND48_SEED_2 instead of being hardcoded? Regards, -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch
On Tue, Dec 23, 2014 at 5:21 AM, Petr Jelinek p...@2ndquadrant.com wrote: Attached is v3 which besides the fixes mentioned above also includes changes discussed with Tomas (except the CREATE/DROP TABLESAMPLE METHOD), fixes for crash with FETCH FIRST and is rebased against current master. This patch needs a rebase, there is a small conflict in parallel_schedule. Structurally speaking, I think that the tsm methods should be added in src/backend/utils and not src/backend/access which is more high-level as tsm_bernoulli.c and tsm_system.c contain only a set of new procedure functions. Having a single header file tsm.h would be also a good thing. Regarding the naming, is tsm (table sample method) really appealing? Wouldn't it be better to use simply tablesample_* for the file names and the method names? This is a large patch... Wouldn't sampling.[c|h] extracted from ANALYZE live better as a refactoring patch? This would limit a bit bug occurrences on the main patch. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch
On 21/12/14 18:38, Tomas Vondra wrote: Hi, On 18.12.2014 13:14, Petr Jelinek wrote: Hi, v2 version of this patch is attached. I did a review of this v2 patch today. I plan to do a bit more testing, but these are my comments/questions so far: Thanks for looking at it! (0) There's a TABLESAMPLE page at the wiki, not updated since 2012: https://wiki.postgresql.org/wiki/TABLESAMPLE_Implementation We should either update it or mark it as obsolete I guess. Also, I'd like to know what's the status regarding the TODO items mentioned there. Are those still valid with this patch? I will have to look in more detail over the holidays and update it, but the general info about table sampling there applies and will apply to any patch trying to implement it. Quick look on the mail thread suggest that at least the concerns mentioned in the mailing list should not apply to this implementation. And looking at the patch the problem with BERNOULLI sampling should not exist either as I use completely different implementation for that. I do also have some issues with joins which I plan to look at but it's different one (my optimizer code overestimates the number of rows) (1) The patch adds a new catalog, but does not bump CATVERSION. I thought this was always done by committer? (2) The catalog naming (pg_tablesamplemethod) seems a bit awkward, as it squishes everything into a single chunk. That's inconsistent with naming of the other catalogs. I think pg_table_sample_method would be better. Fair point, but perhaps pg_tablesample_method then as tablesample is used as single word everywhere including the standard. (3) There are a few more strange naming decisions, but that's mostly because of the SQL standard requires that naming. I mean SYSTEM and BERNOULLI method names, and also the fact that the probability is specified as 0-100 value, which is inconsistent with other places (e.g. percentile_cont uses the usual 0-1 probability notion). But I don't think this can be fixed, that's what the standard says. Yeah, I don't exactly love that either but what standard says, standard says. (4) I noticed there's an interesting extension in SQL Server, which allows specifying PERCENT or ROWS, so you can say SELECT * FROM table TABLESAMPLE SYSTEM (25 PERCENT); or SELECT * FROM table TABLESAMPLE SYSTEM (2500 ROWS); That seems handy, and it'd make migration from SQL Server easier. What do you think? Well doing it exactly this way somewhat kills the extensibility which was one of the main goals for me - I allow any kind of parameters for sampling and the handling of those depends solely on the sampling method. This means that in my approach if you'd want to change what you are limiting you'd have to write new sampling method and the query would then look like SELECT * FROM table TABLESAMPLE SYSTEM_ROWLIMIT (2500); or some such (depending on how you name the sampling method). Or SELECT * FROM table TABLESAMPLE SYSTEM (2500, 'ROWS'); if we chose to extend the SYSTEM sampling method, that would be also doable. The reason for this is that I don't want to really limit too much what parameters can be send to sampling (I envision even SYSTEM_TIMED sampling method that will get limit as time interval for example). (5) I envision a lot of confusion because of the REPEATABLE clause. With READ COMMITTED, it's not really repeatable because of changes done by the other users (and maybe things like autovacuum). Shall we mention this in the documentation? Yes docs need improvement and this should be mentioned, also code-docs - I found few places which I forgot to update when changing code and now have misleading comments. (6) This seems slightly wrong, because of long/uint32 mismatch: long seed = PG_GETARG_UINT32(1); I think uint32 would be more appropriate, no? Probably, although I need long later in the algorithm anyway. (7) NITPICKING: I think a 'sample_rate' would be a better name here: double percent = sampler-percent; Ok. (8) NITPICKING: InitSamplingMethod contains a command with ';;' fcinfo.arg[i] = (Datum) 0;; Yeah :) (9) The current regression tests only use the REPEATABLE cases. I understand queries without this clause are RANDOM, but maybe we could do something like this: SELECT COUNT(*) BETWEEN 5000 AND 7000 FROM ( SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) ) foo; Granted, there's still a small probability of false positive, but maybe that's sufficiently small? Or is the amount of code this tests negligible? Well, depending on fillfactor and limit it could be made quite reliable I think, I also want to add test with VIEW (I think v2 has a bug there) and perhaps some JOIN. (10) In the initial patch you mentioned it's possible to write custom
Re: [HACKERS] TABLESAMPLE patch
On 22.12.2014 10:07, Petr Jelinek wrote: On 21/12/14 18:38, Tomas Vondra wrote: (1) The patch adds a new catalog, but does not bump CATVERSION. I thought this was always done by committer? Right. Sorry for the noise. (2) The catalog naming (pg_tablesamplemethod) seems a bit awkward, as it squishes everything into a single chunk. That's inconsistent with naming of the other catalogs. I think pg_table_sample_method would be better. Fair point, but perhaps pg_tablesample_method then as tablesample is used as single word everywhere including the standard. Sounds good. (4) I noticed there's an interesting extension in SQL Server, which allows specifying PERCENT or ROWS, so you can say SELECT * FROM table TABLESAMPLE SYSTEM (25 PERCENT); or SELECT * FROM table TABLESAMPLE SYSTEM (2500 ROWS); That seems handy, and it'd make migration from SQL Server easier. What do you think? Well doing it exactly this way somewhat kills the extensibility which was one of the main goals for me - I allow any kind of parameters for sampling and the handling of those depends solely on the sampling method. This means that in my approach if you'd want to change what you are limiting you'd have to write new sampling method and the query would then look like SELECT * FROM table TABLESAMPLE SYSTEM_ROWLIMIT (2500); or some such (depending on how you name the sampling method). Or SELECT * FROM table TABLESAMPLE SYSTEM (2500, 'ROWS'); if we chose to extend the SYSTEM sampling method, that would be also doable. The reason for this is that I don't want to really limit too much what parameters can be send to sampling (I envision even SYSTEM_TIMED sampling method that will get limit as time interval for example). Good point. (6) This seems slightly wrong, because of long/uint32 mismatch: long seed = PG_GETARG_UINT32(1); I think uint32 would be more appropriate, no? Probably, although I need long later in the algorithm anyway. Really? ISTM the sampler_setseed() does not really require long, uint32 would work exactly the same. (9) The current regression tests only use the REPEATABLE cases. I understand queries without this clause are RANDOM, but maybe we could do something like this: SELECT COUNT(*) BETWEEN 5000 AND 7000 FROM ( SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) ) foo; Granted, there's still a small probability of false positive, but maybe that's sufficiently small? Or is the amount of code this tests negligible? Well, depending on fillfactor and limit it could be made quite reliable I think, I also want to add test with VIEW (I think v2 has a bug there) and perhaps some JOIN. OK. (10) In the initial patch you mentioned it's possible to write custom sampling methods. Do you think a CREATE TABLESAMPLE METHOD, allowing custom methods implemented as extensions would be useful? Yes, I think so, CREATE/DROP TABLESAMPLE METHOD is on my TODO, but since that's just simple mechanical work with no hard problems to solve there I can add it later once we have agreement on the general approach of the patch (especially in the terms of extensibility). OK, good to know. Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch
On Thu, Dec 18, 2014 at 7:14 AM, Petr Jelinek p...@2ndquadrant.com wrote: Hi, v2 version of this patch is attached. a few more tests revealed that passing null as the sample size argument works, and it shouldn't. in repeatable it gives an error if i use null as argument but it gives a syntax error, and it should be a data exception (data exception -- invalid repeat argument in a sample clause) according to the standard also you need to add CHECK_FOR_INTERRUPTS somewhere, i tried with a big table and had to wait a long time for it to finish regression=# select count(1) from tenk1 tablesample system (null); count --- 28 (1 row) regression=# select count(1) from tenk1 tablesample bernoulli (null); count --- 0 (1 row) -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte 24x7 y capacitación Phone: +593 4 5107566 Cell: +593 987171157 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch
On 22/12/14 20:14, Jaime Casanova wrote: On Thu, Dec 18, 2014 at 7:14 AM, Petr Jelinek p...@2ndquadrant.com wrote: Hi, v2 version of this patch is attached. a few more tests revealed that passing null as the sample size argument works, and it shouldn't. Fixed. in repeatable it gives an error if i use null as argument but it gives a syntax error, and it should be a data exception (data exception -- invalid repeat argument in a sample clause) according to the standard Also fixed. also you need to add CHECK_FOR_INTERRUPTS somewhere, i tried with a big table and had to wait a long time for it to finish Ah yeah, I can't rely on CHECK_FOR_INTERRUPTS in ExecScan because it might take a while to fetch a row if percentage is very small and table is big... Fixed. Attached is v3 which besides the fixes mentioned above also includes changes discussed with Tomas (except the CREATE/DROP TABLESAMPLE METHOD), fixes for crash with FETCH FIRST and is rebased against current master. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml index 01d24a5..250ae29 100644 --- a/doc/src/sgml/ref/select.sgml +++ b/doc/src/sgml/ref/select.sgml @@ -49,7 +49,7 @@ SELECT [ ALL | DISTINCT [ ON ( replaceable class=parameterexpression/replac phrasewhere replaceable class=parameterfrom_item/replaceable can be one of:/phrase -[ ONLY ] replaceable class=parametertable_name/replaceable [ * ] [ [ AS ] replaceable class=parameteralias/replaceable [ ( replaceable class=parametercolumn_alias/replaceable [, ...] ) ] ] +[ ONLY ] replaceable class=parametertable_name/replaceable [ * ] [ TABLESAMPLE replaceable class=parametersampling_method/replaceable ( replaceable class=parameterargument/replaceable [, ...] ) [ REPEATABLE ( replaceable class=parameterseed/replaceable ) ] ] [ [ AS ] replaceable class=parameteralias/replaceable [ ( replaceable class=parametercolumn_alias/replaceable [, ...] ) ] ] [ LATERAL ] ( replaceable class=parameterselect/replaceable ) [ AS ] replaceable class=parameteralias/replaceable [ ( replaceable class=parametercolumn_alias/replaceable [, ...] ) ] replaceable class=parameterwith_query_name/replaceable [ [ AS ] replaceable class=parameteralias/replaceable [ ( replaceable class=parametercolumn_alias/replaceable [, ...] ) ] ] [ LATERAL ] replaceable class=parameterfunction_name/replaceable ( [ replaceable class=parameterargument/replaceable [, ...] ] ) @@ -317,6 +317,38 @@ TABLE [ ONLY ] replaceable class=parametertable_name/replaceable [ * ] /varlistentry varlistentry + termTABLESAMPLE replaceable class=parametersampling_method/replaceable ( replaceable class=parameterargument/replaceable [, ...] ) [ REPEATABLE ( replaceable class=parameterseed/replaceable ) ]/term + listitem + para +Table sample clause after +replaceable class=parametertable_name/replaceable indicates that +a replaceable class=parametersampling_method/replaceable should +be used to retrieve subset of rows in the table. +The replaceable class=parametersampling_method/replaceable can be +one of: +itemizedlist + listitem + paraliteralSYSTEM/literal/para + /listitem + listitem + paraliteralBERNOULLI/literal/para + /listitem +/itemizedlist +Both of those sampling methods currently accept only single argument +which is the percent (floating point from 0 to 100) of the rows to +be returned. +The literalSYSTEM/literal sampling method does block level +sampling with each block having same chance of being selected and +returns all rows from each selected block. +The literalBERNOULLI/literal scans whole table and returns +individual rows with equal probability. +The optional numeric parameter literalREPEATABLE/literal is used +as random seed for sampling. + /para + /listitem + /varlistentry + + varlistentry termreplaceable class=parameteralias/replaceable/term listitem para diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile index 21721b4..595737c 100644 --- a/src/backend/access/Makefile +++ b/src/backend/access/Makefile @@ -8,6 +8,7 @@ subdir = src/backend/access top_builddir = ../../.. include $(top_builddir)/src/Makefile.global -SUBDIRS = brin common gin gist hash heap index nbtree rmgrdesc spgist transam +SUBDIRS = brin common gin gist hash heap index nbtree rmgrdesc spgist \ + transam tsm include $(top_srcdir)/src/backend/common.mk diff --git a/src/backend/access/tsm/Makefile b/src/backend/access/tsm/Makefile new file mode 100644 index 000..73bbbd7 --- /dev/null +++ b/src/backend/access/tsm/Makefile @@ -0,0 +1,17 @@
Re: [HACKERS] TABLESAMPLE patch
Hi, On 18.12.2014 13:14, Petr Jelinek wrote: Hi, v2 version of this patch is attached. I did a review of this v2 patch today. I plan to do a bit more testing, but these are my comments/questions so far: (0) There's a TABLESAMPLE page at the wiki, not updated since 2012: https://wiki.postgresql.org/wiki/TABLESAMPLE_Implementation We should either update it or mark it as obsolete I guess. Also, I'd like to know what's the status regarding the TODO items mentioned there. Are those still valid with this patch? (1) The patch adds a new catalog, but does not bump CATVERSION. (2) The catalog naming (pg_tablesamplemethod) seems a bit awkward, as it squishes everything into a single chunk. That's inconsistent with naming of the other catalogs. I think pg_table_sample_method would be better. (3) There are a few more strange naming decisions, but that's mostly because of the SQL standard requires that naming. I mean SYSTEM and BERNOULLI method names, and also the fact that the probability is specified as 0-100 value, which is inconsistent with other places (e.g. percentile_cont uses the usual 0-1 probability notion). But I don't think this can be fixed, that's what the standard says. (4) I noticed there's an interesting extension in SQL Server, which allows specifying PERCENT or ROWS, so you can say SELECT * FROM table TABLESAMPLE SYSTEM (25 PERCENT); or SELECT * FROM table TABLESAMPLE SYSTEM (2500 ROWS); That seems handy, and it'd make migration from SQL Server easier. What do you think? (5) I envision a lot of confusion because of the REPEATABLE clause. With READ COMMITTED, it's not really repeatable because of changes done by the other users (and maybe things like autovacuum). Shall we mention this in the documentation? (6) This seems slightly wrong, because of long/uint32 mismatch: long seed = PG_GETARG_UINT32(1); I think uint32 would be more appropriate, no? (7) NITPICKING: I think a 'sample_rate' would be a better name here: double percent = sampler-percent; (8) NITPICKING: InitSamplingMethod contains a command with ';;' fcinfo.arg[i] = (Datum) 0;; (9) The current regression tests only use the REPEATABLE cases. I understand queries without this clause are RANDOM, but maybe we could do something like this: SELECT COUNT(*) BETWEEN 5000 AND 7000 FROM ( SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) ) foo; Granted, there's still a small probability of false positive, but maybe that's sufficiently small? Or is the amount of code this tests negligible? (10) In the initial patch you mentioned it's possible to write custom sampling methods. Do you think a CREATE TABLESAMPLE METHOD, allowing custom methods implemented as extensions would be useful? regards Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESAMPLE patch
On Mon, Dec 22, 2014 at 2:38 AM, Tomas Vondra t...@fuzzy.cz wrote: (1) The patch adds a new catalog, but does not bump CATVERSION. FWIW, this part is managed by the committer when this patch is picked up. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers