Re: [HACKERS] user-defined numeric data types triggering ERROR: unsupported type
Tomas Vondra writes:
>>> [ scalarineqsel may fall over when used by extension operators ]

> What about using a two-pronged approach:
> 1) fall back to mid bucket in back branches (9.3 - 10)
> 2) do something smarter (along the lines you outlined) in PG11

Sure.  We need to test the fallback case anyway.

>> [ sketch of a more extensible design ]

> Sounds reasonable to me, I guess - I can't really think of anything
> simpler that gives us the same flexibility.

Actually, on further thought, that's still too simple.  If you look at
convert_string_to_scalar() you'll see it's examining all three values
concurrently (the probe value, of one datatype, and two bin boundary
values of possibly a different type).  The reason is that it's stripping
off whatever common prefix those strings have before trying to form a
numeric equivalent.  While convert_string_to_scalar is certainly pretty
stupid in the face of non-ASCII sort orders, the prefix-stripping is
something I really don't want to give up.  So the design I sketched, in
which each value is considered totally independently, isn't good enough.

We could, perhaps, provide an API that lets an operator estimation
function replace convert_to_scalar in toto, but it seems like you'd
still end up duplicating code in many cases.  Not sure how to find a
happy medium.

			regards, tom lane
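A minimal standalone sketch of the prefix-stripping idea (not the actual
selfuncs.c code; the real remainder-to-double conversion is more
elaborate): the common prefix is a property of all three strings at once,
which is why the values cannot be converted independently.

    #include <stdio.h>

    /* Length of the prefix shared by all three strings. */
    static size_t
    common_prefix_len(const char *value, const char *lo, const char *hi)
    {
        size_t  i = 0;

        while (value[i] != '\0' && value[i] == lo[i] && value[i] == hi[i])
            i++;
        return i;
    }

    /* Crude base-256 mapping of a string remainder to a double. */
    static double
    string_to_double(const char *s)
    {
        double  result = 0.0;
        double  scale = 1.0;

        for (; *s != '\0'; s++)
        {
            scale /= 256.0;
            result += (unsigned char) *s * scale;
        }
        return result;
    }

    int
    main(void)
    {
        const char *value = "abcdef";
        const char *lo = "abcaaa";
        const char *hi = "abczzz";
        size_t      skip = common_prefix_len(value, lo, hi);

        /* Only the parts after the shared "abc" prefix are converted. */
        printf("value=%g lo=%g hi=%g\n",
               string_to_double(value + skip),
               string_to_double(lo + skip),
               string_to_double(hi + skip));
        return 0;
    }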
Re: [HACKERS] user-defined numeric data types triggering ERROR: unsupported type
On 09/21/2017 04:24 PM, Tom Lane wrote:
> Tomas Vondra writes:
>> [ scalarineqsel may fall over when used by extension operators ]
>
> I concur with your thought that we could have
> ineq_histogram_selectivity fall back to a "mid bucket" default if
> it's working with a datatype it is unable to convert_to_scalar. But I
> think if we're going to touch this at all, we ought to have higher
> ambition than that, and try to provide a mechanism whereby an
> extension that's willing to work a bit harder could get that
> additional increment of estimation accuracy. I don't care for this
> way to do that:
>

The question is - do we need a solution that is back-patchable? That
requirement means we can't really use FIXEDDECIMAL without effectively
copying a lot of the selfuncs.c stuff, only to make some minor fixes.

What about using a two-pronged approach:

1) fall back to mid bucket in back branches (9.3 - 10)
2) do something smarter (along the lines you outlined) in PG11

I'm willing to spend some time on both, but (2) alone is not
particularly attractive for us, as it only helps PG11 and we'd have to
do the copy-paste evil anyway to get the data type working on back
branches.

>> * Make convert_numeric_to_scalar smarter, so that it checks if
>> there is an implicit cast to numeric, and fail only if it does not
>> find one.
>
> because it's expensive, and it only works for numeric-category cases,
> and it will fail outright for numbers outside the range of "double".
> (Notice that convert_numeric_to_scalar is *not* calling the regular
> cast function for numeric-to-double.) Moreover, any operator ought to
> know what types it can accept; we shouldn't have to do more catalog
> lookups to figure out what to do.
>

Ah, I see. I hadn't realized it's not using the regular cast functions,
but now that you point that out it's clear that relying on casts would
fail.

> Now that scalarltsel and friends are just trivial wrappers for a
> common function, we could imagine exposing scalarineqsel_wrapper as a
> non-static function, with more arguments (and perhaps a better-chosen
> name ;-)). The idea would be for extensions that want to go this
> extra mile to provide their own selectivity estimation functions,
> which again would just be trivial wrappers for the core function, but
> would provide additional knowledge through additional arguments.
>
> The additional arguments I'm envisioning are a couple of C function
> pointers, one function that knows how to convert the operator's
> left-hand input type to scalar, and one function that knows how to
> convert the right-hand type to scalar. (Identical APIs of course.)
> Passing a NULL would imply that the core code must fall back on its
> own devices for that input.
>
> Now the thing about convert_to_scalar is that there are several
> different conversion conventions already (numeric, string, timestamp,
> ...), and there probably could be more once extension types are
> coming to the party. So I'm imagining that the API for these
> conversion functions could be something like
>
>     bool convert(Datum value, Oid valuetypeid,
>                  int *conversion_convention, double *converted_value);
>
> The conversion_convention output would make use of some agreed-on
> constants, say 1 for numeric, 2 for string, etc etc. In the core
> code, if either converter fails (returns false) or if they don't
> return the same conversion_convention code, we give up and use the
> mid-bucket default.
> If they both produce results with the same
> conversion_convention, then we can treat the converted_values as
> commensurable.
>

OK, so instead of re-implementing the whole function, we'd essentially
do just something like this:

    typedef bool (*convert_callback) (Datum value, Oid valuetypeid,
                                      int *conversion_convention,
                                      double *converted_value);

    double scalarineqsel(PlannerInfo *root, Oid operator,
                         bool isgt, bool iseq,
                         VariableStatData *vardata,
                         Datum constval, Oid consttype,
                         convert_callback convert);

and then, from the extension

    double
    scalarineqsel_wrapper(PlannerInfo *root, Oid operator,
                          bool isgt, bool iseq,
                          VariableStatData *vardata,
                          Datum constval, Oid consttype)
    {
        return scalarineqsel(root, operator, isgt, iseq,
                             vardata, constval, consttype,
                             my_convert_func);
    }

Sounds reasonable to me, I guess - I can't really think of anything
simpler that gives us the same flexibility.

regards

-- 
Tomas Vondra                      http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
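To make the extension side concrete, here is a sketch of what
my_convert_func might look like for fixeddecimal. The scaled-int64
representation, the FIXEDDECIMALOID symbol (standing for however the
extension obtains its type OID), and the use of 1 as the "numeric"
convention constant are assumptions for illustration, not the real
extension code:

    #include "postgres.h"
    #include "fmgr.h"
    #include "catalog/pg_type.h"    /* NUMERICOID */
    #include "utils/builtins.h"     /* numeric_float8_no_overflow() */

    static bool
    my_convert_func(Datum value, Oid valuetypeid,
                    int *conversion_convention, double *converted_value)
    {
        if (valuetypeid == FIXEDDECIMALOID)
        {
            /* assumed: pass-by-value int64 scaled by 100 */
            *conversion_convention = 1;     /* "numeric" category */
            *converted_value = (double) DatumGetInt64(value) / 100.0;
            return true;
        }

        if (valuetypeid == NUMERICOID)
        {
            /* same conversion the core code applies to numeric values */
            *conversion_convention = 1;
            *converted_value = DatumGetFloat8(DirectFunctionCall1(
                                    numeric_float8_no_overflow, value));
            return true;
        }

        /* anything else: let the core code fall back on its own devices */
        return false;
    }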
Re: [HACKERS] user-defined numeric data types triggering ERROR: unsupported type
Tomas Vondra writes:
> [ scalarineqsel may fall over when used by extension operators ]

I concur with your thought that we could have ineq_histogram_selectivity
fall back to a "mid bucket" default if it's working with a datatype it
is unable to convert_to_scalar.  But I think if we're going to touch
this at all, we ought to have higher ambition than that, and try to
provide a mechanism whereby an extension that's willing to work a bit
harder could get that additional increment of estimation accuracy.
I don't care for this way to do that:

> * Make convert_numeric_to_scalar smarter, so that it checks if there is
> an implicit cast to numeric, and fail only if it does not find one.

because it's expensive, and it only works for numeric-category cases,
and it will fail outright for numbers outside the range of "double".
(Notice that convert_numeric_to_scalar is *not* calling the regular
cast function for numeric-to-double.)  Moreover, any operator ought to
know what types it can accept; we shouldn't have to do more catalog
lookups to figure out what to do.

Now that scalarltsel and friends are just trivial wrappers for a common
function, we could imagine exposing scalarineqsel_wrapper as a
non-static function, with more arguments (and perhaps a better-chosen
name ;-)).  The idea would be for extensions that want to go this extra
mile to provide their own selectivity estimation functions, which again
would just be trivial wrappers for the core function, but would provide
additional knowledge through additional arguments.

The additional arguments I'm envisioning are a couple of C function
pointers, one function that knows how to convert the operator's
left-hand input type to scalar, and one function that knows how to
convert the right-hand type to scalar.  (Identical APIs of course.)
Passing a NULL would imply that the core code must fall back on its own
devices for that input.

Now the thing about convert_to_scalar is that there are several
different conversion conventions already (numeric, string, timestamp,
...), and there probably could be more once extension types are coming
to the party.  So I'm imagining that the API for these conversion
functions could be something like

    bool convert(Datum value, Oid valuetypeid,
                 int *conversion_convention, double *converted_value);

The conversion_convention output would make use of some agreed-on
constants, say 1 for numeric, 2 for string, etc etc.  In the core code,
if either converter fails (returns false) or if they don't return the
same conversion_convention code, we give up and use the mid-bucket
default.  If they both produce results with the same
conversion_convention, then we can treat the converted_values as
commensurable.

			regards, tom lane
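Spelling that rule out, the core-side check could be sketched roughly as
follows; the helper and parameter names here are invented for
illustration and are not actual or proposed selfuncs.c code:

    typedef bool (*convert_callback) (Datum value, Oid valuetypeid,
                                      int *conversion_convention,
                                      double *converted_value);

    /*
     * convert_const / convert_var stand for whichever of the two
     * callbacks matches each side of the operator; either may be NULL,
     * meaning "use the core code's own conversions".
     */
    static bool
    converted_values_usable(convert_callback convert_const,
                            Datum constval, Oid consttype,
                            double *const_out,
                            convert_callback convert_var,
                            Datum boundval, Oid boundtype,
                            double *bound_out)
    {
        int     const_conv;
        int     bound_conv;

        if (convert_const == NULL || convert_var == NULL)
            return false;       /* fall back to core logic / mid bucket */

        if (!convert_const(constval, consttype, &const_conv, const_out))
            return false;       /* converter failed: mid-bucket default */
        if (!convert_var(boundval, boundtype, &bound_conv, bound_out))
            return false;

        /* commensurable only under the same conversion convention */
        return const_conv == bound_conv;
    }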
[HACKERS] user-defined numeric data types triggering ERROR: unsupported type
Hi,

while testing a custom data type FIXEDDECIMAL [1], implementing a
numeric-like data type with limited range, I ran into several issues
that I suspect may not be entirely intentional / expected behavior.

[1] https://github.com/2ndQuadrant/fixeddecimal

Attached is a minimal subset of the extension's SQL definition, which
may be more convenient when looking into the issue.

The most important issue is that when planning a simple query, the
estimation of range queries on a column with the custom data type fails
like this:

    test=# create table t (a fixeddecimal);
    CREATE TABLE
    test=# insert into t select random() from generate_series(1,10);
    INSERT 0 10
    test=# analyze t;
    ANALYZE
    test=# select * from t where a > 0.9;
    ERROR:  unsupported type: 16385

The error message here comes from convert_numeric_to_scalar, which gets
called during histogram processing (ineq_histogram_selectivity) when
approximating the histogram. convert_to_scalar does this:

    switch (valuetypeid)
    {
        ...
        case NUMERICOID:
            ...
            *scaledvalue = convert_numeric_to_scalar(value, valuetypid);
            *scaledlobound = convert_numeric_to_scalar(lobound, boundstypid);
            *scaledhibound = convert_numeric_to_scalar(hibound, boundstypid);
            return true;
        ...
    }

The first call works fine, as the constant really is numeric
(valuetypeid=1700). But the histogram boundaries are of the custom data
type, causing the error, as convert_numeric_to_scalar only handles a
bunch of hard-coded data types. So it's pretty much guaranteed to fail
with any user-defined data type. This seems a bit unfortunate :-(

One solution would be to implement a custom estimation function,
replacing scalarltsel/scalargtsel. But that seems rather unnecessary,
especially considering there is an implicit cast from fixeddecimal to
numeric.

Another thing is that when there's just an MCV list, the estimation
works just fine.

So I see two basic ways to fix this (a sketch of the second option
follows at the end of this message):

* Make convert_numeric_to_scalar smarter, so that it checks if there is
  an implicit cast to numeric, and fail only if it does not find one.

* Make convert_to_scalar smarter, so that it returns false for
  unexpected data types, so that ineq_histogram_selectivity uses the
  default estimate of 0.5 (for that one bucket).

Both options seem more favorable than what's happening currently. Is
there anything I missed, making those fixes unacceptable?

If anything, the fact that MCV estimates work while histogram estimates
do not makes this somewhat unpredictable - a change in the data
distribution (or perhaps even just a different sample in ANALYZE) may
result in sudden failures.

I ran into one additional strange thing while investigating this. The
attached SQL script defines two operator classes - fixeddecimal_ops and
fixeddecimal_numeric_ops, defining (fixeddecimal,fixeddecimal) and
(fixeddecimal,numeric) operators. Dropping one of those operator
classes changes the estimates in a somewhat suspicious way.

When only fixeddecimal_ops is defined, we get this:

    test=# explain select * from t where a > 0.1;
                             QUERY PLAN
    ---------------------------------------------------------
     Seq Scan on t  (cost=0.00..1943.00 rows=33333 width=8)
       Filter: ((a)::numeric > 0.1)
    (2 rows)

That is, we get the default estimate for inequality clauses, 33%. But
when only fixeddecimal_numeric_ops is defined, we get this:

    test=# explain select * from t where a > 0.1;
                             QUERY PLAN
    ---------------------------------------------------------
     Seq Scan on t  (cost=0.00..1693.00 rows=50000 width=8)
       Filter: (a > 0.1)
    (2 rows)

That is, we get a 50% estimate, because that's what scalarineqsel uses
when ineq_histogram_selectivity can't compute selectivity from a
histogram for some reason.
Wouldn't it make more sense to use the default 33% estimate here?

regards

-- 
Tomas Vondra                      http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment: fixeddecimal-minimal.sql (application/sql)
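For reference, the second of the two fixes proposed above could look
roughly like this. It is only a sketch, not an actual patch: the
bool *failure out-parameter is an invented signature, and the list of
handled types is abbreviated.

    /*
     * Sketch: have convert_numeric_to_scalar report failure instead of
     * throwing "unsupported type", so convert_to_scalar can return false
     * and ineq_histogram_selectivity falls back to its default estimate.
     */
    static double
    convert_numeric_to_scalar(Datum value, Oid typid, bool *failure)
    {
        switch (typid)
        {
            case NUMERICOID:
                return DatumGetFloat8(DirectFunctionCall1(
                                        numeric_float8_no_overflow, value));
            case INT4OID:
                return (double) DatumGetInt32(value);
            /* ... other hard-coded types ... */
            default:
                *failure = true;    /* unknown type: no longer an ERROR */
                return 0;
        }
    }

    /* ... and in convert_to_scalar()'s numeric branch: */

        bool    failure = false;

        *scaledvalue = convert_numeric_to_scalar(value, valuetypid,
                                                 &failure);
        *scaledlobound = convert_numeric_to_scalar(lobound, boundstypid,
                                                   &failure);
        *scaledhibound = convert_numeric_to_scalar(hibound, boundstypid,
                                                   &failure);
        if (failure)
            return false;   /* caller falls back to the default estimate */
        return true;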