Re: [HACKERS] Add Roman numeral conversion to to_number
The following review has been posted through the commitfest application:

make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: tested, passed

Applied clean and runs fine. Previous comments were addressed. There's been some discussion in the thread about whether this is worth doing, but given that it is done, it seems a bit of a moot point. So passing this off to the committers to make a final call.

The new status of this patch is: Ready for Committer

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add Roman numeral conversion to to_number
The following review has been posted through the commitfest application:

make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested

Code looks fine, but one niggly complaint at line 146 of the patch file ("while (*cp) {"). A K&R-style brace slipped in, which doesn't match the formatting of the file.

Given that this is providing new formatting options, there should be new tests added that validate the options and error handling.

There's also the "do we want this?" debate from the discussion thread that still needs to be resolved. (I don't have an opinion either way.)

I'm sending this back to the author to address the first two issues.

The new status of this patch is: Waiting on Author
Re: [HACKERS] Small code improvement for btree
The following review has been posted through the commitfest application:

make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: not tested

Looks good to me.

The new status of this patch is: Ready for Committer
Re: [HACKERS] Cached plans and statement generalization
> A naive option would be to invalidate anything that depends on table or
> view *.FOOBAR. You could probably make it a bit smarter by also requiring
> that schema A appear in the path.

This has been rumbling around in my head. I wonder if you could solve this problem by registering dependencies on objects which don't yet exist. Consider:

CREATE TABLE C.T1(...);
CREATE TABLE C.T2(...);
SET search_path = 'A,B,C,D';
SELECT * FROM C.T1, T2;

For T1, you'd register a hard dependency on C.T1 and no virtual dependencies, since the table is explicitly qualified. For T2, you'd register a hard dependency on C.T2, since that is the table that was selected for the query. You'd also register virtual dependencies on A.T2 and B.T2, since if either of those tables (or views) is created you need to recompile the statement. (Note that no virtual dependency is created on D.T2, since that table would never be selected by the compiler.)

The catch is that virtual dependencies would have to be recorded and searched as strings, not OIDs, since the objects don't exist. Virtual dependencies only have to be checked during CREATE processing though, so that might not be too bad.

But this is getting off topic - I just wanted to capture the idea while it was rumbling around.
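To make the bookkeeping concrete, here is a minimal sketch (Python, purely illustrative - not PostgreSQL code) of how hard and virtual dependencies could be derived for a table reference. The function name, signature, and data structures are all assumptions made for the example:

```python
def dependencies(table, search_path, existing, schema=None):
    """Return (hard_dep, virtual_deps) for a table reference.

    table: unqualified table name; schema: explicit schema or None;
    search_path: ordered list of schema names; existing: set of
    "schema.table" strings for objects that currently exist.
    """
    if schema is not None:
        # Explicitly qualified: one hard dependency, no virtuals.
        return f"{schema}.{table}", []
    virtual = []
    for s in search_path:
        name = f"{s}.{table}"
        if name in existing:
            # First match in the path wins; later schemas are irrelevant.
            return name, virtual
        # Creating this not-yet-existing object must invalidate the plan.
        virtual.append(name)
    raise LookupError(f"table {table} not found on path")

# The SELECT from the example above: C.T1 is qualified; T2 resolves to
# C.T2 with virtual dependencies on A.T2 and B.T2 (but not D.T2).
hard, virt = dependencies("T2", ["A", "B", "C", "D"], {"C.T1", "C.T2"})
```

Note that, as described above, the virtual dependencies are plain strings, since no OID exists for A.T2 or B.T2 yet.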
Re: [HACKERS] Cached plans and statement generalization
> (FWIW, on this list we don't do top-quotes)

I know. Forgot and just did "reply all". My bad.

> It's not always that simple, at least in postgres, unless you disregard
> search_path. Consider e.g. cases like
>
> CREATE SCHEMA a;
> CREATE SCHEMA b;
> CREATE TABLE a.foobar(somecol int);
> SET search_path = 'b,a';
> SELECT * FROM foobar;
> CREATE TABLE b.foobar(anothercol int);
> SELECT * FROM foobar; -- may not be cached plan from before!
>
> It sounds - my memory of DB2 is very faint, and I never used it much -
> like similar issues could arise in DB2 too?

DB2 does handle this case. Unfortunately, I don't know the details of how it worked.

A naive option would be to invalidate anything that depends on table or view *.FOOBAR. You could probably make it a bit smarter by also requiring that schema A appear in the path.

- Doug
Re: [HACKERS] Cached plans and statement generalization
Plan invalidation was no different than for any SQL statement. DB2 keeps a list of the objects the statement depends on. If any of the objects changes in an incompatible way, the plan is invalidated and kicked out of the cache.

I suspect what is more interesting is plan lookup. DB2 has something called the "compilation environment". This is a collection of everything that impacts how a statement is compiled (SQL path, optimization level, etc.). Plan lookup is done using both the statement text and the compilation environment. So, for example, if my path is DOUG, MYTEAM, SYSIBM and your path is ANDRES, MYTEAM, SYSIBM, we will have different compilation environments. If we both issue "SELECT * FROM T" we'll end up with different cache entries, even if T in both of our statements resolves to MYTEAM.T.

If I execute "SELECT * FROM T", change my SQL path, and then execute "SELECT * FROM T" again, I have a new compilation environment, so the second invocation of the statement will create a new entry in the cache. The first entry is not kicked out - it will still be there for re-use if I change my SQL path back to my original value (modulo LRU for cache memory management, of course).

With literal replacement, the cache entry is keyed on the modified statement text. Given the modified statement text and the compilation environment, you're guaranteed to get the right plan entry.

On Tue, Apr 25, 2017 at 2:47 PM Andres Freund wrote:
> On 2017-04-25 21:11:08 +, Doug Doole wrote:
> > When I did this in DB2, I didn't use the parser - it was too expensive. I
> > just tokenized the statement and used some simple rules to bypass the
> > invalid cases. For example, if I saw the tokens "ORDER" and "BY" then I'd
> > disallow replacement until I hit the end of the current
> > subquery or statement.
>
> How did you manage plan invalidation and such?
>
> - Andres
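The lookup scheme described above - keying the plan cache on both statement text and compilation environment - can be sketched roughly like this (illustrative Python, not DB2 or PostgreSQL code; the class name and the contents of the environment tuple are assumptions for the example):

```python
class PlanCache:
    """Toy plan cache keyed on (statement text, compilation environment)."""

    def __init__(self):
        self._plans = {}

    def lookup(self, stmt, env, compile_fn):
        # env must be hashable, e.g. (sql_path, optimization_level).
        key = (stmt, env)
        if key not in self._plans:
            # Cache miss: compile under this environment and keep the plan.
            self._plans[key] = compile_fn(stmt, env)
        return self._plans[key]

cache = PlanCache()
compile_fn = lambda stmt, env: f"plan({stmt}, path={env[0]})"

# Same statement text, different compilation environments:
p1 = cache.lookup("SELECT * FROM T", ("DOUG,MYTEAM,SYSIBM", 5), compile_fn)
p2 = cache.lookup("SELECT * FROM T", ("ANDRES,MYTEAM,SYSIBM", 5), compile_fn)
```

Different environments yield distinct cache entries; switching the path back re-uses the first entry rather than recompiling, matching the behaviour described above.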
Re: [HACKERS] Cached plans and statement generalization
When I did this in DB2, I didn't use the parser - it was too expensive. I just tokenized the statement and used some simple rules to bypass the invalid cases. For example, if I saw the tokens "ORDER" and "BY" then I'd disallow replacement until I hit the end of the current subquery or statement.

There are a few limitations to this approach. For example, DB2 allowed you to cast using function notation like VARCHAR(foo, 10). This meant I would never replace the second parameter of any VARCHAR function. Now it's possible that when the statement was fully compiled we'd find that VARCHAR(foo, 10) actually resolved to BOB.VARCHAR() instead of the built-in cast function. Our thinking was that these cases were rare enough that we wouldn't worry about them. (Of course, PostgreSQL's ::VARCHAR(10) syntax avoids this problem completely.)

Because SQL is so structured, the implementation ended up being quite simple (a few hundred lines of code) with no significant maintenance issues. (Other developers had no problem adding in new cases where constants had to be preserved.) The simple tokenizer was also fairly extensible. I'd prototyped using the same code to also normalize statements (uppercase all keywords, collapse whitespace to a single blank, etc.) but that feature was never added to the product.

- Doug

On Tue, Apr 25, 2017 at 1:47 PM Konstantin Knizhnik <k.knizh...@postgrespro.ru> wrote:
> On 04/25/2017 11:40 PM, Serge Rielau wrote:
> > On Apr 25, 2017, at 1:37 PM, Konstantin Knizhnik <k.knizh...@postgrespro.ru> wrote:
> > SELECT 'hello'::CHAR(10) || 'World', 5 + 6;
> > You can substitute 'hello', 'World', 5, and 6. But not 10.
>
> I am substituting only string literals. So the query above will be
> transformed to
>
> SELECT $1::CHAR(10) || $2, 5 + 6;
>
> What's wrong with it?
>
> Oh, well that leaves a lot of opportunities on the table, doesn't it?
> > Well, actually my primary intention was not to make badly designed
> > programs (not using prepared statements) work faster.
> > I wanted to address cases when it is not possible to use prepared
> > statements.
> > If we want to substitute with parameters as many literals as possible,
> > then parse+deparse tree seems to be the only reasonable approach.
> > I will try to implement it also, just to estimate parsing overhead.
>
> Cheers
> Serge
>
> --
> Konstantin Knizhnik
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
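The tokenizer-based literal replacement discussed in this thread can be sketched as follows (a toy Python version, not the DB2 or proposed PostgreSQL implementation; the token pattern is a simplification, and a real version would track subquery boundaries rather than suppressing replacement for the rest of the statement):

```python
import re

# Match single-quoted SQL string literals (with '' escapes), words, or
# any other single non-space character.
TOKEN = re.compile(r"'(?:[^']|'')*'|\w+|\S")

def generalize(sql):
    """Replace string literals with $n markers, preserving ORDER BY literals."""
    out, params, skip = [], [], False
    tokens = TOKEN.findall(sql)
    for i, tok in enumerate(tokens):
        if tok.upper() == "ORDER" and i + 1 < len(tokens) and tokens[i + 1].upper() == "BY":
            # Simple rule from the discussion: after ORDER BY, stop
            # replacing literals (toy version: for the rest of the text).
            skip = True
        if tok.startswith("'") and not skip:
            params.append(tok)
            out.append(f"${len(params)}")   # literal becomes a parameter marker
        else:
            out.append(tok)
    return " ".join(out), params

stmt, params = generalize("SELECT * FROM t WHERE c = 'hello' ORDER BY 'x'")
```

Here `'hello'` is replaced by `$1`, but the `'x'` after ORDER BY is preserved, since replacing it would change the statement's meaning.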
Re: [HACKERS] Improving executor performance
Attached (in patch 0003) is a proof-of-concept implementing an expression evaluation framework that doesn't use recursion. Instead ExecInitExpr2 computes a number of 'steps' necessary to compute an expression. These steps are stored in a linear array, and executed one after another (save boolean expressions, which can jump to later steps). E.g. to compute colname = 1 the steps are 1) fetch var, 2) evaluate const, 3) call function.

We've been having trouble with the performance of simple expressions in PL/pgSQL, so I started playing with this patch. (No sense re-inventing the wheel, after all.) It was straightforward to extend to simple expressions and showed an immediate improvement (~10% faster on a simple test).

Running in our full environment highlighted a few areas that I think are worth more investigation. However, before I tackle that, is the posted proof-of-concept still the latest and greatest? If not, any chance of getting the latest?

Going forward, I'd be happy to collaborate on our efforts.

- Doug Doole
Salesforce
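The step-array idea - flattening an expression tree into a linear list of steps executed without recursion - can be illustrated with a small sketch (a Python stand-in for the C framework; the step functions, slot handling, and dispatch loop are invented for illustration):

```python
def fetch_var(state, slot, col):
    state[slot] = state["row"][col]          # 1) fetch var into a slot

def eval_const(state, slot, value):
    state[slot] = value                      # 2) evaluate const into a slot

def call_func(state, slot, fn, a, b):
    state[slot] = fn(state[a], state[b])     # 3) call function on two slots

# Steps for the example expression: colname = 1
steps = [
    (fetch_var, 0, "colname"),
    (eval_const, 1, 1),
    (call_func, 2, lambda x, y: x == y, 0, 1),
]

def execute(steps, row):
    """Run the steps in order - a linear walk, no recursion."""
    state = {"row": row}
    for op, *args in steps:
        op(state, *args)
    return state[len(steps) - 1]             # result lives in the last slot

result = execute(steps, {"colname": 1})
```

The real framework also supports jumps for boolean short-circuiting, which this sketch omits.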
Re: [HACKERS] Improving executor performance
On 07/13/2016 06:18 PM, Andres Freund wrote:
> Attached (in patch 0003) is a proof-of-concept implementing an expression
> evaluation framework that doesn't use recursion. Instead ExecInitExpr2
> computes a number of 'steps' necessary to compute an expression. These
> steps are stored in a linear array, and executed one after another (save
> boolean expressions, which can jump to later steps). E.g. to compute
> colname = 1 the steps are 1) fetch var, 2) evaluate const, 3) call
> function.

We've been having trouble with the performance of simple expressions in PL/pgSQL, so I started playing with this patch. (No sense re-inventing the wheel, after all.) It was straightforward to extend to simple expressions and showed an immediate improvement (~10% faster on a simple test).

Running in our full environment highlighted a few areas that I think are worth more investigation. However, before I tackle that, is the posted proof-of-concept still the latest and greatest? If not, any chance of getting the latest?

Going forward, I'd like to collaborate on our efforts if you're interested.

- Doug Doole
Salesforce
Re: [HACKERS] ICU integration
> This isn't a problem for Postgres, or at least wouldn't be right now,
> because we don't have case insensitive collations.

I was wondering if Postgres might be that way. It does avoid the RI constraint problem, but there are still troubles with range-based predicates. (My previous project wanted case/accent insensitive collations, so we got to deal with it all.)

> So, we use a strcmp()/memcmp() tie-breaker when strcoll() indicates
> equality, while also making the general notion of text equality actually
> mean binary equality.

We used a similar tie-breaker in places. (e.g. Index keys needed to be identical, not just equal. We also broke ties in sort to make its behaviour more deterministic.)

> I would like to get case insensitive collations some day, and was
> really hoping that ICU would help. That being said, the need for a
> strcmp() tie-breaker makes that hard. Oh well.

Prior to adding ICU to my previous project, it had the assumption that equal meant identical as well. It turned out to be a lot easier to break this assumption than I expected, but that code base had religiously used its own string comparison function for user data - strcmp()/memcmp() was never called for user data. (I don't know if the same can be said for Postgres.) We found that very few places needed to be aware of values that were equal but not identical. (Index and sort were the big two.) Hopefully Postgres will be the same.

--
Doug Doole
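The tie-breaker approach described above can be sketched as follows (illustrative Python; `str.casefold` stands in for a real collation's sort key, which is an assumption made for demonstration only):

```python
def compare(a, b, collation_key):
    """Three-way compare: collation first, raw bytes as the tie-breaker."""
    ka, kb = collation_key(a), collation_key(b)
    if ka != kb:
        return -1 if ka < kb else 1
    # The collation considers the strings equal: fall back to a binary
    # comparison so that "equality" means binary equality.
    ba, bb = a.encode(), b.encode()
    if ba == bb:
        return 0
    return -1 if ba < bb else 1

# A toy case-insensitive "collation" under which 'AA' and 'aa' collate
# equal but are not identical:
key = str.casefold
```

With the tie-breaker, `compare("AA", "aa", key)` is nonzero even though the collation alone would call them equal - which is exactly why a strcmp()-style tie-breaker makes genuinely case-insensitive collations hard, as noted above.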
Re: [HACKERS] ICU integration
> I understand that in principle, but I don't see operating system
> providers shipping a bunch of ICU versions to facilitate that. They
> will usually ship one.

Yep. If you want the protection I've described, you can't depend on the OS copy of ICU. You need to have multiple ICU libraries that are named/installed such that you can load the specific version you want. It also means that you can have dependencies on versions of ICU that are no longer supported. (In my previous project, we were shipping 3 copies of the ICU library, going back to 2.x. Needless to say, we didn't add support for every drop of ICU.)

On Wed, Sep 7, 2016 at 5:53 AM Peter Eisentraut <peter.eisentr...@2ndquadrant.com> wrote:
> On 9/6/16 1:40 PM, Doug Doole wrote:
> > We carried the ICU version numbers around on our collation and locale
> > IDs (such as fr_FR%icu36). The database would load multiple versions of
> > the ICU library so that something created with ICU 3.6 would always be
> > processed with ICU 3.6. This avoided the problems of trying to change
> > the rules on the user. (We'd always intended to provide tooling to allow
> > the user to move an existing object up to a newer version of ICU, but we
> > never got around to doing it.) In the code, this meant we were
> > explicitly calling the versioned API so that we could keep the calls
> > straight.
>
> I understand that in principle, but I don't see operating system
> providers shipping a bunch of ICU versions to facilitate that. They
> will usually ship one.
>
> --
> Peter Eisentraut http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] ICU integration
> The ICU ABI (not API) is also versioned. The way that this is done is
> that all functions are actually macros to a versioned symbol. So
> ucol_open() is actually a macro that expands to, say, ucol_open_57() in
> ICU version 57. (They also got rid of a dot in their versions a while
> ago.) It's basically hand-crafted symbol versioning. That way, you can
> link with multiple versions of ICU at the same time. However, the
> purpose of that, as I understand it, is so that plugins can have a
> different version of ICU loaded than the main process or another plugin.
> In terms of postgres using the right version of ICU, it doesn't buy
> anything beyond what the soname mechanism does.

You can access the versioned API as well, it's just not documented. (The ICU team does support this - we worked very closely with them when doing all this.) We exploited the versioned API when we learned that there is no guarantee of backwards compatibility in collations. You can't just change a collation under a user (at least that was our opinion) since it can cause all sorts of problems. Refreshing a collation (especially on the fly) is a lot more work than we were prepared to take on.

So we exploited the versioned APIs. We carried the ICU version numbers around on our collation and locale IDs (such as fr_FR%icu36). The database would load multiple versions of the ICU library so that something created with ICU 3.6 would always be processed with ICU 3.6. This avoided the problems of trying to change the rules on the user. (We'd always intended to provide tooling to allow the user to move an existing object up to a newer version of ICU, but we never got around to doing it.) In the code, this meant we were explicitly calling the versioned API so that we could keep the calls straight. (Of course this was abstracted in a set of our own locale functions so that the rest of the engine was ignorant of the ICU library fun that was going on.)

> We can refine the guidance.
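Carrying the ICU version on the collation identifier, as in fr_FR%icu36, might look roughly like this (an illustrative Python sketch; the identifier grammar and the "latest" default are assumptions, not the actual DB2 or proposed PostgreSQL format):

```python
import re

# Hypothetical identifier format: <locale>%icu[<version>], e.g.
# "fr_FR%icu36" pins ICU 3.6, "fr_FR%icu" means "latest".
LOCALE_ID = re.compile(r"^(?P<locale>[A-Za-z_]+)%icu(?P<version>\d+)?$")

def parse_locale_id(ident, latest="57"):
    m = LOCALE_ID.match(ident)
    if not m:
        raise ValueError(f"bad locale identifier: {ident}")
    # An unversioned "fr_FR%icu" would be hardened to a concrete version
    # at object creation time, so the stored ID is always pinned.
    version = m.group("version") or latest
    return m.group("locale"), version
```

The point of storing the version, as described above, is that the engine can later load and call the matching versioned ICU library for anything created under it.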
> But indexes are the most important issue, I
> think, because changing the sorting rules in the background makes data
> silently disappear.

I'd say that collation is the most important issue, but collation impacts a lot more than indexes. Unfortunately, as part of changing companies I had to leave my "screwy stuff that has happened in collations" presentation behind, so I don't have concrete examples to point to, but I can cook up illustrative examples:

- Suppose in ICU X.X, AA = Å but in ICU Y.Y, AA != Å. Further suppose there was an RI constraint where the primary key used AA and the foreign key used Å. If ICU was updated, the RI constraint between the rows would break, leaving an orphaned foreign key.

- I can't remember the specific language, but it had the collation rule that "CH" was treated as a distinct entity between C and D. This gave the order C < CG < CI < CZ < CH < D. Then they removed CH as special, which gave C < CG < CH < CI < CZ < D. Suppose there was the constraint CHECK (COL BETWEEN 'C' AND 'CH'). Originally it would allow (almost) all strings that started with C. After the change, the constraint would block everything that started with CI - CZ, leaving many rows that no longer qualify in the database.

It could be argued that these are edge cases and not likely to be hit. That's likely true for a lot of users. But for a user who hits this, their database is going to be left in a mess.

--
Doug Doole

On Tue, Sep 6, 2016 at 8:37 AM Peter Eisentraut <peter.eisentr...@2ndquadrant.com> wrote:
> On 8/31/16 4:24 PM, Doug Doole wrote:
> > ICU explicitly does not provide stability in their locales and
> > collations. We pushed them hard to provide this, but between changes to the
> > CLDR data and changes to the ICU code it just wasn't feasible for them to
> > provide version to version stability.
> >
> > What they do offer is a compile option when building ICU to version all
> > their APIs. So instead of calling icu_foo() you'd call icu_foo46().
> > (Or something like this - it's been a few years since I actually worked with
> > the ICU code.) This ultimately allows you to load multiple versions of the
> > ICU library into a single program and provide stability by calling the
> > appropriate version of the library. (Unfortunately, the OS - at least my
> > Linux box - only provides the generic version of ICU and not the version
> > annotated APIs, which means a separate compile of ICU is needed.)
> >
> > The catch with this is that it means you likely want to expose the
> > version information. In another note it was suggested to use something like
> > fr_FR%icu. If you want to pin it to a specific version of ICU, you'll
> > likely need something like fr_FR%icu46. (There's nothing wrong with
> > supporting fr_FR%icu to give users an easy
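The BETWEEN 'C' AND 'CH' example from earlier in this message can be made concrete with a toy demonstration (illustrative Python; the two orderings are hand-written stand-ins for real ICU collations):

```python
# Old rules: "CH" is a distinct entity sorting between C and D.
OLD = ["C", "CG", "CI", "CZ", "CH", "D"]
# New rules: "CH" is no longer special.
NEW = ["C", "CG", "CH", "CI", "CZ", "D"]

def passes_check(value, order, low="C", high="CH"):
    """Evaluate CHECK (value BETWEEN low AND high) under a given ordering."""
    return order.index(low) <= order.index(value) <= order.index(high)

# 'CI' satisfied the constraint under the old rules but violates it
# under the new ones, so already-stored rows no longer qualify.
old_ok = passes_check("CI", OLD)
new_ok = passes_check("CI", NEW)
```

This is the "database left in a mess" scenario: the data did not change, only the collation rules did, yet rows that passed the constraint at insert time now violate it.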
Re: [HACKERS] ICU integration
Hi all. I'm new to the PostgreSQL code and the mailing list, but I've had a lot of experience with using ICU in a different database product. So while I'm not up to speed on the code yet, I can offer some insights on using ICU.

> On Aug 30, 2016, at 9:12 PM, Peter Eisentraut wrote:
>> How stable are the ICU locales? Most importantly, does ICU offer any
>> way to "pin" a locale version, so we can say "we want de_DE as it was
>> in ICU 4.6" and get consistent behaviour when the user sets up a
>> replica on some other system with ICU 4.8? Even if the German
>> government has changed its mind (again) about some details of the
>> language and 4.8 knows about the changes but 4.4 doesn't?

ICU explicitly does not provide stability in their locales and collations. We pushed them hard to provide this, but between changes to the CLDR data and changes to the ICU code it just wasn't feasible for them to provide version-to-version stability.

What they do offer is a compile option when building ICU to version all their APIs. So instead of calling icu_foo() you'd call icu_foo46(). (Or something like this - it's been a few years since I actually worked with the ICU code.) This ultimately allows you to load multiple versions of the ICU library into a single program and provide stability by calling the appropriate version of the library. (Unfortunately, the OS - at least my Linux box - only provides the generic version of ICU and not the version-annotated APIs, which means a separate compile of ICU is needed.)

The catch with this is that it means you likely want to expose the version information. In another note it was suggested to use something like fr_FR%icu. If you want to pin it to a specific version of ICU, you'll likely need something like fr_FR%icu46. (There's nothing wrong with supporting fr_FR%icu to give users an easy way of saying "give me the latest and greatest", but you'd probably want to harden it to a specific ICU version internally.)
> I forgot to mention this, but the patch adds a collversion column that
> stores the collation version (provided by ICU). And then when you
> upgrade ICU to something incompatible you get
>
> + if (numversion != collform->collversion)
> +     ereport(WARNING,
> +             (errmsg("ICU collator version mismatch"),
> +              errdetail("The database was created using version 0x%08X, the library provides version 0x%08X.",
> +                        (uint32) collform->collversion, (uint32) numversion),
> +              errhint("Rebuild affected indexes, or build PostgreSQL with the right version of ICU.")));
>
> So you still need to manage this carefully, but at least you have a
> chance to learn about it.

Indexes are the obvious place where collation comes into play, and are relatively easy to address. But consider all the places where string comparisons can be done. For example, check constraints and referential constraints can depend on string comparisons. If the collation rules change because of a new version of ICU, the database can become inconsistent and will need a lot more work than an index rebuild.

> Suggestions for refining this are welcome.
>
> --
> Peter Eisentraut http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Doug Doole
Salesforce