Re: [HACKERS] Can ICU be used for a database's default sort order?
On 2019-03-21 18:46, Marius Timmer wrote: > as I mentioned three weeks ago the patch from October 2018 did not apply > on the master. In the meantime I rebased it. Additionally I fixed some > Makefiles because a few icu-libs were missing. Now this patch applies > and compiles successfully on my machine. After installing running "make > installcheck-world" results in some failures (for example "select"). I > will take a closer look at those failures and review the whole patch in > the next few days. I just wanted to avoid that you have to do the same > rebasing stuff. The new patch is attached to this mail. As I said previously in this thread, this patch needs some fundamental design work. I don't think it's worth doing a code review on the patch as it is. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Can ICU be used for a database's default sort order?
Hello Andrey, we would like to see ICU collations become the default for entire databases as well. Therefore we would also review the patch. Unfortunately your Patch from late October does not apply on the current master. Besides of that I noticed the patch applies on master of October but results in errors when compiling without "--with-icu" and executing "make check-world": > libpq.so: Warning: undefined reference to »get_collprovider_name« > libpq.so: Warning: undefined reference to »is_valid_nondefault_collprovider« > libpq.so: Warning: undefined reference to »get_collprovider« May be caused by your last modification: > I've added LDFLAGS_INTERNAL += $(ICU_LIBS) in libpq, but I'm not > entirely sure this is correct way to deal with complaints on ICU > functions from libpq linking. Best regards, Marius Timmer -- Westfälische Wilhelms-Universität Münster (WWU) Zentrum für Informationsverarbeitung (ZIV) Röntgenstraße 7-13 48149 Münster +49 251 83 31158 marius.tim...@uni-muenster.de https://www.uni-muenster.de/ZIV signature.asc Description: OpenPGP digital signature
Re: [HACKERS] Can ICU be used for a database's default sort order?
On Tue, Dec 11, 2018 at 09:22:48AM +0100, Peter Eisentraut wrote: > If we have well-designed answers to these questions, I'd imagine that > the actual feature patch would be quite small. I was very surprised to > see how large this patch is and how much code is moves around without > much explanation. I don't think it's worth reviewing this patch any > further. It needs several steps back and some fundamental design and > refactoring work. Marked as returned with feedback. This thread has stalled. -- Michael signature.asc Description: PGP signature
Re: [HACKERS] Can ICU be used for a database's default sort order?
On 12/12/2018 15:57, Daniel Verite wrote: > I think one related issue that the patch works around by using a libc locale > as a proxy is knowing what to put into libc's LC_CTYPE and LC_COLLATE. > In fact I've been wondering if that's the main reason for the interface > implemented by the patch. So it seems, but then it should be called out more clearly. > Otherwise, how should these env variables be initialized for ICU > databases? I think when using ICU by default, then it should not matter because we shouldn't be calling any libc functions that use those settings. Maybe there need to be some exceptions, but again we should call those out more clearly. We could set them to "C" for consistency perhaps. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Can ICU be used for a database's default sort order?
Peter Eisentraut wrote: > Another issue is that we'd need to carefully divide up the role of the > "default" collation and the "default" provider. The default collation > is the collation defined for the database, the default provider means to > use the libc non-locale_t enabled API functions. Right now these are > always the same, but if the database-global locale is ICU, then the > default collation would use the ICU provider. I think one related issue that the patch works around by using a libc locale as a proxy is knowing what to put into libc's LC_CTYPE and LC_COLLATE. In fact I've been wondering if that's the main reason for the interface implemented by the patch. Otherwise, how should these env variables be initialized for ICU databases? For instance in the existing FTS code, lowerstr_with_len() in tsearch/ts_locale.c calls tolower() or towlower() to fold a string to lower case when normalizing lexemes. This requires LC_CTYPE to be set to something compatible with the database encoding, at the very least. Even if that code looks like it might need to be changed for ICU anyway (or just to be collation-aware according to the TODO marks?), what about comparable calls in extensions? In the case that we don't touch libc's LC_COLLATE/LC_CTYPE in backends, extension code would have them inherited from the postmaster? Does that sound acceptable? If not, maybe ICU databases should have these as settable options, in addition to their ICU locale? Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: [HACKERS] Can ICU be used for a database's default sort order?
On 02/12/2018 15:40, Andrey Borodin wrote: > Daniel have raised important interface question in his review. Using > libc-style locale in lc_collate is not a perfect choice for many ICU-only > collations. > I'd work on patch if I knew how to improve the interface, but I need input > from community: how this interface should look like. Figuring out the interface is the hard part. Several options have been discussed in this thread and earlier threads. My current thinking is that we should add a datcollprovider column to pg_database, and then store the ICU locale name in datcollate. So mirror the columns of pg_collation in pg_database. Another issue is that we'd need to carefully divide up the role of the "default" collation and the "default" provider. The default collation is the collation defined for the database, the default provider means to use the libc non-locale_t enabled API functions. Right now these are always the same, but if the database-global locale is ICU, then the default collation would use the ICU provider. My recently posted patch "Reorganize collation lookup time and place" is meant to help reorganize the APIs to make this simpler. It doesn't have all the answers yet, but I think it's a step in this direction. If we have well-designed answers to these questions, I'd imagine that the actual feature patch would be quite small. I was very surprised to see how large this patch is and how much code is moves around without much explanation. I don't think it's worth reviewing this patch any further. It needs several steps back and some fundamental design and refactoring work. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Can ICU be used for a database's default sort order?
On Sun, Dec 2, 2018 at 4:21 AM Dmitry Dolgov <9erthali...@gmail.com> wrote: > > About the status of the patch, to me it should be RWF. It's been > > moved to the next CF several times with no progress besides rebases. > > Let me disagree. Judging from the commentaries in this discussion it could be > significant and useful feature, and the author is trying to keep this patch > uptodate. The lack of reviews could be due other reasons than desirability of > the patch (as well as as for many other interesting proposals in hackers). +1. I, for one, care about this feature. We're all very busy, but I don't want to see it die. -- Peter Geoghegan
Re: [HACKERS] Can ICU be used for a database's default sort order?
Hi, Dmitry, Daniel! > 2 дек. 2018 г., в 17:22, Dmitry Dolgov <9erthali...@gmail.com> написал(а): > >> There were reviews: Andrey Borodin raised issues with the patch in >> [2], I spent some time trying it and asked questions about the design >> in [3], but no one followed up on them within the next months. >> >> About the status of the patch, to me it should be RWF. It's been >> moved to the next CF several times with no progress besides rebases. > > Let me disagree. Judging from the commentaries in this discussion it could be > significant and useful feature, and the author is trying to keep this patch > uptodate. The lack of reviews could be due other reasons than desirability of > the patch (as well as as for many other interesting proposals in hackers). Regarding status of this patch: Marina is the original author of the patch, but I'm interested in pushing it. Basically, I've asked Marina to provide some code from Postgres Pro to discuss the feature. Daniel have raised important interface question in his review. Using libc-style locale in lc_collate is not a perfect choice for many ICU-only collations. I'd work on patch if I knew how to improve the interface, but I need input from community: how this interface should look like. I have intention to provide some solution for this question before next CF, but found no enough time during this CF (beside rebase). Best regards, Andrey Borodin.
Re: [HACKERS] Can ICU be used for a database's default sort order?
> On Sat, Dec 1, 2018 at 9:08 PM Daniel Verite wrote: > > Dmitry Dolgov wrote: > > > As a side note, I'm a bit confused, who is the original author of > > the proposed patch? If it's Marina, why she isn't involved in the > > discussion or even mentioned in the patch itself? > > The original patch [1] starts with these commit metadata: > > From e1cb130f550952d9c9c2d9ad1c52e60699a2c968 Mon Sep 17 00:00:00 2001 > From: Marina Polyakova > Date: Fri, 9 Feb 2018 18:57:25 +0300 > Subject: [PATCH] ICU as default collation provider > ... commit message... > > but as you note, Marina did not intervene in the discussion nor > submitted it herself, so this patch misses someone to play the > role of the author in the CF process. Yes, I've missed that, thank you. But there was no references like that in the last rebased version. > There were reviews: Andrey Borodin raised issues with the patch in > [2], I spent some time trying it and asked questions about the design > in [3], but no one followed up on them within the next months. > > About the status of the patch, to me it should be RWF. It's been > moved to the next CF several times with no progress besides rebases. Let me disagree. Judging from the commentaries in this discussion it could be significant and useful feature, and the author is trying to keep this patch uptodate. The lack of reviews could be due other reasons than desirability of the patch (as well as as for many other interesting proposals in hackers).
Re: [HACKERS] Can ICU be used for a database's default sort order?
Dmitry Dolgov wrote: > As a side note, I'm a bit confused, who is the original author of > the proposed patch? If it's Marina, why she isn't involved in the > discussion or even mentioned in the patch itself? The original patch [1] starts with these commit metadata: From e1cb130f550952d9c9c2d9ad1c52e60699a2c968 Mon Sep 17 00:00:00 2001 From: Marina Polyakova Date: Fri, 9 Feb 2018 18:57:25 +0300 Subject: [PATCH] ICU as default collation provider ... commit message... but as you note, Marina did not intervene in the discussion nor submitted it herself, so this patch misses someone to play the role of the author in the CF process. There were reviews: Andrey Borodin raised issues with the patch in [2], I spent some time trying it and asked questions about the design in [3], but no one followed up on them within the next months. About the status of the patch, to me it should be RWF. It's been moved to the next CF several times with no progress besides rebases. [1] <37a534be-cbf7-467c-b096-0aad25091...@yandex-team.ru> [2] <92826deb-da8f-4ae4-9c43-03a55d18a...@yandex-team.ru> [3] <7e86a1aa-a942-4f6b-978e-e9013a4af...@manitou-mail.org> Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: [HACKERS] Can ICU be used for a database's default sort order?
> On Tue, Oct 30, 2018 at 9:07 AM Andrey Borodin wrote: > > Hi! > > > 2 окт. 2018 г., в 11:37, Michael Paquier написал(а): > > > > Please note that the latest patch set does not apply, so this has been > > switched to commit fest 2018-11, waiting on author for a rebase. > > PFA rebased version. I've added LDFLAGS_INTERNAL += $(ICU_LIBS) in libpq, Thanks for providing the rebased version. > but I'm not entirely sure this is correct way to deal with complaints on ICU > functions from libpq linking. Well, it was enough on my own Gentoo, where patch actually compiles without errors and pass "make check". But for cfbot it doesn't compile, I'm not sure why. ../../../src/interfaces/libpq/libpq.so: undefined reference to `ucol_open_52' ../../../src/interfaces/libpq/libpq.so: undefined reference to `uloc_toLanguageTag_52' ../../../src/interfaces/libpq/libpq.so: undefined reference to `u_errorName_52' ../../../src/interfaces/libpq/libpq.so: undefined reference to `ucol_getVersion_52' ../../../src/interfaces/libpq/libpq.so: undefined reference to `u_versionToString_52' ../../../src/interfaces/libpq/libpq.so: undefined reference to `uloc_setDefault_52' ../../../src/interfaces/libpq/libpq.so: undefined reference to `uloc_getDefault_52' ../../../src/interfaces/libpq/libpq.so: undefined reference to `ucol_close_52' As a side note, I'm a bit confused, who is the original author of the proposed patch? If it's Marina, why she isn't involved in the discussion or even mentioned in the patch itself?
Re: [HACKERS] Can ICU be used for a database's default sort order?
On Wed, Sep 12, 2018 at 01:06:12PM +1200, Thomas Munro wrote: > The ideal scope would be to track all referenced collation versions on > every index, and only update them at CREATE INDEX or REINDEX time > (also, as discussed in some other thread, CHECK constraints and > partition keys might be invalidated and should in theory also carry > versions that can only be updated by running a hypothetical RECHECK or > REPARTITION command). Then a shared pg_collation catalog would make > perfect sense, and there would be no need for it to have a collversion > column at all, or an ALTER COLLATION ... REFRESH VERSION command, and > therefore there would be no way to screw it up by REFRESHing the > VERSION without having really fixed the problem. Please note that the latest patch set does not apply, so this has been switched to commit fest 2018-11, waiting on author for a rebase. -- Michael signature.asc Description: PGP signature
Re: [HACKERS] Can ICU be used for a database's default sort order?
On Sat, Jun 24, 2017 at 10:55 AM Peter Geoghegan wrote: > On Fri, Jun 23, 2017 at 11:32 AM, Peter Eisentraut > wrote: > > It's something I hope to address soon. > > I hope you do. I think that we'd realize significant benefits by > having ICU become the defacto standard collation provider, that most > users get without even realizing it. As things stand, you have to make > a point of specifying an ICU collation as your per-column collation > within every CREATE TABLE. That's a significant barrier to adoption. > > > 1) Associate by name only. That is, you can create a database with any > > COLLATION "foo" that you want, and it's only checked when you first > > connect to or do anything in the database. > > > > 2) Create shared collations. Then we'd need a way to manage having a > > mix of shared and non-shared collations around. > > > > There are significant pros and cons to all of these ideas. Some people > > I talked to appeared to prefer the shared collations approach. > > I strongly prefer the second approach. The only downside that occurs > to me is that that approach requires more code. Is there something > that I've missed? Sorry to join this thread late. I was redirected here from another one[1]. I like the shared catalog idea, but here's one objection I thought about: it makes it a bit harder to track whether you've sorted out all your indexes after a version change. Say collation fr_CA's version changes according to the provider, so that it no longer matches the stored collversion. Now you'll need to be careful to connect to every database in the cluster and run REINDEX, before you run ALTER COLLATION "fr_CA" REFRESH VERSION to update the single shared pg_collation row's collversion. With the non-shared pg_collation scheme we have currently, you'd need to refresh the collation row in each database after reindexing the whole database, which is IMHO a bit nicer (you track which databases you've dealt with as you go through them). In other words, using a shared catalog moves the "scope" of the version tracking even further away from the ideal scope, and requires humans to actually get the cleanup right, and it's extra confusing because you can only be connected to one database at a time so there is no "REINDEX MY CLUSTER" and no possibility of making a command that reindexes dependent indexes and then refreshes the collation version. The ideal scope would be to track all referenced collation versions on every index, and only update them at CREATE INDEX or REINDEX time (also, as discussed in some other thread, CHECK constraints and partition keys might be invalidated and should in theory also carry versions that can only be updated by running a hypothetical RECHECK or REPARTITION command). Then a shared pg_collation catalog would make perfect sense, and there would be no need for it to have a collversion column at all, or an ALTER COLLATION ... REFRESH VERSION command, and therefore there would be no way to screw it up by REFRESHing the VERSION without having really fixed the problem. [1] https://www.postgresql.org/message-id/242e081c-aec8-a20a-510c-f4d0f183cebd%402ndquadrant.com -- Thomas Munro http://www.enterprisedb.com
Re: [HACKERS] Can ICU be used for a database's default sort order?
Andrey Borodin wrote: > Overall patch looks solid and thoughtful work and adds important > functionality. I tried the patch, with some minor changes to build with HEAD. I was surprised by the interface, that is, the fact that a user is not allowed to freely choose the ICU collation of a database, in constrast with CREATE COLLATION. AFAIU, when the "default collation provider" is ICU, CREATE DATABASE still expects a libc locale in the lc_collate/lc_ctype arguments. The code will automatically find an ICU equivalent by matching the language, and it seems that the country is ignored? So if we wanted a database with an ICU collation like, say, "es@collation=traditional" or "es-u-co-trad" as expressed with a BCP-47 tag, or anything that is not defined by only a language, would it be possible? I have the impression it wouldn't. This is not something that could be easily improved after the fact because getting the ICU collation through a libc collation is a user-interface choice. I think users would rather be able to create a database with something like: CREATE DATABASE foo COLLPROVIDER='icu' LOCALE='icu_locale' | LC_COLLATE='icu_locale' | LC_CTYPE='icu_locale' ... which would be in line with CREATE COLLATION. Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
Re: Re: [HACKERS] Can ICU be used for a database's default sort order?
On 3/2/18 1:14 AM, Andres Freund wrote: > > On 2018-02-10 20:45:40 +0500, Andrey Borodin wrote: >> I've contacted Postgres Professional. Marina Polyakova had kindly provided >> their patch. >> The patch allows to use libc locale with ICU collation as default for >> cluster or database. >> >> It seems that this patch brings important long-awaited feature and deserves >> to be included in last v11 commitfest. >> Peter, everyone, do you agree with this? Or should we better adapt this work >> through v12 cycle? >> >> I'm planning to provide review asap and do necessary changes if required >> (this was discussed with Marina and Postgres Professional). > > This patch was submitted for the last v11 commitfest, it's not a trivial > patch, and hasn't yet been reviewed. I'm afraid the policy is that large > patches shouldn't be submitted for the last commitfest... Thus I think > this should be moved to the next one. This patch has been moved to the next CF. Regards, -- -David da...@pgmasters.net
Re: [HACKERS] Can ICU be used for a database's default sort order?
Hi, On 2018-02-10 20:45:40 +0500, Andrey Borodin wrote: > I've contacted Postgres Professional. Marina Polyakova had kindly provided > their patch. > The patch allows to use libc locale with ICU collation as default for cluster > or database. > > It seems that this patch brings important long-awaited feature and deserves > to be included in last v11 commitfest. > Peter, everyone, do you agree with this? Or should we better adapt this work > through v12 cycle? > > I'm planning to provide review asap and do necessary changes if required > (this was discussed with Marina and Postgres Professional). This patch was submitted for the last v11 commitfest, it's not a trivial patch, and hasn't yet been reviewed. I'm afraid the policy is that large patches shouldn't be submitted for the last commitfest... Thus I think this should be moved to the next one. Greetings, Andres Freund
Re: [HACKERS] Can ICU be used for a database's default sort order?
Hello. Just want to inform: I have run check,installcheck,plcheck,contribcheck,modulescheck,ecpgcheck,isolationcheck,upgradecheck tests on Windows 10, VC2017 with patch applied on top of 2a41507dab0f293ff241fe8ae326065998668af8 as Andrey asked me. Everything is passing with and without $config->{icu} = 'D:\Dev\postgres\icu\'; Best regards, Michail. пт, 16 февр. 2018 г. в 11:13, Andrey Borodin : > Hi everyone! > > > 10 февр. 2018 г., в 20:45, Andrey Borodin > написал(а): > > > > I'm planning to provide review > > > > So, I was looking into the patch. > The patch adds: > 1. Ability to specify collation provider (with version) in --locale for > initdb and createdb. > 2. Changes to locale checks > 3. Sets ICU as default collation provider. For example > "ru_RU@icu.153.80.32.1" is default on my machine with patch > 4. Tests and necessary changes to documentation > > With patch I get correct ICU ordering by default > postgres=# select unnest(array['е','ё','ж']) order by 1; > unnest > > е > ё > ж > (3 rows) > > While libc locale provides incorrect order (I also get same ordering by > default without patch) > > postgres=# select c from unnest(array['е','ё','ж']) c order by c collate > "ru_RU"; > c > --- > е > ж > ё > (3 rows) > > > Unfortunately, neither "ru_RU@icu.153.80.32.1" (exposed by LC_COLLATE and > other places) nor "ru_RU@icu" cannot be used by collate SQL clause. > Also, patch removes compatibility with MSVC 1800 (Visual Studio 2013) on > Windows XP and Windows Server 2003. This is done to use newer > locale-related functions in VS2013 build. > > If the database was initialized with default locale without this patch, > one cannot connect to it anymore > psql: FATAL: could not find out the collation provider for datcollate > "ru_RU.UTF-8" of database "postgres" > This problem is mentioned in commit message of the patch. I think that > this problem should be addressed somehow. > What do you think? > > Overall patch looks solid and thoughtful work and adds important > functionality. > > Best regards, Andrey Borodin. >
Re: [HACKERS] Can ICU be used for a database's default sort order?
Hi everyone! > 10 февр. 2018 г., в 20:45, Andrey Borodin написал(а): > > I'm planning to provide review > So, I was looking into the patch. The patch adds: 1. Ability to specify collation provider (with version) in --locale for initdb and createdb. 2. Changes to locale checks 3. Sets ICU as default collation provider. For example "ru_RU@icu.153.80.32.1" is default on my machine with patch 4. Tests and necessary changes to documentation With patch I get correct ICU ordering by default postgres=# select unnest(array['е','ё','ж']) order by 1; unnest е ё ж (3 rows) While libc locale provides incorrect order (I also get same ordering by default without patch) postgres=# select c from unnest(array['е','ё','ж']) c order by c collate "ru_RU"; c --- е ж ё (3 rows) Unfortunately, neither "ru_RU@icu.153.80.32.1" (exposed by LC_COLLATE and other places) nor "ru_RU@icu" cannot be used by collate SQL clause. Also, patch removes compatibility with MSVC 1800 (Visual Studio 2013) on Windows XP and Windows Server 2003. This is done to use newer locale-related functions in VS2013 build. If the database was initialized with default locale without this patch, one cannot connect to it anymore psql: FATAL: could not find out the collation provider for datcollate "ru_RU.UTF-8" of database "postgres" This problem is mentioned in commit message of the patch. I think that this problem should be addressed somehow. What do you think? Overall patch looks solid and thoughtful work and adds important functionality. Best regards, Andrey Borodin.
Re: [HACKERS] Can ICU be used for a database's default sort order?
On Fri, Feb 2, 2018 at 4:22 AM, Andrey Borodin wrote: > I can try to do this before next CF. But ISTM that EDB and Postgres Pro > already have various flavours of similar feature. Maybe they are planning to > publish that? I would definitely review that patch. -- Peter Geoghegan
Re: [HACKERS] Can ICU be used for a database's default sort order?
Hello! > 1 февр. 2018 г., в 19:09, Peter Eisentraut > написал(а): > > On 1/31/18 11:48, Vladimir Borodin wrote: >> Will it work only for a particular database? Or for a whole cluster >> during initdb also? Any chance to get this done in 11? > > I'm currently not working on it. > > It's basically just a lot of leg work, and you need to come up with a > catalog representation. Possible options have already been addressed in > earlier threads. I can try to do this before next CF. But ISTM that EDB and Postgres Pro already have various flavours of similar feature. Maybe they are planning to publish that? Best regards, Andrey Borodin.
Re: [HACKERS] Can ICU be used for a database's default sort order?
On 1/31/18 11:48, Vladimir Borodin wrote: > Will it work only for a particular database? Or for a whole cluster > during initdb also? Any chance to get this done in 11? I'm currently not working on it. It's basically just a lot of leg work, and you need to come up with a catalog representation. Possible options have already been addressed in earlier threads. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Can ICU be used for a database's default sort order?
Hi. > 23 июня 2017 г., в 21:32, Peter Eisentraut > написал(а): > > On 6/22/17 23:10, Peter Geoghegan wrote: >> On Thu, Jun 22, 2017 at 7:10 PM, Tom Lane wrote: >>> Is there some way I'm missing, or is this just a not-done-yet feature? >> >> It's a not-done-yet feature. > > It's something I hope to address soon. Will it work only for a particular database? Or for a whole cluster during initdb also? Any chance to get this done in 11? -- May the force be with you… https://simply.name