Re: [HACKERS] pg_conversion seems rather strangely defined
>> I used to had a customer who needs to have different client and >> database encoding than the default. That is, a slightly different >> mapping between Shift-JIS and other database encoding. Due to >> unfortunate historical reasons, there are several Shift-JIS variants >> (in addition to the standard defined by government, there are IBM, NEC >> and Microsoft versions). This is the reason why I wanted to have the >> functionality at that time. I'm not sure the customer still uses the >> functionality, but it is possible that there are other users who have >> similar use cases, since the Shift-JIS variants are still used. > > Hm. Odd that we've not heard complaints about the removal of > CONVERT(... USING ...), then. Yeah, I'm not sure the customer updated to the newer version of PostgreSQL. > I think it would be a good idea at least to put back some equivalent > of CONVERT(... USING ...), if for no other reason than that it would > ease testing. As I understood it, the argument to remove it was not > that the functionality was bad, but that we were using a SQL-standard > syntax for what we concluded was not SQL-standard functionality. > I'd propose putting it back with a syntax of, say, > > convert_with(input bytea, conversion_name text) returns bytea > > As for the client encoding conversion case, I still say a > search-path-based lookup is a horrible idea, and furthermore there > seems no very good reason why it has to be restricted to default > conversions. Aside from other arguments, that tends to push people > to mark *every* conversion as default, which is outright silly if > you have several competing ones. > > As a sketch of an alternative, consider inventing a GUC named > preferred_conversions or some such, which is a list of > possibly-schema-qualified conversion names. When establishing an > original or new value of client_encoding, we look through the list > for the first entry that exists and performs the desired encoding > conversion (whether or not it is default). If there is no match, > look up the (unique) default conversion for the encoding pair, and > use that. (Obviously this has to be done twice, once for each > direction, when setting up client encoding conversions.) We could > apply the same rules for identifying which specific conversion to use > in convert() and siblings. Sounds good to me. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_conversion seems rather strangely defined
Tatsuo Ishii writes: >> It would only be important to be able to do it like that if different >> users of the same database had conflicting ideas about what was the >> correct conversion between client and database encodings. I submit >> that that's somewhere around epsilon probability, considering we have >> not even heard of anyone replacing the system conversions at all. > I used to had a customer who needs to have different client and > database encoding than the default. That is, a slightly different > mapping between Shift-JIS and other database encoding. Due to > unfortunate historical reasons, there are several Shift-JIS variants > (in addition to the standard defined by government, there are IBM, NEC > and Microsoft versions). This is the reason why I wanted to have the > functionality at that time. I'm not sure the customer still uses the > functionality, but it is possible that there are other users who have > similar use cases, since the Shift-JIS variants are still used. Hm. Odd that we've not heard complaints about the removal of CONVERT(... USING ...), then. I think it would be a good idea at least to put back some equivalent of CONVERT(... USING ...), if for no other reason than that it would ease testing. As I understood it, the argument to remove it was not that the functionality was bad, but that we were using a SQL-standard syntax for what we concluded was not SQL-standard functionality. I'd propose putting it back with a syntax of, say, convert_with(input bytea, conversion_name text) returns bytea As for the client encoding conversion case, I still say a search-path-based lookup is a horrible idea, and furthermore there seems no very good reason why it has to be restricted to default conversions. Aside from other arguments, that tends to push people to mark *every* conversion as default, which is outright silly if you have several competing ones. As a sketch of an alternative, consider inventing a GUC named preferred_conversions or some such, which is a list of possibly-schema-qualified conversion names. When establishing an original or new value of client_encoding, we look through the list for the first entry that exists and performs the desired encoding conversion (whether or not it is default). If there is no match, look up the (unique) default conversion for the encoding pair, and use that. (Obviously this has to be done twice, once for each direction, when setting up client encoding conversions.) We could apply the same rules for identifying which specific conversion to use in convert() and siblings. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_conversion seems rather strangely defined
> It would only be important to be able to do it like that if different > users of the same database had conflicting ideas about what was the > correct conversion between client and database encodings. I submit > that that's somewhere around epsilon probability, considering we have > not even heard of anyone replacing the system conversions at all. I used to had a customer who needs to have different client and database encoding than the default. That is, a slightly different mapping between Shift-JIS and other database encoding. Due to unfortunate historical reasons, there are several Shift-JIS variants (in addition to the standard defined by government, there are IBM, NEC and Microsoft versions). This is the reason why I wanted to have the functionality at that time. I'm not sure the customer still uses the functionality, but it is possible that there are other users who have similar use cases, since the Shift-JIS variants are still used. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_conversion seems rather strangely defined
Noah Misch writes: > On Wed, Jan 06, 2016 at 11:56:14PM -0500, Tom Lane wrote: >> Moreover, we have precedent both for this approach being a bad idea >> and for us changing it without many complaints. We used to have >> search-path-dependent lookup of default index operator classes. >> We found out that that was a bad idea and got rid of it, cf commit >> 3ac1ac58c. This situation looks much the same to me. > Per the 3ac1ac58c log message, "CREATE OPERATOR CLASS only allows one default > opclass per datatype regardless of schemas." That had been true since day one > for CREATE OPERATOR CLASS. It doesn't hold for conversions today, and that's > a crucial difference between that commit and this proposal. Well, the state of affairs is slightly different, but that doesn't mean it's not equally broken. What I took away from the default-opclass fiasco is that search-path-based lookup is a good idea only when you are looking up something *by name*, so that the user can resolve any ambiguity by schema-qualifying that name. Searches that aren't based on a user-specified name shouldn't depend on search_path. This is important because there are multiple conflicting concerns when you choose a search_path setting, and you may not want to allow any particular search to force your hand on what you put where in your search path. If, for example, you don't really want to put "public" on the front of your search path, the current code gives you no way to select a conversion that's in "public". If we need to cater for alternative conversions, my preference would be a GUC or some other kind of setting that allows selecting a client I/O conversion by name, whether it is "default" or not. But given the lack of apparent demand, I don't really want to design and implement such a feature right now. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_conversion seems rather strangely defined
On Wed, Jan 06, 2016 at 11:56:14PM -0500, Tom Lane wrote: > Noah Misch writes: > > On Tue, Jan 05, 2016 at 01:46:51PM -0500, Tom Lane wrote: > >> I do not see a lot of point in the namespacing of encoding conversions > >> either. Does anyone really need or use search-path-dependent lookup of > >> conversions? > > > I have not issued CREATE CONVERSION except to experiment, and I have never > > worked in a database in which someone else had created one. Among PGXN > > distributions, CREATE CONVERSION appears only in the pyrseas test suite. It > > could be hard to track down testimony on real-world usage patterns, but I > > envision two credible patterns. First, you could change the default search > > path to "corrected_conversions, pg_catalog, $user, public" and inject fixed > > versions of the system conversions. One could use that to backport commit > > 8d3e090. Second, you could add conversions we omit entirely, like UTF8 -> > > MULE_INTERNAL. Dropping search-path-dependent lookup would remove the > > supported way to fix system conversions. > > The built-in conversions are very intentionally not pinned. So to my > mind, the supported way to replace one is to drop it and create your own. I just learned something. Interesting. > The way you describe works only if an appropriate search path is installed > at the time a new session activates its client encoding conversions. TBH, > I have no idea whether we apply (for example) "ALTER ROLE SET search_path" > before that happens; but even if we do, there's no real guarantee that it > wouldn't change. We publish no documentation about the order of startup > actions. Moreover past attempts at defining dependencies between GUC > settings have been spectacular failures, so I really don't want to go > there in this context. > > It would only be important to be able to do it like that if different > users of the same database had conflicting ideas about what was the > correct conversion between client and database encodings. I submit > that that's somewhere around epsilon probability, considering we have > not even heard of anyone replacing the system conversions at all. > > (Adding conversions we don't supply is, of course, orthogonal to this.) Agreed on all those points. Even so, I don't find that the need to set GUCs in a particular order makes a good case for removing this ancient capability. I _would_ send a new feature back for redesign on the strength of such a defect, but that is different. Independent from that dearth of positive cause to restrict this, users taking the "DROP CONVERSION pg_catalog.foo" route get dump/restore problems. pg_dump doesn't notice that one dropped a pg_catalog conversion; the user would manually repeat each drop before each restore. That's especially awkward for pg_upgrade. I guess the user could drop each conversion in the new cluster's template0, run pg_upgrade, and then recreate conversions in databases that had not overridden them in the original cluster. That's a mess. > Moreover, we have precedent both for this approach being a bad idea > and for us changing it without many complaints. We used to have > search-path-dependent lookup of default index operator classes. > We found out that that was a bad idea and got rid of it, cf commit > 3ac1ac58c. This situation looks much the same to me. Per the 3ac1ac58c log message, "CREATE OPERATOR CLASS only allows one default opclass per datatype regardless of schemas." That had been true since day one for CREATE OPERATOR CLASS. It doesn't hold for conversions today, and that's a crucial difference between that commit and this proposal. Thanks, nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_conversion seems rather strangely defined
Noah Misch writes: > On Tue, Jan 05, 2016 at 01:46:51PM -0500, Tom Lane wrote: >> I do not see a lot of point in the namespacing of encoding conversions >> either. Does anyone really need or use search-path-dependent lookup of >> conversions? > I have not issued CREATE CONVERSION except to experiment, and I have never > worked in a database in which someone else had created one. Among PGXN > distributions, CREATE CONVERSION appears only in the pyrseas test suite. It > could be hard to track down testimony on real-world usage patterns, but I > envision two credible patterns. First, you could change the default search > path to "corrected_conversions, pg_catalog, $user, public" and inject fixed > versions of the system conversions. One could use that to backport commit > 8d3e090. Second, you could add conversions we omit entirely, like UTF8 -> > MULE_INTERNAL. Dropping search-path-dependent lookup would remove the > supported way to fix system conversions. The built-in conversions are very intentionally not pinned. So to my mind, the supported way to replace one is to drop it and create your own. The way you describe works only if an appropriate search path is installed at the time a new session activates its client encoding conversions. TBH, I have no idea whether we apply (for example) "ALTER ROLE SET search_path" before that happens; but even if we do, there's no real guarantee that it wouldn't change. We publish no documentation about the order of startup actions. Moreover past attempts at defining dependencies between GUC settings have been spectacular failures, so I really don't want to go there in this context. It would only be important to be able to do it like that if different users of the same database had conflicting ideas about what was the correct conversion between client and database encodings. I submit that that's somewhere around epsilon probability, considering we have not even heard of anyone replacing the system conversions at all. (Adding conversions we don't supply is, of course, orthogonal to this.) Moreover, we have precedent both for this approach being a bad idea and for us changing it without many complaints. We used to have search-path-dependent lookup of default index operator classes. We found out that that was a bad idea and got rid of it, cf commit 3ac1ac58c. This situation looks much the same to me. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_conversion seems rather strangely defined
On Tue, Jan 05, 2016 at 01:46:51PM -0500, Tom Lane wrote: > I do not see a lot of point in the namespacing of encoding conversions > either. Does anyone really need or use search-path-dependent lookup of > conversions? I have not issued CREATE CONVERSION except to experiment, and I have never worked in a database in which someone else had created one. Among PGXN distributions, CREATE CONVERSION appears only in the pyrseas test suite. It could be hard to track down testimony on real-world usage patterns, but I envision two credible patterns. First, you could change the default search path to "corrected_conversions, pg_catalog, $user, public" and inject fixed versions of the system conversions. One could use that to backport commit 8d3e090. Second, you could add conversions we omit entirely, like UTF8 -> MULE_INTERNAL. Dropping search-path-dependent lookup would remove the supported way to fix system conversions. > (If they do, it's probably broken anyway, since for example > we do not trouble to re-identify the client encoding conversion functions > when search_path changes.) That's bad in principle, but I'll guess it's tolerable in practice. Switching among implementations of a particular conversion might happen with O(weeks) or longer period, like updating your system's iconv() conversion tables. I can't easily envision one application switching between implementations over the course of a session. (An application doing that today probably works around the problem, perhaps with extra "SET client_encoding" calls.) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_conversion seems rather strangely defined
I wrote: > I still think however that search-path-based lookup of default encoding > conversions is a Bad Idea, and that we only ought to allow one default > conversion to exist for any pair of encodings. > I realized that we could implement that without too much trouble by > redefining pg_conversion.condefault as being true for default conversions > and NULL (not false) for non-default ones. With this definition, a > unique index on pg_conversion (conforencoding, contoencoding, condefault) > would have the behavior we want --- sort of a poor man's partial unique > index, except that it would work correctly on a system catalog whereas > a true partial index wouldn't. Turns out that indeed that works just fine. See attached draft patch. regards, tom lane diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 97ef618..dcb8b8e 100644 *** a/doc/src/sgml/catalogs.sgml --- b/doc/src/sgml/catalogs.sgml *** *** 2538,2544 condefault bool ! True if this is the default conversion --- 2538,2545 condefault bool ! True if this is the default conversion for these two encodings, !else NULL diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml index f8c7ac3..eb39a9f 100644 *** a/doc/src/sgml/charset.sgml --- b/doc/src/sgml/charset.sgml *** $ psql -l *** 1102,1108 pg_conversion system catalog. PostgreSQL comes with some predefined conversions, as shown in . You can create a new ! conversion using the SQL command CREATE CONVERSION. --- 1102,1108 pg_conversion system catalog. PostgreSQL comes with some predefined conversions, as shown in . You can create a new ! conversion using the SQL command . diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index a0b42c2..f57d891 100644 *** a/doc/src/sgml/func.sgml --- b/doc/src/sgml/func.sgml *** *** 1488,1494 original encoding is specified by src_encoding. The string must be valid in this encoding. ! Conversions can be defined by CREATE CONVERSION. Also there are some predefined conversions. See for available conversions. --- 1488,1494 original encoding is specified by src_encoding. The string must be valid in this encoding. ! Conversions can be defined with . Also there are some predefined conversions. See for available conversions. *** *** 1525,1534 bytea ! Convert string to dest_encoding. convert_to('some text', 'UTF8') !some text represented in the UTF8 encoding --- 1525,1535 bytea ! Convert string from the database encoding ! to dest_encoding. convert_to('some text', 'UTF8') !some text represented in UTF8 encoding diff --git a/doc/src/sgml/ref/create_conversion.sgml b/doc/src/sgml/ref/create_conversion.sgml index d2e2c01..1933107 100644 *** a/doc/src/sgml/ref/create_conversion.sgml --- b/doc/src/sgml/ref/create_conversion.sgml *** CREATE [ DEFAULT ] CONVERSION CREATE CONVERSION defines a new conversion between !character set encodings. Also, conversions that are marked DEFAULT can be used for automatic encoding conversion between client and server. For this purpose, two conversions, from encoding A to --- 28,34 CREATE CONVERSION defines a new conversion between !character set encodings. Conversions that are marked DEFAULT can be used for automatic encoding conversion between client and server. For this purpose, two conversions, from encoding A to *** CREATE [ DEFAULT ] CONVERSION The DEFAULT clause indicates that this conversion is the default for this particular source to destination !encoding. There should be only one default encoding in a schema for the encoding pair. --- 53,59 The DEFAULT clause indicates that this conversion is the default for this particular source to destination !encoding. There can be only one default conversion in a database for the encoding pair. diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c index 8b105fe..98977f6 100644 *** a/src/backend/catalog/namespace.c --- b/src/backend/catalog/namespace.c *** *** 28,34 #include "catalog/pg_authid.h" #include "catalog/pg_collation.h" #include "catalog/pg_conversion.h" - #include "catalog/pg_conversion_fn.h" #include "catalog/pg_namespace.h" #include "catalog/pg_opclass.h" #include "catalog/pg_operator.h" --- 28,33
Re: [HACKERS] pg_conversion seems rather strangely defined
I wrote: > What is the point of pg_conversion.condefault (the flag that says whether > a conversion is "default")? AFAICS, there is absolutely no way to invoke > a conversion that is not default, which means we might as well eliminate > the concept. > I do not see a lot of point in the namespacing of encoding conversions > either. Does anyone really need or use search-path-dependent lookup of > conversions? (If they do, it's probably broken anyway, since for example > we do not trouble to re-identify the client encoding conversion functions > when search_path changes.) > While actually removing pg_conversion.connamespace might not be worth > the trouble, it's mighty tempting to have just a single unique index on > (conforencoding, contoencoding), thereby enforcing that There Can Be Only > One conversion between any given pair of encodings, and then we can just > use that index to find the right entry without any fooling with search > path. > But in any case we should get rid of the concept of defaultness, because > it's pointless; all entries should be equally "default". After further poking around, I realized that there *used* to be a way to get at non-default encoding conversions, but it was removed here: commit 02138357ffc8c41a3d646035368712a5394f1f5f Author: Andrew Dunstan Date: Mon Sep 24 01:29:30 2007 + Remove "convert 'blah' using conversion_name" facility, because if it produces text it is an encoding hole and if not it's incompatible with the spec, whatever the spec means (which we're not sure about anyway). Now, the big problem there was that the function could produce values of type "text" that violate the current database encoding. We could have retained the facility by making the function take and return bytea, as convert() does today, but apparently nobody spoke up for the need then or since. Still, somebody might want it someday, and it would be pretty messy to remove the concept of a default encoding only to have to put it back later. I still think however that search-path-based lookup of default encoding conversions is a Bad Idea, and that we only ought to allow one default conversion to exist for any pair of encodings. I realized that we could implement that without too much trouble by redefining pg_conversion.condefault as being true for default conversions and NULL (not false) for non-default ones. With this definition, a unique index on pg_conversion (conforencoding, contoencoding, condefault) would have the behavior we want --- sort of a poor man's partial unique index, except that it would work correctly on a system catalog whereas a true partial index wouldn't. Thoughts? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_conversion seems rather strangely defined
What is the point of pg_conversion.condefault (the flag that says whether a conversion is "default")? AFAICS, there is absolutely no way to invoke a conversion that is not default, which means we might as well eliminate the concept. I do not see a lot of point in the namespacing of encoding conversions either. Does anyone really need or use search-path-dependent lookup of conversions? (If they do, it's probably broken anyway, since for example we do not trouble to re-identify the client encoding conversion functions when search_path changes.) While actually removing pg_conversion.connamespace might not be worth the trouble, it's mighty tempting to have just a single unique index on (conforencoding, contoencoding), thereby enforcing that There Can Be Only One conversion between any given pair of encodings, and then we can just use that index to find the right entry without any fooling with search path. But in any case we should get rid of the concept of defaultness, because it's pointless; all entries should be equally "default". regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers