Re: [HACKERS] Proposal: CREATE CONVERSION
> IMHO basic encoding information like name and id are not problem. > The PQmblen() is big problem. Strange question: is PQmblen() really > needful? I see it's used for result printing, but why backend not > mark size of field (word) to result? If backend good knows size of > data why not send this information to client togeter with data? PQmblen() is used by psql in many places. It is used for parsing query texts supplied by user, not only for data sent from backend. -- Tatsuo Ishii ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Proposal: CREATE CONVERSION
On Thu, Jul 11, 2002 at 06:30:48PM +0900, Tatsuo Ishii wrote: > > > No, it's not a libpq problem, but more common "client/server" problem > > > IMO. It's very hard to share dynamically created object (info) > > > effectively between client and server. > > > > IMHO dynamic object will keep server and client must ask for wanted > > information to server. > > I agree with you. However real problem is how fast it could be. For > example, pg_mblen() is called for each word processed by libpq to know > the byte length of the word. If each call to pg_mblen() accesses > backend, the performance might be unacceptably slow. It must load all relevant information about actual encoding(s) and cache it in libpq. IMHO basic encoding information like name and id are not problem. The PQmblen() is big problem. Strange question: is PQmblen() really needful? I see it's used for result printing, but why backend not mark size of field (word) to result? If backend good knows size of data why not send this information to client togeter with data? Karel -- Karel Zak <[EMAIL PROTECTED]> http://home.zf.jcu.cz/~zakkr/ C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Proposal: CREATE CONVERSION
> > No, it's not a libpq problem, but more common "client/server" problem > > IMO. It's very hard to share dynamically created object (info) > > effectively between client and server. > > IMHO dynamic object will keep server and client must ask for wanted > information to server. I agree with you. However real problem is how fast it could be. For example, pg_mblen() is called for each word processed by libpq to know the byte length of the word. If each call to pg_mblen() accesses backend, the performance might be unacceptably slow. -- Tatsuo Ishii ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Proposal: CREATE CONVERSION
On Thu, Jul 11, 2002 at 05:52:18PM +0900, Tatsuo Ishii wrote: > > > pg_char_to_encoding() is already in libpq. Or am I missing something? > > > > It works with encoding table (pg_enc2name_tbl) and it's compiled > > into backend and client too. It means number of encoding is not possible > > change after compilation and you (user) can't add new encoding without > > pg_enc2name_tbl[] change. I original thought we can add new encodings > > on-the-fly in 7.3 :-) You're right. > > > > IMHO implement "User defined charsets(encodings)" will problem for > > current libpq design. > > No, it's not a libpq problem, but more common "client/server" problem > IMO. It's very hard to share dynamically created object (info) > effectively between client and server. IMHO dynamic object will keep server and client must ask for wanted information to server. Karel -- Karel Zak <[EMAIL PROTECTED]> http://home.zf.jcu.cz/~zakkr/ C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Proposal: CREATE CONVERSION
> > pg_char_to_encoding() is already in libpq. Or am I missing something? > > It works with encoding table (pg_enc2name_tbl) and it's compiled > into backend and client too. It means number of encoding is not possible > change after compilation and you (user) can't add new encoding without > pg_enc2name_tbl[] change. I original thought we can add new encodings > on-the-fly in 7.3 :-) You're right. > > IMHO implement "User defined charsets(encodings)" will problem for > current libpq design. No, it's not a libpq problem, but more common "client/server" problem IMO. It's very hard to share dynamically created object (info) effectively between client and server. -- Tatsuo Ishii ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [HACKERS] Proposal: CREATE CONVERSION
On Thu, Jul 11, 2002 at 05:26:01PM +0900, Tatsuo Ishii wrote: > > Where/how is describe conversion between encoding id and encoding > > name? (I maybe something overlook:-) I expect new encoding system > > will extendable and encodings list not will hardcoded like now. > > (extendable = add new encoding without PostgreSQL rebuild) > > User defined charsets(encodings) is under discussion and I believe it > would not happen for 7.3. > > > BTW, the client site needs routines for work with encoding names too > > (pg_char_to_encoding()). Hmm.. it can't be extendable, or yes? > > pg_char_to_encoding() is already in libpq. Or am I missing something? It works with encoding table (pg_enc2name_tbl) and it's compiled into backend and client too. It means number of encoding is not possible change after compilation and you (user) can't add new encoding without pg_enc2name_tbl[] change. I original thought we can add new encodings on-the-fly in 7.3 :-) You're right. IMHO implement "User defined charsets(encodings)" will problem for current libpq design. Karel -- Karel Zak <[EMAIL PROTECTED]> http://home.zf.jcu.cz/~zakkr/ C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] Proposal: CREATE CONVERSION
> Where/how is describe conversion between encoding id and encoding > name? (I maybe something overlook:-) I expect new encoding system > will extendable and encodings list not will hardcoded like now. > (extendable = add new encoding without PostgreSQL rebuild) User defined charsets(encodings) is under discussion and I believe it would not happen for 7.3. > BTW, the client site needs routines for work with encoding names too > (pg_char_to_encoding()). Hmm.. it can't be extendable, or yes? pg_char_to_encoding() is already in libpq. Or am I missing something? -- Tatsuo Ishii ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Proposal: CREATE CONVERSION
On Thu, Jul 11, 2002 at 03:37:49PM +0900, Tatsuo Ishii wrote: > > > CREATE FUNCTION function_for_LATIN1_to_UTF-8(opaque, opaque, integer) > > > RETURNS integer; > > > CREAE CONVERSION myconversion FOR 'LATIN1' TO 'UNICODE' FROM > > > function_for_LATIN1_to_UTF-8; > > > > Hmm, but it require define "function_for_..." for each conversion. > > For example trigger function I needn't define for each table, but I can > > use only one PostgreSQL function for arbirary table. > > I don't think this is a big problem, IMO. > > However, thinking more, I came to a conclusion that passing encoding > ids would be a good thing. With the encoding id parameters, the > function could check if it is called with correct encodings, and this > would prevent disaster. New interface proposal: OK. > pgconv( > INTEGER,-- source encoding id > INTEGER,-- destination encoding id Where/how is describe conversion between encoding id and encoding name? (I maybe something overlook:-) I expect new encoding system will extendable and encodings list not will hardcoded like now. (extendable = add new encoding without PostgreSQL rebuild) BTW, the client site needs routines for work with encoding names too (pg_char_to_encoding()). Hmm.. it can't be extendable, or yes? > OPAQUE, -- source string (null terminated C string) > OPAQUE, -- destination string (null terminated C string) > INTERGER-- source string length > ) returns INTEGER; -- dummy. returns nothing, actually. Karel -- Karel Zak <[EMAIL PROTECTED]> http://home.zf.jcu.cz/~zakkr/ C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Proposal: CREATE CONVERSION
> > For example you want to define a function for LATIN1 to UNICODE conversion > > function would look like: > > > > function_for_LATIN1_to_UTF-8(from_string opaque, to_string opaque, length > > integer) > > { > > : > > : > > generic_function_using_iconv(from_str, to_str, "ISO-8859-1", "UTF-8", > > length); > > } > > > > CREATE FUNCTION function_for_LATIN1_to_UTF-8(opaque, opaque, integer) > > RETURNS integer; > > CREAE CONVERSION myconversion FOR 'LATIN1' TO 'UNICODE' FROM > > function_for_LATIN1_to_UTF-8; > > Hmm, but it require define "function_for_..." for each conversion. > For example trigger function I needn't define for each table, but I can > use only one PostgreSQL function for arbirary table. I don't think this is a big problem, IMO. However, thinking more, I came to a conclusion that passing encoding ids would be a good thing. With the encoding id parameters, the function could check if it is called with correct encodings, and this would prevent disaster. New interface proposal: pgconv( INTEGER,-- source encoding id INTEGER,-- destination encoding id OPAQUE, -- source string (null terminated C string) OPAQUE, -- destination string (null terminated C string) INTERGER-- source string length ) returns INTEGER; -- dummy. returns nothing, actually. -- Tatsuo Ishii ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Proposal: CREATE CONVERSION
On Wed, 10 Jul 2002, Peter Eisentraut wrote: > Sure. However, Tatsuo maintains that the customary Japanese character > sets don't map very well with Unicode. Personally, I believe that this is > an issue that should be fixed, not avoided, but I don't understand the > issues well enough. I hear this all the time, but I have yet to have someone show me what, Iin SO-2022-JP, EUC-JP or SJIS cannot be transparently translated into Unicode and back. cjs -- Curt Sampson <[EMAIL PROTECTED]> +81 90 7737 2974 http://www.netbsd.org Don't you know, in this new Dark Age, we're all light. --XTC ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Proposal: CREATE CONVERSION
On Wed, 10 Jul 2002 08:21, Peter Eisentraut wrote: > Hannu Krosing writes: ... > > I would even reccommend going a step further and storing all 'national' > > character sets in unicode. > > Sure. However, Tatsuo maintains that the customary Japanese character > sets don't map very well with Unicode. Personally, I believe that this is > an issue that should be fixed, not avoided, but I don't understand the > issues well enough. Presumably improving the Unicode support to cover the full UTF32 (or UCS4) range would help with this. Last time I checked, PostgreSQL only supports the UCS2 subset of Unicode, ie 16 bits. From the Unicode propaganda I've read, it seems that one of the main goals of the expansion of the range beyond 16 bits was to answer the complaints of Japanese users. Tim -- --- Tim Allen [EMAIL PROTECTED] Proximity Pty Ltd http://www.proximity.com.au/ http://www4.tpg.com.au/users/rita_tim/ ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] Proposal: CREATE CONVERSION
Hannu Krosing writes: > Can't we do all collating in unicode and convert charsets A and B to and > >from it ? > > I would even reccommend going a step further and storing all 'national' > character sets in unicode. Sure. However, Tatsuo maintains that the customary Japanese character sets don't map very well with Unicode. Personally, I believe that this is an issue that should be fixed, not avoided, but I don't understand the issues well enough. -- Peter Eisentraut [EMAIL PROTECTED] ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [HACKERS] Proposal: CREATE CONVERSION
Thomas Lockhart writes: > An aside: I was thinking about this some, from the PoV of using our > existing type system to handle this (as you might remember, this is an > inclination I've had for quite a while). I think that most things line > up fairly well to allow this (and having transaction-enabled features > may require it), but do notice that the SQL feature of allowing a > different character set for every column *name* does not map > particularly well to our underlying structures. There more I think about it, the more I come to the conclusion that the SQL framework for "character sets" is both bogus and a red herring. (And it begins with figuring out exactly what a character set is, as opposed to a form-of-use, a.k.a.(?) encoding, but let's ignore that.) The ability to store each column value in a different encoding sounds interesting, because it allows you to create tables such as product_id | product_name_en | product_name_kr | product_name_jp but you might as well create a table such as product_id | lang | product_name with product_name in Unicode, and have a more extensible application that way, too. I think it's fine to have the encoding fixed for the entire database. It sure makes coding easier. If you want to be international, you use Unicode. If not you can "optimize" your database by using a more efficient encoding. In fact, I think we should consider making UTF-8 the default encoding sometime. The real issue is the collation. But the collation is a small subset of the whole locale/character set gobbledigook. Standardized collation rules in standardized forms exist. Finding/creating routines to interpret and apply them should be the focus. SQL's notion to funnel the decision which collation rule to apply through the character sets is bogus. It's impossible to pick a default collation rule for many character sets without applying bias. -- Peter Eisentraut [EMAIL PROTECTED] ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Proposal: CREATE CONVERSION
On Tue, Jul 09, 2002 at 10:07:11AM +0900, Tatsuo Ishii wrote: > > > Use a simple wrap function. > > > > How knows this function to/from encoding? > > For example you want to define a function for LATIN1 to UNICODE conversion > function would look like: > > function_for_LATIN1_to_UTF-8(from_string opaque, to_string opaque, length > integer) > { > : > : > generic_function_using_iconv(from_str, to_str, "ISO-8859-1", "UTF-8", > length); > } > > CREATE FUNCTION function_for_LATIN1_to_UTF-8(opaque, opaque, integer) > RETURNS integer; > CREAE CONVERSION myconversion FOR 'LATIN1' TO 'UNICODE' FROM > function_for_LATIN1_to_UTF-8; Hmm, but it require define "function_for_..." for each conversion. For example trigger function I needn't define for each table, but I can use only one PostgreSQL function for arbirary table. Karel -- Karel Zak <[EMAIL PROTECTED]> http://home.zf.jcu.cz/~zakkr/ C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Proposal: CREATE CONVERSION
> An aside: I was thinking about this some, from the PoV of using our > existing type system to handle this (as you might remember, this is an > inclination I've had for quite a while). I think that most things line > up fairly well to allow this (and having transaction-enabled features > may require it), but do notice that the SQL feature of allowing a > different character set for every column *name* does not map > particularly well to our underlying structures. I've been think this for a while too. What about collation? If we add new chaset A and B, and each has 10 collations then we are going to have 20 new types? That seems overkill to me. -- Tatsuo Ishii ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Proposal: CREATE CONVERSION
On Tue, 2002-07-09 at 03:47, Tatsuo Ishii wrote: > > An aside: I was thinking about this some, from the PoV of using our > > existing type system to handle this (as you might remember, this is an > > inclination I've had for quite a while). I think that most things line > > up fairly well to allow this (and having transaction-enabled features > > may require it), but do notice that the SQL feature of allowing a > > different character set for every column *name* does not map > > particularly well to our underlying structures. > > I've been think this for a while too. What about collation? If we add > new chaset A and B, and each has 10 collations then we are going to > have 20 new types? That seems overkill to me. Can't we do all collating in unicode and convert charsets A and B to and from it ? I would even reccommend going a step further and storing all 'national' character sets in unicode. -- Hannu ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] Proposal: CREATE CONVERSION
SQL99 allows on the fly encoding conversion: CONVERT('aaa' USING myconv 'bbb') So there could be more than 1 conversion for a paticlular encodings pair. This lead to an ambiguity for "default" conversion used for the frontend/backend automatic encoding conversion. Can we add a flag indicating that this is the "default" conversion? The new proposed syntax would be: CREATE CONVERSION FOR TO FROM [DEFAULT] Comments? -- Tatsuo Ishii ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Proposal: CREATE CONVERSION
> > If so, what about the "coercibility" property? > > The standard defines four distinct coercibility properties. So in > > above my example, actually you are going to define 80 new types? > > (also a collation could be either "PAD SPACE" or "NO PAD". So you > > might have 160 new types). > > Well, yes I suppose so. The point is that these relationships *must be > defined anyway*. Allowed and forbidden conversions must be defined, > collation order must be defined, indexing operations must be defined, > etc etc etc. In fact, everything typically associated with a type must > be defined, including the allowed conversions between other types > (character sets/collations). > > So, how are we going to do this *in a general way* without carrying the > infrastructure of a (the) type system along with it? What would we be > able to leave out or otherwise get for free if we use another mechanism? > And is that mechanism fundamentally simpler than (re)using the type > system that we already have? Well, I think charset/collation/coercibility/pad are all string data type specific properties, not common to any other data types. So it seems more appropreate for type systems not to have those certain type specific knowledges. For example, S1 < S2 should raise an error if S1 has "no collating properties" and S2 has "implicit collating properties", while ok if S1 has "no collating properties" and S2 has "explicit collating properties". It would be very hard for the type system to handle this kind of cases since it requires special knowledges about string data type. Alternative? Why don't we have these properties in the string data itself? (probably we do not need to have them on disk storage). Existing text data type has length + data. I suggest to extend it like: length + charset + collation + pad + coercibility + data With this above example could be easily handled by < operator. For index, maybe we could dynamically replace varstr_cmp function according to collation, though I have not actually examined my idea closely. -- Tatsuo Ishii ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] Proposal: CREATE CONVERSION
> > I've been think this for a while too. What about collation? If we add > > new chaset A and B, and each has 10 collations then we are going to > > have 20 new types? That seems overkill to me. > > Well, afaict all of the operations we would ask of a type we will be > required to provide for character sets and collations. So ordering, > conversions, operators, index methods, etc etc are all required. It > *does* seem like a lot of work, but the type system is specifically > designed to do exactly this. Lifting those capabilities out of the type > system only to reimplement them elsewhere seems all trouble with no > upside. If so, what about the "coercibility" property? The standard defines four distinct coercibility properties. So in above my example, actually you are going to define 80 new types? (also a collation could be either "PAD SPACE" or "NO PAD". So you might have 160 new types). -- Tatsuo Ishii ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [HACKERS] Proposal: CREATE CONVERSION
> I've been think this for a while too. What about collation? If we add > new chaset A and B, and each has 10 collations then we are going to > have 20 new types? That seems overkill to me. Well, afaict all of the operations we would ask of a type we will be required to provide for character sets and collations. So ordering, conversions, operators, index methods, etc etc are all required. It *does* seem like a lot of work, but the type system is specifically designed to do exactly this. Lifting those capabilities out of the type system only to reimplement them elsewhere seems all trouble with no upside. Perhaps the current concept of "binary compatible types" could help reduce the complexity, if it were made extensible, which it needs anyway. But in most cases the character set/collation pair is a unique combination, with a limited set of possibilities for other character set/collation pairs with equivalent forms of use, which would keep us from being able to reuse pieces anyway. For most installations, we would install just those character sets the installation/database requires, so in practice the database size need not grow much beyond what it already is. And we could have conventions on how functions and operators are named for a character set and/or collation, so we could auto-generate the SQL definitions given an implementation which meets a template standard. Hmm, an aside which might be relevant: I've been looking at the "national character string" syntax (you know, the N'string' convention) and at the binary and hex string syntax (B'101010' and X'AB1D', as examples) and would like to implement them in the lexer and parser by having the string preceded with a type identifier as though they were something like NATIONAL CHARACTER 'string' BIN '101010' HEX 'AB1D' where both BIN and HEX result in the *same* underlying data type once ingested (or at least a reasonable facimile). I won't be allowed to create two data types with the same type OID, but maybe if I assign them to be binary compatible then I won't have to flesh out the hex data type but only provide an input and output function. - Thomas ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] Proposal: CREATE CONVERSION
> > (1) a CONVERSION can only be dropped by the superuser or its owner. > > Okay ... > > > (2) a grant syntax for CONVERSION is: > > > GRANT USAGE ON CONVERSION to > > { | GROUP | PUBLIC} [, ...] > > No, I don't think a conversion has any privileges of its own at all. > You either have USAGE on the underlying function, or not. I see. -- Tatsuo Ishii ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Proposal: CREATE CONVERSION
> When you say "We do not yet implement the SQL99 forms of character > support", I think you mean the ability to specify per column (or even > per string) charset. I don't think this would happen for 7.3(or 8.0 > whatever), but sometime later I would like to make it reality. Right. An aside: I was thinking about this some, from the PoV of using our existing type system to handle this (as you might remember, this is an inclination I've had for quite a while). I think that most things line up fairly well to allow this (and having transaction-enabled features may require it), but do notice that the SQL feature of allowing a different character set for every column *name* does not map particularly well to our underlying structures. - Thomas ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Proposal: CREATE CONVERSION
> > Here is a proposal for new pg_conversion system table. Comments? > > I wonder if the encodings themselves shouldn't be represented in some > system table, too. Admittedly, this is nearly orthogonal to the proposed > system table, except perhaps the data type of the two encoding fields. That would be ideal, but I think that would happen at the same time when CREATE CHARACTER SET would be implemented. -- Tatsuo Ishii ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [HACKERS] Proposal: CREATE CONVERSION
> > Use a simple wrap function. > > How knows this function to/from encoding? For example you want to define a function for LATIN1 to UNICODE conversion function would look like: function_for_LATIN1_to_UTF-8(from_string opaque, to_string opaque, length integer) { : : generic_function_using_iconv(from_str, to_str, "ISO-8859-1", "UTF-8", length); } CREATE FUNCTION function_for_LATIN1_to_UTF-8(opaque, opaque, integer) RETURNS integer; CREAE CONVERSION myconversion FOR 'LATIN1' TO 'UNICODE' FROM function_for_LATIN1_to_UTF-8; ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Proposal: CREATE CONVERSION
> If so, what about the "coercibility" property? > The standard defines four distinct coercibility properties. So in > above my example, actually you are going to define 80 new types? > (also a collation could be either "PAD SPACE" or "NO PAD". So you > might have 160 new types). Well, yes I suppose so. The point is that these relationships *must be defined anyway*. Allowed and forbidden conversions must be defined, collation order must be defined, indexing operations must be defined, etc etc etc. In fact, everything typically associated with a type must be defined, including the allowed conversions between other types (character sets/collations). So, how are we going to do this *in a general way* without carrying the infrastructure of a (the) type system along with it? What would we be able to leave out or otherwise get for free if we use another mechanism? And is that mechanism fundamentally simpler than (re)using the type system that we already have? - Thomas ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] Proposal: CREATE CONVERSION
Tatsuo Ishii <[EMAIL PROTECTED]> writes: >> I believe the spec just demands USAGE on the underlying function for >> the TRANSLATE case, and I don't see why it should be different for >> CONVERT. (In principle, if we didn't use a C-only API, you could >> just call the underlying function directly; so there's little point >> in having protection restrictions different from that case.) > Ok, so: > (1) a CONVERSION can only be dropped by the superuser or its owner. Okay ... > (2) a grant syntax for CONVERSION is: > GRANT USAGE ON CONVERSION to > { | GROUP | PUBLIC} [, ...] No, I don't think a conversion has any privileges of its own at all. You either have USAGE on the underlying function, or not. regards, tom lane ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Proposal: CREATE CONVERSION
> I believe the spec just demands USAGE on the underlying function for > the TRANSLATE case, and I don't see why it should be different for > CONVERT. (In principle, if we didn't use a C-only API, you could > just call the underlying function directly; so there's little point > in having protection restrictions different from that case.) Ok, so: (1) a CONVERSION can only be dropped by the superuser or its owner. (2) a grant syntax for CONVERSION is: GRANT USAGE ON CONVERSION to { | GROUP | PUBLIC} [, ...] -- Tatsuo Ishii ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Proposal: CREATE CONVERSION
> Tatsuo, it seems that we should use SQL99 terminology and commands where > appropriate. We do not yet implement the SQL99 forms of character > support, and I'm not sure if our current system is modeled to fit the > SQL99 framework. Are you suggesting CREATE CONVERSION to avoid > infringing on SQL99 syntax to allow us to use that sometime later? I'm not sure I understand your question, but I would say I would like to follow SQL99 as much as possible. When you say "We do not yet implement the SQL99 forms of character support", I think you mean the ability to specify per column (or even per string) charset. I don't think this would happen for 7.3(or 8.0 whatever), but sometime later I would like to make it reality. -- Tatsuo Ishii ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Proposal: CREATE CONVERSION
Tatsuo Ishii writes: > Here is a proposal for new pg_conversion system table. Comments? I wonder if the encodings themselves shouldn't be represented in some system table, too. Admittedly, this is nearly orthogonal to the proposed system table, except perhaps the data type of the two encoding fields. -- Peter Eisentraut [EMAIL PROTECTED] ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Proposal: CREATE CONVERSION
Thomas Lockhart writes: > Tatsuo, it seems that we should use SQL99 terminology and commands where > appropriate. We do not yet implement the SQL99 forms of character > support, and I'm not sure if our current system is modeled to fit the > SQL99 framework. Are you suggesting CREATE CONVERSION to avoid > infringing on SQL99 syntax to allow us to use that sometime later? SQL99 says that the method by which conversions are created is implementation-defined. Tatsuo is defining the implementation. -- Peter Eisentraut [EMAIL PROTECTED] ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Proposal: CREATE CONVERSION
... > So I withdraw my earlier comment. But perhaps the syntax of the proposed > command could be aligned with the CREATE TRANSLATION command. Tatsuo, it seems that we should use SQL99 terminology and commands where appropriate. We do not yet implement the SQL99 forms of character support, and I'm not sure if our current system is modeled to fit the SQL99 framework. Are you suggesting CREATE CONVERSION to avoid infringing on SQL99 syntax to allow us to use that sometime later? - Thomas ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [HACKERS] Proposal: CREATE CONVERSION
Tatsuo Ishii <[EMAIL PROTECTED]> writes: > BTW, I wonder if we should invent new access privilege for conversion. I believe the spec just demands USAGE on the underlying function for the TRANSLATE case, and I don't see why it should be different for CONVERT. (In principle, if we didn't use a C-only API, you could just call the underlying function directly; so there's little point in having protection restrictions different from that case.) regards, tom lane ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Proposal: CREATE CONVERSION
Tatsuo Ishii <[EMAIL PROTECTED]> writes: > CATALOG(pg_conversion) > { > NameDataconname; > Oid connamespace; > int4conowner; > int4conforencoding; > int4contoencoding; > Oid conproc; > } FormData_pg_conversion; Should use type "regproc" for conproc, I think. regards, tom lane ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Proposal: CREATE CONVERSION
On Mon, Jul 08, 2002 at 09:59:44PM +0900, Tatsuo Ishii wrote: > > On Sun, Jul 07, 2002 at 12:58:07PM +0200, Peter Eisentraut wrote: > > > What would be really cool is if we could somehow reuse the conversion > > > modules provided by the C library and/or the iconv library. For example, > > ^^^ > > > > Very good point. Why use own conversion routines/tables if there is common > > library for this? > > I'm not still sure about the details of conversion map used by > iconv. Japanese users have enough trouble with the conversin between > Unicode and othe charsets. This is because there are many variation of > conversion maps provided by vendors. For example, the conversion map > used for Unicode and SJIS in PostgreSQL has been carefully designed to > minimize problems described above. Another issue is the availabilty of > iconv among platforms. If we are sure that a particlular iconv > conversion routine is available on all platforms and the conversion > result is good eough, our conversion routine could be replaced by new > one using iconv. This is not problem if we will have some common API. You can use current conversion tables (maps) and for example I can use iconv on my i386/Linux. I don't want to replace current maps if somebody needs it. I would like to API. I see iconv is included into glibc now. > > I unsure with only one argument for encoding function. What if I want > > to use one generic function for all encodings (for example as API to > > iconv)? > > Use a simple wrap function. How knows this function to/from encoding? Karel -- Karel Zak <[EMAIL PROTECTED]> http://home.zf.jcu.cz/~zakkr/ C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Proposal: CREATE CONVERSION
> On Sun, Jul 07, 2002 at 12:58:07PM +0200, Peter Eisentraut wrote: > > What would be really cool is if we could somehow reuse the conversion > > modules provided by the C library and/or the iconv library. For example, > ^^^ > > Very good point. Why use own conversion routines/tables if there is common > library for this? I'm not still sure about the details of conversion map used by iconv. Japanese users have enough trouble with the conversin between Unicode and othe charsets. This is because there are many variation of conversion maps provided by vendors. For example, the conversion map used for Unicode and SJIS in PostgreSQL has been carefully designed to minimize problems described above. Another issue is the availabilty of iconv among platforms. If we are sure that a particlular iconv conversion routine is available on all platforms and the conversion result is good eough, our conversion routine could be replaced by new one using iconv. > The encoding API for PostgreSQL is really cool idea. > > I unsure with only one argument for encoding function. What if I want > to use one generic function for all encodings (for example as API to > iconv)? Use a simple wrap function. -- Tatsuo Ishii ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Proposal: CREATE CONVERSION
On Sun, Jul 07, 2002 at 12:58:07PM +0200, Peter Eisentraut wrote: > What would be really cool is if we could somehow reuse the conversion > modules provided by the C library and/or the iconv library. For example, ^^^ Very good point. Why use own conversion routines/tables if there is common library for this? The encoding API for PostgreSQL is really cool idea. I unsure with only one argument for encoding function. What if I want to use one generic function for all encodings (for example as API to iconv)? I think better C interface is: encode( TEXT data, NAME from, NAME to ); where from/to are encoding names. The other way is use some struct that handle this information -- like ARGS in trigger functions. Karel -- Karel Zak <[EMAIL PROTECTED]> http://home.zf.jcu.cz/~zakkr/ C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Proposal: CREATE CONVERSION
Here is a proposal for new pg_conversion system table. Comments? /*- * * pg_conversion.h *definition of the system "conversion" relation (pg_conversion) *along with the relation's initial contents. * * * Portions Copyright (c) 1996-2002, PostgreSQL Global Development Group * Portions Copyright (c) 1994, Regents of the University of California * * $Id$ * * NOTES *the genbki.sh script reads this file and generates .bki *information from the DATA() statements. * *- */ #ifndef PG_CONVERSION_H #define PG_CONVERSION_H /* * postgres.h contains the system type definitions and the * CATALOG(), BOOTSTRAP and DATA() sugar words so this file * can be read by both genbki.sh and the C compiler. * */ /* * pg_conversion definition. * * cpp turns this into typedef struct FormData_pg_namespace * * conname name of the conversion * connamespacename space which the conversion belongs to * conownerower of the conversion * conforencoding FOR encoding id * contoencoding TO encoding id * conproc OID of the conversion proc * */ CATALOG(pg_conversion) { NameDataconname; Oid connamespace; int4conowner; int4conforencoding; int4contoencoding; Oid conproc; } FormData_pg_conversion; /* * Form_pg_conversion corresponds to a pointer to a tuple with * the format of pg_conversion relation. * */ typedef FormData_pg_conversion *Form_pg_conversion; /* * compiler constants for pg_conversion * */ #define Natts_pg_conversion 6 #define Anum_pg_conversion_conpname 1 #define Anum_pg_conversion_connamespace 2 #define Anum_pg_conversion_conowner 3 #define Anum_pg_conversion_conforencoding 4 #define Anum_pg_conversion_contoencoding5 #define Anum_pg_conversion_conproc 6 /* * initial contents of pg_conversion * --- */ /* * prototypes for functions in pg_conversion.c */ extern Oid ConversionCreate(const char *conname, Oid connamespace, int32 conowner, int4 conforencoding, int4 contoencoding, Oid conproc); #endif /* PG_CONVERSION_H */ ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Proposal: CREATE CONVERSION
> So I withdraw my earlier comment. But perhaps the syntax of the proposed > command could be aligned with the CREATE TRANSLATION command. Ok. What about this? CREATE CONVERSION FOR TO FROM DROP CONVERSION BTW, I wonder if we should invent new access privilege for conversion. -- Tatsuo Ishii ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] Proposal: CREATE CONVERSION
Tom Lane writes: > One thing that's really unclear to me is what's the difference between > a and a , other than > that they didn't provide a syntax for defining new conversions. The standard has this messed up. In part 1, a form-of-use and an encoding are two distinct things that can be applied to a character repertoire (see clause 4.6.2.1), whereas in part 2 the term encoding is used in the definition of form-of-use (clause 3.1.5 r). When I sort it out, however, I think that what Tatsuo was describing is indeed a form-of-use conversion. Note that in part 2, clause 4.2.2.1, it says about form-of-use conversions, It is intended, though not enforced by this part of ISO/IEC 9075, that S2 be exactly the same sequence of characters as S1, but encoded according some different form-of-use. A typical use might be to convert a character string from two-octet UCS to one-octet Latin1 or vice versa. This seems to match what we're doing. A character translation does not make this requirement and it explicitly calls out the possibility of "many-to-one or one-to-one mapping between two not necessarily distinct character sets". I imagine that what this is intended to do is to allow the user to create mappings such as ö -> oe (as is common in German to avoid using characters with diacritic marks), or ö -> o (as one might do in French to achieve the same). In fact, it's a glorified sed command. So I withdraw my earlier comment. But perhaps the syntax of the proposed command could be aligned with the CREATE TRANSLATION command. -- Peter Eisentraut [EMAIL PROTECTED] ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [HACKERS] Proposal: CREATE CONVERSION
Tatsuo Ishii <[EMAIL PROTECTED]> writes: > I guess you mix up SQL99's "trasnlate" and "convert". No, I believe Peter has read the spec correctly. Further down they have is a function for changing each character of a given string according to some many-to-one or one-to-one mapping between two not necessarily distinct character sets. So this is intended as a one-character-at-a-time mapping, not a language translation (which would be far beyond what anyone would expect of a database anyway). One thing that's really unclear to me is what's the difference between a and a , other than that they didn't provide a syntax for defining new conversions. regards, tom lane ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] Proposal: CREATE CONVERSION
> Tatsuo Ishii writes: > > > > Also, is there anything in SQL99 that we ought to try to be > > > compatible with? > > > > As far as I know there's no such an equivalent in SQL99. > > Sure: > > 11.34 I guess you mix up SQL99's "trasnlate" and "convert". As far as I know, SQL99's "translation" is exactly a translation. e.g. rr) translation: A method of translating characters in one character repertoire into characters of the same or a different character repertoire. For example, certain "translation" might take an input of Engish text, and makes an output of Japanese one (I don't know if we could implement such a translation though :-). On the other hand "convert" just changes the "form-of-use" (SQL's term, actually equivalent to "encoding"), keeping the character repertoire. -- Tatsuo Ishii ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [HACKERS] Proposal: CREATE CONVERSION
Tatsuo Ishii writes: > > Also, is there anything in SQL99 that we ought to try to be > > compatible with? > > As far as I know there's no such an equivalent in SQL99. Sure: 11.34 Function Define a character translation. Format ::= CREATE TRANSLATION FOR TO FROM ::= ::= ::= | ::= ::= That's pretty much exactly what you are descibing. What would be really cool is if we could somehow reuse the conversion modules provided by the C library and/or the iconv library. For example, I have 176 "modules" under /usr/lib/gconv. They should be useful for something. -- Peter Eisentraut [EMAIL PROTECTED] ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] Proposal: CREATE CONVERSION
> Tatsuo Ishii <[EMAIL PROTECTED]> writes: > > I am worried about that too. But if we stick a C-level API, how can we > > define the argument data type suitable for C string? I don't see such > > data types. Maybe you are suggesting that we should not use CREATE > > FUNCTION? > > Well, you'd have to use the same cheat that's used for selectivity > estimation functions, triggers, I/O functions and everything else that > deals in internal datatypes: declare the function as taking and > returning OPAQUE. This is moderately annoying but I don't think > there's anything really wrong with it in practice. Oh, I see. -- Tatsuo Ishii ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Proposal: CREATE CONVERSION
Tatsuo Ishii <[EMAIL PROTECTED]> writes: > I am worried about that too. But if we stick a C-level API, how can we > define the argument data type suitable for C string? I don't see such > data types. Maybe you are suggesting that we should not use CREATE > FUNCTION? Well, you'd have to use the same cheat that's used for selectivity estimation functions, triggers, I/O functions and everything else that deals in internal datatypes: declare the function as taking and returning OPAQUE. This is moderately annoying but I don't think there's anything really wrong with it in practice. regards, tom lane ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [HACKERS] Proposal: CREATE CONVERSION
> I see two different functions linked to from each pg_wchar_table > entry... although perhaps those are associated with encodings > not with conversions. Yes. those are not directly associated with conversions. > IIRC the existing conversion functions deal in C string pointers and > lengths. I'm a little worried about the extra overhead implicit > in converting to a TEXT object and back again; that probably means at > least two more palloc and memcpy operations. I think you'd be better > off sticking to a C-level API, because I really don't believe that > anyone is going to code conversion functions in (say) plpgsql. I am worried about that too. But if we stick a C-level API, how can we define the argument data type suitable for C string? I don't see such data types. Maybe you are suggesting that we should not use CREATE FUNCTION? -- Tatsuo Ishii ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Proposal: CREATE CONVERSION
Tatsuo Ishii wrote: > Here is my proposal for new CREATE CONVERSION which makes it possible > to define new encoding conversion mapping between two encodings on the > fly. > > The background: > > We are getting having more and more encoding conversion tables. Up to > now, they reach to 385352 source lines and over 3MB in compiled forms > in total. They are statically linked to the backend. I know this > itself is not a problem since modern OSs have smart memory management > capabilities to fetch only necessary pages from a disk. However, I'm > worried about the infinite growing of these static tables. I think > users won't love 50MB PostgreSQL backend load module. Yes, those conversion tables are getting huge in the tarball too: $ pwd /pg/backend/utils/mb $ du 4 ./CVS 7 ./Unicode/CVS 9541./Unicode 15805 . Look at these two file alone: -rw-r--r-- 1 postgres wheel 1427492 Jun 13 04:28 gb18030_to_utf8.map -rw-r--r-- 1 postgres wheel 1427492 Jun 13 04:28 utf8_to_gb18030.map If we can make these loadable, that would be good. What would be really interesting is if we could split these out into a separate directory/project so development on those could take place in an independent way. This would probably stimulate even more encoding options for users. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup.| Drexel Hill, Pennsylvania 19026 ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Proposal: CREATE CONVERSION
> > CREATE CONVERSION > >SOURCE > >DESTINATION > >FROM > > Doesn't a conversion currently require several support functions? > How much overhead will you be adding to funnel them all through > one function? No, only one function is sufficient. What else do you think of? > Basically I'd like to see a spec for the API of the conversion > function... That would be very simple (the previous example I gave was unnecessary complex). The function signature would look like: conversion_funcion(TEXT) RETURNS TEXT It receives source text and converts it then returns it. That's all. > Also, is there anything in SQL99 that we ought to try to be > compatible with? As far as I know there's no such an equivalent in SQL99. -- Tatsuo Ishii ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Proposal: CREATE CONVERSION
Tatsuo Ishii <[EMAIL PROTECTED]> writes: >> Doesn't a conversion currently require several support functions? >> How much overhead will you be adding to funnel them all through >> one function? > No, only one function is sufficient. What else do you think of? I see two different functions linked to from each pg_wchar_table entry... although perhaps those are associated with encodings not with conversions. >> Basically I'd like to see a spec for the API of the conversion >> function... > That would be very simple (the previous example I gave was unnecessary > complex). The function signature would look like: > conversion_funcion(TEXT) RETURNS TEXT > It receives source text and converts it then returns it. That's all. IIRC the existing conversion functions deal in C string pointers and lengths. I'm a little worried about the extra overhead implicit in converting to a TEXT object and back again; that probably means at least two more palloc and memcpy operations. I think you'd be better off sticking to a C-level API, because I really don't believe that anyone is going to code conversion functions in (say) plpgsql. regards, tom lane ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Proposal: CREATE CONVERSION
Tatsuo Ishii <[EMAIL PROTECTED]> writes: > Syntax proposal: > CREATE CONVERSION >SOURCE >DESTINATION >FROM Doesn't a conversion currently require several support functions? How much overhead will you be adding to funnel them all through one function? Basically I'd like to see a spec for the API of the conversion function... Also, is there anything in SQL99 that we ought to try to be compatible with? regards, tom lane ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly