Martijn van Oosterhout <kleptog@svana.org> writes: > On Thu, Sep 07, 2006 at 01:27:01PM +0100, Gregory Stark wrote: >> ... If you look again at the columns in my example you'll >> see none of them are appropriate targets for i18n anyways. They're all codes >> and even numbers. > > Which begs the question of why they don't store the numbers in numeric > columns? That'll take far less space than any string.
Sure, unless by PRVDR_NUM they mean things like '000001' or even 'C00001'. ... > It makes sense to store them as numbers, or perhaps an enum. Sure. If you're designing the schema from scratch and don't have to interoperate with any other systems. But if you're receiving a flat ascii text file and it has a 5 character opaque identifier called "FI_NUM" which do you think is the safer approach for storing these opaque identifiers? You can suggest that there are other ways of designing a schema that will work better with Postgres but I think you're just working around Postgre deficiencies. These may be deficiencies that are pretty low down your priority list but they may be higher up my list. I just don't think you can argue they're not deficiencies just because you know better than to get tripped up by them. I'm discussing these things with an eye to getting some kind of consensus on what should be done about them so I can go do it, not because I'm trying to get you to work on it :) >> Currently Postgres has a limitation that you can only have one encoding per >> database and one locale per cluster. Personally I'm of the opinion that the >> only correct choice for that is "C" and all localization should be handled in >> the client and with pg_strxfrm. Putting the whole database into non-C locales >> guarantees that the columns that should not be localized will have broken >> semantics and there's no way to work around things in the other direction. > > Quite. So if someone would code up SQL COLLATE support and integrate > ICU, everyone would be happy and we could all go home. Well I for one would be pretty unhappy if ICU were integrated. It seems like a whole pile of code and complexity for no particular gain. The standard i18n support with a few extensions (namely strcoll_l) seems to be adequate for us and not introduce huge new dependencies and code burdens. > BTW, requireing localisation to happen in the client is silly. SQL > provides the ORDER BY clause for strings and it'd be silly to have the > client resort them just because they're not using C locale. The point > of a database was to make your life easier, right? That's why I mentioned pg_strxfrm. It doesn't solve all your problems if you're doing lots of string manipulations in queries but it can handle collation so you can at least execute ORDER BY clauses which of course you can't efficiently do in the client. For anything more complex you're probably happier doing your string manipulations in the client just because SQL's string primitives are so, well, primitive. >> Perhaps given the current situation what we should have is a cvarchar and >> cchar data types that are like varchar and char but guaranteed to always be >> interpreted in the c locale with ascii encoding. > > I think bytea gives you that, pretty much. Hm, that's an interesting idea. We could define all the string functions for bytea as well. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly