Re: [GENERAL] Collate order on Mac OS X, text with diacritics in UTF-8

2010-01-18 Thread Martin Flahault

Very interesting discussion indeed.

It seems that Postgresql:The world's most advanced open source database can 
not work properly on Mac OS X: the world's most advanced operating system and 
FreeBSD.

Don't you think postgresql.org should remove from their download page the links 
to FreeBSD and Mac OS X binary packages?

Martin Flahault 



Le 18 janv. 2010 à 08:10, Martijn van Oosterhout a écrit :

 On Sat, Jan 16, 2010 at 09:10:53PM +, Greg Stark wrote:
 Switching to ICU means trading our current inconsistency from platform
 to platform for a different inconsistency which would be better in
 some cases and worse in others.
 
 Or, you can have the cake and eat it too. That is, aim for the end goal
 and let people choose what library they want to use for sorting (that
 is, extend the meaning of the locale identifier). Patches for this
 should be in the archives somewhere. As I recall the reason this was
 rejected is that *BSD lack the capability of handling multiple
 collation algorithms at all at the libc level (that is, if you don't
 just tell people to use ICU in that case).
 
 Mac OS X doesn't have great POSIX locale support but at least they
 implemented strcoll_l.
 
 Have a nice day,
 -- 
 Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 Please line up in a tree and maintain the heap invariant while 
 boarding. Thank you for flying nlogn airlines.





-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Collate order on Mac OS X, text with diacritics in UTF-8

2010-01-17 Thread Martijn van Oosterhout
On Sat, Jan 16, 2010 at 09:10:53PM +, Greg Stark wrote:
 Switching to ICU means trading our current inconsistency from platform
 to platform for a different inconsistency which would be better in
 some cases and worse in others.

Or, you can have the cake and eat it too. That is, aim for the end goal
and let people choose what library they want to use for sorting (that
is, extend the meaning of the locale identifier). Patches for this
should be in the archives somewhere. As I recall the reason this was
rejected is that *BSD lack the capability of handling multiple
collation algorithms at all at the libc level (that is, if you don't
just tell people to use ICU in that case).

Mac OS X doesn't have great POSIX locale support but at least they
implemented strcoll_l.

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 Please line up in a tree and maintain the heap invariant while 
 boarding. Thank you for flying nlogn airlines.


signature.asc
Description: Digital signature


Re: [GENERAL] Collate order on Mac OS X, text with diacritics in UTF-8

2010-01-16 Thread Greg Stark
On Thu, Jan 14, 2010 at 6:15 AM, Craig Ringer
cr...@postnewspapers.com.au wrote:
 Out of interest: Why not?

 There's plenty of discussion in the archives about it, but basically
 ICU would represent a pretty enormous dependency and would lock us in
 to having no other backend encoding but UTF8.

 Thanks. You're right - I should've just STFA ;-) so I appreciate the
 brief explanation.

There's also the question of whether being inconsistent with the rest
of the system is really the right thing at all. If a programmer does a
sort in the database and then writes application code using the same
collation on the same system which depends on the data being sorted
it's nice that that works. Or if an admin presorts the data using sort
before doing a data load he might reasonable expect the table to be
effectively clustered.

Switching to ICU means trading our current inconsistency from platform
to platform for a different inconsistency which would be better in
some cases and worse in others.


-- 
greg

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Collate order on Mac OS X, text with diacritics in UTF-8

2010-01-13 Thread Maximilian Tyrtania
Am 12.01.2010 um 12:36 schrieb Martin Flahault:

 We have spend some time evaluating PostgreSQL and we can't get correct 
 outputs with the ORDER BY command.
 LC_COLLATE and LC_CTYPE are set to fr_FR.UTF-8.
 
 It seems there is a known problem with the collating order of text including 
 diacritics with the UTF8 encodings on BSD systems.
 
 Does anyone know a workaround ?

The best i've seen so far is:

CREATE OR REPLACE FUNCTION f_getorderbyfriendlyversion(texttoconvert text)
 RETURNS text AS
$BODY$
select translate(upper($1),'ÄÖÜ','AOU')--add french diacritical characters here
$BODY$
 LANGUAGE 'sql' IMMUTABLE STRICT
 COST 100;
ALTER FUNCTION f_getorderbyfriendlyversion(text) OWNER TO postgres;

CREATE INDEX idx_mytable_myfield_orderbyfriendly
 ON mytable
 USING btree
 (f_getorderbyfriendlyversion(myfield::text));

Select * from mytable order by f_getorderbyfriendlyversion(myfield);

Not an ideal solution, but it seems to perform quite well.

Best wishes from Berlin,

Maximilian Tyrtania

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Collate order on Mac OS X, text with diacritics in UTF-8

2010-01-13 Thread Craig Ringer

On 12/01/2010 7:36 PM, Martin Flahault wrote:

Hi,

We are a software publisher searching for a new DBMS for our software.
We have more than one hundred installed servers, running Mac OS and a
Primebase database.

We have spend some time evaluating PostgreSQL and we can't get correct
outputs with the ORDER BY command.


Can you provide a sample? Include a table of sample values, an example 
query, its output, and what you'd expect to get instead? And why?


--
Craig Ringer

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Collate order on Mac OS X, text with diacritics in UTF-8

2010-01-13 Thread Martin Flahault

Here is an exemple :

postgres=# create database newbase;
CREATE DATABASE
postgres=# \c newbase;
psql (8.4.2)
You are now connected to database newbase.
newbase=# create table t1 (contenu text);
CREATE TABLE
newbase=# insert into t1 values ('a'), ('e'), ('à'), ('é'), ('A'), ('E');
INSERT 0 6

newbase=# select * from t1 order by contenu;
 contenu 
-
 A
 E
 a
 e
 à
 é
(6 rows)

newbase=# select * from t1 order by upper(contenu);
 contenu 
-
 a
 A
 e
 E
 à
 é
(6 rows)


Here is the encoding informations :

newbase=# \encoding
UTF8
newbase=# show lc_collate;
 lc_collate 

 fr_FR
(1 row)

newbase=# show lc_ctype;
 lc_ctype 
--
 fr_FR
(1 row)


As with others DBMS (MySQL for example), diacritics should be ignored when 
determining the sort order. Here is the expected output:
 a
 à
 A
 e
 é 
 E


It seems there is a problem with the collating order on BSD systems with 
diacritics using UTF8.
If you put this text :
a
A
à
é
e
E

in a UTF8 text file and use the sort command on it, you will have the same 
wrong output as with PostgreSQL :
A
E
a
e
à
é

Hope this will help,

Martin



Re: [GENERAL] Collate order on Mac OS X, text with diacritics in UTF-8

2010-01-13 Thread Martijn van Oosterhout
On Wed, Jan 13, 2010 at 04:15:06PM +0100, Martin Flahault wrote:

[postgres]
 newbase=# select * from t1 order by contenu;
  contenu 
 -
  A
  E
  a
  e

Postgresql outputs whatever the C library does on the underlying
system. The quality of this varies wildly.
  à
 As with others DBMS (MySQL for example), diacritics should be ignored when 
 determining the sort order. Here is the expected output:

MySQL implements the unicode collation algorithm, which means it
essentially does what you want.
 
 It seems there is a problem with the collating order on BSD systems with 
 diacritics using UTF8.

Last I checked, BSD did not support useful sorting on UTF-8 at all, so
it's not surprised it doesn't work.

 in a UTF8 text file and use the sort command on it, you will have the same 
 wrong output as with PostgreSQL :

Yes, that's the basic idea. Mac OS X apparently provides ICU underneath
for programs that would like true unicode collation, but there is
little chance that postgresql will ever use this.

Hope this helps,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 Please line up in a tree and maintain the heap invariant while 
 boarding. Thank you for flying nlogn airlines.


signature.asc
Description: Digital signature


Re: [GENERAL] Collate order on Mac OS X, text with diacritics in UTF-8

2010-01-13 Thread Craig Ringer

On 13/01/2010 11:15 PM, Martin Flahault wrote:


It seems there is a problem with the collating order on BSD systems with
diacritics using UTF8.
If you put this text :
a
A
à
é
e
E

in a UTF8 text file and use the sort command on it, you will have the
same wrong output as with PostgreSQL :
A
E
a
e
à
é


First: PostgreSQL expects the OS to behave correctly and sort according 
to the locale. It relies on the C library for this. If the C library 
doesn't do it right, PostgreSQL won't do it right either. So you need to 
get Mac OS X to do the right thing.


Your results match what I get on a Linux system without a properly 
generated fr_FR.UTF-8 locale. Libc falls back on the C locale, which 
sorts that way.


If I generate the fr_FR.UTF-8 locale and run the sort (on the file x), 
I get the desired result:


LANG=fr_FR.UTF-8 LC_ALL=fr_FR.UTF-8 sort x
a
A
à
e
E
é

I don't know Mac OS X well, but this is making me wonder if maybe you're 
just missing the required information for the locale, so libc is falling 
back on the C locale.


(Of course, being Mac OS X there are probably at least three out of date 
or simply false man pages describing the behaviour, none of which 
reflect the reality of a magic config key buried somewhere in NetInfo, 
for which the documentation is also completely out of date. Bitter? Me? 
Yeah, I admin a bunch of OS X machines on a business network.)


Hmm... a quick test suggests that Mac OS X (testing on 10.4) at least 
*thinks* it supports the fr_FR.UTF-8 locale:


osx104$ LANG=xxx LC_ALL=xxx locale
LANG=xxx
LC_COLLATE=C
LC_CTYPE=C
LC_MESSAGES=C
LC_MONETARY=C
LC_NUMERIC=C
LC_TIME=C
LC_ALL=C

osx104$ LANG=fr_FR.UTF-8 LC_ALL=fr_FR.UTF-8 locale
LANG=fr_FR.UTF-8
LC_COLLATE=fr_FR.UTF-8
LC_CTYPE=fr_FR.UTF-8
LC_MESSAGES=fr_FR.UTF-8
LC_MONETARY=fr_FR.UTF-8
LC_NUMERIC=fr_FR.UTF-8
LC_TIME=fr_FR.UTF-8
LC_ALL=fr_FR.UTF-8

osx104$ locale -a  | grep fr_FR
fr_FR
fr_FR.ISO8859-1
fr_FR.ISO8859-15
fr_FR.UTF-8

... yet it clearly doesn't:

osx104$ LANG=C LC_ALL=C sort x
A
E
a
e
à
é
osx104$ LANG=fr_FR.UTF-8 LC_ALL=fr_FR.UTF-8 sort x
A
E
a
e
à
é
osx104$ LANG=fr_FR.ISO8859-1 LC_ALL=fr_FR.ISO8859-1 sort x
A
E
a
e
à
é

Mac OS X seems to keep its locale config in /usr/share/locale . Looking 
there, there are clearly LC_COLLATE files for fr_FR.UTF-8 . However, 
they're identical to those for en_US.UTF-8:


osx104$ cd /usr/share/locale
osx104$ diff fr_FR.UTF-8/LC_COLLATE en_US.UTF-8/LC_COLLATE

... so your OS's localized collation support is broken/missing, at least 
if the same is true for more modern versions of OS X.


--
Craig Ringer

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Collate order on Mac OS X, text with diacritics in UTF-8

2010-01-13 Thread Craig Ringer
Martijn van Oosterhout wrote:

 in a UTF8 text file and use the sort command on it, you will have the same 
 wrong output as with PostgreSQL :
 
 Yes, that's the basic idea. Mac OS X apparently provides ICU underneath
 for programs that would like true unicode collation, but there is
 little chance that postgresql will ever use this.

Out of interest: Why not?

Using ICU would permit Pg to be independent of libc's collation rules,
finally permitting things like specifying a specific collation for a
textual sort. It'd make mixing data from different locales in a database
a lot easier (read: possible to do correctly).

Is this just a matter of nobody cares enough to produce a solid, tested
patch with equivalent performance that doesn't turn people who try to
review it green with disgust ... or are there specific reasons why
using something like ICU instead of libc's locale support is not
appropriate for Pg?

--
Craig Ringer

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Collate order on Mac OS X, text with diacritics in UTF-8

2010-01-13 Thread Tom Lane
Craig Ringer cr...@postnewspapers.com.au writes:
 Martijn van Oosterhout wrote:
 Yes, that's the basic idea. Mac OS X apparently provides ICU underneath
 for programs that would like true unicode collation, but there is
 little chance that postgresql will ever use this.

 Out of interest: Why not?

There's plenty of discussion in the archives about it, but basically
ICU would represent a pretty enormous dependency and would lock us in
to having no other backend encoding but UTF8.

The state of OS X's POSIX-spec locale support is pretty pitiful, but on
the whole I'd say if you need better UTF8 locale support you could use
another OS.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Collate order on Mac OS X, text with diacritics in UTF-8

2010-01-13 Thread Craig Ringer
Tom Lane wrote:
 Craig Ringer cr...@postnewspapers.com.au writes:
 Martijn van Oosterhout wrote:
 Yes, that's the basic idea. Mac OS X apparently provides ICU underneath
 for programs that would like true unicode collation, but there is
 little chance that postgresql will ever use this.
 
 Out of interest: Why not?
 
 There's plenty of discussion in the archives about it, but basically
 ICU would represent a pretty enormous dependency and would lock us in
 to having no other backend encoding but UTF8.

Thanks. You're right - I should've just STFA ;-) so I appreciate the
brief explanation.

 The state of OS X's POSIX-spec locale support is pretty pitiful, but on
 the whole I'd say if you need better UTF8 locale support you could use
 another OS.

That's my personal opinion too ... I have the pleasure of
administrating an OS X Server and six Mac Pro clients at work, and have
become well acquainted with the exciting variety of bugs, undocumented
features, and bizarre quirks of that particular OS. POSIX locale
issues are the least of its issues.

Alas, people will want to run Pg on it anyway, especially when bundling
with an app. It'd be nice if it could be made to work smoothly ... but I
certainly don't care enough to try! Like you, I favour using an OS that
follows the specs it claims to support instead.

Perhaps someone who wants to use Mac OS X and Pg for their product will
come forward with some compat wrapper functions for the localizable
libc/posix functions, so Pg can just be built against the wrapper and
the rest of us need not care about OS X's bugs.

--
Craig Ringer

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Collate order on Mac OS X, text with diacritics in UTF-8

2010-01-13 Thread Scott Marlowe
On Wed, Jan 13, 2010 at 11:15 PM, Craig Ringer
cr...@postnewspapers.com.au wrote:
 Perhaps someone who wants to use Mac OS X and Pg for their product will
 come forward with some compat wrapper functions for the localizable
 libc/posix functions, so Pg can just be built against the wrapper and
 the rest of us need not care about OS X's bugs.

I know this sounds crazy, but couldn't Apple be bothered to fix their OS?  :)

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Collate order on Mac OS X, text with diacritics in UTF-8

2010-01-13 Thread Tom Lane
Craig Ringer cr...@postnewspapers.com.au writes:
 Tom Lane wrote:
 The state of OS X's POSIX-spec locale support is pretty pitiful, but on
 the whole I'd say if you need better UTF8 locale support you could use
 another OS.

 Alas, people will want to run Pg on it anyway, especially when bundling
 with an app. It'd be nice if it could be made to work smoothly ... but I
 certainly don't care enough to try! Like you, I favour using an OS that
 follows the specs it claims to support instead.

For the record, I *like* OS X; I'm typing this on a Macbook Pro.  But
it doesn't do everything, and one of the things it doesn't do well is
POSIX-spec locale support.

As you now know from having looked at the archives, there've been many
discussions of changing PG to not rely on the platform's locale
support.  But the bang-for-buck ratio of any such change doesn't seem
very attractive --- we have a lot of higher priority things to spend
our finite development manpower on.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Collate order on Mac OS X, text with diacritics in UTF-8

2010-01-13 Thread Craig Ringer
Scott Marlowe wrote:
 On Wed, Jan 13, 2010 at 11:15 PM, Craig Ringer
 cr...@postnewspapers.com.au wrote:
 Perhaps someone who wants to use Mac OS X and Pg for their product will
 come forward with some compat wrapper functions for the localizable
 libc/posix functions, so Pg can just be built against the wrapper and
 the rest of us need not care about OS X's bugs.
 
 I know this sounds crazy, but couldn't Apple be bothered to fix their OS?  :)

I see you don't use Macs much :-P

( Goes back to researching a way to make network automounts on login
work and properly support search in 10.6, since the only way that used
to work in 10.4 is broken by 10.6 )

--
Craig Ringer

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Collate order on Mac OS X, text with diacritics in UTF-8

2010-01-12 Thread Martin Flahault
Hi,

We are a software publisher searching for a new DBMS for our software. We have 
more than one hundred installed servers, running Mac OS and a Primebase 
database.

We have spend some time evaluating PostgreSQL and we can't get correct outputs 
with the ORDER BY command.
LC_COLLATE and LC_CTYPE are set to fr_FR.UTF-8.

It seems there is a known problem with the collating order of text including 
diacritics with the UTF8 encodings on BSD systems.

Does anyone know a workaround ?

Thanks for your answers,

Martin



[GENERAL] Collate problem when using ORDER BY

2008-05-30 Thread David Harel

Hi,

I'm new here. I work on a site that uses postgres version 8.1.5 and 
database encoding ISO_8859_8. When I sort select requests using order by 
the weight of the characters seem to be really funny (but consistent). 
It seems to me that if I create a phantom converted field such as:
SELECT field convert(field, ISO_8859_8 ,UTF8) as field_utf8 ORDER BY 
field_utf8;

Then the sort will be OK however, I get the error:
Query failed: ERROR: character 0xd7 of encoding ISO_8859_8 has no 
equivalent in UTF8.


Is it due to a bad character in the content?
Can I get rid of it like find/replace?
Can I change collate for a given table/database?

--
Regards.

David Harel,

==

Home office +972 77 7657645
Fax:+972 77 7657645
Cellular:   +972 54 4534502
Snail Mail: Amuka
   D.N Merom Hagalil
   13802
   Israel
Email:  [EMAIL PROTECTED]




Re: [GENERAL] Collate problem when using ORDER BY

2008-05-30 Thread Zdenek Kotala

David Harel napsal(a):

Hi,

I'm new here. I work on a site that uses postgres version 8.1.5 and 
database encoding ISO_8859_8. When I sort select requests using order by 
the weight of the characters seem to be really funny (but consistent). 
It seems to me that if I create a phantom converted field such as:
SELECT field convert(field, ISO_8859_8 ,UTF8) as field_utf8 ORDER BY 
field_utf8;

Then the sort will be OK however, I get the error:
Query failed: ERROR: character 0xd7 of encoding ISO_8859_8 has no 
equivalent in UTF8.


Is it due to a bad character in the content?
Can I get rid of it like find/replace?
Can I change collate for a given table/database?



You can use only one encoding per database. It means you convert data from UTF8 
to ISO_8859_8, but you store it back into UTF8 encoded field. Probably strcoll 
function is confused with illegal UTF8 character which has been created by your 
conversion.


Zdenek

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] COLLATE

2006-08-17 Thread Kevin Murphy

Tom Lane wrote:

Kevin Murphy [EMAIL PROTECTED] writes:

What is the answer to Filip's question?  I didn't see an answer in the list 
archives.  I've seen several copies of Joe Conway's pg_strxfrm.c code on the 
web, and it always refers to the Warn_restart variable, which doesn't seem to 
exist in the 8.1.4 code that I'm using.


Warn_restart hasn't existed since PG 7.4.  I would imagine that the code
needs to be tweaked to use a PG_TRY construct instead of direct setjmp
hacking.


Yes, I'm a user, not a hacker.  I was hoping that someone had done this 
already.  Anyway, I gave PG_TRY a try, and the code superficially works. 
I have no idea what I'm doing; you can see what I did below.


Confirm that instead of:

   memcpy(save_restart, Warn_restart, sizeof(save_restart));
   if (sigsetjmp(Warn_restart, 1) != 0)
   {
   memcpy(Warn_restart, save_restart, sizeof(Warn_restart));
   newlocale = setlocale(LC_COLLATE, oldlocale);
   if (!newlocale)
   elog(PANIC, setlocale failed to reset locale: 
%s, localestr);

   siglongjmp(Warn_restart, 1);
   }
   ...
   code here
   ...
   memcpy(Warn_restart, save_restart, sizeof(Warn_restart));


it should be:

   PG_TRY();
   {
...
code here
...
   }
   PG_CATCH();
   {
newlocale = setlocale(LC_COLLATE, oldlocale);
if (!newlocale)
elog(PANIC, setlocale failed to reset locale: %s, 
localestr);


   }
   PG_END_TRY();

Thanks,
Kevin Murphy


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] COLLATE

2006-08-16 Thread Kevin Murphy

Filip Rembiałkowski wrote:

Greg Stark wrote:

But the closest partial solution suggested so far is the pg_xfrm (sic) 
function that
has been implemented and posted at least three times by three different
posters to the postgres mailing lists. In the interest of avoiding a fourth
independent implementation I'll attach the one I use below, it's not big.


But I have no idea how to compile/install it.

I tried compiling but it failed:

pgdba ( at ) sulaco:~/src/postgresql-8.1.3/src$ gcc -I ./include -fPIC -c 
pg_strxfrm.c
pg_strxfrm.c: In function 'pg_strxfrm':
pg_strxfrm.c:98: error: 'Warn_restart' undeclared (first use in this function)
pg_strxfrm.c:98: error: (Each undeclared identifier is reported only once
pg_strxfrm.c:98: error: for each function it appears in.)


What is the answer to Filip's question?  I didn't see an answer in the list 
archives.  I've seen several copies of Joe Conway's pg_strxfrm.c code on the 
web, and it always refers to the Warn_restart variable, which doesn't seem to 
exist in the 8.1.4 code that I'm using.

I am working with a database in UTF-8 encoding using C collation - but I'd occasionally like to ORDER BY columns containing real UTF-8 data. 


Would the pg_strxfrm() function get used in a new operator class function?   
I'll read up on operator classes in chapter 32.14 of the docs, but if someone 
has a simple example, it might help other searchers of the archives.

Thanks,
Kevin Murphy




---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] COLLATE

2006-08-16 Thread Tom Lane
Kevin Murphy [EMAIL PROTECTED] writes:
 What is the answer to Filip's question?  I didn't see an answer in the list 
 archives.  I've seen several copies of Joe Conway's pg_strxfrm.c code on the 
 web, and it always refers to the Warn_restart variable, which doesn't seem to 
 exist in the 8.1.4 code that I'm using.

Warn_restart hasn't existed since PG 7.4.  I would imagine that the code
needs to be tweaked to use a PG_TRY construct instead of direct setjmp
hacking.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] COLLATE

2006-06-14 Thread Jan Behrens

Filip Rembiałkowski wrote:

let's assume that we keep Unicode text  data in the column.
sometimes we want to sort it according to specific collation order.
how can we force collation when running a query?


Hi Filip,

I had the same problem you have. As a solution I implemented a wrapper 
function named collkey(), which transforms a Unicode string into a 
sortable collation key using IBM's ICU library. Of course it's not as 
nice as having a standard SQL command for it, but i think it's a nice 
solution, until a good collation support is provided by PostgreSQL itself.


You can find the source here:
http://www.flexiguided.de/publications.pgcollkey.en.html

I can understand that people don't want PostgreSQL being dependent of 
other libraries. The best solution for future would be implementing 
independent collation functions inside of PostgreSQL, which also support 
standard SQL syntax, but that's a lot of work, i think.


Jan Behrens

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] COLLATE

2006-06-07 Thread Filip Rembiałkowski

(group dupe)

05 Jun 2006 12:53:57 -0400, Greg Stark [EMAIL PROTECTED]:


But the closest partial solution suggested so far is the pg_xfrm function that
has been implemented and posted at least three times by three different
posters to the postgres mailing lists. In the interest of avoiding a fourth
independent implementation I'll attach the one I use below, it's not big.



Thanks! It may be the only solution available at the moment.

But I have no idea how to compile/install it.
I imagine there should be separate /contrib/pg_strxfrm directory, with
the makefile etc.
is it correct? if so, how should the makefile look like?

I tried compiling but it failed:

[EMAIL PROTECTED]:~/src/postgresql-8.1.3/src$ gcc -I ./include -fPIC -c 
pg_strxfrm.c
pg_strxfrm.c: In function 'pg_strxfrm':
pg_strxfrm.c:98: error: 'Warn_restart' undeclared (first use in this function)
pg_strxfrm.c:98: error: (Each undeclared identifier is reported only once
pg_strxfrm.c:98: error: for each function it appears in.)

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] COLLATE

2006-06-07 Thread Martijn van Oosterhout
On Wed, Jun 07, 2006 at 12:40:49AM +0400, Nikolay Samokhvalov wrote:
 I were looking forward to this feature... So many troubles in my
 projects would be vanished...
 
 What is 'lack of interest'? Interest from community, or major
 developers, or your personal one?

Kind of all three, feeding off eachother. There's just not enough
interest from any front to really get it moving. It's a fairly invasive
change and without significant support and interest from somewhere,
chances of completion let alone acceptance are pretty slim...

Have a nice day
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


Re: [GENERAL] COLLATE

2006-06-07 Thread Tomi NA

On 6/7/06, Martijn van Oosterhout kleptog@svana.org wrote:


Kind of all three, feeding off eachother. There's just not enough
interest from any front to really get it moving. It's a fairly invasive
change and without significant support and interest from somewhere,
chances of completion let alone acceptance are pretty slim...


I don't get it. Maybe it's me or my environment.
About half of the work we do in my company is building webs on the
(excellent) eZ publish CMS. We routinely use MySQL even though the CMS
(at least declaratively) wolks on top of pgsql. We use MySQL on
inertia: it's the default db under eZ publish and it's easy to get
hosting.
On the other hand, I'd rather bite my hand off than use MySQL in some
other Internet apps (e-shop with a card payment interface and others).
Where am I going with all this?
Well, if you don't live in an english-speaking country, there's no
such thing as a single language web. Because of collation limitations,
however, postgresql would be the first to be crossed out on my list no
matter how good it is in all other respects.
I understand that the needed change is uncomfortably invasive, but not
beeing able to collate correctly is a show-stopping problem on a
professional site. The user opens a drop-down with several dozens of
cities, scrolls down a bit to where his city should be, dooesn't see
it because the collator places the weird letter all the way down at
the end of the list - and the user walks away. The other user can't
get info about the bus lines to the city or he doesn't buy a product
he want's because he doesn't see it where it's supposed to be. Another
user notices the error and dissregards the site as amateurish.
I understand I'm talking about a specific area of use, but isn't that
where the biggest growth in both the number of new applications as
well as the number of users is?
As it is, I'm happy to have a great RDBMS to build intranet, rich
client apps on - but that's about it.
I've used MySQL and MSSQL (unfortunately) and they both support
collation much better than pgsql. Not perfect, but much much better.
Maybe postgresql should try to set less ambitious goals and instead of
going for the holy grail of collation management (which is usualy
praiseworthy) try to provide at least db-level collation definitions
if table/row/cell level collation settings should proove too
challenging at the moment.

t.n.a.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [GENERAL] COLLATE

2006-06-07 Thread Tom Lane
Martijn van Oosterhout kleptog@svana.org writes:
 On Wed, Jun 07, 2006 at 12:40:49AM +0400, Nikolay Samokhvalov wrote:
 What is 'lack of interest'? Interest from community, or major
 developers, or your personal one?

 Kind of all three, feeding off eachother. There's just not enough
 interest from any front to really get it moving. It's a fairly invasive
 change and without significant support and interest from somewhere,
 chances of completion let alone acceptance are pretty slim...

I think there was also considerable concern about introducing a
dependency on a very large chunk of outside software (viz, ICU).
Loss of control, licensing questions, etc.

Of course, doing it *without* ICU is even more daunting :-(
but I'd like to think we'll get there someday.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] COLLATE

2006-06-06 Thread Nikolay Samokhvalov

On 6/5/06, Martijn van Oosterhout kleptog@svana.org wrote:


Yeah, I was working on it but got stuck on the planner/optimiser
changes. In the mean time the tree drifted and lack of interest, which
gets us where we are now...


Very bad news :-(
I were looking forward to this feature... So many troubles in my
projects would be vanished...

What is 'lack of interest'? Interest from community, or major
developers, or your personal one?

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


[GENERAL] COLLATE

2006-06-05 Thread Filip Rembiałkowski

Hi all,

let's assume that we keep Unicode text  data in the column.
sometimes we want to sort it according to specific collation order.
how can we force collation when running a query?
ideal solution would be having SQL92 standard COLLATE clauses in
SELECT statements. I heard it's work in progress, what's the status?

are there any partial solutions to the problem?

regards,

Filip Rembiałkowski

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] COLLATE

2006-06-05 Thread Martijn van Oosterhout
On Mon, Jun 05, 2006 at 05:04:25PM +0200, Filip Rembia??kowski wrote:
 Hi all,
 
 let's assume that we keep Unicode text  data in the column.
 sometimes we want to sort it according to specific collation order.
 how can we force collation when running a query?
 ideal solution would be having SQL92 standard COLLATE clauses in
 SELECT statements. I heard it's work in progress, what's the status?

Yeah, I was working on it but got stuck on the planner/optimiser
changes. In the mean time the tree drifted and lack of interest, which
gets us where we are now...

 are there any partial solutions to the problem?

Not that I know of.

Have a ncie day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


Re: [GENERAL] COLLATE

2006-06-05 Thread Greg Stark
Filip Rembiaâkowski [EMAIL PROTECTED] writes:

 let's assume that we keep Unicode text  data in the column.
 sometimes we want to sort it according to specific collation order.
 how can we force collation when running a query?
 ideal solution would be having SQL92 standard COLLATE clauses in
 SELECT statements. I heard it's work in progress, what's the status?
 
 are there any partial solutions to the problem?

I don't know the status of the full COLLATE support.

But the closest partial solution suggested so far is the pg_xfrm function that
has been implemented and posted at least three times by three different
posters to the postgres mailing lists. In the interest of avoiding a fourth
independent implementation I'll attach the one I use below, it's not big.


/*
  * Joe Conway [EMAIL PROTECTED]
  *
  * Copyright (c) 2004, Joseph E. Conway
  * ALL RIGHTS RESERVED
  *
  * Permission to use, copy, modify, and distribute this software and its
  * documentation for any purpose, without fee, and without a written agreement
  * is hereby granted, provided that the above copyright notice and this
  * paragraph and the following two paragraphs appear in all copies.
  *
  * IN NO EVENT SHALL THE AUTHORS OR DISTRIBUTORS BE LIABLE TO ANY PARTY FOR
  * DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
  * LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
  * DOCUMENTATION, EVEN IF THE AUTHOR OR DISTRIBUTORS HAVE BEEN ADVISED OF THE
  * POSSIBILITY OF SUCH DAMAGE.
  *
  * THE AUTHORS AND DISTRIBUTORS SPECIFICALLY DISCLAIM ANY WARRANTIES,
  * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
  * AND FITNESS FOR A PARTICULAR PURPOSE.  THE SOFTWARE PROVIDED HEREUNDER IS
  * ON AN AS IS BASIS, AND THE AUTHOR AND DISTRIBUTORS HAS NO OBLIGATIONS TO
  * PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
  *
  */

/*
 * 
 * If your libc strxfrm() overflows the buffer provided ignoring the length
 * argument then add this define. It causes this function to do an extra
 * strxfrm() call on every execution to get the actual size of buffer needed.
 * 
 * #define DONT_TRUST_STRXFRM
*/

/*
 *
 * CREATE OR REPLACE function pg_strxfrm(text,text) RETURNS bytea 
 * AS 'strxfrm2.so', 'pg_strxfrm' LANGUAGE c IMMUTABLE STRICT;
 * 
 * Usage: pg_strxfrm(string, locale)
 */

#include setjmp.h
#include string.h
#include postgres.h
#include fmgr.h
#include tcop/tcopprot.h
#include utils/builtins.h

#define GET_STR(textp) \
  DatumGetCString(DirectFunctionCall1(textout, PointerGetDatum(textp)))
#define GET_BYTEA(str_) \
  DatumGetTextP(DirectFunctionCall1(byteain, CStringGetDatum(str_)))
#define MAX_BYTEA_LEN0x3fff

/*
 * pg_strxfrm - Function to convert string similar to the strxfrm C
 * function using a specified locale.
 */

extern Datum pg_strxfrm(PG_FUNCTION_ARGS);

PG_FUNCTION_INFO_V1(pg_strxfrm);

Datum
pg_strxfrm(PG_FUNCTION_ARGS)
{
	char   *str = GET_STR(PG_GETARG_TEXT_P(0));
	char   *localestr = GET_STR(PG_GETARG_TEXT_P(1));
	size_t  approx_trans_len;
	
	char   *trans;
	
	size_t  actual_trans_len;
	char   *oldlocale;
	char   *newlocale;
	sigjmp_buf  save_restart;

#ifndef DONT_TRUST_STRXFRM
	size_t  str_len = strlen(str);
	static unsigned guess = 1;

	approx_trans_len = guess + guess * str_len + 1;
	if (approx_trans_len  MAX_BYTEA_LEN)
		elog(ERROR, source string too long to transform);
	trans = (char *) palloc(approx_trans_len);
#else
	approx_trans_len = 0;
	trans = NULL;
#endif

	oldlocale = setlocale(LC_COLLATE, NULL);
	if (!oldlocale)
		elog(ERROR, setlocale failed to return a locale);
	oldlocale = pstrdup(oldlocale);

	/* catch elog while locale is set other than the default */
	memcpy(save_restart, Warn_restart, sizeof(save_restart));
	if (sigsetjmp(Warn_restart, 1) != 0)
	{
		memcpy(Warn_restart, save_restart, sizeof(Warn_restart));
		newlocale = setlocale(LC_COLLATE, oldlocale);
		if (!newlocale)
			elog(PANIC, setlocale failed to reset locale: %s, localestr);
		siglongjmp(Warn_restart, 1);
	}

	newlocale = setlocale(LC_COLLATE, localestr);
	if (!newlocale)
		elog(ERROR, setlocale failed to set a locale: %s, localestr);

	actual_trans_len = strxfrm(trans, str, approx_trans_len);
	/* if the buffer was not large enough, resize it and try again */
	if (actual_trans_len = approx_trans_len)
	{
		approx_trans_len = actual_trans_len + 1;
		if (approx_trans_len  MAX_BYTEA_LEN)
			elog(ERROR, source string too long to transform);

		if (trans)
			trans = (char *) repalloc(trans, approx_trans_len);
		else
			trans = (char *) palloc(approx_trans_len);

		actual_trans_len = strxfrm(trans, str, approx_trans_len);

#ifndef DONT_TRUST_STRXFRM
		while(actual_trans_len = guess + guess * str_len)
			guess++;
		elog(INFO, strxfrm seems to need %d*n+%d sized buffer, guess, guess + 1);
#endif

		/* if the buffer still not large enough, punt */
		if (actual_trans_len = approx_trans_len)
			elog(ERROR, strxfrm failed, buffer 

Re: [GENERAL] Collate problem

2005-10-24 Thread talenat

talenat wrote:

talenat wrote:


Hi,
I have a mandrake 10.1 server running Postgres 8.1beta3 with hr_HR 
locale. Client is a XP with WIN1250 code page.

When I use ORDER BY in query collation is out of order.
I have tried with client encoding WIN1250 and LATIN2 but with no luck.
If I use XP as a server with postgres 8.1beta3 then everything is as 
expected.
The database is filled with with data from script made on XP with 
WIN1250 codepage.
It looks like conversion from Latin2 to Win1250 is not correct and I 
don't know what to do.

Any hint is welcome.
BR



Here is an update to above.
If I execute SELECT * FROM atable ORDER BY afield then collate is wrong.
If I execute SELECT * FROM atable WHERE afield LIKE 'Š%' ORDER BY afield 
then collate is OK for returned subset.

afield is a index field.
It seems to me that there is a bug here but I need to be sure if I am 
not missing something.




Hi,
Here is a only solution that I could find so I hope that it will help 
others.
It is definitely something wrong with automatic translation between 
Linux and Win.
For Croatian character set on Linux default encoding is Latin2 and on 
Win it is Win1250. As I understand that translation should be automatic 
inside Postgresql server but the collation is wrong.
The solution is to use HR_UTF8 ( unicode ) on Linux and then DBInit 
creates templates as unicode.

When you create your database use UNICODE as encoding.
After this you could use WIN1250 as client side encoding ( if the client 
is on Win machine since that is a code page for Croatia ) and everything 
is as expected.

I don't know if there are any performance penalty but it works.

BR





---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


[GENERAL] Collate problem

2005-10-16 Thread talenat

Hi,
I have a mandrake 10.1 server running Postgres 8.1beta3 with hr_HR 
locale. Client is a XP with WIN1250 code page.

When I use ORDER BY in query collation is out of order.
I have tried with client encoding WIN1250 and LATIN2 but with no luck.
If I use XP as a server with postgres 8.1beta3 then everything is as 
expected.
The database is filled with with data from script made on XP with 
WIN1250 codepage.
It looks like conversion from Latin2 to Win1250 is not correct and I 
don't know what to do.

Any hint is welcome.
BR

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] Collate problem

2005-10-16 Thread talenat

talenat wrote:

Hi,
I have a mandrake 10.1 server running Postgres 8.1beta3 with hr_HR 
locale. Client is a XP with WIN1250 code page.

When I use ORDER BY in query collation is out of order.
I have tried with client encoding WIN1250 and LATIN2 but with no luck.
If I use XP as a server with postgres 8.1beta3 then everything is as 
expected.
The database is filled with with data from script made on XP with 
WIN1250 codepage.
It looks like conversion from Latin2 to Win1250 is not correct and I 
don't know what to do.

Any hint is welcome.
BR


Here is an update to above.
If I execute SELECT * FROM atable ORDER BY afield then collate is wrong.
If I execute SELECT * FROM atable WHERE afield LIKE 'Š%' ORDER BY afield 
then collate is OK for returned subset.

afield is a index field.
It seems to me that there is a bug here but I need to be sure if I am 
not missing something.




---(end of broadcast)---
TIP 6: explain analyze is your friend