Re: [HACKERS] Fixing the Turkish problem

2004-05-23 Thread Devrim GUNDUZ

Hi,

On Thu, 6 May 2004, Tom Lane wrote:

> We're sort of halfway there on coping with the Turkish-locale i-vs-I
> problem.  I'd like to finish the job for 7.5.

Cool!

<snip>
> AFAICS the remaining problem is that there are a bunch of places that
> use strcasecmp() or strncasecmp() to match inputs against locally known
> keywords (such as datestyle or timezone names).  We need to make a
> variant version of strcasecmp that uses this same style of case-folding.
>
> What I'm thinking of doing is inventing pg_strcasecmp and
> pg_strncasecmp that act like the above and replacing all calls of the
> standard library functions with these.

If you can post all the patches you'd like to apply, I'd be happy to test 
them. (Sorry for the very late response, btw.)

Regards,

-- 
Devrim GUNDUZ  
devrim~gunduz.org   devrim.gunduz~linux.org.tr 
http://www.TDMSoft.com
http://www.gunduz.org





---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Fixing the Turkish problem

2004-05-23 Thread Tom Lane
Devrim GUNDUZ [EMAIL PROTECTED] writes:
> On Thu, 6 May 2004, Tom Lane wrote:
>> What I'm thinking of doing is inventing pg_strcasecmp and
>> pg_strncasecmp that act like the above and replacing all calls of the
>> standard library functions with these.

> If you can post all the patches you'd like to apply, I'd be happy to test
> them. (Sorry for the very late response, btw.)

The patches are in; please give CVS tip a shot and see what you think.
It passed regression tests in a Turkish locale for me.

regards, tom lane



Re: [HACKERS] Fixing the Turkish problem

2004-05-23 Thread Devrim GUNDUZ

Hi,

On Sun, 23 May 2004, Tom Lane wrote:

> >> pg_strncasecmp that act like the above and replacing all calls of the
> >> standard library functions with these.
>
> > If you can post all the patches you'd like to apply, I'd be happy to test
> > them. (Sorry for the very late response, btw.)
>
> The patches are in; please give CVS tip a shot and see what you think.
> It passed regression tests in a Turkish locale for me.

Yes, it solves the initdb bug #1133. Thanks.

However, we still fail to sort small I (i dotless) and i. i dotless 
comes before i in Turkish Alphabet, but ORDER BY sorts i before i 
dotless.

I would post a sample, but I'm not sure that anyone on the list could view 
it :)

Regards,
-- 
Devrim GUNDUZ  
devrim~gunduz.org   devrim.gunduz~linux.org.tr 
http://www.TDMSoft.com
http://www.gunduz.org




Re: [HACKERS] Fixing the Turkish problem

2004-05-23 Thread Tom Lane
Devrim GUNDUZ [EMAIL PROTECTED] writes:
> However, we still fail to sort small I (i dotless) and i. i dotless
> comes before i in Turkish Alphabet, but ORDER BY sorts i before i
> dotless.

For that, you have to complain to your locale's designer.  We just do
what strcoll tells us to.
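
To see what the locale itself reports, independent of the database, one can
call strcoll() directly.  The following is only a rough sketch: the helper
name collate_in_locale is hypothetical, and any locale name passed to it
(for instance "tr_TR.UTF-8") is an assumption about what is installed on
the machine.  The sign of the result depends entirely on the platform's
locale definition, so no expected ordering is claimed here.

```c
#include <locale.h>
#include <string.h>

/*
 * Hypothetical helper: compare a and b with strcoll() under the named
 * LC_COLLATE locale.  Returns 0 on success and stores the strcoll()
 * result in *result; returns -1 if the locale cannot be selected.
 * The previous LC_COLLATE setting is restored before returning.
 */
int
collate_in_locale(const char *locale_name,
                  const char *a, const char *b, int *result)
{
    char        saved[128];
    const char *cur = setlocale(LC_COLLATE, NULL);

    /* Copy the current setting; later setlocale() calls may overwrite it. */
    if (cur == NULL || strlen(cur) >= sizeof(saved))
        return -1;
    strcpy(saved, cur);

    /* setlocale() returns NULL when the requested locale is unavailable. */
    if (setlocale(LC_COLLATE, locale_name) == NULL)
        return -1;

    *result = strcoll(a, b);

    setlocale(LC_COLLATE, saved);
    return 0;
}
```

Calling it with the two strings containing dotless i and dotted i (in the
encoding the chosen locale expects) shows which one the locale orders
first; if that disagrees with the Turkish alphabet, the locale definition
is the place to look.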

regards, tom lane



[HACKERS] Fixing the Turkish problem

2004-05-06 Thread Tom Lane
We're sort of halfway there on coping with the Turkish-locale i-vs-I
problem.  I'd like to finish the job for 7.5.

What we presently have is that identifier and keyword downcasing is done
without trusting tolower():

/*
 * SQL99 specifies Unicode-aware case normalization, which we don't yet
 * have the infrastructure for.  Instead we use tolower() to provide a
 * locale-aware translation.  However, there are some locales where this
 * is not right either (eg, Turkish may do strange things with 'i' and
 * 'I').  Our current compromise is to use tolower() for characters with
 * the high bit set, and use an ASCII-only downcasing for 7-bit
 * characters.
 */
for (i = 0; i < len; i++)
{
    unsigned char ch = (unsigned char) ident[i];

    if (ch >= 'A' && ch <= 'Z')
        ch += 'a' - 'A';
    else if (ch >= 0x80 && isupper(ch))
        ch = tolower(ch);
    result[i] = (char) ch;
}

AFAICS the remaining problem is that there are a bunch of places that
use strcasecmp() or strncasecmp() to match inputs against locally known
keywords (such as datestyle or timezone names).  We need to make a
variant version of strcasecmp that uses this same style of case-folding.

What I'm thinking of doing is inventing pg_strcasecmp and
pg_strncasecmp that act like the above and replacing all calls of the
standard library functions with these.
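
As a rough illustration of the idea (a sketch only, not the patch itself),
comparison routines built on that folding rule might look like the
following.  The names sketch_fold, sketch_strcasecmp and sketch_strncasecmp
are hypothetical stand-ins for the proposed pg_strcasecmp and
pg_strncasecmp, and the code assumes single-byte comparison semantics.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper: fold one byte the same way the identifier
 * downcasing loop does.  ASCII letters are folded by arithmetic, so the
 * locale cannot interfere; only high-bit bytes go through tolower().
 */
static unsigned char
sketch_fold(unsigned char ch)
{
    if (ch >= 'A' && ch <= 'Z')
        ch += 'a' - 'A';
    else if (ch >= 0x80 && isupper(ch))
        ch = (unsigned char) tolower(ch);
    return ch;
}

/* Case-insensitive comparison of two NUL-terminated strings. */
int
sketch_strcasecmp(const char *s1, const char *s2)
{
    for (;;)
    {
        unsigned char c1 = sketch_fold((unsigned char) *s1++);
        unsigned char c2 = sketch_fold((unsigned char) *s2++);

        if (c1 != c2)
            return (int) c1 - (int) c2;
        if (c1 == 0)
            return 0;           /* both strings ended together */
    }
}

/* As above, but compares at most n bytes. */
int
sketch_strncasecmp(const char *s1, const char *s2, size_t n)
{
    while (n-- > 0)
    {
        unsigned char c1 = sketch_fold((unsigned char) *s1++);
        unsigned char c2 = sketch_fold((unsigned char) *s2++);

        if (c1 != c2)
            return (int) c1 - (int) c2;
        if (c1 == 0)
            break;
    }
    return 0;
}
```

With this rule, a keyword match such as sketch_strcasecmp(input, "iso")
treats 'I' and 'i' as equal regardless of the active locale, which is the
behavior wanted for locally known keywords.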

The routines need to be available in client code (eg, psql) as well as
the backend, so I'm thinking of putting them into libpgport (src/port/).
Another possibility would be to associate them with the multibyte
character code, which is already imported into client code in places.

Any thoughts, objections?

regards, tom lane



Fw: [HACKERS] Fixing the Turkish problem

2004-05-06 Thread Ismail Kizir

----- Original Message -----
From: Ismail Kizir [EMAIL PROTECTED]
To: Tom Lane [EMAIL PROTECTED]
Sent: Friday, May 07, 2004 2:22 AM
Subject: Re: [HACKERS] Fixing the Turkish problem


> Tom,
>
> Thank you very much for the Turkish locale fix.
> I think that simple approach will fix the problem.
> And libpgport (src/port/) may be a good place to put the function
> declarations.
> I am sure that you can make better decisions than me on that subject.
> Devrim wrote about a bug in glibc ... Do you know anything about it?
> Sometimes I encounter strange behavior with PHP (with Unicode support)
> as well.
> When I open a PHP-generated page (UTF-8 encoded source code), the PHP
> interpreter gives "Syntax error". And when I refresh the same page with
> F5, it works correctly. This may be evidence of that bug.
>
> Regards
> Ismail Kizir



