Re: [GENERAL] russian case-insensitive regexp search not working

2007-07-12 Thread alexander lunyov

Oleg Bartunov wrote:

alexander,

lc_ctype and lc_collate can be set only at initdb time!
You need to read the localization chapter:
http://www.postgresql.org/docs/current/static/charset.html



Yes, I knew about this, but I thought maybe it could somehow be changed
on the fly.

... (10 minutes later)

Yes, now that initdb has been run with --locale=ru_RU.UTF-8,
lower('RussianString') gives me 'russianstring', but case-insensitive
regexp matching still does not work. I guess I'll stick with the
lower() ~ lower() construction.
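
The lower() ~ lower() construction amounts to folding both the column value and the search fragment to lowercase and then running an ordinary case-sensitive regexp match. A minimal Python sketch of that idea follows. It is only an illustration, not PostgreSQL code: the helper name matches_folded and the street "Садовая" are made up for the example, and Python's str.lower() stands in for lower() under a ru_RU.UTF-8 locale.

```python
import re

# Street names from this thread, plus one made-up non-matching entry.
streets = ["Зеленая", "Зеленоградская", "Садовая"]


def matches_folded(value: str, fragment: str) -> bool:
    """Emulate lower(value) ~ lower(fragment): fold both sides, then match case-sensitively."""
    return re.search(fragment.lower(), value.lower()) is not None


found = [s for s in streets if matches_folded(s, "зелен")]
print(len(found))  # number of streets that matched
```

Folding the pattern like this is only safe for plain fragments; lowercasing a pattern would also turn an escape such as \W into \w and change its meaning.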

And thanks everybody who replied!




Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


--
alexander lunyov
[EMAIL PROTECTED]




---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] russian case-insensitive regexp search not working

2007-07-11 Thread alexander lunyov

Tom Lane wrote:

alexander lunyov [EMAIL PROTECTED] writes:

With this I just wanted to say that lower() doesn't work at all on
Russian Unicode characters,


In that case you're using the wrong locale (i.e., not Russian Unicode).
Check SHOW lc_ctype.
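
One way to picture the effect of lc_ctype = C is a lower() that only knows the ASCII letters A-Z. The Python sketch below emulates that under stated assumptions: the function names c_locale_lower and unicode_lower are hypothetical, Python's str.lower() stands in for a Russian UTF-8 locale, and none of this is how PostgreSQL is actually implemented.

```python
def c_locale_lower(text: str) -> str:
    """Fold only ASCII A-Z, roughly what a "C" lc_ctype is able to do."""
    return "".join(
        chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch
        for ch in text
    )


def unicode_lower(text: str) -> str:
    """Fold using Unicode case rules, standing in for a ru_RU.UTF-8 locale."""
    return text.lower()


sample = "Зелен"
# True when the Cyrillic capital is left untouched by the ASCII-only fold.
print(c_locale_lower(sample) == sample)
# True when the Unicode-aware fold actually changes the string.
print(unicode_lower(sample) != sample)
```

English text is folded by both functions, which would explain why only the Russian strings appear broken.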


db=> SHOW LC_CTYPE;
 lc_ctype
----------
 C
(1 row)

db=> SHOW LC_COLLATE;
 lc_collate
------------
 C
(1 row)

Where can I change this? Trying to SET these parameters gives the error
"parameter lc_collate cannot be changed".



Or [ checks back in thread... ] maybe you're using the wrong operating
system.  Not so long ago FreeBSD didn't have Unicode locale support at
all; I'm not sure if 6.2 has that problem but it is worth checking.
Does it work for you to do case-insensitive Russian comparisons in
grep, for instance?


I put three Russian strings, differing in the case of the first
character, into a text file and grepped for all of them:


# cat  textfile
Зеленая
Зеленодольская
зеленая
# grep -i зелен *
textfile:Зеленая
textfile:Зеленодольская
textfile:зеленая

So I think the system handles Unicode fine.

--
alexander lunyov
[EMAIL PROTECTED]




Re: [GENERAL] russian case-insensitive regexp search not working

2007-07-10 Thread alexander lunyov

Karsten Hilbert wrote:

Just to clarify: lower() on both sides of a comparison
should still work as expected on multibyte encodings ? It's
been suggested here before.
lower() on both sides also does not work in my case; it still searches
case-sensitively. The string in this example has its first character
capitalized, and the result is the same. It seems that lower() can't
lowercase multibyte characters.


db=> select lower('Зелен');

Well, no,


With this I just wanted to say that lower() doesn't work at all on
Russian Unicode characters: even in select lower('String'), 'String'
doesn't become lowercase, so it also fails in more complex
select statements.



select my_string where lower(my_string) ~ lower(search_fragment);

Does that help?

(~ does work for e.g. German in my experience)


No, for Russian Unicode strings it does not work.
I searched the pgsql-patches@ list and found this thread:
http://archives.postgresql.org/pgsql-patches/2007-06/msg00021.php
I wrote to Andrew (he hasn't answered yet) to ask whether this patch can
help with my problem.

P.S.: If this issue is a known bug (as we discussed earlier), how long
will it take to fix? I know little about the PostgreSQL development
process; maybe you know it a little better?

--
alexander lunyov
[EMAIL PROTECTED]






[GENERAL] russian case-insensitive regexp search not working

2007-07-09 Thread alexander lunyov

Hello, friends.

OS: FreeBSD 6.2, PostgreSQL 8.2.4

PostgreSQL does not match case-insensitive regexp patterns that contain
Russian Unicode text. Postgres runs as user pgsql with this login class
(in /etc/login.conf):


postgres:\
    :lang=ru_RU.UTF-8:\
    :setenv=LC_COLLATE=C:\
    :tc=default:

In the .profile of the postgres user:

LANG=ru_RU.UTF-8
export LANG
CHARSET=UTF-8
export CHARSET

Then, database:

db=> \encoding
UTF8

A case-insensitive search for a lowercase pattern returns nothing:

db=> select street from people where street ~* 'зелен';
 street
--------
(0 rows)

There are matching records, but they start with a capital letter:

db=> select street from people where street ~* 'Зелен';
     street
----------------
 Зеленая
 Зеленоградская
(2 rows)

Searches for English values work fine, but Russian values do not. Why could that be?
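
The symptom above looks like what a regexp engine does when its case folding covers only ASCII letters. The following Python sketch reproduces that pattern as an analogy, using re.ASCII to restrict case-insensitive matching to ASCII. The helper search_ci is made up for the example, and nothing here is claimed about PostgreSQL's own regexp code.

```python
import re

# The two street names shown in the query output above.
streets = ["Зеленая", "Зеленоградская"]


def search_ci(pattern: str, values: list, ascii_only: bool) -> list:
    """Case-insensitive search; with ascii_only, only ASCII letters ignore case."""
    flags = re.IGNORECASE
    if ascii_only:
        flags |= re.ASCII
    return [v for v in values if re.search(pattern, v, flags)]


ascii_lower = search_ci("зелен", streets, ascii_only=True)     # lowercase pattern
ascii_upper = search_ci("Зелен", streets, ascii_only=True)     # exact-case pattern
unicode_lower = search_ci("зелен", streets, ascii_only=False)  # Unicode-aware folding
english = search_ci("green", ["Green Street"], ascii_only=True)

print(len(ascii_lower), len(ascii_upper), len(unicode_lower), len(english))
```

With ASCII-only folding the lowercase Cyrillic pattern finds nothing while the exact-case pattern and the English pattern still match, which is the same shape as the results reported here.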

--
alexander lunyov
[EMAIL PROTECTED]




Re: [GENERAL] russian case-insensitive regexp search not working

2007-07-09 Thread alexander lunyov

No, ILIKE also searches case-sensitively.

I found this bug report:
http://archives.postgresql.org/pgsql-bugs/2006-09/msg00065.php

Is it about this issue? And will it be fixed someday?

Sergey Levchenko wrote:

Just use: select street from people where street ILIKE 'зелен%';

select with case-insensitive regexp does not work right now!


--
alexander lunyov
[EMAIL PROTECTED]




Re: [GENERAL] russian case-insensitive regexp search not working

2007-07-09 Thread alexander lunyov

Karsten Hilbert wrote:

Just to clarify: lower() on both sides of a comparison
should still work as expected on multibyte encodings ? It's
been suggested here before.


lower() on both sides also does not work in my case; it still searches
case-sensitively. The string in this example has its first character
capitalized, and the result is the same. It seems that lower() can't
lowercase multibyte characters.



db=> select lower('Зелен');
 lower
-------
 Зелен
(1 row)


--
alexander lunyov
[EMAIL PROTECTED]

