Re: [firebird-support] Character sets and collations of columns

2016-08-30 Thread Stefan Heymann li...@stefanheymann.de [firebird-support]
Tomasz,

> One of the legacy databases I happen to maintain has character sets and
> collations messed up. [...]

For a database that is messed up like this, I'd go the clean way and
use a pump. That would also give you the opportunity to change it to
Unicode (UTF-8) at the same time, which is a transition that you will
have to do anyway.

Regards

Stefan







Re: [firebird-support] Character sets and collations of columns

2016-08-26 Thread Tomasz Tyrakowski t.tyrakow...@sol-system.pl [firebird-support]
On 26.08.2016 o 14:35, robert rottermann rob...@redcor.ch 
[firebird-support] wrote:
> I have no idea at on how to fix it "correctly", but would it not be an
> idea just to dump the tables with weird character into a text file, and
> search replace them.
>
> The wrong chars are probably uniformly wrong, so you should be able to
> fix them using sed or some such tool (on linux, I am not well versed
> with windows)

The funny part is, I don't believe the data is actually corrupted. I 
think the database tries to convert it incorrectly, using the info about 
the encoding of the columns (which might be incorrect) and the encoding 
used by the client. The actual data, when I display it via Flame Robin 
or isql seems to be OK, it's the sorting or uppercasing that goes weird.
Thanks for the suggestions anyway.

regards
Tomasz

-- 
__--==--__
__--== Tomasz Tyrakowski==--__
__--==SOL-SYSTEM==--__
__--== http://www.sol-system.pl ==--__
__--==--__


Re: [firebird-support] Character sets and collations of columns

2016-08-26 Thread robert rottermann rob...@redcor.ch [firebird-support]
I have no idea at on how to fix it "correctly", but would it not be an 
idea just to dump the tables with weird character into a text file, and 
search replace them.

The wrong chars are probably uniformly wrong, so you should be able to 
fix them using sed or some such tool (on linux, I am not well versed 
with windows)

afterwards you clean up the db setting (droping it?) and the load the db 
again


robert


On 26.08.2016 13:58, Tomasz Tyrakowski t.tyrakow...@sol-system.pl 
[firebird-support] wrote:
> Hi,
>
> One of the legacy databases I happen to maintain has character sets and
> collations messed up. As far as I know, the database was created with
> default character set "NONE" instead of "WIN1250", which should have
> been set back then (and it's over 10 years old).
> Then, some columns have been added with the character set option set
> explicitly to WIN1250 (alter table X add Y varchar(n) character set
> WIN1250), while the old columns still remained as "NONE".
> The clients were windows apps and used WIN1250 when connecting to the
> database.
> Later on, another maintainer changed the RDB$COLLATION_ID flags for some
> of the columns in RDB$RELATION_FIELDS to indicate WIN1250 (value 3 as
> far as I know, although I'm not entirely sure if this flag actually
> concerns character sets, or only collation order - maybe the maintainer
> was convinced he was clever, while in fact nothing relevant happened),
> and set the default character set of the database to WIN1250 (in a way
> unknown to me - maybe even via binary editing the GDB file and changing
> some byte-flags inside).
> So now the DB is kind of a mess (despite reporting dutifully WIN1250 as
> the default character set), with some columns created when the DB had
> "NONE" character set, some other create with character set WIN1250 set
> explicitly upon creation, yet another created without explicit character
> set but when the database has already had the "WIN1250" default
> character set, and possibly some columns, for which the RDB$COLLATION_ID
> was set by hand.
> So my question is (in fact I'm not even sure what the right question is
> :( ): assuming the strings in the database are in fact WIN1250-encoded
> (all the clients used WIN1250), is setting the RDB$COLLATION_ID on all
> text columns a good idea and would actually make any difference?
> All strange quirks that happen from time to time with this database are
> related to national characters. Sometimes a sort doesn't work as
> expected, sometimes a client function (like strtoupper in PHP) doesn't
> work on data from some of the columns (but works for other data). Is
> there a chance that setting the collation flag might help? Or is there
> another way to set it right?
> I know the "right" right thing to do (pun intended) is to create a new
> DB and pump the data, but it has about 400 interconnected tables and
> over 60GB of size, so if there is another way, I would be grateful to
> hear about it first.
>
> Thanks in advance and sorry for this lengthy story.
>
> regards
> Tomasz
>



[firebird-support] Character sets and collations of columns

2016-08-26 Thread Tomasz Tyrakowski t.tyrakow...@sol-system.pl [firebird-support]
Hi,

One of the legacy databases I happen to maintain has character sets and 
collations messed up. As far as I know, the database was created with 
default character set "NONE" instead of "WIN1250", which should have 
been set back then (and it's over 10 years old).
Then, some columns have been added with the character set option set 
explicitly to WIN1250 (alter table X add Y varchar(n) character set 
WIN1250), while the old columns still remained as "NONE".
The clients were windows apps and used WIN1250 when connecting to the 
database.
Later on, another maintainer changed the RDB$COLLATION_ID flags for some 
of the columns in RDB$RELATION_FIELDS to indicate WIN1250 (value 3 as 
far as I know, although I'm not entirely sure if this flag actually 
concerns character sets, or only collation order - maybe the maintainer 
was convinced he was clever, while in fact nothing relevant happened), 
and set the default character set of the database to WIN1250 (in a way 
unknown to me - maybe even via binary editing the GDB file and changing 
some byte-flags inside).
So now the DB is kind of a mess (despite reporting dutifully WIN1250 as 
the default character set), with some columns created when the DB had 
"NONE" character set, some other create with character set WIN1250 set 
explicitly upon creation, yet another created without explicit character 
set but when the database has already had the "WIN1250" default 
character set, and possibly some columns, for which the RDB$COLLATION_ID 
was set by hand.
So my question is (in fact I'm not even sure what the right question is 
:( ): assuming the strings in the database are in fact WIN1250-encoded 
(all the clients used WIN1250), is setting the RDB$COLLATION_ID on all 
text columns a good idea and would actually make any difference?
All strange quirks that happen from time to time with this database are 
related to national characters. Sometimes a sort doesn't work as 
expected, sometimes a client function (like strtoupper in PHP) doesn't 
work on data from some of the columns (but works for other data). Is 
there a chance that setting the collation flag might help? Or is there 
another way to set it right?
I know the "right" right thing to do (pun intended) is to create a new 
DB and pump the data, but it has about 400 interconnected tables and 
over 60GB of size, so if there is another way, I would be grateful to 
hear about it first.

Thanks in advance and sorry for this lengthy story.

regards
Tomasz

-- 
__--==--__
__--== Tomasz Tyrakowski==--__
__--==SOL-SYSTEM==--__
__--== http://www.sol-system.pl ==--__
__--==--__