Re: [Firebird-net-provider] Short char field with longer name

2007-06-05 Thread Dean Harding
> > Is "é" a "character"?
> 
> Yes.  And it is a different character from "e", "ê" or "è".  

Obviously, but the character "é" can be represented by the single Unicode 
codepoint U+00E9 (aka LATIN SMALL LETTER E WITH ACUTE) or by the sequence 
U+0065 U+0301 (aka, LATIN SMALL LETTER E, COMBINING ACUTE ACCENT). My point 
with this example was to show that one "character" may require more than one 
codepoint to represent. You could store "é" in a CHAR(1) if you used the 
representation U+00E9, however you could not store it as U+0065 U+0301.

The point is that saying "n" represents a count of "characters" is misleading, 

> > What I mean is, even if you changed the connection string character
> > set to "UTF8", 0x65 0x00 0x00 0x00 represents four UTF-8 characters
> > (that is, U+0065 U+ U+ U+).
> 
> According to what, I wonder?  

According to the definition UTF-8. See http://en.wikipedia.org/wiki/UTF-8 if 
you need a primer.

Dean.




-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
Firebird-net-provider mailing list
Firebird-net-provider@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/firebird-net-provider


Re: [Firebird-net-provider] Short char field with longer name

2007-06-05 Thread Helen Borrie
At 12:26 PM 6/06/2007, you wrote:
> > Which is exactly correct.  It's only confusing to those who don't
> > understand that not all characters/character sets encode characters
> > with single-byte encodings.
>
>It's confusing because the "n" represents a buffer size.

The "n" represents _a number_ which (by the 
standard) is the number of CHARACTERS (not a 
buffer size...SQL definitions are independent of any programming interface).

>Even allocating
>four bytes you cannot store 1 "character" because a character is such a
>confusing word to begin with. Is "é" a "character"?

Yes.  And it is a different character from "e", 
"ê" or "è".  Each has its own distinct encoding 
because it is a distinct character.  The fact 
that all of them use a basic image that looks to 
you like "e" isn't relevant (except in an accent-insensitive collation, natch!)

>What if I store it as
>U+0065 U+0301? That's two Unicode codepoints for one logical character.

What if?  You would simply be storing two Unicode 
codepoints that might have some meaning in some 
Unicode collation somewhere.  It wouldn't be 
meaningful otherwise.  Do you perhaps think a 
"logical character" has to do with the  graphical 
representation? (it doesn't).  You can't, for 
example, supply a chain of Unicode codepoints and 
hope it is a hack to enable you to use a font 
that doesn't support the characters you want to represent...


> > The cause of the problem here is not the length of the data stored
> > but that the client is set to expect single-byte encoding (by using
> > character set NONE)
>
>I agree that's a problem, however I disagree that the server should be
>returning four bytes when the UTF-8 encoded value of "e" (or whatever ASCII
>character you like) is only one byte long.

If anything, it's a shortcoming of the 
implementation of strings.  Strings have to be 
stored "somehow".  What you have to work with are 
data types (char and varchar, each with its 
particular rules that the engine knows about and 
that application language interface interpreters 
like the .NET driver know about) and string size 
(maximum length, defined by a number which is the 
maximum number of characters allowed).

The assumption is that 1 byte==1 character unless 
you specify otherwise (by defining a character 
set for which character boundaries are > 
1).  Both the client and the server will refer to 
the character set mapping to perform 
transliteration;  but there is nothing about the 
way strings are implemented in SQL that can 
support identifying variable boundaries for the 
"characters" that are represented by the 
sequences of bytes within the string.  There are 
text editors that *can* do that, but a database 
engine is not a text editorsoUTF8 
characters are stored as 4 bytes with 
left-to-right significance.  By convention, all 
character sets (including Unicode) that support 
the unaccented Roman characters (and the 
so-called "Arabic" numerals) use the same 7-bit 
encoding for the leftmost byte, viz. the hex range 30 to 5A and 61 to 7A.


>What I mean is, even if you changed the connection string character set to
>"UTF8", 0x65 0x00 0x00 0x00 represents four UTF-8 characters (that is,
>U+0065 U+ U+ U+).

According to what, I wonder?  A Unicode editor 
that understands that convention?  All a database 
engine can do is take a sequence of input or 
output codes and, if necessary, transliterate 
that sequence according to some rules, that are 
packaged for it as "character sets" and "collate 
sequences".  (Firebird 2 supports two collate 
sequences for UTF8.)  It doesn't store 
"characters" and it doesn't return "characters".

Hmmm, this has turned into a magnum opus.  I must get on with work...

Helen


-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
Firebird-net-provider mailing list
Firebird-net-provider@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/firebird-net-provider


Re: [Firebird-net-provider] Short char field with longer name

2007-06-05 Thread Dean Harding
> Which is exactly correct.  It's only confusing to those who don't
> understand that not all characters/character sets encode characters
> with single-byte encodings.

It's confusing because the "n" represents a buffer size. Even allocating
four bytes you cannot store 1 "character" because a character is such a
confusing word to begin with. Is "é" a "character"? What if I store it as
U+0065 U+0301? That's two Unicode codepoints for one logical character.

> The cause of the problem here is not the length of the data stored
> but that the client is set to expect single-byte encoding (by using
> character set NONE)

I agree that's a problem, however I disagree that the server should be
returning four bytes when the UTF-8 encoded value of "e" (or whatever ASCII
character you like) is only one byte long.

What I mean is, even if you changed the connection string character set to
"UTF8", 0x65 0x00 0x00 0x00 represents four UTF-8 characters (that is,
U+0065 U+ U+ U+).

Dean.




-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
Firebird-net-provider mailing list
Firebird-net-provider@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/firebird-net-provider


Re: [Firebird-net-provider] Short char field with longer name

2007-06-05 Thread Helen Borrie
At 11:14 AM 6/06/2007, you wrote:
> > YES! Changing the column's type to VARCHAR(1) works.
> >
> > Now can someone tell me why? Doesn't my experience still indicate that
> > there is some sort of bug with columns of type CHAR(1)?
>
>The language spec is a bit confusing with respect to the what the "n"
>actually means in CHAR(n). My copy says something like this:
>
>NameSize   Range/Precision
>-      
>CHAR (n)n characters   1 to 32,767 bytes
>Character set character size determines
>the maximum number of characters that
>can fit in 32K
>
>So it says that "n" represents the number of *characters*, while the maximum
>range is "32,767 *bytes*".

Which is exactly correct.  It's only confusing to those who don't 
understand that not all characters/character sets encode characters 
with single-byte encodings.


>Because one UTF-8 character may take between 1 and 4 bytes, perhaps the
>server allocates four bytes of storage for it, even though the character
>that is actually stored is only 1 byte?

The server allocates storage according to the declared data type AND 
the encoding used for character data.  What it returns depends on 
whether it is defined as CHAR or VARCHAR.  CHAR is returned 
right-padded with zeros (or NUL, i.e. hex 00 in the case of charset 
OCTETS), while VARCHARs are returned as actual data plus 2 bytes (a 
smallint carrying the length of the data as NUMBER OF CHARACTERS.

The cause of the problem here is not the length of the data stored 
but that the client is set to expect single-byte encoding (by using 
character set NONE), so it has prepared its buffer to receive 1 and 
only 1 byte.  The unsuspecting server returns 4 bytes (as it is meant 
to do) and client squawks.

>It seems like a bug to me, though. That the server *allocates* 4 bytes seems
>reasonable, however I don't think it should be *returning* 4 bytes (unless
>actually required).

The server returns what it is meant to.  It doesn't have any way to 
know that the client asked for the wrong thing.

Helen


-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
Firebird-net-provider mailing list
Firebird-net-provider@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/firebird-net-provider


Re: [Firebird-net-provider] Short char field with longer name

2007-06-05 Thread Dean Harding
> YES! Changing the column's type to VARCHAR(1) works.
> 
> Now can someone tell me why? Doesn't my experience still indicate that
> there is some sort of bug with columns of type CHAR(1)?

The language spec is a bit confusing with respect to the what the "n"
actually means in CHAR(n). My copy says something like this:

NameSize   Range/Precision
-      
CHAR (n)n characters   1 to 32,767 bytes
   Character set character size determines
   the maximum number of characters that
   can fit in 32K

So it says that "n" represents the number of *characters*, while the maximum
range is "32,767 *bytes*".

Because one UTF-8 character may take between 1 and 4 bytes, perhaps the
server allocates four bytes of storage for it, even though the character
that is actually stored is only 1 byte?

It seems like a bug to me, though. That the server *allocates* 4 bytes seems
reasonable, however I don't think it should be *returning* 4 bytes (unless
actually required).

Dean.




-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
Firebird-net-provider mailing list
Firebird-net-provider@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/firebird-net-provider


Re: [Firebird-net-provider] Short char field with longer name

2007-06-05 Thread Helen Borrie
At 06:12 AM 6/06/2007, you wrote:
>YES! Changing the column's type to VARCHAR(1) works.
>
>Now can someone tell me why?

You said your data was UTF8 (4-byte characters) and your connection 
charset is NONE (which means US ASCII, one-byte characters).  ASCII 
"D" is stored as "44", while UTF8 "D" is stored as "44 00 00 
00".  Varchar(1) is an absurdity, but the engine dutifully strips the 
right-padding from varchar byte streams and the client gets the 
single byte it is expecting.  This hack is only going to work for 
characters that share the same code for the leftmost byte, i.e. 
ASCII/ANSI codes <= hex 80, and only if the search string is a single 
character. Beyond that, all bets are off.

>Doesn't my experience still indicate that there
>is some sort of bug with columns of type CHAR(1)?

In your current setup, yes.  The hack only works if both the client 
charset and the database charset have encodings for all equivalent 
characters _and_ both remain meaningful if reduced to the leftmost 
byte.  (The same applies to the introducer syntax, see below.)

With client charset NONE, you are getting transliteration when 
*storing* strings (since the database engine takes care of it) but it 
can't happen for reads, since the client only knows about single-byte 
encodings and prepares its buffers accordingly.

You'll need to set the correct client charset to ensure that the 
client side of the interface knows the correct attributes for the 
data it is preparing for.  Also follow up on Carlos' advice to 
revisit your Windows language environment and do whatever is needed 
to ensure that the strings being processed by the application are 
valid for UTF8.

Side note:
When you have a literal search argument (as opposed to a 
parameterised one) you can use the introducer syntax to coerce 
literal input strings to a different character set than the 
connection charset, e.g.

SELECT Description from TableName
where Type = _UTF8 'D'

While this syntax can have its uses in databases that store data in a 
variety of character sets, it's no kind of universal formula for 
hacking around your mismatched client/server encodings.  It's not 
possible to coerce output so, as you have already discovered, your 
client prepares a single-byte buffer for your unsearched SELECT 
statement because it is expecting ASCII-encoded characters.

>By the way, I renamed the column back to TYPE and it doesn't seem to cause
>any problems.

This was a red herring.  From Fb 2.0 on, TYPE is no longer a reserved 
word.  See p. 45 (Acrobat p. 55) of the release notes...

cheers,
Helen


-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
Firebird-net-provider mailing list
Firebird-net-provider@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/firebird-net-provider


Re: [Firebird-net-provider] Short char field with longer name

2007-06-05 Thread Dan Cooperstock
YES! Changing the column's type to VARCHAR(1) works.

Now can someone tell me why? Doesn't my experience still indicate that there
is some sort of bug with columns of type CHAR(1)?

By the way, I renamed the column back to TYPE and it doesn't seem to cause
any problems.

- Dan.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
[EMAIL PROTECTED]
Sent: June 5, 2007 3:24 PM

Hi,
Can you declare it as a VARCHAR(1)? Or try "select cast(recipt_type as 
varchar(1)) from "
Regards.,
...michal



-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
Firebird-net-provider mailing list
Firebird-net-provider@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/firebird-net-provider


Re: [Firebird-net-provider] Short char field with longer name

2007-06-05 Thread [EMAIL PROTECTED]
Hi,
Can you declare it as a VARCHAR(1)? Or try "select cast(recipt_type as 
varchar(1)) from "
Regards.,
...michal

Dan Cooperstock napisał(a):
> Here's a bit more info: I just walked through the execution of "SELECT
> RECEIPT_TYPE FROM RECEIPT WHERE ..." in the debugger, and what happens is
> that it is still selecting 4 characters, e.g. "D   ", when the column was
> declared as CHAR(1) and the real value should be "D". Obviously it ends up
> comparing that to the schema, seeing it's too long, and giving the error
> message I have quoted.
> 
> - Dan.

-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
Firebird-net-provider mailing list
Firebird-net-provider@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/firebird-net-provider


Re: [Firebird-net-provider] Short char field with longer name

2007-06-05 Thread Dan Cooperstock
Here's a bit more info: I just walked through the execution of "SELECT
RECEIPT_TYPE FROM RECEIPT WHERE ..." in the debugger, and what happens is
that it is still selecting 4 characters, e.g. "D   ", when the column was
declared as CHAR(1) and the real value should be "D". Obviously it ends up
comparing that to the schema, seeing it's too long, and giving the error
message I have quoted.

- Dan.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
Dan Cooperstock
Sent: June 5, 2007 10:23 AM
To: firebird-net-provider@lists.sourceforge.net
Subject: Re: [Firebird-net-provider] Short char field with longer name

OK, I have renamed the column (from TYPE to RECEIPT_TYPE) and I'm still 
getting the same error message when I do "select * from tablename":

"Failed to enable constraints. One or more rows contain values violating 
non-null, unique or foreign-key constraints."

However, I can execute the same SQL statement successfully in ISQL, which 
makes me think this is indeed a problem in the .NET provider.

Any bright ideas? Thanks.



-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
Firebird-net-provider mailing list
Firebird-net-provider@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/firebird-net-provider


Re: [Firebird-net-provider] Short char field with longer name

2007-06-05 Thread Dan Cooperstock
OK, I have renamed the column (from TYPE to RECEIPT_TYPE) and I'm still 
getting the same error message when I do "select * from tablename":

"Failed to enable constraints. One or more rows contain values violating 
non-null, unique or foreign-key constraints."

However, I can execute the same SQL statement successfully in ISQL, which 
makes me think this is indeed a problem in the .NET provider.

Any bright ideas? Thanks.

[EMAIL PROTECTED] (Dan Cooperstock) wrote in
news:[EMAIL PROTECTED]: 

> I changed my connection string to use UTF8 and I'm still getting the
> same types of errors. Perhaps I do need to rename this column?
> 

-
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
___
Firebird-net-provider mailing list
Firebird-net-provider@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/firebird-net-provider