Re: [Firebird-net-provider] Short char field with longer name
> > Is "é" a "character"? > > Yes. And it is a different character from "e", "ê" or "è". Obviously, but the character "é" can be represented by the single Unicode codepoint U+00E9 (aka LATIN SMALL LETTER E WITH ACUTE) or by the sequence U+0065 U+0301 (aka, LATIN SMALL LETTER E, COMBINING ACUTE ACCENT). My point with this example was to show that one "character" may require more than one codepoint to represent. You could store "é" in a CHAR(1) if you used the representation U+00E9, however you could not store it as U+0065 U+0301. The point is that saying "n" represents a count of "characters" is misleading, > > What I mean is, even if you changed the connection string character > > set to "UTF8", 0x65 0x00 0x00 0x00 represents four UTF-8 characters > > (that is, U+0065 U+ U+ U+). > > According to what, I wonder? According to the definition UTF-8. See http://en.wikipedia.org/wiki/UTF-8 if you need a primer. Dean. - This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ ___ Firebird-net-provider mailing list Firebird-net-provider@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/firebird-net-provider
Re: [Firebird-net-provider] Short char field with longer name
At 12:26 PM 6/06/2007, you wrote: > > Which is exactly correct. It's only confusing to those who don't > > understand that not all characters/character sets encode characters > > with single-byte encodings. > >It's confusing because the "n" represents a buffer size. The "n" represents _a number_ which (by the standard) is the number of CHARACTERS (not a buffer size...SQL definitions are independent of any programming interface). >Even allocating >four bytes you cannot store 1 "character" because a character is such a >confusing word to begin with. Is "é" a "character"? Yes. And it is a different character from "e", "ê" or "è". Each has its own distinct encoding because it is a distinct character. The fact that all of them use a basic image that looks to you like "e" isn't relevant (except in an accent-insensitive collation, natch!) >What if I store it as >U+0065 U+0301? That's two Unicode codepoints for one logical character. What if? You would simply be storing two Unicode codepoints that might have some meaning in some Unicode collation somewhere. It wouldn't be meaningful otherwise. Do you perhaps think a "logical character" has to do with the graphical representation? (it doesn't). You can't, for example, supply a chain of Unicode codepoints and hope it is a hack to enable you to use a font that doesn't support the characters you want to represent... > > The cause of the problem here is not the length of the data stored > > but that the client is set to expect single-byte encoding (by using > > character set NONE) > >I agree that's a problem, however I disagree that the server should be >returning four bytes when the UTF-8 encoded value of "e" (or whatever ASCII >character you like) is only one byte long. If anything, it's a shortcoming of the implementation of strings. Strings have to be stored "somehow". What you have to work with are data types (char and varchar, each with its particular rules that the engine knows about and that application language interface interpreters like the .NET driver know about) and string size (maximum length, defined by a number which is the maximum number of characters allowed). The assumption is that 1 byte==1 character unless you specify otherwise (by defining a character set for which character boundaries are > 1). Both the client and the server will refer to the character set mapping to perform transliteration; but there is nothing about the way strings are implemented in SQL that can support identifying variable boundaries for the "characters" that are represented by the sequences of bytes within the string. There are text editors that *can* do that, but a database engine is not a text editorsoUTF8 characters are stored as 4 bytes with left-to-right significance. By convention, all character sets (including Unicode) that support the unaccented Roman characters (and the so-called "Arabic" numerals) use the same 7-bit encoding for the leftmost byte, viz. the hex range 30 to 5A and 61 to 7A. >What I mean is, even if you changed the connection string character set to >"UTF8", 0x65 0x00 0x00 0x00 represents four UTF-8 characters (that is, >U+0065 U+ U+ U+). According to what, I wonder? A Unicode editor that understands that convention? All a database engine can do is take a sequence of input or output codes and, if necessary, transliterate that sequence according to some rules, that are packaged for it as "character sets" and "collate sequences". (Firebird 2 supports two collate sequences for UTF8.) It doesn't store "characters" and it doesn't return "characters". Hmmm, this has turned into a magnum opus. I must get on with work... Helen - This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ ___ Firebird-net-provider mailing list Firebird-net-provider@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/firebird-net-provider
Re: [Firebird-net-provider] Short char field with longer name
> Which is exactly correct. It's only confusing to those who don't > understand that not all characters/character sets encode characters > with single-byte encodings. It's confusing because the "n" represents a buffer size. Even allocating four bytes you cannot store 1 "character" because a character is such a confusing word to begin with. Is "é" a "character"? What if I store it as U+0065 U+0301? That's two Unicode codepoints for one logical character. > The cause of the problem here is not the length of the data stored > but that the client is set to expect single-byte encoding (by using > character set NONE) I agree that's a problem, however I disagree that the server should be returning four bytes when the UTF-8 encoded value of "e" (or whatever ASCII character you like) is only one byte long. What I mean is, even if you changed the connection string character set to "UTF8", 0x65 0x00 0x00 0x00 represents four UTF-8 characters (that is, U+0065 U+ U+ U+). Dean. - This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ ___ Firebird-net-provider mailing list Firebird-net-provider@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/firebird-net-provider
Re: [Firebird-net-provider] Short char field with longer name
At 11:14 AM 6/06/2007, you wrote: > > YES! Changing the column's type to VARCHAR(1) works. > > > > Now can someone tell me why? Doesn't my experience still indicate that > > there is some sort of bug with columns of type CHAR(1)? > >The language spec is a bit confusing with respect to the what the "n" >actually means in CHAR(n). My copy says something like this: > >NameSize Range/Precision >- >CHAR (n)n characters 1 to 32,767 bytes >Character set character size determines >the maximum number of characters that >can fit in 32K > >So it says that "n" represents the number of *characters*, while the maximum >range is "32,767 *bytes*". Which is exactly correct. It's only confusing to those who don't understand that not all characters/character sets encode characters with single-byte encodings. >Because one UTF-8 character may take between 1 and 4 bytes, perhaps the >server allocates four bytes of storage for it, even though the character >that is actually stored is only 1 byte? The server allocates storage according to the declared data type AND the encoding used for character data. What it returns depends on whether it is defined as CHAR or VARCHAR. CHAR is returned right-padded with zeros (or NUL, i.e. hex 00 in the case of charset OCTETS), while VARCHARs are returned as actual data plus 2 bytes (a smallint carrying the length of the data as NUMBER OF CHARACTERS. The cause of the problem here is not the length of the data stored but that the client is set to expect single-byte encoding (by using character set NONE), so it has prepared its buffer to receive 1 and only 1 byte. The unsuspecting server returns 4 bytes (as it is meant to do) and client squawks. >It seems like a bug to me, though. That the server *allocates* 4 bytes seems >reasonable, however I don't think it should be *returning* 4 bytes (unless >actually required). The server returns what it is meant to. It doesn't have any way to know that the client asked for the wrong thing. Helen - This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ ___ Firebird-net-provider mailing list Firebird-net-provider@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/firebird-net-provider
Re: [Firebird-net-provider] Short char field with longer name
> YES! Changing the column's type to VARCHAR(1) works. > > Now can someone tell me why? Doesn't my experience still indicate that > there is some sort of bug with columns of type CHAR(1)? The language spec is a bit confusing with respect to the what the "n" actually means in CHAR(n). My copy says something like this: NameSize Range/Precision - CHAR (n)n characters 1 to 32,767 bytes Character set character size determines the maximum number of characters that can fit in 32K So it says that "n" represents the number of *characters*, while the maximum range is "32,767 *bytes*". Because one UTF-8 character may take between 1 and 4 bytes, perhaps the server allocates four bytes of storage for it, even though the character that is actually stored is only 1 byte? It seems like a bug to me, though. That the server *allocates* 4 bytes seems reasonable, however I don't think it should be *returning* 4 bytes (unless actually required). Dean. - This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ ___ Firebird-net-provider mailing list Firebird-net-provider@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/firebird-net-provider
Re: [Firebird-net-provider] Short char field with longer name
At 06:12 AM 6/06/2007, you wrote: >YES! Changing the column's type to VARCHAR(1) works. > >Now can someone tell me why? You said your data was UTF8 (4-byte characters) and your connection charset is NONE (which means US ASCII, one-byte characters). ASCII "D" is stored as "44", while UTF8 "D" is stored as "44 00 00 00". Varchar(1) is an absurdity, but the engine dutifully strips the right-padding from varchar byte streams and the client gets the single byte it is expecting. This hack is only going to work for characters that share the same code for the leftmost byte, i.e. ASCII/ANSI codes <= hex 80, and only if the search string is a single character. Beyond that, all bets are off. >Doesn't my experience still indicate that there >is some sort of bug with columns of type CHAR(1)? In your current setup, yes. The hack only works if both the client charset and the database charset have encodings for all equivalent characters _and_ both remain meaningful if reduced to the leftmost byte. (The same applies to the introducer syntax, see below.) With client charset NONE, you are getting transliteration when *storing* strings (since the database engine takes care of it) but it can't happen for reads, since the client only knows about single-byte encodings and prepares its buffers accordingly. You'll need to set the correct client charset to ensure that the client side of the interface knows the correct attributes for the data it is preparing for. Also follow up on Carlos' advice to revisit your Windows language environment and do whatever is needed to ensure that the strings being processed by the application are valid for UTF8. Side note: When you have a literal search argument (as opposed to a parameterised one) you can use the introducer syntax to coerce literal input strings to a different character set than the connection charset, e.g. SELECT Description from TableName where Type = _UTF8 'D' While this syntax can have its uses in databases that store data in a variety of character sets, it's no kind of universal formula for hacking around your mismatched client/server encodings. It's not possible to coerce output so, as you have already discovered, your client prepares a single-byte buffer for your unsearched SELECT statement because it is expecting ASCII-encoded characters. >By the way, I renamed the column back to TYPE and it doesn't seem to cause >any problems. This was a red herring. From Fb 2.0 on, TYPE is no longer a reserved word. See p. 45 (Acrobat p. 55) of the release notes... cheers, Helen - This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ ___ Firebird-net-provider mailing list Firebird-net-provider@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/firebird-net-provider
Re: [Firebird-net-provider] Short char field with longer name
YES! Changing the column's type to VARCHAR(1) works. Now can someone tell me why? Doesn't my experience still indicate that there is some sort of bug with columns of type CHAR(1)? By the way, I renamed the column back to TYPE and it doesn't seem to cause any problems. - Dan. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED] Sent: June 5, 2007 3:24 PM Hi, Can you declare it as a VARCHAR(1)? Or try "select cast(recipt_type as varchar(1)) from " Regards., ...michal - This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ ___ Firebird-net-provider mailing list Firebird-net-provider@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/firebird-net-provider
Re: [Firebird-net-provider] Short char field with longer name
Hi, Can you declare it as a VARCHAR(1)? Or try "select cast(recipt_type as varchar(1)) from " Regards., ...michal Dan Cooperstock napisał(a): > Here's a bit more info: I just walked through the execution of "SELECT > RECEIPT_TYPE FROM RECEIPT WHERE ..." in the debugger, and what happens is > that it is still selecting 4 characters, e.g. "D ", when the column was > declared as CHAR(1) and the real value should be "D". Obviously it ends up > comparing that to the schema, seeing it's too long, and giving the error > message I have quoted. > > - Dan. - This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ ___ Firebird-net-provider mailing list Firebird-net-provider@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/firebird-net-provider
Re: [Firebird-net-provider] Short char field with longer name
Here's a bit more info: I just walked through the execution of "SELECT RECEIPT_TYPE FROM RECEIPT WHERE ..." in the debugger, and what happens is that it is still selecting 4 characters, e.g. "D ", when the column was declared as CHAR(1) and the real value should be "D". Obviously it ends up comparing that to the schema, seeing it's too long, and giving the error message I have quoted. - Dan. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Dan Cooperstock Sent: June 5, 2007 10:23 AM To: firebird-net-provider@lists.sourceforge.net Subject: Re: [Firebird-net-provider] Short char field with longer name OK, I have renamed the column (from TYPE to RECEIPT_TYPE) and I'm still getting the same error message when I do "select * from tablename": "Failed to enable constraints. One or more rows contain values violating non-null, unique or foreign-key constraints." However, I can execute the same SQL statement successfully in ISQL, which makes me think this is indeed a problem in the .NET provider. Any bright ideas? Thanks. - This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ ___ Firebird-net-provider mailing list Firebird-net-provider@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/firebird-net-provider
Re: [Firebird-net-provider] Short char field with longer name
OK, I have renamed the column (from TYPE to RECEIPT_TYPE) and I'm still getting the same error message when I do "select * from tablename": "Failed to enable constraints. One or more rows contain values violating non-null, unique or foreign-key constraints." However, I can execute the same SQL statement successfully in ISQL, which makes me think this is indeed a problem in the .NET provider. Any bright ideas? Thanks. [EMAIL PROTECTED] (Dan Cooperstock) wrote in news:[EMAIL PROTECTED]: > I changed my connection string to use UTF8 and I'm still getting the > same types of errors. Perhaps I do need to rename this column? > - This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ ___ Firebird-net-provider mailing list Firebird-net-provider@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/firebird-net-provider