
Re: [PHP-DB] Re: CHAR field with charset UTF8 and COLLATION UNICODE_CI_AI or UTF8 PHP is loading white spaces

2016-11-22 Thread Lester Caine

On 22/11/16 18:01, Delmar Wichnieski wrote:
> 2016-11-22 12:42 GMT-02:00 Lester Caine :
> 
>> >  needs help to move
>> > the string to a variable that it can check if the UTF8 data is a single
>> > character or multiple characters.
>> >
>> >
> I believe it is a single byte; the goal is to simulate a boolean field,
> where I only use S for yes and N for no (the same as Y and N in English).
> 
> There are no operations like 'upper' and 'lower'. The script is very simple,
> as shown in the pastebin links in the previous message.
> 
> S and N are in the range between 0 and 127 of the ASCII table and UTF-8
> says that only one byte is required to encode the first 128 ASCII
> characters (Unicode U+0000 to U+007F).
> 
> But even if it consumed 2, 3 or 4 bytes, the UTF-8 encoding itself tells us
> where the character ends, so it would be enough to find the end, apply the
> inverse of the encoding algorithm to recover the code point, and we would
> have the character back. This is just a dream.
> 
> 
> If the situation presented in this thread is a problem, then more people
> should report it. Let's wait. For now I'll use trim, or a cast.
> 
> example
> 
> $q = $pdoconn->prepare("SELECT CODIGO, CAST(ACESSOSISTEMA AS VARCHAR(1)) AS
> ACESSOSISTEMA FROM USUARIO");
> 
> And the problem is solved. Or there may be yet another solution I have not
> thought of.

That is perhaps the point. PHP on its own can't decide whether you need to
convert to a single-byte ASCII character, allow space for a multi-byte single
character, or something else. All PHP sees is a buffer with a number of
bytes in it, and what comes over the wire from Firebird even strips any
trailing space characters, requiring the client end to untangle things.
If you want a Unicode string you have to handle it with the mbstring
functions, since the simple single-byte buffer does not know that it holds
anything other than plain 8-bit data. Now a CHAR(1) could be treated as a
special case, but CHAR(2+) cannot be handled so easily. This is one reason
why the normal 'hack' to add a binary (boolean) domain is to use a SMALLINT
rather than a CHAR and store NULL/0/1 ...
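
[Editor's illustration, not part of the original message] The difference
between the byte buffer PHP sees and the characters it holds can be shown
with a short sketch. It assumes PHP 7 or later and that the mbstring
extension is loaded; the sample strings are made up for the illustration.

```php
<?php
// Illustrative only: compare byte length and character length of a few
// UTF-8 strings. The last sample mimics the padded value from this thread.
$samples = ["S", "\u{00E9}", "\u{20AC}", "S   "];

foreach ($samples as $s) {
    printf(
        "hex=%s bytes=%d chars=%d\n",
        bin2hex($s),               // raw bytes as PHP stores them
        strlen($s),                // strlen() counts bytes
        mb_strlen($s, 'UTF-8')     // mb_strlen() counts code points
    );
}
```

strlen() only ever reports bytes, so any check on a database value that
depends on the number of characters has to go through an encoding-aware
function.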

I'm not saying that the current results are correct, just that without
native handling of Unicode there are some edge cases which could be
resolved in different ways. Returning a Unicode CHAR field as a fixed
number of 32-bit characters has an attraction when one needs to work with
particular fixed character positions in the string, but while UNICODE_FSS
was designed with that in mind, UTF8 *IS* the right way forward once
everybody actually supports it ;)

One of my pet gripes is that simple PHP variables do not play well with
database fields. Rather than having to pull in mbstring, what is needed is
to extend 'string' so that it can be flagged as utf8 and handle a utf8 field
natively. The fact that Firebird is capable of using a different collation
for each field is not something that PHP understands, and is another reason
I don't use PDO at all in production. With ADOdb one has a bit more access
to the metadata for the query.

-- 
Lester Caine - G8HFL
-
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk

-- 
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Re: [PHP-DB] Re: CHAR field with charset UTF8 and COLLATION UNICODE_CI_AI or UTF8 PHP is loading white spaces

2016-11-22 Thread Delmar Wichnieski

2016-11-22 12:42 GMT-02:00 Lester Caine :

>  needs help to move
> the string to a variable that it can check if the UTF8 data is a single
> character or multiple characters.
>
>
I believe it is a single byte; the goal is to simulate a boolean field,
where I only use S for yes and N for no (the same as Y and N in English).

There are no operations like 'upper' and 'lower'. The script is very simple,
as shown in the pastebin links in the previous message.

S and N are in the range between 0 and 127 of the ASCII table and UTF-8
says that only one byte is required to encode the first 128 ASCII
characters (Unicode U+0000 to U+007F).

But even if it consumed 2, 3 or 4 bytes, the UTF-8 encoding itself tells us
where the character ends, so it would be enough to find the end, apply the
inverse of the encoding algorithm to recover the code point, and we would
have the character back. This is just a dream.
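
[Editor's illustration, not part of the original message] Finding the end of
a UTF-8 character from its first byte can be sketched as below. Both
function names are hypothetical helpers invented for this note; the sketch
only inspects the lead byte and does not validate the continuation bytes.

```php
<?php
// Hypothetical helper: number of bytes in the UTF-8 sequence that starts
// with the given lead byte, or 0 if the byte cannot start a sequence.
function utf8SequenceLength(string $leadByte): int
{
    $b = ord($leadByte);
    if ($b < 0x80) { return 1; }            // 0xxxxxxx: ASCII, U+0000..U+007F
    if (($b & 0xE0) === 0xC0) { return 2; } // 110xxxxx
    if (($b & 0xF0) === 0xE0) { return 3; } // 1110xxxx
    if (($b & 0xF8) === 0xF0) { return 4; } // 11110xxx
    return 0;                               // continuation or invalid byte
}

// Hypothetical helper: return the first character of a UTF-8 byte buffer,
// ignoring whatever padding follows it.
function firstUtf8Char(string $buffer): string
{
    if ($buffer === '') {
        return '';
    }
    return substr($buffer, 0, utf8SequenceLength($buffer[0]));
}

var_dump(firstUtf8Char("S   ") === "S"); // bool(true)
```

For the S/N flag discussed here the lead byte is below 0x80, so the
character is exactly one byte and the rest of the buffer is padding.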


If the situation presented in this thread is a problem, then more people
should report it. Let's wait. For now I'll use trim, or a cast.

example

$q = $pdoconn->prepare("SELECT CODIGO, CAST(ACESSOSISTEMA AS VARCHAR(1)) AS
ACESSOSISTEMA FROM USUARIO");

And the problem is solved. Or there may be yet another solution I have not
thought of.
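
[Editor's illustration, not part of the original message] The trim
alternative mentioned above could look roughly like this on the PHP side.
The helper name trimCharColumns is hypothetical, and the row is simulated
so that the sketch runs without a Firebird connection.

```php
<?php
// Hypothetical helper: strip trailing spaces from the named CHAR columns
// of one fetched row. Other columns and non-string values are untouched.
function trimCharColumns(array $row, array $charColumns): array
{
    foreach ($charColumns as $name) {
        if (isset($row[$name]) && is_string($row[$name])) {
            $row[$name] = rtrim($row[$name], ' ');
        }
    }
    return $row;
}

// Simulated row, shaped like the result reported in this thread.
$row = ['CODIGO' => 1, 'ACESSOSISTEMA' => 'S   '];
$row = trimCharColumns($row, ['ACESSOSISTEMA']);
var_dump($row['ACESSOSISTEMA'] === 'S'); // bool(true)
```

Unlike the CAST in the query, this also removes the padding that a
CHAR(n) column legitimately carries up to its declared length, so it only
suits columns where trailing spaces have no meaning.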

Thank you for the precious information.

Delmar Wichnieski

Re: [PHP-DB] Re: CHAR field with charset UTF8 and COLLATION UNICODE_CI_AI or UTF8 PHP is loading white spaces

2016-11-22 Thread Lester Caine

On 22/11/16 13:56, Delmar Wichnieski wrote:
> But VARCHAR fields work correctly. Problem only in CHAR.

VARCHAR is trimmed to the number of bytes used ... not the number of
'characters'! In PHP, CHAR is only designed for single-byte characters, so
when multi-byte characters are supplied to a CHAR(1) field PHP does not know
how many actual characters are displayed. I'm not saying what is currently
happening is right, but it is 'safe', since PHP then needs help to move
the string to a variable where it can check whether the UTF8 data is a
single character or multiple characters.
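
[Editor's illustration, not part of the original message] The behaviour
described above can be modelled without a database. The sketch assumes,
based only on the string(4) result reported in this thread, that a CHAR(n)
CHARACTER SET UTF8 value arrives as n * 4 bytes padded with spaces. Both
function names are hypothetical, and the mbstring extension is assumed.

```php
<?php
// Assumption: the buffer for CHAR(n) UTF8 is n * 4 bytes, space-padded on
// the right, which matches the 'S   ' seen for CHAR(1) in this thread.
function simulateCharBuffer(string $value, int $declaredChars): string
{
    return str_pad($value, $declaredChars * 4, ' ');
}

// Hypothetical fix-up: keep only the declared number of characters,
// counting characters rather than bytes.
function cutToDeclaredLength(string $buffer, int $declaredChars): string
{
    return mb_substr($buffer, 0, $declaredChars, 'UTF-8');
}

$buffer = simulateCharBuffer('S', 1);
var_dump(strlen($buffer));                         // int(4)
var_dump(cutToDeclaredLength($buffer, 1) === 'S'); // bool(true)
```

This needs the declared length of the column, which is the kind of
metadata the client has to obtain from the driver before it can untangle
the buffer.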

> And it should not be a gamble, because
> "Each UTF is reversible, thus every UTF supports lossless round tripping:
> mapping from any Unicode coded character sequence S to a sequence of bytes
> and back will produce S again."
> Source
> http://unicode.org/faq/utf_bom.html

Provided that there is no processing of the data then that is correct,
but operations like 'upper' and 'lower' can result in a change in number
of characters, and the addition of accent characters can also result in
differences. It is this area that basically stopped the development of a
UTF8 native PHP6. Normalization in http://www.unicode.org/reports/tr15/
is a minefield even for the Firebird collation process ... Just how long
is the normalized string?
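
[Editor's illustration, not part of the original message] The length
question can be seen with one accented letter written in two canonically
equivalent ways. This sketch assumes PHP 7 or later with mbstring loaded
and performs no normalization itself; it only compares the two spellings.

```php
<?php
$precomposed = "\u{00E9}";   // U+00E9 LATIN SMALL LETTER E WITH ACUTE
$decomposed  = "e\u{0301}";  // "e" followed by U+0301 COMBINING ACUTE ACCENT

foreach ([$precomposed, $decomposed] as $s) {
    printf(
        "hex=%s bytes=%d code points=%d\n",
        bin2hex($s),
        strlen($s),
        mb_strlen($s, 'UTF-8')
    );
}

// Both render as the same letter, yet a plain byte comparison differs.
var_dump($precomposed === $decomposed); // bool(false)
```

The first form is 2 bytes and 1 code point, the second is 3 bytes and
2 code points, so the space a value needs depends on which form it is in.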

> 2016-11-22 11:21 GMT-02:00 Lester Caine :
> 
>> > On 22/11/16 12:58, Delmar Wichnieski wrote:
>>> > > Since there was no answer here on the list, I was feeling alone and
>> > afraid
>>> > > and wondering why no one else has this problem.
>> >
>> > Delmar I must apologise as I HAD posted a reply, but it did not actually
>> > go through ... list in bounce emails mode which I missed ...
>> >
>> > The simple answer is that strings in PHP are not UTF8, so the 'bug' you
>> > are reporting is really that we need to make sure the single-byte
>> > buffer for a string is long enough. Since PHP6 is not going to happen,
>> > handling UTF8 strings properly means processing the simple PHP strings
>> > with the mbstring functions. It is a gamble whether UTF8 will be
>> > transferred properly as a simple string variable in PHP, and string
>> > length will be reported in bytes rather than characters ...

-- 
Lester Caine - G8HFL

Re: [PHP-DB] Re: CHAR field with charset UTF8 and COLLATION UNICODE_CI_AI or UTF8 PHP is loading white spaces

2016-11-22 Thread Delmar Wichnieski

But VARCHAR fields work correctly. Problem only in CHAR.

And it should not be a gamble, because
"Each UTF is reversible, thus every UTF supports lossless round tripping:
mapping from any Unicode coded character sequence S to a sequence of bytes
and back will produce S again."
Source
http://unicode.org/faq/utf_bom.html

2016-11-22 11:21 GMT-02:00 Lester Caine :

> On 22/11/16 12:58, Delmar Wichnieski wrote:
> > Since there was no answer here on the list, I was feeling alone and
> afraid
> > and wondering why no one else has this problem.
>
> Delmar I must apologise as I HAD posted a reply, but it did not actually
> go through ... list in bounce emails mode which I missed ...
>
> The simple answer is that strings in PHP are not UTF8, so the 'bug' you
> are reporting is really that we need to make sure the single-byte
> buffer for a string is long enough. Since PHP6 is not going to happen,
> handling UTF8 strings properly means processing the simple PHP strings
> with the mbstring functions. It is a gamble whether UTF8 will be
> transferred properly as a simple string variable in PHP, and string
> length will be reported in bytes rather than characters ...

Re: [PHP-DB] Re: CHAR field with charset UTF8 and COLLATION UNICODE_CI_AI or UTF8 PHP is loading white spaces

2016-11-22 Thread Lester Caine

On 22/11/16 12:58, Delmar Wichnieski wrote:
> Since there was no answer here on the list, I was feeling alone and afraid
> and wondering why no one else has this problem.

Delmar I must apologise as I HAD posted a reply, but it did not actually
go through ... list in bounce emails mode which I missed ...

The simple answer is that strings in PHP are not UTF8, so the 'bug' you
are reporting is really that we need to make sure the single-byte
buffer for a string is long enough. Since PHP6 is not going to happen,
handling UTF8 strings properly means processing the simple PHP strings
with the mbstring functions. It is a gamble whether UTF8 will be
transferred properly as a simple string variable in PHP, and string
length will be reported in bytes rather than characters ...

-- 
Lester Caine - G8HFL

[PHP-DB] Re: CHAR field with charset UTF8 and COLLATION UNICODE_CI_AI or UTF8 PHP is loading white spaces

2016-11-22 Thread Delmar Wichnieski

For 30 days I avoided reporting this as a bug, because I researched a lot and
asked in some forums with more than 50 thousand users, and I did not find
anyone else with this problem. That is why, in the initial post, I asked
whether it could be a configuration problem, a bug, or a lack of full UTF8
support with Firebird CHAR fields.

Since there was no answer here on the list, I was feeling alone and afraid
and wondering why no one else has this problem.

Test environment
Windows 10
PHP 7.1.0RC5+ x64 VC14 TS
Apache Lounge 2.4.23 x64 VC14
Firebird 3.0.1 x64

Configuration
php.ini
default_charset = "UTF-8"
PHP script file with the test saved in UTF-8



Example
CHAR size 1
In the database
'S'
In PHP the result is
'S   ' instead of 'S'
var_dump(char_field)
string(4) "S   "

(The issue is the same with ibase_query as with PDO)

IMPORTANT
To isolate the problem, all CREATE and INSERT statements were executed with isql


DDL for all 5 collations

http://pastebin.com/0dK6xqS5

PHP test script

http://pastebin.com/ZRmMRiDy

PHP test script with PDO

http://pastebin.com/r7rErRyS

[PHP-DB] Re: CHAR field with charset UTF8 and COLLATION UNICODE_CI_AI or UTF8 PHP is loading white spaces

2016-11-22 Thread Christoph M. Becker

On 22.11.2016 at 12:27, Delmar Wichnieski wrote:

> This issue also occurs with the following PHP versions (for any UTF8 collation)
> PHP 7.0.12+
> PHP 7.1.0 RC4+
> Firebird 2.5.5+ and Firebird-3.0.1.32609_0_x64
> I have not tested other versions.
> 
> 
> Could the driver map the SQLVARs of SQL_TEXT to SQL_VARYING and adjust
> offsets and lengths?
> 
> Or else it has to do the manual work of identifying the charset (UTF8 =
> number 4) and reading byte by byte, assembling the string while disregarding
> the extra spaces.
> 
> How can this be resolved without using trim, or could there be a fix in an
> upcoming release if the driver is not able to work with CHAR fields in UTF8?
> 
> 2016-10-21 10:24 GMT-02:00 Delmar Wichnieski :
> 
>> Subject:
>> CHAR field with charset UTF8 and COLLATION UNICODE_CI_AI or UTF8 PHP is
>> loading white spaces
>> Example
>> 'S   ' instead of 'S'
>>
>> Environment
>> Windows 10
>> PHP 7.0.12 x64 VC14 TS
>> Apache Lounge 2.4.23 x64 VC14
>> firebird 2.5.5
>>
>> Configuration
>> php.ini
>> default_charset = "UTF-8"
>> connection to the database
>> ibase_connect("localhost:" . DB, user, pw, "UTF8")
>> connection PHP script file in UTF-8
>> PHP script file with the test in UTF-8
>> response header
>> Content-Type: text/html; charset=UTF-8
>> file.html
>> 
>>
>> Migration in Firebird 2.5.5 charset ISO8859_1 collate PT_BR to UTF8 and
>> UNICODE_CI_AI (firebird 2.5.5)
>>
>> DDL
>>
>> SET SQL DIALECT 3;
>>
>> SET NAMES UTF8;
>>
>> SET CLIENTLIB 'C:\Program Files\Firebird\Firebird_2_5_5\
>> WOW64\fbclient.dll';
>>
>> CREATE DATABASE 'D:\MYDB_UTF8.FDB'
>> USER 'SYSDBA' PASSWORD 'A'
>> PAGE_SIZE 4096
>> DEFAULT CHARACTER SET UTF8 COLLATION UNICODE_CI_AI;
>>
>> CREATE TABLE USUARIO (
>> CODIGO         INTEGER NOT NULL,
>> USUARIO        VARCHAR(20) CHARACTER SET UTF8 NOT NULL COLLATE
>> UNICODE_CI_AI,
>> SENHA          VARCHAR(10) CHARACTER SET UTF8 NOT NULL COLLATE
>> UNICODE_CI_AI,
>> CODCIDADE      INTEGER,
>> ACESSOSISTEMA  CHAR(1) CHARACTER SET UTF8 COLLATE UNICODE_CI_AI,
>> CPF            VARCHAR(12) CHARACTER SET UTF8 COLLATE UNICODE_CI_AI
>> );
>>
>>
>>
>> Example
>> CHAR size 1
>> In the database
>> 'S'
>> In PHP the result is
>> 'S   ' instead of 'S'
>> var_dump(char_field)
>> string(4) "S   "
>>
>> (Both with ibase_query and with PDO)
>>
>> Is it a configuration problem, a bug, or incomplete support for UTF-8?

On a quick glance, it seems this issue has been reported as bug, see
.

-- 
Christoph M. Becker


