Re: [naviserver-devel] Issue with certain emojis (unicode/utf-8)

2022-03-18 Thread Gustaf Neumann
Just a short note: iOS 15.4 (released a few days ago) supports some 
more Unicode 14 characters; iOS 15 is supposed to support all of Unicode 
14.0.
The melting face from Unicode 14 on the test page on openacs.org (see link 
below) already works.


-gn

On 04.12.21 15:57, Gustaf Neumann wrote:
It will take some time until the emojis from Unicode 14 are 
generally available, but when that happens, we should already have 
everything working in NaviServer and the DB interfaces. I've added a 
small demo page that one can try when the new clients come out:


https://openacs.org/emojis.tcl





___
naviserver-devel mailing list
naviserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/naviserver-devel


Re: [naviserver-devel] Issue with certain emojis (unicode/utf-8)

2021-12-04 Thread Gustaf Neumann

Dear all,

It will take some time until the emojis from Unicode 14 are 
generally available, but when that happens, we should already have 
everything working in NaviServer and the DB interfaces. I've added a 
small demo page that one can try when the new clients come out:


https://openacs.org/emojis.tcl

One interesting part is the grapheme cluster (e.g. 👩‍👩‍👧‍👦), 
which is made up of the following Unicode code points:


   WOMAN ZWJ WOMAN ZWJ GIRL ZWJ BOY

where ZWJ is the zero-width joiner (U+200D). One can enter these e.g. via

  set x \ud83d\udc69\u200d\ud83d\udc69\u200d\ud83d\udc67\u200d\ud83d\udc66

in current Tcl. When just passing such a string through Tcl, everything 
seems fine. I would not be surprised if "string length", "string 
range" etc. lead to unexpected results, but it is quite fun to 
decompose this emoji with Tcl:


   % set x \ud83d\udc69\u200d\ud83d\udc69\u200d\ud83d\udc67\u200d\ud83d\udc66
   👩‍👩‍👧‍👦
   % string range $x 0 1
   👩
   % string range $x 6 7
   👧
   % string range $x 9 10
   👦

AFAICT, everything is fine with NaviServer in this respect.
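
The index pairs used with "string range" above follow from how the cluster is laid out in UTF-16 code units. The following is a minimal sketch in Python (used here only because its result does not depend on how Tcl was compiled); the variable names are illustrative and not part of NaviServer or Tcl.

```python
# The family emoji: WOMAN ZWJ WOMAN ZWJ GIRL ZWJ BOY
family = "\U0001F469\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466"

# Three different "lengths" of the same grapheme cluster.
code_points = len(family)                          # Unicode code points
utf16_units = len(family.encode("utf-16-le")) // 2  # 16-bit code units
utf8_bytes = len(family.encode("utf-8"))           # bytes on the wire

# Position of every code point when counted in UTF-16 code units,
# which is the indexing that the Tcl session above appears to use.
spans = []
pos = 0
for ch in family:
    width = 2 if ord(ch) > 0xFFFF else 1  # non-BMP needs a surrogate pair
    spans.append((f"U+{ord(ch):04X}", pos, pos + width - 1))
    pos += width

print(code_points, utf16_units, utf8_bytes)
for name, first, last in spans:
    print(name, first, last)
```

Under this layout the first woman occupies units 0 to 1, the girl 6 to 7 and the boy 9 to 10, which matches the ranges used in the Tcl session.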

Concerning Unicode 14:

Android 12L contains support for emojis from Unicode 14 [1]. Google 
announced Android 12L in October 2021, less than one month after the 
stable release of Android 12. 12L is expected in early 2022 [2].


According to [3], iOS 15.0 will not include Unicode 14 emojis. Support 
for Emoji 14.0 on Apple platforms is expected in the first half of 2022 
(probably in iOS 16).


all the best

-g

[1] https://9to5google.com/2021/10/27/android-12l-unicode-14/
[2] https://developer.android.com/about/versions/12/12L/summary
[3] https://emojipedia.org/apple/


--
Univ.Prof. Dr. Gustaf Neumann
Head of the Institute of Information Systems and New Media
of Vienna University of Economics and Business
Program Director of MSc "Information Systems"


Re: [naviserver-devel] Issue with certain emojis (unicode/utf-8)

2021-11-26 Thread Wolfgang Winkler via naviserver-devel

Hi!

We've now tested the encoding extensively. All emojis up to version 13 
are handled correctly, including database storage and retrieval, tDOM 
and form handling.


Version 14 emojis are not supported by any of the browsers we've tested, 
but they don't throw errors. It seems we are safe for future updates.


Wolfgang



Re: [naviserver-devel] Issue with certain emojis (unicode/utf-8)

2021-11-18 Thread Gustaf Neumann

Dear all

An update is now on Bitbucket (see the change log message below) that 
introduces support for UTF-8 characters using up to 4 bytes (with Tcl 
8.6). It should work as well with 6-byte UTF-8 when Tcl 8.7 is properly 
compiled (by setting TCL_UTF_MAX).


One can now use e.g. emoticons in SQL queries

db_0or1row ... {select 1 from cr_items where name = ''}

or as values of bind variables

set x 
db_0or1row ... {select 1 from cr_items where name = :x}

... but not as names of bind variables (these have the same restricted 
syntax as before; in essence, no funny characters).
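
The distinction between bind-variable names and bind-variable values can be illustrated with a small Python sketch. It is not the OpenACS or nsdbpg parser: the regular expression and the helper `bind_names` are hypothetical and only assume that a name is a colon followed by an ASCII identifier.

```python
import re

# Hypothetical rule: ':' followed by an ASCII identifier. The lookbehind
# skips PostgreSQL casts such as "name::text".
BIND_NAME = re.compile(r"(?<!:):([A-Za-z_][A-Za-z0-9_]*)")

def bind_names(sql):
    """Return the bind-variable names found in an SQL string."""
    return BIND_NAME.findall(sql)

sql = "select 1 from cr_items where name = :x"
values = {"x": "\U0001F603"}  # an emoji is fine as a *value*

names = bind_names(sql)
# Only the values are encoded per call; the SQL text itself stays fixed.
encoded = {name: values[name].encode("utf-8") for name in names}

# An emoji used as a *name* is simply not recognised as a bind variable.
emoji_as_name = bind_names("select 1 from cr_items where name = :\U0001F603")

print(names, encoded, emoji_as_name)
```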

The code is already running at openacs.org.

all the best

-gn


Added support for UTF-8 characters up to 4 bytes

These changes add proper export of UTF-8 for Unicode symbols taking up
to 4 bytes. For the western world the biggest interest is probably for
emoticons. The change is implemented with performance in mind. The
properly encoded byte strings are cached in Tcl_Objs, such that only the
values for bind variables (which probably have different values per
call) have to be recoded at call time. This should keep the performance
penalty small (on some of our servers we see a daily average of 1500 SQL
operations per second, with peaks above 10K).

The names of bind variables still follow the same rules as before (no
emoticons as variable names).



Re: [naviserver-devel] Issue with certain emojis (unicode/utf-8)

2021-11-16 Thread Wolfgang Winkler via naviserver-devel
We have a workaround that is probably much worse than yours, but it works. 
We are using nsdbpg; I have not tried the query with the nsdbi interface.




Re: [naviserver-devel] Issue with certain emojis (unicode/utf-8)

2021-11-16 Thread Gustaf Neumann
Funnily enough, I saw a very similar problem today and produced a local 
fix for it. I am not happy with this fix since it is rather costly, so 
I would like to work on it more before committing. However, today and 
tomorrow I am fully booked with urgent items, so don't expect a fix for 
this before the weekend.


-g

PS: I assume you are using the nsdbpg driver.



Re: [naviserver-devel] Issue with certain emojis (unicode/utf-8)

2021-11-16 Thread Wolfgang Winkler via naviserver-devel

Hello all,

the fix worked, thank you Gustaf! But we still have a problem with 
emojis when writing them to the database. The error we get is:


Database operation "dml" failed (exception ERROR, "ERROR:  invalid byte 
sequence for encoding "UTF8": 0xf0 0x9f 0x98 0xff


when trying to write the emoji to a TEXT or VARCHAR field in the 
database. Inserting the same string in the database console works as 
expected. When we read the string and reinsert it, it also works 
flawlessly.
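
As a side note on the error message above, the following minimal Python sketch checks the reported byte sequence against strict UTF-8. It assumes the emoji is U+1F603 (the one from the Emojipedia link in this thread); the helper `is_valid_utf8` is illustrative only.

```python
# Bytes reported in the PostgreSQL error message above.
reported = bytes([0xF0, 0x9F, 0x98, 0xFF])

# Assumption: the emoji is U+1F603; this is its standard UTF-8 encoding.
expected = "\U0001F603".encode("utf-8")

def is_valid_utf8(data):
    """Return True if the bytes decode as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

print(expected.hex(" "))
print(is_valid_utf8(expected))
print(is_valid_utf8(reported))
```

The two sequences share the first three bytes; the byte 0xff can never occur in well-formed UTF-8, so the string was already damaged before it reached the database.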


We've compared the two strings, written them to files and compared them 
with a hex viewer, and converted them with Tcl "encoding convertto" and 
iconv, all with no luck.


We are using PostgreSQL 12 and the nsdbpg module with 
naviserver-4.99.22-16-g67adf3c34710+


Here is the test case:

In the database console:

CREATE TABLE test (
    idx SERIAL,
    txt TEXT
);

INSERT INTO test (txt) VALUES ('😃');

In the naviserver console or in a script:

# V1: working
set db [ns_db gethandle]
set sql "SELECT txt FROM test WHERE idx=1"
set selection [ns_db 1row $db $sql]
set str [ns_set value $selection 0]
set sql "INSERT INTO test (txt) VALUES ('$str')"
ns_db dml $db $sql
ns_db releasehandle $db

# V2: not working
set db [ns_db gethandle]
set sql "INSERT INTO test (txt) VALUES ('😃')"
ns_db dml $db $sql
ns_db releasehandle $db

With nscp, pasting the string of V2 already shows a wrong string in the log:

Notice: nscp:  3: set sql "INSERT INTO test (txt) VALUES 
('���')"



Whereas V1 works (the smiley is not printed here, but works in the console):

Notice: nscp:  5: puts $str


Any help is greatly appreciated!

Wolfgang Winkler



Re: [naviserver-devel] Issue with certain emojis (unicode/utf-8)

2021-11-09 Thread Gustaf Neumann

Dear all,

The situation is trickier than one might hope. Aside from the Tcl 
version dependencies (as Brian pointed out), Tcl versions before 8.7 do 
not support TCL_UTF_MAX with multi-byte sequences longer than 4 (see Tcl 
TIP 389), which are mostly relevant for some newer emojis. So, for full 
emoji support, Tcl 8.7 with the proper compilation options is needed.


Anyhow, in the case of Wolfgang's "Smiling Face with Open Mouth" we 
have just a 4-byte UTF-8 character, which is supported by out-of-the-box 
Tcl 8.6. However, this emoji is represented Tcl-internally as a 6-byte 
sequence. NaviServer wrongly assumed that Tcl-internal representations 
are always accepted as external representations (which is not always 
true), so a conversion step was omitted for UTF-8.
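
The difference between the two representations can be made visible with a minimal Python sketch. It only mimics the 6-byte form by encoding each UTF-16 surrogate separately; it is an illustration under that assumption, not NaviServer or Tcl code.

```python
# Assumption: the emoji is U+1F603 ("Smiling Face with Open Mouth").
emoji = "\U0001F603"

# External representation: standard UTF-8, a single 4-byte sequence.
external = emoji.encode("utf-8")

# Internal-style representation: split the code point into a UTF-16
# surrogate pair and encode each surrogate as its own 3-byte sequence.
offset = ord(emoji) - 0x10000
high = 0xD800 + (offset >> 10)
low = 0xDC00 + (offset & 0x3FF)
internal = (chr(high) + chr(low)).encode("utf-8", "surrogatepass")

# A strict UTF-8 consumer (such as a database) rejects the 6-byte form.
try:
    internal.decode("utf-8")
    accepted = True
except UnicodeDecodeError:
    accepted = False

# The conversion step: recombine the surrogates, then emit real UTF-8.
pair = internal.decode("utf-8", "surrogatepass")
converted = pair.encode("utf-16", "surrogatepass").decode("utf-16").encode("utf-8")

print(external.hex(" "))
print(internal.hex(" "))
print(accepted, converted == external)
```

Sending the 6-byte form unchanged is what a strict consumer complains about; after the conversion step the bytes are identical to the external form.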


In the tip version of NaviServer on Bitbucket, this optimization is now 
removed, the examples work as expected, and the regression test is 
extended for this case.


Many thanks to Wolfgang for the good bug report.

-g







Re: [naviserver-devel] Issue with certain emojis (unicode/utf-8)

2021-11-08 Thread Brian Fenton
Hi Wolfgang

I wonder if it's a Tcl issue? For example, this discussion mentions 
problems with emojis in some Tcl versions: 
https://core.tcl-lang.org/tips/doc/trunk/tip/600.md

Brian


From: Wolfgang Winkler via naviserver-devel 

Sent: Monday 8 November 2021 11:09
To: naviserver-devel@lists.sourceforge.net 

Cc: Wolfgang Winkler 
Subject: [naviserver-devel] Issue with certain emojis (unicode/utf-8)




Dear all!

We face some issues with certain UTF-8 characters, e.g. this one: 
https://emojipedia.org/grinning-face-with-big-eyes/.

When we set "charset" to "utf-8", everything works as expected, except the 
output of various emojis.

set str "Test smiley: öüäÖÄÜß"
ns_return 200 "text/html; charset=unicode" $str; return

returns the smiley correctly, whereas

set str "Test smiley: öüäÖÄÜß"
ns_return 200 "text/html; charset=utf-8" $str; return

returns

Test smiley: ��
öüäÖÄÜß

So I tried to set the charset to "unicode". This works for some files and not 
for others, especially not for JavaScript files.

These are the parameters in the config section:

ns_section "ns/parameters"
 
 #ns_param   HackContentType true
 ns_param   HackContentType false
 ns_param   OutputCharset   $charset
 ns_param   URLCharset  $charset


We also tried with nscp and the tclsh:

nscp Input:

puts "😃"

Log Output:

Notice: nscp:  1: puts "���"

The nscp telnet client does not return to the prompt.

Tclsh works as expected:

tclsh
% puts "😃"



Tcl version is 8.6.11, naviserver 4.99.22 running on Debian 10.11.

Has anybody encountered and solved a similar issue?

Thanks,

Wolfgang Winkler

--

Wolfgang Winkler
Geschäftsführung
wolfgang.wink...@digital-concepts.com
mobil +43.699.19971172

dc:büro
digital concepts Novak Winkler OG
Software & Design
Landstraße 68, 5. Stock, 4020 Linz
www.digital-concepts.com
tel +43.732.997117.72
tel +43.699.1997117.2

Firmenbuchnummer: 192003h
Firmenbuchgericht: Landesgericht Linz
