Hi Oleg,

Since HTTP provides a means to declare the character encoding (the
charset parameter of the Content-Type header), which NaviServer already
uses when acting as a server, NaviServer should behave the same way when
acting as a client, instead of burdening the application with digging
the charset out of the Content-Type header and calling the right
conversion itself.

A "-binary" flag still makes sense when no Content-Type is given, or to
let the developer override the automatic mechanism. Using the "-binary"
flag plus "encoding convertfrom/convertto" should always work. Having
NaviServer versions that produce different results depending on compile
flags (such as TCLHTTP_USE_EXTERNALTOUTF) is not a good idea.

For a more detailed understanding, I have to dig into your examples to
see whether this is indeed a problem on the Tcl side or in NaviServer,
but for that I need a block of time, which is hard for me to find right
now.

-g

On 03.09.20 13:52, oleg wrote:
Hello!

We are having some difficulties using the ns_http command with sites
that use an 8-bit encoding.

The ns_http command does not convert the received data, so we have to
use the 'encoding convertfrom' command. Sometimes the converted strings
are corrupted. For example, with a server whose output encoding is
iso-8859-2:
if the server sends 'äöüŁ', after conversion we get 'äöüŁ' (correct);
if the server sends 'ÄÖÜŁ', after conversion we get 'ÄÖ#' (corrupted).
See the attached ns_http.test1 for an example (test 1.2 fails).

Such strings can be found in any 8-bit encoding (to see them, run the
attached http_charsets.test with the 'pairsTest' constraint enabled).
The source of the ns_http command (tclhttp.c) shows that the problem is
the use of Tcl_NewStringObj, which expects UTF-8, on binary input data
(8-bit characters).

Two solutions come to mind:
1) use Tcl_NewByteArrayObj instead of Tcl_NewStringObj;
2) call Tcl_ExternalToUtf before Tcl_NewStringObj, i.e. a built-in
'encoding convertfrom'.

The attached tclhttp.c.binary-externaltoutf.patch modifies the ns_http
command:
1) a -binary switch is added to the queue/wait/run subcommands, which
makes ns_http use Tcl_NewByteArrayObj on text pages;
2) without -binary, a text page is converted according to the charset
in its Content-Type header.

Note that the second change requires TCLHTTP_USE_EXTERNALTOUTF to be
defined at compile time.

The patched ns_http command can be tested with the attached
ns_http.test2 (see tests 1.2.1 and 1.2.2). More extensive testing of
the changes can be done with http_charsets.test (note the commented-out
pairsTest constraint).
I also replaced 'nstest::http-0.9 -encoding xxx' with 'ns_http run' in
the existing encoding.test (see encoding_ns_http.test). All data
conversions succeed without explicit decoding.

Automatic decoding is convenient, but it changes the behavior of
ns_http on 8-bit input. This could break existing code that uses
ns_http to interact with 8-bit sites (with a risk of data corruption,
since text that is already decoded would be decoded a second time by
'encoding convertfrom'). To use the patched version of ns_http, either
remove the 'encoding convertfrom' calls or add the -binary switch.

Note that the -binary switch followed by 'encoding convertfrom' is also
useful for 8-bit sites with a missing or incorrect Content-Type.

Regards,
Oleg Oleinick.

PS. Attached files:

ns_http.test1 - tests for the current version, shows corruption of
8-bit text;

ns_http.test2 - tests for the patched version, shows that 8-bit text is
received correctly;

tclhttp.c.binary-externaltoutf.patch - patch to the ns_http command that
adds the -binary switch and automatic decoding of text data;

http_charsets.test - tests for ns_http, suitable for both the current
and the patched version;

encoding_ns_http.test - like the existing encoding.test, with
'nstest::http-0.9 -encoding xxx' replaced by the new 'ns_http run';


_______________________________________________
naviserver-devel mailing list
naviserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/naviserver-devel