Hi Oleg, since HTTP provides a means to declare the character encoding of a response, and NaviServer already uses it when acting as a server, NaviServer should behave the same way when acting as a client. The application should not have to dig into the Content-Type charset to call the right conversion routines.
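To illustrate the client-side behavior described above: take the charset parameter from the Content-Type header and decode the body with it, and hand back the raw bytes when no charset is declared or raw data is explicitly requested. This is a minimal Python sketch, not NaviServer's implementation. The function names and the `binary` parameter (standing in for a "-binary"-style override) are hypothetical, and the header parser is simplified (it does not handle every quoting rule of the HTTP specification).

```python
from typing import Optional, Union


def charset_from_content_type(content_type: Optional[str]) -> Optional[str]:
    """Return the lower-cased charset parameter of a Content-Type value, or None."""
    if not content_type:
        return None
    # Skip the media type (e.g. "text/html"), then inspect each ";"-separated parameter.
    for param in content_type.split(";")[1:]:
        name, sep, value = param.strip().partition("=")
        if sep and name.strip().lower() == "charset":
            # Parameter values may be quoted, e.g. charset="iso-8859-2".
            return value.strip().strip('"').lower() or None
    return None


def decode_body(body: bytes, content_type: Optional[str],
                binary: bool = False) -> Union[str, bytes]:
    """Decode an HTTP response body using its declared charset.

    `binary` is a hypothetical override in the spirit of a "-binary" flag:
    when it is set, or when no charset is declared, the raw bytes are
    returned unchanged so the caller can convert them explicitly.
    """
    charset = None if binary else charset_from_content_type(content_type)
    if charset is None:
        return body
    return body.decode(charset)
```

For example, `decode_body("ÄÖÜŁ".encode("iso-8859-2"), "text/html; charset=iso-8859-2")` yields the text `ÄÖÜŁ`, while passing `binary=True` returns the original bytes. That corresponds to the "-binary" plus explicit conversion path for responses with a missing or wrong charset.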
A "-binary" flag still makes sense in cases where no Content-Type is given, or to let the developer override the other mechanisms. Combining the "-binary" flag with "encoding convertfrom/convertto" should always work. However, having NaviServer builds that produce different results depending on compile-time flags is not a good idea. To get a more detailed understanding, I have to dig into your examples to see whether this is indeed a problem on the Tcl side or in NaviServer, ... but for this I need a certain block of time, which is hard for me to find right now.
-g

On 03.09.20 13:52, oleg wrote:
Hello! We are having some difficulties when using the ns_http command with sites that use an 8-bit encoding. The ns_http command does not convert the received data, so we must use the 'encoding convertfrom' command, and sometimes the converted strings become corrupted. For example, take a server whose output encoding is iso-8859-2: if the server sends 'äöüŁ', then after conversion we get 'äöüŁ' (correct); if the server sends 'ÄÖÜŁ', then after conversion we get 'ÄÖ#' (corrupted). See the attached ns_http.test1 for an example (test 1.2 fails). Such strings can be found in any 8-bit encoding (to see this, run the attached http_charsets.test with the 'pairsTest' constraint enabled).

The source of the ns_http command (tclhttp.c) shows that the problem is the use of Tcl_NewStringObj on binary input data (8-bit chars). Two solutions come to mind: 1) using Tcl_NewByteArrayObj instead of Tcl_NewStringObj; 2) using Tcl_ExternalToUtf before Tcl_NewStringObj, i.e. a built-in 'encoding convertfrom'.

The attached tclhttp.c.binary-externaltoutf patch modifies the ns_http command as follows: 1) a -binary switch is added to the queue/wait/run sub-commands, which makes them use Tcl_NewByteArrayObj for text pages; 2) without -binary, the text page is converted according to the Content-Type header. Note that the second change requires TCLHTTP_USE_EXTERNALTOUTF to be defined at compile time.

The fixed ns_http command can be tested with the attached ns_http.test2 (see 1.2.1 and 1.2.2). More intensive testing of the changes can be done with http_charsets.test (note the commented-out pairsTest constraint). I also replaced 'nstest :: http-0.9 -encoding xxx' with 'ns_http run' in the existing encoding.test (see encoding_ns_http.test). All data transformations are performed successfully without explicit decoding.

Automatic data decoding is convenient to use, but it changes the behavior of ns_http on 8-bit input. These changes could break existing code that uses ns_http to interact with 8-bit sites (with a risk of data corruption).
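The reported corruption can be reproduced with a simplified model of the round trip. The model assumes that the raw 8-bit bytes are interpreted as UTF-8 by a lenient decoder (well-formed 2- and 3-byte sequences are decoded, any other byte becomes a single character with the same value). It also assumes that converting the string back to bytes for 'encoding convertfrom' keeps only the low 8 bits of each character. In iso-8859-2, 'ÄÖÜŁ' is the byte sequence C4 D6 DC A3. The pair DC A3 happens to be a well-formed UTF-8 sequence for U+0723, whose low byte 0x23 is '#'. The lowercase bytes E4 F6 FC A3 contain no well-formed multibyte sequence and survive unchanged. This is a Python sketch with hypothetical helper names, not Tcl's actual code; it only shows why keeping the raw bytes (solution 1) avoids the problem.

```python
from typing import List


def lenient_utf8_decode(data: bytes) -> List[int]:
    """Simplified lenient UTF-8 decoder (an assumed model, not Tcl's code).

    Well-formed 2- and 3-byte sequences are decoded; any other byte is
    taken as a single character with the same value (Latin-1 fallback).
    """
    code_points = []
    i, n = 0, len(data)

    def is_cont(j: int) -> bool:
        # True if data[j] exists and is a UTF-8 continuation byte (10xxxxxx).
        return j < n and 0x80 <= data[j] <= 0xBF

    while i < n:
        b = data[i]
        if 0xC0 <= b <= 0xDF and is_cont(i + 1):
            code_points.append(((b & 0x1F) << 6) | (data[i + 1] & 0x3F))
            i += 2
        elif 0xE0 <= b <= 0xEF and is_cont(i + 1) and is_cont(i + 2):
            code_points.append(((b & 0x0F) << 12)
                               | ((data[i + 1] & 0x3F) << 6)
                               | (data[i + 2] & 0x3F))
            i += 3
        else:
            code_points.append(b)
            i += 1
    return code_points


def low_bytes(code_points: List[int]) -> bytes:
    """Keep only the low 8 bits of each character (string -> byte array)."""
    return bytes(cp & 0xFF for cp in code_points)


def via_string_obj(raw: bytes, charset: str) -> str:
    """Current path (modelled): raw bytes stored as a string, then converted from charset."""
    return low_bytes(lenient_utf8_decode(raw)).decode(charset)


def via_byte_array(raw: bytes, charset: str) -> str:
    """Solution 1 (modelled): keep the raw bytes, then convert from charset."""
    return raw.decode(charset)


if __name__ == "__main__":
    for text in ("äöüŁ", "ÄÖÜŁ"):
        raw = text.encode("iso-8859-2")
        print(text, raw.hex(" "), via_string_obj(raw, "iso-8859-2"),
              via_byte_array(raw, "iso-8859-2"))
```

Running it shows 'ÄÖ#' for the uppercase input on the string path and 'ÄÖÜŁ' on the byte-array path, matching the symptoms described in the report.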
To use the patched version of ns_http, either remove the 'encoding convertfrom' calls or add the -binary switch. Note that the -binary switch followed by 'encoding convertfrom' will also be useful for 8-bit sites with a missing or incorrect Content-Type.

Regards,
Oleg Oleinick.

PS. Attached files:
- ns_http.test1 - tests for the current version, showing the corruption of 8-bit text;
- ns_http.test2 - tests for the patched version, showing the correct receipt of 8-bit text;
- tclhttp.c.binary-externaltoutf.patch - patch for the ns_http command, adding the -binary switch and automatic decoding of text data;
- http_charsets.test - tests for ns_http, suitable for both the current and the patched version;
- encoding_ns_http.test - like the existing encoding.test, with 'nstest :: http-0.9 -encoding xxx' replaced by the new 'ns_http run'.

_______________________________________________ naviserver-devel mailing list naviserver-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/naviserver-devel