On Thu, 28 Feb 2002, Bernd Eidenschink wrote:

> Hi,
>
> I don't know how to set up a combination of the latest AOLserver,
> using the nsd8x Interpreter, and a Postgres 7.2 database, that allows
> me to safely work with a charset of iso8859-1. Please don't throw
> stones, I know this has been discussed very often ;-)
>
> How to do it?
>
> The problems I run into during my own tests and the problems that
> other people have (I read through some threads on various boards) are:
>
> A.
> Using the latest server "out of the box", working with most European
> characters will almost always fail using the typical string and regexp
> functions that internally use utf-8 ... _IF_ you are returning (in http-
> header or meta-tags) a charset of e.g. iso8859-1 (you have to
> know what comes from forms/submits; you try to return "umlauts"
> or any language depending chars and tell the browser of it).
>
> B.
> I got the tip of using an undocumented parameter in the config
> file, that maps the iso8859-1 charset to ".adp" files (last year on this
> list).
> But this does not guarantee that all characters that _leave_ an
> adp will be in iso8859-1 encoding (e.g. if I use combinations of
> "ns_adp_parse -file" (where strings from the DB are regexped and
> stringed and whatever) and return the string with "ns_return").
> At least if all what comes in goes iso-ed to the DB, you could do
> a workaround and translate all outgoing chars to "&#123" html code.
> (There was an example of a ns_adp_puts function that does this
> given by Harray Moreau on the list)... If you "escape" all the characters
> that way, I assume, you would no longer have to return a
> charset header or charset-meta-tag.
>
> C.
> The uncool way: Using "charset=utf-8" outgoing, then also
> expecting it incoming. A special character should come in as unicode
> and tcl should treat it this way. The database must be installed with
> unicode-encoding. You will run into performance problems and/or, maybe,
> some unresolved topics of Postgres unicode-implementation as well.
> I have not tried this yet, I merely assume this would work. What's
> your opinion on that?
>
> D.
> Using an AD-patched version of AOLserver. The Problem: It's an
> older version of the server, will it be kept up to date in the future?
> (Of course, there's a large user base running it)
>
> E.
> I did not try --enable-recode and setting up a charset table for
> utf8 -> latin1 and latin1 -> utf8 for Postgres. Maybe this would
> work if you can guarantee all charsets coming from AOLserver
> are coming as utf8 or iso8859x. Maybe, don't know.
> I tried putting "SET ENCODING TO 'UNICODE'" resp. "LATIN1" in front
> of every SQL statement in my test api (and took into account that it may
> make a difference if you are SELECTing or INSERTing) for telling the Postgres
> server that the client uses unicode or latin with and without the
> undocumented feature of (B). This only led to error messages
> notifying me of failed encoding translations. Every insert _always_
> was logged with german special characters (umlauts) and never
> as unicode characters (don't know if this is correct).
>
> Will AOLserver 4 come with I18N support that solves all the problems
> and what to do if you need a solution here and now?
> Solution B? C?
>
>
> Thanks for reading through this one...
>
> Bernd.
>

Kriston from AOL assured me that AOLserver 4 will have full international
support, but I haven't inspected the available code to verify that yet.

For now I run successfully on 3.4.x with iso8859-2 in the following way
(I'm sure you can do the same with iso8859-1 = latin1).

* AOLserver scripts

I use only .adp scripts, with this undocumented config setting:

ns_section "ns/encodings"
ns_param "adp" iso8859-2

Of course I use *.tcl libraries with the proper encoding set by [encoding
system iso8859-2].

I don't use *.tcl pages at all because, as far as I remember, ns_return*
and ns_write don't do proper encoding translation. If you have to use
*.tcl scripts (for example, if you are an ACS user), I think you should
use the ArsDigita version for now.

* Databases

When I use Postgres, I configure my database to use UTF-8 encoding:

createdb -E unicode database_name_in_utf8

and I set PGCLIENTENCODING=latin2 in the AOLserver process environment
for the Postgres client libraries.
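
A startup check for that environment variable could look roughly like
this. It is only a sketch: it assumes the snippet sits in a Tcl library
file that AOLserver sources at startup, and the log message wording is
made up for illustration.

```tcl
# Sketch: log whether PGCLIENTENCODING is visible in the nsd process
# environment. Assumes this runs inside AOLserver (ns_log is an
# AOLserver command); the message texts are illustrative only.
if {![info exists ::env(PGCLIENTENCODING)]} {
    ns_log Warning "PGCLIENTENCODING is not set in the server environment"
} else {
    ns_log Notice "PGCLIENTENCODING=$::env(PGCLIENTENCODING)"
}
```

If the warning shows up, the variable was probably set in a shell other
than the one that actually launches nsd.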

I suggest using UTF-8 for the Postgres database rather than an 8-bit
encoding (e.g. latin2) in the database and translating Tcl strings, which
are sent in Unicode, to that 8-bit encoding (this is possible with the
Postgres configure option --enable-unicode-conversion). The main reason is
that when running the database in latin2 I had encoding problems in the
very useful Postgres procedural languages pltcl and pltclu. You may also
run into problems with character expansion, which I discovered with Oracle.

For Oracle, on the other hand, I successfully used a database in latin2
encoding with the CharExpansion=4 parameter, which was added to ArsDigita
Oracle Driver 2.6 as a result of my proposal.

* Other stuff

With 3.4 I use the following simple patch for ns_urlencode. This is not a
general solution, but rather a dirty fix.

[tkosiak@mule nsd]$ diff urlencode.c.orig urlencode.c
72a73,81
>     Tcl_DString ds;
>     char* encname;
>
>     Tcl_DStringInit(&ds);
>
>     encname = Ns_ConfigGet(NS_CONFIG_PARAMETERS, "urlencoding");
>     if ( encname != NULL ) {
>         string = NsUtf2Ext(NsGetEnc(encname),string,&ds);
>     }

Then you have to configure ns_urlencode:

ns_section "ns/parameters"
ns_param   urlencoding iso8859-2

In 3.4 ns_urldecode works fine in adp scripts with a properly configured
encoding.
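
To illustrate what the patch changes, here is a plain Tcl sketch of the
same idea: convert the string from Tcl's internal Unicode to the target
charset first, then percent-encode the resulting bytes. The proc name
urlencode_in_charset is hypothetical and its escaping rules are
simplified; it is not the ns_urlencode implementation.

```tcl
# Hypothetical helper (not part of AOLserver): percent-encode a string
# after converting it to the given charset.
proc urlencode_in_charset {text charset} {
    # One element per byte of the encoded form.
    set bytes [encoding convertto $charset $text]
    set out ""
    foreach ch [split $bytes ""] {
        if {[regexp -- {^[A-Za-z0-9_.-]$} $ch]} {
            # Unreserved ASCII characters pass through unchanged.
            append out $ch
        } else {
            # Everything else becomes %XX for that single byte.
            scan $ch %c code
            append out [format "%%%02X" $code]
        }
    }
    return $out
}

# "l with stroke", "o acute", "d", "z acute"
set word "\u0142\u00f3d\u017a"
puts [urlencode_in_charset $word iso8859-2]  ;# %B3%F3d%BC
puts [urlencode_in_charset $word utf-8]      ;# %C5%82%C3%B3d%C5%BA
```

The two results differ, which is why a page served as iso8859-2 needs
URLs encoded from iso8859-2 bytes and not from UTF-8 bytes.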

You should also inspect, and probably fix, the procs defined in the shared
Tcl library:

ns_httpopen in aolserver/modules/tcl/http.tcl
ns_sendmail in aolserver/modules/tcl/sendmail.tcl

They should call [fconfigure $fd -encoding <encoding>] on their channels
to handle encodings properly. This is optional if you don't use these
features.
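
The effect of fconfigure on a channel can be checked in plain Tcl. The
sketch below uses a file channel and a made-up file name. The same call
applies to a socket channel, but exactly where it belongs inside
ns_httpopen or ns_sendmail is not shown here.

```tcl
# Sketch: write one character through a channel configured for
# iso8859-2 and check how many bytes reach the file.
# The file name is hypothetical.
set path [file join [pwd] latin2-demo.txt]
set fd [open $path w]
fconfigure $fd -encoding iso8859-2 -translation lf
puts -nonewline $fd "\u0142"   ;# LATIN SMALL LETTER L WITH STROKE
close $fd

set nbytes [file size $path]   ;# 1 in iso8859-2 (would be 2 in utf-8)
file delete $path
puts $nbytes
```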

--tkosiak
