Hi Stephen,

> In the documents you serve, do you specify the encoding *within* the
> document, at the top of the HTML file for example? Or are you serving
> XML in which case the default for that is utf-8 anyway (I think, off
> the top of my head...).

Usually we specify it both ways: in the meta http-equiv part of the HTML
header and in the Content-Type header of the HTTP response.

> Another possibility is that you happen to be using browsers which are
> smart enough to reparse a document if it doesn't happen to be in the
> encoding it first expected. I think the big guys do this -- not sure
> your mobile phone will be so forgiving.

I'd say we are perfectly happy with just setting up the config file via
the ns/encodings + ns/mimetypes sections and letting the server handle
the rest. The fewer knobs the better. We know (or can control) the
encoding of files on disk, we set up the encoding of the database, and
then we simply want to return the specified encoding. We have different
sites running with iso-8859-1, -15 and utf-8. Usually we have no need
for runtime changes, but if we do, I would like ns_conn to do the
expected thing.

Relying only on (i.e. being forced to use) UTF-8 would not be optimal,
as a potential naviserver user might want to use another encoding or
avoid a UTF/unicode database setup for whatever reason, e.g.
performance, storage, or collation issues (sort order). For us, using
only web and HTTP, moving every installation to UTF-8 is nevertheless
the way to go.

> (This applies to case 3: supporting multiple encodings)
>
>
> I agree with Zoran. ns_conn encoding should be the way to change the
> encoding (input or output) at runtime.

Yes.

> Another place this trips up: In the config for the tests Michael added:
>
>   ns_section "ns/mimetypes"
>   ns_param   .utf2utf_adp  "text/plain; charset=utf-8"
>   ns_param   .iso2iso_adp  "text/plain; charset=iso-8859-1"
>
>   ns_section "ns/encodings"
>   ns_param   .utf2utf_adp  "utf-8"
>   ns_param   .iso2iso_adp  "iso-8859-1"
>
> The ns/encodings are the encoding to use to read an ADP file from
> disk, according to extension. It solves the problem: the web designer's
> editor doesn't support utf-8.

That holds if you focus only on web designers and ADP files.
It could be any other kind of usage as well (file exports etc.).

> But, the code is actually expecting Tcl encoding names here, not a
> charset, so this config is busted. It doesn't show up in the tests
> because the only alternative encoding we're using is iso-8859-1, which
> also happens to be the default.

This is correct, and an annoying thing to be aware of.

> The strategy of driving the encoding from the mime-type has some other
> problems. You have to create a whole bunch of fake mime-types /
> extension mappings just to support multiple encodings (the
> ns/mimetypes above).
>
> What if there is no extension? Or you want to keep the .adp (or
> whatever) extension, but serve content in different encodings from
> different parts of the URL tree? Currently you have to put code in
> each ADP to set the mime-type (which is always the same) explicitly,
> to set the charset as a side effect.

This is true. It does not affect our apps, as we commit to one encoding
and then cache the HTML output to files on disk, but it is not nice if
you need to change it.

> * utf-8 by default
> * mime-types are just mime-types
> * always hack the mime-type for text data to add the charset
> * text is anything sent via Ns_ConnReturnCharData()
> * binary is a Tcl bytearray object
> * static files are served as-is, text or binary
> * multiple encodings are handled via calling ns_conn encoding
> * folks need to do this manually. no more file extension magic
>
> I think a nice way for folks to handle multiple encodings is to
> register a filter, which you can of course use to simulate the file
> extension scheme in place now, the AOLserver 4.5 ns_register_encoding
> stuff, and more, because it's a filter. You can also do things like
> check query data or cookies for the charset to use.

As our app has one main filter that handles the file dispatching, we
would simply place it there.
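
Roughly along these lines. This is an untested sketch: the proc name,
the /legacy/ URL prefix and the choice of encodings are made up for
illustration, and I am assuming that ns_conn encoding takes a Tcl
encoding name and that the filter proc can simply swallow whatever
arguments the server appends.

```tcl
# Sketch only: choose the output encoding per URL subtree in one filter.
# site_encoding_filter and the /legacy/ prefix are hypothetical names.
proc site_encoding_filter {args} {
    # "args" absorbs anything the server passes (e.g. the filter reason),
    # so the sketch does not depend on the exact calling convention.
    if {[string match "/legacy/*" [ns_conn url]]} {
        # Assumption: ns_conn encoding expects Tcl encoding names.
        ns_conn encoding iso8859-15
    } else {
        ns_conn encoding utf-8
    }
    # Let the request continue through the remaining filters.
    return filter_ok
}

ns_register_filter preauth GET /* site_encoding_filter
```
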
But we should find a solution that is both flexible and compatible with
the "file extension magic", if possible!

> Questions that need answered:
>
> * can we junk charset aliases in nsd/encodings.c and use a dir of symlinks?

I would vote for a lookup function that is not filesystem based.

> * can we junk ns/encodings in 2006?

I would not recommend it, as the server would lose some of its uses.

Bernd.
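
P.S. To illustrate the charset vs. Tcl encoding name point, and the kind
of in-memory lookup I have in mind, here is a minimal sketch in plain
Tcl. The proc name charset_to_encoding is made up, and the single
rewrite rule it knows (iso-8859-N becomes iso8859-N) is an assumption
that only covers the encodings we use. It is not what nsd/encodings.c
does.

```tcl
# Hypothetical helper: map an IANA-style charset name to a Tcl encoding
# name without touching the filesystem. Returns "" if Tcl has no match.
proc charset_to_encoding {charset} {
    set name [string tolower [string trim $charset]]
    # Tcl spells the ISO-8859 family without the first hyphen,
    # e.g. charset "iso-8859-1" is Tcl encoding "iso8859-1".
    regsub {^iso-8859-} $name {iso8859-} name
    # Accept the name only if this Tcl installation knows it.
    if {[lsearch -exact [encoding names] $name] >= 0} {
        return $name
    }
    return ""
}
```

With something like this, the values in the ns/encodings section could
be given either as "iso-8859-1" or as "iso8859-1" and resolve to the
same Tcl encoding; an empty result would signal a config error.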