On Tue, Feb 24, 2009 at 6:20 AM, Hans - softflow.co.uk <[email protected]> wrote:
>
>> take for example the BOLTutf8_strip function. if i open the file
>> regardless of the encoding i have it full of greek characters (my
>> locale), while if i open it with Western European encoding it shows
>> correctly.
>> maybe it should be encoded in UTF?
>
> I think you are right. The BOLTutf8_strip function with its character
> conversion array is the root cause of display problems in text
> editors, because the file is encoded in ANSI. It should be encoded in
> UTF-8. I tried UNICODE, but it does not work, whereas saving the file
> as UTF-8 in one text editor made it possible to open and correctly
> display the Latin letters with diacritics in my problematic text
> editor as well.
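The display problem described above comes from the conversion array holding raw accented characters, which each editor interprets through its own default encoding. One way around it, and roughly what Dan asks for in his reply, is to keep the source file pure ASCII and write every accented character as an escape sequence. The sketch below is in Python purely for illustration: `strip_accents` and `_ACCENTS` are hypothetical names, the table is a small sample rather than BoltWire's actual array, and this is not BoltWire's implementation.

```python
# A minimal sketch of an accent-stripping table whose source is pure ASCII.
# Each accented letter is written as a \u escape (its Unicode code point),
# so the file looks identical in any editor regardless of its encoding.
# The mapping is a small illustrative sample, not BoltWire's real array.

_ACCENTS = {
    "\u00e0": "a",   # a with grave
    "\u00e1": "a",   # a with acute
    "\u00e2": "a",   # a with circumflex
    "\u00e4": "a",   # a with diaeresis
    "\u00e7": "c",   # c with cedilla
    "\u00e8": "e",   # e with grave
    "\u00e9": "e",   # e with acute
    "\u00ea": "e",   # e with circumflex
    "\u00eb": "e",   # e with diaeresis
    "\u00ed": "i",   # i with acute
    "\u00f1": "n",   # n with tilde
    "\u00f3": "o",   # o with acute
    "\u00f6": "o",   # o with diaeresis
    "\u00fa": "u",   # u with acute
    "\u00fc": "u",   # u with diaeresis
    "\u00c9": "E",   # E with acute
    "\u00d6": "O",   # O with diaeresis
    "\u00dc": "U",   # U with diaeresis
    "\u00df": "ss",  # sharp s (expands to two letters)
}

# str.translate needs a table keyed by code point; maketrans builds it.
_TABLE = str.maketrans(_ACCENTS)


def strip_accents(text: str) -> str:
    """Replace the accented letters in _ACCENTS with plain ASCII letters.

    Characters that are not in the table pass through unchanged.
    """
    return text.translate(_TABLE)
```

For example, `strip_accents("Caf\u00e9 M\u00fcller")` returns `"Cafe Muller"`. In PHP the analogous idea would be double-quoted strings with hex byte escapes for the UTF-8 sequence (for instance `"\xC3\xA9"` for e-acute, U+00E9), collected into an array and passed to `strtr()`; treat that as an untested suggestion rather than a drop-in replacement for BOLTutf8_strip.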
One of the reasons I had problems with charsets is that my editor does not seem to have any options for saving in different character encodings. If someone could suggest a way to change this function to use ASCII-encoded values, I would happily change it. I think it's the only place in the core where this is a problem. I notice the utf.php plugin uses a bunch of ASCII-encoded values to handle case conversions (I don't remember where I got that array), so it's possible. I just don't know how... Anyone care to help?

> Dan, you could also consider moving that function to markups.php (and
> still saving the file with UTF-8 encoding).

This only moves the problem to the markups file. I'd rather solve it properly if possible.

> Actually the function does
> not strip utf-8, it just converts Latin characters with diacritics to
> Latin characters without diacritics, it converts characters from range
> 128 to 256 of the unicode set to characters below 128 (basic ascii
> characters).
>
> It is most useful for Western European languages, which use a lot of
> diacritics, to convert page name input to the basic ascii character
> set.

That's correct; it's what it was designed for. The function is misnamed and should be renamed, maybe to BOLTstripAccents() or something. It has been around a while in BoltWire to make it easier to handle pagenames with diacritics.

We have an interesting situation now that we have added the UTFpages option in site.config. If you enable it, the accents are encoded and retained. If it is not enabled, they get stripped. At least that's how it is supposed to work. It would be good if someone would verify.

Cheers,
Dan

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "BoltWire" group.
To post to this group, send email to [email protected] To unsubscribe from this group, send email to [email protected] For more options, visit this group at http://groups.google.com/group/boltwire?hl=en -~----------~----~----~----~------~----~------~--~---
