On Tue, Feb 24, 2009 at 6:20 AM, Hans - softflow.co.uk
<[email protected]> wrote:
>
>> take for example the BOLTutf8_strip function. if i open the file
>> regardless of the encoding i have it full of greek characters (my
>> locale), while if i open it with Western European encoding it shows
>> correctly.
>> maybe it should be encoded in UTF?
>
> I think you are right. The BOLTutf8_strip function with its character
> conversion array is the root cause of display problems in text
> editors, because the file is encoded in ANSI. It should be encoded in
> UTF-8. I tried UNICODE, but it did not work, whereas saving the file
> as UTF-8 in one text editor made it possible to open and correctly
> display the Latin letters with diacritics in my problematic text
> editor as well.

One of the reasons I had problems with charsets is that my editor does
not seem to have any options for saving in different character
encodings. If someone could suggest a way to change this function to
use ASCII-encoded values, I would happily change it. I think it's the
only place this is a problem in the core.  I notice the utf.php plugin
uses a bunch of ASCII-encoded values to handle case conversions (I
don't remember where I got that array). So it's possible. I just don't
know how...  Anyone care to help?
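
For anyone picking this up, here is a rough sketch of what "ASCII-encoded
values" could look like. It is not the actual BOLTutf8_strip code: the
function name stripAccentsSketch and the short table are made up for
illustration, and it assumes the incoming text is UTF-8. Each accented
letter is written as a \x escape of its UTF-8 bytes, so the source file
itself contains nothing but ASCII and should display the same no matter
which encoding an editor picks.

```php
<?php
// Hypothetical sketch, not the BoltWire implementation.
// Assumes $text is UTF-8. Keys are the UTF-8 byte sequences of the
// accented letters, spelled with \x escapes so this file stays pure ASCII.
function stripAccentsSketch($text) {
    $map = array(
        "\xC3\xA0" => 'a',  // U+00E0 a with grave
        "\xC3\xA7" => 'c',  // U+00E7 c with cedilla
        "\xC3\xA9" => 'e',  // U+00E9 e with acute
        "\xC3\xB1" => 'n',  // U+00F1 n with tilde
        "\xC3\xB6" => 'o',  // U+00F6 o with diaeresis
        "\xC3\xBC" => 'u',  // U+00FC u with diaeresis
        "\xC3\x87" => 'C',  // U+00C7 C with cedilla
    );
    // strtr() with an array replaces every key found in the string
    // with its value and leaves all other bytes untouched.
    return strtr($text, $map);
}

echo stripAccentsSketch("Caf\xC3\xA9 Fran\xC3\xA7ais"), "\n";  // prints "Cafe Francais"
```

If the input turns out to be ISO-8859-1 rather than UTF-8, the same idea
applies, but the keys would be single bytes (for example "\xE9" for e
with acute) instead of two-byte sequences.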

> Dan, you could also consider moving that function to markups.php (and
> still saving the file with UTF-8 encoding).

This only moves the problem to the markups file. I'd rather solve it
properly if possible.

> Actually the function does
> not strip utf-8, it just converts Latin characters with diacritics to
> Latin characters without diacritics, it converts characters from range
> 128 to 255 of the Unicode set to characters below 128 (basic ASCII
> characters).
>
> It is most useful for west european languages, which use a lot of
> diacritics, to convert page name input to the basic ascii character
> set.

That's correct--it's what it was designed for. The function is
misnamed.  Maybe BOLTstripAccents() or something.  This has been
around a while in BoltWire to make it easier to handle pagenames with
diacritics. We have an interesting situation now that we have added
the UTFpages option in site.config. If you enable it, the accents are
encoded and retained. If it is not enabled, they get stripped. At
least that's how it is supposed to work. It would be good if someone
would verify.

Cheers,
Dan
