Wow, this turned out to be quite a deep problem... While it was easy to
hack a fix, I wanted to dig deeper, as always, to see if we can fix
it properly.
Here are the issues:
1) Pages are saved under our %-encoded page names.
2) Anchors are saved as UTF-8 in the actual page, but displayed in the
HTML as %-encoded.
So to get it working right, always, you have to go to the loadpage
function and run url2utf on the anchors. Then you can send either url
or utf to the loadpage function with no problem. No more worrying
about encoding in secondary functions. I suspect we need to do the
same in the savepage function...
Similarly, if we do the same with page names, always converting them
to % encoding at the load and save level, we should be able to stop
worrying about it elsewhere. Or at least have a backup fail-safe...
It never hurts to run these functions more than once on a string.
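
To make the "convert at the load and save level" idea concrete, here is a
minimal sketch. The sketch_* functions are hypothetical stand-ins, not the
actual BOLTutf2url / BOLTurl2utf code. It assumes page names should end up
%-encoded and anchors should end up as raw UTF-8. Note that running them
twice is only safe here because '%' itself is never encoded; a plain
rawurlencode() would turn %C3 into %25C3 on a second pass.

```php
<?php
// Hypothetical stand-ins for illustration only; not BoltWire's real functions.

// Percent-encode only bytes >= 0x80. ASCII (including '%') is left alone,
// so input that is already encoded passes through unchanged.
function sketch_utf2url(string $s): string {
    return preg_replace_callback('/[\x80-\xFF]/', function ($m) {
        return '%' . strtoupper(bin2hex($m[0]));
    }, $s);
}

// Decode only %80..%FF escapes back to raw bytes. Raw UTF-8 input has no
// such escapes, so it passes through unchanged.
function sketch_url2utf(string $s): string {
    return preg_replace_callback('/%([89A-Fa-f][0-9A-Fa-f])/', function ($m) {
        return chr(hexdec($m[1]));
    }, $s);
}

// Normalize a "page#anchor" reference once, at the load/save boundary:
// page name -> % encoding, anchor -> UTF-8. Returns [page, anchor].
function sketch_normalize_ref(string $ref): array {
    $parts = explode('#', $ref, 2);
    $page = sketch_utf2url($parts[0]);
    $anchor = isset($parts[1]) ? sketch_url2utf($parts[1]) : '';
    return [$page, $anchor];
}
```

With something like this at the top of the load and save functions, callers
could pass either form of a name and get the same result.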
Also, by adding this line to BOLTfilter:

    if ($type == 'page') $input = BOLTutf2url($input);

I take care of all page filtering automatically as well. I'm not sure
I want to go through and strip out every other instance of BOLTutf2url
and BOLTurl2utf, because I'm surely forgetting something--but
theoretically it might be possible. Certainly most could go.
Also, as you noticed, we do have some duplicate code in the include and
loadpage functions. At least as far as anchors go, we can delete all of
that in the include function and just change the loadpage line to this:

    $out = BOLTloadpage("$page$anchor", $dir);

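
Roughly, the shape of that change looks like the toy sketch below. Everything
here is hypothetical: the sketch_* names, the in-memory $pages array, and the
"[[#name]]" marker syntax are assumptions for illustration, not BoltWire's
real code or storage. The point is only that the include side keeps no anchor
logic of its own and forwards "$page$anchor" to the loader.

```php
<?php
// Hypothetical toy code for illustration; not BoltWire's real functions.

// Toy loader: accepts "page" or "page#anchor" and does all anchor handling.
function sketch_loadpage(string $ref, array $pages): string {
    $parts = explode('#', $ref, 2);
    $page = $parts[0];
    $anchor = $parts[1] ?? '';
    $text = $pages[$page] ?? '';
    if ($anchor === '') return $text;

    // Assumed marker syntax: "[[#name]]" opens a section that runs until
    // the next "[[#" marker or the end of the page.
    $marker = '[[#' . $anchor . ']]';
    $start = strpos($text, $marker);
    if ($start === false) return '';
    $start += strlen($marker);
    $end = strpos($text, '[[#', $start);
    $section = ($end === false)
        ? substr($text, $start)
        : substr($text, $start, $end - $start);
    return trim($section);
}

// Toy include: no anchor code here, it just forwards "$page$anchor".
// $anchor is either '' or a string starting with '#'.
function sketch_include(string $page, string $anchor, array $pages): string {
    return sketch_loadpage("$page$anchor", $pages);
}
```

Keeping the anchor parsing in one place means a fix to the encoding only has
to be made once.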
I have it working perfectly on my site, on a test page simultaneously
checking the source function, zones, breadcrumbs and several other
functions to make sure we don't break some utf capability in the
process. It's coming along--but I'm taking my time, trying to be
careful.
As for your other posts about headers and footers, I don't think I
want those in the loadpage function because they call the loadpage
function. :) I think giving a call to the include function with
whatever parameters you want is the best approach. Or to make it a bit
easier on the eyes:

    $text = BOLTcomm2func('include', 'parameters....');

This may end up being a pretty big release... But it should make the
code a lot smarter and more bug free. Maybe we'll never get to bug
report UTF8-bug #4000.
Cheers,
Dan
On Thu, Sep 3, 2009 at 8:33 AM, The Editor<[email protected]> wrote:
> Yes, actually. However, I think we may want to also filter the
> anchors, to ensure they are valid html names.
>
> Let's try adding # to the escaped chars.
>
> Line 55 or so engine.php. I'm testing now, but if you are faster...
> Either way, we will want to clarify the filtering process, so it's not
> called twice. Sooner rather than later, I think...
>
> Cheers,
> Dan
>
>
> On Thu, Sep 3, 2009 at 8:28 AM, DrunkenMonk<[email protected]> wrote:
>>
>>> P.S. It may be easier to set a special line in the filtering for when
>>> pages are being scanned to automatically convert to url encoded. But
>>> then of course, everything will pass and the filtering is useless...
>>> On the other hand, if all page names get url encoded, perhaps we don't
>>> need to worry about special chars anyway. I need to check into the
>>> security issues again. But I suspect that third line should just read
>>>
>>> if ($page == '') return;
>>
>> That does sound like a better solution.
>>
>> I'm having trouble with [(include page#log)] right now. I wonder if
>> it's connected.
>>
>> I notice that boltfilter is being called twice as it stands, and that
>> if you put the utf2url where you suggest you may break anchor rules.
>> I'm going to try to remove boltfilter from that position and rely on
>> the second one, see if it breaks anything.
>>
>