C Drozdowski wrote:
> I have been doing some testing and need confirmation that the following
> is correct.
>
> You have a DOMDocument that potentially contains UTF-8 encoded data (it
> might not however).
>
> You want to search it via DOMXPath->query() using a value that comes
> from $_POST.
>
> If the page that posts the data via a form to the search script IS NOT
> encoded in UTF-8, then the value must be converted to UTF-8 before it is
> used in the query expression.
>
> Else, if the posting page IS UTF-8 encoded, then the $_POST data does
> not need to be converted before being used in the expression.
>
> Is this correct?

AFAIK... yes, this is correct.
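A minimal sketch of that rule follows. It assumes the non-UTF-8 posting page is ISO-8859-1, that the mbstring extension is loaded, and that the document holds <item name="..."/> elements (a made-up structure). to_utf8(), xpath_literal() and find_items() are hypothetical helper names. xpath_literal() is included because XPath 1.0 string literals have no escape syntax, so a posted value containing quotes can't simply be pasted into the expression.

```php
<?php
// Assumptions: mbstring is loaded; the non-UTF-8 form page is ISO-8859-1;
// the document uses a hypothetical <item name="..."/> structure.

/** Convert a raw posted value to UTF-8, given the charset of the posting page. */
function to_utf8(string $raw, string $pageCharset): string
{
    if (strcasecmp($pageCharset, 'UTF-8') === 0) {
        return $raw; // page was UTF-8 already: no conversion needed
    }
    return mb_convert_encoding($raw, 'UTF-8', $pageCharset);
}

/** Quote a string as an XPath 1.0 literal (XPath 1.0 has no escape syntax). */
function xpath_literal(string $s): string
{
    if (strpos($s, "'") === false) {
        return "'" . $s . "'";
    }
    if (strpos($s, '"') === false) {
        return '"' . $s . '"';
    }
    // Both quote kinds present: split on ' and rejoin the pieces with concat().
    return "concat('" . implode("', \"'\", '", explode("'", $s)) . "')";
}

/** Hypothetical search helper; returns a DOMNodeList, or false on a bad expression. */
function find_items(DOMXPath $xp, string $raw, string $pageCharset)
{
    $value = to_utf8($raw, $pageCharset);
    return $xp->query('//item[@name=' . xpath_literal($value) . ']');
}

// Example: a Latin-1 form posted "Café" as the bytes "Caf\xE9".
$doc = new DOMDocument();
$doc->loadXML("<list><item name=\"Caf\u{E9}\"/></list>");
$xp = new DOMXPath($doc);
echo find_items($xp, "Caf\xE9", 'ISO-8859-1')->length, "\n"; // 1
```

In a real script the call would be something like find_items($xp, $_POST['q'], 'ISO-8859-1'), with the charset matching whatever the form page declares.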

>
> Also, if the $_POST data comes from a UTF-8 encoded page, and it needs
> to be sanitized before use, will the basic PHP string functions work on
> the data (e.g. htmlentities, stripslashes, trim, preg_replace, etc)?
>
> If not what do I have to do?

I believe that PHP uses ISO-8859-1 as the default encoding, but there
are ways around it.

htmlentities() lets you specify UTF-8 as its third (charset) argument.
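For example (a sketch; escape_utf8() is just a hypothetical wrapper name), passing the charset explicitly avoids depending on the default. PHP releases before 5.4 defaulted it to ISO-8859-1, in which case a UTF-8 "é" (bytes C3 A9) gets escaped as two Latin-1 characters. Note that input which is not valid UTF-8 comes back as an empty string unless ENT_SUBSTITUTE or ENT_IGNORE is added to the flags.

```php
<?php
// Escape UTF-8 text for HTML. The third argument names the input charset;
// ENT_QUOTES escapes both ' and ".
function escape_utf8(string $s): string
{
    return htmlentities($s, ENT_QUOTES, 'UTF-8');
}

echo escape_utf8("Caf\u{E9} <b>"), "\n";                        // Caf&eacute; &lt;b&gt;
echo htmlentities("Caf\u{E9}", ENT_QUOTES, 'ISO-8859-1'), "\n"; // Caf&Atilde;&copy; (wrong charset)
```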

Remember that your DOMDocument may or may not be whitespace-sensitive,
so be careful about how, and whether, you trim().
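As for UTF-8 itself, trim() is safe: its default character list (space, \t, \n, \r, \0, \x0B) is all ASCII, and no byte inside a multibyte UTF-8 character falls in the ASCII range, so it can't cut a character in half. It also leaves Unicode spaces such as U+00A0 (no-break space) alone. If those should go too, a /u regex is one option; trim_unicode() here is a hypothetical helper, not a built-in.

```php
<?php
// trim() only strips ASCII whitespace, so U+00A0 survives it.
// \s matches ASCII whitespace; \p{Z} matches Unicode separators, incl. U+00A0.
// Returns '' if $s is not valid UTF-8 (preg_replace() returns null then).
function trim_unicode(string $s): string
{
    return preg_replace('/^[\s\p{Z}]+|[\s\p{Z}]+$/u', '', $s) ?? '';
}

$v = "\u{A0}Caf\u{E9}\u{A0}";
var_dump(trim($v) === $v);  // bool(true): the no-break spaces are untouched
var_dump(trim_unicode($v)); // string(5) "Café"
```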

I don't know how well stripslashes(), preg_replace(), etc. work with
UTF-8. Hopefully someone else will be able to help out with those...
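For what it's worth, here is a sketch of how those two behave on UTF-8. stripslashes() only looks at the ASCII backslash (0x5C), which never occurs inside a multibyte UTF-8 character, so it is byte-safe. PCRE functions like preg_replace() work on bytes unless the pattern carries the /u modifier; without it "." matches a single byte and can split a character. clean_posted_utf8() is a hypothetical helper, and its stripslashes() step only makes sense if magic_quotes_gpc added the slashes (that setting was removed in PHP 5.4).

```php
<?php
// Without /u, "." matches one byte: the first byte of "é" (C3 A9) is removed,
// leaving a stray A9 byte. With /u, "." matches the whole character.
var_dump(preg_replace('/^./s', '', "\u{E9}lan") === "\xA9lan"); // bool(true)
var_dump(preg_replace('/^./su', '', "\u{E9}lan"));              // string(3) "lan"

/**
 * Hypothetical sanitizer for a posted UTF-8 value.
 * Returns null when the input is not valid UTF-8.
 */
function clean_posted_utf8(string $raw, bool $magicQuotes = false): ?string
{
    // An empty /u pattern fails to match (returns false) on invalid UTF-8.
    if (preg_match('//u', $raw) !== 1) {
        return null;
    }
    $s = $magicQuotes ? stripslashes($raw) : $raw; // byte-safe on UTF-8
    // Drop control characters except tab, LF and CR; /u makes this code-point aware.
    return preg_replace('/(?![\t\n\r])\p{Cc}/u', '', $s);
}
```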

--
Teach a man to fish...

NEW? | http://www.catb.org/~esr/faqs/smart-questions.html
STFA | http://marc.theaimsgroup.com/?l=php-general&w=2
STFM | http://php.net/manual/en/index.php
STFW | http://www.google.com/search?q=php
LAZY | http://mycroft.mozdev.org/download.html?name=PHP&submitform=Find+search+plugins
