On Fri, January 13, 2006 10:55 am, Jay Blanchard wrote:
> I am having a problem with an ampersand sign. I have a list of things on a
> page, in which one category is 'Oil & Gas'. I store it in the database as
> 'Oil &amp; Gas'.

Don't.

The DATA to be stored in the database is 'Oil & Gas'

When it's time to present it in a browser, and ONLY when it's time to
present it in a browser, use:
htmlentities('Oil & Gas')
to make it suitable for HTML transport to the browser.

Here's why:
Suppose tomorrow you decide to do an RSS Feed, or export to another
database, or send that data somewhere OTHER than your browser.

Your &amp; is *NOT* the raw data, and it's *NOT* what that other
technology might *want* for the encoding of &

That other technology might not even WANT & encoded in the first place.

Now, RSS might want & -> &amp; for its encoding

But can you guarantee that tomorrow's technology will want that?

No.

Maybe tomorrow's next big thing will want & -> && or perhaps it will
want & -> %#26 or maybe it will want & -> 'fnord-26' or maybe it won't
even need & encoded, but it will need the character sequence 'fnord'
encoded.

The DATA is 'Oil & Gas'

'Oil &amp; Gas' is merely a presentation / encoding of that data for
one (or more) particular (currently popular) transport mechanisms.

Encoding the data for today's usage in your original source data is
sheer folly, of the same magnitude that gave us Y2K.

You're making trouble for yourself long-term, and probably confusing
yourself short-term.

RAW data goes in your database: 'Oil & Gas'
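
As a rough illustration of the point above, here is a minimal sketch of one raw value being encoded separately for each destination at output time. The JSON case is my own addition, standing in for "some other technology"; it is not something the original question asked about.

```php
<?php
// Raw value, exactly as it would be stored in the database.
$raw = 'Oil & Gas';

// Encode only at the point of output, once per destination.
$for_html = htmlentities($raw); // HTML text or attribute value
$for_url  = urlencode($raw);    // a value inside a query string
$for_json = json_encode($raw);  // assumption: a JSON consumer, for contrast

echo $for_html, "\n"; // Oil &amp; Gas
echo $for_url, "\n";  // Oil+%26+Gas
echo $for_json, "\n"; // "Oil & Gas"
```

Three destinations, three different encodings, and none of them is the data itself.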

> When the category is clicked the query string shows just an
> ampersand, i.e.
> "Filter=Process&FilterKey=Oil%20&%20Gas&Order=Application&Direction=ASC&comments="
> and therefore just shows as an '&' and the query only sees 'Oil'.

Shows where?

Until you tell us what showed you & where, we can't even begin to
guess what is going on -- because WHERE you saw it changes everything.

There are all manner of potential sources of your vision here.

What you see in the browser, and what you see in "View Source" and
what you see when your mouse goes over a link are all different, and
probably all different from what you would see in the 'mysql' monitor
program.

If "View Source" showed you that, then it's probably a problem.
If you saw it printed out to your browser, it may or may not be a
problem.
If it's in the ToolTip from mouse-over of the link, it may or may
not be a problem.

The browsers try to "hide" icky details from normal users, and that
means the &amp; will often get converted to & before you see it.

The fact that the link doesn't work means that it obviously *IS* a
problem, of course, so exactly where you saw it is somewhat moot,
since you shouldn't have put &amp; in your database, and after you fix
that, the solution will probably entail fixing whatever is causing the
& to get "lost" anyway.

> I guess that I am too tired to deal with this or the answer would come to
> mind immediately. Can someone drop kick me in the right direction?

Ah.  An even MORE important reason for not doing what you did.

Part of your PROBLEM is you've put &amp; in the database instead of &

So you think it's "escaped" already.

Well, it is... For HTML display, it is escaped.

It is *NOT* escaped for a URL.

urlencode() is for URL-escaping.
htmlentities() is for HTML-escaping.

You've done htmlentities() on your data, not urlencode() on your
output of your data.
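
To see why the query only sees 'Oil', here is a small sketch that feeds the query string from the original post through parse_str(), which splits GET arguments much as PHP does when it fills $_GET, and then does the same with a urlencode()d value. The variable names are just for illustration.

```php
<?php
// The query string from the original post: the '&' inside the value is
// indistinguishable from the '&' that separates key/value pairs.
$broken_qs = 'Filter=Process&FilterKey=Oil%20&%20Gas&Order=Application';
parse_str($broken_qs, $broken);
var_export($broken['FilterKey']); // 'Oil '
echo "\n";

// The same value, URL-escaped before it goes into the query string.
$fixed_qs = 'Filter=Process&FilterKey=' . urlencode('Oil & Gas')
          . '&Order=Application';
parse_str($fixed_qs, $fixed);
var_export($fixed['FilterKey']); // 'Oil & Gas'
echo "\n";
```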

What *SHOULD* be done is this:

1. Get the original,  un-corrupted (un-escaped) data: 'Oil & Gas'
$value = 'Oil & Gas'; // from db.

Note lack of &amp; here!

Your database has no business [*] keeping the HTML-encoding of its
data internally.

2. Since that datum is being passed as an argument in a URL,
urlencode() it:
$value_url = urlencode($value); //prepare for use in URL

$value_url will now most likely contain %26, and the whole & -> &amp;
problem will be MOOT.

But you never know for sure WHAT data will be in there, so...

3. Make the URL:
$url = "Filter=" . urlencode('Process') .
"&FilterKey=$value_url&Order=" . urlencode('Application') .
"&Direction=" . urlencode('ASC');

NOTE: Just to be pedantic, and to drive the point home, I've
urlencode()d every other data element in the URL, even though the
output of urlencode() in all these cases *happens*, by sheer luck, to
be the same as the input, so you don't "need" to encode the data.

I am as guilty as the next guy of taking shortcuts and not
URLencode()ing anything that is 'hard-wired' in PHP source.

But if it's coming from your database, or worse, the user, you'd damn
well better urlencode() each value element you are putting into the
URL.

4. *NOW* you are about to dump that URL into your HTML as the HREF= of
a link.  At *THAT* point, and *ONLY* at that point, you want to escape
it for HTML usage:

$url_html = htmlentities($url); //escape for HTML

Your URL now has &amp; for each & separating the key/value pairs in
the GET args.

That's what HTML *wants* though.

Any 'weird' data, where 'weird' is defined by what HTML likes, after
urlencode()ing, is ALSO escaped for suitable usage in HTML.

You could EVEN have a directory on your web-server with a space in it,
and the space would be converted to '+' or '%20' and be kosher.

I wouldn't actually recommend that you have a directory with a space
in it as part of your URL, but you COULD get away with it, and it
would probably "work" in this context.
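
For what it's worth, which of '+' or '%20' you get depends on the function: urlencode() gives '+', and rawurlencode() gives '%20', which is the safer choice for the path part of a URL. A tiny sketch, with a made-up directory name:

```php
<?php
$dir = 'my files'; // hypothetical directory name containing a space

echo urlencode($dir), "\n";    // my+files   (query-string style)
echo rawurlencode($dir), "\n"; // my%20files (path style)
```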

You MAY want to be displaying the data to the user as the inner HTML
text of the link.

Then you need to use htmlentities() on the ORIGINAL source data:
$value_html = htmlentities($value);

$value_html now has the 'Oil &amp; Gas' that you THOUGHT you wanted to
store in the database.

5. *NOW* you have all the correctly escaped bits to print out:
echo "<a href=\"$url_html\">$value_html</a><br />\n";
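
If you'd rather not urlencode() each piece by hand, the same five steps can be sketched with http_build_query(), which URL-encodes every key and value for you. This is only one way to do it; 'list.php' is a made-up script name, and I pass the '&' separator explicitly so the result doesn't depend on the arg_separator.output ini setting.

```php
<?php
$value = 'Oil & Gas'; // raw value, as read from the database

// Steps 2 and 3: URL-encode every value and join the pairs with '&'.
$url = 'list.php?' . http_build_query([
    'Filter'    => 'Process',
    'FilterKey' => $value,
    'Order'     => 'Application',
    'Direction' => 'ASC',
], '', '&');

// Step 4: escape the finished URL, and the link text, for HTML.
$url_html   = htmlentities($url);
$value_html = htmlentities($value);

// Step 5: print the link.
echo "<a href=\"$url_html\">$value_html</a><br />\n";
```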


To summarize:

$value = 'Oil & Gas'; //Original un-corrupted data
$value_url = urlencode($value); //Suitable for use as URL value
$value_html = htmlentities($value); //Suitable for display in HTML
$url = "Filter=Process&FilterKey=$value_url"; //URL built from URL-encoded values (abbreviated)
$url_html = htmlentities($url); //URL suitable for embedding as HREF
//within HTML -- i.e., values are urlencode()d, and the URL as a whole
//is encoded for HTML with htmlentities().

echo <<<EOL
<a href="$url_html">$value_html</a>
EOL;

Hopefully this all clarifies the differences between raw data,
URLencode()d data for use as a URL value, and HTML-encoded data (from
raw, or from URL) for usage within HTML.

Another way to look at this is this:

urlencode() takes care of data for HTTP transport.
htmlentities() takes care of data for HTML browser display.
HTTP and HTML are symbiotic technologies, but are not the same.

You need to url-encode values for HTTP transport in GET args.
You need to html-encode all data for HTML display.
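
One way to convince yourself that the two layers don't interfere is to peel them off in the order the browser and the server would. This is only a simulation: html_entity_decode() stands in for the browser's HTML parser, and parse_str() for PHP filling in $_GET.

```php
<?php
$value = 'Oil & Gas';

// Outbound: URL-encode the value, then HTML-encode the whole URL.
$href = htmlentities('FilterKey=' . urlencode($value) . '&Order=Application');
echo $href, "\n"; // FilterKey=Oil+%26+Gas&amp;Order=Application

// Layer 1 (browser): undo the HTML escaping of the attribute value.
$query = html_entity_decode($href);

// Layer 2 (server): split the query string and undo the URL escaping.
parse_str($query, $args);
echo $args['FilterKey'], "\n"; // Oil & Gas
```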

PS
Don't feel bad:
It took me YEARS to grok why the URL with URL-encoded values needed to
be encoded as a whole for dumping out to HTML.


[*]
For performance reasons, you MIGHT be able to make a case for storing
the HTML-encoded version of your raw data, alongside the original raw
un-encoded data -- or perhaps keeping the raw source data elsewhere
safely.

You MIGHT even have a compelling performance/storage requirement to
store HTML-encoded data *IF* you know you'd never need to use the data
for any other purpose ever in any potential future no matter what... 
Except that's a sure way to have it turned around on you shortly, and
you'll need the original un-encoded data anyway. :-)

-- 
Like Music?
http://l-i-e.com/artists.htm

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php