I've asked similar questions before, but I think I'm finally asking the
*right* question to get the right answer.
That's often the tricky part :-)
I'm looking for some suggestions on the best method of parsing an HTML
document (or part thereof), with a view to CAPTURING and MODIFYING a
specific attribute of a specific tag.
Something like:
1. look for a given tag, e.g. DIV
2. capture the tag (everything from '<DIV' up to the '>')
3. look for a given attribute (e.g. ID=foo, ID="foo", ID='foo' -- all valid
ways)
4. capture it
5. be given the opportunity to manipulate the attribute's value, delete it,
etc.
6. place the captured tag (complete with modified attributes) back into the
string in its original position
7. return to step 1, looking for the next occurrence of a DIV tag
If you are only looking for a SPECIFIC tag, you just simplified life
immensely!
<?php
# Get some beautiful sample HTML:
$html = file('http://php.net/') or die("Could not open php.net");
$html = implode('', $html);
# Find the DIV tag (stristr is case-insensitive, so <div and <DIV both match):
$div = stristr($html, '<div');
$divpos = strlen($html) - strlen($div);
# Break the HTML up into before and after DIV tag:
$before_div = substr($html, 0, $divpos);
$after_div = substr($html, $divpos);
# Find the *END* of the DIV tag:
# KNOWN BUG:
# They *could* bury a > in their attributes if they work at it...
$end_tag = strstr($after_div, '>');
# +1 so the closing > stays with the tag:
$endpos = strlen($after_div) - strlen($end_tag) + 1;
$div = substr($after_div, 0, $endpos);
# Now get the after part to *really* be after the *WHOLE* DIV tag:
$after_div = substr($after_div, $endpos);
echo "Before DIV tag: <BR>", htmlentities($before_div), "<HR>\n";
echo "DIV tag itself: <BR>", htmlentities($div), "<HR>\n";
echo "After DIV tag: <BR>", htmlentities($after_div), "<HR>\n";
?>
I can pretty much guarantee that I didn't put a +1 or -1 somewhere where it
belongs in the substr() function calls. I never get that right in my first
pass of coding. You'll have to fine-tune that part yourself.
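One way to settle the +1/-1 question is to run the same arithmetic on a tiny
string where the right answer is obvious. A minimal sketch; the $sample string
is made up for illustration and is not taken from the real page:

```php
<?php
# Hypothetical sample, short enough to check by eye:
$sample = 'ab<div id="foo">cd';

# Same arithmetic as above: position = total length - length of remainder.
$rest   = stristr($sample, '<div');          # '<div id="foo">cd'
$divpos = strlen($sample) - strlen($rest);   # 2

$tail   = strstr($rest, '>');                # '>cd'
$endpos = strlen($rest) - strlen($tail);     # 13, the index of '>' in $rest

# Without the +1 the closing '>' is left off the tag:
$short = substr($rest, 0, $endpos);          # '<div id="foo"'
# With the +1 the tag is complete and the remainder starts after it:
$tag   = substr($rest, 0, $endpos + 1);      # '<div id="foo">'
$after = substr($rest, $endpos + 1);         # 'cd'
?>
```

Since substr() takes a start offset and a length, the length has to be one
more than the index of the last character you want to keep.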
But you can now do the same technique to search inside of $div for the ID
attribute, pretty much.
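If the string functions get unwieldy for that step, a regular expression can
cover the three quoting styles from the question. This is only a sketch: the
pattern, the change_id() helper and the sample $div are assumptions for
illustration, and the pattern does not cope with odd cases such as a '>'
buried inside a quoted value.

```php
<?php
# Hypothetical captured tag; in practice this would be the $div found above.
$div = '<DIV CLASS=box ID="foo" ALIGN=left>';

# "id" preceded by whitespace, then = and either a "double quoted",
# 'single quoted', or bare value. The /i flag ignores case.
$pattern = '/(?<=\s)id\s*=\s*(?:"([^"]*)"|\'([^\']*)\'|([^\s>]+))/i';

# Hypothetical helper: receives the match and returns the replacement text.
function change_id($m) {
    # Only one of the three groups holds the value; pick the non-empty one.
    $value = '';
    for ($i = 1; $i < count($m); $i++) {
        if ($m[$i] !== '') { $value = $m[$i]; }
    }
    # Manipulate the value here; this sketch just appends a suffix.
    return 'ID="' . $value . '_new"';
}

$new_div = preg_replace_callback($pattern, 'change_id', $div);
echo $new_div, "\n";   # <DIV CLASS=box ID="foo_new" ALIGN=left>
?>
```

To delete the attribute instead, the helper could return an empty string.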
The solution might be a helluva lot more complex, or may be OOP based.
Any inspiration/links/words of wisdom?
If you need to do this for any arbitrary tag all at once, there *HAVE* to be
PHP-based HTML parsers out there in the various PHP script archives...
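One candidate, assuming a PHP version that bundles the DOM extension, is the
DOMDocument class. A minimal sketch, also assuming that re-serialised HTML
(rather than a byte-for-byte copy of the original) is acceptable; the sample
markup and the 'renamed_' prefix are made up:

```php
<?php
# Hypothetical sample markup using the three attribute quoting styles.
$html = '<div id=foo>one</div><div id="bar">two</div><p id=\'baz\'>three</p>';

$doc = new DOMDocument();
# The @ silences parser warnings about sloppy real-world HTML.
@$doc->loadHTML($html);

# Visit every DIV, whatever case or quoting the source used.
foreach ($doc->getElementsByTagName('div') as $div) {
    if ($div->hasAttribute('id')) {
        $old = $div->getAttribute('id');
        # Manipulate the value here, or call removeAttribute('id') to drop it.
        $div->setAttribute('id', 'renamed_' . $old);
    }
}

# saveHTML() writes the whole document back out with the tags in place.
$out = $doc->saveHTML();
echo $out;
?>
```

Note that loadHTML() wraps a fragment in html and body tags and normalises
the quoting, so the output will not match the input character for character.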
If all else fails, the PHP source for http://php.net/strip_tags must have
some kind of HTML parsing routine in it.
--
Like Music? http://l-i-e.com/artists.htm
--
PHP General Mailing List (http://www.php.net/)