Hi - I hope this is the right list.
I have a suggestion for a new attribute that could potentially make it
into the (x)html standard.
The attribute is for search engines, to instruct them not to index part
of a page.
What I'm currently doing in xhtml is this:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd" [
<!ATTLIST div spider (on | off) #IMPLIED>
<!ATTLIST p spider (on | off) #IMPLIED>
<!ATTLIST img spider (on | off) #IMPLIED>
]>
The added attribute is spider and takes a value of on or off.
I'm using it in a modified version of the open source sphider search
engine that I'm working on. The idea is to avoid using html comments to
turn indexing on and off for part of a page.
The actual name and values of such an attribute are definitely open to
discussion, but I think it should not be specific to any one search
crawler.
Example of use -
<p>This paragraph is indexed</p>
<p spider="off">This paragraph is not indexed</p>
<p>This paragraph is indexed</p>
<div spider="off">
<p>This paragraph is not indexed</p>
<p spider="on">This paragraph is indexed</p>
</div>
<img src="foo.jpg" alt="[This image is indexed]" />
<img src="bar.gif" spider="off" alt="[This image is not indexed]" />
The default is on. A node's own spider attribute takes precedence;
otherwise the node inherits the setting of the nearest parent node that
sets one.
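
To make the inheritance rule concrete, here is a minimal sketch in Python
of how an indexer might apply it. It is only an illustration, not the
sphider modification itself. The function name indexed_text and the
wrapping <body> element are made up for the example, and treating the alt
text of an img as its indexable content is an assumption.

```python
import xml.etree.ElementTree as ET


def indexed_text(elem, inherited=True):
    """Collect the text snippets an indexer would keep (hypothetical helper)."""
    # The node's own spider attribute wins; otherwise inherit from the parent.
    value = elem.get("spider")
    if value == "on":
        state = True
    elif value == "off":
        state = False
    else:
        state = inherited

    kept = []
    if state:
        if elem.text and elem.text.strip():
            kept.append(elem.text.strip())
        # Assumption: an image is represented in the index by its alt text.
        if elem.tag == "img" and elem.get("alt"):
            kept.append(elem.get("alt"))

    for child in elem:
        kept.extend(indexed_text(child, state))
        # Text following a child belongs to this element, not to the child.
        if state and child.tail and child.tail.strip():
            kept.append(child.tail.strip())
    return kept


SAMPLE = """<body>
<p>This paragraph is indexed</p>
<p spider="off">This paragraph is not indexed</p>
<div spider="off">
<p>This paragraph is not indexed</p>
<p spider="on">This paragraph is indexed</p>
</div>
<img src="foo.jpg" alt="[This image is indexed]" />
<img src="bar.gif" spider="off" alt="[This image is not indexed]" />
</body>"""

kept = indexed_text(ET.fromstring(SAMPLE))
```

In this sketch any value other than on or off is ignored and the node
simply inherits, which seems like the safest behaviour for a crawler
meeting markup it does not understand.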
It would be useful for things like navigation areas, images/multimedia
you specifically do not want engines to index, signature areas of
bulletin boards, etc.
Of course search engines would need their indexers to respect it, but
that's why a standard attribute is very desirable. With a standard, many
search engines would probably implement it, because when the webmaster
uses it properly it improves the usefulness of the search results.
Thoughts?