A couple of thoughts on John's post.

John Yeo writes:
> I have recently noticed a few requests every day on my
> webserver for "robots.txt".  After getting that file (which doesn't even
> exist on my webserver!), they would download various other html files.

The last thing I do before opening a site for a client (or myself) is
create an empty robots.txt file.  It dots the i's and crosses the t's,
and it prevents the errors you mention.  In most cases one wants the
entire text of one's site indexed by the search engines so that one can
be found for the flimsiest of reasons.  I do not have a lot of my work
up, but since some of my work depicts places, I would like my images to
come up when people search on those places.
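
If you prefer the file to contain something rather than be zero bytes,
an explicitly permissive version looks like this (an empty file works
just as well):

    User-agent: *
    Disallow:

An empty Disallow line means nothing is off limits.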

For clients who really do not want their materials indexed, I usually
either a) use a URL that is not linked from any page or b) put a password
on that directory/section of the site.  The problem with a) is that
if someone likes your materials, they can link to your hidden directory
from their own pages -- maliciously or not -- and soon enough the
spiders will come.
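
The details of the password route depend on your server.  On Apache,
something along these lines in an .htaccess file inside the protected
directory would do it (just a sketch; the AuthUserFile path is an
example, and the password file itself is created with htpasswd):

    # AuthUserFile path is an example; create the file with htpasswd
    AuthType Basic
    AuthName "Private gallery"
    AuthUserFile /home/username/.htpasswd
    Require valid-user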

> your images.  Fortunately, you can lock these robots out of parts of your
> website, or the whole site itself.  Usually you would want to leave the site
> open for search engines to increase traffic, but lock out your gallery so
> sites like http://images.google.com, can't get at your artwork.
> 
> To lock the robots out of your entire website, make a text file called
> robots.txt, and put the following lines in it:
> 
> User-agent: *
> Disallow: /

This assumes that the creator of the robot is an ethical person; if
someone's out to steal your images in the first place, it seems unlikely
that the spider they create would follow this protocol.

Oh, the protocol is described succinctly at

    http://www.robotstxt.org/wc/exclusion.html
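
For the more common case John describes, leaving the site open to search
engines but fencing off just the gallery, the file would look something
like this (assuming the images live under a /gallery/ directory; adjust
the path to match your site):

    User-agent: *
    Disallow: /gallery/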

Digital watermarking still seems like a good strategy.  Companies like
Digimarc not only provide software for watermarking your images, but
also run their own spiders (available for a subscription fee) that look
for occurrences of your work on unauthorized sites.  (And of course, if
their spiders followed the protocol, they'd never discover the thefts.)

--Eric
