Problem with Vspider and excluding specific pages

2006-04-12 Thread Andrew Tyrone
Hi everyone,

I've been working with Vspider for a while now (CFMX 6.1 and 7.0.1) on a
search revamp project and ran into a problem.  It seems that if you want the
spider to follow a page but not index it, the link to that page must name
the file explicitly; an implicit directory link is not enough.  Take these
two links, which resolve to the same page:

http://www.mysite.com/mydir/
and
http://www.mysite.com/mydir/index.cfm

If I want to exclude the index.cfm file from being indexed but still have
vspider follow it, I'd do this in my config file:

-indexclude */mydir/index.cfm

That works, but ONLY if the link vspider picks up names index.cfm
explicitly.  If vspider instead follows this link:

http://www.mysite.com/mydir/

then index.cfm is both indexed and followed, which effectively renders the
-indexclude option useless for that page.
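
This behavior is consistent with the pattern being compared against the URL
string as linked, rather than against the file the server finally returns.
A minimal Python sketch of that idea follows. It assumes simple wildcard
matching; fnmatchcase and the is_index_excluded helper are stand-ins for
illustration only, not vspider's actual matcher.

```python
from fnmatch import fnmatchcase

# Pattern copied from the -indexclude line in the config file.
PATTERN = "*/mydir/index.cfm"


def is_index_excluded(url: str, pattern: str = PATTERN) -> bool:
    """Hypothetical helper: wildcard-match the URL exactly as it was linked.

    Assumption: the spider compares the pattern to the link text and never
    learns that /mydir/ is served by index.cfm.
    """
    return fnmatchcase(url, pattern)


explicit = "http://www.mysite.com/mydir/index.cfm"
implicit = "http://www.mysite.com/mydir/"

print(is_index_excluded(explicit))  # True: the link names index.cfm
print(is_index_excluded(implicit))  # False: same page, but no file name
```

Under that assumption the implicit link never matches the pattern, so the
page it resolves to is indexed as usual.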

Does anyone have any ideas on this?  It's possible I am missing something
with one of the options but I've been through them quite a lot.  I was
thinking maybe the -regexp switch might help get around this problem.
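
As a rough sketch of the regular-expression idea, a single expression can
cover both forms of the link by making the file name optional.  The example
below uses Python's re module purely to check the expression.  Whether
vspider's -regexp mode accepts the same syntax, and whether it applies to
-indexclude, are assumptions that would need checking against the vspider
documentation.

```python
import re

# Hypothetical expression: the directory URL with or without index.cfm.
BOTH_FORMS = re.compile(r".*/mydir/(index\.cfm)?$")


def matches_either_form(url: str) -> bool:
    """Return True if the URL is /mydir/ or /mydir/index.cfm."""
    return BOTH_FORMS.match(url) is not None


for url in (
    "http://www.mysite.com/mydir/",
    "http://www.mysite.com/mydir/index.cfm",
    "http://www.mysite.com/mydir/other.cfm",
):
    print(url, matches_either_form(url))
```

The first two URLs match and the third does not, so other pages in the same
directory would still be indexed.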

Thanks,
Andy



Message: http://www.houseoffusion.com/lists.cfm/link=i:4:237595
Archives: http://www.houseoffusion.com/cf_lists/threads.cfm/4
Subscription: http://www.houseoffusion.com/lists.cfm/link=s:4
Unsubscribe: 
http://www.houseoffusion.com/cf_lists/unsubscribe.cfm?user=11502.10531.4
Donations & Support: http://www.houseoffusion.com/tiny.cfm/54

