Be able to modify URL rules while crawler is running
----------------------------------------------------
Key: DROIDS-77
URL: https://issues.apache.org/jira/browse/DROIDS-77
Project: Droids
Issue Type: New Feature
Components: core
Affects Versions: 0.01
Reporter: Richard Frovarp
Priority: Minor
It would be nice to be able to modify the URL rules while a crawler is running.
This would allow me to dynamically exclude areas from being crawled based on
the results being returned. Basically, I want to look for certain markers inside
a page, then stop crawling those pages without having to update a robots file.
Certain paths of our site are going to enter the index by a different method
than the main crawl, so I can skip them once I find them.
Having a modifiable filter would allow people to load their rules from places
other than a file without having to write their own implementation or
extension. I'll try to work up a patch sometime this week.
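
To make the request concrete, here is a rough sketch of what a filter with
runtime-modifiable rules could look like. It is not the Droids API and not the
planned patch: the class name MutableUrlFilter and its methods addExclude,
removeExclude and accepts are hypothetical names chosen for illustration. It
assumes rules are regular expressions and that the crawler thread checks URLs
while another thread may be changing the rules.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.regex.Pattern;

// Hypothetical sketch, not part of Droids: a URL filter whose exclusion
// rules can be changed while a crawl is in progress.
final class MutableUrlFilter {

    // CopyOnWriteArrayList lets the crawler iterate over the rules safely
    // while another thread adds or removes them.
    private final List<Pattern> excludes = new CopyOnWriteArrayList<>();

    // Add an exclusion rule; URLs matching the regex are rejected from now on.
    void addExclude(String regex) {
        excludes.add(Pattern.compile(regex));
    }

    // Remove a previously added rule; returns true if a rule was removed.
    boolean removeExclude(String regex) {
        return excludes.removeIf(p -> p.pattern().equals(regex));
    }

    // Returns true if the URL matches none of the current exclusion rules.
    boolean accepts(String url) {
        for (Pattern p : excludes) {
            if (p.matcher(url).find()) {
                return false;
            }
        }
        return true;
    }
}
```

With something like this, the code that spots a marker in a fetched page could
call addExclude("^http://example\\.org/archive/") (an example pattern only), and
every later accepts() check would skip that area without touching a robots file.
The rules could equally be loaded from a database or any other source at
startup.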