I want to do some targeted crawling: crawl one site with one set of accept/reject URL patterns, then reset and crawl another site with a different set of patterns, and so on. I'm writing my own wrapper that puts the accept/reject patterns into the Configuration, plus a URLFilter that reads that configuration item to do the accepting/rejecting. What I don't see is how to make the crawl start at a given URL other than by creating a dir/url seed file containing that URL. For my case that's inefficient; I'd rather parse one file containing a list of URLs along with the accept/reject patterns for each URL, then say "inject this URL", run my own generate/fetch/updatedb cycle, then inject the next and repeat.
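For reference, the per-site accept/reject logic I have in mind looks roughly like the sketch below. This is self-contained illustration code, not Nutch's actual URLFilter plugin: the class name and constructor are made up, and in the real filter the pattern lists would be read from the Configuration rather than passed in directly. It does mirror the URLFilter convention of returning the URL to accept it and null to drop it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Sketch of per-site accept/reject filtering (hypothetical class, not the
// Nutch URLFilter API). In the real plugin these regex lists would come
// from the Configuration set up by my wrapper.
public class SiteUrlFilter {
    private final List<Pattern> accept = new ArrayList<>();
    private final List<Pattern> reject = new ArrayList<>();

    public SiteUrlFilter(List<String> acceptRegexes, List<String> rejectRegexes) {
        for (String r : acceptRegexes) accept.add(Pattern.compile(r));
        for (String r : rejectRegexes) reject.add(Pattern.compile(r));
    }

    // Returns the URL if it passes, or null to drop it -- the same
    // convention Nutch's URLFilter.filter(String) uses.
    public String filter(String url) {
        for (Pattern p : reject) {
            if (p.matcher(url).find()) return null;  // reject patterns win first
        }
        for (Pattern p : accept) {
            if (p.matcher(url).find()) return url;   // explicit accept
        }
        return null;                                 // default: drop everything else
    }
}
```

Between sites, the wrapper would rebuild this filter (or rewrite the Configuration entry) with the next site's pattern lists before injecting that site's start URL.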
-- http://www.linkedin.com/in/paultomblin