First, let me say that Plucker is an excellent set of tools and applications!

I'm having trouble getting good results when plucking meta news sites.
Let's say I want to pluck Slashdot.  If I set max depth to 2 and make it
stay on host, I don't get the linked article, only the Slashdot story
talking about the article.  That's about worthless.  If I remove the
stayonhost restriction, it quickly spiders far too much, mostly away
from the meta site, which makes the pdb grow far too large and wastes
time spidering.  Again, that is not what I want.
It would be nice to allow for something like:

--maxnonhomedepth 1
--maxdepth 5

This would allow a max depth of 5 for anything at or below the home URL
(the meta news site, in this case), while anything not below the home
URL would get a max depth of 1.  That way, for meta news sites, I could
still get the referred article AND the content provided by the meta
news site itself.
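
The rule described above could be sketched roughly as follows.  This is
not Plucker code: the function names, the off_home_depth counter and the
example URLs are all made up for illustration.  It assumes the home page
counts as depth 1, and that "non-home depth" means the number of pages
outside the home URL on the path to a link, including the link itself.

```python
from urllib.parse import urlsplit


def is_under_home(url, home_url):
    """True if url is on the home host and at or below the home directory."""
    u, h = urlsplit(url), urlsplit(home_url)
    if u.hostname != h.hostname:
        return False
    # Directory part of the home URL, e.g. "/news/" for "/news/index.html".
    home_dir = h.path.rsplit("/", 1)[0] + "/"
    return (u.path or "/").startswith(home_dir)


def allowed(url, home_url, depth, off_home_depth,
            max_depth=5, max_non_home_depth=1):
    """Hypothetical policy: decide whether the spider should fetch url.

    depth          -- hops from the home page (the home page itself is 1)
    off_home_depth -- pages outside the home URL on the way here,
                      counting this one if it is outside
    """
    if depth > max_depth:
        return False  # overall limit applies to every page
    if is_under_home(url, home_url):
        return True  # pages below the home URL only need the overall limit
    return off_home_depth <= max_non_home_depth
```

With the defaults above, a story page on the meta site and the article
it links to would both be fetched, but links leading on from that
article would not be.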

Is there any way to do this without modifying Plucker?

Thanks,

Greg




_______________________________________________
plucker-list mailing list
[EMAIL PROTECTED]
http://lists.rubberchicken.org/mailman/listinfo/plucker-list