I'm trying to get rid of some spammy sites in our index.
First, I wonder if anyone has any suggestions on changes to the default
install config of Nutch that will help drive better sites to the top and
spammier sites down.
Secondly, I boosted the inbound anchor text config - but if anything
that made things worse. A lot of the spammier sites heavily use search
terms in their internal anchors. So I'm wondering - is there any easy
way to distinguish between anchor text from within the same domain vs.
anchor text from external domains, and give them different weightings?
I expect Nutch doesn't support this currently - anyone have any opinions on
how difficult this would be to change?
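
To make the idea concrete, here is a rough sketch of the kind of check I have in mind. It is plain Java and does not use any Nutch API: the class name AnchorWeighting, its methods, and the two weight values are all made up for illustration. It compares hosts only (ignoring a leading "www."), so a subdomain such as blog.example.com would count as external to example.com; a real version would probably want to compare registered domains instead.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical helper, not part of Nutch: picks a weight for an inbound
// anchor depending on whether the link comes from the same host.
final class AnchorWeighting {
    // Assumed example values; these would need tuning against a real index.
    static final float INTERNAL_WEIGHT = 0.2f;
    static final float EXTERNAL_WEIGHT = 1.0f;

    // Returns the lower-cased host without a leading "www.",
    // or null if the URL cannot be parsed or has no host.
    static String hostOf(String url) {
        try {
            String host = new URI(url).getHost();
            if (host == null) {
                return null;
            }
            host = host.toLowerCase(Locale.ROOT);
            return host.startsWith("www.") ? host.substring(4) : host;
        } catch (URISyntaxException e) {
            return null;
        }
    }

    // A link is internal when both URLs resolve to the same host.
    // Unparseable URLs are treated as external.
    static boolean isInternal(String fromUrl, String toUrl) {
        String from = hostOf(fromUrl);
        String to = hostOf(toUrl);
        return from != null && from.equals(to);
    }

    // Weight to apply to the anchor text of a link from fromUrl to toUrl.
    static float weightFor(String fromUrl, String toUrl) {
        return isInternal(fromUrl, toUrl) ? INTERNAL_WEIGHT : EXTERNAL_WEIGHT;
    }
}
```

With something like this, the anchor text from each inbound link could be scaled by weightFor(fromUrl, toUrl) before it is indexed, so that a site's own keyword-stuffed navigation counts for much less than links from other sites.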
Thanks,
g.
- modifying inbound link text calc Insurance Squared Inc.