Hi, my link DB is growing substantially now, and I'm crawling some 12 million URLs a day. I plan on generating my linkdb in 10 portions of 100 million URLs each, to place on my sandbox servers for a distributed search cluster. Before I move this out of Hadoop and onto local file systems, I want to filter my linkdb for any adult content.
Does anyone have any pointers or ready-made filters for this? I'm sure I can create some filters to do this to a degree; however, a tried-and-true system would be ideal. Thanks, Axel
