Hi,

I would appreciate your help with this. I want to prevent some pages
from being indexed during the crawl if they don't meet specific
criteria.

I can see the URLs of the pages that fail the criteria in the Hadoop log,
but those URLs and all their contents are still indexed. So what I want
is to delete those URLs and their contents from the index, for example
using the Luke tool.
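For reference, one approach I have been looking at is Nutch's URL
filtering, so that the pages never get crawled or indexed in the first
place. A minimal regex-urlfilter.txt fragment might look like the
following (the host and path pattern below are just placeholders for my
real criteria, not my actual configuration):

```
# Exclude URLs under a hypothetical /private/ path
-^http://www\.example\.com/private/

# Accept everything else
+.
```

As I understand it, lines starting with '-' reject matching URLs and
lines starting with '+' accept them, with the first matching rule
winning. This only helps for future crawls, though; it doesn't remove
documents already in the index, which is the second part of my problem.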

Can anybody help me resolve this issue?


Ratnesh V2Solutions India
-- 
View this message in context: 
http://www.nabble.com/How-to-prevent-a-page-from-being-index-during-crawl-or-after-crawl---tf3505149.html#a9788975
Sent from the Nutch - User mailing list archive at Nabble.com.

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general