Yes, it's better to use a recent version of Nutch. The latest is nutch-0.9, and we have used nutch-0.8.1, so I suggest you use either of these. You will find more help for nutch-0.8.1 than for nutch-0.9, because 0.9 is only newly released. The rest is your choice.

Thanks,
Ratnesh, V2Solutions, India

franklinb4u wrote:
>
> Thanks for your reply... You asked me to use PruneIndexTool, which is a class
> from Nutch 0.9 (I believe),
> but I am using Nutch 0.7.2, which tells me "command not found"... Should I
> migrate from Nutch 0.7.2 to Nutch 0.9?
>
> With Thanks,
> Franklin.S
>
> franklinb4u wrote:
>>
>> I am facing the same problem...
>> I don't know how to eliminate or delete the particular index of a URL
>> which has been crawled.
>> I need to eliminate the porn URLs from my search engine...
>>
>> I have the crawled data with me after crawling, and now I need to
>> find the indexes of the porn URLs...
>>
>> Please help me in doing this...
>>
>> With Thanks,
>> Franklin.S
>>
>> Ratnesh,V2Solutions India wrote:
>>>
>>> No,
>>> I don't think we have to deal with that, because if I remove them
>>> then I won't be able to index my own file, which I am crawling.
>>>
>>> But I will surely check, as at this moment I am not very sure.
>>> Can you tell me about your whereabouts?
>>>
>>> Thanks
>>> Ratnesh, V2Solutions, India
>>>
>>> Siddharth Jonathan wrote:
>>>>
>>>> Hmmm... I haven't had to do this, but my guess would be to remove the
>>>> corresponding plugin entries from the nutch-default.xml file.
>>>> There is a plugin include property in that file which includes the
>>>> default indexing filters (index-basic, index-more etc.)
>>>> and the query filter plugins (query-basic, query-more etc.). Try removing
>>>> those. That might keep them from getting used.
>>>>
>>>> Jonathan
>>>>
>>>>
>>>> On 4/2/07, Ratnesh,V2Solutions India <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>> Yes, exactly.
>>>>>
>>>>> That is just what I want. Do you have any solution for this?
>>>>>
>>>>> Looking forward to your reply.
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>> Siddharth Jonathan wrote:
>>>>> >
>>>>> > Do you mean how do you get rid of some of the fields that are
>>>>> > indexed by default? E.g. content, anchor text etc.
>>>>> >
>>>>> > Jonathan
>>>>> > On 4/2/07, Ratnesh,V2Solutions India <[EMAIL PROTECTED]> wrote:
>>>>> >>
>>>>> >> Hi,
>>>>> >> I have written a plugin which finds the number of Object tags in an
>>>>> >> HTML page and the corresponding URLs.
>>>>> >> I am storing "objects" as fields and the page URL as values.
>>>>> >>
>>>>> >> Finally, I am interested in seeing only the search results related to
>>>>> >> the "objects" indexed fields, not those which are already stored as
>>>>> >> indexed fields.
>>>>> >>
>>>>> >> So how shall I delete those index fields which are already stored?
>>>>> >>
>>>>> >> Looking forward to your reply (valuable inputs).
>>>>> >>
>>>>> >> Thanks to the Nutch Community
>>>>> >> --
>>>>> >> View this message in context:
>>>>> >> http://www.nabble.com/How-to-delete-already-stored-indexed-fields----tf3504164.html#a9786377
>>>>> >> Sent from the Nutch - User mailing list archive at Nabble.com.
>>>>> >>
>>>>> >
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://www.nabble.com/How-to-delete-already-stored-indexed-fields----tf3504164.html#a9803792
>>>>> Sent from the Nutch - User mailing list archive at Nabble.com.
>>>>>
>>>>
>>>
>>
>

--
View this message in context: http://www.nabble.com/How-to-delete-already-stored-indexed-fields----tf3504164.html#a10133749
Sent from the Nutch - User mailing list archive at Nabble.com.

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
