Thanks for the hint.
I have changed my script: instead of a single "nutch crawl" step,
I now use the generate->fetch->updatedb->fetch->invertlinks->index commands.
I don't use the dedup command.
Now it seems to be OK; search finds all occurrences.
I think nutch removes duplicate pages even when they are at different
locations, but for me it is important to have information about every
occurrence of a term.
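
For reference, a step-by-step crawl of this kind can be sketched roughly
as below. This is not my exact script: the crawl/ directory layout, the
NUTCH and CRAWL variables and the two function names are assumptions for
illustration, and the command arguments follow the Nutch 0.8 tutorial as
far as I recall. It assumes the crawldb was already filled with
"nutch inject".

```shell
#!/bin/sh
# Sketch only: paths and names below are assumed, not taken from a real setup.
NUTCH="${NUTCH:-bin/nutch}"   # path to the nutch launcher (assumed)
CRAWL="${CRAWL:-crawl}"       # crawl directory holding crawldb, segments, ... (assumed)

# One crawl round: generate a fetch list, fetch it, update the crawldb.
crawl_round() {
    $NUTCH generate "$CRAWL/crawldb" "$CRAWL/segments" || return 1
    # Segment directories are named by timestamp; pick the newest one.
    segment=$(ls -d "$CRAWL"/segments/2* | tail -n 1)
    $NUTCH fetch "$segment" || return 1
    $NUTCH updatedb "$CRAWL/crawldb" "$segment"
}

# Build the link database and the index. The dedup step is deliberately
# left out, so pages with identical content at different URLs stay indexed.
build_index() {
    $NUTCH invertlinks "$CRAWL/linkdb" -dir "$CRAWL/segments" || return 1
    $NUTCH index "$CRAWL/indexes" "$CRAWL/crawldb" "$CRAWL/linkdb" "$CRAWL"/segments/*
}

# Typical use: call crawl_round once per crawl depth, then build_index.
```

The functions only wrap the individual commands, so the number of rounds
can be chosen freely before indexing.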

Libor

Alvaro Cabrerizo wrote:
> I recommend you check your index using Luke. With Luke you can inspect
> (query, see structure...) your Lucene index in order to discover whether
> you have a problem during indexing or during search.
>
> 2007/1/16, kauu <[EMAIL PROTECTED]>:
>>
>> So you must show us the logs.
>> And did you change the nutch-site.xml in Tomcat?
>>
>> On 1/16/07, Libor Štefek <[EMAIL PROTECTED]> wrote:
>> >
>> > Hi,
>> > I'm using nutch 0.8.1 to index several thousand text files (source code)
>> > and I use the intranet crawling method to create an index.
>> >
>> > Everything looks fine, but when I try to search for something, it often
>> > doesn't find what it should. I'm sure that the term is in several pages,
>> > but I get results only for some of them.
>> >
>> > I tried to set limits in properties like page sizes, number of links
>> > etc., but nothing helped.
>> > There aren't any error messages in the logfile during the crawl.
>> >
>> > Is there any way to find the reason for this behavior?
>> > How can I make nutch more reliable in its results?
>> >
>> > Thanks for any hint.
>> > Libor
>> >
>> >
>>
>>
>> -- 
>> www.babatu.com
>>
>>
>


-- 
Libor Štefek
LOGIS, s.r.o.
tel.    +420 556 841 100
fax.    +420 556 841 117
mobil   +420 605 228 985
www.logis.cz <http://www.logis.cz/>


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
