I found the problem.  It is very easy to fix, but we have to be careful with 
the details.  Since no results were found, I looked at the "searcher 
properties" and saw that the default value of the "searcher directory" 
property was <value>crawl</value>.  If your directory is not named crawl, 
you get the error.  Just change this to "." as in previous versions of Nutch, 
and it works no matter what your directory is called.
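
For anyone hitting the same thing, the override would look roughly like the 
fragment below.  This is only a sketch: it assumes the property is named 
searcher.dir (I described it above rather than quoting its name) and that it 
goes in the nutch-site.xml used by the web application, so that it overrides 
the default.

```xml
<!-- Sketch of an override; the property name searcher.dir is an assumption. -->
<property>
  <name>searcher.dir</name>
  <!-- "." replaces the default value "crawl" -->
  <value>.</value>
  <description>Directory the search application reads the crawl from.
  </description>
</property>
```

With "." the path is presumably resolved against the directory Tomcat was 
started from, so Tomcat still has to be started from inside the crawl folder.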
Thanks,
W. Melo


----- Original Message ----- 
From: "carmmello" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Thursday, September 28, 2006 7:31 PM
Subject: Re: no results in nutch 0.8.1


> Hello, Dennis,
>
> Thanks again for your response.  I am really amazed that things can't 
> go right.  I have verified my configuration in nutch-site.xml, and I 
> have already filled in all the fields you mentioned in your e-mail.  I have 
> even copied the file nutch-site.xml to a sub-folder under the ROOT folder 
> in Tomcat.  Still no results, although the log does not show any problems. 
> Just for your information, I will reproduce two sections of the log:
>
> The first one, just when starting the crawl:
>
> 2006-09-28 17:15:43,930 INFO  http.Http - http.agent = 
> qualidade/0.8.1(qualidade e meio ambiente; http://www.qualidade.eng.br; 
> [EMAIL PROTECTED])
>
> and, the final section, after all the indexing and optimization:
>
> 2006-09-28 17:25:58,551 INFO  indexer.Indexer - Indexer: done
> 2006-09-28 17:25:58,556 INFO  indexer.DeleteDuplicates - Dedup: starting
> 2006-09-28 17:25:58,593 INFO  indexer.DeleteDuplicates - Dedup: adding 
> indexes in: teste/indexes
> 2006-09-28 17:26:01,356 INFO  indexer.DeleteDuplicates - Dedup: done
> 2006-09-28 17:26:01,358 INFO  indexer.IndexMerger - Adding 
> teste/indexes/part-00000
> 2006-09-28 17:26:02,377 INFO  crawl.Crawl - crawl finished: teste
>
> Then I go to the "teste" folder and start Tomcat from there, as in Nutch 
> 0.7.2, get that nice search page, try something and ... zero 
> results!
>
> Any new ideas?
>
> Thanks,
> W. Melo
>
>
>
> ----- Original Message ----- 
> From: "Dennis Kubes" <[EMAIL PROTECTED]>
> To: <[email protected]>
> Sent: Thursday, September 28, 2006 6:19 PM
> Subject: Re: no results in nutch 0.8.1
>
>
>> This is what we have, hope this clears up some confusion.  It will show 
>> up in log files of the sites that you crawl like this.  I don't know if 
>> the configuration is what is causing your problem but I have talked to 
>> other people on the list with similar problems where their configuration 
>> was incorrect.  I think the only thing that is "required" is for the 
>> http.agent.name not to be blank but I would set all of the other options 
>> as well, just for politeness.
>>
>> Dennis
>>
>> Log file will record a crawler similar to this:
>> NameOfAgent/1.0_(Yourwebsite.com;_http://www.yoururl.com/bot.html;[EMAIL 
>> PROTECTED])
>>
>> <!-- HTTP properties -->
>> <property>
>>  <name>http.agent.name</name>
>>  <value>NameOfAgent</value>
>>  <description>Our HTTP 'User-Agent' request header.</description>
>> </property>
>>
>> <property>
>>  <name>http.robots.agents</name>
>>  <value>NutchCVS,Nutch,NameOfAgent,*</value>
>>  <description>The agent strings we'll look for in robots.txt files,
>>  comma-separated, in decreasing order of precedence.</description>
>> </property>
>>
>> <property>
>>  <name>http.robots.403.allow</name>
>>  <value>true</value>
>>  <description>Some servers return HTTP status 403 (Forbidden) if
>>  /robots.txt doesn't exist. This should probably mean that we are
>>  allowed to crawl the site nonetheless. If this is set to false,
>>  then such sites will be treated as forbidden.</description>
>> </property>
>>
>> <property>
>>  <name>http.agent.description</name>
>>  <value>Yourwebsite.com</value>
>>  <description>Further description of our bot- this text is used in
>>  the User-Agent header.  It appears in parenthesis after the agent name.
>>  </description>
>> </property>
>>
>> <property>
>>  <name>http.agent.url</name>
>>  <value>http://yoururl.com</value>
>>  <description>A URL to advertise in the User-Agent header.  This will
>>   appear in parenthesis after the agent name.
>>  </description>
>> </property>
>>
>> <property>
>>  <name>http.agent.email</name>
>>  <value>[EMAIL PROTECTED]</value>
>>  <description>An email address to advertise in the HTTP 'From' request
>>   header and User-Agent header.</description>
>> </property>
>>
>> <property>
>>  <name>http.agent.version</name>
>>  <value>1.0</value>
>>  <description>A version string to advertise in the User-Agent
>>   header.</description>
>> </property>
>>
>> carmmello wrote:
>>> Thanks for your answer, Dennis, but yes, I did.  The only thing I did not 
>>> do (and I have some doubt about it) is that in http.agent.version I 
>>> only used the name Nutch-0.8.1, not the name I used in 
>>> http.robots.agents, although in that setting I have kept the *. 
>>> Also, in the log file, I cannot find any error regarding this.
>>>
>>> ----- Original Message ----- From: "Dennis Kubes" 
>>> <[EMAIL PROTECTED]>
>>> To: <[email protected]>
>>> Sent: Wednesday, September 27, 2006 7:59 PM
>>> Subject: Re: no results in nutch 0.8.1
>>>
>>>
>>>> Did you set up the user agent name in the nutch-site.xml file or 
>>>> nutch-default.xml file?
>>>>
>>>> Dennis
>>>>
>>>> carmmello wrote:
>>>>> I have followed the steps in the 0.8.1 tutorial, and I have also 
>>>>> been using Nutch for some time now without ever seeing the kind of 
>>>>> problem I am encountering now.
>>>>> After the crawl process (intranet crawling) has finished, I go to 
>>>>> localhost:8080, try a search and get, no matter what, 0 results.
>>>>> Looking at the logs, everything seems OK.  Also, if I use the command 
>>>>> bin/nutch readdb "crawl/crawldb", I find more than 6000 URLs.
>>>>> So, why can't I get any results?
>>>>> Thanks
>>>>>
>>>>
>>>>
>>>
>>
>>
>
>
>


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general