Hi Olaf,

I used Nutch's intranet crawling to crawl the following
root URLs:

http://www.l3s.uni-hannover.de/
http://www.l3s.de/
http://www.learninglab.uni-hannover.de/
http://www.learninglab.de/

The host names of these root URLs all resolve to the
same IP address (they are host name aliases). After the
crawl completed, I used the WebDBReader command line
(bin/nutch readdb <db> -dumplinks) to dump the link
data for the URLs.
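
One way to double-check the alias assumption is to resolve each root URL's host name and group the names by address. This is only a sketch: group_hosts_by_ip is a made-up helper name (not part of Nutch), and the resolve parameter is an illustrative hook that defaults to Python's socket.gethostbyname, which needs working DNS.

```python
import socket
from collections import defaultdict
from urllib.parse import urlsplit

# The four root URLs of the crawl, as listed above.
ROOT_URLS = [
    "http://www.l3s.uni-hannover.de/",
    "http://www.l3s.de/",
    "http://www.learninglab.uni-hannover.de/",
    "http://www.learninglab.de/",
]


def group_hosts_by_ip(urls, resolve=socket.gethostbyname):
    """Map each resolved IP address to the host names that point at it.

    `resolve` is any callable taking a host name and returning an address
    string; it is a parameter so the lookup can be replaced when DNS is
    unavailable.
    """
    groups = defaultdict(list)
    for url in urls:
        host = urlsplit(url).hostname
        groups[resolve(host)].append(host)
    return dict(groups)


# Usage (performs real DNS lookups):
#   for ip, hosts in group_hosts_by_ip(ROOT_URLS).items():
#       print(ip, hosts)
```

If all four names really are aliases, the result should contain a single address with all four host names listed under it.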

In the dumplinks output I found some links that are not
correct (see the example below). Why does the source
page (/morob/Galleries/ER1/pages/09_DSCF0492.html) on
the host http://www.l3s.uni-hannover.de have outlinks
to pages on the other hosts
(http://www.learninglab.de/ and
http://www.learninglab.uni-hannover.de/)?

In fact, the source page has only 3 outlinks (absolute
outlinks), but according to the dumplinks output it has
9 outlinks in total (6 of them are false). The 6 false
outlinks point to the same 3 pages as the absolute
outlinks, but under the other host names.
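
The pattern can be made visible by grouping the dumped outlinks by URL path and listing the host names each path appears under. The sketch below uses the nine outlinks from the example quoted further down. The helper name group_by_path and the ALIASES set are illustrative, not anything Nutch provides, and the code only shows the duplication; it does not explain why Nutch records these links.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Host names that are assumed to be aliases of one machine (the four roots).
ALIASES = {
    "www.l3s.uni-hannover.de",
    "www.l3s.de",
    "www.learninglab.uni-hannover.de",
    "www.learninglab.de",
}

# The nine outlinks reported for .../ER1/pages/09_DSCF0492.html in the dump.
OUTLINKS = [
    "http://www.l3s.uni-hannover.de/morob/Galleries/ER1/index.html",
    "http://www.l3s.uni-hannover.de/morob/Galleries/ER1/pages/08_DSCF0493.html",
    "http://www.l3s.uni-hannover.de/morob/Galleries/ER1/pages/10_DSCF0499.html",
    "http://www.learninglab.de/morob/Galleries/ER1/index.html",
    "http://www.learninglab.de/morob/Galleries/ER1/pages/08_DSCF0493.html",
    "http://www.learninglab.de/morob/Galleries/ER1/pages/10_DSCF0499.html",
    "http://www.learninglab.uni-hannover.de/morob/Galleries/ER1/index.html",
    "http://www.learninglab.uni-hannover.de/morob/Galleries/ER1/pages/08_DSCF0493.html",
    "http://www.learninglab.uni-hannover.de/morob/Galleries/ER1/pages/10_DSCF0499.html",
]


def group_by_path(links, aliases=ALIASES):
    """Return {path: set of alias host names under which that path appears}."""
    by_path = defaultdict(set)
    for link in links:
        parts = urlsplit(link)
        if parts.hostname in aliases:
            by_path[parts.path].add(parts.hostname)
    return dict(by_path)


groups = group_by_path(OUTLINKS)
for path, hosts in sorted(groups.items()):
    print(path, sorted(hosts))
```

The nine links collapse to three distinct paths, each listed under three of the four alias host names.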

Could this be a problem with the host name aliases? Can
you tell me why it happens?

Thanks a lot!
Niti


Date: Mon, 21 Feb 2005 20:50:54 +0100
From: Olaf Thiele <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: Re: [Nutch-dev] Why found the unabsolute links from Nutch
Reply-To: [EMAIL PROTECTED]

Hi Niti,
I don't get your question. Just write it in German
and I will post it in English.

Bye
Olaf



On Mon, 21 Feb 2005 05:29:43 -0800 (PST), Niti Witthayawiroj
<[EMAIL PROTECTED]> wrote:
> Hi,
>
> I have used Nutch to crawl four hosts, and the four host names
> correspond to the same IP address. I used the WebDBReader to get
> the dump links of URLs. Why did it find the unabsolute links
> (pages on one host have links to pages on other hosts)?
>
> For example:
>
> from http://www.l3s.uni-hannover.de/morob/Galleries/ER1/pages/09_DSCF0492.html
>  to http://www.l3s.uni-hannover.de/morob/Galleries/ER1/index.html
>  to http://www.l3s.uni-hannover.de/morob/Galleries/ER1/pages/08_DSCF0493.html
>  to http://www.l3s.uni-hannover.de/morob/Galleries/ER1/pages/10_DSCF0499.html
>  to http://www.learninglab.de/morob/Galleries/ER1/index.html
>  to http://www.learninglab.de/morob/Galleries/ER1/pages/08_DSCF0493.html
>  to http://www.learninglab.de/morob/Galleries/ER1/pages/10_DSCF0499.html
>  to http://www.learninglab.uni-hannover.de/morob/Galleries/ER1/index.html
>  to http://www.learninglab.uni-hannover.de/morob/Galleries/ER1/pages/08_DSCF0493.html
>  to http://www.learninglab.uni-hannover.de/morob/Galleries/ER1/pages/10_DSCF0499.html
>
> regards,
> Niti


                
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
