Hello Gal,
thanks for your reply.
Take a look here: http://issues.apache.org/jira/browse/NUTCH-100
If you have further questions...
I have a problem with nutch-extensionpoints: there are no Java source files
in src, so I can't compile.
I tried removing the entry for nutch-extensionpoints from build.xml so
that I could compile the sources without it, but it seems that
nutch-extensionpoints is vital.
Any idea?
Kai
Regards,
Gal
Kai Hagemeister wrote:
Hello,
I have a few basic questions and hope that somebody can assist.
I'm trying to search several domains. It seems fairly simple to crawl
one particular domain (intranet search), which is defined in the
configuration file, but this seems to be limited to that one specified
domain. I can also search the web (web search) by supplying different URLs
via a URL file. But I want to crawl complete domains without going
outside them.
So, if I hand over the URLs bla.com and blub.net, only pages from these
domains should be fetched. I tried setting the parameter follow
outsitelinks to 0, but then links inside the domain were ignored as
well.
Is there a way to accomplish this? I mean a way other than changing the
source code :-).
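
What is being asked for here amounts to a host-based allow list. As a rough illustration only, the following shell sketch shows the kind of regular expression that keeps URLs on bla.com, blub.net and their subdomains and rejects everything else. The variable `allowed_pattern` and the function `filter_urls` are made-up names for this example, and the sketch does not touch Nutch itself. Nutch does ship regex-based URL filter configuration, but whether this exact pattern can be used there unchanged is an assumption that should be checked against the version in use.

```shell
#!/bin/sh
# Hypothetical helper: keep only URLs whose host is one of the allowed
# domains or a subdomain of them. bla.com and blub.net come from the
# example in the mail; the names below are illustrative.
allowed_pattern='^https?://([a-z0-9-]+\.)*(bla\.com|blub\.net)(:[0-9]+)?(/|$)'

filter_urls() {
    # Reads URLs on stdin, prints those that stay inside the allowed domains.
    grep -E -i "$allowed_pattern"
}

# Small demonstration: the first two URLs pass, the last two are dropped.
printf '%s\n' \
    'http://bla.com/index.html' \
    'http://www.blub.net/a/b' \
    'http://other.org/page' \
    'http://notbla.com/' | filter_urls
```

The pattern is anchored at the start of the URL and requires the host to end in one of the two domains, so a host such as notbla.com or a path that merely contains bla.com does not slip through.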
Furthermore, I created a directory db for the database and one for
segments. Then I started Tomcat from the parent directory of segments. The
Java class seems to look for a child directory named segments under the
current working directory. The problem: after each update of the index I
have to restart Tomcat :-(. Even worse, each time I start the processes I
must delete the database and the segments.
How do I set up a reasonable fetch cycle? Could somebody give an
example?
My idea would be to put the following snippet in an endless loop and run
it with nohup:
bin/nutch generate db segments -topN 1000
s1=$(ls -d segments/2* | tail -1)
bin/nutch fetch "$s1"
bin/nutch updatedb db "$s1"
bin/nutch index "$s1"
Would this be advisable? And can somebody explain the meaning of -topN
1000?
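
For illustration, the loop idea could be sketched roughly as below. It uses only the bin/nutch commands from the snippet above. The variables `NUTCH`, `DB`, `SEGMENTS` and `PAUSE`, the function names, the `--loop` switch and the one-hour default pause are all assumptions made for this example, not anything Nutch prescribes. It also assumes, as the `2*` glob in the snippet suggests, that segment directories sort so that the newest one comes last.

```shell
#!/bin/sh
# Sketch of one generate/fetch/updatedb/index round. NUTCH, DB, SEGMENTS,
# PAUSE and the function names are illustrative assumptions.
NUTCH=${NUTCH:-bin/nutch}
DB=${DB:-db}
SEGMENTS=${SEGMENTS:-segments}

latest_segment() {
    # Assumes segment names sort chronologically, so the last entry in
    # sorted order is the newest one.
    ls -d "$1"/2* 2>/dev/null | tail -1
}

run_cycle() {
    # Stop the round at the first failing step instead of carrying on.
    "$NUTCH" generate "$DB" "$SEGMENTS" -topN 1000 || return 1
    s1=$(latest_segment "$SEGMENTS")
    [ -n "$s1" ] || return 1
    "$NUTCH" fetch "$s1" || return 1
    "$NUTCH" updatedb "$DB" "$s1" || return 1
    "$NUTCH" index "$s1" || return 1
}

# Loop only when asked to, so the functions can also be used on their own.
if [ "${1:-}" = "--loop" ]; then
    while run_cycle; do
        sleep "${PAUSE:-3600}"
    done
fi
```

Saved under a hypothetical name such as cycle.sh, it could be started with `nohup sh cycle.sh --loop &`. The loop ends as soon as one step fails, which seems safer than blindly repeating a broken round. It does not address the Tomcat restart question.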
Is there no way other than restarting Tomcat?
I would appreciate any assistance.
Best regards
Kai
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general