Franz,

Someone else will need to confirm this...

FYI... why not skip the links page and simply inject the URLs directly into Nutch?

./nutch inject db/ -urlfile seeds.txt
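
If the links page already exists, seeds.txt could be generated from it rather than typed by hand. Here is a minimal sketch in Python, assuming the page is saved locally; the file names, the base URL and the helper names (extract_seed_urls, write_seed_file) are made up for illustration and are not part of Nutch:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects absolute http(s) URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                url = urljoin(self.base_url, value)
                # Skip mailto:, javascript: etc. and drop duplicates.
                if url.startswith(("http://", "https://")) and url not in self.links:
                    self.links.append(url)


def extract_seed_urls(html, base_url):
    """Return the unique absolute links found in html, in page order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def write_seed_file(html_path, base_url, out_path):
    """Write one URL per line, the layout assumed here for seeds.txt."""
    with open(html_path, encoding="utf-8") as f:
        urls = extract_seed_urls(f.read(), base_url)
    with open(out_path, "w", encoding="utf-8") as out:
        out.write("\n".join(urls) + "\n")
```

For example, write_seed_file("links.html", "http://intranet.example/links.html", "seeds.txt") would produce a file to pass to the inject command above. Since the links page itself is never injected, it should not end up in the index, though someone who knows Nutch better should confirm that.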


At 03:49 PM 1/20/2006, you wrote:

Thank you, but if I do that, will the page still be read for URLs?
Cheers, Frank

On 1/20/06, Neal Whitley <[EMAIL PROTECTED]> wrote:
> Franz,
>
> I think you could use the regex URL filter (regex-urlfilter.txt) to keep
> this page out of the index.
>
> Something like:  -^http://([a-z0-9]*\.)*tripod\.com/
>
> I am new to Nutch so I make no guarantee... :-)
>
> Neal
>
>
>
> At 05:23 AM 1/20/2006, you wrote:
>
> >Hello,
> >
> >We are trying to implement Nutch on an intranet and have set up a
> >special page that links to all the other pages of the site, since
> >many of them are not linked to each other.
> >We will start the crawl at this special page and go from there to all
> >the other pages, but we would like to keep the special page itself out
> >of the index (so that it doesn't show up in search results) and use it
> >only for its links.
> >Is there an easy way to do this?
> >
> >Thank you.
>
>



_______________________________________________
Nutch-general mailing list
Nutch-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-general
