Thanks for your reply. I'm still a little cloudy on this, though. When
you say files, do you mean the HTML files that should provide starting
points for a crawl, or text files with URLs in them?
On 5/23/07, Brian Ulicny <[EMAIL PROTECTED]> wrote:
The input argument is the name of the directory where your seed files
are located, not the name of a file. Nutch then reads every file in
that directory and uses the URLs it finds as starting points for the
crawl.
So, when you issue
bin/nutch crawl urls -dir crawl -depth 3 -topN 50
"urls" must be the name of a directory, not the name of a file.
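Concretely, one way to set this up is a sketch like the following (the
file name "seeds.txt" and the URL are just placeholders; Nutch reads
every file in the directory, so the names inside it don't matter):

```shell
# Create a seed directory (here named "urls") containing one flat
# text file with one URL per line.
mkdir -p urls
echo "http://www.example.com/" > urls/seeds.txt

# Then point the crawl at the directory, not the file:
#   bin/nutch crawl urls -dir crawl -depth 3 -topN 50
```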
Hope that helps.
Brian Ulicny
On Wed, 23 May 2007 11:11:26 -0500, "Aaron Green" <[EMAIL PROTECTED]>
said:
> I have read through an archive message dealing with Nutch on
> Windows. It was helpful, but I'm still having problems with this
> "flat file" on Windows. How is this created? After I have configured
> everything and start it up using the commands from the tutorial, I
> get an error that basically says that my input path doesn't exist. I
> have a file named "url" in the Nutch root directory that I created
> with no extension through Notepad; it simply contains the one URL I'm
> trying to spider. Is there another way to create this flat file?
>
> --
> Aaron
--
Brian Ulicny
bulicny at alum dot mit dot edu
home: 781-721-5746
fax: 360-361-5746
--
Aaron
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general