The latter: text files with URLs in them. Of course, the URLs must also pass your URL filter to actually get crawled.
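
For what it's worth, here is a minimal sketch of the layout the crawl command expects, assuming a POSIX-style shell (for example Cygwin on Windows) started in the Nutch root directory. The file name seed.txt and the URL http://example.com/ are placeholders I made up for illustration; only the directory name "urls" has to match the argument passed to the crawl command.

```shell
# Create the seed directory; its name is what gets passed to "crawl".
mkdir -p urls

# Put one URL per line into a plain text file inside that directory.
# seed.txt and http://example.com/ are hypothetical placeholders.
printf '%s\n' 'http://example.com/' > urls/seed.txt

# Run the crawl only if we really are in a Nutch root directory.
if [ -x bin/nutch ]; then
  bin/nutch crawl urls -dir crawl -depth 3 -topN 50
fi
```

If the crawl starts but fetches nothing, the seed URL is probably being rejected by your URL filter rather than by the input path.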
Brian Ulicny

On Wed, 23 May 2007 13:52:29 -0500, "Aaron Green" <[EMAIL PROTECTED]> said:
> Thanks for your reply. I'm still a little cloudy about this though. When
> you say files, are you talking about the html files that should provide
> starting points for a crawl? Or is it text files with urls in them?
>
> On 5/23/07, Brian Ulicny <[EMAIL PROTECTED]> wrote:
> >
> > The input argument is the name of the directory where your crawl files
> > are located, not the name of the file. Then nutch examines every file
> > in that directory as starting points for the crawl.
> >
> > So, when you issue
> >
> > bin/nutch crawl urls -dir crawl -depth 3 -topN 50
> >
> > "urls" must be the name of a directory, not the name of a file.
> >
> > Hope that helps.
> >
> > Brian Ulicny
> >
> > On Wed, 23 May 2007 11:11:26 -0500, "Aaron Green" <[EMAIL PROTECTED]> said:
> > > I have read through an archive message dealing with Nutch on Windows. It
> > > was helpful, but I'm still having problems with this "flat file" on
> > > Windows. How is this created? After I have configured everything and start
> > > it up using the commands from the tutorial, I get an error that basically
> > > says that my input path doesn't exist. I have a file named url in the nutch
> > > root directory that I created with no extension through notepad that simply
> > > contains the one url I'm trying to spider. Is there another way to create
> > > this flat file?
> > >
> > > --
> > > Aaron
>
> --
> Aaron

--
Brian Ulicny
bulicny at alum dot mit dot edu
home: 781-721-5746
fax: 360-361-5746

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
