The latter: text files with URLs in them.  The URLs must pass your URL
filter to actually get crawled, of course.
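
A minimal sketch of that setup, assuming a POSIX-style shell (for
example Cygwin on Windows) started in the Nutch root directory.  The
file name "seed.txt" and the URL http://example.com/ are placeholders
chosen for illustration; only the directory name "urls" and the crawl
command come from this thread.

```shell
# Create the seed directory; its name is the first argument to "crawl".
mkdir -p urls

# Write one URL per line into a plain text file inside that directory.
# "seed.txt" and the URL are placeholders, not names Nutch requires.
printf '%s\n' 'http://example.com/' > urls/seed.txt

# Start the crawl only if the Nutch launcher exists in this directory.
if [ -x bin/nutch ]; then
  bin/nutch crawl urls -dir crawl -depth 3 -topN 50
fi
```

The point is that "urls" is a directory and the text file lives inside
it; a lone file named "url" in the Nutch root is not what the command
expects as its input path.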

Brian Ulicny

On Wed, 23 May 2007 13:52:29 -0500, "Aaron Green" <[EMAIL PROTECTED]>
said:
> Thanks for your reply.  I'm still a little cloudy about this though.
> When you say files, are you talking about the html files that should
> provide starting points for a crawl?  Or is it text files with urls in
> them?
> 
> 
> 
> On 5/23/07, Brian Ulicny <[EMAIL PROTECTED]> wrote:
> >
> > The input argument is the name of the directory where your crawl files
> > are located, not the name of the file.  Nutch then reads every file
> > in that directory for the starting points of the crawl.
> >
> > So, when you issue
> >
> > bin/nutch crawl urls -dir crawl -depth 3 -topN 50
> >
> > "urls" must be the name of a directory, not the name of a file.
> >
> > Hope that helps.
> >
> > Brian Ulicny
> >
> > On Wed, 23 May 2007 11:11:26 -0500, "Aaron Green" <[EMAIL PROTECTED]>
> > said:
> > > I have read through an archive message dealing with Nutch on Windows.
> > > It was helpful, but I'm still having problems with this "flat file" on
> > > Windows.  How is this created?  After I have configured everything and
> > > start it up using the commands from the tutorial, I get an error that
> > > basically says that my input path doesn't exist.  I have a file named
> > > url in the nutch root directory that I created with no extension
> > > through notepad that simply contains the one url I'm trying to spider.
> > > Is there another way to create this flat file?
> > >
> > > --
> > > Aaron
> > --
> >   Brian Ulicny
> >   bulicny at alum dot mit dot edu
> >   home: 781-721-5746
> >   fax: 360-361-5746
> >
> >
> >
> 
> 
> -- 
> Aaron
-- 
  Brian Ulicny
  bulicny at alum dot mit dot edu
  home: 781-721-5746
  fax: 360-361-5746



_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
