OK, apparently I'm going to need this broken down into more steps than the
web site gives.  I'm creating the urls folder, and I have a text file named
mysite.  I've tried it with and without an extension.  I get the same
message every time: that my input path doesn't exist.  It's actually the
first thing I tried when I downloaded the files, before I started going off
on tangents to find something that worked.  I think it may have something to
do with this being Windows; on Linux it's very easy to create a file with
just touch.
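
A minimal sketch of the layout the replies below describe, assuming a
Unix-style shell (for example Cygwin on Windows) started in the Nutch root
directory, the one that contains bin/nutch.  The names urls and mysite are
the ones used above; the seed URL http://www.example.com/ is only a
placeholder and not from this thread.

```shell
# Run from the Nutch root directory (assumed to be the current directory).
# Create the seed directory; -p makes this a no-op if it already exists.
mkdir -p urls

# Write one seed URL per line into a plain text file inside that directory.
# The URL is a placeholder; replace it with the site to be crawled.
printf '%s\n' 'http://www.example.com/' > urls/mysite

# Confirm that the directory and the seed file are where the crawl expects.
ls -l urls
cat urls/mysite
```

If the "input path doesn't exist" message came from a missing urls directory
or from starting in a different working directory, running the quoted
command, bin/nutch crawl urls -dir crawl -depth 3 -topN 50, from this same
directory should get past it.  That is a guess at the cause, not a confirmed
diagnosis.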



On 5/23/07, Brian Ulicny <[EMAIL PROTECTED]> wrote:

The latter: text files with urls in them.  They must be compatible with
your URL filter to actually get crawled, of course.
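
As a rough way to see what "compatible with your URL filter" means, the
sketch below checks seed URLs against a regular expression with grep.  The
pattern, the file name seeds.txt, and both URLs are hypothetical examples.
Nutch applies its own filter rules from its configuration directory, using
Java regular expressions, so this only approximates that check.

```shell
# Hypothetical accept pattern: any host under example.com over http.
pattern='^http://([a-z0-9]*\.)*example\.com/'

# Hypothetical seed list with one URL that matches and one that does not.
printf '%s\n' 'http://www.example.com/' 'http://other.org/' > seeds.txt

# Print only the seed URLs the pattern accepts.
grep -E "$pattern" seeds.txt
```

Only the first URL is printed.  A seed that the real filter rejects would be
dropped in the same way, even when the urls directory is set up correctly.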

Brian Ulicny

On Wed, 23 May 2007 13:52:29 -0500, "Aaron Green" <[EMAIL PROTECTED]>
said:
> Thanks for your reply.  I'm still a little cloudy about this though.  When
> you say files, are you talking about the html files that should provide
> starting points for a crawl?  Or is it text files with urls in them?
>
>
>
> On 5/23/07, Brian Ulicny <[EMAIL PROTECTED]> wrote:
> >
> > The input argument is the name of the directory where your crawl files
> > are located, not the name of the file.  Nutch then reads every file
> > in that directory for the starting points of the crawl.
> >
> > So, when you issue
> >
> > bin/nutch crawl urls -dir crawl -depth 3 -topN 50
> >
> > "urls" must be the name of a directory, not the name of a file.
> >
> > Hope that helps.
> >
> > Brian Ulicny
> >
> > On Wed, 23 May 2007 11:11:26 -0500, "Aaron Green" <[EMAIL PROTECTED]>
> > said:
> > > I have read through an archive message dealing with Nutch on Windows.
> > > It was helpful, but I'm still having problems with this "flat file" on
> > > Windows.  How is this created?  After I have configured everything and
> > > start it up using the commands from the tutorial, I get an error that
> > > basically says that my input path doesn't exist.  I have a file named
> > > url in the nutch root directory that I created with no extension
> > > through notepad that simply contains the one url I'm trying to spider.
> > > Is there another way to create this flat file?
> > >
> > > --
> > > Aaron
> > --
> >   Brian Ulicny
> >   bulicny at alum dot mit dot edu
> >   home: 781-721-5746
> >   fax: 360-361-5746
> >
> >
> >
>
>
> --
> Aaron





--
Aaron
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general