OK, apparently I need this broken down into more steps than the web
site gives. I'm creating the urls folder, and I have a text file named
mysite. I've tried it with and without an extension. I get the same
message every time: that my input path doesn't exist. This was actually
the first thing I tried when I downloaded the files, before I started
going off on tangents to find something that worked. I think it may
have something to do with the fact that this is Windows, whereas on
Linux it's very easy to just touch a file and create it.
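
For reference, here is a minimal sketch of the layout Brian describes
further down in this thread: a directory named urls that holds a plain
text seed file with one URL per line. It assumes a Cygwin (or other
Unix-style) shell opened in the Nutch root directory. The file name
mysite is taken from the message above, and http://example.com/ is only
a placeholder for the real site.

```shell
# Create the seed directory (assumed to sit in the Nutch root directory).
mkdir -p urls

# Write one URL per line into a plain text file inside that directory.
# http://example.com/ is a placeholder, not the real site.
printf '%s\n' 'http://example.com/' > urls/mysite

# Confirm that urls is a directory and that the seed file is inside it.
ls -l urls
cat urls/mysite
```

With that in place, the command quoted in the thread
(bin/nutch crawl urls -dir crawl -depth 3 -topN 50) is given the
directory name urls, not the file name, and is run from the directory
that contains urls. Going by Brian's reply, the file's name and
extension should not matter, since every file in the directory is read.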
On 5/23/07, Brian Ulicny <[EMAIL PROTECTED]> wrote:
The latter: text files with urls in them. They must be compatible with
your URL filter to actually get crawled, of course.
Brian Ulicny
On Wed, 23 May 2007 13:52:29 -0500, "Aaron Green" <[EMAIL PROTECTED]>
said:
> Thanks for your reply. I'm still a little cloudy about this though.
> When you say files, are you talking about the html files that should
> provide starting points for a crawl? Or is it text files with urls in
> them?
>
>
>
> On 5/23/07, Brian Ulicny <[EMAIL PROTECTED]> wrote:
> >
> > The input argument is the name of the directory where your crawl files
> > are located, not the name of the file. Nutch then examines every file
> > in that directory for starting points for the crawl.
> >
> > So, when you issue
> >
> > bin/nutch crawl urls -dir crawl -depth 3 -topN 50
> >
> > "urls" must be the name of a directory, not the name of a file.
> >
> > Hope that helps.
> >
> > Brian Ulicny
> >
> > On Wed, 23 May 2007 11:11:26 -0500, "Aaron Green" <[EMAIL PROTECTED]>
> > said:
> > > I have read through an archive message dealing with Nutch on
> > > Windows. It was helpful, but I'm still having problems with this
> > > "flat file" on Windows. How is this created? After I have configured
> > > everything and start it up using the commands from the tutorial, I
> > > get an error that basically says that my input path doesn't exist.
> > > I have a file named url in the nutch root directory that I created
> > > with no extension through notepad that simply contains the one url
> > > I'm trying to spider. Is there another way to create this flat file?
> > >
> > > --
> > > Aaron
> > --
> > Brian Ulicny
> > bulicny at alum dot mit dot edu
> > home: 781-721-5746
> > fax: 360-361-5746
> >
> >
> >
>
>
> --
> Aaron
--
Brian Ulicny
bulicny at alum dot mit dot edu
home: 781-721-5746
fax: 360-361-5746
--
Aaron
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general