Oh and as for the web interface, take a look at the wiki page:

http://wiki.apache.org/nutch/NutchTutorial

The bottom of the page has a section on searching.
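
On the agent error from my earlier reply (quoted below), here is a rough sketch of what a filled-in property in nutch-site.xml might look like. The value "mycrawler" is only a placeholder I made up for illustration; pick a single word related to your own organization, as the property description asks. The other http.agent.* properties from your file can be filled in the same way.

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
  <name>http.agent.name</name>
  <!-- "mycrawler" is a hypothetical placeholder; the point is that
       the value element must not be left empty. -->
  <value>mycrawler</value>
</property>
</configuration>
```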

On 6/15/07, Briggs <[EMAIL PROTECTED]> wrote:
> Yeah, you still don't have the agent configured.  All your values for
> the agent are blank (each <value></value> needs a value).  So, you need
> to at least configure an agent name.
>
>
>
> On 6/15/07, karan thakral <[EMAIL PROTECTED]> wrote:
> > I'm running the crawl under Cygwin on Windows,
> >
> > but the crawl output is not correct.
> >
> > During the fetch it says: fetch: the document could not be fetched java runtime
> > exception  agent not configured
> >
> > My nutch-site.xml is as follows:
> >
> > <?xml version="1.0"?>
> > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> >
> > <!-- Put site-specific property overrides in this file. -->
> >
> > <configuration>
> > <property>
> >   <name>http.agent.name</name>
> >   <value></value>
> >   <description>HTTP 'User-Agent' request header. MUST NOT be empty -
> >   please set this to a single word uniquely related to your organization.
> >
> >   NOTE: You should also check other related properties:
> >
> >   http.robots.agents
> >   http.agent.description
> >   http.agent.url
> >   http.agent.email
> >   http.agent.version
> >
> >   and set their values appropriately.
> >
> >   </description>
> > </property>
> >
> > <property>
> >   <name>http.agent.description</name>
> >   <value></value>
> >   <description>Further description of our bot- this text is used in
> >   the User-Agent header.  It appears in parenthesis after the agent name.
> >   </description>
> > </property>
> >
> > <property>
> >   <name>http.agent.url</name>
> >   <value></value>
> >   <description>A URL to advertise in the User-Agent header.  This will
> >    appear in parenthesis after the agent name. Custom dictates that this
> >    should be a URL of a page explaining the purpose and behavior of this
> >    crawler.
> >   </description>
> > </property>
> >
> > <property>
> >   <name>http.agent.email</name>
> >   <value></value>
> >   <description>An email address to advertise in the HTTP 'From' request
> >    header and User-Agent header. A good practice is to mangle this
> >    address (e.g. 'info at example dot com') to avoid spamming.
> >   </description>
> > </property>
> > </configuration>
> >
> >   But there is still an error.
> >
> > Also, please shed some light on searching through the web
> > interface once the crawl has completed successfully.
> > --
> > With Regards
> > Karan Thakral
> >
>
>
> --
> "Conscious decisions by conscious minds are what make reality real"
>


-- 
"Conscious decisions by conscious minds are what make reality real"

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
