Oh and as for the web interface, take a look at the wiki page: http://wiki.apache.org/nutch/NutchTutorial
The bottom of the page has a section on searching.

On 6/15/07, Briggs <[EMAIL PROTECTED]> wrote:
> Yeah, you still don't have the agent configured. All your values for
> the agent (the <value></value> needs a value) are blank. So, you need
> to at least configure an agent name.
>
> On 6/15/07, karan thakral <[EMAIL PROTECTED]> wrote:
> > I'm running crawl under Cygwin on Windows,
> > but the crawl output is not right.
> >
> > During fetch it says: fetch: the document could not be fetched java runtime
> > exception agent not configured
> >
> > My nutch-site.xml is as follows:
> >
> > <?xml version="1.0"?>
> > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> >
> > <!-- Put site-specific property overrides in this file. -->
> >
> > <configuration>
> > <property>
> >   <name>http.agent.name</name>
> >   <value></value>
> >   <description>HTTP 'User-Agent' request header. MUST NOT be empty -
> >   please set this to a single word uniquely related to your organization.
> >
> >   NOTE: You should also check other related properties:
> >
> >     http.robots.agents
> >     http.agent.description
> >     http.agent.url
> >     http.agent.email
> >     http.agent.version
> >
> >   and set their values appropriately.
> >
> >   </description>
> > </property>
> >
> > <property>
> >   <name>http.agent.description</name>
> >   <value></value>
> >   <description>Further description of our bot- this text is used in
> >   the User-Agent header. It appears in parenthesis after the agent name.
> >   </description>
> > </property>
> >
> > <property>
> >   <name>http.agent.url</name>
> >   <value></value>
> >   <description>A URL to advertise in the User-Agent header. This will
> >   appear in parenthesis after the agent name. Custom dictates that this
> >   should be a URL of a page explaining the purpose and behavior of this
> >   crawler.
> >   </description>
> > </property>
> >
> > <property>
> >   <name>http.agent.email</name>
> >   <value></value>
> >   <description>An email address to advertise in the HTTP 'From' request
> >   header and User-Agent header. A good practice is to mangle this
> >   address (e.g. 'info at example dot com') to avoid spamming.
> >   </description>
> > </property>
> > </configuration>
> >
> > but there is still an error.
> >
> > Also, please shed some light on searching through the web
> > interface once the crawl has completed successfully.
> >
> > --
> > With Regards
> > Karan Thakral
>
> --
> "Conscious decisions by conscious minds are what make reality real"

--
"Conscious decisions by conscious minds are what make reality real"

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
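
For reference, the fix described in the thread (giving the empty <value></value> elements actual values) might look roughly like the fragment below. This is only a sketch: every value shown (MyNutchCrawler, the example.com URL, the mangled email address) is a made-up placeholder, not something taken from the thread, and should be replaced with details for your own organization. According to the description quoted above, only http.agent.name must be non-empty; the other three are included for completeness.

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->
<!-- All values below are hypothetical placeholders. -->

<configuration>
<property>
  <name>http.agent.name</name>
  <!-- Must not be empty: a single word related to your organization. -->
  <value>MyNutchCrawler</value>
</property>

<property>
  <name>http.agent.description</name>
  <value>Test crawler for an internal search prototype</value>
</property>

<property>
  <name>http.agent.url</name>
  <!-- A page explaining the purpose and behavior of the crawler. -->
  <value>http://www.example.com/crawler.html</value>
</property>

<property>
  <name>http.agent.email</name>
  <!-- Mangled, as the property description suggests. -->
  <value>crawler at example dot com</value>
</property>
</configuration>
```

The <description> elements are left out here only to keep the fragment short; keeping them in the file should do no harm.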
