OK, try this:

As you can see, the two filter files have the same entries. I don't know
exactly why there have to be two when one would be enough, but this keeps me
from crawling the parent dir as well.
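Why the order of the entries matters: the header of crawl-urlfilter.txt (quoted further down) says the first matching pattern decides and an unmatched URL is ignored. Below is a minimal stand-alone Java model of that rule. It is not Nutch's RegexURLFilter, and the class name FirstMatchFilter is hypothetical. It assumes patterns are applied with Matcher.find(), which fits the explicit ^ anchors and the catch-all -. used in these files. It shows that a broad +^(file|smb): line placed before +^file:///C:/Policies/ lets the parent directory through, while a list with only the folder-specific line before -. rejects it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-alone model of a "+/-" regex URL filter file.
// Rule semantics follow the crawl-urlfilter.txt header: the first pattern
// that matches decides, and a URL that matches nothing is ignored.
public class FirstMatchFilter {
    private static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<Rule>();

    // Each non-blank, non-comment line must start with '+' (accept) or '-' (reject).
    public FirstMatchFilter(String... lines) {
        for (String line : lines) {
            String t = line.trim();
            if (t.isEmpty() || t.startsWith("#")) {
                continue;
            }
            char sign = t.charAt(0);
            if (sign != '+' && sign != '-') {
                // e.g. a comment that got wrapped onto a second line
                throw new IllegalArgumentException("rule must start with + or -: " + t);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(t.substring(1))));
        }
    }

    public boolean accepts(String url) {
        for (Rule r : rules) {
            // find() is unanchored, hence the explicit ^ in the rules.
            if (r.pattern.matcher(url).find()) {
                return r.accept; // first match wins
            }
        }
        return false; // nothing matched: the URL is ignored
    }

    public static void main(String[] args) {
        // Broad rule first: the folder-specific rule below it is never consulted.
        FirstMatchFilter broad = new FirstMatchFilter(
                "+^(file|smb):", "+^file:///C:/Policies/", "-.");
        // Folder-specific rule only: the parent falls through to the catch-all "-.".
        FirstMatchFilter narrow = new FirstMatchFilter(
                "+^file:///C:/Policies/", "-.");

        String parent = "file:///C:/";
        System.out.println("broad accepts parent:  " + broad.accepts(parent));
        System.out.println("narrow accepts parent: " + narrow.accepts(parent));
        System.out.println("narrow accepts child:  "
                + narrow.accepts("file:///C:/Policies/hr.doc"));
    }
}
```

Running main prints true for the broad list and false, then true, for the narrow one, so only URLs under C:/Policies/ survive the narrow list.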

Check the nutch-site.xml: if I put .* there, it isn't working in my case, so I
have to list the plugins I really need.
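nutch-default.xml describes plugin.includes as a regular expression naming the plugin directories to include. Assuming the whole plugin id has to match, a value can be checked before starting a crawl; the class name PluginIncludesCheck below is hypothetical and this is not Nutch's own code. Under that assumption, the value in the nutch-site.xml quoted further down does not match urlfilter-regex, the plugin that reads the regex URL filter files, and the "Registered Plugins" list in the quoted log shows no URL filter plugin either.

```java
import java.util.regex.Pattern;

// Sketch only: treats plugin.includes as a regular expression that must match
// a whole plugin id (its directory name under plugins/). This is an assumption
// based on the nutch-default.xml description, not Nutch's own code.
public class PluginIncludesCheck {
    // Value copied from the nutch-site.xml quoted in this thread.
    static final String INCLUDES = "protocol-file|protocol-smb|scoring-opic"
            + "|parse-(msexcel|mspowerpoint|msword|xml|text|html|pdf)"
            + "|index-basic|query-(basic|site|url)";

    static boolean included(String pluginId) {
        // Pattern.matches requires the entire id to match the expression.
        return Pattern.matches(INCLUDES, pluginId);
    }

    public static void main(String[] args) {
        String[] ids = {"protocol-smb", "parse-pdf", "urlfilter-regex", "parse-js"};
        for (String id : ids) {
            System.out.println(id + " -> " + included(id));
        }
    }
}
```

With that value it prints true for protocol-smb and parse-pdf but false for urlfilter-regex and parse-js, so a list meant to keep the URL filters active would presumably need urlfilter-regex among its alternatives.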

Also check out my new SMB protocol plugin.


As for the -Djava stuff:

Copy the jcifs jar to
C:\Program Files\Java_jdk1.6.0_01\jre\lib\ext (in my case).
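Why the jar has to be visible to the JVM: the java.net.URL documentation describes java.protocol.handler.pkgs as a '|'-separated list of package prefixes, and for an smb: URL it looks for a class named <prefix>.smb.Handler. With the value jcifs, that means jcifs.smb.Handler from the jcifs jar must be loadable. The sketch below (class and method names are hypothetical) builds those candidate names and checks whether they can be loaded, a quick way to confirm the jar ended up somewhere the JVM can see it.

```java
// Sketch of the lookup rule documented for java.net.URL: for each package
// prefix in java.protocol.handler.pkgs (separated by '|'), the JVM tries to
// load a class named <prefix>.<protocol>.Handler.
public class HandlerLookup {
    static String[] candidateClasses(String pkgs, String protocol) {
        String[] prefixes = pkgs.split("\\|");
        String[] names = new String[prefixes.length];
        for (int i = 0; i < prefixes.length; i++) {
            names[i] = prefixes[i].trim() + "." + protocol + ".Handler";
        }
        return names;
    }

    // True if the class can be found by the system class loader, which also
    // delegates to the extension directory (jre/lib/ext) on older JDKs.
    static boolean loadable(String className) {
        try {
            Class.forName(className, false, ClassLoader.getSystemClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String name : candidateClasses("jcifs", "smb")) {
            System.out.println(name + " loadable: " + loadable(name));
        }
    }
}
```

If the jcifs jar is in jre/lib/ext or on the classpath, main should report jcifs.smb.Handler as loadable; otherwise it prints false.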

Add the following to the main method of Crawl.java:

  /* Perform complete crawling and indexing given a set of root urls. */
  public static void main(String args[]) throws Exception {
    System.setProperty("java.protocol.handler.pkgs", "jcifs");  // new
    LOG.info("SMB Info: "
        + System.getProperty("java.protocol.handler.pkgs"));    // new
    LOG.info("SMB Info: " + new java.util.PropertyPermission(
        "java.protocol.handler.pkgs", "read,write").toString()); // new
    if (args.length < 1) {
    // ...and so on...

Then you don't need to set the -Djava... property before starting the app.
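The fetch failure in the quoted logs, java.net.MalformedURLException: unknown protocol: smb, is what java.net.URL throws when it cannot find a handler for the scheme, which is what setting java.protocol.handler.pkgs to jcifs is meant to fix. The sketch below reproduces that error with the plain JDK and then parses the same URL by passing a handler straight to the three-argument URL constructor. StubSmbHandler is a hypothetical placeholder that can only parse smb:// URLs, not open them; in a real setup the handler comes from jcifs.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;

public class SmbUrlDemo {
    // Hypothetical stand-in for the jcifs handler: it lets java.net.URL parse
    // smb:// strings, but cannot open a connection.
    static class StubSmbHandler extends URLStreamHandler {
        @Override
        protected URLConnection openConnection(URL u) throws IOException {
            throw new IOException("stub handler: no SMB support in this sketch");
        }
    }

    // Returns the exception message if no handler is registered for the
    // scheme, or null if the URL parsed (e.g. jcifs is registered).
    static String tryParseWithoutHandler(String spec) {
        try {
            new URL(spec);
            return null;
        } catch (MalformedURLException e) {
            return e.getMessage();
        }
    }

    // Parses the URL with an explicitly supplied handler, bypassing the
    // java.protocol.handler.pkgs lookup.
    static URL parseWithHandler(String spec) {
        try {
            return new URL(null, spec, new StubSmbHandler());
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("cannot parse " + spec, e);
        }
    }

    public static void main(String[] args) {
        String spec = "smb://sql1/Sales/DATA/";
        System.out.println("without handler: " + tryParseWithoutHandler(spec));
        URL u = parseWithHandler(spec);
        System.out.println("host=" + u.getHost() + " path=" + u.getPath());
    }
}
```

On a JVM without jcifs registered, the first line reports "unknown protocol: smb", matching the log; with a handler available, the same string parses into host sql1 and path /Sales/DATA/.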


Good luck!




http://www.nabble.com/file/p11047384/protocol-smb.zip protocol-smb.zip 

http://www.nabble.com/file/p11047384/regex-urlfilter.txt regex-urlfilter.txt 
http://www.nabble.com/file/p11047384/crawl-urlfilter.txt crawl-urlfilter.txt 
http://www.nabble.com/file/p11047384/nutch-site.xml nutch-site.xml 

opoole wrote:
> 
> Hi Vadim,
> 
> To be honest, I am somewhat behind you: my problem is that I cannot get
> the SMB protocol set up. I am unable to get the -Djava bit to do anything;
> I am using Cygwin and entering the command from within sun\java etc.
> 
> As for crawl speed, I'd love to get that far.
> 
> Also, I noticed that you were crawling from the root of C:\, whereas I want
> to crawl a specific folder, and that is where the parent directory issue
> crops up: I cannot get it to stop crawling the parent. One thing I noticed
> is that I did not have a URLFILTER entry in my nutch-config.xml, and that
> makes a difference: if I try to set it up as in the tutorial, it won't
> crawl a thing??!!
> 
> Sorry I cannot be of help, but I feel somewhat behind you in terms of
> Nutch dev. I am thinking of trying Nutch 0.8 instead of 0.9, as there is
> more documentation for it, although I have read that it is slow (half the
> crawl speed of 0.9). Are you using 0.8?
> 
> Regards,
> 
> Oli
> 
> 
> Vadim B wrote:
>> 
>> Could you solve the problem? 
>> 
>> I get a transfer speed of about 800 kb/s, which is not fast enough to use
>> it in a production environment. What about you?
>> 
>> 
>> 
>> opoole wrote:
>>> 
>>> Sorry Vadim,
>>> 
>>> I did not realise you had sent me the email [Doh!].
>>> 
>>> 
>>> Vadim B wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> I am working on the same issue as you. So far I could crawl
>>>> file:///C:/* but I am stuck on the SMB part. It looks to me like this
>>>> plugin isn't working properly, so it needs to be fixed for the newer
>>>> version of Nutch.
>>>> 
>>>> The error I get differs a bit from yours; it is:
>>>> 
>>>> 2007-05-25 18:06:29,573 INFO  fetcher.Fetcher - fetching smb://mobidick/test/
>>>> 2007-05-25 18:06:29,573 INFO  fetcher.Fetcher - fetch of smb://mobidick/test/ failed with: org.apache.nutch.protocol.ProtocolNotFound: protocol not found for url=smb
>>>> 
>>>> I will dive into plugin-smb and try to narrow down the problem. Maybe
>>>> we can work together to get a quick solution.
>>>> 
>>>> 
>>>> 
>>>> ---SNIP---
>>>> 
>>>> # accept hosts in MY.DOMAIN.NAME
>>>> # Standard +^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/
>>>> +^file:///C:/Policies/ <<-- Why did you put this here? It doesn't make
>>>> sense, because the +^(file|smb) line above already matches, so this line
>>>> will never be reached.
>>>> ---SNIP ---
>>>> 
>>>> ---SNIP ---
>>>> 2007-05-24 14:04:22,000 WARN  crawl.PartitionUrlByHost - Malformed URL: 'smb://sql1/Sales/DATA/'
>>>> // Did you quote the URL yourself, or is it displayed like this in the
>>>> logs? I don't get this error.
>>>> ---SNIP ---
>>>> 
>>>> Try this in the class org.apache.nutch.crawl.Crawl:
>>>> 
>>>>   public static void main(String args[]) throws Exception {
>>>>     System.setProperty("java.protocol.handler.pkgs", "jcifs"); // new
>>>>     LOG.info("SMB Info: "
>>>>         + System.getProperty("java.protocol.handler.pkgs")); // new
>>>>     LOG.info("SMB Info: " + new java.util.PropertyPermission(
>>>>         "java.protocol.handler.pkgs", "read,write").toString()); // new
>>>>     if (args.length < 1) {
>>>>       System.out.println(
>>>>           "Usage: Crawl <urlDir> [-dir d] [-threads n] [-depth i] [-topN N]");
>>>>       return;
>>>>     }
>>>> ---SNIP---
>>>> 
>>>> Check this out:
>>>> http://java.sun.com/developer/onlineTraining/protocolhandlers/
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> opoole wrote:
>>>>> 
>>>>> Hi all, I hope you can help, as I am becoming rather depressed with
>>>>> Nutch on Windows.
>>>>> 
>>>>> Using: Windows XP Pro SP2 - Nutch-0.9 - Cygwin [current version from
>>>>> cygwin site] - Java JDK 1.6.0 - Java Platform Standard Edition 1.6.0
>>>>> 
>>>>> I cannot stop Nutch from crawling parent directories. I have looked at
>>>>> other threads, and none of the suggestions seem to work.
>>>>> 
>>>>> I have tried to include protocol-smb [jcifs] but Cygwin keeps
>>>>> prompting for Java syntax corrections.
>>>>> 
>>>>> Below I have listed my configurations along with the command I type in
>>>>> cygwin for jcifs:
>>>>> 
>>>>> CRAWL-URLFILTER
>>>>> # The url filter file used by the crawl command.
>>>>> 
>>>>> # Better for intranet crawling.
>>>>> # Be sure to change MY.DOMAIN.NAME to your domain name.
>>>>> 
>>>>> # Each non-comment, non-blank line contains a regular expression
>>>>> # prefixed by '+' or '-'.  The first matching pattern in the file
>>>>> # determines whether a URL is included or ignored.  If no pattern
>>>>> # matches, the URL is ignored.
>>>>> 
>>>>> # skip http:, ftp:, & mailto: urls
>>>>> -^(http|ftp|mailto):
>>>>> +^(file|smb):
>>>>> 
>>>>> # skip image and other suffixes we can't yet parse
>>>>> -\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP)$
>>>>> 
>>>>> # skip URLs containing certain characters as probable queries, etc.
>>>>> 
>>>>> # skip URLs with slash-delimited segment that repeats 3+ times, to break loops
>>>>> -.*(/[^/]+)/[^/]+\1/[^/]+\1/
>>>>> 
>>>>> # accept hosts in MY.DOMAIN.NAME
>>>>> # Standard +^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/
>>>>> +^file:///C:/Policies/ <<-- Why did you put this here? It doesn't make
>>>>> sense, because the +^(file|smb) line already matches!
>>>>> 
>>>>> # skip everything else
>>>>> -.
>>>>> 
>>>>> NUTCH-SITE
>>>>> 
>>>>> <?xml version="1.0"?>
>>>>> <?xml-stylesheet type="text/xsl" href="nutch-conf.xsl"?>
>>>>> <!-- Put site-specific property overrides in this file. -->
>>>>> 
>>>>> <nutch-conf>
>>>>> 
>>>>> <property>
>>>>>  <name>http.agent.name</name>
>>>>>  <value>pascall</value>
>>>>>  <description></description>
>>>>> </property>
>>>>> 
>>>>> <property>
>>>>>   <name>file.content.limit</name>
>>>>>   <value>-1</value>
>>>>>   <description>The length limit for downloaded content, in bytes.
>>>>>   If this value is nonnegative (>=0), content longer than it will be
>>>>>   truncated; otherwise, no truncation at all.
>>>>>   </description>
>>>>> </property>
>>>>> 
>>>>> <property>
>>>>>   <name>file.crawl.parent</name>
>>>>>   <value>false</value>
>>>>>   <description>The crawler is not restricted to the directories that
>>>>>     you specified in the Urls file but is jumping into the parent
>>>>>     directories as well. For your own crawls you can change this
>>>>>     behavior (set to false) so that only directories beneath the
>>>>>     directories that you specify get crawled.</description>
>>>>> </property>
>>>>> 
>>>>> <property>
>>>>> <name>plugin.includes</name> 
>>>>> <value>protocol-file|protocol-smb|scoring-opic|parse-(msexcel|mspowerpoint|msword|xml|text|html|pdf)|index-basic|query-(basic|site|url)</value>
>>>>> </property> 
>>>>> 
>>>>> </nutch-conf>
>>>>> 
>>>>> CYGWIN
>>>>> 
>>>>> Using Cygwin, I enter the command from C:\Sun\Java\jdk160\bin\
>>>>> 
>>>>> java -Djava.protocol.handler.pkgs=jcifs
>>>>> 
>>>>> When I press Return, the Cygwin shell displays a list of java commands,
>>>>> as though I am using incorrect syntax.
>>>>> 
>>>>> Dump of Crawl from Cygwin:
>>>>> 
>>>>> 2007-05-24 14:04:16,140 FATAL conf.Configuration - bad conf file: top-level element not <configuration>
>>>>> 2007-05-24 14:04:16,171 INFO  crawl.Crawl - crawl started in: crawl
>>>>> 2007-05-24 14:04:16,171 INFO  crawl.Crawl - rootUrlDir = urls.txt
>>>>> 2007-05-24 14:04:16,171 INFO  crawl.Crawl - threads = 10
>>>>> 2007-05-24 14:04:16,171 INFO  crawl.Crawl - depth = 5
>>>>> 2007-05-24 14:04:16,281 FATAL conf.Configuration - bad conf file: top-level element not <configuration>
>>>>> 2007-05-24 14:04:16,281 INFO  crawl.Injector - Injector: starting
>>>>> 2007-05-24 14:04:16,281 INFO  crawl.Injector - Injector: crawlDb: crawl/crawldb
>>>>> 2007-05-24 14:04:16,296 INFO  crawl.Injector - Injector: urlDir: urls.txt
>>>>> 2007-05-24 14:04:16,296 INFO  crawl.Injector - Injector: Converting injected urls to crawl db entries.
>>>>> 2007-05-24 14:04:16,328 FATAL conf.Configuration - bad conf file: top-level element not <configuration>
>>>>> 2007-05-24 14:04:16,843 FATAL conf.Configuration - bad conf file: top-level element not <configuration>
>>>>> 2007-05-24 14:04:16,953 INFO  plugin.PluginRepository - Plugins: looking in: C:\nutch-0.9\plugins
>>>>> 2007-05-24 14:04:17,156 INFO  plugin.PluginRepository - Plugin Auto-activation mode: [true]
>>>>> 2007-05-24 14:04:17,156 INFO  plugin.PluginRepository - Registered Plugins:
>>>>> 2007-05-24 14:04:17,156 INFO  plugin.PluginRepository -   the nutch core extension points (nutch-extensionpoints)
>>>>> 2007-05-24 14:04:17,156 INFO  plugin.PluginRepository -   MSPowerPoint Parse Plug-in (parse-mspowerpoint)
>>>>> 2007-05-24 14:04:17,156 INFO  plugin.PluginRepository -   Basic Query Filter (query-basic)
>>>>> 2007-05-24 14:04:17,156 INFO  plugin.PluginRepository -   Basic Indexing Filter (index-basic)
>>>>> 2007-05-24 14:04:17,156 INFO  plugin.PluginRepository -   Html Parse Plug-in (parse-html)
>>>>> 2007-05-24 14:04:17,156 INFO  plugin.PluginRepository -   Pdf Parse Plug-in (parse-pdf)
>>>>> 2007-05-24 14:04:17,156 INFO  plugin.PluginRepository -   Site Query Filter (query-site)
>>>>> 2007-05-24 14:04:17,156 INFO  plugin.PluginRepository -   Jakarta POI - Java API To Access Microsoft Format Files (lib-jakarta-poi)
>>>>> 2007-05-24 14:04:17,156 INFO  plugin.PluginRepository -   Text Parse Plug-in (parse-text)
>>>>> 2007-05-24 14:04:17,156 INFO  plugin.PluginRepository -   MSWord Parse Plug-in (parse-msword)
>>>>> 2007-05-24 14:04:17,156 INFO  plugin.PluginRepository -   SMB Protocol Plug-in (protocol-smb)
>>>>> 2007-05-24 14:04:17,156 INFO  plugin.PluginRepository -   MSExcel Parse Plug-in (parse-msexcel)
>>>>> 2007-05-24 14:04:17,156 INFO  plugin.PluginRepository -   OPIC Scoring Plug-in (scoring-opic)
>>>>> 2007-05-24 14:04:17,156 INFO  plugin.PluginRepository -   CyberNeko HTML Parser (lib-nekohtml)
>>>>> 2007-05-24 14:04:17,156 INFO  plugin.PluginRepository -   Log4j (lib-log4j)
>>>>> 2007-05-24 14:04:17,156 INFO  plugin.PluginRepository -   File Protocol Plug-in (protocol-file)
>>>>> 2007-05-24 14:04:17,156 INFO  plugin.PluginRepository -   URL Query Filter (query-url)
>>>>> 2007-05-24 14:04:17,156 INFO  plugin.PluginRepository -   Parse MS Documents Framework (lib-parsems)
>>>>> 2007-05-24 14:04:17,156 INFO  plugin.PluginRepository - Registered Extension-Points:
>>>>> 2007-05-24 14:04:17,156 INFO  plugin.PluginRepository -   Nutch Summarizer (org.apache.nutch.searcher.Summarizer)
>>>>> 2007-05-24 14:04:17,156 INFO  plugin.PluginRepository -   Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer)
>>>>> 2007-05-24 14:04:17,156 INFO  plugin.PluginRepository -   Nutch Protocol (org.apache.nutch.protocol.Protocol)
>>>>> 2007-05-24 14:04:17,156 INFO  plugin.PluginRepository -   Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer)
>>>>> 2007-05-24 14:04:17,156 INFO  plugin.PluginRepository -   Nutch URL Filter (org.apache.nutch.net.URLFilter)
>>>>> 2007-05-24 14:04:17,156 INFO  plugin.PluginRepository -   Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
>>>>> 2007-05-24 14:04:17,156 INFO  plugin.PluginRepository -   Nutch Online Search Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
>>>>> 2007-05-24 14:04:17,156 INFO  plugin.PluginRepository -   HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
>>>>> 2007-05-24 14:04:17,156 INFO  plugin.PluginRepository -   Nutch Content Parser (org.apache.nutch.parse.Parser)
>>>>> 2007-05-24 14:04:17,156 INFO  plugin.PluginRepository -   Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
>>>>> 2007-05-24 14:04:17,156 INFO  plugin.PluginRepository -   Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
>>>>> 2007-05-24 14:04:17,156 INFO  plugin.PluginRepository -   Ontology Model Loader (org.apache.nutch.ontology.Ontology)
>>>>> 2007-05-24 14:04:17,875 INFO  crawl.Injector - Injector: Merging injected urls into crawl db.
>>>>> 2007-05-24 14:04:17,906 FATAL conf.Configuration - bad conf file: top-level element not <configuration>
>>>>> 2007-05-24 14:04:18,156 FATAL conf.Configuration - bad conf file: top-level element not <configuration>
>>>>> 2007-05-24 14:04:18,375 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>>>> 2007-05-24 14:04:19,281 INFO  crawl.Injector - Injector: done
>>>>> 2007-05-24 14:04:20,281 INFO  crawl.Generator - Generator: Selecting best-scoring urls due for fetch.
>>>>> 2007-05-24 14:04:20,281 INFO  crawl.Generator - Generator: starting
>>>>> 2007-05-24 14:04:20,281 INFO  crawl.Generator - Generator: segment: crawl/segments/20070524140420
>>>>> 2007-05-24 14:04:20,281 INFO  crawl.Generator - Generator: filtering: false
>>>>> 2007-05-24 14:04:20,281 INFO  crawl.Generator - Generator: topN: 2147483647
>>>>> 2007-05-24 14:04:20,312 FATAL conf.Configuration - bad conf file: top-level element not <configuration>
>>>>> 2007-05-24 14:04:20,312 INFO  crawl.Generator - Generator: jobtracker is 'local', generating exactly one partition.
>>>>> 2007-05-24 14:04:20,562 FATAL conf.Configuration - bad conf file: top-level element not <configuration>
>>>>> [...]
>>>>> 2007-05-24 14:04:20,796 WARN  crawl.PartitionUrlByHost - Malformed URL: 'smb://sql1/Sales/DATA/'
>>>>> [...]
>>>>> 2007-05-24 14:04:21,578 INFO  crawl.Generator - Generator: Partitioning selected urls by host, for politeness.
>>>>> 2007-05-24 14:04:21,593 FATAL conf.Configuration - bad conf file: top-level element not <configuration>
>>>>> 2007-05-24 14:04:21,828 FATAL conf.Configuration - bad conf file: top-level element not <configuration>
>>>>> [...]
>>>>> 2007-05-24 14:04:22,000 WARN  crawl.PartitionUrlByHost - Malformed URL: 'smb://sql1/Sales/DATA/'
>>>>> 2007-05-24 14:04:22,843 INFO  crawl.Generator - Generator: done.
>>>>> 2007-05-24 14:04:22,843 INFO  fetcher.Fetcher - Fetcher: starting
>>>>> 2007-05-24 14:04:22,843 INFO  fetcher.Fetcher - Fetcher: segment: crawl/segments/20070524140420
>>>>> 2007-05-24 14:04:22,859 FATAL conf.Configuration - bad conf file: top-level element not <configuration>
>>>>> 2007-05-24 14:04:23,156 FATAL conf.Configuration - bad conf file: top-level element not <configuration>
>>>>> 2007-05-24 14:04:23,187 INFO  fetcher.Fetcher - Fetcher: threads: 10
>>>>> [...]
>>>>> 2007-05-24 14:04:23,390 INFO  fetcher.Fetcher - fetching smb://sql1/Sales/DATA/
>>>>> 2007-05-24 14:04:23,390 INFO  fetcher.Fetcher - fetch of smb://sql1/Sales/DATA/ failed with: org.apache.nutch.protocol.ProtocolNotFound: java.net.MalformedURLException: unknown protocol: smb
>>>>> 2007-05-24 14:04:23,500 INFO  fetcher.Fetcher - fetching file:///C:/Policies/
>>>>> 2007-05-24 14:04:23,718 INFO  crawl.SignatureFactory - Using Signature impl: org.apache.nutch.crawl.MD5Signature
>>>>> 2007-05-24 14:04:24,671 INFO  plugin.PluginRepository - Plugins:
>>>>> looking in: C:\nutch-0.9\plugins
>>>>> 2007-05-24 14:04:24,812 INFO  plugin.PluginRepository - Plugin
>>>>> Auto-activation mode: [true]
>>>>> 2007-05-24 14:04:24,812 INFO  plugin.PluginRepository - Registered
>>>>> Plugins:
>>>>> 2007-05-24 14:04:24,812 INFO  plugin.PluginRepository -   the nutch
>>>>> core extension points (nutch-extensionpoints)
>>>>> 2007-05-24 14:04:24,812 INFO  plugin.PluginRepository -   MSPowerPoint
>>>>> Parse Plug-in (parse-mspowerpoint)
>>>>> 2007-05-24 14:04:24,812 INFO  plugin.PluginRepository -   Basic Query
>>>>> Filter (query-basic)
>>>>> 2007-05-24 14:04:24,812 INFO  plugin.PluginRepository -   Basic
>>>>> Indexing Filter (index-basic)
>>>>> 2007-05-24 14:04:24,812 INFO  plugin.PluginRepository -   Html Parse
>>>>> Plug-in (parse-html)
>>>>> 2007-05-24 14:04:24,812 INFO  plugin.PluginRepository -   Pdf Parse
>>>>> Plug-in (parse-pdf)
>>>>> 2007-05-24 14:04:24,812 INFO  plugin.PluginRepository -   Site Query
>>>>> Filter (query-site)
>>>>> 2007-05-24 14:04:24,812 INFO  plugin.PluginRepository -   Jakarta POI -
>>>>> Java API To Access Microsoft Format Files (lib-jakarta-poi)
>>>>> 2007-05-24 14:04:24,812 INFO  plugin.PluginRepository -   Text Parse
>>>>> Plug-in (parse-text)
>>>>> 2007-05-24 14:04:24,812 INFO  plugin.PluginRepository -   MSWord Parse
>>>>> Plug-in (parse-msword)
>>>>> 2007-05-24 14:04:24,812 INFO  plugin.PluginRepository -   SMB Protocol
>>>>> Plug-in (protocol-smb)
>>>>> 2007-05-24 14:04:24,812 INFO  plugin.PluginRepository -   MSExcel Parse
>>>>> Plug-in (parse-msexcel)
>>>>> 2007-05-24 14:04:24,812 INFO  plugin.PluginRepository -   OPIC Scoring
>>>>> Plug-in (scoring-opic)
>>>>> 2007-05-24 14:04:24,812 INFO  plugin.PluginRepository -   CyberNeko
>>>>> HTML Parser (lib-nekohtml)
>>>>> 2007-05-24 14:04:24,812 INFO  plugin.PluginRepository -   Log4j
>>>>> (lib-log4j)
>>>>> 2007-05-24 14:04:24,812 INFO  plugin.PluginRepository -   File Protocol
>>>>> Plug-in (protocol-file)
>>>>> 2007-05-24 14:04:24,812 INFO  plugin.PluginRepository -   URL Query
>>>>> Filter (query-url)
>>>>> 2007-05-24 14:04:24,812 INFO  plugin.PluginRepository -   Parse MS
>>>>> Documents Framework (lib-parsems)
>>>>> 2007-05-24 14:04:24,812 INFO  plugin.PluginRepository - Registered
>>>>> Extension-Points:
>>>>> 2007-05-24 14:04:24,812 INFO  plugin.PluginRepository -   Nutch
>>>>> Summarizer (org.apache.nutch.searcher.Summarizer)
>>>>> 2007-05-24 14:04:24,812 INFO  plugin.PluginRepository -   Nutch URL
>>>>> Normalizer (org.apache.nutch.net.URLNormalizer)
>>>>> 2007-05-24 14:04:24,812 INFO  plugin.PluginRepository -   Nutch
>>>>> Protocol (org.apache.nutch.protocol.Protocol)
>>>>> 2007-05-24 14:04:24,812 INFO  plugin.PluginRepository -   Nutch
>>>>> Analysis (org.apache.nutch.analysis.NutchAnalyzer)
>>>>> 2007-05-24 14:04:24,812 INFO  plugin.PluginRepository -   Nutch URL
>>>>> Filter (org.apache.nutch.net.URLFilter)
>>>>> 2007-05-24 14:04:24,812 INFO  plugin.PluginRepository -   Nutch
>>>>> Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
>>>>> 2007-05-24 14:04:24,812 INFO  plugin.PluginRepository -   Nutch Online
>>>>> Search Results Clustering Plugin
>>>>> (org.apache.nutch.clustering.OnlineClusterer)
>>>>> 2007-05-24 14:04:24,812 INFO  plugin.PluginRepository -   HTML Parse
>>>>> Filter (org.apache.nutch.parse.HtmlParseFilter)
>>>>> 2007-05-24 14:04:24,812 INFO  plugin.PluginRepository -   Nutch Content
>>>>> Parser (org.apache.nutch.parse.Parser)
>>>>> 2007-05-24 14:04:24,812 INFO  plugin.PluginRepository -   Nutch Scoring
>>>>> (org.apache.nutch.scoring.ScoringFilter)
>>>>> 2007-05-24 14:04:24,812 INFO  plugin.PluginRepository -   Nutch Query
>>>>> Filter (org.apache.nutch.searcher.QueryFilter)
>>>>> 2007-05-24 14:04:24,812 INFO  plugin.PluginRepository -   Ontology
>>>>> Model Loader (org.apache.nutch.ontology.Ontology)
>>>>> 2007-05-24 14:04:25,171 INFO  fetcher.Fetcher - Fetcher: done
>>>>> 2007-05-24 14:04:25,171 INFO  crawl.CrawlDb - CrawlDb update: starting
>>>>> 2007-05-24 14:04:25,171 INFO  crawl.CrawlDb - CrawlDb update: db:
>>>>> crawl/crawldb
>>>>> 2007-05-24 14:04:25,171 INFO  crawl.CrawlDb - CrawlDb update:
>>>>> segments: [crawl/segments/20070524140420]
>>>>> 2007-05-24 14:04:25,171 INFO  crawl.CrawlDb - CrawlDb update:
>>>>> additions allowed: true
>>>>> 2007-05-24 14:04:25,171 INFO  crawl.CrawlDb - CrawlDb update: URL
>>>>> normalizing: true
>>>>> 2007-05-24 14:04:25,171 INFO  crawl.CrawlDb - CrawlDb update: URL
>>>>> filtering: true
>>>>> 2007-05-24 14:04:25,203 FATAL conf.Configuration - bad conf file:
>>>>> top-level element not <configuration>
>>>>> 2007-05-24 14:04:25,203 INFO  crawl.CrawlDb - CrawlDb update: Merging
>>>>> segment data into db.
>>>>> 2007-05-24 14:04:25,421 FATAL conf.Configuration - bad conf file:
>>>>> top-level element not <configuration>
>>>>> 2007-05-24 14:04:25,468 INFO  plugin.PluginRepository - Plugins:
>>>>> looking in: C:\nutch-0.9\plugins
>>>>> 2007-05-24 14:04:25,593 INFO  plugin.PluginRepository - Plugin
>>>>> Auto-activation mode: [true]
>>>>> 
>>>>> 
>>>>> Thank you for reading my post; I hope you can help.
>>>>> 
>>>>> Regards,
>>>>> 
>>>>> Oli
>>>>> 
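The `unknown protocol: smb` failure in the log comes from `java.net.URL` itself. Nutch wraps the `MalformedURLException` in `ProtocolNotFound`, but the root cause is that the JVM has no stream handler registered for the `smb` scheme. Because of that, the URL cannot even be constructed, so the SMB plugin never gets to handle it. A quick way to check whether a given JVM setup resolves the scheme is a tiny standalone program. The following is only a sketch: the class name `SmbHandlerCheck` is made up for this example and is not part of Nutch or jcifs.

```java
import java.net.MalformedURLException;
import java.net.URL;

/**
 * Hypothetical helper (not part of Nutch or jcifs): reports whether the
 * running JVM can construct an smb:// URL. Without a registered handler,
 * java.net.URL throws MalformedURLException("unknown protocol: smb"),
 * which is the same error the Fetcher logged.
 */
public class SmbHandlerCheck {

    /** Returns null if the URL could be constructed, otherwise the error message. */
    static String tryParse(String spec) {
        try {
            new URL(spec); // fails here if no handler exists for the scheme
            return null;
        } catch (MalformedURLException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        String spec = args.length > 0 ? args[0] : "smb://sql1/Sales/DATA/";
        // Show which handler package prefixes (if any) the JVM was given.
        System.out.println("java.protocol.handler.pkgs = "
                + System.getProperty("java.protocol.handler.pkgs"));
        String error = tryParse(spec);
        System.out.println(error == null ? "OK: " + spec : "FAILED: " + error);
    }
}
```

Without jcifs available, this should print `FAILED: unknown protocol: smb`. It should print `OK` under two conditions. First, the jcifs jar must be in `jre/lib/ext` or on the classpath. Second, `java.protocol.handler.pkgs` must be set to `jcifs`, either with `-D` or with a `System.setProperty` call made before the first `smb://` URL is created. This assumes the jcifs jar provides its handler as `jcifs.smb.Handler`, which is the class name the JDK derives from the `jcifs` prefix and the `smb` scheme.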
>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
> 
> 
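The two `FATAL conf.Configuration - bad conf file: top-level element not <configuration>` lines are a separate problem. The configuration loader expects the outermost element of each XML resource it reads to be `<configuration>`, so one of the loaded files has a different root. The log does not say which file it is. nutch-site.xml is a likely candidate, but that is an assumption. A small sketch using only the JDK's DOM parser can report the root element of each file. `ConfRootCheck` is a hypothetical helper, not a Nutch class.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

/** Hypothetical helper (not part of Nutch): prints each XML file's top-level element. */
public class ConfRootCheck {

    /** Parses the stream and returns the tag name of its document element. */
    static String rootElement(InputStream in) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Element root = builder.parse(in).getDocumentElement();
            return root.getTagName();
        } catch (Exception e) {
            // Malformed XML or parser setup failure: report it unchecked.
            throw new IllegalArgumentException("could not parse XML: " + e.getMessage(), e);
        }
    }

    /** True if the outermost element is <configuration>, as the log message expects. */
    static boolean hasConfigurationRoot(InputStream in) {
        return "configuration".equals(rootElement(in));
    }

    public static void main(String[] args) throws IOException {
        for (String path : args) {
            try (InputStream in = new FileInputStream(path)) {
                String root = rootElement(in);
                System.out.println(path + ": top-level element <" + root + ">"
                        + ("configuration".equals(root) ? " (OK)" : " (expected <configuration>)"));
            }
        }
    }
}
```

For example, run `java ConfRootCheck conf/nutch-site.xml` together with any other `*.xml` files under `conf/`. Any file reported with a root other than `<configuration>` is a candidate for the FATAL message.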

-- 
View this message in context: 
http://www.nabble.com/WIN-XP-PRO--Djava.protocol*-file%3A---c%3A-folder--Crawling-Parents-tf3809966.html#a11047384
Sent from the Nutch - User mailing list archive at Nabble.com.


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
