I found a way to work around this problem; maybe it will help someone else. Apparently there is some problem with IFS in the sh shell I'm using. If I comment out these two lines in bin/nutch:
#IFS=
#unset IFS
it works fine. If I leave "unset IFS" in place, the script won't run and I get the error "IFS: cannot unset". (I noticed someone wrote to the list last year asking about the same "IFS: cannot unset" error.)
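A slightly more defensive variation (my own sketch, not from the Nutch distribution) is to probe in a subshell whether the shell allows unsetting IFS, and skip both statements when it doesn't:

   # Probe in a subshell whether this shell can unset IFS; the old
   # Solaris /bin/sh refuses and dies with "IFS: cannot unset".
   if (unset IFS) 2>/dev/null; then
       IFS=
       unset IFS
   fi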

Once I commented out the lines "IFS=" and "unset IFS", the script ran OK and the various params on the bin/nutch command line were interpreted properly. The output line contained only "rootUrlFile = urls.txt", as you would expect, and not "rootUrlFile = urls.txt -dir FOO" as before.
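That mangled line is consistent with IFS controlling how the shell joins the script's arguments. I'm not certain this is the exact failure path in the Solaris sh, but here is a tiny standalone illustration (mine, not from bin/nutch) of the well-defined "$*" case:

   # "$*" joins the positional parameters using the first character
   # of IFS; with IFS set to null they run together.
   set -- urls.txt -dir FOO

   IFS=' '
   echo "<$*>"   # prints <urls.txt -dir FOO> -- one word, space-joined

   IFS=
   echo "<$*>"   # prints <urls.txt-dirFOO> -- one word, no separator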

Another approach that works for me is to leave the script as is, with the IFS setting and unsetting intact, but to run it with bash instead of sh.
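For example (assuming bin/nutch currently starts with a #!/bin/sh line; the path to bash varies on Solaris, e.g. /usr/bin/bash or /usr/local/bin/bash):

   # Run the unmodified script under bash explicitly:
   bash bin/nutch crawl urls.txt -dir FOO

   # Or change the interpreter line at the top of bin/nutch from
   #   #!/bin/sh
   # to
   #   #!/bin/bash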

Michael Levy wrote:
I hope someone can help me with this problem I'm having with crawling on Solaris. The same script works fine on Windows using cygwin but I need to run this on Solaris.

This works fine:
 # bin/nutch crawl urls.txt
It creates a directory named something like crawl-20060418105008, as expected, and builds a working index.

However, if I add any parameters beyond the root_url_file parameter, I get the output below, and I'm really stumped. The following does not create a directory named FOO; instead it creates a directory named something like crawl-20060418105500, apparently ignoring the -dir FOO parameter.

Looking at the output, it seems to be taking "urls.txt -dir FOO" as the name of the URLs file rather than interpreting "-dir FOO" as an option at all. See the line "rootUrlFile = urls.txt -dir FOO"; I think it should just be "rootUrlFile = urls.txt".


# bin/nutch crawl urls.txt -dir FOO
060418 105308 parsing file:/export/home/www/virtual/wiki/doc_root/nutch-0.7.2/conf/nutch-default.xml
060418 105308 parsing file:/export/home/www/virtual/wiki/doc_root/nutch-0.7.2/conf/crawl-tool.xml
060418 105308 parsing file:/export/home/www/virtual/wiki/doc_root/nutch-0.7.2/conf/nutch-site.xml
060418 105308 No FS indicated, using default:local
060418 105308 crawl started in: crawl-20060418105308
060418 105308 rootUrlFile = urls.txt -dir FOO
060418 105308 threads = 10
060418 105308 depth = 5
060418 105310 Created webdb at LocalFS,/export/home/www/virtual/wiki/doc_root/nutch-0.7.2/crawl-20060418105308/db
Exception in thread "main" java.io.FileNotFoundException: urls.txt -dir FOO (No such file or directory)
      at java.io.FileInputStream.open(Native Method)
      at java.io.FileInputStream.<init>(FileInputStream.java:106)
      at java.io.FileReader.<init>(FileReader.java:55)
      at org.apache.nutch.db.WebDBInjector.injectURLFile(WebDBInjector.java:372)
      at org.apache.nutch.db.WebDBInjector.main(WebDBInjector.java:535)
      at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:134)
