I found a way to work around this problem; maybe it will help someone else. Apparently there is some problem with IFS in the sh shell I'm using. If I comment out these two lines in bin/nutch:
#IFS=
#unset IFS
it works fine. If I leave "unset IFS" in place, the script won't run and I get the error "IFS: cannot unset". (I noticed someone wrote to the list last year asking about the same "IFS: cannot unset" error.)
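A slightly more defensive variation (my own sketch, not from the Nutch distribution) is to probe in a subshell whether the shell allows unsetting IFS, and skip both statements when it doesn't:

   # Probe in a subshell whether this shell can unset IFS; the old
   # Solaris /bin/sh refuses and dies with "IFS: cannot unset".
   if (unset IFS) 2>/dev/null; then
       IFS=
       unset IFS
   fi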

Once I commented out the lines "IFS=" and "unset IFS", the script ran OK and the various params on the bin/nutch command line were interpreted properly. The output line contained only "rootUrlFile = urls.txt", as you would expect, and not "rootUrlFile = urls.txt -dir FOO" as before.
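That mangled line is consistent with IFS controlling how the shell joins the script's arguments. I'm not certain this is the exact failure path in the Solaris sh, but here is a tiny standalone illustration (mine, not from bin/nutch) of the well-defined "$*" case:

   # "$*" joins the positional parameters using the first character
   # of IFS; with IFS set to null they run together.
   set -- urls.txt -dir FOO

   IFS=' '
   echo "<$*>"   # prints <urls.txt -dir FOO> -- one word, space-joined

   IFS=
   echo "<$*>"   # prints <urls.txt-dirFOO> -- one word, no separator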

Another approach that works for me is to leave the script as is, with the IFS setting and unsetting intact, but to run it with bash instead of sh.
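For example (assuming bin/nutch currently starts with a #!/bin/sh line; the path to bash varies on Solaris, e.g. /usr/bin/bash or /usr/local/bin/bash):

   # Run the unmodified script under bash explicitly:
   bash bin/nutch crawl urls.txt -dir FOO

   # Or change the interpreter line at the top of bin/nutch from
   #   #!/bin/sh
   # to
   #   #!/bin/bash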

Michael Levy wrote:
I hope someone can help me with this problem I'm having with crawling on Solaris. The same script works fine on Windows using cygwin but I need to run this on Solaris.

This works fine:
 # bin/nutch crawl urls.txt
It creates a directory named something like crawl-20060418105008, as expected, and builds a working index.

However, if I add any parameters beyond the root_url_file parameter, I get the output below, and I'm really stumped. The following does not create a directory named FOO; instead it creates a directory named something like crawl-20060418105500, apparently ignoring the -dir FOO parameter.

Looking at the output, it seems to be taking "urls.txt -dir FOO" as the name of the URLs file rather than interpreting "-dir FOO" as an option at all. See the line "rootUrlFile = urls.txt -dir FOO"; I think it should just be "rootUrlFile = urls.txt".


# bin/nutch crawl urls.txt -dir FOO
060418 105308 parsing file:/export/home/www/virtual/wiki/doc_root/nutch-0.7.2/conf/nutch-default.xml
060418 105308 parsing file:/export/home/www/virtual/wiki/doc_root/nutch-0.7.2/conf/crawl-tool.xml
060418 105308 parsing file:/export/home/www/virtual/wiki/doc_root/nutch-0.7.2/conf/nutch-site.xml
060418 105308 No FS indicated, using default:local
060418 105308 crawl started in: crawl-20060418105308
060418 105308 rootUrlFile = urls.txt -dir FOO
060418 105308 threads = 10
060418 105308 depth = 5
060418 105310 Created webdb at LocalFS,/export/home/www/virtual/wiki/doc_root/nutch-0.7.2/crawl-20060418105308/db
Exception in thread "main" java.io.FileNotFoundException: urls.txt -dir FOO (No such file or directory)
      at java.io.FileInputStream.open(Native Method)
      at java.io.FileInputStream.<init>(FileInputStream.java:106)
      at java.io.FileReader.<init>(FileReader.java:55)
      at org.apache.nutch.db.WebDBInjector.injectURLFile(WebDBInjector.java:372)
      at org.apache.nutch.db.WebDBInjector.main(WebDBInjector.java:535)
      at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:134)
