On 4/24/06, Peter Swoboda <[EMAIL PROTECTED]> wrote:
> I forgot to have a look at the log files:
>
> namenode:
> 060424 121444 parsing
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> 060424 121444 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> Exception in thread "main" java.lang.RuntimeException: Not a host:port pair: local
>         at org.apache.hadoop.dfs.DataNode.createSocketAddr(DataNode.java:75)
>         at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:78)
>         at org.apache.hadoop.dfs.NameNode.main(NameNode.java:394)
>
> datanode:
> 060424 121448 10 parsing
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> 060424 121448 10 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> 060424 121448 10 Can't start DataNode in non-directory: /tmp/hadoop/dfs/data
>
> jobtracker:
> 060424 121455 parsing
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> 060424 121455 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> 060424 121455 parsing
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> 060424 121456 parsing
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> 060424 121456 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> Exception in thread "main" java.lang.RuntimeException: Bad mapred.job.tracker: local
>         at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:361)
>         at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:333)
>         at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:51)
>         at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:907)
>
> tasktracker:
> 060424 121502 parsing
> jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> 060424 121503 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> Exception in thread "main" java.lang.RuntimeException: Bad mapred.job.tracker: local
>         at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:361)
>         at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:86)
>         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:755)
>
> What can be the problem?
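[Editorial note: both "Not a host:port pair: local" and "Bad mapred.job.tracker: local" indicate that fs.default.name and mapred.job.tracker are still at their default value "local" when the daemons read the configuration; standalone daemons cannot be started from that value. The datanode message means /tmp/hadoop/dfs/data is not a usable directory. A minimal sketch of the overrides conf/hadoop-site.xml would need; the host and ports are illustrative assumptions, not values taken from this thread:

  <configuration>
    <property>
      <name>fs.default.name</name>
      <value>localhost:9000</value>  <!-- assumed namenode host:port -->
    </property>
    <property>
      <name>mapred.job.tracker</name>
      <value>localhost:9001</value>  <!-- assumed jobtracker host:port -->
    </property>
  </configuration>

With real host:port values in place, the four daemons should get past these RuntimeExceptions.]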
> --- Original Message ---
> > From: "Peter Swoboda" <[EMAIL PROTECTED]>
> > To: [email protected]
> > Subject: Re: java.io.IOException: No input directories specified in
> > Date: Mon, 24 Apr 2006 12:39:28 +0200 (MEST)
> >
> > Got the latest nutch-nightly build,
> > including hadoop-0.1.1.jar.
> > Copied the content of hadoop-default.xml into hadoop-site.xml.
> > Started namenode, datanode, jobtracker, tasktracker, then did:
> >
> > bin/hadoop dfs -put seeds seeds
> >
> > Result:
> >
> > bash-3.00$ bin/hadoop-daemon.sh start namenode
> > starting namenode, logging to
> > bin/../logs/hadoop-jung-namenode-gillespie.log
> >
> > bash-3.00$ bin/hadoop-daemon.sh start datanode
> > starting datanode, logging to
> > bin/../logs/hadoop-jung-datanode-gillespie.log
> >
> > bash-3.00$ bin/hadoop-daemon.sh start jobtracker
> > starting jobtracker, logging to
> > bin/../logs/hadoop-jung-jobtracker-gillespie.log
> >
> > bash-3.00$ bin/hadoop-daemon.sh start tasktracker
> > starting tasktracker, logging to
> > bin/../logs/hadoop-jung-tasktracker-gillespie.log
> >
> > bash-3.00$ bin/hadoop dfs -put seeds seeds
> > 060424 121512 parsing
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > 060424 121512 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > 060424 121513 No FS indicated, using default:local
> >
> > bash-3.00$ bin/hadoop dfs -ls
> > 060424 121543 parsing
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > 060424 121543 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > 060424 121544 No FS indicated, using default:local
> > Found 18 items
> > /home/../nutch-nightly/docs     <dir>
> > /home/../nutch-nightly/nutch-nightly.war        15541036
> > /home/../nutch-nightly/webapps  <dir>
> > /home/../nutch-nightly/CHANGES.txt      17709
> > /home/../nutch-nightly/build.xml        21433
> > /home/../nutch-nightly/LICENSE.txt      615
> > /home/../nutch-nightly/test.log 3447
> > /home/../nutch-nightly/conf     <dir>
> > /home/../nutch-nightly/default.properties       3043
> > /home/../nutch-nightly/plugins  <dir>
> > /home/../nutch-nightly/lib      <dir>
> > /home/../nutch-nightly/bin      <dir>
> > /home/../nutch-nightly/logs     <dir>
> > /home/../nutch-nightly/nutch-nightly.jar        408375
> > /home/../nutch-nightly/src      <dir>
> > /home/../nutch-nightly/nutch-nightly.job        18537096
> > /home/../nutch-nightly/seeds    <dir>
> > /home/../nutch-nightly/README.txt       403
> >
> > bash-3.00$ bin/hadoop dfs -ls seeds
> > 060424 121603 parsing
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > 060424 121603 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > 060424 121603 No FS indicated, using default:local
> > Found 2 items
> > /home/../nutch-nightly/seeds/urls.txt~  0
> > /home/../nutch-nightly/seeds/urls.txt   26
> >
> > So far so good, but:
> >
> > bash-3.00$ bin/nutch crawl seeds -dir crawled -depht 2
> > 060424 121613 parsing
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > 060424 121613 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > 060424 121613 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > 060424 121613 parsing
> > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > 060424 121613 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > 060424 121613 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > 060424 121614 crawl started in: crawled
> > 060424 121614 rootUrlDir = 2
> > 060424 121614 threads = 10
> > 060424 121614 depth = 5
> > Exception in thread "main" java.io.IOException: No valid local directories
> > in property: mapred.local.dir
> >         at org.apache.hadoop.conf.Configuration.getFile(Configuration.java:282)
> >         at org.apache.hadoop.mapred.JobConf.getLocalFile(JobConf.java:127)
> >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:101)
> > bash-3.00$
> >
> > I really don't know what to do.
> > In hadoop-site.xml it's:
> > ..
> > <property>
> >   <name>mapred.local.dir</name>
> >   <value>/tmp/hadoop/mapred/local</value>
> >   <description>The local directory where MapReduce stores intermediate
> >   data files. May be a space- or comma-separated list of
> >   directories on different devices in order to spread disk i/o.
> >   </description>
> > </property>
> > ..
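[Editorial note: "No valid local directories in property: mapred.local.dir" usually means none of the configured directories exists or can be created by the user running the command. Given the /tmp/hadoop layout quoted above, a quick check along these lines would confirm or rule that out:

  mkdir -p /tmp/hadoop/mapred/local   # directory named in mapred.local.dir
  mkdir -p /tmp/hadoop/dfs/data       # directory the datanode complained about
  ls -ld /tmp/hadoop/mapred/local /tmp/hadoop/dfs/data   # both must be writable directories

Note also that the command line reads "-depht 2" while the log reports "rootUrlDir = 2" and "depth = 5"; the misspelled flag is discussed after the 4/21 message below.]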
> > _______________________________________
> > Is your hadoop-site.xml empty, I mean it doesn't contain any
> > configuration, correct? So what you need to do is add your
> > configuration there. I suggest you copy the hadoop-0.1.1.jar to
> > another directory for inspection (copy, not move). Unzip the
> > hadoop-0.1.1.jar file and you will see the hadoop-default.xml file there.
> > Use that as a template to edit your hadoop-site.xml under conf. Once you
> > have edited it, you should start your 'namenode' and 'datanode'. I
> > am guessing you are using nutch in a distributed way, because you
> > don't need to use hadoop if you are just running on one machine in
> > local mode!
> >
> > Anyway, you need to do the following to start the datanode and namenode:
> >
> > bin/hadoop-daemon.sh start namenode
> > bin/hadoop-daemon.sh start datanode
> >
> > Then you need to start jobtracker and tasktracker before you start
> > crawling:
> >
> > bin/hadoop-daemon.sh start jobtracker
> > bin/hadoop-daemon.sh start tasktracker
> >
> > Then you start your bin/hadoop dfs -put seeds seeds
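[Editorial note: a concrete version of the inspection step described above, assuming the nutch-nightly layout used throughout this thread; a jar is an ordinary zip archive:

  mkdir /tmp/hadoop-inspect
  cp lib/hadoop-0.1.1.jar /tmp/hadoop-inspect/   # copy, not move
  cd /tmp/hadoop-inspect
  unzip hadoop-0.1.1.jar hadoop-default.xml      # extract just the default config
  less hadoop-default.xml

hadoop-default.xml is only a reference; site-specific overrides belong in conf/hadoop-site.xml.]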
> > On 4/21/06, Peter Swoboda <[EMAIL PROTECTED]> wrote:
> > > ok. changed to latest nightly build.
> > > hadoop-0.1.1.jar is existing,
> > > hadoop-site.xml also.
> > > now trying:
> > >
> > > bash-3.00$ bin/hadoop dfs -put seeds seeds
> > > 060421 125154 parsing
> > > jar:file:/home/stud/jung/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > 060421 125155 parsing
> > > file:/home/stud/jung/Desktop/nutch-nightly/conf/hadoop-site.xml
> > > 060421 125155 No FS indicated, using default:local
> > >
> > > and
> > >
> > > bash-3.00$ bin/hadoop dfs -ls
> > > 060421 125217 parsing
> > > jar:file:/home/stud/jung/Desktop/nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > 060421 125217 parsing
> > > file:/home/stud/jung/Desktop/nutch-nightly/conf/hadoop-site.xml
> > > 060421 125217 No FS indicated, using default:local
> > > Found 16 items
> > > /home/stud/jung/Desktop/nutch-nightly/docs      <dir>
> > > /home/stud/jung/Desktop/nutch-nightly/nutch-nightly.war         15541036
> > > /home/stud/jung/Desktop/nutch-nightly/webapps   <dir>
> > > /home/stud/jung/Desktop/nutch-nightly/CHANGES.txt       17709
> > > /home/stud/jung/Desktop/nutch-nightly/build.xml         21433
> > > /home/stud/jung/Desktop/nutch-nightly/LICENSE.txt       615
> > > /home/stud/jung/Desktop/nutch-nightly/conf      <dir>
> > > /home/stud/jung/Desktop/nutch-nightly/default.properties        3043
> > > /home/stud/jung/Desktop/nutch-nightly/plugins   <dir>
> > > /home/stud/jung/Desktop/nutch-nightly/lib       <dir>
> > > /home/stud/jung/Desktop/nutch-nightly/bin       <dir>
> > > /home/stud/jung/Desktop/nutch-nightly/nutch-nightly.jar         408375
> > > /home/stud/jung/Desktop/nutch-nightly/src       <dir>
> > > /home/stud/jung/Desktop/nutch-nightly/nutch-nightly.job         18537096
> > > /home/stud/jung/Desktop/nutch-nightly/seeds     <dir>
> > > /home/stud/jung/Desktop/nutch-nightly/README.txt        403
> > >
> > > also:
> > >
> > > bash-3.00$ bin/hadoop dfs -ls seeds
> > > 060421 133004 parsing
> > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > 060421 133004 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > 060421 133004 No FS indicated, using default:local
> > > Found 2 items
> > > /home/../nutch-nightly/seeds/urls.txt~  0
> > > /home/../nutch-nightly/seeds/urls.txt   26
> > > bash-3.00$
> > >
> > > but:
> > >
> > > bin/nutch crawl seeds -dir crawled -depht 2
> > > 060421 131722 parsing
> > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > 060421 131723 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > 060421 131723 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > 060421 131723 parsing
> > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > 060421 131723 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > 060421 131723 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > 060421 131723 crawl started in: crawled
> > > 060421 131723 rootUrlDir = 2
> > > 060421 131723 threads = 10
> > > 060421 131723 depth = 5
> > > 060421 131724 Injector: starting
> > > 060421 131724 Injector: crawlDb: crawled/crawldb
> > > 060421 131724 Injector: urlDir: 2
> > > 060421 131724 Injector: Converting injected urls to crawl db entries.
> > > 060421 131724 parsing
> > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > 060421 131724 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > 060421 131724 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > 060421 131724 parsing
> > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > 060421 131724 parsing
> > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > 060421 131725 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > 060421 131725 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > 060421 131725 parsing
> > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > 060421 131726 parsing file:/home/../nutch-nightly/conf/nutch-default.xml
> > > 060421 131726 parsing file:/home/../nutch-nightly/conf/crawl-tool.xml
> > > 060421 131726 parsing
> > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > 060421 131726 parsing
> > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > 060421 131726 parsing
> > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > 060421 131726 parsing file:/home/../nutch-nightly/conf/nutch-site.xml
> > > 060421 131727 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > 060421 131727 parsing
> > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/hadoop-default.xml
> > > 060421 131727 parsing
> > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1.1.jar!/mapred-default.xml
> > > 060421 131727 parsing /tmp/hadoop/mapred/local/localRunner/job_6jn7j8.xml
> > > 060421 131727 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > 060421 131727 job_6jn7j8
> > > java.io.IOException: No input directories specified in: Configuration:
> > > defaults: hadoop-default.xml , mapred-default.xml ,
> > > /tmp/hadoop/mapred/local/localRunner/job_6jn7j8.xml final: hadoop-site.xml
> > >         at org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java:90)
> > >         at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:100)
> > >         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:88)
> > > 060421 131728 Running job: job_6jn7j8
> > > Exception in thread "main" java.io.IOException: Job failed!
> > >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:322)
> > >         at org.apache.nutch.crawl.Injector.inject(Injector.java:115)
> > >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > > bash-3.00$
> > >
> > > Can anyone help?
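[Editorial note: the injector log above shows "Injector: urlDir: 2". The crawl command treats any argument it does not recognize as the root URL directory, so the misspelled "-depht" flag is passed over and the trailing "2" ends up as the input directory, while depth stays at its default of 5. Since no directory named "2" exists, the inject job finds nothing to read and fails with "No input directories specified". The corrected invocation, using the same seeds directory as above:

  bin/nutch crawl seeds -dir crawled -depth 2

Separately, the listings show an empty editor backup file seeds/urls.txt~ that was uploaded along with urls.txt; it is harmless to this error, but removing it keeps stray files out of the injector's input.]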
> > > --- Original Message ---
> > > > From: "Zaheed Haque" <[EMAIL PROTECTED]>
> > > > To: [email protected]
> > > > Subject: Re: java.io.IOException: No input directories specified in
> > > > Date: Fri, 21 Apr 2006 13:18:37 +0200
> > > >
> > > > Also I have noticed that you are using hadoop-0.1; there was a bug in
> > > > 0.1, you should be using 0.1.1. Under your lib directory you should
> > > > have the following file:
> > > >
> > > > hadoop-0.1.1.jar
> > > >
> > > > If you don't, please download the latest nightly build.
> > > >
> > > > Cheers
> > > >
> > > > On 4/21/06, Zaheed Haque <[EMAIL PROTECTED]> wrote:
> > > > > Do you have a file called "hadoop-site.xml" under your conf directory?
> > > > > The content of the file is like the following:
> > > > >
> > > > > <?xml version="1.0"?>
> > > > > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> > > > >
> > > > > <!-- Put site-specific property overrides in this file. -->
> > > > >
> > > > > <configuration>
> > > > >
> > > > > </configuration>
> > > > >
> > > > > Or is it missing? If it's missing, please create a file under the
> > > > > conf directory with the name hadoop-site.xml and then try the
> > > > > hadoop dfs -ls again; you should see something, like a listing from
> > > > > your local file system.
> > > > >
> > > > > On 4/21/06, Peter Swoboda <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > > --- Original Message ---
> > > > > > > From: "Zaheed Haque" <[EMAIL PROTECTED]>
> > > > > > > To: [email protected]
> > > > > > > Subject: Re: java.io.IOException: No input directories specified in
> > > > > > > Date: Fri, 21 Apr 2006 09:48:38 +0200
> > > > > > >
> > > > > > > bin/hadoop dfs -ls
> > > > > > >
> > > > > > > Can you see your "seeds" directory?
> > > > > >
> > > > > > bash-3.00$ bin/hadoop dfs -put seeds seeds
> > > > > > 060421 122421 parsing
> > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > >
> > > > > I think the hadoop-site is missing, because we should be seeing a
> > > > > message like this here:
> > > > >
> > > > > 060421 131014 parsing
> > > > > file:/usr/local/src/nutch/build/nutch-0.8-dev/conf/hadoop-site.xml
> > > > >
> > > > > > 060421 122421 No FS indicated, using default:local
> > > > > >
> > > > > > bash-3.00$ bin/hadoop dfs -ls
> > > > > > 060421 122425 parsing
> > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > > 060421 122426 No FS indicated, using default:local
> > > > > > Found 0 items
> > > > > > bash-3.00$
> > > > > >
> > > > > > As you can see, I can't. What's going wrong?
> > > > > >
> > > > > > > bin/hadoop dfs -ls seeds
> > > > > > >
> > > > > > > Can you see your text file with URLs?
> > > > > > >
> > > > > > > Furthermore, bin/nutch crawl is a one-shot crawl/index command. I
> > > > > > > strongly recommend you take the long route of
> > > > > > > inject, generate, fetch, updatedb, invertlinks, index, dedup and
> > > > > > > merge. You can try the above commands just by typing
> > > > > > > bin/nutch inject
> > > > > > > etc.
> > > > > > > If you just try the inject command without any parameters, it
> > > > > > > will tell you how to use it.
> > > > > > >
> > > > > > > Hope this helps.
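[Editorial note: a sketch of the "long route" recommended above, with illustrative paths. The exact arguments vary between Nutch 0.8 nightlies, so run each command without parameters first to see its usage text, as suggested:

  bin/nutch inject crawled/crawldb seeds
  bin/nutch generate crawled/crawldb crawled/segments
  s=`ls -d crawled/segments/* | tail -1`   # newest segment
  bin/nutch fetch $s
  bin/nutch updatedb crawled/crawldb $s
  bin/nutch invertlinks crawled/linkdb crawled/segments/*
  bin/nutch index crawled/indexes crawled/crawldb crawled/linkdb crawled/segments/*
  bin/nutch dedup crawled/indexes
  bin/nutch merge crawled/index crawled/indexes

Repeating the generate/fetch/updatedb cycle deepens the crawl by one level each pass.]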
> > > > > > > On 4/21/06, Peter Swoboda <[EMAIL PROTECTED]> wrote:
> > > > > > > > hi
> > > > > > > >
> > > > > > > > i've changed from nutch 0.7 to 0.8 and
> > > > > > > > done the following steps:
> > > > > > > > created an urls.txt in a dir named seeds
> > > > > > > >
> > > > > > > > bin/hadoop dfs -put seeds seeds
> > > > > > > >
> > > > > > > > 060317 121440 parsing
> > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > > > > 060317 121441 No FS indicated, using default:local
> > > > > > > >
> > > > > > > > bin/nutch crawl seeds -dir crawled -depth 2 >& crawl.log
> > > > > > > >
> > > > > > > > but in crawl.log:
> > > > > > > >
> > > > > > > > 060419 124302 parsing
> > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
> > > > > > > > 060419 124302 parsing
> > > > > > > > jar:file:/home/../nutch-nightly/lib/hadoop-0.1-dev.jar!/mapred-default.xml
> > > > > > > > 060419 124302 parsing
> > > > > > > > /tmp/hadoop/mapred/local/job_e7cpf1.xml/localRunner
> > > > > > > > 060419 124302 parsing file:/home/../nutch-nightly/conf/hadoop-site.xml
> > > > > > > > java.io.IOException: No input directories specified in: Configuration:
> > > > > > > > defaults: hadoop-default.xml , mapred-default.xml ,
> > > > > > > > /tmp/hadoop/mapred/local/job_e7cpf1.xml/localRunner final: hadoop-site.xml
> > > > > > > >         at org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java:84)
> > > > > > > >         at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:94)
> > > > > > > >         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:70)
> > > > > > > > 060419 124302 Running job: job_e7cpf1
> > > > > > > > Exception in thread "main" java.io.IOException: Job failed!
> > > > > > > >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:310)
> > > > > > > >         at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
> > > > > > > >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)
> > > > > > > >
> > > > > > > > Any ideas?
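[Editorial note: for reference, the seed setup this thread starts from, as a minimal sketch; the URL is a placeholder:

  mkdir seeds
  echo "http://lucene.apache.org/nutch/" > seeds/urls.txt
  bin/hadoop dfs -put seeds seeds

As the "Found 2 items" listings earlier in the thread show, editor backup files such as urls.txt~ get uploaded too, so it is worth cleaning the seeds directory before the -put.]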
