Hi,

Does anyone know which packages I have to install on SUSE to get Nutch running?
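For reference, this kind of setup would usually be installed with zypper; a rough sketch (the exact package names are an assumption and differ a bit between SUSE releases):

  # Java, servlet container and build tool for Nutch (names assumed; check "zypper search")
  zypper install tomcat6 java-1_6_0-openjdk-devel java-1_6_0-sun-devel ant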
I have another Nutch installation where everything works fine, so I copied the whole installation over. That machine also runs SUSE Linux, but it is 64-bit and I didn't set it up myself. Still the same problem.

At the moment I have installed the following packages: Tomcat 6, OpenJDK devel 1.6, Sun Java devel 1.6 and Ant 1.7.

That's enough for today. I hope someone can help.

Tom

-----Original Message-----
From: MilleBii [mailto:[email protected]]
Sent: Friday, 4 December 2009 17:31
To: [email protected]
Subject: Re: Problems with a new Installation of Nutch

I don't think Hadoop uses Tomcat... I think it uses Jetty instead. The nodes
communicate via HTTP, so you need some kind of web server... and it is also
the best way to monitor the cluster.

2009/12/4, Tom Landvoigt <[email protected]>:
> Hi,
>
> I don't have Tomcat on this system because I don't want to use the
> web search. But if it is necessary for Hadoop, which I don't think it is,
> I will install it.
>
> nu...@ip-10-224-113-210:/nutch/search> ./bin/hadoop fs -ls /
> Found 1 items
> -rw-r--r-- 2 nutch supergroup 0 2009-12-04 14:04 /url.txt
> nu...@ip-10-224-113-210:/nutch/search>
>
> I get the normal listing, but the file is empty.
>
> -----Original Message-----
> From: MilleBii [mailto:[email protected]]
> Sent: Friday, 4 December 2009 15:06
> To: [email protected]
> Subject: Re: Problems with a new Installation of Nutch
>
> Did you check with the web interface? It gives a lot of info; you can
> even browse the file system.
>
> Try hadoop fs -ls and see what it gives you.
>
> 2009/12/4, Tom Landvoigt <[email protected]>:
>> Hello,
>>
>> I hope someone can help me.
>>
>> I installed Nutch on 2 Amazon EC2 machines. Everything seems fine, but I
>> can't put data into HDFS.
>>
>> I formatted the namenode and started HDFS with start-all.
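>>
>> Roughly what that means in commands, run from the Nutch directory on the
>> master (exact script names assumed, these are the stock Hadoop scripts
>> that come with Nutch):
>>
>>   ./bin/hadoop namenode -format   # format the HDFS namenode (only once, before the first start)
>>   ./bin/start-all.sh              # start namenode, datanode(s), jobtracker and tasktracker(s)
>>   ./bin/hadoop fs -ls /           # quick check that the client can reach the namenode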
>>
>> All Java processes start properly, but when I run "hadoop fs -put something /"
>> I get these logs:
>>
>> nu...@bla:/nutch/search> ./bin/hadoop fs -put /tmp/hadoop-nutch-tasktracker.pid blub
>> put: Protocol not available
>>
>> DATA NODE LOG on the master:
>> 2009-12-04 12:50:15,566 INFO http.HttpServer - Version Jetty/5.1.4
>> 2009-12-04 12:50:15,582 INFO util.Credential - Checking Resource aliases
>> 2009-12-04 12:50:16,483 INFO util.Container - Started org.mortbay.jetty.servlet.webapplicationhand...@e45b5e
>> 2009-12-04 12:50:16,614 INFO util.Container - Started WebApplicationContext[/static,/static]
>> 2009-12-04 12:50:16,882 INFO util.Container - Started org.mortbay.jetty.servlet.webapplicationhand...@1284fd4
>> 2009-12-04 12:50:16,883 INFO util.Container - Started WebApplicationContext[/logs,/logs]
>> 2009-12-04 12:50:17,827 INFO util.Container - Started org.mortbay.jetty.servlet.webapplicationhand...@39c8c1
>> 2009-12-04 12:50:17,849 INFO util.Container - Started WebApplicationContext[/,/]
>> 2009-12-04 12:50:18,485 INFO http.SocketListener - Started SocketListener on 0.0.0.0:50075
>> 2009-12-04 12:50:18,485 INFO util.Container - Started org.mortbay.jetty.ser...@36527f
>> 2009-12-04 12:54:20,745 ERROR datanode.DataNode - DatanodeRegistration(10.224.113.210:50010, storageID=DS-1135263253-10.224.113.210-50010-1259926637370, infoPort=50075, ipcPort=50020):DataXceiver
>> java.io.EOFException
>>         at java.io.DataInputStream.readShort(DataInputStream.java:315)
>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:79)
>>         at java.lang.Thread.run(Thread.java:636)
>> 2009-12-04 12:54:20,746 ERROR datanode.DataNode - DatanodeRegistration(10.224.113.210:50010, storageID=DS-1135263253-10.224.113.210-50010-1259926637370, infoPort=50075, ipcPort=50020):DataXceiver
>> java.io.EOFException
>>         at java.io.DataInputStream.readShort(DataInputStream.java:315)
>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:79)
>>         at java.lang.Thread.run(Thread.java:636)
>> 2009-12-04 12:54:20,747 ERROR datanode.DataNode - DatanodeRegistration(10.224.113.210:50010, storageID=DS-1135263253-10.224.113.210-50010-1259926637370, infoPort=50075, ipcPort=50020):DataXceiver
>> java.io.EOFException
>>         at java.io.DataInputStream.readShort(DataInputStream.java:315)
>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:79)
>>         at java.lang.Thread.run(Thread.java:636)
>> 2009-12-04 12:54:20,747 ERROR datanode.DataNode - DatanodeRegistration(10.224.113.210:50010, storageID=DS-1135263253-10.224.113.210-50010-1259926637370, infoPort=50075, ipcPort=50020):DataXceiver
>> java.io.EOFException
>>         at java.io.DataInputStream.readShort(DataInputStream.java:315)
>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:79)
>>         at java.lang.Thread.run(Thread.java:636)
>>
>> NAME NODE LOG
>> 2009-12-04 12:50:11,539 INFO http.HttpServer - Version Jetty/5.1.4
>> 2009-12-04 12:50:11,573 INFO util.Credential - Checking Resource aliases
>> 2009-12-04 12:50:12,488 INFO util.Container - Started org.mortbay.jetty.servlet.webapplicationhand...@19fe451
>> 2009-12-04 12:50:12,565 INFO util.Container - Started WebApplicationContext[/static,/static]
>> 2009-12-04 12:50:12,891 INFO util.Container - Started org.mortbay.jetty.servlet.webapplicationhand...@1570945
>> 2009-12-04 12:50:12,891 INFO util.Container - Started WebApplicationContext[/logs,/logs]
>> 2009-12-04 12:50:13,569 INFO util.Container - Started org.mortbay.jetty.servlet.webapplicationhand...@11410e5
>> 2009-12-04 12:50:13,582 INFO util.Container - Started WebApplicationContext[/,/]
>> 2009-12-04 12:50:13,613 INFO http.SocketListener - Started SocketListener on 0.0.0.0:50070
>> 2009-12-04 12:50:13,613 INFO util.Container - Started org.mortbay.jetty.ser...@173ec72
>>
>> SECONDARY NAMENODE LOG
>> 2009-12-04 12:50:19,163 INFO http.HttpServer - Version Jetty/5.1.4
>> 2009-12-04 12:50:19,207 INFO util.Credential - Checking Resource aliases
>> 2009-12-04 12:50:20,365 INFO util.Container - Started org.mortbay.jetty.servlet.webapplicationhand...@174d93a
>> 2009-12-04 12:50:20,454 INFO util.Container - Started WebApplicationContext[/static,/static]
>> 2009-12-04 12:50:21,396 INFO util.Container - Started org.mortbay.jetty.servlet.webapplicationhand...@31f2a7
>> 2009-12-04 12:50:21,396 INFO util.Container - Started WebApplicationContext[/logs,/logs]
>> 2009-12-04 12:50:21,533 INFO servlet.XMLConfiguration - No WEB-INF/web.xml in file:/mnt/nutch/nutch-1.0/webapps/secondary. Serving files and default/dynamic servlets only
>> 2009-12-04 12:50:22,206 INFO util.Container - Started org.mortbay.jetty.servlet.webapplicationhand...@383118
>> 2009-12-04 12:50:22,785 INFO util.Container - Started WebApplicationContext[/,/]
>> 2009-12-04 12:50:22,787 INFO http.SocketListener - Started SocketListener on 0.0.0.0:50090
>> 2009-12-04 12:50:22,787 INFO util.Container - Started org.mortbay.jetty.ser...@297ffb
>> 2009-12-04 12:50:22,787 WARN namenode.SecondaryNameNode - Checkpoint Period :3600 secs (60 min)
>> 2009-12-04 12:50:22,787 WARN namenode.SecondaryNameNode - Log Size Trigger :67108864 bytes (65536 KB)
>> 2009-12-04 12:55:23,908 WARN namenode.SecondaryNameNode - Checkpoint done. New Image Size: 1056
>>
>> HADOOP LOG
>> 2009-12-04 12:54:20,708 WARN hdfs.DFSClient - DataStreamer Exception: java.io.IOException: Unable to create new block.
>>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2722)
>>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
>>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)
>> 2009-12-04 12:54:20,709 WARN hdfs.DFSClient - Error Recovery for block blk_5506837520665828594_1002 bad datanode[0] nodes == null
>> 2009-12-04 12:54:20,709 WARN hdfs.DFSClient - Could not get block locations. Source file "/user/nutch/blub/hadoop-nutch-tasktracker.pid" - Aborting...
>>
>> DATA NODE LOG on the slave
>> 2009-12-04 12:49:49,433 INFO http.HttpServer - Version Jetty/5.1.4
>> 2009-12-04 12:49:49,438 INFO util.Credential - Checking Resource aliases
>> 2009-12-04 12:49:50,288 INFO util.Container - Started org.mortbay.jetty.servlet.webapplicationhand...@e45b5e
>> 2009-12-04 12:49:50,357 INFO util.Container - Started WebApplicationContext[/static,/static]
>> 2009-12-04 12:49:50,555 INFO util.Container - Started org.mortbay.jetty.servlet.webapplicationhand...@2016b0
>> 2009-12-04 12:49:50,555 INFO util.Container - Started WebApplicationContext[/logs,/logs]
>> 2009-12-04 12:49:50,816 INFO util.Container - Started org.mortbay.jetty.servlet.webapplicationhand...@118278a
>> 2009-12-04 12:49:50,820 INFO util.Container - Started WebApplicationContext[/,/]
>> 2009-12-04 12:49:50,849 INFO http.SocketListener - Started SocketListener on 0.0.0.0:50075
>> 2009-12-04 12:49:50,849 INFO util.Container - Started org.mortbay.jetty.ser...@b02928
>>
>> HADOOP SITE XML
>>
>> <property>
>>   <name>fs.default.name</name>
>>   <value>hdfs://(yes here is the right ip):9000</value>
>>   <description>
>>     The name of the default file system. Either the literal string
>>     "local" or a host:port for NDFS.
>>   </description>
>> </property>
>>
>> <!-- Specifies where the JobTracker (which coordinates the MapReduce jobs) can be found. -->
>> <property>
>>   <name>mapred.job.tracker</name>
>>   <value>hdfs://(here too):9001</value>
>>   <description>
>>     The host and port that the MapReduce job tracker runs at. If
>>     "local", then jobs are run in-process as a single map and
>>     reduce task.
>>   </description>
>> </property>
>>
>> <!-- Specifies how many map tasks may run at the same time. -->
>> <property>
>>   <name>mapred.tasktracker.map.tasks.maximum</name>
>>   <value>2</value>
>>   <description>
>>     define mapred.map tasks to be number of slave hosts
>>   </description>
>> </property>
>>
>> <!-- Specifies how many reduce tasks may run at the same time. -->
>> <property>
>>   <name>mapred.tasktracker.reduce.tasks.maximum</name>
>>   <value>2</value>
>>   <description>
>>     define mapred.reduce tasks to be number of slave hosts
>>   </description>
>> </property>
>>
>> <property>
>>   <name>mapred.child.java.opts</name>
>>   <value>-Xmx1500m</value>
>> </property>
>>
>> <property>
>>   <name>mapred.jobtracker.restart.recover</name>
>>   <value>true</value>
>> </property>
>>
>> <!-- The next settings specify where the Hadoop file system stores its files on each instance's disk. -->
>> <property>
>>   <name>dfs.name.dir</name>
>>   <value>/nutch/filesystem/name</value>
>> </property>
>>
>> <property>
>>   <name>dfs.data.dir</name>
>>   <value>/nutch/filesystem/data</value>
>> </property>
>>
>> <property>
>>   <name>mapred.system.dir</name>
>>   <value>/nutch/filesystem/mapreduce/system</value>
>> </property>
>>
>> <property>
>>   <name>mapred.local.dir</name>
>>   <value>/nutch/filesystem/mapreduce/local</value>
>> </property>
>>
>> <!-- Specifies how many replicas of a file must exist in the file system for it to be reachable. Initially 1. -->
>> <property>
>>   <name>dfs.replication</name>
>>   <value>2</value>
>> </property>
>>
>> I hope someone can help me.
>>
>> Thanks
>>
>> Tom
>>
>
>
> --
> -MilleBii-
>

--
-MilleBii-
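Following up on MilleBii's web-interface suggestion, a quick way to see whether the datanodes actually registered with the namenode (standard Hadoop commands; the ports are the ones from the logs above, the host placeholders are assumptions):

  ./bin/hadoop dfsadmin -report    # lists the registered datanodes and their capacity
  # Jetty web UIs, per the logs:
  #   namenode            http://<master>:50070/
  #   datanode            http://<node>:50075/
  #   secondary namenode  http://<master>:50090/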
