Hello! I'm deploying Nutch on two computers. When I run the start-all.sh
script, everything goes well, but the datanode on the slave computer does
not log anything. All other parts of Hadoop (namenode, jobtracker, both
tasktrackers, and the datanode on the master) log their information properly.
Also, when I put some files
I've just changed the dfs.replication property in hadoop-site.xml from 1
to 2. Now I get ArithmeticException: / by zero when I try to put
something into DFS.
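For reference, a sketch of the hadoop-site.xml fragment in question (the property name comes from the message above; the description text is my own, and the observation that the division-by-zero may simply mean the namenode sees zero live datanodes — consistent with the silent slave datanode reported earlier — is an assumption, not a confirmed diagnosis):

```xml
<property>
  <name>dfs.replication</name>
  <value>2</value>
  <description>Default block replication. A value of 2 needs at least two
  live datanodes; if the slave datanode never comes up, the cluster has
  fewer datanodes than replicas, which may be what triggers the
  ArithmeticException on put.</description>
</property>
```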
I should also mention that both nodes run Windows.
-Original Message-
From: Ilya Vishnevsky [mailto:[EMAIL PROTECTED]
Sent
Hello!
I'm trying to deploy Nutch on two machines (the slave's IP is, for
example, 66.66.66.66). Then I type:
bin/start-all.sh
And get the following output:
starting namenode, logging to
/cygdrive/c/nutch/testdfs/logs/hadoop-nutch-namenode-Thanatos.out
localhost: starting datanode,
Hello
I'm trying to deploy Nutch and Hadoop as shown here:
http://wiki.apache.org/nutch/NutchHadoopTutorial
The difference is that I use Win XP.
After typing bin/hadoop namenode -format in Cygwin, I get the following
exceptions:
Exception in thread "main" java.lang.NoClassDefFoundError:
Hello. I'm trying to run the shell scripts that start Nutch. I use Windows
XP, so I installed Cygwin. When I execute bin/start-all.sh, I get the
following messages:
localhost: /cygdrive/c/nutch/nutch-0.9/bin/slaves.sh: line 45: ssh:
command not found
localhost: /cygdrive/c/nutch/nutch-0.9/bin/slaves.sh: line
Using SegmentReader I encountered the following problem. While trying to
read the content for a URL from a segment, I got the following log:
INFO 2007-05-18 03:26:58,562 [main] SegmentReader - SegmentReader: get
'http://www.casdn.neu.edu/~etam/military/FM19-30/9309ch.pdf'
DEBUG 2007-05-18 03:27:03,593 [main]
Hi. I'm trying to read the crawl_generate sequence file from my Nutch:
SequenceFile.Reader reader = new SequenceFile.Reader (fs, genPath,
conf);
Here is the exception I get when I try to run my code:
java.io.FileNotFoundException:
C:\webdb\aco\segments\20070515133433\crawl_generate (Access is
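One possible cause worth checking: crawl_generate is a directory of part files, and on Windows pointing SequenceFile.Reader at the directory rather than a part file can fail with exactly this kind of access error. Below is a minimal reading loop for comparison, a sketch using the Hadoop 0.x API that ships with Nutch 0.9; the path argument and class name are hypothetical:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class GenerateDump {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // getLocal() reads from the local file system, matching the C:\ path
        // in the exception; use FileSystem.get(conf) for DFS instead.
        FileSystem fs = FileSystem.getLocal(conf);
        // Point at a part file inside crawl_generate, not the directory itself,
        // e.g. .../segments/<timestamp>/crawl_generate/part-00000
        Path genPath = new Path(args[0]);
        SequenceFile.Reader reader = new SequenceFile.Reader(fs, genPath, conf);
        try {
            // Instantiate key/value holders of whatever types the file declares.
            Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
            Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
            while (reader.next(key, value)) {
                System.out.println(key + "\t" + value);
            }
        } finally {
            reader.close();
        }
    }
}
```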
Hi all!
As I understand it, Nutch creates a distributed index in Hadoop, called
Indexes, while indexing fetched segments. Then it merges these Indexes
into one Index on the local file system.
We use parts of Nutch in our project. We want to use only the distributed
index (Indexes). The problem is that we want to
Another question on a similar subject:
For example, I made a recrawl and found some new pages. I want to add
them to my Index. I use the Indexer to create Indexes. How can I add these
Indexes to my already existing Index?
If I just merge the Indexes, I'll lose the documents in my old Index.
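One hedged option, assuming the Lucene version bundled with Nutch 0.9 (the paths are illustrative, and the two-argument FSDirectory.getDirectory call is what I believe that version exposes): open the existing Index with create set to false and merge the new Indexes in with IndexWriter.addIndexes, which appends to, rather than replaces, the documents already present:

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class AppendIndexes {
    public static void main(String[] args) throws IOException {
        // create == false opens the existing Index for appending instead of
        // wiping it, which is what avoids losing the old documents.
        IndexWriter writer = new IndexWriter(
                FSDirectory.getDirectory("/path/to/existing/index", false),
                new StandardAnalyzer(), false);
        try {
            // Hypothetical location of the part indexes produced by Indexer.
            Directory[] newParts = {
                FSDirectory.getDirectory("/path/to/indexes/part-00000", false)
            };
            // Merges the new parts into the open index in place.
            writer.addIndexes(newParts);
            writer.optimize();
        } finally {
            writer.close();
        }
    }
}
```

Note that re-fetched pages will then exist twice, once per crawl; Nutch's dedup step is normally run afterwards to drop the older duplicates.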
Hello! I'm trying to write a document to an existing index. I've created
the following method:
public void testIndexWriter() {
    Analyzer analyzer = new StandardAnalyzer();
    Document testDoc = new Document();
    testDoc.add(new Field("apple", "apple", Field.Store.YES,
I'm sorry, there is FsDirectory in my method, not DCFsDirectory.
-Original Message-
From: Ilya Vishnevsky [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 22, 2007 4:42 PM
To: nutch-user@lucene.apache.org
Subject: Lucene IndexWriter and Nutch index
Hello! I'm trying to write a document