Hi all!
As I understand it, Nutch creates a distributed index in Hadoop called
"Indexes" while indexing fetched segments. It then merges these Indexes
into one Index in the local file system.
We use parts of Nutch in our project, and we want to use only the
distributed index ("Indexes"). The problem is that we want to refresh the
index every time we fetch a batch of documents, but I do not know how to
add the newly fetched documents to it. I wrote my own class to replace
Indexer. The only difference is in this instantiation:
IndexWriter(fs.startLocalOutput(perm, temp).toString(),
                          new NutchDocumentAnalyzer(job), true);
where I changed the "create" parameter from true to false.
Nutch still throws FileAlreadyExistsException, raised from

org.apache.hadoop.mapred.OutputFormatBase.checkOutputSpecs(OutputFormatBase.java:96)

Is it possible to add new documents to "Indexes" without completely
rewriting these "Indexes"?
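
In case it helps, here is a minimal sketch of what I assume is going on.
The stack trace points at checkOutputSpecs, so the exception seems to come
from a check on the job's output directory rather than from IndexWriter,
which would mean the "create" flag is never reached. The class below uses
neither Hadoop nor Lucene: OutputCheckSketch, checkOutputSpecs and
freshBatchDir are hypothetical stand-ins written with java.nio.file, and
the java.nio FileAlreadyExistsException stands in for the Hadoop one. It
only illustrates the idea of giving every indexing run its own new
directory instead of reusing an existing one.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

public class OutputCheckSketch {

    // Hypothetical stand-in for the job-level output check: it refuses to
    // proceed if the output directory already exists, regardless of what
    // the index writer would do afterwards.
    public static void checkOutputSpecs(Path outputDir) throws IOException {
        if (Files.exists(outputDir)) {
            throw new FileAlreadyExistsException(outputDir.toString());
        }
    }

    // Hypothetical helper: each batch of fetched documents gets a new
    // directory under the parent, so the check above passes.
    public static Path freshBatchDir(Path parent, String batchName) throws IOException {
        Path dir = parent.resolve(batchName);
        checkOutputSpecs(dir);
        return Files.createDirectories(dir);
    }

    public static void main(String[] args) throws IOException {
        Path parent = Files.createTempDirectory("indexes-sketch");

        Path first = freshBatchDir(parent, "batch-1");
        System.out.println("created " + first.getFileName());

        try {
            // Reusing the same output directory is rejected by the check.
            freshBatchDir(parent, "batch-1");
        } catch (FileAlreadyExistsException e) {
            System.out.println("rejected existing output directory");
        }

        // A new name for the next batch passes the check.
        Path second = freshBatchDir(parent, "batch-2");
        System.out.println("created " + second.getFileName());
    }
}
```

If this assumption is right, each run would produce a separate index
directory, and I still do not know whether those can be used together as
one "Indexes" without a full rewrite.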

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
