Andrzej Bialecki wrote:
I have a problem with the recently added CRC files when "put"-ting stuff to NDFS. NDFS complains that these files already exist (I suspect it creates them on the fly just before they are actually transmitted from the NDFSClient) and aborts the transfer. The -put operation only succeeded after I first deleted all .*.crc files.

I have not seen this. Can you tell me more about how to reproduce this problem, perhaps by providing a transcript of a session? Are you overwriting existing files?

A crc file is created just after the file is opened for output, and it overwrites any existing crc file. See NFSDataOutputStream.java line 44.
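To illustrate the create-on-open behaviour described above, here is a minimal sketch of an output stream that opens a sidecar checksum file at the same moment as the data file, truncating any crc file left over from a previous write. The class name `ChecksummedOutput`, the `.<name>.crc` naming, the 512-byte chunk size and the one-CRC32-per-chunk layout are all assumptions for illustration; this is not the actual NFSDataOutputStream format.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

// Hypothetical sketch, not Nutch's real on-disk format: data goes to `file`,
// one CRC32 per BYTES_PER_SUM bytes goes to a hidden ".<name>.crc" sibling.
class ChecksummedOutput extends OutputStream {
    static final int BYTES_PER_SUM = 512; // assumed chunk size

    private final OutputStream data;
    private final DataOutputStream sums;
    private final CRC32 crc = new CRC32();
    private int inChunk = 0;

    ChecksummedOutput(Path file) throws IOException {
        this.data = Files.newOutputStream(file);
        // Opened immediately; the default options (CREATE, TRUNCATE_EXISTING,
        // WRITE) overwrite any crc file left over from an earlier write.
        this.sums = new DataOutputStream(Files.newOutputStream(crcPath(file)));
    }

    static Path crcPath(Path file) {
        return file.resolveSibling("." + file.getFileName() + ".crc");
    }

    @Override
    public void write(int b) throws IOException {
        data.write(b);
        crc.update(b);
        if (++inChunk == BYTES_PER_SUM) {
            emitSum(); // a full chunk has been written
        }
    }

    private void emitSum() throws IOException {
        sums.writeInt((int) crc.getValue()); // low 32 bits hold the CRC
        crc.reset();
        inChunk = 0;
    }

    @Override
    public void close() throws IOException {
        if (inChunk > 0) {
            emitSum(); // checksum for the trailing partial chunk
        }
        data.close();
        sums.close();
    }
}
```

Because the crc file is (re)created whenever the data file is opened, a file copied in by some other tool simply has no matching crc file. That is the situation in which checksum verification has nothing to compare against.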

There are a few cases where tools will complain about non-existent .crc files. This happens, e.g., when putting a file that was not created by Nutch tools.

It also notably happens with Lucene indexes. These are created by FSDirectory rather than NDFSDirectory, because NDFS does not permit overwrites and Lucene overwrites in one place (TermInfosWriter.java line 141). If we modify Lucene to write the term count at EOF-8 (i.e., in the final 8 bytes of the file), then Lucene indexes can be written directly through a NutchFileSystem API and will be correctly checksummed at creation. Is this change to Lucene justified?
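The proposed layout change can be sketched as follows. The term count is only known after every term has been written, so a header slot for it forces a seek back and an overwrite. Appending the count as the last 8 bytes keeps the writer strictly sequential. The class `TrailerCountDemo`, its methods and the `writeUTF` term encoding are hypothetical stand-ins, not Lucene's TermInfosWriter or its file format.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch of "count at EOF-8"; not Lucene's real .tis format.
final class TrailerCountDemo {
    // Writes each term sequentially, then the count as a long in [EOF-8, EOF).
    // No byte is ever rewritten, so an append-only file system could store it.
    static void write(Path file, List<String> terms) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            long count = 0;
            for (String term : terms) {
                out.writeUTF(term); // 2-byte length prefix + modified UTF-8
                count++;
            }
            out.writeLong(count); // trailer instead of seeking back to a header
        }
    }

    // The reader, which only reads, seeks to EOF-8 to recover the count.
    static long readCount(Path file) throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(file.toFile(), "r")) {
            in.seek(in.length() - 8);
            return in.readLong();
        }
    }
}
```

The design choice moves the seek from the writer to the reader: writing becomes append-only, while reading needs one extra seek to the end of the file before scanning terms.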

Doug
