On Mon, Jan 16, 2017 at 5:53 PM, Josh Elser <josh.el...@gmail.com> wrote:
>
> Dylan Hutchison wrote:
>>>
>>> You can configure HDFS to use the RawLocalFileSystem class for file://
>>> URIs which is what is done for a majority of the integration tests.
>>> Beware that you configure the RawLocalFileSystem as the ChecksumFileSystem
>>> (default for file://) will fail miserably around WAL recovery.
>>>
>>> https://github.com/apache/accumulo/blob/master/test/src/main/java/org/apache/accumulo/test/BulkImportVolumeIT.java#L61
>>>
>>
>> Hi Josh, are you saying that the ChecksumFileSystem is required or
>> forbidden for WAL recovery? Looking at the Hadoop code it seems that
>> LocalFileSystem wraps around a RawLocalFileSystem to provide checksum
>> capabilities. Is that right?
>>
>
> Sorry I wasn't clearer: forbidden. If you use the RawLocalFileSystem and you
> should not see any issues. If you use the ChecksumFileSystem (which is the
> default) and you *will* see issues.
The ChecksumFileSystem does nothing on flush; that's why there are WAL problems. The RawLocalFileSystem pushes data to the OS (which may buffer it in memory for a short period) when flush is called. However, RawLocalFileSystem does not offer a way to force data to disk. So with RawLocalFileSystem you can restart Accumulo processes without losing data, but if the OS is restarted, data may be lost.
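
To make the flush-versus-force distinction concrete, here is a minimal sketch using only plain java.io, not Hadoop. The class name FlushVersusSync and both helper methods are made up for illustration, and this is only an analogy for the behavior described above, not what RawLocalFileSystem actually does internally. The first helper stops at flush(), so the bytes reach the OS but may still sit in its cache. The second also calls FileDescriptor.sync(), which asks the OS to write them to the storage device.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; not part of Hadoop or Accumulo.
public class FlushVersusSync {

    // Appends a record and flushes the Java-side buffer. After flush() the
    // bytes have been handed to the OS, so they survive a crash of this
    // process, but the OS may still hold them in memory.
    static void appendAndFlush(Path log, String record) throws IOException {
        try (FileOutputStream file = new FileOutputStream(log.toFile(), true);
             BufferedOutputStream out = new BufferedOutputStream(file)) {
            out.write(record.getBytes(StandardCharsets.UTF_8));
            out.flush();
        }
    }

    // Appends a record, flushes, and then asks the OS to push its buffers
    // for this file to the underlying device before returning.
    static void appendAndSync(Path log, String record) throws IOException {
        try (FileOutputStream file = new FileOutputStream(log.toFile(), true);
             BufferedOutputStream out = new BufferedOutputStream(file)) {
            out.write(record.getBytes(StandardCharsets.UTF_8));
            out.flush();
            file.getFD().sync();
        }
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("wal-demo", ".log");
        appendAndFlush(log, "mutation-1\n");
        appendAndSync(log, "mutation-2\n");
        System.out.println(Files.readAllLines(log));
        Files.delete(log);
    }
}
```

Both helpers leave the same file contents behind. The difference only shows up if the machine, rather than just the process, goes down before the OS writes its cache out.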