Hello Martin & Marc,

Since you mention MappedByteBuffer, here are some notes on my experience with memory-mapped IO over the last decade:
I used mmap() heavily back in 2001 to process GeoTIFF images and Shapefiles in C++. Later I switched to Java, using NIO with MappedByteBuffer, and built some crazy fast processing tools. Unfortunately I also ran into some problems:

1) I got unexpected OutOfMemoryErrors, and it took me a long time to understand their source. They were not caused by a lack of Java heap space (-Xmx...). Instead, the system was unable to allocate additional virtual address space beyond what Java itself had already allocated for the heap. This occurred especially on 32-bit systems. While Linux may assign up to 3GB of virtual memory to a user process, stupid XP gets exhausted below 1GB (and you have to subtract the Java heap space already allocated). Today we mainly use 64-bit systems, but I still observed sporadic OOM errors even on 64-bit systems (this was around 2009, so maybe it has gone away with Java 7/8 in the meantime).

2) In contrast to C (munmap()), there is no way to explicitly unmap a MappedByteBuffer in Java. Even worse, the associated file is kept open, which is a minor problem on Unix but raises major problems on Windows due to its stupid mandatory locking (see http://bugs.java.com/bugdatabase/view_bug.do?bug_id=4724038). The problem is that the mapping and the file channel are not released until the garbage collector finally wipes the buffer. In addition, you may run into a "too many open files" problem if you process many files using MMIO (see: http://stackoverflow.com/questions/13204656/too-many-open-file-error-java-io-filenotfoundexception).

My conclusion was to give up on MappedByteBuffer as a way to speed up IO. (I still use it occasionally, e.g. for modifying the colormap of a GeoTIFF image on the fly...) Instead I switched back to plain ByteBuffers, as you mentioned. But it may still be useful to use direct ByteBuffers. Those are allocated outside the Java heap space, just like MappedByteBuffer, but without locking any external file resource.
This may still be problematic on 32-bit systems, but I think running big data applications on 32-bit is a bad idea anyway (and still using XP in particular!)

Dieter.

-----Original Message-----
From: Martin Desruisseaux [mailto:[email protected]]
Sent: Monday, 5 January 2015 16:38
To: [email protected]
Subject: Tip (for future consideration) on Channel + Buffer use

Hello Marc

Just a tip for later (at your choice): since our reading of Shapefile data is (for now) essentially sequential, it would be nice to use a plain ByteBuffer instead of a MappedByteBuffer, in order to use fewer OS resources and to avoid being restricted to File inputs (other inputs could be URLs or entries in a ZIP file. The latter is especially useful for implementing Web Services that can return only a single file).

Using a plain java.nio.ByteBuffer is a little bit more difficult because we have to fill the buffer ourselves from the java.nio.channels.ReadableByteChannel. To make this task easier, we have this internal class:

storage/sis-storage/src/main/java/org/apache/sis/internal/storage/ChannelDataInput.java

This class takes the supplied ByteBuffer and ReadableByteChannel, and provides convenience methods like readByte(), readDouble(), etc. which automatically handle the task of transferring data from the channel to the buffer when needed.

For the record, the reason why this class is not public is that it breaks encapsulation: the ByteBuffer and the ReadableByteChannel are exposed publicly. This is intentional, since this class is designed only as a convenience for SIS implementations, which may want to switch between the convenience methods for some tasks and direct usage of the channel and buffer for other tasks.

Martin
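
As a rough illustration of the approach described above (filling a plain ByteBuffer from a ReadableByteChannel on demand), here is a minimal Java sketch. It is not the actual ChannelDataInput code from SIS: the class name ChannelReaderSketch, its methods and the sample values are made up for this example, and it assumes a blocking channel.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

/** Hypothetical helper for illustration only; NOT the real ChannelDataInput. */
final class ChannelReaderSketch {
    private final ReadableByteChannel channel;
    private final ByteBuffer buffer;

    ChannelReaderSketch(ReadableByteChannel channel, ByteBuffer buffer) {
        this.channel = channel;
        this.buffer = buffer;
        buffer.clear();
        buffer.limit(0);            // start in "read mode" with nothing to read yet
    }

    /** Makes sure at least n unread bytes are in the buffer, reading from the channel if needed. */
    private void ensureAvailable(int n) throws IOException {
        if (n > buffer.capacity()) {
            throw new IllegalArgumentException("Buffer too small for " + n + " bytes");
        }
        while (buffer.remaining() < n) {
            buffer.compact();                   // keep unread bytes, switch to "write mode"
            int count = channel.read(buffer);   // assumes a blocking channel
            buffer.flip();                      // back to "read mode"
            if (count < 0) {
                throw new EOFException("Channel ended before " + n + " bytes were available");
            }
        }
    }

    byte readByte() throws IOException {
        ensureAvailable(Byte.BYTES);
        return buffer.get();
    }

    int readInt() throws IOException {
        ensureAvailable(Integer.BYTES);
        return buffer.getInt();
    }

    double readDouble() throws IOException {
        ensureAvailable(Double.BYTES);
        return buffer.getDouble();
    }

    public static void main(String[] args) throws IOException {
        // Sample data: an int, a double and a byte in the default (big-endian) order.
        byte[] data = ByteBuffer.allocate(13).putInt(9994).putDouble(2.5).put((byte) 7).array();
        ReadableByteChannel channel = Channels.newChannel(new ByteArrayInputStream(data));

        // A deliberately tiny buffer forces several refills.
        // ByteBuffer.allocateDirect(8) could be used here instead of a heap buffer.
        ChannelReaderSketch in = new ChannelReaderSketch(channel, ByteBuffer.allocate(8));
        System.out.println(in.readInt());
        System.out.println(in.readDouble());
        System.out.println(in.readByte());
    }
}
```

Because the source is only a ReadableByteChannel, the same reader works for a file, a URL stream or a ZIP entry wrapped with Channels.newChannel(...), and nothing stays mapped or locked once the channel is closed.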
