Hello Dieter
Thanks a lot for sharing your experience; it is truly appreciated.
I remember seeing discussions on the OpenJDK mailing list about
MappedByteBuffer releasing its resources only when garbage collected,
and I also noticed that it allocates memory outside the JVM heap. It
seems that we had similar experiences; I'm glad that you confirmed it.
I found MappedByteBuffer worth its weight when many accesses to a large
file happen at random locations, for example when performing a binary
search. But I think that such usages are rare. In the case of DataStore,
where most accesses are expected to be sequential, I hope we can limit
ourselves to ordinary ByteBuffer as much as possible...
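
To make the two access patterns concrete, here is a minimal sketch. It
assumes a hypothetical file of sorted 8-byte big-endian values, smaller
than 2 GB; the class and method names are illustrative only and are not
part of any DataStore API.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class BufferAccessSketch {
    /**
     * Random access: binary search in a memory-mapped file of sorted longs.
     * Returns the index of the key, or -1 if absent.
     * A single mapping is limited to Integer.MAX_VALUE bytes.
     */
    static int binarySearch(Path file, long key) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer map =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            int low = 0;
            int high = map.capacity() / Long.BYTES - 1;
            while (low <= high) {
                int mid = (low + high) >>> 1;
                long value = map.getLong(mid * Long.BYTES);   // absolute read
                if (value < key) {
                    low = mid + 1;
                } else if (value > key) {
                    high = mid - 1;
                } else {
                    return mid;
                }
            }
            return -1;
        }
        // Closing the channel does not unmap the buffer; the mapping
        // lives until the buffer is garbage collected.
    }

    /**
     * Sequential access: sum all longs using an ordinary heap ByteBuffer.
     * Trailing bytes that do not form a complete long are ignored.
     */
    static long sum(Path file) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(8192);
        long total = 0;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            while (channel.read(buffer) != -1) {
                buffer.flip();                       // switch to reading
                while (buffer.remaining() >= Long.BYTES) {
                    total += buffer.getLong();
                }
                buffer.compact();                    // keep any partial value
            }
        }
        return total;
    }
}
```

In the sequential case nothing is allocated outside the Java heap and
the file is released as soon as the channel is closed.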
Martin
On 05/01/15 18:42, Dieter Stüken wrote:
> Hello Martin & Marc,
>
> Since you mention MappedByteBuffer here are some notes on my experiences with
> memory mapped IO during the last decade:
>
> I used mmap() heavily back in 2001 to process GeoTIFF images and Shapefiles
> in C++.
> Later on I switched to Java using NIO with MappedByteBuffer, realizing crazy
> fast processing tools.
>
> Unfortunately I also encountered some problems:
>
> 1) I got unexpected OutOfMemoryError and it took a long time for me to
> understand the source of this problem.
>
> It was not caused by missing Java heap space (-Xmx...). Instead the system
> was unable to allocate additional virtual address space beyond the heap space
> java itself already allocated. This occurred especially on 32bit systems.
> While Linux may assign up to 3GB virtual memory to a user process, stupid XP
> gets exhausted below 1GB (and you have to subtract the java heap space
> already allocated).
>
> Today we mainly use 64-bit systems, but I still observed sporadic OOM errors
> even on 64-bit systems (but this was around 2009, so maybe this has gone away
> with Java 7/8 in the meantime).
>
> 2) In contrast to C there is no way to explicitly unmap() any
> MappedByteBuffer in Java.
>
> Even worse the associated file is kept open, which is a minor problem on Unix
> but raises major problems on Windows due to its stupid mandatory locking.
> (see http://bugs.java.com/bugdatabase/view_bug.do?bug_id=4724038). The
> problem is, that the mapping and the file channel are not released until the
> garbage collector finally wipes the buffer. In addition you may run into a
> "too many open files" problem if you are about to process many files using
> MMIO. (see:
> http://stackoverflow.com/questions/13204656/too-many-open-file-error-java-io-filenotfoundexception)
>
>
> My conclusion was to give up on MappedByteBuffer as a way to speed up IO. (I
> still use it occasionally, e.g. for modifying the colormap of a GeoTIFF image
> on the fly...) Instead I switched back to plain ByteBuffers again, as you
> mentioned. But it may
> still be useful to use direct ByteBuffers. Those are allocated outside the
> Java heap space, just like MappedByteBuffer, but without locking any external
> file resource. This may still be problematic on 32bit systems, but I think
> running big data applications on 32bit is a bad idea anyway (and still using
> XP particularly!)
>
> Dieter.