The binary content manager implementation (MappedByteBuffer or an
alternative) should be improved.
I handle:
.dbf files, and soon their .ndx indexes (I haven't encountered these index
files before, but perhaps they really exist?);
.shp files and their .shx companions.
When a caller sends a request "SELECT * FROM ... WHERE <condition>" in
order to get some features, if we are lucky we will be able to use the .ndx
file on the dBase III side, and maybe the .shx file on the shapefile side:
keeping the ability to do some random access would be useful.
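As a sketch of that random-access path: each .shx record is 8 bytes, storing
the record's offset and content length in the .shp file as big-endian 32-bit
integers counted in 16-bit words (per the ESRI Shapefile specification). The
class name below is hypothetical, not an existing SIS class:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ShxRecordSketch {
    /** Converts a value counted in 16-bit words to bytes, as the spec requires. */
    static int wordsToBytes(int words) {
        return words * 2;
    }

    public static void main(String[] args) {
        // Fabricated .shx record: offset = 50 words, content length = 28 words.
        ByteBuffer shxRecord = ByteBuffer.allocate(8).order(ByteOrder.BIG_ENDIAN);
        shxRecord.putInt(50).putInt(28).flip();

        int offsetWords = shxRecord.getInt();
        int lengthWords = shxRecord.getInt();

        // Byte position where the corresponding .shp record header starts,
        // usable with SeekableByteChannel.position(long) for random access.
        int shpBytePosition = wordsToBytes(offsetWords);
        int contentBytes    = wordsToBytes(lengthWords);
        System.out.println(shpBytePosition + " " + contentBytes);  // 100 56
    }
}
```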
The classes I have are currently able to read, but it would be really
convenient if they could also read, update, insert and remove content (or
mark records as deleted); otherwise I have to implement a
Dbase3BinaryWriter alongside the DBase3BinaryReader, and likewise for the
Shapefile.
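For the delete side, marking a dBase III record as deleted only means
overwriting its one-byte status flag ('*' for deleted, space for active);
the record position follows from the header, where bytes 8-9 give the header
length and bytes 10-11 the record length, both little-endian. A rough sketch
against an in-memory buffer (the class and method names are made up, not an
existing writer):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class DbfDeleteSketch {
    /**
     * Marks record {@code recordIndex} (0-based) as deleted by writing '*'
     * over its status byte. {@code dbf} holds the whole file content.
     */
    static void markDeleted(ByteBuffer dbf, int recordIndex) {
        dbf.order(ByteOrder.LITTLE_ENDIAN);
        int headerLength = dbf.getShort(8)  & 0xFFFF;   // bytes 8-9
        int recordLength = dbf.getShort(10) & 0xFFFF;   // bytes 10-11
        int position = headerLength + recordIndex * recordLength;
        dbf.put(position, (byte) '*');                  // deletion flag
    }

    public static void main(String[] args) {
        // Fabricated minimal layout: 32-byte header, 10-byte records, 2 records.
        ByteBuffer dbf = ByteBuffer.allocate(32 + 20).order(ByteOrder.LITTLE_ENDIAN);
        dbf.putShort(8, (short) 32);    // header length
        dbf.putShort(10, (short) 10);   // record length
        dbf.put(32, (byte) ' ');        // record 0: active
        dbf.put(42, (byte) ' ');        // record 1: active

        markDeleted(dbf, 1);
        System.out.println((char) dbf.get(42));  // *
    }
}
```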
Marc.
-----Original Message-----
From: Martin Desruisseaux
Sent: Monday, January 05, 2015 8:47 PM
To: [email protected]
Subject: Re: AW: Tip (for future consideration) on Channel + Buffer use
Hello Dieter
Thanks a lot for sharing your experience, this is truly appreciated.
I remember having seen discussions on the OpenJDK mailing list about
MappedByteBuffer releasing resources only when garbage collected, and I
also noticed that it allocates memory outside the JVM heap. It seems we
had a similar experience; I'm glad you gave us confirmation.
I found MappedByteBuffer worth its weight when many accesses to a large
file happen at random locations, for example when performing a binary
search. But I think those usages are rare. In the case of DataStore,
where most accesses are expected to be sequential, I hope we can limit
ourselves to ordinary ByteBuffer as much as possible...
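A minimal sketch of that sequential style, assuming nothing about the SIS
DataStore API: the file is read through an ordinary heap ByteBuffer refilled
from a FileChannel, so no off-heap mapping or lingering file lock is involved:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SequentialReadSketch {
    /** Sums all bytes of a file sequentially through a small heap buffer. */
    static long sumBytes(Path file) throws IOException {
        long sum = 0;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(8192);  // ordinary heap buffer
            while (channel.read(buffer) != -1) {
                buffer.flip();
                while (buffer.hasRemaining()) {
                    sum += buffer.get() & 0xFF;
                }
                buffer.clear();  // ready for the next refill
            }
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("seq", ".bin");
        Files.write(tmp, new byte[] {1, 2, 3, 4});
        System.out.println(sumBytes(tmp));  // 10
        Files.delete(tmp);
    }
}
```

Closing the try-with-resources block releases the file immediately, which is
exactly what a MappedByteBuffer cannot guarantee.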
Martin
On 05/01/15 18:42, Dieter Stüken wrote:
Hello Martin & Marc,
Since you mention MappedByteBuffer, here are some notes on my experiences
with memory-mapped IO during the last decade:
I heavily used mmap() back in 2001 to process GeoTIFF images and
Shapefiles too, using C++.
Later on I switched to Java, using NIO with MappedByteBuffer and realizing
crazy fast processing tools.
Unfortunately I also encountered some problems:
1) I got unexpected OutOfMemoryErrors, and it took me a long time to
understand the source of this problem.
It was not caused by missing Java heap space (-Xmx...). Instead, the system
was unable to allocate additional virtual address space beyond the heap
space Java itself had already allocated. This occurred especially on 32-bit
systems. While Linux may assign up to 3 GB of virtual memory to a user
process, stupid XP gets exhausted below 1 GB (and you have to subtract the
Java heap space already allocated).
Today we mainly use 64-bit systems, but I still observed sporadic OOM
errors even on 64-bit systems (though that was around 2009, so maybe this
has since gone away with Java 7/8).
2) In contrast to C, there is no way to explicitly unmap() a
MappedByteBuffer in Java.
Even worse, the associated file is kept open, which is a minor problem on
Unix but raises major problems on Windows due to its stupid mandatory
locking (see
http://bugs.java.com/bugdatabase/view_bug.do?bug_id=4724038). The problem
is that the mapping and the file channel are not released until the
garbage collector finally wipes the buffer. In addition, you may run into a
"too many open files" problem if you are about to process many files using
MMIO (see:
http://stackoverflow.com/questions/13204656/too-many-open-file-error-java-io-filenotfoundexception).
My conclusion was to give up MappedByteBuffer for speeding up IO. (I still
use it rarely; e.g. for modifying the colormap of a GeoTIFF image on the
fly...)
Instead I switched back to plain ByteBuffers again, as you mentioned. But
it may still be useful to use direct ByteBuffers. Those are allocated
outside the Java heap space, just like a MappedByteBuffer, but without
locking any external file resource. This may still be problematic on 32-bit
systems, but I think running big-data applications on 32-bit is a bad idea
anyway (especially if still using XP!).
Dieter.