The binary content manager implementation (MappedByteBuffer or an
alternative) should be improved.
I handle:
.dbf files, and soon their .ndx indexes (I haven't encountered these index
files before, but perhaps they really exist?);
.shp files and their .shx companions.
When a caller sends a request "SELECT * FROM ... WHERE <condition>" in
order to get some features, if we are lucky we will be able to use the .ndx
file on the dBase III side, and maybe the .shx file on the shapefile side:
keeping the ability to do some random access would be useful.
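As a sketch of that random-access path: each .shx record is 8 bytes, storing
the record's offset and content length in the .shp file as big-endian 32-bit
integers counted in 16-bit words (per the ESRI Shapefile specification). The
class name below is hypothetical, not an existing SIS class:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ShxRecordSketch {
    /** Converts a value counted in 16-bit words to bytes, as the spec requires. */
    static int wordsToBytes(int words) {
        return words * 2;
    }

    public static void main(String[] args) {
        // Fabricated .shx record: offset = 50 words, content length = 28 words.
        ByteBuffer shxRecord = ByteBuffer.allocate(8).order(ByteOrder.BIG_ENDIAN);
        shxRecord.putInt(50).putInt(28).flip();

        int offsetWords = shxRecord.getInt();
        int lengthWords = shxRecord.getInt();

        // Byte position where the corresponding .shp record header starts,
        // usable with SeekableByteChannel.position(long) for random access.
        int shpBytePosition = wordsToBytes(offsetWords);
        int contentBytes    = wordsToBytes(lengthWords);
        System.out.println(shpBytePosition + " " + contentBytes);  // 100 56
    }
}
```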
The classes I have are currently able to read, but it would be really
convenient if they could also read, update, insert and remove content (or
mark records as deleted); otherwise I have to implement a
Dbase3BinaryWriter alongside the DBase3BinaryReader, and likewise for the
Shapefile.
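For the delete side, marking a dBase III record as deleted only means
overwriting its one-byte status flag ('*' for deleted, space for active);
the record position follows from the header, where bytes 8-9 give the header
length and bytes 10-11 the record length, both little-endian. A rough sketch
against an in-memory buffer (the class and method names are made up, not an
existing writer):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class DbfDeleteSketch {
    /**
     * Marks record {@code recordIndex} (0-based) as deleted by writing '*'
     * over its status byte. {@code dbf} holds the whole file content.
     */
    static void markDeleted(ByteBuffer dbf, int recordIndex) {
        dbf.order(ByteOrder.LITTLE_ENDIAN);
        int headerLength = dbf.getShort(8)  & 0xFFFF;   // bytes 8-9
        int recordLength = dbf.getShort(10) & 0xFFFF;   // bytes 10-11
        int position = headerLength + recordIndex * recordLength;
        dbf.put(position, (byte) '*');                  // deletion flag
    }

    public static void main(String[] args) {
        // Fabricated minimal layout: 32-byte header, 10-byte records, 2 records.
        ByteBuffer dbf = ByteBuffer.allocate(32 + 20).order(ByteOrder.LITTLE_ENDIAN);
        dbf.putShort(8, (short) 32);    // header length
        dbf.putShort(10, (short) 10);   // record length
        dbf.put(32, (byte) ' ');        // record 0: active
        dbf.put(42, (byte) ' ');        // record 1: active

        markDeleted(dbf, 1);
        System.out.println((char) dbf.get(42));  // *
    }
}
```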
Marc.
-----Original Message-----
From: Martin Desruisseaux
Sent: Monday, January 05, 2015 8:47 PM
To: [email protected]
Subject: Re: AW: Tip (for future consideration) on Channel + Buffer use
Hello Dieter
Thanks a lot for sharing your experience, this is truly appreciated.
I remember having seen discussions on the OpenJDK mailing list about
MappedByteBuffer releasing resources only when garbage collected, and I
also noticed that it allocates memory outside the JVM heap. It seems we
had a similar experience; I'm glad you gave us confirmation.
I found MappedByteBuffer worth its weight when many accesses to a large
file happen at random locations, for example when performing a binary
search. But I think those usages are rare. In the case of DataStore,
where most accesses are expected to be sequential, I hope we can limit
ourselves to ordinary ByteBuffer as much as possible...
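A minimal sketch of that sequential style, assuming nothing about the SIS
DataStore API: the file is read through an ordinary heap ByteBuffer refilled
from a FileChannel, so no off-heap mapping or lingering file lock is involved:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SequentialReadSketch {
    /** Sums all bytes of a file sequentially through a small heap buffer. */
    static long sumBytes(Path file) throws IOException {
        long sum = 0;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(8192);  // ordinary heap buffer
            while (channel.read(buffer) != -1) {
                buffer.flip();
                while (buffer.hasRemaining()) {
                    sum += buffer.get() & 0xFF;
                }
                buffer.clear();  // ready for the next refill
            }
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("seq", ".bin");
        Files.write(tmp, new byte[] {1, 2, 3, 4});
        System.out.println(sumBytes(tmp));  // 10
        Files.delete(tmp);
    }
}
```

Closing the try-with-resources block releases the file immediately, which is
exactly what a MappedByteBuffer cannot guarantee.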
Martin
On 05/01/15 18:42, Dieter Stüken wrote:
Hello Martin & Marc,
Since you mention MappedByteBuffer, here are some notes on my experiences
with memory-mapped IO during the last decade:
I heavily used mmap() back in 2001 to process GeoTIFF images and
Shapefiles too, using C++.
Later on I switched to Java, using NIO with MappedByteBuffer and realizing
crazy fast processing tools.
Unfortunately I also encountered some problems:
1) I got unexpected OutOfMemoryErrors, and it took me a long time to
understand the source of this problem.
It was not caused by missing Java heap space (-Xmx...). Instead, the system
was unable to allocate additional virtual address space beyond the heap
space Java itself had already allocated. This occurred especially on 32-bit
systems. While Linux may assign up to 3 GB of virtual memory to a user
process, stupid XP gets exhausted below 1 GB (and you have to subtract the
Java heap space already allocated).
Today we mainly use 64-bit systems, but I still observed sporadic OOM
errors even on 64-bit systems (though that was around 2009, so maybe this
has since gone away with Java 7/8).
2) In contrast to C, there is no way to explicitly unmap() a
MappedByteBuffer in Java.
Even worse, the associated file is kept open, which is a minor problem on
Unix but raises major problems on Windows due to its stupid mandatory
locking (see
http://bugs.java.com/bugdatabase/view_bug.do?bug_id=4724038). The problem
is that the mapping and the file channel are not released until the
garbage collector finally wipes the buffer. In addition, you may run into a
"too many open files" problem if you are about to process many files using
MMIO (see:
http://stackoverflow.com/questions/13204656/too-many-open-file-error-java-io-filenotfoundexception).
My conclusion was to give up MappedByteBuffer for speeding up IO. (I still
use it rarely; e.g. for modifying the colormap of a GeoTIFF image on the
fly...)
Instead I switched back to plain ByteBuffers again, as you mentioned. But
it may still be useful to use direct ByteBuffers. Those are allocated
outside the Java heap space, just like a MappedByteBuffer, but without
locking any external file resource. This may still be problematic on 32-bit
systems, but I think running big-data applications on 32-bit is a bad idea
anyway (especially if still using XP!).
Dieter.