Hi Mikhail,

You're right - the use case of many smaller datasets isn't best served by memory-mapped mode.

The mode of operation currently has to be set very early, on a per-JVM basis, and ideally passed to the JVM itself: -Dtdb:fileMode=direct . This is because TDB reads the setting rather early. There is no fundamental reason for this and it could be done on a per-dataset basis; it just isn't.

While the files show as 200MB, they are sparse files. Linux will show 8M files with "ls -l", but the directory, according to "du -sh", is 208K. Sparse files don't allocate all their space. OS/X seems to be different - "du -sh" reports the sum of the file sizes, but they are still sparse files and don't consume all their disk space.
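
To see the difference between apparent and allocated size for yourself, here is a minimal sketch in Python (not part of TDB; the file name "index.dat" and the helper names are made up for illustration). It creates an 8MB file without writing any data, then compares the size "ls -l" would report with the space actually allocated. It assumes a POSIX system and a filesystem that supports sparse files; st_blocks is not available on Windows.

```python
import os
import tempfile

# 8 MB, the per-index file size quoted in this thread.
SEGMENT_SIZE = 8 * 1024 * 1024


def make_sparse_file(path, size):
    """Create a file whose apparent size is `size` without writing any data."""
    with open(path, "wb") as f:
        # Extending a file with truncate() leaves a hole on filesystems
        # that support sparse files.
        f.truncate(size)


def apparent_and_allocated(path):
    """Return (apparent size in bytes, bytes actually allocated on disk)."""
    st = os.stat(path)
    # On POSIX systems st_blocks counts 512-byte units.
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "index.dat")  # hypothetical file name
        make_sparse_file(path, SEGMENT_SIZE)
        apparent, allocated = apparent_and_allocated(path)
        print("apparent size (ls -l):", apparent)
        print("allocated size (du)  :", allocated)
```

On a filesystem with sparse-file support the allocated figure should come out far smaller than the apparent one, which is the same gap as between "ls -l" and "du -sh" above.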

In theory, the index segment size is configurable (see SystemTDB.SegmentSize) but changing it isn't covered by the test suite.

        Andy

On 12/09/11 18:13, Mikhail Sogrin wrote:
Hi,

With memory mapped TDB storage (default with 64-bit JVM), the initial size
of TDB store without any data at all is 200 MB, because most of index files
are 8 MB, and there's quite a number of them.
It may be a good number when loading big data sets, but is absolutely huge
if a user expects to load only a bit of data.

In comparison, direct file method (with 32-bit JVM) makes only 8 KB index
files resulting in only 200 KB usage for an empty database.

Is there a way to configure initial size of index files?
The only method I could think of was to set 'direct' method, create dataset,
close it, set method to 'mapped' and open dataset again. But it prints a
warning "System file mode already determined - setting it has no effect",
and yes, the second setting does not seem to have any effect.

Kind regards,
Mikhail
