Hi,

it seems the segment store inlines any binary up to ~16 KB directly in the tar 
files instead of storing it in the BlobStore [1]. The 16 KB limit 
(Segment.MEDIUM_LIMIT [2]) is hardcoded and not configurable.
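For anyone less familiar with the write path, here is a rough Java sketch of 
the decision as I read it in [1]. The writer buffers up to MEDIUM_LIMIT bytes. 
If the stream ends before the buffer is full, the binary is stored as a value 
record inside the segment. Otherwise, assuming a BlobStore is configured, it is 
handed to the BlobStore. The class and enum names are made up for illustration 
and this does not call any Oak API. The constant values are my reading of [2], 
so please double-check them there. If they are right, the limit is 
(1 << 14) + 128 = 16512 bytes, which would explain why the smallest binaries in 
S3 are just above 16 KB.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical names: where a binary ends up in this simplified model.
enum BinaryTarget { SEGMENT_INLINE, BLOB_STORE }

final class InlineThresholdSketch {
    // Assumed to mirror Segment.SMALL_LIMIT and Segment.MEDIUM_LIMIT [2]:
    // (1 << 14) + 128 = 16512 bytes, i.e. "16 KB plus a bit".
    static final int SMALL_LIMIT = 1 << 7;
    static final int MEDIUM_LIMIT = (1 << 14) + SMALL_LIMIT;

    private InlineThresholdSketch() {}

    // Buffer up to MEDIUM_LIMIT bytes. If the stream ends before the buffer
    // is full, the whole binary fits into a value record inside the segment;
    // otherwise it goes to the BlobStore (assuming one is configured).
    static BinaryTarget route(InputStream stream) throws IOException {
        byte[] buffer = new byte[MEDIUM_LIMIT];
        int n = stream.readNBytes(buffer, 0, buffer.length);
        return n < MEDIUM_LIMIT ? BinaryTarget.SEGMENT_INLINE : BinaryTarget.BLOB_STORE;
    }

    // Convenience overload for a binary of the given length (all zero bytes).
    static BinaryTarget route(int length) {
        try {
            return route(new ByteArrayInputStream(new byte[length]));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (int length : new int[] {100, 16 * 1024, MEDIUM_LIMIT, 20_000}) {
            System.out.println(length + " bytes -> " + route(length));
        }
    }
}
```

Under these assumptions, 100 and 16384 bytes stay inline in the segment, while 
16512 and 20000 bytes go to the BlobStore. The same cut-off applies whether the 
content is a tiny thumbnail or a large text, which is exactly the question below.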

I can see this in action when debugging, and also in the S3 bucket of an Oak 
installation using the segment store with an S3 DataStore, where the smallest 
binaries are just over 16 KB.

As Ian pointed out:

> This could bloat Tar files, impact memory mapping, and may be a major 
> consumer of RAM for TarMK mmap mode, but I don't know TarMK well enough to 
> know the logic behind doing that. The OS Disk cache is the correct place to 
> deal with any file over 1 block in size, especially if its accessed 
> sporadically.

At first sight I would agree. However, there may be good reasons for the 
current design, and these concerns may not hold in practice. The same limit is 
essentially used for both STRING and BINARY properties: maybe that makes a lot 
of sense for strings, but not so much for immutable binaries?

Could someone shed some light?

IIUC, it also means the minRecordLength setting [3] of the datastore(s) has no 
effect. That value is meant to be rather low (the default is 100 bytes), 
because binaries that small are encoded directly in the blob id itself. But 
since only binaries larger than ~16 KB ever reach the blob store in a segment 
store setup, every binary the datastore sees is far larger than 
minRecordLength anyway.
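To make the argument concrete, here is a tiny Java model of the two thresholds 
interacting. MinRecordLengthSketch and its methods are hypothetical and do not 
call any Oak API; MEDIUM_LIMIT is my reading of [2], and treating binaries of 
exactly minRecordLength bytes as encoded in the id is an assumption (the real 
boundary may differ by one).

```java
import java.util.stream.IntStream;

final class MinRecordLengthSketch {
    // Assumed value of Segment.MEDIUM_LIMIT [2]: (1 << 14) + (1 << 7) = 16512.
    static final int MEDIUM_LIMIT = (1 << 14) + (1 << 7);

    private MinRecordLengthSketch() {}

    // Segment store: only binaries of at least MEDIUM_LIMIT bytes are
    // handed to the data store; smaller ones stay inline in the tar files.
    static boolean reachesDataStore(int length) {
        return length >= MEDIUM_LIMIT;
    }

    // Data store: binaries up to minRecordLength bytes are encoded in the
    // blob id instead of being uploaded (assumption: exact boundary, < vs <=).
    static boolean encodedInBlobId(int length, int minRecordLength) {
        return length <= minRecordLength;
    }

    // Counts binary lengths 0..maxLength that would end up encoded in a
    // blob id, with or without the segment store in front of the data store.
    static long countEncodedInId(int maxLength, int minRecordLength, boolean segmentStoreInFront) {
        return IntStream.rangeClosed(0, maxLength)
                .filter(len -> !segmentStoreInFront || reachesDataStore(len))
                .filter(len -> encodedInBlobId(len, minRecordLength))
                .count();
    }

    public static void main(String[] args) {
        int maxLength = 1 << 20; // consider binaries up to 1 MB
        System.out.println("data store alone, minRecordLength=100: "
                + countEncodedInId(maxLength, 100, false));
        System.out.println("behind segment store, minRecordLength=100: "
                + countEncodedInId(maxLength, 100, true));
    }
}
```

In this model the data store alone encodes 101 lengths (0 to 100 bytes) in the 
id, but behind the segment store that count drops to 0. The setting would only 
start to matter if minRecordLength were raised to at least MEDIUM_LIMIT.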

[1] https://github.com/apache/jackrabbit-oak/blob/58fdaf0dc0786f4cc9e39e7d26684fda04b32e78/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/DefaultSegmentWriter.java#L648
[2] https://github.com/apache/jackrabbit-oak/blob/trunk/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/Segment.java#L111-L118
[3] https://jackrabbit.apache.org/oak/docs/osgi_config.html

Cheers,
Alex
