I have found that if I load a Large Binary out of the DB and
xdmp:document-insert() it at a different URI in the same DB, the Large Data
usage for the DB doesn't change, and the Large directory in the forest
directory doesn't change either. However, if I load the binary from the file
system, the Large Data usage grows by the size of each Large Binary being
document-inserted. This leads me to believe there is an optimization at work:
Large Binaries that originate from the DB are just pointers back to a single
master copy of the binary data itself. So if the file is in ten places in the
DB, even under different URIs, MarkLogic points all of them back to a single
binary file.
When I delete or rename one of these binaries, it doesn't seem to affect the
others that share the same "parent." This seems very useful, except when I'm
trying to generate a lot of Large Binary data for testing, only to find that
it's all linked under the covers and therefore not a very good test set.
Hence, I have been loading the seed files from the file system to prevent
that linkage and generate a large test set from a small set of binaries.
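For reference, here is a minimal XQuery sketch of the two loading paths I'm
comparing; the URIs and the file path are made up for illustration:

```xquery
(: Path 1: copy an existing Large Binary within the DB.
   Large Data usage stays flat, suggesting a shared master copy. :)
xdmp:document-insert("/test/copy-1.bin", fn:doc("/seed/big.bin"));

(: Path 2: load the same content from the file system.
   Large Data usage grows by the file's size each time. :)
xdmp:document-load("/tmp/big.bin",
  <options xmlns="xdmp:document-load">
    <uri>/test/from-fs-1.bin</uri>
  </options>)
```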
Is my understanding mostly correct?
-Rayn
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general