[ https://issues.apache.org/jira/browse/OAK-6254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thomas Mueller updated OAK-6254:
--------------------------------
    Priority: Minor  (was: Major)

> DataStore: API to retrieve approximate storage size
> ---------------------------------------------------
>
>                 Key: OAK-6254
>                 URL: https://issues.apache.org/jira/browse/OAK-6254
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: blob
>            Reporter: Thomas Mueller
>            Priority: Minor
>
> The estimated size of the datastore (on disk) is needed to:
> * monitor growth over time, or growth of certain operations
> * monitor whether garbage collection is effective
> * avoid running out of disk space
> * estimate backup size
> * statistical purposes (for example, if there are many repositories, to group
> them by size)
>
> Datastore size: we could use the following heuristic: read the file sizes in
> ./datastore/00/00 (if it exists) and multiply their sum by 65536, or read
> ./datastore/00 and multiply by 256. That would give a rough estimate (within
> about 20% for repositories with a datastore size > 50 GB).
>
> I think this is mainly important for the FileDataStore. For the S3
> datastore, if there is a simple and fast S3 API to read the size, that would
> be good as well; if there is none, returning "unknown" is fine for me.
>
> As for the API, I would use something like this:
> {{long getEstimatedStorageSize(int accuracyLevel)}}, with accuracyLevel 1 for
> inaccurate (fastest), 2 for more accurate (slower), ..., and 9 for precise
> (possibly very slow). This is similar to
> [java.util.zip.Deflater.setLevel|https://docs.oracle.com/javase/7/docs/api/java/util/zip/Deflater.html#setLevel(int)].
> I would expect it to take up to 1 second for accuracyLevel 1, up to 5 seconds
> for accuracyLevel 2, and possibly hours for level 9. With level 1, I would
> read the files in 00/00; with levels 2 to 8, I would read the files in 00;
> and with level 9, I would read all the files. For level 1, I wouldn't apply a
> time limit; for level 2, if it takes more than 5 seconds, I would stop and
> return the current best estimate.
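A minimal Java sketch of the proposed `getEstimatedStorageSize(int accuracyLevel)`, assuming a FileDataStore-style layout where each blob is stored under `datastore/ab/cd/ef/<hex identifier>` and identifiers are content hashes, so `00` holds roughly 1/256 and `00/00` roughly 1/65536 of the files. The class name `DataStoreSizeEstimator`, the `UNKNOWN` (-1) return value, the full-scan fallback when `00` is missing, and the per-prefix extrapolation on timeout are hypothetical choices for illustration, not Oak code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/**
 * Hypothetical sketch (not an Oak API) of the sampling heuristic from OAK-6254.
 * Assumes blobs live under root/ab/cd/ef/abcdef... with uniformly distributed hex ids.
 */
final class DataStoreSizeEstimator {
    /** Hypothetical marker for "unknown", e.g. when the root directory does not exist. */
    public static final long UNKNOWN = -1;

    private final Path root;
    private final long level2BudgetNanos;

    DataStoreSizeEstimator(Path root, long level2BudgetMillis) {
        this.root = root;
        this.level2BudgetNanos = level2BudgetMillis * 1_000_000L;
    }

    long getEstimatedStorageSize(int accuracyLevel) throws IOException {
        if (accuracyLevel < 1 || accuracyLevel > 9) {
            throw new IllegalArgumentException("accuracyLevel must be between 1 and 9");
        }
        if (!Files.isDirectory(root)) {
            return UNKNOWN;
        }
        if (accuracyLevel == 9) {
            return sumFileSizes(root); // precise: read every file (possibly very slow)
        }
        Path prefix00 = root.resolve("00");
        Path prefix0000 = prefix00.resolve("00");
        if (accuracyLevel == 1 && Files.isDirectory(prefix0000)) {
            // 00/00 is assumed to hold ~1/65536 of the blobs; no time limit at level 1
            return sumFileSizes(prefix0000) * 65536;
        }
        if (!Files.isDirectory(prefix00)) {
            // Assumption: without a "00" directory the store is tiny, so a full scan is cheap
            return sumFileSizes(root);
        }
        // Levels 2-8 (and level 1 without 00/00): read "00" one sub-prefix (00..ff) at a time.
        // Only level 2 has a time budget; on timeout, extrapolate from the prefixes read so far.
        long deadline = System.nanoTime() + level2BudgetNanos;
        long sum = 0;
        int visited = 0;
        for (int i = 0; i < 256; i++) {
            sum += sumFileSizes(prefix00.resolve(String.format("%02x", i)));
            visited++;
            if (accuracyLevel == 2 && System.nanoTime() - deadline > 0) {
                break;
            }
        }
        // Each sub-prefix covers ~1/65536 of the blobs; equals sum * 256 when all 256 were read
        return sum * 65536 / visited;
    }

    /** Sums the sizes of all regular files below dir; a missing dir counts as 0 bytes. */
    private static long sumFileSizes(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return 0;
        }
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile).mapToLong(p -> p.toFile().length()).sum();
        }
    }
}
```

Reading `00` one sub-prefix at a time is what makes "return the current best estimate" possible at level 2: after k of the 256 sub-prefixes, the sample covers about k/65536 of the identifier space, so the total is extrapolated as sum × 65536 / k. Levels 3 to 8 behave identically in this sketch, since the proposal does not distinguish them.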
-- This message was sent by Atlassian Jira (v8.3.4#803005)