[ https://issues.apache.org/jira/browse/OAK-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16291272#comment-16291272 ]

Chetan Mehrotra commented on OAK-7066:
--------------------------------------

bq. I think adding a method "isInline" would be better

+1 for such a method to Blob interface

> Active deletion blob list files can grow too large due to inlined blobs
> -----------------------------------------------------------------------
>
>                 Key: OAK-7066
>                 URL: https://issues.apache.org/jira/browse/OAK-7066
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene
>            Reporter: Vikas Saurabh
>            Assignee: Vikas Saurabh
>
> This is follow up from OAK-7052 where we noticed that deleted blob list files
> collected by active deletion logic can grow very large due to inlined blobs.
> One potential way (not sure how yet though) is to not actively delete inlined
> blobs.
>
> Here are some stats which might help us take a call (based on raw numbers
> collected at \[0])
>
> ||file-name||large_lines||large_size||small_lines||small_size||small_lines/total_lines||small_size/total_size||
> |blobs-1512664032264.txt|245301|3310224358|173096|35473656|0.413712335413495|0.010602766852107|
> |blobs-1512698405656.txt|370373|4443957885|256775|52997864|0.409432861142824|0.011785275852845|
> |blobs-1512987450004.txt|660669|6214740439|461168|92017554|0.411082893504137|0.014590309966251|
> |blobs-1513130410963.txt|569083|5490965583|406756|80124598|0.416826956085994|0.014382211631264|
> |blobs-1513216819447.txt|69876|1413561892|46238|9221956|0.398212101899857|0.006481628262061|
>
> \[0]:
> file sizes
> {noformat}
> repository/index/deleted-blobs$ ls -l blobs-151*
> -rw-r--r-- 1 root root 3369065620 Dec 8 01:59 blobs-1512664032264.txt
> -rw-r--r-- 1 root root 4532250073 Dec 9 01:59 blobs-1512698405656.txt
> -rw-r--r-- 1 root root 6370201955 Dec 13 01:59 blobs-1512987450004.txt
> -rw-r--r-- 1 root root 1916223582 Dec 13 11:52 blobs-1513130410963.txt
> {noformat}
>
> number of entries
> {noformat}
> repository/index/deleted-blobs$ wc -l blobs-151*
>   418397 blobs-1512664032264.txt
>   627148 blobs-1512698405656.txt
>  1121837 blobs-1512987450004.txt
>   308292 blobs-1513130410963.txt
>  2475674 total
> {noformat}
>
> number of entries and sizes split on threshold of 500 bytes of blob ids
> {noformat}
> repository/index/deleted-blobs$ for i in blobs-151*;do echo $i;awk 'BEGIN {FS="|"} {len = length($1); if (len > 500) {large++; largeSize+=len} else {small++; smallSize+=len}} END {print large, largeSize, small, smallSize}' $i;done
> blobs-1512664032264.txt
> 245301 3310224358 173096 35473656
> blobs-1512698405656.txt
> 370373 4443957885 256775 52997864
> blobs-1512987450004.txt
> 660669 6214740439 461168 92017554
> blobs-1513130410963.txt
> 569083 5490965583 406756 80124598
> blobs-1513216819447.txt
> 69876 1413561892 46238 9221956
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)