[ https://issues.apache.org/jira/browse/AMQ-5905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14644450#comment-14644450 ]
Tim Bain commented on AMQ-5905:
-------------------------------

Embarrassingly, I'm the author of two of those three, but I didn't remember ever having suggested the solution. (And I didn't find any of them because I searched for "KahaDB" as the component but didn't search for "Message Store".) I'll mark this one and the later two as dupes of the earliest one (AMQ-3978).

> Allow KahaDB to perform compaction of sparsely-used data files
> --------------------------------------------------------------
>
>                 Key: AMQ-5905
>                 URL: https://issues.apache.org/jira/browse/AMQ-5905
>             Project: ActiveMQ
>          Issue Type: Improvement
>          Components: KahaDB
>            Reporter: Tim Bain
>
> As currently implemented, KahaDB can only reduce data file usage by deleting an entire data file. As a result, the only situation in which KahaDB can reduce the amount of space it uses on disk is when a data file contains no old messages that must be kept; if there is even one message in an old file that must be kept, the entire file cannot be deleted. And if one (deleted) message in the old file has its deletion record in a later file, that later file must also be kept, even if none of the messages in it are otherwise needed; as a result, a single old message can keep alive a long chain of data files.
> The current advice is 1) don't keep messages for very long, and 2) use small KahaDB files so that you'll be able to delete at least some portions of what would otherwise have been a single large file that had to stick around (and in the hope that you'll get lucky and be able to break the chain of kept files). Both are workarounds (and not very good ones, particularly since the entire concept of a DLQ is fundamentally opposed to #1) for the fundamentally flawed assumption in KahaDB: that it's reasonable for its files to be read-only and for the database itself to be powerless to do anything when files are sparsely populated by live messages.
> The fundamental paradigm of write-once files, in which an individual message deletion never modifies an existing file, was a good one and provides excellent performance characteristics; however, restricting occasional maintenance tasks to the same paradigm handcuffs them unreasonably and should be changed.
> The periodic cleanup task that already looks for unused files should be changed so that if it determines that it cannot delete a file because the file contains at least one live message, but the file contains less than a configurable percentage of live messages, it will rewrite the journal file in question so that it contains only the live messages, updating any in-memory indices that record the offsets of messages within the file (if there are any such structures). If any in-memory data structures need to be updated, we must synchronize appropriately to ensure that no one can use the portions of the data structure related to the file currently being compacted; access to the corresponding information for all other data files can continue unrestricted.
> Note that this will still leave us with potentially many individual files, each much smaller than our target file size. If that is problematic, it would be possible to combine multiple partial files during the compaction process (while respecting the max file size) instead of writing live messages back into their current files.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
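The threshold-based compaction policy proposed in the issue can be sketched roughly as follows. This is a minimal illustration only, not ActiveMQ code: the `JournalCompactor` class, `Record` type, `maybeCompact` method, and `liveThreshold` parameter are all hypothetical names, and a real implementation would operate on on-disk journal files and KahaDB's index (with the synchronization described above) rather than on in-memory lists.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of compacting a sparsely-populated data file:
// keep the file as-is when it is dense enough, otherwise rewrite it
// so that it contains only the live messages.
public class JournalCompactor {

    // One record in a data file; "live" means the message must still be kept.
    public static final class Record {
        final String payload;
        final boolean live;
        public Record(String payload, boolean live) {
            this.payload = payload;
            this.live = live;
        }
    }

    // Returns the compacted file contents (live records only) if the live
    // ratio is below liveThreshold; otherwise returns the file unchanged.
    public static List<Record> maybeCompact(List<Record> file, double liveThreshold) {
        if (file.isEmpty()) {
            return file; // an empty file would simply be deleted by cleanup
        }
        long liveCount = file.stream().filter(r -> r.live).count();
        if ((double) liveCount / file.size() >= liveThreshold) {
            return file; // dense enough: leave the file untouched
        }
        // Rewrite: copy only the live records, dropping dead space.
        List<Record> compacted = new ArrayList<>();
        for (Record r : file) {
            if (r.live) {
                compacted.add(r);
            }
        }
        return compacted;
    }
}
```

The threshold keeps the write-once fast path intact for healthy files: only files whose live ratio drops below the configured percentage pay the rewrite cost, and (as the issue notes) several such compacted remnants could further be merged into one file up to the max file size.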