[ 
https://issues.apache.org/jira/browse/AMQ-5905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14644450#comment-14644450
 ] 

Tim Bain commented on AMQ-5905:
-------------------------------

Embarrassingly, I'm the author of two of those three but didn't remember ever 
having suggested the solution.  (And I didn't find any of them because I 
searched for "KahaDB" as the component but didn't search for "Message Store".)  
I'll mark this one and the later two as dupes of the earliest one (AMQ-3978).

> Allow KahaDB to perform compaction of sparsely-used data files
> --------------------------------------------------------------
>
>                 Key: AMQ-5905
>                 URL: https://issues.apache.org/jira/browse/AMQ-5905
>             Project: ActiveMQ
>          Issue Type: Improvement
>          Components: KahaDB
>            Reporter: Tim Bain
>
> As currently implemented, KahaDB can only reduce data file usage by deleting 
> an entire data file.  As a result, the only situation where KahaDB can reduce
> the amount of space it uses on disk is when a data file no longer contains any
> messages that must be kept; if there is even one such message in an old file,
> the entire file cannot be deleted.  And if one (deleted) message in the old file
> has its deletion record in a later file, that later file must also be kept, 
> even if none of the messages in it are actually needed otherwise; as a 
> result, a single old message could keep alive a long chain of data files.
>
> The current advice is 1) don't keep messages for very long, and 2) use small
> KahaDB files so that you'll be able to delete at least some portions of what
> would have been a single large file that had to stick around (in the hope
> that you'll get lucky and be able to break the chain of kept files).  These
> are both workarounds (and not very good ones, particularly since the entire
> concept of a DLQ is fundamentally opposed to #1) for the fundamentally flawed
> assumption in KahaDB: that it's reasonable for its files to be read-only once
> written and for the database itself to be powerless to do anything when files
> are sparsely populated by live messages.  The fundamental paradigm of files
> being append-only for individual message deletion was a good one and provides
> excellent performance characteristics; however, restricting occasional
> maintenance tasks to the same paradigm handcuffs them unreasonably and should
> be changed.
>
> The periodic cleanup task that already looks for unused files should be
> changed so that if it determines that it cannot delete a file because the
> file contains at least one live message, but the file contains less than a
> configurable percentage of live messages, it will rewrite the journal file in
> question so that it contains only those live messages, updating any in-memory
> indices that record the offsets of messages within the file (if there are any
> such things).  If any in-memory data structures need to be updated, we need to
> synchronize appropriately to ensure that nothing can use the portions of the
> data structure related to the file currently being compacted; access to
> similar information for all other data files can continue unrestricted.
>
> Note that this could still leave us with many individual files, each much
> smaller than our target file size.  If that is problematic, it would be
> possible to combine multiple partial files together during the compaction
> process (while respecting the max file size) instead of writing live messages
> back into their current file.

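The chain effect described in the first paragraph of the issue can be sketched as a small fixed-point computation. This is only an illustrative model, not KahaDB's actual cleanup code: the class name, the method name, and both maps (live-message counts per file, and which older files each file's deletion records refer to) are assumptions made for the example.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the retention rule described in the issue.
class RetentionChain {
    /**
     * @param liveCounts file id -> number of live messages in that file
     * @param ackTargets file id -> ids of the files whose messages this
     *                   file's deletion records refer to
     * @return ids of all files that cannot be deleted
     */
    static Set<Integer> retainedFiles(Map<Integer, Integer> liveCounts,
                                      Map<Integer, Set<Integer>> ackTargets) {
        Set<Integer> retained = new TreeSet<>();
        // Rule 1: a file holding at least one live message must be kept.
        for (Map.Entry<Integer, Integer> e : liveCounts.entrySet()) {
            if (e.getValue() > 0) {
                retained.add(e.getKey());
            }
        }
        // Rule 2: a file holding a deletion record for a message in a kept
        // file must be kept too. Repeat until nothing changes, because each
        // newly kept file can pin further files.
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<Integer, Set<Integer>> e : ackTargets.entrySet()) {
                boolean pinsKeptFile = !Collections.disjoint(e.getValue(), retained);
                if (pinsKeptFile && retained.add(e.getKey())) {
                    changed = true;
                }
            }
        }
        return retained;
    }
}
```

With one live message in file 1, deletion records in file 2 that refer to file 1, and deletion records in file 3 that refer to file 2, all three files are retained even though files 2 and 3 hold no live messages.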

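The proposed change to the periodic cleanup task might look roughly like the following. It is a minimal sketch under stated assumptions, not a patch against KahaDB: `CompactionSketch`, `Record`, `Action`, the per-file lock map, and the in-memory `index` are all hypothetical names, a data file is modelled as a list of records, and the sketch ignores deletion records and the file chain discussed above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of threshold-based compaction with per-file locking.
class CompactionSketch {
    enum Action { DELETE, COMPACT, KEEP }

    /** Hypothetical journal record: a message id and whether it is still live. */
    static final class Record {
        final String id;
        final boolean live;
        Record(String id, boolean live) { this.id = id; this.live = live; }
    }

    private final double liveThreshold; // configurable fraction in [0, 1]
    private final Map<Integer, ReentrantLock> fileLocks = new ConcurrentHashMap<>();
    // Hypothetical in-memory index: file id -> (message id -> position in file).
    final Map<Integer, Map<String, Integer>> index = new ConcurrentHashMap<>();

    CompactionSketch(double liveThreshold) { this.liveThreshold = liveThreshold; }

    /** Decide what the cleanup task should do with one data file. */
    Action decide(List<Record> file) {
        long live = file.stream().filter(r -> r.live).count();
        if (live == 0) {
            return Action.DELETE; // nothing to keep: existing behaviour
        }
        double liveFraction = (double) live / file.size();
        return liveFraction < liveThreshold ? Action.COMPACT : Action.KEEP;
    }

    /** Rewrite one file so it holds only live records, updating the index. */
    List<Record> compact(int fileId, List<Record> file) {
        // Lock only the file being compacted; other files stay accessible.
        ReentrantLock lock = fileLocks.computeIfAbsent(fileId, k -> new ReentrantLock());
        lock.lock();
        try {
            List<Record> rewritten = new ArrayList<>();
            Map<String, Integer> positions = new HashMap<>();
            for (Record r : file) {
                if (r.live) {
                    positions.put(r.id, rewritten.size());
                    rewritten.add(r);
                }
            }
            index.put(fileId, positions);
            return rewritten;
        } finally {
            lock.unlock();
        }
    }
}
```

Readers of the index for a given file would need to take the same per-file lock for the synchronization to be meaningful; the sketch shows only the writer side.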

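The last paragraph of the issue suggests combining several partial files during compaction while respecting the max file size. One simple way to plan that, assuming files must stay in journal order, is a greedy pass that starts a new output file whenever the next compacted file would not fit. `MergePlanner` and `plan` are hypothetical names, and sizes are plain byte counts supplied by the caller.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical planner: groups compacted files, in order, into output files
// that do not exceed maxFileSize (unless a single input already exceeds it).
class MergePlanner {
    /**
     * @param compactedSizes size in bytes of each file after compaction,
     *                       in journal order
     * @param maxFileSize    maximum size of one output file
     * @return groups of input indices; each group becomes one output file
     */
    static List<List<Integer>> plan(List<Long> compactedSizes, long maxFileSize) {
        List<List<Integer>> groups = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        long used = 0;
        for (int i = 0; i < compactedSizes.size(); i++) {
            long size = compactedSizes.get(i);
            // Close the current output file if this input would overflow it.
            if (!current.isEmpty() && used + size > maxFileSize) {
                groups.add(current);
                current = new ArrayList<>();
                used = 0;
            }
            current.add(i);
            used += size;
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }
}
```

Keeping the inputs in order trades some packing efficiency for simplicity; whether KahaDB would require that ordering is not stated in the issue.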