[ https://issues.apache.org/jira/browse/AMQ-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Bain updated AMQ-3978:
--------------------------

    Description:

KahaDB uses a write-only journaling approach that ensures a journal file is deleted only when none of its content is still in use. If even a single byte of the file is still needed, the entire file must be kept, no matter how much of it is dead.

This works fine when all messages are promptly removed from destinations within an ActiveMQ broker, but it fails ungracefully when messages that are consumed infrequently (or not at all, relying on TTL to expire them) are interspersed with large volumes of messages that are consumed quickly. In that scenario, a single infrequently-consumed message can land in a journal file alongside a large number of quickly-consumed messages, and the entire file will be kept even though nearly all of its content is no longer needed. When this happens to enough journal files, KahaDB's disk/file limits are reached even though the amount of actual "live" data in the store is far below the configured limits.

To fix this, the periodic cleanup task that already looks for unused files should be changed so that when it finds a file it cannot delete because the file contains at least one live message, but the file's live content is below a configurable percentage, the task rewrites the journal file so that it contains only the live messages, updating any in-memory indices that record the offsets of messages within the file (if any such indices exist). If in-memory data structures need to be updated, access to the portions of those structures related to the file being compacted must be appropriately synchronized; access to the corresponding information for all other data files can continue unrestricted.

Note that this will still leave us with potentially many individual files, each much smaller than the target file size. If that is problematic, the compaction process could instead combine multiple partial files (while respecting the max file size) rather than writing live messages back into their current file.

  was:

KahaDB uses a write-only journaling approach that ensures a journal file is deleted only when none of its content is still in use. If even a single byte of the file is still needed, the entire file must be kept, no matter how much of it is dead.

This works fine when all messages are promptly removed from destinations within an ActiveMQ broker, but it fails ungracefully when messages that are consumed infrequently (or not at all, relying on TTL to expire them) are interspersed with large volumes of messages that are consumed quickly. In that scenario, a single infrequently-consumed message can land in a journal file alongside a large number of quickly-consumed messages, and the entire file will be kept even though nearly all of its content is no longer needed. When this happens to enough journal files, KahaDB's disk/file limits are reached even though the amount of actual "live" data in the store is far below the configured limits.

To fix this problem, an administrative feature needs to be added to KahaDB to let it "compact" its journal files by scanning through all files in order and writing every message that is still alive into a set of new journal files. Once all live messages from one of the old journal files have been written to the new set, the old file can be deleted. Without knowing the KahaDB codebase, I would assume this would have to be done while KahaDB blocks reads and writes, so presumably the documentation will recommend using this feature sparingly to avoid performance degradation.
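The threshold check and rewrite step described above can be sketched roughly as follows. This is an illustrative model only: JournalCompactionSketch, Msg, shouldCompact, and compact are hypothetical names rather than actual KahaDB APIs, and an in-memory list stands in for the on-disk journal file.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class JournalCompactionSketch {

    // Minimal stand-in for a journal record: message id, payload size, live flag.
    record Msg(String id, int size, boolean live) {}

    // A file qualifies for compaction when it still holds at least one live
    // message but its live fraction is below the configured threshold.
    static boolean shouldCompact(long liveBytes, long totalBytes, double liveThreshold) {
        if (liveBytes == 0 || totalBytes == 0) {
            return false; // a fully dead file is simply deleted by the existing task
        }
        return (double) liveBytes / totalBytes < liveThreshold;
    }

    // Rewrites the "file" keeping only live messages and returns the new
    // id -> offset index entries; a real implementation would swap these in
    // while holding a per-file lock so readers never see stale offsets.
    static Map<String, Long> compact(List<Msg> file) {
        Map<String, Long> newIndex = new LinkedHashMap<>();
        List<Msg> rewritten = new ArrayList<>();
        long offset = 0;
        for (Msg m : file) {
            if (m.live()) {
                rewritten.add(m);
                newIndex.put(m.id(), offset);
                offset += m.size();
            }
        }
        file.clear();
        file.addAll(rewritten);
        return newIndex;
    }

    public static void main(String[] args) {
        // 50 live bytes out of 1000 total with a 10% threshold -> compact the file.
        List<Msg> file = new ArrayList<>(List.of(
                new Msg("m1", 100, false),
                new Msg("m2", 50, true),   // the one rarely-consumed live message
                new Msg("m3", 850, false)));
        long live = file.stream().filter(Msg::live).mapToLong(Msg::size).sum();
        long total = file.stream().mapToLong(Msg::size).sum();
        if (shouldCompact(live, total, 0.10)) {
            Map<String, Long> newIndex = compact(file);
            System.out.println(file.size() + " " + newIndex.get("m2")); // prints "1 0"
        }
    }
}
```

Under these assumptions the whole-file rewrite only runs when the live fraction is small, so the common cases (fully dead files, which are deleted as today, and mostly live files, which are left alone) keep the existing fast paths.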
> Allow KahaDB to "compact" journal files to remove messages that are no longer > needed > ------------------------------------------------------------------------------------ > > Key: AMQ-3978 > URL: https://issues.apache.org/jira/browse/AMQ-3978 > Project: ActiveMQ > Issue Type: Improvement > Components: Message Store > Affects Versions: 5.6.0 > Reporter: Tim Bain > Priority: Minor > > KahaDB uses a write-only journaling approach that ensures that a journal file > will be deleted only when no content within it is still in use. If a single > byte of the file is still needed, the entire file must be kept, even if the > rest of the file is not needed. > This works fine when all messages are immediately removed from destinations > within an ActiveMQ broker, but it fails ungracefully when messages that are > consumed infrequently (or not at all, relying on TTL to delete the messages) > are interspersed with large volumes of messages that are consumed quickly. > In this scenario, if a single infrequently-consumed message ends up in a > journal file with a large number of quickly-consumed messages, the entire > file will be kept even though nearly all of the content of the file is no > longer needed. When this happens for enough journal files, the KahaDB's > disk/file limits are reached, even though the amount of actual "live" data > within the KahaDB is far below the configured limits. > To fix this, the periodic cleanup task that already looks for files that are > unused should be changed so that if it determines that it cannot delete the > file because it contains at least one live message but it contains less than > a configurable percentage of live messages, the task will rewrite the journal > file in question so it contains only those live messages into file, updating > any in-memory indices that might show the offsets of messages within the file > (if there are any such things). 
If any in-memory data structures will need to > be updated, we need to appropriately synchronize to ensure that no one can > use the portions of the data structure related to the file currently being > compacted; access to similar information for all other data files can > continue unrestricted. > Note that this will result in us still having potentially many individual > files, with each one having a much smaller file size than our target size. If > that is problematic, it would be possible to combine multiple partial files > together during the compaction process (while respecting the max file size) > instead of writing live messages back into their current file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)