[ 
https://issues.apache.org/jira/browse/AMQ-7080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756174#comment-16756174
 ] 

Alan Protasio commented on AMQ-7080:
------------------------------------

Hi [~gtully] :D thanks for looking into that. :D

Here are my comments:
{quote}this is interesting. the async recovery will be expensive for sure, it 
may help to start that thread *after* normal recovery.
{quote}
That's true... we can start the async recovery after the normal one... but it 
will still hurt performance (disk read/write latency will increase).
{quote}on the free list map, this is a replacement for the db.free that uses 
less space. There is probably no need for db.free at all.
{quote}
Yeah... I thought the same thing, and I can do that. The reason I decided to 
keep db.free is that I'm only writing db.map IF the recovery file is enabled. 
If the recovery file is not enabled, I will not know the "nextTransactionId" 
after an unclean shutdown, so I cannot tell whether db.map is in sync with 
db.data. But yeah, I can still save db.map on a clean shutdown (in that case I 
will know the nextTransactionId).
{quote}One thought, the bit per page is good, it is very compact. 
The sequence set is a little more heavy weight, being the actual page Ids, 
until there are large gaps in the free pages, then tracking 1-1000 as free is 
nice.
{quote}
Yeah... and the main advantage, I think, is that I'm only writing the bytes 
that belong to the modified pages. With the current serialization I always 
have to write the whole sequence set (so instead of O(m) in the worst case, 
where m = number of pages, we have O(n), where n = number of writes in this 
checkpoint, and m is always >> n).
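To make the difference concrete, here is a minimal Java sketch of the 
bit-per-page idea with dirty-byte tracking. It is not the code from the 
attached patch: the class FreePageBitmap, its methods, and the ByteBuffer used 
as a stand-in for db.map are hypothetical names chosen for illustration.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical illustration only: not the class used in the attached patch.
final class FreePageBitmap {
    private byte[] bits = new byte[16];             // bit i set => page i is free
    private final BitSet dirtyBytes = new BitSet(); // byte offsets changed since the last checkpoint

    void setFree(long pageId, boolean free) {
        int byteIndex = (int) (pageId >>> 3);
        int mask = 1 << (int) (pageId & 7);
        if (byteIndex >= bits.length) {
            // Grow the in-memory bitmap so it covers the new page id.
            bits = Arrays.copyOf(bits, Math.max(bits.length * 2, byteIndex + 1));
        }
        byte old = bits[byteIndex];
        byte updated = (byte) (free ? (old | mask) : (old & ~mask));
        if (updated != old) {
            bits[byteIndex] = updated;
            dirtyBytes.set(byteIndex); // remember that this byte must be rewritten
        }
    }

    boolean isFree(long pageId) {
        int byteIndex = (int) (pageId >>> 3);
        return byteIndex < bits.length && (bits[byteIndex] & (1 << (int) (pageId & 7))) != 0;
    }

    // Copies only the bytes that changed into the target (a stand-in for db.map)
    // and returns how many bytes were written.
    int checkpoint(ByteBuffer target) {
        int written = 0;
        for (int i = dirtyBytes.nextSetBit(0); i >= 0; i = dirtyBytes.nextSetBit(i + 1)) {
            target.put(i, bits[i]);
            written++;
        }
        dirtyBytes.clear();
        return written;
    }
}
```

In this sketch the cost of checkpoint() depends on the number of distinct 
bitmap bytes touched since the last checkpoint, not on the total number of 
pages tracked.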
{quote}I wonder, is it worth improving the sequence set in two ways:
1) having preallocated pages such that the pages are linear (this will avoid 
seeks around the page file)
2) having it keep track of modifications such that only modified pages (the 
contents of the sequence set are contained on pages) are written on a store.

In other words, I am wondering, would a better sequence set suffice? the 
sequence set is used in a few places that could benefit if that was the case.
{quote}
If I understood you correctly, you are saying that instead of having an extra 
file (db.map) we could use preallocated space in db.data to store the same 
information, right?

If that's the case, I thought the same thing. The problem is that db.data can 
grow indefinitely, so I cannot know how much space I have to preallocate. 
Maybe we can preallocate a fixed size and, if db.data grows beyond what that 
can track, allocate a new region 2x the previous size, copy the data over, and 
free the pages used previously. To do so, I would have to keep in the metadata 
the page where I'm storing this information.
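
As a rough sketch of the sizing arithmetic behind that doubling idea (not 
existing KahaDB code): the class FreeMapSizing and its methods are 
hypothetical, and it assumes one bit per tracked page and a region that starts 
with at least one page.

```java
// Hypothetical illustration only: sizing of a preallocated bitmap region.
final class FreeMapSizing {
    // Number of pages a region can track, at one bit per tracked page.
    static long capacity(int regionPages, int pageSize) {
        return (long) regionPages * pageSize * 8L;
    }

    // Doubles the region until it can track totalPages.
    // Returns the size unchanged if the region is already large enough.
    // Assumes regionPages >= 1.
    static int grow(int regionPages, int pageSize, long totalPages) {
        int size = regionPages;
        while (capacity(size, pageSize) < totalPages) {
            size = Math.multiplyExact(size, 2); // throws instead of overflowing silently
        }
        return size;
    }
}
```

For example, assuming a 4096-byte page, a single region page tracks 32768 
pages, so FreeMapSizing.grow(1, 4096, 100000) doubles twice and returns 4.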

The problem with that is that I will have to add writes to the batch, and this 
seems odd:

[https://github.com/apache/activemq/blob/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/page/PageFile.java#L1145]

I would be using the page file to store information about the page file's own 
pages, and to do so I have to add writes to the batch without any transaction. 
It seems like a chicken-and-egg problem. I don't know if I could do it without 
a further hack.

Because of that, I thought the best solution was to have a separate storage 
space (db.map) for this information.
{quote}The trade off here is an additional file sync on every batch write and 
checkpoint. If the sequenceSet can be improved the sync is on a single file.
{quote}
In my preliminary tests I could not see any significant performance hit. It's 
important to note that I'm now only writing the information that changed (only 
pages whose type changed). Do you suggest any test to run?

> Keep track of free pages - Update db.free file during checkpoints
> -----------------------------------------------------------------
>
>                 Key: AMQ-7080
>                 URL: https://issues.apache.org/jira/browse/AMQ-7080
>             Project: ActiveMQ
>          Issue Type: Improvement
>          Components: KahaDB
>    Affects Versions: 5.15.6
>            Reporter: Alan Protasio
>            Assignee: Jean-Baptiste Onofré
>            Priority: Major
>             Fix For: 5.16.0
>
>         Attachments: AMQ-7080-freeList-update.diff
>
>
> In the event of an unclean shutdown, ActiveMQ loses the information about the 
> free pages in the index. In order to recover this information, ActiveMQ reads 
> the whole index during shutdown, searching for free pages, and then saves the 
> db.free file. This operation can take a long time, making the failover 
> slower (during the shutdown, ActiveMQ still holds the lock).
> From http://activemq.apache.org/shared-file-system-master-slave.html
> {quote}"If you have a SAN or shared file system it can be used to provide 
> high availability such that if a broker is killed, another broker can take 
> over immediately."
> {quote}
> It is important to note that if the shutdown takes more than 
> ACTIVEMQ_KILL_MAXSECONDS seconds, any following shutdown will be unclean. The 
> broker will stay in this state unless the index is deleted (this state means 
> that every failover will take more than ACTIVEMQ_KILL_MAXSECONDS, so if you 
> increase this time to 5 minutes, your failover can take more than 5 minutes).
>  
> In order to prevent ActiveMQ from reading the whole index file to search for 
> free pages, we can keep track of them on every checkpoint. To do that, we 
> need to be sure that db.data and db.free are in sync. To achieve that, we can 
> have an attribute in the db.free page that is referenced by db.data.
> So during the checkpoint we have:
> 1 - Save db.free and give a freePageUniqueId
> 2 - Save this freePageUniqueId in the db.data (metadata)
> After a crash, we can check whether db.data has the same freePageUniqueId as 
> db.free. If so, we can safely use the free page information contained in 
> db.free.
> Now the only case where we have to read the whole index file again is if the 
> crash happens between steps 1 and 2 (which is very unlikely).
> The drawback of this implementation is that we will have to save db.free 
> during the checkpoint, which may increase the checkpoint time.
> It is also important to note that we CAN (and should) have stale data in 
> db.free, as it is referencing stale db.data:
> Imagine the timeline:
> T0 -> P1, P2 and P3 are free.
> T1 -> Checkpoint
> T2 -> P1 got occupied.
> T3 -> Crash
> In the current scenario, after PageFile#load P1 will be free, and then the 
> replay will mark P1 as occupied or will occupy another page (now that the 
> recovery of free pages is done on shutdown).
> This change only makes sure that db.data and db.free are in sync and reflect 
> the state at T1 (checkpoint). If they are in sync, we can trust db.free.
> This is a really quick draft of what I'm suggesting... If you agree, I can 
> create a proper patch afterwards:
> [https://github.com/alanprot/activemq/commit/18036ef7214ef0eaa25c8650f40644dd8b4632a5]
>  
> This is related to https://issues.apache.org/jira/browse/AMQ-6590
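
The two-step checkpoint from the issue description above can be sketched in 
Java as follows. This is only an in-memory model under stated assumptions, not 
the code in the linked draft commit: the class FreeListCheckpoint, its fields, 
and its methods are hypothetical, and plain fields stand in for the contents of 
db.free and the db.data metadata.

```java
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only: an in-memory model of the two-step checkpoint.
final class FreeListCheckpoint {
    private long freeFileId;                            // freePageUniqueId stored in db.free
    private Set<Long> freeFilePages = new TreeSet<>();  // free page ids stored in db.free
    private long metadataId;                            // freePageUniqueId stored in db.data metadata
    private long nextId = 1;

    // Step 1: save db.free and give it a fresh freePageUniqueId.
    void saveFreeFile(Set<Long> freePages) {
        freeFilePages = new TreeSet<>(freePages);
        freeFileId = nextId++;
    }

    // Step 2: save the same freePageUniqueId in the db.data metadata.
    void saveMetadata() {
        metadataId = freeFileId;
    }

    // On recovery: trust db.free only if both ids match.
    // An empty result means the whole index has to be scanned for free pages.
    Optional<Set<Long>> recoverFreePages() {
        return freeFileId == metadataId ? Optional.of(freeFilePages) : Optional.empty();
    }
}
```

A crash after saveFreeFile() but before saveMetadata() leaves the ids 
different, so recoverFreePages() returns an empty Optional and the caller falls 
back to the full index scan.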



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
