[
https://issues.apache.org/jira/browse/ARTEMIS-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17212806#comment-17212806
]
Justin Bertram edited comment on ARTEMIS-2930 at 10/13/20, 3:16 AM:
--------------------------------------------------------------------
The first thing to note is that you're reading documentation from the 1.0.0
release. You can see the version in the URL, i.e.
[https://activemq.apache.org/components/artemis/documentation/*1.0.0*/ha.html|https://activemq.apache.org/components/artemis/documentation/1.0.0/ha.html].
I assume you're not actually using the 1.0.0 release. You should be reading the
documentation that corresponds to the version you're using. You can always
find the latest documentation at
[https://activemq.apache.org/components/artemis/documentation/*latest*/|http://activemq.apache.org/components/artemis/documentation/latest/].
Next, you quote this bit of documentation:
bq. Replication will create a copy of the data at the backup. One issue to be
aware of is: in case of a successful fail-over, the backup's data will be newer
than the one at the live's storage. If you configure your live server to
perform a failback to live server when restarted, it will synchronize its data
with the backup's. If both servers are shutdown, the administrator will have to
determine which one has the latest data.
This is not really talking about "data loss." It is simply drawing a
distinction between the behavior of shared-store and replication. Hopefully I
can explain it in a way that makes sense...
When using shared storage the live and backup brokers always have direct access
to the most up-to-date journal because it sits on the shared storage device.
Therefore, if the live broker fails and the backup takes over, the original live
broker can simply initiate fail-back when it is restarted, and it will read the
most up-to-date data directly from the shared store.
However, when using replication the data has to be physically copied over the
network between brokers. Therefore, if the live broker fails and the backup
takes over, the original live broker cannot fail back immediately when it is
restarted. It first has to become a backup of the current live broker and
receive the replicated journal data; only then can it initiate fail-back.
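A toy Java model of the two restart sequences described above may make the difference more concrete. The class, the enum, and the step strings are hypothetical illustrations, not Artemis APIs or configuration names; the sketch only encodes the order of steps from this explanation.

```java
import java.util.List;

public class FailbackSequence {

    // Hypothetical labels for this sketch; these are not Artemis configuration names.
    enum Policy { SHARED_STORE, REPLICATION }

    // Returns the steps a restarted (original) live broker goes through
    // before it serves clients again.
    static List<String> stepsOnRestart(Policy policy) {
        switch (policy) {
            case SHARED_STORE:
                // The journal on the shared device is already current, so the
                // broker can fail back and read it directly.
                return List.of("initiate fail-back", "load journal from shared store");
            case REPLICATION:
                // The restarted broker's local journal is stale, so it first acts
                // as a backup and receives the current journal over the network.
                return List.of(
                        "start as backup of the current live broker",
                        "receive replicated journal data",
                        "initiate fail-back");
            default:
                throw new IllegalArgumentException("unknown policy: " + policy);
        }
    }

    public static void main(String[] args) {
        for (Policy p : Policy.values()) {
            System.out.println(p + " -> " + stepsOnRestart(p));
        }
    }
}
```

The point of the model is the ordering: with replication, fail-back is the last step rather than the first, because the data has to arrive over the network before the original live broker can take over again.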
If a live broker fails and the backup takes over, and then the backup broker
also fails before the original live broker is restarted, an administrator will
have to inspect the brokers' log files to determine which broker was live most
recently, because that broker will have the most up-to-date data. The broker
with the most up-to-date data needs to be started _first_ so that it can become
live and serve clients the data they expect. If the broker with stale data is
started first and becomes live, and the other broker then starts and becomes its
backup, the stale data will be replicated to the backup. However (and here's a
very important bit), the backup's original up-to-date data *will not be lost*.
It will be moved into a special backup directory. This is controlled by the
{{max-saved-replicated-journals-size}} configuration property discussed in the
documentation.
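The recovery scenario above can be sketched as a toy Java model. Everything in it is hypothetical: the {{Broker}} class, the message IDs, and the timestamps (which stand in for what an administrator would read from each broker's log files) are illustrations, not Artemis APIs or real data. The sketch also does not model the limit set by {{max-saved-replicated-journals-size}}; it only shows that the backup's journal is moved aside rather than discarded.

```java
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class ReplicationRecoveryModel {

    // A toy broker: a name, the message IDs in its local journal, and the last
    // time it was live (in practice, read from the broker's log files).
    static final class Broker {
        final String name;
        List<String> journal;
        final Instant lastLive;
        // Earlier journals moved aside when this broker restarted as a backup.
        final Deque<List<String>> savedJournals = new ArrayDeque<>();

        Broker(String name, List<String> journal, Instant lastLive) {
            this.name = name;
            this.journal = new ArrayList<>(journal);
            this.lastLive = lastLive;
        }
    }

    // The broker that was live most recently holds the most up-to-date data,
    // so it is the one that should be started first.
    static Broker startFirst(Broker a, Broker b) {
        return a.lastLive.isAfter(b.lastLive) ? a : b;
    }

    // When `backup` starts as the backup of `live`, its existing journal is
    // saved (not deleted) and then replaced by the live broker's data.
    static void syncAsBackup(Broker backup, Broker live) {
        backup.savedJournals.addLast(backup.journal);
        backup.journal = new ArrayList<>(live.journal);
    }

    public static void main(String[] args) {
        // A was the original live broker; it failed after m1 and m2 were persisted.
        Broker a = new Broker("A", List.of("m1", "m2"), Instant.parse("2020-10-12T10:00:00Z"));
        // B took over, persisted m3, and then also failed later.
        Broker b = new Broker("B", List.of("m1", "m2", "m3"), Instant.parse("2020-10-12T11:00:00Z"));

        System.out.println("Start first: " + startFirst(a, b).name);

        // The mistake case: A (stale) is started first and B becomes its backup.
        syncAsBackup(b, a);
        System.out.println("B journal now: " + b.journal);
        System.out.println("B saved copy: " + b.savedJournals.peekLast());
    }
}
```

Running it reports that B should be started first. In the mistake case, B ends up holding A's stale journal, while its own newer journal (including m3) survives as a saved copy.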
As for my previous explanation, that specific text is not in the documentation,
although the general idea is. The whole point of "high availability" in general,
and replication in particular, is to *not lose messages*. The documentation
doesn't really dive into implementation details because those details are
subject to change even when the actual function remains the same. Ultimately, if
you want confidence about how HA works, you should inspect the code-base to see
how it works and then run experiments to ensure it behaves the way you expect
for the use-cases you care about.
> Artemis HA with Replication strategy, has always issue of data loss
> --------------------------------------------------------------------
>
> Key: ARTEMIS-2930
> URL: https://issues.apache.org/jira/browse/ARTEMIS-2930
> Project: ActiveMQ Artemis
> Issue Type: Bug
> Reporter: Karan Aggarwal
> Priority: Major
>
> In the documentation I read that with the HA replication strategy, the slave
> node polls for new data at a regular interval.
> So, there is a 100% chance that delta messages are lost if the master server
> goes down.
>
> How can this issue be overcome to ensure that there is no data loss under any
> condition while using the HA replication strategy?
--
This message was sent by Atlassian Jira
(v8.3.4#803005)