Re: Optimal backup strategy

2019-12-03 Thread Hossein Ghiyasi Mehr
I am sorry! This is true. I forgot the "*not*"!
1. It's *not* recommended to use the commit log after a single node failure.
Cassandra has many other options, such as the replication factor, as a
substitute solution.

*VafaTech.com - A Total Solution for Data Gathering & Analysis*



Re: Optimal backup strategy

2019-12-02 Thread Adarsh Kumar
Thanks Hossein,

Just one more question: is there any special SOP or consideration we have to
take into account for multi-site backup?

Please share any helpful links, blogs, or documented steps.

Regards,
Adarsh Kumar


Re: Optimal backup strategy

2019-12-01 Thread Hossein Ghiyasi Mehr
1. It's recommended to use the commit log after a single node failure.
Cassandra has many options, such as the replication factor, as a substitute
solution.
2. Yes, right.

*VafaTech.com - A Total Solution for Data Gathering & Analysis*



Re: Optimal backup strategy

2019-11-28 Thread guo Maxwell
"Same topology" means that the restore nodes must have the same tokens as the
backup nodes.
Example:
backup:
   node1(1/2/3/4/5) node2(6/7/8/9/10)
restore:
   nodea(1/2/3/4/5) nodeb(6/7/8/9/10)
So node1's commit log can be replayed on nodea.
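
The token matching in the example above can be sketched in Python. This is only an illustration under stated assumptions, not a Cassandra API: `match_restore_nodes` and the two dictionaries of node names and tokens are hypothetical, and in a real cluster the token lists would have to be collected from each node first.

```python
from typing import Dict, Iterable


def match_restore_nodes(
    backup: Dict[str, Iterable[int]],
    restore: Dict[str, Iterable[int]],
) -> Dict[str, str]:
    """Map each backup node to the restore node that owns the same tokens.

    Hypothetical helper: raises ValueError when a backup node has no
    restore node with an identical token set (the topologies differ).
    """
    # Index restore nodes by their order-independent token set.
    restore_by_tokens = {frozenset(tokens): name for name, tokens in restore.items()}

    mapping = {}
    for name, tokens in backup.items():
        target = restore_by_tokens.get(frozenset(tokens))
        if target is None:
            raise ValueError(f"no restore node owns the tokens of {name}")
        mapping[name] = target
    return mapping


# The two rings from the example in this message.
backup_ring = {"node1": [1, 2, 3, 4, 5], "node2": [6, 7, 8, 9, 10]}
restore_ring = {"nodea": [1, 2, 3, 4, 5], "nodeb": [6, 7, 8, 9, 10]}
mapping = match_restore_nodes(backup_ring, restore_ring)
print(mapping)
```

If any backup node has no counterpart with exactly the same tokens, the helper raises instead of guessing, which mirrors the requirement that commit log replay only makes sense on a node with the same token ownership.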


Re: Optimal backup strategy

2019-11-28 Thread Adarsh Kumar
Thanks Ahu and Hossein,

So my understanding is:

   1. Commit log backup is not documented for Apache Cassandra and is
   therefore not standard. It can still be used to restore on the same
   machine (by taking the backup from commit_log_dir). If used on other
   machine(s), they must have the same topology. Can it be used for a
   replacement node?
   2. For periodic backups, Snapshot + Incremental backup is the best option.


Thanks,
Adarsh Kumar


Re: Optimal backup strategy

2019-11-28 Thread guo Maxwell
Hossein is right. But in our case we restore to the same Cassandra topology,
so the commit log is usable for replay. It is also usable when restoring to
the same machine.
Using sstableloader costs too much time and more storage (though the storage
usage drops again after the restore).


Re: Optimal backup strategy

2019-11-28 Thread Hossein Ghiyasi Mehr
A commit log backup isn't usable on another machine.
The backup solution depends on what you want to do: periodic backup, or
backup to restore on another machine?
Periodic backup is a combination of snapshot and incremental backup. Remove
the incremental backups after each new snapshot.
Backup to restore on another machine: you can use a snapshot taken after
flushing the memtables, or use sstableloader.
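
The "remove the incremental backups after each new snapshot" step can be sketched in Python. This is a minimal sketch under assumptions: the function name `prune_incremental_backups` is hypothetical, the layout `<data_dir>/<keyspace>/<table>/backups/` is assumed, and the cutoff is taken to be the time of the new snapshot. A real setup would presumably copy the files off the node before deleting anything.

```python
from pathlib import Path
from typing import List


def prune_incremental_backups(data_dir: str, snapshot_time: float) -> List[Path]:
    """Delete incremental backup files last modified before snapshot_time.

    Assumes the layout <data_dir>/<keyspace>/<table>/backups/<file>.
    snapshot_time is a POSIX timestamp (seconds since the epoch).
    Returns the sorted list of removed paths.
    """
    removed = []
    for backups_dir in Path(data_dir).glob("*/*/backups"):
        for entry in backups_dir.iterdir():
            # Only files older than the new snapshot are covered by it.
            if entry.is_file() and entry.stat().st_mtime < snapshot_time:
                entry.unlink()
                removed.append(entry)
    return sorted(removed)
```

Files written after the snapshot are left in place, since they are not contained in it and still belong to the next incremental backup cycle.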



VafaTech.com - A Total Solution for Data Gathering & Analysis

> On Thu, Nov 28, 2019 at 1:00 AM Adarsh Kumar wrote:
>
>> Thanks Guo and Eric for replying,
>>
>> I have some confusion about commit log backup:
>>
>>    1. The commit log archival technique (
>>    https://support.datastax.com/hc/en-us/articles/115001593706-Manual-Backup-and-Restore-with-Point-in-time-and-table-level-restore-
>>    ) is as good as an incremental backup, as it also captures commit logs
>>    after a memtable flush.
>>    2. If we go for "Snapshot + Incremental backup + Commit log", we have
>>    to take the commit logs from the commit log directory (is there any SOP
>>    for this?). As commit logs are not per table or keyspace, we will have
>>    a challenge restoring selective tables.
>>    3. Snapshot-based backups are easy to manage and operate due to their
>>    simplicity, but they are heavy on storage. Any views on this?
>>    4. Please share any successful strategy that someone is using in
>>    production. We are still in the design phase and want to implement the
>>    best solution.
>>
>> Thanks Eric for sharing the link for Medusa.
>>
>> Regards,
>> Adarsh Kumar
>>
>> On Wed, Nov 27, 2019 at 5:16 PM guo Maxwell wrote:
>>
>>> For me, I think the last one:
>>>  Snapshot + Incremental + commitlog
>>> is the most meaningful way to do backup and restore when you back the
>>> data up to somewhere else, such as AWS S3.
>>>
>>>    - Snapshot-based backup // incremental data is not backed up, so data
>>>    may be lost when restoring to a time later than the snapshot time;
>>>    - Incremental backups // better than snapshot backup, but with
>>>    insufficient data accuracy: data remaining in the memtable will be
>>>    lost;
>>>    - Snapshot + incremental
>>>    - Snapshot + commitlog archival // better data precision than
>>>    incremental backup, but data in a non-archived commit log (a log that
>>>    is not yet closed and archived) will not be restored and will be lost.
>>>    Also, when there are too many logs, log replay takes a very long time.
>>>
>>> We use snapshot + incremental + commitlog archive. We read the snapshot
>>> data and the incremental data. The log is backed up as well, but we do
>>> not back up logs whose data has already been flushed to SSTables, because
>>> that data is backed up by the incremental backup.
>>>
>>> This way, the data exists as SSTables through snapshot backup and
>>> incremental backup. The number of logs is very small, and log replay
>>> does not take much time.
>>>
>>>
>>>
>>> On Wed, Nov 27, 2019 at 4:13 PM Eric LELEU wrote:
>>>
>>>> Hi,
>>>> TheLastPickle & Spotify have released Medusa as a Cassandra backup tool.
>>>>
>>>> See:
>>>> https://thelastpickle.com/blog/2019/11/05/cassandra-medusa-backup-tool-is-open-source.html
>>>>
>>>> Hope this link will help you.
>>>>
>>>> Eric
>>>>
>>>>
>>>> On 27/11/2019 at 08:10, Adarsh Kumar wrote:
>>>>
>>>> Hi,
>>>>
>>>> I was looking into backup strategies for Cassandra. After some study
>>>> I found that there are the following options:
>>>>
>>>>    - Snapshot based backup
>>>>    - Incremental backups
>>>>    - Snapshot + incremental
>>>>    - Snapshot + commitlog archival
>>>>    - Snapshot + Incremental + commitlog
>>>>
>>>> Which is the most suitable and feasible approach? Also, which of these
>>>> is used most?
>>>> Please let me know if there is any other option or tool available.
>>>>
>>>> Thanks in advance.
>>>>
>>>> Regards,
>>>> Adarsh Kumar


>>>
>>> --
>>> you are the apple of my eye !
>>>
>>
>
> --
> you are the apple of my eye !
>


Re: Optimal backup strategy

2019-11-27 Thread guo Maxwell
Neither the Cassandra documentation nor DataStax's documentation mentions
commit log backup. Only snapshot and incremental backup are described as
ways to back up.

Archiving the commit log per keyspace/table is not supported, but commit
log replay (for which you must put the logs in commitlog_dir and restart the
process) does support a keyspace/table replay filter (use
-Dcassandra.replayList with the keyspace1.table1,keyspace1.table2 format to
replay only the specified keyspaces/tables).
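
As a rough illustration of the replay filter, the sketch below only formats
the JVM option from a list of tables. The helper name replay_list_option is
hypothetical; the -Dcassandra.replayList property and the comma-separated
keyspace.table format are the ones quoted in this message. How the option
reaches the JVM depends on the installation and is not shown.

```python
def replay_list_option(tables):
    """Build the JVM flag that limits commit log replay to the given tables.

    `tables` is an iterable of (keyspace, table) pairs. This is a
    hypothetical convenience helper: it only formats the string and does
    not touch a running node.
    """
    pairs = [f"{keyspace}.{table}" for keyspace, table in tables]
    if not pairs:
        raise ValueError("at least one keyspace.table pair is required")
    return "-Dcassandra.replayList=" + ",".join(pairs)


if __name__ == "__main__":
    print(replay_list_option([("keyspace1", "table1"), ("keyspace1", "table2")]))
```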

Snapshots do affect storage. We take a snapshot once a week, during our
low-traffic period, and the snapshot operation is throttled. You may want
to look at this issue: https://issues.apache.org/jira/browse/CASSANDRA-13019
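
A weekly snapshot routine along these lines could be sketched as follows.
The tag naming scheme, the retention count and all function names are
assumptions made for illustration; the sketch only builds nodetool command
lines (nodetool snapshot -t and nodetool clearsnapshot -t) and does not run
them.

```python
import datetime


def snapshot_tag(day):
    """Return a tag such as 'weekly-2019-11-27' (naming is an assumption)."""
    return f"weekly-{day.isoformat()}"


def snapshot_command(keyspace, day):
    """Command line that takes a tagged snapshot of one keyspace."""
    return ["nodetool", "snapshot", "-t", snapshot_tag(day), keyspace]


def tags_to_clear(existing_tags, keep=2):
    """Pick the oldest weekly tags to remove so that only `keep` remain.

    ISO dates sort chronologically as plain strings, so a string sort is
    enough. Tags that do not follow the weekly scheme are left alone.
    """
    weekly = sorted(t for t in existing_tags if t.startswith("weekly-"))
    return weekly[:-keep] if keep > 0 else weekly


def clear_command(tag):
    """Command line that removes the snapshot with the given tag."""
    return ["nodetool", "clearsnapshot", "-t", tag]


if __name__ == "__main__":
    print(" ".join(snapshot_command("keyspace1", datetime.date.today())))
```

Keeping only a couple of weekly snapshots is one way to bound the storage
cost mentioned above; the right retention depends on how far back a restore
must be able to reach.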




-- 
you are the apple of my eye !


Re: Optimal backup strategy

2019-11-27 Thread Adarsh Kumar
Thanks, Guo and Eric, for replying.

I have some doubts about commit log backup:

   1. The commit log archival technique (
   https://support.datastax.com/hc/en-us/articles/115001593706-Manual-Backup-and-Restore-with-Point-in-time-and-table-level-restore-
   ) seems as good as an incremental backup, as it also captures commit logs
   after a memtable flush.
   2. If we go for "Snapshot + Incremental backup + Commit log", we have to
   take the commit logs from the commit log directory (is there any SOP for
   this?). As commit logs are not per table or keyspace, we will have a
   challenge restoring selected tables.
   3. Snapshot-based backups are easy to manage and operate because of their
   simplicity, but they are heavy on storage. Any views on this?
   4. Please share any successful strategy that someone is using in
   production. We are still in the design phase and want to implement the
   best solution.

Thanks, Eric, for sharing the link to Medusa.

Regards,
Adarsh Kumar



Re: Optimal backup strategy

2019-11-27 Thread guo Maxwell
I think the last one,
 Snapshot + Incremental + commitlog,
is the most meaningful way to do backup and restore when you ship the
backup somewhere else, such as AWS S3.

   - Snapshot based backup // data written after the snapshot is not backed
   up, so you may lose data when restoring to a time later than the
   snapshot time;
   - Incremental backups // better than snapshot-only backup, but the data
   is still incomplete: data that remains in the memtable will be lost;
   - Snapshot + incremental
   - Snapshot + commitlog archival // more precise than incremental backup,
   but data in commit logs that have not been archived (segments that are
   still open) will not be restored and will be lost. Also, when there are
   many logs, log replay takes a very long time.

We use snapshot + incremental + commitlog archive. We read the snapshot
data and the incremental data. The logs are backed up as well, but we do
not back up logs whose data has already been flushed to SSTables, because
that data is already covered by the incremental backup.

This way, the data exists as SSTables through the snapshot and incremental
backups, the number of logs is very small, and log replay does not take
much time.
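
The log-filtering idea can be sketched roughly as below. This is a
simplification under stated assumptions, not how Cassandra tracks flushes:
the Segment record, its last_write field, the file names and the single
flushed_up_to cutoff are all hypothetical. A real node tracks flush
positions per table, so a segment is only redundant once every table with
data in it has been flushed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical description of one commit log segment file."""
    name: str
    last_write: int  # epoch seconds of the newest mutation in the segment


def segments_to_back_up(segments, flushed_up_to):
    """Keep only segments that may hold data not yet in a backed-up SSTable.

    `flushed_up_to` is the time up to which all data is assumed to have
    been flushed and captured by the incremental backup (an assumption of
    this sketch). Segments written entirely before that point are skipped.
    """
    return [s for s in segments if s.last_write > flushed_up_to]


if __name__ == "__main__":
    logs = [
        Segment("CommitLog-1.log", 100),
        Segment("CommitLog-2.log", 200),
        Segment("CommitLog-3.log", 300),
    ]
    for segment in segments_to_back_up(logs, flushed_up_to=200):
        print(segment.name)
```

Because only the segments newer than the last flushed data are kept, the set
of logs to replay on restore stays small, which is the point made above.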




-- 
you are the apple of my eye !


Re: Optimal backup strategy

2019-11-27 Thread Eric LELEU

Hi,

The Last Pickle and Spotify have released Medusa, a backup tool for Cassandra.

See:
https://thelastpickle.com/blog/2019/11/05/cassandra-medusa-backup-tool-is-open-source.html


Hope this link will help you.

Eric




Optimal backup strategy

2019-11-26 Thread Adarsh Kumar
Hi,

I was looking into backup strategies for Cassandra. After some study, I
found the following options:

   - Snapshot based backup
   - Incremental backups
   - Snapshot + incremental
   - Snapshot + commitlog archival
   - Snapshot + Incremental + commitlog

Which is the most suitable and feasible approach? Also, which of these is
used most?
Please let me know if any other option or tool is available.

Thanks in advance.

Regards,
Adarsh Kumar