Re: [GTALUG] On the subject of backups.

2020-05-06 Thread Greg Martyn via talk
I haven't used Gluster personally, but have you tried
turning performance.parallel-readdir on?
https://docs.gluster.org/en/latest/release-notes/3.10.0/#implemented-parallel-readdirp-with-distribute-xlator

It seems there's a reason why it's off by default (
https://www.spinics.net/lists/gluster-devel/msg25518.html), but maybe it'd
still be worth it for you?
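For reference, toggling it is a one-line volume option. The volume name below is a placeholder, and (as an assumption about your setup) parallel-readdir needs readdir-ahead enabled first:

```shell
# "myvol" is a hypothetical volume name; run on any Gluster server.
# parallel-readdir requires readdir-ahead to be enabled first.
gluster volume set myvol performance.readdir-ahead on
gluster volume set myvol performance.parallel-readdir on

# Confirm the current value:
gluster volume get myvol performance.parallel-readdir
```

This is a configuration sketch only; it needs a running Gluster cluster to try.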

On Mon, May 4, 2020 at 9:55 AM Alvin Starr via talk  wrote:

>
> I am hoping someone has seen this kind of problem before and knows of a
> solution.
> I have a client who has file systems filled with lots of small files on
> the orders of hundreds of millions of files.
> Running something like a find on the filesystem takes the better part of a
> week, so any kind of directory-walking backup tool will take even longer
> to run.
> The actual data size for 100M files is on the order of 15TB, so there is
> a lot of data to back up, but the data only grows by tens to hundreds of
> MB a day.
>
>
> Even things like xfsdump take a long time.
> For example I tried xfsdump on a 50M file set and it took over 2 days to
> complete.
>
> The only thing that seems to be workable is Veeam.
> It will run an incremental volume snapshot in a few hours a night, but I
> dislike adding proprietary kernel modules to the systems.
>
>
> --
> Alvin Starr   ||   land:  (647)478-6285
> Netvel Inc.   ||   Cell:  (416)806-0133
> al...@netvel.net  ||
>
> ---
> Post to this mailing list talk@gtalug.org
> Unsubscribe from this mailing list
> https://gtalug.org/mailman/listinfo/talk
>


Re: [GTALUG] On the subject of backups.

2020-05-06 Thread John Sellens via talk
On Wed, 2020/05/06 10:38:29AM -0400, Howard Gibson via talk wrote:
| > ZFS is another option. And it handles delta-backups very easily.
| 
|How do you recover stuff from delta backups?  You have to figure out which 
backup the file or directory is in, right?

Remember that snapshots, like RAID, are not actually backups,
unless they are on a different machine, in a different place.

ZFS makes it easy:

You can browse through snapshots for /mypool/myfs by looking
in /mypool/myfs/.zfs/snapshot and if your ZFS snapshots are
named using dates, easy peasy to choose when.  You can also
brute force it:
  find /mypool/myfs/.zfs/snapshot -name 'myfile.tex' -ls
and see what's there.

You can use "zfs rollback" to revert to a snapshot.

You can use "zfs send ... | zfs recv ..." to copy a specific
snapshot (or group of snapshots) to another pool, system, etc.

And of course, when you create a snapshot, you could create your
own index listing of what's there for easy grepping.
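For example, a nightly job along these lines snapshots the filesystem and writes an index you can grep later (pool/filesystem names and the index path are placeholders, assuming a working ZFS install):

```shell
# Hypothetical names: pool "mypool", fs "myfs", index dir /mypool/indexes.
TODAY=$(date +%Y-%m-%d)
zfs snapshot "mypool/myfs@${TODAY}"

# Index the snapshot's contents while they are fresh:
mkdir -p /mypool/indexes
find "/mypool/myfs/.zfs/snapshot/${TODAY}" -ls > "/mypool/indexes/${TODAY}.idx"

# Later, find which snapshot(s) contain a lost file:
grep 'myfile.tex' /mypool/indexes/*.idx
```

This is an administrative sketch; it requires an actual ZFS pool to run.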

ZFS is great.

You can still (and likely should) continue to back up to Blu-ray,
but ZFS will make sure your files don't rot in place unnoticed.

Hope that helps, cheers

John


Re: [GTALUG] On the subject of backups.

2020-05-06 Thread Alvin Starr via talk

On 5/6/20 10:18 AM, Lennart Sorensen via talk wrote:

On Wed, May 06, 2020 at 07:25:29AM -0400, David Mason wrote:

ZFS is another option. And it handles delta-backups very easily.

It is, however, not the XFS that glusterfs says to use.  If glusterfs is
involved, XFS really seems to be the only option.

There is Gluster-hosted documentation on how to use ZFS with Gluster, but 
it is not the preferred option.

Putting Gluster on a thin LVM is the Gluster way of doing snapshots.
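A rough sketch of that layout, with placeholder volume-group and brick names (sizes are illustrative only):

```shell
# Placeholder names: VG "vg0", thin pool "brickpool", brick LV "brick1".
lvcreate --type thin-pool -L 20T -n brickpool vg0
lvcreate --type thin -V 15T --thinpool vg0/brickpool -n brick1 vg0

# XFS on the brick, per Gluster's usual recommendation:
mkfs.xfs -i size=512 /dev/vg0/brick1
mount /dev/vg0/brick1 /bricks/brick1

# Thin snapshots are cheap and nearly instant:
lvcreate -s -n brick1-snap vg0/brick1
lvchange -ay -K vg0/brick1-snap   # thin snapshots start deactivated
```

This is a configuration sketch only; it obviously needs real disks and a volume group to run.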

I have found a package called wyng-backup 
(https://github.com/tasket/wyng-backup).


This may solve my problem, but it would require me to change all my 
volumes to thin provisioning.







Re: [GTALUG] On the subject of backups.

2020-05-06 Thread Howard Gibson via talk
On Wed, 6 May 2020 07:25:29 -0400
David Mason via talk  wrote:

> ZFS is another option. And it handles delta-backups very easily.

David,

   How do you recover stuff from delta backups?  You have to figure out which 
backup the file or directory is in, right?

   My backup recoveries, admittedly here at home, consist of me recovering 
files I stupidly modified incorrectly, and some accidentally deleted 
directories.  In the case of the directories, I noticed the problem several 
years after I did it.  Since I back up everything, all I had to do was pull out 
the backup Blu-ray from shortly after I worked on the directory. 

-- 
Howard Gibson 
hgib...@eol.ca
jhowardgib...@gmail.com
http://home.eol.ca/~hgibson


Re: [GTALUG] On the subject of backups.

2020-05-06 Thread Lennart Sorensen via talk
On Wed, May 06, 2020 at 07:25:29AM -0400, David Mason wrote:
> ZFS is another option. And it handles delta-backups very easily.

It is, however, not the XFS that glusterfs says to use.  If glusterfs is
involved, XFS really seems to be the only option.

-- 
Len Sorensen


Re: [GTALUG] On the subject of backups.

2020-05-06 Thread Alvin Starr via talk

On 5/6/20 7:25 AM, David Mason via talk wrote:

ZFS is another option. And it handles delta-backups very easily.

../Dave

I have been following ZFS on and off and it looks interesting, but I am 
kind of stuck because it is not included in CentOS/RHEL, which is a 
requirement in this case.


It would be interesting to test it at scale just to see how well it does 
work.





Re: [GTALUG] On the subject of backups.

2020-05-06 Thread David Mason via talk
ZFS is another option. And it handles delta-backups very easily.

../Dave
On May 5, 2020, 11:27 PM -0400, Lennart Sorensen via talk wrote:
> On Mon, May 04, 2020 at 10:42:25PM -0400, Alvin Starr via talk wrote:
> > The files are generally a few hundred KB each. They may run into a few MB
> > but that's about it.
> > I used to use ReiserFS back in the days of ext2/3 but it kind of fell out of
> > favor after the lead developer got sent away for murder.
> > Reiser was much faster and more reliable than ext at the time.
> > It would actually be interesting to see if running a reiserfs or btrfs
> > filesystem would actually make a significant difference but in the long run
> > I am kind of stuck with Centos/RH supported file systems and reiser and
> > btrfs are not part of that mix anymore.
>
> ReiserFS was not reliable. I certainly stopped using it long before
> the developer issues happened. The silent file content loss was just
> unacceptable. And it wasn't a rare occurrence. I saw it on many systems
> many times. ext2 and 3 you could at least trust with your data even if
> they were quite a bit slower. Certainly these days RHEL supports ext2/3/4
> and XFS (their default and preferred). I use ext4 because it works well.
> GlusterFS defaults to XFS and while technically it can use other
> filesystems (and many people do run ext4 on it apparently) I don't
> believe they support that setup.
>
> > I am not sure how much I can get by tweaking the filesystem.
> > I would need to get a 50x-100x improvement to make backups complete in a
> > few hours.
> > Most stuff I have read comparing various filesystems and performance
> > talks about percentage differences of much less than 100%.
> >
> > I have a feeling that the only answer will be something like Veeam where
> > only changed blocks are backed up.
> > A directory tree walk just takes too long.
>
> Well, does the system have enough ram? That is something that often
> isn't hard to increase. XFS has certainly in the past been known to
> require a fair bit of ram to manage well.
>
> --
> Len Sorensen


Re: [GTALUG] On the subject of backups.

2020-05-06 Thread Alvin Starr via talk

On 5/6/20 12:37 AM, Nicholas Krause wrote:
[snip]

Well, does the system have enough ram?  That is something that often
isn't hard to increase.  XFS has certainly in the past been known to
require a fair bit of ram to manage well.

I mentioned to check /proc/meminfo as well to look at cache in case you
missed that.

You did mention that, and it is on my plate to research the amount of RAM 
suggested for this kind of application.





Re: [GTALUG] On the subject of backups.

2020-05-06 Thread Alvin Starr via talk

On 5/5/20 11:27 PM, Lennart Sorensen wrote:

On Mon, May 04, 2020 at 10:42:25PM -0400, Alvin Starr via talk wrote:

The files are generally a few hundred KB each. They may run into a few MB
but that's about it.
I used to use ReiserFS back in the days of ext2/3 but it kind of fell out of
favor after the lead developer got sent away for murder.
Reiser was much faster and more reliable than ext at the time.
It would actually be interesting to see if running a reiserfs or btrfs
filesystem would actually make a significant difference but in the long run
I am kind of stuck with Centos/RH supported file systems and reiser and
btrfs are not part of that mix anymore.

ReiserFS was not reliable.  I certainly stopped using it long before
the developer issues happened.  The silent file content loss was just
unacceptable.  And it wasn't a rare occurrence.  I saw it on many systems
many times.  ext2 and 3 you could at least trust with your data even if
they were quite a bit slower.  Certainly these days RHEL supports ext2/3/4
and XFS (their default and preferred).  I use ext4 because it works well.
GlusterFS defaults to XFS and while technically it can use other
filesystems (and many people do run ext4 on it apparently) I don't
believe they support that setup.


This is a nice example of YMMV.
I moved my critical data off ext2/3 because they would trash my data 
and corrupt the whole filesystem in certain cases.
I had much better luck with Reiser, even in the face of a couple of 
system crashes.

I am not sure how much I can get by tweaking the filesystem.
I would need to get a 50x-100x improvement to make backups complete in a
few hours.
Most stuff I have read comparing various filesystems and performance
talks about percentage differences of much less than 100%.

I have a feeling that the only answer will be something like Veeam where
only changed blocks are backed up.
A directory tree walk just takes too long.

Well, does the system have enough ram?  That is something that often
isn't hard to increase.  XFS has certainly in the past been known to
require a fair bit of ram to manage well.


The system has 32G of RAM and runs nothing but Gluster.
I am not sure if that is enough; it will require a bit of research.
It looks like the system has about 15G of cached data.
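One quick way to see where that RAM is going, page cache versus the dentry/inode caches that a tree walk actually exercises, is:

```shell
# Page cache vs. kernel slab (dentry and inode caches live in slab):
grep -E '^(MemTotal|MemFree|Cached|Slab|SReclaimable)' /proc/meminfo

# How aggressively the kernel reclaims dentry/inode caches
# (default 100; lower values keep metadata cached longer):
cat /proc/sys/vm/vfs_cache_pressure
```

Lowering vm.vfs_cache_pressure below its default of 100 biases the kernel toward keeping filesystem metadata in memory, which can help repeated directory walks at the expense of data cache.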

