Re: copying millions of small files and millions of dirs

2013-10-10 Thread aurfalien

On Aug 15, 2013, at 11:46 PM, Nicolas KOWALSKI wrote:

 On Thu, Aug 15, 2013 at 11:13:25AM -0700, aurfalien wrote:
 Is there a faster way to copy files over NFS?
 
 I would use find+cpio. This handles hard links, permissions, and in case 
 of later runs, will not copy files if they already exist on the 
 destination.
 
 # cd /source/dir
 # find . | cpio -pvdm /destination/dir


Old thread, I know, but cpio has proven twice as fast as rsync.

Trusty ol' cpio.

Gonna try cpdup next.
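For reference, a minimal cpdup run would be something along these lines (the
destination path is illustrative; check cpdup(1) for the exact flags), and like
rsync a second run should only copy what has changed:

# cpdup -v /source/dir /destination/dir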

- aurf


Re: copying millions of small files and millions of dirs

2013-10-10 Thread Warren Block

On Thu, 10 Oct 2013, aurfalien wrote:



On Aug 15, 2013, at 11:46 PM, Nicolas KOWALSKI wrote:


On Thu, Aug 15, 2013 at 11:13:25AM -0700, aurfalien wrote:

Is there a faster way to copy files over NFS?


I would use find+cpio. This handles hard links, permissions, and in case
of later runs, will not copy files if they already exist on the
destination.

# cd /source/dir
# find . | cpio -pvdm /destination/dir



Old thread I know but cpio has proven twice as fast as rsync.

Trusty ol cpio.

Gonna try cpdup next.


Try sysutils/clone, too.


Re: copying millions of small files and millions of dirs

2013-08-20 Thread krad
When I migrated a large mailspool in maildir format from the old NFS server
to the new one in a previous job, I first generated a list of the top-level
maildirs. I then generated the rsync commands plus a few other bits and
pieces for each maildir to make a single transaction-like function. I then
pumped all these auto-generated scripts into xjobs and ran them in parallel.
This vastly sped up the process, as sequentially running the tree was far
too slow. This was for about 15 million maildirs in a hashed structure, btw,
so a fair amount of files.


e.g.:

find /maildir -type d -maxdepth 4 | while read d
do
r=$(($RANDOM*$RANDOM))
echo rsync -a $d/ /newpath/$d/ > /tmp/scripts/$r
echo some other stuff >> /tmp/scripts/$r
done

ls /tmp/scripts/ | while read f
do
echo /tmp/scripts/$f
done | xjobs -j 20
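
If xjobs isn't available, roughly the same fan-out can be sketched with xargs -P
(illustrative only, not a drop-in replacement; it skips the per-maildir script
files and just runs the rsyncs directly):

find /maildir -maxdepth 4 -type d | \
    xargs -P 20 -I{} sh -c 'mkdir -p "/newpath$1" && rsync -a "$1/" "/newpath$1/"' sh {}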

On 19 August 2013 18:52, aurfalien aurfal...@gmail.com wrote:


 On Aug 19, 2013, at 10:41 AM, Mark Felder wrote:

  On Fri, Aug 16, 2013, at 1:46, Nicolas KOWALSKI wrote:
  On Thu, Aug 15, 2013 at 11:13:25AM -0700, aurfalien wrote:
  Is there a faster way to copy files over NFS?
 
  I would use find+cpio. This handles hard links, permissions, and in case
  of later runs, will not copy files if they already exist on the
  destination.
 
  # cd /source/dir
  # find . | cpio -pvdm /destination/dir
 
 
  I always found sysutils/cpdup to be faster than rsync.

 Ah, bookmarking this one.

 Many thanks.

 - aurf


Re: copying millions of small files and millions of dirs

2013-08-20 Thread krad
Whoops, that should have been:

ls /tmp/scripts/ | while read f
do
echo sh /tmp/scripts/$f
done | xjobs -j 20


On 20 August 2013 08:32, krad kra...@gmail.com wrote:

 When I migrated a large mailspool in maildir format from the old NFS
 server to the new one in a previous job, I first generated a list of the
 top-level maildirs. I then generated the rsync commands plus a few other
 bits and pieces for each maildir to make a single transaction-like function.
 I then pumped all these auto-generated scripts into xjobs and ran them in
 parallel. This vastly sped up the process, as sequentially running the tree
 was far too slow. This was for about 15 million maildirs in a hashed
 structure, btw, so a fair amount of files.


 e.g.:

 find /maildir -type d -maxdepth 4 | while read d
 do
 r=$(($RANDOM*$RANDOM))
 echo rsync -a $d/ /newpath/$d/ > /tmp/scripts/$r
 echo some other stuff >> /tmp/scripts/$r
 done

 ls /tmp/scripts/ | while read f
 do
 echo /tmp/scripts/$f
 done | xjobs -j 20

 On 19 August 2013 18:52, aurfalien aurfal...@gmail.com wrote:


 On Aug 19, 2013, at 10:41 AM, Mark Felder wrote:

  On Fri, Aug 16, 2013, at 1:46, Nicolas KOWALSKI wrote:
  On Thu, Aug 15, 2013 at 11:13:25AM -0700, aurfalien wrote:
  Is there a faster way to copy files over NFS?
 
  I would use find+cpio. This handles hard links, permissions, and in
 case
  of later runs, will not copy files if they already exist on the
  destination.
 
  # cd /source/dir
  # find . | cpio -pvdm /destination/dir
 
 
  I always found sysutils/cpdup to be faster than rsync.

 Ah, bookmarking this one.

 Many thanks.

 - aurf


Re: copying millions of small files and millions of dirs

2013-08-20 Thread Frank Leonhardt

On 20/08/2013 08:32, krad wrote:

When I migrated a large mailspool in maildir format from the old NFS server
to the new one in a previous job, I first generated a list of the top-level
maildirs. I then generated the rsync commands plus a few other bits and
pieces for each maildir to make a single transaction-like function. I then
pumped all these auto-generated scripts into xjobs and ran them in parallel.
This vastly sped up the process, as sequentially running the tree was far
too slow. This was for about 15 million maildirs in a hashed structure, btw,
so a fair amount of files.


e.g.:

find /maildir -type d -maxdepth 4 | while read d
do
r=$(($RANDOM*$RANDOM))
echo rsync -a $d/ /newpath/$d/ > /tmp/scripts/$r
echo some other stuff >> /tmp/scripts/$r
done

ls /tmp/scripts/ | while read f
do
echo /tmp/scripts/$f
done | xjobs -j 20



This isn't what I'd have expected, as running operations in parallel on 
mechanical drives would normally result in superfluous head movements 
and thus exacerbate the I/O bottleneck. The system must be optimising 
the requests from 20 parallel jobs better than I thought it would to 
climb out of that hole far enough to get a net benefit. Do you 
remember how any other approaches performed?




Re: copying millions of small files and millions of dirs

2013-08-20 Thread Warren Block

On Mon, 19 Aug 2013, Mark Felder wrote:


On Fri, Aug 16, 2013, at 1:46, Nicolas KOWALSKI wrote:

On Thu, Aug 15, 2013 at 11:13:25AM -0700, aurfalien wrote:

Is there a faster way to copy files over NFS?


I would use find+cpio. This handles hard links, permissions, and in case
of later runs, will not copy files if they already exist on the
destination.

# cd /source/dir
# find . | cpio -pvdm /destination/dir



I always found sysutils/cpdup to be faster than rsync.


sysutils/clone may do better as well.


Re: copying millions of small files and millions of dirs

2013-08-19 Thread Mark Felder
On Fri, Aug 16, 2013, at 1:46, Nicolas KOWALSKI wrote:
 On Thu, Aug 15, 2013 at 11:13:25AM -0700, aurfalien wrote:
  Is there a faster way to copy files over NFS?
 
 I would use find+cpio. This handles hard links, permissions, and in case 
 of later runs, will not copy files if they already exist on the 
 destination.
 
 # cd /source/dir
 # find . | cpio -pvdm /destination/dir
 

I always found sysutils/cpdup to be faster than rsync.


Re: copying millions of small files and millions of dirs

2013-08-19 Thread aurfalien

On Aug 19, 2013, at 10:41 AM, Mark Felder wrote:

 On Fri, Aug 16, 2013, at 1:46, Nicolas KOWALSKI wrote:
 On Thu, Aug 15, 2013 at 11:13:25AM -0700, aurfalien wrote:
 Is there a faster way to copy files over NFS?
 
 I would use find+cpio. This handles hard links, permissions, and in case 
 of later runs, will not copy files if they already exist on the 
 destination.
 
 # cd /source/dir
 # find . | cpio -pvdm /destination/dir
 
 
 I always found sysutils/cpdup to be faster than rsync.

Ah, bookmarking this one.

Many thanks.

- aurf


Re: copying millions of small files and millions of dirs

2013-08-16 Thread Nicolas KOWALSKI
On Thu, Aug 15, 2013 at 11:13:25AM -0700, aurfalien wrote:
 Is there a faster way to copy files over NFS?

I would use find+cpio. This handles hard links, permissions, and in case 
of later runs, will not copy files if they already exist on the 
destination.

# cd /source/dir
# find . | cpio -pvdm /destination/dir
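
(For those unfamiliar with the flags: -p runs cpio in copy-pass mode, -d creates
leading directories as needed, -m preserves modification times, and -v lists the
files as they are copied.)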

-- 
Nicolas


copying millions of small files and millions of dirs

2013-08-15 Thread aurfalien
Hi all,

Is there a faster way to copy files over NFS?

Currently breaking up a simple rsync over 7 or so scripts, each of which copies 22 dirs 
having ~500,000 dirs or files each.
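
(Each of those scripts is basically a loop like the following; the paths and dir
names here are made up:)

#!/bin/sh
# one of the 7 batch scripts, each covering 22 of the 154 parent dirs
for d in job001 job002 job003; do   # ...22 entries in the real thing
    rsync -a "/mnt/bluearc/projects/$d/" "/tank/backup/projects/$d/"
done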

Obviously reading all the metadata is a PITA.

Doin 10Gb/jumbos but in this case it don't make much of a hoot of a diff.

Going from a 38TB used, 50TB total BlueArc Titan 3200 to a new shiny 80TB total 
FreeBSD 9.2RC1 ZFS bad boy.

Thanks in advance,

- aurf





Re: copying millions of small files and millions of dirs

2013-08-15 Thread aurfalien

On Aug 15, 2013, at 11:26 AM, Charles Swiger wrote:

 On Aug 15, 2013, at 11:13 AM, aurfalien aurfal...@gmail.com wrote:
 Is there a faster way to copy files over NFS?
 
 Probably.

Ok, thanks for the specifics.

 Currently breaking up a simple rsync over 7 or so scripts which copies 22 
 dirs having ~500,000 dirs or files each.
 
 There's a maximum useful concurrency which depends on how many disk spindles 
 and what flavor of RAID is in use; exceeding it will result in thrashing the 
 disks and heavily reducing throughput due to competing I/O requests.  Try 
 measuring aggregate performance when running fewer rsyncs at once and see 
 whether it improves.

It's 35 disks broken into 7 striped RAID-Z groups with an SLC-based ZIL and no 
atime; the server itself has 128GB of ECC RAM.  I didn't have time to tune or 
really learn ZFS, but at this point it's only backing up the data for emergency 
purposes.

 Of course, putting half a million files into a single directory level is also 
 a bad idea, even with dirhash support.  You'd do better to break them up into 
 subdirs containing fewer than ~10K files apiece.

I can't; that's our job structure, obviously developed by script kiddies and not 
systems ppl, but I digress.

 Obviously reading all the meta data is a PITA.
 
 Yes.
 
 Doin 10Gb/jumbos but in this case it don't make much of a hoot of a diff.
 
 Yeah, probably not-- you're almost certainly I/O bound, not network bound.

Actually it was network bound via 1 rsync process which is why I broke up 154 
dirs into 7 batches of 22 each.

I'll have to acquaint myself with ZFS-centric tools to help me determine what's 
going on.

But 




Re: copying millions of small files and millions of dirs

2013-08-15 Thread aurfalien

On Aug 15, 2013, at 11:52 AM, Charles Swiger wrote:

 On Aug 15, 2013, at 11:37 AM, aurfalien aurfal...@gmail.com wrote:
 On Aug 15, 2013, at 11:26 AM, Charles Swiger wrote:
 On Aug 15, 2013, at 11:13 AM, aurfalien aurfal...@gmail.com wrote:
 Is there a faster way to copy files over NFS?
 
 Probably.
 
 Ok, thanks for the specifics.
 
 You're most welcome.
 
 Currently breaking up a simple rsync over 7 or so scripts which copies 22 
 dirs having ~500,000 dirs or files each.
 
 There's a maximum useful concurrency which depends on how many disk 
 spindles and what flavor of RAID is in use; exceeding it will result in 
 thrashing the disks and heavily reducing throughput due to competing I/O 
 requests.  Try measuring aggregate performance when running fewer rsyncs at 
 once and see whether it improves.
 
 Its 35 disks broken into 7 striped RaidZ groups with an SLC based ZIL and no 
 atime, the server it self has 128GB ECC RAM.  I didn't have time to tune or 
 really learn ZFS but at this point its only backing up the data for 
 emergency purposes.
 
 OK.  If you've got 7 independent groups and can use separate network pipes 
 for each parallel copy, then using 7 simultaneous scripts is likely 
 reasonable.
 
 Of course, putting half a million files into a single directory level is 
 also a bad idea, even with dirhash support.  You'd do better to break them 
 up into subdirs containing fewer than ~10K files apiece.
 
 I can't, thats our job structure obviously developed by scrip kiddies and 
 not systems ppl, but I digress.
 
 Identifying something which is broken as designed is still helpful, since 
 it indicates what needs to change.
 
 Obviously reading all the meta data is a PITA.
 
 Yes.
 
 Doin 10Gb/jumbos but in this case it don't make much of a hoot of a diff.
 
 Yeah, probably not-- you're almost certainly I/O bound, not network bound.
 
 Actually it was network bound via 1 rsync process which is why I broke up 
 154 dirs into 7 batches of 22 each.
 
 Oh.  Um, unless you can make more network bandwidth available, you've 
 saturated the bottleneck.
 Doing a single copy task is likely to complete faster than splitting up the 
 job into subtasks in such a case.

Well, using iftop, I am now at least able to get ~1Gb with 7 scripts going, where 
before it was in the 10MB/s range with 1.

Also, physically looking at my ZFS server, it now shows the drive lights 
blinking faster, like every second.  Whereas before it was sort of seldom, like 
every 3 seconds or so.

I was thinking of perhaps zipping dirs up and then xferring the file over, but it 
would prolly take as long to zip/unzip.

This bloody project structure we have is nuts.

- aurf


Re: copying millions of small files and millions of dirs

2013-08-15 Thread Charles Swiger
On Aug 15, 2013, at 11:13 AM, aurfalien aurfal...@gmail.com wrote:
 Is there a faster way to copy files over NFS?

Probably.

 Currently breaking up a simple rsync over 7 or so scripts which copies 22 
 dirs having ~500,000 dirs or files each.

There's a maximum useful concurrency which depends on how many disk spindles 
and what flavor of RAID is in use; exceeding it will result in thrashing the 
disks and heavily reducing throughput due to competing I/O requests.  Try 
measuring aggregate performance when running fewer rsyncs at once and see 
whether it improves.
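
One rough way to do that, with dirs.txt holding one parent dir name per line
(the file name and paths here are made up):

for n in 1 2 4 7; do
    echo "=== $n concurrent rsyncs ==="
    time xargs -P "$n" -I{} rsync -a "/source/dir/{}/" "/destination/dir/{}/" < dirs.txt
done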

Of course, putting half a million files into a single directory level is also a 
bad idea, even with dirhash support.  You'd do better to break them up into 
subdirs containing fewer than ~10K files apiece.

 Obviously reading all the meta data is a PITA.

Yes.

 Doin 10Gb/jumbos but in this case it don't make much of a hoot of a diff.

Yeah, probably not-- you're almost certainly I/O bound, not network bound.

Regards,
-- 
-Chuck



Re: copying millions of small files and millions of dirs

2013-08-15 Thread Adam Vande More
On Thu, Aug 15, 2013 at 1:13 PM, aurfalien aurfal...@gmail.com wrote:

 Hi all,

 Is there a faster way to copy files over NFS?


Remove NFS from the setup.



-- 
Adam Vande More


Re: copying millions of small files and millions of dirs

2013-08-15 Thread aurfalien

On Aug 15, 2013, at 12:36 PM, Adam Vande More wrote:

 On Thu, Aug 15, 2013 at 1:13 PM, aurfalien aurfal...@gmail.com wrote:
 Hi all,
 
 Is there a faster way to copy files over NFS?
 
 Remove NFS from the setup.  

Yea, your mouth to God's ears.

My BlueArc is an NFS NAS only box.

So no way to get to the data other than NFS.

- aurf


Re: copying millions of small files and millions of dirs

2013-08-15 Thread Charles Swiger
On Aug 15, 2013, at 11:37 AM, aurfalien aurfal...@gmail.com wrote:
 On Aug 15, 2013, at 11:26 AM, Charles Swiger wrote:
 On Aug 15, 2013, at 11:13 AM, aurfalien aurfal...@gmail.com wrote:
 Is there a faster way to copy files over NFS?
 
 Probably.
 
 Ok, thanks for the specifics.

You're most welcome.

 Currently breaking up a simple rsync over 7 or so scripts which copies 22 
 dirs having ~500,000 dirs or files each.
 
 There's a maximum useful concurrency which depends on how many disk spindles 
 and what flavor of RAID is in use; exceeding it will result in thrashing the 
 disks and heavily reducing throughput due to competing I/O requests.  Try 
 measuring aggregate performance when running fewer rsyncs at once and see 
 whether it improves.
 
 Its 35 disks broken into 7 striped RaidZ groups with an SLC based ZIL and no 
 atime, the server it self has 128GB ECC RAM.  I didn't have time to tune or 
 really learn ZFS but at this point its only backing up the data for emergency 
 purposes.

OK.  If you've got 7 independent groups and can use separate network pipes for 
each parallel copy, then using 7 simultaneous scripts is likely reasonable.

 Of course, putting half a million files into a single directory level is 
 also a bad idea, even with dirhash support.  You'd do better to break them 
 up into subdirs containing fewer than ~10K files apiece.
 
 I can't, thats our job structure obviously developed by scrip kiddies and not 
 systems ppl, but I digress.

Identifying something which is broken as designed is still helpful, since it 
indicates what needs to change.

 Obviously reading all the meta data is a PITA.
 
 Yes.
 
 Doin 10Gb/jumbos but in this case it don't make much of a hoot of a diff.
 
 Yeah, probably not-- you're almost certainly I/O bound, not network bound.
 
 Actually it was network bound via 1 rsync process which is why I broke up 154 
 dirs into 7 batches of 22 each.

Oh.  Um, unless you can make more network bandwidth available, you've saturated 
the bottleneck.
Doing a single copy task is likely to complete faster than splitting up the job 
into subtasks in such a case.

Regards,
-- 
-Chuck



Re: copying millions of small files and millions of dirs

2013-08-15 Thread Frank Leonhardt

On 15/08/2013 19:13, aurfalien wrote:

Hi all,

Is there a faster way to copy files over NFS?

Currently breaking up a simple rsync over 7 or so scripts which copies 22 dirs 
having ~500,000 dirs or files each.



I'm reading all this with interest. The first thing I'd have tried would 
be tar (and probably netcat), but I'm probably a bit of a dinosaur. (If 
someone wants to buy me some really big drives I promise I'll update.) 
If it's really NFS or nothing I guess you couldn't open a socket anyway.


I'd be interested to know whether tar is still worth using in this world 
of volume managers and SMP.




Re: copying millions of small files and millions of dirs

2013-08-15 Thread Charles Swiger
[ ...combining replies for brevity... ]

On Aug 15, 2013, at 1:02 PM, Frank Leonhardt fra...@fjl.co.uk wrote:
 I'm reading all this with interest. The first thing I'd have tried would be 
 tar (and probably netcat) but I'm a probably bit of a dinosaur. (If someone 
 wants to buy me some really big drives I promise I'll update). If it's really 
 NFS or nothing I guess you couldn't open a socket anyway.

Either tar via netcat or SSH, or dump / restore via a similar pipeline, is quite 
traditional.  tar is more flexible for partial filesystem copies, whereas 
dump / restore is more oriented towards complete filesystem copies.  If the 
destination starts off empty, they're probably faster than rsync, but rsync 
does delta updates, which is a huge win if you're going to be copying changes 
onto a slightly older version.

Anyway, you're entirely right that the capabilities of the source matter a 
great deal.
If it could do zfs send / receive, or similar snapshot mirroring, that would 
likely do better than userland tools.
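
When both ends can do it, that route is roughly (pool/dataset names are made up):

# zfs snapshot tank/data@migrate1
# zfs send tank/data@migrate1 | ssh desthost zfs receive backup/data

with later catch-ups done incrementally via zfs send -i migrate1 tank/data@migrate2.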

 I'd be interested to know whether tar is still worth using in this world of 
 volume managers and SMP.

Yes.

On Aug 15, 2013, at 12:14 PM, aurfalien aurfal...@gmail.com wrote:
[ ... ]
 Doin 10Gb/jumbos but in this case it don't make much of a hoot of a diff.
 
 Yeah, probably not-- you're almost certainly I/O bound, not network bound.
 
 Actually it was network bound via 1 rsync process which is why I broke up 
 154 dirs into 7 batches of 22 each.
 
 Oh.  Um, unless you can make more network bandwidth available, you've 
 saturated the bottleneck.
 Doing a single copy task is likely to complete faster than splitting up the 
 job into subtasks in such a case.
 
 Well, using iftop, I am now at least able to get ~1Gb with 7 scripts going 
 were before it was in the 10Ms with 1.

1 gigabyte of data per second is pretty decent for a 10Gb link; 10 MB/s 
obviously wasn't close to saturating a 10Gb link.

Regards,
-- 
-Chuck



Re: copying millions of small files and millions of dirs

2013-08-15 Thread Roland Smith
On Thu, Aug 15, 2013 at 11:13:25AM -0700, aurfalien wrote:
 Hi all,
 
 Is there a faster way to copy files over NFS?

Can you log into your NAS with ssh or telnet?

If so, I would suggest using tar(1) and nc(1). It has been a while since I
measured it, but IIRC the combination of tar (without compression) and netcat
could saturate a 100 Mbit ethernet connection.
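
Roughly like this (the port and host name are placeholders; some netcat builds
want -l -p 3333 rather than -l 3333; start the receiving side first):

On the destination:
# cd /destination/dir
# nc -l 3333 | tar -xpf -

On the source:
# cd /source/dir
# tar -cf - . | nc desthost 3333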

Roland
-- 
R.F.Smith   http://rsmith.home.xs4all.nl/
[plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
pgp: 1A2B 477F 9970 BA3C 2914  B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)




Re: copying millions of small files and millions of dirs

2013-08-15 Thread aurfalien

On Aug 15, 2013, at 1:35 PM, Roland Smith wrote:

 On Thu, Aug 15, 2013 at 11:13:25AM -0700, aurfalien wrote:
 Hi all,
 
 Is there a faster way to copy files over NFS?
 
 Can you log into your NAS with ssh or telnet?

I can, but that's a back-channel 100Mb link.

- aurf


Re: copying millions of small files and millions of dirs

2013-08-15 Thread aurfalien

On Aug 15, 2013, at 1:22 PM, Charles Swiger wrote:

 [ ...combining replies for brevity... ]
 
 On Aug 15, 2013, at 1:02 PM, Frank Leonhardt fra...@fjl.co.uk wrote:
 I'm reading all this with interest. The first thing I'd have tried would be 
 tar (and probably netcat) but I'm a probably bit of a dinosaur. (If someone 
 wants to buy me some really big drives I promise I'll update). If it's 
 really NFS or nothing I guess you couldn't open a socket anyway.
 
 Either tar via netcat or SSH, or dump / restore via similar pipeline are 
 quite traditional.  tar is more flexible for partial filesystem copies, 
 whereas the dump / restore is more oriented towards complete filesystem 
 copies.  If the destination starts off empty, they're probably faster than 
 rsync, but rsync does delta updates which is a huge win if you're going to be 
 copying changes onto a slightly older version.

Yep, so looks like it is what it is, as the data set is changing while I do the 
base sync.  So I'll have to do several more passes to pick up newcomers, etc...

 Anyway, you're entirely right that the capabilities of the source matter a 
 great deal.
 If it could do zfs send / receive, or similar snapshot mirroring, that would 
 likely do better than userland tools.
 
 I'd be interested to know whether tar is still worth using in this world of 
 volume managers and SMP.
 
 Yes.
 
 On Aug 15, 2013, at 12:14 PM, aurfalien aurfal...@gmail.com wrote:
 [ ... ]
 Doin 10Gb/jumbos but in this case it don't make much of a hoot of a diff.
 
 Yeah, probably not-- you're almost certainly I/O bound, not network bound.
 
 Actually it was network bound via 1 rsync process which is why I broke up 
 154 dirs into 7 batches of 22 each.
 
 Oh.  Um, unless you can make more network bandwidth available, you've 
 saturated the bottleneck.
 Doing a single copy task is likely to complete faster than splitting up the 
 job into subtasks in such a case.
 
 Well, using iftop, I am now at least able to get ~1Gb with 7 scripts going 
 were before it was in the 10Ms with 1.
 
 1 gigabyte of data per second is pretty decent for a 10Gb link; 10 MB/s 
 obviously wasn't close saturating a 10Gb link.

Cool.  Looks like I am doing my best, which is what I wanted to know.  I chose 
to do 7 rsync scripts as that divides the 154 parent dirs evenly :)

You should see how our backup system deals with this: Atempo Time Navigator, or 
Tina as it's called.

It takes an hour just to lay down the dirs on tape before even starting to 
back up; craziness.  And that's just for 1 parent dir having an avg of 500,000 
dirs.  Actually I'm prolly wrong, as the initial creation is 125,000 dirs, of 
which a few are symlinks.

Then it grows from there.  Looking at the Tina stats, we see a million objects 
or more.

- aurf


Re: copying millions of small files and millions of dirs

2013-08-15 Thread iamatt
I would use NDMP.  That is how we archive our NAS crap (Isilon stuff), but
we have the backend accelerators.  Not sure if there is NDMP for FreeBSD.
Like another poster said, you are most likely I/O bound anyway.


On Thu, Aug 15, 2013 at 2:14 PM, aurfalien aurfal...@gmail.com wrote:


 On Aug 15, 2013, at 11:52 AM, Charles Swiger wrote:

  On Aug 15, 2013, at 11:37 AM, aurfalien aurfal...@gmail.com wrote:
  On Aug 15, 2013, at 11:26 AM, Charles Swiger wrote:
  On Aug 15, 2013, at 11:13 AM, aurfalien aurfal...@gmail.com wrote:
  Is there a faster way to copy files over NFS?
 
  Probably.
 
  Ok, thanks for the specifics.
 
  You're most welcome.
 
  Currently breaking up a simple rsync over 7 or so scripts which
 copies 22 dirs having ~500,000 dirs or files each.
 
  There's a maximum useful concurrency which depends on how many disk
 spindles and what flavor of RAID is in use; exceeding it will result in
 thrashing the disks and heavily reducing throughput due to competing I/O
 requests.  Try measuring aggregate performance when running fewer rsyncs at
 once and see whether it improves.
 
  Its 35 disks broken into 7 striped RaidZ groups with an SLC based ZIL
 and no atime, the server it self has 128GB ECC RAM.  I didn't have time to
 tune or really learn ZFS but at this point its only backing up the data for
 emergency purposes.
 
  OK.  If you've got 7 independent groups and can use separate network
 pipes for each parallel copy, then using 7 simultaneous scripts is likely
 reasonable.
 
  Of course, putting half a million files into a single directory level
 is also a bad idea, even with dirhash support.  You'd do better to break
 them up into subdirs containing fewer than ~10K files apiece.
 
  I can't, thats our job structure obviously developed by scrip kiddies
 and not systems ppl, but I digress.
 
  Identifying something which is broken as designed is still helpful,
 since it indicates what needs to change.
 
  Obviously reading all the meta data is a PITA.
 
  Yes.
 
  Doin 10Gb/jumbos but in this case it don't make much of a hoot of a
 diff.
 
  Yeah, probably not-- you're almost certainly I/O bound, not network
 bound.
 
  Actually it was network bound via 1 rsync process which is why I broke
 up 154 dirs into 7 batches of 22 each.
 
  Oh.  Um, unless you can make more network bandwidth available, you've
 saturated the bottleneck.
  Doing a single copy task is likely to complete faster than splitting up
 the job into subtasks in such a case.

 Well, using iftop, I am now at least able to get ~1Gb with 7 scripts going
 were before it was in the 10Ms with 1.

 Also, physically looking at my ZFS server, it now shows the drives lights
 are blinking faster, like every second.  Were as before it was sort of
 seldom, like every 3 seconds or so.

 I was thinking to perhaps zip dirs up and then xfer the file over but it
 would prolly take as long to zip/unzip.

 This bloody project structure we have is nuts.

 - aurf