On Aug 15, 2013, at 1:22 PM, Charles Swiger wrote:

> [ ...combining replies for brevity... ]
> 
> On Aug 15, 2013, at 1:02 PM, Frank Leonhardt <fra...@fjl.co.uk> wrote:
>> I'm reading all this with interest. The first thing I'd have tried would be 
>> tar (and probably netcat), but I'm probably a bit of a dinosaur. (If someone 
>> wants to buy me some really big drives, I promise I'll update.) If it's 
>> really NFS or nothing, I guess you couldn't open a socket anyway.
> 
> Either tar via netcat or SSH, or dump / restore via similar pipeline are 
> quite traditional.  tar is more flexible for partial filesystem copies, 
> whereas the dump / restore is more oriented towards complete filesystem 
> copies.  If the destination starts off empty, they're probably faster than 
> rsync, but rsync does delta updates which is a huge win if you're going to be 
> copying changes onto a slightly older version.
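
For the archives, the kind of pipelines Charles means would look roughly
like this (host "dst" and the /data paths are made up for illustration):

  # tar over ssh: encrypted, works anywhere sshd is running
  tar -cf - -C /data . | ssh dst 'tar -xpf - -C /data'

  # tar over netcat: no encryption overhead, but unauthenticated;
  # start the receiver first (port 9999 is arbitrary)
  nc -l 9999 | tar -xpf - -C /data        # on dst
  tar -cf - -C /data . | nc dst 9999      # on the sender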

Yep, so it looks like it is what it is, as the data set is changing while I do 
the base sync.  So I'll have to do several more passes to pick up newcomers, etc.
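
The follow-up passes should at least be cheap; something along these lines,
with the paths invented, and --delete only if removals should be mirrored too:

  # re-run after the base sync: rsync only transfers the deltas,
  # so later passes finish much faster than the initial copy
  rsync -aH --delete /data/ dst:/data/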

> Anyway, you're entirely right that the capabilities of the source matter a 
> great deal.
> If it could do zfs send / receive, or similar snapshot mirroring, that would 
> likely do better than userland tools.
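
For anyone who lands here with a ZFS source, that snapshot mirroring would
look something like this (pool/dataset names invented):

  # initial full copy; pool/data must not already exist on dst
  zfs snapshot tank/data@base
  zfs send tank/data@base | ssh dst zfs receive pool/data

  # later passes: send only the blocks changed since @base
  zfs snapshot tank/data@pass2
  zfs send -i tank/data@base tank/data@pass2 | ssh dst zfs receive -F pool/data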
> 
>> I'd be interested to know whether tar is still worth using in this world of 
>> volume managers and SMP.
> 
> Yes.
> 
> On Aug 15, 2013, at 12:14 PM, aurfalien <aurfal...@gmail.com> wrote:
> [ ... ]
>>>>>> Doing 10Gb/jumbos, but in this case it doesn't make much of a hoot of a diff.
>>>>> 
>>>>> Yeah, probably not-- you're almost certainly I/O bound, not network bound.
>>>> 
>>>> Actually it was network bound via 1 rsync process, which is why I broke up 
>>>> 154 dirs into 7 batches of 22 each.
>>> 
>>> Oh.  Um, unless you can make more network bandwidth available, you've 
>>> saturated the bottleneck.
>>> Doing a single copy task is likely to complete faster than splitting up the 
>>> job into subtasks in such a case.
>> 
>> Well, using iftop, I am now at least able to get ~1Gb with 7 scripts going, 
>> where before it was in the 10 MB/s range with 1.
> 
> 1 gigabyte of data per second is pretty decent for a 10Gb link; 10 MB/s 
> obviously wasn't close to saturating a 10Gb link.

Cool.  Looks like I am doing my best, which is what I wanted to know.  I chose 
to do 7 rsync scripts since 7 divides evenly into the 154 parent dirs :)
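
The batching itself is just a shell loop; a sketch along these lines (paths
invented), noting that --files-from turns off the recursion normally implied
by -a, so -r has to be given explicitly:

  # split the 154 parent dirs into 7 lists of 22, one rsync per list
  ls /data | split -l 22 - /tmp/batch.
  for b in /tmp/batch.*; do
      rsync -ar --files-from="$b" /data/ dst:/data/ &
  done
  wait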

You should see how our backup system deals with this: Atempo Time Navigator, or 
Tina as it's called.

It takes an hour just to lay down the dirs on tape before it even starts 
backing up, craziness.  And that's just for 1 parent dir with an average of 
500,000 dirs.  Actually I'm probably wrong, as the initial creation is 125,000 
dirs, of which a few are symlinks.

Then it grows from there.  Looking at the Tina stats, we see a million objects 
or more.

- aurf
_______________________________________________
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "freebsd-questions-unsubscr...@freebsd.org"
