Re: Best practices mirroring large file-system hierarchies?

2021-06-07 Thread Amelia A Lewis
Per Google, most likely there's a symlink loop in the source.

See mkdirat(2) (it refers to ELOOP).

See also errno(2), which has: 31 EMLINK Too many links

It also has 

62 ELOOP Too many levels of symbolic links

Your message has the text from EMLINK, but mkdirat only mentions ELOOP. 
That's not dispositive, though (I should look at the code for mkdirat, 
but not gonna).

In either case, the problem is almost certainly, as the error message 
indicates, too many links (hard or symbolic), not too few inodes.

On Mon, 7 Jun 2021 21:49:01 +0300, Michael Lowery Wilson wrote:
> mkdirat: Too many links

Amy!
-- 
Amelia A. Lewis                                    amyzing {at} talsever.com
It is practically impossible to teach good programming to students that 
have had a prior exposure to BASIC: as potential programmers they are 
mentally mutilated beyond hope of regeneration.
-- Edsger Dijkstra



Re: Best practices mirroring large file-system hierarchies?

2021-06-07 Thread Dave Voutila


Michael Lowery Wilson writes:

> Greetings,
>
> My attempts at creating a local mirror of Project Gutenberg's ebooks
> under OpenBSD 6.9 using openrsync following official instructions:
> https://www.gutenberg.org/help/mirroring.html have been unsuccessful.
>
> Specifically I am using:
>
> openrsync -av --del aleph.gutenberg.org::gutenberg-epub /disk5/gutenberg/
>
> to sync 927606 files (approximately 440 GB), which then fails with the errors:
>
> openrsync: error: 39488: mkdirat: Too many links

That looks like an EMLINK error. I believe it's about the number of
hardlinks to a file.

> openrsync: error: rsync_uploader
> openrsync: error: rsync_receiver
>
> (rsync from the package collection fails in the same manner)

Looking at their documentation on rsync commands, it's odd that the
cron example gives a very different command.

Have you tried rsync from ports and the cron command they provide?

/usr/bin/rsync -avHS --timeout 600 --delete --exclude 'cache/'...

openrsync(8) does not support the -H (preserve hardlinks) option.

>
> It would appear that there are plenty of inodes available in the
> target filesystem:

I do not believe this has anything to do with inodes.

>
> df -ih
> Filesystem SizeUsed   Avail Capacity iused   ifree  %iused  Mounted on
> /dev/sd7c  2.6T1.6T949G63%  991057 365310381 0%   /disk5
>
> My attempts at increasing kern.maxfiles (/etc/sysctl.conf),
> openfiles-max and openfiles-cur (/etc/login.conf) to 102400, have not
> been productive. My sysctl output is appended below. There is possibly
> some sysctl https://man.openbsd.org/sysctl.2 or /etc/login.conf
> setting that I believe I have overlooked, but I am not sure what that
> could be. I would very much appreciate being pointed in the right
> direction.
>

Right, per the error, mkdirat complained about "Too many links". If you
look at the libc errlist (lib/libc/gen/errlist.c), you can back-reference
that message to the errno EMLINK and then look into what it means. It has
nothing to do with EMFILE, for instance.

> [sysctl output trimmed; it appears in full in the original message below]


Re: Best practices mirroring large file-system hierarchies?

2021-06-07 Thread Stuart Henderson
On 2021-06-07, Michael Lowery Wilson  wrote:
> Greetings,
>
> My attempts at creating a local mirror of Project Gutenberg's ebooks under 
> OpenBSD 6.9 using openrsync following official instructions: 
> https://www.gutenberg.org/help/mirroring.html have been unsuccessful.
>
> Specifically I am using:
>
> openrsync -av --del aleph.gutenberg.org::gutenberg-epub /disk5/gutenberg/
>
> to sync 927606 files (approximately 440 GB), which then fails with the errors:
>
> openrsync: error: 39488: mkdirat: Too many links
> openrsync: error: rsync_uploader
> openrsync: error: rsync_receiver

I think this is because there are too many subdirectories in a single directory.
Each subdirectory's ".." entry is a hard link to the parent directory, so
you run into LINK_MAX (32767).

$ rsync aleph.gutenberg.org::gutenberg-epub | wc -l
65559

By the way, openrsync is a poor choice when dealing with many
files/directories. It does not support the incremental file-listing
method that rsync added in 3.0.0, so it will use more memory on your
side and, perhaps more importantly, on the server you're fetching from,
as it has to build and transfer the entire file list in one go.




Best practices mirroring large file-system hierarchies?

2021-06-07 Thread Michael Lowery Wilson

Greetings,

My attempts at creating a local mirror of Project Gutenberg's ebooks under 
OpenBSD 6.9 using openrsync following official instructions: 
https://www.gutenberg.org/help/mirroring.html have been unsuccessful.


Specifically I am using:

openrsync -av --del aleph.gutenberg.org::gutenberg-epub /disk5/gutenberg/

to sync 927606 files (approximately 440 GB), which then fails with the errors:

openrsync: error: 39488: mkdirat: Too many links
openrsync: error: rsync_uploader
openrsync: error: rsync_receiver

(rsync from the package collection fails in the same manner)

It would appear that there are plenty of inodes available in the target 
filesystem:


df -ih
Filesystem SizeUsed   Avail Capacity iused   ifree  %iused  Mounted on
/dev/sd7c  2.6T1.6T949G63%  991057 365310381 0%   /disk5

My attempts at increasing kern.maxfiles (/etc/sysctl.conf), openfiles-max and 
openfiles-cur (/etc/login.conf) to 102400, have not been productive. My sysctl 
output is appended below. There is possibly some sysctl 
https://man.openbsd.org/sysctl.2 or /etc/login.conf setting that I believe I 
have overlooked, but I am not sure what that could be. I would very much 
appreciate being pointed in the right direction.


sysctl

kern.ostype=OpenBSD
kern.osrelease=6.9
kern.osrevision=202105
kern.version=OpenBSD 6.9 (GENERIC.MP) #1: Sat May 22 13:19:59 MDT 2021

r...@syspatch-69-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

kern.maxvnodes=3292387
kern.maxproc=32768
kern.maxfiles=102400
kern.argmax=524288
kern.securelevel=1
kern.hostname=testbox.localdomain
kern.hostid=0
kern.clockrate=tick = 1, hz = 100, profhz = 100, stathz = 100
kern.posix1version=200809
kern.ngroups=16
kern.job_control=1
kern.saved_ids=1
kern.boottime=Mon Jun  7 12:02:51 2021
kern.domainname=
kern.maxpartitions=16
kern.rawpartition=2
kern.maxthread=5000
kern.nthreads=76
kern.osversion=GENERIC.MP#1
kern.somaxconn=2048
kern.sominconn=80
kern.nosuidcoredump=1
kern.fsync=1
kern.sysvmsg=1
kern.sysvsem=1
kern.sysvshm=1
kern.msgbufsize=131032
kern.malloc.buckets=16,32,64,128,256,512,1024,2048,4096,8192,16384,32768,65536,131072,262144,524288
kern.malloc.bucket.16=(calls = 9371 total_allocated = 1280 total_free = 494 
elements = 256 high watermark = 1280 could_free = 0)
kern.malloc.bucket.32=(calls = 823928 total_allocated = 1024 total_free = 291 
elements = 128 high watermark = 640 could_free = 54)
kern.malloc.bucket.64=(calls = 431182 total_allocated = 1856 total_free = 1170 
elements = 64 high watermark = 320 could_free = 190057)
kern.malloc.bucket.128=(calls = 28642 total_allocated = 4992 total_free = 20 
elements = 32 high watermark = 160 could_free = 215)
kern.malloc.bucket.256=(calls = 72230 total_allocated = 1520 total_free = 1085 
elements = 16 high watermark = 80 could_free = 6197)
kern.malloc.bucket.512=(calls = 1087 total_allocated = 184 total_free = 5 
elements = 8 high watermark = 40 could_free = 0)
kern.malloc.bucket.1024=(calls = 6156 total_allocated = 292 total_free = 4 
elements = 4 high watermark = 20 could_free = 0)
kern.malloc.bucket.2048=(calls = 346 total_allocated = 242 total_free = 2 
elements = 2 high watermark = 10 could_free = 0)
kern.malloc.bucket.4096=(calls = 2840 total_allocated = 581 total_free = 1 
elements = 1 high watermark = 5 could_free = 0)
kern.malloc.bucket.8192=(calls = 661 total_allocated = 81 total_free = 1 
elements = 1 high watermark = 5 could_free = 0)
kern.malloc.bucket.16384=(calls = 651 total_allocated = 16 total_free = 0 
elements = 1 high watermark = 5 could_free = 0)
kern.malloc.bucket.32768=(calls = 3309 total_allocated = 11 total_free = 0 
elements = 1 high watermark = 5 could_free = 0)
kern.malloc.bucket.65536=(calls = 461724 total_allocated = 4 total_free = 0 
elements = 1 high watermark = 5 could_free = 0)
kern.malloc.bucket.131072=(calls = 0 total_allocated = 0 total_free = 0 
elements = 1 high watermark = 5 could_free = 0)
kern.malloc.bucket.262144=(calls = 3 total_allocated = 3 total_free = 0 
elements = 1 high watermark = 5 could_free = 0)
kern.malloc.bucket.524288=(calls = 7 total_allocated = 7 total_free = 0 
elements = 1 high watermark = 5 could_free = 0)