Re: [OpenAFS] Re: Performance issues with Git repositories (or in general with many small files workloads)
On Thu, Dec 17, 2020 at 11:44 AM wrote:
> 1. You could use git repack to transfer less files without losing the
> ability of incremental updates.

For some reason, although Git (on the receiving side) receives a pack, it unpacks it. However, my question also relates to other use-cases where one has to handle a large number of files. (Git was just the latest use-case I faced these days where the performance issue popped up.)

> 2. You could turn off sync-after-close in the cache manager, see fs
> storebehind. This should increase upfront performance but may degrade
> again, should your cache run out of file handles. So, you'd have to play
> with cache parameters, as well.

I've already set storebehind to 16 MiB, which is well above the average size of a Git object. Moreover, I've even tried `-sync never` on the fileserver, and neither made much difference.

Thanks,
Ciprian.

___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info
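P.S. If I read the `git-config` documentation correctly, the unpacking on the receiving side is governed by `receive.unpackLimit` (or the shared `transfer.unpackLimit`): pushes carrying at least that many objects are kept as a single pack instead of being exploded into loose files. A minimal local sketch (all paths are illustrative; in the real setup the bare repository would live under `/afs/...`):

```shell
#!/bin/sh
# Sketch: force the receiving repository to keep pushed packs instead of
# unpacking them into many loose files. Paths are illustrative; in the
# real setup "$tmp/mirror.git" would live under /afs/.cell/some-path/.
set -e
tmp=$(mktemp -d)

git init -q "$tmp/src"
git -C "$tmp/src" -c user.name=test -c user.email=test@example.net \
    commit -q --allow-empty -m initial

git init -q --bare "$tmp/mirror.git"
# keep every push as a pack: only pushes with fewer than 1 object unpack
git -C "$tmp/mirror.git" config receive.unpackLimit 1

git -C "$tmp/src" push -q --mirror "$tmp/mirror.git"
ls "$tmp/mirror.git/objects/pack/"
```

Keeping packs intact should sidestep the many-small-files pattern entirely for pushes, at the cost of an occasional `git gc` in the mirror.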
Re: [OpenAFS] Performance issues with Git repositories (or in general with many small files workloads)
On Thu, Dec 17, 2020 at 1:51 AM Douglas E Engert wrote:
> If you are just backing up, consider "git bundle" that creates one file,
> and git clone can read the bundle.
>
> https://stackoverflow.com/questions/5578270/fully-backup-a-git-repo

Thank you for the suggestion. I know about `git-bundle`, however I don't want only a backup, but also, in case I need it, a "working" Git repository, thus my option for `git push --mirror`.

But indeed, if one only wants to back up a Git repository, then `git bundle` is the best option, as it results in only one file, which OpenAFS handles flawlessly. On the downside of bundles: a bundle of all refs is a full dump each time. (Incremental bundles are possible by giving a basis revision range, but then each bundle must be kept and applied in order.)

Ciprian.
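P.S. For the archive, the full bundle round-trip is short enough to show; a sketch with illustrative paths (in practice only the bundle file would be stored under `/afs/...`):

```shell
#!/bin/sh
# Sketch: back up a repository as a single bundle file, verify it, and
# restore a working clone from it. All paths are illustrative.
set -e
tmp=$(mktemp -d)

git init -q "$tmp/repo"
git -C "$tmp/repo" -c user.name=test -c user.email=test@example.net \
    commit -q --allow-empty -m snapshot

# one file holding all refs plus HEAD -- trivial for OpenAFS to store
git -C "$tmp/repo" bundle create "$tmp/backup.bundle" HEAD --all

# sanity-check the bundle, then restore a working repository from it
git -C "$tmp/repo" bundle verify "$tmp/backup.bundle" > /dev/null
git clone -q "$tmp/backup.bundle" "$tmp/restored"
git -C "$tmp/restored" log -1 --format=%s   # prints "snapshot"
```

The `git bundle verify` step is cheap insurance before deleting any older backup of the same repository.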
[OpenAFS] Re: Performance issues with Git repositories (or in general with many small files workloads)
After fiddling with the `fileserver` arguments, I think the problematic ones were `-p 128` and `-vhandle-max-cachesize 32768`, and perhaps the too-large `-b`, `-l` and `-s`. (I've also switched back to the non-demand-attach variant of the servers.) The new arguments I'm using are:

    /usr/lib/openafs/fileserver -syslog \
        -sync onclose \
        -p 16 \
        -udpsize 67108864 -sendsize 67108864 \
        -rxpck 65536 -rxmaxmtu 1400 \
        -cb 1048576 -busyat 65536 \
        -vc 4096 -b 4096 -l 65536 -s 262144

With these new options things work much better: I now get ~500 KiB/s where previously I had only ~20 KiB/s throughput. Depending on the repository I can even reach ~10-20 MiB/s if it contains larger files.

Now, regarding the arguments: what exactly is a `vhandle`? The documentation hints at "file handles"; are these the actual OS file handles? And is there perhaps a bottleneck for large values of the block and vnode caches?

Thanks,
Ciprian.
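P.S. Partially answering my own question: if I understand the documentation right, the vhandle cache is a cache of OS-level file descriptors for the vice partition files, bounded by `-vhandle-max-cachesize`. A rough (and admittedly hypothetical) way to watch the real descriptor usage on the server; the process names and the `/proc` layout are the only assumptions:

```shell
#!/bin/sh
# Sketch: count the OS file descriptors held by the fileserver, to compare
# against -vhandle-max-cachesize. Falls back to the current shell when no
# fileserver is running, so the sketch stays runnable anywhere.
pid=$(pgrep -o -x dafileserver || pgrep -o -x fileserver || echo $$)
count=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)
echo "pid $pid holds $count open file descriptors"
```

Comparing that count against the configured cache size should show whether the fileserver is actually churning descriptors under the small-file workload.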
[OpenAFS] Performance issues with Git repositories (or in general with many small files workloads)
Hello all!

I'm trying to use AFS to back up various Git repositories. By "backup" I actually mean `git push --mirror /afs/.cell/some-path/repository.git`, which has the following behaviour: it writes many small files into the `.git/objects` folder, fanned out by the first two hex digits of the object hash. In fact this pattern can be found in many applications that handle lots of small files, for example `rsync`, build systems, etc. Moreover, the pattern I'm describing is single-threaded, i.e. these files are not created concurrently by multiple threads / processes.

Unfortunately the performance is abysmal: what should take perhaps 1-2 seconds on a normal drive takes up to a minute on AFS; for example `git-push` reports a bandwidth of only ~20 KiB/s. Looking at the CPU usage, `dafileserver` seems to be at ~95%, although the system has 4 cores and is lightly used.

I can eliminate the following causes:

* network issues (both bandwidth and latency), because this behaviour occurs even if I mount AFS on the same server where the file server lives, thus everything happens over loopback;
* encryption -- it is off;
* synchronous close -- I've tried to set `fs storebehind -allfiles 16384 -verbose`;
* the disk backing the AFS cache -- it's an NVMe disk capable of ~3 GiB/s;
* the disks backing the AFS file server -- it's a RAID5 of 3 top-of-the-line (Gold) WD S-ATA drives;
* throughput in general -- I can achieve good throughput for large files, or when accessing medium-sized files from multiple threads / processes.

My OpenAFS deployment is on Linux 5.3.18, openSUSE Leap 15.2, and the following are the arguments of the file server and cache manager:

    /usr/lib/openafs/dafileserver -syslog -sync onclose \
        -p 128 -b 524288 -l 524288 -s 1048576 -vc 4096 \
        -cb 1048576 -vhandle-max-cachesize 32768 \
        -udpsize 67108864 -sendsize 67108864 \
        -rxpck 4096 -rxmaxmtu 1400 -busyat 65536

    /usr/sbin/afsd -blocks 67108864 -chunksize 17 -files 524288 \
        -files_per_subdir 4096 -dcache 524288 \
        -stat 524288 -volumes 4096 \
        -splitcache 90/10 \
        -afsdb -dynroot-sparse -fakestat-all \
        -inumcalc md5 -backuptree \
        -daemons 8 -rxmaxfrags 8 -rxmaxmtu 1400 \
        -rxpck 4096 -nosettime

BTW, initially I was using the old `fileserver`-based setup, and even though I've switched to `dafileserver` the performance stayed unchanged.

Thanks for the help,
Ciprian.
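P.S. To take Git entirely out of the picture, the same write pattern can be generated with a plain shell loop (the fan-out and the ~4 KiB object size are modeled on `.git/objects`; `TARGET` defaults to a temporary directory and would point at an AFS path in the real test):

```shell
#!/bin/sh
# Sketch: single-threaded many-small-files writer mimicking git's object
# store -- files fanned out by the first two hex digits of their hash,
# written to a temporary name and then renamed, like git does.
set -e
TARGET="${TARGET:-$(mktemp -d)}"
COUNT="${COUNT:-256}"

i=0
while [ "$i" -lt "$COUNT" ]; do
    hash=$(printf '%s' "$i" | sha1sum | cut -c 1-40)
    dir="$TARGET/objects/$(printf '%s' "$hash" | cut -c 1-2)"
    mkdir -p "$dir"
    head -c 4096 /dev/urandom > "$dir/$hash.tmp"   # ~4 KiB "object"
    mv "$dir/$hash.tmp" "$dir/$hash"
    i=$((i + 1))
done
echo "wrote $COUNT objects under $TARGET/objects"
```

Running this once against local disk and once against `/afs/...` (e.g. under `time`) gives a Git-free measurement of the small-file penalty.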
Re: [OpenAFS] Git throwing bus error when pack files not entirely cached
Sorry for reviving this old thread, but this happened to me also. My details:

* the same computer is used as both server and client;
* openSUSE 15.0 as distribution;
* Linux 4.12.14-lp150.12.79-default x86_64;
* OpenAFS 1.8.0-lp150.2.2.1 (both client and server packages);
* OpenAFS `kmp-default` 1.8.0_k4.12.14_lp150.12.13-lp150.2.2.1;
* Git 2.16.4;
* a large Git repository:
  * ~30 GiB;
  * most objects are packed;
  * the largest pack is ~2.5 GiB;
  * a few are 200-400 MiB;
  * most are under 128 MiB;
* contents of `/etc/openafs/cacheinfo`:

    /afs:/var/cache/openafs:33554432

* `/var/cache/openafs` is on Ext4 and still has a lot of free space;
* nothing useful in either `dmesg` or the logs;
* `afsd` is started as:

    /usr/sbin/afsd -blocks 33554432 -chunksize 17 -files 524288 \
        -files_per_subdir 4096 -dcache 524288 -stat 524288 -volumes 4096 \
        -splitcache 90/10 -afsdb -dynroot-sparse -fakestat-all \
        -inumcalc md5 -backuptree -daemons 8 -rxmaxfrags 8 -rxmaxmtu 1400 \
        -rxpck 4096 -nosettime

How to provoke it: run one of the following:

    git fsck --root --tags --no-reflogs --full --connectivity-only --unreachable --dangling --name-objects
    git fsck --root --tags --no-reflogs --full --strict --unreachable --dangling --name-objects

At random times I get:

    Bus error (core dumped)

If one wants to try this and doesn't have a large enough repository, I would recommend: https://github.com/cdnjs/cdnjs

I haven't yet tried to preload the files.

Hope it helps find the issue,
Ciprian.
Re: [OpenAFS] OpenAFS client softlockup on highly concurrential file-system patterns (100% CPU in kernel mode)
On Mon, Nov 25, 2019 at 2:53 AM Benjamin Kaduk wrote:
> > * I suspect that perhaps the issue is due to the latest kernel version,
> > because I have run similar patterns a few weeks ago on an older kernel
> > (but still from the `5.x` family), but can't say for sure;
>
> I see the diagnostics and further data points later in the thread, but are
> you in a position to boot an older kernel to attempt to confirm/refute
> this hypothesis?

The issue was on my personal laptop, thus I can try to install an older kernel and retry. (However, looking at openSUSE Tumbleweed, a rolling release, I think I'll have a hard time finding an older kernel...)

I'll report back if I manage to do this.

Ciprian.
Re: [OpenAFS] OpenAFS client softlockup on concurrential file-system patterns (100% CPU in kernel mode)
On Wed, Nov 20, 2019 at 9:37 PM Ciprian Dorin Craciun wrote:
> Now the client works OK, however if I start the `afsd` client on the
> server itself (i.e. over `loopback` network), where previously (with
> `-jumbo`) I was able to max-out the disks (~300 MiB/s), now it seems to
> be capped at around ~120 MiB/s. (The packets-per-second are around
> ~120K...)

Minor correction (only to the item above, the rest still stands): restarting the `afsd` cache on the server itself (thus over `loopback`), I am once again able to max out the disks (read only). (For some reason this wasn't the case in the previous test...)

(It's not the benchmark: I read large ~20 MiB files with 16 concurrent readers, and the same command was used in both test cases.)

Ciprian.
Re: [OpenAFS] OpenAFS client softlockup on concurrential file-system patterns (100% CPU in kernel mode)
Before replying, I want to note that I think I've stumbled upon three (perhaps related) issues, some of which might just be configuration errors:

* AFS file access getting stuck (seems to be solved by increasing the number of `fileserver` threads from `-p 4` to `-p 128`);
* trying to `SIGTERM` or `SIGKILL` a "stuck" process takes Linux (in kernel code) to 100% CPU;
* having `-jumbo -rxmaxmtu 9000` on the server, but not on the client, yields poor performance.

This new thread (which I was just going to open myself) is related to the third problem, the mismatch between the server and client jumbo frames settings.

On Wed, Nov 20, 2019 at 8:59 PM Kostas Liakakis wrote:
> (Yesterday over wireless I didn't use Jumbo frames, but the day
> before, where the same thing happened, I was using them.)
>
> Does this mean that '"the other day with jumbo frames" was over GigE? Does
> this happen over GigE with jumbo frames disabled as well?

So, apparently having `-jumbo -rxmaxmtu 9000` on the server, but not configuring jumbo frames on the client, yields poor performance. (Also, the "getting stuck" issue happens regardless of this other problem.)

Without touching the `fileserver` parameters, none of the following seem to work:

* `afsd` with `-rxmaxmtu 9000` but without jumbo frames configured on the network card (a clear misconfiguration on my part);
* `afsd` with `-rxmaxmtu 1500` over Gigabit Ethernet (and without jumbo frames configured on the network card), i.e. a usual client on the same network without jumbo frames support;
* `afsd` with `-rxmaxmtu 1500` over WiFi (which is capable of ~14 MiB/s receive, and clearly supports no jumbo frames);
* as mentioned, only matching the server configuration seems to solve the issue;
* (encryption is disabled.)

I've changed the `fileserver` parameters by removing `-jumbo` and setting `-rxmaxmtu 1400` (I also intend to use this over WAN, thus over PPPoE and VPN, which will have quite an impact on the MTU).

Now the client works OK; however, if I start the `afsd` client on the server itself (i.e. over the `loopback` network), where previously (with `-jumbo`) I was able to max out the disks (~300 MiB/s), it now seems to be capped at around ~120 MiB/s. (The packets-per-second are around ~120K...)

> I've seen problems finally attributed to jumbo frames where some
> configuration change on a switch someplace along the path rendered them
> unusable.

I don't think this is the case here. I have only one switch between the client and the server (no other network equipment), and I haven't encountered performance problems with it (even with regard to jumbo frames).

Ciprian.
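P.S. In hindsight, a mismatch like this can be spotted before touching any AFS knobs by probing the path MTU with un-fragmentable pings. A sketch (the server address is a placeholder, and `-M do` is the Linux `ping` flag that sets the don't-fragment bit):

```shell
#!/bin/sh
# Sketch: check which MTUs actually survive the path to the fileserver.
# ICMP payload = MTU - 20 (IP header) - 8 (ICMP header).
mtu_to_payload() { echo $(( $1 - 28 )); }

SERVER="${SERVER:-}"   # e.g. SERVER=afs1.example.net ./check-mtu.sh
for mtu in 1400 1500 9000; do
    payload=$(mtu_to_payload "$mtu")
    if [ -n "$SERVER" ]; then
        if ping -c 1 -W 2 -M do -s "$payload" "$SERVER" > /dev/null 2>&1; then
            echo "MTU $mtu: OK"
        else
            echo "MTU $mtu: fragmented or dropped"
        fi
    else
        echo "MTU $mtu: would run: ping -c 1 -M do -s $payload <server>"
    fi
done
```

If the 9000-byte probe fails while 1400 succeeds, `-rxmaxmtu 9000` on either side can only hurt.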
Re: [OpenAFS] OpenAFS client softlockup on highly concurrential file-system patterns (100% CPU in kernel mode)
On Wed, Nov 20, 2019 at 7:49 PM Mark Vitale wrote:
> > The following are the arguments of `fileserver`:
> > -syslog -sync always -p 4 -b 524288 -l 524288 -s 1048576 -vc 4096 -cb
> > 1048576 -vhandle-max-cachesize 32768 -jumbo -udpsize 67108864
> > -sendsize 67108864 -rxmaxmtu 9000 -rxpck 4096 -busyat 65536
>
> I see some areas of concern here. First of all, many of your parameters
> indicate that you expect to run relatively high load through this
> fileserver. Yet there are only -p 4 server threads defined. The
> fileserver will automatically increase this to the minimum of 6, but that
> still seems quite low.

These parameters (at least most of them) were empirically identified for a highly concurrent access pattern, on a large number of 16 KiB to 20 MiB files, from a low number of users (2-3), over a low-latency network (wired, Gigabit, same LAN). (I also had an IRC discussion with Jeffrey about this topic.) There is a thread on this mailing list from 9th March 2019, with the subject <>, where I've also listed that IRC discussion. The `-p` argument is explicitly present in that discussion.

The main use-case of my setup is a home / SOHO file server acting as a NAS. Therefore all my parameters are tuned towards low-latency and high-bandwidth access, at the expense of server RAM (thus the large numbers for buffer counts and sizes).

> This low thread number, combined with a very large -busyat value,
> means that this fileserver will queue a very large backlog before
> returning VBUSY to the client. Is there a reason you need to keep the
> fileserver threads so low? Would it be possible for you to increase it
> dramatically (perhaps 100) and try the test again?

I've just increased this number to `-p 128` and re-executed the build. (I haven't restarted the client, but I did restart the server.) Under the initial parameters (i.e. 8 parallel builds) I wasn't able to replicate the issue in 10 tries.

(The solution for this item seemed to be removing `-jumbo` and setting `-rxmaxmtu 1500` instead of `9000`.)

Thus I've deleted around ~2K output files and increased the parallelism to 32. Under these conditions, although the build didn't block, the bandwidth (over wireless) was around 500 KiB/s (receive) when I would have expected more (the input files are much larger than the output files, for instance ~300 KiB in to ~25 KiB out), and the task completion rate seemed very jagged (i.e. no progress for a while, then all of a sudden 10 would finish). (I mention that the workload is not CPU-bound; average CPU on the client is around ~20%.)

I've tried this second scenario (with the no-jumbo settings) a few times and still nothing got stuck. However, even if the case of "stuck process for 20 minutes" is solved, there is still the issue of `SIGTERM`-ing those waiting processes, which jumps the kernel to 100% CPU.

If I can try other experiments, please let me know.

Thanks,
Ciprian.
Re: [OpenAFS] OpenAFS client softlockup on highly concurrential file-system patterns (100% CPU in kernel mode)
On Wed, Nov 20, 2019 at 7:03 PM Mark Vitale wrote:
> Thank you for the backtraces. I agree that 'gm' is the problematic
> thread; it appears to be stuck in rxi_WriteProc waiting for the Rx packet
> transmit window to advance. That is, it's waiting for acknowledgments -
> probably from the fileserver.

It's true that the test was performed over wireless; however, the same behaviour was encountered even over Gigabit LAN. (This is a personal setup -- server, network and client -- and there was light to no usage on the client, the server and the network.)

> Unfortunately the rest of the backtrace seems muddled and so we can't
> tell exactly what the client was doing. In fact, many of the backtraces
> are incomplete.

I haven't deleted anything from any particular process stacktrace, although I have deleted processes that have nothing to do with AFS or didn't contain `afs` in their stack. (If you think it would be useful, I can send you privately a complete, uncensored output.)

> If I have some time later this week, I may try to reproduce this issue.
> However, there's no guarantee I will be able to do so, so it would be
> better if we could either obtain more information from your site, or if
> you could narrow the problem down to a simpler test case.

I'll try to reproduce this without the actual build system (using, say, `stat`, `cp` and `xargs`).

> Do you have FileLogs and/or fileserver audit logs for the time in
> question?

Yes, I do have access to them. The following is the syslog output from the OpenAFS server in a 5-minute time window around the stacktrace sent yesterday:

    FindClient: stillborn client 0x7fe9b0012dc0(77749fe8); conn 0x7fe9d800e390 (host 172.30.214.35:7001) had client 0x7fe9b00131d0(77749fe8)
    FindClient: stillborn client 0x7fe9b00132a0(77749fec); conn 0x7fe9d800e660 (host 172.30.214.35:7001) had client 0x7fe9b0012dc0(77749fec)
    FindClient: stillborn client 0x7fe9b0013030(77749fec); conn 0x7fe9d800e660 (host 172.30.214.35:7001) had client 0x7fe9b0012dc0(77749fec)
    FindClient: stillborn client 0x7fe9b0012cf0(77749fec); conn 0x7fe9d800e660 (host 172.30.214.35:7001) had client 0x7fe9b0012dc0(77749fec)

No information is present in `/var/log/openafs` for that timeframe.

The following are the arguments of `fileserver`:

    -syslog -sync always -p 4 -b 524288 -l 524288 -s 1048576 -vc 4096 \
    -cb 1048576 -vhandle-max-cachesize 32768 -jumbo -udpsize 67108864 \
    -sendsize 67108864 -rxmaxmtu 9000 -rxpck 4096 -busyat 65536

(Yesterday over wireless I didn't use Jumbo frames, but the day before, when the same thing happened, I was using them.)

Ciprian.
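P.S. A sketch of the promised `stat` / `cp` / `xargs` reproducer (the layout, sizes and concurrency are my assumptions modeled on the build; pointing `ROOT` at an AFS-hosted directory is what actually exercises the client):

```shell
#!/bin/sh
# Sketch: mimic the build's AFS access pattern with plain stat/cp/xargs --
# a heavy stat pass over the tree, then concurrent read+write+rename jobs.
set -e
ROOT="${ROOT:-$(mktemp -d)}"
JOBS="${JOBS:-8}"

# seed some input files when not pointed at a real tree
i=0
while [ "$i" -lt 16 ]; do
    head -c 8192 /dev/urandom > "$ROOT/in.$i"
    i=$((i + 1))
done

# 1. the ninja-like dirty check: stat everything
find "$ROOT" -type f -exec stat -c '%n %s' {} + > /dev/null

# 2. concurrent jobs: read input, write output.tmp, rename to output
ls "$ROOT"/in.* | xargs -P "$JOBS" -I {} \
    sh -c 'cp "{}" "{}.tmp" && mv "{}.tmp" "{}.out"'

echo "processed $(ls "$ROOT"/*.out | wc -l) files"
```

If this also wedges on AFS, it would rule the build tools out entirely.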
Re: [OpenAFS] OpenAFS client softlockup on highly concurrential file-system patterns (100% CPU in kernel mode)
On Tue, Nov 19, 2019 at 10:38 PM Ciprian Dorin Craciun wrote:
> At the following link you can find an extract of `dmesg` after the
> sysrq trigger.
>
> https://scratchpad.volution.ro/ciprian/f89fc32a0bbd0ae6d6f3edbbc3ee111c/b9c3bc4f795bbe9e7eaca93b0a57bea0.txt

I forgot to mention that in this case the CPU didn't go up to 100%; in fact it was quite "quiet". (The 100% CPU seems to happen only after a process "blocks" and I try to `SIGTERM` or `SIGKILL` it.)

Ciprian.
Re: [OpenAFS] OpenAFS client softlockup on highly concurrential file-system patterns (100% CPU in kernel mode)
On Tue, Nov 19, 2019 at 5:10 PM Ciprian Dorin Craciun wrote:
> > # echo t > /proc/sysrq-trigger

At the following link you can find an extract of `dmesg` after the sysrq trigger:

https://scratchpad.volution.ro/ciprian/f89fc32a0bbd0ae6d6f3edbbc3ee111c/b9c3bc4f795bbe9e7eaca93b0a57bea0.txt

(I have filtered out processes that don't have `afs` in their name, mainly because the full list exposes all my workstation's processes. However, I can provide a complete file privately.)

The following is the process which gets stuck (it took almost ~25 minutes to complete, and it is not input-file related):

    gm S0 27572 27562 0x8000
    Call Trace:
     ? __schedule+0x2be/0x6d0
     schedule+0x39/0xa0
     afs_cv_wait+0x10a/0x300 [libafs]
     ? wake_up_q+0x60/0x60
     rxi_WriteProc+0x21d/0x410 [libafs]
     ? rxfs_storeUfsWrite+0x55/0xb0 [libafs]
     ? afs_GenericStoreProc+0x11a/0x1f0 [libafs]
     ? afs_CacheStoreDCaches+0x1a9/0x5b0 [libafs]
     ? afs_CacheStoreVCache+0x32c/0x680 [libafs]
     ? __filemap_fdatawrite_range+0xca/0x100
     ? afs_osi_Wakeup+0xb/0x60 [libafs]
     ? afs_UFSGetDSlot+0xf6/0x4f0 [libafs]
     ? afs_StoreAllSegments+0x725/0xc20 [libafs]
     ? afs_linux_flush+0x486/0x4e0 [libafs]
     ? filp_close+0x32/0x70
     ? __x64_sys_close+0x1e/0x50
     ? do_syscall_64+0x6e/0x200
     ? entry_SYSCALL_64_after_hwframe+0x49/0xbe

On a second try (that also locks up) the following is the stack-trace (only for the blocked process); they look almost identical:

    gm S0 30548 30545 0x80004000
    Call Trace:
     ? __schedule+0x2be/0x6d0
     schedule+0x39/0xa0
     afs_cv_wait+0x10a/0x300 [libafs]
     ? wake_up_q+0x60/0x60
     rxi_WriteProc+0x21d/0x410 [libafs]
     ? rxfs_storeUfsWrite+0x55/0xb0 [libafs]
     ? afs_GenericStoreProc+0x11a/0x1f0 [libafs]
     ? afs_CacheStoreDCaches+0x1a9/0x5b0 [libafs]
     ? afs_CacheStoreVCache+0x32c/0x680 [libafs]
     ? __filemap_fdatawrite_range+0xca/0x100
     ? afs_osi_Wakeup+0xb/0x60 [libafs]
     ? afs_UFSGetDSlot+0xf6/0x4f0 [libafs]
     ? afs_StoreAllSegments+0x725/0xc20 [libafs]
     ? afs_linux_flush+0x486/0x4e0 [libafs]
     ? filp_close+0x32/0x70
     ? __x64_sys_close+0x1e/0x50
     ? do_syscall_64+0x6e/0x200
     ? entry_SYSCALL_64_after_hwframe+0x49/0xbe

I can reliably trigger the issue almost 50% of the time by just doing the following:

* remove a few files (in my case ~15), which should trigger the rebuild of around 2x as many;
* start the build with a maximum concurrency of 8 processes;
* all the processes execute similar jobs, with similarly sized inputs, outputs and CPU time used.

Based on `htop` I would say that neither `ninja` (which does the heavy `stat`-ing) nor `gm` (GraphicsMagick, an ImageMagick alternative) is multi-threaded.

The build procedure involves the following AFS-related operations:

* check if the output exists, and if so `rm` it;
* create an `output.tmp` file;
* move the `output.tmp` to `output`.

No other processes are actively using AFS (except `mc` and a couple of `bash` instances which have their `cwd` inside an AFS volume). (The `[nodaemon]` process is a simple tool that uses `prctl (PR_SET_CHILD_SUBREAPER)` to catch double-forking processes, and also has its `cwd` inside AFS.)

Hope it helps,
Ciprian.
Re: [OpenAFS] OpenAFS client softlockup on highly concurrential file-system patterns (100% CPU in kernel mode)
On Tue, Nov 19, 2019 at 5:06 PM Mark Vitale wrote:
> If you had a true soft lockup, there should be some information in the
> syslog.

I don't think it was a "soft lockup" as per the Linux kernel terminology, as that would have been detected by the kernel. (But it still took all my cores to 100% in kernel space, as mentioned.)

> If you don't see anything there, you could try this while the hang is
> occurring:

There wasn't anything in either `journald` (i.e. the syslog replacement) or `dmesg`. (And the system was freshly rebooted after each occurrence.)

> # echo t > /proc/sysrq-trigger

I'll try to re-trigger the issue later today and report back the findings.

Thanks,
Ciprian.
[OpenAFS] OpenAFS client softlockup on highly concurrential file-system patterns (100% CPU in kernel mode)
A few days ago I encountered a very strange OpenAFS client issue that exhibits in two ways:

* either the processes accessing the file-system get "stuck" reading (or perhaps opening) the files (although if one waits "long" enough, sometimes those processes finally complete their job; in this case the CPU doesn't go to 100%);
* or, if one tries to `SIGTERM` the stuck processes, the CPU goes to 100% (on multiple cores) in kernel mode (again, sometimes if one waits long enough, the system settles).

The usage pattern is as follows:

* it is a typical "build" scenario, where a `make`-like tool (in this case `ninja`) heavily stats all the files it knows about to find changed or missing ones (in my case there are about 90K files, all hosted on AFS; moreover, I suspect `ninja` tries to stat these on multiple threads);
* there are a few processes that do CPU-bound tasks, reading a file (from AFS) and writing the output to another one (also on AFS); the concurrency level doesn't seem to change much, from 128 processes in parallel down to 4.

I was able to replicate this issue each time I ran the build and sent `SIGTERM`; after letting the whole build process run for a night, it eventually completed.

My setup is as follows:

* openSUSE Tumbleweed, kernel 5.3.9-1-default, client packages `openafs-client` and `openafs-kmp-default` at `1.8.5_k5.3.9_1-1.3` as provided by openSUSE;
* `afsd` parameters (neither memory cache (on `tmpfs`) nor disk cache seems to help; neither do daemons from 4 down to 1; encryption is off):

    -verbose -blocks 7864320 -chunksize 17 -files 524288 \
        -files_per_subdir 128 -dcache 524288 -stat 524288 -volumes 128 \
        -splitcache 90/10 -afsdb -dynroot-sparse -fakestat-all \
        -inumcalc md5 -backuptree -daemons 1 -rxmaxfrags 8 -rxmaxmtu 1500 \
        -rxpck 4096 -nosettime

    -verbose -memcache -blocks 1048576 -chunksize 17 -stat 524288 \
        -volumes 128 -splitcache 90/10 -afsdb -dynroot-sparse -fakestat-all \
        -inumcalc md5 -backuptree -daemons 1 -rxmaxfrags 8 -rxmaxmtu 1500 \
        -rxpck 4096 -nosettime

* the server is on openSUSE Leap 15.0, with the `openafs-server` package at `1.8.0-lp150.2.2.1` as provided by openSUSE;
* I suspect that perhaps the issue is due to the latest kernel version, because I have run similar patterns a few weeks ago on an older kernel (but still from the `5.x` family), but I can't say for sure.

I also tried the following:

* `fs flushall` blocks just like the processes accessing the file-system;
* the only way to "kill" the stuck processes is to disconnect the network and let them time out.

Any pointers on how to diagnose this?

Thanks,
Ciprian.
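P.S. For completeness, the diagnostics I've been grabbing for one stuck process, gathered into a single sketch (the PID is a placeholder, defaulting to the current shell so the commands can be tried anywhere; the sysrq and `dmesg` steps need root and are best-effort):

```shell
#!/bin/sh
# Sketch: grab the usual diagnostics for one stuck process. PID is a
# placeholder; the sysrq and dmesg steps need root and may be skipped.
PID="${1:-$$}"

echo "--- /proc/$PID/status (head) ---"
head -n 5 "/proc/$PID/status"

echo "--- kernel stack (needs root) ---"
cat "/proc/$PID/stack" 2>/dev/null || echo "(unavailable without root)"

echo "--- all-task backtraces into dmesg (needs root) ---"
{ echo t > /proc/sysrq-trigger; } 2>/dev/null || echo "(sysrq unavailable)"
dmesg 2>/dev/null | tail -n 5 || true
```

The `State:` field in `/proc/<pid>/status` (`D` vs `S`) is the quickest tell of whether the process is in uninterruptible sleep.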
Re: [OpenAFS] About `dafileserver` vs `fileserver` differences (for small cells)
On Mon, Mar 11, 2019 at 12:35 AM Benjamin Kaduk wrote:
> > Thus I think that when one would modify the code, in large part the
> > code is common, and where it isn't at least the "switch" is visible in
> > there. Therefore I'm confident that the `fileserver` is still a
> > viable solution. :)
>
> I won't really dispute that it is viable at present, but it's pretty
> clear to me that it's no longer a *recommended* solution, and I don't
> really understand your attachment to it. Is this just because you
> continue to investigate running a simple fileserver without the bosserver
> and demand-attach has more moving parts in that respect?

Exactly. I want to simplify the OpenAFS deployment as much as possible. (Especially since the simpler it is, the better the chance I actually understand what happens with my data.)

I see OpenAFS as a viable solution for a WAN-enabled NAS, one that could be quickly deployed (the "server" part) in a VM (or even a container) and just used. (I'm really amazed that to date no other WAN-enabled NAS solution exists, especially one that allows user-defined ACLs and works on both Linux and Windows...) However, as it stands today, OpenAFS is geared towards large and static deployments, and less towards "experimental" ones.

I would really love it if I managed to put together a very lightweight VM that has just the bare minimum of services and moving parts. (And this is really achievable once one understands the "underlying" details of managing a file server.)

Ciprian.
Re: [OpenAFS] About `dafileserver` vs `fileserver` differences (for small cells)
On Mon, Mar 11, 2019 at 12:06 AM Benjamin Kaduk wrote:
> To be clear, they do share a great bit of code (dafs was not "from
> scratch"), but there are many places that do get differential treatment
> in the source -- look for AFS_DEMAND_ATTACH_FS preprocessor conditionals.

Based on what I see:

https://github.com/openafs/openafs/search?q=AFS_DEMAND_ATTACH_FS
https://github.com/openafs/openafs/blob/c1d39153da00d5525b2f7874b2d214a7f1b1bb86/src/viced/Makefile.in#L15
https://github.com/openafs/openafs/blob/c1d39153da00d5525b2f7874b2d214a7f1b1bb86/src/dviced/Makefile.in#L15

I would assume that most of the code is common (in terms of files), and that at compile time the sources are re-built with different defines. Thus I think that when one modifies the code, it is in large part common, and where it isn't, at least the "switch" is visible. Therefore I'm confident that the `fileserver` is still a viable solution. :)

Ciprian.
Re: [OpenAFS] Re: Starting an server (both DB and FS) without `BOS` (e.g. on Linux with systemd)
On Sat, Mar 9, 2019 at 11:16 PM Jeffrey Altman wrote:
> The BOS Overseer Service plays a number of roles:

Just wanted to stress that `bos` is wonderful in a distributed deployment, and I'm quite surprised that to this date we don't have other "general purpose" alternatives. However, as stated in the previous email, I'm using OpenAFS in a home / small office environment, where I'll never have more than one server. Moreover, the deployment will in the end be done in a dedicated VM. Thus the need for `bos` seems superfluous.

> 2. The bosserver is responsible for managing the content of many
>    configuration files including BosConfig, UserList, and
>    the server version of the CellServDB file. The KeyFile can
>    also be updated via bosserver. The files other than BosConfig
>    are shared with the AFS services.

These files are configured only once, and from what I gather (and experimented) can easily be created by hand without the `bos` toolchain. (Perhaps only the `KeyFile` requires `bos` commands, but it does not require the `bos` daemon to be running.)

>    c. fs - a bnode which defines the process group for [...]
>
>    d. dafs - a bnode which defines the process group for the
>       demand attach fileserver. The bosserver has special knowledge
>       related to process restart in case of failure and integration
>       with the "bos salvage" command.
>
> 3. The bosserver is used to request manual salvages of individual
>    volumes or whole partitions. When the "fs" bnode is in use,
>    the bnode will be stopped and started while the salvage takes
>    place. With the "dafs" bnode, single volume salvages do not
>    require the "dafs" bnode to be halted but full partition
>    salvages do.
>
> [...]
>
> > Does the `fileserver` / `dafileserver` actually start the salvage
> > process, or do they communicate this to the `bos` to restart only that
> > service?
>
> Most but not all of these functions could be performed with other tools.
> Managing the special inter-dependencies of the "fs" and "dafs" bnode
> processes and salvaging are the two exceptions.

And this is where things get "opaque", and the documentation doesn't give many internal details. When you say <>, by "failure" do you mean "the `fileserver` process just dies", or does the `fileserver` process somehow "signal" this to the `bos` server?

Because from what I gather from what you say, a simplified file server startup might look like:

* run `salvager` / `dasalvager` and wait for it to terminate;
* run `volserver` / `davolserver`, and in parallel,
* run `fileserver` / `dafileserver`, and,
* if either the volume or the file server fails, stop them both and restart from the first step.

Thanks,
Ciprian.
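P.S. To make my question concrete, here is the sequence above as I imagine it, written as a small supervision loop (this is only my sketch of the logic, not how `bosserver` actually behaves; the three daemons are passed in as parameters, and the DAFS-specific salvage coordination is deliberately not modeled):

```shell
#!/bin/sh
# Sketch of the simplified startup sequence: salvage to completion, start
# the volume server in the background and the file server in the
# foreground, and when the file server exits, tear everything down and go
# back to the salvage step. $1..$3 are the salvager / volserver /
# fileserver commands; $4 caps the rounds so the sketch can terminate.
run_fs_node() {
    salvager=$1; volserver=$2; fileserver=$3; rounds=${4:-1}
    while [ "$rounds" -gt 0 ]; do
        $salvager || return 1       # step 1: full salvage, wait for it
        $volserver & vol=$!         # step 2: volume server, in parallel...
        $fileserver || true         # ...with the file server (foreground)
        kill "$vol" 2>/dev/null || true   # step 3: stop the sibling...
        wait "$vol" 2>/dev/null || true
        rounds=$((rounds - 1))      # ...and loop back to the salvage step
    done
    return 0
}

# example round with harmless stand-ins for the three daemons
run_fs_node true "sleep 2" "sleep 0.1" 1 && echo "one supervised round done"
```

Whether this is actually safe (i.e. whether a plain process exit is the only failure signal `bosserver` reacts to) is exactly what I'm asking.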
[OpenAFS] Re: Starting an server (both DB and FS) without `BOS` (e.g. on Linux with systemd)
[I'm adding the issue of salvaging to the previous question. I'm quoting what I've asked on a previous thread.]

BTW, on the topic of volume salvaging: when I define my DAFS / FS node, I start a bnode with `salvager` (for FS), or with `dasalvager` and `salvageserver` (for DAFS). However, looking at the running processes, the `salvager` and `dasalvager` don't seem to be running after the initial startup. Thus I wonder how the salvage process actually happens. Does the `fileserver` / `dafileserver` actually start the salvage process, or do they communicate this to `bos` so that it restarts only that service?

Thanks,
Ciprian.
Re: [OpenAFS] About `dafileserver` vs `fileserver` differences (for small cells)
On Sat, Mar 9, 2019 at 11:43 AM Harald Barth wrote: > > However is it still "safe" and "advised" (outside of these > > disadvantages) to run the old `fileserver` component? > > I would recommend everyone to migrate to "da" and not recommend to > start with anything old. For obvious reasons, all the big > installations will migrate to "da" and you don't want to run another > codebase, don't you? Thanks Harald for the feedback. This is exactly what I wanted to find out, namely if the `fileserver` and `dafileserver` have different code bases. (And you've confirmed my hunch that the DAFS codebase is the currently maintained one.) Ciprian. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] About `dafileserver` vs `fileserver` differences (for small cells)
On Sat, Mar 9, 2019 at 4:10 AM Mark Vitale wrote: > DAFS main benefit is the reduced impact of restarting a fileserver, especially > fileservers with thousands or even millions of volumes. DAFS fileservers > are able to restart more quickly, are able to avoid restarts formerly > required for > volume salvages, and are able to reduce the negative effects of restarts on > clients. > Here are some details about how these benefits are achieved: Thanks Mark for explaining the advantages of DAFS, especially number (4) (i.e. the saving of client "states"). However is it still "safe" and "advised" (outside of these disadvantages) to run the old `fileserver` component? (More specifically, from a source code point of view, outside of the demand-attach, are there any other performance / stability improvements in DAFS as compared with FS?) BTW, on the topic of volume salvaging, when I define my DAFS / FS bnode I also start `salvager` (for FS), or `dasalvager` and `salvageserver` (for DAFS). However, looking at the running processes, the `salvager` and `dasalvager` don't seem to be running after the initial startup. Thus I wonder: how does the salvage process actually happen? Does the `fileserver` / `dafileserver` actually start the salvage process, or do they communicate this to the `bos` to restart only that service? (My main reason to ask this is in anticipation of my other email, which tries to identify if I can safely run the fileserver processes directly from `systemd`, outside the control of `bos`.) Thanks, Ciprian. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Questions regarding `afsd` caching arguments (`-dcache` and `-files`)
On Fri, Mar 8, 2019 at 11:39 PM Ciprian Dorin Craciun wrote: > On Fri, Mar 8, 2019 at 11:11 PM Jeffrey Altman wrote: > > The performance issues could be anywhere and everywhere between the > > application being used for testing and the disk backing the vice partition. OK, so first of all I want to thank Jeffrey for the support via IRC, as we've solved the issue. Basically it boils down to: * lower the number of `fileserver` threads to a proper value based on available CPUs / cores; (in my case `-p 4` or `-p 8`;) * properly configure jumbo frames on the network cards (`ip link set dev eth0 mtu 9000`); (this configuration has to be made in the "proper" place else it will be lost after restart;) * (after changing the MTU, restart both server and clients;) * disable encryption with `fs setcrypt -crypt off`; (in the end, based on what I understood, the encryption isn't very strong anyway, and given that I'll use it mostly on LAN it's not an issue; moreover over WAN I don't need to saturate a GigaBit link;) * (after changing this, re-authenticate, i.e. `unlog && klog`;) In order to check the correct configuration one has to: * `cmdebug -server 192.168.0.2 -addrs` (on the client) to see if the MTU is correctly picked up; (else restart the cache manager;) * `rxdebug -server 192.168.0.1 -peer -long` (on the server) to see if the `ifMTU / natMTU / maxMTU` for the client connection have proper values; (in my case they were `8524 / 7108 / 7108`;) * use `top -H` and check if the kernel thread `afs_rxlistener` (on the client) and many of the `fileserver` threads (on the server) are not maxed-out (i.e. > ~90%); if so, that is the bottleneck (after encryption is disabled and jumbo frames are enabled); A note about the benchmark: in order to saturate the link I've tested only with large files (i.e. ~20 MiB each), otherwise I'd end up "thrashing" the disk, and thus that would become the bottleneck. 
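Since the jumbo-frame setting tends to get lost on restart, a quick post-reboot sanity check can be done directly against `/sys` (a small convenience sketch of my own; the interface name `eth0` is only an example):

```shell
# Check that an interface still carries (at least) the expected MTU.
# Usage: check_mtu <iface> <expected-mtu>
check_mtu() {
    mtu=$(cat "/sys/class/net/$1/mtu" 2>/dev/null) || return 2
    [ "$mtu" -ge "$2" ]
}

# check_mtu eth0 9000 && echo "jumbo frames active" || echo "MTU was lost"
```

On openSUSE with Wicked, the persistent equivalent would be an `MTU='9000'` line in the interface's `/etc/sysconfig/network/ifcfg-*` file (that file name is an assumption for this particular setup, not something from the thread).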
BTW, I've taken the liberty to copy-paste the log from the IRC channel (I've kept only the relevant lines, and also grouped / reordered some of them), because they are very insightful into OpenAFS performance tuning. So once more, thanks Jeffrey for the help, Ciprian. 23:43 < auristor> first question, when you are writing to the fileserver, does "top -H" show a fileserver thread at or near 100% cpu? 23:45 < auristor> -H will break them out by process thread instead of providing one value for the fileserver as a whole 23:46 < auristor> I ask because one thread is the RX listener thread and that thread is the data pump. If that thread reaches 100% then you are out of capacity to receive and transmit packets 00:00 < auristor> Since you have a single client and 8 processor threads on the fileserver, I would recommend lowering the -p configuration value to reduce lock contention. 23:55 < auristor> there are two major bottlenecks in OpenAFS. First, the rx listener thread which does all of the work associated with packet allocation, population, transmission, retransmission, and freeing on the sender and packet allocation, population, application queuing, acknowledging, and freeing on the receiver. 23:56 < auristor> In OpenAFS this process is not as efficient as it could be and its architecture limits it to using a single processor thread which means that its ability to scale correlates to the processor clock speed 23:58 < auristor> Second, there are many global locks in play. On the fileserver, there is one global lock for each fileserver subsystem required to process an RPC. For directories there are 8 global locks that must be acquired and 7 for non-directories. 23:59 < auristor> These global locks in the fileserver result in serialization of calls received in parallel. 00:00 < ciprian_craciun> (Even if they are for different directories / files?) 00:00 < ciprian_craciun> (I.e. 
is there some sort of actual "global lock" that basically serializes all requests from all clients?) 00:01 < auristor> The global locks I mentioned do serialize the startup and shutdown of calls even when the calls touch different objects. 00:02 < auristor> Note that an afs family fileserver is really an object store. unlike a nfs or cifs fileserver, an afs fileserver does not perform path evaluation. path evaluation to object id is performed by the cache managers. 00:04 < auristor> The Linux cache manager also has a single global lock that protects all other locks and data structures. This lock is dropped frequently to permit parallel processing but it does severely limit the amount of parallel execution 00:09 < ciprian_craciun> Trying now with `-p 4` seems to yield ~35 MiB/s of `cat` throughput. 00:11 < auristor> that would imply that the fileserver is not releasing worker threads from the call channel fast enough
Re: [OpenAFS] Questions regarding `afsd` caching arguments (`-dcache` and `-files`)
[Replying also to the list, just to mention the benchmarking technique.] On Fri, Mar 8, 2019 at 11:11 PM Jeffrey Altman wrote: > The performance issues could be anywhere and everywhere between the > application being used for testing and the disk backing the vice partition. The issue is not the backing disk, as using the same benchmarking technique (see below) I get around ~270 MiB/s from the actual `/vicepX` files. The technique is simple (i.e. list all files, randomize their order, and then `cat` them, 128 at a time, to `/dev/null`): find . -type f | sort -R | xargs -P 64 -n 128 -- cat > /dev/null Ciprian. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
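The same technique can be wrapped to also report the aggregate throughput, which makes comparing `/vicepX` against `/afs` runs easier. A sketch (GNU `find`, `stat`, `sort`, and `date` are assumed; the function name is mine):

```shell
# Run the randomized parallel-cat benchmark over a directory and print
# the aggregate read throughput in MiB/s.
# Usage: read_throughput <dir>
read_throughput() {
    # total bytes to be read, for the throughput computation
    bytes=$(find "$1" -type f -print0 | xargs -0 -r stat -c %s \
            | awk '{ s += $1 } END { print s + 0 }')
    start=$(date +%s%N)
    find "$1" -type f -print0 | sort -zR | xargs -0 -r -P 64 -n 128 -- cat > /dev/null
    end=$(date +%s%N)
    awk -v b="$bytes" -v ns="$((end - start))" \
        'BEGIN { printf "%.1f\n", (b / 1048576) / (ns / 1e9) }'
}
```

For cached-read measurements the OS page cache should of course be dropped (or the AFS cache flushed) between runs, otherwise the number reflects memory, not the fileserver.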
[OpenAFS] Re: Regarding OpenAFS performance (on a small / home single node deployment)
Small correction to the previous email, the `-chunksize` for the server `afsd` was `20` (i.e. 1MiB) at the time of the experiment. And the `-dcache` on the LAN client was `65536`. (The values in my initial email were based on some notes I had while I was trying various parameters.) Ciprian. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
[OpenAFS] Regarding OpenAFS performance (on a small / home single node deployment)
[I've changed the subject to reflect the new topic.] On Fri, Mar 8, 2019 at 9:58 PM Mark Vitale wrote: > >>> (I'm struggling to get AFS to go over the 50MB/s, i.e. half a GigaBit, > >>> bandwidth... My target is to saturate a full GigaBit link...) > > > > Perhaps you know: what is the maximum bandwidth that one has achieved > > with OpenAFS? (Not a "record" but in the sense "usually in enterprise > > deployments we see zzz MB/s".) > > I think this may be a question like "how long is a piece of string?". > The answer is "it depends". Could you be more specific about your use cases, > and what you are seeing (or need to see) in terms of OpenAFS performance? So my use-case is pretty simple: * small (home / office) single node deployment on Linux (OpenSUSE Leap 15.0) running OpenAFS 1.8; * three `/vicepX` partitions on the same Ext4 over RAID5, backed by rotational HDDs, capable of ~300 MiB/s sequential I/O (in total per RAID); (these are migrated from three old disks, and I intend to merge them into a single one;) * 1x GigaBit network, 32 GiB RAM, Core i7, currently not used for anything else; * I have around 600 GiB of personal files, in ~20 volumes; some (around 50%) of these files are largish ~20 MiB files (in one volume), meanwhile the rest are "usual" smallish ~128 KiB to mediumish ~4 MiB (these last figures are an assumption) (all in 2 or 3 volumes); My intention is to saturate the GigaBit network card from one client (in the same LAN) (both with 9k Jumbo frames support), while accessing these files read-only. (The client has a 6 GiB cache over TMPFS, with 8 GiB RAM and 64 GiB swap. I know this last one is not "advisable", but the cache is not swapped, thus it is not impacting the performance.) I've tried to read, either sequentially or in parallel (from 8 to 64 processes), all the available files (either sorted by path or randomly), and never got over 40-50 MiB/s in network traffic. 
(I've done this test both from the server, thus over `lo`, and the networked client, with almost the same performance.) The following is my current configuration: * for the `fileserver`:

    /usr/lib/openafs/fileserver -syslog -sync always \
        -p 128 -b 524288 -l 524288 -s 1048576 -vc 4096 -cb 1048576 \
        -vhandle-max-cachesize 32768 \
        -jumbo -udpsize 67108864 -sendsize 67108864 \
        -rxmaxmtu 8192 -rxpck 4096 -busyat 65536

* for the `volserver`:

    /usr/lib/openafs/volserver -syslog -sync always -p 16 \
        -jumbo -udpsize 67108864

* for the server `afsd`:

    -memcache -blocks 4194304 -chunksize 17 -stat 524288 -volumes 4096 \
        -splitcache 25/75 -afsdb -dynroot-sparse -fakestat-all \
        -inumcalc md5 -backuptree -daemons 8 \
        -rxmaxfrags 8 -rxmaxmtu 8192 -rxpck 4096 -nosettime

* for the LAN client `afsd`:

    -blocks 7864320 -afsdb -chunksize 20 -files 262144 \
        -files_per_subdir 1024 -dcache 128 -splitcache 25/75 \
        -volumes 256 -stat 262144 -dynroot-sparse -fakestat-all \
        -backuptree -daemons 8 \
        -rxmaxfrags 8 -rxmaxmtu 8192 -rxpck 4096 -nosettime

> > (I think my issue is with the file-server not the cache-manager...) > > It is easy to get bottlenecks on both. One way to help characterize this > is to use some of the OpenAFS test programs and see how they perform against > your fileservers: > - afscp (tests/afscp) > - afsio (src/venus/afsio) > > There is also the test server/client pair for checking raw rx network > throughput: > - rxperf (src/tools/rxperf) I'll try to look at them. (None of them seem to be part of the OpenSUSE RPMs, thus I'll have to build them.) Thanks, Ciprian. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Questions regarding `afsd` caching arguments (`-dcache` and `-files`)
On Fri, Mar 8, 2019 at 9:30 PM Mark Vitale wrote: > But now on more careful reading, I see this only applies when -dcache has not > been explicitly specified. > (Which, to be fair, is the normal case). Thanks for the insight. > > (I'm struggling to get AFS to go over the 50MB/s, i.e. half a GigaBit, > > bandwidth... My target is to saturate a full GigaBit link...) > > Here are some helpful commands for examining the results of your > configuration experiments: > > cmdebug -cache > fs getcacheparms -excessive Perhaps you know: what is the maximum bandwidth that one has achieved with OpenAFS? (Not a "record" but in the sense "usually in enterprise deployments we see zzz MB/s".) (I think my issue is with the file-server not the cache-manager...) Thanks, Ciprian. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Questions regarding `afsd` caching arguments (`-dcache` and `-files`)
On Fri, Mar 8, 2019 at 9:11 PM Mark Vitale wrote: > The -dcache option for a disk-based cache does set the number of dcaches in > memory. > It has a minimum value of 2000 and a max of 100000. Is the 100K maximum a hard limit imposed in code, or a "best-practice"? (I've looked in a few places and it seems that it is not a hard limit.) > In addition, many of the options interact with each other. > The best guide for how all this _really_ works is the source code - however, > the > source itself is quite confusing at times, so I feel your pain. Currently I go with a trial-and-error approach. :) (I'm struggling to get AFS to go over the 50MB/s, i.e. half a GigaBit, bandwidth... My target is to saturate a full GigaBit link...) Thanks Mark for the info, Ciprian. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
[OpenAFS] Re: Questions regarding `afsd` caching arguments (`-dcache` and `-files`)
On Fri, Mar 8, 2019 at 6:19 PM Ciprian Dorin Craciun wrote: > (B) Using `-files` and `-chunksize` so that their product is larger > than `-blocks` means that the cache can hold up to as many `-files` > actual AFS files, but their total size can't be larger than `-blocks`? > (I.e. if one has a cell with lots of small files, it is OK to > configure a largish `-chunksize` and `-files` because they will be > cached up to `-blocks`.) I've found the http://docs.openafs.org/Reference/5/afs_cache.html documentation that states: Vn files expand and contract to accommodate the size of the AFS directory listing or file they temporarily house. As mentioned, by default each Vn file holds up to 64 KB (65,536 bytes) of a cached AFS element. AFS elements larger than 64 KB are divided among multiple Vn files. If an element is smaller than 64 KB, the Vn file expands only to the required size. A Vn file accommodates only a single element, so if there are many small cached elements, it is possible to exhaust the available Vn files without reaching the maximum cache size. This would imply that: * there is no 1-to-1 relation between "chunks" and "Vn files": one chunk could be stored in multiple "Vn files"; (however one "Vn file" never stores multiple chunks, even when the chunk size is below 64 KB?) * by explicitly setting `-files` one can set a limit on the maximum number of actual AFS files to cache; (i.e. if all files are smaller than 64 KB and `-blocks` is larger than `-files * 64K`, then no more than `-files` AFS files would be stored;) Am I correct? Thanks, Ciprian. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
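If that reading is right, the effective cache capacity is simply the smaller of the two limits. As a back-of-the-envelope sketch (the numbers are examples, and the one-element-per-Vn-file behavior is taken from the documentation quoted above):

```shell
# How many AFS elements can the cache hold, given the reading above?
# -blocks is in 1 KiB units; each cached element occupies at least one Vn file.
# Usage: cache_element_limit <blocks_kib> <files> <avg_element_kib>
cache_element_limit() {
    by_size=$(( $1 / $3 ))
    if [ "$by_size" -lt "$2" ]; then echo "$by_size"; else echo "$2"; fi
}

# e.g. -blocks 7864320 (7.5 GiB) and -files 262144, with 16 KiB average
# elements: 7864320/16 = 491520 would fit by size, so -files (262144) wins.
```

With mostly small files, in other words, `-files` is the binding constraint long before `-blocks` is reached, which matches the "exhaust the available Vn files" warning from the documentation.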
[OpenAFS] Questions regarding `afsd` caching arguments (`-dcache` and `-files`)
I have two small questions about the cache management of `afsd`. (The documentation isn't very explicit.) (In both cases I'm speaking about a disk-based cache.) (A) Using `-dcache 128` with a `-chunksize 20` (i.e. 1 MiB) for a disk-based cache, would actually allocate 128 MiB from kernel memory (i.e. the product of the two)? It is unclear from the documentation. (Although I would infer yes, based on the description of the memory based cache.) (B) Using `-files` and `-chunksize` so that their product is larger than `-blocks` means that the cache can hold up to as many `-files` actual AFS files, but their total size can't be larger than `-blocks`? (I.e. if one has a cell with lots of small files, it is OK to configure a largish `-chunksize` and `-files` because they will be cached up to `-blocks`.) Thanks, Ciprian. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
[OpenAFS] Documentation for `afsd` argument `files_per_subdir` is wrong (out of sync with the implementation)
Hello all! I'm using OpenAFS on OpenSUSE, version 1.8.x (in fact 1.8.0 and 1.8.2 on two nodes), and although the documentation for the `afsd` daemon states for `files_per_subdir` that: files_per_subdir -- Limits the number of cache files in each subdirectory of the cache directory. The value of the option should be the base-two log of the number of cache files per cache subdirectory (so 10 for 1024 files, 14 for 16384 files, and so forth). It is in fact used without the exponential transformation. I.e. setting `-files_per_subdir 10` will actually result in exactly 10 files per directory, whereas `-files_per_subdir 1024` would correctly result in 1024 files. Thanks, Ciprian. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
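This kind of discrepancy is easy to confirm empirically by counting what a given cache actually contains (a convenience sketch; the cache path in the comment and the assumption that cache files sit one level below per-subdirectory directories are based on peeking at a disk cache, not on documentation):

```shell
# Print the number of cache files in each subdirectory of an afsd disk cache.
# Usage: files_per_subdir <cache-dir>
files_per_subdir() {
    for d in "$1"/D*/; do
        [ -d "$d" ] || continue
        printf '%s %s\n' "$d" "$(find "$d" -maxdepth 1 -type f | wc -l)"
    done
}

# files_per_subdir /var/cache/openafs   # cache path is an example
```

Running this after populating the cache makes it obvious whether the option was interpreted literally or as a base-two log.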
[OpenAFS] Starting a server (both DB and FS) without `BOS` (e.g. on Linux with systemd)
I understand that for large deployments `bos` is useful because it allows administering the AFS services remotely without resorting to SSH. However for small deployments (like for example a single server) could it be removed completely, letting the services be started without it? (Like for example as plain systemd services.) (My assumption, based on the snippet in the documentation, seems to be "yes".) And if it is possible, should the various services be started just as they are listed inside `/etc/openafs/BosConfig`? Are there other environment variables (or similar "configuration") that must be configured? Also is there a particular ordering or (hard) dependency between the services? (Or can they be started in parallel?) Thanks, Ciprian. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Administrators with a slash
On Wed, Mar 6, 2019 at 7:16 AM Benjamin Kaduk wrote: > To a large extent, getting Kerberos set up is pretty much drop it in and > switch it on, but there's a lot of flexibility about principal names, > especially for administrative operations. Getting it integrated with > OpenAFS is mostly about having the right 'pts createuser's happen to > register users, and creating the afs/cellname.fqdn principal to go in the > rxkad.keytab and/or KeyFileExt -- at this point, AFS is just a regular > kerberized service and doesn't require special treatment on the Kerberos > side for the service principals. Indeed this was my experience also: the Kerberos deployment was quite trivial (once I'd done it); however it seemed (and still seems) that I've "lost" something along the way, because I lack the proper know-how and expertise with Kerberos. > I don't know of specific documentation for this, no. > I think that many sites running Kerberos+AFS have some homegrown database > management system that handles both and keeps them synchronized. And this is unfortunate, especially since deploying OpenAFS "seems" a daunting task for the small cell operator, or one that just wants to "play" with the technology. I say "seems" because deploying an OpenAFS server can be done quite quickly with a couple of copy-pastes. Perhaps (if I'll have time) I will prepare a small hands-on tutorial on deploying OpenAFS on a Linux server. (I know that there already exists the "Quick Starting UNIX Guide", however it is far from "quick"...) :) > > > Of course, rxgk will let us use fancier names for things, so we'll have to > > > get used to a whole new world order when that finishes landing... > > > > Could you elaborate more on this? > > The short form is that we'll be able to use (encoded) GSS principal > names in the UserList file. 
It looks like the details haven't made it into > the UserList.pod documentation yet (unsurprising, since the code to > authenticate as them isn't in place yet), but the format includes a base64 > encoded version of the GSS exported name. Basically it means one could use something other than Kerberos for authentication? (Something that is GSS-compliant?) Thanks, Ciprian. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
[OpenAFS] About `dafileserver` vs `fileserver` differences (for small cells)
Hello all! I understand from the documentation that the main difference between `dafileserver` and `fileserver` is the "on-demand-attach" of volumes. However I wonder if there are other advantages / differences between the two, especially with regard to: * performance -- is `dafileserver` more performant than `fileserver`? * reliability -- because (I assume) many cells have migrated to `dafileserver` the "old" `fileserver` gets less used, thus less tested in real deployments; * maintenance -- is the `fileserver` still actively developed and maintained? I ask this also from the perspective of a small cell operator (for personal purposes), where attach-on-demand is not an issue, and in fact I think I would prefer all my volumes to be attached as early as possible. Thanks, Ciprian. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Administrators with a slash
On Mon, Mar 4, 2019 at 3:35 AM Benjamin Kaduk wrote: > > Perhaps the OpenAFS Quick Start UNIX chapters touching the Kerberos > > integration (http://docs.openafs.org/QuickStartUnix/HDRWQ53.html) > > should clearly state this issue with principals containing dots and > > using at the same time instances (i.e. slashes)... > > Patches welcome! (XML sources browseable at > http://git.openafs.org/?p=openafs.git;a=tree;f=doc/xml/QuickStartUnix;h=9e4fbd3f23b81696d98b1fcb68519364fe365d3f;hb=HEAD > ; preferred submissions are as gerrit changes (docs on that at > https://wiki.openafs.org/devel/GitDevelopers/) but mailed patches and > similar are fine. I'll try to provide a patch to the documentation. (I am aware that OpenAFS is an open-source, volunteer-based project, thus I was not "demanding" the update.) :) However on the same subject, is there a document describing how one should configure Kerberos (from MIT) to work flawlessly with OpenAFS? (I've tried searching for such a document, but found none, and moreover even "plain" Kerberos deployment tutorials are very scarce...) > > Moreover it's still unclear to me if in `pts createuser` I should use > > the `username.admin` or `username/admin` variants? (It lets me do > > both, but I think only the former actually works.) Could someone tell > > me the "correct" syntax for OpenAFS usernames? > > You should pts createuser the username.admin variants. I'll try to include this in that patch also. > Of course, rxgk will let us use fancier names for things, so we'll have to > get used to a whole new world order when that finishes landing... Could you elaborate more on this? Thanks, Ciprian. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Offline migration of AFS partition files (i.e. the contents of `/vicepX`)
On Wed, Dec 5, 2018 at 3:29 PM Harald Barth wrote: > > Can I safely `rsync` the files from the old partition to the new one? > > For Linux (The "new" server partition layout): > > If the file tree really is exactly copied (including all permissions > and chmod-bits) then you have everything you need. This was not true > for the old file system layout for example in SunOS UFS. Just for reference (perhaps it would help others in the future) I've used something along the lines of: (please note that it will erase everything from the destination folder; thus it must be used only for migration purposes;) (also note that it is safe to run this multiple times, perhaps to resume a failed synchronization, or perhaps to rollback and try the salvage operation with various options, etc.)

    rsync \
        --recursive --one-file-system \
        --delete \
        --ignore-times \
        --checksum --checksum-choice md5 \
        --links --safe-links \
        --hard-links \
        --perms --times \
        --owner --group --numeric-ids \
        --whole-file --no-compress \
        --preallocate \
        --verbose --progress --itemize-changes \
        -- \
        /mnt/old-disk/vicepX/ \
        /mnt/new-disk/vicepX/ \
    #

> I would copy to a not-yet used partition, mount it then as /vicepY > (where Y is a new unused letter) and then as the first thing when > starting the server run a salvage with the options > -orphans attach -salvagedirs -force I've run the following before starting any OpenAFS services (i.e. without `bos` running):

    /usr/lib/openafs/dasalvager \
        -partition /vicepX \
        -orphans attach \
        -salvagedirs \
        -force \
    #

Apparently, outside some lines like: Vnode 18800: version < inode version; fixed (old status) , no other "strange" lines appear in the `SalvageLog` file. And finally I've created a completely new OpenAFS (1.8.0) deployment from scratch, initializing the protection database with the same users and groups as in the previous deployment, making sure I've kept the same UIDs. (Hopefully this is enough to keep ACLs and ownership from the old volumes intact.) 
Afterwards I've started the OpenAFS `bos` service, and run the following:

    vos syncvldb \
        -server 172.xx.xx.xx \
        -partition X \
        -verbose \
        -localauth \
    #

    vos syncserv \
        -server 172.xx.xx.xx \
        -partition X \
        -verbose \
        -localauth \
    #

Hopefully this was enough to "migrate" my old AFS deployment to the new server. Thanks all for the help, Ciprian. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
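Since the copy happens offline, it can also be verified end to end before running the salvager. A sketch along the same lines (the function name is mine; GNU `md5sum` is assumed):

```shell
# Verify that the new vice partition is byte-for-byte identical to the old one.
# Usage: verify_copy <old-dir> <new-dir>
verify_copy() {
    sums=$(mktemp)
    # checksum every file in the old tree, with stable relative paths...
    ( cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r md5sum ) > "$sums"
    # ...then re-check the same relative paths against the new tree
    ( cd "$2" && md5sum --quiet -c "$sums" ); rc=$?
    rm -f "$sums"
    return "$rc"
}

# verify_copy /mnt/old-disk/vicepX /mnt/new-disk/vicepX
```

Note this only checks file contents; ownership, mode, and timestamps (which the fileserver relies on) are exactly what the `rsync` flags above are there to preserve.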
Re: [OpenAFS] Administrators with a slash
On Tue, Jan 10, 2012 at 3:20 PM Bobb Crosbie wrote: > I now recall reading about the slash -> dot remapping in the docs, but I had > forgotten about it. > > I think perhaps the tools might have done a better job of indicating that > there was a problem, and what it might be ? > > If slashes are remapped to dots, then perhaps ``pts createuser'' should issue > a warning message if you try to create a user with a slash ? > As it stands (1.4.12 & 1.6.0), pts happily creates the user with the slash > and also includes it in the list of entries. Sorry for reviving such an old thread, but I've just wasted about 4 hours randomly trying things out in order to get OpenAFS (1.8.0) with Kerberos to actually work... And fortunately (?!) I've managed to find the solution through this random process; thus I've searched the mailing lists to see if anyone had the same issue... Perhaps the OpenAFS Quick Start UNIX chapters touching the Kerberos integration (http://docs.openafs.org/QuickStartUnix/HDRWQ53.html) should clearly state this issue with principals containing dots and using at the same time instances (i.e. slashes)... Moreover as Bobb observed almost 10 years ago, none of the OpenAFS tools (not even in 1.8.0) give any hint about what is happening, not in the logs, nor on stderr... Moreover it's still unclear to me if in `pts createuser` I should use the `username.admin` or `username/admin` variants? (It lets me do both, but I think only the former actually works.) Could someone tell me the "correct" syntax for OpenAFS usernames? Thanks, Ciprian. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Offline migration of AFS partition files (i.e. the contents of `/vicepX`)
On Sun, Mar 3, 2019 at 12:29 AM Jeffrey Altman wrote: > On 3/2/2019 3:42 PM, Ciprian Dorin Craciun wrote: > > (A) When you state `exactly copied` you mean only the following > > (based on the `struct stat` members): > > [...] > > The vice partition directory hierarchy is used to create a private > object store. The reason that Harald said "exact copy" is because > OpenAFS leverages the fact that the "dafs" or "fs" bnode services > execute as "root" to encode information in the inode's metadata that is > not guaranteed to be a valid state from the perspective of normal file > tooling. I understand that OpenAFS "reuses" the inode metadata for its own purposes, and that one shouldn't touch it outside the OpenAFS tools. However, is it enough if, while migrating, I make sure to keep **only** the following file metadata: * `st_uid` and `st_gid`; * `st_mode`; * `st_atim`, `st_mtim` and `st_ctim`; Can I assume that no other metadata is required? (Like for example Linux file-system ACLs or extended user attributes?) (I would assume not, however I wanted to make sure.) Moreover I am curious if the timestamps are actually required? (Especially the access and change timestamps.) > For many years there was discussion of creating a plug-in interface for > the vice partition object storage. This would permit separate formats > depending on the underlying file system capabilities and use of non-file > system object stores. Although this is a little bit off-topic, I am quite happy that OpenAFS decided to just reuse a "proper" file-system, and lay out its own "objects" on top, instead of going with opaque "object stores"... I understand that from a performance and scalability point of view a more advanced format would help, however for small deployments, I think the plain file-system approach provides more reliability and reassurance that, in case something happens, one can easily recover files. (See below for more about this.) 
> OpenAFS stores each AFS3 File ID data stream in a single file in > the current format. > > > I.e. formalizing the last one: if one would take any file accessible > > under `/afs` and would compute its SHA1, then by looking into all > > `/vicepX` partitions belonging to that cell, one would definitively > > find a matching file with that SHA1. > > This is true for the current format. Continuing my "reliability" idea of plain file-systems: I, for example, maintain MD5 checksums for all my AFS stored files (i.e. those in `/afs/cell`), which means that in case something goes wrong with the AFS directories or meta-data, I can always just MD5 the actual `/vicepX` files, and pick my data out of there. In fact, given that I have deployed OpenAFS for personal use and most of my "archived" files are on it, and the fact that I don't have too much time to invest in it, just knowing that I can always easily get my data out gives me almost blind trust in OpenAFS. (This, and the lack of WAN and ACL support, is why I don't use Lustre, Ceph or other "modern" distributed / parallel file-systems.) > > My curiosity into all this is because I want to prepare some `rsync` > > and `cpio` snippets that perhaps could help others in a similar > > endeavor. Moreover (although I know there are at least two other > > "official" ways to achieve this) it can serve as an alternative backup > > mechanism. > > The vice partition format should be considered to be private to the > fileserver processes. It is not portable and should not be used as a > backup or transfer mechanism. I understand this, however I'm thinking more of "disaster recovery" scenarios, those cases when the OpenAFS services are not capable of running. (As in my case, when I don't have OpenAFS yet installed on my "new" server, and my "old" server OS is unusable. I just have my `/vicepX` partitions... 
Moreover I intend to create a `cpio` archive in `newc` format of my old `/vicepX` partitions and keep it for a while... And `cpio`'s limited metadata support is why I asked about which metadata is required.)

Thanks Jeffrey for the information,
Ciprian.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Offline migration of AFS partition files (i.e. the contents of `/vicepX`)
On Wed, Dec 5, 2018 at 3:29 PM Harald Barth wrote:
> > Can I safely `rsync` the files from the old partition to the new one?
>
> For Linux (The "new" server partition layout):
>
> If the file tree really is exactly copied (including all permissions
> and chmod-bits) then you have everything you need.

I would like to follow up on this with some additional questions, which I'll try to keep as succinct as possible. (Nothing critical; however I would like to have a little bit more insight into this.)

(A) When you state `exactly copied`, do you mean only the following (based on the `struct stat` members):
* `st_uid` / `st_gid`;
* `st_mode`; (i.e. permissions;)
* `st_atim`, `st_mtim` and `st_ctim`? (i.e. timestamps;)
* no ACLs, no `xattr` (or `user_xattr`);
* anything else?

(B) Also (based on what I gathered by "peeking" into the `/vicepX` partition) there are only plain folders and plain files, without any symlinks or hard-links.

(C) Moreover, based on the same observations, I guess that the metadata (i.e. uid/gid/permissions/timestamps) for the actual folders inside of `/vicepX` doesn't matter much. (Only the metadata for the actual files does.)

(D) (Not really related to migration.) Am I to assume that some of the files inside `AFSIDat` are identical in contents to the actual files in the `/afs` structure? (Disregarding all metadata, including filenames.) Moreover, am I to assume that all the files accessible from `/afs` are found somewhere inside `AFSIDat` with identical contents?

I.e., formalizing the last one: if one would take any file accessible under `/afs` and compute its SHA1, then by looking into all `/vicepX` partitions belonging to that cell, one would definitively find a matching file with that SHA1.

My curiosity into all this is because I want to prepare some `rsync` and `cpio` snippets that perhaps could help others in a similar endeavor.
Moreover (although I know there are at least two other "official" ways to achieve this) it can serve as an alternative backup mechanism. BTW, is there a document that outlines the actual layout of the `/vicepX` structure? I've searched a bit but found nothing useful. Thanks for the feedback, Ciprian. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
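As a companion to such snippets, here is a quick post-copy sanity check over exactly the `struct stat` members from point (A) above. It is a sketch assuming GNU `find` and placeholder mount points; `st_ctime` is left out since user-space tools cannot set it anyway.

```sh
# List each file with uid, gid, permission bits, and mtime (GNU find),
# then diff the two listings; no output from diff means the metadata matches.
# /vicepx-old and /vicepx-new are placeholders for the two partitions.
( cd /vicepx-old && find . -type f -printf '%p %U %G %m %T@\n' | sort ) > /tmp/old.lst
( cd /vicepx-new && find . -type f -printf '%p %U %G %m %T@\n' | sort ) > /tmp/new.lst
diff /tmp/old.lst /tmp/new.lst && echo "metadata matches"
```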
Re: [OpenAFS] Offline migration of AFS partition files (i.e. the contents of `/vicepX`)
On Wed, Dec 5, 2018 at 3:29 PM Harald Barth wrote:
> > Can I safely `rsync` the files from the old partition to the new one?
>
> For Linux (The "new" server partition layout):
>
> If the file tree really is exactly copied (including all permissions
> and chmod-bits) then you have everything you need.

Am I safe to assume that on Linux only the "new" partition layout is used? (Is there a way to check which layout I am using?)

Thanks,
Ciprian.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info
[OpenAFS] Offline migration of AFS partition files (i.e. the contents of `/vicepX`)
Quick question regarding the following situation: one of my `/vicepX` AFS partitions is currently stored on an old disk (with JFS as the file-system), and thus I need to move all my AFS data from that partition to a fresh one (Ext4); moreover, during this move OpenAFS is not running (and I intend to upgrade the server version as well, from 1.6.5 to the latest one).

Can I safely `rsync` the files from the old partition to the new one? Is there another alternative that doesn't require actually starting OpenAFS? (I know about `voldump`, however it requires me to execute it for each volume, and thus I might "forget" something. Moreover it requires extra storage space for the resulting archive.)

Thanks,
Ciprian.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Multi-homed server and NAT-ed client issues
On Wed, Jul 17, 2013 at 10:23 PM, Harald Barth wrote:
> >> services: `kaserver`,
>
> Please consider running a KDC (or use the KDC in your AD if you have
> one) instead of kaserver. kaserver is so last century.
>
> Harald.

Yes... The `kaserver` thingy... :)

The problem is that when I started using OpenAFS (for personal purposes), the "stable" version was 1.4 (or at least what was labeled "stable" in my distribution). And at that time `kaserver` was simple to install and manage, and I still use it today, mainly due to laziness. Moreover I only have a few users, thus migrating to a full Kerberos stack seems like overkill to me...

On the same topic, are there any serious concerns related to `kaserver`? Or is it more related to other aspects (like, say, scalability, integration, future, etc.)?

Ciprian.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Multi-homed server and NAT-ed client issues
Problem solved! Thanks to both posters for pointing me in the right direction: adding the `-rxbind` option to the following services: `kaserver`, `ptserver`, `vlserver`, `fileserver`, and `volserver`. This was simply done by editing the `BosConfig` file in the `/etc/openafs` folder and adding that token to the lines starting with `parm`.

(I must confess I feel quite dumb for not finding this option myself... I must say I did add it to the `bosserver` invocation, but that didn't seem to work. I should have added it to each individual service.)

However, a note for the documentation maintainers: it seems that the `-rxbind` option is missing from the manuals of the following services (at least in the HTML version on http://docs.openafs.org ): `kaserver` and `volserver`.

As for the words that follow below: they were written while I was reading the replies, and I'll leave them there for those who one day will have similar issues with multi-homed servers.

On Wed, Jul 17, 2013 at 6:28 PM, Jeffrey Hutzelman wrote:
> On Wed, 2013-07-17 at 17:43 +0300, Ciprian Dorin Craciun wrote:
>> Hello all! I've encountered quite a blocking issue in my OpenAFS
>> setup... I hope someone is able to help me... :)
>>
>> The setup is as follows:
>> * multi-homed server with, say S-IP-1 (i.e. x.x.x.5) and S-IP-2
>> (i.e. x.x.x.7), multiple IP addresses, all from the public range;
>
> Things get much easier if you just use the actual names and addresses,
> instead of making up placeholders.

Indeed it could seem that I've obscured the situation by providing placeholders for the actual IPs. (The reason revolves mainly around the fact that all these emails are public knowledge.)
However the values I've chosen as placeholders have been carefully selected:
* both are addresses for the same interface (configured with `ip addr add ...`, thus not with alias interfaces like Debian once had);
* both are addresses from the same network (thus are routed identically);
* the second IP (the one OpenAFS should use) is marked as `secondary` by `ip addr show`;

Below is the output of `ip -4 addr show ethX` (blanking only the interface name and the network address):

    ethX: mtu 1500 qdisc noqueue state UP
        inet x.x.x.5/27 brd x.x.x.31 scope global ethX
        inet x.x.x.7/27 brd x.x.x.31 scope global secondary ethX

And the output of `ip -4 route show`:

    x.x.x.0/27 dev ethX proto kernel scope link src x.x.x.5
    127.0.0.0/8 via 127.0.0.1 dev lo
    default via x.x.x.1 dev ethX

The full output of both `ip addr` and `ip route` includes a few more bridges and interfaces. However none share the same IP range with the addresses above, there are no other default routes except the one above, and moreover the OpenAFS clients aren't on any of the "extra" networks (i.e. the packets to them should go through the default route above).

> Frequently, doing that sort of thing
> hides critical information that may point to the source of the problem.

I hope that the details above are sufficient to depict the overall context.

> For example, in this case, Linux's choice of source IP address on an
> outgoing UDP packet sent from an unbound socket (or one bound to
> INADDR_ANY) will depend on the interface it chooses, which will depend
> on the route taken, which depends on the server's actual addresses and
> the network topology, particularly with respect to the client (or in
> this case, to the public address of the NAT the client is behind).

As said, the reply packet leaves the server with the source set to the first IP (x.x.x.5). Thus the behaviour is consistent with a socket bound to `INADDR_ANY` and a peer that takes the default route.
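(As an aside: this source-address selection can be queried directly, without sending any packets, via `ip route get`. On the server above it would print `src x.x.x.5` for a destination behind the default route, which is exactly the asymmetry described; loopback is used below only so the command runs anywhere:)

```sh
# Ask the kernel which route, and therefore which source address, it would
# pick for a given destination; the "src" field is the address that an
# INADDR_ANY-bound UDP socket would end up sending from.
ip route get 127.0.0.1
```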
> You also haven't said what version of OpenAFS you're using, so I'll
> assume it's some relatively recent 1.6.x.

Indeed, my fault. Being in a hurry to leave for home, I forgot to mention that I have Linux on both the client and the server, and the OpenAFS version is 1.6.2.

>> * the second IP, S-IP-2 (i.e. x.x.x.7), is the one listed in
>> `NetInfo` and the DNS record (and correctly listed when queried via `vos
>> listaddrs`);
>> * the first IP, S-IP-1 (i.e. x.x.x.5), is listed in
>> `NetRestricted` (and doesn't appear in `vos listaddrs`);
>
> So, the machine the fileserver runs on is multi-homed, but you're only
> interested in actually using one of those interfaces to provide AFS
> service?

Exactly: the server is multi-homed and I want it to use the secondary IP address. (In fact all OpenAFS services run on exactly the same server.)

> In that case, you use the -rxbind option, which tells the
> servers to bind to a specific address instead of INADDR_ANY. That
> option needs
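(For the archives: as the "Problem solved!" message above describes, the fix is to add `-rxbind` to each service's `parm` line in `BosConfig`. A sketch of what those entries could look like follows; the bnode layout and binary paths are assumptions and depend on the installation, so adapt them accordingly:)

```
bnode simple ptserver 1
parm /usr/lib/openafs/ptserver -rxbind
end
bnode simple vlserver 1
parm /usr/lib/openafs/vlserver -rxbind
end
```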
[OpenAFS] Multi-homed server and NAT-ed client issues
Hello all! I've encountered quite a blocking issue in my OpenAFS setup... I hope someone is able to help me... :)

The setup is as follows:
* multi-homed server with, say, S-IP-1 (i.e. x.x.x.5) and S-IP-2 (i.e. x.x.x.7), multiple IP addresses, all from the public range;
* the second IP, S-IP-2 (i.e. x.x.x.7), is the one listed in `NetInfo` and the DNS record (and correctly listed when queried via `vos listaddrs`);
* the first IP, S-IP-1 (i.e. x.x.x.5), is listed in `NetRestricted` (and doesn't appear in `vos listaddrs`);
* NAT-ed client (no multi-homing on the client side);

The actual problem is:
* the client sends the authentication request to S-IP-2;
* the client's router source-NATs the IP to its own public IP, and adds the UDP "connection" with S-IP-2 as the other peer to its conntrack table;
* the server receives the request on S-IP-2;
* !!! however it replies from S-IP-1 (i.e. x.x.x.5) !!! (probably because the UDP socket is bound to `0.0.0.0`...)
* the client's router receives the packet and can't find it in its conntrack table (because it expects the packet to come from S-IP-2);

As a note, everything works perfectly with non-NAT-ed clients. Moreover, on these public-IP-ed clients I can clearly see via `tcpdump` that outgoing packets go towards S-IP-2, but the replies come from S-IP-1. (The same asymmetry is visible also on the server.)

Thus my question is how I can resolve such an issue? I must say I've tried to `iptables -j SNAT ...` the outgoing packets to the right address, S-IP-2, however this doesn't work because SNAT also changes the source port. I've also tried to `-j NETMAP` these packets, but that doesn't work because NETMAP in the `OUTPUT` or `POSTROUTING` chains actually touches the destination... Thus if someone knows of an `iptables`...

Thanks,
Ciprian.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Question for OpenAFS
On Thu, Apr 18, 2013 at 10:10 AM, Lars Schimmer wrote:
> On 18.04.2013 08:46, 강신덕 wrote:
>> I wonder whether Openafs File Server & DB Server could work on vmware as
>> a virtual machine.
>>
>> If it is possible, We want to migrate our openafs system from physical
>> server to virtual machine using VMWare.
>
> Sure that is possible, some cells do run complete out of VMWare VMs.
> But remind: this setup does cost some performance overhead.

About this OpenAFS-in-a-VM topic: I tried a small experiment some time ago and had some issues... Basically there are two approaches to virtualized networking:

* each VM gets its own "public" IP address (practically it is just like any other host on the same LAN); in this case I have no doubt that OpenAFS works flawlessly, minus the performance issues;
* the host has a single IP, and the VMs get "private" IP addresses that are NAT-ed; in this case I remember I had issues with properly configuring OpenAFS to handle such a scenario;

Could someone comment on their success / failure with the second, NAT-ed, scenario? I remember I gave up and just moved OpenAFS onto the host. (I remember that 2-3 years ago there was some sketchy documentation on this, but I haven't checked lately...)

Thanks,
Ciprian.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Where to find the 1.6.1a source code Git repository?
On Tue, Dec 18, 2012 at 10:17 PM, Derrick Brashear wrote:
> On Tue, Dec 18, 2012 at 3:13 PM, Ciprian Dorin Craciun wrote:
>> (Why this confusion? Because currently OpenAFS 1.6.1 fails to
>> build on the latest 3.6 Linux kernel, and I was hoping that there is a
>> 1.6.1b version in the works in Git which solves the issues...)
>
> There's a 1.6.2pre1 in git, which as it happens fixes that. :)

Indeed it fixes it... :)

Thanks,
Ciprian.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Where to find the 1.6.1a source code Git repository?
On Tue, Dec 18, 2012 at 10:07 PM, Derrick Brashear wrote:
> 1.6.1a is macos-only. if you're not building a macos client, you don't
> care. if you are building a macos client, apply the 1.6.1a patch in
> the macos release directory to the 1.6.1 source.

Thanks for the quick reply, it clarifies things now. But then shouldn't this information (that the 1.6.1a version is OSX-only) be clearly written somewhere on the download site? (Indeed, if I look where that version appears, I can find it only under the Mac section.)

(Why this confusion? Because currently OpenAFS 1.6.1 fails to build on the latest 3.6 Linux kernel, and I was hoping that there is a 1.6.1b version in the works in Git which solves the issues...)

Thanks again,
Ciprian.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info
[OpenAFS] Where to find the 1.6.1a source code Git repository?
Hello all! I've seen that on the download site there is an OpenAFS 1.6.1a version, but in the git repository I don't seem to find any tag or branch relating to such a version... (Indeed on the download site there is a patch that applies cleanly over the branch `openafs-stable-1_6_1-branch`.) Thus my question is in which repository (or under which reference) can I find the code for the 1.6.1a release? Thanks, Ciprian. ___ OpenAFS-info mailing list OpenAFS-info@openafs.org https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] Re: OpenAFS 1.6.0pre4 kernel panic
On Sat, Apr 16, 2011 at 14:16, Simon Wilkinson wrote:
> Could you run gdb against your kernel module (either openafs.ko or
> libafs.ko), and run
> list *afs_GetDownD.clone.5+0x1d0
>
> This will let us know exactly where in your kernel module the fault is
> occurring.
>
> Cheers,
>
> Simon

Unfortunately it complains that it can't find symbols...

    gdb ./src/libafs/MODLOAD-2.6.38.3-erebus+-MP/libafs.ko
    ...
    (gdb) list *afs_GetDownD.clone.5+0x1d0
    No symbol table is loaded.  Use the "file" command.

I've tried reconfiguring and rebuilding as below, but with the same result:

    ./configure --prefix=/packages/openafs/1.6.0-pre4--1 \
        --with-afs-sysname=i386_linux26 --enable-kernel-module \
        --disable-transarc-paths --disable-linux-syscall-probing \
        --with-linux-kernel-headers=/tmp/linux--2.6.38.3-erebus+--modules \
        --with-linux-kernel-build=/tmp/linux--2.6.38.3-erebus+--modules \
        --enable-debug --enable-debug-kernel

Have I missed some configuration options, or should I change the kernel config?

Thanks,
Ciprian.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info
[OpenAFS] Re: OpenAFS 1.6.0pre4 kernel panic
On Sat, Apr 16, 2011 at 12:58, Ciprian Dorin Craciun wrote:
> (I've resent this email as the attached image and config file are
> too large and were rejected by the mailing list.)
>
> Hello all!
>
> I've successfully run OpenAFS v1.4.12 with kernels up to 2.6.34.x.
> But unfortunately lately I'm unable to make either 1.4.14.1 or
> 1.6.0pre4 work with either 2.6.37.x or 2.6.38.x kernels.
>
> Attached I put my kernel config file and a picture of my laptop's
> screen when it "panic"-ed. (How could I easily capture the panic error
> after I reboot?) Also attached I put my (custom, but Debian-based)
> `init.d` script and OpenAFS-related config files. (I'm using
> ArchLinux.)
>
> Panic picture:
> http://data.volution.ro/ciprian/8f75abb6c3c12be3375206fa1cdea065/dscf0902-small.jpg
> Kernel config:
> http://data.volution.ro/ciprian/8f75abb6c3c12be3375206fa1cdea065/config
>
> Any pointers?
> Thank you,
> Ciprian.

Ok. I've backtracked the problem and identified that the cause is:

    afsd -dynroot -afsdb -memcache -dcache 8192 -chunksize 14 -stat 32768 -fakestat-all -daemons 6 -volumes 256 -nosettime

Because if I remove all the "fine-tuning", it works as:

    afsd -dynroot -afsdb -memcache -fakestat-all -nosettime

But still, should the cache manager crash the system if badly configured?

Ciprian.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info
[OpenAFS] OpenAFS 1.6.0pre4 kernel panic
(I've resent this email as the attached image and config file are too large and were rejected by the mailing list.)

Hello all!

I've successfully run OpenAFS v1.4.12 with kernels up to 2.6.34.x. But unfortunately lately I'm unable to make either 1.4.14.1 or 1.6.0pre4 work with either 2.6.37.x or 2.6.38.x kernels.

Attached I put my kernel config file and a picture of my laptop's screen when it "panic"-ed. (How could I easily capture the panic error after I reboot?) Also attached I put my (custom, but Debian-based) `init.d` script and OpenAFS-related config files. (I'm using ArchLinux.)

Panic picture:
http://data.volution.ro/ciprian/8f75abb6c3c12be3375206fa1cdea065/dscf0902-small.jpg
Kernel config:
http://data.volution.ro/ciprian/8f75abb6c3c12be3375206fa1cdea065/config

Any pointers?
Thank you,
Ciprian.

openafs
Description: Binary data

cacheinfo
Description: Binary data