Note: I have access to only one FreeBSD amd64 context, and it is also my only NUMA context: a Threadripper 1950X with 2 memory domains. Also, I have only a head FreeBSD context on any architecture, not 12.x or before, so I have limited compare/contrast material.
I present the material below basically to ask whether the NUMA handling has been validated, or whether it is going to be, at least for contexts like the Threadripper 1950X and analogous ones. My results suggest it has not been (or that libc++'s now() times get messed up in a way that merely looks like NUMA mishandling), since this is based on odd benchmark results involving the mean time for laps, using the median of such across multiple trials.

I ran a benchmark on both Fedora 30 and FreeBSD 13 on this 1950X and got expected results on Fedora but odd ones on FreeBSD. The benchmark is a variation on the old HINT benchmark, including the old multi-threading variation. I tried Fedora only later, because the FreeBSD results looked odd. The other architectures I tried FreeBSD benchmarking on did not look odd like this: powerpc64 on an old 2-socket PowerMac with 2 cores per socket, aarch64 on a Cortex-A57 Overdrive 1000 and a Cortex-A53 Pine64+ 2GB, and armv7 on a Cortex-A7 Orange Pi+ 2nd Ed. For these I used 4 threads, not more. I tend to write in terms of plots made from the data instead of the raw benchmark data.

FreeBSD testing was based on:

cpuset -l0-15 -n prefer:1
cpuset -l16-31 -n prefer:1

Fedora 30 testing was based on:

numactl --preferred 1 --cpunodebind 0
numactl --preferred 1 --cpunodebind 1

While I have more results, I reference primarily DSIZE and ISIZE both being unsigned long long and, separately, both being unsigned long as examples. Variations in the results are not from the type differences for any LP64 architecture (but they give an idea of benchmark variability in the test context).

The Fedora results solidly show the bandwidth limitation of using one memory controller. They also show the latency consequences for the remote memory domain case vs. the local memory domain case. There is not a lot of variability between the examples of the 2 type-pairs used for Fedora.

Not true for FreeBSD on the 1950X:

A) The latency-constrained part of the graph looks to normally be using the local memory domain when -l0-15 is in use for 8 threads, despite prefer:1 having requested the remote domain's memory.

B) Both the -l0-15 and the -l16-31 parts of the graph for 8 threads that should be bandwidth limited mostly show examples that, as far as I can tell, would have to involve both memory controllers for the bandwidth to reach the results shown. There is also wide variability, ranging between the expected 1-controller result and, say, what a 2-controller round-robin would be expected to produce.

C) Even the single-threaded result shows a higher result for larger total bytes for the kernel vectors; Fedora's does not.

I think that (B) is the most solid evidence for something being odd.

For reference, on FreeBSD:

# cpuset -g -d 1
domain 1 mask: 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31

-r352341 allows -n prefer:0, but I happen to have used -n prefer:1 in these experiments.

The benchmark was built via devel/g++9 but linked with system libraries, including libc++. Unfortunately, I'm not yet ready to distribute the benchmark's source, but I expect to at some point. I do not expect to ever distribute binaries. The source code for normal builds involves just standard C++17 code, and such builds are what is involved here. [The powerpc64 context is a system-clang 8, ELFv1 based system context, not the usual gcc 4.2.1 based one.]

More notes:

In the 'kernel vectors: total Bytes' vs. 'Quality Improvement Per Second' graphs, the left-hand side of the curve is latency limited; the right-hand side is bandwidth limited for LP64. (The total-Bytes axis uses log base 2 scaling in the graphs.)

Thread creation has latency, so the 8-thread curves are mostly of interest for kernel-vector totals of 1 MiByte or more (say), so that thread creations are not that much of the total contribution to the measured time. The thread creations are via std::async use.
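To make that structure concrete without distributing the actual source, here is a minimal, hypothetical C++17 sketch of the general shape described above: workers launched via std::async on each lap, laps timed with std::chrono::steady_clock, the mean time per lap computed for a trial, and the median of that taken across trials. Everything here (names, sizes, lap/trial counts, the stand-in per-thread work) is invented for illustration; it is not the benchmark's code.

#include <algorithm>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <functional>
#include <future>
#include <numeric>
#include <vector>

// Stand-in per-thread work over one "kernel vector" (not the HINT kernel).
static std::uint64_t worker_lap(std::vector<std::uint64_t> const& v) {
    return std::accumulate(v.begin(), v.end(), std::uint64_t{0});
}

// One trial: time 'laps' rounds of nthreads std::async launches and
// return the mean time per lap in seconds. Note that the thread-creation
// cost of each lap's std::async launches is inside the timed region,
// which is why small kernel-vector totals are dominated by that latency.
static double run_trial(unsigned nthreads, std::size_t elems, unsigned laps) {
    std::vector<std::vector<std::uint64_t>> vecs(
        nthreads, std::vector<std::uint64_t>(elems, 1));
    auto const t0 = std::chrono::steady_clock::now();
    for (unsigned lap = 0; lap < laps; ++lap) {
        std::vector<std::future<std::uint64_t>> futs;
        for (unsigned t = 0; t < nthreads; ++t)
            futs.push_back(std::async(std::launch::async, worker_lap,
                                      std::cref(vecs[t])));
        for (auto& f : futs)
            (void)f.get();   // wait for every worker's lap to finish
    }
    auto const t1 = std::chrono::steady_clock::now();
    std::chrono::duration<double> const total = t1 - t0;
    return total.count() / laps;
}

int main() {
    unsigned const nthreads = 8;
    std::size_t const elems = (1u << 20) / sizeof(std::uint64_t); // 1 MiByte per vector
    unsigned const laps = 16, trials = 5;
    std::vector<double> mean_per_lap;
    for (unsigned i = 0; i < trials; ++i)
        mean_per_lap.push_back(run_trial(nthreads, elems, laps));
    std::sort(mean_per_lap.begin(), mean_per_lap.end());
    std::printf("median of mean-time-per-lap: %g s\n",
                mean_per_lap[mean_per_lap.size() / 2]);
    return 0;
}

Something like g++9 -std=c++17 -O2 -pthread builds it; nothing FreeBSD-specific is involved, which is the point about the benchmark being plain standard C++17.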
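Separately, for anyone wanting to double-check what memory-domain policy and mask a process actually inherits from the cpuset -n prefer:1 invocation, something along the following lines should report it per cpuset_getdomain(2). I have not verified this exact snippet against head, so treat the header names, the DOMAINSET_* macros, and the policy constant as my reading of the man pages rather than as checked code.

// Hypothetical: report the memory-domain policy and mask this process
// inherited from cpuset(1). Based on my reading of cpuset_getdomain(2);
// the DOMAINSET_* names are assumptions, not verified against head.
#include <sys/param.h>
#include <sys/cpuset.h>
#include <sys/domainset.h>

#include <cstdio>

int main() {
    domainset_t mask;
    int policy = -1;
    // -1 with CPU_WHICH_PID means the current process.
    if (cpuset_getdomain(CPU_LEVEL_WHICH, CPU_WHICH_PID, -1,
                         sizeof(mask), &mask, &policy) != 0) {
        std::perror("cpuset_getdomain");
        return 1;
    }
    // DOMAINSET_POLICY_PREFER is what -n prefer:1 should end up requesting.
    std::printf("policy: %d (DOMAINSET_POLICY_PREFER is %d)\n",
                policy, DOMAINSET_POLICY_PREFER);
    for (int d = 0; d < DOMAINSET_SETSIZE; ++d)
        if (DOMAINSET_ISSET(d, &mask))
            std::printf("domain %d is in the mask\n", d);
    return 0;
}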
===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)