Re: [R-pkg-devel] using portable simd instructions
On 27/03/2024 at 14:54, jesse koops wrote:
> I tried that but I found the interface awkward and there was really no
> performance bonus. It was in the early phase of experimentation and I
> didn't save it, so it could very well be that I got the compiler
> settings wrong and SIMD was not used. But if that was the case, there
> would still be the problem of using the correct compiler settings
> across platforms.

When I compile the example from the cited page with only the "authorized" flag "-std":

    g++ -std=c++20 stdx_simd.cpp -o tmp.exe

and then disassemble the result:

    objdump -d tmp.exe > tmp.asm

I do find SIMD instructions in the assembler code, e.g.:

    grep paddd tmp.asm
    14a8: 66 0f fe c1    paddd %xmm1,%xmm0
    8f7b: 66 0f fe c1    paddd %xmm1,%xmm0

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel
Re: [R-pkg-devel] using portable simd instructions
I like assembler, and I do use SIMD intrinsics in some of my code (not R), but sparingly.

The issue is more than portability between platforms; it is also portability between processors. If you write your optimized code using AVX, it might not take advantage of newer AVX512 CPUs.

In many cases your compiler will do the right thing and optimize your code. I suggest:

* Write your code in plain C, test it with some long computation and use "perf top" on Linux to observe the code hotspots and which assembler instructions are being used.

* If you see instructions like "addps", these are vectorized. If you see instructions like "addss", these are *not* vectorized.

* If you see a few instructions as hotspots with arguments in parentheses, such as "vmovaps %xmm1,(%r8)", then you are likely limited by memory access.

* If you are not limited by memory access and the compiler produces a lot of "addss" or similar instructions that are hotspots, then you need to look at your code and make it more parallelizable.

* How to make your C code more parallelizable: you want to write loops that are easy to interpret, like

      for(i=start;i

  You can help the compiler by using the "restrict" keyword to indicate that arrays do not overlap, or (as a sledgehammer) "#pragma ivdep". But before using keywords, check with "perf top" which code is actually a hotspot, as the compiler can generate good code without restrict keywords by using multiple code paths.

* You can create small temporary arrays to make your algorithm look more like the loops above. The small arrays should be at least 16 wide, because AVX512 has instructions that operate on 16 floats at a time.

* To allow the use of small arrays you can unroll your loops. Note that compilers do unrolling themselves, so doing it manually is only helpful if it makes the inner body of the loop more parallelizable.

* You can debug why the compiler does not parallelize your code by turning on diagnostics. For gcc the flag is "-fopt-info-vec-missed=vec_info.txt".

* In very rare cases you use intrinsics. For me this is typically a situation where I need to find the value and the index of a maximum or minimum in an array. Compilers do not optimize this well, at least for the many different ways of coding this in C that I tried many years ago.

* If after all your work you got a factor of 2 speedup, you are doing fine. If you want a larger speedup, change your algorithm.

best

Vladimir Dergachev
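The two suggestions above about "restrict" and small temporary arrays can be sketched in C as follows. This is only an illustration under assumptions: the function names add_arrays and sum_blocked and the constant LANES are made up for this example, and whether a given compiler actually vectorizes either loop still has to be checked with "perf top" or the diagnostics flag mentioned above.

```c
#include <stddef.h>

/* c[i] = a[i] + b[i]. "restrict" promises the compiler that the three
   arrays do not overlap, so it may load and store several elements at
   once without a run-time overlap check. */
void add_arrays(size_t n, const float *restrict a, const float *restrict b,
                float *restrict c)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

#define LANES 16 /* hypothetical width: 16 floats fill one AVX512 register */

/* Sum with LANES independent partial sums kept in a small temporary
   array. The inner loop has no dependency between its iterations,
   unlike a single running total. Note that the additions happen in a
   different order than in a plain loop, so rounding can differ. */
float sum_blocked(size_t n, const float *x)
{
    float acc[LANES] = {0.0f};
    size_t i = 0;

    for (; i + LANES <= n; i += LANES)
        for (size_t j = 0; j < LANES; j++)
            acc[j] += x[i + j];

    float total = 0.0f;
    for (size_t j = 0; j < LANES; j++)
        total += acc[j];
    for (; i < n; i++) /* leftover elements that did not fill a block */
        total += x[i];
    return total;
}
```

Compiling such a file with gcc at -O3 together with "-fopt-info-vec-missed=vec_info.txt" should list any loops the vectorizer gave up on, which is a cheaper first check than reading the disassembly.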
Re: [R-pkg-devel] using portable simd instructions
I tried that but I found the interface awkward and there was really no performance bonus. It was in the early phase of experimentation and I didn't save it, so it could very well be that I got the compiler settings wrong and SIMD was not used. But if that was the case, there would still be the problem of using the correct compiler settings across platforms.

On Wed 27 Mar 2024 at 14:44, Serguei Sokol wrote:
> Talking about libraries, maybe
> https://en.cppreference.com/w/cpp/experimental/simd will do the job?
>
> Best,
> Serguei.
Re: [R-pkg-devel] using portable simd instructions
On 26/03/2024 at 15:51, Tomas Kalibera wrote:
> I think the best way for portability is to use a higher-level library
> that already has done the low-level business of maintaining multiple
> versions of the code (with multiple instruction sets) and choosing one
> appropriate for the current CPU. It could be, say, LAPACK, BLAS, or
> OpenMP, depending on the problem at hand.

Talking about libraries, maybe https://en.cppreference.com/w/cpp/experimental/simd will do the job?

Best,
Serguei.
Re: [R-pkg-devel] using portable simd instructions
Thanks, the source of digest seems especially helpful. Ironically, I actually used only 10 or so lines of SIMD code in the package, in a single function that is a bottleneck. It gives a very noticeable performance boost, even with unaligned vectors, and something about directly using processor registers seems elegant to me. The article by Langdale and Lemire that you pointed to on your blog was also inspirational.

Unfortunately, getting those 10 lines to work reliably on other machines looks like a long-term project, and the advice from Tomas that GCC on Windows might decide to faultily optimize other code is a bit scary. I'll take the advice to heart and will not try to include SIMD code in any of my CRAN packages in the near future. I learned C and Cpp via R, like many people probably, and makefiles and portability have never come up until now, so there's quite some learning ahead. At least I now have some good places to start.

Thank you all for the help!
Re: [R-pkg-devel] using portable simd instructions
On 27 March 2024 at 08:48, jesse koops wrote:
| Thank you, I was not aware of the easy way to search CRAN. I looked at
| rcppsimdjson of course, but couldn't figure it out since it is done in
| the simdjson library if I interpret it correctly, not within the R
| ecosystem and I didn't know how that would change things. Writing R
| extensions assumes a lot of prior knowledge so I will have to work my
| way up to there first.

I think I have (at least) one other package doing something like this _in the library layer too_ as suggested by Tomas, namely crc32c as used by digest. You could study how crc32c [0] does this for x86_64 and arm64 to get hardware optimization. (This may be more specific CPU hardware optimization, but at least the library and cmake files are small.)

I decided as a teenager that assembler wasn't for me and haven't looked back, but I happily take advantage of it when bundled well. So a strong second for the recommendation by Tomas to rely on this being done in an external and tested library.

(Another interesting one there is highway [1]. Just packaging that would likely be an excellent contribution.)

Dirk

[0] repo: https://github.com/google/crc32c
[1] repo: https://github.com/google/highway
    docs: https://google.github.io/highway/en/master/

--
dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
Re: [R-pkg-devel] using portable simd instructions
Thank you, I was not aware of the easy way to search CRAN. I looked at rcppsimdjson of course, but couldn't figure it out, since the SIMD work is done in the simdjson library (if I interpret it correctly), not within the R ecosystem, and I didn't know how that would change things. 'Writing R Extensions' assumes a lot of prior knowledge, so I will have to work my way up to there first.
Re: [R-pkg-devel] using portable simd instructions
Thank you very much, that looks promising. Though when I look at your configure.ac script, it also looks extremely daunting and far above my current level of understanding. I guess I'll start with the autoconf manual then.
Re: [R-pkg-devel] using portable simd instructions
Hi Jesse,

What I've done is to use a mix of compile-time detection of compiler SIMD support and run-time detection of SIMD hardware support. At package load, SIMD-specific versions of functions are installed in a symbol table. It's not perfect and it can be hard to support evolving platforms, especially now that ARM is more prevalent. However, it does allow for distribution on CRAN as it uses only autoconf, POSIX make, and no specific compiler.

At compile time:

1. Use a configure script to detect the platform and any SIMD instructions supported by the compiler. This is also the time to identify the compiler flags necessary to enable instruction sets. Unlike what the existing autoconf macros do, you can ignore whether or not the host system supports the instruction sets (with the exception of compiling with Solaris Studio: it won't let you load a binary with instructions not supported by the host, even if they cannot be executed).

2. Use makefiles to conditionally compile different versions of the functions you want, one for each level of instruction set supported by the compiler, using the flags detected above. They all should be in different files with different symbols. For example: partition_sse2.c defines partition_sse2(), partition_avx.c defines partition_avx(), etc., while partition.c defines partition_c(), a fall-back compiled without any SIMD instructions. Note that echoing compilations with SIMD flags will trigger a check warning, as those units are not inherently portable. That is addressed below.

At run time:

1. On package load, detect what instruction sets are supported by the host. On x86 machines, this usually involves a call to cpuid.

2. For the maximum level of instruction set supported by the host, install the relevant symbol for each function into a symbol table. Using the example above, a header defines an external function pointer partition(), which gets set to one of the SIMD-specific implementations.

In setting that up, I found Agner Fog's notes on CPU dispatching to be extremely helpful. They can be found here: https://www.agner.org/optimize. I use this strategy in the dbarts package, the code for which is here: https://github.com/vdorie/dbarts.

Best,
Vince
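The run-time half of the scheme described above can be sketched in a single C file. This is not the dbarts code, only a minimal illustration under assumptions: the names dot, dot_c, dot_avx and dot_install are invented, the GCC/Clang target attribute stands in for the per-file compiler flags that configure would select, and the GCC/Clang builtin __builtin_cpu_supports replaces a hand-written cpuid call.

```c
#include <stddef.h>

/* Fall-back: in the layout described above this would live in its own
   file and be compiled without any SIMD flags. */
static double dot_c(const double *x, const double *y, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i] * y[i];
    return s;
}

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_DISPATCH 1
/* AVX variant: same source, but the compiler may emit AVX instructions.
   In the real scheme this is a separate file built with the AVX flag
   found by configure; the attribute is an assumption made so that the
   sketch fits in one file. */
__attribute__((target("avx")))
static double dot_avx(const double *x, const double *y, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i] * y[i];
    return s;
}
#endif

/* The symbol the rest of the package calls; starts as the fall-back. */
double (*dot)(const double *, const double *, size_t) = dot_c;

/* Meant to be called once at package load, e.g. from R_init_<pkg>(). */
void dot_install(void)
{
#ifdef HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx")) {
        dot = dot_avx; /* host CPU can execute the AVX variant */
        return;
    }
#endif
    dot = dot_c;
}
```

Callers only ever use dot(x, y, n), so adding another variant later (for example for ARM) means touching dot_install() and nothing else.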
Re: [R-pkg-devel] using portable simd instructions
On 3/26/24 10:53, jesse koops wrote:
> But as far as I understand, the flags are necessary, at least in GCC.
> How can I make this portable and CRAN-acceptable?

I think the best way for portability is to use a higher-level library that already has done the low-level business of maintaining multiple versions of the code (with multiple instruction sets) and choosing one appropriate for the current CPU. It could be, say, LAPACK, BLAS, or OpenMP, depending on the problem at hand. In some cases, code can be rewritten so that the compiler can vectorize it better, using the level of vectorized instructions that have been enabled.

Unconditionally using GCC-specific or architecture-specific options in packages would certainly not be portable. Even on Windows, R is now also used with clang and on aarch64, so one should not assume a concrete compiler and architecture.

Please note also that GCC on Windows has a bug due to which AVX2 instructions cannot be used reliably: the compiler doesn't always properly align local variables on the stack when emitting these. See [1,2] for more information.

Best
Tomas

[1] https://stat.ethz.ch/pipermail/r-sig-windows/2024q1/000113.html
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54412
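One way to read the OpenMP suggestion for a small hot loop is the "omp simd" directive, which leaves the choice of instruction set to the compiler and the flags R already supplies. The sketch below is an assumption about how that could look, not something proposed in this thread: the function name scale_add is invented, and the directive only takes effect when the file is compiled with OpenMP enabled (in an R package presumably via $(SHLIB_OPENMP_CFLAGS) for C sources).

```c
#include <stddef.h>

/* y[i] = a * x[i] + y[i], written as a plain loop. The pragma asks an
   OpenMP-aware compiler to vectorize the loop with whatever instruction
   set is enabled; a compiler run without OpenMP ignores the pragma and
   still produces correct scalar code. The caller must pass arrays that
   do not overlap, because the directive tells the compiler that the
   iterations are independent. */
void scale_add(size_t n, double a, const double *x, double *y)
{
#pragma omp simd
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

No architecture-specific flag appears anywhere, so the "non-portable flag(s)" NOTE should not be triggered by this file; how much speedup results depends entirely on the instruction level the platform's default flags allow.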
Re: [R-pkg-devel] using portable simd instructions
On 26 March 2024 at 10:53, jesse koops wrote:
| How can I make this portable and CRAN-acceptable?

By writing (or borrowing?) some hardware detection via either configure / autoconf or cmake. This is no different from other tasks decided at install-time.

Start with 'Writing R Extensions', as always, and work your way up from there. And if memory serves there are already a few other packages with SIMD at CRAN, so you can also try to take advantage of the search for a 'token' (here: 'SIMD') at the (unofficial) CRAN mirror at GitHub:

    https://github.com/search?q=org%3Acran%20SIMD=code

Hth, Dirk

--
dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
[R-pkg-devel] using portable simd instructions
Hello R-package-devel,

I recently got inspired by the rcppsimdjson package to try out SIMD registers. It works fantastically on my computer, but I struggle to find information on how to make it portable. It doesn't help in this case that R and Rcpp make including Cpp code so easy that I have never had to learn about cmake and compiler flags. I would appreciate any help, including of the type: "go read instructions at ...".

I use RcppArmadillo and Rcpp. I currently include the following header:

    #include

The functions in immintrin that I use are:

    _mm256_loadu_pd
    _mm256_set1_pd
    _mm256_mul_pd
    _mm256_fmadd_pd
    _mm256_storeu_pd

and I define up to four __m256d registers. From information found online (not sure where anymore) I constructed the following Makevars file:

    CXX_STD = CXX14

    PKG_CPPFLAGS = -I../inst/include -mfma -msse4.2 -mavx

    PKG_CXXFLAGS = $(SHLIB_OPENMP_CXXFLAGS)
    PKG_LIBS = $(SHLIB_OPENMP_CXXFLAGS) $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS)

(I also use OpenMP, which has always worked fine; I just included all lines for completeness.) Rcheck gives me two notes:

    ─ using R version 4.3.2 (2023-10-31 ucrt)
    ─ using platform: x86_64-w64-mingw32 (64-bit)
    ─ R was compiled by
        gcc.exe (GCC) 12.3.0
        GNU Fortran (GCC) 12.3.0

    ❯ checking compilation flags used ... NOTE
      Compilation used the following non-portable flag(s):
        '-mavx' '-mfma' '-msse4.2'

    ❯ checking C++ specification ... NOTE
      Specified C++14: please drop specification unless essential

But as far as I understand, the flags are necessary, at least in GCC. How can I make this portable and CRAN-acceptable?

kind regards,
Jesse
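For readers who want something concrete to picture, a kernel built from the intrinsics listed above might look like the sketch below. The actual code is not shown in this message, so everything here is an assumption: the operation (y += a * x), the function name axpy, and the compile-time guard on the __AVX__ and __FMA__ macros that GCC and Clang define when -mavx and -mfma are in effect.

```c
#include <stddef.h>

#if defined(__AVX__) && defined(__FMA__)
#include <immintrin.h>
#endif

/* y[i] += a * x[i] for i in [0, n). Hypothetical kernel, for
   illustration only. */
void axpy(size_t n, double a, const double *x, double *y)
{
    size_t i = 0;
#if defined(__AVX__) && defined(__FMA__)
    /* This part is compiled only when the compiler was told that AVX
       and FMA are available. */
    const __m256d va = _mm256_set1_pd(a);     /* a in all four lanes */
    for (; i + 4 <= n; i += 4) {
        __m256d vx = _mm256_loadu_pd(x + i);  /* unaligned loads */
        __m256d vy = _mm256_loadu_pd(y + i);
        vy = _mm256_fmadd_pd(va, vx, vy);     /* a * x + y */
        _mm256_storeu_pd(y + i, vy);
    }
#endif
    /* Scalar tail, and the whole loop on builds without AVX/FMA. */
    for (; i < n; i++)
        y[i] += a * x[i];
}
```

With the guard in place the file builds on any compiler and architecture without the -m flags, but then only the scalar loop is used; the guard alone does not answer the question of how to get the fast path onto a user's machine.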