> Now, I have the feeling that the long term solution would be for eigen to do a minimum of JIT.
I remember reading that the authors of one of the exact-arithmetic
libraries (GMP or MPFR) ran an automatic optimizer/benchmark tool that
reshuffled instructions to produce the best results on a given
micro-architecture. I suppose a hot-spot JIT could get somewhat close, but
hand-tuned or automatically tuned code will likely be difficult to fully
replace with a JIT, at least in the near future.

On Fri, Sep 18, 2020 at 8:08 AM William Tambellini <[email protected]> wrote:

> A solution:
>
>    - do all the math/algos outside the main, in dynamic libs (.so,
>      .dll, ...)
>    - build multiple dynamic libs for the ISAs you care about (sse.so,
>      avx1.so, avx2.so, avx512.so, ...)
>    - dynamically load the right lib from the main according to the
>      features of the CPU it is currently running on
>      (https://github.com/google/cpu_features)
>    - call your API in the lib from the main to let the backend run
>      the algo with the best optimization
>
> Now, I have the feeling that the long term solution would be for eigen to
> do a minimum of JIT. Example: oneDNN with asmjit:
> https://github.com/asmjit/asmjit
> Kind
> W.
>
> ------------------------------
> *From:* Edward Lam <[email protected]>
> *Sent:* Thursday, September 17, 2020 9:24 PM
> *To:* [email protected] <[email protected]>
> *Subject:* Re: [eigen] Vectorization for general use
>
> Offhand, I wonder if you could put main() in its own source file and
> compile it without any vectorization compiler options, and have that call
> your real main(), renamed and placed in a different source file that does
> have vectorization compiler options enabled. Then your new main() could do
> CPUID checks (e.g. https://stackoverflow.com/a/4823889 ) and bail out
> gracefully. You will of course need to ensure that the CPUID checks are
> accurate for your compiler options, which may present its own challenges.
>
> Cheers,
> -Edward
>
> On Thu, Sep 17, 2020 at 10:52 PM Rob McDonald <[email protected]> wrote:
>
> I maintain an open source program that uses Eigen. The vast majority of
> my users do not compile the program, instead downloading a pre-compiled
> binary from our website. About 80% are on Windows, 10% on Mac and 10% on
> Linux. I only provide x86 builds: 32- and 64-bit on Windows, 64-bit only
> on Mac and Linux. We may eliminate the 32-bit Windows build soon.
>
> Historically, I have compiled with no special flags enabling vectorization
> options for the CPU. I would like to pursue this as I expect it will
> unlock some nice performance gains. However, I'd like to keep things
> simple and compatible for users.
>
> What happens when someone runs a program compiled with vectorization when
> their CPU does not support it? If it fails, how graceful is the failure?
>
> Is there a standard approach to identify the capabilities of a given
> machine? I could add that to my program and survey users before making a
> change... Would such code still run on a machine that was in the process
> of failing due to not having support for the built-in vectorization? I.e.
> if it is crashing, can we send a message as to why we're going down?
>
> Is there a graceful way to support multiple options?
>
> Any tips from other broad-use applications are greatly appreciated.
>
> Rob
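
To make the two suggestions in this thread concrete (CPUID checks in a main() compiled without vector flags, and one backend library per ISA), here is a minimal sketch of the selection step. It is only an illustration under stated assumptions: the names CpuFeatures, detect_cpu_features and pick_backend are hypothetical, the backend names simply mirror the sse/avx1/avx2/avx512 libraries mentioned above, and detection relies on the GCC/Clang x86 builtin __builtin_cpu_supports rather than on the cpu_features library. On other compilers or architectures the sketch reports no features and therefore falls back to a generic build.

```cpp
#include <string>

// Hypothetical feature flags; the names are illustrative and do not come
// from Eigen or from the cpu_features library.
struct CpuFeatures {
    bool sse2 = false;
    bool avx = false;
    bool avx2 = false;
    bool avx512f = false;
};

// Query the CPU the process is running on.
// Assumption: GCC or Clang targeting x86. Elsewhere no feature is
// reported, which makes pick_backend() choose the generic build.
inline CpuFeatures detect_cpu_features() {
    CpuFeatures f;
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    f.sse2 = __builtin_cpu_supports("sse2") != 0;
    f.avx = __builtin_cpu_supports("avx") != 0;
    f.avx2 = __builtin_cpu_supports("avx2") != 0;
    f.avx512f = __builtin_cpu_supports("avx512f") != 0;
#endif
    return f;
}

// Map the detected features to the name of the most capable backend.
// The returned stem would be combined with a platform suffix
// (.so, .dll, .dylib) by the caller.
inline std::string pick_backend(const CpuFeatures& f) {
    if (f.avx512f) return "avx512";
    if (f.avx2) return "avx2";
    if (f.avx) return "avx1";
    if (f.sse2) return "sse";
    return "generic";
}
```

The file containing this code would itself have to be built without any vectorization flags, as Edward points out, so that it can run on every CPU. The launcher can then either load the library named by pick_backend(detect_cpu_features()) with the platform's dynamic loader, or print a clear message and exit when the CPU does not meet the minimum the binary was built for.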
