Control: severity -1 serious

On Fri, Jan 24, 2020 at 10:28:36PM +0100, Stefan Weil wrote:
> On 24.01.20 at 21:53, peter green wrote:
> 
> > I still don't think -march=native is appropriate for a binary
> > distribution though. If you want to offer different versions of the
> > code built with different CPU requirements, that is fine, but please
> > don't let them depend on what CPU happens to be in the autobuilder.
> 
> Better ideas are welcome.
> 
> Tesseract is used for mass processing of books which can take many weeks
> or even months. Therefore it is very important that the time critical
> code (dot product calculation) runs as fast as possible.
> 
> For x86_64 we know the available SIMD instructions (SSE, AVX, ...) which
> can be used, add code for all variants and check at runtime what is
> supported by the CPU.
> 
> For all other architectures (including ARM) there is currently no such
> special code, and the default code is rather slow. By using
> -march=native for the alternate code, hopefully the compiler will
> produce code which runs faster on any machine which is similar to the
> build machine.

And which will crash on any machine that is not compatible.

If Debian gets an AMD Threadripper buildd, then -march=native code built
on that buildd may use AMD-only instructions and is not guaranteed to
run on any CPU from Intel.

The same is true for other architectures like ARM - if a buildd happens 
to be the latest server hardware and uses some uncommon CPU extension, 
it is not compatible with many machines.

> Users who build Tesseract on the machine which is also
> used for the mass production will get the best result like that. Users
> using a distribution can try the "native" option and either crash the
> program or get a possibly faster result.
> 
> I see the problem of builds which depend on an autobuilder which may be
> different for each build.

"each build" might be a security update in stable on a much newer buildd,
and the tested setup running in production might then just crash.

As a user I can handle a quirk or two when setting up a machine,
but anything that breaks out of the blue is a huge pain.

If Tesseract is the main use of the machine and every percentage point
of performance matters, I would likely build it myself instead of using
the distribution package.

> What would be the best solution for
> distributions? Suppress the code using a new configure option or some
> magic which detects that the build is for a Debian distribution? Choose
> compiler flags manually for the "native" option (that is already
> possible, see my previous answer)? Other solutions?

The important point is that manual installations and distributions have
different needs.

When a user is manually compiling and installing software on a machine,
-march=native can be used for everything, not just the most
time-critical part: it is faster with no real downside.

For a distribution you want several variants like what you already
have for x86_64, and no -march=native ever.

If 32-bit ARM still matters for your users, they might want to use NEON,
but many Debian buildds have CPUs that do not support NEON.

For 64-bit ARM they might want armv8.2-a+dotprod,
and I do not think any current Debian buildd supports that.
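
To illustrate that these ARM extensions can be checked on the user's
machine instead of being inherited from the buildd, here is a minimal
sketch for Linux. It assumes glibc's getauxval() and the kernel's
HWCAP_* bits from <asm/hwcap.h>; the helper names cpu_has_neon() and
cpu_has_dotprod() are made up for this example and are not Tesseract
functions. On other platforms both helpers simply report false.

```cpp
#if defined(__linux__) && (defined(__aarch64__) || defined(__arm__))
#include <sys/auxv.h>   // getauxval(), AT_HWCAP
#include <asm/hwcap.h>  // HWCAP_NEON / HWCAP_ASIMD / HWCAP_ASIMDDP
#endif

// Hypothetical helper: true if the running CPU offers NEON / Advanced SIMD.
bool cpu_has_neon() {
#if defined(__linux__) && defined(__aarch64__)
  return (getauxval(AT_HWCAP) & HWCAP_ASIMD) != 0;
#elif defined(__linux__) && defined(__arm__)
  return (getauxval(AT_HWCAP) & HWCAP_NEON) != 0;
#else
  return false;  // not Linux on ARM: nothing to detect in this sketch
#endif
}

// Hypothetical helper: true if the running CPU offers the 64-bit ARM
// dot product extension (what -march=armv8.2-a+dotprod would require).
bool cpu_has_dotprod() {
#if defined(__linux__) && defined(__aarch64__) && defined(HWCAP_ASIMDDP)
  return (getauxval(AT_HWCAP) & HWCAP_ASIMDDP) != 0;
#else
  return false;  // older kernel headers or a different architecture
#endif
}
```

The result depends only on the machine the program runs on, never on
the machine that compiled it, which is the property a distribution
package needs.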

You should know best which extensions actually make a difference,
and when the fastest option is autodetected at runtime it is most
likely to benefit the user of a distribution package.
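
As a rough sketch of what "autodetected at runtime" could look like,
assuming GCC or Clang: one variant is compiled for AVX2 through the
target attribute, without any global -march flag, and
__builtin_cpu_supports() picks it only when the running CPU has AVX2.
The names DotFn, dot_generic, dot_avx2, select_dot and dot are
hypothetical and this is not Tesseract's actual code; a real variant
would more likely use intrinsics than rely on the compiler.

```cpp
#include <cstddef>

// Signature shared by all dot product variants (hypothetical name).
using DotFn = double (*)(const double*, const double*, std::size_t);

// Baseline variant: runs on every CPU the architecture baseline allows.
static double dot_generic(const double* a, const double* b, std::size_t n) {
  double sum = 0.0;
  for (std::size_t i = 0; i < n; ++i) sum += a[i] * b[i];
  return sum;
}

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
// Same loop, but the compiler may use AVX2 inside this function only.
static __attribute__((target("avx2")))
double dot_avx2(const double* a, const double* b, std::size_t n) {
  double sum = 0.0;
  for (std::size_t i = 0; i < n; ++i) sum += a[i] * b[i];
  return sum;
}
#endif

// Choose the best variant for the CPU we are running on right now.
DotFn select_dot() {
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
  __builtin_cpu_init();
  if (__builtin_cpu_supports("avx2")) return dot_avx2;
#endif
  return dot_generic;  // safe fallback, also used on non-x86 builds
}

// Public entry point: the selection happens once, on first use.
double dot(const double* a, const double* b, std::size_t n) {
  static const DotFn fn = select_dot();
  return fn(a, b, n);
}
```

A package built like this behaves the same no matter which buildd
compiled it: an old CPU gets dot_generic, a newer one gets dot_avx2.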

> Stefan

cu
Adrian
