Bug#949638: tesseract: uses -march=native
On Sun, May 24, 2020 at 10:14:49PM +0200, Stefan Weil wrote:
> Adrian, I am afraid that there is a misunderstanding.
>
> The code part which is compiled with -march=native is never executed by
> default.

I get that point.

> There is a command line option which allows users to select the code
> which is used for certain time critical calculations (dot product). A
> wrong choice is not a security problem

You misunderstand the part about the security update, security updates are just the most common reason why a package gets updated (and therefore rebuilt) in a stable distribution.

Example:
Debian 11 will be released in summer 2021.
In autumn 2021 a user sets up a new system and selects "native" for an important production setup with an Intel CPU.
In spring 2022 a (security or other) update for Tesseract happens in Debian 11, built on a buildd with the latest AMD CPU.
The working production setup suddenly always crashes.

> That's quite common for other packages including the standard C
> library and scientific libraries, too. They all contain optimized
> functions which require certain hardware and which crash otherwise.

With proper runtime autodetection of the hardware, if you manage to get a crash it is a bug in these packages.

It is quite rare that packages offer manual selection in addition to autodetection.

> but simply will crash the
> application, no matter whether the user selected "native", "avx" or
> "neon".

Even when built on the same computer I would have doubts whether automatic vectorization[1] of the trivial C code really beats the hand-written AVX2 code, but when the code is not even built for the computer in question what's the point?

A "native" option meaning "some random buildd somewhere" is just confusing, it doesn't make sense for distributions.

> Regards
>
> Stefan

cu
Adrian

[1] if it happens at all, the Debian package build currently overwrites the -O3 with a subsequent -O2
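The runtime autodetection mentioned in this message can be sketched roughly as follows. This is a minimal illustration, not Tesseract's actual code: the names DotProductGeneric, DotProductAVX and SelectDotProduct are hypothetical, and the sketch assumes GCC or Clang (for __builtin_cpu_supports and the target attribute). On non-x86 builds it simply falls back to the generic loop.

```cpp
#include <cstddef>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_X86_DISPATCH 1
#endif

// Portable fallback: built with the distribution's baseline flags only.
static double DotProductGeneric(const double* u, const double* v, std::size_t n) {
  double total = 0.0;
  for (std::size_t i = 0; i < n; ++i) total += u[i] * v[i];
  return total;
}

#ifdef HAVE_X86_DISPATCH
// Only this one function is compiled for AVX (hypothetical name).
// The instruction set is named explicitly instead of using -march=native,
// so the result does not depend on the CPU of the build machine.
__attribute__((target("avx")))
static double DotProductAVX(const double* u, const double* v, std::size_t n) {
  __m256d acc = _mm256_setzero_pd();
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    acc = _mm256_add_pd(acc, _mm256_mul_pd(_mm256_loadu_pd(u + i),
                                           _mm256_loadu_pd(v + i)));
  }
  double lanes[4];
  _mm256_storeu_pd(lanes, acc);
  double total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
  for (; i < n; ++i) total += u[i] * v[i];  // remaining tail elements
  return total;
}
#endif

using DotProductFn = double (*)(const double*, const double*, std::size_t);

// Chooses an implementation on the machine that runs the binary.
static DotProductFn SelectDotProduct() {
#ifdef HAVE_X86_DISPATCH
  __builtin_cpu_init();
  if (__builtin_cpu_supports("avx")) return DotProductAVX;
#endif
  return DotProductGeneric;
}
```

A caller would fetch the pointer once, for example `DotProductFn dot = SelectDotProduct();`, and then call `dot(u, v, n)`. Because the check runs on the user's machine, a rebuild on a different buildd would not change which CPUs the package works on.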
Bug#949638: tesseract: uses -march=native
Adrian, I am afraid that there is a misunderstanding.

The code part which is compiled with -march=native is never executed by default.

There is a command line option which allows users to select the code which is used for certain time critical calculations (dot product). A wrong choice is not a security problem but simply will crash the application, no matter whether the user selected "native", "avx" or "neon".

That's quite common for other packages including the standard C library and scientific libraries, too. They all contain optimized functions which require certain hardware and which crash otherwise.

Regards

Stefan
Processed: Re: Bug#949638: tesseract: uses -march=native
Processing control commands:

> severity -1 serious
Bug #949638 [tesseract] tesseract: uses -march=native
Severity set to 'serious' from 'normal'

--
949638: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=949638
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems
Bug#949638: tesseract: uses -march=native
Severity 949638 normal
Thanks

On 24/01/2020 19:16, Stefan Weil wrote:
> As far as I know all Linux distributions use the autoconf based build,

Debian certainly does appear to be using the autoconf based build.

> The default autoconf build uses -march=native only if it is supported by
> the compiler

Which, of course it is.

> and only for a single file, but not for the rest of the code.
> The code from that single file is not executed by default, but only if
> an advanced user runs Tesseract with a special command line option
> (-c dotproduct=native).

Ok, that dramatically reduces the impact of this issue. Downgrading the bug to normal.

I still don't think -march=native is appropriate for a binary distribution though. If you want to offer different versions of the code built with different CPU requirements, that is fine, but please don't let them depend on what CPU happens to be in the autobuilder.
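One way to offer several variants without tying any of them to the autobuilder is to name each variant after a fixed instruction set and to validate the user's choice at run time. The sketch below is only an illustration under assumptions: the option values "generic" and "auto", the enum and both function names are hypothetical and are not taken from Tesseract; only "avx" appears in this thread as a selectable value. CpuHasAvx assumes GCC or Clang on x86 and reports false elsewhere.

```cpp
#include <string>

// Hypothetical result of resolving a user-supplied option value.
enum class DotProductKind { Generic, Avx, Unavailable };

// Reports whether the CPU running this binary supports AVX.
static bool CpuHasAvx() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  __builtin_cpu_init();
  return __builtin_cpu_supports("avx") != 0;
#else
  return false;
#endif
}

// Maps the requested name to a variant. A request the CPU cannot satisfy
// is reported as Unavailable instead of being run and crashing.
// There is deliberately no "native" entry: every variant is tied to a
// named instruction set, never to the machine that built the package.
static DotProductKind ResolveDotProduct(const std::string& requested,
                                        bool cpu_has_avx) {
  if (requested == "generic") return DotProductKind::Generic;
  if (requested == "avx") {
    return cpu_has_avx ? DotProductKind::Avx : DotProductKind::Unavailable;
  }
  if (requested == "auto") {
    return cpu_has_avx ? DotProductKind::Avx : DotProductKind::Generic;
  }
  return DotProductKind::Unavailable;  // unknown option value
}
```

A program would call `ResolveDotProduct(value, CpuHasAvx())` once at startup and print an error for `Unavailable`, so a wrong manual choice produces a message instead of an illegal-instruction crash.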
Bug#949638: tesseract: uses -march=native
It is not necessary to patch Tesseract code if for whatever reason -march=native is completely unwanted.

`make libtesseract_native_la_CXXFLAGS=` will override the extra compiler flags which are used to produce the native code, so only the default flags which don't include -march=native will be used.

Stefan
Bug#949638: tesseract: uses -march=native
> The URL for the patch is 404.

s/tessarect/tesseract/

The fixed URL is https://debdiffs.raspbian.org/main/t/tesseract/.

Stefan
Bug#949638: tesseract: uses -march=native
On 24.01.20 at 19:55, Jeff Breidenbach wrote:
> Regarding: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=949638
>
> Thank you, Peter.
>
> 1. The URL for the patch is 404.
>
> 2. There may be some subtlety with -march=native, specifically related to
> detection of SIMD instructions like AVX2. There's been an enormous
> amount of back & forth on this topic in upstream over the years, so
> I'd like to take this bug there and let them weigh in.
>
> Jeff

That might be a false alarm.

Tesseract supports two different build systems, one based on cmake, one based on autoconf.

As far as I know all Linux distributions use the autoconf based build, so they should not be affected by the existing problems from the cmake build.

The default autoconf build uses -march=native only if it is supported by the compiler and only for a single file, but not for the rest of the code. The code from that single file is not executed by default, but only if an advanced user runs Tesseract with a special command line option (-c dotproduct=native).

Stefan
Bug#949638: tesseract: uses -march=native
BCC: Stefan Weil since I don't know if he wants his email posted in bugs.debian.org

Regarding: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=949638

Thank you, Peter.

1. The URL for the patch is 404.

2. There may be some subtlety with -march=native, specifically related to detection of SIMD instructions like AVX2. There's been an enormous amount of back & forth on this topic in upstream over the years, so I'd like to take this bug there and let them weigh in.

Jeff
Bug#949638: tesseract: uses -march=native
Package: tesseract
Version: 4.1.0-1
Severity: serious
Tags: patch

I recently discovered that tesseract 4.1.1-1 failed the armv7 contamination check we run in raspbian. Investigating shows that since version 4.1.0-1 tesseract started using -march=native. This compiler option is totally inappropriate for a binary distribution like Debian or Raspbian, because it means that the minimum CPU requirements of the resulting binaries will depend on what CPU the buildbox happens to have.

4.1.0-1 was never built in raspbian, I am not sure why 4.1.0-2 passed the contamination check in raspbian. My best guess is that -march=native on arm is poorly implemented and does not recognise the CPUs on some of our buildboxes.

Anyway I whipped up a fix and uploaded it to raspbian. A debdiff should appear soon at https://debdiffs.raspbian.org/main/t/tessarect/