Bug#949638: tesseract: uses -march=native

2020-05-24 Thread Adrian Bunk
On Sun, May 24, 2020 at 10:14:49PM +0200, Stefan Weil wrote:
> Adrian, I am afraid that there is a misunderstanding.
> 
> The code part which is compiled with -march=native is never executed by
> default.

I get that point.

> There is a command line option which allows users to select the code
> which is used for certain time critical calculations (dot product). A
> wrong choice is not a security problem

You misunderstand the part about the security update.
Security updates are just the most common reason why
a package gets updated (and therefore rebuilt) in a
stable distribution.

Example:
Debian 11 will be released in summer 2021.
In autumn 2021 a user sets up a new system and selects "native"
for an important production setup with an Intel CPU.
In spring 2022 a (security or other) update for Tesseract happens
in Debian 11, built on a buildd with the latest AMD CPU.
The working production setup suddenly always crashes.
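
The failure mode in this example can be sketched as a set comparison: a
binary built with -march=native may use any instruction set extension of
the build machine, so it is only safe on CPUs that offer at least the same
extensions. The feature names and CPU profiles below are hypothetical
placeholders, not the real capabilities of any buildd or production CPU.

```python
# Illustrative model only: the feature names and all three CPU profiles are
# made-up placeholders, not data about real Debian buildds.

def missing_features(build_cpu, user_cpu):
    """Return the extensions the binary may use but the user's CPU lacks."""
    return set(build_cpu) - set(user_cpu)


def native_code_is_safe(build_cpu, user_cpu):
    """A -march=native object is only safe if no extension is missing."""
    return not missing_features(build_cpu, user_cpu)


# Hypothetical profiles. The first build ran on a machine whose extensions
# are a subset of the user's CPU, so "native" happened to work.
first_buildd = {"sse2", "avx"}
user_cpu = {"sse2", "avx", "avx2"}
# The rebuild for the stable update landed on a machine with one extension
# ("ext_x", an invented name) that the user's CPU does not have.
second_buildd = {"sse2", "avx", "avx2", "ext_x"}

print(native_code_is_safe(first_buildd, user_cpu))
print(sorted(missing_features(second_buildd, user_cpu)))
```

Nothing changed on the user's side between the two builds; only the build
machine did, which is why the breakage appears "suddenly" after an update.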

> That's quite common for other packages including the standard C
> library and scientific libraries, too. They all contain optimized
> functions which require certain hardware and which crash otherwise.

With proper runtime autodetection of the hardware, if you manage to get 
a crash it is a bug in these packages. It is quite rare that packages 
offer manual selection in addition to autodetection.
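
A minimal sketch of what runtime autodetection means, assuming a Linux-style
/proc/cpuinfo text as input. The candidate table and the implementation names
in it are illustrative assumptions, not Tesseract's or glibc's actual
dispatch logic.

```python
# Hypothetical sketch of runtime autodetection. The implementation names and
# the flags they require are assumptions chosen for illustration.

# Candidates ordered from most to least specialised.
CANDIDATES = [
    ("avx2", {"avx2"}),
    ("avx", {"avx"}),
    ("sse", {"sse4_1"}),
    ("neon", {"neon"}),
]


def parse_cpu_flags(cpuinfo_text):
    """Collect feature flags from /proc/cpuinfo-style text.

    x86 kernels list them under "flags", 32-bit ARM kernels under "Features".
    """
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in ("flags", "features"):
            flags.update(value.split())
    return flags


def select_implementation(flags):
    """Pick the first candidate whose requirements the running CPU meets."""
    for name, required in CANDIDATES:
        if required <= flags:
            return name
    return "generic"  # portable fallback that runs on every CPU


# Made-up sample input; on a real Linux system one could pass
# open("/proc/cpuinfo").read() instead.
sample = "processor\t: 0\nflags\t\t: fpu sse2 sse4_1 avx\n"
print(select_implementation(parse_cpu_flags(sample)))
```

The point of the design is that the choice depends only on the CPU the
program is running on, never on the CPU that compiled it, and that there is
always a fallback that cannot crash.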

> but simply will crash the
> application, no matter whether the user selected "native", "avx" or
> "neon".

Even when built on the same computer I would have doubts whether
automatic vectorization[1] of the trivial C code really beats the 
hand-written AVX2 code, but when the code is not even built for
the computer in question what's the point?

A "native" option meaning "some random buildd somewhere" is just
confusing, it doesn't make sense for distributions.

> Regards
> 
> Stefan

cu
Adrian

[1] if it happens at all, the Debian package build currently overrides
the -O3 with a subsequent -O2
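
The footnote relies on GCC's documented rule that when several -O options are
given, the last one is the effective one. A small sketch of that rule follows;
the helper name and the example command line are hypothetical, and it only
models the -O options, nothing else about the compiler.

```python
import re

# Matches -O, -O0 ... -O3, -Os, -Ofast and similar spellings.
_OPT_FLAG = re.compile(r"^-O(.*)$")


def effective_opt_level(flags):
    """Return the optimisation level that wins on a GCC-style command line."""
    level = "0"  # GCC's default when no -O option is given
    for flag in flags:
        match = _OPT_FLAG.match(flag)
        if match:
            # A bare -O is equivalent to -O1; later options replace earlier ones.
            level = match.group(1) or "1"
    return level


# Hypothetical command line: upstream asks for -O3, packaging appends -O2.
flags = ["-g", "-O3", "-march=native", "-O2"]
print(effective_opt_level(flags))
```

With the appended -O2 winning, any optimisation that is only enabled at -O3
is not applied to that file, which is the concern raised in the footnote.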



Bug#949638: tesseract: uses -march=native

2020-05-24 Thread Stefan Weil
Adrian, I am afraid that there is a misunderstanding.

The code part which is compiled with -march=native is never executed by
default.

There is a command line option which allows users to select the code
which is used for certain time critical calculations (dot product). A
wrong choice is not a security problem but simply will crash the
application, no matter whether the user selected "native", "avx" or
"neon". That's quite common for other packages including the standard C
library and scientific libraries, too. They all contain optimized
functions which require certain hardware and which crash otherwise.

Regards

Stefan



Processed: Re: Bug#949638: tesseract: uses -march=native

2020-05-24 Thread Debian Bug Tracking System
Processing control commands:

> severity -1 serious
Bug #949638 [tesseract] tesseract: uses -march=native
Severity set to 'serious' from 'normal'

-- 
949638: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=949638
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems



Bug#949638: tesseract: uses -march=native

2020-01-24 Thread peter green

Severity 949638 normal
Thanks

On 24/01/2020 19:16, Stefan Weil wrote:
> As far as I know all Linux distributions use the autoconf based build,

Debian certainly does appear to be using the autoconf based build.

> The default autoconf build uses -march=native only if it is supported by
> the compiler

Which, of course it is.

> and only for a single file, but not for the rest of the
> code. The code from that single file is not executed by default, but
> only if an advanced user runs Tesseract with a special command line
> option (-c dotproduct=native).

Ok, that dramatically reduces the impact of this issue. Downgrading the bug to 
normal.

I still don't think -march=native is appropriate for a binary distribution 
though. If you want to offer different versions of the code built with 
different CPU requirements, that is fine, but please don't let them depend on 
what CPU happens to be in the autobuilder.



Bug#949638: tesseract: uses -march=native

2020-01-24 Thread Stefan Weil
It is not necessary to patch Tesseract code if for whatever reason
-march=native is completely unwanted.

`make libtesseract_native_la_CXXFLAGS=` will override the extra compiler
flags which are used to produce the native code, so only the default
flags which don't include -march=native will be used.

Stefan
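
One way to confirm that an override like the one above took effect is to scan
the build log for compiler invocations that still carry the flag. This is a
hedged sketch: the helper name and the sample log (including its file names)
are made up, and it is not the contamination check that Raspbian runs.

```python
import shlex


def commands_using_flag(build_log, flag="-march=native"):
    """Return the log lines whose command line contains the given flag."""
    hits = []
    for line in build_log.splitlines():
        try:
            tokens = shlex.split(line)
        except ValueError:
            # Unbalanced quotes in a log line: fall back to a plain split.
            tokens = line.split()
        if flag in tokens:
            hits.append(line)
    return hits


# Made-up build log with placeholder file names.
sample_log = (
    "g++ -O2 -c a.cpp -o a.o\n"
    "g++ -O3 -march=native -c b.cpp -o b.o\n"
)
for hit in commands_using_flag(sample_log):
    print(hit)
```

An empty result after rebuilding with the override would suggest that no
object file was compiled with -march=native, at least as far as the log shows.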



Bug#949638: tesseract: uses -march=native

2020-01-24 Thread Stefan Weil
> The URL for the patch is 404.

s/tessarect/tesseract/

The fixed URL is https://debdiffs.raspbian.org/main/t/tesseract/.

Stefan



Bug#949638: tesseract: uses -march=native

2020-01-24 Thread Stefan Weil
On 24.01.20 at 19:55, Jeff Breidenbach wrote:

>
> Regarding: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=949638
>
> Thank you, Peter.
>
> 1. The URL for the patch is 404.
>
> 2. There may be some subtlety with -march=native, specifically related to
> detection of SIMD instructions like AVX2. There's been an enormous
> amount of back & forth on this topic in upstream over the years, so I'd like
> to take this bug there and let them weigh in.
>
> Jeff


That might be a false alarm.

Tesseract supports two different build systems, one based on cmake, one
based on autoconf.

As far as I know all Linux distributions use the autoconf based build,
so they should not be affected by the existing problems from the cmake
build.

The default autoconf build uses -march=native only if it is supported by
the compiler and only for a single file, but not for the rest of the
code. The code from that single file is not executed by default, but
only if an advanced user runs Tesseract with a special command line
option (-c dotproduct=native).

Stefan



Bug#949638: tesseract: uses -march=native

2020-01-24 Thread Jeff Breidenbach
BCC: Stefan Weil since I don't know if he wants his email posted in
bugs.debian.org

Regarding: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=949638

Thank you, Peter.

1. The URL for the patch is 404.

2. There may be some subtlety with -march=native, specifically related to
detection of SIMD instructions like AVX2. There's been an enormous
amount of back & forth on this topic in upstream over the years, so I'd like
to take this bug there and let them weigh in.

Jeff


Bug#949638: tesseract: uses -march=native

2020-01-22 Thread peter green

Package: tesseract
Version: 4.1.0-1
Severity: serious
Tags: patch

I recently discovered that tesseract 4.1.1-1 failed the armv7 contamination 
check we run in Raspbian.

Investigating shows that since version 4.1.0-1 tesseract started using 
-march=native. This compiler option is totally inappropriate for a binary 
distribution like Debian or Raspbian, because it means that the minimum CPU 
requirements of the resulting binaries will depend on what CPU the buildbox 
happens to have.

4.1.0-1 was never built in Raspbian, and I am not sure why 4.1.0-2 passed the 
contamination check there. My best guess is that -march=native on arm is 
poorly implemented and does not recognise the CPUs on some of our buildboxes.

Anyway, I whipped up a fix and uploaded it to Raspbian. A debdiff should appear 
soon at https://debdiffs.raspbian.org/main/t/tessarect/