Bug#1068890: diffoscope: --hard-timeout option

2024-04-18 Thread Chris Lamb
Vagrant Cascadian wrote:

> On 2024-04-16, Chris Lamb wrote:
>> However, I think this first iteration of --hard-timeout time has a few
>> things that would need ironing out first, and potentially make it not
>> worth implementing:
>>
>> (1) You suggest it should start again with "--max-container-depth 3",
>> but it would surely need some syntax (or another option?) to control
>> that "3" (but for the second time only).
>
> What about going the other direction ... starting with a very small
> value for max-container-depth, and incrementally increasing it,
> generating a report (or at least storing sufficient data to generate
> one) in between each increment, so you always get some information, but
> essentially incrementally increase the resolution?
>
> Or would that approach just be too inefficient?

This is probably a separate requirement best suited to another issue at
this point, but I do like the idea of being able to incrementally
increase the resolution over time. Depending on how it worked in
practice, there should not be significant overhead in managing this
if, say, the commands that could not be run "in time" had token
placeholders internally that rendered to text in the output rather
than non-trivial/expensive binary diffs.

On the negative side though, I think this would still require a robust
way of killing long-running processes as outlined previously. But
moreover it would require a HUGE reworking of how diffoscope handles
containers and recurses into nested structures in its tree-like style.
Indeed, thinking about it, this change would pretty much be exactly
the same work needed to make diffoscope run in parallel (!) which
hopefully communicates both the scope of the changes that would be
needed to achieve this, and that making diffoscope run in parallel
also has other benefits.

Anyway, mini brain dump over.


Regards,

-- 
  o
⬋   ⬊  Chris Lamb
   o o reproducible-builds.org 
⬊   ⬋
  o



Bug#1068890: diffoscope: --hard-timeout option

2024-04-18 Thread Chris Lamb
Holger Levsen wrote:

>> (1) You suggest it should start again with "--max-container-depth 3",
>> but it would surely need some syntax (or another option?) to control
>> that "3" (but for the second time only).
>
> another option, --second-pass-max-container-depth or some such
>
>> (2) In fact, it's easy to imagine that one would want to restart with
>> other restrictions as well: not just --max-container-depth. For
>> instance, excluding external commands like readelf and objdump that
>> you know to be slow.
>
> yes, that's a good idea and IMO should be automatically implied for the
> 2nd pass or round or try.

It's definitely a "good idea" in the sense that I can certainly see
someone wanting to achieve that as an end result. :)

Yet… upon thinking about it a bit, I don't think it is a good idea at
all for diffoscope to grow a bunch of new options or hardcoded
defaults for a second run. What (1) and (2) show here is that as soon
as a user would like to adjust these second-pass options in any way,
the whole interface becomes very unwieldy. Not only that, but from the
user's point of view it's neither flexible nor transparent, especially
when compared to "just" running diffoscope twice with different
options. There's no "magic" there, if you see what I mean.

Can we implement running diffoscope twice on tests.r-b.org manually
first and see how that goes? I'm not 100% against the idea of
implementing this in diffoscope eventually, but it would make a lot of
sense to try out the "manual" version first and gain some real-world
experience.
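
For illustration only, the "manual" two-pass version could look roughly
like the sketch below. The 2h limit and "--max-container-depth 3" come
from this thread; build_command, run_with_timeout and compare_twice are
hypothetical names, not anything that exists in diffoscope or in the
Jenkins code. The caller is told whether the restricted pass was used,
so the report can say so.

```python
import subprocess
from typing import Callable, List, Optional, Sequence, Tuple

# Assumed values taken from the discussion, not tuned recommendations.
FIRST_PASS_TIMEOUT = 2 * 60 * 60
SECOND_PASS_ARGS = ["--max-container-depth", "3"]


def build_command(a: str, b: str, html: str,
                  extra: Sequence[str] = ()) -> List[str]:
    """Assemble a diffoscope command line that writes an HTML report."""
    return ["diffoscope", "--html", html, *extra, a, b]


def run_with_timeout(cmd: Sequence[str], timeout: float) -> Optional[int]:
    """Return the exit code, or None if the timeout was exceeded.

    subprocess.run() only kills the direct child on timeout; helpers
    spawned by that child may survive.
    """
    try:
        return subprocess.run(cmd, timeout=timeout).returncode
    except subprocess.TimeoutExpired:
        return None


def compare_twice(
    a: str,
    b: str,
    html: str,
    runner: Callable[[Sequence[str], float], Optional[int]] = run_with_timeout,
    timeout: float = FIRST_PASS_TIMEOUT,
) -> Tuple[Optional[int], bool]:
    """Try a full run; if it times out, re-run once with restrictions.

    Returns (exit_code, restricted). exit_code is None if the second
    pass timed out as well.
    """
    code = runner(build_command(a, b, html), timeout)
    if code is not None:
        return code, False
    code = runner(build_command(a, b, html, SECOND_PASS_ARGS), timeout)
    return code, True
```

Other restrictions, such as excluding slow external commands, would
simply be more entries in SECOND_PASS_ARGS, which is the flexibility
argument made above.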


Regards,

-- 
  o
⬋   ⬊  Chris Lamb
   o o reproducible-builds.org 
⬊   ⬋
  o



Bug#1068890: diffoscope: --hard-timeout option

2024-04-16 Thread Vagrant Cascadian
On 2024-04-16, Chris Lamb wrote:
> However, I think this first iteration of --hard-timeout time has a few
> things that would need ironing out first, and potentially make it not
> worth implementing:
>
> (1) You suggest it should start again with "--max-container-depth 3",
> but it would surely need some syntax (or another option?) to control
> that "3" (but for the second time only).

What about going the other direction ... starting with a very small
value for max-container-depth, and incrementally increasing it,
generating a report (or at least storing sufficient data to generate
one) in between each increment, so you always get some information, but
essentially incrementally increase the resolution?

Or would that approach just be too inefficient?
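
A rough sketch of what I mean, done from the outside rather than inside
diffoscope. Everything here is an assumption for illustration: the
function names, the report file naming and the 10 minute budget per
pass are made up. Note that each pass redoes the work of the previous
one, which is exactly the inefficiency I am asking about.

```python
import subprocess
from typing import Callable, Iterable, List, Optional, Sequence


def depth_command(a: str, b: str, report: str, depth: int) -> List[str]:
    """diffoscope command limited to the given container depth."""
    return ["diffoscope", "--max-container-depth", str(depth),
            "--text", report, a, b]


def run_with_budget(cmd: Sequence[str], budget: float = 600.0) -> Optional[int]:
    """Return the exit code, or None if the pass exceeded its budget."""
    try:
        return subprocess.run(cmd, timeout=budget).returncode
    except subprocess.TimeoutExpired:
        return None


def incremental_compare(
    a: str,
    b: str,
    depths: Iterable[int],
    runner: Callable[[Sequence[str]], Optional[int]] = run_with_budget,
) -> List[str]:
    """Run one pass per depth, shallowest first.

    Stops at the first pass that does not finish and returns the report
    paths of the passes that did, so the last entry is the most detailed
    report available.
    """
    finished: List[str] = []
    for depth in depths:
        report = f"report-depth-{depth}.txt"
        if runner(depth_command(a, b, report, depth)) is None:
            break
        finished.append(report)
    return finished
```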


> (2) In fact, it's easy to imagine that one would want to restart with
> other restrictions as well: not just --max-container-depth. For
> instance, excluding external commands like readelf and objdump that
> you know to be slow.

Ah, yes, knowing the common time sinks would be tremendously helpful!


live well,
  vagrant




Bug#1068890: diffoscope: --hard-timeout option

2024-04-16 Thread Holger Levsen
On Tue, Apr 16, 2024 at 04:51:09PM +0100, Chris Lamb wrote:
> Just to say that I am totally on board with the idea of ensuring we
> get _something_ out of diffoscope on tests.reproducible-builds.org.

:) great!

> Way better than 250 timeouts.

https://tests.reproducible-builds.org/debian/stats_breakages.png
showed that in the last 3-4 years there was constant progress on that! \o/

> However, I think this first iteration of --hard-timeout time has a few
> things that would need ironing out first, and potentially make it not
> worth implementing:
> 
> (1) You suggest it should start again with "--max-container-depth 3",
> but it would surely need some syntax (or another option?) to control
> that "3" (but for the second time only).

another option, --second-pass-max-container-depth or some such

> (2) In fact, it's easy to imagine that one would want to restart with
> other restrictions as well: not just --max-container-depth. For
> instance, excluding external commands like readelf and objdump that
> you know to be slow.

yes, that's a good idea and IMO should be automatically implied for the
2nd pass or round or try.

> (3) The output might need some comment saying "this was re-run with
> restrictions as we hit a timeout".

absolutely.

> (4) My gut feeling is that it would not be all that great to rely on
> CPython to really properly clean up child processes after a certain
> amount of time. I believe the most reliable top-level approach to
> this kind of thing inside CPython is to start a watchdog thread that
> sleeps until the timeout and then tries to kill everything, but my
> experience of doing anything like this within Python itself is not
> great, and it has essentially always needed something at the process
> level outside of it to be reliable. A container would be even more
> effective, I'm sure.

hmmm.

> In other words, I think the best way of achieving the result we want
> is, alas, by doing it outside of diffoscope at the level of
> Jenkins. As in, exactly what you describe here:
> 
> > Else we could also extend the current code for tests.r-b.o/debian,
> > which currently just kills diffoscope after 2h, to then run
> > diffoscope --max-container-depth 3 :)
> 
> Is that a massive faff?  :/

not really, I guess it would be rather simple even, I just thought
(or think?) that it would be a nice feature for diffoscope proper.


-- 
cheers,
Holger

 ⢀⣴⠾⠻⢶⣦⠀
 ⣾⠁⢠⠒⠀⣿⡁  holger@(debian|reproducible-builds|layer-acht).org
 ⢿⡄⠘⠷⠚⠋⠀  OpenPGP: B8BF54137B09D35CF026FE9D 091AB856069AAA1C
 ⠈⠳⣄

The purpose of propaganda isn't to make you believe something. It's to make you
believe nothing. So that you do nothing. (@DarthPutinKGB)




Bug#1068890: diffoscope: --hard-timeout option

2024-04-16 Thread Chris Lamb
Holger Levsen wrote:

> Anyhow, about my --hard-timeout option idea:
>
> my idea of "--hard-timeout $time" is that diffoscope terminates itself 
> after $time, no matter what *and* then re-starts itself with 
> "--max-container-depth 3"

Just to say that I am totally on board with the idea of ensuring we
get _something_ out of diffoscope on tests.reproducible-builds.org.
Way better than 250 timeouts.

However, I think this first iteration of --hard-timeout time has a few
things that would need ironing out first, and potentially make it not
worth implementing:

(1) You suggest it should start again with "--max-container-depth 3",
but it would surely need some syntax (or another option?) to control
that "3" (but for the second time only).

(2) In fact, it's easy to imagine that one would want to restart with
other restrictions as well: not just --max-container-depth. For
instance, excluding external commands like readelf and objdump that
you know to be slow.

(3) The output might need some comment saying "this was re-run with
restrictions as we hit a timeout".

(4) My gut feeling is that it would not be all that great to rely on
CPython to really properly clean up child processes after a certain
amount of time. I believe the most reliable top-level approach to
this kind of thing inside CPython is to start a watchdog thread that
sleeps until the timeout and then tries to kill everything, but my
experience of doing anything like this within Python itself is not
great, and it has essentially always needed something at the process
level outside of it to be reliable. A container would be even more
effective, I'm sure.
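
To make "something at the process level outside of it" a little more
concrete, here is a minimal POSIX-only sketch of a wrapper that puts the
child in its own session and signals the whole process group on
timeout, so that helpers spawned by the child are caught too. The
function name and the grace period are made up for illustration, and
this does not describe what Jenkins currently does.

```python
import os
import signal
import subprocess
import sys
from typing import Optional, Sequence


def run_in_own_group(cmd: Sequence[str], timeout: float,
                     grace: float = 5.0) -> Optional[int]:
    """Run cmd in a new session; kill its process group on timeout.

    Returns the exit code, or None if the timeout was hit.
    """
    # start_new_session=True makes the child a session and group leader,
    # so its process group ID equals its PID.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        pass

    # Ask politely first, then wait briefly for the child to exit.
    try:
        os.killpg(proc.pid, signal.SIGTERM)
    except ProcessLookupError:
        pass
    try:
        proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        pass

    # Then force the issue for anything left in the group.
    try:
        os.killpg(proc.pid, signal.SIGKILL)
    except ProcessLookupError:
        pass
    proc.wait()
    return None


if __name__ == "__main__":
    # Example: a sleeping child is given one second before being killed.
    sleeper = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_in_own_group(sleeper, timeout=1.0))
```

A child that deliberately moves itself into yet another session would
still escape this, which is where a container or cgroup would be more
thorough.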

In other words, I think the best way of achieving the result we want
is, alas, by doing it outside of diffoscope at the level of
Jenkins. As in, exactly what you describe here:

> Else we could also extend the current code for tests.r-b.o/debian,
> which currently just kills diffoscope after 2h, to then run
> diffoscope --max-container-depth 3 :)

Is that a massive faff?  :/


Best wishes,

-- 
  o
⬋   ⬊  Chris Lamb
   o o reproducible-builds.org 
⬊   ⬋
  o



Bug#1068890: diffoscope: --hard-timeout option

2024-04-12 Thread Holger Levsen
Package: diffoscope
Version: 264
Severity: wishlist

Dear Maintainer,

currently diffoscope has a --timeout option:

   --timeout SECONDS
      Best-effort attempt at a global timeout in seconds. If enabled,
      diffoscope will not recurse into any further sub-archives
      after X seconds of total execution time. (default: no timeout)
      [experimental]

however this doesn't give any guarantees about how long diffoscope will
be running, so so far we haven't used it for the RB CI tests, mostly
because I'm not sure what would be a good inner timeout (=for diffoscope)
and what would be a good outer timeout (=for killing diffoscope from the
outside no matter what).

Currently we use 2h as outer timeout, but have no inner timeout. Maybe we should
use --timeout 1h?
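
To illustrate what I mean by inner and outer timeout, a small sketch
(not the actual tests.r-b.o code; the function names are made up, and
1h for the inner timeout is just the guess from above):

```python
import subprocess
from typing import List, Optional

OUTER_TIMEOUT = 2 * 60 * 60  # what we currently use to kill diffoscope
INNER_TIMEOUT = 60 * 60      # the "--timeout 1h" guess; untested


def diffoscope_command(a: str, b: str, html: str,
                       inner: int = INNER_TIMEOUT) -> List[str]:
    """diffoscope command using its best-effort --timeout option."""
    return ["diffoscope", "--timeout", str(inner), "--html", html, a, b]


def run_bounded(a: str, b: str, html: str,
                outer: int = OUTER_TIMEOUT,
                inner: int = INNER_TIMEOUT) -> Optional[int]:
    """Run with both timeouts; None means the outer timeout was hit."""
    if inner >= outer:
        # The inner timeout only helps if diffoscope has time left to
        # finish writing its report before the outer one fires.
        raise ValueError("inner timeout must be shorter than outer timeout")
    try:
        return subprocess.run(diffoscope_command(a, b, html, inner),
                              timeout=outer).returncode
    except subprocess.TimeoutExpired:
        return None
```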

Anyhow, about my --hard-timeout option idea:

my idea of "--hard-timeout $time" is that diffoscope terminates itself after
$time, no matter what *and* then re-starts itself with "--max-container-depth 3"
(or whatever is useful to get a glimpse on what files in a Debian package
are different) (probably also with another hard timeout set...) so as to
guarantee to always produce meaningful output (especially html output if
specified with --html).

What do you think?

Else we could also extend the current code for tests.r-b.o/debian, which
currently just kills diffoscope after 2h, to then run
diffoscope --max-container-depth 3 :)

https://tests.reproducible-builds.org/debian/index_breakages.html lists
251 pkg/suite/arch combinations where diffoscope runs into a timeout...


& many thanks for rocking diffoscope airlines..! \o/

-- 
cheers,
Holger

 ⢀⣴⠾⠻⢶⣦⠀
 ⣾⠁⢠⠒⠀⣿⡁  holger@(debian|reproducible-builds|layer-acht).org
 ⢿⡄⠘⠷⠚⠋⠀  OpenPGP: B8BF54137B09D35CF026FE9D 091AB856069AAA1C
 ⠈⠳⣄

Bottled water companies don't produce water, they produce plastic bottles.

