Bug#1068890: diffoscope: --hard-timeout option
Vagrant Cascadian wrote:

> On 2024-04-16, Chris Lamb wrote:
>> However, I think this first iteration of --hard-timeout time has a few
>> things that would need ironing out first, and potentially make it not
>> worth implementing:
>>
>> (1) You suggest it should start again with "--max-container-depth 3",
>> but it would surely need some syntax (or another option?) to control
>> that "3" (but for the second time only).
>
> What about going the other direction ... starting with a very small
> value for max-container-depth, and incrementally increasing it,
> generating a report (or at least storing sufficient data to generate
> one) in between each increment, so you always get some information, but
> essentially incrementally increase the resolution?
>
> Or would that approach just be too inefficient?

This is probably a separate request best suited to another issue at
this point, but I do like the idea of being able to incrementally
increase the resolution over time. Depending on how it worked in
practice, there should not be significant overhead in managing this if,
say, the commands that could not be run "in time" had token placeholders
internally that rendered to text in the output rather than
non-trivial/expensive binary diffs.

On the negative side though, I think this would still require a robust
way of killing long-running processes as outlined previously. But
moreover it would require a HUGE reworking of how diffoscope handles
containers and recurses into nested structures in its tree-like style.

Indeed, thinking about it, this change would pretty much be exactly the
same work needed to make diffoscope run in parallel (!) which hopefully
communicates both the scope of the changes that would be needed to
achieve this, and that making diffoscope run in parallel also has other
benefits.

Anyway, mini brain dump over.

Regards, -- o ⬋ ⬊ Chris Lamb o o reproducible-builds.org ⬊ ⬋ o ___ Reproducible-builds mailing list Reproducible-builds@alioth-lists.debian.net https://alioth-lists.debian.net/cgi-bin/mailman/listinfo/reproducible-builds
Bug#1068890: diffoscope: --hard-timeout option
Holger Levsen wrote:

>> (1) You suggest it should start again with "--max-container-depth 3",
>> but it would surely need some syntax (or another option?) to control
>> that "3" (but for the second time only).
>
> another option, --second-pass-max-container-depth or some such
>
>> (2) In fact, it's easy to imagine that one would want to restart with
>> other restrictions as well: not just --max-container-depth. For
>> instance, excluding external commands like readelf and objdump that
>> you know to be slow.
>
> yes, that's a good idea and IMO should be automatically implied for the
> 2nd pass or round or try.

It's definitely a "good idea" in the sense that I can definitely see
someone wanting to achieve that as an end result :)

Yet… upon thinking about it a bit, I don't think it is a good idea at
all for diffoscope to grow a bunch of new options or hardcoded defaults
for a second run. What (1) and (2) show here is that as soon as a user
would like to adjust these second pass options in any way, then the
whole interface becomes very unwieldy.

Not only that, but from the user's point of view it's neither flexible
nor transparent either, especially when compared to "just" running
diffoscope twice with different options. There's no "magic" there, if
you see what I mean.

Can we implement running diffoscope twice on tests.r-b.org manually
first and see how that goes? I'm not 100% against the idea of
implementing this in diffoscope eventually, but it would make a lot of
sense to try out the "manual" version first and gain some real-world
experience.

Regards,

-- 
      o
    ⬋   ⬊      Chris Lamb
   o     o     reproducible-builds.org
    ⬊   ⬋
      o
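The "manual" version suggested above (run diffoscope once in full, and only if that does not finish in time, run it again with restrictions) could be sketched roughly as follows. This is only an illustration and not the tests.r-b.org code: the helper names `build_passes` and `first_finishing` are made up for this example, the fallback depth of 3 is simply the value mentioned in the thread, and it assumes `--html` takes an output file path. Note that `subprocess.run(..., timeout=...)` only kills the direct child, so this does not address the reliability concern about stray helper processes.

```python
import subprocess


def build_passes(file_a, file_b, html_out, fallback_depth=3):
    """Return the command lines to try, in order (hypothetical helper).

    The first pass is unrestricted; the second limits container recursion
    with --max-container-depth, as proposed in the thread.
    """
    base = ["diffoscope", "--html", html_out]
    full = base + [file_a, file_b]
    shallow = base + ["--max-container-depth", str(fallback_depth), file_a, file_b]
    return [full, shallow]


def first_finishing(commands, timeout):
    """Run each command in turn until one finishes within `timeout` seconds.

    Returns (index, returncode) of the first command that finished, or
    (None, None) if every command timed out.
    """
    for index, cmd in enumerate(commands):
        try:
            # subprocess.run kills the direct child when the timeout expires
            # and then raises TimeoutExpired.
            result = subprocess.run(cmd, timeout=timeout)
        except subprocess.TimeoutExpired:
            continue
        return index, result.returncode
    return None, None
```

A caller would do something like `first_finishing(build_passes("a.deb", "b.deb", "out.html"), timeout=7200)` and record which pass produced the report, so that the output can say it was re-run with restrictions.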
Bug#1068890: diffoscope: --hard-timeout option
On 2024-04-16, Chris Lamb wrote:
> However, I think this first iteration of --hard-timeout time has a few
> things that would need ironing out first, and potentially make it not
> worth implementing:
>
> (1) You suggest it should start again with "--max-container-depth 3",
> but it would surely need some syntax (or another option?) to control
> that "3" (but for the second time only).

What about going the other direction ... starting with a very small
value for max-container-depth, and incrementally increasing it,
generating a report (or at least storing sufficient data to generate
one) in between each increment, so you always get some information, but
essentially incrementally increase the resolution?

Or would that approach just be too inefficient?

> (2) In fact, it's easy to imagine that one would want to restart with
> other restrictions as well: not just --max-container-depth. For
> instance, excluding external commands like readelf and objdump that
> you know to be slow.

Ah, yes, knowing the common time sinks would be tremendously helpful!

live well,
  vagrant
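To make the "other direction" idea concrete, here is a rough sketch of what it might look like when driven from outside diffoscope: run with increasing depth limits until a total time budget is used up, and keep the deepest report that completed. The function name `incremental_reports` and the `make_cmd` callback are invented for this example. It restarts diffoscope from scratch at every depth, so it is exactly the potentially inefficient variant being asked about, not an incremental implementation inside diffoscope.

```python
import subprocess
import time


def incremental_reports(make_cmd, depths, budget):
    """Run make_cmd(depth) for each depth until `budget` seconds are spent.

    make_cmd is a caller-supplied function returning the command line for a
    given depth. Returns the deepest depth whose run completed in time, or
    None if not even the first run finished.
    """
    deadline = time.monotonic() + budget
    best = None
    for depth in depths:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            # Each run may only use whatever is left of the overall budget.
            subprocess.run(make_cmd(depth), timeout=remaining)
        except subprocess.TimeoutExpired:
            break
        best = depth
    return best
```

For diffoscope, `make_cmd` might be something like `lambda d: ["diffoscope", "--max-container-depth", str(d), "--html", f"report-{d}.html", "a.deb", "b.deb"]`, where the file names are placeholders.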
Bug#1068890: diffoscope: --hard-timeout option
On Tue, Apr 16, 2024 at 04:51:09PM +0100, Chris Lamb wrote:
> Just to say that I am totally on board with the idea of ensuring we
> get _something_ out of diffoscope on tests.reproducible-builds.org.

:) great!

> Way better than 250 timeouts.

https://tests.reproducible-builds.org/debian/stats_breakages.png showed
that in the last 3-4 years there was constant progress on that! \o/

> However, I think this first iteration of --hard-timeout time has a few
> things that would need ironing out first, and potentially make it not
> worth implementing:
>
> (1) You suggest it should start again with "--max-container-depth 3",
> but it would surely need some syntax (or another option?) to control
> that "3" (but for the second time only).

another option, --second-pass-max-container-depth or some such

> (2) In fact, it's easy to imagine that one would want to restart with
> other restrictions as well: not just --max-container-depth. For
> instance, excluding external commands like readelf and objdump that
> you know to be slow.

yes, that's a good idea and IMO should be automatically implied for the
2nd pass or round or try.

> (3) The output might need some comment saying "this was re-run with
> restrictions as we hit a timeout".

absolutely.

> (4) My gut feel is that it would not be all that great to rely on
> CPython to really properly clear up child processes after a certain
> amount of time. I believe the most reliable top-level approach to
> doing this kind of thing inside CPython is to start a watchdog thread
> that sleeps until the timeout and then tries to kill everything, but my
> experience of doing anything like this within Python itself is not
> great, and it essentially always needed something at the process level
> outside of it to be reliable. A container would be even more
> effective, I'm sure.

hmmm.

> In other words, I think the best way of achieving the result we want
> is, alas, by doing it outside of diffoscope at the level of Jenkins.
> As in, exactly what you describe here:
>
> > Else we could also extend the current code for tests.r-b.o/debian,
> > which currently just kills diffoscope after 2h, to then run
> > diffoscope --max-container-depth 3 :)
>
> Is that a massive faff? :/

not really, I guess it would be rather simple even, I just thought (or
think?) that it would be a nice feature for diffoscope proper.

-- 
cheers,
        Holger

 ⢀⣴⠾⠻⢶⣦⠀
 ⣾⠁⢠⠒⠀⣿⡁  holger@(debian|reproducible-builds|layer-acht).org
 ⢿⡄⠘⠷⠚⠋⠀  OpenPGP: B8BF54137B09D35CF026FE9D 091AB856069AAA1C
 ⠈⠳⣄

The purpose of propaganda isn't to make you believe something. It's to
make you believe nothing. So that you do nothing. (@DarthPutinKGB)
Bug#1068890: diffoscope: --hard-timeout option
Holger Levsen wrote:

> Anyhow, about my --hard-timeout option idea:
>
> my idea of "--hard-timeout $time" is that diffoscope terminates itself
> after $time, no matter what *and* then re-starts itself with
> "--max-container-depth 3"

Just to say that I am totally on board with the idea of ensuring we
get _something_ out of diffoscope on tests.reproducible-builds.org.
Way better than 250 timeouts.

However, I think this first iteration of --hard-timeout time has a few
things that would need ironing out first, and potentially make it not
worth implementing:

(1) You suggest it should start again with "--max-container-depth 3",
but it would surely need some syntax (or another option?) to control
that "3" (but for the second time only).

(2) In fact, it's easy to imagine that one would want to restart with
other restrictions as well: not just --max-container-depth. For
instance, excluding external commands like readelf and objdump that
you know to be slow.

(3) The output might need some comment saying "this was re-run with
restrictions as we hit a timeout".

(4) My gut feel is that it would not be all that great to rely on
CPython to really properly clear up child processes after a certain
amount of time. I believe the most reliable top-level approach to
doing this kind of thing inside CPython is to start a watchdog thread
that sleeps until the timeout and then tries to kill everything, but my
experience of doing anything like this within Python itself is not
great, and it essentially always needed something at the process level
outside of it to be reliable. A container would be even more
effective, I'm sure.

In other words, I think the best way of achieving the result we want
is, alas, by doing it outside of diffoscope at the level of Jenkins.
As in, exactly what you describe here:

> Else we could also extend the current code for tests.r-b.o/debian,
> which currently just kills diffoscope after 2h, to then run
> diffoscope --max-container-depth 3 :)

Is that a massive faff? :/

Best wishes,

-- 
      o
    ⬋   ⬊      Chris Lamb
   o     o     reproducible-builds.org
    ⬊   ⬋
      o
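As an illustration of point (4), this is roughly what "something at the process level outside of it" could look like when the supervising script happens to be written in Python. It is a minimal, POSIX-only sketch rather than the actual Jenkins job code, and the function name `run_with_hard_timeout` is made up for the example. The key point is that the child is started in its own session, so the whole process group (including helpers such as readelf or objdump) can be killed in one go.

```python
import os
import signal
import subprocess


def run_with_hard_timeout(cmd, timeout):
    """Run cmd and kill its whole process group after `timeout` seconds.

    Returns (returncode, timed_out). POSIX only.
    """
    # start_new_session=True makes the child the leader of a new session and
    # process group, so any helpers it spawns belong to that group as well.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout), False
    except subprocess.TimeoutExpired:
        try:
            # The group ID equals the child's PID because it leads the group.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            # The child exited between the timeout and the kill.
            pass
        return proc.wait(), True
```

If `timed_out` comes back true, the caller can then start the restricted second run (for example with `--max-container-depth 3`) under the same function.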
Bug#1068890: diffoscope: --hard-timeout option
Package: diffoscope
Version: 264
Severity: wishlist

Dear Maintainer,

currently diffoscope has a --timeout option

  --timeout SECONDS     Best-effort attempt at a global timeout in seconds. If
                        enabled, diffoscope will not recurse into any further
                        sub-archives after X seconds of total execution time.
                        (default: no timeout) [experimental]

however this doesn't give any guarantees how long diffoscope will be
running, so, so far we haven't used it for the RB CI tests, mostly
because I'm not sure what would be a good inner timeout (=for
diffoscope) and what would be a good outer timeout (=for killing
diffoscope from the outside no matter what).

Currently we use 2h as outer timeout, but have no inner timeout. Maybe
we should use --timeout 1h?

Anyhow, about my --hard-timeout option idea:

my idea of "--hard-timeout $time" is that diffoscope terminates itself
after $time, no matter what *and* then re-starts itself with
"--max-container-depth 3" (or whatever is useful to get a glimpse of
what files in a Debian package are different) (probably also with
another hard timeout set...) so as to guarantee to always produce
meaningful output (especially html output if specified with --html).

What do you think?

Else we could also extend the current code for tests.r-b.o/debian,
which currently just kills diffoscope after 2h, to then run diffoscope
--max-container-depth 3 :)

https://tests.reproducible-builds.org/debian/index_breakages.html lists
251 pkg/suite/arch combinations where diffoscope runs into a timeout...

& many thanks for rocking diffoscope airlines..! \o/

-- 
cheers,
        Holger

 ⢀⣴⠾⠻⢶⣦⠀
 ⣾⠁⢠⠒⠀⣿⡁  holger@(debian|reproducible-builds|layer-acht).org
 ⢿⡄⠘⠷⠚⠋⠀  OpenPGP: B8BF54137B09D35CF026FE9D 091AB856069AAA1C
 ⠈⠳⣄

Bottled water companies don't produce water, they produce plastic bottles.