Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
Florian Schmaus writes: > [[PGP Signed Part:Undecided]] > On 30/06/2023 10.22, Sam James wrote: >> Florian Schmaus writes: >>> [[PGP Signed Part:Undecided]] >>> [in reply to a gentoo-project@ post, but it was asked to continue this >>> on gentoo-dev@] >>> On 28/06/2023 16.46, Sam James wrote: and questions remain unanswered on the ML (why not implement a check in pkgcheck similar to what is in Portage, for example)? >>> >>> On 2023-05-30 [1], I proposed a limit in the range of 2 to 1.5 MiB for >>> the total package-directory size. I only care a little about the tool >>> that checks this limit, but pkgcheck is an obvious choice. I also >>> suggested that we review this policy once the number of Go packages >>> has doubled or two years after this policy was established (whatever >>> comes first). >>> >>> But I fear you may be referring to another kind of check. You may be >>> talking about a check that forbids EGO_SUM in ::gentoo but allows it >>> overlays. >> My position on this has been consistent: > a check is needed to >> statically >> determine when the environment size is too big. Copying the Portage >> check into pkgcheck (in terms of the metrics) would satisfy this. > > It is not as easy as merely copying existing portage code into > pkgcheck (unless I am missing something). > That's why I said "in terms of the metrics". > I've talked to arthurzam, and there appears to be a .environment file > created by pkgcheck, which we could use to approximate the exported > environment. > > Another option would be to have pkgcheck count the EGO_SUM > entries. The tree-sitter API for Bash, which pkgcheck already uses, > seems to allow for that. But that would be different from the check in > portage. Although, IMHO, counting EGO_SUM entries would be sufficient. Right. > > >> That is, regardless of raw size, I'm asking for a calculation based on >> the contents of EGO_SUM where, if exceeded, the package will not be >> installable on some systems. 
You didn't have an issue implementing this >> for Portage and I've mentioned this a bunch of times since, so I thought >> it was clear what I was hoping to see. > > So pkgcheck counting EGO_SUM entries would be sufficient for the > purpose of having a static check that notices if the ebuild would > likely run into the environment limit? > If you check that it actually fires in some of the old broken scenarios (see Bugzilla), then yes. But I'd be interested in your thoughts on radhermit's reply (please reply there). > To find a common compromise, I would possibly invest my time in > developing such a test. Even though I do not deem such a check a > strict prerequisite to reintroduce EGO_SUM. Yes, you've made clear you disagree. > > >>> Intelligibly, EGO_SUM can be considered ugly. Compared to a >>> traditional Gentoo package, EGO_SUM-based ones are larger. The same is >>> true for Rust packages. However, looking at the bigger picture, >>> EGO_SUM's advantages outweigh its disadvantages. >>> >> Again, am on record as being fine with the general EGO_SUM approach, >> even if I wish we didn't need it, as I see it as inevitable for things >> like yarn, .NET, and of course Rust as we already have it. >> Just ideally not huge ones, and certainly not huge ones which then >> aren't even reliably installable because of environment size. > > Talking about "reliably installable" makes it sound to me like there > are cases where installing a EGO_SUM-based package sometimes works and > sometimes not. But the kernel-limit is fixed and not even > configurable, besides, of course patching the source (and in the > absence of architectures with a page size below 4 KiB) [1]. > ulm's reply notes that this is a limitation in the Linux kernel, so I have no idea why musl tinderboxes seemed to disproportionately hit these issues, and I assume one of us is either missing something or it was just a crazy fluke.
> Any developer testing whether or not an ebuild is installable would > become immediately aware if the ebuild runs into the environment > limit, or not. > This clearly didn't happen with the previous examples (see what I said above too), as there were times when they installed for some people, but not in CI/tinderboxes. I don't know why, and it merits investigation.
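For context on the kernel limit discussed above: as far as I know, Linux caps each individual argv/envp string at MAX_ARG_STRLEN, defined as 32 pages (131072 bytes with 4 KiB pages), and execve() fails with E2BIG if a single string is longer. Because it is a per-string cap, one oversized exported variable can fail even when the environment as a whole is well within budget. The Python sketch below estimates whether one `NAME=value` entry crosses that cap; the 4 KiB page size is an assumption, the variable name is just a parameter (which variable actually grows depends on the eclass), and the distfile name used in the demo is hypothetical.

```python
"""Rough check of one exported variable against Linux's per-string limit."""

PAGE_SIZE = 4096                 # assumption: 4 KiB pages
MAX_ARG_STRLEN = 32 * PAGE_SIZE  # per-string cap on argv/envp entries (131072)


def env_string_bytes(name: str, value: str) -> int:
    """Bytes execve() must copy for one 'NAME=value' entry, including the NUL."""
    return len(name.encode()) + 1 + len(value.encode()) + 1


def exceeds_arg_strlen(name: str, value: str, page_size: int = PAGE_SIZE) -> bool:
    """True if that entry alone would make execve() fail with E2BIG."""
    return env_string_bytes(name, value) > 32 * page_size


if __name__ == "__main__":
    # Hypothetical distfile name of roughly the shape a Go-module eclass
    # might generate; real names vary in length.
    entry = "github.com%2Fexample%2Fmodule%2F@v%2Fv1.2.3.mod"
    for count in (1000, 2000, 3000):
        value = " ".join([entry] * count)
        print(count, env_string_bytes("A", value), exceeds_arg_strlen("A", value))
```

Note the comparison is strictly greater-than: a string whose length including the terminating NUL is exactly 32 pages is still accepted.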
Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
Zoltan Puskas writes: > On Tue, Jul 04, 2023 at 01:13:30AM -0600, Tim Harder wrote: >> On 2023-07-03 Mon 04:17, Florian Schmaus wrote: >> >On 30/06/2023 13.33, Eray Aslan wrote: >> >>On Fri, Jun 30, 2023 at 03:38:11AM -0600, Tim Harder wrote: >> >>>Why do we have to keep exporting the related variables that generally >> >>>cause these size issues to the environment? >> >> >> >>I really do not want to make a +1 response but this is an excellent >> >>question that we need to answer before implementing EGO_SUM. >> > >> >Could you please discuss why you make the reintroduction of EGO_SUM >> >dependent on this question? >> >> Just to be clear, I don't particularly care about EGO_SUM enough to gate >> its reintroduction (and don't have any leverage to do so anyway). I'm >> just tired of the circular discussions around env issues that all seem >> to avoid actual fixes, catering instead to functionality used by a >> vanishingly small subset of ebuilds in the main repo that compels a >> certain design mostly due to how portage functioned before EAPI 0. >> >> Other than that, supporting EGO_SUM (or any other language ecosystem >> trending towards distro-unfriendly releases) is fine as long as devs are >> cognizant how the related global-scope eclass design affects everyone >> running or working on the raw repo. I hope devs continue leveraging the >> relatively recent benchmark tooling (and perhaps more future support) to >> improve their work. Along those lines, it could be nice to see sample >> benchmark data in commit messages for large, global-scope eclass work >> just to reinforce that it was taken into account. >> >> Tim >> > > I've been following the EGO_SUM thread for quite some time now. One other > thing > I did not see mentioned in favour of EGO_SUM so far: reproducibility. > > The problem with external tarballs is that they are gone once the ebuild is > dropped from the tree. 
Should a user ever want to roll back to a previous > version of an application, either by checking out on older version of the > portage tree or copying said ebuild into their local overlay, they still > cannot > simply run an emerge on the it as they have to somehow recreate the tarball > itself too. I believe Hank's email covers this.
Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
On Tue, Jul 04, 2023 at 01:13:30AM -0600, Tim Harder wrote: > On 2023-07-03 Mon 04:17, Florian Schmaus wrote: > >On 30/06/2023 13.33, Eray Aslan wrote: > >>On Fri, Jun 30, 2023 at 03:38:11AM -0600, Tim Harder wrote: > >>>Why do we have to keep exporting the related variables that generally > >>>cause these size issues to the environment? > >> > >>I really do not want to make a +1 response but this is an excellent > >>question that we need to answer before implementing EGO_SUM. > > > >Could you please discuss why you make the reintroduction of EGO_SUM > >dependent on this question? > > Just to be clear, I don't particularly care about EGO_SUM enough to gate > its reintroduction (and don't have any leverage to do so anyway). I'm > just tired of the circular discussions around env issues that all seem > to avoid actual fixes, catering instead to functionality used by a > vanishingly small subset of ebuilds in the main repo that compels a > certain design mostly due to how portage functioned before EAPI 0. > > Other than that, supporting EGO_SUM (or any other language ecosystem > trending towards distro-unfriendly releases) is fine as long as devs are > cognizant how the related global-scope eclass design affects everyone > running or working on the raw repo. I hope devs continue leveraging the > relatively recent benchmark tooling (and perhaps more future support) to > improve their work. Along those lines, it could be nice to see sample > benchmark data in commit messages for large, global-scope eclass work > just to reinforce that it was taken into account. > > Tim > I've been following the EGO_SUM thread for quite some time now. One other thing I did not see mentioned in favour of EGO_SUM so far: reproducibility. The problem with external tarballs is that they are gone once the ebuild is dropped from the tree. 
Should a user ever want to roll back to a previous version of an application, either by checking out an older version of the portage tree or by copying said ebuild into their local overlay, they still cannot simply run emerge on it, as they have to somehow recreate the tarball too. While upstream may not host everything forever, upstream content is pretty much guaranteed to remain available for much longer than Gentoo's custom tarball bundles of dependencies.

Regarding space, we are also likely making a trade-off. By deprecating EGO_SUM we save space in the portage tree, but in exchange we inflate distfiles, which will accumulate the same dependencies potentially multiple times, since the content is now hidden in tarballs each containing a different combination of dependencies. This is essentially the source-file version of "static linking".

Finally, a personal opinion: I find dependency tarballs opaque. With EGO_SUM the ebuild defines all the upstream sources it needs to build the package as well as how to build it, but with a dependency tarball the sources are hidden, which makes verification that much harder.

Zoltan
Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
On Wed, Jul 05, 2023 at 20:40:34 +0200, Gerion Entrup wrote: > Am Mittwoch, 5. Juli 2023, 01:09:30 CEST schrieb Oskari Pirhonen: > > On Tue, Jul 04, 2023 at 21:56:26 +, Robin H. Johnson wrote: > > > On Tue, Jul 04, 2023 at 12:44:39PM +0200, Gerion Entrup wrote: > > > > just to be curious about the whole discussion. I did not follow in the > > > > deepest detail but what I got is: > > > > - EGO_SUM blows up the Manifest file, since every little Go module needs > > > > to be respected. A lot of these Manifest files lead to a extremely > > > > increased Portage tree size. EGO_SUM is just one example (though the > > > > biggest one). Statically linked languages like Rust etc. have the same > > > > problem. > > > > - The current solution is to prepackage all modules, put it somewhere on > > > > a webserver and just manifest that file. This make the Portage tree > > > > small in size again, but requires a webserver/mirror and is thus > > > > unfriendly for overlay devs. > > > > > > > > I'm not sure if it was mentioned before but has anyone considered hash > > > > trees / Merkle trees for the manifest file? The idea would be to hash > > > > the standard manifest file a second time if it gets too big and write > > > > down that hash as new manifest file and leave EGO_SUM as is. > > > This is out-of-tree/indirect Manifests, that I proposed here, more than > > > a year ago: > > > https://marc.info/?l=gentoo-dev=168280762310716=2 > > > https://marc.info/?l=gentoo-dev=165472088822215=2 > > > > > > Developing it requires PMS work in addition to package manager > > > development, because it introduces phases. > > > > > > - primary fetch of $SRC_URI per ebuild, including indirect Manifest > > > - primary validation of distfiles > > > - secondary fetch of $SRC_URI per indirect Manifest > > > - secondary validation of additional distfiles > > > > > > A significantly impacted use case is "emerge -f", it now needs to run > > > downloads twice. 
> > > > > > > I'm not sure double downloading is required. Consider a flow similar to > > this: > > > > 1. distfiles are fetched as per the ebuild > > 2. distfiles are hashed into a temporary Manifest > > 3. temporary Manifest is hashed and compared with the hashes stored in > >the in-tree Manifest for the direct Manifest > > This is exactly, what I meant. A webstorage is not needed. A second > download process is also not needed. Just an additional Manifest format > is needed for ebuilds with more than n distfiles. > > > > A new Manifest format would be required in order to differentiate the > > current ones from an indirect one. This may require PMS changes, > > although I suspect ammending GLEP 74 may be enough since the PMS seems > > to just refer to the GLEP for a description of Manifests. > > > > This would also either rely on a stable ordering of Manifest contents > > when generating it or having a separate file listing in the indirect > > Manifest which corresponds to the order in the direct Manifest. For the > > latter, it should also have separate entries for different package > > versions so that every single distfile for every single version of said > > package does not need to be fetched in order to build the direct > > Manifest. > > > > I'm imagining something along these lines: > > > > INDIRECT true > > PACKAGE category/package-version distfile1 distfile2 ... ALGO1 hash1 > > ALGO2 hash2 ... > > PACKAGE ... > > Maybe it is reasonable to skip the distfile names at all (or just > provide a hash value of the concatenated file names). Then the manifest > would just contain two/three hashes (for as many distfiles as the ebuild > needs). Since these kind of indirect Manifests should be more rare than > the normal ones, a slightly longer processing time does not have much > impact I would say. > My reasoning for including the list of files is that it allows the intermediate/direct Manifest to be recreated accurately.
Consider the following (not-so-)hypothetical Manifest:

    DIST dist.tar.gz 84703 BLAKE2B ... SHA512 ...
    DIST dist.tar.gz.asc 228 BLAKE2B ... SHA512 ...
    EBUILD package-r1.ebuild 1535 BLAKE2B ... SHA512 ...
    EBUILD package.ebuild 1536 BLAKE2B ... SHA512 ...
    MISC metadata.xml 959 BLAKE2B ... SHA512 ...

It is "well behaved" because pkgdev created it. My main concern is that if $OTHER_TOOLING generates the Manifest in a different order, the Manifest may be correct, but you get a false negative since its hashes don't match what is in the in-tree indirect Manifest. Having the order specified in the indirect Manifest renders this moot, because $OTHER_TOOLING would have to respect it in order to handle indirect Manifests correctly.

Additionally, in repos without thin-manifests, the SRC_URI is not enough to build up the Manifest. This may or may not be an issue, depending on whether a repo's metadata/layout.conf is parsed as part of the Manifest verification process.

> > > > Here `ALGO1` and
Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
On Wed, Jul 5, 2023 at 2:40 PM Gerion Entrup wrote: > > Am Mittwoch, 5. Juli 2023, 01:09:30 CEST schrieb Oskari Pirhonen: > > On Tue, Jul 04, 2023 at 21:56:26 +, Robin H. Johnson wrote: > > > > > > Developing it requires PMS work in addition to package manager > > > development, because it introduces phases. > > > > > > - primary fetch of $SRC_URI per ebuild, including indirect Manifest > > > - primary validation of distfiles > > > - secondary fetch of $SRC_URI per indirect Manifest > > > - secondary validation of additional distfiles > > > > > > A significantly impacted use case is "emerge -f", it now needs to run > > > downloads twice. > > > > I'm not sure double downloading is required. Consider a flow similar to > > this: > > > > 1. distfiles are fetched as per the ebuild > > 2. distfiles are hashed into a temporary Manifest > > 3. temporary Manifest is hashed and compared with the hashes stored in > >the in-tree Manifest for the direct Manifest > > This is exactly, what I meant. A webstorage is not needed. A second > download process is also not needed. Just an additional Manifest format > is needed for ebuilds with more than n distfiles. >

I suspect that Robin was proposing indirect Manifests AND indirect SRC_URIs, not just indirect Manifests. In any case, if he wasn't, then I'd suggest it would make sense to have both, so that we don't need giant lists of SRC_URIs or Go sums or whatever in ebuilds. Sure, the Manifests are even larger than the original file references, but those will still be long. Plus, if a file is used by 5 versions of an ebuild, it will be present in the Manifest once per hash function, but in the ebuilds 5 times.

I agree, though, that if only the Manifests are moved to a fetched file, then you could fetch that on the first pass, though you'd still need the extra logic to parse it. I'm not sure it makes much of a difference to the effort involved.

Aren't Go sums already content hashes?
It might make even more sense to create some kind of modular manifest verification logic in portage so that the same eclass that handles EGO_SUM could tell the package manager how to check the integrity of the files that are fetched. Well, assuming we trust whatever hash function they're using (I'm afraid to check - maybe this isn't such a great idea...). -- Rich
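On the question of whether Go sums are already content hashes: to my understanding, yes. The `h1:` values in go.sum come from Go's dirhash "Hash1" scheme, which SHA-256s each file, writes one `<hex digest>  <name>` line per file in sorted name order, SHA-256s that summary, and base64-encodes the result. The Python sketch below reimplements that scheme as I understand it, over an in-memory mapping of file names to contents; it is not Go's own code, and the module paths in the demo are hypothetical. One caveat worth hedging: the hash covers the extracted file tree rather than the raw .zip bytes that would be fetched, so a package manager would presumably have to extract a module before it could check it against EGO_SUM.

```python
"""Sketch of Go's "h1:" module hash (dirhash Hash1) over in-memory files."""
import base64
import hashlib


def hash1(files: dict[str, bytes]) -> str:
    """Return the h1: hash for a mapping of file name -> file contents."""
    summary = hashlib.sha256()
    for name in sorted(files):
        # The summary format is line-based, so names may not contain newlines.
        if "\n" in name:
            raise ValueError(f"file name contains a newline: {name!r}")
        file_digest = hashlib.sha256(files[name]).hexdigest()
        summary.update(f"{file_digest}  {name}\n".encode())
    return "h1:" + base64.b64encode(summary.digest()).decode()


if __name__ == "__main__":
    # Hypothetical module contents; real names look like "module@version/path".
    files = {
        "example.com/mod@v1.0.0/go.mod": b"module example.com/mod\n",
        "example.com/mod@v1.0.0/mod.go": b"package mod\n",
    }
    print(hash1(files))
```

The scheme is itself a small two-level hash tree (per-file digests, then a digest of the sorted list), with SHA-256 underneath.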
Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
Am Mittwoch, 5. Juli 2023, 01:09:30 CEST schrieb Oskari Pirhonen: > On Tue, Jul 04, 2023 at 21:56:26 +, Robin H. Johnson wrote: > > On Tue, Jul 04, 2023 at 12:44:39PM +0200, Gerion Entrup wrote: > > > just to be curious about the whole discussion. I did not follow in the > > > deepest detail but what I got is: > > > - EGO_SUM blows up the Manifest file, since every little Go module needs > > > to be respected. A lot of these Manifest files lead to a extremely > > > increased Portage tree size. EGO_SUM is just one example (though the > > > biggest one). Statically linked languages like Rust etc. have the same > > > problem. > > > - The current solution is to prepackage all modules, put it somewhere on > > > a webserver and just manifest that file. This make the Portage tree > > > small in size again, but requires a webserver/mirror and is thus > > > unfriendly for overlay devs. > > > > > > I'm not sure if it was mentioned before but has anyone considered hash > > > trees / Merkle trees for the manifest file? The idea would be to hash > > > the standard manifest file a second time if it gets too big and write > > > down that hash as new manifest file and leave EGO_SUM as is. > > This is out-of-tree/indirect Manifests, that I proposed here, more than > > a year ago: > > https://marc.info/?l=gentoo-dev=168280762310716=2 > > https://marc.info/?l=gentoo-dev=165472088822215=2 > > > > Developing it requires PMS work in addition to package manager > > development, because it introduces phases. > > > > - primary fetch of $SRC_URI per ebuild, including indirect Manifest > > - primary validation of distfiles > > - secondary fetch of $SRC_URI per indirect Manifest > > - secondary validation of additional distfiles > > > > A significantly impacted use case is "emerge -f", it now needs to run > > downloads twice. > > > > I'm not sure double downloading is required. Consider a flow similar to > this: > > 1. distfiles are fetched as per the ebuild > 2. 
distfiles are hashed into a temporary Manifest > 3. temporary Manifest is hashed and compared with the hashes stored in >the in-tree Manifest for the direct Manifest

This is exactly what I meant. No web storage is needed, and no second download process is needed either. All that is needed is an additional Manifest format for ebuilds with more than n distfiles.

> A new Manifest format would be required in order to differentiate the > current ones from an indirect one. This may require PMS changes, > although I suspect ammending GLEP 74 may be enough since the PMS seems > to just refer to the GLEP for a description of Manifests. > > This would also either rely on a stable ordering of Manifest contents > when generating it or having a separate file listing in the indirect > Manifest which corresponds to the order in the direct Manifest. For the > latter, it should also have separate entries for different package > versions so that every single distfile for every single version of said > package does not need to be fetched in order to build the direct > Manifest. > > I'm imagining something along these lines: > > INDIRECT true > PACKAGE category/package-version distfile1 distfile2 ... ALGO1 hash1 > ALGO2 hash2 ... > PACKAGE ...

Maybe it is reasonable to skip the distfile names entirely (or just provide a hash of the concatenated file names). Then the Manifest would contain just two or three hashes, however many distfiles the ebuild needs. Since these kinds of indirect Manifests should be rarer than the normal ones, a slightly longer processing time would not have much impact, I would say.

> Here `ALGO1` and `hash1` correspond to the hash of the direct Manifest > containing the distfiles (and potentially other files if a repo does not > have thin-manifests enabled) and their hashes in the order specified > previously.
> > The indirect Manifest as described above would be large-ish for a > package that has lots of distfiles, but likely much smaller than if each > distfile had its set of hashes stored directly.

Without storing the filenames, the Manifest file would have the same small size for any number of distfiles.

Gerion

> Please correct me if there's some detail I've overlooked. > > - Oskari > > > The rest of the posts also go into the matter of duplication within > > EGO_SUM & the indirect Manifests: limiting the growth requires some form > > of content-addressed layout. > > > > It's absolutely something we should get developed, but it's a lot of > > work. > > > > The indirect Manifests still provide a hosting challenge for overlays.
Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
On Tue, Jul 04, 2023 at 21:56:26 +, Robin H. Johnson wrote: > On Tue, Jul 04, 2023 at 12:44:39PM +0200, Gerion Entrup wrote: > > just to be curious about the whole discussion. I did not follow in the > > deepest detail but what I got is: > > - EGO_SUM blows up the Manifest file, since every little Go module needs > > to be respected. A lot of these Manifest files lead to a extremely > > increased Portage tree size. EGO_SUM is just one example (though the > > biggest one). Statically linked languages like Rust etc. have the same > > problem. > > - The current solution is to prepackage all modules, put it somewhere on > > a webserver and just manifest that file. This make the Portage tree > > small in size again, but requires a webserver/mirror and is thus > > unfriendly for overlay devs. > > > > I'm not sure if it was mentioned before but has anyone considered hash > > trees / Merkle trees for the manifest file? The idea would be to hash > > the standard manifest file a second time if it gets too big and write > > down that hash as new manifest file and leave EGO_SUM as is. > This is out-of-tree/indirect Manifests, that I proposed here, more than > a year ago: > https://marc.info/?l=gentoo-dev=168280762310716=2 > https://marc.info/?l=gentoo-dev=165472088822215=2 > > Developing it requires PMS work in addition to package manager > development, because it introduces phases. > > - primary fetch of $SRC_URI per ebuild, including indirect Manifest > - primary validation of distfiles > - secondary fetch of $SRC_URI per indirect Manifest > - secondary validation of additional distfiles > > A significantly impacted use case is "emerge -f", it now needs to run > downloads twice. > I'm not sure double downloading is required. Consider a flow similar to this: 1. distfiles are fetched as per the ebuild 2. distfiles are hashed into a temporary Manifest 3. 
temporary Manifest is hashed and compared with the hashes stored in the in-tree Manifest for the direct Manifest

A new Manifest format would be required in order to differentiate the current ones from an indirect one. This may require PMS changes, although I suspect amending GLEP 74 may be enough, since the PMS seems to just refer to the GLEP for a description of Manifests.

This would also rely either on a stable ordering of Manifest contents when generating it, or on a separate file listing in the indirect Manifest that corresponds to the order in the direct Manifest. For the latter, it should also have separate entries for different package versions, so that every single distfile for every single version of said package does not need to be fetched in order to build the direct Manifest.

I'm imagining something along these lines:

    INDIRECT true
    PACKAGE category/package-version distfile1 distfile2 ... ALGO1 hash1 ALGO2 hash2 ...
    PACKAGE ...

Here `ALGO1` and `hash1` correspond to the hash of the direct Manifest containing the distfiles (and potentially other files if a repo does not have thin-manifests enabled) and their hashes, in the order specified previously.

The indirect Manifest as described above would be large-ish for a package that has lots of distfiles, but likely much smaller than if each distfile had its set of hashes stored directly.

Please correct me if there's some detail I've overlooked.

- Oskari

> The rest of the posts also go into the matter of duplication within > EGO_SUM & the indirect Manifests: limiting the growth requires some form > of content-addressed layout. > > It's absolutely something we should get developed, but it's a lot of > work. > > The indirect Manifests still provide a hosting challenge for overlays.
> > -- > Robin Hugh Johnson > Gentoo Linux: Dev, Infra Lead, Foundation Treasurer > E-Mail : robb...@gentoo.org > GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85 > GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136
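To make the indirect Manifest layout Oskari sketches above a little more concrete, here is a minimal Python parser for it. The format is only a proposal in this thread, so everything here is hypothetical: the `IndirectEntry` and `parse_indirect` names, and in particular the assumption that distfile names run up to the first token that is a known hash name (`KNOWN_ALGOS`), which is one way to resolve the ambiguity between filenames and `ALGO hash` pairs on a single line.

```python
"""Parser sketch for the indirect Manifest layout proposed above (hypothetical)."""
from dataclasses import dataclass

# Assumption: the hash names the format may use. Distfile names are read up to
# the first token that matches one of these.
KNOWN_ALGOS = {"BLAKE2B", "SHA512"}


@dataclass
class IndirectEntry:
    cpv: str                # category/package-version
    distfiles: list[str]    # order in which to rebuild the direct Manifest
    hashes: dict[str, str]  # ALGO -> hash of that direct Manifest


def parse_indirect(text: str) -> list[IndirectEntry]:
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if not rows or rows[0] != ["INDIRECT", "true"]:
        raise ValueError("missing 'INDIRECT true' header")
    entries = []
    for tokens in rows[1:]:
        if tokens[0] != "PACKAGE" or len(tokens) < 2:
            raise ValueError(f"unexpected line: {' '.join(tokens)}")
        cpv, rest = tokens[1], tokens[2:]
        # Index of the first algorithm name; everything before it is a distfile.
        split = next((i for i, tok in enumerate(rest) if tok in KNOWN_ALGOS), len(rest))
        distfiles, pairs = rest[:split], rest[split:]
        if not pairs or len(pairs) % 2:
            raise ValueError(f"{cpv}: expected 'ALGO hash' pairs")
        hashes = dict(zip(pairs[0::2], pairs[1::2]))
        if not set(hashes) <= KNOWN_ALGOS:
            raise ValueError(f"{cpv}: unknown hash name in {sorted(hashes)}")
        entries.append(IndirectEntry(cpv, distfiles, hashes))
    return entries


if __name__ == "__main__":
    sample = """INDIRECT true
PACKAGE dev-go/example-1.0 a.tar.gz b.zip BLAKE2B 1a2b SHA512 3c4d
"""
    for entry in parse_indirect(sample):
        print(entry)
```

A distfile literally named like a hash algorithm would be misparsed here; a real format would probably want an explicit separator or a file count. Dropping the filenames, as Gerion suggests, would shrink each entry to the cpv plus the hash pairs.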
Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
On Tue, Jul 04, 2023 at 12:44:39PM +0200, Gerion Entrup wrote: > just to be curious about the whole discussion. I did not follow in the > deepest detail but what I got is: > - EGO_SUM blows up the Manifest file, since every little Go module needs > to be respected. A lot of these Manifest files lead to a extremely > increased Portage tree size. EGO_SUM is just one example (though the > biggest one). Statically linked languages like Rust etc. have the same > problem. > - The current solution is to prepackage all modules, put it somewhere on > a webserver and just manifest that file. This make the Portage tree > small in size again, but requires a webserver/mirror and is thus > unfriendly for overlay devs. > > I'm not sure if it was mentioned before but has anyone considered hash > trees / Merkle trees for the manifest file? The idea would be to hash > the standard manifest file a second time if it gets too big and write > down that hash as new manifest file and leave EGO_SUM as is.

This is the out-of-tree/indirect Manifests idea that I proposed here, more than a year ago:
https://marc.info/?l=gentoo-dev=168280762310716=2
https://marc.info/?l=gentoo-dev=165472088822215=2

Developing it requires PMS work in addition to package manager development, because it introduces phases:

- primary fetch of $SRC_URI per ebuild, including indirect Manifest
- primary validation of distfiles
- secondary fetch of $SRC_URI per indirect Manifest
- secondary validation of additional distfiles

A significantly impacted use case is "emerge -f", which would now need to run downloads twice.

The rest of the posts also go into the matter of duplication within EGO_SUM & the indirect Manifests: limiting the growth requires some form of content-addressed layout.

It's absolutely something we should get developed, but it's a lot of work.

The indirect Manifests still provide a hosting challenge for overlays.
-- Robin Hugh Johnson Gentoo Linux: Dev, Infra Lead, Foundation Treasurer E-Mail : robb...@gentoo.org GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85 GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136
Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
Am Dienstag, 4. Juli 2023, 09:13:30 CEST schrieb Tim Harder: > On 2023-07-03 Mon 04:17, Florian Schmaus wrote: > >On 30/06/2023 13.33, Eray Aslan wrote: > >>On Fri, Jun 30, 2023 at 03:38:11AM -0600, Tim Harder wrote: > >>>Why do we have to keep exporting the related variables that generally > >>>cause these size issues to the environment? > >> > >>I really do not want to make a +1 response but this is an excellent > >>question that we need to answer before implementing EGO_SUM. > > > >Could you please discuss why you make the reintroduction of EGO_SUM > >dependent on this question? > > Just to be clear, I don't particularly care about EGO_SUM enough to gate > its reintroduction (and don't have any leverage to do so anyway). I'm > just tired of the circular discussions around env issues that all seem > to avoid actual fixes, catering instead to functionality used by a > vanishingly small subset of ebuilds in the main repo that compels a > certain design mostly due to how portage functioned before EAPI 0. > > Other than that, supporting EGO_SUM (or any other language ecosystem > trending towards distro-unfriendly releases) is fine as long as devs are > cognizant how the related global-scope eclass design affects everyone > running or working on the raw repo. I hope devs continue leveraging the > relatively recent benchmark tooling (and perhaps more future support) to > improve their work. Along those lines, it could be nice to see sample > benchmark data in commit messages for large, global-scope eclass work > just to reinforce that it was taken into account. > > Tim

Hi,

just out of curiosity about the whole discussion. I did not follow it in the deepest detail, but what I got is:

- EGO_SUM blows up the Manifest file, since every little Go module needs its own entry. A lot of these Manifest files lead to an extremely increased Portage tree size. EGO_SUM is just one example (though the biggest one). Statically linked languages like Rust have the same problem.
- The current solution is to prepackage all modules into a tarball, put it somewhere on a web server and just Manifest that file. This makes the Portage tree small again, but requires a web server/mirror and is thus unfriendly to overlay devs.

I'm not sure if it was mentioned before, but has anyone considered hash trees / Merkle trees for the Manifest file? The idea would be to hash the standard Manifest file a second time if it gets too big, write down that hash as the new Manifest file, and leave EGO_SUM as is. When Portage tries to install the package, it can download all modules and build the "normal" Manifest file as usual, but instead of comparing it directly to the Manifest in the tree, it can hash it again and compare that to the provided Manifest. With this, Portage should have more or less the same guarantees about the validity of the source code, but the Manifest file consists of just two hashes again. What one would lose is the direct comparison of file names (they are included in the "meta" hash, though), or am I missing something?

Gerion
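The hash-the-Manifest-again idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Portage or GLEP 74 tooling works today: the function names are hypothetical, the `DIST` lines merely mimic the usual Manifest shape (name, size, BLAKE2B, SHA512), and sorting the lines by file name is my assumption to sidestep the ordering problem Oskari raises elsewhere in the thread. Because names and sizes are part of the rebuilt text, a renamed or resized distfile also changes the "meta" hash.

```python
"""Sketch of the "hash the Manifest again" idea (hypothetical helper names)."""
import hashlib


def dist_line(name: str, data: bytes) -> str:
    # Same shape as a Manifest DIST line: name, size, then hashes.
    return (f"DIST {name} {len(data)} "
            f"BLAKE2B {hashlib.blake2b(data).hexdigest()} "
            f"SHA512 {hashlib.sha512(data).hexdigest()}")


def direct_manifest(distfiles: dict[str, bytes]) -> str:
    # Assumption: lines sorted by name so every tool produces identical text.
    return "".join(dist_line(n, distfiles[n]) + "\n" for n in sorted(distfiles))


def meta_hashes(manifest_text: str) -> dict[str, str]:
    # The second-level hashes: the only thing stored in the tree.
    raw = manifest_text.encode()
    return {"BLAKE2B": hashlib.blake2b(raw).hexdigest(),
            "SHA512": hashlib.sha512(raw).hexdigest()}


def verify(distfiles: dict[str, bytes], expected: dict[str, str]) -> bool:
    if not expected:
        return False  # nothing to check against: refuse rather than pass
    actual = meta_hashes(direct_manifest(distfiles))
    return all(actual.get(algo) == value for algo, value in expected.items())


if __name__ == "__main__":
    fetched = {"mod-a.zip": b"aaa", "mod-b.zip": b"bbb"}  # hypothetical distfiles
    in_tree = meta_hashes(direct_manifest(fetched))       # what the repo would store
    print(verify(fetched, in_tree))                                # True
    print(verify({**fetched, "mod-b.zip": b"tampered"}, in_tree))  # False
```

The trade-off is diagnostics: a mismatch only says that something differs, not which distfile is wrong, since the per-file hashes are no longer in the tree.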
Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
On 2023-07-03 Mon 04:17, Florian Schmaus wrote:
>On 30/06/2023 13.33, Eray Aslan wrote:
>>On Fri, Jun 30, 2023 at 03:38:11AM -0600, Tim Harder wrote:
>>>Why do we have to keep exporting the related variables that generally cause these size issues to the environment?
>>
>>I really do not want to make a +1 response but this is an excellent question that we need to answer before implementing EGO_SUM.
>
>Could you please discuss why you make the reintroduction of EGO_SUM dependent on this question?

Just to be clear, I don't particularly care about EGO_SUM enough to gate its reintroduction (and don't have any leverage to do so anyway). I'm just tired of the circular discussions around env issues that all seem to avoid actual fixes, catering instead to functionality used by a vanishingly small subset of ebuilds in the main repo that compels a certain design mostly due to how portage functioned before EAPI 0.

Other than that, supporting EGO_SUM (or any other language ecosystem trending towards distro-unfriendly releases) is fine as long as devs are cognizant of how the related global-scope eclass design affects everyone running or working on the raw repo. I hope devs continue leveraging the relatively recent benchmark tooling (and perhaps more future support) to improve their work. Along those lines, it could be nice to see sample benchmark data in commit messages for large, global-scope eclass work just to reinforce that it was taken into account.

Tim
Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
On 30/06/2023 13.33, Eray Aslan wrote:
> On Fri, Jun 30, 2023 at 03:38:11AM -0600, Tim Harder wrote:
>> Why do we have to keep exporting the related variables that generally cause these size issues to the environment?
>
> I really do not want to make a +1 response, but this is an excellent question that we need to answer before implementing EGO_SUM.

Could you please discuss why you make the reintroduction of EGO_SUM dependent on this question?

Portage will show you a warning message if the exported environment approaches the kernel limit, and it will show a detailed error message if executing an ebuild failed due to the limit being reached. There seems to be no reason why you should not be able to allow EGO_SUM again without first fixing, for example, https://bugs.gentoo.org/721088.

- Flow
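Portage's actual implementation is not shown in this thread; the following is only a rough Python sketch of what such a runtime check could look like. It relies on the per-string limit `MAX_ARG_STRLEN = PAGE_SIZE * 32` from the kernel's `include/uapi/linux/binfmts.h`: each environment entry reaches `execve()` as one `NAME=value` string plus a terminating NUL. The 90% warning threshold and the function names are assumptions made for illustration.

```python
import os


def max_arg_strlen(page_size: int = 4096) -> int:
    """Kernel per-string limit for execve() arguments and environment entries."""
    return page_size * 32


# Hypothetical threshold: warn once a variable uses 90% of the limit.
WARN_RATIO = 0.9


def check_environment(env: dict[str, str], page_size: int = 4096) -> list[tuple[str, str]]:
    """Return (severity, name) pairs for variables near or over the limit."""
    limit = max_arg_strlen(page_size)
    findings = []
    for name, value in env.items():
        # One "NAME=value" string plus its terminating NUL byte.
        size = len(f"{name}={value}".encode("utf-8", "surrogateescape")) + 1
        if size > limit:
            findings.append(("error", name))
        elif size >= WARN_RATIO * limit:
            findings.append(("warning", name))
    return findings


if __name__ == "__main__":
    for severity, name in check_environment(dict(os.environ)):
        print(f"{severity}: {name} is close to or above the kernel limit")
```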
Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
On 30/06/2023 10.22, Sam James wrote:
> Florian Schmaus writes:
>> [in reply to a gentoo-project@ post, but it was asked to continue this on gentoo-dev@]
>>
>> On 28/06/2023 16.46, Sam James wrote:
>>> and questions remain unanswered on the ML (why not implement a check in pkgcheck similar to what is in Portage, for example)?
>>
>> On 2023-05-30 [1], I proposed a limit in the range of 1.5 to 2 MiB for the total package-directory size. I care little about which tool checks this limit, but pkgcheck is an obvious choice. I also suggested that we review this policy once the number of Go packages has doubled or two years after this policy was established (whichever comes first).
>>
>> But I fear you may be referring to another kind of check. You may be talking about a check that forbids EGO_SUM in ::gentoo but allows it in overlays.
>
> My position on this has been consistent: a check is needed to statically determine when the environment size is too big. Copying the Portage check into pkgcheck (in terms of the metrics) would satisfy this.

It is not as easy as merely copying existing Portage code into pkgcheck (unless I am missing something).

I've talked to arthurzam, and there appears to be a .environment file created by pkgcheck, which we could use to approximate the exported environment.

Another option would be to have pkgcheck count the EGO_SUM entries. The tree-sitter API for Bash, which pkgcheck already uses, seems to allow for that. That would differ from the check in Portage, although, IMHO, counting EGO_SUM entries would be sufficient.

> That is, regardless of raw size, I'm asking for a calculation based on the contents of EGO_SUM where, if exceeded, the package will not be installable on some systems. You didn't have an issue implementing this for Portage and I've mentioned this a bunch of times since, so I thought it was clear what I was hoping to see.

So pkgcheck counting EGO_SUM entries would be sufficient for the purpose of having a static check that notices if the ebuild would likely run into the environment limit? To find a common compromise, I would possibly invest my time in developing such a check, even though I do not consider it a strict prerequisite for reintroducing EGO_SUM.

>> Understandably, EGO_SUM can be considered ugly. Compared to a traditional Gentoo package, EGO_SUM-based ones are larger. The same is true for Rust packages. However, looking at the bigger picture, EGO_SUM's advantages outweigh its disadvantages.
>
> Again, I am on record as being fine with the general EGO_SUM approach, even if I wish we didn't need it, as I see it as inevitable for things like yarn, .NET, and of course Rust as we already have it. Just ideally not huge ones, and certainly not huge ones which then aren't even reliably installable because of environment size.

Talking about "reliably installable" makes it sound to me like there are cases where installing an EGO_SUM-based package sometimes works and sometimes does not. But the kernel limit is fixed and not even configurable, except, of course, by patching the source (and in the absence of architectures with a page size below 4 KiB) [1]. Any developer testing whether or not an ebuild is installable would immediately notice if the ebuild runs into the environment limit. That said, static code checks are always preferable to dynamic ones.

- Flow

1: https://elixir.bootlin.com/linux/v6.4.1/source/include/uapi/linux/binfmts.h#L15
Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
On Fri, Jun 30, 2023 at 03:38:11AM -0600, Tim Harder wrote:
> Why do we have to keep exporting the related variables that generally
> cause these size issues to the environment?

I really do not want to make a +1 response, but this is an excellent question that we need to answer before implementing EGO_SUM.

-- 
Eray
Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
On 2023-06-30 Fri 02:22, Sam James wrote:
> My position on this has been consistent: a check is needed to statically
> determine when the environment size is too big. Copying the Portage
> check into pkgcheck (in terms of the metrics) would satisfy this.
>
> That is, regardless of raw size, I'm asking for a calculation based on
> the contents of EGO_SUM where, if exceeded, the package will not be
> installable on some systems. You didn't have an issue implementing this
> for Portage and I've mentioned this a bunch of times since, so I thought
> it was clear what I was hoping to see.
>
> I would also like (which is not what I was referring to here) some
> limit on the size, given that we already have a limit on the size of
> ${FILESDIR}, but this is less of a concern for me given it's bounded
> by the aforementioned environment size check.

Why do we have to keep exporting the related variables that generally cause these size issues to the environment? I've asked as much on IRC multiple times (nearly every time this discussion has been brought up), and the answers I've gotten are some variation on "it's always been that way" or "not exporting them would break using commands as external programs" (e.g. calling via xargs). The first response isn't a great argument, and the second, while more valid, also feels less important than having a more minimal exported environment that causes fewer issues like this one and others, such as affecting a package's build system in unexpected ways.

See bug #721088 for the related discussion on environment variable exports.

From my stance, the spec should state that the only variables to be exported are ones that are already "semi-standard" and used outside of package manager internals in the expected fashion, which probably only includes HOME, TMPDIR, and maybe ROOT. This would of course currently break packages that use `xargs` while calling internal commands that depend on some of those exported variables, but from a cursory glance at the gentoo repo, there aren't many ebuilds using that functionality, and those that do could generally be written in an easier-to-understand fashion without xargs. It should also be possible to proxy the required variables to those commands in various fashions without using the environment, if using commands externally is extremely important to the few ebuild maintainers who make use of that functionality.

In short, adding checks to portage and pkgcheck feels like an ill-suited workaround that foists hacking around the error onto users or developers due to a poor decision made decades ago on environment handling.

Tim
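The allow-list idea above, illustrated in Python rather than in the bash that Portage actually uses to run ebuild phases: the child process only sees `HOME`, `TMPDIR` and (maybe) `ROOT`, and anything else a helper genuinely needs has to be handed over explicitly (here via a hypothetical `extra` mapping; command-line arguments would work as well). `minimal_env` and `run_helper` are made-up names, not part of Portage or PMS.

```python
import subprocess
from typing import Mapping, Optional, Sequence

# The "semi-standard" variables suggested for export.
EXPORTED_VARS = ("HOME", "TMPDIR", "ROOT")


def minimal_env(full_env: Mapping[str, str],
                allowed: Sequence[str] = EXPORTED_VARS) -> dict[str, str]:
    """Keep only the allow-listed variables that are actually set."""
    return {name: value for name, value in full_env.items() if name in allowed}


def run_helper(cmd: Sequence[str], full_env: Mapping[str, str],
               extra: Optional[Mapping[str, str]] = None) -> str:
    """Run an external command with the minimal environment.

    Variables a specific helper really needs are passed explicitly via
    `extra`, instead of exporting every package-manager variable.
    """
    env = minimal_env(full_env)
    if extra:
        env.update(extra)
    result = subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)
    return result.stdout
```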
Re: [gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
Florian Schmaus writes:
> [in reply to a gentoo-project@ post, but it was asked to continue this
> on gentoo-dev@]
>
> On 28/06/2023 16.46, Sam James wrote:
>> Florian Schmaus writes:
>>> On 17/06/2023 10.37, Arthur Zamarin wrote:
>>>> I also want to nominate people who I feel contribute a lot to Gentoo
>>>> and I have a lot of interaction with (ordered by name, not priority):
>>>> […]
>>>> flow
>>>
>>> I apologize for the late reply, and thank you for the nomination. I am
>>> honored and accept.
>>>
>>> As many of you know, I am spending a lot of time on the EGO_SUM
>>> situation, as it is one of the most critical issues to solve.
>>>
>>> I have used the last few days to carefully consider whether a seat on
>>> the council is more harmful or beneficial to my efforts regarding
>>> EGO_SUM. On the one hand, council work means I have less time to
>>> improve the EGO_SUM situation. On the other hand, a seat in the
>>> council increases the probability of positively influencing Gentoo's
>>> future, also regarding EGO_SUM.
>>>
>> That's fine and it's great to see more people running!
>
> Excellent that we share this view. :)
>
>> But with regard to EGO_SUM: you didn't appear at the meeting where we
>> discussed your previous EGO_SUM proposal,
>
> Naive as I am, I expected that the mailing list would be used for
> discussion and that the council meeting would be used chiefly for
> voting and intra-council discussion. And since the request to the
> council to vote on a concrete proposal was preceded by a
> multiple-week, if not month-long, mailing list discussion, I assumed
> that my presence in the council meeting was optional.
>
> Had I known that my presence was required, or that the absence in the
> meeting would be blamed on me afterward, I would have appeared if
> possible.

I'm not blaming you for anything. But you didn't speak in #gentoo-council before the meeting (a few days before, IIRC) when we were discussing the problem, I pinged you during the meeting, and you didn't appear there afterwards. You also didn't seem to respond to the council decision (or non-decision) in that meeting, unless I've missed it.

It seems self-evident that discussion would happen in the meeting before voting...? What am I misunderstanding? We regularly discuss things before voting on them. Do you normally observe council meetings? I don't think what we did in this instance was at all unusual.

(Also: there's the issue of whether or not the council should really be voting on overriding an eclass maintainer who would then be forced to keep something working they don't want to. mgorny raised that.)

>> and questions remain unanswered on the
>> ML (why not implement a check in pkgcheck similar to what is in Portage,
>> for example)?
>
> On 2023-05-30 [1], I proposed a limit in the range of 1.5 to 2 MiB for
> the total package-directory size. I care little about which tool
> checks this limit, but pkgcheck is an obvious choice. I also
> suggested that we review this policy once the number of Go packages
> has doubled or two years after this policy was established (whichever
> comes first).
>
> But I fear you may be referring to another kind of check. You may be
> talking about a check that forbids EGO_SUM in ::gentoo but allows it
> in overlays.

My position on this has been consistent: a check is needed to statically determine when the environment size is too big. Copying the Portage check into pkgcheck (in terms of the metrics) would satisfy this.

That is, regardless of raw size, I'm asking for a calculation based on the contents of EGO_SUM where, if exceeded, the package will not be installable on some systems. You didn't have an issue implementing this for Portage and I've mentioned this a bunch of times since, so I thought it was clear what I was hoping to see.

I would also like (which is not what I was referring to here) some limit on the size, given that we already have a limit on the size of ${FILESDIR}, but this is less of a concern for me given it's bounded by the aforementioned environment size check.

> Understandably, EGO_SUM can be considered ugly. Compared to a
> traditional Gentoo package, EGO_SUM-based ones are larger. The same is
> true for Rust packages. However, looking at the bigger picture,
> EGO_SUM's advantages outweigh its disadvantages.

Again, I am on record as being fine with the general EGO_SUM approach, even if I wish we didn't need it, as I see it as inevitable for things like yarn, .NET, and of course Rust as we already have it. Just ideally not huge ones, and certainly not huge ones which then aren't even reliably installable because of environment size.
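One way to connect a "calculation based on the contents of EGO_SUM" with an entry count is to turn the kernel's per-string limit into an entry budget. A sketch, assuming the expanded entries end up space-separated in a single exported variable (which variable that is depends on the eclass and on Portage, so its name is a parameter here) and that the average expanded entry length is known. The `max_entries` helper is hypothetical. N entries of average length L joined by single spaces in `NAME=...` take len("NAME=") + N·(L + 1) bytes including the final NUL, which must not exceed `PAGE_SIZE * 32`. Because `MAX_ARG_STRLEN` scales with the page size, the same ebuild gets a different budget on, e.g., 4 KiB and 64 KiB page systems.

```python
def max_entries(avg_entry_bytes: int, var_name: str, page_size: int = 4096) -> int:
    """Largest N whose space-joined expansion still fits in one exported variable.

    Constraint: len("NAME=") + N * avg + (N - 1) spaces + 1 NUL <= PAGE_SIZE * 32,
    which simplifies to N * (avg + 1) <= limit - len("NAME=").
    """
    limit = page_size * 32                 # MAX_ARG_STRLEN
    prefix = len(var_name.encode()) + 1    # "NAME="
    return max(0, (limit - prefix) // (avg_entry_bytes + 1))


if __name__ == "__main__":
    # "VAR" is a placeholder name; 100 bytes per entry is an arbitrary example.
    for page in (4096, 65536):
        print(page, max_entries(100, "VAR", page))
```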
[gentoo-dev] EGO_SUM (was: [gentoo-project] Gentoo Council Election 202306 ... Nominations Open in Just Over 24 Hours.)
[in reply to a gentoo-project@ post, but it was asked to continue this on gentoo-dev@]

On 28/06/2023 16.46, Sam James wrote:
> Florian Schmaus writes:
>> On 17/06/2023 10.37, Arthur Zamarin wrote:
>>> I also want to nominate people who I feel contribute a lot to Gentoo and I have a lot of interaction with (ordered by name, not priority):
>>> […]
>>> flow
>>
>> I apologize for the late reply, and thank you for the nomination. I am honored and accept.
>>
>> As many of you know, I am spending a lot of time on the EGO_SUM situation, as it is one of the most critical issues to solve.
>>
>> I have used the last few days to carefully consider whether a seat on the council is more harmful or beneficial to my efforts regarding EGO_SUM. On the one hand, council work means I have less time to improve the EGO_SUM situation. On the other hand, a seat in the council increases the probability of positively influencing Gentoo's future, also regarding EGO_SUM.
>
> That's fine and it's great to see more people running!

Excellent that we share this view. :)

> But with regard to EGO_SUM: you didn't appear at the meeting where we discussed your previous EGO_SUM proposal,

Naive as I am, I expected that the mailing list would be used for discussion and that the council meeting would be used chiefly for voting and intra-council discussion. And since the request to the council to vote on a concrete proposal was preceded by a multiple-week, if not month-long, mailing list discussion, I assumed that my presence in the council meeting was optional.

Had I known that my presence was required, or that the absence in the meeting would be blamed on me afterward, I would have appeared if possible.

> and questions remain unanswered on the ML (why not implement a check in pkgcheck similar to what is in Portage, for example)?

On 2023-05-30 [1], I proposed a limit in the range of 1.5 to 2 MiB for the total package-directory size. I care little about which tool checks this limit, but pkgcheck is an obvious choice. I also suggested that we review this policy once the number of Go packages has doubled or two years after this policy was established (whichever comes first).

But I fear you may be referring to another kind of check. You may be talking about a check that forbids EGO_SUM in ::gentoo but allows it in overlays. However, as stated before [2], this is not a viable approach. One reason why it is not practicable is auditability.

> The blocker is not a council seat, it's about addressing people's concerns...

Unfortunately, it appears that I am terrible at convincing everyone that the deprecation of EGO_SUM was a mistake. I tried to respond to every concern. Often, the response included arguments based on factual data. But in the end, I could only expect to convince some, as the EGO_SUM question touches the subjective realm of style.

I know that the EGO_SUM situation and the resulting discussion grew huge and left many understandably bored or confused, so that they turned away. But that is a pity, because it is a relevant discussion for Gentoo's long-term success.

The bottom line is that the EGO_SUM discussion yielded no evidence, or even a slight indication, that EGO_SUM was deprecated based on technical issues. Instead, it appears that EGO_SUM was deprecated because some deemed it unaesthetic.

Understandably, EGO_SUM can be considered ugly. Compared to a traditional Gentoo package, EGO_SUM-based ones are larger. The same is true for Rust packages. However, looking at the bigger picture, EGO_SUM's advantages outweigh its disadvantages.

- Flow

1: https://marc.info/?l=gentoo-dev=168546196902731 <25308876-7ac4-8c90-8641-1034cc67c...@gentoo.org>
2: https://marc.info/?l=gentoo-dev=168569387514376 <012fa74d-2910-ea90-6008-26cc23604...@gentoo.org>
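The alternative proposal above, a limit of roughly 1.5 to 2 MiB on the total package-directory size, is easy to check statically. A minimal Python sketch; the 2 MiB default, the decision to skip symlinks, and the function names are assumptions, and real tooling (pkgcheck or otherwise) might measure sizes differently, e.g. by disk usage instead of apparent size.

```python
import os
from typing import Optional

MIB = 1024 * 1024
# Upper end of the proposed 1.5-2 MiB range; adjust as policy dictates.
DEFAULT_LIMIT = 2 * MIB


def package_dir_size(pkg_dir: str) -> int:
    """Sum the apparent sizes of all regular files below pkg_dir (incl. files/)."""
    total = 0
    for root, _dirs, files in os.walk(pkg_dir):
        for name in files:
            path = os.path.join(root, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def check_package_dir(pkg_dir: str, limit: int = DEFAULT_LIMIT) -> Optional[str]:
    """Return a warning string if the directory exceeds the limit, else None."""
    size = package_dir_size(pkg_dir)
    if size > limit:
        return f"{pkg_dir}: {size / MIB:.2f} MiB exceeds limit of {limit / MIB:.2f} MiB"
    return None
```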