On Tuesday, November 5, 2019 7:05 PM, Mickaël Bucas <mbu...@gmail.com> wrote:
> Hi Caveman
>
> The Portage tree contains a few binary packages prepared by Gentoo
> developers, like Firefox, Rust, LibreOffice...
> "ls -d /usr/portage//-bin" shows about 90 packages prepared in this
> way, some of them because they are non-free like Oracle JDK
>
> This means that there is no necessary changes to Gentoo to accomplish
> what you describe : compile the packages, write the ebuilds for the
> binary packages, publish ebuilds in an overlay.

Some Qt-related packages are really slow to compile, yet are still not listed.
A problem with this approach is that IMO it's too manual and doesn't react
dynamically to changes in what users need.

IMO we can consider this an automated community-driven bin-host that uses
statistics in order to tell which packages are reliable.  In case of hardware
mismatches, I think we can find a binary that's compiled with the desired,
say, USE flags, but compiled on an older CPU model that the newer, rarer CPU
one might be using is backward compatible with.

> But the really short list above shows that it's a really complex task
> because of all dependencies and configurable elements in Gentoo. If
> you just have a look at the output of "emerge --info" you can imagine
> all the moving parts, like compiler versions and compile options,
> Bash, Perl, Python, Init system, USE flags (combinatorial), even human
> languages. And that is just the easily visible parts !

True, however a few points:

* If we look at that info from the perspective of individual packages, it has
  far fewer degrees of variation in practice.  E.g. if we look at the USE
  flags dimension, dev-qt/qtwebengine has 12 of them, so worst case for this
  aspect we get about:

    nchoosek(12,1) + nchoosek(12,2) + ... + nchoosek(12,12) = 4095

  possible combinations with those 12 flags.  But I suspect most people are
  only interested in 2 sets of potential USE flag configurations, one with
  ALSA and another with PA.  So in practice, that 4095 is probably reduced to
  just 2 or 3 clusters of configurations.

* For hardware details, such as the exact CPU model and the features the
  compiler actually enables with `-march=native`, I don't know the actual
  distribution in practice.  But is it not possible to give users the choice
  of simply picking a binary that was compiled for an older CPU that their
  own CPU is backward compatible with?

  E.g. the system could offer the user the binary nearest to his query (e.g.
  in its selection of USE flags), even if it was compiled on an older x86-64
  CPU model than his newer x86-64 CPU.

  This way, this could simply become an automated bin-host that blurs as
  necessary, and forks variations of specific configurations as demand rises,
  all without needing dev time to package *-bin manually.
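
A rough sketch of the selection idea above, in Python.  Everything in it is
hypothetical: the `Binary` record, the `CPU_LEVELS` ordering, the distance
metric and the sample flags are assumptions for illustration, not anything
Portage provides today.  It only shows "nearest USE flags among the binaries
that can run on my CPU".

```python
from dataclasses import dataclass

# Hypothetical linear ordering of CPU feature levels, oldest first.
# Assumption: a binary built for an earlier level runs on any later one.
CPU_LEVELS = ["x86-64", "x86-64-v2", "x86-64-v3", "x86-64-v4"]


@dataclass(frozen=True)
class Binary:
    package: str
    use: frozenset   # USE flags enabled at build time
    cpu: str         # CPU level the binary was built for


def runs_on(binary_cpu, host_cpu):
    """True if a binary built for binary_cpu can run on host_cpu."""
    return CPU_LEVELS.index(binary_cpu) <= CPU_LEVELS.index(host_cpu)


def pick_binary(candidates, package, wanted_use, host_cpu):
    """Return the runnable binary whose USE flags are nearest to the request.

    Distance is the number of flags that differ (symmetric difference);
    ties go to the binary built for the newest compatible CPU level.
    Returns None when no candidate can run on the host.
    """
    wanted = frozenset(wanted_use)
    runnable = [b for b in candidates
                if b.package == package and runs_on(b.cpu, host_cpu)]
    if not runnable:
        return None
    return min(runnable,
               key=lambda b: (len(b.use ^ wanted), -CPU_LEVELS.index(b.cpu)))


# Made-up bin-host contents for one package.
bins = [
    Binary("dev-qt/qtwebengine", frozenset({"alsa"}), "x86-64"),
    Binary("dev-qt/qtwebengine", frozenset({"pulseaudio"}), "x86-64-v2"),
    Binary("dev-qt/qtwebengine", frozenset({"pulseaudio"}), "x86-64-v4"),
]
choice = pick_binary(bins, "dev-qt/qtwebengine", {"pulseaudio"}, "x86-64-v3")
print(choice)
```

Real CPU feature sets are not a strict linear order, so an actual
implementation would probably have to compare sets of instruction-set
extensions instead of list positions.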


> I remember reading an article about a man trying to reproduce binary
> packages of a binary distribution and failing to do so, because there
> are so many parts involved. I've read later that distributions have
> done some work to have reproducible builds, but I'm not sure how
> successful they are, even when all choices are predefined.
>
> Given that Gentoo has taken a whole different road by having more
> choices available to the user, I don't think the compilation results
> of one configuration would be easily used on another.

Is it possible to collect statistics of such configurations from Gentoo users?

I don't know what the outcome would be, but I think it's worth exploring.  E.g.
what if it turned out that there is not much diversity in our
settings?  E.g. we might find a few really popular clusters of USE, language
and license flags.  As for hardware, what would be the latest CPU that mine is
backward compatible with, and that has compiled a binary for me with enough
statistical confidence in its reliability?
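
To make the question concrete, here is a small Python sketch of the kind of
statistic I mean.  The `reports` data is made up and `popular_configs` is a
hypothetical helper; no such data collection exists as far as I know.  It
counts how often each USE flag set is reported for one package, and returns
the few most common sets that together cover a chosen share of users.

```python
from collections import Counter


def popular_configs(reports, coverage=0.9):
    """Return the most common configurations that together cover at least
    `coverage` of all reports, most popular first."""
    counts = Counter(frozenset(r) for r in reports)
    total = sum(counts.values())
    chosen, covered = [], 0
    for config, n in counts.most_common():
        if covered / total >= coverage:
            break
        chosen.append((config, n))
        covered += n
    return chosen


# Made-up sample: each entry is the set of USE flags one user enabled.
reports = (
    [{"alsa"}] * 6
    + [{"pulseaudio"}] * 3
    + [{"alsa", "bindist", "geolocation"}]
)

n_flags = 12
# 2**12 counts every flag as on or off, including the all-off case;
# the nchoosek sum earlier in this mail leaves that one case out.
print("theoretical maximum:", 2 ** n_flags)
print("distinct configs seen:", len({frozenset(r) for r in reports}))
for config, n in popular_configs(reports, coverage=0.8):
    print(sorted(config), n)
```

If real data looked anything like this sample, building binaries for only the
top two flag sets would already serve most users.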


> To go even further, pushing your compiled packages to a public server
> may create a security risk by exposing many parts of your
> configuration that could be analyzed by malicious people.

Any example of such sensitive information that might be in the binaries?  Just
curious, as I don't know much about this.

I could be wrong, but so far my thought is that we don't gain many bits of
entropy for our security by hiding our package lists, because I think an
adversary can probably already use statistics to predict the common clusters
of package lists that we might use.

So I personally doubt that attackers would face much difficulty from not
knowing our packages, because our packages are probably already predictable
since our distribution of packages is not that diverse.


> So far I don't see a really big advantage in building this kind of
> infrastructure compared to either a binary distribution or Gentoo with
> home compilation.

IMO the real value is that it will be some kind of an automated community-driven
bin-host that uses statistics to quantify the reliability of its bins, and to
automatically create bins of special cases as the demand rises (e.g. common USE
flag combinations that become trendy), without needing to wait for a package
maintainer to bundle a *-bin package.

I think, if this works, it may make Gentoo even better at binary packages than
the mainly binary distros.


rgrds,
cm

