More thoughts, and some significant progress in this area....

I spent most of yesterday collecting the efsdeploy rules for
EVERYTHING I've built into /efs/dist over the last few months (it's a
lot), by copying the src directory to:

    ~/dev/efs/deploy-config/$metaproj/$project

If you reduce the efsdeploy.conf files to nothing but the site-specific
information, you end up with VERY VERY little site-specific metadata,
and it all falls into one of two categories:

(a) Platform/compilers customization
(b) Dependency specifications

Let's discuss these in reverse order....

(b) Dependency specifications

The problem with the dependencies is that the specific releasealiases
(i.e. versions) of the dependencies are really site specific.  The
generic rules describe how to install, say gnu/gcc, but they only
specify the dependencies to the project level.   I.e. gnu/gcc has
dependencies on gnu/mpfr and gnu/gmp, for example, but the rules do
NOT say which version.  That is determined by whatever version you
(the site administrator) decided to download and install.

As I mentioned a week or two ago, I'm going to move the "depends" step
from the end of the efsdeploy workflow to the very beginning, before
source, but after download.  (The reason for post-download is so we
can someday inspect the source, and perhaps do some automated
dependency discovery, but this is VERY hard).   The depends
functionality will change as follows...

The purpose of this command will be to expand the generic dependency
specification, which will usually just be M/P, into a set of M/P/R
dependencies, and those will be stashed into a dynamically generated
file (probably ../build/common/efsdeploy-depends.conf, or something
similar), which will then FIX these dependencies for the rest of the
build process.  If you want to change them, then you'll need to re-run
the depends command.
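
A minimal sketch of that expansion step, written in Python purely for
illustration.  The function names, the in-memory table of available
releases, the release numbers, and the layout of the generated file
are all assumptions; this is not the efsdeploy implementation.

```python
def version_key(release):
    """Turn '5.0.1' into (5, 0, 1) so releases compare numerically."""
    return tuple(int(p) if p.isdigit() else -1 for p in release.split("."))


def pin_dependencies(generic, available):
    """Expand M/P specs into M/P/R, leaving explicit M/P/R specs alone.

    generic:   {name: "M/P M/P ..."}      (hypothetical input shape)
    available: {"M/P": [release, ...]}    (hypothetical release table)
    """
    pinned = {}
    for name, specs in generic.items():
        resolved = []
        for spec in specs.split():
            if len(spec.split("/")) == 3:
                # Already M/P/R: an explicit choice overrides discovery.
                resolved.append(spec)
                continue
            releases = available.get(spec)
            if not releases:
                raise LookupError(f"no release found for {spec}")
            resolved.append(f"{spec}/{max(releases, key=version_key)}")
        pinned[name] = " ".join(resolved)
    return pinned


def render_depends_conf(pinned):
    """Render the pinned set as text for the generated depends file."""
    lines = ["# generated by depends; re-run depends to change", "[depends]"]
    lines += [f"    {name} = {value}" for name, value in pinned.items()]
    return "\n".join(lines) + "\n"
```

Once rendered and written out, the rest of the build would read only
the generated file, so the dependencies stay fixed until depends is
run again.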

This will be entirely backwards compatible, because it will still be
possible to specify explicit dependencies in efsdeploy.conf, which
will override any dynamic discovery mechanism, however the dependency
specification format will probably change going forward.   Following
my usual approach of keeping things simple, something like this:

For gnu/gcc versions later than 4.5, you need 3 key dependencies, and
these would be specified in efsdeploy/globals.conf:

[depends]
    c_runtime = gnu/gmp gnu/mpfr gnu/mpc

When the dependencies are "compiled" and cached, each of these would
be expanded to pick the "latest" version available.  What does that
mean?   The latest release number (not a trivial problem, but
solvable) that includes all the required platform support.   We'll
start with some simple heuristics, and add complexity as it is needed.
For example, if we knew that something had to use gnu/mpfr/5.0.*,
because 5.1.* is unstable, then we could specify things this way:

[depends]
    c_runtime = gnu/gmp gnu/mpfr/5.0.* gnu/mpc

This would tell the dependency expansion algorithm that we want to use
5.0.1 or 5.0.2 or whatever the latest 5.0.* release is.   This will
cover 80-90% of the use cases I've seen, and we can use hooks when we
need more complex logic (there's a depends hook already, and I've only
had to use it twice).

Again, start simple, and add complexity ONLY as necessary.
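
One way the "latest matching release" heuristic could look, sketched
in Python.  Everything here is an assumption made for illustration:
the expand() name, the catalogue structure mapping each release to
the platforms it was built for, and the use of shell-style glob
matching for the release pattern.

```python
from fnmatch import fnmatchcase


def version_key(release):
    """Turn '5.0.2' into (5, 0, 2) so releases compare numerically."""
    return tuple(int(p) if p.isdigit() else -1 for p in release.split("."))


def expand(spec, catalogue, platforms):
    """Expand 'M/P' or 'M/P/pattern' to the latest usable 'M/P/R'.

    catalogue: {"M/P": {release: set_of_platforms}}  (hypothetical)
    platforms: the platforms the dependent build needs
    """
    parts = spec.split("/")
    project = "/".join(parts[:2])
    pattern = parts[2] if len(parts) > 2 else "*"
    needed = set(platforms)
    candidates = [
        release
        for release, supported in catalogue.get(project, {}).items()
        # Keep releases that match the pattern AND cover every platform.
        if fnmatchcase(release, pattern) and needed <= set(supported)
    ]
    if not candidates:
        raise LookupError(f"nothing satisfies {spec} for {sorted(needed)}")
    return f"{project}/{max(candidates, key=version_key)}"
```

With a catalogue holding 5.0.1, 5.0.2 and 5.1.0, the spec
gnu/mpfr/5.0.* would resolve to gnu/mpfr/5.0.2 and plain gnu/mpfr to
gnu/mpfr/5.1.0.  Anything the glob cannot express is left to the
depends hook.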

(a) Platform/compilers customization

This information is really project-specific, or releaseALIAS specific,
so I'm thinking this might be a good place to leverage EFS's built-in
attributes mechanism.  For example, these definitions would let me
extract the way I manage gnu/gcc:

# /efs/dist/rhel is RHEL-only
efs setattr metaproj rhel efsdeploy_platforms x86-32.rhel.5,x86-64.rhel.5

efs setattr releasealias gnu gcc 4.4 \
    efsdeploy_platforms "x86-32.rhel.5 x86-64.rhel.5
sparc32.sunos.5.10 sparc64.sunos.5.10 powerpc32.aix.6"
efs setattr releasealias gnu gcc 4.5 \
    efsdeploy_platforms "x86-32.rhel.5 x86-64.rhel.5"
efs setattr releasealias gnu gcc 4.6 \
    efsdeploy_platforms "x86-32.rhel.5 x86-64.rhel.5"

That is a site-specific decision.  You *could* try to build 4.5 and
4.6 for the other platforms, but I've decided not to.   The gnu/gcc
efsdeploy rules do NOT assert this.   However, they DO assert that
gnu/gcc can't be compiled on powerpc64.aix.6, so the first attribute
isn't necessary, really.

If I go this route, I'll probably support *_skip attributes, just like
efsdeploy does, AND this will be extended to specifying compilers as
well.   I am also thinking about a VERY simple way to specify things
by OS or CPU, as well.
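
A rough sketch of how such attributes might be combined into the
final platform list, in Python.  This is speculative: the
effective_platforms() name, the efsdeploy_platforms_skip attribute
name, and the rule that each level can only narrow the list are all
assumptions, not decided behavior.

```python
def _split(value):
    """Accept either comma- or whitespace-separated platform lists."""
    return set(value.replace(",", " ").split())


def effective_platforms(rule_platforms, metaproj_attrs, release_attrs):
    """Narrow the platforms the generic rules allow by site attributes.

    rule_platforms: platforms the generic efsdeploy rules permit
    *_attrs:        attribute dicts for the metaproj and releasealias
    """
    allowed = list(rule_platforms)
    for attrs in (metaproj_attrs, release_attrs):
        wanted = attrs.get("efsdeploy_platforms")
        if wanted is not None:
            # An explicit list restricts what was allowed so far.
            keep = _split(wanted)
            allowed = [p for p in allowed if p in keep]
        # Hypothetical *_skip attribute: remove platforms by name.
        skipped = _split(attrs.get("efsdeploy_platforms_skip", ""))
        allowed = [p for p in allowed if p not in skipped]
    return allowed
```

The generic rules stay the upper bound; the site attributes only
subtract from it, which keeps the site-specific part small.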

If I added those two features (the depends changes are underway,
already, the platform/compiler stuff I'm still thinking about), then I
would be able to remove ALL of the site-specific information I
currently have in the stuff in:

    ~/dev/efs/deploy-config/$metaproj/$project

and it will be 100% generic.

OK, that's enough to digest in one email...  Next up, my ideas for
how to manage the stuff in git, once it's been made generic and
sharable.

On Wed, Dec 28, 2011 at 10:41 AM, Phillip Moore
<[email protected]> wrote:
> I'm very close to being able to create this git repo, and begin
> experimenting with git2efs implementations, but I'm really not sure
> that a git-repo per-project is going to scale.   If we follow this
> path, we'll end up with a LOT of git repos on openefs.org, since there
> will easily be close to 1000 projects to manage this way, over time.
>
> Having said that, I'm going to move forward with this for now, and not
> get hung up prematurely optimizing something.  I fully expect this to
> evolve, just like everything else in EFS.
>
> The bigger problem with managing the efsdeploy data is that the data
> isn't segregated correctly.   There are site-specific configuration
> values interleaved with generic values, and you can't commit most of
> the existing efsdeploy.conf files into a shared repo because of this.
>  For example, the specific releases used for dependencies, and the
> list of platforms to build are site-specific, NOT generic, yet
> everything else in efsdeploy.conf is generic.   Refactoring the
> gnu/gcc and gnu/gcclib rules exposed this, but I think I have a
> solution.
>
> efsdeploy has always supported a hierarchy of config files, and it was
> always possible to override things in commands.conf, etc (as of a
> patch yesterday, this includes globals.conf, too).   This allows me to
> reduce efsdeploy.conf to ONLY the site-specific customizations, by
> moving things like:
>
> [options]
>    use_vpath_build = 1
>
> from efsdeploy.conf to efsdeploy/globals.conf.   In the code I will
> shortly be committing to the new gnu-gcc repo, the efsdeploy.conf file
> will be empty, with nothing but comments.  I'm also considering naming
> this file efsdeploy.conf.tmpl, and adding efsdeploy.conf to
> .gitignore.  Right now, efsdeploy requires that the efsdeploy.conf
> file exist, but I can probably make it optional.
>
> This will make efsdeploy.conf 100% site specific, and solves the data
> segregation problem fairly well.   The last issue is then how to
> manage the efsdeploy.conf file itself with git (or svn, or whatever).
> The challenge here is that I don't think you can have a single
> directory that contains files managed by two different git repos, and
> even if you could, I suspect it's a fragile configuration.  That
> suggests we look for a way to manage these files in different
> directory locations.
>
> I do not have a solution for this handy yet, but this first pass at
> the problem will at least reduce the scope of the data we are NOT
> managing in git to a very small subset, and then once we "see" the
> bigger picture and understand the nature of the data we are managing
> in the reduced efsdeploy.conf file, a solution will present itself.
> They do this to me, sometimes....
>
> This will allow me to start sketching out a "git2efs" utility, which
> can hopefully be made generic.
>
> Plowing forward....
>
> On Fri, Oct 28, 2011 at 9:49 AM, Phillip Moore
> <[email protected]> wrote:
>> I'm working on a solution to the problem of how to manage the efsdeploy
>> configuration information, hooks, patches, etc.
>>
>> Right now, we're not putting this information into source code control
>> ANYWHERE.   Not in EFS 2 land, not in EFS 3, nowhere.   This isn't such a
>> major problem because the /efs/dev namespace effectively versionizes this
>> stuff every time you checkpoint a new release, but it's a problem because
>> we're not sharing this information, and we're not tracking ANY of the
>> changes anywhere.
>>
>> I've got about 5 pages of notes I made on how to tackle this and a solution
>> is starting to form in my mind.   The basic idea is that each project will
>> have its own git repo, and efsdeploy will be enhanced to know how to talk
>> to it, very basically.  In order to start playing around with the idea, and
>> experimenting with automation for it, I need to create a few new git repos
>> on openefs.org.
>>
>> The convention I'm currently considering is:
>>
>>     git://git.openefs.org/efs-deploy-config-$metaproj-$project.git
>>
>> Then efsdeploy will get some new commands, for example:
>>
>>     scminit
>>     scmpull
>>     scmpush
>>
>> For metaproj's like gnu, where we install each project by hand, this
>> approach will work pretty well.  For ones like perl5, where most of the
>> projects are created automatically by cpan2efs, I don't want to require a
>> git repo, since in the overwhelming majority of cases, installation is 100%
>> automated now.   We're almost to the point where even some gnu projects can
>> be installed with NO config (if we supported a default download->url for
>> gnu/tar, not even efsdeploy.conf is necessary).
>>
>> This has become a higher priority issue because of the changes I have
>> recently made to gnu/gcc's build scripts.  I want to make sure we have this
>> critical information available, and right now, it's not.
>>
>> Jerry -- can you create the above repo for me for gnu/gcc, and document how
>> it's done, so we can automate it?  I have some reservations about the git
>> repo namespace growing linearly this way, so this is NOT a final design, but
>> merely an experiment for a proof of concept.  Once I have a basic workflow
>> functioning, I want to write it up and get a discussion going (if possible
>> -- this is one of the lowest volume mailing lists in the history of the
>> Internet) about how to generalize the solution.
>>
>> Without a good solution to this, anyone bootstrapping an EFS 3 domain will
>> have to struggle with ALL the same issues that are solved in the currently
>> unavailable gnu/gcc rules, among other things.  This is the missing piece of
>> the puzzle for complete reproducibility of an EFS domain.
>>
_______________________________________________
EFS-dev mailing list
[email protected]
http://mailman.openefs.org/mailman/listinfo/efs-dev
