Re: [gentoo-dev] EBUILD_FORMAT support

Brian Harring Fri, 26 Aug 2005 00:43:18 -0700

Pardon the delay, been putting this one off since it's going to be a 
fun one to address, and will be a bit long :)

On Thu, Aug 25, 2005 at 12:34:00PM +0200, Paul de Vrieze wrote:
> What I mean is compatibility with current portage versions. Current 
> versions do not understand EAPI. There would be a good chance that they 
> could choke on packages with all kinds of new features, even in the sync 
> phase. A different extension would ensure that those portage versions 
> would still work (crippled) on a new tree. Of course such an extension 
> change should only be done once. Once the API versions are available this 
> is not an issue.

General portage stance towards EAPI is that an unset EAPI == 0 (current 
stable ebuild format); if EAPI > portage's internal EAPI, the ebuild is 
unable to be merged, which should be detectable during buildplan.
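
A rough sketch of that rule in python, nothing more.  SUPPORTED_EAPI, 
effective_eapi, can_merge and unmergeable are made-up names for 
illustration (not portage's actual API), and it assumes EAPI values are 
plain integers:

```python
# Hypothetical sketch; names are illustrative, not portage internals.
SUPPORTED_EAPI = 0  # highest EAPI this (imaginary) portage understands


def effective_eapi(metadata):
    """Treat an unset or empty EAPI as 0, the current stable format."""
    raw = metadata.get("EAPI", "")
    return int(raw) if raw.strip() else 0


def can_merge(metadata, supported=SUPPORTED_EAPI):
    """An ebuild is mergeable only if its EAPI is not newer than ours."""
    return effective_eapi(metadata) <= supported


def unmergeable(buildplan, supported=SUPPORTED_EAPI):
    """Collect the cpvs a buildplan would have to refuse up front."""
    return [cpv for cpv, md in buildplan if not can_merge(md, supported)]
```

The check only needs metadata, so it can run while the buildplan is 
assembled rather than halfway through a merge.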

Current portage doesn't know about EAPI; boned in that respect I'll 
admit, but it's the case for all new features rolled out- three options 
for dealing with this
1) Usual method, deploy support, N months later use support.
2) tweak stable so it's aware and can complain.  Still requires 
   people to upgrade, just makes it so that they're not forced into 
   upgrading to 3.x; this is mainly a benefit for those who may not 
   care to try the first few releases of 3.x when it hits (akin to 
   people dodging the first release or two of a gcc release).

Worth noting that one rather visible aspect of EAPI=1 is that 
(assuming the council votes on it, and yay's it) glep33 *will* result 
in current eclasses being effectively abandoned w/in the N months 
after an EAPI capable portage is released.

Sounds kind of bad, but people will have to upgrade for the 
capabilities.  If EAPI was pegged into portage/ebuilds already 
it wouldn't be an issue, issues could be detected prior.  
Unfortunately it's not, and introduction of it (and use of it) is 
going to involve a few road bumps.

Plus side, once it's in, portage *will* know if the ebuild is 
incompatible with the pythonic/bash ebuild code, and portage/the UI 
can act accordingly.

Meanwhile, the changes that are being pushed into EAPI are addition of 
configure phase (broken out from compile), elib addition, and eclass2 
support (same beast, different rules due to env save/restoration).

The potential for horkage on sync'ing isn't there due to the fact 
that's purely python side; ebuild*sh doesn't play into it.

Re: regen, issue isn't really there either; if you try and merge an 
eapi=0 on a non eapi aware portage, it works, same as it did before.
If you try to merge an eapi=1 ebuild you hit either an issue with 
inherit, or a bail immediately in src_compile, due to the fact eapi=1 
ebuilds will separate configure out from compile (eapi=0 portage won't 
know to call it; no configure == failed compile).
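
To make the failure mode concrete, a toy simulation.  The PHASES table, 
the short phase labels and run_phases are assumptions for illustration 
only; real portage drives ebuild.sh rather than a python list:

```python
# Hypothetical phase tables; the short phase labels are illustrative only.
PHASES = {
    0: ["unpack", "compile", "install"],
    1: ["unpack", "configure", "compile", "install"],
}


def run_phases(ebuild_eapi, portage_is_eapi_aware):
    """Return the phases that ran, or raise if compile runs unconfigured."""
    # A portage with no notion of EAPI always drives the eapi=0 sequence.
    plan = PHASES[ebuild_eapi] if portage_is_eapi_aware else PHASES[0]
    needs_configure = "configure" in PHASES[ebuild_eapi]
    ran = []
    for phase in plan:
        if phase == "compile" and needs_configure and "configure" not in ran:
            raise RuntimeError("compile without configure: build fails")
        ran.append(phase)
    return ran
```

run_phases(1, False) raises at compile, which is the "no configure == 
failed compile" case; every other combination runs to install.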

That said, there also quite likely is a change coming down the pipe to 
the tree's cache; the change will shift the rsync'd metadata cache 
over to a key/val based cache.

Why oh why, yet another cache change?  Simple.  The change moves away 
from list based format to key:value pairs; in short it's a change that 
once made, means keys can be added to the cache from that point on 
without causing cache complaints on sync'ing.  Last cache breakage, I 
swear :P

EAPI addition being the next key tagged in; stable (not surprising) 
needs to be released with a version capable of reading both old and 
new format; once that's done, time for the usual "yo, upgrade people, 
something's coming down the line".  Same version that supports 
old and new cache format can also include rudimentary eapi awareness.
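
A minimal sketch of a reader that copes with both layouts.  Assumptions, 
loudly: POSITIONAL_KEYS is a shortened, made-up key order (not the real 
one), "=" as the key/value separator is a guess at the on-disk syntax, 
and the caller is assumed to know which format the tree ships:

```python
# Hypothetical sketch: POSITIONAL_KEYS is a shortened, made-up key order,
# not the real list used by the rsync'd metadata cache.
POSITIONAL_KEYS = ["DEPEND", "RDEPEND", "SLOT", "DESCRIPTION"]


def parse_list_entry(text, keys=POSITIONAL_KEYS):
    """Old style: one value per line, meaning given only by line position."""
    lines = text.split("\n")
    return dict(zip(keys, lines))


def parse_keyval_entry(text):
    """New style: one KEY=value pair per line; unknown keys are kept as-is."""
    entry = {}
    for line in text.split("\n"):
        if not line:
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, _, value = line.partition("=")
        entry[key] = value
    return entry


def parse_entry(text, new_format):
    """Read either layout; the caller says which one it was handed."""
    return parse_keyval_entry(text) if new_format else parse_list_entry(text)
```

In the positional layout a new key shifts or extends the line order and 
every reader has to agree on it; in the key/val layout an extra EAPI 
line is just one more dict entry that older readers can ignore.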

At least that's what I'm thinking.  It's roughly in line with the 
previous forced cache breakages, just in this case slipping in some 
extra support in the process.

Notices obviously would go out prior to moving on this also, along 
with a good chunk of waiting.

> > > ps. I would also suggest requiring that EAPI can be retrieved by a
> > > simple line by line parsing without using bash. (This allows for
> > > changing the parsing system)
> >
> > No, that yanks EAPI setting away from eclasses.
> 
> If the eclasses follow similar rules that would be easilly parseable. 
> (taking inherit ...) into account is easy as long as the inherit line is 
> on one line of it's own. (unconditionally) These rules that would 
> allready be followed out of style reasons would make various kinds of 
> parsers able to parse them.

while it's insane, people *can* use indirection (eg inherit $var) for 
inherit's as long as it's deterministic, always the same inherit call 
for that ebuild's data.  Don't see a good reason to ixnay that, which 
means we'd have to parse the whole enchilada, eclasses and the ebuild.

Effectively, raiding a single var out wouldn't fly; eclasses could 
override an ebuild's eapi setting for example, just like any other 
metadata key (imo).
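
A toy illustration of why grabbing the var from the ebuild alone gives 
the wrong answer.  Everything here is hypothetical: the eclass name and 
contents are invented, and sourced_eapi is a crude stand-in for real 
sourcing that only expands literal "inherit NAME" lines in order:

```python
import re

# Hypothetical sketch: a line-based EAPI grab versus "last assignment wins"
# across the ebuild and whatever it inherits. Eclass contents are made up.
EAPI_LINE = re.compile(r"^EAPI=(\d+)\s*$")


def naive_eapi(ebuild_text):
    """Take the first top-level EAPI= line from the ebuild alone."""
    for line in ebuild_text.splitlines():
        match = EAPI_LINE.match(line)
        if match:
            return int(match.group(1))
    return 0


def sourced_eapi(text, eclasses, eapi=0):
    """Walk lines in order, expand inherits inline, last assignment wins."""
    for line in text.splitlines():
        words = line.split()
        if words and words[0] == "inherit":
            for name in words[1:]:
                eapi = sourced_eapi(eclasses[name], eclasses, eapi)
        else:
            match = EAPI_LINE.match(line)
            if match:
                eapi = int(match.group(1))
    return eapi
```

With an ebuild of "EAPI=0" followed by "inherit foo", and a foo eclass 
that sets EAPI=1, naive_eapi says 0 while sourced_eapi says 1.  Even 
the stand-in falls over on inherit $var (a KeyError here); resolving 
that takes actual bash.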

A *true* format change, moving away from bash for example or moving to 
an executing design of ebuilds, would require an extension change; such 
a change must imo anyways, since it's not a change of the ebuild env's 
template/hooks; it's a fundamentally different model for ebuilds, 
either via no longer being bash based, or via moving away from our 
declarative design of ebuilds.

> > Only time this would be required is if we move away from bash; if that
> > occurs, then I'd think a new extension would be required.
<inserting a comment> contradicting myself via above, above is correct 
</comment>
>
> It would allow to for example restrict the ebuild format such that initial 
> parsing is not done by bash (but the files are still parseable by bash). 
> If we perform changes I think it should be done right in the first place.
Elaborate please

> > As is, shifting the 'template' loaded for an ebuild can be done in
> > ebd's init_environ easy enough, so no reason to add the extra
> > restrictions/changes.
> 
> One of the issues of ebuilds is the cache/metadata stuff. Parsing an 
> ebuild for basic information takes a lot of time. This can be done lots 
> faster with a less featured parser (I've written one some day) that 
> accepts 98% of all current ebuilds, just doesn't like dynamic features in 
> the toplevel. Such a parser could be a python plugin and as such easy to 
> use from python. However to ensure compatibility with a faster parser the 
> EAPI variable should be there in a way that is a little more strict than 
> the other variables. And such a restriction is in practice not a 
> restriction.

Any parser that doesn't support full bash syntax isn't acceptable from 
where I sit; re: slow down, 2.1 is around 33% faster sourcing the 
whole tree (some cases 60% faster, some 5%, etc).  The speedups are 
also what allow templates to be swapped, the eapi concept.

I'd note limiting the bash capabilities is a restriction that 
transcends anything EAPI should supply; changes to what's possible in 
the language (a subset of bash syntax as you're suggesting) are a 
separate format from where I draw the line in the sand.

Mainly, limiting the syntax has the undesired effect of deviating from 
what users/devs know already; mistakes *will* occur.  QA tools can be 
written, but people are fallible; both in writing a QA tool, and 
abiding by the syntax subset allowed.

> The restriction I propose would be:
> - If EAPI is defined in the ebuild it should be unconditional, on it's own
>   line in the toplevel of the ebuild before any functions are defined.
>   (preferably the first element after the comments and whitespace)
> 
> - If EAPI is not defined in the ebuild, but in an eclass, the inherit
>   chain should be unconditional and direct. Further more in the eclass the
>   above rules should be followed.
> 
> Please note that many of the conditions are allready true for current 
> ebuilds, just portage can "handle" more.

inherit chain must be unconditional anyways.  re: eapi placement, I 
would view that as somewhat arbitrary; the question is what gain it 
would give.

I'd wonder about the parsing speed of your parser; the difference 
between parsing ebuilds and running from cache metadata is several 
orders of magnitude different- the current cache backend flat_list 
and portage design properly corrected ought to widen the gap too.
General cache lookup is slow due to- 
A) bad call patterns, allowed by the api; N calls to get different 
   bits of metadata from a cpv, resulting in potentially N to disk set 
   of ops.
B) default cache requires opening/closing a file per cpv lookup; syscalls 
   are killer here.
C) every metadata lookup incurs 2 stats, ebuild and cache file.
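
Point A can be sketched with a counting stand-in.  CountingCache and 
its methods are hypothetical, in-memory only, and just tally how often 
the backing store would be touched; they are not portage's cache api:

```python
# Hypothetical sketch: CountingCache is a made-up in-memory stand-in that
# only counts how often the backing store would be hit.
class CountingCache:
    def __init__(self, entries):
        self._entries = entries  # cpv -> dict of metadata
        self.reads = 0           # simulated open/read/close cycles

    def _load(self, cpv):
        """Pretend to open, read and close the cache file for one cpv."""
        self.reads += 1
        return self._entries[cpv]

    def get_key(self, cpv, key):
        """Bad pattern: every single key costs a full trip to disk."""
        return self._load(cpv).get(key, "")

    def get_keys(self, cpv, keys):
        """Better pattern: one trip, then pick out everything wanted."""
        entry = self._load(cpv)
        return [entry.get(key, "") for key in keys]
```

Asking for DEPEND, RDEPEND and SLOT via three get_key calls costs three 
simulated reads; one get_keys call costs one, for the same answer.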

Getting to the point; cache is 100x to 400x faster than sourcing for 
<=2.0.51.  Haven't tested it under 2.1, should be different due to 
cache and regen fixups/rewrites.

Back to the point, essentially, EAPI matters in two places; 
1) metadata transfer from the ebuild env into python side during 
   depends phase; has to know what to transfer key wise.
2) actual ebuild build phase executions; if it isn't the depends phase, 
   eapi is required so that the parser can swap in the appropriate 
   ebuild env template.

The restrictions suggested for EAPI would only make sense if eyeing 
#1, an alternative parser; no reason to drop the cache unless the 
parser is capable of hitting the same runtime performance the cache 
can hit (frankly, it's not possible from where I'm sitting although 
the gap can be narrowed).

So... the EAPI limitations: not much reason for them, given the 
conclusion above.

Interested in the parser however, since ebd is effectively a pipe 
hack so that pythonic portage can control ebuild.sh.  I (and others) 
have been after a bashlib for a while, just no one has crunched down 
and done it (easier said than done I suspect).

My 2 cents at least.
~harring
