On Friday 26 August 2005 09:35, Brian Harring wrote:
> Any parser that doesn't support full bash syntax isn't acceptable from
> where I sit; re: slow down, 2.1 is around 33% faster sourcing the
> whole tree (some cases 60% faster, some 5%, etc).  The speed up's are
> also what allow template's to be swapped, the eapi concept.

At the toplevel of an ebuild there are many things that are not allowed
anyway. Basically things must be deterministic for the cache to work. I
have built an extension that would parse 98% of current ebuilds properly,
and much (more than 10 times) faster than the bash/ecache way. It takes
the shape of a python module written in C. It simply ignores function
bodies, so anything is allowed in there; the parser only needs to
understand enough bash to handle the toplevel. Even variable substitution
and inherit are supported. What's not supported are various kinds of
uncommon substitution tricks that should probably not happen in the
toplevel either.
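
To illustrate the idea (this is a sketch in Python, not the C extension
itself), a toplevel-only parser can skip function bodies by tracking
brace depth and collect plain NAME=value assignments. The name
parse_toplevel is hypothetical, and the sketch assumes the opening brace
sits on the same line as the function name and that assignments fit on
one line:

```python
import re

# Hypothetical helper for illustration; not the API of the real module.
ASSIGN_RE = re.compile(r'^([A-Za-z_][A-Za-z0-9_]*)=(.*)$')
FUNC_RE = re.compile(r'^[A-Za-z_][A-Za-z0-9_]*\s*\(\)\s*\{$')

def parse_toplevel(text):
    """Collect simple toplevel assignments, ignoring function bodies."""
    variables = {}
    depth = 0  # brace depth; > 0 means we are inside a function body
    for raw in text.splitlines():
        line = raw.strip()
        if depth > 0:
            # Inside a function: only track braces, ignore the content.
            depth += line.count("{") - line.count("}")
            continue
        if FUNC_RE.match(line):
            depth = 1
            continue
        match = ASSIGN_RE.match(line)
        if match:
            name, value = match.groups()
            # Strip one pair of matching surrounding quotes, if present.
            if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
                value = value[1:-1]
            variables[name] = value
    return variables
```

Anything inside src_compile() and friends never reaches the assignment
matcher, which is why the function bodies can keep full bash freedom.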

EAPI could also be used to express capabilities. Say portage supports
version 2-relaxed and version 2-strict. 2-relaxed has all the bash
freedom and is parsed using bash. 2-strict would allow parsing by a
faster parser module, but would limit the bash freedom. I'm not saying
we have to do this, but if ebuild and eclass EAPI declarations follow a
few very simple rules that are normally obeyed anyway, it would be
possible to support this in the future.
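
As a sketch of what such a capability split could look like on the
python side: the "-strict" suffix follows the example names above, while
choose_parser and the returned labels are purely hypothetical.

```python
STRICT_SUFFIX = "-strict"

def choose_parser(eapi):
    """Pick a parser label for an EAPI string (illustrative only).

    EAPIs ending in "-strict" promise the restricted toplevel syntax and
    may go to the fast native parser; everything else, including ebuilds
    without an EAPI, falls back to sourcing with bash.
    """
    if eapi is not None and eapi.endswith(STRICT_SUFFIX):
        return "native"
    return "bash"
```

The fallback to bash for unknown or missing values keeps every current
ebuild working unchanged.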

One of the problems I see with the current ebuild format is that it is
impossible to make incompatible changes at all. This means that many
features that might be desired cannot be implemented. EAPI can relieve
that. To make it easier there should be an easy way to get the EAPI of a
package.
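
One cheap way to do that, assuming the EAPI assignment sits
unconditionally on its own line at the toplevel, is a line-oriented
match that needs no bash parsing at all. The sketch below is
illustrative; extract_eapi is a hypothetical helper, not part of
portage:

```python
import re

# Matches an "EAPI=" assignment at the start of a line, allowing
# leading blanks and blanks around the name.
EAPI_LINE = re.compile(r'^[ \t]*EAPI[ \t]*=(.*)$', re.MULTILINE)

def extract_eapi(text):
    """Return the EAPI value of an ebuild's text, or None if unset."""
    match = EAPI_LINE.search(text)
    if match is None:
        return None
    # Drop a trailing comment, surrounding blanks and quotes.
    value = match.group(1).split("#", 1)[0].strip()
    return value.strip("\"'")
```

Because only this one line has to be understood, a tool written today
could still read the EAPI of an ebuild whose remaining syntax it cannot
parse.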

>
> I'd note limiting the bash capabilities is a restriction that
> transcends anything EAPI should supply; changes to what's possible in
> the language (a subset of bash syntax as you're suggesting) are a
> seperate format from where I draw the line in the sand.

What I suggest is making a policy that would make this possible in the 
future. Note that I do not wish to restrict any bash functionality in the 
various functions in the ebuild. 

> Mainly, limiting the syntax has the undesired affect of deviating from
> what users/devs know already; mistakes *will* occur.  QA tools can be
> written, but people are fallable; both in writing a QA tool, and
> abiding by the syntax subset allowed.

The QA tools would just be running the parser. If the parser chokes
(which it doesn't easily) then the ebuild does not conform to the correct
syntax. It's even possible to just compare the variables returned by
both parsers. If they don't match, the format is wrong for the C parser.
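
A minimal sketch of that comparison, assuming both parsers hand back a
plain dict of variables (compare_metadata and the key list are
hypothetical names for illustration):

```python
def compare_metadata(reference, candidate, keys):
    """Return {key: (reference_value, candidate_value)} for mismatches.

    `reference` would come from sourcing the ebuild with bash and
    `candidate` from the fast parser; an empty result means the ebuild
    is safe for the fast parser as far as these keys are concerned.
    """
    return {
        key: (reference.get(key), candidate.get(key))
        for key in keys
        if reference.get(key) != candidate.get(key)
    }
```

A QA run would call this for every ebuild in the tree and report the
ones with a non-empty result.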

>
> > The restriction I propose would be:
> > - If EAPI is defined in the ebuild it should be unconditional, on
> > it's own line in the toplevel of the ebuild before any functions are
> > defined. (preferably the first element after the comments and
> > whitespace)
> >
> > - If EAPI is not defined in the ebuild, but in an eclass, the inherit
> >   chain should be unconditional and direct. Further more in the
> > eclass the above rules should be followed.
> >
> > Please note that many of the conditions are allready true for current
> > ebuilds, just portage can "handle" more.
>
> inherit chain must be unconditional anyways.  re: eapi placement, I
> would view that as somewhat arbitrary; the question is what gain it
> would give.

The gain of putting it at the top would be that there are fewer chances
for parsers to choke on incompatible syntax. If EAPI is at the top,
incompatible syntax might at some point be allowed, and older parsers
could still retrieve the EAPI. Of course any syntax that works with 'egrep
"^[ \t]*EAPI[ \t]*="' should be no problem.

>
> I'd wonder about the parsing speed of your parser; the difference
> between parsing ebuilds and running from cache metadata is several
> orders of magnitude differant- the current cache backend flat_list
> and portage design properly corrected ought to widen the gap too.
> General cache lookup is slow due to-
> A) bad call patterns, allowed by the api; N calls to get different
>    bits of metadata from a cpv, resulting in potentially N to disk set
>    of ops.
> B) default cache requires opening/closing a file per cpv lookup;
> syscall's are killer here.
> C) every metadata lookup incurs 2 stats, ebuild and cache file.

This parser was part of a stranded rewrite attempt. One of its features
was that it regarded packages and package instances (specific files) as
objects whose attributes would be lazily evaluated. That means that it
would parse if the data was not available, and look it up otherwise. The
speed of "emerge -s" is stunning in that program, as it uses a directory
search which is orders of magnitude faster than python doing the same
thing.
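
The lazy evaluation can be sketched roughly as follows. This is not the
code from the rewrite; PackageInstance, the dict-like cache and the
parse callable are assumptions made for the example:

```python
class PackageInstance:
    """One specific ebuild file whose metadata is loaded on first use."""

    def __init__(self, path, cache, parse):
        self.path = path
        self._cache = cache    # dict-like store: path -> metadata
        self._parse = parse    # callable: path -> metadata dict
        self._metadata = None  # nothing is loaded until it is asked for

    @property
    def metadata(self):
        if self._metadata is None:
            # Look it up if available, parse (and remember) otherwise.
            self._metadata = self._cache.get(self.path)
            if self._metadata is None:
                self._metadata = self._parse(self.path)
                self._cache[self.path] = self._metadata
        return self._metadata
```

Creating thousands of such objects is cheap, since no file is touched
until an attribute is actually needed.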

> Getting to the point; cache is 100x to 400x faster then sourcing for
> <=2.0.51.  Haven't tested it under 2.1, should be different due to
> cache and regen fixups/rewrites.

Don't forget the fact that bash must be execed for normal parses, and that 
python has extremely slow string handling when not using one of the 
standard parsing modules (that work in C). To put my money where my mouth 
is, I've tarred up my code and put it on my dev space:
http://dev.gentoo.org/~pauldv/portage_native-0.1.tar.bz2

Just run make in the extracted dir. The binary created is xbuildparse, 
this is a standalone parser that takes the ebuild as argument. It will 
look for eclasses in /usr/portage/eclass.

The python module can be built with "make xbuildparse.so", and includes a 
little bit of help reachable through the normal python way.
>
> Back to the point, essentially, EAPI matters in two places;
> 1) metadata transfer from the ebuild env into python side during
>    depends phase; has to know what to transfer key wise.
> 2) actual ebuild build phase executions; if it isn't the depends phase,
>    eapi being required so that the parser can swap drop in the
> appropriate ebuild env template.

I think it also matters in actually allowing future incompatible versions
of ebuild formats. I don't mean to say goodbye to the current format,
but when redesigning the format, we should now design it for
extensibility.

> The restrictions suggested for EAPI would only make sense if eyeing
> #1, an alternative parser; no reason to drop the cache unless the
> parser is capable of hitting the same runtime performance the cache
> can hit (frankly, it's not possible from where I'm sitting although
> the gap can be narrowed).

You're probably right, but the time needed to parse an ebuild can be
reduced so much that parsing will no longer be the issue; building the
right tree will be:

time ./xbuildparse /usr/portage/sys-libs/db/db-4.2.52_p2.ebuild &>/dev/null

real    0m0.054s
user    0m0.048s
sys     0m0.002s

Please note that the parser is incomplete, does have some small bugs
(don't try it on flag-o-matic as it somehow goes into an endless loop),
and could probably do some things smarter.

> So... the EAPI limitations, not much for due to the conclusion above.
>
> Interested in the parser however, since ebd is effectively a pipe
> hack so that pythonic portage can control ebuild.sh.  I (and others)
> have been after a bashlib for a while, just no one has crunched down
> and done it (easier said then done I suspect).

See above. It does not fully understand every bash statement around.
Importantly, it currently does not understand the "if" statement. This
is easy to add though; it just wasn't added out of "policy". But given
that even my own ebuilds (like db) use it, it should probably be added.

I do believe that the parser could be made useful for most ebuilds. This
would however still mean a small restriction in allowed syntax. The
parser module has basically one function, "parse": it parses an ebuild
and its eclasses, and returns a list of variables. Not all variables are
substituted though; I have a python function that does this. If people
are interested I can take a look at sanitizing my whole tree and
providing it.
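
That python-side substitution function is not part of this mail. As a
rough sketch of what such a step could look like (the name substitute,
the empty string for unset variables and the depth limit are all
assumptions, not the code from the tarball):

```python
import re

# Matches ${NAME} or $NAME; other bash expansions are left untouched.
VAR_RE = re.compile(r'\$\{([A-Za-z_][A-Za-z0-9_]*)\}|\$([A-Za-z_][A-Za-z0-9_]*)')

def substitute(value, variables, max_depth=10):
    """Expand simple variable references in `value` using `variables`."""
    def lookup(match):
        name = match.group(1) or match.group(2)
        return variables.get(name, "")  # unset expands to "" as in bash

    # Repeat so that values which themselves contain references resolve;
    # max_depth guards against self-referencing definitions.
    for _ in range(max_depth):
        expanded = VAR_RE.sub(lookup, value)
        if expanded == value:
            break
        value = expanded
    return value
```

For example, with PN, PV and P="${PN}-${PV}" defined, "${P}.tar.gz"
resolves in two passes.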

Paul

-- 
Paul de Vrieze
Gentoo Developer
Mail: [EMAIL PROTECTED]
Homepage: http://www.devrieze.net
