On Mon, Sep 16, 2013 at 03:07:19PM +0000, Xyne wrote:
> Chris “Kwpolska” Warrick wrote:
>
> >Why not adapt the actual Bash parser (in C) to only read and do stuff
> >safely? In most cases, this would be enough. In the others, we
> >already have mess in those fields in the AUR. (my C skills are not
> >appropriate for this)
>
> That is basically what needs to be done but it is a difficult task. Even if
> you can adapt the Bash source code to return the AST, you would still need
> to create an extensive whitelist of executables (both internal and external)
> that may be run in order interpolate all of the variables. The code must be
> able to detect variable settings nested in the package functions, skip
> commands that do not affect variables (which may require it to work
> backwards), count loop cycles to prevent infinite loops, track time to
> prevent timeouts, etc.

And you'd need to do all this work at a level lower than the parser
itself to avoid subversion via aliases, functions, and scripts which
mask the actual operation's nature...

I think I've mentioned this a few times, but I think there are two options
if you want better parsing on the AUR:

1) Extend .AURINFO, implement it as .SRCINFO in makepkg proper. To date,
I think there have been a number of issues which no one has been willing
to address to make this a reality.

2) Use a VM (e.g. http://www.vidarholen.net/contents/evalbot/) to
evaluate the code. This would require something very similar to the
guts of makepkg which understands per-package overrides. The output
would be something similar to #1, so really... interested parties should
just work on that.

> I have thought about this before when I wrote the Bauerbill PKGBUILD parser,
> but I gave up trying to find a way to extract the AST using the Bash code. In
> the end my code would simply wrap the PKGBUILD in a function, source the file,
> spit it out with "set" to homogenize the syntax, and then parse it with
> regexes.
>
> I started writing a Bash parser in Haskell with Parsec but my free time ran
> out and I had to move on to other things. I think that approach would work
> quite well if the Bash sources are too tangled to extract the parser, but it
> is a huge task for one person (word expansion, string manipulation, all of
> the built-ins, etc.).
> I would be willing to collaborate on that as well, if there
> is any interest.

You'd probably be interested in shellcheck:

http://www.shellcheck.net/

It's written in Haskell, and while it doesn't execute anything, it does
understand a large amount of bash syntax. I found an obscure bug in it
recently which was quickly fixed by the author (he's a denizen of #bash
on freenode).
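
To make the appeal of option 1 a bit more concrete, here is a rough
sketch (in Python) of how a consumer could read a static metadata file
without executing any shell code. The "key = value" layout, the sample
field values, and the name parse_info are all assumptions made for
illustration; this is not the actual .AURINFO or .SRCINFO format.

```python
from collections import defaultdict


def parse_info(text):
    """Parse 'key = value' lines into a dict of lists; nothing is executed."""
    fields = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values such as 'bar>=2' survive.
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed line: {raw!r}")
        # Repeated keys (e.g. several dependencies) accumulate in order.
        fields[key.strip()].append(value.strip())
    return dict(fields)


# Hypothetical sample input; the layout is assumed, not taken from makepkg.
SAMPLE = """\
# hypothetical metadata, not a real package
pkgname = example
pkgver = 1.0
depends = foo
depends = bar>=2
"""

if __name__ == "__main__":
    print(parse_info(SAMPLE))
```

Since the file is only data, none of the whitelisting, loop counting,
or timeout tracking discussed above is needed on the reading side.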