Re: [aur-dev] Safe and relatively reliable PKGBUILD parser.
On 09/01/2010, at 2:50 AM, Xyne wrote:

>> What was the problem with that from Sebastian which was discussed earlier on the mailing lists and IRC? How does it know more?
>
> I don't know. I wrote this because I needed a PKGBUILD parser in Perl for Bauerbill. Maybe it's better, maybe it's worse. I posted it here in case someone finds it useful per se, or wishes to take any of the ideas from it and use them to improve other parsers.

It is quite a clever idea. I haven't seen this approach before. I haven't looked at it thoroughly, but it looks like you're simply sourcing the PKGBUILD with some trickery to avoid executing the code. Why then the need for further parsing? Does `set` produce raw bash, e.g. 'source=(https://localhost/$pkgname.tgz)'? It seems like bash should be able to do the interpolation itself. If that were the case, the parser would be extremely reliable (definitely more so than mine). There are still some safety issues involved, although maybe not for your purposes. One major one is infinite loops: there's no way to break out of them. I'm sure this parser will be very useful when such things aren't an issue.

> Hmmm, after briefly reviewing the messages, I can mention that my parser:
> * doesn't depend on Yacc/Lex
> * supports split packages already
> * handles multiline assignments
> * supports interpolation and string substitutions

For the record, pkgparse does support split packages and word substitutions (though it's primitive at the moment, i.e. only $foo works; modifiers like ${foo##bar} don't). The major problem is with multiline assignments, but once that gets fixed, most PKGBUILDs should be parseable. It probably won't depend on yacc/lex anymore either, but it will depend on Lemon/Ragel, as that's the direction it seems to be going :P. That's a compile-time dependency though, so it's not really a reason not to use it. To use it in Perl you'd have to write Perl bindings, which would require compilation anyway.
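For readers unfamiliar with the modifiers mentioned above: `${foo#pattern}` and `${foo##pattern}` strip the shortest and longest leading match of a glob pattern, and `%`/`%%` do the same from the end. A minimal Python sketch of those four forms follows; it is not pkgparse code, the function names are hypothetical, and it assumes the plain glob syntax that `fnmatch.fnmatchcase` understands (`*`, `?`, `[...]`), not bash's extended globs.

```python
from fnmatch import fnmatchcase


def remove_prefix(value: str, pattern: str, longest: bool) -> str:
    """Emulate ${var#pattern} (shortest) and ${var##pattern} (longest).

    The shortest (#) or longest (##) leading substring of value that
    matches the glob pattern is removed; if none matches, value is kept.
    """
    # Try prefix lengths shortest-first for '#', longest-first for '##'.
    lengths = range(len(value), -1, -1) if longest else range(len(value) + 1)
    for n in lengths:
        if fnmatchcase(value[:n], pattern):
            return value[n:]
    return value


def remove_suffix(value: str, pattern: str, longest: bool) -> str:
    """Emulate ${var%pattern} (shortest) and ${var%%pattern} (longest)."""
    # A longer suffix starts earlier, so '%%' tries start indices from 0 up.
    starts = range(len(value) + 1) if longest else range(len(value), -1, -1)
    for i in starts:
        if fnmatchcase(value[i:], pattern):
            return value[:i]
    return value
```

For example, with pkgver=1.2.3, `${pkgver##*.}` is 3 (`remove_prefix("1.2.3", "*.", True)`) and `${pkgver%.*}` is 1.2 (`remove_suffix("1.2.3", ".*", False)`).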
Re: [aur-dev] Safe and relatively reliable PKGBUILD parser.
> [...] it looks like you're simply sourcing the PKGBUILD with some trickery to avoid executing the code. Why then the need for further parsing? Does `set` produce raw bash, e.g. 'source=(https://localhost/$pkgname.tgz)'? It seems like bash should be able to do the interpolation itself. [...] There are still some safety issues involved, although maybe not for your purposes. One major one is infinite loops: there's no way to break out of them. I'm sure this parser will be very useful when such things aren't an issue.

You haven't fully understood how it works, so I hope you don't mind if I try to explain it again.

I first check the PKGBUILD with `/bin/bash -n PKGBUILD`. If this command exits without error, then the PKGBUILD contains valid syntax; most importantly, it does not contain extra closing braces (}). This lets me wrap the entire PKGBUILD in a function, e.g.

    pkgbuild () {
      PKGBUILD
    }

I can then source the file with Bash without executing any code. The previous check with `bash -n` guarantees that the PKGBUILD cannot escape the wrapping function. Because all code is inside a function, sourcing the file does not execute any code at all. Bash simply parses the file and stores the code itself in the pkgbuild function, which itself contains other variables and functions (e.g. package_foo, build). Because the code has not been executed, the variables have not been expanded/interpolated and thus still contain things such as http://example.com/$pkgname-$pkgver.tar, which is why they must still be interpolated by the parser.

The advantage of this method is that `set` will print out the pkgbuild function and its contents in a canonical form, e.g. all assignments to a variable are on a single line, if/then/else statements follow a single format, etc.
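The check-wrap-source steps above can be sketched from Python, one of the languages the method could be ported to. This is a minimal sketch, not Xyne's Perl code: `wrap_pkgbuild` and `dump_pkgbuild` are hypothetical names, and instead of filtering the whole output of `set` it prints only the stored function with `declare -f pkgbuild`, which (as far as I know) uses the same canonical layout. The safety argument still rests entirely on the `bash -n` check described above.

```python
import os
import subprocess
import tempfile


def wrap_pkgbuild(text: str, name: str = "pkgbuild") -> str:
    """Wrap the whole PKGBUILD in a function so sourcing it only defines it."""
    # Newlines around the body stop a trailing comment from swallowing "}".
    return f"{name} () {{\n{text}\n}}\n"


def dump_pkgbuild(path: str) -> str:
    """Return bash's canonical text of the wrapped PKGBUILD.

    Raises subprocess.CalledProcessError if the syntax check or sourcing fails.
    """
    # A minimal environment, so a BASH_ENV startup file is not read.
    env = {"PATH": os.environ.get("PATH", "/usr/bin:/bin")}
    # Step 1: `bash -n` parses the file without executing any of it.
    subprocess.run(["bash", "-n", path], check=True, capture_output=True, env=env)
    with open(path, encoding="utf-8") as f:
        wrapped = wrap_pkgbuild(f.read())
    with tempfile.NamedTemporaryFile(
        "w", suffix=".sh", delete=False, encoding="utf-8"
    ) as tmp:
        tmp.write(wrapped)
    try:
        # Step 2: sourcing only defines pkgbuild(); `declare -f` then prints
        # its body in bash's canonical layout, with nothing expanded.
        result = subprocess.run(
            ["bash", "-c", 'source "$1" && declare -f pkgbuild', "_", tmp.name],
            check=True, capture_output=True, text=True, env=env,
        )
    finally:
        os.unlink(tmp.name)
    return result.stdout
```

The output still contains literal text such as `$pkgname-$pkgver`, which is exactly why an interpolation pass is needed afterwards.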
This canonical form makes it possible to parse the assignments themselves easily, in the order in which they occur, without having to consider all variations of valid whitespace in statements. The parser simply needs to recognize Bash syntax for things such as string substitutions, but this is a relatively limited set, so it is not difficult to handle all such cases. The output of `set` also guarantees that you have a representation of all variable assignments (in sequential order, and within their local environment), so you have all the information that you need to interpolate them. You could even handle command output if you wish, using a command whitelist to make sure that no trickery is used to run malicious code.

Let me repeat that my method does not run any code in the PKGBUILD. I've tested this by including an infinite loop at the top of the file, and it was not executed. I actually believe that this method provides a perfectly safe and potentially very reliable way of retrieving all metadata in the PKGBUILD, with very few dependencies and considerable portability.

Regards,
Xyne
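Once the canonical text is available, interpolating the assignments in order is mostly bookkeeping. The following is a minimal Python sketch, not the actual parser; it assumes top-level lines shaped like bash's canonical output (one `name=value;` or `name=(...);` per line) and handles only `$name`/`${name}` plus single and double quotes. Nested function bodies, `+=`, backslash escapes, word splitting of unquoted expansions and command substitution are deliberately out of scope, and all names are hypothetical.

```python
import re

# One top-level assignment as bash prints it, e.g. "    pkgver=1.0;".
ASSIGN = re.compile(r"^\s*([A-Za-z_]\w*)=(.*?);?\s*$")
# $name or ${name}; other ${...} forms are left untouched.
VAR = re.compile(r"\$(?:\{([A-Za-z_]\w*)\}|([A-Za-z_]\w*))")
# A single-quoted chunk, a double-quoted chunk, whitespace, or a bare chunk.
TOKEN = re.compile(r"""'([^']*)'|"((?:[^"\\]|\\.)*)"|(\s+)|([^'"\s]+)""")


def expand(text, env):
    """Replace $name/${name} with earlier values ('' if unset)."""
    def sub(m):
        value = env.get(m.group(1) or m.group(2), "")
        if isinstance(value, list):  # bash: $arr is the first element
            return value[0] if value else ""
        return value
    return VAR.sub(sub, text)


def split_words(text, env):
    """Split on unquoted whitespace, expanding everywhere except '...'."""
    words, buf, in_word = [], [], False
    for m in TOKEN.finditer(text):
        single, double, space, bare = m.groups()
        if space is not None:
            if in_word:
                words.append("".join(buf))
            buf, in_word = [], False
            continue
        in_word = True
        if single is not None:
            buf.append(single)  # no expansion inside single quotes
        elif double is not None:
            buf.append(expand(double, env))
        else:
            buf.append(expand(bare, env))
    if in_word:
        words.append("".join(buf))
    return words


def parse_assignments(lines):
    """Evaluate simple assignments in order; skip every other line."""
    env = {}
    for line in lines:
        m = ASSIGN.match(line)
        if m is None:
            continue
        name, value = m.groups()
        if value.startswith("(") and value.endswith(")"):
            env[name] = split_words(value[1:-1], env)
        else:
            env[name] = " ".join(split_words(value, env))
    return env
```

Because each assignment is evaluated against only the values seen so far, the sequential order that `set` preserves is what makes the interpolation correct. A command whitelist, as suggested above, would be an extra token type in this loop.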
Re: [aur-dev] Safe and relatively reliable PKGBUILD parser.
On Sat 09 Jan 2010 21:23 +0100, Xyne wrote:
> You haven't fully understood how it works so I hope you don't mind if I try to explain it again. I first check the PKGBUILD with /bin/bash -n PKGBUILD. If this command exits without error then the PKGBUILD contains valid syntax, most importantly it does not contain extra closing brackets (}). [...]

Wow, this is quite clever. It definitely would make the job of parsing much easier. Thanks for the explanation.
Re: [aur-dev] patch for AUR about setting the DEFAULT_LANG
On Mon 30 Nov 2009 17:27 +0800, Athurg Gooth wrote:
> While porting the AUR to Chinese, I found this bug. Once I set a default language (e.g. zh_CN) by changing the DEFAULT_LANG macro defined in web/lib/config.inc, it doesn't work: the page for that language (here zh_CN) does not show its native strings.

DEFAULT_LANG was supposed to indicate the language that strings in the code are written in, so that if someone asked for 'en' then the code wouldn't look for en.po and come up with an error. I think your idea makes more sense though.

> Then I went back to check whether I had misspelled something. But I found that the developers have noted that these options cannot be changed (in web/lib/config.inc, line 48). So I think maybe it's a bug which hasn't been fixed. After checking all the code related to language settings, I think I found the reason. There are two problems that cause this bug.
>
> First, in .../web/lib/aur.inc, lines 296 to 298: even when $LANG == DEFAULT_LANG, we should include the $LANG.po file, because when DEFAULT_LANG isn't English we still need to translate the strings. So I suggest adding an 'else' branch after line 298 to include DEFAULT_LANG.po, such as `else { include_once(DEFAULT_LANG . ".po"); }`
>
> Second, in .../web/lib/translator.inc, lines 52 to 62, for the same reason. If $LANG hasn't been set, it is set to DEFAULT_LANG, but DEFAULT_LANG doesn't necessarily mean English. Even when $LANG hasn't been set, $_t may already have been populated from DEFAULT_LANG.po (see "First" above), so we should translate those strings as well. So I think we should remove the 'else' so that the code in that branch always runs.
>
> By the way, is the function include_lang() in .../web/lib/translator.inc, lines 32 to 40, an old way of loading the language strings? Maybe we should remove it.

Indeed. I think we can remove it now.

> I am preparing to set up a mirror of the AUR for Chinese users. How can I get the databases and files from aur.archlinux.org? Or is it not possible to make a mirror?

It's great to hear that people are playing with the AUR code. Thanks for the patch. I've applied a slightly modified version. It has helped reveal some redundant code that we could eliminate too. Please let us know more about your ideas about mirroring the AUR. Thanks and cheers! Sorry about the delay.
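The two fixes Athurg describes can be modelled in a few lines. This is a hedged sketch in Python rather than the AUR's PHP: `catalogs` is a hypothetical dict standing in for the included .po files and the `$_t` table, and the function names are invented for illustration. The "buggy" variant only approximates the behaviour reported in the message (nothing is translated when the language equals DEFAULT_LANG).

```python
def pick_catalog_buggy(catalogs, requested, default_lang):
    """Approximation of the reported behaviour."""
    lang = requested or default_lang
    # Nothing is loaded when lang == DEFAULT_LANG, which silently assumes
    # that DEFAULT_LANG is the language the source strings are written in.
    return {} if lang == default_lang else catalogs.get(lang, {})


def pick_catalog(catalogs, requested, default_lang):
    """Fix 1: fall back to DEFAULT_LANG and load its catalog as well."""
    lang = requested or default_lang
    return catalogs.get(lang, {})


def translate(key, table):
    """Fix 2: always consult the table, falling back to the source string."""
    return table.get(key, key)
```

With DEFAULT_LANG set to zh_CN, the buggy variant returns the untranslated English string, while the fixed one returns the zh_CN translation whether or not a language was explicitly requested.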
Re: [aur-dev] Safe and relatively reliable PKGBUILD parser.
Loui Chang wrote:
> Wow this is quite clever. It definitely would make the job of parsing much easier. Thanks for the explanation.

:)

I intend to flesh out the parser as special cases pop up. As already mentioned, there will be limits to what it can do, depending on whether the packager uses command output to set variables, but perhaps Arch could eventually impose a de facto standard for PKGBUILDs, using the parser itself as the standard, i.e. if the PKGBUILD metadata does not get past the parser, the PKGBUILD itself would be considered invalid. In that case, the parser would support tricks such as

    [ $ARCH == x86_64 ] && depends=('foo' 'bar')

I want to be very clear that I am NOT suggesting that my parser become the standard, only that a parser based on this approach _could_ become one. Also note that this is really a method in its own right that just happens to be implemented in Perl in this case. If you look at the code, you will see that it could very quickly be adapted to Python (and thus Django), PHP, or just about anything.
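One trick a parser-defined standard could accept is an architecture-conditional assignment. Assuming it appears in bash's canonical output as a single line such as `[ $ARCH == x86_64 ] && depends=('foo' 'bar');` (the `&&`, the exact layout and the `ARCH` name are assumptions here; makepkg itself exposes the architecture as `$CARCH`), a parser could evaluate it against a target architecture roughly as follows. `conditional_array` is a hypothetical helper, and the array body is split with `shlex.split` without any interpolation.

```python
import re
import shlex

# Hypothetical shape of one arch-conditional array assignment, e.g.
#   [ $ARCH == x86_64 ] && depends=('foo' 'bar');
COND = re.compile(
    r"^\s*\[\s+\$\{?(\w+)\}?\s+==?\s+(\S+)\s+\]\s+&&\s+(\w+)=\((.*)\);?\s*$"
)


def conditional_array(line, env):
    """Return (name, values) if the line's test holds in env, else None."""
    m = COND.match(line)
    if m is None:
        return None  # not this pattern; a strict parser might reject it
    var, expected, name, body = m.groups()
    if env.get(var) != expected.strip("'\""):
        return None  # condition is false for this architecture
    return name, shlex.split(body)
```

Under the "parser as standard" idea, lines that match neither a plain assignment nor a small set of such patterns would be the ones that mark a PKGBUILD as unparseable.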