Re: [arch-general] PKGBUILD parser

2010-05-11 Thread Andre "Osku" Schmidt
On Mon, May 10, 2010 at 8:14 PM, Pierre Chapuis  wrote:
>>> http://stackoverflow.com/questions/133601/can-regular-expressions-be-used-to-match-nested-patterns
>>
> Actually if you read the page linked above completely you will notice that it 
> says that you can. Regexps like POSIX that use finite automata can't but PCRE 
> (that are everywhere) can, at least recent versions. That's also why they are 
> slower.

oh, indeed, thanks! i somehow skipped all those and went straight to
read about CFG/PEG...

here's what seems to work in php (and i assume in many server side languages):
/^build\s*\(\)\s*(\{((?>[^{}]+)|(?R1))*\})/mx

sadly it (flag x) didn't work in any browser (js) i tested...

cheers
.andre
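
For engines without recursive patterns (the JavaScript case mentioned above), the same extraction can be done with a plain brace counter instead of a regex. What follows is a minimal sketch in Python, not code from this thread: the function name `extract_function_body` is made up for illustration, and it assumes braces never appear unbalanced inside strings, comments or here-documents, which a real PKGBUILD does not guarantee.

```python
import re


def extract_function_body(text, name="build"):
    """Return the brace-delimited body of `name() { ... }`, or None.

    Hypothetical helper: it counts braces only and knows nothing
    about shell quoting, comments or here-documents.
    """
    header = re.search(r"^" + re.escape(name) + r"\s*\(\)\s*\{",
                       text, re.MULTILINE)
    if header is None:
        return None
    start = header.end() - 1  # index of the opening brace
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                # Matching close brace for the function body found.
                return text[start:i + 1]
    return None  # braces never balanced
```

The counter is the piece a finite automaton lacks: it can grow without bound, so any nesting depth is handled in a single pass.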


Re: [arch-general] PKGBUILD parser

2010-05-10 Thread Loui Chang
On Mon 10 May 2010 09:23 +1000, Allan McRae wrote:
> On 10/05/10 02:06, Loui Chang wrote:
> >On Sun 09 May 2010 16:21 +0200, Xavier Chantry wrote:
> >>But I just had an idea now, if we're thinking about AUR use case :
> >>makepkg --source could generate a suitable and parsable file providing
> >>all information that AUR needs, and ships that next to the PKGBUILD in
> >>the source tarball. Does that sound crazy ?
> >>This would not fix the problem now, but it could fix it eventually,
> >>when most pkgbuilds are re-submitted. Or this parsable file could be
> >>generated for all pkgbuilds in a row, just for the conversion, in a
> >>chroot/jail on a machine not in production.
> >
> >Yeah I've thought about this as well. Source packages could have a
> >similar format as binary packages with a .PKGINFO file to present the
> >metadata in an easily parsable format.
> >
> >You can read some of my incomplete brainstormings here:
> >http://louipc.mine.nu/arch/%5BRFC%5D-PKGINFO-in-srctargz
>
> I am told I like to be really negative anytime this is brought up...
> it is not deliberate, I just see the barriers to this working.  So
> here we go!  I know you have pointed out some problems already and
> this is related.

No problem. I didn't really share this before because I hadn't even
thought of a real solution. Since it was mentioned though, I thought I'd
share my thoughts. There are definitely many barriers to sort out.

> makepkg does not actually parse any of the splitpkg overrides until
> build time. How do we get the packaging variable overrides without
> actually making the package (and on every architecture)?  We would
> need to extract the needed fields from the package functions somehow.
> So that brings us back to needing to hack a bash parser in makepkg or
> to actually require the package building to take place before you can
> create a source package.  And this is not restricted to package
> splitting...
>
> e.g.
>
> pkgname=foo
> ...
> # depends not needed at make time
> # depends=('bar')
> ...
> package() {
>   depends=('bar')
> }
>
> Welcome to the world of makepkg hacks...   And do not think such
> hacks are not used.  The old klibc PKGBUILD generated a provides
> array in the build function on the basis of a file name only
> available at the end of the build process.

Yeah there'd have to be some kind of standard constructs for all these
kinds of hacks like platform specific dependencies, etc. That would
probably mean changing or expanding the PKGBUILD spec. I wouldn't be
afraid to do that, but it might not sit well with compatibility or with
Arch principles.

> The joy of PKGBUILDs is that they are so flexible.  The problem with
> PKGBUILDs is that they are so flexible.

Indeed.



Re: [arch-general] PKGBUILD parser

2010-05-10 Thread Kaiting Chen
Interesting, I didn't realize that. But then it's not really a 'regular'
expression. They should call it a 'limited-context-free' expression...

Kaiting.

On Mon, May 10, 2010 at 2:21 PM, C Anthony Risinger wrote:

> On Mon, May 10, 2010 at 1:14 PM, Pierre Chapuis 
> wrote:
> >
> > "Andre "Osku" Schmidt"  a écrit :
> >
> >>> A regular
> >>> expression will never be able to parse that, because it can never decide
> >>> which brace is the final one. This might be better explained here.
> >>>
> >>>
> http://stackoverflow.com/questions/133601/can-regular-expressions-be-used-to-match-nested-patterns
> >>
> >>thank you. i'll continue my journey on parsing this another way.
> >
> > Actually if you read the page linked above completely you will notice
> > that it says that you can. Regexps like POSIX that use finite automata can't
> > but PCRE (that are everywhere) can, at least recent versions. That's also
> > why they are slower.
>
> yes via a "recursive" expression.  another option would be to use a
> multipass setup (i havent looked at the OP's code) to break the
> problem into smaller chunks instead of trying to do it all in one
> expression (i.e. use an expression to count the braces/etc. and build
> another expression dynamically based off the results of the first)
>



-- 
Kiwis and Limes: http://kaitocracy.blogspot.com/


Re: [arch-general] PKGBUILD parser

2010-05-10 Thread C Anthony Risinger
On Mon, May 10, 2010 at 1:14 PM, Pierre Chapuis  wrote:
>
> "Andre "Osku" Schmidt"  a écrit :
>
>>> A regular
>>> expression will never be able to parse that, because it can never decide
>>> which brace is the final one. This might be better explained here.
>>>
>>> http://stackoverflow.com/questions/133601/can-regular-expressions-be-used-to-match-nested-patterns
>>
>>thank you. i'll continue my journey on parsing this another way.
>
> Actually if you read the page linked above completely you will notice that it 
> says that you can. Regexps like POSIX that use finite automata can't but PCRE 
> (that are everywhere) can, at least recent versions. That's also why they are 
> slower.

yes via a "recursive" expression.  another option would be to use a
multipass setup (i havent looked at the OP's code) to break the
problem into smaller chunks instead of trying to do it all in one
expression (i.e. use an expression to count the braces/etc. and build
another expression dynamically based off the results of the first)
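
The multipass idea can be sketched roughly like this. It is an illustration only, written in Python because its standard `re` module has no recursive patterns; the function names are hypothetical and the sketch ignores braces hidden in strings or comments. Pass one measures how deep the braces nest, pass two builds an ordinary (non-recursive) expression nested to exactly that depth.

```python
import re


def max_brace_depth(text):
    """Pass 1: measure the deepest brace nesting in the text."""
    depth = deepest = 0
    for ch in text:
        if ch == "{":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == "}":
            depth = max(depth - 1, 0)
    return deepest


def balanced_brace_pattern(depth):
    """Pass 2: build a plain regex matching braces nested up to `depth`."""
    pattern = r"\{[^{}]*\}"  # innermost level: no braces inside
    for _ in range(depth - 1):
        # Wrap the previous level in one more pair of braces.
        pattern = r"\{(?:[^{}]|" + pattern + r")*\}"
    return pattern


def extract_function(text, name):
    """Return the `{ ... }` block following `name()`, or None."""
    depth = max(max_brace_depth(text), 1)
    regex = re.compile(
        r"^" + re.escape(name) + r"\s*\(\)\s*("
        + balanced_brace_pattern(depth) + r")",
        re.MULTILINE,
    )
    match = regex.search(text)
    return match.group(1) if match else None
```

The generated expression grows with the measured depth, so this trades the recursion for a longer pattern; overestimating the depth is harmless because the deeper pattern also matches shallower nesting.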


Re: [arch-general] PKGBUILD parser

2010-05-10 Thread Pierre Chapuis

"Andre "Osku" Schmidt"  a écrit :

>> A regular
>> expression will never be able to parse that, because it can never decide
>> which brace is the final one. This might be better explained here.
>>
>> http://stackoverflow.com/questions/133601/can-regular-expressions-be-used-to-match-nested-patterns
>
>thank you. i'll continue my journey on parsing this another way.

Actually if you read the page linked above completely you will notice that it 
says that you can. Regexps like POSIX that use finite automata can't but PCRE 
(that are everywhere) can, at least recent versions. That's also why they are 
slower.
-- 
catwell (from mobile phone) 

Re: [arch-general] PKGBUILD parser

2010-05-10 Thread Andre "Osku" Schmidt
my journey ended here:
http://en.wikipedia.org/wiki/Parsing_expression_grammar

i tried for a couple of hours to understand how to use the two js libraries
that are listed there... but yeah, i have no expertise in that (PEG) field
and couldn't find any newbie friendly tutorials on that subject...

so another frozen/dead project on my side.
.andre

ps. i just wanted to experiment on making a web gui for editing
PKGBUILD files. you know, with validation, completion, "help bubbles",
whistles&bells, etc...


Re: [arch-general] PKGBUILD parser

2010-05-10 Thread Xavier Chantry
On Mon, May 10, 2010 at 1:23 AM, Allan McRae  wrote:
> On 10/05/10 02:06, Loui Chang wrote:
>>
>> On Sun 09 May 2010 16:21 +0200, Xavier Chantry wrote:
>>>
>>> On Sun, May 9, 2010 at 2:44 PM, Allan McRae  wrote:

>>>> Sourcing is dangerous if the PKGBUILD is from an untrusted source.  It
>>>> also fails with package splitting...
>>
>>> But I just had an idea now, if we're thinking about AUR use case :
>>> makepkg --source could generate a suitable and parsable file providing
>>> all information that AUR needs, and ships that next to the PKGBUILD in
>>> the source tarball. Does that sound crazy ?
>>> This would not fix the problem now, but it could fix it eventually,
>>> when most pkgbuilds are re-submitted. Or this parsable file could be
>>> generated for all pkgbuilds in a row, just for the conversion, in a
>>> chroot/jail on a machine not in production.
>>
>> Yeah I've thought about this as well. Source packages could have a
>> similar format as binary packages with a .PKGINFO file to present the
>> metadata in an easily parsable format.
>>
>> You can read some of my incomplete brainstormings here:
>> http://louipc.mine.nu/arch/%5BRFC%5D-PKGINFO-in-srctargz
>>
>
> I am told I like to be really negative anytime this is brought up...  it is
> not deliberate, I just see the barriers to this working.  So here we go!  I
> know you have pointed out some problems already and this is related.
>
> makepkg does not actually parse any of the splitpkg overrides until build
> time. How do we get the packaging variable overrides without actually making
> the package (and on every architecture)?  We would need to extract the
> needed fields from the package functions somehow.  So that brings us back to
> needing to hack a bash parser in makepkg or to actually require the package
> building to take place before you can create a source package.  And this is
> not restricted to package splitting...
>
> e.g.
>
> pkgname=foo
> ...
> # depends not needed at make time
> # depends=('bar')
> ...
> package() {
>  depends=('bar')
> }
>
> Welcome to the world of makepkg hacks...   And do not think such hacks are
> not used.  The old klibc PKGBUILD generated a provides array in the build
> function on the basis of a file name only available at the end of the build
> process.
>
> The joy of PKGBUILDs is that they are so flexible.  The problem with
> PKGBUILDs is that they are so flexible.
>

The biggest problem indeed comes from any variables that are declared
inside a function. Well, it's easy, let's just make a rule to forbid
it.
Any AUR packager who breaks the rule will have their package data messed
up in the AUR interface. Too bad for him/her.
The klibc package is/was an exception, not the rule, and it wasn't on
AUR so less problematic (still problematic for other tools like my
python check-packages for integrity check, but well).

So the main thing is split variables that need to be moved top-level.
Dan, Aaron and I had some proposals / examples how to deal with that.
You were included in the few mail exchanges we had but I am not sure
if you did receive all of them as you didn't reply directly in that
thread, I will forward it to you.


Re: [arch-general] PKGBUILD parser

2010-05-09 Thread Allan McRae

On 10/05/10 02:06, Loui Chang wrote:
> On Sun 09 May 2010 16:21 +0200, Xavier Chantry wrote:
>> On Sun, May 9, 2010 at 2:44 PM, Allan McRae  wrote:
>>> Sourcing is dangerous if the PKGBUILD is from an untrusted source.  It also
>>> fails with package splitting...
>>
>> But I just had an idea now, if we're thinking about AUR use case :
>> makepkg --source could generate a suitable and parsable file providing
>> all information that AUR needs, and ships that next to the PKGBUILD in
>> the source tarball. Does that sound crazy ?
>> This would not fix the problem now, but it could fix it eventually,
>> when most pkgbuilds are re-submitted. Or this parsable file could be
>> generated for all pkgbuilds in a row, just for the conversion, in a
>> chroot/jail on a machine not in production.
>
> Yeah I've thought about this as well. Source packages could have a
> similar format as binary packages with a .PKGINFO file to present the
> metadata in an easily parsable format.
>
> You can read some of my incomplete brainstormings here:
> http://louipc.mine.nu/arch/%5BRFC%5D-PKGINFO-in-srctargz


I am told I like to be really negative anytime this is brought up...  it 
is not deliberate, I just see the barriers to this working.  So here we 
go!  I know you have pointed out some problems already and this is 
related.


makepkg does not actually parse any of the splitpkg overrides until 
build time. How do we get the packaging variable overrides without 
actually making the package (and on every architecture)?  We would need 
to extract the needed fields from the package functions somehow.  So 
that brings us back to needing to hack a bash parser in makepkg or to 
actually require the package building to take place before you can 
create a source package.  And this is not restricted to package splitting...


e.g.

pkgname=foo
...
# depends not needed at make time
# depends=('bar')
...
package() {
  depends=('bar')
}

Welcome to the world of makepkg hacks...   And do not think such hacks 
are not used.  The old klibc PKGBUILD generated a provides array in the 
build function on the basis of a file name only available at the end of 
the build process.


The joy of PKGBUILDs is that they are so flexible.  The problem with 
PKGBUILDs is that they are so flexible.


Allan


Re: [arch-general] PKGBUILD parser

2010-05-09 Thread vlad
Hello,

On Sun, May 09, 2010 at 12:06:34PM -0400, Loui Chang wrote:
> On Sun 09 May 2010 16:21 +0200, Xavier Chantry wrote:
> > On Sun, May 9, 2010 at 2:44 PM, Allan McRae  wrote:
> > > Sourcing is dangerous if the PKGBUILD is from an untrusted source.  It
> > > also fails with package splitting...
> 
> > But I just had an idea now, if we're thinking about AUR use case :
> > makepkg --source could generate a suitable and parsable file providing
> > all information that AUR needs, and ships that next to the PKGBUILD in
> > the source tarball. Does that sound crazy ?
> > This would not fix the problem now, but it could fix it eventually,
> > when most pkgbuilds are re-submitted. Or this parsable file could be
> > generated for all pkgbuilds in a row, just for the conversion, in a
> > chroot/jail on a machine not in production.
> 
> Yeah I've thought about this as well. Source packages could have a
> similar format as binary packages with a .PKGINFO file to present the
> metadata in an easily parsable format.
The idea of a separate file only for parsing metadata is pretty good.
The functions are not needed for the metainformation of the src.tar.gz.
In pkgman I cut the functions out of the PKGBUILD and source
only the remaining variables:
"
# get rid of all functions (from first appearing function to EOF) and empty lines in $_BuildScript
sed -i -e "/[[:space:]]*()[[:space:]]*[^}]/,$ d" -e "/^[[:space:]]*pkgname=/,$ { /^$/d; }" ${__TMPPKGBUILD}
bash -n ${__TMPPKGBUILD} && source ${__TMPPKGBUILD} && rm --force ${__TMPPKGBUILD} || error "blablabla"
"

Sure, if one is really malevolent, he can add a var like
"_iamevil=$(rm -fr ${HOME})".
But this is a common sourcing problem which one has with all scripting
languages.
The problem with additional files is the one Debian has. The
control.tar.gz in Debian packages contains multiple files, which provide
almost no information. So most of these files are useless. But I think
this is not intended in Arch.
However, Xyne made a function-based information parser, which I actually
didn't understand. It would be nice if Xyne could explain his ideas in
more detail and give some hints on how to use it with bash.
> 
> You can read some of my incomplete brainstormings here:
> http://louipc.mine.nu/arch/%5BRFC%5D-PKGINFO-in-srctargz
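
The "cut everything from the first function to EOF" step above can also be done without sourcing anything. Here is a small sketch in Python, written for illustration only (it is not part of pkgman, and the function names are invented): it stops at the first line that looks like `name()` and reports which variables are assigned before that point. It only reads the variable names; evaluating the values without bash is a separate problem.

```python
import re

# A line that looks like the start of a shell function: name() ...
FUNCTION_START = re.compile(r"^\s*[\w-]+\s*\(\)")
ASSIGNMENT = re.compile(r"^\s*(\w+)=")


def top_level_lines(pkgbuild_text):
    """Keep non-empty lines up to (not including) the first function."""
    kept = []
    for line in pkgbuild_text.splitlines():
        if FUNCTION_START.match(line):
            break  # everything from here to EOF is dropped
        if line.strip():
            kept.append(line)
    return kept


def top_level_names(pkgbuild_text):
    """Names of the variables assigned before the first function."""
    names = []
    for line in top_level_lines(pkgbuild_text):
        match = ASSIGNMENT.match(line)
        if match:
            names.append(match.group(1))
    return names
```

Unlike the sed command, this drops every empty line before the first function, not only those after `pkgname=`. It also shares the limitation raised elsewhere in this thread: a `depends` array set only inside `package()` is never seen.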

-- 


Re: [arch-general] PKGBUILD parser

2010-05-09 Thread Xavier Chantry
On Sun, May 9, 2010 at 6:06 PM, Loui Chang  wrote:
>
> Yeah I've thought about this as well. Source packages could have a
> similar format as binary packages with a .PKGINFO file to present the
> metadata in an easily parsable format.
>
> You can read some of my incomplete brainstormings here:
> http://louipc.mine.nu/arch/%5BRFC%5D-PKGINFO-in-srctargz
>
>

Ah, that's great, never heard of that before !

A few comments :
- using the PKGINFO format sounds like a good idea, but not sure why
you want to keep the same name. As you noticed yourself, this would
cause stupid problems like a possible confusion between source and
package tarballs. Better just call it SRCINFO, so pacman will never be
confused.
- for split pkgbuilds and arch , well... Maybe it would be simpler to
write as many SRCINFO as there are PKGINFO/packages , i.e. one for
every combination of split name / arch. Maybe all these files could be
all combined into just one, I am not sure.
But I would not care about data duplication, I would rather keep it as
dumb and easy to parse as possible.
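
To give an idea of how simple such a file would be to read, here is a minimal parser sketch in Python. It assumes the proposed SRCINFO file would reuse the `key = value` lines of binary-package .PKGINFO files, with repeated keys for lists; the function name and the sample field values are made up, since no such format has been specified in this thread.

```python
from collections import defaultdict


def parse_info(text):
    """Parse `key = value` lines into a dict mapping key -> list of values."""
    fields = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key = value line
        # Split on the first "=" only, so values like "bar>=1.0" survive.
        fields[key.strip()].append(value.strip())
    return dict(fields)
```

Nothing is executed and nothing is nested, so the file can be read safely by the AUR or any other tool, which is the point of keeping the format flat even at the cost of duplicated data.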


Re: [arch-general] PKGBUILD parser

2010-05-09 Thread Loui Chang
On Sun 09 May 2010 16:21 +0200, Xavier Chantry wrote:
> On Sun, May 9, 2010 at 2:44 PM, Allan McRae  wrote:
> > Sourcing is dangerous if the PKGBUILD is from an untrusted source.  It also
> > fails with package splitting...

> But I just had an idea now, if we're thinking about AUR use case :
> makepkg --source could generate a suitable and parsable file providing
> all information that AUR needs, and ships that next to the PKGBUILD in
> the source tarball. Does that sound crazy ?
> This would not fix the problem now, but it could fix it eventually,
> when most pkgbuilds are re-submitted. Or this parsable file could be
> generated for all pkgbuilds in a row, just for the conversion, in a
> chroot/jail on a machine not in production.

Yeah I've thought about this as well. Source packages could have a
similar format as binary packages with a .PKGINFO file to present the
metadata in an easily parsable format.

You can read some of my incomplete brainstormings here:
http://louipc.mine.nu/arch/%5BRFC%5D-PKGINFO-in-srctargz



Re: [arch-general] PKGBUILD parser

2010-05-09 Thread Andre "Osku" Schmidt
On Sun, May 9, 2010 at 4:57 PM, Kaiting Chen  wrote:
> Just to let you know dude, you can't parse that with a regular expression. A
> regular expression is modeled / parsed by a finite automaton = a state
> machine with a finite number of states. Braces allow nesting which creates a
> source with potentially an infinite number of states consider,
>
> a() { echo 1; b() { echo 2; }; }
>
> Potentially I could nest expressions like that endlessly. A regular
> expression will never be able to parse that, because it can never decide
> which brace is the final one. This might be better explained here.
>
> http://stackoverflow.com/questions/133601/can-regular-expressions-be-used-to-match-nested-patterns

thank you. i'll continue my journey on parsing this another way.


Re: [arch-general] PKGBUILD parser

2010-05-09 Thread Kaiting Chen
Just to let you know dude, you can't parse that with a regular expression. A
regular expression is modeled / parsed by a finite automaton = a state
machine with a finite number of states. Braces allow nesting which creates a
source with potentially an infinite number of states consider,

a() { echo 1; b() { echo 2; }; }

Potentially I could nest expressions like that endlessly. A regular
expression will never be able to parse that, because it can never decide
which brace is the final one. This might be better explained here.

http://stackoverflow.com/questions/133601/can-regular-expressions-be-used-to-match-nested-patterns

Kaiting.

On Sun, May 9, 2010 at 10:21 AM, Xavier Chantry wrote:

> On Sun, May 9, 2010 at 2:44 PM, Allan McRae  wrote:
> >
> > Sourcing is dangerous if the PKGBUILD is from an untrusted source.  It
> > also fails with package splitting...
> >
>
> Makes me wonder why pkgbuilds are written in bash. Sounds like a big
> design flaw.
>
> But it depends on what our needs are :
> 1) we don't care about untrusted source or security, we always trust
> the source, then bash sourcing is very convenient (original idea
> behind that design)
> 2) we care about security and dealing with untrusted source in a
> secure way : the existing format sucks
>
> Currently we are neither in 1), nor in 2), we are somewhere in the
> middle with the inconveniences of both sides. We lost the convenience of
> 1) bash sourcing with package splitting. (I've been meaning to fix
> this for one year or so, just never got to it).
>
> And we don't have any ideas about how we could ever suit 2).
> Changing pkgbuild format doesn't sound really doable and realistic, it
> might be the most important characterization of what Arch is, changing
> it would make a new distrib.
> But I just had an idea now, if we're thinking about AUR use case :
> makepkg --source could generate a suitable and parsable file providing
> all information that AUR needs, and ships that next to the PKGBUILD in
> the source tarball. Does that sound crazy ?
> This would not fix the problem now, but it could fix it eventually,
> when most pkgbuilds are re-submitted. Or this parsable file could be
> generated for all pkgbuilds in a row, just for the conversion, in a
> chroot/jail on a machine not in production.
>
> To re-iterate : PKGBUILD format was meant to be easy to parse by using
> bash source. The moment you stop using bash source, it's just all
> wrong, and it's the format you have to change.
>



-- 
Kiwis and Limes: http://kaitocracy.blogspot.com/


Re: [arch-general] PKGBUILD parser

2010-05-09 Thread Xavier Chantry
On Sun, May 9, 2010 at 2:44 PM, Allan McRae  wrote:
>
> Sourcing is dangerous if the PKGBUILD is from an untrusted source.  It also
> fails with package splitting...
>

Makes me wonder why pkgbuilds are written in bash. Sounds like a big
design flaw.

But it depends on what our needs are :
1) we don't care about untrusted source or security, we always trust
the source, then bash sourcing is very convenient (original idea
behind that design)
2) we care about security and dealing with untrusted source in a
secure way : the existing format sucks

Currently we are neither in 1), nor in 2), we are somewhere in the
middle with the inconveniences of both sides. We lost the convenience of
1) bash sourcing with package splitting. (I've been meaning to fix
this for one year or so, just never got to it).

And we don't have any ideas about how we could ever suit 2).
Changing pkgbuild format doesn't sound really doable and realistic, it
might be the most important characterization of what Arch is, changing
it would make a new distrib.
But I just had an idea now, if we're thinking about AUR use case :
makepkg --source could generate a suitable and parsable file providing
all information that AUR needs, and ships that next to the PKGBUILD in
the source tarball. Does that sound crazy ?
This would not fix the problem now, but it could fix it eventually,
when most pkgbuilds are re-submitted. Or this parsable file could be
generated for all pkgbuilds in a row, just for the conversion, in a
chroot/jail on a machine not in production.

To re-iterate : PKGBUILD format was meant to be easy to parse by using
bash source. The moment you stop using bash source, it's just all
wrong, and it's the format you have to change.


Re: [arch-general] PKGBUILD parser

2010-05-09 Thread Allan McRae

On 09/05/10 22:35, Matthew Monaco wrote:
> On 05/09/2010 05:53 AM, Andre "Osku" Schmidt wrote:
>> Well,
>>
>> my second rewrite seems to have also come to a dead end, and before i
>> rewrite this again, i was hoping someone here could give me tips on what
>> would be the best method to parse a PKGBUILD file ?
>>
>> you can play with my latest fail here:
>> http://osku.de/dump/pkgbuild.js/test-pkgbuild.html
>>
>> @todo parseBuild() fails if you have } in body
>> @todo some array content in PKGBUILD mix ' and ", so this fails ATM...
>> (and the test-pkgbuild.html somehow doesn't parse build() on every
>> second run...)
>>
>> cheers
>> .andre
>
> Is this something that's strictly intended for the web? Because you
> could just source it, and that would only leave you with the comments to
> inspect.


Sourcing is dangerous if the PKGBUILD is from an untrusted source.  It 
also fails with package splitting...


Allan


Re: [arch-general] PKGBUILD parser

2010-05-09 Thread Matthew Monaco

On 05/09/2010 05:53 AM, Andre "Osku" Schmidt wrote:
> Well,
>
> my second rewrite seems to have also come to a dead end, and before i
> rewrite this again, i was hoping someone here could give me tips on what
> would be the best method to parse a PKGBUILD file ?
>
> you can play with my latest fail here:
> http://osku.de/dump/pkgbuild.js/test-pkgbuild.html
>
> @todo parseBuild() fails if you have } in body
> @todo some array content in PKGBUILD mix ' and ", so this fails ATM...
> (and the test-pkgbuild.html somehow doesn't parse build() on every
> second run...)
>
> cheers
> .andre


Is this something that's strictly intended for the web? Because you could just 
source it, and that would only leave you with the comments to inspect.