Re: [arch-general] PKGBUILD parser

2010-05-11 Thread Andre Osku Schmidt
On Mon, May 10, 2010 at 8:14 PM, Pierre Chapuis catw...@archlinux.us wrote:
 http://stackoverflow.com/questions/133601/can-regular-expressions-be-used-to-match-nested-patterns

 Actually, if you read the page linked above completely, you will notice that it
 says that you can. Regexps like POSIX ones, which use finite automata, can't, but PCRE
 (which is everywhere) can, at least in recent versions. That's also why they are
 slower.

Oh, indeed, thanks! I somehow skipped all those answers and went straight to
reading about CFG/PEG...

Here's what seems to work in PHP (and, I assume, in many server-side languages):
/^build\s*\(\)\s*(\{((?>[^{}]+)|(?1))*\})/mx

Sadly, it (the x flag) didn't work in any browser (JS) I tested...

cheers
.andre


Re: [arch-general] PKGBUILD parser

2010-05-10 Thread Xavier Chantry
On Mon, May 10, 2010 at 1:23 AM, Allan McRae al...@archlinux.org wrote:
 On 10/05/10 02:06, Loui Chang wrote:

 On Sun 09 May 2010 16:21 +0200, Xavier Chantry wrote:

 On Sun, May 9, 2010 at 2:44 PM, Allan McRae al...@archlinux.org wrote:

 Sourcing is dangerous if the PKGBUILD is from an untrusted source. It also
 fails with package splitting...

 But I just had an idea, if we're thinking about the AUR use case:
 makepkg --source could generate a suitable, parsable file providing
 all the information the AUR needs, and ship it next to the PKGBUILD in
 the source tarball. Does that sound crazy?
 This would not fix the problem now, but it could fix it eventually,
 once most pkgbuilds are resubmitted. Or this parsable file could be
 generated for all pkgbuilds in one go, just for the conversion, in a
 chroot/jail on a machine not in production.

 Yeah I've thought about this as well. Source packages could have a
 similar format as binary packages with a .PKGINFO file to present the
 metadata in an easily parsable format.

 You can read some of my incomplete brainstormings here:
 http://louipc.mine.nu/arch/%5BRFC%5D-PKGINFO-in-srctargz


 I am told I like to be really negative anytime this is brought up...  it is
 not deliberate, I just see the barriers to this working.  So here we go!  I
 know you have pointed out some problems already and this is related.

 makepkg does not actually parse any of the splitpkg overrides until build
 time. How do we get the packaging variable overrides without actually making
 the package (and on every architecture)?  We would need to extract the
 needed fields from the package functions somehow.  So that brings us back to
 needing to hack a bash parser in makepkg or to actually require the package
 building to take place before you can create a source package.  And this is
 not restricted to package splitting...

 e.g.

 pkgname=foo
 ...
 # depends not needed at make time
 # depends=('bar')
 ...
 package() {
  depends=('bar')
 }

 Welcome to the world of makepkg hacks...   And do not think such hacks are
 not used.  The old klibc PKGBUILD generated a provides array in the build
 function on the basis of a file name only available at the end of the build
 process.

 The joy of PKGBUILDs is that they are so flexible.  The problem with
 PKGBUILDs is that they are so flexible.


The biggest problem indeed comes from any variables that are declared
inside a function. Well, it's easy: let's just make a rule to forbid
it.
Any AUR packager who breaks the rule will have their package data messed
up in the AUR interface. Too bad for them.
The klibc package is/was an exception, not the rule, and it wasn't in
the AUR, so it was less problematic (still problematic for other tools, like my
Python check-packages tool for integrity checks, but oh well).
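One way to enforce the rule proposed above would be a crude static check that never executes the file: collect top-level assignments and flag any assignment made inside a function body. The Python sketch below is only an illustration under stated assumptions; the function and variable names are hypothetical, it handles single-line arrays only, performs no variable expansion, ignores trailing comments, and is confused by braces inside quotes, comments or heredocs.

```python
import re
import shlex
from typing import Dict, List, Tuple

# NAME=value at the start of a (stripped) line; "local x=1" deliberately does not match.
ASSIGNMENT = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def scan_pkgbuild(text: str) -> Tuple[Dict[str, List[str]], List[str]]:
    """Return (top-level variables, names assigned inside function bodies).

    Crude on purpose: single-line arrays only, no variable expansion, no
    trailing comments, and braces inside quotes/comments/heredocs will
    confuse the depth count.
    """
    top_level: Dict[str, List[str]] = {}
    inside_functions: List[str] = []
    depth = 0  # current brace nesting; 0 means we are at top level
    for line in text.splitlines():
        code = line.strip()
        if not code or code.startswith("#"):
            continue  # skip blank lines and whole-line comments
        match = ASSIGNMENT.match(code)
        if match:
            name, value = match.groups()
            if depth > 0:
                inside_functions.append(name)  # breaks the proposed rule
            else:
                if value.startswith("(") and value.endswith(")"):
                    value = value[1:-1]  # strip the parentheses of a one-line array
                top_level[name] = shlex.split(value)
        depth += code.count("{") - code.count("}")
    return top_level, inside_functions
```

On Allan's example (without the "..." elisions), "pkgname" is reported at top level while "depends" is flagged as assigned inside package(), which is exactly the data an AUR-style parser would otherwise silently miss.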

So the main thing is split-package variables that need to be moved to the top level.
Dan, Aaron and I had some proposals / examples for how to deal with that.
You were included in the few mail exchanges we had, but I am not sure
whether you received all of them, as you didn't reply directly in that
thread; I will forward them to you.


Re: [arch-general] PKGBUILD parser

2010-05-10 Thread Andre Osku Schmidt
My journey ended here:
http://en.wikipedia.org/wiki/Parsing_expression_grammar

I tried for a couple of hours to understand how to use the two JS libraries that
are listed there... but yeah, I have no expertise in that (PEG) field
and couldn't find any newbie-friendly tutorials on the subject...

So that's another frozen/dead project on my side.
.andre

P.S. I just wanted to experiment with making a web GUI for editing
PKGBUILD files. You know, with validation, completion, help bubbles,
bells and whistles, etc...


Re: [arch-general] PKGBUILD parser

2010-05-10 Thread Pierre Chapuis

Andre Osku Schmidt andre.osku.schm...@googlemail.com wrote:

 A regular
 expression will never be able to parse that, because it can never decide
 which brace is the final one. This might be better explained here.

 http://stackoverflow.com/questions/133601/can-regular-expressions-be-used-to-match-nested-patterns

 Thank you. I'll continue my journey of parsing this another way.

Actually, if you read the page linked above completely, you will notice that it
says that you can. Regexps like POSIX ones, which use finite automata, can't, but PCRE
(which is everywhere) can, at least in recent versions. That's also why they are
slower.
-- 
catwell (from mobile phone) 

Re: [arch-general] PKGBUILD parser

2010-05-10 Thread C Anthony Risinger
On Mon, May 10, 2010 at 1:14 PM, Pierre Chapuis catw...@archlinux.us wrote:

 Andre Osku Schmidt andre.osku.schm...@googlemail.com wrote:

 A regular
 expression will never be able to parse that, because it can never decide
 which brace is the final one. This might be better explained here.

 http://stackoverflow.com/questions/133601/can-regular-expressions-be-used-to-match-nested-patterns

 Thank you. I'll continue my journey of parsing this another way.

 Actually, if you read the page linked above completely, you will notice that it
 says that you can. Regexps like POSIX ones, which use finite automata, can't, but PCRE
 (which is everywhere) can, at least in recent versions. That's also why they are
 slower.

Yes, via a recursive expression. Another option would be to use a
multipass setup (I haven't looked at the OP's code) to break the
problem into smaller chunks instead of trying to do it all in one
expression (i.e. use an expression to count the braces/etc. and build
another expression dynamically based on the results of the first).
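The multipass idea can be sketched with Python's standard re module, which (unlike PCRE) has no recursion: a first pass uses an expression to find every brace and measure the deepest nesting, and a second pass builds a nesting-limited expression of exactly that depth. This is a minimal sketch, not the OP's code; the function names are hypothetical, and braces inside quotes, comments or heredocs are not handled.

```python
import re
from typing import Optional


def max_brace_depth(text: str) -> int:
    """Pass 1: use a regex to find every brace and track the deepest nesting."""
    depth = deepest = 0
    for match in re.finditer(r"[{}]", text):
        depth += 1 if match.group() == "{" else -1
        deepest = max(deepest, depth)
    return deepest


def nested_brace_pattern(depth: int) -> str:
    """Pass 2 helper: a regex for one {...} group nested at most `depth` levels deep."""
    pattern = r"\{[^{}]*\}"  # depth 1: no braces inside
    for _ in range(depth - 1):
        # Wrap: any non-brace character, or a complete group from the previous level.
        pattern = r"\{(?:[^{}]|" + pattern + r")*\}"
    return pattern


def extract_function_body(text: str, name: str) -> Optional[str]:
    """Pass 2: build the expression from pass 1's result and match the function."""
    body = nested_brace_pattern(max(1, max_brace_depth(text)))
    regex = re.compile(r"^" + re.escape(name) + r"\s*\(\)\s*(" + body + r")", re.MULTILINE)
    match = regex.search(text)
    return match.group(1) if match else None
```

Because each alternative inside the generated pattern starts with a different kind of character (a non-brace versus "{"), the expression has only one way to consume each character, so it avoids the exponential backtracking a naive nested pattern could suffer.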


Re: [arch-general] PKGBUILD parser

2010-05-10 Thread Kaiting Chen
Interesting, I didn't realize that. But then it's not really a 'regular'
expression. They should call it a 'limited-context-free' expression...

Kaiting.

On Mon, May 10, 2010 at 2:21 PM, C Anthony Risinger anth...@extof.me wrote:

 On Mon, May 10, 2010 at 1:14 PM, Pierre Chapuis catw...@archlinux.us
 wrote:
 
  Andre Osku Schmidt andre.osku.schm...@googlemail.com wrote:
 
  A regular
  expression will never be able to parse that, because it can never decide
  which brace is the final one. This might be better explained here.
 
 
 http://stackoverflow.com/questions/133601/can-regular-expressions-be-used-to-match-nested-patterns
 
 Thank you. I'll continue my journey of parsing this another way.
 
  Actually, if you read the page linked above completely, you will notice
 that it says that you can. Regexps like POSIX ones, which use finite automata, can't,
 but PCRE (which is everywhere) can, at least in recent versions. That's also
 why they are slower.

 Yes, via a recursive expression. Another option would be to use a
 multipass setup (I haven't looked at the OP's code) to break the
 problem into smaller chunks instead of trying to do it all in one
 expression (i.e. use an expression to count the braces/etc. and build
 another expression dynamically based on the results of the first).




-- 
Kiwis and Limes: http://kaitocracy.blogspot.com/


Re: [arch-general] PKGBUILD parser

2010-05-10 Thread Loui Chang
On Mon 10 May 2010 09:23 +1000, Allan McRae wrote:
 On 10/05/10 02:06, Loui Chang wrote:
 On Sun 09 May 2010 16:21 +0200, Xavier Chantry wrote:
 But I just had an idea, if we're thinking about the AUR use case:
 makepkg --source could generate a suitable, parsable file providing
 all the information the AUR needs, and ship it next to the PKGBUILD in
 the source tarball. Does that sound crazy?
 This would not fix the problem now, but it could fix it eventually,
 once most pkgbuilds are resubmitted. Or this parsable file could be
 generated for all pkgbuilds in one go, just for the conversion, in a
 chroot/jail on a machine not in production.
 
 Yeah I've thought about this as well. Source packages could have a
 similar format as binary packages with a .PKGINFO file to present the
 metadata in an easily parsable format.
 
 You can read some of my incomplete brainstormings here:
 http://louipc.mine.nu/arch/%5BRFC%5D-PKGINFO-in-srctargz

 I am told I like to be really negative anytime this is brought up...
 it is not deliberate, I just see the barriers to this working.  So
 here we go!  I know you have pointed out some problems already and
 this is related.

No problem. I didn't really share this before because I hadn't even
thought of a real solution. Since it was mentioned though, I thought I'd
share my thoughts. There are definitely many barriers to sort out.

 makepkg does not actually parse any of the splitpkg overrides until
 build time. How do we get the packaging variable overrides without
 actually making the package (and on every architecture)?  We would
 need to extract the needed fields from the package functions somehow.
 So that brings us back to needing to hack a bash parser in makepkg or
 to actually require the package building to take place before you can
 create a source package.  And this is not restricted to package
 splitting...

 e.g.

 pkgname=foo
 ...
 # depends not needed at make time
 # depends=('bar')
 ...
 package() {
   depends=('bar')
 }

 Welcome to the world of makepkg hacks...   And do not think such
 hacks are not used.  The old klibc PKGBUILD generated a provides
 array in the build function on the basis of a file name only
 available at the end of the build process.

Yeah, there'd have to be some kind of standard constructs for all these
kinds of hacks, like platform-specific dependencies, etc. That would
probably mean changing or expanding the PKGBUILD spec. I wouldn't be
afraid to do that, but it might not sit well with compatibility or with
Arch principles.

 The joy of PKGBUILDs is that they are so flexible.  The problem with
 PKGBUILDs is that they are so flexible.

Indeed.



Re: [arch-general] PKGBUILD parser

2010-05-09 Thread Matthew Monaco

On 05/09/2010 05:53 AM, Andre Osku Schmidt wrote:

Well,

My second rewrite also seems to have come to a dead end, and before I rewrite
this again, I was hoping someone here could give me tips on what would
be the best method to parse a PKGBUILD file.

You can play with my latest failure here:
http://osku.de/dump/pkgbuild.js/test-pkgbuild.html

@todo parseBuild() fails if you have } in body
@todo some array content in PKGBUILD mix ' and , so this fails ATM...
(and the test-pkgbuild.html somehow doesn't parse build() on every
second run...)

cheers
.andre



Is this something that's strictly intended for the web? Because you could just 
source it, and that would only leave you with the comments to inspect.


Re: [arch-general] PKGBUILD parser

2010-05-09 Thread Allan McRae

On 09/05/10 22:35, Matthew Monaco wrote:

On 05/09/2010 05:53 AM, Andre Osku Schmidt wrote:

Well,

My second rewrite also seems to have come to a dead end, and before I rewrite
this again, I was hoping someone here could give me tips on what would
be the best method to parse a PKGBUILD file.

You can play with my latest failure here:
http://osku.de/dump/pkgbuild.js/test-pkgbuild.html

@todo parseBuild() fails if you have } in body
@todo some array content in PKGBUILD mix ' and , so this fails ATM...
(and the test-pkgbuild.html somehow doesn't parse build() on every
second run...)

cheers
.andre



Is this something that's strictly intended for the web? Because you
could just source it, and that would only leave you with the comments to
inspect.


Sourcing is dangerous if the PKGBUILD is from an untrusted source.  It 
also fails with package splitting...
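For the trusted case Matthew describes, sourcing can be done in a bash subprocess that prints selected variables. The Python sketch below assumes bash is on PATH, and its helper names are hypothetical. It also illustrates Allan's caveat: variables assigned only inside package() never show up, because sourcing defines the function without running it. It executes the PKGBUILD's top-level code, so it must never be used on untrusted files, and values containing tabs or newlines would confuse its output format.

```python
import subprocess
import tempfile
from typing import Dict, List, Sequence

# Runs inside bash: source the PKGBUILD, then print "name<TAB>value<TAB>value..."
# for each requested variable that the file actually set at top level.
DUMP_SCRIPT = r'''
source "$1" >/dev/null || exit 1
shift
for var in "$@"; do
  declare -p "$var" >/dev/null 2>&1 || continue   # skip variables never set
  eval 'vals=("${'"$var"'[@]}")'                  # copy scalar or array into vals
  printf '%s' "$var"
  (( ${#vals[@]} )) && printf '\t%s' "${vals[@]}"
  printf '\n'
done
'''


def source_pkgbuild(path: str, names: Sequence[str]) -> Dict[str, List[str]]:
    """Source a TRUSTED PKGBUILD with bash and return the requested variables.

    WARNING: this executes the file's top-level code. Functions such as
    package() are defined but never called, so anything they set is missed.
    """
    result = subprocess.run(
        ["bash", "-c", DUMP_SCRIPT, "dump-pkgbuild", path, *names],
        capture_output=True, text=True, check=True,
    )
    variables: Dict[str, List[str]] = {}
    for line in result.stdout.splitlines():
        name, *values = line.split("\t")
        variables[name] = values
    return variables


if __name__ == "__main__":
    # Allan's example: depends is only assigned inside package().
    with tempfile.NamedTemporaryFile("w", suffix=".PKGBUILD", delete=False) as handle:
        handle.write("pkgname=foo\n# depends=('bar')\npackage() {\n  depends=('bar')\n}\n")
    print(source_pkgbuild(handle.name, ["pkgname", "depends"]))  # depends is absent
```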


Allan


Re: [arch-general] PKGBUILD parser

2010-05-09 Thread Xavier Chantry
On Sun, May 9, 2010 at 2:44 PM, Allan McRae al...@archlinux.org wrote:

 Sourcing is dangerous if the PKGBUILD is from an untrusted source.  It also
 fails with package splitting...


Makes me wonder why pkgbuilds are written in bash. Sounds like a big
design flaw.

But it depends on what our needs are:
1) we don't care about untrusted sources or security, we always trust
the source: then bash sourcing is very convenient (the original idea
behind that design)
2) we care about security and dealing with untrusted sources in a
secure way: then the existing format sucks

Currently we are neither in 1) nor in 2); we are somewhere in the
middle, with the inconveniences of both sides. With package splitting,
we lost the convenience of 1), bash sourcing. (I've been meaning to fix
this for a year or so, just never got to it.)

And we don't have any idea how we could ever satisfy 2).
Changing the pkgbuild format doesn't sound doable or realistic; it
might be the most important characteristic of what Arch is, and changing
it would make a new distro.
But I just had an idea, if we're thinking about the AUR use case:
makepkg --source could generate a suitable, parsable file providing
all the information the AUR needs, and ship it next to the PKGBUILD in
the source tarball. Does that sound crazy?
This would not fix the problem now, but it could fix it eventually,
once most pkgbuilds are resubmitted. Or this parsable file could be
generated for all pkgbuilds in one go, just for the conversion, in a
chroot/jail on a machine not in production.

To reiterate: the PKGBUILD format was meant to be easy to parse by
sourcing it with bash. The moment you stop using bash source, it's just all
wrong, and it's the format you have to change.


Re: [arch-general] PKGBUILD parser

2010-05-09 Thread Kaiting Chen
Just to let you know, dude, you can't parse that with a regular expression. A
regular expression is modeled/parsed by a finite automaton, i.e. a state
machine with a finite number of states. Braces allow nesting, which requires
a potentially infinite number of states. Consider:

a() { echo 1; b() { echo 2; }; }

Potentially I could nest expressions like that endlessly. A regular
expression will never be able to parse that, because it can never decide
which brace is the final one. This might be better explained here.

http://stackoverflow.com/questions/133601/can-regular-expressions-be-used-to-match-nested-patterns
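To make the finite-automaton argument concrete: deciding which brace is the final one needs a counter that can grow as deep as the nesting goes, which is exactly the unbounded memory a fixed set of states lacks. A minimal Python sketch (the function name is hypothetical, and it ignores braces inside quotes or comments):

```python
def matching_brace(text: str, open_index: int) -> int:
    """Return the index of the '}' that closes the '{' at open_index.

    The single integer `depth` can grow without bound, which is the memory
    a finite automaton with a fixed number of states does not have.
    """
    if text[open_index] != "{":
        raise ValueError("open_index must point at '{'")
    depth = 0
    for i in range(open_index, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return i
    raise ValueError("unbalanced braces")
```

For "a() { echo 1; b() { echo 2; }; }", starting at the first "{" returns the index of the last character, while starting at b's "{" returns the first "}".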

Kaiting.

On Sun, May 9, 2010 at 10:21 AM, Xavier Chantry chantry.xav...@gmail.com wrote:

 On Sun, May 9, 2010 at 2:44 PM, Allan McRae al...@archlinux.org wrote:
 
  Sourcing is dangerous if the PKGBUILD is from an untrusted source. It also
  fails with package splitting...
 

 Makes me wonder why pkgbuilds are written in bash. Sounds like a big
 design flaw.

 But it depends on what our needs are:
 1) we don't care about untrusted sources or security, we always trust
 the source: then bash sourcing is very convenient (the original idea
 behind that design)
 2) we care about security and dealing with untrusted sources in a
 secure way: then the existing format sucks

 Currently we are neither in 1) nor in 2); we are somewhere in the
 middle, with the inconveniences of both sides. With package splitting,
 we lost the convenience of 1), bash sourcing. (I've been meaning to fix
 this for a year or so, just never got to it.)

 And we don't have any idea how we could ever satisfy 2).
 Changing the pkgbuild format doesn't sound doable or realistic; it
 might be the most important characteristic of what Arch is, and changing
 it would make a new distro.
 But I just had an idea, if we're thinking about the AUR use case:
 makepkg --source could generate a suitable, parsable file providing
 all the information the AUR needs, and ship it next to the PKGBUILD in
 the source tarball. Does that sound crazy?
 This would not fix the problem now, but it could fix it eventually,
 once most pkgbuilds are resubmitted. Or this parsable file could be
 generated for all pkgbuilds in one go, just for the conversion, in a
 chroot/jail on a machine not in production.

 To reiterate: the PKGBUILD format was meant to be easy to parse by
 sourcing it with bash. The moment you stop using bash source, it's just all
 wrong, and it's the format you have to change.






Re: [arch-general] PKGBUILD parser

2010-05-09 Thread Loui Chang
On Sun 09 May 2010 16:21 +0200, Xavier Chantry wrote:
 On Sun, May 9, 2010 at 2:44 PM, Allan McRae al...@archlinux.org wrote:
  Sourcing is dangerous if the PKGBUILD is from an untrusted source.  It also
  fails with package splitting...

 But I just had an idea, if we're thinking about the AUR use case:
 makepkg --source could generate a suitable, parsable file providing
 all the information the AUR needs, and ship it next to the PKGBUILD in
 the source tarball. Does that sound crazy?
 This would not fix the problem now, but it could fix it eventually,
 once most pkgbuilds are resubmitted. Or this parsable file could be
 generated for all pkgbuilds in one go, just for the conversion, in a
 chroot/jail on a machine not in production.

Yeah I've thought about this as well. Source packages could have a
similar format as binary packages with a .PKGINFO file to present the
metadata in an easily parsable format.

You can read some of my incomplete brainstormings here:
http://louipc.mine.nu/arch/%5BRFC%5D-PKGINFO-in-srctargz



Re: [arch-general] PKGBUILD parser

2010-05-09 Thread Xavier Chantry
On Sun, May 9, 2010 at 6:06 PM, Loui Chang louipc@gmail.com wrote:

 Yeah I've thought about this as well. Source packages could have a
 similar format as binary packages with a .PKGINFO file to present the
 metadata in an easily parsable format.

 You can read some of my incomplete brainstormings here:
 http://louipc.mine.nu/arch/%5BRFC%5D-PKGINFO-in-srctargz



Ah, that's great, I never heard of that before!

A few comments:
- using the PKGINFO format sounds like a good idea, but I'm not sure why
you want to keep the same name. As you noticed yourself, this would
cause stupid problems, like possible confusion between source and
package tarballs. Better to just call it SRCINFO, so pacman will never be
confused.
- for split pkgbuilds and arch, well... Maybe it would be simpler to
write as many SRCINFO files as there are PKGINFO files/packages, i.e. one for
every combination of split name / arch. Maybe all these files could
be combined into just one, I am not sure.
But I would not care about data duplication; I would rather keep it as
simple and easy to parse as possible.
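A minimal sketch of the "one SRCINFO per split name / arch" suggestion, assuming a .PKGINFO-style format of one "key = value" line per value, where repeated keys form arrays. The file naming scheme, the field names and the helper functions are all hypothetical; nothing here reflects an actual makepkg or AUR format. It shows how trivially such a file parses once the duplication is accepted.

```python
from itertools import product
from typing import Dict, List

Fields = Dict[str, List[str]]  # field name -> values (arrays are just repeated keys)


def render_srcinfo(fields: Fields) -> str:
    """Write one "key = value" line per value, .PKGINFO-style."""
    return "".join(f"{key} = {value}\n" for key, values in fields.items() for value in values)


def parse_srcinfo(text: str) -> Fields:
    """The whole parser: split each line on the first " = " and collect values."""
    fields: Fields = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(" = ")
        fields.setdefault(key, []).append(value)
    return fields


def srcinfo_per_package(base: Fields, overrides: Dict[str, Fields], arches: List[str]) -> Dict[str, str]:
    """One file per (split package name, arch) pair; duplication is accepted for simplicity."""
    files: Dict[str, str] = {}
    for name, arch in product(overrides, arches):
        fields = {**base, **overrides[name], "pkgname": [name], "arch": [arch]}
        files[f"{name}-{arch}.SRCINFO"] = render_srcinfo(fields)  # hypothetical naming scheme
    return files
```

With two split packages and two architectures this yields four small files; each can be read back with parse_srcinfo without any knowledge of bash.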


Re: [arch-general] PKGBUILD parser

2010-05-09 Thread Allan McRae

On 10/05/10 02:06, Loui Chang wrote:

On Sun 09 May 2010 16:21 +0200, Xavier Chantry wrote:

On Sun, May 9, 2010 at 2:44 PM, Allan McRae al...@archlinux.org wrote:

Sourcing is dangerous if the PKGBUILD is from an untrusted source.  It also
fails with package splitting...



But I just had an idea, if we're thinking about the AUR use case:
makepkg --source could generate a suitable, parsable file providing
all the information the AUR needs, and ship it next to the PKGBUILD in
the source tarball. Does that sound crazy?
This would not fix the problem now, but it could fix it eventually,
once most pkgbuilds are resubmitted. Or this parsable file could be
generated for all pkgbuilds in one go, just for the conversion, in a
chroot/jail on a machine not in production.


Yeah I've thought about this as well. Source packages could have a
similar format as binary packages with a .PKGINFO file to present the
metadata in an easily parsable format.

You can read some of my incomplete brainstormings here:
http://louipc.mine.nu/arch/%5BRFC%5D-PKGINFO-in-srctargz



I am told I like to be really negative anytime this is brought up...  it
is not deliberate, I just see the barriers to this working.  So here we 
go!  I know you have pointed out some problems already and this is 
related.


makepkg does not actually parse any of the splitpkg overrides until 
build time. How do we get the packaging variable overrides without 
actually making the package (and on every architecture)?  We would need 
to extract the needed fields from the package functions somehow.  So 
that brings us back to needing to hack a bash parser in makepkg or to 
actually require the package building to take place before you can 
create a source package.  And this is not restricted to package splitting...


e.g.

pkgname=foo
...
# depends not needed at make time
# depends=('bar')
...
package() {
  depends=('bar')
}

Welcome to the world of makepkg hacks...   And do not think such hacks 
are not used.  The old klibc PKGBUILD generated a provides array in the 
build function on the basis of a file name only available at the end of 
the build process.


The joy of PKGBUILDs is that they are so flexible.  The problem with 
PKGBUILDs is that they are so flexible.


Allan