Re: Goldie Parsing System v0.4 Released - Now for D2

2011-04-16 Thread Nick Sabalausky
Nick Sabalausky a@a.a wrote in message 
news:ioanmi$82c$1...@digitalmars.com...
 Andrej Mitrovic Wrote:

 What I meant was that code like this will throw if MyType isn't
 defined anywhere:

 int main(int x)
 {
     MyType var;
 }

 goldie.exception.UnexpectedTokenException@src\goldie\exception.d(35):
 test.c(3:12): Unexpected Id: 'var'

 It looks like valid C /syntax/, except that MyType isn't defined. But
 this will work:
 struct MyType {
     int field;
 };
 int main(int x)
 {
     struct MyType var;
 }

 So either Goldie or ParseAnything needs to have all types defined.
 Maybe this is obvious, but I wouldn't know since I've never used a
 parser before. :p

 Oddly enough, this one will throw:
 typedef struct {
     int field;
 } MyType;
 int main(int x)
 {
     MyType var;
 }

 goldie.exception.UnexpectedTokenException@src\goldie\exception.d(35):
 test.c(7:12): Unexpected Id: 'var'

 This one will throw as well:
 struct SomeStruct {
     int field;
 };
 typedef struct SomeStruct MyType;
 int main(int x)
 {
     MyType var;
 }

 goldie.exception.UnexpectedTokenException@src\goldie\exception.d(35):
 test.c(13:12): Unexpected Id: 'myvar'

 Isn't typedef a part of ANSI C?

 I'm not at my computer right now, so I can't check, but it sounds like the 
 grammar follows the really old C style of requiring structs to be declared 
 with "struct StructName varName". Apparently it doesn't take into account 
 the possibility of typedefs being used to eliminate that. When I get home, 
 I'll check; I think it may be an easy change to the grammar.


Yea, turns out that grammar just doesn't support using user-defined types 
without preceding them with "struct", "union", or "enum". You can see that 
here:

<Var Decl> ::= <Mod> <Type> <Var> <Var List> ';'
             |       <Type> <Var> <Var List> ';'
             | <Mod>        <Var> <Var List> ';'

<Mod>  ::= extern
         | static
         | register
         | auto
         | volatile
         | const

<Type> ::= <Base> <Pointers>

<Base> ::= <Sign> <Scalar>   ! Ie, the built-ins like char, signed int, etc...
         | struct Id
         | struct '{' <Struct Def> '}'
         | union Id
         | union '{' <Struct Def> '}'
         | enum Id

So when you use "MyType" instead of "struct MyType": it sees "MyType", 
assumes it's a variable since it doesn't match any of the <Type> forms 
above, and then barfs on "var" because "variable1 variable2" isn't valid C 
code. Normally, you'd just add another form to <Base> (ie, add a line after 
"| enum Id" that says "| Id"). Except, the problem is...

C is notorious for types and variables being ambiguous with each other, so 
the distinction pretty much has to be made in the semantic phase (ie, 
outside of the formal grammar). But this grammar seems to be trying to make 
that distinction anyway. So trying to fix it by simply adding a 
"<Base> ::= Id" leads to ambiguity problems with types versus 
variables/expressions. That's probably why they didn't enhance the grammar 
that far: their approach of separating types from variables inside the 
grammar doesn't really work for C.
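To make "done in the semantic phase" concrete, here is a minimal sketch of the usual workaround in hand-written C front ends: keep a table of names introduced by typedef and consult it whenever an identifier shows up. Every name below (TypeNameTable, declare_type_name, classify_identifier, TOK_ID, TOK_TYPE_NAME) is hypothetical and made up for illustration; none of it is Goldie's or GOLD's API, and the sketch ignores scoping entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; this is not Goldie's or GOLD's API. */

#define MAX_TYPE_NAMES 64

typedef enum { TOK_ID, TOK_TYPE_NAME } TokenKind;

typedef struct {
    const char *names[MAX_TYPE_NAMES]; /* borrowed pointers, not copied */
    size_t count;
} TypeNameTable;

/* Record a name introduced by a typedef. Returns false if the table is full. */
static bool declare_type_name(TypeNameTable *t, const char *name)
{
    assert(t != NULL && name != NULL);
    if (t->count == MAX_TYPE_NAMES)
        return false;
    t->names[t->count++] = name;
    return true;
}

/* Classify an identifier using only what has been declared so far. */
static TokenKind classify_identifier(const TypeNameTable *t, const char *name)
{
    assert(t != NULL && name != NULL);
    for (size_t i = 0; i < t->count; i++) {
        if (strcmp(t->names[i], name) == 0)
            return TOK_TYPE_NAME;
    }
    return TOK_ID;
}
```

With something like this in place, "MyType" would reach the parser as a distinct token kind once its typedef has been seen, so the grammar itself never has to guess. The cost is that tokenizing now depends on semantic state, which is exactly the coupling a pure grammar-driven tool tries to avoid.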

I'll have to think a bit on how best to adjust it. You can also check the 
GOLD mailing lists here to see if anyone has another C grammar:

http://www.devincook.com/goldparser/contact.htm





Re: Goldie Parsing System v0.4 Released - Now for D2

2011-04-16 Thread Kagamin
Nick Sabalausky Wrote:

 C is notorious for types and variables being ambiguous with each other.

As I understand, Type is a type, Var is a variable. There should be no 
problem here.


Re: Goldie Parsing System v0.4 Released - Now for D2

2011-04-16 Thread Nick Sabalausky
Kagamin s...@here.lot wrote in message 
news:iod552$rbe$1...@digitalmars.com...
 Nick Sabalausky Wrote:

 C is notorious for types and variables being ambiguous with each other.

 As I understand, Type is a type, Var is a variable. There should be no 
 problem here.

First of all, the name <Var> up there is misleading. It only refers to 
the name of the variable in the variable's declaration. When actually 
*using* a variable, that's a <Value>, which is defined like this:

<Value> ::= OctLiteral
          | HexLiteral
          | DecLiteral
          | StringLiteral
          | CharLiteral
          | FloatLiteral
          | Id '(' <Expr> ')'   ! Function call
          | Id '(' ')'          ! Function call
          | Id                  ! Use a variable
          | '(' <Expr> ')'

So we have a situation like this:

<Type>  ::= <Base>
<Base>  ::= Id
<Value> ::= Id

So when the parser encounters an Id, how does it know whether to reduce it 
to a <Base> or a <Value>? Since they can both appear in the same place (eg, 
immediately after a left curly-brace, such as at the start of a function 
body), there's no way to tell.

Worse, suppose it comes across this:

x*y

If x is a variable, then that's a multiplication. If x is a type then it's a 
pointer declaration. Is it supposed to be multiplication or a declaration? 
Could be either. They're both permitted in the same place.
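The two readings of "x*y" can be shown side by side in plain C. This is only an illustrative sketch (the function names are made up): the same three tokens compile in both functions, and which meaning applies depends solely on what "x" was declared as earlier.

```c
#include <assert.h>

/* Reading 1: x names a variable, so "x * y" is a multiplication. */
static int as_expression(void)
{
    int x = 6, y = 7;
    return x * y;
}

/* Reading 2: x names a type, so "x * y" declares y as a pointer to x. */
static int as_declaration(void)
{
    typedef int x;
    int value = 42;
    x * y = &value;   /* same tokens "x * y", now a pointer declaration */
    return *y;
}
```

Nothing in the token sequence itself tells the two functions apart; a compiler gets it right only because it already knows whether "x" is a typedef name at that point.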





Re: Goldie Parsing System v0.4 Released - Now for D2

2011-04-16 Thread Nick Sabalausky
Nick Sabalausky a@a.a wrote in message 
news:iod6fn$tch$1...@digitalmars.com...
 Kagamin s...@here.lot wrote in message 
 news:iod552$rbe$1...@digitalmars.com...

 As I understand, Type is a type, Var is a variable. There should be 
 no problem here.

In other words, we basically have a form of this:

<A> ::= <B> | <C>
<B> ::= X
<C> ::= X

Can't be done. No way to tell if X is <B> or <C>.
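As a rough sketch of how one might spot this pattern before running a grammar through an LALR(1) generator, the snippet below scans a rule list for productions that have different left-hand sides but an identical right-hand side. The Rule struct and count_shared_rhs_pairs are hypothetical names, not part of Goldie or GOLD. It is only a first-pass hint: a real reduce-reduce conflict also requires both rules to be reducible in the same parser state with overlapping lookahead, which this check does not compute.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical representation: one production, RHS kept as a single string. */
typedef struct {
    const char *lhs;   /* nonterminal being defined */
    const char *rhs;   /* right-hand side, written as one string */
} Rule;

/* Count pairs of rules with different LHS but identical RHS.
 * Such pairs are candidates for reduce-reduce conflicts, nothing more. */
static size_t count_shared_rhs_pairs(const Rule *rules, size_t n)
{
    size_t pairs = 0;
    assert(rules != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (strcmp(rules[i].rhs, rules[j].rhs) == 0 &&
                strcmp(rules[i].lhs, rules[j].lhs) != 0)
                pairs++;
        }
    }
    return pairs;
}
```

For the four rules above (A to B, A to C, B to X, C to X) the only flagged pair is the two rules that both produce X, which is the pair the parser cannot choose between.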




Re: Goldie Parsing System v0.4 Released - Now for D2

2011-04-16 Thread Nick Sabalausky
Nick Sabalausky a@a.a wrote in message 
news:iobh9o$1d04$1...@digitalmars.com...

 C is notorious for types and variables being ambiguous with each other. So 
 the distinction pretty much has to be done in the semantic phase (ie, 
 outside of the formal grammar). But this grammar seems to be trying to 
 make that distinction anyway. So trying to fix it by just simply adding a 
 "<Base> ::= Id" leads to ambiguity problems with types versus 
 variables/expressions. That's probably why they didn't enhance the grammar 
 that far - their separation of type and variable approach doesn't really 
 work for C.

 I'll have to think a bit on how best to adjust it. You can also check the 
 GOLD mailing lists here to see if anyone has another C grammar:

 http://www.devincook.com/goldparser/contact.htm


Unfortunately, I think this may require LALR(k). Goldie and GOLD are only 
LALR(1) right now.

I had been under the impression that LALR(1) was sufficient because, 
according to the oh-so-useful-in-the-real-world formal literature, any LR(k) 
grammar can *technically* be converted into a *cough* equivalent LR(1) 
grammar. But not only is the algorithm to do this hidden behind the academic 
ivory wall, but word on the street is that the resulting grammar is gigantic 
and bears little or no resemblance to the original structure (and is 
therefore essentially useless in the real world).

Seems I'm gonna have to add some backtracking or stack-cloning to Goldie, 
probably along with some sort of cycle-detection. (I think I'm starting to 
understand why Walter said he doesn't like to bother with parser generators, 
unngh...)