#1544: Derived Read instances for recursive datatypes with infix constructors
are
too inefficient
-------------------------------------+--------------------------------------
Reporter: [EMAIL PROTECTED] | Owner:
Type: bug | Status: new
Priority: normal | Milestone: 6.8 branch
Component: Compiler | Version: 6.6.1
Severity: normal | Resolution:
Keywords: | Difficulty: Unknown
Os: Unknown | Testcase:
Architecture: Unknown |
-------------------------------------+--------------------------------------
Comment (by simonpj):
Koen writes: Take the following example datatype.
{{{
data T = T :+: T | A
}}}
Generating a naive recursive parser for this leads to the bad behavior.
However, for such simple recursive datatypes, we kind of know what
parsers we should generate. For example, forthe above datatype we
simply generate a parser that first parses a simple T, and then checks
if there is an occurrence of :+:, in which case it tries to parse
another T. Nice and linear.
However, we can never do this modularly. Here is why. Imagine a module
with the following datatype:
{{{
data T a b = a :+: b deriving ( Read )
}}}
The only thing we can do here is to generate the naive parser.
Imagine now another module that imports the above module, with a
datatype declaration that looks like:
{{{
data A = (T A A) :*: A
| C
deriving ( Read )
}}}
Again, the only thing we can do here is to generate a naive parser,
that uses the parser we generated for T. (We do not even know that the
above datatype is recursive without looking at the definition of T.)
Now, the resulting parser will again have the bad behavior: Any
expression starting with many parentheses will lead to exponential
behavior, because, for each parenthesis, we do not know if it belongs
to something of type A or something of type T A A until we have parsed
the whole expression.
We ''can'' do something about recursive datatypes where the left
argument of the operators has exactly the same type as the whole
datatype. This works for:
{{{
data SnocList a = Nil | SnocList a :- a
}}}
for example, but not for:
{{{
data T a = T [a] :+: T [a] | Leaf a
}}}
I guess it is worth implementing this special case anyway, although it
will not apply to all cases.
-------------------------------------------------------------------------
Here is the idea.
First a simple case. For the following Haskell code:
{{{
infix 5 +
infix 6 *
data A
= A + A
| A * A
| C B
}}}
We generate the following grammar:
{{{
A(n) ::= A(6) "+" A(6) (n<=5)
| A(7) "*" A(7) (n<=6)
| "C" B(0)
| "(" A(0) ")"
}}}
Right now, you simply turn the above into a parser directly. However,
we are going to "massage" the grammar a bit. First, we explicitly
split up the grammar into parts, depending on precedence. For this, we
need to sort and groups the operators according to precedences:
{{{
A(0..5) ::= A(6) "+" A(6)
| A(6)
A(6) ::= A(7) "*" A(7)
| A(7)
A(7..10) ::= "C" B(0)
| "(" A(0) ")"
}}}
Then, we see that we have overlap in the first two grammar parts. We
get rid of it by using optional [ ]'s:
{{{
A(0..5) ::= A(6) ["+" A(6)]
A(6) ::= A(7) ["*" A(7)]
A(7..10) ::= "C" B(0)
| "(" A(0) ")"
}}}
This can be turned into efficient Haskell code directly.
-------------------------------------------------------------------------
Now a more complicated case. For the following Haskell code:
{{{
infix 5 +
infix 6 *
data A
= A + A
| B * A -- this operator is not left-recursive
| C B
}}}
We generate the following grammar:
{{{
A(n) ::= A(6) "+" A(6) (n<=5)
| B(7) "*" A(7) (n<=6)
| "C" B(0)
| "(" A(0) ")"
}}}
Right now, you simply turn the above into a parser directly. Again, we
are going to "massage" the grammar. First, we explicitly split up the
grammar into parts, depending on precedence.
{{{
A(0..5) ::= A(6) "+" A(6)
| A(6)
A(6) ::= B(7) "*" A(7)
| A(7)
A(7..10) ::= "C" B(0)
| "(" A(0) ")"
}}}
Unfortunately, there is no (explicit) overlap in the cases for A(6).
We know that there probably exists overlap (both grammars will accept
any number of parentheses), but this is not clear, since we do not
have B's grammar.
We can get rid of some overlap by using optional [ ]'s:
{{{
A(0..5) ::= A(6) ["+" A(6)]
A(6) ::= B(7) "*" A(7)
| A(7)
A(7..10) ::= "C" B(0)
| "(" A(0) ")"
}}}
This can be turned into Haskell code directly:
{{{
readP n
| n <= 5 =
do x <- readP 6
return x +++ do "+" <- lex
y <- readP 6
return (x+y)
| n <= 6 =
do x <- readP 7 -- :: B
"*" <- lex
y <- readP 7
return (x*y)
+++ do readP 7 -- :: A
| otherwise =
do "C" <- lex
x <- readP 0
return (C x)
+++ do "(" <- lex
x <- readP 0
")" <- lex
return x
}}}
However, this code will
be inefficient when parsing A(6)'s that start with lots of
parentheses.
--
Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/1544>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
_______________________________________________
Glasgow-haskell-bugs mailing list
Glasgow-haskell-bugs@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs