On Oct 20, 2006, at 12:17 PM, Sylvain Schmitz wrote:
> A serious study could attempt to find deeper connections, but the connections seem likely to be roughly the same as the ones with context-free grammars.
This level of analysis is exactly what I had in mind. The notational similarities are nothing but a red herring, one that I agree is apt to mislead the casual reader. (Indeed, it is this sort of notational overloading that leads us to seek Quasi Natural Language representations in which terms of art can stand directly for themselves, rather than forcing the reader to disambiguate every group's preferred interpretation of common characters like '/'.)
Steedman's discussion of efficient chart-based CCG parsing strategies struck me as vaguely analogous to Packrat Parsing. The "fully lexicalized" nature of the grammar, which makes its dynamic extension painless; its ability to tackle phenomena like "argument cluster coordination"; and its functional-programming orientation in directly deriving logical forms rather than parse trees are all very appealing, and strike me as more elegant than other approaches to context-free grammars.
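To make the "deriving logical forms rather than parse trees" point concrete, here is a toy sketch (mine, not from the thread) of the two core CCG combination rules, forward and backward application. The category encoding, lexicon entries, and semantic representation are all invented for illustration:

```python
# A category is either a primitive string ("NP", "S") or a triple
# (result, slash, argument); e.g. (S\NP)/NP for a transitive verb.
# Each lexical entry pairs a category with a semantic function, so
# combining categories composes the logical form directly.

def fapp(f, a):
    """Forward application: X/Y applied to a following Y yields X."""
    (res, slash, arg), sem = f
    assert slash == '/' and arg == a[0]
    return (res, sem(a[1]))

def bapp(a, f):
    """Backward application: Y followed by X\\Y yields X."""
    (res, slash, arg), sem = f
    assert slash == '\\' and arg == a[0]
    return (res, sem(a[1]))

# Hypothetical toy lexicon.
lexicon = {
    'Alice': ('NP', 'alice'),
    'Bob':   ('NP', 'bob'),
    'likes': ((('S', '\\', 'NP'), '/', 'NP'),
              lambda obj: lambda subj: ('likes', subj, obj)),
}

# Derive "Alice likes Bob": the verb combines with its object by
# forward application, then with its subject by backward application,
# yielding category S and a logical form with no parse tree in sight.
verb_obj = fapp(lexicon['likes'], lexicon['Bob'])   # category S\NP
sentence = bapp(lexicon['Alice'], verb_obj)         # category S
print(sentence)  # ('S', ('likes', 'alice', 'bob'))
```

A real CCG also needs composition and type-raising rules to handle constructions like argument cluster coordination; the two applications above are just the minimal core.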
In short, I would hypothesize that CCGs may be a viable complement to PEGs in moving from Quasi Natural Language utterances (the kind you might make in explaining your code to a fellow programmer used to working in another programming language) to unambiguous programming-language semantics amenable to execution. One might then posit interleaving PEGs and CCGs as follows: a PEG would tokenize the input; a CCG lexicon-and-parse phase would fill in local anaphora, address control and binding issues, and in effect de-scramble natural language constructs that would be very hard to model directly with a PEG; and the resulting logical forms would then pass to a strictly PEG phase for the final composition of an AST to be fed to a programming-language interpreter or compiler.
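As a rough sketch of that three-stage interleaving, here is a hypothetical pipeline in which trivial stand-ins play the roles of the PEG tokenizer, the CCG phase (with a shared discourse dictionary for anaphora), and the final PEG composition step. Every function name and the "grammar" itself are invented; no real PEG or CCG implementation is assumed:

```python
def peg_tokenize(source):
    """Stage 1 stand-in for a PEG tokenizer: split the QNL source
    into statements and then into tokens."""
    return [stmt.strip().split() for stmt in source.split('.') if stmt.strip()]

def ccg_phase(tokens, discourse):
    """Stage 2 stub for the CCG parse: resolve anaphora against shared
    discourse state and return a canonical logical form."""
    resolved = tuple(discourse.get(tok, tok) for tok in tokens)
    if tokens and tokens[0] != 'it':
        discourse['it'] = tokens[0]  # naive antecedent tracking
    return resolved

def peg_compose(logical_forms):
    """Stage 3 stand-in for the final PEG phase: compose the
    per-statement logical forms into an AST (here a nested tuple)."""
    return ('program', *logical_forms)

def parse(source):
    discourse = {}
    forms = [ccg_phase(toks, discourse) for toks in peg_tokenize(source)]
    return peg_compose(forms)

ast = parse("x holds 3. increment it.")
print(ast)  # ('program', ('x', 'holds', '3'), ('increment', 'x'))
```

The point of the sketch is only the data flow: the "it" in the second statement is resolved by the middle phase before the outer phase ever sees it.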
In effect, the CCG could be invoked to parse individual statements, which might share some global state reflecting a discourse representation structure to resolve inter-statement anaphora, while the PEG would select a canonical reading of each statement and derive the overall parse of the program. This way we could optimize our handling of state by encapsulating the stateful aspects of our code in a CCG/discourse world outside of the PEG proper (somewhat akin to the IO Monad in Haskell).
As a further optimization, the first-phase PEG could detect and immediately parse simple statement forms, such as traditional function calls, that don't require CCG processing at all; only QNL text containing complex NL forms (which by language definition would be reduced to one canonical reading) would invoke the CCG. This way the language designer would pay a minimal CCG overhead only when using NL constructs.
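The fast-path dispatch could look something like the following sketch, where a simple regular expression stands in for the first-phase PEG, recognizing plain function-call statements directly and handing everything else off to a stubbed (and notionally expensive) CCG phase. All names here are illustrative:

```python
import re

# Matches a conventional call like f(x, y); a stand-in for the
# simple-statement rules of the first-phase PEG.
CALL = re.compile(r'^(\w+)\((\s*\w+(\s*,\s*\w+)*)?\s*\)$')

def ccg_parse(stmt):
    """Stub for the costly CCG phase; a real implementation would
    return the single canonical reading of the NL statement."""
    return ('ccg', stmt)

def parse_statement(stmt):
    m = CALL.match(stmt.strip())
    if m:  # simple function-call form: no CCG overhead incurred
        args = [a.strip() for a in (m.group(2) or '').split(',') if a.strip()]
        return ('call', m.group(1), args)
    return ccg_parse(stmt)

print(parse_statement("f(x, y)"))              # ('call', 'f', ['x', 'y'])
print(parse_statement("add one to the total")) # ('ccg', 'add one to the total')
```

Statements in the conventional form never touch the CCG machinery, which is exactly the "pay only when you use NL constructs" property described above.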
But making the final call on the feasibility of such a hybrid approach, and determining whether it could help us resolve the ambiguity vs. ordering conundrum, is a bit outside the ambit of my personal expertise. Likewise, it may be possible to parse these same QNL constructs with a cleverly crafted and highly optimized PEG, although I have yet to find any work applying PEGs to NL text. Other open questions are whether a Packrat parser could be used as a component of a CCG implementation, and whether a CCG rule type could be embedded in a PEG grammar.
In any case, I would direct anyone curious about CCGs to an excellent tutorial on them at: <ftp://ftp.cogsci.ed.ac.uk/pub/steedman/ccg/manifesto.pdf>
There are also some CCG implementations, other papers, and many links available at: <http://groups.inf.ed.ac.uk/ccg/index.html>
Cheers,
Peter

_______________________________________________
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg