Re: switching to different parser in Pig

2009-02-24 Thread pi song
(1) Lack of good documentation, which makes it hard and time-consuming to
learn JavaCC and make changes to the Pig grammar
== ANTLR is very well documented.
http://www.pragprog.com/titles/tpantlr/the-definitive-antlr-reference
http://media.pragprog.com/titles/tpantlr/toc.pdf
http://www.antlr.org/wiki/display/ANTLR3/ANTLR+3+Wiki+Home

(2) No easy way to customize error handling and error messages
== ANTLR has very extensive error handling support
http://media.pragprog.com/titles/tpantlr/errors.pdf

(3) Single path that performs both tokenizing and parsing
== What is the advantage of decoupling tokenizing and parsing?

In addition, ANTLR's composite grammars are very useful for keeping the
parser modular. Things that can be treated as sub-languages, such as bag
schema definitions, can be developed and unit tested separately.
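To make the modularity point concrete, here is a minimal sketch of a schema sub-language handled on its own, with tokenizing and parsing kept as two separate stages so each can be tested in isolation. It is plain hand-written Python, not ANTLR composite grammar syntax, and the schema form "(name:type, ...)" is a simplified assumption rather than Pig's actual schema grammar. The names tokenize and parse_schema are hypothetical.

```python
from typing import List, Tuple

# Assumed, simplified schema syntax: "(name:type, name:type, ...)".
PUNCTUATION = frozenset("(),:")


def tokenize(text: str) -> List[str]:
    """Stage 1: split the input into punctuation and identifier tokens."""
    tokens: List[str] = []
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch.isspace():
            pos += 1
        elif ch in PUNCTUATION:
            tokens.append(ch)
            pos += 1
        elif ch.isalpha() or ch == "_":
            start = pos
            while pos < len(text) and (text[pos].isalnum() or text[pos] == "_"):
                pos += 1
            tokens.append(text[start:pos])
        else:
            raise SyntaxError(f"unexpected character {ch!r} at offset {pos}")
    return tokens


def parse_schema(tokens: List[str]) -> List[Tuple[str, str]]:
    """Stage 2: turn a token list into (field name, type name) pairs."""
    pos = 0

    def take() -> str:
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        return tok

    def expect(value: str) -> None:
        tok = take()
        if tok != value:
            raise SyntaxError(f"expected {value!r} but found {tok!r}")

    def identifier() -> str:
        tok = take()
        if tok in PUNCTUATION:
            raise SyntaxError(f"expected an identifier but found {tok!r}")
        return tok

    fields: List[Tuple[str, str]] = []
    expect("(")
    while True:
        name = identifier()
        expect(":")
        fields.append((name, identifier()))
        separator = take()
        if separator == ")":
            break
        if separator != ",":
            raise SyntaxError(f"expected ',' or ')' but found {separator!r}")
    if pos != len(tokens):
        raise SyntaxError("unexpected tokens after closing ')'")
    return fields


if __name__ == "__main__":
    print(parse_schema(tokenize("(id:int, name:chararray)")))
```

Because the tokenizer only produces a list of strings, the parser can be unit tested with hand-built token lists, and the tokenizer can be tested without any grammar rules at all.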

ANTLRWorks (http://www.antlr.org/works/index.html) also makes grammar
development very efficient. Think of an IDE that helps you debug your code
(which here is the grammar).

One question: is there any use case for branching and loops? The current Pig
is more like a query (declarative) language. I don't really see how loop
constructs would fit. I think what Ted mentioned is more about embedding Pig
in other languages and using those languages to do the loops.

We should think about how the logical plan layer can be made simpler for
external use so we don't have to introduce a new layer. Is there any major
active development on it? Currently I have more spare time and should be
able to help out. (BTW, I'm slow because this is just my hobby. I don't want
to hold you guys up.)

Pi Song

On Tue, Feb 24, 2009 at 6:23 AM, nitesh bhatia niteshbhatia...@gmail.com wrote:

 Hi
 I got this info from javacc mailing lists. This may prove helpful:


 
 -----Original Message-----
 From: Ken Beesley [mailto:ken@xrce.xerox.com]
 Sent: Wednesday, August 18, 2004 2:56 PM
 To: javacc
 Subject: [JavaCC] Alternatives to JavaCC (was Hello All)

 Vicas wrote:

 Hello All

 Kindly let me know of other parsers available which do the same job as
 javacc.

 It would be very nice of you if you can send me some documentation
 related to this.

 Thanks Vikas

 (Corrections and clarifications to the following would be _very_
 welcome. I'm very likely out of date.)

 Of course, no two software tools are likely to do _exactly_ the same
 job. Someone already pointed you to ANTLR, which is probably the
 best-known alternative to JavaCC. Another possibility is SableCC.
 http://sablecc.org

 The criteria include stability, documentation, language of the parser
 generated, and abstract-syntax-tree building.

 When I last looked (a couple of years ago) at ANTLR, SableCC and
 JavaCC, I chose JavaCC for the following reasons:

 1. ANTLR could not handle Unicode input. Things change, of course, so
 ANTLR might now be more Unicode-friendly. Unicode was important to me,
 so this was a big factor in my decision.

 On the plus side for ANTLR, it has better abstract-syntax-tree
 building capabilities (in my opinion) than JJTree/JavaCC. You can
 learn to use JJTree commands, but it's not easy for most people.

 And ANTLR can generate either a Java or a C++ parser. JavaCC generates
 only Java parsers.

 Another concern about ANTLR was that it was reputed to change a lot as
 the guru, Terence Parr, experimented with new syntax and
 functionality. JavaCC, at least at the time, was reputed to be more
 stable, perhaps stable to a fault. I wanted stability and reliability.

 2. SableCC is much like JavaCC; it generates a Java parser from a
 grammar description; but it had, in my opinion, less flexible
 abstract-syntax-tree building than JJTree/JavaCC. In SableCC (when I
 looked at it), the AST it built was always a direct reflection of your
 grammar, generating one tree node for each grammar expansion involved
 in a parse, much like using JavaCC with Java Tree Builder (JTB
 http://www.cs.purdue.edu/jtb/). When using JavaCC, JTB is the
 alternative to using JJTree.

 Using SableCC, or the combination JavaCC/JTB, should be _very_ similar
 indeed.

 In my opinion, SableCC and JavaCC/JTB have made a conscious choice to
 simplify AST building--you get trees that reflect the expansions in
 your grammar. Period. But often these default trees will be big, full
 of extraneous nodes that reflect precedence hierarchies in the
 recursive-descent parsing. If you want to have more control over AST
 building, to get more compact and tailored ASTs, you need to pay the
 price of learning JJTree.
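As a rough illustration of the "extraneous nodes" point, the sketch below builds the kind of tree a one-node-per-expansion builder might produce for "1 + 2" and then collapses single-child chains into a more compact tree. This is a post-processing pass written in plain Python under assumed node labels (Expr, Term, Factor). It is not how JJTree works, since JJTree is configured in the grammar itself, and Node, collapse and size are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def collapse(node: Node) -> Node:
    """Return a copy of the tree with single-child chains removed."""
    # Skip down through nodes that only wrap one child; these typically
    # mirror precedence levels rather than anything in the input.
    while len(node.children) == 1:
        node = node.children[0]
    return Node(node.label, [collapse(child) for child in node.children])


def size(node: Node) -> int:
    """Count the nodes in a tree."""
    return 1 + sum(size(child) for child in node.children)


# Assumed default tree for "1 + 2": every grammar expansion gets a node.
full = Node("Expr", [
    Node("Term", [Node("Factor", [Node("1")])]),
    Node("+"),
    Node("Term", [Node("Factor", [Node("2")])]),
])

compact = collapse(full)

if __name__ == "__main__":
    print(size(full), "nodes before,", size(compact), "nodes after")
```

Even in this tiny example the Term and Factor wrappers carry no information of their own, which is why the default trees grow quickly for real grammars with many precedence levels.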

 Assuming that you need to build ASTs, with JavaCC you have the choice
 between JJTree and JTB. With SableCC, when I last looked at it, you
 only get the JTB-like option.

 ***

 (Again, corrections and expansions would be much appreciated.)

 Ken Beesley

Re: switching to different parser in Pig

2009-02-20 Thread pi song
Sounds good, but how about exposing the logical plan layer instead? Wouldn't
that yield the same effect? From Python, for example, you could still
construct a logical plan and give it to Pig to execute.
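A purely hypothetical sketch of what that could look like from the Python side: the host language builds a small chain of plan operators and serializes it for an engine to execute. Nothing here is an existing Pig API. PlanNode, load, filter_by, group_by, operators and to_json are invented names, and the JSON hand-off format is an assumption made only for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class PlanNode:
    """One operator in a hypothetical logical plan."""
    op: str
    args: dict
    source: Optional["PlanNode"] = None  # upstream operator, if any


def load(path: str) -> PlanNode:
    return PlanNode("LOAD", {"path": path})


def filter_by(source: PlanNode, condition: str) -> PlanNode:
    return PlanNode("FILTER", {"condition": condition}, source)


def group_by(source: PlanNode, key: str) -> PlanNode:
    return PlanNode("GROUP", {"key": key}, source)


def operators(plan: PlanNode) -> List[str]:
    """List operator names in execution order (upstream first)."""
    names = []
    node: Optional[PlanNode] = plan
    while node is not None:
        names.append(node.op)
        node = node.source
    return list(reversed(names))


def to_json(plan: PlanNode) -> str:
    """Serialize the plan; an engine would consume this (assumed format)."""
    return json.dumps(asdict(plan), sort_keys=True)


plan = group_by(filter_by(load("logs.txt"), "status == 500"), "host")

if __name__ == "__main__":
    print(operators(plan))
    print(to_json(plan))
```

With plans built as ordinary objects like this, any branching or looping would live in the host language, which simply constructs a different plan on each iteration.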
On Wed, Feb 18, 2009 at 10:07 AM, Ted Dunning ted.dunn...@gmail.com wrote:

 2009/2/17 Alan Gates ga...@yahoo-inc.com

  [not commenting on the switch, only on the exposure of AST's] Is that
  correct?
 

 Nearly so.


  So whether we switch parsing technologies or not is not of interest to
  you, only the interfaces we expose?
 

 I would think that switching parsing technologies would encourage creation
 of a better AST interface layer, which furthers my goal of getting to the
 ASTs for other purposes.  I also think that exposing the AST layer would
 further your goal of switching parser technology by allowing outsiders to
 contribute parsers that you might ultimately like better.

 So I do see a linkage and do support switching.

 +1 to switching parsers (and thus making switching easier)