Re: Earley complexity over grammar size

Jeffrey Kegler Thu, 01 Nov 2018 17:36:01 -0700

Forgive me if I'm overexplaining -- you obviously know a lot of these
matters, but any over-explanations will help other readers, so I hope
you'll be patient with them.

Quibble 1: "Complexity" since the 60s has usually referred to results in
terms of Landau (big-O) notation.  These are only relevant for infinite
sets, which do not of course occur in practice.  Nonetheless Big-O has
proved so helpful in practice that its limits are sometimes forgotten.  The
grammar you cite is constant-sized, so in that sense there is no room for a
complexity advantage either way.

Quibble 2: CYK is always cubic (O(n**3)), so that it can have a complexity
advantage only for inputs which Earley/Leo (=Marpa) is cubic, which are
rare in practical use.  But these do occur in NLP.  So to keep this a horse
race, let's assume we're talking about one of the grammars where Earley/Leo
goes cubic.

Foreshadowing: Elsewhere I've listed the 5 algorithms that IMHO might stand
the test of time and survive in some form: regular expressions, Earley/Leo,
recursive descent, PEG and CYK.  So I believe there are some grammars for
which CYK is better than Marpa -- how many and how interesting they are is
another question.

OK, now for handicapping the horse race:

Marpa would have to put a prediction for every one of the rules you list
into one of its Earley sets, but for predictions this can be done by
flipping a bit in a vector -- in a given Earley set, each prediciton
depends only on the rule, which is an integer less than a constant.  Last I
recall, Marpa does not use such a bit vector, but that's only because it's
necessity is not proven -- Marpa seems to be fast enough.  After that 1st
Earley set only the rule whose initial terminal was found in the input will
generate entries in the Earley sets.

CYK would behave similarly but the selection of the terminal would be out
of pairs with every other possible token in the grammar -- if the grammar
is truly large, we are going quadratic on a very large number.  And this
propagates up the tree.

These are, I believe, valid and important point, but admittedly I have
acted as Marpa's lawyer, puffing up my client's case and poking holes in
the case for CYK.  For some grammars, I think a case could be made the
other way, and the verdict of a candid finder of fact might in fact go to
CYK.

I hope this helps, jeffrey

On Thu, Nov 1, 2018 at 8:14 PM emh <[email protected]> wrote:

> In general, it seems that Earley, especially with Marpa enhancements,
> matches or improves on most any parsing algorithm. But its complexity seems
> to be a judged with respect to the length of the string, not the size of
> the grammar.
>
> I’m not great at complexity proofs, and I was wondering if Earley retains
> its performance advantages for very large numbers of rules, over CYK. If I
> understand correctly, it has to add Earley items for every rule, top down,
> and examine them. So for an NLP project I am doing, with many thousands of
> necessarily distinct rules, would its space/time complexity suffer greatly?
>
> For example, if every rule started with a distinct terminal, would Earley
> have to add and scan every one in S(0), while CYK would in constant time
> (and less space) select just the relevant rule(s)? As in:
>
> start: r1 | r2 | r3 | … | r_i
>
> r1: "Print" …
> r2: "Set" …
> r3: "Walk" …
> ….
> r_i: T_i …
>
> Then in S(0) we have to add and scan rules r1 … r_i, while in CYK it would
> immediately be able to select a very small subset of r1 … r_i (in CNF),
> without examining all other rules.
>
> Thanks for your help.
>
> --
> You received this message because you are subscribed to the Google Groups
> "marpa parser" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"marpa parser" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Re: Earley complexity over grammar size

Reply via email to