I wonder if an 'ordinary' regex-based solution might actually serve you better.

But I'd definitely start out looking at the source of the 'chronic' gem, as Kaspar suggested, to see how they do it -- they do some portion of what you want to do already, fairly well.
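
For what it's worth, here is a rough sketch of what the regex route could look like for sentences shaped like the example in this thread. The constant `EVENT_PATTERN` and the method `parse_event` are made-up names for illustration, and the pattern assumes the fixed order "event type, with, attendee, on, date/time"; it is not a general solution.

```ruby
# Hypothetical pattern for "<event_type> with <attendee> on <when>".
EVENT_PATTERN = /
  \A
  (?<event_type>lesson|class|interview)
  \s+with\s+
  (?<attendee>\w+(?:\s+(?!on\b)\w+)*)  # one or more words, stopping before the word on
  \s+on\s+
  (?<when>.+)                          # everything after on, left unparsed
  \z
/x

# Returns a hash of the named captures, or nil when the sentence does not fit.
def parse_event(sentence)
  match = EVENT_PATTERN.match(sentence)
  match && match.named_captures
end

p parse_event("lesson with john doe on friday at 2:00pm")
```

The `when` capture is deliberately left as a raw string; something like `Chronic.parse` could then be tried on it.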

On 12/14/2011 6:20 AM, Nigel Thorne wrote:
Hi Cody..

I have to agree with Kaspar, this isn't what Parslet is for..

However...If I _had_ to get your parser to work..

1/ You are parsing (word | word space word)... Parslet will always consume the first matching option.. so 'word' will match instead of 'word space word'. Flip them to get 'word' to be the fallback option...
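
The effect of the ordering can be shown without Parslet. The sketch below uses Ruby's atomic group `(?>...)` as a stand-in for PEG's committed choice: once an alternative inside the group has matched, the others are never tried. This is only an analogy in plain regexes, not Parslet code, and the constant names are made up.

```ruby
# (?>...) is an atomic group: after one alternative inside it matches,
# the engine may not go back and try the remaining ones.
WORD_FIRST = /\A(?>\w+|\w+ \w+)\z/  # 'word' listed first
PAIR_FIRST = /\A(?>\w+ \w+|\w+)\z/  # 'word space word' listed first

puts WORD_FIRST.match?("john doe")  # false: 'john' is taken, ' doe' is left over
puts PAIR_FIRST.match?("john doe")  # true
puts PAIR_FIRST.match?("john")      # true: falls back to the single word
```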

2/ There is ambiguity on the day (as it may be part of the name)... so I would change the grammar to have 'on' as your indicator that you are starting a date... and 'at' to indicate a time. This would mean you can't have someone called "on" as their last name... shame. With this (and assuming your order is always the same... i.e. event_type, optional attendee, date, time...) you can write the attendee rule as

rule(:attendee) { (word >> (space >> str('on').absent? >> word).repeat).as(:attendee) }

(absent? is the negative lookahead; older Parslet releases spell it absnt?.)
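
One caveat with a lookahead on the bare letters 'on': it also stops at names that merely begin with "on". The plain-regex sketch below illustrates the difference; it is an analogy rather than Parslet code, and the names `BARE_LOOKAHEAD`, `WORD_LOOKAHEAD` and `first_words` are invented for the example. In Parslet, the whole-word variant might be written as `(str('on') >> space).absent?`, which I have not tested.

```ruby
# Stops before any following word that starts with the letters "on".
BARE_LOOKAHEAD = /\A\w+(?: (?!on)\w+)*/
# Stops only before the whole word "on".
WORD_LOOKAHEAD = /\A\w+(?: (?!on\b)\w+)*/

# Returns the leading run of words accepted by the pattern, or nil.
def first_words(pattern, text)
  match = pattern.match(text)
  match && match[0]
end

puts first_words(BARE_LOOKAHEAD, "john onslow on friday")  # john
puts first_words(WORD_LOOKAHEAD, "john onslow on friday")  # john onslow
```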

<code>
require 'rubygems'
require 'parslet'
require 'parslet/convenience' # provides parse_with_debug

class SimpleParser < Parslet::Parser
  rule(:space) { str(' ') }
  rule(:word)  { match('\w').repeat(1) }

  # A first word, then further words as long as the next one is not 'on'.
  # absent? is a negative lookahead (older Parslet releases spell it absnt?).
  #rule(:attendee) { ((word >> space >> word) | word).as(:attendee) }
  rule(:attendee)   { (word >> (space >> str('on').absent? >> word).repeat).as(:attendee) }
  rule(:event_type) { (str('lesson') | str('class') | str('interview')).as(:event_type) }
  rule(:attendance) { str('with') >> space >> attendee }

  # Everything after 'on' is captured unparsed as :when.
  rule(:stuff)       { any.repeat }
  rule(:temporality) { str('on') >> space >> stuff.as(:when) }
  rule(:event)       { event_type >> space >> attendance >> space >> temporality }
  root(:event)
end

@parser = SimpleParser.new

# puts @parser.attendee.parse_with_debug("nigel thorne")
puts @parser.parse_with_debug("lesson with john doe on friday at 2:00pm")
</code>

... something like this


---
"No man is an island... except Philip"


On 13 December 2011 19:32, Kaspar Schiess wrote:

    Hi Cody,

    The kind of sentences you're trying to match are highly ambiguous.
    Suppose you want to schedule dinner with Monday Doe, named after the
    day she was born? You will have to exclude all weekdays from all names
    in order to even match names - and then clashes like this one are
    bound to happen.

    If you look at the history of parsing, formalisms other than PEG have
    been developed for this kind of sentence. They use probabilistic and
    heuristic 'interpretation'-type approaches. [1] gives a good overview
    and many of these algorithms have been implemented in Ruby.

    Parslet implements PEG very strictly; it doesn't even skip whitespace
    for you or support left recursion. This is going to stay that way. PEG
    is very good (IMHO) for computer languages. It avoids many ambiguities
    by posing a formalism that doesn't allow them, which appeals to me
    because it is elegant.

    So that means: Parsing natural language with parslet will make you
    unhappy, folks! Cody, the common strategy for parsing things like this
    is to use a bottom-up parser, LR(k) or the like. There are very elegant
    natural language frameworks out there as well. And last (but .. not
    least) there is chronic[2], which does part of what you want...

    regards,
    kaspar

    [1] http://en.wikipedia.org/wiki/Parser
    [2] http://rubygems.org/gems/chronic
    [3] http://duckduckgo.com/?q=natural+language+ruby


