I wonder if an 'ordinary' regex-based solution might actually serve you
better.
But I'd definitely start out looking at the source of the 'chronic' gem,
as Kaspar suggested, to see how they do it -- they do some portion of
what you want to do already, fairly well.
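For what it's worth, here is a minimal sketch of what such a regex-based approach might look like. The pattern, the group names and the `parse_event` helper are all made up for illustration, and it assumes the input always follows the order event type, attendee, day, optional time, with "on" and "at" as delimiters:

```ruby
# Hypothetical pattern: the group names and the fixed word order are assumptions.
EVENT_PATTERN = /
  \A
  (?<event_type>lesson|class|interview)
  \s+with\s+
  (?<attendee>\w+(?:\s+(?!on\b)\w+)*)   # one or more words, never the word "on"
  \s+on\s+
  (?<day>\w+)
  (?:\s+at\s+(?<time>\d{1,2}:\d{2}\s*(?:am|pm)))?
  \z
/xi

# Returns a hash of the named captures, or nil if the text does not match.
def parse_event(text)
  match = EVENT_PATTERN.match(text.strip)
  return nil unless match
  match.names.map { |name| [name.to_sym, match[name]] }.to_h
end

p parse_event("lesson with john doe on friday at 2:00pm")
```

Like any keyword-delimited scheme, this cannot cope with an attendee whose name contains the word "on", and it does nothing to interpret the day or time; that part is where chronic would come in.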
On 12/14/2011 6:20 AM, Nigel Thorne wrote:
Hi Cody..
I have to agree with Kaspar, this isn't what Parslet is for..
However...If I _had_ to get your parser to work..
1/ You are parsing (word | word space word)... Parslet will always
consume the first matching option.. so 'word' will match instead of
'word space word'. Flip them to get 'word' to be the fallback option...
2/ there is ambiguity on the day (as it may be part of the name)... so
I would change the grammar to have "on" as your indicator that you are
starting a date... and 'at' to indicate a time. This would mean you
can't have someone called "on" as their last name... shame. With this
(and assuming your order is always the same... i.e. event_type,
optional attendee, date, time... ) you can use:
rule(:attendee) { (word >> (space >> str('on').absnt? >> word).repeat).as(:attendee) }
<code>
require 'rubygems'
require 'parslet'
require 'parslet/convenience' # provides parse_with_debug

class SimpleParser < Parslet::Parser
  rule(:space) { str(" ") }
  rule(:word)  { match('\w').repeat(1) }

  # Simple version: the longer alternative first, 'word' as the fallback.
  #rule(:attendee) { ((word >> space >> word) | word).as(:attendee) }

  # One or more words, stopping before the 'on' that starts the date.
  # absnt? is a negative lookahead (newer Parslet releases spell it absent?).
  rule(:attendee) { (word >> (space >> str('on').absnt? >> word).repeat).as(:attendee) }

  rule(:event_type) { (str('lesson') | str('class') | str('interview')).as(:event_type) }
  rule(:attendance) { str('with') >> space >> attendee }
  rule(:stuff) { any.repeat }
  rule(:temporality) { str('on') >> space >> stuff.as(:when) }
  rule(:event) { event_type >> space >> attendance >> space >> temporality }
  root(:event)
end

@parser = SimpleParser.new
# puts @parser.attendee.parse_with_debug("nigel thorne")
puts @parser.parse_with_debug("lesson with john doe on friday at 2:00pm")
</code>
... something like this
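
To make point 1/ concrete without Parslet itself, here is a rough plain-Ruby sketch of ordered choice using StringScanner from the standard library. `first_match` is a hypothetical helper, not part of Parslet; it only mimics the rule that the first alternative that matches wins, so the order of the alternatives matters:

```ruby
require 'strscan'

WORD      = /\w+/
TWO_WORDS = /\w+ \w+/

# Hypothetical helper: try each alternative in order at the start of the
# input and return the text consumed by the first one that matches.
def first_match(input, alternatives)
  alternatives.each do |pattern|
    scanner = StringScanner.new(input)
    matched = scanner.scan(pattern)
    return matched if matched
  end
  nil
end

p first_match("john doe", [WORD, TWO_WORDS])  # 'word' wins, " doe" is left over
p first_match("john doe", [TWO_WORDS, WORD])  # the longer alternative gets its chance
p first_match("john", [TWO_WORDS, WORD])      # falls back to 'word'
```

With the short alternative first, only "john" is consumed and the rest of the input is left dangling, which is why the longer alternative has to come first.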
---
"No man is an island... except Philip"
On 13 December 2011 19:32, Kaspar Schiess <[email protected]> wrote:
Hi Cody,
The kind of sentences you're trying to match are highly ambiguous.
Suppose you want to schedule dinner with Monday Doe, named after the
day she was born? You would have to exclude all weekdays from all names
in order to even match names - and then clashes like this one are bound
to happen.
If you look at the history of parsing, formalisms other than PEG have
been developed for this kind of sentence. They use probabilistic and
heuristic 'interpretation'-type approaches. [1] gives a good overview
and many of these algorithms have been implemented in Ruby.
Parslet implements PEG very slavishly; it doesn't even ignore
whitespace or handle left recursion. This is going to stay that way.
PEG is very good (IMHO) for computer languages. It avoids many
ambiguities by posing a formalism that doesn't allow them, which
appeals to me because it is elegant.
So that means: Parsing natural language with parslet will make you
unhappy, folks! Cody, the common strategy for parsing things like this
is to use a bottom up parser, LR(k) or the like. There are very elegant
natural language frameworks out there as well. And last (but .. not
least) there is chronic[2], which does part of what you want...
regards,
kaspar
[1] http://en.wikipedia.org/wiki/Parser
[2] http://rubygems.org/gems/chronic
[3] http://duckduckgo.com/?q=natural+language+ruby