Hi, all. 

I'm writing a parser to rewrite end users' search terms for a
particular search engine API.

I'm trying to parse out phrases, defined as any group of words,
quoted or not, that isn't a reserved word (mostly boolean operators).

I can't seem to come up with a grammar that will return the longest
run of words that doesn't contain a boolean.

I've tried adding lookaheads in various spots with no luck -- booleans
are simply parsed as words, not reserved words.
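
To make the goal concrete, here is a rough sketch in plain Perl (no
Parse::RecDescent) of the splitting I'm after for unquoted input. The
names split_phrases and $RESERVED are made up for this illustration and
are not part of the test app, and quoted phrases are deliberately not
handled.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: split an unquoted query
# into phrases and boolean operators. Quoting is not handled here.
my $RESERVED = qr/\b(?:AND\s+NOT|AND|NOT|OR|NEAR)\b/;

sub split_phrases {
    my ($query) = @_;
    # The capturing group makes split return the operators as well
    # as the text between them.
    my @parts = split /\s*($RESERVED)\s*/, $query;
    # Drop the empty leading field produced when the query starts
    # with an operator.
    return grep { length } @parts;
}

print join(' | ', split_phrases('one AND two NEAR three phrases')), "\n";
```

For that input I'd want the phrases and operators to come back as
"one | AND | two | NEAR | three phrases", which is the shape I'm hoping
the grammar can produce.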

Below is my test app with broken grammar and test phrases -- am I
missing something painfully obvious?

-- 
Neil Kohl
Manager, ACP-ASIM Online
[EMAIL PROTECTED]

#!/usr/local/bin/perl
# $Id: test.pl,v 1.3 2002/10/16 20:04:36 neilk Exp neilk $

use Data::Dumper;
use Parse::RecDescent;
use strict; 
$|++;
$::RD_AUTOACTION = q { [@item[0..$#item]]; };
$::RD_HINT=1;
$::RD_WARN=1;
# $::RD_TRACE=1;

my $grammar =<<'EOG';

query:   phrase (reserved_word phrase)(?)
       | <error>

phrase:   quoted_phrase
        | nekkid_phrase

quoted_phrase:   '"' /[^\"]+/ '"'
               | "'" /[^\']+/ "'"

nekkid_phrase: word(s) ...!reserved_word

reserved_word:  /AND\s+NOT|AND|NOT|OR|NEAR/

word: /[^\s\{\}\(\)]+/

EOG

my $p = Parse::RecDescent->new($grammar) or die;
while (<DATA>) {
  chomp;
  print $_, " -> ";
  my $out = $p->query($_);
  print Dumper($out), "\n"; 
}

exit;

__END__
test AND of OR all AND NOT boolean NEAR operators NOT defined 
simple phrase query
"quoted string"
"quoted string's with apostrophe"
'single quoted string'
simple phrase AND "quoted string"
simple phrase AND another phrase
one AND two NEAR three phrases
this OR that AND the other thing NOT that one
broken boolean query AND
single quote's
