Dear Marvin,

I am sorry: my bad. While answering your questions below, I found that I made a mistake in the test code I wrote to pin down my problem. You asked how I build the query, and this made me take another look at how I instantiate the QueryParser object. I selected the wrong fields. I may have done something similar in the real code. I will check this tomorrow. Sorry to have bothered you.

Thanks for asking the right questions and telling me how it should work.

Kind regards,
Arjan.

The reason for my question is this. If I index a single document of a single
line that contains words separated by hyphens, I can retrieve that
document by any word, but not by the words separated by hyphens, nor by
the whole phrase including the hyphens.

For example, I index a single document with only this sentence:

"please subscribe to this mailing-list"

I can retrieve this document by searching for "please" or "subscribe" or
"please subscribe", but not by searching for "mailing-list" or "mailing"
or "list".
I'm confused -- there seems to be a contradiction between your ability to
retrieve the document "by any word", and your inability to retrieve the
document by searching for "mailing" or "list".

Can you please clarify what you get when you search for "mailing"?
I can retrieve the document by "please", "subscribe", "to" and "this", but not by "mailing", "list" or "mailing-list". So if I search for "mailing", I get zero hits.
It seems that the words "mailing" and "list" are treated as separate
words, since both "mailing" and "list" can be found in the lexicon.
They're in the lexicon?  Do you mean that you've gone all the way down into
Lucy::Index::Lexicon, or something else?
Yes, like so:
use feature 'say';
use Encode qw( encode );
use Lucy;

# Open a reader on the index and walk the lexicon of every segment.
my $polyreader = Lucy::Index::IndexReader->open(
    index => $env->message_storage,
);
my $seg_readers = $polyreader->seg_readers;

foreach my $seg_reader ( @$seg_readers ) {
    say "segment: $seg_reader";
    my $lex_reader = $seg_reader->obtain( "Lucy::Index::LexiconReader" );
    my $lexicon    = $lex_reader->lexicon( field => 'title' );

    # Print every term indexed for the 'title' field.
    while ( $lexicon->next ) {
        say encode( 'utf8', $lexicon->get_term );
    }
}
Any help would be appreciated, or is this a bug?
How are you building/executing the query?
Ohhhhh....
What does the FieldType assigned to the field in question look like?

For common Analyzer configurations, Lucy's QueryParser is supposed to parse
hyphenated constructs as phrases -- so these should all produce the same
results:

     "mailing list"
     "mailing-list"
     mailing-list

Similarly, these should all produce the same results:

     "please subscribe"
     "please-subscribe"
     please-subscribe

It might be interesting to know whether those work as expected.

Best,

Marvin Humphrey
