Yeah, I suspected that it was going to be expert/cryptic. I think the real point of my September proposal was that we need a common piece of code that has all the expert smarts to reconstruct the "graph" and can then generate the Lucene Query structure that matches that reconstructed graph. It would also be great to have clear, detailed documentation of the linear graph encoding, but the code to reconstitute the graph is the real goal.

And, of course, we need the synonym filter to do the full graph encoding. Last time I looked (September), PosLenAtt didn't seem to have a value that I could make sense of in terms of the graph (I think it always had the same value), but maybe I just wasn't interpreting it according to the undocumented rules.

I think a fair number of people want to be able to do query-time synonyms, so I wouldn't classify that as an "expert" use case, but I agree that dealing with graph encoding/decoding is more of an expert chore. Although, in the end, it is really just a small number of people who work on query parsers who actually have a "need to know" about synonym graphs.

-- Jack Krupansky

-----Original Message-----
From: Michael McCandless
Sent: Saturday, January 26, 2013 6:31 AM
To: dev@lucene.apache.org
Subject: Re: Fixing query-time multi-word synonym issue

On Fri, Jan 25, 2013 at 6:28 PM, Jack Krupansky <j...@basetechnology.com> wrote:

Is there a decent writeup on PositionLengthAttribute? I mean, the Javadoc
says "The positionLength determines how many positions this token spans",
which doesn't sound very relevant to multi-term synonyms that span multiple
positions.

I don't think we have good enough javadocs around PosLenAtt ... we
really should add some simple graph examples.  Though, it is a very
expert topic ...

The way to think of PosIncAtt is that it tells you the start node of
this arc (= token), while PosLenAtt tells you the end node, except
both atts are "delta coded", making it hard to think about.
Furthermore, all arcs (tokens) must be emitted "in order", i.e. smaller
numbered nodes (positions) must come first, and all arcs leaving each
node must be enumerated before you go to the next node.
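
The delta decoding described above can be sketched roughly as follows. This is a
standalone Java illustration, not Lucene code: the Tok and Arc classes, the
decode and example methods, and the token values for "dns is down" (with
"domain name service" as a synonym of "dns") are all hypothetical and
hand-written. It assumes the position counter starts at -1, so that a first
token with a position increment of 1 lands on node 0.

```java
import java.util.ArrayList;
import java.util.List;

class TokenGraphSketch {
    /** Hypothetical stand-in for a token: term text plus the two delta-coded attributes. */
    static final class Tok {
        final String term;
        final int posInc; // delta from the previous token's start node
        final int posLen; // delta from this token's start node to its end node
        Tok(String term, int posInc, int posLen) {
            this.term = term;
            this.posInc = posInc;
            this.posLen = posLen;
        }
    }

    /** One arc of the reconstructed graph, with absolute node numbers. */
    static final class Arc {
        final int from;
        final int to;
        final String label;
        Arc(int from, int to, String label) {
            this.from = from;
            this.to = to;
            this.label = label;
        }
        @Override
        public String toString() {
            return from + " -> " + to + " [" + label + "]";
        }
    }

    /** Turns the delta-coded token sequence into arcs with absolute start and end nodes. */
    static List<Arc> decode(List<Tok> tokens) {
        List<Arc> arcs = new ArrayList<>();
        int pos = -1; // assumption: first token with posInc=1 starts at node 0
        for (Tok t : tokens) {
            pos += t.posInc;                               // start node
            arcs.add(new Arc(pos, pos + t.posLen, t.term)); // end node = start + posLen
        }
        return arcs;
    }

    /** Hand-written example tokens (not the output of any real filter). */
    static List<Tok> example() {
        List<Tok> tokens = new ArrayList<>();
        tokens.add(new Tok("dns", 1, 3));     // spans the whole three-word synonym
        tokens.add(new Tok("domain", 0, 1));  // same start node as "dns"
        tokens.add(new Tok("name", 1, 1));
        tokens.add(new Tok("service", 1, 1));
        tokens.add(new Tok("is", 1, 1));
        tokens.add(new Tok("down", 1, 1));
        return tokens;
    }

    public static void main(String[] args) {
        for (Arc a : decode(example())) {
            System.out.println(a);
        }
    }
}
```

In this example "dns" becomes the arc 0 -> 3, while "domain", "name" and
"service" become 0 -> 1, 1 -> 2 and 2 -> 3, so both paths rejoin at node 3
before "is". The tokens are listed in the order required above: both arcs
leaving node 0 come before any arc leaving node 1.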

Mike McCandless

http://blog.mikemccandless.com

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
