First off Karl, thanks for your reply and your time.
karl wettin-3 wrote: > > One could also say you are classifying your data based on keywords in > the text? > I probably didn't explain myself very well or more specifically provide a good example. In my case, there really isn't any relationship between the mapped terms per document. That is to say that an individual term or phrase in the document is mapped to a concrete concept in a controlled vocabulary. The concept doesn't represent a class of anything and no relationship exists between the concepts. They would never be grouped by any means. It is more a matter of replacing some arbitrary word or phrase with an adjudicated version. The example I gave did in fact use classifications for the terms, but that is not exactly the point that I was trying to convey. I suppose a better example would be where each term or phrase in the sentence mapped to any equivilent in another language: dog -> canis dog -> cane dog -> chien So that if you searched for "canis", then any document with "dog" would be returned (unless the context inferred that dog meant something else). By the same token, if the text was "here we go" or "let's go", then it may map to "vamos" or "vamonos". To confuse matters more, it is not really a matter of synonyms, as the orginal term is discarded from the index and there is only one mapped term per original term or phrase and the algorithm determines the controlled meaning from the context. karl wettin-3 wrote: > > > You can always store values in a field, but the term and the stored > value is not coupled. Thus you would need to store the positions per > document in each field in machine readable format you then parse: > > doc.addField("f", "keyword:12,32;54,32", Field.Store.YES, .. > > But that is a way expensive solution. > Indeed, though doesn't a analyzed field have some other information attached to it? Forgive me if this is a naive question. I am fairly new to Lucene. karl wettin-3 wrote: > > > This is known as faceted classification. > > <http://en.wikipedia.org/wiki/Faceted_classification> > <http://www.nabble.com/forum/Search.jtp?query=facets&local=y&forum=44> > Again, I am not overly familiar with these disciplines, but I always thought of facets as a organizational strategy. As I said, my example betrayed me a bit, as I am not that interested in organizing these documents, rather providing a controlled vocabulary from which to search as opposed to any random text. karl wettin-3 wrote: > > Are you aware of the hightlighter contrib module? > > http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/highlighter/ > > The simplest solution is to a new facet Term per classification in text > and use the text start and end positions of the text field, and have the > hightligher to load the text and highlight this text field. > This is actually not a web based application and the highlighting would really only be used for analyzing performance of the mapping algorithms. The main issue is that we do need to be able to provide the location of the original term for each mapped keyword. karl wettin-3 wrote: > > Matching a document with the same terms occuring multiple times will > cause a greater score than it only occuring once. This is probably > problematic for you. > It may not be that big of an issue. karl wettin-3 wrote: > > Instead you could add a single Term, ignore the built in positions and > store them for all positions in the payload of that single Term. > > > for (String facet : facets) { > doc.addField( > "f", new SingleTokenTokenStream( > facet, new Payload(offsets.toByteArray()) > ) > ); > } > > (This is dry coded, you will need to implement some of them things.) > > You also need to modify the highligher so it can read this data. > > Something like this seems like it might work well for my purposes. I will look at this further. Thanks again, J --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] -- View this message in context: http://www.nabble.com/storing-position---keyword-tp15855654p15865186.html Sent from the Lucene - Java Users mailing list archive at Nabble.com. --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]