Re: Correcting text at index time
Hi all

Thanks for the replies. So there's no getting away from doing it on my own then...

@Jack: I need to replace a whole list of shortened words... It would make a crazy regex (which I incidentally wouldn't even know how to formulate).

Cheers
A.

--
View this message in context: http://lucene.472066.n3.nabble.com/Correcting-text-at-index-time-tp4214636p4215056.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: Correcting text at index time
Hi Markus

Thanks for the reply. I'm already using the Synonyms filter and it is working fine (i.e., when I search for customer, it also returns documents containing cst.).

What the synonyms filter does not do is actually replace the word cst. with customer in the document. Just to be clearer: in the returned results, I do not want to see the word cst. any more (it should be permanently replaced with customer). I only want to see the expanded form.

Cheers
A.

--
View this message in context: http://lucene.472066.n3.nabble.com/Correcting-text-at-index-time-tp4214636p4214643.html
Correcting text at index time
Hi everyone

I'm wondering if it's possible in Solr to correct text at indexing time, based on a synonyms-like list. This would be great for expanding undesirable abbreviations (for example, cst. instead of customer).

I've searched the Solr docs and the web quite thoroughly, I believe, but haven't found anything that does this. I guess if there really isn't anything like this, I could implement it as a custom Filter...

Thanks!
A.

--
View this message in context: http://lucene.472066.n3.nabble.com/Correcting-text-at-index-time-tp4214636.html
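The replacement asked about in this thread could be done as a preprocessing step before documents are sent to Solr. What follows is a minimal sketch in Python, not a Solr API: the names `ABBREVIATIONS`, `build_pattern`, and `expand` are hypothetical, and every list entry other than cst. is made up for illustration. The regex is generated from the list, so it never has to be written by hand.

```python
import re

# Hypothetical abbreviation list; in practice it would be loaded from a file,
# much like a synonyms list. Only "cst." comes from the thread.
ABBREVIATIONS = {
    "cst.": "customer",
    "acct.": "account",
    "dept.": "department",
}


def build_pattern(abbreviations):
    """Compile one regex that matches any abbreviation in the list."""
    # Longest keys first, so a longer abbreviation wins over a shorter prefix.
    keys = sorted(abbreviations, key=len, reverse=True)
    alternation = "|".join(re.escape(key) for key in keys)
    # Lookarounds instead of \b, because the keys end in "." (a non-word
    # character), where \b would not behave as a word boundary.
    return re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)", re.IGNORECASE)


def expand(text, abbreviations=ABBREVIATIONS):
    """Return text with every listed abbreviation replaced by its long form."""
    pattern = build_pattern(abbreviations)
    lookup = {key.lower(): value for key, value in abbreviations.items()}
    return pattern.sub(lambda match: lookup[match.group(0).lower()], text)


if __name__ == "__main__":
    print(expand("The cst. called about the acct. today"))
```

Because the text is changed before it reaches Solr, the stored value (and therefore the returned result) contains only the expanded form. Matching here is case-insensitive and the replacement is always lower case; both are assumptions of this sketch.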
Re: solr uima and opennlp
Hi Tommaso

Thanks for the quick reply! I have another question about using the Dictionary Annotator, but I guess it's better to post it separately.

Cheers
Andreea

--
View this message in context: http://lucene.472066.n3.nabble.com/solr-uima-and-opennlp-tp4206873p4208348.html
solr and uima dictionary annotator
Hi everyone

I am using the UIMA DictionaryAnnotator to tag Solr documents. It seems to be working (I do get tags), but I get some strange behavior:

1. I am using the White Space Tokenizer both for the indexed text and for creating the dictionary. Most entries in my dictionary consist of multiple words. From the documentation, it seems that with the default settings, a document must contain all words in order to match the dictionary entry. However, this is not the case in practice. I'm seeing documents being randomly tagged with single words, although my dictionary does not contain an entry for those single words (they only appear as part of multi-word entries). This would be fine (even preferable) if it were consistent, but it is not: the tagging happens only for a subset of single words, not for all. What am I doing wrong?

2. If a dictionary word appears multiple times in the analyzed field, it is also added just as many times to the mapped field (i.e. my tags). Is there a way to control/disable this?

Thanks!

Regards
Andreea

--
View this message in context: http://lucene.472066.n3.nabble.com/solr-and-uima-dictionary-annotator-tp4208359.html
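To make the expected behavior in the two points above concrete, here is a small Python sketch of whole-entry matching with whitespace tokenization. It is only an illustration of what the post describes as the desired result, not the DictionaryAnnotator's implementation; the function name `tag_document` and the sample entries are hypothetical, and matching is assumed to be case-sensitive and contiguous.

```python
def tag_document(text, dictionary):
    """Return each dictionary entry that occurs in text, at most once.

    An entry matches only when all of its whitespace-separated words appear
    in the text as one contiguous run, so a single word from a multi-word
    entry does not produce a tag on its own.
    """
    tokens = text.split()  # whitespace tokenization, as in the post
    tags = []
    seen = set()
    for entry in dictionary:
        words = entry.split()
        if not words or entry in seen:
            continue
        size = len(words)
        for start in range(len(tokens) - size + 1):
            if tokens[start:start + size] == words:
                seen.add(entry)
                tags.append(entry)
                break  # one tag per entry, however often it occurs
    return tags


if __name__ == "__main__":
    entries = ["solr search server", "uima annotator"]
    print(tag_document("a solr search server next to a solr search server", entries))
```

With this logic, a document containing only one word of a multi-word entry gets no tag, and an entry that occurs several times is added to the tags a single time.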
solr uima and opennlp
Hi everyone

I'm trying to plug a new UIMA annotator into Solr. What is necessary for this? Is it enough to build a JAR similar to the ones from the uima-addons package? More specifically, are the uima-addons JARs identical to the ones found in Solr's contrib folder?

Thanks!
Andreea

--
View this message in context: http://lucene.472066.n3.nabble.com/solr-uima-and-opennlp-tp4206873.html