Re: Correcting text at index time

2015-06-30 Thread hossmaa
Hi all

Thanks for the replies. So there's no getting away from doing it on my own
then...

@Jack: I need to replace a whole list of shortened words... It would make a
crazy regex (which I incidentally wouldn't even know how to formulate).

Cheers
A.
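
A note on the regex concern above: the pattern does not have to be written by hand, since it can be generated from the word list. The following is only a minimal sketch, assuming the correction runs in client code before the documents are sent to Solr. The ABBREVIATIONS entries other than cst. and the names build_pattern and expand are made up for illustration.

```python
import re

# Hypothetical abbreviation list; only "cst." comes from this thread.
# Keys must be lowercase because matches are lowercased before lookup.
ABBREVIATIONS = {
    "cst.": "customer",
    "acct.": "account",
    "dept.": "department",
}


def build_pattern(abbreviations):
    """Build one alternation regex from every key in the mapping."""
    # Longest keys first, so a longer abbreviation wins over its own prefix.
    keys = sorted(abbreviations, key=len, reverse=True)
    alternation = "|".join(re.escape(key) for key in keys)
    # The lookarounds reject matches that sit inside a longer word.
    return re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)", re.IGNORECASE)


def expand(text, abbreviations=ABBREVIATIONS):
    """Return text with every listed abbreviation replaced by its long form."""
    pattern = build_pattern(abbreviations)
    return pattern.sub(lambda m: abbreviations[m.group(0).lower()], text)


if __name__ == "__main__":
    print(expand("The cst. called the dept. today"))
```

Because the text is rewritten before indexing, the expanded form is what gets stored and returned in results, which is not the case for changes made inside the analysis chain.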




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Correcting-text-at-index-time-tp4214636p4215056.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Correcting text at index time

2015-06-29 Thread hossmaa
Hi Markus

Thanks for the reply. I'm already using the Synonyms filter and it is
working fine (i.e., when I search for customer, it also returns documents
containing cst.).
What the synonyms filter does not do is actually replace the word cst.
with customer in the document itself.

Just to be clearer: in the returned results, I do not want to see the word
cst. any more (it should be permanently replaced with customer). I want
to only see the expanded form.

Cheers
A.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Correcting-text-at-index-time-tp4214636p4214643.html


Correcting text at index time

2015-06-29 Thread hossmaa
Hi everyone

I'm wondering if it's possible in Solr to correct text at indexing time,
based on a synonyms-like list. This would be great for expanding undesirable
abbreviations (for example, replacing cst. with customer).
I believe I've searched the Solr docs and the web quite thoroughly, but I
haven't found anything that does this.

I guess if there really isn't anything like this, I could implement it as a
custom Filter...

Thanks!
A.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Correcting-text-at-index-time-tp4214636.html


Re: solr uima and opennlp

2015-05-28 Thread hossmaa
Hi Tommaso

Thanks for the quick reply! I have another question about using the
Dictionary Annotator, but I guess it's better to post it separately.

Cheers
Andreea



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-uima-and-opennlp-tp4206873p4208348.html


solr and uima dictionary annotator

2015-05-28 Thread hossmaa
Hi everyone

I am using the UIMA DictionaryAnnotator to tag Solr documents. It seems to
be working (I do get tags), but I get some strange behavior:

1. I am using the White Space Tokenizer both for the indexed text and for
creating the dictionary. Most entries in my dictionary consist of multiple
words. From the documentation, it seems that with the default settings, a
document must contain all words in order to match the dictionary entry.
However, this is not the case in practice. I'm seeing documents being
randomly tagged with single words, although my dictionary does not contain
an entry for those single words (they only appear as part of multi-word
entries). This would be fine (even preferable) if it were consistent, but
it is not: the tagging happens only for a subset of single words, not for
all of them. What am I doing wrong?

2. If a dictionary word appears multiple times in the analyzed field, it is
also added just as many times to the mapped field (i.e. my tags). Is there a
way to control/disable this?

Thanks!
Regards
Andreea
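
For the second point, one possible workaround is to collapse the repeated values after the fact. The following is only a sketch, assuming the tags are read back from the mapped field as a plain list of strings in client code. It does not change how the annotator or the field mapping behaves, and the name dedupe_tags is made up for illustration.

```python
def dedupe_tags(tags):
    """Return the tags with repeats removed, keeping first-seen order."""
    seen = set()
    unique = []
    for tag in tags:
        if tag not in seen:
            seen.add(tag)
            unique.append(tag)
    return unique


if __name__ == "__main__":
    print(dedupe_tags(["solr", "uima", "solr", "opennlp", "uima"]))
```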



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-and-uima-dictionary-annotator-tp4208359.html


solr uima and opennlp

2015-05-21 Thread hossmaa
Hi everyone 

I'm trying to plug a new UIMA annotator into Solr. What is necessary for
this? Is it enough to build a JAR similar to the ones from the uima-addons
package? More specifically, are the uima-addons JARs identical to the ones
found in Solr's contrib folder?

Thanks! 
Andreea



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-uima-and-opennlp-tp4206873.html