Re: Correcting text at index time
Absolutely - I'm always in favor of coming up with additional work for other people to do.

-- Jack Krupansky

On Wed, Jul 1, 2015 at 6:04 AM, Alessandro Benedetti <benedetti.ale...@gmail.com> wrote:
> Honestly, if I had to write a custom UpdateRequestProcessor, I would go for
> a SynonymUpdateProcessor that takes as input the same synonym file format
> that SynonymFilterFactory uses.
>
> It would be much easier to configure and use!
>
> Cheers
Re: Correcting text at index time
Honestly, if I had to write a custom UpdateRequestProcessor, I would go for a SynonymUpdateProcessor that takes as input the same synonym file format that SynonymFilterFactory uses.

It would be much easier to configure and use!

Cheers

2015-07-01 2:55 GMT+01:00 Jack Krupansky:
> You would need a separate instance of the update processor for each
> word.
>
> Or, you could code a JavaScript script with the stateless script update
> processor that has the long list of words and replacements as two arrays or
> an array of objects, and then iterate through the input value and the
> array.

--
Benedetti Alessandro
Visiting card: http://about.me/alessandro_benedetti
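The SynonymUpdateProcessor named above is a proposal in this thread, not an existing class. As a rough sketch of its input side only, the function below reads the explicit-mapping lines (`a, b => c`) of a synonyms-style file into a replacement table. The name parseMappings is hypothetical. Lines without `=>` (equivalence lists) are skipped because they name no canonical form, and only the first right-hand term is used; both choices are assumptions.

```javascript
// Hypothetical helper: turn synonyms-style text into { abbreviation: expansion }.
function parseMappings(text) {
  var table = {};
  text.split(/\r?\n/).forEach(function (line) {
    line = line.trim();
    // Skip blank lines and "#" comments.
    if (line === "" || line.charAt(0) === "#") {
      return;
    }
    var parts = line.split("=>");
    // Equivalence lists ("foo, bar") have no "=>" and no canonical form.
    if (parts.length !== 2) {
      return;
    }
    // Assumption: use only the first term on the right-hand side.
    var target = parts[1].split(",")[0].trim();
    parts[0].split(",").forEach(function (source) {
      source = source.trim();
      if (source !== "" && target !== "") {
        table[source] = target;
      }
    });
  });
  return table;
}
```

The resulting table could then feed whatever does the actual replacement, for example a script or a custom update processor.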
Re: Correcting text at index time
You would need a separate instance of the update processor for each word.

Or, you could code a JavaScript script with the stateless script update processor that has the long list of words and replacements as two arrays or an array of objects, and then iterate through the input value and the array.

-- Jack Krupansky

On Tue, Jun 30, 2015 at 5:23 PM, hossmaa wrote:
> Hi all
>
> Thanks for the replies. So there's no getting away from doing it on my own
> then...
>
> @Jack: I need to replace a whole list of shortened words... It would make a
> crazy regex (which I incidentally wouldn't even know how to formulate).
>
> Cheers
> A.
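The script approach described above could look roughly like the following sketch. It assumes the script is loaded by StatelessScriptUpdateProcessorFactory, which calls processAdd(cmd) for each added document. The field name "content" and the two table entries are placeholders, not taken from the thread, and only the first value of the field is handled.

```javascript
// Hypothetical replacement table; the entries are placeholders.
var REPLACEMENTS = [
  { from: "cst.", to: "customer" },
  { from: "acct.", to: "account" }
];

// Hypothetical field name; adjust to the schema in use.
var FIELD = "content";

// Escape regex metacharacters so that "cst." matches a literal dot.
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Compile each entry once. \b anchors the start of the abbreviation;
// (?!\w) rejects a match that runs straight into another word character.
var RULES = REPLACEMENTS.map(function (r) {
  return {
    re: new RegExp("\\b" + escapeRegex(r.from) + "(?!\\w)", "gi"),
    to: r.to
  };
});

// Apply every rule in turn to the input text.
function expand(text) {
  RULES.forEach(function (rule) {
    // A function replacement keeps "$" in the target from being interpreted.
    text = text.replace(rule.re, function () { return rule.to; });
  });
  return text;
}

// Called by the script update processor for each added document.
function processAdd(cmd) {
  var doc = cmd.solrDoc; // a SolrInputDocument
  var value = doc.getFieldValue(FIELD);
  if (value !== null && value !== undefined) {
    // String(...) coerces a Java string to a JavaScript string.
    doc.setField(FIELD, expand(String(value)));
  }
}

// Remaining hooks left empty, as in Solr's example update-script.js.
function processDelete(cmd) {}
function processMergeIndexes(cmd) {}
function processCommit(cmd) {}
function processRollback(cmd) {}
function finish() {}
```

Because the field is rewritten before the document reaches the analyzer chain, the stored value and the indexed value both carry the expanded form.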
Re: Correcting text at index time
Hi all

Thanks for the replies. So there's no getting away from doing it on my own then...

@Jack: I need to replace a whole list of shortened words... It would make a crazy regex (which I incidentally wouldn't even know how to formulate).

Cheers
A.

--
View this message in context: http://lucene.472066.n3.nabble.com/Correcting-text-at-index-time-tp4214636p4215056.html
Sent from the Solr - User mailing list archive at Nabble.com.
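A long alternation like this does not have to be written by hand; it can be generated from the list. The sketch below builds one pattern from a table of abbreviations and picks the replacement per match. The name buildExpander and the table entries are hypothetical. Since each match needs its own replacement, this fits a script or a custom processor rather than a single regex-replace processor with one fixed replacement.

```javascript
// Escape regex metacharacters so that "cst." matches a literal dot.
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build one alternation regex from { abbreviation: expansion }.
function buildExpander(table) {
  // Longest keys first, so a longer abbreviation wins over a shorter prefix.
  var keys = Object.keys(table).sort(function (a, b) {
    return b.length - a.length;
  });

  // Lower-cased lookup so that matching can ignore case.
  var lookup = {};
  keys.forEach(function (k) {
    lookup[k.toLowerCase()] = table[k];
  });

  // With an empty table there is nothing to match; return the text unchanged.
  if (keys.length === 0) {
    return { pattern: null, expand: function (text) { return text; } };
  }

  var pattern = "\\b(?:" + keys.map(escapeRegex).join("|") + ")(?!\\w)";
  var re = new RegExp(pattern, "gi");

  return {
    pattern: pattern,
    expand: function (text) {
      // Look up the replacement for whichever alternative matched.
      return text.replace(re, function (match) {
        return lookup[match.toLowerCase()];
      });
    }
  };
}
```

The generated pattern is also available as a string, which makes it easy to inspect what the list expands to.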
Re: Correcting text at index time
The regex replace processor can be used to do this:

https://lucene.apache.org/solr/5_2_0/solr-core/org/apache/solr/update/processor/RegexReplaceProcessorFactory.html

-- Jack Krupansky

On Mon, Jun 29, 2015 at 6:20 PM, Walter Underwood wrote:
> Yes, do this in an update request processor before it gets to the analyzer
> chain.
>
> wunder
Re: Correcting text at index time
Yes, do this in an update request processor before it gets to the analyzer chain.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

On Jun 29, 2015, at 3:19 PM, Erick Erickson wrote:
> Hmmm, very hard to do currently. The _point_ of stored fields is that
> an exact, verbatim copy of the input is returned in fl lists, and this
> would violate that promise. I suppose some kind of custom update
> processor could work, but it's really "roll your own" functionality,
> I think.
>
> Best,
> Erick
Re: Correcting text at index time
Hmmm, very hard to do currently. The _point_ of stored fields is that an exact, verbatim copy of the input is returned in fl lists, and this would violate that promise. I suppose some kind of custom update processor could work, but it's really "roll your own" functionality, I think.

Best,
Erick

On Mon, Jun 29, 2015 at 8:38 AM, hossmaa wrote:
> Hi Markus
>
> Thanks for the reply. I'm already using the Synonyms filter and it is
> working fine (i.e., when I search for "customer", it also returns documents
> containing "cst.").
> What the synonyms filter does not do is to actually replace the word "cst."
> with "customer" in the document.
>
> Just to be clearer: in the returned results, I do not want to see the word
> "cst." any more (it should be permanently replaced with "customer"). I want
> to only see the expanded form.
>
> Cheers
> A.
RE: Correcting text at index time
Hi Markus

Thanks for the reply. I'm already using the Synonyms filter and it is working fine (i.e., when I search for "customer", it also returns documents containing "cst.").
What the synonyms filter does not do is to actually replace the word "cst." with "customer" in the document.

Just to be clearer: in the returned results, I do not want to see the word "cst." any more (it should be permanently replaced with "customer"). I want to only see the expanded form.

Cheers
A.

--
View this message in context: http://lucene.472066.n3.nabble.com/Correcting-text-at-index-time-tp4214636p4214643.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: Correcting text at index time
Hello - why not just use synonyms or StemmerOverrideFilter?

Markus

-Original message-
> From: hossmaa
> Sent: Monday 29th June 2015 14:08
> To: solr-user@lucene.apache.org
> Subject: Correcting text at index time
>
> Hi everyone
>
> I'm wondering if it's possible in Solr to correct text at indexing time,
> based on a synonyms-like list. This would be great for expanding undesirable
> abbreviations (for example, "cst." instead of "customer").
> I've been searching the Solr docs and the web quite thoroughly I believe,
> but haven't found anything to do this.
>
> I guess if there really isn't anything like this, I could implement it as a
> custom Filter...
>
> Thanks!
> A.
Correcting text at index time
Hi everyone

I'm wondering if it's possible in Solr to correct text at indexing time, based on a synonyms-like list. This would be great for expanding undesirable abbreviations (for example, "cst." instead of "customer").
I've been searching the Solr docs and the web quite thoroughly I believe, but haven't found anything to do this.

I guess if there really isn't anything like this, I could implement it as a custom Filter...

Thanks!
A.

--
View this message in context: http://lucene.472066.n3.nabble.com/Correcting-text-at-index-time-tp4214636.html
Sent from the Solr - User mailing list archive at Nabble.com.