Re: Phonetic search for portuguese
Could anyone help? Thanks.

2012/1/20, Anderson vasconcelos anderson.v...@gmail.com:
> Hi. Are the phonetic filters (DoubleMetaphone, Metaphone, Soundex, RefinedSoundex, Caverphone) only for English, or do they work for other languages? Is there a phonetic filter for Portuguese? If not, how can I implement one? Thanks
Re: Phonetic search for portuguese
On Sun, Jan 22, 2012 at 5:47 PM, Anderson vasconcelos anderson.v...@gmail.com wrote:
> Could anyone help? Thanks.
>
> 2012/1/20, Anderson vasconcelos anderson.v...@gmail.com:
>> Hi. Are the phonetic filters (DoubleMetaphone, Metaphone, Soundex, RefinedSoundex, Caverphone) only for English, or do they work for other languages? Is there a phonetic filter for Portuguese? If not, how can I implement one?

We did this, in another context, by using the open-source aspell library to handle the spell-checking for us. This has distinct advantages: aspell is well tested, handles soundslike in a better manner (at least IMHO), and supports a wide variety of languages, including Portuguese. There are some drawbacks: aspell only has C/C++ interfaces, so we built bindings using SWIG. Also, we handled the integration with Solr via a custom filter factory, though there are better ways to do this. Such a project would thus have dependencies on aspell and on our custom code.

If there is interest in this, we would be happy to open source the code. Given our current schedule, this could take 2-3 weeks.

Regards,
Gora
Re: Phonetic search for portuguese
Hi Gora, thanks for the reply. I'm interested in seeing how you built this solution. However, I don't have much time, and I need to deliver a solution to my client soon. If anyone knows of another simple and fast solution, please post it on this thread.

Gora, could you describe how you implemented the custom filter factory and how you used it in Solr?

Thanks

2012/1/22, Gora Mohanty g...@mimirtech.com:
> [...]
Re: Phonetic search for portuguese
On Mon, Jan 23, 2012 at 5:58 AM, Anderson vasconcelos anderson.v...@gmail.com wrote:
> Hi Gora, thanks for the reply. I'm interested in seeing how you built this solution. However, I don't have much time, and I need to deliver a solution to my client soon. If anyone knows of another simple and fast solution, please post it on this thread.

What is your timeline? I will see if we can expedite the open sourcing of this.

> Gora, could you describe how you implemented the custom filter factory and how you used it in Solr?
[...]

That part is quite simple, though it is possible that I have not correctly addressed all issues for a custom FilterFactory. Please see:

AspellFilterFactory: http://pastebin.com/jTBcfmd1
AspellFilter: http://pastebin.com/jDDKrPiK

The latter loads a java_aspell library, which is created by using SWIG to set up Java bindings for aspell, and configures it for the language of interest.

Next, you will need a library that encapsulates various aspell functionality in Java. I am afraid that this is a little long:

Suggest: http://pastebin.com/6NrGCVma

Finally, you will have to set up the Solr schema to use this filter factory. For example, one could create a new Solr TextField in which solr.DoubleMetaphoneFilterFactory is replaced with com.mimirtech.search.solr.analysis.AspellFilterFactory.

We can discuss further how to set this up, but should probably take that discussion off-list.

Regards,
Gora
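For readers following along, the schema change described in this message might look roughly like the fragment below. This is only a sketch: the field type name text_phonetic_pt, the tokenizer, and the lowercase filter are assumptions chosen for illustration. Only the AspellFilterFactory class name comes from the message, and any attributes that factory may accept are not documented in this thread, so none are shown.

```xml
<!-- schema.xml: hypothetical field type; the name and the tokenizer/lowercase chain are assumptions -->
<fieldType name="text_phonetic_pt" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Used in the position where solr.DoubleMetaphoneFilterFactory would otherwise go -->
    <filter class="com.mimirtech.search.solr.analysis.AspellFilterFactory"/>
  </analyzer>
</fieldType>
```

The jar containing the custom factory and the native java_aspell library would also have to be visible to Solr; how that is done depends on the deployment and is not covered in the thread.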
Re: Phonetic search for portuguese
Thanks a lot, Gora. I need to deliver the first release to my client on 25 January. With your explanation, I can better negotiate moving the delivery of this feature to next month, because I have other business rules to deliver and this feature is more complex than I thought.

I could help you share this solution with the Solr community. Maybe we can create a component on Google Code, or something like that, which any Solr user can use.

2012/1/23, Gora Mohanty g...@mimirtech.com:
> [...]
Re: Phonetic search for portuguese
On Mon, Jan 23, 2012 at 9:21 AM, Anderson vasconcelos anderson.v...@gmail.com wrote:
> Thanks a lot, Gora. I need to deliver the first release to my client on 25 January. With your explanation, I can better negotiate moving the delivery of this feature to next month, because I have other business rules to deliver and this feature is more complex than I thought.

OK. I have ideas on how to improve this solution, but we can take these up at a later stage. We have tested this solution, and I know that it works. I will also be discussing with people here how soon we can open source this.

> I could help you share this solution with the Solr community. Maybe we can create a component on Google Code, or something like that, which any Solr user can use.

Yes, I have been meaning to do that forever, but work has been intruding. We will put up something on BitBucket as soon as possible.

Regards,
Gora
Phonetic search for portuguese
Hi,

Are the phonetic filters (DoubleMetaphone, Metaphone, Soundex, RefinedSoundex, Caverphone) only for English, or do they work for other languages? Is there a phonetic filter for Portuguese? If not, how can I implement one?

Thanks
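On the last question, one way to picture a hand-written phonetic key is the small Java sketch below. It is not an established algorithm and is unrelated to aspell's soundslike codes: the class name PortuguesePhoneticKey and every rewrite rule in it are hypothetical simplifications chosen for illustration, and real Portuguese phonetics needs many more rules. It only shows the general idea of reducing spelling variants to a shared key that could then be indexed.

```java
import java.text.Normalizer;
import java.util.Locale;

/** Hypothetical, deliberately simplified phonetic key for Portuguese words. */
public class PortuguesePhoneticKey {

    public static String encode(String word) {
        if (word == null) {
            return "";
        }
        // Compose first so a c-cedilla is a single character, then lowercase.
        String s = Normalizer.normalize(word, Normalizer.Form.NFC).toLowerCase(Locale.ROOT);
        // c-cedilla (U+00E7) is mapped to "s" before accents are stripped.
        s = s.replace("\u00e7", "s");
        // Decompose and drop combining marks (accents, tildes).
        s = Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
        // Keep plain letters only.
        s = s.replaceAll("[^a-z]", "");
        // Digraphs first (illustrative rules, not a complete set).
        s = s.replace("ph", "f")
             .replace("ch", "x")
             .replace("lh", "l")
             .replace("nh", "n")
             .replace("qu", "k")
             .replace("q", "k");
        // "c" before e/i is treated as "s", any other "c" as "k".
        s = s.replaceAll("c(?=[ei])", "s").replace("c", "k");
        // Remaining "h" is treated as silent.
        s = s.replace("h", "");
        // Single-letter equivalences.
        s = s.replace("y", "i").replace("w", "v").replace("z", "s");
        // Collapse runs of the same letter ("ss" -> "s", "rr" -> "r").
        s = s.replaceAll("(.)\\1+", "$1");
        return s;
    }

    public static void main(String[] args) {
        // Spelling variants that are meant to share a key under these rules.
        System.out.println(encode("Luiz") + " / " + encode("Luis"));
        System.out.println(encode("Phelipe") + " / " + encode("Felipe"));
        System.out.println(encode("Souza") + " / " + encode("Sousa"));
    }
}
```

In Solr, a key like this would typically be produced inside a token filter at both index and query time, so that variants such as "Souza" and "Sousa" match the same indexed term.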