Re: Phonetic search for portuguese

2012-01-22 Thread Anderson vasconcelos
Could anyone help?

Thanks

2012/1/20, Anderson vasconcelos anderson.v...@gmail.com:
 Hi

 Are the phonetic filters (DoubleMetaphone, Metaphone, Soundex, RefinedSoundex,
 Caverphone) only for English, or do they work for other languages too? Is
 there a phonetic filter for Portuguese? If not, how can I implement
 one?

 Thanks



Re: Phonetic search for portuguese

2012-01-22 Thread Gora Mohanty
On Sun, Jan 22, 2012 at 5:47 PM, Anderson vasconcelos
anderson.v...@gmail.com wrote:
 Could anyone help?

 Thanks

 2012/1/20, Anderson vasconcelos anderson.v...@gmail.com:
 Hi

 Are the phonetic filters (DoubleMetaphone, Metaphone, Soundex, RefinedSoundex,
 Caverphone) only for English, or do they work for other languages too? Is
 there a phonetic filter for Portuguese? If not, how can I implement
 one?

We did this, in another context, by using the open-source aspell library to
handle the spell-checking for us. This has distinct advantages, as aspell
is well-tested, handles soundslike in a better manner (at least IMHO), and
supports a wide variety of languages, including Portuguese.

There are some drawbacks: aspell only has C/C++ interfaces, so
we built bindings for it with SWIG. Also, we handled the integration
with Solr via a custom filter factory, though there are better ways to do this.
Such a project would thus have dependencies on aspell and on our custom
code. If there is interest in this, we would be happy to open source this
code. Given our current schedule, this could take 2-3 weeks.

Regards,
Gora


Re: Phonetic search for portuguese

2012-01-22 Thread Anderson vasconcelos
Hi Gora, thanks for the reply.

I'm interested in seeing how you built this solution. But I don't have
much time, and I need to deliver a solution to my client soon. If
anyone knows another simple and fast solution, please post it on this
thread.

Gora, could you explain how you implemented the custom filter factory and
how you used it in Solr?

Thanks





Re: Phonetic search for portuguese

2012-01-22 Thread Gora Mohanty
On Mon, Jan 23, 2012 at 5:58 AM, Anderson vasconcelos
anderson.v...@gmail.com wrote:
 Hi Gora, thanks for the reply.

 I'm interested in seeing how you built this solution. But I don't have
 much time, and I need to deliver a solution to my client soon. If
 anyone knows another simple and fast solution, please post it on this
 thread.

What is your timeline? I will see if we can expedite the open
sourcing of this.

 Gora, could you explain how you implemented the custom filter factory and
 how you used it in Solr?
[...]

That part is quite simple, though it is possible that I have not
correctly addressed all issues for a custom FilterFactory.
Please see:
  AspellFilterFactory: http://pastebin.com/jTBcfmd1
  AspellFilter: http://pastebin.com/jDDKrPiK

The latter loads a java_aspell library, which is built by using SWIG
to generate Java bindings for aspell, and configures
it for the language of interest.
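
To make the filter's role concrete: for each token, it replaces the term with its sounds-like form, so that differently spelled words that sound alike index to the same key. The following is a minimal, self-contained Java sketch of that idea only. It is not the code in the pastebins above, and it uses neither aspell nor Lucene's TokenFilter API. The names SoundsLikeEncoder, ToyPortugueseEncoder, and SoundsLikeDemo are hypothetical, and the toy rules are a rough illustration, not aspell's Portuguese soundslike rules.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical stand-in for whatever computes the sounds-like form (aspell, in the setup above). */
interface SoundsLikeEncoder {
    String encode(String word);
}

/** Toy encoder for illustration only; NOT a real Portuguese phonetic algorithm. */
final class ToyPortugueseEncoder implements SoundsLikeEncoder {
    @Override
    public String encode(String word) {
        // Compose and lower-case so that c-cedilla is a single code point.
        String s = Normalizer.normalize(word, Normalizer.Form.NFC).toLowerCase(Locale.ROOT);
        // c-cedilla sounds like "s"; handle it before diacritics are stripped.
        s = s.replace("\u00e7", "s");
        // Decompose and drop combining marks (accents, tildes).
        s = Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
        // A few digraph rules, then drop any remaining (silent) "h".
        s = s.replace("ch", "x").replace("ph", "f").replace("h", "");
        // "c" before "e" or "i" sounds like "s".
        s = s.replaceAll("c(?=[ei])", "s");
        // Collapse runs of the same letter ("ss" becomes "s").
        return s.replaceAll("(.)\\1+", "$1");
    }
}

final class SoundsLikeDemo {
    /** What the filter does conceptually: one sounds-like key per incoming token. */
    static List<String> encodeTokens(List<String> tokens, SoundsLikeEncoder encoder) {
        List<String> keys = new ArrayList<>();
        for (String token : tokens) {
            keys.add(encoder.encode(token));
        }
        return keys;
    }

    public static void main(String[] args) {
        SoundsLikeEncoder encoder = new ToyPortugueseEncoder();
        List<String> tokens = Arrays.asList("Concei\u00e7\u00e3o", "C\u00e1ssia", "Chico");
        // prints [conseisao, casia, xico]
        System.out.println(encodeTokens(tokens, encoder));
    }
}
```

Because the same encoding is applied at index time and at query time, a query for "conseisao" would produce the same key as the indexed "Conceição". In the real setup, the encoder would delegate to aspell through the SWIG-generated bindings rather than to hand-written rules.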

Next, you will need a library that encapsulates various
aspell functionality in Java. I am afraid that this is a little
long:
  Suggest: http://pastebin.com/6NrGCVma

Finally, you will have to set up the Solr schema to use
this filter factory, e.g., one could create a new Solr
TextField, where the solr.DoubleMetaphoneFilterFactory
is replaced with
com.mimirtech.search.solr.analysis.AspellFilterFactory
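
As a rough sketch, such a field type in schema.xml might look like the fragment below. The field type name text_soundslike and the tokenizer and lower-case filter around the custom factory are assumptions for illustration. Any attributes that AspellFilterFactory itself expects (for example, a language setting) are not described in this thread, so none are shown.

```xml
<!-- Hypothetical field type; only the AspellFilterFactory class name comes from this thread -->
<fieldType name="text_soundslike" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Takes the place where solr.DoubleMetaphoneFilterFactory would otherwise go -->
    <filter class="com.mimirtech.search.solr.analysis.AspellFilterFactory"/>
  </analyzer>
</fieldType>
```

The jar containing the custom factory, together with the native aspell bindings, would also need to be visible to Solr for this to load.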

We can discuss further how to set this up, but should
probably take that discussion off-list.

Regards,
Gora


Re: Phonetic search for portuguese

2012-01-22 Thread Anderson vasconcelos
Thanks a lot, Gora.
I need to deliver the first release to my client on 25 January.
With your explanation, I can better negotiate moving the delivery of
this feature to next month, because I have other business rules to
deliver and this feature is more complex than I thought.
I could help you share this solution with the Solr community. Maybe we
can create a component on Google Code, or something like that, which
any Solr user can use.




Re: Phonetic search for portuguese

2012-01-22 Thread Gora Mohanty
On Mon, Jan 23, 2012 at 9:21 AM, Anderson vasconcelos
anderson.v...@gmail.com wrote:
 Thanks a lot, Gora.
 I need to deliver the first release to my client on 25 January.
 With your explanation, I can better negotiate moving the delivery of
 this feature to next month, because I have other business rules to
 deliver and this feature is more complex than I thought.

OK. I have ideas on how to improve this solution, but
we can take these up at a later stage. We have tested
this solution, and I know that it works. I will also be
discussing with people here about how soon we can
open source this.

 I could help you share this solution with the Solr community. Maybe we
 can create a component on Google Code, or something like that, which
 any Solr user can use.

Yes, I have been meaning to do that forever, but work has
been intruding. We will put up something on BitBucket as
soon as possible.

Regards,
Gora


Phonetic search for portuguese

2012-01-20 Thread Anderson vasconcelos
Hi

Are the phonetic filters (DoubleMetaphone, Metaphone, Soundex, RefinedSoundex,
Caverphone) only for English, or do they work for other languages too? Is
there a phonetic filter for Portuguese? If not, how can I implement one?

Thanks