For development purposes I need Lucene to be able to normalize ancient Greek characters in all their grammatical variants, such as accents, breathings and other diacritics.

I need to retrieve ancient Greek words that carry accents and other diacritics when the input string is typed without them.

For example, the input οργανον (organon) should also retrieve Ὄργανον.
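
To make the desired behaviour concrete, here is a minimal sketch in plain Java. It is not Lucene code and not the file mentioned below; it only assumes that decomposing to Unicode NFD, dropping combining marks and lowercasing is an acceptable definition of "without accents". The class name `GreekFoldingSketch` and the method `fold` are made up for illustration.

```java
import java.text.Normalizer;
import java.util.Locale;

public class GreekFoldingSketch {

    // Hypothetical helper: folds a polytonic Greek string to its
    // unaccented, lowercase form.
    static String fold(String input) {
        // NFD splits precomposed letters into a base letter plus combining marks
        // (for example U+1F4C becomes U+039F, U+0313, U+0301).
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // \p{M} matches every Unicode combining mark: accents, breathings,
        // diaeresis and the iota subscript.
        String stripped = decomposed.replaceAll("\\p{M}+", "");
        // Locale.ROOT keeps lowercasing independent of the JVM default locale.
        return stripped.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // "Organon" with capital omicron carrying psili and oxia (U+1F4C).
        String accented = "\u1F4C\u03C1\u03B3\u03B1\u03BD\u03BF\u03BD";
        // The same word typed in plain lowercase letters, without diacritics.
        String plain = "\u03BF\u03C1\u03B3\u03B1\u03BD\u03BF\u03BD";
        System.out.println(fold(accented).equals(plain)); // prints "true"
    }
}
```

If both the indexed text and the query went through a folding step like this one, the two forms would compare equal, which is the matching behaviour I am after.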


I am not a Lucene committer and I am new to this, so my question is about the best practice for implementing this in Lucene and, possibly, for submitting a commit proposal to the Lucene project management committee.

I did some searching and found this file in lucene-solr:


It contains normalization for some characters.
My idea is to add extra normalization there, covering all Unicode ancient Greek characters with all their diacritics. I already have the Unicode values for those characters, so it should not be difficult for me to include them.

If my understanding is correct, this would give Lucene the feature described above.


As I am new to this, I need:

1. To confirm that this is the correct place in Lucene to do the normalization
2. To learn how to submit a commit proposal


Any help is appreciated.

Kind regards

Paolo
