Do the check _before_ indexing.
Use https://code.google.com/p/language-detection/ to verify the
language of the text document before you put it in the index.
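
For the coarser case you mention (Latin vs Cyrillic vs Arabic vs Chinese),
a script check alone may be enough, and it needs nothing beyond the JDK.
Below is a minimal sketch, assuming Java 8 or later. The class and method
names (ScriptCheck, dominantScript, matchesExpectedScript) are made up for
illustration and are not part of Lucene or the language-detection library.
It only counts letters per Unicode script, so it cannot tell English from
Spanish; that is what the library above is for.

```java
import java.lang.Character.UnicodeScript;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper: a script-level sanity check to run before indexing.
final class ScriptCheck {

    // Returns the Unicode script that most letters in the text belong to,
    // or null if the text contains no letters at all.
    static UnicodeScript dominantScript(String text) {
        Map<UnicodeScript, Integer> counts = new EnumMap<>(UnicodeScript.class);
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (!Character.isLetter(cp)) {
                continue; // ignore digits, punctuation and whitespace
            }
            counts.merge(UnicodeScript.of(cp), 1, Integer::sum);
        }
        UnicodeScript best = null;
        int bestCount = 0;
        for (Map.Entry<UnicodeScript, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    // True if the text is mostly written in the script the analyzer expects.
    static boolean matchesExpectedScript(String text, UnicodeScript expected) {
        return dominantScript(text) == expected;
    }
}
```

For example, ScriptCheck.dominantScript("Привет, мир") returns
UnicodeScript.CYRILLIC, so a document like that would fail
matchesExpectedScript(text, UnicodeScript.LATIN) and could be flagged
before it reaches an English analyzer.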

-Glen Newton
http://zzzoot.blogspot.com/

On Mon, Feb 27, 2012 at 10:53 AM, Ilya Zavorin <[email protected]> wrote:
> Suppose I have a bunch of text documents in language X but I index them
> using an analyzer for language Y. Once the index is created, is it possible 
> to perform some sort of simple "sanity" check to see if the original language 
> selection was wrong? I presume I can try searching for some common word in 
> language Y, but I am not sure how reliable this would be. On the other hand, 
> if languages are from the same group, say X and Y are English and Spanish, I 
> should expect that this sanity check would produce a false match. However, I 
> would be happy if it worked reliably enough for languages using different 
> scripts, e.g. Latin vs Cyrillic vs Arabic vs Chinese etc.
>
>
> Thanks much
>
>
>
> Ilya Zavorin



