https://issues.apache.org/bugzilla/show_bug.cgi?id=49687

--- Comment #44 from Glenn Adams <gl...@skynav.com> 2011-10-20 13:03:06 UTC ---
(In reply to comment #43)
> Created attachment 27822 [details]
> list of Gujarati words and sentences
> 
> As per my exchange with Glenn, I've attached a UTF-8 encoded file that
> contains
> Gujarati words and sentences. Actually, it's an output of a multiple choice
> quiz from my application. If the data is expected in some other form, let me
> know. Also, let me know if you need more of such data.

Thanks. I'll let you know when I've got the Gujarati support working and have
tried out these word forms. By the way, for Arabic script, I have approximately
85,000 word forms, which represent a significant cross section of a number of
corpora. I would hope to have a similar number of word forms for other scripts.

For Indic scripts, I prefer individual word forms rather than phrases or
sentences, since the latter do not typically use any whitespace between words.
So I might ask you to manually segment your data into word forms only.

G.
