I just had a talk about this with Golam Mortuza Bhai; pasting it here for future reference :)
(05:52:23 PM) [email protected]/HomeC8631CA7: I've mailed you regarding an issue between 'ত্' and 'ৎ', if you get the time, please feel free to answer
(05:52:25 PM) Golam Mortuza Hossain: I mean I got
(05:52:30 PM) [email protected]/HomeC8631CA7: cool
(05:52:34 PM) Golam Mortuza Hossain: Please
(05:52:42 PM) Golam Mortuza Hossain: follow "ৎ"
(05:53:26 PM) Golam Mortuza Hossain: Khanda-Ta as a separate glyph is now Unicode standard
(05:54:03 PM) Golam Mortuza Hossain: which wasn't the case earlier
(05:54:41 PM) [email protected]/HomeC8631CA7: I was following ৎ all this time, but came across some sites that have ত্, and in the Unicode character set ৎ has a comment like this: "a dead consonant form of ta, without implicit vowel, used in some sequences". That's why I thought I'd consult you
(05:55:48 PM) Golam Mortuza Hossain: the reason for this: earlier there was no glyph for "Khanda-Ta" in Unicode
(05:55:59 PM) [email protected]/HomeC8631CA7: yeah I know
(05:57:03 PM) Golam Mortuza Hossain: If you want to make it backward compatible then
(05:57:23 PM) Golam Mortuza Hossain: you could consider mapping "ত্"
(05:57:31 PM) Golam Mortuza Hossain: to "ৎ"
(05:57:40 PM) Golam Mortuza Hossain: But it could be tricky
(05:58:57 PM) [email protected]/HomeC8631CA7: yeah
(05:59:07 PM) [email protected]/HomeC8631CA7: I know, I tried a bit
(05:59:36 PM) Golam Mortuza Hossain: :-)
(06:01:17 PM) [email protected]/HomeC8631CA7: we might need to build a table for that, e.g. ত্ক -> ৎক, it's always like that, isn't it? But we can't map it like that in উত্তর
(06:01:36 PM) [email protected]/HomeC8631CA7: so we might need to check all these :(
(06:02:32 PM) Golam Mortuza Hossain: If I remember correctly then sometimes people also
(06:02:42 PM) Golam Mortuza Hossain: used ZWNJ after Halant
(06:02:51 PM) [email protected]/HomeC8631CA7: yeah
(06:03:03 PM) [email protected]/HomeC8631CA7: I've seen that too
(06:03:21 PM) Golam Mortuza Hossain: this case should be easy
(06:04:30 PM) Golam Mortuza Hossain: also when it appears just before "," , ":", "।", "?", " " etc.
(06:04:44 PM) [email protected]/HomeC8631CA7: I'm already running the source text through a normalizer right now, because ড় = ড + nukta; we sometimes get text in the complex form and the parser gets confused
(06:04:54 PM) [email protected]/HomeC8631CA7: aha
(06:05:23 PM) Golam Mortuza Hossain: yeah I see
(06:06:50 PM) [email protected]/HomeC8631CA7: so you think it's doable, right?
(06:07:22 PM) Golam Mortuza Hossain: no
(06:07:52 PM) [email protected]/HomeC8631CA7: btw, could I paste this conversation in the group just as a reference for the others?
(06:09:11 PM) Golam Mortuza Hossain: In some cases unambiguous mapping may not be possible
(06:09:16 PM) Golam Mortuza Hossain: Yeah, sure
(06:13:37 PM) Golam Mortuza Hossain: My suggestion would be to handle only "ৎ" in the engine.
(06:15:28 PM) Golam Mortuza Hossain: If needed then mapping should be done in the text pre-parser.
(06:16:21 PM) Golam Mortuza Hossain: In the long term the "ত্" appearance will go away!
(06:16:30 PM) [email protected]/HomeC8631CA7: I agree

--
Regards
Abu Zaher Md. Faridee
http://zaher14.blogspot.com/
http://sourceforge.net/projects/apertium/
---
Time heals every wound, but time itself is a wound that never heals.
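
P.S. For reference, the two "easy" cases from the chat (ত + Halant followed by ZWNJ, and ত + Halant right before punctuation or a space) could be handled in a text pre-parser roughly like this. It is only a sketch, not the engine's actual code: the function name `preparse` and the constant names are made up for illustration, and treating end-of-text like a following space is my own assumption. The ambiguous case (ত্ followed by another consonant, as in উত্তর) is deliberately left untouched, since an unambiguous mapping may not be possible there. The NFC step is there so that precomposed ড় and ড + nukta end up in one common form before parsing.

```python
import re
import unicodedata

# Code points involved (names are illustrative, not from any existing module).
TA = "\u09a4"         # ত
HASANT = "\u09cd"     # ্  (Halant / virama)
ZWNJ = "\u200c"       # zero width non-joiner
KHANDA_TA = "\u09ce"  # ৎ

# Case 1: old-style Khanda-Ta written as TA + HASANT + ZWNJ.
_EXPLICIT = re.compile(TA + HASANT + ZWNJ)

# Case 2: TA + HASANT just before whitespace, ",", ":", "?", the danda,
# or (assumption) the end of the text.
_BEFORE_BREAK = re.compile(TA + HASANT + r"(?=[\s,:?\u0964]|$)")


def preparse(text: str) -> str:
    """Normalize the text and map the unambiguous old spellings to U+09CE."""
    # Bring precomposed and decomposed letters (e.g. ড়) to one form.
    text = unicodedata.normalize("NFC", text)
    text = _EXPLICIT.sub(KHANDA_TA, text)
    text = _BEFORE_BREAK.sub(KHANDA_TA, text)
    return text


if __name__ == "__main__":
    samples = ["\u09b8\u09a4\u09cd\u200c\u0995", "\u0989\u09a4\u09cd\u09a4\u09b0"]
    for s in samples:
        print(repr(s), "->", repr(preparse(s)))
```

Anything beyond these two cases (such as the ত্ক table mentioned in the chat) would need a hand-checked word or cluster list, which is not attempted here.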
_______________________________________________
Bengalinux-core mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bengalinux-core
