Hi Tino and Flammie,

Because of a mistake I made when sending my previous email, I am not sure whether you received it, so I am sending it again. I hope it reaches you this time.
These days, I have been reading the Wikipedia description of tokenization and now have a general idea of how it works. I am also learning some ICU syntax every day. In the meantime, I am searching for information on how to handle tokenized Unicode vocabularies.

Recently I have been reading the "further reading"[1] for my proposed project[2], which is about HFST. The code is a bit hard to understand. But my task is "Update lttoolbox to be fully Unicode compliant with regards to medication to alphabetical symbols". May I know exactly how tokenization is implemented in lttoolbox, and which specific code I am going to update?

Regards,
Weizhe

[1] https://github.com/hfst/hfst/blob/master/tools/src/hfst-tokenize.cc
[2] http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Robust_tokenisation

On Thu, Mar 5, 2020 at 12:12 PM 杨伟哲 <gavinwzma...@gmail.com> wrote:

> Yes, my code looks very messy this time. Thank you for pointing out my
> shortcomings.
>
> I will spend time reading the code in the extension readings, trying to
> understand the various usages of the syntax in the program, understanding
> the project flow, and getting familiar with the code style. After that,
> I'll modify my code. Definitely, I will strive to integrate myself into
> apertium as soon as possible.
>
> Many thanks,
>
> Weizhe
>
> On Tue, Mar 3, 2020 at 9:33 PM Tino Didriksen <m...@tinodidriksen.com>
> wrote:
>
>> The code for the challenge works. However, it is very far from idiomatic
>> C++ - it's more akin to C with Classes. ICU causes a little of this, but
>> things like malloc(), #define, and having variables first have no home in
>> C++. And how is one supposed to build the code? Also, mixing I/O is
>> generally a bad idea. What this says to me is that you've coded a bit of
>> C89 before, but no C99 or C++, and not used a build system.
>>
>> As for what to do next, the wiki pages say what project you're meant to
>> extend, both on the main ideas page and the coding challenge page. You even
>> quoted that part in your mail. So look at that project's code and see if
>> you can understand the flow.
>>
>> -- Tino Didriksen
>>
>> On Thu, 27 Feb 2020 at 06:45, 杨伟哲 <gavinwzma...@gmail.com> wrote:
>>
>>> Hi Francis and Flammie,
>>>
>>> I’m interested in the “Robust tokenisation in lttoolbox”[1] GSoC
>>> project. And currently I’m writing the proposal.
>>>
>>> I have completed the code challenge listed in the project, which has
>>> been put on Pastebin[2]. However, I’m not quite clear where this project
>>> starting with. And I will be much appreciate if you could list somewhere
>>> (e.g. GitHub repo related to this project) for me to get started with.
>>> I will also try to learn and solve issues there if possible.
>>>
>>> Bio: I’m Chinese undergraduate in Software Engineering. In my freshman
>>> year, I joined the high-performance computing center[3] of the
>>> university as a research assistant. Through research and learning during
>>> the period, I have a deep understanding of software architecture and
>>> open source projects.
>>>
>>> [1] http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Robust_tokenisation
>>> [2] https://github.com/GavinWz/Apertium
>>> [3] http://cs.wfu.edu.cn/2014/0603/c1227a33048/page.htm
>>>
>>> Regards,
>>>
>>> Weizhe Yang
>>
>> _______________________________________________
>> Apertium-stuff mailing list
>> Apertium-stuff@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/apertium-stuff