Hi Tino and Flammie,

Because I made a mistake when sending my previous email, I am not sure
whether you received it, so I am sending it to you again. I hope it reaches
you this time.

Over the past few days, I have read the Wikipedia article on tokenization
and got a general idea of how it works. I am also learning some ICU syntax
every day. In the meantime, I am searching for information on how to handle
tokenized Unicode vocabularies.

Recently I have been reading the "further reading"[1] for my proposed
project[2], which is about HFST. The code is a bit hard to understand, but
my task is to "Update lttoolbox to be fully Unicode compliant with regards
to medication to alphabetical symbols". Could you tell me how tokenization
is implemented in lttoolbox, and which specific code I would be updating?

Regards,

Weizhe

[1] https://github.com/hfst/hfst/blob/master/tools/src/hfst-tokenize.cc

[2] http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Robust_tokenisation

On Thu, Mar 5, 2020 at 12:12 PM 杨伟哲 <gavinwzma...@gmail.com> wrote:

> Yes, my code looks very messy this time. Thank you for pointing out my
> shortcomings.
>
> I will spend time reading the code in the further reading, trying to
> understand how the syntax is used in the program, following the project
> flow, and getting familiar with the code style. After that, I'll modify
> my code. I will certainly strive to integrate myself into Apertium as
> soon as possible.
>
> Many thanks,
>
> Weizhe
>
>
> On Tue, Mar 3, 2020 at 9:33 PM Tino Didriksen <m...@tinodidriksen.com> wrote:
>
>> The code for the challenge works. However, it is very far from idiomatic
>> C++ - it's more akin to C with Classes. ICU causes a little of this, but
>> things like malloc(), #define, and having variables first have no home in
>> C++. And how is one supposed to build the code? Also, mixing I/O is
>> generally a bad idea. What this says to me is that you've coded a bit of
>> C89 before, but no C99 or C++, and not used a build system.
>>
>> As for what to do next, the wiki pages say what project you're meant to
>> extend, both on the main ideas page and the coding challenge page. You even
>> quoted that part in your mail. So look at that project's code and see if
>> you can understand the flow.
>>
>> -- Tino Didriksen
>>
>>
>> On Thu, 27 Feb 2020 at 06:45, 杨伟哲 <gavinwzma...@gmail.com> wrote:
>>
>>> Hi Francis and Flammie,
>>>
>>> I’m interested in the “Robust tokenisation in lttoolbox”[1] GSoC
>>> project and am currently writing my proposal.
>>>
>>> I have completed the code challenge listed for the project and put it
>>> on GitHub[2]. However, I’m not quite clear where to start with this
>>> project. I would really appreciate it if you could point me to
>>> somewhere to get started (e.g. a GitHub repo related to this project).
>>> I will also try to learn from and solve issues there if possible.
>>>
>>> Bio: I’m a Chinese undergraduate studying Software Engineering. In my
>>> freshman year, I joined the university’s high-performance computing
>>> center[3] as a research assistant. Through research and study during
>>> that period, I gained a deep understanding of software architecture and
>>> open source projects.
>>>
>>>
>>> [1] http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Robust_tokenisation
>>>
>>> [2] https://github.com/GavinWz/Apertium
>>>
>>> [3] http://cs.wfu.edu.cn/2014/0603/c1227a33048/page.htm
>>>
>>>
>>> Regards,
>>>
>>> Weizhe Yang
>>>
>> _______________________________________________
>> Apertium-stuff mailing list
>> Apertium-stuff@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>>
>