Re: [Apertium-stuff] Willingness to participate in the project

2020-03-16 Thread 杨伟哲
Thanks so much!

I once visited the lttoolbox repo and read the source code of lt-proc.cc,
lt-comp.cc, lt-expand.cc, etc. At that time I was not sure whether it was
the code I needed, so I only skimmed it, but I still remember where those
files are in the repository. Now I'll look more closely and try to find the
specific code that implements tokenization and where ICU fits in. I think
this will help improve my proposal.

Sincerely,

Weizhe

___
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff


Re: [Apertium-stuff] Willingness to participate in the project

2020-03-16 Thread Tino Didriksen
It's somewhere in https://github.com/apertium/lttoolbox - I don't know the
exact location.

The entrypoint that does tokenization is lt-proc, so start from lt-proc.cc
and trace execution to somewhere that does tokenization. That's also a good
way to learn the codebase.

-- Tino Didriksen




Re: [Apertium-stuff] Willingness to participate in the project

2020-03-16 Thread 杨伟哲
Hi Tino and Flammie,

Because of a mistake I made when sending my previous email, I am not sure
whether you received it, so I am sending it again now. I hope it reaches
you.

Over the past few days I have read the Wikipedia description of
tokenization and got a general idea of how it works. I am also learning
some ICU syntax every day. In the meantime, I am searching for information
on how to handle tokenized Unicode vocabularies.

Recently I have been reading the "further reading"[1] for my proposed
project[2], which is about HFST. The code is a bit hard to understand. But
my task is "Update lttoolbox to be fully Unicode compliant with regards to
medication to alphabetical symbols". May I know exactly how tokenization is
implemented in lttoolbox, and which specific code I would be updating?

Regards,

Weizhe

[1] https://github.com/hfst/hfst/blob/master/tools/src/hfst-tokenize.cc

[2] http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Robust_tokenisation



Re: [Apertium-stuff] Willingness to participate in the project

2020-03-15 Thread gavinwzmails


Re: [Apertium-stuff] Willingness to participate in the project

2020-03-04 Thread 杨伟哲
Yes, my code looks very messy this time. Thank you for pointing out my
shortcomings.

I will spend time reading the code in the further-reading material, trying
to understand the various usages of the syntax in the program,
understanding the project flow, and getting familiar with the code style.
After that, I'll modify my code. I will definitely strive to integrate
myself into Apertium as soon as possible.

Many thanks,

Weizhe



Re: [Apertium-stuff] Willingness to participate in the project

2020-03-03 Thread 杨伟哲
Yes, my code looks very messy this time. Thank you for pointing out my
shortcomings.

I will spend time reading the code in the further-reading material, trying
to understand the various usages of the syntax in the program,
understanding the project flow, and getting familiar with the code style.
After that, I'll modify my code. I will definitely strive to integrate
myself into Apertium as soon as possible.

Many thanks,

Weizhe


Re: [Apertium-stuff] Willingness to participate in the project

2020-03-03 Thread 杨伟哲
OK! Thanks a lot for your reply and recommendation.

I set up the Apertium core and lttoolbox environment and downloaded
several dictionaries on my computer the other day. Recently I've been
getting familiar with their usage and the meaning of each of the options.

I have some understanding of how Unicode is structured, and I am now also
studying ICU's syntax and making some progress.

As for IRC, I will keep an eye on the discussion in the channel.

Best regards,

--Weizhe



Re: [Apertium-stuff] Willingness to participate in the project

2020-03-03 Thread Tino Didriksen
The code for the challenge works. However, it is very far from idiomatic
C++ - it's more akin to C with Classes. ICU causes a little of this, but
things like malloc(), #define, and declaring all variables at the top of a
function have no home in C++. And how is one supposed to build the code?
Also, mixing I/O is generally a bad idea. What this says to me is that
you've coded a bit of C89 before, but no C99 or C++, and not used a build
system.
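
To illustrate the points above, here is a minimal sketch of what the more
idiomatic style might look like. It is not the code from the challenge
(which is not shown in this thread). The task (splitting a line on
whitespace), the function name split_whitespace and the constant kMaxTokens
are all hypothetical, chosen only for illustration, and nothing here uses
ICU or lttoolbox.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// A typed, scoped constant instead of: #define MAX_TOKENS 1024
// (hypothetical limit, for illustration only)
constexpr std::size_t kMaxTokens = 1024;

// Split a line on whitespace. std::vector and std::string own their
// memory, so there is no malloc()/free() pair to get wrong.
std::vector<std::string> split_whitespace(const std::string& line) {
    std::vector<std::string> tokens;
    std::istringstream in(line);
    // Variables are declared where they are first needed (here, in the
    // loop header), not all together at the top of the function.
    for (std::string word; tokens.size() < kMaxTokens && in >> word;) {
        tokens.push_back(word);
    }
    assert(tokens.size() <= kMaxTokens);
    return tokens;
}
```

The function only computes and returns a value and does no I/O itself, so
a caller can decide how to read input and print results in one place.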

As for what to do next, the wiki pages say what project you're meant to
extend, both on the main ideas page and the coding challenge page. You even
quoted that part in your mail. So look at that project's code and see if
you can understand the flow.

-- Tino Didriksen


On Thu, 27 Feb 2020 at 06:45, 杨伟哲  wrote:

> Hi Francis and Flammie,
>
> I’m interested in the “Robust tokenisation in lttoolbox”[1] GSoC project,
> and I’m currently writing the proposal.
>
> I have completed the code challenge listed in the project, which has been
> put on Pastebin[2]. However, I’m not quite clear on where to start with
> this project. I would much appreciate it if you could point me to
> somewhere (e.g. a GitHub repo related to this project) to get started. I
> will also try to learn and solve issues there if possible.
>
> Bio: I’m a Chinese undergraduate in Software Engineering. In my freshman
> year, I joined the high-performance computing center[3] of the university
> as a research assistant. Through research and learning during that period,
> I have gained a deep understanding of software architecture and open
> source projects.
>
>
> [1] http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Robust_tokenisation
>
> [2] https://github.com/GavinWz/Apertium
>
> [3] http://cs.wfu.edu.cn/2014/0603/c1227a33048/page.htm
>
>
> Regards,
>
> Weizhe Yang
>


Re: [Apertium-stuff] Willingness to participate in the project

2020-03-03 Thread Flammie A Pirinen
Hi,

I am on holiday this week with low internet availability, so only a few
quick points. Firstly, I strongly recommend joining the #apertium IRC
channel; I think even non-mentors will have useful clues. For the
tokenisation problem, I think the main resources to understand are the
various Unicode technical reports that describe tokenisation and a C++
library like ICU, and then how Apertium currently does tokenisation and how
this project's code will interact with it. Especially for the last point,
many other people on IRC know it better than me.

Regards,

On Thu, Feb 27, 2020 at 01:45:09PM +0800, 杨伟哲 wrote:
> Hi Francis and Flammie,
> 
> I’m interested in the “Robust tokenisation in lttoolbox”[1] GSoC project,
> and I’m currently writing the proposal.
>
> I have completed the code challenge listed in the project, which has been
> put on Pastebin[2]. However, I’m not quite clear on where to start with
> this project. I would much appreciate it if you could point me to
> somewhere (e.g. a GitHub repo related to this project) to get started. I
> will also try to learn and solve issues there if possible.
>
> Bio: I’m a Chinese undergraduate in Software Engineering. In my freshman
> year, I joined the high-performance computing center[3] of the university
> as a research assistant. Through research and learning during that period,
> I have gained a deep understanding of software architecture and open
> source projects.
> 
> 
> [1] http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Robust_tokenisation
> 
> [2] https://github.com/GavinWz/Apertium
> 
> [3] http://cs.wfu.edu.cn/2014/0603/c1227a33048/page.htm
> 
> 
> Regards,
> 
> Weizhe Yang




-- 
Regards, Flammie 
(Please note, that I will often include my replies inline instead of
top or bottom of the mail)

