Thanks so much! I looked through the lttoolbox repo a while ago and read the source of lt-proc.cc, lt-comp.cc, lt-expand.cc, etc. At the time I was not sure whether that was the code I needed, so I only skimmed it, but I still remember where the files are in the repository. Now I'll look more closely and try to find the specific code that implements tokenization and how it fits with ICU. I think this will help improve my proposal.

Sincerely,
Weizhe

On Mon, Mar 16, 2020 at 11:44 PM Tino Didriksen <m...@tinodidriksen.com> wrote:

> It's somewhere in https://github.com/apertium/lttoolbox - I don't know
> the exact location.
>
> The entrypoint that does tokenization is lt-proc, so start from lt-proc.cc
> and trace execution to somewhere that does tokenization. That's also a good
> way to learn the codebase.
>
> -- Tino Didriksen
>
>
> On Mon, 16 Mar 2020 at 16:00, 杨伟哲 <gavinwzma...@gmail.com> wrote:
>
>> Hi Tino and Fammie,
>>
>> Due to my mistake in sending the email before, I am not sure whether you
>> have received the email I sent, so I'm sending the email to you again now.
>> Hope you can receive it.
>>
>> These days, I read the wikipedia description of tokenization and got a
>> general idea of how it works. I also learn some icu syntax every day. At
>> the mean time, I'm also searching for information on how to handle
>> tokenized Unicode vocabularies.
>>
>> Recently I have been reading "further reading"[1] of my proposed
>> project[2], which is about HFST. The code is a bit hard to understand.
>> But my task is "Update lttoolbox to be fully Unicode compliant with
>> regards to medication to alphabetical symbols". May I know exactly how
>> tokenization is implemented in lttoolbox and the specific code that I'm
>> going to update?
>>
>> Regards,
>>
>> Weizhe
>>
>> [1] https://github.com/hfst/hfst/blob/master/tools/src/hfst-tokenize.cc
>>
>> [2] http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Robust_tokenisation

_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff