Hi Tommi, all,

A couple years ago, a Swarthmore student implemented an algorithm for
tokenisation of spaceless orthographies using morphological transducers.
She used a fork of a prototype Japanese transducer developed by another of
my students to evaluate it.

The work is available at the following URLs:

https://scholarship.tricolib.brynmawr.edu/handle/10066/20002

https://github.com/chanlon1/tokenisation

https://github.com/chanlon1/apertium-jpn

--
Jonathan

On Wed, Feb 26, 2020, 06:38 Tomohiro Akazawa <tomohiroakaz...@gmail.com>
wrote:

> Thank you for your reply.
> If "improving the support of Japanese in Apertium" could be a new GSoC
> project, I would identify the problems in the current version of Apertium
> and work out solutions for them.
> Thank you.
>
> On Wed, Feb 26, 2020 at 0:47, Tommi A Pirinen <tommi.antero.piri...@uni-hamburg.de> wrote:
>
>> Hi all,
>> one thing that might be worth considering in improving support of
>> Japanese in Apertium is that we currently do not have any good
>> generic solution for word tokenisation. This especially affects
>> languages like Japanese, where space- and punctuation-based
>> tokenisation works much worse than for European languages. If you'd be
>> interested in formulating a project solving the tokenisation problem, I
>> think it would fit Apertium GSoC quite well, and if others agree I could
>> (co-)mentor it.
>>
>> On Mon, Feb 24, 2020 at 06:12:28AM +0900, Tomohiro Akazawa wrote:
>> > Thank you for your reply.
>> > Considering there are many resources for English and Japanese, perhaps
>> > I should change my plan.
>> > Thank you.
>>
>>
>>
>> > On Sun, 23 Feb 2020, 23:58 Hèctor Alòs i Font, <hectora...@gmail.com>
>> > wrote:
>> >
>> > > Hi Tomohiro,
>> > >
>> > > This may not be the 2019 version of the application form, but the
>> > > 2020 one (if Apertium is selected by Google as a partner
>> > > organisation) should not be very different from this one:
>> > > http://wiki.apertium.org/wiki/Top_tips_for_GSOC_applications
>> > > Essentially, for a pair like English and Japanese the main questions
>> > > will probably be:
>> > >
>> > >     * reasons why Google and Apertium should sponsor it,
>> > >     * a description of how and who it will benefit in society,
>> > >
>> > > (essentially because both English and Japanese are well-resourced
>> > > languages).
>> > > IMHO, Okinawan-Japanese would be a much more Apertium-like proposal.
>> > > But, of course, I may be wrong. I should maybe add that for building
>> > > a translator it is not absolutely necessary to be proficient in the
>> > > source language. If you can read it and you have access to grammars,
>> > > dictionaries and informants, this is usually enough. But, of course,
>> > > the more you know the source language (not only the target one), the
>> > > better.
>> > >
>> > > Hèctor
>> > >
>> > > Message from Tomohiro Akazawa <tomohiroakaz...@gmail.com> on Sun,
>> > > 23 Feb 2020 at 14:27:
>> > >
>> > >> Hello.
>> > >> My name is Tomohiro and I am a student at the University of Tokyo in
>> > >> Japan.
>> > >> Looking at Apertium's ideas list for GSoC 2020, I found "Adopt an
>> > >> unreleased language pair" interesting.
>> > >> Do you think it is possible to make a language pair between English
>> > >> and Japanese?
>> > >> Thank you very much.
>> > >> _______________________________________________
>> > >> Apertium-stuff mailing list
>> > >> Apertium-stuff@lists.sourceforge.net
>> > >> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>> > >>
>>
>>
>> --
>> Doktor Tommi A Pirinen, Computational Linguist,
>> <https://flammie.github.io/purplemonkeydishwasher/>, Universität
>> Hamburg, Hamburger Zentrum für Sprachkorpora <http://hzsk.de>. CLARIN-D
>> Entwickler.  President of ACL SIGUR SIG for Uralic languages
>> <http://gtweb.uit.no/sigur/>.
>> I tend to follow inline-posting style in desktop e-mail messages.