Re: [Apertium-stuff] GSOC2023

2023-03-01 Thread Eiji Miyamoto
Okay, thank you. I will work on the integration and then try to do other
tasks!


On Wed, 1 Mar 2023 at 14:56, Daniel Swanson 
wrote:


Re: [Apertium-stuff] GSoC'23

2023-03-01 Thread Daniel Swanson
I don't care how you implement it, just that the editor extensions
(any of 
https://microsoft.github.io/language-server-protocol/implementors/tools/)
can connect to it.

Daniel
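For an editor extension to connect, the server has to speak the LSP base protocol. Each JSON-RPC message is preceded by a Content-Length header, and the client's first request is initialize, which expects a result containing the server's capabilities. Below is a minimal sketch of the reply side, written in Python for brevity (Abd-El-Rahman planned to use Node, and the logic carries over). It handles only initialize and shutdown. encode_message, handle and the server name are hypothetical.

```python
import json
from typing import Optional


def encode_message(payload: dict) -> bytes:
    """Frame a JSON-RPC payload with the LSP base-protocol Content-Length header."""
    body = json.dumps(payload).encode("utf-8")
    # Content-Length counts bytes of the UTF-8 body, not characters.
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def handle(message: dict) -> Optional[dict]:
    """Build a response for the requests an editor needs answered to connect.

    Notifications (messages without an "id", e.g. "initialized") get no reply.
    """
    if "id" not in message:
        return None
    method = message.get("method")
    if method == "initialize":
        # Empty capabilities: the server advertises no features yet.
        # The serverInfo name is a hypothetical placeholder.
        result = {"capabilities": {}, "serverInfo": {"name": "apertium-lsp-sketch"}}
        return {"jsonrpc": "2.0", "id": message["id"], "result": result}
    if method == "shutdown":
        # shutdown is answered with a null result; the client then sends "exit".
        return {"jsonrpc": "2.0", "id": message["id"], "result": None}
    # -32601 is the JSON-RPC "Method not found" error code.
    return {
        "jsonrpc": "2.0",
        "id": message["id"],
        "error": {"code": -32601, "message": f"unhandled method: {method}"},
    }


if __name__ == "__main__":
    reply = handle({"jsonrpc": "2.0", "id": 1, "method": "initialize", "params": {}})
    print(encode_message(reply))
```

A real server would also need a loop that reads incoming frames from the editor and would exit when it receives the exit notification. This sketch covers only the responses.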

On Wed, Mar 1, 2023 at 12:20 PM Abd-El-Rahman Nasser
 wrote:


___
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff


Re: [Apertium-stuff] GSoC'23

2023-03-01 Thread Abd-El-Rahman Nasser
OK, I will write this server.
Do you need a specific API to be fetched, or any general one?
Also, I will write it in Node, since that's the technology I'm using.

Last question: is it a problem if I use a library like axios or got, or do you
want just Node and nothing else?

On Wed, Mar 1, 2023, 4:48 PM Daniel Swanson 
wrote:



Re: [Apertium-stuff] GSOC2023

2023-03-01 Thread Daniel Swanson
You're certainly welcome to submit pull requests on the Japanese
repository, but due to the tokenization problems, that probably
shouldn't be your entire coding challenge, since we also need to see
that you can work on that aspect of the project.

Daniel
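The tokenisation problem Daniel mentions comes from ambiguity. Without spaces, the same string can often be split into dictionary words in several ways. The small sketch below lists every such split. It assumes a plain Python set of surface forms as a stand-in for what apertium-jpn's analyser would accept. This is not how the apertium-iii tokeniser works internally, and segmentations is a hypothetical helper.

```python
from functools import lru_cache


def segmentations(text: str, lexicon: set[str]) -> list[list[str]]:
    """Return every split of `text` into tokens that all appear in `lexicon`."""
    longest = max((len(word) for word in lexicon), default=0)

    @lru_cache(maxsize=None)
    def splits_from(start: int) -> tuple[tuple[str, ...], ...]:
        # All segmentations of text[start:], memoised so shared suffixes are solved once.
        if start == len(text):
            return ((),)  # exactly one way to split the empty remainder
        found = []
        for end in range(start + 1, min(len(text), start + longest) + 1):
            word = text[start:end]
            if word in lexicon:
                found.extend((word,) + rest for rest in splits_from(end))
        return tuple(found)

    return [list(split) for split in splits_from(0)]


if __name__ == "__main__":
    toy_lexicon = {"東", "東京", "京都", "都", "東京都"}  # hypothetical toy lexicon
    for split in segmentations("東京都", toy_lexicon):
        print(" | ".join(split))
```

With that toy lexicon, 東京都 yields three splits: 東|京都, 東京|都 and 東京都. A tokeniser still has to choose among them, and this is where a statistical model, the transducer-based method, or a hybrid of the two comes in. The number of splits can grow exponentially with text length, so listing them all is only practical for short strings.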

On Wed, Mar 1, 2023 at 9:49 AM Eiji Miyamoto  wrote:


Re: [Apertium-stuff] GSOC2023

2023-03-01 Thread Eiji Miyamoto
Hello, I am thinking of working on integrating the apertium-iii tokeniser into
apertium-jpn, as Jonathan san suggested. Do I need the language data for it?
I have already installed the dev tools locally.

Also, I've found an issue in apertium-jpn, and I wonder whether I should work
on it as a coding challenge of sorts.

Cheers,

*Sorry for the inconvenience of asking by email; IRC seems to be misbehaving
for my account at the moment.
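In the quoted message below, Jonathan suggests testing the tokeniser against the same text tokenised by hand. One way to put a number on the difference is word-boundary precision and recall. This is an assumed metric, not one the thread prescribes. The sketch does not show how to extract token lists from the pipeline output, and the helper names are hypothetical.

```python
def boundaries(tokens: list[str]) -> set[int]:
    """Character offsets at which one token ends and the next begins."""
    offsets, position = set(), 0
    for token in tokens[:-1]:
        position += len(token)
        offsets.add(position)
    return offsets


def boundary_scores(predicted: list[str], reference: list[str]) -> tuple[float, float]:
    """Precision and recall of predicted token boundaries against a hand-tokenised reference."""
    if "".join(predicted) != "".join(reference):
        raise ValueError("both tokenisations must cover exactly the same text")
    pred, ref = boundaries(predicted), boundaries(reference)
    correct = len(pred & ref)
    # With no boundaries on one side there is nothing to get wrong, so score it 1.0.
    precision = correct / len(pred) if pred else 1.0
    recall = correct / len(ref) if ref else 1.0
    return precision, recall


if __name__ == "__main__":
    hand = ["東京", "都", "に", "住む"]     # hand-tokenised reference
    output = ["東", "京都", "に", "住む"]   # hypothetical tokeniser output
    print(boundary_scores(output, hand))
```

For that pair, two of the three boundaries match, so precision and recall are both 2/3.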

On Mon, 27 Feb 2023 at 01:08, Jonathan Washington <
jonathan.n.washing...@gmail.com> wrote:

> Hi Eijisan,
>
> There's also the tokeniser used for Nuosu, which uses the transducer
> itself to tokenise:
> https://github.com/apertium/apertium-iii
>
> I believe this is a later implementation of what's described in the thesis
> sent by Kevin in [2].
>
> This method has some downsides, but it also has some advantages over a
> statistical model.  Perhaps a way to get started would be to explore the
> pros and cons of each approach, and think about what a hybrid model could
> achieve.  It would be good to join the IRC channel to discuss all this with
> the mentors.
>
> Another good way to get started (and it would help you do the above too)
> would be to integrate the tokeniser from apertium-iii into apertium-jpn:
> https://github.com/apertium/apertium-jpn
>
> You would need to modify Makefile.am and modes.xml, drop in the tokeniser
> script, and that's about it?  Then see if you can get it to analyse text
> without spaces (test it first with the same text, hand-tokenised, to see
> what the output is).  Again, come to IRC for guidance.
>
> The tokeniser.py script is a bit slow, mainly because of Python string
> processing.  Rewriting it in C/C++ would be useful, and also a good way to
> get a better handle on how it works.
>
> --
> Jonathan
>
>
> On Fri, Feb 24, 2023, 13:03 Eiji Miyamoto  wrote:
>
>> Thank you for your reply. The project seems like a cool one to work on for
>> GSoC 2023, and I would like to take part. I see there are two tasks on the
>> page; could you tell me where to start?
>>
>> On Fri, 24 Feb 2023 at 08:20, Kevin Brubeck Unhammer 
>> wrote:
>>
>>> > I'd like to participate in Google Summer of Code 2023 at Apertium.
>>> > In particular, I'm interested in adding a new language pair, and I am
>>> > thinking of adding Japanese-English as I speak Japanese. I previously
>>> > took an online summer school on natural language processing at Tokyo
>>> > University.
>>> > Could you tell me more about the project?
>>>
>>> Hi,
>>>
>>> Getting some support for Japanese would be great! I'm not sure if you
>>> saw the whole IRC discussion, but what we really need in that regard is
>>> support for the *tokenisation* step, where our regular methods[1] fail
>>> us, since the text might have no spaces and lots of
>>> tokenisation-ambiguity. There has been some prior work[2] and it's
>>> already listed as a potential GSoC project.
>>>
>>> Support for anything-Japanese depends on tokenisation. It's also a big
>>> enough job that it would qualify as a full GSoC project, so if you were
>>> hoping for jpn-eng in a summer you will be disappointed (but having a
>>> toy language pair to test with would help!). On the other hand, if we
>>> get good spaceless tokenisation we open up the possibility for not just
>>> Japanese, but Thai, Lao, Chinese etc. – and of course all those writing
>>> systems used before the invention of the space character :)
>>>
>>> regards,
>>> Kevin
>>>
>>> [1] https://wiki.apertium.org/wiki/LRLM
>>> [2] http://hdl.handle.net/10066/20002
>>> [3] https://wiki.apertium.org/wiki/Task_ideas_for_Google_Code-in/Tokenisation_for_spaceless_orthographies


Re: [Apertium-stuff] GSoC'23

2023-03-01 Thread Daniel Swanson
For the LSP project, how about writing a trivial server that receives
the JSON requests from the editor and just prints them to the
terminal.

For Annotatrix, I hope one of the people who works on it directly will
see this and clarify the status. In the meantime, if you really want
to work on that, your coding challenge is to open a pull request
addressing any open issue:
https://github.com/jonorthwash/ud-annotatrix/issues

Daniel
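Daniel's suggested challenge is a server that just prints the JSON requests it receives. It mostly comes down to reading the LSP base-protocol framing: header lines such as Content-Length, a blank line, and then exactly that many bytes of JSON. The minimal Python sketch below assumes the editor launches the server and talks to it over stdin/stdout (the usual stdio transport). read_messages is a hypothetical name.

```python
import json
import sys
from typing import BinaryIO, Iterator


def read_messages(stream: BinaryIO) -> Iterator[dict]:
    """Yield each JSON-RPC message from an LSP base-protocol byte stream."""
    while True:
        headers = {}
        # Header lines end in \r\n; an empty line separates headers from the body.
        while True:
            line = stream.readline()
            if not line:
                return  # EOF: the editor closed the connection
            line = line.strip()
            if not line:
                break
            name, _, value = line.decode("ascii").partition(":")
            headers[name.strip().lower()] = value.strip()
        if "content-length" not in headers:
            continue  # stray blank line; keep reading
        body = stream.read(int(headers["content-length"]))
        yield json.loads(body.decode("utf-8"))


def main() -> None:
    for message in read_messages(sys.stdin.buffer):
        # stdout is the channel back to the editor, so print to stderr instead.
        print(json.dumps(message, indent=2, ensure_ascii=False), file=sys.stderr, flush=True)


if __name__ == "__main__":
    main()
```

The server never answers initialize, so the editor will most likely wait or give up after the first request. That is still enough to see what the editor sends.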

On Wed, Mar 1, 2023 at 12:34 AM Abd-El-Rahman  wrote:
>
> OK, thank you Mr Daniel.
> I would also like to ask about another project I have in mind, building the
> "Language Server Protocol", if it hasn't been assigned to anyone, and whether
> there is a coding challenge I can complete to contribute to it.
> I'm also still interested in the Node project whose status you said you
> weren't sure about (I would like to contribute to it, so I need to know
> whether it is still available).
>
>
> On Tue, Feb 28 2023 at 11:41:26 PM -0500, Daniel Swanson 
>  wrote:
>
> I'm not sure what the status of the Annotatrix project is, but I can give you
> a coding challenge for the capitalization project, which is to fork a
> translation pair of your choice and modify the makefile and modes.xml so that
> capitalization is in the pipeline but doesn't do anything yet.
>
> Daniel
>
> On Tue, Feb 28, 2023 at 11:20 PM Abd-El-Rahman  wrote:
>
> Hello, my name is Abd-El-Rahman Nasser, and I would like to contribute to
> Apertium in GSoC this year. I'm from Egypt and have basic knowledge of
> programming and related concepts such as architecture, databases, OS, OOP,
> OOD and design patterns, as well as some experience with JavaScript and
> NodeJS, and I am keen to learn more with you. Could you help me find out
> where to start so that I can be accepted? The projects I would like to
> contribute to are:
>
> 1. Support for Enhanced Dependencies in UD Annotatrix
> 2. Add Capitalization Handling Module to a Language
>
> Thanks in advance. Sorry for my poor English.