Re: [Apertium-stuff] Transducer contains initial epsilon loop

2023-05-20 Thread Jonathan Washington
This is essentially how spellrelax works in modern language modules.

To see how it's implemented, you can have a look at the spellrelax file and
Makefile.am for recent language modules, e.g.:
https://github.com/apertium/apertium-yua

--
Jonathan

On Fri, May 19, 2023, 08:55 Zanga Chimombo  wrote:

> Thanks for this. I am not sure where to put (which file) the XFST
> rule(s) and the syntax. Are there any examples online that you could
> point me to please?
>
> On Sun, May 14, 2023 at 9:00 PM Kevin Brubeck Unhammer
>  wrote:
> >
> >
> > > I am looking at this again. Removing the extra tag at the transfer
> > > stage seems to be too late down the pipeline (I need the adjective to
> > > match the noun which is done by CG). Actually, surely removing the
> > > extra tag could be done at the same CG stage?
> >
> > If you use an xfst rule, that happens on the analyser FST, ie. before CG
> > and long before transfer.
> >
> >
> >
> > ___
> > Apertium-stuff mailing list
> > Apertium-stuff@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/apertium-stuff


Re: [Apertium-stuff] Questions regarding Apertium-related matters [Google Summer of Code]

2023-03-30 Thread Jonathan Washington
Hi Yunze,

Please see the FAQ for that project:
https://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Adopt_a_language_pair

I'm CCing the Apertium mailing list where others can chime in and where you
can ask follow-up questions.

--
Jonathan


On Thu, Mar 30, 2023, 11:44 Jonathan Washington 
wrote:

>
> -- Forwarded message -
> From: Yunze Song 
> Date: Thu, Mar 30, 2023, 11:39
> Subject: Questions regarding Apertium-related matters [Google Summer of
> Code]
>
>
>
> Dear Prof. Jonathan Washington,
>
> I hope this email finds you well. Since I have not received the Wiki
> account, I contact you by email.
>
> First, I want to introduce myself to you. I am Yunze Song, a Year 4
> student from Computer Science and Technology at the University of Liverpool
> (United Kingdom) and Xi'an Jiaotong-Liverpool University (China; A joint
> degree program). I am interested in natural language processing and would
> like to contribute to Apertium because I believe that an open source,
> high-quality machine translation platform is essential for everyone working
> in the field of language research and NLP.
>
> I have a good command of Simplified Chinese, Traditional Chinese and
> English. Therefore, I will focus on and may choose one from the following
> three projects based on my computer background and language proficiency.
>
> 1. Bring an unreleased translation pair to releasable quality [Simplified
> Chinese <<>> Traditional Chinese].
> 2. Add a new variety to an existing language [Simplified Chinese <<>>
> English].
> 3*(Most Interested). Develop a prototype MT system for a strategic
> language pair [Simplified Chinese <<>> English].
>
> I would like to ask the following two questions.
> 1. Do the language pairs I marked above meet the requirements of the
> project?
> 2. Secondly, I think it is possible for the second and third projects
> mentioned above to be carried out simultaneously. Can you give me some
> advice related to feasibility?
>
> I hope to finish the warm-up questions as soon as possible and put forward
> the constructive research proposal.
>
> Looking forward to hearing from you.
>
> Regards,
> Yunze
>


Re: [Apertium-stuff] GSOC2023

2023-02-26 Thread Jonathan Washington
Hi Eijisan,

There's also the tokeniser used for Nuosu, which uses the transducer itself
to tokenise:
https://github.com/apertium/apertium-iii

I believe this is a later implementation of what's described in the thesis
sent by Kevin in [2].

This method has some downsides, but it also has some advantages over a
statistical model.  Perhaps a way to get started would be to explore the
pros and cons of each approach, and think about what a hybrid model could
achieve.  It would be good to join the IRC channel to discuss all this with
the mentors.

Another good way to get started (and it would help you do the above too)
would be to integrate the tokeniser from apertium-iii into apertium-jpn:
https://github.com/apertium/apertium-jpn

You would need to modify the Makefile.am, the modes.xml file, drop in the
tokeniser script, and that's about it?  Then see if you can get it to
analyse text without spaces (test it first with the same text,
hand-tokenised, to see what the output is).  Again, come to IRC for
guidance.

The tokeniser.py script is a bit slow, mainly because of Python string
processing.  Rewriting it in C/C++ would be useful, and also a good way to
get a better handle on how it works.
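
To make the trade-off above concrete, here is a toy sketch in Python.  It
is not the tokeniser.py from apertium-iii: a plain set of strings stands in
for the transducer, and the names `segmentations` and `lrlm` are made up for
illustration.  It contrasts greedy left-to-right longest match with
enumerating every segmentation the lexicon allows, which is roughly the
difference between the regular method and letting the analyser drive
tokenisation.

```python
def segmentations(text: str, lexicon: set[str]) -> list[list[str]]:
    """Return every split of `text` whose pieces are all in `lexicon`.

    Plain recursion without memoisation: fine for a demo, exponential
    in the worst case on long ambiguous input.
    """
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        prefix = text[:end]
        if prefix in lexicon:
            for rest in segmentations(text[end:], lexicon):
                results.append([prefix] + rest)
    return results


def lrlm(text: str, lexicon: set[str]) -> list[str]:
    """Greedy left-to-right longest match.

    At each position take the longest lexicon entry; if nothing matches,
    emit the single character as an unknown token and move on.
    """
    tokens = []
    i = 0
    while i < len(text):
        match = text[i]  # fallback: one unknown character
        for end in range(len(text), i, -1):
            if text[i:end] in lexicon:
                match = text[i:end]
                break
        tokens.append(match)
        i += len(match)
    return tokens


if __name__ == "__main__":
    toy_lexicon = {"ab", "abc", "cd"}  # hypothetical stand-in for an analyser
    print(lrlm("abcd", toy_lexicon))           # greedy commits to "abc" and strands "d"
    print(segmentations("abcd", toy_lexicon))  # finds the full analysis "ab" + "cd"
```

With this toy lexicon, the greedy pass commits to the longest first match
and leaves an unknown remainder, while the exhaustive pass finds the one
split that is fully analysable.  A real system would still have to rank the
candidates when there are several, which is where a statistical or hybrid
model could come in.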

--
Jonathan


On Fri, Feb 24, 2023, 13:03 Eiji Miyamoto  wrote:

> Thank you for your reply. The project seems cool to work on for GSOC2023,
> and I would like to participate. I reckon there are two tasks on the
> page; could you tell me where to start?
>
> On Fri, 24 Feb 2023 at 08:20, Kevin Brubeck Unhammer 
> wrote:
>
>> > I'd like to participate in Google Summer of Code 2023 at Apertium.
>> > In particular, I'm interested in adding new language pair and I am
>> > thinking to add Japanese-English as I speak Japanese. I took summer
>> > school at Tokyo University online on natural language processing
>> > before.
>> > Could you tell me more about the project?
>>
>> Hi,
>>
>> Getting some support for Japanese would be great! I'm not sure if you
>> saw the whole IRC discussion, but what we really need in that regard is
>> support for the *tokenisation* step, where our regular methods[1] fail
>> us, since the text might have no spaces and lots of
>> tokenisation-ambiguity. There has been some prior work[2] and it's
>> already listed as a potential GsoC project.
>>
>> Support for anything-Japanese depends on tokenisation. It's also a big
>> enough job that it would qualify as a full GsoC project, so if you were
>> hoping for jpn-eng in a summer you will be disappointed (but having a
>> toy language pair to test with would help!). On the other hand, if we
>> get good spaceless tokenisation we open up the possibility for not just
>> Japanese, but Thai, Lao, Chinese etc. – and of course all those writing
>> systems used before the invention of the space character :)
>>
>> regards,
>> Kevin
>>
>> [1] https://wiki.apertium.org/wiki/LRLM
>> [2] http://hdl.handle.net/10066/20002
>> [3]
>> https://wiki.apertium.org/wiki/Task_ideas_for_Google_Code-in/Tokenisation_for_spaceless_orthographies


Re: [Apertium-stuff] GSoC 2023 Mentors & Ideas?

2023-01-28 Thread Jonathan Washington
I'm also happy to mentor!  (And help with recursive transfer:)  I made
sure my name was on all the projects I'd like to mentor.

--
Jonathan

On Fri, 27 Jan 2023 at 22:29, Hèctor Alòs i Font wrote:
>
> Message from Kevin Brubeck Unhammer on Fri, 27 Jan 2023 at 23:41:
>>
>> > As far as rewriting the
>> > transfer rules using apertium-recursive is concerned, a co-mentor with
>> > experience in the module would be highly desirable.
>>
>> I can try to assist :)
>
>
> Great, thanks, Kevin! It now remains to be seen whether all the conditions 
> are in place for there to be a solid proposal in this sense (starting with 
> Apertium being chosen by Google this year).
>
> Hèctor


Re: [Apertium-stuff] Capitalization Handling

2022-12-29 Thread Jonathan Washington
Thanks for sharing this great news, Daniel!

Is there anything special that needs to be done to leverage this new
approach to capitalisation in new pairs created using apertium-init?

--
Jonathan

On Tue, 27 Dec 2022 at 10:33, Daniel Swanson wrote:
>
> Greetings Apertiumers!
>
> For anyone testing this, I've now also added -w/--dictionary-case to
> apertium-{transfer,interchunk,postchunk} which makes the
> capitalization instructions simply do nothing so we don't have two
> conflicting sets of rules trying to solve the same problem in opposite
> ways.
>
> Daniel
>
> On Tue, Dec 27, 2022 at 6:47 AM Marc Riera Irigoyen
>  wrote:
> >
> > Thanks for the great work! I'll make sure to test it with apertium-eng-cat, 
> > which has generation errors due to capitalization.
> >
> > Happy holidays!
> >
> > Marc Riera
> >
> >
> > Message from Hèctor Alòs i Font on Sat, 24 Dec 2022 at 14:12:
> >>
> >> Looks very good, Daniel. Thanks in advance. I'll try to test in the next 
> >> days in the pairs I maintain.
> >> Merry Christmas/Hanukkah/New Year/*.
> >> Hèctor
> >>
> >> Message from Daniel Swanson on Fri, 23 Dec 2022 at 0:41:
> >>>
> >>> Greetings Apertiumers!
> >>>
> >>> I have two updates to report:
> >>>
> >>> First, I have rewritten the postgenerator (again), this time as part
> >>> of apertium-separable (and so not breaking the old one, unlike last
> >>> time), and in such a way that postgenerator rules can both match on
> >>> lemma and tags in addition to surface forms and iteratively apply to
> >>> their own output.
> >>>
> >>> This is available as part of apertium-separable 0.7.0 and is
> >>> documented at https://wiki.apertium.org/wiki/Postgenerator
> >>>
> >>> Second, I just added a pair of modules which move capitalization
> >>> information into word-bound blanks at the beginning of the pipeline
> >>> and then reapply them according to LRX-like rules at the end of the
> >>> pipeline, allowing all intermediate modules to operate solely on
> >>> dictionary case.
> >>>
> >>> This should be available after the next nightly build (i.e. tomorrow)
> >>> in apertium 3.9.0, and is documented at
> >>> https://wiki.apertium.org/wiki/Capitalization_restoration
> >>>
> >>> If anyone has questions or would like help trying this out for a
> >>> language pair or if I missed something in the documentation, let me
> >>> know.
> >>>
> >>> Thanks to Kevin Unhammer and Marc Riera for helping me figure out what
> >>> the design of the capitalization module should be.
> >>>
> >>> Merry Christmas,
> >>> Daniel
> >>>
> >>> P.S. To anyone not interested in either of these developments: your
> >>> Christmas gift is that I accidentally made lexical selection quite a
> >>> bit faster while I was working on these.


Re: [Apertium-stuff] [Apertium-contact] Apertium Simpleton UI bug

2022-09-12 Thread Jonathan Washington
Hei Jørgen,

Could you describe the problem in more detail?  We didn't receive the
screenshot.

P.S. I'm sorry that I couldn't respond to you in Norwegian, but please feel
free to write back in Norwegian!  Plenty of people who can help you here
know Norwegian.

--
Jonathan

On Wed, 31 Aug 2022 at 14:48, Jørgen Finsveen <
jor...@stud.ntnu.no> wrote:

> Hi,
>
> I have discovered a bug that occurs when I try to translate various
> sentences in the desktop version of Apertium Simpleton. This has been
> tested on a Windows 11 machine. I have tried to reproduce the same error
> on macOS, but the bug did not occur there. All language packages, as well
> as "Required Core Tools" and "apertium-all-dev", have been updated, without
> this affecting the results. A picture is attached.
>
> Best regards,
> Jørgen Finsveen


[Apertium-stuff] Fwd: [LoResMT] Final call for papers: LoResMT 2022 (The 5th Workshop on Technologies for MT of Low-Resource Languages)

2022-07-23 Thread Jonathan Washington
-- Forwarded message -
From: Atul Kr. Ojha 
Date: Wed, Jul 20, 2022, 06:09
Subject: [LoResMT] Final call for papers: LoResMT 2022 (The 5th Workshop
on Technologies for MT of Low-Resource Languages)
To: loresmt 


__Apologies for cross-posting__

The Fifth Workshop on Technologies for Machine Translation of
Low Resource Languages (LoResMT 2022)
https://sites.google.com/view/loresmt

@ COLING 2022
Gyeongju, Republic of Korea, October 12-17, 2022

SUBMISSION
https://www.softconf.com/coling2022/LoResMT_2022/

TIMELINE
Submission Open Until: July 30, 2022 (Saturday) at 23:59 (Anywhere on Earth)
Notification of acceptance: August 22, 2022 (Monday)
Camera-ready papers due: September 5, 2022 (Monday)
LoResMT workshop: October 16, 2022 (online on Sunday at UTC+9)
COLING 2022: October 12-17, 2022


SCOPE

Based on the success of past low-resource machine translation (MT)
workshops at AMTA 2018 (https://amtaweb.org/), MT Summit 2019 (
https://www.mtsummit2019.com), AACL-IJCNLP 2020 (http://aacl2020.org/), and
AMTA 2021, we introduce the Fifth LoResMT workshop at COLING 2022. The
workshop provides a discussion panel for researchers working on MT
systems/methods for low-resource and under-represented languages in
general. We would like to help review/overview the state of MT for
low-resource languages and define the most important directions. We also
solicit papers dedicated to supplementary NLP tools that are used in any
language and especially in low-resource languages. Overview papers of these
NLP tools are very welcome. It will be beneficial if the evaluations of
these tools in research papers include their impact on the quality of MT
output.


TOPICS

We are highly interested in (1) original research papers, (2)
review/opinion papers, and (3) online systems on the topics below; however,
we welcome all novel ideas that cover research on low-resource languages.

- COVID-related corpora, their translations and corresponding NLP/MT systems
- Neural machine translation for low-resource languages
- Work that presents online systems for practical use by native speakers
- Word tokenizers/de-tokenizers for specific languages
- Word/morpheme segmenters for specific languages
- Alignment/Re-ordering tools for specific language pairs
- Use of morphology analyzers and/or morpheme segmenters in MT
- Multilingual/cross-lingual NLP tools for MT
- Corpora creation and curation technologies for low-resource languages
- Review of available parallel corpora for low-resource languages
- Research and review papers of MT methods for low-resource languages
- MT systems/methods (e.g. rule-based, SMT, NMT) for low-resource languages
- Pivot MT for low-resource languages
- Zero-shot MT for low-resource languages
- Fast building of MT systems for low-resource languages
- Re-usability of existing MT systems for low-resource languages
- Machine translation for language preservation


SUBMISSION INFORMATION

We are soliciting two types of submissions: (1) research, review, and
position papers and (2) system demonstration papers. For research, review
and position papers, the length of each paper should be at least four (4)
and not exceed eight (8) pages, plus unlimited pages for references. For
system demonstration papers, the limit is four (4) pages. Submissions
should be formatted according to the official COLING 2022 style templates
(LaTeX, Word, Overleaf). Accepted papers will be published online in the
COLING 2022 proceedings and will be presented at the conference.

Submissions must be anonymized and should be done using the official
conference management system (
https://www.softconf.com/coling2022/LoResMT_2022/). Scientific papers that
have been or will be submitted to other venues must be declared as such and
must be withdrawn from the other venues if accepted and published at
LoResMT. The review will be double-blind.

We would like to encourage authors to cite papers written in ANY language
that are related to the topics, as long as both original bibliographic
items and their corresponding English translations are provided.

Registration is handled by the main conference (
https://coling2022.org/coling).


ORGANIZING COMMITTEE (LISTED ALPHABETICALLY)

Atul Kr. Ojha, DSI, National University of Ireland Galway & Panlingua
Language Processing LLP
Chao-Hong Liu, Potamu Research Ltd
Ekaterina Vylomova, University of Melbourne, Australia
Jade Abbott, Retro Rabbit
Jonathan Washington, Swarthmore College
Nathaniel Oco, National University (Philippines)
Tommi A Pirinen, UiT The Arctic University of Norway, Tromsø
Valentin Malykh, Huawei Noah’s Ark lab and Kazan Federal University
Varvara Logacheva, Skolkovo Institute of Science and Technology
Xiaobing Zhao, Minzu University of China


PROGRAM COMMITTEE (LISTED ALPHABETICALLY)

Alberto Poncelas, Rakuten, Singapore
Alina Karakanta, Fondazione Bruno Kessler
Amirhossein Tebbifakhr, Fondazione Bruno Kessler
Anna Currey, Amazon Web Services
Aswarth Abhilash Dara, Amazon
Ar

Re: [Apertium-stuff] Apertium PMC Election: Bypass election?

2022-04-27 Thread Jonathan Washington
I'm also happy with either approach.  I support avoiding bureaucracy,
but I get Tanmai's point about appearances.  But then, being friendly
and going forward unanimously and unbureaucratically is also an
appearance :D

--
Jonathan

On Wed, 27 Apr 2022 at 08:08, Juan Pablo wrote:
>
> Same here! It'll be great to vote if there are more candidates taking a
> step forward. But if not, I vote for avoiding bureaucracy. Candidates
> can be proclaimed by unanimous consent.
>
> best,
>
> Juan Pablo
>
> On 27/04/2022 at 12:57, Kevin Brubeck Unhammer wrote:
> > As someone currently outside the PMC, I too vote for no vote if it means
> > avoiding unnecessary bureaucracy :)


Re: [Apertium-stuff] Apertium PMC Election: Census & Candidates

2022-04-26 Thread Jonathan Washington
I would also like to run for the PMC again.

--
Jonathan

On Tue, 26 Apr 2022 at 03:36, Kevin Brubeck Unhammer wrote:
>
> > === Candidates:
> > Do you want to be a PMC member? Speak up!
>
> I do.
>
>
> -Kevin


Re: [Apertium-stuff] Apertium Google Cloud owner?

2022-04-20 Thread Jonathan Washington
I think Sushain set up the analytics stuff a long time ago, if that's what
that is?

--
Jonathan

On Mon, Apr 18, 2022, 15:36 Xavi Ivars  wrote:

> It didn't sound familiar, but I double-checked. I don't.
>
> Message from Tino Didriksen on Mon, 18 Apr 2022 at 21:00:
>
>> Hello everyone,
>>
>> Who owns https://console.cloud.google.com/home/dashboard?project=apertium
>> ?
>>
>> -- Tino Didriksen
>>
>>
>
> --
> < Xavi Ivars >
> < http://xavi.ivars.me >
>


Re: [Apertium-stuff] can't install language pairs packages on debian 11

2021-11-22 Thread Jonathan Washington
On Sun, 21 Nov 2021 at 13:43, Bernard Chardonneau wrote:

> Last year, I decided to change Debian 7 to Debian 10. Still another horrible
> graphic interface and the graphic editor gedit changed, losing its top menu.
> I lost several file changes because I had trouble finding the way to save
> them. That was a reason to continue using Debian 6 often.
>
> This year, I reinstalled Debian 10 with MATE. The way the graphic
> interface works is now OK. The gedit editor, now called pluma, works like
> Debian 6 gedit. On gedit, I found that a list of only the 5 last files seen
> or edited with it is too few, & would prefer 10 or 20. On pluma, it is
> still only 5!

Salut(on), Bech,

I offer some quick notes on some of the [off-topic] issues you
mention.  I hope you find some portion of it helpful.

The lack of a window manager frame on gedit is called client-side
decorations.  This can be disabled easily using the gtk3-nocsd package
in Debian, and launching gedit through `GTK_CSD=0 gedit`.  This will
restore the use of your normal window-manager decorations.  Pluma is
Mint / mate's fork of gedit, and keeps the traditional interface, if
that's what you like, but there's no reason you can't have gedit and
pluma installed side-by-side and even run both simultaneously.

Also, there is no single Debian UI.  Mate is probably a good choice if
you're used to gnome 2, as you've found out.  Cinnamon also isn't bad.
I'd also recommend looking into gnome-throwback (not as true to its
name as it sounds) and xfce4.

And to your other complaint, one great thing about FOSS is that if
your editor doesn't list enough recently opened files, you can just
modify it so that it does.  There are of course debates as to the
number of workflows to support through built-in options, but you could
potentially even make this feature into a user-presented option and
submit a patch/PR to pluma or gedit.  And you could definitely create
your own fork, and call it Bech-Pluma or Plume or whatever you like.
What comes shipped is limited by the imaginations and principles of
the original devs, but unlike with proprietary software, the
possibilities are endless.

--
Jonathan




Re: [Apertium-stuff] A question about Apertium Kazakh and Tatar packages

2021-09-04 Thread Jonathan Washington
As to Andrey's question concerning kaz-rus not working because of a
missing .t4x file, that sounds like a legit packaging error, which I'm
not sure how to fix (I really should learn...)

In the meantime, Andrey, you should be able to just clone the pair and
compile from source (`apertium-get kaz-rus`), which will fix the
missing file issue.¹  This gets around the missing modes as well,² and
will be more future-proof given what Kevin and Tino are discussing.

¹ missing file issue resolved:
apertium-kaz-rus$ echo сәлем деген сөз сөздікте жоқ екен, бірақ басқа
сөздер аударылады ғой  | apertium -d . kaz-rus
@сәлем  сказать  базар  в  словаре @жоқ #, но  #иной базары  #Поднимать

² missing modes resolved:
apertium-kaz-rus$ echo сөздер | apertium -d . kaz-rus-biltrans
^сөз/базар/слово$
^е/$^./.$

I realise the translation is terrible, and I have no idea off the top
of my head why сөз is mapped to базар in the dictionary, but yeah.

Otherwise, does this help?

--
Jonathan

On Thu, 2 Sept 2021 at 04:23, Tino Didriksen wrote:
>
> On Thu, 2 Sept 2021 at 09:53, Kevin Brubeck Unhammer  
> wrote:
>>
>> However – there are people who want to use debug modes but would rather
>> not want to compile a pair and manually
>> `git pull && make && make test || revert-to-last-working-revision`.
>>
>> Would it make sense to install debug-modes to a debug-modes folder? Put
>> stuff like -biltrans etc. in /usr/share/apertium/debug-modes, and then
>> `apertium -l` only shows translation /modes while `apertium -L` shows
>> both /modes and /debug-modes? (And `apertium kaz-rus-biltrans` works
>> without any special switches because why not, while `apertium
>> nonexistent` runs `apertium -l` and gives a hint to use `-L` to show the
>> rest.)
>
>
>
> That's a good idea, but some debug modes require files that are not normally 
> installed. We shouldn't clutter end-user installs with these files.
>
> But we could install everything to 2 packages: A main package 
> apertium-zzz-xxx for end-users and another apertium-zzz-xxx-corpus (or 
> whatever bikeshed -name we come up with) with the extras. That would also 
> lead nicely into the spellers going into a separate package, as people who 
> just want spellers probably don't care about anything else.
>
> -- Tino Didriksen


[Apertium-stuff] Fwd: LoResMT 2021: Final Call for Papers - Low-Resource Languages Workshop

2021-06-08 Thread Jonathan Washington
-- Forwarded message -
From: Atul Kr. Ojha 
Date: Wed, Jun 2, 2021 at 11:52 PM
Subject: LoResMT 2021: Final Call for Papers - Low-Resource Languages Workshop
To: Jonathan Washington 
Cc: Chao-Hong 


Dear Jonathan,
Hope you are doing well!!

I was wondering if you could please post our workshop's CFP in the
Apertium group (CFP is below).
Thank you!!

Thanks,
Atul

===
The Fourth Workshop on Technologies for MT of Low-Resource Languages
(LoResMT 2021)
https://sites.google.com/view/loresmt/
@ MT Summit XVIII – 2021
The 18th biennial conference of the International Association of Machine
Translation
16-20 August 2021, Orlando, Florida, USA
===

Invited Speakers
===
Barry Haddow
University of Edinburgh

Catherine Muthoni Gitau
African Institute for Mathematical Sciences (AIMS)

Mathias Müller
Institut für Computerlinguistik, Universität Zürich

Mona Diab
Facebook, George Washington University

Scope
===
Based on the success of past low-resource machine translation (MT)
workshops at AACL-IJCNLP 2020 (http://aacl2020.org/), MT Summit 2019
(https://www.mtsummit2019.com) and AMTA 2018 (https://amtaweb.org/),
we introduce the fourth LoResMT workshop at MT Summit 2021. Like its
predecessors, this workshop will bring together researchers and
translators of low-resource languages to compare and contrast how each
uses digital technology for translation. Specifically, the workshop
focuses on novel advances on the coverage of even more languages than
past workshops with different geographical presence, degree of
diffusion and digitalization.

We solicit original work on low-resource translation which includes,
but is not limited to, MT systems that include the word
tokenizers/de-tokenizers, word segmenters, morphological analyzers,
and more. We furthermore invite work that includes MT systems based on
neural networks along with their methods, natural language processing
approaches, and overall coverage of low-resource languages.
Additionally, novel work covering translations of COVID-related text
and their practical use for low-resource communities are of high
interest.

The goal of this workshop is to begin to close the gap between
low-resource translation systems and their practical use in the real
world. Online systems and original research that can be used by native
speakers of low-resource languages are of particular interest.
Therefore, it will be beneficial if the evaluations of these tools in
research papers include their impact on the quality of MT output and how
they can be used in the real world.

Shared Tasks
===
We are happy to announce the introduction of new shared tasks focused on
the building of MT systems for COVID-related texts. The task aims to
encourage research on MT systems involving three low-resource language
pairs:

(1) Taiwanese Sign Language <> Traditional Chinese (> 100,000 pairs)
(2) English <> Irish
(3) English <> Marathi

The training, development, and test sets for the three groups will be
released shortly (please see the important dates). Updated information
will be available on the LoResMT website
(https://sites.google.com/view/loresmt/) and in the Google Group
(https://groups.google.com/g/loresmt2021/).

Topics
===
We are highly interested in (1) original research papers, (2)
review/opinion papers, and (3) online systems on the topics below;
however, we welcome all novel ideas that cover research on
low-resource languages.
- COVID-related corpora, their translations and corresponding NLP/MT
systems
- Neural machine translation for low-resource languages
- Work that presents online systems for practical use by native speakers
- Word tokenizers/de-tokenizers for specific languages
- Word/morpheme segmenters for specific languages
- Alignment/Re-ordering tools for specific language pairs
- Use of morphology analyzers and/or morpheme segmenters in MT
- Multilingual/cross-lingual NLP tools for MT
- Corpora creation and curation technologies for low-resource languages
- Review of available parallel corpora for low-resource languages
- Research and review papers of MT methods for low-resource languages
- MT systems/methods (e.g. rule-based, SMT, NMT) for low-resource
languages
- Pivot MT for low-resource languages
- Zero-shot MT for low-resource languages
- Fast building of MT systems for low-resource languages
- Re-usability of existing MT systems for low-resource languages
- Machine translation for language preservation

Submission Information
===
For research, review and position papers, the length of each paper
should be at least four (4) and not exceed eight (8) pages, plus
unlimited pages for references. For system demonstration papers, the
limit is four (4) pages. Submissions should be formatted according to
the official MT Summit 2021 style templates (PDF, LaTeX, Word).
Accepted papers will be published on

Re: [Apertium-stuff] Usage of mfn tag in the Hindi dictionary

2021-05-05 Thread Jonathan Washington
While things are being fixed, it might be better to use Apertium-standard
tags, like ind instead of indef.  But that's less important.
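
For anyone who wants to try such a rename by hand in the meantime, here
is a rough Python sketch.  It is not Daniel's script, which isn't shown in
this thread; the function name `rename_tag` is made up, and it only
touches <s n="..."/> and <sdef n="..."/> elements in dictionary XML.  Tags
that appear in transfer rules, CG files or other formats would need
separate handling, and every dependent dictionary (e.g. the kok-hin bidix)
would have to be updated in the same pass.

```python
import re


def rename_tag(dix_text: str, old: str, new: str) -> str:
    """Rename symbol `old` to `new` in <s n="..."/> and <sdef n="..."/>.

    Only exact matches of the attribute value are changed, so renaming
    "indef" leaves a longer tag such as "indefinite" alone.
    """
    pattern = re.compile(r'(<s(?:def)?\s+n=")' + re.escape(old) + r'(")')
    return pattern.sub(lambda m: m.group(1) + new + m.group(2), dix_text)


if __name__ == "__main__":
    # Hypothetical dictionary fragment, for illustration only.
    sample = (
        '<sdef n="indef"/>\n'
        '<e><p><l>a</l><r>a<s n="det"/><s n="indef"/></r></p></e>'
    )
    print(rename_tag(sample, "indef", "ind"))
```

Running something like this over the monolingual dictionary and each
dependent bidix in one go, then recompiling and testing the affected pairs,
is probably the safest order of operations.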

--
Jonathan

On Wed, 5 May 2021 at 09:53, Daniel Swanson wrote:
>
> The forms in question are used in the kok-hin bidix, so that would
> need to be updated too.
>
> I've been thinking about how to write a script to update all uses of a
> tag and I think next week or the week after I might have time to
> actually finish that, which sounds like it might be of use here.
>
> Daniel
>
> On Wed, May 5, 2021 at 8:25 AM Hèctor Alòs i Font  
> wrote:
> >
> > Message from Anuradha Pandey on Wed, 5 May 2021 at 15:51:
> >>
> >> Hello everyone,
> >> I have been working on a new language pair, and I was having a look at the 
> >> word forms defined in the Hindi paradigms. The "mfn" tag seems suspicious 
> >> for Hindi. It stands for gender-neutral by definition, like "it" in 
> >> English.  Hindi nouns have two grammatical genders: masculine and 
> >> feminine. There is no neutral gender for nouns in Hindi. The mfn tag has 
> >> been used at 3 places -
> >>
> >>  "गलत__adj"
> >> "स/ा__adj"
> >> "एक__det"
> >>
> >> The last paradigm makes sense since a determiner can be gender-neutral. 
> >> However, I was curious about their usage in the case of adjectives. The 
> >> definitions of these use the "mfn" tag along with the "sp" tag (which, 
> >> I suppose, marks forms where singular and plural are identical). I couldn't 
> >> come up with an example where the adjective is gender-neutral and its 
> >> singular and plural forms are identical.
> >
> >
> > Even if the determiner has the same form for both genders, masculine and 
> > feminine, I would expect an "mf" tag, not an "mfn" one.
> > In fact the whole paradigm is quite strange:
> >
> > 
> >> n="sg"/>
> > 
> >
> > So, there is only one single form, just for singular and for the oblique 
> > case, and the order of the tags is not the expected one: gender, number and 
> > case (as the adjectives and the nouns have).
> >
> > Other determiner paradigms have other unexpected forms, with only one 
> > form and without any gender and/or case tags.
> >
> > These kinds of things are unexpected for a released language. If these 
> > paradigms are changed in the Hindi dictionary and the released Hindi-Urdu 
> > pair relies on them, the pair could stop working.
> >
> > Hèctor
> >
> >>
> >>
> >> If someone who has worked with the Hindi dictionary can clarify the logic 
> >> behind using this tag, and give an example for better clarity, it would be 
> >> really helpful.
> >>
> >> Regards,
> >> Anuradha Pandey
> >> IRC: Anuradha_Pandey
> >>
> >>
> >>
> >>
> >>
> >>


[Apertium-stuff] Fwd: Apertium simpleton not working on mac

2021-05-03 Thread Jonathan Washington
Hi Olav,

Perhaps someone on the apertium-stuff mailing list can help.  I'm
forwarding your message there.

--
Jonathan

-- Forwarded message --
From: olav lund solheim 
To: apertium-cont...@lists.sourceforge.net
Cc:
Bcc:
Date: Mon, 3 May 2021 12:33:36 +0200
Subject: Apertium simpleton not working on mac
Hi,

I'm currently trying to run Apertium simpleton on my mac. I've installed
all required tools and specific packages for the languages I'm trying to
translate. I've tried a wide range of different languages and the same
error message shows up regardless. I'm currently running macOS Sierra
version 10.12.6 on my 2015 MacBook air.

Error message:
dyld: lazy symbol binding failed: Symbol not found: chkstk_darwin

  Referenced from: /Users/olavsolheim/Library/Application Support/Tino
Didriksen Consult/Apertium
Simpleton/apertium-all-dev/bin/../lib/libpcre.1.dylib

  Expected in: /usr/lib/libSystem.B.dylib



This message shows up five times.

Hope you can help me with this issue as soon as possible.

Regards Olav Solheim


Re: [Apertium-stuff] [Apertium-contact] ES-CA Issue

2021-04-17 Thread Jonathan Washington
Hi Eduard,

Are you using the translation pair locally or via a website?  If the
former, how did you install it?  If the latter, which website?

--
Jonathan

On Fri, 9 Apr 2021 at 02:59, Eduard Urgell wrote:
>
> Hi,
>
> Is there any known issue with machine translation for the language pair 
> Spanish-Catalan? I'm trying to translate a document from Spanish into Catalan 
> and it only outputs the same file I upload. Thank you
> ___
> Apertium-contact mailing list
> apertium-cont...@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/apertium-contact




Re: [Apertium-stuff] GSoC proposal draft - A morphological analyzer and generator for Romeyka

2021-04-17 Thread Jonathan Washington
Hi Utku,

Just to follow up on Hèctor's points, for the coding challenge, we'd
want to see a prototype of the transducer.  It doesn't have to have
any real level of usefulness, but it should solve some of the types of
problems you expect to encounter.  That is, you should be able to
implement a little bit of the morphotactics and morphophonology, even
if for only a couple words.

The same goes for any GSoC student looking to work on a transducer.

I'm also curious about the corpus.  Not just how big it is, but
whether it is available.  It's certainly not a requirement for GSoC
that it be open; if it needs to be kept private, it can be kept
between you and your mentors.  But it would be good to document these
things early on.

--
Jonathan
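
The coverage target raised in the quoted message below is usually measured
as naive coverage: the share of corpus tokens that receive at least one
analysis, whether or not it is the right one. A minimal sketch, assuming
the analyser output is in Apertium stream format, where an unknown token
appears as ^word/*word$. The function name and the sample line are made up
for illustration, and escaped characters in the stream are not handled:

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
LU = re.compile(r'\^([^/$]*)/([^$]*)\$')

def naive_coverage(analysed: str) -> float:
    """Share of tokens with at least one analysis (unknown ones start with '*')."""
    units = LU.findall(analysed)
    if not units:
        return 0.0
    known = sum(1 for _surface, analyses in units if not analyses.startswith('*'))
    return known / len(units)

# Hypothetical analyser output: one of the four tokens is unknown.
sample = ('^the/the<det><def><sp>$ ^blorp/*blorp$ '
          '^cat/cat<n><sg>$ ^sat/sit<vblex><past>$')
print(f'{naive_coverage(sample):.2f}')  # prints 0.75
```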

On Wed, 14 Apr 2021 at 13:50, Hèctor Alòs i Font wrote:
>
> Hi Utku,
>
> Your proposal seems interesting.
>
> 1. Did you take a look at apertium-ell? How much could it help?
> 2. In your proposal, you speak about a corpus. You intend to reach 80% 
> coverage. What kind of corpus are you speaking about? How much Romeyka is 
> written?
> 3. Could you explain what you understand by "modelling allomorphy"? Is 
> that Apertium's morphological disambiguation?
> 4. Could you also explain how you intend to tag "content phenomenon"?
> 5. I couldn't find anything about your coding challenge. The coding challenge 
> is a must. It shows that you know how to install Apertium and have a basic 
> understanding of it.
>
> Hèctor
>
> Message from Utku Turk on Wed, 14 Apr 2021 at 15:58:
>>
>> Hi,
>>
>> My name is Utku Türk. I am a linguistics student at Boğaziçi University, 
>> Turkey. I want to attend GSoC with a Romeyka morphological analyzer project.
>>
>> Romeyka is one of the many Modern Greek dialects spoken in Asia Minor. It 
>> has no NLP footprint, and I believe it is an important first step for 
>> Quantitative Language Contact and Dialectology studies. Its morphology and 
>> lexicon are heavily influenced by Ancient Greek, Turkish, and Laz.
>>
>> The following link[1] is my draft for the GSoC proposal. Any feedback is 
>> very much appreciated!
>>
>> [1]: 
>> https://docs.google.com/document/d/1CJrD7TRJvFKKD5qsW_fnLdNbk1iQ2t4_dim3MA3g4_E/edit?usp=sharing


Re: [Apertium-stuff] GSOC proposal draft - Create a usable version of these language pair: English--Igbo

2021-04-13 Thread Jonathan Washington
Hi Okonkwo,

I see you submitted the proposal for improving a morphological
transducer, which is correct.

What I'd really like to see, as a mentor, is that you have an
understanding of how to not just expand the lexicon, but how to expand
the morphology.

Specifically, the analyser should be able to analyse different forms
of a word, for example (just taking examples from Wikipedia), nye and
nyerere, or avóọla and emévọọla, not just vó.

We can help you implement this in lexd.  It would be good if you were
to come on IRC and discuss it with us.

--
Jonathan

On Tue, 13 Apr 2021 at 06:49, Okonkwo Ifeanyichukwu wrote:
>
> Thanks Hèctor for your feedback, I get it now. Let me make changes on it right 
> away.
>
> On Tue, 13 Apr 2021, 10:39 Hèctor Alòs i Font,  wrote:
>>
>> Hi Okonkwo,
>>
>> The problem with this second version of your project is that it seems almost 
>> the same. The Igbo morphology seems too complex for developing a translator 
>> in a single GSoC, so forget about working on a bilingual dictionary. You 
>> should focus on the analyser, so the phases of your project should deal 
>> with which parts of the morphological analyser you implement, and the 
>> number of words you add. Your proposal should prove that you understand the 
>> difficulties of constructing such an analyser. Could you reach a naive 
>> coverage of 85% or 90%? At the end of the project, you could also think 
>> about a morphological disambiguator (maybe during a couple of weeks). At 
>> least, this is my opinion. Someone else can have another.
>>
>> Hèctor
>>
>> Message from Okonkwo Ifeanyichukwu on Tue, 13 Apr 2021 at 11:34:
>>>
>>> Thanks, Jonathan, Daniel has converted the existing work to lexd as he 
>>> offered to do it. It is really great this way, I will wait for it to be 
>>> merged in order to pull it.
>>>
>>> Creating a morphological analyser is one of my goals, I don't understand 
>>> when you say I did not mention morphology. Please I need more detail.
>>>
>>> --
>>> Okonkwo
>>>
>>>
>>> On Mon, Apr 12, 2021 at 3:13 PM Jonathan Washington 
>>>  wrote:
>>>>
>>>> Hi Okonkwo,
>>>>
>>>> Thank you for your continued interest in Apertium!
>>>>
>>>> My main comment on this latest version of your proposal is that you
>>>> don't mention morphology.  This should be a main focus of your
>>>> work—not just expanding the lexicon, but making it productive.
>>>>
>>>> Also, given the range of non-suffixational morphology in Igbo, I think
>>>> it might be a good idea to implement the dictionary in lexd instead of
>>>> lexc.  Daniel has offered to help convert your existing work.  What do
>>>> you think?
>>>>
>>>> --
>>>> Jonathan
>>>>
>>>> On Mon, 12 Apr 2021 at 07:33, Okonkwo Ifeanyichukwu wrote:
>>>> >
>>>> > Thanks, Sevilay, Hèctor and  Ngadou Yopa for reviewing my project 
>>>> > proposal. I have taken all the suggestions here into consideration and 
>>>> > made some changes to my proposal. Below is the link to the recent 
>>>> > changes I made.
>>>> >
>>>> > link to the proposal:
>>>> > https://docs.google.com/document/d/1iK_9VTqb5ZHH1bEjl5UAqBP77ijaNKm6p4HIT2JO5qk/edit?usp=sharing
>>>> >
>>>> > Okonkwo
>>>> >
>>>> > On Mon, Apr 12, 2021 at 9:47 AM Ngadou Yopa  
>>>> > wrote:
>>>> >>
>>>> >> Hello Okonkwo,
>>>> >>
>>>> >> I agree with @hectora...@gmail.com. You should probably consider 
>>>> >> rescoping your project to produce a monodix of good quality.
>>>> >> One week is definitely not enough to work on transfer rules.
>>>> >>
>>>> >> Best,
>>>> >> Ngadou Yopa
>>>> >>
>>>> >> On Sat, 10 Apr 2021 at 14:52, Hèctor Alòs i Font  
>>>> >> wrote:
>>>> >>>
>>>> >>> Hi Okonkwo,
>>>> >>>
>>>> >>> My remark is slightly different to Sevilay's. Igbo seems to be a 
>>>> >>> language with quite a complex morphology. Wouldn't it make sense to 
>>>> >>> work just on the morphological analyser ( 
>>>> >>> https://wiki.apertium.org/wiki/Idea

Re: [Apertium-stuff] GSOC proposal draft - Create a usable version of these language pair: English--Igbo

2021-04-12 Thread Jonathan Washington
Hi Okonkwo,

Thank you for your continued interest in Apertium!

My main comment on this latest version of your proposal is that you
don't mention morphology.  This should be a main focus of your
work—not just expanding the lexicon, but making it productive.

Also, given the range of non-suffixational morphology in Igbo, I think
it might be a good idea to implement the dictionary in lexd instead of
lexc.  Daniel has offered to help convert your existing work.  What do
you think?

--
Jonathan
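
On the WER goal that Sevilay asks about further down in this thread: word
error rate is the word-level edit distance between the system output and a
reference translation, divided by the length of the reference. A minimal
sketch, assuming plain whitespace tokenisation; the function name and the
example sentences are made up for illustration:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    # WER is undefined for an empty reference; this sketch returns 0.0 then.
    return prev[-1] / len(ref) if ref else 0.0

# One substitution (sat -> sit) and one deletion (the) over six reference words.
print(round(wer("the cat sat on the mat", "the cat sit on mat"), 3))  # prints 0.333
```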

On Mon, 12 Apr 2021 at 07:33, Okonkwo Ifeanyichukwu wrote:
>
> Thanks, Sevilay, Hèctor and  Ngadou Yopa for reviewing my project proposal. I 
> have taken all the suggestions here into consideration and made some changes 
> to my proposal. Below is the link to the recent changes I made.
>
> link to the proposal:
> https://docs.google.com/document/d/1iK_9VTqb5ZHH1bEjl5UAqBP77ijaNKm6p4HIT2JO5qk/edit?usp=sharing
>
> Okonkwo
>
> On Mon, Apr 12, 2021 at 9:47 AM Ngadou Yopa  wrote:
>>
>> Hello Okonkwo,
>>
>> I agree with @hectora...@gmail.com. You should probably consider rescoping 
>> your project to produce a monodix of good quality.
>> One week is definitely not enough to work on transfer rules.
>>
>> Best,
>> Ngadou Yopa
>>
>> On Sat, 10 Apr 2021 at 14:52, Hèctor Alòs i Font  
>> wrote:
>>>
>>> Hi Okonkwo,
>>>
>>> My remark is slightly different to Sevilay's. Igbo seems to be a language 
>>> with quite a complex morphology. Wouldn't it make sense to work just on the 
>>> morphological analyser ( 
>>> https://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Morphological_analyser
>>>  ) ? Currently, apertium-ibo lexc file has some 200 words and no 
>>> morphotactics. So the analyser should be done from scratch (maybe previous 
>>> work in Apertium on another Niger-Congo language can help a bit).
>>> Otherwise, as Sevilay points out, you won't have enough time to work on 
>>> transfer rules. And transfer between two very distant languages, like Igbo 
>>> and English, is a major challenge. At least one whole GSoC should be 
>>> devoted to it.
>>>
>>> Hèctor
>>>
>>> Message from Sevilay Bayatlı on Sat, 10 Apr 2021 at 11:14:


>>>> Hi Okonkwo,
>>>>
>>>> How many words will you be able to add to the monodix (and how many to 
>>>> the bidix), and what is your WER goal?
>>>>
>>>> Do you think 1 week is enough to work on transfer rules?
>>>>
>>>> Another thing: as I understand from your proposal, your main focus is on 
>>>> the bilingual dictionary. The monodix needs more focus, otherwise you 
>>>> can't get a good result, or you have to work on both simultaneously.
>>>>
>>>> Sevilay
>>>>
>>>> On Fri, Apr 9, 2021 at 12:42 PM Okonkwo Ifeanyichukwu wrote:
>>>>>
>>>>> My name is Okonkwo Ifeanyichukwu, a final-year student at the University 
>>>>> of Buea. I am interested in participating in GSoC 2021, on the project - 
>>>>> "Create a usable version of these language pair: English--Igbo".
>>>>>
>>>>> I am planning to build an English(eng)-Ibo(ibo) MT pair. I have added some 
>>>>> ibo words to the ibo pair and have a pull request open. I have done some 
>>>>> of the coding challenge, but it still needs improvement. I will upload the 
>>>>> translated story to GitHub with my work on the GitHub repository mentioned 
>>>>> in the proposal draft. It would be of great help if I could get some 
>>>>> feedback before I make the final submission.
>>>>>
>>>>> Link to my proposal draft:
>>>>> Proposal Draft
>>>>> https://docs.google.com/document/d/1iK_9VTqb5ZHH1bEjl5UAqBP77ijaNKm6p4HIT2JO5qk/edit?usp=sharing
>>>>>
>>>>> Sincerely,
>>>>> Okonkwo
>>>>> IRC: Ifeanyi


Re: [Apertium-stuff] Getting Started

2020-12-13 Thread Jonathan Washington
Hi Shubham,

I should note there are other ways to contribute to Apertium besides
through language pairs:
https://wiki.apertium.org/wiki/Contributing

You can also check out open issues on any Apertium GitHub repo, and offer
your assistance in fixing the issue:
https://github.com/apertium/

--
Jonathan

On Wed, Dec 9, 2020, 23:56 Shubham Dikshit  wrote:

> Hi Daniel,
> Yes, apart from English I am familiar with Hindi, Bengali, Sanskrit, and
> Punjabi.
> Thank you for the response and the links shared.
>
> Shubham
>
> On Wed, Dec 9, 2020 at 10:33 PM Daniel Swanson 
> wrote:
>
>> Hi Shubham,
>>
>> Are you familiar with any languages besides English?
>>
>> https://wiki.apertium.org/wiki/Apertium_New_Language_Pair_HOWTO
>> https://wiki.apertium.org/wiki/Contributing_to_an_existing_pair
>> Have information on contributing to translation pairs.
>>
>> You can also join us on IRC for quicker responses:
>> https://wiki.apertium.org/wiki/IRC
>>
>> Daniel
>>
>> On Tue, Dec 8, 2020 at 3:48 PM Shubham Dikshit 
>> wrote:
>>
>>> Hi everyone,
>>> I am an NLP and Machine Translation enthusiast who recently started
>>> with open source,
>>> and I would like your guidance on how I should start or what I should do
>>> to contribute to this Apertium open-source community.
>>> Any help would be greatly appreciated as I would like to become a
>>> regular contributor in this group.
>>>
>>> Thank you
>>>


Re: [Apertium-stuff] Setting up a new language pair

2020-11-28 Thread Jonathan Washington
Hi Christian,

We're certainly open to collaborations, and would be happy to receive your
application for GSoC if we become a mentoring organisation this year.

Many of the potential Apertium GSoC mentors generally prefer to devote
resources (including mentoring time and GSoC slots) to supporting
minoritised languages where a community benefits, but we are also fairly
open-minded.  One of the criteria that we usually rank above others in
selecting participants in GSoC is if the candidate shows a good
understanding of the project and all its steps, which can be demonstrated
through a coding challenge or existing work on the project.

Feel free to visit the Apertium IRC channel for more informal and real-time
conversations:
http://wiki.apertium.org/wiki/IRC

The Apertium IRC channel is a great place to interact with the community,
get help with setup or any number of formalisms, learn more about GSoC
applications, etc.

--
Jonathan

On Wed, Nov 25, 2020, 17:03 Daniel Swanson 
wrote:

> Hi Christian,
>
> One of the following pages is probably what you're looking for:
> https://wiki.apertium.org/wiki/Apertium_New_Language_Pair_HOWTO
> https://wiki.apertium.org/wiki/How_to_bootstrap_a_new_pair
>
> Daniel
>
> On Wed, Nov 25, 2020 at 3:35 PM Christian Chiarcos via Apertium-stuff <
> apertium-stuff@lists.sourceforge.net> wrote:
>
>> Dear list members,
>>
>> apologies for my ignorance, but where can I find an overview of the
>> minimal requirements to set up a new language pair for Apertium MT? I'm
>> considering contributing to the development of a translation system from
>> Sumerian to English to complement our NMT (and EBMT) baselines (
>> https://github.com/cdli-gh), possibly in the context of the Apertium
>> ecosystem.
>> Also, as we do have lexical resources and core components for morphology
>> and syntactic parsing for Sumerian in place, I was also wondering about
>> suggesting work on such a prototype as a topic for a GSoC project for 2021
>> -- and if there is any interest from the Apertium community in this,
>> possibly as a collaborative effort (meaning joint mentorship) between the
>> Cuneiform Digital Library Initiative and the Apertium community.
>>
>> Clearly, a disadvantage of any MT system involving Sumerian is that the
>> translation bridges languages that are not closely related by any means,
>> but nevertheless, the relative data sparsity (this is an ancient language
>> after all, even though well-documented) calls to explore directions of
>> symbolic translation as yet another option, at least.
>>
>> Best regards,
>> Christian


[Apertium-stuff] comments sought on Iraqi Türkman ISO code

2020-10-02 Thread Jonathan Washington
Dear colleagues (apologies for cross-posting),

Sevilay (CCed) and I have submitted an application to the ISO 639-3
registrar for a new three-letter code for Sevilay's native language,
Iraqi Türkman, to be added to the standard:
https://iso639-3.sil.org/request/2020-039

The registration authority is currently accepting comments from the
public (until December 15th), which are taken into consideration when
the decision is made to approve the request or not.  We would like to
ask you to consider submitting a comment.

Because of how the world works, an ISO code is the next step towards
recognition of the existence of the language among academics and
industry.  Hence it is also a major prerequisite for providing access
to language technology, which in turn has the potential to reinforce
continued use and intergenerational transmission of the language.

One concern those reviewing the application might have is the
similarity of the language to other Western Oghuz varieties, like
Turkish and Azerbaycani.  This is a valid concern—there is some level
of mutual intelligibility of the spoken varieties, and many speakers
of Iraqi Türkman do have some level of exposure to Turkish.  However,
the varieties are linguistically rather divergent, and there are
distinct literary traditions.  Furthermore, official classification of
Iraqi Türkman as a dialect of Turkish (i.e., denial of the application
along these lines) runs the risk of denying speakers of Iraqi Türkman
access to materials in their own language, whether already existing or
yet to be created.

Please feel free to contact Sevilay and/or me with any questions about
any of this.

--
Jonathan




Re: [Apertium-stuff] let's move the mailing lists to sourcehut

2020-09-23 Thread Jonathan Washington
One other question:

Will it be possible to move existing apertium-stuff (and PMC, etc) archives
to the new location?  Or would we be starting over with those archives?

--
Jonathan

On Mon, Sep 21, 2020, 10:15 Francis Tyers  wrote:

> On 2020-09-21 15:07, Tino Didriksen wrote:
> > On Mon, 21 Sep 2020 at 10:10, Kevin Brubeck Unhammer
> >  wrote:
> >
> >> Considering the trouble people have just setting up their own e-mail
> >> server without getting constantly spam-listed by other people's
> >> Gmail
> >> accounts – and the fact that the mailing lists are supposed to be
> >> public
> >> anyway – it'd be nice to have a third party host our mailing
> >> lists.
> >
> > We have at least 2 private mailing lists: PMC and GSoC mentors. The
> > PMC list we absolutely should run ourselves on our own server(s) for
> > confidentiality reasons, and better so that future PMC members can
> > actually refer to the archives. GSoC lists are per year, but no less
> > confidential.
> >
> > So if we're going to run mailman or whatever for those, we might as
> > well run it for all of them - with appropriate public mirrors for the
> > relevant lists. We could even run the Sourcehut software, if it's that
> > good.
> >
> > But yes, it is annoying to run your own MTA. There are unfortunately a
> > lot of email providers that blacklist IPs for rather annoying and bad
> > reasons. But the wiki already sends mail from the apertium.org [1]
> > server, so we know that works.
> >
>
> The Wiki is a single email and often doesn't work.
>
> I think the internal lists don't run mailman, they're just distribution
> lists.
>
> And agree that a lot of email providers are rubbish about people running
> their own MTAs.
>
> Fran
>
>


Re: [Apertium-stuff] Fixing Phonological Processes

2020-09-15 Thread Jonathan Washington
I ran into this recently too:
https://pytwolc.readthedocs.io/en/latest/index.html

I haven't looked at who wrote it, or looked closely at the whole thing, but
it looks fairly thorough.

--
Jonathan

On Tue, Sep 15, 2020, 10:57 Francis Tyers  wrote:

> On 2020-09-14 07:27, Flammie A Pirinen wrote:
> > On Fri, Sep 11, 2020 at 03:18:44PM +0200, Zanga Chimombo wrote:
> >> Hello again,
> >>
> >> I've had a bit of time to continue looking at this. I've copied over
> >> something from:
> >>
> https://github.com/apertium/apertium-lin/blob/master/apertium-lin.lin.twol
> >>
> >> %{K%}:k <=> :n :0 _ .#. ;
> >>
> >> But it's not working yet and I am not sure how to debug it. Is there
> >> an intro to twol online?
> >
> > I think the historical documents from Xerox at fsmbook.com (click on
> > the
> > newSoftware and agree to the terms) and the original dissertation
> > by
> > Prof. Koskenniemi
> >  are
> > quite good for understanding the background.
> >
>
> It's also available online here:
>
> https://web.stanford.edu/~laurik/.book2software/twolc.pdf
>
> You can also check out one of my tutorials:
>
> https://ftyers.github.io/morphology/
>
> Fran
>
>


[Apertium-stuff] Fwd: CFP: Resources and Representations for Under-resourced Languages and Domains

2020-09-05 Thread Jonathan Washington
Apertiumers,

See the following information about a workshop of relevance.

--
Jonathan

=

Dear ALL,
We are organizing RESOURCEFUL-2020 (RESOURCEs and Representations For
Under-resourced Languages and Domains) which will be collocated with the
Eighth Swedish Language Technology Conference (SLTC), organized by the
University of Gothenburg, Sweden on 25th November 2020. The conference and
the workshop will be held online.

The aim of the workshop is to create a forum for researchers in the area of
resource creation and representation learning in limited or low-resource
environments. You can find more details about the questions we would like
to address:
 https://gu-clasp.github.io/resourceful-2020/
Best regards,
Tewodros

On behalf of the organizers:

Tewodros Gebreselassie, University of Gothenburg
Simon Dobnik, University of Gothenburg
Barbara Plank, IT University Copenhagen
Lars Borin, University of Gothenburg

Contact: resourceful2...@easychair.org


Re: [Apertium-stuff] Fixing Phonological Processes

2020-08-29 Thread Jonathan Washington
Hi Zanga,

Given the highly agglutinative nature of Yao morphology, using dix to model
it is probably not a great option.  Also, as you and Hèctor have concluded,
the morphophonology will be much easier to model using twol.

Given the extent to which the morphology involves prefixes, lexc (what we
traditionally use with twol) is probably also a poor choice for modeling
the morphology.  However, lexd was designed as a replacement for lexc for
languages like Yao (and works well with twol).  I think this is the route
you should take.

Documentation is available here:

https://github.com/apertium/lexd/blob/master/Usage.md

Some languages in Apertium whose morphologies are already implemented in
lexd (none are entirely complete yet, but some are pretty far along):

Swahili: https://github.com/apertium/apertium-swa
Lingala: https://github.com/apertium/apertium-lin
Nivkh: https://github.com/apertium/apertium-niv
Wamesa: https://github.com/apertium/apertium-wad

I probably forgot a few, but these should provide good models (and two are
related to Yao).  There are also a couple other languages being developed
using lexd that aren't public (yet).

And of course you can message this list if you have trouble, or ask in real
time in the IRC channel.

--
Jonathan
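
For reference, the behaviour Hèctor demonstrates in the quoted exchange
below (a marked ~nka comes out as nga, while ~nkb comes out as nkb) can be
imitated in a few lines. This is only a sketch of the observable effect,
not of how lt-proc works internally. The vowel set and the function name
are assumptions made for illustration:

```python
import re

VOWELS = "aeiou"  # assumed vowel inventory, for illustration only

def postgen_nk(text: str) -> str:
    """Imitate a marker-triggered post-generation rule: ~nk + vowel -> ng."""
    # Only an nk that carries the ~ marker and precedes a vowel is rewritten.
    text = re.sub(r'~nk(?=[' + VOWELS + '])', 'ng', text)
    # A marker that triggered nothing is simply dropped.
    return text.replace('~', '')

for form in ("~nka", "~nkb", "nka"):
    print(form, "->", postgen_nk(form))
# ~nka -> nga
# ~nkb -> nkb
# nka -> nka
```

The last case shows the limitation discussed in the thread: without the
marker nothing fires, which is why a rule that should apply to every
nk + vowel belongs in twol rather than in post-generation.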

On Sat, Aug 29, 2020, 02:30 Zanga Chimombo  wrote:

> Yes. I think I should be using twol
>
> On Fri, Aug 28, 2020 at 3:56 PM Hèctor Alòs i Font 
> wrote:
> >
> > I don't think you have to do anything with the modes or the compilation
> file. The problem is in the post-yao.dix file.
> > If you add , it works:
> >
> > 
> >   
> > nk
> > ng
> >   
> >   
> > 
> >
> > $ echo "~nka" | lt-proc -p yao.autopgen.bin
> > nga
> > $ echo "~nkb" | lt-proc -p yao.autopgen.bin
> > nkb
> >
> > I don't know why without  there is no match, but in any case you
> need to add  to the relevant places (words, affixes, etc.) you want to
> trigger this rule. If you want that always nk + vowel should be ng, you
> should this in twol, not here.
> >
> > Hèctor
> >
> > Message from Zanga Chimombo on Fri, 28 Aug 2020 at 15:41:
> >>
> >> I am still not getting anywhere and both modes.xml and the Makefile
> >> seem ok. My code is here:
> >> https://gitlab.com/zangaphee/CiBantu/-/tree/master/twoc/apertium-yao
> >>
> >> On Fri, Aug 28, 2020 at 7:36 AM Hèctor Alòs i Font <
> hectora...@gmail.com> wrote:
> >> >
> >> > The relevant files are modes.xml and Makefile.am. I recommend taking a
> look at them in e.g. apertium-fra and apertium-fra-cat (or any other
> released pair using post-generation). In the first one you define the
> pipeline, so copy and adapt the call to autopgen at the end. In the second
> one you have the actual compilation of the programme.
> >> >
> >> > Message from Zanga Chimombo on Fri, 28 Aug 2020 at 7:52:
> >> >>
> >> >> Hi again, I actually have:
> >> >>
> >> >> 
> >> >>   
> >> >> nk
> >> >> ng
> >> >>   
> >> >>   
> >> >> 
> >> >>
> >> >> But it doesn't seem to get executed. Is there a missing flag/ switch
> >> >> that I was supposed to initialise/ build with? I am not seeing
> >> >> anything relating to building autopgen in the modes.xml file in the
> >> >> monolingual directory...?
> >> >>
> >> >> On Thu, Aug 27, 2020 at 2:57 PM Hèctor Alòs i Font <
> hectora...@gmail.com> wrote:
> >> >> >
> >> >> > Yes, it is in the monodix. It is just a mark put on the right
> side, e.g.
> >> >> >
> >> >> >   que
> >> >> >   que   que n="itg"/>
> >> >> >
> >> >> > If you want, you may not put it, but if you have in the post-dix
> file something like:
> >> >> >
> >> >> > 
> >> >> >   
> >> >> > nk
> >> >> > ng
> >> >> >   
> >> >> > 
> >> >> >
> >> >> > ... then every nk will be substituted by ng. That is not what you
> want, for sure. So better to put a mark in the dictionnary to know which
> "nk" may be changed (in some contexts) to nk.
> >> >> >
> >> >> > Message from Zanga Chimombo on Thu, 27 Aug 2020 at 15:18:
> >> >> >>
> >> >> >> Looking at the examples in apertium-fra.post-fra.dix it is clear
> that
> >> >> >> the tilde/ ~/  is inserted as some sort of marker earlier in
> the
> >> >> >> pipeline so that the PG recognises it and acts on it.
> >> >> >>
> >> >> >> Where in the pipeline is it inserted? Could you give me a line
> number
> >> >> >> of the insertion within the monodix perhaps?
> >> >> >>
> >> >> >> On Thu, Aug 27, 2020 at 12:12 PM Hèctor Alòs i Font
> >> >> >>  wrote:
> >> >> >> >
> >> >> >> > You can take a look, for instance to
> https://github.com/apertium/apertium-fra/blob/master/apertium-fra.post-fra.dix
> >> >> >> >
> >> >> >> > For example (at line 633) :
> >> >> >> > nen'
> >> >> >> >
> >> >> >> > Message from Hèctor Alòs i Font on Thu, 27 Aug 2020 at 13:07:
> >> >> >> >>
> >> >> >> >> There are two things in:
> >> >> >> >>
> >> >> >> >> 
> >> >> >> >>   
> >> >> >> >> nk
> >> >> >> >>

Re: [Apertium-stuff] Infinite loop in testvoc???

2020-08-16 Thread Jonathan Washington
Gabriel,

Each translation pair has to be developed specially, and takes quite a bit
of time and effort.  Usually developers don't focus their time on pairs
like Yiddish-Klingon or Guaraní-Khakas because of low practical application.

Hèctor suggested beta.apertium.org.  This site has unreleased pairs (ones
that still do not translate cleanly), and also offers a "pivot/chained
translation" feature, where you can translate between two languages which
do not have a developed pair as long as there is some path of pairs between
them.
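
To illustrate the pivot idea, here is a minimal Python sketch. It is not the code that beta.apertium.org actually runs; it simply treats every installed translation direction as an edge and looks for the shortest chain between two languages. The list of pairs and the function name are made up for the example.

```python
from collections import deque

# Hypothetical set of installed translation directions (source, target).
PAIRS = [("fra", "cat"), ("cat", "spa"), ("spa", "eng"), ("kaz", "tat")]


def find_chain(source, target, pairs=PAIRS):
    """Return the shortest list of languages linking source to target, or None."""
    neighbours = {}
    for src, trg in pairs:
        neighbours.setdefault(src, []).append(trg)

    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in neighbours.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no chain of pairs connects the two languages


print(find_chain("fra", "eng"))
print(find_chain("fra", "tat"))
```

With these made-up pairs, French to English is reachable through Catalan and Spanish, while French to Tatar is not reachable at all, which is the kind of limitation described above.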

I hope this clarifies the limitation, and helps you find a work-around.

--
Jonathan

On Sun, Aug 16, 2020, 01:52 Medina, Gabriel  wrote:

> When I pick a language, it only highlights specific ones I can only
> choose, which depends on the language you pick, which makes translating
> languages very limited.
>
> On Sat, Aug 15, 2020 at 3:25 PM Jonathan Washington <
> jonathan.n.washing...@gmail.com> wrote:
>
>> Gabriel,
>>
>> Could you clarify what you mean by "language barriers"?
>>
>> --
>> Jonathan
>>
>> On Sat, Aug 15, 2020, 13:40 Medina, Gabriel  wrote:
>>
>>> Does this have something to do with the language barriers for all
>>> languages on the online translator?
>>>
>>> On Sat, Aug 15, 2020 at 1:03 PM Marc Riera Irigoyen <
>>> marc.riera.irigo...@gmail.com> wrote:
>>>
>>>> I've been able to reproduce the loop and fix it. It was mainly due to
>>>> an unexpected pattern in the testvoc script, but there was also a typo in
>>>> the bidix that contributed to the problem.
>>>>
>>>> 1. The testvoc script did not account for bidix entries with empty
>>>> translations and would add extra slashes in many cases. These are used to
>>>> test multiple translations for a single entry, which is done by an awk
>>>> script in a while loop that could not be escaped. I have fixed the issue
>>>> with the extra slashes and changed the while loop to a for limited to 50
>>>> iterations. This should be enough for any pair and the loop includes a
>>>> condition to escape it before the 50 iterations, so there is no extra
>>>> unnecessary processing. I'll post a pull request directly to the repo with
>>>> the fixes shortly.
>>>> 2. There is an entry in the bidix (and probably Arpitan monodix as
>>>> well, because it generates properly), "Salinas de Gotari", with a line
>>>> break after the last tag. It looks like a typo. This typo appears to be
>>>> valid in Apertium format but the testvoc script assumes an entry per line
>>>> and the double slashes occurred here too. Thanks to the loop limit, testvoc
>>>> doesn't get blocked anymore by this entry (and it doesn't appear in the
>>>> list of errors, because it generates properly), but it should be fixed.
>>>>
>>>> Regards,
>>>>
>>>> *Marc Riera*
>>>>
>>>>
>>>> Message from Marc Riera Irigoyen on Sat, 15 Aug 2020 at 11:53:
>>>>
>>>>> Hello Hèctor,
>>>>>
>>>>> I see that the testvoc script you're using is the one I developed
>>>>> based on previous scripts used in several pairs. It shouldn't be producing
>>>>> a loop, and I have never seen one before. Given that it's happening only
>>>>> when translating from Arpitan to French, I guess there may be something
>>>>> that I didn't account for when developing the script. I'll take a look and
>>>>> try to recreate it.
>>>>>
>>>>> Regards,
>>>>>
>>>>> *Marc Riera*
>>>>>
>>>>>
>>>>> Message from Hèctor Alòs i Font on Sat, 15 Aug 2020 at 10:46:
>>>>>
>>>>>> I am experiencing a very strange behaviour in the fra-frp testvoc.
>>>>>> While there is no problem on the frp2fra side (the test finishes in
>>>>>> less than 30 minutes on my computer), on the fra2frp side there is a kind
>>>>>> of infinite loop. The same file is created and deleted again and again,
>>>>>> and the testvoc does not end even after waiting more than 24 hours. The
>>>>>> file which is deleted and created again and again (always with the same
>>>>>> name) has exactly the same content. The first lines are:
>>>>>>

Re: [Apertium-stuff] Infinite loop in testvoc???

2020-08-15 Thread Jonathan Washington
Gabriel,

Could you clarify what you mean by "language barriers"?

--
Jonathan

On Sat, Aug 15, 2020, 13:40 Medina, Gabriel  wrote:

> Does this have something to do with the language barriers for all
> languages on the online translator?
>
> On Sat, Aug 15, 2020 at 1:03 PM Marc Riera Irigoyen <
> marc.riera.irigo...@gmail.com> wrote:
>
>> I've been able to reproduce the loop and fix it. It was mainly due to an
>> unexpected pattern in the testvoc script, but there was also a typo in the
>> bidix that contributed to the problem.
>>
>> 1. The testvoc script did not account for bidix entries with empty
>> translations and would add extra slashes in many cases. These are used to
>> test multiple translations for a single entry, which is done by an awk
>> script in a while loop that could not be escaped. I have fixed the issue
>> with the extra slashes and changed the while loop to a for limited to 50
>> iterations. This should be enough for any pair and the loop includes a
>> condition to escape it before the 50 iterations, so there is no extra
>> unnecessary processing. I'll post a pull request directly to the repo with
>> the fixes shortly.
>> 2. There is an entry in the bidix (and probably Arpitan monodix as well,
>> because it generates properly), "Salinas de Gotari", with a line break
>> after the last tag. It looks like a typo. This typo appears to be valid in
>> Apertium format but the testvoc script assumes an entry per line and the
>> double slashes occurred here too. Thanks to the loop limit, testvoc doesn't
>> get blocked anymore by this entry (and it doesn't appear in the list of
>> errors, because it generates properly), but it should be fixed.
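
The fix described in point 1 can be sketched roughly as follows. This is a Python illustration, not the awk code from the actual testvoc script: the unit format `^source/alt1/alt2$`, the function name and the handling of empty alternatives are assumptions, and escaped slashes are not handled.

```python
MAX_ALTERNATIVES = 50  # the iteration cap mentioned in the message above


def split_alternatives(unit, limit=MAX_ALTERNATIVES):
    """Turn '^source/alt1/alt2$' into one '^source/alt$' unit per alternative."""
    body = unit.strip()
    if not (body.startswith("^") and body.endswith("$")):
        raise ValueError(f"not a lexical unit: {unit!r}")
    source, _, rest = body[1:-1].partition("/")

    results = []
    # Bounded loop: it can never run more than `limit` times, and it
    # exits early as soon as no alternatives are left.
    for _ in range(limit):
        if not rest:
            break
        alternative, _, rest = rest.partition("/")
        if alternative:  # skip empty alternatives left by extra slashes
            results.append(f"^{source}/{alternative}$")
    return results


print(split_alternatives("^abattu/abatu/dèfêt//dèchesu$"))
```

The point of the design is that a malformed entry (a doubled slash, an empty translation) can at worst waste a few iterations; it can no longer keep the loop alive forever.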
>>
>> Regards,
>>
>> *Marc Riera*
>>
>>
>> Message from Marc Riera Irigoyen on Sat, 15 Aug 2020 at 11:53:
>>
>>> Hello Hèctor,
>>>
>>> I see that the testvoc script you're using is the one I developed based
>>> on previous scripts used in several pairs. It shouldn't be producing a
>>> loop, and I have never seen one before. Given that it's happening only when
>>> translating from Arpitan to French, I guess there may be something that I
>>> didn't account for when developing the script. I'll take a look and try to
>>> recreate it.
>>>
>>> Regards,
>>>
>>> *Marc Riera*
>>>
>>>
>>> Message from Hèctor Alòs i Font on Sat, 15 Aug 2020 at 10:46:
>>>
 I am experiencing a very strange behaviour in the fra-frp testvoc.
 While there is no problem on the frp2fra side (the test finishes in
 less than 30 minutes on my computer), on the fra2frp side there is a kind
 of infinite loop. The same file is created and deleted again and again,
 and the testvoc does not end even after waiting more than 24 hours. The
 file which is deleted and created again and again (always with the same
 name) has exactly the same content. The first lines are:


 [\^frère$]^frère/~/frâre$+^./~/.$

 [\^frère$]^frère/~/frâre$+^./~/.$

 [\^frère$]^frère/~/frâre$+^./~/.$

 [\^frère$]^frère/~/frâre$+^./~/.$

 [\^frère$]^frère/~/frâre$+^./~/.$

 [\^frère$]^frère/~/frâre$+^./~/.$

 [\^frère$]^frère/~/frâre$+^./~/.$

 [\^frère$]^frère/~/frâre$+^./~/.$

 [\^frère$]^frère/~/frâre$+^./~/.$

 [\^frère$]^frère/~/frâre$+^./~/.$

 [\^1er$]^1er/~/1ér$+^./~/.$

 [\^1er$]^1er/~/1ér$+^./~/.$

 [\^1er$]^1er/~/1ér$+^./~/.$

 [\^1er$]^1er/~/1ér$+^./~/.$

 [\^abattu$]^abattu/~/abatu$+^./~/.$

 [\^abattu$]^abattu/~/dèfêt$+^./~/.$

 [\^abattu$]^abattu/~/dèchesu$+^./~/.$

 [\^abattu$]^abattu/~/abatu$+^./~/.$

 I have never seen such a thing before and I cannot imagine what can
 cause this behaviour. Any ideas?

 Hèctor


[Apertium-stuff] Fwd: An interesting on-line (and nearly free) summer school: WeSSLI

2020-06-27 Thread Jonathan Washington
Dear Apertium community,

Here are some upcoming courses/workshops that might be of interest to
some of you.  (Also, the programme is hosted by my alma mater!)

--
Jonathan

-- Forwarded message -
From: Rikker Dockum 
Date: Fri, Jun 26, 2020 at 4:42 PM
Subject: An interesting on-line (and nearly free) summer school: WeSSLI


Hi all,
Passing along info about an upcoming on-line summer school that costs
$10 to register for:

WeSSLLI, the Web Summer School in Logic, Language and Information is
the on-line version of NASSLLI, the North American Summer School in
Logic, Language and Information (which was delayed until summer of
2021).

The offerings are more limited than the in-person event would have
been, but there are still going to be five courses, all of which look
like they will be very interesting (and taught by excellent scholars):

• Robin Cooper, Staffan Larsson - Modelling Linguistic
Communication Using Types

• Elizabeth Coppock - Cross-linguistic Variation in Degree Semantics

• Shalom Lappin - Deep Learning and the Nature of Linguistic
Representation

• Natasha Korotkova - The Notional Category of Evidentiality

• Larry Moss - Workshop: Natural Logic Meets Machine Learning

These events will be running the week starting July 11th, and will
happen five times (or so) each. For more info, see the website.
Registration is $10!




Re: [Apertium-stuff] Apertium's Wider Use & Secondary Tags

2020-06-14 Thread Jonathan Washington
Hi all,

I think Tanmai's decision to focus on the part of this proposal that no one
seems to disagree about is the right approach for now.  Opinions about how
to transport other information through the pipeline were getting strong,
and feelings were getting hurt.

That said, I think we should keep this conversation open, and not simply
archive it.  But it should be discussed with less urgency (my apologies for
raising the urgency of the question—I see now that that probably made the
situation worse).  I believe that the community can come to consensus on an
approach, but it may take a while.  One more or less active core developer
or language developer objecting strongly to any given approach is enough to
tell us we have to think a lot harder about things.  It would be great if
we could keep it constructive and respectful, and not give up on the
conversation just because an expressed view isn't being well received.  The
fact that the view is being expressed alone is not enough either, of
course—this question can only be resolved through sustained good-faith
argumentation, and that will take some energy and commitment by invested
parties.

--
Jonathan

On Sun, 14 Jun 2020 at 09:08, Tanmai Khanna <khanna.tan...@gmail.com> wrote:

> No there’s no new election happening, least of all over a design choice in
> my project. We’re an open source community with experienced people and it’s
> surprising to me that we’re having such a hard time agreeing upon this.
>
> Francis is one of our most experienced members and the PMC president and
> as such does have an important say in design decisions and major directions
> that Apertium takes. There have been multiple proposals and Francis has
> made clear his objections. If at this point he doesn’t like it then we
> should stop proposing a design again and again. Keep in mind I would also
> say the same for any other experienced member who really objected to a
> design choice.
>
> Francis sees the need for wordbound blanks for markup handling and that
> was a major part of the original proposal so as of now we will go ahead
> with that, and before implementing anything, I will analyse what wikimedia
> needs in terms of markup handling and how we’re planning to sort that out.
>
> We will archive the secondary tags discussion for now and once this is
> done I will make a list of people’s needs for reading bound information and
> we will analyse how best to address those needs.
>
> I think that due to the open nature of this solution we jumped the gun on
> the analysis part and instead of focusing on a problem to solve we started
> building up our solution based on problems we could face in the future.
>
> Point is, all this conflict isn’t really worth it. I’ve always been of the
> opinion that even if one person in the PMC objects to something it
> shouldn’t happen, and I’m sure a group of experts and I can find something
> we all agree on as a design decision.
>
> Peace ✌🏼
>
> Tanmai Khanna
>
> Sent from my iPhone
>
> On 14-Jun-2020, at 18:02, Samuel Sloniker  wrote:
>
> 
> If we do have a new election, can we vote on the new bylaws first, so we
> use STV?
>
> On Sun, Jun 14, 2020, 04:01 Francis Tyers  wrote:
>
>> On 2020-06-14 11:51, Hèctor Alòs i Font wrote:
>> > Message from Francis Tyers on Sun, 14 Jun 2020 at 10:32:
>> >
>> >> On 2020-06-13 23:18, Jonathan Washington wrote:
>> >>> On Sat, Jun 13, 2020, 16:05 Francis Tyers wrote:
>> >>>
>> >>>> On 2020-06-13 19:31, Xavi Ivars wrote:
>> >>>>> Before anything, let me say that I like the proposal to enhance
>> >>>>> the pipeline with more data (including, but not limited to, the
>> >>>>> surface forms), to be able to properly do things that currently
>> >>>>> we're doing in very hacky (to me) and definitely non-linguistic
>> >>>>> ways
>> >>>>>
>> >>>>>> xavi@dell:~/src/apertium-spa$ echo "El mango" | apertium -d .
>> >>>>>> spa-morph
>> >>>>>> ^El/el$
>> >>>>>>
>> >>>>>
>> >>>>
>> >>>
>> >>
>> >
>> ^mango/mango/mangar/MANGO_FRUTA$^./.$
>> >>>>>
>> >>>>> In this example, we "add" semantic information to the pipeline
>> >>>> (and
>> >>>>> disambiguate via CG3) by creating a "fake lemma" needed for
>> >>>> SPA-CAT,
>> >>>>> beca

Re: [Apertium-stuff] Apertium's Wider Use & Secondary Tags

2020-06-14 Thread Jonathan Washington
I could see another way to treat cases like mango¹/mango² or ат¹/ат².

If we were to eventually have a module that holds other arbitrary
information through the pipeline, you could have tags added in the
transducer that are immediately offloaded to the arbitrary information
storage, and are then accessible to disambiguation, lexical selection,
and bidix.

For example, you could have mango[sem:fruit] and
mango[sem:handle] (or whatever) returned by the transducer, with
the second part picked off by another module and sent through the
pipeline in some other format.

This is just me thinking out loud.
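
To make the idea a bit more concrete, here is a rough Python sketch of the "picking off" step. It is purely hypothetical: no such module exists, the bracket notation is just the one I used above, and the function name is invented.

```python
import re

# Hypothetical notation: a lemma followed by zero or more [key:value] groups.
_UNIT = re.compile(
    r"^(?P<lemma>[^\[\]]+)(?P<extras>(?:\[[^\[\]:]+:[^\[\]]*\])*)$"
)
_EXTRA = re.compile(r"\[([^\[\]:]+):([^\[\]]*)\]")


def split_secondary(analysis):
    """Separate the plain analysis from the bracketed secondary information."""
    match = _UNIT.match(analysis)
    if match is None:
        raise ValueError(f"unrecognised analysis: {analysis!r}")
    extras = dict(_EXTRA.findall(match.group("extras")))
    return match.group("lemma"), extras


print(split_secondary("mango[sem:fruit]"))
print(split_secondary("mango[sem:handle]"))
print(split_secondary("mango"))
```

The first element is what would continue through the pipeline unchanged; the dictionary is what would be handed to the separate storage, so pairs that don't care about it never see it.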

--
Jonathan

On Sun, 14 Jun 2020 at 13:45, Francis Tyers wrote:
>
> On 2020-06-14 11:51, Hèctor Alòs i Font wrote:
> > Message from Francis Tyers on Sun, 14 Jun 2020 at 10:32:
> >
> >> On 2020-06-13 23:18, Jonathan Washington wrote:
> >>> On Sat, Jun 13, 2020, 16:05 Francis Tyers wrote:
> >>>
> >>>> On 2020-06-13 19:31, Xavi Ivars wrote:
> >>>>> Before anything, let me say that I like the proposal to enhance
> >>>>> the pipeline with more data (including, but not limited to, the
> >>>>> surface forms), to be able to properly do things that currently
> >>>>> we're doing in very hacky (to me) and definitely non-linguistic
> >>>>> ways
> >>>>>
> >>>>>> xavi@dell:~/src/apertium-spa$ echo "El mango" | apertium -d .
> >>>>>> spa-morph
> >>>>>> ^El/el$
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> > ^mango/mango/mangar/MANGO_FRUTA$^./.$
> >>>>>
> >>>>> In this example, we "add" semantic information to the pipeline
> >>>>> (and disambiguate via CG3) by creating a "fake lemma" needed for
> >>>>> SPA-CAT, because "mango" (pan stick) and "mango_fruta" are
> >>>>> translated differently in Catalan. But this, in turn, forces every
> >>>>> other language pair using Spanish to know about "mango_fruta" even
> >>>>> if the translation was the same as "mango".
> >>>>>
> >>>>
> >>>> What is the problem here? That "mango" has two possible lemmas
> >>>> and paradigms in Spanish?
> >>>>
> >>>> The way that I've treated that is to have mango¹ and mango²,
> >>>> like in a traditional dictionary. I don't think that this requires
> >>>> any further information.
> >>>
> >>> I think Xavi's point is that there are a number of ways to
> >>> approach this, and having the option of another stream to put this
> >>> extra information could be one of them.  Imho, it is nicer in many
> >>> ways than even having (very arbitrary) superscripts (that aren't
> >>> really any better to have in a morphological analysis than _fruta).
> >>>
> >>
> >> It's following what the lexicographers do:
> >>
> >> https://dle.rae.es/?w=mango
> >>
> >> So it's following a fairly established practice.
> >>
> >> Fran
> >
> > As far as I understand the mango issue, Xavi is contemplating the
> > possibility of a semantic module which would add extra information
> > that may be used by other modules (especially by the lexical selection
> > one) to add information about "mango". This could be used for
> > distinguishing between a handle and a fruit, but not only that.
> > "Mango" can be the fruit and the plant. One could eventually add what
> > kind of handle it is, e.g. in the RAE dictionary entry provided by
> > Fran, the handle of a knife is specifically distinguished from other
> > handles. As Xavi shows, this extra information could be added so that
> > it can be ignored by pairs that don't need it. It seems clear that a
> > solution based on being able to add any additional secondary
> > information is more versatile than "_fruta", "_2" and the like.
> >
> > Moreover, in the lexical selection we have lots of lists like "fruit",
> > "building", "person", "devic

Re: [Apertium-stuff] Apertium's Wider Use & Secondary Tags

2020-06-13 Thread Jonathan Washington
On Sat, Jun 13, 2020, 16:05 Francis Tyers  wrote:

> On 2020-06-13 19:31, Xavi Ivars wrote:
> > Before anything, let me say that I like the proposal to enhance the
> > pipeline with more data (including, but not limited to, the surface
> > forms), to be able to properly do things that currently we're doing
> > in very hacky (to me) and definitely non-linguistic ways
> >
> >> xavi@dell:~/src/apertium-spa$ echo "El mango" | apertium -d .
> >> spa-morph
> >> ^El/el$
> >>
> >
> ^mango/mango/mangar/MANGO_FRUTA$^./.$
> >
> > In this example, we "add" semantic information to the pipeline (and
> > disambiguate via CG3) by creating a "fake lemma" needed for SPA-CAT,
> > because "mango" (pan stick) and "mango_fruta" are translated
> > differently in Catalan. But this, in turn, forces every other language
> > pair using Spanish to know about "mango_fruta" even if the
> > translation was the same as "mango".
> >
>
> What is the problem here? That "mango" has two possible lemmas and
> paradigms in Spanish?
>
> The way that I've treated that is to have mango¹ and mango², like in a
> traditional dictionary. I don't think that this requires any further
> information.
>

I think Xavi's point is that there are a number of ways to approach this,
and having the option of another stream to put this extra information could
be one of them.  Imho, it is nicer in many ways than even having (very
arbitrary) superscripts (that aren't really any better to have in a
morphological analysis than _fruta).

--
Jonathan


Re: [Apertium-stuff] Apertium's Wider Use & Secondary Tags

2020-06-13 Thread Jonathan Washington
On Sat, 13 Jun 2020 at 14:53, Francis Tyers wrote:
>
> So when I go and try it out on an actual pair, instead of toy examples
> with xyz and abc really obvious issues come up that should have been
> dealt with in the original proposal.

Tanmai sent suggestions for addressing the points you raised.  Could
you engage with those?

--
Jonathan




Re: [Apertium-stuff] Apertium's Wider Use & Secondary Tags

2020-06-13 Thread Jonathan Washington
On Sat, 13 Jun 2020 at 13:15,  wrote:
>
> Tino Didriksen wrote:
> > On Sat, 13 Jun 2020 at 17:50, Francis Tyers  wrote:
> >
> > > As far as I understand the objective is to be able to put the
> > > original surface form in the output translation as an unknown token
> > > instead of the lemma.
> > >
> > > ...
> > >
> > > I think that the appropriate way to deal with this is by coming up
> > > with a clear plan for the linguistic eventualities. I don't see that in
> > > the current proposal. I have been showing Tanmai through the creation
> > > of a new MT system, and we have been documenting these issues as they
> > > arise. I don't think it makes sense to start development before they
> > > have been resolved.
> > >
> >
> >
> > Those are important issues, but they're orthogonal to how to transport
> > secondary information through the pipe.
>
> > Even at the earliest stages of the proposal, it was expanded to be
> > 1) Get secondary tags through the pipe. 2) Use that ability to
> > eliminate trimming. 3) Use the same ability for a myriad of other
> > things, such as markup handling.
>
> If I understand the issue correctly, it isn't clear yet that (2) as
> phrased is possible.
>
> *Is there* an answer yet to what we *want* to happen in
>
> > such as what should happen to secondary information when tokens are
> > merged/split.
>
> ?

I agree.  A proposed solution to the issues Fran raises needs to be
part of the proposal for transport format.  The issues are too closely
intertwined.

> Not about the algorithm or implementation or anything. Just, what
> would we like the result to be?
>
> > We need to implement and solve #1 first - be able to transport (and
> > potentially manipulate) any amount of data that might be needed to
> > solve #2 and #3 and ... #9.
>
> I don't think it makes sense to mandate a mechanism we aren't
> convinced will work...

This has never been suggested as a mandate.  Whatever the approach to
the issues at hand, the proposal is for an extra feature that
translation pair developers may decide to use or ignore as they see
fit.

I want to also highlight Tino's point about urgency.  This is part of
an active GSoC project, and that project needs to move forward.  That
doesn't mean that this discussion shouldn't be allowed to take its
time, but we really do need to find a path forward.

--
Jonathan


> Cheers,
> Nick




Re: [Apertium-stuff] Apertium's Wider Use & Secondary Tags

2020-06-13 Thread Jonathan Washington
On Sat, Jun 13, 2020, 11:50 Francis Tyers  wrote:

> On 2020-06-13 15:20, Tino Didriksen wrote:
> > I would like everyone to read and seriously consider this thread and
> > give your opinion. This meanders a bit, so please read it all.
> >
>
> Here is a non-exhaustive list of potential pitfalls of using the
> "surface form is a tag" thing. As far as I understand the objective is to
> be able to put the original surface form in the output translation as an
> unknown token instead of the lemma.
>

Could you provide disambiguated analyses of each of these?  It's hard to
picture what the problem is for people who can't at least do the relevant
tokenisation in their heads.  (I'm not familiar with the example in (2)—I
imagine other people are similarly uninformed about the other examples.)

--
Jonathan

> 0) languages without spaces in the writing system:
>
> what is a surface form here? is it just the longest token matched?
>
> 1) compounds
>
> i)  infrastruktuurontwikkelingsplan, does each part of the compound get
>  the surface form tag? if so, what happens if one part of the compound
>  is translated but the other parts aren't, e.g. would you get
>  *infrastruktuurontwikkelingsplan *infrastruktuurontwikkelingsplan
>  plan?
>
> 2) contractions
>
> i)  chawe - if you attach the surface form to both and both are unknown,
>  do you get both in the output? if you only attach it to one, which one
>  do you attach it to, where is that decision made?
>
> ii) dárselo - if you attach the surface form to the clitic pronouns in
>  addition to the verb, what happens if the verb is not in the dictionary
>  but the clitic pronouns are? do you get the surface form and the
>  translations in the output?
>
> I think that the appropriate way to deal with this is by coming up with
> a clear plan for the linguistic eventualities. I don't see that in the
> current proposal. I have been showing Tanmai through the creation of a new
> MT system, and we have been documenting these issues as they arise. I
> don't think it makes sense to start development before they have been
> resolved.
>
> Fran
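
For those who find it hard to picture the compound case in (1), here is a toy Python model of it. It is only a sketch under strong assumptions: the segmentation of the compound, the one-entry dictionary and the "fall back to * plus surface form" behaviour are all invented to reproduce the scenario described, and none of it reflects how the pipeline is actually implemented.

```python
def translate_parts(parts, surface, bidix):
    """Translate each analysed part; unknown parts fall back to '*' + surface."""
    output = []
    for lemma in parts:
        if lemma in bidix:
            output.append(bidix[lemma])
        else:
            # Every part carries the surface form of the WHOLE token,
            # so each unknown part prints the whole compound again.
            output.append("*" + surface)
    return " ".join(output)


surface = "infrastruktuurontwikkelingsplan"
parts = ["infrastruktuur", "ontwikkeling", "plan"]  # assumed segmentation
bidix = {"plan": "plan"}                            # only one part is known

print(translate_parts(parts, surface, bidix))
```

Under these assumptions the output is the duplicated string from the example above, which is why a decision about how surface forms attach to the parts of a token seems necessary.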
>


Re: [Apertium-stuff] Apertium's Wider Use & Secondary Tags

2020-06-13 Thread Jonathan Washington
On Sat, Jun 13, 2020, 10:20 Tino Didriksen  wrote:

>
> Yes, inline secondary data is linguistically impure. I recognize this. I
> still think it's worth it, and is the best way to do it.
>

This is the main point against secondary tags that Tino provided.  As
someone who cares deeply about linguistics, I have no problem with
secondary tags on these grounds.

That is, I don't see "purity" as a legitimate concern—we do all sorts of
things with Apertium that are not ideal from a linguistic point of view,
but if it works broadly, I don't mind.

There are legitimate concerns with secondary tags that do need to be
discussed.

I'd also like to mention here that we have something like secondary tags
peripheral to the pipeline already, cf.

apertium-kaz-kir$ echo "Оқу инемен құдық қазғандай." | apertium -d .
kaz-kir-disam
^Оқы/¬Оқу+е/¬Оқу+е$
^ине$
^құдық/¬құдық/¬құдық+е/¬құдық+е$
^қаз$^.$^.$

--
Jonathan

>
>


Re: [Apertium-stuff] Apertium's Wider Use & Secondary Tags

2020-06-13 Thread Jonathan Washington
Fran,

Could you restate your objections and concerns for the benefit of this
thread?

--
Jonathan


On Sat, Jun 13, 2020, 10:43 Francis Tyers  wrote:

> On 2020-06-13 15:20, Tino Didriksen wrote:
> > I would like everyone to read and seriously consider this thread and
> > give your opinion. This meanders a bit, so please read it all.
> >
>
> This email does not do credit to my objections and concerns with the
> proposal.
>
> Fran
>
>


[Apertium-stuff] autonyms for Apertium languages and variants

2020-06-04 Thread Jonathan Washington
Dear Apertium contributors,

Now that https://beta.apertium.org/ is up, we find that there are a
lot of languages that do not have a "fallback" for most localisations.
By "fallback", I just mean the names of the languages in the languages
themselves.  This is what will be seen by most users in most languages
for most language and variant names, since full localisation of every
language name in every language is near impossible.¹

There is an issue open for this on GitHub:
https://github.com/apertium/apertium-apy/issues/152

If you can contribute correct names for these languages and variants,
you may add them to the issue (and I will add them later), or you may
add them to the codebase directly (or by PR).

Somewhat less importantly, if you can contribute localisations in
other languages for these (or anything else missing), please consider
doing that as well.  Also important is localising the website
interface:
https://github.com/apertium/apertium-html-tools/tree/master/assets/strings

¹ Note that a number of these languages and variants have English
localisations already, so depending on your browser's locale, you may
end up not seeing some of these as "missing".

--
Jonathan




Re: [Apertium-stuff] Apertium Beta Portal

2020-06-02 Thread Jonathan Washington
Hi Mansur,

I believe turkic.apertium.org is up to date with Tino's latest
additions to Apertium nightly packaging.  Tatar-English also ~works
(to the extent that it's been developed).  Tatar-Turkish I do not
believe is an available pair.  Is this a pair that has received some
RBMT attention?

--
Jonathan

On Sun, 31 May 2020 at 06:43, mansur <6688...@gmail.com> wrote:
>
> Hi!
>
> Jonathan, http://turkic.apertium.org/ looks great. As I remember, it used
> to have old, outdated versions of packages. That's why I didn't use it.
> Great that you updated it and I hope you are going to keep it up to date :)
> By the way, the Tatar-Turkish and Tatar-English translation directions
> don't work at all there.
>
> Bernard, thank you for your reply. I am sure there wouldn't be many users
> of a Tatar-related translator, taking into account that Google Translate
> supports the Tatar language now and its quality is very good. So don't
> waste your time :) But on your site I liked the morphological analyzer
> form; it is very convenient and I will add a link to it on my site.
>
> Thank you!
>
> With best regards,
> Mansur
>
> On Sun, 31 May 2020 at 02:20, Bernard Chardonneau wrote:
>>
>> > Date: Sat, 30 May 2020 18:02:29 +0300
>> > From: mansur <6688...@gmail.com>
>> > To: apertium-stuff@lists.sourceforge.net
>> > Reply-To: apertium-stuff@lists.sourceforge.net
>> > Subject: Re: [Apertium-stuff] Apertium Beta Portal
>> > Looks great, but I have some problems:
>> >
>> > 1. I chose translation from Tatar to any of available languages, but it
>> > doesn't do anything after clicking "Translate":
>> > http://apertiumtrad.tuxfamily.org/tradtexte.php
>>
>> The problem is that Apertium tools use system libraries, and the newer
>> the Apertium tools are, the newer the system library versions they
>> require.
>>
>> Secondly, I don't have root access on the server, so, to prepare the
>> server, several things that could be done easily with apt-get, for
>> instance, were done differently, outside the /usr directory.
>>
>> That was difficult but possible for several things like autotools and
>> pkg-config, but several system libraries do not work if they are outside
>> their normal place, generally somewhere under /usr, and the person who
>> created the tuxfamily service was not pleased when I asked for extra
>> system libraries.
>>
>> He proposed that I install a complete system and do a chroot. That may
>> be the solution to get more language pairs working.
>>
>> >
>> > 2. Is it possible to choose translation direction/settings in the url, for
>> > example, something like ".../tradtexte.php?from=tatar&to=bashkir&direct". I
>> > want to preselect these settings for convenience of our corpus' users.
>> >
>>
>> Yes, there are two ways to do that.
>>
>> 1) Using language names :
>> https://apertiumtrad.tuxfamily.org/tradtexte.php?mode_sel=L&trad_directe=yes&lg_source=tat&lg_cible=bak
>> mode_sel=L (the l can be lowercase) is the default value if you did not 
>> change it previously
>>
>> 2) Using translation direction
>> https://apertiumtrad.tuxfamily.org/tradtexte.php?mode_sel=s&senstrad1=tat-bak
>>
>> The complete list of possible GET parameters is here:
>> https://apertiumtrad.tuxfamily.org/doc_param_get.php
>>
>> And once you have set your translation parameters, you can also look at the
>> cookies to see the values of these parameters.
>> https://apertiumtrad.tuxfamily.org/cookies.php
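
The two link styles described above can be generated rather than typed by
hand. Below is a minimal Python sketch, assuming only the page and the
parameter names quoted in this message (mode_sel, trad_directe, lg_source,
lg_cible, senstrad1). The function names are made up for illustration, and
leaving trad_directe out when direct translation is not wanted is an
assumption, not documented behaviour; the full parameter list is on the
doc_param_get.php page.

```python
from urllib.parse import urlencode

# Page and parameter names are the ones quoted in the message above.
BASE_URL = "https://apertiumtrad.tuxfamily.org/tradtexte.php"


def link_by_languages(source, target, direct=True):
    """Preselect source and target languages (mode_sel=L)."""
    params = {"mode_sel": "L"}
    if direct:
        # Assumption: omitting the parameter is how to not request it.
        params["trad_directe"] = "yes"
    params["lg_source"] = source
    params["lg_cible"] = target
    return BASE_URL + "?" + urlencode(params)


def link_by_direction(direction):
    """Preselect a translation direction such as "tat-bak" (mode_sel=s)."""
    return BASE_URL + "?" + urlencode({"mode_sel": "s", "senstrad1": direction})


print(link_by_languages("tat", "bak"))
print(link_by_direction("tat-bak"))
```

A corpus site could call one of these per language pair and embed the
resulting link next to each text.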
>>
>> To avoid a very large number of language choices, you can also select a subset
>> of languages using user preferences. Then, depending on whether or not you keep
>> cookies after closing your web browser, you can save that in cookies or in
>> an account on the apertiumtrad website.
>>
>> If you just need the web interface and have your language pair installed
>> locally, there is a very similar website available to download. You need a PHP
>> server but no database.
>>
>> So the only problem will be updating the Apertium tools in order to get more
>> language pairs working. If I am not the only user of this website in the
>> world, it may be useful to spend time on that.
>>
>>
>> 
>> Bernard Chardonneau (France)
>> Phone : [33] 9 72 36 32 90
>> GSM phone : [33] 7 69 46 16 31
>>
>> An alternative Apertium translation website :
>> http://apertiumtrad.tuxfamily.org
>>
>> Multilingual websites for my free software:
>> http://libremail.free.fr and http://libremail.tuxfamily.org
>> http://cyloop.tuxfamily.org (mainly translated with Apertium)
>>
>> My general website (in French only)
>> http://bech.free.fr
>>
>>

Re: [Apertium-stuff] Apertium Beta Portal

2020-05-30 Thread Jonathan Washington
Hi Mansur,

If you're looking to use bleeding-edge Tatar tools from Apertium, you
can also try http://turkic.apertium.org .  I just brought it
up-to-date with nightly packaging.

Note that there are a few Turkic modules that haven't made it into
nightly packaging yet, and there are a few issues with localisation.
The former is part of what Tino is currently working on.  So things
will probably be improving rather quickly going forward.  But do feel
free to continue to report issues you notice.  (Some things might be
more quickly or easily addressed if communicated on IRC or via
GitHub.)

--
Jonathan

On Sat, 30 May 2020 at 11:04, mansur <6688...@gmail.com> wrote:

>
> Thank you, Tino!
>
> On Sat, 30 May 2020 at 18:02, mansur <6688...@gmail.com> wrote:
>>
>> Looks great, but I have some problems:
>>
>> 1. I chose translation from Tatar to any of available languages, but it 
>> doesn't do anything after clicking "Translate":
>> http://apertiumtrad.tuxfamily.org/tradtexte.php
>>
>> 2. Is it possible to choose translation direction/settings in the url, for 
>> example, something like ".../tradtexte.php?from=tatar&to=bashkir&direct". I 
>> want to preselect these settings for convenience of our corpus' users.
>>
>> On Sat, 30 May 2020 at 17:32, Hèctor Alòs i Font wrote:
>>>
>>> Sorry, I forgot to copy it: 
>>> http://apertiumtrad.tuxfamily.org/index.php?lang=eng
>>>
>>> Message from mansur <6688...@gmail.com> on Sat, 30 May 2020 at 17:10:
>>>>
>>>> Hi, Hèctor!
>>>>
>>>> Sounds interesting! Could you please provide a link to @Bernard
>>>> Chardonneau's portal?
>>>>
>>>> Best,
>>>> Mansur
>>>>
>>>> On Sat, 30 May 2020 at 17:06, Hèctor Alòs i Font wrote:
>>>>>
>>>>> Hi, Mansur!
>>>>> As an alternative, @Bernard Chardonneau's site seems much more reliable,
>>>>> although it doesn't have all the features that Beta Apertium supposedly
>>>>> provides. I was very enthusiastic about Beta Apertium, but I stopped using
>>>>> it because it doesn't work.
>>>>> Best,
>>>>> Hèctor
>>>>>
>>>>> Message from mansur <6688...@gmail.com> on Sat, 30 May 2020 at 16:51:
>>>>>>
>>>>>> Hey!
>>>>>>
>>>>>> It turns out the Apertium Beta portal has stopped working for some reason:
>>>>>> http://beta.apertium.org/
>>>>>> If I use HTTPS, it redirects to the wiki page.
>>>>>>
>>>>>> Will it be fixed sometime soon? If not, what should we use that includes
>>>>>> beta features?
>>>>>>
>>>>>> With best regards,
>>>>>> Mansur


Re: [Apertium-stuff] Nightly Batch Add & Edit

2020-05-29 Thread Jonathan Washington
On Fri, 29 May 2020 at 15:47, Tino Didriksen wrote:
>
> Good question. I don't know.
>
> But going by 
> https://github.com/apertium/apertium-all/commit/5764549df9f0d550cdcae56541cc5a1585742b9a
>  the updater picks and propagates force-pushes correctly. apertium-ain was 
> thoroughly rewritten.

But apertium-ain wasn't on your list of pairs changed.  So now I'm
confused about the lists you sent and what they mean.

--
Jonathan




Re: [Apertium-stuff] ADDCOHORT in Constraint Grammar

2020-05-28 Thread Jonathan Washington
Xavi,

One of the two uses of apertium-separable (what we're currently
calling "multiword disassembly", with a mode name of "revautoseq") is
to expand smaller units into individual multi-token units.

Since lsx compiles into an FST, we simply reverse the labels (just
like the relationship between morphological analyser/generator FSTs),
and use the resulting module after structural transfer and before
generation.  In a case like this, I could see an argument for wanting
it to happen before structural transfer.
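
To illustrate what "reversing the labels" means, here is a toy Python
sketch. It is not how lsx or the FST compiler is implemented: the rule set
is a hypothetical list of (input, output) string pairs, whereas real lsx
files are XML and compile to a transducer. It only shows that one
compiled resource read in the opposite direction turns a "join" step into an
"expand" step.

```python
# Hypothetical rule set: each pair maps an input string to an output string.
# Real lsx files use an XML format; this only illustrates the reversal.
RULES = [("take out", "take_out")]


def invert(rules):
    """Swap the input and output side of every rule."""
    return [(output, input_) for input_, output in rules]


def apply_rules(rules, text):
    """Rewrite every occurrence of a rule's input side with its output side."""
    for input_, output in rules:
        text = text.replace(input_, output)
    return text


joined = apply_rules(RULES, "please take out the bins")
expanded = apply_rules(invert(RULES), joined)
print(joined)
print(expanded)
```

The same pairs serve both directions, which is the point of the
analyser/generator comparison above.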

--
Jonathan


On Thu, 28 May 2020 at 18:45, Xavi Ivars wrote:
>
> The reason I was asking was exactly because of that: we're not trying to 
> rewrite multi-tokens into smaller units but the opposite: expand smaller 
> units into multiple ones.
>
> But just to make sure: not because I thought it doesn't belong there, but
> because I really don't know what the actual scope of separable is (apart
> from having used it for a few phrasal verbs in eng-cat)
>
>
> --
> Xavi Ivars
> < http://xavi.ivars.me >
>
> On Fri, 29 May 2020 at 0:39, Francis Tyers wrote:
>>
>> On 2020-05-28 at 23:12, Xavi Ivars wrote:
>> > How would this fit in apertium-separable?
>> >
>> > As far as I know the goal of apertium separable is to handle
>> > multi-words in a better way than in the monodixes.
>> >
>> > I totally get (and totally agree) that we should put in transfer only
>> > stuff that is really about transfer between both languages and we
>> > don't want to abuse it... But is that a good enough reason to abuse
>> > another module? Or may it be the case that apertium-separable should
>> > handle a broader set of use cases (and probably change its name)?
>> >
>> > --
>> > Xavi Ivars
>> > < http://xavi.ivars.me >
>> >
>> > On Thu, 28 May 2020 at 16:14, Jonathan Washington wrote:
>> >
>> >> This could definitely be done in apertium-separable.  That would be
>> >> by far the most straightforward way to solve this problem.  And if
>> >> you did it as a language-specific lsx file, as has been
>> >> discussed recently, it would serve the purpose you describe.
>> >>
>> >> Don't treat it as a structural transfer issue.  The less lexical
>> >> stuff in transfer the better.
>> >>
>> >> --
>> >> Jonathan
>> >>
>> >> On Thu, May 28, 2020, 06:47 Jaume Ortolà i Font
>> >>  wrote:
>> >>
>> >> Isn't this something that should go in transfer,
>> >>
>> >> Dropping this "que" is possible in Spanish, but it is not regular
>> >> syntax, it is a mannerism used in bureaucratic jargon. The regular
>> >> syntax is with "que". It makes sense to add it, so all language
>> >> pairs can translate as usual.
>> >>
>> >> Transfer is extremely annoying for this kind of thing, in my
>> >> experience.
>> >>
>> >> or you could
>> >> use apertium-separable for it?
>> >>
>> >> Probably yes. We are not using apertium-separable in spa-cat, and it
>> >> will be useful to do it.
>> >>
>>
>> I think it fits better in separable (rewriting multitokens into smaller
>> units) than in CG (disambiguation).
>>
>> Another place could be in the bilingual dictionary, a special tag or a
>> special
>> lexeme, marking the missing que on the target side.
>>
>> e.g. in the monodix you could have:
>>
>> rogar¹:pregar
>> rogar²:pregar
>>
>>
>> Then in the bidix:
>>
>> rogar¹:pregar
>> rogar²:pregar# que
>>
>> Then if a clean -cat was desired, a transfer rule could just insert a
>> que
>> when lemq was "# que".
>>
>> In fact, if we consider this stylism to be a different lexeme it kind of
>> makes sense.
>>
>> Fran
>


Re: [Apertium-stuff] ADDCOHORT in Constraint Grammar

2020-05-28 Thread Jonathan Washington
This could definitely be done in apertium-separable.  That would be by far
the most straightforward way to solve this problem.  And if you did it as a
language-specific lsx file, as has been discussed recently, it would
serve the purpose you describe.

Don't treat it as a structural transfer issue.  The less lexical stuff in
transfer the better.

--
Jonathan

On Thu, May 28, 2020, 06:47 Jaume Ortolà i Font 
wrote:

> Isn't this something that should go in transfer,
>
>
> Dropping this "que" is possible in Spanish, but it is not regular syntax,
> it is a mannerism used in bureaucratic jargon. The regular syntax is with
> "que". It makes sense to add it, so all language pairs can translate as
> usual.
>
> Transfer is extremely annoying for this kind of thing, in my experience.
>
>
>> or you could
>> use apertium-separable for it?
>>
>
> Probably yes. We are not using apertium-separable in spa-cat, and it will
> be useful to do it.
>
> Jaume
>
> Fran
>>


Re: [Apertium-stuff] How useful is eliminating trimming for language developers?

2020-05-26 Thread Jonathan Washington
Hi all,

After having read through and thought some on this thread, I have some
responses.

First of all, I don't care what the "default" is (i.e., whatever
apertium-init creates without flags), as long as there remains choice.  A
lot of pairs already have things set up in different ways, and I see no
problem with allowing for more variation.  So as long as everything is
backwards-compatible and nothing is affected by these changes that doesn't
want to be, then everything is fine.  One way to keep things this way is to
provide a module to allow the injection of secondary tags from surface
forms and superblanks *after* analysis, and keep secondary tag code out of
the transducer processors.

I believe Daniel's proposal for apertium-separable trimming allows for
another nice compromise.  I was skeptical of this as it was being discussed
on IRC, but Daniel's explanation in this thread clarified things (I often
engage with IRC these days while dealing with small children, and can't
necessarily follow everything as closely as I might like to...).

The one difficulty with this approach is that MWEs really do need to be
offloaded to lsx files, and those really do then need to be part of
language modules, not translation pairs.  Lsx dictionaries being part of
language modules is something I've wondered about from the start, but the
choice of which MWEs should be included is pair-specific, so it was decided
they should be part of translation pairs.  If we trim against the bidix the
same way we've been trimming the monodix (and forgo trimming the monodix),
then I think we might be able to have our cake and eat it too.

In short, it allows us to control the MWEs we use for a given language pair
by simply having them in the bidix.  With weighting of the monodix against
the bidix and not trimming the monodix, we can also have forms not in the
bidix still benefit various stages of translation without "wrong" analyses
(more often, beneficial to some uses but not necessarily a given
translation pair) interfering with tokenisation.  We just have to offload
all MWEs (most entries with spaces) from the monodix to the lsx file.
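
As a rough illustration of the difference between the two strategies, here
is a toy Python sketch. It is only one reading of the idea, not the actual
implementation: the lemma names, the penalty value and both function names
are made up, and real trimming/weighting operates on compiled transducers
rather than on lists.

```python
# Hypothetical data: candidate analyses (by lemma) and the lemmas in the bidix.
ANALYSES = ["lemma_b", "lemma_a"]
BIDIX_LEMMAS = {"lemma_a"}


def trim(analyses, bidix_lemmas):
    """Trimming: analyses whose lemma is not in the bidix are discarded."""
    return [a for a in analyses if a in bidix_lemmas]


def weigh(analyses, bidix_lemmas, penalty=1.0):
    """Weighting: every analysis is kept, but out-of-bidix ones cost more."""
    weighted = [(0.0 if a in bidix_lemmas else penalty, a) for a in analyses]
    # Lowest weight first, so bidix-backed analyses are preferred.
    return sorted(weighted, key=lambda pair: pair[0])


print(trim(ANALYSES, BIDIX_LEMMAS))
print(weigh(ANALYSES, BIDIX_LEMMAS))
```

With trimming, lemma_b is gone for every later stage; with weighting it is
still available but dispreferred.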

Along those lines, I'm ready to implement this for the Kazakh-Kyrgyz pair
(which is at a "staging" level of development).  What will need to be done:

- Disable trimming of monodixes,
- Enable weighting of monodixes against bidix,
- Move lsx files to respective monolingual modules,
- Merge apertium-eng-kir's kir.lsx file into the Kyrgyz monolingual lsx
file (and do remaining steps for eng-kir too),
- Move all (or most) MWEs from monolingual modules to monolingual lsx
files.  Probably for now add a comment like "Use/MWE" to the "moved"
entries in the monolingual dictionary and grep those lines out at compile
time.  This is important at least for the Kazakh transducer, which is used
in two released pairs and a number of other developed tools.

One challenge I can see with automating the moving of "MWEs" (defined here
as open-category words that have spaces) is that because of the nature of
MWEs, a number of them in any given language have elements that aren't used
elsewhere in the language, and so won't otherwise receive analyses.  My
current understanding is that if there's not an obvious way (or need) to
handle these in lsx, there should be no problem with leaving them in the
monolingual dictionaries.  These are not the forms that could cause "take
precautions" problems anyway.

After making this change, things would be set up to begin leveraging
secondary tags once they become available.

--
Jonathan

On Tue, May 26, 2020, 07:27 Kevin Brubeck Unhammer 
wrote:

> Xavi Ivars wrote:
>
> > * In the trimming disadvantages number 1, we're stating that we're OK
> > having crappy monodixes because we *fix* that later on with trimming. I'm
> > sure that's where we are now, but as a project that focuses a lot on
> > providing free (as in speech) language resources that are later used for
> > many other use cases, I don't feel comfortable with that status. I think
> > we should aim to have dictionaries that are as correct as possible. And if
> > we did that, disadvantage number 1 would be smaller (even if not
> > disappearing completely).
>
> This point seems like distraction. No one puts errors in monodix on
> purpose. We do fix errors in monodix (when we find them, and have
> time). When we use monodix for other tasks than MT, we find and fix even
> more. On the other hand, there's no point in manually going through
> every monodix and bloody well searching for errors because there may be
> some that may show up if you stop trimming – please spend your time on
> something more useful.
>
> But there may also be some confusion as to what is an error. There may
> be things in monodixes that don't belong in "regular" dictionaries, but
> do belong in monodix – because the goal is building MT systems, not
> Dictionaries.
>
> And if your monodix is to be used for other things than MT, you're just
> gonna get many mo

Re: [Apertium-stuff] How useful is eliminating trimming for language developers?

2020-05-26 Thread Jonathan Washington
On Tue, May 26, 2020, 08:48 Francis Tyers  wrote:

> On 2020-05-26 at 12:27, Kevin Brubeck Unhammer wrote:
> > Xavi Ivars wrote:
> >
> >> * In the trimming disadvantages number 1, we're stating that we're OK
> >> having crappy monodixes because we *fix* that later on with trimming.
> >> I'm sure that's where we are now, but as a project that focuses a lot
> >> on providing free (as in speech) language resources that are later used
> >> for many other use cases, I don't feel comfortable with that status. I
> >> think we should aim to have dictionaries that are as correct as
> >> possible. And if we did that, disadvantage number 1 would be smaller
> >> (even if not disappearing completely).
> >
> > This point seems like distraction. No one puts errors in monodix on
> > purpose. We do fix errors in monodix (when we find them, and have
> > time). When we use monodix for other tasks than MT, we find and fix
> > even
> > more. On the other hand, there's no point in manually going through
> > every monodix and bloody well searching for errors because there may be
> > some that may show up if you stop trimming – please spend your time on
> > something more useful.
> >
> > But there may also be some confusion as to what is an error. There may
> > be things in monodixes that don't belong in "regular" dictionaries, but
> > do belong in monodix – because the goal is building MT systems, not
> > Dictionaries.
> >
> > And if your monodix is to be used for other things than MT, you're just
> > gonna get many more such "weird" entries that all other use-cases need
> > to filter out. E.g. Giellatekno's Northern Saami analyser (used for MT,
> > spelling, grammar check etc.) contains several non-normative analyses,
> > "multiwords" and unusual taggings just for the grammar checker. These
> > are not included in the FST's built for other use-cases, but are
> > trimmed
> > out, mostly using tags (but also bidix, in the case of MT).
> >
>
> A better way of doing this kind of "lexicographic" work would be useful.
> In .lexc-based analysers we mostly use comments, but they are very ad hoc.
> Some examples:
>
> ! Use/MT       - Only use this in MT systems
> ! Src/Bible    - This word came from the Bible
> ! Err/Orth     - Orthographic error
> ! Dial/North   - Northern variant
> ! Use/kaz-kir  - Only use this in kaz-kir
> ! Use/Circ     - This causes a cycle
> ! Dir/LR       - Only analysis
> ! Dir/RL       - Only generation
> ! Use/MWE      - Multiword
> ! Der/Caus     - Derived form by causative
> ! Use/Arch     - Archaic form
>
> Fran
>

Another problem with these comments is that we don't use most of them for
anything.

In particular, lines marked Use/MT should be stripped out to produce vanilla
transducers, but I don't think we've ever done that.

This isn't a problem inherent to the methodology—just our inability to get
organised enough to use it for everything we dreamt it might be useful for.
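
For what it's worth, the stripping step itself is small. Here is a minimal
Python sketch, assuming the marker appears as its own word in a trailing "!"
comment on the entry's line, as in Fran's examples. The entries below are
hypothetical, and the sketch ignores lexc escapes such as "%!", so treat it
as an illustration rather than a drop-in build step.

```python
# Hypothetical lexc entries; only the trailing "!" comments matter here.
LEXC_LINES = [
    "alpha:alpha N1 ; ! Use/MT",
    "beta:beta N1 ; ! Use/MWE",
    "gamma:gamma N1 ;",
]


def strip_marked(lines, marker="Use/MT"):
    """Drop lines whose trailing "!" comment contains the marker as a word."""
    kept = []
    for line in lines:
        # Everything after the first "!" is treated as the comment.
        _, found, comment = line.partition("!")
        if found and marker in comment.split():
            continue
        kept.append(line)
    return kept


print(strip_marked(LEXC_LINES))
print(strip_marked(LEXC_LINES, marker="Use/MWE"))
```

The same filter with a different marker would cover the "grep those lines
out at compile time" idea for Use/MWE entries.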

--
Jonathan



>


Re: [Apertium-stuff] Secondary Tag Prefixes

2020-05-09 Thread Jonathan Washington
Speaking as a language developer,¹ I prefer concise, textual tags.

E.g., I don't think  is good—it clogs the stream
with verbosity, as Fran points out.

On the other hand, I don't mind symbols here and there, like <§agent>.
But I don't think this is a good secondary tag, unless we make it very
explicit which unicode spans/classes can be used to define secondary
tags.

I don't like tags like <:human>, unless we are explicit about
considering this a special way of encoding *semantic category*
information in secondary tags.

Fran, regarding these last two points, could you define what each
symbol in your example tags stands for, and what range/class of
symbols can continue to be used for secondary tags?

¹ Qualification: not one that's developed a released pair from start
to finish, but that's mostly because my attention is too divided...  I
do have decent amounts of experience with the entire pipeline, though,
and experience working on all pipeline stages in several individual
pairs (including one approaching release / "in nursery").

--
Jonathan

On Fri, 8 May 2020 at 14:02, Hèctor Alòs i Font wrote:
>
> Message from Francis Tyers on Fri, 8 May 2020 at 18:05:
>>
>> On 2020-05-08 at 15:50, Tino Didriksen wrote:
>> > For khannatanmai's GSoC project, secondary tags will be implemented in
>> > a backwards compatible manner. That in itself is indisputable. But
>> > there is a question of how the initial batch of secondary tags should
>> > look.
>> >
>> > I feel they should be in the form of , as in a very short
>> > textual lower-case prefix, followed by :, followed by whatever value
>> > there is. Or even an upper-case prefix, as in  or .
>> >
>> > spectie wants symbol prefixes in the form of <%:cdefg>.
>> >
>>
>> [snip]
>>
>> > From a technical and scientific basis, textual prefixes are just
>> > better. And yet, spectie wants symbol prefixes because he likes them.
>> > I disagree. Hence, this mail asking for opinions.
>> >
>> > Do you language developers actually prefer symbol prefixes?
>> >
>>
>> Tino misrepresented me slightly. I never proposed using the pound sign.
>>
>> My proposal was for:
>>
>>
>> отец<@subj><§agent><%:отца><:human><:kin>
>>
>> If we have to have these "secondary tags"... which I have yet to be
>> completely convinced of,
>> I would like to have them be readable and not clutter the stream with
>> unnecessary
>> verbosity. There are a lot of rule-based formalisms out there that are
>> impossible to read,
>> having been dreamt up by people who don't actually spend a lot of time
>> writing language
>> data, and I would like to avoid that happening with Apertium.
>
>
> Well, from a developer's point of view, I'd very much like to be able to get
> information like "human", "construction", "demonym", "material", "musical
> instrument", etc., which I have to use for lexical selection and also
> sometimes for transfer. It seems logical to me that this data would some
> day be placed in the dictionary or in a kind of secondary dictionary. In fact
> the trend is already to add more semantic information to words: for example,
> in proper names we now often distinguish between first names, surnames, place
> names, hydronyms, etc.
>
> Personally, I don't have any preference in the syntax. I'm fine with any 
> method that is short, easy to type on any keyboard and that identifies a tag 
> as secondary.
>
> Hèctor
>
>
>>
>> Again, and again I want to see a translation and a linguistic
>> motivation. In an _actual_
>> language pair, not in someone's imagination.
>>
>> We have a lot of modules that have been made but not reached use in a
>> released pair,
>> so I don't see how this should be different.
>>
>> Fran
>>
>>
>>
>>
>>


Re: [Apertium-stuff] Bylaws Overhaul Proposal

2020-04-22 Thread Jonathan Washington
If you don't like the diff that GitHub offers, you can clone the repos and
use whatever diff tool you prefer.  The git executable itself has several
diff modes too, including a word diff (git diff --word-diff), which sounds
like what you're after.

--
Jonathan

On Wed, Apr 22, 2020, 01:14 Tino Didriksen  wrote:

> On Wed, 22 Apr 2020 at 01:10, Bernard Chardonneau 
> wrote:
>
>> Good idea to provide a diff, but differences should be highlighted
>> everywhere. There are long paragraphs in which it is very difficult to
>> see the changes.
>> An alternative could be to show each pink line followed by
>> the corresponding one in light green. But I prefer highlighting the
>> differences.
>>
>
> We have no control over that - the diff is entirely generated and rendered
> by Github.
>
> -- Tino Didriksen
>


Re: [Apertium-stuff] Bylaws Overhaul Proposal

2020-04-21 Thread Jonathan Washington
My understanding is that the Secretary doesn't have much power—mostly just
responsibility.  Or, what power are you talking about?

--
Jonathan

On Tue, Apr 21, 2020, 12:36 Samuel Sloniker  wrote:

> I just don't like giving one group too much power. Maybe I've spent too
> much time studying the US Constitution. 😀
>
> On Tue, Apr 21, 2020, 08:56 Jonathan Washington <
> jonathan.n.washing...@gmail.com> wrote:
>
>>
>>
>> On Mon, Apr 20, 2020, 17:57 Samuel Sloniker 
>> wrote:
>>
>>> 1. I oppose allowing the Secretary and Treasurer to be the same person.
>>>
>>
>> What's your reasoning for this?
>>
>> --
>> Jonathan
>>
>> 2. Using additional criteria for tiebreaking could easily turn into a
>>> rigged election.
>>> 3. I believe there should be a court of some sort for handling
>>> violations. See the Bylaw Violation Court in
>>> http://wiki.apertium.org/wiki/User:ScoopGracie/PMC/Proposed_bylaws for
>>> an example
>>>
>>> On Mon, Apr 20, 2020, 13:15 Tino Didriksen 
>>> wrote:
>>>
>>>> I'm proposing overhauling the Apertium Bylaws, and after some fixes and
>>>> refinements by the PMC, it's time to get everyone's input.
>>>>
>>>> PR with comments: https://github.com/apertium/organisation/pull/13
>>>>
>>>> Current bylaws: http://wiki.apertium.org/wiki/Bylaws
>>>>
>>>> Proposed bylaws:
>>>> https://github.com/apertium/organisation/blob/overhaul/Bylaws.md
>>>> Proposed CLA:
>>>> https://github.com/apertium/organisation/blob/overhaul/CLA-optional.md
>>>>
>>>> Diff: https://github.com/apertium/organisation/pull/13/files
>>>>
>>>> The rationale is laid out in the top PR comment. There are 3 underlying
>>>> tenets:
>>>> - Codify our de facto behavior into de jure language.
>>>> - Ensure all forms of contributions can result in voting rights.
>>>> - Set up for an eventual Apertium legal entity.
>>>>
>>>> Please read the whole thing, and give feedback on the PR so that it is
>>>> kept in one place.
>>>>
>>>> -- Tino Didriksen
>>>>


Re: [Apertium-stuff] Where do I find the dictionaries

2020-04-21 Thread Jonathan Washington
Btw, dpkg -L shows the following on my Debian system:
/usr/share/apertium/apertium-swe/apertium-swe.swe.dix

It could probably work for what you want.

--
Jonathan

On Tue, Apr 21, 2020, 12:29 Jonathan Washington <
jonathan.n.washing...@gmail.com> wrote:

> Hi Per,
>
> To add to what Daniel said, language data installed from apt is put in
> system directories as root, and is not good for doing dev work.
>
> As a fairly up-to-date Apertium language data developer, I don't know the
> path of system-installed language data off the top of my head (you can
> always run dpkg -L apertium-swe to find out), and I'm not even sure it
> includes the uncompiled dictionaries.  Maybe I'm just an elite developer
> without my finger on the pulse of what actual Apertium users need.
>
> But I do recommend what Daniel suggested—that would be the easiest
> approach, imo.
>
> --
> Jonathan
>
> On Mon, Apr 20, 2020, 14:53 Daniel Swanson 
> wrote:
>
>> You can get the Swedish monodix from
>> https://github.com/apertium/apertium-swe or by running 'apertium-get swe'
>>
>> On Mon, Apr 20, 2020 at 2:51 PM Per Tunedal 
>> wrote:
>>
>>> Hi,
>>> I'm a bit rusty, not having used Apertium for a long time.
>>>
>>> I would like to get a dictionary containing Swedish lemmas, doing
>>> something like:
>>>
>>> apertium-dixtools grep --par '.*__n' apertium-swe.dix
>>>
>>> Where do I find the Swedish monodix?
>>>
>>> I'm running Ubuntu as an app on Windows 10. I've installed Apertium
>>> nightly build. The language pairs swe-dan and swe-nor are installed from
>>> the repository with sudo apt-get install ...
>>>
>>> And I've successfully installed apertium-dixtools. Then I got stuck. I
>>> cannot figure out where the language files are installed.
>>>
>>> Yours,
>>> Per Tunedal
>>>
>>>


Re: [Apertium-stuff] Where do I find the dictionaries

2020-04-21 Thread Jonathan Washington
Hi Per,

To add to what Daniel said, language data installed from apt is put in
system directories as root, and is not good for doing dev work.

As a fairly up-to-date Apertium language data developer, I don't know the
path of system-installed language data off the top of my head (you can
always run dpkg -L apertium-swe to find out), and I'm not even sure it
includes the uncompiled dictionaries.  Maybe I'm just an elite developer
without my finger on the pulse of what actual Apertium users need.

But I do recommend what Daniel suggested—that would be the easiest
approach, imo.

--
Jonathan

On Mon, Apr 20, 2020, 14:53 Daniel Swanson 
wrote:

> You can get the Swedish monodix from
> https://github.com/apertium/apertium-swe or by running 'apertium-get swe'
>
> On Mon, Apr 20, 2020 at 2:51 PM Per Tunedal 
> wrote:
>
>> Hi,
>> I'm a bit rusty, not having used Apertium for a long time.
>>
>> I would like to get a dictionary containing Swedish lemmas, doing
>> something like:
>>
>> apertium-dixtools grep --par '.*__n' apertium-swe.dix
>>
>> Where do I find the Swedish monodix?
>>
>> I'm running Ubuntu as an app on Windows 10. I've installed Apertium
>> nightly build. The language pairs swe-dan and swe-nor are installed from
>> the repository with sudo apt-get install ...
>>
>> And I've successfully installed apertium-dixtools. Then I got stuck. I
>> cannot figure out where the language files are installed.
>>
>> Yours,
>> Per Tunedal
>>
>>


Re: [Apertium-stuff] Modifying the apertium stream format to include arbitrary information

2020-04-21 Thread Jonathan Washington
The main thing I worry about here is lrx rules.

Currently a lot of pairs have rules that match e.g. tags="adj", but not
necessarily tags="adj.*".  So something that's normally hargle<adj> might
now be hargle<adj> followed by a secondary tag, and that means the lrx rule
won't match.

Since we want this to be backwards-compatible (without rewriting rules),
the lrx compiler and/or processor will have to be rewritten to ignore
secondary tags for matching (unless a rule is written to check a secondary
tag??).
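
To make the worry concrete, here is a toy Python sketch of the two matching behaviours. It is not how the lrx compiler or lrx-proc work internally. The names tags_match and tags_match_ignoring_secondary, and the tag name "x-secondary", are hypothetical and only illustrate exact matching versus prefix matching, plus the idea of ignoring secondary tags.

```python
def tags_match(pattern, tags):
    """Toy matcher: "adj" must equal the whole tag sequence,
    while "adj.*" also accepts any further tags after "adj"."""
    joined = ".".join(tags)
    if pattern.endswith(".*"):
        prefix = pattern[:-2]
        return joined == prefix or joined.startswith(prefix + ".")
    return joined == pattern


def tags_match_ignoring_secondary(pattern, tags, secondary):
    """Drop secondary tags before matching, so old rules keep working."""
    primary = [t for t in tags if t not in secondary]
    return tags_match(pattern, primary)


if __name__ == "__main__":
    old = ["adj"]
    new = ["adj", "x-secondary"]  # hypothetical secondary tag
    print(tags_match("adj", old))    # exact rule, old stream
    print(tags_match("adj", new))    # exact rule, new stream
    print(tags_match("adj.*", new))  # wildcard rule, new stream
    print(tags_match_ignoring_secondary("adj", new, {"x-secondary"}))
```

The last call shows the backwards-compatible behaviour described above: the unchanged rule tags="adj" still matches once secondary tags are filtered out before comparison.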

I guess this sort of worry is the sort of thing you're keeping track of so
that it can be worked on?

--
Jonathan

On Mon, Apr 20, 2020, 14:52 Tanmai Khanna  wrote:

> In a nutshell, by using the source analysis for disambiguation and
> transfer, we make the translation output better, and by outputting the
> source surface form instead of the source lemma, we make the output more
> comprehensible, or post-editable.
>
> Tanmai
>
> On Tue, Apr 21, 2020 at 12:19 AM Tanmai Khanna 
> wrote:
>
>> Hey Francis,
>> I agree that it does seem like a solution searching for a problem if we
>> look at it in isolation. But it's important to look at this in the context
>> of eliminating trimming. Chronologically, this project was first, and
>> still is, about eliminating dictionary trimming. Modification to the stream
>> is just part of the solution - a solution that will help this problem, but
>> also potentially several other problems, such as the superblank reordering
>> problem. I went into detail about this in the proposal but I'll explain it
>> here.
>>
>> The monodix of a language is generally larger than the bidix for a
>> language pair involving that language. It was noticed that if the monodix
>> is used as is, there are a lot of translation errors (the ones marked
>> with @), where the output is just the source-language lemma because no
>> translation is available. To deal with this, dictionary trimming was
>> added: a word is removed from the monodix if it isn't present in the
>> bidix, so it goes through the pipeline as an unknown word and the source
>> surface form appears in the final translation (marked with *), which is
>> arguably better and more intelligible than just the source lemma.
>>
>> However, trimming meant giving up certain benefits. Let's look at these
>> benefits in greater detail:
>>
>>    - *Lexical Selection:* By discarding the analysis of a word in the
>>    source language, we lose the ability to use it as context to disambiguate
>>    the words around it. Assume a [Noun Adjective] sequence in which we don't
>>    know the translation of the Adjective, i.e. it isn't in the bidix. With
>>    trimming we would discard it, and hence if the Noun has several ambiguous
>>    forms, we have no way to disambiguate it, since we've discarded the
>>    analysis of the Adjective (which included the fact that it's an adjective).
>>- *Transfer:* In the same example, assume that in the target
>>language, [Noun Adj] is to be rearranged into [Adj Noun]. With trimming,
>>this can't be done as we've discarded the analysis of the Adjective,
>>treating it as an unknown word.
>>
>> Now, if we don't discard the analysis and don't trim, we would again fall
>> into the earlier problem of untranslated lemmas.
>>
>> This project is a way to have our cake and eat it too. We don't discard
>> the analysis even if we don't know the translation, but we don't just
>> output the lemma either - we output the source surface form. For a solution
>> like this, it is *essential that we propagate the surface form till at
>> least transfer or even till the generator*, so that we can use the
>> benefits of the source analysis and then before translation, we discard it
>> and use the source surface form.
>>
>> Currently the source surface form is discarded at the tagger. This is
>> where the stream modification comes in. It's a robust way to propagate the
>> surface form through the stream with least disruption to the current
>> modules.
>>
>> Then there are other possible benefits of secondary information, such as
>> markup tags. Hope this makes sense.
>>
>> Tanmai
>>
>> On Tue, Apr 21, 2020 at 12:02 AM Francis Tyers 
>> wrote:
>>
>>> On 2020-04-20 19:21, Daniel Swanson wrote:
>>> >> Another way of putting this is that it looks like a technical
>>> > solution
>>> >> in search of a problem, rather than a problem description in search
>>> >> of a solution.
>>> >
>>> > To me the most obvious thing to do with it is to put markup
>>> > information in secondary tags as a way of solving the superblank
>>> > reordering problem.
>>> >
>>>
>>> Didn't we have a solution for this that was worked on over a couple
>>> of GSOC projects ?
>>>
>>> Fran
>>>
>>>
>>>
>>
>>
>> --
>> *Khanna, Tanmai*
>>
>
>
> --
> *Khanna, Tanmai*

Re: [Apertium-stuff] Bylaws Overhaul Proposal

2020-04-21 Thread Jonathan Washington
On Mon, Apr 20, 2020, 17:57 Samuel Sloniker  wrote:

> 1. I oppose allowing the Secretary and Treasurer to be the same person.
>

What's your reasoning for this?

--
Jonathan

> 2. Using additional criteria for tiebreaking could easily turn into a
> rigged election.
> 3. I believe there should be a court of some sort for handling violations.
> See the Bylaw Violation Court in
> http://wiki.apertium.org/wiki/User:ScoopGracie/PMC/Proposed_bylaws for an
> example
>
> On Mon, Apr 20, 2020, 13:15 Tino Didriksen  wrote:
>
>> I'm proposing overhauling the Apertium Bylaws, and after some fixes and
>> refinements by the PMC, it's time to get everyone's input.
>>
>> PR with comments: https://github.com/apertium/organisation/pull/13
>>
>> Current bylaws: http://wiki.apertium.org/wiki/Bylaws
>>
>> Proposed bylaws:
>> https://github.com/apertium/organisation/blob/overhaul/Bylaws.md
>> Proposed CLA:
>> https://github.com/apertium/organisation/blob/overhaul/CLA-optional.md
>>
>> Diff: https://github.com/apertium/organisation/pull/13/files
>>
>> The rationale is laid out in the top PR comment. There are 3 underlying
>> tenets:
>> - Codify our de facto behavior into de jure language.
>> - Ensure all forms of contributions can result in voting rights.
>> - Set up for an eventual Apertium legal entity.
>>
>> Please read the whole thing, and give feedback on the PR so that it is
>> kept in one place.
>>
>> -- Tino Didriksen
>>


Re: [Apertium-stuff] Lexical Selection

2020-04-13 Thread Jonathan Washington
I believe the multitrans script in lex-tools (
https://github.com/apertium/apertium-lex-tools) makes it possible to get
all versions of the translation by expanding the dictionary and skipping
lexical selection.  So you'd get two sentences output for this particular
example:

- The season is more rainy
- The station is more rainy
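
As a rough illustration of what "all versions of the translation" means, here is a small Python sketch. It is not the multitrans implementation. It only assumes that each source word has a list of candidate translations and enumerates every combination; the SLOTS data is hand-written for this example.

```python
from itertools import product


def expand_translations(slots):
    """Given one list of alternatives per word, return every full sentence."""
    return [" ".join(choice) for choice in product(*slots)]


# Hand-written alternatives for the example sentence; only one word is ambiguous.
SLOTS = [["The"], ["season", "station"], ["is"], ["more"], ["rainy"]]

if __name__ == "__main__":
    for sentence in expand_translations(SLOTS):
        print("-", sentence)
```

With two ambiguous words that have two candidates each, the same function would return four sentences, so the output grows quickly on longer input.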

--
Jonathan

On Fri, Apr 3, 2020, 09:00 Kevin Brubeck Unhammer  wrote:

> Jaume Ortolà i Font wrote:
>
> > On Wed, Apr 1, 2020 at 10:48, egea piñeiro helena <
> > helena.egea-tryufelddafe5aofshc...@public.gmane.org> wrote:
> >
> >> How to show the text translated with the multiple options due to
> >> polysemy.
> >> "The *season/station* more rainy is"
> >>
> >
> > This is a recurrent request that could be useful in some applications,
> > but there is no way to do it in Apertium now.
>
> You can make a new pipeline that splits into separate lexical units
> instead of disambiguating. There's an example for eng-ita at
> http://wiki.apertium.org/wiki/Translate_without_disambiguation
>
> Basically, replace cg-proc+apertium-tagger with
>
> #!/usr/bin/python3
> # Emit every reading of each lexical unit as its own unit, joined by "[/]".
> import streamparser, sys
> for (b, lu) in streamparser.parse_file(sys.stdin, with_text=True):
>     print(b + "[/]".join(["^" + streamparser.reading_to_string(r) + "$"
>                           for r in lu.readings]), end="")
>
> and replace lrx-proc with
>
> #!/usr/bin/python3
> # Same idea, but keep the wordform before each reading ("^wordform/reading$").
> import streamparser, sys
> for (b, lu) in streamparser.parse_file(sys.stdin, with_text=True):
>     print(b + "[/]".join(["^" + lu.wordform + "/" + streamparser.reading_to_string(r) + "$"
>                           for r in lu.readings]), end="")
>
> in your pipeline and you get slash-separated alternatives.
>
>
> Of course, this won't get handled correctly by transfer (transfer will
> see e.g. several nouns in a row where there was one source noun), but if
> all you want is to send all alternatives through, it may be Good Enough
> for some purposes (e.g. testvoc, or MT for language learning).


Re: [Apertium-stuff] PMC proposal

2020-04-10 Thread Jonathan Washington
I still see no comparison with other options, besides leaving things
as they are currently.  As I stated before, if anyone's going to vote
in favour of this, they'll want to understand the full range of
options out there that they're not voting for.

--
Jonathan

On Thu, Apr 9, 2020 at 15:32, Samuel Sloniker wrote:
>
> I added some more pros as well as a few cons. 
> http://wiki.apertium.org/wiki/PMC_proposals/Use_Material_Design#In_detail
>
> On Thu, Apr 9, 2020 at 10:52 AM Jonathan Washington 
>  wrote:
>>
>> That's not a good reason to choose something.  As you yourself were
>> noting about our use of certain frameworks in html-tools, popularity
>> isn't a good criterion.
>>
>> The way this should work is the proposer (potentially with other
>> interested parties) does a thorough review of the options and opens up
>> discussion in the community based on those options.  Proposing one
>> option with no reasoning (except that it "seems fairly popular") or
>> context is unlikely to win anyone over.
>>
>> --
>> Jonathan
>>
>> On Thu, Apr 9, 2020 at 13:47, Samuel Sloniker wrote:
>> >
>> > No particular reason, it just seems fairly popular.
>> >
>> > On Thu, Apr 9, 2020, 10:45 Jonathan Washington 
>> >  wrote:
>> >>
>> >> Why Material and not some other standard?
>> >>
>> >> --
>> >> Jonathan
>> >>
>> >> On Wed, Apr 8, 2020 at 16:02, Samuel Sloniker wrote:
>> >> >
>> >> > (for the next PMC) 
>> >> > http://wiki.apertium.org/wiki/PMC_proposals/Use_Material_Design


Re: [Apertium-stuff] PMC proposal

2020-04-09 Thread Jonathan Washington
That's not a good reason to choose something.  As you yourself were
noting about our use of certain frameworks in html-tools, popularity
isn't a good criterion.

The way this should work is the proposer (potentially with other
interested parties) does a thorough review of the options and opens up
discussion in the community based on those options.  Proposing one
option with no reasoning (except that it "seems fairly popular") or
context is unlikely to win anyone over.

--
Jonathan

On Thu, Apr 9, 2020 at 13:47, Samuel Sloniker wrote:
>
> No particular reason, it just seems fairly popular.
>
> On Thu, Apr 9, 2020, 10:45 Jonathan Washington 
>  wrote:
>>
>> Why Material and not some other standard?
>>
>> --
>> Jonathan
>>
>> On Wed, Apr 8, 2020 at 16:02, Samuel Sloniker wrote:
>> >
>> > (for the next PMC) 
>> > http://wiki.apertium.org/wiki/PMC_proposals/Use_Material_Design


Re: [Apertium-stuff] Apertium offline translator Android app has gone ... offline.

2020-04-09 Thread Jonathan Washington
On Tue, Apr 7, 2020 at 05:44, Tino Didriksen <
m...@tinodidriksen.com> wrote:

> Well, that makes sense from a security point of view. And that is a good
> reason to revive the native port, which I talked about on IRC but never
> wrote about to the mailing list.
>
> IRC logs:
>
> https://tinodidriksen.com/pisg/freenode/logs/%23apertium/search.php?q=TinoDidriksen+Android
>
> On 2015-11-25, I started seriously looking into getting all the C++ tools
> running natively on Android. Two weeks later (2015-12-07), everything
> worked - I had the Simpleton UI running on Android, with the native tools
> run as-is in the well-known pipes, with data files compiled on Debian.
>

Hmm, did you commit your work on this somewhere?

--
Jonathan


>
> At the time, I did not go further with it. There were some random crashes
> due to memory corruption bugs, which I think have all been ironed out by
> now.
>
> So what we need is an app that ships with the released binaries and
> downloads language data from us ( https://apertium.projectjj.com/pkgs.php
> ).
>
> -- Tino Didriksen
>
>
> On Tue, 7 Apr 2020 at 10:57, Jacob Nordfalk 
> wrote:
>
>> Hi there,
>>
>> It seems that, for security reasons, it's no longer permitted for apps
>> to download executable code from unknown sources and execute it.
>>
>>
>> So, Apertium_Android is currently unavailable:
>> https://play.google.com/store/apps/details?id=org.apertium.android
>>
>> The same goes for Mikel Artetxe's Mitzuli version:
>> https://play.google.com/store/apps/details?id=com.mitzuli
>>
>> They both use Lttoolbox-java.
>> Google probably found out because we use the .jar file format (which is
>> essentially a ZIP file) as the distribution mechanism (see
>> Language_pair_packages) - here is an example.
>>
>> While I totally agree with the principle of not downloading code from
>> unknown sources, I reckon that this is essentially what
>> Bytecode_for_transfer is: Java bytecode representing the transfer stage,
>> downloaded and executed with no security measures.
>>
>> I can see the following options:
>> 1) Change the distribution mechanism to be via Google Play
>> 2) Distribute the transfer files as XML and generate the bytecode
>> on-device
>> 3) Give up on bytecode for transfer (transfer will be much slower)
>> 4) Try to make the C++ versions of lttoolbox, apertium (CG, HFST,...) usable in
>> Android
>> 5) Give up on offline functionality
>>
>> While 3) slowed transfer down terribly 10 years ago, our
>> computers have got faster. And most pairs are using constraint
>> grammar, which is also very slow compared to the rest of the pipeline, so
>> it might not be an issue anymore.
>>
>> I like 2) the most - moving the transfer compilation to the device
>> wouldn't be that hard, and I'd be happy to take part in it.
>> But the community was never very positive about the idea of Java/cross
>> platform, and I'm not very active anymore, so perhaps 4) or 5) would be best.
>>
>> What do you think?
>>
>>
>> Yours,
>> Jacob
>>
>>
>> -- Forwarded message -
>> From: Google Play Support
>> Date: Thu, Mar 26, 2020 at 12:49
>> Subject: Notification from Google Play about Apertium offline translator
>> To: 
>> Cc: 
>>
>>
>> Hi Developers at Jacob Nordfalk,
>>
>> After a recent review, Apertium offline translator, org.apertium.android,
>> has been removed from Google Play due to a policy violation. This app won’t
>> be available to users until you submit a compliant update.
>>
>> *Issue: Violation of Malicious Behavior policy*
>>
>> An app distributed via Google Play may not modify, replace, or update
>> itself using any method other than Google Play's update mechanism.
>> Likewise, an app may not download executable code (e.g. dex, JAR, .so
>> files) from a source other than Google Play.
>>
>> *Next steps: Submit your updated app for another review*
>>
>>    1. Read through the Malicious Behavior policy for more details.
>>    2. Make changes to bring your app into compliance.
>>    3. Make sure that your app is compliant with all other Developer
>>    Program Policies. Additional enforcement could occur if there are
>>    further policy violations.
>>    4. Sign in to your Play Console

Re: [Apertium-stuff] PMC proposal

2020-04-09 Thread Jonathan Washington
Why Material and not some other standard?

--
Jonathan

On Wed, Apr 8, 2020 at 16:02, Samuel Sloniker wrote:
>
> (for the next PMC) 
> http://wiki.apertium.org/wiki/PMC_proposals/Use_Material_Design


Re: [Apertium-stuff] Election Results

2020-04-04 Thread Jonathan Washington
I'd like to make one clarification on this:

On Sat, Apr 4, 2020 at 14:55, Daniel Swanson wrote:
>
> Hi Apertiumers!
>
> The election proceedings are now complete and the votes have been tallied as 
> follows:
>
> Votes: 41
> For president:
> - Tino Didriksen: 9
> - Francis Tyers: 30
> For members:
> - Sushain K. Cherivirala: 18
> - Tino Didriksen: 28
> - Mikel L. Forcada: 29
> - Scoop Gracie (pseudonym): 4
> - Xavi Ivars: 20
> - Tanmai Khanna: 4
> - Francis Tyers: 23
> - Jonathan Washington: 24
>
> There is a tie between Scoop Gracie and Tanmai Khanna. In consultation with 
> the current PMC it was decided that under a strict reading of the bylaws 
> Tanmai Khanna would be eligible to run and Scoop Gracie would not. Thus we 
> announce the election results as follows:
>
> President: Francis Tyers
> Members:
> - Sushain K. Cherivirala
> - Tino Didriksen
> - Mikel L. Forcada
> - Xavi Ivars
> - Tanmai Khanna*
> - Jonathan Washington
>
> * Due to participation in GSoC, Tanmai Khanna's appointment will be delayed.

No one has been selected for GSoC yet (we're in the mentor selection
stage), so whether or not Tanmai immediately assumes the PMC position
will be contingent on whether or not he is selected for GSoC.

--
Jonathan

> Thank you to the candidates and to everyone who voted.
>
> The election committee,
> Sevilay, Hèctor, and Daniel


Re: [Apertium-stuff] PMC proposal

2020-04-02 Thread Jonathan Washington
Not everywhere observes the same days as the U.S.  Different areas and
peoples have their own traditions, just like they have their own languages.

https://en.wikipedia.org/wiki/Massacre_of_the_Innocents#Feast_day

--
Jonathan

On Thu, Apr 2, 2020, 11:59 Scoop Gracie  wrote:

> Huh?
>
> On Thu, Apr 2, 2020, 08:55 Mikel L. Forcada  wrote:
>
>> Ah, I never think of that. Our fools' day is December 28...
>> On 2/4/20 at 15:13, Scoop Gracie wrote:
>>
>> It was an April Fool's Day joke.
>>
>> On Wed, Apr 1, 2020, 23:19 Mikel L. Forcada  wrote:
>>
>>> WTF.
>>> On 2/4/20 at 3:46, Scoop Gracie wrote:
>>>
>>> http://wiki.apertium.org/wiki/PMC_proposals/Replanetize_Pluto
>>>
>>>
>>>
>>> --
>>> Mikel L. Forcada  http://www.dlsi.ua.es/~mlf/
>>> Departament de Llenguatges i Sistemes Informàtics
>>> Universitat d'Alacant
>>> E-03690 Sant Vicent del Raspeig
>>> Spain
>>> Office: +34 96 590 9776
>>>
>>>
>>
>>
>> --
>> Mikel L. Forcada  http://www.dlsi.ua.es/~mlf/
>> Departament de Llenguatges i Sistemes Informàtics
>> Universitat d'Alacant
>> E-03690 Sant Vicent del Raspeig
>> Spain
>> Office: +34 96 590 9776
>>


Re: [Apertium-stuff] Apertium elections coming up.

2020-03-13 Thread Jonathan Washington
On Fri, Mar 13, 2020 at 17:43, Bernard Chardonneau wrote:
>
> > Date: Thu, 12 Mar 2020 22:20:41 +0100
> > From: Tino Didriksen
> > To: "[apertium-stuff]"
> > Reply-To: apertium-stuff@lists.sourceforge.net
> > Subject: Re: [Apertium-stuff] Apertium elections coming up.
> >
> > So far, these people have indicated they want to run for PMC members:
> > - Jonathan Washington
> > - Francis Tyers
> > - Tino Didriksen
> > - Scoop Gracie (pseudonym)
> > - Tanmai Khanna
> > - Xavi Ivars
> > ...with Mikel L. Forcada as a maybe.
> >
> > These are standing for PMC President:
> > - Francis Tyers
> > - Tino Didriksen
> >
> > These have volunteered for being the election board:
> > - Sevilay Bayatlı
> > - Hèctor Alòs i Font
> > - Daniel Swanson
> >
> > If Mikel moves from maybe to certain, then we have the minimum 7 members.
> > And then the election would be simply to determine who is the president -
> > in which case, I would yield to Francis so that we can simply avoid needing
> > to run the election. If we get more than 7 candidates so that we need to
> > run the full election anyway, I'll contest the presidency.
> >
> > Alternatively, given how the world is shutting down from Covid-19, maybe we
> > should just postpone the election until things have settled down for
> > everyone.
> >
> > -- Tino Didriksen
> >
> >
>
> Well, if there is exactly the number of PMC candidates needed for each
> of them to be elected and only one volunteer for president, there is no
> problem if the current PMC, the current president, the election board, or
> all of them decide to declare the results without running the election; I
> would not find that stupid.
>
> But using Covid-19 as a pretext to postpone the election, I find
> very strange.
>
> In different countries of the world, at least in Europe, people are being
> asked to stay at home instead of travelling to work and back every
> working day. For some of them, the reason is just to stay at home with
> children who don't go to school.
>
> Staying at home gives free time that we don't have when working.
>
> In France, from next Monday, from nursery school to university, pupils and
> students will have to stay at home, but in various schools teachers are
> being asked to stay in contact with them and to use the Internet to give
> them some work.
>
> People working in companies are, when possible, being asked to
> stay at home and to use a computer with Internet access for working.
>
> So, this illness may be an occasion for some people who travel each day to
> work or study to stay at home and to go on working or studying (certainly
> in a less intensive way) using a computer and the Internet.
>
> For a free software developer, using a computer and the Internet is nothing new.
>
> Participating in the PMC election, for people who just vote, will mean
> receiving 2 or 3 emails and sending one email to vote.
>
> That is not a lot of work, and whether one does it while travelling every
> day to the place of work or study or while staying at home will not change much.
>
> And as I said earlier, having to stay at home may even give more free time.
>
> If there were hundreds of millions of people in the world ill with Covid-19
> at the same time (5 % of the world population or more), that would be a
> valid reason to postpone this election. But we are very, very far from
> it, and in an Apertium PMC election, there are generally only around 30
> people voting.
>

I will attest that in my case, I have *less* free time in many ways.
Young children home from school adds much more in the way of
distractions and interruptions from work.  The second that any of us
start getting symptoms will make work even harder, of course.

--
Jonathan


> 
> Bernard Chardonneau (France)
> Phone : [33] 9 72 36 32 90
> GSM phone : [33] 7 69 46 16 31
>
> An alternative Apertium translation website :
> http://apertiumtrad.tuxfamily.org
>
> Multilingual websites for my free softwares :
> http://libremail.free.fr and http://libremail.tuxfamily.org
> http://cyloop.tuxfamily.org (mainly translated with Apertium)
>
> My general website (in french only)
> http://bech.free.fr
>
>


Re: [Apertium-stuff] Apertium elections coming up.

2020-03-06 Thread Jonathan Washington
And since
- Tino's running for president, and
- taking a census like this is technically the election committee's
responsibility (I believe),

the election committee should probably verify Tino's work.

--
Jonathan

On Fri, Mar 6, 2020 at 07:05, Tino Didriksen wrote:

> Combined list of everyone I can find that currently are on any of our
> lists and thus may be eligible to vote:
>
> https://docs.google.com/spreadsheets/d/1ECL_8Lkfx4A66xpHhbOTn7ljKoDcLa0w7MdFZC7DOpA
>
> Includes outside-collaborators and .mailmap. Comes out to 310 names, of
> which 308 have emails.
>
> I have also merged everything into
> https://github.com/apertium/apertium-packaging/blob/master/authors.json
>
> -- Tino Didriksen


Re: [Apertium-stuff] GSOC-2020

2020-03-05 Thread Jonathan Washington
Hi Himanshu,

Apertium generally isn't looking for "some developer" to write MT systems.
The community is usually looking for people who know a language and care
about it.

Also, there are over 150 million speakers of those languages combined, so
I'm not sure how rare it is to find developers who know one of them.

This all being said, there are interesting ways to augment Apertium with
neural methods.  I believe the community is open to projects that do this,
so if you're interested in pursuing that sort of project, there may be
viable options.

--
Jonathan

On Thu, Mar 5, 2020 at 15:33, Himanshu choudhary <
himanshuchoudhary_bt2...@dtu.ac.in> wrote:

> Ok thanks
>
> On Fri, Mar 6, 2020, 1:49 AM Scoop Gracie  wrote:
>
>> It needs to be rule based.
>>
>> On Thu, Mar 5, 2020, 12:06 Himanshu choudhary <
>> himanshuchoudhary_bt2...@dtu.ac.in> wrote:
>>
>>> Hi,
>>>
>>> I just want to ask: can't we use neural machine translation or
>>> unsupervised machine translation rather than a rule-based approach for the
>>> task "Apertium English--Hausa/Igbo/Swahili/Tigrinya/Yoruba"? I found some
>>> open-source data for some of these languages, and I believe neural machine
>>> translation can produce better results. We also wouldn't need knowledge
>>> of the morphology and grammar of each language, as it is very rare for a
>>> developer to know these languages. Please let me know whether I can
>>> proceed with this or have to use the rule-based approach, so that I can
>>> move forward.
>>>
>>> Thank you
>>>


Re: [Apertium-stuff] Apertium elections coming up.

2020-02-26 Thread Jonathan Washington
I just invited you to have write access to the Apertium phenny repo :)

This is an important discussion, and one that's come up before, but it
won't be resolved in time for the election.  So, for now, the easiest
work-around is to make sure that everyone who thinks they should be a
Committer, and who other contributors agree should be one, is granted
official Committer status by being given write access to the appropriate
repo(s).

Also, I think we should consider starting a separate
"apertium-committers" email list that's more for bureaucratic and
policy-related discussions like this and doesn't need to include
people who are just here for the language technology stuff and don't
care to be involved in policy discussions.

--
Jonathan

On Wed, Feb 26, 2020 at 13:11, Scoop Gracie wrote:
>
> I have requested write access on IRC, but never got an answer.
>
> On Wed, Feb 26, 2020, 10:09 Juan Pablo  wrote:
>>
>> Rather than amending the bylaws, wouldn't it be simpler for you (and any
>> others who may feel sidelined) to file a request to the PMC asking to be
>> granted committer status? According to bylaw 11, this is what you need:
>>
>> Bylaw 11. Committer access is received by committing code and getting 
>> sponsorship by two existing Committers, a nominator and a seconder. Upon 
>> fulfillment of these conditions, a PMC member will give write access.
>>
>> Best,
>> Juan Pablo
>>
>> On 26/02/2020 18:09, Scoop Gracie wrote:
>>
>> May users without PMC or committer status propose a PMC vote?
>>
>> Amend Bylaw 5: "The project's Committers are responsible for the project's 
>> technical management. Committers are developers who have write access to the 
>> project's source repositories, or who have contributed code to Apertium in 
>> any meaningful and significant way in the past six months. Committers may 
>> cast binding votes on any technical discussion regarding the project."
>> Amend Bylaw 23.G: "After 7 days to amend the census, a definitive census of 
>> Committers with right to vote will be published by the Election Board. Only 
>> Committers with email addresses known to the current PMC or the Election 
>> Board will be allowed to vote."
>>
>> This would include all PR contributors, as well as devs with write access. 
>> It would also ensure that only devs we can contact are included.
>>
>> On Wed, Feb 26, 2020 at 9:04 AM Scoop Gracie  wrote:
>>>
>>> Well, I consider myself a fairly active contributor, but I do not have 
>>> write access to any repo. Therefore, I am excluded from voting, even though 
>>> I am just as much an Apertium developer as many of the other devs (who get 
>>> to vote). IMHO, that seems unfair.
>>>
>>> On Wed, Feb 26, 2020 at 9:02 AM Mikel L. Forcada  wrote:

 I might be wrong, but I understood that pull requests are the way in which 
 casual developers contribute. If a developer contributes through PRs in a 
 sustained way, they should be named committers.

 Mikel


 On 26/2/20 at 17:57, Scoop Gracie wrote:

 Because now, many contributions come through pull requests. Those 
 definitions exclude any contributors who do not have write access, even if 
 they have contributed significantly to Apertium.

 On Wed, Feb 26, 2020 at 8:55 AM Mikel L. Forcada  wrote:
>
> Why is it SF-related? It talks about the project's source repositories,
> without reference to SF.
>
> Mikel
>
> On 26/2/20 at 17:49, Scoop Gracie wrote:
> > That is an outdated, SF-based definition. Shouldn't developers who
> > have submitted PRs be equally eligible?
>
> --
> Mikel L. Forcada  http://www.dlsi.ua.es/~mlf/
> Departament de Llenguatges i Sistemes Informàtics
> Universitat d'Alacant
> E-03690 Sant Vicent del Raspeig
> Spain
> Office: +34 96 590 9776
>
>
>

 --
 Mikel L. Forcada  http://www.dlsi.ua.es/~mlf/
 Departament de Llenguatges i Sistemes Informàtics
 Universitat d'Alacant
 E-03690 Sant Vicent del Raspeig
 Spain
 Office: +34 96 590 9776


Re: [Apertium-stuff] Apertium elections coming up.

2020-02-26 Thread Jonathan Washington
Agreed, that seems pretty clear.  But that's not how it works in reality.

That is, the primary maintainers of specific repositories give other people
write access, not the PMC.  E.g., when I give someone access to a Turkic
translation pair, it's not because I'm on the PMC, but because I'm a
maintainer of that pair.  I think the by-laws should probably be revised to
reflect a more realistic workflow.

Also, PMC members can certainly be pinged to give access, but does it
require a vote?  Does it require some research and/or checking with the
repo's primary maintainers?  (PMC members are not usually aware of the
social history of everyone seeking to contribute to any given language
pair, for example.)

--
Jonathan

On Wed, 26 Feb 2020 at 11:25, Mikel L. Forcada wrote:

> I think this is well defined:
>
> "18 The responsibilities of the Project Management Committee include
> […]
> Giving access rights to new Committers."
>
> Cheers
>
> Mikel
>
>
> On 26/2/20 at 17:22, Jonathan Washington wrote:
>
> That's a good start for a discussion about revising committer status in
> the bylaws, and maybe for use of a definition for the election, but it also
> raises a few questions:
>
> - How do we decide who's a member of the GitHub org?
> - How do we decide who has write access to an Apertium repo?
>
> These are decided somewhat arbitrarily at this point.
>
> Also, people decide not to include their email address on their GitHub
> profile for various reasons (though they often don't take care to mask it
> in their commit history).  I understand where you're going with this,
> though—and I think it's better stated the other way around: "for purposes
> of being contacted for voting, a committer's email address should be known
> to the PMC; they cannot expect to receive a ballot if they do not either
> have an email address posted on their GitHub profile or are a member of the
> apertium-stuff mailing list."
>
> Actually, though, admin access to manage the apertium-stuff mailing
> list has been lost, iirc, so that might be problematic too.  We could
> potentially work something out with SourceForge to regain access though?
>
> --
> Jonathan
>
> On Wed, 26 Feb 2020 at 11:03, Scoop Gracie wrote:
>
>> Also, may I suggest a definition of "committer"?
>> * All members of the Apertium GitHub org (except, obviously,
>> ApertiumBot/begiak),
>> * Anyone with write access to an Apertium repo,
>> * Anyone who has submitted and had merged a PR in the last 6 months
>> Anyone in any of the categories above must have an email address on
>> his/her GitHub profile.
>>
>> On Wed, Feb 26, 2020 at 7:27 AM Scoop Gracie 
>> wrote:
>>
>>> Okay, so it sounds like I can run.
>>>
>>> On Wed, Feb 26, 2020, 05:26 Tino Didriksen 
>>> wrote:
>>>
>>>> On Wed, 26 Feb 2020 at 13:13, Scoop Gracie 
>>>> wrote:
>>>>
>>>>> Is there any requirement for PMC members to be over 18? And, how much
>>>>> of a time commitment is this?
>>>>>
>>>>
>>>> Undefined. I would say that to be a PMC member, you just need to be 13
>>>> - that would comply with various worldwide laws.
>>>>
>>>> But, any power of the purse should be restricted to 18+, again to
>>>> comply with contract laws.
>>>>
>>>> As for time spent ... well, if previous years are any indication, less
>>>> than an hour per week on average. The vast majority of time spent on
>>>> Apertium isn't really spent on PMC-specific matters.
>>>>
>>>> -- Tino Didriksen
>>>>
>
> --
> Mikel L. Forcada  http://www.dlsi.ua.es/~mlf/
> Departament de Llenguatges i Sistemes Informàtics
> Universitat d'Alacant
> E-03690 Sant Vicent del Raspeig
> Spain
> Office: +34 96 590 9776
>


Re: [Apertium-stuff] Apertium elections coming up.

2020-02-26 Thread Jonathan Washington
That's a good start for a discussion about revising committer status in the
bylaws, and maybe for use of a definition for the election, but it also
raises a few questions:

- How do we decide who's a member of the GitHub org?
- How do we decide who has write access to an Apertium repo?

These are decided somewhat arbitrarily at this point.

Also, people decide not to include their email address on their GitHub
profile for various reasons (though they often don't take care to mask it
in their commit history).  I understand where you're going with this,
though—and I think it's better stated the other way around: "for purposes
of being contacted for voting, a committer's email address should be known
to the PMC; they cannot expect to receive a ballot if they do not either
have an email address posted on their GitHub profile or are a member of the
apertium-stuff mailing list."

Actually, though, admin access to manage the apertium-stuff mailing
list has been lost, iirc, so that might be problematic too.  We could
potentially work something out with SourceForge to regain access though?

--
Jonathan

On Wed, 26 Feb 2020 at 11:03, Scoop Gracie wrote:

> Also, may I suggest a definition of "committer"?
> * All members of the Apertium GitHub org (except, obviously,
> ApertiumBot/begiak),
> * Anyone with write access to an Apertium repo,
> * Anyone who has submitted and had merged a PR in the last 6 months
> Anyone in any of the categories above must have an email address on
> his/her GitHub profile.
>
> On Wed, Feb 26, 2020 at 7:27 AM Scoop Gracie 
> wrote:
>
>> Okay, so it sounds like I can run.
>>
>> On Wed, Feb 26, 2020, 05:26 Tino Didriksen 
>> wrote:
>>
>>> On Wed, 26 Feb 2020 at 13:13, Scoop Gracie 
>>> wrote:
>>>
>>>> Is there any requirement for PMC members to be over 18? And, how much
>>>> of a time commitment is this?
>>>>
>>>
>>> Undefined. I would say that to be a PMC member, you just need to be 13 -
>>> that would comply with various worldwide laws.
>>>
>>> But, any power of the purse should be restricted to 18+, again to comply
>>> with contract laws.
>>>
>>> As for time spent ... well, if previous years are any indication, less
>>> than an hour per week on average. The vast majority of time spent on
>>> Apertium isn't really spent on PMC-specific matters.
>>>
>>> -- Tino Didriksen
>>>


Re: [Apertium-stuff] GSoC 2020

2020-02-26 Thread Jonathan Washington
Hi Tomohiro,

Actually, my point was that there is still a lot to be done.  The work I
pointed you to is a proof of concept more than anything, and it has not
been integrated into Apertium.

If I were you, and interested in participating in GSoC, I would have a look
at those resources and try to get them running, and figure out how they
work and what the limitations are.  That will give you a good idea of what
still needs to be done.

--
Jonathan

On Wed, Feb 26, 2020, 08:41 Tomohiro Akazawa 
wrote:

> Hi Jonathan,
>
> thank you for your feedback.
> there seem to be enough implementations for Japanese.
>
> --
> Tomohiro
>
> On Wed, 26 Feb 2020 at 22:26, Jonathan Washington wrote:
>
>> Hi Tommi, all,
>>
>> A couple years ago, a Swarthmore student implemented an algorithm for
>> tokenisation of spaceless orthographies using morphological transducers.
>> She used a fork of a prototype Japanese transducer developed by another of
>> my students to evaluate it.
>>
>> The work is available at the following urls:
>>
>> https://scholarship.tricolib.brynmawr.edu/handle/10066/20002
>>
>> https://github.com/chanlon1/tokenisation
>>
>> https://github.com/chanlon1/apertium-jpn
>>
>> --
>> Jonathan
>>
>> On Wed, Feb 26, 2020, 06:38 Tomohiro Akazawa 
>> wrote:
>>
>>> Thank you for your reply.
>>> If  "improving the support of Japanese on Apertium" could be a new
>>> project on GSoC, I would find the problems of the current version of
>>> Apertium and figure out the solutions for them.
>>> Thank you.
>>>
>>> On Wed, 26 Feb 2020 at 0:47, Tommi A Pirinen wrote:
>>>
>>>> Hi all,
>>>> one thing that might be worth considering in improving support of
>>>> Japanese in Apertium is that we currently do not have any good
>>>> generic solution for word tokenisation. This especially affects
>>>> languages like Japanese, where space- and punctuation-based
>>>> tokenisation works much worse than for European languages. If you'd
>>>> be interested in formulating a project solving the tokenisation
>>>> problem, I think it would fit Apertium GSoC quite well, and if others
>>>> agree I could (co-)mentor.
>>>>
>>>> On Mon, Feb 24, 2020 at 06:12:28AM +0900, Tomohiro Akazawa wrote:
>>>> > Thank you for your reply.
>>>> > Considering there are many resources for English and Japanese,
>>>> > possibly I should change my plan.
>>>> > Thank you
>>>>
>>>>
>>>>
>>>> > On Sun, 23 Feb 2020, 23:58 Hèctor Alòs i Font wrote:
>>>> >
>>>> > > Hi Tomohiro,
>>>> > >
>>>> > > Maybe it is not the 2019 version of the application form, but the
>>>> > > 2020 one (if Apertium is selected by Google as a partner
>>>> > > organisation) should not be very different from this one:
>>>> > > http://wiki.apertium.org/wiki/Top_tips_for_GSOC_applications
>>>> > > Essentially, for a pair like English and Japanese the main questions
>>>> > > probably will be:
>>>> > >
>>>> > > * reasons why Google and Apertium should sponsor it,
>>>> > > * a description of how and who it will benefit in society,
>>>> > >
>>>> > > (essentially because both English and Japanese are well-resourced
>>>> > > languages). Imho, Okinawan-Japanese would be a much more
>>>> > > Apertium-like proposal. But, of course, I may be wrong. I should
>>>> > > maybe add that for building a translator it is not absolutely
>>>> > > necessary to be proficient in the source language. If you can read
>>>> > > it and you have access to grammars, dictionaries and informants,
>>>> > > this is usually enough. But, of course, the more you know the
>>>> > > source language (not only the target one), the better.
>>>> > >
>>>> > > Hèctor
>>>> > >
>>>> > > Message from Tomohiro Akazawa on Sun, 23 Feb 2020 at 14:27:
>>>> > >
>>>> > >>  Hello.
>>>> > >> My name is Tomohiro and I am a student of the University of Tokyo
>>>> in
>>>> > >> Jap

Re: [Apertium-stuff] Apertium elections coming up.

2020-02-26 Thread Jonathan Washington
Thanks for calling the election, Mikel.  It was pointed out on IRC during
GCI that it was time to run an election again, but everyone was busy with
GCI so it didn't receive more attention.

+1 to Tino's points.

I'll be running for PMC again, so I'll recuse myself from involvement with
the election board.

--
Jonathan

On Wed, Feb 26, 2020, 08:26 Tino Didriksen  wrote:

> On Wed, 26 Feb 2020 at 13:13, Scoop Gracie  wrote:
>
>> Is there any requirement for PMC members to be over 18? And, how much of
>> a time commitment is this?
>>
>
> Undefined. I would say that to be a PMC member, you just need to be 13 -
> that would comply with various worldwide laws.
>
> But, any power of the purse should be restricted to 18+, again to comply
> with contract laws.
>
> As for time spent ... well, if previous years are any indication, less than
> an hour per week on average. The vast majority of time spent on Apertium
> isn't really spent on PMC-specific matters.
>
> -- Tino Didriksen
>


Re: [Apertium-stuff] GSoC 2020

2020-02-26 Thread Jonathan Washington
Hi Tommi, all,

A couple years ago, a Swarthmore student implemented an algorithm for
tokenisation of spaceless orthographies using morphological transducers.
She used a fork of a prototype Japanese transducer developed by another of
my students to evaluate it.
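
As a rough illustration of the general idea only (this is not the
student's algorithm, and a plain Python set stands in here for a real
morphological transducer), the sketch below enumerates every way of
splitting a spaceless string into units that a lexicon accepts.  The
function name and the toy lexicon are hypothetical.

```python
from typing import List, Set


def segmentations(text: str, lexicon: Set[str]) -> List[List[str]]:
    """Return every split of `text` into pieces that are all in `lexicon`."""
    n = len(text)
    # results[i] holds all valid segmentations of the suffix text[i:]
    results: List[List[List[str]]] = [[] for _ in range(n + 1)]
    results[n] = [[]]  # the empty suffix has exactly one (empty) segmentation
    # Work backwards so every suffix is solved before it is needed
    for start in range(n - 1, -1, -1):
        for end in range(start + 1, n + 1):
            piece = text[start:end]
            if piece in lexicon:
                for rest in results[end]:
                    results[start].append([piece] + rest)
    return results[0]


# Toy lexicon (an assumption for illustration); a real system would ask
# a transducer whether each candidate piece has an analysis.
TOY_LEXICON = {"私", "は", "学生", "学", "生", "です"}

for candidate in segmentations("私は学生です", TOY_LEXICON):
    print(" ".join(candidate))
```

With this toy lexicon the input has two candidate segmentations, one
treating 学生 as a single unit and one splitting it in two.  Choosing
between such candidates is the hard part that a real tokeniser has to
solve.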

The work is available at the following urls:

https://scholarship.tricolib.brynmawr.edu/handle/10066/20002

https://github.com/chanlon1/tokenisation

https://github.com/chanlon1/apertium-jpn

--
Jonathan

On Wed, Feb 26, 2020, 06:38 Tomohiro Akazawa 
wrote:

> Thank you for your reply.
> If  "improving the support of Japanese on Apertium" could be a new project
> on GSoC, I would find the problems of the current version of Apertium and
> figure out the solutions for them.
> Thank you.
>
> On Wed, 26 Feb 2020 at 0:47, Tommi A Pirinen wrote:
>
>> Hi all,
>> one thing that might be worth considering in improving support of
>> Japanese in Apertium is that we currently do not have any good
>> generic solution for word tokenisation. This especially affects
>> languages like Japanese, where space- and punctuation-based
>> tokenisation works much worse than for European languages. If you'd
>> be interested in formulating a project solving the tokenisation
>> problem, I think it would fit Apertium GSoC quite well, and if others
>> agree I could (co-)mentor.
>>
>> On Mon, Feb 24, 2020 at 06:12:28AM +0900, Tomohiro Akazawa wrote:
>> > Thank you for your reply.
>> > Considering there are many resources for English and Japanese,
>> > possibly I should change my plan.
>> > Thank you
>>
>>
>>
>> > On Sun, 23 Feb 2020, 23:58 Hèctor Alòs i Font wrote:
>> >
>> > > Hi Tomohiro,
>> > >
>> > > Maybe it is not the 2019 version of the application form, but the
>> > > 2020 one (if Apertium is selected by Google as a partner
>> > > organisation) should not be very different from this one:
>> > > http://wiki.apertium.org/wiki/Top_tips_for_GSOC_applications
>> > > Essentially, for a pair like English and Japanese the main questions
>> > > probably will be:
>> > >
>> > > * reasons why Google and Apertium should sponsor it,
>> > > * a description of how and who it will benefit in society,
>> > >
>> > > (essentially because both English and Japanese are well-resourced
>> > > languages). Imho, Okinawan-Japanese would be a much more
>> > > Apertium-like proposal. But, of course, I may be wrong. I should
>> > > maybe add that for building a translator it is not absolutely
>> > > necessary to be proficient in the source language. If you can read
>> > > it and you have access to grammars, dictionaries and informants,
>> > > this is usually enough. But, of course, the more you know the
>> > > source language (not only the target one), the better.
>> > >
>> > > Hèctor
>> > >
>> > > Message from Tomohiro Akazawa on Sun, 23 Feb 2020 at 14:27:
>> > >
>> > >> Hello.
>> > >> My name is Tomohiro and I am a student at the University of Tokyo in
>> > >> Japan.
>> > >> Looking at Apertium's ideas list for GSoC 2020, I found "Adopt an
>> > >> unreleased language pair" interesting.
>> > >> Do you think it is possible to make a language pair between English
>> > >> and Japanese?
>> > >> Thank you very much.
>> > >
>>
>>
>>
>>
>> --
>> Doktor Tommi A Pirinen, Computational Linguist,
>> <https://flammie.github.io/purplemonkeydishwasher/>, Universität
>> Hamburg, Hamburger Zentrum für Sprachkorpora <http://hzsk.de>. CLARIN-D
>> Entwickler.  President of ACL SIGUR SIG for Uralic languages
>> <http://gtweb.uit.no/sigur/>.
>> I tend to follow inline-posting style in desktop e-mail messages.


Re: [Apertium-stuff] GSOC: Ideas for Google Summer of Code Page Up-to-Date?

2020-02-23 Thread Jonathan Washington
Hi Nadia,

The page is still being edited to be up-to-date.

The anaphora resolution project was completed last year, and will need
to be removed from the ideas page.

This year a related project idea might be to integrate anaphora
resolution into several Apertium translation pairs.  Another option
might be a system that learns anaphora rules.

--
Jonathan

On Sun, 23 Feb 2020 at 22:22, Nadia Sheikh wrote:
>
> Hi,
>
> I am a little confused. Is this page up to date?
> Specifically: is anaphora resolution a project that can still be carried
> out?
>
> Best,
>
> Nadia


Re: [Apertium-stuff] GSoC--Apertium Website Development

2020-02-23 Thread Jonathan Washington
Hi Mohit,

As with all GSoC projects, the best way to show your interest in the
project and your ability to complete it is to start working on it.  In
this case, this would mean starting by tackling a few open issues in
apertium-html-tools
(https://github.com/apertium/apertium-html-tools/issues) and
apertium-apy (https://github.com/apertium/apertium-apy/issues).

--
Jonathan

On Sun, 23 Feb 2020 at 18:49, Mohit Kumar Verma wrote:
>
> Hi,
>
> My name is Mohit Kumar Verma, and I am currently studying at NIT Hamirpur,
> Himachal Pradesh, India. I would like to work with you on further
> developing the Apertium website and adding new features. From browsing the
> website, it seems that someone started this work in the past but left it
> unfinished. I would like to continue the work and make the website
> visually appealing while getting the job done with minimal data
> consumption.
>
> Thank You.


Re: [Apertium-stuff] Need your guidance for GSoC 2020

2020-02-23 Thread Jonathan Washington
Hi Arzoo,

This mailing list isn't necessarily the venue for convincing GSoC
mentors that this project idea is worthwhile.

However, I will give you some feedback on it:

First of all, the page you linked to says that there is no single
Pahari language.  Other places I investigated said the same thing.
You'll need to be more specific.

Second of all, you'll need to find a mentor who's both able to
supervise such a project (e.g., knows sufficient Hindi or a related
language) and interested.

And most importantly, you'll need to complete the coding
challenge—i.e., create a prototype system—to both learn everything
that's involved and demonstrate to the mentor community that you
understand those things.

--
Jonathan

On Fri, 21 Feb 2020 at 12:02, Arzoo wrote:
>
> Hello everyone,
>
> After going through the Apertium website, contributing guidelines, past
> projects and GSoC 2020 ideas, I decided to develop a new language
> translation pair. Currently, there is no machine translation system which
> can translate to/from the 'PAHARI' language. Pahari is spoken in India in
> the states of Himachal Pradesh, Uttarakhand, Jammu & Kashmir, and (some
> parts of) Punjab. It would help society as a whole by maintaining
> diversity. So, I would like to develop an Apertium machine translation
> pair either for Pahari and Hindi or for Pahari and English. Adding a new
> language (i.e., Pahari) will definitely open the door in the future for
> new translation pairs between Pahari and other languages. I believe that
> my work will be beneficial for society and also for future development.
> Looking forward to your valuable feedback.
>
> Thanks and regards,
> Arzoo
>
> On Fri, Feb 21, 2020 at 1:48 AM Jonathan Washington 
>  wrote:
>>
>> Other places to check are here:
>> http://wiki.apertium.org/wiki/Contributing
>>
>> and here:
>> http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code
>>
>> , the latter of which needs some updates.
>>
>> --
>> Jonathan
>>
>> On Thu, 20 Feb 2020 at 14:52, Sevilay Bayatlı wrote:
>> >
>> > Hi Arzoo,
>> >
>> > Welcome to Apertium. You should read about Apertium at
>> > https://github.com/apertium and then decide what you can contribute.
>> >
>> > Best regards,
>> >
>> > Sevilay
>> >
>> > On Thu, Feb 20, 2020 at 10:39 PM Arzoo  wrote:
>> >>
>> >> Good evening everyone,
>> >>
>> >> My name is Arzoo. I am a fifth-year student at the National Institute of 
>> >> Technology, Hamirpur pursuing a dual degree program (B.Tech + M.Tech) in 
>> >> Computer Science and Engineering. I am looking to take part in GSoC 2020,
>> >> and for that I need a mentor for guidance. Please help and give directions
>> >> so that I can contribute something solid to this GSoC.
>> >>
>> >> Thanks and Regards,
>> >> Arzoo
>> >>
>> >> --
>> >> Arzoo
>> >> Computer Science and Engineering Department
>> >> National Institute of Technology, Hamirpur
>> >> +91-9971718061
>
>
>
> --
> Arzoo
> Computer Science and Engineering Department
> National Institute of Technology, Hamirpur
> +91-9971718061


Re: [Apertium-stuff] Need your guidance for GSoC 2020

2020-02-20 Thread Jonathan Washington
Other places to check are here:
http://wiki.apertium.org/wiki/Contributing

and here:
http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code

, the latter of which needs some updates.

--
Jonathan

On Thu, 20 Feb 2020 at 14:52, Sevilay Bayatlı wrote:
>
> Hi Arzoo,
>
> Welcome to Apertium. You should read about Apertium at
> https://github.com/apertium and then decide what you can contribute.
>
> Best regards,
>
> Sevilay
>
> On Thu, Feb 20, 2020 at 10:39 PM Arzoo  wrote:
>>
>> Good evening everyone,
>>
>> My name is Arzoo. I am a fifth-year student at the National Institute of 
>> Technology, Hamirpur pursuing a dual degree program (B.Tech + M.Tech) in 
>> Computer Science and Engineering. I am looking to take part in GSoC 2020,
>> and for that I need a mentor for guidance. Please help and give directions
>> so that I can contribute something solid to this GSoC.
>>
>> Thanks and Regards,
>> Arzoo
>>
>> --
>> Arzoo
>> Computer Science and Engineering Department
>> National Institute of Technology, Hamirpur
>> +91-9971718061


Re: [Apertium-stuff] Apertium GSoC 2020? Deadline Feb 5th

2020-02-08 Thread Jonathan Washington
Hi James,

There's still no news on whether Apertium will be accepted into GSoC,
so no one knows for sure whether they will mentor or not yet.  More to
the point, who mentors will depend to some extent on what student
projects are selected for the summer and which prospective mentors are
available and interested in mentoring specific projects.

At this point there are only prospective mentors.

--
Jonathan

On Sat, 8 Feb 2020 at 11:12, James sandy wrote:
>
> Hello Tommi,
>
> Are you mentoring for Apertium this year? If so, I'd love to talk to you.
>
> On Thu, Feb 6, 2020 at 12:35 PM Tommi A Pirinen 
>  wrote:
>>
>> On Tue, Feb 04, 2020 at 01:51:38PM -0500, Jonathan Washington wrote:
>>
>> > Everyone else interested in mentoring, please let me know.
>>
>> I can co-mentor again this year, but with a very randomly varying schedule
>> while I'm possibly moving between jobs or places.
>>
>> --
>> Doktor Tommi A Pirinen, Computational Linguist,
>> <https://flammie.github.io/purplemonkeydishwasher/>, Universität
>> Hamburg, Hamburger Zentrum für Sprachkorpora <http://hzsk.de>. CLARIN-D
>> Entwickler.  President of ACL SIGUR SIG for Uralic languages
>> <http://gtweb.uit.no/sigur/>.
>> I tend to follow inline-posting style in desktop e-mail messages.


Re: [Apertium-stuff] Apertium GSoC 2020? Deadline Feb 5th

2020-02-04 Thread Jonathan Washington
Hi James,

Prospective GSoC mentors need to have sufficient experience in
Apertium to oversee student work.  Could you clarify what your
experience with Apertium is?

My apologies if you've been an active contributor—there's a chance I
may have simply not encountered your contributions, or may recognise
them under a different name or username.

--
Jonathan

On Tue, 4 Feb 2020 at 15:00, James sandy wrote:
>
> Hello Jonathan,
>
> I'm interested in mentoring for GSoC this year with Apertium.
>
> On Tuesday, February 4, 2020, Jonathan Washington 
>  wrote:
>>
>> Hi all,
>>
>> Thanks to those of you who have offered to help with GSoC.
>>
>> Everyone else interested in mentoring, please let me know.
>>
>> Also note the following draft of information for the application:
>> wiki.apertium.org/wiki/Google_Summer_of_Code/Application_2020
>>
>> --
>> Jonathan
>>
>> On Mon, 3 Feb 2020 at 23:14, Aléssio Miranda Jr wrote:
>> >
>> > Hello everyone,
>> >
>> > My name is Aléssio (Brazil). I participated as a student in GSoC in
>> > 2009 with a project that aimed to develop a friendlier graphical
>> > interface to assist with the expansion and management of dictionaries.
>> > At that time, the tool I developed did not have enough functionality to
>> > be used by the community, and in the last 3 years it was common to
>> > receive questions about it by email.
>> > For GSoC 2019, I submitted with my student Vinicius a new proposal
>> > for this collaborative web interface, but it was not accepted as a
>> > suggestion. Despite this, we worked in 2019 on a prototype that validated
>> > ways to manage the dictionaries through a web interface, applying changes
>> > directly to Git.
>> > This functional prototype is available and I would like to discuss its
>> > potential. In 2020 I will be ready to mentor this project, with another
>> > student, if the community believes it can make a difference.
>> >
>> > Vinicius is working on documentation of this prototype for discussion.
>> >
>> > hug...
>> >
>> > On Mon, Feb 3, 2020 at 8:34 PM Gianfranco Fronteddu  
>> > wrote:
>> >>
>> >> Hi,
>> >>
>> >> My name is Gianfranco Fronteddu. Just as Xavi said, to whom I send my 
>> >> regards, I would be happy to be a mentor for this edition of the GSoC. I 
>> >> was a student in the 2016 and 2017 editions of GSoC for the Italian-Sardinian
>> >> and Catalan-Sardinian pairs. Both pairs have been released. In the last 
>> >> few years I've worked in research on MT (my master thesis was written 
>> >> both in Italian and Sardinian on the description of my Apertium projects 
>> >> and is being published). At the moment I'm a PhD student in translation 
>> >> technologies at the Autonomous University of Barcelona. It would be an 
>> >> honor to be able to continue contributing to the development of Apertium.
>> >>
>> >> On Mon, 3 Feb 2020 at 19:29, Xavi Ivars wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> I'd be happy to help with GSoC, and happy to mentor any website or
>> >>> Romance language pair related project.
>> >>>
>> >>> I already know that Gianfranco Fronteddu (he was a GSoC student a few
>> >>> years ago, with Sardinian-related pairs) wants to be a mentor as well.
>> >>>
>> >>>
>> >>>
>> >>> Message from Amr Mohamed Hosny Anwar on Tue, 28 Jan 2020 at 23:07:
>> >>>>
>> >>>> On 1/28/20 6:37 PM, Francis Tyers wrote:
>> >>>> > El 2020-01-28 15:08, Tino Didriksen escribió:
>> >>>> >> https://summerofcode.withgoogle.com/ is open for organization
>> >>>> >> applications, until February 5th.
>> >>>> >>
>> >>>> >> Are we participating in 2020? Who's up for mentoring and
>> >>>> >> administrating?
>> >>>> >>
>> >>>> >> Previous years at this stage we've at least talked about it, but this
>> >>>> >> year it's been rather silent. Students are as usual already looking 
>> >>>> >> at
>> >>>> >> the ideas page and asking how to get started.
>> >>>> >

Re: [Apertium-stuff] Apertium GSoC 2020? Deadline Feb 5th

2020-02-04 Thread Jonathan Washington
Hi all,

Thanks to those of you who have offered to help with GSoC.

Everyone else interested in mentoring, please let me know.

Also note the following draft of information for the application:
wiki.apertium.org/wiki/Google_Summer_of_Code/Application_2020

--
Jonathan

On Mon, 3 Feb 2020 at 23:14, Aléssio Miranda Jr wrote:
>
> Hello everyone,
>
> My name is Aléssio (Brazil), I participated as a student at GSOC in 2009 
> with a project that aimed to develop a more friendly interface to assist the 
> expansion and management of dictionaries with a graphical interface. At that 
> time, the tool I developed did not have enough functionality to be used by 
> the community and in the last 3 years it was common to receive questions 
> about it in my email.
> At GSoC 2019, my student Vinicius and I submitted a new proposal for
> this collaborative web interface, but it was not accepted as a suggestion.
> Despite this, we worked in 2019 on a prototype that validated ways to
> manage the dictionaries through a web interface and apply changes directly to Git.
> This functional prototype is available and I would like to discuss its
> potential. In 2020 I will be ready to mentor this project, with another
> student if the community believes he can make a difference.
>
> Vinicius is working on documentation of this prototype for discussion.
>
> hug...
>
> On Mon, Feb 3, 2020 at 8:34 PM Gianfranco Fronteddu  wrote:
>>
>> Hi,
>>
>> My name is Gianfranco Fronteddu. Just as Xavi said, to whom I send my 
>> regards, I would be happy to be a mentor for this edition of the GSoC. I was 
>> a student in the 2016 and 2017 GSoC editions for the Italian-Sardinian and
>> Catalan-Sardinian pairs. Both pairs have been released. In the last few
>> years I've worked in research on MT (my master's thesis was written both in
>> Italian and Sardinian on the description of my Apertium projects and is 
>> being published). At the moment I'm a PhD student in translation 
>> technologies at the Autonomous University of Barcelona. It would be an honor 
>> to be able to continue contributing to the development of Apertium.
>>
>> On Mon, 3 Feb 2020 at 19:29, Xavi Ivars wrote:
>>>
>>> Hi,
>>>
>>> I'd be happy to help with GSoC, and happy to mentor any website and Romance
>>> language pair related project.
>>>
>>> I already know that Gianfranco Fronteddu (he was a GSoC student a few years ago,
>>> with Sardinian-related pairs) wants to be a mentor as well.
>>>
>>>
>>>
>>> On Tue, 28 Jan 2020 at 23:07, Amr Mohamed Hosny Anwar wrote:
>>>>
>>>> On 1/28/20 6:37 PM, Francis Tyers wrote:
>>>> > On 2020-01-28 15:08, Tino Didriksen wrote:
>>>> >> https://summerofcode.withgoogle.com/ is open for organization
>>>> >> applications, until February 5th.
>>>> >>
>>>> >> Are we participating in 2020? Who's up for mentoring and
>>>> >> administrating?
>>>> >>
>>>> >> Previous years at this stage we've at least talked about it, but this
>>>> >> year it's been rather silent. Students are as usual already looking at
>>>> >> the ideas page and asking how to get started.
>>>> >>
>>>> >> -- Tino Didriksen
>>>> >
>>>> > I'm potentially willing to mentor this summer, but I cannot be an admin
>>>> > this year due to prior commitments. I hope to be able to resume next
>>>> > year!
>>>> >
>>>> > Fran
>>>> >
>>>> As a GSoC student with Apertium, I really liked the experience and I
>>>> believe it's a great opportunity for university students to contribute
>>>> to Apertium.
>>>> I can help with tasks like archiving ideas that were already done in
>>>> GSoC 2019.
>>>>
>>>> Amr
>>>
>>>
>>>
>>> --
>>> < Xavi Ivars >
>>> < http://xavi.ivars.me >


___
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists

Re: [Apertium-stuff] Apertium GSoC 2020? Deadline Feb 5th

2020-01-28 Thread Jonathan Washington
I've been eyeing the deadline, and it's been on my to-do list to bring
it up.  But I've also been busy with other things and kind of waiting
for the last of the GCI admin stuff to be out of the way.

In any case, I'm willing to be an org admin for GSoC—it's a little bit
lower pressure than for GCI imo.  As in the recent past, I can take
the lead on the application, but I'll need a little support in the
form of help with institutional knowledge about previous GSoCs.  So if
Fran(, Mikel, etc.) can help me with that, we'll just need one more
person to help out as an official org admin.

Everyone else should be thinking about whether they might want to be
mentors, and also what sorts of projects they'd want to mentor (and
adding them to the ideas page¹!).

¹ http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code

--
Jonathan

On Tue, 28 Jan 2020 at 11:37, Francis Tyers wrote:
>
> On 2020-01-28 15:08, Tino Didriksen wrote:
> > https://summerofcode.withgoogle.com/ is open for organization
> > applications, until February 5th.
> >
> > Are we participating in 2020? Who's up for mentoring and
> > administrating?
> >
> > Previous years at this stage we've at least talked about it, but this
> > year it's been rather silent. Students are as usual already looking at
> > the ideas page and asking how to get started.
> >
> > -- Tino Didriksen
>
> I'm potentially willing to mentor this summer, but I cannot be an admin
> this year due to prior commitments. I hope to be able to resume next
> year!
>
> Fran


___
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff


Re: [Apertium-stuff] Bootstrapping a new language pair – Hanging Problem

2020-01-14 Thread Jonathan Washington
Hi Zanga,

I believe you're running into a known issue with the tagger as apertium-init
sets it up:
https://github.com/apertium/apertium-init/issues/31

The fix is to add -x to the apertium-tagger lines in your bilingual
modes.xml.  A work-around is to just remove the apertium-tagger blocks
entirely.
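
As a rough illustration (not taken from the linked issue), the change in the
bilingual modes.xml might look something like the sketch below. The existing
arguments (-g $2) and the file name nya-yao.prob are assumptions about what
apertium-init generated; only the added -x comes from the fix described above,
so check your own file for the exact line.

```xml
<!-- Before (assumed shape of the generated tagger step) -->
<program name="apertium-tagger -g $2">
  <file name="nya-yao.prob"/>
</program>

<!-- After: the same step with -x added -->
<program name="apertium-tagger -x -g $2">
  <file name="nya-yao.prob"/>
</program>
```

The same edit would apply to each mode in the file that calls apertium-tagger.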

Let us know if you continue having trouble with this.

--
Jonathan

On Mon, Jan 13, 2020, 06:21 Hèctor Alòs i Font  wrote:

> Hi Zanga,
>
> I've downloaded your code and compiled it. It doesn't hang on my computer:
> https://drive.google.com/open?id=16GLQjMIlwJx2Dtx585JFpmvV61dUrpUH
>
> According to your screenshot you have a problem in apertium-tagger.
> Seemingly, you get a result if you type:
>
> $ echo "houses" | apertium -d . nya-yao-disam
> ^houses/house$^./.$
>
> But not if you type:
>
> $ echo "houses" | apertium -d . nya-yao-tagger
> ^house$^.$
>
> So, it seems there is a problem in your installation of Apertium.
>
> Best,
> Hèctor
>
> On Mon, 13 Jan 2020 at 9:41, Zanga Chimombo wrote:
>
>> I have followed the guidelines on bootstrapping a new language pair
>> (with no existing monolingual packages) at:
>> http://wiki.apertium.org/wiki/How_to_bootstrap_a_new_pair
>>
>> I have made one change only to one of the monolingual language
>> dictionaries, replacing “house” with “casa” in a couple of places in
>> order to be able to test the new pair. My code is at:
>> https://gitlab.com/zangaphee/CiBantu
>>
>> However, I am encountering the hanging problem. When I attempt the
>> tests suggested in the README file:
>> echo "House" | apertium -d . nya-yao
>> echo "casas" | apertium -d . yao-nya
>> It just hangs. I don’t know what it is doing.
>>
>> Apertium-Viewer is quite helpful but I am unable to interpret the
>> screenshot uploaded to the following location:
>> https://gitlab.com/snippets/1929316
>>
___
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff


Re: [Apertium-stuff] Kurmanji apertium data upload

2019-12-21 Thread Jonathan Washington
Hi Sakine,

If I understand right, you'd like to use a web interface to analyse some
Kurmancî data?

If so, you can try this:
https://beta.apertium.org/?choice=kmr#analyzation

If that's not what you're after, could you clarify?

--
Jonathan

On Thu, Dec 19, 2019, 19:31 Sakine Cabuk Balli 
wrote:

> Dear Apertium team,
> I am working on the acquisition of Kurmanji as a first language. I have
> transcribed my data in ELAN (
> https://tla.mpi.nl/tools/tla-tools/elan/download/). I would like to use
> apertium kmr for my data of Kurmanji Kurdish and do POS and morphological
> tagging of the data. Could you please help me upload my data by
> creating a web tool to transfer/upload the data into apertium kmr?
> Best regards,
> Sakine
>
>
>
>
>
>
> -
> Sakine Çabuk Ballı
> University of Zurich
> Department of Comparative Linguistics
> Plattenstrasse 54
> CH-8032 Zurich
>
___
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff


Re: [Apertium-stuff] Special Issue on Machine Translation for Low-Resource Languages (MT Journal)

2019-11-26 Thread Jonathan Washington
Okay, I submitted the EOI, and sent an email with details to the
preliminary author list.  If anyone is an established Apertium
contributor (major contributions, previous publications, or similar)
and didn't get the details email but would like to be involved,
there's still time—just let me know.

--
Jonathan

On Tue, 26 Nov 2019 at 08:26, Jonathan Washington wrote:
>
> I can take care of it in an hour or two.  Thanks for the reminder, Ilnar, and 
> for the organisational help, Sevilay!
>
> --
> Jonathan
>
> On Tue, Nov 26, 2019, 06:59 Sevilay Bayatlı  wrote:
>>
>> Hi,
>> I think we do; we already have the list of authors and a draft of the paper.
>> We need someone to submit a title, list of authors, and
>> a short description.
>>
>> Sevilay
>>
>>
>> On Wed, Nov 20, 2019 at 8:05 PM Jonathan Washington 
>>  wrote:
>>>
>>> Hi all,
>>>
>>> This is just a reminder that the expression of interest for this
>>> volume is due in less than a week!
>>>
>>> The expression of interest is easy: just a title, list of authors, and
>>> a short description.
>>>
>>> If anyone else would like to help out with the updated Apertium paper
>>> that we're planning to submit, then please get in touch.
>>>
>>> --
>>> Jonathan
>>>
>>> On Fri, 1 Nov 2019 at 22:21, Jonathan Washington wrote:
>>> >
>>> > Hi all,
>>> >
>>> > Below please find a revised CFP for the Machine Translation Special
>>> > Issue on MT for Low-Resource Languages.
>>> >
>>> > =
>>> > CALL FOR PAPERS: Machine Translation Journal
>>> > Special Issue on Machine Translation for Low-Resource Languages
>>> > https://www.springer.com/computer/ai/journal/10590/
>>> >
>>> > GUEST EDITORS (Listed alphabetically)
>>> > • Alina Karakanta (FBK-Fondazione Bruno Kessler)
>>> > • Audrey N. Tong (NIST)
>>> > • Chao-Hong Liu (ADAPT Centre/Dublin City University)
>>> > • Ian Soboroff (NIST)
>>> > • Jonathan Washington (Swarthmore College)
>>> > • Oleg Aulov (NIST)
>>> > • Xiaobing Zhao (Minzu University of China)
>>> >
>>> > Machine translation (MT) technologies have been improved significantly
>>> > in the last two decades, with developments in phrase-based statistical
>>> > MT (SMT) and recently neural MT (NMT). However, most of these methods
>>> > rely on the availability of large parallel data for training the MT
>>> > systems, resources which are not available for the majority of
>>> > language pairs, and hence current technologies often fall short in
>>> > their ability to be applied to low-resource languages. Developing MT
>>> > technologies using relatively small corpora still presents a major
>>> > challenge for the MT community. In addition, many methods for
>>> > developing MT systems still rely on several natural language
>>> > processing (NLP) tools to pre-process texts in source languages and
>>> > post-process MT outputs in target languages. The performance of these
>>> > tools often has a great impact on the quality of the resulting
>>> > translation. The availability of MT technologies and NLP tools can
>>> > facilitate equal access to information for the speakers of a language
>>> > and determine on which side of the digital divide they will end up.
>>> > The lack of these technologies for many of the world's languages
>>> > provides opportunities both for the field to grow and for making tools
>>> > available for speakers of low-resource languages.
>>> >
>>> > In recent years, several workshops and evaluations have been organized
>>> > to promote research on low-resource languages. NIST has been
>>> > conducting Low Resource Human Language Technology evaluations
>>> > (LoReHLT) annually from 2016 to 2019. In LoReHLT evaluations, there is
>>> > no training data in the evaluation language. Participants receive
>>> > training data in related languages, but need to bootstrap systems in
>>> > the surprise evaluation language at the start of the evaluation.
>>> > Methods for this include pivoting approaches and taking advantage of
>>> > linguistic universals. The evaluations are supported by DARPA's Low
>>> > Resource Languages for Emergent Incidents (LORELEI) program, which
>>> > seeks to advance tech

Re: [Apertium-stuff] Special Issue on Machine Translation for Low-Resource Languages (MT Journal)

2019-11-26 Thread Jonathan Washington
I can take care of it in an hour or two.  Thanks for the reminder, Ilnar,
and for the organisational help, Sevilay!

--
Jonathan

On Tue, Nov 26, 2019, 06:59 Sevilay Bayatlı 
wrote:

> Hi,
> I think we do; we already have the list of authors and a draft of the
> paper. We need someone to submit a title, list of authors, and
> a short description.
>
> Sevilay
>
>
> On Wed, Nov 20, 2019 at 8:05 PM Jonathan Washington <
> jonathan.n.washing...@gmail.com> wrote:
>
>> Hi all,
>>
>> This is just a reminder that the expression of interest for this
>> volume is due in less than a week!
>>
>> The expression of interest is easy: just a title, list of authors, and
>> a short description.
>>
>> If anyone else would like to help out with the updated Apertium paper
>> that we're planning to submit, then please get in touch.
>>
>> --
>> Jonathan
>>
>> On Fri, 1 Nov 2019 at 22:21, Jonathan Washington wrote:
>> >
>> > Hi all,
>> >
>> > Below please find a revised CFP for the Machine Translation Special
>> > Issue on MT for Low-Resource Languages.
>> >
>> > =
>> > CALL FOR PAPERS: Machine Translation Journal
>> > Special Issue on Machine Translation for Low-Resource Languages
>> > https://www.springer.com/computer/ai/journal/10590/
>> >
>> > GUEST EDITORS (Listed alphabetically)
>> > • Alina Karakanta (FBK-Fondazione Bruno Kessler)
>> > • Audrey N. Tong (NIST)
>> > • Chao-Hong Liu (ADAPT Centre/Dublin City University)
>> > • Ian Soboroff (NIST)
>> > • Jonathan Washington (Swarthmore College)
>> > • Oleg Aulov (NIST)
>> > • Xiaobing Zhao (Minzu University of China)
>> >
>> > Machine translation (MT) technologies have been improved significantly
>> > in the last two decades, with developments in phrase-based statistical
>> > MT (SMT) and recently neural MT (NMT). However, most of these methods
>> > rely on the availability of large parallel data for training the MT
>> > systems, resources which are not available for the majority of
>> > language pairs, and hence current technologies often fall short in
>> > their ability to be applied to low-resource languages. Developing MT
>> > technologies using relatively small corpora still presents a major
>> > challenge for the MT community. In addition, many methods for
>> > developing MT systems still rely on several natural language
>> > processing (NLP) tools to pre-process texts in source languages and
>> > post-process MT outputs in target languages. The performance of these
>> > tools often has a great impact on the quality of the resulting
>> > translation. The availability of MT technologies and NLP tools can
>> > facilitate equal access to information for the speakers of a language
>> > and determine on which side of the digital divide they will end up.
>> > The lack of these technologies for many of the world's languages
>> > provides opportunities both for the field to grow and for making tools
>> > available for speakers of low-resource languages.
>> >
>> > In recent years, several workshops and evaluations have been organized
>> > to promote research on low-resource languages. NIST has been
>> > conducting Low Resource Human Language Technology evaluations
>> > (LoReHLT) annually from 2016 to 2019. In LoReHLT evaluations, there is
>> > no training data in the evaluation language. Participants receive
>> > training data in related languages, but need to bootstrap systems in
>> > the surprise evaluation language at the start of the evaluation.
>> > Methods for this include pivoting approaches and taking advantage of
>> > linguistic universals. The evaluations are supported by DARPA's Low
>> > Resource Languages for Emergent Incidents (LORELEI) program, which
>> > seeks to advance technologies that are less dependent on large data
>> > resources and that can be quickly pivoted to new languages within a
>> > very short amount of time so that information from any language can be
>> > extracted in a timely manner to provide situation awareness to
>> > emergent incidents. There are also the Workshop on Technologies for MT
>> > of Low-Resource Languages (LoResMT) and the Workshop on Deep Learning
>> > Approaches for Low-Resource Natural Language Processing (DeepLo),
>> > which provide a venue for sharing research and working on the research
>> > and development in this field.
>> >
>> >

[Apertium-stuff] 2nd CFP: Special Issue on Machine Translation for Low-Resource Languages (MT Journal)

2019-11-21 Thread Jonathan Washington
CALL FOR PAPERS: Machine Translation Journal
Special Issue on Machine Translation for Low-Resource Languages
https://www.springer.com/journal/10590/updates/17293708

NOTICE
1. Expression of interest (EOI) by next Tuesday (26 Nov 2019):
https://forms.gle/mAQH4qaPTuzDhEceA
2. Paper submission deadline: 25 Feb 2020.

GUEST EDITORS (Listed alphabetically)
• Alina Karakanta (FBK-Fondazione Bruno Kessler)
• Audrey N. Tong (NIST)
• Chao-Hong Liu (ADAPT Centre/Dublin City University)
• Ian Soboroff (NIST)
• Jonathan Washington (Swarthmore College)
• Oleg Aulov (NIST)
• Xiaobing Zhao (Minzu University of China)

Machine translation (MT) technologies have been improved significantly
in the last two decades, with developments in phrase-based statistical
MT (SMT) and recently neural MT (NMT). However, most of these methods
rely on the availability of large parallel data for training the MT
systems, resources which are not available for the majority of
language pairs, and hence current technologies often fall short in
their ability to be applied to low-resource languages. Developing MT
technologies using relatively small corpora still presents a major
challenge for the MT community. In addition, many methods for
developing MT systems still rely on several natural language
processing (NLP) tools to pre-process texts in source languages and
post-process MT outputs in target languages. The performance of these
tools often has a great impact on the quality of the resulting
translation. The availability of MT technologies and NLP tools can
facilitate equal access to information for the speakers of a language
and determine on which side of the digital divide they will end up.
The lack of these technologies for many of the world's languages
provides opportunities both for the field to grow and for making tools
available for speakers of low-resource languages.

In recent years, several workshops and evaluations have been organized
to promote research on low-resource languages. NIST has been
conducting Low Resource Human Language Technology evaluations
(LoReHLT) annually from 2016 to 2019. In LoReHLT evaluations, there is
no training data in the evaluation language. Participants receive
training data in related languages, but need to bootstrap systems in
the surprise evaluation language at the start of the evaluation.
Methods for this include pivoting approaches and taking advantage of
linguistic universals. The evaluations are supported by DARPA's Low
Resource Languages for Emergent Incidents (LORELEI) program, which
seeks to advance technologies that are less dependent on large data
resources and that can be quickly pivoted to new languages within a
very short amount of time so that information from any language can be
extracted in a timely manner to provide situation awareness to
emergent incidents. There are also the Workshop on Technologies for MT
of Low-Resource Languages (LoResMT) and the Workshop on Deep Learning
Approaches for Low-Resource Natural Language Processing (DeepLo),
which provide a venue for sharing research and working on the research
and development in this field.

This special issue solicits original research papers on MT
systems/methods and related NLP tools for low-resource languages in
general. LoReHLT, LORELEI, LoResMT and DeepLo participants are very
welcome to submit their work to the special issue. Summary papers on
MT research for specific low-resource languages, as well as extended
versions (>40% difference) of published papers from relevant
conferences/workshops are also welcome.

Topics of the special issue include but are not limited to:
 * Research and review papers of MT systems/methods for low-resource languages
 * Research and review papers of pre-processing and/or post-processing
NLP tools for MT
 * Word tokenizers/de-tokenizers for low-resource languages
 * Word/morpheme segmenters for low-resource languages
 * Use of morphological analyzers and/or morpheme segmenters in MT
 * Multilingual/cross-lingual NLP tools for MT
 * Review of available corpora of low-resource languages for MT
 * Pivot MT for low-resource languages
 * Zero-shot MT for low-resource languages
 * Fast building of MT systems for low-resource languages
 * Re-usability of existing MT systems and/or NLP tools for
low-resource languages
 * Machine translation for language preservation
 * Techniques that work across many languages and modalities
 * Techniques that are less dependent on large data resources
 * Use of language-universal resources
 * Bootstrap trained resources for short development cycle
 * Entity-, relation- and event-extraction
 * Sentiment detection
 * Summarization
 * Processing diverse languages, genres (news, social media, etc.) and
modalities (text, speech, video, etc.)

IMPORTANT DATES
November 26, 2019: Expression of interest (EOI)
February 25, 2020: Paper submission deadline
July 7, 2020: Camera-ready papers due
December, 2020: Publication

SUBMISSION GUIDELINES
o For EOI, please s

Re: [Apertium-stuff] Special Issue on Machine Translation for Low-Resource Languages (MT Journal)

2019-11-20 Thread Jonathan Washington
Hi all,

This is just a reminder that the expression of interest for this
volume is due in less than a week!

The expression of interest is easy: just a title, list of authors, and
a short description.

If anyone else would like to help out with the updated Apertium paper
that we're planning to submit, then please get in touch.

--
Jonathan

On Fri, 1 Nov 2019 at 22:21, Jonathan Washington wrote:
>
> Hi all,
>
> Below please find a revised CFP for the Machine Translation Special
> Issue on MT for Low-Resource Languages.
>
> =
> CALL FOR PAPERS: Machine Translation Journal
> Special Issue on Machine Translation for Low-Resource Languages
> https://www.springer.com/computer/ai/journal/10590/
>
> GUEST EDITORS (Listed alphabetically)
> • Alina Karakanta (FBK-Fondazione Bruno Kessler)
> • Audrey N. Tong (NIST)
> • Chao-Hong Liu (ADAPT Centre/Dublin City University)
> • Ian Soboroff (NIST)
> • Jonathan Washington (Swarthmore College)
> • Oleg Aulov (NIST)
> • Xiaobing Zhao (Minzu University of China)
>
> Machine translation (MT) technologies have been improved significantly
> in the last two decades, with developments in phrase-based statistical
> MT (SMT) and recently neural MT (NMT). However, most of these methods
> rely on the availability of large parallel data for training the MT
> systems, resources which are not available for the majority of
> language pairs, and hence current technologies often fall short in
> their ability to be applied to low-resource languages. Developing MT
> technologies using relatively small corpora still presents a major
> challenge for the MT community. In addition, many methods for
> developing MT systems still rely on several natural language
> processing (NLP) tools to pre-process texts in source languages and
> post-process MT outputs in target languages. The performance of these
> tools often has a great impact on the quality of the resulting
> translation. The availability of MT technologies and NLP tools can
> facilitate equal access to information for the speakers of a language
> and determine on which side of the digital divide they will end up.
> The lack of these technologies for many of the world's languages
> provides opportunities both for the field to grow and for making tools
> available for speakers of low-resource languages.
>
> In recent years, several workshops and evaluations have been organized
> to promote research on low-resource languages. NIST has been
> conducting Low Resource Human Language Technology evaluations
> (LoReHLT) annually from 2016 to 2019. In LoReHLT evaluations, there is
> no training data in the evaluation language. Participants receive
> training data in related languages, but need to bootstrap systems in
> the surprise evaluation language at the start of the evaluation.
> Methods for this include pivoting approaches and taking advantage of
> linguistic universals. The evaluations are supported by DARPA's Low
> Resource Languages for Emergent Incidents (LORELEI) program, which
> seeks to advance technologies that are less dependent on large data
> resources and that can be quickly pivoted to new languages within a
> very short amount of time so that information from any language can be
> extracted in a timely manner to provide situation awareness to
> emergent incidents. There are also the Workshop on Technologies for MT
> of Low-Resource Languages (LoResMT) and the Workshop on Deep Learning
> Approaches for Low-Resource Natural Language Processing (DeepLo),
> which provide a venue for sharing research and working on the research
> and development in this field.
>
> This special issue solicits original research papers on MT
> systems/methods and related NLP tools for low-resource languages in
> general. LoReHLT, LORELEI, LoResMT and DeepLo participants are very
> welcome to submit their work to the special issue. Summary papers on
> MT research for specific low-resource languages, as well as extended
> versions (>40% difference) of published papers from relevant
> conferences/workshops are also welcome.
>
> Topics of the special issue include but are not limited to:
>  * Research and review papers of MT systems/methods for low-resource languages
>  * Research and review papers of pre-processing and/or post-processing
> NLP tools for MT
>  * Word tokenizers/de-tokenizers for low-resource languages
>  * Word/morpheme segmenters for low-resource languages
>  * Use of morphological analyzers and/or morpheme segmenters in MT
>  * Multilingual/cross-lingual NLP tools for MT
>  * Review of available corpora of low-resource languages for MT
>  * Pivot MT for low-resource languages
>  * Zero-shot MT for low-resource languages
>  * Fast building of MT systems for low-resource langua

Re: [Apertium-stuff] Help

2019-11-14 Thread Jonathan Washington
Hi Kiran,

Did you try the modes fix that Tino suggested?  What are the contents of
your modes.xml file?

--
Jonathan

On Mon, Nov 11, 2019, 21:45 kiran srigiri  wrote:

> Not working !
>
> -Kiran srigiri
>
> On Thu, 7 Nov 2019, 22:41 Ngadou Yopa,  wrote:
>
>> Hey Srigiri
>>
>> Try this `echo "house" | apertium -d . eng-hau`
>>
>> Cheers
>>
>> On Thu, 7 Nov 2019 at 12:10, kiran srigiri  wrote:
>>
>>> I am stuck at this screen when I run translate command. Somebody please
>>> help
___
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff


Re: [Apertium-stuff] Contact

2019-11-06 Thread Jonathan Washington
Hi Kiran,

GSoC 2020 has not been announced to my knowledge, and it's certainly
still too early for Apertium to apply.

We're happy to discuss your progress on the pair, though, and help you
understand what we'd expect for a GSoC application.  Feel free to
discuss it with any recent GSoC mentors on IRC.

--
Jonathan

On Wed, 6 Nov 2019 at 22:10, kiran srigiri wrote:
>
> I will upload all the files to my GitHub repo today and share the link with 
> you.
>
> On Thu, 7 Nov 2019, 08:13 kiran srigiri,  wrote:
>>
>> I want to propose a project for GSoC 2020: a new language pair, English to
>> Hausa. I have compiled two monolingual pairs (English and Hausa) that still
>> need some work. I hope you can help get this proposal selected.
>> I did documentation work for Hausa in Google Code-in 2018, which you can find here:
>>
>> https://codein.withgoogle.com/archive/2018/task/6465621755691008/
>>
>>
>> On Thu, 7 Nov 2019, 07:21 Jonathan Washington, 
>>  wrote:
>>>
>>> Hi Kiran,
>>>
>>> A number of us on this list have mentored GSoC before.  What's the
>>> question?  Feel free to ask on #apertium as well.
>>>
>>> --
>>> Jonathan
>>>
>>> On Tue, 5 Nov 2019 at 07:39, kiran srigiri wrote:
>>> >
>>> > Any Google summer of code - Gsoc mentor I can talk with?


___
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff


Re: [Apertium-stuff] Contact

2019-11-06 Thread Jonathan Washington
Hi Kiran,

A number of us on this list have mentored GSoC before.  What's the
question?  Feel free to ask on #apertium as well.

--
Jonathan

On Tue, 5 Nov 2019 at 07:39, kiran srigiri wrote:
>
> Is there any Google Summer of Code (GSoC) mentor I can talk with?


Re: [Apertium-stuff] Updating the Apertium 2-page brochure

2019-11-06 Thread Jonathan Washington
Aha!  The organisation repo indeed might be the right place for these
things.  What do others think?

--
Jonathan

On Wed, 6 Nov 2019 at 15:04, Xavi Ivars wrote:
>
> I like the idea of having all these materials in a "promotional-materials" 
> repo (even if not sure if it should go to a folder inside 
> https://github.com/apertium/organisation instead)
>
> Message from Mikel L. Forcada on Tue, 5 Nov 2019 at 11:28:
>>
>> Agreed! And if it is generated from source (e.g. dot), the source should
>> also be somewhere there.
>>
>> The same with the block diagram.
>>
>> Mikel
>>
>>
>> On 5/11/19 at 16:53, Jonathan Washington wrote:
>> > Hi Mikel,
>> >
>> > I've been thinking recently that we also should put the Apertium SVG
>> > logo in the repos somewhere.  I wonder if this sort of thing should be
>> > grouped together (maybe in a "promotional materials" repo?) or
>> > separately.
>> >
>> > What do other people familiar with Apertium's GitHub organisation think?
>> >
>> > Also consider this overarching organisational concept:
>> > https://apertium.github.io/apertium-on-github/source-browser.html
>> >
>> > --
>> > Jonathan
>> >
>> > On Mon, 4 Nov 2019 at 08:19, Mikel L. Forcada wrote:
>> >> Dear all,
>> >>
>> >> Next month I will be travelling to the LT4All meeting 
>> >> (https://lt4all.elra.info/en/), where Apertium is a sponsor 
>> >> (https://lt4all.elra.info/en/sponsors/sponsors/).
>> >>
>> >> Attached is a brochure in flat ODT (.fodt, a text file, nicer for
>> >> versioning), which is probably 2 years old and needs updating.
>> >>
>> >> We would have to:
>> >>
>> >> * Push this somewhere in GitHub.com/apertium, but where?
>> >> * Update it (there are two graphs there).
>> >>
>> >> I'd appreciate it very much if:
>> >>
>> >> * Someone could push it in our repositories and tell us where, so that I
>> >>   can start playing with it, and
>> >> * People who know apertium better let me know of obvious changes that need
>> >>   to be done (bearing in mind that this has to be two pages).
>> >>
>> >> Thanks a million!
>> >>
>> >> All the best,
>> >>
>> >> Mikel
>> >>
>> >> --
>> >>   Mikel L. Forcada (http://www.dlsi.ua.es/~mlf/)
>> >> Departament de Llenguatges i Sistemes Informàtics
>> >> Universitat d'Alacant
>> >> E-03071 Alacant, Spain
>> >> Phone: +34 96 590 9776
>> >> Fax: +34 96 590 9326
>> >>
>>
>> --
>> Mikel L. Forcada  http://www.dlsi.ua.es/~mlf/
>> Departament de Llenguatges i Sistemes Informàtics
>> Universitat d'Alacant
>> E-03690 Sant Vicent del Raspeig
>> Spain
>> Office: +34 96 590 9776
>>
>>
>>
>
>
>
> --
> < Xavi Ivars >
> < http://xavi.ivars.me >


Re: [Apertium-stuff] Updating the Apertium 2-page brochure

2019-11-05 Thread Jonathan Washington
Hi Mikel,

I've been thinking recently that we also should put the Apertium SVG
logo in the repos somewhere.  I wonder if this sort of thing should be
grouped together (maybe in a "promotional materials" repo?) or
separately.

What do other people familiar with Apertium's GitHub organisation think?

Also consider this overarching organisational concept:
https://apertium.github.io/apertium-on-github/source-browser.html

--
Jonathan

On Mon, 4 Nov 2019 at 08:19, Mikel L. Forcada wrote:
>
> Dear all,
>
> Next month I will be travelling to the LT4All meeting 
> (https://lt4all.elra.info/en/), where Apertium is a sponsor 
> (https://lt4all.elra.info/en/sponsors/sponsors/).
>
> Attached is a brochure in flat ODT (.fodt, a text file, nicer for
> versioning), which is probably 2 years old and needs updating.
>
> We would have to:
>
> * Push this somewhere in GitHub.com/apertium, but where?
> * Update it (there are two graphs there).
>
> I'd appreciate it very much if:
>
> * Someone could push it in our repositories and tell us where, so that I can
>   start playing with it, and
> * People who know apertium better let me know of obvious changes that need to
>   be done (bearing in mind that this has to be two pages).
>
> Thanks a million!
>
> All the best,
>
> Mikel
>
> --
>  Mikel L. Forcada (http://www.dlsi.ua.es/~mlf/)
> Departament de Llenguatges i Sistemes Informàtics
> Universitat d'Alacant
> E-03071 Alacant, Spain
> Phone: +34 96 590 9776
> Fax: +34 96 590 9326
>


Re: [Apertium-stuff] [Apertium-contact] Utilgjengelige språk

2019-11-03 Thread Jonathan Washington
Hi Nebojsa,

First of all, my apologies for not being able to respond in Norwegian.  I
would definitely have preferred to.

One reason other languages would be greyed out is because Apertium has a
limited number of released language pairs with English.

To use unreleased translation pairs, you can try http://beta.apertium.org ,
and you can also check "multi-step translation" for more pairs.

That said, there might be a bug here—the list of pairs with English
currently available seems very low.  I'm including the main Apertium
community mailing list on this message to see if anyone has any thoughts on
this.

--
Jonathan


On Sat, Nov 2, 2019, 14:29 Nebojsa Simic  wrote:

> Hi,
>
> On the website (
> https://www.apertium.org/index.nob.html?dir=eng-cat#translation) it is
> only possible to translate into these languages: Esperanto, Galician,
> Catalan and Spanish. All other languages are unavailable, as can be seen
> in the attached screenshot. Why are the other languages "disabled"?
>
> Best regards,
>
> NS
>
>
>
> --
>
> Nebojsa Simic
>
> Associate Professor
>
> Norwegian University of Science and Technology (NTNU)
>
> Department of Chemistry, Høgskoleringen 5
>
> Natural Science Building, Office E2-122
>
> NO-7491 Trondheim, Norway
>
> Tel. +47 735 90 829
>
> Fax: +47 735 96 255
>
>
> ___
> Apertium-contact mailing list
> apertium-cont...@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/apertium-contact
>


Re: [Apertium-stuff] Error with apertium-init

2019-11-02 Thread Jonathan Washington
Hi Kiran,

Try python3?
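
That traceback is what Python 2 prints when a source file contains non-ASCII
bytes and has no encoding declaration; Python 3 reads source files as UTF-8 by
default.  As a rough sketch (not something apertium-init does, as far as I
know), a script can fail with a clearer message by declaring its encoding on
the first line, so that Python 2 can at least parse it, and then checking the
interpreter version.  The name require_python3 and the message text are made
up for illustration:

```python
# -*- coding: utf-8 -*-
# The line above lets Python 2 parse a file that contains non-ASCII text
# (PEP 263); without it, Python 2 stops with the SyntaxError quoted below.
import sys


def require_python3(version_info=sys.version_info):
    """Return an error message when run under Python 2, otherwise None.

    Hypothetical helper for illustration; not part of apertium-init.
    """
    if version_info[0] < 3:
        return "This script needs Python 3; try running it with python3."
    return None


message = require_python3()
if message is not None:
    # Exit with the message on stderr and a non-zero status.
    sys.exit(message)
```

Running the same command with python3 instead of python avoids the problem
altogether.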

--
Jonathan

On Sat, Nov 2, 2019, 21:45 kiran srigiri  wrote:

> $~/Desktop/apertium eng-hau prototype$ python apertium-init.py eng-hau
> File "apertium-init.py", line 34
> SyntaxError: Non-ASCII character '\xca' in file apertium-init.py on line
> 34, but no encoding declared; see http://python.org/dev/peps/pep-0263/
> for details


[Apertium-stuff] Special Issue on Machine Translation for Low-Resource Languages (MT Journal)

2019-11-01 Thread Jonathan Washington
Hi all,

Below please find a revised CFP for the Machine Translation Special
Issue on MT for Low-Resource Languages.

=
CALL FOR PAPERS: Machine Translation Journal
Special Issue on Machine Translation for Low-Resource Languages
https://www.springer.com/computer/ai/journal/10590/

GUEST EDITORS (Listed alphabetically)
• Alina Karakanta (FBK-Fondazione Bruno Kessler)
• Audrey N. Tong (NIST)
• Chao-Hong Liu (ADAPT Centre/Dublin City University)
• Ian Soboroff (NIST)
• Jonathan Washington (Swarthmore College)
• Oleg Aulov (NIST)
• Xiaobing Zhao (Minzu University of China)

Machine translation (MT) technologies have been improved significantly
in the last two decades, with developments in phrase-based statistical
MT (SMT) and recently neural MT (NMT). However, most of these methods
rely on the availability of large parallel data for training the MT
systems, resources which are not available for the majority of
language pairs, and hence current technologies often fall short in
their ability to be applied to low-resource languages. Developing MT
technologies using relatively small corpora still presents a major
challenge for the MT community. In addition, many methods for
developing MT systems still rely on several natural language
processing (NLP) tools to pre-process texts in source languages and
post-process MT outputs in target languages. The performance of these
tools often has a great impact on the quality of the resulting
translation. The availability of MT technologies and NLP tools can
facilitate equal access to information for the speakers of a language
and determine on which side of the digital divide they will end up.
The lack of these technologies for many of the world's languages
provides opportunities both for the field to grow and for making tools
available for speakers of low-resource languages.

In recent years, several workshops and evaluations have been organized
to promote research on low-resource languages. NIST has been
conducting Low Resource Human Language Technology evaluations
(LoReHLT) annually from 2016 to 2019. In LoReHLT evaluations, there is
no training data in the evaluation language. Participants receive
training data in related languages, but need to bootstrap systems in
the surprise evaluation language at the start of the evaluation.
Methods for this include pivoting approaches and taking advantage of
linguistic universals. The evaluations are supported by DARPA's Low
Resource Languages for Emergent Incidents (LORELEI) program, which
seeks to advance technologies that are less dependent on large data
resources and that can be quickly pivoted to new languages within a
very short amount of time so that information from any language can be
extracted in a timely manner to provide situation awareness to
emergent incidents. There are also the Workshop on Technologies for MT
of Low-Resource Languages (LoResMT) and the Workshop on Deep Learning
Approaches for Low-Resource Natural Language Processing (DeepLo),
which provide a venue for sharing research and working on the research
and development in this field.

This special issue solicits original research papers on MT
systems/methods and related NLP tools for low-resource languages in
general. LoReHLT, LORELEI, LoResMT and DeepLo participants are very
welcome to submit their work to the special issue. Summary papers on
MT research for specific low-resource languages, as well as extended
versions (>40% difference) of published papers from relevant
conferences/workshops are also welcome.

Topics of the special issue include but are not limited to:
 * Research and review papers of MT systems/methods for low-resource languages
 * Research and review papers of pre-processing and/or post-processing
NLP tools for MT
 * Word tokenizers/de-tokenizers for low-resource languages
 * Word/morpheme segmenters for low-resource languages
 * Use of morphological analyzers and/or morpheme segmenters in MT
 * Multilingual/cross-lingual NLP tools for MT
 * Review of available corpora of low-resource languages for MT
 * Pivot MT for low-resource languages
 * Zero-shot MT for low-resource languages
 * Fast building of MT systems for low-resource languages
 * Re-usability of existing MT systems and/or NLP tools for
low-resource languages
 * Machine translation for language preservation
 * Techniques that work across many languages and modalities
 * Techniques that are less dependent on large data resources
 * Use of language-universal resources
 * Bootstrap trained resources for short development cycle
 * Entity-, relation- and event-extraction
 * Sentiment detection
 * Summarization
 * Processing diverse languages, genres (news, social media, etc.) and
modalities (text, speech, video, etc.)

IMPORTANT DATES
November 26, 2019: Expression of interest (EOI)
February 25, 2020: Paper submission deadline
July 7, 2020: Camera-ready papers due
December, 2020: Publication

SUBMISSION GUIDELINES
o For EOI, please submit via the link: http

[Apertium-stuff] call for mentors for Google Code-In

2019-10-24 Thread Jonathan Washington
Greetings, Apertiumers!

Our application for Google Code-In is due on Monday (the 28th).  If
you'd like to be a mentor, please do the following before then:

1. Add your name to tasks that you can mentor at the Task Ideas page:
http://wiki.apertium.org/wiki/Task_ideas_for_Google_Code-in

2. If there are tasks you'd like to mentor that aren't on the ideas
page, feel free to add them!  You can see the talk page for tasks from
previous years.  A good place to start for coming up with tasks could
be the issue tracker for your favourite Apertium repository!

3. Send an email to apertium-gci-mentors-2...@googlegroups.com to
express your interest in being a mentor and being added to that
mailing list.  (If you'd like to be an org admin, mention that too.
Currently it's me, Fran, and Mikel, but having an additional backup
wouldn't hurt!)

Please note: all mentors should have experience with some part of
Apertium (i.e., should ideally have committed code to Apertium in the
past).

Sorry for the short notice!

--
Jonathan




Re: [Apertium-stuff] Adding invariant prefixes

2019-09-17 Thread Jonathan Washington
On Tue, Sep 17, 2019, 14:58 Kevin Brubeck Unhammer 
wrote:

>
> The upside is that you can combine words without listing everything
> twice. If you've only got one prefix, the HFST-like method is probably
> better. If you're combining lots, compounding may be worth considering.
>

We can and do implement compounding of that sort in HFST transducers too :)

Normally we don't bother separating derivational morphemes (in Turkic
transducers), though, unless they're extremely productive.

Jaume, are you planning on using this for translation or something else?
If for translation, how do you anticipate it improving translation quality?

--
Jonathan



Re: [Apertium-stuff] genv*dix.py need conversion

2019-09-12 Thread Jonathan Washington
Thanks, Kevin.  Yes, that's the approach I use.  It's not well-documented,
but it's pretty straightforward.

Do you know anything about the metalrx.py script that Tino mentioned?

--
Jonathan


On Thu, Sep 12, 2019, 13:47 Kevin Brubeck Unhammer 
wrote:

> Jonathan Washington wrote:
>
> > I'm curious about metalrx.py.  Where is it used?  What does it do?  Is it
> > documented anywhere?  I've been using an xslt file I found in a Sámi pair
> > for creating lrx from metalrx.
>
> The Sámi pairs use an XSLT script
>
> https://github.com/apertium/apertium-sme-sma/blob/master/metalrx-to-lrx.xslt
> that adds two extra functions to lrx files:
>
> 1. , see example at
>
> https://github.com/apertium/apertium-sme-sma/blob/master/apertium-sme-sma.sme-sma.metalrx#L3
>for often-used sequences (that file just has a single , but you
>could have several in a row)
>
> 2.  you can wrap around a  or  or  to
>repeat it up to n times:
>
> https://github.com/apertium/apertium-sme-sma/blob/master/apertium-sme-sma.sme-sma.metalrx#L2184
>


Re: [Apertium-stuff] genv*dix.py need conversion

2019-09-12 Thread Jonathan Washington
I'm curious about metalrx.py.  Where is it used?  What does it do?  Is it
documented anywhere?  I've been using an xslt file I found in a Sámi pair
for creating lrx from metalrx.

--
Jonathan

On Thu, Sep 12, 2019, 02:19 Tino Didriksen  wrote:

> Maybe I can do it myself, then. 2to3 did most of it, but I just gave up
> when the result didn't work immediately.
>
> I would absolutely say they belong in apertium's apertium-dev subpackage.
> The scripts may be written in Python, but they don't rely on the
> apertium-python API - they could be written in any language. Hence why they
> must also be installed without the .py suffix.
>
> I would move them into apertium first, make sure they get installed and
> work, then adjust the languages.
>
> Centralization also applies to metalrx.py and any other helper script that
> gets used across multiple languages/pairs.
>
> -- Tino Didriksen
>
>
> On Thu, 12 Sep 2019 at 06:49, Xavi Ivars  wrote:
>
>> I won't be able to do anything before September 20th (on vacation,
>> without computer around).
>>
>> Once I get there, I'll change the scripts. Not sure how to do it so they
>> "become" part of the apertium package, so I'll try to do it in multiple
>> steps: first convert to python3 and then move them out.
>>
>> Would it be better to create a python package to contain these types of
>> scripts, instead of bundling everything into apertium?
>>
>>
>> --
>> Xavi Ivars
>>


Re: [Apertium-stuff] TransducerHasWrongTypeException

2019-09-04 Thread Jonathan Washington
Hi Zanga,

Normally we don't use HFST transducers (.hfst) in bilingual modules.  Is
there a reason you'd like to use it instead of lttoolbox transducers (.bin)?

If you do want to use it in place of the lttoolbox transducer, as Fran
suggests, Makefile.am is definitely where you'd want to specify pulling it
in from the monolingual repo(s).  And yes, you'd also want to add it to the
modes.xml file.  You'll see in both files, though, that the default is
lttoolbox transducers.
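
The exception in your message comes from giving hfst-proc an lttoolbox file
(YYY-XXX.automorf.bin).  Purely as an illustration, here is a small Python
sketch that picks the processor from the file suffix.  The suffix table is my
assumption about common naming conventions (the real format depends on how the
file was compiled, not on its name), and processor_for is a hypothetical
helper, not part of any Apertium tool:

```python
from pathlib import Path

# Assumed naming convention: which command-line processor reads which
# compiled transducer.  The suffix is only a hint about the file format.
PROCESSOR_BY_SUFFIX = {
    ".bin": "lt-proc",       # lttoolbox transducers
    ".hfst": "hfst-proc",    # HFST transducers
    ".hfstol": "hfst-proc",  # HFST optimized-lookup transducers
}


def processor_for(transducer_path):
    """Return the name of the processor expected to read this file."""
    suffix = Path(transducer_path).suffix
    if suffix not in PROCESSOR_BY_SUFFIX:
        raise ValueError("unrecognised transducer suffix: %r" % suffix)
    return PROCESSOR_BY_SUFFIX[suffix]


if __name__ == "__main__":
    for name in ("YYY-XXX.automorf.bin", "XXX.automorph.hfst"):
        print(name, "->", processor_for(name))
```

So for step-by-step debugging in the bilingual directory, the .bin files there
go through lt-proc, and hfst-proc is only for files compiled by HFST.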

--
Jonathan

On Wed, 4 Sept 2019 at 13:06, Francis Tyers wrote:

> On 2019-09-04 17:39, Zanga Chimombo wrote:
> > Will do shortly although there's not much as I am just getting
> > started. I have seen the pipeline in the modes.xml file so I will also
> > pick through that. Thanks!
> >
> > On Wed, Sep 4, 2019 at 6:35 PM Francis Tyers 
> > wrote:
> >>
> >> On 2019-09-04 10:51, Zanga Chimombo wrote:
> >> > I am following the instructions at the How_to_bootstrap_a_new_pair and
> >> > The_quick_and_dirty_guide_to_making_a_new_language_pair wikis using
> >> > HFST for both languages. After a bit of tweaking the following test
> >> > command in the bilingual directory works:
> >> > echo "Houses" | apertium -d . XXX-YYY
> >> > Casas
> >> > echo "Casas" | apertium -d . YYY-XXX
> >> > Houses
> >> >
> >> > I then attempted to follow the instructions at the
> >> > Lexc_and_flag_diacritics_for_prefix_tagging wiki in order to manage
> >> > the prefixing. The tests above continue to work (I didn’t remove their
> >> > respective lines from the .lexc files). However, none of the new noun
> >> > roots work. E.g.
> >> > echo "Abambo" | apertium -d . XXX-YYY
> >> > * Abambo
> >> > echo "Acibaba" | apertium -d . YYY-XXX
> >> > * Acibaba
> >> >
> >> > When I run a few hfst commands in each monolingual directory, however,
> >> > it shows that the required reordering of the POS tags is working. E.g.
> >> > echo "Abambo" | hfst-proc -w XXX.automorph.hfst
> >> > ^Abambo/bambo$
> >> > echo "Acibaba" | hfst-proc -w YYY.automorph.hfst
> >> > ^Acibaba/baba$
> >> >
> >> > To continue debugging, I would like to go step by step through the
> >> > rest of the translation pipeline in order to pinpoint where the
> >> > problem is (using hfst-proc instead of lt-proc as defined in modes.xml
> >> > file)! However, there are no *.hfst files in the XXX-YYY bilingual
> >> > directory! And when I try:
> >> > echo "Acibaba" | hfst-proc -w YYY-XXX.automorf.bin
> >> >
> >> > I get the following error:
> >> > terminate called after throwing an instance of
> >> > 'TransducerHasWrongTypeException'
> >> >
> >> > How do I generate *.hfst files in the XXX-YYY bilingual directory? Or,
> >> > what is it that I am missing?
> >> >
> >> >
> >>
> >> Dear Zanga,
> >>
> >> Could you direct us to a location (e.g. GitHub or GitLab) where we
> >> could
> >> download your code and try it out? It would be much easier to debug.
> >>
> >> Regards,
> >>
> >> Fran
>
> You probably also want to look at the Makefile.am in that package
> and in other packages that use HFST.
>
> Fran
>
>


Re: [Apertium-stuff] Portuguese-Catalan (almost) ready for packaging

2019-08-29 Thread Jonathan Washington
Hi Hèctor,

Then what I might recommend is to import the history of apertium-por-cat
into (on top of) the history of apertium-pt-ca, and then rename the latter
to replace the former.  This might take some effort (might need to be done
file by file?), and certainly some git foo.

Does anyone have suggestions for how to go about doing this?

Btw, I believe nuboro was a GCI student.  There may have been a task to
convert trunk pairs to use 3-letter codes.  But it sounds like it was done
tentatively, and not using best practices.

--
Jonathan

On Thu, Aug 29, 2019, 01:44 Hèctor Alòs i Font  wrote:

> Hi Jonathan,
>
> Sure apertium-por-cat is a continuation of the old apertium-pt-ca. There
> is a vast amount of fine work in apertium-pt-ca and it can be found in
> apertium-por-cat. But I did not create apertium-por-cat. I found it. It
> seems it was created by the user nuboro  in January 2016, maybe with the
> help of Fran, who also has a commit then (
> https://github.com/apertium/apertium-por-cat/commits/master?after=e918d92eef977808f5542db47c549e27832aa0ed+209
>  ).
> I think that the problem is "only" with the history of apertium-pt-ca in
> GitHub if it is removed.
> By the way, the AUTHORS file in apertium-por-cat was empty, which seemed
> strange to me. I now see that the AUTHORS file in apertium-pt-ca has a
> number of lines, so I am surprised that this particular file in
> apertium-por-cat was not copied from apertium-pt-ca. I'll now add the
> old AUTHORS information to the new package.
>
> Hèctor
>
> Message from Jonathan Washington on Thu, 29 Aug 2019 at 5:59:
>
>> Hi Hèctor,
>>
>> Do I understand that apertium-por-cat is not a continuation of the code
>> from apertium-pt-ca?
>>
>> I would have expected that the code is a continuation (or branch) of the
>> old repo.  If this were the case, pushing the new code to the
>> apertium-pt-ca repo and renaming it to apertium-por-cat would be the ideal
>> approach.
>>
>> If instead we're dealing with two separate and unrelated code bases (what
>> it sounds like), then we need to figure out a way to keep the old code
>> around but make sure people don't try to use it for anything unless they
>> know that it's been superseded.  A two-repo solution is not ideal in this
>> case...
>>
>> --
>> Jonathan
>>
>> On Wed, 28 Aug 2019 at 08:55, Kevin Brubeck Unhammer wrote:
>>
>>> Hèctor Alòs i Font wrote:
>>>
>>> > Thanks, Kevin!
>>> > About the other question, I have no idea, but the case of apertium-ca-it
>>> > (which has to be substituted by apertium-cat-ita) is the same. I cannot
>>> > understand what has been done with it:
>>> > https://github.com/search?q=apertium-ca-it
>>>
>>> It seems that ca-it has been *renamed* cat-ita, so now
>>> https://github.com/apertium/apertium-ca-it redirects to
>>> https://github.com/apertium/apertium-cat-ita
>>>
>>> > By the way, apertium-por-cat should probably be labeled as trunk.
>>>
>>> done :)


Re: [Apertium-stuff] Portuguese-Catalan (almost) ready for packaging

2019-08-28 Thread Jonathan Washington
Hi Hèctor,

Do I understand that apertium-por-cat is not a continuation of the code
from apertium-pt-ca?

I would have expected that the code is a continuation (or branch) of the
old repo.  If this were the case, pushing the new code to the
apertium-pt-ca repo and renaming it to apertium-por-cat would be the ideal
approach.

If instead we're dealing with two separate and unrelated code bases (what
it sounds like), then we need to figure out a way to keep the old code
around but make sure people don't try to use it for anything unless they
know that it's been superseded.  A two-repo solution is not ideal in this
case...

--
Jonathan

On Wed, 28 Aug 2019 at 08:55, Kevin Brubeck Unhammer wrote:

> Hèctor Alòs i Font wrote:
>
> > Thanks, Kevin!
> > About the other question, I have no idea, but the case of apertium-ca-it
> > (which has to be substituted by apertium-cat-ita) is the same. I cannot
> > understand what has been done with it:
> > https://github.com/search?q=apertium-ca-it
>
> It seems that ca-it has been *renamed* cat-ita, so now
> https://github.com/apertium/apertium-ca-it redirects to
> https://github.com/apertium/apertium-cat-ita
>
> > By the way, apertium-por-cat should probably be labeled as trunk.
>
> done :)


Re: [Apertium-stuff] GCI 2019 Mentorship

2019-08-27 Thread Jonathan Washington
Hi Andi,

The main requirement on our side for being a mentor is that you've
contributed before and you're capable of mentoring some subset of tasks.
So there should be no problem :)

Of course this is all contingent on Apertium being accepted this year.  If
we're accepted, we'll reach out and invite you and anyone else who's
expressed interest and is qualified.  And of course feel free to ping any
org admin on IRC or by email if things are moving along and you think we
might've forgotten.

--
Jonathan

On Tue, Aug 27, 2019, 10:31 Andi Qu  wrote:

> Hi everyone
>
> My name is Andi Qu and I am interested in mentoring for Apertium for GCI
> 2019. I have previously worked with Apertium during GCI 2017 and GCI 2018,
> so I have a decent understanding of how Apertium works
>
> Please let me know what additional steps there are to becoming a mentor
> either through email or through the IRC (nick: dolphingarlic)
>
> Regards
> Andi


[Apertium-stuff] recursive transfer mode name

2019-07-22 Thread Jonathan Washington
Hi everyone,

The recursive transfer GSoC project is progressing nicely, and it's
anticipated that a release will occur at the end of GSoC.  It's very
exciting!

We'd like to add debug mode support to Apertium for recursive transfer, via
the following file:
https://github.com/apertium/apertium/blob/master/apertium/modes2debugmodes.xsl

Debug modes are what you get when you run e.g., "apertium -d .
abc-xyz-morph" or "abc-xyz-biltrans" or the like.

The question is what to call the debug mode.

I've been calling it "-rectransfer" on my own machine while I test the
module this summer.  I see two disadvantages of this name.  First of all,
it's a bit longer and less abbreviated than most names.  Also, it's a
drop-in replacement for the shallow (.t*x) transfer module, so really it's
just "the transfer module" for any pair that uses it (see
http://wiki.apertium.org/wiki/Pipeline for the updated pipeline).

For this reason, we could just call the mode "-transfer", but maybe users
could get confused if they're using the shallow transfer module and
expecting "-transfer" to work there.  (Perhaps we could set it up to work
for either, regardless?).  Also, there's been some talk of adding a .t*x
file after the recursive transfer module in certain pairs to make certain
things "easier" (definitely subjective and open for debate), so another
mode would be needed in these cases (if anyone were to ever try to set up a
pair in such a way).

So we're opening this issue to discussion.  We'd love to reach some sort of
consensus on this, and community input is crucial to do so.

--
Jonathan


Re: [Apertium-stuff] Ôdp: Ôdp: [Pd: Implementing apertium-html-tools]

2019-07-21 Thread Jonathan Washington
We support plenty of other languages that don't have two-letter codes (chv,
kaa, ...), and we've never had to do it that way to my knowledge.  Maybe
Sushain can clarify?
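
To make the question concrete, here is a rough Python sketch of the kind of
lookup I'd expect, using the two tables from the message quoted below.  This
is not html-tools' actual code (that is JavaScript, and I haven't checked how
it does the lookup); display_name is a hypothetical function, and the 'chv'
entry is added by me as an example of a language with no two-letter code:

```python
# Display names keyed by language code.  'af' mirrors the quoted example;
# 'chv' is keyed directly by its three-letter code (assumption for the sketch).
LANGUAGES = {"af": "Afrikaans", "chv": "Chuvash"}

# Three-letter codes that have a two-letter equivalent.
ISO639_CODES = {"afr": "af"}


def display_name(code):
    """Return a display name for a language code.

    The code is first mapped to its two-letter form when one exists;
    if no name is known, the code itself is returned unchanged.
    """
    short = ISO639_CODES.get(code, code)
    return LANGUAGES.get(short, code)


if __name__ == "__main__":
    for code in ("afr", "chv", "szl"):
        print(code, "->", display_name(code))
```

With a lookup like this, a language that only has a three-letter code needs a
name entry but no invented two-letter code.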

--
Jonathan

On Sun, Jul 21, 2019, 11:35 Grzegorz Kulik  wrote:

> No, you define a language's name with the two-letter code:
>
> var languages = {'af': 'Afrikaans',...
>
> and then you link it to the three-letter one
>
> var iso639Codes = {'afr': 'af',...
>
> I don't know about the others but I've got a pol-szl pair, so I have to do
> it like that. :)
>
> Greg
>
> On Sunday, 21 Jul 2019 at 10:36, Jonathan Washington (
> jonathan.n.washing...@gmail.com) wrote:
>
> Shouldn't it work without needing a two-letter code?
>
> --
> Jonathan
>
> On Sun, Jul 21, 2019, 09:58 Grzegorz Kulik  wrote:
>
> Hi Sushain,
>
> thank you for your answer, it pointed me in the right direction. Turns out
> Silesian wasn't defined at all in localization.js or in min.js. In 'var
> languages' I created a fake two-letter code for Silesian and then in 'var
> iso639Codes' I linked it to the three-letter real one. Now everything works
> as expected apart from the POST method. I'm trying to figure it out because
> some people might consider the GET requests a nuisance from the privacy
> point of view.
>
> Best,
> Greg
>
> On Saturday, 20 Jul 2019 at 16:07, Sushain Cherivirala (
> sush...@skc.name) wrote:
>
> Hi Greg,
>
> There are a couple errors in the JS console that reveal the issue.
> Html-tools doesn't attempt to localize language names until it has obtained
> the localized interface strings. Since you haven't hosted those, that
> request is failing, resulting in the success callback never firing and
> language names never being localized. Here's the relevant code:
>
>
> https://github.com/apertium/apertium-html-tools/blob/master/assets/js/localization.js#L328-L336
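
The failure mode described above can be sketched like this. It is only an
illustration under stated assumptions, not the linked code: localizeInterface,
requestStrings and the state fields are hypothetical names, and the request
is passed in as a function so the example runs without a server.

```javascript
// Hypothetical sketch; names and structure are assumptions, not html-tools.

// requestStrings(onSuccess, onError) stands in for the request that fetches
// the localized interface strings.
function localizeInterface(requestStrings, state) {
    requestStrings(
        function onSuccess(strings) {
            state.strings = strings;
            // Language names are localized only from inside the success
            // callback, so this line is never reached if the request fails.
            state.languageNamesLocalized = true;
        },
        function onError() {
            // The strings are not hosted: nothing else runs, and language
            // names keep showing as raw codes.
            state.failed = true;
        }
    );
    return state;
}

// A request that succeeds, and one that fails as in the case described.
function hostedStrings(onSuccess, onError) {
    onSuccess({title: 'Translator'});
}
function missingStrings(onSuccess, onError) {
    onError();
}
```

Calling localizeInterface(missingStrings, {}) leaves languageNamesLocalized
unset, which matches the symptom: the names are never localized because the
step that does it depends on the earlier request succeeding.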
>
> I'm happy to see you've got something working (albeit with a little elbow
> grease). Html-tools isn't really designed as a set of components
> (unfortunately) so it's a bit difficult to integrate into an existing site.
> There's also not much documentation inside the code in terms of comments
> (this one is past-me's fault).
>
> *Sushain K. Cherivirala*
> Stanford University, M.S. in Computer Science '19
> Carnegie Mellon University, B.S. in Computer Science '18
> (713) 992-4043 | www.skc.name
>
>
> On Sat, Jul 20, 2019 at 2:34 PM Francis Tyers  wrote:
>
> On 2019-07-20 22:31, Francis Tyers wrote:
> > On 2019-07-20 22:18, Grzegorz Kulik wrote:
> >> I figured it out! The "index.eng.html" part, I mean. The translator is
> >> not in testing anymore, so I moved it to
> >> https://silling.org/translator I still don't know how to approach the
> >> other two problems. If anyone could help, I would really appreciate
> >> it. :)
> >>
> >> Greg
> >
> > I believe that to change the name of the language from the code you
> > need to make the change in html-tools.
> >
> > Could you make an issue here:
> >
> > https://github.com/apertium/apertium-html-tools/issues
> >
> > Thanks!
>
> It would also be cool if you could link to Apertium here:
>
> https://silling.org/informacyje-o-maszinowym-translatorze/
>
> :)
>
> F.
>
___
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff

