Re: [Apertium-stuff] Update about superblanks in transfer

2020-08-29 Thread Kevin Brubeck Unhammer
Tanmai Khanna wrote:

> we no longer need the user to be worried about blank
> positions in transfer rules. The latest update to the apertium code makes
> it such that <b/> is now the same as <b pos="X"/>. You can change the <b pos="X"/> in your transfer rules to just <b/> and it'll work.
>
> Now, the only thing you need to worry about when writing transfer rules is
> whether you want a blank between the two LUs or not. 

  


___
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff


Re: [Apertium-stuff] Fixing Phonological Processes

2020-08-29 Thread Jonathan Washington
Hi Zanga,

Given the highly agglutinative nature of Yao morphology, using dix to model
it is probably not a great option.  Also, as you and Hèctor have concluded,
the morphophonology will be much easier to model using twol.

Given the extent to which the morphology involves prefixes, lexc (what we
traditionally use with twol) is probably also a poor choice for modeling
the morphology.  However, lexd was designed as a replacement for lexc for
languages like Yao (and works well with twol).  I think this is the route
you should take.

Documentation is available here:

https://github.com/apertium/lexd/blob/master/Usage.md

Some languages in Apertium whose morphologies are already implemented in
lexd (none are entirely complete yet, but some are pretty far along):

Swahili: https://github.com/apertium/apertium-swa
Lingala: https://github.com/apertium/apertium-lin
Nivkh: https://github.com/apertium/apertium-niv
Wamesa: https://github.com/apertium/apertium-wad

I probably forgot a few, but these should provide good models (and two are
related to Yao).  There are also a couple of other languages being developed
using lexd that aren't public (yet).

And of course you can message this list if you have trouble, or ask in real
time in the IRC channel.

--
Jonathan


[Apertium-stuff] Update about superblanks in transfer

2020-08-29 Thread Tanmai Khanna
Hey guys!
The wordbound blanks project handles blanks that are supposed to be
reordered. Therefore, users no longer need to worry about blank
positions in transfer rules. The latest update to the Apertium code makes
<b/> equivalent to <b pos="X"/>. You can change every <b pos="X"/> in your transfer rules to just <b/> and it will work.
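For a pair with many rules, the change can be made mechanically. Below is a minimal Python sketch, assuming each blank is written as `<b pos="..."/>` with `pos` as its only attribute; `simplify_blanks` is a hypothetical helper, not part of Apertium.

```python
import re

# Matches <b pos="1"/>, <b pos='2' /> and similar spellings.
# Assumes pos is the only attribute on the blank element.
B_WITH_POS = re.compile(r"""<b\s+pos\s*=\s*["'][^"']*["']\s*/>""")


def simplify_blanks(transfer_xml: str) -> str:
    """Replace every <b pos="X"/> in a transfer file's text with <b/>."""
    return B_WITH_POS.sub("<b/>", transfer_xml)
```

Reading a rule file, passing its text through `simplify_blanks` and writing it back would do the bulk edit; it is worth reviewing the diff afterwards, since rules whose input LUs have no space between them may need the blank removed instead.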

Now, the only thing you need to worry about when writing transfer rules is
whether you want a blank between the two LUs or not. *Input blanks are
stored in a queue and printed in order in the available <b/> spots
in the rule output.*

*Note:*
- If the output rule has more blank spots than there are input blanks, the
remaining blank spots will be spaces.
- If the output rule has fewer blank spots than there are input blanks, the
remaining input blanks will be output after the rule output.
- If an input blank is an empty string, it is stored as a space.
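The queue behaviour described above can be modelled in a few lines. This is a toy Python sketch of the stated rules, not the actual transfer code: `render_rule` and the `BLANK` marker are hypothetical names, each input blank is a plain string, and blanks outside the matched LUs (such as [blank1] and [blank4] in the examples) are left out.

```python
from collections import deque

BLANK = object()  # hypothetical stand-in for a <b/> spot in the rule output


def render_rule(input_blanks, output_template):
    """Toy model of the blank queue (not Apertium's implementation).

    input_blanks: blanks found between the matched LUs, in input order.
    output_template: LU strings and BLANK markers, in output order.
    Returns the rule output followed by any blanks left over.
    """
    # An empty input blank is stored as a single space.
    queue = deque(b if b else " " for b in input_blanks)
    out = []
    for item in output_template:
        if item is BLANK:
            # Each <b/> spot takes the next queued blank, or a space
            # once the queue has run out.
            out.append(queue.popleft() if queue else " ")
        else:
            out.append(item)
    # Blanks that no <b/> consumed are flushed after the rule output.
    out.extend(queue)
    return "".join(out)
```

With the two blanks between the matched LUs of Example 2 (" ;[blank2]; " and " [blank3];  ") and a single `BLANK` between "^wordta$" and "^ho$", the sketch puts the first blank inside the chunk and flushes " [blank3];  " after it, which matches the chunk part of that example.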

In some transfer rules, there are input patterns which don't have a space
between them. In the output section of these transfer rules, <b pos="X"/> used
to give an empty string, but it will now give a space. To remove the blank
from the output, you will need to remove that blank tag from the transfer
rule.

Here are some examples from the tests.

EXAMPLE 1:
Input:

[blank1] ^worda/wordta$ ;[blank2]; ^wordb/wordtb$
[blank3];  ^hun/ho$ [blank4]

There's no <b/> in the rule output, so all blanks are flushed after the rule
output.

Output:

[blank1] ^test1{^wordta$^wordtb$^ho$}$
;[blank2];  [blank3];   [blank4]

EXAMPLE 2:
Input:

[blank1] ^wordb/wordtb$ ;[blank2]; ^worda/wordta$
[blank3];  ^hun/ho$ [blank4]

There's one <b/> in the rule output, so it prints one blank and flushes the rest.

Output:

[blank1] ^test1{^wordta$ ;[blank2]; ^ho$}$ [blank3];
  [blank4]

This has been implemented for the chunker, interchunk, and postchunk.

If you have any questions, suggestions, comments, etc., I'll be happy to
respond to them.

Thanks and Regards,
*तन्मय खन्ना *
*Tanmai Khanna*


Re: [Apertium-stuff] Fixing Phonological Processes

2020-08-29 Thread Zanga Chimombo
Yes. I think I should be using twol

On Fri, Aug 28, 2020 at 3:56 PM Hèctor Alòs i Font  wrote:
>
> I don't think you have to do anything with the modes or the compilation file. 
> The problem is in the post-yao.dix file.
> If you add <a/>, it works:
>
> 
>   
> nk
> ng
>   
>   
> 
>
> $ echo "~nka" | lt-proc -p yao.autopgen.bin
> nga
> $ echo "~nkb" | lt-proc -p yao.autopgen.bin
> nkb
>
> I don't know why there is no match without <a/>, but in any case you need to
> add <a/> to the relevant places (words, affixes, etc.) where you want to trigger
> this rule. If you want nk + vowel to always become ng, you should do this
> in twol, not here.
>
> Hèctor
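To see why the marked rule behaves as in the lt-proc test above, here is a rough Python imitation of the post-generation step. It is not how lt-proc works internally; it assumes the rule is "a marked nk before a vowel becomes ng", that a, e, i, o, u are the vowels, and that marks which trigger nothing are dropped (as the "nkb" output suggests). `toy_postgen` is a hypothetical name.

```python
import re

# "~" is how the <a/> mark shows up in the generator output.
# Assumed rule for illustration: a marked "nk" followed by a vowel becomes "ng".
MARKED_NK_BEFORE_VOWEL = re.compile(r"~nk(?=[aeiou])")


def toy_postgen(text: str) -> str:
    """Rough imitation of the post-generation step, not lt-proc's code."""
    # Rewrite only the marked occurrences; unmarked "nk" is left alone.
    text = MARKED_NK_BEFORE_VOWEL.sub("ng", text)
    # Marks that did not trigger any rule are removed from the output.
    return text.replace("~", "")
```

Under these assumptions "~nka" gives "nga" and "~nkb" gives "nkb", as in the test above, while an unmarked "nkutenda" is left untouched, which is why the mark has to be added in the monodix.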
>
> Message from Zanga Chimombo on Fri, 28 Aug 2020 at 15:41:
>>
>> I am still not getting anywhere and both modes.xml and the Makefile
>> seem ok. My code is here:
>> https://gitlab.com/zangaphee/CiBantu/-/tree/master/twoc/apertium-yao
>>
>> On Fri, Aug 28, 2020 at 7:36 AM Hèctor Alòs i Font  
>> wrote:
>> >
>> > The relevant files are modes.xml and Makefile.am. I recommend taking a look
>> > at them in e.g. apertium-fra and apertium-fra-cat (or any other released
>> > pair using post-generation). In the first one you define the pipeline, so
>> > copy and adapt the call to autopgen at the end. In the second one you have
>> > the actual compilation of the programme.
>> >
>> > Message from Zanga Chimombo on Fri, 28 Aug 2020 at 7:52:
>> >>
>> >> Hi again, I actually have:
>> >>
>> >> 
>> >>   
>> >> nk
>> >> ng
>> >>   
>> >>   
>> >> 
>> >>
>> >> But it doesn't seem to get executed. Is there a missing flag/switch
>> >> that I was supposed to initialise/build with? I am not seeing
>> >> anything relating to building autopgen in the modes.xml file in the
>> >> monolingual directory...?
>> >>
>> >> On Thu, Aug 27, 2020 at 2:57 PM Hèctor Alòs i Font  
>> >> wrote:
>> >> >
>> >> > Yes, it is in the monodix. It is just a mark put on the right side, e.g.
>> >> >
>> >> >   que
>> >> >   que   que n="itg"/>
>> >> >
>> >> > You don't have to put it in, but if you have something like this in the
>> >> > post-dix file:
>> >> >
>> >> > 
>> >> >   
>> >> > nk
>> >> > ng
>> >> >   
>> >> > 
>> >> >
>> >> > ... then every nk will be substituted by ng. That is surely not what you
>> >> > want. So it is better to put a mark in the dictionary to know which "nk"
>> >> > may be changed (in some contexts) to ng.
>> >> >
>> >> > Message from Zanga Chimombo on Thu, 27 Aug 2020 at 15:18:
>> >> >>
>> >> >> Looking at the examples in apertium-fra.post-fra.dix, it is clear that
>> >> >> the tilde (~, i.e. <a/>) is inserted as some sort of marker earlier in the
>> >> >> pipeline so that the post-generator recognises it and acts on it.
>> >> >>
>> >> >> Where in the pipeline is it inserted? Could you give me a line number
>> >> >> of the insertion within the monodix perhaps?
>> >> >>
>> >> >> On Thu, Aug 27, 2020 at 12:12 PM Hèctor Alòs i Font
>> >> >>  wrote:
>> >> >> >
>> >> >> > You can take a look, for instance, at
>> >> >> > https://github.com/apertium/apertium-fra/blob/master/apertium-fra.post-fra.dix
>> >> >> >
>> >> >> > For example (at line 633) :
>> >> >> > nen'
>> >> >> >
>> >> >> > Message from Hèctor Alòs i Font on Thu, 27 Aug 2020 at 13:07:
>> >> >> >>
>> >> >> >> There are two things in:
>> >> >> >>
>> >> >> >> 
>> >> >> >>   
>> >> >> >> nk
>> >> >> >> ng
>> >> >> >>   
>> >> >> >> 
>> >> >> >>
>> >> >> >> First is the <a/> that must precede (that's the ~ Kevin mentioned,
>> >> >> >> because it is shown as a tilde in the output). If you don't have
>> >> >> >> it, there won't be any matching.
>> >> >> >>
>> >> >> >> Second is the <b/>, i.e. a space. So nk- will not match, but only
>> >> >> >> nk followed by a blank (and preceded by an <a/>). If matched, it will
>> >> >> >> be replaced by ng followed by a blank too.
>> >> >> >>
>> >> >> >> Hèctor
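The two conditions in the quoted explanation (the mark before nk and the blank after it) can be illustrated with another toy Python sketch, again not lt-proc's algorithm: "~" stands for the mark, a single space stands for the blank, and `apply_marked_nk_blank` is a hypothetical name.

```python
def apply_marked_nk_blank(text: str) -> str:
    """Toy version of the entry described above (assumption, not lt-proc):
    mark + "nk" + blank becomes "ng" + blank; any other mark is dropped."""
    # "~" stands for the <a/> mark and " " for the <b/> blank.
    text = text.replace("~nk ", "ng ")
    # Leftover marks that matched nothing are removed from the output.
    return text.replace("~", "")
```

Under this model even a marked "~nkutenda" would come out as "nkutenda", because nk is followed by a vowel rather than a blank, which fits the trouble described further down the thread.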
>> >> >> >>
>> >> >> >>
>> >> >> >> Message from Zanga Chimombo on Thu, 27 Aug 2020 at 12:31:
>> >> >> >>>
>> >> >> >>> Not sure I know what you mean by "~"...? Sorry. I'm new to this
>> >> >> >>>
>> >> >> >>> The input is "nkutenda". Expected output: "ngutenda".
>> >> >> >>>
>> >> >> >>> On Thu, Aug 27, 2020 at 11:26 AM Kevin Brubeck Unhammer
>> >> >> >>>  wrote:
>> >> >> >>> >
>> >> >> >>> > Zanga Chimombo wrote:
>> >> >> >>> >
>> >> >> >>> > > One of the processes that occurs in one of the languages I am 
>> >> >> >>> > > dealing
>> >> >> >>> > > with is "nk-" becoming "ng-"
>> >> >> >>> > >
>> >> >> >>> > > I thought I would be able to fix this using the post generator 
>> >> >> >>> > > here:
>> >> >> >>> > > https://gitlab.com/zangaphee/CiBantu/-/blob/master/twoc/apertium-yao/apertium-yao.post-yao.dix
>> >> >> >>> > >
>> >> >> >>> > > However, that doesn't fix it. Have I done it incorrectly? 
>> >> >> >>> > > Should I