Re: [Apertium-stuff] Update about superblanks in transfer

Hèctor Alòs i Font Sun, 30 Aug 2020 02:28:37 -0700

On Sun, 30 Aug 2020 at 12:06, Tanmai Khanna <khanna.tan...@gmail.com> wrote:


> Hi Hèctor,
> I'm dealing with the issues I see one by one.
> 1. I was flushing the remaining blanks after processOut because I thought
> a rule usually has only one <out>..</out> block, but some of your rules
> have several. So in the latest commit to apertium/apertium, the blanks are
> flushed only after the rule has finished outputting entirely. This solves
> some of the issues, such as:
>
> $ echo "au lycée Louis-le-Grand" | apertium -d .. fra-frp
>
> u licê Louis-lo-Grant.
>
>
>
It's too difficult to have a single <out> when dealing with complex
structures. For instance, French has "not + verb + secondary-not",
but in Arpitan I have "verb + not". Furthermore, the verb can be in a past
tense in the source language but need "aux + participle" in the target
language (and I have to decide which auxiliary to use). Moreover,
the verb can be pronominal in the target language but not in the source.
So I use macros that deal with each of these issues and add or remove
material. The result is a kind of multi-step output (and I'm not the only
one who does this).
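
To picture why the flush timing in point 1 matters for multi-step output, here is a toy Python model. It is only a sketch under assumptions: the names run_rule, B and flush_per_block are hypothetical, a rule is reduced to a list of <out> blocks, and nothing here is taken from the actual apertium code.

```python
from collections import deque

B = object()  # hypothetical marker standing for a <b/> slot in an <out> block


def run_rule(out_blocks, input_blanks, flush_per_block):
    """Join the output of several <out> blocks, filling <b/> slots from a queue."""
    queue = deque(input_blanks)
    pieces = []
    for block in out_blocks:
        for item in block:
            if item is B:
                # Take the next stored blank, or fall back to a plain space.
                pieces.append(queue.popleft() if queue else " ")
            else:
                pieces.append(item)
        if flush_per_block:
            # Old behaviour in this model: leftovers are emitted after each block.
            pieces.extend(queue)
            queue.clear()
    # New behaviour in this model: leftovers are emitted once, after the rule.
    pieces.extend(queue)
    return "".join(pieces)


blocks = [["aux"], [B, "participle"]]
print(run_rule(blocks, ["[b1]"], flush_per_block=True))   # aux[b1] participle
print(run_rule(blocks, ["[b1]"], flush_per_block=False))  # aux[b1]participle
```

In this model, flushing after every block emits the stored blank too early, so the later <b/> slot falls back to an extra space.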


> 2. The spaces between numbers in your output are probably there because
> you have <b/> in the rules. If you remove those, the spaces will go away.
>

I can't remove the <b/> from the rules. Each one goes with a word that the
rule adds: when I add a new word, I must also add a blank before or after it.


>
> I'm still evaluating some other issues.
>
>
> *तन्मय खन्ना *
> *Tanmai Khanna*
>
>
> On Sun, Aug 30, 2020 at 1:21 PM Hèctor Alòs i Font <hectora...@gmail.com>
> wrote:
>
>>
>>
>> On Sun, 30 Aug 2020 at 9:49, Tanmai Khanna <khanna.tan...@gmail.com>
>> wrote:
>>
>>> My guess is that the transfer rule for Franco-Japanese has a two-word
>>> input, so the stored blank is "-". The output, however, has three words,
>>> "una Franco-Japonêsa". Since the blanks are printed in order, the "-" is
>>> printed in the first available <b/> spot in the rule output.
>>>
>>
>> Yes, that's it. "Franco" is a prefix and is analysed as such. I have
>> a few dozen prefixes so as to avoid having hundreds of words in the
>> dictionaries and, more importantly, to be able to deal with unknown pairs
>> like "franco-tibétain" or "franco-silésien".
>>
>>
>>>
>>> There are a few possible solutions for this. One idea is to have two kinds
>>> of blank markers: one that always prints a space, and one that prints
>>> the next available input blank. This can also be implemented by putting a
>>> <lit v=" "/> in the output rule and then <b/> in the next spot. If this
>>> seems too hacky a solution, we can discuss other options.
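
The two-marker idea can be sketched in a few lines of Python. This is a simplified model under assumptions, not the apertium implementation: FIXED_SPACE stands in for <lit v=" "/>, QUEUED_BLANK for <b/>, and both names, like render, are hypothetical. It does not try to reproduce the exact spacing reported in the thread.

```python
from collections import deque

FIXED_SPACE = object()   # hypothetical: always prints a space, like <lit v=" "/>
QUEUED_BLANK = object()  # hypothetical: prints the next input blank, like <b/>


def render(items, input_blanks):
    """Build the rule output from words and the two kinds of blank marker."""
    queue = deque(input_blanks)
    pieces = []
    for item in items:
        if item is FIXED_SPACE:
            pieces.append(" ")
        elif item is QUEUED_BLANK:
            # Use the next stored blank; default to a space if none is left.
            pieces.append(queue.popleft() if queue else " ")
        else:
            pieces.append(item)
    pieces.extend(queue)  # any unused blanks follow the rule output
    return "".join(pieces)


only_b = ["una", QUEUED_BLANK, "Franco", QUEUED_BLANK, "Japonêsa"]
mixed = ["una", FIXED_SPACE, "Franco", QUEUED_BLANK, "Japonêsa"]
print(render(only_b, ["-"]))  # una-Franco Japonêsa
print(render(mixed, ["-"]))   # una Franco-Japonêsa
```

With only queued markers the stored "-" lands in the first slot; with a fixed space in front, it lands between the prefix and the following word.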
>>>
>>> *तन्मय खन्ना *
>>> *Tanmai Khanna*
>>>
>>>
>>> On Sun, Aug 30, 2020 at 12:09 PM Tanmai Khanna <khanna.tan...@gmail.com>
>>> wrote:
>>>
>>>> Hèctor,
>>>> No worries, I'll look into this. Can you send the input sentences? I
>>>> want to see the transfer rules that apply to the erroneous parts.
>>>> They might need some changes.
>>>>
>>>> तन्मय खन्ना
>>>> Tanmai Khanna
>>>>
>>>> ------------------------------
>>>> *From:* Hèctor Alòs i Font <hectora...@gmail.com>
>>>> *Sent:* Sunday, August 30, 2020 11:57:16 AM
>>>> *To:* [apertium-stuff] <apertium-stuff@lists.sourceforge.net>
>>>> *Subject:* Re: [Apertium-stuff] Update about superblanks in transfer
>>>>
>>>> Unfortunately, I found a lot of problems caused by superblanks,
>>>> especially with the handling of hyphens. See a couple of differences in
>>>> translations of my French test corpus into Arpitan before and after the
>>>> update:
>>>>
>>>> < 00607. Tandis que les Tétes Broulâyes sont en *permission sur
>>>> *Espritos Marcos, tomba amouerox de Yvonne, una Franco-Japonêsa.
>>>> ---
>>>> > 00607. Tandis que les Tétes Broulâyes sont en *permission sur
>>>> *Espritos Marcos, tomba amouerox de Yvonne, una- Franco Japonêsa.
>>>>
>>>> < 00748. On povêt per ègzemplo parlar, sot Charlo-lo-Pelâ, de la
>>>> "*foresta" des pêrches de la Sêna.
>>>> ---
>>>> > 00748. On povêt per ègzemplo parlar, sot Charlo-lo- Pelâ, de la
>>>> "*foresta" des pêrches de la Sêna.
>>>>
>>>> Hèctor
>>>>
>>>> On Sat, 29 Aug 2020 at 16:50, Tanmai Khanna <khanna.tan...@gmail.com>
>>>> wrote:
>>>>
>>>> Hey guys!
>>>> The wordbound blanks project handles blanks that are supposed to be
>>>> reordered. Therefore, users no longer need to worry about blank
>>>> positions in transfer rules. With the latest update to the apertium code,
>>>> <b pos="X"/> is now the same as <b/>. You can change the
>>>> <b pos="X"/> in your transfer rules to just <b/> and it'll work.
>>>>
>>>> Now, the only thing you need to worry about when writing transfer rules
>>>> is whether you want a blank between two LUs or not. *Input blanks
>>>> are stored in a queue and printed in order in the available <b/>
>>>> spots in the rule output.*
>>>>
>>>> *Note:*
>>>> - If the rule output has more blank spots than input blanks, the
>>>> remaining blank spots will be spaces.
>>>> - If the rule output has fewer blank spots than input blanks, the
>>>> remaining input blanks will be output after the rule output.
>>>> - If an input blank is an empty string, it is stored as a space.
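
The queue behaviour and the three notes above can be sketched as a small Python model. It is a minimal illustration under assumptions: render and the marker B are hypothetical names, the rule output is reduced to a flat list of words and <b/> slots, and none of this is taken from the apertium source.

```python
from collections import deque

B = object()  # hypothetical marker standing for a <b/> slot in the rule output


def render(output_items, input_blanks):
    """Fill <b/> slots from a queue of input blanks, in order."""
    # Note 3: an empty input blank is stored as a single space.
    queue = deque(blank if blank != "" else " " for blank in input_blanks)
    pieces = []
    for item in output_items:
        if item is B:
            # Note 1: once the queue is empty, remaining slots become spaces.
            pieces.append(queue.popleft() if queue else " ")
        else:
            pieces.append(item)
    # Note 2: blanks that found no slot are output after the rule output.
    pieces.extend(queue)
    return "".join(pieces)


print(render(["w1", B, "w2", B, "w3"], ["-"]))    # w1-w2 w3
print(render(["w1", B, "w2"], ["[b1]", "[b2]"]))  # w1[b1]w2[b2]
print(render(["w1", "w2"], ["[b1]", "[b2]"]))     # w1w2[b1][b2]
print(render(["w1", B, "w2"], [""]))              # w1 w2
```

The first call shows how a stored "-" is consumed by the first slot even when the rule output has more words than the input.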
>>>>
>>>> In some transfer rules, the input pattern contains words with no
>>>> space between them. In the output section of these transfer rules,
>>>> <b pos="1"/> used to give an empty string, but it will now give a space.
>>>> To remove the blank from the output, you will need to remove the
>>>> <b pos="1"/> from the transfer rule.
>>>>
>>>> Here are some examples from the tests.
>>>>
>>>> EXAMPLE 1:
>>>> Input:
>>>>
>>>> [blank1] ^worda<det>/wordta<det>$ ;[blank2]; ^wordb<adj>/wordtb<adj>$ 
>>>> [blank3];  ^hun<n><acr>/ho<n><acr>$ [blank4]
>>>>
>>>> There's no <b/> in the rule output, so all blanks are flushed after the
>>>> rule output.
>>>>
>>>> Output:
>>>>
>>>> [blank1] ^test1<adj>{^wordta<det>$^wordtb<adj>$^ho<n><acr>$}$ ;[blank2];  
>>>> [blank3];   [blank4]
>>>>
>>>> EXAMPLE 2:
>>>> Input:
>>>>
>>>> [blank1] ^wordb<adj>/wordtb<adj>$ ;[blank2]; ^worda<det>/wordta<det>$ 
>>>> [blank3];  ^hun<n><acr>/ho<n><acr>$ [blank4]
>>>>
>>>> There's one <b/> in rule output, so it prints one and flushes the rest.
>>>>
>>>> Output:
>>>>
>>>> [blank1] ^test1<det>{^wordta<det>$ ;[blank2]; ^ho<n><acr>$}$ [blank3];   
>>>> [blank4]
>>>>
>>>> This has been implemented for the chunker, interchunk, and postchunk.
>>>>
>>>> If you have any questions, suggestions, comments, etc., I'll be happy
>>>> to respond to them.
>>>>
>>>> Thanks and Regards,
>>>> *तन्मय खन्ना *
>>>> *Tanmai Khanna*
>>>> _______________________________________________
>>>> Apertium-stuff mailing list
>>>> Apertium-stuff@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>>>>
