Re: [Apertium-stuff] Update about superblanks in transfer

Hèctor Alòs i Font Sun, 30 Aug 2020 00:51:38 -0700

Message from Tanmai Khanna <khanna.tan...@gmail.com> on Sun, 30 Aug 2020
at 9:49:


> My guess is that the transfer rule for Franco-Japanese has a two-word
> input, so the stored blank is "-". The output now has three words, "una
> Franco-Japonêsa", and since blanks are printed in order, the "-" is
> printed in the first available <b/> spot in the rule output.
>

Yes, that's right. "Franco" is a prefix and it is analysed as such. I have a
few dozen prefixes so that I don't need hundreds of extra words in the
dictionaries and, more importantly, so that I can deal with unknown pairs like
"franco-tibétain" or "franco-silésien".

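A minimal Python sketch of the mechanism described above may help. It is a toy model, not the apertium code: the names B, SP and render are hypothetical, and the slot layout of the real rule is an assumption. It only shows the stored "-" moving to the first <b/> slot; it does not try to reproduce the exact spacing of the real output. The second call models the literal-space idea from the quoted message below.

```python
from collections import deque

B = object()   # hypothetical stand-in for a <b/> slot: takes the next stored input blank
SP = object()  # hypothetical stand-in for a literal space, as with <lit v=" "/>

def render(output_items, input_blanks):
    """Fill B slots from a FIFO queue of input blanks; use a space once the queue is empty."""
    queue = deque(input_blanks)
    parts = []
    for item in output_items:
        if item is B:
            parts.append(queue.popleft() if queue else " ")
        elif item is SP:
            parts.append(" ")
        else:
            parts.append(item)
    # Blanks that found no slot are flushed after the rule output.
    return "".join(parts) + "".join(queue)

# Assumed layout: two-word input with one stored blank "-", three-word output.
after_update = render(["una", B, "Franco", B, "Japonêsa"], ["-"])
with_literal = render(["una", SP, "Franco", B, "Japonêsa"], ["-"])
print(after_update)
print(with_literal)
```

In this model the first call puts the hyphen right after "una", while the second keeps it between "Franco" and "Japonêsa" because the first spot no longer consumes a stored blank.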

>
> There are a few possible solutions for this. One idea is to have two kinds
> of blank markers: one that always prints a space, and one that prints the
> available input blanks. This can also be implemented by putting
> <lit v=" "/> in the output rule and then <b/> in the next spot. If this
> seems too hacky a solution, we can discuss other options.
>
> *तन्मय खन्ना *
> *Tanmai Khanna*
>
>
> On Sun, Aug 30, 2020 at 12:09 PM Tanmai Khanna <khanna.tan...@gmail.com>
> wrote:
>
>> Hèctor,
>> No worries, I'll look into this. Can you send the input sentences? I want
>> to see the transfer rules that are applying to the erroneous parts. They
>> might need some changing.
>>
>> तन्मय खन्ना
>> Tanmai Khanna
>>
>> ------------------------------
>> *From:* Hèctor Alòs i Font <hectora...@gmail.com>
>> *Sent:* Sunday, August 30, 2020 11:57:16 AM
>> *To:* [apertium-stuff] <apertium-stuff@lists.sourceforge.net>
>> *Subject:* Re: [Apertium-stuff] Update about superblanks in transfer
>>
>> Unfortunately, I found a lot of problems caused by superblanks, especially
>> with the handling of hyphens. Here are a couple of differences in the
>> translations of my French test corpus into Arpitan before and after the update:
>>
>> < 00607. Tandis que les Tétes Broulâyes sont en *permission sur *Espritos
>> Marcos, tomba amouerox de Yvonne, una Franco-Japonêsa.
>> ---
>> > 00607. Tandis que les Tétes Broulâyes sont en *permission sur *Espritos
>> Marcos, tomba amouerox de Yvonne, una- Franco Japonêsa.
>>
>> < 00748. On povêt per ègzemplo parlar, sot Charlo-lo-Pelâ, de la
>> "*foresta" des pêrches de la Sêna.
>> ---
>> > 00748. On povêt per ègzemplo parlar, sot Charlo-lo- Pelâ, de la
>> "*foresta" des pêrches de la Sêna.
>>
>> Hèctor
>>
>> Message from Tanmai Khanna <khanna.tan...@gmail.com> on Sat, 29 Aug 2020
>> at 16:50:
>>
>> Hey guys!
>> The wordbound blanks project handles blanks that are supposed to be
>> reordered, so users no longer need to worry about blank positions in
>> transfer rules. With the latest update to the apertium code,
>> <b pos="X"/> is now the same as <b/>. You can change <b pos="X"/> in
>> your transfer rules to just <b/> and it will work.
>>
>> Now, the only thing you need to worry about when writing transfer rules
>> is whether you want a blank between the two LUs or not. *Input blanks
>> will be stored as a queue and will be printed in order in all
>> available <b/> spots in the rule output. *
>>
>> *Note:*
>> - If the output rule has more blank spots than input blanks, then the
>> remaining blank spots will be spaces.
>> - If the output rule has fewer blank spots than input blanks, then the
>> remaining input blanks will be output after the rule output.
>> - If the input blank is an empty string, it is stored as a space.
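
The three notes above can be sketched in a few lines of Python. This is only an illustration of the described behaviour, not the apertium implementation: the function name place_blanks and the simplified blank strings are made up for the example.

```python
from collections import deque

def place_blanks(slot_count, input_blanks):
    """Return (blanks printed in the <b/> slots, blanks flushed after the rule output)."""
    # An empty input blank is stored as a single space.
    queue = deque(b if b else " " for b in input_blanks)
    in_slots = []
    for _ in range(slot_count):
        # Each slot takes the next queued blank; extra slots become spaces.
        in_slots.append(queue.popleft() if queue else " ")
    return in_slots, list(queue)

b2, b3 = " ;[blank2]; ", " [blank3]; "
print(place_blanks(0, [b2, b3]))  # no slots: both blanks are flushed afterwards
print(place_blanks(1, [b2, b3]))  # one slot: first blank printed, second flushed
print(place_blanks(3, ["-"]))     # more slots than blanks: the extras are spaces
```

The first two calls correspond to a rule output with no <b/> and with one <b/>, for a three-word input that carries two blanks between its words.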
>>
>> In some transfer rules, the input pattern contains words with no space
>> between them. In the output section of these rules, <b pos="1"/> used to
>> give an empty string, but it will now give a space. To remove the blank
>> from the output, remove the <b pos="1"/> from the transfer rule.
>>
>> Here are some examples from the tests.
>>
>> EXAMPLE 1:
>> Input:
>>
>> [blank1] ^worda<det>/wordta<det>$ ;[blank2]; ^wordb<adj>/wordtb<adj>$ 
>> [blank3];  ^hun<n><acr>/ho<n><acr>$ [blank4]
>>
>> There's no <b/> in the rule output, so all blanks are flushed after the
>> rule output.
>>
>> Output:
>>
>> [blank1] ^test1<adj>{^wordta<det>$^wordtb<adj>$^ho<n><acr>$}$ ;[blank2];  
>> [blank3];   [blank4]
>>
>> EXAMPLE 2:
>> Input:
>>
>> [blank1] ^wordb<adj>/wordtb<adj>$ ;[blank2]; ^worda<det>/wordta<det>$ 
>> [blank3];  ^hun<n><acr>/ho<n><acr>$ [blank4]
>>
>> There's one <b/> in rule output, so it prints one and flushes the rest.
>>
>> Output:
>>
>> [blank1] ^test1<det>{^wordta<det>$ ;[blank2]; ^ho<n><acr>$}$ [blank3];   
>> [blank4]
>>
>> This has been implemented for the chunker, interchunk, and postchunk.
>>
>> If you have any questions, suggestions, comments, etc., I'll be happy to
>> respond to them.
>>
>> Thanks and Regards,
>> *तन्मय खन्ना *
>> *Tanmai Khanna*
>> _______________________________________________
>> Apertium-stuff mailing list
>> Apertium-stuff@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>>

