Re: [Apertium-stuff] Update about superblanks in transfer

Tanmai Khanna Sun, 30 Aug 2020 02:49:37 -0700

Also, I agree with Tino: if the punctuation is important for the context,
then it should probably be analysed as a token.


*तन्मय खन्ना *
*Tanmai Khanna*


On Sun, Aug 30, 2020 at 3:16 PM Tanmai Khanna <khanna.tan...@gmail.com>
wrote:

> Hèctor, what I mean is that if you don't want a space in the output of a
> rule, you have to remove the <b/>.
> For example:
>
> $ echo "f.75v." | apertium -d .. fra-frp
>
> *f.75 v..
>
>
> The space between 75 and v is there because the rule output has a
> <b/>, so if you want the output to come without a space, you should
> remove the <b/>.
>
> <rule comment="REGLA: NUM NOM">
>       <pattern>
>         <pattern-item n="num"/>
>         <pattern-item n="nom"/>
>       </pattern>
>       <action>
>         <call-macro n="f_concord2">
>           <with-param pos="2"/>
>           <with-param pos="1"/>
>         </call-macro>
>         <call-macro n="f_chunk_tags">
>           <with-param pos="2"/>
>         </call-macro>
>         <out>
>           <chunk name="num_n">
>             <tags>
>               <tag><lit-tag v="SN"/></tag>
>               <tag><clip pos="2" side="tl" part="gen"/></tag>
>               <tag><clip pos="2" side="tl" part="nbr"/></tag>
>               <tag><var n="genero_sl"/></tag>
>               <tag><var n="numero_sl"/></tag>
>             </tags>
>               <lu>
>                 <clip pos="1" side="tl" part="lemh"/>
>                 <clip pos="1" side="tl" part="tags"/>
>                 <clip pos="1" side="tl" part="lemq"/>
>               </lu>
>               <b pos="1"/>
>               <lu>
>                 <clip pos="2" side="tl" part="lemh"/>
>                 <clip pos="2" side="tl" part="tags"/>
>                 <clip pos="2" side="tl" part="lemq"/>
>               </lu>
>             </chunk>
>          </out>
>       </action>
>     </rule>
>
> This is because if the input blank is an empty string then that isn't
> counted as a blank. Does that work?
>
> *तन्मय खन्ना *
> *Tanmai Khanna*
>
>
> On Sun, Aug 30, 2020 at 2:58 PM Hèctor Alòs i Font <hectora...@gmail.com>
> wrote:
>
>> Message from Tanmai Khanna <khanna.tan...@gmail.com> on Sun, 30 Aug 2020
>> at 12:06:
>>
>>> Hi Hèctor,
>>> I'm dealing with the issues I see one by one.
>>> 1. I was flushing the remaining blanks after processOut because I
>>> thought we usually have only one <out>..</out> block in a rule, but
>>> some of your rules have several. So, in the latest commit to
>>> apertium/apertium, I made them flush after the rule has finished outputting
>>> entirely. This solves some of the issues, such as:
>>>
>>> $ echo "au lycée Louis-le-Grand" | apertium -d .. fra-frp
>>>
>>> u licê Louis-lo-Grant.
>>>
>>>
>>>
>> It's too difficult to have a single <out> when dealing with complex
>> structures. For instance, in French there is "not + verb + secondary-not",
>> but in Arpitan I have "verb + not". Furthermore, the verb can be in a past
>> tense in the source language but need "aux + participle" in the target
>> language (and I have to deal with which of the auxiliaries to use). Moreover,
>> the verb can be pronominal in the output language, but not in the source.
>> So I use macros that deal with each of these issues and add or remove
>> stuff. The result is a kind of multi-step output (and I'm not the only one
>> who does it).
>>
>>
>>> 2. The spaces between numbers in your output are probably coming
>>> because you have <b/> in the rules. If you remove those, the spaces will go
>>> away.
>>>
>>
>> I can't remove <b/> in the rules. They are added when a new word is
>> added, so I must add a blank too, at its beginning or its end.
>>
>>
>>>
>>> I'm still evaluating some other issues.
>>>
>>>
>>> *तन्मय खन्ना *
>>> *Tanmai Khanna*
>>>
>>>
>>> On Sun, Aug 30, 2020 at 1:21 PM Hèctor Alòs i Font <hectora...@gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> Message from Tanmai Khanna <khanna.tan...@gmail.com> on Sun, 30 Aug 2020
>>>> at 9:49:
>>>>
>>>>> My guess is that the transfer rule for Franco-Japanese has a two-word
>>>>> input, so the stored blank is "-". Now the output has 3 words, "una
>>>>> Franco-Japonêsa". Since the blanks are printed in order, they're printed
>>>>> in the first available <b/> spot in the output rules.
>>>>>
>>>>
>>>> Yes, that's it. "Franco" is a prefix and it is analysed as such. I have
>>>> some tens of prefixes to avoid having hundreds of words in the
>>>> dictionaries and, more importantly, to be able to deal with unknown pairs
>>>> like "franco-tibétain" or "franco-silésien".
>>>>
>>>>
>>>>>
>>>>> There are a few possible solutions for this. One idea is to have two
>>>>> kinds of blank markers: one that will always print a space, and one that
>>>>> will print available input blanks. This can also be implemented by having
>>>>> a <lit v=" "/> in the output rule and then <b/> in the next spot. If this
>>>>> seems too hacky a solution, we can discuss other options.
>>>>>
>>>>> *तन्मय खन्ना *
>>>>> *Tanmai Khanna*
>>>>>
>>>>>
>>>>> On Sun, Aug 30, 2020 at 12:09 PM Tanmai Khanna <
>>>>> khanna.tan...@gmail.com> wrote:
>>>>>
>>>>>> Hèctor,
>>>>>> No worries, I'll look into this. Can you send the input sentences? I
>>>>>> want to see the transfer rules that are applying to the erroneous parts.
>>>>>> They might need some changing.
>>>>>>
>>>>>> तन्मय खन्ना
>>>>>> Tanmai Khanna
>>>>>>
>>>>>> ------------------------------
>>>>>> *From:* Hèctor Alòs i Font <hectora...@gmail.com>
>>>>>> *Sent:* Sunday, August 30, 2020 11:57:16 AM
>>>>>> *To:* [apertium-stuff] <apertium-stuff@lists.sourceforge.net>
>>>>>> *Subject:* Re: [Apertium-stuff] Update about superblanks in transfer
>>>>>>
>>>>>> Unfortunately, I found a lot of problems caused by superblanks,
>>>>>> especially with the handling of hyphens. See a couple of differences in
>>>>>> translations of my French test corpus into Arpitan before and after the
>>>>>> update:
>>>>>>
>>>>>> < 00607. Tandis que les Tétes Broulâyes sont en *permission sur
>>>>>> *Espritos Marcos, tomba amouerox de Yvonne, una Franco-Japonêsa.
>>>>>> ---
>>>>>> > 00607. Tandis que les Tétes Broulâyes sont en *permission sur
>>>>>> *Espritos Marcos, tomba amouerox de Yvonne, una- Franco Japonêsa.
>>>>>>
>>>>>> < 00748. On povêt per ègzemplo parlar, sot Charlo-lo-Pelâ, de la
>>>>>> "*foresta" des pêrches de la Sêna.
>>>>>> ---
>>>>>> > 00748. On povêt per ègzemplo parlar, sot Charlo-lo- Pelâ, de la
>>>>>> "*foresta" des pêrches de la Sêna.
>>>>>>
>>>>>> Hèctor
>>>>>>
>>>>>> Message from Tanmai Khanna <khanna.tan...@gmail.com> on Sat, 29 Aug 2020
>>>>>> at 16:50:
>>>>>>
>>>>>> Hey guys!
>>>>>> The wordbound blanks project handles blanks that are supposed to be
>>>>>> reordered. Therefore, the user no longer needs to worry about blank
>>>>>> positions in transfer rules. With the latest update to the apertium code,
>>>>>> <b pos="X"/> is now the same as <b/>. You can change
>>>>>> the <b pos="X"/> in your transfer rules to just <b/> and it'll work.
>>>>>>
>>>>>> Now, the only thing you need to worry about when writing transfer
>>>>>> rules is whether you want a blank between the two LUs or not. *Input
>>>>>> blanks will be stored as a queue and will be printed in order in all
>>>>>> available <b/> spots in the rule output. *
>>>>>>
>>>>>> *Note:*
>>>>>> - If the output rule has more blank spots than input blanks, then the
>>>>>> remaining blank spots will be spaces.
>>>>>> - If the output rule has fewer blank spots than input blanks, then the
>>>>>> remaining input blanks will be output after the rule output.
>>>>>> - If the input blank is an empty string, it is stored as a space.
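
The three notes above can be sketched in a few lines of Python. This is only
an illustration of the queue behaviour as described in this message, not the
actual code in apertium-transfer: the names B and render_rule_output are
made up for the example, and words are plain strings rather than lexical
units.

```python
from collections import deque

# Hypothetical sentinel standing in for a <b/> spot in the rule output.
B = object()


def render_rule_output(output_items, input_blanks):
    """Join output words, filling each <b/> spot from a queue of input blanks.

    output_items: list of strings (words) and B sentinels (<b/> spots).
    input_blanks: blanks seen between the input words, in order.
    """
    # Note 3: an empty input blank is stored as a space.
    queue = deque(blank if blank != "" else " " for blank in input_blanks)
    parts = []
    for item in output_items:
        if item is B:
            # Note 1: with no input blanks left, a <b/> spot prints a space.
            parts.append(queue.popleft() if queue else " ")
        else:
            parts.append(item)
    # Note 2: leftover input blanks are flushed after the rule output.
    parts.extend(queue)
    return "".join(parts)


print(repr(render_rule_output(["A", B, "C"], ["-"])))
print(repr(render_rule_output(["A", B, "C", B, "D"], ["-"])))
print(repr(render_rule_output(["A", "C"], ["-", "_"])))
print(repr(render_rule_output(["A", B, "C"], [""])))
```

The second call shows why a rule that outputs more words than it matched can
end up with the hyphen in the first <b/> spot and a plain space in the next.
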
>>>>>>
>>>>>> In some transfer rules, there are input patterns which don't have a
>>>>>> space between them. In the output section of these transfer rules,
>>>>>> <b pos="1"/> used to give an empty string, but it will now give a
>>>>>> space. To remove the blank from the output, you will need to remove the
>>>>>> <b pos="1"/> from the transfer rule and it will be fine.
>>>>>>
>>>>>> Here are some examples from the tests.
>>>>>>
>>>>>> EXAMPLE 1:
>>>>>> Input:
>>>>>>
>>>>>> [blank1] ^worda<det>/wordta<det>$ ;[blank2]; ^wordb<adj>/wordtb<adj>$ [blank3];  ^hun<n><acr>/ho<n><acr>$ [blank4]
>>>>>>
>>>>>> There's no <b/> in the rule output, so all blanks are flushed
>>>>>> after the rule output.
>>>>>>
>>>>>> Output:
>>>>>>
>>>>>> [blank1] ^test1<adj>{^wordta<det>$^wordtb<adj>$^ho<n><acr>$}$ ;[blank2];  [blank3];   [blank4]
>>>>>>
>>>>>> EXAMPLE 2:
>>>>>> Input:
>>>>>>
>>>>>> [blank1] ^wordb<adj>/wordtb<adj>$ ;[blank2]; ^worda<det>/wordta<det>$ [blank3];  ^hun<n><acr>/ho<n><acr>$ [blank4]
>>>>>>
>>>>>> There's one <b/> in rule output, so it prints one and flushes the
>>>>>> rest.
>>>>>>
>>>>>> Output:
>>>>>>
>>>>>> [blank1] ^test1<det>{^wordta<det>$ ;[blank2]; ^ho<n><acr>$}$ [blank3];   [blank4]
>>>>>>
>>>>>> This has been implemented for the chunker, interchunk, and postchunk.
>>>>>>
>>>>>> If you have any questions, suggestions, comments, etc., I'll be happy
>>>>>> to respond to them.
>>>>>>
>>>>>> Thanks and Regards,
>>>>>> *तन्मय खन्ना *
>>>>>> *Tanmai Khanna*
>>>>>> _______________________________________________
>>>>>> Apertium-stuff mailing list
>>>>>> Apertium-stuff@lists.sourceforge.net
>>>>>> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>>>>>>
