Re: [Apertium-stuff] Update about superblanks in transfer

Tanmai Khanna Sun, 30 Aug 2020 02:47:38 -0700

Hèctor, what I mean is that if you don't want a space in the rule output, you
have to remove the <b/>.
For example:


$ echo "f.75v." | apertium -d .. fra-frp

*f.75 v..


The space between "75" and "v." now appears because the rule outputs a <b/>,
so if you want the output without a space, you should remove the
<b pos="1"/> from this rule:

<rule comment="REGLA: NUM NOM">
  <pattern>
    <pattern-item n="num"/>
    <pattern-item n="nom"/>
  </pattern>
  <action>
    <call-macro n="f_concord2">
      <with-param pos="2"/>
      <with-param pos="1"/>
    </call-macro>
    <call-macro n="f_chunk_tags">
      <with-param pos="2"/>
    </call-macro>
    <out>
      <chunk name="num_n">
        <tags>
          <tag><lit-tag v="SN"/></tag>
          <tag><clip pos="2" side="tl" part="gen"/></tag>
          <tag><clip pos="2" side="tl" part="nbr"/></tag>
          <tag><var n="genero_sl"/></tag>
          <tag><var n="numero_sl"/></tag>
        </tags>
        <lu>
          <clip pos="1" side="tl" part="lemh"/>
          <clip pos="1" side="tl" part="tags"/>
          <clip pos="1" side="tl" part="lemq"/>
        </lu>
        <b pos="1"/>
        <lu>
          <clip pos="2" side="tl" part="lemh"/>
          <clip pos="2" side="tl" part="tags"/>
          <clip pos="2" side="tl" part="lemq"/>
        </lu>
      </chunk>
    </out>
  </action>
</rule>

This is because an empty input blank (like the one between "75" and "v.")
isn't counted as a blank, so no input blank is queued and the <b/> falls back
to a space. Does that work?
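To make the reasoning concrete, here is a minimal Python sketch of the behaviour described above. It is a toy model, not Apertium's transfer code: the `render` function and the `B` marker (standing for a <b/> spot) are hypothetical, words are plain strings rather than lexical units, and empty input blanks are simply dropped, as described in this message.

```python
from collections import deque

B = object()  # hypothetical marker standing for a <b/> spot in the rule output


def render(rule_output, input_blanks):
    """Toy model of the blank queue (an assumption, not Apertium's code)."""
    # Empty-string input blanks are not counted, so they never enter the queue.
    queue = deque(b for b in input_blanks if b != "")
    parts = []
    for item in rule_output:
        if item is B:
            # Take the next queued input blank, or fall back to a space.
            parts.append(queue.popleft() if queue else " ")
        else:
            parts.append(item)
    # Any blanks the rule did not use are flushed after the rule output.
    parts.extend(queue)
    return "".join(parts)


# "75" and "v." are adjacent in "f.75v.", so the only input blank is "".
with_blank = render(["75", B, "v."], [""])
without_blank = render(["75", "v."], [""])
print(repr(with_blank))     # '75 v.'
print(repr(without_blank))  # '75v.'
```

In this model the space comes from the "more <b/> spots than input blanks" fallback, which is why removing the <b/> is the only way to get "75v." back.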

*तन्मय खन्ना *
*Tanmai Khanna*


On Sun, Aug 30, 2020 at 2:58 PM Hèctor Alòs i Font <hectora...@gmail.com>
wrote:

> On Sun, 30 Aug 2020 at 12:06, Tanmai Khanna <khanna.tan...@gmail.com>
> wrote:
>
>> Hi Hèctor,
>> I'm dealing with the issues I see one by one.
>> 1. I was flushing the remaining blanks after processOut because I thought
>> rules usually have only one <out>..</out> block, but some of your rules
>> have several, so in the latest commit to apertium/apertium I made them
>> flush only after the rule has finished all of its output. This solves
>> some of the issues, such as:
>>
>> $ echo "au lycée Louis-le-Grand" | apertium -d .. fra-frp
>>
>> u licê Louis-lo-Grant.
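A minimal Python sketch of the two flushing policies follows. It assumes, purely for illustration, a rule that writes "Louis" in one <out> block and "lo" and "Grant" in a second one, with the two hyphens stored as input blanks; `run_rule` and the `B` marker for a <b/> spot are hypothetical names, and the real rules and analyses may well differ.

```python
from collections import deque

B = object()  # hypothetical stand-in for a <b/> spot


def run_rule(out_blocks, input_blanks, flush_after_each_out):
    """Compare the two flushing policies (illustrative assumption only)."""
    queue = deque(input_blanks)
    parts = []
    for block in out_blocks:
        for item in block:
            if item is B:
                parts.append(queue.popleft() if queue else " ")
            else:
                parts.append(item)
        if flush_after_each_out:
            # Old behaviour: leftover blanks are dumped after every <out> block.
            parts.extend(queue)
            queue.clear()
    # New behaviour: leftovers are flushed once the whole rule has output.
    parts.extend(queue)
    return "".join(parts)


# Hypothetical rule writing "Louis" in one <out> and "lo", "Grant" in another.
blocks = [["Louis"], [B, "lo", B, "Grant"]]
blanks = ["-", "-"]
print(run_rule(blocks, blanks, flush_after_each_out=True))   # Louis-- lo Grant
print(run_rule(blocks, blanks, flush_after_each_out=False))  # Louis-lo-Grant
```

Flushing after each <out> empties the queue too early, so the <b/> spots in later <out> blocks fall back to spaces.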
>>
>>
>>
> It's too difficult to have a single <out> when dealing with complex
> structures. For instance, in French there is "not + verb + secondary-not",
> but in Arpitan I have "verb + not". Furthermore, the verb can be in a past
> tense in the source language but need "aux + participle" in the target
> language (and I have to decide which auxiliary to use). Moreover, the verb
> can be pronominal in the target language but not in the source. So I use
> macros that deal with each of these issues and add or remove material. The
> result is a kind of multi-step output (and I'm not the only one who does
> this).
>
>
>> 2. The spaces between numbers in your output probably come from the <b/>
>> in your rules. If you remove those, the spaces will go away.
>>
>
> I can't remove the <b/> from the rules. They are there because a new word
> is added, so I must add a blank too, either before or after it.
>
>
>>
>> I'm still evaluating some other issues.
>>
>>
>> *तन्मय खन्ना *
>> *Tanmai Khanna*
>>
>>
>> On Sun, Aug 30, 2020 at 1:21 PM Hèctor Alòs i Font <hectora...@gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Sun, 30 Aug 2020 at 9:49, Tanmai Khanna <khanna.tan...@gmail.com>
>>> wrote:
>>>
>>>> My guess is that the transfer rule for "Franco-Japanese" has a two-word
>>>> input, so the stored blank is "-". The output now has three words ("una
>>>> Franco-Japonêsa"), and since blanks are printed in order, the "-" is
>>>> printed in the first available <b/> spot in the rule output.
>>>>
>>>
>>> Yes, that's it. "Franco" is a prefix and is analysed as such. I have a
>>> few dozen prefixes to avoid having hundreds of words in the dictionaries
>>> and, more importantly, to be able to deal with unknown pairs like
>>> "franco-tibétain" or "franco-silésien".
>>>
>>>
>>>>
>>>> There are a few possible solutions for this. One idea is to have two
>>>> kinds of blank markers: one that always prints a space, and one that
>>>> prints the available input blanks. This can also be implemented by putting
>>>> a <lit v=" "/> in the output rule and then a <b/> in the next spot. If
>>>> this seems too hacky, we can discuss other options.
>>>>
>>>> *तन्मय खन्ना *
>>>> *Tanmai Khanna*
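The "-" landing in the wrong slot, and the proposed <lit v=" "/> workaround, can be sketched in Python as below. This is a simplified toy model under stated assumptions: `render` and the `B` marker for a <b/> spot are hypothetical, words are plain strings, and it does not try to reproduce the extra space seen in the corpus line "una- Franco Japonêsa".

```python
from collections import deque

B = object()  # hypothetical stand-in for a <b/> spot


def render(rule_output, input_blanks):
    """Print queued input blanks, in order, in the <b/> spots (sketch)."""
    queue = deque(input_blanks)
    parts = []
    for item in rule_output:
        if item is B:
            parts.append(queue.popleft() if queue else " ")
        else:
            parts.append(item)  # words and a literal " " are printed verbatim
    parts.extend(queue)
    return "".join(parts)


blanks = ["-"]  # the single blank stored from the two-word input
# Both spots as <b/>: the "-" lands in the first one.
naive = render(["una", B, "Franco", B, "Japonêsa"], blanks)  # "una-Franco Japonêsa"
# First spot as a literal space, second as <b/>: the "-" lands where intended.
fixed = render(["una", " ", "Franco", B, "Japonêsa"], blanks)  # "una Franco-Japonêsa"
```

The literal space does not consume anything from the queue, so the one stored blank is saved for the <b/> between the prefix and the noun.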
>>>>
>>>>
>>>> On Sun, Aug 30, 2020 at 12:09 PM Tanmai Khanna <khanna.tan...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hèctor,
>>>>> No worries, I'll look into this. Can you send the input sentences? I
>>>>> want to see which transfer rules apply to the erroneous parts. They
>>>>> might need some changes.
>>>>>
>>>>> तन्मय खन्ना
>>>>> Tanmai Khanna
>>>>>
>>>>> ------------------------------
>>>>> *From:* Hèctor Alòs i Font <hectora...@gmail.com>
>>>>> *Sent:* Sunday, August 30, 2020 11:57:16 AM
>>>>> *To:* [apertium-stuff] <apertium-stuff@lists.sourceforge.net>
>>>>> *Subject:* Re: [Apertium-stuff] Update about superblanks in transfer
>>>>>
>>>>> Unfortunately, I found a lot of problems caused by superblanks,
>>>>> especially with the handling of hyphens. Here are a couple of differences
>>>>> between translations of my French test corpus into Arpitan before and
>>>>> after the update:
>>>>>
>>>>> < 00607. Tandis que les Tétes Broulâyes sont en *permission sur
>>>>> *Espritos Marcos, tomba amouerox de Yvonne, una Franco-Japonêsa.
>>>>> ---
>>>>> > 00607. Tandis que les Tétes Broulâyes sont en *permission sur
>>>>> *Espritos Marcos, tomba amouerox de Yvonne, una- Franco Japonêsa.
>>>>>
>>>>> < 00748. On povêt per ègzemplo parlar, sot Charlo-lo-Pelâ, de la
>>>>> "*foresta" des pêrches de la Sêna.
>>>>> ---
>>>>> > 00748. On povêt per ègzemplo parlar, sot Charlo-lo- Pelâ, de la
>>>>> "*foresta" des pêrches de la Sêna.
>>>>>
>>>>> Hèctor
>>>>>
>>>>> On Sat, 29 Aug 2020 at 16:50, Tanmai Khanna <khanna.tan...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Hey guys!
>>>>> The wordbound blanks project handles blanks that are supposed to be
>>>>> reordered, so users no longer need to worry about blank positions in
>>>>> transfer rules. With the latest update to the apertium code,
>>>>> <b pos="X"/> is now the same as <b/>. You can change the <b pos="X"/>
>>>>> in your transfer rules to just <b/> and it'll work.
>>>>>
>>>>> Now, the only thing you need to worry about when writing transfer
>>>>> rules is whether you want a blank between the two LUs or not. *Input
>>>>> blanks will be stored as a queue and will be printed in order in all
>>>>> available <b/> spots in the rule output. *
>>>>>
>>>>> *Note:*
>>>>> - If the output rule has more blank spots than input blanks, then the
>>>>> remaining blank spots will be spaces.
>>>>> - If the output rule has fewer blank spots than input blanks, then the
>>>>> remaining input blanks will be output after the rule output.
>>>>> - If the input blank is an empty string, it is stored as a space.
>>>>>
>>>>> In some transfer rules, the input pattern items have no space between
>>>>> them. In the output section of these rules, <b pos="1"/> used to give an
>>>>> empty string, but it will now give a space. To remove the blank from the
>>>>> output, remove the <b pos="1"/> from the transfer rule.
>>>>>
>>>>> Here are some examples from the tests.
>>>>>
>>>>> EXAMPLE 1:
>>>>> Input:
>>>>>
>>>>> [blank1] ^worda<det>/wordta<det>$ ;[blank2]; ^wordb<adj>/wordtb<adj>$ [blank3];  ^hun<n><acr>/ho<n><acr>$ [blank4]
>>>>>
>>>>> There's no <b/> in the rule output, so all blanks are flushed after the
>>>>> rule output.
>>>>>
>>>>> Output:
>>>>>
>>>>> [blank1] ^test1<adj>{^wordta<det>$^wordtb<adj>$^ho<n><acr>$}$ ;[blank2];  [blank3];   [blank4]
>>>>>
>>>>> EXAMPLE 2:
>>>>> Input:
>>>>>
>>>>> [blank1] ^wordb<adj>/wordtb<adj>$ ;[blank2]; ^worda<det>/wordta<det>$ [blank3];  ^hun<n><acr>/ho<n><acr>$ [blank4]
>>>>>
>>>>> There's one <b/> in the rule output, so one blank is printed there and
>>>>> the rest are flushed after the rule output.
>>>>>
>>>>> Output:
>>>>>
>>>>> [blank1] ^test1<det>{^wordta<det>$ ;[blank2]; ^ho<n><acr>$}$ [blank3];   [blank4]
>>>>>
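The queue semantics announced above can be modelled in a few lines of Python. This is a sketch of the stated rules only, not the actual apertium implementation: `render` and the `B` marker for a <b/> spot are hypothetical, and the blank strings are read off the input lines of the two examples. Running it prints the two outputs shown above.

```python
from collections import deque

B = object()  # hypothetical stand-in for a <b/> spot in the rule output


def render(rule_output, input_blanks):
    """Sketch of the announced semantics (not the real implementation).

    - input blanks are queued and printed, in order, in the <b/> spots;
    - surplus <b/> spots become spaces;
    - leftover input blanks are flushed after the rule output;
    - an empty-string input blank is stored as a space.
    """
    queue = deque(b if b != "" else " " for b in input_blanks)
    parts = []
    for item in rule_output:
        if item is B:
            parts.append(queue.popleft() if queue else " ")
        else:
            parts.append(item)
    parts.extend(queue)  # flush whatever the rule did not use
    return "".join(parts)


# Blanks between the three matched words in EXAMPLE 1 and EXAMPLE 2.
blanks = [" ;[blank2]; ", " [blank3];  "]

# EXAMPLE 1: the chunk has no <b/>, so both blanks follow the chunk.
ex1 = render(["^test1<adj>{^wordta<det>$^wordtb<adj>$^ho<n><acr>$}$"], blanks)

# EXAMPLE 2: one <b/> inside the chunk takes the first blank.
ex2 = render(["^test1<det>{^wordta<det>$", B, "^ho<n><acr>$}$"], blanks)

# [blank1] precedes the match and [blank4] follows it; neither is queued.
print("[blank1] " + ex1 + " [blank4]")
print("[blank1] " + ex2 + " [blank4]")
```

Note that the "empty string is stored as a space" rule here follows this announcement; later in the thread an empty input blank is described as not counted at all.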
>>>>> This has been implemented for the chunker, interchunk, and postchunk.
>>>>>
>>>>> If you have any questions, suggestions, comments, etc., I'll be happy
>>>>> to respond to them.
>>>>>
>>>>> Thanks and Regards,
>>>>> *तन्मय खन्ना *
>>>>> *Tanmai Khanna*
>>>>> _______________________________________________
>>>>> Apertium-stuff mailing list
>>>>> Apertium-stuff@lists.sourceforge.net
>>>>> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>>>>>
