Also, I agree with Tino: if the punctuation is important for the context, then it should probably be analysed as a token.
*तन्मय खन्ना *
*Tanmai Khanna*

On Sun, Aug 30, 2020 at 3:16 PM Tanmai Khanna <khanna.tan...@gmail.com> wrote:

> Hèctor, what I mean is: if you don't want a space in the output of a rule,
> you have to remove the <b/>.
> For example:
>
> $ echo "f.75v." | apertium -d .. fra-frp
>
> *f.75 v..
>
> The space between 75 and v is there because the output rule has a <b/>,
> so if you want the output to come without a space, you should remove
> the <b/>.
>
> <rule comment="REGLA: NUM NOM">
>   <pattern>
>     <pattern-item n="num"/>
>     <pattern-item n="nom"/>
>   </pattern>
>   <action>
>     <call-macro n="f_concord2">
>       <with-param pos="2"/>
>       <with-param pos="1"/>
>     </call-macro>
>     <call-macro n="f_chunk_tags">
>       <with-param pos="2"/>
>     </call-macro>
>     <out>
>       <chunk name="num_n">
>         <tags>
>           <tag><lit-tag v="SN"/></tag>
>           <tag><clip pos="2" side="tl" part="gen"/></tag>
>           <tag><clip pos="2" side="tl" part="nbr"/></tag>
>           <tag><var n="genero_sl"/></tag>
>           <tag><var n="numero_sl"/></tag>
>         </tags>
>         <lu>
>           <clip pos="1" side="tl" part="lemh"/>
>           <clip pos="1" side="tl" part="tags"/>
>           <clip pos="1" side="tl" part="lemq"/>
>         </lu>
>         <b pos="1"/>
>         <lu>
>           <clip pos="2" side="tl" part="lemh"/>
>           <clip pos="2" side="tl" part="tags"/>
>           <clip pos="2" side="tl" part="lemq"/>
>         </lu>
>       </chunk>
>     </out>
>   </action>
> </rule>
>
> This happens because an input blank that is an empty string isn't
> counted as a blank. Does that work?
>
> *तन्मय खन्ना *
> *Tanmai Khanna*
>
>
> On Sun, Aug 30, 2020 at 2:58 PM Hèctor Alòs i Font <hectora...@gmail.com>
> wrote:
>
>> Message from Tanmai Khanna <khanna.tan...@gmail.com> on Sun, 30 Aug
>> 2020 at 12:06:
>>
>>> Hi Hèctor,
>>> I'm dealing with the issues I see one by one.
>>> 1. I was flushing the remaining blanks after processOut because I
>>> thought we usually have only one <out>..</out> block in a rule, but
>>> some of your rules have several. So, in the latest commit to
>>> apertium/apertium, I made the blanks flush only after the rule has
>>> finished outputting entirely. This solves some of the issues, such as:
>>>
>>> $ echo "au lycée Louis-le-Grand" | apertium -d .. fra-frp
>>>
>>> u licê Louis-lo-Grant.
>>>
>>
>> It's too difficult to have a single <out> when dealing with complex
>> structures. For instance, in French there is "not + verb + secondary-not",
>> but in Arpitan I have "verb + not". Furthermore, the verb can be in a past
>> tense in the source language but need "aux + participle" in the target
>> language (and I have to decide which of the auxiliaries to use). Moreover,
>> the verb can be pronominal in the output language but not in the source.
>> So I use macros that deal with each of these issues and add or remove
>> stuff. The result is a kind of multi-step output (and I'm not the only
>> one who does it).
>>
>>> 2. The spaces between numbers in your output are probably coming
>>> because you have <b/> in the rules. If you remove those, the spaces
>>> will go away.
>>>
>>
>> I can't remove <b/> in the rules. They are added when a new word is
>> added, so I must add a blank too, at its beginning or its end.
>>
>>>
>>> I'm still evaluating some other issues.
>>>
>>>
>>> *तन्मय खन्ना *
>>> *Tanmai Khanna*
>>>
>>>
>>> On Sun, Aug 30, 2020 at 1:21 PM Hèctor Alòs i Font <hectora...@gmail.com>
>>> wrote:
>>>
>>>> Message from Tanmai Khanna <khanna.tan...@gmail.com> on Sun, 30 Aug
>>>> 2020 at 9:49:
>>>>
>>>>> My guess is that the transfer rule for Franco-Japanese has a two-word
>>>>> input, so the stored blank is "-". The output now has three words,
>>>>> "una Franco-Japonêsa", and since the blanks are printed in order,
>>>>> they're printed in the first available <b/> spot in the output rules.
>>>>>
>>>>
>>>> Yes, that's it. "Franco" is a prefix and it is analysed as such. I have
>>>> some tens of prefixes to avoid having hundreds of words in the
>>>> dictionaries and, more importantly, to be able to deal with unknown
>>>> pairs like "franco-tibétain" or "franco-silésien".
>>>>
>>>>>
>>>>> There are a few possible solutions for this. One idea is to have two
>>>>> kinds of blank markers: one that will always print a space, and one
>>>>> that will print available input blanks. This can also be implemented
>>>>> by having a <lit v=" "/> in the output rule and then <b/> in the next
>>>>> spot. If this seems too hacky a solution, we can discuss other options.
>>>>>
>>>>> *तन्मय खन्ना *
>>>>> *Tanmai Khanna*
>>>>>
>>>>>
>>>>> On Sun, Aug 30, 2020 at 12:09 PM Tanmai Khanna <
>>>>> khanna.tan...@gmail.com> wrote:
>>>>>
>>>>>> Hèctor,
>>>>>> No worries, I'll look into this. Can you send the input sentences? I
>>>>>> want to see the transfer rules that are applying to the erroneous
>>>>>> parts. They might need some changing.
>>>>>>
>>>>>> तन्मय खन्ना
>>>>>> Tanmai Khanna
>>>>>>
>>>>>> ------------------------------
>>>>>> *From:* Hèctor Alòs i Font <hectora...@gmail.com>
>>>>>> *Sent:* Sunday, August 30, 2020 11:57:16 AM
>>>>>> *To:* [apertium-stuff] <apertium-stuff@lists.sourceforge.net>
>>>>>> *Subject:* Re: [Apertium-stuff] Update about superblanks in transfer
>>>>>>
>>>>>> Unfortunately, I found a lot of problems caused by superblanks,
>>>>>> especially with the handling of hyphens. See a couple of differences
>>>>>> in translations of my French test corpus into Arpitan before and
>>>>>> after the update:
>>>>>>
>>>>>> < 00607. Tandis que les Tétes Broulâyes sont en *permission sur
>>>>>> *Espritos Marcos, tomba amouerox de Yvonne, una Franco-Japonêsa.
>>>>>> ---
>>>>>> > 00607. Tandis que les Tétes Broulâyes sont en *permission sur
>>>>>> *Espritos Marcos, tomba amouerox de Yvonne, una- Franco Japonêsa.
>>>>>>
>>>>>> < 00748. On povêt per ègzemplo parlar, sot Charlo-lo-Pelâ, de la
>>>>>> "*foresta" des pêrches de la Sêna.
>>>>>> ---
>>>>>> > 00748. On povêt per ègzemplo parlar, sot Charlo-lo- Pelâ, de la
>>>>>> "*foresta" des pêrches de la Sêna.
>>>>>>
>>>>>> Hèctor
>>>>>>
>>>>>> Message from Tanmai Khanna <khanna.tan...@gmail.com> on Sat, 29 Aug
>>>>>> 2020 at 16:50:
>>>>>>
>>>>>> Hey guys!
>>>>>> The wordbound blanks project handles blanks that are supposed to be
>>>>>> reordered. Therefore, users no longer need to worry about blank
>>>>>> positions in transfer rules. With the latest update to the apertium
>>>>>> code, <b pos="X"/> is now the same as <b/>. You can change the
>>>>>> <b pos="X"/> in your transfer rules to just <b/> and it'll work.
>>>>>>
>>>>>> Now, the only thing you need to worry about when writing transfer
>>>>>> rules is whether you want a blank between two LUs or not. *Input
>>>>>> blanks will be stored as a queue and will be printed in order in all
>>>>>> available <b/> spots in the rule output.*
>>>>>>
>>>>>> *Note:*
>>>>>> - If the output rule has more blank spots than input blanks, then the
>>>>>> remaining blank spots will be spaces.
>>>>>> - If the output rule has fewer blank spots than input blanks, then the
>>>>>> remaining input blanks will be output after the rule output.
>>>>>> - If the input blank is an empty string, it is stored as a space.
>>>>>>
>>>>>> In some transfer rules, there are input patterns which don't have a
>>>>>> space between them. In the output section of these transfer rules,
>>>>>> <b pos="1"/> used to give an empty string, but it will now give a
>>>>>> space. To remove the blank from the output, you will need to remove
>>>>>> the <b pos="1"/> from the transfer rule.
>>>>>>
>>>>>> Here are some examples from the tests.
>>>>>>
>>>>>> EXAMPLE 1:
>>>>>> Input:
>>>>>>
>>>>>> [blank1] ^worda<det>/wordta<det>$ ;[blank2]; ^wordb<adj>/wordtb<adj>$
>>>>>> [blank3]; ^hun<n><acr>/ho<n><acr>$ [blank4]
>>>>>>
>>>>>> There's no <b/> in the rule output, so all blanks are flushed after
>>>>>> the rule output.
>>>>>>
>>>>>> Output:
>>>>>>
>>>>>> [blank1] ^test1<adj>{^wordta<det>$^wordtb<adj>$^ho<n><acr>$}$ ;[blank2];
>>>>>> [blank3]; [blank4]
>>>>>>
>>>>>> EXAMPLE 2:
>>>>>> Input:
>>>>>>
>>>>>> [blank1] ^wordb<adj>/wordtb<adj>$ ;[blank2]; ^worda<det>/wordta<det>$
>>>>>> [blank3]; ^hun<n><acr>/ho<n><acr>$ [blank4]
>>>>>>
>>>>>> There's one <b/> in the rule output, so it prints one blank and
>>>>>> flushes the rest.
>>>>>>
>>>>>> Output:
>>>>>>
>>>>>> [blank1] ^test1<det>{^wordta<det>$ ;[blank2]; ^ho<n><acr>$}$ [blank3];
>>>>>> [blank4]
>>>>>>
>>>>>> This has been implemented for the chunker, interchunk, and postchunk.
>>>>>>
>>>>>> If you have any questions, suggestions, comments, etc., I'll be happy
>>>>>> to respond to them.
>>>>>>
>>>>>> Thanks and Regards,
>>>>>> *तन्मय खन्ना *
>>>>>> *Tanmai Khanna*
>>>>>> _______________________________________________
>>>>>> Apertium-stuff mailing list
>>>>>> Apertium-stuff@lists.sourceforge.net
>>>>>> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
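
For anyone following the thread, the blank-queue rules from the quoted announcement can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not the actual apertium transfer code: `render` and the `BLANK` marker are hypothetical names, a rule's output is reduced to a flat list of words and <b/> spots, and empty input blanks are handled the way the announcement words it (stored as a space).

```python
from collections import deque

BLANK = object()  # hypothetical marker standing in for a <b/> spot in a rule's output


def render(output_items, input_blanks):
    """Fill <b/> spots from a FIFO queue of input blanks, then flush leftovers."""
    # Assumption taken from the announcement: an empty input blank is stored as a space.
    queue = deque(b if b != "" else " " for b in input_blanks)
    parts = []
    for item in output_items:
        if item is BLANK:
            # More <b/> spots than input blanks: the extra spots become plain spaces.
            parts.append(queue.popleft() if queue else " ")
        else:
            parts.append(item)
    # Fewer <b/> spots than input blanks: leftovers are printed after the rule output.
    parts.extend(queue)
    return "".join(parts)


# Two-word input "Franco-Japonêsa" (one stored blank, "-"), three-word output:
# the hyphen lands in the first <b/> spot, which is the misplacement seen above.
print(render(["una", BLANK, "Franco", BLANK, "Japonêsa"], ["-"]))
```

In this model the hyphen is consumed by the first <b/> after "una", which is why a separate "always print a space" marker, such as the <lit v=" "/> suggested above, would keep it in place.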