Re: [Apertium-stuff] Update about superblanks in transfer
Hey guys, there are a few more important updates about blanks in Apertium.

1. Empty strings are now counted as blanks as well.
2. A blank is popped from the blank queue only if the <b/> or <b pos="X"/> is inside <out>..</out>, so that you can do checks with blanks (the front one in the queue, at least) in other places, such as macros or other blocks outside <out>..</out>.

Thanks and Regards,
तन्मय खन्ना
Tanmai Khanna
___
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
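A rough sketch of the second point, in Python rather than the actual C++ in apertium/apertium: a check outside <out> only looks at the front of the blank queue, while a <b/> inside <out> consumes it. The class and method names (BlankQueue, peek, pop_for_output) are made up for illustration, and falling back to a space when the queue is empty is an assumption based on the note in the original announcement that extra blank spots become spaces.

```python
from collections import deque


class BlankQueue:
    """Toy model of the per-rule blank queue (hypothetical names)."""

    def __init__(self, blanks):
        self._blanks = deque(blanks)

    def peek(self):
        # A check in a macro: look at the front blank without consuming it.
        return self._blanks[0] if self._blanks else None

    def pop_for_output(self):
        # A <b/> inside <out>: consume the front blank, or fall back to a space.
        return self._blanks.popleft() if self._blanks else " "


queue = BlankQueue(["-", " "])
front_is_hyphen = queue.peek() == "-"  # inspect only; the queue is unchanged
first = queue.pop_for_output()         # the same blank is still there to print
second = queue.pop_for_output()
third = queue.pop_for_output()         # queue exhausted, so a plain space
```

The point of keeping peek and pop separate is that a test on the blank does not use it up before the output block gets to print it.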
Re: [Apertium-stuff] Update about superblanks in transfer
> Are the changes being implemented going to alter the behavior of the
> punctuation marks that are not analyzed as tokens?

Yes, as was discussed in the thread about markup handling in Apertium, input blanks are now read as a queue and output in order in the available spots in the rule output. So it might not be possible to strictly control the position of the blanks in the output as was done earlier, but that was pretty much the intention of the change.

तन्मय खन्ना
Tanmai Khanna
Re: [Apertium-stuff] Update about superblanks in transfer
Tino Didriksen wrote on Sun, 30 Aug 2020 at 11:15:

> Why is - a blank in the first place? If it's needed in contexts, it should
> be fully analyzed as a token.
> This goes for all Apertium languages and pairs. I don't understand why
> punctuation generally isn't analyzed. I assume it's just historic.

There are pros and cons. For instance, if you analyze a quotation mark (") as a token, you need to adjust every disambiguation rule where the quote can appear (which is everywhere, in fact), and that can be very annoying.

I don't have a definitive answer. My guess (in the languages I am familiar with) is that most punctuation marks should interrupt the analysis, except for quotation marks, which should not (with some exceptions in turn).

Are the changes being implemented going to alter the behavior of the punctuation marks that are not analyzed as tokens?

Jaume
Re: [Apertium-stuff] Update about superblanks in transfer
Also, I agree with Tino: if the punctuation is important for the context, then it should probably be analysed as a token.

तन्मय खन्ना
Tanmai Khanna
Re: [Apertium-stuff] Update about superblanks in transfer
Hèctor, what I mean is: if you don't want a space in the output of a rule, you have to remove the <b/>. For example:

$ echo "f.75v." | apertium -d .. fra-frp
*f.75 v..

The space between 75 and v is there because the output rule has a <b/>, so if you want the output to come without a space, you should remove that <b/>.

This is because if the input blank is an empty string, then that isn't counted as a blank. Does that work?

तन्मय खन्ना
Tanmai Khanna
Re: [Apertium-stuff] Update about superblanks in transfer
Tanmai Khanna wrote on Sun, 30 Aug 2020 at 12:06:

> Hi Hèctor,
> I'm dealing with the issues I see one by one.
> 1. I was flushing the remaining blanks after processOut because I thought
> usually we only have one <out>..</out> block in the rule, but in some of
> your rules there's multiple, so in the latest commit to apertium/apertium,
> I made them flush after the rule is finished outputting entirely. This
> solves some of the issues such as:
>
> $ echo "au lycée Louis-le-Grand" | apertium -d .. fra-frp
> u licê Louis-lo-Grant.

It's too difficult to have a single <out> when dealing with complex structures. For instance, in French there is "not + verb + secondary-not", but in Arpitan I have "verb + not". Furthermore, the verb can be in a past tense in the source language but need "aux + participle" in the target language (and I have to deal with which of the auxiliaries to use). What's more, the verb can be pronominal in the output language but not in the source. So I use macros that deal with each of these issues and add or remove stuff. The result is a kind of multi-step output (and I'm not the only one who does it).

> 2. The spaces between numbers in your output are probably coming because
> you have <b/> in the rules. If you remove those, the spaces will go away.

I can't remove <b/> in the rules. They are added when a new word is added, so I must add a blank too, at its beginning or its end.

> I'm still evaluating some other issues.
Re: [Apertium-stuff] Update about superblanks in transfer
Why is - a blank in the first place? If it's needed in contexts, it should be fully analyzed as a token.

This goes for all Apertium languages and pairs. I don't understand why punctuation generally isn't analyzed. I assume it's just historic.

-- Tino Didriksen
Re: [Apertium-stuff] Update about superblanks in transfer
Hey,
It solved the Franco-Japanese issue as well :D
Can you check the diff and see if there are any more issues, Hèctor? (After updating apertium.)

तन्मय खन्ना
Tanmai Khanna
Re: [Apertium-stuff] Update about superblanks in transfer
Hi Hèctor,
I'm dealing with the issues I see one by one.

1. I was flushing the remaining blanks after processOut because I thought we usually only have one <out>..</out> block in a rule, but some of your rules have multiple. So, in the latest commit to apertium/apertium, I made them flush after the rule has finished outputting entirely. This solves some of the issues, such as:

$ echo "au lycée Louis-le-Grand" | apertium -d .. fra-frp
u licê Louis-lo-Grant.

2. The spaces between numbers in your output are probably there because you have <b/> in the rules. If you remove those, the spaces will go away.

I'm still evaluating some other issues.

तन्मय खन्ना
Tanmai Khanna
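To see why the timing of the flush matters, here is a toy Python model (not the real transfer code). The split of the output into two <out> blocks and the blank values are invented for illustration; only the idea of flushing leftover blanks after each block versus once at the end of the rule comes from the message above.

```python
from collections import deque

B = object()  # stands in for a <b/> spot in a rule's output


def run_rule(out_blocks, input_blanks, flush_per_block):
    """Render a rule with several output blocks (hypothetical helper)."""
    queue = deque(input_blanks)
    pieces = []
    for block in out_blocks:
        for item in block:
            if item is B:
                # Take the next stored blank, or a space if none are left.
                pieces.append(queue.popleft() if queue else " ")
            else:
                pieces.append(item)
        if flush_per_block:
            # Old behaviour in this model: dump leftovers after every block.
            pieces.extend(queue)
            queue.clear()
    # New behaviour in this model: leftovers go out once, after the whole rule.
    pieces.extend(queue)
    return "".join(pieces)


blocks = [["Louis"], [B, "lo", B, "Grant"]]
early = run_rule(blocks, ["-", "-"], flush_per_block=True)
late = run_rule(blocks, ["-", "-"], flush_per_block=False)
```

With the early flush, both hyphens are emitted right after the first block and the later spots fall back to spaces; with the late flush, the hyphens are still in the queue when the second block asks for them.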
Re: [Apertium-stuff] Update about superblanks in transfer
Tanmai Khanna wrote on Sun, 30 Aug 2020 at 9:49:

> My guess is, the transfer rule for Franco-Japanese has a two word input,
> so the stored blank is "-". Now the output has 3 words "una
> Franco-Japonêsa", since the blanks are printed in order, they're printed in
> the first available spot in the output rules.

Yes, that's it. "Franco" is a prefix and it is analysed as such. I have some tens of prefixes to avoid having hundreds of words in the dictionaries and, more importantly, to be able to deal with unknown pairs like "franco-tibétain" or "franco-silésien".
Re: [Apertium-stuff] Update about superblanks in transfer
My guess is that the transfer rule for Franco-Japanese has a two-word input,
so the stored blank is "-". The output now has three words, "una
Franco-Japonêsa", and since the blanks are printed in order, they are printed
in the first available spots in the rule output, which puts the "-" right
after "una".

There are a few possible solutions for this. One idea is to have two kinds of
blank markers: one that always prints a space, and one that prints the
available input blanks. This can also be implemented by having a in the output
rule and then in the next spot. If this seems too hacky a solution, we can
discuss other options.

*तन्मय खन्ना*
*Tanmai Khanna*

On Sun, Aug 30, 2020 at 12:09 PM Tanmai Khanna wrote:

> Hèctor,
> No worries I'll look into this. Can you send the input sentences? I want
> to see the transfer rules that are applying to the erroneous parts. They
> might need some changing.
>
> तन्मय खन्ना
> Tanmai Khanna
>
> --
> *From:* Hèctor Alòs i Font
> *Sent:* Sunday, August 30, 2020 11:57:16 AM
> *To:* [apertium-stuff]
> *Subject:* Re: [Apertium-stuff] Update about superblanks in transfer
>
> Unfortunately, I found a lot of problems caused by superblanks, especially
> with the handling of hyphens. See a couple of differences in translations
> of my French test corpus into Arpitan before and after the update:
>
> < 00607. Tandis que les Tétes Broulâyes sont en *permission sur *Espritos Marcos, tomba amouerox de Yvonne, una Franco-Japonêsa.
> ---
> > 00607. Tandis que les Tétes Broulâyes sont en *permission sur *Espritos Marcos, tomba amouerox de Yvonne, una- Franco Japonêsa.
>
> < 00748. On povêt per ègzemplo parlar, sot Charlo-lo-Pelâ, de la "*foresta" des pêrches de la Sêna.
> ---
> > 00748. On povêt per ègzemplo parlar, sot Charlo-lo- Pelâ, de la "*foresta" des pêrches de la Sêna.
>
> Hèctor
>
> Message from Tanmai Khanna on Sat, 29 Aug 2020 at 16:50:
>
> > Hey guys!
> > The wordbound blanks project handles blanks that are supposed to be
> > reordered. Therefore, we no longer need the user to be worried about
> > blank positions in transfer rules. The latest update to the apertium
> > code makes it such that <b pos="X"/> is now the same as <b/>. You can
> > change the <b pos="X"/> in your transfer rules to just <b/> and it'll
> > work.
> >
> > Now, the only thing you need to worry about when writing transfer rules
> > is whether you want a blank between the two LUs or not. *Input blanks
> > will be stored as a queue and will be printed in order in all available
> > spots in the rule output.*
> >
> > *Note:*
> > - If the output rule has more blank spots than input blanks, then the
> >   remaining blank spots will be spaces.
> > - If the output rule has fewer blank spots than input blanks, then the
> >   remaining input blanks will be output after the rule output.
> > - If the input blank is an empty string, it is stored as a space.
> >
> > In some transfer rules, there are input patterns which don't have a
> > space between them. In the output section of these transfer rules,
> > <b pos="X"/> used to give an empty string, but it will now give a space.
> > To remove the blank from the output, you will need to remove that
> > element from the transfer rule and it will be fine.
> >
> > Here are some examples from the tests.
> >
> > EXAMPLE 1:
> > Input:
> >
> > [blank1] ^worda/wordta$ ;[blank2]; ^wordb/wordtb$ [blank3]; ^hun/ho$ [blank4]
> >
> > There's no <b/> in rule output, so all blanks are flushed after the
> > rule output.
> >
> > Output:
> >
> > [blank1] ^test1{^wordta$^wordtb$^ho$}$ ;[blank2]; [blank3]; [blank4]
> >
> > EXAMPLE 2:
> > Input:
> >
> > [blank1] ^wordb/wordtb$ ;[blank2]; ^worda/wordta$ [blank3]; ^hun/ho$ [blank4]
> >
> > There's one <b/> in rule output, so it prints one and flushes the rest.
> >
> > Output:
> >
> > [blank1] ^test1{^wordta$ ;[blank2]; ^ho$}$ [blank3]; [blank4]
> >
> > This has been implemented for the chunker, interchunk, and postchunk.
> >
> > If you have any questions, suggestions, comments, etc., I'll be happy
> > to respond to them.
> >
> > Thanks and Regards,
> > *तन्मय खन्ना*
> > *Tanmai Khanna*
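The queue behaviour described in the announcement quoted above can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not the apertium transfer code: fill_blanks, BLANK and SPACE are hypothetical names made up for the sketch, words are plain strings, and the whitespace around superblanks is ignored. BLANK stands for a blank spot in the rule output. SPACE models the idea floated in the reply above of a second marker that always prints a space; nothing in this thread says that it was implemented.

```python
from collections import deque

# Hypothetical markers for this sketch; they are not Apertium identifiers.
BLANK = object()  # a blank spot in the rule output that takes the next queued input blank
SPACE = object()  # the proposed marker that always prints a plain space


def fill_blanks(input_blanks, output_pattern):
    """Return (rule_output, leftover_blanks) for one rule application.

    input_blanks   -- blanks found between the matched input words, in order
    output_pattern -- output words interleaved with BLANK / SPACE markers
    """
    # An empty input blank is stored as a space.
    queue = deque(blank if blank != "" else " " for blank in input_blanks)
    pieces = []
    for item in output_pattern:
        if item is BLANK:
            # Take the next input blank; fall back to a space when none is left.
            pieces.append(queue.popleft() if queue else " ")
        elif item is SPACE:
            pieces.append(" ")
        else:
            pieces.append(item)
    # Blanks that found no spot are emitted after the rule output.
    return "".join(pieces), "".join(queue)


# EXAMPLE 1: no blank spot in the output, so both blanks are flushed afterwards.
print(fill_blanks([";[blank2];", "[blank3];"], ["^wordta$", "^wordtb$", "^ho$"]))

# EXAMPLE 2: one blank spot takes the first blank, the rest is flushed.
print(fill_blanks([";[blank2];", "[blank3];"], ["^wordta$", BLANK, "^ho$"]))

# Two-word input joined by "-", three-word output with two blank spots.
print(fill_blanks(["-"], ["una", BLANK, "Franco", BLANK, "Japonêsa"]))
print(fill_blanks(["-"], ["una", SPACE, "Franco", BLANK, "Japonêsa"]))
```

In this simplified model the third call yields "una-Franco Japonêsa", because the hyphen goes into the first available spot, while the fourth call yields "una Franco-Japonêsa", because the first spot is forced to a space. It illustrates the mechanism only and is not a reproduction of the actual pipeline output.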
Re: [Apertium-stuff] Update about superblanks in transfer
Hèctor,

No worries I'll look into this. Can you send the input sentences? I want to
see the transfer rules that are applying to the erroneous parts. They might
need some changing.

तन्मय खन्ना
Tanmai Khanna
Re: [Apertium-stuff] Update about superblanks in transfer
Unfortunately, I found a lot of problems caused by superblanks, especially
with the handling of hyphens. See a couple of differences in translations of
my French test corpus into Arpitan before and after the update:

< 00607. Tandis que les Tétes Broulâyes sont en *permission sur *Espritos Marcos, tomba amouerox de Yvonne, una Franco-Japonêsa.
---
> 00607. Tandis que les Tétes Broulâyes sont en *permission sur *Espritos Marcos, tomba amouerox de Yvonne, una- Franco Japonêsa.

< 00748. On povêt per ègzemplo parlar, sot Charlo-lo-Pelâ, de la "*foresta" des pêrches de la Sêna.
---
> 00748. On povêt per ègzemplo parlar, sot Charlo-lo- Pelâ, de la "*foresta" des pêrches de la Sêna.

Hèctor
Re: [Apertium-stuff] Update about superblanks in transfer
Tanmai Khanna wrote:

> we no longer need the user to be worried about blank
> positions in transfer rules. The latest update to the apertium code makes
> it such that <b pos="X"/> is now the same as <b/>. You can change the
> <b pos="X"/> in your transfer rules to just <b/> and it'll work.
>
> Now, the only thing you need to worry about when writing transfer rules is
> whether you want a blank between the two LUs or not.