Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-09-04 Thread Tanmai Khanna
In response to this issue, empty blanks are considered blanks again, so a
<b/> or a <b pos="1"/> in the output will emit an empty blank if the
input had an empty blank in that position.

To reiterate, <b/> and <b pos="1"/> now do the same thing. Earlier,
<b pos="1"/> was used to print a blank from the input and <b/> was used to
print a literal space in the output. Now, a <b/> will print a blank from
the input, in the order the blanks were read. In transfer rules written in
the future there's no need to add a pos attribute to <b/>, and the ones that
already exist will act the same as a <b/>.
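
A minimal Python sketch of the blank-queue behaviour described above. It is
not the actual apertium-transfer code: the names render and fallback are made
up for illustration, and what happens when the queue runs dry (a single space
here) is an assumption.

```python
from collections import deque


def render(rule_output, input_blanks, fallback=" "):
    """Join rule output items, filling every <b .../> from the input blanks in order."""
    queue = deque(input_blanks)
    out = []
    for item in rule_output:
        if item.startswith("<b"):
            # pos is ignored: <b/> and <b pos="N"/> both take the next blank read.
            # Assumption: an exhausted queue yields `fallback`.
            out.append(queue.popleft() if queue else fallback)
        else:
            out.append(item)
    return "".join(out)


# An empty input blank stays empty in the output.
print(render(["^052$", '<b pos="1"/>', "^F$"], [""]))  # prints ^052$^F$
# Blanks come out in the order they were read, whatever pos says.
print(render(["^B$", '<b pos="2"/>', "^A$", "<b/>", "^C$"], ["  ", " "]))
```

Because pos is ignored, swapping the LUs in the rule output does not move the
blanks with them, which is the "no reordering of blanks" point made below.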

This means that there's no way to reorder blanks from transfer rules now,
but that is by design. Hèctor, let me know if this solved your issue :)

Thanks and Regards,
*तन्मय खन्ना *
*Tanmai Khanna*



Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-09-04 Thread Hèctor Alòs i Font
On Fri, 4 Sep 2020 at 9:22, Tanmai Khanna wrote:

> Hèctor,
> Yes, the new improvements aren't backwards compatible but that's because
> they're better than the system we had earlier. Here's the changes:
>
>> So, you are saying that the new stuff is not backwards compatible, aren't
>> you? There aren't any <b/> in the rule, but <b pos="1"/>, which is not
>> the same. Until now, <b/> means explicitly putting a blank, while
>> <b pos="1,2..."/> means copying to the output whatever is in the input in a
>> given point.
>
> <b/> and <b pos="1"/> now do exactly the same thing. You don't need
> to replace all of the former with the latter but even if you do or don't it
> won't change anything. Until now it meant what you said but now it means
> that if you see a <b/> or a <b pos="1"/> then print one blank from
> the blank queue in the output.
>
>> Superblanks most of the time are blanks, but, as you now probably know
>> better than anyone else, they can be lots of things; they can even contain
>> no blanks at all. Even in some cases, like in Romance-language enclitics,
>> we know there shouldn't be any blank at all before them, but we had to
>> add <b pos="1"/> for not loosing information on italics, bold letters,
>> etc.
>
> You're right, except now we have a completely different system to deal
> with italics, bold letters, and all markup, i.e. wordbound blanks, which
> aren't considered blanks. Now that there is no information to lose, we
> didn't want to burden the people who write transfer rules to explicitly
> define positions of blanks. In cases where you don't want a space in the
> output, you just don't put a <b/> in the output rule.
>
>> I'm not really ready to change all <b pos="1"/> in the hundreds of
>> rules I've been writing in several language pairs. Specifically for
>> apertium-fra-frp, I hope it will be able to publish it before the new
>> version of the Apertium core you are preparing, so they are needed right
>> now.
>
> You won't have to change all of them. Most of them will work as it is. The
> new system prints blanks in the same order as they were input, so it won't
> harm most of the rules. The *only thing* you'll have to change, is rules
> where you don't want a space in the output between LUs, you remove the
> <b pos="1"/> from those rules. This is because now, an empty blank isn't
> considered a blank anymore. This was because we want the users to have
> control about whether they want a blank or not between their output LUs,
> regardless of the input blanks. If we consider an empty blank, your problem
> will be solved, but other problems will come up, where empty blanks will
> appear in the output regardless of <b/>s in the output.
>
> So to conclude, the only thing you need to remove is the <b pos="1"/> from
> rules where you know you don't want a space in the output, like num_n, and
> maybe some enclitics. Apart from that, everything will work as it is. To
> improve the system, at some point we'll have to add a change that isn't
> strictly backwards compatible, and several people agree that after
> wordbound blanks, we should stop handling blank positions in transfer rules.
>

The problem is that in 99% of the cases I want a blank in num_n, that is,
between the numeral and the noun. In most cases we have "two cows",
"3 dogs", etc. In Romance languages, the rule is needed mostly for gender
agreement. The problem is that sometimes, as we see, we get something else.
So the question is not whether I want a blank there or not. I want whatever
was there. So, let me try to formulate it another way. If I want to
preserve what was written between two words, I shouldn't write
<b pos="1,2..."/>, but if I want to add a blank, I have to add <b/>. Am I
right? If this is correct, it amounts to removing all <b pos="1,2..."/>. It
seems it would be easier if they were simply not taken into account, thus
avoiding any change in the language pairs. Am I missing something?

Hèctor


___
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff


Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-09-04 Thread Tanmai Khanna
Hèctor,
Yes, the new improvements aren't backwards compatible, but that's because
they're better than the system we had earlier. Here are the changes:

> So, you are saying that the new stuff is not backwards compatible, aren't
> you? There aren't any <b/> in the rule, but <b pos="1"/>, which is not
> the same. Until now, <b/> means explicitly putting a blank, while
> <b pos="1,2..."/> means copying to the output whatever is in the input in a
> given point.
>

<b/> and <b pos="1"/> now do exactly the same thing. You don't need to
replace all of the former with the latter; whether you do or not, it
won't change anything. Until now it meant what you said, but now it means
that if you see a <b/> or a <b pos="1"/>, then print one blank from
the blank queue in the output.

> Superblanks most of the time are blanks, but, as you now probably know
> better than anyone else, they can be lots of things; they can even contain
> no blanks at all. Even in some cases, like in Romance-language enclitics,
> we know there shouldn't be any blank at all before them, but we had to
> add <b pos="1"/> for not losing information on italics, bold letters,
> etc.
>

You're right, except now we have a completely different system to deal with
italics, bold letters, and all markup, i.e. wordbound blanks, which aren't
considered blanks. Now that there is no information to lose, we didn't want
to burden the people who write transfer rules with explicitly defining the
positions of blanks. In cases where you don't want a space in the output,
you just don't put a <b/> in the output rule.


> I'm not really ready to change all <b pos="1"/> in the hundreds of
> rules I've been writing in several language pairs. Specifically for
> apertium-fra-frp, I hope it will be able to publish it before the new
> version of the Apertium core you are preparing, so they are needed right
> now.
>

You won't have to change all of them. Most of them will work as they are.
The new system prints blanks in the same order as they were input, so it
won't harm most of the rules. The *only thing* you'll have to change is
rules where you don't want a space in the output between LUs: remove the
<b pos="1"/> from those rules. This is because now, an empty blank isn't
considered a blank anymore. We did this because we want users to have
control over whether they want a blank or not between their output LUs,
regardless of the input blanks. If we consider an empty blank to be a blank,
your problem will be solved, but other problems will come up, where empty
blanks will appear in the output regardless of <b/>s in the output.

So to conclude, the only thing you need to remove is the <b pos="1"/> from
rules where you know you don't want a space in the output, like num_n, and
maybe some enclitics. Apart from that, everything will work as it is. To
improve the system, at some point we'll have to add a change that isn't
strictly backwards compatible, and several people agree that after
wordbound blanks, we should stop handling blank positions in transfer rules.

If this isn't acceptable, we can discuss other possible solutions :)

*तन्मय खन्ना *
*Tanmai Khanna*


Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-09-03 Thread Hèctor Alòs i Font
On Thu, 3 Sep 2020 at 23:10, Tanmai Khanna wrote:

> Hèctor,
> The extra blank there is because there's a blank in your rule output. See:
>
> $ echo "^052/052$^F/F$" | apertium-transfer
> -z -b 'apertium-fra-frp.frp-fra.t1x' 'frp-fra.t1x.bin'
>
> ^num_n{^052$ ^F$}$
>
>
> The rule for num_n has a <b/> in the rule output and hence there's a
> space. The reason earlier there wasn't space was because an empty string
> was considered a blank. Now, if you don't want a space between the LUs in
> the rule output, you just don't put a <b/>. So if you remove the <b/> from
> the num_n rule it will start working properly. Earlier you used to add a
> <b/> everytime the rule had multiple LUs in the output but now *you only
> add a <b/> if you want a space/blank between the output words.*
>
>
> Try removing the <b/> and it should work.
>


So, you are saying that the new stuff is not backwards compatible, aren't
you? There aren't any <b/> in the rule, but <b pos="1"/>, which is not
the same. Until now, <b/> has meant explicitly putting a blank, while
<b pos="1,2..."/> has meant copying to the output whatever is in the input
at a given point. Superblanks are blanks most of the time, but, as you now
probably know better than anyone else, they can be lots of things; they
can even contain no blanks at all. In some cases, like Romance-language
enclitics, we know there shouldn't be any blank at all before them, but we
had to add <b pos="1"/> so as not to lose information on italics, bold
letters, etc.

I'm not really ready to change all the <b pos="1"/> in the hundreds of
rules I've been writing in several language pairs. Specifically for
apertium-fra-frp, I hope to be able to publish it before the new
version of the Apertium core you are preparing, so they are needed right
now.

Hèctor



Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-09-03 Thread Tanmai Khanna
Hèctor,
The extra blank there is because there's a blank in your rule output. See:

$ echo "^052/052$^F/F$" | apertium-transfer \
    -z -b 'apertium-fra-frp.frp-fra.t1x' 'frp-fra.t1x.bin'

^num_n{^052$ ^F$}$


The rule for num_n has a <b/> in the rule output and hence there's a space.
The reason there wasn't a space earlier was that an empty string was
considered a blank. Now, if you don't want a space between the LUs in the
rule output, you just don't put a <b/>. So if you remove the <b/> from the
num_n rule it will start working properly. Earlier you used to add a <b/>
every time the rule had multiple LUs in the output, but now *you only add a
<b/> if you want a space/blank between the output words.*


Try removing the <b/> and it should work.
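
To see where the empty blank in this example comes from, here is a rough
Python sketch that splits a stream into lexical units and the blanks around
them, then joins two units the way a rule with one <b/> between them would,
under either policy. It is not how apertium-transfer is implemented:
split_stream, join_pair and the empty_is_blank flag are hypothetical names,
and the regex ignores escaping and superblanks.

```python
import re


def split_stream(stream):
    """Return (lexical_units, blanks) for a stream like '^a/a$ ^b/b$'."""
    # re.split with a capturing group keeps the captured LU bodies, so the
    # result alternates: blank, LU, blank, LU, ..., blank.
    parts = re.split(r"\^(.*?)\$", stream)
    return parts[1::2], parts[0::2]


def join_pair(stream, empty_is_blank):
    """Join the surface forms of the first two LUs with the blank between them."""
    lus, blanks = split_stream(stream)
    between = blanks[1]
    if between == "" and not empty_is_blank:
        # Policy where an empty blank does not count: <b/> falls back to a space.
        between = " "
    first, second = (lu.split("/")[0] for lu in lus[:2])
    return first + between + second


print(join_pair("^052/052$^F/F$", empty_is_blank=False))  # prints 052 F
print(join_pair("^052/052$^F/F$", empty_is_blank=True))   # prints 052F
print(join_pair("^3/3$ ^dogs/dog$", empty_is_blank=True))  # prints 3 dogs
```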


As for the discussion about Iér or 5e, we all agreed that we don't
want them in the dictionaries, so you can analyse them as individual
LUs and then combine them into one LU using apertium-separable.
Finally, the space between l and ér shouldn't appear in the rule output; it
is caused by an issue that's still being fixed. But it'll be fine soon
:)



*तन्मय खन्ना *
*Tanmai Khanna*



Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-09-03 Thread Hèctor Alòs i Font
Hi Tanmai,

Yes, hyphens and quotes (") seem to be solved. But the system keeps
adding blanks where there were none. For instance, we now get
strange Unicode code points:

05076. Table des caractères Unicode U+0500 à U+052F.
< 05076. Tâbla des caractèros Unicode *U+0500 a *U+052F.
---
> 05076. Tâbla des caractèros Unicode *U+0500 a *U+052 F.

The same for names of standards (e.g. 802.3j), road names, car (Fiat 621RN)
or plane (EA-18G Growler) models, etc.

On the ... I wouldn't say that it is very beautiful. It could be
misleading if there is just one character, as it often happens, like in 5e.
In any case, what most interests me is how to deal with these things in the
dictionaries. That's not a problem of the new blank-treatment or Transfuse.
That's a problem we already had, but I never thought about it. I wouldn't
like to have Iér or 5e in the dictionaries. It may cause problems,
i.a. because ér and e can be words of their own, so we'll get a wrong
morphological analysis.

Hèctor






Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-09-03 Thread Tanmai Khanna
Hèctor can you check the page on beta now? The hyphen and the superscript
issues are solved. Of course, there's now a space between l and ér. If
that's a big problem we can discuss other solutions.

*तन्मय खन्ना *
*Tanmai Khanna*




Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-09-03 Thread Tino Didriksen
I have adjusted Transfuse with how spaces are treated for Apertium, and
implemented adding temporary spaces around  and . Changes are
deployed on beta.
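
The effect of padding an inline tag with temporary spaces can be sketched in
Python as follows. This is only an illustration of the idea, not Transfuse's
actual code (which is C++ working on a DOM); expand_to_word and tagged_word
are hypothetical helpers, and "word" is assumed to mean a run of
non-whitespace characters.

```python
def expand_to_word(text, start, end):
    """Grow the span [start, end) outwards until it hits whitespace or the text edge."""
    while start > 0 and not text[start - 1].isspace():
        start -= 1
    while end < len(text) and not text[end].isspace():
        end += 1
    return start, end


def tagged_word(text, fragment):
    """Return what an inline tag around `fragment` covers after expansion."""
    start = text.index(fragment)
    s, e = expand_to_word(text, start, start + len(fragment))
    return text[s:e]


# Without padding, formatting on "ér" swallows the whole word.
print(tagged_word("Amédée Iér de Savoie", "ér"))   # prints Iér
# With a temporary space before the tagged part, it stays put.
print(tagged_word("Amédée I ér de Savoie", "ér"))  # prints ér
```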

I repeat my plea that all symbols should have an analysis. It breaks markup
that things like - and : are not tokens.

-- Tino Didriksen


On Wed, 2 Sep 2020 at 13:23, Tino Didriksen  wrote:

> That's not something the pipe ever sees - you can't fix it on your end.
> It's something I have to adjust in Transfuse.
>
> https://github.com/TinoDidriksen/Transfuse/blob/master/src/dom.cpp#L604
> and L629 expands inline tags to encompass surrounding plain text, because
> it is unfortunately common for formatting to be partially on a word while
> you really want the whole word translated as a unit.
>
> However, for HTML I should add spaces around  and  so that they
> can't gobble up their surroundings. Tracked as
> https://github.com/TinoDidriksen/Transfuse/issues/7
>
> -- Tino Didriksen
>
>
> On Wed, 2 Sep 2020 at 12:58, Hèctor Alòs i Font 
> wrote:
>
>> I'm taking a look on how this list of names on Wikipedia:
>> https://frp.wikipedia.org/wiki/Lista_des_comtos_et_ducs_de_Savou%C3%A8
>> and how it is translated in beta.apertium:
>> https://beta.apertium.org/index.fra.html?dir=frp-fra=https%3A%2F%2Ffrp.wikipedia.org%2Fwiki%2FLista_des_comtos_et_ducs_de_Savou%25C3%25A8#webpageTranslation
>>
>> There still are quite a few problems with HTML-tags if we look that the
>> whole Iér is becoming a superscript, and also with italics. The space after
>> the hyphen is an already known problem.
>>
>> By the way, I wonder whether it is possible to match in our dictionaries
>> Iér. I have Iér in the dictionary, but when the ending ér stays
>> as a superscript, as usually done in the texts, it is not matched. Should I
>> add Iér to the dictionary?
>>
>> Hèctor
>>
>


Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-09-03 Thread Kevin Brubeck Unhammer
Tanmai Khanna wrote:

>> So currently if I have the multiword "i dag", it'll recognize
>> "idag" but it won't recognize "i dag"? (And I suppose if
>> I have the non-multiword "today" it won't recognize "today".)
>
> Exactly, but even when it recognises "idag", the  will probably
> be lost because it's being seen as a normal blank.
>
>> One possibility might be to have wordbound blanks match "space or
>> epsilon" in lt-proc – then it would recognize all of the above.
>
> I had to do this for postgeneration and it wasn't trivial, so it's not like
> I can't do it for the analyser as well, but we decided that all multiword
> matches will be offloaded to apertium-separable, so the individual parts
> can be analysed as LUs and then apertium-separable can combine them into
> one LU. I have already modified apertium-separable such that it applies the
> individual markups on the final MWE. If this is done then
> both "idag" and "i dag" will be recognised and the italics
> will apply on the entire word.
>
> If this isn't acceptable or too much of an inconvenience, then I can modify
> the analyser.

Using separable for those cases seems like a good solution to me :)




Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-09-03 Thread Tanmai Khanna
> So currently if I have the multiword "i dag", it'll recognize
"idag" but it won't recognize "i dag"? (And I suppose if
I have the non-multiword "today" it won't recognize "today".)

Exactly, but even when it recognises "idag", the  will probably
be lost because it's being seen as a normal blank.

> One possibility might be to have wordbound blanks match "space or
epsilon" in lt-proc – then it would recognize all of the above.

I had to do this for postgeneration and it wasn't trivial, so it's not like
I can't do it for the analyser as well, but we decided that all multiword
matches will be offloaded to apertium-separable, so the individual parts
can be analysed as LUs and then apertium-separable can combine them into
one LU. I have already modified apertium-separable such that it applies the
individual markups on the final MWE. If this is done then
both "idag" and "i dag" will be recognised and the italics
will apply on the entire word.
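
A rough sketch of the idea above, in Python rather than in apertium-separable's own code: each LU carries the markups from its wordbound blank, and when several LUs are combined into one multiword LU, the markups are collected onto the result. The `LU` class, `merge_lus` and the markup names are hypothetical, for illustration only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LU:
    """A lexical unit plus the markups attached to it (illustrative model)."""
    surface: str
    markups: List[str] = field(default_factory=list)


def merge_lus(parts: List[LU], joined_surface: str) -> LU:
    """Combine several LUs into one multiword LU, keeping each markup once."""
    merged: List[str] = []
    for part in parts:
        for markup in part.markups:
            if markup not in merged:
                merged.append(markup)
    return LU(joined_surface, merged)


# Only "i" is italic in the input; the merged LU is italic as a whole.
mwe = merge_lus([LU("i", ["i"]), LU("dag")], "i dag")
print(mwe)
```

The design choice this mirrors is that markup on any part ends up on the entire multiword, since markup can only apply to a whole LU.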

If this isn't acceptable or too much of an inconvenience, then I can modify
the analyser.

*तन्मय खन्ना *
*Tanmai Khanna*


On Thu, Sep 3, 2020 at 12:49 PM Kevin Brubeck Unhammer 
wrote:

> Tanmai Khanna 
> čálii:
>
> > the analyser sees wordbound blanks as normal blanks,
>
> So currently if I have the multiword "i dag", it'll recognize
> "idag" but it won't recognize "i dag"? (And I suppose if
> I have the non-multiword "today" it won't recognize "today".)
>
> One possibility might be to have wordbound blanks match "space or
> epsilon" in lt-proc – then it would recognize all of the above.
>
> It seems OK that the wblank then applies to the whole LU, so what comes
> out from translating "idag" into English is "today".
>
> Though maybe it would be hard to implement – does this happen a lot?


Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-09-03 Thread Kevin Brubeck Unhammer
Tanmai Khanna wrote:

> the analyser sees wordbound blanks as normal blanks,

So currently if I have the multiword "i dag", it'll recognize
"idag" but it won't recognize "i dag"? (And I suppose if
I have the non-multiword "today" it won't recognize "today".)

One possibility might be to have wordbound blanks match "space or
epsilon" in lt-proc – then it would recognize all of the above.

It seems OK that the wblank then applies to the whole LU, so what comes
out from translating "idag" into English is "today".

Though maybe it would be hard to implement – does this happen a lot?




Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-09-03 Thread Tanmai Khanna
Oh I see the hyphen thing. That should've been fixed after the latest
commit. Will check it out.

*तन्मय खन्ना *
*Tanmai Khanna*


On Thu, Sep 3, 2020 at 12:34 PM Tanmai Khanna 
wrote:

> Hey,
> As of now, the analyser sees wordbound blanks as normal blanks, and so
> when they occur, the dictionary will often not recognise multiwords. The
> reason this was done was because we are offloading multiwords to
> apertium-separable anyway. As for Iér, given that Tino Didriksen
> is able to fix this without adding spaces around it, adding Iér to the
> dictionary will make it recognise the word but there's a problem, as stated
> later. If spaces are added, then the space will be one blank and the
> wordbound blank another so it won't match.
>
> If these kind of cases can't be handled in apertium-separable, then I can
> at some point modify the analyser to ignore wblanks when doing FST
> matching, although I guess the point of the offloading was that in the
> analyser we stop handling anything that has a blank between it. But since
> wblanks aren't supposed to be "blanks", technically, we can have lér to
> the dictionary, and modify the analyser to deal with it.
>
> But there is a much bigger problem wherever we handle this: wordbound
> blanks apply to LUs, so if at any point we get an LU ^Iér$, then there's
> really no way to tell the pipe that the superscript applies just on the ér.
> So yeah, fundamentally, as of now *it's not possible to have markup on
> part of an LU.* It's possible though if you keep I & ér as separate LUs.
>
> Also Hèctor, is the space after hyphen issue still there? Looks fine to me.
>
> *तन्मय खन्ना *
> *Tanmai Khanna*
>
>
> On Wed, Sep 2, 2020 at 4:55 PM Tino Didriksen 
> wrote:
>
>> That's not something the pipe ever sees - you can't fix it on your end.
>> It's something I have to adjust in Transfuse.
>>
>> https://github.com/TinoDidriksen/Transfuse/blob/master/src/dom.cpp#L604
>> and L629 expands inline tags to encompass surrounding plain text, because
>> it is unfortunately common for formatting to be partially on a word while
>> you really want the whole word translated as a unit.
>>
>> However, for HTML I should add spaces around  and  so that they
>> can't gobble up their surroundings. Tracked as
>> https://github.com/TinoDidriksen/Transfuse/issues/7
>>
>> -- Tino Didriksen
>>
>>
>> On Wed, 2 Sep 2020 at 12:58, Hèctor Alòs i Font 
>> wrote:
>>
>>> I'm taking a look on how this list of names on Wikipedia:
>>> https://frp.wikipedia.org/wiki/Lista_des_comtos_et_ducs_de_Savou%C3%A8
>>> and how it is translated in beta.apertium:
>>> https://beta.apertium.org/index.fra.html?dir=frp-fra=https%3A%2F%2Ffrp.wikipedia.org%2Fwiki%2FLista_des_comtos_et_ducs_de_Savou%25C3%25A8#webpageTranslation
>>>
>>> There still are quite a few problems with HTML-tags if we look that the
>>> whole Iér is becoming a superscript, and also with italics. The space after
>>> the hyphen is an already known problem.
>>>
>>> By the way, I wonder whether it is possible to match in our dictionaries
>>> Iér. I have Iér in the dictionary, but when the ending ér stays
>>> as a superscript, as usually done in the texts, it is not matched. Should I
>>> add Iér to the dictionary?
>>>
>>> Hèctor
>>>


Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-09-03 Thread Tanmai Khanna
Hey,
As of now, the analyser sees wordbound blanks as normal blanks, and so when
they occur, the dictionary will often not recognise multiwords. The reason
this was done was because we are offloading multiwords to
apertium-separable anyway. As for Iér, given that Tino Didriksen
is able to fix this without adding spaces around it, adding Iér to the
dictionary will make it recognise the word but there's a problem, as stated
later. If spaces are added, then the space will be one blank and the
wordbound blank another so it won't match.

If these kinds of cases can't be handled in apertium-separable, then I can
at some point modify the analyser to ignore wblanks when doing FST
matching, although I guess the point of the offloading was that in the
analyser we stop handling anything that has a blank inside it. But since
wblanks aren't supposed to be "blanks", technically, we can add Iér to
the dictionary, and modify the analyser to deal with it.

But there is a much bigger problem wherever we handle this: wordbound
blanks apply to LUs, so if at any point we get an LU ^Iér$, then there's
really no way to tell the pipe that the superscript applies just on the ér.
So yeah, fundamentally, as of now *it's not possible to have markup on part
of an LU.* It's possible though if you keep I & ér as separate LUs.

Also Hèctor, is the space after hyphen issue still there? Looks fine to me.

*तन्मय खन्ना *
*Tanmai Khanna*


On Wed, Sep 2, 2020 at 4:55 PM Tino Didriksen 
wrote:

> That's not something the pipe ever sees - you can't fix it on your end.
> It's something I have to adjust in Transfuse.
>
> https://github.com/TinoDidriksen/Transfuse/blob/master/src/dom.cpp#L604
> and L629 expands inline tags to encompass surrounding plain text, because
> it is unfortunately common for formatting to be partially on a word while
> you really want the whole word translated as a unit.
>
> However, for HTML I should add spaces around  and  so that they
> can't gobble up their surroundings. Tracked as
> https://github.com/TinoDidriksen/Transfuse/issues/7
>
> -- Tino Didriksen
>
>
> On Wed, 2 Sep 2020 at 12:58, Hèctor Alòs i Font 
> wrote:
>
>> I'm taking a look on how this list of names on Wikipedia:
>> https://frp.wikipedia.org/wiki/Lista_des_comtos_et_ducs_de_Savou%C3%A8
>> and how it is translated in beta.apertium:
>> https://beta.apertium.org/index.fra.html?dir=frp-fra=https%3A%2F%2Ffrp.wikipedia.org%2Fwiki%2FLista_des_comtos_et_ducs_de_Savou%25C3%25A8#webpageTranslation
>>
>> There still are quite a few problems with HTML-tags if we look that the
>> whole Iér is becoming a superscript, and also with italics. The space after
>> the hyphen is an already known problem.
>>
>> By the way, I wonder whether it is possible to match in our dictionaries
>> Iér. I have Iér in the dictionary, but when the ending ér stays
>> as a superscript, as usually done in the texts, it is not matched. Should I
>> add Iér to the dictionary?
>>
>> Hèctor
>>


Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-09-02 Thread Tino Didriksen
That's not something the pipe ever sees - you can't fix it on your end.
It's something I have to adjust in Transfuse.

https://github.com/TinoDidriksen/Transfuse/blob/master/src/dom.cpp#L604 and
L629 expand inline tags to encompass surrounding plain text, because it is
unfortunately common for formatting to be partially on a word while you
really want the whole word translated as a unit.
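
To illustrate the effect described above (not Transfuse's actual code, which is C++ working on a DOM), here is a minimal Python sketch that widens an inline tag covering only part of a word so that it covers the whole word. The function name `expand_inline` and the regex are assumptions made for this example.

```python
import re

# Letters, then <tag>letters</tag>, then letters, with no spaces in between.
_PARTIAL = re.compile(r"(\w*)<(\w+)>(\w+)</\2>(\w*)")


def expand_inline(text: str) -> str:
    """Move word-internal inline tags outwards to enclose the whole word."""
    def repl(match: "re.Match[str]") -> str:
        before, tag, inner, after = match.groups()
        return f"<{tag}>{before}{inner}{after}</{tag}>"
    return _PARTIAL.sub(repl, text)


print(expand_inline("I<sup>ér</sup> de Savouè"))
```

With this kind of expansion the pipe only ever sees the word as one unit with one markup, which is why a superscript on just part of the word cannot be fixed from inside the pipe.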

However, for HTML I should add spaces around  and  so that they
can't gobble up their surroundings. Tracked as
https://github.com/TinoDidriksen/Transfuse/issues/7

-- Tino Didriksen


On Wed, 2 Sep 2020 at 12:58, Hèctor Alòs i Font 
wrote:

> I'm taking a look on how this list of names on Wikipedia:
> https://frp.wikipedia.org/wiki/Lista_des_comtos_et_ducs_de_Savou%C3%A8
> and how it is translated in beta.apertium:
> https://beta.apertium.org/index.fra.html?dir=frp-fra=https%3A%2F%2Ffrp.wikipedia.org%2Fwiki%2FLista_des_comtos_et_ducs_de_Savou%25C3%25A8#webpageTranslation
>
> There still are quite a few problems with HTML-tags if we look that the
> whole Iér is becoming a superscript, and also with italics. The space after
> the hyphen is an already known problem.
>
> By the way, I wonder whether it is possible to match in our dictionaries
> Iér. I have Iér in the dictionary, but when the ending ér stays
> as a superscript, as usually done in the texts, it is not matched. Should I
> add Iér to the dictionary?
>
> Hèctor
>


Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-09-02 Thread Hèctor Alòs i Font
I'm taking a look at this list of names on Wikipedia:
https://frp.wikipedia.org/wiki/Lista_des_comtos_et_ducs_de_Savou%C3%A8
and at how it is translated in beta.apertium:
https://beta.apertium.org/index.fra.html?dir=frp-fra=https%3A%2F%2Ffrp.wikipedia.org%2Fwiki%2FLista_des_comtos_et_ducs_de_Savou%25C3%25A8#webpageTranslation

There still are quite a few problems with HTML tags: the whole Iér is
becoming a superscript, and there are problems with italics too. The space
after the hyphen is an already known problem.

By the way, I wonder whether it is possible to match I<sup>ér</sup> in our
dictionaries. I have Iér in the dictionary, but when the ending ér stays
as a superscript, as is usually done in the texts, it is not matched. Should I
add I<sup>ér</sup> to the dictionary?

Hèctor

Message from Tanmai Khanna on Thu, 27 Aug 2020 at 22:45:

> Unhammer I think I've implemented this in:
> https://github.com/apertium/apertium/pull/102 . If it looks good I can
> implement in interchunk and postchunk as well.
>
> The blanks are stored as a queue and output in available <b/>s in the rule
> output. If any are remaining they're output after the rule output.
>
> *तन्मय खन्ना *
> *Tanmai Khanna*
>
>
> On Thu, Aug 27, 2020 at 3:05 PM Kevin Brubeck Unhammer 
> wrote:
>
>> Tanmai Khanna 
>> čálii:
>>
>> > So what I'll try to do, is after the blanks are collected, lets say X is
>> > the number of source LUs in the pattern and Y is the number of output
>> LUs.
>> > If X = Y then we can keep them in the same place, if X < Y, then we can
>> > keep them in the first X gaps the rest can be spaces or whatever the
>> user
>> > denotes. If X > Y, then we can print the first Y blanks and then flush
>> the
>> > remaining. After this the  option will become useless. Does that
>> > sound good?
>>
>> By "gaps" do you mean where the rule is outputting a ? So if input
>> is "ab c" and a rule matching that has two 's in its , the
>>  gets output on the first  and then on the second  we get
>> a regular space. If the rule has three 's, the third one is also
>> a regular space. If the rule has no 's, the  gets output after
>> the rule output. That would be nice (though I could also live with the
>>  always ending up after the rule as long as I never have to think
>> about pos="…")


Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-08-27 Thread Kevin Brubeck Unhammer
Tanmai Khanna wrote:

> So what I'll try to do, is after the blanks are collected, lets say X is
> the number of source LUs in the pattern and Y is the number of output LUs.
> If X = Y then we can keep them in the same place, if X < Y, then we can
> keep them in the first X gaps the rest can be spaces or whatever the user
> denotes. If X > Y, then we can print the first Y blanks and then flush the
> remaining. After this the  option will become useless. Does that
> sound good?

By "gaps" do you mean where the rule is outputting a <b/>? So if input
is "ab c" and a rule matching that has two <b/>'s in its <out>, the
superblank gets output on the first <b/> and then on the second <b/> we get
a regular space. If the rule has three <b/>'s, the third one is also
a regular space. If the rule has no <b/>'s, the superblank gets output after
the rule output. That would be nice (though I could also live with the
superblank always ending up after the rule as long as I never have to think
about pos="…")
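
The behaviour asked about here can be sketched as a small queue in Python. This is only a model of the description, not the transfer code itself: `B`, `render_rule_output` and the placeholder superblank `[<x>]` are made up for the example, and the sketch assumes that only superblanks (not plain spaces) are queued.

```python
from collections import deque
from typing import Iterable, List

B = object()  # stands for a <b/> in the rule output


def render_rule_output(out_items: List[object], input_blanks: Iterable[str]) -> str:
    """Fill each <b/> from the queue of input blanks, then flush what is left."""
    queue = deque(input_blanks)
    pieces: List[str] = []
    for item in out_items:
        if item is B:
            # Take the next input blank; fall back to a regular space.
            pieces.append(queue.popleft() if queue else " ")
        else:
            pieces.append(str(item))
    pieces.extend(queue)  # blanks never placed are output after the rule output
    return "".join(pieces)


print(render_rule_output(["a", B, "b", B, "c"], ["[<x>]"]))
print(render_rule_output(["a", "b", "c"], ["[<x>]"]))
```

The first call puts the superblank on the first `<b/>` and a regular space on the second; the second call has no `<b/>` at all, so the superblank lands after the rule output.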




Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-08-27 Thread Tanmai Khanna
Again, remember these aren't wordbound blanks or block tags, just
superblanks: tags that aren't hard breaks or wordbound.

*तन्मय खन्ना *
*Tanmai Khanna*


On Thu, Aug 27, 2020 at 2:33 PM Tanmai Khanna 
wrote:

> Or, if we want to give the user complete control of the blanks between the
> output LUs, i.e. if they want a space or not, we can just flush all the
> blanks in the patterns before any LUs are output (or after). It's
> considerably easier to implement and gives the user complete control over
> the spaces between their output LUs.
>
> *तन्मय खन्ना *
> *Tanmai Khanna*
>
>
> On Thu, Aug 27, 2020 at 2:31 PM Tanmai Khanna 
> wrote:
>
>> So what I'll try to do, is after the blanks are collected, lets say X is
>> the number of source LUs in the pattern and Y is the number of output LUs.
>> If X = Y then we can keep them in the same place, if X < Y, then we can
>> keep them in the first X gaps the rest can be spaces or whatever the user
>> denotes. If X > Y, then we can print the first Y blanks and then flush the
>> remaining. After this the  option will become useless. Does that
>> sound good?
>>
>> *तन्मय खन्ना *
>> *Tanmai Khanna*
>>
>>
>> On Thu, Aug 27, 2020 at 2:28 PM Francis Tyers 
>> wrote:
>>
>>> El 2020-08-27 09:54, Kevin Brubeck Unhammer escribió:
>>> > Tanmai Khanna 
>>> > čálii:
>>> >
>>> >> Hmm, well then I guess for now the small list of tags that still
>>> exist
>>> >> as
>>> >> blanks have to be printed using the  option. I could
>>> >> change it
>>> >> so that it flushes all blanks, or keeps them in their position if
>>> >> possible.
>>> >> The good thing is that these aren't wordbound semantically so them not
>>> >> being in their "correct" position won't cause a lot of issues.
>>> >>
>>> >> This can be discussed: Do we want control over where to print these
>>> >> tags if
>>> >> they exist in the stream (not wordbound tags or block tags), or do we
>>> >> want
>>> >> that they flush out anyway and the user shouldn't worry about the
>>> >> blanks.
>>> >
>>> > I would prefer that the user doesn't have to worry about the blanks (or
>>> > the b pos attribute). I can't imagine a use-case where it would matter
>>> > since we don't actually look at the contents, when we write  in
>>> > a rule we just want there to be some space between the words.
>>> >
>>> > Ideally the blanks would be printed as close as possible to where they
>>> > were, but if these are rare (and don't belong on words and don't have
>>> > anything to do with structure) maybe it's not so important, as long as
>>> > they get printed before/after the rule output.
>>> >
>>>
>>> Agree, blank handling should not really be handled in transfer, putting
>>> 
>>> is ok, but we shouldn't be controlling it with the pos="" option.
>>>
>>> Fran
>>>
>>>


Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-08-27 Thread Tanmai Khanna
Or, if we want to give the user complete control of the blanks between the
output LUs, i.e. if they want a space or not, we can just flush all the
blanks in the patterns before any LUs are output (or after). It's
considerably easier to implement and gives the user complete control over
the spaces between their output LUs.

*तन्मय खन्ना *
*Tanmai Khanna*


On Thu, Aug 27, 2020 at 2:31 PM Tanmai Khanna 
wrote:

> So what I'll try to do, is after the blanks are collected, lets say X is
> the number of source LUs in the pattern and Y is the number of output LUs.
> If X = Y then we can keep them in the same place, if X < Y, then we can
> keep them in the first X gaps the rest can be spaces or whatever the user
> denotes. If X > Y, then we can print the first Y blanks and then flush the
> remaining. After this the  option will become useless. Does that
> sound good?
>
> *तन्मय खन्ना *
> *Tanmai Khanna*
>
>
> On Thu, Aug 27, 2020 at 2:28 PM Francis Tyers  wrote:
>
>> El 2020-08-27 09:54, Kevin Brubeck Unhammer escribió:
>> > Tanmai Khanna 
>> > čálii:
>> >
>> >> Hmm, well then I guess for now the small list of tags that still exist
>> >> as
>> >> blanks have to be printed using the  option. I could
>> >> change it
>> >> so that it flushes all blanks, or keeps them in their position if
>> >> possible.
>> >> The good thing is that these aren't wordbound semantically so them not
>> >> being in their "correct" position won't cause a lot of issues.
>> >>
>> >> This can be discussed: Do we want control over where to print these
>> >> tags if
>> >> they exist in the stream (not wordbound tags or block tags), or do we
>> >> want
>> >> that they flush out anyway and the user shouldn't worry about the
>> >> blanks.
>> >
>> > I would prefer that the user doesn't have to worry about the blanks (or
>> > the b pos attribute). I can't imagine a use-case where it would matter
>> > since we don't actually look at the contents, when we write  in
>> > a rule we just want there to be some space between the words.
>> >
>> > Ideally the blanks would be printed as close as possible to where they
>> > were, but if these are rare (and don't belong on words and don't have
>> > anything to do with structure) maybe it's not so important, as long as
>> > they get printed before/after the rule output.
>> >
>>
>> Agree, blank handling should not really be handled in transfer, putting
>> 
>> is ok, but we shouldn't be controlling it with the pos="" option.
>>
>> Fran
>>
>>


Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-08-27 Thread Tanmai Khanna
So what I'll try to do is this: after the blanks are collected, let's say X is
the number of source LUs in the pattern and Y is the number of output LUs.
If X = Y then we can keep them in the same place. If X < Y, then we can
keep them in the first X gaps, and the rest can be spaces or whatever the user
denotes. If X > Y, then we can print the first Y blanks and then flush the
remaining ones. After this the <b pos="..."/> option will become useless. Does
that sound good?

*तन्मय खन्ना *
*Tanmai Khanna*


On Thu, Aug 27, 2020 at 2:28 PM Francis Tyers  wrote:

> El 2020-08-27 09:54, Kevin Brubeck Unhammer escribió:
> > Tanmai Khanna 
> > čálii:
> >
> >> Hmm, well then I guess for now the small list of tags that still exist
> >> as
> >> blanks have to be printed using the  option. I could
> >> change it
> >> so that it flushes all blanks, or keeps them in their position if
> >> possible.
> >> The good thing is that these aren't wordbound semantically so them not
> >> being in their "correct" position won't cause a lot of issues.
> >>
> >> This can be discussed: Do we want control over where to print these
> >> tags if
> >> they exist in the stream (not wordbound tags or block tags), or do we
> >> want
> >> that they flush out anyway and the user shouldn't worry about the
> >> blanks.
> >
> > I would prefer that the user doesn't have to worry about the blanks (or
> > the b pos attribute). I can't imagine a use-case where it would matter
> > since we don't actually look at the contents, when we write  in
> > a rule we just want there to be some space between the words.
> >
> > Ideally the blanks would be printed as close as possible to where they
> > were, but if these are rare (and don't belong on words and don't have
> > anything to do with structure) maybe it's not so important, as long as
> > they get printed before/after the rule output.
> >
>
> Agree, blank handling should not really be handled in transfer, putting
> 
> is ok, but we shouldn't be controlling it with the pos="" option.
>
> Fran
>
>


Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-08-27 Thread Francis Tyers

On 2020-08-27 09:54, Kevin Brubeck Unhammer wrote:
> Tanmai Khanna wrote:
>
>> Hmm, well then I guess for now the small list of tags that still exist as
>> blanks have to be printed using the <b pos="..."/> option. I could change it
>> so that it flushes all blanks, or keeps them in their position if possible.
>> The good thing is that these aren't wordbound semantically so them not
>> being in their "correct" position won't cause a lot of issues.
>>
>> This can be discussed: Do we want control over where to print these tags if
>> they exist in the stream (not wordbound tags or block tags), or do we want
>> that they flush out anyway and the user shouldn't worry about the blanks.
>
> I would prefer that the user doesn't have to worry about the blanks (or
> the b pos attribute). I can't imagine a use-case where it would matter
> since we don't actually look at the contents, when we write <b/> in
> a rule we just want there to be some space between the words.
>
> Ideally the blanks would be printed as close as possible to where they
> were, but if these are rare (and don't belong on words and don't have
> anything to do with structure) maybe it's not so important, as long as
> they get printed before/after the rule output.

Agree, blank handling should not really be handled in transfer, putting
<b/> is ok, but we shouldn't be controlling it with the pos="" option.

Fran




Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-08-27 Thread Tanmai Khanna
Hmm, well then I guess for now the small list of tags that still exist as
blanks have to be printed using the <b pos="..."/> option. I could change it
so that it flushes all blanks, or keeps them in their position if possible.
The good thing is that these aren't wordbound semantically so them not
being in their "correct" position won't cause a lot of issues.

This can be discussed: Do we want control over where to print these tags if
they exist in the stream (not wordbound tags or block tags), or do we want
them to flush out anyway so that the user doesn't have to worry about the
blanks?

*तन्मय खन्ना *
*Tanmai Khanna*


On Thu, Aug 27, 2020 at 1:20 PM Kevin Brubeck Unhammer 
wrote:

> Tanmai Khanna 
> čálii:
>
> > I always thought that's the default behaviour. That if some blanks aren't
> > explicitly printed in the transfer rules then they're flushed. I'll check
> > it out, but it should be that.
>
> The old behaviour has been to just throw away anything that's eaten by
> a rule but not explicitly printed. So if you had a rule matching two
> patterns, for example the words "ph'nglui mglw'nafh", and your input was
> "ph'nglui mglw'nafh", but you just used  and not
> , then transfer would eat the first blink giving "ph'nglui
> mglw'nafh" and you would not be eaten first.


Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-08-27 Thread Kevin Brubeck Unhammer
Tanmai Khanna wrote:

> I always thought that's the default behaviour. That if some blanks aren't
> explicitly printed in the transfer rules then they're flushed. I'll check
> it out, but it should be that.

The old behaviour has been to just throw away anything that's eaten by
a rule but not explicitly printed. So if you had a rule matching two
patterns, for example the words "ph'nglui mglw'nafh", and your input was
"ph'nglui mglw'nafh", but you just used <b/> and not
<b pos="1"/>, then transfer would eat the first blink giving "ph'nglui
mglw'nafh" and you would not be eaten first.
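
For contrast, the old behaviour can be modelled in a few lines of Python. This is a hedged sketch of the description above, not the old transfer code: `old_render` and the tuple encoding of `<b/>` and `<b pos="1"/>` are invented for the example, and `[<blink>]` is a placeholder superblank, since the archive has dropped the tags from the original example.

```python
from typing import List, Optional, Tuple, Union

# ("b", None) models <b/>; ("b", 1) models <b pos="1"/>; plain strings are words.
OutItem = Union[str, Tuple[str, Optional[int]]]


def old_render(out_items: List[OutItem], input_blanks: List[str]) -> str:
    """Old semantics: <b/> is a literal space, <b pos="n"/> copies input blank n."""
    pieces: List[str] = []
    for item in out_items:
        if isinstance(item, tuple):
            _, pos = item
            pieces.append(" " if pos is None else input_blanks[pos - 1])
        else:
            pieces.append(item)
    # Input blanks that no <b pos="n"/> refers to are silently dropped.
    return "".join(pieces)


print(old_render(["ph'nglui", ("b", None), "mglw'nafh"], ["[<blink>]"]))
print(old_render(["ph'nglui", ("b", 1), "mglw'nafh"], ["[<blink>]"]))
```

In the first call the superblank is lost because the rule only asks for a literal space; in the second it survives because the rule refers to it by position.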




Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-08-26 Thread Tanmai Khanna
I always thought that's the default behaviour. That if some blanks aren't
explicitly printed in the transfer rules then they're flushed. I'll check
it out, but it should be that.

*तन्मय खन्ना *
*Tanmai Khanna*


On Thu, Aug 27, 2020 at 1:27 AM Kevin Brubeck Unhammer 
wrote:

> Tanmai Khanna 
> čálii:
>
> > Thanks Unhammer!
> > So now we have three kinds of units: block tags, superblanks, and
> wordbound
> > blanks. Block tags are hard breaks in the text, wordbound blanks move
> > around with words, and superblanks are tags that aren't hard breaks but
> not
> > attached to words (such as ). Tino can give you a list of tags and
> > their classifications.
> >
> > As per your question about transfer, the  refer to the
> > superblanks if they exist in the input. Wordbound blanks will reorder
> > automatically and block tags won't move at all.
> >
> > We can also decide to make the remaining superblanks immovable and just
> > output them when a rule is matched, but that is a decision that can be
> > taken in the future. For now, blanks in transfer rules work for any
> > superblanks that still exist in the stream. Hope that answered your
> > question :))
>
> Almost :) If input has  somewhere, and my transfer rules don't have
> any  (only maybe ), will it still be output?


Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-08-26 Thread Kevin Brubeck Unhammer
Tanmai Khanna wrote:

> Thanks Unhammer!
> So now we have three kinds of units: block tags, superblanks, and wordbound
> blanks. Block tags are hard breaks in the text, wordbound blanks move
> around with words, and superblanks are tags that aren't hard breaks but not
> attached to words (such as ). Tino can give you a list of tags and
> their classifications.
>
> As per your question about transfer, the  refer to the
> superblanks if they exist in the input. Wordbound blanks will reorder
> automatically and block tags won't move at all.
>
> We can also decide to make the remaining superblanks immovable and just
> output them when a rule is matched, but that is a decision that can be
> taken in the future. For now, blanks in transfer rules work for any
> superblanks that still exist in the stream. Hope that answered your
> question :))

Almost :) If input has a superblank somewhere, and my transfer rules don't have
any <b pos="..."/> (only maybe <b/>), will it still be output?




Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-08-26 Thread Tanmai Khanna
Thanks Unhammer!
So now we have three kinds of units: block tags, superblanks, and wordbound
blanks. Block tags are hard breaks in the text, wordbound blanks move
around with words, and superblanks are tags that are neither hard breaks nor
attached to words. Tino can give you a list of tags and
their classifications.

As per your question about transfer, the <b pos="..."/> elements refer to the
superblanks if they exist in the input. Wordbound blanks will reorder
automatically and block tags won't move at all.

We can also decide to make the remaining superblanks immovable and just
output them when a rule is matched, but that is a decision that can be
taken in the future. For now, blanks in transfer rules work for any
superblanks that still exist in the stream. Hope that answered your
question :))

*तन्मय खन्ना *
*Tanmai Khanna*


On Wed, Aug 26, 2020 at 11:43 PM Kevin Brubeck Unhammer 
wrote:

> Woohoo congrats and thanks for all the hard work Tanmai and Tino =D
> The superblank issues have been a pain for quite some time.
>
> How does it work with transfer now, what are the semantics of things
> like  or just  ?


Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

2020-08-26 Thread Kevin Brubeck Unhammer
Woohoo congrats and thanks for all the hard work Tanmai and Tino =D
The superblank issues have been a pain for quite some time.

How does it work with transfer now, what are the semantics of things
like <b pos="1"/> or just <b/>?

