Re: [Apertium-stuff] Need help to decide language pairs for examples of markup handling

2020-09-12 Thread Tanmai Khanna
Thanks a lot for your suggestions guys! :))

*तन्मय खन्ना *
*Tanmai Khanna*


On Thu, Sep 10, 2020 at 7:57 AM Samuel Sloniker wrote:

>
>
> On Wed, Sep 9, 2020, 02:34 Tanmai Khanna wrote:
>
>> (Accepting snazzier title suggestions).
>>
> Maybe "Web" instead of "internet"?
>
___
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff


Re: [Apertium-stuff] Need help to decide language pairs for examples of markup handling

2020-09-09 Thread Samuel Sloniker
On Wed, Sep 9, 2020, 02:34 Tanmai Khanna wrote:

> (Accepting snazzier title suggestions).
>
Maybe "Web" instead of "internet"?



Re: [Apertium-stuff] Need help to decide language pairs for examples of markup handling

2020-09-09 Thread Aure Séguier

Hello

I don't know how mature the pairs are in Apertium, but Basque is
syntactically very different from Spanish and French (I think the
spa-eus pair is the more mature one). It could be a good example.


Regards,

Aure Séguier, desvolopaira
Lo Congrès permanent de la lenga occitana
Castèth d'Este/Château d'Este Av. de la Pléiade 64140 Billère/Vilhèra
T. +33 (0)5 59 13 06 40 - Fax : +33 (0)5 59 13 06 44
a.segu...@locongres.org 
www.locongres.org 


On 09/09/2020 at 11:33, Tanmai Khanna wrote:

Hey guys,
I'm writing a system demonstration to be submitted at LowResMT 2020 
about the recent project that was done as part of GSoC, titled 
"Translating the internet into low resource languages with Apertium" 
(Accepting snazzier title suggestions).


As part of this demonstration, I want to show some real world examples 
of how the new system of markup handling will help the translation of 
webpages and formatted documents - odt, pptx, rtf, etc. To show this 
effectively, I need to choose 3-4 released language pairs that are 
sufficiently syntactically divergent that they show the effect of 
markup reordering in the translation output. As far as I know, spa-cat 
is one of our most mature pairs, however I'm not sure how 
syntactically divergent it is. If it is, then I'm happy to be 
corrected. If your language pair has had issues with webpage 
translation and those issues are now solved (ish), then some examples 
would be really helpful.


TLDR: I need suggestions of language pairs which are mature, low 
resource (at least the target language), and which are syntactically 
divergent enough to see the benefits of markup handling in the 
translation. If you can provide examples, that'll be great as well. 
Any help will be sincerely appreciated :))


Thanks and Regards,
*तन्मय खन्ना *
*Tanmai Khanna*




Re: [Apertium-stuff] Need help to decide language pairs for examples of markup handling

2020-09-09 Thread Jaume Ortolà i Font
Xavi Ivars wrote on Wed, 9 Sep 2020 at 13:53:

> Tanmai Khanna wrote on Wed, 9 Sep 2020 at 11:34:
>
>> Hey guys,
>> I'm writing a system demonstration to be submitted at LowResMT 2020 about
>> the recent project that was done as part of GSoC, titled "Translating the
>> internet into low resource languages with Apertium" (Accepting snazzier
>> title suggestions).
>>
>> As part of this demonstration, I want to show some real world examples of
>> how the new system of markup handling will help the translation of webpages
>> and formatted documents - odt, pptx, rtf, etc. To show this effectively, I
>> need to choose 3-4 released language pairs that are sufficiently
>> syntactically divergent that they show the effect of markup reordering in
>> the translation output. As far as I know, spa-cat is one of our most mature
>> pairs, however I'm not sure how syntactically divergent it is. If it is,
>> then I'm happy to be corrected. If your language pair has had issues with
>> webpage translation and those issues are now solved (ish), then some
>> examples would be really helpful.
>>
>>
> Spanish and Catalan are very similar in terms of syntax. We could
> definitely try to find examples where they diverge the most, but those
> examples would need to be completely synthetic.
>
> The new markup handling does help in other areas, though: in formats
> where inline tags are common (like ODT), the previous
> formatter/deformatter was splitting words wherever tags appeared, so
> translation of those has improved quite a lot.
>

Spanish and Catalan diverge syntactically in the use of some prepositions.
For example, *de que* (spa) usually becomes *que* (cat).

Over the last few days the translations have oscillated when a quotation
mark falls right in the middle, as in *de "que*. See:
https://github.com/apertium/apertium-spa-cat/commit/f6f7ee2f560b7ae817ac87c33278a5c4354090dc

But I'm not sure if there has been a real improvement or not. I have to
look at it more carefully.

Jaume


Re: [Apertium-stuff] Need help to decide language pairs for examples of markup handling

2020-09-09 Thread Xavi Ivars
Tanmai Khanna wrote on Wed, 9 Sep 2020 at 11:34:

> Hey guys,
> I'm writing a system demonstration to be submitted at LowResMT 2020 about
> the recent project that was done as part of GSoC, titled "Translating the
> internet into low resource languages with Apertium" (Accepting snazzier
> title suggestions).
>
> As part of this demonstration, I want to show some real world examples of
> how the new system of markup handling will help the translation of webpages
> and formatted documents - odt, pptx, rtf, etc. To show this effectively, I
> need to choose 3-4 released language pairs that are sufficiently
> syntactically divergent that they show the effect of markup reordering in
> the translation output. As far as I know, spa-cat is one of our most mature
> pairs, however I'm not sure how syntactically divergent it is. If it is,
> then I'm happy to be corrected. If your language pair has had issues with
> webpage translation and those issues are now solved (ish), then some
> examples would be really helpful.
>
>
Spanish and Catalan are very similar in terms of syntax. We could
definitely try to find examples where they diverge the most, but those
examples would need to be completely synthetic.

The new markup handling does help in other areas, though: in formats where
inline tags are common (like ODT), the previous formatter/deformatter was
splitting words wherever tags appeared, so translation of those has
improved quite a lot.
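
To make the word-splitting problem concrete, here is a toy sketch in
Python. It is not Apertium's actual deformatter: the function names, the
regex and the sample string are all made up for illustration. It only
shows why treating every inline tag as a word boundary hurts when a tag
falls in the middle of a word.

```python
import re

# Hypothetical, simplified pattern for an inline tag such as <b> or </i>.
TAG = re.compile(r"<[^>]+>")

def naive_tokens(markup: str) -> list[str]:
    """Treat every inline tag as a word boundary (the problematic behaviour)."""
    return TAG.sub(" ", markup).split()

def tag_aware_tokens(markup: str) -> list[str]:
    """Drop inline tags without inserting a boundary, so words stay whole."""
    return TAG.sub("", markup).split()

# A tag pair that starts in the middle of a word, as often happens in ODT.
sample = "una <b>casa</b> gran<i>de</i>"

print(naive_tokens(sample))      # ['una', 'casa', 'gran', 'de']
print(tag_aware_tokens(sample))  # ['una', 'casa', 'grande']
```

In the first case the translator would see "gran" and "de" as two separate
words; in the second it sees "grande". A real system also has to remember
where the tags were so they can be put back, which this sketch leaves out.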

-- 
< Xavi Ivars >
< http://xavi.ivars.me >


[Apertium-stuff] Need help to decide language pairs for examples of markup handling

2020-09-09 Thread Tanmai Khanna
Hey guys,
I'm writing a system demonstration to be submitted at LowResMT 2020 about
the recent project that was done as part of GSoC, titled "Translating the
internet into low resource languages with Apertium" (Accepting snazzier
title suggestions).

As part of this demonstration, I want to show some real world examples of
how the new system of markup handling will help the translation of webpages
and formatted documents - odt, pptx, rtf, etc. To show this effectively, I
need to choose 3-4 released language pairs that are sufficiently
syntactically divergent that they show the effect of markup reordering in
the translation output. As far as I know, spa-cat is one of our most mature
pairs, however I'm not sure how syntactically divergent it is. If it is,
then I'm happy to be corrected. If your language pair has had issues with
webpage translation and those issues are now solved (ish), then some
examples would be really helpful.

TLDR: I need suggestions of language pairs which are mature, low resource
(at least the target language), and which are syntactically divergent
enough to see the benefits of markup handling in the translation. If you
can provide examples, that'll be great as well. Any help will be sincerely
appreciated :))

Thanks and Regards,
*तन्मय खन्ना *
*Tanmai Khanna*