Re: [Apertium-stuff] add ambiguous weighted rules to apertium-transfer challenge

2019-03-11 Thread Francis Tyers

On 2019-03-12 00:51, Aboelhamd Aly wrote:

Hi all,

This challenge was set for me by spectie: integrate the ambiguous
weighted rules program into the apertium-transfer program.
Sorry, spectie, for being late; I was unavailable for some days.
I think the challenge is finished now. The new command for our
program, also added to the help message of apertium-transfer, is:

apertium-transfer -a trules localeid models k [input [output]]

where -a stands for ambiguous; trules is the chunker transfer file
(.t1x); localeid is the ICU locale ID of the source language (I think
there is a way to make it figure out the source language
automatically); models is the path to the folder of yasmet-trained
models; k is the beam size for the beam search algorithm used to
choose the best possible translation; input is the input, coming from
lexical transfer; and output is the output file path, or stdout if
left empty. The output holds the program's result, which is the chunks
produced by the ambiguous rules for the first stage.
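
A minimal sketch of what the beam size k controls, written in Python
for illustration only. It is not the code in the fork: the data layout
(one list of (rule, probability) options per ambiguous position) and
the name beam_search are assumptions made for this example, and
probabilities are assumed to be strictly positive.

```python
import heapq
import math


def beam_search(candidates, k):
    """Keep the k best rule combinations while scanning left to right.

    candidates: one list per ambiguous position, each holding
                (rule_id, probability) pairs with probability > 0.
    Returns up to k (log_score, [rule_ids]) hypotheses, best first.
    """
    beam = [(0.0, [])]  # start with one empty hypothesis
    for options in candidates:
        # Extend every surviving hypothesis with every option here.
        expanded = [
            (score + math.log(prob), path + [rule])
            for score, path in beam
            for rule, prob in options
        ]
        # Prune to the k highest-scoring hypotheses.
        beam = heapq.nlargest(k, expanded, key=lambda hyp: hyp[0])
    return beam


if __name__ == "__main__":
    positions = [[("r1", 0.6), ("r2", 0.4)], [("r3", 0.3), ("r4", 0.7)]]
    for score, rules in beam_search(positions, k=2):
        print(rules, round(math.exp(score), 2))
```

A larger k explores more combinations at a higher cost; k = 1 reduces
to a greedy choice at each position.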

- For models being a path to a folder of models instead of a single
file containing all the models: merging all the models into one file
with a script is not hard, though it would need a small change in how
the program handles the models. So I preferred to move on and learn
how to integrate the code into apertium's and how to modify the
makefile. I struggled a bit with the makefile, as I have had little
exposure to it, and only with much simpler makefiles. I also needed to
modify our program a little to be able to integrate it with apertium.
- For preproc: I omitted it because we don't use it in our program,
just the .t1x file.
- For accepting the default input instead of lexical transfer output
only: I didn't manage to make it an option, because I would then have
to use transfer.cc with our code, which I found too hard for the task
of just making the code work as it is. I would have had to make the
transfer object do only the preBilingual step without performing the
actual transfer.
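
The merge script mentioned in the first point could look roughly like
the sketch below. Everything about the layout is an assumption made
for illustration: one plain-text model file per ambiguous pattern in
the folder, a "### " header line before each model in the merged file,
and no model line that itself starts with that marker. The names
merge_models and load_merged are hypothetical.

```python
from pathlib import Path


def merge_models(models_dir, merged_path, marker="### "):
    """Concatenate every file in models_dir into one file.

    Each model is preceded by a header line holding its file name.
    merged_path should live outside models_dir. Returns the file count.
    """
    files = sorted(p for p in Path(models_dir).iterdir() if p.is_file())
    with open(merged_path, "w", encoding="utf-8") as out:
        for path in files:
            text = path.read_text(encoding="utf-8")
            out.write(f"{marker}{path.name}\n")
            # Make sure the next header starts on its own line.
            out.write(text if text.endswith("\n") else text + "\n")
    return len(files)


def load_merged(merged_path, marker="### "):
    """Read the merged file back into a {file_name: text} mapping."""
    models = {}
    current = None
    with open(merged_path, encoding="utf-8") as merged:
        for line in merged:
            if line.startswith(marker):
                current = line[len(marker):].rstrip("\n")
                models[current] = []
            elif current is not None:
                models[current].append(line)
    return {name: "".join(lines) for name, lines in models.items()}
```

The program would then look models up by name in the loaded mapping
instead of opening one file per pattern.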

But I still have some doubts about your statement, spectie, about
"using our coding style", as it was a little vague, or maybe it seemed
so because I didn't ask about it; I wanted to make it work first and
then ask for further information. I wondered whether you meant:
- to use the transfer.cc file instead of our own implementation, which
is a difficult task because I think I need enough time to go deep and
understand your code before I can use it to achieve the same result,
though I think it would be much less buggy (and many of our program's
bugs have also been fixed over the past months); or
- to make the code style of our implementation like yours, which is
difficult too because your code is so professional compared to ours,
and I think it would also take some time; or
- to integrate our code with transfer.cc and then have it used by
apertium_transfer.
Honestly, I chose the easiest and fastest solution, which is making
small modifications to my code and yours to make the program work.

I forked apertium core, then added and modified some files, and it is
now ready in my forked repo. You can take a look here:
https://github.com/aboelhamd/apertium

And now, spectie, what's next? Can we discuss further the
documentation, thoughts and questions I wrote in the past week or two,
or do you still have some modifications or tasks for me to do?

Thanks and sorry for my verbose message.


Hi Aboelhamd,

Thanks for your email. Could you do a couple of things that will help us
to review your code more easily:

1) Make your changes in a branch
2) Make a pull request

I'll try and get back with further comments shortly.

Greetings from Kyrgyzstan!

Fran


___
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff


[Apertium-stuff] add ambiguous weighted rules to apertium-transfer challenge

2019-03-11 Thread Aboelhamd Aly
Hi all,

This challenge was set for me by spectie: integrate the ambiguous
weighted rules program into the apertium-transfer program.
Sorry, spectie, for being late; I was unavailable for some days.
I think the challenge is finished now. The new command for our
program, also added to the help message of apertium-transfer, is:

apertium-transfer -a trules localeid models k [input [output]]

where -a stands for ambiguous; trules is the chunker transfer file
(.t1x); localeid is the ICU locale ID of the source language (I think
there is a way to make it figure out the source language
automatically); models is the path to the folder of yasmet-trained
models; k is the beam size for the beam search algorithm used to
choose the best possible translation; input is the input, coming from
lexical transfer; and output is the output file path, or stdout if
left empty. The output holds the program's result, which is the chunks
produced by the ambiguous rules for the first stage.

- For models being a path to a folder of models instead of a single
file containing all the models: merging all the models into one file
with a script is not hard, though it would need a small change in how
the program handles the models. So I preferred to move on and learn
how to integrate the code into apertium's and how to modify the
makefile. I struggled a bit with the makefile, as I have had little
exposure to it, and only with much simpler makefiles. I also needed to
modify our program a little to be able to integrate it with apertium.
- For preproc: I omitted it because we don't use it in our program,
just the .t1x file.
- For accepting the default input instead of lexical transfer output
only: I didn't manage to make it an option, because I would then have
to use transfer.cc with our code, which I found too hard for the task
of just making the code work as it is. I would have had to make the
transfer object do only the preBilingual step without performing the
actual transfer.

But I still have some doubts about your statement, spectie, about
"using our coding style", as it was a little vague, or maybe it seemed
so because I didn't ask about it; I wanted to make it work first and
then ask for further information. I wondered whether you meant:
- to use the transfer.cc file instead of our own implementation, which
is a difficult task because I think I need enough time to go deep and
understand your code before I can use it to achieve the same result,
though I think it would be much less buggy (and many of our program's
bugs have also been fixed over the past months); or
- to make the code style of our implementation like yours, which is
difficult too because your code is so professional compared to ours,
and I think it would also take some time; or
- to integrate our code with transfer.cc and then have it used by
apertium_transfer.
Honestly, I chose the easiest and fastest solution, which is making
small modifications to my code and yours to make the program work.

I forked apertium core, then added and modified some files, and it is
now ready in my forked repo. You can take a look here:
https://github.com/aboelhamd/apertium

And now, spectie, what's next? Can we discuss further the
documentation, thoughts and questions I wrote in the past week or two,
or do you still have some modifications or tasks for me to do?

Thanks and sorry for my verbose message.
Aboelhamd


Re: [Apertium-stuff] OpenNMT

2019-03-11 Thread Sevilay Bayatlı
I have read through its documentation, but I couldn't work out what the
validation data is. Is it a subset of the training data (source
sentences) that is used for validation? Could you please point me to
the right place?
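
For readers with the same question: in common machine-translation
practice the validation set is a small set of parallel sentence pairs
(source and target together) held out from the training data, so the
two sets do not overlap. The sketch below shows one way to make such a
split in Python. The function name split_parallel and the fixed seed
are choices made for this example and are not taken from the OpenNMT
documentation.

```python
import random


def split_parallel(src_lines, tgt_lines, valid_size, seed=0):
    """Hold out valid_size aligned sentence pairs for validation.

    src_lines and tgt_lines must be aligned line by line.
    Returns (train_pairs, valid_pairs), which never share an index.
    """
    if len(src_lines) != len(tgt_lines):
        raise ValueError("source and target must have the same length")
    if not 0 <= valid_size <= len(src_lines):
        raise ValueError("valid_size must be between 0 and the corpus size")
    indices = list(range(len(src_lines)))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    held_out = set(indices[:valid_size])
    pairs = list(zip(src_lines, tgt_lines))
    train = [pair for i, pair in enumerate(pairs) if i not in held_out]
    valid = [pair for i, pair in enumerate(pairs) if i in held_out]
    return train, valid
```

Each side of the two lists can then be written to its own file, one
sentence per line, for whichever toolkit is used.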

Sevilay

On Mon, Mar 11, 2019 at 4:49 PM Tommi A Pirinen <
tommi.antero.piri...@uni-hamburg.de> wrote:

> On Sun, Mar 10, 2019 at 11:09:19PM +0300, Sevilay Bayatlı wrote:
>
> > Do you know of any open neural MT system? I need one for a comparison
> > with kaz-tur MT.
>
> I found the python version to be the easiest to build a baseline with,
> it has step-by-step instructions:
>
> https://github.com/OpenNMT/OpenNMT-py#quickstart
>
> I have not gotten it to produce the BLEU scores that it should, though...
>
> --
> Doktor Tommi A Pirinen, Computational Linguist, Universität Hamburg,
> Hamburger Zentrum für Sprachkorpora. CLARIN-D developer. President of
> ACL SIGUR SIG for Uralic languages.
> I tend to follow inline-posting style in desktop e-mail messages.


Re: [Apertium-stuff] Released: nno-nob 1.2.0, swe-dan 0.8.0, swe-nor 0.3.0, dan-nor 1.4.0

2019-03-11 Thread Hèctor Alòs i Font
*Very* impressive work. Congrats!
Hèctor

Message from Kevin Brubeck Unhammer on Mon, 11 March 2019 at 21:35:

> God aftan,
>
> New versions of the four Scandinavian pairs are now available from
> SourceForge, GitHub and apertium.org.
>
> These releases come courtesy of Nynorsk pressekontor / NPK (an enclave
> of Nynorsk journalists working within NTB, the Norwegian News
> Agency[1]), with funding from the Norwegian Ministry of Culture. There
> has been some press about the project.[2][3]
>
> NPK have been using apertium-nno-nob in production since fall 2018 –
> it's integrated into their translation/editing systems – and we've been
> continually improving it with the help of their post-edits and
> feedback. The form/spelling/style choices used by nob→nno are now more
> modern and uniform (there was a major release of Nynorsk[4] back in
> 2012, while most style decisions in the translator were made in the
> first release back in 2009).
>
> Other major changes to nno-nob:
> - 35 new transfer rules[5]
> - 248 new lrx rules
> - about 42,000 new names and 3,800 new non-names added to bidix
> - regression testing by checking that WER does not get worse
> - lots of work on nob disambiguation
> - we now do long-distance adjective congruence
> - there's a post-nno.dix to get rid of triple consonants resulting from
>   compounding
> - compounding happens on proper nouns too now
> - genitives are translated not just by preposition-rewriting, but we now
>   also have:
>   - lists of exceptions where we want to keep genitives
>   - rewriting some nouns with relatives
>   - rewriting nationalities with adjectives
>   - rewriting some abstract nouns into compounds
>
> The project is not yet done, but people have been asking about when the
> fruits of it will show up on apertium.org :-)
>
> The other three pairs have also had improvements since the last release;
> some were also getting pretty bad testvoc issues due to changes in
> dependencies[6], so they get releases too. Apart from testvoc, the pairs
> have gotten some transfer rules and fixes merged in from nno-nob
> (e.g. prop compounding, and handling genitives in coordinated NPs), and
> various disambiguation and vocabulary updates.
>
>
> -Kevin
>
>
> [1] https://en.wikipedia.org/wiki/Norwegian_News_Agency
> [2]
> https://www.medier24.no/artikler/na-blir-det-nynorsk-bonanza-i-ntb-splitter-ny-robot-oversetter-artikler-automatisk-fra-bokmal/440934
> [3]
> https://framtida.no/2018/08/08/nynorskrobot-ei-god-loysing-for-a-dekke-nynorskprosenten
> [4] http://www.sprakradet.no/upload/Brosjyrer/Ny%20nynorskrettskriving.pdf
> [5] One of which required a bugfix to apertium-transfer
>
> https://github.com/apertium/apertium/commit/542de014a93c96905198f193e0a62a89317fa8a9
> [6] https://github.com/apertium/apertium-packaging/issues/12


[Apertium-stuff] Released: nno-nob 1.2.0, swe-dan 0.8.0, swe-nor 0.3.0, dan-nor 1.4.0

2019-03-11 Thread Kevin Brubeck Unhammer
God aftan,

New versions of the four Scandinavian pairs are now available from
SourceForge, GitHub and apertium.org.

These releases come courtesy of Nynorsk pressekontor / NPK (an enclave
of Nynorsk journalists working within NTB, the Norwegian News
Agency[1]), with funding from the Norwegian Ministry of Culture. There
has been some press about the project.[2][3]

NPK have been using apertium-nno-nob in production since fall 2018 –
it's integrated into their translation/editing systems – and we've been
continually improving it with the help of their post-edits and
feedback. The form/spelling/style choices used by nob→nno are now more
modern and uniform (there was a major release of Nynorsk[4] back in
2012, while most style decisions in the translator were made in the
first release back in 2009).

Other major changes to nno-nob:
- 35 new transfer rules[5]
- 248 new lrx rules
- about 42,000 new names and 3,800 new non-names added to bidix
- regression testing by checking that WER does not get worse
- lots of work on nob disambiguation
- we now do long-distance adjective congruence
- there's a post-nno.dix to get rid of triple consonants resulting from
  compounding
- compounding happens on proper nouns too now
- genitives are translated not just by preposition-rewriting, but we now
  also have:
  - lists of exceptions where we want to keep genitives
  - rewriting some nouns with relatives
  - rewriting nationalities with adjectives
  - rewriting some abstract nouns into compounds
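
For readers unfamiliar with the WER check in the list above: word
error rate is the word-level edit distance between a reference
translation and the system output, divided by the reference length.
The Python sketch below is a generic illustration and not the script
used in apertium-nno-nob; the function name and the whitespace
tokenisation are assumptions made for this example.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edits needed to turn the first i-1 reference words
    # into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

A regression test in this spirit would compute the rate on a fixed set
of post-edited sentences and compare it with a stored baseline.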

The project is not yet done, but people have been asking about when the
fruits of it will show up on apertium.org :-)

The other three pairs have also had improvements since the last release;
some were also getting pretty bad testvoc issues due to changes in
dependencies[6], so they get releases too. Apart from testvoc, the pairs
have gotten some transfer rules and fixes merged in from nno-nob
(e.g. prop compounding, and handling genitives in coordinated NPs), and
various disambiguation and vocabulary updates.


-Kevin


[1] https://en.wikipedia.org/wiki/Norwegian_News_Agency
[2] 
https://www.medier24.no/artikler/na-blir-det-nynorsk-bonanza-i-ntb-splitter-ny-robot-oversetter-artikler-automatisk-fra-bokmal/440934
[3] 
https://framtida.no/2018/08/08/nynorskrobot-ei-god-loysing-for-a-dekke-nynorskprosenten
[4] http://www.sprakradet.no/upload/Brosjyrer/Ny%20nynorskrettskriving.pdf
[5] One of which required a bugfix to apertium-transfer

https://github.com/apertium/apertium/commit/542de014a93c96905198f193e0a62a89317fa8a9
[6] https://github.com/apertium/apertium-packaging/issues/12






Re: [Apertium-stuff] OpenNMT

2019-03-11 Thread Tommi A Pirinen
On Sun, Mar 10, 2019 at 11:09:19PM +0300, Sevilay Bayatlı wrote:

> Do you know of any open neural MT system? I need one for a comparison
> with kaz-tur MT.

I found the python version to be the easiest to build a baseline with,
it has step-by-step instructions:

https://github.com/OpenNMT/OpenNMT-py#quickstart

I have not gotten it to produce the BLEU scores that it should, though...

-- 
Doktor Tommi A Pirinen, Computational Linguist, Universität Hamburg,
Hamburger Zentrum für Sprachkorpora. CLARIN-D developer. President of
ACL SIGUR SIG for Uralic languages.
I tend to follow inline-posting style in desktop e-mail messages.




Re: [Apertium-stuff] OpenNMT

2019-03-11 Thread Francis Tyers

On 2019-03-10 20:09, Sevilay Bayatlı wrote:

Hi,

Do you know of any open neural MT system? I need one for a comparison
with kaz-tur MT.

Thanks in advance,


http://opennmt.net/

Fran

