[Apertium-stuff] Extend weighted transfer rules GSoC proposal

2019-04-04 Thread Aboelhamd Aly
Hi all,

My proposal is ready. I am sorry if it is a bit too long.
I am waiting for any questions, reviews or feedback.

Thanks,
Aboelhamd
___
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff


Re: [Apertium-stuff] Extend weighted transfer rules GSoC proposal

2019-04-04 Thread Francis Tyers

On 2019-04-05 02:58, Aboelhamd Aly wrote:

Hi all,

My proposal [1] is ready. I am sorry if it is a bit too long.
I am waiting for any questions, reviews or feedback.

Thanks,
Aboelhamd

Links:
--
[1] http://wiki.apertium.org/wiki/User:Aboelhamd


Dear Aboelhamd,

I think my main feedback is that GSoC is incompatible with having a
part-time job.

Other than that I'll leave some comments on GitHub and try to get
back to you here.

Fran




Re: [Apertium-stuff] Extend weighted transfer rules GSoC proposal

2019-04-05 Thread Francis Tyers

On 2019-04-05 13:37, Aboelhamd Aly wrote:

Dear Francis,

I thought that 30-40 hours per week are enough for GSoC and that there is no
problem with any other activities, as long as I am able to maintain that
time. But if it is a problem, I will consider leaving the part-time job when
I start phase one.



Dear Aboelhamd,

We count GSoC as a full-time job. It is extremely unlikely that you will
be selected if you are planning to have other paid employment
at the same time.

Regards,

Fran





Re: [Apertium-stuff] Extend weighted transfer rules GSoC proposal

2019-04-05 Thread Sevilay Bayatlı
Hi Aboelhamd,

There are some points in your proposal:

First, I do not think "splitting sentence" is a good idea: each language
has a different syntax, so how would you know when to split the sentence?

Second, about "substitute yasmet with other method": I do not think the
result will be any better if you substitute it with another statistical method.

Sincerely,

Sevilay





Re: [Apertium-stuff] Extend weighted transfer rules GSoC proposal

2019-04-05 Thread Francis Tyers

On 2019-04-05 19:07, Sevilay Bayatlı wrote:

Hi Aboelhamd,

There are some points in your proposal:

First, I do not think "splitting sentence" is a good idea: each language
has a different syntax, so how would you know when to split the sentence?


Apertium works on the concept of a stream of words, so in the runtime
we can't really rely on robust sentence segmentation.

We can often use it, e.g. for training, but if sentence boundary detection
were to be included, it would need to be trained, as Sevilay hints at.

Also, I'm not sure how much we would gain from that.


Second, "substitute yasmet with other method", I think the result will
not be more better if you substituted it with statistical method.



Substituting yasmet with a more up-to-date machine-learning method
might be a worthwhile thing to do. What suggestions do you have?

Fran




Re: [Apertium-stuff] Extend weighted transfer rules GSoC proposal

2019-04-05 Thread Francis Tyers

On 2019-04-05 20:57, Sevilay Bayatlı wrote:


I think first we have to try the exact method with more than 3 language
pairs and then decide whether to substitute it or not, because what is the
point of a new method if it does not achieve any gain; then we can compare
the results of the two methods and choose the best one. What do you think?




Yes, testing it with more language pairs is also a priority.

Fran




Re: [Apertium-stuff] Extend weighted transfer rules GSoC proposal

2019-04-05 Thread Aboelhamd Aly
Hi Sevilay, hi spectei,

For sentence splitting, I think that we don't need to know either the syntax
or the sentence boundaries of the language.
Also, I don't see any necessity for applying it at runtime, as at runtime we
only get the score of each pattern, where there is no need for splitting.
I also had a thought on using beam search here, as I see it has no effect,
but maybe I am wrong. We can discuss it after we close this thread.

We will handle the whole text as one unit and will depend only on the
captured patterns.
Note that, in chunker terms, successive patterns that don't share a transfer
rule are independent.
So, using the lexical form of the text, we match the words with patterns,
then match the patterns with rules.
And hence we know which patterns are ambiguous and how many ambiguous rules
they match.

For example, suppose we have a text with the following patterns and
corresponding numbers of matching rules:
p1:2  p2:1  p3:6  p4:4  p5:3  p6:5  p7:1  p8:4  p9:4  p10:6  p11:8  p12:5
p13:5  p14:1  p15:3  p16:2

If such a text were handled by our old method, generating all possible
combinations (the product of the rule counts), we would have 82944000
possible combinations, which is not practical at all to score and takes
heavy computation and memory.
And if it is handled by our new method, applying all ambiguous rules of one
pattern while fixing the other patterns to their LRLM rule (the sum of the
rule counts), we would have just 60 combinations, and not all of them
different, giving a drastically low number of combinations, which may not be
very representative.

But if we apply the splitting idea, we will have something in the middle
that will hopefully avoid the disadvantages of both methods and benefit
from the advantages of both, too.
Let's proceed from the start of the text to its end, while maintaining some
threshold of, say, 24000 combinations:
p1 => 2  ,,  p1 p2 => 2  ,,  p1 p2 p3 => 12  ,,  p1 p2 p3 p4 => 48  ,,
p1 p2 p3 p4 p5 => 144  ,,  p1 p2 p3 p4 p5 p6 => 720  ,,
p1 p2 p3 p4 p5 p6 p7 => 720  ,,  p1 p2 p3 p4 p5 p6 p7 p8 => 2880  ,,
p1 p2 p3 p4 p5 p6 p7 p8 p9 => 11520

And then we stop here, because taking the next pattern would exceed the
threshold.
Hence, having our first split, we can now continue our work on it as usual,
but with more (yet not overwhelming) combinations, which should capture more
semantics.
After that, we take the next split, and so on.
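
Roughly, in Python, the splitting step I have in mind could look like the
following. This is just a sketch of the idea with the counts from the
example above, not the actual implementation:

# A minimal sketch of the splitting idea: walk through the patterns in order
# and start a new split whenever the product of ambiguous-rule counts would
# exceed a threshold (24000 here, as in the example above).
def split_by_combinations(rule_counts, threshold=24000):
    splits, current, product = [], [], 1
    for count in rule_counts:
        if current and product * count > threshold:
            splits.append(current)      # close the current split
            current, product = [], 1
        current.append(count)
        product *= count
    if current:
        splits.append(current)
    return splits

counts = [2, 1, 6, 4, 3, 5, 1, 4, 4, 6, 8, 5, 5, 1, 3, 2]
print(split_by_combinations(counts))
# -> [[2, 1, 6, 4, 3, 5, 1, 4, 4], [6, 8, 5, 5, 1, 3, 2]]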

---

I agree with you that testing the current method with more than one pair, to
know its accuracy, is the priority, and we are currently working on it.

---

As for an alternative to yasmet, I agree with spectei. Unfortunately, for now
I don't have a solid idea to discuss.
But in the next few days, I will try to come up with one or more ideas to discuss.




Re: [Apertium-stuff] Extend weighted transfer rules GSoC proposal

2019-04-07 Thread Aboelhamd Aly
Hi all,

I have an idea, not yet solid, for an alternative to yasmet and the max
entropy models: using neural networks to give us scores for the ambiguous
rules.
But I have not yet settled on a formulation of the problem, nor on the
structure of the inputs, the outputs, or even the goal, as I think there are
many formulations that we could adopt.

For example, the most straightforward structure is to give the network all
the possible combinations of a sentence's translations and let it choose the
best one, or assign them weights.
Hence, the network learns which combinations to choose for a specific pair.

Another example is, instead of building one network per pair, to build one
network per ambiguous pattern, as we did with the max entropy models.
So we give the network the combinations for that pattern, and let it assign
weights to the ambiguous rules applied to that pattern.

And for each structure there are many details and questions yet to be answered.
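
To make the second formulation a bit more concrete, here is a rough sketch
in Python (PyTorch) of what a per-pattern scorer could look like. This is
only an illustration under my own assumptions: the class name, the feature
vector and the sizes are hypothetical, and feature extraction is left out.

import torch
import torch.nn as nn

# One small network per ambiguous pattern: it takes a feature vector
# describing the context of the pattern and outputs one weight per
# ambiguous rule, playing the same role as the yasmet/max-entropy weights.
class PatternRuleScorer(nn.Module):
    def __init__(self, num_features, num_rules, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_rules),
        )

    def forward(self, features):
        # Weights are normalised to sum to 1 over the ambiguous rules.
        return torch.softmax(self.net(features), dim=-1)

# Hypothetical usage: a pattern with 4 ambiguous rules and 200 context
# features (e.g. surrounding lemmas and tags), batch of one example.
scorer = PatternRuleScorer(num_features=200, num_rules=4)
rule_weights = scorer(torch.rand(1, 200))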

So, with that said, I decided to look at some papers to see what others have
done before to tackle similar problems or the exact same problem, and how
some of them used machine learning or deep learning to solve these problems,
and then to try to build on them.

Some papers' resolutions were very specific to the pairs they were developed
for, and thus are not very relevant to our case:
1) Resolving Structural Transfer Ambiguity in Chinese-to-Korean Machine
Translation (2003)
2) Arabic Machine Translation: A Developmental Perspective (2010)

Some other papers tried not to generate ambiguous rules, or to minimize the
ambiguity in transfer-rule inference, and did not provide any methods to
resolve the ambiguity in our case. I thought that they might provide some
help, but I think they are far from our topic:
1) Learning Transfer Rules for Machine Translation with Limited Data (2005)
2) Inferring Shallow-Transfer Machine Translation Rules from Small Parallel
Corpora (2009)

Now I am looking into some more recent papers, like:
1) Rule Based Machine Translation Combined with Statistical Post Editor for
Japanese to English Patent Translation (2007)
2) Machine translation model using inductive logic programming (2009)
3) Machine Learning for Hybrid Machine Translation (2012)
4) Study and Comparison of Rule-Based and Statistical Catalan-Spanish
Machine Translation Systems (2012)
5) Latest trends in hybrid machine translation and its applications (2015)
6) Machine Translation: Phrase-Based, Rule-Based and Neural Approaches with
Linguistic Evaluation (2017)
7) A Multitask-Based Neural Machine Translation Model with Part-of-Speech
Tags Integration for Arabic Dialects (2018)

And I hope they give me some more insights and thoughts.

--

- So, do you have recommendations for other papers that address the same
problem?
- Also, about the proposal: I modified it a little and shared it through the
GSoC website as a draft, so do you have any last feedback or thoughts about
it, or should I just submit it as the final proposal?
- Last thing, about the coding challenge (integrating weighted transfer
rules with apertium-transfer): I think it is finished, but I did not get any
feedback or response about it, and the pull request is not yet merged into
master.


Thanks,
Aboelhamd



Re: [Apertium-stuff] Extend weighted transfer rules GSoC proposal

2019-04-07 Thread Sevilay Bayatlı
Hi Aboelhamd,

Your proposal looks good. I found these resources that may be of benefit:

Multi-source neural translation
https://arxiv.org/abs/1601.00710

Neural machine translation with extended context
https://arxiv.org/abs/1708.05943

Handling homographs in neural machine translation
https://arxiv.org/abs/1708.06510



Sevilay


Re: [Apertium-stuff] Extend weighted transfer rules GSoC proposal

2019-04-07 Thread Aboelhamd Aly
Thanks Sevilay for your feedback, and thanks for the resources.


Re: [Apertium-stuff] Extend weighted transfer rules GSoC proposal

2019-04-19 Thread Aboelhamd Aly
Hi Sevilay. Hi Francis,

Unfortunately, Sevilay reported that the evaluation results for the kaz-tur
and spa-eng pairs were very bad, with only 30% of the tested sentences being
good compared to Apertium's LRLM resolution.
So we discussed what to do next, which is to utilize the breakthrough of
deep learning neural networks in NLP and especially in machine translation.
We also discussed using values of n greater than 5 in the n-gram language
model already in use, and evaluating the effect of increasing n, which could
give us more insight into what to do next and how to do it.

Since I am taking an introduction to deep learning course this term in
college, I spent the past two weeks being introduced to the applications of
deep learning in NLP and MT.
I now have a basic knowledge of recurrent neural networks (RNNs) and why to
use them instead of standard feed-forward networks in NLP, besides
understanding their different architectures and the math behind forward and
back propagation.
I also know how to build a simple language model, and how to avoid the
vanishing-gradient problem, which prevents capturing long dependencies, by
using Gated Recurrent Units (GRUs) or Long Short-Term Memory (LSTM) networks.

As a next step, we will consider working only on the language model and
leave the max entropy part for later discussion.
So, along with trying different values of n in the n-gram language model and
evaluating the results, I will try either to use a ready-made RNNLM or to
build a new one from scratch from what I have learnt so far. Honestly, I
prefer the latter choice because it will increase my experience in applying
what I have learnt.
In the last two weeks I implemented RNNs with GRUs and LSTM and also
implemented a character-based language model as two assignments, and they
were very fun to do. So implementing a word-based RNN LM will not take much
time, though it may not be close to a state-of-the-art model, and that is
its disadvantage.
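
For concreteness, the kind of model I have in mind would look roughly like
the following sketch in Python (PyTorch). The class name, the sizes and the
dummy batch are my own illustrative assumptions; vocabulary handling and the
full training loop are omitted.

import torch
import torch.nn as nn

# A minimal word-level language model with a GRU: embed the words, run the
# GRU over the sequence, and predict the next word at every position.
class GRULanguageModel(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, tokens):
        # tokens: (batch, seq_len) word ids -> logits over the next word.
        hidden_states, _ = self.gru(self.embed(tokens))
        return self.out(hidden_states)

# One training step on a dummy batch of 8 "sentences" of 20 word ids each.
model = GRULanguageModel(vocab_size=10000)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

batch = torch.randint(0, 10000, (8, 20))
logits = model(batch[:, :-1])                      # predict words 2..20
loss = loss_fn(logits.reshape(-1, 10000), batch[:, 1:].reshape(-1))
loss.backward()
optimizer.step()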

Using an NNLM instead of the n-gram LM has these possible advantages:
- It automatically learns syntactic and semantic features.
- It overcomes the curse of dimensionality by generalizing better.

--

I tried using n=8 instead of 5 in the n-gram LM, but the scores weren't that
different, as Sevilay pointed out in our discussion.
I knew that an NNLM is better than a statistical one, and also that using
machine learning instead of the maximum entropy model would give better
performance.
*But* the evaluation results were very disappointing, unexpected and
illogical, so I thought there might be a bug in the code.
And after some searching, I found that I had made a very silly *mistake* in
normalizing the LM scores. As the scores are the log base 10 of the sentence
probability, the score with the higher magnitude corresponds to the lower
probability, but what I did was the inverse of that, and that was the cause
of the very bad results.
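
In other words, the comparison just has to go in the right direction; for
example (with made-up log10 scores and lengths):

# Made-up log10 LM scores for two candidate combinations of 9 words each.
scores = {"combination A": -12.4, "combination B": -20.1}
lengths = {"combination A": 9, "combination B": 9}

# Higher (less negative) length-normalized log10 score = more probable.
best = max(scores, key=lambda s: scores[s] / lengths[s])
print(best)   # "combination A": -12.4 > -20.1, despite its smaller magnitude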

I am fixing this now and then will re-evaluate the results with Sevilay.

Regards,
Aboelhamd



Re: [Apertium-stuff] Extend weighted transfer rules GSoC proposal

2019-04-19 Thread Aboelhamd Aly
According to the timeline I put in my proposal, I am supposed to start
phase 1 today.
I want to know what procedure to follow to document my work, day by day and
week by week.
Do I create a page on the wiki to record my progress?
Or is there another way?

Thanks


Re: [Apertium-stuff] Extend weighted transfer rules GSoC proposal

2019-04-21 Thread Aboelhamd Aly
Hi,

I am uploading a summary of each day of work to this wiki page.
Please take a look and let me know if there is something else I could do
instead.

Thanks.


Re: [Apertium-stuff] Extend weighted transfer rules GSoC proposal

2019-04-21 Thread Sevilay Bayatlı
Hi Aboelhamd,

For now it is OK to record it day by day, but later you can change it to
week by week and put it in a table.

Sevilay


Re: [Apertium-stuff] Extend weighted transfer rules GSoC proposal

2019-04-21 Thread Aboelhamd Aly
Hi Sevilay,

I got it, ok.

Thanks.


Re: [Apertium-stuff] Extend weighted transfer rules GSoC proposal

2019-04-21 Thread Sevilay Bayatlı
Aboelhamd,

I think using Gated Recurrent Units (GRUs) instead of the n-gram language
model is a good idea; probably we can achieve more gain. However, the most
important part here is replacing the maximum entropy model.

Let's see what Fran thinks about it.

Regards,

Sevilay





Re: [Apertium-stuff] Extend weighted transfer rules GSoC proposal

2019-04-21 Thread Aboelhamd Aly
Hi Sevilay,

I think a new language model that could distinguish the best ambiguous
combination(s) of a translation would eliminate our need for the max entropy
model or any other method.
But whether that is the case with an RNN LM, I don't know yet.
For now, do you agree that we need to change the LM first? Or do you prefer
going straight to an alternative method for max entropy? And do you have any
idea for such an alternative method?
In my opinion, fixing all the bugs, evaluating our current system, and then
changing the n-gram LM to an RNN is the right plan for the next two weeks
or so.
After that we can focus the research on what comes next, if the accuracy is
not good enough or there is room for improvement.
Do you agree with this?
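
Just to illustrate what I mean by letting the LM do the choosing, something
roughly like the following could replace the max-entropy step. This is a
sketch only, assuming a trained KenLM model file and the kenlm Python
bindings; the file name and the candidate sentences are hypothetical:

import kenlm

model = kenlm.Model("en.binary")   # hypothetical target-language LM

def best_combination(candidates):
    # model.score returns the total log10 probability; divide by the word
    # count so that shorter outputs are not automatically preferred.
    return max(candidates,
               key=lambda s: model.score(s, bos=True, eos=True) / len(s.split()))

print(best_combination(["he gave the book to her", "he gave to her the book"]))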

Regards,
Aboelhamd


Re: [Apertium-stuff] Extend weighted transfer rules GSoC proposal

2019-04-21 Thread Sevilay Bayatlı
I agree with changing the n-gram LM, but with which one, RNN or GRU? As I
see from the literature, GRU has more advantages than RNN.

Sevilay

On Mon, Apr 22, 2019 at 12:09 AM Aboelhamd Aly 
wrote:

> Hi Sevilay,
>
> I think a new language model that could distinguish the best ambiguous
> combination(s) of a translation would eliminate our need for the max
> entropy model or any other method.
> But whether that is the case with an RNN LM, I don't know yet.
> For now, do you agree that we need to change the LM first, or do you
> prefer going straight to an alternative method for max entropy? And do you
> have any idea for such an alternative method?
> In my opinion, fixing all the bugs, evaluating our current system, and
> then changing the n-gram LM to an RNN is the priority plan for the next
> two weeks or so.
> After that we can focus the research on what comes next, if the accuracy
> is not good enough or there is room for improvement.
> Do you agree with this?
>
> Regards,
> Aboelhamd
>
> On Sun, Apr 21, 2019 at 10:48 PM Sevilay Bayatlı 
> wrote:
>
>> Aboelhamd,
>>
>> I think using Gated Recurrent Units (GRUs) instead of the n-gram language
>> model is a good idea; we can probably achieve more gain. However, the most
>> important part here is changing the maximum entropy model.
>>
>> Let's see what Fran thinks about it.
>>
>> Regards,
>>
>> Sevilay
>>
>>
>>
>>
>> On Fri, Apr 19, 2019 at 10:29 PM Aboelhamd Aly <
>> aboelhamd.abotr...@gmail.com> wrote:
>>
>>> Hi Sevilay. Hi Francis,
>>>
>>> Unfortunately, Sevilay reported that the evaluation results of kaz-tur
>>> and spa-eng pairs were very bad, with only 30% of the tested sentences
>>> being good compared to Apertium's LRLM resolution.
>>> So we discussed what to do next, which is to utilize the breakthroughs of
>>> deep learning neural networks in NLP, and especially in machine
>>> translation.
>>> We also discussed using values of n greater than 5 in the n-gram language
>>> model already in use, and evaluating the effect of increasing n, which
>>> could give us some more insight into what to do next and how to do it.
>>>
>>> Since I am taking an introductory deep learning course this term in
>>> college, I waited these past two weeks to be introduced to the
>>> applications of deep learning in NLP and MT.
>>> Now I have a basic knowledge of Recurrent Neural Networks (RNNs) and of
>>> why they are used instead of standard feed-forward networks in NLP,
>>> besides understanding their different architectures and the math behind
>>> forward and back propagation.
>>> I also learned how to build a simple language model, and how to avoid the
>>> vanishing-gradient problem (which prevents capturing long dependencies) by
>>> using Gated Recurrent Units (GRUs) and Long Short-Term Memory (LSTM)
>>> networks.
>>>
>>> As the next step, we will consider working only on the language model and
>>> leave the max entropy part for later discussion.
>>> So along with trying different n values in the n-gram language model and
>>> evaluating the results, I will try either to use a ready-made RNNLM or to
>>> build a new one from scratch from what I have learnt so far. Honestly, I
>>> prefer the latter because it will increase my experience in applying what
>>> I have learnt.
>>> In the last 2 weeks I implemented RNNs with GRUs and LSTM, and also
>>> implemented a character-based language model, as two assignments, and they
>>> were very fun to do. So implementing a word-based RNN LM will not take
>>> much time, though it may not be close to a state-of-the-art model, which
>>> is its disadvantage.
>>>
>>> Using an NNLM instead of the n-gram LM has these possible advantages:
>>> - It automatically learns syntactic and semantic features.
>>> - It overcomes the curse of dimensionality by generalizing better.
>>>
>>> --
>>>
>>> I tried using n=8 instead of 5 in the n-gram LM, but the scores weren't
>>> that different, as Sevilay pointed out in our discussion.
>>> I knew that an NNLM is better than a statistical one, and also that using
>>> machine learning instead of the maximum entropy model would give better
>>> performance.
>>> *But* the evaluation results were very disappointing, unexpected and
>>> illogical, so I thought there might be a bug in the code.
>>> After some searching, I found that I had made a very silly *mistake* in
>>> normalizing the LM scores. Since the scores are the log base 10 of the
>>> sentence probability, the score with the larger magnitude (more negative)
>>> corresponds to the lower probability, but what I did was the inverse of
>>> that, and that was the cause of the very bad results.
>>>
>>> I am fixing this now and then will re-evaluate the results with Sevilay.
>>>
>>> Regards,
>>> Aboelhamd
>>>
>>>
>>> On Sun, Apr 7, 2019 at 6:46 PM Aboelhamd Aly <
>>> aboelhamd.abotr...@gmail.com> wrote:
>>>
 Thanks Sevilay for your feedback, and thanks for the resources.

 On Sun, 7 Apr 2019, 18:42 Sevilay Bayatlı >>> wrote:

> hi Aboelhamd,
>
> Your propos

Re: [Apertium-stuff] Extend weighted transfer rules GSoC proposal

2019-04-21 Thread Aboelhamd Aly
Yes, GRUs and LSTM are better than traditional RNNs. I think we will use
one of them.
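For what it's worth, in a PyTorch-style implementation (an assumption for
illustration; the eventual toolkit isn't fixed here) switching between the
two cells is a one-line change, so both can be tried cheaply:

    # Same interface, different recurrent cell; the GRU has one gate fewer and
    # no separate cell state, so it is usually a bit faster to train.
    import torch.nn as nn

    lstm_layer = nn.LSTM(input_size=128, hidden_size=256, batch_first=True)
    gru_layer = nn.GRU(input_size=128, hidden_size=256, batch_first=True)
    # Both return (output, hidden); only the hidden state differs (the LSTM
    # also carries a cell state), so the surrounding LM code stays the same.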

On Mon, Apr 22, 2019 at 12:08 AM Sevilay Bayatlı 
wrote:

> I agree with changing the n-gram LM, but with which one, RNN or GRU? As I
> see from the literature, GRU has more advantages than RNN.
>
> Sevilay
>
> On Mon, Apr 22, 2019 at 12:09 AM Aboelhamd Aly <
> aboelhamd.abotr...@gmail.com> wrote:
>
>> Hi Sevilay,
>>
>> I think a new language model that could distinguish the best ambiguous
>> combination(s) of a translation would eliminate our need for the max
>> entropy model or any other method.
>> But whether that is the case with an RNN LM, I don't know yet.
>> For now, do you agree that we need to change the LM first, or do you
>> prefer going straight to an alternative method for max entropy? And do you
>> have any idea for such an alternative method?
>> In my opinion, fixing all the bugs, evaluating our current system, and
>> then changing the n-gram LM to an RNN is the priority plan for the next
>> two weeks or so.
>> After that we can focus the research on what comes next, if the accuracy
>> is not good enough or there is room for improvement.
>> Do you agree with this?
>>
>> Regards,
>> Aboelhamd
>>
>> On Sun, Apr 21, 2019 at 10:48 PM Sevilay Bayatlı <
>> sevilaybaya...@gmail.com> wrote:
>>
>>> Aboelhamd,
>>>
>>> I think using Gated Recurrent Units (GRUs) instead of the n-gram language
>>> model is a good idea; we can probably achieve more gain. However, the most
>>> important part here is changing the maximum entropy model.
>>>
>>> Let's see what Fran thinks about it.
>>>
>>> Regards,
>>>
>>> Sevilay
>>>
>>>
>>>
>>>
>>> On Fri, Apr 19, 2019 at 10:29 PM Aboelhamd Aly <
>>> aboelhamd.abotr...@gmail.com> wrote:
>>>
 Hi Sevilay. Hi Francis,

 Unfortunately, Sevilay reported that the evaluation results of kaz-tur
 and spa-eng pairs were very bad, with only 30% of the tested sentences
 being good compared to Apertium's LRLM resolution.
 So we discussed what to do next, which is to utilize the breakthroughs of
 deep learning neural networks in NLP, and especially in machine
 translation.
 We also discussed using values of n greater than 5 in the n-gram language
 model already in use, and evaluating the effect of increasing n, which
 could give us some more insight into what to do next and how to do it.

 Since I am taking an introductory deep learning course this term in
 college, I waited these past two weeks to be introduced to the
 applications of deep learning in NLP and MT.
 Now I have a basic knowledge of Recurrent Neural Networks (RNNs) and of
 why they are used instead of standard feed-forward networks in NLP,
 besides understanding their different architectures and the math behind
 forward and back propagation.
 I also learned how to build a simple language model, and how to avoid the
 vanishing-gradient problem (which prevents capturing long dependencies) by
 using Gated Recurrent Units (GRUs) and Long Short-Term Memory (LSTM)
 networks.

 As the next step, we will consider working only on the language model and
 leave the max entropy part for later discussion.
 So along with trying different n values in the n-gram language model and
 evaluating the results, I will try either to use a ready-made RNNLM or to
 build a new one from scratch from what I have learnt so far. Honestly, I
 prefer the latter because it will increase my experience in applying what
 I have learnt.
 In the last 2 weeks I implemented RNNs with GRUs and LSTM, and also
 implemented a character-based language model, as two assignments, and they
 were very fun to do. So implementing a word-based RNN LM will not take
 much time, though it may not be close to a state-of-the-art model, which
 is its disadvantage.

 Using an NNLM instead of the n-gram LM has these possible advantages:
 - It automatically learns syntactic and semantic features.
 - It overcomes the curse of dimensionality by generalizing better.

 --

 I tried using n=8 instead of 5 in the n-gram LM, but the scores weren't
 that different, as Sevilay pointed out in our discussion.
 I knew that an NNLM is better than a statistical one, and also that using
 machine learning instead of the maximum entropy model would give better
 performance.
 *But* the evaluation results were very disappointing, unexpected and
 illogical, so I thought there might be a bug in the code.
 After some searching, I found that I had made a very silly *mistake* in
 normalizing the LM scores. Since the scores are the log base 10 of the
 sentence probability, the score with the larger magnitude (more negative)
 corresponds to the lower probability, but what I did was the inverse of
 that, and that was the cause of the very bad results.

 I am fixing this now and then will re-evaluate the results wi