Re: [Apertium-stuff] Extend weighted transfer rules GSoC proposal

2019-04-21 Thread Aboelhamd Aly
Yes, GRUs and LSTMs are better than traditional RNNs. I think we will use
one of them.

On Mon, Apr 22, 2019 at 12:08 AM Sevilay Bayatlı 
wrote:

> I agree with changing the n-gram LM, but to which one, an RNN or a GRU? As
> I see from the literature, GRUs have more advantages than plain RNNs.
>
> Sevilay
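
For reference, a minimal sketch of why either choice is easy to try: in a
framework such as PyTorch (an assumption here, not a project decision), the
traditional RNN, the GRU and the LSTM are nearly drop-in replacements for one
another, so both candidates can be compared with a one-line change.

# Illustrative sketch only (assumes PyTorch); not part of the Apertium code base.
import torch
import torch.nn as nn

x = torch.randn(4, 12, 64)                  # (batch, sequence length, features)

rnn = nn.RNN(64, 128, batch_first=True)     # traditional recurrent layer
gru = nn.GRU(64, 128, batch_first=True)     # gated: mitigates vanishing gradients
lstm = nn.LSTM(64, 128, batch_first=True)   # gated, with an extra cell state

out_rnn, h_rnn = rnn(x)                     # RNN and GRU return (output, hidden)
out_gru, h_gru = gru(x)
out_lstm, (h_lstm, c_lstm) = lstm(x)        # LSTM also returns a cell state

print(out_rnn.shape, out_gru.shape, out_lstm.shape)   # all (4, 12, 128)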

Re: [Apertium-stuff] Extend weighted transfer rules GSoC proposal

2019-04-21 Thread Sevilay Bayatlı
I agree with changing the n-gram LM, but to which one, an RNN or a GRU? As I
see from the literature, GRUs have more advantages than plain RNNs.

Sevilay


Re: [Apertium-stuff] Extend weighted transfer rules GSoC proposal

2019-04-21 Thread Aboelhamd Aly
Hi Sevilay,

I think a new language model that could distinguish the best combination(s)
among the ambiguous translations of a sentence would eliminate our need for
the max entropy model or any other method.
But whether that is the case with an RNN LM, I don't know yet.
For now, do you agree that we need to change the LM first, or do you prefer
going straight to an alternative to max entropy? And do you have any idea for
such an alternative?
In my opinion, fixing all the bugs, evaluating our current system, and then
replacing the n-gram LM with an RNN is the priority plan for the next two
weeks or so.
After that we can focus the research on what comes next, if the accuracy is
not good enough or there is room for improvement.
Do you agree with this?

Regards,
Aboelhamd
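
A minimal sketch of that idea, with a hypothetical lm_score function standing
in for whatever language model ends up being used (an illustration of the
proposal, not the project's code):

# Hedged sketch: lm_score is a placeholder returning log10 P(sentence);
# candidates are the translations produced by the different ambiguous-rule
# combinations for one source sentence.
def pick_best_combination(candidates, lm_score):
    """candidates: list of (rule_combination, target_sentence) pairs."""
    best, best_score = None, float("-inf")
    for combination, sentence in candidates:
        words = sentence.split()
        if not words:
            continue
        # Normalize by length so shorter outputs are not unfairly favoured.
        score = lm_score(sentence) / len(words)
        if score > best_score:   # log-probabilities: higher (closer to 0) is better
            best_score, best = score, combination
    return best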


Re: [Apertium-stuff] Extend weighted transfer rules GSoC proposal

2019-04-21 Thread Sevilay Bayatlı
Aboelhamd,

I think using Gated Recurrent Units (GRUs) instead of the n-gram language
model is a good idea; we can probably achieve more gain. However, the most
important part here is changing the maximum entropy model.

Let's see what Fran thinks about it.

Regards,

Sevilay




On Fri, Apr 19, 2019 at 10:29 PM Aboelhamd Aly 
wrote:

> Hi Sevilay. Hi Francis,
>
> Unfortunately, Sevilay reported that the evaluation results for the kaz-tur
> and spa-eng pairs were very bad, with only 30% of the tested sentences being
> good compared to Apertium's LRLM resolution.
> So we discussed what to do next, and that is to utilize the breakthrough of
> deep learning neural networks in NLP and especially in machine translation.
> We also discussed using values of n greater than 5 in the n-gram language
> model we already use, and evaluating the effect of increasing n, which could
> give us some more insight into what to do next and how to do it.
>
> Since I am taking an intro to deep learning course this term in college, I
> waited these past two weeks to be introduced to the applications of deep
> learning in NLP and MT.
> Now I have a basic knowledge of Recurrent Neural Networks (RNNs) and of why
> to use them instead of a standard feed-forward network in NLP, besides
> understanding their different architectures and the math done in the forward
> and backward propagation.
> I also know how to build a simple language model, and how to avoid the
> vanishing gradient problem, which leads to not capturing long dependencies,
> by using Gated Recurrent Units (GRUs) and Long Short-Term Memory (LSTM)
> networks.
>
> As the next step, we will consider working only on the language model and
> leave the max entropy part for later discussions.
> So along with trying different values of n in the n-gram language model and
> evaluating the results, I will try either to use a ready-made RNNLM or to
> build a new one from scratch from what I have learnt so far. Honestly, I
> prefer the latter choice because it will increase my experience in applying
> what I have learnt.
> In the last two weeks I implemented RNNs with GRUs and LSTMs, and also
> implemented a character-based language model, as two assignments, and they
> were very fun to do. So implementing a word-based RNN LM will not take much
> time, though it may not be close to a state-of-the-art model, and that is
> its disadvantage.
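
For illustration, a word-level GRU language model of the kind described can
be outlined in a few lines. The sketch below assumes PyTorch, and the class
name and hyper-parameters are invented for the example rather than taken from
the project:

# Minimal word-level GRU language model, illustrative sketch only.
import torch
import torch.nn as nn

class WordGRULM(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids, hidden=None):
        emb = self.embed(token_ids)              # (batch, seq_len, emb_dim)
        output, hidden = self.gru(emb, hidden)   # (batch, seq_len, hidden_dim)
        return self.out(output), hidden          # next-word logits per position

# One training step: predict token t+1 from tokens up to t.
vocab_size = 10000
model = WordGRULM(vocab_size)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

batch = torch.randint(0, vocab_size, (8, 20))    # fake batch of word indices
logits, _ = model(batch[:, :-1])
loss = criterion(logits.reshape(-1, vocab_size), batch[:, 1:].reshape(-1))
loss.backward()
optimizer.step()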
>
> Using an NNLM instead of the n-gram LM has these possible advantages:
> - Automatically learning syntactic and semantic features.
> - Overcoming the curse of dimensionality by generalizing better.
>
> --
>
> I tried using n=8 instead of 5 in the n-gram LM, but the scores weren't
> that different, as Sevilay pointed out in our discussion.
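
If the n-gram scores come from KenLM-style ARPA models (an assumption on my
part; the toolkit actually used in the pipeline may differ), comparing orders
looks roughly like this:

# Sketch only, assuming KenLM. Models of different orders would be trained
# beforehand, e.g.:
#   lmplz -o 5 < corpus.txt > model5.arpa
#   lmplz -o 8 < corpus.txt > model8.arpa
import kenlm

lm5 = kenlm.Model("model5.arpa")
lm8 = kenlm.Model("model8.arpa")

sentence = "this is one candidate translation"
# score() returns log10 P(sentence); values are negative, closer to 0 is better.
print(lm5.score(sentence, bos=True, eos=True))
print(lm8.score(sentence, bos=True, eos=True))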
> I knew that an NNLM is better than a statistical n-gram one, and also that
> using machine learning instead of the maximum entropy model would give
> better performance.
> *But* the evaluation results were very disappointing, unexpected and
> illogical, so I thought there might be a bug in the code.
> And after some searching, I found that I had made a very silly *mistake* in
> normalizing the LM scores. As the scores are the log base 10 of the sentence
> probability, the higher their magnitude, the lower the probability; but what
> I did was the inverse of that, and that was the cause of the very bad
> results.
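
In other words, since the scores are log10 probabilities and therefore
negative, the candidate whose score is closest to zero is the most probable
one. A sketch of the corrected comparison, with hypothetical scores:

# Ranking LM scores correctly, illustrative sketch only.
# Each score is log10 P(sentence), e.g. -12.3 means P = 10 ** -12.3.
scores = [-12.3, -9.8, -15.1]   # hypothetical scores for three candidates

wrong = min(scores)   # -15.1: larger magnitude, i.e. the LEAST probable (the bug)
right = max(scores)   # -9.8: closest to zero, i.e. the most probable candidate

# When candidates differ in length, normalize before comparing,
# e.g. score / number_of_words, so longer sentences are not penalized unfairly.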
>
> I am fixing this now and then will re-evaluate the results with Sevilay.
>
> Regards,
> Aboelhamd
>
>
> On Sun, Apr 7, 2019 at 6:46 PM Aboelhamd Aly 
> wrote:
>
>> Thanks Sevilay for your feedback, and thanks for the resources.
>>
>> On Sun, 7 Apr 2019, 18:42 Sevilay Bayatlı wrote:
>>
>>> Hi Aboelhamd,
>>>
>>> Your proposal looks good. I found these resources; maybe they will be of
>>> benefit:
>>>
>>> Multi-source *neural translation*
>>> https://arxiv.org/abs/1601.00710
>>>
>>> *Neural machine translation* with extended context
>>> https://arxiv.org/abs/1708.05943
>>>
>>> Handling homographs in *neural machine translation*
>>> https://arxiv.org/abs/1708.06510
>>>
>>> Sevilay
>>>
>>> On Sun, Apr 7, 2019 at 7:14 PM Aboelhamd Aly <
>>> aboelhamd.abotr...@gmail.com> wrote:
>>>
 Hi all,

 I have an idea, not solid yet, as an alternative to yasmet and the max
 entropy models: using neural networks to give us scores for the ambiguous
 rules.
 But I haven't yet set a formulation for the problem, nor the structure of
 the inputs and outputs, or even the goal, as I think there are many
 formulations that we can adopt.

 For example, the most straightforward structure is to give the network all
 the possible combinations of a sentence's translations and let it choose the
 best one, or give them weights.
 Hence, make the network learn which combinat

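One possible concrete formulation of that idea, sketched under invented
assumptions about the feature representation (not a settled design): a small
network that assigns a softmax weight to each candidate rule combination,
playing the role that the yasmet max entropy model plays now.

# Sketch of a neural scorer over candidate rule combinations (assumes PyTorch;
# the feature vectors are hypothetical).
import torch
import torch.nn as nn

class CombinationScorer(nn.Module):
    def __init__(self, feature_dim, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, candidate_features):
        # candidate_features: (num_candidates, feature_dim) for ONE sentence
        scores = self.net(candidate_features).squeeze(-1)   # (num_candidates,)
        return torch.softmax(scores, dim=0)                 # weights summing to 1

# Usage: 4 candidate combinations, each described by a 10-dim feature vector.
scorer = CombinationScorer(feature_dim=10)
weights = scorer(torch.randn(4, 10))
best_combination = int(weights.argmax())
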
Re: [Apertium-stuff] Extend weighted transfer rules GSoC proposal

2019-04-21 Thread Aboelhamd Aly
Hi Sevilay,

I got it, ok.

Thanks.

On Sun, 21 Apr 2019, 12:50 Sevilay Bayatlı wrote:
> Hi Aboelhamd,
>
> For now it is okay to record it day by day, but later you can change it to
> week by week and put it in a table.
>
> Sevilay

Re: [Apertium-stuff] Extend weighted transfer rules GSoC proposal

2019-04-21 Thread Sevilay Bayatlı
Hi Aboelhamd,

For now it is okay to record it day by day, but later you can change it to
week by week and put it in a table.

Sevilay

On Sun, Apr 21, 2019 at 1:37 PM Aboelhamd Aly 
wrote:

> Hi,
>
> I am uploading the summary of each day of work on this wiki page.
> Please take a look and let me know if there is something else I could do
> instead.
>
> Thanks.

Re: [Apertium-stuff] Extend weighted transfer rules GSoC proposal

2019-04-21 Thread Aboelhamd Aly
Hi,

I am uploading the summary of each day of work on this wiki page.
Please take a look and let me know if there is something else I could do
instead.

Thanks.

On Fri, Apr 19, 2019 at 9:42 PM Aboelhamd Aly 
wrote:

> According to the timeline I put in my proposal, I am supposed to start
> phase 1 today.
> I want to know what procedure to follow to document my work, day by day and
> week by week.
> Should I create a page on the wiki to record my progress?
> Or is there another way?
>
> Thanks