Hi Aboelhamd,

For now it is OK to record day by day, but later you can change it to week by week and put it in a table.
Sevilay

On Sun, Apr 21, 2019 at 1:37 PM Aboelhamd Aly <aboelhamd.abotr...@gmail.com> wrote:
> Hi,
>
> I am uploading the summary of each day of work in this wiki page
> <http://wiki.apertium.org/wiki/User:Aboelhamd/progress>.
> Please take a look and let me know if there is something else I could do
> instead.
>
> Thanks.
>
> On Fri, Apr 19, 2019 at 9:42 PM Aboelhamd Aly <aboelhamd.abotr...@gmail.com> wrote:
>
>> According to the timeline I put in my proposal, I am supposed to start
>> phase 1 today.
>> I want to know which procedure to follow to document my work, day by day and
>> week by week.
>> Do I create a page on the wiki to save my progress?
>> Or is there another way?
>>
>> Thanks
>>
>> On Fri, Apr 19, 2019 at 9:27 PM Aboelhamd Aly <aboelhamd.abotr...@gmail.com> wrote:
>>
>>> Hi Sevilay. Hi Francis,
>>>
>>> Unfortunately, Sevilay reported that the evaluation results of the kaz-tur
>>> and spa-eng pairs were very bad, with only 30% of the tested sentences being good
>>> compared to Apertium's LRLM resolution.
>>> So we discussed what to do next, which is to utilize the breakthrough of
>>> deep learning neural networks in NLP and especially machine translation.
>>> We also discussed using values of n greater than 5 in the
>>> already-used n-gram language model, and evaluating the results of
>>> increasing n, which could give us some more insight into what to do
>>> next and how to do it.
>>>
>>> Since I have an intro to deep learning course this term in college, I
>>> waited these past two weeks to be introduced to the applications of deep
>>> learning in NLP and MT.
>>> Now I have a basic knowledge of Recurrent Neural Networks (RNNs)
>>> and why to use them instead of standard feed-forward networks in NLP, besides
>>> understanding their different architectures and the math done in the
>>> forward and backward propagation.
>>> I also learned how to build a simple language model, and how to avoid
>>> the vanishing gradient problem (which prevents capturing long
>>> dependencies) by using Gated Recurrent Units (GRUs) and Long Short-Term
>>> Memory (LSTM) networks.
>>>
>>> As a next step, we will consider working only on the language model and
>>> leave the max entropy part for later discussions.
>>> So along with trying different n values in the n-gram language model and
>>> evaluating the results, I will try either to use a ready-made RNN LM or to build a
>>> new one from scratch from what I have learnt so far. Honestly, I prefer the
>>> latter choice because it will increase my experience in applying what I have
>>> learnt.
>>> In the last 2 weeks I implemented RNNs with GRUs and LSTM and also
>>> implemented a character-based language model as two assignments, and they
>>> were very fun to do. So implementing a word-based RNN LM will
>>> not take much time, though it may not be close to the state-of-the-art
>>> models, and that is its disadvantage.
>>>
>>> Using an NNLM instead of the n-gram LM has these possible advantages:
>>> - It automatically learns syntactic and semantic features.
>>> - It overcomes the curse of dimensionality by generalizing better.
>>>
>>> ----------------------------------------------
>>>
>>> I tried using n=8 instead of 5 in the n-gram LM, but the scores weren't
>>> that different, as Sevilay pointed out in our discussion.
>>> I knew that an NNLM is better than a statistical one, and also that using machine
>>> learning instead of a maximum entropy model would give better performance.
>>> *But* the evaluation results were very, very disappointing, unexpected
>>> and illogical, so I thought there might be a bug in the code.
>>> And after some searching, I found that I had made a very silly *mistake*
>>> in normalizing the LM scores.
>>> As the scores are log base 10 of the sentence
>>> probability, a score higher in magnitude means a lower probability; but
>>> what I did was the inverse of that, and that was the cause of the very bad
>>> results.
>>>
>>> I am fixing this now and then will re-evaluate the results with Sevilay.
>>>
>>> Regards,
>>> Aboelhamd
>>>
>>>
>>> On Sun, Apr 7, 2019 at 6:46 PM Aboelhamd Aly <aboelhamd.abotr...@gmail.com> wrote:
>>>
>>>> Thanks Sevilay for your feedback, and thanks for the resources.
>>>>
>>>> On Sun, 7 Apr 2019, 18:42 Sevilay Bayatlı <sevilaybaya...@gmail.com> wrote:
>>>>
>>>>> Hi Aboelhamd,
>>>>>
>>>>> Your proposal looks good. I found these resources that may be of benefit:
>>>>>
>>>>> Multi-source *neural translation*
>>>>> https://arxiv.org/abs/1601.00710
>>>>>
>>>>> *Neural machine translation* with extended context
>>>>> https://arxiv.org/abs/1708.05943
>>>>>
>>>>> Handling homographs in *neural machine translation*
>>>>> https://arxiv.org/abs/1708.06510
>>>>>
>>>>> Sevilay
>>>>>
>>>>> On Sun, Apr 7, 2019 at 7:14 PM Aboelhamd Aly <aboelhamd.abotr...@gmail.com> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I have a not-yet-solid idea as an alternative to yasmet and max
>>>>>> entropy models.
>>>>>> It is to use neural networks to give us scores for the ambiguous
>>>>>> rules.
>>>>>> But I haven't yet settled on a formulation of the problem, nor the structure
>>>>>> of the inputs, outputs, or even the goal,
>>>>>> as I think there are many formulations that we can adopt.
>>>>>>
>>>>>> For example, the most straightforward structure is to give the
>>>>>> network all the possible combinations
>>>>>> of a sentence's translations and let it choose the best one, or give
>>>>>> them weights.
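[Editor's note: a minimal sketch of the LM-score comparison bug described earlier in this thread. The function names are hypothetical, not from the actual code; the sketch only illustrates the stated fact that scores are log base 10 of the sentence probability, so the score with the larger magnitude is the *less* probable one.]

```python
def more_probable(score_a: float, score_b: float) -> float:
    """Return the LM score of the more probable sentence.

    Scores are log10(P(sentence)), so they are negative numbers:
    a larger magnitude means a LOWER probability. Comparing raw
    magnitudes therefore inverts the ranking, which was the bug
    described above. The correct comparison keeps the larger
    (less negative) log-probability.
    """
    return max(score_a, score_b)

def normalize(log_scores):
    """Turn a list of log10 scores into probabilities summing to 1."""
    probs = [10.0 ** s for s in log_scores]
    total = sum(probs)
    return [p / total for p in probs]

# log10(0.01) = -2.0 is more probable than log10(0.0001) = -4.0,
# even though 4.0 is the larger magnitude.
assert more_probable(-2.0, -4.0) == -2.0
```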
>>>>>> Hence, we make the network learn which combinations to choose for a
>>>>>> specific pair.
>>>>>>
>>>>>> Another example: instead of building one network per pair,
>>>>>> we build one network per ambiguous pattern, as we did with the max entropy
>>>>>> models.
>>>>>> So we give the network the combinations for that pattern,
>>>>>> and let it assign weights to the ambiguous rules applied to
>>>>>> that pattern.
>>>>>>
>>>>>> And for each structure there are many details and questions yet to
>>>>>> answer.
>>>>>>
>>>>>> So with that said, I decided to look at some papers to see what
>>>>>> others have done before
>>>>>> to tackle similar problems or the exact problem, and how some of
>>>>>> them used machine learning
>>>>>> or deep learning to solve these problems, and then try to build on them.
>>>>>>
>>>>>> Some papers' resolutions were very specific to the pairs they developed,
>>>>>> and thus not very relevant to our case:
>>>>>> 1) Resolving Structural Transfer Ambiguity in Chinese-to-Korean
>>>>>> Machine Translation
>>>>>> <https://www.worldscientific.com/doi/10.1142/S0219427903000887> (2003)
>>>>>> 2) Arabic Machine Translation: A Developmental Perspective
>>>>>> <http://www.ieee.ma/IJICT/IJICT-SI-Bouzoubaa-3.3/2%20-%20paper_farghaly.pdf> (2010)
>>>>>>
>>>>>> Some other papers tried not to generate ambiguous rules, or to
>>>>>> minimize the ambiguity in transfer rule inference, and didn't provide any
>>>>>> methods to resolve the ambiguity in our case.
>>>>>> I thought that they might
>>>>>> provide some help, but I think they are far from our topic:
>>>>>> 1) Learning Transfer Rules for Machine Translation with Limited Data
>>>>>> <http://www.cs.cmu.edu/~kathrin/ThesisSummary/ThesisSummary.pdf> (2005)
>>>>>> 2) Inferring Shallow-Transfer Machine Translation Rules from Small
>>>>>> Parallel Corpora <https://arxiv.org/pdf/1401.5700.pdf> (2009)
>>>>>>
>>>>>> Now I am looking into some more recent papers like:
>>>>>> 1) Rule Based Machine Translation Combined with Statistical Post
>>>>>> Editor for Japanese to English Patent Translation
>>>>>> <http://www.mt-archive.info/MTS-2007-Ehara.pdf> (2007)
>>>>>> 2) Machine translation model using inductive logic programming
>>>>>> <https://scholar.cu.edu.eg/?q=shaalan/files/101.pdf> (2009)
>>>>>> 3) Machine Learning for Hybrid Machine Translation
>>>>>> <https://www.aclweb.org/anthology/W12-3138.pdf> (2012)
>>>>>> 4) Study and Comparison of Rule-Based and Statistical
>>>>>> Catalan-Spanish Machine Translation Systems
>>>>>> <https://pdfs.semanticscholar.org/a731/0d0c15b22381c7b372e783d122a5324b005a.pdf?_ga=2.89511443.981790355.1554651923-676013054.1554651923> (2012)
>>>>>> 5) Latest trends in hybrid machine translation and its applications
>>>>>> <https://www.sciencedirect.com/science/article/pii/S0885230814001077> (2015)
>>>>>> 6) Machine Translation: Phrase-Based, Rule-Based and
>>>>>> Neural Approaches with Linguistic Evaluation
>>>>>> <http://www.dfki.de/~ansr01/docs/MacketanzEtAl2017_CIT.pdf> (2017)
>>>>>> 7) A Multitask-Based Neural Machine Translation Model with
>>>>>> Part-of-Speech Tags Integration for Arabic Dialects
>>>>>> <https://www.mdpi.com/2076-3417/8/12/2502/htm> (2018)
>>>>>>
>>>>>> And I hope they give me some more insights and thoughts.
>>>>>>
>>>>>> --------------
>>>>>>
>>>>>> - So do you have recommendations for other papers that address the
>>>>>> same problem?
>>>>>> - Also, about the proposal: I modified it a little bit and shared it
>>>>>> through the GSoC website as a draft,
>>>>>> so do you have any last feedback or thoughts about it, or do I just
>>>>>> submit it as the final proposal?
>>>>>> - Last thing, about the coding challenge (integrating weighted transfer
>>>>>> rules with apertium-transfer):
>>>>>> I think it is finished, but I haven't gotten any feedback or response
>>>>>> about it, and the pull request is not yet merged into master.
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Aboelhamd
>>>>>>
>>>>>>
>>>>>> On Sat, Apr 6, 2019 at 5:23 AM Aboelhamd Aly <aboelhamd.abotr...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Sevilay, hi spectei,
>>>>>>>
>>>>>>> For sentence splitting, I think that we don't need to know either the
>>>>>>> syntax or the sentence boundaries of the language.
>>>>>>> Also, I don't see any necessity for applying it at runtime, as at
>>>>>>> runtime we only get the score of each pattern,
>>>>>>> so there is no need for splitting. I also had a thought on
>>>>>>> using beam search here, as I see it has no effect,
>>>>>>> but maybe I am wrong. We can discuss it after we close this
>>>>>>> thread.
>>>>>>>
>>>>>>> We will handle the whole text as one unit and will depend only on
>>>>>>> the captured patterns.
>>>>>>> Note that in chunker terms, successive patterns that don't
>>>>>>> share a transfer rule are independent.
>>>>>>> So by using the lexical form of the text, we match the words with
>>>>>>> patterns, then match patterns with rules,
>>>>>>> and hence we know which patterns are ambiguous and how many
>>>>>>> ambiguous rules they match.
>>>>>>>
>>>>>>> For example, suppose we have a text with the following patterns and
>>>>>>> corresponding rule counts:
>>>>>>> p1:2 p2:1 p3:6 p4:4 p5:3 p6:5 p7:1 p8:4 p9:4 p10:6 p11:8
>>>>>>> p12:5 p13:5 p14:1 p15:3 p16:2
>>>>>>>
>>>>>>> If such a text were handled by our old method, generating all the
>>>>>>> possible combinations (the product of the rule counts),
>>>>>>> we would have 82944000 possible combinations, which is not
>>>>>>> practical at all to score, and takes heavy computation and memory.
>>>>>>> And if it is handled by our new method, applying all ambiguous
>>>>>>> rules of one pattern while fixing the other patterns at the LRLM rule
>>>>>>> (the sum of the rule counts), we will have just 60 combinations, and
>>>>>>> not all of them different, giving a drastically low number of combinations,
>>>>>>> which may not be very representative.
>>>>>>>
>>>>>>> But if we apply the splitting idea, we will have something in the
>>>>>>> middle, which will hopefully avoid the disadvantages of both methods
>>>>>>> and benefit from the advantages of both, too.
>>>>>>> Let's proceed from the start of the text to its end, while
>>>>>>> maintaining some threshold of, say, 24000 combinations:
>>>>>>> p1 => 2 ,, p1 p2 => 2 ,, p1 p2 p3 => 12 ,, p1 p2 p3 p4 => 48 ,,
>>>>>>> p1 p2 p3 p4 p5 => 144 ,, p1 p2 p3 p4 p5 p6 => 720 ,,
>>>>>>> p1 p2 p3 p4 p5 p6 p7 => 720 ,, p1 p2 p3 p4 p5 p6 p7 p8 => 2880 ,,
>>>>>>> p1 p2 p3 p4 p5 p6 p7 p8 p9 => 11520
>>>>>>>
>>>>>>> And then we stop here, because taking the next pattern would exceed
>>>>>>> the threshold.
>>>>>>> Hence, having our first split, we can now continue our work on it as
>>>>>>> usual,
>>>>>>> but with more (non-overwhelming) combinations which would capture
>>>>>>> more semantics.
>>>>>>> After that, we take the next split, and so on.
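[Editor's note: the arithmetic in the example above can be checked with a short sketch. The function name and the greedy strategy are illustrative, not from the actual implementation: the old method multiplies the per-pattern rule counts, the new method sums them, and the splitting idea grows a window of patterns until its product would exceed the threshold.]

```python
from functools import reduce
from operator import mul

# Ambiguous-rule counts for p1..p16, from the example above.
rule_counts = [2, 1, 6, 4, 3, 5, 1, 4, 4, 6, 8, 5, 5, 1, 3, 2]

# Old method: score every full combination (product of counts).
old_total = reduce(mul, rule_counts)   # 82944000

# New method: vary one pattern at a time, fixing the rest at the
# LRLM rule (sum of counts).
new_total = sum(rule_counts)           # 60

def split_by_threshold(counts, threshold):
    """Greedily group consecutive patterns while the product of
    their rule counts stays within the threshold."""
    splits, current, product = [], [], 1
    for c in counts:
        if current and product * c > threshold:
            splits.append(current)
            current, product = [], 1
        current.append(c)
        product *= c
    if current:
        splits.append(current)
    return splits

splits = split_by_threshold(rule_counts, 24000)
# First split: p1..p9 with 2*1*6*4*3*5*1*4*4 = 11520 combinations,
# since including p10 (6 rules) would exceed the 24000 threshold.
```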
>>>>>>>
>>>>>>> -----------
>>>>>>>
>>>>>>> I agree with you that testing the current method with more than one
>>>>>>> pair to know its accuracy is the priority,
>>>>>>> and we are currently working on it.
>>>>>>>
>>>>>>> -----------
>>>>>>>
>>>>>>> As for an alternative to yasmet, I agree with spectei. Unfortunately,
>>>>>>> for now I don't have a solid idea to discuss,
>>>>>>> but in the next few days I will try to come up with one or more ideas to discuss.
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Apr 5, 2019 at 11:23 PM Francis Tyers <fty...@prompsit.com> wrote:
>>>>>>>
>>>>>>>> On 2019-04-05 20:57, Sevilay Bayatlı wrote:
>>>>>>>> > On Fri, 5 Apr 2019, 22:41 Francis Tyers, <fty...@prompsit.com> wrote:
>>>>>>>> >
>>>>>>>> >> On 2019-04-05 19:07, Sevilay Bayatlı wrote:
>>>>>>>> >>> Hi Aboelhamd,
>>>>>>>> >>>
>>>>>>>> >>> There are some points in your proposal:
>>>>>>>> >>>
>>>>>>>> >>> First, I do not think "splitting sentences" is a good idea; each
>>>>>>>> >>> language has different syntax, so how could you know when you should
>>>>>>>> >>> split the sentence?
>>>>>>>> >>
>>>>>>>> >> Apertium works on the concept of a stream of words, so at runtime
>>>>>>>> >> we can't really rely on robust sentence segmentation.
>>>>>>>> >>
>>>>>>>> >> We can often use it, e.g. for training, but if sentence boundary
>>>>>>>> >> detection were to be included, it would need to be trained, as Sevilay
>>>>>>>> >> hints at.
>>>>>>>> >>
>>>>>>>> >> Also, I'm not sure how much we would gain from that.
>>>>>>>> >>
>>>>>>>> >>> Second, "substitute yasmet with another method": I think the result
>>>>>>>> >>> will not be better if you substitute it with a statistical method.
>>>>>>>> >>>
>>>>>>>> >>
>>>>>>>> >> Substituting yasmet with a more up-to-date machine-learning method
>>>>>>>> >> might be a worthwhile thing to do. What suggestions do you have?
>>>>>>>> >>
>>>>>>>> >> I think first we have to try the current method with more than 3
>>>>>>>> >> language pairs and then decide whether to substitute it or not, because
>>>>>>>> >> what is the point of a new method if it doesn't achieve any gain? Then we can
>>>>>>>> >> compare the results of the two methods and choose the best one. What do
>>>>>>>> >> you think?
>>>>>>>> >
>>>>>>>>
>>>>>>>> Yes, testing it with more language pairs is also a priority.
>>>>>>>>
>>>>>>>> Fran
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff