Re: [Moses-support] is there a way to remove a bad entry in the phrase table ?
Hi Vincent,

On Thu, 2015-09-24 at 22:37 +0200, Vincent Nguyen wrote:
> Thanks Matthias for the detailed explanation.
> I think I have most of it in mind except not really understanding how
> this one works :
>
> "Difficult sentences generally have worse model score than easy ones but
> may still be useful for training."

Well, your data selection method may discard training instances that are
somehow hard to decode, e.g. because of complex sentence structure or
because of rare vocabulary. But that doesn't necessarily mean that it's
bad sentence pairs that you're removing. You should manually inspect some
samples if possible.

I didn't try, but I suspect that you'd get a higher decoder score on the
1-best decoder output of the first of the following two input sentences:

(1) " Merci ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! "
(2) " Je l' ai vécu moi-même en personne quand j' ai eu mon diplôme à Barnard College en 2002 . "

(Just as a simple made-up example.)

If we assume that you have a correct English target sentence for both of
those sentences in your training data, I wonder which of the two you
could learn more from?

If you're doing what I think, then you're also basically just assessing
whether the source side of the sentence pair is easy to translate. Does
this tell you anything about the target sentence? The target side might
be misaligned or in a different third language if your data is noisy.

Cheers,
Matthias

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
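Matthias's suggestion (in his previous message) to normalize model scores by sentence length before thresholding can be sketched as a small helper. It is hypothetical, not a Moses script. It assumes that for every training sentence you already have its total log-domain model score from the decoder (more negative means worse) and its token count. Normalization removes only the length bias. His two example inputs above have roughly the same number of tokens, so dividing by length would not tell the easy one from the informative one.

```python
from typing import List


def normalized_scores(scores: List[float], lengths: List[int]) -> List[float]:
    """Divide each sentence-level model score by its token count.

    Log-domain model scores sum per-word and per-phrase log costs, so raw
    scores get more negative as sentences get longer.
    """
    if len(scores) != len(lengths):
        raise ValueError("scores and lengths must have the same size")
    return [s / max(n, 1) for s, n in zip(scores, lengths)]


def select_by_normalized_score(scores: List[float], lengths: List[int],
                               keep_fraction: float) -> List[int]:
    """Return the sorted indices of the best keep_fraction of sentences."""
    norm = normalized_scores(scores, lengths)
    ranked = sorted(range(len(norm)), key=lambda i: norm[i], reverse=True)
    k = int(round(keep_fraction * len(norm)))
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Made-up scores for a 2-token, a 40-token and a 5-token sentence.
    scores, lengths = [-4.0, -40.0, -20.0], [2, 40, 5]
    print(normalized_scores(scores, lengths))                  # [-2.0, -1.0, -4.0]
    print(select_by_normalized_score(scores, lengths, 2 / 3))  # [0, 1]
```

In this made-up example, a raw-score threshold would drop the 40-token sentence first, although per token it scores best of the three.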
Re: [Moses-support] is there a way to remove a bad entry in the phrase table ?
Thanks Matthias for the detailed explanation.

I think I have most of it in mind, except that I don't really understand
how this one works:

"Difficult sentences generally have worse model score than easy ones but
may still be useful for training."

But yes, what you describe is more or less what I did to better
understand the mechanism, and I know I have to tune with in-domain data
for a proper end result.

Cheers,
Vincent
Re: [Moses-support] is there a way to remove a bad entry in the phrase table ?
Hi Vincent,

This is a different topic, and I'm not completely clear about what
exactly you did here. Did you decode the source side of the parallel
training data, conduct sentence selection by applying a threshold on the
decoder score, and extract a new phrase table from the selected fraction
of the original parallel training data? If this is the case, I have some
comments:

- Be careful when you translate training data. The system knows these
  sentences and does things like frequently applying long singleton
  phrases that have been extracted from the very same sentence.
  https://aclweb.org/anthology/P/P10/P10-1049.pdf

- Longer sentences may have worse model score than shorter sentences.
  Consider normalizing by sentence length if you use model score for data
  selection.
  Difficult sentences generally have worse model score than easy ones but
  may still be useful for training. You possibly keep the parts of the
  data that are easy to translate or are highly redundant in the corpus.

- You probably see no out-of-vocabulary words (OOVs) when translating
  training data, or very few of them (depending on word alignment, phrase
  extraction method, and phrase table pruning), but be aware that if there
  are OOVs, this may affect the model score a lot.

- Check to what extent the sentence selection reduces the vocabulary of
  your system.

Last but not least, two more general comments:

- You need dev and test sets that are similar to the type of real-world
  documents that you're building your system for. Don't tune on Europarl
  if you eventually want to translate pharmaceutical patents, for
  instance. Try to collect in-domain training data as well.

- In case you have in-domain and out-of-domain training corpora, you can
  try modified Moore-Lewis filtering for data selection.
  https://aclweb.org/anthology/D/D11/D11-1033.pdf

Cheers,
Matthias

On Thu, 2015-09-24 at 18:19 +0200, Vincent Nguyen wrote:
> This is an interesting subject...
>
> As a matter of fact, I have done several tests.
> I came to that need after realizing that even though my results were
> good in a "standard dev + test set" situation,
> I had some strange results with real-world documents.
> That's why I investigated.
>
> But you are right: removing some so-called bad entries could have
> unexpected results.
>
> For instance, here is a test I did:
>
> I trained a fr-en model on Europarl v7 (2 million sentences).
> I tuned with a subset of 3K sentences.
> I ran an evaluation on the full 2 million lines,
> then removed the 90K sentences whose score was below 0.2
> and retrained on 1,917,853 sentences.
>
> In the end I got a higher percentage of sentences with a score above 0.2,
> but at > 0.3 the results are similar, and at > 0.4 the initial corpus
> does better.
>
> Just weird.
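The modified Moore-Lewis selection Matthias points to (the D11-1033 paper he links) ranks each general-domain sentence pair by a bilingual cross-entropy difference, [H_in(source) - H_out(source)] + [H_in(target) - H_out(target)], and keeps the lowest-scoring pairs. The sketch below illustrates only that scoring rule, under loudly stated simplifications. Add-one smoothed unigram models stand in for the n-gram language models used in practice. The out-of-domain model is trained on the whole general corpus instead of a sample matched in size to the in-domain data. All names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Pair = Tuple[List[str], List[str]]  # (source tokens, target tokens)


class UnigramLM:
    """Add-one smoothed unigram LM; a toy stand-in for a real n-gram LM."""

    def __init__(self, sentences: Sequence[List[str]], vocab_size: int) -> None:
        self.counts = Counter(tok for sent in sentences for tok in sent)
        self.total = sum(self.counts.values())
        self.vocab_size = vocab_size

    def cross_entropy(self, sent: List[str]) -> float:
        """Per-token cross-entropy of `sent` in bits."""
        if not sent:
            return 0.0
        denom = self.total + self.vocab_size
        log_prob = sum(math.log2((self.counts[tok] + 1) / denom) for tok in sent)
        return -log_prob / len(sent)


def mml_scores(in_domain: Sequence[Pair], general: Sequence[Pair]) -> List[float]:
    """Bilingual cross-entropy difference per general pair (lower = more in-domain)."""
    scores = [0.0] * len(general)
    for side in (0, 1):  # 0 = source side, 1 = target side
        in_sents = [pair[side] for pair in in_domain]
        gen_sents = [pair[side] for pair in general]
        vocab = {tok for sent in in_sents + gen_sents for tok in sent}
        vocab_size = len(vocab) + 1  # one extra slot for unseen tokens
        lm_in = UnigramLM(in_sents, vocab_size)
        lm_out = UnigramLM(gen_sents, vocab_size)  # simplification, see lead-in
        for i, sent in enumerate(gen_sents):
            scores[i] += lm_in.cross_entropy(sent) - lm_out.cross_entropy(sent)
    return scores


def select_mml(in_domain: Sequence[Pair], general: Sequence[Pair], n: int) -> List[int]:
    """Indices of the n general-domain pairs with the lowest score."""
    scores = mml_scores(in_domain, general)
    return sorted(range(len(general)), key=lambda i: scores[i])[:n]


if __name__ == "__main__":
    in_domain = [("le patient reçoit la dose".split(),
                  "the patient receives the dose".split())]
    general = [
        ("le parlement vote".split(), "parliament votes".split()),
        ("la dose pour le patient".split(), "the dose for the patient".split()),
    ]
    print(select_mml(in_domain, general, 1))  # [1]
```

In Vincent's setting, `general` would be Europarl and `in_domain` a sample of the real-world documents he is targeting.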
Re: [Moses-support] is there a way to remove a bad entry in the phrase table ?
On 23/09/2015 15:12, Tom Hoar wrote:
> Vincent,
>
> If you suspect bad entries, isn't it better to address the root of the
> problem and prepare your training corpus better?
>
> On 9/23/2015 6:46 PM, moses-support-requ...@mit.edu wrote:
>> Date: Tue, 22 Sep 2015 20:24:02 +0200
>> From: Philipp Koehn
>> Subject: Re: [Moses-support] is there a way to remove a bad entry in
>>  the phrase table ?
>> To: Vincent Nguyen
>> Cc: moses-support
>>
>> Hi,
>>
>> you can remove it manually (just edit the text file), there will be no
>> negative consequences.
>>
>> However, it is not a realistic strategy to try to remove by hand every
>> offending phrase table entry.
>>
>> -phi
>>
>> On Tue, Sep 22, 2015 at 4:05 PM, Vincent Nguyen wrote:
>>> Hi,
>>>
>>> I was wondering: if, after an analysis of the BLEU-Annotation file, we
>>> realize that there must be a bad entry in the phrase table,
>>> could we remove it manually or in some other way?
>>>
>>> Thanks.
>>> V.
>
> --
> Best regards,
>
> Tom Hoar
> Chief Executive Officer
> Precision Translation Tools Pte Ltd
> Singapore/Thailand
> Web: www.precisiontranslationtools.com
> Thailand Mobile: +66 87 345-1875
> Skype: tahoar
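Philipp's answer (edit the text file) can be automated so that removing entries is repeatable instead of done by hand. The sketch below is a hypothetical script, not a Moses tool. It assumes a plain-text, optionally gzipped phrase table in the layout shown later in this thread, `source ||| target ||| scores ||| alignment ||| counts ||| ...`, where the third of the four scores is the direct phrase probability p(target | source). Removing lines does not renormalize the probabilities of the remaining entries, and a binarized table would have to be rebuilt from the filtered text file. The optional global threshold can also remove rare but correct translations.

```python
import gzip
import sys
from typing import IO, Iterable, Iterator, Optional, Set, Tuple


def open_text(path: str, mode: str) -> IO[str]:
    """Open a plain or gzip-compressed text file ('r' or 'w') as UTF-8."""
    if path.endswith(".gz"):
        return gzip.open(path, mode + "t", encoding="utf-8")
    return open(path, mode, encoding="utf-8")


def filter_phrase_table(lines: Iterable[str],
                        blacklist: Set[Tuple[str, str]],
                        min_direct: Optional[float] = None) -> Iterator[str]:
    """Yield the phrase-table lines that survive the filter, unchanged.

    A line is dropped if its (source, target) pair is blacklisted, or if
    min_direct is set and the third score, p(target | source), is below it.
    """
    for line in lines:
        fields = [f.strip() for f in line.split("|||")]
        if len(fields) < 3:
            yield line  # not a phrase-pair line: pass it through untouched
            continue
        if (fields[0], fields[1]) in blacklist:
            continue
        if min_direct is not None:
            scores = fields[2].split()
            if len(scores) >= 3 and float(scores[2]) < min_direct:
                continue
        yield line


def main() -> None:
    # Hypothetical usage: python filter_pt.py phrase-table.gz phrase-table.filtered.gz
    src_path, dst_path = sys.argv[1], sys.argv[2]
    blacklist = {('" 1', "One Million Roofs")}  # example pair from this thread
    with open_text(src_path, "r") as fin, open_text(dst_path, "w") as fout:
        fout.writelines(filter_phrase_table(fin, blacklist))


if __name__ == "__main__" and len(sys.argv) == 3:
    main()
```

Because lines are written back unchanged and in their original order, the output keeps the sorted layout of the input table.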
Re: [Moses-support] is there a way to remove a bad entry in the phrase table ?
" 1 ||| " 1 ||| 0.503492 0.361595 0.11619 0.187815 ||| 0-0 1-1 ||| 6 26 4 ||| |||
" 1 ||| 1 ||| 0.0010136 0.00278649 0.461538 0.805151 ||| 1-0 ||| 11839 26 12 ||| |||
*" 1 ||| One Million Roofs ||| 0.103519 0.00213892 0.00398148 3.32314e-15 ||| 0-0 1-0 0-1 0-2 ||| 1 26 1 ||| |||*
" 1 ||| hardly 1 ||| 0.0258796 0.00278649 0.00398148 1.73108e-05 ||| 1-1 ||| 4 26 1 ||| |||
" 1 ||| million solar ||| 0.0345062 3.55949e-06 0.00398148 3.29783e-09 ||| 1-0 ||| 3 26 1 ||| |||
" 1 ||| million ||| 5.83433e-06 3.55949e-06 0.00398148 0.0019399 ||| 1-0 ||| 17743 26 1 ||| |||
" 1 ||| of 1 ||| 0.000263406 0.00278649 0.00398148 0.0270917 ||| 1-1 ||| 393 26 1 ||| |||
" 1 ||| one ||| 1.32368e-05 5.22671e-06 0.0391025 0.0141179 ||| 1-0 ||| 76806 26 2 ||| |||
" 1,1 % ||| 1.1 % ||| 0.0022504 0.00241746 0.103519 0.875731 ||| 1-0 2-1 ||| 46 1 1 ||| |||
" 1,1 milliard d' euros ||| EUR 1.1 billion ||| 0.00544835 6.98053e-05 0.0517593 0.110019 ||| 3-0 4-0 1-1 2-1 2-2 ||| 19 2 1 ||| |||
" 1,1 milliard d' euros ||| by EUR 1.1 billion ||| 0.0345062 6.98053e-05 0.0517593 0.000791519 ||| 3-1 4-1 1-2 2-2 2-3 ||| 3 2 1 ||| |||

On 24/09/2015 09:54, Felipe Sánchez Martínez wrote:
> Hi,
>
> This is quite common. If you look at the scores, they are pretty low when
> they do not make sense, so, even though they are in the phrase table, most
> probably they will never be used for translation. I would not bother.
>
> Cheers
> --
> Felipe
>
> On 23/09/15 at 16:50, Vincent Nguyen wrote:
>> I agree and would like to.
>> But this is tricky: look at the first 30 lines of my phrase table below.
>>
>> This happens a lot in the first lines of the table, where there are
>> &apos; or other weird codes: the EN/FR pairs do not match.
>>
>> ! ! ! ! ||| ! ! ! ! ||| 0.103413 0.132185 0.103413 0.401758 ||| 0-0 1-1 2-2 3-3 ||| 1 1 1 ||| |||
>> ! ! ! ) ||| ! ! ! ) ||| 0.339323 0.167884 0.508985 0.4246 ||| 0-0 1-0 2-0 2-1 2-2 3-3 ||| 3 2 2 ||| |||
>> ! ! ! ||| ! ! ! ||| 0.501834 0.219223 0.716905 0.50463 ||| 0-0 1-1 2-2 ||| 10 7 6 ||| |||
>> ! ! ! ||| budget ! ! ! ||| 0.0517067 0.219223 0.0147733 4.50635e-05 ||| 0-1 1-2 2-3 ||| 2 7 1 ||| |||
>> ! ! ) , ||| ! ! ) - , ||| 0.103413 0.111989 0.103413 0.00192967 ||| 0-0 1-1 2-2 3-3 3-4 ||| 1 1 1 ||| |||
>> ! ! ) ||| ! ! ) ||| 0.103413 0.278429 0.103413 0.533321 ||| 0-0 1-1 2-2 ||| 1 1 1 ||| |||
>> ! ! ||| ! ! ||| 0.625 0.363573 0.769231 0.633844 ||| 0-0 1-1 ||| 16 13 10 ||| |||
>> ! ! ||| . ||| 4.65922e-08 6.71089e-07 0.00795487 0.140779 ||| 0-0 1-0 ||| 2.21954e+06 13 1 ||| |||
>> ! ! ||| budget ! ! ||| 0.0517067 0.363573 0.00795487 5.66022e-05 ||| 0-1 1-2 ||| 2 13 1 ||| |||
>> ! ! ||| nécessaire ! ! ||| 0.103413 0.363573 0.00795487 0.000130572 ||| 0-1 1-2 ||| 1 13 1 ||| |||
>> ! [ never again ! ||| ! ||| 6.51628e-06 5.42074e-13 0.103413 0.796143 ||| 0-0 4-0 ||| 15870 1 1 ||| |||
>> ! ] this is ||| tel est ||| 7.38667e-05 9.16191e-11 0.103413 0.00147917 ||| 2-0 3-1 ||| 1400 1 1 ||| |||
>> ! ] this ||| tel ||| 1.09594e-05 1.44188e-10 0.103413 0.0035893 ||| 2-0 ||| 9436 1 1 ||| |||
>> ! ] ||| ! ] ||| 0.103413 0.352335 0.103413 0.472387 ||| 0-0 1-1 ||| 1 1 1 ||| |||
>> ! & quot ; ||| ! " . et ||| 0.0517067 2.36396e-12 0.0517067 1.88268e-05 ||| 0-0 1-1 2-1 3-3 ||| 2 2 1 ||| |||
>> ! & quot ; ||| ! " ||| 0.000222394 1.44515e-11 0.0517067 0.518419 ||| 0-0 2-1 ||| 465 2 1 ||| |||
>> ! & quot ||| ! " . ||| 0.000662906 8.30626e-09 0.0344711 0.00232791 ||| 0-0 1-1 2-1 ||| 156 3 1 ||| |||
>> ! & quot ||| ! " ||| 0.00218918 8.30626e-09 0.339323 0.518419 ||| 0-0 2-1 ||| 465 3 2 ||| |||
>> ! & ||| ! ||| 6.51628e-06 7.21755e-05 0.103413 0.796143 ||| 0-0 ||| 15870 1 1 ||| |||
>> ! ' ] , addressed ||| ! " adressé ||| 0.103413 3.70838e-07 0.103413 0.00596848 ||| 0-0 1-1 2-1 4-2 ||| 1 1 1 ||| |||
>> ! ' ] , ||| ! " ||| 0.000222394 2.49698e-06 0.103413 0.215573 ||| 0-0 1-1 2-1 ||| 465 1 1 ||| |||
>> ! ' ] ||| ! " ||| 0.000222394 3.57128e-05 0.103413 0.215573 ||| 0-0 1-1 2-1 ||| 465 1 1 ||| |||
>> ! ' ' Alstom shares ||| l' on constate un dysfonctionnement ||| 0.0344711 5.62605e-16 0.103413 1.03361e-14 ||| 1-0 2-0 1-1 3-4 4-4 ||| 3 1 1 ||| |||
>> ! ' ' ||| l' on constate un ||| 0.0147733 1.56906e-11 0.0129267 2.2766e-12 ||| 1-0 2-0 1-1 ||| 7 8 1 ||| |||
>> ! ' ' ||| l' on constate ||| 0.000984889 1.56906e-11 0.0129267 2.36929e-10 ||| 1-0 2-0 1-1 ||| 105 8 1 ||| |||
>> ! ' ' ||| l' on ||| 6.76656e-06 1.56906e-11 0.0129267 6.18613e-06 ||| 1-0 2-0 1-1 ||| 15283 8 1 ||| |||
>> ! ' ' ||| ou que l' on constate ||| 0.0344711 1.56906e-11 0.0129267 4.69534e-15 ||| 1-2 2-2 1-3 ||| 3 8 1 ||| |||
>> ! ' ' ||| ou que l' on ||| 0.00304157 1.56906e-11 0.0129267 1.22594e-10 ||| 1-2 2-2 1-3 ||| 34 8 1 ||| |||
>> ! ' ' ||| que l' on constate un ||| 0.0344711 1.56906e-11 0.0129267 4.56092e-14 ||| 1-1 2-1 1-2 ||| 3 8 1 ||| |||
>> ! ' ' ||| que l' on constate ||| 0.00323167 1.56906e-11 0.0129267 4.74661e-12 ||| 1-1 2-1 1-2 ||| 32 8 1 ||| |||
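Felipe's argument can be checked on the `" 1` entries at the top of Vincent's last message. If the candidate translations of that source phrase are ranked by p(target | source), the third score, the highlighted `One Million Roofs` pair lands among the lowest. This is a simplified, hypothetical sketch. Moses ranks translation options with the tuned weighted combination of all features, and it applies its own per-source cutoff (`table-limit`), so the real ranking can differ. Also, with only eight candidates, any cutoff above eight keeps them all, so a spurious pair would then have to lose on score during decoding, which is Felipe's point.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# The " 1 entries from Vincent's excerpt (French source, English target).
EXCERPT = [
    '" 1 ||| " 1 ||| 0.503492 0.361595 0.11619 0.187815 ||| 0-0 1-1 ||| 6 26 4 ||| |||',
    '" 1 ||| 1 ||| 0.0010136 0.00278649 0.461538 0.805151 ||| 1-0 ||| 11839 26 12 ||| |||',
    '" 1 ||| One Million Roofs ||| 0.103519 0.00213892 0.00398148 3.32314e-15 ||| 0-0 1-0 0-1 0-2 ||| 1 26 1 ||| |||',
    '" 1 ||| hardly 1 ||| 0.0258796 0.00278649 0.00398148 1.73108e-05 ||| 1-1 ||| 4 26 1 ||| |||',
    '" 1 ||| million solar ||| 0.0345062 3.55949e-06 0.00398148 3.29783e-09 ||| 1-0 ||| 3 26 1 ||| |||',
    '" 1 ||| million ||| 5.83433e-06 3.55949e-06 0.00398148 0.0019399 ||| 1-0 ||| 17743 26 1 ||| |||',
    '" 1 ||| of 1 ||| 0.000263406 0.00278649 0.00398148 0.0270917 ||| 1-1 ||| 393 26 1 ||| |||',
    '" 1 ||| one ||| 1.32368e-05 5.22671e-06 0.0391025 0.0141179 ||| 1-0 ||| 76806 26 2 ||| |||',
]


def rank_by_direct(lines: Iterable[str]) -> Dict[str, List[Tuple[str, float]]]:
    """Group phrase pairs by source phrase, best p(target | source) first."""
    groups: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for line in lines:
        fields = [f.strip() for f in line.split("|||")]
        scores = fields[2].split() if len(fields) >= 3 else []
        if len(scores) < 3:
            continue  # skip lines that do not look like scored phrase pairs
        groups[fields[0]].append((fields[1], float(scores[2])))
    # sorted() is stable (also with reverse=True): ties keep their file order.
    return {src: sorted(cands, key=lambda c: c[1], reverse=True)
            for src, cands in groups.items()}


def top_k(ranked: Dict[str, List[Tuple[str, float]]], k: int) -> Dict[str, List[str]]:
    """Keep only the k best target phrases per source phrase."""
    return {src: [tgt for tgt, _ in cands[:k]] for src, cands in ranked.items()}


if __name__ == "__main__":
    ranked = rank_by_direct(EXCERPT)
    for tgt, p in ranked['" 1']:
        print(f"{p:.6g}\t{tgt}")
    print(top_k(ranked, 3))
```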
Re: [Moses-support] is there a way to remove a bad entry in the phrase table ?
> ||| 0.103519 0.25 0.00398148 5.61e-05 ||| 0-0 1-0 ||| 1 26 1 ||| |||
> " 1 ||| " 1 ||| 0.503492 0.361595 0.11619 0.187815 ||| 0-0 1-1 ||| 6 26 4 ||| |||
> " 1 ||| 1 ||| 0.0010136 0.00278649 0.461538 0.805151 ||| 1-0 ||| 11839 26 12 ||| |||
> *" 1 ||| One Million Roofs ||| 0.103519 0.00213892 0.00398148 3.32314e-15 ||| 0-0 1-0 0-1 0-2 ||| 1 26 1 ||| |||*
> " 1 ||| hardly 1 ||| 0.0258796 0.00278649 0.00398148 1.73108e-05 ||| 1-1 ||| 4 26 1 ||| |||
> " 1 ||| million solar ||| 0.0345062 3.55949e-06 0.00398148 3.29783e-09 ||| 1-0 ||| 3 26 1 ||| |||
> " 1 ||| million ||| 5.83433e-06 3.55949e-06 0.00398148 0.0019399 ||| 1-0 ||| 17743 26 1 ||| |||
> " 1 ||| of 1 ||| 0.000263406 0.00278649 0.00398148 0.0270917 ||| 1-1 ||| 393 26 1 ||| |||
> " 1 ||| one ||| 1.32368e-05 5.22671e-06 0.0391025 0.0141179 ||| 1-0 ||| 76806 26 2 ||| |||
> " 1,1 % ||| 1.1 % ||| 0.0022504 0.00241746 0.103519 0.875731 ||| 1-0 2-1 ||| 46 1 1 ||| |||
> " 1,1 milliard d' euros ||| EUR 1.1 billion ||| 0.00544835 6.98053e-05 0.0517593 0.110019 ||| 3-0 4-0 1-1 2-1 2-2 ||| 19 2 1 ||| |||
> " 1,1 milliard d' euros ||| by EUR 1.1 billion ||| 0.0345062 6.98053e-05 0.0517593 0.000791519 ||| 3-1 4-1 1-2 2-2 2-3 ||| 3 2 1 ||| |||
>
> On 24/09/2015 09:54, Felipe Sánchez Martínez wrote:
>
> Hi,
>
> This is quite common. If you look at the scores, they are pretty low when
> they do not make sense, so, even though they are in the phrase table, most
> probably they will never be used for translation. I would not bother.
>
> Cheers
> --
> Felipe
>
> On 23/09/15 at 16:50, Vincent Nguyen wrote:
>
> I agree and would like to.
> But this is tricky, look at the first 30 lines of my phrase table below.
>
> and this happens a lot in the first line of tables where there are &apos
> or weird codes, EN/FR pairs do not match.
> [...]
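The highlighted entry pairs the source `" 1` with `One Million Roofs`. The number on the source side has no counterpart on the target side. One cheap way to surface entries like this is to compare the numeric tokens on both sides.

The Python sketch below is a heuristic only, and it rests on a few assumptions:
- The fields are separated by ` ||| `, as in the excerpt.
- Decimal commas are normalised to points on both sides, so French `1,1` matches English `1.1`. This would misfire on English thousands separators such as `1,000`.
- The function names are illustrative and are not part of Moses.

```python
import re
from typing import Set

# A number: a run of digits, optionally followed by groups such as ",1" or ".5".
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def numbers(phrase: str) -> Set[str]:
    """Return the numeric tokens of a phrase, with ',' normalised to '.'."""
    return {m.group().replace(",", ".") for m in NUMBER.finditer(phrase)}


def number_mismatch(line: str) -> bool:
    """True if the source and target phrases of a phrase-table line carry different numbers."""
    source, target = line.split(" ||| ")[:2]
    return numbers(source) != numbers(target)


if __name__ == "__main__":
    entries = [
        '" 1 ||| One Million Roofs ||| 0.103519 0.00213892 0.00398148 3.32314e-15 ||| 0-0 1-0 0-1 0-2 ||| 1 26 1 ||| |||',
        '" 1,1 % ||| 1.1 % ||| 0.0022504 0.00241746 0.103519 0.875731 ||| 1-0 2-1 ||| 46 1 1 ||| |||',
    ]
    for entry in entries:
        print(number_mismatch(entry), entry.split(" ||| ")[:2])
```

Run on these two entries, it flags the first and keeps the second. It would also flag legitimate pairs such as `" 1 ||| one`. Its output is therefore a list to inspect, not a list to delete blindly.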
Re: [Moses-support] is there a way to remove a bad entry in the phrase table ?
Hi,

This is quite common. If you look at the scores, they are pretty low when the entries do not make sense, so even though they are in the phrase table, they will most probably never be used for translation. I would not bother.

Cheers
--
Felipe

On 23/09/15 at 16:50, Vincent Nguyen wrote:
> I agree and would like to.
> But this is tricky, look at the first 30 lines of my phrase table below.
>
> and this happens a lot in the first line of tables where there are &apos
> or weird codes, EN/FR pairs do not match.
> [...]
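Felipe's point can be checked numerically by combining the four scores of each entry.

In the default Moses layout, which is assumed here, the four scores are, in order:
1. the inverse phrase probability p(f|e);
2. the inverse lexical weight;
3. the direct phrase probability p(e|f);
4. the direct lexical weight.

The counts in Vincent's excerpt agree with this. For `! ! ||| ! !` (counts 16 13 10), the first score is 10/16 = 0.625 and the third is 10/13 ≈ 0.769.

The decoder combines the log scores with tuned weights from moses.ini. The sketch below uses equal weights instead, which is a simplifying assumption, and its threshold of -20 is arbitrary. Whether an entry is ever used also depends on competing entries and on the decoder's per-phrase translation limit, which this sketch ignores.

```python
import math
from typing import Iterable, Iterator, List, NamedTuple


class Entry(NamedTuple):
    source: str
    target: str
    scores: List[float]  # assumed order: p(f|e), lex(f|e), p(e|f), lex(e|f)


def parse_entry(line: str) -> Entry:
    """Read the source, target and score fields of a 'src ||| tgt ||| scores ||| ...' line."""
    fields = line.rstrip("\n").split(" ||| ")
    return Entry(fields[0], fields[1], [float(x) for x in fields[2].split()])


def log_score(entry: Entry) -> float:
    """Equal-weight sum of log scores; the decoder would use tuned weights instead."""
    return sum(math.log(max(s, 1e-300)) for s in entry.scores)  # floor guards against log(0)


def flag_low_scoring(lines: Iterable[str], threshold: float = -20.0) -> Iterator[Entry]:
    """Yield the entries whose equal-weight log score falls below the threshold."""
    for line in lines:
        entry = parse_entry(line)
        if log_score(entry) < threshold:
            yield entry


if __name__ == "__main__":
    sample = [
        "! ! ! ! ||| ! ! ! ! ||| 0.103413 0.132185 0.103413 0.401758 ||| 0-0 1-1 2-2 3-3 ||| 1 1 1 ||| |||",
        "! ] this is ||| tel est ||| 7.38667e-05 9.16191e-11 0.103413 0.00147917 ||| 2-0 3-1 ||| 1400 1 1 ||| |||",
    ]
    for entry in flag_low_scoring(sample):
        print(f"{log_score(entry):.1f}\t{entry.source} ||| {entry.target}")
```

Only the second sample entry is flagged. Its tiny inverse lexical weight alone contributes about -23 to the sum.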
Re: [Moses-support] is there a way to remove a bad entry in the phrase table ?
I agree and would like to. But this is tricky: look at the first 30 lines of my phrase table below.

This happens a lot in the first lines of the table, where there are &apos or weird codes; the EN/FR pairs do not match.

! ! ! ! ||| ! ! ! ! ||| 0.103413 0.132185 0.103413 0.401758 ||| 0-0 1-1 2-2 3-3 ||| 1 1 1 ||| |||
! ! ! ) ||| ! ! ! ) ||| 0.339323 0.167884 0.508985 0.4246 ||| 0-0 1-0 2-0 2-1 2-2 3-3 ||| 3 2 2 ||| |||
! ! ! ||| ! ! ! ||| 0.501834 0.219223 0.716905 0.50463 ||| 0-0 1-1 2-2 ||| 10 7 6 ||| |||
! ! ! ||| budget ! ! ! ||| 0.0517067 0.219223 0.0147733 4.50635e-05 ||| 0-1 1-2 2-3 ||| 2 7 1 ||| |||
! ! ) , ||| ! ! ) - , ||| 0.103413 0.111989 0.103413 0.00192967 ||| 0-0 1-1 2-2 3-3 3-4 ||| 1 1 1 ||| |||
! ! ) ||| ! ! ) ||| 0.103413 0.278429 0.103413 0.533321 ||| 0-0 1-1 2-2 ||| 1 1 1 ||| |||
! ! ||| ! ! ||| 0.625 0.363573 0.769231 0.633844 ||| 0-0 1-1 ||| 16 13 10 ||| |||
! ! ||| . ||| 4.65922e-08 6.71089e-07 0.00795487 0.140779 ||| 0-0 1-0 ||| 2.21954e+06 13 1 ||| |||
! ! ||| budget ! ! ||| 0.0517067 0.363573 0.00795487 5.66022e-05 ||| 0-1 1-2 ||| 2 13 1 ||| |||
! ! ||| nécessaire ! ! ||| 0.103413 0.363573 0.00795487 0.000130572 ||| 0-1 1-2 ||| 1 13 1 ||| |||
! [ never again ! ||| ! ||| 6.51628e-06 5.42074e-13 0.103413 0.796143 ||| 0-0 4-0 ||| 15870 1 1 ||| |||
! ] this is ||| tel est ||| 7.38667e-05 9.16191e-11 0.103413 0.00147917 ||| 2-0 3-1 ||| 1400 1 1 ||| |||
! ] this ||| tel ||| 1.09594e-05 1.44188e-10 0.103413 0.0035893 ||| 2-0 ||| 9436 1 1 ||| |||
! ] ||| ! ] ||| 0.103413 0.352335 0.103413 0.472387 ||| 0-0 1-1 ||| 1 1 1 ||| |||
! & quot ; ||| ! " . et ||| 0.0517067 2.36396e-12 0.0517067 1.88268e-05 ||| 0-0 1-1 2-1 3-3 ||| 2 2 1 ||| |||
! & quot ; ||| ! " ||| 0.000222394 1.44515e-11 0.0517067 0.518419 ||| 0-0 2-1 ||| 465 2 1 ||| |||
! & quot ||| ! " . ||| 0.000662906 8.30626e-09 0.0344711 0.00232791 ||| 0-0 1-1 2-1 ||| 156 3 1 ||| |||
! & quot ||| ! " ||| 0.00218918 8.30626e-09 0.339323 0.518419 ||| 0-0 2-1 ||| 465 3 2 ||| |||
! & ||| ! ||| 6.51628e-06 7.21755e-05 0.103413 0.796143 ||| 0-0 ||| 15870 1 1 ||| |||
! ' ] , addressed ||| ! " adressé ||| 0.103413 3.70838e-07 0.103413 0.00596848 ||| 0-0 1-1 2-1 4-2 ||| 1 1 1 ||| |||
! ' ] , ||| ! " ||| 0.000222394 2.49698e-06 0.103413 0.215573 ||| 0-0 1-1 2-1 ||| 465 1 1 ||| |||
! ' ] ||| ! " ||| 0.000222394 3.57128e-05 0.103413 0.215573 ||| 0-0 1-1 2-1 ||| 465 1 1 ||| |||
! ' ' Alstom shares ||| l' on constate un dysfonctionnement ||| 0.0344711 5.62605e-16 0.103413 1.03361e-14 ||| 1-0 2-0 1-1 3-4 4-4 ||| 3 1 1 ||| |||
! ' ' ||| l' on constate un ||| 0.0147733 1.56906e-11 0.0129267 2.2766e-12 ||| 1-0 2-0 1-1 ||| 7 8 1 ||| |||
! ' ' ||| l' on constate ||| 0.000984889 1.56906e-11 0.0129267 2.36929e-10 ||| 1-0 2-0 1-1 ||| 105 8 1 ||| |||
! ' ' ||| l' on ||| 6.76656e-06 1.56906e-11 0.0129267 6.18613e-06 ||| 1-0 2-0 1-1 ||| 15283 8 1 ||| |||
! ' ' ||| ou que l' on constate ||| 0.0344711 1.56906e-11 0.0129267 4.69534e-15 ||| 1-2 2-2 1-3 ||| 3 8 1 ||| |||
! ' ' ||| ou que l' on ||| 0.00304157 1.56906e-11 0.0129267 1.22594e-10 ||| 1-2 2-2 1-3 ||| 34 8 1 ||| |||
! ' ' ||| que l' on constate un ||| 0.0344711 1.56906e-11 0.0129267 4.56092e-14 ||| 1-1 2-1 1-2 ||| 3 8 1 ||| |||
! ' ' ||| que l' on constate ||| 0.00323167 1.56906e-11 0.0129267 4.74661e-12 ||| 1-1 2-1 1-2 ||| 32 8 1 ||| |||

On 23/09/2015 15:12, Tom Hoar wrote:
> Vincent,
>
> If you suspect bad entries, isn't it better to address the root of the
> problem and prepare your training corpus better?
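Many of the suspicious lines above share a pattern: one side consists only of punctuation while the other side contains words. Examples are `! ' ' ||| l' on constate` and `! [ never again ! ||| !`.

The Python filter below, a hypothetical heuristic rather than a Moses tool, lists such entries for inspection. It uses Unicode categories to decide which characters count as punctuation. As a side effect, it also flags entries such as `! & quot ; ||| ! "`, where an HTML entity was split into separate tokens. That points back to corpus preparation rather than to the phrase table itself.

```python
import unicodedata
from typing import List


def is_punct_token(token: str) -> bool:
    """True if every character is Unicode punctuation (P*) or a symbol (S*)."""
    return all(unicodedata.category(ch)[0] in ("P", "S") for ch in token)


def has_word(tokens: List[str]) -> bool:
    """True if at least one token contains an alphabetic character."""
    return any(ch.isalpha() for token in tokens for ch in token)


def punct_mismatch(line: str) -> bool:
    """Flag entries where one side is punctuation-only and the other side has words."""
    fields = line.split(" ||| ")
    source, target = fields[0].split(), fields[1].split()
    src_punct = all(is_punct_token(t) for t in source)
    tgt_punct = all(is_punct_token(t) for t in target)
    return (src_punct and has_word(target)) or (tgt_punct and has_word(source))
```

Entries that are wrong but contain punctuation on both sides, such as `! ! ||| .`, pass this check unnoticed. It is one filter among several one could combine.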
Re: [Moses-support] is there a way to remove a bad entry in the phrase table ?
Vincent,

If you suspect bad entries, isn't it better to address the root of the problem and prepare your training corpus better?

On 9/23/2015 6:46 PM, moses-support-requ...@mit.edu wrote:
> Date: Tue, 22 Sep 2015 20:24:02 +0200
> From: Philipp Koehn
> Subject: Re: [Moses-support] is there a way to remove a bad entry in the phrase table ?
> To: Vincent Nguyen
> Cc: moses-support
>
> Hi,
>
> you can remove it manually (just edit the text file), there will be no negative consequences.
>
> However, it is not a realistic strategy to try to remove by hand every offending phrase table entry.
>
> -phi

--
Best regards,
Tom Hoar
Chief Executive Officer
Precision Translation Tools Pte Ltd
Singapore/Thailand
Web: www.precisiontranslationtools.com <http://www.precisiontranslationtools.com>
Thailand Mobile: +66 87 345-1875
Skype: tahoar
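Tom's suggestion means fixing problems like the `& quot ;` entries before word alignment rather than after phrase extraction. Moses ships `scripts/training/clean-corpus-n.perl` for length-based filtering.

The Python sketch below is a separate, illustrative filter, not that script. In addition to length checks, it unescapes HTML entities such as `&quot;` and `&apos;`. It assumes:
- the corpus is available as (source, target) string pairs;
- you run it before the usual tokenization.

The thresholds and the function name are arbitrary choices for illustration.

```python
import html
from typing import Iterable, Iterator, Tuple


def clean_pairs(pairs: Iterable[Tuple[str, str]], max_len: int = 80,
                max_ratio: float = 9.0) -> Iterator[Tuple[str, str]]:
    """Unescape HTML entities and drop sentence pairs that look misaligned."""
    for src, tgt in pairs:
        # Turn leftovers such as '&quot;' or '&apos;' back into plain characters
        src, tgt = html.unescape(src).strip(), html.unescape(tgt).strip()
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if n_src == 0 or n_tgt == 0:
            continue  # an empty side usually means a broken alignment
        if n_src > max_len or n_tgt > max_len:
            continue  # very long sentences are costly and error-prone to word-align
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue  # extreme length ratio: probably not translations of each other
        yield src, tgt


if __name__ == "__main__":
    corpus = [
        ("Merci !", "Thanks !"),
        ("&quot; 1 &quot;", "&quot; 1 &quot;"),
        ("", "orphan line"),
        ("a", "b c d e f g h i j k"),
    ]
    for src, tgt in clean_pairs(corpus):
        print(src, "|||", tgt)
```

Here the first two pairs survive, the second with plain `"` characters. The empty pair and the 1:10 pair are dropped. Length filters will not catch a target side written in a third language, so spot-checking samples remains necessary.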
Re: [Moses-support] is there a way to remove a bad entry in the phrase table ?
Hi,

You can remove it manually (just edit the text file); there will be no negative consequences.

However, it is not a realistic strategy to try to remove every offending phrase table entry by hand.

-phi

On Tue, Sep 22, 2015 at 4:05 PM, Vincent Nguyen wrote:
> Hi,
>
> I was wondering if, after an analysis of the BLEU-Annotation file, we
> realize that there must be a bad entry in the phrase table,
> we could remove it manually or in some other way?
>
> Thanks.
> V.
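Philipp's manual route can be scripted when there are more than a handful of entries to drop, for example a list of pairs collected while going through the BLEU annotation.

The Python sketch below makes these assumptions:
- The phrase table is a text file, optionally gzip-compressed.
- Its fields are separated by ` ||| `.
- Only exact (source, target) matches are removed.

The function names and file names are hypothetical. If the system loads a binarised table, that table would presumably have to be rebuilt from the edited text file afterwards.

```python
import gzip
from typing import IO, Iterable, Iterator, Set, Tuple

Pair = Tuple[str, str]


def open_table(path: str, mode: str) -> IO[str]:
    """Open a text phrase table for reading ('r') or writing ('w'); '.gz' is handled transparently."""
    if path.endswith(".gz"):
        return gzip.open(path, mode + "t", encoding="utf-8")
    return open(path, mode, encoding="utf-8")


def drop_entries(lines: Iterable[str], bad_pairs: Set[Pair]) -> Iterator[str]:
    """Yield every line whose (source, target) pair is not listed in bad_pairs."""
    for line in lines:
        fields = line.rstrip("\n").split(" ||| ")
        if len(fields) >= 2 and (fields[0], fields[1]) in bad_pairs:
            continue  # exact match on both phrases: leave this entry out
        yield line


def remove_entries(src_path: str, dst_path: str, bad_pairs: Set[Pair]) -> None:
    """Stream src_path to dst_path line by line, skipping the listed entries."""
    with open_table(src_path, "r") as fin, open_table(dst_path, "w") as fout:
        for line in drop_entries(fin, bad_pairs):
            fout.write(line)
```

A call might look like this: `remove_entries("phrase-table.gz", "phrase-table.clean.gz", {('" 1', "One Million Roofs")})`. You would then point moses.ini at the new file. The table is streamed line by line, so memory use stays small even for large tables.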