I am training my decoder on a string-to-tree model, English to Arabic. I am
using these options: -hierarchical -glue-grammar -max-phrase-length 5
-ghkm --extract-options="--UnknownWordMinRelFreq 0.01 --MaxNodes 40
--MaxRuleDepth 7 --MaxRuleSize 7 --AllowUnary"
--score-options="--GoodTuring --LowCountFeature --MinCountHierarchical 2
--MinScore 2:0.0001"

I am planning to filter the duplicates manually, but can you point out
whether I might be doing something wrong, or which option is causing this?
Could duplicate sentences in the training data cause it? Maybe I need to
filter my training set.
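
Before filtering by hand, a quick check along these lines could show how many
lines in a gzipped table are exact duplicates. This is only a sketch under
some assumptions: the function names are made up for illustration and are not
part of Moses, only whole lines are compared, and every distinct line is held
in memory, which may be too much for a large table.

```python
import gzip
from collections import Counter


def count_duplicate_lines(path):
    """Return a Counter of the lines that occur more than once in a gzipped text file."""
    counts = Counter()
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            counts[line.rstrip("\n")] += 1
    # Keep only the lines seen at least twice.
    return Counter({line: n for line, n in counts.items() if n > 1})


def write_deduplicated(src_path, dst_path):
    """Copy src to dst, keeping the first occurrence of each line.

    Returns the number of lines that were dropped as exact repeats.
    """
    seen = set()
    dropped = 0
    with gzip.open(src_path, "rt", encoding="utf-8") as src, \
            gzip.open(dst_path, "wt", encoding="utf-8") as dst:
        for line in src:
            key = line.rstrip("\n")
            if key in seen:
                dropped += 1
                continue
            seen.add(key)
            dst.write(key + "\n")
    return dropped
```

Calling count_duplicate_lines("rule-table.gz") and looking at a few of the
most common entries would also give a concrete example to post to the list.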

On Fri, Jul 1, 2016 at 7:31 PM, Hieu Hoang <hieuho...@gmail.com> wrote:

> It shouldn't. How did you create it? Can you give an example?
>
> Hieu Hoang
> http://www.hoang.co.uk/hieu
>
> On 1 July 2016 at 18:29, Ayah El Maghraby <ayah.elmaghr...@gmail.com>
> wrote:
>
>> I mean that the file rule-table.gz also contains duplicate entries.
>>
>> On Fri, Jul 1, 2016 at 7:27 PM Hieu Hoang <hieuho...@gmail.com> wrote:
>>
>>> The extract.sorted.gz file is not your rule table.
>>>
>>> The training should create another file called phrase-table.*gz. This is
>>> your rule table.
>>> On 30/06/2016 22:55, Ayah El Maghraby wrote:
>>>
>>> Hello,
>>> I have duplicate entries in my rule table, the extract.sorted.gz files. I am
>>> training on a data set of around 1,000,000 lines.
>>> I am translating from English to Arabic. Is this normal?
>>> Will removing duplicates affect my decoder?
>>>
>>> Regards,
>>> Ayah
>>>
>>>
>>> _______________________________________________
>>> Moses-support mailing list
>>> Moses-support@mit.edu
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>
>>>
>>>
>
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
