Re: OpenNLP maxent model trained with wrong encoding

Richard Eckart de Castilho Wed, 02 Mar 2016 00:51:48 -0800

Hi again,

The Spanish and Dutch NER models are also affected; that was just a bit more
difficult to figure out because the models internally lower-case the features.
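For anyone who wants to check a model themselves, here is a minimal Python sketch of the failure mode: UTF-8 text read back as ISO-8859-1. It is not OpenNLP code, and the function names `misread_as_latin1` and `try_repair` are hypothetical. It also suggests one plausible reason lower-cased features make the damage harder to spot. Lower-casing turns "Ã" into "ã", so the garbled token no longer round-trips back to the original word.

```python
from typing import Optional

# Hypothetical helpers for illustration only (not part of OpenNLP).

def misread_as_latin1(text: str) -> str:
    """Encode as UTF-8, then (wrongly) decode the bytes as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


def try_repair(token: str) -> Optional[str]:
    """Undo the mix-up if possible; return None if the token cannot be repaired."""
    try:
        return token.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return None


if __name__ == "__main__":
    garbled = misread_as_latin1("für")
    print(garbled)              # fÃ¼r
    print(try_repair(garbled))  # für

    # A model that lower-cases its features would store "fã¼r" instead;
    # the byte 0xE3 starts a 3-byte UTF-8 sequence, so the repair fails.
    lowered = garbled.lower()
    print(lowered)              # fã¼r
    print(try_repair(lowered))  # None
```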


Cheers,

-- Richard

> On 01.03.2016, at 23:13, Richard Eckart de Castilho wrote:
> 
> Hi all,
> 
> I noticed that the OpenNLP German POS tagger maxent model available from
> SourceForge has been trained with the wrong encoding setting. Apparently the
> input data was UTF-8, but it was read as ISO-8859-1. The German POS perceptron
> model is not affected. I only examined the NER and POS models, not the
> tokenizer or sentence splitter models.
> 
> Best,
> 
> -- Richard
