Re: Steps required for adding support for another language

Rafa Haro Sun, 14 Apr 2019 13:30:46 -0700

By the way, in your case you shouldn't be using the OpenNLP NER engine. You
should use the OpenNLP chunking engine directly, together with the
EntityLinking engine (not the Named Entity Linking one).


On Sun, 14 Apr 2019 at 22:27, Rafa Haro <[email protected]> wrote:

> Yeah, ideally you will have to train OpenNLP models for Polish. But for
> testing, you can force the OpenNLP engines to use the models for a specific
> language (normally English). I would swear you can do that directly in the
> engines' configuration through the Felix console. The content will be
> processed as English and OpenNLP will do its best; for languages with a
> similar syntax, that is sometimes enough to at least get chunks with
> candidate tokens.
>
> Hope that helps
>
> PS: just out of curiosity, because I don't remember right now and I won't
> have a laptop at hand for some days... which engines are involved in the
> fst-linking chain?
>
> On Sun, 14 Apr 2019 at 21:54, Grzegorz Trzeciak <[email protected]>
> wrote:
>
>> OK, I've found a chain that at least captures some DBpedia entities:
>> dbpedia-fst-linking
>> I will be playing with various engine combinations to see which one gets me
>> through the POC best, which leaves me with a question about the more
>> permanent solution.
>>
>> My understanding is that this would require building a language model for
>> OpenNLP; is that correct? Are there other requirements for adding language
>> support? I am trying to estimate the effort required for such a task, so any
>> advice will be helpful.
>>
>> Also if you are aware of any resources that could be helpful, that would
>> be great.
>>
>> Thank you
>>
>> G.
>>
>> On Sun, 14 Apr 2019 at 21:10, Grzegorz Trzeciak <[email protected]>
>> wrote:
>>
>>> I am using the default chain:
>>>
>>>    - *tika* (optional, TikaEngine)
>>>    - *langdetect* (required, LanguageDetectionEnhancementEngine)
>>>    - *opennlp-sentence* (required, OpenNlpSentenceDetectionEngine)
>>>    - *opennlp-token* (required, OpenNlpTokenizerEngine)
>>>    - *opennlp-pos* (required, OpenNlpPosTaggingEngine)
>>>    - *opennlp-ner* (required, NamedEntityExtractionEnhancementEngine)
>>>    - *dbpediaLinking* (required, NamedEntityTaggingEngine)
>>>    - *entityhubExtraction* (required, EntityLinkingEngine)
>>>    - *dbpedia-dereference* (required, EntityDereferenceEngine)
>>>
>>>
>>> I will try disabling langdetect then.
>>>
>>> On Sun, 14 Apr 2019 at 21:08, Rafa Haro <[email protected]> wrote:
>>>
>>>> Hi Grzegorz,
>>>>
>>>> Can you provide details about your enhancement chain? You can probably
>>>> try disabling language detection and forcing English as the language
>>>> for the whole chain.
>>>>
>>>> On Sun, 14 Apr 2019 at 20:52, Grzegorz Trzeciak <[email protected]>
>>>> wrote:
>>>>
>>>> > I need to provide a proof of concept for a customer using the Stanbol
>>>> > enhancer, but the POC needs to be in Polish. Only now have I realised
>>>> > that there is no support for Polish in Stanbol (other than language
>>>> > recognition). At the moment, running the enhancer on a text only
>>>> > returns the recognized language, so my question is twofold:
>>>> >
>>>> > 1. Is there a quick and dirty way of making Stanbol work with the
>>>> > Polish language (for the POC only)?
>>>> > 2. What steps are necessary to implement a proper solution for
>>>> > supporting another language?
>>>> >
>>>> > Thanks
>>>> >
>>>> > Grzegorz Trzeciak
>>>> >
>>>>
>>>
