Re: Steps required for adding support for another language

Grzegorz Trzeciak Sun, 14 Apr 2019 13:33:15 -0700

That would explain why dbpedia-fst-linking worked. Here is the list of
engines in that chain (it uses opennlp-chunker where the default chain uses
opennlp-ner):


   - *tika* ( optional , TikaEngine)
   - *langdetect* ( required , LanguageDetectionEnhancementEngine)
   - *opennlp-sentence* ( required , OpenNlpSentenceDetectionEngine)
   - *opennlp-token* ( required , OpenNlpTokenizerEngine)
   - *opennlp-pos* ( required , OpenNlpPosTaggingEngine)
   - *opennlp-chunker* ( required , OpenNlpChunkingEngine)
   - *dbpedia-fst* ( required , FstLinkingEngine)
   - *dbpedia-dereference* ( required , EntityDereferenceEngine)
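
For anyone who wants to reproduce this, below is a minimal sketch of how text
could be sent to this chain over the enhancer's REST interface. It assumes a
local Stanbol launcher at http://localhost:8080 and that the chain is exposed
under /enhancer/chain/<chain-name>; the helper names (build_enhance_request,
enhance) and the sample sentence are made up for illustration and are not part
of Stanbol.

```python
import urllib.request

# Assumption: a local Stanbol launcher listening on the default port.
STANBOL_BASE = "http://localhost:8080"


def build_enhance_request(text, chain="dbpedia-fst-linking", base=STANBOL_BASE):
    """Build (but do not send) a POST request for one enhancement chain.

    Hypothetical helper: the chain name comes from this thread, the URL
    layout /enhancer/chain/<name> is assumed from a default setup.
    """
    url = f"{base}/enhancer/chain/{chain}"
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=UTF-8",
            "Accept": "text/turtle",  # ask for the enhancements as Turtle RDF
        },
        method="POST",
    )


def enhance(text, chain="dbpedia-fst-linking", base=STANBOL_BASE):
    """Send the text to the chain and return the raw RDF response body."""
    with urllib.request.urlopen(build_enhance_request(text, chain, base)) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Needs a running Stanbol instance; prints the returned enhancements.
    print(enhance("Warszawa jest stolicą Polski."))
```

Swapping the chain argument for another chain name makes it easy to compare
what each engine combination returns for the same Polish text.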


On Sun, 14 Apr 2019 at 22:30, Rafa Haro <rh...@apache.org> wrote:

> By the way, in your case you shouldn't be using the OpenNLP NER engine; you
> should use OpenNLP chunking directly with the EntityLinking engine (not
> Named Entity Linking).
>
> On Sun, 14 Apr 2019 at 22:27, Rafa Haro <rh...@apache.org> wrote:
>
> > Yeah, ideally you will have to train OpenNLP models for Polish. But for
> > testing, you can force the OpenNLP engines to use the models for a
> > specific language (normally English). I'm fairly sure you can do that
> > directly in the engines' configuration through the Felix console. The
> > content will be processed as English and OpenNLP will do its best, but
> > for languages with a similar syntax that is sometimes enough to at least
> > get chunks with candidate tokens.
> >
> > Hope that helps
> >
> > PS: just out of curiosity, because I don't remember right now and I won't
> > have a laptop at hand for a few days: which engines are involved in the
> > fst-linking chain?
> >
> > On Sun, 14 Apr 2019 at 21:54, Grzegorz Trzeciak <gtrzec...@gmail.com>
> > wrote:
> >
> >> OK, I've found a chain that at least captures some DBpedia entities:
> >> dbpedia-fst-linking
> >> I will be playing with various engine combinations to see which one gets
> >> me through the POC best. That leaves me with a question about the more
> >> permanent solution.
> >>
> >> My understanding is that this would require building a language model
> >> for OpenNLP. Is that correct? Are there other requirements for adding
> >> language support? I am trying to estimate the work effort required for
> >> such a task, so any advice will be helpful.
> >>
> >> Also if you are aware of any resources that could be helpful, that would
> >> be great.
> >>
> >> Thank you
> >>
> >> G.
> >>
> >> On Sun, 14 Apr 2019 at 21:10, Grzegorz Trzeciak <gtrzec...@gmail.com>
> >> wrote:
> >>
> >>> Using the default chain:
> >>>
> >>>    - *tika* ( optional , TikaEngine)
> >>>    - *langdetect* ( required , LanguageDetectionEnhancementEngine)
> >>>    - *opennlp-sentence* ( required , OpenNlpSentenceDetectionEngine)
> >>>    - *opennlp-token* ( required , OpenNlpTokenizerEngine)
> >>>    - *opennlp-pos* ( required , OpenNlpPosTaggingEngine)
> >>>    - *opennlp-ner* ( required , NamedEntityExtractionEnhancementEngine)
> >>>    - *dbpediaLinking* ( required , NamedEntityTaggingEngine)
> >>>    - *entityhubExtraction* ( required , EntityLinkingEngine)
> >>>    - *dbpedia-dereference* ( required , EntityDereferenceEngine)
> >>>
> >>>
> >>> I will try disabling langdetect then.
> >>>
> >>> On Sun, 14 Apr 2019 at 21:08, Rafa Haro <rh...@apache.org> wrote:
> >>>
> >>>> Hi Grzegorz,
> >>>>
> >>>> Can you provide details about your enhancement chain? You can probably
> >>>> try disabling language detection and forcing English as the language
> >>>> for the whole chain.
> >>>>
> >>>> On Sun, 14 Apr 2019 at 20:52, Grzegorz Trzeciak <gtrzec...@gmail.com>
> >>>> wrote:
> >>>>
> >>>> > I need to provide a proof of concept for a customer using the Stanbol
> >>>> > enhancer, but the POC needs to be in Polish. Only now have I realised
> >>>> > there is no support for Polish in Stanbol (other than language
> >>>> > recognition). At the moment, running the enhancer on a text only
> >>>> > returns the recognized language, so my question is twofold:
> >>>> >
> >>>> > 1. Is there a quick and dirty way of making Stanbol work with the
> >>>> > Polish language (for the POC only)?
> >>>> > 2. What steps are necessary to implement a correct solution for
> >>>> > supporting another language?
> >>>> >
> >>>> > Thanks
> >>>> >
> >>>> > Grzegorz Trzeciak
> >>>> >
> >>>>
> >>>
>
