Re: [Dbp-spotlight-users] Internazionalization problems with Basque

[email protected] Tue, 04 Feb 2014 04:02:20 -0800

Hello again

Joachim Daiber said that "If you do not provide the models to the training, the statistical backend will learn a dictionary-based spotting model." So if we give the spots to the system ourselves, is it not necessary to build the OpenNLP models for spotting?

And will the statistical disambiguation step not be affected at all? One of the probabilities used in disambiguation is context-based, so it will use the OpenNLP models to tokenize ...

Knowing this, will the disambiguation step also be dictionary-based?

We think that, in the end, it will be a light version for Basque, without the context knowledge.


thanks in advance ;)

ander


On Wed, Jan 29, 2014 at 22:07, Joachim Daiber wrote:

Hi Ander,

the statistical backend currently only supports OpenNLP models. This is simply because they were readily available. So from my point of view there are 2 things you can do:

1. change Spotlight to additionally accept your tool (assuming it's JVM-based)

2. retrain your models with OpenNLP

But regardless, you do not necessarily need those. If you do not provide the models to the training, the statistical backend will learn a dictionary-based spotting model. Depending on the size of the Wikipedia input, this should work equally well (if the Wikipedia is too small, it might be a bit sparse).
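
As a rough illustration of what "dictionary-based spotting" means in general, here is a minimal sketch. It is not Spotlight's actual implementation: the function name `spot`, the `surface_forms` set, the `max_len` limit and the pre-split token list are all assumptions made for the example. It does a greedy longest-match lookup of known surface forms (for instance, names harvested from Wikipedia) over a list of tokens.

```python
def spot(tokens, surface_forms, max_len=5):
    """Greedy longest-match lookup of known surface forms in a token list.

    Returns a list of (start, end, text) tuples, where end is exclusive.
    Hypothetical helper for illustration only.
    """
    spots = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate first so multi-word names win
        # over their shorter prefixes.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate.lower() in surface_forms:
                match = (i, i + n, candidate)
                break
        if match:
            spots.append(match)
            i = match[1]  # continue after the matched span
        else:
            i += 1
    return spots


# Hypothetical dictionary of lowercased surface forms.
surface_forms = {"bilbo", "euskal herria", "donostia"}
tokens = ["Euskal", "Herria", "eta", "Bilbo"]
print(spot(tokens, surface_forms))
# [(0, 2, 'Euskal Herria'), (3, 4, 'Bilbo')]
```

Note that an exact-match lookup like this only finds forms that are literally in the dictionary, which is why a small Wikipedia can leave the model sparse.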


Hope that helps,
Jo

On Wed, Jan 29, 2014 at 3:11 PM, [email protected] wrote:


    Hi spotlight users,

    Our main idea is to apply NED to Basque documents. For this purpose,
    we want to use the DBpedia Spotlight statistical backend.

    We want to create a Spotlight model for the Basque language, but we
    have a "little" problem: we have seen that there isn't any OpenNLP
    model for Basque. We have all the resources, such as a tokenizer,
    chunker, POS tagger, stopwords... but none of the pre-trained OpenNLP
    models for this language.

    Our questions are:

    Is there any other way to use these resources instead of OpenNLP
    models? For example, by integrating our resources into the system
    code and giving their output to DBpedia Spotlight (without OpenNLP
    models).
    Has anyone done something like this before?
    Or
    Do we necessarily need to build an OpenNLP model?

    thanks in advance,

    Ander


    
    _______________________________________________
    Dbp-spotlight-users mailing list
    [email protected]
    https://lists.sourceforge.net/lists/listinfo/dbp-spotlight-users

