Joachim,

For the record, there is some progress at [1] on adding a stemmer/analyser for 
Slovak to DBPedia Spotlight.

Feedback, corrections and comments are very welcome.

I did some initial tests as outlined in the pull request, and I have been able 
to tag some text. Next I need to try adding manual mappings and tuning the tool 
to see if I can get more annotations, and more precise ones. The first results 
seem very promising, though.

Thank you for the quick HOWTO hint below and for your help on this matter.

Best regards

Alberto

[1] https://github.com/dbpedia-spotlight/dbpedia-spotlight/pull/184

On 17 May 2013, at 13:31, Joachim Daiber <[email protected]> wrote:

> Hey Alberto,
> 
> We started indexing for Czech using [2] but did not finish because of the 
> missing stemmer. At the moment we only use Snowball stemmers via Lucene, but 
> we would be more than happy to receive contributions.
> 
> For [2], stemmers need to implement org.dbpedia.spotlight.db.model.Stemmer.
> To make it general enough, you could add a companion object Stemmer, e.g.
> 
>> object Stemmer {
>> 
>>   // Pick a stemmer implementation based on the language code.
>>   def forLanguage(lang: String): Stemmer = {
>>     if (lang == "cs")
>>       ??? // your stemmer
>>     else
>>       ??? // default snowball stemmer
>>   }
>> 
>> }
>> 
> 
> Stemmer.forLanguage would have to be added to 
> org.dbpedia.spotlight.db.CreateSpotlightModel and 
> org.dbpedia.spotlight.db.SpotlightModel as well as to the pignlproc scripts 
> that do the counting in pignlproc.index.GetCounts* 
> 
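
For illustration, the language-based lookup described above can be sketched in
self-contained Scala. This is only a sketch under assumptions: the local
Stemmer trait is a stand-in for org.dbpedia.spotlight.db.model.Stemmer (whose
real interface may differ), and both stemmer objects are hypothetical
placeholders, not a real Czech stemmer and not the Snowball-backed default.

```scala
import java.util.Locale

// Stand-in for org.dbpedia.spotlight.db.model.Stemmer (assumption:
// a single method mapping a token to its stem).
trait Stemmer {
  def stem(token: String): String
}

object Stemmer {

  // Hypothetical placeholder, NOT a real Czech stemmer: it only lowercases.
  private object CzechPlaceholderStemmer extends Stemmer {
    def stem(token: String): String = token.toLowerCase(Locale.ROOT)
  }

  // Hypothetical placeholder for the default stemmer: returns the token unchanged.
  private object DefaultPlaceholderStemmer extends Stemmer {
    def stem(token: String): String = token
  }

  // Choose an implementation by language code, falling back to the default.
  def forLanguage(lang: String): Stemmer = lang match {
    case "cs" => CzechPlaceholderStemmer
    case _    => DefaultPlaceholderStemmer
  }
}
```

A caller would then obtain a stemmer with Stemmer.forLanguage("cs") and apply
stem to each token, so supporting another language only means adding one more
case to the match.
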
> In any case, using [2] and [1] should be the easiest method and give the best 
> results.
> 
> [1] https://github.com/jodaiber/model-quickstarter
> 
> Best,
> Joachim
> 
> 
> On 17.05.2013 at 14:00, Alberto Reggiori wrote:
> 
>> 
>> Hi all
>> 
>> I am in the process of trying out a customised setup of DBPedia-Spotlight for 
>> Slovak, following the instructions at [1][2][3] and possibly configuring/adding 
>> custom tokenisers/stemmers [4][5] (and in parallel perhaps looking at 
>> defining the necessary DBPedia infobox mappings).
>> 
>> Before I duplicate any work, I am wondering if anyone on this list has been 
>> working with Slavonic languages such as Slovak and Czech, and whether there is 
>> any publicly available work/project out there.
>> 
>> Thank you very much in advance for any follow up
>> 
>> 
>> Best regards
>> 
>> Alberto
>> 
>> [1] 
>> https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki/Internationalization-(Lucene-backed-core)
>> [2] 
>> https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki/Internationalization-(DB-backed-core)
>> [3] 
>> https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki/Internationalization
>> [4] 
>> http://vi.ikt.ui.sav.sk/Projekty/Projekty_2008%2F%2F2009/Hana_Pifková_-_Stemer
>> [5] http://www.languagetool.org/languages/
>> _______________________________________________
>> Dbp-spotlight-users mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/dbp-spotlight-users
> 

