Re: question on working with custom vocabularies

Rafa Haro Mon, 11 Jan 2016 10:44:59 -0800

Hi Kata,

Have you overwritten the old Solr index in the datafiles folder, or have you
started from scratch after fixing the encoding of the RDF files?


Just a hint: you can check whether your entities have been indexed by querying
them with the EntityHub API in the Stanbol web interface.
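
A minimal sketch of such a check, assuming Stanbol runs at
http://localhost:8080 and that the custom vocabulary is registered as a
referenced site with the hypothetical ID "myvocab" (substitute your own).
It only builds the URL for the Entityhub "find" endpoint; the parameter
names (name, lang, limit) are given from memory of the Entityhub REST
documentation, so please verify them against your Stanbol version:

```python
from urllib.parse import urlencode


def entityhub_find_url(base_url, site_id, name, lang=None, limit=10):
    """Build a label-search URL for one Entityhub referenced site.

    base_url and site_id are assumptions about the local installation;
    the query parameters follow the Entityhub 'find' service as I
    remember it and should be checked against the deployed version.
    """
    params = {"name": name, "limit": limit}
    if lang is not None:
        params["lang"] = lang
    # urlencode percent-encodes non-ASCII labels (e.g. umlauts) as UTF-8
    return f"{base_url.rstrip('/')}/entityhub/site/{site_id}/find?{urlencode(params)}"


if __name__ == "__main__":
    # "myvocab" is a placeholder site ID, not taken from the original setup
    print(entityhub_find_url("http://localhost:8080", "myvocab", "Berlin", lang="en"))
```

Opening the printed URL in a browser (or with curl) should list matching
entities if the label was indexed. An empty result for a label you know is
in the RDF file would point at the index rather than at the enhancement
chain.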

Hope that helps,
Rafa

On Mon, Jan 11, 2016 at 7:19 PM Lejtovicz, Katalin <
katalin.lejtov...@oeaw.ac.at> wrote:

> Dear All,
>
> I have a problem using custom vocabularies to enhance my content.
> I created an index with Stanbol from a vocabulary, deployed the .jar file,
> copied the Solr index file to the datafiles folder, and created an
> EntityHub Linking Engine plus a weighted chain in which the following
> pipeline was configured: langdetect, opennlp-sentence, opennlp-token,
> opennlp-pos, opennlp-chunker, and then an EntityHub Linking Engine for my
> custom vocab.
>
> It worked fine: when text was pasted into this enhancement chain in the
> Stanbol user interface, entities were found. However, we had an encoding
> problem in the RDF resource from which the index was built, so entities
> with umlauts (e.g. ö, ä) were not found. We corrected the encoding of the
> RDF and I ran the indexing process again with the same config files, but
> with the new RDF resource.
> I deployed again (.jar and Solr zip) and created the EntityHub Linking
> Engine plus the same weighted chain as specified above.
> Now I don't get any results when I paste text into the text field of this
> chain in Stanbol.
>
> I configured logging so that I can see what is happening. The linkable
> and matchable tokens, etc. are determined correctly, e.g. 'Berlin' in the
> sentence 'Berlin is a big city' is marked as a linkable token:
>
> 11.01.2016 16:14:05.667 *DEBUG* [Thread-9]
> org.apache.stanbol.enhancer.engines.entitylinking.impl.SectionData     -
> TokenData: 'Berlin'[linkable=true(linkabkePos=true)|
> matchable=true(matchablePos=true)| alpha=true| seachLength=true|
> upperCase=true]
>
> It is also sent to the Solr index, but no results come back from there:
> 11.01.2016 16:14:05.668 *DEBUG* [Thread-9]
> org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker ---
> preocess Token 0: Berlin (lemma: null) linkable=true, matchable=true |
> chunk: Chunk: [0, 6] Berlin
> 11.01.2016 16:14:05.668 *DEBUG* [Thread-9]
> org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker     -
> 1:'is' (lemma: null) linkable=false, matchable=false
> 11.01.2016 16:14:05.668 *DEBUG* [Thread-9]
> org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker     -
> 2:'a' (lemma: null) linkable=false, matchable=false
> 11.01.2016 16:14:05.668 *DEBUG* [Thread-9]
> org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker   >>
> searchStrings [Berlin]
> 11.01.2016 16:14:05.668 *DEBUG* [Thread-9]
> org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker    >
> request entities [0-20] entities ...
> 11.01.2016 16:14:05.669 *DEBUG* [Thread-9]
> org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker       <
> found 0 entities ...
>
> I also looked at solr.log; the query looks like this:
> (((@en\/rdfs\:label\/:"Berlin")) OR ((@\/rdfs\:label\/:"Berlin")))
> hits=0 status=0 QTime=1
>
>
> I installed Solr and copied the index file over to execute the above
> query. It does not return any Solr documents, but the following one does:
> (((_\!@en\/rdfs\:label\/:" Berlin ")) OR ((_\!@\/rdfs\:label\/:" Berlin
> ")))
>
> Can someone help me figure out what I am missing?
> Is it a configuration issue when I am creating the index? (Strangely, I
> used the same config files for the incorrectly encoded RDF resource
> file, and that index worked.)
> Or is it a Stanbol issue?
>
> Thanks for any hints/help!
>
> Best regards,
> Kata
>
>
