question on working with custom vocabularies

Lejtovicz, Katalin Mon, 11 Jan 2016 10:19:16 -0800

Dear All,

I have a problem using custom vocabularies to enhance my content.
I created an index with Stanbol from a vocabulary, deployed the .jar file, 
copied the Solr index file to the datafiles folder, and created an EntityHub 
Linking Engine plus a weighted chain with the following pipeline configured: 
langdetect, opennlp-sentence, opennlp-token, opennlp-pos, opennlp-chunker, 
and then the EntityHub Linking Engine for my custom vocab.


It worked fine: when text was pasted into this enhancement chain in the 
Stanbol user interface, entities were found. However, we had an encoding 
problem in the RDF resource from which the index was built, so entities with 
umlauts (e.g. ö, ä) were not found. We corrected the encoding of the RDF and 
I ran the indexing process again with the same config files, but with the new 
RDF resource.
I again deployed both files (.jar and Solr zip) and created the EntityHub 
Linking Engine plus the same weighted chain as specified above.
Now I don't get any results when I paste text into the text field of this 
chain in Stanbol.
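
To illustrate the kind of encoding check meant above, here is a minimal 
sketch. It assumes the corrected RDF file is supposed to be UTF-8 (the 
encoding is not stated anywhere in this message), and the function name is 
made up for illustration only:

```python
def looks_like_clean_utf8(data: bytes) -> bool:
    """Return True if data decodes as UTF-8 and shows no typical umlaut mojibake.

    Assumption: the RDF resource is meant to be UTF-8 encoded.
    """
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # e.g. a Latin-1 encoded file containing umlauts
        return False
    # "Ã¶", "Ã¤", "Ã¼" appear when UTF-8 bytes for ö, ä, ü were misread
    # as Latin-1 and then saved again as UTF-8.
    return not any(marker in text for marker in ("Ã¶", "Ã¤", "Ã¼"))


if __name__ == "__main__":
    print(looks_like_clean_utf8("Köln".encode("utf-8")))
    print(looks_like_clean_utf8("Köln".encode("latin-1")))
```

In practice the bytes would be read from the RDF file with open(path, "rb"); 
the check only catches the most common umlaut problems.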

I configured logging so that I can see what is happening. The linkable and 
matchable tokens, etc. are determined correctly, e.g. 'Berlin' in the sentence 
'Berlin is a big city' is marked as a linkable token:

11.01.2016 16:14:05.667 *DEBUG* [Thread-9] 
org.apache.stanbol.enhancer.engines.entitylinking.impl.SectionData     - 
TokenData: 'Berlin'[linkable=true(linkabkePos=true)| 
matchable=true(matchablePos=true)| alpha=true| seachLength=true| upperCase=true]

It is also sent to the Solr index, but no results come back from there:
11.01.2016 16:14:05.668 *DEBUG* [Thread-9] 
org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker --- 
preocess Token 0: Berlin (lemma: null) linkable=true, matchable=true | chunk: 
Chunk: [0, 6] Berlin
11.01.2016 16:14:05.668 *DEBUG* [Thread-9] 
org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker     - 
1:'is' (lemma: null) linkable=false, matchable=false
11.01.2016 16:14:05.668 *DEBUG* [Thread-9] 
org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker     - 2:'a' 
(lemma: null) linkable=false, matchable=false
11.01.2016 16:14:05.668 *DEBUG* [Thread-9] 
org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker   >> 
searchStrings [Berlin]
11.01.2016 16:14:05.668 *DEBUG* [Thread-9] 
org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker    > 
request entities [0-20] entities ...
11.01.2016 16:14:05.669 *DEBUG* [Thread-9] 
org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker       < 
found 0 entities ...

I also looked at solr.log; the query looks like this:
(((@en\/rdfs\:label\/:"Berlin")) OR ((@\/rdfs\:label\/:"Berlin")))
hits=0 status=0 QTime=1


I installed Solr and copied the index file over to execute the above query. It 
does not return any Solr documents, but the following one does:
(((_\!@en\/rdfs\:label\/:" Berlin ")) OR ((_\!@\/rdfs\:label\/:" Berlin ")))
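
The two queries differ only in the "_!" prefix on the field name. A minimal 
sketch for building both variants, so they can be run side by side against 
the standalone Solr, could look like this. The Solr URL and the core name 
"customvocab" are hypothetical, the helper names are made up, and the search 
term is used without the surrounding spaces shown above:

```python
from urllib.parse import urlencode

# Hypothetical location of the standalone Solr core holding the copied index.
SOLR_SELECT = "http://localhost:8983/solr/customvocab/select"


def escape_field(name: str) -> str:
    """Backslash-escape the characters '/', ':' and '!' in a field name."""
    return "".join("\\" + c if c in "/:!" else c for c in name)


def label_query(prefix: str, term: str) -> str:
    """Build the rdfs:label query for the English and the language-less field."""
    fields = [f"{prefix}@en/rdfs:label/", f"{prefix}@/rdfs:label/"]
    clauses = [f'(({escape_field(field)}:"{term}"))' for field in fields]
    return "(" + " OR ".join(clauses) + ")"


def select_url(query: str) -> str:
    """Return a select URL for the query (nothing is sent over the network)."""
    return SOLR_SELECT + "?" + urlencode({"q": query, "rows": 5, "wt": "json"})


if __name__ == "__main__":
    for prefix in ("", "_!"):
        query = label_query(prefix, "Berlin")
        print(query)
        print(select_url(query))
```

Opening both URLs in a browser and comparing numFound shows which field name 
the index actually contains.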

Can someone help me figure out what I am missing?
Is it a configuration issue when I am creating the index? (What is strange is 
that I used the same config files for the incorrectly encoded RDF resource 
file, and that index worked.)
Or is it a Stanbol issue?

Thanks for any hints/help!

Best regards,
Kata
