Hi Kata,

Have you overwritten the old Solr index in the datafiles folder, or did you start from scratch after fixing the encoding of the RDF files?

Just a hint: you can check whether your entities have been indexed by querying them with the EntityHub API in the Stanbol web interface.

Hope that helps,
Rafa

On Mon, Jan 11, 2016 at 7:19 PM Lejtovicz, Katalin <katalin.lejtov...@oeaw.ac.at> wrote:

> Dear All,
>
> I have a problem with using custom vocabularies to enhance my content.
> I created an index with Stanbol from a vocabulary, deployed the .jar file,
> copied the Solr index file to the datafiles folder, and created an
> EntityHub Linking Engine plus a weighted chain with the following
> pipeline: langdetect, opennlp-sentence, opennlp-token, opennlp-pos,
> opennlp-chunker, and then an EntityHub Linking Engine for my custom vocab.
>
> It worked fine: when text was pasted into this enhancement chain in the
> Stanbol user interface, entities were found. However, we had an encoding
> problem in the RDF resource from which the index was built, so entities
> with umlauts (e.g. ö, ä) were not found. We corrected the encoding of the
> RDF and I ran the indexing process again with the same config files, but
> with the new RDF resource.
> I deployed again (.jar and Solr zip) and created the EntityHub Linking
> Engine plus the same weighted chain as specified above.
> Now I don't get any results when I paste text into the text field of this
> chain in Stanbol.
>
> I configured log files so that I can see what is happening. The linkable
> and matchable tokens, etc. are defined correctly, e.g. 'Berlin' in the
> sentence 'Berlin is a big city' is defined as a linkable token:
>
> 11.01.2016 16:14:05.667 *DEBUG* [Thread-9] org.apache.stanbol.enhancer.engines.entitylinking.impl.SectionData - TokenData: 'Berlin'[linkable=true(linkabkePos=true)| matchable=true(matchablePos=true)| alpha=true| seachLength=true| upperCase=true]
>
> It is also sent to the Solr index, but no results come back from there:
>
> 11.01.2016 16:14:05.668 *DEBUG* [Thread-9] org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker --- preocess Token 0: Berlin (lemma: null) linkable=true, matchable=true | chunk: Chunk: [0, 6] Berlin
> 11.01.2016 16:14:05.668 *DEBUG* [Thread-9] org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker - 1:'is' (lemma: null) linkable=false, matchable=false
> 11.01.2016 16:14:05.668 *DEBUG* [Thread-9] org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker - 2:'a' (lemma: null) linkable=false, matchable=false
> 11.01.2016 16:14:05.668 *DEBUG* [Thread-9] org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker >> searchStrings [Berlin]
> 11.01.2016 16:14:05.668 *DEBUG* [Thread-9] org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker > request entities [0-20] entities ...
> 11.01.2016 16:14:05.669 *DEBUG* [Thread-9] org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker < found 0 entities ...
>
> I also looked at solr.log; the query looks like this:
>
> (((@en\/rdfs\:label\/:"Berlin")) OR ((@\/rdfs\:label\/:"Berlin")))
> hits=0 status=0 QTime=1
>
> I installed Solr and copied the index file over to execute the above
> query. It does not return any Solr documents, but the following one does:
>
> (((_\!@en\/rdfs\:label\/:" Berlin ")) OR ((_\!@\/rdfs\:label\/:" Berlin ")))
>
> Can someone help me figure out what I am missing?
> Is it a configuration issue when I am creating the index? (The strange
> thing is that I used the same config files for the incorrectly encoded
> RDF resource file, and that index worked.)
> Or is it a Stanbol issue?
>
> Thanks for any hints/help!
>
> Best regards,
> Kata
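
The check suggested in the reply can also be scripted against the Entityhub REST interface rather than done through the web pages. The following is only a minimal sketch under stated assumptions: Stanbol is assumed to run at http://localhost:8080, and the custom vocabulary is assumed to be registered as a referenced site under the hypothetical ID `myvocab` (neither value comes from the thread). It builds a POST request for the site's `find` endpoint, searching `rdfs:label` for a name such as "Berlin".

```python
from urllib.parse import urlencode
from urllib.request import Request

# Field searched by the linking engine according to the solr.log query in the thread.
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def build_find_request(base_url, site_id, name, lang=None, limit=10):
    """Build a POST request for /entityhub/site/{site_id}/find.

    base_url and site_id are assumptions for illustration; use the
    values of your own Stanbol installation and referenced site.
    """
    params = {"name": name, "field": RDFS_LABEL, "limit": str(limit)}
    if lang:
        params["lang"] = lang
    url = f"{base_url.rstrip('/')}/entityhub/site/{site_id}/find"
    return Request(
        url,
        data=urlencode(params).encode("utf-8"),
        headers={"Accept": "application/json"},
        method="POST",
    )


# Hypothetical server address and site ID; nothing is sent at this point.
req = build_find_request("http://localhost:8080", "myvocab", "Berlin", lang="en")
# To actually run the query against a live server:
#   from urllib.request import urlopen
#   print(urlopen(req).read().decode("utf-8"))
```

If the response lists no results for a label that is known to be in the RDF data, the problem is more likely in the index itself than in the enhancement chain configuration.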