Hi Rafa, Thanks for the hint, I will have a look at it!
Best,
Kata

-----Original Message-----
From: Rafa Haro [mailto:rh...@apache.org]
Sent: Tuesday, January 12, 2016 12:21 PM
To: Lejtovicz, Katalin <katalin.lejtov...@oeaw.ac.at>; dev@stanbol.apache.org
Subject: Re: Re: question on working with custom vocabularies

Hi Kata,

Probably there is a problem with your linking configuration. The first thing I would check is whether you have labels for the language that Stanbol detects automatically. If your instance is available at a public URL, I could take a look if you want.

Cheers,
Rafa

On Tue, Jan 12, 2016 at 11:36 AM Lejtovicz, Katalin <katalin.lejtov...@oeaw.ac.at> wrote:

> Hi Rafa,
>
> Thanks for your reply!
>
> The index file was deleted and the new one copied to the datafiles folder, but on another machine I also tried out a new installation of Stanbol and copied the index file over, and it didn’t work there either.
>
> The EntityHub works via the API; I checked that first when I noticed the problem, and I get back my entities.
>
> Do you have any idea what the problem could be? The index contains the entities, and I can also query them via EntityHub, but the solr query seems to be ‘incorrect’ (the first one, which is logged by Stanbol, doesn’t return any results, but the second, which I tried out myself, works fine).
>
> Thanks in advance!
>
> Best regards,
> Kata
>
> >Hi Kata,
> >
> >Have you overwritten the old solr index in the datafiles folder, or have you started from scratch after fixing the encoding of the RDF files?
> >
> >Just a hint: you can check whether your entities have been indexed by querying them with the EntityHub API in the Stanbol web interface.
> >
> >Hope that helps,
> >Rafa
> >
> >On Mon, Jan 11, 2016 at 7:19 PM Lejtovicz, Katalin <katalin.lejtov...@oeaw.ac.at> wrote:
> >
> >> Dear All,
> >>
> >> I have a problem with using custom vocabularies to enhance my content.
> >> I created an index with Stanbol from a vocabulary, deployed the .jar file and copied the solr index file to the datafiles folder, and created an EntityHub Linking Engine, plus a weighted chain, where the following pipeline was configured: langdetect, opennlp-sentence, opennlp-token, opennlp-pos, opennlp-chunker, and then an EntityHub Linking Engine for my custom vocab.
> >>
> >> It worked fine: when text was pasted into this enhancement chain in the Stanbol user interface, entities were found. However, we had an encoding problem in the RDF resource from which the index was built, so entities with umlauts (e.g. ö, ä) were not found. We corrected the encoding of the RDF and I ran the indexing process again with the same config files, but with the new RDF resource.
> >> I deployed again (.jar and solr zip), and created the EntityHub Linking Engine, plus the same Weighted Chain as specified above.
> >> Now I don't get any results when I paste text into the text field of this chain in Stanbol.
> >>
> >> I configured log files so that I can see what is happening. The linkable, matchable tokens, etc. are defined correctly, e.g.
> >> 'Berlin' in the sentence 'Berlin is a big city' is defined as linkable token:
> >>
> >> 11.01.2016 16:14:05.667 *DEBUG* [Thread-9] org.apache.stanbol.enhancer.engines.entitylinking.impl.SectionData - TokenData: 'Berlin'[linkable=true(linkabkePos=true)| matchable=true(matchablePos=true)| alpha=true| seachLength=true| upperCase=true]
> >>
> >> Also it is sent to the solr index, but from there, no results come back:
> >> 11.01.2016 16:14:05.668 *DEBUG* [Thread-9] org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker --- preocess Token 0: Berlin (lemma: null) linkable=true, matchable=true | chunk: Chunk: [0, 6] Berlin
> >> 11.01.2016 16:14:05.668 *DEBUG* [Thread-9] org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker - 1:'is' (lemma: null) linkable=false, matchable=false
> >> 11.01.2016 16:14:05.668 *DEBUG* [Thread-9] org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker - 2:'a' (lemma: null) linkable=false, matchable=false
> >> 11.01.2016 16:14:05.668 *DEBUG* [Thread-9] org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker >>>> searchStrings [Berlin]
> >> 11.01.2016 16:14:05.668 *DEBUG* [Thread-9] org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker >> request entities [0-20] entities ...
> >> 11.01.2016 16:14:05.669 *DEBUG* [Thread-9] org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker < found 0 entities ...
> >>
> >> I also looked at the solr.log, the query looks like this:
> >> (((@en\/rdfs\:label\/:"Berlin")) OR ((@\/rdfs\:label\/:"Berlin"))) hits=0 status=0 QTime=1
> >>
> >> I installed solr and copied the index file over to execute the above query.
> >> It does not return any Solr documents, but the following one does:
> >> (((_\!@en\/rdfs\:label\/:" Berlin ")) OR ((_\!@\/rdfs\:label\/:" Berlin ")))
> >>
> >> Can someone help me find out what I am missing?
> >> Is it a configuration issue when I am creating the index? (Strangely, I used the same config files for the incorrectly encoded RDF resource file, and that index worked.)
> >> Or is it a Stanbol issue?
> >>
> >> Thanks for any hints/help!
> >>
> >> Best regards,
> >> Kata
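
The two Solr queries quoted above differ only in the `_!` prefix on the field names. For anyone who wants to repeat the comparison against a standalone Solr, here is a minimal sketch that rebuilds both query strings and the matching select URLs. The base URL `http://localhost:8983/solr` and the core name `myvocab` are hypothetical placeholders, the helper names are made up for illustration, and the `_!` prefix is treated as an opaque string because the thread does not establish what it means. The script only builds URLs; it does not contact Solr.

```python
from urllib.parse import urlencode

# Characters that the Solr standard query parser treats as special
# and that therefore need a backslash when they occur in a field name.
SOLR_SPECIAL = set('\\+-!(){}[]^"~*?:/&|')


def escape(text: str) -> str:
    """Backslash-escape Solr special characters in a field name."""
    return "".join("\\" + c if c in SOLR_SPECIAL else c for c in text)


def label_query(term: str, lang: str, prefix: str = "") -> str:
    """Build a label query in the shape seen in solr.log.

    One clause targets the language-specific label field, the other
    the field without a language. `prefix` is an optional string put
    in front of both field names (e.g. "_!"). `term` is assumed to be
    a plain word without quotes or backslashes.
    """
    fields = [f"{prefix}@{lang}/rdfs:label/", f"{prefix}@/rdfs:label/"]
    clauses = [f'(({escape(field)}:"{term}"))' for field in fields]
    return "(" + " OR ".join(clauses) + ")"


def select_url(base: str, core: str, query: str) -> str:
    """Build a URL for Solr's /select handler (rows=0: only the hit count)."""
    params = urlencode({"q": query, "rows": 0, "wt": "json"})
    return f"{base}/{core}/select?{params}"


if __name__ == "__main__":
    base, core = "http://localhost:8983/solr", "myvocab"  # hypothetical
    for prefix in ("", "_!"):
        query = label_query("Berlin", "en", prefix)
        print(query)
        print(select_url(base, core, query))
```

Opening both printed URLs in a browser and comparing `numFound` in the responses shows which field-name variant the index actually contains.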