Thanks, Rupert, for your help. I will post some more questions in the days to come as I explore further.

Thanks again,
tarandeep

On Tue, Jun 4, 2013 at 3:22 PM, Rupert Westenthaler <[email protected]> wrote:

> On Tue, Jun 4, 2013 at 9:52 AM, Sawhney, Tarandeep Singh
> <[email protected]> wrote:
> > Thanks so much Rupert for giving your valuable inputs, it really helped.
> >
> > You responded below "*Semantic Search in Stanbol is defined as searches
> > over the document space. So with the Contenthub you will be able to perform
> > queries for all Documents that do mention a Person and a Place.*" So how
> > is semantic search in Stanbol different from keyword-based search
> > on documents, say, searching a set of documents for the keyword "Russia"?
>
> You can do keyword searches by using
>
> * fise:selected-text values of fise:TextAnnotation and/or
> * fise:entity-label values of fise:EntityAnnotation and/or
> * the labels of the Entities referenced by fise:EntityAnnotation
>   (fise:entity-reference)
>
> You can also do entity searches by using the URIs of the Entities.
> This becomes especially handy if
>
> * the search box for the users supports entity suggestions
> * faceted browsing is used, as this allows building facets over
>   Entities and not only labels
>
> In the end all depends on the LDPath program used in the
> configuration for the Contenthub.
>
> > Also, when terms in enhancement results, say for example "IBM", are linked
> > with an Entityhub entity, where is this linkage stored? Is it in the
> > enhancement RDF itself before it is stored in Clerezza by the Contenthub?
>
> The fise:EntityAnnotation [1] is used to represent this information.
> Note also the "entityhub:site" property. This is present if the Entity
> is originating from an Entityhub Site.
>
> For the Contenthub:
>
> * All RDF data are stored in the Clerezza TripleStore used by the
>   Contenthub.
> * For the semantic index (Solr) it depends on the LDPath program
>   defined in its configuration
>
> best
> Rupert
>
> [1] http://stanbol.staging.apache.org/docs/trunk/components/enhancer/enhancementstructure.html#fiseentityannotation
>
> > If you could please provide your inputs.
> >
> > Thanks and warm regards
> > tarandeep
> >
> > On Mon, Jun 3, 2013 at 7:24 PM, Rupert Westenthaler <[email protected]> wrote:
> >
> >> On Mon, Jun 3, 2013 at 11:40 AM, Sawhney, Tarandeep Singh
> >> <[email protected]> wrote:
> >> > Hi
> >> >
> >> > I am new to Stanbol and trying to understand its offerings.
> >> >
> >> > I have a few questions; may I request you to please provide your valuable
> >> > inputs so I understand things better and faster :-) The questions below are
> >> > very beginner level, so please bear with me.
> >> >
> >> > (1) When a user edits marked-up data and defines/disambiguates entities and
> >> > then saves it, say from a VIE-type editor, what happens in the background?
> >> > Is the RDF stored in the Entityhub? The text is stored in the Contenthub; then
> >> > how do semantic indexes get created, and on what? On the text or on the RDF
> >> > metadata? In what scenarios would we need custom semantic indexes rather than
> >> > the default semantic indexes, and how would they be created by the system?
> >>
> >> By default nothing of those. If you want to store Entities
> >> acknowledged by users you will need to call the RESTful API of the
> >> Entityhub (typically a ManagedSite created for that reason). If you
> >> send documents to the Contenthub (instead of the Enhancer) the text
> >> and all enhancements will be stored and semantically indexed. In this case
> >> you can also get the RDF enhancement results via a RESTful service and
> >> display them in a VIE-type editor. Documents sent to the Enhancer will
> >> not be included in the Contenthub.
> >>
> >> > (2) Is RDF stored in the Entityhub? Then what is the Stanbol fact-store and
> >> > what does it store?
> >> > Or does the Entityhub use the fact-store?
> >>
> >> The Entityhub does not store RDF. It stores Entities - in RDF language
> >> an entity is defined as a URI and all outgoing relations (similar to
> >> the definition of Linked Data). When loading RDF data into the Entityhub
> >> one needs to consider that the Entityhub does not support bNodes.
> >>
> >> > (3) What is the Stanbol SPARQL editor, and does it run on top of the Entityhub?
> >>
> >> It runs on top of Apache Clerezza. In case users do use a Clerezza
> >> TripleStore (ClerezzaYard) as backend for an Entityhub Site, you can
> >> also access those data via SPARQL. However typically the Apache Solr
> >> based implementation (SolrYard) is used by the Entityhub. In this case
> >> you can not perform SPARQL queries over the data in the Entityhub.
> >>
> >> The Contenthub also stores the enhancement results in a Clerezza
> >> TripleStore. So you can perform SPARQL queries over the data in the
> >> Contenthub.
> >>
> >> > (4) If I were to integrate something like RelFinder with Stanbol, and
> >> > RelFinder operates on RDF data, where will it get the RDF data from? Is it
> >> > from the Entityhub?
> >>
> >> As I stated above, you could use the ClerezzaYard to store the data of
> >> the Entityhub. However this would badly affect the performance of the
> >> Stanbol Enhancer when linking against those data (because Solr is much
> >> better with label based queries). Another option would be to use the
> >> Entityhub FieldQuery instead of SPARQL to obtain required information
> >> from the Entityhub. The FieldQuery interface works regardless of the
> >> storage backend.
> >>
> >> > (5) What is semantic search? If it is searching entities and relationships
> >> > (which are stored in the Entityhub in the form of a linked data cloud) then
> >> > what is the role of the semantic index, and why is it said that the Contenthub
> >> > enables semantic search?
> >> > What are the types of queries we can fire using semantic search?
> >>
> >> RelFinder tries to "find" relations between Entities. In that way it
> >> provides search / navigation support in the knowledge base. Semantic
> >> Search in Stanbol is defined as searches over the document space. So
> >> with the Contenthub you will be able to perform queries for all
> >> Documents that do mention a Person and a Place.
> >>
> >> > (6) Can I pass a PDF/Word document to the Enhancer to generate metadata?
> >>
> >> Yes. Just make sure to include the Apache Tika Engine [1] in your
> >> Enhancement Chain.
> >>
> >> [1] http://stanbol.apache.org/docs/trunk/components/enhancer/engines/tikaengine
> >>
> >> > (7) How can I make the Enhancer extract my domain entities? What steps are
> >> > needed at a high level?
> >>
> >> [2] gives a good overview about that. Typically you can start by
> >> configuring a ManagedSite [3] and uploading your RDF data via the
> >> RESTful interface. Next you will need to configure an
> >> EntityhubLinkingEngine [4] for this ManagedSite. Finally you need to
> >> configure an Enhancement Chain (preferably a Weighted Chain) that
> >> includes tika, langdetect, opennlp-sentence, opennlp-token,
> >> opennlp-pos, opennlp-chunker and {your-entityhub-linking-engine}.
> >> After that your Enhancement Chain will be available in the RESTful
> >> Endpoint of the Stanbol Enhancer (enhancer/chain/{name-of-your-chain}).
> >>
> >> If you want to link against several vocabularies you can configure
> >> multiple ManagedSites and EntityhubLinkingEngines. If you want to have
> >> a single Enhancement Chain that links against all of those, just add
> >> all your EntityhubLinkingEngines to a single chain.
> >>
> >> best
> >> Rupert
> >>
> >> [2] http://stanbol.apache.org/docs/trunk/customvocabulary.html
> >> [3] http://stanbol.apache.org/docs/trunk/components/entityhub/managedsite.html
> >> [4] http://stanbol.apache.org/docs/trunk/components/enhancer/engines/entityhublinking
> >>
> >> > thanks in advance
> >> > taran
> >> >
> >> > --
> >> >
> >> > "This e-mail and any attachments transmitted with it are for the sole use
> >> > of the intended recipient(s) and may contain confidential, proprietary or
> >> > privileged information. If you are not the intended recipient, please
> >> > contact the sender by reply e-mail and destroy all copies of the original
> >> > message. Any unauthorized review, use, disclosure, dissemination,
> >> > forwarding, printing or copying of this e-mail or any action taken in
> >> > reliance on this e-mail is strictly prohibited and may be unlawful."
> >>
> >> --
> >> | Rupert Westenthaler [email protected]
> >> | Bodenlehenstraße 11 ++43-699-11108907
> >> | A-5500 Bischofshofen
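
For reference, the answer to question (7) in the quoted thread says that a configured Enhancement Chain becomes available under the Enhancer's RESTful endpoint at enhancer/chain/ followed by the chain name. A minimal Python sketch of calling such a chain might look like the following. The base URL (http://localhost:8080), the chain name "my-domain-chain", the sample sentence and the requested RDF serialization are assumptions for illustration only; none of them come from the thread. The sketch only builds the request, so it can be inspected without a running Stanbol instance.

```python
from urllib.parse import quote
from urllib.request import Request


def build_enhancer_request(base_url, chain_name, text, accept="application/rdf+xml"):
    """Build (but do not send) a POST request for a named Enhancement Chain."""
    # Endpoint pattern from the thread: enhancer/chain/<chain name>
    url = "{}/enhancer/chain/{}".format(
        base_url.rstrip("/"), quote(chain_name, safe="")
    )
    headers = {
        # Plain text is posted here; a chain that includes the Tika engine
        # is what the thread suggests for PDF/Word input.
        "Content-Type": "text/plain; charset=UTF-8",
        # Assumption: ask for the enhancement results as RDF/XML.
        "Accept": accept,
    }
    return Request(url, data=text.encode("utf-8"), headers=headers, method="POST")


request = build_enhancer_request(
    "http://localhost:8080",  # assumed address of a local Stanbol launcher
    "my-domain-chain",        # hypothetical chain name
    "IBM opened a new office in Russia.",
)

# To send it against a running instance:
#   from urllib.request import urlopen
#   with urlopen(request) as response:
#       rdf = response.read().decode("utf-8")
```

The returned RDF would then contain the fise:TextAnnotation and fise:EntityAnnotation resources discussed in the thread, which a client can parse with whatever RDF library it prefers.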
