On Mon, Jun 3, 2013 at 11:40 AM, Sawhney, Tarandeep Singh
<[email protected]> wrote:
> Hi
>
> I am new to stanbol and trying to understand its offerings.
>
> i have few questions, may i request to please provide your valuable inputs
> so i understand things better and faster :-) Below questions are
> very beginner level, so please bear.
>
> (1) When user edits marked up data and defines/disambiguates entities and
> then saves it say from VIE type editor, what happens in the background ?
> does RDF is stored in entityhub? text is stored in contenthub, then how
> semantic indexes gets created and on what ? on text or on RDF metadata ? In
> what scenarios we would need custom semantic indexes and not default
> semantic indexes and how would they be created by the system ?

By default, none of those things happen. If you want to store Entities
confirmed by users, you will need to call the RESTful API of the
Entityhub (typically on a ManagedSite created for that purpose). If you
send documents to the Contenthub (instead of the Enhancer), the text
and all enhancements will be stored and semantically indexed. In this
case you can also get the RDF enhancement results via a RESTful service
and display them in a VIE type editor. Documents sent to the Enhancer
are not stored in the Contenthub.
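
To illustrate the first point, here is a minimal Python sketch of how a
client could prepare the call that stores a user-confirmed Entity in a
ManagedSite. The "/entityhub/site/{site}/entity" path, the POST method,
the RDF/XML content type and the function name are assumptions made for
illustration; please check them against the ManagedSite documentation
of your Stanbol version. The sketch only builds the request and does
not send it.

```python
from urllib.parse import quote
from urllib.request import Request


def build_entity_upload(base_url, site_id, rdf_xml):
    """Build (but do not send) a request that uploads RDF/XML to a ManagedSite.

    Assumption: the site accepts entities at /entityhub/site/{site}/entity.
    """
    url = "{}/entityhub/site/{}/entity".format(
        base_url.rstrip("/"), quote(site_id, safe=""))
    return Request(
        url,
        data=rdf_xml.encode("utf-8"),
        headers={"Content-Type": "application/rdf+xml"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen() would perform
the actual upload against a running Stanbol instance.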

>
> (2) Is RDF stored in entityhub ? then what is stanbol fact-store and what
> it stores ? OR entityhub uses fact-store ?

The Entityhub does not store RDF. It stores Entities. In RDF terms,
an Entity is defined as a URI and all of its outgoing relations
(similar to the definition of Linked Data). When loading RDF data into
the Entityhub, keep in mind that the Entityhub does not support bNodes.

>
> (3) What is stanbol SPARQL editor and does it run on top of entityhub ?

It runs on top of Apache Clerezza. If you use a Clerezza TripleStore
(ClerezzaYard) as the backend for an Entityhub Site, you can also
access that data via SPARQL. However, the Entityhub typically uses the
Apache Solr based implementation (SolrYard). In this case you cannot
perform SPARQL queries over the data in the Entityhub.

The Contenthub also stores the enhancement results in a Clerezza
TripleStore. So you can perform SPARQL queries over the data in the
Contenthub.

>
> (4) If i were to integrate something line Relfinder with stanbol, and
> relfinder operates on RDF data, where it will get RDF data from ? Is it
> from Entityhub ?

As I stated above, you could use the ClerezzaYard to store the data of
the Entityhub. However, this would badly affect the performance of the
Stanbol Enhancer when linking against that data (because Solr is much
better at label based queries). Another option would be to use the
Entityhub FieldQuery instead of SPARQL to obtain the required
information from the Entityhub. The FieldQuery interface works
regardless of the storage backend.
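
As a rough illustration of the FieldQuery option, the following Python
sketch builds a JSON query that selects entities by label. The JSON
keys ("selected", "constraints", "patternType", ...) and the
"/entityhub/site/{site}/query" path are written from memory of the
Entityhub documentation and should be treated as assumptions; the
function name is made up for this example. The sketch only builds the
request and does not send it.

```python
import json
from urllib.parse import quote
from urllib.request import Request

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def build_field_query(base_url, site_id, label_pattern, limit=10):
    """Build (but do not send) a label based FieldQuery for one Entityhub Site."""
    query = {
        "selected": [RDFS_LABEL],   # fields to include in the results
        "offset": 0,
        "limit": limit,
        "constraints": [{
            "type": "text",
            "patternType": "wildcard",
            "text": label_pattern,  # e.g. "Par*"
            "field": RDFS_LABEL,
        }],
    }
    url = "{}/entityhub/site/{}/query".format(
        base_url.rstrip("/"), quote(site_id, safe=""))
    return Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Accept": "application/json"},
        method="POST",
    )
```

Because the query is expressed against fields rather than triples, the
same request should work whether the Site is backed by a SolrYard or a
ClerezzaYard.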

>
> (5) What is semantic search ? if it is searching entities and relationships
> (which are stored in entityhub in the form of linkeddata cloud) then what
> is the role of semantic index and why it is said that content hub enables
> semantic search ? What are the type of queries we can fire using semantic
> search ?

Relfinder tries to "find" relations between Entities. That way it
provides search / navigation support in the knowledge base. Semantic
Search in Stanbol is defined as search over the document space. So
with the Contenthub you will be able to perform queries such as "all
Documents that mention both a Person and a Place".

>
> (6) Can i pass pdf/word document to enhancer to generate metadata ?

Yes. Just make sure to include the Apache Tika Engine [1] in your
Enhancement Chain.

[1] http://stanbol.apache.org/docs/trunk/components/enhancer/engines/tikaengine
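
A small Python sketch of what sending a binary document could look
like on the client side: the important part is to declare the real
media type of the file so that the Tika Engine can convert it to plain
text. The function name and the requested "application/rdf+xml"
response format are assumptions for illustration. The sketch only
builds the request and does not send it.

```python
import mimetypes
from urllib.request import Request


def build_document_request(enhancer_url, filename, payload):
    """Build (but do not send) an Enhancer request for a binary document."""
    # Derive the media type from the file extension (e.g. .pdf).
    content_type, _ = mimetypes.guess_type(filename)
    if content_type is None:
        # Unknown extension: fall back to a generic binary type.
        content_type = "application/octet-stream"
    return Request(
        enhancer_url,
        data=payload,
        headers={"Content-Type": content_type,
                 "Accept": "application/rdf+xml"},
        method="POST",
    )
```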

>
> (7) how can i make enhancer extract my domain entities, what steps are
> needed at high level ?

[2] gives a good overview of that. Typically you start by
configuring a ManagedSite [3] and uploading your RDF data via the
RESTful interface. Next you will need to configure an
EntityhubLinkingEngine [4] for this ManagedSite. Finally you need to
configure an Enhancement Chain (preferably a Weighted Chain) that
includes tika, langdetect, opennlp-sentence, opennlp-token,
opennlp-pos, opennlp-chunker and {your-entityhub-linking-engine}.
After that your Enhancement Chain will be available at the RESTful
endpoint of the Stanbol Enhancer (enhancer/chain/{name-of-your-chain}).

If you want to link against several vocabularies you can configure
multiple ManagedSites and EntityhubLinkingEngines. If you want to have
a single Enhancement Chain that links against all of those, just add
all your EntityhubLinkingEngines to a single chain.
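
Once such a chain is configured, calling it could look roughly like
the Python sketch below. The "enhancer/chain/{name}" path is the one
mentioned above; the function name, the base URL in the usage note and
the choice of Turtle as the response format are assumptions for
illustration. The sketch only builds the request and does not send it.

```python
from urllib.parse import quote
from urllib.request import Request


def build_chain_request(base_url, chain_name, text):
    """Build (but do not send) a plain text request for a named Enhancement Chain."""
    url = "{}/enhancer/chain/{}".format(
        base_url.rstrip("/"), quote(chain_name, safe=""))
    return Request(
        url,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=UTF-8",
                 "Accept": "text/turtle"},
        method="POST",
    )
```

For example, build_chain_request("http://localhost:8080", "my-chain",
"Some text") targets enhancer/chain/my-chain; passing the result to
urllib.request.urlopen() would return the enhancement results as RDF.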

best
Rupert


[2] http://stanbol.apache.org/docs/trunk/customvocabulary.html
[3] http://stanbol.apache.org/docs/trunk/components/entityhub/managedsite.html
[4] http://stanbol.apache.org/docs/trunk/components/enhancer/engines/entityhublinking

>
> thanks in advance
> taran
>



--
| Rupert Westenthaler             [email protected]
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen