Hi Arthi

See my comments inline.

On Wed, Jul 10, 2013 at 4:03 PM,  <arthi.ven...@wipro.com> wrote:
> Hi,
> Iam a newbee to Stanbol.
> I want to use Stanbol to be able to extract meaningful data from different 
> unstructured text.
> Fields of interest are based on my custom vocabulary.
> Data in the unstructured text will keep changing and cannot be indexed upfront
>
> Have followed the instructions on this link  -  
> http://stanbol.apache.org/docs/trunk/customvocabulary.html
>
> From reading the link under stand I would need to follow only the keyword 
> linking approach.
>
> Did following
>
> 1.       Created a Yard Site implementation
>

I assume you created a ManagedSite as described in [1].

> 2.       Uploaded by basic vocabulary into this using the curl command
>
> 3.       Created an EntityHub linking engine
>
> 4.       Create an enhancement chain with following components
>
>      *   langdetect ( required , LanguageDetectionEnhancementEngine)
>      *   opennlp-sentence ( required , OpenNlpSentenceDetectionEngine)
>      *   opennlp-token ( required , OpenNlpTokenizerEngine)
>      *   opennlp-pos ( required , OpenNlpPosTaggingEngine)
>      *   opennlp-chunker ( required , OpenNlpChunkingEngine)
>      *   opennlp-ner ( required , NamedEntityExtractionEnhancementEngine)
>      *   CustentityhubExtraction ( required , EntityLinkingEngine)
>

You do not need the "opennlp-ner" engine to extract your entities, but
having it in the chain is not a problem either.

> 5.       When I run query I do not see entities from vocabulary getting 
> identified
> Note : Currently  my vocabulary is very simple.   More entities will be added 
> later.
> It is an  Ontology which has only 1 entity Person
> Person in turn has following properties - Name,  City, DateOfBirth

Which property does your entity use for the name? By default the
EntityhubLinkingEngine uses "rdfs:label" for linking. If you use a
different property in your ontology, you will need to adapt the
"Label Field" configuration of your EntityLinkingEngine.
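
A quick way to check this before uploading is to list the entities in
your data that carry no value for the label property. The following is
only a minimal Python sketch over plain (subject, predicate, object)
tuples, not Stanbol code; the example.org URIs and the sample data are
hypothetical.

```python
# Full URI behind the "rdfs:label" QName.
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

def entities_without_label(triples, label_property=RDFS_LABEL):
    """Return the subjects that have no value for label_property."""
    subjects = {s for s, _, _ in triples}
    labelled = {s for s, p, _ in triples if p == label_property}
    return sorted(subjects - labelled)

# Hypothetical sample data: one person stored with a custom name
# property, one stored with rdfs:label.
triples = [
    ("http://example.org/p/1", "http://example.org/person/name", "Alice"),
    ("http://example.org/p/2", RDFS_LABEL, "Bob"),
]

missing = entities_without_label(triples)
```

Any entity reported here would not be found with the default "Label
Field" setting, so either change that setting or map your name property
to rdfs:label.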

>
> I think I have gone wrong is some configuration parameter.
> I had doubts in following :
>
> 1.       In the Entity Hub linking engine
>
> a.        What do we enter in the fields used for dereferencing

This config allows you to add additional information for extracted
Entities. In your case you might want to add the properties used to
store the City and the DateOfBirth. If you do not enable "Dereference
Entities" this config will get ignored.

>
>        i.      Do we delete the default mappings provided
>
> b.      What do we enter in the Type mappings

Those are used to map the rdf:type values of the Entities to the dc:type
values used for fise:TextAnnotation instances. As your vocabulary
contains Persons, you should map the rdf:type value you are using in
your Ontology to "dbp-ont:Person", e.g. by adding the mapping

{your-person-type-uri} > dbp-ont:Person

but this configuration is completely optional. If you do not do it
fise:TextAnnotations created by the Engine will not have dc:type
values.
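
To illustrate the effect, here is a small Python sketch of the lookup
(my own simplification, not the engine's implementation). The URI for
the custom Person type is hypothetical, and I am assuming that
"dbp-ont:" expands to the DBpedia ontology namespace.

```python
# Hypothetical URI for the custom Person type; replace it with your own.
MY_PERSON = "http://www.my-ontology.org/person/Person"
# Assumed expansion of the "dbp-ont:Person" QName.
DBP_ONT_PERSON = "http://dbpedia.org/ontology/Person"

def dc_types(rdf_types, type_mappings):
    """Return the dc:type values for an entity, given its rdf:type values."""
    return {type_mappings[t] for t in rdf_types if t in type_mappings}

# With the mapping configured, the annotation gets a dc:type value.
with_mapping = dc_types([MY_PERSON], {MY_PERSON: DBP_ONT_PERSON})
# Without any mapping, the set of dc:type values stays empty.
without_mapping = dc_types([MY_PERSON], {})
```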

>
> c.       In processed languages do we need to enter any special parameters

As long as you use OpenNLP as the NLP framework, the provided defaults are OK.

>
> 2.       In the Managed Site yard site what do we enter for field mappings
>
> a.       I have entered person:name > dbp:Person:birthName  , not sure if 
> this is correct

This is not correct as "dbp:Person:birthName" is not a valid QNAME
({prefix}:{localname}). Those mappings can be used to map properties
of your ontologies to others. If you use your own namespaces you will
most likely need to use full URIs instead of QNAMEs.

e.g.

    http://www.my-ontology.org/person/name > rdfs:label
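
If it helps, the rule can be sketched in a few lines of Python. This is
only an approximation of the {prefix}:{localname} form for
illustration, not the parser Stanbol uses; the function names are made
up for this example.

```python
import re

# Simplified QName pattern: exactly one colon between prefix and local name.
QNAME = re.compile(r"^[A-Za-z_][\w.-]*:[A-Za-z_][\w.-]*$")

def is_qname(value):
    """Return True if value looks like {prefix}:{localname}."""
    return QNAME.match(value) is not None

def is_full_uri(value):
    """Return True if value is written as a full http(s) URI."""
    return value.startswith(("http://", "https://"))

def check_mapping(line):
    """Split 'source > target' and report whether each side is well formed."""
    source, target = (part.strip() for part in line.split(">", 1))
    return {
        "source": source,
        "target": target,
        "source_ok": is_qname(source) or is_full_uri(source),
        "target_ok": is_qname(target) or is_full_uri(target),
    }

bad = check_mapping("person:name > dbp:Person:birthName")
good = check_mapping("http://www.my-ontology.org/person/name > rdfs:label")
```

Note that a syntactically valid QName such as "person:name" still only
works if its prefix is known, which is why full URIs are the safer
choice for your own namespaces.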

>
> b.      Do we need to retain the default mappings
>

Mappings are optional. You can delete those if you don't need them.
The defaults only ensure that 'rdfs:label' values are present for
common ontologies. This is because the EntityLinkingEngine uses
'rdfs:label' as the default label field.

> 3.       In the Solr Yard configuration I have not defined any Solr cor. Gone 
> with default core / create on initialization. Is this ok

You need to provide a value for the "Solr Index/Core" and also enable
"Allow Initialization" so that an empty SolrCore is created for your
ManagedSite (see [1] for the full documentation).


Hope this helps
best
Rupert

[1] http://stanbol.apache.org/docs/trunk/components/entityhub/managedsite.html

>
> Please confirm if the steps followed are correct.
> What do I need to make the custom vocabulary work.
>
> Have spent most of last week on this  but unable to get this working.
> Request your help for same
>
>
> Thanks a lot,
> Arthi
>
>



--
| Rupert Westenthaler             rupert.westentha...@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
