Hi Arthi, see comments inline.

On Wed, Jul 10, 2013 at 4:03 PM, <arthi.ven...@wipro.com> wrote:
> Hi,
> Iam a newbee to Stanbol.
> I want to use Stanbol to be able to extract meaningful data from different
> unstructured text.
> Fields of interest are based on my custom vocabulary.
> Data in the unstructured text will keep changing and cannot be indexed upfront
>
> Have followed the instructions on this link -
> http://stanbol.apache.org/docs/trunk/customvocabulary.html
>
> From reading the link under stand I would need to follow only the keyword
> linking approach.
>
> Did following
>
> 1. Created a Yard Site implementation

I assume you created a ManagedSite as described in [1].

> 2. Uploaded by basic vocabulary into this using the curl command
>
> 3. Created an EntityHub linking engine
>
> 4. Create an enhancement chain with following components
>
> * langdetect ( required , LanguageDetectionEnhancementEngine)
> * opennlp-sentence ( required , OpenNlpSentenceDetectionEngine)
> * opennlp-token ( required , OpenNlpTokenizerEngine)
> * opennlp-pos ( required , OpenNlpPosTaggingEngine)
> * opennlp-chunker ( required , OpenNlpChunkingEngine)
> * opennlp-ner ( required , NamedEntityExtractionEnhancementEngine)
> * CustentityhubExtraction ( required , EntityLinkingEngine)

You do not need "opennlp-ner" to extract your Entities, but having this
engine in the chain does no harm either.

> 5. When I run query I do not see entities from vocabulary getting
> identified
> Note : Currently my vocabulary is very simple. More entities will be added
> later.
> It is an Ontology which has only 1 entity Person
> Person in turn has following properties - Name, City, DateOfBirth

Which property does your Entity use for the Name? By default the
EntityhubLinkingEngine uses "rdfs:label" for linking. If you use a
different property in your ontology, you will need to adapt the
"Label Field" configuration of your EntityLinkingEngine.

>
> I think I have gone wrong is some configuration parameter.
> I had doubts in following :
>
> 1. In the Entity Hub linking engine
>
> a. What do we enter in the fields used for dereferencing

This config allows you to add additional information to extracted
Entities. In your case you might want to add the properties used to
store the City and the DateOfBirth. If you do not enable
"Dereference Entities", this config is ignored.

>
> i. Do we
> delete the default mappings provided
>
> b. What do we enter in the Type mappings

Those are used to map the rdf:type values of the Entities to the dc:type
values used for fise:TextAnnotation instances. As your vocabulary
contains Persons, you should map the rdf:type value you are using in
your ontology to "dbp-ont:Person", e.g. by adding the mapping

    {your-person-type-uri} > dbp-ont:Person

This configuration is completely optional. If you omit it, the
fise:TextAnnotations created by the engine will not have dc:type values.

>
> c. In processed languages do we need to enter any special parameters

As long as you use OpenNLP as the NLP framework, the provided defaults
are OK.

>
> 2. In the Managed Site yard site what do we enter for field mappings
>
> a. I have entered person:name > dbp:Person:birthName , not sure if
> this is correct

This is not correct, as "dbp:Person:birthName" is not a valid QName
({prefix}:{localname}). These mappings can be used to map properties of
your ontology to others. If you use your own namespaces you will most
likely need to use full URIs instead of QNames, e.g.

    http://www.my-ontology.org/person/name > rdfs:label

>
> b. Do we need to retain the default mappings
>

Mappings are optional. You can delete those if you don't need them. The
defaults only ensure that 'rdfs:label' values are present for common
ontologies, because the EntityLinkingEngine uses 'rdfs:label' as the
default for the label field.

> 3. In the Solr Yard configuration I have not defined any Solr cor. Gone
> with default core / create on initialization.
> Is this ok

You need to provide a value for "Solr Index/Core" and also enable
"Allow Initialization", so that an empty SolrCore is created for your
ManagedSite (see [1] for the full documentation).

Hope this helps,
best
Rupert

[1] http://stanbol.apache.org/docs/trunk/components/entityhub/managedsite.html

>
> Please confirm if the steps followed are correct.
> What do I need to make the custom vocabulary work.
>
> Have spent most of last week on this but unable to get this working.
> Request your help for same
>
>
> Thanks a lot,
> Arthi

--
| Rupert Westenthaler             rupert.westentha...@gmail.com
| Bodenlehenstraße 11             ++43-699-11108907
| A-5500 Bischofshofen
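
P.S. To illustrate the QName point from 2.a, here is a minimal Python sketch that checks a simple "source > target" field mapping before you paste it into the configuration. It is not Stanbol code: the names `is_valid_field` and `check_mapping` are hypothetical, the regular expressions are a deliberate simplification of the real QName and URI grammars, and the richer mapping syntax Stanbol supports is not handled.

```python
import re

# Simplified QName: exactly one prefix, one colon, one local name.
# (Assumption: a rough approximation of the XML QName grammar.)
QNAME = re.compile(r"^[A-Za-z_][\w.-]*:[A-Za-z_][\w.-]*$")
# Very rough test for a full URI such as http://www.my-ontology.org/person/name
FULL_URI = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://\S+$")


def is_valid_field(name):
    """Return True if name looks like a QName or a full URI."""
    return bool(QNAME.match(name) or FULL_URI.match(name))


def check_mapping(mapping):
    """Return a list of problems found in a 'source > target' mapping."""
    source, sep, target = (part.strip() for part in mapping.partition(">"))
    if not sep:
        return [f"missing '>' in {mapping!r}"]
    return [
        f"not a QName or full URI: {field!r}"
        for field in (source, target)
        if not is_valid_field(field)
    ]


if __name__ == "__main__":
    for m in (
        "person:name > dbp:Person:birthName",
        "http://www.my-ontology.org/person/name > rdfs:label",
    ):
        print(m, "->", check_mapping(m) or "ok")
```

With this check, the mapping from your mail is flagged because "dbp:Person:birthName" contains two colons, while the full-URI variant passes.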