Stanbol follow-up questions

Rajan Shah Thu, 28 May 2015 21:52:04 -0700

Hi,

Please note that some of the questions below are very basic.


*1. Ontology:*

If I have a custom ontology, do I have to host it somewhere on the web in
order to use it with a Referenced Site or with genericrdf indexing?

Specifically, when I included an ontology that is not registered with
prefix.cc, indexing failed with an error.

For example:

@prefix bsym:   <http://bsym.bloomberg.com/sym/> .
@prefix figi-gii:   <http://www.omg.org/spec/FIGI/GlobalInstrumentIdentifiers/> .
@prefix figi-st:   <http://www.omg.org/spec/FIGI/SecurityTypes/> .


bsym:AAPL rdf:type figi-gii:CompositeGlobalIdentifier;
          bsym:securityType figi-st:CommonStock;
          figi-gii:isConstituentOf "djia";
          figi-gii:EquityMarketSector "Consumer electronics";
          bsym:listedAs "AAPL";
          bsym:issuedBy "Apple Inc." .

The figi-gii prefix generated an error.
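
One thing that may be worth checking is whether the error goes away when
full URIs are used instead of prefixed names. Whether the indexing tool
actually needs this is an assumption, not something confirmed here. A
minimal Python sketch of the expansion, using the prefixes from the sample
above plus the standard rdf namespace (the helper name `expand` is
hypothetical):

```python
# Prefix table copied from the Turtle sample; rdf is the standard W3C namespace.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "bsym": "http://bsym.bloomberg.com/sym/",
    "figi-gii": "http://www.omg.org/spec/FIGI/GlobalInstrumentIdentifiers/",
    "figi-st": "http://www.omg.org/spec/FIGI/SecurityTypes/",
}


def expand(curie, prefixes=PREFIXES):
    """Turn 'prefix:local' into a full URI; raise if the prefix is undeclared."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"undeclared prefix in {curie!r}")
    return prefixes[prefix] + local


if __name__ == "__main__":
    for name in ("bsym:AAPL", "figi-gii:isConstituentOf", "figi-st:CommonStock"):
        print(name, "->", expand(name))
```

The same full URIs could then be written directly in a mappings file in
place of the prefixed form.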

*2. Solr Index:*

As far as the Solr yard is concerned, I would like to inspect the indexed
content. Is there an easy way to find out what is inside the index? I tried
Luke, but it did not help with the Solr index generated by Stanbol. If
worst comes to worst, a tool that can dump the entire index structure in
matrix form might work as well.
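
To illustrate the kind of matrix dump meant here, a minimal Python sketch
follows. It assumes the core is reachable over HTTP through Solr's standard
select handler; the core URL is a placeholder and all function names are
hypothetical. Note that a select query only returns stored fields, so
indexed-only fields would not appear in such a dump.

```python
import csv
import io
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def select_url(core_url, rows=100):
    # q=*:* matches every document; wt=json asks Solr for a JSON response.
    params = urlencode({"q": "*:*", "wt": "json", "rows": rows})
    return core_url.rstrip("/") + "/select?" + params


def fetch_docs(core_url, rows=100):
    # Needs a running Solr core; core_url is an assumption about the deployment.
    with urlopen(select_url(core_url, rows)) as resp:
        return json.load(resp)["response"]["docs"]


def docs_to_matrix(docs):
    """Flatten Solr documents (dicts) into a header row plus one row per doc."""
    fields = sorted({name for doc in docs for name in doc})
    rows = [fields]
    for doc in docs:
        row = []
        for name in fields:
            value = doc.get(name, "")
            # Multi-valued fields arrive as lists; join them into one cell.
            if isinstance(value, list):
                value = "|".join(str(v) for v in value)
            row.append(str(value))
        rows.append(row)
    return rows


def matrix_to_csv(rows):
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()
```

With a reachable core, `print(matrix_to_csv(docs_to_matrix(fetch_docs(url))))`
would print one CSV line per indexed document.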


@prefix bsym:   <http://bsym.bloomberg.com/sym/> .


bsym:AAPL rdf:type bsym:CommonStock;
          bsym:securityType "CommonStock";
          bsym:sector "Consumer electronics";
          bsym:ticker "AAPL";
          bsym:name "Apple Inc." .

--- mappings.txt ---
# --- Specific to the symbology ---

bsym:*
bsym:securityType | d=entityhub:ref
bsym:name | d=entityhub:ref > dbp-ont:Organisation
bsym:ticker | d=entityhub:ref > rdfs:label

Does mappings.txt seem accurate for the above Turtle content? (Assuming I
am using a Referenced Site and genericrdf.)

Again, after making the above changes I could successfully index the Turtle
file; however, the Entityhub site query does not work for any field, not
even for name.
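
For concreteness, a field query against the name property could be built
roughly as in the Python sketch below. The JSON layout follows my
understanding of the Entityhub FieldQuery format (a list of selected
fields plus text constraints, posted to the site's query endpoint) and
should be checked against the Entityhub documentation; the helper name
`text_field_query` is hypothetical.

```python
import json


def text_field_query(field_uri, text, selected=(), limit=10):
    """Build a FieldQuery-style payload with a single text constraint."""
    return {
        # Fall back to returning the constrained field itself.
        "selected": list(selected) or [field_uri],
        "offset": 0,
        "limit": limit,
        "constraints": [
            {"type": "text", "field": field_uri, "text": text},
        ],
    }


if __name__ == "__main__":
    # Full URI of bsym:name from the Turtle sample above.
    query = text_field_query("http://bsym.bloomberg.com/sym/name", "Apple Inc.")
    print(json.dumps(query, indent=2))
```

Using the full property URI in the payload avoids depending on how the
bsym prefix is resolved on the server side.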

*3. NER:*

Is it true that for NER to work, one must have a well-defined ontology in
the public domain (i.e. one referring to concepts people have already
modeled)? If that is not the case, and I can upload a custom ontology to
Stanbol OntoNet, can I refer to it at index time or when setting the NER
properties within Entityhub Linking?

*4. Enhancer Engines:*

Suppose I have two engines:

a. opennlp-ner: the built-in engine, which currently detects
Organizations, Places, etc.
b. custom engine: an engine that uses the above ontology to recognize
stocks

If I define a new List chain, then after opennlp-token and opennlp-pos, do
both of the above engines have to be listed in a specific order? Again, the
goal is to take the NER results already produced and refine them further.

Is a List chain the most appropriate for such a task, or would another
chain type such as a Weighted or Graph chain be better?


*5. Enhancer Engines and Tie breaking:*

Suppose I want to have the dbpedia-related engine(s) and a freebase engine
in one List chain. Which one should be given higher weight/priority, and
why? Or what would be the preferred approach?

In some scenarios I have seen the same entity identified multiple times by
one engine with different confidences (maybe based on position, prefix,
etc.).

a. Is there any runtime setting one can tweak to get only one entity, with
an average confidence, for example?
b. Are there any other algorithms that pick the right entities based on
context?
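
To make item a. concrete: the merge described there could also be done on
the client side after enhancement. The sketch below is plain
post-processing in Python, not a Stanbol setting, and it assumes the
results have already been reduced to (entity URI, confidence) pairs; the
function name and the sample URIs are hypothetical.

```python
from collections import defaultdict


def merge_by_entity(suggestions):
    """Collapse repeated (entity, confidence) pairs into one average per entity."""
    grouped = defaultdict(list)
    for entity, confidence in suggestions:
        grouped[entity].append(confidence)
    return {entity: sum(vals) / len(vals) for entity, vals in grouped.items()}


if __name__ == "__main__":
    pairs = [
        ("http://example.org/entity/A", 0.5),
        ("http://example.org/entity/A", 1.0),
        ("http://example.org/entity/B", 0.25),
    ]
    for entity, confidence in merge_by_entity(pairs).items():
        print(entity, confidence)
```

Averaging is only one choice; taking the maximum confidence per entity
would be an equally simple variant.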

*6. Features:*

a. Finding relationships:
Is there any way to infer relationships among the various entities
mentioned in a text with Stanbol?

b. Highlighting important sentences:
Is there any way to highlight or detect important sentences within a text
using Stanbol's built-in features?
