Hi,

Please note that some of the questions below are very basic.

*1. Ontology:* If I have a custom ontology, do I have to host it somewhere on the web in order to use it with a Referenced Site or when indexing with genericrdf? Specifically, when I included an ontology that is not registered with prefix.cc, indexing failed with an error. For example:

    @prefix bsym: <http://bsym.bloomberg.com/sym/> .
    @prefix figi-gii: < http://www.omg.org/spec/FIGI/GlobalInstrumentIdentifiers/> .
    @prefix figi-st: <http://www.omg.org/spec/FIGI/SecurityTypes/> .

    bsym:AAPL rdf:type figi-gii:CompositeGlobalIdentifier;
        bsym:securityType figi-st:CommonStock;
        figi-gii:isConstituentOf "djia";
        figi-gii:EquityMarketSector "Consumer electronics";
        bsym:listedAs "AAPL";
        bsym:issuedBy "Apple Inc." .

The figi-gii prefix generated an error.

*2. Solr Index:* As far as the Solr yard is concerned, I am interested in finding out what content has been indexed. Is there an easy way to see what is inside the index? I tried Luke, but it didn't help with the Solr index generated by Stanbol. If worst comes to worst, a tool that can dump the entire index structure in matrix form might work as well.

    @prefix bsym: <http://bsym.bloomberg.com/sym/> .

    bsym:AAPL rdf:type bsym:CommonStock;
        bsym:securityType "CommonStock";
        bsym:sector "Consumer electronics";
        bsym:ticker "AAPL";
        bsym:name "Apple Inc." .

    -- mappings.txt ---
    # --- Specific to the symbology ---
    bsym:*
    bsym:securityType | d=entityhub:ref
    bsym:name | d=entityhub:ref > dbp-ont:Organisation
    bsym:ticker | d=entityhub:ref > rdfs:label

Does mappings.txt seem accurate for the above Turtle content? (Assuming I am using a Referenced Site and genericrdf.) After making the above changes I could successfully index the Turtle file; however, the Entityhub site query doesn't work for any field, not even for name.

*3. NER:* Is it true that in order for NER to work, one must have a well-defined ontology in the public domain (i.e. one referring to concepts people have already modeled)?
If that's not the case, and I can upload a custom ontology to Stanbol OntoNet, can I refer to it at index time or when setting the NER properties within Entityhub Linking?

*4. Enhancer Engines:* Suppose I have two engines:
a. opennlp-ner: the built-in engine, which can currently detect Organizations, Places, etc.
b. custom engine: an engine that uses the above ontology to recognize stocks.
If I am planning to define a new List-type chain, do you think that after opennlp-token and opennlp-pos I have to place both of the above engines in a specific order? Again, the goal is to build on the previously detected NER results and refine them further. Is a List chain the most appropriate for such a task, or some other chain such as a weighted or graph chain?

*5. Enhancer Engines and Tie breaking:* Suppose I want to have the dbpedia-related engine(s) and a freebase engine in one List chain. Which one should be given higher weight/priority, and why? Or what would be the preferred approach? In some scenarios I have seen the same entity identified multiple times by one engine with different confidences (maybe based on position, prefix, etc.).
a. Is there any runtime setting one can tweak to get only one entity with, for example, the average confidence?
b. Are there any other algorithms that truly pick the entities based on context?

*6. Features:*
a. Finding Relationships: Is there any way in Stanbol to infer the relationship(s) between the various entities mentioned in a text?
b. Highlighting important sentences: Is there any way to highlight/detect important sentences within a text using Stanbol's built-in features?
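
To make question 5a concrete, the result I am after looks roughly like the post-processing step below. This is only a sketch in Python on made-up (entity URI, confidence) pairs; the function name `collapse_mentions` and the sample values are hypothetical and are not part of Stanbol.

```python
from collections import defaultdict
from statistics import mean


def collapse_mentions(annotations):
    """Group (entity_uri, confidence) pairs by entity and average the confidences."""
    grouped = defaultdict(list)
    for entity_uri, confidence in annotations:
        grouped[entity_uri].append(confidence)
    # One entry per entity, carrying the mean of all confidences seen for it.
    return {uri: mean(values) for uri, values in grouped.items()}


# Hypothetical engine output: the same entity reported twice with different confidences.
annotations = [
    ("http://dbpedia.org/resource/Apple_Inc.", 0.5),
    ("http://dbpedia.org/resource/Apple_Inc.", 1.0),
    ("http://dbpedia.org/resource/Paris", 0.75),
]

collapsed = collapse_mentions(annotations)
for uri, confidence in collapsed.items():
    print(uri, confidence)
```

Doing this on the client side is easy enough; what I would like to know is whether the Enhancer can be configured to return the collapsed result directly.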