Re: Clerezza Yard setup and SPARQL

Rupert Westenthaler Mon, 08 Jun 2015 00:51:43 -0700

Hi,

The SolrYard does not support BNodes, and VCard RDF tends to use them.


If you use the Entityhub Indexing Tool for importing the data, you can
try setting the "bnode-prefix" for the RDF indexing source (see
STANBOL-765 [1] for details).
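One way to picture the bnode-prefix idea is skolemization: each blank node label is replaced by a URI minted under a fixed prefix, so a store that cannot hold BNodes still receives addressable resources. The Python sketch below applies that idea to N-Triples lines. It is not the Indexing Tool's implementation. The function name and the prefix http://test.io/ns/bnode/ are hypothetical.

```python
import re

# Blank node label in N-Triples, e.g. "_:b0". Simplified: labels may contain
# inner dots but not a trailing one, so "_:b0." keeps the statement's dot.
BNODE = re.compile(r"_:([A-Za-z0-9_\-]+(?:\.[A-Za-z0-9_\-]+)*)")

# Quoted literals and <IRIs>, which must never be rewritten.
PROTECTED = re.compile(r'("(?:[^"\\]|\\.)*"|<[^>]*>)')


def skolemize_line(line: str, bnode_prefix: str) -> str:
    """Replace blank node labels with <bnode_prefix + label> URIs.

    Hypothetical helper illustrating skolemization; bnode_prefix is an
    assumed value, not a documented Stanbol default.
    """
    parts = PROTECTED.split(line)
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            # Odd indices are the captured literals or IRIs: keep as-is.
            out.append(part)
        else:
            out.append(BNODE.sub(lambda m: f"<{bnode_prefix}{m.group(1)}>", part))
    return "".join(out)


# Usage: the vcard:hasAddress triple from the Turtle example, as N-Triples.
line = ("<http://test.io/ns/friends#AndrewSmith> "
        "<http://www.w3.org/2006/vcard/ns#hasAddress> _:addr1 .")
print(skolemize_line(line, "http://test.io/ns/bnode/"))
```

Literals and IRIs are split out first so that text such as "_:x" inside a string value is left alone.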

best
Rupert

[1] https://issues.apache.org/jira/browse/STANBOL-765

On Tue, Jun 2, 2015 at 6:13 PM, Rajan Shah <[email protected]> wrote:
> Hi Rupert,
>
> Thanks again for the response.
>
> At present, it's just an observation: mainly with vCard I had issues
> with queries. At the same time, I could get results with either custom
> entities or even FOAF.
>
> I will keep an eye on it, and if I observe it again, I will submit a JIRA issue.
>
> With best regards,
> Rajan
>
>
>
> On Tue, Jun 2, 2015 at 11:03 AM, Rupert Westenthaler <[email protected]> wrote:
>
>> Hi Rajan,
>>
>> Sorry, I do not have enough time for a detailed answer, but the
>> bottom line is: EntityLinking does not work with the Clerezza Yard. Even
>> if you did not encounter errors, both performance and results would
>> be much worse than with a SolrYard. This is because EntityLinking
>> depends on features that are exclusive to Solr (e.g. the Solr analyzers
>> doing stemming ... and the ranking of query results).
>>
>> If you find failing SPARQL queries in the log, feel free to report them as
>> issues in Jira. I will have a look.
>>
>> best
>> Rupert
>>
>> On Sat, May 30, 2015 at 11:14 PM, Rajan Shah <[email protected]> wrote:
>> > Hi,
>> >
>> > I can create the Clerezza Yard successfully and query the data using SPARQL.
>> > Now, when it comes to Named Entity Recognition, the same issue persists.
>> >
>> > I would appreciate it if someone could provide some insight or a potential
>> > resolution.
>> >
>> > Thanks in advance,
>> > Rajan
>> >
>> > These are the steps I followed:
>> >
>> > 1. Uploaded the relevant ontology to the local OntoNet
>> >
>> > 2. Created a managed site and uploaded the triples
>> >
>> > 3. Verified that the data exists via a SPARQL query:
>> >
>> > <results>
>> >   <result>
>> >     <binding name="ticker"><literal>AAPL</literal></binding>
>> >     <binding name="issuer"><literal>Apple Inc.</literal></binding>
>> >     <binding name="exchange"><literal>NASDAQ</literal></binding>
>> >     <binding name="currency"><literal>USD</literal></binding>
>> >     <binding name="instr"><uri>http://finance.intellimind.io/secmaster/djia/AAPL</uri></binding>
>> >   </result>
>> > </results></sparql>
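A minimal Python sketch for reading such results with the standard library instead of by eye. It assumes the W3C SPARQL Query Results XML Format (namespace http://www.w3.org/2005/sparql-results#). The sample document is rebuilt from the bindings above, with a <head> element added as an assumption because the excerpt omits it.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Namespace of the W3C SPARQL Query Results XML Format.
NS = {"sr": "http://www.w3.org/2005/sparql-results#"}


def parse_sparql_xml(doc: str) -> list[dict[str, str]]:
    """Return one dict per <result>, mapping variable name to its value."""
    root = ET.fromstring(doc)
    rows = []
    for result in root.iterfind("sr:results/sr:result", NS):
        row = {}
        for binding in result.iterfind("sr:binding", NS):
            # Each binding holds exactly one <uri>, <literal> or <bnode> child.
            value = binding[0]
            row[binding.get("name")] = value.text or ""
        rows.append(row)
    return rows


# Reconstructed from the excerpt above; the <head> element is assumed.
SAMPLE = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="ticker"/><variable name="issuer"/><variable name="exchange"/>
    <variable name="currency"/><variable name="instr"/>
  </head>
  <results>
    <result>
      <binding name="ticker"><literal>AAPL</literal></binding>
      <binding name="issuer"><literal>Apple Inc.</literal></binding>
      <binding name="exchange"><literal>NASDAQ</literal></binding>
      <binding name="currency"><literal>USD</literal></binding>
      <binding name="instr"><uri>http://finance.intellimind.io/secmaster/djia/AAPL</uri></binding>
    </result>
  </results>
</sparql>"""

for row in parse_sparql_xml(SAMPLE):
    print(row)
```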
>> >
>> > 4. Entityhub Linking
>> >
>> > Assuming the prefix imind is bound to http://finance.intellimind.io/secmaster (so
>> > that the namespace prefix can be verified)
>> >
>> > In the Entityhub linking setup, within the type mappings, I am trying to map:
>> >
>> > a. Type Mapping Setup
>> > imind:ticker > rdfs:label
>> > imind:exchange > rdfs:label
>> > ...
>> >
>> > b. Select "Case Sensitivity"
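Prefixed names in a `source > target` mapping are expanded by plain string concatenation of namespace and local name. The Python sketch below (hypothetical helper names) expands the mappings listed above using the imind namespace exactly as given in the message.

```python
# Prefix table: imind as stated in the message; rdfs is the standard
# RDF Schema namespace.
PREFIXES = {
    "imind": "http://finance.intellimind.io/secmaster",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def expand(curie: str, prefixes: dict) -> str:
    """Expand a prefixed name by concatenating namespace and local part."""
    prefix, local = curie.split(":", 1)
    return prefixes[prefix] + local


def parse_mapping(line: str, prefixes: dict) -> tuple:
    """Split a 'source > target' line of prefixed names into two full URIs.

    Hypothetical helper: only handles prefixed names, not <full URIs>.
    """
    source, target = (part.strip() for part in line.split(">", 1))
    return expand(source, prefixes), expand(target, prefixes)


for mapping in ["imind:ticker > rdfs:label", "imind:exchange > rdfs:label"]:
    print(parse_mapping(mapping, PREFIXES))
```

With that namespace, imind:ticker expands to http://finance.intellimind.io/secmasterticker, with no separator. The thread does not show the property URIs in the uploaded triples, so whether they use that form or something like .../secmaster/ticker is worth checking.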
>> >
>> >
>> > 5. Chain setup
>> >
>> > When I include it in the list chain, it doesn't capture a single entity,
>> > even though the chain spends most of its processing time on it.
>> >
>> >
>> >
>> >    - *tika* (optional, TikaEngine)
>> >    - *langdetect* (required, LanguageDetectionEnhancementEngine)
>> >    - *opennlp-sentence* (required, OpenNlpSentenceDetectionEngine)
>> >    - *opennlp-token* (required, OpenNlpTokenizerEngine)
>> >    - *opennlp-pos* (required, OpenNlpPosTaggingEngine)
>> >    - *opennlp-ner* (required, NamedEntityExtractionEnhancementEngine)
>> >    - *refdata-linking* (required, EntityLinkingEngine)
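The engine list above can also be written down as data. This Python sketch renders it in a name;optional form, which the Stanbol ListChain configuration is assumed here to use for optional engines; treat that syntax as an assumption to verify against the chain's configuration form.

```python
# Engines in the order listed above; True marks an optional engine.
ENGINES = [
    ("tika", True),
    ("langdetect", False),
    ("opennlp-sentence", False),
    ("opennlp-token", False),
    ("opennlp-pos", False),
    ("opennlp-ner", False),
    ("refdata-linking", False),
]


def to_list_chain_entries(engines):
    """Render (name, optional) pairs as 'name' or 'name;optional' (assumed syntax)."""
    return [f"{name};optional" if optional else name for name, optional in engines]


for entry in to_list_chain_entries(ENGINES):
    print(entry)
```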
>> >
>> >
>> > *Sample Text:*
>> >
>> > The Apple Inc. CEO Tim Cook spoke at dev conference. The Apple Inc. has
>> > headquarter in US. It's ticker symbol is AAPL, which trades on NASDAQ.
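To reproduce the run outside the web UI, the sample text can be POSTed to the chain over the Enhancer's REST interface (/enhancer/chain/{name}). The Python sketch below only builds the request. The host, the port and the chain name refdata-chain are hypothetical placeholders, since the message does not name the chain; application/rdf+xml is one RDF format to ask for in the response.

```python
import urllib.request

# Assumptions: a local Stanbol on port 8080 and the chain above registered
# under the hypothetical name "refdata-chain".
STANBOL = "http://localhost:8080"
CHAIN = "refdata-chain"

# The sample text from the message, verbatim.
SAMPLE_TEXT = (
    "The Apple Inc. CEO Tim Cook spoke at dev conference. The Apple Inc. has "
    "headquarter in US. It's ticker symbol is AAPL, which trades on NASDAQ."
)


def build_enhance_request(text: str, chain: str = CHAIN) -> urllib.request.Request:
    """Build (but do not send) a POST of plain text to a named enhancement chain."""
    return urllib.request.Request(
        url=f"{STANBOL}/enhancer/chain/{chain}",
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=UTF-8",
            "Accept": "application/rdf+xml",
        },
        method="POST",
    )


req = build_enhance_request(SAMPLE_TEXT)
print(req.get_method(), req.full_url)

# Against a running server, the enhancement results could be fetched with:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```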
>> >
>> > On Mon, May 25, 2015 at 12:04 AM, Rajan Shah <[email protected]> wrote:
>> >
>> >> Hi,
>> >>
>> >> In order to use a Clerezza Yard setup, I tried the very simple example
>> >> outlined at the end.
>> >>
>> >> I would really appreciate it if someone could shed some light on the following:
>> >>
>> >> a. Is there anything I am completely missing here pertaining to
>> >> "Named Graph" vs "Union of Graphs" and references? If that's the case,
>> >> could you please clarify what the relevant URI/IRI would be?
>> >>
>> >> b. What is the best way to debug such an issue? If a SPARQL query fails,
>> >> where should I look for logs indicating the issue, as nothing appears in
>> >> the stdout logs?
>> >>
>> >> c. Is there any other simple alternative to this approach that achieves
>> >> similar functionality? Is storing in KiWi beneficial compared to this
>> >> approach, or do I have to have Apache Marmotta installed in order to use
>> >> KiWi?
>> >>
>> >> Thanks in advance,
>> >> Rajan
>> >>
>> >>
>> >> *1. Apache Stanbol Entityhub Yard: Clerezza Yard Configuration*
>> >>
>> >> Set the following parameters:
>> >>
>> >> ID: testYard
>> >> Graph URI: http://test.io/ns/friends#
>> >>
>> >> *2. Set up Clerezza - SCB Jena TDB Storage Provider*
>> >>
>> >> Jena TDB directory: /<stanbol_dir>/<tdb_store>
>> >> Default Graph Name: http://test.io/ns
>> >> Weight: 105
>> >>
>> >> *3. Save the .ttl file into /<stanbol_dir>/<tdb_store>*
>> >>
>> >> @prefix vcard: <http://www.w3.org/2006/vcard/ns#> .
>> >> @prefix rdfa: <http://www.w3.org/ns/rdfa#> .
>> >> @prefix friends: <http://test.io/ns/friends#> .
>> >>
>> >> <http://test.io/ns/friends#AndrewSmith> a vcard:Individual;
>> >>                     vcard:fn "Andrew Smith";
>> >>                     vcard:title "Founder";
>> >>                     vcard:org "ABC LLC";
>> >>                     vcard:orgunit "Startup";
>> >>                     vcard:hasAddress [
>> >>                                         a vcard:Work;
>> >>                                         vcard:country-name "USA";
>> >>                                         vcard:locality "New York";
>> >>                                         vcard:region "New York"
>> >>                     ] .
>> >>
>> >> *4. I do see that, upon startup, it creates the necessary index files within*
>> >> the /<stanbol_dir>/<tdb_store> directory. In addition, within the UI, it
>> >> also registers the following TripleCollection in the SPARQL endpoint:
>> >>
>> >> http://test.io/ns/friends#
>> >>
>> >> *5. SPARQL Query*
>> >> -- query1
>> >> PREFIX vcard: <http://www.w3.org/2006/vcard/ns#>
>> >> PREFIX friends: <http://test.io/ns/friends#>
>> >> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
>> >>
>> >> SELECT ?fn ?title ?org
>> >> WHERE {
>> >>   ?s vcard:fn ?fn ;
>> >>     vcard:title ?title ;
>> >>     vcard:org ?org .
>> >> }
>> >>
>> >> OR
>> >>
>> >> -- query2
>> >> PREFIX hmgr: <http://test.io/ns/friends#>
>> >> PREFIX vcard: <http://www.w3.org/2006/vcard/ns#>
>> >>
>> >> SELECT ?individual ?title
>> >> WHERE { ?individual vcard:title ?title .
>> >>         FILTER (?title = "Founder") }
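To spell out what query1 asks for, here is a naive in-memory Python sketch of basic graph pattern matching over the Turtle data above. Terms are simplified to plain strings and the blank node from the [ ... ] block is written as _:addr. This illustrates SPARQL join semantics only and says nothing about how Jena TDB or Clerezza evaluate queries.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
VCARD = "http://www.w3.org/2006/vcard/ns#"
FRIENDS = "http://test.io/ns/friends#"

# The .ttl data as (subject, predicate, object) strings; "_:addr" stands in
# for the blank node created by the [ ... ] block.
TRIPLES = [
    (FRIENDS + "AndrewSmith", RDF_TYPE, VCARD + "Individual"),
    (FRIENDS + "AndrewSmith", VCARD + "fn", "Andrew Smith"),
    (FRIENDS + "AndrewSmith", VCARD + "title", "Founder"),
    (FRIENDS + "AndrewSmith", VCARD + "org", "ABC LLC"),
    (FRIENDS + "AndrewSmith", VCARD + "orgunit", "Startup"),
    (FRIENDS + "AndrewSmith", VCARD + "hasAddress", "_:addr"),
    ("_:addr", RDF_TYPE, VCARD + "Work"),
    ("_:addr", VCARD + "country-name", "USA"),
    ("_:addr", VCARD + "locality", "New York"),
    ("_:addr", VCARD + "region", "New York"),
]


def match(pattern, triples, binding):
    """Yield copies of `binding` extended so that `pattern` matches a triple.

    Terms starting with '?' are variables; anything else must match exactly.
    """
    for triple in triples:
        new = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if new.setdefault(term, value) != value:
                    break  # variable already bound to a different value
            elif term != value:
                break  # constant term does not match
        else:
            yield new


def query(patterns, triples):
    """Join triple patterns left to right (naive basic graph pattern matching)."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [new for old in bindings for new in match(pattern, triples, old)]
    return bindings


QUERY1 = [
    ("?s", VCARD + "fn", "?fn"),
    ("?s", VCARD + "title", "?title"),
    ("?s", VCARD + "org", "?org"),
]
for b in query(QUERY1, TRIPLES):
    print(b["?fn"], b["?title"], b["?org"])
```

The three patterns of query1 share ?s, so they join into a single row for friends:AndrewSmith; a pattern with "Founder" in the object position likewise binds its subject variable to that resource.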
>> >>
>> >>
>> >> *Observations:*
>> >>
>> >> The above queries work perfectly fine either on the command line or in Jena
>> >> Fuseki, as follows:
>> >>
>> >> a. tdbquery --loc /<stanbol_dir>/<tdb_store> --query query1
>> >> b. using the Fuseki user interface
>> >>
>> >> I tried a couple of alternatives such as GRAPH, NAMED, etc., but nothing
>> >> helps. Is there any specific syntax that needs to be used for the Stanbol
>> >> SPARQL interface?
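One way to send the same queries to Stanbol without the UI is a SPARQL 1.1 Protocol GET request. The Python sketch below builds one. The endpoint path http://localhost:8080/sparql is an assumption. default-graph-uri is the standard protocol parameter for selecting a graph, but Stanbol's endpoint may or may not honour it and could expect its own parameter instead.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: a local Stanbol instance serving its SPARQL endpoint at /sparql.
ENDPOINT = "http://localhost:8080/sparql"


def sparql_get_url(query: str, default_graph: Optional[str] = None) -> str:
    """Build a SPARQL 1.1 Protocol GET URL for `query`.

    `default-graph-uri` is the standard protocol parameter; whether Stanbol's
    endpoint honours it is not confirmed here.
    """
    params = [("query", query)]
    if default_graph is not None:
        params.append(("default-graph-uri", default_graph))
    return ENDPOINT + "?" + urlencode(params)


QUERY1 = """PREFIX vcard: <http://www.w3.org/2006/vcard/ns#>
SELECT ?fn ?title ?org
WHERE { ?s vcard:fn ?fn ; vcard:title ?title ; vcard:org ?org . }"""

# The TripleCollection registered by the Clerezza Yard in the message.
url = sparql_get_url(QUERY1, "http://test.io/ns/friends#")
print(url)

# Decoding the query string recovers the original parameters unchanged.
params = parse_qs(urlsplit(url).query)
```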
>> >>
>> >>
>> >>
>> >>
>>
>>
>>
>> --
>> | Rupert Westenthaler             [email protected]
>> | Bodenlehenstraße 11                              ++43-699-11108907
>> | A-5500 Bischofshofen
>> | REDLINK.CO
>> ..........................................................................
>> | http://redlink.co/
>>



