Hi Rupert,

Thanks for the pointer :)
I will configure the indexing tool to extract the FOAF subset from the Freebase
dump, since I believe it contains more than enough FOAF data. I will use the
IRC channel for frequent questions as I move forward.

Regards,
Dileepa


On Wed, Jun 19, 2013 at 5:07 PM, Rupert Westenthaler <[email protected]> wrote:

> Hi Dileepa,
>
> IMO it would be better if you joined the #stanbol IRC channel on
> freenode.net. This would greatly reduce the RTTs (round-trip times) for
> this kind of question.
>
>
> For the Freebase indexing I implemented property filters (see
> STANBOL-1016). With these you can specify which triples need to be
> imported and which should be dropped. See the Freebase Indexing Tool
> for details and an example of how to configure it. A good starting
> point would be to import only triples whose property starts with the
> FOAF namespace.
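Here is a minimal sketch of the idea behind such a filter: keep only triples whose predicate lies in the FOAF namespace. This is plain Java written for illustration. It is not the STANBOL-1016 filter or its configuration format, which the Freebase Indexing Tool documentation describes. The class `FoafPropertyFilter` and the `{subject, predicate, object}` array representation are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration of prefix-based property filtering: a triple is
 * kept only if its predicate IRI starts with one of the allowed namespaces.
 * NOT the Stanbol STANBOL-1016 implementation or its configuration syntax.
 */
final class FoafPropertyFilter {

    static final String FOAF_NS = "http://xmlns.com/foaf/0.1/";

    private final List<String> allowedPrefixes;

    FoafPropertyFilter(List<String> allowedPrefixes) {
        this.allowedPrefixes = List.copyOf(allowedPrefixes);
    }

    /** True if the predicate starts with one of the allowed namespace prefixes. */
    boolean accept(String predicate) {
        for (String prefix : allowedPrefixes) {
            if (predicate.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    /** Keeps the triples (given as {subject, predicate, object}) whose predicate is accepted. */
    List<String[]> filter(List<String[]> triples) {
        List<String[]> kept = new ArrayList<>();
        for (String[] triple : triples) {
            if (accept(triple[1])) {
                kept.add(triple);
            }
        }
        return kept;
    }
}
```

Passing more than one prefix would keep, for example, both FOAF and RDFS properties. The real filter's matching rules may differ from this simple `startsWith` check.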
>
> In addition, you could write your own EntityProcessor that checks
> whether a resource has all required fields before it is imported.
> EntityProcessor implementations are passed the Representation of an
> Entity, so it is very easy to write one that checks whether values for
> certain properties are present. If not all required properties are
> present, you can filter the entity out by returning null.
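The "return null to filter" contract can be made concrete with a self-contained Java sketch. It does not implement the Stanbol `EntityProcessor` interface, which would need the Stanbol indexing libraries on the classpath. Instead, a Representation is modelled as a hypothetical `Map<String, List<Object>>` from field IRI to values. The class name `RequiredFieldsProcessor` is also made up.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical stand-in for an EntityProcessor that drops entities lacking
 * required fields. A "representation" here is just field IRI -> values;
 * the real Stanbol types are deliberately not used so this runs on its own.
 */
final class RequiredFieldsProcessor {

    private final Set<String> requiredFields;

    RequiredFieldsProcessor(Set<String> requiredFields) {
        this.requiredFields = Set.copyOf(requiredFields);
    }

    /**
     * Returns the representation unchanged if every required field has at
     * least one value; returns null to signal that the entity is filtered out.
     */
    Map<String, List<Object>> process(Map<String, List<Object>> representation) {
        for (String field : requiredFields) {
            List<Object> values = representation.get(field);
            if (values == null || values.isEmpty()) {
                return null; // a required field is missing: skip this entity
            }
        }
        return representation;
    }
}
```

Which properties count as required (for example foaf:name) is a choice for your use case.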
>
> With those two things in place you should easily get a rather
> good-quality subset of FOAF data from the referenced dataset.
>
> best
> Rupert
>
> On Wed, Jun 19, 2013 at 12:52 PM, Dileepa Jayakody
> <[email protected]> wrote:
> > This is the link to the dataset project I'm looking at:
> > http://km.aifb.kit.edu/projects/btc-2012/
> >
> >
> > On Wed, Jun 19, 2013 at 4:21 PM, Dileepa Jayakody <[email protected]> wrote:
> >
> >> Hi Rupert et al,
> >> On Wed, Jun 19, 2013 at 2:27 PM, Rupert Westenthaler <
> >> [email protected]> wrote:
> >>
> >>> Hi
> >>>
> >>>
> >>> On Wed, Jun 19, 2013 at 9:20 AM, Dileepa Jayakody
> >>> <[email protected]> wrote:
> >>> > Hi All,
> >>> >
> >>> > I'm trying out the Entityhub indexing tool to configure a site for a
> >>> > sample FOAF dataset. My dataset (sampleNquads.nx) is in N-Quads
> >>> > format. Actually, it is a set of links to FOAF files from various
> >>> > sources in N-Quads format.
> >>> >
> >>> > e.g.:
> >>> > <http://www.agfa.com/> <http://www.agfa.com/global/en/main/index.jsp> .
> >>> > <http://sebastian.tramp.name/> <http://sebastian.tramp.name/index.rdf> .
> >>> > <http://gitorious.com/~tobyink> <http://gitorious.org/~tobyink> .
> >>> >
> >>>
> >>> I am not completely sure what you mean by that.
> >>>
> >>
> >> I had misunderstood the N-Quads format and thought it was just a set of
> >> links to external RDF files.
> >> The sample dataset I used was only a small, incomplete part of the actual
> >> datahub dataset (>1 GB). That might be why the indexer was not able to
> >> index it. :)
> >>
> >>>
> >>> In general, links to RDF files are not supported by the Indexing Tool.
> >>> You will need to download the RDF files into the
> >>> "indexing/resources/rdfdata" directory.
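If you go with the links, a rough way to fetch them with Java 11+ (`java.net.http`) is sketched below. Several details are hypothetical: the class `RdfDownloader`, its file-naming scheme, and the assumption that each link serves RDF/XML when asked for `application/rdf+xml`. Check the downloaded files before indexing, since a server may ignore the Accept header and return HTML.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical helper that downloads linked RDF documents into a local directory. */
final class RdfDownloader {

    /** Derives a file-system-safe name from a URL; ASSUMES the content is RDF/XML (.rdf). */
    static String fileNameFor(String url) {
        String name = url.replaceFirst("^[a-zA-Z][a-zA-Z0-9+.-]*://", "") // drop the scheme
                         .replaceAll("[^A-Za-z0-9._-]", "_");            // replace unsafe chars
        return name.endsWith(".rdf") ? name : name + ".rdf";
    }

    /** Downloads each URL into targetDir (e.g. indexing/resources/rdfdata). */
    static void downloadAll(List<String> urls, Path targetDir)
            throws IOException, InterruptedException {
        Files.createDirectories(targetDir);
        HttpClient client = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();
        for (String url : urls) {
            HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "application/rdf+xml") // ask for RDF/XML; servers may ignore it
                .GET()
                .build();
            Path target = targetDir.resolve(fileNameFor(url));
            HttpResponse<Path> response =
                client.send(request, HttpResponse.BodyHandlers.ofFile(target));
            if (response.statusCode() != 200) {
                Files.deleteIfExists(target); // do not leave error pages in rdfdata
                System.err.println("Skipping " + url + ": HTTP " + response.statusCode());
            }
        }
    }
}
```

For example: `RdfDownloader.downloadAll(List.of("http://sebastian.tramp.name/index.rdf"), Path.of("indexing/resources/rdfdata"));`.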
> >>>
> >>> Quad formats are in principle supported by the Indexing Tool. However,
> >>> note that only subject, predicate and object (SPO) are used; the
> >>> context is dropped during the import.
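A deliberately simplified Java reader for a single N-Quads line shows what "only SPO is used" means. It keeps subject, predicate and object and discards the optional fourth term (the context, or graph label). It also rejects lines with fewer than three terms, such as the two-IRI example lines quoted earlier in the thread, which are not valid N-Quads statements. The class `QuadToTriple` is hypothetical. It assumes terms are separated by whitespace and that literals contain no escaped quotes, so it is neither a full N-Quads parser nor the Indexing Tool's own code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified N-Quads line reader that keeps SPO and drops the context. */
final class QuadToTriple {

    /** Splits one N-Quads statement into its terms, stopping at the final '.'. */
    static List<String> terms(String line) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int n = line.length();
        while (i < n) {
            char c = line.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '.') {
                break; // end of statement
            } else if (c == '<') {
                // IRI: everything up to and including the closing '>'
                int end = line.indexOf('>', i);
                if (end < 0) {
                    throw new IllegalArgumentException("unterminated IRI: " + line);
                }
                out.add(line.substring(i, end + 1));
                i = end + 1;
            } else if (c == '"') {
                // literal: up to the next '"' (escaped quotes are NOT handled)
                int end = line.indexOf('"', i + 1);
                if (end < 0) {
                    throw new IllegalArgumentException("unterminated literal: " + line);
                }
                int j = end + 1;
                if (j < n && line.charAt(j) == '@') {
                    // language tag such as @en or @en-GB
                    j++;
                    while (j < n && (Character.isLetterOrDigit(line.charAt(j)) || line.charAt(j) == '-')) {
                        j++;
                    }
                } else if (line.startsWith("^^<", j)) {
                    // datatype IRI such as ^^<http://www.w3.org/2001/XMLSchema#int>
                    int dtEnd = line.indexOf('>', j);
                    if (dtEnd < 0) {
                        throw new IllegalArgumentException("unterminated datatype: " + line);
                    }
                    j = dtEnd + 1;
                }
                out.add(line.substring(i, j));
                i = j;
            } else {
                // blank node label such as _:b1 (assumed to be followed by whitespace)
                int j = i;
                while (j < n && !Character.isWhitespace(line.charAt(j))) {
                    j++;
                }
                out.add(line.substring(i, j));
                i = j;
            }
        }
        return out;
    }

    /** Returns {subject, predicate, object}; a fourth (context) term is dropped. */
    static String[] toTriple(String line) {
        List<String> t = terms(line);
        if (t.size() < 3 || t.size() > 4) {
            throw new IllegalArgumentException(
                "expected 3 or 4 terms but found " + t.size() + ": " + line);
        }
        return new String[] {t.get(0), t.get(1), t.get(2)};
    }
}
```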
> >>>
> >>>
> >>> For debugging the indexing process:
> >>>
> >>>   * the Indexing Tool logs the number of indexed entities. You should
> >>>     check this value.
> >>>   * the IDs of all indexed entities are also stored in
> >>>     "indexing/destination/indexed-entities-ids.zip". After installing
> >>>     the index in Stanbol you can use those IDs to retrieve the
> >>>     available data with requests like:
> >>>
> >>>     curl -H "Accept: text/turtle" "http://localhost:8080/entityhub/site/{site-name}/entity?id={entity-id}"
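To check many IDs, a small Java helper could read the ZIP file and build the lookup URL for each ID. Two things here are assumptions: the format of the file inside indexed-entities-ids.zip (assumed to be text with one entity ID per line), and the class `IndexedIdsChecker` itself, which is hypothetical. The ID is URL-encoded, since entity IDs are usually IRIs containing `:` and `/`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper for spot-checking indexed entity IDs against the Entityhub. */
final class IndexedIdsChecker {

    /**
     * Reads every non-blank line of every file in the ZIP stream.
     * ASSUMPTION: the ZIP holds text file(s) with one entity ID per line.
     */
    static List<String> readIds(InputStream zipStream) throws IOException {
        List<String> ids = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(zipStream, StandardCharsets.UTF_8)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                // Not closed on purpose: closing it would also close the ZipInputStream.
                BufferedReader reader =
                    new BufferedReader(new InputStreamReader(zip, StandardCharsets.UTF_8));
                String line;
                while ((line = reader.readLine()) != null) {
                    if (!line.isBlank()) {
                        ids.add(line.trim());
                    }
                }
            }
        }
        return ids;
    }

    /** Builds the Entityhub entity lookup URL for one ID (the ID is URL-encoded). */
    static String entityUrl(String baseUrl, String site, String id) {
        return baseUrl + "/entityhub/site/" + site + "/entity?id="
            + URLEncoder.encode(id, StandardCharsets.UTF_8);
    }
}
```

Each resulting URL can then be fetched with the curl command above, for example `entityUrl("http://localhost:8080", "datahub", id)` for the datahub site.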
> >>>
> >>> > I followed the instructions here [1] and in the ReadMe.md and
> >>> > indexing.properties files of the tool, and created a site {datahub}
> >>> > for my data, accessible at: http://localhost:8080/entityhub/site/datahub/
> >>> >
> >>> > However, when I try sample requests to find entities in the site I
> >>> > get no results.
> >>> > I'm trying to find the entity with name=Sebastian*, which is actually
> >>> > in the sample dataset shown above, but I get an empty result set. Can
> >>> > anyone please help me understand what I've done wrong here? Basically,
> >>> > I have followed the init and index steps of the tool.
> >>> >
> >>> > Is it because my dataset is only a set of external links to FOAF
> >>> > files? Do I need to manually download the FOAF files to the
> >>> > indexing/resources/rdfdata directory?
> >>> >
> >>> > e.g.:
> >>> >
> >>> > request: curl -X POST -d "name=Sebastian*" http://localhost:8080/entityhub/site/datahub/find
> >>> >
> >>> > result :
> >>> > {
> >>> >     "query": {
> >>> >         "selected": [
> >>> >             "http:\/\/stanbol.apache.org\/ontology\/entityhub\/query#score",
> >>> >             "http:\/\/www.w3.org\/2000\/01\/rdf-schema#label"
> >>> >         ],
> >>> >         "constraints": [{
> >>> >             "type": "text",
> >>> >             "patternType": "wildcard",
> >>> >             "text": "SSebastian Tramp",
> >>> >             "field": "http:\/\/www.w3.org\/2000\/01\/rdf-schema#label"
> >>> >         }],
> >>> >         "limit": 5,
> >>> >         "offset": 0
> >>> >     },
> >>> >     "results": []
> >>> > }
> >>> >
> >>>
> >>> For queries like that you need to make sure that your entities have
> >>> values for "rdfs:label". AFAIK the default
> >>> "indexing/config/mapping.txt" configuration copies the foaf:name
> >>> value to rdfs:label, but if you are working specifically with FOAF
> >>> data you should preferably query "foaf:name".
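Following that advice, a find request can target foaf:name instead of the default rdfs:label. The sketch below builds (but does not send) such a request with Java 11's `java.net.http`. It assumes the Entityhub `/find` resource accepts a `field` form parameter naming the property to match, alongside `name`; please verify this against the Entityhub REST documentation of your Stanbol version. The class `FoafNameQuery` is hypothetical.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical builder for an Entityhub find request that queries foaf:name. */
final class FoafNameQuery {

    static final String FOAF_NAME = "http://xmlns.com/foaf/0.1/name";

    /** Encodes the parameters as an application/x-www-form-urlencoded body. */
    static String formBody(Map<String, String> params) {
        return params.entrySet().stream()
            .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
            .collect(Collectors.joining("&"));
    }

    /** Builds (but does not send) a POST to {baseUrl}/entityhub/site/{site}/find. */
    static HttpRequest findRequest(String baseUrl, String site, String name) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("name", name);       // wildcard text, e.g. "Sebastian*"
        params.put("field", FOAF_NAME); // ASSUMED parameter: match foaf:name, not rdfs:label
        return HttpRequest.newBuilder(URI.create(baseUrl + "/entityhub/site/" + site + "/find"))
            .header("Content-Type", "application/x-www-form-urlencoded")
            .header("Accept", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(formBody(params)))
            .build();
    }
}
```

To send it: `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.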
> >>>
> >> Thanks for these useful pointers. I will follow them.
> >>
> >> In general, for my GSoC project on FOAF co-reference based
> >> disambiguation, do you think this datahub dataset is useful?
> >> It is the best dataset I have found so far, other than the DBpedia
> >> dataset already indexed in Stanbol.
> >>
> >> Thanks,
> >> Dileepa
> >>
> >>
> >>> best
> >>> Rupert
> >>>
> >>> >
> >>> > Your help is much appreciated here.
> >>> > Thanks,
> >>> > Dileepa
> >>> >
> >>> >
> >>> > [1] http://stanbol.apache.org/docs/trunk/customvocabulary.html
> >>>
> >>>
> >>>
> >>> --
> >>> | Rupert Westenthaler             [email protected]
> >>> | Bodenlehenstraße 11                             ++43-699-11108907
> >>> | A-5500 Bischofshofen
> >>>
> >>
> >>
>
>
>
>
