Re: Big data vs Apache Jena

2015-10-30 Thread Rob Walpole
Usually when people talk about big data in anything but a very general
sense they are talking about Apache Hadoop and the MapReduce model which is
a way of parallel processing very large data sets. This is a completely
different model to the RDF graph model supported by Apache Jena. That's not
to say you can't process large data sets in parallel using Jena - but that
would be big data in the very general sense.

Rob


On Thu, Oct 29, 2015 at 9:17 PM, kumar rohit  wrote:

> Hello, how can big data be related to Jena or the semantic web in general?
>


Re: REST Web services role in Jena and Semantic web

2015-09-10 Thread Rob Walpole
Hi Kumar,

The DRI Catalogue used at the UK National Archives uses a RESTful web
service built on top of the Jena stack to - exactly as John says - provide
access to the information held in RDF without the user (or application in
this case) needing to know anything at all about the underlying technology
or its data structures. It means that should the need arise, the underlying
technology can be replaced without users (or applications) needing to know.

You can read more about this approach here
https://www.nationalarchives.gov.uk/documents/information-management/xml-london-tna-rw.pdf
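If it helps to make that concrete, here is a very rough sketch of the idea (not our actual implementation - the resource path, dataset location and query are invented, and the package names assume Jena 2.x plus JAX-RS): a REST resource that answers plain JSON requests by running SPARQL behind the scenes.

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.QuerySolution;
import com.hp.hpl.jena.query.ResultSet;

@Path("/records")
public class RecordResource {

    // the endpoint URL is an assumption - point it at your own Fuseki query service
    private static final String ENDPOINT = "http://localhost:3030/catalogue/query";

    @GET
    @Path("/{id}")
    @Produces(MediaType.APPLICATION_JSON)
    public String getRecordTitle(@PathParam("id") String id) {
        // in real code the identifier should be escaped rather than concatenated
        String sparql =
            "PREFIX dcterms: <http://purl.org/dc/terms/> " +
            "SELECT ?title WHERE { ?record dcterms:identifier \"" + id + "\" ; " +
            "dcterms:title ?title }";
        QueryExecution qe = QueryExecutionFactory.sparqlService(ENDPOINT, sparql);
        try {
            ResultSet results = qe.execSelect();
            String title = null;
            if (results.hasNext()) {
                QuerySolution solution = results.next();
                title = solution.getLiteral("title").getString();
            }
            // the caller sees plain JSON and never needs to know RDF sits underneath
            return "{ \"id\": \"" + id + "\", \"title\": "
                 + (title == null ? "null" : "\"" + title + "\"") + " }";
        } finally {
            qe.close();
        }
    }
}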

Rob


On Wed, Sep 9, 2015 at 7:40 PM, kumar rohit  wrote:

> Yes John, I want examples of REST web services which sit on top of a Jena
> implementation which contains data in owl/rdf. If I do not use JSP, how can
> I achieve it via REST web services?
>
> On Wed, Sep 9, 2015 at 7:27 PM, John A. Fereira  wrote:
>
> > Are you looking for specific use cases or examples of REST web services
> > which sit on top of a Jena implementation which contains data in owl/rdf?
> >
> > I can point to a few of them but in general, a use of RESTful web
> > services could be to provide access to data represented in owl/rdf
> without
> > the requirement of knowing anything about owl, rdf, or other semantic web
> > technologies.
> >
> > -Original Message-
> > From: kumar rohit [mailto:kumar.en...@gmail.com]
> > Sent: Wednesday, September 9, 2015 2:09 PM
> > To: users@jena.apache.org
> > Subject: REST Web services role in Jena and Semantic web
> >
> > Hello what are the uses of Restful web services and AJAX in developing
> > semantic web applications. I mean if I have an owl file/ontology in Jena,
> > where is the role of the Restful web services and Ajax tool then?
> >
>


Re: RDF or OWL

2015-08-20 Thread Rob Walpole
You will need to use an OntModel if you want to instantiate your OWL classes.
I assume your vocabulary is actually based on OWL, i.e. all your classes
are subclasses of owl:Thing - so if you want to use these in your Java code
- perhaps to write queries - you will need to instantiate them something
like this:

OntClass MY_OWL_CLASS = MODEL.createClass(MY_VOCAB_URI + "MyClassName");

where MODEL is an instance of OntModel created something like this:

OntModel MODEL = ModelFactory.createOntologyModel(OntModelSpec.OWL_DL_MEM);

I believe you can also extract inferred triples from an OntModel but I
haven't tried this myself.
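If you do want to try it, something along these lines should work - treat it as an untested sketch using one of Jena's built-in rule reasoners rather than a recipe:

// OWL_MEM_MICRO_RULE_INF attaches a rule reasoner, so the model holds asserted plus inferred triples
OntModel inf = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM_MICRO_RULE_INF);
inf.read("file:module.owl");

// listStatements() then returns the inferred statements alongside the asserted ones
StmtIterator it = inf.listStatements();
while (it.hasNext()) {
    System.out.println(it.nextStatement());
}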

Rob

Rob Walpole
Email robkwalp...@gmail.com
Tel. +44 (0)7969 869881
http://www.linkedin.com/in/robwalpole


On Thu, Aug 20, 2015 at 8:41 AM, kumar rohit kumar.en...@gmail.com wrote:

 Suppose a Protege file which is saved as module.owl in Rdf/xml serialized
 form and we want to import it in Jena, what classes will we use? I mean RDF
 classes like

 Model model = ModelFactory.createDefaultModel();

 OR Ontmodel classes and methods like

 OntModel m = ModelFactory.createOntologyModel();


 I am confused so kindly explain the situation when to use RDF and when
 to use OWL classes when working on Protege and Jena.



Re: JENA code or SPARQL

2015-08-20 Thread Rob Walpole
You probably want to do both. Write your SPARQL queries - either in Jena
ARQ or in text files which you post to Jena Fuseki (depending on your set
up) - and then load the results into a Model to extract the bits of
information you need.
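As a rough sketch of the ARQ route (the foaf vocabulary here is just for illustration, and 'model' is whatever Model you already have):

String sparql =
    "PREFIX foaf: <http://xmlns.com/foaf/0.1/> " +
    "CONSTRUCT { ?s foaf:name ?name } WHERE { ?s foaf:name ?name }";

QueryExecution qe = QueryExecutionFactory.create(QueryFactory.create(sparql), model);
try {
    // a CONSTRUCT comes back as a Model, so the usual Model API works on the results
    Model results = qe.execConstruct();
    Property name = results.createProperty("http://xmlns.com/foaf/0.1/name");
    ResIterator people = results.listSubjectsWithProperty(name);
    while (people.hasNext()) {
        System.out.println(people.next());
    }
} finally {
    qe.close();
}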

Rob

Rob Walpole
Email robkwalp...@gmail.com
Tel. +44 (0)7969 869881
http://www.linkedin.com/in/robwalpole


On Thu, Aug 20, 2015 at 8:10 AM, kumar rohit kumar.en...@gmail.com wrote:

 What is the best way to query a model in Jena? Through classes and methods
 of Jena like below
 ResIterator itr = model.listSubjectsWithProperty(studies);

 Or through SPARQL inside Jena code?

 and how is one better than the other?



Re: Is this a good way to get started?

2014-12-12 Thread Rob Walpole
Hi again Nate,

When you talk about your T-box data, I think this would contain class
 hierarchies, information about which ones are disjoint, etc. Is that right?


Exactly.. and it is a tiny dataset compared to the instance data on the A-box
side.


 Is there ever a risk that a change to the ontology component (T-box) can
 invalidate the data component (A-box)? If so, how do you manage that?


Definitely.. but in that case we create a patch using SPARQL Update and
apply the patch to the data. We keep the patch in our version control
system so that we have a record of the changes we have made and can
re-apply them if necessary.
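As a sketch of how a patch gets applied (the file name and endpoint URL are invented, and this assumes ARQ's remote update classes):

// read a patch held in version control and push it to the Fuseki update endpoint
String patch = FileUtils.readWholeFileAsUTF8("patches/0003-rename-term.ru");
UpdateRequest request = UpdateFactory.create(patch);
UpdateProcessor processor =
    UpdateExecutionFactory.createRemote(request, "http://localhost:3030/catalogue/update");
processor.execute();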


 When you load straight to the triple store, is there a single RDF? if not,
 do you use an assembler to gather to multiple files?


This depends. We happen to be using named graphs - I don't know whether
this is appropriate for you or not. We also happen to be using Jena TDB so
if the data is held in N-Quad format then we load a single file which
contains separate graphs. Jena allows you to do this using the tdbloader
command line tool. We could just as easily load separate RDF files that
were in a triple format such as Turtle and specify the graph name during
loading. The result is the same. I wouldn't get too hung up on named graphs
unless you think they are really appropriate for you though as they do add
some complexity to updating the data which it may be better to avoid at
first. The reason we chose to do this is that our ontology is still
developing and we wanted to be able to delete terms that we had decided to
dump without leaving cruft in the triplestore. Dropping the graph seemed to
be the best way to achieve this.
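If you want the same effect from Java rather than the command line, it looks roughly like this (the paths and graph name are invented for the example):

Dataset dataset = TDBFactory.createDataset("/path/to/tdb");
dataset.begin(ReadWrite.WRITE);
try {
    // read the Turtle file into a named graph rather than the default graph
    Model graph = dataset.getNamedModel("http://example.org/graph/ontology");
    graph.read("file:ontology.ttl", "TURTLE");
    dataset.commit();
} finally {
    dataset.end();
}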


 Does separating the T-box and A-box data have any down sides?  Is it
 invisible to reasoners , for example?


Yes, as I say, using named graphs adds complexity to updates. We are using
Fuseki and we specify <#dataset> tdb:unionDefaultGraph true in the
Fuseki config file, and this means that when we query the data we can
forget about the named graphs as it is transparent to the query. When we do
updates though, strange things can happen if we don't specify the graph name
in the right part of the query. I can't say how it impacts on reasoning as
we don't use this at present.


 Finally, I'm obviously a complete neophyte.  Am I in the wrong group?  I
 don't want to put noise in the channel


Being a neophyte is cool - welcome! Whether this is the right group
depends on whether your questions relate to Jena specifics or not.. it seems
to me they do, at least in part..

Rob



 Thanks again!

 On Thu, Dec 11, 2014 at 12:20 PM, Rob Walpole robkwalp...@gmail.com
 wrote:

  Hi Nate,
 
  I'm not sure what you mean by an ontology management workflow exactly
 and
  I can't comment on whether your approach is a good one or not... but what
  we have done is to create our own ontology which as far as possible
 reuses
  or extends other pre-existing ontologies (e.g. central-government, dublin
  core etc.). This ontology consists of a load of classes, object
 properties
  and data properties which are used inside our actual data. The ontology
 (or
  TBox - http://en.wikipedia.org/wiki/Tbox) and data (or ABox -
  http://en.wikipedia.org/wiki/Abox) components exist as separate datasets
  and we have found it convenient to store them as separate named graphs
  within our triplestore - mainly so that the ontology component can be
  updated easily by dropping and reloading the graph.
 
  We manage the ontology using Protege and I have to say I find modelling
  things in Protege saves me from wasting huge amounts of time as it forces
  me to model things up front before I start fiddling about with the data.
 I
  find the OntoGraf plugin particularly helpful when I need to visualise
  relationships and when discussing requirements with users. Protege also
  allows you to save the ontology as an RDF file which you can load
 straight
  into your triplestore (Jena TDB in our case).
 
  We also keep a number of named individuals in the ontology itself. These
  are for things that are entities but what I think of (coming from a Java
  background) as statics. They are the entities which are very unlikely to
  change and if they do then I am happy to edit them within the ontology.
 
  Hope that helps in some way.
 
  Rob
 
  Rob Walpole
  Email robkwalp...@gmail.com
  Tel. +44 (0)7969 869881
  Skype: RobertWalpole
  http://www.linkedin.com/in/robwalpole
 
 
  On Thu, Dec 11, 2014 at 12:30 PM, Nate Marks npma...@gmail.com wrote:
 
   I'm trying to get my arms around an ontology management workflow.  I've
   been reading the docs on the Apache Jena site  and a couple of books.
  I
   was hoping to test my understanding of the technology by sharing my
  current
   plan and gathering some feedback.
  
   Thanks in advance if you have the time to comment!
  
  
   I intend to tightly manage a pretty broad ontology.  Let's say it
  includes
   assets, locations, people and workflows

Re: Is this a good way to get started?

2014-12-11 Thread Rob Walpole
Hi Nate,

I'm not sure what you mean by an ontology management workflow exactly and
I can't comment on whether your approach is a good one or not... but what
we have done is to create our own ontology which as far as possible reuses
or extends other pre-existing ontologies (e.g. central-government, dublin
core etc.). This ontology consists of a load of classes, object properties
and data properties which are used inside our actual data. The ontology (or
TBox - http://en.wikipedia.org/wiki/Tbox) and data (or ABox -
http://en.wikipedia.org/wiki/Abox) components exist as separate datasets
and we have found it convenient to store them as separate named graphs
within our triplestore - mainly so that the ontology component can be
updated easily by dropping and reloading the graph.

We manage the ontology using Protege and I have to say I find modelling
things in Protege saves me from wasting huge amounts of time as it forces
me to model things up front before I start fiddling about with the data. I
find the OntoGraf plugin particularly helpful when I need to visualise
relationships and when discussing requirements with users. Protege also
allows you to save the ontology as an RDF file which you can load straight
into your triplestore (Jena TDB in our case).

We also keep a number of named individuals in the ontology itself. These
are things that are entities but which I think of (coming from a Java
background) as statics. They are the entities which are very unlikely to
change and if they do then I am happy to edit them within the ontology.

Hope that helps in some way.

Rob

Rob Walpole
Email robkwalp...@gmail.com
Tel. +44 (0)7969 869881
Skype: RobertWalpole
http://www.linkedin.com/in/robwalpole


On Thu, Dec 11, 2014 at 12:30 PM, Nate Marks npma...@gmail.com wrote:

 I'm trying to get my arms around an ontology management workflow.  I've
 been reading the docs on the Apache Jena site  and a couple of books.   I
 was hoping to test my understanding of the technology by sharing my current
 plan and gathering some feedback.

 Thanks in advance if you have the time to comment!


 I intend to tightly manage a pretty broad ontology.  Let's say it includes
 assets, locations, people and workflows.

 I think I want to have a single schema file that describes the asset
 class hierarchy  and the rules for validating assets based on properties,
 disjointness etc.

 Then I might have a bunch of other data files that enumerate all the
 assets using that first schema  file.

 I'd repeat this structure using a schema file each for locations, people,
 workflows.

 Having created these files, I think I can  use an assembler file to pull
 them into a single model.

 Ultimately, I expect to query the data using Fuseki and this is where I get
 a little hazy.  I think the assembler can pull the files into a single
 memory model, then I can write it to a tdb.

 Is that necessary, though?  it's a simple bit of java, but I have the
 nagging feeling that there's a shorter path to automatically load/validate
 those files for  Fuseki


 Is this approach to organizing the files sound?



Re: Jena / Stanbol success stories?

2014-11-25 Thread Rob Walpole
Hi Phil,

As Adam says we are using Apache Jena at the UK National Archives to power
the catalogue of our digital repository. The setup we use has Jena TDB as
the triplestore and Jena Fuseki as the front end for reading and writing
data over HTTP. We also use other components of the Jena project as needed,
such as the Java RDF and ARQ APIs for creating the graphs and queries which
are posted to Fuseki. We also use the Elda Linked Data API (
https://github.com/epimorphics/elda) implementation which provides a nice
out-of-the-box means of viewing and querying the contents of the triple
store without the need for users to understand SPARQL and RDF.
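To give a flavour of the "creating graphs and posting them to Fuseki" part, something like the following works against Fuseki's graph store endpoint - the URLs and vocabulary are placeholders, and it assumes ARQ's DatasetAccessor client:

// build a small graph in memory...
Model model = ModelFactory.createDefaultModel();
Resource record = model.createResource("http://example.org/record/123");
record.addProperty(DCTerms.identifier, "123");

// ...and push it into a named graph via the graph store protocol endpoint
DatasetAccessor accessor =
    DatasetAccessorFactory.createHTTP("http://localhost:3030/catalogue/data");
accessor.add("http://example.org/graph/records", model);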

Although it delivers all of our immediate needs I don't think we have
really begun to make use of the power of this technology yet. This will
come when we start to use Natural Language Processing to extract richer
metadata and combine this with entity recognition and inference to extract
new facts and connections. Ideally the data would also be opened to the
Linked Data Cloud enabling us to make connections to other archives and
authoritative organisations and to allow others to make their own
connections to us.

I wrote a conference paper on all of this a couple of years ago which you
can read here -
http://www.nationalarchives.gov.uk/documents/information-management/xml-london-tna-rw.pdf
- it is a little outdated but the key facts are unchanged.

HTH
Rob

Rob Walpole
Email robkwalp...@gmail.com
Tel. +44 (0)7969 869881
Skype: RobertWalpole
http://www.linkedin.com/in/robwalpole


On Mon, Nov 24, 2014 at 9:04 PM, Adam Retter adam.ret...@googlemail.com
wrote:

 The National Archives (UK) are using Apache Jena to power the
 Linked-Data Catalogue which forms the backbone of their new Digital
 Archive system (DRI). They are also using Fuseki and Elda which are
 related to Jena.

 I no longer work on the project, but Rob (Cc'd) might be able to tell
 you more if you want to know.

 On 24 November 2014 at 03:19, Phillip Rhodes motley.crue@gmail.com
 wrote:
  Hi all, I was just wondering if anybody knows of, or is involved with,
  any projects using Jena and/or Stanbol which (have been|can be)
  discussed and cited publicly?
 
  A local company that I've been talking to is interested in possibly
  using SemWeb technology (specifically, Jena/Stanbol) internally, but
  are looking for some evidence to support the assertion that this
  technology delivers and is for real.
 
  Any pointers or references would be appreciated... or if you are
  personally involved in something and are willing to talk about it
  (possibly with appropriate NDAs, etc. in place), I'd love to talk to
  you.
 
 
  Thanks,
 
 
 
  Phil
  ---
  This message optimized for indexing by NSA PRISM



 --
 Adam Retter

 skype: adam.retter
 tweet: adamretter
 http://www.adamretter.org.uk



Re: SPARQL protocol and using-graph-uri parameter

2014-07-31 Thread Rob Walpole
Thanks for the explanation Andy. I'm still a bit confused about what is
meant by the default graph though TBH... I thought starting Fuseki with
tdb:unionDefaultGraph true might mean that queries and updates would work
without specifying the graph name? It certainly seems to be true of queries
(Elda just works with no change) - but perhaps this is because only the
WHERE clause is affected by this configuration?

As for updates.. if we use the Fuseki command line s-post tool this also
appears to work but only if we specify the graph name to update as a
parameter (i.e. the named graph URI and not default) so I guess this is
somehow modifying the underlying update? As we have written our own Fuseki
client to use the SPARQL protocol it seems we need to modify the updates
that go through here to include WITH.. but we will bite the bullet as you
suggest :)

Cheers
Rob


On Wed, Jul 30, 2014 at 6:12 PM, Andy Seaborne a...@apache.org wrote:

 On 30/07/14 16:10, Rob Walpole wrote:

 Hi Andy,


  What's the problem with using WITH?



 The problem is that we want to switch our default dataset from one
 using a single unnamed graph as the default graph to one using two named
 graphs as the default graph. We have figured out how to do the data part
 of
 the switch but we already have a lot of SPARQL update queries in various
 places which we wanted to avoid having to modify.


 I'd bite the bullet and change them -- it's a style thing but hacking the
 protocol to effectively modify the operation makes for long-term confusion
 where someone maintaining one of those updates isn't aware of the layer
 that modifies the operation via the protocol.

 Doing updates really does presume knowing the physical dataset.  Your
 updates using USING were successfully changing the (empty!) default graph.


  I was hoping that the
 using-named-graph-uri query parameter would just do this - but I
 understand
 from your explanation and re-reading the spec that it only affects the
 WHERE clause. This seems slightly odd as the spec implies to me it is an
 alternative to WITH...

 
  The RDF Dataset for an update operation may be specified either in the
  operation string itself using the USING, USING NAMED, and/or WITH keywords,
  or it may be specified via the using-graph-uri and using-named-graph-uri
  parameters.

 

 ..but I guess it just cannot be used with a WITH.


 USING etc behaves like FROM -- in TDB, that means pick out of the dataset
 but in a different implementation it may be pick off the web. WITH always
 picks from the dataset -- it is syntactic sugar for the 2*GRAPH form.

 WITH is a different mechanism - it does specify a dataset for the update,
 one where the default graph is different for both access (WHERE) and change
 (DELETE, INSERT).

 But WITH is not a form of USING - each has its own use and they are
 different.

 What's sort of missing is a protocol argument with-graph-uri.

 Andy



 Cheers
 Rob





-- 

Rob Walpole
Email robkwalp...@gmail.com
Tel. +44 (0)7969 869881
Skype: RobertWalpole
http://www.linkedin.com/in/robwalpole


SPARQL protocol and using-graph-uri parameter

2014-07-30 Thread Rob Walpole
Hi,

We are running TDB-backed Fuseki started with an assembler file with
tdb:unionDefaultGraph set to true. There are two named graphs in our
dataset. If I want to delete an item from one of the graphs I can do so
using the WITH clause as in the following example...

PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dri: <http://nationalarchives.gov.uk/terms/dri#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

WITH <http://nationalarchives.gov.uk/dri/catalogue>
DELETE
{
  <http://nationalarchives.gov.uk/dri/catalogue/record-list/abc>
dri:recordListMember ?item .
}
WHERE
{
  ?item dcterms:identifier "123"^^xsd:string .
}

However I am sending my updates over HTTP using the SPARQL Protocol and so
I would prefer not to use the WITH clause but instead to use the
using-graph-uri parameter in my HTTP POST request. My understanding from
the specs is that the following HTTP request should be equivalent but it
doesn't seem to be. Can anyone confirm if this should work? Although it
executes successfully (response 204) - the data is not deleted...

POST 
/catalogue/update?using-graph-uri=http%3A%2F%2Fnationalarchives.gov.uk%2Fdri%2Fcatalogue
HTTP/1.1
Host: localhost:3030
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.5) Gecko/20120606
Firefox/10.0.5
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Content-Type: application/sparql-update; charset=UTF-8
Content-Length: 557
Pragma: no-cache
Cache-Control: no-cache
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dri: <http://nationalarchives.gov.uk/terms/dri#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
DELETE { <http://nationalarchives.gov.uk/dri/catalogue/record-list/abc>
dri:recordListMember ?item . }
WHERE { ?item dcterms:identifier "123"^^xsd:string . }
HTTP/1.1 204 No Content
Fuseki-Request-ID: 40
Access-Control-Allow-Origin: *
Server: Fuseki (1.0.2)
--

Thanks
Rob

-- 

Rob Walpole
Email robkwalp...@gmail.com
Tel. +44 (0)7969 869881
Skype: RobertWalpole
http://www.linkedin.com/in/robwalpole


Re: SPARQL protocol and using-graph-uri parameter

2014-07-30 Thread Rob Walpole
Hi Andy,


 What's the problem with using WITH?


The problem is that we want to switch our default dataset from one
using a single unnamed graph as the default graph to one using two named
graphs as the default graph. We have figured out how to do the data part of
the switch but we already have a lot of SPARQL update queries in various
places which we wanted to avoid having to modify. I was hoping that the
using-named-graph-uri query parameter would just do this - but I understand
from your explanation and re-reading the spec that it only affects the
WHERE clause. This seems slightly odd as the spec implies to me it is an
alternative to WITH...


The RDF Dataset for an update operation may be specified either in the
operation string itself using the USING, USING NAMED, and/or WITH keywords,
or it may be specified via the using-graph-uri and using-named-graph-uri
parameters.



..but I guess it just cannot be used with a WITH.

Cheers
Rob


Re: Comparing xsd:date/time with different time zones

2014-04-10 Thread Rob Walpole
We hit this exact problem a while ago...

Basically if no timezone is specified then it takes the max and min
possible timezone difference, i.e. +14 hours and -14 hours, and for
anything that falls within this block of time no comparison can be made.

The best solution is just to add the timezone information. However
retrospectively adding timezone information is a challenge in its own
right! The lesson for us is to always use timezones.
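For what it's worth, if you build the literals from a Java Calendar then Jena writes them out with an explicit timezone for you - a rough sketch using the standard Model API, nothing project-specific:

Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
Model model = ModelFactory.createDefaultModel();

// createTypedLiteral(Calendar) produces an xsd:dateTime literal with an explicit timezone
Literal modified = model.createTypedLiteral(cal);
System.out.println(modified.getLexicalForm()); // ISO 8601 dateTime ending in Z (or an explicit offset)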

Rob



On Tue, Apr 8, 2014 at 10:21 AM, Andy Seaborne a...@apache.org wrote:

 On 08/04/14 01:38, Holger Knublauch wrote:

 Hi all,

 we noticed a change in behavior between the recent Jena versions.


 Between which versions?



 SELECT *
 WHERE {
  BIND("2014-04-08T00:00:00"^^xsd:dateTime AS ?time0) .
  BIND("2014-04-08T00:00:00+10:00"^^xsd:dateTime AS ?time1) .
  BIND(xsd:date(?time0) AS ?date0) .
  BIND(xsd:date(?time1) AS ?date1) .
  BIND(?date0 < ?date1 AS ?compare)
 }

 With Jena 2.11 ?compare is unbound, indicating that it is impossible to
 compare times or dates that have a different time zone.


 "2014-04-08T00:00:00"^^xsd:dateTime does not have a timezone.

 No timezone is not the same as different timezone.


  Is this the expected behavior? Shouldn't it in theory figure out the
 time zones itself (e.g. by aligning all to a universal time zone)
 instead of failing?


 See the XSD FO spec for comparing dateTimes with and without timezones.

 14 is a magic number.


  What would be a work-around to do this manually for data that is
 stored in mixed time zones?


 Be careful.

 The problem is that the data publisher and the query engine (and indeed
 the client asking the query) may be in different timezones so defaulting a
 timezone does not make sense on the web.

 There is nothing special about Z.

 Andy

 PS Please use the users@jena.apache.org mailing list, not the incubator
 one.


 Thanks
 Holger





-- 

Rob Walpole
Email robkwalp...@gmail.com
Tel. +44 (0)7969 869881
Skype: RobertWalpole
http://www.linkedin.com/in/robwalpole


Re: Re: Query causes a StackOverflowError

2014-03-18 Thread Rob Walpole
On Tue, Mar 18, 2014 at 11:59 AM, Chris Dollin chris.dol...@epimorphics.com
 wrote:

 On Monday, March 17, 2014 06:17:10 PM Adam Retter wrote:

  We did try using the:
 
  yourSPARQLEndpoint elda:supportsNestedSelects true.
 
  Although I think we found that the option was actually (non-plural):
 
  yourSPARQLEndpoint elda:supportsNestedSelect true.

 Oops, sorry.


Actually what we used in the end looked like:-

<http://localhost:3030/catalogue/query> elda:supportsNestedSelect
"true"^^xsd:string .

Fuseki then said it was using nested selects (in the 500 error) - but it
didn't make any difference to the problem.



 If the fix Rob made was to change from labelledDescribeViewer to
 a simpler viewer like api:describerViewer then that's a confirmation
 and I'll add an issue to the issues list.


The thing is we do actually want the labels from all the viewed items as we
use these elsewhere. As the generated union query seemed to be the root of
the problem we switched to using our Elda construct extension so we could
have an endpoint like this:

spec:record-list a apivc:ListEndpoint
; apivc:uriTemplate "/record-list/{uuid}"
; apivc:variable [apivc:name "uuid"; apivc:type xsd:string]
; tna:construct """
CONSTRUCT { ?member rdfs:label ?label . }
  WHERE {
?recordList dcterms:identifier ?uuid ;
dri:recordListMember ?member .
?member rdfs:label ?label .
}
"""
.

The query takes a while to run (there are approx 10,000
dri:recordListMember entries) but we get a result back eventually with no
stack overflow problem.

Rob
-- 

Rob Walpole
Email robkwalp...@gmail.com
Tel. +44 (0)7969 869881
Skype: RobertWalpole
http://www.linkedin.com/in/robwalpole


Re: Naming entities

2013-09-02 Thread Rob Walpole
+1 for UUIDs but with some domain information. i.e.
http://my.base.uri/person/{uuid} or http://my.base.uri/employee/{uuid}

In RDF terms you can have more human readable information in the label and
render this where required, i.e.

<http://my.base.uri/person/{uuid-goes-here}> a foaf:Person ;
rdfs:label "David Moss" .
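In Jena code that might look something like this (the base URI is obviously whatever you choose):

Model model = ModelFactory.createDefaultModel();
String uri = "http://my.base.uri/person/" + UUID.randomUUID();

Resource person = model.createResource(uri);
person.addProperty(RDF.type, model.createResource("http://xmlns.com/foaf/0.1/Person"));
person.addProperty(RDFS.label, "David Moss");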


On Thu, Aug 15, 2013 at 1:32 PM, Martynas Jusevičius
marty...@graphity.org wrote:

 Where uniqueness is more important than readability, I would go with UUIDs.

 On Thu, Aug 15, 2013 at 2:03 AM, David Moss admo...@gmail.com wrote:
  This is a fairly basic question, but how do others go about naming
 entities in an RDF graph?
 
  The semantic web evangelists are keen on URIs that mean something ie 
 http://admoss.info/David_Moss.
 
  This sounds great but in practice it doesn't scale. There are many
 people named David Moss in the world.
 
  It is possible to have URIs such as http://admoss.info/David_Moss1 
 http://admoss.info/David_Moss2 ... http://admoss.info/David_Moss249,
 but differentiating between them is not a human readable task. It also
 becomes problematic in tracking the highest number of each entity name so
 additions can be made to the graph.
 
  I first tried using blank nodes as entity identifiers but they are no
 good for the purpose as searching is difficult and they are not supposed to
 be used outside the environment in which they are created. They are
 supposed to be internal only references for convenience of the machine.
 They are also the antithesis of human readable.
 
  I currently maintain a next_id entity in my graph and use and update
 its value to obtain entity names, ending up with 
 http://admoss.info/person22, http://admoss.info/organisation23 and 
 http://admoss.info/Building24 etc.
 
  This is not exactly human readable, but I can't think of any naming
 policy that maintains the dream of human readable identifiers yet scales.
 
  How are others addressing this issue?
 
 




-- 

Rob Walpole
Email robkwalp...@gmail.com
Tel. +44 (0)7969 869881
Skype: RobertWalpole
http://www.linkedin.com/in/robwalpole


Re: Default and named graphs within TDB/Fuseki

2013-07-10 Thread Rob Walpole
Hi Andy,

Thanks for your reply.


 That load for the purpose of the query ... you already have the data in
 the dataset.  Remove these.

 Did you load into http://example.org/bob;


So I used tdbloader2 to store dft.ttl into an empty instance of TDB. I
assume this places the data in the default graph? The dft.ttl data looks
like this:

@prefix dc: <http://purl.org/dc/elements/1.1/> .

<http://example.org/bob>    dc:publisher  "Bob Hacker" .
<http://example.org/alice>  dc:publisher  "Alice Hacker" .

Then I used s-put to store bob.ttl and alice.ttl into the named graphs
http://example.org/bob and http://example.org/alice respectively.

bob.ttl looks like this:-

@prefix foaf: <http://xmlns.com/foaf/0.1/> .

_:a foaf:name "Bob" .
_:a foaf:mbox <mailto:b...@oldcorp.example.org> .

and alice.ttl looks like this...

@prefix foaf: <http://xmlns.com/foaf/0.1/> .

_:a foaf:name "Alice" .
_:a foaf:mbox <mailto:al...@work.example.org> .

When I run the query though I get no results.


 Run this:

 SELECT * {
   { ?s ?p ?o } UNION { GRAPH ?g { ?s ?p ?o } }
 }

 to see what you have.


I have this:-

| s                        | p                                         | o                               | g                        |
|--------------------------|-------------------------------------------|---------------------------------|--------------------------|
| http://example.org/bob   | http://purl.org/dc/elements/1.1/publisher | "Bob Hacker"                    |                          |
| http://example.org/alice | http://purl.org/dc/elements/1.1/publisher | "Alice Hacker"                  |                          |
| _:b0                     | http://xmlns.com/foaf/0.1/mbox            | mailto:b...@oldcorp.example.org | http://example.org/bob   |
| _:b0                     | http://xmlns.com/foaf/0.1/name            | "Bob"                           | http://example.org/bob   |
| _:b1                     | http://xmlns.com/foaf/0.1/mbox            | mailto:al...@work.example.org   | http://example.org/alice |
| _:b1                     | http://xmlns.com/foaf/0.1/name            | "Alice"                         | http://example.org/alice |






  I have tried using the form http://localhost:3030/my-dataset/data?default
 but this has no effect (although I can download the data from here...)


 Longer:

 In TDB it picks FROM/FROM NAMED from the set of already loaded named
 graphs.

 http://jena.apache.org/documentation/tdb/dynamic_datasets.html

 But it is only from data already loaded.  You probably don't want to do
 that.


Well yes that is what I want - all of the data is loaded but some is in
what I think is the default graph and some is in named graphs. Are you
saying that the default graph is either the data not in a named graph or the
combination of named graphs?  It can only be one or the other?

Thanks
Rob


Default and named graphs within TDB/Fuseki

2013-07-09 Thread Rob Walpole
Hi,

I could use some help to understand default and named graphs within Jena
(TDB/Fuseki)...

In the following example taken from the W3C SPARQL 1.1 spec what needs to
go in place of...

FROM <http://example.org/dft.ttl>

...assuming I have loaded dft.ttl into the 'default' graph of TDB and am
querying via Fuseki?

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

SELECT ?who ?g ?mbox
FROM <http://example.org/dft.ttl>
FROM NAMED <http://example.org/alice>
FROM NAMED <http://example.org/bob>
WHERE
{
   ?g dc:publisher ?who .
   GRAPH ?g { ?x foaf:mbox ?mbox }
}

I have tried using the form http://localhost:3030/my-dataset/data?default
but this has no effect (although I can download the data from here...)

Many thanks
Rob

-- 

Rob Walpole
Email robkwalp...@gmail.com
Tel. +44 (0)7969 869881
Skype: RobertWalpole
http://www.linkedin.com/in/robwalpole


Re: Conditional INSERT statements

2013-07-05 Thread Rob Walpole
Hi Andy,

Just to complete the thread and for anyone else that needs to do something
similar, we now have this working nicely using two operations, as you
suggested. Here is the code...

DELETE
{
?transfer dri:transferAsset ?transferAsset .
?transferAsset ?transferAssetProperty ?transferAssetValue .
}
WHERE
{
?transfer dcterms:identifier "20130628134601"^^xsd:string .
?subject dcterms:identifier
"9a10d7cf-c7e8-4e06-91c7-1d46aaa6e52e"^^xsd:string .
  OPTIONAL
{
?transfer dri:transferAsset ?transferAsset .
?transferAsset dcterms:subject  ?subject ;
dcterms:modified ?transferAssetModified ;
?transferAssetProperty ?transferAssetValue .
FILTER(?transferAssetModified <
"2013-06-28T15:21:00Z"^^xsd:dateTime)
  }
};
INSERT
{
?transfer dri:transferAsset [
dri:transferAssetStatus
<http://nationalarchives.gov.uk/dri/catalogue/transferAssetStatus#COMPLETED>
;
dcterms:modified  "2013-06-28T15:21:00Z"^^xsd:dateTime ;
dcterms:subject  ?subject
]
}
WHERE
{
# where the transfer asset doesn't exist - i.e. it is new or has not been
# deleted
  ?transfer dcterms:identifier "20130628134601"^^xsd:string .
?subject dcterms:identifier
"9a10d7cf-c7e8-4e06-91c7-1d46aaa6e52e"^^xsd:string .
FILTER(NOT EXISTS { ?transfer dri:transferAsset [ dcterms:subject
?subject ] . })
}

So basically the delete is conditional, as before, but now the insert is
conditional on the delete having occurred. Easy when you know how :-)

Thanks for your help!

Rob


On Sat, Jun 15, 2013 at 2:02 PM, Andy Seaborne a...@apache.org wrote:

 PREFIX  dri:    <http://nationalarchives.gov.uk/terms/dri#>
 PREFIX  rdfs:   <http://www.w3.org/2000/01/rdf-schema#>
 PREFIX  status: <http://nationalarchives.gov.uk/dri/catalogue/transferAssetStatus#>
 PREFIX  dct:    <http://purl.org/dc/terms/>
 PREFIX  xsd:    <http://www.w3.org/2001/XMLSchema#>
 PREFIX  rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#>


 DELETE {
   ?transfer dri:transferAsset ?transferAsset .
 }
 INSERT {
   ?transfer dri:transferAsset _:b0 .
   _:b0 dct:subject ?subject .
   _:b0 dri:transferAssetStatus status:SENT .
   _:b0 dct:modified "2013-06-13T11:58:23.468Z"^^xsd:dateTime .
 }
 WHERE
   { ?transfer dct:identifier "201305241200"^^xsd:string .
     ?subject dct:identifier "dff82497-f161-4afd-8e38-f31a8b475b43"^^xsd:string

     OPTIONAL
       { ?transfer dri:transferAsset ?transferAsset .
         ?transferAsset dct:subject ?subject .
         ?transferAsset dct:modified ?transferAssetModified
         FILTER ( ?transferAssetModified < "2013-06-13T11:58:23.468Z"^^xsd:dateTime )
       }
   }



 Rob,

 (which version of the software?)

 The example data does not look like the DELETE was applied - there is
 still a dri:transferAsset link to the old state.  I would have expected the
 bnode still to be there but the triple connecting it should have gone.

 If so, then the OPTIONAL is not matching -- it sets ?transferAsset.

 In your example, the

 ?subject dct:identifier ...

 does not match either but an INSERT does seem to have happened.

 Could you delete all ?transferAsset completely?  The new state only
 depends on the new status if it's a legal state transition for the status.

 To cope with the fact that COMPLETED can come before SENDING, test the
 status.



 DELETE {
   ?transfer dri:transferAsset ?transferAsset .
   ?transferAsset ?p ?o .
 }
 INSERT {
   ?transfer dri:transferAsset _:b0 .
   _:b0 dct:subject ?subject .
   _:b0 dri:transferAssetStatus status:SENT .
   _:b0 dct:modified "2013-06-13T11:58:23.468Z"^^xsd:dateTime .
 }
 WHERE {
   ?transfer dct:identifier "201305241200"^^xsd:string ;
       dri:transferAssetStatus ?status ;
       dri:transferAsset ?transferAsset .
   FILTER (?status != status:COMPLETED)
   ?transferAsset ?p ?o .
 } ;


 SPARQL Updates can be several operations in one request.  It may be easier
 to have two operations

 DELETE { ... } WHERE { ... } ;
 INSERT { ... } WHERE { ... }

 Andy




-- 

Rob Walpole
Email robkwalp...@gmail.com
Tel. +44 (0)7969 869881
Skype: RobertWalpole
http://www.linkedin.com/in/robwalpole


Re: How to backup/restore a Jena TDB

2013-07-01 Thread Rob Walpole
You can use the tdbloader or tdbloader2 script (depending on your OS) which
is found in the /bin directory of your Apache Jena install.

The command you need is something like ./tdbloader2 -loc
/path-to-tdb-data-dir /path-to-rdf-file


On Mon, Jul 1, 2013 at 12:31 PM, Frederic Toublanc 
frederic.toubl...@telemis.com wrote:

 Hello everyone,

 I'm using this code to backup a Jena TDB :

 FileOutputStream fos = new FileOutputStream(new File(backupFilePath));
  TDBBackup.backup(new Location(tdbLocation), fos);

 This generate a backup.rdf file

 But i really dunno how to restore this backup in a new jena TDB.




-- 

Rob Walpole
Email robkwalp...@gmail.com
Tel. +44 (0)7969 869881
Skype: RobertWalpole
http://www.linkedin.com/in/robwalpole


Re: How to backup/restore a Jena TDB

2013-07-01 Thread Rob Walpole
Was the database empty before you attempted the restore? If there was old
data then this should be deleted or moved elsewhere (all .dat, .idn, .opt
and jrnl files). Also it is important that nothing in the JVM is attempting
to access the database while the load script is running - or this could
cause problems.

Are you accessing TDB via Fuseki? If so then this gives you your JVM
container and backups should be done using the management console - you
shouldn't do backups from another JVM (like using the Java API) otherwise
you could have problems.


On Mon, Jul 1, 2013 at 3:38 PM, Frederic Toublanc 
frederic.toubl...@telemis.com wrote:

 Ok i can restore the export rdf file. But is there a way to specify the
 tdbloader script to set the TDB as transactional ?

 Cause when i want to read the data after the restore i have a nodeReadOnly
 error. I'm using transaction to read and write so i guess when i restored
 it back the TDB wasn't set to support transactions.

 Any clue ?


 2013/7/1 Frederic Toublanc frederic.toubl...@telemis.com

  Ok i will try that thanks a lot.
 
 
  2013/7/1 Rob Walpole robkwalp...@gmail.com
 
  You can use the tdbloader or tdbloader2 script (depending on your OS)
  which
  is found in the /bin directory of your Apache Jena install.
 
  The command you need is something like ./tdbloader2 -loc
  /path-to-tdb-data-dir /path-to-rdf-file
 
 
  On Mon, Jul 1, 2013 at 12:31 PM, Frederic Toublanc 
  frederic.toubl...@telemis.com wrote:
 
   Hello everyone,
  
   I'm using this code to backup a Jena TDB :
  
   FileOutputStream fos = new FileOutputStream(new File(backupFilePath));
TDBBackup.backup(new Location(tdbLocation), fos);
  
   This generate a backup.rdf file
  
   But i really dunno how to restore this backup in a new jena TDB.
  
 
 
 
  --
 
  Rob Walpole
  Email robkwalp...@gmail.com
  Tel. +44 (0)7969 869881
  Skype: RobertWalpole
  http://www.linkedin.com/in/robwalpole
 
 
 




-- 

Rob Walpole
Email robkwalp...@gmail.com
Tel. +44 (0)7969 869881
Skype: RobertWalpole
http://www.linkedin.com/in/robwalpole


Re: Conditional INSERT statements

2013-06-14 Thread Rob Walpole
Hi Andy,

Thanks for your reply...


On Thu, Jun 13, 2013 at 11:26 PM, Andy Seaborne a...@apache.org wrote:

 Immediate comments:

 1/



 DELETE {
   ?transfer <http://nationalarchives.gov.uk/terms/dri#transferAsset> ?transferAsset .
 }
 INSERT {
   ?transfer <http://nationalarchives.gov.uk/terms/dri#transferAsset> _:b0 .
   various triples with _:b0 as subject


 so the DELETE leaves the structure from the bNode in place even if
 unlinked?


Ah, so are you saying we need to do something like:

DELETE
{
?transfer dri:transferAsset ?transferAsset .
?transferAsset ?property ?value .
}
INSERT
{
new transferAsset...
}
WHERE
{
?transfer dcterms:identifier "201305241200"^^xsd:string .
?subject dcterms:identifier "dff82497-f161-4afd-8e38-f31a8b475b43"^^xsd:string
OPTIONAL {
?transfer dri:transferAsset ?transferAsset .
?transferAsset dcterms:subject ?subject .
?transferAsset dcterms:modified ?transferAssetModified ;
?property ?value .
FILTER (?transferAssetModified <
"2013-06-13T11:58:23.468Z"^^xsd:dateTime)
}
}

Otherwise the bNode gets orphaned? I can see this may be wrong but I'm not
sure this would cause our problem which I believe is to do with the INSERT
not being conditional on the filter statement.


 2/ In the data: I see:

 .../transferAssetStatus#SENT

 but

 .../transferStatus#COMPLETED

 What's

 dri:transferAsset
 [ dri:transferAssetStatus ..
 vs

 dri:transferStatus


Yes, I've probably given you too much information there. The transfer item
also has a status (as well as its transferAssets) but this is unrelated to
the problem.

Thanks
Rob




Conditional INSERT statements

2013-06-13 Thread Rob Walpole
Hi there,

Hoping for some help with a SPARQL challenge I am facing...

We have an instance of Jena Fuseki to which we POST SPARQL update queries
to update the status of a thing called a transferAsset. A transferAsset is
actually a bNode which is referenced by a thing called a transfer. These
transferAssets have a subject, which is another resource, a status and a
modified date.

While there is a specific order in which status updates should be applied
(e.g SENDING, SENT, COMPLETED) there is no guarantee of the order in which
the updates will be received by Fuseki. In other words a COMPLETED status
may inadvertently be received before a SENDING status. To check that we
only apply the most recent status we check the modified date of the
transferAsset. If the new modified date is later than the existitng
modified date we delete the transferAsset at the same time as inserting a
new one.

The problem however is that although the deletion is conditional on there
being a transferAsset older than the new transferAsset, the insert is not,
and I am struggling to see how to make the insert conditional. This means
that if the updates are received out of order we end up inserting another
transferAsset with the same subject, i.e without deleting the old
transferAsset - not what we want.

The SPARQL code is shown below...

DELETE {
  ?transfer <http://nationalarchives.gov.uk/terms/dri#transferAsset> ?transferAsset .
}
INSERT {
  ?transfer <http://nationalarchives.gov.uk/terms/dri#transferAsset> _:b0 .
  _:b0 <http://purl.org/dc/terms/subject> ?subject .
  _:b0 <http://nationalarchives.gov.uk/terms/dri#transferAssetStatus>
       <http://nationalarchives.gov.uk/dri/catalogue/transferAssetStatus#SENT> .
  _:b0 <http://purl.org/dc/terms/modified> "2013-06-13T11:58:23.468Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> .
}
WHERE
  { ?transfer <http://purl.org/dc/terms/identifier> "201305241200"^^<http://www.w3.org/2001/XMLSchema#string> .
    ?subject <http://purl.org/dc/terms/identifier> "dff82497-f161-4afd-8e38-f31a8b475b43"^^<http://www.w3.org/2001/XMLSchema#string>
    OPTIONAL
      { ?transfer <http://nationalarchives.gov.uk/terms/dri#transferAsset> ?transferAsset .
        ?transferAsset <http://purl.org/dc/terms/subject> ?subject .
        ?transferAsset <http://purl.org/dc/terms/modified> ?transferAssetModified
        FILTER ( ?transferAssetModified < "2013-06-13T11:58:23.468Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> )
      }
  }

And a sample transfer showing the duplication problem...

<http://nationalarchives.gov.uk/dri/catalogue/transfer/201305241200>
  rdf:type  dri:Transfer ;
  dri:transferAsset
    [ dri:transferAssetStatus
        <http://nationalarchives.gov.uk/dri/catalogue/transferAssetStatus#SENDING> ;
      dcterms:modified  "2013-06-13T11:58:23.463Z"^^xsd:dateTime ;
      dcterms:subject  <http://nationalarchives.gov.uk/dri/catalogue/item/dff82497-f161-4afd-8e38-f31a8b475b43>
    ] ;
  dri:transferAsset
    [ dri:transferAssetStatus
        <http://nationalarchives.gov.uk/dri/catalogue/transferAssetStatus#SENT> ;
      dcterms:modified  "2013-06-13T11:58:23.468Z"^^xsd:dateTime ;
      dcterms:subject  <http://nationalarchives.gov.uk/dri/catalogue/item/dff82497-f161-4afd-8e38-f31a8b475b43>
    ] ;
  dri:transferStatus  <http://nationalarchives.gov.uk/dri/catalogue/transferStatus#COMPLETED> ;
  dcterms:identifier  "201305241200"^^xsd:string ;
  dcterms:modified  "2013-06-13T11:58:27.999Z"^^xsd:dateTime .

Many thanks

-- 

Rob Walpole
Email robkwalp...@gmail.com
Tel. +44 (0)7969 869881
Skype: RobertWalpole
http://www.linkedin.com/in/robwalpole


Re: Populate an owl ontology with mysql database

2013-05-19 Thread Rob Walpole
David,

I would suggest using something like D2RQ to dump your data as RDF. You may
be able to get the mapping you want from this but if not, load it into TDB
and run some SPARQL transforms (constructs) to get it how you want it.

Rob
On May 19, 2013 11:58 AM, David De La Peña da...@delapena.eu wrote:

 Hello,
 could I use Jena to populate individuals from a mysql db to an owl
 ontology?

 Thank you,

 --
 David DE LA PEÑA



Re: Binding causes hang in Fuseki

2013-02-01 Thread Rob Walpole
Hi Andy,

On Fri, Feb 1, 2013 at 12:16 PM, Andy Seaborne a...@apache.org wrote:

 Rob - I notice you use rdfs:member which is a calculated property in ARQ.

 There is a bug in ARQ (JENA-340) which means this isn't handled as a
 calculated property inside a FILTER.


 You have:

 FILTER EXISTS { ?export rdfs:member ?ancestor } .


On its own this doesn't seem to cause us any problems... perhaps because
we are specifically inserting these triples into TDB rather than relying on
them being calculated?

Thanks
Rob


Re: Binding causes hang in Fuseki

2013-01-29 Thread Rob Walpole
Cool, thanks guys, will give this a try tomorrow :-)

Rob


On Tue, Jan 29, 2013 at 7:36 PM, Andy Seaborne a...@apache.org wrote:

 On 29/01/13 18:21, Alexander Dutton wrote:



 Hi Rob,

 On 29/01/13 18:11, Rob Walpole wrote:

 Am I doing something wrong here?


 The short answer is that the inner SELECT is evaluated first, leading to
 the results being calculated in the second case in a rather inefficient
 way.

 In the first inner SELECT ?deselected is bound, so it's quite quick to
 find all its ancestors.

 In the second, all possible ?deselected and ?ancestor pairs are returned
 by the inner query, which are then (effectively) filtered to remove all
 the pairs where ?deselected isn't whatever it was BINDed to.

 Here's more from the spec:
 http://www.w3.org/TR/sparql11-query/#subqueries

 I /think/ ARQ is able to perform some optimisations along these lines,
 but obviously not for your query.


 Spot on.

 If you remove the inner SELECT it should do better.



   { BIND(... AS ?readyStatus)
     BIND(... AS ?deselected)
     ?export rdfs:member ?member .
     ?export dri:username "rwalpole"^^xsd:string .
     ?export dri:exportStatus ?readyStatus
     OPTIONAL
       { ?deselected (dri:parent)+ ?ancestor
         FILTER EXISTS { ?export rdfs:member ?ancestor }
       }
   }

 but technically this is a different query so it'll depend on your data as
 to whether it is right.

 http://www.sparql.org/query-validator.html

 Andy



 Best regards,

 Alex

 PS. You don't need to do URI("http://..."); you can do a straight IRI
 literal: <http://...>

 --
 Alexander Dutton
 Developer, Office of the CIO; data.ox.ac.uk, OxPoints
 IT Services, University of Oxford





-- 

Rob Walpole
Email robkwalp...@gmail.com
Tel. +44 (0)7969 869881
Skype: RobertWalpole
http://www.linkedin.com/in/robwalpole