Re: skolemize a graph when loading

2024-06-30 Thread Paul Tyson



On 6/30/24 06:27, Andy Seaborne wrote:

Hi Paul,

You want this to make bnodes in the Fuseki server addressable?
Yes, exactly. Not formally skolemization, more of a data quality issue. 
While working out some graph processing use cases, I discovered that 
addressable nodes are helpful, if not necessary. It's easy enough to get 
a list of nodes by type, but if you want to go back and get further 
details of each node, it's problematic without URIs. The node properties 
are variegated, so it's cumbersome to construct a sparql query with 
enough OPTIONAL clauses to get all the info you want on the first query 
for node types.
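For example, the two-step access I have in mind is roughly the following
(the names here are just illustrative):

SELECT ?node WHERE { ?node rdf:type ex:Widget }

# then, for each result -- only possible when ?node is bound to a URI
# rather than a blank node:
DESCRIBE <http://example.org/widget/1>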


I agree it would be nice for something in riot to do this.
As a convenience, yes. If misused, it could surprise users by modifying 
stored data in ways they didn't expect. In my case, it would just be 
protection against defective inputs.


What skolemization naming schemes are popular?

I don't know. I was thinking of something like '.../.well-known/genid/...'



It's possible to convert using text processing on N-Triples. Regex for 
(^| )_: and replace with a URI in whatever scheme you want.


Maybe 
Yes, if it turns out to be a persistent problem with data input, I could 
do something like that, or Martynas's Skolemizer.
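For example, a rough sketch of that text-processing approach in plain Java 
(untested; the genid base URI and file names are just placeholders, and the 
regex is naive -- it would also rewrite a matching sequence inside a literal):

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class SkolemizeNT {
    public static void main(String[] args) throws Exception {
        String base = "http://example.org/.well-known/genid/";
        List<String> out = new ArrayList<>();
        for (String line : Files.readAllLines(Paths.get("data.nt"))) {
            // replace each "_:label" (at line start or after a space) with a URI
            out.add(line.replaceAll("(^| )_:(\\S+)", "$1<" + base + "$2>"));
        }
        Files.write(Paths.get("data-skolem.nt"), out);
    }
}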



Jena also has a non-standard extension:

URI(bnode()) => <_:44f86b24-8556-44e0-89eb-27e3bf21c66e>

"_:" is illegal as a URI scheme and also as a relative URI so thjat's 
illegal as a URI, strictly but Jena allows it.


It can be in SPARQL results and queries.

When a Jena parser or result set reader sees "_:" it uses the rest as 
the bnode id.


Ah, that's what I was missing. I did try using the bnode id in a query 
as a curie but it was illegal. I will try as a uriref. That would be the 
shortest of short-term fixes for my immediate problem.
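For example (untested), something along these lines, reusing the bnode 
label from Andy's example above:

SELECT ?p ?o
WHERE { <_:44f86b24-8556-44e0-89eb-27e3bf21c66e> ?p ?o }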


Thank you,
--Paul



Unlike skolemization, the data and results really do have a bnode in 
them, so the data itself is unchanged.


    Andy

Related but different: https://github.com/apache/jena/issues/2549
"More granular control over Blank node serialization"

On 6/27/24 22:57, Martynas Jusevičius wrote:

LinkedDataHub has a class that does that:
https://github.com/AtomGraph/LinkedDataHub/blob/develop/src/main/java/com/atomgraph/linkeddatahub/server/util/Skolemizer.java 



On Thu, Jun 27, 2024 at 11:55 PM Paul Tyson  
wrote:


I searched the source and docs but didn't turn up any easy way to
skolemize an input graph before loading to fuseki. The skolemization
capabilities appear to be related to the reasoners, which I don't need.

I expect there's a good reason for this missing feature, but wanted to
ask if I've missed something. I probably don't need it badly enough to
build something for it. It would be better if the dataset came to me
with the necessary node ids, but am considering if it's worth 
putting in

a fixup routine.

Thanks
--Paul



skolemize a graph when loading

2024-06-27 Thread Paul Tyson
I searched the source and docs but didn't turn up any easy way to 
skolemize an input graph before loading to fuseki. The skolemization 
capabilities appear to be related to the reasoners, which I don't need.


I expect there's a good reason for this missing feature, but wanted to 
ask if I've missed something. I probably don't need it badly enough to 
build something for it. It would be better if the dataset came to me 
with the necessary node ids, but am considering if it's worth putting in 
a fixup routine.


Thanks
--Paul



Re: SHACL

2024-04-06 Thread Paul Tyson

This is old, but might be useful. It has a chapter on SHACL.

https://book.validatingrdf.com/

Regards,
--Paul

On 4/5/24 04:58, Hashim Khan wrote:

Hi,
  I am interested in working with SHACL shapes for validation. I would
like to know if someone  points out a nice resource.

Best,



Re: Database Migrations in Fuseki

2024-02-10 Thread Paul Tyson
I don't know if my experience is helpful, but I went a different way to 
solve these sorts of problems.


I avoid adding business logic while generating the RDF from the source 
data systems. It is almost entirely a simple transliteration from one 
format to another (I use an R2RML mapping). The purpose is to combine 
data about the same subject from different source systems.


Separately, I wrote rules that check for several different desired 
constraints on the data. I think this corresponds to your "application 
logic". The source rules are written in RIF, then translated to HTML for 
display and SPARQL for execution.


Using the rules (in SPARQL format) I materialize additional triples and 
add them to the datastore.
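As a trivial illustration of the shape of such a materialization rule 
(the property names here are made up, not from my actual rule set):

INSERT { ?order ex:flagged true }
WHERE  {
  ?order ex:status "shipped" .
  FILTER NOT EXISTS { ?order ex:trackingNumber ?tn }
}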


This approach decouples data from business logic and lets each evolve 
separately. When the source data schema changes, I change the R2RML 
mapping. When the business logic changes, I change the RIF rules.


The logic (rules) may be applied in any of several different ways to 
suit your needs (for example, using a different rules engine, or ShEx or 
SHACL, or generating results on the fly instead of materializing them 
back into the datastore).


Regards,
--Paul

On 2/9/24 08:17, balduin.landolt@dasch.swiss.INVALID wrote:

Hi Andy,


If I understand correctly, this is a schema change requiring the data to change.

Correct, but we don't enforce any schema on the database level (no SHACL 
involved), that's only done programmatically in the application.


The transformation of the data to the updated data model could be done offline, 
that would reduce downtime. If the data is being continuously updated, that's 
harder because the offline copy will get out of step with the live data.
How often does the data change (not due to application logic changes)?

The data might theoretically change constantly, so just doing it offline on a 
copy isn't really possible.
A compromise I've been thinking about, which would still be better than 
downtime, would be a read-only mode for the duration of the migration. But for 
now, the application doesn't support something like this yet.
(And if we can get the downtime to something reasonable, that would be good 
enough.)


Do you have a concrete example of such a change?

These changes can vary from very simple to very complex:
- The simplest case would maybe be that a certain property that used to be 
optional on a certain type of resource becomes mandatory; for all instances 
where this is not present, a default value needs to be supplied.
   => this we could easily do with a SPARQL update.
- The most complex case I encountered so far was roughly this:
   Given in graph A (representing the data model for a subset of the data), a 
particular statement on something of type P (defining some kind of property) is 
present, and in graph B (the subset of data corresponding to the model in A), a 
certain statement holds true for all V (which have a reference to P), then P 
should be modified. If the statement does not hold true for all V, then each V 
where it does not, must be modified to become a more complex object.
   (More concretely: V represents a text value. If P says that V may contain 
markup, then check if any V contains markup. If not, change P to say that it 
does not contain markup;  if any V contains markup, then all Vs that represent 
text without markup need to be changed to contain text with markup. Text 
without markup here represents a bit of reification around a string literal; 
text with markup follows a sophisticated standoff markup model, and even if no 
markup is present, it needs to contain information on nature of the markup that 
is used.)
   => this is something I would not know how to, or feel comfortable attempting 
in SPARQL, so it needs to happen in code.

Long story short: Some simple changes I could easily do in SPARQL; the more 
complex ones would require me to be able to do the changes in code, but it 
might be possible to have it set up in such a way that the code essentially has 
read access to the data and can generate update queries from that.

Our previous setup worked like this (durations not measured, just from 
experience):
On application start, if a migration needs doing, it won't start right away, 
but kick that process off:
- download an entire dump of fuseki to a file on disk (ca. 20 min.)
- load the dump into an in-memory Jena model (10 min. -> plus huge memory 
consumption that will always grow proportional to our data growing)
- perform the migration on the in-memory model (1 sec. - 1 min.)
- dump the model to a file on disk
- drop all graphs from fuseki (20 min.)
- upload the dump into fuseki (20 min.)
Then the application would start... so at least 1h downtime, clearly room for 
improvement.
The good thing about this approach is that if the migration fails, the data 
would not be corrupted because the data loaded in fuseki is not affected.

My best bet at this point is to say, we take the risk of data 

sparql query performance jena v2, 3, 4

2023-02-27 Thread Paul Tyson
I maintain an old jena/fuseki application that has happily been using 
jena v2.13 and tdb v1.1.2 for several years. It loads 1b+ triples into a 
tdb database, and runs a couple dozen queries, some not so trivial, on 
the tdb.


Now it is time to update things. I first went to 3.17, to stay on java8. 
Many of the queries work fine, but a few have abysmal performance. A 
query that took maybe 10 minutes with v2.13 now runs for hours without 
finishing.


I am now trying v4.7 with java11. Testing is still in progress, but it 
doesn't look promising.


The troublesome queries have several FILTER EXISTS and FILTER NOT EXISTS 
clauses, some of which have UNION patterns. It is rather complicated, 
but also a fairly literal translation of the applicable business rules. 
I took a closer look at them, and adjusted the order of patterns to put 
the more-specific ones earlier, but that didn't help. I discovered that 
eliminating the UNION alternatives would let the query return some 
results, but obviously not what is wanted.


Did anything in particular change in the query processing since v2 that 
would cause this performance degradation?


Should I expect any difference in tdb vs tdb2? I've tried both, and 
neither gives satisfaction.


Thanks in advance,
--Paul




jena version java compatibility

2022-08-15 Thread Paul Tyson
Apologies for not paying close enough attention to the release history, 
but I need to find the latest jena version that runs in java 1.8. I know 
support for java8 was dropped somewhere in the 4.x series, but can't 
find where. My application must remain at java8 for the time being.


Thanks in advance,

--Paul




Re: where is the RDF shapes action?

2022-07-02 Thread Paul Tyson

Hi Florian,

Long ago I learned a bit about SPIN, which became SHACL. I thought back 
then it had quite a lot of potential to aid in RDF processing, but I 
didn't have any use cases for it at the time. More below.


On 6/27/22 14:17, Florian Kleedorfer wrote:

Am 2022-06-27 18:51, schrieb Paul Tyson:

Can anyone point to websites, mailing lists, or other forum where
users are discussing questions, use cases, and solutions involving RDF
shapes (ShEx or SHACL)?


Don't know ShEX, but having worked with SHACL quite a bit from the 
user perspective, I don't know of much beyond Stack Overflow and the 
occasional blog post, the latter usually explaining basics.
It's possible that this is due to the fact that there is no 
specification of a standard SHACL runtime environment, REPL, or 
anything like it, so each system that supports it wraps it in its own 
specific API and what you can do in one environment is not the same 
that you can do in another.


Anyway, can I ask what you have in mind when you say,


I also want to generate graphs from a shape and variable bindings.


Sounds like you want to use SHACL the wrong way around, so to speak: 
The standard use case is to define constraints that the data in a 
given graph must satisfy. You seem to want to express a pattern for 
generating a graph, the pattern is expressed in SHACL shapes, and the 
actual data to be generated is provided as variable bindings. Is that 
correct? I ask because I have been working on getting SHACL to work in 
a different, but equally 'wrong' way. It is possible that your use 
case is also covered by some version of the solution I am trying to 
come up with. Out of the box, however, that's not how SHACL works.


Yes, my use cases go beyond validation. I found the SHACL Advanced 
Features (https://www.w3.org/TR/shacl-af/) and SHACL-JavaScript 
(https://www.w3.org/TR/shacl-js/) working group notes that seem to 
address my use cases, by adding rules and expressions that can 
effectively transform one graph to another. I must study these further, 
and will look at your Jena contributions also later when I return.


Best,

--Paul



best regards,
Florian


Re: where is the RDF shapes action?

2022-07-02 Thread Paul Tyson

Andy, thanks for the links.

I'm really disappointed the active shacl community is on discord. I'll 
have to take some time to come to terms with their terms of service and 
privacy policy.


What would really be nice is if someone (topbraid maybe?) would sponsor 
a neutral community list for rdf-shapes that could become something like 
the terrific xsl-list that mulberry has sponsored for decades: 
https://mulberrytech.com/xsl/xsl-list/index.html


I don't know about gitter. But in general chat boards just don't seem to 
work as well for open, thoughtful discussion about complex topics.


I take it not many of the shex and shacl people would be interested in 
returning to the original w3c lists on those subjects?


I'll be away for a couple of weeks, but hope to pick up discussion 
towards the end of July.


Best regards,

--Paul

On 6/27/22 12:07, Andy Seaborne wrote:

Communities:

SHACL :
https://lists.w3.org/Archives/Public/public-shacl/2021Nov/0007.html
  Discord invite: https://discord.gg/RTbGfJqdKB

ShEx: https://gitter.im/shapeExpressions/Lobby



SHACL reports say what validation found; Jena has similar reports 
for ShEx (ShEx does not define so much in this area).


Also Jena feature emerging:
If you could look at PR 1256 and see if that can be used for your 
"informative" validation if not covered by validation reports.


https://github.com/apache/jena/pull/1256  (Florian Kleedorfer)

    Andy

On 27/06/2022 17:51, Paul Tyson wrote:
Can anyone point to websites, mailing lists, or other forum where 
users are discussing questions, use cases, and solutions involving 
RDF shapes (ShEx or SHACL)? There is plenty of technical information 
about both languages, but not so much real-world practical discussion.


I recently started exploring the shex and shacl features of jena, 
using the command-line interface to begin with. I have a basic 
conceptual understanding of RDF shapes, but have never implemented 
any applications with them, so am just learning the technical details.


My use cases go beyond simple validation. I need "informative" 
validation that will report not only conformance, but specific 
defects. I also want to generate graphs from a shape and variable 
bindings.


Thanks in advance for advice and pointers.

Best,

--Paul




Re: where is the RDF shapes action?

2022-06-27 Thread Paul Tyson
I should have mentioned, my question comes after finding low to nonexistent 
traffic on the W3C shacl and shex mailing lists, and not many stackoverflow posts 
tagged “shex” or “shacl”. Where else should I look? Or are those the right 
places to start user discussions on these topics?

> On Jun 27, 2022, at 11:52, Paul Tyson  wrote:
> 
> Can anyone point to websites, mailing lists, or other forum where users are 
> discussing questions, use cases, and solutions involving RDF shapes (ShEx or 
> SHACL)? There is plenty of technical information about both languages, but 
> not so much real-world practical discussion.
> 
> I recently started exploring the shex and shacl features of jena, using the 
> command-line interface to begin with. I have a basic conceptual understanding 
> of RDF shapes, but have never implemented any applications with them, so am 
> just learning the technical details.
> 
> My use cases go beyond simple validation. I need "informative" validation 
> that will report not only conformance, but specific defects. I also want to 
> generate graphs from a shape and variable bindings.
> 
> Thanks in advance for advice and pointers.
> 
> Best,
> 
> --Paul
> 
> 



where is the RDF shapes action?

2022-06-27 Thread Paul Tyson
Can anyone point to websites, mailing lists, or other forum where users 
are discussing questions, use cases, and solutions involving RDF shapes 
(ShEx or SHACL)? There is plenty of technical information about both 
languages, but not so much real-world practical discussion.


I recently started exploring the shex and shacl features of jena, using 
the command-line interface to begin with. I have a basic conceptual 
understanding of RDF shapes, but have never implemented any applications 
with them, so am just learning the technical details.


My use cases go beyond simple validation. I need "informative" 
validation that will report not only conformance, but specific defects. 
I also want to generate graphs from a shape and variable bindings.


Thanks in advance for advice and pointers.

Best,

--Paul




Re: JSON-LD: 1.0 or 1.1

2022-04-23 Thread Paul Tyson



> On Apr 23, 2022, at 12:16, Andy Seaborne  wrote:
> 
> What should the default settings be JSON-LD 1.0 or 1.1?
> 
1.1 would better meet my use cases.

Thanks, 
—Paul



Re: Java APIs

2022-02-06 Thread Paul Tyson

Hans-Jürgen, that sounds like an interesting and useful project. Can you post a 
website where we can get more information and follow the project?

Thanks,
--Paul

On 2/5/22 18:34, Hans-Juergen Rennau wrote:

Hello,
I am interested in the integration of SPARQL into XQuery. The essential 
requirement is a Java API for (a) parsing RDF resources into triples, (b) 
evaluating SPARQL queries. Would Jena be a candidate to offer these? Would the 
integration of Jena components into an OpenSource project under the 3-clause 
BSD License be possible?
With kind regards,Hans-Jürgen
PS: The approach envisaged is completely lightweight and very different from 
https://www.w3.org/Submission/xsparql-language-specification/


Re: Ontology

2021-08-19 Thread Paul Tyson
Yes, off-topic, better forums would be ontolog-forum 
(http://ontologforum.org/info/) and semantic-...@w3.org.


But, briefly: I have yet to see a good use case that would justify the 
expense and trouble of making a formal ontology. The most you will 
probably ever need is an RDFS schema, and that mainly for your own 
understanding, communication, and documentation.


Regards.

--Paul

On 8/19/21 2:30 PM, Matt Whitby wrote:

This will be off-topic but I believe you're the best group of people to ask
this to. We need to look at our various datasets and start to put together
an ontology. Does anyone have any suggestions on books to read, videos to
watch, or any general advice or pitfalls to avoid?

Thanks,
Matt.



Re: Scalability

2021-07-23 Thread Paul Tyson



On 7/23/21 12:32 PM, Matt Whitby wrote:

A little bit of a vague question, and perhaps a silly one.

How well does Jena scale?  Would it tap out after a given number of triples?


There are way too many variables to give a simple answer. I curate a 
dataset of 1 billion triples that is refreshed nightly, using a very old 
version of jena. I assume the performance has not regressed in the past 
6 years. The dataset supports a web application driven by fairly complex 
sparql queries, in what might be called a rudimentary "linked data" 
paradigm.


Regards,

--Paul



Do people sometimes split very large datasets over different instances and
just query across the different servers?

Thanks all.




Re: [Apache Fuseki] Limits of Apache Fuseki Triple Store

2021-01-25 Thread Paul Tyson
Regarding first question: I maintain a dataset of 1 billion triples in a very 
old fuseki tdb version.

I would like to know if anyone’s working in the trillion-triple range, perhaps 
with RDF/HDT.

Regards,
—Paul

> On Jan 25, 2021, at 01:02, Marco Franke  wrote:
> 
> 
> Dear developers,
>  
> we use the Apache Fuseki Triple Store to store sensory data. Up to now, it 
> works pretty well.
>  
> The challenge is that we have to import a lot of data in the upcoming month.
> At the moment, we have stored already ca. 500 000 triples and everything is 
> fine.
>  
> My question is whether there is a limitation of the amount of triples to be 
> stored and queried?
>  
> My second question is how to download the data again.
> I know there is the …./get endpoint for each endpoint. Does this endpoint 
> works for huge datasets, too?
>  
> Thanks in Advance,
>  
> Kind regards,
>  
> Marco Franke
>  
>  
> 
> i.A. Dipl. -Inf. Marco Franke
> Wissenschaftlicher Mitarbeiter in 2.2
>  
> BIBA - Bremer Institut für Produktion und Logistik GmbH
>  
> Informations- und kommunikationstechnische Anwendungen in der Produktion
> Prof. Dr.-Ing. Klaus-Dieter Thoben
>  
> Raum 1490
> Tel.: +49 (0)421 218-50089
> Fax: +49 (0)421 218-50007
> f...@biba.uni-bremen.de
>  
>  
>  
> BIBA – Bremer Institut für Produktion und Logistik GmbH
> Postanschrift: Postfach P.O.B. 33 05 60 · 28335 Bremen / Germany
> Geschäftssitz: Hochschulring 20 · 28359 Bremen / Germany
> USt-ID: DE814890109 · Amtsgericht Bremen HRB 24505 HB
> Tel: +49 (0)421/218-5 · Fax: +49 (0)421/218-50031
> E-Mail: i...@biba.uni-bremen.de · Internet: www.biba.uni-bremen.de
> Geschäftsführer: Prof. Dr.-Ing. M. Freitag, Prof. Dr.-Ing. K.-D. Thoben, O. 
> Simon
>  
>  


Re: Visualising Jena model saved in RDF/XML in the browser

2020-03-09 Thread Paul Tyson
Not to go too deeply into this on jena-users list, but ...

I don't doubt that RDF/XML+XSLT is a useful approach in some situations,
to meet some information presentation needs.

My experience was in Product Lifecycle Management (PLM), mainly dealing
with part master and product structure data, but extending into shop
floor (planning), configuration control, and supply chain aspects. There
are many different, useful, presentations that can be made from this
sort of dataset. Some presentations are easily supported by csv or other
record-oriented extracts from RDF dataset. Others require on-the-fly
conversion of subgraph extracts. I came to realize that both types of
presentations (for web browser delivery) could be handled by relatively
light-weight client-side conversions using d3.js. The only drawback is
that too much business- and schema-specific selection and transformation
logic had to be written in javascript. I would have preferred to use a
standard declarative language (like XSLT, only for general graphs) to
encode these rules.

I'm pretty sure my experience in PLM applies in other domains. The
abstract problem is the same: selecting things of interest from an RDF
dataset, and applying transformations to display those things with
desired presentational semantics (which includes behavioral dimensions).

Regards,
--Paul

On Mon, 2020-03-09 at 08:31 +0100, Martynas Jusevičius wrote:
> Paul,
> 
> as I've shown, XSLT is perfectly usable. Admittedly easier on a flat,
> non-nested output.
> RDF/XML is significant as a bridge format to the XML stack.
> 
> Do you have an example of "useful schema-specific information presentations"?
> 
> 
> Martynas
> 
> On Mon, Mar 9, 2020 at 3:56 AM Paul Tyson  wrote:
> >
> > On Sun, 2020-03-08 at 12:06 +0530, Diptendu Dutta wrote:
> > > I have used Jena to generate RDF/XML of the model:
> > >
> > >  > > xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#;
> > > xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#;>
> > >   
> > > 
> > >   TheConsignor
> > > 
> > >  > > rdf:resource="http://www.lke.com/lke.owl#shall%20deliver
> > > "/>
> > > 
> > >   theConsignedUnits
> > > 
> > >   
> > >   
> > > 
> > >   rulesofinterpretation
> > > 
> > > http://www.lke.com/lke.owl#apply%20in"/>
> > > 
> > >   thisConsignmentAgreement
> > > 
> > >   
> > > .
> > > .
> > > 
> > >
> > > Do I need to use some other format apart from RDF/XML so that
> > > the output my be be visualised in a browser?
> > >
> > > I looked at "cytoscape.js", Owl2Vowl, and some others. They all require
> > > the data to be in some specific format.
> > >
> > > Are there libraries available for  transforming the RDF/XML to a format
> > > suitable for display in browser?
> > >
> > > Which approach would you suggest?
> >
> > I have not worked in this area since 2015, but it remains the biggest
> > problem (and opportunity) in semantic technology.
> >
> > Must you work with RDF/XML format? Jena provides many other options,
> > including csv, json-ld, and sparql results format (srx). These are all
> > easier to transform for presentation than RDF/XML. Will you be
> > transforming the RDF on the server or in the browser?
> >
> > Next, do you want to create a static information display, or dynamic
> > display that responds to user input? Do you want a primarily text-based
> > layout (paragraphs, lists, tables), or graphical (boxes or bubbles and
> > lines)?
> >
> > The approach I found most promising was to use d3.js [1] in the browser.
> > This allows you to create either text or graphical layouts, static or
> > dynamic. You can make ajax sparql queries, either CONSTRUCT or SELECT,
> > and transform the results to HTML5 (including SVG and canvas). What is
> > lacking, however, is a standard way to select and transform significant
> > graph patterns, analogous to XSLT templates. Perhaps the graph shape
> > languages, SHACL and ShEx, could be of some help in this area, but I
> > have not kept up in the last 5 years.
> >
> > One of the big disappointments of semantic technologies is that we
> > haven't gotten far past the "connected bubbles" visualization of RDF
> > graphs. That is easy to do and most unhelpful. It is much harder to
> > create useful schema-specific information presentations from semantic
> > data that meet the real needs of the information consumers.
> >
> > Regards,
> > --Paul
> >
> > [1] https://d3js.org/
> >
> > >
> > > Regards,
> > >
> > > Diptendu Dutta
> > >
> > > Regards,
> > >
> > > Diptendu Dutta
> >
> >




Re: Visualising Jena model saved in RDF/XML in the browser

2020-03-08 Thread Paul Tyson
On Sun, 2020-03-08 at 12:06 +0530, Diptendu Dutta wrote:
> I have used Jena to generate RDF/XML of the model:
> 
>  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#;
> xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#;>
>   
> 
>   TheConsignor
> 
> http://www.lke.com/lke.owl#shall%20deliver
> "/>
> 
>   theConsignedUnits
> 
>   
>   
> 
>   rulesofinterpretation
> 
> http://www.lke.com/lke.owl#apply%20in"/>
> 
>   thisConsignmentAgreement
> 
>   
> .
> .
> 
> 
> Do I need to use some other format apart from RDF/XML so that
> the output my be be visualised in a browser?
> 
> I looked at "cytoscape.js", Owl2Vowl, and some others. They all require
> the data to be in some specific format.
> 
> Are there libraries available for  transforming the RDF/XML to a format
> suitable for display in browser?
> 
> Which approach would you suggest?

I have not worked in this area since 2015, but it remains the biggest
problem (and opportunity) in semantic technology.

Must you work with RDF/XML format? Jena provides many other options,
including csv, json-ld, and sparql results format (srx). These are all
easier to transform for presentation than RDF/XML. Will you be
transforming the RDF on the server or in the browser?

Next, do you want to create a static information display, or dynamic
display that responds to user input? Do you want a primarily text-based
layout (paragraphs, lists, tables), or graphical (boxes or bubbles and
lines)?

The approach I found most promising was to use d3.js [1] in the browser.
This allows you to create either text or graphical layouts, static or
dynamic. You can make ajax sparql queries, either CONSTRUCT or SELECT,
and transform the results to HTML5 (including SVG and canvas). What is
lacking, however, is a standard way to select and transform significant
graph patterns, analogous to XSLT templates. Perhaps the graph shape
languages, SHACL and ShEx, could be of some help in this area, but I
have not kept up in the last 5 years.

One of the big disappointments of semantic technologies is that we
haven't gotten far past the "connected bubbles" visualization of RDF
graphs. That is easy to do and most unhelpful. It is much harder to
create useful schema-specific information presentations from semantic
data that meet the real needs of the information consumers.

Regards,
--Paul

[1] https://d3js.org/

> 
> Regards,
> 
> Diptendu Dutta
> 
> Regards,
> 
> Diptendu Dutta




Re: programmatically construct UpdateDeleteInsert

2019-12-05 Thread Paul Tyson
Never mind. I missed:

QuadAcc deletes = upd.getDeleteAcc();
deletes.addTriple(/* triple pattern to delete */);
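
For anyone finding this later, here is a fuller sketch of how the pieces 
might fit together (untested; the ex: prefix URI is illustrative):

import org.apache.jena.graph.Triple;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.sparql.core.Var;
import org.apache.jena.sparql.modify.request.QuadAcc;
import org.apache.jena.sparql.modify.request.UpdateModify;
import org.apache.jena.update.UpdateFactory;
import org.apache.jena.update.UpdateRequest;

public class BuildDeleteUpdate {
    public static void main(String[] args) {
        UpdateModify upd = new UpdateModify();
        upd.setHasDeleteClause(true);
        upd.setHasInsertClause(false);

        // DELETE template: ?r ?p ?o . ?s ?p1 ?r .
        QuadAcc deletes = upd.getDeleteAcc();
        deletes.addTriple(Triple.create(Var.alloc("r"), Var.alloc("p"), Var.alloc("o")));
        deletes.addTriple(Triple.create(Var.alloc("s"), Var.alloc("p1"), Var.alloc("r")));

        // WHERE block: reuse the group pattern of a parsed SELECT query
        String where = "PREFIX ex: <http://example.org/ns#> "
                + "SELECT * WHERE { ?r ex:foo \"123\"; ex:bar \"456\"; ?p ?o. "
                + "  OPTIONAL { ?s ?p1 ?r } }";
        upd.setElement(QueryFactory.create(where).getQueryPattern());

        UpdateRequest req = UpdateFactory.create();
        req.add(upd);
        System.out.println(req);   // prints the assembled SPARQL update
        // rdfConnection.update(req);
    }
}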

Regards,
--Paul


On Thu, 2019-12-05 at 09:12 -0600, Paul Tyson wrote:
> I'm trying to construct a SPARQL update like:
> 
> DELETE {?r ?p ?o. ?s ?p1 ?r.}
> WHERE {?r ex:foo "123"; ex:bar "456"; ?p ?o.
>   OPTIONAL {?s ?p1 ?r}
> }
> 
> In other words, delete all triples with subject or object resource that
> has certain ex:foo and ex:bar values.
> 
> I can't see how to set or modify the DELETE pattern in an UpdateModify
> (or UpdateDeleteInsert) object. I'm guessing a visitor must be used,
> but I can't put the pieces together. Can someone point out the right
> pattern?
> 
> UpdateModify upd = new UpdateModify();
> upd.setHasDeleteClause(true);
> upd.setHasInsertClause(false);
> upd.setElement(/* set the where block */);
> upd.visit(new UpdateVisitorBase() {
>   @Override
>   public void visit(UpdateModify um) {
> /* somehow add the DELETE triples??? */
>   }
>  });
> 
> I intend to use the built-up request in an UpdateRequest to be
> submitted  by an RDFConnection, like:
> 
> UpdateRequest req = UpdateFactory.create().add(upd);
> rdfConnection.update(req);
> 
> I'm using jena libs 3.13.1.
> 
> Thanks and regards,
> --Paul
> 
> 




programmatically construct UpdateDeleteInsert

2019-12-05 Thread Paul Tyson
I'm trying to construct a SPARQL update like:

DELETE {?r ?p ?o. ?s ?p1 ?r.}
WHERE {?r ex:foo "123"; ex:bar "456"; ?p ?o.
  OPTIONAL {?s ?p1 ?r}
}

In other words, delete all triples with subject or object resource that
has certain ex:foo and ex:bar values.

I can't see how to set or modify the DELETE pattern in an UpdateModify
(or UpdateDeleteInsert) object. I'm guessing a visitor must be used,
but I can't put the pieces together. Can someone point out the right
pattern?

UpdateModify upd = new UpdateModify();
upd.setHasDeleteClause(true);
upd.setHasInsertClause(false);
upd.setElement(/* set the where block */);
upd.visit(new UpdateVisitorBase() {
  @Override
  public void visit(UpdateModify um) {
/* somehow add the DELETE triples??? */
  }
 });

I intend to use the built-up request in an UpdateRequest to be
submitted  by an RDFConnection, like:

UpdateRequest req = UpdateFactory.create().add(upd);
rdfConnection.update(req);

I'm using jena libs 3.13.1.

Thanks and regards,
--Paul




Re: Delete all nested triples

2019-02-20 Thread Paul Tyson
On Wed, 2019-02-20 at 17:27 -0700, ganesh chandra wrote:
> Hello All,
> My data looks something like this:
>  a something:Entity ;
> something:privateData [ a something:PrivateData ;
> something:jsonContent "{\"fileType\": \”jp\"}"^^xsd:string ;
>   something:modeData [a something:data1
>   system:code 1234]
> something:system  ] ;
> 
> There are many like the above one and I am trying to write the query to 
> delete all the data if the id matches. How I should I go about doing this?
> 

If the data is always in this shape, something like this should work:

prefix something: 
prefix system: 
DELETE WHERE {
 a something:Entity ;
something:privateData ?_a.
?_a  a something:PrivateData ;
something:jsonContent ?json ;
something:modeData ?_b; 
something:system ?filetype .
?_b a something:data1;
system:code ?code.
}

This just replaces the blank nodes with sparql variables.

It's a good idea to test DELETE updates thoroughly, because they can
often cause surprises. One way to see what will be deleted is to change
the DELETE to SELECT and run it as a query. That will show you exactly
what triples will be deleted.

Regards,
--Paul




Re: Example code

2018-03-19 Thread Paul Tyson
On Mon, 2018-03-19 at 20:01 +1000, David Moss wrote:
> I agree, the technical documentation is not the place to keep basic how-to 
> examples. But with Jena the basic how-to examples seem to be missing entirely.
> I have written GUI applications using the available examples from MYSQL. 
> MYSQL does have this kind of thing if you look. Jena does not.
> 
> Try typing " mysql java gui tutorial" into Google and see what pops up.
> Then try " Jena java gui tutorial" and contrast the results.
> 
Everyone's been on the low end of the learning curve with the RDF
technology stack, but I would advise to set aside the idea that your RDF
solution architecture should look anything like an SQL-based solution.
The RDF stack is web-ready, which means that what you are trying to do
is probably easier than you expect it to be.

If you described your goals, requirements, and constraints in more
detail, maybe we could be of more assistance.

Regards,
--Paul




Re: [JENA-DEV] SPARQL - Way to concat several property values

2018-02-20 Thread Paul Tyson
Data and query samples would help. (I could not see the image.)

But from your problem description, you might try GROUP BY and the
GROUP_CONCAT aggregate function. This would put (for example) all the
properties of a subject in one result field, separated by the delimiter
of your choice.
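For example, a minimal sketch (untested; the property used is just for 
illustration):

SELECT ?s (GROUP_CONCAT(?label; separator="; ") AS ?labels)
WHERE { ?s rdfs:label ?label }
GROUP BY ?s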

See the discussion of aggregates in SPARQL at
https://www.w3.org/TR/2013/REC-sparql11-query-20130321/#aggregates

Regards,
--Paul

On Tue, 2018-02-20 at 13:39 +0100, Brice Sommacal wrote:
> Hi dear jena users, 
> 
> 
> Here is a question regarding how to output a SPARQL result set into a
> Google SpreadSheet with a concatenation mode.
> 
> 
> We have been using a SPARQL RESULT set in order to feed a google
> spreadsheet. 
> The result is below. There is nothing new, we get one line result for
> each match found in our Jena Model.
> 
> 
> [Embedded image 1]
> 
> 
> 
> However, we would like to get concatenated values for equivalent
> rows. 
> As you can see from row 1 to 5 in SPARQL QUERY RESULT, we have found 2
> results for Apple. 2 for Microsoft and 1 for John Doe. All these line
> have one common value: "Entité morale réalisant des activités" (eg.
> column in midle side, column 3).
> From EXPECTED RESULT, row 1, these values have been concatenated with
> respect to the unicity of the value  "Entité morale réalisant des
> activités".
> 
> 
> Some times ago, I have been able to restitute this exact use case by
> developping a specific API going through the result set and outputing
> it in a tabular file. However, I would like to keep best practices in
> place, and I wonder if there is a new way to resolve this using
> SPARQL. 
> 
> 
> I was thinking of using SPARQL SUB SELECT, but last time I have been
> using string methods, I haven't get what I wanted. Do you have an
> example showing how to achieve this results? 
> I may be missing others solutions empowered by JENA and ARQ. Any
> others ways to resolve this will be considered. 
> 
> 
> 
> 
> Your insights and helps would be very much appreciated.
> With best regards, 
> 
> 
> 
> 
> Brice
>  




Re: there is a tutorial on the use of fuseki on jsp (JavaServer Pages), to be able to consult about the web platform

2017-11-21 Thread Paul Tyson


> On Nov 21, 2017, at 12:21, Manuel Quintero Fonseca  wrote:
> 
> Yes, I want to make a query with sparql in fuseki, to show the result in a
> web. what has been researched, happens to use java for web pages JSP is
> used in tomcat.

You might have a good reason (or requirement) to use jsp, but it is not 
necessary. Any JavaScript framework for dom and ajax will work. My favorite is 
d3 (d3js.org).

Regards,
--Paul

> 
> 2017-11-21 6:55 GMT-07:00 ajs6f :
> 
>> Are you asking about the use of Fuseki from JSP pages? That is, using JSP
>> pages to send queries to Fuseki?
>> 
>> ajs6f
>> 
>>> On Nov 20, 2017, at 9:56 PM, Manuel Quintero Fonseca 
>> wrote:
>>> 
>>> Hello, there is a tutorial on the use of fuseki on jsp (JavaServer
>> Pages),
>>> to be able to consult about the web platform
>>> 
>>> Information on this subject is very scarce, and in the official
>>> documentation or say, is unclear.
>>> 
>>> Thank you
>> 
>> 



Re: Return nested JSON results

2017-05-16 Thread Paul Tyson


On May 16, 2017, at 15:51, Laura Morales  wrote:

>> That's ok, but i wanted to know what kind of app is querying your Fuseki
>> database? Only a hint.. I know it sounds 'fundamentally' not essential,
>> but for 'me' it is the practice of developement.
> 
> 
> Well yeah, I don't know why this would be useful... anyway I'm trying to do a 
> school project which is basically a Java desktop app using Fuseki as backend. 
> I've loaded a few datasets and I'm reading/writing from/to it. Also, I'm 
> using Fuseki to query via HTTP, and not any particular Java API for Jena. I'm 
> stuck trying to figure out how to return JSON-LD framed results. Still no 
> progress.

Respectfully, json-ld framing is way overkill unless you get extra credit for 
complexity.

All you need is SELECT returning sparql-results+json and a JavaScript library 
that can do grouping (aka nesting). Even simpler, get text/csv and perform a 
grouping operation. This is a few lines of code using d3 (d3js.org) but there 
are no doubt other libraries that would provide this capability.

Regards,
--Paul


Re: Return nested JSON results

2017-05-09 Thread Paul Tyson
On Tue, 2017-05-09 at 18:32 +, Dimov, Stefan wrote:
> I’m also interested in having nested JSON results …
> S.
> 
> On 5/8/17, 11:06 PM, "Laura Morales"  wrote:
> 
> > I may have time to test if the change in this pull request [1] could 
> create such response. Feel free to comment there should you have any 
> suggestion, or if you could help testing it.
> I'd love to help testing this feature, although I haven't used Java 
> in years so I'm not 100% sure what's the best way to test this. I guess I'll 
> simply recompile the whole package and submit queries.
> 
> Btw, I just realized that using "" in my example (previous 
> email) might have been a bit misleading. What I was trying to show was the 
> results of a query to "retrieve people, the books they've read, and for each 
> book title/author/characters".
> Roughly speaking my idea is that of traversing the graph starting from 
> some node, and return some properties for selected nodes along the way. If 
> this makes sense. This doesn't seem to be possible right now, at least not 
> with Jena/SPARQL since all results are flattened into a plain list. I hope 
> something can be done for this, because it would be extremely cool and 
> extremely useful as well :)
> 

I'm not sure I completely understand the use case that this approach is
trying to address. But at best, it looks like a nonstandard,
application-specific shortcut to produce something like a table with
subrows.

Yes, if you want JSON it's simpler than JSON-LD and richer than the JSON
sparql-results serialization. But I think you will find with non-trivial
datasets you will miss the RDFness (or graphiness, if you will) of
JSON-LD. The main problem is there is no notion of object identity and
referencing, so you will have duplicate objects throughout the dataset.
If the only reason you want nested JSON is for easier display, that's
OK, but then I wonder why you would want to use RDF at all.

A semantic-oriented approach to this problem awaits standardization of a
general-purpose graph transformation language, to do for RDF graphs what
XSLT does for XML. The RDF Shapes movement (SHACL [1], ShEx [2]) might
be a step in this direction.

Regards,
--Paul

[1] https://www.w3.org/TR/shacl/
[2] https://www.w3.org/2001/sw/wiki/ShEx




Re: How to make complex SPARQL queries reusable?

2017-04-24 Thread Paul Tyson
Another option is to express the query logic in standard rule notation, such as 
RIF, and translate to sparql. This approach is especially indicated if the 
sparql queries represent actual business rules.

Regards,
--Paul

> On Apr 24, 2017, at 06:22, Andy Seaborne  wrote:
> 
> Simon,
> 
> 1/ SpinRDF provides ways of defining custom functions and property functions 
> in SPARQL.
> 
> http://spinrdf.org/
> 
> There'll probably be something in the SHACL-sphere (not in the standard, but 
> in the same general area/style/framework) soon.
> 
> 2/ There are methods in ExprUtils to parse expressions in SPARQL syntax.
> 
> You could build a general helper.
> 
> That does not enable syntax replacement but does cover (?) your MyFunc 
> example.
> 
>Andy
> 
>> On 24/04/17 01:02, Simon Schäfer wrote:
>> Hello,
>> 
>> I have complex SPARQL queries, which I would like to divide into several 
>> parts (like function definitions), in order to reuse these parts among 
>> different SPARQL queries and in order to make a single SPARQL query easier 
>> to understand. What are my options to achieve this?
>> 
>> I had a look at Jenas built in functionality to support user defined 
>> function definitions. The problem with them is that it seems that they can 
>> be used only for simple functionality like calculating the max of two 
>> integers. But I have quite complex functionality, which I don't want to 
>> rewrite in Java. Example:
>> 
>> 
>> select * where {
>>  ?s a ?tpe .
>>  filter not exists {
>>?sub rdfs:subClassOf ?tpe .
>>filter (?sub != ?tpe)
>>  }
>> }
>> 
>> 
>> It would be great if that could be separated into:
>> 
>> 
>> public class MyFunc extends FunctionBase1 {
>>public NodeValue exec(NodeValue v) {
>>return NodeValue.fromSparql("filter not exists {"
>>+ "   ?sub rdfs:subClassOf ?tpe ."
>>+ "  filter (?sub != ?tpe)"
>>+ "}") ;
>>}
>> }
>> // and then later
>> FunctionRegistry.get().put("http://example.org/function#myFunc", 
>> MyFunc.class) ;
>> 
>> 
>> and then:
>> 
>> 
>> prefix fun: <http://example.org/function#>
>> select * where {
>>  ?s a ?tpe .
>>  filter(fun:myFunc(?tpe))
>> }
>> 
>> 
>> Basically I'm looking for a way to call a SPARQL query from within a SPARQL 
>> query. Is that possible?
>> 
>> 



Re: Predicates with no vocabulary

2017-04-12 Thread Paul Tyson
Part of the fun (and ease) of RDF is being able to make stuff up as you go 
along. But, as others have said, when you move past the stage of learning and 
experimentation, and make your work persistent or reusable, you'll want to be 
more formal.

The linked data patterns book [1] has several ideas for uris.

Regards,
--Paul

[1] http://patterns.dataincubator.org/book

Sent from my iPhone

> On Apr 12, 2017, at 13:24, Dick Murray  wrote:
> 
> It is for this reason that I use  and as a nod to my Cisco
> engineer days and example.org... :-)
> 
> As Martynas Jusevičius said give it a little thought.
> 
> On 12 April 2017 at 17:37, Martynas Jusevičius 
> wrote:
> 
>> It would not be an error as long it is a valid URI.
>> 
>> Conceptually non-HTTP URIs go against Linked Data principles because
>> normally they cannot be dereferenced.
>> 
>> Therefore it makes sense to give it a little thought and choose an
>> http:// namespace that you control.
>> 
>> On Wed, Apr 12, 2017 at 6:31 PM, Laura Morales  wrote:
 I use "urn:ex:..." in a lot of my test code (short for "urn:example:").
 
 Then the predicate is "urn:ex:time/now" or "urn:ex:time/duration" or
 whatever you need...
>>> 
>>> would it be an error (perhaps conceptually) to use "ex:...", essentially
>> removing the "urn:" scheme?
>> 



Re: SPARQL query

2017-03-01 Thread Paul Tyson
Maybe something like:

Select ?s ?p ?o
Where {
?s ?p ?o.
Filter (?p in (:A, :B, :C))
Minus {?s ?p2 ?o2.
Filter (!(?p2 in (:A, :B, :C)))}
}

Untested.

Regards, 
--Paul

> On Mar 1, 2017, at 14:12, Claude Warren  wrote:
> 
> I have a graph where resources have a number of controlled properties.
> Call them A thorough F.  Some resources may have one property other may
> have all 6.
> 
> I have a resource with properties A, B, and C.
> 
> I want to find all resources in the graph where the resource has matching
> values but does not have any non specified properties.
> 
> So
> x with A, B, and C will match
> y with A and C will match
> z with A, B, C and D will not match
> w with A and D will not match.
> 
> Any idea how to construct a query that will do this?
> 
> It seems backwards to the way I normally think about queries.
> 
> Any help would be appreciated,
> Claude
> 
> 
> 
> 
> -- 
> I like: Like Like - The likeliest place on the web
> 
> LinkedIn: http://www.linkedin.com/in/claudewarren


Re: Benefits of Semantic web

2017-02-10 Thread Paul Tyson
You might find this related discussion interesting:

https://groups.google.com/forum/m/#!topic/ontolog-forum/AUzkFVhGrok

Regards,
--Paul

> On Feb 10, 2017, at 11:02, David Jordan  wrote:
> 
> I agree that have some discussion about this is very useful. Many of us
> have tried to evangelize semantic web technologies in our organizations and
> have struggled and failed because we cannot provide sufficient
> justification for using the technology. Hearing the specific value provided
> that can convince the skeptics is extremely valuable, much more valuable
> than simple support questions about a particular API interface.
> 
> 
>> On Fri, Feb 10, 2017 at 10:50 AM, Andy Seaborne  wrote:
>> 
>> kumar,
>> 
>> Have you done some investigation by searching on the web?
>> 
>> You will find many articles, blogs and papers from a wide range of
>> perspectives.
>> 
>>Andy
>> 
>> 
>>> On 10/02/17 12:22, kumar rohit wrote:
>>> 
>>> Hi, what are the benefits of semantic web technologies? I have used
>>> semantic web technologies from one year but, in theory I am not sure the
>>> real advantages of semantic web.
>>> When we develop a system using traditional RDBMS and Java and same system
>>> we develop using Java/Jena Protege SPARQL etc, so what is the advantage of
>>> the latter application?
>>> 
>>> 


Re: JSON-LD questions

2017-01-11 Thread Paul Tyson


> On Jan 11, 2017, at 09:00, Martynas Jusevičius <marty...@graphity.org> wrote:
> 
> Paul,
> 
> isn't that what SPARQL CONSTRUCT is doing?

That just gives you another graph, not a hierarchical structure as the OP 
wanted. Other than named graphs, RDF has no notion of containment, which is 
essential for creating meaningful displays of information. Nor is it good for 
sequencing, another essential feature for good information display. A general 
graph transformation language should make it easy to specify an output 
information structure built around containment and sequence (e.g., XML).

Regards,
--Paul
> 
>> On Wed, Jan 11, 2017 at 3:18 PM, Paul Tyson <phty...@sbcglobal.net> wrote:
>> 
>> 
>> On Jan 10, 2017, at 18:06, Grahame Grieve 
>> <grah...@healthintersections.com.au> wrote:
>> 
>>>> 
>>>> statements to describe the graph itself, e.g. here's a graph which
>>>> identifies a particular person as the foaf:primaryTopic of the graph:
>>>> 
>>>> {
>>>> "@context": {
>>>>   "Person": "http://xmlns.com/foaf/0.1/Person",
>>>>   "name": "http://xmlns.com/foaf/0.1/name",
>>>>   "primaryTopic": "http://xmlns.com/foaf/0.1/primaryTopic"
>>>> },
>>>> 
>>>> 
>>> 
>>> thanks. Is it expected that every domain application will come up with
>>> their own (different) way to do this?
>> 
>> Wrong list for this suggestion, but what is missing from the w3c semantic 
>> web stack is a general graph transformation language, analogous to XSLT, 
>> that would allow one to write domain-specific transformations from graph to 
>> hierarchy. Perhaps it was imagined that RDF/XML plus XSLT would fill this 
>> role, but that is painful. I've written srx-to-*ml transforms, but that is 
>> awkward also. Of course you can always roll your own rules in JavaScript, 
>> but that hides business logic in imperative code.
>> 
>> Regards,
>> --Paul
>>> 
>>> Grahame
>> 



Re: sum property values

2016-10-13 Thread Paul Tyson
There have been 2 different techniques mentioned that solve different problems.

The SUM aggregate function can be used to reduce a result set as illustrated by 
the following minimal CSV snippets:

BEFORE:
?v1,?v2
"A",1
"A",1
"A",2
"B",3
"B",1

AFTER:
?v1,?v3
"A",4
"B",4

The above results could be achieved by:

SELECT ?v1 (SUM(?v2) AS ?v3)
WHERE {
#pattern that binds ?v1 and ?v2
}
GROUP BY ?v1

If you want to extend a result set with a new binding composed from other bound 
values, you could use:

SELECT ?v1 ?v2 ?v3 ?v4
WHERE {
#pattern that binds ?v1,?v2,?v3
BIND (?v2+?v3 AS ?v4)
}

If the basic graph pattern results were:

?v1,?v2,?v3
"A",2,0
"B",1,3
"C",5,2

then the BIND statement would cause:

?v1,?v2,?v3,?v4
"A",2,0,2
"B",1,3,4
"C",5,2,7

Regards,
--Paul

> On Oct 11, 2016, at 13:32, neha gupta  wrote:
> 
> I am still waiting for any response about how to add/sum values of
> different data property values in SPARQL?
> 
> 
> 
>> On Sun, Oct 9, 2016 at 8:19 AM, neha gupta  wrote:
>> 
>> yes because what I studied about SUM is it sum the values of single
>> variable.
>> 
>> SELECT (?worldcupgoals + ?eurogoals+ ?othergoals))
>> where { ?team rdf:type ont:Team . ?team ont:WorldCup_Goals ?worldcupgoals;
>> ont:EuroCup_Goals ?eurogoals ; ont:Other_Goals ?othergoals}
>> 
>> On Sun, Oct 9, 2016 at 5:15 AM, Lorenz B. > leipzig.de> wrote:
>> 
>>> No, that's obviously wrong. And you don't need GROUP BY and SUM - in
>>> your examples it's even the wrong way.
>>> Simply SELECT the team and use '+' to compute the sum of the different
>>> goal values.
>>> 
>>> And why can't you try it out?
>>> 
 Will it works like:
 
 SELECT (SUM(?worldcupgoals ?eurogoals ?othergoals))
 where { ?team rdf:type ont:Team . ?team ont:WorldCup_Goals
>>> ?worldcupgoals;
 ont:EuroCup_Goals ?eurogoals ; ont:Other_Goals ?othergoals}
 
 
 On Sat, Oct 8, 2016 at 12:43 PM, Martynas Jusevičius <
>>> marty...@graphity.org>
 wrote:
 
> https://www.w3.org/TR/2013/REC-sparql11-query-20130321/#aggregateExample
> 
> On Sat, Oct 8, 2016 at 9:35 PM, neha gupta 
>>> wrote:
>> I am sorry but need some details? I really have no idea how to sum
> multiple
>> values (data properties) using SPARQL, despite searched the web.
>> 
>> On Sat, Oct 8, 2016 at 11:21 AM, Andy Seaborne 
>>> wrote:
>> 
>>> SUM and GROUP BY
>>> 
>>>   Andy
>>> 
>>> 
 On 08/10/16 19:14, neha gupta wrote:
 
 team1  WorldCup_Goals  20
 team1  EuroCup_Goals   25
 team1  Other_Goals50
 
 WorldCup_Goals,  EuroCup_Goals and Other_Goals are my properties.
 
 How to calculate total goals of team1 using SPARQL?
>>> --
>>> Lorenz Bühmann
>>> AKSW group, University of Leipzig
>>> Group: http://aksw.org - semantic web research center
>> 



Re: Applications/Projects Using Jena

2016-10-06 Thread Paul Tyson
As a contractor, I cannot disclose specific company information, but I have 
implemented a couple of large Jena projects.

One is in the PLM (Product Lifecycle Management) domain. I extract part master 
and product structure from 4 enterprise data systems and load to TDB, using 
RDB2RDF (R2RML) design patterns. Fuseki serves the data as interactive 
multi-BOM (bill of material) browser. I also apply business rules, cast as 
sparql queries, to detect defective data conditions within and between the 
enterprise systems. The dataset has over 900M triples.

Another project is an advanced hyperlinking application. Fuseki is a linkbase 
endpoint that supports authoring (creating/editing links between documents in 
an XML repository) and navigating (following links in an HTML delivery system). 
Automated processes update the linkbase as documents are versioned, so that 
source and target link ends are updated independently without author 
intervention. Link targets are calculated at display-time because they may 
change based on browsing context.

Regards,
--Paul

> On Oct 6, 2016, at 09:25, Kumar,Abhishek  wrote:
> 
> Hi Team,
> 
> I am a Masters student at University of Florida. I am presenting Apache Jena 
> to our class as part of research on RDF and semantic web. I have got material 
> to speak about Jena but I am having trouble finding some use cases of Jena in 
> current industry.
> 
> I read that many Semantic web applications use Jena but I could not find any 
> project which uses Jena. Can you help provide names of some projects which 
> are currently using Jena? That would be very helpful to generate interest 
> among students.
> 
> Thanks & Regards
> Abhishek Kumar



Re: Construct query

2016-10-01 Thread Paul Tyson
On Sat, 2016-10-01 at 03:10 -0700, neha gupta wrote:
> Paul, thank you.
> 
> If I have Student/University ontology and the ontology does not have
> "hasResearch" property can we create like this:
> 
> CONSTRUCT {?s ex:hasResearch ?research}
> WHERE {?s rdf:type ex:Student;ex:hasResearch  "C-SPARQL"}
> 
> to create the property and assign instance to it?

That wouldn't construct any triple because ?research will never be bound
by the WHERE clause. If you ever have a question about what triples will
be constructed, just change the CONSTRUCT {...} to SELECT * and see what
results you get. For every result record, CONSTRUCT will create triples
from the patterns whose variables are bound in the result set.

But now I'm not sure whether you are trying to extend the schema (or
ontology), or just assigning properties to existing subjects.

To add ex:hasResearch into the dataset as an rdf:Property you would just
load some triples like:

ex:hasResearch a rdf:Property;
  rdfs:domain ex:Student;
  rdfs:range [owl:oneOf ("C-SPARQL" "RIF" "SPIN")].

If you want to do any kind of OWL reasoning over your dataset you would
need to add some more statements to associate ex:hasResearch as a data
property of ex:Student.

On the other hand, if you just want to say that all students who are
taking a course called "Intro to SPARQL" will "have
research"="C-SPARQL", that would be like:

CONSTRUCT {?s ex:hasResearch "C-SPARQL"}
WHERE {?s rdf:type ex:Student; ex:hasCourse/rdfs:label "Intro to
SPARQL"}

Regards,
--Paul

> 
> On Fri, Sep 30, 2016 at 8:52 PM, Paul Tyson <phty...@sbcglobal.net> wrote:
> 
> > On Thu, 2016-09-29 at 13:44 -0700, tina sani wrote:
> > > I want to know about the Construct query.
> > > How it differs from Select query
> >
> > One way to think about it, if you have any background in relational
> > databases, is that SELECT returns a highly denormalized table (at least,
> > from any non-trivial query). On the other hand, CONSTRUCT will give you
> > an over-normalized set of tables, if you think of each RDF property
> > triple (or quad, if you include graph uri) as an RDBMS table.
> >
> > That is a very abstract view. Syntactically, the results of CONSTRUCT
> > and SELECT are quite different. SELECT must be serialized into something
> > like a table model (csv or the isomorphic xml, json, or text versions).
> > CONSTRUCT results are serialized into some RDF format.
> >
> > > Is it create a new property/class which is not already in the ontology or
> > > it just creates new triples.
> >
> > You can do either (or both). You can use CONSTRUCT to extract a
> > subgraph, or to create new triples based on some rules about the
> > existing data.
> >
> > > I will appreciate if some one come with a simple example. I have searched
> > > web, but could not grasp it.
> >
> > Extract a subgraph. Just get rdf:type and rdfs:label of all subjects.
> > CONSTRUCT {?s rdf:type ?type;rdfs:label ?label}
> > WHERE {?s rdf:type ?type;rdfs:label ?label}
> >
> > Make a new statement based on certain conditions in the dataset.
> > CONSTRUCT {?s :new-property false}
> > WHERE {?s rdf:type ex:TypeA;ex:property1 "A"}
> >
> > Materialize superclass properties:
> > CONSTRUCT {?s rdf:type ?super}
> > WHERE {?s rdf:type ?type. ?type rdfs:subClassOf* ?super}
> >
> > As another responder mentioned, CONSTRUCT just returns the triples--it
> > does not modify the dataset. Use a SPARQL update statement
> > (INSERT/WHERE) for that. I've found that with complex or large results
> > it is better in jena to do a CONSTRUCT query and then load the triples
> > into the TDB repository, rather than do an INSERT/WHERE update, which
> > can exhaust resources if the resulting graph is large.
> >
> > Hope this helps.
> >
> > Regards,
> > --Paul
> >
> >




Re: Construct query

2016-09-30 Thread Paul Tyson
On Thu, 2016-09-29 at 13:44 -0700, tina sani wrote:
> I want to know about the Construct query.
> How it differs from Select query

One way to think about it, if you have any background in relational
databases, is that SELECT returns a highly denormalized table (at least,
from any non-trivial query). On the other hand, CONSTRUCT will give you
an over-normalized set of tables, if you think of each RDF property
triple (or quad, if you include graph uri) as an RDBMS table.

That is a very abstract view. Syntactically, the results of CONSTRUCT
and SELECT are quite different. SELECT must be serialized into something
like a table model (csv or the isomorphic xml, json, or text versions).
CONSTRUCT results are serialized into some RDF format.

> Does it create a new property/class which is not already in the ontology, or
> does it just create new triples?

You can do either (or both). You can use CONSTRUCT to extract a
subgraph, or to create new triples based on some rules about the
existing data.

> I would appreciate it if someone could come up with a simple example. I have searched
> the web, but could not grasp it.

Extract a subgraph. Just get rdf:type and rdfs:label of all subjects.
CONSTRUCT {?s rdf:type ?type;rdfs:label ?label}
WHERE {?s rdf:type ?type;rdfs:label ?label}

Make a new statement based on certain conditions in the dataset.
CONSTRUCT {?s :new-property false}
WHERE {?s rdf:type ex:TypeA;ex:property1 "A"}

Materialize superclass properties:
CONSTRUCT {?s rdf:type ?super}
WHERE {?s rdf:type ?type . ?type rdfs:subClassOf* ?super}

As another responder mentioned, CONSTRUCT just returns the triples--it
does not modify the dataset. Use a SPARQL update statement
(INSERT/WHERE) for that. I've found that with complex or large results
it is better in jena to do a CONSTRUCT query and then load the triples
into the TDB repository, rather than do an INSERT/WHERE update, which
can exhaust resources if the resulting graph is large.
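
For instance, the superclass materialization above could be done as an update
instead (a sketch; the same caveat about large results applies):

PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
INSERT {?s rdf:type ?super}
WHERE  {?s rdf:type ?type . ?type rdfs:subClassOf* ?super}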

Hope this helps.

Regards,
--Paul



Re: sparql algebra differences jena 2.13.0/3.n

2016-09-18 Thread Paul Tyson
I looked at some more queries that worked in jena 2.x but seem to hang
in 3.x. They all follow the same pattern of a complex FILTER query on
string values. Rewriting the filter conditions into subgraph patterns
solved the problem.

Is this a defect induced by algebra optimizations in 3.x? Or, is it more
proper to apply string filters in the manner you suggested, by enclosing
them in subgraph patterns close to the triples they filter?

There was one case that was a little more complex. The original query
was like:

CONSTRUCT {
?var1 :p1 false .
}
WHERE {
FILTER ((?var2 != "str1" && !strstarts(?var3,"str2")))
?var1 :p2 ?var3 ;
 :p3 ?var2 ;
 :p4 "str3" ;
 :p5 "str4" ;
 :p6 "str5" .
FILTER NOT EXISTS {
FILTER (((?var4 = "str6" || ?var4 = "str7" || ?var4 = "str8" || ?var4 =
"str9" || ?var4 = "str10" || ?var4 = "str11" || ?var4 = "str12" || ?var4
= "str13")))
?var5 :p7 ?var4 ;
 :p8 ?var3 .
}
}

I initially rewrote the FILTER NOT EXISTS clause to read:

FILTER NOT EXISTS {
{FILTER (((?var4 = "str6" || ?var4 = "str7" || ?var4 = "str8" || ?var4 =
"str9" || ?var4 = "str10" || ?var4 = "str11" || ?var4 = "str12" || ?var4
= "str13")))
?var5 :p7 ?var4 .}
?var5 :p8 ?var3 .
}

which still seemed to hang. Reordering the FILTER NOT EXISTS bgp to the
following solved the problem.

FILTER NOT EXISTS {
?var5 :p8 ?var3 .
{FILTER (((?var4 = "str6" || ?var4 = "str7" || ?var4 = "str8" || ?var4 =
"str9" || ?var4 = "str10" || ?var4 = "str11" || ?var4 = "str12" || ?var4
= "str13"))) 
?var5 :p7 ?var4 .}
}

I have noticed other cases where order of triples and bgps makes quite a
difference in execution time, but I can't figure out any science to it.
Are there any guidelines for ordering the components of a complex query
(including UNION and OPTIONAL clauses) to optimize performance? Can you
tell anything by a static analysis of the sparql algebra?

Regards,
--Paul 



On Fri, 2016-09-16 at 08:37 -0500, Paul Tyson wrote:
> Andy,
> 
> With that rewrite, the 3.x tdbquery works as expected.
> 
> I will investigate further this weekend and send other queries that don't 
> work in 3.x.
> 
> Regards,
> --Paul
> 
> > On Sep 16, 2016, at 04:26, Andy Seaborne <a...@apache.org> wrote:
> > 
> > Paul,  If you could try the query below which mimics the effect of placing 
> > the ?var4 filter part, it will help determine if this is a filter placement 
> > issue or not.
> > 
> > The difference is that the first basic graph pattern is inside a {} with the 
> > relevant part of the filter expression.
> > 
> >Andy
> > 
> > 
> > PREFIX  : <http://example/>
> > 
> > SELECT  *
> > WHERE
> >  { FILTER ( ( ?var3 = "str1" ) || ( ?var3 = "str2" ) )
> >{ ?var2  :p1  ?var4 ;
> > :p2  ?var3
> >  FILTER ( ! ( ( ( ?var4 = "" ) ||
> >   ( ?var4 = "str3" ) ) ||
> >   regex(?var4, "pat1") ) )
> >}
> >{   { ?var1  :p3  ?var4 }
> >  UNION
> >{ ?var1  :p4  ?var4 }
> >}
> >  }
> > 
> > 
> >Andy
> > 
> > 
> >> On 14/09/16 13:15, Paul Tyson wrote:
> >>> On Wed, 2016-09-14 at 10:57 +0100, Andy Seaborne wrote:
> >>> Hi Paul,
> >>> 
> >>> It's difficult to tell what's going on from your report. Plain strings
> >>> are not quite identical in RDF 1.0 and RDF 1.1 so I hope you have
> >>> reloaded the data for running Jena 3.x.
> >> 
> >> I admit I have not studied the subtleties around string literals with
> >> and without datatype tags. None of my data loadfiles have tagged string
> >> literals, nor do my queries. Are you saying they should?
> >> 
> >>> 
> >>> On less data, does either case produce the wrong answers?
> >> 
> >> I'll produce a smaller dataset to test.
> >> 
> >>> The regex is not being pushed inwards in the same way which may be an
> >>> issue - it "all depends" on the data.
> >>> 
> >>> A smaller query exhibiting a timing difference would be very helpful.
> >>> Are all parts of the FILTER necessary for the effect?
> >> Yes, they eliminate spurious matches.
> >> 
> >>> 
> >>>Andy
> >>> 
> >>> Unrelated:
> >>> 
> >>> {
> >>> ?var1 :p3 ?var4 .
> >>> } U

Re: sparql algebra differences jena 2.13.0/3.n

2016-09-16 Thread Paul Tyson
Andy,

With that rewrite, the 3.x tdbquery works as expected.

I will investigate further this weekend and send other queries that don't work 
in 3.x.

Regards,
--Paul

> On Sep 16, 2016, at 04:26, Andy Seaborne <a...@apache.org> wrote:
> 
> Paul,  If you could try the query below which mimics the effect of placing 
> the ?var4 filter part, it will help determine if this is a filter placement 
> issue or not.
> 
> The difference is that the first basic graph pattern is inside a {} with the 
> relevant part of the filter expression.
> 
>Andy
> 
> 
> PREFIX  : <http://example/>
> 
> SELECT  *
> WHERE
>  { FILTER ( ( ?var3 = "str1" ) || ( ?var3 = "str2" ) )
>{ ?var2  :p1  ?var4 ;
> :p2  ?var3
>  FILTER ( ! ( ( ( ?var4 = "" ) ||
>   ( ?var4 = "str3" ) ) ||
>   regex(?var4, "pat1") ) )
>}
>    {   { ?var1  :p3  ?var4 }
>  UNION
>{ ?var1  :p4  ?var4 }
>}
>  }
> 
> 
>Andy
> 
> 
>> On 14/09/16 13:15, Paul Tyson wrote:
>>> On Wed, 2016-09-14 at 10:57 +0100, Andy Seaborne wrote:
>>> Hi Paul,
>>> 
>>> It's difficult to tell what's going on from your report. Plain strings
>>> are not quite identical in RDF 1.0 and RDF 1.1 so I hope you have
>>> reloaded the data for running Jena 3.x.
>> 
>> I admit I have not studied the subtleties around string literals with
>> and without datatype tags. None of my data loadfiles have tagged string
>> literals, nor do my queries. Are you saying they should?
>> 
>>> 
>>> On less data, does either case produce the wrong answers?
>> 
>> I'll produce a smaller dataset to test.
>> 
>>> The regex is not being pushed inwards in the same way which may be an
>>> issue - it "all depends" on the data.
>>> 
>>> A smaller query exhibiting a timing difference would be very helpful.
>>> Are all parts of the FILTER necessary for the effect?
>> Yes, they eliminate spurious matches.
>> 
>>> 
>>>Andy
>>> 
>>> Unrelated:
>>> 
>>> {
>>> ?var1 :p3 ?var4 .
>>> } UNION {
>>> ?var1 :p4 ?var4 .
>>> }
>>> 
>>> can be written
>>> 
>>> ?var1 (:p3|:p4) ?var4
>> Yes, but I generate these queries from RIF source, and UNION is easier
>> for the general RIF statement "Or(x,y)". The surface syntax doesn't make
>> any difference in the algebra, does it?
>> 
>> Regards,
>> --Paul
>> 
>>>> On 14/09/16 02:01, Paul Tyson wrote:
>>>> I have some queries that worked fine in jena-2.13.0 but not in
>>>> jena-3.1.0, using the same data.
>>>> 
>>>> For a long time I've been running a couple dozen queries regularly over
>>>> a large (900M triples) TDB, using jena-2.13.0. When I recently upgraded
>>>> to jena-3.1.0, I found that 5 of these queries would not return (ran
>>>> forever). qparse revealed that the sparql algebra is quite different in
>>>> 2.13.0 and 3.1.0 (or apparently any 3.n.n version).
>>>> 
>>>> Here is a sample query that worked in 2.13.0 but not in 3.1.0, along
>>>> with the algebra given by qparse --explain for 2.13.0 and 3.1.0:
>>>> 
>>>> prefix : <http://example.org>
>>>> CONSTRUCT {
>>>> ?var1 <http://www.w3.org/2004/02/skos/core#exactMatch> ?var2 .
>>>> }
>>>> WHERE {
>>>> FILTER (((?var3 = "str1" || ?var3 = "str2") && !(?var4 = "" || ?var4 =
>>>> "str3" || regex(?var4,"pat1"))))
>>>> ?var2 :p1 ?var4 ; :p2 ?var3 .
>>>> {{
>>>> ?var1 :p3 ?var4 .
>>>> } UNION {
>>>> ?var1 :p4 ?var4 .
>>>> }}
>>>> }
>>>> 
>>>> Jena-2.13.0 produces algebra:
>>>> (prefix ((: <http://example.org>))
>>>>  (sequence
>>>>(filter (|| (= ?var3 "str1") (= ?var3 "str2"))
>>>>  (sequence
>>>>(filter (! (|| (|| (= ?var4 "") (= ?var4 "str3")) (regex ?var4
>>>> "pat1")))
>>>>  (bgp (triple ?var2 :p1 ?var4)))
>>>>        (bgp (triple ?var2 :p2 ?var3))))
>>>>(union
>>>>  (bgp (triple ?var1 :p3 ?var4))
>>>>      (bgp (triple ?var1 :p4 ?var4)))))
>>>> 
>>>> Jena-3.1.0 produces algebra:
>>>> (prefix ((: <http://example.org>))
>>>>  (filter (! (|| (|| (= ?var4 "") (= ?var4 "str3")) (regex ?var4
>>>> "pat1")))
>>>>(disjunction
>>>>  (assign ((?var3 "str1"))
>>>>(sequence
>>>>  (bgp
>>>>(triple ?var2 :p1 ?var4)
>>>>(triple ?var2 :p2 "str1")
>>>>  )
>>>>  (union
>>>>(bgp (triple ?var1 :p3 ?var4))
>>>>    (bgp (triple ?var1 :p4 ?var4)))))
>>>>  (assign ((?var3 "str2"))
>>>>(sequence
>>>>  (bgp
>>>>(triple ?var2 :p1 ?var4)
>>>>(triple ?var2 :p2 "str2")
>>>>  )
>>>>  (union
>>>>(bgp (triple ?var1 :p3 ?var4))
>>>>    (bgp (triple ?var1 :p4 ?var4))))))))
>>>> 
>>>> Thanks for any insight or assistance into this problem.
>>>> 
>>>> Regards,
>>>> --Paul
>> 
>> 



Re: sparql algebra differences jena 2.13.0/3.n

2016-09-14 Thread Paul Tyson
On Wed, 2016-09-14 at 10:57 +0100, Andy Seaborne wrote:
> Hi Paul,
> 
> It's difficult to tell what's going on from your report. Plain strings 
> are not quite identical in RDF 1.0 and RDF 1.1 so I hope you have 
> reloaded the data for running Jena 3.x.

I admit I have not studied the subtleties around string literals with
and without datatype tags. None of my data loadfiles have tagged string
literals, nor do my queries. Are you saying they should?

> 
> On less data, does either case produce the wrong answers?
> 

I'll produce a smaller dataset to test.

> The regex is not being pushed inwards in the same way which may be an 
> issue - it "all depends" on the data.
> 
> A smaller query exhibiting a timing difference would be very helpful. 
> Are all parts of the FILTER necessary for the effect?
Yes, they eliminate spurious matches.

> 
>   Andy
> 
> Unrelated:
> 
> {
> ?var1 :p3 ?var4 .
> } UNION {
> ?var1 :p4 ?var4 .
> }
> 
> can be written
> 
> ?var1 (:p3|:p4) ?var4
> 
> 
Yes, but I generate these queries from RIF source, and UNION is easier
for the general RIF statement "Or(x,y)". The surface syntax doesn't make
any difference in the algebra, does it?

Regards,
--Paul

> On 14/09/16 02:01, Paul Tyson wrote:
> > I have some queries that worked fine in jena-2.13.0 but not in
> > jena-3.1.0, using the same data.
> >
> > For a long time I've been running a couple dozen queries regularly over
> > a large (900M triples) TDB, using jena-2.13.0. When I recently upgraded
> > to jena-3.1.0, I found that 5 of these queries would not return (ran
> > forever). qparse revealed that the sparql algebra is quite different in
> > 2.13.0 and 3.1.0 (or apparently any 3.n.n version).
> >
> > Here is a sample query that worked in 2.13.0 but not in 3.1.0, along
> > with the algebra given by qparse --explain for 2.13.0 and 3.1.0:
> >
> > prefix : <http://example.org>
> > CONSTRUCT {
> > ?var1 <http://www.w3.org/2004/02/skos/core#exactMatch> ?var2 .
> > }
> > WHERE {
> > FILTER (((?var3 = "str1" || ?var3 = "str2") && !(?var4 = "" || ?var4 =
> > "str3" || regex(?var4,"pat1"))))
> > ?var2 :p1 ?var4 ; :p2 ?var3 .
> > {{
> > ?var1 :p3 ?var4 .
> > } UNION {
> > ?var1 :p4 ?var4 .
> > }}
> > }
> >
> > Jena-2.13.0 produces algebra:
> > (prefix ((: <http://example.org>))
> >   (sequence
> > (filter (|| (= ?var3 "str1") (= ?var3 "str2"))
> >   (sequence
> > (filter (! (|| (|| (= ?var4 "") (= ?var4 "str3")) (regex ?var4
> > "pat1")))
> >   (bgp (triple ?var2 :p1 ?var4)))
> > (bgp (triple ?var2 :p2 ?var3))))
> > (union
> >   (bgp (triple ?var1 :p3 ?var4))
> >   (bgp (triple ?var1 :p4 ?var4)))))
> >
> > Jena-3.1.0 produces algebra:
> > (prefix ((: <http://example.org>))
> >   (filter (! (|| (|| (= ?var4 "") (= ?var4 "str3")) (regex ?var4
> > "pat1")))
> > (disjunction
> >   (assign ((?var3 "str1"))
> > (sequence
> >   (bgp
> > (triple ?var2 :p1 ?var4)
> > (triple ?var2 :p2 "str1")
> >   )
> >   (union
> > (bgp (triple ?var1 :p3 ?var4))
> > (bgp (triple ?var1 :p4 ?var4)))))
> >   (assign ((?var3 "str2"))
> > (sequence
> >   (bgp
> > (triple ?var2 :p1 ?var4)
> > (triple ?var2 :p2 "str2")
> >   )
> >   (union
> > (bgp (triple ?var1 :p3 ?var4))
> > (bgp (triple ?var1 :p4 ?var4))))))))
> >
> > Thanks for any insight or assistance into this problem.
> >
> > Regards,
> > --Paul
> >




sparql algebra differences jena 2.13.0/3.n

2016-09-13 Thread Paul Tyson
I have some queries that worked fine in jena-2.13.0 but not in
jena-3.1.0, using the same data.

For a long time I've been running a couple dozen queries regularly over
a large (900M triples) TDB, using jena-2.13.0. When I recently upgraded
to jena-3.1.0, I found that 5 of these queries would not return (ran
forever). qparse revealed that the sparql algebra is quite different in
2.13.0 and 3.1.0 (or apparently any 3.n.n version).

Here is a sample query that worked in 2.13.0 but not in 3.1.0, along
with the algebra given by qparse --explain for 2.13.0 and 3.1.0:

prefix : <http://example.org>
CONSTRUCT {
?var1 <http://www.w3.org/2004/02/skos/core#exactMatch> ?var2 .
}
WHERE {
FILTER (((?var3 = "str1" || ?var3 = "str2") && !(?var4 = "" || ?var4 =
"str3" || regex(?var4,"pat1"))))
?var2 :p1 ?var4 ; :p2 ?var3 .
{{
?var1 :p3 ?var4 .
} UNION {
?var1 :p4 ?var4 .
}}
}
 
Jena-2.13.0 produces algebra:
(prefix ((: <http://example.org>))
  (sequence
    (filter (|| (= ?var3 "str1") (= ?var3 "str2"))
      (sequence
        (filter (! (|| (|| (= ?var4 "") (= ?var4 "str3")) (regex ?var4 "pat1")))
          (bgp (triple ?var2 :p1 ?var4)))
        (bgp (triple ?var2 :p2 ?var3))))
    (union
      (bgp (triple ?var1 :p3 ?var4))
      (bgp (triple ?var1 :p4 ?var4)))))
 
Jena-3.1.0 produces algebra:
(prefix ((: <http://example.org>))
  (filter (! (|| (|| (= ?var4 "") (= ?var4 "str3")) (regex ?var4 "pat1")))
    (disjunction
      (assign ((?var3 "str1"))
        (sequence
          (bgp
            (triple ?var2 :p1 ?var4)
            (triple ?var2 :p2 "str1")
          )
          (union
            (bgp (triple ?var1 :p3 ?var4))
            (bgp (triple ?var1 :p4 ?var4)))))
      (assign ((?var3 "str2"))
        (sequence
          (bgp
            (triple ?var2 :p1 ?var4)
            (triple ?var2 :p2 "str2")
          )
          (union
            (bgp (triple ?var1 :p3 ?var4))
            (bgp (triple ?var1 :p4 ?var4))))))))
 
Thanks for any insight or assistance into this problem.

Regards,
--Paul



Re: Relationship between similar columns from multiple databases

2016-09-07 Thread Paul Tyson
I wrote my own R2RML converter using Jena lib. It seemed cheapest and easiest 
at the time (a few years ago). I have not surveyed current offerings in this 
space.

Best,
--Paul

> On Sep 7, 2016, at 14:13, Martynas Jusevičius <marty...@graphity.org> wrote:
> 
> Paul,
> 
> What are you using for R2RML?
> 
> Ontop looks promising: http://ontop.inf.unibz.it/
> 
>> On Wed, Sep 7, 2016 at 9:11 PM, Paul Tyson <phty...@sbcglobal.net> wrote:
>> Yes, I am using R2RML to convert 4 big PLM DBs into RDF, load in Jena TDB 
>> and serve via fuseki for data mashups and inconsistency reports. Works very 
>> well.
>> 
>> Best,
>> --Paul
>> 
>>> On Sep 7, 2016, at 13:39, Martynas Jusevičius <marty...@graphity.org> wrote:
>>> 
>>> I think R2RML and GRDDL could be of interest to you:
>>> https://www.w3.org/TR/r2rml/
>>> https://www.w3.org/TR/grddl/
>>> 
>>> On Wed, Sep 7, 2016 at 8:00 PM, ☼ R Nair (रविशंकर नायर)
>>> <ravishankar.n...@gmail.com> wrote:
>>>> Agree, the question is whether an RDF can be created out of the data from
>>>> multiple data sources and use it for semantic correlation. That would turn
>>>> the world round. In my organization,  there are at least a PB of data lying
>>>> in disparate sources, untapped because , they are legacy and none knows the
>>>> relationships until explored manually. If Jena is not, any suggestions to
>>>> manage this? Thanks
>>>> 
>>>> Best, Ravion
>>>> 
>>>> On Sep 7, 2016 1:55 PM, "A. Soroka" <aj...@virginia.edu> wrote:
>>>> 
>>>> Jena is an RDF framework-- it's not really designed to integrate SQL
>>>> databases. Are you sure you are using the right product? Does your use case
>>>> involve a good deal of RDF processing?
>>>> 
>>>> ---
>>>> A. Soroka
>>>> The University of Virginia Library
>>>> 
>>>>>> On Sep 7, 2016, at 1:43 PM, ☼ R Nair (रविशंकर नायर) <
>>>>> ravishankar.n...@gmail.com> wrote:
>>>>> 
>>>>> All,
>>>>> 
>>>>> I am new to Jena. I would like to query two databases, mysql and Oracle.
>>>>> Assume that there are similar columns in both. For example MYSQL contains
>>>> a
>>>>> table EMP with ENAME column. Oracle contains, say, DEPT table with
>>>>> EMPLOYEENAME column. What are the steps if I want Jena to find out ENAME
>>>> of
>>>>> MYSQL is same as EMPLOYEENAME column of Oracle, ( and so can be joined).
>>>> Is
>>>>> this possible, at least to get an output saying both columns are similar?
>>>>> If so, how, thanks and appreciate your help.
>>>>> 
>>>>> Best, Ravion
>> 



Re: Relationship between similar columns from multiple databases

2016-09-07 Thread Paul Tyson
Yes, I am using R2RML to convert 4 big PLM DBs into RDF, load in Jena TDB and 
serve via fuseki for data mashups and inconsistency reports. Works very well.

Best,
--Paul

> On Sep 7, 2016, at 13:39, Martynas Jusevičius  wrote:
> 
> I think R2RML and GRDDL could be of interest to you:
> https://www.w3.org/TR/r2rml/
> https://www.w3.org/TR/grddl/
> 
> On Wed, Sep 7, 2016 at 8:00 PM, ☼ R Nair (रविशंकर नायर)
>  wrote:
>> Agree, the question is whether an RDF can be created out of the data from
>> multiple data sources and use it for semantic correlation. That would turn
>> the world round. In my organization,  there are at least a PB of data lying
>> in disparate sources, untapped because , they are legacy and none knows the
>> relationships until explored manually. If Jena is not, any suggestions to
>> manage this? Thanks
>> 
>> Best, Ravion
>> 
>> On Sep 7, 2016 1:55 PM, "A. Soroka"  wrote:
>> 
>> Jena is an RDF framework-- it's not really designed to integrate SQL
>> databases. Are you sure you are using the right product? Does your use case
>> involve a good deal of RDF processing?
>> 
>> ---
>> A. Soroka
>> The University of Virginia Library
>> 
 On Sep 7, 2016, at 1:43 PM, ☼ R Nair (रविशंकर नायर) <
>>> ravishankar.n...@gmail.com> wrote:
>>> 
>>> All,
>>> 
>>> I am new to Jena. I would like to query two databases, mysql and Oracle.
>>> Assume that there are similar columns in both. For example MYSQL contains
>> a
>>> table EMP with ENAME column. Oracle contains, say, DEPT table with
>>> EMPLOYEENAME column. What are the steps if I want Jena to find out ENAME
>> of
>>> MYSQL is same as EMPLOYEENAME column of Oracle, ( and so can be joined).
>> Is
>>> this possible, at least to get an output saying both columns are similar?
>>> If so, how, thanks and appreciate your help.
>>> 
>>> Best, Ravion



Re: Optimising path to root concept SPARQL query

2016-02-01 Thread Paul Tyson
I don't know that you can get such results from SPARQL directly. I would get a
flat list of subclass relations in XML (.srx) or JSON and then process it with
XSLT or JavaScript to write out the class hierarchy.
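
The flat list itself is just something like this (a sketch -- substitute
skos:broader or whatever property your hierarchy actually uses):

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?sub ?super
WHERE { ?sub rdfs:subClassOf ?super }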

Regards,
--Paul

> On Feb 1, 2016, at 07:05, Joël Kuiper  wrote:
> 
> This message has no content.


Re: optimizing serialization of results from fuseki

2016-01-07 Thread Paul Tyson
On Thu, 2016-01-07 at 08:48 +, Håvard Mikkelsen Ottestad wrote:
> Hi,
> 
> Reordering the filters might help.
I've tried a bit of this, without much noticeable effect. I understand
that these are pretty well optimized when the sparql text is converted
to algebra.

> 
> Also, maybe a stats file would reorder your query to be faster. I dunno how 
> often (or if) fuseki generates a stats file. You can try to generate one by 
> hand when fuseki is shutdown: 
> https://jena.apache.org/documentation/tdb/optimizer.html
> 
There is a stats.opt file so I assume it's using that for optimization.

> Also I’m wondering what the performance is like if you take this line away: 
> ?s :p1/:p2 ?nd.
> 
When this occurs last in the BGP it does not have much effect. At the
top of the BGP it really slowed things down.

> 
> One major performance drain I have seen in the past is filters on string 
> literals. Especially if you are doing anything like CONTAINS or LOWERCASE. Do 
> you have any of that?
> 
I think all my filters are on numeric literals or URIs (see sample query
in other post). On other projects I also have noticed the impact of
string filters, and got much better results using Lucene add-on for
that.
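
Roughly, the string match moves out of the FILTER and into the index lookup,
e.g. (an untested sketch; it assumes a jena-text index configured over
rdfs:label):

PREFIX text: <http://jena.apache.org/text#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?s
WHERE {
  # Lucene lookup instead of a FILTER/CONTAINS scan over every literal
  ?s text:query (rdfs:label "widget") .
}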

Regards,
--Paul

> Håvard
> 
> 
> 
> 
> On 07/01/16 03:51, "Paul Tyson" <phty...@sbcglobal.net> wrote:
> 
> >On Wed, 2016-01-06 at 18:52 +, Andy Seaborne wrote:
> >> Hi Paul,
> >> 
> >>  > My question is: is total query time limited by search execution speed,
> >>  > or by marshaling and serialization of search results?
> >> 
> >> Costs are a bit of both but normally mainly query.  It also depends on 
> >> the client processing.
> >> 
> >> Some context please:
> >> 1/ What's the storage layer?
> >TDB behind fuseki 2.3.1
> >
> >> 2/ What result set format are you getting?
> >text/csv
> >
> >> 3/ How are you handling the results on receipt in the client?
> >Just writing them to file for testing.
> >
> >> 
> >> (Håvard point about seeing data and query also applies)
> >Sorry, not easy to share the data.
> >
> >> 
> >> The important point is that output is streamed.
> >> 
> >> Results are sent while the query is executing; it is not the case that the
> >> query executes, all the results are calculated, and then the results are produced.
> >> 
> >> To investigate, modify the query to do something like this
> >> 
> >> SELECT (count(*) AS ?C) { ... }
> >> 
> >> because then the result set cost is low and all the query is executed 
> >> before a result can be produced.
> >> 
> >Yes, I did that, and the time is very nearly the same.
> >
> >So I conclude we are seeing the best performance possible unless there
> >is something terribly wrong with my queries. They are essentially of the
> >form:
> >
> >select ?s
> >where {
> >?nd :prop1 ;
> >  :prop2 "lit1";
> >  :prop3 ?var1;
> >  :prop4 ?var2;
> ># more properties of ?s
> >filter (?var1 > N1 && ?var1 < N2)
> >filter (?var2 in (,,...))
> >#more filters on ?nd properties
> >?s :p1/:p2 ?nd.
> >}
> >
> >Some of the filters get a little more complicated. And there is at least
> >one, possibly 2, UNION clauses. No OPTIONAL clauses. I've dissected the
> >queries and run each individual piece (triple + filter), and it seems to
> >be the more complicated filters that start to slow things down, as might
> >be expected.
> >
> >Thanks for your comments and interest. The performance we're seeing is
> >unacceptable for our application requirements, so I wanted to see if
> >there were any other performance factors I had missed.
> >
> >Regards,
> >--Paul
> >
> >> Andy
> >> 
> >> 
> >> On 06/01/16 16:17, Paul Tyson wrote:
> >> > I have a modest (17M triple) dataset, fairly flat graph. I run some
> >> > queries selecting nodes with anywhere from 12-20 different property
> >> > values.
> >> >
> >> > Result set counts are anywhere from 10,000 to 30,000 nodes. Total
> >> > execution time measured at client are in the 30-40 second range.
> >> >
> >> > The web request begins streaming results immediately, but seems to take
> >> > longer than it should (based on the number of results and size of data
> >> > transfer). I also notice that the time is roughly linear with the size
> >> > of dataset--halving the dataset size halves the result set and the
> >> > execution time. I wouldn't have expected this behavior if all the time
> >> > was due to an indexed search.
> >> >
> >> > My question is: is total query time limited by search execution speed,
> >> > or by marshaling and serialization of search results?
> >> >
> >> > I have tried different query patterns, and believe I have the best
> >> > queries possible for the use case.
> >> >
> >> > I'm looking for other suggestions to reduce overall execution time. The
> >> > performance does not improve drastically going from 4Gb to 8 or 16Gb
> >> > RAM. My test platforms are 64-bit Windows, ranging from small server
> >> > (16Gb RAM, 4 CPU) to laptops with 4Gb RAM.
> >> >
> >> > Thanks,
> >> > --Paul
> >> >
> >> 
> >
> >




Re: optimizing serialization of results from fuseki

2016-01-07 Thread Paul Tyson
Here is an actual query, partially obfuscated. It returns about 18K
nodes in 40 seconds, from a dataset of about 17M triples. (The nodes are
not necessarily distinct.)

The predominant graph structure is like:

?node <- ?lsu -> ?detail -> LSUPROPERTYVALUE

Thanks for your attention and any suggestions for improvement.

prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix lsu: <http://rules.example.org/ns/lsu#>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT (count(?node) as ?cnt)
WHERE {
?detail lsu:source "XYZ".
?detail lsu:length-type "Ltype".
?detail lsu:max-length-exclusive ?maxe_len;
  lsu:max-length-inclusive ?maxi_len;
  lsu:min-length-inclusive ?mine_len;
  lsu:min-length-exclusive ?mini_len.
FILTER (
  (?maxe_len = rdf:nil || ?maxe_len > "95"^^xsd:decimal)
  && (?maxi_len = rdf:nil || ?maxi_len >= "95"^^xsd:decimal)
  && (?mine_len = rdf:nil || ?mine_len < "95"^^xsd:decimal)
  && (?mini_len  = rdf:nil || ?mini_len <= "95"^^xsd:decimal)
)
?detail lsu:date-type "Date type 1".
{{
  ?detail lsu:retroactive true;
lsu:end-date rdf:nil .
} UNION {
  ?detail lsu:retroactive false;
lsu:start-date ?start ;
lsu:end-date ?end .
  FILTER (?start <= "2006-08-11"^^xsd:date
  && (?end = rdf:nil || ?end >= "2006-08-11"^^xsd:date))
}}
?detail lsu:minimum-age ?min_age;
  lsu:maximum-age ?max_age.
FILTER ((?max_age = rdf:nil || ?max_age >= 8)
 && (?min_age = 0 || ?min_age < 8))
?detail lsu:applicable-for "adfsda" .
?detail lsu:v-type ?v_type.
FILTER (?v_type in (rdf:nil, <http://www.example.org/2015/7/abc>))
?detail lsu:s-type ?s_type.
FILTER (?s_type in (rdf:nil, <http://www.example.org/2015/7/dsfgdsa>))
?detail lsu:max-gg-exclusive ?maxe_gg;
  lsu:max-gg-inclusive ?maxi_gg;
  lsu:min-gg-inclusive ?mine_gg;
  lsu:min-gg-exclusive ?mini_gg.
FILTER (
  (?maxe_gg = rdf:nil || ?maxe_gg > "50"^^xsd:decimal)
  && (?maxi_gg = rdf:nil || ?maxi_gg >= "50"^^xsd:decimal)
  && (?mine_gg = rdf:nil || ?mine_gg < "50"^^xsd:decimal)
  && (?mini_gg = rdf:nil || ?mini_gg <= "50"^^xsd:decimal)
)
?detail lsu:h-m ?h_m.
FILTER (?h_m in (rdf:nil, <http://www.example.org/2015/7/hm1>))
{{
?detail lsu:v-func ?v_func.
FILTER (?v_func in
(<http://www.example.org/2015/7/vf1>,<http://www.example.org/2015/7/vf2>))
} UNION {
?detail lsu:c-n ?c_n.
FILTER (?c_n in
(<http://www.example.org/2015/7/cn1>,<http://www.example.org/2015/7/cn2>,<http://www.example.org/2015/7/cn3>,<http://www.example.org/2015/7/cn4>))
}}
?lsu lsu:lsu-d ?detail.
?lsu lsu:aF ?node.
}


On Thu, 2016-01-07 at 12:36 +, Andy Seaborne wrote:
> It looks like it is the query cost and not the result set serialization.
> 
> > So I conclude we are seeing the best performance possible unless there
> > is something terribly wrong with my queries. They are essentially of the
> > form:
> >
> 
> Details matter here - can you show a real query?
> 
> > select ?s
> > where {
> > ?nd :prop1 ;
> >   :prop2 "lit1";
> >   :prop3 ?var1;
> >   :prop4 ?var2;
> > # more properties of ?s
> 
> ?s doesn't appear until later.
> 
> There is a chance there are cross products in the real query.
> 
> > filter (?var1 > N1 && ?var1 < N2)
> > filter (?var2 in (,,...))
> 
> This usually gets optimized - maybe something else in your query is 
> blocking that.
> 
> Filter order can matter as well.
> 
> > #more filters on ?nd properties
> > ?s :p1/:p2 ?nd.
> > }
> >
> > Some of the filters get a little more complicated. And there is at least
> > one, possibly 2, UNION clauses. No OPTIONAL clauses. I've dissected the
> > queries and run each individual piece (triple + filter), and it seems to
> > be the more complicated filters that start to slow things down, as might
> > be expected.
> >
> > Thanks for your comments and interest. The performance we're seeing is
> > unacceptable for our application requirements, so I wanted to see if
> > there were any other performance factors I had missed.
> 
>   Andy
> 
> On 07/01/16 08:48, Håvard Mikkelsen Ottestad wrote:
> > Hi,
> >
> > Reordering the filters might help.
> >
> > Also, maybe a stats file would reorder your query to be faster. I dunno how 
> > often (or if) fuseki generates a stats file. You can try to generate one by 
> > hand when fuseki is shutdown: 
> > https://jena.apache.org/documentation/tdb/optimizer.html
> >
> > Also I’m wondering what the performance is like if you take this

optimizing serialization of results from fuseki

2016-01-06 Thread Paul Tyson
I have a modest (17M triple) dataset, fairly flat graph. I run some
queries selecting nodes with anywhere from 12-20 different property
values.

Result set counts are anywhere from 10,000 to 30,000 nodes. Total
execution time measured at client are in the 30-40 second range.

The web request begins streaming results immediately, but seems to take
longer than it should (based on the number of results and size of data
transfer). I also notice that the time is roughly linear with the size
of dataset--halving the dataset size halves the result set and the
execution time. I wouldn't have expected this behavior if all the time
was due to an indexed search.

My question is: is total query time limited by search execution speed,
or by marshaling and serialization of search results? 

I have tried different query patterns, and believe I have the best
queries possible for the use case.

I'm looking for other suggestions to reduce overall execution time. The
performance does not improve drastically going from 4Gb to 8 or 16Gb
RAM. My test platforms are 64-bit Windows, ranging from small server
(16Gb RAM, 4 CPU) to laptops with 4Gb RAM.

Thanks,
--Paul



Re: optimizing serialization of results from fuseki

2016-01-06 Thread Paul Tyson
On Wed, 2016-01-06 at 18:52 +, Andy Seaborne wrote:
> Hi Paul,
> 
>  > My question is: is total query time limited by search execution speed,
>  > or by marshaling and serialization of search results?
> 
> Costs are a bit of both but normally mainly query.  It also depends on 
> the client processing.
> 
> Some context please:
> 1/ What's the storage layer?
TDB behind fuseki 2.3.1

> 2/ What result set format are you getting?
text/csv

> 3/ How are you handling the results on receipt in the client?
Just writing them to file for testing.

> 
> (Håvard point about seeing data and query also applies)
Sorry, not easy to share the data.

> 
> The important point is that output is streamed.
> 
> Results are sent while the query is executing; it is not the case that the
> query executes, all the results are calculated, and then the results are produced.
> 
> To investigate, modify the query to do something like this
> 
> SELECT (count(*) AS ?C) { ... }
> 
> because then the result set cost is low and all the query is executed 
> before a result can be produced.
> 
Yes, I did that, and the time is very nearly the same.

So I conclude we are seeing the best performance possible unless there
is something terribly wrong with my queries. They are essentially of the
form:

select ?s
where {
?nd :prop1 ;
  :prop2 "lit1";
  :prop3 ?var1;
  :prop4 ?var2;
# more properties of ?s
filter (?var1 > N1 && ?var1 < N2)
filter (?var2 in (,,...))
#more filters on ?nd properties
?s :p1/:p2 ?nd.
}

Some of the filters get a little more complicated. And there is at least
one, possibly 2, UNION clauses. No OPTIONAL clauses. I've dissected the
queries and run each individual piece (triple + filter), and it seems to
be the more complicated filters that start to slow things down, as might
be expected.

Thanks for your comments and interest. The performance we're seeing is
unacceptable for our application requirements, so I wanted to see if
there were any other performance factors I had missed.

Regards,
--Paul

> Andy
> 
> 
> On 06/01/16 16:17, Paul Tyson wrote:
> > I have a modest (17M triple) dataset, fairly flat graph. I run some
> > queries selecting nodes with anywhere from 12-20 different property
> > values.
> >
> > Result set counts are anywhere from 10,000 to 30,000 nodes. Total
> > execution time measured at client are in the 30-40 second range.
> >
> > The web request begins streaming results immediately, but seems to take
> > longer than it should (based on the number of results and size of data
> > transfer). I also notice that the time is roughly linear with the size
> > of dataset--halving the dataset size halves the result set and the
> > execution time. I wouldn't have expected this behavior if all the time
> > was due to an indexed search.
> >
> > My question is: is total query time limited by search execution speed,
> > or by marshaling and serialization of search results?
> >
> > I have tried different query patterns, and believe I have the best
> > queries possible for the use case.
> >
> > I'm looking for other suggestions to reduce overall execution time. The
> > performance does not improve drastically going from 4Gb to 8 or 16Gb
> > RAM. My test platforms are 64-bit Windows, ranging from small server
> > (16Gb RAM, 4 CPU) to laptops with 4Gb RAM.
> >
> > Thanks,
> > --Paul
> >
> 




how to set fuseki options under tomcat

2016-01-05 Thread Paul Tyson
On another thread Andy mentioned fuseki options arq:optIndexJoinStrategy
and arq:optMergeBGPs, with example using "--set" command line option.

How do you set these when running under tomcat? I could not find
instructions or examples in the documentation.

Thanks,
--Paul



Re: delete/insert problem with tdbupdate

2015-11-02 Thread Paul Tyson
I guess I had some wrong assumptions about DELETE/INSERT. Will review spec more 
closely. Takeaway for me is that in general DELETE/INSERT is not equivalent to 
DELETE followed by INSERT.
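
A minimal illustration of the blank-node point (a sketch, not my real data):

PREFIX ex: <http://example.org/ns/foo#>
INSERT { _:b ex:val ?o }   # _:b is instantiated afresh for every solution row
WHERE  { ?s ex:p ?o }

If the WHERE part matches two rows, two different blank nodes are created,
even though the template mentions only one _:b -- which is why my combined
update grew the graph more than I expected.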

Thanks,
--Paul

> On Nov 2, 2015, at 03:34, Andy Seaborne <a...@apache.org> wrote:
> 
>> On 01/11/15 23:00, Paul Tyson wrote:
>>> On Sun, 2015-11-01 at 10:32 +, Andy Seaborne wrote:
>>> Paul,
>>> 
>>> Could you simplify the example please?
>> 
>>  I will work on that, but ran into some puzzling differences between
>> windows and Linux implementations of jena-3.0.0. Specifically, the linux
>> tdbupdate doesn't recognize --loc argument while Windows does. Anyway
>> that's a side issue.
>> 
>>> 
>>> What is the output of a query that does the WHERE clause when executed
>>> after the first update?
>> 
>>  This should be six records: the blank node that "is defined
>> by" ?thing_doc, its type, its term, a blank node that is its gAlt, and
>> the gAlt's type and gSyn.
> 
> Did you run the query?
> 
> I get two rows.
> 
> So 18 triples inserted, not 9.
> 
> Every triple in the template has a blank node. New and different blank nodes 
> are created on each template instantiation.
> 
> c.f. CONSTRUCT.
> 
>>> 
>>> If it has multiple matches (and it looks to me as if it will, so
>>> multiple rows with ?thing_doc, same value each time), then the INSERT in
>>> update 2 is going to put in different blank nodes, for each time the
>>> template is instantiated. If there are >= 2 rods from the WHERE, that's
>>> multiple, different blank nodes.
>> 
>> I assumed the DELETE and INSERT were separate operations, and that the
>> INSERT operated on the dataset after the DELETE had been performed. But
>> if INSERT is calculated concurrently with DELETE, then the observed
>> results might make more sense. I will study this further. It is
>> obviously behaving differently with a combined DELETE/INSERT than with a
>> discrete DELETE followed by an INSERT (in separate updates).
>> 
>> I guess that's the crux of my question: should a DELETE/INSERT update
>> operation yield the same results as a DELETE update followed by an
>> INSERT update, if the DELETE, INSERT, and WHERE clauses are the same?
>> 
>>> 
>>>Andy
>>> 
> >>> PS Did you mean a doubly nested OPTIONAL?  The layout suggests not.
>> 
>> Yes, the intent is to make the same pattern work for datasets that do
>> not have any matching records yet in the dataset, as  well as those that
>> have been populated with the INSERT triples. The ex:gAlt property on
>> ex:Thing1 instances is optional. If the outer OPTIONAL were removed we
>> would need different patterns based on whether the dataset already held
>> the target triples.
>> 
>>> 
>>>> On 31/10/15 23:37, Paul Tyson wrote:
>>>> DELETE/INSERT operation does not work as expected using tdbupdate in
>>>> jena-3.0.0.
>>>> 
>>>> Starting with an empty TDB, I run the following two updates. The first
>>>> works as expected, adding 6 triples. The second should remove those
>>>> triples and add 9 new ones. Instead, the graph now has 20 triples. I did
>>>> not analyze them closely to see where they might be coming from.
>>>> 
>>>> If I run the DELETE clause of the 2nd update separately, it removes the
>>>> expected triples, and then the INSERT by itself will add the expected
>>>> triples.
>>>> 
>>>> = 1st DELETE/INSERT =
>>>> prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
>>>> prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
>>>> prefix ex: <http://example.org/ns/foo#>
>>>> DELETE {
>>>> ?thing_old rdfs:isDefinedBy ?thing_doc ;
>>>>rdf:type ?old_type;
>>>>ex:term ?old_term;
>>>>ex:gAlt ?old_alt .
>>>> ?old_alt ?old_alt_p ?old_alt_o .
>>>> }
>>>> INSERT {
>>>> _:b1 a ex:Thing1 ;
>>>>   rdfs:isDefinedBy ?thing_doc;
>>>>   ex:term "ABCD";
>>>>   ex:gAlt [a ex:Thing2;ex:gSyn "Automatic Widget"];
>>>> .
>>>> }
>>>> WHERE {
>>>> VALUES (?thing_doc) {(<http://example.org/doc/20849DBD>)}
>>>> OPTIONAL {?thing_old rdfs:isDefinedBy ?thing_doc ;
>>>>rdf:type ?old_type ;
>>>>ex:term ?old_term .
>>>> OPTIONAL {?thing_old ex:gAlt ?old_alt.

Re: delete/insert problem with tdbupdate

2015-11-01 Thread Paul Tyson
On Sun, 2015-11-01 at 10:32 +, Andy Seaborne wrote:
> Paul,
> 
> Could you simplify the example please?

 I will work on that, but ran into some puzzling differences between
windows and Linux implementations of jena-3.0.0. Specifically, the linux
tdbupdate doesn't recognize --loc argument while Windows does. Anyway
that's a side issue.

> 
> What is the output of a query that does the WHERE clause when executed 
> after the first update?

This should be six records: the blank node that "is defined
by" ?thing_doc, its type, its term, a blank node that is its gAlt, and
the gAlt's type and gSyn.
> 
> If it has multiple matches (and it looks to me as if it will, so 
> multiple rows with ?thing_doc, same value each time), then the INSERT in 
> update 2 is going to put in different blank nodes, for each time the 
> template is instantiated. If there are >= 2 rows from the WHERE, that's 
> multiple, different blank nodes.

I assumed the DELETE and INSERT were separate operations, and that the
INSERT operated on the dataset after the DELETE had been performed. But
if INSERT is calculated concurrently with DELETE, then the observed
results might make more sense. I will study this further. It is
obviously behaving differently with a combined DELETE/INSERT than with a
discrete DELETE followed by an INSERT (in separate updates).

I guess that's the crux of my question: should a DELETE/INSERT update
operation yield the same results as a DELETE update followed by an
INSERT update, if the DELETE, INSERT, and WHERE clauses are the same?

> 
>   Andy
> 
> PS Did you mean a doubly nested OPTIONAL?  The layout suggests not.

Yes, the intent is to make the same pattern work for datasets that do
not have any matching records yet in the dataset, as  well as those that
have been populated with the INSERT triples. The ex:gAlt property on
ex:Thing1 instances is optional. If the outer OPTIONAL were removed we
would need different patterns based on whether the dataset already held
the target triples. 

> 
> On 31/10/15 23:37, Paul Tyson wrote:
> > DELETE/INSERT operation does not work as expected using tdbupdate in
> > jena-3.0.0.
> >
> > Starting with an empty TDB, I run the following two updates. The first
> > works as expected, adding 6 triples. The second should remove those
> > triples and add 9 new ones. Instead, the graph now has 20 triples. I did
> > not analyze them closely to see where they might be coming from.
> >
> > If I run the DELETE clause of the 2nd update separately, it removes the
> > expected triples, and then the INSERT by itself will add the expected
> > triples.
> >
> > = 1st DELETE/INSERT =
> > prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
> > prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
> > prefix ex: <http://example.org/ns/foo#>
> > DELETE {
> > ?thing_old rdfs:isDefinedBy ?thing_doc ;
> >rdf:type ?old_type;
> >ex:term ?old_term;
> >ex:gAlt ?old_alt .
> > ?old_alt ?old_alt_p ?old_alt_o .
> > }
> > INSERT {
> > _:b1 a ex:Thing1 ;
> >   rdfs:isDefinedBy ?thing_doc;
> >   ex:term "ABCD";
> >   ex:gAlt [a ex:Thing2;ex:gSyn "Automatic Widget"];
> > .
> > }
> > WHERE {
> > VALUES (?thing_doc) {(<http://example.org/doc/20849DBD>)}
> > OPTIONAL {?thing_old rdfs:isDefinedBy ?thing_doc ;
> >rdf:type ?old_type ;
> >ex:term ?old_term .
> > OPTIONAL {?thing_old ex:gAlt ?old_alt.
> >?old_alt ?old_alt_p ?old_alt_o.}}
> > }
> >
> > = 2nd DELETE/INSERT =
> > prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
> > prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
> > prefix ex: <http://example.org/ns/foo#>
> > DELETE {
> > ?thing_old rdfs:isDefinedBy ?thing_doc ;
> >rdf:type ?old_type;
> >ex:term ?old_term;
> >ex:gAlt ?old_alt .
> > ?old_alt ?old_alt_p ?old_alt_o .
> > }
> > INSERT {
> > _:b1 a ex:Thing1 ;
> >   rdfs:isDefinedBy ?thing_doc;
> >   ex:term "ABCD";
> >   ex:gAlt [a ex:Thing2;ex:gSyn "Automatic Widget"];
> >   ex:gAlt [a ex:Thing2;ex:gSyn "Automated Widget"];
> > .
> >
> > }
> > WHERE {
> > VALUES (?thing_doc) {(<http://example.org/doc/20849DBD>)}
> > OPTIONAL {?thing_old rdfs:isDefinedBy ?thing_doc ;
> >rdf:type ?old_type ;
> >ex:term ?old_term .
> > OPTIONAL {?thing_old ex:gAlt ?old_alt.
> >?old_alt ?old_alt_p ?old_alt_o.}}
> > }
> >
> >
> 




Re: fuseki 2.3.0 startup failed in tomcat

2015-08-15 Thread Paul Tyson
Never mind, this was an unrelated problem in server.xml file.

Regards,
--Paul

On Sat, 2015-08-15 at 12:06 -0500, Paul Tyson wrote:
 I dropped fuseki.war in webapps directory of tomcat.
 
 The server.xml has been configured to support another big webapp, but as
 far as I can tell should not exclude other webapps from running.
 
 fuseki fails to start and leaves these messages in log. Can anyone give
 a clue where to look for root cause? Thanks in advance.
 
 Aug 14, 2015 10:19:30 PM org.apache.catalina.startup.TldConfig execute
 INFO: At least one JAR was scanned for TLDs yet contained no TLDs.
 Enable debug logging for this logger for a complete list of JARs that
 were scanned but no TLDs were found in them. Skipping unneeded JARs
 during scanning can improve startup time and JSP compilation time.
 Aug 14, 2015 10:19:30 PM org.apache.catalina.core.StandardContext
 startInternal
 SEVERE: One or more listeners failed to start. Full details will be
 found in the appropriate container log file
 Aug 14, 2015 10:19:30 PM org.apache.catalina.core.StandardContext
 startInternal
 SEVERE: Context [/fuseki] startup failed due to previous errors
 Aug 14, 2015 10:19:36 PM org.apache.catalina.startup.TldConfig execute
 INFO: At least one JAR was scanned for TLDs yet contained no TLDs.
 Enable debug logging for this logger for a complete list of JARs that
 were scanned but no TLDs were found in them. Skipping unneeded JARs
 during scanning can improve startup time and JSP compilation time.
 Aug 14, 2015 10:19:36 PM org.apache.catalina.core.StandardContext
 startInternal
 SEVERE: One or more listeners failed to start. Full details will be
 found in the appropriate container log file
 Aug 14, 2015 10:19:36 PM org.apache.catalina.core.StandardContext
 startInternal
 SEVERE: Context [/fuseki] startup failed due to previous errors
 
 Regards,
 --Paul
 




fuseki 2.3.0 startup failed in tomcat

2015-08-15 Thread Paul Tyson
I dropped fuseki.war in webapps directory of tomcat.

The server.xml has been configured to support another big webapp, but as
far as I can tell should not exclude other webapps from running.

fuseki fails to start and leaves these messages in log. Can anyone give
a clue where to look for root cause? Thanks in advance.

Aug 14, 2015 10:19:30 PM org.apache.catalina.startup.TldConfig execute
INFO: At least one JAR was scanned for TLDs yet contained no TLDs.
Enable debug logging for this logger for a complete list of JARs that
were scanned but no TLDs were found in them. Skipping unneeded JARs
during scanning can improve startup time and JSP compilation time.
Aug 14, 2015 10:19:30 PM org.apache.catalina.core.StandardContext
startInternal
SEVERE: One or more listeners failed to start. Full details will be
found in the appropriate container log file
Aug 14, 2015 10:19:30 PM org.apache.catalina.core.StandardContext
startInternal
SEVERE: Context [/fuseki] startup failed due to previous errors
Aug 14, 2015 10:19:36 PM org.apache.catalina.startup.TldConfig execute
INFO: At least one JAR was scanned for TLDs yet contained no TLDs.
Enable debug logging for this logger for a complete list of JARs that
were scanned but no TLDs were found in them. Skipping unneeded JARs
during scanning can improve startup time and JSP compilation time.
Aug 14, 2015 10:19:36 PM org.apache.catalina.core.StandardContext
startInternal
SEVERE: One or more listeners failed to start. Full details will be
found in the appropriate container log file
Aug 14, 2015 10:19:36 PM org.apache.catalina.core.StandardContext
startInternal
SEVERE: Context [/fuseki] startup failed due to previous errors

Regards,
--Paul



Re: fuseki ontmodel capacity

2015-07-01 Thread Paul Tyson
Hi Andy, further questions below.

On Fri, 2015-06-26 at 18:47 +0100, Andy Seaborne wrote:
 On 25/06/15 17:35, Paul Tyson wrote:
  Hi Andy, no joy yet.
 
  On Wed, 2015-06-24 at 22:36 +0100, Andy Seaborne wrote:
  On 24/06/15 21:37, Paul Tyson wrote:
  Before working through the configuration of an ontology model in
  fuseki2, I wanted to ask if anyone has experience with large models.
 
  I estimate there will be 250K class definitions, about 40M triples.
 
  That's not that large :-)
  (There are installations I've worked with x10 that many triples).
 
  What hardware are you running on?
 
  I just took a small sample (500 class definitions), and am running on a
  Windows laptop with 4Gb memory.
 
 
 
  My queries will be for instance checking:
 
  select ?class
  where {
  _:a rdf:type ?class;
 ex:p1 v1;
 ex:p2 R1;
 
  Well, depending on the frequencies of
 
  ?  ex:p1 v1
  and
  ?  ex:p2 R1
 
  that query will be quite fast.  i.e. if there is a property-object that
  selects a few resources (100s), then that's not much of a stress test -
  should work fine.
 
  I specified OWL_MEM_MICRO_RULE_INF for the OntModel. Fuseki grinds on
  this query for a long time (more than 30 minutes) and consumes all
  memory and cpu.
 
 That's because there is inference as well as storage.  Ontology models
 don't have to have inference.
 
 The rule engine and TDB can interact badly, especially as the database 
 grows relative to the available RAM.

Is there a configuration to avoid this bad behavior?

 
 Which of the micro rules do you have a requirement for?

At this point I only need someValuesFrom, allValuesFrom, intersectionOf,
unionOf, hasValue, equivalentClass. I see a comment in one of the rules
files that intersectionOf is implemented procedurally. What triggers the
intersectionOf functionality, in case of a customized ruleset?

I also need datatype restrictions from OWL2. I started to write rules
for it, but the owl:withRestrictions facet list seems to present the
same problem as intersectionOf, as it requires iterating through a list.
I could brute-force it by writing explicit rules to cover all the
possible patterns in my data.

Regards,
--Paul

 
   Andy
 
 
 
 
 
  # ...all properties of _:a
  .
  }
 
  Is there hope of fast performance in this scenario?
 
  Yes.
 
  To be clear, my class definitions are like:
  Class1 owl:equivalentClass [owl:intersectionOf (
 [owl:unionOf (A1 A2 ...)]
 [owl:unionOf (B1 B2 ...)]
 )];
  A1 owl:equivalentClass [a owl:Restriction;
 owl:onProperty ex:p1;
 owl:hasValue v1].
  B1 owl:equivalentClass [a owl:Restriction;
 owl:onProperty ex:p2;
 owl:hasValue v2].
 
  I expect to find Class1, A1, B1 as rdf:types of [ex:p1 v1; ex:p2 v2].
 
  Eventually I need to use OWL2 datatype restrictions, so it doesn't
  appear the current Jena OWL reasoner will get me there, but I wanted to
  explore the capabilities.
 
  I need to get some other tools to validate the ontology to make sure
  there's nothing pathological, but on inspection it looks OK.
 
  Regards,
  --Paul
 
 
 
  Any other approaches for better performance?
 
  Thanks,
  --Paul
 
 
 
 
 




Re: fuseki2 ontmodel config

2015-06-25 Thread Paul Tyson
Thanks Andy. Simple fix, noted below.

However, it turns out I have OWL2 constructs (datatype restrictions)
that apparently are not handled by the Jena OWL reasoners.

On Thu, 2015-06-25 at 12:56 +0100, Andy Seaborne wrote:
 Hi Paul,
 
 On 25/06/15 03:42, Paul Tyson wrote:
  I cannot piece together a workable configuration for fuseki2 with an
  OntModel.
 
  Combining the config-tdb-dir template with info from
  http://jena.markmail.org/message/wr3f6gy5orxbszyd I get the config shown
  below.
 
  When I put this as file tdb-owl.ttl in the FUSEKI_BASE/configuration
  directory, fuseki says "Can't find .../tdb-owl.ttl" on startup. When I
  point to it with --config option on fuseki-server.bat command line,
  fuseki starts but the dataset does not appear.
 
 In each case, what does the log file show?
 
 For the --config case do you get these two lines adjacent?
 
 [...] Config INFO  Configuration file: config.ttl
 [...] Server INFO  Started 2015/06/25 12:45:20 BST on port 3030
 
 
 
 This config will not work with v2.0.0 and --config because it is missing:
 
 
 [ ja:loadClass "com.hp.hpl.jena.tdb.TDB" ] .
 
 tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
 tdb:GraphTDB  rdfs:subClassOf  ja:Model .
 

I had that much.

 
 [] rdf:type fuseki:Server ;
fuseki:services (
      <#service_tdb>
 ) .

Ah, I was missing the fuseki:services property. That fixed it.

 
 
 (the development version does not need this)
 
 
 I can't tell what the "Can't find .../tdb-owl.ttl" is caused by, but it might
 be a now-fixed bug, JENA-915
 
 Would it be possible for you to try the development build v2.3.0 (Java8 
 required).
 
 https://repository.apache.org/content/groups/snapshots/org/apache/jena/jena-fuseki/2.3.0-SNAPSHOT/
 

Unfortunately, not at this time, as I must work through remaining issues
on this proof-of-concept.

Regards,
--Paul

   Andy
 
  #service_tdb rdf:type fuseki:Service ;
   rdfs:label  TDB DB ;
   fuseki:name db ;
   fuseki:serviceQuery query ;
   fuseki:serviceQuery sparql ;
   fuseki:serviceUpdateupdate ;
   fuseki:serviceUploadupload ;
   fuseki:serviceReadWriteGraphStore  data ;
   # A separate read-only graph store endpoint:
   fuseki:serviceReadGraphStore   get ;
   fuseki:dataset   #dataset ;
   .
 
  #dataset rdf:type   ja:RDFDataset ;
  ja:defaultGraph   #model_inf ;
   .
 
  #model_inf rdf:type ja:OntModel ;
   ja:baseModel #tdbGraph ;
   ja:ontModelSpec ja:OWL_DL_MEM_RULE_INF;
  .
 
  #tdbDataset rdf:type tdb:DatasetTDB ;
  tdb:location DB ;
  .
 
  #tdbGraph rdf:type tdb:GraphTDB ;
  tdb:dataset #tdbDataset
  .
 
  Thanks,
  --Paul
 
 




Re: fuseki ontmodel capacity

2015-06-25 Thread Paul Tyson
Hi Andy, no joy yet.

On Wed, 2015-06-24 at 22:36 +0100, Andy Seaborne wrote:
 On 24/06/15 21:37, Paul Tyson wrote:
  Before working through the configuration of an ontology model in
  fuseki2, I wanted to ask if anyone has experience with large models.
 
  I estimate there will be 250K class definitions, about 40M triples.
 
 That's not that large :-)
 (There are installations I've worked with x10 that many triples).
 
 What hardware are you running on?

I just took a small sample (500 class definitions), and am running on a
Windows laptop with 4Gb memory. 

 
 
  My queries will be for instance checking:
 
  select ?class
  where {
  _:a rdf:type ?class;
ex:p1 v1;
ex:p2 R1;
 
 Well, depending on the frequencies of
 
 ?  ex:p1 v1
 and
 ?  ex:p2 R1
 
 that query will be quite fast.  i.e. if there is a property-object that 
 selects a few resources (100s), then that's not much of a stress test -
 should work fine.

I specified OWL_MEM_MICRO_RULE_INF for the OntModel. Fuseki grinds on
this query for a long time (more than 30 minutes) and consumes all
memory and cpu.

 
  # ...all properties of _:a
  .
  }
 
  Is there hope of fast performance in this scenario?
 
 Yes.

To be clear, my class definitions are like:
Class1 owl:equivalentClass [owl:intersectionOf (
  [owl:unionOf (A1 A2 ...)]
  [owl:unionOf (B1 B2 ...)]
  )];
A1 owl:equivalentClass [a owl:Restriction;
  owl:onProperty ex:p1;
  owl:hasValue v1].
B1 owl:equivalentClass [a owl:Restriction;
  owl:onProperty ex:p2;
  owl:hasValue v2].

I expect to find Class1, A1, B1 as rdf:types of [ex:p1 v1; ex:p2 v2].
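
Spelled out as a plain query, the hasValue part of that classification
amounts to roughly this (a sketch, just to show what the rules have to derive):

PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?x ?class
WHERE {
  ?class owl:equivalentClass ?r .
  ?r a owl:Restriction ;
     owl:onProperty ?p ;
     owl:hasValue   ?v .
  ?x ?p ?v .          # instances carrying the restricted value
}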

Eventually I need to use OWL2 datatype restrictions, so it doesn't
appear the current Jena OWL reasoner will get me there, but I wanted to
explore the capabilities.

I need to get some other tools to validate the ontology to make sure
there's nothing pathological, but on inspection it looks OK.

Regards,
--Paul

 
 
  Any other approaches for better performance?
 
  Thanks,
  --Paul
 
 




fuseki ontmodel capacity

2015-06-24 Thread Paul Tyson
Before working through the configuration of an ontology model in
fuseki2, I wanted to ask if anyone has experience with large models.

I estimate there will be 250K class definitions, about 40M triples.

My queries will be for instance checking:

select ?class
where {
_:a rdf:type ?class;
 ex:p1 v1;
 ex:p2 R1;
# ...all properties of _:a
.
}

Is there hope of fast performance in this scenario?

Any other approaches for better performance?

Thanks,
--Paul



fuseki2 ontmodel config

2015-06-24 Thread Paul Tyson
I cannot piece together a workable configuration for fuseki2 with an
OntModel.

Combining the config-tdb-dir template with info from
http://jena.markmail.org/message/wr3f6gy5orxbszyd I get the config shown
below.

When I put this as file tdb-owl.ttl in the FUSEKI_BASE/configuration
directory, fuseki says "Can't find .../tdb-owl.ttl" on startup. When I
point to it with --config option on fuseki-server.bat command line,
fuseki starts but the dataset does not appear.

<#service_tdb> rdf:type fuseki:Service ;
    rdfs:label "TDB DB" ;
    fuseki:name "db" ;
    fuseki:serviceQuery "query" ;
    fuseki:serviceQuery "sparql" ;
    fuseki:serviceUpdate "update" ;
    fuseki:serviceUpload "upload" ;
    fuseki:serviceReadWriteGraphStore "data" ;
    # A separate read-only graph store endpoint:
    fuseki:serviceReadGraphStore "get" ;
    fuseki:dataset <#dataset> ;
    .

<#dataset> rdf:type ja:RDFDataset ;
    ja:defaultGraph <#model_inf> ;
    .

<#model_inf> rdf:type ja:OntModel ;
    ja:baseModel <#tdbGraph> ;
    ja:ontModelSpec ja:OWL_DL_MEM_RULE_INF ;
    .

<#tdbDataset> rdf:type tdb:DatasetTDB ;
    tdb:location "DB" ;
    .

<#tdbGraph> rdf:type tdb:GraphTDB ;
    tdb:dataset <#tdbDataset> ;
    .

Thanks,
--Paul



named graph impact on query performance

2015-04-11 Thread Paul Tyson
Hi,

Any theoretical reasons or evidence that lots of named graphs in a TDB
repository will adversely affect query performance?

For example, 50,000 named graphs containing total of 11 million triples.

Some queries will be for specific graphs, but most will be union queries
over the entire repository.
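
By "specific" vs "union" I mean roughly the difference between these two
query shapes (a sketch; the graph URI is made up):

# one known graph
SELECT ?s WHERE { GRAPH <http://example.org/graph/123> { ?s ?p ?o } }

# across all named graphs
SELECT ?s ?g WHERE { GRAPH ?g { ?s ?p ?o } }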

I'm going to find out in the next few days, but curious about community
experience with this.

Thanks,
--Paul



Re: named graph impact on query performance

2015-04-11 Thread Paul Tyson
On Sun, 2015-04-12 at 03:18 +0530, Rose Beck wrote:
 I think performance largely depends on your application and queries.
 Any specific reason for using 50,000 named graphs?

Yes, but other than to say I believe it to be a viable architectural
choice, I'd rather not go into details. Of course, if performance sucks,
that would jeopardize its viability - with TDB anyway.

Regards,
--Paul

 
 On Sun, Apr 12, 2015 at 3:10 AM, Paul Tyson phty...@sbcglobal.net wrote:
  Hi,
 
  Any theoretical reasons or evidence that lots of named graphs in a TDB
  repository will adversely affect query performance?
 
  For example, 50,000 named graphs containing total of 11 million triples.
 
  Some queries will be for specific graphs, but most will be union queries
  over the entire repository.
 
  I'm going to find out in the next few days, but curious about community
  experience with this.
 
  Thanks,
  --Paul
 
 
 
 




Re: hot swap tdb behind fuseki

2015-02-13 Thread Paul Tyson
On Fri, 2015-02-13 at 16:49 +, Andy Seaborne wrote:
 On 12/02/15 22:09, Paul Tyson wrote:
  On Wed, 2015-02-11 at 10:53 +, Andy Seaborne wrote:
  Paul,
 
  You can add a new, pre-built database to a running Fuseki2 server with a
  new name but you can't hot swap an existing name at the moment in Fuseki
  itself (1 or 2).
 
  I built jena-fuseki2 out of the development repository.
 
 Fine if you want to do that but there are nightly builds for convenience 
 (these are not releases):
 
 https://repository.apache.org/content/repositories/snapshots/org/apache/jena/jena-fuseki/
 

Ah, I didn't know about that (no link on downloads page). I did it the
hard way, but learned a few things.

  I don't see
  anything in the administrative console about adding a new tdb location.
  Is that what the Add persistent dataset feature is supposed to do?
 
 Yes.
 
  That didn't work in my set-up. It didn't add the new name to the list of
  existing datasets (it did however seem to reserve the name, because I
  couldn't add the same name again).
 
 Hmm - it does for me (Firefox, Ubuntu, latest development code).
 
 "add new dataset"
    check "persistent" - add name - press "create dataset"
 
 and it jumps to the existing datasets which show the new dataset.
 
 (maybe you have cached old javascript because IIRC this is a fixed bug - 
 but I have had a lot of problems with aggressive caching by browsers - 
 control-F5 may not fix it)

OK, probably a crippled browser (icecat). I saw the new tdb directories
in run/databases, and when I restarted fuseki the new datasets appeared.

 
  Even aside from the name reuse issue (solvable), it is the stopping
  using the old database that is a problem on Windows and this is not
  fixable; Java lacks the capability [1].  It needs a JVM restart to
  release the old files in that case.  It works on Linux.
 
  I'm running it on Linux. Are you saying that if the jena-fuseki2 code
  allowed reuse of the dataset name, you could rebind a new tdb location
  to the same name without restarting the server? (Or, more generally,
  just refresh the data from the existing location, to account for
  filesystem links that may have changed.)
 
 It should allow it - it currently doesn't - JENA-869, which, when I last
 looked at the innards, showed that internal management of the name had
 got messy/unclear.  Sorry - plain old messy bug.

I'm thinking the balanced reverse proxy setup would be the best
solution, as you indicated previously. Thanks for the help.

Regards,
--Paul

 
   Andy
 
 
  Thanks,
  --Paul
 
  A proper fix to all this isn't likely before Fuseki 2.0.0 -- it would
  allow an external name to be reused with a different database location
  which half fixes the problem.  At least the internal caches of a
  database go away.
 
  Your idea of having multiple servers and have a load balancer (or a
  reverse proxy) to maintain the external valid URL and switch between
  them is a good one.  At Epimorphics, we do this in production - we
  typically have two copies of the database, two servers, for fault
  tolerance reasons and for load spikes.  We  swap servers by changing
  the load balancers, or a moral equivalent if the backends are stacks of
  several machines (sometimes, we run presentation on a different machine
  to the database - it depends on the scale of the system).
 
 Andy
 
  [1]
  http://bugs.java.com/view_bug.do?bug_id=4724038
  and several others.
 
  On 11/02/15 00:07, Paul Tyson wrote:
  On Tue, 2015-02-10 at 23:38 +, Stian Soiland-Reyes wrote:
  Are you using the tdb to swap just for reading, or would you need to
  synchronize transactions?
 
  Below I'll assume you mean 'reading', and that you want to swap
  because you have a 'newer' tdb store from somewhere.
 
  Yes, that's the use case exactly.
 
 
 
  With fuseki2 (I have not checked with Fuseki 1 which only has a REST
  interface) you can hot add a new tdb store in the web interface. If
  the tdb directory with the given name already exist under
  /etc/fuseki/databases it will be re-used and made live immediately.
 
 
  I currently use fuseki 1 but was looking to upgrade to fuseki2 anyway,
  so this sounds like it will solve my problem.
 
  In my setup I use this in combination with data loading, so that I can
  load offline with tdbloader2 and then immediately make it live in an
  existing running Fuseki 2.
 
  I create the new tdb (using tdbloader) and then update a symlink to
  target the new tdb. Currently I restart fuseki and the new tdb is read
  from the symlink. I want to eliminate the restart. It sounds like the
  fuseki2 web interface will allow this.
 
 
  There's unfortunately an open issue in the web interface with removing
  and adding a store with the same name -
  https://issues.apache.org/jira/browse/JENA-869 - so if you try this
  now with the current SNAPSHOT of Fuseki 2 you would have to make a new
  database name for every swap and copy the tdb store into that before
  adding

Re: hot swap tdb behind fuseki

2015-02-12 Thread Paul Tyson
On Wed, 2015-02-11 at 10:53 +, Andy Seaborne wrote:
 Paul,
 
 You can add a new, pre-built database to a running Fuseki2 server with a 
 new name but you can't hot swap an existing name at the moment in Fuseki 
 itself (1 or 2).

I built jena-fuseki2 out of the development repository. I don't see
anything in the administrative console about adding a new tdb location.
Is that what the "Add persistent dataset" feature is supposed to do?
That didn't work in my set-up. It didn't add the new name to the list of
existing datasets (it did however seem to reserve the name, because I
couldn't add the same name again).

 
 Even aside from the name reuse issue (solvable), it is the stopping 
 using the old database that is a problem on Windows and this is not 
 fixable; Java lacks the capability [1].  It needs a JVM restart to 
 release the old files in that case.  It works on Linux.

I'm running it on Linux. Are you saying that if the jena-fuseki2 code
allowed reuse of the dataset name, you could rebind a new tdb location
to the same name without restarting the server? (Or, more generally,
just refresh the data from the existing location, to account for
filesystem links that may have changed.)

Thanks,
--Paul
 
 A proper fix to all this isn't likely before Fuseki 2.0.0 -- it would 
 allow an external name to be reused with a different database location 
 which half fixes the problem.  At least the internal caches of a 
 database go away.
 
 Your idea of having multiple servers and a load balancer (or a 
 reverse proxy) to maintain the external valid URL and switch between 
 them is a good one.  At Epimorphics, we do this in production - we 
 typically have two copies of the database, two servers, for fault 
 tolerance reasons and for load spikes.  We  swap servers by changing 
 the load balancers, or a moral equivalent if the backends are stacks of 
 several machines (sometimes, we run presentation on a different machine 
 to the database - it depends on the scale of the system).
 
   Andy
 
 [1]
 http://bugs.java.com/view_bug.do?bug_id=4724038
 and several others.
 
 On 11/02/15 00:07, Paul Tyson wrote:
  On Tue, 2015-02-10 at 23:38 +, Stian Soiland-Reyes wrote:
  Are you using the tdb to swap just for reading, or would you need to
  synchronize transactions?
 
  Below I'll assume you mean 'reading', and that you want to swap
  because you have a 'newer' tdb store from somewhere.
 
  Yes, that's the use case exactly.
 
 
 
  With fuseki2 (I have not checked with Fuseki 1 which only has a REST
  interface) you can hot add a new tdb store in the web interface. If
  the tdb directory with the given name already exist under
  /etc/fuseki/databases it will be re-used and made live immediately.
 
 
  I currently use fuseki 1 but was looking to upgrade to fuseki2 anyway,
  so this sounds like it will solve my problem.
 
  In my setup I use this in combination with data loading, so that I can
  load offline with tdbloader2 and then immediately make it live in an
  existing running Fuseki 2.
 
  I create the new tdb (using tdbloader) and then update a symlink to
  target the new tdb. Currently I restart fuseki and the new tdb is read
  from the symlink. I want to eliminate the restart. It sounds like the
  fuseki2 web interface will allow this.
 
 
  There's unfortunately an open issue in the web interface with removing
  and adding a store with the same name -
  https://issues.apache.org/jira/browse/JENA-869 - so if you try this
  now with the current SNAPSHOT of Fuseki 2 you would have to make a new
  database name for every swap and copy the tdb store into that before
  adding it in the user interface.  You could probably hide/simplify
  that name from the URI with a simple Apache httpd ProxyPass or
  RewriteRule
 
  Thanks for the pointers and warning. I'll see if I can work it out.
 
  Regards,
  --Paul
 
 
 
 
  On 10 February 2015 at 19:19, Paul Tyson phty...@sbcglobal.net wrote:
  I've looked through the user documentation but did not find a clue to
  this problem. I have not dug too deeply into the code.
 
  The problem is to safely re-initialize a running fuseki server to read a
  new tdb location.
 
  I've thought of using 2 (or more) jetty or tomcat workers in a
  load-balancing configuration, which would allow staged restarts. But
  before I go there I thought I would ask if there is an easier way.
 
  Does anyone have a usage pattern for this, or can point me to some
  documentation or classes that would get me started?
 
  Thanks,
  --Paul
 
 
 
 
 
 
 




hot swap tdb behind fuseki

2015-02-10 Thread Paul Tyson
I've looked through the user documentation but did not find a clue to
this problem. I have not dug too deeply into the code.

The problem is to safely re-initialize a running fuseki server to read a
new tdb location.

I've thought of using 2 (or more) jetty or tomcat workers in a
load-balancing configuration, which would allow staged restarts. But
before I go there I thought I would ask if there is an easier way.

Does anyone have a usage pattern for this, or can point me to some
documentation or classes that would get me started?

Thanks,
--Paul



Re: hot swap tdb behind fuseki

2015-02-10 Thread Paul Tyson
On Tue, 2015-02-10 at 23:38 +, Stian Soiland-Reyes wrote:
 Are you using the tdb to swap just for reading, or would you need to
 synchronize transactions?
 
 Below I'll assume you mean 'reading', and that you want to swap
 because you have a 'newer' tdb store from somewhere.

Yes, that's the use case exactly.

 
 
 With fuseki2 (I have not checked with Fuseki 1 which only has a REST
 interface) you can hot add a new tdb store in the web interface. If
 the tdb directory with the given name already exists under
 /etc/fuseki/databases, it will be re-used and made live immediately.
 

I currently use fuseki 1 but was looking to upgrade to fuseki2 anyway,
so this sounds like it will solve my problem.

 In my setup I use this in combination with data loading, so that I can
 load offline with tdbloader2 and then immediately make it live in an
 existing running Fuseki 2.

I create the new tdb (using tdbloader) and then update a symlink to
target the new tdb. Currently I restart fuseki and the new tdb is read
from the symlink. I want to eliminate the restart. It sounds like the
fuseki2 web interface will allow this.

 
 There's unfortunately an open issue in the web interface with removing
 and adding a store with the same name -
 https://issues.apache.org/jira/browse/JENA-869 - so if you try this
 now with the current SNAPSHOT of Fuseki 2 you would have to make a new
 database name for every swap and copy the tdb store into that before
 adding it in the user interface.  You could probably hide/simplify
 that name from the URI with a simple Apache httpd ProxyPass or
 RewriteRule

Thanks for the pointers and warning. I'll see if I can work it out.

Regards,
--Paul

 
 
 
 On 10 February 2015 at 19:19, Paul Tyson phty...@sbcglobal.net wrote:
  I've looked through the user documentation but did not find a clue to
  this problem. I have not dug too deeply into the code.
 
  The problem is to safely re-initialize a running fuseki server to read a
  new tdb location.
 
  I've thought of using 2 (or more) jetty or tomcat workers in a
  load-balancing configuration, which would allow staged restarts. But
  before I go there I thought I would ask if there is an easier way.
 
  Does anyone have a usage pattern for this, or can point me to some
  documentation or classes that would get me started?
 
  Thanks,
  --Paul
 
 
 
 




Re: Is this a good way to get started?

2014-12-11 Thread Paul Tyson
Hi Nate,

I don't know if your questions were about Rob's particular application,
or just in general, but I'll jump in with a few generic responses in
areas I'm familiar with.

On Thu, 2014-12-11 at 17:48 -0500, Nate Marks wrote:
 This is great feedback.  Thanks for taking the time.  If you wouldn't mind,
 I have a couple follow up questions, but please don't feel pressured to
 respond.  I'm sure you're busy.
 
 When you talk about your T-box data, I think this would contain class
 hierarchies, information about which ones are disjoint, etc. Is that right?

I worried too much about T-box/A-box stuff when I started. I can't say
if it's important to your application or not, but to jena it's all just
triples. As long as it's all syntactically well-formed RDF you can work
on it with the jena toolset.
 
 `Is there ever a risk that a change to the ontology component (T-box) can
 invalidate the data component (A-box)? If so, how do you manage that?

There are plenty of eggheads on other lists who will tell you more about
that than you care to know from a theoretical point of view. I don't use
jena's inferencing tools, but I'm guessing they would help you avoid
most practical problems in this area. I validate my data using sparql,
but I am only interested in content validity, not schema conformance.

 
 When you load straight to the triple store, is there a single RDF? if not,
 do you use an assembler to gather to multiple files?

The jena utility, tdbloader, allows you to create or enlarge a TDB
repository from one or many RDF files (several concrete syntaxes
supported). Other utilities can be used to update the repository
(delete/insert). How you arrange the files to load is entirely up to
you--makes no difference to jena. The tdbloader is particularly good for
initial creation of a TDB repository, but can be used to append triples
as well.

Of course the same functionality is available through the java API.
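
For example, something along these lines (a sketch only: the store location
and file names are made up, and the package names are per Jena 2.x):

import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.tdb.TDBFactory;
import org.apache.jena.riot.RDFDataMgr;

public class TdbAppend {
    public static void main(String[] args) {
        // Opens the TDB store at this location, creating it if it does not exist.
        Dataset ds = TDBFactory.createDataset("/data/tdb");

        // Append triples from one or more RDF files into the default graph;
        // RIOT works out the syntax from each file's extension.
        RDFDataMgr.read(ds.getDefaultModel(), "tbox.ttl");
        RDFDataMgr.read(ds.getDefaultModel(), "abox.nt");

        ds.close();
    }
}

For the initial build of a large store, though, the command-line tdbloader
against an empty location is the faster route.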

 
 Does separating the T-box and A-box data have any down sides?  Is it
 invisible to reasoners , for example?
 
 Finally, I'm obviously a complete neophyte.  Am I in the wrong group?  I
 don't want to put noise in the channel

Most of the discussion here is about how to use the jena toolset, less
so about how to design an RDF application or solve specific application
problems. You might look at semantic-...@w3.org or public-...@w3.org for
general discussion of semantic web/linked data applications built with
RDF.

Also you might find the linked data patterns book helpful:
http://patterns.dataincubator.org/book/

Regards,
--Paul




Re: XXXX-Large TDB (suggestion wanted)

2014-11-14 Thread Paul Tyson
Jacek,

Sorry, I wasn't paying close attention to this thread, but saw your last
comment and wanted to chime in.

Of course jena/fuseki (like any RDF system) can't compete with SQL at the
things SQL is good at.

I load 760M triples in about 8 hours on a Linux VM on what is by now
probably a middle-of-the-road machine with an SSD and maybe 8GB of RAM. As
Andy said, the bulk loader must load into an empty store to get
performance benefits. You can just feed all your input files to it at
once--I load a couple of dozen turtle files at a time. I don't use named
graphs, so I don't know if that makes any difference.

I am still surprised at how fast most queries go, although some require
tweaking based on knowledge of the data. I'm not sure a "select distinct"
type of query is a good indicator of general usefulness.

Anyway, you need to select the best tools to get the job done, and it
sounds like you've done enough testing to determine that. I just didn't
want others to think that was a general or common experience with
jena/fuseki.

Regards,
--Paul

On Fri, 2014-11-14 at 22:25 +, Jacek Grzebyta wrote:
 Hi guys,
 
 I give up. Thanks for your help but unfortunately nothing works. I tried
 different triple stores with different combinations. None of them are able
 to work with large data on a non-server machine. But Fuseki was the most
 efficient: it took only ca. 10 min to return results for the request:
 
 select distinct ?class {
 [] a ?class
 }
 
 BigData(R) I had to kill after half an hour, whereas Virtuoso I wasn't able
 to load.
 I will use postgresql + d2rq.
 
 
 Thanks a lot,
 Jacek
 
 
 
 
 On 12 November 2014 11:46, Andy Seaborne a...@apache.org wrote:
 
  Hi Jacek,
 
  On 11/11/14 16:07, Jacek Grzebyta wrote:
 
  Dear All,
 
  I have a problem with creating a copy (for read only) of a very large
  database. The original data are located at
  ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBL-RDF/latest/.
  I want to load at least all large files (activity, assays and molecule).
  The biggest file (molecule) contains ca 400M triples and it took several
  hours to load. First of all I loaded 'molecule' into a named graph. Then I
  started to load (into the same store) the next file (activity) inside another
  named graph. After 2 hours I found only ~2M triples had been loaded.
 
 
  Loading all the data at once into an empty store is faster than adding
  data in separate updates.  The bulk loader can only do anything special on
  an empty store.
 
   Please help me. How should I organise the database? Load each named graph
  into separate tdb storage (would that be faster)? Maybe I should not use
  named graphs if I do not need them: I just wanted to split the data into
  graphs based on the subject. Maybe it is better to use a separate store for
  each data subject. But maybe if I have 3 or more stores with 400M triples in
  each, then the final service would be painfully slow. I am planning to run
  sparql queries without a reasoner but expect quite large output.
 
 
  There isn't a single best that I know of.  The details of your data and
  access requirements will matter.
 
  Some thoughts:
 
  If the SPARQL queries need to match across different data sources,
  then the data will need to be in one store.  Federated query is unlikely to help.
 
  If you don't need to keep the data separate, load into the default graph.
  It's less work, and it's easier to write the queries (alternatively, use
  unionDefaultGraph for the latter - but it's slower to load into named
  graphs than into the default graph).
 
  Make sure you have the right hardware.  RAM and SSD.  It sounds as if the
  latter 2M in 2hr might be suffering from some kind of
 
  If you can borrow a large SSD machine to load the data, you can load and
  prepare the database on that machine and use different hardware for query.
  Query needs adequate RAM but is less SSD sensitive if the RAM is large
  enough.
 
 
  Thanks a lot,
  Jacek
 
 
  Andy
 
 




text query analyzer problem

2014-09-18 Thread Paul Tyson
I've been using the configurable text query analyzer in jena 2.11.2
(fuseki 1.0.2) since it was provided by JENA-654. I use the
KeywordAnalyzer to index a field that contains part numbers, which are
mostly composed of digits and dashes, but with a fair number of alphabetic
characters as well.

I just noticed a problem searching for alphabetic strings. I can't even
detect a consistent failure mode, but it is definitely doing
case-insensitive searching. I understand this is how the Lucene
StandardAnalyzer works, which leads me to suspect the query parser is
using StandardAnalyzer instead of KeywordAnalyzer. There are other
problems too, which might be due to the query parser tokenizing the
query string incompatibly with the index.

Here's the bit of the assembler file with the entity map:

<#entMap> a text:EntityMap ;
    text:entityField  "uri" ;
    text:defaultField "text" ;
    text:map (
        [ text:field "text" ;
          text:predicate skosxl:literalForm ;
          text:analyzer [ a text:KeywordAnalyzer ]
        ]) .

Text queries work as expected when the search string is all digits and
dashes. They go wrong when alphabetic characters are present.
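
For illustration, the kind of lookup involved is roughly the following (a
sketch; the part number, dataset name, and endpoint are placeholders):

import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.ResultSetFormatter;

public class PartLookup {
    public static void main(String[] args) {
        // text:query takes the indexed property and the Lucene query string.
        String q =
            "PREFIX text:   <http://jena.apache.org/text#> " +
            "PREFIX skosxl: <http://www.w3.org/2008/05/skos-xl#> " +
            "SELECT ?label WHERE { " +
            "  ?label text:query (skosxl:literalForm 'AB1234-C') . " +
            "}";
        QueryExecution qe = QueryExecutionFactory.sparqlService(
                "http://localhost:3030/ds/query", q);
        try {
            ResultSetFormatter.out(qe.execSelect());
        } finally {
            qe.close();
        }
    }
}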

Has anyone else noticed this problem? Do I need a different entity
mapping to make the query parser use KeywordAnalyzer? Where else could I
look for the cause of this problem?

Thanks in advance.

Regards,
--Paul



Re: Configuring Jena TDB for a benchmark

2014-04-19 Thread Paul Tyson
On Sat, 2014-04-19 at 18:33 +0100, Saud Aljaloud wrote:
 Dear Jena folks,
 
 We are investigating how efficient different triple stores, including
  Jena TDB, handle literal strings within SPARQL. To this end, We are
  now working on benchmarking these triple stores against a set of
  specific queries, using the Berlin Benchmark (BSBM) test driver [1],
  dataset and matrices[2].
 

If finding strings quickly is important, use jena text indexing [1]. I
recently improved query performance by a factor of 10 by indexing a
particular property.

I'm not familiar with the benchmark test you mentioned, so I don't know
if this is appropriate for your case.

Regards,
--Paul

[1] http://jena.apache.org/documentation/query/text-query.html





Re: Sparql To SQL

2014-03-30 Thread Paul Tyson
On Sun, 2014-03-30 at 09:37 +, Kamalraj Jairam wrote:
 Hello All,
 
 Whats the best way to convert sparql to SQL using R2RML mappings
 
 and convert resultset from DB to RDF?
 

What are the givens? Do you have existing SPARQL text written against
some RDF produced by some existing R2RML mappings? Or do you have some
SPARQL and some SQL and you want to fill in the R2RML and produce some
RDF?

In any case, it is an interesting scenario that will arise more often as
companies expand their use of RDF while relying mostly on SQL.

There is a higher abstraction that should be explored and possibly
exploited to provide a general pattern for working through these
situations. Since SQL, SPARQL, and R2RML are rule languages compatible
with relational algebra (RA), it should be possible to derive a common
set of RA predicates and classes to create an RDF vocabulary. This
vocabulary can then be used to write queries as production rules in a
generic standard rule language, such as RIF (Rule Interchange Format).
The RIF source can be translated to SQL, SPARQL, or R2RML for execution
in the target system.

Going from RIF to SQL, SPARQL, or R2RML is always going to be easier
than starting from SQL or SPARQL and going to some other format.
Theoretically you should be able to partially translate R2RML to RIF
automatically (SQL embedded in the R2RML will still be opaque). But I
don't know what tools could be used to translate SQL and SPARQL texts to
generic production rules using an RA vocabulary.

The ultimate payoff from this approach is that it will be possible to
link all of your data relations and operations with meaningful business
terminology and processes. It will enable greater visibility and control
of all data operations, and put important elements of business logic in
transparent rules (e.g. RIF) instead of arcane notations such as SQL or
SPARQL (or worse, procedural code).

Regards,
--Paul



Re: jena-text indexing fields with KeywordAnalyzer

2014-03-17 Thread Paul Tyson
On Mon, 2014-03-17 at 12:58 +, bwm-epimorphics wrote:
 On 14/03/14 00:51, Paul Tyson wrote:
 
 [...]
  Has anyone else encountered this problem?
 I have.  I have an application that may require using either a different 
 analyzer or the StandardAnalyzer with a different set of stop words.
Have I missed some other way
  to improve response time for a filtered string search, or overestimated
  the possible performance improvement? (I'm new to Lucene.) Would the
  developers consider an enhancement to make this option configurable in
  the text assembler?
 Are you going to have a go at this Paul? If so that's great.

I thought I would first follow up on Osma's suggestion to look at Solr.
I can't see right away how to configure it for the jena-text indexer,
but will analyze it when I get a chance.

 
   I will likely noodle on something in the meantime.  As I couldn't find 
 one, I've created a Jira [1] for it.
 

Good, thanks. I will contribute as I can.

Regards,
--Paul

 Brian
 
 
 [1] https://issues.apache.org/jira/browse/JENA-654
 
  Regards,
  --Paul
 
 



jena-text indexing fields with KeywordAnalyzer

2014-03-13 Thread Paul Tyson
I just tried out the jena-text indexing and query capabilities of jena
2.11. Great stuff, but the property values I indexed contain part
numbers that frequently contain hyphens. Apparently Lucene's
StandardAnalyzer tokenizes on hyphens, so my initial search results were
quite puzzling.

However, even with the limited results, I can see that the text queries
are much faster than strstarts() or regex() filters on the same property
values. So I would like to try indexing the property values using
Lucene's KeywordAnalyzer. I think I can see in the code how this could
be easily done.

Has anyone else encountered this problem? Have I missed some other way
to improve response time for a filtered string search, or overestimated
the possible performance improvement? (I'm new to Lucene.) Would the
developers consider an enhancement to make this option configurable in
the text assembler?

Regards,
--Paul



sparql performance parameters and limitations

2013-06-01 Thread Paul Tyson
I'm seeking guidance for setting expectations for TDB sparql performance
as the size and complexity of the queries grows.

The dataset has about 600 million triples, around 200 million
non-literal nodes, about 500 predicates.

I generate sparql queries from logical rules, which as it turns out can
be complicated. In most cases the generated sparql performs acceptably.
But multiple (possibly nested) OPTIONAL and UNION clauses seem to tip
the scales toward poor performance.

Does anyone have stories from experience or pointers to literature on
the following points:

- Effect of multiple and nested OPTIONAL clauses on process growth.
- What sort of process growth do additional UNION clauses cause: linear,
logarithmic, exponential?
- Does the absolute number of variables or predicates mentioned in the
query affect performance as much as the complexity of the graph
patterns?
- effect of complex, non-normalized FILTER expressions

The TDB dataset is on a fairly powerful machine. I'm not so much
interested in absolute performance numbers, as I am in relative
performance of different queries on the same dataset and platform.

Thanks,
--Paul



Re: stream turtle to TDBLoader

2013-05-16 Thread Paul Tyson
Andy, thanks for the tips.

I decided to modify the producer to make n-triples. It works fine but
initial tests do not run as fast as I had hoped. When making a new TDB,
it averaged 3000 triples/sec, but I was seeing higher rates reading from
ttl files. Still need to do more testing on a higher-powered machine.

Regards,
--Paul

On Thu, 2013-05-16 at 12:50 +0100, Andy Seaborne wrote:
 On 16/05/13 04:21, Paul Tyson wrote:
  Hi,
 
  I'm trying to use the TDBLoader api to stream turtle to the bulk loader
  to create a new TDB repository.
 
  I suspect none of the TDBLoader.load*() methods accept turtle input. I'm
  using version 2.10.1.
 
 When the URL is given, they should pick up the language.  If not, please 
 can we have a complete minimal example.
 
 
  This sort of code produces an immediate RIOT exception:
 
  InputStream is = ...;
  Dataset ds = TDBFactory.createDataset("/tdb");
  TDBLoader ldr = new TDBLoader();
  ldr.loadGraph((GraphTDB) ds.getDefaultModel().getGraph(),is);
 
 When reading from a stream, it does not know the syntax and, yes, 
 N-Triples expected.
 
  (I am working from memory so I might not have the syntax quite right.)
 
  It throws a RIOT exception on the first triple it sees, which is like:
 
  iri1 a foo; p1 str;...
 
  Says it expected IRI instead of keyword a.
 
  Any pointers on how to stream turtle to TDBLoader? I do have control
  over the RDF production, so could produce straight triples. But until
  now I've been writing the RDF to ttl files, then loading with the
  tdbloader command line.
 
  Thanks,
  --Paul
 
 When streaming, you can feed through the turtle script to convert Turtle 
 to N-Triples
 
 ... files ... | turtle | tdbloader -- -
 
 At scale, the extra cross-process costs don't show any effect. 
 Parsing is not (usually) the dominant cost at scale.
 
 
 If you are incrementally loading graphs into an existing store, 
 TDBLoader does not have any advantages.  It's magic is when the store is 
 empty.
 
 RDFDataMgr.read(in, ds.getDefaultModel(), Lang.TURTLE) ;
 
 should be fine.
 
   Andy
 
 
 
 
 



stream turtle to TDBLoader

2013-05-15 Thread Paul Tyson
Hi,

I'm trying to use the TDBLoader api to stream turtle to the bulk loader
to create a new TDB repository.

I suspect none of the TDBLoader.load*() methods accept turtle input. I'm
using version 2.10.1.

This sort of code produces an immediate RIOT exception:

InputStream is = ...;
Dataset ds = TDBFactory.createDataset("/tdb");
TDBLoader ldr = new TDBLoader();
ldr.loadGraph((GraphTDB) ds.getDefaultModel().getGraph(),is);

(I am working from memory so I might not have the syntax quite right.)

It throws a RIOT exception on the first triple it sees, which is like:

iri1 a foo; p1 str;...

Says it expected IRI instead of keyword a.

Any pointers on how to stream turtle to TDBLoader? I do have control
over the RDF production, so could produce straight triples. But until
now I've been writing the RDF to ttl files, then loading with the
tdbloader command line.

Thanks,
--Paul