Re: State of Elastic/Open Search support in Fuseki

2023-06-19 Thread Adrian Gschwend

On 16.06.23 21:53, Andy Seaborne wrote:

Hi Andy,


 From the documentation:

ah thanks, I read it but still have to wrap my head around it with some
examples to understand what happens.


There is also the "one document equals one entity" model that might be
more appropriate for faceted search. It returns the subject URI with a
Lucene document for multiple triples.


same. That might be what I had in mind.

There then needs to be a facet property function. Would someone like to 
sketch one out as a GH issue?


I'll try to come up with some examples so we can see what would be useful.

** ElasticSearch - if we can negotiate the licensing issues (the client 
libs are OSS, but testing them needs a server so it impacts the build; 
there may be a testcontainers.io way round this, or optional tests - we 
need the build to be clean as well as the produced binaries), then this 
could be done, and/or Solr. It does need someone or someones to take an 
interest in this, both now and for keeping the code maintained, 
especially if any security issues arise.


But wouldn't the licensing issues be solved if we switched to OpenSearch, 
or am I missing something?


And I agree on the interest; I will think about it more and see if we 
have a case that is worth spending some time/money on.


regards

Adrian


Re: State of Elastic/Open Search support in Fuseki

2023-06-15 Thread Adrian Gschwend

On 14.06.23 14:45, Øyvind Gjesdal wrote:

Hi Øyvind,



Facet/aggregation was not implemented as extension functions in SPARQL and
I believe that it also used the same abstraction described in the jena-text
docs:


  One Jena *triple* equals one Lucene *document*

which makes aggregations/facets not available or usable from the
Elasticsearch APIs either.


yes, I saw that and I also thought that's probably not ideal. I don't 
know much about Elastic in practice; I mainly read tutorials & 
documentation. What I had in mind is that we could define, for example 
via a SHACL shape (or something comparable), what a "document" contains. 
So shapes would define how we see the document, and we could use this 
abstraction for search. The integration would take SHACL shapes, create 
a "document" out of them that is consumable by Elastic, and then we 
could use this for search.
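
Roughly the kind of thing I have in mind; this is purely an illustration,
the ex: namespace and the property list are made up:

--
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix ex:  <http://example.org/> .

# everything the shape points at would become fields of one Elastic
# "document" for the focus node (the record IRI)
ex:RecordDocumentShape a sh:NodeShape ;
  sh:targetClass ex:Record ;
  sh:property [ sh:path dct:title ] ;
  sh:property [ sh:path dct:description ] ;
  sh:property [ sh:path ex:keyword ] .
--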


The second thing is that I'm mainly interested in an integration where we 
don't have to update the Elastic index on our own. I guess the Fuseki 
integration takes care of that, so it's "in sync" all the time. I would 
also want the Elastic API available, as it is easier to use for the facet 
use-cases than pure SPARQL. Paging is not trivial in SPARQL for use-cases 
like this; the Elastic API, however, is built for that.



We switched to jena-text with Lucene after some weeks, which didn't have
aggregations either, but there was much more activity and usage for the
module, and the options for configuring from the assembler files were much
richer.


ok, any example of what you configure in there? I don't think I saw much 
in the documentation for that so far. Aggregations are definitely 
something I would like to have. One example is archival records, where 
we have a hierarchy in the data. I need to be able to show that 
hierarchy per record (which has its own IRI) and to browse by hierarchy 
levels as well. This is super easy to represent in RDF but super hard to 
query efficiently.
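
To make it concrete, a sketch of the kind of query I mean (the record IRI
and the predicate are placeholders; many archives use dct:isPartOf or
something similar). Counting records per hierarchy level on top of this is
where it gets expensive:

--
PREFIX dct: <http://purl.org/dc/terms/>

# all ancestors of one record, i.e. its position in the hierarchy
SELECT ?ancestor WHERE {
  <https://example.org/record/123> dct:isPartOf+ ?ancestor .
}
--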



At the moment I'm not sure whether I inspected the Elasticsearch APIs
directly to check the structure of the documents in the index itself,
after indexing.


What versions of Elastic did you work with?

regards

Adrian



State of Elastic/Open Search support in Fuseki

2023-06-14 Thread Adrian Gschwend



According to https://jena.apache.org/documentation/query/text-query.html 
there was support for text search using Elastic instead of Lucene in 
Fuseki at some point at least. But from what I can see it was removed 
(?) in 4.x.


We have a use-case where faceted search is important, and this is quite 
hard in SPARQL 1.1; paging & counting are less than ideal. Either the 
queries get very complex or the counts are wrong.
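
A sketch of what we typically end up writing for a single facet (the
predicates are placeholders); doing this for several facets, plus paging
over the actual result list, is where it gets painful:

--
PREFIX ex: <http://example.org/>

SELECT ?facetValue (COUNT(DISTINCT ?item) AS ?count) WHERE {
  ?item a ex:Record ;
        ex:facet ?facetValue .
  # plus the actual search restriction repeated here
}
GROUP BY ?facetValue
ORDER BY DESC(?count)
--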


What was the reason for removing that code, lack of maintenance? If so, 
any ideas on how much work it would be to bring this to the 4.x codebase 
again? I guess it might make sense to switch to OpenSearch instead, 
given the Elastic license issues.


Has anyone used, or is anyone using, Elastic with Fuseki and can share 
some experience? I found some posts here and there but not many details 
about how the integration worked.


regards

Adrian



Re: ARQ: query from passed string

2022-12-07 Thread Adrian Gschwend

On 07.12.22 17:02, Andy Seaborne wrote:

Hi Andy,


--query, --file File containing a query

is that not showing up?


yes, that is showing up, but there is no hint that one could pass a 
string as an argument; at least I did not get that idea after reading 
the help output.


I don't see it on the homepage either:

https://jena.apache.org/documentation/query/cmds.html

I can do a pull request for the website, but IMO it should also go into 
the -h output.


regards

Adrian


Re: ARQ: query from passed string

2022-12-07 Thread Adrian Gschwend

On 07.12.22 15:00, Lorenz Buehmann wrote:

Hi Lorenz,


arq --data /path/to/file.rdf  "SELECT * WHERE { ?s ?p ?o }"

--query only if you provide it as a file


ah, indeed that works, thanks! I really can't see that from the help 
page; am I missing something?


regards

Adrian


ARQ: query from passed string

2022-12-06 Thread Adrian Gschwend

Hi,

I just wanted to extract some stuff from a local file, for which I need a 
super simple one-liner query. I tried this after checking the docs:


arq --query "SELECT * WHERE { ?s ?p ?o }" --data /path/to/file.rdf

But it seems to want the query passed as a file.

Is there a reason that it cannot be passed as a simple one-liner 
argument, as in this example?


regards

Adrian

--
Adrian Gschwend
CEO Zazuko GmbH, Biel, Switzerland

Phone +41 32 510 60 31
Email adrian.gschw...@zazuko.com


Re: Configure fuseki-server with geosparql assembler

2022-02-10 Thread Adrian Gschwend

On 10.02.22 22:18, Erik Bijsterbosch wrote:

Hi Erik,


I pursued my attempt to set up a dockerised fuseki-server and
fuseki-geosparql combi application.
I created one image for both services which I can start with docker-compose
arguments.


You might want to try the work my colleague Ludovic did:

https://github.com/zazuko/fuseki-geosparql

regards

Adrian


Re: SHACL & TDB

2021-09-28 Thread Adrian Gschwend
On 27.09.21 18:05, Andy Seaborne wrote:

Hi Andy,

> Other than the service name - where does it say in-memory only? The
> validation API works on any graphs. The CLI tools do work in-memory.

Thanks for the clarification. I over-interpreted the documentation
and drew the wrong conclusion. Good to hear!

> How many millions?

Right now still in the single-digit range.

> I've done some work on SHACL to split work out into inline streaming
> (e.g. anything checking objects which is often the bulk of shapes work +
> domain and range like things), entity wide items until later (e.g. track
> what might be affected cardinality) and things that are unknown (SPARQL)
> until the end. But not in the codebase.

yeah some things can work in streaming in SHACL, others can't.

> ShEx with its "shapes map" has a different approach - the shapes map can
> direct the validation so the app can break it up into chunks or
> calculate from changes. Might be interesting to use that concept sometime.

ah never had a look at that.

thanks

regards

Adrian


SHACL & TDB

2021-09-27 Thread Adrian Gschwend
Hi group,

I recently started playing with SHACL validation in Jena. It worked well
with the CLI for the few things I tested, thanks for the work!
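
For reference, the kind of invocation I mean (the file names are just
placeholders):

shacl validate --shapes cube-shapes.ttl --data observations.ttl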

What I don't really get is where SHACL support in Jena is heading. From
what I understand in the documentation this is currently only usable in
in-memory scenarios. According to the doc the same is true for using
SHACL in Fuseki.

I guess that this will work up to a certain size of the graph. Is there
a plan/scenario where I could do the same on TDB indexes/backend?

The use-case I'm thinking about would be validating observations in RDF
data cubes with SHACL. This could potentially be millions of
observations, so I guess there is a limit to what I could validate in
the in-memory setup.

regards

Adrian


Re: Scalability

2021-07-24 Thread Adrian Gschwend
On 24.07.21 11:35, Matt Whitby wrote:

> I did say it was somewhat of a 'length of a piece of string' type
> question.  Having just dealt with relational databases for 25 years I know
> pretty much nothing about triplestores. Realistically I don't see us
> storing over 250m triples and if you say the whole Wikidata dump works ok
> (and that's sitting at 12,878,628,927) then I'm pretty sure we'll be okay.

250M triples are not a problem per se; it mostly depends on how "graphy" your
queries will be. Standard stuff should be fine; for very complex joins
Jena runs into trouble at some point.

regards

Adrian


Re: Fuseki Graph Store Protocol: Streaming or not?

2021-06-29 Thread Adrian Gschwend
On 29.06.21 20:29, Andy Seaborne wrote:

Hi Andy,

> I'd expect faster though there are a lot of environmental factors. Lots
> of question below ...

good point

> I've loaded 200+million on the default setup into a live server using
> TDB2 before.

ok good to know. I started to have some doubts, that's why I asked.

> How is the data being sent? What's the client software?

In this test it's pure curl to a named graph:

curl -X PUT \
 -n \
 -H Content-Type:application/n-triples \
 -T scope.nt \
 -G $SINK_ENDPOINT_URL \
 --data-urlencode graph=https://some-named-graph/graph/ais-metadata

> Does the data have a lot of long literals?

What is "long" in that context? It's archival records so they indeed do
have longer literals, at least partially.

> Is it "same machine" or are sender and server on different machines?

different machines; the endpoint is in a hosted Kubernetes cluster. There is
obviously some overhead because of the network, but it should not be a big
issue in this setup.

> What does the log have in it? And what's in the log if running "verbose"
> which prints more HTTP details.

What would be of interest here?

But that reminds me of something else: it's a custom Fuseki version we
made with OpenTelemetry integrated so we can get a lot more tracing:

https://github.com/zazuko/docker-fuseki-otel

(We started doing that when we had problems that were almost impossible
to debug otherwise, as the final sender is also somewhere else, behind a
proxy & other stuff that makes debugging super hard.)

> Is the server also live, running queries?

nothing extraordinary right now, no; still early phase.

> Which form of Fuseki? The low level of HTTP is provided by the web
> server - Jetty or Tomcat.

The zip we use is taken from Maven; it's
apache-jena-fuseki-${JENA_VERSION}.zip. Not sure what this one is using?

Source:
https://github.com/zazuko/docker-fuseki-otel/blob/main/image/Dockerfile#L22

> I presume this is not a set the dataset is nested in some other
> functionality?
> 
> (and what's the version, though nothing has changed directly but you
> never know... maybe a dependency)

good point; after we added OpenTelemetry I think we never went back to
the original Fuseki without modifications.

> If the data is available, I can try to run it at my end.

it is:

http://ktk.netlabs.org/misc/rdf/scope.nt.gz


> TDB1:: There is a limitation on the single transaction as it requires
> temporary heap space. With TDB1, sending chunks avoids the limit (unless
> the server is under other load and can't find the time to flush the
> transaction to the database from the journal).

ok, that is pretty much what we experienced. In other words, in this setup
TDB1 will always have this limitation; good to know, thanks.

> TDB2:: There is no such a limitation nor is it affected by a concurrent
> read load holding up freeing resources.
> 
> In fact, TDB2 does some of the work while the transaction is in-progress
> that TDB1 does at the end.

excellent. With TDB1 we never managed to write everything without an OOM;
with TDB2 it's slow, but we could write the full batch.

>> We send application/n-triples so I was expecting that it streams it.
> 
> Yes, if it can.

ok

> Loading in Fuseki is not full "tdbloader" in either TDB1 or TDB2.

ok, I expected tdbloader to "cheat" as it's super fast. Not a problem per
se, obviously. We have another setup where we load with tdbloader and
then replace the instance in Kubernetes. No outside writes are allowed in
that setup.

> As in -Xmx6G on what size of machine? If 8G-ish, it's going to suffer
> lack of space in the disk cache. 2G is likely fine.

ok will check with my devops colleagues, not sure.

> Is the storage spinning disk or SSD?

same

>> TDB2 seems to behave a bit better, it runs through without OOM but takes
>> 1.5 hours for the job while it is less than 15 minutes when we split it
>> into smaller chunks and send it in ~100k triples batches via Graph Store
>> Protocol.
> 
> That is a bit slow.

that was my feeling too.

> If that space is squeezed by Java growing the heap, it can become slow.

ok will check the setup.

> TDB2 - there's a reason why it is not TDB1 :-)

that is very good to know. So far we mainly used it in tdbloader setups,
so apparently the issues with TDB1 were less of a problem for our
use-cases.

thanks for the feedback so far!

regards

Adrian


Fuseki Graph Store Protocol: Streaming or not?

2021-06-29 Thread Adrian Gschwend
Hi everyone,

We have automated pipelines that write to Fuseki using the SPARQL Graph
Store protocol. This seems to work fine for smaller junks of data but
when we write a larger dataset of around 15 million triples in one
batch, this seems to fail.

After checking out what happens, we see an OOM error.

We send application/n-triples so I was expecting that it streams it.
When using tdbloader this size is not really an issue at all.

In this particular setup we first used TDB; the machine has 6GB of
memory assigned.

TDB2 seems to behave a bit better: it runs through without an OOM, but it
takes 1.5 hours for the job, while it takes less than 15 minutes when we
split the data into smaller chunks and send it in ~100k-triple batches via
the Graph Store Protocol.
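
For reference, a rough sketch of the batching workaround (chunk size,
endpoint and graph name are placeholders). N-Triples is one triple per
line, so a line-based split works, and POST appends to the graph rather
than replacing it:

--
#!/usr/bin/env bash
set -euo pipefail

split -l 100000 data.nt chunk-
for f in chunk-*; do
  curl -X POST \
       -H 'Content-Type: application/n-triples' \
       -T "$f" \
       -G "$SINK_ENDPOINT_URL" \
       --data-urlencode graph=https://example.org/graph/data
done
--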

Interestingly we never see more than 1GB of RAM used so I'm even more
confused.

Is this OOM error to be expected for large graph-store writes?


regards

Adrian


Join heavy query does not return

2021-04-08 Thread Adrian Gschwend
Hi everyone,

I've got a dataset of around 5 million triples on which I run relatively
join-heavy queries. It's an OLAP-cube-like structure that combines a
bunch of them in a view.

Fuseki never returns; we configured it with -Xmx60g. I could not find
any other options that could/should be adjusted by default. Is this
still the only place to assign more memory to Fuseki?

Next step would be to check out logs & see if I see anything in there.

regards

Adrian


Re: SHACL: parse does not have --out

2020-11-23 Thread Adrian Gschwend
On 24.11.20 00:37, Andy Seaborne wrote:

Hi Andy,

> (3.17.0 will have compact syntax output, 3.16.0 does not)

Ah that explains, thanks for the info.

regards

Adrian


SHACL: parse does not have --out

2020-11-23 Thread Adrian Gschwend
Hi,

According to the SHACL cli docs there should be a --out option:

shacl p --out=FMT FILE

I tried this with 3.16.0 and I get:

shacl p --out t cube-shape.ttl
Unknown argument: out

did that get lost somewhere?

regards

Adrian


Re: simple SPARQL query with UNION => timeout

2020-10-06 Thread Adrian Gschwend
On 06.10.20 16:24, Jean-Marc Vanel wrote:

> PREFIX dcat: 
> PREFIX void: 
> CONSTRUCT {
>   ?thing a ?CLASS . }
> WHERE {
>   ?thing a ?CLASS .
>   { ?thing a dcat:Dataset .} UNION
>   { ?thing a void:Dataset .}
> }

UNION surely works, but for that use-case I like the VALUES keyword; I
think this should be equivalent:

WHERE {
  VALUES ?type { dcat:Dataset void:Dataset }
  ?thing a ?type
}

regards

Adrian


Re: Jena transpiled to TypeScript/JavaScript?

2020-05-15 Thread Adrian Gschwend
On 15.05.20 12:39, Martynas Jusevičius wrote:

Hi

> RDF JavaScript frameworks are still ~20 years behind Java. So this had
> me thinking for a while -- instead of reinventing the wheel, would it
> be possible to transpile Jena to TypeScript or JavaScript?

When my colleague sees that I answered your post he will complain that I
am "feeding the troll", but I'll give it a try anyway, as some people here
might have missed what happened in the JS world over the past few years.

The goal of a JavaScript ecosystem for RDF should be to be as close as
possible to what JavaScript/Web developers expect. You might not
necessarily like that, but it's a sensible choice for attracting new
people to RDF.

Within the RDFJS effort, we managed to define a standard interface for
RDF and build tooling around it. Have a look at

https://rdf.js.org/

You will find multiple implementations and abstractions on top of these
interface definitions.

We @Zazuko build additional abstractions on top of it, for example the
RDF-Ext stack. See an introduction to it here

https://zazuko.com/get-started/developers/

and an overview of libraries here

https://github.com/rdf-ext/rdf-ext

Most recently we refactored Holger's SHACL reference implementation to
match these interfaces:

https://github.com/zazuko/rdf-validate-shacl

A great base for doing additional work is Comunica, a modular framework
for querying the web. It also provides stores & SPARQL interfaces on top:

https://comunica.linkeddatafragments.org/

Quoting from your intro phrase:

> RDF JavaScript frameworks are still ~20 years behind Java.

I know that you Martynas do not give a f-ck about anything that is not
Java and/or XML but let me ask the question anyway: What exactly would
you like to do in JavaScript that should be done in JavaScript and
cannot be done in it right now?

We @Zazuko are happily using JavaScript in Frontend & Backend (Node) and
rely on things like Fuseki or Stardog that are both written in Java on
the backend.

We are also experimenting with WASM for things where, for various
reasons, it does not make sense to approach the problem in JavaScript
itself.

But I really fail to see the point of having a full "Jena" API in
JavaScript, whatever that would look like.

regards

Adrian



Re: Fuseki's Cache-Control

2020-03-19 Thread Adrian Gschwend
On 19.03.20 12:05, Martynas Jusevičius wrote:

> Cache-Control: must-revalidate,no-cache,no-store
> Pragma: no-cache

YMMV, but my take here is:

- a SPARQL endpoint should always return the latest results
- if caching is needed, it should be transparent for the user, meaning the
SPARQL endpoint can have its caching indexes internally
- it is the middleware's/developer's job to add HTTP caching layers where
appropriate

We do that with our SPARQL proxy, for example: the middleware there sets
caching headers that are configurable.

And as usual, cache invalidation is the hard part :)

regards

Adrian


Re: Sv: Migrate CVS data into RDF

2019-04-09 Thread Adrian Gschwend
On 09.04.19 09:19, Glenn Eriksson wrote:

> I've been reading about the R2RML Ontop API for Java. But to my
> limited knowledge it seems geared towards querying and transforming
> relational DB (JDBC) to RDF and not towards merging tabular data with
> an RDF model.

Most R2RML implementations work fine with CSV as well but you can also
use RML.

We have an Xtext-based DSL for R2RML/RML, so writing the mapping files is
far less work. It's available on GitHub, using the Eclipse DSL tooling.
But it's not well documented yet, so you might have to play around with it.

https://github.com/mchlrch/experimental-rmdsl

I use this for really large mappings and it works flawlessly.

Other Java based options:

https://tarql.github.io/

https://github.com/carml/carml

carml is RML, but a much faster implementation than the reference
implementation of RML.

another Java R2RML implementation which I did not try myself:

https://github.com/nkons/r2rml-parser

But out of curiosity: do you want GTFS in RDF, or do you want to know what
it takes to get there? Because I personally try hard to avoid writing
stuff other people have already solved, unless I have a good reason to do
so :)

regards

Adrian


Re: Migrate CVS data into RDF

2019-04-08 Thread Adrian Gschwend
On 08.04.19 15:13, Glenn Eriksson wrote:

Hi Glenn,

> Use case: I want to migrate CSV data in GTFS (General Transit Feed
> Specification) format into RDF
> 
> *   I have an ontology of GTFS in Turtle format (gtfs.ttl)
> *   There are 12-13 CSV files with data that I want to merge (one at a
>     time) with the GTFS model
> *   Store the migrated data in a triple store for further processing

you might want to check out

https://github.com/linkedconnections/gtfs2lc

and

https://github.com/OpenTransport/gtfs-csv2rdf

And linkedconnections in general. Pieter does great work with this data.

> 1.  Is it possible to use the retired jena-csv module to merge CSV
> data and GTFS? I don't want to use R2RML if I can avoid it, since it's
> quite verbose.

In case you still want to do it on your own for whatever reason I would
recommend our CSV on the Web based parser, which is streaming-capable.
GTFS generates a *lot* of data so streaming is a must.

https://github.com/rdf-ext/rdf-parser-csvw

This is for example used here: https://github.com/zazuko/blv-tierseuchen-ld

Node.js based, not JVM.

regards

Adrian


Re: CSV to rdf

2019-02-15 Thread Adrian Gschwend
On 15.02.19 09:50, Conal Tuohy wrote:

> I have used (and I can recommend) CSV2RDF from
> https://github.com/theodi/csv2rdf which is compliant to the W3C's CSV on
> the Web specification. You can generate RDF from CSV file by creating a
> metadata file (in JSON format) which describes the CSV. It uses URI
> Templates to specify how to convert CSV values into resource identifiers.
> 
> NB this is different to Martynus's software of the same name.

I would also recommend CSV on the Web. Unfortunately the Ruby
implementation has quite bad performance; for that reason we wrote our
own streaming-based implementation.

https://github.com/rdf-ext/rdf-parser-csvw

There is no CLI tool right now; we use this in pipelines. For example:

https://github.com/zazuko/blv-tierseuchen-ld/

regards

Adrian


Re: Converting TriX to N-Quads with riot

2018-11-11 Thread Adrian Gschwend
On 11.11.18 23:41, Adrian Gschwend wrote:

> for tools. I will add RDFlib as well soon so we have SHACL too

RDFUnit I mean, sorry



Re: Converting TriX to N-Quads with riot

2018-11-11 Thread Adrian Gschwend
On 11.11.18 23:38, Martynas Jusevičius wrote:

> Nice to know, thanks!
> 
> We should put together a Docker image with all these utilities ;)

https://github.com/zazukoians/docker-node-java-jena

:-D

https://github.com/zazukoians/docker-node-java-jena/blob/master/Dockerfile#L22

for tools. I will add RDFlib as well soon so we have SHACL too

Pull requests/ideas are welcome!

regards

Adrian


Re: Converting TriX to N-Quads with riot

2018-11-11 Thread Adrian Gschwend
On 11.11.18 18:36, Martynas Jusevičius wrote:

> I got a tip that rapper should be able to convert N-Quads to N-Triples.
> http://librdf.org/raptor/rapper.html

I used rapper for that too, but for large datasets it was too slow. I
use "serd" now for batch conversion; it is by far the fastest lib I
found. It converts N-Quads to N-Triples as well:

https://drobilla.net/software/serd

I once opened an issue for what you are looking for:

https://github.com/drobilla/serd/issues/12

currently, I do it like this:

#!/usr/bin/env bash
set -euo pipefail

tdbdump --loc target/tdb | sed '\#example.org#d' | serdi -o ntriples - |
gzip --stdout > target/everything.nt.gz

In this example I strip out a certain graph; if you don't need that, just
skip this part. serdi throws away the graph part of the N-Quads.


regards

Adrian


BNODE & simple literal argument

2018-04-17 Thread Adrian Gschwend
Hi,

I ran a query in Stardog and Fuseki with completely different results.
The query does this (simplified):

--
SELECT ?datasetNotation ?propertyNotation ?b_property ?property WHERE {
  ?dataset a qb:DataSet ;
skos:notation ?datasetNotation ;
qb:structure/qb:component/(qb:dimension|qb:measure) ?property .

  ?property skos:notation ?propertyNotation .

BIND(BNODE(CONCAT(?datasetNotation, ?propertyNotation)) AS ?b_property)
}
--

Now the problem is that I get a unique ?b_property for every single row
in TDB/Fuseki, although the strings ?datasetNotation and
?propertyNotation are the same for the same ?dataset. Stardog does not
behave like this and returns the same bnode for each unique string I
pass to BNODE(), also in different rows.

In the spec I find:

https://www.w3.org/TR/sparql11-query/#func-bnode


--
The BNODE function constructs a blank node that is distinct from all
blank nodes in the dataset being queried and distinct from all blank
nodes created by calls to this constructor for other query solutions. If
the no argument form is used, every call results in a distinct blank
node. If the form with a simple literal is used, every call results in
distinct blank nodes for different simple literals, and the same blank
node for calls with the same simple literal within expressions for one
solution mapping.
--

I'm not sure how to interpret the last part. Can I expect the same bnode
for each result in a set or not? It's IMO either a bug in Stardog or
Fuseki :)

I tested in Fuseki with two BNODE assignments:

BIND(BNODE("sugus") AS ?b_property)
BIND(BNODE("sugus") AS ?b_property2)

That does return _:b0

But then again I just get one row back like this.

regards

Adrian


Re: Streaming CONSTRUCT/INSERTs in TDB

2018-03-13 Thread Adrian Gschwend
On 12.03.18 14:37, Andy Seaborne wrote:

> Yes - all this was just using the command line tools.
> 
> Default heap but that's different for me and for you.  I'm running with
> 8G heap (25% of 32G).
> 
> 4G didn't work.

right, I tested it now; for the current set 7GB seems to be the barrier.

> Earlier:
> """
>   And I once allocated almost all I had on my system (> 8GB)
> """
> 
> I'm afraid it sounds like that didn't get set - if you can see the
> commandline for the running java process via system tools, it will have
> the -Xmx setting.  Or your data/query are different - I see %% stuff in
> the update.

jup, I said something wrong there; it was on 3g, sorry about that.

> Don't over allocate - the TDB files are memory mapped files and that
> does not come out of the heap.

is there a rule of thumb for that? I now assigned 7GB on an 8GB machine.
But this is a VM spun up for that particular job (in GitLab) and then
destroyed right afterwards.

> It's about to run out of heap.  java8 has a peculiar feature that when
> heap usage grows, it tries to full GC to create space, then tries again
> and again, ... to the point where the machine is only doing GCs which
> are parallel hence all the CPUs go crazy.

ok, that explains it. Is that something I should add to the docs for
others? I've seen this pattern quite often but never made the link to
memory.

regards

Adrian


Re: Streaming CONSTRUCT/INSERTs in TDB

2018-03-11 Thread Adrian Gschwend
On 10.03.18 00:36, Andy Seaborne wrote:

Hi Andy,

> Executes in 2m 20s (java8) for me (and 1m 49s with java9 which is .
> Default heap which is IIRC 25% of RAM or 8G. Cold JVM, cold file cache.

wow, did you do that with TDB commandline tools? Default heap in terms
of default settings of Fuseki?

> If you have an 8G machine, an 8G heap may cause problems (swapping).

I have 16 gigs on my local system.

> Does the CPU load go up very high, on all cores? That's a sign of a full
> GC trying to reclaim space before a OOME.

yes that's exactly what is happening. What is OOME?

> If you get the same with TDB2, then the space isn't going in
> transactions in TDB1.

not sure what that means but ok :)

regards

Adrian


Re: Streaming CONSTRUCT/INSERTs in TDB

2018-03-06 Thread Adrian Gschwend
On 03.03.18 17:11, Andy Seaborne wrote:

Hi Andy,

> Do you have an example of such an update?

yes I can deliver two use-cases, with data and query. First one is this
dataset:

http://ktk.netlabs.org/misc/rdf/fuseki-lock.nq.gz

Query:

https://pastebin.com/7TbsiAii

This returns reliably in Stardog, in less than one minute. The UNION is
most probably necessary due to blank-nodes issues so I don't think I can
split them.

In Fuseki it runs out with

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit
exceeded

And I once allocated almost all I had on my system (> 8GB)

> Some cases can't stream, but it is possible some cases aren't streaming
> when they could.

ok

> Or the whole transaction is quite large which is where TDB2 comes in.

I did try that on TDB2 recently as well, same issue.

Will post the other sample later.

regards

Adrian


Streaming CONSTRUCT/INSERTs in TDB

2018-03-02 Thread Adrian Gschwend
Hi group,

I had some discussions with Andy before, outside the list. I have a
pipeline that creates a bunch of new triples via CONSTRUCT or INSERT
queries. The patterns are in my opinion pretty easy to stream: if I do
not do an insert but just a count, the query executes fine in TDB. As
soon as I do constructs on top of that, it runs forever, takes CPU like
crazy, and eventually runs out of memory.

Now my guess is that it tries to create everything in-memory which is
probably simply too much. From some googling I found:

https://issues.apache.org/jira/browse/JENA-329

https://issues.apache.org/jira/browse/JENA-205

If I get that correctly, there is streaming for this kind of query, but I
can't figure out whether I can use it in the TDB CLI interfaces like
tdbupdate.

Is there a way to use that from the CLI? My pipelines run in batches, so I
don't really care about speed; I just care that the queries finish.

regards

Adrian


Re: tdbquery: INSERT & graph output format

2018-01-15 Thread Adrian Gschwend
On 14.01.18 13:49, ajs6f wrote:

> Try tdbquery --help and you will see the flags it takes. They include:
> 
> --results= Results format (Result set: text, XML, JSON, CSV, TSV; 
> Graph: RDF serialization)
> 
> Feel free to send an improvement for 
> https://jena.apache.org/documentation/tdb/commands.html, or I will when I get 
> to it.

thanks a lot, that was helpful! Will document it

regards

Adrian


Re: tdbquery: INSERT & graph output format

2018-01-14 Thread Adrian Gschwend
On 14.01.18 15:11, Andy Seaborne wrote:

>  tdbquery --loc DB -results=NT 'CONSTRUCT WHERE { ?s ?p ?o}'
> 
> or N-TRIPLES
> 
> (but not the MIME type :-()

ah great, thanks; I did indeed try writing the MIME type

regards

Adrian


tdbquery: INSERT & graph output format

2018-01-14 Thread Adrian Gschwend
Hi,

I am using tdbquery in some pipelines I created. Is there a way to use
INSERT statements as well, instead of CONSTRUCTs? From the error message
I got, I get the impression that it behaves like the query interface on
Fuseki and not the update interface, so INSERT is not valid in that
context.

Also, I couldn't manage to get N-Triples back from the CONSTRUCT; it
always returns Turtle. Is there a way to select the graph output format?

regards

Adrian



Getting latest Fuseki release

2017-09-10 Thread Adrian Gschwend
Hi,

Is there a way to always get the latest release of Fuseki without some
regex magic followed by downloading a specific version?

I'm thinking of an API or a stable "latest" URL for the download. We would
like to automate this.
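
For context, this is roughly the regex magic we would end up doing
otherwise (the Maven Central coordinates and URL are what we currently
assume the zip is published under):

--
VERSION=$(curl -s https://repo1.maven.org/maven2/org/apache/jena/apache-jena-fuseki/maven-metadata.xml \
  | sed -n 's:.*<latest>\(.*\)</latest>.*:\1:p')
curl -O "https://repo1.maven.org/maven2/org/apache/jena/apache-jena-fuseki/${VERSION}/apache-jena-fuseki-${VERSION}.zip"
--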

thanks

Adrian


Re: SPIN support

2017-09-06 Thread Adrian Gschwend
On 06.09.17 00:21, Holger Knublauch wrote:

Hi Holger,

> Is this about SPARQL functions or functions called from Java? For SPARQL
> functions written in JavaScript, see
> 
> https://w3c.github.io/data-shapes/shacl-js/#js-functions

ok but that still expects the SPARQL engine to support that right?

regards

Adrian


Re: SPIN support

2017-09-05 Thread Adrian Gschwend
On 05.09.17 11:58, Andy Seaborne wrote:

Hi Andy,

> No progress from me - my open-source time has been going elsewhere (TDB2
> - see dev@jena email yesterday).

oh great, will check that out


> This is a restricted case where maybe there is another way?
> 
> Would a custom function do the same?

yes, absolutely

> It would be interesting to have custom functions in javascript
> (non-compiled java) that could be loaded into a server via the "server"
> section of the Fuseki configuration file.

wow, that would be very cool. That way I see a chance of being able to
provide some custom functions myself! With Java I'm out.

regards

Adrian


Re: SPIN support

2017-09-05 Thread Adrian Gschwend
On 05.09.17 00:07, Holger Knublauch wrote:

Hi Holger,

> out of interest: which features of SPIN would be of interest, and which
> of these are not covered by SHACL?
> 
> See also http://spinrdf.org/spin-shacl.html

A simple one, actually :) I saw that there is a camel-case function in
SPIN, and that's something which would be very useful for cleanup jobs.
As I mainly do those within Fuseki, I would like to have the function
available there.

regards

Adrian


Re: SPIN support

2017-09-04 Thread Adrian Gschwend
On 30.12.16 17:24, Andy Seaborne wrote:

Hi Andy,

> For integration with Fuseki there are a couple of things needed:

It's been a while; did you make any progress on SPIN API support in
Fuseki? Some stuff in SPIN would be useful.

regards

Adrian


Re: Fuseki error: BindingHashMap WARN Binding.add: null value - ignored

2016-08-25 Thread Adrian Gschwend
On 24.08.16 18:10, Andy Seaborne wrote:

Hi Andy,

> PS The &contentTypeSelect= looks wrong: it has a + in it and that gets
> decoded to a space.

ah thanks, will report that to YASGUI.

> Hard to say in such a large query. Do you have a shorter query
> 
> (ideally, use a gist or pastebin? I may have mis-decoded it)

ah sorry that went to a UI which would have decoded it. I copied it to
pastebin as well:

http://pastebin.com/66qz9Gu1

I will do some tests to see which part is triggering it

> "http://data.staatsarchiv-bs.ch/id/archivalresource/CH-27-1/eisenbahn-a-19";
> 
> but literals can't be subjects.

right, thanks for catching that. That was an error in the code which
generated the query. I fixed it, but the problem remains (also in 2.4.0
stable).

regards

Adrian




Fuseki error: BindingHashMap WARN Binding.add: null value - ignored

2016-08-24 Thread Adrian Gschwend
Hi,

We have a query with property paths which is very slow. We suspected
that it's related to this Jena bug:

https://issues.apache.org/jira/browse/JENA-1195

I'm now on today's apache-jena-fuseki-2.4.1-SNAPSHOT, and when I look at
the log while executing the slow query I see tons of entries like
[2016-08-24 10:45:08] BindingHashMap WARN  Binding.add: null value - ignored

Query we execute (endpoint is public):

http://data.alod.ch/sparql/#query=PREFIX+%3A+%3Chttp%3A%2F%2Fvoc.zazuko.com%2Fzack%23%3E%0APREFIX++rdfs%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0APREFIX++text%3A+%3Chttp%3A%2F%2Fjena.apache.org%2Ftext%23%3E%0APREFIX++rdf%3A++%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23%3E%0APREFIX++skos%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2004%2F02%2Fskos%2Fcore%23%3E%0APREFIX++xsd%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2001%2FXMLSchema%23%3E%0A%0ACONSTRUCT+%7B%0A++_%3Ab0+%3AnumberOfResults+%3Fcount.%0A++_%3Ab0+%3AqueryStart+%3Fquerystart.%0A++_%3Ab0+%3AqueryEnd+%3Fqueryend.%0A%7D%0AWHERE+%7B%0A++%7B+%0ASELECT+(COUNT(%3Fsub)+as+%3Fcount)+%7B%0A++GRAPH+%3Chttp%3A%2F%2Fdata.alod.ch%2Fgraph%2Finference-supertitle%3E+%7B%0A(%3Fsub+%3Fscore+%3Fliteral)+text%3Aquery+(skos%3AhiddenLabel+%22*%22)+.%0A++%7D%0AGRAPH+%3Fg+%7B%0A++%3Fsub+%5E%3Chttp%3A%2F%2Fpurl.org%2Fdc%2Fterms%2FhasPart%3E%2B+%22http%3A%2F%2Fdata.staatsarchiv-bs.ch%2Fid%2Farchivalresource%2FCH-27-1%2Feisenbahn-a-19%22+.%0A%7D%0A%7D%0A++%7D%0A++UNION%0A++%7B%0ASELECT+(MIN(%3Fresourcestart)+as+%3Fquerystart)+(MAX(%3Fresourceend)+as+%3Fqueryend)+%7B%0A++GRAPH+%3Chttp%3A%2F%2Fdata.alod.ch%2Fgraph%2Finference-supertitle%3E+%7B%0A(%3Fsub+%3Fscore+%3Fliteral)+text%3Aquery+(skos%3AhiddenLabel+%22*%22)+.%0A++%7D%0AGRAPH+%3Fg+%7B%0A++%3Fsub+%5E%3Chttp%3A%2F%2Fpurl.org%2Fdc%2Fterms%2FhasPart%3E%2B+%22http%3A%2F%2Fdata.staatsarchiv-bs.ch%2Fid%2Farchivalresource%2FCH-27-1%2Feisenbahn-a-19%22+.%0A%7D%0A++GRAPH+%3Ft+%7B%0AOPTIONAL+%7B%0A++%3Fsub+%3Chttp%3A%2F%2Fwww.w3.org%2F2006%2Ftime%23intervalEnds%3E+%3Fresourceend+.%0A%7D%0AOPTIONAL+%7B%0A++%3Fsub+%3Chttp%3A%2F%2Fwww.w3.org%2F2006%2Ftime%23intervalStarts%3E+%3Fresourcestart+.++%0A%7D%0AFILTER(+datatype(xsd%3Adate(%3Fresourcestart))+%3D+xsd%3Adate)%0AFILTER(+datatype(xsd%3Adate(%3Fresourceend))+%3D+xsd%3Adate)%0A++%7D%0A%7D%0A++%7D%0A%7D&contentTypeConstruct=text%2Fturtle&contentTypeSelect=application%2Fsparql-results%2Bjson&endpoint=http%3A%2F%2Fdata.admin.ch%3A3030%2Falod%2Fquery&requestMethod=POST&tabTitle=Query&outputFormat=table

any ideas of what could be going wrong here?

regards

Adrian


Re: Fuseki INSERT: GC overhead limit exceeded

2016-04-27 Thread Adrian Gschwend
On 27.04.16 17:48, Andy Seaborne wrote:

Hi Andy,

> Depending on version of the Fuseki script the heap setting can be rather
> low (non aggressive when running on a user machine, not server).
> 
> JVM_ARGS to the fuseki-server script.
> JAVA_OPTIONS to the fuseki service script.

ok, thanks, will do that. Are we talking about -Xmx here? I have 128GB
available in total; what would you recommend?
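
Just to check that I understand the mechanism, something like this for the
standalone script (the value is picked arbitrarily here)?

JVM_ARGS="-Xmx16G" ./fuseki-server --loc=databases/mydb /mydb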

regards

Adrian


Fuseki INSERT: GC overhead limit exceeded

2016-04-27 Thread Adrian Gschwend
Hi group,

I did some performance tests with different triplestores on a complex
query I had, results can be found here:

https://gist.github.com/ktk/a04e267dd776da2511692e96fc2b5d99#jena-fuseki

So far I tested Virtuoso, Stardog & Blazegraph. Jena fails with GC issues:

[2016-04-26 21:41:47] Fuseki INFO  [26] 500 GC overhead limit
exceeded (837.944 s)

I found an old post where Rob mentioned this is a TDB design issue. Is
there anything I can do about it, or will this query not work with TDB?

You can find the data & query details in my post

regards

Adrian


Re: riot, streaming & memory

2016-04-07 Thread Adrian Gschwend
On 31.03.16 11:28, Andy Seaborne wrote:

Hi Andy,

> If you find a JSON-LD parser that can convert this file to N-triples,
> could you let the list know please?

Talked to my colleague Reto last week and he had mercy:

https://github.com/zazuko/jsonldparser

See the README; currently it only supports a subset of JSON-LD. Pull
requests are welcome :) It's MIT licensed.

regards

Adrian


Re: riot, streaming & memory

2016-04-03 Thread Adrian Gschwend
On 31.03.16 11:28, Andy Seaborne wrote:

Hi Andy,

> The whole chain JSON->JSON-LD->RDF is designed around small/medium sized
> data.
> 
> If you find a JSON-LD parser that can convert this file to N-triples,
> could you let the list know please?

Will do; it might be that we get a streaming parser sooner or later in
the Node.js/JavaScript world. There are some ongoing discussions.

regards

Adrian


Re: riot, streaming & memory

2016-03-30 Thread Adrian Gschwend
On 30.03.16 18:49, Andy Seaborne wrote:

Hi Andy,

> The start of the file is somewhat long literal heavy for the geo data.
> There are some serious

yeah it is a Swiss geodata set which should be published later this year
as RDF. It gets generated as JSON-LD from its original INTERLIS format
(mainly used in Switzerland)

> It does seem to get into some kind of GC hell as the oldest GC
> generation grows which I think is the cause of lost of CPU cycles and
> little real progress.

ok, that sounds like bad code. Will I run into this issue as well when I
try loading it into Fuseki?

regards

Adrian


Re: riot, streaming & memory

2016-03-29 Thread Adrian Gschwend
On 29.03.16 19:13, Andy Seaborne wrote:

Hi Andy,

> So JSON-LD does not stream end-to-end.

Ok, I suspected something like this. We have the same problem with the
JavaScript JSON-LD library.

> At 800Mb I would have expected a large enough heap to work for N-triples
> output.  Is the file available online anywhere?

the generated file is here:

http://www.eisenhutinformatik.ch/tmp/swissNAMES3D_LV03.zip

> (and is the JSON-LD one big object?  It is not really JSON sweet spot
> for large objects)

I'm not really into JSON-LD details so not sure

regards

Adrian


riot, streaming & memory

2016-03-29 Thread Adrian Gschwend
Hi group,

I'm trying to convert an 800MB JSON-LD file to something more readable
(N-Triples or Turtle) using riot. Unfortunately I run into memory issues:

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit
exceeded

I tried -Xmx up to 16GB, but no luck so far. According to the
documentation, riot should stream if possible; is that not available for
JSON-LD, or am I missing something else?

regards

Adrian

-- 
Adrian Gschwend
Bern University of Applied Sciences




Re: Script creation of a new database in Fuseki 2.3

2015-11-10 Thread Adrian Gschwend
On 10.11.15 23:09, Andy Seaborne wrote:

Hi Andy,

> http://jena.apache.org/documentation/fuseki2/fuseki-server-protocol.html
> 
> Suggestion for improvement?

That's exactly what I was looking for thanks. Documents what I didn't
see in the index -> PEBCAK :-)

thanks

adrian


Re: Script creation of a new database in Fuseki 2.3

2015-11-10 Thread Adrian Gschwend
On 10.11.15 20:37, A. Soroka wrote:

Hi,

> That’s Java, but it should be pretty easy to translate. The important
> point is to realize that the administrative forms are at the
> “{your-fuseki-instance}/$” url. In your case you want the
> “{your-fuseki-instance}/$/datasets” section, to which you can POST
> your request for a new dataset.

thanks, that helped; I got it to work like this:

curl -u youruser:yourpassword --data "dbType=tdb&dbName=foobar"
http://localhost:3030/$/datasets

regards

Adrian


Script creation of a new database in Fuseki 2.3

2015-11-10 Thread Adrian Gschwend
Hi,

I can't seem to figure out how I can create a new (persistent) database
via scripts in Fuseki 2.3; the documentation didn't help.

I need this for Travis, so everything I do needs to be fully automated.
Any hints on what I would have to execute against the Fuseki endpoint?

regards

Adrian


Re: Content negotiation

2015-10-26 Thread Adrian Gschwend
On 24.10.15 09:45, Trevor Lazarus wrote:

> Thanks for making things clear Andy.
> 
> Is there any way to do what the Pubby interface does with Fuseki alone?

No, it doesn't.

If you don't like Pubby (I don't), you might also try Trifid-LD:

https://github.com/zazukoians/trifid-ld

regards

Adrian


Re: Fuseki 2.0 configuration issues

2015-08-29 Thread Adrian Gschwend
On 25.08.15 14:06, Andy Seaborne wrote:

Hi Andy,

> The problem is that the configuration of the "bfs" dataset is picked up
> from the configuration database.  Presumably you created it through the UI.

I did! Thanks a lot for the hint. Is this written somewhere in the
documentation? I know I am not the only one who ran into that; I've seen
similar questions outside the mailing list.

> That has an assembler that has a timeout of 3000ms for that dataset.
> 
> In Fuseki v2.3.0, UI created databases don't go into the system database
> any more - they go into run/configuration/ as turtle so they are easier
> to edit.
> 
> You can either edit, with SPARQL Update, the system database or,
> simpler, delete the system database entirely (it only had one assembler
> in it) and put a file in run/configuration/.
> 
> There is one for bfs below I used in testing, with your database
> location in it.. It also has all the unnecessary initialization stuff
> removed that is no longer needed for Fuseki 2.3 + TDB. After all, Fuseki
> itself uses TDB so TDB setup and writing for assemblers happens anyway.

great tnx, will work with this one!

regards

Adrian


Re: Fuseki 2.0 configuration issues

2015-08-24 Thread Adrian Gschwend
On 24.08.15 13:05, Andy Seaborne wrote:

Hi Andy,

> Adrian - That would be perfect.  I don't think I need the data, just
> the setup.  It would also be useful to know exactly how you are
> making the call.  A per-query timeout is possible with the header
> "Timeout:" or parameter &timeout=.

I use this curl command to execute it:

curl -H "Accept: application/n-triples" --data-urlencode
query@construct/map_municipality2classes.sparql
http://localhost:3030/bfs/sparql -o out/map_municipality2classes.nt

Didn't check the exact header sent though.

I've tarred everything including the data; it's just 14MB compressed.
Execute fuseki-server within the "fuseki" subdirectory; the shell script
fuseki-construct.sh fires a bunch of CONSTRUCT queries into the "out"
directory.

http://ktk.netlabs.org/misc/fuseki-timeout.tar.bz2

Note that the one I mention above is commented-out right now in this
shell script.

regards

Adrian


Re: Fuseki 2.0 configuration issues

2015-08-23 Thread Adrian Gschwend
On 09.07.15 11:03, Andy Seaborne wrote:

Hi Andy,


>>> Could this be JENA-918?
>>>
>>>  Andy
>>>
>>> https://issues.apache.org/jira/browse/JENA-918
>>
>> oh I didn't see there is a tdb-config as well. I just changed it there
>> to 5000ms and I still get the error.
>>
>> Can I somehow see what is set at runtime?
> 
> Not really though it would a good idea to annotate the log at the start
> of the query with the timeout set.

that would definitely help with debugging. I can't see that in the 2.3.0
release, right?

> The snapshot development version is 2.3.0-SNAPHOT
> 
> If you could try that, it would be very helpful.
> 
> The "3 second" timeout is the marker for JENA-918 but if every setting
> is different to 3000 then I can't explain the situation.  If you could
> send me all the configuration files you use, I will try to recreate this
> when I'm back online (I'm "away" this week)

I still have that on 2.3.0. Do you want just the configuration or also
the data which triggers it? It's gonna be open data so it's not a problem.

Would it be enough to tar the "run" directory or is this not portable?

regards

Adrian


Re: Fuseki 2.0 configuration issues

2015-07-06 Thread Adrian Gschwend
On 03.07.15 20:13, Andy Seaborne wrote:

Hi Andy,

>> so I don't have the impression that config.ttl is really considered when
>> I start fuseki-server in the directory where there is the "run"
>> directory placed?
> 
> Could this be JENA-918?
> 
> Andy
> 
> https://issues.apache.org/jira/browse/JENA-918

oh I didn't see there is a tdb-config as well. I just changed it there
to 5000ms and I still get the error.

Can I somehow see what is set at runtime?

We are still at 2.0.0 as release, right?

regards

Adrian


Re: Fuseki 2.0 configuration issues

2015-06-30 Thread Adrian Gschwend
On 24.06.15 15:12, Andy Seaborne wrote:

Hi Andy,

This is the first answer to your email; expect more in the coming days :-)

> When it's using the run/ area, the sources are run/config.ttl (the
> server setup), configuration/, one file per service+dataset and the
> system database (like configuration/ but a database of named graphs, not
> files).

ok, I'm a bit confused by that: I seem to run into the timeout for one
CONSTRUCT query I do, so I tweaked run/config.ttl and enabled:

 ja:context [ ja:cxtName "arq:queryTimeout" ;  ja:cxtValue "3" ] ;

(I just uncommented it from the default config). However, I still seem
to have 0 set:

[2015-06-30 16:55:34] Fuseki INFO  [4] 503 The query timed out
(restricted to null ms) (3.013 s)

so I don't have the impression that config.ttl is really considered when
I start fuseki-server in the directory where the "run" directory is
placed?

regards

Adrian


Re: SPARQL FILTER & non standard data types

2015-06-21 Thread Adrian Gschwend
On 15.06.15 17:38, Andy Seaborne wrote:

Hi Andy,

> Some thoughts:
> 
> If you can translate to xsd date/time values, < and > will work.

thanks for the various hints. James Anderson proposed another type to
use (outside of the list); I quoted the response here:
https://github.com/OpenTransport/gtfs-csv2rdf/issues/13

"if one takes your description as it stands, you seek to represent a
duration either from midnight or from beginning of service, or some
other point in time which fits your model.
if you use a duration to model exactly that, xsd:dayTimeDuration should
have sufficient capacity and the time of day for a given event would be
the combination of that duration and the start of the respective service
day."

That pretty much matches the semantics of the GTFS spec, so that would be
the best solution. I tried to figure out which xsd types are supported in
Jena/Fuseki; is there a reference somewhere?

regards

Adrian


Fuseki 2.0 configuration issues

2015-06-18 Thread Adrian Gschwend
Hi group,

I can't seem to understand the new structure of Fuseki 2.0 configuration.

In the documentation I read:

https://jena.apache.org/documentation/fuseki2/fuseki-configuration.html

"The directory FUSEKI_BASE/configuration/ with one data service
assembler per file (includes endpoint details and the dataset description.)

So I understand that as: I create a file with the dataset configuration.
In my case I did this:

--
<#service2> rdf:type fuseki:Service ;
fuseki:name "/sbb" ;   # http://host:port/da-ro
fuseki:serviceQuery "query" ;# SPARQL query service
fuseki:serviceReadGraphStore"data" ; # SPARQL Graph store
protocol (read only)
fuseki:dataset   <#dataset> ;

<#dataset> rdf:type  tdb:DatasetTDB ;
tdb:location "sbb" ;
# Query timeout on this dataset (1s, 1000 milliseconds)
ja:context [ ja:cxtName "arq:queryTimeout" ;  ja:cxtValue "1000" ] ;
# Make the default graph be the union of all named graphs.
## tdb:unionDefaultGraph true ;
--

where the tdb for "sbb" is in run/databases

I called this file sbb.ttl; when I now start fuseki-server without
arguments I get:

 ERROR Exception in initialization: Not found: sbb.ttl

When I create a full config file like in Fuseki 1.x I don't see the
dataset at all, even though it says in startup that it loads it. Config
looks like this:

http://pastebin.com/mZVKBp76

And it's loaded with the --config=config.ttl option

In the admin-interface there is no "sbb" dataset visible.

Any hints on what I'm doing wrong?

regards

Adrian


Re: Fuseki & Union Default Graph

2015-06-18 Thread Adrian Gschwend
On 18.06.15 00:55, Andy Seaborne wrote:

Hi Andy,

> If you are not seeing the default graph as the union of the two named
> graphs, then something is amiss with the setup.  How are you setting
> unionDefaultGraph? (and which version of code are you running?)

So it turns out the config file I was changing in the Docker image of
Fuseki was not the one that got loaded.

I will create a new post for some config questions I have.

regards

Adrian


Re: Fuseki & Union Default Graph

2015-06-18 Thread Adrian Gschwend
On 18.06.15 00:55, Andy Seaborne wrote:

Hi Andy,

> This is TDB?

yes

> If your turn unionDefaultGraph on, then the stored default graph is not
> visible in a query (it can be accessed via a special graph name that is
> not included in the union of named graphs) so I'd expect it to be
> inaccessible.

ah ok; interestingly, I still see the data in the default graph.

> If you are not seeing the default graph as the union of the two named
> graphs, then something is amiss with the setup.  How are you setting
> unionDefaultGraph? (and which version of code are you running?)

I use a Docker image of Fuseki 2.0, in particular
https://registry.hub.docker.com/u/stain/jena-fuseki/

I've configured my TDB like this:

<#sbbtdb> rdf:type  tdb:DatasetTDB ;
tdb:location "sbb" ;
# Make the default graph be the union of all named graphs.
tdb:unionDefaultGraph true ;
# Query timeout on this dataset (1s, 1000 milliseconds)
ja:context [ ja:cxtName "arq:queryTimeout" ;  ja:cxtValue "1000" ] ;
 .

After that I restarted the Docker container. I will check whether the
config is really read by turning off the configured endpoint as a test.

> unionDefaultGraph works (in TDB) by using the quad indexes and ignoring
> the G part; e.g. it get SPO from SPOG (and reducing any duplicates to
> get set semantics).

ok that makes sense thanks.

> You'll have to load a named graph with data to see it in the default
> union graph.

ok, so when I use tdbloader I need to provide a graph name. FYI, the
other two graphs I loaded into the store via the new web interface of
Fuseki after that.
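
So for the next load I would try something like this (the flag is what I
understand from the docs; the location and graph name are just examples):

tdbloader --loc=run/databases/sbb --graph=https://example.org/graph/gtfs data.nt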

regards

Adrian



Fuseki & Union Default Graph

2015-06-17 Thread Adrian Gschwend
Hi,

In my GTFS Fuseki endpoint I loaded the data set to the default graph
using tdbloader. Later I added two new graphs to the store with a
smaller set of data. This seems to work fine, I can address the triples
in each graph.

Now I would like to expose everything in the default graph for easier
querying; for that I added "tdb:unionDefaultGraph true" to my
configuration.

However, this doesn't seem to change anything; I still have to address
the graph to get the triples in the other two graphs.

Is that supposed to work in this combination or is this something I
would have to enable before I load triples to the store?

regards

Adrian


SPARQL FILTER & non standard data types

2015-06-15 Thread Adrian Gschwend
Hi,

I'm using Pieter Colpaert's script for transforming GTFS public transport
schedule data to RDF, see

https://github.com/OpenTransport/gtfs-csv2rdf/

According to the GTFS specification arrivalTime and departureTime can
have more than 24 hours to represent connections which are technically
on a new day but before the end of the "daily" public transport
schedule. As an example a bus departing at 01:33 would be listed as
25:33:00.

While this makes sense for what GTFS aims to do, it's highly inconvenient
for querying the data with SPARQL. When using SPARQL, one wants to be
able to use the FILTER keyword to restrict results to time ranges for the
particular query. In this form that is not possible, as we cannot assign
standard datatypes like xsd:time to the values.

As an example I would like to get all departures after 22:00 on that
particular stop/connection.

I wonder what would be needed to implement a new datatype for GTFS
datasets with that particular behaviour: what would be needed from a
Fuseki/SPARQL point of view, and is this a good idea or not? It clearly
would break any other SPARQL endpoint that does not know how to handle
this datatype, but it would not require hacks on the GTFS2RDF side.

I will collect ideas in this issue:

https://github.com/OpenTransport/gtfs-csv2rdf/issues/13

regards

Adrian


Re: Fuseki 2.0 frontend & localhost

2015-05-05 Thread Adrian Gschwend
On 05.05.15 17:29, Andy Seaborne wrote:

Hi Andy,

> Are you running the standalone server as it comes straight out of the
> distribution?

yes

> By default, the admin interface only responds to localhost for security
> reasons.

ah ok. Makes sense

> Have you modified shiro.ini at all?

No, I actually checked the Shiro part in the doc but it seems to be very
minimal so far.

If I get it to work I will contribute it there.

Regarding Shiro, I managed to get everything to work by enabling
anonymous access. I'm not really sure what I am supposed to do to get
basic auth; for example, where would I log in after that? I couldn't see
anything.
> If you want more sophisticated security, then at the moment, recommend
> using the facilities of a web app container like Tomcat or Jetty, with
> Fuseki's shiro.ini set to not block things that that point.

What is there seems to be enough for the moment.

regards

Adrian


Fuseki 2.0 frontend & localhost

2015-05-05 Thread Adrian Gschwend
Hi group,

While the new Fuseki 2.0 seems to work quite well on localhost, I was not
very successful getting it to work on anything other than that. It's
pretty much broken, in the sense that I can't create databases and thus
can't do much else either.

Is that a known problem and if not, should I file bugs?

regards

Adrian


Re: Fuseki can't find Lucene text index

2015-02-05 Thread Adrian Gschwend
On 05.02.15 19:13, Andy Seaborne wrote:

Hi Andy,

> You can just touch the text index with:
> 
> SELECT (count(*) AS ?C) { ?x text:query  }
> 
> below that level, you can use Lucene tools themselves (or a java program
> to access the Lucene index).

great, thanks, that helped. I did get results with * but nothing when I
searched for a keyword. Then I disabled this one:

 text:analyzer [ a text:LowerCaseKeywordAnalyzer ]

and with the default one it seems to work. Will read more docs for the
analyzers.

regards

Adrian




Re: Fuseki can't find Lucene text index

2015-02-05 Thread Adrian Gschwend
On 05.02.15 17:23, Andy Seaborne wrote:

Hi andy,

> It probably means the dataset isn't linked to its index in the
> fuseki.ttl configuration file.  Just having a text index isn't enough
> (there can be multiple datasets in one configuration).

that was it, thanks; it still pointed to the TDB dataset instead of the
text-indexed one.
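
For anyone finding this later, the relevant bit, trimmed down (resource
names are illustrative): the service's fuseki:dataset has to point at the
text dataset, which wraps the TDB dataset together with the Lucene index.

--
<#text_dataset> rdf:type text:TextDataset ;
    text:dataset <#tdb_dataset> ;
    text:index   <#indexLucene> .

<#indexLucene> a text:TextIndexLucene ;
    text:directory <file:Lucene> ;
    text:entityMap <#entMap> .
--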

Now it does search, but it is super slow and never returns a result, even
with strings I find in the index. Is there any way to debug how many
entries are in the index and whether they return useful answers?

regards

Adrian


Fuseki can't find Lucene text index

2015-02-05 Thread Adrian Gschwend
Hi everyone,

I've set up a Lucene text index according to the documentation. The
configuration & creation of the index seems to have worked out:

% java -cp $FUSEKI_HOME/fuseki-server.jar jena.textindexer --desc=fuseki.ttl
INFO  12494 (2498 per second) properties indexed

I see the Lucene index in the directory I configured and if I execute
"strings" on it I do find values that make sense.

When I execute a query on this configuration I get:

--
16:19:17 WARN  Failed to find the text index : tried context and as a
text-enabled dataset
16:19:17 WARN  No text index - no text search performed
16:19:17 INFO  [2] exec/select
16:19:17 INFO  [2] 200 OK (139 ms)
--

the EntityMap itself looks like this:

--
<#entMap> a text:EntityMap ;
text:entityField  "uri" ;
text:graphField   "graph" ;
text:defaultField "title" ;
text:map (
 # dc:title
 [ text:field "title" ;
   text:predicate dc:title ;
   text:analyzer [ a text:LowerCaseKeywordAnalyzer ] ]
 # locah:note
 [ text:field "note" ;
   text:predicate locah:note ;
   text:analyzer [ a text:LowerCaseKeywordAnalyzer ] ]
 ) .
--

And the query is:

--
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX text: <http://jena.apache.org/text#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT  ?s ?title WHERE {

  GRAPH    {
  ?s dc:title ?title .
  ?s text:query (dc:title 'Unfug' 10);

}} LIMIT 10
--

Any ideas what could be going wrong here? Googling didn't help much.

regards

Adrian


Re: Split strings with SPARQL

2014-07-04 Thread Adrian Gschwend
On 04.07.14 14:41, Andy Seaborne wrote:

> http://jena.apache.org/documentation/query/library-propfunc.html
> 
> apf:strSplit

fantastic, thanks!
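
For the archive, a quick sketch of how that might look, by analogy with the
spif example from my original mail (untested; I'm assuming the apf namespace
<http://jena.apache.org/ARQ/property#> and that strSplit takes the input
string plus a separator/regex):

PREFIX :    <http://example.org/>   # placeholder namespace
PREFIX apf: <http://jena.apache.org/ARQ/property#>

CONSTRUCT { :x :p ?member }
WHERE {
  :x :p ?o .
  # ?member gets bound to each part of ?o, split on ","
  ?member apf:strSplit (?o ",")
}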

regards

Adrian



Split strings with SPARQL

2014-07-04 Thread Adrian Gschwend
Hi group,

I'm moving a lot of legacy data to RDF these days, and in several mostly
unrelated datasets I came across fields which hold multiple values and
should be split up into multiple entries. Imagine multiple names as an
example.

This is quite common in SQL datasets which are not properly normalized,
which seems to be more common than I thought :-) Fortunately such fields
often use separators, so one could split them up pretty easily.

I googled around to see whether there is anything I can do about this with
SPARQL (to avoid scripting) and found this posting:
http://answers.semanticweb.com/questions/19162/split-and-trim-strings-in-sparql

The most interesting answer is by Bob DuCharme, who showed how it is done
in TopBraid:

CONSTRUCT { :x :p ?member }
WHERE {
  :x :p ?o .
  ?member spif:split (?o ",")
}

I found this a very useful function; is there anything like it in Jena? I
couldn't find anything documented so far.

regards

Adrian




Re: Building Semantic Web Application

2014-04-18 Thread Adrian Gschwend
On 18.04.14 12:09, Mohammad Hailmizan Bin Jafar wrote:

Hi Mohammad,

> I would like to build e-govt service application (e-services) such as
> Land Information System. I was asked by my superior to do a research
> study on this. Can this be done?

Yes, this can be done; I think semantic web technologies are even an
excellent choice for it. The most advanced examples I've seen so far can
probably be found at data.gov.uk, which provides a lot of information based
on semantic web technologies.

Some examples:

http://data.ordnancesurvey.co.uk/
http://www.legislation.gov.uk/

You get an overview of other data sets there at

http://data.gov.uk/data/search?res_format=RDF

The biggest risk you take is getting to know the new technology stack. It
will take your team some time to start to understand it, but once you get
it, you probably won't go back :-) In Switzerland we did a prototype for
the statistical office to learn how to use the stack. This raised quite
some interest from other offices, and currently it looks like the
government will start using RDF and Linked Data in some upcoming projects.
I would recommend starting small and building some sample applications
first, and then seeing if this fits what you expect. That way you don't
take big financial risks at the beginning and you have a better chance of
a quick success.

If you need to know more, let me know

regards

Adrian






Re: Building Semantic Web Application

2014-04-18 Thread Adrian Gschwend
On 18.04.14 08:32, Mohammad Hailmizan Bin Jafar wrote:

Hi Mohammad,

> Hi, I am new at developing semantic web apps. I have downloaded
> e-book called Semantic Web Programming by Wiley 2009 which is I think
> already outdated. Do you have any new book that you recommend ?

it really depends on what you want to learn; the two books I currently
recommend to people who are new to the domain are:

http://www.learningsparql.com/

and

http://www.amazon.com/Semantic-Web-Working-Ontologist-Second/dp/0123859654/

Learning SPARQL is about the graph query language, but in my opinion it is
also a great introduction to some real-world examples and not just SPARQL.
The second book is big and some chapters are quite tough to read, but it
gives the best introduction to ontologies I've read so far.

What would you like to build in the end?

regards

Adrian


SPARQL Service Description

2014-03-20 Thread Adrian Gschwend
Hi group,

Am I right that Fuseki currently does not support SPARQL Service
Description: http://www.w3.org/TR/sparql11-service-description/ ?

If so, is there any special reason, or has just no one done it so far?
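
To make it concrete, what I would expect from the endpoint on a plain GET
is something roughly like this (a sketch using the SD vocabulary; the
endpoint URI is a placeholder):

@prefix sd: <http://www.w3.org/ns/sparql-service-description#> .

<> a sd:Service ;
   sd:endpoint <http://example.org/dataset/sparql> ;
   sd:supportedLanguage sd:SPARQL11Query, sd:SPARQL11Update ;
   sd:resultFormat <http://www.w3.org/ns/formats/SPARQL_Results_JSON> .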

regards

Adrian



Fwd: Re: @context vs. Turtle @prefix

2013-09-06 Thread Adrian Gschwend
Hi group,

As a follow-up to the JSON-LD discussion: IMHO this would make JSON-LD very
useful for web developers, so it would be great if

- Fuseki returned JSON-LD when application/ld+json is requested
- Fuseki used @context for all the prefixes defined in a DESCRIBE or
CONSTRUCT query

what do you think?

regards

Adrian
--- Begin Message ---
On Sep 6, 2013, at 12:19 AM, Adrian Gschwend  wrote:

> Hi group,
> 
> (repost, didn't make it through the queue last night it seems)
> 
> I'm working on a Pubby-like basic LDP frontend which so far can return
> classical RDF representations of URIs, like text/turtle,
> application/rdf+json etc.
> 
> To convince web developers to use it I would like to introduce json-ld
> as well. My goal is to have something like a well prefixed turtle
> representation but in json-ld. This means that I want to see as few URIs
> as absolutely necessary for the predicates themselves.
> 
> The prefixed sample turtle output looks like this
> 
> http://ktk.netlabs.org/misc/12099-shrink.ttl
> 
> With the help of http://rdf-translator.appspot.com/ I managed to get a
> basic json-ld version of this:
> 
> http://ktk.netlabs.org/misc/12099.jsonld
> 
> I do see that it did compact some things compared to the
> application/rdf+json output. But I guess it's not "sexy" enough yet so
> that a web developer would be able to use it without big issues.

The key is in using the proper context when compacting. My own RDF Writer
interface to JSON-LD takes the prefixes found on input and uses them to
automatically generate a context for the JSON-LD output. You can see it in
operation at
<http://rdf.greggkellogg.net/distiller?format=jsonld&in_fmt=turtle&uri=http://ktk.netlabs.org/misc/12099-shrink.ttl>,
although note that my service has been flaky lately due to something else
which is eating up memory on my VPS. It basically gives you what I think you
want:

{
  "@context": {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dac": "http://data.admin.ch/vocab/",
    "dbp": "http://dbpedia.org/property/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dbo": "http://dbpedia.org/ontology/",
    "dbpedia": "http://dbpedia.org/resource/",
    "owl": "http://www.w3.org/2002/07/owl#"
  },
  "@id": "http://data.admin.ch/bfs/municipality/12099",
  "@type": "dac:Municipality",
  "dac:canton": {
    "@id": "http://data.admin.ch/bfs/canton/GE"
  },
  "dac:cantonAbbreviation": "GE",
  "dac:district": {
    "@id": "http://data.admin.ch/bfs/district/10187"
  },
  "dac:districtHistId": 10187,
  "dac:historyMunicipalityId": 12099,
  "dac:municipalityAdmissionDate": {
    "@value": "1960-01-01",
    "@type": "xsd:date"
  },
  "dac:municipalityAdmissionMode": 20,
  "dac:municipalityAdmissionNumber": 1000,
  "dac:municipalityDateOfChange": {
    "@value": "1960-01-01",
    "@type": "xsd:date"
  },
  "dac:municipalityEntryMode": 11,
  "dac:municipalityId": 6621,
  "dac:municipalityLongName": "Genève",
  "dac:municipalityShortName": "Genève",
  "dac:municipalityStatus": 1,
  "dac:neighboringMunicipality": [
    { "@id": "http://data.admin.ch/bfs/municipality/10902" },
    { "@id": "http://data.admin.ch/bfs/municipality/10918" },
    { "@id": "http://data.admin.ch/bfs/municipality/11400" },
    { "@id": "http://data.admin.ch/bfs/municipality/11431" },
    { "@id": "http://data.admin.ch/bfs/municipality/11599" },
    { "@id": "http://data.admin.ch/bfs/municipality/12606" },
    { "@id": "http://data.admin.ch/bfs/municipality/12624" },
    { "@id": "http://data.admin.ch/bfs/municipality/13110" }
  ],
  "dbo:abstract": {
    "@value": "Genf ist eine politische Gemeinde und der Hauptort des Kantons
Genf in der Schweiz. Genf ist mit seinen rund 192'000 Einwohnern nach Zürich
die zweitgrösste Stadt der Schweiz. Mit 46,8 Prozent Ausländern zählt Genf,
neben Leysin (61,7 Prozent) und Kreuzlingen (50,0 Prozent), zu den Städten mit
hohem Ausländeranteil. Die Stadt liegt am südwestlichen Ra

Re: JSON-LD as a Fuseki output format

2013-09-05 Thread Adrian Gschwend
On 05.09.13 22:33, Adrian Gschwend wrote:

> IMHO json-ld is only useful if everything is "prefixed", otherwise we
> can directly work with application/rdf+json as it looks pretty much the
> same (to me at least).

OK, I correct myself: you do get some kind of nesting, so it's a bit more
readable for graphs.

regards

Adrian


Re: JSON-LD as a Fuseki output format

2013-09-05 Thread Adrian Gschwend
On 05.09.13 22:05, Andy Seaborne wrote:

> In particular, whether the pure RDF to JSON-LD syntax produces something
> a normal web dev would relate to.

good point, I'm just trying to understand that too (will post to their
list as I don't get it yet).

IMHO json-ld is only useful if everything is "prefixed", otherwise we
can directly work with application/rdf+json as it looks pretty much the
same (to me at least).

IIRC a CONSTRUCT query in Fuseki will use the PREFIXes defined and, when
text/turtle is requested, return a "shrunk" file, which is very readable
(by the way, would that be true for DESCRIBE as well?).

So my proposal/wish would be: if no PREFIXes are defined, you get the basic
JSON-LD view (not sure what they call that one) for application/ld+json,
and if we do have PREFIXes, we return a "compacted" form using @context for
all the prefixes.

But then again, maybe I just don't get JSON-LD yet :)

> We're approaching a jena release.  It is important this release happens
> soon.
> 
> At the moment, this isn't in it.  It does introduce some new
> dependencies so the legal-stuff needs doing but it *might* just be
> possible to get it in.  If feedback were forthcoming ... hint hint ...

It would be great IMHO, because right now it's pretty much impossible to
get JSON-LD directly from a triplestore. In fact I want to create a sample
output from my platform, as I have to convince some webdevs at a conference
in two weeks.

regards

Adrian


Re: JSON-LD as a Fuseki output format

2013-09-05 Thread Adrian Gschwend
On 21.07.13 15:38, Andy Seaborne wrote:

Hi guys,

> Joachim brings up an issue that I've seen several times now - not
> everything is table shaped - and CONSTRUCT is one way to address that.

I was just running into the same issue today: I want to make
DESCRIBE/CONSTRUCT queries available to classical web developers, and
JSON-LD seems to be the way to go now.

> The only things missing are adding in the JSON_LD module and updating
> Fuseki's content negotiation.

sounds great, any progress on that one?

regards

Adrian




Re: Suggested JVM Settings

2013-09-03 Thread Adrian Gschwend
On 03.09.13 16:45, Andy Seaborne wrote:

> I'd expect them to be the same but I don't have any input to confirm
> that.  Do you have any figures?

The problem is that I currently don't run the systems on the same hardware,
so it's not possible to compare results directly right now. If I find some
time I will try to do some tests on my data set. I do have some queries
which need quite some time to complete.

I also want to compare them to the DB2 backend which IBM now offers for
Fuseki (instead of TDB). I wonder if it really is faster or if they just
claim it is :)

> I currently think it's a Windows matter in the handling of mmap files
> (from java).

Yeah, I can imagine that the JVM has to do something there. AFAIK it's
called File Mapping in the Windows API.


regards

Adrian


Re: Suggested JVM Settings

2013-09-03 Thread Adrian Gschwend
On 03.09.13 16:26, Andy Seaborne wrote:

> But mmap files do not give the speed up as they do on Linux.  It's been
> reported as being neutral.  It still flexs the memory size and avoids
> the need to size the heap at start-up.

ok thanks for the clarification. Is there a difference between Linux and
other Unixes or should this be pretty much the same? I run Fuseki on
FreeBSD and Solaris as well.

regards

Adrian


Re: Suggested JVM Settings

2013-09-03 Thread Adrian Gschwend
On 03.09.13 14:53, Andy Seaborne wrote:

Hi Andy,

> TDB will use the rest of unused memory via memory mapped files - these
> are not part of the heap and don't count for -Xmx.

Is this true for all platforms or only Unix? I thought you once mentioned
that this is not available on Windows.

regards

Adrian


Re: Restricting Fuseki to localhost

2013-09-02 Thread Adrian Gschwend
On 02.09.13 21:12, Rob Vesse wrote:

Hi Rob,

> It's a known bug - JENA-499
> (https://issues.apache.org/jira/browse/JENA-499)

thanks for the pointer

> The --host argument is not actually supported in past versions of Fuseki
> 
> The fix which is available in the dev snapshots (1.0.0-SNAPSHOT) adds a
> --localhost argument to do this instead.

ok that will do for the moment. Any ETA for 0.2.8?

Andy asks about use cases for binding to other IPs. While it's not very
common, I did have this several times in the past when I had more than one
IP available on a host and wanted to restrict the daemon to one specific IP.
Most people will probably never need that, but if you have more complex
network setups it is handy.

I would call it --listen though, as it's an option for which interface you
listen on.

Regarding IPv6, I'm not sure what is to blame. On my Mac I have localhost
defined in /etc/hosts for both IPv4 and IPv6. ping resolves to 127.0.0.1
while ping6 resolves to ::1.

dig/nslookup/host will not return any of these as they *always* bypass
the hosts file (by design). For that reason some network administrators
return 127.0.0.1 as an A-record for "localhost" in the local domain
which is IMHO plain wrong.

> The documentation has been corrected behind the scenes but the ASF CMS
> does not let us selectively publish parts of the website so the correction
> won't appear on the main site until we are ready to publish it.

ok

regards

Adrian


Restricting Fuseki to localhost

2013-09-02 Thread Adrian Gschwend
Hi group,

According to the Fuseki documentation there is a --host argument for
restricting it to some IPs:

https://jena.apache.org/documentation/serving_data/index.html#running-a-fuseki-server

but:

./fuseki-server --config=fuseki-persistent.ttl --host=127.0.0.1
Unknown argument: host

--port seems to work.

Version: Fuseki 0.2.7 2013-05-11T22:05:51+0100

Am I missing something here?

thanks

Adrian





Re: Jena 2.10.0 : request for pre-release testing

2013-02-12 Thread Adrian Gschwend
On 12.02.13 14:11, Claude Warren wrote:

> Is that information on a webpage somewhere?  I'd like to be able to point
> to it in any posts.

https://mail-archives.apache.org/mod_mbox/jena-users/201302.mbox/%3C511A076A.2030907%40apache.org%3E

:-D

cu

Adrian


Re: Problem with a federated query

2013-02-08 Thread Adrian Gschwend
On 08.02.13 09:29, Andy Seaborne wrote:

Hi Andy,

> Signified says in a comment on that answers.semanticweb.com question
> that Samstagern has a non-integer value.

He was right; I just posted this in the answer:

--
You are right, that's what I get from Samstagern:

http://dbpedia.org/resource/Samstagern "0138.046"@en
http://www.richterswil.ch

Same with rapper:

dbpedia-owl:municipalityCode "0138.046"@en ;
But the HTML view says "138 (xsd:integer)", wtf...

Thanks for the hints, again no bug in Fuseki :-)
--

> DBpedia is down, looks like the cluster has lost a machine (careless!),
> so I can't find out what it is.

Not sure if it was DBpedia's fault; my SPARQL block removed <> from the
URIs, which gives a 404 when you try it. Once this is fixed it works here.

cu

Adrian


Re: Problem with a federated query

2013-02-07 Thread Adrian Gschwend
On 07.02.13 19:22, Andy Seaborne wrote:

Hi Andy,

> Can't test but my guess is that the some of the ?munidstr coming back
> from DBpedia are not valid integer strings.

ah didn't think of that.

> That makes:
> 
> BIND(xsd:int(str(?munidstr)) AS ?munid)
> 
> leave ?munid unbound so the rest of the query is unconnected with the
> SERVICE part.  That will mix up the data.

Could this already be the case if some values are strings and some are
indeed integers? IIRC DBpedia was not really consistent on this particular
predicate.
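
In the meantime I guess I could guard the cast so that non-integer strings
don't knock out the join. A rough sketch (untested; I'm assuming
dbpedia-owl:municipalityCode is the predicate in question):

PREFIX xsd:         <http://www.w3.org/2001/XMLSchema#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>

SELECT ?muni ?munid WHERE {
  SERVICE <http://dbpedia.org/sparql> {
    ?muni dbpedia-owl:municipalityCode ?munidstr .
  }
  # only cast values that actually look like plain integers,
  # so ?munid never ends up unbound
  FILTER(regex(str(?munidstr), "^[0-9]+$"))
  BIND(xsd:int(str(?munidstr)) AS ?munid)
}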

cu

Adrian


Problem with a federated query

2013-02-07 Thread Adrian Gschwend
Hi group,

I posted an issue I had with a federated query to DBpedia on
answers.semanticweb.com. Besides a workaround, we were debating why my
query does not return proper results, and we suspect it is a bug either in
Virtuoso or in Fuseki.

The complete query + sample endpoint are described in the posting:

http://answers.semanticweb.com/questions/20843/federated-sparql-query-to-dbpedia-with-local-mapping

So I hope it's OK that I do not repeat it here.

Versions:

Jena:   VERSION: 2.7.4-SNAPSHOT
Jena:   BUILD_DATE: 20120930-0548
ARQ:VERSION: 2.9.4-SNAPSHOT
ARQ:BUILD_DATE: 20120930-0548
TDB:VERSION: 0.9.4-SNAPSHOT
TDB:BUILD_DATE: 20120930-0548
Fuseki: VERSION: 0.2.5-SNAPSHOT
Fuseki: BUILD_DATE: 20120930-0548

What are your thoughts: is it the mapping, or a Fuseki bug?

cu

Adrian