Re: [ANN] Apache Jena 5.0.0

2024-03-20 Thread Marco Neumann
Thank you Andy for your continued leadership in the Apache Jena project and
the committers and testers for helping to bring the Jena 5 release to the
wider developer and user communities around the world.

All The Best,
Marco



Marco Neumann
Lotico Community Lead
http://www.lotico.com

On Wed, Mar 20, 2024 at 9:06 AM Andy Seaborne  wrote:

> The Apache Jena development community is pleased to
> announce the release of Apache Jena 5.0.0.
>
> In Jena5:
>
> * Minimum Java requirement: Java 17
>
> * Language tags are case-insensitive unique.
>
> * Term graphs for in-memory models
>
> * RRX - New RDF/XML parser
>
> * Remove support for JSON-LD 1.0
>
> * Turtle/TriG Output : default output PREFIX and BASE
>
> * New artifacts : jena-bom and OWASP CycloneDX SBOM
>
> * API deprecation removal
>
> * Dependency updates :
>  Note: slf4j update : v1 to v2 (needs log4j change)
>
> More details below.
>
>  Contributions:
>
> Configurable CORS headers for Fuseki
> From Paul Gallagher
>
> Balduin Landolt @BalduinLandolt - javadoc fix for Literal.getString.
>
> @OyvindLGjesdal - https://github.com/apache/jena/pull/2121 -- text index
> fix
>
> Tong Wang @wang3820 Fix tests due to hashmap order
>
> Explicit Accept headers on RDFConnectionRemote fix
> from @Aklakan
>
> 
>
> All issues in this release:
>  https://s.apache.org/jena-5.0.0-issues
>
> which includes the ones specifically related to Jena5:
>
>https://github.com/apache/jena/issues?q=label%3Ajena5
>
> ** Java Requirement
>
> Java 17 or later is required.
> Java 17 language constructs now are used in the codebase.
>
> Jakarta EE is required for deploying the WAR file (e.g. Apache Tomcat 10)
>
> ** Language tags
>
> Language tags are now case-insensitive unique.
>
> "abc"@EN and "abc"@en are the same RDF term.
>
> Internally, language tags are formatted using the algorithm of RFC 5646.
>
> Examples "@en", "@en-GB", "@en-Latn-GB".
>
> SPARQL LANG(?literal) will return a formatted language tag.
>
> Data stored in TDB using language tags must be reloaded.
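>
> For example (an illustrative sketch, not from the release notes):
>
> SELECT ?tag WHERE {
>   VALUES ?lit { "abc"@EN-gb }
>   BIND(LANG(?lit) AS ?tag)    # ?tag is "en-GB"
> }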
>
> ** Term graphs
>
> Graphs are now term graphs in the API and SPARQL. That is, they do not
> match "same value" for some of the Java-mapped datatypes. The Model API
> already normalizes values written.
>
> TDB1, TDB2 keep their value canonicalization during data loading.
>
> A legacy value-graph implementation can be obtained from GraphMemFactory.
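>
> To illustrate (a sketch, not from the release notes): graph pattern
> matching is now by term, while FILTER expressions still compare by value.
>
> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
> # Data holding "01"^^xsd:integer is no longer matched by the triple
> # pattern { ?s ?p "1"^^xsd:integer } - they are different terms.
> # Value equality in expressions is unchanged:
> ASK { FILTER ( "01"^^xsd:integer = "1"^^xsd:integer ) }   # true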
>
> ** RRX - New RDF/XML parser
>
> RRX is the default RDF/XML parser. It is a replacement for ARP.
> RIOT uses RRX.
>
> The ARP parser is still temporarily available for transition assistance.
>
> ** Remove support for JSON-LD 1.0
>
> JSON-LD 1.1, using Titanium-JSON-LD, is the supported version of JSON-LD.
>
> https://github.com/filip26/titanium-json-ld
>
> ** Turtle/TriG Output
>
> "PREFIX" and "BASE" are output by default for Turtle and TriG output.
>
> ** Artifacts
>
> There is now a release BOM for Jena artifacts - artifact
> org.apache.jena:jena-bom
>
> There are now OWASP CycloneDX SBOMs for Jena artifacts.
> https://github.com/CycloneDX
>
> jena-tdb is renamed jena-tdb1.
>
> jena-jdbc is no longer released
>
> ** Dependencies
>
> The update to slf4j 2.x means the log4j artifact changes to
> "log4j-slf4j2-impl" (was "log4j-slf4j-impl").
>
>
>  API Users
>
> ** Deprecation removal
>
> There has been a clearing out of deprecated functions, methods and
> classes. This includes the deprecations added in Jena 4.10.0 to flag
> code that has been removed in Jena5.
>
> ** QueryExecutionFactory
>
> QueryExecutionFactory is simplified to cover common cases only; it
> becomes a way to call the general QueryExecution builders, which are
> preferred and provide full control over query execution setup.
>
> Local execution builder:
> QueryExecution.create()...
>
> Remote execution builder:
> QueryExecution.service(URL)...
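>
> For example (a minimal sketch; the query string, model and endpoint URL
> are placeholders):
>
> import org.apache.jena.query.*;
> import org.apache.jena.rdf.model.Model;
> import org.apache.jena.rdf.model.ModelFactory;
>
> Model model = ModelFactory.createDefaultModel();
>
> // Local execution:
> try (QueryExecution qExec = QueryExecution.create()
>         .query("SELECT * { ?s ?p ?o }")
>         .model(model)
>         .build()) {
>     ResultSetFormatter.out(qExec.execSelect());
> }
>
> // Remote execution:
> try (QueryExecution qExec = QueryExecution.service("http://localhost:3030/ds/query")
>         .query("SELECT * { ?s ?p ?o }")
>         .build()) {
>     ResultSetFormatter.out(qExec.execSelect());
> }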
>
> ** QueryExecution variable substitution
>
> Using "substitution", where the query is modified by replacing one or
> more variables by RDF terms, is now preferred to using "initial
> bindings", where query solutions include (var,value) pairs.
>
> "substitution" is available for all queries, local and remote, not just
> local executions.
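>
> For example, reusing the model from the sketch above (a sketch; check the
> builder javadoc for the exact substitution signatures; the URI is a
> placeholder):
>
> import org.apache.jena.graph.NodeFactory;
>
> try (QueryExecution qExec = QueryExecution.create()
>         .query("SELECT ?o { ?s ?p ?o }")
>         .model(model)
>         .substitution("s", NodeFactory.createURI("http://example/subject"))
>         .build()) {
>     ResultSetFormatter.out(qExec.execSelect());
> }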
>
> TDB1 packages are renamed: org.apache.jena.tdb -> org.apache.jena.tdb1
>
>  Fuseki Users
>
> Fuseki now uses the Jakarta namespace for servlets and has been
> upgraded to Eclipse Jetty 12.
>
> Apache Tomcat10 or 

Re: Requesting advice on Fuseki memory settings

2024-03-11 Thread Marco Neumann
Hi Gaspar,

If you delete data from the graph you do not effectively remove data from
disk; TDB actually keeps the records on the file system.

Search the mailing list and you will find a more detailed response from
Andy.

If you want to make sure to keep the database size on disk to a minimum and
if it suits your use case you can physically remove the folder from disk
and reload the dataset.

Read "disk" here as any kind of storage device.

Best,
Marco


On Fri, Mar 8, 2024 at 10:40 AM Gaspar Bartalus  wrote:

> Hi,
>
> Thanks for the responses.
>
> We were actually curious if you'd have some explanation for the
> linear increase in the storage, and why we are seeing differences between
> the actual size of our dataset and the size it uses on disk (differences
> between `df -h` and `du -lh`)?
>
> The heap memory has some very minimal peaks, saw-tooth, but otherwise it's
> flat.
>
> Regards,
> Gaspar
>
> On Thu, Mar 7, 2024 at 11:55 PM Andy Seaborne  wrote:
>
> >
> >
> > On 07/03/2024 13:24, Gaspar Bartalus wrote:
> > > Dear Jena support team,
> > >
> > > We would like to ask you to help us in configuring the memory for our
> > > jena-fuseki instance running in kubernetes.
> > >
> > > *We have the following setup:*
> > >
> > > * Jena-fuseki deployed as StatefulSet to a k8s cluster with the
> > > resource config:
> > >
> > > Limits:
> > >   cpu: 2
> > >   memory:  16Gi
> > > Requests:
> > >   cpu: 100m
> > >   memory:  11Gi
> > >
> > > * The JVM_ARGS has the following value: -Xmx10G
> > >
> > > * Our main dataset of type TDB2 contains ~1 million triples.
> > A million triples doesn't take up much RAM even in a memory dataset.
> >
> > In Java, the JVM will grow until it is close to the -Xmx figure. A major
> > GC will then free up a lot of memory. But the JVM does not give the
> > memory back to the kernel.
> >
> > TDB2 does not only use heap space. A heap of 2-4G is usually enough per
> > dataset, sometimes less (data shape dependent - e.g. many large
> > literals use more space).
> >
> > Use a profiler to examine the heap in-use, you'll probably see a
> > saw-tooth shape.
> > Force a GC and see the level of in-use memory afterwards.
> > Add some safety margin and work space for requests and try that as the
> > heap size.
> >
> > > *  We execute the following type of UPDATE operations:
> > >- There are triggers in the system (e.g. users of the application
> > > changing the data) which start ~50 other update operations containing
> > > up to ~30K triples. Most of them run in parallel, some are delayed
> > > with seconds or minutes.
> > >- There are scheduled UPDATE operations (executed on hourly basis)
> > > containing 30K-500K triples.
> > >- These UPDATE operations usually delete and insert the same amount
> > > of triples in the dataset. We use the compact API as a nightly job.
> > >
> > > *We are noticing the following behaviour:*
> > >
> > > * Fuseki consumes 5-10G of heap memory continuously, as configured in
> > > the JVM_ARGS.
> > >
> > > * There are points in time when the volume usage of the k8s container
> > > starts to increase suddenly. This does not drop even though compaction
> > > is successfully executed and the dataset size (triple count) does not
> > > increase. See attachment below.
> > >
> > > *Our suspicions:*
> > >
> > > * garbage collection in Java is often delayed; memory is not freed as
> > > quickly as we would expect it, and the heap limit is reached quickly
> > > if multiple parallel queries are run
> > > * long running database queries can send regular memory to Gen2, that
> > > is not actively cleaned by the garbage collector
> > > * memory-mapped files are also garbage-collected (and perhaps they
> > > could go to Gen2 as well, using more and more storage space).
> > >
> > > Could you please explain the possible reasons behind such a behaviour?
> > > And finally could you please suggest a more appropriate configuration
> > > for our use case?
> > >
> > > Thanks in advance and best wishes,
> > > Gaspar Bartalus
> > >
> >
>


-- 


---
Marco Neumann
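

For reference, the compaction discussed in this thread can be triggered
over HTTP via the Fuseki admin API; with deleteOld=true the previous
database generation is deleted from disk, which addresses the on-disk
growth (a sketch; the dataset name is a placeholder):

curl -X POST 'http://localhost:3030/$/compact/mydataset?deleteOld=true'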


Re: GeoSparql example?

2023-12-02 Thread Marco Neumann
Did that spatial SPARQL query work for you, Claude?

Marco

On Fri, Dec 1, 2023 at 8:08 PM Claude Warren  wrote:

> can you give me an example of a query?
>
> On Fri, Dec 1, 2023, 19:14 Marco Neumann  wrote:
>
> > just go ahead you are almost there
> >
> >  wkt:asWKT "Polygon (( -5.5 -5.5, -4.5 -5.5, -4.5 -4.5, -5.5 -4.5, -5.5
> > -5.5  ))"^^wkt:wktLiteral
> >
> > same with the LINESTRING
> >
> > Marco
> >
> > On Fri, Dec 1, 2023 at 6:03 PM Claude Warren  wrote:
> >
> > > I am playing with GeoSparql for the first time and I am trying to find
> an
> > > example of how to format the data.
> > >
> > > I have a polygon:
> > > POLYGON ((-5.5 -5.5, -4.5 -5.5, -4.5 -4.5, -5.5 -4.5, -5.5 -5.5))
> > >
> > > and a linestring:
> > > LINESTRING (-1 -3, -1 -2)
> > >
> > > Using the jena-geosparql module what is the SPARQL insert statement to
> > > place the polygon into a model or dataset?
> > >
> > > Once the polygon is in, what is the query that will do the equivalent
> of
> > > the JTS Geometry.isWithinDistance between the Linestring and the
> > Polygon?
> > >
> > > Thanks,
> > > Claude
> > >
> > > --
> > > LinkedIn: http://www.linkedin.com/in/claudewarren
> > >
> >
> >
> > --
> >
> >
> > ---
> > Marco Neumann
> >
>


-- 


---
Marco Neumann


Re: GeoSparql example?

2023-12-02 Thread Marco Neumann
Discrete Global Grid System?

Let's discuss this separately to see if we can get Apache Jena GeoSPARQL up
to GeoSPARQL 1.1 conformance in the near future. Maybe there are already
some contributions in the community that could be integrated into the
Apache project.

Marco

On Sat, Dec 2, 2023 at 12:55 PM Nicholas Car  wrote:

> Well no other system that we know of offers GeoSPARQL 1.1 support either
> as the new version really is very new!
>
> Most of the examples in that documentation are fine for GeoSPARQL 1.0 as
> part of the update was just to make better documentation.
>
> I don’t think it will be hard for Jena to support the majority of
> GeoSPARQL 1.1. The hard bit is DGGS and that is called out as a separate
> conformance class for that reason. Most of the additions are minor: things
> like ensuring full Simple Features spatial relations calculations work.
>
> Nick
>
> On Sat, Dec 2, 2023 at 6:58 pm, Marco Neumann <marco.neum...@gmail.com> wrote:
>
> > Nick, we only support GeoSPARQL 1.0 at this point in time in the Jena
> > project with some extensions that predate the OGC effort.
> >
> > Marco
> >
> > On Sat, Dec 2, 2023 at 4:36 AM Nicholas Car  wrote:
> >
> >> GeoSPARQL 1.1 is now approved by the OGC and its specification document
> >> contains many more examples than GeoSPARQL 1.0:
> >>
> >> https://opengeospatial.github.io/ogc-geosparql/geosparql11/spec.html
> >>
> >> Nick
> >>
> >>
> >>
> >> On Saturday, 2 December 2023 at 6:39 AM, Marco Neumann <
> >> marco.neum...@gmail.com> wrote:
> >>
> >>
> >> > PREFIX spatial: <http://jena.apache.org/spatial#>
> >> >
> >> > PREFIX units: <http://www.opengis.net/def/uom/OGC/1.0/>
> >> >
> >> >
> >> > SELECT *
> >> > WHERE{
> >> > ?object spatial:nearby(5 5 10 units:kilometer).
> >> > }
> >> >
> >> > On Fri, Dec 1, 2023 at 8:08 PM Claude Warren cla...@xenei.com wrote:
> >> >
> >> > > can you give me an example of a query?
> >> > >
> >> > > On Fri, Dec 1, 2023, 19:14 Marco Neumann marco.neum...@gmail.com
> >> wrote:
> >> > >
> >> > > > just go ahead you are almost there
> >> > > >
> >> > > > wkt:asWKT "Polygon (( -5.5 -5.5, -4.5 -5.5, -4.5 -4.5, -5.5 -4.5,
> >> -5.5
> >> > > > -5.5 ))"^^wkt:wktLiteral
> >> > > >
> >> > > > same with the LINESTRING
> >> > > >
> >> > > > Marco
> >> > > >
> >> > > > On Fri, Dec 1, 2023 at 6:03 PM Claude Warren cla...@xenei.com
> wrote:
> >> > > >
> >> > > > > I am playing with GeoSparql for the first time and I am trying to
> >> find
> >> > > > > an
> >> > > > > example of how to format the data.
> >> > > > >
> >> > > > > I have a polygon:
> >> > > > > POLYGON ((-5.5 -5.5, -4.5 -5.5, -4.5 -4.5, -5.5 -4.5, -5.5
> -5.5))
> >> > > > >
> >> > > > > and a linestring:
> >> > > > > LINESTRING (-1 -3, -1 -2)
> >> > > > >
> >> > > > > Using the jena-geosparql module what is the SPARQL insert
> >> statement to
> >> > > > > place the polygon into a model or dataset?
> >> > > > >
> >> > > > > Once the polygon is in, what is the query that will do the
> >> equivalent
> >> > > > > of
> >> > > > > the JTS Geometry.isWithinDistance between the Linestring and the
> >> > > > > Polygon?
> >> > > > >
> >> > > > > Thanks,
> >> > > > > Claude
> >> > > > >
> >> > > > > --
> >> > > > > LinkedIn: http://www.linkedin.com/in/claudewarren
> >> > > >
> >> > > > --
> >> > > >
> >> > > > ---
> >> > > > Marco Neumann
> >> >
> >> >
> >> >
> >> > --
> >> >
> >> >
> >> > ---
> >> > Marco Neumann
> >>
> >
> > --
> >
> > ---
> > Marco Neumann



-- 


---
Marco Neumann


Re: GeoSparql example?

2023-12-02 Thread Marco Neumann
Nick, we only support GeoSPARQL 1.0 at this point in time in the Jena
project with some extensions that predate the OGC effort.

Marco



On Sat, Dec 2, 2023 at 4:36 AM Nicholas Car  wrote:

> GeoSPARQL 1.1 is now approved by the OGC and its specification document
> contains many more examples than GeoSPARQL 1.0:
>
> https://opengeospatial.github.io/ogc-geosparql/geosparql11/spec.html
>
> Nick
>
>
>
> On Saturday, 2 December 2023 at 6:39 AM, Marco Neumann <
> marco.neum...@gmail.com> wrote:
>
>
> > > PREFIX spatial: <http://jena.apache.org/spatial#>
> > >
> > > PREFIX units: <http://www.opengis.net/def/uom/OGC/1.0/>
> >
> >
> > SELECT *
> > WHERE{
> > ?object spatial:nearby(5 5 10 units:kilometer).
> > }
> >
> > On Fri, Dec 1, 2023 at 8:08 PM Claude Warren cla...@xenei.com wrote:
> >
> > > can you give me an example of a query?
> > >
> > > On Fri, Dec 1, 2023, 19:14 Marco Neumann marco.neum...@gmail.com
> wrote:
> > >
> > > > just go ahead you are almost there
> > > >
> > > > wkt:asWKT "Polygon (( -5.5 -5.5, -4.5 -5.5, -4.5 -4.5, -5.5 -4.5,
> -5.5
> > > > -5.5 ))"^^wkt:wktLiteral
> > > >
> > > > same with the LINESTRING
> > > >
> > > > Marco
> > > >
> > > > On Fri, Dec 1, 2023 at 6:03 PM Claude Warren cla...@xenei.com wrote:
> > > >
> > > > > I am playing with GeoSparql for the first time and I am trying to
> find
> > > > > an
> > > > > example of how to format the data.
> > > > >
> > > > > I have a polygon:
> > > > > POLYGON ((-5.5 -5.5, -4.5 -5.5, -4.5 -4.5, -5.5 -4.5, -5.5 -5.5))
> > > > >
> > > > > and a linestring:
> > > > > LINESTRING (-1 -3, -1 -2)
> > > > >
> > > > > Using the jena-geosparql module what is the SPARQL insert
> statement to
> > > > > place the polygon into a model or dataset?
> > > > >
> > > > > Once the polygon is in, what is the query that will do the
> equivalent
> > > > > of
> > > > > the JTS Geometry.isWithinDistance between the Linestring and the
> > > > > Polygon?
> > > > >
> > > > > Thanks,
> > > > > Claude
> > > > >
> > > > > --
> > > > > LinkedIn: http://www.linkedin.com/in/claudewarren
> > > >
> > > > --
> > > >
> > > > ---
> > > > Marco Neumann
> >
> >
> >
> > --
> >
> >
> > ---
> > Marco Neumann
>


-- 


---
Marco Neumann


Re: GeoSparql example?

2023-12-01 Thread Marco Neumann
PREFIX spatial:<http://jena.apache.org/spatial#>
PREFIX units: <http://www.opengis.net/def/uom/OGC/1.0/>

SELECT *
WHERE{
 ?object spatial:nearby(5 5 10 units:kilometer).
}

On Fri, Dec 1, 2023 at 8:08 PM Claude Warren  wrote:

> can you give me an example of a query?
>
> On Fri, Dec 1, 2023, 19:14 Marco Neumann  wrote:
>
> > just go ahead you are almost there
> >
> >  wkt:asWKT "Polygon (( -5.5 -5.5, -4.5 -5.5, -4.5 -4.5, -5.5 -4.5, -5.5
> > -5.5  ))"^^wkt:wktLiteral
> >
> > same with the LINESTRING
> >
> > Marco
> >
> > On Fri, Dec 1, 2023 at 6:03 PM Claude Warren  wrote:
> >
> > > I am playing with GeoSparql for the first time and I am trying to find
> an
> > > example of how to format the data.
> > >
> > > I have a polygon:
> > > POLYGON ((-5.5 -5.5, -4.5 -5.5, -4.5 -4.5, -5.5 -4.5, -5.5 -5.5))
> > >
> > > and a linestring:
> > > LINESTRING (-1 -3, -1 -2)
> > >
> > > Using the jena-geosparql module what is the SPARQL insert statement to
> > > place the polygon into a model or dataset?
> > >
> > > Once the polygon is in, what is the query that will do the equivalent
> of
> > > the JTS Geometry.isWithinDistance between the Linestring and the
> > Polygon?
> > >
> > > Thanks,
> > > Claude
> > >
> > > --
> > > LinkedIn: http://www.linkedin.com/in/claudewarren
> > >
> >
> >
> > --
> >
> >
> > ---
> > Marco Neumann
> >
>


-- 


---
Marco Neumann


Re: GeoSparql example?

2023-12-01 Thread Marco Neumann
just go ahead you are almost there

 wkt:asWKT "Polygon (( -5.5 -5.5, -4.5 -5.5, -4.5 -4.5, -5.5 -4.5, -5.5
-5.5  ))"^^wkt:wktLiteral

same with the LINESTRING

Marco

On Fri, Dec 1, 2023 at 6:03 PM Claude Warren  wrote:

> I am playing with GeoSparql for the first time and I am trying to find an
> example of how to format the data.
>
> I have a polygon:
> POLYGON ((-5.5 -5.5, -4.5 -5.5, -4.5 -4.5, -5.5 -4.5, -5.5 -5.5))
>
> and a linestring:
> LINESTRING (-1 -3, -1 -2)
>
> Using the jena-geosparql module what is the SPARQL insert statement to
> place the polygon into a model or dataset?
>
> Once the polygon is in, what is the query that will do the equivalent of
> the JTS Geometry.isWithinDistance between the Linestring and the Polygon?
>
> Thanks,
> Claude
>
> --
> LinkedIn: http://www.linkedin.com/in/claudewarren
>


-- 


---
Marco Neumann
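

A complete INSERT DATA along these lines might look like the following
(a sketch using the standard GeoSPARQL namespace rather than the wkt:
shorthand above; subject URIs are placeholders):

PREFIX geo: <http://www.opengis.net/ont/geosparql#>

INSERT DATA {
  <http://example.org/geom/poly1>
      a geo:Geometry ;
      geo:asWKT "POLYGON ((-5.5 -5.5, -4.5 -5.5, -4.5 -4.5, -5.5 -4.5, -5.5 -5.5))"^^geo:wktLiteral .
  <http://example.org/geom/line1>
      a geo:Geometry ;
      geo:asWKT "LINESTRING (-1 -3, -1 -2)"^^geo:wktLiteral .
}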


Re: Querying URL with square brackets

2023-11-25 Thread Marco Neumann
I was looking for an IRI validator and this one didn't come up in the
search engines. This service might need a bit more visibility and some
incoming links.

Marco

On Sat, Nov 25, 2023 at 1:34 PM Andy Seaborne  wrote:

>
>
> On 24/11/2023 10:05, Marco Neumann wrote:
> > (side note) preferably the local name of a URI should not start with a
> > number but a letter or underscore.
>
> It's a hangover from XML QNames.
>
> Turtle doesn't care.
>
> Style-wise, yes, avoid an initial number.
>
> > What do you mean by human-readable here? For large technical systems it's
> > simply not feasible to encode meaning into the URI and I might even
> > consider it an anti-pattern.
> >
> > There are some community efforts that have introduced single letters and
> > number sequences for vocabulary development like CIDOC CRM which was
> later
> > also adopted by community projects like wikidata. But instance data
> > typically doesn't have that requirement and can be random but has to be
> > syntax compliant of course.
> >
> > I am sure Andy can elaborate on the details of the encoding here.
>
> There's an online IRI validator.
>
> https://sparql.org/iri-validator.html
>
> using the jena-iri package.
>


-- 


---
Marco Neumann


Re: Querying URL with square brackets

2023-11-24 Thread Marco Neumann
Martynas, I think you have to go way back in time to fully appreciate the
anchor reference and its "interference" with URI local names. :)

Fundamentally URIs as identifiers are not meant to be retrieved as such,
Laura. So a web browser is not designed to follow the implicit "physical"
link of an identifier.

To "browse" URIs as identifiers only you need a RDF browser or plugin that
may dereference documents from objects for display as URLs.

Marco


On Fri, Nov 24, 2023 at 1:55 PM Martynas Jusevičius 
wrote:

> On Fri, Nov 24, 2023 at 12:50 PM Laura Morales  wrote:
> >
> > > If you want a page for every book, don't use fragment URIs. Use
> > > http://example.org/book/1 or http://example.org/book/1#this instead of
> > >  http://example.org/book#1.
> >
> > yes yes I agree with this. I only tried to present an example of yet
> another "quirk" between raw data and browsers (where this kind of data is
> supposed to be used).
>
> Still don't understand the problem :) http://example.org/book#1
> uniquely identifies a resource, but you'll need to get the whole
> http://example.org/book document to retrieve it. That's just how HTTP
> works.
>


-- 


---
Marco Neumann


Re: Querying URL with square brackets

2023-11-24 Thread Marco Neumann
The URI syntax is defined by the Internet Engineering Task Force (IETF) in
RFC 3986.

W3C RDF is just a rule-taker here ;)

https://datatracker.ietf.org/doc/html/rfc3986

Marco

On Fri, Nov 24, 2023 at 10:36 AM Laura Morales  wrote:

> > What do you mean by human-readable here? For large technical systems it's
> > simply not feasible to encode meaning into the URI and I might even
> > consider it an anti-pattern.
>
> This is my problem. I do NOT want to encode any meaning into URLs, but I
> do want them to be human readable simply because I) properties are URLs
> too, 2) they can be used online, and 3) they are simpler to work with, for
> example editing in a Turtle file or writing a query.
>
> :alice :knows :bob   vs   :dsa7hdsahdsa782j :d93ifg75jgueeywu
> :s93oeirugj290sjf
>
> I can avoid [ entirely, but it raises the question of what other characters
> I MUST avoid.
>


-- 


---
Marco Neumann


Re: Querying URL with square brackets

2023-11-24 Thread Marco Neumann
(side note) preferably the local name of a URI should not start with a
number but a letter or underscore.

What do you mean by human-readable here? For large technical systems it's
simply not feasible to encode meaning into the URI and I might even
consider it an anti-pattern.

There are some community efforts that have introduced single letters and
number sequences for vocabulary development like CIDOC CRM which was later
also adopted by community projects like wikidata. But instance data
typically doesn't have that requirement and can be random but has to be
syntax compliant of course.

I am sure Andy can elaborate on the details of the encoding here.




On Fri, Nov 24, 2023 at 9:31 AM Laura Morales  wrote:

> Thank you a lot. FILTER(STR(?id) = "...") works, as suggested by Andy. I
> do recognize though that it is a hack, and that URLs should probably not
> have a [.
>
> But now I have trouble understanding UTF8 addresses. I would use random
> alphanumeric URLs everywhere if I could, or I would %-encode everything.
> But nodes IDs (URLs) are supposed to be valid, human-readable URLs because
> they're used online. Jena, and browsers, work fine with IRIs (which are
> UTF8), but the way special characters are used is not the same. For example
> it's perfectly fine in my graph to have a URL fragment, such as
> http://example.org/foo#bar but these URLs are not usable with a browser
> because the fragment is a local reference (local to the browser) that is
> not sent to the server. Which means in practice, that if I want to stay out
> of trouble I should not create a graph with IDs
>
> http://example.org/book#1
> http://example.org/book#2
> http://example.org/book#3
>
> in the case that I want to use these URLs with a web browser. Viceversa,
> browsers are perfectly fine with a [ in the path, but Jena is stricter.
>
> So, if I want to use UTF8 addresses (IRIs) in my graph, and if I don't
> want to %-encode them because I want them to be human-readbale (also
> because they are much easier to read/edit manually), what is the list of
> characters that MUST be %-encoded?
>
>
> > Sent: Friday, November 24, 2023 at 9:55 AM
> > From: "Marco Neumann" 
> > To: users@jena.apache.org
> > Subject: Re: Querying URL with square brackets
> >
> > Laura, see jena issue #2102
> > https://github.com/apache/jena/issues/2102
> >
> > Marco
>


-- 


---
Marco Neumann
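

For reference, the FILTER(STR(...)) workaround mentioned in this thread
looks like this in full (a sketch; the URI is from Laura's example):

SELECT ?s ?p ?o WHERE {
  ?s ?p ?o .
  FILTER ( STR(?s) = "http://example.org/foo[1]bar" )
}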


Re: Querying URL with square brackets

2023-11-24 Thread Marco Neumann
Laura, see jena issue #2102
https://github.com/apache/jena/issues/2102

Marco

On Fri, Nov 24, 2023 at 7:12 AM Laura Morales  wrote:

> I have a few URLs containing square brackets like
> http://example.org/foo[1]bar
> I can create a TDB2 dataset without much problems, with warnings but no
> errors. I can also query these nodes "indirectly", that is if I query them
> by some property and not by URI. My problem is that I cannot query them
> directly by URI. As soon as I try to use the URIs explicitly in a query,
> for example "DESCRIBE <http://example.org/foo[1]bar>", I receive this
> error
>
> ERROR SPARQL  :: [line: 1, col: 10] Bad IRI: '
> http://example.org/foo[1]bar': <http://example.org/foo[1]bar> Code:
> 0/ILLEGAL_CHARACTER in PATH: The character violates the grammar rules for
> URIs/IRIs.
>
> I tried escaping, "foo\[1\]bar" but it doesn't work.
> I tried converting from a string,
> FILTER(?id = URI("http://example.org/foo[1]bar")) but it doesn't work.
> What else could I try?
>


-- 


---
Marco Neumann


Re: Semantics of SPARQL Update Delete

2023-11-10 Thread Marco Neumann
ChatGPT has made an interesting and useful attempt, for a change.

[image: image.png]


On Fri, Nov 10, 2023 at 9:03 PM Andy Seaborne  wrote:

>
>
> On 10/11/2023 20:35, Marco Neumann wrote:
> > On Fri, Nov 10, 2023 at 5:51 PM Andy Seaborne  wrote:
> >
> >>
> >>
> >> On 10/11/2023 12:33, Marco Neumann wrote:
> >>> Should DELETE {URI URI * } not update all matching graph patterns?
> >>
> >> No.
> >> (and that's bad syntax)
> >>
> >>> I had a case where only DELETE {URI URI NODE } did execute the update
> in
> >>> the dataset/graph/query fuseki UI.
> >>>
> >>> To be precise it is a DELETE INSERT combination with an empty WHERE
> >> clause.
> >>>
> >>> DELETE {pattern} INSERT{pattern} WHERE{ }
> >>
> >> the "pattern" is used as a template.
> >> DELETE {template} INSERT {template} WHERE {pattern}
> >>
> >> If the template has variables, these variables must be set by the WHERE
> >> clause. Otherwise triple patterns with unbound variables are skipped.
> >>
> >
> > OK, yes I think this is my case, an unbound variable was used in the
> > template, the "Update Success" tricked me into believing that the data
> was
> > actually removed.
>
> "Update Success" means "executed as per spec" :-)
>
> It's the same rule as CONSTRUCT which skips triples with any unbound
> variables.
>
>  Andy
>
> >>
> >> There is no pattern matching  in a template.
> >>
> >> There is a short form DELETE WHERE { pattern } which is
> >> DELETE { pattern } WHERE {pattern}, using the pattern as the template.
> >>
> >>   Andy
> >>
> >>>
> >>> Marco
> >>>
> >>
> >
> >
>


-- 


---
Marco Neumann


Re: Semantics of SPARQL Update Delete

2023-11-10 Thread Marco Neumann
ok I see, yes that (*) was just pseudocode.

Thanks Andy.


On Fri, Nov 10, 2023 at 9:00 PM Andy Seaborne  wrote:

>
>
> On 10/11/2023 18:19, Marco Neumann wrote:
> > On Fri, Nov 10, 2023 at 5:51 PM Andy Seaborne  wrote:
> >
> >>
> >>
> >> On 10/11/2023 12:33, Marco Neumann wrote:
> >>> Should DELETE {URI URI * } not update all matching graph patterns?
> >>
> >> No.
> >> (and that's bad syntax)
> >>
> >
> > DELETE {  ?x } is bad syntax?
>
> "*" is bad syntax.
>
> DELETE {  ?x } is bad syntax for another reason - there must
> be a WHERE.
>
> >
> >
> >>> I had a case where only DELETE {URI URI NODE } did execute the update
> in
> >>> the dataset/graph/query fuseki UI.
> >>>
> >>> To be precise it is a DELETE INSERT combination with an empty WHERE
> >> clause.
> >>>
> >>> DELETE {pattern} INSERT{pattern} WHERE{ }
> >>
> >> the "pattern" is used as a template.
> >> DELETE {template} INSERT {template} WHERE {pattern}
> >>
> >> If the template has variables, these variables must be set by the WHERE
> >> clause. Otherwise triple patterns with unbound variables are skipped.
> >>
> >> There is no pattern matching  in a template.
> >>
> >> There is a short form DELETE WHERE { pattern } which is
> >> DELETE { pattern } WHERE {pattern}, using the pattern as the template.
> >>
> >>   Andy
> >>
> >>>
> >>> Marco
> >>>
> >>
> >
> >
>


-- 


---
Marco Neumann


Re: Semantics of SPARQL Update Delete

2023-11-10 Thread Marco Neumann
On Fri, Nov 10, 2023 at 5:51 PM Andy Seaborne  wrote:

>
>
> On 10/11/2023 12:33, Marco Neumann wrote:
> > Should DELETE {URI URI * } not update all matching graph patterns?
>
> No.
> (and that's bad syntax)
>
> > I had a case where only DELETE {URI URI NODE } did execute the update in
> > the dataset/graph/query fuseki UI.
> >
> > To be precise it is a DELETE INSERT combination with an empty WHERE
> clause.
> >
> > DELETE {pattern} INSERT{pattern} WHERE{ }
>
> the "pattern" is used as a template.
> DELETE {template} INSERT {template} WHERE {pattern}
>
> If the template has variables, these variables must be set by the WHERE
> clause. Otherwise triple patterns with unbound variables are skipped.
>

OK, yes I think this is my case, an unbound variable was used in the
template, the "Update Success" tricked me into believing that the data was
actually removed.


>
> There is no pattern matching  in a template.
>
> There is a short form DELETE WHERE { pattern } which is
> DELETE { pattern } WHERE {pattern}, using the pattern as the template.
>
>  Andy
>
> >
> > Marco
> >
>


-- 


---
Marco Neumann


Re: Semantics of SPARQL Update Delete

2023-11-10 Thread Marco Neumann
On Fri, Nov 10, 2023 at 5:51 PM Andy Seaborne  wrote:

>
>
> On 10/11/2023 12:33, Marco Neumann wrote:
> > Should DELETE {URI URI * } not update all matching graph patterns?
>
> No.
> (and that's bad syntax)
>

DELETE {  ?x } is bad syntax?


> > I had a case where only DELETE {URI URI NODE } did execute the update in
> > the dataset/graph/query fuseki UI.
> >
> > To be precise it is a DELETE INSERT combination with an empty WHERE
> clause.
> >
> > DELETE {pattern} INSERT{pattern} WHERE{ }
>
> the "pattern" is used as a template.
> DELETE {template} INSERT {template} WHERE {pattern}
>
> If the template has variables, these variables must be set by the WHERE
> clause. Otherwise triple patterns with unbound variables are skipped.
>
> There is no pattern matching  in a template.
>
> There is a short form DELETE WHERE { pattern } which is
> DELETE { pattern } WHERE {pattern}, using the pattern as the template.
>
>  Andy
>
> >
> > Marco
> >
>


-- 


---
Marco Neumann


Semantics of SPARQL Update Delete

2023-11-10 Thread Marco Neumann
Should DELETE {URI URI * } not update all matching graph patterns?

I had a case where only DELETE {URI URI NODE } did execute the update in
the dataset/graph/query fuseki UI.

To be precise it is a DELETE INSERT combination with an empty WHERE clause.

DELETE {pattern} INSERT{pattern} WHERE{ }

Marco

-- 


---
Marco Neumann
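

For reference, the working form that emerges from this thread - template
variables bound by the WHERE clause - is (a sketch with placeholder URIs):

DELETE { <http://example/s> <http://example/p> ?o }
WHERE  { <http://example/s> <http://example/p> ?o }

or, equivalently, the short form:

DELETE WHERE { <http://example/s> <http://example/p> ?o }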


Re: Transactions over http (fuseki)

2023-08-17 Thread Marco Neumann
Gaspar, while you are looking into the extensions, how would you like to use
transactions in your use case?

Marco

On Thu, 17 Aug 2023 at 13:44, Gaspar Bartalus  wrote:

> Hi Lorenz,
>
> Thanks for the quick response. That sounds indeed very promising.
> Looking forward to knowing more details about the fuseki extension
> mechanism, or a transaction module in particular.
>
> Gaspar
>
> On Thu, Aug 17, 2023 at 9:17 AM Lorenz Buehmann <
> buehm...@informatik.uni-leipzig.de> wrote:
>
> > Hi,
> >
> > that is an open issue in the SPARQL standard and Andy already opened a
> > ticket [1] regarding this maybe w.r.t. an upcoming SPARQL 1.2
> >
> > I think mixed query types are still not possible via standard Fuseki in
> > a single transaction, but indeed an extension like you're planning
> > should be possible. Andy is already working on a newer Fuseki extension
> > mechanism (it's basically already there) where you can plugin so-called
> > Fuseki modules. This would be the way I'd try to add this extension to
> > Fuseki.
> >
> > Indeed, Andy knows better and can give you more specific code or
> > pointers - maybe he even has such a module or code part implemented
> > somewhere.
> >
> >
> > Regards,
> >
> > Lorenz
> >
> >
> > [1] https://github.com/w3c/sparql-dev/issues/83
> >
> > On 16.08.23 17:20, Gaspar Bartalus wrote:
> > > Hi,
> > >
> > > We’ve been using jena-fuseki to store and interact with RDF data by
> > running
> > > queries over the http endpoints.
> > > We are now facing the challenge to use transactional operations on the
> > > triple store, i.e. running multiple sparql queries (both select and
> > update
> > > queries) in a single transaction.
> > > I would like to ask what your suggestion might be to achieve this.
> > >
> > > The idea we have in mind is to extend jena-fuseki with new http
> endpoints
> > > for handling transactions.
> > > Would this be technically feasible, i.e. could we reach the internal
> > > transaction API (store API?) from jena-fuseki?
> > > Would you agree with this approach conceptually, or would you recommend
> > > something different?
> > >
> > > Thanks in advance,
> > > Gaspar
> > >
> > > PS: Sorry for the duplicate, I have the feeling that my other email
> > address
> > > is blocked somehow.
> > >
> > --
> > Lorenz Bühmann
> > Research Associate/Scientific Developer
> >
> > Email buehm...@infai.org
> >
> > Institute for Applied Informatics e.V. (InfAI) | Goerdelerring 9 | 04109
> > Leipzig | Germany
> >
> >
>
-- 


---
Marco Neumann


Re: Mystery memory leak in fuseki

2023-07-12 Thread Marco Neumann
Thanks Dave, I am not familiar with Prometheus JVM metrics but I gather
it's an open source solution that you have coupled with Grafana for
visualization. I will have a look into this.

Best,
Marco

On Tue, Jul 11, 2023 at 9:32 AM Dave Reynolds 
wrote:

> Hi Marco,
>
> On 11/07/2023 09:04, Marco Neumann wrote:
> > Dave, can you say a bit more about the profiling methodology? Are you
> using
> > a tool such as VisualVM to collect the data? Or do you just use the
> system
> > monitor?
>
> The JVM metrics here are from prometheus scanning the metrics exposed by
> fuseki via the built in micrometer (displayed use grafana). They give a
> *lot* of details on things like GC behaviour etc which I'm not showing.
>
> Ironically the only thing this fuseki was doing when it died originally
> was supporting these metric scans, and the health check ping.
>
> The overall memory curve is picked up by telegraph scanning the OS level
> stats for the docker processes (collected via influx DB and again
> displayed in grafana). These are what you would get with e.g. top on the
> machine or a system monitor but means we have longer term records which
> we access remotely. When I quoted 240K RES, 35K shared that was actually
> just top on the machine.
>
> When running locally can also use things like jconsole or visualVM but
> I actually find the prometheus + telegraph metrics we have in our
> production monitoring more detailed and easier to work with. We run lots
> of services so the monitoring and alerting stack, while all industry
> standard, has been a life saver for us.
>
> For doing the debugging locally I also tried setting the JVM flags to
> enable finer grain native memory tracking and use jcmd (in a scripted
> loop) to pull out those more detailed metrics. Though they are not that
> much more detailed than the micrometer/prometheus metrics.
> That use of jcmd and the caution on how to interpret RES came from the
> blog item I mentioned earlier:
> https://poonamparhar.github.io/troubleshooting_native_memory_leaks/
>
> For the memory leak checking I used valgrind but there's lots of others.
>
> Dave
>
> >
> > Marco
> >
> > On Tue, Jul 11, 2023 at 8:57 AM Dave Reynolds  >
> > wrote:
> >
> >> For interest[*] ...
> >>
> >> This is what the core JVM metrics look like when transitioning from a
> >> Jetty10 to a Jetty9.4 instance. You can see the direct buffer cycling up
> >> to 500MB (which happens to be the max heap setting) on Jetty 10, nothing
> >> on Jetty 9. The drop in Mapped buffers is just because TDB hadn't been
> >> asked any queries yet.
> >>
> >>
> >>
> https://www.dropbox.com/scl/fi/9afhrztbb36fvzqkuw996/fuseki-jetty10-jetty9-transition.png?rlkey=7fpj4x1pn5mjnf3jjwenmp65m=0
> >>
> >> Here' the same metrics around the time of triggering a TDB backup. Shows
> >> the mapped buffer use for TDB but no significant impact on heap etc.
> >>
> >>
> >>
> https://www.dropbox.com/scl/fi/0s40vpizf94c4w3m2awna/fuseki-jetty10-backup.png?rlkey=ai31m6z58w0uex8zix8e9ctna=0
> >>
> >> These are all on the same instance as the RES memory trace:
> >>
> >>
> >>
> https://www.dropbox.com/scl/fi/c58nqkr2hi193a84btedg/fuseki-4.9.0-jetty-9.4.png?rlkey=b7osnj6k1oy1xskl4j25zz6o8=0
> >>
> >> Dave
> >>
> >> [*] I've been staring and metric graphs for so many days I may have a
> >> distorted notion of what's interesting :)
> >>
> >> On 11/07/2023 08:39, Dave Reynolds wrote:
> >>> After a 10 hour test of 4.9.0 with Jetty 9.4 on java 17 in the
> >>> production, containerized, environment then it is indeed very stable.
> >>>
> >>> Running at less that 6% of memory on 4GB machine compared to peaks of
> >>> ~50% for versions with Jetty 10. RES shows as 240K with 35K shared
> >>> (presume mostly libraries).
> >>>
> >>> Copy of trace is:
> >>>
> >>
> https://www.dropbox.com/scl/fi/c58nqkr2hi193a84btedg/fuseki-4.9.0-jetty-9.4.png?rlkey=b7osnj6k1oy1xskl4j25zz6o8=0
> >>>
> >>> The high spikes on left of image are the prior run on with out of the
> >>> box 4.7.0 on same JVM.
> >>>
> >>> The small spike at 06:00 is a dump so TDB was able to touch and scan
> all
> >>> the (modest) data with very minor blip in resident size (as you'd
> hope).
> >>> JVM stats show the mapped buffers for TDB jumping up but confirm heap
> is
> >>> stable at < 60M, non-heap 60M.
> >>>
> >>> Da
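
For reference, the finer-grained native memory tracking mentioned above
uses standard JVM options plus jcmd (a sketch; jar name, heap size and PID
are placeholders):

java -XX:NativeMemoryTracking=summary -Xmx500M -jar fuseki-server.jar
jcmd <pid> VM.native_memory summary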

Re: Mystery memory leak in fuseki

2023-07-11 Thread Marco Neumann
 >> I realise Jetty 9.4.x is out of community support but eclipse say EOL
> >> is "unlikely to happen before 2025". So, while this may not be a
> >> solution for the Jena project, it could give us a workaround at the
> >> cost of doing custom builds.
> >>
> >> Dave
> >>
> >>
> >> On 03/07/2023 14:20, Dave Reynolds wrote:
> >>> We have a very strange problem with recent fuseki versions when
> >>> running (in docker containers) on small machines. Suspect a jetty
> >>> issue but it's not clear.
> >>>
> >>> Wondering if anyone has seen anything like this.
> >>>
> >>> This is a production service but with tiny data (~250k triples, ~60MB
> >>> as NQuads). Runs on 4GB machines with java heap allocation of 500MB[1].
> >>>
> >>> We used to run using 3.16 on jdk 8 (AWS Corretto for the long term
> >>> support) with no problems.
> >>>
> >>> Switching to fuseki 4.8.0 on jdk 11 the process grows in the space of
> >>> a day or so to reach ~3GB of memory at which point the 4GB machine
> >>> becomes unviable and things get OOM killed.
> >>>
> >>> The strange thing is that this growth happens when the system is
> >>> answering no Sparql queries at all, just regular health ping checks
> >>> and (prometheus) metrics scrapes from the monitoring systems.
> >>>
> >>> Furthermore the space being consumed is not visible to any of the JVM
> >>> metrics:
> >>> - Heap and and non-heap are stable at around 100MB total (mostly
> >>> non-heap metaspace).
> >>> - Mapped buffers stay at 50MB and remain long term stable.
> >>> - Direct memory buffers being allocated up to around 500MB then being
> >>> reclaimed. Since there are no sparql queries at all we assume this is
> >>> jetty NIO buffers being churned as a result of the metric scrapes.
> >>> However, this direct buffer behaviour seems stable, it cycles between
> >>> 0 and 500MB on approx a 10min cycle but is stable over a period of
> >>> days and shows no leaks.
> >>>
> >>> Yet the java process grows from an initial 100MB to at least 3GB.
> >>> This can occur in the space of a couple of hours or can take up to a
> >>> day or two with no predictability in how fast.
> >>>
> >>> Presumably there is some low level JNI space allocated by Jetty (?)
> >>> which is invisible to all the JVM metrics and is not being reliably
> >>> reclaimed.
> >>>
> >>> Trying 4.6.0, which we've had less problems with elsewhere, that
> >>> seems to grow to around 1GB (plus up to 0.5GB for the cycling direct
> >>> memory buffers) and then stays stable (at least on a three day soak
> >>> test). We could live with allocating 1.5GB to a system that should
> >>> only need a few 100MB but concerned that it may not be stable in the
> >>> really long term and, in any case, would rather be able to update to
> >>> more recent fuseki versions.
> >>>
> >>> Trying 4.8.0 on java 17 it grows rapidly to around 1GB again but then
> >>> keeps ticking up slowly at random intervals. We project that it would
> >>> take a few weeks to grow the scale it did under java 11 but it will
> >>> still eventually kill the machine.
> >>>
> >>> Anyone seem anything remotely like this?
> >>>
> >>> Dave
> >>>
> >>> [1]  500M heap may be overkill but there can be some complex queries
> >>> and that should still leave plenty of space for OS buffers etc in the
> >>> remaining memory on a 4GB machine.
> >>>
> >>>
> >>>
>


-- 


---
Marco Neumann


Re: Binary literals

2023-05-04 Thread Marco Neumann
But don't just think of jena-geosparql as only an implementation of
predefined geospatial features, Nick. Jena spatial and the GeoSPARQL query
enhancements were developed long before there was an OGC GeoSPARQL
whitepaper and later reference specification.

Many of the features in Jena go beyond the current OGC discussion. Whether
future recommendations that come out of the OGC discussion can be
implemented is a question of manpower in the Jena community
(which is a group of volunteers after all).

Best,
Marco


On Thu, May 4, 2023 at 10:57 AM Nicholas Car  wrote:

> Hi Rob,
>
> Thanks for this: it is pretty much as I thought!
>
> I think we will be able to cater for WKB then in GeoSPARQL 1.3 with just
> hex encoding of the value and ^^geo:wkbLiteral and then, as you say,
> implementers, like Jena-geosparql, can just read the hex into their spatial
> indexes one-time.
>
> I see little value in this other than meeting an allowed data type in the
> Simple Features standard, then again, I see little value in KML and other
> existing, allowed, formats too!
>
> Cheers, Nick
>
>
>
>
> --- Original Message ---
> On Thursday, May 4th, 2023 at 18:30, Rob @ DNR 
> wrote:
>
>
> > Well, the RDF specifications fundamentally define RDF literals to be the
> following:
> >
> > * a lexical form, being a Unicode [UNICODE:
> > https://www.w3.org/TR/rdf11-concepts/#bib-UNICODE] string, which should
> > be in Normal Form C [NFC: https://www.w3.org/TR/rdf11-concepts/#bib-NFC],
> >
> > https://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal
> >
> > So, you are effectively forced to use some sort of string-based encoding
> of the binary data to represent any literal, whether that underlying
> datatype is truly binary data.
> >
> > Now in principle you could define a custom implementation of the
> LiteralLabel interface that stores the value as true binary, i.e. byte[],
> and only materialises it into a string encoding when absolutely necessary.
> This could then be used to create instances via
> NodeFactory.create(LiteralLabel).
> >
> > However, data into and out of the system is generally going to be via a
> RDF serialisation, which again will require string encoding or decoding as
> appropriate. And the parsers don’t really care about datatypes so your
> custom implementation wouldn’t get used. Thus, whether a custom
> LiteralLabel would actually gain you anything would depend on how the data
> is coming into the system and how you consume it. If the data is coming in
> via some programmatic means that isn’t parsing serialised RDF then maybe
> but I don’t think it would gain you much.
> >
> > For spatial indexing generally the approach of a GeoSPARQL
> implementation is to build the spatial index up-front so you’d only pay the
> cost of the string to binary decoding once when the index was first built
> from the RDF data. The spatial index is going to convert the incoming
> geo-data into its own internal index structures that will be very efficient
> to access, at which point whether the binary data was originally string
> encoded is irrelevant.
> >
> > Regards,
> >
> > Rob Vesse
> >
> > From: Nicholas Car n...@kurrawong.net
> >
> > Date: Wednesday, 3 May 2023 at 23:22
> > To: users@jena.apache.org users@jena.apache.org
> >
> > Subject: Re: Binary literals
> > I see Base64 is an XSD option too, but I’m most interested in “true”
> binary, as opposed to binary-as-text options, and whether any exist!
> >
> > Nick
> >
> > On Thu, May 4, 2023 at 8:13 am, Nicholas Car <n...@kurrawong.net> wrote:
> >
> > > Dear Jena users,
> > >
> > > How can I store binary literals in RDF and in Jena/Fuseki?
> > >
> > > There is xsd:hexBinary for arbitrary binary data but is there a
> better/more efficient/another way to store binary literals in Jena?
> > >
> > > The reason I ask is that a future version of GeoSPARQL might want to
> include WKB - Well-Known Binary - as a geometry format option. We would
> hope this can be efficiently accessed by a spatial index so we want to know
> how to handle perhaps a custom data type, perhaps geo:wkbLiteral, and how
> best to store this in Jena, perhaps not as hex text.
> > >
> > > Thanks, Nick
>


-- 


---
Marco Neumann
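

For reference, a hex-encoded binary literal of the kind discussed in this
thread looks like this in Turtle (a sketch; prefix, property and value are
placeholders):

PREFIX ex:  <http://example.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

ex:thing ex:payload "CAFE0123"^^xsd:hexBinary .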


Re: CVE-2023-22665: Apache Jena: Exposure of arbitrary execution in script engine expressions.

2023-04-24 Thread Marco Neumann
Is that already fixed in 4.8.0, or does it apply to Apache Jena versions 4.7.0+?

Marco

On Mon, Apr 24, 2023 at 8:03 PM Andy Seaborne  wrote:

> Severity: important
>
> Description:
>
> There is insufficient checking of user queries in Apache Jena versions
> 4.7.0 and earlier, when invoking custom scripts. It allows a remote user to
> execute arbitrary javascript via a SPARQL query.
>
> Credit:
>
> L3yx of Syclover Security Team (reporter)
>
> References:
>
> https://jena.apache.org/
> https://www.cve.org/CVERecord?id=CVE-2023-22665
>
>

-- 


---
Marco Neumann


Community Over Code NA ASF Semantic GeoSpatial Track

2023-04-17 Thread Marco Neumann
FYI if you work with geospatial data and RDF you may want to consider to
present your project at the ASF Conference Community Over Code NA
GeoSpatial Track in Halifax, Nova Scotia, October 7-10, 2023

https://communityovercode.org/

As co-organizer I am particularly keen to see RDF and Geospatial efforts
being presented as well.

I am looking forward to your submissions
Marco



-- 


---
Marco Neumann


Re: density of GraphUtil not recognized

2022-12-11 Thread Marco Neumann
oh well...

On Sun, Dec 11, 2022 at 6:44 PM emri mbiemri 
wrote:

> ChatGPT suggested using these kinds of solutions for getting some of
> the above-mentioned graph metrics.
> Anyway, is there any other tool that calculated some graph metrics on RDF
> graphs?
>
> On Sun, Dec 11, 2022 at 8:13 PM Andy Seaborne  wrote:
>
> >
> >
> > On 11/12/2022 13:15, emri mbiemri wrote:
> > > ok, thanks, but is there any function within Apache Jena that
> calculates
> > > the graph metrics such as centrality, density, and clustering
> > coefficient?
> >
> > No.
> >
> > What led you to thinking there was?
> >
> > Andy
> >
>


-- 


---
Marco Neumann


Re: GraphMem in 4.6

2022-10-18 Thread Marco Neumann
Can you briefly explain / guess how this degradation was introduced? I
presume we are not testing for regression in the unit tests. Might be best
addressed on the dev list.

Marco

On Tue, Oct 18, 2022 at 10:46 AM Holger Knublauch 
wrote:

> Yes! This seems to have fixed it:
>
> ARQ.getContext().set(ARQ.optReorderBGP, false);
>
> Many thanks,
> Holger
>
>
>
> > On 18 Oct 2022, at 11:29 am, Élie Roux 
> wrote:
> >
> > Perhaps this is an instance of
> > https://github.com/apache/jena/issues/1533 ? What triple reordering
> > optimization are you using?
> >
> > Best,
> > --
> > Elie
>
>

-- 


---
Marco Neumann


Re: Apache Jena - 10 years as an Apache Project.

2022-04-18 Thread Marco Neumann
hurray. TLP.


On Mon, Apr 18, 2022 at 5:40 PM Andy Seaborne  wrote:

> Today is the 10th anniversary of Apache Jena as a Top Level Project of
> the Apache Software Foundation!
>
>

-- 


---
Marco Neumann
KONA


Re: WG: Broken GND dataset after loading with tdb2.xloader+tdb2.tdbloader

2022-03-03 Thread Marco Neumann
and it's not all about size, Joachim. :)

Some of my smaller datasets give me the greatest leverage and in
combination with linked data they really start to shine.

Marco

On Thu, Mar 3, 2022 at 9:51 AM Neubert, Joachim  wrote:

> Hi Andy,
>
> Thanks for investigating this. Using tdb2.tdbloader for all files, as you
> suggested, is fine. GND was a large dataset ten years ago, but apparently
> doesn't qualify as such any more :)
>
> Cheers, Joachim
>
> > -Original Message-
> > From: Andy Seaborne 
> > Sent: Sunday, 27 February 2022 13:28
> > To: users@jena.apache.org
> > Subject: Re: WG: Broken GND dataset after loading with
> > tdb2.xloader+tdb2.tdbloader
> >
> > Hi Joachim,
> >
> > Yes, there is a bug in xloader. I have managed to create an example test
> case
> > using a small amount of data.
> >
> > It is the xloader. Running the load-then-load test case with all other
> loaders
> > hasn't shown a problem so it isn't the second data load.
> >
> > I'm not sure what the cause is yet but I haven't seen query go wrong if
> all the
> > files are loaded once by xloader.
> >
> >  Andy
> >
> > JENA-2294
> >
> > On 22/02/2022 06:40, Neubert, Joachim wrote:
> > > This mail of yesterday didn't get through - here again.
> > >
> > > The data of the broken load is temporarily linked from
> > http://134.245.93.72/beta/tmp.
> > >
> > > I've now invoked
> > >
> > > /opt/jena/bin/tdb2.tdbloader --loader=parallel
> > > --loc=/zbw/var/lib/fuseki/databases/temp
> > > ../var/gnd/2021-11/src/GND.utf8.ttl.gz
> > > ../var/gnd/2021-11/src/gnd-sc.ttl
> > > ../var/gnd/2021-11/src/gnd-sc_notation.ttl
> > > ../var/gnd/2021-11/src/gndo.ttl
> > >
> > > and got a steadily decreasing rate (see below). On the other hand, the
> total
> > load time is nice. tdbstats ran correctly afterwards, and the query for
> > gndo:DifferentiatedPerson works as expected.
> > >
> > > Cheers - Joachim
>


-- 


---
Marco Neumann
KONA


Re: Geo indexing Wikidata

2022-02-21 Thread Marco Neumann
hMap.java:658) ~[?:?]
> > at java.util.HashMap.put(HashMap.java:607) ~[?:?]
> > at java.util.HashSet.add(HashSet.java:220) ~[?:?]
> > at
> > org.apache.jena.util.iterator.UniqueFilter.test(UniqueFilter.java:38)
> > ~[fuseki-server.jar:4.5.0-SNAPSHOT]
> > at
> >
> org.apache.jena.util.iterator.FilterIterator.hasNext(FilterIterator.java:56)
>
> > ~[fuseki-server.jar:4.5.0-SNAPSHOT]
> > at
> >
> org.apache.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:103)
>
> > ~[fuseki-server.jar:4.5.0-SNAPSHOT]
> > at
> >
> org.apache.jena.util.iterator.FilterIterator.hasNext(FilterIterator.java:55)
>
> > ~[fuseki-server.jar:4.5.0-SNAPSHOT]
> > at
> >
> org.apache.jena.util.IteratorCollection.iteratorToList(IteratorCollection.java:63)
>
> > ~[fuseki-server.jar:4.5.0-SNAPSHOT]
> > at
> > org.apache.jena.graph.GraphUtil.addIteratorWorker(GraphUtil.java:144)
> > ~[fuseki-server.jar:4.5.0-SNAPSHOT]
> > at org.apache.jena.graph.GraphUtil.addInto(GraphUtil.java:139)
> > ~[fuseki-server.jar:4.5.0-SNAPSHOT]
> > at org.apache.jena.rdf.model.impl.ModelCom.add(ModelCom.java:195)
> > ~[fuseki-server.jar:4.5.0-SNAPSHOT]
> > at
> >
> org.apache.jena.geosparql.configuration.GeoSPARQLOperations.applyInferencing(GeoSPARQLOperations.java:332)
>
> > ~[fuseki-server.jar:4.5.0-SNAPSHOT]
> > at
> >
> org.apache.jena.geosparql.configuration.GeoSPARQLOperations.applyInferencing(GeoSPARQLOperations.java:277)
>
> > ~[fuseki-server.jar:4.5.0-SNAPSHOT]
> > at
> >
> org.apache.jena.geosparql.configuration.GeoSPARQLOperations.applyInferencing(GeoSPARQLOperations.java:259)
>
> > ~[fuseki-server.jar:4.5.0-SNAPSHOT]
> > at
> >
> org.apache.jena.geosparql.assembler.GeoAssembler.createDataset(GeoAssembler.java:140)
>
> > ~[fuseki-server.jar:4.5.0-SNAPSHOT]
> Looks like 8GB RAM is not enough for RDFS on the 18 million triples
> graph. Did not try to increase the RAM as it it takes too much times
> anyways and given that this would happen on every startup I do not
> consider this a good option.
>
> Next I tried the RDFS dataset feature Andy introduced which allows for
> "RDFS Simple" - inference (subClassOf, subPropertyOf, domain, range) -
> this should be sufficient, all we need is domain/range reasoning. So I
> added it to the assembler and forwarded this RDFS dataset to the
> Geosparql dataset. Restarted Fuseki and got instant access to the endpoint.
>
> Tried a very simple query to see if inference works as I need
>
> > select * {
> >   graph <http://wikidata.org/geosparql> {
> > ?s1 a geo:Feature
> >   }
> > }
> > limit 10
> Took ~100s on the initial query, quite long? Ok, maybe there is some initial
> setup and maybe later requests will be fast. But it doesn't seem so; did a
> follow-up request just renaming ?s1 to ?s2 to avoid caching -
> still ~75s response time. I'm not sure if this performance is now what
> we can get or if I did something wrong?
>
> Anyways, currently I'm loading the materialized variant of the dataset,
> i.e. with geo:Feature and geo:Geometry types attached such that
> inference can be disabled at all. Then I'll try some more queries.
>
>
> Did anybody else already try to setup the spatial index on Wikidata? Any
> experience or comments so far? Any suggestions how to handle the core
> data and also how to work on the non-terrestial data? Should we avoid
> inference here (18M vs 36M triples)?
>
>
> Cheers,
>
> Lorenz
>
>

-- 


---
Marco Neumann
KONA
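

For reference, the "RDFS Simple" dataset Lorenz describes is declared in an
assembler file roughly like this (a sketch following the Jena RDFS dataset
documentation; file and resource names are placeholders):

<#rdfsDataset> rdf:type ja:DatasetRDFS ;
    ja:rdfsSchema <vocabulary.ttl> ;
    ja:dataset <#baseDataset> .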


Re: Duplicates in the "Available services" list in UI?

2022-02-20 Thread Marco Neumann
@Joachim: That is indeed something to consider. Fuseki is very liberal with
its default settings. Users should be aware that they basically expose the
entire data set if they install the service as accessible on the public web.

On Sun, Feb 20, 2022 at 10:46 AM Neubert, Joachim  wrote:

> @Marco: For documentation, the screenshot is now at
> https://imgur.com/a/1WTaQnZ
>
> @all: As a user, I just have no idea what the duplicates could mean.
>
> As a system administrator, my long-time understanding was that services
> can only be accessed by operation-specific URLs, not by the general one for
> the dataset. I want to expose /query and perhaps /get, but surely not
> /update or /data. So my setup includes a firewall not allowing port 3030,
> and reverse proxy rules like this:
>
> ProxyPass /beta/sparql/wikidata/query
> http://127.0.0.1:3030/wikidata/query retry=0
> ProxyPassReverse  /beta/sparql/wikidata/query
> http://127.0.0.1:3030/wikidata/query
>
> In a quick check, access to /beta/sparql/wikidata is denied (404), so the
> approach seems to work anyway, but it was a bit scary to me that I could
> have accidentally exposed more than I'd intended.
>
> Cheers, Joachim
>
> > -Original Message-
> > From: Bruno Kinoshita 
> > Sent: Saturday, 19 February 2022 20:29
> > To: users@jena.apache.org
> > Subject: Re: Duplicates in the "Available services" list in UI?
> >
> > Maybe we could add help icons to the UI, with a brief description to
> users
> > about things like the endpoints (including duplicates are OK), the
> editable
> > sparql endpoint added yesterday, content types, etc.
> >
> > Bruno
> >
> > On Sun, 20 Feb 2022, 08:23 Andy Seaborne,  wrote:
> >
> > >
> > >
> > > On 19/02/2022 17:15, Neubert, Joachim wrote:
> > > > In the Fuseki UI, I get apparent duplicates in the service list (see
> > > > also screenshot):
> > > >
> > > > SPARQL Update /wikidata/
> > > > SPARQL Update /wikidata/update
> > >
> > > That's fine.
> > >
> > > That's two endpoints both of which provide update.
> > >
> > > Because you used fuseki:serviceUpdate the server adds both the named
> > > endpoint and the enables the operation on the dataset URL - this is
> > > for long-time compatibility.
> > >
> > > In new-style setup, the config file specifices exactly what teh server
> > > will have:
> > > ("new" is several years!)
> > >
> > >
> > > Example 1:
> > > <#service_wikidata> rdf:type fuseki:Service ;
> > > fuseki:name "wikidata" ;
> > > fuseki:endpoint [
> > > fuseki:operation fuseki:update ;
> > > ] ;
> > > fuseki:endpoint [
> > > fuseki:operation fuseki:update ;
> > > fuseki:name "update"
> > > ] ;
> > > 
> > >
> > > Example 2:
> > > :service rdf:type fuseki:Service ;
> > >  fuseki:name "dataset" ;
> > >  fuseki:endpoint [
> > >  fuseki:operation fuseki:query ;
> > >  fuseki:name "sparql" ;
> > >  ] ;
> > >  fuseki:endpoint [
> > >  fuseki:operation fuseki:update ;
> > >  fuseki:name "sparql" ;
> > >  ] ;
> > >  fuseki:dataset :dataset ;
> > >  .
> > >
> > > Query and Update on /dataset/sparql.
> > >
> > > No name, or fuseki:name "", for an endpoint means the endpoint is the
> > > dataset itself. This covers all the SPARQL standard functions and does
> > > include GSP. Dispatch is based on content type and query string.
> > >
> > >  Andy
> > >
> > > >
> > > > Is something wrong with my definitions (below), or is that an
> > > > interface issue?
> > > >
> > > > Cheers, Joachim
> > > >
> > > > Fuseki: VERSION: 4.5.0-SNAPSHOT
> > > >
> > > > Fuseki: BUILD_DATE: 2022-02-09T18:01:44Z
> > > >
> > > > <#service_wikidata> rdf:type fuseki:Service ;
> > > >
> > > >  rdfs:label  "wikidata TDB Service (RW)" ;
> > > >
> > > >  fuseki:name "wikidata" ;
> > > >
> > > >  fuseki:serviceQuery "query" ;
> > > >
> > > >  fuseki:serviceQuery "sparql" ;
> > > >
> > > >  fuseki:serviceUpdate "update" ;
> > > >
> > > >  fuseki:serviceUpload "upload" ;
> > > >
> > > >  fuseki:serviceReadWriteGraphStore "data" ;
> > > >
> > > >  fuseki:serviceReadGraphStore "get" ;
> > > >
> > > >  fuseki:dataset   :wikidata ;
> > > >
> > > >  .
> > > >
> > > > --
> > > >
> > > > Joachim Neubert
> > > >
> > > > ZBW – Leibniz Information Centre for Economics
> > > >
> > > > Neuer Jungfernstieg 21
> > > > 20354 Hamburg
> > > >
> > > > Phone +49-40-42834-462
> > > >
> > >
>


-- 


---
Marco Neumann
KONA


Re: Duplicates in the "Available services" list in UI?

2022-02-19 Thread Marco Neumann
I can't see the screenshot (it has been removed by the mailing list
software) but I can confirm that both URLs will work with the defined
service request. I don't consider this to be a bug but a feature: the
update will now work for both directives.

Marco

On Sat, Feb 19, 2022 at 5:15 PM Neubert, Joachim  wrote:

> In the Fuseki UI, I get apparent duplicates in the service list (see also
> screenshot):
>
>
>
> SPARQL Update /wikidata/
> <http://a1-srv04.zbw-nett.zbw-kiel.de:3030/wikidata/>
>
> SPARQL Update /wikidata/update
> <http://a1-srv04.zbw-nett.zbw-kiel.de:3030/wikidata/update>
>
>
>
> Is something wrong with my definitions (below), or is that an interface
> issue?
>
>
>
> Cheers, Joachim
>
>
>
> Fuseki: VERSION: 4.5.0-SNAPSHOT
>
> Fuseki: BUILD_DATE: 2022-02-09T18:01:44Z
>
>
>
> <#service_wikidata> rdf:type fuseki:Service ;
>
> rdfs:label  "wikidata TDB Service (RW)" ;
>
> fuseki:name "wikidata" ;
>
> fuseki:serviceQuery "query" ;
>
> fuseki:serviceQuery "sparql" ;
>
> fuseki:serviceUpdate "update" ;
>
> fuseki:serviceUpload "upload" ;
>
> fuseki:serviceReadWriteGraphStore "data" ;
>
> fuseki:serviceReadGraphStore "get" ;
>
> fuseki:dataset   :wikidata ;
>
> .
>
>
>
>
>
>
>
> --
>
> Joachim Neubert
>
>
>
> ZBW – Leibniz Information Centre for Economics
>
> Neuer Jungfernstieg 21
> 20354 Hamburg
>
> Phone +49-40-42834-462
>
>
>


-- 


---
Marco Neumann
KONA


Re: Loading Wikidata

2022-02-18 Thread Marco Neumann
Thank you Joachim, I suspect Lorenz's machine (an AMD Ryzen 9 5950X) would
produce similar results now with the most recent release, at lower cost
and energy consumption.

On Fri, Feb 18, 2022 at 9:16 AM Neubert, Joachim  wrote:

> OS is CentOS 7.9 in a Docker container running on Ubuntu 9.3.0.
>
> CPU is 4 x Intel(R) Xeon(R) Gold 6254 CPU @ 3.10GHz / 18 core (144 cores
> in total)
>
> Cheers, Joachim
>
> > -Original Message-
> > From: Marco Neumann
> > Sent: Friday, 18 February 2022 10:00
> > To: users@jena.apache.org
> > Subject: Re: Loading Wikidata
> >
> > Thank you for the effort Joachim, what CPU and OS were used for the load
> > test?
> >
> > Best,
> > Marco
> >
> > On Fri, Feb 18, 2022 at 8:51 AM Neubert, Joachim 
> > wrote:
> >
> > > Storage of the machine is one 10TB raid6 SSD.
> > >
> > > Cheers, Joachim
> > >
> > > > -Original Message-
> > > > From: Andy Seaborne
> > > > Sent: Wednesday, 16 February 2022 20:05
> > > > To: users@jena.apache.org
> > > > Subject: Re: Loading Wikidata
> > > >
> > > >
> > > >
> > > > On 16/02/2022 11:56, Neubert, Joachim wrote:
> > > > > I've loaded the Wikidata "truthy" dataset with 6b triples. Summary
> > > stats are:
> > > > >
> > > > > 10:09:29 INFO  Load node table  = 3 seconds
> > > > > 10:09:29 INFO  Load ingest data = 25165 seconds
> > > > > 10:09:29 INFO  Build index SPO  = 11241 seconds
> > > > > 10:09:29 INFO  Build index POS  = 14100 seconds
> > > > > 10:09:29 INFO  Build index OSP  = 12435 seconds
> > > > > 10:09:29 INFO  Overall  98496 seconds
> > > > > 10:09:29 INFO  Overall  27h 21m 36s
> > > > > 10:09:29 INFO  Triples loaded   = 6756025616
> > > > > 10:09:29 INFO  Quads loaded = 0
> > > > > 10:09:29 INFO  Overall Rate 68591 tuples per second
> > > > >
> > > > > This was done on a large machine with 2TB RAM and -threads=48, but
> > > > anyway: It looks like tdb2.xloader in apache-jena-4.5.0-SNAPSHOT
> > > > brought HUGE improvements over prior versions (unfortunately I
> > > > cannot find a log, but it took multiple days with 3.x on the same
> > machine).
> > > >
> > > > This is very helpful - faster than Lorenz reported on a 128G / 12
> > > > threads (31h). It does suggest there is effectively a soft upper
> > > > bound on going
> > > faster
> > > > by more RAM, more threads.
> > > >
> > > > That seems likely - disk bandwidth also matters and because xloader
> > > > is
> > > phased
> > > > between sort and index writing steps, it is unlikely to be getting
> > > > the
> > > best
> > > > overlap of CPU crunching and I/O.
> > > >
> > > > This all gets into RAID0, or allocating files across different disks.
> > > >
> > > > There comes a point where it gets quite a task to set up the machine.
> > > >
> > > > One other area I think might be easy to improve - more for smaller
> > > machines
> > > > - is during data ingest. There, the node table index is being
> > > > randomly
> > > read.
> > > > On smaller RAM machines, the ingest phase is proportionately longer,
> > > > sometimes a lot.
> > > >
> > > > An idea I had is calling the madvise system call on the mmap
> > > > segments to
> > > tell
> > > > the kernel the access is random (requires native code; Java17 makes
> > > > it possible to directly call mdavise(2) without needing a C (etc)
> > > > converter
> > > layer).
> > > >
> > > >  > If you think it useful, I am happy to share more details.
> > > >
> > > > What was the storage?
> > > >
> > > >  Andy
> > > > >
> > > > > Two observations:
> > > > >
> > > > >
> > > > > - As Andy (thanks again for all your help!) already
> mentioned,
> > > gzip files
> > > > apparently load significantly faster than bzip2 files. I experienced
> > > 200,000 vs.
> > > > 100,000 triples/second in the parse nodes step (though colleagues
> > > > had
> > > jobs
> > > > on the machine too, which might have influenced the results).
> > > > >
> > > > > - During the extended SPO/POS/OSP sort periods, I saw only
> one
> > > or two
> > > > gzip instances (used in the background), which perhaps were a
> > > bottleneck. I
> > > > wonder if using pigz could extend parallel processing.
> > > > >
> > > > > If you think it useful, I am happy to share more details. If I
> > > > > can
> > > help with
> > > > running some particular tests on a massive parallel machine, please
> > > > let
> > > me
> > > > know.
> > > > >
> > > > > Cheers, Joachim
> > > > >
> > > > > --
> > > > > Joachim Neubert
> > > > >
> > > > > ZBW - Leibniz Information Centre for Economics Neuer Jungfernstieg
> > > > > 21
> > > > > 20354 Hamburg
> > > > > Phone +49-40-42834-462
> > > > >
> > > > >
> > >
> >
> >
> > --
> >
> >
> > ---
> > Marco Neumann
> > KONA
>


-- 


---
Marco Neumann
KONA


Re: Loading Wikidata

2022-02-18 Thread Marco Neumann
Thank you for the effort Joachim, what CPU and OS were used for the load
test?

Best,
Marco

On Fri, Feb 18, 2022 at 8:51 AM Neubert, Joachim  wrote:

> Storage of the machine is one 10TB raid6 SSD.
>
> Cheers, Joachim
>
> > -Original Message-
> > From: Andy Seaborne
> > Sent: Wednesday, 16 February 2022 20:05
> > To: users@jena.apache.org
> > Subject: Re: Loading Wikidata
> >
> >
> >
> > On 16/02/2022 11:56, Neubert, Joachim wrote:
> > > I've loaded the Wikidata "truthy" dataset with 6b triples. Summary
> stats are:
> > >
> > > 10:09:29 INFO  Load node table  = 3 seconds
> > > 10:09:29 INFO  Load ingest data = 25165 seconds
> > > 10:09:29 INFO  Build index SPO  = 11241 seconds
> > > 10:09:29 INFO  Build index POS  = 14100 seconds
> > > 10:09:29 INFO  Build index OSP  = 12435 seconds
> > > 10:09:29 INFO  Overall  98496 seconds
> > > 10:09:29 INFO  Overall  27h 21m 36s
> > > 10:09:29 INFO  Triples loaded   = 6756025616
> > > 10:09:29 INFO  Quads loaded = 0
> > > 10:09:29 INFO  Overall Rate 68591 tuples per second
> > >
> > > This was done on a large machine with 2TB RAM and -threads=48, but
> > anyway: It looks like tdb2.xloader in apache-jena-4.5.0-SNAPSHOT brought
> > HUGE improvements over prior versions (unfortunately I cannot find a log,
> > but it took multiple days with 3.x on the same machine).
> >
> > This is very helpful - faster than Lorenz reported on a 128G / 12 threads
> > (31h). It does suggest there is effectively a soft upper bound on going
> faster
> > by more RAM, more threads.
> >
> > That seems likely - disk bandwidth also matters and because xloader is
> phased
> > between sort and index writing steps, it is unlikely to be getting the
> best
> > overlap of CPU crunching and I/O.
> >
> > This all gets into RAID0, or allocating files across different disks.
> >
> > There comes a point where it gets quite a task to set up the machine.
> >
> > One other area I think might be easy to improve - more for smaller
> machines
> > - is during data ingest. There, the node table index is being randomly
> read.
> > On smaller RAM machines, the ingest phase is proportionately longer,
> > sometimes a lot.
> >
> > An idea I had is calling the madvise system call on the mmap segments to
> tell
> > the kernel the access is random (requires native code; Java17 makes it
> > possible to directly call madvise(2) without needing a C (etc) converter
> layer).
> >
> >  > If you think it useful, I am happy to share more details.
> >
> > What was the storage?
> >
> >  Andy
> > >
> > > Two observations:
> > >
> > >
> > > - As Andy (thanks again for all your help!) already mentioned,
> gzip files
> apparently load significantly faster than bzip2 files. I experienced
> 200,000 vs.
> > 100,000 triples/second in the parse nodes step (though colleagues had
> jobs
> > on the machine too, which might have influenced the results).
> > >
> > > - During the extended SPO/POS/OSP sort periods, I saw only one
> or two
> > gzip instances (used in the background), which perhaps were a
> bottleneck. I
> > wonder if using pigz could extend parallel processing.
> > >
> > > If you think it useful, I am happy to share more details. If I can
> help with
> > running some particular tests on a massive parallel machine, please let
> me
> > know.
> > >
> > > Cheers, Joachim
> > >
> > > --
> > > Joachim Neubert
> > >
> > > ZBW - Leibniz Information Centre for Economics Neuer Jungfernstieg 21
> > > 20354 Hamburg
> > > Phone +49-40-42834-462
> > >
> > >
>


-- 


---
Marco Neumann
KONA


Re: Error initializing geosparql

2021-12-13 Thread Marco Neumann
And as you have mentioned earlier, GeoSPARQL 1.0 is 2D. If we go beyond that
(and we should at some point) we will have to indicate that in the
references.

For now GeoSPARQL 1.0 is the point of reference for the Apache Jena
GeoSPARQL implementation.

On Mon, Dec 13, 2021 at 12:30 PM Marco Neumann 
wrote:

> OK yes I see, thank you for the findings. That would explain the situation.
>
> I will take a look at geo:coordinateDimension. What I think we would
> like to avoid is a situation where higher-level dimensions are quietly
> skipped altogether, even though using x, y access methods on
> higher-dimensional data will lead to geographically incorrect results.
>
>
> Marco
>
>
>
> On Mon, Dec 13, 2021 at 12:02 PM Greg  wrote:
>
>> Ah, okay. That link is to the 11-52r3 version while I believe the final
>> version is 11-52r4. In the later draft, Req 12 became Req 9 and the
>> 'is3D' was removed.
>> I've found the OGC website can be a bit hit and miss about indicating
>> which are the latest and which the deprecated versions.
>>
>> https://www.ogc.org/standards/geosparql#downloads
>>
>> https://portal.ogc.org/files/?artifact_id=47664
>>
>> Greg
>>
>> On 12/12/2021 20:41, Marco Neumann wrote:
>> > I wasn't aware of the geo:is3D property myself but it is apparently in
>> Req 12
>> > of the final GeoSPARQL release V1.0
>> >
>> > https://portal.ogc.org/files/?artifact_id=44722
>> >
>> > maybe it's only in the requirements.
>> >
>> >
>> >
>> > On Sun, Dec 12, 2021 at 8:24 PM Greg  wrote:
>> >
>> >> Hi,
>> >>
>> >> The WKT parser supports XY, XYZ, XYM, and XYZM coordinate notations.
>> The
>> >> presence of dimensions beyond XY should not have an impact on the
>> >> geometry being used in spatial relation or other queries. The triples
>> >> exist in the graph and will be processed, but GeoSPARQL 1.0 is 2D
>> >> only, so the extra dimensions are not used in calculations.
>> >>
>> >> I'm not aware of an 'is3D' property in the v1.0 spec as this can be
>> >> tested using the 'geo:coordinateDimension' (either 2, 3, or 4 - i.e.
>> XY,
>> >> XYZ, XYM and XYZM) and/or 'geo:spatialDimension' (either 2 or 3 - i.e.
>> >> XY and XYM or XYZ and XYZM) properties of geo:Geometry (Section 8.4).
>> >> These values are inferred in the Jena implementation and do not need to
>> >> be asserted to be accessed.
>> >>
>> >> The standard also states that invalid geometry literals are to be
>> >> treated as errors, hence the 'DatatypeFormatException'.
>> >>
>> >> Thanks,
>> >>
>> >> Greg
>> >>
>> >> On 11/12/2021 19:33, Marco Neumann wrote:
>> >>> BTW this (the implementation of the property function) would be a
>> great
>> >>> project for members in the community to make a contribution to the
>> Jena
>> >>> project and the geosparql module in particular. I would think that
>> this
>> >>> could be a very manageable task for an experienced Java developer
>> with an
>> >>> interest in geospatial data processing.
>> >>>
>> >>>
>> >>>
>> >>> On Sat, Dec 11, 2021 at 6:19 PM Marco Neumann <
>> marco.neum...@gmail.com>
>> >>> wrote:
>> >>>
>> >>>> That said, I am just looking at the code base and I think we are
>> >> missing a
>> >>>> property function for is3D which is mentioned in the v1.0 spec.
>> >>>>
>> >>>> On Sat, Dec 11, 2021 at 5:14 PM Marco Neumann <
>> marco.neum...@gmail.com>
>> >>>> wrote:
>> >>>>
>> >>>>> good to see some discussion around this at GeoSPARQL 1.1. But from
>> >> what I
>> >>>>> can see it is still in an early phase with regards to 3D.
>> >>>>>
>> >>>>> The name geosparql++ was just used as a flag to indicate to the
>> user
>> >>>>> that we are operating outside of the realm of OGC geosparql 1.0
>> spec.
>> >>>>>
>> >>>>> similar to what we do with sparql vs arq syntax
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> On Sat, Dec 11, 2021 at 5:01 PM Lorenz Buehmann <
>> >>>>> buehm...@informatik.uni-leipzig.de> wrote:
>> >>

Re: Error initializing geosparql

2021-12-13 Thread Marco Neumann
OK yes I see, thank you for the findings. That would explain the situation.

I will take a look at geo:coordinateDimension. What I think we would
like to avoid is a situation where higher-level dimensions are quietly
skipped altogether, even though using x, y access methods on
higher-dimensional data will lead to geographically incorrect results.
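
A sketch of the kind of check this enables, using the inferred property
Greg describes below (assuming the usual geo: prefix,
<http://www.opengis.net/ont/geosparql#>):

SELECT ?geom ?dim
WHERE {
  ?geom geo:coordinateDimension ?dim .
  FILTER (?dim > 2)
}

That should at least surface the geometries whose extra dimensions are
silently dropped by the 2D functions.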


Marco



On Mon, Dec 13, 2021 at 12:02 PM Greg  wrote:

> Ah, okay. That link is to the 11-52r3 version while I believe the final
> version is 11-52r4. In the later draft, Req 12 became Req 9 and the
> 'is3D' was removed.
> I've found the OGC website can be a bit hit and miss about indicating
> which are the latest and which the deprecated versions.
>
> https://www.ogc.org/standards/geosparql#downloads
>
> https://portal.ogc.org/files/?artifact_id=47664
>
> Greg
>
> On 12/12/2021 20:41, Marco Neumann wrote:
> > I wasn't aware of the geo:is3D property myself but it is apparently in Req
> 12
> > of the final GeoSPARQL release V1.0
> >
> > https://portal.ogc.org/files/?artifact_id=44722
> >
> > maybe it's only in the requirements.
> >
> >
> >
> > On Sun, Dec 12, 2021 at 8:24 PM Greg  wrote:
> >
> >> Hi,
> >>
> >> The WKT parser supports XY, XYZ, XYM, and XYZM coordinate notations. The
> >> presence of dimensions beyond XY should not have an impact on the
> >> geometry being used in spatial relation or other queries. The triples
> >> exist in the graph and will be processed, but GeoSPARQL 1.0 is 2D
> >> only, so the extra dimensions are not used in calculations.
> >>
> >> I'm not aware of an 'is3D' property in the v1.0 spec as this can be
> >> tested using the 'geo:coordinateDimension' (either 2, 3, or 4 - i.e. XY,
> >> XYZ, XYM and XYZM) and/or 'geo:spatialDimension' (either 2 or 3 - i.e.
> >> XY and XYM or XYZ and XYZM) properties of geo:Geometry (Section 8.4).
> >> These values are inferred in the Jena implementation and do not need to
> >> be asserted to be accessed.
> >>
> >> The standard also states that invalid geometry literals are to be
> >> treated as errors, hence the 'DatatypeFormatException'.
> >>
> >> Thanks,
> >>
> >> Greg
> >>
> >> On 11/12/2021 19:33, Marco Neumann wrote:
> >>> BTW this (the implementation of the property function) would be a great
> >>> project for members in the community to make a contribution to the Jena
> >>> project and the geosparql module in particular. I would think that this
> >>> could be a very manageable task for an experienced Java developer with
> an
> >>> interest in geospatial data processing.
> >>>
> >>>
> >>>
> >>> On Sat, Dec 11, 2021 at 6:19 PM Marco Neumann  >
> >>> wrote:
> >>>
> >>>> That said, I am just looking at the code base and I think we are
> >> missing a
> >>>> property function for is3D which is mentioned in the v1.0 spec.
> >>>>
> >>>> On Sat, Dec 11, 2021 at 5:14 PM Marco Neumann <
> marco.neum...@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> good to see some discussion around this at GeoSPARQL 1.1. But from
> >> what I
> >>>>> can see it is still in an early phase with regards to 3D.
> >>>>>
> >>>>> The name geosparql++ was just used as a flag to indicate to the user
> >>>>> that we are operating outside of the realm of OGC geosparql 1.0 spec.
> >>>>>
> >>>>> similar to what we do with sparql vs arq syntax
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Sat, Dec 11, 2021 at 5:01 PM Lorenz Buehmann <
> >>>>> buehm...@informatik.uni-leipzig.de> wrote:
> >>>>>
> >>>>>> It's on the way with GeoSPARQL 1.1, isn't it? At least there are
> >> tickets
> >>>>>> related to it, e.g. [1] and many functions will be stated to work on
> >> 3D
> >>>>>> as well [2]
> >>>>>>
> >>>>>>> I personally think we should go beyond GeoSPARQL soon with Jena to
> >>>>>> provide
> >>>>>>> users with more advanced features. Possibly flag it as geosparql++
> or
> >>>>>> the
> >>>>>>> like.
> >>>>>> You mean because there was already this GeoSPARQL+ thing from
> Steffen
> >>>>>> Staab group with support

Re: Error initializing geosparql

2021-12-12 Thread Marco Neumann
I wasn't aware of the geo:is3D property myself but it is apparently in Req 12
of the final GeoSPARQL release V1.0:

https://portal.ogc.org/files/?artifact_id=44722

Maybe it's only in the requirements.



On Sun, Dec 12, 2021 at 8:24 PM Greg  wrote:

> Hi,
>
> The WKT parser supports XY, XYZ, XYM, and XYZM coordinate notations. The
> presence of dimensions beyond XY should not have an impact on the
> geometry being used in spatial relation or other queries. The triples
> exist in the graph and will be processed, but GeoSPARQL 1.0 is 2D
> only, so the extra dimensions are not used in calculations.
>
> I'm not aware of an 'is3D' property in the v1.0 spec as this can be
> tested using the 'geo:coordinateDimension' (either 2, 3, or 4 - i.e. XY,
> XYZ, XYM and XYZM) and/or 'geo:spatialDimension' (either 2 or 3 - i.e.
> XY and XYM or XYZ and XYZM) properties of geo:Geometry (Section 8.4).
> These values are inferred in the Jena implementation and do not need to
> be asserted to be accessed.
>
> The standard also states that invalid geometry literals are to be
> treated as errors, hence the 'DatatypeFormatException'.
>
> Thanks,
>
> Greg
>
> On 11/12/2021 19:33, Marco Neumann wrote:
> > BTW this (the implementation of the property function) would be a great
> > project for members in the community to make a contribution to the Jena
> > project and the geosparql module in particular. I would think that this
> > could be a very manageable task for an experienced Java developer with an
> > interest in geospatial data processing.
> >
> >
> >
> > On Sat, Dec 11, 2021 at 6:19 PM Marco Neumann 
> > wrote:
> >
> >> That said, I am just looking at the code base and I think we are
> missing a
> >> property function for is3D which is mentioned in the v1.0 spec.
> >>
> >> On Sat, Dec 11, 2021 at 5:14 PM Marco Neumann 
> >> wrote:
> >>
> >>> good to see some discussion around this at GeoSPARQL 1.1. But from
> what I
> >>> can see it is still in an early phase with regards to 3D.
> >>>
> >>> The name geosparql++ was just used as a flag to indicate to the user
> >>> that we are operating outside of the realm of OGC geosparql 1.0 spec.
> >>>
> >>> similar to what we do with sparql vs arq syntax
> >>>
> >>>
> >>>
> >>> On Sat, Dec 11, 2021 at 5:01 PM Lorenz Buehmann <
> >>> buehm...@informatik.uni-leipzig.de> wrote:
> >>>
> >>>> It's on the way with GeoSPARQL 1.1, isn't it? At least there are
> tickets
> >>>> related to it, e.g. [1] and many functions will be stated to work on
> 3D
> >>>> as well [2]
> >>>>
> >>>>> I personally think we should go beyond GeoSPARQL soon with Jena to
> >>>> provide
> >>>>> users with more advanced features. Possibly flag it as geosparql++ or
> >>>> the
> >>>>> like.
> >>>> You mean because there was already this GeoSPARQL+ thing from Steffen
> >>>> Staab group with support for rasterized data? It's a shame that that
> >>>> stuff never makes it into the main public projects it is based
> >>>> on. What a waste of resources and time (from my point of view)
> >>>>
> >>>> [1] https://github.com/opengeospatial/ogc-geosparql/issues/238
> >>>> [2]
> >>>>
> >>>>
> https://opengeospatial.github.io/ogc-geosparql/geosparql11/spec.html#_b_1_functions_summary_table
> >>>>
> >>>> On 11.12.21 17:39, Marco Neumann wrote:
> >>>>> That's correct Jean-Marc, no comma.
> >>>>>
> >>>>> And yes, the OGC GeoSPARQL spec does not support 3D access methods.
> >>>> And if
> >>>>> you record a third dimension, which is of course possible, it will be
> >>>>> ignored in Jena. Unfortunately the entire record will be ignored. We could
> >>>> record
> >>>>> this as a bug but it's really not supported at the moment by the
> spec.
> >>>> Many
> >>>>> of the spatial functions in the OGC GeoSPARQL spec operate with a 2D
> >>>>> reference system.
> >>>>>
> >>>>> I personally think we should go beyond GeoSPARQL soon with Jena to
> >>>> provide
> >>>>> users with more advanced features. Possibly flag it as geosparql++ or
> >>>> the
> >>>>> like.
> >>>>>
> >>>

Re: Re: Error initializing geosparql

2021-12-11 Thread Marco Neumann
BTW this (the implementation of the property function) would be a great
project for members in the community to make a contribution to the Jena
project and the geosparql module in particular. I would think that this
could be a very manageable task for an experienced Java developer with an
interest in geospatial data processing.



On Sat, Dec 11, 2021 at 6:19 PM Marco Neumann 
wrote:

> That said, I am just looking at the code base and I think we are missing a
> property function for is3D which is mentioned in the v1.0 spec.
>
> On Sat, Dec 11, 2021 at 5:14 PM Marco Neumann 
> wrote:
>
>> good to see some discussion around this at GeoSPARQL 1.1. But from what I
>> can see it is still in an early phase with regards to 3D.
>>
>> The name geosparql++ was just used as a flag to indicate to the user
>> that we are operating outside of the realm of OGC geosparql 1.0 spec.
>>
>> similar to what we do with sparql vs arq syntax
>>
>>
>>
>> On Sat, Dec 11, 2021 at 5:01 PM Lorenz Buehmann <
>> buehm...@informatik.uni-leipzig.de> wrote:
>>
>>> It's on the way with GeoSPARQL 1.1, isn't it? At least there are tickets
>>> related to it, e.g. [1] and many functions will be stated to work on 3D
>>> as well [2]
>>>
>>> > I personally think we should go beyond GeoSPARQL soon with Jena to
>>> provide
>>> > users with more advanced features. Possibly flag it as geosparql++ or
>>> the
>>> > like.
>>> You mean because there was already this GeoSPARQL+ thing from Steffen
>>> Staab group with support for rasterized data? It's a shame that that
>>> stuff never makes it into the main public projects it is based
>>> on. What a waste of resources and time (from my point of view)
>>>
>>> [1] https://github.com/opengeospatial/ogc-geosparql/issues/238
>>> [2]
>>>
>>> https://opengeospatial.github.io/ogc-geosparql/geosparql11/spec.html#_b_1_functions_summary_table
>>>
>>> On 11.12.21 17:39, Marco Neumann wrote:
>>> > That's correct Jean-Marc, no comma.
>>> >
>>> > And yes, the OGC GeoSPARQL spec does not support 3D access methods. And
>>> if
>>> > you record a third dimension, which is of course possible, it will be
>>> > ignored in Jena. Unfortunately the entire record will be ignored. We could
>>> record
>>> > this as a bug but it's really not supported at the moment by the spec.
>>> Many
>>> > of the spatial functions in the OGC GeoSPARQL spec operate with a 2D
>>> > reference system.
>>> >
>>> > I personally think we should go beyond GeoSPARQL soon with Jena to
>>> provide
>>> > users with more advanced features. Possibly flag it as geosparql++ or
>>> the
>>> > like.
>>> >
>>> > Best,
>>> > Marco
>>> >
>>> >
>>> >
>>> >
>>> > On Sun, Dec 5, 2021 at 4:15 PM Jean-Marc Vanel <
>>> jeanmarc.va...@gmail.com>
>>> > wrote:
>>> >
>>> >> I fixed the WKT not having the right datatype, as said before; here
>>> are the
>>> >> SPARQL used to check and fix:
>>> >> COUNT-spatial-wkt-as-string.rq
>>> >> <
>>> >>
>>> https://github.com/jmvanel/semantic_forms/blob/master/sparql/COUNT-spatial-wkt-as-string.rq
>>> >> FIX-spatial-wkt-as-string.upd.rq
>>> >> <
>>> >>
>>> https://github.com/jmvanel/semantic_forms/blob/master/sparql/FIX-spatial-wkt-as-string.upd.rq
>>> >> Now this is not the end of the road. Another piece of imperfect data causing
>>> >> geosparql initialization to fail:
>>> >>
>>> >> *Exception: Build WKT Geometry Exception - Type: point, Coordinates:
>>> >> (2.353821,48.83399,0). Index 1 out of bounds for length 1*
>>> >> 2021-12-05T15:48:54.166Z [application-akka.actor.default-dispatcher-5]
>>> >> ERROR jena - Exception class:class
>>> >> org.apache.jena.datatypes.DatatypeFormatException
>>> >> 2021-12-05T15:48:54.167Z [application-akka.actor.default-dispatcher-5]
>>> >> ERROR jena - Exception
>>> >> org.apache.jena.datatypes.DatatypeFormatException: Build WKT Geometry
>>> >> Exception - Type: point, Coordinates: (2.353821,48.83399,0). Index 1
>>> out of
>>> >> bounds for length 1
>>> >>
>>> >>
>>> org.apache.j

Re: Re: Error initializing geosparql

2021-12-11 Thread Marco Neumann
That said, I am just looking at the code base and I think we are missing a
property function for is3D which is mentioned in the v1.0 spec.

On Sat, Dec 11, 2021 at 5:14 PM Marco Neumann 
wrote:

> good to see some discussion around this at GeoSPARQL 1.1. But from what I
> can see it is still in an early phase with regards to 3D.
>
> The name geosparql++ was just used as a flag to indicate to the user that
> we are operating outside of the realm of OGC geosparql 1.0 spec.
>
> similar to what we do with sparql vs arq syntax
>
>
>
> On Sat, Dec 11, 2021 at 5:01 PM Lorenz Buehmann <
> buehm...@informatik.uni-leipzig.de> wrote:
>
>> It's on the way with GeoSPARQL 1.1, isn't it? At least there are tickets
>> related to it, e.g. [1] and many functions will be stated to work on 3D
>> as well [2]
>>
>> > I personally think we should go beyond GeoSPARQL soon with Jena to
>> provide
>> > users with more advanced features. Possibly flag it as geosparql++ or
>> the
>> > like.
>> You mean because there was already this GeoSPARQL+ thing from Steffen
>> Staab group with support for rasterized data? It's a shame that that
>> stuff never makes it into the main public projects it is based
>> on. What a waste of resources and time (from my point of view)
>>
>> [1] https://github.com/opengeospatial/ogc-geosparql/issues/238
>> [2]
>>
>> https://opengeospatial.github.io/ogc-geosparql/geosparql11/spec.html#_b_1_functions_summary_table
>>
>> On 11.12.21 17:39, Marco Neumann wrote:
>> > That's correct Jean-Marc, no comma.
>> >
>> > And yes, the OGC GeoSPARQL spec does not support 3D access methods. And
>> if
>> > you record a third dimension, which is of course possible, it will be
>> > ignored in Jena. Unfortunately the entire record will be ignored. We could
>> record
>> > this as a bug but it's really not supported at the moment by the spec.
>> Many
>> > of the spatial functions in the OGC GeoSPARQL spec operate with a 2D
>> > reference system.
>> >
>> > I personally think we should go beyond GeoSPARQL soon with Jena to
>> provide
>> > users with more advanced features. Possibly flag it as geosparql++ or
>> the
>> > like.
>> >
>> > Best,
>> > Marco
>> >
>> >
>> >
>> >
>> > On Sun, Dec 5, 2021 at 4:15 PM Jean-Marc Vanel <
>> jeanmarc.va...@gmail.com>
>> > wrote:
>> >
>> >> I fixed the WKT not having the right datatype, as said before; here
>> are the
>> >> SPARQL used to check and fix:
>> >> COUNT-spatial-wkt-as-string.rq
>> >> <
>> >>
>> https://github.com/jmvanel/semantic_forms/blob/master/sparql/COUNT-spatial-wkt-as-string.rq
>> >> FIX-spatial-wkt-as-string.upd.rq
>> >> <
>> >>
>> https://github.com/jmvanel/semantic_forms/blob/master/sparql/FIX-spatial-wkt-as-string.upd.rq
>> >> Now this is not the end of the road. Another piece of imperfect data causing
>> >> geosparql initialization to fail:
>> >>
>> >> *Exception: Build WKT Geometry Exception - Type: point, Coordinates:
>> >> (2.353821,48.83399,0). Index 1 out of bounds for length 1*
>> >> 2021-12-05T15:48:54.166Z [application-akka.actor.default-dispatcher-5]
>> >> ERROR jena - Exception class:class
>> >> org.apache.jena.datatypes.DatatypeFormatException
>> >> 2021-12-05T15:48:54.167Z [application-akka.actor.default-dispatcher-5]
>> >> ERROR jena - Exception
>> >> org.apache.jena.datatypes.DatatypeFormatException: Build WKT Geometry
>> >> Exception - Type: point, Coordinates: (2.353821,48.83399,0). Index 1
>> out of
>> >> bounds for length 1
>> >>
>> >>
>> org.apache.jena.geosparql.implementation.parsers.wkt.WKTReader.buildGeometry(WKTReader.java:141)
>> >>
>> >>
>> org.apache.jena.geosparql.implementation.parsers.wkt.WKTReader.(WKTReader.java:50)
>> >>
>> >>
>> org.apache.jena.geosparql.implementation.parsers.wkt.WKTReader.extract(WKTReader.java:292)
>> >>
>> >>
>> >>
>> org.apache.jena.geosparql.implementation.datatype.WKTDatatype.read(WKTDatatype.java:89)
>> >>
>> >>
>> org.apache.jena.geosparql.implementation.index.GeometryLiteralIndex.retrieveMemoryIndex(GeometryLiteralIndex.java:69)
>> >>
>> >>
>> org.apache.jena.geosparql.implementation.index.GeometryLiteralIndex.retrieve(GeometryLiteralI

Re: Re: Error initializing geosparql

2021-12-11 Thread Marco Neumann
Good to see some discussion around this at GeoSPARQL 1.1. But from what I
can see it is still in an early phase with regards to 3D.

The name geosparql++ was just used as a flag to indicate to the user that
we are operating outside of the realm of the OGC GeoSPARQL 1.0 spec,
similar to what we do with SPARQL vs ARQ syntax.



On Sat, Dec 11, 2021 at 5:01 PM Lorenz Buehmann <
buehm...@informatik.uni-leipzig.de> wrote:

> It's on the way with GeoSPARQL 1.1, isn't it? At least there are tickets
> related to it, e.g. [1] and many functions will be stated to work on 3D
> as well [2]
>
> > I personally think we should go beyond GeoSPARQL soon with Jena to
> provide
> > users with more advanced features. Possibly flag it as geosparql++ or the
> > like.
> You mean because there was already this GeoSPARQL+ thing from Steffen
> Staab group with support for rasterized data? It's a shame that that
> stuff never makes it into the main public projects it is based
> on. What a waste of resources and time (from my point of view)
>
> [1] https://github.com/opengeospatial/ogc-geosparql/issues/238
> [2]
>
> https://opengeospatial.github.io/ogc-geosparql/geosparql11/spec.html#_b_1_functions_summary_table
>
> On 11.12.21 17:39, Marco Neumann wrote:
> > That's correct Jean-Marc, no comma.
> >
> > And yes, the OGC GeoSPARQL spec does not support 3D access methods. And
> if
> > you record a third dimension, which is of course possible, it will be
> > ignored in Jena. Unfortunately the entire record will be ignored. We could record
> > this as a bug but it's really not supported at the moment by the spec.
> Many
> > of the spatial functions in the OGC GeoSPARQL spec operate with a 2D
> > reference system.
> >
> > I personally think we should go beyond GeoSPARQL soon with Jena to
> provide
> > users with more advanced features. Possibly flag it as geosparql++ or the
> > like.
> >
> > Best,
> > Marco
> >
> >
> >
> >
> > On Sun, Dec 5, 2021 at 4:15 PM Jean-Marc Vanel  >
> > wrote:
> >
> >> I fixed the WKT not having the right datatype, as said before; here are
> the
> >> SPARQL used to check and fix:
> >> COUNT-spatial-wkt-as-string.rq
> >> <
> >>
> https://github.com/jmvanel/semantic_forms/blob/master/sparql/COUNT-spatial-wkt-as-string.rq
> >> FIX-spatial-wkt-as-string.upd.rq
> >> <
> >>
> https://github.com/jmvanel/semantic_forms/blob/master/sparql/FIX-spatial-wkt-as-string.upd.rq
> >> Now this is not the end of the road. Another piece of imperfect data causing
> >> geosparql initialization to fail:
> >>
> >> *Exception: Build WKT Geometry Exception - Type: point, Coordinates:
> >> (2.353821,48.83399,0). Index 1 out of bounds for length 1*
> >> 2021-12-05T15:48:54.166Z [application-akka.actor.default-dispatcher-5]
> >> ERROR jena - Exception class:class
> >> org.apache.jena.datatypes.DatatypeFormatException
> >> 2021-12-05T15:48:54.167Z [application-akka.actor.default-dispatcher-5]
> >> ERROR jena - Exception
> >> org.apache.jena.datatypes.DatatypeFormatException: Build WKT Geometry
> >> Exception - Type: point, Coordinates: (2.353821,48.83399,0). Index 1
> out of
> >> bounds for length 1
> >>
> >>
> org.apache.jena.geosparql.implementation.parsers.wkt.WKTReader.buildGeometry(WKTReader.java:141)
> >>
> >>
> org.apache.jena.geosparql.implementation.parsers.wkt.WKTReader.(WKTReader.java:50)
> >>
> >>
> org.apache.jena.geosparql.implementation.parsers.wkt.WKTReader.extract(WKTReader.java:292)
> >>
> >>
> >>
> org.apache.jena.geosparql.implementation.datatype.WKTDatatype.read(WKTDatatype.java:89)
> >>
> >>
> org.apache.jena.geosparql.implementation.index.GeometryLiteralIndex.retrieveMemoryIndex(GeometryLiteralIndex.java:69)
> >>
> >>
> org.apache.jena.geosparql.implementation.index.GeometryLiteralIndex.retrieve(GeometryLiteralIndex.java:51)
> >>
> >>
> org.apache.jena.geosparql.implementation.datatype.GeometryDatatype.parse(GeometryDatatype.java:57)
> >>
> >>
> org.apache.jena.geosparql.implementation.GeometryWrapper.extract(GeometryWrapper.java:1176)
> >>
> >>
> >>
> org.apache.jena.geosparql.implementation.GeometryWrapper.extract(GeometryWrapper.java:1137)
> >>
> >>
> org.apache.jena.geosparql.implementation.GeometryWrapper.extract(GeometryWrapper.java:1147)
> >> org.apache.jena.geosparql.configuration.ModeSRS.search(ModeSRS.java:61)
> >>
> >>
> org.apache.jena.geo

Re: Error initializing geosparql

2021-12-11 Thread Marco Neumann
 example of the object values:
> > "POINT(-4.189911,54.880557,0)"
> > I probably imported them by hacking a JSON API as JSON-LD , I have to
> > check my journals ...
> >
> > Looking at the OGC GeoSPARQL standard, I saw that the WKT strings should
> > have this datatype:
> > http://www.opengis.net/ont/geosparql#wktLiteral
> >
> > So I can make a SPARQL update to FIX my data.
> > But maybe Jena GeoSPARQL could be forgiving about the string datatype for
> > WKT data.
> > And the error message should be more explicit ...
> >
> > Thanks Andy for the quick answer.
> >
> > Jean-Marc Vanel
> > <
> http://semantic-forms.cc:9112/display?displayuri=http://jmvanel.free.fr/jmv.rdf%23me>
> +33
> > (0)6 89 16 29 52
> >
> >
> > On Sun, 5 Dec 2021 at 11:52, Jean-Marc Vanel
> > wrote:
> >
> >> After having fixed bad data in the TDB database (latitude is present but
> >> not longitude, coordinates as strings, see issue
> >> https://issues.apache.org/jira/browse/JENA-2202 ),
> >> there is an exception, probably related to the database content.
> >> Here is the log:
> >> Dec 05, 2021 9:57:33 AM
> >> org.apache.sis.referencing.factory.sql.EPSGFactory 
> >> WARNING: The “SIS_DATA” environment variable is not set.
> >> 2021-12-05T09:57:33.940Z [application-akka.actor.default-dispatcher-9]
> >> INFO  jena - SpatialIndex: isFunctionRegistered true
> >> 2021-12-05T09:57:33.941Z [application-akka.actor.default-dispatcher-9]
> >> INFO  jena - Before setupSpatialIndex
> >> 2021-12-05T09:57:33.948Z [application-akka.actor.default-dispatcher-9]
> >> INFO  o.a.j.g.c.GeoSPARQLOperations - Find Mode SRS - Started
> >>
> >> And here is the exception:
> >> *Exception: Unrecognised Geometry Datatype:
> >> http://www.w3.org/2001/XMLSchema#string
> >> <http://www.w3.org/2001/XMLSchema#string> Ensure that Datatype is
> extending
> >> GeometryDatatype.*
> >>
> >>
> org.apache.jena.geosparql.implementation.datatype.GeometryDatatype.get(GeometryDatatype.java:78)
> >>
> >>
> org.apache.jena.geosparql.implementation.datatype.GeometryDatatype.get(GeometryDatatype.java:86)
> >>
> >>
> org.apache.jena.geosparql.implementation.GeometryWrapper.extract(GeometryWrapper.java:1175)
> >>
> >>
> org.apache.jena.geosparql.implementation.GeometryWrapper.extract(GeometryWrapper.java:1137)
> >>
> >>
> org.apache.jena.geosparql.implementation.GeometryWrapper.extract(GeometryWrapper.java:1147)
> >> org.apache.jena.geosparql.configuration.ModeSRS.search(ModeSRS.java:61)
> >>
> >>
> org.apache.jena.geosparql.configuration.GeoSPARQLOperations.findModeSRS(GeoSPARQLOperations.java:520)
> >>
> >>
> org.apache.jena.geosparql.spatial.SpatialIndex.buildSpatialIndex(SpatialIndex.java:336)
> >>
> >>
> org.apache.jena.geosparql.configuration.GeoSPARQLConfig.setupSpatialIndex(GeoSPARQLConfig.java:263)
> >>
> >>
> deductions.runtime.jena.RDFStoreLocalJenaProviderObject$.createDatabase(RDFStoreLocalJenaProvider.scala:175)
> >>
> >> I use the latest Jena release 4.2.0. Note that there is no trouble on
> my
> >> development machine, only on the production site, although the source
> is
> >> the same.
> >>
> >> Jean-Marc Vanel
> >>
> >>
>


-- 


---
Marco Neumann
KONA


Re: Upload large datasets to fuseki

2021-12-06 Thread Marco Neumann
I am currently experimenting with xloader. It's part of the 4.3 release.

It's not as fast as tdb2.tdbloader with the parallel option but it seems to
work more gracefully with extra large datasets.
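
For the dataset described below, the invocation would look something like
this (a sketch - Linux shell, run from the apache-jena distribution;
adjust the location and file names to yours):

bin/tdb2.xloader --loc datasetX 0.ttl.gz 1.ttl.gz

The resulting TDB2 database directory can then be attached to Fuseki as
usual.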


On Mon, Dec 6, 2021 at 1:14 PM  wrote:

>
>
> Hello,
>
> I have to upload 3 billion triples to Jena Fuseki.
> I tried
> using the following command with a first dataset (0.ttl.gz 1.ttl.gz =>
> 750 million triples):
> tdb2_tdbloader.bat --loader=parallel --loc
> datasetX 0.ttl.gz 1.ttl.gz.
>
> Loading took about 8 hours to upload 750
> million. The system has a Core i7, 16 GB RAM, SSD hard disk.
>
> Is it
> possible to optimize loading times?
>
> I have seen that there are several
> types of loaders:
> tdbloader
> tdbloader2 (I can also use a linux
> system)
> tdb2_tdbloader (with different options)
>
> Which of these is the
> best?
>
> Thanks!
>
>
>
>
>

-- 


---
Marco Neumann
KONA


Re: Heap space problem with insert where

2021-09-24 Thread Marco Neumann
Sure, and if it fits your use case, even better.

On Fri, Sep 24, 2021 at 9:41 AM Harri Kiiskinen 
wrote:

> Perhaps so; but as a tool, Jena, and SPARQL in general, is very suitable
> for managing and processing data so that the processes can be described
> and repeated. For example in this case, processing the results of the
> OCR is very quick compared to the actual OCR process, so I prefer to
> store the original results of the OCR somewhere, and do post-processing
> – which may require other stages than just the one presented here –
> later. For any external solution, I'd have to store the original text
> somewhere in any case, and keep track of the file names etc.
>
> In this case, the actual run of the corrected SPARQL took only some tens
> of seconds, which is rather good, especially compared to the amount of
> time it would take to write the necessary scripts and data management
> for making this simple process repeatable with external solutions.
>
> And in fact, if a database cannot be used for managing and processing
> data, I don't know what it should be used for :-)
>
> Harri
>
>
> On 24.9.2021 11.21, Marco Neumann wrote:
> > All that said, I would think you'd be best advised to run this type of
> > operation outside of Jena during preprocessing with CLI tools such as
> grep,
> > sed, awk or ack.
> >
> > On Fri, Sep 24, 2021 at 9:14 AM Harri Kiiskinen 
> > wrote:
> >
> >> Hi all,
> >>
> >> and thanks for the support! I did manage to resolve the problem by
> >> modifying the query, detailed comments below.
> >>
> >> Harri K.
> >>
> >> On 23.9.2021 22.47, Andy Seaborne wrote:
> >>> I guess you are using TDB2 if you have -Xmx2G. TDB1 wil use even more
> >>> heap space.
> >>
> >> Yes, TDB2.
> >>
> >>> All those named variables mean that the intermediate results are being
> >>> held onto. That includes the "no change" case. It looks like REPLACE
> and
> >>> no change is still a new string.
> >>
> >> I was afraid this might be the case.
> >>
> >>> There is at least 8 Gbytes just there by my rough calculation.
> >>
> >> -Xmx12G was not enough, so even more, I guess.
> >>
> >>> Things to try:
> >>>
> >>> 1/
> >>> Replace the use of named variables by a single expression
> >>> REPLACE (REPLACE(  ))
> >>
> >> This did the trick. Combining all the replaces to one as above was
> >> enough to keep the memory use below 7 GB.
> >>
> >> I also tried replacing the BIND's with the Jena-specific LET-constructs
> >> (https://jena.apache.org/documentation/query/assignment.html) but that
> >> had no effect – is the LET just a pre-SPARQL-1.1 addition that is
> >> practically same as BIND, or is there a meaningful difference between
> >> the two?
> >>
> >>> 2/ (expanding on Macros' email):
> >>> If you are using TDB2:
> >>>
> >>> First transaction:
> >>> COPY vice:pageocrdata TO vice:pageocrdata_clean
> >>> or
> >>> insert {
> >>>   graph vice:pageocrdata_clean {
> >>> ?page vice:ocrtext ?X .
> >>>   }
> >>> }
> >>> where {
> >>>   graph vice:pageocrdata {
> >>> ?page vice:ocrtext ?X .
> >>>   }
> >>>
> >>> then applies the changes:
> >>>
> >>> WITH vice:pageocrdata_clean
> >>> DELETE { ?page vice:ocrtext ?ocr }
> >>> INSERT { ?page vice:ocrtext ?ocr7 }
> >>> WHERE {
> >>>   ?page vice:ocrtext ?ocr .
> >>>   BIND(replace(?ocr1,'uͤ','ü') AS ?ocr7)
> >>>   FILTER (?ocr != ?ocr7)
> >>> }
> >>
> >> Is there a big difference in working within one graph as compared to
> >> intergraph update operations? Just asking because I'm compartmentalizing
> >> my data to different graphs quite a lot, but if it is significantly more
> >> expensive, I may have to rethink some processes, like shown above.
> >>
> >>> 3/
> >>> If TDB1 and none of that works, maybe reduce the internal transaction
> >>> space as well
> >>>
> >>> It so happens that SELECT LIMIT OFFSET is predictable for a persistent
> >>> database (this is not portable!!).
> >>>
> >>> WHERE {
> >>>  {
> >>>SELECT ?ocr
> >>>

Re: Heap space problem with insert where

2021-09-24 Thread Marco Neumann
All that said, I would think you'd be best advised to run this type of
operation outside of Jena during preprocessing with CLI tools such as grep,
sed, awk or ack.
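
For example, most of the character fixes in the update could be applied to
the source files before loading - a sketch with GNU sed (UTF-8 input
assumed; the file names are placeholders):

sed -e 's/ſ/s/g' -e 's/uͤ/ü/g' -e 's/aͤ/ä/g' -e 's/oͤ/ö/g' pages.nt > pages-clean.nt

with the caveat that a blanket substitution will also touch matching
characters inside IRIs, so it needs a quick check on the data first.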

On Fri, Sep 24, 2021 at 9:14 AM Harri Kiiskinen 
wrote:

> Hi all,
>
> and thanks for the support! I did manage to resolve the problem by
> modifying the query, detailed comments below.
>
> Harri K.
>
> On 23.9.2021 22.47, Andy Seaborne wrote:
> > I guess you are using TDB2 if you have -Xmx2G. TDB1 wil use even more
> > heap space.
>
> Yes, TDB2.
>
> > All those named variables mean that the intermediate results are being
> > held onto. That includes the "no change" case. It looks like REPLACE and
> > no change is still a new string.
>
> I was afraid this might be the case.
>
> > There is at least 8 Gbytes just there by my rough calculation.
>
> -Xmx12G was not enough, so even more, I guess.
>
> > Things to try:
> >
> > 1/
> > Replace the use of named variables by a single expression
> > REPLACE (REPLACE(  ))
>
> This did the trick. Combining all the replaces to one as above was
> enough to keep the memory use below 7 GB.
>
> I also tried replacing the BIND's with the Jena-specific LET-constructs
> (https://jena.apache.org/documentation/query/assignment.html) but that
> had no effect – is the LET just a pre-SPARQL-1.1 addition that is
> practically same as BIND, or is there a meaningful difference between
> the two?
>
> > 2/ (expanding on Macros' email):
> > If you are using TDB2:
> >
> > First transaction:
> > COPY vice:pageocrdata TO vice:pageocrdata_clean
> > or
> > insert {
> >  graph vice:pageocrdata_clean {
> >?page vice:ocrtext ?X .
> >  }
> >}
> >where {
> >  graph vice:pageocrdata {
> >?page vice:ocrtext ?X .
> >  }
> >
> > then applies the changes:
> >
> > WITH vice:pageocrdata_clean
> > DELETE { ?page vice:ocrtext ?ocr }
> > INSERT { ?page vice:ocrtext ?ocr7 }
> > WHERE {
> >  ?page vice:ocrtext ?ocr .
> >  BIND(replace(?ocr1,'uͤ','ü') AS ?ocr7)
> >  FILTER (?ocr != ?ocr7)
> > }
>
> Is there a big difference in working within one graph as compared to
> intergraph update operations? Just asking because I'm compartmentalizing
> my data to different graphs quite a lot, but if it is significantly more
> expensive, I may have to rethink some processes, like shown above.
>
> > 3/
> > If TDB1 and none of that works, maybe reduce the internal transaction
> > space as well
> >
> > It so happens that SELECT LIMIT OFFSET is predictable for a persistent
> > database (this is not portable!!).
> >
> > WHERE {
> > {
> >   SELECT ?ocr
> >   { graph vice:pageocrdata { ?page vice:ocrtext ?ocr . }
> >   OFFSET ... LIMIT ...
> > }
> > All the BIND
> > }
> >
> > (or filter by , starts ?ocr starts with "A" then with "B"
> >
> >  Andy
>
> Ah, yes, of course, this may become handy with even larger datasets.
>
> > BTW : replace(str(?ocr), ...
> > Any URIs will turn into strings and any language tags will be lost.
>
> Yes, that is unnecessary.
>
> > On 23/09/2021 16:28, Marco Neumann wrote:
> >> "not to bind" to be read as "just bind once"
> >>
> >> On Thu, Sep 23, 2021 at 4:25 PM Marco Neumann 
> >> wrote:
> >>
> >>> set -Xmx to 8G and try not to bind the variable, to see if this
> >>> alleviates the issue.
> >>>
> >>> On Thu, Sep 23, 2021 at 12:41 PM Harri Kiiskinen
> >>> 
> >>> wrote:
> >>>
> >>>> Hi!
> >>>>
> >>>> I'm trying to run a simple update query that reads strings from one
> >>>> graph, processes them, and stores to another:
> >>>>
> >>>>
> >>>>
> --
>
> >>>>
> >>>>insert {
> >>>>  graph vice:pageocrdata_clean {
> >>>>?page vice:ocrtext ?ocr7 .
> >>>>  }
> >>>>}
> >>>>where {
> >>>>  graph vice:pageocrdata {
> >>>>?page vice:ocrtext ?ocr .
> >>>>  }
> >>>>  bind (replace(str(?ocr),'ſ','s') as ?ocr1)
> >>>>  bind (replace(?ocr1,'uͤ','ü') as ?ocr2)
> >>>>  bind (replace(?ocr2,'aͤ'

Re: Heap space problem with insert where

2021-09-23 Thread Marco Neumann
"not to bind" to be read as "just bind once"

On Thu, Sep 23, 2021 at 4:25 PM Marco Neumann 
wrote:

> set -Xmx to 8G and try not to bind the variable, to see if this
> alleviates the issue.
>
> On Thu, Sep 23, 2021 at 12:41 PM Harri Kiiskinen 
> wrote:
>
>> Hi!
>>
>> I'm trying to run a simple update query that reads strings from one
>> graph, processes them, and stores to another:
>>
>>
>> --
>>   insert {
>> graph vice:pageocrdata_clean {
>>   ?page vice:ocrtext ?ocr7 .
>> }
>>   }
>>   where {
>> graph vice:pageocrdata {
>>   ?page vice:ocrtext ?ocr .
>> }
>> bind (replace(str(?ocr),'ſ','s') as ?ocr1)
>> bind (replace(?ocr1,'uͤ','ü') as ?ocr2)
>> bind (replace(?ocr2,'aͤ','ä') as ?ocr3)
>> bind (replace(?ocr3,'oͤ','ö') as ?ocr4)
>> bind (replace(?ocr4,"[⸗—]\n",'') as ?ocr5)
>> bind (replace(?ocr5,"\n",' ') as ?ocr6)
>> bind (replace(?ocr6,"[ ]+",' ') as ?ocr7)
>>   }
>>
>> ---
>> The source graph has some 250,000 triples that fill the WHERE criterion.
>> The strings are one to two thousand characters in length.
>>
>> I'm running the query using the Fuseki web UI, and it ends each time with
>> "Bad Request (#400) Java heap space". The fuseki log does not show any
>> error except for the Bad Request #400. I'm quite surprised by this problem,
>> because the update operation is simple and straightforward data
>> processing, with no ordering etc.
>>
>> I started with -Xmx2G, but even increasing the heap to -Xmx12G only
>> increases the time it takes for Fuseki to return the same error.
>>
>> Is there something wrong with the SPARQL above? Is there something that
>> increases the memory use unnecessarily?
>>
>> Best,
>>
>> Harri Kiiskinen
>>
>
>
> --
>
>
> ---
> Marco Neumann
> KONA
>
>

-- 


---
Marco Neumann
KONA


Re: Heap space problem with insert where

2021-09-23 Thread Marco Neumann
Set -Xmx to 8G and try not to bind the variable, to see if this
alleviates the issue.

On Thu, Sep 23, 2021 at 12:41 PM Harri Kiiskinen 
wrote:

> Hi!
>
> I'm trying to run a simple update query that reads strings from one graph,
> processes them, and stores to another:
>
>
> --
>   insert {
> graph vice:pageocrdata_clean {
>   ?page vice:ocrtext ?ocr7 .
> }
>   }
>   where {
> graph vice:pageocrdata {
>   ?page vice:ocrtext ?ocr .
> }
> bind (replace(str(?ocr),'ſ','s') as ?ocr1)
> bind (replace(?ocr1,'uͤ','ü') as ?ocr2)
> bind (replace(?ocr2,'aͤ','ä') as ?ocr3)
> bind (replace(?ocr3,'oͤ','ö') as ?ocr4)
> bind (replace(?ocr4,"[⸗—]\n",'') as ?ocr5)
> bind (replace(?ocr5,"\n",' ') as ?ocr6)
> bind (replace(?ocr6,"[ ]+",' ') as ?ocr7)
>   }
>
> ---
> The source graph has some 250,000 triples that fill the WHERE criterion.
> The strings are one to two thousand characters in length.
>
> I'm running the query using the Fuseki web UI, and it ends each time with
> "Bad Request (#400) Java heap space". The fuseki log does not show any
> error except for the Bad Request #400. I'm quite surprised by this problem,
> because the update operation is simple and straightforward data
> processing, with no ordering etc.
>
> I started with -Xmx2G, but even increasing the heap to -Xmx12G only
> increases the time it takes for Fuseki to return the same error.
>
> Is there something wrong with the SPARQL above? Is there something that
> increases the memory use unnecessarily?
>
> Best,
>
> Harri Kiiskinen
>


-- 


---
Marco Neumann
KONA


Re: Re: Undesirable SPARQL Jena Query Pattern Behavior with Optionals

2021-08-05 Thread Marco Neumann
Thank you for the link Lorenz. Yes, it behaves the same with the same data;
in my test I had a different dataset where ?xLabel was bound but with no
matching ?x variable.

Someone has recently elaborated on this with regards to Jena here.
https://newbedev.com/sparql-optional-query
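
A concrete way to see it with the algebra Andy gives below: suppose the
data is

:x :p :y .
:z rdfs:label "Z" .

With OPTIONAL first, (leftjoin (table unit) ...) yields the single row
?x = :z, ?xLabel = "Z"; joining that with (bgp (?x :p ?y)) fails because
:z has no :p, so the query returns no rows. With OPTIONAL second, ?x = :x
is matched first and merely fails to pick up a label, so :x is returned.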

On Thu, Aug 5, 2021 at 7:44 AM Lorenz Buehmann <
buehm...@informatik.uni-leipzig.de> wrote:

> That shouldn't happen for Blazegraph, especially as they even have a
> blog entry for this topic, dubbed "order matters":
> https://github.com/blazegraph/database/wiki/SPARQL_Order_Matters
>
> You can check the Blazegraph query plan as well, just put
> =details to the request URL
>
> You can also try to disable the query optimizer via
>
> SELECT ... WHERE {
>hint:Query hint:optimizer "None".
>...
> }
>
>
> If this changes the result, then there is a bug in Blazegraph
>
> On 03.08.21 19:44, Marco Neumann wrote:
> > OK yes, thank you for the algebra hint and implicit empty graph pattern
> > Andy. Blazegraph has thrown me off here since it seems to interpret the
> > query differently, without preserving the table unit on the left hand
> side
> > I presume.
> >
> > On Tue, Aug 3, 2021 at 6:20 PM Andy Seaborne  wrote:
> >
> >>
> >> On 03/08/2021 17:38, Marco Neumann wrote:
> >>> I have just noticed that the following query pattern with optionals
> >> yields
> >>> undesirable SPARQL query results if the variable (?xLabel) isn't bound.
> >>>
> >>> Let the data be: :x :p :y
> >>>
> >>> SELECT ?x
> >>> WHERE{
> >>> OPTIONAL{ ?x rdfs:label ?xLabel.}
> >>> ?x :p ?y.
> >>> }
> >>   (join
> >> (leftjoin
> >>   (table unit)
> >>   (bgp (triple ?x rdfs:label ?xLabel)))
> >> (bgp (triple ?x :p ?y)
> >>
> >> Note the (table unit) in the leftjoin
> >>
> >> SELECT ?x
> >> WHERE{
> >> {}
> >> OPTIONAL{ ?x rdfs:label ?xLabel.}
> >> ?x :p ?y.
> >> }
> >>
> >>> while reversing the order in the query pattern yields a result as
> >> expected:
> >>> :x
> >>>
> >>> SELECT ?x
> >>> WHERE{
> >>> ?x :p ?y.
> >>> OPTIONAL{ ?x rdfs:label ?xLabel.}
> >>> }
> >> (project (?x)
> >>   (leftjoin
> >> (bgp (triple ?x :p ?y))
> >> (bgp (triple ?x rdfs:label ?xLabel)
> >>
> >>> Is this (the importance of order of optionals) a normal behavior during
> >>> query execution here?
> >> Yes - it is correct. They are different queries with different results
> >> (this is not an optimizer issue).
> >>
> >>   Andy
> >>
> >
>


-- 


---
Marco Neumann
KONA


Re: Undesirable SPARQL Jena Query Pattern Behavior with Optionals

2021-08-03 Thread Marco Neumann
OK yes, thank you for the algebra hint and implicit empty graph pattern
Andy. Blazegraph has thrown me off here since it seems to interpret the
query differently, without preserving the table unit on the left hand side,
I presume.

On Tue, Aug 3, 2021 at 6:20 PM Andy Seaborne  wrote:

>
>
> On 03/08/2021 17:38, Marco Neumann wrote:
> > I have just noticed that the following query pattern with optionals
> yields
> > undesirable SPARQL query results if the variable (?xLabel) isn't bound.
> >
> > Let the data be: :x :p :y
> >
> > SELECT ?x
> > WHERE{
> > OPTIONAL{ ?x rdfs:label ?xLabel.}
> > ?x :p ?y.
> > }
>
>  (join
>(leftjoin
>  (table unit)
>  (bgp (triple ?x rdfs:label ?xLabel)))
>(bgp (triple ?x :p ?y)
>
> Note the (table unit) in the leftjoin
>
> SELECT ?x
> WHERE{
>{}
>OPTIONAL{ ?x rdfs:label ?xLabel.}
>?x :p ?y.
> }
>
> >
> > while reversing the order in the query pattern yields a result as
> expected:
> > :x
> >
> > SELECT ?x
> > WHERE{
> > ?x :p ?y.
> > OPTIONAL{ ?x rdfs:label ?xLabel.}
> > }
>
>(project (?x)
>  (leftjoin
>(bgp (triple ?x :p ?y))
>(bgp (triple ?x rdfs:label ?xLabel)
>
> >
> > Is this (the importance of order of optionals) a normal behavior during
> > query execution here?
>
> Yes - it is correct. They are different queries with different results
> (this is not an optimizer issue).
>
>  Andy
>


-- 


---
Marco Neumann
KONA


Undesirable SPARQL Jena Query Pattern Behavior with Optionals

2021-08-03 Thread Marco Neumann
I have just noticed that the following query pattern with optionals yields
undesirable SPARQL query results if the variable (?xLabel) isn't bound.

Let the data be: :x :p :y

SELECT ?x
WHERE{
OPTIONAL{ ?x rdfs:label ?xLabel.}
?x :p ?y.
}

while reversing the order in the query pattern yields a result as expected:
:x

SELECT ?x
WHERE{
?x :p ?y.
OPTIONAL{ ?x rdfs:label ?xLabel.}
}

Is this (the importance of order of optionals) a normal behavior during
query execution here?

-- 


---
Marco Neumann
KONA


Re: scaling jena

2021-04-15 Thread Marco Neumann
Reza, the short answer is yes, but please try to define "enterprise scale
company" here a little more precisely. Is it primarily about deployment in a
particular "cloud" infrastructure?

Best,
Marco

On Thu, Apr 15, 2021 at 8:24 AM reza sedighi  wrote:

> Hi,
>
> I want to use Jena for an enterprise scale company, so it should be
> scalable in answering queries and also in storing data. Is that possible?
>
> best regards
> Reza
>


-- 


---
Marco Neumann
KONA


Re: [Apache Fuseki Cluster] Is it possible to create a cluster of fuseki

2021-03-01 Thread Marco Neumann
Marco,
the short answer is yes. Please check out the RDF Delta project:
https://afs.github.io/rdf-delta/

If this does not correspond with your requirements, please elaborate on what
you want to achieve with the cluster.

Best,
Marco


On Mon, Mar 1, 2021 at 8:03 AM Marco Franke  wrote:

> Dear all,
>
>
>
> I read on the Jena e-mail list that an Apache Jena cluster setup is
> possible. Up to now, we use just one Apache Fuseki instance but run into
> storage and querying limitations.
>
> If the possibility is given, we would like to create a cluster of Apache
> Fuseki instances. Is this possible? If yes, could you please share the
> URL of documentation on how to set it up?
>
>
>
> I really appreciate any help you can provide.
>
>
>
> Kind Regards,
>
> Marco Franke
>
>
>
>
>
> *i.A. Dipl. -Inf. Marco Franke*
>
> Wissenschaftlicher Mitarbeiter in 2.2
>
>
>
> *BIBA** - Bremer Institut für Produktion und Logistik GmbH*
>
>
>
> Informations- und kommunikationstechnische Anwendungen in der Produktion
>
> Prof. Dr.-Ing. Klaus-Dieter Thoben
>
>
>
> Raum 1490
> Tel.: +49 (0)421 218-50089
>
> Fax: +49 (0)421 218-50007
>
> f...@biba.uni-bremen.de
>
>
>
>
>
>
> BIBA – Bremer Institut für Produktion und Logistik GmbH
>
> Postanschrift: Postfach P.O.B. 33 05 60 · 28335 Bremen / Germany
>
> Geschäftssitz: Hochschulring 20 · 28359 Bremen / Germany
>
> USt-ID: DE814890109 · Amtsgericht Bremen HRB 24505 HB
>
> Tel: +49 (0)421/218-5 · Fax: +49 (0)421/218-50031
>
> E-Mail: i...@biba.uni-bremen.de · Internet: www.biba.uni-bremen.de
>
> Geschäftsführer: Prof. Dr.-Ing. M. Freitag, Prof. Dr.-Ing. K.-D. Thoben,
> O. Simon
>
>
>
>
>


-- 


---
Marco Neumann
KONA


Re: New attempt with GeoSparql API

2021-02-27 Thread Marco Neumann
Of course in code we can always mix and match.

But we should also track some of the discussion here:

https://github.com/opengeospatial/ogc-geosparql/milestone/1

I prefer to focus on the basics for Jena, but going forward we should
possibly keep an eye on 1.2 and 2.0.

I think it's important to take a look at the issues with Apache Jena
GeoSPARQL that Milos mentioned at the Semantic GeoSpatial Web - Use
Cases Workshop 2021 last week.

Maybe we should do another follow-up session on this to make sure we have
addressed the show-stoppers.



On Sat, Feb 27, 2021 at 9:51 AM Jean-Marc Vanel 
wrote:

> About performance, all I can say is that indexing 2 dbPedia cities takes
> 608 ms elapsed time from scratch,
> and re-indexing after loading one more city takes 4 ms .
> This is acceptable, and hopefully the re-indexing  time is mostly dependent
> on the increment, not on the overall size of already indexed spatial data.
>
> I'll try GeoSPARQL + Lucene, to see for myself; I see no fundamental reason
> preventing having two different indices on a database (actually altogether
> 16 , the 12 TDB/*.idn plus Lucene plus GeoSparql ).
>
> Time permitting, I also want to try 4.0.0-SNAPSHOT.
>
> Jean-Marc Vanel
> <
> http://semantic-forms.cc:9112/display?displayuri=http://jmvanel.free.fr/jmv.rdf%23me
> >
> +33
> (0)6 89 16 29 52
>
>
> On Sat, Feb 27, 2021 at 10:14 AM, Marco Neumann wrote:
>
> > On Sat, Feb 27, 2021 at 8:48 AM Jean-Marc Vanel <
> jeanmarc.va...@gmail.com>
> > wrote:
> >
> > > The result is now correct. The missing call is
> > > GeoSPARQLConfig.setupMemoryIndex()
> > > source code updated:
> > >
> > >
> >
> https://github.com/jmvanel/semantic_forms/blob/master/scala/forms/src/main/scala/deductions/runtime/jena/GeoSPARLtest.scala#L11
> > >
> > > NOTES
> > >
> > >- but need to re-index after RDF addition;
> > >- setupMemoryIndex()  actually registers special SPARQL
> > predicates,
> > >which is not apparent in method name;
> > >- QUESTIONS:
> > >   -  how expensive in terms of CPU, elapsed time and storage is
> > >   re-indexing?
> > >
> > you will have to test that yourself
> > did you check out the
> >
> >   https://github.com/galbiston/geosparql-benchmarking
> >
> > and
> >
> >  https://github.com/OpenLinkSoftware/GeoSPARQLBenchmark
> >
> > would be nice to compare them
> >
> >
> >   -  how to make re-index automatic?
> > >
> >
> > they should be, though it depends on your conformance requirements
> > with OGC GeoSPARQL as well. Query rewriting requires inferencing. Try
> > the standalone implementations for your tests first.
> >
> >
> > >   - is GeoSPARQL indexing compatible with Lucene indexing?
> > >
> >
> > no, the geospatial module uses a different approach to indexing. The
> > Lucene index is not directly reusable in the Apache Jena geosparql
> > module. But Andy mentioned a resurrection of our Lucene spatial
> > integration with Jena 4.
> >
> > we may approach this integration with a compliance register in mind in
> > the future from an OGC GeoSPARQL 1.0 conformance level point of view.
> >
> >
> > >
> > > Jean-Marc Vanel
> > > <
> > >
> >
> http://semantic-forms.cc:9112/display?displayuri=http://jmvanel.free.fr/jmv.rdf%23me
> > > >
> > > +33
> > > (0)6 89 16 29 52
> > >
> > >
> > > On Wed, Feb 24, 2021 at 9:17 AM, Jean-Marc Vanel <jeanmarc.va...@gmail.com> wrote:
> > >
> > > > The Scala code is here;
> > > >
> > > >
> > >
> >
> https://github.com/jmvanel/semantic_forms/blob/master/scala/forms/src/main/scala/deductions/runtime/jena/GeoSPARLtest.scala#L11
> > > > starting with empty TDB 1; just load 2 dbPedia cities with geo:
> > > > coordinates, initialize with
> > > >   GeoSPARQLConfig.setupSpatialIndex(dataset)
> > > > and query with spatial:withinBox .
> > > > Alas, the result is empty (see bold line).
> > > >
> > > > Log output :
> > > > 2021-02-24T08:04:14.609Z [run-main-6] INFO
> > o.a.j.g.c.GeoSPARQLOperations
> > > > - Find Mode SRS - Started
> > > > 2021-02-24T08:04:14.633Z [run-main-6] INFO
> > o.a.j.g.c.GeoSPARQLOperations
> > > > - Find Mode SRS - Completed
> > > > 2021-02-24T08:04:14.634Z [run-main-6] INFO
> >

Re: New attempt with GeoSparql API

2021-02-27 Thread Marco Neumann
On Sat, Feb 27, 2021 at 8:48 AM Jean-Marc Vanel 
wrote:

> The result is now correct. The missing call is
> GeoSPARQLConfig.setupMemoryIndex()
> source code updated:
>
> https://github.com/jmvanel/semantic_forms/blob/master/scala/forms/src/main/scala/deductions/runtime/jena/GeoSPARLtest.scala#L11
>
> NOTES
>
>- but need to re-index after RDF addition;
>- setupMemoryIndex()  actually registers special SPARQL predicates,
>which is not apparent in method name;
>- QUESTIONS:
>   -  how expensive in terms of CPU, elapsed time and storage is
>   re-indexing?
>
You will have to test that yourself. Did you check out

  https://github.com/galbiston/geosparql-benchmarking

and

  https://github.com/OpenLinkSoftware/GeoSPARQLBenchmark ?

It would be nice to compare them.


  -  how to make re-index automatic?
>

they should be, though it depends on your conformance requirements with
OGC GeoSPARQL as well. Query rewriting requires inferencing. Try the
standalone implementations for your tests first.


>   - is GeoSPARQL indexing compatible with Lucene indexing?
>

no, the geospatial module uses a different approach to indexing. The Lucene
index is not directly reusable in the Apache Jena geosparql module. But
Andy mentioned a resurrection of our Lucene spatial integration with Jena
4.

we may approach this integration with a compliance register in mind in the
future from an OGC GeoSPARQL 1.0 conformance level point of view.
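
To pull the pieces of this thread together, a minimal sketch of the
in-memory setup (method names as quoted in this thread; the data file name
is illustrative):

    import org.apache.jena.geosparql.configuration.GeoSPARQLConfig;
    import org.apache.jena.query.Dataset;
    import org.apache.jena.riot.RDFDataMgr;

    public class SpatialSetup {
        public static void main(String[] args) throws Exception {
            // Register the GeoSPARQL functions and spatial:* property functions
            GeoSPARQLConfig.setupMemoryIndex();
            // Load the data, then build the spatial index over it
            Dataset dataset = RDFDataMgr.loadDataset("cities.ttl");
            GeoSPARQLConfig.setupSpatialIndex(dataset);
            // ... run spatial:withinBox queries against 'dataset' as usual
        }
    }

And note the caveat discussed above: after further RDF additions the
spatial index currently has to be rebuilt.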


>
> Jean-Marc Vanel
> <
> http://semantic-forms.cc:9112/display?displayuri=http://jmvanel.free.fr/jmv.rdf%23me
> >
> +33
> (0)6 89 16 29 52
>
>
> On Wed, Feb 24, 2021 at 9:17 AM, Jean-Marc Vanel wrote:
>
> > The Scala code is here;
> >
> >
> https://github.com/jmvanel/semantic_forms/blob/master/scala/forms/src/main/scala/deductions/runtime/jena/GeoSPARLtest.scala#L11
> > starting with empty TDB 1; just load 2 dbPedia cities with geo:
> > coordinates, initialize with
> >   GeoSPARQLConfig.setupSpatialIndex(dataset)
> > and query with spatial:withinBox .
> > Alas, the result is empty (see bold line).
> >
> > Log output :
> > 2021-02-24T08:04:14.609Z [run-main-6] INFO  o.a.j.g.c.GeoSPARQLOperations
> > - Find Mode SRS - Started
> > 2021-02-24T08:04:14.633Z [run-main-6] INFO  o.a.j.g.c.GeoSPARQLOperations
> > - Find Mode SRS - Completed
> > 2021-02-24T08:04:14.634Z [run-main-6] INFO
> >  o.a.j.geosparql.spatial.SpatialIndex - Building Spatial Index - Started
> > 2021-02-24T08:04:14.634Z [run-main-6] INFO
> >  o.a.j.geosparql.spatial.SpatialIndex - Geo predicate statements found.
> > févr. 24, 2021 8:04:14 AM
> > org.apache.sis.referencing.factory.sql.EPSGFactory 
> >
> > *AVERTISSEMENT: La variable environnementale « SIS_DATA » n’est pas
> > définie.*2021-02-24T08:04:14.973Z [run-main-6] INFO
> >  o.a.j.geosparql.spatial.SpatialIndex - Building Spatial Index -
> Completed
> >
> > *?feature*[success] Total time: 5 s, completed 24 févr. 2021 à 08:04:15
> >
> > If someone wants Java code to try, send me a private mail and I'll write
> > it :) .
> >
> > Jean-Marc Vanel
> > <
> http://semantic-forms.cc:1952/display?displayuri=http://jmvanel.free.fr/jmv.rdf%23me
> >
> > +33 (0)6 89 16 29 52
> > Twitter: @jmvanel , @jmvanel_fr ; chat: irc://irc.freenode.net#eulergui
> >  Chroniques jardin
> > <
> http://semantic-forms.cc:1952/history?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FChronicle
> >
> >
> >
>


-- 


---
Marco Neumann
KONA


Re: Semantic GeoSpatial Web - Use Cases Workshop

2021-02-20 Thread Marco Neumann
FYI the program for the Semantic GeoSpatial Web - Use Cases Workshop is now
almost complete. To attend the workshop on Thursday 2/25 you will have to
register on the lotico site.

http://www.lotico.com/index.php/Semantic_GeoSpatial_Web_-_Use_Cases_Workshop

On Mon, Oct 26, 2020 at 5:42 PM Marco Neumann 
wrote:

> As discussed at the ApacheCon 2020 Jena Track last month we would like to
> organize a GeoSpatial Semantic Web Use cases session to learn more about
> how our users take advantage of spatial data and access methods in SPARQL
> and Jena.
>
> https://apachecon.com/acah2020/tracks/jena.html
>
> It's almost 14 years since we introduced the first geospatial
> features to Jena, and we are now able to offer the GeoSPARQL module, which
> aspires to compatibility with the OGC GeoSPARQL 1.0 standard, along with
> the Apache Jena release cycle.
>
> If you have a use case or just an application of the spatial features in
> Jena, please consider making a contribution to this event. So please get
> in touch! Examples are not limited to the use of the latest release of the
> Jena spatial modules; any use of spatial data (e.g. it doesn't have to use
> geometries) is welcome. Even applications that make use of software beyond
> the bundling with Jena.
>
> Please find further details in the coming weeks here and register
>
>
> http://www.lotico.com/index.php/Semantic_GeoSpatial_Web_-_Use_Cases_Workshop
>
>
> --
>
>
> ---
> Marco Neumann
> KONA
>
>

-- 


---
Marco Neumann
KONA


Semantic GeoSpatial Web - Use Cases Workshop

2020-10-26 Thread Marco Neumann
As discussed at the ApacheCon 2020 Jena Track last month we would like to
organize a GeoSpatial Semantic Web Use cases session to learn more about
how our users take advantage of spatial data and access methods in SPARQL
and Jena.

https://apachecon.com/acah2020/tracks/jena.html

It's almost 14 years since we introduced the first geospatial features
to Jena, and we are now able to offer the GeoSPARQL module, which aspires
to compatibility with the OGC GeoSPARQL 1.0 standard, along with the Apache
Jena release cycle.

If you have a use case or just an application of the spatial features in
Jena, please consider making a contribution to this event. So please get in
touch! Examples are not limited to the use of the latest release of the
Jena spatial modules; any use of spatial data (e.g. it doesn't have to use
geometries) is welcome. Even applications that make use of software beyond
the bundling with Jena.

Please find further details in the coming weeks here and register

http://www.lotico.com/index.php/Semantic_GeoSpatial_Web_-_Use_Cases_Workshop


-- 


---
Marco Neumann
KONA


Re: Jena geosparql , simple export use case ; practical doc. needed

2020-10-26 Thread Marco Neumann
The PREFIX should be spatial: <http://jena.apache.org/spatial#>

And yes with regard to the conversion, but only if you want to make use of
GeoSPARQL functions.
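
Putting that together, the query quoted below would then read (bounding box
values as in the original):

    PREFIX spatial: <http://jena.apache.org/spatial#>
    SELECT * WHERE {
      ?feature spatial:withinBox( 43.0 0.0 46.0 10.0 100 )
    } LIMIT 100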

It is part of the preparation for the session to collect input before the
event and address questions like yours.



On Mon, Oct 26, 2020 at 2:17 PM Jean-Marc Vanel 
wrote:

> I missed ApacheCon, and I need some concrete hints sooner than next year.
>
> Currently my database gives an empty answer to :
> PREFIX spatial: <http://geovocab.org/spatial#>
> # ?feature spatial:withinBox(?latMin ?lonMin ?latMax ?lonMax [ ?limit])
> SELECT * WHERE {
>   ?feature spatial:withinBox( 43.0 0.0 46.0 10.0 100 )
> } LIMIT 100
>
> Should I apply once for all GeoSPARQLOperations.convert() ?
> static Dataset
> <
> https://jena.apache.org/documentation/javadoc/arq/org/apache/jena/query/Dataset.html?is-external=true
> >
> convert
> <
> https://jena.apache.org/documentation/javadoc/geosparql/org/apache/jena/geosparql/configuration/GeoSPARQLOperations.html#convert-org.apache.jena.query.Dataset-
> >
> (Dataset
> <
> https://jena.apache.org/documentation/javadoc/arq/org/apache/jena/query/Dataset.html?is-external=true
> >
>  dataset)
> Convert the input dataset to the most frequent coordinate reference system
> and default datatype.
>
> Jean-Marc Vanel
> <
> http://semantic-forms.cc:9112/display?displayuri=http://jmvanel.free.fr/jmv.rdf%23me
> >
> +33
> (0)6 89 16 29 52
>
>
> On Mon, Oct 26, 2020 at 11:02 AM, Marco Neumann wrote:
>
> > good questions Jean-Marc, I will organize a session early in the new
> > year to address some of them. In your case geo:lat and geo:long are not
> > part of OGC GeoSPARQL but require a transformation. We have a tool in
> > place for that in Jena. And yes, displaying large datasets needs
> > dedicated strategies for efficient processing. The points raised by you
> > would make a good problem statement for our session next year. Not sure
> > if you attended the ApacheCon 2020 GeoSPARQL session last month, but
> > it's where I mentioned further collaboration with third-party tool
> > developers as a possibility. Stay tuned.
> >
> >
> > On Mon, Oct 26, 2020 at 9:14 AM Jean-Marc Vanel <
> jeanmarc.va...@gmail.com>
> > wrote:
> >
> > > After reading the official
> > > https://jena.apache.org/documentation/geosparql/
> > > I'm puzzled as to concrete how to .
> > >
> > > My simple export use case
> > >
> > >- I have a TDB 1 database with geo:lat and long properties; the Jena
> > >geosparql dependency is added to my application
> > >- I have a LeafLet viewer able to display any RDF document with
> > geo:lat
> > >and long properties ; example map
> > ><
> > >
> >
> https://semantic-forms.cc:1953/assets/geo-map/geo-map.html?link-prefix=http://semantic-forms.cc:1953/display?displayuri==fr=https://semantic-forms.cc:1953/sparql?query=%0APREFIX+form%3A+%3Chttp%3A%2F%2Fraw.githubusercontent.com%2Fjmvanel%2Fsemantic_forms%2Fmaster%2Fvocabulary%2Fforms.owl.ttl%23%3E+%0APREFIX+rdfs%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E+%0APREFIX+geo%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2003%2F01%2Fgeo%2Fwgs84_pos%23%3E+%0APREFIX+foaf%3A+%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2F%3E+%0A%0ACONSTRUCT+%7B%0A++%3Fthing+geo%3Along+%3FLONG+.%0A++%3Fthing+geo%3Alat+%3FLAT+.%0A++%3Fthing+rdfs%3Alabel+%3FLAB+.%0A++%3Fthing+foaf%3Adepiction+%3FIMG+.%0A%7D+WHERE+%7B%0A++graph+%3Fg+%7B%0A%3Fthing+%3Chttp%3A%2F%2Fpurl.org%2FNET%2Fc4dm%2Fevent.owl%23produced_in%3E+%3Chttp%3A%2F%2Fsemantic-forms.cc%3A1952%2Fldp%2FCormoz%3E+.%0A++%7D%0A++graph+%3Fgcoord+%7B%0A%3Fthing+geo%3Along+%3FLONG+.%0A%3Fthing+geo%3Alat+%3FLAT+.%0A++%7D%0A++OPTIONAL+%7B%0A+++graph+%3Fg1+%7B%0A%3Fthing+rdfs%3Alabel+%3FLAB+%7D+%7D%0A++OPTIONAL+%7B%0A+++graph+%3Fg2+%7B%0A%3Fthing+%3Curn%3AdisplayLabel%3E+%3FLAB+%7D+%7D%0A%0A++OPTIONAL+%7B%0A+++graph+%3Fg3+%7B%0A%3Fthing+foaf%3Adepiction+%3FIMG+%7D+%7D%0A++OPTIONAL+%7B%0A+++graph+%3Fg4+%7B%0A%3Fthing+foaf%3Aimg+%3FIMG+%7D+%7D%0A%0AOPTIONAL+%7B%0A+++graph+%3FgrCount+%7B%0A%3Fthing+form%3AlinksCount+%3FCOUNT.%0A++%7D+%7D%0A%7D%0AORDER+BY+DESC%28%3FCOUNT%29%0A
> > > >
> > >; but I found that JavaScript based displayers are falling on their
> > > knees
> > >(becoming very slow) for thousands of points
> > >- so I plan to use QGIS, so I need an export from RDF to one of the
> > >formats QGIS supports: GML, GeoJSON , etc
> > >
> > > How can I do that ?
> > > Do I have to use one of the convert* methods in
> >

Re: Jena geosparql , simple export use case ; practical doc. needed

2020-10-26 Thread Marco Neumann
Good questions Jean-Marc, I will organize a session early in the new year
to address some of them. In your case geo:lat and geo:long are not part of
OGC GeoSPARQL but require a transformation. We have a tool in place for
that in Jena. And yes, displaying large datasets needs dedicated strategies
for efficient processing. The points raised by you would make a good
problem statement for our session next year. Not sure if you attended
the ApacheCon 2020 GeoSPARQL session last month, but it's where I
mentioned further collaboration with third-party tool developers as a
possibility. Stay tuned.

Marco


On Mon, Oct 26, 2020 at 9:14 AM Jean-Marc Vanel 
wrote:

> After reading the official
> https://jena.apache.org/documentation/geosparql/
> I'm puzzled as to concrete how to .
>
> My simple export use case
>
>- I have a TDB 1 database with geo:lat and long properties; the Jena
>geosparql dependency is added to my application
>- I have a LeafLet viewer able to display any RDF document with geo:lat
>and long properties ; example map
><
> https://semantic-forms.cc:1953/assets/geo-map/geo-map.html?link-prefix=http://semantic-forms.cc:1953/display?displayuri==fr=https://semantic-forms.cc:1953/sparql?query=%0APREFIX+form%3A+%3Chttp%3A%2F%2Fraw.githubusercontent.com%2Fjmvanel%2Fsemantic_forms%2Fmaster%2Fvocabulary%2Fforms.owl.ttl%23%3E+%0APREFIX+rdfs%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E+%0APREFIX+geo%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2003%2F01%2Fgeo%2Fwgs84_pos%23%3E+%0APREFIX+foaf%3A+%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2F%3E+%0A%0ACONSTRUCT+%7B%0A++%3Fthing+geo%3Along+%3FLONG+.%0A++%3Fthing+geo%3Alat+%3FLAT+.%0A++%3Fthing+rdfs%3Alabel+%3FLAB+.%0A++%3Fthing+foaf%3Adepiction+%3FIMG+.%0A%7D+WHERE+%7B%0A++graph+%3Fg+%7B%0A%3Fthing+%3Chttp%3A%2F%2Fpurl.org%2FNET%2Fc4dm%2Fevent.owl%23produced_in%3E+%3Chttp%3A%2F%2Fsemantic-forms.cc%3A1952%2Fldp%2FCormoz%3E+.%0A++%7D%0A++graph+%3Fgcoord+%7B%0A%3Fthing+geo%3Along+%3FLONG+.%0A%3Fthing+geo%3Alat+%3FLAT+.%0A++%7D%0A++OPTIONAL+%7B%0A+++graph+%3Fg1+%7B%0A%3Fthing+rdfs%3Alabel+%3FLAB+%7D+%7D%0A++OPTIONAL+%7B%0A+++graph+%3Fg2+%7B%0A%3Fthing+%3Curn%3AdisplayLabel%3E+%3FLAB+%7D+%7D%0A%0A++OPTIONAL+%7B%0A+++graph+%3Fg3+%7B%0A%3Fthing+foaf%3Adepiction+%3FIMG+%7D+%7D%0A++OPTIONAL+%7B%0A+++graph+%3Fg4+%7B%0A%3Fthing+foaf%3Aimg+%3FIMG+%7D+%7D%0A%0AOPTIONAL+%7B%0A+++graph+%3FgrCount+%7B%0A%3Fthing+form%3AlinksCount+%3FCOUNT.%0A++%7D+%7D%0A%7D%0AORDER+BY+DESC%28%3FCOUNT%29%0A
> >
>; but I found that JavaScript based displayers are falling on their
> knees
>(becoming very slow) for thousands of points
>- so I plan to use QGIS, so I need an export from RDF to one of the
>formats QGIS supports: GML, GeoJSON , etc
>
> How can I do that ?
> Do I have to use one of the convert* methods in
>
> https://jena.apache.org/documentation/javadoc/geosparql/index.html?org/apache/jena/geosparql/configuration/GeoSPARQLOperations.html
> ?
>
> Jean-Marc Vanel
> <
> http://semantic-forms.cc:1952/display?displayuri=http://jmvanel.free.fr/jmv.rdf%23me
> >
> +33 (0)6 89 16 29 52
> Twitter: @jmvanel , @jmvanel_fr ; chat: irc://irc.freenode.net#eulergui
>  Chroniques jardin
> <
> http://semantic-forms.cc:1952/history?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FChronicle
> >
>


-- 


---
Marco Neumann
KONA


Re: Jena GeoSparQL Fuseki

2020-10-24 Thread Marco Neumann
Download and compile the latest version in the repo. It's fixed in 3.17.
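
A sketch of the build steps, assuming a standard Git and Maven setup:

    git clone https://github.com/apache/jena.git
    cd jena
    mvn clean install -DskipTests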

Marco

On Sat, Oct 24, 2020 at 9:47 AM Johan Kumps  wrote:

> Hi all,
>
> I'm trying to follow the examples at
> https://jena.apache.org/documentation/geosparql/geosparql-fuseki
>
> When using an in memory dataset like this :
>
> java -jar jena-fuseki-geosparql-3.14.0.jar -rf "data.owl" -i -u
>
> querying works fine. But when adding the -t "TestDS" option :
>
> java -jar jena-fuseki-geosparql-3.14.0.jar -rf "data.owl" -i -u -t "TestDS"
>
> I keep getting :
>
> Exception in thread "main"
> org.apache.jena.tdb.transaction.TDBTransactionException: Not in a
> transaction
> at
>
> org.apache.jena.tdb.transaction.DatasetGraphTransaction.get(DatasetGraphTransaction.java:138)
> at
>
> org.apache.jena.tdb.transaction.DatasetGraphTransaction.get(DatasetGraphTransaction.java:49)
> at
>
> org.apache.jena.sparql.core.DatasetGraphWrapper.getR(DatasetGraphWrapper.java:81)
> at
>
> org.apache.jena.sparql.core.DatasetGraphWrapper.isEmpty(DatasetGraphWrapper.java:170)
> at
> org.apache.jena.sparql.core.DatasetImpl.isEmpty(DatasetImpl.java:247)
> at
>
> org.apache.jena.fuseki.geosparql.DatasetOperations.setup(DatasetOperations.java:95)
> at org.apache.jena.fuseki.geosparql.Main.main(Main.java:64)
>
> I tried adding jts-1.13.jar to the classpath but without success:
>
> java -cp jena-fuseki-geosparql-3.14.0.jar;jts-1.13.jar
> org.apache.jena.fuseki.geosparql.Main -rf "data.owl" -i -u -t "TestDS"
>
> Could you help me with this.
>
> Thanks!
> Johan,
>


-- 


---
Marco Neumann
KONA


Re: Jena : 20 years of code.

2020-08-28 Thread Marco Neumann
We might want to upload all releases to the repository at some point. I
have most of them locally available as zip files, except the 0.x.zip
versions that came after api2.zip.

On Fri, Aug 28, 2020 at 2:06 PM Andy Seaborne  wrote:

> Brian released the first Jena code with this message:
> https://lists.w3.org/Archives/Public/www-rdf-interest/2000Aug/0128.html
>
>  Andy
>
> > From: McBride, Brian 
> > Subject: Jena - A Java API for RDF
> > Date: Mon, 28 Aug 2000 13:40:03 +0100
> > Message-ID: <5e13a1874524d411a876006008cd059f239...@0-mail-1.hpl.hp.com>
> > To: "RDF Interest (E-mail)" 
> >
> > A few weeks ago I posted some suggestions for an improved java RDF API.
> > I've placed an implementation of these ideas at
> > http://www-uk.hpl.hp.com/people/bwm/rdf/jena/index.htm
> > <http://www-uk.hpl.hp.com/people/bwm/rdf/jena/index.htm> .
> >
> > The code supports in memory models.  David Megginson's RDFFilter is
> hooked
> > in so it can parse RDF serializations.  I suggest you use the version of
> > RDFFilter supplied in the distribution as it has some minor bug fixes.
> Its
> > alpha code - it gets through the regression tests and runs the samples
> but
> > hasn't had much use other than that.  An SQL implementation may follow.
> >
> > I think better tools will help encourage the adoption of RDF.  I'm
> looking
> > on this as an experiment to see whether this API brings any benefits.  So
> > I'd like some feedback.
> >
> > Brian McBride
> > HPLabs
>
>
>

-- 


---
Marco Neumann
KONA


Re: Float comparison

2020-08-19 Thread Marco Neumann
tical
> computations are a major concern for the application, RDF is probably not
> the best choice—RDF optimises for other concerns. Therefore the best choice
> for representing non-integer numbers in RDF is usually xsd:decimal—more
> expensive, but no issues with precision.
> >>
> >> Richard
> >
> > xsd:decimal can record any decimal precision but division may loose
> precision - otherwise "1/3" is infinite storage.
> >
> > Jena uses 24 digit precision for division for inexact results like 1/3.
> >
> >>
> >>
> >>> On 18 Aug 2020, at 05:48, Dr. Chavdar Ivanov 
> wrote:
> >>>
> >>> Hello
> >>>
> >>>
> >>>
> >>> I posted the message below to the TopBraid users mailing list and
> >>> already clarified that as sh:equals is based on RDF node equality,
> >>> values such as "1.0"^^xsd:float and "1"^^xsd:float count as distinct.
> >>> So I am keeping this for the interest of others in the list
> >
> > SPARQL has both comparisons.
> >
> > The "sameTerm()" operator for RDF termequality, and SPARQL "=" for value
> comparison (by op:numeric-equal):
> >
> >   Andy
> >
> >>>
> >>>
> >>>
> >>> But on SPARQL float comparison I got an advise to check in this
> mailing list for other opinions.
> >>>
> >>> I understand that SPARQL comparison is mathematically based so 1.0
> should be equal to 1. However below in item 2 you will see the numbers I
> compared and I am getting confused. Take into account that in the data
> graph the 2 compared properties are typed literals with datatype float.
> >>>
> >>> I wanted to know what is the precision when float is compared. So I
> >>> have 2 questions
> >>>
> >>> *   What is the precision? - is it 6th decimal and is it OK to
> compare different forms of float, i.e. one is in scientific form
> >>> *   Why I am getting wrong comparison result for bigger values
> such as100123456.1 and  100123459 which are found as same
> >>>
> >>>
> >>>
> >>> Best regards
> >>>
> >>> Chavdar
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> 
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> Dear all,
> >>>
> >>>
> >>>
> >>> I have a very basic question...
> >>>
> >>> I need to compare literals that are floats and tried to use two ways.
> >>> 1) using sh:equals to compare 2 properties and 2) using SPARQL where
> >>> I filter != different values
> >>>
> >>>
> >>>
> >>> For the filter I tried using
> >>>
> >>> FILTER (xsd:float(?value1)!=xsd:float(?value1)).
> >>>
> >>> or
> >>>
> >>> FILTER (?value1!=?value1).
> >>>
> >>> Both give the same outcome.
> >>>
> >>>
> >>>
> >>> Below I listed a summary of the tests I did
> >>>
> >>>
> >>>
> >>> I think sh:equals treats the literals as strings even though they are
> >>> floats. It also gives 2 results. I think this looks like it is according
> >>> to the SHACL spec, although I didn't check if sh:equals ignores the
> >>> datatype.
> >>>
> >>>
> >>>
> >>> However, in some cases the result from SPARQL is kind of strange.
> >>> It looks like the precision is 10^-6, but for big numbers and when the
> >>> scientific form of a float number is used we get something different.
> >>>
> >>>
> >>>
> >>> What is followed to define the difference?
> >>>
> >>> If I use google calculator
> >>>
> >>> 100123456.1-100.123459E+06=-2.9000596
> >>>
> >>>
> >>>
> >>> Normally it should be OK to compare different forms of float.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> 1) using sh:equals in the property shape
> >>>
> >>> Value1 ; Value2 ; comparison result
> >>>
> >>> 1.123456 ; 1.123456 ; same
> >>>
> >>> 1.1234560 ; 1.1234561 ; different (sh:equals reports it twice)
> >>>
> >>> 31.1234560 ; 31.1234561 ;different (sh:equals reports it twice)
> >>>
> >>> 30;  30.001 ; different (sh:equals reports it twice)
> >>>
> >>> 30 ;  30.01 ; different (sh:equals reports it twice)
> >>>
> >>> 100123456.0  ; 100123456.1 ; different (sh:equals reports it twice)
> >>>
> >>> 100123456.0  ; 100123456.0 ; same
> >>>
> >>> 100123456;  100.123456E6 ; different (sh:equals reports it twice)
> >>>
> >>> 100123456;  100.123456E+06 ; different (sh:equals reports it twice)
> >>>
> >>> -0.123456789  ;  -123.456789E-3 ; different (sh:equals reports it
> >>> twice)
> >>>
> >>> -0.123456789  ;  -123.456789E-03 ; different (sh:equals reports it
> >>> twice)
> >>>
> >>> 100123456.1;  100.123456E+06  ; different (sh:equals reports it
> twice)
> >>>
> >>> 100123456.1 ;   100.123459E+06 ; different (sh:equals reports it
> twice)
> >>>
> >>> 100123456.1 ;  100123459  ; different (sh:equals reports it
> twice)
> >>>
> >>> 100123456.1 ;  100123459.0; different (sh:equals reports it
> twice)
> >>>
> >>>
> >>>
> >>> 2) using SPARQL (in the property shape)
> >>>
> >>> 1.123456 ; 1.123456 ; same
> >>>
> >>> 1.1234560 ; 1.1234561 ; different
> >>>
> >>> 31.1234560 ; 31.1234561 ;different
> >>>
> >>> 30;  30.001 ; same
> >>>
> >>> 30 ;  30.01 ; different
> >>>
> >>> 100123456.0  ; 100123456.1 ; same
> >>>
> >>> 100123456.0  ; 100123456.0 ; same
> >>>
> >>> 100123456;  100.123456E6 ; same
> >>>
> >>> 100123456;  100.123456E+06 ; same
> >>>
> >>> -0.123456789  ;  -123.456789E-3 ; same
> >>>
> >>> -0.123456789  ;  -123.456789E-03 ; same
> >>>
> >>> 100123456.1;  100.123456E+06  ; same
> >>>
> >>> 100123456.1 ;   100.123459E+06 ; same
> >>>
> >>> 100123456.1 ;  100123459  ; same
> >>>
> >>> 100123456.1 ;  100123459.0; same
> >>>
> >>>
> >>>
> >>> Best regards
> >>>
> >>> Chavdar
> >>>
> >>>
> >>>
> >>
>


-- 


---
Marco Neumann
KONA


Re: Float comparison

2020-08-18 Thread Marco Neumann
 1) using sh:equals to compare 2 properties and 2) using SPARQL where
> >> I filter != different values
> >>
> >>
> >>
> >> For the filter I tried using
> >>
> >> FILTER (xsd:float(?value1)!=xsd:float(?value1)).
> >>
> >> or
> >>
> >> FILTER (?value1!=?value1).
> >>
> >> Both give the same outcome.
> >>
> >>
> >>
> >> Below I listed a summary of the tests I did
> >>
> >>
> >>
> >> I think sh:equals treats the literals as strings even though they are
> >> floats. It also gives 2 results. I think this looks like it is according
> >> to the SHACL spec, although I didn't check if sh:equals ignores the
> >> datatype.
> >>
> >>
> >>
> >> However, in some cases the result from SPARQL is kind of strange. It
> >> looks like the precision is 10^-6, but for big numbers and when the
> >> scientific form of a float number is used we get something different.
> >>
> >>
> >>
> >> What is followed to define the difference?
> >>
> >> If I use google calculator
> >>
> >> 100123456.1-100.123459E+06=-2.9000596
> >>
> >>
> >>
> >> Normally it should be OK to compare different forms of float.
> >>
> >>
> >>
> >>
> >>
> >> 1) using sh:equals in the property shape
> >>
> >> Value1 ; Value2 ; comparison result
> >>
> >> 1.123456 ; 1.123456 ; same
> >>
> >> 1.1234560 ; 1.1234561 ; different (sh:equals reports it twice)
> >>
> >> 31.1234560 ; 31.1234561 ;different (sh:equals reports it twice)
> >>
> >> 30;  30.001 ; different (sh:equals reports it twice)
> >>
> >> 30 ;  30.01 ; different (sh:equals reports it twice)
> >>
> >> 100123456.0  ; 100123456.1 ; different (sh:equals reports it twice)
> >>
> >> 100123456.0  ; 100123456.0 ; same
> >>
> >> 100123456;  100.123456E6 ; different (sh:equals reports it twice)
> >>
> >> 100123456;  100.123456E+06 ; different (sh:equals reports it twice)
> >>
> >> -0.123456789  ;  -123.456789E-3 ; different (sh:equals reports it
> >> twice)
> >>
> >> -0.123456789  ;  -123.456789E-03 ; different (sh:equals reports it
> >> twice)
> >>
> >> 100123456.1;  100.123456E+06  ; different (sh:equals reports it
> twice)
> >>
> >> 100123456.1 ;   100.123459E+06 ; different (sh:equals reports it
> twice)
> >>
> >> 100123456.1 ;  100123459  ; different (sh:equals reports it
> twice)
> >>
> >> 100123456.1 ;  100123459.0; different (sh:equals reports it
> twice)
> >>
> >>
> >>
> >> 2) using SPARQL (in the property shape)
> >>
> >> 1.123456 ; 1.123456 ; same
> >>
> >> 1.1234560 ; 1.1234561 ; different
> >>
> >> 31.1234560 ; 31.1234561 ;different
> >>
> >> 30;  30.001 ; same
> >>
> >> 30 ;  30.01 ; different
> >>
> >> 100123456.0  ; 100123456.1 ; same
> >>
> >> 100123456.0  ; 100123456.0 ; same
> >>
> >> 100123456;  100.123456E6 ; same
> >>
> >> 100123456;  100.123456E+06 ; same
> >>
> >> -0.123456789  ;  -123.456789E-3 ; same
> >>
> >> -0.123456789  ;  -123.456789E-03 ; same
> >>
> >> 100123456.1;  100.123456E+06  ; same
> >>
> >> 100123456.1 ;   100.123459E+06 ; same
> >>
> >> 100123456.1 ;  100123459  ; same
> >>
> >> 100123456.1 ;  100123459.0; same
> >>
> >>
> >>
> >> Best regards
> >>
> >> Chavdar
> >>
> >>
> >>
> >
>


-- 


---
Marco Neumann
KONA


Re: JENA Loader Benchmarks

2020-06-12 Thread Marco Neumann
Just reporting a performance-related regression of 20%+ with Ubuntu 20.04
LTS and JDK 13.0.2 compared to Ubuntu 19.04 with JDK 12.0.1. The
performance difference between Jena 3.13 and 3.15, on the other hand, is
marginal.

So far everything indicates that this regression is Ubuntu 20.04 distro
and JDK 13.0.2 related on my end.



On Mon, Jun 24, 2019 at 12:05 AM Andy Seaborne  wrote:

>
>
> On 23/06/2019 10:29, Marco Neumann wrote:
> > yes I'd say the local NVMe SSDs make the difference here. In my case for
> > zone US East and US East 2 the VMs only showing a premium ssd option. So
> > called ultra ssd's seem to be high in demand and currently not available
> in
> > my profile. And they also come at a very high estimated price point.
> >
> > which dataset do you use to run the load test above?
>
> It was a synthetic one to mimic up some work-related data.
>
>  Andy
>
> >
> >
> >
> > On Sat, Jun 22, 2019 at 11:47 PM Andy Seaborne  wrote:
> >
> >>
> >>
> >> On 20/06/2019 16:01, Marco Neumann wrote:
> >>> quick update here on loader performance. Did a modest (in terms of
> cost)
> >>> hardware upgrade of one of the dedicated data processors with a faster
> >> CPU
> >>> and faster NVme SSD drive and was able to almost half our load times.
> >> Very
> >>> satisfied with the HW upgrade and TDB2 loader performance. VM's don't
> >> seem
> >>> to work well for us in combination with TDB.
> >>
> >> My experience has been significant variation across different VM types.
> >> My assumption is the form of virtualization matters.
> >>
> >> I had access to an AWS i3.8xlarge for a short while which had local NVMe
> >> SSDs and got very good performance:
> >>
> >> 500mTDB22,362s  39m 22s 218,460 TPS
> >> 1 billion   TDB25,164s  1h 26m 04s  200,100 TPS
> >>
> >> (this is a single graph dataset)
> >>
> >> i3 are "Storage optimized"
> >>
> >> The TDB2 loader is multithreaded and each thread is working on a
> >> different indexes so the access patterns are jumping around all over the
> >> place both because the non-primary index is, in effect at scale,
> >> randomly accessed, and because multiple indexes are updating at the same
> >> time.
> >>
> >>   Andy
> >>
> >>>
> >>> On Fri, Jun 14, 2019 at 11:56 PM Marco Neumann <
> marco.neum...@gmail.com>
> >>> wrote:
> >>>
> >>>> absolutely it does, preferably NVMe SSD. tdbloaders are almost a
> >> showcase
> >>>> themselves for good up-to-date hardware..
> >>>>
> >>>> if possible I'd like to load the wikidata dataset* at at some point to
> >> see
> >>>> where 57GB fits in terms of tdb. The wikidata team is currently
> looking
> >> at
> >>>> new solutions that can go beyond blazegraph. And I get the impression
> >> that
> >>>> they have not yet actively considered to give jena tdb try.
> >>>>
> >>>> https://dumps.wikimedia.org/wikidatawiki/entities/
> >>>>
> >>>>
> >>>> On Fri, Jun 14, 2019 at 11:47 PM Martynas Jusevičius <
> >>>> marty...@atomgraph.com> wrote:
> >>>>
> >>>>> What about SSD disks, don't they make a difference?
> >>>>>
> >>>>> On Sat, Jun 15, 2019 at 12:36 AM Marco Neumann <
> >> marco.neum...@gmail.com>
> >>>>> wrote:
> >>>>>>
> >>>>>> that did the trick Andy, very good might be a good idea to add this
> to
> >>>>> the
> >>>>>> distribution in jena-log4j.properties
> >>>>>>
> >>>>>> I am getting these numbers for a midsize dedicated server, very nice
> >>>>>> numbers indeed Andy. well done!
> >>>>>>
> >>>>>> 00:24:53 INFO  loader   :: Loader = LoaderPhased
> >>>>>> 00:24:53 INFO  loader   :: Start:
> >>>>>> ../../public_html/lotico.ttl.gz
> >>>>>> 00:24:55 INFO  loader   :: Add: 500,000 lotico.ttl.gz
> >>>>> (Batch:
> >>>>>> 237,755 / Avg: 237,755)
> >>>>>> 00:24:56 INFO  loader   :: Add: 1,000,000 lotico.ttl.gz
> >>>>> (Batch:
> >

Re: Resource requirements and configuration for loading a Wikidata dump

2020-06-11 Thread Marco Neumann
Wolfgang, here is another link (which I did not find in your link list
yet), this time on setting up Wikidata with Blazegraph in the Google Cloud
(GCE):

https://addshore.com/2019/10/your-own-wikidata-query-service-with-no-limits-part-1/


On Thu, Jun 11, 2020 at 7:14 AM Wolfgang Fahl  wrote:

>
> On 10.06.20 at 17:46, Marco Neumann wrote:
> > Wolfgang, I hear you and I've added a dataset today with 1 billion triples
> > and will continue to try to add larger datasets over time.
> > http://www.lotico.com/index.php/JENA_Loader_Benchmarks
> >
> > If you are only specifically interested in the wikidata dump loading
> > process for this thread there is some data available on the wikidata
> > mailing list as well (no data for Jena yet though). It took some users
> 10.2
> > days to load the full Wikidata RDF dump (wikidata-20190513-all-BETA.ttl,
> > 379G) with Blazegraph 2.1.5. and apparently 43 hours with a dev version
> of
> > Virtuoso.
> > https://lists.wikimedia.org/pipermail/wikidata/2019-June/013201.html
>
> Marco - thank you - i have added a section "Performance Reports" now to
> the wiki page "Get your own copy of WikiData"
>
> http://wiki.bitplan.com/index.php/Get_your_own_copy_of_WikiData#Performance_Reports
>
> I'd appreciate to get more reports and pointers to reports for
> successful WikiData dump imports.
>
> Wolfgang
>
> --
>
> BITPlan - smart solutions
> Wolfgang Fahl
> Pater-Delp-Str. 1, D-47877 Willich Schiefbahn
> Tel. +49 2154 811-480, Fax +49 2154 811-481
> Web: http://www.bitplan.de
> BITPlan GmbH, Willich - HRB 6820 Krefeld, Steuer-Nr.: 10258040548,
> Geschäftsführer: Wolfgang Fahl
>
>
>

-- 


---
Marco Neumann
KONA


Re: Resource requirements and configuration for loading a Wikidata dump

2020-06-10 Thread Marco Neumann
Exactly Andy, thank you for the additional context, and as a matter of fact
we already query / manipulate 150bn+ triples in a LOD cloud as distributed
sets every day.

But of course we frequently see practitioners in the community who look at
the Semantic Web and Jena specifically primarily as a database technology,
while not paying that much attention to the Web and RDF / SPARQL federation
aspects.

That said, a lot of what we do here on the list with Jena is indeed geared
towards performance, optimization and features, and hence I will continue
to collect sample data for the lotico benchmarks page.
used so far in the benchmarks process simply hits a sweet spot in terms of
hardware requirements and time it takes to run quick tests. And the tests
already gave me valuable hints at how to scale out clusters for other
non-public data sets. BTW if anyone has access to more powerful hardware
configurations I'd be more than happy to test larger datasets for
benchmarking purposes and would include the results in the page :-) . And
as mentioned by Martynas a page on the Jena project site might be a good
idea as well.

Wolfgang, I hear you and I've added a dataset today with 1 billion triples
and will continue to try to add larger datasets over time.
http://www.lotico.com/index.php/JENA_Loader_Benchmarks

If you are only specifically interested in the wikidata dump loading
process for this thread there is some data available on the wikidata
mailing list as well (no data for Jena yet though). It took some users 10.2
days to load the full Wikidata RDF dump (wikidata-20190513-all-BETA.ttl,
379G) with Blazegraph 2.1.5. and apparently 43 hours with a dev version of
Virtuoso.
https://lists.wikimedia.org/pipermail/wikidata/2019-June/013201.html

Marco



On Wed, Jun 10, 2020 at 9:39 AM Andy Seaborne  wrote:

>
>
> On 09/06/2020 12:18, Wolfgang Fahl wrote:
> > Marco
> >
> > thank you for sharing your results. Could you please try to make the
> > sample size 10 and 100 times bigger for the discussion we currently have
> > at hand. Getting to a billion triples has not been a problem for the
> > WikiData import. From 1-10 billion triples it gets tougher and
> > for >10 billion triples there is no success story yet that I know of.
> >
> > This brings us to the general question - what will we do in a few years
> > from now when we'd like to work with 100 billion triples or more and the
> > upcoming decades where we might see a rise in data size that stays
> > exponential?
>
> At several levels, the world is going parallel, both one "machine" (a
> computer is a distributed system) and datacenter wide.
>
> Scale comes from multiple machines. There is still mileage in larger
> single machine architectures and better software, but not long term.
>
> At another level - why have all the data in the same place? Convenience.
>
> Search engines are not a feature of WWW architecture. They are an
> emergent effect because it is convenient (simpler, easier) to have one
> place to find things - and that also makes it a winner-takes-all market.
>
> Convenience has limits. Search engine style does not work for all tasks,
> e.g. search within the enterprise for example, or indeed for data. And
> it has consequences in the clandestine data analysis and data abuse.
>
>  Andy
>
> > Wolfgang
> >
> >
> > Am 09.06.20 um 12:17 schrieb Marco Neumann:
> >> same here, I get the best performance on single iron with SSD and fast
> >> DDRAM. The datacenters in the cloud tend to be very selective and you
> can
> >> only get the fast dedicated hardware in a few locations in the cloud.
> >>
> >> http://www.lotico.com/index.php/JENA_Loader_Benchmarks
> >>
> >> In addition keep in mind these are not query benchmarks.
> >>
> >>   Marco
> >>
> >> On Tue, Jun 9, 2020 at 10:27 AM Andy Seaborne  wrote:
> >>
> >>> It maybe that SSD is the important factor.
> >>>
> >>> 1/ From a while ago, on truthy:
> >>>
> >>>
> >>>
> https://lists.apache.org/thread.html/70dde8e3d99ce3d69de613b5013c3f4c583d96161dec494ece49a412%40%3Cusers.jena.apache.org%3E
> >>>
> >>> before tdb2.tdbloader was a thing.
> >>>
> >>> 2/ I did some (not open) testing on a mere 800M and tdb2.tdbloader with
> >>> a Dell XPS laptop (2015 model, 16G RAM, 1T M.2 SSD) and a big AWS
> server
> >>> (local NVMe, but virtualized, SSD).
> >>>
> >>> The laptop was nearly as fast as a big AWS server.
> >>>
> >>> My assumption was that as the database grew, RAM caching become less
> >>> significant and the sp

Re: Resource requirements and configuration for loading a Wikidata dump

2020-06-09 Thread Marco Neumann
Same here, I get the best performance on single iron (bare metal) with SSD
and fast DDR RAM. Cloud datacenters tend to be very selective, and you can
only get the fast dedicated hardware in a few locations.

http://www.lotico.com/index.php/JENA_Loader_Benchmarks

In addition, keep in mind that these are not query benchmarks.
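
For reference, a typical invocation of the TDB2 bulk loader discussed in
this thread looks like this (paths and file names are illustrative):

    tdb2.tdbloader --loc /data/tdb2 --loader=parallel dump.nt.gz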

 Marco

On Tue, Jun 9, 2020 at 10:27 AM Andy Seaborne  wrote:

> It maybe that SSD is the important factor.
>
> 1/ From a while ago, on truthy:
>
>
> https://lists.apache.org/thread.html/70dde8e3d99ce3d69de613b5013c3f4c583d96161dec494ece49a412%40%3Cusers.jena.apache.org%3E
>
> before tdb2.tdbloader was a thing.
>
> 2/ I did some (not open) testing on a mere 800M and tdb2.tdbloader with
> a Dell XPS laptop (2015 model, 16G RAM, 1T M.2 SSD) and a big AWS server
> (local NVMe, but virtualized, SSD).
>
> The laptop was nearly as fast as a big AWS server.
>
> My assumption was that as the database grew, RAM caching become less
> significant and the speed of I/O was dominant.
>
> FYI When "tdb2.tdbloader --loader=parallel" gets going it will saturate
> the I/O.
>
> 
>
> I don't have access to hardware (or ad hoc AWS machines) at the moment
> otherwise I'd give this a try.
>
> Previously, downloading the data to AWS is much faster and much more
> reliable than to my local setup. That said, I think dumps.wikimedia.org
> does some rate limiting of downloads as well or my route to the site
> ends up on a virtual T3 - I get the magic number of 5MBytes/s sustained
> download speed a lot out of working hours.
>
>  Andy
>
> On 09/06/2020 08:04, Wolfgang Fahl wrote:
> > Hi Johannes,
> >
> > thank you for bringing the issue to this mailinglist again.
> >
> > At
> >
> https://stackoverflow.com/questions/61813248/jena-tdbloader-performance-and-limits
> > there is a question describing the issue and at
> >
> http://wiki.bitplan.com/index.php/Get_your_own_copy_of_WikiData#Test_with_Apache_Jena
> > a documentation of my own attempts. There has been some feedback by a
> > few people in the mean time but i have no report of a success yet. Also
> > the only hints to achieve better performance are currently related to
> > RAM and disk so using lots of RAM (up to 2 Terrrabyte) and SSDs (also
> > some 2 Terrabyte) was mentioned. I asked at my local IT center and the
> > machine with such RAM is around 30-60 thousand EUR and definitely out of
> > my budget. I might invest in a 200 EUR 2 Terrabyte SSD if i could be
> > sure that this would solve the problem. At this time i doubt it since
> > the software keeps crashing on me and there seem to be bugs in Operating
> > System, Java Virtual Machine and Jena itself that prevent the success as
> > well as the severe degradation in performance for multi-billion triple
> > imports that make it almost impossible to test given a estimated time of
> > finish of half a year on (old but sophisticated) hardware that i am
> > using daily.
> >
> > Cheers
> >Wolfgang
> >
> > On 08.06.20 at 17:54, Hoffart, Johannes wrote:
> >> Hi,
> >>
> >> I want to load the full Wikidata dump, available at
> https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.ttl.bz2 to
> use in Jena.
> >>
> >> I tried it using the tdb2.tdbloader with $JVM_ARGS set to -Xmx120G.
> Initially, the progress (measured by dataset size) is quick. It slows down
> very much after a couple of 100GB written, and finally, at around 500GB,
> the progress is almost halted.
> >>
> >> Did anyone ingest Wikidata into Jena before? What are the system
> requirements? Is there a specific tdb2.tdbloader configuration that would
> speed things up? For example building an index after data ingest?
> >>
> >> Thanks
> >> Johannes
> >>
> >> Johannes Hoffart, Executive Director, Technology Division
> >> Goldman Sachs Bank Europe SE | Marienturm | Taunusanlage 9-10 | D-60329
> Frankfurt am Main
> >> Email: johannes.hoff...@gs.com<mailto:johannes.hoff...@gs.com> | Tel:
> +49 (0)69 7532 3558
> >> Vorstand: Dr. Wolfgang Fink (Vorsitzender) | Thomas Degn-Petersen | Dr.
> Matthias Bock
> >> Vorsitzender des Aufsichtsrats: Dermot McDonogh
> >> Sitz: Frankfurt am Main | Amtsgericht Frankfurt am Main HRB 114190
> >>
> >>
> >> 
> >>
> >> Your Personal Data: We may collect and process information about you
> that may be subject to data protection laws. For more information about how
> we use and disclose your personal data, how we protect your information,
> our legal basis to use your information, your rights and who you can
> contact, please refer to: www.gs.com/privacy-notices<
> http://www.gs.com/privacy-notices>
> >>
>


-- 


---
Marco Neumann
KONA


Re: Migrate Web Application from TDB to TDB2

2020-05-01 Thread Marco Neumann
Hi Bart,

Are you just trying to open a TDB database with TDB2 calls?
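
If so: the on-disk formats are different, so a TDB1 directory cannot simply
be opened with TDB2 calls; the data has to be copied or reloaded. A minimal
sketch of such a copy (paths are illustrative):

    import org.apache.jena.query.Dataset;
    import org.apache.jena.system.Txn;
    import org.apache.jena.tdb.TDBFactory;
    import org.apache.jena.tdb2.TDB2Factory;

    public class TdbToTdb2 {
        public static void main(String[] args) {
            Dataset src = TDBFactory.createDataset("/data/old-tdb1");   // existing TDB1 store
            Dataset dst = TDB2Factory.connectDataset("/data/new-tdb2"); // fresh TDB2 store
            // Copy all quads: read txn on the source, write txn on the target
            Txn.executeRead(src, () ->
                Txn.executeWrite(dst, () ->
                    src.asDatasetGraph().find()
                       .forEachRemaining(dst.asDatasetGraph()::add)));
        }
    }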

Marco

On Fri, May 1, 2020 at 9:08 PM Bart van Leeuwen 
wrote:

> Hi,
>
> I'm trying to migrate a web application which uses TDB to use TDB2
>
> From everything I have read it should be pretty straightforward, but I
> immediately run into exceptions like:
>
> java.nio.channels.OverlappingFileLockException
> org.apache.jena.tdb2.TDBException: dataset closed
> org.apache.jena.dboe.transaction.txn.TransactionException: Currently in an
> active transaction
>
> I've looked at the samples and the minimal tutorials I could find and
> couldn't find hints  on what could cause this.
>
> I did a initdebug:
>
> JenaSystem.init - start
> Found:
>   InitTDB2 [42]
>   InitRIOT [20]
>   InitARQ  [30]
>   InitJenaCore [10]
> Initialization sequence:
>   JenaInitLevel0   [0]
>   InitJenaCore [10]
>   InitRIOT [20]
>   InitARQ  [30]
>   InitTDB2 [42]
> Init: JenaInitLevel0
> Init: InitJenaCore
> JenaCore.init - start
> JenaCore.init - finish
> Init: InitRIOT
> RIOT.init - start
> RIOT.init - finish
> Init: InitARQ
> ARQ.init - start
> ARQ.init - finish
> Init: InitTDB2
> TDB2.init - start
> TDB.init - finish
> JenaSystem.init - finish
>
> Any help appreciated.
>
> Met Vriendelijke Groet / With Kind Regards
> Bart van Leeuwen
>
>
> twitter: @semanticfire
> tel. +31(0)6-53182997
> Netage B.V.
> http://netage.nl
> Esdoornstraat 3
> 3461ER Linschoten
> The Netherlands
>
-- 


---
Marco Neumann
KONA


Re: Fuseki inference not triggered with incoming data

2020-04-28 Thread Marco Neumann
So Sascha, in short: try to use one of the Jena built-in reasoners if this
is not bound to a Pellet-specific task. If it is, you will have to bring
this request up with the Pellet maintainers.
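
If it helps, a minimal sketch with a built-in reasoner (standard Jena API;
the key point, per Andy's note below, is that updates go through the
inference model):

    import org.apache.jena.rdf.model.InfModel;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;
    import org.apache.jena.reasoner.Reasoner;
    import org.apache.jena.reasoner.ReasonerRegistry;

    public class InferenceSketch {
        public static void main(String[] args) {
            Model base = ModelFactory.createDefaultModel();
            Reasoner reasoner = ReasonerRegistry.getRDFSReasoner();
            InfModel inf = ModelFactory.createInfModel(reasoner, base);
            // Insert new data via 'inf', not via 'base', so the rule
            // engine sees the update and keeps its state current.
            inf.add(inf.createResource("http://example.org/a"),
                    inf.createProperty("http://example.org/p"),
                    "value");
        }
    }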

On Tue, Apr 28, 2020 at 5:20 PM Andy Seaborne  wrote:

>
>
> On 27/04/2020 09:49, Meckler, Sascha wrote:
> > Hi *,
> > I have a question about Fuseki:
> >
> > We have a Fuseki server with one dataset for actual data and one dataset
> with reasoning based on the default union graph (=data from first dataset).
> When I insert new data into the first dataset, the inference is not always
> triggered. The reasoning dataset updates only at the first time or when
> restarting. How can we configurate Fuseki to run the inference every time
> that new data is inserted into the first dataset? Or is there a better way
> to do this?
>
> For forward rules, I think that the data updates have to go through the
> graph with the inference.  Forward rules keep state so if you bypass
> inference the graph, the state does not get updated.
>
> Backwards rules may notice to some extent - I'm unsure of the details here.
>
> Pellet isn't part of the Apache Jena project.
>
>  Andy
>
> >
> > I created a small demo project with an easy example and additional
> documentation [1]. My student asked on StackOverflow [2] in January but
> there was no answer.
> >
> > Thank you very much for your help!
> > Best regards,
> > Sascha
> >
> > [1] https://github.com/smeckler/inference-demo
> > [2]
> https://stackoverflow.com/questions/59952945/reasoning-in-apache-jena-fuseki-reload-dataset-or-trigger-inference
> >
> > __
> > Sascha Meckler
> > Data Spaces and IoT Solutions
> >
> > Fraunhofer-Institut für Integrierte Schaltungen IIS
> > Nordostpark 93 | 90411 Nürnberg
> >
> > Phone +49 911 58061-9614
> >
> >
>
-- 


---
Marco Neumann
KONA


Re: Lotico Event: Jena-based Components for Building Semantic Web Applications with Claus Stadler 4/2/20

2020-04-03 Thread Marco Neumann
Thank you for joining us yesterday for the Lotico event "Jena-based
Components for Building Semantic Web Applications" with Claus Stadler; we
had 23 attendees in the live session.

The recording is now available on youtube, please find the video link in
the session URL:

http://www.lotico.com/index.php/Jena-based_Components_for_Building_Semantic_Web_Applications

Enjoy and stay safe,
Marco


On Tue, Mar 31, 2020 at 5:11 PM Marco Neumann 
wrote:

> FYI
>
> Jena-based Components for Building Semantic Web Applications
>
>
> http://www.lotico.com/index.php/Jena-based_Components_for_Building_Semantic_Web_Applications
>
> Speaker: Claus Stadler
>
> Location: Online on zoom ( https://zoom.us/j/955345345 )
>
> Date: 2 April 2020
>
> Time: 6pm CEST / 12pm EDT
>
>
> The basics of RDF and SPARQL may seem simple enough at first, but once one
> starts to develop a prototype, one quickly stumbles upon the same recurrent
> problems, such as:
>
> How can performance be improved by means of query caching? How can we deal
> with blank nodes? How to deal with result set limits?
>
> Apache Jena ( http://jena.apache.org/ ) is a powerful Semantic Web
> toolkit, however for the aforementioned issues it does not provide
> out-of-the box solutions. Our original motivation for our independent
> "jena-sparql-api" project was to address these issues in one central place,
> instead of distributing solutions - possibly with various degrees of
> quality - among our applications (e.g. RDFUnit, DL-Learner, LIMES). By now,
> the library has grown. Most notably, it now features declarative Java-RDF
> mappings and RDF processing with reactive streams.
>
> Duration:  60 min
> Session-Type: Technology - Application - Coding
> Session-Level: Intermediate - Advanced
> Session-URL:
> http://www.lotico.com/index.php/Jena-based_Components_for_Building_Semantic_Web_Applications
>
> --
>
>
> ---
> Marco Neumann
> KONA
>
>

-- 


---
Marco Neumann
KONA


Lotico Event: Jena-based Components for Building Semantic Web Applications with Claus Stadler 4/2/20

2020-03-31 Thread Marco Neumann
FYI

Jena-based Components for Building Semantic Web Applications

http://www.lotico.com/index.php/Jena-based_Components_for_Building_Semantic_Web_Applications

Speaker: Claus Stadler

Location: Online on zoom ( https://zoom.us/j/955345345 )

Date: 2 April 2020

Time: 6pm CEST / 12pm EDT


The basics of RDF and SPARQL may seem simple enough at first, but once one
starts to develop a prototype, one quickly stumbles upon the same recurrent
problems, such as:

How can performance be improved by means of query caching? How can we deal
with blank nodes? How to deal with result set limits?

Apache Jena ( http://jena.apache.org/ ) is a powerful Semantic Web toolkit,
however for the aforementioned issues it does not provide out-of-the box
solutions. Our original motivation for our independent "jena-sparql-api"
project was to address these issues in one central place, instead of
distributing solutions - possibly with various degrees of quality - among
our applications (e.g. RDFUnit, DL-Learner, LIMES). By now, the library has
grown. Most notably, it now features declarative Java-RDF mappings and RDF
processing with reactive streams.

Duration:  60 min
Session-Type: Technology - Application - Coding
Session-Level: Intermediate - Advanced
Session-URL:
http://www.lotico.com/index.php/Jena-based_Components_for_Building_Semantic_Web_Applications

-- 


---
Marco Neumann
KONA


Re: Identify SPARQL query's type

2020-03-19 Thread Marco Neumann
Excellent, looking forward to the presentation, Claus. Would you mind if
we open this Zoom event up to the public?

On Thu, Mar 19, 2020 at 6:30 PM Claus Stadler <
cstad...@informatik.uni-leipzig.de> wrote:

> Hi Marco,
>
> I will prepare a presentation of the most important features next week; I
> can't say right now which day is best, but maybe we can arrange that on
> short notice on the weekend or on Monday via direct mail. As for
> contributions to Jena directly, I am already in contact with Andy via some
> recent JIRA issues and PRs :)
>
>
> I also intend to start the discussion on contributing some relevant parts
> of our extension project to jena directly. The reason why this did not
> happen so far is mainly because it takes significantly more efforts to
> polish code up for a such a big community project and ensuring a good level
> of stability - but some parts are stable and probably of more general
> interest :)
>
>
> Cheers,
>
> Claus
>
>
> On 19.03.20 10:37, Marco Neumann wrote:
> > thank you Claus, there is obviously much more in the Jena-extensions
> > (SmartDataAnalytics / jena-sparql-api).
> >
> > if you want to contribute your work to the Jena project you will have to
> > follow up with Andy directly. But I am not sure this is necessary at the
> > moment since you already provide the code in the public domain
> conveniently
> > as an extension / add-on to the Jena project, which I think is great as
> is
> > for now. Over time we might want to learn from your work and add aspects
> to
> > the overall core Jena project I would think.
> >
> > It would be great if we could schedule a zoom session in order to give us
> > an overview of the "SmartDataAnalytics / jena-sparql-api" extensions
> >
> > could you prepare such a presentation in the coming days?
> >
> > best,
> > Marco
> >
> >
> >
> > On Wed, Mar 18, 2020 at 3:34 PM Claus Stadler <
> > cstad...@informatik.uni-leipzig.de> wrote:
> >
> >> Hi,
> >>
> >>
> >> The SparqlStmt API built against Jena 3.14.0 is now available on Maven
> >> Central [1] in case one wants to give it a try (example in [2]) and give
> >> feedback on whether one thinks it would be a useful contribution to
> >> Jena directly - and what changes would be necessary if so.
> >>
> >>
> >> <dependency>
> >>   <groupId>org.aksw.jena-sparql-api</groupId>
> >>   <artifactId>jena-sparql-api-stmt</artifactId>
> >>   <version>3.14.0-1</version>
> >> </dependency>
> >>
> >>
> >> [1]
> >>
> https://search.maven.org/artifact/org.aksw.jena-sparql-api/jena-sparql-api-stmt/3.14.0-1/jar
> >>
> >> [2]
> >>
> https://github.com/SmartDataAnalytics/jena-sparql-api/blob/def0d3bdf0f4396fbf1ef0715f9697e9bb255029/jena-sparql-api-stmt/src/test/java/org/aksw/jena_sparql_api/stmt/TestSparqlStmtUtils.java#L54
> >>
> >>
> >> Cheers,
> >>
> >> Claus
> >>
> >>
> >>
> >> On 18.03.20 16:04, Andy Seaborne wrote:
> >>> Note that parsing the string as a query aborts early as soon as it
> finds
> >> an update keyword so the cost of parsing isn't very large.
> >>>  Andy
> >>>
> >>> On 18/03/2020 11:58, Marco Neumann wrote:
> >>>> is there some utility function here in the code base now already to do
> >>>> this, or do I still need to roll my own here?
> >>>>
> >>>> On Tue, Jul 30, 2013 at 4:25 PM Andy Seaborne 
> wrote:
> >>>>
> >>>>> On 30/07/13 10:13, Arthur Vaïsse-Lesteven wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> I would like to know if Jena offers a way to detect the type of an
> >>>>> unknown SPARQL request? Starting from the query string.
> >>>>>> At the moment the only way I succeeded in coding it without "basic
> parsing"
> >>>>> of the query (the sort of thing I prefer to avoid; manually parsing
> >> strings
> >>>>> with short functions often creates errors)
> >>>>>> looks like this :
> >>>>>>
> >>>>>> [...]
> >>>>>>   String queryString = "a query string, may be a select or
> an
> >>>>> update";
> >>>>>>try{
> >>>>>>Query select = QueryFactory.create(queryString);
> >>>>>>Service.process_select_query(select);//do some work
> 

Re: Identify SPARQL query's type

2020-03-19 Thread Marco Neumann
thank you Claus, there is obviously much more in the Jena-extensions
(SmartDataAnalytics / jena-sparql-api).

if you want to contribute your work to the Jena project you will have to
follow up with Andy directly. But I am not sure this is necessary at the
moment since you already provide the code in the public domain conveniently
as an extension / add-on to the Jena project, which I think is great as is
for now. Over time we might want to learn from your work and add aspects to
the overall core Jena project I would think.

It would be great if we could schedule a zoom session in order to give us
an overview of the "SmartDataAnalytics / jena-sparql-api" extensions

could you prepare such a presentation in the coming days?

best,
Marco



On Wed, Mar 18, 2020 at 3:34 PM Claus Stadler <
cstad...@informatik.uni-leipzig.de> wrote:

> Hi,
>
>
> The SparqlStmt API built against jena 3.14.0 is now available on Maven
> Central [1] in case one wants to give it a try (example in [2]) and give
> feedback on whether one thinks it would be a useful contribution to Jena
> directly - and what changes would be necessary if so.
>
>
> <dependency>
>   <groupId>org.aksw.jena-sparql-api</groupId>
>   <artifactId>jena-sparql-api-stmt</artifactId>
>   <version>3.14.0-1</version>
> </dependency>
>
>
> [1]
> https://search.maven.org/artifact/org.aksw.jena-sparql-api/jena-sparql-api-stmt/3.14.0-1/jar
>
> [2]
> https://github.com/SmartDataAnalytics/jena-sparql-api/blob/def0d3bdf0f4396fbf1ef0715f9697e9bb255029/jena-sparql-api-stmt/src/test/java/org/aksw/jena_sparql_api/stmt/TestSparqlStmtUtils.java#L54
>
>
> Cheers,
>
> Claus
>
>
>
> On 18.03.20 16:04, Andy Seaborne wrote:
> > Note that parsing the string as a query aborts early as soon as it finds
> an update keyword so the cost of parsing isn't very large.
> >
> > Andy
> >
> > On 18/03/2020 11:58, Marco Neumann wrote:
> >> is there some utility function here in the code base now already to do
> >> this, or do I still need to roll my own here?
> >>
> >> On Tue, Jul 30, 2013 at 4:25 PM Andy Seaborne  wrote:
> >>
> >>> On 30/07/13 10:13, Arthur Vaïsse-Lesteven wrote:
> >>>> Hi,
> >>>>
> >>>> I would like to know if Jena offers a way to detect the type of an
> >>> unknown SPARQL request? Starting from the query string.
> >>>>
> >>>> At the moment the only way I succeeded in coding it without "basic parsing"
> >>> of the query (the sort of thing I prefer to avoid; manually parsing strings
> with
> >>> short functions often creates errors)
> >>>> looks like this :
> >>>>
> >>>> [...]
> >>>>  String queryString = "a query string, may be a select or an
> >>> update";
> >>>>
> >>>>   try{
> >>>>   Query select = QueryFactory.create(queryString);
> >>>>   Service.process_select_query(select);//do some work with
> >>> the select
> >>>>   }
> >>>>   catch(QueryException e){
> >>>>   UpdateRequest update =
> UpdateFactory.create(queryString);
> >>>>   Service.process_update_query(update);//do some work with
> >>> the update
> >>>>   }
> >>>>       catch(ProcessException e){
> >>>>   //handle this exception
> >>>>   }
> >>>>
> >>>> [...]
> >>>>
> >>>> So is it possible ? Or not ?
> >>>
> >>> Not currently.
> >>>
> >>> You could use a regexp to spot the SELECT/CONSTRUCT/DESCRIBE/ASK
> keyword
> >>> coming after BASE/PREFIXES/Comments.
> >>>
> >>>  Andy
> >>>
> >>>
> >>
> --
> Dipl. Inf. Claus Stadler
> Department of Computer Science, University of Leipzig
> Research Group: http://aksw.org/
> Workpage & WebID: http://aksw.org/ClausStadler
> Phone: +49 341 97-32260
>
>

-- 


---
Marco Neumann
KONA


Re: Identify SPARQL query's type

2020-03-18 Thread Marco Neumann
thank you Claus and Martynas, both very good ideas here. it's a function we
should move into Jena.

let's look at this in a bit more detail now, I currently envision this to
be a set of boolean test methods on org.apache.jena.query.Query, like

.isSelect()
.isAsk()
.isDescribe()
.isUpdate()

Claus your solution would extend the following?

org.apache.jena.sparql.lang.ParserSPARQL11.perform(ParserSPARQL11.java:100)

how is fuseki implementing this during query parsing at the moment?
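
[A sketch of how this can be approximated with the API as it stands: the
query forms are testable once a string parses as a Query, and an update is
detected by a second parse attempt; the classify() helper below is
illustrative, not an existing Jena method.]

import org.apache.jena.query.Query;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.query.QueryParseException;
import org.apache.jena.update.UpdateFactory;

public class SparqlStringClassifier {
    public static String classify(String sparql) {
        try {
            Query query = QueryFactory.create(sparql);
            if (query.isSelectType())    return "SELECT";
            if (query.isAskType())       return "ASK";
            if (query.isConstructType()) return "CONSTRUCT";
            if (query.isDescribeType())  return "DESCRIBE";
            return "QUERY";
        } catch (QueryParseException ex) {
            // Not a query form - try it as an update; UpdateFactory throws
            // QueryParseException again if it is neither.
            UpdateFactory.create(sparql);
            return "UPDATE";
        }
    }
}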




On Wed, Mar 18, 2020 at 1:00 PM Martynas Jusevičius 
wrote:

> I always wondered why there is no class hierarchy for SPARQL commands,
> similarly to SP vocabulary [1]. Something like
>
> Command
>   Query
> Describe
> Construct
> Select
> Ask
>   Update
> ...
>
> So that one could check the command type using instanceof Update or
> instanceof Select instead of query.isSelectType() etc.
>
> [1] https://github.com/spinrdf/spinrdf/blob/master/etc/sp.ttl
>
>
>
> On Wed, Mar 18, 2020 at 12:58 PM Marco Neumann 
> wrote:
> >
> > is there some utility function here in the code base now already to do
> > this, or do I still need to roll my own here?
> >
> > On Tue, Jul 30, 2013 at 4:25 PM Andy Seaborne  wrote:
> >
> > > On 30/07/13 10:13, Arthur Vaïsse-Lesteven wrote:
> > > > Hi,
> > > >
> > > > I would like to know if Jena offers a way to detect the type of an
> > > unknown SPARQL request? Starting from the query string.
> > > >
> > > > At the moment the only way I succeeded in coding it without "basic
> parsing"
> > > of the query (the sort of thing I prefer to avoid; manually parsing strings
> with
> > > short functions often creates errors)
> > > > looks like this :
> > > >
> > > > [...]
> > > > String queryString = "a query string, may be a select or an
> > > update";
> > > >
> > > >  try{
> > > >  Query select = QueryFactory.create(queryString);
> > > >  Service.process_select_query(select);//do some work with
> > > the select
> > > >  }
> > > >  catch(QueryException e){
> > > >  UpdateRequest update =
> UpdateFactory.create(queryString);
> > > >  Service.process_update_query(update);//do some work with
> > > the update
> > > >  }
> > > >  catch(ProcessException e){
> > > >  //handle this exception
> > > >  }
> > > >
> > > > [...]
> > > >
> > > > So is it possible ? Or not ?
> > >
> > > Not currently.
> > >
> > > You could use a regexp to spot the SELECT/CONSTRUCT/DESCRIBE/ASK
> keyword
> > > coming after BASE/PREFIXES/Comments.
> > >
> > > Andy
> > >
> > >
> >
> > --
> >
> >
> > ---
> > Marco Neumann
> > KONA
>


-- 


---
Marco Neumann
KONA


Re: Identify SPARQL query's type

2020-03-18 Thread Marco Neumann
is there some utility function here in the code base now already to do
this, or do I still need to roll my own here?

On Tue, Jul 30, 2013 at 4:25 PM Andy Seaborne  wrote:

> On 30/07/13 10:13, Arthur Vaïsse-Lesteven wrote:
> > Hi,
> >
> > I would like to know if Jena offers a way to detect the type of an
> unknown SPARQL request? Starting from the query string.
> >
> > At the moment the only way I succeeded in coding it without "basic parsing"
> of the query (the sort of thing I prefer to avoid; manually parsing strings with
> short functions often creates errors)
> > looks like this :
> >
> > [...]
> > String queryString = "a query string, may be a select or an
> update";
> >
> >  try{
> >  Query select = QueryFactory.create(queryString);
> >  Service.process_select_query(select);//do some work with
> the select
> >  }
> >  catch(QueryException e){
> >  UpdateRequest update = UpdateFactory.create(queryString);
> >  Service.process_update_query(update);//do some work with
> the update
> >  }
> >  catch(ProcessException e){
> >  //handle this exception
> >  }
> >
> > [...]
> >
> > So is it possible ? Or not ?
>
> Not currently.
>
> You could use a regexp to spot the SELECT/CONSTRUCT/DESCRIBE/ASK keyword
> coming after BASE/PREFIXES/Comments.
>
> Andy
>
>
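
[A rough illustration of the regexp approach suggested above: strip the
leading comments and BASE/PREFIX declarations, then inspect the first
keyword. This is a sketch, not a full SPARQL tokenizer - corner cases such
as a final comment without a trailing newline are not handled.]

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QueryKeywordSniffer {
    // Leading material: whitespace, '#' comments, BASE and PREFIX declarations.
    private static final Pattern LEAD = Pattern.compile(
        "(\\s*(#[^\\n]*\\n|BASE\\s*<[^>]*>|PREFIX\\s+\\S+\\s*<[^>]*>)\\s*)*",
        Pattern.CASE_INSENSITIVE);
    private static final Pattern KEYWORD = Pattern.compile(
        "(SELECT|CONSTRUCT|DESCRIBE|ASK)\\b", Pattern.CASE_INSENSITIVE);

    public static String firstKeyword(String sparql) {
        Matcher lead = LEAD.matcher(sparql);
        lead.lookingAt();                  // always succeeds, possibly empty
        Matcher kw = KEYWORD.matcher(sparql.substring(lead.end()).trim());
        return kw.lookingAt() ? kw.group(1).toUpperCase() : "NOT-A-QUERY-FORM";
    }
}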

-- 


---
Marco Neumann
KONA


Re: SPARQL performance question

2020-02-24 Thread Marco Neumann
> >>> If that's too slow then you'll need a higher performance third party
> >>> reasoner.
> >>>
> >>> Dave
> >>>
> >>> On 23/02/2020 18:57, Steve Vestal wrote:
> >>>> I'm looking for suggestions on a SPARQL performance issue.  My test
> >>>> model has ~800 sentences, and processing of one select query takes
> >>>> about
> >>>> 25 minutes.  The query is a basic graph pattern with 9 variables
> >>>> and 20
> >>>> triples, plus a filter that forces distinct variables to have distinct
> >>>> solutions using pair-wise not-equals constraints.  No option clause or
> >>>> anything else fancy.
> >>>>
> >>>> I am issuing the query against an inference model.  Most of the
> >>>> asserted
> >>>> sentences are in imported models.  If I iterate over all the
> >>>> statements
> >>>> in the OntModel, I get ~1500 almost instantly.  I experimented with
> >>>> several of the reasoners.
> >>>>
> >>>> Below is the basic control flow.  The thing I found curious is that
> >>>> the
> >>>> execSelect() method finishes almost instantly.  It is the iteration
> >>>> over
> >>>> the ResultSet that is taking all the time, it seems in the call to
> >>>> selectResult.hasNext(). The result has 192 rows, 9 columns.  The
> >>>> results
> >>>> are provided in bursts of 8 rows each, with ~1 minute between bursts.
> >>>>
> >>>>   OntModel ontologyModel = getMyOntModel(); // Tried various
> >>>> reasoners
> >>>>   String selectQuery = getMySelectQuery();
> >>>>   QueryExecution selectExec =
> >>>> QueryExecutionFactory.create(selectQuery, ontologyModel);
> >>>>   ResultSet selectResult = selectExec.execSelect();
> >>>>   while (selectResult.hasNext()) {  // Time seems to be
> >>>> spent in
> >>>> hasNext
> >>>>   QuerySolution selectSolution = selectResult.next();
> >>>>   for (String var : getMyVariablesOfInterest()) {
> >>>>   RDFNode varValue = selectSolution.get(var);
> >>>>   // process varValue
> >>>>   }
> >>>>   }
> >>>>
> >>>> Any suggestions would be appreciated.
> >>>>
> >>
>


-- 


---
Marco Neumann
KONA


Re: SPARQL performance question

2020-02-23 Thread Marco Neumann
Steve, so subsequent queries on the model should be much faster. +1 to
Dave's comments, materialization is the way to go here to speed up the
process.

in addition for large datasets I would recommend generic rules or construct
queries if you already know which data points you would like to process to
generate the query model.
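
[A minimal sketch of the materialization step Dave describes below: compute
the inference closure once into a plain in-memory model and run SPARQL
against that instead of the inference model. OWL_MEM_MICRO_RULE_INF and the
data file name are placeholders for whatever the real setup uses.]

import org.apache.jena.ontology.OntModel;
import org.apache.jena.ontology.OntModelSpec;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

public class Materialize {
    public static void main(String[] args) {
        Model base = ModelFactory.createDefaultModel();
        base.read("data.ttl");               // asserted triples

        OntModel inf = ModelFactory.createOntologyModel(
                OntModelSpec.OWL_MEM_MICRO_RULE_INF, base);

        // Copying into a plain model forces the whole closure to be computed
        // once; queries against "materialized" never touch the reasoner again.
        Model materialized = ModelFactory.createDefaultModel();
        materialized.add(inf);
        System.out.println("materialized size: " + materialized.size());
    }
}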

On Sun, Feb 23, 2020 at 9:33 PM Dave Reynolds 
wrote:

> The issues is not performance of SPARQL but performance of the inference
> engines.
>
> If you need some OWL inference then your best bet is OWLMicro.
>
> If that's too slow to query directly then one option to try is to
> materialize the entire inference closure and then query that. You can do
> that by simply copying the inference model to a plain model.
>
> If that's too slow then you'll need a higher performance third party
> reasoner.
>
> Dave
>
> On 23/02/2020 18:57, Steve Vestal wrote:
> > I'm looking for suggestions on a SPARQL performance issue.  My test
> > model has ~800 sentences, and processing of one select query takes about
> > 25 minutes.  The query is a basic graph pattern with 9 variables and 20
> > triples, plus a filter that forces distinct variables to have distinct
> > solutions using pair-wise not-equals constraints.  No option clause or
> > anything else fancy.
> >
> > I am issuing the query against an inference model.  Most of the asserted
> > sentences are in imported models.  If I iterate over all the statements
> > in the OntModel, I get ~1500 almost instantly.  I experimented with
> > several of the reasoners.
> >
> > Below is the basic control flow.  The thing I found curious is that the
> > execSelect() method finishes almost instantly.  It is the iteration over
> > the ResultSet that is taking all the time, it seems in the call to
> > selectResult.hasNext(). The result has 192 rows, 9 columns.  The results
> > are provided in bursts of 8 rows each, with ~1 minute between bursts.
> >
> >  OntModel ontologyModel = getMyOntModel(); // Tried various
> reasoners
> >  String selectQuery = getMySelectQuery();
> >  QueryExecution selectExec =
> > QueryExecutionFactory.create(selectQuery, ontologyModel);
> >  ResultSet selectResult = selectExec.execSelect();
> >  while (selectResult.hasNext()) {  // Time seems to be spent in
> > hasNext
> >  QuerySolution selectSolution = selectResult.next();
> >  for (String var : getMyVariablesOfInterest()) {
> >  RDFNode varValue = selectSolution.get(var);
> >  // process varValue
> >  }
> >  }
> >
> > Any suggestions would be appreciated.
> >
>
-- 


---
Marco Neumann
KONA


Re: SPARQL performance question

2020-02-23 Thread Marco Neumann
a copy of the actual query + assertions would be advisable here in addition
to runtime information (cpu,mem,os,jdk)


On Sun 23. Feb 2020 at 18:57, Steve Vestal 
wrote:

> I'm looking for suggestions on a SPARQL performance issue.  My test
> model has ~800 sentences, and processing of one select query takes about
> 25 minutes.  The query is a basic graph pattern with 9 variables and 20
> triples, plus a filter that forces distinct variables to have distinct
> solutions using pair-wise not-equals constraints.  No option clause or
> anything else fancy.
>
> I am issuing the query against an inference model.  Most of the asserted
> sentences are in imported models.  If I iterate over all the statements
> in the OntModel, I get ~1500 almost instantly.  I experimented with
> several of the reasoners.
>
> Below is the basic control flow.  The thing I found curious is that the
> execSelect() method finishes almost instantly.  It is the iteration over
> the ResultSet that is taking all the time, it seems in the call to
> selectResult.hasNext(). The result has 192 rows, 9 columns.  The results
> are provided in bursts of 8 rows each, with ~1 minute between bursts.
>
> OntModel ontologyModel = getMyOntModel(); // Tried various
> reasoners
> String selectQuery = getMySelectQuery();
> QueryExecution selectExec =
> QueryExecutionFactory.create(selectQuery, ontologyModel);
> ResultSet selectResult = selectExec.execSelect();
> while (selectResult.hasNext()) {  // Time seems to be spent in
> hasNext
> QuerySolution selectSolution = selectResult.next();
> for (String var : getMyVariablesOfInterest()) {
> RDFNode varValue = selectSolution.get(var);
> // process varValue
>     }
> }
>
> Any suggestions would be appreciated.
>
> --


---
Marco Neumann
KONA


Re: using wdqs as a service in a sparql query

2019-12-20 Thread Marco Neumann
you are welcome Jean-Claude. BTW, wikidata SPARQL query questions are best
discussed over at the wikidata mailing lists / sites

On Fri, Dec 20, 2019 at 11:30 AM Jean-Claude Moissinac <
jean-claude.moissi...@telecom-paristech.fr> wrote:

> So, the shared variable ?wikidata must be in the inner select to be seen by
> both triple stores.
> Many thanks
> --
> Jean-Claude Moissinac
>
>
>
> On Fri, Dec 20, 2019 at 11:57, Marco Neumann  wrote:
>
> > PREFIX wdt: <http://www.wikidata.org/prop/direct/>
> > PREFIX wikibase: <http://wikiba.se/ontology#>
> > PREFIX bd: <http://www.bigdata.com/rdf#>
> > SELECT * where{
> > bind(<http://www.wikidata.org/entity/Q640447> as ?wikidata).
> > SERVICE <https://query.wikidata.org/bigdata/namespace/wdq/sparql> {
> > select ?wikidata ?p ?propLabel ?o ?oLabel where {
> > ?wikidata ?p ?o . ?prop wikibase:directClaim ?p .
> > SERVICE wikibase:label {bd:serviceParam wikibase:language "en,fr" .}
> > } } }
> >
> > On Fri, Dec 20, 2019 at 10:56 AM Marco Neumann 
> > wrote:
> >
> > > ok I see, how about the following
> > >
> > >
> > > On Fri, Dec 20, 2019 at 9:34 AM Jean-Claude Moissinac <
> > > jean-claude.moissi...@telecom-paristech.fr> wrote:
> > >
> > >> In the second code, the bind must be inside the service <...> {...}
> > >> --
> > >> Jean-Claude Moissinac
> > >>
> > >>
> > >>
> > >> On Thu, Dec 19, 2019 at 16:48, Jean-Claude Moissinac <
> > >> jean-claude.moissi...@telecom-paristech.fr> wrote:
> > >>
> > >> > Hello
> > >> >
> > >> > In an instance of Fuseki, I'm trying the following query
> > >> >
> > >> > PREFIX wdt: <http://www.wikidata.org/prop/direct/>
> > >> >
> > >> > prefix wikibase: <http://wikiba.se/ontology#>
> > >> >
> > >> > PREFIX bd: <http://www.bigdata.com/rdf#>
> > >> >
> > >> > SELECT * where
> > >> >
> > >> > {
> > >> >
> > >> > bind(<http://www.wikidata.org/entity/Q640447> as ?wikidata)
> > >> >
> > >> > service <https://query.wikidata.org/bigdata/namespace/wdq/sparql> {
> > >> >
> > >> > select ?p ?propLabel ?o ?oLabel where {
> > >> >
> > >> > ?wikidata ?p ?o . ?prop wikibase:directClaim ?p . SERVICE
> > >> wikibase:label {
> > >> > bd:serviceParam wikibase:language "en,fr" .
> > >> >
> > >> > } } } }
> > >> >
> > >> >
> > >> > which fails with an error 500 (Error 500: HTTP 500 error making the
> > >> query:
> > >> > Internal Server Error)
> > >> >
> > >> > While the following one gives results:
> > >> >
> > >> > PREFIX wdt: <http://www.wikidata.org/prop/direct/>
> > >> >
> > >> > prefix wikibase: <http://wikiba.se/ontology#>
> > >> >
> > >> > PREFIX bd: <http://www.bigdata.com/rdf#>
> > >> >
> > >> > SELECT * where {
> > >> >
> > >> > bind(<http://www.wikidata.org/entity/Q640447> as ?wikidata)
> > >> >
> > >> > service <https://query.wikidata.org/bigdata/namespace/wdq/sparql> {
> > >> >
> > >> > select ?p ?propLabel ?o ?oLabel where {
> > >> >
> > >> > <http://www.wikidata.org/entity/Q640447> ?p ?o .
> > >> >
> > >> > ?prop wikibase:directClaim ?p . SERVICE wikibase:label {
> > bd:serviceParam
> > >> > wikibase:language "en,fr" .
> > >> >
> > >> > } } } }
> > >> >
> > >> >
> > >> > In my real query, in place of the bind, I have some code which
> selects
> > >> > some wikidata entities. The goal is to get a wikidata description of
> > >> these
> > >> > entities
> > >> >
> > >> > Do you have any ideas?
> > >> >
> > >> >
> > >> > --
> > >> > Jean-Claude Moissinac
> > >> >
> > >> >
> > >>
> > >
> > >
> > > --
> > >
> > >
> > > ---
> > > Marco Neumann
> > > KONA
> > >
> > >
> >
> > --
> >
> >
> > ---
> > Marco Neumann
> > KONA
> >
>


-- 


---
Marco Neumann
KONA


Re: using wdqs as a service in a sparql query

2019-12-20 Thread Marco Neumann
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd: <http://www.bigdata.com/rdf#>
SELECT * where{
bind(<http://www.wikidata.org/entity/Q640447> as ?wikidata).
SERVICE <https://query.wikidata.org/bigdata/namespace/wdq/sparql> {
select ?wikidata ?p ?propLabel ?o ?oLabel where {
?wikidata ?p ?o . ?prop wikibase:directClaim ?p .
SERVICE wikibase:label {bd:serviceParam wikibase:language "en,fr" .}
} } }

On Fri, Dec 20, 2019 at 10:56 AM Marco Neumann 
wrote:

> ok I see, how about the following
>
>
> On Fri, Dec 20, 2019 at 9:34 AM Jean-Claude Moissinac <
> jean-claude.moissi...@telecom-paristech.fr> wrote:
>
>> In the second code, the bind must be inside the service <...> {...}
>> --
>> Jean-Claude Moissinac
>>
>>
>>
>> On Thu, Dec 19, 2019 at 16:48, Jean-Claude Moissinac <
>> jean-claude.moissi...@telecom-paristech.fr> wrote:
>>
>> > Hello
>> >
>> > In an instance of Fuseki, I'm trying the following query
>> >
>> > PREFIX wdt: <http://www.wikidata.org/prop/direct/>
>> >
>> > prefix wikibase: <http://wikiba.se/ontology#>
>> >
>> > PREFIX bd: <http://www.bigdata.com/rdf#>
>> >
>> > SELECT * where
>> >
>> > {
>> >
>> > bind(<http://www.wikidata.org/entity/Q640447> as ?wikidata)
>> >
>> > service <https://query.wikidata.org/bigdata/namespace/wdq/sparql> {
>> >
>> > select ?p ?propLabel ?o ?oLabel where {
>> >
>> > ?wikidata ?p ?o . ?prop wikibase:directClaim ?p . SERVICE
>> wikibase:label {
>> > bd:serviceParam wikibase:language "en,fr" .
>> >
>> > } } } }
>> >
>> >
>> > which fails with an error 500 (Error 500: HTTP 500 error making the
>> query:
>> > Internal Server Error)
>> >
>> > While the following one gives results:
>> >
>> > PREFIX wdt: <http://www.wikidata.org/prop/direct/>
>> >
>> > prefix wikibase: <http://wikiba.se/ontology#>
>> >
>> > PREFIX bd: <http://www.bigdata.com/rdf#>
>> >
>> > SELECT * where {
>> >
>> > bind(<http://www.wikidata.org/entity/Q640447> as ?wikidata)
>> >
>> > service <https://query.wikidata.org/bigdata/namespace/wdq/sparql> {
>> >
>> > select ?p ?propLabel ?o ?oLabel where {
>> >
>> > <http://www.wikidata.org/entity/Q640447> ?p ?o .
>> >
>> > ?prop wikibase:directClaim ?p . SERVICE wikibase:label { bd:serviceParam
>> > wikibase:language "en,fr" .
>> >
>> > } } } }
>> >
>> >
>> > In my real query, in place of the bind, I have some code which selects
>> > some wikidata entities. The goal is to get a wikidata description of
>> these
>> > entities
>> >
>> > Do you have any ideas?
>> >
>> >
>> > --
>> > Jean-Claude Moissinac
>> >
>> >
>>
>
>
> --
>
>
> ---
> Marco Neumann
> KONA
>
>

-- 


---
Marco Neumann
KONA


Re: using wdqs as a service in a sparql query

2019-12-20 Thread Marco Neumann
ok I see, how about the following


On Fri, Dec 20, 2019 at 9:34 AM Jean-Claude Moissinac <
jean-claude.moissi...@telecom-paristech.fr> wrote:

> In the second code, the bind must be inside the service <...> {...}
> --
> Jean-Claude Moissinac
>
>
>
> On Thu, Dec 19, 2019 at 16:48, Jean-Claude Moissinac <
> jean-claude.moissi...@telecom-paristech.fr> wrote:
>
> > Hello
> >
> > In an instance of Fuseki, I'm trying the following query
> >
> > PREFIX wdt: <http://www.wikidata.org/prop/direct/>
> >
> > prefix wikibase: <http://wikiba.se/ontology#>
> >
> > PREFIX bd: <http://www.bigdata.com/rdf#>
> >
> > SELECT * where
> >
> > {
> >
> > bind(<http://www.wikidata.org/entity/Q640447> as ?wikidata)
> >
> > service <https://query.wikidata.org/bigdata/namespace/wdq/sparql> {
> >
> > select ?p ?propLabel ?o ?oLabel where {
> >
> > ?wikidata ?p ?o . ?prop wikibase:directClaim ?p . SERVICE wikibase:label
> {
> > bd:serviceParam wikibase:language "en,fr" .
> >
> > } } } }
> >
> >
> > which fails with an error 500 (Error 500: HTTP 500 error making the
> query:
> > Internal Server Error)
> >
> > While the following one gives results:
> >
> > PREFIX wdt: <http://www.wikidata.org/prop/direct/>
> >
> > prefix wikibase: <http://wikiba.se/ontology#>
> >
> > PREFIX bd: <http://www.bigdata.com/rdf#>
> >
> > SELECT * where {
> >
> > bind(<http://www.wikidata.org/entity/Q640447> as ?wikidata)
> >
> > service <https://query.wikidata.org/bigdata/namespace/wdq/sparql> {
> >
> > select ?p ?propLabel ?o ?oLabel where {
> >
> > <http://www.wikidata.org/entity/Q640447> ?p ?o .
> >
> > ?prop wikibase:directClaim ?p . SERVICE wikibase:label { bd:serviceParam
> > wikibase:language "en,fr" .
> >
> > } } } }
> >
> >
> > In my real query, in place of the bind, I have some code which selects
> > some wikidata entities. The goal is to get a wikidata description of
> these
> > entities
> >
> > Do you have any ideas?
> >
> >
> > --
> > Jean-Claude Moissinac
> >
> >
>


-- 


---
Marco Neumann
KONA


Re: using wdqs as a service in a sparql query

2019-12-19 Thread Marco Neumann
Jean-Claude, the variable isn't visible, or in scope, to your select
statement

this is how you can do it:

PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd: <http://www.bigdata.com/rdf#>

SELECT * WHERE {
SERVICE <https://query.wikidata.org/bigdata/namespace/wdq/sparql> {
SELECT ?p ?propLabel ?o ?oLabel WHERE {
BIND(<http://www.wikidata.org/entity/Q640447> as ?wikidata)
?wikidata ?p ?o . ?prop wikibase:directClaim ?p .
SERVICE wikibase:label { bd:serviceParam wikibase:language "en,fr" .}
}
}
}
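
[To run a query like this from Java, the usual remote execution route
applies - a sketch; the Fuseki endpoint URL is a placeholder.]

import org.apache.jena.query.*;

public class RemoteSelect {
    public static void main(String[] args) {
        String queryString = "SELECT * WHERE { ?s ?p ?o } LIMIT 10"; // or the query above
        Query query = QueryFactory.create(queryString);
        try (QueryExecution qexec = QueryExecutionFactory.sparqlService(
                "http://localhost:3030/ds/sparql", query)) {
            ResultSetFormatter.out(System.out, qexec.execSelect());
        }
    }
}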

On Thu, Dec 19, 2019 at 3:48 PM Jean-Claude Moissinac <
jean-claude.moissi...@telecom-paristech.fr> wrote:

> Hello
>
> In an instance of Fuseki, I'm trying the following query
>
> PREFIX wdt: <http://www.wikidata.org/prop/direct/>
>
> prefix wikibase: <http://wikiba.se/ontology#>
>
> PREFIX bd: <http://www.bigdata.com/rdf#>
>
> SELECT * where
>
> {
>
> bind(<http://www.wikidata.org/entity/Q640447> as ?wikidata)
>
> service <https://query.wikidata.org/bigdata/namespace/wdq/sparql> {
>
> select ?p ?propLabel ?o ?oLabel where {
>
> ?wikidata ?p ?o . ?prop wikibase:directClaim ?p . SERVICE wikibase:label {
> bd:serviceParam wikibase:language "en,fr" .
>
> } } } }
>
>
> which fails with an error 500 (Error 500: HTTP 500 error making the query:
> Internal Server Error)
>
> While the following one gives results:
>
> PREFIX wdt: <http://www.wikidata.org/prop/direct/>
>
> prefix wikibase: <http://wikiba.se/ontology#>
>
> PREFIX bd: <http://www.bigdata.com/rdf#>
>
> SELECT * where {
>
> bind(<http://www.wikidata.org/entity/Q640447> as ?wikidata)
>
> service <https://query.wikidata.org/bigdata/namespace/wdq/sparql> {
>
> select ?p ?propLabel ?o ?oLabel where {
>
> <http://www.wikidata.org/entity/Q640447> ?p ?o .
>
> ?prop wikibase:directClaim ?p . SERVICE wikibase:label { bd:serviceParam
> wikibase:language "en,fr" .
>
> } } } }
>
>
> In my real query, in place of the bind, I have some code which selects some
> wikidata entities. The goal is to get a wikidata description of these
> entities
>
> Do you have any ideas?
>
>
> --
> Jean-Claude Moissinac
>


-- 


---
Marco Neumann
KONA


Re: StAX parsing error when querying DBpedia

2019-12-01 Thread Marco Neumann
certainly, the project finished in 2008 - this is not a criticism of the
effort here at all, it just turned up unexpectedly in my LOD scan. And it is
not related to the XML 1.1 issue, but since I flagged SystemARQ.UseSAX = true
it produced the same error along with the issues mentioned above.

On Sun, Dec 1, 2019 at 6:27 PM Andy Seaborne  wrote:

>
>
> On 30/11/2019 21:32, Marco Neumann wrote:
> > ok yes I see, I was running into the issue on the ReSIST SPARQL endpoints
> > at the School of Electronics and Computer Science at the University of
> > Southampton (UK)
> >
> > eg http://acm.rkbexplorer.com/sparql/
>
> Probably pre-dates SPARQL 1.0 final and is quite possibly 3Store
> (judging by the SQL errors), just running on MariaDB these days. Kudos
> to University of Southampton for keeping it running.
>
> >
> > they seem to use a non standard ask syntax as well
> >
> > thx Andy that answered my question here.
> >
> > best,
> > Marco
> >
> > On Sat, Nov 30, 2019 at 8:46 PM Andy Seaborne  wrote:
> >
> >>
> >>
> >> On 30/11/2019 19:31, Marco Neumann wrote:
> >>> this still seems to be a thing, I get the same error on 3.13.1 now
> >>> with qexec.execAsk() which doesn't provide content negotiation.
> >>
> >> It sends:
> >>
> >> Accept:
> >> application/sparql-results+json, application/sparql-results+xml;q=0.9,
> >> text/tab-separated-values;q=0.7, text/csv;q=0.5, application/json;q=0.2,
> >> application/xml;q=0.2, */*;q=0.1
> >>
> >> which is settable via QueryEgineHTTP (the QueryExecution implementation)
> >>
> >> or (better) using RDFConnection.queryAsk(Query)
> >>
> >> but not queryAsk(String) which passes through strings without parsing
> >> (for extension syntax of other engines).
> >>
> >>   Andy
> >>
> >>> in addition
> >>> a method to ignore cert evaluation for https would make sense here
> since
> >> a
> >>> large number of sparql sites seem to have an invalid cert status.
> >>>
> >>> On Tue, May 19, 2015 at 6:31 PM Andy Seaborne  wrote:
> >>>
> >>>> On 19/05/15 17:18, Olivier Rossel wrote:
> >>>>> Should we still ask DBPedia to switch back to XML 1.0 ?
> >>>>
> >>>> It is not just Jena - it's the laggardly state of the JVM so it might
> >>>> affect others.  If it turns out to be a recurrent question, then it
> >>>> would be good to let them know -- I guess the reason is to get the
> wider
> >>>> character support in XML 1.1, not tied to a version of Unicode.
> >>>> Of course, wikipedia can be a bit messy but fixing on ingestion helps
> >>>> everyone.
> >>>>
> >>>>   Andy
> >>>>
> >>>> http://norman.walsh.name/2004/09/30/xml11
> >>>>
> >>>>>
> >>>>> On Wed, May 13, 2015 at 9:02 PM, Andy Seaborne 
> >> wrote:
> >>>>>> On 13/05/15 15:27, Rob Vesse wrote:
> >>>>>>>
> >>>>>>> I assume you'll go ahead and file a bug against Xerces?
> >>>>>>
> >>>>>>
> >>>>>> The issue does not seem to be in Apache Xerces.
> >>>>>>
> >>>>>> Jena is picking up the JDK XMLStreamReader implementation.
> >>>>>>
> >>>>>> Xerces does not provide javax.xml.stream.XMLInputFactory and
> >>>> XMLStreamReader
> >>>>>> at least its not in META-INF/services
> >>>>>>
> >>>>>> It means that adding org.codehaus.woodstox:wstx-asl is a valid
> >>>> workaround
> >>>>>> always as the default JDK provider is not used unless there are no
> >>>>>> XMLInputFactory registered (ServiceLoader).
> >>>>>>
> >>>>>> It's surprising that the JDK bug is still open as the fix for the JDK
> >>>> looks
> >>>>>> small.
> >>>>>>
> >>>>>>Andy
> >>>>>>
> >>>>>>
> >>>>>>>
> >>>>>>> Rob
> >>>>>>>
> >>>>>>> On 13/05/2015 14:56, "Andy Seaborne"  wrote:
> >>>>>>>
> >>>>>>>> So far we know:
> >>>>>>>>
> >

Re: StAX parsing error when querying DBpedia

2019-11-30 Thread Marco Neumann
ok yes I see, I was running into the issue on the ReSIST SPARQL endpoints
at the School of Electronics and Computer Science at the University of
Southampton (UK)

eg http://acm.rkbexplorer.com/sparql/

they seem to use a non standard ask syntax as well

thx Andy that answered my question here.

best,
Marco
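
[For reference, the RDFConnection route Andy describes below looks roughly
like this: queryAsk(Query) parses the string first, so the proper Accept
headers are sent. The endpoint is the one from this thread and stands in
for any SPARQL service.]

import org.apache.jena.query.QueryFactory;
import org.apache.jena.rdfconnection.RDFConnection;
import org.apache.jena.rdfconnection.RDFConnectionFactory;

public class AskExample {
    public static void main(String[] args) {
        try (RDFConnection conn =
                 RDFConnectionFactory.connect("http://acm.rkbexplorer.com/sparql/")) {
            boolean b = conn.queryAsk(QueryFactory.create("ASK { ?s ?p ?o }"));
            System.out.println("ASK => " + b);
        }
    }
}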

On Sat, Nov 30, 2019 at 8:46 PM Andy Seaborne  wrote:

>
>
> On 30/11/2019 19:31, Marco Neumann wrote:
> > this still seems to be a thing, I get the same error on 3.13.1 now
> > with qexec.execAsk() which doesn't provide content negotiation.
>
> It sends:
>
> Accept:
> application/sparql-results+json, application/sparql-results+xml;q=0.9,
> text/tab-separated-values;q=0.7, text/csv;q=0.5, application/json;q=0.2,
> application/xml;q=0.2, */*;q=0.1
>
> which is settable via QueryEgineHTTP (the QueryExecution implementation)
>
> or (better) using RDFConnection.queryAsk(Query)
>
> but not queryAsk(String) which passes through strings without parsing
> (for extension syntax of other engines).
>
>  Andy
>
> > in addition
> > a method to ignore cert evaluation for https would make sense here since
> a
> > large number of sparql sites seem to have an invalid cert status.
> >
> > On Tue, May 19, 2015 at 6:31 PM Andy Seaborne  wrote:
> >
> >> On 19/05/15 17:18, Olivier Rossel wrote:
> >>> Should we still ask DBPedia to switch back to XML 1.0 ?
> >>
> >> It is not just Jena - it's the laggardly state of the JVM so it might
> >> affect others.  If it turns out to be a recurrent question, then it
> >> would be good to let them know -- I guess the reason is to get the wider
> >> character support in XML 1.1, not tied to a version of Unicode.
> >> Of course, wikipedia can be a bit messy but fixing on ingestion helps
> >> everyone.
> >>
> >>  Andy
> >>
> >> http://norman.walsh.name/2004/09/30/xml11
> >>
> >>>
> >>> On Wed, May 13, 2015 at 9:02 PM, Andy Seaborne 
> wrote:
> >>>> On 13/05/15 15:27, Rob Vesse wrote:
> >>>>>
> >>>>> I assume you'll go ahead and file a bug against Xerces?
> >>>>
> >>>>
> >>>> The issue does not seem to be in Apache Xerces.
> >>>>
> >>>> Jena is picking up the JDK XMLStreamReader implementation.
> >>>>
> >>>> Xerces does not provide javax.xml.stream.XMLInputFactory and
> >> XMLStreamReader
> >>>> at least its not in META-INF/services
> >>>>
> >>>> It means that adding org.codehaus.woodstox:wstx-asl is a valid
> >> workaround
> >>>> always as the default JDK provider is not used unless there are no
> >>>> XMLInputFactory registered (ServiceLoader).
> >>>>
> >>>> It's surprising that the JDK bug is still open as the fix for the JDK
> >> looks
> >>>> small.
> >>>>
> >>>>   Andy
> >>>>
> >>>>
> >>>>>
> >>>>> Rob
> >>>>>
> >>>>> On 13/05/2015 14:56, "Andy Seaborne"  wrote:
> >>>>>
> >>>>>> So far we know:
> >>>>>>
> >>>>>> It is a bug in Xerces handling of 1.1
> >>>>>>
> >>>>>> Specifically, an NPE
> >>>>>>   XML11NSDocumentScannerImpl:scanStartElement line 356
> >>>>>>
> >>>>>> (a big +1 to open source here)
> >>>>>>
> >>>>>> 1/ The first problem line hit is 
> >>>>>>
> >>>>>> "/>" is the trigger.
> >>>>>>
> >>>>>>  would work.
> >>>>>>
> >>>>>> 2/ It affects Xerces 2.11.0 and also the Xerces inside OpenJDK.
> >>>>>> https://bugs.openjdk.java.net/browse/JDK-8029437
> >>>>>>
> >>>>>> 3/ Adding org.codehaus.woodstox:wstx-asl to the dependencies can fix
> >> it
> >>>>>> (may depend on ordering) - e.g. add jena-text to your project (!!!).
> >>>>>> because it picks up a different STaX parser.
> >>>>>>
> >>>>>>   Andy
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 13/05/15 14:06, Rob Vesse wrote:
> >

Re: StAX parsing error when querying DBpedia

2019-11-30 Thread Marco Neumann
>> %5
> >>>>>> Cbact%22%2C%22i%22%29%29%7D+ORDER+BY+%3Fclass%0D%0A”
> >>>>>>
> >>>>>> On 13 May 2015, at 12:32, Rob Vesse  wrote:
> >>>>>>
> >>>>>>> What is the error message you get?
> >>>>>>>
> >>>>>>> It is not unheard of for Virtuoso (the software that powers
> DBPedia)
> >>>>>>> to
> >>>>>>> produce bad output particularly if the data has not been
> appropriately
> >>>>>>> sanitised so I would suspect Virtuoso before suspecting Jena in a
> case
> >>>>>>> like this
> >>>>>>>
> >>>>>>> Rob
> >>>>>>>
> >>>>>>> On 13/05/2015 10:16, "Jeremy Debattista"  >
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Dear All,
> >>>>>>>>
> >>>>>>>> I am trying to query the DBpedia SPARQL endpoint using the
> >>>>>>>> QueryExecutionFactory sparqlService and execSelect(), but I’m
> given
> >>>>>>>> the
> >>>>>>>> following error:
> com.hp.hpl.jena.sparql.resultset.ResultSetException:
> >>>>>>>> Failed when initializing the StAX parsing engine
> >>>>>>>>
> >>>>>>>> The query in question is
> >>>>>>>> PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX
> >>>>>>>> rdfs:<http://www.w3.org/2000/01/rdf-schema#> PREFIX
> >>>>>>>> owl:<http://www.w3.org/2002/07/owl#>  SELECT distinct ?class
> ?label
> >>>>>>>> WHERE { {?class rdf:type owl:Class} UNION {?class rdf:type
> >>>>>>>> rdfs:Class}.
> >>>>>>>> ?class rdfs:label ?label.   FILTER(bound(?label)  && REGEX(?label,
> >>>>>>>> "\\bact","i"))} ORDER BY ?class
> >>>>>>>>
> >>>>>>>> which gives a result in dbpedia sparql web interface [1].
> >>>>>>>>
> >>>>>>>> The code in question is the following:
> >>>>>>>>
> >>>>>>>> public static ResultSet executeQuery(String uri, String
> queryString)
> >>>>>>>> {
> >>>>>>>>  Query query = QueryFactory.create(queryString);
> >>>>>>>>  QueryExecution qexec =
> >>>>>>>> QueryExecutionFactory.sparqlService(uri,
> >>>>>>>> query);
> >>>>>>>>  try {
> >>>>>>>>  ResultSet results = qexec.execSelect();
> >>>>>>>>  return results;
> >>>>>>>>  } finally {
> >>>>>>>>
> >>>>>>>>  }
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> After debugging, the problem seems to be related to how the XML
> >>>>>>>> parser
> >>>>>>>> is
> >>>>>>>> reading the stream input. Would you have any other idea how I can
> go
> >>>>>>>> around it?
> >>>>>>>>
> >>>>>>>> Best Regards,
> >>>>>>>> Jeremy
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> [1]
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org
> >>>>>>>> ue
> >>>>>>>> ry
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> =PREFIX+rdf%3A%3Chttp%3A%2F%2Fwww.w3.org
> %2F1999%2F02%2F22-rdf-syntax-n
> >>>>>>>> s%
> >>>>>>>> 23
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> %3E+PREFIX+rdfs%3A%3Chttp%3A%2F%2Fwww.w3.org
> %2F2000%2F01%2Frdf-schema%
> >>>>>>>> 23
> >>>>>>>> %3
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> E+PREFIX+owl%3A%3Chttp%3A%2F%2Fwww.w3.org
> %2F2002%2F07%2Fowl%23%3E++SEL
> >>>>>>>> EC
> >>>>>>>> T+
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> distinct+%3Fclass+%3Flabel++WHERE+%7B+%7B%3Fclass+rdf%3Atype+owl%3ACla
> >>>>>>>> ss
> >>>>>>>> %7
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> D+UNION+%7B%3Fclass+rdf%3Atype+rdfs%3AClass%7D.+%3Fclass+rdfs%3Alabel+
> >>>>>>>> %3
> >>>>>>>> Fl
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> abel.+++FILTER%28bound%28%3Flabel%29++%26%26+REGEX%28%3Flabel%2C+%22%5
> >>>>>>>> C%
> >>>>>>>> 5C
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> bact%22%2C%22i%22%29%29%7D+ORDER+BY+%3Fclass%0D%0A=text%2Fhtml&
> >>>>>>>> ti
> >>>>>>>> me
> >>>>>>>> out=3=on
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>>
> >>>
> >>
>
>

-- 


---
Marco Neumann
KONA


Re: Missing triples after loading RDF to Fuseki

2019-11-25 Thread Marco Neumann
" , "value": "Efectividad del misoprostol para
> el aborto en el primer trimestre del embarazo en adolescentes." } ,
> "name": { "type": "literal" , "value": "Annia Najarro" } ,
> "affiliation": { "type": "literal" , "value": "Hospital provincial
> universitario \u201CDr. Gustavo Aldereguía Lima\u201D. Cienfuegos." } ,
> "abstract": { "type": "literal" , "value": "Some text here." } ,
> "uri": { "type": "literal" , "value": "
> http://www.medisur.sld.cu/index.php/medisur/article/view/353" } ,
> "year": { "type": "literal" , "datatype": "
> http://www.w3.org/2001/XMLSchema#integer" , "value": "2008" }
> }
> ]
> }
> }
>
> It seems that the authors without an affiliation triple aren't in the RDF
> graph. Is this behavior normal?
>
> Best regards.
> Yusniel Hidalgo.
>
> 1519-2019: 500th anniversary of the Villa de San Cristóbal de La Habana
>
> For Havana, the greatest. #Habana500 #UCIxHabana500
>
> --


---
Marco Neumann
KONA


Re: Graph status?

2019-09-08 Thread Marco Neumann
yes I have noticed that, looks like we have done so since we moved to Jena 3.
Do you think it's feasible to recommend Commons implement a Commons Graph as
a custom Jena Graph to utilize the Jena API in the future? What does guava
take care of in the current Jena context?




On Sun, Sep 8, 2019 at 8:44 PM Andy Seaborne  wrote:

>
>
> On 08/09/2019 09:16, Marco Neumann wrote:
> > Romain,
> >
> > sure Jena gives you so much more than just the basic graph
> infrastructure.
> > I wasn't acutely aware of the guava shade mandatory requirement in a
> > minimal viable setup of Jena. Still I would encourage you to engage with
> > Jena community to discuss design ideas and opportunity for reuse of jena
> > components in your work on graph at commons and apache.
>
> Yes, Jena uses a shaded guava.  Used as a library in an application,
> Jena's choice of libraries can get into version issues. Guava used to
> change between versions quite a bit (from general reading, I think it is
> more stable nowadays) so we don't force the Guava version on the
> application but shade it to make the Jena use independent of the app
> choice.
>
> >
> > FYI I currently need the following libraries in a minimal viable setup to
> > work with the Jena graph api.
> >
> > jena-base (215kb), jena-core (1.69mb), jena-shaded-guava (2.73mb), log4j
> > (479kb), slf4j-log (41kb), slf4j-api (12kb)
>
> FYI:
> A choice of slf4j logging impl is needed - it does not have to be
> log4j1. Testing and jena-cmds does make that choice but for general use
> of the modules, it is choice-independent.
>
> >
> > Marco
> >
> > On Sat, Sep 7, 2019 at 10:33 PM Romain Manni-Bucau <
> rmannibu...@gmail.com>
> > wrote:
> >
> >> Hi Marco,
> >>
> >> Had a look at jena for another project and didn't evaluate it here for
> these
> >> reasons (happy to be wrong):
> >>
> >> - dep stack was huge for only graph part (guava shade, some other
> unneeded
> >> commons etc, most being excludable but without guarantees in time)
> >> - it is not about DAG and therefore misses navigation methods (which is
> >> what I need in addition to "mutation" methods for the algo i want to
> impl)
> >> - it is not the goal of jena so API and core stack can evolve in an
> >> undesired manner
> >>
> >> To mention alternatives, spark, flink, beam, ignite for the few I can
> think
> >> about, have something not crazy but still this stack and API issues :(.
> >>
> >> This is how i ended up looking commons, to try to have something stable
> and
> >> dep free.
> >>
> >> Romain
> >>
> >>
> >> On Sat, Sep 7, 2019 at 23:15, Marco Neumann  wrote:
> >>
> >>> I highly recommend to take a look at the Apache Jena project for
> >>> inspiration here. It has a very mature graph representation at this
> point:
> >>>
> >>> https://jena.apache.org/
> >>>
> >>>
> >>>
> >>
> https://jena.apache.org/documentation/javadoc/jena/org/apache/jena/graph/Graph.html
> >>>
> >>> Jena uses triples in the form of <subject, predicate, object> to encode the graph
> >>>
> >>> give it a try and make sure to post to users@jena.apache.org if you have
> >> any
> >>> questions
> >>>
> >>> enjoy,
> >>> Marco
> >>>
> >>> On Sat, Sep 7, 2019 at 10:30 AM Romain Manni-Bucau <
> >> rmannibu...@gmail.com>
> >>> wrote:
> >>>
> >>>> Hi all
> >>>>
> >>>> What is the status of graph at commons - or apache if we have
> something
> >>>> elsewhere?
> >>>>
> >>>> I found in sandbox that doc
> >>>>
> >>>>
> >>>
> >>
> https://commons.apache.org/sandbox/commons-graph/apidocs/org/apache/commons/graph/DirectedGraph.html
> >>>> ,
> >>>> but wonder if we have something live and if not why it failed.
> >>>>
> >>>> My rationale is I started to write some DAG modelization and tooling
> >>>> (backward browsing in my case) but I see it could be generic so wonder
> >> if
> >>>> it is worth thinking about commons or incubator, or if the scope is too
> >> small
> >>>> for that and keeping it specific is saner.
> >>>>
> >>>> Anyone has some pointers?
> >>>
> >>>
> >>> --
> >>>
> >>>
> >>> ---
> >>> Marco Neumann
> >>> KONA
> >>>
> >>> --
> >>>
> >>>
> >>> ---
> >>> Marco Neumann
> >>> KONA
> >>>
> >>
> >
> >
>


-- 


---
Marco Neumann
KONA


Re: Graph status?

2019-09-08 Thread Marco Neumann
Romain,

sure Jena gives you so much more than just the basic graph infrastructure.
I wasn't acutely aware of the guava shade mandatory requirement in a
minimal viable setup of Jena. Still I would encourage you to engage with
Jena community to discuss design ideas and opportunity for reuse of jena
components in your work on graph at commons and apache.

FYI I currently need the following libraries in a minimal viable setup to
work with the Jena graph api.

jena-base (215kb), jena-core (1.69mb), jena-shaded-guava (2.73mb), log4j
(479kb), slf4j-log (41kb), slf4j-api (12kb)
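
[For context, the kind of graph-level code that this minimal setup supports;
a sketch using only jena-core types, with placeholder URIs.]

import org.apache.jena.graph.Graph;
import org.apache.jena.graph.Node;
import org.apache.jena.graph.NodeFactory;
import org.apache.jena.graph.Triple;
import org.apache.jena.rdf.model.ModelFactory;

public class MinimalGraph {
    public static void main(String[] args) {
        Graph g = ModelFactory.createDefaultModel().getGraph(); // in-memory graph
        Node a = NodeFactory.createURI("http://example.org/a");
        Node linksTo = NodeFactory.createURI("http://example.org/linksTo");
        Node b = NodeFactory.createURI("http://example.org/b");
        g.add(Triple.create(a, linksTo, b));

        // Navigation is done with find() plus wildcards.
        g.find(a, Node.ANY, Node.ANY).forEachRemaining(System.out::println);
    }
}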

Marco

On Sat, Sep 7, 2019 at 10:33 PM Romain Manni-Bucau 
wrote:

> Hi Marco,
>
> Had a look at jena for another project and didn't evaluate it here for these
> reasons (happy to be wrong):
>
> - dep stack was huge for only graph part (guava shade, some other unneeded
> commons etc, most being excludable but without guarantees in time)
> - it is not about DAG and therefore misses navigation methods (which is
> what I need in addition to "mutation" methods for the algo i want to impl)
> - it is not the goal of jena so API and core stack can evolve in an
> undesired manner
>
> To mention alternatives, spark, flink, beam, ignite for the few I can think
> about, have something not crazy but still this stack and API issues :(.
>
> This is how i ended up looking commons, to try to have something stable and
> dep free.
>
> Romain
>
>
> On Sat, Sep 7, 2019 at 23:15, Marco Neumann  wrote:
>
> > I highly recommend to take a look at the Apache Jena project for
> > inspiration here. It has a very mature graph representation at this point:
> >
> > https://jena.apache.org/
> >
> >
> >
> https://jena.apache.org/documentation/javadoc/jena/org/apache/jena/graph/Graph.html
> >
> > Jena uses triples in the form of <subject, predicate, object> to encode the graph
> >
> > give it a try and make sure to post to users@jena.apache.org if you have
> any
> > questions
> >
> > enjoy,
> > Marco
> >
> > On Sat, Sep 7, 2019 at 10:30 AM Romain Manni-Bucau <
> rmannibu...@gmail.com>
> > wrote:
> >
> > > Hi all
> > >
> > > What is the status of graph at commons - or apache if we have something
> > > elsewhere?
> > >
> > > I found in sandbox that doc
> > >
> > >
> >
> https://commons.apache.org/sandbox/commons-graph/apidocs/org/apache/commons/graph/DirectedGraph.html
> > > ,
> > > but wonder if we have something live and if not why it failed.
> > >
> > > My rationale is I started to write some DAG modelization and tooling
> > > (backward browsing in my case) but I see it could be generic so wonder
> if
> > > it is worth thinking about commons or incubator, or if the scope is too
> small
> > > for that and keeping it specific is saner.
> > >
> > > Anyone has some pointers?
> >
> >
> > --
> >
> >
> > ---
> > Marco Neumann
> > KONA
> >
> > --
> >
> >
> > ---
> > Marco Neumann
> > KONA
> >
>


-- 


---
Marco Neumann
KONA


Re: JENA Loader Benchmarks

2019-06-23 Thread Marco Neumann
yes I'd say the local NVMe SSDs make the difference here. In my case, for
zones US East and US East 2, the VMs are only showing a premium SSD option.
So-called ultra SSDs seem to be in high demand and currently not available in
my profile. And they also come at a very high estimated price point.

which dataset do you use to run the load test above?



On Sat, Jun 22, 2019 at 11:47 PM Andy Seaborne  wrote:

>
>
> On 20/06/2019 16:01, Marco Neumann wrote:
> > quick update here on loader performance. Did a modest (in terms of cost)
> > hardware upgrade of one of the dedicated data processors with a faster
> CPU
> > and faster NVMe SSD drive and was able to almost halve our load times.
> Very
> > satisfied with the HW upgrade and TDB2 loader performance. VM's don't
> seem
> > to work well for us in combination with TDB.
>
> My experience has been significant variation across different VM types.
> My assumption is the form of virtualization matters.
>
> I had access to an AWS i3.8xlarge for a short while which had local NVMe
> SSDs and got very good performance:
>
> 500m        TDB2   2,362s   39m 22s      218,460 TPS
> 1 billion   TDB2   5,164s   1h 26m 04s   200,100 TPS
>
> (this is a single graph dataset)
>
> i3 are "Storage optimized"
>
> The TDB2 loader is multithreaded and each thread is working on a
> different index, so the access patterns are jumping around all over the
> place both because the non-primary index is, in effect at scale,
> randomly accessed, and because multiple indexes are updating at the same
> time.
>
>  Andy
>
> >
> > On Fri, Jun 14, 2019 at 11:56 PM Marco Neumann 
> > wrote:
> >
> >> absolutely it does, preferably NVMe SSD. tdbloaders are almost a
> showcase
> >> themselves for good up-to-date hardware.
> >>
> >> if possible I'd like to load the wikidata dataset* at some point to
> see
> >> where 57GB fits in terms of tdb. The wikidata team is currently looking
> at
> >> new solutions that can go beyond blazegraph. And I get the impression
> that
> >> they have not yet actively considered giving jena tdb a try.
> >>
> >> https://dumps.wikimedia.org/wikidatawiki/entities/
> >>
> >>
> >> On Fri, Jun 14, 2019 at 11:47 PM Martynas Jusevičius <
> >> marty...@atomgraph.com> wrote:
> >>
> >>> What about SSD disks, don't they make a difference?
> >>>
> >>> On Sat, Jun 15, 2019 at 12:36 AM Marco Neumann <
> marco.neum...@gmail.com>
> >>> wrote:
> >>>>
> >>>> that did the trick Andy, very good. might be a good idea to add this to
> >>> the
> >>>> distribution in jena-log4j.properties
> >>>>
> >>>> I am getting these numbers for a midsize dedicated server, very nice
> >>>> numbers indeed Andy. well done!
> >>>>
> >>>> 00:24:53 INFO  loader   :: Loader = LoaderPhased
> >>>> 00:24:53 INFO  loader   :: Start:
> >>>> ../../public_html/lotico.ttl.gz
> >>>> 00:24:55 INFO  loader   :: Add: 500,000 lotico.ttl.gz
> >>> (Batch:
> >>>> 237,755 / Avg: 237,755)
> >>>> 00:24:56 INFO  loader   :: Add: 1,000,000 lotico.ttl.gz
> >>> (Batch:
> >>>> 305,250 / Avg: 267,308)
> >>>> 00:24:58 INFO  loader   :: Add: 1,500,000 lotico.ttl.gz
> >>> (Batch:
> >>>> 313,087 / Avg: 281,004)
> >>>> 00:25:00 INFO  loader   :: Add: 2,000,000 lotico.ttl.gz
> >>> (Batch:
> >>>> 328,299 / Avg: 291,502)
> >>>> 00:25:01 INFO  loader   :: Add: 2,500,000 lotico.ttl.gz
> >>> (Batch:
> >>>> 341,763 / Avg: 300,336)
> >>>> 00:25:03 INFO  loader   :: Add: 3,000,000 lotico.ttl.gz
> >>> (Batch:
> >>>> 337,381 / Avg: 305,935)
> >>>> 00:25:04 INFO  loader   :: Add: 3,500,000 lotico.ttl.gz
> >>> (Batch:
> >>>> 318,877 / Avg: 307,719)
> >>>> 00:25:06 INFO  loader   :: Add: 4,000,000 lotico.ttl.gz
> >>> (Batch:
> >>>> 295,857 / Avg: 306,184)
> >>>> 00:25:07 INFO  loader   :: Add: 4,500,000 lotico.ttl.gz
> >>> (Batch:
> >>>> 327,225 / Avg: 308,388)
> >>>> 00:25:09 INFO  loader   :: Add: 5,000,000 lotico.ttl.gz
> >>> (Batch:
> >>>> 349,406 / Avg: 312,051)
> >>>> 0

Re: JENA Loader Benchmarks

2019-06-20 Thread Marco Neumann
quick update here on loader performance. Did a modest (in terms of cost)
hardware upgrade of one of the dedicated data processors with a faster CPU
and faster NVMe SSD drive and was able to almost halve our load times. Very
satisfied with the HW upgrade and TDB2 loader performance. VM's don't seem
to work well for us in combination with TDB.

On Fri, Jun 14, 2019 at 11:56 PM Marco Neumann 
wrote:

> absolutely it does, preferably NVMe SSD. tdbloaders are almost a showcase
> themselves for good up-to-date hardware.
>
> if possible I'd like to load the wikidata dataset* at some point to see
> where 57GB fits in terms of tdb. The wikidata team is currently looking at
> new solutions that can go beyond blazegraph. And I get the impression that
> they have not yet actively considered giving jena tdb a try.
>
> https://dumps.wikimedia.org/wikidatawiki/entities/
>
>
> On Fri, Jun 14, 2019 at 11:47 PM Martynas Jusevičius <
> marty...@atomgraph.com> wrote:
>
>> What about SSD disks, don't they make a difference?
>>
>> On Sat, Jun 15, 2019 at 12:36 AM Marco Neumann 
>> wrote:
>> >
>> > that did the trick Andy, very good. might be a good idea to add this to
>> the
>> > distribution in jena-log4j.properties
>> >
>> > I am getting these numbers for a midsize dedicated server, very nice
>> > numbers indeed Andy. well done!
>> >
>> > 00:24:53 INFO  loader   :: Loader = LoaderPhased
>> > 00:24:53 INFO  loader   :: Start:
>> > ../../public_html/lotico.ttl.gz
>> > 00:24:55 INFO  loader   :: Add: 500,000 lotico.ttl.gz
>> (Batch:
>> > 237,755 / Avg: 237,755)
>> > 00:24:56 INFO  loader   :: Add: 1,000,000 lotico.ttl.gz
>> (Batch:
>> > 305,250 / Avg: 267,308)
>> > 00:24:58 INFO  loader   :: Add: 1,500,000 lotico.ttl.gz
>> (Batch:
>> > 313,087 / Avg: 281,004)
>> > 00:25:00 INFO  loader   :: Add: 2,000,000 lotico.ttl.gz
>> (Batch:
>> > 328,299 / Avg: 291,502)
>> > 00:25:01 INFO  loader   :: Add: 2,500,000 lotico.ttl.gz
>> (Batch:
>> > 341,763 / Avg: 300,336)
>> > 00:25:03 INFO  loader   :: Add: 3,000,000 lotico.ttl.gz
>> (Batch:
>> > 337,381 / Avg: 305,935)
>> > 00:25:04 INFO  loader   :: Add: 3,500,000 lotico.ttl.gz
>> (Batch:
>> > 318,877 / Avg: 307,719)
>> > 00:25:06 INFO  loader   :: Add: 4,000,000 lotico.ttl.gz
>> (Batch:
>> > 295,857 / Avg: 306,184)
>> > 00:25:07 INFO  loader   :: Add: 4,500,000 lotico.ttl.gz
>> (Batch:
>> > 327,225 / Avg: 308,388)
>> > 00:25:09 INFO  loader   :: Add: 5,000,000 lotico.ttl.gz
>> (Batch:
>> > 349,406 / Avg: 312,051)
>> > 00:25:09 INFO  loader   ::   Elapsed: 16.02 seconds
>> [2019/06/15
>> > 00:25:09 CEST]
>> > 00:25:11 INFO  loader   :: Add: 5,500,000 lotico.ttl.gz
>> (Batch:
>> > 285,062 / Avg: 309,388)
>> > 00:25:13 INFO  loader   :: Add: 6,000,000 lotico.ttl.gz
>> (Batch:
>> > 203,665 / Avg: 296,559)
>> > 00:25:16 INFO  loader   :: Add: 6,500,000 lotico.ttl.gz
>> (Batch:
>> > 189,393 / Avg: 284,190)
>> >
>> > on another machine that sits in the Azure infrastructure somewhere it
>> > tdbloader doesn't look as good, even with decent hardware it seems to
>> die a
>> > slow death of memory exhaustion at 16GB. started off with 70kT/s and is
>> now
>> > down to 17kT/s and still going.
>> >
>> > lesson learned: big iron and big memory is the way to go with Jena
>> > tdbloaders.
>> >
>> >
>> >
>> >
>> > On Fri, Jun 14, 2019 at 10:53 PM Andy Seaborne  wrote:
>> >
>> > > These messages are logged (to logger "org.apache.jena.tdb2.loader") -
>> do
>> > > you have log4j.proprties in the current working directory?
>> > >
>> > > Do you get any output?
>> > >
>> > > INFO  Loader = LoaderParallel
>> > > INFO  Start: /home/afs/Datasets/BSBM/bsbm-5m.nt.gz
>> > > INFO  Add: 500,000 bsbm-5m.nt.gz (Batch: 134,770 / Avg: 134,770)
>> > > INFO  Add: 1,000,000 bsbm-5m.nt.gz (Batch: 189,753 / Avg: 157,604)
>> > > INFO  Add: 1,500,000 bsbm-5m.nt.gz (Batch: 205,676 / Avg: 170,920)
>> > > INFO  Add: 2,000,000 bsbm-5m.nt.gz (Batch: 204,248 / Avg: 178,189)
>> > > INFO  Add: 2,500,000 bsbm-5m.nt.gz (Batch: 202,101 / Avg: 182,508)
>

Re: JENA Loader Benchmarks

2019-06-18 Thread Marco Neumann
I agree, it would be desirable to have funding for these requests and more.
Too bad there isn't currently a commercial entity that helps actively drive
this valuable project.


On Tue, Jun 18, 2019 at 2:37 PM Andy Seaborne  wrote:

>
>
> On 18/06/2019 13:44, Marco Neumann wrote:
> > Andy, just one observation. there seems to be quite some data replication
> > going on in the respective tdb / tdb2 folder.
> >
> > Is it possible to instruct tdb/tdb2 to create a database with only one
> > default graph?
>
> In theory you can set the indexes you want via StoreParams - it works
> for choices but I would not be surprised if the code assumed at least
> one quads index. Fixable.
>
> > It seems to be quite safe to remove files from disk that
> > contain G-indexes manually and maintain query consistency in the default
> > graph, and it would reduce the tdb database footprint on disk by 1/3.
> >
>
> They aren't as big as you think they are :-)
>
> Try this:
>
> No DB2.
>
> tdb2.tdbquery --loc DB2 'ASK{}'
> Ask => Yes
>
> du -sh DB2
> 216KDB2
>
> so it is 216K bytes on disk empty.
>
> (this is Linux/ext4 filesystem)
>
> ~ >> ll DB2/Data-0001/
>
> loads of 8M files.
>
> How come there are files that are 8M but the entire thing is 216K?
>
> They are sparse files.
> The space is not allocated.
>
> Some systems (Mac for example) report the size of the files added up,
> not the space used.
>
> total 204
> -rw-r--r-- 1 afs afs  24 Jun 18 14:26 GOSP.bpt
> -rw-r--r-- 1 afs afs 8388608 Jun 18 14:26 GOSP.dat
> -rw-r--r-- 1 afs afs 8388608 Jun 18 14:26 GOSP.idn
>
> > not to speak of an option for LZW compression a la HDT.
>
> That would be good if I had time. Anyone got any spare funding?!
>
> I'm not sure how the HDT (java) project is doing.
> Like all open source projects, it needs time and energy, and executing a
> steady state still requires backing.
>
> I currently think RocksDB is a possible choice. Initial experiments showed
> it works but needs tuning work. The new storage architecture
> (jena-dboe-storage) would make it even easier to build.
>
>  Andy
>
>
> >
> >
> >
> > On Fri, Jun 14, 2019 at 8:03 PM Andy Seaborne  wrote:
> >
> >>
> >>
> >> On 14/06/2019 18:13, Marco Neumann wrote:
> >>> I am collecting jena loader benchmarks. if you have results please post
> >>> them directly.
> >>>
> >>> http://www.lotico.com/index.php/JENA_Loader_Benchmarks
> >>
> >> tdb2.tdbloader has variations controlled by --loader.
> >>
> >> --loader=
> >> Loader to use: 'basic', 'phased' (default), 'sequential', 'parallel' or
> >> 'light'
> >>
> >> "basic" is a super naive parser-add triple loop - it used if a loader
> >> can't cope with an already loaded database.
> >>
> >> "phased" is a balanced, does not saturate the machine loader. Some
> >> parallelism.
> >>
> >> "sequential" is the tdbloader algorithm for TDB2, more for reference.
> >>
> >> "parallel" is as much parallelism as it wants. (5 for triples, more for
> >> quads)
> >>
> >> "light" is two threaded. Slightly ligther than "phased".
> >>
> >> See LoaderPlans.
> >>
> >>> On a linux machine I am using "time" to collect data.
> >>>
> >>> Is there a flag on tdb2.tdbloader to report time and triples per
> second?
> >>>
> >>> I have noticed that storage space use for tdbloader2 is significantly
> >>> smaller on disk compared to tdbloader and tdb2.tdbloader. Is there a
> >> straightforward explanation here?
> >>>
> >>
> >
> >
>


-- 


---
Marco Neumann
KONA

-- 


---
Marco Neumann
KONA


Re: JENA Loader Benchmarks

2019-06-18 Thread Marco Neumann
Andy, just one observation. there seems to be quite some data replication
going on in the respective tdb / tdb2 folder.

Is it possible to instruct tdb/tdb2 to create a database with only one
default graph? It seems to be quite safe to remove files from disk that
contain G-indexes manually and maintain query consistency in the default
graph, and it would reduce the tdb database footprint on disk by 1/3.

not to speak of an option for LZW compression a la HDT.



On Fri, Jun 14, 2019 at 8:03 PM Andy Seaborne  wrote:

>
>
> On 14/06/2019 18:13, Marco Neumann wrote:
> > I am collecting Jena loader benchmarks. If you have results please post
> > them directly.
> >
> > http://www.lotico.com/index.php/JENA_Loader_Benchmarks
>
> tdb2.tdbloader has variations controlled by --loader.
>
> --loader=
> Loader to use: 'basic', 'phased' (default), 'sequential', 'parallel' or
> 'light'
>
> "basic" is a super naive parser-add triple loop - it used if a loader
> can't cope with an already loaded database.
>
> "phased" is a balanced, does not saturate the machine loader. Some
> parallelism.
>
> "sequential" is the tdbloader algorithm for TDB2, more for reference.
>
> "parallel" is as much parallelism as it wants. (5 for triples, more for
> quads)
>
> "light" is two threaded. Slightly ligther than "phased".
>
> See LoaderPlans.
>
> > On a Linux machine I am using "time" to collect data.
> >
> > Is there a flag on tdb2.tdbloader to report time and triples per second?
> >
> > I have noticed that storage space use for tdbloader2 is significantly
> > smaller on disk compared to tdbloader and tdb2.tdbloader. Is there a
> > straightforward explanation here?
> >
>


-- 


---
Marco Neumann
KONA


Re: JENA Loader Benchmarks

2019-06-15 Thread Marco Neumann
Very good, thank you for the links Lorenz!

Marco

On Sat, Jun 15, 2019 at 8:10 AM Lorenz B. <
buehm...@informatik.uni-leipzig.de> wrote:

> Hi Marco,
>
> that reminds me of previous discussions in Nov./Dec. 2017, one
> regarding general performance titled "tdb2.tdbloader performance" [1, 2]
> and then, as a follow-up, "Report on loading wikidata" [3]. Maybe you can
> also have a look at them; some people like Dick and Andy also did some
> kind of (light-weight) performance benchmarking.
>
> [1]
>
> https://lists.apache.org/thread.html/a5a2751a4fc4387c3db929b95927a95cbc4d0116664c7f3d32dca576@%3Cusers.jena.apache.org%3E
> [2]
>
> https://lists.apache.org/thread.html/34b53d7ee75e484cdbcc2ac75e075e6d7321ba1ee4a143c58c95b793@%3Cusers.jena.apache.org%3E
> [3]
>
> https://lists.apache.org/thread.html/70dde8e3d99ce3d69de613b5013c3f4c583d96161dec494ece49a412@%3Cusers.jena.apache.org%3E
>
> > Absolutely it does, preferably NVMe SSD. The tdbloaders are almost a
> > showcase for good, up-to-date hardware.
> >
> > If possible I'd like to load the wikidata dataset* at some point to see
> > where 57GB fits in terms of tdb. The wikidata team is currently looking
> > at new solutions that can go beyond blazegraph. And I get the impression
> > that they have not yet actively considered giving jena tdb a try.
> >
> > https://dumps.wikimedia.org/wikidatawiki/entities/
> >
> >
> > On Fri, Jun 14, 2019 at 11:47 PM Martynas Jusevičius <
> marty...@atomgraph.com>
> > wrote:
> >
> >> What about SSD disks, don't they make a difference?
> >>
> >> On Sat, Jun 15, 2019 at 12:36 AM Marco Neumann  >
> >> wrote:
> >>> That did the trick Andy, very good. Might be a good idea to add this
> >>> to the distribution in jena-log4j.properties.
> >>>
> >>> I am getting these numbers for a midsize dedicated server. Very nice
> >>> numbers indeed Andy, well done!
> >>>
> >>> 00:24:53 INFO  loader   :: Loader = LoaderPhased
> >>> 00:24:53 INFO  loader   :: Start:
> >>> ../../public_html/lotico.ttl.gz
> >>> 00:24:55 INFO  loader   :: Add: 500,000 lotico.ttl.gz
> (Batch:
> >>> 237,755 / Avg: 237,755)
> >>> 00:24:56 INFO  loader   :: Add: 1,000,000 lotico.ttl.gz
> >> (Batch:
> >>> 305,250 / Avg: 267,308)
> >>> 00:24:58 INFO  loader   :: Add: 1,500,000 lotico.ttl.gz
> >> (Batch:
> >>> 313,087 / Avg: 281,004)
> >>> 00:25:00 INFO  loader   :: Add: 2,000,000 lotico.ttl.gz
> >> (Batch:
> >>> 328,299 / Avg: 291,502)
> >>> 00:25:01 INFO  loader   :: Add: 2,500,000 lotico.ttl.gz
> >> (Batch:
> >>> 341,763 / Avg: 300,336)
> >>> 00:25:03 INFO  loader   :: Add: 3,000,000 lotico.ttl.gz
> >> (Batch:
> >>> 337,381 / Avg: 305,935)
> >>> 00:25:04 INFO  loader   :: Add: 3,500,000 lotico.ttl.gz
> >> (Batch:
> >>> 318,877 / Avg: 307,719)
> >>> 00:25:06 INFO  loader   :: Add: 4,000,000 lotico.ttl.gz
> >> (Batch:
> >>> 295,857 / Avg: 306,184)
> >>> 00:25:07 INFO  loader   :: Add: 4,500,000 lotico.ttl.gz
> >> (Batch:
> >>> 327,225 / Avg: 308,388)
> >>> 00:25:09 INFO  loader   :: Add: 5,000,000 lotico.ttl.gz
> >> (Batch:
> >>> 349,406 / Avg: 312,051)
> >>> 00:25:09 INFO  loader   ::   Elapsed: 16.02 seconds
> >> [2019/06/15
> >>> 00:25:09 CEST]
> >>> 00:25:11 INFO  loader   :: Add: 5,500,000 lotico.ttl.gz
> >> (Batch:
> >>> 285,062 / Avg: 309,388)
> >>> 00:25:13 INFO  loader   :: Add: 6,000,000 lotico.ttl.gz
> >> (Batch:
> >>> 203,665 / Avg: 296,559)
> >>> 00:25:16 INFO  loader   :: Add: 6,500,000 lotico.ttl.gz
> >> (Batch:
> >>> 189,393 / Avg: 284,190)
> >>>
> >>> On another machine that sits in the Azure infrastructure somewhere,
> >>> tdbloader doesn't look as good; even with decent hardware it seems to
> >>> die a slow death of memory exhaustion at 16GB. It started off with
> >>> 70kT/s and is now down to 17kT/s and still going.
> >>>
> >>> Lesson learned: big iron and big memory is the way to go with the Jena
> >>> tdbloaders.
> >>>
> >>>
> >>

Re: JENA Loader Benchmarks

2019-06-14 Thread Marco Neumann
Absolutely it does, preferably NVMe SSD. The tdbloaders are almost a
showcase for good, up-to-date hardware.

If possible I'd like to load the wikidata dataset* at some point to see
where 57GB fits in terms of tdb. The wikidata team is currently looking at
new solutions that can go beyond blazegraph. And I get the impression that
they have not yet actively considered giving jena tdb a try.

https://dumps.wikimedia.org/wikidatawiki/entities/


On Fri, Jun 14, 2019 at 11:47 PM Martynas Jusevičius 
wrote:

> What about SSD disks, don't they make a difference?
>
> On Sat, Jun 15, 2019 at 12:36 AM Marco Neumann 
> wrote:
> >
> > That did the trick Andy, very good. Might be a good idea to add this to
> > the distribution in jena-log4j.properties.
> >
> > I am getting these numbers for a midsize dedicated server. Very nice
> > numbers indeed Andy, well done!
> >
> > 00:24:53 INFO  loader   :: Loader = LoaderPhased
> > 00:24:53 INFO  loader   :: Start:
> > ../../public_html/lotico.ttl.gz
> > 00:24:55 INFO  loader   :: Add: 500,000 lotico.ttl.gz (Batch:
> > 237,755 / Avg: 237,755)
> > 00:24:56 INFO  loader   :: Add: 1,000,000 lotico.ttl.gz
> (Batch:
> > 305,250 / Avg: 267,308)
> > 00:24:58 INFO  loader   :: Add: 1,500,000 lotico.ttl.gz
> (Batch:
> > 313,087 / Avg: 281,004)
> > 00:25:00 INFO  loader   :: Add: 2,000,000 lotico.ttl.gz
> (Batch:
> > 328,299 / Avg: 291,502)
> > 00:25:01 INFO  loader   :: Add: 2,500,000 lotico.ttl.gz
> (Batch:
> > 341,763 / Avg: 300,336)
> > 00:25:03 INFO  loader   :: Add: 3,000,000 lotico.ttl.gz
> (Batch:
> > 337,381 / Avg: 305,935)
> > 00:25:04 INFO  loader   :: Add: 3,500,000 lotico.ttl.gz
> (Batch:
> > 318,877 / Avg: 307,719)
> > 00:25:06 INFO  loader   :: Add: 4,000,000 lotico.ttl.gz
> (Batch:
> > 295,857 / Avg: 306,184)
> > 00:25:07 INFO  loader   :: Add: 4,500,000 lotico.ttl.gz
> (Batch:
> > 327,225 / Avg: 308,388)
> > 00:25:09 INFO  loader   :: Add: 5,000,000 lotico.ttl.gz
> (Batch:
> > 349,406 / Avg: 312,051)
> > 00:25:09 INFO  loader   ::   Elapsed: 16.02 seconds
> [2019/06/15
> > 00:25:09 CEST]
> > 00:25:11 INFO  loader   :: Add: 5,500,000 lotico.ttl.gz
> (Batch:
> > 285,062 / Avg: 309,388)
> > 00:25:13 INFO  loader   :: Add: 6,000,000 lotico.ttl.gz
> (Batch:
> > 203,665 / Avg: 296,559)
> > 00:25:16 INFO  loader   :: Add: 6,500,000 lotico.ttl.gz
> (Batch:
> > 189,393 / Avg: 284,190)
> >
> > On another machine that sits in the Azure infrastructure somewhere,
> > tdbloader doesn't look as good; even with decent hardware it seems to
> > die a slow death of memory exhaustion at 16GB. It started off with
> > 70kT/s and is now down to 17kT/s and still going.
> >
> > Lesson learned: big iron and big memory is the way to go with the Jena
> > tdbloaders.
> >
> >
> >
> >
> > On Fri, Jun 14, 2019 at 10:53 PM Andy Seaborne  wrote:
> >
> > > These messages are logged (to logger "org.apache.jena.tdb2.loader") -
> > > do you have log4j.properties in the current working directory?
> > >
> > > Do you get any output?
> > >
> > > INFO  Loader = LoaderParallel
> > > INFO  Start: /home/afs/Datasets/BSBM/bsbm-5m.nt.gz
> > > INFO  Add: 500,000 bsbm-5m.nt.gz (Batch: 134,770 / Avg: 134,770)
> > > INFO  Add: 1,000,000 bsbm-5m.nt.gz (Batch: 189,753 / Avg: 157,604)
> > > INFO  Add: 1,500,000 bsbm-5m.nt.gz (Batch: 205,676 / Avg: 170,920)
> > > INFO  Add: 2,000,000 bsbm-5m.nt.gz (Batch: 204,248 / Avg: 178,189)
> > > INFO  Add: 2,500,000 bsbm-5m.nt.gz (Batch: 202,101 / Avg: 182,508)
> > > INFO  Add: 3,000,000 bsbm-5m.nt.gz (Batch: 206,953 / Avg: 186,173)
> > > INFO  Add: 3,500,000 bsbm-5m.nt.gz (Batch: 183,621 / Avg: 185,804)
> > > INFO  Add: 4,000,000 bsbm-5m.nt.gz (Batch: 151,423 / Avg: 180,676)
> > > INFO  Add: 4,500,000 bsbm-5m.nt.gz (Batch: 152,765 / Avg: 177,081)
> > > INFO  Add: 5,000,000 bsbm-5m.nt.gz (Batch: 158,881 / Avg: 175,076)
> > > INFOElapsed: 28.56 seconds [2019/06/14 22:51:37 BST]
> > > INFO  Finished: /home/afs/Datasets/BSBM/bsbm-5m.nt.gz: 5,000,599 tuples
> > > in 28.63s (Avg: 174,644)
> > > INFO  Finish - index SPO
> > > INFO  Finish - index POS
> > > INFO  Finish - index OSP
> > > INFO  Time = 35.572 seconds : Triples = 5,000,599 : Rate = 140,577 /s
> > >
> > >
> >

Re: JENA Loader Benchmarks

2019-06-14 Thread Marco Neumann
That did the trick Andy, very good. Might be a good idea to add this to the
distribution in jena-log4j.properties.

I am getting these numbers for a midsize dedicated server. Very nice
numbers indeed Andy, well done!

00:24:53 INFO  loader   :: Loader = LoaderPhased
00:24:53 INFO  loader   :: Start:
../../public_html/lotico.ttl.gz
00:24:55 INFO  loader   :: Add: 500,000 lotico.ttl.gz (Batch:
237,755 / Avg: 237,755)
00:24:56 INFO  loader   :: Add: 1,000,000 lotico.ttl.gz (Batch:
305,250 / Avg: 267,308)
00:24:58 INFO  loader   :: Add: 1,500,000 lotico.ttl.gz (Batch:
313,087 / Avg: 281,004)
00:25:00 INFO  loader   :: Add: 2,000,000 lotico.ttl.gz (Batch:
328,299 / Avg: 291,502)
00:25:01 INFO  loader   :: Add: 2,500,000 lotico.ttl.gz (Batch:
341,763 / Avg: 300,336)
00:25:03 INFO  loader   :: Add: 3,000,000 lotico.ttl.gz (Batch:
337,381 / Avg: 305,935)
00:25:04 INFO  loader   :: Add: 3,500,000 lotico.ttl.gz (Batch:
318,877 / Avg: 307,719)
00:25:06 INFO  loader   :: Add: 4,000,000 lotico.ttl.gz (Batch:
295,857 / Avg: 306,184)
00:25:07 INFO  loader   :: Add: 4,500,000 lotico.ttl.gz (Batch:
327,225 / Avg: 308,388)
00:25:09 INFO  loader   :: Add: 5,000,000 lotico.ttl.gz (Batch:
349,406 / Avg: 312,051)
00:25:09 INFO  loader   ::   Elapsed: 16.02 seconds [2019/06/15
00:25:09 CEST]
00:25:11 INFO  loader   :: Add: 5,500,000 lotico.ttl.gz (Batch:
285,062 / Avg: 309,388)
00:25:13 INFO  loader   :: Add: 6,000,000 lotico.ttl.gz (Batch:
203,665 / Avg: 296,559)
00:25:16 INFO  loader   :: Add: 6,500,000 lotico.ttl.gz (Batch:
189,393 / Avg: 284,190)

On another machine that sits in the Azure infrastructure somewhere,
tdbloader doesn't look as good; even with decent hardware it seems to die a
slow death of memory exhaustion at 16GB. It started off with 70kT/s and is
now down to 17kT/s and still going.

Lesson learned: big iron and big memory is the way to go with the Jena
tdbloaders.
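
For reference, a minimal log4j.properties that produces loader output in
the format shown above might look like the following. This is a sketch,
not the shipped jena-log4j.properties: only the logger name comes from the
reply quoted below; the rest is standard log4j 1.x configuration, with the
pattern reverse-engineered from the log lines.

    log4j.rootLogger=INFO, stdout
    log4j.appender.stdout=org.apache.log4j.ConsoleAppender
    log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
    log4j.appender.stdout.layout.ConversionPattern=%d{HH:mm:ss} %-5p %-15c{1} :: %m%n
    log4j.logger.org.apache.jena.tdb2.loader=INFO
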




On Fri, Jun 14, 2019 at 10:53 PM Andy Seaborne  wrote:

> These messages are logged (to logger "org.apache.jena.tdb2.loader") - do
> you have log4j.properties in the current working directory?
>
> Do you get any output?
>
> INFO  Loader = LoaderParallel
> INFO  Start: /home/afs/Datasets/BSBM/bsbm-5m.nt.gz
> INFO  Add: 500,000 bsbm-5m.nt.gz (Batch: 134,770 / Avg: 134,770)
> INFO  Add: 1,000,000 bsbm-5m.nt.gz (Batch: 189,753 / Avg: 157,604)
> INFO  Add: 1,500,000 bsbm-5m.nt.gz (Batch: 205,676 / Avg: 170,920)
> INFO  Add: 2,000,000 bsbm-5m.nt.gz (Batch: 204,248 / Avg: 178,189)
> INFO  Add: 2,500,000 bsbm-5m.nt.gz (Batch: 202,101 / Avg: 182,508)
> INFO  Add: 3,000,000 bsbm-5m.nt.gz (Batch: 206,953 / Avg: 186,173)
> INFO  Add: 3,500,000 bsbm-5m.nt.gz (Batch: 183,621 / Avg: 185,804)
> INFO  Add: 4,000,000 bsbm-5m.nt.gz (Batch: 151,423 / Avg: 180,676)
> INFO  Add: 4,500,000 bsbm-5m.nt.gz (Batch: 152,765 / Avg: 177,081)
> INFO  Add: 5,000,000 bsbm-5m.nt.gz (Batch: 158,881 / Avg: 175,076)
> INFOElapsed: 28.56 seconds [2019/06/14 22:51:37 BST]
> INFO  Finished: /home/afs/Datasets/BSBM/bsbm-5m.nt.gz: 5,000,599 tuples
> in 28.63s (Avg: 174,644)
> INFO  Finish - index SPO
> INFO  Finish - index POS
> INFO  Finish - index OSP
> INFO  Time = 35.572 seconds : Triples = 5,000,599 : Rate = 140,577 /s
>
>
> There is a pause after the first "Finished:" - at this point data input is
> finished; the index threads are still running and the pause comes from the
> flush to disk.
>
>  Andy
>
> On 14/06/2019 20:16, Marco Neumann wrote:
> > Let me fire up one of the big machines to see what I will get there.
> > Currently I have no info display during load with tdb2.tdbloader. If -v
> > is specified I get some extra info but no load info.
> >
> > On Fri, Jun 14, 2019 at 8:03 PM Andy Seaborne  wrote:
> >
> >>
> >>
> >> On 14/06/2019 18:13, Marco Neumann wrote:
> >>> I am collecting Jena loader benchmarks. If you have results please post
> >>> them directly.
> >>>
> >>> http://www.lotico.com/index.php/JENA_Loader_Benchmarks
> >>
> >> tdb2.tdbloader has variations controlled by --loader.
> >>
> >> --loader=
> >> Loader to use: 'basic', 'phased' (default), 'sequential', 'parallel' or
> >> 'light'
> >>
> >> "basic" is a super naive parser-add triple loop - it used if a loader
> >> can't cope with an already loaded database.
> >>
> >> "phased" is a balanced, does not saturate the machine loader. Some
> >> parallelism.
> >>
> >> "sequential" is the tdbloader algorithm for TDB2, more for reference.
> >>
>

Re: JENA Loader Benchmarks

2019-06-14 Thread Marco Neumann
Let me fire up one of the big machines to see what I will get there.
Currently I have no info display during load with tdb2.tdbloader. If -v is
specified I get some extra info but no load info.

On Fri, Jun 14, 2019 at 8:03 PM Andy Seaborne  wrote:

>
>
> On 14/06/2019 18:13, Marco Neumann wrote:
> > I am collecting Jena loader benchmarks. If you have results please post
> > them directly.
> >
> > http://www.lotico.com/index.php/JENA_Loader_Benchmarks
>
> tdb2.tdbloader has variations controlled by --loader.
>
> --loader=
> Loader to use: 'basic', 'phased' (default), 'sequential', 'parallel' or
> 'light'
>
> "basic" is a super naive parser-add triple loop - it used if a loader
> can't cope with an already loaded database.
>
> "phased" is a balanced, does not saturate the machine loader. Some
> parallelism.
>
> "sequential" is the tdbloader algorithm for TDB2, more for reference.
>
> "parallel" is as much parallelism as it wants. (5 for triples, more for
> quads)
>
> "light" is two threaded. Slightly ligther than "phased".
>
> See LoaderPlans.
>
> > On a Linux machine I am using "time" to collect data.
> >
> > Is there a flag on tdb2.tdbloader to report time and triples per second?
> >
> > I have noticed that storage space use for tdbloader2 is significantly
> > smaller on disk compared to tdbloader and tdb2.tdbloader. Is there a
> > straightforward explanation here?
> >
>


-- 


---
Marco Neumann
KONA


Re: JENA Loader Benchmarks

2019-06-14 Thread Marco Neumann
Nice, so basically for a read-only instance tdbloader2 is the way to go in
terms of disk space. Is there a trade-off for the fully packed B+Trees in
terms of performance?

On Fri, Jun 14, 2019 at 7:52 PM Andy Seaborne  wrote:

>
>
> On 14/06/2019 18:13, Marco Neumann wrote:
> > I am collecting Jena loader benchmarks. If you have results please post
> > them directly.
> >
> > http://www.lotico.com/index.php/JENA_Loader_Benchmarks
> >
> > On a Linux machine I am using "time" to collect data.
> >
> > Is there a flag on tdb2.tdbloader to report time and triples per second?
>
> It does (if time >1 second)
>
> ...
> INFO  Time = 11.755 seconds : Triples = 1,000,312 : Rate = 85,097 /s
>
> what do you see?
>
> >
> > I have noticed that storage space use for tdbloader2 is significantly
> > smaller on disk compared to tdbloader and tdb2.tdbloader. Is there a
> > straightforward explanation here?
>
> tdbloader2 creates fully packed B+Trees.
>
> However, as updates happen, these will grow to look more like the output
> of the other loaders as node splitting occurs.
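
(A rough back-of-the-envelope, added as an editorial sketch rather than
from the thread, using the standard B-tree result that pages filled by
random insertion settle at about ln 2 occupancy:

    packed size / incrementally built size  ~  ln 2  ~  0.69

so a freshly packed database can be expected to be roughly 30% smaller,
and to drift back toward the larger figure as updates split nodes.)
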
>
> >
>


-- 


---
Marco Neumann
KONA


JENA Loader Benchmarks

2019-06-14 Thread Marco Neumann
I am collecting Jena loader benchmarks. If you have results please post
them directly.

http://www.lotico.com/index.php/JENA_Loader_Benchmarks

On a Linux machine I am using "time" to collect data.

Is there a flag on tdb2.tdbloader to report time and triples per second?

I have noticed that storage space use for tdbloader2 is significantly
smaller on disk compared to tdbloader and tdb2.tdbloader. Is there a
straightforward explanation here?

-- 


---
Marco Neumann
KONA


Re: Query timeout

2019-06-10 Thread Marco Neumann
Your config works for me as well, with the correct queryTimeout value.

Are you perhaps picking up another config.ttl file, e.g. by way of
FUSEKI_CONF?
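
For comparison, the timeout can also be set from the Jena API, independent
of the Fuseki configuration. A minimal sketch using standard ARQ calls (the
empty dataset and the 1000 ms budget are placeholders only):

    import java.util.concurrent.TimeUnit;
    import org.apache.jena.query.*;

    // Global default, same intent as the ja:cxtValue "1000" context
    // setting in the config quoted below:
    ARQ.getContext().set(ARQ.queryTimeout, "1000");

    // Or per execution:
    Dataset ds = DatasetFactory.create();
    Query query = QueryFactory.create("ASK {}");
    try (QueryExecution qexec = QueryExecutionFactory.create(query, ds)) {
        qexec.setTimeout(1000, TimeUnit.MILLISECONDS);
        System.out.println(qexec.execAsk());
    }
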


On Mon, Jun 10, 2019 at 1:55 PM Mikael Pesonen 
wrote:

>
> Also tested without Jena text, just in case:
>
> SELECT * WHERE
> {
> GRAPH <http://www.yso.fi/onto/yso/>
>{
>   ?concept <http://www.w3.org/2004/02/skos/core#broader>* ?c2 .
>   ?concept <http://www.w3.org/2004/02/skos/core#prefLabel>
> ?prefLabel .
>}
> }
> limit 999
>
> YSO is downloadable here: https://finto.fi/yso/en/  at the end of the page.
>
>
>
> On 10/06/2019 15:20, Andy Seaborne wrote:
> > How are you testing it?
> >
> >
> https://github.com/apache/jena/blob/master/jena-fuseki2/examples/service-tdb1-mem.ttl
> >
> >
> > times out as expected for me.
> >
> > s-query --service http://localhost:3030/MEM
> >   'PREFIX afn: <http://jena.apache.org/ARQ/function#>
> >  ASK{ BIND (afn:wait(2000) AS ?X ) }'
> >
> >
> > Andy
> >
> > On 10/06/2019 13:08, Mikael Pesonen wrote:
> >>
> >> Just noticed the arq namespace is not defined, but it is a literal so
> >> it shouldn't matter?
> >
> > No
> >>
> >>
> >> @prefix :<http://localhost/jena_example/#>  .
> >> @prefix rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
> >> @prefix rdfs:<http://www.w3.org/2000/01/rdf-schema#>  .
> >> @prefix tdb:<http://jena.hpl.hp.com/2008/tdb#>  .
> >> @prefix ja:<http://jena.hpl.hp.com/2005/11/Assembler#>  .
> >> @prefix text:<http://jena.apache.org/text#>  .
> >> @prefix skos:<http://www.w3.org/2004/02/skos/core#>
> >> @prefix fuseki:<http://jena.apache.org/fuseki#>  .
> >> @prefix lsrm:<https://resource.lingsoft.fi/ns/resource_meta#> .
> >>
> >> ## Example of a TDB dataset and text index
> >> ## Initialize TDB
> >> [] ja:loadClass "org.apache.jena.tdb.TDB" .
> >> tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
> >> tdb:GraphTDBrdfs:subClassOf  ja:Model .
> >>
> >> ## Initialize text query
> >> [] ja:loadClass   "org.apache.jena.query.text.TextQuery" .
> >> # A TextDataset is a regular dataset with a text index.
> >> text:TextDataset  rdfs:subClassOf   ja:RDFDataset .
> >> # Lucene index
> >> text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .
> >>
> >>
> >> ## ---
> >>
> >>
> >> :text_dataset rdf:type text:TextDataset ;
> >>   text:dataset   :my_dataset ;
> >>   text:index <#indexLucene> ;
> >>   .
> >>
> >> # A TDB dataset used for RDF storage
> >> :my_dataset rdf:type  tdb:DatasetTDB ;
> >>   tdb:location "/home/text/tools/jena_data/" ;
> >>
> >> #tdb:unionDefaultGraph true ; # Optional
> >>   .
> >>
> >> # Text index description
> >> <#indexLucene> a text:TextIndexLucene ;
> >>   text:directory  ;
> >>   text:entityMap <#entMap> ;
> >>   text:storeValues true ;
> >>   text:analyzer [ a text:StandardAnalyzer ] ;
> >> # these mess up language search. why?
> >> # text:queryAnalyzer [ a text:KeywordAnalyzer ] ;
> >> # text:queryParser text:AnalyzingQueryParser ;
> >>   text:multilingualSupport true ;
> >>.
> >>
> >> <#entMap> a text:EntityMap ;
> >>   text:defaultField "prefLabel" ;
> >>   text:entityField  "uri" ;
> >>   text:uidField "uid" ;
> >>   text:langField"lang" ;
> >>   text:graphField   "graph" ;
> >>   text:map (
> >>[ text:field "prefLabel" ; text:predicate skos:prefLabel ]
> >>[ text:field "altLabel"  ; text:predicate skos:altLabel ]
> >>[ text:field "content"  ; text:predicate lsrm:content ]
> >>) .
> >>
> >>
> >> [] rdf:type fuseki:Server ;
> >> ja:context [ ja:cxtName "arq:queryTimeout" ;  ja:cxtValue "1000"
> >> ] ; # 1 sec for testing
> >> .
> >>
> >> <#service> rdf:type fuseki:Service ;
> >>   fuseki:name "/ds" ;  

Re: Query timeout

2019-06-10 Thread Marco Neumann
Gives me:

ERROR riot :: [line: 76, col: 1 ] Triples not terminated by DOT

Also, you could move the [] rdf:type fuseki:Server ; section to the top,
below the prefix section.

On Mon, Jun 10, 2019 at 1:09 PM Mikael Pesonen 
wrote:

>
> Just noticed the arq namespace is not defined, but it is a literal so it
> shouldn't matter?
>
>
> @prefix :<http://localhost/jena_example/#>  .
> @prefix rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>  .
> @prefix rdfs:<http://www.w3.org/2000/01/rdf-schema#>  .
> @prefix tdb:<http://jena.hpl.hp.com/2008/tdb#>  .
> @prefix ja:<http://jena.hpl.hp.com/2005/11/Assembler#>  .
> @prefix text:<http://jena.apache.org/text#>  .
> @prefix skos:<http://www.w3.org/2004/02/skos/core#>
> @prefix fuseki:<http://jena.apache.org/fuseki#>  .
> @prefix lsrm:<https://resource.lingsoft.fi/ns/resource_meta#> .
>
> ## Example of a TDB dataset and text index
> ## Initialize TDB
> [] ja:loadClass "org.apache.jena.tdb.TDB" .
> tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
> tdb:GraphTDBrdfs:subClassOf  ja:Model .
>
> ## Initialize text query
> [] ja:loadClass   "org.apache.jena.query.text.TextQuery" .
> # A TextDataset is a regular dataset with a text index.
> text:TextDataset  rdfs:subClassOf   ja:RDFDataset .
> # Lucene index
> text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .
>
>
> ## ---
>
>
> :text_dataset rdf:type text:TextDataset ;
>   text:dataset   :my_dataset ;
>   text:index <#indexLucene> ;
>   .
>
> # A TDB dataset used for RDF storage
> :my_dataset rdf:type  tdb:DatasetTDB ;
>   tdb:location "/home/text/tools/jena_data/" ;
>
> #tdb:unionDefaultGraph true ; # Optional
>   .
>
> # Text index description
> <#indexLucene> a text:TextIndexLucene ;
>   text:directory  ;
>   text:entityMap <#entMap> ;
>   text:storeValues true ;
>   text:analyzer [ a text:StandardAnalyzer ] ;
> # these mess up language search. why?
> # text:queryAnalyzer [ a text:KeywordAnalyzer ] ;
> # text:queryParser text:AnalyzingQueryParser ;
>   text:multilingualSupport true ;
>.
>
> <#entMap> a text:EntityMap ;
>   text:defaultField "prefLabel" ;
>   text:entityField  "uri" ;
>   text:uidField "uid" ;
>   text:langField"lang" ;
>   text:graphField   "graph" ;
>   text:map (
>[ text:field "prefLabel" ; text:predicate skos:prefLabel ]
>[ text:field "altLabel"  ; text:predicate skos:altLabel ]
>[ text:field "content"  ; text:predicate lsrm:content ]
>) .
>
>
> [] rdf:type fuseki:Server ;
> ja:context [ ja:cxtName "arq:queryTimeout" ;  ja:cxtValue "1000" ] ;
> # 1 sec for testing
> .
>
> <#service> rdf:type fuseki:Service ;
>   fuseki:name "/ds" ;   # http://host:port/ds-ro
>   fuseki:serviceQuery "query" ;    # SPARQL query service
>   fuseki:serviceQuery "sparql" ;   # SPARQL query service
>   fuseki:serviceUpdate"update" ;   # SPARQL update service
>   fuseki:serviceUpload"upload" ;   # Non-SPARQL upload
> service
>   fuseki:serviceReadWriteGraphStore "data" ; # SPARQL Graph
> store protocol (read and write)
>   fuseki:dataset   :text_dataset ;
>   .
>
>
>
> On 10/06/2019 15:03, Marco Neumann wrote:
> > Please post the entire config file here for reference.
> >
> > On Mon, Jun 10, 2019 at 12:49 PM Mikael Pesonen <
> mikael.peso...@lingsoft.fi>
> > wrote:
> >
> >> It was in the default config:
> >>
> >> @prefix :<http://localhost/jena_example/#>  .
> >> @prefix rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>  .
> >> @prefix rdfs:<http://www.w3.org/2000/01/rdf-schema#>  .
> >> @prefix tdb:<http://jena.hpl.hp.com/2008/tdb#>  .
> >> @prefix ja:<http://jena.hpl.hp.com/2005/11/Assembler#>  .
> >> @prefix text:<http://jena.apache.org/text#>  .
> >> @prefix skos:<http://www.w3.org/2004/02/skos/core#>
> >> @prefix fuseki:<http://jena.apache.org/fuseki#>  .
> >>
> >>
> >> On 10/06/2019 14:46, Marco Neumann wrote:
> >>> Did you set the ja: namespace?
> >>>
> >>> On Mon, Jun 10, 2019 at 11:04

Re: Query timeout

2019-06-10 Thread Marco Neumann
Please post the entire config file here for reference.

On Mon, Jun 10, 2019 at 12:49 PM Mikael Pesonen 
wrote:

>
> It was in the default config:
>
> @prefix :<http://localhost/jena_example/#>  .
> @prefix rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>  .
> @prefix rdfs:<http://www.w3.org/2000/01/rdf-schema#>  .
> @prefix tdb:<http://jena.hpl.hp.com/2008/tdb#>  .
> @prefix ja:<http://jena.hpl.hp.com/2005/11/Assembler#>  .
> @prefix text:<http://jena.apache.org/text#>  .
> @prefix skos:<http://www.w3.org/2004/02/skos/core#>
> @prefix fuseki:<http://jena.apache.org/fuseki#>  .
>
>
> On 10/06/2019 14:46, Marco Neumann wrote:
> > Did you set the ja: namespace?
> >
> > On Mon, Jun 10, 2019 at 11:04 AM Mikael Pesonen <
> mikael.peso...@lingsoft.fi>
> > wrote:
> >
> >> How do you set the query timeout? I've tried in config.ttl
> >>
> >> [] rdf:type fuseki:Server ;
> >>ja:context [ ja:cxtName "arq:queryTimeout" ;  ja:cxtValue "3000"
> ] ;
> >>
> >> and
> >>
> >> :my_dataset rdf:type  tdb:DatasetTDB ;
> >>ja:context [ ja:cxtName "arq:queryTimeout" ;  ja:cxtValue "3000"
> ] ;
> >>
> >> but they don't work.
> >>
> >>
> >> --
> >> Lingsoft - 30 years of Leading Language Management
> >>
> >> www.lingsoft.fi
> >>
> >> Speech Applications - Language Management - Translation - Reader's and
> >> Writer's Tools - Text Tools - E-books and M-books
> >>
> >> Mikael Pesonen
> >> System Engineer
> >>
> >> e-mail: mikael.peso...@lingsoft.fi
> >> Tel. +358 2 279 3300
> >>
> >> Time zone: GMT+2
> >>
> >> Helsinki Office
> >> Eteläranta 10
> >> FI-00130 Helsinki
> >> FINLAND
> >>
> >> Turku Office
> >> Kauppiaskatu 5 A
> >> FI-20100 Turku
> >> FINLAND
> >>
> >>
>
> --
> Lingsoft - 30 years of Leading Language Management
>
> www.lingsoft.fi
>
> Speech Applications - Language Management - Translation - Reader's and
> Writer's Tools - Text Tools - E-books and M-books
>
> Mikael Pesonen
> System Engineer
>
> e-mail: mikael.peso...@lingsoft.fi
> Tel. +358 2 279 3300
>
> Time zone: GMT+2
>
> Helsinki Office
> Eteläranta 10
> FI-00130 Helsinki
> FINLAND
>
> Turku Office
> Kauppiaskatu 5 A
> FI-20100 Turku
> FINLAND
>
>

-- 


---
Marco Neumann
KONA


Re: Query timeout

2019-06-10 Thread Marco Neumann
Did you set the ja: namespace?

On Mon, Jun 10, 2019 at 11:04 AM Mikael Pesonen 
wrote:

>
> How do you set the query timeout? I've tried in config.ttl
>
> [] rdf:type fuseki:Server ;
>   ja:context [ ja:cxtName "arq:queryTimeout" ;  ja:cxtValue "3000" ] ;
>
> and
>
> :my_dataset rdf:type  tdb:DatasetTDB ;
>   ja:context [ ja:cxtName "arq:queryTimeout" ;  ja:cxtValue "3000" ] ;
>
> but they don't work.
>
>
> --
> Lingsoft - 30 years of Leading Language Management
>
> www.lingsoft.fi
>
> Speech Applications - Language Management - Translation - Reader's and
> Writer's Tools - Text Tools - E-books and M-books
>
> Mikael Pesonen
> System Engineer
>
> e-mail: mikael.peso...@lingsoft.fi
> Tel. +358 2 279 3300
>
> Time zone: GMT+2
>
> Helsinki Office
> Eteläranta 10
> FI-00130 Helsinki
> FINLAND
>
> Turku Office
> Kauppiaskatu 5 A
> FI-20100 Turku
> FINLAND
>
>

-- 


---
Marco Neumann
KONA


Re: pattern matching and extraction function on strings in syntaxARQ

2019-04-24 Thread Marco Neumann
Great, simple is good here Lorenz. I tried $1 but was surprised to get
full strings bound to ?email on non-matches here as well. I had to
combine it with a filter of the same pattern to get the desired
results. So I now read the replacement semantics as: backreference $0
is the input string and $1 is the first match (following XPath and
XQuery Functions and Operators 3.1)?

The query now looks like this:

SELECT ?email
WHERE  { ?s ?p ?o
FILTER regex(STR(?s),".*/([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+)/.*","i")
BIND (REPLACE(STR(?s),".*/([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+)/.*","$1") AS ?email)
} GROUP BY ?email
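
A self-contained way to sanity-check that FILTER + REPLACE combination
from Java (a sketch; the resource URI and property below are made-up
placeholders, and the comment shows what the $1 group should capture):

    import org.apache.jena.query.*;
    import org.apache.jena.rdf.model.*;

    Model m = ModelFactory.createDefaultModel();
    m.add(m.createResource("http://example.org/imgtag/rdf/user@example.com/1234"),
          m.createProperty("http://example.org/p"), "o");

    String q = "SELECT ?email WHERE { ?s ?p ?o "
             + "FILTER regex(STR(?s), \".*/([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+)/.*\", \"i\") "
             + "BIND (REPLACE(STR(?s), \".*/([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+)/.*\", \"$1\") AS ?email) }";
    try (QueryExecution qe = QueryExecutionFactory.create(QueryFactory.create(q), m)) {
        ResultSetFormatter.out(qe.execSelect());   // expect: user@example.com
    }
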




On Wed, Apr 24, 2019 at 9:03 AM Lorenz B.
 wrote:
>
> Or maybe even simpler:
>
> BIND(REPLACE(STR(?url),".*/([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+)/.*","$1")
> AS ?email)
>
> >> BIND (REPLACE(STR(?s),"[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+","$0") AS ?email)
> > replaces the matching email address by the email address itself, so it's
> > the same as before.
> >
> > You need to replace everything else by the email address; replace is not
> > an "extract" function. You can try
> >
> > BIND
> > (REPLACE(STR(?url),"[a-zA-Z0-9/:._-]+/([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+)/[a-zA-Z0-9/._-]+","$1")
> > AS ?email)
> >
> > Note, I assume that email addresses are wrapped inside / characters.
> >
> >
> >> Very good Richard, thank you. I was working along these lines with the
> >> following:
> >>
> >> BIND (REPLACE(STR(?url),"[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+","$0") AS ?email)
> >>
> >> where ?url contains the match but binds the entire string again to ?email
> >>
> >> eg data:
> >>
> >> url = 
> >> http://www.imagesnippets.com/imgtag/rdf/apple97...@hotmail.com/1598550_10204479279247862_1280347905880818932_o
> >>
> >> query
> >>
> >> SELECT ?email
> >> WHERE  {
> >> ?s ?p ?o
> >> BIND (REPLACE(STR(?s),"[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+","$0") AS ?email)
> >> }
> >>
> >> On Tue, Apr 23, 2019 at 6:00 PM Richard Cyganiak  
> >> wrote:
> >>> Hi Marco,
> >>>
> >>>> On 23 Apr 2019, at 15:53, Marco Neumann  wrote:
> >>>>
> >>>> I think I'm familiar with functions on strings in SPARQL but as far as
> >>>> I can see there is nothing similar to a grep-like pattern matching and
> >>>> extraction on strings for SPARQL. Or is there one?
> >>> The replace function does pattern matching and allows extraction of 
> >>> matched sub-patterns:
> >>> https://www.w3.org/TR/sparql11-query/#func-replace 
> >>> <https://www.w3.org/TR/sparql11-query/#func-replace>
> >>> https://www.w3.org/TR/xpath-functions/#func-replace 
> >>> <https://www.w3.org/TR/xpath-functions/#func-replace>
> >>>
> >>> replace(input, pattern, replacement)
> >>>
> >>> The special “variables” $1, $2, $3, and so on can be used in the 
> >>> replacement string. They refer to parts of the input that were matched by 
> >>> the first, second, third, and so on pair of parentheses in the regex 
> >>> pattern. For example:
> >>>
> >>> replace("23 April 2019", "^([0-9][0-9])", "$1")
> >>>
> >>> would return "23" because that is the part of the input matched by the 
> >>> first (and only) pair of parentheses.
> >>>
> >>> Also useful might be Jena’s own apf:strSplit property function:
> >>> https://jena.apache.org/documentation/query/library-propfunc.html 
> >>> <https://jena.apache.org/documentation/query/library-propfunc.html>
> >>>
> >>> It can split a literal into multiple literals based on a regular 
> >>> expression.
> >>>
> >>> Taken together, these two functions can do a wide range of pattern 
> >>> matching and extraction tasks.
> >>>
> >>> Hope that helps,
> >>> Richard
> >>
> --
> Lorenz Bühmann
> AKSW group, University of Leipzig
> Group: http://aksw.org - semantic web research center
>


--


---
Marco Neumann
KONA


Re: pattern matching and extraction function on strings in syntaxARQ

2019-04-23 Thread Marco Neumann
Very good Richard, thank you. I was working along these lines with the following:

BIND (REPLACE(STR(?url),"[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+","$0") AS ?email)

where ?url contains the match but binds the entire string again to ?email

eg data:

url = 
http://www.imagesnippets.com/imgtag/rdf/apple97...@hotmail.com/1598550_10204479279247862_1280347905880818932_o

query

SELECT ?email
WHERE  {
?s ?p ?o
BIND (REPLACE(STR(?s),"[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+","$0") AS ?email)
}

On Tue, Apr 23, 2019 at 6:00 PM Richard Cyganiak  wrote:
>
> Hi Marco,
>
> > On 23 Apr 2019, at 15:53, Marco Neumann  wrote:
> >
> > I think I'm familiar with functions on strings in SPARQL but as far as
> > I can see there is nothing similar to a grep-like pattern matching and
> > extraction on strings for SPARQL. Or is there one?
>
> The replace function does pattern matching and allows extraction of matched 
> sub-patterns:
> https://www.w3.org/TR/sparql11-query/#func-replace 
> <https://www.w3.org/TR/sparql11-query/#func-replace>
> https://www.w3.org/TR/xpath-functions/#func-replace 
> <https://www.w3.org/TR/xpath-functions/#func-replace>
>
> replace(input, pattern, replacement)
>
> The special “variables” $1, $2, $3, and so on can be used in the replacement 
> string. They refer to parts of the input that were matched by the first, 
> second, third, and so on pair of parentheses in the regex pattern. For 
> example:
>
> replace("23 April 2019", "^([0-9][0-9])", "$1")
>
> would return "23" because that is the part of the input matched by the first 
> (and only) pair of parentheses.
>
> Also useful might be Jena’s own apf:strSplit property function:
> https://jena.apache.org/documentation/query/library-propfunc.html 
> <https://jena.apache.org/documentation/query/library-propfunc.html>
>
> It can split a literal into multiple literals based on a regular expression.
>
> Taken together, these two functions can do a wide range of pattern matching 
> and extraction tasks.
>
> Hope that helps,
> Richard



-- 


---
Marco Neumann
KONA

