On Tue, Dec 4, 2018 at 1:04 PM Greg Albiston <galbis...@mail.com> wrote:
> Hi Marco,
>
> 2. The GeoSPARQL-Fuseki application has some options for convenience in
> setting up the Fuseki server. It looks like the two minute delay is
> caused by applying RDFS inferencing to the dataset and then writing the
> results into the dataset (i.e. Jena operations). The GeoSPARQL schema has
> a class and property hierarchy that a user can apply to their dataset for
> some of the functionality. The inferencing is applied by default when
> loading a file, but also when connecting to a TDB, in case it hasn't
> been done manually by the user. The other potentially costly operation
> is creating "hasDefaultGeometry" properties, which is switched off by
> default.
>

ah OK that's good to know

> The following line should lead to quicker loading the second time.
>
> ./geosparql-fuseki --loopback false --tdb TDB1 --inference

this looks useful I will check it out tonight

> I could change the setup so that file loading applies inferencing by
> default and TDB does not, but I thought picking one would be better for
> consistent behaviour. Always true means less burden for users working
> out why they might have a problem after loading their dataset.
>
> There is probably a broader question as to how/if these options should
> be integrated in with Fuseki, whether it should be a separate
> application or they should be left out. I think they are useful to a
> user who is looking for a GeoSPARQL solution. Currently,
> GeoSPARQL-Fuseki is using the main/embedded server so doesn't have a GUI
> etc.
>
> 3. I get what you mean about the invalidity of the query now. The polygon
> is invalid because it is not closed. However, I'm unclear about how
> these errors and warnings are handled any different to if there was a
> SPARQL syntax error. A Query Parse Exception is thrown with full stack
> trace. The error highlights the specific problem while the warning shows
> the context of the error and stack trace. This made it easier to hunt
> down these kinds of problems when they could be coming from a query or
> the dataset. What would you be looking for instead?
>

it would be great if we could tie this more closely into the query processor
and have the query canceled on a spatial pre-processor error like the one
above and report something to the user. because now it seems to hit all
WKTs in the dataset before finishing up (of course ignoring LIMIT in the
sparql query) while the user waits with no further information, to be
finally presented with an empty results table.

Best,
Marco

>
> Thanks,
>
> Greg
>
> On 04/12/2018 12:01, Marco Neumann wrote:
> > comments inline
> >
> > On Mon, Dec 3, 2018 at 5:14 PM Greg Albiston <galbis...@mail.com> wrote:
> >
> >> Hi Marco,
> >>
> >> 1. As mentioned this shouldn't be too difficult to support.
> >>
> > indeed not difficult but needs a decision
> >
> > you could try with the following geonames dataset
> >
> > all-geonames_lotico.ttl.gz
> >
> >
> >> 2. Yes, the indexing, or rather caching, is in-memory, but it is
> >> on-demand. There shouldn't be any delay at start-up beyond what Jena
> >> needs to do. The cost comes during query execution. The key invariant
> >> data produced for solutions is retained for a short period of time (but
> >> can be configured to be retained until termination). Some regularly
> >> re-used info is always kept until termination (e.g. any spatial
> >> reference system transformation that has been requested).
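A minimal sketch of the on-demand, time-limited retention described in point 2 above. This is not the geosparql-jena code: it is written in Python for brevity, and `ExpiringCache`, `parse_point`, the `ttl_seconds` parameter and the choice not to refresh an entry's timestamp on a hit are all assumptions made for illustration. Nothing is computed at start-up; a value is parsed the first time it is requested and reused until it expires (or forever when `ttl_seconds` is `None`).

```python
import time


def parse_point(wkt):
    """Toy stand-in for de-serialising a geometry literal: 'POINT(x y)' -> (x, y)."""
    body = wkt.strip()[len("POINT("):-1]
    x, y = body.split()
    return (float(x), float(y))


class ExpiringCache:
    """On-demand cache; entries expire after ttl_seconds (None keeps them until exit)."""

    def __init__(self, loader, ttl_seconds=5.0, clock=time.monotonic):
        self._loader = loader      # called only on a miss or after expiry
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested
        self._entries = {}         # key -> (stored_at, value)
        self.loads = 0             # how many times the loader actually ran

    def get(self, key):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None:
            stored_at, value = hit
            if self._ttl is None or now - stored_at <= self._ttl:
                return value       # still retained: no de-serialisation needed
        value = self._loader(key)  # miss or expired: do the work on demand
        self.loads += 1
        self._entries[key] = (now, value)
        return value
```

With a filter that compares every solution against the same constant literal, only the first call pays the parsing cost while the entry is retained.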
> >>
> > the following will create and populate the TDB dataset
> >
> > ./geosparql-fuseki --loopback false --rdf_file ./lm.ttl --tdb TDB1
> >
> > I presume this message refers to the creation of the spatial cache /
> > index
> >
> > 6:05:46.685 INFO Applying GeoSPARQL Schema - Started
> > 6:07:44.826 INFO Applying GeoSPARQL Schema - Completed
> >
> > next time I can call TDB directly
> >
> > ./geosparql-fuseki --loopback false --tdb TDB1
> >
> > 6:08:38.665 INFO Applying GeoSPARQL Schema - Started
> > 6:10:18.661 INFO Applying GeoSPARQL Schema - Completed
> >
> > takes approximately 2m for a very small data set. the same fuseki with
> > tdb+jena-spatial restarts almost instantaneously even with reasonably
> > large data sets (see geonames).
> >
> >
> >> The main benefit of this is de-serialising geometry literals. The
> >> spatial relations arguments are between a pair of geometry literals, one
> >> of which is likely to be the same in the next solution, so keeping hold
> >> of both means in a lot of cases the de-serialisation can be avoided for
> >> one (and possibly both if still retained from a previous set of
> >> solutions).
> >>
> > might be a good idea to serialize the cache object of de-serialised
> > geometries to disk to speed up the boot process. maybe Andy could assist
> > or even align this with tdb
> >
> >
> >> The aim was to only do work that's needed, not do repeat work and to be
> >> generally quick (i.e. rely on JTS to be optimised for quick solutions
> >> between the geometry pairs and Jena to optimise queries). There are 24
> >> spatial relations and about half a dozen other functions so
> >> pre-computing every combination gets big quickly and produces data that
> >> users might not want/use.
> >>
> >> A rough check of most of the spatial relations only requires a bounding
> >> box intersection or type check, so negative results can be quickly
> >> discarded. I looked into caching and storing to file, but there just
> >> wasn't the benefit in my use case. It took longer to load up and then
> >> execute than to just execute from fresh and cache. Also, the spatial
> >> indexes implemented by JTS aren't designed/suited for the spatial
> >> relations. If there is a use-case that gets more benefit from
> >> pre-computing or storing between programme executions then I'm sure it
> >> can be adapted for, but in the context of GeoSPARQL this approach was
> >> effective.
> >>
> >> 3. If you could send me the dataset that causes these errors then I'll
> >> happily have a look into it.
> >>
> > you can use this simple list of point geometries here
> >
> > http://www.lotico.com/lm.ttl.gz
> >
> > this query will parse and execute
> >
> > PREFIX geo: <http://www.opengis.net/ont/geosparql#>
> > PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
> >
> > SELECT ?well
> > WHERE {
> > ?well <http://www.wikidata.org/entity/P625> ?geometry .
> > FILTER(geof:sfWithin(?geometry,"POLYGON((-10 50,2 50,2 55,-10 55,-10 50))"^^geo:wktLiteral))
> > } LIMIT 10
> >
> > this one will parse and fail
> >
> > PREFIX geo: <http://www.opengis.net/ont/geosparql#>
> > PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
> >
> > SELECT ?well
> > WHERE {
> > ?well <http://www.wikidata.org/entity/P625> ?geometry .
> > FILTER(geof:sfWithin(?geometry,"POLYGON((-10 50,2 50,2 55,-10 55,-10 51))"^^geo:wktLiteral))
> > } LIMIT 10
> >
> > warn/error messages
> >
> > 6:17:45.887 ERROR Points of LinearRing do not form a closed linestring - Illegal WKT literal: POLYGON((-10 50,2 50,2 55,-10 55,-10 51))
> > 6:17:45.887 WARN General exception in (<http://www.opengis.net/def/function/geosparql/sfWithin> ?geometry "POLYGON((-10 50,2 50,2 55,-10 55,-10 51))"^^<http://www.opengis.net/ont/geosparql#wktLiteral>)
> > org.apache.jena.datatypes.DatatypeFormatException: Points of LinearRing do not form a closed linestring - Illegal WKT literal: POLYGON((-10 50,2 50,2 55,-10 55,-10 51))
> >     at io.github.galbiston.geosparql_jena.implementation.datatype.WKTDatatype.parse(WKTDatatype.java:109)
> >     at io.github.galbiston.geosparql_jena.implementation.GeometryWrapper.extract(GeometryWrapper.java:905)
> >     at io.github.galbiston.geosparql_jena.implementation.GeometryWrapper.extract(GeometryWrapper.java:834)
> >     at io.github.galbiston.geosparql_jena.geof.topological.GenericFilterFunction.exec(GenericFilterFunction.java:57)
> >     at io.github.galbiston.geosparql_jena.geof.topological.GenericFilterFunction.exec(GenericFilterFunction.java:42)
> >     at org.apache.jena.sparql.function.FunctionBase2.exec(FunctionBase2.java:55)
> >     at org.apache.jena.sparql.function.FunctionBase.exec(FunctionBase.java:63)
> >     at org.apache.jena.sparql.expr.E_Function.evalSpecial(E_Function.java:89)
> >     at org.apache.jena.sparql.expr.ExprFunctionN.eval(ExprFunctionN.java:100)
> >     at org.apache.jena.sparql.expr.ExprNode.isSatisfied(ExprNode.java:41)
> >     at org.apache.jena.sparql.engine.iterator.QueryIterFilterExpr.accept(QueryIterFilterExpr.java:49)
> >     at org.apache.jena.sparql.engine.iterator.QueryIterProcessBinding.hasNextBinding(QueryIterProcessBinding.java:69)
> >     at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:114)
> >     at org.apache.jena.sparql.engine.iterator.QueryIterConvert.hasNextBinding(QueryIterConvert.java:58)
> >     at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:114)
> >     at org.apache.jena.sparql.engine.iterator.QueryIterSlice.hasNextBinding(QueryIterSlice.java:76)
> >     at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:114)
> >     at org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:39)
> >     at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:114)
> >     at org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:39)
> >     at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:114)
> >     at org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:39)
> >     at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:114)
> >     at org.apache.jena.sparql.engine.ResultSetStream.hasNext(ResultSetStream.java:74)
> >     at org.apache.jena.sparql.engine.ResultSetCheckCondition.hasNext(ResultSetCheckCondition.java:55)
> >     at org.apache.jena.fuseki.servlets.SPARQL_Query.executeQuery(SPARQL_Query.java:350)
> >     at org.apache.jena.fuseki.servlets.SPARQL_Query.execute(SPARQL_Query.java:288)
> >     at org.apache.jena.fuseki.servlets.SPARQL_Query.executeWithParameter(SPARQL_Query.java:242)
> >     at org.apache.jena.fuseki.servlets.SPARQL_Query.perform(SPARQL_Query.java:217)
> >     at org.apache.jena.fuseki.servlets.ActionService.executeLifecycle(ActionService.java:183)
> >     at org.apache.jena.fuseki.servlets.ActionService.execCommonWorker(ActionService.java:98)
> >     at org.apache.jena.fuseki.servlets.ActionBase.doCommon(ActionBase.java:74)
> >     at org.apache.jena.fuseki.servlets.FusekiFilter.doFilter(FusekiFilter.java:73)
> >     at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642)
> >     at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
> >     at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
> >     at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1340)
> >     at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
> >     at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
> >     at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
> >     at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1242)
> >     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
> >     at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
> >     at org.eclipse.jetty.server.Server.handle(Server.java:503)
> >     at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:364)
> >     at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
> >     at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
> >     at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
> >     at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
> >     at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765)
> >     at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)
> >     at java.base/java.lang.Thread.run(Thread.java:834)
> >
> >
> >> 4. The "geo:" prefix is the one used throughout the GeoSPARQL
> >> documentation, so has been used for consistency when needed.
> >> The code doesn't have a dependency on the "geo:" prefix, so there is no
> >> requirement on the user. It would probably cause more confusion to those
> >> following GeoSPARQL examples to not use the "geo:" prefix when
> >> necessary.
> >>
> >>
> > I know but it needs some discussion about re-purposing of prefixes here
> >
> >
> >> Thanks,
> >>
> >> Greg
> >>
> >> On 03/12/2018 15:46, Marco Neumann wrote:
> >>> Hi Greg, ok let's do it in the dev list first.
> >>>
> >>> 1. indeed the picking up of lat/long is a common if not the most common
> >>> use case for building a spatial index. last but not least to perform a
> >>> proximity search on 2D point geometries. (I know that the ogc recommends
> >>> a transformation path with a sparql query to turn lat / long into a WKT
> >>> geometry datatype; maybe we could provide this as a convenient option
> >>> with the release)
> >>>
> >>> 2. as far as I can see the spatial index in geosparql-jena is memory
> >>> based. it creates additional load time during server startup. Am I
> >>> missing something here, is there a file based spatial index as well?
> >>>
> >>> 3. error handling is disruptive. since we are hitting the spatial index
> >>> first during query execution I am seeing a number of unpleasant side
> >>> effects with syntactically correct sparql but semantically incorrect
> >>> spatial queries. e.g.
> >>>
> >>> PREFIX geo: <http://www.opengis.net/ont/geosparql#>
> >>> PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
> >>>
> >>> SELECT ?well
> >>> WHERE {
> >>> ?well <http://www.wikidata.org/entity/P625> ?geometry .
> >>> FILTER(geof:sfWithin(?geometry,"POLYGON((-77 38,-77 0,0 38,0 0,0 0))"^^geo:wktLiteral))
> >>> } LIMIT 10
> >>>
> >>> 4. The re-use of the geo: prefix really isn't your problem I know but it
> >>> will create confusion. Wouldn't geosparql: be a better prefix for this?
> >>> Is the OGC now married to this prefix? It used to be
> >>> http://www.w3.org/2003/01/geo/wgs84_pos#
> >>>
> >>> and there is more to come...
> >>>
> >>> again thank you for working on this with your team Greg, much
> >>> appreciated.
> >>>
> >>> On Mon, Dec 3, 2018 at 2:15 PM Greg Albiston <galbis...@mail.com> wrote:
> >>>
> >>>> Hi Marco,
> >>>>
> >>>> I've had a look at the documentation for Jena Spatial and it would seem
> >>>> the main data change is the use of the Lat/Lon pairs.
> >>>> This doesn't comply with the GeoSPARQL standard so support for this
> >>>> would be a Jena extension.
> >>>>
> >>>> This could be accommodated by a property function to convert to a WKT
> >>>> Point literal with WGS84/CRS84 spatial reference.
> >>>> Users would then be able to use the result in a query for any of the
> >>>> GeoSPARQL functions.
> >>>>
> >>>> Alternatively, the spatial relations could all have an extra property
> >>>> function defined, provide the conversion and hand over to the GeoSPARQL
> >>>> equivalent property function. This wouldn't take long to integrate as
> >>>> individual spatial relation property functions are very minimal.
> >>>>
> >>>> The other item that jumps out is the Jena spatial property functions.
> >>>>
> >>>> spatial:nearby, spatial:withinCircle, spatial:withinBox and
> >>>> spatial:intersectBox all seem to be variations of Simple Features
> >>>> spatial relations that are covered by GeoSPARQL. These property
> >>>> functions can be incorporated for backward compatibility but it's
> >>>> whether these should just be offered as the current Lat/Lon pairs or
> >>>> expanded to accept geometry literals (i.e. WKT, GML etc.)? The latter
> >>>> option shouldn't be hard to provide for the same reason as above.
> >>>>
> >>>> spatial:north, spatial:south, spatial:west and spatial:east are not in
> >>>> GeoSPARQL. Again it's a question of whether these should be provided
> >>>> more generally for WKT, GML geometry literals? There might need to be a
> >>>> bit of extra work handling both geographic and planar spatial reference
> >>>> systems, as Jena Spatial only handles a single spatial reference system.
> >>>>
> >>>> I don't think it would be too difficult to support the existing Jena
> >>>> Spatial functionality, at least based on the webpage
> >>>> (https://jena.apache.org/documentation/query/spatial-query.html), as an
> >>>> extension to what is provided by GeoSPARQL.
> >>>>
> >>>> Is there anything else that you were concerned about?
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Greg
> >>>>
> >>>>
> >>>> On 03/12/2018 10:53, Marco Neumann wrote:
> >>>>> so I've had a look at this and while I think geosparql-jena is a very
> >>>>> welcome contribution to the jena project I don't think we should rush
> >>>>> with the retirement of jena-spatial at this point as Greg's approach
> >>>>> will require users to make changes to their existing data.
> >>>>>
> >>>>> I will engage Greg on us...@jena.apache.org again to clarify a few
> >>>>> things and hopefully get more people involved in this conversation
> >>>>> around spatial, geosparql and jena.
> >>>>>
> >>>>>
> >>>>> On Fri, Nov 30, 2018 at 1:23 PM Marco Neumann <marco.neum...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> how quickly can you hook geosparql into the release?
> >>>>>>
> >>>>>> this would make lucene spatial obsolete in the next release. has Greg
> >>>>>> released performance benchmarks for his implementation? as I said I
> >>>>>> will take a look at it over the weekend when time permits.
> >>>>>>
> >>>>>> On Fri, Nov 30, 2018 at 11:02 AM Andy Seaborne <a...@apache.org> wrote:
> >>>>>>> We could retire jena-spatial immediately after 3.10.0 - given the
> >>>>>>> Lucene change that might be smoother, one release with updated
> >>>>>>> dependencies.
> >>>>>>>
> >>>>>>> If that is the way forward, I think it is (mildly) better to take it
> >>>>>>> out of the Fuseki/Full build in 3.10.0.
> >>>>>>>
> >>>>>>> Andy
> >>>>>>>
> >>>>>>> On 29/11/2018 17:00, Marco Neumann wrote:
> >>>>>>>> I will have to look into that I guess since I am a frequent user of
> >>>>>>>> spatial data.
> >>>>>>>>
> >>>>>>>> why not go to 7.5? was there an incompatibility?
> >>>>>>>>
> >>>>>>>> On Thu 29. Nov 2018 at 16:53, Andy Seaborne <a...@apache.org> wrote:
> >>>>>>>>> Jena 3.10.0 would be around the end of the year. I'd like to make
> >>>>>>>>> use of Greg's GeoSPARQL project the "headline" item for the release
> >>>>>>>>> and to retire jena-spatial in 3.10.0 as an indication of this.
> >>>>>>>>>
> >>>>>>>>> Because retirement is a new process for the project, I'm sending
> >>>>>>>>> this first 3.10.0 message quite early to give us discussion time.
> >>>>>>>>>
> >>>>>>>>> == Retirements
> >>>>>>>>>
> >>>>>>>>> We have talked about this before but not actually done anything.
> >>>>>>>>> See separate thread for discussion on retirement process and for
> >>>>>>>>> the first modules:
> >>>>>>>>>
> >>>>>>>>> jena-spatial
> >>>>>>>>> jena-fuseki1
> >>>>>>>>> jena-csv
> >>>>>>>>>
> >>>>>>>>> == Headlines
> >>>>>>>>>
> >>>>>>>>> JENA-664 : GeoSPARQL support
> >>>>>>>>>
> >>>>>>>>> I'd like to make use of Greg's GeoSPARQL project the "headline"
> >>>>>>>>> item for the release and to retire jena-spatial in 3.10.0 as an
> >>>>>>>>> indication of this.
> >>>>>>>>>
> >>>>>>>>> JENA-1621 : Lucene upgrade to 7.4
> >>>>>>>>> May need to reload lucene indexes.
> >>>>>>>>> (e.g. the lucene index was created originally with Lucene v5.x
> >>>>>>>>> (prior to Jena 3.3.0)). See Lucene upgrade tool.
> >>>>>>>>> https://lucene.apache.org/solr/guide/7_4/indexupgrader-tool.html
> >>>>>>>>>
> >>>>>>>>> JENA-1623 : Fuseki security
> >>>>>>>>> JENA-1627 : HTTP support
> >>>>>>>>> https://issues.apache.org/jira/browse/JENA-1623
> >>>>>>>>> http://jena.staging.apache.org/documentation/fuseki2/data-access-control
> >>>>>>>>>
> >>>>>>>>> == JIRA:
> >>>>>>>>>
> >>>>>>>>> 31 currently.
> >>>>>>>>>
> >>>>>>>>> https://s.apache.org/jena-3.10.0-jira
> >>>>>>>>>
> >>>>>>>>> == Updates
> >>>>>>>>>
> >>>>>>>>> Only plugins. JENA-1624
> >>>>>>>>>
> >>>>>>>>> surefire : 2.21.0 -> 2.22.1 (+ SUREFIRE-1588)
> >>>>>>>>> compiler : 3.7.0 -> 3.8.0
> >>>>>>>>> shade : 3.1.0 -> 3.2.0
> >>>>>>>>>
> >>>>>>>>> Andy
> >>>>>>>>>
> >>>>>> --
> >>>>>>
> >>>>>>
> >>>>>> ---
> >>>>>> Marco Neumann
> >>>>>> KONA
> >>>>>>
> >
>

--

---
Marco Neumann
KONA
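
On the request at the top of this thread (cancel the query on a spatial pre-processing error instead of hitting every WKT in the dataset), here is a rough sketch of the idea. It is not Jena or geosparql-jena code: it is plain Python, and `ring_error` and `filter_within` are hypothetical names. It assumes the polygon is a constant in the FILTER, has a single ring, and uses 2D coordinates. The constant literal is checked once, before any row is read, so an unclosed ring is reported immediately.

```python
import re

# Matches only a single-ring polygon such as "POLYGON((x y, x y, ...))".
_SINGLE_RING = re.compile(r"^\s*POLYGON\s*\(\(([^()]*)\)\)\s*$", re.IGNORECASE)


def ring_error(wkt):
    """Return a message if the ring is malformed or not closed, else None."""
    match = _SINGLE_RING.match(wkt)
    if match is None:
        return "Unsupported or malformed WKT literal: " + wkt
    try:
        points = [tuple(float(v) for v in pair.split())
                  for pair in match.group(1).split(",")]
    except ValueError:
        return "Non-numeric coordinate in WKT literal: " + wkt
    if any(len(point) != 2 for point in points):
        return "Expected 2D coordinates in WKT literal: " + wkt
    if points[0] != points[-1]:
        # Same condition the server log reports: first and last point differ.
        return "Points of LinearRing do not form a closed linestring: " + wkt
    return None


def filter_within(rows, polygon_wkt, within):
    """Validate the constant polygon once, then test each row against it."""
    error = ring_error(polygon_wkt)
    if error is not None:
        # Fail before the first row is read, so the caller can report it at once.
        raise ValueError(error)
    return [row for row in rows if within(row, polygon_wkt)]
```

With the two polygons from the queries above, the ring ending in `-10 50` passes and the one ending in `-10 51` raises before `within` is ever called.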