I'm pleased to announce that Ying Jiang has had the jena-spatial project
[2] accepted into the Google Summer of Code.
Please welcome Ying Jiang [1].
Andy
[1] Intro. http://s.apache.org/cgS
[2] Submitted project description:
Background
GeoSPARQL [1] is a complete approach to spatial query but it is
complicated and directed more towards the specialist, not the average
linked data developer with SPARQL knowledge. In fact, not all spatial
queries are complicated. The sweet spot is something simpler (and less
capable) than GeoSPARQL, because the average web developer, even if
writing SPARQL, isn't looking for a complete solution to all geospatial
use cases. They are looking for something easier (= smaller). Many use
cases are covered by a single facility, such as getting all objects
within a given radius or within a given bounding box. For example, this
query finds the places within 10 kilometers of Bristol, UK (which has a
latitude/longitude of 51.46, -2.6).
SELECT ?placeName
{
    ?place spatial:query (51.46 -2.6 10) .
    ?place rdfs:label ?placeName
}
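As a rough illustration of what a radius query like the one above has to decide for each candidate place, here is a minimal sketch in Java. It is not the jena-spatial or Lucene implementation: the class RadiusFilter and its methods are hypothetical names, and it assumes a spherical Earth with a mean radius of 6371 km (the haversine formula).

```java
/** Hypothetical helper for illustration; not part of Jena or Lucene. */
final class RadiusFilter {
    /** Assumed mean Earth radius in kilometers (spherical approximation). */
    static final double EARTH_RADIUS_KM = 6371.0;

    /** Great-circle distance in kilometers between two lat/lon points given in degrees. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        // Clamp to 1.0 so rounding error can never push asin() out of its domain.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(Math.min(1.0, a)));
    }

    /** True if the point lies within radiusKm of the given centre. */
    static boolean withinRadius(double lat, double lon,
                                double centreLat, double centreLon, double radiusKm) {
        return haversineKm(lat, lon, centreLat, centreLon) <= radiusKm;
    }
}
```

A place at (lat, lon) would match a 10 km query when withinRadius(lat, lon, centreLat, centreLon, 10.0) returns true. A real spatial index avoids testing every place this way; the sketch only shows the per-point check.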
My mentor has developed GeoARQ [3], an initial and experimental
implementation that serves as a proof of concept. It uses Lucene's
spatial capabilities to provide a spatial property function for ARQ.
However, GeoARQ is not formally part of the Jena project, its internal
design is quite old, and it does not work well with RDF datasets and
update; it also lacks the assembler support needed to integrate with
Fuseki. To resolve these problems, I will develop an extension to Jena
ARQ, called jena-spatial, which exploits the spatial capabilities of
Lucene to create a fully integrated capability that we can add to the
main download when it's stable. Jena provides a similar property
function architecture in jena-text [2], so this project is to take that
concept and apply it to geospatial information.
Project Scope and Approach
I have been engaging with the Jena community by email over the past few
weeks, and I believe I understand the needs of the project and the
commitments I am making to my mentor. The project scope and the approach
for each part, summarized from discussions with my mentor, are as follows.
1. Spatial Data Indexer
First, I'll design and develop a module for indexing spatial
information. The indexer reads spatial data from a Jena Model and its
Statements and transforms it into Lucene Documents. The indexing process
is controlled by startIndexing(), finishIndexing(), abortIndexing()
and close(). A command line tool (extending CmdARQ [4]) should be
provided for reading spatial datasets from an assembler description and
indexing them, taking arguments like “[--desc | --dataset] assemblerPath”.
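To make the indexing lifecycle above more concrete, here is a minimal sketch of how the four control methods could fit together. Only the method names startIndexing(), finishIndexing(), abortIndexing() and close() come from the proposal; the class InMemorySpatialIndexer, the addPoint() and size() methods, and the commit-on-finish and discard-on-abort behaviour are assumptions made for illustration. A list of strings stands in for Lucene Documents.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory stand-in for the planned indexer; not jena-spatial code. */
final class InMemorySpatialIndexer implements AutoCloseable {
    private final List<String> committed = new ArrayList<>(); // entries visible to queries
    private final List<String> pending = new ArrayList<>();   // entries of the current run
    private boolean indexing = false;
    private boolean closed = false;

    void startIndexing() {
        ensureOpen();
        if (indexing) throw new IllegalStateException("indexing already started");
        indexing = true;
    }

    /** Assumed entry point: buffer one point until the run is finished or aborted. */
    void addPoint(String uri, double lat, double lon) {
        ensureOpen();
        if (!indexing) throw new IllegalStateException("call startIndexing() first");
        pending.add(uri + " " + lat + " " + lon);
    }

    /** Assumed semantics: make everything added in this run visible. */
    void finishIndexing() {
        ensureOpen();
        committed.addAll(pending);
        pending.clear();
        indexing = false;
    }

    /** Assumed semantics: throw away everything added in this run. */
    void abortIndexing() {
        ensureOpen();
        pending.clear();
        indexing = false;
    }

    @Override
    public void close() {
        pending.clear();
        indexing = false;
        closed = true;
    }

    /** Number of committed entries. */
    int size() {
        return committed.size();
    }

    private void ensureOpen() {
        if (closed) throw new IllegalStateException("indexer is closed");
    }
}
```

In this sketch a caller wraps a batch of addPoint() calls between startIndexing() and finishIndexing(), and calls abortIndexing() instead if reading the dataset fails part way through.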
2. Spatial Property Functions
What kinds of spatial property functions should be developed in this
project? GeoSPARQL seems to be a complete solution, but it is not
necessary to implement full GeoSPARQL, which is too complicated for
non-specialists. Geospatial modelling stretches to much more complicated
relationships, but a lot of useful things can be done with less than the
full geospatial model, which the average web developer cannot take on
board in their limited time. For example, there is a simple vocabulary
for expressing WGS84 information; there is also point information in
WKT; and there is a spatial relations ontology for reference.
On the other hand, I've studied Lucene spatial to work out which
spatial relations can be implemented. The current release, Lucene
4.2.1, provides a high-level abstraction for spatial queries.
- SpatialOperation : compares a stored geometry to a supplied geometry,
with operations such as "IsWithin" and "Intersects". In practice, not
all of them are supported.
- SpatialStrategy : encapsulates an approach to indexing and searching
based on shapes. Different implementations support different
features. There are currently 3 implementations: PointVectorStrategy,
RecursivePrefixTreeStrategy and TermQueryPrefixTreeStrategy. For
example, PointVectorStrategy supports only "IsWithin" for Rectangle or
Circle, while RecursivePrefixTreeStrategy can calculate "Intersects" for
any kind of Shape.
I'd like to make full use of Lucene so that jena-spatial supports as
many spatial relationships as possible. I can also create new
SpatialOperations, like "Northing/Westing", that are not available in
Lucene.
To sum up, here are the jena-spatial property functions that I can
implement this summer:
2.1 ?A -> within -> B
A: Point Var
B: Rectangle or Circle
Approach: PointVectorStrategy directly supports this
Note: a circle can be specified in the query as (point, radius). We can
distinguish Rectangle and Circle in this way:
?point :withinCircle ( x y r ) .
?point :withinBox ( x1 y1 x2 y2 ) .
2.2 A -> nearby -> ?B, C
A: Point
B: Point Var
C: radius
Approach: RecursivePrefixTreeStrategy evaluates
SpatialOperation.Intersects against the Circle centered on A with
radius C
2.3 A -> intersects -> ?B
A: any Shape
B: non-Point Shape Var
Approach: RecursivePrefixTreeStrategy directly supports this (note: not
tested)
2.4 A -> intersects -> ?B
A: any Shape
B: Point Var
Approach: TermQueryPrefixTreeStrategy directly supports this
2.5 A -> northing/southing/easting/westing -> B
A: Point
B: Point
Approach: use a range query, or have RecursivePrefixTreeStrategy evaluate
SpatialOperation.Intersects against the north/south/east/west Rectangle
2.6 A -> disjoint -> B
A: Rectangle
B: Rectangle
Approach: PointVectorStrategy directly supports this (note: it doesn't
handle dateline crossing).
From the users' point of view, the most common use case is to query all
places within a box or circle centered on a given point. Therefore, I'd
like to focus on 2.1 and 2.2, with the others marked as optional, to be
implemented if time permits.
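The rectangle-based items above (the box case of 2.1, the directional queries of 2.5 and the disjoint test of 2.6) can be sketched with plain coordinate comparisons. The following Java is only an illustration of the geometry, not of how Lucene's strategies evaluate these operations: the class LatLonBox and all of its members are hypothetical, and, like the note on 2.6, it assumes boxes that do not cross the dateline.

```java
/** Hypothetical latitude/longitude rectangle in degrees; not a Lucene or Jena class. */
final class LatLonBox {
    final double minLat, minLon, maxLat, maxLon;

    LatLonBox(double minLat, double minLon, double maxLat, double maxLon) {
        if (minLat > maxLat || minLon > maxLon) {
            // Boxes crossing the dateline would need minLon > maxLon; not handled here.
            throw new IllegalArgumentException("min must not exceed max");
        }
        this.minLat = minLat;
        this.minLon = minLon;
        this.maxLat = maxLat;
        this.maxLon = maxLon;
    }

    /** 2.1-style test: is the point inside this box or on its edge? */
    boolean contains(double lat, double lon) {
        return lat >= minLat && lat <= maxLat && lon >= minLon && lon <= maxLon;
    }

    /** 2.5-style helper: the box covering everything at or north of the given latitude. */
    static LatLonBox northOf(double lat) {
        return new LatLonBox(lat, -180.0, 90.0, 180.0);
    }

    /** 2.6-style test: true when the two boxes share no point, not even an edge. */
    boolean disjoint(LatLonBox other) {
        return other.minLat > maxLat || other.maxLat < minLat
            || other.minLon > maxLon || other.maxLon < minLon;
    }
}
```

Under these assumptions, a "northing" query for a point A reduces to testing candidates against LatLonBox.northOf(latitudeOfA), which mirrors the idea of intersecting with a north Rectangle; southing, easting and westing would use the corresponding half-world boxes.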
3. Jena Assembler Configuration with Fuseki Integration
I'll provide a way to describe the Lucene spatial index in configuration
using a Jena assembler description. For example, the user can define a
field that maps a property to a spatial index field. The Fuseki
configuration simply points to the spatial dataset as the fuseki:dataset
of the service. It should be possible to have a Fuseki server with
spatial support by simply adding it to the build/classpath.