RE: Minimizing data volume

2013-09-10 Thread Andy Turner
Two Geohashes can define a bounding box for a geometry and this can help as a 
first step for potential intersection operations...

The concept of Geohash can be generalised, so we can have different Geohashes 
depending on projection and that can help further...

I'm not sure if there is a standard for Geohashing, but the way it collapses 
dimensions can work for other things. It would certainly work for three spatial 
axes. It can also work for collapsing two points in time into a single value - 
something like a temporal bound.
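
To make the truncation idea concrete, here is a minimal Python sketch of the 
standard geohash encoding (interleaving longitude and latitude bits into base32 
characters). It is only an illustration of the prefix property, not a reference 
implementation:

    BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

    def geohash(lat, lon, precision=9):
        # Standard geohash: repeatedly bisect the longitude and latitude
        # ranges, emitting one bit per step, five bits per base32 character.
        lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
        result, ch, bits, even = [], 0, 0, True
        while len(result) < precision:
            rng, val = (lon_range, lon) if even else (lat_range, lat)
            mid = (rng[0] + rng[1]) / 2
            ch <<= 1
            if val >= mid:
                ch |= 1
                rng[0] = mid
            else:
                rng[1] = mid
            even, bits = not even, bits + 1
            if bits == 5:
                result.append(BASE32[ch])
                ch, bits = 0, 0
        return "".join(result)

    # Truncation keeps the shared prefix but widens the implied bounding box.
    print(geohash(57.64911, 10.40744))     # 'u4pruydqq'
    print(geohash(57.64911, 10.40744, 5))  # 'u4pru'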

Agreeing on a set of fixed, mandatory geo-metadata, which can include a further 
lookup to additional optional geo-metadata, is perhaps key.

For a complex geometry, if we can all agree on a standard way to represent it, 
then an MD5 (http://en.wikipedia.org/wiki/MD5) of this could be a workable key, 
and perhaps made even more workable if some organisation were to mint DOIs 
(http://en.wikipedia.org/wiki/Digital_object_identifier) for them...
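
A rough Python sketch of the hashing idea, where the whitespace/case 
normalisation below is only a stand-in for whatever canonical serialisation 
would actually be agreed:

    import hashlib

    def geometry_key(wkt):
        # Stand-in canonicalisation: collapse whitespace and upper-case the
        # text. A real agreement would also have to fix coordinate order,
        # precision, ring orientation, and so on.
        canonical = " ".join(wkt.upper().split())
        return hashlib.md5(canonical.encode("utf-8")).hexdigest()

    print(geometry_key("POINT (5.2913 52.1326)"))
    print(geometry_key("point  (5.2913   52.1326)"))  # same key, same geometry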

Andy
http://www.geog.leeds.ac.uk/people/a.turner/

From: Frans Knibbe | Geodan [mailto:frans.kni...@geodan.nl]
Sent: 10 September 2013 20:09
To: Andy Turner; public-lod community
Cc: Leigh Dodds
Subject: Re: Minimizing data volume

On 10-9-2013 13:33, Andy Turner wrote:
Hi,

At least these two OGC standards might be worth having a look at in this 
context:
http://www.opengeospatial.org/standards/geosparql
http://www.opengeospatial.org/standards/tjs

The latter is a Georeferenced Table Joining Service Implementation Standard. In 
its development a lot of thought went into different kinds of linking of 
geographical data. Sorry, but I know very little about the GeoSPARQL standard.

The notion of keeping geometry data separate and providing metadata about 
geometries in standard forms is useful. For vector data, the number of points 
in the geometry is one of the key attributes an application might consider 
before pulling that geometry. (The size of its representation in bytes - both 
compressed and uncompressed - is useful information too; thanks Leigh.)

So, for vector data, the attributes for individual vectors (almost like 
features) can be kept separate from the spatial geometries, and some linkage 
code can be used to join the data together. Yes, there are advantages in terms 
of storage organisation for keeping attributes and geometries separate, but for 
many applications some attributes of the geometries are also wanted, so this 
geometrical metadata is important to think about. Computationally some of it 
can be hard to calculate, so once calculated it is perhaps worth storing in 
optional metadata.

Individual points with a single attribute, where the point is defined with 
respect to axes in some geographical coordinate and projection system, are 
simple geo-vectors. Lines built from multiple such points (and equations) are 
more detailed/complex, yet these can have simply attributed, generalised point 
representations (the location of the smallest circle/sphere encompassing all 
the points in the line, perhaps with a measure of its radius). There are 
similar things for regional polygons in two and three dimensions.

The geometries of lines and points can be simplified in other ways, which can 
result in other lines and polygons. Simplifying contiguous polygons while 
maintaining topological relationships is not necessarily straightforward.

The point I am trying to make with the above is that there are multiple 
different geometries, not a single geometry, for a real world object that can 
be described/defined with RDF. Some of the more generalised forms of the 
spatial geometries can be calculated and stored as metadata in table 
representations with a fixed number of fields. Often so-called bounding boxes 
and bounding circles are used, as are line lengths, perimeters, surface areas, 
volumes, average distances, and ratios of these geometrical attributes. Based 
on the geometrical attributes, further attributes can be derived in combination 
with other attributes (e.g. density).

Consider something complex, like a city. This has multiple geometrical 
representations.

Two more things:

Geohashes (http://en.wikipedia.org/wiki/Geohash), which interleave coordinates 
represented by positions on axes using some predetermined axis order and 
prescription, are useful in the context of linking data: they are string 
representations, and the more truncated they are, the less precisely they 
locate a point, but the truncated and full forms start with the same string 
sequence.

The other key dimension to think about in geographical relations is time. How 
time relates to all this is important, but this email is already long, so all I 
will state is that a city now could be very different to a city some years ago 
(in terms of spatial dimension/geometry), yet in some ways they are the same 
place. There are ways to derive (very) complex geometries of ephemeral events; 
consider, for example, the Olympic Games.

HTH and sorry for the long post.
Hello Andy,

Thank you for the long post and for sharing your thoughts.

Hosting your WebID with WordPress

2013-09-10 Thread Angelo Veltens
Hi folks,

I am glad to announce version 0.3 of my WordPress plugin wp-linked-data

The new version allows you to host a WebID profile within your WordPress
blog. If you already have one, you may also link that one to your blog
account.

You may add a public key in your profile section, so that your
WordPress-hosted WebID can be used to authenticate on the web. It is
also possible to add custom RDF triples.

I hope you find this useful!

The plugin can be found in the plugin repository, and here:
http://wordpress.org/plugins/wp-linked-data/

Best regards,
Angelo



Re: Minimizing data volume

2013-09-10 Thread Rob Warren

Frans,

The nice thing about a SPARQL server is that you get what you ask for. If you 
want only the "Feature" without the geometry, you can do that. If you only want 
whatever centroid the Feature is linked to, you can do that. If you want 
everything, you can do that. At worst, you can 'count' the length of the 
literal or the number of points to give you an idea of the number of 
coordinates present.

I'm not completely happy with the OpenGIS literals myself, but realize that 
with basic SPARQL you can strip the coordinates down to the bare numerical 
information (no URIs) and send it in JSON to the client. Add to this 
transport-level compression (the web server's problem) and things are as fast 
as can be expected for any remote storage situation.
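
A small Python sketch of that stripping step (my own illustration, not anything 
prescribed by GeoSPARQL; it assumes a simple WKT literal without a leading CRS 
URI):

    import json
    import re

    def wkt_to_coords(wkt_literal):
        # Pull the bare numbers out of a simple WKT literal such as
        # "LINESTRING (4.89 52.37, 4.90 52.38, ...)" and pair them up.
        numbers = [float(n) for n in re.findall(r"-?\d+(?:\.\d+)?", wkt_literal)]
        return list(zip(numbers[::2], numbers[1::2]))

    wkt = "LINESTRING (4.89 52.37, 4.90 52.38, 4.91 52.36)"
    coords = wkt_to_coords(wkt)
    print(len(coords))         # cheap "how many coordinates?" count
    print(json.dumps(coords))  # plain JSON for the client, no URIs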

You will never compete with a local drive with a binary representation.

best,
rhw


On 2013-09-10, at 4:09 PM, Frans Knibbe | Geodan wrote:

> The problem that I see is how to handle those cases where geometry literals 
> become unwieldy. The GeoSPARQL specification that you mention provides a way 
> of writing a geometry as a literal in RDF. There may be several approaches as 
> to how to serialize a geometry, but ending up with a series of coordinates is 
> inescapable. And I am worried about the impact of these series of coordinates 
> becoming very long. That is why I also like the idea of providing some extra 
> data to enable a client to distinguish between large and small geometries. 
> The small ones could be downloaded and processed right away, but the bigger 
> ones might need some extra care.




Re: Minimizing data volume

2013-09-10 Thread Frans Knibbe | Geodan

On 10-9-2013 13:33, Andy Turner wrote:


Hi,

At least these two OGC standards might be worth having a look at in 
this context:


http://www.opengeospatial.org/standards/geosparql

http://www.opengeospatial.org/standards/tjs

The latter is a Georeferenced Table Joining Service Implementation 
Standard. In its development a lot of thought went into different kinds 
of linking of geographical data. Sorry, but I know very little about the 
GeoSPARQL standard.


The notion of keeping geometry data separate and providing metadata 
about geometries in standard forms is useful. For vector data, the 
number of points in the geometry is one of the key attributes an 
application might consider before pulling that geometry. (The size of 
its representation in bytes - both compressed and uncompressed - is 
useful information too; thanks Leigh.)


So, for vector data, the attributes for individual vectors (almost 
like features) can be kept separate from the spatial geometries, and 
some linkage code can be used to join the data together. Yes, there 
are advantages in terms of storage organisation for keeping attributes 
and geometries separate, but for many applications some attributes of 
the geometries are also wanted, so this geometrical metadata is 
important to think about. Computationally some of it can be hard to 
calculate, so once calculated it is perhaps worth storing in optional 
metadata.


Individual points with a single attribute, where the point is defined 
with respect to axes in some geographical coordinate and projection 
system, are simple geo-vectors. Lines built from multiple such points 
(and equations) are more detailed/complex, yet these can have simply 
attributed, generalised point representations (the location of the 
smallest circle/sphere encompassing all the points in the line, perhaps 
with a measure of its radius). There are similar things for regional 
polygons in two and three dimensions.


The geometries of lines and points can be simplified in other ways, 
which can result in other lines and polygons. Simplifying contiguous 
polygons while maintaining topological relationships is not 
necessarily straightforward.


The point I am trying to make with the above is that there are 
multiple different geometries, not a single geometry, for a real world 
object that can be described/defined with RDF. Some of the more 
generalised forms of the spatial geometries can be calculated and 
stored as metadata in table representations with a fixed number of 
fields. Often so-called bounding boxes and bounding circles are used, 
as are line lengths, perimeters, surface areas, volumes, average 
distances, and ratios of these geometrical attributes. Based on the 
geometrical attributes, further attributes can be derived in 
combination with other attributes (e.g. density).


Consider something complex, like a city. This has multiple geometrical 
representations.


Two more things:

Geohashes (http://en.wikipedia.org/wiki/Geohash), which interleave 
coordinates represented by positions on axes using some predetermined 
axis order and prescription, are useful in the context of linking data: 
they are string representations, and the more truncated they are, the 
less precisely they locate a point, but the truncated and full forms 
start with the same string sequence.


The other key dimension to think about in geographical relations is 
time. How time relates to all this is important, but this email is 
already long, so all I will state is that a city now could be very 
different to a city some years ago (in terms of spatial 
dimension/geometry), yet in some ways they are the same place. There 
are ways to derive (very) complex geometries of ephemeral events; 
consider, for example, the Olympic Games.


HTH and sorry for the long post.


Hello Andy,

Thank you for the long post and for sharing your thoughts.

Yes, I agree that any real world object can have many different 
geometries, depending on coordinate reference system, level of detail, 
time, method of measurement and whatnot. But I don't think that is a 
problem. Linked Data is very capable of sharing different perspectives 
of a single real world phenomenon, and also of annotating those 
different perspectives to help with correctly interpreting them.


The problem that I see is how to handle those cases where geometry 
literals become unwieldy. The GeoSPARQL specification that you mention 
provides a way of writing a geometry as a literal in RDF. There may be 
several approaches as to how to serialize a geometry, but ending up with 
a series of coordinates is inescapable. And I am worried about the impact 
of these series of coordinates becoming very long. That is why I also 
like the idea of providing some extra data to enable a client to 
distinguish between large and small geometries. The small ones could be 
downloaded and processed right away, but the bigger ones might need some 
extra care.


Thinking about this, I wonder if the idea of a general com

RE: Minimizing data volume

2013-09-10 Thread Andy Turner
Hi,

At least these two OGC standards might be worth having a look at in this 
context:
http://www.opengeospatial.org/standards/geosparql
http://www.opengeospatial.org/standards/tjs

The latter is a Georeferenced Table Joining Service Implementation Standard. In 
its development a lot of thought went into different kinds of linking of 
geographical data. Sorry, but I know very little about the GeoSPARQL standard.

The notion of keeping geometry data separate and providing metadata about 
geometries in standard forms is useful. For vector data, the number of points 
in the geometry is one of the key attributes an application might consider 
before pulling that geometry. (The size of its representation in bytes - both 
compressed and uncompressed - is useful information too; thanks Leigh.)
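
To illustrate, a small Python sketch of the kind of optional metadata record 
that could be computed once and stored alongside a geometry (the field names 
here are made up for the example):

    import zlib

    def geometry_summary(coords, wkt_literal):
        xs = [x for x, y in coords]
        ys = [y for x, y in coords]
        raw = wkt_literal.encode("utf-8")
        return {
            "point_count": len(coords),                    # how heavy is the geometry?
            "bbox": (min(xs), min(ys), max(xs), max(ys)),  # cheap spatial pre-filter
            "bytes_uncompressed": len(raw),
            "bytes_compressed": len(zlib.compress(raw)),   # rough transfer cost
        }

    coords = [(4.89, 52.37), (4.90, 52.38), (4.91, 52.36)]
    wkt = "LINESTRING (4.89 52.37, 4.90 52.38, 4.91 52.36)"
    print(geometry_summary(coords, wkt))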

So, for vector data, the attributes for individual vectors (almost like 
features) can be kept separate from the spatial geometries, and some linkage 
code can be used to join the data together. Yes, there are advantages in terms 
of storage organisation for keeping attributes and geometries separate, but for 
many applications some attributes of the geometries are also wanted, so this 
geometrical metadata is important to think about. Computationally some of it 
can be hard to calculate, so once calculated it is perhaps worth storing in 
optional metadata.

Individual points with a single attribute, where the point is defined with 
respect to axes in some geographical coordinate and projection system, are 
simple geo-vectors. Lines built from multiple such points (and equations) are 
more detailed/complex, yet these can have simply attributed, generalised point 
representations (the location of the smallest circle/sphere encompassing all 
the points in the line, perhaps with a measure of its radius). There are 
similar things for regional polygons in two and three dimensions.
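
As a crude illustration of such a generalised point representation, the Python 
sketch below takes the centroid of the vertices and the largest distance to it; 
note this gives an enclosing circle but not necessarily the smallest one (that 
would need something like Welzl's algorithm), and the radius is in coordinate 
units:

    from math import hypot

    def enclosing_circle(coords):
        # Centroid of the vertices plus the farthest vertex distance: a circle
        # that certainly contains every point, though its radius can exceed
        # that of the true minimum enclosing circle.
        cx = sum(x for x, y in coords) / len(coords)
        cy = sum(y for x, y in coords) / len(coords)
        radius = max(hypot(x - cx, y - cy) for x, y in coords)
        return (cx, cy), radius

    print(enclosing_circle([(0.0, 0.0), (2.0, 0.0), (1.0, 1.0)]))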

The geometries of lines and points can be simplified in other ways, which can 
result in other lines and polygons. Simplifying contiguous polygons while 
maintaining topological relationships is not necessarily straightforward.

The point I am trying to make with the above is that there are multiple 
different geometries, not a single geometry, for a real world object that can 
be described/defined with RDF. Some of the more generalised forms of the 
spatial geometries can be calculated and stored as metadata in table 
representations with a fixed number of fields. Often so-called bounding boxes 
and bounding circles are used, as are line lengths, perimeters, surface areas, 
volumes, average distances, and ratios of these geometrical attributes. Based 
on the geometrical attributes, further attributes can be derived in combination 
with other attributes (e.g. density).

Consider something complex, like a city. This has multiple geometrical 
representations.

Two more things:

Geohashes (http://en.wikipedia.org/wiki/Geohash), which interleave coordinates 
represented by positions on axes using some predetermined axis order and 
prescription, are useful in the context of linking data: they are string 
representations, and the more truncated they are, the less precisely they 
locate a point, but the truncated and full forms start with the same string 
sequence.

The other key dimension to think about in geographical relations is time. How 
time relates to all this is important, but this email is already long, so all I 
will state is that a city now could be very different to a city some years ago 
(in terms of spatial dimension/geometry), yet in some ways they are the same 
place. There are ways to derive (very) complex geometries of ephemeral events; 
consider, for example, the Olympic Games.

HTH and sorry for the long post.

Andy
http://www.geog.leeds.ac.uk/people/a.turner/

From: Frans Knibbe | Geodan [mailto:frans.kni...@geodan.nl]
Sent: 10 September 2013 11:11
To: Leigh Dodds
Cc: public-lod community
Subject: Re: Minimizing data volume

On 9-9-2013 16:48, Leigh Dodds wrote:

Hi,

Before using compression you might also make a decision about whether
you need to represent all of this information as RDF in the first
place.

For example, rather than include the large geometries as literals, why
not store them as separate documents and let clients fetch the
geometries when needed, rather than as part of a SPARQL query?

Geometries can be served using standard HTTP compression techniques
and will benefit from caching.

You can provide summary statistics (including size of the document,
and properties of the described area, e.g. centroids) in the RDF to
help address a few common requirements, allowing clients to only fetch
the geometries they need, as they need them.

This can greatly reduce the volume of data you have to store and
provides clients with more flexibility.

Cheers,

L.

Yes, that is something to consider. Thanks for broadening my mind! I think such 
an approach may be suited for certain kinds of high volume data, like images or 
vide

Re: Minimizing data volume

2013-09-10 Thread Frans Knibbe | Geodan

On 9-9-2013 16:48, Leigh Dodds wrote:

Hi,

Before using compression you might also make a decision about whether
you need to represent all of this information as RDF in the first
place.

For example, rather than include the large geometries as literals, why
not store them as separate documents and let clients fetch the
geometries when needed, rather than as part of a SPARQL query?

Geometries can be served using standard HTTP compression techniques
and will benefit from caching.

You can provide summary statistics (including size of the document,
and properties of the described area, e.g. centroids) in the RDF to
help address a few common requirements, allowing clients to only fetch
the geometries they need, as they need them.

This can greatly reduce the volume of data you have to store and
provides clients with more flexibility.

Cheers,

L.


Yes, that is something to consider. Thanks for broadening my mind! I 
think such an approach may be suited for certain kinds of high volume 
data, like images or video. But I do have some doubts about its 
effectiveness for geographical data:


1) In geographical data sets geometries typically have different sizes. 
Some may be very big, others may be reasonably small. So where to draw 
the limit?


2) When using SPARQL and RDF it is already possible to provide summary 
statistics and leave it to the client to fetch the geometries if needed. 
However, it is not standard practice to provide summaries like centroid, 
bounding box or coordinate count for each geometry; perhaps it should be. 
A small sketch of how a client might act on such summaries follows after 
this list.


3) On the surface, this approach seems to add complexity to data 
retrieval, for both clients and servers. Instead of one way of 
publishing and getting data, there will be two ways.


4) Having to fetch geometries one at a time, instead of processing them 
all from one data set, could complicate matters and also introduce some 
loss of performance. I can imagine this method working well for things 
like images, videos or files, because they are typically used one at a 
time. But in many cases geometries should be available all at once, to 
draw on a map for instance.


5) I think most geometries are stored as attribute data in relational 
databases. Preprocessing them to make them available as separate files 
can be done offline. But in other cases the geometries are transient; 
they could be generated by a function in a query. The method should work 
with performance gains in those cases too.
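
Regarding point 2, here is a minimal Python sketch of the selection step a 
client could perform; the summary figures and the threshold are made up, and 
in practice the counts would come back from a SPARQL query against the dataset:

    # Hypothetical summaries as returned by a query:
    # geometry IRI -> number of coordinates in its literal.
    summaries = {
        "http://example.org/geom/park": 120,
        "http://example.org/geom/coastline": 250000,
    }

    MAX_POINTS = 10000  # arbitrary cut-off for "download right away"

    for iri, n_points in summaries.items():
        if n_points <= MAX_POINTS:
            print("fetch full geometry now:", iri)
        else:
            print("defer, or fetch a simplified version:", iri)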



Regards,
Frans




On Mon, Sep 9, 2013 at 10:47 AM, Frans Knibbe | Geodan
 wrote:

Hello,

In my line of work (geographical information) I often deal with high volume
data. The high volume is caused by single facts having a big size. A single
2D or 3D geometry is often encoded as a single text string and can consist
of thousands of numbers (coordinates). It is easy to see that this can cause
performance issues with transferring and processing data. So I wonder about
the state of the art in minimizing data volume in Linked Data. I know that
careful publication of data will help a bit: multiple levels of detail could
be published, coordinates could be limited to significant digits (they almost
never are), but it seems to me that some kind of compression is needed too. Is
there something like a common approach to data compression at the moment?
Something that is understood by both publishers and consumers of data?

Regards,
Frans
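
As a rough, self-contained data point for the compression question above (a 
sketch only; real geometry literals compress differently depending on 
coordinate precision and repetition):

    import gzip
    import random

    # Fake a long coordinate string of the kind found in a big geometry literal.
    random.seed(0)
    coords = ", ".join(
        "%.6f %.6f" % (random.uniform(4, 5), random.uniform(52, 53))
        for _ in range(10000)
    )
    wkt = "LINESTRING (" + coords + ")"

    raw = wkt.encode("utf-8")
    packed = gzip.compress(raw)
    print(len(raw), "bytes raw,", len(packed), "bytes gzipped")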

--
Geodan
President Kennedylaan 1
1079 MB Amsterdam (NL)

T +31 (0)20 - 5711 347
E frans.kni...@geodan.nl
www.geodan.nl | disclaimer


--