Hi,

As I wrote, my motivation for the first mail was that I have seen people quite 
often using GeoJSON for delivering geospatial data as raw data, to be saved on 
disk and used like shapefiles, GML etc. As a result you get stuff like this:

http://demo.opengeo.org/geoserver/wfs?service=wfs&version=1.0.0&request=getfeature&typename=topp:states&outputformat=application/json


You wrote, and I agree, "that XML and JSON have very different strengths and 
use cases". However, people do what they want, and I do feel that GeoJSON will 
be used for use cases where XML could be stronger, for example as the only 
supported format in some download services.


About the nonsensical 4-field schema: it is a little bit violent, but it is 
just what nearly everybody who uses OpenStreetMap data is doing all the time. 
OSM features are pushed into the traditional simple feature model and a set of 
tags is converted to attributes in a fixed schema. There are lots of null 
fields in the data, and even if that is in a way nonsensical, it is also 
practical because it makes it possible to use osm2pgsql, PostGIS and Mapnik 
for rendering.


I am so fixated on consuming data that I was not thinking at all about how to 
write GeoJSON with GDAL. I was just thinking about how users could convert 
data that is only available as GeoJSON into PostGIS etc. so that the data 
types of the attributes stay the same as in the original data.


Because GeoJSON does not carry the data types as a payload, I suppose that the 
current guess-the-datatype approach is the best starting point. The workaround 
of using VRT, as Even suggested, is good for fine tuning, and a cast with SQL 
works as well. The correct datatypes may still be somewhat uncertain, but 
perhaps those who maintain such services will announce the structure of their 
data on their web pages if they feel that it is important, for example if they 
are expecting data updates from users. When it comes to WFS, it seems to be an 
easy case because the XML schema can be reused as a "GeoJSON schema".
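
As a rough sketch of that last point, the element declarations from a 
DescribeFeatureType response could be turned into a field list. The mapping 
table, the function name schema_from_xsd and the sample text below are 
assumptions made for illustration, not something GDAL or GeoServer provides 
in this form, and the pattern assumes that "name" is directly followed by 
"type" in each declaration.

```python
import re

# Assumed mapping from XSD type names to OGR-style field type names.
XSD_TO_OGR = {
    "xsd:string": "String",
    "xsd:double": "Real",
    "xsd:int": "Integer",
    "xsd:date": "Date",
    "xsd:dateTime": "DateTime",
}

# Matches name="..." immediately followed by type="...".
DECLARATION = re.compile(r'name="([^"]+)"\s+type="([^"]+)"')


def schema_from_xsd(text):
    """Return {field name: OGR-style type name} for the declared attributes."""
    schema = {}
    for name, xsd_type in DECLARATION.findall(text):
        if xsd_type.startswith("gml:"):
            continue  # geometry property, not an attribute field
        # Unknown XSD types fall back to String (an assumption of this sketch).
        schema[name] = XSD_TO_OGR.get(xsd_type, "String")
    return schema


# Hypothetical sample; the <xsd:element ...> wrapper is assumed.
sample = """
<xsd:element name="the_geom" type="gml:MultiPolygonPropertyType"/>
<xsd:element name="STATE_NAME" type="xsd:string"/>
<xsd:element name="LAND_KM" type="xsd:double"/>
"""

schema = schema_from_xsd(sample)
print(schema)
```

The resulting dictionary could then be used to edit a VRT or to build CAST 
expressions by hand.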


-Jukka Rahkonen-



________________________________
Sean Gillies <s...@mapbox.com>

> Hi Even, Jukka,

> While the OGC service architecture is heavily dependent on schemas, OGR type 
> schemas are not *generally* useful for GeoJSON. Consider the following 
> abbreviated feature collection:

  "features": [
    {"properties": {"a": 0, "b": "lol"}, ...},
    {"properties": {"c": "2014-11-21", "d": "wut"}, ...}
  ]

> It has two features and they are distinctly different types. A schema that 
> says these features have 4 fields would be nonsensical.
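
To make the "4 fields" concrete, here is a minimal sketch of what flattening 
this collection into one fixed schema would produce. The parts elided with 
"..." in the example are simply left out, and the helper names union_schema 
and flatten are hypothetical.

```python
import json

# The abbreviated collection from the example, without the elided parts.
collection = json.loads("""
{"features": [
  {"properties": {"a": 0, "b": "lol"}},
  {"properties": {"c": "2014-11-21", "d": "wut"}}
]}
""")


def union_schema(features):
    """Return the field names seen in any feature, in first-seen order."""
    names = []
    for feature in features:
        for name in feature["properties"]:
            if name not in names:
                names.append(name)
    return names


def flatten(features, names):
    """Project every feature onto the fixed field list, padding with None."""
    return [[f["properties"].get(n) for n in names] for f in features]


fields = union_schema(collection["features"])
rows = flatten(collection["features"], fields)
null_count = sum(value is None for row in rows for value in row)

print(fields)
print(rows)
print(null_count)
```

Half of the cells in the flattened table are null, which is the cost of 
forcing both features into a single schema.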

> There are a bunch of different JSON schema approaches and none of them seem 
> to have any traction. https://github.com/json-schema/json-schema for example 
> looks to be stalled. I think the lack of traction reflects some deeper 
> reality: that XML and JSON have very different strengths and use cases and 
> that attempts to XML-ize JSON by adding schemas will always eventually run 
> out of steam.

> For OGR to write schemas into GeoJSON would be a mistake. They could be 
> misleading and because there will never (as far as I can tell) be consensus 
> in the JSON community on the right form of schema, anything OGR implemented 
> would end up being a "loser".


On Fri, Nov 21, 2014 at 6:28 AM, Even Rouault 
<even.roua...@spatialys.com> wrote:
Jukka,

The data type guessing implemented in the OGR GeoJSON driver is hopefully 
quite natural.
The whole GeoJSON file is scanned and the following rules are applied:
- if an attribute has integer-only content --> Integer
- if an attribute has an array of integer-only content  --> IntegerList
- if an attribute has integer or floating point content --> Real
- if an attribute has an array of integer or floating point content --> RealList
- if an attribute has an array of anything else content --> StringList
- otherwise --> String
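
The rules above can be sketched in Python roughly as follows. This is an 
illustration of the listed rules only, not the driver's actual code: the 
function name guess_type is hypothetical, and the handling of null values, 
empty attributes and empty arrays is an assumption.

```python
def guess_type(values):
    """Guess an OGR-style type name from all values of one attribute."""

    def is_int(v):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(v, int) and not isinstance(v, bool)

    def is_num(v):
        return is_int(v) or isinstance(v, float)

    # Assumption: nulls do not influence the guess.
    present = [v for v in values if v is not None]
    if not present:
        return "String"  # assumption: nothing to scan falls back to String

    if all(is_int(v) for v in present):
        return "Integer"
    if all(is_num(v) for v in present):
        return "Real"
    if all(isinstance(v, list) for v in present):
        items = [x for v in present for x in v]
        if items and all(is_int(x) for x in items):
            return "IntegerList"
        if items and all(is_num(x) for x in items):
            return "RealList"
        return "StringList"
    return "String"


print(guess_type([1, 2, 3]))
print(guess_type([1, 2.5]))
print(guess_type([[-124.7, 24.9, -66.9, 49.4]]))
```

Because the decision is made over the whole column, a single floating point 
value is enough to turn an otherwise integer attribute into Real.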

With RFC 50 and other pending improvements in the driver:
- if an attribute has boolean-only content --> Integer(Boolean)
- if an attribute has an array of boolean-only content --> IntegerList(Boolean)
- if an attribute has date-only content --> Date
- if an attribute has time-only content --> Time
- if an attribute has datetime or date content --> DateTime
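
A similar sketch for these additional rules. The string formats accepted 
here (YYYY-MM-DD, HH:MM:SS and YYYY-MM-DDTHH:MM:SS) are assumptions, since 
the formats the driver recognizes are not spelled out in this mail, and the 
names classify and guess_extended are hypothetical.

```python
from datetime import datetime

# Assumed textual formats for each kind of value.
FORMATS = [
    ("Date", "%Y-%m-%d"),
    ("Time", "%H:%M:%S"),
    ("DateTime", "%Y-%m-%dT%H:%M:%S"),
]


def classify(value):
    """Classify one value as Boolean, Date, Time, DateTime or Other."""
    if isinstance(value, bool):
        return "Boolean"
    if isinstance(value, str):
        for kind, fmt in FORMATS:
            try:
                datetime.strptime(value, fmt)
                return kind
            except ValueError:
                pass
    return "Other"


def guess_extended(values):
    """Apply the additional rules; return None if none of them matches."""
    present = [v for v in values if v is not None]
    if present and all(isinstance(v, list) for v in present):
        items = [x for v in present for x in v]
        if items and all(isinstance(x, bool) for x in items):
            return "IntegerList(Boolean)"
        return None
    kinds = {classify(v) for v in present}
    if kinds == {"Boolean"}:
        return "Integer(Boolean)"
    if kinds == {"Date"}:
        return "Date"
    if kinds == {"Time"}:
        return "Time"
    if kinds and kinds <= {"Date", "DateTime"}:
        return "DateTime"
    return None  # fall back to the basic rules


print(guess_extended(["2014-11-21", "2014-11-21T06:28:00"]))
```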

I'm not sure we want to invent a .jsont format, but if you download
http://svn.osgeo.org/gdal/trunk/gdal/swig/python/samples/ogr2vrt.py

and run:

python ogr2vrt.py "http://demo.opengeo.org/geoserver/wfs?service=wfs&version=1.0.0&request=getfeature&typename=topp:states&outputformat=json" test.vrt

This will create a VRT with the default schema for you, which you can easily 
edit.
Note: as with OGR SQL CAST, this is post-processing. So if the guess made by 
the GeoJSON driver leads to a loss of information, you cannot recover it. 
Hopefully the implemented rules will not lead to information loss.

A better approach would be to have the schema embedded, in a JSON way, in the 
GeoJSON file itself.
That could be an evolution of the format, but I'm not sure this would be 
really popular, given that JSON/GeoJSON is heavily used by NoSQL approaches...

Hmm, doing a quick search, I just found http://json-schema.org/ which appears 
to be an IETF draft.
It doesn't look like the schema is embedded in the data file itself.

There's also GeoJSON-LD that might be a bit related : 
https://github.com/geojson/geojson-ld

CC'ing Sean in case he has thoughts on this.

Even

> Hi,
>
> I wonder if GDAL could have some simple and relatively user friendly way
> for defining a schema for GeoJSON data. The GeoJSON driver seems to guess
> the data types of attributes in some undocumented way, but users could
> have better knowledge about the desired schema.
>
> I know I can control the data type by using OGR SQL and CAST as in
> ogrinfo -sql "select cast(EMPLOYED as float) from OGRGeojson" states.json -so
>
> However, perhaps GeoJSON is popular enough to deserve an easier way for
> writing a schema. First I thought that it would be enough to copy the
> "csvt" text file mechanism from the GDAL CSV driver
> http://www.gdal.org/drv_csv.html. However, the csvt file is a plain list of
> types which will be applied to the attributes in the same order as they
> appear in the text file
> "Integer(5)","Real(10.7)","String(15)"
>
> For GeoJSON it would feel more user friendly to include the attribute names
> in the list somehow like
>  "population;Integer(5)","area;Real(10.7)","name;String(15)".
>
> This would make it easier for users to write a valid "jsont" file. A list
> with attribute names could perhaps also help GDAL, because the features in
> a GeoJSON file do not necessarily have the same attributes.
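
A minimal sketch of how such a line could be parsed. The "jsont" format is 
only the proposal made in this mail and does not exist in GDAL; the function 
name parse_jsont and the reading of Real(10.7) as width 10 with precision 7 
are assumptions.

```python
import csv
import re

# Type name, optionally followed by (width) or (width.precision).
TYPE = re.compile(r"^(\w+)(?:\((\d+)(?:\.(\d+))?\))?$")


def parse_jsont(line):
    """Parse one proposed "jsont" line into (name, type, width, precision)."""
    entries = next(csv.reader([line.strip()]))
    fields = []
    for entry in entries:
        name, _, spec = entry.partition(";")
        match = TYPE.match(spec.strip())
        if not name.strip() or match is None:
            raise ValueError(f"bad jsont entry: {entry!r}")
        kind, width, precision = match.groups()
        fields.append((
            name.strip(),
            kind,
            int(width) if width else None,
            int(precision) if precision else None,
        ))
    return fields


fields = parse_jsont('"population;Integer(5)","area;Real(10.7)","name;String(15)"')
for field in fields:
    print(field)
```

Since every entry carries its attribute name, the order of the entries no 
longer has to match the order in which attributes appear in the data.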
>
> As an example this is the right schema for a WFS feature type which is
> captured from
> http://demo.opengeo.org/geoserver/wfs?service=wfs&version=1.0.0&request=describefeaturetype&typename=topp:states
>
>
> name="the_geom" type="gml:MultiPolygonPropertyType"/>
> name="STATE_NAME" type="xsd:string"/>
> name="STATE_FIPS" type="xsd:string"/>
> name="SUB_REGION" type="xsd:string"/>
> name="STATE_ABBR" type="xsd:string"/>
> name="LAND_KM" type="xsd:double"/>
> name="WATER_KM" type="xsd:double"/>
> name="PERSONS" type="xsd:double"/>
> name="FAMILIES" type="xsd:double"/>
> name="HOUSHOLD" type="xsd:double"/>
> name="MALE" type="xsd:double"/>
> name="FEMALE" type="xsd:double"/>
> name="WORKERS" type="xsd:double"/>
> name="DRVALONE" type="xsd:double"/>
> name="CARPOOL" type="xsd:double"/>
> name="PUBTRANS" type="xsd:double"/>
> name="EMPLOYED" type="xsd:double"/>
> name="UNEMPLOY" type="xsd:double"/>
> name="SERVICE" type="xsd:double"/>
> name="MANUAL" type="xsd:double"/>
> name="P_MALE" type="xsd:double"/>
> name="P_FEMALE" type="xsd:double"/>
> name="SAMP_POP" type="xsd:double"/>
>
>
> This is what GDAL is guessing:
> STATE_NAME: String (0.0)
> STATE_FIPS: String (0.0)
> SUB_REGION: String (0.0)
> STATE_ABBR: String (0.0)
> LAND_KM: Real (0.0)
> WATER_KM: Real (0.0)
> PERSONS: Real (0.0)
> FAMILIES: Integer (0.0)
> HOUSHOLD: Real (0.0)
> MALE: Real (0.0)
> FEMALE: Real (0.0)
> WORKERS: Real (0.0)
> DRVALONE: Integer (0.0)
> CARPOOL: Integer (0.0)
> PUBTRANS: Integer (0.0)
> EMPLOYED: Real (0.0)
> UNEMPLOY: Integer (0.0)
> SERVICE: Integer (0.0)
> MANUAL: Integer (0.0)
> P_MALE: Real (0.0)
> P_FEMALE: Real (0.0)
> SAMP_POP: Integer (0.0)
> bbox: RealList (0.0)
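
The differences between the two listings above can be picked out with a 
short script. The field names and types are transcribed from this mail; the 
mapping of xsd:string to String and xsd:double to Real is an assumption of 
the sketch.

```python
# Assumed mapping from the declared XSD types to OGR-style type names.
XSD_TO_OGR = {"xsd:string": "String", "xsd:double": "Real"}

# Declared types, transcribed from the DescribeFeatureType excerpt.
declared = {n: "xsd:string" for n in
            "STATE_NAME STATE_FIPS SUB_REGION STATE_ABBR".split()}
declared.update({n: "xsd:double" for n in (
    "LAND_KM WATER_KM PERSONS FAMILIES HOUSHOLD MALE FEMALE WORKERS "
    "DRVALONE CARPOOL PUBTRANS EMPLOYED UNEMPLOY SERVICE MANUAL "
    "P_MALE P_FEMALE SAMP_POP").split()})

# Guessed types, transcribed from the GDAL listing.
guessed_listing = """
STATE_NAME: String (0.0)
STATE_FIPS: String (0.0)
SUB_REGION: String (0.0)
STATE_ABBR: String (0.0)
LAND_KM: Real (0.0)
WATER_KM: Real (0.0)
PERSONS: Real (0.0)
FAMILIES: Integer (0.0)
HOUSHOLD: Real (0.0)
MALE: Real (0.0)
FEMALE: Real (0.0)
WORKERS: Real (0.0)
DRVALONE: Integer (0.0)
CARPOOL: Integer (0.0)
PUBTRANS: Integer (0.0)
EMPLOYED: Real (0.0)
UNEMPLOY: Integer (0.0)
SERVICE: Integer (0.0)
MANUAL: Integer (0.0)
P_MALE: Real (0.0)
P_FEMALE: Real (0.0)
SAMP_POP: Integer (0.0)
bbox: RealList (0.0)
"""

guessed = {}
for line in guessed_listing.strip().splitlines():
    name, _, rest = line.partition(":")
    guessed[name.strip()] = rest.split()[0]

# Fields whose guessed type differs from the declared one.
mismatches = [n for n in declared
              if XSD_TO_OGR[declared[n]] != guessed.get(n)]
# Fields GDAL reports that the WFS schema does not declare.
extras = [n for n in guessed if n not in declared]

print(mismatches)
print(extras)
```

All mismatches are xsd:double fields that were guessed as Integer, and the 
only extra field is bbox.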
>
> -Jukka Rahkonen-
>
> _______________________________________________
> gdal-dev mailing list
> gdal-dev@lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/gdal-dev

--
Spatialys - Geospatial professional services
http://www.spatialys.com