Re: [jump-users] Wanted: Ogr2ogr-abled data format

Paul Austin Thu, 09 Aug 2007 11:12:38 -0700

I can see a need for two different types of files for storing GIS data.

The first is an interchange format that you would use to give data to other people. It should be easy to understand, so that readers and/or converters to other formats can be created to load and save data in this format. For this a text-based format is the best option, as it can be understood by humans. Think of those poor archivists right now trying to read binary data formats from as little as 10 years ago. Examples of this would include GML, SAIF (BC government only, but actually not a bad format), and CSV. The problem with GML (and SAIF and Martin's proposal) is that there is a lot of overhead in the text file, as the names of the attributes are repeated for every record in the file. CSV with a header row is a much more compact format and is also easy to read (even in a non-GIS tool such as MS Excel), and having the geometry encoded in WKT makes that easy to read as well. The disadvantage of CSV is that you can only have one type of feature per file, so you would need a separate file for each feature type. But you can always bundle them up into a Zip file for transport.
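To make the CSV-plus-WKT idea concrete, here is a minimal Python sketch using only the standard library. The column name `WKT`, the feature dictionaries and both function names are assumptions made up for illustration; they are not an OpenJUMP or OGR convention.

```python
import csv
import io

def write_features_csv(features, attribute_names):
    """Write one header row, then one row per feature (WKT in column 1)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    # Attribute names appear once, in the header, not once per record.
    writer.writerow(["WKT"] + attribute_names)
    for feature in features:
        writer.writerow([feature["wkt"]] + [feature[n] for n in attribute_names])
    return buf.getvalue()

def read_features_csv(text):
    """Read the rows back as dictionaries keyed by the header names."""
    return list(csv.DictReader(io.StringIO(text)))

# Hypothetical sample features, all of one feature type.
features = [
    {"wkt": "POINT (0 0)", "name": "A", "height": 12.5},
    {"wkt": "LINESTRING (0 0, 1 1)", "name": "B", "height": 3},
]
csv_text = write_features_csv(features, ["name", "height"])
rows = read_features_csv(csv_text)
```

Note that WKT with more than one coordinate contains commas, so the writer has to quote that field; the `csv` module does this automatically. Values come back as strings, since plain CSV carries no type information.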

These interchange formats are, however, lousy for actually making edits to the data, as you have to read in the entire file to get the contents (you can't jump to features using a BBOX), and when making edits you have to write the data out again as a completely new file.

The second type of file is useful for making direct edits to data. This is where the Shape format has some advantages, as you can jump directly to specific records and even make edits in place to the data. Note that geometry changes may require deleting the old record and appending a new one if the number of coordinates in a Line/Polygon changes. The Shape format does, however, have a bunch of disadvantages, such as requiring 3 files and allowing only short attribute names.
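The "jump directly to a record and edit it in place" property comes from fixed-size records. A rough Python sketch of the principle follows; the record layout (an integer id plus two doubles) and the function names are hypothetical and are not the actual Shape file layout.

```python
import io
import struct

# Hypothetical fixed-size record: int32 id, float64 x, float64 y (20 bytes).
RECORD = struct.Struct("<idd")

def write_record(f, index, rec_id, x, y):
    """Overwrite record number `index` without touching the others."""
    f.seek(index * RECORD.size)
    f.write(RECORD.pack(rec_id, x, y))

def read_record(f, index):
    """Seek straight to record number `index` and decode it."""
    f.seek(index * RECORD.size)
    return RECORD.unpack(f.read(RECORD.size))

# An in-memory stand-in for a binary file opened in "r+b" mode.
f = io.BytesIO()
for i in range(3):
    write_record(f, i, i, float(i), float(i))

write_record(f, 1, 1, 10.0, 20.0)  # in-place edit of the middle record
middle = read_record(f, 1)
```

This only works while every record keeps the same size, which is why a geometry whose coordinate count changes has to be deleted and appended as a new record instead.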

The best option for editing data, however, is to use a database, as it manages all the nasty storage, query and update mechanisms for you. There has been some talk in the GeoTools arena of adding spatial extensions to one of the embedded Java databases so that you can use it to manage a binary file-based feature storage mechanism. This would have the advantage in single-user environments of not requiring a database instance.


Paul

Martin Davis wrote:

Well, this is the holy grail of spatial formats. GML was (is) supposed to fill this need, but as you have observed it's pretty complicated to actually use in practice. The JUMP-GML format is a bit of a hack alright (although no worse than the FME-GML format, IMO). The right way to do GML in JUMP is probably to emit a simple-profile GML file generated automatically off the layer schema (e.g. using attribute names as element names). But in order for other tools to read this I believe that an XML-Schema file would have to be produced as well. Also, JUMP would have to be able to read GML Schemas as well - thus making everything much more complicated.
Can you propose some other simple, well-known format? Or maybe ask the OGR people what they would recommend?
Personally, I'm in favour of a text format, since it's easier to understand. Although, it does tend to lose fidelity of the numeric (coordinate) information. Maybe a simple binary format would be better (or a text format with WKB-Hex encoded geometry).
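The fidelity point can be shown with a small Python sketch. It encodes a 2D point as little-endian WKB (byte order 1, geometry type 1, two 64-bit doubles), hex-encodes it, and compares the round trip with a text coordinate written to six decimals. The function names and the six-decimal choice are assumptions for illustration only.

```python
import struct

def point_to_wkb_hex(x, y):
    """Encode a 2D point as little-endian WKB and return it as hex text."""
    # 1 byte byte-order flag, uint32 geometry type (1 = Point), two float64.
    return struct.pack("<BIdd", 1, 1, x, y).hex().upper()

def wkb_hex_to_point(hex_text):
    """Decode the hex text written by point_to_wkb_hex."""
    byte_order, geom_type, x, y = struct.unpack("<BIdd", bytes.fromhex(hex_text))
    if byte_order != 1 or geom_type != 1:
        raise ValueError("this sketch only handles little-endian WKB points")
    return x, y

x, y = 0.1 + 0.2, 1.0 / 3.0           # values with no short decimal form
hex_text = point_to_wkb_hex(x, y)
exact = wkb_hex_to_point(hex_text)    # bit-for-bit the same doubles
rounded = float("%.6f" % x)           # text written with limited precision
```

Here `exact == (x, y)` holds, while `rounded` differs from `x`. The loss comes from writing too few digits rather than from text as such: a writer that emits enough significant digits (17 for a double) also round-trips, at the cost of longer coordinates.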
I have had some thoughts about designing a *very* simple text format looking something like this:
---------------------------------
Geometry: POINT (0 0)
Attribute1: "Some string value"
Attribute2: 123.456
Attribute3: false

Geometry:  POINT(1 1)
etc...
---------------------------------
This is easy to read, generate, and parse. It's much easier to read than GML, IMO. Another advantage is that it's very easy to generate this as output from other applications, which is a major use case for me.
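As a rough check on the "easy to parse" claim, here is a minimal Python sketch of a reader for the format sketched above. The rules it applies (blank line ends a record, double quotes mark strings, `true`/`false` are booleans, anything numeric becomes a float, everything else such as WKT stays text) are assumptions inferred from the sample, not a specification.

```python
def parse_value(raw):
    """Guess a value's type from its text form (assumed rules, see above)."""
    raw = raw.strip()
    if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
        return raw[1:-1]
    if raw in ("true", "false"):
        return raw == "true"
    try:
        return float(raw)
    except ValueError:
        return raw  # e.g. WKT geometry is kept as plain text

def parse_records(text):
    """Split 'Name: value' lines into one dict per blank-line-separated record."""
    records, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            if current:
                records.append(current)
                current = {}
            continue
        # Split on the first colon only, so values may contain colons.
        name, _, raw = line.partition(":")
        current[name.strip()] = parse_value(raw)
    if current:
        records.append(current)
    return records

sample = '''Geometry: POINT (0 0)
Attribute1: "Some string value"
Attribute2: 123.456
Attribute3: false

Geometry:  POINT(1 1)
'''
records = parse_records(sample)
```

The sketch ignores escaping inside quoted strings and multi-line values; a real format would need to pin those down.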
Other alternatives would be CSV, JSON, YAML...

Martin

Rahkonen Jukka wrote:
Hi,

It would be really nice if OpenJUMP had, in addition to shapefiles, some
data format that could be used both for input and output, and that could
be recognised by other software as well. The ogr2ogr conversion utility
is what I have in mind right now.

- Shapefiles can be used for input and output, but they can hold just
one type of features and you may need to split your data to points,
lines and polygons first. That is not very handy.
- FME-GML output is OK. One file can hold all the features and ogr2ogr
reads them fine. Unfortunately ogr2ogr does not write out FME format so
you need to convert data to shapefiles in order to get it back to JUMP.
Not very handy either.
- JUMP-GML is recognised only by JUMP so it is not an exchange format.
- GML 2.0 input-output is only a theoretical alternative for me, because
I have never figured out how I should make the templates needed.

Some time ago Sunburned was making plans for a JUMP binary format.  That
sounds good to me. Or could it be possible to co-operate with gdal/ogr
folks so that JUMP-GML could be used both for input/output?  And is it
impossible to make reading/writing of GML 2.0 more simple in JUMP?

-Jukka Rahkonen-

_______________________________________________
jump-users mailing list
[email protected]
http://lists.refractions.net/mailman/listinfo/jump-users
