I can see a need for two different types of files for storing GIS data.

The first is an interchange format that you would use to give data to other people. It should be easy to understand, so that readers and converters for other formats can be written to load and save data in it. For this a text-based format is the best option, as it can be understood by humans. Think of those poor archivists right now trying to read binary data formats from as little as 10 years ago. Examples would include GML, SAIF (BC government only, but actually not a bad format), and CSV. The problem with GML (and SAIF and Martin's proposal) is that there is a lot of overhead in the text file, as the names of the attributes are repeated for every record in the file. CSV with a header row is a much more compact format and is also easy to read (even in a non-GIS tool such as MS Excel), and having the geometry encoded in WKT makes that easy to read as well. The disadvantage of CSV is that you can only have one type of feature per file, so you would need a separate file for each feature type. But you can always bundle them up into a Zip file for transport.
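As a rough sketch of the CSV idea (the column names and sample values are made up for illustration), Python's standard csv module can write the attribute names once in a header row, followed by one line per feature with the geometry stored as WKT text:

```python
import csv
import io

# Hypothetical features of a single type; names and values are illustrative only.
features = [
    {"id": 1, "name": "Main St", "geometry": "LINESTRING (0 0, 10 0)"},
    {"id": 2, "name": "Oak Ave", "geometry": "LINESTRING (0 5, 10 5, 10 10)"},
]

def write_features(rows, out):
    """Write the attribute names once in the header, then one line per feature."""
    writer = csv.DictWriter(out, fieldnames=["id", "name", "geometry"])
    writer.writeheader()
    writer.writerows(rows)

def read_features(src):
    """Read the features back; every value comes back as a string."""
    return list(csv.DictReader(src))

# io.StringIO stands in for a real file here.
buffer = io.StringIO()
write_features(features, buffer)
csv_text = buffer.getvalue()
round_tripped = read_features(io.StringIO(csv_text))
```

The csv module quotes the WKT field automatically because it contains commas. Note that attribute types are not preserved: the id comes back as the string "1", so a reader has to decide how to interpret each column.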

These interchange formats are, however, lousy for actually making edits to the data, as you have to read in the entire file to get the contents (you can't jump to features using a BBOX), and when making edits you have to save the data out as a completely new file.

The second type of file is useful for making direct edits to data. This is where the Shape format has some advantages, as you can jump directly to specific records and even make edits to the data in place. Note that geometry changes may require deleting the old record and appending a new one if the number of coordinates in a Line/Polygon changes. The Shape format does, however, have a bunch of disadvantages, such as requiring 3 files and allowing only short attribute names.
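The "jump directly to a record" idea can be sketched with fixed-length records. This is not the real Shape file layout, just an assumed record of one integer id and two doubles, to show why fixed-size records allow seeking and in-place edits:

```python
import io
import struct

# Assumed layout for illustration only (NOT the actual .shp/.dbf layout):
# a 4-byte little-endian id followed by two 8-byte doubles (x, y).
RECORD = struct.Struct("<idd")

def write_records(f, records):
    """Append records back to back; each one occupies RECORD.size bytes."""
    for rec in records:
        f.write(RECORD.pack(*rec))

def read_record(f, index):
    """Seek straight to record `index` without reading the ones before it."""
    f.seek(index * RECORD.size)
    return RECORD.unpack(f.read(RECORD.size))

def update_record(f, index, rec):
    """Overwrite one record in place; the rest of the file is untouched."""
    f.seek(index * RECORD.size)
    f.write(RECORD.pack(*rec))

# io.BytesIO stands in for a file opened in "r+b" mode.
f = io.BytesIO()
write_records(f, [(1, 0.0, 0.0), (2, 1.0, 1.0), (3, 2.0, 2.0)])
update_record(f, 1, (2, 5.5, 6.5))
```

A geometry whose coordinate count changes no longer fits in its old slot, which is why such an edit ends up as a delete plus an append rather than an overwrite.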

The best option for editing data, however, is to use a database, as it manages all the nasty storage, query and update mechanisms for you. There has been some talk in the GeoTools arena of adding spatial extensions to one of the embedded Java databases so that you can use it to manage a binary file-based feature storage mechanism. This would have the advantage in single-user environments of not requiring a database instance.
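To illustrate what the database does for you, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for an embedded database. It uses no spatial extension; the table name and the bounding-box columns (minx, miny, maxx, maxy) are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, no server needed
conn.execute(
    "CREATE TABLE features ("
    " id INTEGER PRIMARY KEY, name TEXT, wkt TEXT,"
    " minx REAL, miny REAL, maxx REAL, maxy REAL)"
)
conn.executemany(
    "INSERT INTO features VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "a", "POINT (0 0)", 0, 0, 0, 0),
        (2, "b", "POINT (5 5)", 5, 5, 5, 5),
        (3, "c", "LINESTRING (8 8, 12 12)", 8, 8, 12, 12),
    ],
)

def query_bbox(conn, minx, miny, maxx, maxy):
    """Return ids of features whose bounding box intersects the query box."""
    rows = conn.execute(
        "SELECT id FROM features"
        " WHERE maxx >= ? AND minx <= ? AND maxy >= ? AND miny <= ?"
        " ORDER BY id",
        (minx, maxx, miny, maxy),
    )
    return [row[0] for row in rows]

# Move feature 2; the database takes care of how the row is stored.
conn.execute(
    "UPDATE features SET wkt = ?, minx = ?, miny = ?, maxx = ?, maxy = ?"
    " WHERE id = ?",
    ("POINT (20 20)", 20, 20, 20, 20, 2),
)
hits = query_bbox(conn, 4, 4, 10, 10)
```

A real spatial extension would add a spatial index and proper geometry types; the plain bounding-box columns here only show the kind of query and update that become one-liners.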

Paul

Martin Davis wrote:
Well, this is the holy grail of spatial formats. GML was (is) supposed to fill this need, but as you have observed it's pretty complicated to actually use in practice. The JUMP-GML format is a bit of a hack alright (although no worse than the FME-GML format, IMO). The right way to do GML in JUMP is probably to emit a simple-profile GML file generated automatically off the layer schema (e.g. using attribute names as element names). But in order for other tools to read this I believe that an XML-Schema file would have to be produced as well. JUMP would also have to be able to read GML Schemas, making everything much more complicated. Can you propose some other simple, well-known format? Or maybe ask the OGR people what they would recommend?

Personally, I'm in favour of a text format, since it's easier to understand, although it does tend to lose fidelity of the numeric (coordinate) information. Maybe a simple binary format would be better (or a text format with WKB-Hex encoded geometry).
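A small illustration of the fidelity concern, assuming a writer that formats coordinates to a fixed number of decimals (the six-decimal choice is hypothetical). The hex form follows the WKB point layout: a byte-order flag, a 32-bit geometry type (1 for Point), then two IEEE doubles.

```python
import struct

def point_to_wkt(x, y, decimals=6):
    """Text encoding with a fixed number of decimals (an assumed writer policy)."""
    return f"POINT ({x:.{decimals}f} {y:.{decimals}f})"

def point_to_wkb_hex(x, y):
    """Little-endian WKB for a 2D point, hex encoded."""
    return struct.pack("<BIdd", 1, 1, x, y).hex().upper()

def wkb_hex_to_point(hex_text):
    """Decode the hex text back to (x, y); this sketch handles only LE points."""
    byte_order, geom_type, x, y = struct.unpack("<BIdd", bytes.fromhex(hex_text))
    if byte_order != 1 or geom_type != 1:
        raise ValueError("sketch handles little-endian 2D points only")
    return x, y

x, y = 1.0 / 3.0, 2.0 / 3.0
wkt = point_to_wkt(x, y)           # coordinates truncated to six decimals
wkb_hex = point_to_wkb_hex(x, y)   # exact bit pattern of both doubles
```

The hex form round-trips the doubles bit for bit, at the cost of no longer being readable by eye. Text can also round-trip exactly if the writer emits enough digits (Python's repr of a float does), so the loss comes from the formatting policy rather than from text as such.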

I have had some thoughts about designing a *very* simple text format looking something like this:

---------------------------------
Geometry: POINT (0 0)
Attribute1: "Some string value"
Attribute2: 123.456
Attribute3: false

Geometry: POINT (1 1)
etc...
---------------------------------

This is easy to read, generate, and parse. It's much easier to read than GML, IMO. Another advantage is that it's very easy to generate this as output from other applications, which is a major use case for me.
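A minimal sketch of a reader for this kind of format, to show how little parsing it needs. The rules assumed here are not part of the proposal: records are separated by blank lines, each line is `Name: value`, quoted values are strings, true/false are booleans, and anything else is tried as a number and otherwise kept as text.

```python
def parse_value(text):
    """Interpret one attribute value (typing rules are assumptions of this sketch)."""
    text = text.strip()
    if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        return text[1:-1]
    if text in ("true", "false"):
        return text == "true"
    try:
        return float(text)
    except ValueError:
        return text  # e.g. WKT geometry stays as text

def parse_features(source):
    """Split the text into records on blank lines and lines into name/value pairs."""
    features, current = [], {}
    for line in source.splitlines():
        if not line.strip():
            if current:
                features.append(current)
                current = {}
            continue
        name, _, value = line.partition(":")
        current[name.strip()] = parse_value(value)
    if current:
        features.append(current)
    return features

sample = '''Geometry: POINT (0 0)
Attribute1: "Some string value"
Attribute2: 123.456
Attribute3: false

Geometry: POINT (1 1)
'''
features = parse_features(sample)
```

Generating the format from another application is simpler still: one formatted line per attribute and a blank line between features.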

Other alternatives would be CSV, JSON, YAML...

Martin

Rahkonen Jukka wrote:
Hi,

It would be really nice if OpenJUMP had, in addition to shapefiles, some
data format that could be used both for input and output, and that could
be recognised by other software as well. The ogr2ogr conversion utility
is what I have in mind right now.

- Shapefiles can be used for input and output, but they can hold just
one type of feature, so you may need to split your data into points,
lines and polygons first. That is not very handy.
- FME-GML output is OK. One file can hold all the features and ogr2ogr
reads them fine. Unfortunately ogr2ogr does not write out FME format so
you need to convert data to shapefiles in order to get it back to JUMP.
Not very handy either.
- JUMP-GML is recognised only by JUMP so it is not an exchange format.
- GML 2.0 input-output is only a theoretical alternative for me, because
I have never figured out how I should make the templates needed.
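The splitting step mentioned in the first point can be sketched as grouping features by the geometry type named at the start of their WKT. The feature dictionaries are hypothetical sample data, and this says nothing about how OpenJUMP or ogr2ogr handle it internally.

```python
from collections import defaultdict

def geometry_type(wkt):
    """Return the leading WKT keyword, e.g. 'POINT' from 'POINT (0 0)'."""
    return wkt.strip().split("(", 1)[0].strip().upper()

def split_by_type(features):
    """Group features so each group could go into its own single-type file."""
    groups = defaultdict(list)
    for feature in features:
        groups[geometry_type(feature["geometry"])].append(feature)
    return dict(groups)

# Hypothetical mixed layer, for illustration only.
mixed = [
    {"id": 1, "geometry": "POINT (0 0)"},
    {"id": 2, "geometry": "LINESTRING (0 0, 1 1)"},
    {"id": 3, "geometry": "POLYGON ((0 0, 1 0, 1 1, 0 0))"},
    {"id": 4, "geometry": "point(2 2)"},
]
groups = split_by_type(mixed)
```

Each group would then be written to its own file, which is exactly the extra step that makes single-type formats awkward for mixed layers.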

Some time ago Sunburned was making plans for a JUMP binary format. That
sounds good to me. Or would it be possible to co-operate with the gdal/ogr
folks so that JUMP-GML could be used both for input and output? And is it
impossible to make reading/writing of GML 2.0 simpler in JUMP?

-Jukka Rahkonen-

_______________________________________________
jump-users mailing list
[email protected]
http://lists.refractions.net/mailman/listinfo/jump-users

