Glynn Clements wrote:
Markus Metz wrote:
I like Ivan's point that off_t is the native type for file offsets. Could G_fseek then use fseeko whenever fseeko is available (and likewise G_ftell use ftello)?

Well, that's the general idea. The only advantage of fseek/ftell is
that they are always available.
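A minimal sketch of what such a wrapper pair could look like. It assumes a configure-time macro HAVE_FSEEKO (hypothetical here) that is defined when fseeko/ftello exist. The names my_fseek/my_ftell are placeholders and are not the GRASS G_fseek/G_ftell implementation. Without fseeko, the fallback refuses any offset that does not fit into a long rather than silently truncating it.

```c
/* Must come before any system header: asks glibc for a 64-bit off_t
   on 32-bit hosts (a no-op where off_t is already 64 bits). */
#define _FILE_OFFSET_BITS 64
#define _POSIX_C_SOURCE 200112L /* exposes fseeko/ftello declarations */

#include <stdio.h>
#include <sys/types.h>

/* HAVE_FSEEKO is an assumed configure macro, not defined here. */
off_t my_ftell(FILE *fp)
{
#ifdef HAVE_FSEEKO
    return ftello(fp);
#else
    /* ftell returns long; -1 signals an error either way */
    return (off_t) ftell(fp);
#endif
}

int my_fseek(FILE *fp, off_t offset, int whence)
{
#ifdef HAVE_FSEEKO
    return fseeko(fp, offset, whence);
#else
    long loff = (long) offset;
    if ((off_t) loff != offset)
        return -1; /* offset not representable as long: fseek cannot reach it */
    return fseek(fp, loff, whence);
#endif
}
```

On LP64 systems long is already 64 bits, so both paths behave the same there. The difference only shows up on 32-bit hosts, where the fallback caps out near 2GB.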
I think GRASS 7 should get LFS support as far as possible; today's datasets can easily exceed the 2GB limit. Before modules can get LFS, the underlying libraries must be enabled. According to the LFS wish list in the wiki, these are the vector libs and the DB libs. For that, the fseeko/ftello problem needs to be solved for 32-bit systems. I have read "The issues" and understand the problem, but some sort of implementation of G_fseek and G_ftell is needed; otherwise modules and libraries need a workaround like the one the iostream library uses now. Instead of having many (potentially different) workarounds, one proper solution is preferable. This may not be easy, and as much as I like tackling hard problems, here I can only say: please do it!
Bear in mind that a GRASS database may be on a networked file system,
and accessed by both 32- and 64-bit systems, and by both big- and
little-endian systems.
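One way to keep offsets readable by 32/64-bit and big/little-endian hosts alike is to fix the on-disk byte order instead of using the host's. The sketch below assumes a compiler with uint64_t (see the caveat about missing 64-bit types further down the thread). The function names are hypothetical, and GRASS's own portable-format code may handle byte order differently.

```c
#include <stdint.h>
#include <stdio.h>

/* Write v as 8 bytes, least significant byte first, whatever the host order. */
int write_off64_le(FILE *fp, uint64_t v)
{
    unsigned char buf[8];
    for (int i = 0; i < 8; i++)
        buf[i] = (unsigned char) (v >> (8 * i));
    return fwrite(buf, 1, 8, fp) == 8 ? 0 : -1;
}

/* Read 8 little-endian bytes back into a host-order value. */
int read_off64_le(FILE *fp, uint64_t *v)
{
    unsigned char buf[8];
    uint64_t r = 0;
    if (fread(buf, 1, 8, fp) != 8)
        return -1;
    for (int i = 0; i < 8; i++)
        r |= (uint64_t) buf[i] << (8 * i);
    *v = r;
    return 0;
}
```

Because only shifts and masks are used, the same file bytes come out identically on any host, with no runtime endianness detection.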

Also, the user shouldn't need write permission in order to read a map. Or rather, don't assume that the user has write permission for a map that they are reading.
OK, the biggest problem is supporting reading a vector map written with sizeof(off_t) == 8 when the libs use sizeof(off_t) == 4, without rebuilding topology.

The biggest problem is when the compiler doesn't provide a 64-bit
integral type (off_t doesn't necessarily have to be 64 bits).
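Without a 64-bit integer type, an offset can still be carried as two 32-bit halves. A minimal sketch with a hypothetical off64_pair type: unsigned long is guaranteed to hold at least 32 bits, and masking keeps each half at exactly 32 bits even where long is wider.

```c
/* Hypothetical 64-bit offset made of two 32-bit halves, for compilers
   that lack a 64-bit integral type. */
typedef struct {
    unsigned long hi; /* upper 32 bits */
    unsigned long lo; /* lower 32 bits */
} off64_pair;

/* Add a 32-bit increment to a split offset, propagating the carry. */
void off64_pair_add(off64_pair *o, unsigned long inc)
{
    unsigned long lo = (o->lo + (inc & 0xFFFFFFFFUL)) & 0xFFFFFFFFUL;
    if (lo < o->lo) /* low half wrapped around: carry into the high half */
        o->hi = (o->hi + 1) & 0xFFFFFFFFUL;
    o->lo = lo;
}
```

This only covers the bookkeeping. Actually seeking beyond 2GB still needs an fseeko-style call that accepts the larger offset.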
There is a handy function called buf_alloc() in the vector libs that allocates a temporary buffer of the needed size (any size) for reading the contents of any of the vector files. You could then read this temporary buffer in chunks of the size supported by the current vector libs. The code is essentially there and would need only a little adjustment.
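A sketch of that idea: a grow-only scratch buffer sized for the field width the file was written with, not for the reader's own sizeof(off_t). The names scratch_alloc and read_offset_field are hypothetical and only play the role described for buf_alloc(); they do not reproduce its actual code.

```c
#include <stdio.h>
#include <stdlib.h>

/* Grow-only scratch buffer (hypothetical stand-in for buf_alloc()). */
static unsigned char *scratch = NULL;
static size_t scratch_size = 0;

static int scratch_alloc(size_t needed)
{
    unsigned char *p;
    if (needed <= scratch_size)
        return 0;
    p = realloc(scratch, needed);
    if (p == NULL)
        return -1;
    scratch = p;
    scratch_size = needed;
    return 0;
}

/* Read one offset field of field_size bytes (the writer's width, e.g. 8),
   independent of the reader's sizeof(off_t). Returns a pointer into the
   scratch buffer, valid until the next call, or NULL on error/EOF. */
const unsigned char *read_offset_field(FILE *fp, size_t field_size)
{
    if (scratch_alloc(field_size) != 0)
        return NULL;
    if (fread(scratch, 1, field_size, fp) != field_size)
        return NULL;
    return scratch;
}
```

The caller then interprets the bytes in chunks its own off_t can handle, for example as two 32-bit words.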
As you suggested, two 32-bit reads can be done, and depending on the endianness of the host system, either the high word or the low word is used.

The low word is always used. That might be the first word or the
second word, but it's always the low word.
I got confused by endianness and mixed up low/high word with first/second word. With the current code, the low word would be the second word when doing two 32-bit reads on a 64-bit buffer, regardless of any endianness mismatch. In this case, the libs would have to check whether the high word is != 0 and then exit with an ERROR message, right?
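More generally, where the low word sits depends on the byte order in which the 8-byte field was written, which is Glynn's point above. Here is a sketch of the check, assuming the reader knows the file's byte order from a flag (file_big_endian, hypothetical). It returns -1 instead of calling a fatal-error routine, so the caller decides how to report the problem.

```c
#include <stdint.h>

/* Decode 4 bytes as an unsigned 32-bit word in the given byte order. */
static uint32_t get_u32(const unsigned char *p, int big_endian)
{
    if (big_endian)
        return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
               ((uint32_t) p[2] << 8) | (uint32_t) p[3];
    return ((uint32_t) p[3] << 24) | ((uint32_t) p[2] << 16) |
           ((uint32_t) p[1] << 8) | (uint32_t) p[0];
}

/* Narrow an 8-byte offset field to 32 bits. Returns 0 and sets *out on
   success, or -1 if the high word is non-zero (the offset needs 64 bits). */
int offset64_to_32(const unsigned char field[8], int file_big_endian,
                   uint32_t *out)
{
    /* Big-endian file: high word first. Little-endian file: low word first. */
    const unsigned char *hi_bytes = file_big_endian ? field : field + 4;
    const unsigned char *lo_bytes = file_big_endian ? field + 4 : field;
    uint32_t hi = get_u32(hi_bytes, file_big_endian);
    uint32_t lo = get_u32(lo_bytes, file_big_endian);
    if (hi != 0)
        return -1;
    *out = lo;
    return 0;
}
```

If the library's off_t is a signed 32-bit type, the caller would also need to reject lo > 0x7FFFFFFF.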
When writing offsets, it would be easiest (also safest?) to always use the libs' own sizeof(off_t). There will be no mix of different offset sizes, because topo and cidx are currently rewritten from scratch whenever the vector is updated.
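Since topo and cidx are rewritten as a whole, one width per file would be enough. A sketch with hypothetical function names: the writer records sizeof(off_t) once in the header, then writes each offset in exactly that many bytes, least significant byte first. A reader can then tell 4-byte from 8-byte files without guessing.

```c
#include <stdio.h>
#include <sys/types.h>

/* Record the writer's offset width once, e.g. in the file header. */
int write_offset_width(FILE *fp)
{
    return fputc((int) sizeof(off_t), fp) == EOF ? -1 : 0;
}

/* Write a non-negative offset in exactly sizeof(off_t) bytes,
   least significant byte first. */
int write_offset(FILE *fp, off_t off)
{
    for (size_t i = 0; i < sizeof(off_t); i++) {
        if (fputc((int) ((off >> (8 * i)) & 0xFF), fp) == EOF)
            return -1;
    }
    return 0;
}
```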

It would be both easiest and safest. Although it would be preferable
to use 32 bits if that is known to be sufficient, I don't know whether
this is feasible.
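If the largest offset were already known when the offsets are written, the 32-bit choice would reduce to a single comparison. Here is a sketch with a hypothetical function name. It assumes a 64-bit type is available for max_offset, and it does not address the objection below that the final size is usually not known in advance.

```c
#include <stdint.h>

/* Hypothetical helper: choose the on-disk width of offset fields.
   4 bytes suffice if every offset fits into a signed 32-bit off_t. */
int choose_offset_width(uint64_t max_offset)
{
    return max_offset <= (uint64_t) INT32_MAX ? 4 : 8;
}
```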
I don't think so. With v.in.ogr, you have no way to estimate the coor file size in advance. Coming back to my test shapefile for v.in.ogr with a total size below 5MB: it results in a coor file > 8GB with cleaning and > 4GB without cleaning. When working on a GRASS vector, each module would have to estimate how much the coor file will grow. Most modules copy the input vector to the output vector, apply the requested modifications to the output vector, and write out the output vector. You would have to make some very educated guesses about the size of the final coor file, considering the expected number of dead lines and of additional vertices, to decide whether a 32-bit off_t would be sufficient. Instead, I would prefer to use 64 bits whenever possible. Personally, I would regard 32-bit support as a courtesy, but please don't start a discussion about that.
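For scale, here is the ceiling of a signed 32-bit off_t compared with the coor file sizes above (reading GB as 2^30 bytes):

```latex
2^{31} - 1 = 2\,147\,483\,647~\text{bytes} \approx 2~\text{GiB}
4~\text{GiB} = 2^{32} = 2 \cdot 2^{31}, \qquad 8~\text{GiB} = 2^{33} = 4 \cdot 2^{31}
```

Even an unsigned 32-bit offset (at most 2^32 - 1) could not address either of those coor files.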

_______________________________________________
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev
