Re: [GRASS-dev] vector large file support

Markus Metz Sun, 08 Feb 2009 11:45:47 -0800


Glynn Clements wrote:

Markus Metz wrote:
I like the point of Ivan that off_t is the native type for file offsets.
Could G_fseek then use fseeko whenever fseeko is available (ditto for
ftello)?
Well, that's the general idea. The only advantage of fseek/ftell is
that they are always available.

I think grass7 should get LFS support as far as possible; today's
datasets can easily exceed the 2GB limit. Before modules can get LFS,
the underlying libraries must be enabled. According to the LFS wish list
in the wiki on LFS, these are the vector libs and the DB libs. For that,
this fseeko/ftello problem needs to be solved for 32bit systems. I have
read "The issues" and understand the problem, but some sort of
implementation of G_fseek and G_ftell is needed, otherwise modules and
libraries need a workaround like the one the iostream library is using
now. Instead of having many (potentially different) workarounds, one
proper solution is preferable. This may not be easy, and as much as I
like tackling hard problems, here I can only say: Please do it!
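
To make the request concrete, here is a minimal sketch of what such a
pair of wrappers might look like. The names sketch_fseek and
sketch_ftell and the HAVE_FSEEKO macro are assumptions for illustration
only, not the actual GRASS API; the fallback branch simply refuses
offsets that do not fit into a long rather than truncating them.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical wrapper: seek with an off_t offset.
 * HAVE_FSEEKO is assumed to be defined by the build system
 * when fseeko()/ftello() are available. */
int sketch_fseek(FILE *fp, off_t offset, int whence)
{
    assert(fp != NULL);
#ifdef HAVE_FSEEKO
    return fseeko(fp, offset, whence);
#else
    /* fseek() takes a long: reject offsets it cannot represent */
    if (offset > LONG_MAX || offset < LONG_MIN) {
        errno = EOVERFLOW;
        return -1;
    }
    return fseek(fp, (long)offset, whence);
#endif
}

/* Hypothetical wrapper: report the current position as an off_t. */
off_t sketch_ftell(FILE *fp)
{
    assert(fp != NULL);
#ifdef HAVE_FSEEKO
    return ftello(fp);
#else
    return (off_t)ftell(fp);
#endif
}
```

With wrappers of this kind, callers would only ever handle off_t, and
the choice between fseek and fseeko stays in one place.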

Bear in mind that a GRASS database may be on a networked file system,
and accessed by both 32- and 64-bit systems, and by both big- and
little-endian systems.
Also, the user shouldn't need write permission in order to read a map.
Or, rather, don't assume that the user has write permission for a map
which they are reading.
OK, the biggest problem is to support reading a vector written with
sizeof(off_t) == 8 when the libs use sizeof(off_t) == 4, without
rebuilding topology.
The biggest problem is when the compiler doesn't provide a 64-bit
integral type (off_t doesn't necessarily have to be 64 bits).

There is a handy function called buf_alloc() in the vector libs,
allocating a temporary buffer of the needed size (can be of any size),
to read content of any of the vector files. You could then read this
temporary buffer in chunks of the size supported by the current vector
libs. The code is essentially there and would need only minor
adjustment.

As you suggested, two 32bit reads can be done, and depending on the
endian-ness of the host system either the high word value or the low
word value used.
The low word is always used. That might be the first word or the
second word, but it's always the low word.

I got confused by this endian-ness and confused low/high word with
first/second word. With the current code, the low word would be the
second word when doing two 32bit reads on a 64bit sized buffer,
independent of an endian-ness mismatch. In this case, the libs would
have to check if the high word is != 0 and then exit with an ERROR
message, right?
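
The check described above can be sketched roughly as follows. This is
only an illustration under stated assumptions: the function name
sketch_decode_off8, the SKETCH_* byte-order constants and the return
convention are all hypothetical, and the byte order of the file is
passed in explicitly instead of being taken from the vector header. The
bytes are assembled one at a time, so the result does not depend on the
endian-ness of the host, and no 64-bit integral type is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte-order flags for the file being read */
enum { SKETCH_LITTLE_ENDIAN = 0, SKETCH_BIG_ENDIAN = 1 };

/* Decode an 8-byte file offset as two 32-bit words.
 * Returns 0 and stores the low word in *low if the value fits into a
 * signed 32-bit offset; returns -1 otherwise (high word != 0, or the
 * low word exceeds 2^31 - 1). */
int sketch_decode_off8(const unsigned char *buf, int file_order,
                       uint32_t *low)
{
    uint32_t first = 0, second = 0, high, lo;
    int i;

    assert(buf != NULL && low != NULL);

    for (i = 0; i < 4; i++) {
        if (file_order == SKETCH_BIG_ENDIAN) {
            /* most significant byte comes first within each word */
            first = (first << 8) | buf[i];
            second = (second << 8) | buf[4 + i];
        }
        else {
            /* least significant byte comes first within each word */
            first |= (uint32_t)buf[i] << (8 * i);
            second |= (uint32_t)buf[4 + i] << (8 * i);
        }
    }

    /* Which word is the low one depends on the file's byte order */
    if (file_order == SKETCH_BIG_ENDIAN) {
        high = first;
        lo = second;
    }
    else {
        high = second;
        lo = first;
    }

    if (high != 0 || lo > 0x7FFFFFFFu)
        return -1;      /* caller would report an ERROR here */

    *low = lo;
    return 0;
}
```

A caller in a 32bit build would treat a return value of -1 as "this
vector needs large file support" and stop with an error message.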

When writing offsets, it would be easiest (also safest?) to always use
sizeof(off_t) of the libs. There will be no mix of different offset
sizes because topo and cidx are currently written anew whenever the
vector is updated.
It would be both easiest and safest. Although it would be preferable
to use 32 bits if that is known to be sufficient, I don't know whether
this is feasible.
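
For illustration, writing an offset with a chosen width in a fixed byte
order could be sketched like this. The function name sketch_encode_off
and the choice of big-endian output are assumptions made for the
example, not a description of the existing vector library code. The
function fails instead of silently truncating when the value does not
fit into the requested number of bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper: store a non-negative offset in `width` bytes,
 * most significant byte first.
 * Returns 0 on success, -1 if the value is negative or does not fit. */
int sketch_encode_off(off_t value, unsigned char *buf, size_t width)
{
    size_t i;

    assert(buf != NULL && width > 0);

    if (value < 0)
        return -1;

    /* fill from the last byte backwards, 8 bits at a time */
    for (i = width; i > 0; i--) {
        buf[i - 1] = (unsigned char)(value & 0xFF);
        value >>= 8;
    }

    /* anything left over means the offset needs more bytes */
    return value == 0 ? 0 : -1;
}
```

Passing sizeof(off_t) as the width corresponds to the "always use
sizeof(off_t) of the libs" approach; passing 4 corresponds to the
32-bit variant and only succeeds when the offset actually fits.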

I don't think so. With v.in.ogr, you have no chance to estimate the coor
file size. Coming back to my test shapefile for v.in.ogr with a total
size below 5MB, that thing results in a coor file > 8GB with cleaning
and > 4GB without cleaning. When working on a grass vector, each module
would have to estimate the increase of the coor file. Most modules copy
the input vector to the output vector, do the requested modifications on
the output vector and write out the output vector. You would have to do
some very educated guessing on the size of the final coor file,
considering the expected number of dead lines and the expected number of
additional vertices, to decide if a 32bit off_t would be sufficient.
Instead I would prefer to use 64 bits whenever possible. Personally, I
would regard 32bit support as a courtesy, but please don't start a
discussion about that.


