Re: [GRASS-dev] Usage of size_t

2012-06-22 Thread Glynn Clements

Markus Metz wrote:

  I'm moving size_t topic from user to dev list.
 
  As I understand, size_t should be used also for number of rows and
  number of columns of a map. But for example Rast_window_{rows,cols}
  returns int. So will this be also changed/fixed to size_t?
 
 Considering current hardware, we struggle to support maps with
 2147483647 (the maximum value representable by a 32-bit integer;
 Glynn's previous answer) cells = rows * columns. Therefore I guess
 that for the time being int is fine as data type for rows and columns.
 Supporting 2147483647 rows x 2147483647 columns is still a far way to
 go. Type size_t should be used for rows * columns * sizeof(some data)
 when allocating memory, and type off_t should be used for rows *
 columns * sizeof(some data) for disk IO. I would just cast as
 suggested by Glynn.

FWIW:

size_t is specified by the C standard as the type of a sizeof
expression. It's used in the library as the argument to malloc(), and
wherever an array size is specified (e.g. memcpy, fread, etc).

On systems with a flat address space (i.e. anything which GRASS
targets), size_t will have the same size as a pointer.

off_t isn't specified by the C standard, but by POSIX, and refers to
the type used by lseek() to store a file offset. On 32-bit systems,
off_t may be either 32 or 64 bits. off_t is a signed type.

If you're going to read an entire map into memory, then using size_t
for the number of cells or bytes is sufficient for any map which will
actually fit into memory.

OTOH, if you're processing the file row-by-row, it may not actually be
possible to calculate the total number of cells or bytes.

For any file which can be opened, off_t is guaranteed to be large
enough to represent the size of the file on disk; without LFS, the OS
will refuse to open a file larger than 2GiB.

However, GRASS raster maps are normally compressed, so the number of
bytes or cells in the decompressed data may exceed 2^31 even if the
file is smaller than 2 GiB.

-- 
Glynn Clements gl...@gclements.plus.com
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev


[GRASS-dev] Usage of size_t

2012-06-21 Thread Vaclav Petras
Hello,

I'm moving size_t topic from user to dev list.

As I understand, size_t should be used also for number of rows and
number of columns of a map. But for example Rast_window_{rows,cols}
returns int. So will this be also changed/fixed to size_t?

Note that some instances of using int instead of size_t can be
identified by GCC warning option -Wsign-conversion. It marks usage of
int as an argument of malloc/G_calloc/... or usage of int for
iterating C++ STL containers.

Vaclav


From topic NVIZ can't allocate enough memory on [GRASS-user]:

 From: Glynn Clements gl...@gclements.plus.com
 Date: Tue, Jun 12, 2012 at 5:23 AM
 Subject: Re: [GRASS-user] NVIZ can't allocate enough memory
 To: Daniel Lee l...@isi-solutions.org
 Cc: GRASS user list grass-u...@lists.osgeo.org



 Daniel Lee wrote:

 ERROR: G_malloc: unable to allocate 18446744071974792324 bytes at gsds.c:575

 18446744071974792324 = 0x9899ac84

 Does anybody know what could be the problem? If I'm interpreting the bytes
 correctly, that'd be about 10^10 gigabytes, which several orders of
 magnitude larger than the rasters involved. Thanks!

 A calculation has overflowed the range of a signed 32-bit integer
 resulting in a negative value. This value has then been converted to a
 signed 64-bit integer, and then to an unsigned 64-bit integer.

 Here's the output of g.region:

 rows:       32001
 cols:       20001
 cells:      640052001

 At 4 bytes per cell, 640052001 cells = 2560208004 bytes.

 2560208004 = 0x9899ac84

 The maximum value representable by a 32-bit integer is 0x7fff =
 2147483647.

 Using an unsigned integer would eliminate the problem in this
 particular case, but the region wouldn't need to be much larger in
 order for that to be insufficient (specifically, 32001 x 33553 would
 overflow).

 The correct solution is to use size_t rather than int. E.g. for
 this specific case:

 --- gsds.c      (revision 52015)
 +++ gsds.c      (working copy)
 @@ -481,7 +481,8 @@
  int gsds_alloc_typbuff(int id, int *dims, int ndims, int type)
  {
     dataset *ds;
 -    int i, siz = 1;
 +    int i;
 +    size_t siz = 1;

     if ((ds = get_dataset(id))) {
        /*

 More generally, it's important that the conversion comes before the
 mulitply; e.g.:

        size_t cells = (size_t) rows * cols;

 will work but:

        size_t cells = rows * cols;

 won't; the calculation will be perfomed using int, overflow, then
 the overflowed result will be promoted.

 However: the number of instances of this particular issue in the GRASS
 source code is probably somewhere between dozens and hundreds, and
 there's no systematic way to identify them. Running test cases with a
 region of = 2^32 cells (or even = 2^29 cells, i.e. = 2^31 bytes at
 4 bytes per cell) would find a lot of them, but it requires a 64-bit
 system with plenty of RAM, and it's going to be slow.

 --
 Glynn Clements gl...@gclements.plus.com
 ___
 grass-user mailing list
 grass-u...@lists.osgeo.org
 http://lists.osgeo.org/mailman/listinfo/grass-user
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev

Re: [GRASS-dev] Usage of size_t

2012-06-21 Thread Markus Metz
On Thu, Jun 21, 2012 at 5:16 PM, Vaclav Petras wenzesl...@gmail.com wrote:
 Hello,

 I'm moving size_t topic from user to dev list.

 As I understand, size_t should be used also for number of rows and
 number of columns of a map. But for example Rast_window_{rows,cols}
 returns int. So will this be also changed/fixed to size_t?

Considering current hardware, we struggle to support maps with
2147483647 (the maximum value representable by a 32-bit integer;
Glynn's previous answer) cells = rows * columns. Therefore I guess
that for the time being int is fine as data type for rows and columns.
Supporting 2147483647 rows x 2147483647 columns is still a far way to
go. Type size_t should be used for rows * columns * sizeof(some data)
when allocating memory, and type off_t should be used for rows *
columns * sizeof(some data) for disk IO. I would just cast as
suggested by Glynn.


 Note that some instances of using int instead of size_t can be
 identified by GCC warning option -Wsign-conversion. It marks usage of
 int as an argument of malloc/G_calloc/... or usage of int for
 iterating C++ STL containers.

The number of rows and columns should always be non-negative, thus no
worries about sign conversion.

Also note that when calling e.g. Rast_northing_to_row() which returns
the row number, not the number of rows, the resulting row can be
negative and must be treated as double or casted to some signed
integer, not a potentially unsigned size_t.

Markus M


 Vaclav


 From topic NVIZ can't allocate enough memory on [GRASS-user]:

 From: Glynn Clements gl...@gclements.plus.com
 Date: Tue, Jun 12, 2012 at 5:23 AM
 Subject: Re: [GRASS-user] NVIZ can't allocate enough memory
 To: Daniel Lee l...@isi-solutions.org
 Cc: GRASS user list grass-u...@lists.osgeo.org



 Daniel Lee wrote:

 ERROR: G_malloc: unable to allocate 18446744071974792324 bytes at gsds.c:575

 18446744071974792324 = 0x9899ac84

 Does anybody know what could be the problem? If I'm interpreting the bytes
 correctly, that'd be about 10^10 gigabytes, which several orders of
 magnitude larger than the rasters involved. Thanks!

 A calculation has overflowed the range of a signed 32-bit integer
 resulting in a negative value. This value has then been converted to a
 signed 64-bit integer, and then to an unsigned 64-bit integer.

 Here's the output of g.region:

 rows:       32001
 cols:       20001
 cells:      640052001

 At 4 bytes per cell, 640052001 cells = 2560208004 bytes.

 2560208004 = 0x9899ac84

 The maximum value representable by a 32-bit integer is 0x7fff =
 2147483647.

 Using an unsigned integer would eliminate the problem in this
 particular case, but the region wouldn't need to be much larger in
 order for that to be insufficient (specifically, 32001 x 33553 would
 overflow).

 The correct solution is to use size_t rather than int. E.g. for
 this specific case:

 --- gsds.c      (revision 52015)
 +++ gsds.c      (working copy)
 @@ -481,7 +481,8 @@
  int gsds_alloc_typbuff(int id, int *dims, int ndims, int type)
  {
     dataset *ds;
 -    int i, siz = 1;
 +    int i;
 +    size_t siz = 1;

     if ((ds = get_dataset(id))) {
        /*

 More generally, it's important that the conversion comes before the
 mulitply; e.g.:

        size_t cells = (size_t) rows * cols;

 will work but:

        size_t cells = rows * cols;

 won't; the calculation will be perfomed using int, overflow, then
 the overflowed result will be promoted.

 However: the number of instances of this particular issue in the GRASS
 source code is probably somewhere between dozens and hundreds, and
 there's no systematic way to identify them. Running test cases with a
 region of = 2^32 cells (or even = 2^29 cells, i.e. = 2^31 bytes at
 4 bytes per cell) would find a lot of them, but it requires a 64-bit
 system with plenty of RAM, and it's going to be slow.

 --
 Glynn Clements gl...@gclements.plus.com
 ___
 grass-user mailing list
 grass-u...@lists.osgeo.org
 http://lists.osgeo.org/mailman/listinfo/grass-user
 ___
 grass-dev mailing list
 grass-dev@lists.osgeo.org
 http://lists.osgeo.org/mailman/listinfo/grass-dev
___
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev