Re: [GRASS-dev] Usage of size_t
Markus Metz wrote: I'm moving size_t topic from user to dev list. As I understand, size_t should be used also for number of rows and number of columns of a map. But for example Rast_window_{rows,cols} returns int. So will this be also changed/fixed to size_t? Considering current hardware, we struggle to support maps with 2147483647 (the maximum value representable by a 32-bit integer; Glynn's previous answer) cells = rows * columns. Therefore I guess that for the time being int is fine as data type for rows and columns. Supporting 2147483647 rows x 2147483647 columns is still a far way to go. Type size_t should be used for rows * columns * sizeof(some data) when allocating memory, and type off_t should be used for rows * columns * sizeof(some data) for disk IO. I would just cast as suggested by Glynn. FWIW: size_t is specified by the C standard as the type of a sizeof expression. It's used in the library as the argument to malloc(), and wherever an array size is specified (e.g. memcpy, fread, etc). On systems with a flat address space (i.e. anything which GRASS targets), size_t will have the same size as a pointer. off_t isn't specified by the C standard, but by POSIX, and refers to the type used by lseek() to store a file offset. On 32-bit systems, off_t may be either 32 or 64 bits. off_t is a signed type. If you're going to read an entire map into memory, then using size_t for the number of cells or bytes is sufficient for any map which will actually fit into memory. OTOH, if you're processing the file row-by-row, it may not actually be possible to calculate the total number of cells or bytes. For any file which can be opened, off_t is guaranteed to be large enough to represent the size of the file on disk; without LFS, the OS will refuse to open a file larger than 2GiB. However, GRASS raster maps are normally compressed, so the number of bytes or cells in the decompressed data may exceed 2^31 even if the file is smaller than 2 GiB. -- Glynn Clements gl...@gclements.plus.com ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
[GRASS-dev] Usage of size_t
Hello, I'm moving size_t topic from user to dev list. As I understand, size_t should be used also for number of rows and number of columns of a map. But for example Rast_window_{rows,cols} returns int. So will this be also changed/fixed to size_t? Note that some instances of using int instead of size_t can be identified by GCC warning option -Wsign-conversion. It marks usage of int as an argument of malloc/G_calloc/... or usage of int for iterating C++ STL containers. Vaclav From topic NVIZ can't allocate enough memory on [GRASS-user]: From: Glynn Clements gl...@gclements.plus.com Date: Tue, Jun 12, 2012 at 5:23 AM Subject: Re: [GRASS-user] NVIZ can't allocate enough memory To: Daniel Lee l...@isi-solutions.org Cc: GRASS user list grass-u...@lists.osgeo.org Daniel Lee wrote: ERROR: G_malloc: unable to allocate 18446744071974792324 bytes at gsds.c:575 18446744071974792324 = 0x9899ac84 Does anybody know what could be the problem? If I'm interpreting the bytes correctly, that'd be about 10^10 gigabytes, which several orders of magnitude larger than the rasters involved. Thanks! A calculation has overflowed the range of a signed 32-bit integer resulting in a negative value. This value has then been converted to a signed 64-bit integer, and then to an unsigned 64-bit integer. Here's the output of g.region: rows: 32001 cols: 20001 cells: 640052001 At 4 bytes per cell, 640052001 cells = 2560208004 bytes. 2560208004 = 0x9899ac84 The maximum value representable by a 32-bit integer is 0x7fff = 2147483647. Using an unsigned integer would eliminate the problem in this particular case, but the region wouldn't need to be much larger in order for that to be insufficient (specifically, 32001 x 33553 would overflow). The correct solution is to use size_t rather than int. E.g. for this specific case: --- gsds.c (revision 52015) +++ gsds.c (working copy) @@ -481,7 +481,8 @@ int gsds_alloc_typbuff(int id, int *dims, int ndims, int type) { dataset *ds; - int i, siz = 1; + int i; + size_t siz = 1; if ((ds = get_dataset(id))) { /* More generally, it's important that the conversion comes before the mulitply; e.g.: size_t cells = (size_t) rows * cols; will work but: size_t cells = rows * cols; won't; the calculation will be perfomed using int, overflow, then the overflowed result will be promoted. However: the number of instances of this particular issue in the GRASS source code is probably somewhere between dozens and hundreds, and there's no systematic way to identify them. Running test cases with a region of = 2^32 cells (or even = 2^29 cells, i.e. = 2^31 bytes at 4 bytes per cell) would find a lot of them, but it requires a 64-bit system with plenty of RAM, and it's going to be slow. -- Glynn Clements gl...@gclements.plus.com ___ grass-user mailing list grass-u...@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-user ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev
Re: [GRASS-dev] Usage of size_t
On Thu, Jun 21, 2012 at 5:16 PM, Vaclav Petras wenzesl...@gmail.com wrote: Hello, I'm moving size_t topic from user to dev list. As I understand, size_t should be used also for number of rows and number of columns of a map. But for example Rast_window_{rows,cols} returns int. So will this be also changed/fixed to size_t? Considering current hardware, we struggle to support maps with 2147483647 (the maximum value representable by a 32-bit integer; Glynn's previous answer) cells = rows * columns. Therefore I guess that for the time being int is fine as data type for rows and columns. Supporting 2147483647 rows x 2147483647 columns is still a far way to go. Type size_t should be used for rows * columns * sizeof(some data) when allocating memory, and type off_t should be used for rows * columns * sizeof(some data) for disk IO. I would just cast as suggested by Glynn. Note that some instances of using int instead of size_t can be identified by GCC warning option -Wsign-conversion. It marks usage of int as an argument of malloc/G_calloc/... or usage of int for iterating C++ STL containers. The number of rows and columns should always be non-negative, thus no worries about sign conversion. Also note that when calling e.g. Rast_northing_to_row() which returns the row number, not the number of rows, the resulting row can be negative and must be treated as double or casted to some signed integer, not a potentially unsigned size_t. Markus M Vaclav From topic NVIZ can't allocate enough memory on [GRASS-user]: From: Glynn Clements gl...@gclements.plus.com Date: Tue, Jun 12, 2012 at 5:23 AM Subject: Re: [GRASS-user] NVIZ can't allocate enough memory To: Daniel Lee l...@isi-solutions.org Cc: GRASS user list grass-u...@lists.osgeo.org Daniel Lee wrote: ERROR: G_malloc: unable to allocate 18446744071974792324 bytes at gsds.c:575 18446744071974792324 = 0x9899ac84 Does anybody know what could be the problem? If I'm interpreting the bytes correctly, that'd be about 10^10 gigabytes, which several orders of magnitude larger than the rasters involved. Thanks! A calculation has overflowed the range of a signed 32-bit integer resulting in a negative value. This value has then been converted to a signed 64-bit integer, and then to an unsigned 64-bit integer. Here's the output of g.region: rows: 32001 cols: 20001 cells: 640052001 At 4 bytes per cell, 640052001 cells = 2560208004 bytes. 2560208004 = 0x9899ac84 The maximum value representable by a 32-bit integer is 0x7fff = 2147483647. Using an unsigned integer would eliminate the problem in this particular case, but the region wouldn't need to be much larger in order for that to be insufficient (specifically, 32001 x 33553 would overflow). The correct solution is to use size_t rather than int. E.g. for this specific case: --- gsds.c (revision 52015) +++ gsds.c (working copy) @@ -481,7 +481,8 @@ int gsds_alloc_typbuff(int id, int *dims, int ndims, int type) { dataset *ds; - int i, siz = 1; + int i; + size_t siz = 1; if ((ds = get_dataset(id))) { /* More generally, it's important that the conversion comes before the mulitply; e.g.: size_t cells = (size_t) rows * cols; will work but: size_t cells = rows * cols; won't; the calculation will be perfomed using int, overflow, then the overflowed result will be promoted. However: the number of instances of this particular issue in the GRASS source code is probably somewhere between dozens and hundreds, and there's no systematic way to identify them. Running test cases with a region of = 2^32 cells (or even = 2^29 cells, i.e. = 2^31 bytes at 4 bytes per cell) would find a lot of them, but it requires a 64-bit system with plenty of RAM, and it's going to be slow. -- Glynn Clements gl...@gclements.plus.com ___ grass-user mailing list grass-u...@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-user ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev ___ grass-dev mailing list grass-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/grass-dev