On Tue, 28 Apr 2020, Mathias Moser wrote:

Dear all,

sorry if this is an obvious observation (or even completely wrong), but
I got interested in what's happening here: I think you are hitting the
long vector limit of R [1] and from a look at rgdal's code this happens
because it uses 32-bit types INTSXP/REALSXP in RGDAL_GetRasterData()
which are limited to 2^31-1 n. Making getRasterData() work for your
data set would probably require patching rgdal for long vector support.

This looks probable, I'd got to the same point, as now I can max out swap on assigning values. Changes to lines 1574 and 1664 in src/gdal-bindings.cpp committed to SVN, revision 961, will be installable with

https://r-forge.r-project.org/R/?group_id=884 and if status green,
install.packages("rgdal", repos="http://R-Forge.R-project.org";)

in about 2 hours (R-Forge builds tarballs, slowly), or source tree now by

svn checkout svn://scm.r-forge.r-project.org/svnroot/rgdal/

I cannot check this because I have no access to a platform with enough RAM, so I need help here. I haven't been able to confirm that LENGTH(sRStorage) returns an R_xlen_t, or double, in lines 1672, 1685 or 1693. Is there a way of generating a large file using a GDAL app, perhaps, then I could just try reading a larger file if LENGTH plays up?

Roger


(read_gdal_data() of stars _seems_ to use R_xlen_t instead to
accomodate for 52bits [2]).

Best wishes,

Mathias


[1]
https://cloud.r-project.org/doc/manuals/r-patched/R-ints.html#Long-vectors
[2]
https://github.com/r-spatial/sf/blob/ea1bd716769ab8140d3451e3d902cfc79bc895d5/src/stars.cpp#L172


On Tue, 2020-04-28 at 13:39 +0200, Thorsten Behrens wrote:
Roger and Mike,

I really appreciate your help on this!

I had a look at getRasterData(). It results in the same error. Hence,
I
made further tests where I compared grids with the following numbers
of
cols and rows:

nPx = floor(sqrt(2^31 -1)) # 46340

and

nPx = ceiling(sqrt(2^31 -1)) # 46341

The result is clear. Raster data with 46340 * 46340 px can be loaded
with getRasterData() and with raster(), raster::as.matrix(), whereas
datasets with 46341 * 46341 px cannot and result in the error:

Error in getRasterData(gdCeil, band = 1) :
long vectors not supported yet: memory.c:3782

read_stars() works for both. You find the corresponding code below.

Are there any other option I can try?

Thorsten


Reproducible code:

## generate raster datasets
# 46340 * 46340 grid dataset
sFileNameFloor = "Floor.tif"

nPx = floor(sqrt(2^31 -1)) # 46340
nPx

rFloor = raster(nrow = nPx, ncol = nPx, ext = extent(c(0, nPx, 0,
nPx)))
values(rFloor) = 1

writeRaster(rFloor, sFileNameFloor, "GTiff", overwrite = TRUE, NAflag
=
-9999)


# 46341 * 46341 grid dataset
sFileNameCeil = "Ceil.tif"

nPx = ceiling(sqrt(2^31 -1))
nPx

rCeil = raster(nrow = nPx, ncol = nPx, ext = extent(c(0, nPx, 0,
nPx)))
values(rCeil) = 1

writeRaster(rCeil, sFileNameCeil, "GTiff", overwrite = TRUE, NAflag
=
-9999)


## load raster datasets with different methods

# load Ceil
gdCeil = GDAL.open(sFileNameCeil)
dim(gdCeil)

vnCeil = getRasterData(gdCeil, band = 1) # error

GDAL.close(gdCeil)
str(vnCeil)

stCeil = read_stars(sFileNameCeil) # all fine
str(stCeil[[1]])

rCeil = raster(sFileNameCeil)
mCeil = raster::as.matrix(rCeil) # error
str(mCeil)


# load Floor
gdFloor = GDAL.open(sFileNameFloor)
dim(gdFloor)

vnData = getRasterData(gdFloor, band = 1) # all fine

GDAL.close(gdFloor)
str(vnData)

stFloor= read_stars(sFileNameFloor) # all fine
str(stFloor[[1]])

rFloor = raster(sFileNameFloor)
mFloor = raster::as.matrix(rFloor) # all fine
str(mFloor)




Am 28.04.2020 um 12:10 schrieb Roger Bivand:
On Tue, 28 Apr 2020, Thorsten Behrens wrote:

Michael,

Thanks for the hint, it seems to work! Real-world tests will
follow in
the next few days...

So it definitely seems to be a problem of rgdal. It would be
great if it
could still be solved.

Not rgdal, but your use of it. Try looking at a sequence of

file <- system.file("pictures/SP27GTIF.TIF", package="rgdal")
obj <- GDAL.open(file)
(dims <- dim(obj))
band <- 1
data_vector <- getRasterData(obj, band)
GDAL.close(obj)
str(data_vector)

This does not create any more complicated objects, just a matrix.
In
some cases, the rows in the matrix are ordered S -> N, so it may
appear the wrong way up.

rgdal::getRasterData() is lightweight, and has many arguments
which
may be helpful. rgdal::readGDAL() is heavyweight, creating a
SpatialGridDataFrame object. This involves much copying of data,
but
the output object can be used for example in mapping or analysis
without further conversion. My guess is that
rgdal::getRasterData()
gives you your matrix directly. Look at the R code to see how
as.is=
etc. work (files may include scale and offset values - recently a
user
was confused that scale and offset were "magically" applied to
convert
Uint16 values to degrees Kelvin on reading). For example, if as.is
==
TRUE or scale == 1 and offset == 0, no copying of the input matrix
occurs because it is not converted. If you could check this route,
others following this thread could also benefit; if I'm wrong,
that
would also be good to know.

Roger


Best,

Thorsten



Am 27.04.2020 um 15:58 schrieb Michael Sumner:
Try stars it worked for me on a test

On Mon., 27 Apr. 2020, 23:54 Thorsten Behrens,
<
thorsten.m.behr...@gmail.com
 <mailto:
thorsten.m.behr...@gmail.com

wrote:

    Roger,

    thanks a lot for your reply!

    I have 256GB RAM installed (mentioned it somewhere). And
there,
    all is
    fine when I run:

    rDemTest = raster(nrow = 48000, ncol = 72000, ext =
extent(c(0,
72000,

    values(rDemTest) = 1

    When limiting the memory to about 8GB with
    ulimit::memory_limit(8000),
    the max size which can be allocated seems to be around
10000 x
    10000px.
    In this case all tests run fine. Unfortunately it seems to
be
    related to
    the size of the grid (48000 x 72000) and therefore the
problem
    can't be
    reproduced on machines with 8GB RAM. For some processing
steps I
need
    grids of that size in the memory, which is why I have 256
GB
    installed.

    Normally, I use the raster package and not
rgdal::readGDAL(). This
    was
    just a desperate attempt to find the source of the problem.

    This is what I use in my code:

    rDem = raster(sFileNameTiff)
    mDem = raster::as.matrix(rDem)

    But maybe this is the same...

    Any further suggestions are much appreciated!

    Thanks again!

    Best,

    Thorsten




    Am 27.04.2020 um 14:50 schrieb Roger Bivand:
  > On Mon, 27 Apr 2020, Thorsten Behrens wrote:
  >
  >> Dear all,
  >>
  >> my problem is that I want to read a big geotiff raster
dataset
    into R
  >> and convert it to a matrix, which does not work.
  >> The file is big but there is sufficient memory. I need
all the
    data
  >> in the memory at the same time.
  >>
  >> The error occurs under R 3.6.3 as well as 4.0.0 using
Ubuntu
20.04
  >> LTS with the latest version of the packages (see session
info
    below)
  >> and 256GB RAM installed.
  >>
  >> When loading the raster dataset using rgdal (via readGDAL
or
  >> raster::readAll) I get the follwoing error in R 4.0.0:
  >>
  >> ```
  >> Error in rgdal::getRasterData(con, offset = offs,
region.dim =
    reg,
  >> band = object@data@band) :
  >>   long vectors not supported yet: memory.c:3782
  >> ```
  >
  > On a 16GB Fedora platform:
  >
  >> library(raster) # 3.1-5
  >> rDemTest = raster(nrow = 48000, ncol = 72000, ext =
extent(c(0,
    72000,
  > 0,
  > + 48000))) # all fine
  >> rDemTest
  > class      : RasterLayer
  > dimensions : 48000, 72000, 3.456e+09  (nrow, ncol, ncell)
  > resolution : 1, 1  (x, y)
  > extent     : 0, 72000, 0, 48000  (xmin, xmax, ymin, ymax)
  > crs        : NA
  >
  >> values(rDemTest) = 1
  > Error: cannot allocate vector of size 25.7 Gb
  >
  > So you are deceiving yourself into thinking that all is
fine at
    this
  > point. Please try to instantiate an example that can be
    reproduced on
  > a machine with 8GB RAM.
  >
  > Further note that rgdal::readGDAL() is not how you handle
very
    large
  > objects in files, and never has been. raster can handle
blocks
    of data
  > from bands in file; stars and gdalcubes can use proxy=TRUE
for the
  > same purpose. Why did you choose rgdal::readGDAL() when
this is
not
  > its purpose?
  >
  > You did not say how much RAM is on your platform.
  >
  > Roger
  >
  >>
  >> In R 3.6.3 is is "... memory.c:3717"
  >>
  >> However, I can load the same file with the tiff package
and a
    file of
  >> the same size in the native raster package format (*.grd)
with
the
  >> raster package but again not with the rgdal package.
  >>
  >> gdalinfo (gdalUtils) does not complain (see below).
Hence, Even
  >> Rouault assumes the problem is related to rgdal and not
gdal
  >> (
https://github.com/OSGeo/gdal/issues/2442
).
  >>
  >> Below you find reproducible code, which generates a
raster file,
  >> saves the two formats (.tiff and .grd) and tries to read
them
with
  >> the different packages.
  >>
  >> Is this a known limitation? Any help is greatly
appreciated!
  >>
  >> Thanks a lot in advance!
  >>
  >> Best wishes and stay healthy,
  >> Thorsten
  >>
  >>
  >>
  >> ### Steps to reproduce the problem.
  >>
  >> R code:
  >>
  >> ```
  >> library(rgdal) # 1.4-8
  >> library(raster) # 3.1-5
  >> library(tiff) # 0.1-5
  >>
  >> ## generate and manipulate a big raster dataset
  >> # - generate
  >> rDemTest = raster(nrow = 48000, ncol = 72000, ext =
extent(c(0,
  >> 72000, 0, 48000))) # all fine
  >>
  >> # - manipulate
  >> values(rDemTest) = 1 # all fine
  >>
  >> # - convert
  >> mDemTest = raster::as.matrix(rDemTest) # all fine
  >> str(mDemTest)
  >>
  >> ## save a big dataset
  >>
  >> # - as raster/gdal
  >> sFileNameTiff = "BigData.tif"
  >> writeRaster(rDemTest, sFileNameTiff, "GTiff", overwrite =
TRUE,
  >> NAflag = -9999) # all fine
  >>
  >> # - as raster native
  >> sFileNameNative = "BigData.grd"
  >> writeRaster(rDemTest, sFileNameNative, "raster",
overwrite =
TRUE,
  >> NAflag = -9999) # all fine
  >>
  >>
  >> ## load the big raster datasets with different packages
and
options
  >> # - load the tiff data with the gdal package via the
raster
package
  >> rDem = raster(sFileNameTiff) # all fine
  >> extent(rDem) # all fine
  >> mDem = raster::as.matrix(rDem) # error
  >> rDem = readAll(rDem) # error
  >>
  >> # - load the native raster data with the raster package
  >> rDem = raster(sFileNameNative) # all fine
  >> extent(rDem) # all fine
  >> mDem = raster::as.matrix(rDem) # all fine
  >> str(mDem)
  >>
  >> # - load the tiff data with the tiff package
  >> mDem = readTIFF(sFileNameTiff) # all fine
  >> str(mDem)
  >>
  >> # - load the tiff data with the gdal package
  >> sfDem = readGDAL(sFileNameTiff) # error
  >>
  >> # - load the native raster data with the gdal package
  >> sfDem = readGDAL(sFileNameNative) # error
  >>
  >> ```
  >>
  >>
  >> ### Startup messages when rgdal is attached (requested by
Roger
    Bivand)
  >>>  library(rgdal)
  >> rgdal: version: 1.4-8, (SVN revision 845)
  >>  Geospatial Data Abstraction Library extensions to R
    successfully loaded
  >>  Loaded GDAL runtime: GDAL 3.0.4, released 2020/01/28
  >>  Path to GDAL shared files:
  >>  GDAL binary built with GEOS: TRUE
  >>  Loaded PROJ.4 runtime: Rel. 6.3.1, February 10th, 2020,
    [PJ_VERSION:
  >> 631]
  >>  Path to PROJ.4 shared files: (autodetected)
  >>  Linking to sp version: 1.4-1
  >>
  >>
  >> ### Session info
  >>>  sessionInfo()
  >> R version 4.0.0 (2020-04-24)
  >> Platform: x86_64-pc-linux-gnu (64-bit)
  >> Running under: Ubuntu 20.04 LTS
  >>
  >> Matrix products: default
  >> BLAS: /usr/lib/x86_64-linux-gnu/openblas-
pthread/libblas.so.3
  >> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-
pthread/liblapack.so.3
  >>
  >> locale:
  >>  [1] LC_CTYPE=de_DE.UTF-8       LC_NUMERIC=C
LC_TIME=de_DE.UTF-8
  >>  [4] LC_COLLATE=de_DE.UTF-8 LC_MONETARY=de_DE.UTF-8
  >> LC_MESSAGES=de_DE.UTF-8
  >>  [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C LC_ADDRESS=C
  >> [10] LC_TELEPHONE=C LC_MEASUREMENT=de_DE.UTF-8
  >> LC_IDENTIFICATION=C
  >>
  >> attached base packages:
  >> [1] stats     graphics  grDevices utils datasets methods
base
  >>
  >> other attached packages:
  >> [1] gdalUtils_2.0.3.2 rgdal_1.4-8       tiff_0.1-5
raster_3.1-5
  >> sp_1.4-1
  >>
  >> loaded via a namespace (and not attached):
  >>  [1] compiler_4.0.0    tools_4.0.0 Rcpp_1.0.4.6
  >> R.methodsS3_1.8.0 codetools_0.2-16
  >>  [6] grid_4.0.0        iterators_1.0.12 foreach_1.5.0
  >> R.utils_2.9.2     R.oo_1.23.0
  >> [11] lattice_0.20-41
  >>
  >>
  >> ### gdalInfo
  >>>  gdalinfo(sFileNameTiff)
  >>  [1] "Driver: GTiff/GeoTIFF"
  >>  [2] "Files: BigData.tif"
  >>  [3] "Size is 72000, 48000"
  >>  [4] "Origin = (0.000000000000000,48000.000000000000000)"
  >>  [5] "Pixel Size = (1.000000000000000,-
1.000000000000000)"
  >>  [6] "Image Structure Metadata:"
  >>  [7] "  COMPRESSION=LZW"
  >>  [8] "  INTERLEAVE=BAND"
  >>  [9] "Corner Coordinates:"
  >> [10] "Upper Left  (       0.000,   48000.000) "
  >> [11] "Lower Left  (   0.0000000,   0.0000000) "
  >> [12] "Upper Right (   72000.000,   48000.000) "
  >> [13] "Lower Right (   72000.000,       0.000) "
  >> [14] "Center      (   36000.000,   24000.000) "
  >> [15] "Band 1 Block=72000x1 Type=Float32,
ColorInterp=Gray"
  >> [16] "  Min=1.000 Max=1.000 "
  >> [17] "  Minimum=1.000, Maximum=1.000, Mean=nan,
StdDev=nan"
  >> [18] "  NoData Value=-9999"
  >> [19] "  Metadata:"
  >> [20] "    STATISTICS_MAXIMUM=1"
  >> [21] "    STATISTICS_MEAN=nan"
  >> [22] "    STATISTICS_MINIMUM=1"
  >> [23] "    STATISTICS_STDDEV=nan"
  >>
  >> _______________________________________________
  >> R-sig-Geo mailing list
  >>
R-sig-Geo@r-project.org
 <mailto:
R-sig-Geo@r-project.org

  >>
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

  >>
  >>
  >

    _______________________________________________
    R-sig-Geo mailing list

R-sig-Geo@r-project.org
 <mailto:
R-sig-Geo@r-project.org


https://stat.ethz.ch/mailman/listinfo/r-sig-geo



    [[alternative HTML version deleted]]

_______________________________________________
R-sig-Geo mailing list
R-sig-Geo@r-project.org

https://stat.ethz.ch/mailman/listinfo/r-sig-geo



_______________________________________________
R-sig-Geo mailing list
R-sig-Geo@r-project.org

https://stat.ethz.ch/mailman/listinfo/r-sig-geo


_______________________________________________
R-sig-Geo mailing list
R-sig-Geo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo


--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: roger.biv...@nhh.no
https://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

_______________________________________________
R-sig-Geo mailing list
R-sig-Geo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Reply via email to