According to http://gdal.org/drv_sqlite.html you can unset
OGR_SQLITE_SYNCHRONOUS and it should improve performance. So, use
Sys.setenv(OGR_SQLITE_SYNCHRONOUS="OFF")
in your script/session before you wish to call writes. The SQLite and
GPKG writes are still 11x and 13x respectively slower than shapefile writes
but not terribly bad. Below is what resulted in these numbers. Output
from my .Rout file follows:
> library(sf)
Linking to GEOS 3.5.1, GDAL 2.2.0, proj.4 4.9.2
> library(rbenchmark)
>
> n <- 1000
> d <- data.frame(a=seq_len(n),x=rnorm(n,1,1),y=rnorm(n,1,1))
> mp1 <- st_as_sf(d,coords=c('x','y'))
>
> td <- tempdir()
> file.remove(list.files(td,full.names=TRUE))
logical(0)
> stwrite <- function(dat, dsn, ...) {
+ if (file.exists(dsn)) unlink(dsn)
+ st_write(dat,dsn=dsn, ...)
+ }
>
> Sys.setenv(OGR_SQLITE_SYNCHRONOUS="OFF")
>
> benchmark(
+ stwrite(mp1, dsn=file.path(td,'data1.shp'), driver='ESRI Shapefile'),
+ stwrite(mp1, dsn=file.path(td,'data2.sqlite'), layer="layer1",
driver='SQLite'),
+ stwrite(mp1, dsn=file.path(td,'data2.sqlite'), layer="layer2",
driver='SQLite'),
+ stwrite(mp1, dsn=file.path(td,'data3.gpkg'), layer="layer1"),
+ stwrite(mp1, dsn=file.path(td,'data3.gpkg'), layer="layer2"),
+ replications=5
+ )
test
1 stwrite(mp1, dsn = file.path(td, "data1.shp"), driver =
"ESRI Shapefile")
2 stwrite(mp1, dsn = file.path(td, "data2.sqlite"), layer = "layer1",
driver = "SQLite")
3 stwrite(mp1, dsn = file.path(td, "data2.sqlite"), layer = "layer2",
driver = "SQLite")
4 stwrite(mp1, dsn = file.path(td, "data3.gpkg"),
layer = "layer1")
5 stwrite(mp1, dsn = file.path(td, "data3.gpkg"),
layer = "layer2")
replications elapsed relative user.self sys.self user.child sys.child
1 5 0.034 1.000 0.028 0.008 0 0
2 5 0.374 11.000 0.060 0.312 0 0
3 5 0.359 10.559 0.084 0.272 0 0
4 5 0.454 13.353 0.192 0.260 0 0
5 5 0.451 13.265 0.172 0.280 0 0
> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.3 LTS
Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] tools stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] rbenchmark_1.0.0 sf_0.5-4
loaded via a namespace (and not attached):
[1] compiler_3.4.1 magrittr_1.5 DBI_0.7 units_0.4-5
Rcpp_0.12.12
[6] udunits2_0.13 grid_3.4.1
>
>
> proc.time()
user system elapsed
1.436 1.340 2.791
On Thu, Aug 24, 2017 at 5:44 PM, Loïc Dutrieux <[email protected]
> wrote:
>
>
> On 24/08/17 10:23, Roger Bivand wrote:
>
>> On Thu, 24 Aug 2017, [email protected] wrote:
>>
>> Dear list
>>>
>>> I am searching alternatives to ESRI shapefiles for the storage of GPS
>>> data, i.e. tagged point features, and came across SpatialLite or
>>> Geopackage. Unfortunately writing to both formats is very slow compared to
>>> shapefiles making practical use impossible.
>>>
>>> library(sf)
>>> library(rgdal)
>>> library(RSQLite)
>>>
>>> n<- 1000
>>> d <-data.frame(a=1:n, X=rnorm(n,1,1), Y=rnorm(n,1,1))
>>> mp1 <- st_as_sf(d, coords=c("X","Y"))
>>>
>>> t1 <- system.time(st_write(mp1, dsn = 'C:/Temp/data1.shp', driver =
>>> 'ESRI Shapefile'))
>>> t2 <- system.time(st_write(mp1, dsn = 'C:/Temp/test.sqlite', layer =
>>> 'data1', driver = 'SQLite'))
>>> t3 <- system.time(st_write(mp1, "C:/Temp/data1.gpkg"))
>>>
>>> rbind(t1,t2,t3)[,1:3]
>>>
>>> user.self sys.self elapsed
>>> t1 0.03 0.03 0.09
>>> t2 0.53 5.04 29.33
>>> t3 0.48 4.29 32.19
>>>
>>> As n increases, processing time explodes for SpatialLite and Geopackage,
>>> and I usually have a couple of 10000 points to store. Any experiences of
>>> others would be highly appreciated.
>>>
>>
>> Fedora 26 64-bit:
>>
>> n 1000
>>
>> rbind(t1,t2,t3)[,1:3]
>>>
>> user.self sys.self elapsed
>> t1 0.007 0.001 0.010
>> t2 0.067 0.035 0.103
>> t3 0.029 0.042 0.073
>>
>> n 25000
>>
>> rbind(t1,t2,t3)[,1:3]
>>>
>> user.self sys.self elapsed
>> t1 0.120 0.032 0.153
>> t2 0.412 0.829 1.247
>> t3 0.645 0.834 1.487
>>
>> R version 3.4.1 (2017-06-30)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>> Running under: Fedora 26 (Workstation Edition)
>> other attached packages:
>> [1] sf_0.5-3
>>
>> loaded via a namespace (and not attached):
>> [1] compiler_3.4.1 magrittr_1.5 tools_3.4.1 DBI_0.7
>> units_0.4-5
>> [6] Rcpp_0.12.12 udunits2_0.13 grid_3.4.1
>>
>>
> I also get large differences on ubuntu 16.04 64-bits with ssd;
> particularly when writing a second layer to an existing geopackage
>
> library(sf)
>
> n <- 1000
> d <- data.frame(a=1:n, X=rnorm(n,1,1), Y=rnorm(n,1,1))
> mp1 <- st_as_sf(d, coords=c("X","Y"))
>
> td <- tempdir()
> file.remove(list.files(td, full.names = TRUE))
>
> t1 <- system.time(st_write(mp1, dsn = file.path(td, 'data1.shp'), driver =
> 'ESRI Shapefile'))
> t2 <- system.time(st_write(mp1, dsn = file.path(td, 'data2.sqlite'), layer
> = 'layer1', driver = 'SQLite'))
> t3 <- system.time(st_write(mp1, dsn = file.path(td, 'data2.sqlite'), layer
> = 'layer2', driver = 'SQLite'))
> t4 <- system.time(st_write(mp1, dsn = file.path(td, 'data3.gpkg'), layer =
> 'layer1'))
> t5 <- system.time(st_write(mp1, dsn = file.path(td, 'data3.gpkg'), layer =
> 'layer2'))
>
> rbind(t1,t2,t3,t4,t5)[,1:3]
>
> user.self sys.self elapsed
> t1 0.012 0.000 0.010
> t2 0.180 0.456 8.993
> t3 0.220 0.460 10.637
> t4 0.016 0.064 0.082
> t5 0.200 0.472 9.199
>
> R version 3.4.0 (2017-04-21)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 16.04.2 LTS
>
> other attached packages:
> [1] sf_0.5-3 raster_2.5-8 sp_1.2-4
>
> loaded via a namespace (and not attached):
> [1] compiler_3.4.0 magrittr_1.5 DBI_0.6-1 tools_3.4.0
> units_0.4-5 yaml_2.1.14 Rcpp_0.12.10 udunits2_0.13 grid_3.4.0
> lattice_0.20-35
>
> Cheers,
> Loïc
>
>
> There is no need to load rgdal or RSQLite, neither are needed or used. For
>> portability use tempdir():
>>
>> t1 <- system.time(st_write(mp1, dsn = paste0(td, 'data1.shp')))
>> t2 <- system.time(st_write(mp1, dsn = paste0(td, 'test.sqlite'), layer =
>> 'data1', driver = 'SQLite'))
>> t3 <- system.time(st_write(mp1, paste0(td, 'data1.gpkg')))
>>
>> Maybe an order of magnitude difference because the databases need
>> initialising, but nothing like your scale; does 32/64 bit make a difference?
>>
>> I'm assuming that you installed sf as a Windows binary from CRAN?
>>
>> Consider using a github issue when others have tried tis out on other
>> platforms.
>>
>> Roger
>>
>> Many thanks
>>> Manuel
>>>
>>>
>>> ------
>>> R version 3.4.1 (2017-06-30)
>>> Platform: i386-w64-mingw32/i386 (32-bit)
>>> Running under: Windows 7 (build 7601) Service Pack 1
>>>
>>> Matrix products: default
>>>
>>> locale:
>>> [1] LC_COLLATE=German_Switzerland.1252 LC_CTYPE=German_Switzerland.1252
>>> [3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
>>> [5] LC_TIME=German_Switzerland.1252
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>>
>>> other attached packages:
>>> [1] sf_0.5-3 RSQLite_2.0 rgdal_1.2-8 sp_1.2-5
>>>
>>> loaded via a namespace (and not attached):
>>> [1] Rcpp_0.12.12 lattice_0.20-35 digest_0.6.12 grid_3.4.1
>>> DBI_0.7
>>> [6] magrittr_1.5 units_0.4-5 rlang_0.1.2 blob_1.1.0
>>> tools_3.4.1
>>> [11] udunits2_0.13 bit64_0.9-7 bit_1.1-12 compiler_3.4.1
>>> memoise_1.1.0
>>> [16] tibble_1.3.4
>>>
>>> _______________________________________________
>>> R-sig-Geo mailing list
>>> [email protected]
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>
>>>
>>
> _______________________________________________
> R-sig-Geo mailing list
> [email protected]
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>
[[alternative HTML version deleted]]
_______________________________________________
R-sig-Geo mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo