Hi Phil and Roger, To close the loop on this issue, I have confirmed that the mapshaper javascript library (which rmapshaper wraps) can produce invalid polygons (https://github.com/ateucher/rmapshaper/issues/89), and I will address this in a future release of rmapshaper (https://github.com/ateucher/rmapshaper/issues/90). I have also alerted the maintainer of mapshaper and he is going to add functionality to address this as well (https://github.com/mbloch/mapshaper/issues/352).
Thanks for the thorough investigation and bringing this to my attention! Cheers, Andy Teucher (rmapshaper package maintainer) > Date: Tue, 20 Aug 2019 11:55:00 +0200 > From: Roger Bivand <roger.biv...@nhh.no> > To: Phil Haines <phil.haine...@gmail.com> > Cc: <r-sig-geo@r-project.org> > Subject: Re: [R-sig-Geo] rgdal::writeOGR with driver='ESRI Shapefile' > converts Polygon object into a hole > Message-ID: <alpine.lfd.2.21.1908201154410.12...@reclus.nhh.no> > Content-Type: text/plain; charset="us-ascii"; Format="flowed" > > Good, thanks very much! > > Roger > > On Mon, 19 Aug 2019, Phil Haines wrote: > >> Hi Roger, >> >> I have added an example to https://github.com/ateucher/rmapshaper/issues/89 >> >> Hopefully it illustrates the behaviour I was seeing from >> rmapshaper::ms_simplify() but please let me know if not. >> >> And thanks for the nudge towards GPKG, I will investigate. >> Phil >> >> On Mon, 19 Aug 2019 at 15:48, Roger Bivand <roger.biv...@nhh.no> wrote: >>> >>> Hi Phil, >>> >>> On Sun, 18 Aug 2019, Phil Haines wrote: >>> >>>> Hi Roger, >>>> >>>> I originally encountered this issue while trying to reduce the size of >>>> German postal code boundaries. I used rmapshaper::ms_simplify() which >>>> introduced the polygons sharing a common edge. I definitely didn't >>>> need them to be separate polygons, and would happily have merged them >>>> had I been more familiar with gUnaryUnion(). >>> >>> If the German postal code boundaries are open, could you please put (an >>> affected subset) on an URL, and provide a short script to replicate the >>> workflow. Then we can engage with the rmapshaper maintainer, and see >>> whether the underlying problem in the workflow is in the js library or its >>> R wrapper. I've opened: https://github.com/ateucher/rmapshaper/issues/89. >>> >>>> >>>> I apologise if my question has resulted in unnecessary effort on your >>>> part. I reported the behaviour because I found it surprising. However, >>>> I now understand that this example is not a valid simple feature >>>> geometry and that the solution is to take steps to ensure I have valid >>>> geometries, and to repair if not. >>>> >>> >>> No need to apologise!! Your reproducible example was excellent, without it >>> you wouldn't have contributed to getting the internal shapelib code >>> changed to make it less restrictive in future releases of GDAL, >>> potentially benefitting lots of users (who really should drop ESRI >>> shapefiles, and/or should check geometries for validity, but ... whose >>> work you've helped save). By the way, ESRI would be very happy if >>> shapefiles stopped being produced in current workflows, and that new files >>> should rather be GPKG. >>> >>>> Additionally I must confess that I only use the ESRI Shapefile driver >>>> because I don't know any better... My use cases are reading and >>>> writing from R (usually to produce maps) and occasionally opening in >>>> QGIS. I can happily switch to any format that supports this workflow. >>>> >>> >>> GPKG is now very broadly supported, is SF-compliant, and has the big >>> user-facing benefits of not having field name length restrictions, and >>> many fewer encoding issues for field names and string values. >>> >>> Best wishes, >>> >>> Roger >>> >>>> Thank you very much for your time, >>>> Phil >>>> >>>> On Sun, 18 Aug 2019 at 13:59, Roger Bivand <roger.biv...@nhh.no> wrote: >>>>> >>>>> The reason that the problem occurred is that a MULTIPOLYGON with two >>>>> exterior rings becomes invalid if the exterior rings touch along an edge >>>>> (this case). It is important to know the use case, to see whether: >>>>> >>>>> library(rgeos) >>>>> writeWKT(Ps1) >>>>> gIsValid(Ps1, reason=TRUE) >>>>> Ps1a <- gUnaryUnion(gBuffer(Ps1, width=0)) >>>>> gIsValid(Ps1a, reason=TRUE) >>>>> writeWKT(Ps1a) >>>>> >>>>> or equivalently in sf for sf objects, should be applied before trying to >>>>> write the object out to file with this driver. >>>>> >>>>> However, because drivers that are compliant with the simple features >>>>> standard (which bans exterior rings sharing edges) have been permissive >>>>> and do round-trip this invalid object, a relaxation in the OGR ESRI >>>>> shapefile driver has been provided and may be included in a future >>>>> release. >>>>> >>>>> We need to know (see these issues): >>>>> >>>>> https://github.com/r-spatial/sf/issues/1130 >>>>> https://github.com/OSGeo/gdal/issues/1787 >>>>> >>>>> why it was desirable to write out this object using this driver? Could an >>>>> alternative driver have been used, or is ESRI shapefile the only format >>>>> used in the workflow? >>>>> >>>>> If it has to be this driver, could the workflow be changed to repair >>>>> degenerate cases before writing? If using sp classes, rgeos may be used to >>>>> test for and probably repair such geometries before they reach >>>>> rgdal::writeOGR() for this driver. Adding code to sf and rgdal to trap >>>>> degenerate cases does encumber all users with valid geometries with >>>>> the time wasted on extra checking, so building checks into sf and rgdal is >>>>> not desirable. >>>>> >>>>> I hope it is possible to find out more about the use case quickly, to pass >>>>> on to GDAL developers to help motivate a relaxation in their current >>>>> policy with regard to this driver, and to encourage them to include the >>>>> fix branch in a future release of GDAL. >>>>> >>>>> Roger >>>>> >>>>> On Sat, 17 Aug 2019, Roger Bivand wrote: >>>>> >>>>>> Please follow up both here and on: >>>>>> >>>>>> https://github.com/r-spatial/sf/issues/1130 >>>>>> >>>>>> as the problem is also seen in the sf package using the same GDAL ESRI >>>>>> Shapefile driver. >>>>>> >>>>>> Roger >>>>>> >>>>>> On Fri, 16 Aug 2019, Roger Bivand wrote: >>>>>> >>>>>>> On Fri, 16 Aug 2019, Roger Bivand wrote: >>>>>>> >>>>>>>> On Tue, 13 Aug 2019, Phil Haines wrote: >>>>>>>> >>>>>>>>> Dear list, >>>>>>>>> >>>>>>>>> I have a single Polygons object containing multiple Polygon objects >>>>>>>>> that share a common border. When I output this using writeOGR() one >>>>>>>>> of >>>>>>>>> the Polygon objects becomes a hole, as the following example shows. >>>>>>>>> >>>>>>>>> Create a Polygons object containing two adjoining Polygon objects >>>>>>>>>> library(rgdal) >>>>>>>>>> r1 <- rbind(c(1,1),c(1,2),c(2,2),c(2,1),c(1,1)) >>>>>>>>>> r2 <- r1; r2[,1] <- r2[,1]+1 >>>>>>>>>> Ps1 = Polygons(list(Polygon(r1),Polygon(r2)),ID=1) >>>>>>>>>> SPDF = SpatialPolygonsDataFrame( SpatialPolygons(list(Ps1)), >>>>>>>>>> data.frame(Example=c("Minimal"))) >>>>>>>>> >>>>>>>>> Perform a write/readOGR() cycle >>>>>>>>>> fn <- tempfile() >>>>>>>>>> writeOGR(SPDF, fn, layer='test', driver='ESRI Shapefile') >>>>>>>>>> SPDF2 <- readOGR(dsn=fn,layer='test') >>>>>>>>> >>>>>>>>> Second Polygon object is now a hole >>>>>>>>>> sapply(SPDF2@polygons[[1]]@Polygons,slot,"hole") >>>>>>>>> [1] FALSE TRUE >>>>>>>>> >>>>>>>>> I see from the sp documentation that "Polygon objects belonging to a >>>>>>>>> Polygons object should either not overlap one-other, or should be >>>>>>>>> fully included" but I am not sure how this relates to bordering >>>>>>>>> Polygon objects. I would welcome any advice as to whether what I am >>>>>>>>> asking of writeOGR is reasonable? >>>>>>>> >>>>>>>> The problem is with the 'ESRI Shapefile' representation and driver: >>>>>>>> >>>>>>>> library(rgdal) >>>>>>>> r1 <- rbind(c(1,1),c(1,2),c(2,2),c(2,1),c(1,1)) >>>>>>>> r2 <- r1; r2[,1] <- r2[,1]+1 >>>>>>>> Ps1 = Polygons(list(Polygon(r1),Polygon(r2)),ID=1) >>>>>>>> SPDF = SpatialPolygonsDataFrame( SpatialPolygons(list(Ps1)), >>>>>>>> data.frame(Example=c("Minimal"))) >>>>>>>> sapply(slot(slot(SPDF, "polygons")[[1]], "Polygons"), slot, "hole") >>>>>>>> sapply(slot(slot(SPDF, "polygons")[[1]], "Polygons"), slot, "ringDir") >>>>>>>> >>>>>>>> # which constructs a MULTIPOLYGON object: >>>>>>>> >>>>>>>> rgeos::writeWKT(SPDF) >>>>>>>> library(sf) >>>>>>>> st_as_text(st_geometry(st_as_sf(SPDF))) >>>>>>>> >>>>>>>> # The 'ESRI Shapefile' driver is not Simple-Feature compliant (it >>>>>>>> # pre-dates it), so the failure occurs by seeing the second exterior >>>>>>>> # ring >>>>>>>> # as an interior ring >>>>>>>> >>>>>>>> fn <- tempfile() >>>>>>>> writeOGR(SPDF, fn, layer='test', driver='ESRI Shapefile') >>>>>>>> SPDF2 <- readOGR(dsn=fn, layer='test') >>>>>>>> sapply(slot(slot(SPDF2, "polygons")[[1]], "Polygons"), slot, "hole") >>>>>>>> sapply(slot(slot(SPDF2, "polygons")[[1]], "Polygons"), slot, >>>>>>>> "ringDir") >>>>>>>> rgeos::writeWKT(SPDF2) >>>>>>>> st_as_text(st_geometry(st_as_sf(SPDF2))) >>>>>>>> >>>>>>>> # This happens with sf too, using the same GDAL driver: >>>>>>>> >>>>>>>> sf2 <- st_read(dsn=fn, layer='test') >>>>>>>> st_geometry(sf2) >>>>>>>> library(sf) >>>>>>>> st_as_text(st_geometry(st_as_sf(sf2))) >>>>>>>> rgeos::writeWKT(as(sf2, "Spatial")) >>>>>>>> >>>>>>>> # Adding the comment fix doesn't help: >>>>>>>> >>>>>>>> comment(slot(SPDF, "polygons")[[1]]) >>>>>>>> SPDF_c <- rgeos::createSPComment(SPDF) >>>>>>>> comment(slot(SPDF_c, "polygons")[[1]]) >>>>>>>> writeOGR(SPDF_c, fn, layer='test_c', driver='ESRI Shapefile', >>>>>>>> verbose=TRUE) >>>>>>>> >>>>>>>> # reports >>>>>>>> # Object initially classed as: wkbPolygon >>>>>>>> # SFS comments in Polygons objects >>>>>>>> # Object reclassed as: wkbMultiPolygon >>>>>>>> >>>>>>>> SPDF2_c <- readOGR(dsn=fn, layer='test_c') >>>>>>>> sapply(slot(slot(SPDF2_c, "polygons")[[1]], "Polygons"), slot, "hole") >>>>>>>> sapply(slot(slot(SPDF2_c, "polygons")[[1]], "Polygons"), slot, >>>>>>>> "ringDir") >>>>>>>> rgeos::writeWKT(SPDF2_c) >>>>>>>> st_as_text(st_geometry(st_as_sf(SPDF2_c))) >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> # If the input object is written out with the GeoPackage driver: >>>>>>>> >>>>>>>> fn1 <- tempfile(fileext=".gpkg") >>>>>>>> writeOGR(SPDF, fn1, layer="test", driver='GPKG') >>>>>>>> sf2a <- st_read(dsn=fn1,layer='test') >>>>>>>> st_coordinates(st_geometry(sf2a)) >>>>>>>> SPDF2a <- readOGR(dsn=fn1) >>>>>>>> sapply(slot(slot(SPDF2a, "polygons")[[1]], "Polygons"), slot, "hole") >>>>>>>> sapply(slot(slot(SPDF2a, "polygons")[[1]], "Polygons"), slot, >>>>>>>> "ringDir") >>>>>>>> rgeos::writeWKT(SPDF2a) >>>>>>>> st_as_text(st_geometry(st_as_sf(SPDF2a))) >>>>>>>> >>>>>>>> # the issue is resolved. If we separate the exterior rings: >>>>>>>> >>>>>>>> r2a <- r1; r2a[,1] <- r2a[,1]+1.00001 >>>>>>>> Ps1a = Polygons(list(Polygon(r1),Polygon(r2a)),ID=1) >>>>>>>> SPDFa = SpatialPolygonsDataFrame( SpatialPolygons(list(Ps1a)), >>>>>>>> data.frame(Example=c("Minimal"))) >>>>>>>> fna <- tempfile() >>>>>>>> writeOGR(SPDFa, fna, layer='test', driver='ESRI Shapefile') >>>>>>>> SPDF2_a <- readOGR(dsn=fna, layer='test') >>>>>>>> sapply(slot(slot(SPDF2_a, "polygons")[[1]], "Polygons"), slot, "hole") >>>>>>>> sapply(slot(slot(SPDF2_a, "polygons")[[1]], "Polygons"), slot, >>>>>>>> "ringDir") >>>>>>>> rgeos::writeWKT(SPDF2_a) >>>>>>>> st_as_text(st_geometry(st_as_sf(SPDF2_a))) >>>>>>>> >>>>>>>> # we are OK as the two exterior rings do not touch. >>>>>>>> >>>>>>>> # does using sf make a difference? >>>>>>>> >>>>>>>> fn_s <- tempfile(fileext=".shp") >>>>>>>> st_write(st_as_sf(SPDF), dsn=fn_s) >>>>>>>> sf_in <- st_read(fn_s) >>>>>>>> st_as_text(st_geometry(st_as_sf(sf_in))) >>>>>>>> >>>>>>>> # No >>>>>>>> >>>>>>>> fn_s <- tempfile(fileext=".shp") >>>>>>>> st_write(st_as_sf(SPDF_c), dsn=fn_s) >>>>>>>> sf_in_c <- st_read(fn_s) >>>>>>>> st_as_text(st_geometry(st_as_sf(sf_in_c))) >>>>>>>> >>>>>>>> # nor with the pretend-SF-compliant comment set either. >>>>>>>> >>>>>>>> So the weakness is in the "ESRI Shapefile" write driver, or possibly >>>>>>>> in >>>>>>>> the OGRGeometryFactory::organizePolygons() function in GDAL used in >>>>>>>> OGR_write() (a C++ function) called by writeOGR(). If sf::st_write() >>>>>>>> also >>>>>>>> calls OGRGeometryFactory::organizePolygons(), we'd maybe consider that >>>>>>>> it >>>>>>>> has a weakness for the "ESRI Shapefile" driver, but which does not >>>>>>>> affect >>>>>>>> SF-compliant drivers. >>>>>>> >>>>>>> Without the comment set, OGRGeometryFactory::organizePolygons() is used; >>>>>>> with it set, OGRGeometryFactory::organizePolygons() is not used, because >>>>>>> the object is declared as two exterior rings. In both cases, we have the >>>>>>> output object written out and read back in incorrectly with the ESRI >>>>>>> shapefile driver, but SF-compliant drivers round-trip (in the test >>>>>>> GeoJSON), correctly. >>>>>>> >>>>>>> It is likely that the changes made in 2015 to accommodate GeoJSON led to >>>>>>> this possible regression for the ESRI Shapefile driver. I'm adding this >>>>>>> geometry to tests/tripup.R and data files; without code changes the hole >>>>>>> slot is wrong and the ring direction is changes to match, so the >>>>>>> coordinates change too. >>>>>>> >>>>>>> Reading using the deprecated maptools::readShapeSpatial() also gets a >>>>>>> hole >>>>>>> in the second external ring. However, writing with deprecated >>>>>>> maptools:: writeSpatialShape() yields a shapefile that when read with >>>>>>> maptools:: readShapeSpatial() gives the correct two exterior ring, no >>>>>>> hole >>>>>>> object. When read with sf::st_read() and rgdal::readOGR(), the object is >>>>>>> also correct. So the problem definitely lies in rgdal::writeOGR(), and >>>>>>> sf::st_write() - roundtripping with sf::st_write() and sf::st_read() >>>>>>> degrades from MULTIPOLYGON to POLYGON with the ESRI Shapefile driver. >>>>>>> >>>>>>> Roger >>>>>>> >>>>>>>> >>>>>>>> This is probably related to a similar but inverse problem with the >>>>>>>> SF-compliant GeoJSON driver in 2015: >>>>>>>> >>>>>>>> https://stat.ethz.ch/pipermail/r-sig-geo/2015-October/023609.html >>>>>>>> >>>>>>>> continued the next month in: >>>>>>>> >>>>>>>> https://stat.ethz.ch/pipermail/r-sig-geo/2015-November/023656.html >>>>>>>> >>>>>>>> The details are in this SVN diff >>>>>>>> >>>>>>>> >>>>>>>> https://r-forge.r-project.org/scm/viewvc.php/pkg/src/OGR_write.cpp?root=rgdal&r1=555&r2=571 >>>>>>>> >>>>>>>> up to the end og the list thread, and this one from then until now: >>>>>>>> >>>>>>>> >>>>>>>> https://r-forge.r-project.org/scm/viewvc.php/pkg/src/OGR_write.cpp?root=rgdal&r1=571&r2=733 >>>>>>>> >>>>>>>> Summary: could you change drivers, or is it really necessary to fix an >>>>>>>> EOL >>>>>>>> problem? What is your use case? >>>>>>>> >>>>>>>>> >>>>>>>>> Thanks in advance for your time, >>>>>>>> >>>>>>>> Thanks for a complete example, >>>>>>>> >>>>>>>> Roger >>>>>>>> >>>>>>>>> Phil >>>>>>>>> >>>>>>>>>> sessionInfo() >>>>>>>>> R version 3.5.1 (2018-07-02) >>>>>>>>> Platform: x86_64-w64-mingw32/x64 (64-bit) >>>>>>>>> Running under: Windows >= 8 x64 (build 9200) >>>>>>>>> >>>>>>>>> Matrix products: default >>>>>>>>> >>>>>>>>> locale: >>>>>>>>> [1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United >>>>>>>>> Kingdom.1252 LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C >>>>>>>>> LC_TIME=English_United Kingdom.1252 >>>>>>>>> >>>>>>>>> attached base packages: >>>>>>>>> [1] stats graphics grDevices utils datasets methods base >>>>>>>>> >>>>>>>>> other attached packages: >>>>>>>>> [1] rgdal_1.4-4 sp_1.3-1 >>>>>>>>> >>>>>>>>> loaded via a namespace (and not attached): >>>>>>>>> [1] compiler_3.5.1 tools_3.5.1 yaml_2.2.0 grid_3.5.1 >>>>>>>>> lattice_0.20-35 >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> R-sig-Geo mailing list >>>>>>>>> R-sig-Geo@r-project.org >>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>>> -- >>>>> Roger Bivand >>>>> Department of Economics, Norwegian School of Economics, >>>>> Helleveien 30, N-5045 Bergen, Norway. >>>>> voice: +47 55 95 93 55; e-mail: roger.biv...@nhh.no >>>>> https://orcid.org/0000-0003-2392-6140 >>>>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en >>>> >>> >>> -- >>> Roger Bivand >>> Department of Economics, Norwegian School of Economics, >>> Helleveien 30, N-5045 Bergen, Norway. >>> voice: +47 55 95 93 55; e-mail: roger.biv...@nhh.no >>> https://orcid.org/0000-0003-2392-6140 >>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en >> > > -- > Roger Bivand > Department of Economics, Norwegian School of Economics, > Helleveien 30, N-5045 Bergen, Norway. > voice: +47 55 95 93 55; e-mail: roger.biv...@nhh.no > https://orcid.org/0000-0003-2392-6140 > https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en _______________________________________________ R-sig-Geo mailing list R-sig-Geo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo