Oh, I get it (duh!): overhead = minimum file size. Makes sense, since every .gpkg is its own SQLite instance - and 5MB is a small price to pay for what is essentially an RDBMS in a single file. Still, something to bear in mind while designing one's information architecture for the GIS. Thanks, Charles!
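Since a .gpkg really is just a SQLite 3 database, that claim can be verified with nothing but the Python standard library. A minimal sketch (the `looks_like_gpkg` helper here is made up for illustration, not a GDAL/QGIS API; note that GeoPackage versions before 1.2 stamped "GP10"/"GP11" rather than "GPKG" into the header, so this check is deliberately narrow):

```python
import struct

# "GPKG" as a big-endian 32-bit integer; GeoPackage >= 1.2 writes this
# into the application_id field of the SQLite header (offset 68).
GPKG_APPLICATION_ID = 0x47504B47

def looks_like_gpkg(path):
    """Cheap sanity check that a file is a GeoPackage-flavoured SQLite
    database, reading only the first 72 bytes of the file."""
    with open(path, "rb") as f:
        header = f.read(72)
    if not header.startswith(b"SQLite format 3\x00"):
        return False  # not a SQLite 3 database at all
    (app_id,) = struct.unpack(">I", header[68:72])
    return app_id == GPKG_APPLICATION_ID
```

A side effect of this is that any SQLite client (the sqlite3 CLI, Python's sqlite3 module) can open a .gpkg directly, which is handy for poking at the metadata tables.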
On Tue, Aug 11, 2020 at 9:02 PM Charles Dixon-Paver <char...@kartoza.com> wrote:

> Sorry for the confusion Walt, but the "overhead" I was referring to here
> is actually the fact that gpkg is implemented as a SQLite container with
> a *minimum* filesize which adds a couple of MB. I think the "overhead"
> will vary depending on the type of data stored. Basically, if you make
> one for every shapefile, you could probably expect to end up with an
> additional ~5MB of bloat to your existing data store for each shapefile
> converted...
>
> Upper limits, as you stated, should be (in theory) ~140TB, or at least
> somewhere upwards of whatever I would usually consider practical to
> store in a database that lives in a single flat file...
>
> Regarding GeoMoose on Mac, you could try using Docker to test it out:
> https://github.com/geomoose/docker-geomoose
>
> In terms of the specifics of how to restructure your data
> infrastructure, it seems like it's going to depend a lot on your use
> case, and it is probably outside the scope of this mailing list, or at
> least this thread... Migrating projects is another beast altogether, so
> maybe someone else can offer advice on that.
>
> Regards
>
> On Tue, 11 Aug 2020 at 20:20, Walt Ludwick <w...@valedalama.net> wrote:
>
>> This makes good sense to me, Charles. I've got enough experience with
>> databases (though not so much with geographic ones) that I'm
>> comfortable with SQL query tools. Unless a list or directory is small
>> enough to eyeball with ease (certainly the case with this legacy QGIS
>> instance I've inherited), I'd much rather search than dig for the
>> data, so... In this sense at least, less fragmentation is more.
>>
>> That being said: I don't know if I can bundle it all into a single
>> .gpkg; if there is a size limit as low as 5MB on each one, then
>> certainly not.
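For what it's worth, the ~140TB figure that keeps coming up is just SQLite's theoretical ceiling: maximum page count times the maximum 64 KiB page size. A quick back-of-envelope check in Python, using the historical 2^31 - 1 page-number limit (newer SQLite builds raise it further):

```python
# SQLite's theoretical maximum database size, back of the envelope:
# largest page number (historically 2**31 - 1) times the maximum
# page size of 64 KiB.
max_pages = 2**31 - 1
page_size = 64 * 1024           # bytes

limit_bytes = max_pages * page_size
print(limit_bytes)              # -> 140737488289792
print(limit_bytes // 10**12)    # -> 140, i.e. ~140 decimal terabytes
```

So the limit is a property of the SQLite container itself, not anything GeoPackage-specific, and in practice the filesystem gives out first.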
>> A Google search on the string "GeoPackage size limit" returns multiple
>> credible-looking pages that cite a limit (subject to filesystem
>> constraints) of 140TB. Can you clarify the "~5MB of storage overhead
>> for each unique .gpkg" comment?
>>
>> In any case: if I go for selective consolidation - selection scheme
>> still TBD[1] - then I must certainly bear in mind your caution about
>> the data-loss risk associated with careless use of certain processing
>> tools/configurations. If there are tools & configs oriented to one &
>> only one .gpkg file, I don't yet know about them... But I'll certainly
>> watch out for that and keep a good backup!
>>
>> [1] As to selection (or classification, I should say) and naming of
>> the .gpkg files that will consolidate any number of .shp files: I am
>> thinking along the lines of either data type (raster and vector being
>> two high-level groupings, with subtypes that might have more to do
>> with the schema of the tabular data), or else data source (which often
>> has much to do with data reliability, maintainability - and value,
>> ultimately). I need to think a bit more deeply on this, and would be
>> happy for any guidance from more experienced GIS admins.
>>
>> On Tue, Aug 11, 2020 at 2:31 PM Charles Dixon-Paver
>> <char...@kartoza.com> wrote:
>>
>>> Regarding the one-vs-many approach to gpkgs, I recommend
>>> consolidation (within reason). I feel that the temptation to use gpkg
>>> as a drop-in replacement for shp stems from familiarity with
>>> processes I personally consider to be largely outmoded. I think it's
>>> worth getting over the initial (relatively shallow) learning curve,
>>> so that when you start working with db-oriented systems like PostGIS,
>>> everything makes sense right out of the gate.
>>>
>>> Basically it boils down to how you want to manage or distribute them,
>>> as you don't have traditional db roles.
>>> Personally, I try to package things into "data.gpkg/something" and
>>> "data.gpkg/somethingelse" wherever possible, rather than "a.gpkg/a"
>>> and "b.gpkg/b". It usually makes moving data around easier for me. If
>>> you have a lot of inputs, maybe split them into unique gpkgs based on
>>> some categorising criteria (like you might do with a schema) rather
>>> than one monolithic gpkg. Performing maintenance (VACUUM) on a large
>>> number of unique gpkgs seems like an unnecessary chore.
>>>
>>> One limitation of gpkg is that certain processing tools/
>>> configurations will only support writing to an entire gpkg, so if you
>>> lack experience you'll need to be careful not to overwrite all of
>>> your data, and you should also have a decent backup plan in place.
>>> Usually you can get away with utilising a scratch.gpkg for that
>>> purpose, with no risk to your primary datastore.
>>>
>>> Using one gpkg per feature offers little data management benefit over
>>> shapefiles, aside from removing the auxiliary files and being able to
>>> store styles. There is little performance benefit over shp directly,
>>> from what I understand (both use WKB), but there is ~5MB of storage
>>> overhead for each unique gpkg (if I remember correctly), though this
>>> will depend on your use case.
>>>
>>> Hope that helps.
>>>
>>> On Tue, 11 Aug 2020 at 15:13, Basques, Bob (CI-StPaul)
>>> <bob.basq...@ci.stpaul.mn.us> wrote:
>>>
>>>> Depending on your end goal, you might be better suited to leaving
>>>> things as they are and using some sort of content explorer to
>>>> organize the existing data. Then worry about migrating to different
>>>> formats as needed.
>>>>
>>>> We've been using GeoMoose for this purpose. It can connect to just
>>>> about any data source on the back end, such as SHP, Postgres, and
>>>> GeoPackage to name a few, but it can also connect to proprietary
>>>> services.
>>>> Because it can use MapServer as a display engine and data query
>>>> tool, it lends itself to online exploration of the data without the
>>>> need for a full-blown GIS tool. This allows for widespread use by
>>>> non-GIS pros. The datasets can still be managed by you with QGIS
>>>> and/or in Postgres/PostGIS, or whatever you prefer for that purpose.
>>>> The MapServer setup allows for connecting to just about any type of
>>>> service behind the scenes, and with the right configuration you can
>>>> also enable each dataset in the GeoMoose catalog as a WMS/WFS data
>>>> source, the standard for open data access and publishing.
>>>>
>>>> Bobb
>>>>
>>>> From: Qgis-user <qgis-user-boun...@lists.osgeo.org> On Behalf Of
>>>> Walt Ludwick
>>>> Sent: Tuesday, August 11, 2020 7:45 AM
>>>> To: qgis-user@lists.osgeo.org
>>>> Subject: Re: [Qgis-user] Migrating legacy QGIS instance
>>>>
>>>> I'm on macOS - and not so very comfortable with command-line
>>>> scripting - so it looks like I might have to go the drag & drop way
>>>> to import these .shp files. It will take some time, but at least
>>>> that way I can be sure about what I've put where, and in what form.
>>>>
>>>> But I do wonder about the (a) "stick multiple shps into a single
>>>> gpkg" or (b) "create one per feature" decision, since I'm not
>>>> experienced enough to have a clear preference. Can you say anything
>>>> about the pros & cons of going one way vs. the other?
>>>>
>>>> On Tue, Aug 11, 2020 at 11:45 AM Charles Dixon-Paver
>>>> <char...@kartoza.com> wrote:
>>>>
>>>> The easiest way for me is to use the GDAL ogr2ogr command
>>>> <https://gdal.org/programs/ogr2ogr.html> with a bash script or cmd
>>>> batch file to traverse your directories (depending on how you
>>>> installed QGIS, this should be on your path).
>>>> I don't know what environment you're running, though.
>>>>
>>>> You can either stick multiple shps into a single gpkg or create one
>>>> per feature, as you prefer. ogr2ogr can also push shp files directly
>>>> into PostGIS. When you want to consolidate or migrate data (between
>>>> gpkgs, or from gpkg to PostGIS), you can simply select the feature
>>>> layers you want and use drag and drop from the QGIS 3 Browser panel
>>>> to copy multiple features to a target location.
>>>>
>>>> Others might have different approaches, though.
>>>>
>>>> Regards
>>>>
>>>> On Tue, 11 Aug 2020 at 12:24, Walt Ludwick <w...@valedalama.net>
>>>> wrote:
>>>>
>>>> I've inherited a legacy GIS, built up over some years in versions
>>>> 2.x, that I'm now responsible for maintaining. Being an almost
>>>> complete n00b (I did take a short course in QGIS a good few years
>>>> ago, but still...), I could really use some advice about migration.
>>>>
>>>> I've created a new QGIS instance in version 3.14, into which I am
>>>> trying to bring all useful content from our old system: oodles of
>>>> shapefiles, essentially, plus all those other files (each .shp file
>>>> appears to bring with it a set of .shx, .dbf, .prj, and .qpj files,
>>>> plus a .cpg file for each layer, it seems). This is a significant
>>>> dataset - 14GB, >1000 files - and that is just base data, not
>>>> counting the Projects built on this data or the Layouts used for
>>>> presenting those projects in various ways. Some of this is cruft
>>>> that I can happily do without, but still: I've got a lot of
>>>> porting-over to do, without a clear idea of how best to do it.
>>>>
>>>> The one thing I'm clear about is: I want it all in a non-proprietary
>>>> database (i.e. no more mess of .shp and related files) that is above
>>>> all quick & easy to navigate & manage. It is a single-user system at
>>>> this point, but I do aim to open it up to colleagues (off-LAN, i.e.
>>>> via the Internet) as soon as I've developed simple apps for them to
>>>> use. No idea how long it'll take me to get there, so...
>>>>
>>>> The big question at this point is: what should be the new storage
>>>> format for all this data? Having read a few related opinions on
>>>> StackOverflow, I get the sense that GeoPackage will probably make
>>>> for the easiest migration (per this encouraging article
>>>> <https://medium.com/@GispoFinland/learn-spatial-sql-and-master-geopackage-with-qgis-3-16b1e17f0291>,
>>>> it's a simple matter of drag & drop - simple if you have just a few,
>>>> I guess! [1]) and can easily support my needs in the short term, but
>>>> then I wonder: how will I manage the migration to PostGIS when I
>>>> eventually put this system online, with different users/roles
>>>> enabled?
>>>>
>>>> [1] Given that I need to pull in some hundreds of .shp files that
>>>> are stored in a tree of many folders & subfolders, I also wonder: is
>>>> there a simple way to ask QGIS to traverse a given directory and
>>>> pull in all the .shp files - each as its own .gpkg layer, I suppose?
>>>>
>>>> Any advice about managing this migration would be much appreciated!
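To make footnote [1] concrete: rather than drag & drop, a short script can walk the folder tree and emit one ogr2ogr call per shapefile, appending each as a named layer of a single consolidated GeoPackage. This is only a sketch of the approach Charles describes; the `ogr2ogr_commands` helper and the `consolidated.gpkg` name are made up for illustration, and the commands are returned unexecuted so they can be reviewed first (or handed to `subprocess.run`):

```python
import os

def ogr2ogr_commands(root, gpkg="consolidated.gpkg"):
    """Walk `root` for shapefiles and build one ogr2ogr invocation per
    .shp, appending each as its own layer in a single GeoPackage.
    -append updates an existing .gpkg rather than recreating it, and
    -nln names the layer after the source file."""
    cmds = []
    for dirpath, _dirs, names in os.walk(root):
        for name in sorted(names):
            if name.lower().endswith(".shp"):
                layer = os.path.splitext(name)[0]
                cmds.append([
                    "ogr2ogr", "-f", "GPKG", "-append", gpkg,
                    os.path.join(dirpath, name), "-nln", layer,
                ])
    return cmds
```

Printing the list before running anything also gives a quick inventory of every shapefile hiding in the tree, which is useful in its own right when deciding what is cruft.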
_______________________________________________
Qgis-user mailing list
Qgis-user@lists.osgeo.org
List info: https://lists.osgeo.org/mailman/listinfo/qgis-user
Unsubscribe: https://lists.osgeo.org/mailman/listinfo/qgis-user
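P.S. On Charles's point about VACUUM across many gpkgs being a chore: since each one is a SQLite database, the maintenance itself is scriptable with Python's built-in sqlite3 module, no GDAL required. A hypothetical `vacuum_all` helper, offered as a sketch rather than standard tooling (try it on backups first):

```python
import os
import sqlite3

def vacuum_all(root):
    """Run SQLite VACUUM on every .gpkg under `root`, reclaiming space
    left behind by deleted or rewritten features. Returns the paths
    that were vacuumed."""
    done = []
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            if not name.lower().endswith(".gpkg"):
                continue
            path = os.path.join(dirpath, name)
            con = sqlite3.connect(path)
            try:
                con.execute("VACUUM")  # must run outside a transaction
            finally:
                con.close()
            done.append(path)
    return done
```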