Re: [gdal-dev] Add a new performance hint for Spatialite
Jukka, I'm very surprised that you need to do that explicitely, since the driver should already do that by default. This is something we discovered last year or the year before when you were experimenting OSM - Spatialite conversions. Since OGR 1.10, the spatial index is created when the datasource is closed. Look at this debug trace: $ ogr2ogr -f sqlite /vsimem/out.sqlite ../autotest/ogr/data/poly.shp -dsco spatialite=yes --debug on OGR: OGROpen(../autotest/ogr/data/poly.shp/0x69a3d0) succeeded as ESRI Shapefile. SQLITE: SpatiaLite v4 DB found ! OGR_SQLITE: exec(CREATE TABLE 'poly' ( OGC_FID INTEGER PRIMARY KEY)) OGR_SQLITE: exec(DELETE FROM geometry_columns WHERE f_table_name = 'poly') OGR_SQLITE: exec(SELECT AddGeometryColumn('poly', 'GEOMETRY', 325834, 'POLYGON', 2)) OGR_SQLITE: exec(ALTER TABLE 'poly' ADD COLUMN 'area' FLOAT) OGR_SQLITE: exec(ALTER TABLE 'poly' ADD COLUMN 'eas_id' FLOAT) OGR_SQLITE: exec(ALTER TABLE 'poly' ADD COLUMN 'prfedea' VARCHAR(16)) OGR_SQLITE: BEGIN Transaction OGR_SQLITE: prepare(INSERT INTO 'poly' (GEOMETRY,area,eas_id,prfedea) VALUES (?,?,?,?)) OGR_SQLITE: COMMIT Transaction OGR2OGR: 10 features written in layer 'poly' OGR_SQLITE: exec(SELECT CreateSpatialIndex('poly', 'GEOMETRY')) SQLITE: Error no such table: layer_statistics OGR: Unloading VirtualOGR module Shape: 10 features read on layer 'poly'. Could try adding --debug on to your ogr2ogr command line and see when CreateSpatialIndex() is created ? Even Hi, I took timings about adding 115 GML files (548 MB together, 3.2 million linestrings) into a Spatialite table. With default settings the table gets initialized with spatial index which makes following inserts slower. Another alternative is to create the table without spatial index, append all the data first and as a last step create spatial index for the ready made table. With spatial index on: Append + index: 71 minutes With spatial index off: Append: 9 minutes Create spatial index: 6 minutes Total: 15 minutes I was rather happy with the initial conversion speed until I made this test which revealed that creating spatial index as a final step made the whole process more than four times faster! This way both data table and spatial index are probably in contiguous chunks in the SQLite datafile and there is no need for post process VACUUM. Vacuuming in SQLite is rather slow and for this 1.3 GB database it takes more than 4 minutes to run. Suggestion: Add a new performance hint on page http://www.gdal.org/ogr/drv_sqlite.html If many source files will be collected into the same Spatialite table it can be much faster to initialize the table without a spatial index by using -lco SPATIAL_INDEX=NO and to create spatial index with a separate command after all the data are appended. Spatial index can be created with ogrinfo command ogrinfo db.sqlite -sql SELECT CreateSpatialIndex('table_name','geometry_column_name') Perhaps it could also be mentioned as a performance hint that VACUUUM can also be done from orginfo as ogrinfo db.sqlite -sql VACUUM -Jukka Rahkonen- ___ gdal-dev mailing list gdal-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/gdal-dev ___ gdal-dev mailing list gdal-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/gdal-dev
Re: [gdal-dev] Add a new performance hint for Spatialite
Hi, Sorry, I must have not emphasized enough that hint is only valid for appending data with subsequent ogr2ogr commands ogr2ogr -f sqlite -dsco spatialite=yes mtk_tos.sqlite -dim 2 -lco spatial_index=no /vsizip/e:\mtk_tos\etrs89\gml\K32.zip tieviiva -gt 65536 ogr2ogr -f sqlite -append mtk_tos.sqlite -dim 2 /vsizip/e:\mtk_tos\etrs89\gml\K34.zip tieviiva -gt 65536 ... repeat the latter command for 113 more GML files... I am not sure if this is a common use case everywhere. However, our National Land Survey delivers vector data divided by map sheets which means that each country wide dataset contains hundreds or thousands of GML files. Even Rouault wrote: Jukka, I'm very surprised that you need to do that explicitely, since the driver should already do that by default. This is something we discovered last year or the year before when you were experimenting OSM - Spatialite conversions. Since OGR 1.10, the spatial index is created when the datasource is closed. Look at this debug trace: $ ogr2ogr -f sqlite /vsimem/out.sqlite ../autotest/ogr/data/poly.shp -dsco spatialite=yes --debug on OGR: OGROpen(../autotest/ogr/data/poly.shp/0x69a3d0) succeeded as ESRI Shapefile. SQLITE: SpatiaLite v4 DB found ! OGR_SQLITE: exec(CREATE TABLE 'poly' ( OGC_FID INTEGER PRIMARY KEY)) OGR_SQLITE: exec(DELETE FROM geometry_columns WHERE f_table_name = 'poly') OGR_SQLITE: exec(SELECT AddGeometryColumn('poly', 'GEOMETRY', 325834, 'POLYGON', 2)) OGR_SQLITE: exec(ALTER TABLE 'poly' ADD COLUMN 'area' FLOAT) OGR_SQLITE: exec(ALTER TABLE 'poly' ADD COLUMN 'eas_id' FLOAT) OGR_SQLITE: exec(ALTER TABLE 'poly' ADD COLUMN 'prfedea' VARCHAR(16)) OGR_SQLITE: BEGIN Transaction OGR_SQLITE: prepare(INSERT INTO 'poly' (GEOMETRY,area,eas_id,prfedea) VALUES (?,?,?,?)) OGR_SQLITE: COMMIT Transaction OGR2OGR: 10 features written in layer 'poly' OGR_SQLITE: exec(SELECT CreateSpatialIndex('poly', 'GEOMETRY')) SQLITE: Error no such table: layer_statistics OGR: Unloading VirtualOGR module Shape: 10 features read on layer 'poly'. Could try adding --debug on to your ogr2ogr command line and see when CreateSpatialIndex() is created ? Even Hi, I took timings about adding 115 GML files (548 MB together, 3.2 million linestrings) into a Spatialite table. With default settings the table gets initialized with spatial index which makes following inserts slower. Another alternative is to create the table without spatial index, append all the data first and as a last step create spatial index for the ready made table. With spatial index on: Append + index: 71 minutes With spatial index off: Append: 9 minutes Create spatial index: 6 minutes Total: 15 minutes I was rather happy with the initial conversion speed until I made this test which revealed that creating spatial index as a final step made the whole process more than four times faster! This way both data table and spatial index are probably in contiguous chunks in the SQLite datafile and there is no need for post process VACUUM. Vacuuming in SQLite is rather slow and for this 1.3 GB database it takes more than 4 minutes to run. Suggestion: Add a new performance hint on page http://www.gdal.org/ogr/drv_sqlite.html If many source files will be collected into the same Spatialite table it can be much faster to initialize the table without a spatial index by using -lco SPATIAL_INDEX=NO and to create spatial index with a separate command after all the data are appended. Spatial index can be created with ogrinfo command ogrinfo db.sqlite -sql SELECT CreateSpatialIndex('table_name','geometry_column_name') Perhaps it could also be mentioned as a performance hint that VACUUUM can also be done from orginfo as ogrinfo db.sqlite -sql VACUUM -Jukka Rahkonen- ___ gdal-dev mailing list gdal-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/gdal-dev ___ gdal-dev mailing list gdal-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/gdal-dev
Re: [gdal-dev] Add a new performance hint for Spatialite
Ah, got it. I read your message too fast. I've added your suggestions with a simplified version of your below example in http://trac.osgeo.org/gdal/changeset/27104 Another solution for your use case would be to make a OGR VRT with a OGRVRTUnionLayer that reference every GML file and convert the VRT in a single step in spatialite. But we don't have yet a ogrbuildvrt utility to create that VRT in a user friendly way. Hi, Sorry, I must have not emphasized enough that hint is only valid for appending data with subsequent ogr2ogr commands ogr2ogr -f sqlite -dsco spatialite=yes mtk_tos.sqlite -dim 2 -lco spatial_index=no /vsizip/e:\mtk_tos\etrs89\gml\K32.zip tieviiva -gt 65536 ogr2ogr -f sqlite -append mtk_tos.sqlite -dim 2 /vsizip/e:\mtk_tos\etrs89\gml\K34.zip tieviiva -gt 65536 ... repeat the latter command for 113 more GML files... I am not sure if this is a common use case everywhere. However, our National Land Survey delivers vector data divided by map sheets which means that each country wide dataset contains hundreds or thousands of GML files. Even Rouault wrote: Jukka, I'm very surprised that you need to do that explicitely, since the driver should already do that by default. This is something we discovered last year or the year before when you were experimenting OSM - Spatialite conversions. Since OGR 1.10, the spatial index is created when the datasource is closed. Look at this debug trace: $ ogr2ogr -f sqlite /vsimem/out.sqlite ../autotest/ogr/data/poly.shp -dsco spatialite=yes --debug on OGR: OGROpen(../autotest/ogr/data/poly.shp/0x69a3d0) succeeded as ESRI Shapefile. SQLITE: SpatiaLite v4 DB found ! OGR_SQLITE: exec(CREATE TABLE 'poly' ( OGC_FID INTEGER PRIMARY KEY)) OGR_SQLITE: exec(DELETE FROM geometry_columns WHERE f_table_name = 'poly') OGR_SQLITE: exec(SELECT AddGeometryColumn('poly', 'GEOMETRY', 325834, 'POLYGON', 2)) OGR_SQLITE: exec(ALTER TABLE 'poly' ADD COLUMN 'area' FLOAT) OGR_SQLITE: exec(ALTER TABLE 'poly' ADD COLUMN 'eas_id' FLOAT) OGR_SQLITE: exec(ALTER TABLE 'poly' ADD COLUMN 'prfedea' VARCHAR(16)) OGR_SQLITE: BEGIN Transaction OGR_SQLITE: prepare(INSERT INTO 'poly' (GEOMETRY,area,eas_id,prfedea) VALUES (?,?,?,?)) OGR_SQLITE: COMMIT Transaction OGR2OGR: 10 features written in layer 'poly' OGR_SQLITE: exec(SELECT CreateSpatialIndex('poly', 'GEOMETRY')) SQLITE: Error no such table: layer_statistics OGR: Unloading VirtualOGR module Shape: 10 features read on layer 'poly'. Could try adding --debug on to your ogr2ogr command line and see when CreateSpatialIndex() is created ? Even Hi, I took timings about adding 115 GML files (548 MB together, 3.2 million linestrings) into a Spatialite table. With default settings the table gets initialized with spatial index which makes following inserts slower. Another alternative is to create the table without spatial index, append all the data first and as a last step create spatial index for the ready made table. With spatial index on: Append + index: 71 minutes With spatial index off: Append: 9 minutes Create spatial index: 6 minutes Total: 15 minutes I was rather happy with the initial conversion speed until I made this test which revealed that creating spatial index as a final step made the whole process more than four times faster! This way both data table and spatial index are probably in contiguous chunks in the SQLite datafile and there is no need for post process VACUUM. Vacuuming in SQLite is rather slow and for this 1.3 GB database it takes more than 4 minutes to run. Suggestion: Add a new performance hint on page http://www.gdal.org/ogr/drv_sqlite.html If many source files will be collected into the same Spatialite table it can be much faster to initialize the table without a spatial index by using -lco SPATIAL_INDEX=NO and to create spatial index with a separate command after all the data are appended. Spatial index can be created with ogrinfo command ogrinfo db.sqlite -sql SELECT CreateSpatialIndex('table_name','geometry_column_name') Perhaps it could also be mentioned as a performance hint that VACUUUM can also be done from orginfo as ogrinfo db.sqlite -sql VACUUM -Jukka Rahkonen- ___ gdal-dev mailing list gdal-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/gdal-dev ___ gdal-dev mailing list gdal-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/gdal-dev ___ gdal-dev mailing list gdal-dev@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/gdal-dev