[gdal-dev] Add a new performance hint for Spatialite

2014-03-28 Thread Jukka Rahkonen
Hi,

I took timings about adding 115 GML files (548 MB together, 3.2 million
linestrings) into a Spatialite table. With default settings the table gets
initialized with spatial index which makes following inserts slower. Another
alternative is to create the table without spatial index, append all the
data first and as a last step create spatial index for the ready made table.

With spatial index on:
Append + index: 71 minutes

With spatial index off:
Append: 9 minutes
Create spatial index: 6 minutes
Total: 15 minutes

I was rather happy with the initial conversion speed until I made this test
which revealed that creating spatial index as a final step made the whole
process more than four times faster! This way both data table and spatial
index are probably in contiguous chunks in the SQLite datafile and there is
no need for post process VACUUM. Vacuuming in SQLite is rather slow and for
this 1.3 GB database it takes more than 4 minutes to run.

Suggestion: Add a new performance hint on page
http://www.gdal.org/ogr/drv_sqlite.html

"If many source files will be collected into the same Spatialite table it
can be much faster to initialize the table without a spatial index by using
-lco SPATIAL_INDEX=NO and to create spatial index with a separate command
after all the data are appended. Spatial index can be created with ogrinfo
command
ogrinfo db.sqlite -sql "SELECT
CreateSpatialIndex('table_name','geometry_column_name')"

Perhaps it could also be mentioned as a performance hint that VACUUUM can
also be done from orginfo as
ogrinfo db.sqlite -sql "VACUUM"

-Jukka Rahkonen-



___
gdal-dev mailing list
gdal-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/gdal-dev


Re: [gdal-dev] Add a new performance hint for Spatialite

2014-03-28 Thread Even Rouault
Jukka,

I'm very surprised that you need to do that explicitely, since the driver should
already do that by default. This is something we discovered last year or the
year before when you were experimenting OSM -> Spatialite conversions. Since OGR
1.10, the spatial index is created when the datasource is closed.

Look at this debug trace:

$ ogr2ogr -f sqlite /vsimem/out.sqlite ../autotest/ogr/data/poly.shp -dsco
spatialite=yes --debug on
OGR: OGROpen(../autotest/ogr/data/poly.shp/0x69a3d0) succeeded as ESRI
Shapefile.
SQLITE: SpatiaLite v4 DB found !
OGR_SQLITE: exec(CREATE TABLE 'poly' (   OGC_FID INTEGER PRIMARY KEY))
OGR_SQLITE: exec(DELETE FROM geometry_columns WHERE f_table_name = 'poly')
OGR_SQLITE: exec(SELECT AddGeometryColumn('poly', 'GEOMETRY', 325834, 'POLYGON',
2))
OGR_SQLITE: exec(ALTER TABLE 'poly' ADD COLUMN 'area' FLOAT)
OGR_SQLITE: exec(ALTER TABLE 'poly' ADD COLUMN 'eas_id' FLOAT)
OGR_SQLITE: exec(ALTER TABLE 'poly' ADD COLUMN 'prfedea' VARCHAR(16))
OGR_SQLITE: BEGIN Transaction
OGR_SQLITE: prepare(INSERT INTO 'poly' ("GEOMETRY","area","eas_id","prfedea")
VALUES (?,?,?,?))
OGR_SQLITE: COMMIT Transaction
OGR2OGR: 10 features written in layer 'poly'
OGR_SQLITE: exec(SELECT CreateSpatialIndex('poly', 'GEOMETRY'))
SQLITE: Error no such table: layer_statistics
OGR: Unloading VirtualOGR module
Shape: 10 features read on layer 'poly'.

Could try adding --debug on to your ogr2ogr command line and see when
CreateSpatialIndex() is created ?

Even

> Hi,
>
> I took timings about adding 115 GML files (548 MB together, 3.2 million
> linestrings) into a Spatialite table. With default settings the table gets
> initialized with spatial index which makes following inserts slower. Another
> alternative is to create the table without spatial index, append all the
> data first and as a last step create spatial index for the ready made table.
>
> With spatial index on:
> Append + index: 71 minutes
>
> With spatial index off:
> Append: 9 minutes
> Create spatial index: 6 minutes
> Total: 15 minutes
>
> I was rather happy with the initial conversion speed until I made this test
> which revealed that creating spatial index as a final step made the whole
> process more than four times faster! This way both data table and spatial
> index are probably in contiguous chunks in the SQLite datafile and there is
> no need for post process VACUUM. Vacuuming in SQLite is rather slow and for
> this 1.3 GB database it takes more than 4 minutes to run.
>
> Suggestion: Add a new performance hint on page
> http://www.gdal.org/ogr/drv_sqlite.html
>
> "If many source files will be collected into the same Spatialite table it
> can be much faster to initialize the table without a spatial index by using
> -lco SPATIAL_INDEX=NO and to create spatial index with a separate command
> after all the data are appended. Spatial index can be created with ogrinfo
> command
> ogrinfo db.sqlite -sql "SELECT
> CreateSpatialIndex('table_name','geometry_column_name')"
>
> Perhaps it could also be mentioned as a performance hint that VACUUUM can
> also be done from orginfo as
> ogrinfo db.sqlite -sql "VACUUM"
>
> -Jukka Rahkonen-
>
>
>
> ___
> gdal-dev mailing list
> gdal-dev@lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/gdal-dev
>


___
gdal-dev mailing list
gdal-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/gdal-dev


Re: [gdal-dev] Add a new performance hint for Spatialite

2014-03-28 Thread Rahkonen Jukka (Tike)
Hi,

Sorry, I must have not emphasized enough that hint is only valid for appending 
data with subsequent ogr2ogr commands

ogr2ogr -f sqlite -dsco spatialite=yes mtk_tos.sqlite -dim 2 -lco 
spatial_index=no /vsizip/e:\mtk_tos\etrs89\gml\K32.zip tieviiva  -gt 65536
ogr2ogr -f sqlite -append mtk_tos.sqlite -dim 2 
/vsizip/e:\mtk_tos\etrs89\gml\K34.zip tieviiva -gt 65536
... repeat the latter command for 113 more GML files...

I am not sure if this is a common use case everywhere. However, our National 
Land Survey delivers vector data divided by map sheets which means that each 
country wide dataset contains hundreds or thousands of GML files.
 
Even Rouault wrote:
 
> Jukka,
> 
> I'm very surprised that you need to do that explicitely, since the driver 
> should
> already do that by default. This is something we discovered last year or the 
> year
> before when you were experimenting OSM -> Spatialite conversions. Since OGR
> 1.10, the spatial index is created when the datasource is closed.
> 
> Look at this debug trace:
> 
> $ ogr2ogr -f sqlite /vsimem/out.sqlite ../autotest/ogr/data/poly.shp -dsco
> spatialite=yes --debug on
> OGR: OGROpen(../autotest/ogr/data/poly.shp/0x69a3d0) succeeded as ESRI
> Shapefile.
> SQLITE: SpatiaLite v4 DB found !
> OGR_SQLITE: exec(CREATE TABLE 'poly' (   OGC_FID INTEGER PRIMARY KEY))
> OGR_SQLITE: exec(DELETE FROM geometry_columns WHERE f_table_name =
> 'poly')
> OGR_SQLITE: exec(SELECT AddGeometryColumn('poly', 'GEOMETRY', 325834,
> 'POLYGON',
> 2))
> OGR_SQLITE: exec(ALTER TABLE 'poly' ADD COLUMN 'area' FLOAT)
> OGR_SQLITE: exec(ALTER TABLE 'poly' ADD COLUMN 'eas_id' FLOAT)
> OGR_SQLITE: exec(ALTER TABLE 'poly' ADD COLUMN 'prfedea' VARCHAR(16))
> OGR_SQLITE: BEGIN Transaction
> OGR_SQLITE: prepare(INSERT INTO 'poly'
> ("GEOMETRY","area","eas_id","prfedea")
> VALUES (?,?,?,?))
> OGR_SQLITE: COMMIT Transaction
> OGR2OGR: 10 features written in layer 'poly'
> OGR_SQLITE: exec(SELECT CreateSpatialIndex('poly', 'GEOMETRY'))
> SQLITE: Error no such table: layer_statistics
> OGR: Unloading VirtualOGR module
> Shape: 10 features read on layer 'poly'.
> 
> Could try adding --debug on to your ogr2ogr command line and see when
> CreateSpatialIndex() is created ?
> 
> Even
> 
> > Hi,
> >
> > I took timings about adding 115 GML files (548 MB together, 3.2
> > million
> > linestrings) into a Spatialite table. With default settings the table
> > gets initialized with spatial index which makes following inserts
> > slower. Another alternative is to create the table without spatial
> > index, append all the data first and as a last step create spatial index 
> > for the
> ready made table.
> >
> > With spatial index on:
> > Append + index: 71 minutes
> >
> > With spatial index off:
> > Append: 9 minutes
> > Create spatial index: 6 minutes
> > Total: 15 minutes
> >
> > I was rather happy with the initial conversion speed until I made this
> > test which revealed that creating spatial index as a final step made
> > the whole process more than four times faster! This way both data
> > table and spatial index are probably in contiguous chunks in the
> > SQLite datafile and there is no need for post process VACUUM.
> > Vacuuming in SQLite is rather slow and for this 1.3 GB database it takes 
> > more
> than 4 minutes to run.
> >
> > Suggestion: Add a new performance hint on page
> > http://www.gdal.org/ogr/drv_sqlite.html
> >
> > "If many source files will be collected into the same Spatialite table
> > it can be much faster to initialize the table without a spatial index
> > by using -lco SPATIAL_INDEX=NO and to create spatial index with a
> > separate command after all the data are appended. Spatial index can be
> > created with ogrinfo command ogrinfo db.sqlite -sql "SELECT
> > CreateSpatialIndex('table_name','geometry_column_name')"
> >
> > Perhaps it could also be mentioned as a performance hint that VACUUUM
> > can also be done from orginfo as ogrinfo db.sqlite -sql "VACUUM"
> >
> > -Jukka Rahkonen-
> >
> >
> >
> > ___
> > gdal-dev mailing list
> > gdal-dev@lists.osgeo.org
> > http://lists.osgeo.org/mailman/listinfo/gdal-dev
> >
> 

___
gdal-dev mailing list
gdal-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/gdal-dev


Re: [gdal-dev] Add a new performance hint for Spatialite

2014-03-28 Thread Even Rouault
Ah, got it. I read your message too fast. I've added your suggestions with a
simplified version of your below example in
http://trac.osgeo.org/gdal/changeset/27104

Another solution for your use case would be to make a OGR VRT with a
OGRVRTUnionLayer that reference every GML file and convert the VRT in a single
step in spatialite. But we don't have yet a ogrbuildvrt utility to create that
VRT in a user friendly way.

> Hi,
>
> Sorry, I must have not emphasized enough that hint is only valid for
> appending data with subsequent ogr2ogr commands
>
> ogr2ogr -f sqlite -dsco spatialite=yes mtk_tos.sqlite -dim 2 -lco
> spatial_index=no /vsizip/e:\mtk_tos\etrs89\gml\K32.zip tieviiva  -gt 65536
> ogr2ogr -f sqlite -append mtk_tos.sqlite -dim 2
> /vsizip/e:\mtk_tos\etrs89\gml\K34.zip tieviiva -gt 65536
> ... repeat the latter command for 113 more GML files...
>
> I am not sure if this is a common use case everywhere. However, our National
> Land Survey delivers vector data divided by map sheets which means that each
> country wide dataset contains hundreds or thousands of GML files.
>
> Even Rouault wrote:
>
> > Jukka,
> >
> > I'm very surprised that you need to do that explicitely, since the driver
> should
> > already do that by default. This is something we discovered last year or
> the year
> > before when you were experimenting OSM -> Spatialite conversions. Since OGR
> > 1.10, the spatial index is created when the datasource is closed.
> >
> > Look at this debug trace:
> >
> > $ ogr2ogr -f sqlite /vsimem/out.sqlite ../autotest/ogr/data/poly.shp -dsco
> > spatialite=yes --debug on
> > OGR: OGROpen(../autotest/ogr/data/poly.shp/0x69a3d0) succeeded as ESRI
> > Shapefile.
> > SQLITE: SpatiaLite v4 DB found !
> > OGR_SQLITE: exec(CREATE TABLE 'poly' (   OGC_FID INTEGER PRIMARY KEY))
> > OGR_SQLITE: exec(DELETE FROM geometry_columns WHERE f_table_name =
> > 'poly')
> > OGR_SQLITE: exec(SELECT AddGeometryColumn('poly', 'GEOMETRY', 325834,
> > 'POLYGON',
> > 2))
> > OGR_SQLITE: exec(ALTER TABLE 'poly' ADD COLUMN 'area' FLOAT)
> > OGR_SQLITE: exec(ALTER TABLE 'poly' ADD COLUMN 'eas_id' FLOAT)
> > OGR_SQLITE: exec(ALTER TABLE 'poly' ADD COLUMN 'prfedea' VARCHAR(16))
> > OGR_SQLITE: BEGIN Transaction
> > OGR_SQLITE: prepare(INSERT INTO 'poly'
> > ("GEOMETRY","area","eas_id","prfedea")
> > VALUES (?,?,?,?))
> > OGR_SQLITE: COMMIT Transaction
> > OGR2OGR: 10 features written in layer 'poly'
> > OGR_SQLITE: exec(SELECT CreateSpatialIndex('poly', 'GEOMETRY'))
> > SQLITE: Error no such table: layer_statistics
> > OGR: Unloading VirtualOGR module
> > Shape: 10 features read on layer 'poly'.
> >
> > Could try adding --debug on to your ogr2ogr command line and see when
> > CreateSpatialIndex() is created ?
> >
> > Even
> >
> > > Hi,
> > >
> > > I took timings about adding 115 GML files (548 MB together, 3.2
> > > million
> > > linestrings) into a Spatialite table. With default settings the table
> > > gets initialized with spatial index which makes following inserts
> > > slower. Another alternative is to create the table without spatial
> > > index, append all the data first and as a last step create spatial index
> for the
> > ready made table.
> > >
> > > With spatial index on:
> > > Append + index: 71 minutes
> > >
> > > With spatial index off:
> > > Append: 9 minutes
> > > Create spatial index: 6 minutes
> > > Total: 15 minutes
> > >
> > > I was rather happy with the initial conversion speed until I made this
> > > test which revealed that creating spatial index as a final step made
> > > the whole process more than four times faster! This way both data
> > > table and spatial index are probably in contiguous chunks in the
> > > SQLite datafile and there is no need for post process VACUUM.
> > > Vacuuming in SQLite is rather slow and for this 1.3 GB database it takes
> more
> > than 4 minutes to run.
> > >
> > > Suggestion: Add a new performance hint on page
> > > http://www.gdal.org/ogr/drv_sqlite.html
> > >
> > > "If many source files will be collected into the same Spatialite table
> > > it can be much faster to initialize the table without a spatial index
> > > by using -lco SPATIAL_INDEX=NO and to create spatial index with a
> > > separate command after all the data are appended. Spatial index can be
> > > created with ogrinfo command ogrinfo db.sqlite -sql "SELECT
> > > CreateSpatialIndex('table_name','geometry_column_name')"
> > >
> > > Perhaps it could also be mentioned as a performance hint that VACUUUM
> > > can also be done from orginfo as ogrinfo db.sqlite -sql "VACUUM"
> > >
> > > -Jukka Rahkonen-
> > >
> > >
> > >
> > > ___
> > > gdal-dev mailing list
> > > gdal-dev@lists.osgeo.org
> > > http://lists.osgeo.org/mailman/listinfo/gdal-dev
> > >
> >
>
> ___
> gdal-dev mailing list
> gdal-dev@lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/gdal-dev
>


___
gdal-dev mailing