Re: [OSM-talk] DB Performance (was: JOSM plugin to import GeoJSON?_

2016-03-23 Thread Jukka Rahkonen

Hi,

Stefan did not tell enough details for repeating the test exactly so my 
test set-up is probably a bit different.


Program: GDAL 2.1.0dev, released 2015/99/99 on Windows, compiled with 
sqlite-3.10.2 and libspatialite-4.3.0a

Test data: 100 points (2D) without attributes in shp format
Storage: USB 2.0 drive, same disk used for both input and output
Test script which is also testing a performance hint about turning 
synchronous write off:


@echo %time%
ogr2ogr -f gpkg plain_points.gpkg plain_points.shp
@echo %time%
ogr2ogr -f gpkg plain_points_nosync.gpkg plain_points.shp --config 
OGR_SQLITE_SYNCHRONOUS OFF

@echo %time%
ogr2ogr -f SQLite -dsco spatialite=yes plain_points.sqlite 
plain_points.shp

@echo %time%
ogr2ogr -f SQLite -dsco spatialite=yes plain_points_nosync.sqlite 
plain_points.shp --config OGR_SQLITE_SYNCHRONOUS OFF

@echo %time%
ogr2ogr -f SQLite -dsco spatialite=yes plain_points_nosync_gt.sqlite 
plain_points.shp --config OGR_SQLITE_SYNCHRONOUS OFF -gt 65536

@echo %time%

Timings from the 5 runs:

1: 38 seconds
2: 38 seconds
3: 41 seconds
4: 37 seconds
5: 39 seconds

Conclusions:

1) In my tests with my data, software, and hardware, there were no 
meaningful differences in the conversion speeds. Test data of one 
million points may be too small or then the conversion needs so little 
computing that I was actually measuring the I/O speed of the USB drive.
2) Stefan measured a creation time of 1 min 51 sec for the Spatialite. 
There must be something in that. Fresh GDAL uses a default value of 
2 for -gt (not corrected into all GDAL documents) and using -gt 
65536 did not make any difference with GDAL 2.1-dev. However, with 
explicit -gt 200 the timing was 93 second so it may explain most part of 
the slow result. 200 is what old GDAL versions use as a default value 
for -gt which makes 5000 transactions for one million points. Stefan, 
which GDAL version did you use? Could you make a new test with bigger 
-gt value and if needed, re-consider your conclusion "Spatialite is 
several times slower for creation time". Could it rather be 
"Transactions are expensive with SQLite, try to avoid making too many"?


File sizes of my test data which had million points without attributes:
shp: 35 MB
gpkg: 92 GB
splite: 130 GB

I took also a sample of one million points from a real world dataset 
with attributes: three doubles, two dates, and two strings and that 
changed the numbers quite a lot:

shp: 170 MB
gpkg: 153 GB
splite: 185 GB

No huge differences is file sizes when data have attributes.

-Jukka Rahkonen-




Mike Thompson wrote 2016-03-23 03:26:

This from the ogr2ogr documentation[1] may be relevant:

"When writing into transactional DBMS (SQLite/PostgreSQL,MySQL,
etc...), it might be beneficial to increase the number of INSERT
statements executed between BEGIN TRANSACTION and COMMIT TRANSACTION
statements. This number is specified with the -gt option. For example,
for SQLite, explicitly defining -GT 65536 ensures optimal performance"

Mike
[1] http://www.gdal.org/ogr2ogr.html [3]

On Tue, Mar 22, 2016 at 5:01 PM, Stefan Keller 
wrote:


Hi Frederik and Jukka

Before I try give answers to performance let's be aware that we're
(at
least I am) speaking about a "desktop exchange format", not a
storage
fomat for GIS processing.

But Frederik's comment piqued my curiosity and I did some quick
comparison.
I generated 1 mio. records in PostsGIS with this table
CREATE TABLE benchmark (id serial primary key, txt varchar(32),
geom
geometry(point,4326) );

Then I used OGR2OGR to create the following three file formats:
GeoPackage (using 73.9 MB disk space), Shapefiles (dbf/shp/shx 117
MB)
and Spatialite (173 MB).

Creation time of GeoPackage was 18 sec., Shapefile 21 sec. and
Spatialite 1 min 51 sec.
So, GeoPackage is a bit faster than Shapefiles and significantly
(about 37%) smaller in size.
Spatialite in fact consumes much more disk space than Shapefile and
GeoPackage, and Spatialite is several times slower for creation
time.

This could explain the preformance issues of Spatialite Frederic
mentioned.

:Stefan

2016-03-22 13:56 GMT+01:00 Jukka Rahkonen
:

Frederik Ramm  remote.org [1]> writes:



Hi,

On 03/20/2016 10:56 PM, Stefan Keller wrote:
> But Shapefile remains an oldtimer with more drawbacks than

limited

> field names; see [1].
> GeoJSON (ascii) and GeoPackages (binary) are formats which are

more

> suited for the job.
> I still have hope that JOSM will be able to read those vector

formats too.


Frankly, whenever I venture into the brave new world of

Spatialite, I

come back to good old shape files after a while for performance

reasons.

I'm not sure if Geopackage has significant performance

improvements over

simple Spatialite but if it hasn't then my recommendation for

simple GIS

processing is certainly to stick with shape files for the time

being -

despite all their shortcomings.



Hi Frederic,

I would like to 

[OSM-talk] DB Performance (was: JOSM plugin to import GeoJSON?_

2016-03-22 Thread Mike Thompson
This from the ogr2ogr documentation[1] may be relevant:

"When writing into transactional DBMS (SQLite/PostgreSQL,MySQL, etc...), it
might be beneficial to increase the number of INSERT statements executed
between BEGIN TRANSACTION and COMMIT TRANSACTION statements. This number is
specified with the -gt option. For example, for SQLite, explicitly
defining *-gt
65536* ensures optimal performance"

Mike
[1] http://www.gdal.org/ogr2ogr.html


On Tue, Mar 22, 2016 at 5:01 PM, Stefan Keller  wrote:

> Hi Frederik and Jukka
>
> Before I try give answers to performance let's be aware that we're (at
> least I am) speaking about a "desktop exchange format", not a storage
> fomat for GIS processing.
>
> But Frederik's comment piqued my curiosity and I did some quick comparison.
> I generated 1 mio. records in PostsGIS with this table
> CREATE TABLE benchmark (id serial primary key, txt varchar(32), geom
> geometry(point,4326) );
>
> Then I used OGR2OGR to create the following three file formats:
> GeoPackage (using 73.9 MB disk space), Shapefiles (dbf/shp/shx 117 MB)
> and Spatialite (173 MB).
>
> Creation time of GeoPackage was 18 sec., Shapefile 21 sec. and
> Spatialite 1 min 51 sec.
> So, GeoPackage is a bit faster than Shapefiles and significantly
> (about 37%) smaller in size.
> Spatialite in fact consumes much more disk space than Shapefile and
> GeoPackage, and Spatialite is several times slower for creation time.
>
> This could explain the preformance issues of Spatialite Frederic mentioned.
>
> :Stefan
>
> 2016-03-22 13:56 GMT+01:00 Jukka Rahkonen :
> > Frederik Ramm  remote.org> writes:
> >
> >>
> >> Hi,
> >>
> >> On 03/20/2016 10:56 PM, Stefan Keller wrote:
> >> > But Shapefile remains an oldtimer with more drawbacks than limited
> >> > field names; see [1].
> >> > GeoJSON (ascii) and GeoPackages (binary) are formats which are more
> >> > suited for the job.
> >> > I still have hope that JOSM will be able to read those vector formats
> too.
> >>
> >> Frankly, whenever I venture into the brave new world of Spatialite, I
> >> come back to good old shape files after a while for performance reasons.
> >> I'm not sure if Geopackage has significant performance improvements over
> >> simple Spatialite but if it hasn't then my recommendation for simple GIS
> >> processing is certainly to stick with shape files for the time being -
> >> despite all their shortcomings.
> >
> >
> > Hi Frederic,
> >
> > I would like to receive some sample data, exact way to reproduce some of
> > your ventures and cold numbers about the speed you have experienced.
> > Spatialite does have it's limits but for plain selects with spatial and
> > attribute filters it can well outperform both shapefiles and PostGIS.
> >
> > I keep most vector data for WMS services in Spatialite or GeoPackage due
> to
> > the already mentioned and some other reasons:
> > - supports long attribute names
> > - supports strings longer than 255 characters
> > - supports SQL
> > - supports attribute indexes
> > - much less encoding problems due to UTF-8
> > - one single file vs. a bunch of files in shapefile, perhaps even split
> to
> > separate bunches for points, lines and polygons.
> >
> > For me SpatiaLite is a little bit slower than shapefiles if only spatial
> > filter (BBOX) is used but usually faster if also attribute filters are
> > involved, especially if more than one field is needed in filters
> (Shapefiles
> > can be sorted by one attribute only). Of course spatialite must have
> indexes
> > which suit the queries and when it comes to spatial index, the client
> must
> > know how to utilize the table based R-Tree index. I also recommend to
> VACUUM
> > once the database is ready to use.
> >
> > Many spatial operations are relatively slow in Spatialite and I don't
> > usually utilize them on-the-fly with WMS server. Instead, I run the
> > algorithm once and store the result into a new table because a few
> > mega/gigabytes of additional disk space is not crucial on the server.
> > However, such operations tend to be slow also if shapefiles are used as
> > source data.
> >
> > Write performance especially with concurrent writes is another story. I
> am
> > talking about read-only operations. I know that I am writing empty words
> as
> > far as I do not include reproducible facts but I am willing to join to a
> > controlled test if someone is organizing such.
> >
> > -Jukka Rahkonen-
> >
> >
> >
> >
> > ___
> > talk mailing list
> > talk@openstreetmap.org
> > https://lists.openstreetmap.org/listinfo/talk
>
> ___
> talk mailing list
> talk@openstreetmap.org
> https://lists.openstreetmap.org/listinfo/talk
>
___
talk mailing list
talk@openstreetmap.org
https://lists.openstreetmap.org/listinfo/talk