Hi,

Have you tried to find an explanation for the difference between SpatiaLite and
GeoPackage? They are both SQLite databases and there is not much difference in
how they store the geometries as BLOBs. Does a direct SQL “select * from table”
give similar results?
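
One way to run that check outside of any map server is a small script like the
one below. It is only a rough sketch: it uses Python's standard sqlite3 module,
reads the geometries as raw BLOBs without decoding them, and the function name
time_full_scan is made up for this example.

```python
import sqlite3
import time


def time_full_scan(db_path, table):
    """Run SELECT * on one table and return (row_count, seconds)."""
    # Table names cannot be bound as SQL parameters, so quote the identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    conn = sqlite3.connect(db_path)
    try:
        start = time.perf_counter()
        # fetchall() forces every row, including the geometry BLOB, to be read.
        rows = conn.execute(f"SELECT * FROM {quoted}").fetchall()
        elapsed = time.perf_counter() - start
    finally:
        conn.close()
    return len(rows), elapsed
```

Calling it once for the SpatiaLite file and once for the GeoPackage file with
the same table (for example time_full_scan("parcels.gpkg", "parcels"), where
both names are placeholders) should show whether the raw read speed already
differs. Repeating each call a few times helps to rule out file cache effects.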

I have not experienced such a difference with MapServer. In some
not-so-scientific tests with 1.2 million polygons I found that shapefile was
the fastest and SpatiaLite/GeoPackage were slightly slower, but still faster
than PostGIS, for plain small BBOX queries. The advantage of shapefile was lost
when attribute filters were needed on more than one attribute. The comparison
was made between a sorted shapefile and database tables with appropriate
indexes. The BBOX queries typically returned about 10-50 polygons with simple
geometries (agricultural parcels).

-Jukka Rahkonen-

From: mapserver-users <mapserver-users-boun...@lists.osgeo.org> On behalf of
Andreas Neumann
Sent: Tuesday, 27 April 2021 18:40
To: Wouter Visscher <wouter.vissc...@gmail.com>
Cc: mapserver-users <mapserver-users@lists.osgeo.org>
Subject: Re: [mapserver-users] Mapserver installation in cloud environments
(kubernetes)

Hi all,

Thanks for your reactions and replies to my question about experience with
running UMN MapServer in the cloud. Sorry about the delay in responding.

I had a look at the MapServerless AWS Lambda Layer project and corresponding 
video. Thanks for sharing this work!

Pre-rendered tiles are of course good for background maps, but for other 
layers, esp. frequently updated layers and layers where we want to allow custom 
styling, feature info, etc. map tiles are not good alternatives.

Interesting to me was the feedback from Wouter about the Dutch NGDI (PDOK), 
which seems to be more similar to our GDIs (province level and national level 
OGC services in Switzerland). Interesting to hear that you are copying data as 
close as possible to the pods (Geopackage or by bringing Postgis really close 
to the Pods).

In the case of QGIS Server it is also interesting to see that the choice of
vector data source seems to matter substantially. We have a QGIS Server
performance suite published at http://test.qgis.org/perf_test/graffiti/ -->
scroll to the latest date. In the performance test suite there is a section
comparing different vector data sources. See e.g.
http://test.qgis.org/perf_test/graffiti/2021_04_27_01_00/report.html#c395b4ae16b447f1b3e6fc127a4531da
where PostGIS and SpatiaLite data sources are substantially faster than
Shapefile, and GeoPackage is clearly the loser. Not sure if the same applies
to UMN MapServer? Maybe that is a pattern that applies to QGIS Server only ...

Wouter: I would be interested in learning more about the "ogc-webservice-proxy" 
you mention for determining the size/duration/location. Do you have more to 
share regarding this proxy service? What is the purpose of it and how does it 
work? Do you also separate short-running requests (e.g. GetFeatureInfo or
GetLegendGraphic) from long-running requests (GetMap, printing services, report
generation)?
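
To make clear what kind of separation is meant, here is a minimal sketch of a
routing decision based only on the OGC REQUEST parameter. The function name
choose_pool, the pool labels and the set of "short" operations are assumptions
for illustration; it is not how any existing proxy is known to work.

```python
from urllib.parse import parse_qs, urlsplit

# Assumed split: which operations count as "short" is a deployment decision.
SHORT_REQUESTS = {"getfeatureinfo", "getlegendgraphic"}


def choose_pool(url):
    """Return "short" or "long" based on the OGC REQUEST parameter."""
    query = parse_qs(urlsplit(url).query)
    # OGC parameter names are case-insensitive, so normalise the keys.
    params = {key.lower(): values[0] for key, values in query.items()}
    request = params.get("request", "").lower()
    # Anything unknown (GetMap, printing, missing parameter) is treated as long.
    return "short" if request in SHORT_REQUESTS else "long"
```

A request URL containing REQUEST=GetFeatureInfo would go to the "short" pool,
while a GetMap request, or anything that cannot be classified, would go to the
"long" pool.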

Is your "traefik" load balancer using "round robin" load balancing or something
more advanced based on system load, or even an estimate/prediction of how long
a request might take based on the parameters submitted?

Thanks all for the interesting discussion!

Andreas



On Fri, 23 Apr 2021 at 22:07, Wouter Visscher <wouter.vissc...@gmail.com>
wrote:
Hi Andreas,

To answer your question: "Do you know of any work ...."

For the Dutch National Geodata Infrastructure (PDOK) we have been running (for
over a year now) hundreds of OGC WMS/WMTS/WFS servers (primarily MapServer and
MapProxy) on AKS (Azure Kubernetes).

The challenges you are describing sound very similar to ours :)

To give some small insight into some of our solutions regarding:
deployment: kustomize and operators
cloud optimization: (regarding data) geohashes over GPKG and PostGIS tables and
getting the data as close to the pods as possible.
load balancing: (infrastructure wise) 'standard' load balancing with traefik,
(OGC wise) we are working on an 'ogc-webservice-proxy' for determining the
size/duration/location, basically what the impact of a request is going to be.
resource sharing: (infrastructure) ScaleSets for nodes, and ReplicaSets based
on load, (data) because we have most of the data in GPKG we 'copy' the data
around. So for a service that scales up to 4 pods we have 4 copies of the same
GPKG on that node.

I will share your knowledge with my colleagues. From what I have read so far,
it seems you have reached many of the same conclusions we have.

On Fri, Apr 23, 2021 at 5:20 PM Andreas Neumann <andr...@qgis.org> wrote:
Hi,

For a small project as part of the Swiss National Geodata Infrastructure (grant
project) several people worked on a study document called "Cloud-optimized OGC
WMS Server" where we analyzed problems that can arise when you install an OGC
web server in the cloud (e.g. docker image deployed via Kubernetes, OpenShift
or the like). This work had a focus on QGIS Server with its own set of
problems - but some of the issues studied in this document, such as the load
balancing problem and how to share resources, also matter for other OGC WMS
servers like UMN MapServer or GeoServer.

Here is the link to the document (not in final form yet, but close to being 
final): 
https://docs.google.com/document/d/1cOUWgzalRx7CHWTFgHz6-uyScsCcoaEmYC0VBHdZShQ/edit#heading=h.c7gq4lie7ys2

I wonder if any similar work has been done specifically around problems, 
challenges and solutions when you deploy UMN Mapserver in cloud environments? 
Do you know of any work?

One major problem that probably all installations of an OGC WMS server have is
how to deploy a more intelligent load balancing system. Often, the default load
balancer is some kind of round robin system, and this leads to inferior results
where "cheap and short" requests (such as a simple GetFeatureInfo or
GetLegendGraphic request) can be queued behind a long-running GetMap request
(potentially with many layers, many features and a high resolution, such as
600 dpi), where the request can take several seconds to process.

In our production system we are currently separating the requests to dedicated
instances for short requests and potentially long requests, to avoid the
scenario mentioned above, but we are not so satisfied with the solution, as it
is a bit inflexible and also a bit harder to maintain. Ideally, we would like
to have a more intelligent load balancer with an incoming queue that holds back
requests as long as all WMS server instances are busy. This would avoid the
situation where a "less intelligent" load balancer simply forwards the requests
to instances based on the round robin principle.
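
The difference between the two strategies can be illustrated with a small
simulation. This is only a simplified sketch, not a real load balancer: each
instance handles one request at a time, the request list must be sorted by
arrival time, and the function names and numbers are made up for illustration.

```python
import heapq


def round_robin_waits(requests, instances):
    """Wait per request when request i is always sent to instance i % n."""
    free_at = [0.0] * instances  # time at which each instance becomes idle
    waits = []
    for i, (arrival, duration) in enumerate(requests):
        slot = i % instances
        start = max(arrival, free_at[slot])
        free_at[slot] = start + duration
        waits.append(start - arrival)
    return waits


def shared_queue_waits(requests, instances):
    """Wait per request when one FIFO queue feeds the first idle instance."""
    free_at = [0.0] * instances
    heapq.heapify(free_at)
    waits = []
    for arrival, duration in requests:
        # The request is held back until the earliest instance is idle.
        earliest = heapq.heappop(free_at)
        start = max(arrival, earliest)
        heapq.heappush(free_at, start + duration)
        waits.append(start - arrival)
    return waits


# (arrival time, processing time) in seconds: one slow GetMap, three cheap ones.
requests = [(0.0, 5.0), (0.0, 0.1), (0.1, 0.1), (0.2, 0.1)]
print(round_robin_waits(requests, 2))
print(shared_queue_waits(requests, 2))
```

With two instances, round robin sends the third request to the instance that
is still busy with the slow GetMap, so it waits almost five seconds, while the
shared queue hands it to the idle instance without any noticeable wait.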

Do you know of any work in the UMN Mapserver community regarding cloud 
deployment, cloud optimization, load balancing and resource sharing?

In our study document I'd like to also include the perspective of other WMS 
servers besides QGIS server, so any input would be welcome.

Thanks,
Andreas

--
Andreas Neumann
QGIS.ORG board member (treasurer)
_______________________________________________
mapserver-users mailing list
mapserver-users@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/mapserver-users