[postgis-users] Hardware requirements for a server

2015-02-09 Thread Mathieu Basille

Dear PostGIS users,

I am currently planning to set up a PostGIS instance for my lab. It turns
out I believe this would be useful for the whole center, so I'm now
considering setting up a PostGIS server for everyone, if interest is
shared of course. At the moment, however, I am struggling with what would
be required in terms of hardware, and of course the cost will depend on
that; at the end of the day, it's really a matter of money well spent. I
therefore have a series of questions/remarks, and I would welcome any
feedback from people with experience setting up a multi-user PostGIS server.


* My own experience is rather limited: I have used PostGIS quite a bit,
but only on a desktop, with 2 users. The desktop was quite good (quad-core
Xeon, 12 GB RAM, 500 GB HDD), running Debian, and we never had any
performance issue (although some queries were rather long, they remained
acceptable).


* The use case I'm envisioning would be (at least in the foreseeable future):
- About 10 faculty users (which potentially means a few more students
using it); I would have a hard time imagining more than 4 concurrent
users;
- Data would primarily involve a lot (hundreds to thousands) of
high-resolution (spatial and temporal) raster and vector maps, possibly
over large areas (Florida / USA / continental), as well as potentially
millions of GPS records (individually monitored animals);
- Queries would primarily involve retrieving points/maps over given
areas/times, as well as intersecting points with environmental layers;
other use cases would involve working with steps, i.e. the straight-line
segments connecting successive locations, and intersecting them with
environmental layers (see the sketch below);
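
Here is that sketch of the step-and-intersect query I have in mind (the
tables gps_fixes(animal_id, fix_time, geom) and landcover(class, geom)
are hypothetical names, not an existing schema):

    -- Build "steps" from successive GPS fixes, then intersect them
    -- with an environmental layer.
    WITH ordered AS (
        SELECT animal_id, fix_time, geom,
               LAG(geom) OVER (PARTITION BY animal_id
                               ORDER BY fix_time) AS prev_geom
        FROM gps_fixes
    ), steps AS (
        SELECT animal_id, fix_time,
               ST_MakeLine(prev_geom, geom) AS step
        FROM ordered
        WHERE prev_geom IS NOT NULL
    )
    SELECT s.animal_id, s.fix_time, l.class
    FROM steps s
    JOIN landcover l ON ST_Intersects(s.step, l.geom);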


* I couldn't find comprehensive or detailed guidelines online about
hardware, but from what I could see, it seems that memory wouldn't be the
main issue; the number of cores would be (PostgreSQL uses one backend
process, hence at most one core, per database connection, if I'm not
mistaken). At the same time, we want to make sure that the experience is
smooth for everyone...


* Is there a difference in terms of performance and usability between a
Linux-based and an MS-based server? My center is unfortunately MS-centered,
and existing equipment runs MS systems... It would thus be easier for
them to set up an MS-based server.


* Has anyone worked with a server running the DB engine while the DB
itself was stored on another box/server? That would likely be the case
here, since we already have a dedicated box for file storage. Along these
lines, does the operating system of the file storage box matter (Linux
vs. MS)?


* We may also use the server as a workstation to streamline PostGIS
processing with further R analyses/modeling (or even use R from within
the database using PL/R). Again, does anyone have experience doing this?
Is a single workstation the recommended way to work with such a workflow?
Or would it be better (but more costly) to have one server dedicated to
PostGIS and another one, with different specs, dedicated to analyses (R)?


I realize my questions and comments may be confusing, likely because of
my lack of experience with these issues. I really welcome any feedback
from people working with PostGIS servers in a small unit, or in any
similar setting that could be informative!


In advance, thank you very much!

Sincerely,
Mathieu Basille.


--

~$ whoami
Mathieu Basille
http://ase-research.org/basille

~$ locate --details
University of Florida \\
Fort Lauderdale Research and Education Center
(+1) 954-577-6314

~$ fortune
« The whole point is to say everything, and I lack the words,
And I lack the time, and I lack the daring. »
 -- Paul Éluard


Re: [postgis-users] Hardware requirements for a server

2015-02-10 Thread Rémi Cura
Hey,
nice project =)

If you use something like QGIS, each user can easily have a dozen
connections open to the server, so with 10 users you may need something
like pgpool.
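
You can check what you are dealing with from the standard
pg_stat_activity view; a quick sketch:

    -- how many connections each user/application currently holds
    SELECT usename, application_name, count(*) AS connections
    FROM pg_stat_activity
    GROUP BY usename, application_name
    ORDER BY connections DESC;

If the total gets close to max_connections, a pooler such as pgpool helps.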

About hardware sizing, that is more a question for the postgres list.

You may stress that your usage is probably mostly read-only, and that it
will be spread across a lot of tables.
Your storage being external, you may need a good network.
You didn't talk about backups; they are essential (RAID, replication, a
backup script?).

In my experience (research), it is totally impractical to use an MS-based
server, because all the good stuff (SFCGAL, GEOS, GDAL, PostGIS, PL/R...)
needs to be compiled, and that is much easier on Linux.
I solved it by running an Ubuntu VM in VirtualBox.

We used a NAS server to store the postgres files, although that is not
recommended. It worked very well over gigabit Ethernet.

About PL/R or PL/Python: I have used both (though much more PL/Python).
For my settings, the best approach is small functions in a PL language
(by small I mean not much memory and not too long, a few minutes at
most), and big functions (like controlling your whole process, 60 hours
of computing, etc.) in R or Python with a postgres connector.
Or, another rule of thumb: if it fits naturally into a transaction, use a
PL language; if it is bigger, use Python or R.
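
For example, something like this fits naturally in a transaction (just a
sketch, the gps_fixes table is made up):

    -- small, fast, transactional: a good fit for an in-base function
    CREATE OR REPLACE FUNCTION total_path_length(aid integer)
    RETURNS double precision AS $$
        SELECT ST_Length(ST_MakeLine(geom ORDER BY fix_time))
        FROM gps_fixes
        WHERE animal_id = aid;
    $$ LANGUAGE sql STABLE;

A 60-hour job would instead be an external R or Python script talking to
postgres through a connector.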

Having a dedicated R server would enable you to use something like Shiny
(an R web application framework).

Cheers,
Rémi-C



Re: [postgis-users] Hardware requirements for a server

2015-02-10 Thread James Keener
> You didn't talk about backups; they are essential (RAID, replication, a
> backup script?).

I'm sorry to be "that guy": RAID is NOT a backup. Unless you snapshot
your replication machine, replication isn't a backup either (though,
like RAID, it enables High Availability (HA)).

For lack of a precise definition, a backup should allow you to recover from:

* OH CRAP! WE DELETED A FILE YESTERDAY/LAST WEEK AND NEED IT BACK!
* OH CRAP! WE CHANGED A FILE YESTERDAY/LAST WEEK AND NEED IT BACK!
* OH CRAP! THE OFFICE NO LONGER EXISTS -- WE NEED NEW SERVERS NOW!

(Also remember that "WE" could be an upset or rogue employee. Access
control is essential!)

But to the OP, I do agree with the parent post: backups are _ESSENTIAL_.

A script that runs daily and simply does a pg_dump, encrypts the dump (if
necessary/desired), and uploads it to e.g. S3 will be invaluable when you
actually need it. I offer that as a very simple example. If your
organization already has a backup process, I would suggest trying to tie
into that.

Jim




Re: [postgis-users] Hardware requirements for a server

2015-02-10 Thread Mathieu Basille

Hey Rémi,

Thanks for the feedback! Exactly the type of information I'm looking for.
Before commenting further, if anyone has other experience with such a
multi-user PostGIS server in a small unit, please feel free to share! The
more I get from actual users, with different experiences and different
settings, the better!


Let me clear this one up right away: I did not talk about backups because
I didn't think this was an issue here! I'm lucky enough that my center
actually backs up every server daily and monthly on tapes, with another
backup on top of that at the university level...
Thanks to James for the warning (I know about RAID, and never used it as
a backup solution!); I will also look at pg_dump and see how we can make
it part of their backup process.
Network is not an issue either: the same folks as above make sure the
network is good too.


This said, here are a couple of additional comments:

* I didn't know about pgpool. It looks like it may come in handy if
connections become sluggish or simply impossible due to too many users. I
will definitely keep this under my hat, although I understand it is a
UNIX-only solution.


* You suggest that I should ask about hardware requirements on the
postgres list directly. The reason I didn't in the first place was that I
thought the computing demands would be different with PostGIS, because of
the very nature of the data (i.e. GIS layers that can be really big). But
maybe I'm thinking about it the wrong way... I will nevertheless see what
people there have to say about it (and let people here know!).


* About usage being mostly read: this will be true for most "pure GIS"
tasks (mostly intersecting), but I find from experience that we usually
end up with a lot of intermediary tables for our analyses (new tables for
the most part, not new columns).
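
For instance, a typical intermediary table would look like this (a sketch
only, gps_fixes and landcover are made-up names):

    -- materialize an intersection once, then reuse it in later analyses
    CREATE TABLE fixes_landcover AS
    SELECT f.animal_id, f.fix_time, l.class, f.geom
    FROM gps_fixes f
    JOIN landcover l ON ST_Intersects(f.geom, l.geom);

    -- intermediary tables need their own spatial index
    CREATE INDEX ON fixes_landcover USING gist (geom);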


* About MS vs. Linux-based servers: same here, as long as IT deals with
it, I would be inclined to say this is not an issue (or at least not
mine!). I agree, though, that for personal use (i.e. a computer based in
my lab), Linux systems are much easier to deal with (all my computers
actually run Debian). But that's one of the reasons I want a server in
the center: it would come with a sysadmin who would deal with the
hassle... I was thinking more in terms of possibilities here (remote
access, linking PostGIS with R, etc.); in other words, do we lose
anything *as a user* with an MS server?


* I knew about Shiny, although I never used it. I think this is not the
focus for the moment; the server (both PostGIS and R if we can make it)
would be 100% research/analysis oriented. For instance, I'm not
considering (yet) a visualization solution either (CartoDB...). Thanks
for your additional suggestions on linking R and PostGIS, very useful.


Finally, may I ask you about your own setup (number of users, typical use 
cases, hardware specs, etc.)? It would probably help me.


Thank you,
Mathieu.



Re: [postgis-users] Hardware requirements for a server

2015-02-10 Thread George Silva
Others gave lots of good feedback, but let me chime in.


> * I didn't know about pgpool. It looks like it may come in handy if
> connections become sluggish or simply impossible due to too many users.
> I will definitely keep this under my hat, although I understand it is a
> UNIX-only solution.
>

Essential in a multiuser environment.


> * About usage being mostly read: this will be true for most "pure GIS"
> tasks (mostly intersecting), but I find from experience that we usually
> end up with a lot of intermediary tables for our analyses (new tables
> for the most part, not new columns).
>

New tables and intersections, or processes that work somewhat like
geoprocessing in ArcGIS (execute step 1, store the result in A, process B
with A, write to C, process C with D to get E), mean that you will have a
lot of IO going on. If you have 10 students crunching numbers in PostGIS
and writing new results at the same time, that will mean significant IO.
Get fast disks: 15k RPM, 10k RPM, or SSD. This can drive the price tag up
quickly.

Mind that you can tune PostgreSQL to store your indexes on faster disks,
allowing you to store the bulk of your data on slower disks. If you have
the money, go with the fastest disks everywhere you can.
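
To make that concrete, a sketch with tablespaces (the path and names are
made up, and the directory must exist and be owned by the postgres OS
user):

    CREATE TABLESPACE fast_ssd LOCATION '/mnt/ssd/pg_tblspc';

    -- bulk data stays on the slow disks; the hot index goes on the SSD
    CREATE INDEX gps_fixes_geom_idx
        ON gps_fixes USING gist (geom)
        TABLESPACE fast_ssd;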


>
> * About MS vs. Linux-based servers: same here, as long as IT deals with
> it, I would be inclined to say this is not an issue (or at least not
> mine!). I agree, though, that for personal use (i.e. a computer based in
> my lab), Linux systems are much easier to deal with (all my computers
> actually run Debian). But that's one of the reasons I want a server in
> the center: it would come with a sysadmin who would deal with the
> hassle... I was thinking more in terms of possibilities here (remote
> access, linking PostGIS with R, etc.); in other words, do we lose
> anything *as a user* with an MS server?
>

There are some articles (sorry, forgot where) that clearly state that
PostgreSQL performs much better on Linux systems, although it looks like
the gap is smaller these days. Check:

http://stackoverflow.com/questions/8368924/postgresql-performance-on-windows-using-latest-versions

The good part is that it's much easier, at least for me, to administer
the services and everything PostgreSQL needs on Linux. It's easier to
install properly and easier to configure properly, among other things.
This is a personal preference. If you can, go with Linux (IMHO).


> Finally, may I ask you about your own setup (number of users, typical use
> cases, hardware specs, etc.)? It would probably help me.
>

I have a company with 12 concurrent QGIS users, editing around 50,000 km²
of land use coverage at 1:5,000 scale. It has a lot of detail. We do more
editing than processing, but there is some heavy stuff, such as the
"complete feature" tool from QGIS, which can take some time on BIG
geometries.

My setup is:

Bare-metal machine with ESXi;
PostgreSQL VM with 8 GB RAM;
2 quad-core processors;
PostgreSQL tuned for fast reads, with a large, large cache;
pgPool;
disks: two 7,200 RPM disks (I'm not at the office and I don't really
remember this, but I think they are 7,200 RPM), in RAID 1.
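
By "tuned for fast reads" I mean settings along these lines (values are
illustrative, not my exact configuration; ALTER SYSTEM needs PostgreSQL
9.4+, on older versions edit postgresql.conf instead):

    ALTER SYSTEM SET shared_buffers = '2GB';        -- needs a restart
    ALTER SYSTEM SET effective_cache_size = '6GB';  -- planner hint only
    ALTER SYSTEM SET random_page_cost = 2.0;        -- lower on fast disks
    SELECT pg_reload_conf();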

Data is spread across two databases, meaning two non-contiguous projects.
Each table covers around 25,000 km².

We back up each database to a separate dump and upload it to Rackspace.

Re: [postgis-users] Hardware requirements for a server

2015-02-11 Thread Rémi Cura
My own setup: Dell Precision, 2×6 cores, 20 GB RAM, SSD for indexes,
7,200 RPM HDD for big tables.

I do very different stuff with the server:
 - base for visualization of classical vector tables (usually a few
users, mostly reading)
 - base for automatic road generation (topology + very, very complex
queries)
 - interactive editing of road generation (several users reading/writing
+ custom triggers)
 - point cloud server for point cloud import/export/visualization
 - in-base point cloud processing, with lots of Python
 - in-base point cloud classification (submitted, not yet accepted)

Almost every one of these usages has a different profile (simply
displaying a vector table in QGIS vs. managing billions of points...).

Cheers,
Rémi-C


Re: [postgis-users] Hardware requirements for a server

2015-02-11 Thread Mathieu Basille
Thanks to everyone who contributed to this thread, either on the PostGIS
[1] or the PostgreSQL [2] mailing lists. I will try to summarize
everything in this message, which I will post on both lists to give an
update to everyone. I hope it can be useful to other people interested.
Please feel free to add more advice and other experiences; this is always
useful!



Performance
===========

* CPU
A good CPU is required for fast processing. The number of cores helps for
parallel processing, but the number of users != the number of queries
(example: with no more than 4 concurrent users, a single quad-core CPU
should be fine).


* Memory
Examples range from 8 GB to more than 32 GB of RAM.

* Disks
Lots of I/O with geoprocessing requires fast disks: best with SSDs,
otherwise 10k/15k RPM drives. An alternative is to store indexes on
faster disks and data on slower disks (PostgreSQL needs to be tuned with
tablespaces for this).
Better to have direct-attached storage (DAS), i.e. disks on the server
directly (direct transfer between RAM and disks); external storage
requires a good network (additional RAM increases performance).


* Massive multi-user environments (lots of simultaneous connections):
pgpool [3] (Linux/UNIX only). pgpool can be added later on, so no need to
worry about it at the start.



Platform
========

Linux is the platform of choice:
* Easier administration (install/configuration/upgrade), which is also
true for addons/dependencies (starting with PostGIS, but also GEOS, GDAL,
PL/R);
* Better performance [4];
* More tuning options (limited on MS systems).

There is still the possibility of a VirtualBox VM on an MS server.


Other considerations
====================

* Backup: integrate a script that runs pg_dump daily to export and upload
the DB to the storage box (which is part of the backup system).


* Integration with R: a dedicated R server brings more flexibility /
extensions (e.g. Shiny) / performance (more cores and memory available
for PostGIS), except if data transfer is the bottleneck. Use PL/R for
small functions (also if the task fits naturally into the PostgreSQL
workflow); otherwise work in R with a PostgreSQL connector (see the
sketch below).
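
To fix ideas, a minimal PL/R sketch (it assumes the plr extension is
installed; the function is hypothetical):

    CREATE EXTENSION IF NOT EXISTS plr;

    -- the R body sees the argument as a numeric vector and returns
    -- the value of its last expression
    CREATE OR REPLACE FUNCTION median_value(vals float8[])
    RETURNS float8 AS $$
        median(vals)
    $$ LANGUAGE plr;

Heavier jobs (long model fits, simulations) are better run from a
standalone R session with a PostgreSQL connector, as noted above.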



Example setups
==============

* Dell Precision, 2×6 cores, 20 GB RAM, SSD for indexes, 7,200 RPM HDD
for big tables [Rémi Cura]:
Various usages, from visualization (few users) to complex queries with a
lot of reading/writing (several users).


* Bare-metal machine with ESXi; PostgreSQL VM with 8 GB RAM; 2 quad-core
processors; PostgreSQL tuned for fast reads, with a large cache; pgPool;
disks: two 7,200 RPM disks in RAID 1 [George Silva]:
12 concurrent QGIS users, editing around 50,000 km² of land use coverage
at 1:5,000 scale with a lot of detail (in two separate DBs). More editing
than processing; some heavy queries (e.g. the "complete feature" tool
from QGIS) can take some time.


* Nominatim (OpenStreetMap data) [5]: more than 1 GB RAM necessary, more
than 32 GB recommended; 700 GB HDD; SSD recommended; example machine: 12
cores with 32 GB RAM and standard SATA disks, with I/O as the limiting
factor.


Thanks again for the good feedback! This gives me very useful information 
to get started (I think this is still going to be a long process).


Mathieu Basille.


[1] http://lists.osgeo.org/pipermail/postgis-users/2015-February/040120.html

[2] http://www.postgresql.org/message-id/54daa7d9.8020...@ase-research.org

[3] http://www.pgpool.net/

[4] 
https://stackoverflow.com/questions/8368924/postgresql-performance-on-windows-using-latest-versions


[5] http://wiki.openstreetmap.org/wiki/Nominatim/Installation

