Re: [postgis-users] Linux geocoder script ?
I was running out of hard disk space on my root partition. For now, the geocoder database is on a separate partition and hard drive. I was debugging and capturing the output of every query and command, so it generated a huge file in the gigabyte range, especially when each query was aborted because it was not UTF-8. The iconv command appears to be working, albeit twice. I will be posting a bug report with an attached patch shortly. I think I deleted the Windows section, so if you apply the patch you will have to add that back. The iconv change is not in the patch.

I think it will probably be a while before most distributions have the new version of shp2pgsql. I thought I had the latest, but it seems it is only available from Subversion at this time. It is really nice having my own geocoder and TIGER 2010 database. Thanks for everyone's help. I can also use the data in my favorite GIS program, QGIS, to draw maps, and even run my own MapServer.

On 04/13/2011 07:15 PM, Sylvain Racine wrote:

Hello,

I saw a little error in your script. If you pipe your output to iconv, you have to remove -W latin1 from the shp2pgsql command. You should have:

$PGBIN/shp2pgsql -c -s 4269 -g the_geom tl_2010_27_county10.dbf tiger_staging.mn_county10 | iconv -f latin1 -t UTF8 | $PGBIN/psql -d $PGDATABASE

If you don't remove the -W flag, your data will be converted twice!

About your running out of disk space: are you sure you aren't short of RAM instead? I am surprised you ran out of disk space with this command. I think the output from shp2pgsql is held in RAM and then sent to iconv in one shot. If your RAM is low, it is possible that the extra data are put in a temporary file on your disk drive (in the swap partition, I think). But that temp file would indicate that you first ran out of RAM. I am not completely sure of my theory, but that is how I think Linux works.
If you want to know the amount of memory needed by this script, type this:

$PGBIN/shp2pgsql -c -s 4269 -g the_geom tl_2010_27_county10.dbf tiger_staging.mn_county10 > temp.sql
iconv -f latin1 -t UTF8 < temp.sql > anothertemp.sql
$PGBIN/psql -d $PGDATABASE < anothertemp.sql

The size of the larger of temp.sql and anothertemp.sql is the minimum amount of RAM you need to run this command. Note: the above script is slower than the first one because you write your data to the disk drive between each operation. SATA or IDE disk access is always slower than RAM access, unless you use an SSD.

Hope it will be useful.

Sylvain Racine

On 13/04/2011 05:40, Don wrote:

Here is the version:

RELEASE: 2.0 USE_GEOS=1 USE_PROJ=1 USE_STATS=1 (r$Id: shp2pgsql-core.h 6358 2010-12-13 20:09:26Z pramsey $)
USAGE: shp2pgsql [options] shapefile [schema.]table

I had tried the iconv fix, which is a great idea. It seemed to work, but I ran out of disk space.

$PGBIN/shp2pgsql -c -s 4269 -g the_geom -W latin1 tl_2010_27_county10.dbf tiger_staging.mn_county10 | iconv -f latin1 -t UTF8 | $PGBIN/psql -d $PGDATABASE

So I have been spending time on creating more disk space instead of fixing this. I would like to share the patches that I have so that others can try to improve them, especially those who are more familiar with the TIGER 2010 file formats. It would be nice to have a working TIGER 2010 geocoder for Linux in the next PostGIS release.

I don't remember offhand which state had the UTF problem; several of them did. I would run ./tiger_load del with a set -x in the file for debugging. (You can then search that file for "aborted" and backtrack to find out which file it was processing.) It got rather large, and so did my log files. I was trying to get more info on this when I ran out of disk space. I could not even vacuum any databases. I see many large files in my postgres directories and wonder what they are all for.
I have been using two different database clusters, one on a new larger drive for the geocoder. It seems that my cluster on the small drive has a lot of files on it, though, that are taking up a lot of space.

On 04/13/2011 01:32 AM, Paragon Corporation wrote:

Don,

Which state were you processing? I can check it out and see if I get similar errors with my shp2pgsql. You could be right and the file just isn't Latin1. The regress test did seem to pass for me once that ticket was fixed.

Also, to confirm you are running the latest version of shp2pgsql: if you run shp2pgsql from the command line, it should output the version. Mine, for example, reads:

RELEASE: 2.0 USE_GEOS=1 USE_PROJ=1 USE_STATS=1 (r$Id: shp2pgsql-core.h 6925 2011-03-18 16:24:33Z pramsey $)

The version unfortunately isn't quite accurate, since it's evidently looking at the .h file instead of the .c file. So although my version says 6925, it's really 6932 or later. http://trac.osgeo.org/postgis/changeset/6932

Hope that helps,
Regina http://www.postgis.us
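Sylvain's point earlier in the thread about double conversion can be seen with iconv alone; here is a quick sketch (no PostGIS needed) using the Latin-1 byte for 'é':

```shell
# 0xE9 is 'é' in Latin-1.  Converting once gives the correct UTF-8
# bytes (c3 a9):
printf '\351' | iconv -f latin1 -t UTF-8 | od -An -tx1
# Feeding the already-UTF-8 output through "iconv -f latin1" again
# produces mojibake (c3 83 c2 a9) -- the double conversion that happens
# when -W latin1 is kept while also piping through iconv:
printf '\351' | iconv -f latin1 -t UTF-8 | iconv -f latin1 -t UTF-8 | od -An -tx1
```

The same corruption happens byte-for-byte inside the shp2pgsql pipeline, which is why the -W flag has to go when iconv is in the pipe.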
Re: [postgis-users] (no subject)
On 12/04/11 17:38, Paul Caroline Lewis wrote:

Hi,

Thank you Mark and Richard for your replies. Having looked at this, it seems a full VACUUM is the answer; however, I'm not sure why. Processing the SQL scripts as originally reported, I get a large table from TestSet1 and a small table from TestSet2. Once a full VACUUM is performed on the large table from TestSet1, its size drops to the same as the small table from TestSet2. However, adding a full VACUUM into the TestSet1 procedure makes it slower to run than TestSet2 -- very much slower, especially when uploading the very large data sets (70 million rows). This begs the question: is TestSet2 very efficient, or is it missing something fundamental that a full VACUUM provides that I'm not realising at the moment?

That's strange - do you see the same behaviour if you swap the order of the data load, i.e. do the ordered data set first, and/or use a different table name for each load? I'm just wondering if you're seeing some kind of database bloat if VACUUM fixes the issue.

ATB,

Mark.

--
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063
Sirius Labs: http://www.siriusit.co.uk/labs

___
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users
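One way to test Mark's bloat theory is to compare the relation's on-disk size before and after the vacuum. A sketch, where "testset1" is a hypothetical table name:

```sql
-- Hypothetical table name "testset1"; compare sizes around VACUUM FULL.
SELECT pg_size_pretty(pg_relation_size('testset1'))       AS heap_size,
       pg_size_pretty(pg_total_relation_size('testset1')) AS with_indexes;

VACUUM FULL testset1;

SELECT pg_size_pretty(pg_relation_size('testset1')) AS heap_size_after;
```

If the heap size drops sharply under VACUUM FULL, the loading procedure is leaving dead or padded space behind, which is exactly the bloat Mark describes.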
Re: [postgis-users] (no subject)
Hi Paul,

I'm no expert on Postgres, so maybe I should remain silent? Anyway, I know Oracle well, and I know that the physical organization of your data may be highly relevant to performance.

Maybe you could prepare all the data you need on a temporary table and at the end create another table from the temporary one. In my case, I load longitude/latitude and I create geography and geometry fields. At the end, I create another table and all indexes, and finally I get rid of the temporary table. The big advantage of this process is that the table I'm creating is read-only, so the better organized the data is, the better. Below you can see an example with some useful comments. I hope it helps somehow.

-- working with a temporary table
CREATE TEMPORARY TABLE tmp (
    location  character varying(100) NOT NULL,
    easting   integer NOT NULL,
    northing  integer NOT NULL,
    longitude double precision NOT NULL,
    latitude  double precision NOT NULL
) WITH ( OIDS=FALSE );

COPY tmp FROM '/home/rgomes/tmp/gis/gaz50k_gb.csv' WITH DELIMITER ',' CSV HEADER;

ALTER TABLE tmp ADD COLUMN geometry GEOMETRY;
UPDATE tmp SET geometry = ST_GeometryFromText( 'POINT(' || easting || ' ' || northing || ')', 97405 );
COMMIT;

ALTER TABLE tmp ADD COLUMN geography GEOGRAPHY(POINT,4326);
UPDATE tmp SET geography = ST_GeographyFromText( 'POINT(' || longitude || ' ' || latitude || ')' );
COMMIT;

-- now create the table I really need
DROP TABLE location;
CREATE TABLE location AS
    SELECT location, easting, northing, longitude, latitude, geometry, geography FROM tmp;

-- actually, I would like to avoid this step below. I should really get rid of this:
ALTER TABLE location ADD COLUMN id SERIAL PRIMARY KEY;
-- I could change the tool which generates the CSV file and generate the primary key right in place.
-- Since this table location is read-only, I could simply generate the primary key at load time and get rid of this step.
-- I suppose it would be better with regard to data organization (not sure! I'm not an expert on Postgres).

VACUUM ANALYZE location;

CREATE UNIQUE INDEX location_akey ON location ( upper(location) );
CREATE INDEX location_geometry_akey ON location USING gist(geometry);
CREATE INDEX location_geography_akey ON location USING gist(geography);

Richard Gomes
http://www.jquantlib.org/index.php/User:RichardGomes
twitter: frgomes

JQuantLib is a library for Quantitative Finance written in Java.
http://www.jquantlib.com/
twitter: jquantlib

On 14/04/11 10:37, Mark Cave-Ayland wrote:
[quoted reply trimmed]
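Richard's wish to "generate the primary key right in place" can be done without changing the CSV tool: declare the SERIAL column up front and give COPY an explicit column list, so the sequence fills in the ids at load time. A sketch reusing the columns and file path from his example:

```sql
-- Sketch: create the serial key up front so COPY assigns ids at load
-- time, avoiding the later ALTER TABLE ... ADD COLUMN id SERIAL rewrite.
CREATE TABLE location (
    id        SERIAL PRIMARY KEY,
    location  character varying(100) NOT NULL,
    easting   integer NOT NULL,
    northing  integer NOT NULL,
    longitude double precision NOT NULL,
    latitude  double precision NOT NULL
);

-- Listing the columns leaves "id" to be filled by the sequence default.
COPY location (location, easting, northing, longitude, latitude)
FROM '/home/rgomes/tmp/gis/gaz50k_gb.csv' WITH DELIMITER ',' CSV HEADER;
```

The geometry/geography columns and indexes can then be added exactly as in Richard's script.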
[postgis-users] shp2pgsql - bad date formats
I'm trying to load a shapefile into a PostGIS database, using shp2pgsql and psql. It's failing because a couple of the SQL INSERT statements have date fields containing '0'.

ERROR: date/time field value out of range: 0
HINT: Perhaps you need a different datestyle setting.

What I need is for these features to be inserted with NULL date fields instead of '0'. Currently, I'm running psql on the generated .sql file, capturing the output. I then search the output to identify the problem statement, delete it from the generated file, and then run psql on it again, until there are no errors. I save the lines I deleted in another file, then hand-edit them and insert them separately. This is tiresome, tedious, and slow. Anyone have any better ideas?
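One low-tech option is to rewrite the bad literals before psql ever sees them, e.g. with sed. This is only a sketch: it assumes the dump quotes the zero dates exactly as '0' and that no legitimate field value is a bare quoted zero; the file names are hypothetical.

```shell
# Replace quoted '0' values (the bad date literals) with NULL.
# Assumes '0' never appears as a legitimate quoted field value;
# tighten the pattern to the date column's position if it can.
sed "s/,'0',/,NULL,/g; s/,'0')/,NULL)/g" tiger.sql > tiger_fixed.sql
```

The same filter can sit in the middle of the usual pipeline, e.g. shp2pgsql ... | sed "s/,'0',/,NULL,/g; s/,'0')/,NULL)/g" | psql -d mydb, so no intermediate file is needed.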
[postgis-users] Return point Lat, Long of ST_Distance(a.geog, b.geog)
All,

I am wondering if it is possible to return the lat/long coordinates of the point that the ST_Distance(a.geog, b.geog) function bases its distance calculation on. Essentially, I'm after ST_ClosestPoint(), but using geography instead of geometry.

thanks,
Anthony

Anthony Lopez | GIS Analyst
Strategic Energy Analysis Center
National Renewable Energy Laboratory
1617 Cole Blvd Golden, Colorado 80401
ph 303.275.3654
anthony.lo...@nrel.gov | www.nrel.gov
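One approximate workaround, sketched below with hypothetical table and column names, is to cast the geography values to geometry and use the planar ST_ClosestPoint, then read the coordinates back. The result is only approximate, since the computation then happens in lon/lat degrees rather than on the spheroid; for better accuracy, transform to a suitable local projection first.

```sql
-- Approximate: planar closest point on lon/lat, then extract coords.
-- "sites", "geog" and the ids are hypothetical.
SELECT ST_Y(pt) AS lat, ST_X(pt) AS lon
FROM (SELECT ST_ClosestPoint(a.geog::geometry, b.geog::geometry) AS pt
      FROM sites a, sites b
      WHERE a.id = 1 AND b.id = 2) AS sub;
```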
[postgis-users] Specify vertices in ST_Buffer(geog)?
All,

Does anyone know if it's possible to specify the number of vertices in ST_Buffer(geog)? If not, is there a way around this? It seems a pretty limiting restriction on this geography function...

thanks,
Anthony

Anthony Lopez | GIS Analyst
Strategic Energy Analysis Center
National Renewable Energy Laboratory
1617 Cole Blvd Golden, Colorado 80401
ph 303.275.3654
anthony.lo...@nrel.gov | www.nrel.gov
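A common workaround, sketched here with a hypothetical table and SRID choice, is to cast to geometry, transform to a suitable planar SRID, and use the three-argument geometry ST_Buffer, whose third argument sets the number of segments per quarter circle, then transform back:

```sql
-- Hypothetical: "sites" table, UTM zone 13N (SRID 32613), which would
-- suit data near Colorado; pick the zone that fits your data.
SELECT ST_Transform(
         ST_Buffer(
           ST_Transform(geog::geometry, 32613),
           1000,  -- buffer radius in metres
           16),   -- 16 segments per quarter circle = 64 vertices
         4326)::geography AS buffered_geog
FROM sites;
```

The trade-off is that the buffer is computed in the chosen projection rather than on the spheroid, so distortion grows as the data strays from the projection's zone.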