Re: [postgis-users] Linux geocoder script ?

2011-04-14 Thread Don
I was running out of hard disk space on my root partition.  I have, 
for now, the geocoder database on a separate partition and hard drive.
I was debugging and capturing the output of every query and command, so 
it generated a huge file in the gigabyte range, especially when each 
query was aborted for not being UTF-8.  The iconv command appears 
to be working, albeit twice over.  I will be posting a bug report with 
an attached patch shortly.  I think I deleted the Windows section, so if 
you apply the patch, you will have to add that back.  The iconv change is 
not in the patch.  It will probably be a while before most 
distributions have the new version of shp2pgsql; I thought I had 
the latest, but it seems it is only available from Subversion at this time.
It is really nice having my own geocoder and tiger2010 database.  
Thanks for everyone's help.  I can also use the data in my favorite GIS 
program, QGIS, to draw maps, and even run my own mapserver.


On 04/13/2011 07:15 PM, Sylvain Racine wrote:

Hello,

I noticed a small error in your script. If you pipe your output to 
iconv, you have to remove -W latin1 from the shp2pgsql command. You 
should have
$PGBIN/shp2pgsql -c -s 4269 -g the_geom tl_2010_27_county10.dbf 
tiger_staging.mn_county10 | iconv -f latin1 -t UTF8 | $PGBIN/psql -d 
$PGDATABASE


If you don't remove the -W flag, your data will be converted twice!
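
The double conversion is easy to see in a quick shell test (the sample
character is just an illustration): take an 'é' that is already UTF-8
(bytes c3 a9), pretend it is latin1, and convert it to UTF-8 again, and
you get the familiar "Ã©" mojibake:

```shell
# 'é' encoded as UTF-8 is the two bytes c3 a9.  Re-converting those
# bytes from latin1 to UTF8, as -W latin1 followed by a second iconv
# would do, expands them into four bytes: 'Ã©'.
printf '\303\251' | iconv -f latin1 -t UTF8 | od -An -tx1
# the od dump shows four bytes: c3 83 c2 a9
```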

About your running out of disk space: are you sure that you aren't 
short of RAM instead? I am surprised you ran out of disk space with 
this command. I think the output from shp2pgsql is held in RAM 
and then sent to iconv in one shot. If your RAM is low, 
it is possible that the extra data are put in a temporary file on your 
disk drive (in the swap partition, I think). But that temp file 
would indicate that you first ran out of RAM. I am not completely sure of 
my theory, but in my mind, that is how Linux works.


If you want to know the amount of memory needed by this script, type this:

$PGBIN/shp2pgsql -c -s 4269 -g the_geom tl_2010_27_county10.dbf 
tiger_staging.mn_county10 > temp.sql

iconv -f latin1 -t UTF8 < temp.sql > anothertemp.sql
$PGBIN/psql -d $PGDATABASE < anothertemp.sql

The size of the larger of temp.sql and anothertemp.sql is the 
minimum amount of RAM you need to run this command.


Note: The above script is slower than the first one because you write 
your data to the disk drive between each operation. SATA or IDE disk 
access is always slower than RAM access, unless you use an SSD.


Hope it will be useful.

Sylvain Racine

On 13/04/2011 05:40, Don wrote:

Here is the version:
RELEASE: 2.0 USE_GEOS=1 USE_PROJ=1 USE_STATS=1 (r$Id: 
shp2pgsql-core.h 6358 2010-12-13 20:09:26Z pramsey $)

USAGE: shp2pgsql [options] shapefile [schema.]table

I had tried the iconv fix which is a great idea. It seemed to work, 
but I ran out of disk space.
$PGBIN/shp2pgsql -c -s 4269 -g the_geom   -W latin1 
tl_2010_27_county10.dbf tiger_staging.mn_county10 | iconv -f latin1 
-t UTF8 | $PGBIN/psql -d $PGDATABASE
So I have been spending time on creating more disk space instead of 
fixing this.
I would like to share the patches that I have so that others can try 
to improve it especially those who are more familiar with the 
tiger2010 file formats.
It would be nice to have a working tiger2010 decoder for linux in the 
next postgis release.


I don't remember offhand which state had the utf problem.  Several of 
them did.
I would run ./tiger_load > del with a set -x in the file for 
debugging.  (You can then search that file for "aborted" and 
backtrack to find out which file it was processing.)  It got rather 
large, and so did my log files.
I was trying to get more info on this when I ran out of disk space.  
I could not even vacuum any databases.  I see many large files in my 
postgres directories and wonder what they are all for.  I have been 
using two different database clusters, one on a new larger drive for the 
geocoder.  It seems that my cluster on the small drive has a lot of 
files on it, though, that are taking up a lot of space.


On 04/13/2011 01:32 AM, Paragon Corporation wrote:

Don,
Which state were you processing?  I can check it out and see if I 
get similar errors on my shp2pgsql.  You could be right and the file 
just isn't Latin1.

The regress test did seem to pass for me once that ticket was fixed.
Also, to confirm you are running the latest version of shp2pgsql: if 
you run shp2pgsql from the command line, it should output the version.  
Mine, for example, reads
RELEASE: 2.0 USE_GEOS=1 USE_PROJ=1 USE_STATS=1 (r$Id: 
shp2pgsql-core.h 6925 2011-03-18 16:24:33Z pramsey $)
The version unfortunately isn't quite accurate, since it's evidently 
looking at the .h file instead of the .c file.  So though my version 
says 6925, it's really 6932 or later.
http://trac.osgeo.org/postgis/changeset/6932
Hope that helps,
Regina
http://www.postgis.us

Re: [postgis-users] (no subject)

2011-04-14 Thread Mark Cave-Ayland

On 12/04/11 17:38, Paul &amp; Caroline Lewis wrote:


Hi,
Thank you Mark and Richard for your replies. Having looked at this, it
seems a full VACUUM is the answer, though I'm not sure why. Processing
the SQL scripts as originally reported, I do get a large table from
TestSet1 and a small table from TestSet2. Once a full VACUUM is
performed on the large table from TestSet1, its size drops to the same as
the small table from TestSet2; however, adding a full VACUUM into the
TestSet1 procedure makes it slower to run than TestSet2, very much
slower, especially on uploading the very large data sets (70 million rows).
This raises the question: is TestSet2 very efficient, or is it missing
something fundamental that a full VACUUM provides that I'm not realising
at the moment.


That's strange - do you see the same behaviour if you swap the order of 
the data load, i.e. do the ordered data set first, and/or use a 
different table name for each load? I'm just wondering if you're seeing 
some kind of database bloat if VACUUM fixes the issue.



ATB,

Mark.

--
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063

Sirius Labs: http://www.siriusit.co.uk/labs
___
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users


Re: [postgis-users] (no subject)

2011-04-14 Thread Richard Gomes

Hi Paul,

I'm no expert on Postgres, so maybe I should remain silent.
Anyway, I know Oracle well, and I know that the physical organization of 
your data can be highly relevant to performance.


Maybe you could prepare all the data you need in a temporary table and, 
at the end, create another table from the temporary one. In my case, I 
load longitude/latitude and create geography and geometry fields. At 
the end, I create another table and all the indexes, and finally I get rid of 
the temporary table. The big advantage of this process is that the table 
I'm creating is read-only, so the better organized the data is, the 
better. Below you can see an example with some useful comments. I hope 
it helps somehow.




-- working with a temporary table

CREATE TEMPORARY TABLE tmp
(
  location  character varying(100) NOT NULL,
  easting   integer NOT NULL,
  northing  integer NOT NULL,
  longitude double precision NOT NULL,
  latitude  double precision NOT NULL
)
WITH (
  OIDS=FALSE
);

COPY tmp FROM '/home/rgomes/tmp/gis/gaz50k_gb.csv' WITH DELIMITER ',' 
CSV HEADER;


ALTER TABLE tmp ADD COLUMN geometry GEOMETRY;
UPDATE tmp SET geometry = ST_GeometryFromText('POINT(' || easting || ' ' || northing || ')', 97405);

COMMIT;

ALTER TABLE tmp ADD COLUMN geography GEOGRAPHY(POINT,4326);
UPDATE tmp SET geography = ST_GeographyFromText('POINT(' || longitude || ' ' || latitude || ')');

COMMIT;


-- now create the table I really need

DROP TABLE location;
CREATE TABLE location AS SELECT location, easting, northing, longitude, 
latitude, geometry, geography FROM tmp;


-- Actually, I would like to avoid this step below; I should really get
-- rid of it:

ALTER TABLE location ADD COLUMN  id SERIAL PRIMARY KEY;
-- I could change the tool which generates the CSV file and I could 
generate the primary key right in place.
-- Once this table location is read-only, I could simply generate the 
primary at load time and get rid of this step.
-- I suppose it would be better with regards data organization (Not 
sure! I'm not an expert on Postgres).



VACUUM ANALYZE location;

CREATE UNIQUE INDEX location_akey ON location ( upper(location) );
CREATE INDEX location_geometry_akey  ON location USING gist(geometry);
CREATE INDEX location_geography_akey ON location USING gist(geography);


Richard Gomes
http://www.jquantlib.org/index.php/User:RichardGomes
twitter: frgomes

JQuantLib is a library for Quantitative Finance written in Java.
http://www.jquantlib.com/
twitter: jquantlib




[postgis-users] shp2pgsql - bad date formats

2011-04-14 Thread Jeff Dege
I'm trying to load a shapefile into a PostGIS database, using shp2pgsql and 
psql.

It's failing because a couple of the sql insert statements have date fields 
containing '0'.

ERROR:  date/time field value out of range: 0
HINT:  Perhaps you need a different datestyle setting.

What I need is for these features to be inserted with NULL date fields, instead 
of '0'.

Currently, I'm running psql on the generated .sql file, capturing the output.  
I then search the output to identify the problem statement, delete it from the 
generated file, and then run psql on it again, until there are no errors.  I 
save the lines I deleted in another file, and then hand-edit them, and insert 
them separately.

This is tiresome, tedious, and slow.
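
The closest I've come to automating it is a stream edit over the 
generated .sql file before loading, something like the sketch below. 
The file names are made up, and it assumes the bad values really do 
appear as a bare quoted '0' between commas, which I haven't verified 
for every row:

```shell
# Replace any field containing the literal '0' between commas with NULL
# in the dump produced by shp2pgsql, then load the cleaned file.
# Caution: this is blind text substitution -- it would also hit a
# non-date column whose value happened to be the string '0'.
sed "s/,'0',/,NULL,/g" parcels.sql > parcels_clean.sql
# psql -d mydb -f parcels_clean.sql
```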

Anyone have any better ideas?



[postgis-users] Return point Lat, Long of ST_Distance(a.geog, b.geog)

2011-04-14 Thread Lopez, Anthony
All,

I am wondering if it is possible to return the lat/long coordinates of the 
point that the ST_Distance(a.geog, b.geog) function bases its distance 
calculation on.  Essentially, I'm after ST_ClosestPoint(), but using geography 
instead of geometry.
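
For what it's worth, the geometry version of what I'm after looks like 
this (table and column names made up); casting the geographies to 
geometry gives only an approximation, since the cast does planar math 
and the result is not the true geodesic nearest point:

```sql
-- Geometry has ST_ClosestPoint; casting geography to geometry is an
-- approximate workaround (planar, not geodesic).
SELECT ST_X(p) AS lon, ST_Y(p) AS lat
FROM (
  SELECT ST_ClosestPoint(a.geog::geometry, b.geog::geometry) AS p
  FROM sites a, sites b
  WHERE a.id = 1 AND b.id = 2
) q;
```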


thanks,

Anthony

Anthony Lopez | GIS Analyst
Strategic Energy Analysis Center
National Renewable Energy Laboratory

1617 Cole Blvd Golden, Colorado 80401
ph 303.275.3654

anthony.lo...@nrel.gov | http://www.nrel.gov/


[postgis-users] Specify vertices in ST_Buffer(geog)?

2011-04-14 Thread Lopez, Anthony
All,

Does anyone know if it's possible to specify the number of vertices in 
ST_Buffer(geog)? If not, is there a way around this?  It seems like a 
serious limitation of this geography function...
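
For geometry, the vertex count is controllable via the third argument 
(segments per quarter circle), so one workaround I can think of is 
casting to geometry, buffering, and casting back (sketch only, table 
name made up; note the radius is then in degrees, not meters, and the 
result is planar rather than geodesic):

```sql
-- ST_Buffer(geometry, radius, num_seg_quarter_circle): 2 segments per
-- quarter circle gives an 8-sided "circle" instead of the default 32.
SELECT ST_Buffer(geog::geometry, 0.01, 2)::geography AS coarse_buffer
FROM sites;
```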


thanks,

Anthony

Anthony Lopez | GIS Analyst
Strategic Energy Analysis Center
National Renewable Energy Laboratory

1617 Cole Blvd Golden, Colorado 80401
ph 303.275.3654

anthony.lo...@nrel.gov | http://www.nrel.gov/