[GENERAL] Searching for Duplicates and Hosed the System

2007-08-19 Thread Bill Thoen
I'm new to PostgreSQL and I ran into a problem I don't want to repeat. I have
a database with a little more than 18 million records that takes up about
3GB. I need to check to see if there are duplicate records, so I tried a
command like this:

SELECT count(*) AS count, fld1, fld2, fld3, fld4 FROM MyTable 
  GROUP BY fld1, fld2, fld3, fld4
  ORDER BY 1 DESC;
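
For the archives, a variant that returns only the duplicated combinations
adds a HAVING clause after the aggregation (a sketch reusing the same
placeholder names):

SELECT fld1, fld2, fld3, fld4, count(*) AS count
  FROM MyTable
  GROUP BY fld1, fld2, fld3, fld4
  HAVING count(*) > 1;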

I knew this would take some time, but what I didn't expect was that about
an hour into the select, my mouse and keyboard locked up and also I
couldn't log in from another computer via SSH. This is a Linux machine
running Fedora Core 6, and PostgreSQL is 8.1.4. There's about 50GB free on
the disk, too.

I finally had to shut the power off and reboot to regain control of my
computer (that wasn't a good idea, either, but eventually I got everything
working again).

Is this normal behavior by PG with large databases? Did I misconfigure
something? Does anyone know what might be wrong?

- Bill Thoen


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Searching for Duplicates and Hosed the System

2007-08-19 Thread Bill Thoen
Tom, here's the "explain" results: Does this help explain what went wrong?
(And yes, I think there will be a *lot* of groups.)

explain select count(*) as count,fips_st_cd, fips_cnty_cd, farm_nbr,
tract_nbr, field_nbr,crop_cd,crop_status_cd,practice_cd,seq_nbr from
compliance_2006 group by fips_st_cd, fips_cnty_cd, farm_nbr, tract_nbr,
field_nbr,crop_cd,crop_status_cd,practice_cd,seq_nbr order by 1 desc;

                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Sort  (cost=15119390.46..15123902.54 rows=1804832 width=160)
   Sort Key: count(*)
   ->  GroupAggregate  (cost=13782933.29..14301822.43 rows=1804832 width=160)
         ->  Sort  (cost=13782933.29..13828054.08 rows=18048318 width=160)
               Sort Key: fips_st_cd, fips_cnty_cd, farm_nbr, tract_nbr, field_nbr, crop_cd, crop_status_cd, practice_cd, seq_nbr
               ->  Seq Scan on compliance_2006  (cost=0.00..1039927.18 rows=18048318 width=160)
(6 rows)



On Sun, Aug 19, 2007 at 01:19:51PM -0400, Tom Lane wrote:
> Bill Thoen <[EMAIL PROTECTED]> writes:
> > I knew this would take some time, but what I didn't expect was that about
> > an hour into the select, my mouse and keyboard locked up and also I
> > couldn't log in from another computer via SSH. This is a Linux machine
> > running Fedora Core 6, and PostgreSQL is 8.1.4. There's about 50GB free on
> > the disk, too.
> 
> > I finally had to shut the power off and reboot to regain control of my
> > computer (that wasn't a good idea, either, but eventually I got everything
> > working again.)
> 
> I've seen Fedora go nuts like that when it ran out of memory.  Once it
> starts to swap heavily, performance goes into the tank; and once the
> kernel realizes it's in memory trouble, it starts to kill processes
> more or less at random.  That might explain why ssh stopped working.
> 
> One thing to do to make it more robust is to disable memory overcommit.
> I suspect also that configuring it with lots of swap space is
> counterproductive, because that just encourages the kernel to allow lots
> of swapping.  I haven't actually experimented with that part though.
> 
> As for why PG ran the system out of memory, I suspect that the planner
> drastically underestimated the number of groups to be created by your
> GROUP BY, and thought it could get away with a hash aggregation.  We
> don't currently have any provision for spilling hash aggregation to
> disk, so if there's a very large number of groups the table just gets
> very big :-(.  The planner is not supposed to choose hash agg if the
> estimated table size exceeds work_mem ... but if it had out-of-date
> statistics to work with it might have gotten the wrong answer.  Have
> you ANALYZEd this table recently?  What does EXPLAIN show as the
> estimated number of result rows?
> 
>   regards, tom lane
> 

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Searching for Duplicates and Hosed the System

2007-08-20 Thread Bill Thoen
Something is really screwy here. I tried what looked like a simpler task
than the one I described when I started this message thread. The only unusual
issue here is that the table compliance_2006 is rather big (over 18 million
records). The table ers_regions is pretty small (about 3,100 records), and
all the WHERE fields are indexed (except e.region). Here's the
not-too-complicated SQL:

SELECT a.* FROM compliance_2006 a, ers_regions e
 WHERE a.fips_st_cd=e.fips_st
   AND a.fips_cnty_cd=e.fips_cou
   AND e.region=1;

I ran VACUUM ANALYZE just before I launched this, and there were no other
postgres jobs running. I'm the only user as well. I also ran EXPLAIN
prior to the run and got this:


 Nested Loop  (cost=11.71..28800.34 rows=7219 width=584)
   ->  Seq Scan on ers_regions e  (cost=0.00..71.90 rows=16 width=28)
         Filter: (region = 1)
   ->  Bitmap Heap Scan on compliance_2006 a  (cost=11.71..1788.76 rows=451 width=584)
         Recheck Cond: ((a.fips_st_cd = "outer".fips_st) AND (a.fips_cnty_cd = "outer".fips_cou))
         ->  Bitmap Index Scan on key_tract  (cost=0.00..11.71 rows=451 width=0)
               Index Cond: ((a.fips_st_cd = "outer".fips_st) AND (a.fips_cnty_cd = "outer".fips_cou))
(7 rows)

So I launched it to see what would happen. This resulted in a very busy 
disk drive for about an hour, and one by one things started dying on my 
FC6 Linux machine. First the mouse disappeared, then the cursor in the 
terminal window where I was running psql vanished, the clock stopped, 
and the keyboard stopped responding. Meanwhile, the disk drive thrashed 
on. Finally the screen saver kicked in and shortly thereafter I logged 
back in only to see the word "Killed" sitting there on the last line and 
all my machine's services were running again. Just no data output.


I looked in the log and saw this:
LOG:  transaction ID wrap limit is 1073746500, limited by database 
"postgres"
LOG:  transaction ID wrap limit is 1073746500, limited by database 
"postgres"

LOG:  could not send data to client: Broken pipe

I guess this is what killed it, but does it make sense that such a 
simple request wrapped around the transaction limit? Is the bad behavior 
of my machine while running this query consistent with an error like 
this or is the machine broken and that's what caused Postgresql to 
crater? What should I set the wrap limit to? What suggestions would you 
make for tracking down the exact problem and fixing it?


Any help would be appreciated

- Bill Thoen

Tom Lane wrote:

Bill Thoen <[EMAIL PROTECTED]> writes:
Tom, here's the "explain" results: Does this help explain what went wrong?
(And yes, I think there will be a *lot* of groups.)

explain select count(*) as count,fips_st_cd, fips_cnty_cd, farm_nbr,
tract_nbr, field_nbr,crop_cd,crop_status_cd,practice_cd,seq_nbr from
compliance_2006 group by fips_st_cd, fips_cnty_cd, farm_nbr, tract_nbr,
field_nbr,crop_cd,crop_status_cd,practice_cd,seq_nbr order by 1 desc;

                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Sort  (cost=15119390.46..15123902.54 rows=1804832 width=160)
   Sort Key: count(*)
   ->  GroupAggregate  (cost=13782933.29..14301822.43 rows=1804832 width=160)
         ->  Sort  (cost=13782933.29..13828054.08 rows=18048318 width=160)
               Sort Key: fips_st_cd, fips_cnty_cd, farm_nbr, tract_nbr, field_nbr, crop_cd, crop_status_cd, practice_cd, seq_nbr
               ->  Seq Scan on compliance_2006  (cost=0.00..1039927.18 rows=18048318 width=160)
(6 rows)



Hmm ... no, actually, that shows the planner doing the right thing for
lotsa groups: picking GroupAggregate instead of HashAggregate.  The
estimated number of groups is 1804832, which might or might not have
much to do with reality but in any case seems enough to keep it away
from HashAggregate.

Do you have autovacuum running, or a scheduled cronjob that runs ANALYZE
or VACUUM ANALYZE?  The only theory I can think of at this point is that
your database statistics are more correct now than they were when you
had the problem.

If you try the query again, does it behave more sanely?

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] do you have an easy example of postgis and mapserver?

2007-08-21 Thread Bill Thoen
On Tue, Aug 21, 2007 at 12:44:49PM +0200, Ottavio Campana wrote:
> I'm sorry this mail is not very in topic, but I hope you can help me.

Just so you know, perhaps a better list to contact with this is the
MapServer mailing list at
http://lists.umn.edu/archives/mapserver-users.html, or maybe the PostGIS
list at http://www.postgis.org/mailman/listinfo/postgis-users.

> I'm trying to learn how postgis and mapserver work together, but I
> cannot understand nearly anything. I mean, I read the documentation of
> postgis and I think I understood it, but I cannot do anything useful
> with it.

I think before you try to do something customized with Python, you should
get familiar with the MapServer MAP file, which contains all the instructions to
map features from various sources like shapefiles, MapInfo TAB files,
PostGIS data sources, and so on. An example for mapping a layer from a
PostGIS source in a MAP file looks like this:

  LAYER
NAME states
TYPE POLYGON
STATUS ON
    CONNECTION "user=gisuser password=*** dbname=us_data host=localhost port=5432"
CONNECTIONTYPE POSTGIS
DATA "the_geom from states"
CLASSITEM 'name'
PROJECTION
  "+proj=latlong +datum=WGS84"
END
CLASS
  NAME "US States"
  OUTLINECOLOR 0 0 0
  COLOR 255 255 196
END
  END

The key bits here related to PostGIS are the CONNECTION, where you specify
all the info needed to get access to your PostgreSQL database, the
CONNECTIONTYPE which you set to POSTGIS, and the DATA directive in which
you supply a string that contains the SQL expression that pulls out the
geometry features you want to map.

If you do this with Python, then you would first need to create an instance
of a mapObj, then create a layerObj and add it to the map object,
setting the properties for the connection. However, my knowledge of this
approach is still very sketchy.

Documentation is spread around, but the two sites you should peruse are
the MapServer docs at: http://mapserver.gis.umn.edu/docs and the PostGIS docs
at: http://postgis.refractions.net/docs/.

- Bill Thoen


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[GENERAL] PG Seg Faults Performing a Query

2007-08-21 Thread Bill Thoen

How would you suggest I try to track down this problem?
I run the following query:

SELECT a.* FROM compliance_2006 a, ers_regions b
 WHERE a.fips_st_cd=b.fips_st
   AND a.fips_cnty_cd=b.fips_cou AND b.region =1
   AND a.fips_st_cd='17' AND a.fips_cnty_cd='003';

and it works. But when I try this:

SELECT a.* FROM compliance_2006 a, ers_regions b
 WHERE a.fips_st_cd=b.fips_st
   AND a.fips_cnty_cd=b.fips_cou AND b.region =1
   AND a.fips_st_cd='17' ;

psql dies with the message:
Segmentation Fault.

Any suggestions?


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] PG Seg Faults Performing a Query

2007-08-22 Thread Bill Thoen
On Tue, Aug 21, 2007 at 04:38:42PM -0500, Scott Marlowe wrote:
> On 8/21/07, Bill Thoen <[EMAIL PROTECTED]> wrote:
> > How would you suggest I try to track down this problem?
> > I run the following query:
> >
> > SELECT a.* FROM compliance_2006 a, ers_regions b
> >   WHERE a.fips_st_cd=b.fips_st
> > AND a.fips_cnty_cd=b.fips_cou AND b.region =1
> > AND a.fips_st_cd='17' AND a.fips_cnty_cd='003';
> >
> > and it works. But when I try this:
> >
> > SELECT a.* FROM compliance_2006 a, ers_regions b
> >   WHERE a.fips_st_cd=b.fips_st
> > AND a.fips_cnty_cd=b.fips_cou AND b.region =1
> > AND a.fips_st_cd='17' ;
> >
> > psql dies with the message:
> > Segmentation Fault.
> 
> so the client psql is what's dying, right?  In that case you likely
> are getting too big a result set for psql to handle at once.  Try
> declaring a cursor to hold your query and fetching 100 or 1000 or so
> rows at a time.
> 
> Just guessing.  What's the exact text of the error message?
> 

The exact message was:

Segmentation Fault.


But the table compliance_2006 is very big (18 million plus records) so I'll
try that cursor idea. But even so, an error like that makes me think that
something's broken.
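
For reference, the cursor approach looks something like this (a sketch;
8.1 supports read-only SQL-level cursors inside a transaction block):

BEGIN;
DECLARE big_cur CURSOR FOR
  SELECT a.* FROM compliance_2006 a, ers_regions b
   WHERE a.fips_st_cd=b.fips_st
     AND a.fips_cnty_cd=b.fips_cou AND b.region =1
     AND a.fips_st_cd='17';
FETCH 1000 FROM big_cur;   -- repeat until it returns no rows
CLOSE big_cur;
COMMIT;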

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] PG Seg Faults Performing a Query

2007-08-22 Thread Bill Thoen
On Wed, Aug 22, 2007 at 09:46:21AM +1200, Andrej Ricnik-Bay wrote:
> On 8/22/07, Bill Thoen <[EMAIL PROTECTED]> wrote:
> > How would you suggest I try to track down this problem?
> > Any suggestions?
> postgres version?
> Operating system?
> Anything in the log(s)?

PostgreSQL Version is 8.1.5, running on Linux (Fedora Core 6). The last few
lines in the Serverlog are:
LOG:  unexpected EOF on client connection
LOG:  transaction ID wrap limit is 1073746500, limited by database
"postgres"
LOG:  transaction ID wrap limit is 1073746500, limited by database
"postgres"

(I ran VACUUM FULL after it crashed to make sure there was no loose disk
space floating around, so that last line was probably from that.) I assume
that bit about "transaction wrap limit" is informational and not related to
this problem.

My PostgreSQL is working great for small SQL queries, even from my large
table (18 million records). But when I ask it to retrieve anything that
takes more than 10 minutes to assemble, it crashes with this
"Segmentation Fault" error. I get so little feedback, and I'm still so
unfamiliar with PostgreSQL, that I don't even know where to begin.

This version of PostgreSQL was compiled from source with support for various
packages needed for GIS work, but the tables I'm trying to extract
data from contain no GIS information. So I believe this operation is
plain PostgreSQL.

Any help you can offer as to how I can track down what's wrong would be
greatly appreciated. If I can't get this to work and can only use small
tables in PG, then its usefulness to me will be pretty limited.

- Bill Thoen

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] PG Seg Faults Performing a Query

2007-08-22 Thread Bill Thoen

As you requested, here's what bt in gdb reports:
(gdb) bt
#0  0x003054264571 in fputc () from /lib64/libc.so.6
#1  0x0040dbd2 in print_aligned_text ()
#2  0x0040f10b in printTable ()
#3  0x0041020b in printQuery ()
#4  0x00407906 in SendQuery ()
#5  0x00409153 in MainLoop ()
#6  0x0040b16e in main ()

Please tell me what it means if you can and if I can fix this problem.

Thanks,
- Bill Thoen

Alvaro Herrera wrote:

Martijn van Oosterhout wrote:

> That said, it would be nice if it returned an error instead of
> crashing.

In my opinion it isn't just a matter of "would be nice".  It is a
possible bug that should be investigated.

A look at a stack trace from the crashing process would be the first
place to start.  In order to do that, please set "ulimit -c unlimited"
and rerun the query under psql.  That should produce a core file.  Then
run
gdb psql core
and inside gdb, execute "bt".  Please send that output our way.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] PG Seg Faults Performing a Query

2007-08-24 Thread Bill Thoen
I'm a bit out of my depth with using these debugging tools and 
interpreting their results, but I think the problem is due to the output 
being just too big for interactive display. Using the same query with 
tighter limits in the WHERE clause works perfectly. When I changed the 
SQL script to write output into a table it worked with the same query 
using even looser limits in the WHERE clause. So sending output to a 
table instead of to the monitor when the queries produce a large amount 
of output is reliable, faster and doesn't tie up the machine.
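
Materializing the output amounts to something like this (a sketch; the
result table name here is made up):

CREATE TABLE compliance_results AS
  SELECT a.* FROM compliance_2006 a, ers_regions b
   WHERE a.fips_st_cd=b.fips_st
     AND a.fips_cnty_cd=b.fips_cou AND b.region =1
     AND a.fips_st_cd='17';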


I tried using strace, but it produced so much telemetry (and unfortunately
I couldn't understand it anyway) that I don't think it
would do me any good. I don't want to bug the PostgreSQL list with a
problem that's probably not a PostgreSQL one, but if someone here would
be willing to help me track down this apparent popen or fork problem I'd
appreciate it. However, I managed to get the results I needed, so we
could also call this "fixed via workaround."
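
Given Tom's diagnosis below (the pager's popen() call failing), two psql
settings that sidestep the pager entirely may also help:

\pset pager off          -- never invoke the pager
\o /tmp/results.txt      -- or send query output to a file instead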


Thanks for the help, Tom and others!
- Bill Thoen

Tom Lane wrote:

Bill Thoen <[EMAIL PROTECTED]> writes:

(gdb) bt
#0  0x003054264571 in fputc () from /lib64/libc.so.6
#1  0x0040dbc2 in print_aligned_text (title=0x0, headers=0x5665d0,
    cells=0x2f8fc010, footers=0x557c90,
    opt_align=0x557ef0 'l' <repeats ... times>, "rr", 'l' <repeats ... times>, "rl lll",
    opt_tuples_only=0 '\0', opt_numeric_locale=0 '\0', opt_border=1,
    encoding=8, fout=0x0) at print.c:448
#2  0x0040f0eb in printTable (title=0x0, headers=0x5665d0,
    cells=0x2f8fc010, footers=0x557c90,
    align=0x557ef0 'l' <repeats ... times>, "rr", 'l' <repeats ... times>, "rl lll",
    opt=0x7fff3e3be8c0, fout=0x3054442760, flog=0x0) at print.c:1551

OK, so the problem is that print_aligned_text is being passed fout = NULL.
Since that wasn't what was passed to printTable, the conclusion must be
that PageOutput() was called and returned NULL --- that is, that its
popen() call failed.  Obviously we should put in some sort of check for
that.  I can see three reasonable responses: either make psql abort
entirely (akin to its out-of-memory behavior), or have it fall back to
not using the pager, either silently or after printing an error
message.  Any thoughts which way to jump?

Meanwhile, the question Bill needs to look into is why popen() is
failing for him.  I'm guessing it's a fork() failure at bottom, but
why so consistent?  strace'ing the psql run might provide some more
info.

			regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org/


[GENERAL] Importing undelimited files (Flat Files or Fixed-Length records)

2008-06-19 Thread Bill Thoen
I've got to load some large fixed-length ASCII records into PG, and I was
wondering how this is done. The COPY command looks like it works only
with delimited files, and I would hate to have to convert these files to
INSERT-type SQL to run them through psql. Is there a way one can
specify a table structure with raw field widths and then just pass it a
flat file?
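
One approach that works with plain COPY is to load each line whole into a
one-column staging table and carve it up with substr(); a sketch, where the
table names, file path, and field widths are all made up (note that COPY's
text format still interprets backslashes, so truly raw data may need them
escaped):

CREATE TABLE raw_lines (line text);

-- With no tab characters in the data, each line loads as one text value.
COPY raw_lines FROM '/tmp/fixed.dat';

INSERT INTO mytable (name, qty)
SELECT trim(substr(line, 1, 20)),     -- columns 1-20: name
       substr(line, 21, 6)::integer   -- columns 21-26: quantity
  FROM raw_lines;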



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Abnormal termination of PostgreSQL

2008-06-21 Thread Bill Thoen
I recently installed Fedora Core 9 and included PostgreSQL, built from
source. I also built PostGIS, GEOS and GDAL to work with it. I had no
problems compiling and installing PostgreSQL, but I did have some
problems building GEOS and GDAL, although I think I've resolved those
(FC9 comes with gcc 4.3.0, which has had some changes that affect
building older source). I'm using the latest version of everything.
PostgreSQL is ver 8.3.3.


Anyway, I created a database and "spatialized" it. Then I created and
populated a table of about 1.7 million records that have no spatial
component. Then I ran an SQL UPDATE command to convert some double
precision columns to 1/100 and 1/10 of their original values. During
this update, about ten minutes into the operation (about when I expected
it would be done), the postgres server shut down suddenly. It reported
this message:


vfm=> \i update.sql
psql:update.sql:1: server closed the connection unexpectedly
   This probably means the server terminated abnormally
   before or while processing the request.
psql:update.sql:1: connection to server was lost


Nothing very helpful is reported in the serverlog file except the 
shutdown message.


In /var/log/messages I get:

Jun 21 10:42:57 sandbox kernel: postmaster[2534]: segfault at b8cfeabf 
ip 08081357 sp bfb113c0 error 4 in postgres[8048000+335000]



I tried changing the SQL to update only one field (e.g. 'update
corn0716 set average_yield = average_yield / 100;') and it also failed
in the same manner in about the same time (~10 minutes).


Does anyone know what happened and how I can fix it?

- Bill Thoen




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Importing undelimited files (Flat Files or Fixed-Length records)

2008-06-23 Thread Bill Thoen

Patrick TJ McPhee wrote:

In article <[EMAIL PROTECTED]>, Bill Thoen <[EMAIL PROTECTED]> wrote:
% I've got to load some large fixed-length ASCII records into PG and I was
% wondering how this is done. The Copy command looks like it works only
% with delimited files, and I would hate to have to convert these files to
% INSERT-type SQL to run them through psql. Is there a way one can
% specify a table structure with raw field widths and then just pass it a
% flat file?


pg_loader is supposed to handle this.

http://pgfoundry.org/projects/pgloader

Thanks, but as it turned out I also had to skip blank lines, so I wrote 
a short sed script and piped the data through that into COPY. That 
worked just fine.


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Advice Wanted on Selecting Multi-row Data Requests in 10-Row Blocks

2008-06-27 Thread Bill Thoen
What I want to do is present the results of a query in a web page, but 
only 10 rows at a time. My PostgreSQL table has millions of records and 
if I don't add a LIMIT 10 to the SQL selection, the request can take too 
long. The worst case scenario is when the user requests all records 
without adding filtering conditions (e.g. SELECT * FROM MyTable;)  That 
can take 10-15 minutes, which won't work on a web application.


What I'm wondering is how in PostgreSQL do you select only the first 10 
records from a selection, then the next 10, then the next, and possibly 
go back to a previous 10? Or do you do the full selection into a 
temporary table once, adding a row number to the columns and then 
performing sub-selects on that temporary table using the row id? Or do 
you run the query with Limit 10 set and then run another copy with no 
limit into a temporary table while you let the user gaze thoughtfully at 
the first ten records?


I know how to get records from the database into a web page, and I know
how to sense user actions (PageDown, PageUp, etc.), so I'm basically
looking for techniques to extract the data quickly.


Also, if this isn't the best forum to ask this sort of question, I'd 
appreciate being pointed to a more appropriate one.


TIA,

- Bill Thoen



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Advice Wanted on Selecting Multi-row Data Requests in 10-Row Blocks

2008-06-27 Thread Bill Thoen
Thanks for tip on OFFSET. That's just what I needed. It's so easy when 
you know the command you're looking for, and so hard when you know what 
you want to do but don't know what the command is called!
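
For anyone finding this in the archives, the pattern is roughly this (a
sketch; the table and key names are placeholders, and a stable ORDER BY is
needed so the pages don't shuffle between requests):

SELECT * FROM mytable
 ORDER BY id
 LIMIT 10 OFFSET 20;   -- rows 21 through 30

One caveat: the server still has to walk past all the skipped rows, so very
deep OFFSETs get progressively slower.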


Thanks,
- Bill Thoen

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Problems with pg_dump and -t wildcards

2008-07-25 Thread Bill Thoen
I'm having some problems with pg_dump and the -t switch. I can't get it 
to work using wildcards nor with multiple -t switches. I'm using 
PostgreSQL 8.1.5 and trying to dump just a table and its associated 
sequence.


As shown below, the table il_sections and the sequence 
il_sections_gid_seq both exist in the database spatial1. But neither

pg_dump -Fc -t 'il_sections*' spatial1 > il_plss.dump
nor
pg_dump -Fc -t il_sections -t il_sections_gid_seq spatial1 > il_plss.dump
will work. The former generates the error listed below while the latter 
just creates a tiny file with nothing but a couple of comments in it.


Anyone know what's wrong?
- Bill Thoen

Here's what I get:
$ psql -dspatial1
Welcome to psql 8.1.5, the PostgreSQL interactive terminal.
...
spatial1=# \d
            List of relations
 Schema |         Name          |   Type   |  Owner
--------+-----------------------+----------+----------
 public | US_States_ogc_fid_seq | sequence | bthoen
 public | counties              | table    | bthoen
 public | counties_gid_seq      | sequence | bthoen
 public | geometry_columns      | table    | postgres
 public | il_sections           | table    | bthoen
 public | il_sections_gid_seq   | sequence | bthoen
 public | spatial_ref_sys       | table    | postgres
 public | states                | table    | bthoen
(8 rows)

spatial1=# \q
$ pg_dump -Fc -t 'il_section*' spatial1 > il_plss.dump
pg_dump: specified table "il_section*" does not exist



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Problems with pg_dump and -t wildcards

2008-07-25 Thread Bill Thoen

Ah ha! I'm reading the wrong docs. Duh...
I guess it's upgrade time!
Thank you.

Tom Lane wrote:

"Joshua D. Drake" <[EMAIL PROTECTED]> writes:
  

On Fri, 2008-07-25 at 11:36 -0600, Bill Thoen wrote:

I'm having some problems with pg_dump and the -t switch. I can't get it 
to work using wildcards nor with multiple -t switches. I'm using 
PostgreSQL 8.1.5 and trying to dump just a table and its associated 
sequence.
  


  

That is because it doesn't work like that. IIRC you can dump exactly one
table with the -t switch.



-t does take a wildcard pattern ... in 8.2 and later.  I don't believe
8.1 handled more than one -t switch, either.

regards, tom lane

  



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Need Some Explanation of an EXPLAIN

2008-09-26 Thread Bill Thoen
I'm trying to update a field in a table that has about 5 million records
from a table that has about 3.5 million records. I've created indexes
on the joined columns, but PostgreSQL 8.1 doesn't seem to want to use
them. This makes for a very slow update.


Below are descriptions of the two tables followed by the query plan that 
my PostgreSQL wants to use. So I have two questions:


1.) Why won't it use the indexes?
2.) How can I make this update faster?

TIA,
Bill Thoen


     Table "public.id2"
  Column   |     Type     | Modifiers
-----------+--------------+-----------
 grower_id | integer      |
 fmid      | character(7) |
 fsa_id    | character(9) |
Indexes:
    "id2_fsa_is_key" UNIQUE, btree (fsa_id)

      Table "public.growers"
   Column   |         Type          | Modifiers
------------+-----------------------+-----------
 grower_id  | integer               |
 fsa_id     | character(9)          |
 co_name    | character varying(45) |
 . . .
Indexes:
    "grower_fsa_id_key" btree (fsa_id)

fsa=# EXPLAIN UPDATE growers
 SET grower_id = id2.grower_id
 FROM id2 WHERE growers.fsa_id = id2.fsa_id;
 
                           QUERY PLAN
------------------------------------------------------------------------------
 Hash Join  (cost=70375.50..1603795.30 rows=6802720 width=355)
   Hash Cond: ("outer".fsa_id = "inner".fsa_id)
   ->  Seq Scan on growers  (cost=0.00..672373.20 rows=6802720 width=351)
   ->  Hash  (cost=46249.20..46249.20 rows=1966920 width=44)
         ->  Seq Scan on id2  (cost=0.00..46249.20 rows=1966920 width=44)
(5 rows)


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Why Does UPDATE Take So Long?

2008-09-30 Thread Bill Thoen
Working with PG 8.1 I'm trying to update a char(4) column, and it's 
taking a very long time; 15 minutes so far and no end in sight. From the 
explain, it doesn't seem like it should take that long, and this column 
is not indexed. Sure, there's 2.7 million records but it only takes a 
few minutes to scan the whole file. Is there some special overhead I 
should be aware of with an UPDATE? I VACUUMed and ANALYZEd first, too.


Or am I just expecting too much?

Here's the explain:
explain UPDATE farms SET prog_year='2007';

                           QUERY PLAN
-----------------------------------------------------------------
 Seq Scan on farms  (cost=0.00..59144.07 rows=2772207 width=54)
(1 row)


TIA,
- Bill Thoen


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why Does UPDATE Take So Long?

2008-09-30 Thread Bill Thoen
Doesn't look like that's the problem. I moved my table over to another
Linux box running PG 8.3, and update performance was pretty bad there as
well. While PG 8.3 was struggling with the update, I created
a copy of my table on my PG 8.1 machine and inserted all the columns, with
one containing the altered values I wanted; that took less than two
minutes. Meanwhile, a half-hour later, my PG 8.3 machine was still
thrashing away trying to update that one column that's not even part of
any index.


Something is really wrong with UPDATE in PostgreSQL I think.


Scott Marlowe wrote:

On Tue, Sep 30, 2008 at 1:00 PM, Bill Thoen <[EMAIL PROTECTED]> wrote:
> Working with PG 8.1 I'm trying to update a char(4) column, and it's taking a
> very long time; 15 minutes so far and no end in sight. From the explain, it
> doesn't seem like it should take that long, and this column is not indexed.
> Sure, there's 2.7 million records but it only takes a few minutes to scan
> the whole file. Is there some special overhead I should be aware of with an
> UPDATE? I VACUUMed and ANALYZEd first, too.
>
> Or am I just expecting too much?

The problem is that on older versions of pgsql, the db had to update
each index for each row updated, as well as the rows.  The latest
version, with a low enough fill factor, can update non-indexed
fields by using the free space in each page and not have to hit the
indexes.  But on 8.1 you don't get that optimization.



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why Does UPDATE Take So Long?

2008-09-30 Thread Bill Thoen
Sorry for the hyperbole; I should have qualified that ridiculous
statement with "...on my machines." No doubt the problem has something
to do with configuration, because I don't know much about that. One of
my machines is running PG 8.1 on Linux Fedora Core 5. It's got a 64-bit
AMD CPU with a GB of RAM and plenty of normal disk space (not running
RAID 5). The other machine is running Linux FC9 and PG 8.3. It's got an
i686 CPU with a GB of RAM and is also not using RAID.


Since I don't understand much about configuring PostgreSQL, both of
these machines use the default PostgreSQL configuration. I figured that
it was optimized for general use, but since my files are large-ish
(in the low multi-million record range), maybe that doesn't qualify as
general use. Anyway, here are the configuration settings you mentioned:

shared_buffers = 1000
#checkpoint_segments = 3
#checkpoint_timeout = 300
#checkpoint_warning = 30

What should I be looking for in the configuration to improve UPDATE 
performance?
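
For what it's worth, commonly suggested starting points for a 1 GB machine
look something like this (illustrative values only, not tuned for any
particular workload; in 8.1, shared_buffers is counted in 8 kB pages and
work_mem in kB):

shared_buffers = 16384       # ~128 MB; the default of 1000 is only ~8 MB
work_mem = 16384             # ~16 MB for sorts and hashes
checkpoint_segments = 10     # spreads out checkpoint I/O during bulk writes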


Thanks,
- Bill Thoen

Bill Moran wrote:

In response to Bill Thoen <[EMAIL PROTECTED]>:

> Doesn't look like that's the problem. I moved my table over to another
> Linux box running PG 8.3, and update performance was pretty bad there as
> well. While PG 8.3 was struggling with the update, I created
> a copy of my table on my PG 8.1 machine and inserted all the columns, with
> one containing the altered values I wanted; that took less than two
> minutes. Meanwhile, a half-hour later, my PG 8.3 machine was still
> thrashing away trying to update that one column that's not even part of
> any index.
>
> Something is really wrong with UPDATE in PostgreSQL I think.



That's an interesting theory, although it's completely wrong and founded
in ridiculosity.  If something were "really wrong with UPDATE" in every
version of PostgreSQL, you'd be reading about it on the mailing lists,
and you won't.

What I suspect is that the typical tuning advice applies here.  I don't
see any information about your configuration or your hardware setup.
* What are shared_buffers set at?
* What do the checkpoint configs look like?
* In general, what does your postgresql.conf look like, how much tuning
  have you done?
* What is your hardware setup?  You're not running RAID 5 are you?


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why Does UPDATE Take So Long?

2008-09-30 Thread Bill Thoen

Alvaro Herrera wrote:

> Bill Moran wrote:
>> What I suspect is that the typical tuning advice applies here.  I don't
>> see any information about your configuration or your hardware setup.
>> * What are shared_buffers set at?
>> * What do the checkpoint configs look like?
>> * In general, what does your postgresql.conf look like, how much tuning
>>   have you done?
>> * What is your hardware setup?  You're not running RAID 5 are you?
>
> Also, how many indexes does this table have?

Two, but the column I'm updating isn't included in either one of them.


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why Does UPDATE Take So Long?

2008-10-01 Thread Bill Thoen
Many thanks to everyone who helped me with this. It'll be a while before
I understand enough to be able to do a good job of tuning my system's
configuration, but there seem to be a few basics I can apply right away.
Also, pointing out how UPDATE actually works was very helpful. Since I'm
at the data-building stage, most of my updates will apply to an entire
column, and in cases like that it's much more efficient to simply write
the results into a new table and delete the old one. In this case:


CREATE TABLE farm2 (LIKE farms);

INSERT INTO farm2 (farm_id, fips_cd, farm_nbr, prog_year)
  SELECT farm_id, fips_cd, farm_nbr, '2007' FROM farms;

DROP TABLE farms;
ALTER TABLE farm2 RENAME TO farms;
CREATE UNIQUE INDEX farms_id_key ON farms(farm_id);
CREATE UNIQUE INDEX farms_fips_nbr_key ON farms(fips_cd, farm_nbr);

takes only a few minutes for this 2.77 million record table. The alternative

UPDATE farms SET prog_year='2007';

takes hours! I don't know how many because I gave up after waiting for 
1.5 hrs.


Thanks all,
- Bill Thoen


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Question About UNION

2008-10-09 Thread Bill Thoen
I'm trying to combine two tables, but I only want unique records based 
on the first two columns. Can UNION be used to join three-column tables 
but only include records based on the uniqueness of the first two 
columns? If not, how would I do this with PostgreSQL 8.1?



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Question About UNION

2008-10-09 Thread Bill Thoen

Raymond O'Donnell wrote:

On 09/10/2008 17:36, Bill Thoen wrote:

I'm trying to combine two tables, but I only want unique records based
on the first two columns. Can UNION be used to join three-column tables
but only include records based on the uniqueness of the first two
columns? If not, how would I do this with PostgreSQL 8.1?



How do you decide which records you want? - e.g. given the following rows...

  (a, b, c)
  (a, b, d)

...how do you decide whether you want the one with c or the one with d?

The physical order that they appear in will take care of that.



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Question About UNION

2008-10-09 Thread Bill Thoen

Raymond O'Donnell wrote:

On 09/10/2008 17:59, Bill Thoen wrote:

I'm trying to combine two tables, but I only want unique records based
on the first two columns. Can UNION be used to join three-column tables
but only include records based on the uniqueness of the first two
columns? If not, how would I do this with PostgreSQL 8.1?



How do you decide which records you want? - e.g. given the following
rows...

  (a, b, c)
  (a, b, d)

...how do you decide whether you want the one with c or the one with d?

The physical order that they appear in will take care of that.



Hmmm...

Leaving that aside, how about SELECTing the columns you want to be
unique from the two tables, and then JOINing the UNION of those back
with the UNION of the complete (three-column) tables?

Say, that just might work! I'll give it a try. Thanks!

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Question About UNION

2008-10-09 Thread Bill Thoen

Raymond O'Donnell wrote:

On 09/10/2008 17:59, Bill Thoen wrote:

I'm trying to combine two tables, but I only want unique records based
on the first two columns. Can UNION be used to join three-column tables
but only include records based on the uniqueness of the first two
columns? If not, how would I do this with PostgreSQL 8.1?



How do you decide which records you want? - e.g. given the following
rows...

  (a, b, c)
  (a, b, d)

...how do you decide whether you want the one with c or the one with d?

The physical order that they appear in will take care of that.



Hmmm...

Leaving that aside, how about SELECTing the columns you want to be
unique from the two tables, and then JOINing the UNION of those back
with the UNION of the complete (three-column) tables?

No, this won't work. Here's an example of my tables:
Table1
1, 12, A
2, 16, B
8, 6, A
19, 9, C

Table2
1, 13, D
2, 16, B
8, 6, B
12, 5, A

A simple UNION will remove the duplicate row 2, 16, B, but it won't 
block row 8, 6, B in table 2 from being included in the output. What I 
want is for records in table 1 to take precedence and for the output 
records to be unique based only on the first two columns. In other 
words, I want this output:

Table3
1, 12, A
2, 16, B
8, 6, A
19, 9, C
1, 13, D
12, 5, A

Trying your suggestion:
create table tmp as
 select col1, col2 from table1
 union
 select col1, col2 from table2;

create table tmp2 as
 select * from table1
 union
 select * from table2;

I'll get:
tmp
1, 12
2, 16
8, 6
19, 9
1, 13
12, 5

tmp2
1, 12, A
2, 16, B
8, 6, A
19, 9, C
1, 13, D
8, 6, B
12, 5, A

I now have two rows with the same first two columns (8,6,A and 8,6,B), and
if I try to JOIN tmp and tmp2 I'm going to get duplicates on the 8,6
key. Or am I misunderstanding what you suggested?




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Question About UNION

2008-10-09 Thread Bill Thoen

David Wilson wrote:

On Thu, Oct 9, 2008 at 1:48 PM, Bill Thoen <[EMAIL PROTECTED]> wrote:

No, this won't work. Here's an example of my tables:
Table1
1, 12, A
2, 16, B
8, 6, A
19, 9, C

Table2
1, 13, D
2, 16, B
8, 6, B
12, 5, A



select * from table1
union
select table2.* from table2 left join table1 on table2.a=table1.a and
table2.b=table1.b where table2.a is null;

(Written in gmail, but you should get the basic idea.)

Thanks, but that didn't work. That selected only the records from
table1. However, this did work:

CREATE TABLE table3 AS
  SELECT * FROM table1;

CREATE UNIQUE INDEX table3_pk ON table3  (a, b);

INSERT INTO table3 SELECT * FROM table2
  WHERE NOT EXISTS (SELECT a, b FROM table1 t1
 WHERE table2.a=t1.a AND table2.b=t1.b);


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Question About UNION

2008-10-09 Thread Bill Thoen

David Wilson wrote:

On Thu, Oct 9, 2008 at 3:31 PM, Bill Thoen <[EMAIL PROTECTED]> wrote:

Thanks, but that didn't work. That selected only the records from table1.



That's why I warned you about it being written in gmail. :)

I'm sorry, you had it right the first time. Here's a script that
verifies it:


create table table1 (
  a integer,
  b integer,
  c char(1)
);

insert into table1 values ( 1, 12, 'A');
insert into table1 values ( 2, 16, 'B');
insert into table1 values ( 8,  6, 'A');
insert into table1 values (19,  9, 'C');
insert into table1 values (20, 12, 'A');

create table table2 (
  a integer,
  b integer,
  c char(1)
);

insert into table2 values ( 1, 13, 'D');
insert into table2 values ( 2, 16, 'B');
insert into table2 values ( 8,  6, 'B');
insert into table2 values (12,  5, 'A');

select * from table1
union
select table2.* from table2 left join table1 on table2.a=table1.a and
table2.b=table1.b where table1.a is null;  -- anti-join: keep table2 rows with no match in table1

drop table table1;
drop table table2;


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Question About UNION

2008-10-09 Thread Bill Thoen

Josh Williams wrote:

On Thu, 2008-10-09 at 10:59 -0600, Bill Thoen wrote:

I'm trying to combine two tables, but I only want unique records based
on the first two columns. Can UNION be used to join three-column tables
but only include records based on the uniqueness of the first two
columns? If not, how would I do this with PostgreSQL 8.1?


How do you decide which records you want? - e.g. given the following rows...

  (a, b, c)
  (a, b, d)

...how do you decide whether you want the one with c or the one with d?

The physical order that they appear in will take care of that.



If the rest of the columns don't matter, how about:

SELECT DISTINCT ON(col1, col2) * FROM (
 SELECT col1, col2, col3 FROM table1
  UNION
 SELECT col1, col2, col3 FROM table2
 ORDER BY col1, col2
) AS uniontable;

- Josh Williams

Great! This works too.
And to correct my last post, David Wilson had it right the SECOND time.
I've been wrestling with this stupid problem all morning, and now my mind
is so gone I don't even trust whether I can get 'SELECT * FROM table1;'
to work!


Thanks for the help once again everybody!!!

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Bug In COPY TO?

2008-10-10 Thread Bill Thoen
I noticed that in PG 8.1 the docs for COPY TO say that you can use WITH 
HEADER, but when I do that it throws an error saying that this works 
only in CSV mode. Is this a bug or do I have something wrong with my syntax?


COPY mytable TO '/data/out.txt'
  WITH HEADER DELIMITER '|' NULL '' ;
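
That looks like documented behavior rather than a bug: HEADER is only
accepted as part of CSV mode. A form that should work on 8.1 (note the
option order) is:

COPY mytable TO '/data/out.txt'
  WITH DELIMITER '|' NULL '' CSV HEADER;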

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Update with a Repeating Sequence

2008-10-14 Thread Bill Thoen
I've got a table with repeated records that I want to make unique by 
adding a sequence code of 0,1,2,...,n for each set of repeated records. 
Basically, I want to turn:

 field_id | seq
----------+-----
        1 |   0
        2 |   0
        3 |   0
        3 |   0
        3 |   0
        4 |   0
        4 |   0
        5 |   0
        6 |   0

into:

 field_id | seq
----------+-----
        1 |   0
        2 |   0
        3 |   0
        3 |   1
        3 |   2
        4 |   0
        4 |   1
        5 |   0
        6 |   0

What's the best way to do that?
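
For the archives: on PostgreSQL 8.4 and later this is a one-liner with a
window function (not available on the 8.1 this thread uses); a sketch,
using the farm_fields table named later in the thread:

SELECT field_id,
       (row_number() OVER (PARTITION BY field_id))::int - 1 AS seq
  FROM farm_fields;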

TIA,
- Bill Thoen


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Update with a Repeating Sequence

2008-10-14 Thread Bill Thoen
The table exists already; all I need to do is update the sequence code 
to make the records unique, but also I need each repeating set numbered 
from 0 (Zero) so I can select a list of unique farm field records where 
seq = 0.


I think the suggestion to use a cursor sounds good, but I'm
concerned that, according to the PG 8.1 documentation, updatable cursors are
not supported. However, with a read-only cursor, can I FETCH a row, read
the field variables in it, update one or two of them, and have that
change posted back into the record to update the table? I'm not at all
familiar with PostgreSQL cursors yet, so any help on the syntax would be
welcome too. An example script or function showing how to step through
all records and update just one or two column values in each row would
be appreciated if someone could point me to it.


Again here's the table structure for my small example:
create  table farm_fields (
   field_id integer,
   seq integer
);

And I want to convert something like this:     to this:

 field_id | seq                             field_id | seq
----------+-----                           ----------+-----
    34556 |   0                                34556 |   0
    34556 |   0                                34556 |   1
    34556 |   0                                34556 |   2
    37000 |   0                                37000 |   0
    37002 |   0                                37002 |   0
    37002 |   0                                37002 |   1
    37005 |   0                                37005 |   0


Webb Sprague wrote:

Untested ideas (beware):

Use an insert trigger that:
curr_seq := select max(seq) from foo where field_id = NEW.field_id
if curr_seq is null then  NEW.seq := 0
else NEW.seq  := curr_seq + 1

(You have to figure out how to build the trigger infrastructure...)

If you need to do it on a table that is already populated, let us know.

On Tue, Oct 14, 2008 at 9:21 AM, Bill Thoen <[EMAIL PROTECTED]> wrote:
> Grzegorz Jaśkiewicz wrote:
>> alter table foo add newid sequencial;
>> alter table foo drop field_id;
>> alter table foo rename newid to field_id;
>
> I can't do that; I need to preserve the field_id values.




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Update with a Repeating Sequence

2008-10-14 Thread Bill Thoen

Steve Atkins wrote:


On Oct 14, 2008, at 9:04 AM, Bill Thoen wrote:

I've got a table with repeated records that I want to make unique by 
adding a sequence code of 0,1,2,...,n for each set of repeated 
records. Basically, I want to turn:

 field_id | seq
----------+-----
        1 |   0
        2 |   0
        3 |   0
        3 |   0
        3 |   0
        4 |   0
        4 |   0
        5 |   0
        6 |   0

into:

 field_id | seq
----------+-----
        1 |   0
        2 |   0
        3 |   0
        3 |   1
        3 |   2
        4 |   0
        4 |   1
        5 |   0
        6 |   0

What's the best way to do that?


This is mildly tricky to do, and hard to maintain.

In most cases where people say they need this, they're actually 
perfectly happy with the seq value being enough to make the row 
unique, and ideally increasing in order of something such as insertion 
time ...
Thanks, but in this case I really need both unique records and a
repeated sequence, so I can select the first occurrence of each record
(i.e. WHERE seq = 0), and sometimes I need the max(seq) for particular
records. Since this is a read-only table, maintaining it is not a
problem.  Anyway, I did manage to solve it. Here's a little test script
that shows how:


CREATE TABLE test (
  field_id integer,
  seq integer
);
INSERT INTO test VALUES (1, 0);
INSERT INTO test VALUES (2, 0);
INSERT INTO test VALUES (3, 0);
INSERT INTO test VALUES (3, 0);
INSERT INTO test VALUES (3, 0);
INSERT INTO test VALUES (4, 0);
INSERT INTO test VALUES (4, 0);
INSERT INTO test VALUES (5, 0);
INSERT INTO test VALUES (6, 0);

-- Create table to hold static variables
CREATE TABLE tmp (last_id integer, cnt integer);
INSERT INTO tmp VALUES(0,0);

-- Function to fill in repeated sequence
CREATE OR REPLACE FUNCTION test_it (field_id integer) RETURNS integer AS $$
DECLARE
  r tmp%ROWTYPE;
  nLast_id integer;
  nCnt integer;
  BEGIN
 SELECT * INTO r FROM tmp;
 nLast_id = r.last_id;
 nCnt = r.cnt;
 IF field_id = nLast_id THEN
nCnt = nCnt + 1;
 ELSE
nCnt = 0;
nLast_id = field_id;
 END IF;
 UPDATE tmp SET last_id=nLast_id, cnt=nCnt;
 RETURN nCnt;
  END;
$$ LANGUAGE plpgsql;

-- Fill in repeated sequence (note: this relies on the UPDATE visiting rows
-- in physical order, which PostgreSQL doesn't guarantee in general)
UPDATE test SET seq = test_it(field_id);

-- Show results
SELECT * FROM test ORDER BY field_id, seq;

-- Clean up
DROP FUNCTION test_it(integer);
DROP TABLE tmp;
DROP TABLE test;


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Update with a Repeating Sequence

2008-10-14 Thread Bill Thoen

Grzegorz Jaśkiewicz wrote:

> alter table foo add newid sequencial;
> alter table foo drop field_id;
> alter table foo rename newid to field_id;

I can't do that; I need to preserve the field_id values.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] UPDATE and Indexes and Performance

2008-10-15 Thread Bill Thoen

Does PG (8.1) ever use existing indexes when executing an UPDATE?

I've got some tables with millions of records, and whenever I update a
column that involves most or all the records, the EXPLAIN command seems
to indicate that it isn't using the pre-existing indexes. This results in
a slow update, which is further slowed by the presence of indexes. So
when doing a large update should I just drop the indexes first, or is
there some good reason to keep them?
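
If dropping them turns out to be the answer, the usual pattern is simply
this (a sketch reusing the index definition from the farms example earlier
in the archive):

BEGIN;
DROP INDEX farms_fips_nbr_key;          -- drop secondary indexes first
UPDATE farms SET prog_year = '2007';
COMMIT;
CREATE UNIQUE INDEX farms_fips_nbr_key ON farms(fips_cd, farm_nbr);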


Thanks,
- Bill Thoen


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] How Do I Find the Date When A Table Was Last Changed?

2008-10-27 Thread Bill Thoen

Is there a way to find when data in a postgresql table was last changed?



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Why does a normally fast query run so slow when the table is in a partition?

2011-02-02 Thread Bill Thoen
   Output: vfm.clu.source_disc, vfm.clu.ogc_fid, vfm.clu.statecd, vfm.clu.countycd, vfm.clu.tractnbr, vfm.clu.farmnbr, vfm.clu.clunbr, vfm.clu.acres
   ->  Append  (cost=0.00..1098364.31 rows=2 width=52) (actual time=38367.325..38367.339 rows=1 loops=1)
         ->  Seq Scan on clu  (cost=0.00..1098356.00 rows=1 width=57) (actual time=38367.222..38367.222 rows=0 loops=1)
               Output: vfm.clu.source_disc, vfm.clu.ogc_fid, vfm.clu.statecd, vfm.clu.countycd, vfm.clu.tractnbr, vfm.clu.farmnbr, vfm.clu.clunbr, vfm.clu.acres
               Filter: ((source_disc = 'co'::bpchar) AND (ogc_fid = 116337))
         ->  Index Scan using clu_co_pkey on clu_co clu  (cost=0.00..8.31 rows=1 width=48) (actual time=0.090..0.096 rows=1 loops=1)
               Output: vfm.clu.source_disc, vfm.clu.ogc_fid, vfm.clu.statecd, vfm.clu.countycd, vfm.clu.tractnbr, vfm.clu.farmnbr, vfm.clu.clunbr, vfm.clu.acres
               Index Cond: ((source_disc = 'co'::bpchar) AND (ogc_fid = 116337))
 Total runtime: 38367.450 ms
(10 rows)

TIA,
- Bill

--

*Bill Thoen*
GISnet - www.gisnet.com <http://www.gisnet.com/>
1401 Walnut St., Suite C
Boulder, CO 80302
303-786-9961 tel
303-443-4856 fax
bth...@gisnet.com



Re: [GENERAL] Why does a normally fast query run so slow when the table is in a partition?

2011-02-03 Thread Bill Thoen

Got it solved!

The problem was one of two things, or maybe both. I had somehow gotten
over 15 million records into the master table, and even though I
"deleted" them and ran VACUUM ANALYZE over the table, they were still
taking up space. Perhaps even just opening a table with that much
garbage in it is what was taking 30-40 seconds. So I made a copy of the
structure, blew away the original table (taking the bad records out with
it), then renamed the copy and used that as the master table.

Also, I found that some of the partition tables had a third index
besides the primary key index. This was an earlier unique index that I
was using before I learned how to add a primary key to an existing
table. So I cleaned up all the partition tables, making sure that they
all had the exact same indexes and constraints.

Then I relinked everything and tried it with several queries. Wow!
Over 20 million records (so far), and now I can retrieve any one of
them in less than a second! It seems that partitioning is even better
than I expected.

Thanks for the help. Your comment, Steve, about the planner being
finicky was the clue. I guess it's intolerant of inconsistency.
Probably a good thing for a database.



On 2/2/2011 3:10 PM, Steve Crawford wrote:

On 02/02/2011 01:35 PM, Bill Thoen wrote:

Steve Crawford wrote:

On 02/02/2011 12:17 PM, Bill Thoen wrote:
I've got a large  (and growing) database set up as a partitioned 
database

What is the setting of constraint_exclusion?
http://www.postgresql.org/docs/9.0/static/runtime-config-query.html#GUC-CONSTRAINT-EXCLUSION


Cheers,
Steve

It's set to 'Partition'

That sounds good. Out of curiosity, what happens if you use an 
explicit cast?:

...where 'co'::char(2)...

I've seen lots of cases where the planner doesn't use indexes when the 
data-type differs sometimes even subtly. Might be the same for 
constraint exclusion.


Cheers,
Steve





--
*Bill Thoen*
GISnet - www.gisnet.com
303-786-9961

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Directing Partitioned Table Searches

2011-02-04 Thread Bill Thoen
In a partitioned table, is it possible to specify the partition for a 
query to search using a variable instead of a constant?


EXAMPLE: Join another table to the partitioned one
Table: clu (partitioned by state)
ogc_fid bigint
cluid char(16)
state bpchar(2)
constraint: state='mi' (or 'co', 'ks', etc. for each partition)

Table: farms
link bigint
zone bpchar(2)
farmid char(7)
...
This selection will result in all partitions being searched:

SELECT cluid, farmid
FROM clu JOIN farms ON ogc_fid=link;

This selection will result in only the 'mi' (Michigan) partition being
searched:

SELECT cluid, farmid
FROM clu JOIN farms ON ogc_fid=link
WHERE state='mi';

And this selection will result in ALL partitions being searched. But why?

SELECT cluid, farmid
FROM clu JOIN farms ON ogc_fid=link
WHERE state=zone;

I'd like to be able to run some queries w/o the overhead of searching 
partitions unnecessarily. Can it be done?


Regards,

--

*Bill Thoen*
GISnet - www.gisnet.com <http://www.gisnet.com/>
1401 Walnut St., Suite C
Boulder, CO 80302
303-786-9961 tel
303-443-4856 fax
bth...@gisnet.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Directing Partitioned Table Searches

2011-02-07 Thread Bill Thoen

Vick Khera wrote:

> Your best bet is to know which partition you need and write your query
> that way dynamically, rather than trying to use a generic query and
> have the DB do the constraint exclusion.  In your above case, if you
> know that 'zone' will limit you to just the MI table, then specify the
> MI table instead of the base clu table.

Thanks. That sounds simple enough. Since I want to automate this, I
guess the next step is to learn how to create and execute a "dynamic"
query. I think I know how to do that.
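
A plpgsql sketch of that dynamic approach (assuming PostgreSQL 8.4 or later
for RETURN QUERY EXECUTE; the function name is made up, and quote_ident()
guards the constructed table name):

CREATE OR REPLACE FUNCTION clu_for_zone(p_zone text)
RETURNS SETOF clu AS $$
BEGIN
  -- Build the partition name at run time and query it directly,
  -- bypassing constraint exclusion entirely.
  RETURN QUERY EXECUTE
    'SELECT c.* FROM ' || quote_ident('clu_' || p_zone) ||
    ' c JOIN farms f ON c.ogc_fid = f.link';
END;
$$ LANGUAGE plpgsql;

SELECT * FROM clu_for_zone('mi');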



--

*Bill Thoen*
GISnet - www.gisnet.com <http://www.gisnet.com/>
1401 Walnut St., Suite C
Boulder, CO 80302
303-786-9961 tel
303-443-4856 fax
bth...@gisnet.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] plsql question

2011-02-11 Thread Bill Thoen

On 2/10/2011 6:40 PM, Geoffrey Myers wrote:
I am trying to write a plsql routine that will delete a range of 
characters based on their octal or hexadecimal values.  Something like 
the 'tr' shell command will do:


cat file| tr -d ['\177'-'\377']

Can't seem to figure this one out.

Pointers would be appreciated.

Would the string functions to_hex(), translate(), replace() and friends 
work for you?

(see http://www.postgresql.org/docs/9.0/static/functions-string.html)


Re: [GENERAL] Logic AND between some strings

2011-03-07 Thread Bill Thoen
Just to make sure, you're asking for the logical AND, not the bitwise 
AND? In other words you're not talking about getting into bit shifting 
with << and >> and masking with &?


For the logical AND, you need to use expressions that evaluate to TRUE 
or FALSE, and follow the rules in this "truth table":


Expr 1   Expr 2   AND returns
------   ------   -----------
true     true     true
true     false    false
false    true     false
false    false    false

like 01100010 == 1100 AND 01100010 > 1100 would evaluate to 
false AND true (depending on how you interpret these bitstrings), and so 
that evaluates to false. If you're dealing with bitstrings (text made up 
of only zeros and ones), then I'd implement this by just using text 
comparisons, since bitstrings are just text and have the same 
comparison behavior as the unsigned integers they could represent.
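
If it turns out the bitwise AND is what's wanted after all, PostgreSQL's 
bit-string type handles that directly; a small sketch (both operands must 
be the same length):

SELECT B'01100010' & B'00001100';  -- yields 00000000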






On 3/7/2011 7:25 AM, yagru_alvarez wrote:

I want to make a logic AND between some strings of 0s and 1s. Here
you have an example:

1->  01100010
2->  1100

I want to make a LOGIC AND between 01100010 and 1100.

I'm working with C++; I need some code to have an idea about how I
can perform that.






Re: [GENERAL] PG and dynamic statements in stored procedures/triggers?

2011-03-07 Thread Bill Thoen

On 3/7/2011 7:55 AM, Adrian Klaver wrote:

On Monday, March 07, 2011 6:45:11 am Durumdara wrote:

Hi!

Thanks!

How do I create "cursor" or "for select" in PGSQL with dynamic way?

For example

:tbl = GenTempTableName()

insert into :tbl...
insert into :tbl...
insert into :tbl...

for select :part_id from :tbl begin
 exec 'select count(*) from subitems where id = ?' using :part_id into

:sumof

 update :tbl set sumof = :sumof where part_id=:part_id
end;

Can you show me same example?

There are examples in the docs at the link provided. Though I would suggest
reading the pl/pgsql documentation from the beginning to get an idea of its
structure.
You won't find this easy. I've spent an awful lot of time over the last 
two days trying to figure out how to pass variables between SQL and 
plpgsql, and the examples don't cover all the things you'd think you 
should be able to do, because Postgres SQL doesn't have variables. 
What it does have comes from psql, and those seem to be more like 
text-replacement placeholders than variables you can evaluate.


For example, I have a need for a tool that gets an initial record id 
from the user, then looks up that key and finds the primary keys of 
two other tables related to the first key, then looks those tables up 
and displays the data from each side by side so I can check the 
differences between the records. (Basically, it's a case of data from 
two vendors that carry a common key, and I'm just spot checking.) I've 
been using interactive psql, but I thought an app as simple as this is 
in concept wouldn't be so hard to do. It is, though, if you don't know 
enough of what's in the API. For instance, isn't there a function to 
enumerate a table's attributes? Or how do you capture the results of a 
select that calls a function in SQL? (e.g.:

\set myResults

:myResults = SELECT myFunction();
-- this won't fly; nor will this:
SELECT INTO :myResults myFunction();

Anyway, I'm beginning to see that I had some misconceptions about what 
you can do within SQL and what you're better off doing in plpgsql. Or C. 
Read the whole section on variables in the manual. That's very good 
advice. In fact, peruse it, because if you read it lightly, you'll have 
to go over it again and again.


But after reading your note, dynamic SQL seems like it might be just 
what I'm looking for too. I didn't realize it was an option, since it's 
documented near the end of the manual, and there's only so much 
RTFMing I can do at a sitting, so that's all new territory to me. But if 
it works like you've sketched out here... well, I'm going to try it and see.
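
For the record, inside plpgsql the capture that fails above is 
straightforward; a sketch using a DO block (9.0 and later), assuming 
myFunction() returns an integer:

DO $$
DECLARE
    myResults integer;
BEGIN
    SELECT myFunction() INTO myResults;  -- works here, unlike plain SQL
    RAISE NOTICE 'myFunction() returned %', myResults;
END;
$$;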




Re: [GENERAL] output screen in psql

2011-03-10 Thread Bill Thoen
On Wed, Mar 09, 2011 at 05:51:51PM -0800, abcdef wrote:
> Hi, I use Postgres in Red Hat Linux.
> When I use psql and select the contents of a table,
> I find the format is not tidy.
> I tried aligned and unaligned; the output does not fulfill my requirement.
> Is there any special setting that can change the line size, like Oracle, to fit my
> requirement?
> So the whole table can be viewed within the psql screen?
> 


I use \x to toggle between block display and row display. And if you log in
using ssh on Linux, you can zoom in and out with Ctrl-Shift-+ and Ctrl-Shift--
(hold down the Control and Shift keys and press the plus key to zoom in,
i.e. increase the font size, or the minus key to decrease it).
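
A few other psql display settings are worth trying too (a sketch; \? lists 
them all):

\x                     -- toggle expanded ("block") display
\pset format wrapped   -- wrap wide values to the terminal width (8.4+)
\pset pager on         -- let the pager handle output that doesn't fit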




[GENERAL] Partitioned Database and Choosing Subtables

2011-03-14 Thread Bill Thoen
I've got a version 8.4.5 partitioned database with records organized by US 
state, so the partitions are set up by state. When I query this database 
and include the key field that tells Postgres which partition to use, 
everything works as I expect. It searches only the specified partition, 
and it's fast. But that's only if I use a constant, like this:


SELECT lions,  tigers, bears FROM WildLife
WHERE state_pt = 'CO';

What I want to be able to do is put this key value in a table and have PG 
look in whatever partition the column specifies, like so:


SELECT lions,  tigers, bears, statecode FROM WildLife
WHERE state_pt = statecode;

However, when I try anything other than a constant, it searches EVERY 
partition, sequentially, which is not what I want it to do. So is there 
any way to specify the partition to search using a variable/column name?


--
*Bill Thoen*
GISnet - www.gisnet.com
303-786-9961



Re: [GENERAL] Partitioned Database and Choosing Subtables

2011-03-15 Thread Bill Thoen

On 3/15/2011 12:02 PM, Igor Neyman wrote:




-Original Message-
From: Bill Thoen [mailto:bth...@gisnet.com]
Sent: Monday, March 14, 2011 11:31 PM
To: pgsql-general@postgresql.org
Subject: Partitioned Database and Choosing Subtables

I've got a version 8.4.5 partitioned database with records
organized by US state, so the partitions are set up by state.
When I query this database and include the key field that
tells Postgres which partition to use, everything works as I
expect. It searches only the specified partition, and it's
fast. But that's only if I use a constant, like this:

SELECT lions,  tigers, bears FROM WildLife WHERE state_pt = 'CO';

What I want to be able to do is put this key value in a table
and have PG look in whatever partition the column specifies, like so:

SELECT lions,  tigers, bears, statecode FROM WildLife WHERE
state_pt = statecode;

However, when I try anything other than a constant, it searches
EVERY partition, sequentially, which is not what I want it to
do. So is there any way to specify the partition to search
using a variable/column name?

--
*Bill Thoen*
GISnet - www.gisnet.com
303-786-9961


Try using "dynamic" sql:

EXECUTE 'SELECT lions,  tigers, bears, statecode FROM WildLife WHERE
state_pt = ' || statecode INTO ...;

See:
http://www.postgresql.org/docs/8.4/interactive/plpgsql-statements.html

Thanks, Igor. It was a nice try -- and I thought it would work, but the 
Planner had other plans. Basically, I tried the interactive method using 
a PREPARE statement with one text parameter followed by an EXECUTE 
statement. Unfortunately, the query still went rummaging across the 
entire database sequentially.


To recap (I still hope there's a chance for a workable answer), here's a 
simple contrived scenario:


Incidents Table
---------------
total      integer    -- total number of human:critter encounters
statecode  char(2)    -- by state
outcome    char(20)   -- outcome of incident {fatal, serious, minor, close call}


Wildlife Table  -- the partitioned table; uses state_pt to separate data tables
  table  1: state_pt = 'AK'
  table  2: state_pt = 'AL'
  ...
  table 50: state_pt = 'WY'
--------------------------
lions     integer    -- total population of lions in the state
tigers    integer    -- total tigers
bears     integer    -- total bears
outcome   char(20)   -- outcome of incident {fatal, serious, minor, close call}
... more attributes ...
state_pt             -- key that separates the subtables in the partitioned database



Some incidents table data:

total | statecode | outcome
------+-----------+------------
    7 | CA        | fatal
   12 | CA        | close call
    3 | CO        | fatal
   16 | CO        | close call
    4 | WY        | minor



If I do the following, it finds the data and goes directly to the right 
table:


SELECT lions, tigers, bears FROM Wildlife
WHERE state_pt = 'CO';

Or this:
\set theState '\''CO'\''


SELECT lions, tigers, bears FROM Wildlife
WHERE state_pt = :theState ;

It works perfectly. But when I tried this:

PREPARE foo (text) AS
SELECT lions, tigers, bears FROM Wildlife
WHERE state_pt = $1;

EXECUTE foo( 'CO' )

It sequentially scanned everything, so I didn't even try the scenario 
I'm trying to create, which is using the results of a select to supply 
the key code to the Planner as to which partition it should search for 
each record:


SELECT lions, tigers, bears FROM Wildlife a JOIN incidents b ON 
a.outcome = b.outcome
WHERE state_pt = statecode;

Apparently, the Planner is pretty fussy about how you refer to one of 
your subtables, but I almost always use queries that involve multiple 
states, and the knowledge of which state table a particular record needs 
to access is carried as one of that record's attributes. I'm surprised 
that there's so little info on this. It seems like an obvious 
application of partitioned databases, but maybe I just don't understand 
how it should be used?


It seems that the only variables it likes are host variables, and there 
may be something down that road...

Anyway, the question is still open if anyone has some good ideas.
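
For the archives, a sketch of what the per-state dynamic approach might 
look like on 8.4, using the contrived scenario above (untested):

CREATE OR REPLACE FUNCTION report_by_state() RETURNS void AS $$
DECLARE
    st char(2);
    n  bigint;
BEGIN
    FOR st IN SELECT DISTINCT statecode FROM incidents LOOP
        -- Because the query string is rebuilt with a literal for each
        -- state, the planner sees a constant and can exclude partitions.
        EXECUTE 'SELECT count(*) FROM Wildlife WHERE state_pt = '
                || quote_literal(st)
        INTO n;
        RAISE NOTICE 'state %: % rows', st, n;
    END LOOP;
END;
$$ LANGUAGE plpgsql;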



Re: [GENERAL] Partitioned Database and Choosing Subtables

2011-03-16 Thread Bill Thoen

On 3/16/2011 12:40 AM, Alban Hertroys wrote:

Try using "dynamic" sql:

EXECUTE 'SELECT lions,  tigers, bears, statecode FROM WildLife WHERE
state_pt = ' || statecode INTO ...;

Thanks, Igor. It was a nice try -- and I thought it would work, but the Planner 
had other plans. Basically, I tried the interactive method using a PREPARE 
statement with one text parameter followed by an EXECUTE statement. 
Unfortunately, the query still went rummaging across the entire database 
sequentially.


That's not the same as using dynamic SQL. In fact, that would behave just like 
your previous, problematic, query.

Dynamic SQL means the planner sees a new query each time, one that needs to be 
planned from scratch.

Thanks for explaining this. I think I can just see what you mean. I can 
also see that I'm way out of my depth here. It's just so much fun when 
you mix deadlines with ignorance.



If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


Good point. I'm feeling a bit chainsaw-ish right now.

--
*Bill Thoen*
GISnet - www.gisnet.com
303-786-9961



[GENERAL] Accessing a database from another database

2009-08-10 Thread Bill Thoen
I've got some GIS tables that I want to use across several projects, and 
currently I've got each project in a separate database. So say I've got 
two databases, one for the Spacely Sprockets project and one for the 
Cogswell Cogs contract, but I want to access my WorldMap tables in my 
GIS database to make maps for both of these projects. How do I 
reference a table that's in another database? Or should I organize my 
PostgreSQL data differently?
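
For reference, the usual answer is the contrib module dblink; a sketch, 
with the database and table names made up:

SELECT id, name
  FROM dblink('dbname=gis', 'SELECT id, name FROM worldmap')
       AS t(id integer, name text);

-- dblink runs the inner query in the other database and returns its
-- rows; the AS clause must declare the result's column types.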


Thanks,
- Bill Thoen




[GENERAL] When to use Vacuum?

2010-06-24 Thread Bill Thoen
I'm putting some large read-only data sets together, and occasionally I 
change my mind about something, drop a table or two, and replace them 
with something different. Do I need to use VACUUM when I drop or 
re-arrange tables? Or does PG release disk space when you drop tables? 
And finally, if the tables are strictly read-only, do I need to use 
VACUUM at all on a regular basis? (Like, do selections take up disk 
space that needs to be cleaned up regularly?) It's my understanding 
that VACUUM only clears or marks for re-use deleted records, but the 
docs say nothing about dropped tables, and I don't want to get into an 
intolerable situation someday (like a full disk).



TIA,
- Bill Thoen




[GENERAL] How to Declare Functions Containing OUT PArameters?

2010-07-14 Thread Bill Thoen
I'm having some difficulty getting plpgsql to recognize a function with 
a couple of OUT parameters. I'm either declaring the function 
incorrectly, making the call to it in the wrong way, or my program is 
simply possessed by evil spirits. I'm using Postgres 8.1.5.
What appears to be happening is that it's declaring the function as if 
it returned a record and had only two parameters, but I'm trying to 
call it with four parameters, with two of them being OUT parameters. So 
the compiler sees two different versions of the function and refuses to 
do anything more. The example below shows the problem, but it's just 
something to exercise the function calls and generate the error. Can 
anyone spot the screw-up in this little example? (The error message is 
listed below in the block comment.)

TIA,
-Bill Thoen

CREATE OR REPLACE FUNCTION fishy( s1 text, s2 text, OUT n integer, OUT f 
real ) AS $$

DECLARE
  c integer;
BEGIN
  c := length( s1 );
  n := length( s1 || s2 );
  f  := c::real / n::real;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION main() RETURNS VOID AS $$
DECLARE
  str1 text;
  str2 text;
  num integer := 0;
  fnum real := 0.0;
BEGIN
  str1 := 'One fish, two fish';
  str2 := 'Shark fish, No fish';
  SELECT fishy( str1, str2, num, fnum) ;

  RAISE NOTICE 'fishy() analysis: %  %', num, fnum;
END;
$$ LANGUAGE plpgsql;

SELECT main();

/*  ERROR MESSAGE

psql:ex_out_fail.sql:28: ERROR:  function fishy(text, text, integer, 
real) does not exist
HINT:  No function matches the given name and argument types. You may 
need to add explicit type casts.

CONTEXT:  SQL statement "SELECT  fishy(  $1 ,  $2 ,  $3 ,  $4 )"
PL/pgSQL function "main" line 9 at SQL statement

And when I run \df from the psql command line, it shows up like this:
 | fishy | record | text, text

*/




Re: [GENERAL] How to Declare Functions Containing OUT PArameters?

2010-07-14 Thread Bill Thoen
Thanks guys. I think I see now. I was thinking it was a more transparent 
pass-by-value / pass-by-reference thing.
Anyway, I solved my problem by going back into my comfort zone and 
explicitly returning a record, and I'm not using OUT parameters. They 
aren't what I thought they were, and I'm working on a tight schedule, so 
I don't have much time to explore. The compiler seems happier without 
them, and when it's happy, I'm happy.
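
For the archives, the call that works with the OUT-parameter version, per 
Tom Lane's note below, would look something like this inside main():

-- Replace the failing line with:
SELECT * INTO num, fnum FROM fishy(str1, str2);
-- The OUT parameters become the function's result columns, so they are
-- received with INTO rather than passed as arguments.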


Tom Lane wrote:
> "Joshua D. Drake" writes:
>> On Wed, 2010-07-14 at 12:21 -0600, Bill Thoen wrote:
>>> I'm having some difficulty getting plpgsql to recognize a function with
>>> a couple of OUT parameters.
>>>
>>> psql:ex_out_fail.sql:28: ERROR:  function fishy(text, text, integer,
>>> real) does not exist
>>
>> You are passing four IN parameters. The out parameters are used in
>> return, are they not?
>
> You'd need to do something like
>
> SELECT * INTO num, fnum FROM fishy(str1, str2) ;
>
> OUT parameters in PG are just syntactic sugar for defining a
> record-returning function.  The call syntax still works as if you'd
> written CREATE FUNCTION foo (IN-parameters-only) RETURNS some-record-type.
>
> regards, tom lane





Re: [GENERAL] Re: What to do if space for database decrease and no additional hard drive is possible?

2010-07-26 Thread Bill Thoen



A. Kretschmer wrote:
> In response to Ivan Voras:
>> * buy external storage (NAS, or even an external USB drive), move the
>> database to it
>
> buy external USB-Drive, and create a new tablespace, and move some large
> table into this new tablespace and/or use the new tablespace for new
> tables. You can also use table-partitioning with different tablespaces.
Can you then unmount that USB drive without causing any damage to the 
other databases?


--
- Bill Thoen 
 GISnet - www.gisnet.com

 303-786-9961




[GENERAL] How Big is Too Big for Tables?

2010-07-28 Thread Bill Thoen
I'm building a national database of agricultural information, and one of 
the layers is a bit more than a gigabyte per state. That's 1-2 million 
records per state, with a multipolygon geometry, and I've got about 40 
states' worth of data. I'm trying to store everything in a single PG table. 
What I'm concerned about is, if I combine every state into one big table, 
will performance be terrible, even with indexes? On the other hand, if I 
store the data in several smaller files, then if a user zooms in on a 
multi-state region, I've got to build or find a much more complicated way 
to query multiple files.

So I'm wondering, should I be concerned about building a single 
national-size table (possibly 80-100 GB) for all these records, or should 
I keep the files smaller and hope there's something like ogrtindex out 
there for PG tables? What do you all recommend in this case? I just moved 
over to Postgres to handle big files, but I don't know its limits. With a 
background working with MS Access and bitter memories of what happens when 
you get near Access's two-gigabyte database size limit, I'm a little 
nervous about these much bigger files. So I'd appreciate anyone's advice 
here.


TIA,
- Bill Thoen



Re: [GENERAL] Need magic for identifieing double adresses

2010-09-16 Thread Bill Thoen



> On 9/16/2010 5:18 AM, Sam Mason wrote:
>> On Thu, Sep 16, 2010 at 04:40:42AM +0200, Andreas wrote:
>>> I need to clean up a lot of contact data because of a merge of customer
>>> lists that used to be kept separate.
>>> I already know that there are double entries within the lists and they
>>> do overlap, too.
>>>
>>> Relevant fields could be  name, street, zip, city, phone
>
> Since you're working with words, a better "fuzzy" match algorithm is
> Levenshtein distance. It's surprisingly good at coming up with an
> index number for words (or even phrases). Calculate the Levenshtein
> distance for two words, and the closer the indexes are to each other the
> more alike the words are. It's well-documented on Wikipedia (and other
> places).


The problem with using something like this is that it's awkward to do
anything apart from the Cartesian product and compare everything with
everything else.  If you've got 500k rows to start with, that's a *lot*
of comparisons to be doing.  The nice thing about using something like
soundex is that it allows you to use equality when comparing and hence
PG has a chance to finish the query in a reasonable amount of time.

If you're dealing with a small number of possibilities, I'd agree that
some form of edit distance is a very useful tool.
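
A sketch combining the two ideas with the contrib module fuzzystrmatch, 
which provides both soundex() and levenshtein() (the contacts table and 
its columns are made up):

SELECT a.id, b.id, a.last_name, b.last_name
  FROM contacts a
  JOIN contacts b
    ON soundex(a.last_name) = soundex(b.last_name)  -- cheap equality join
 WHERE a.id < b.id                                  -- each pair only once
   AND levenshtein(a.last_name, b.last_name) <= 2;  -- fine-grained filter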


> But overall, you're basically normalizing an address list with people
> names. So I first go through the lists with an eye to getting rid of the
> junk (regexp is great for identifying names with "illegal" characters)
> and creating a starting name in uppercase. There's always at least one
> data entry clerk (usually several) who does quirky things like include
> notes in the last name field.


Yup, normalisation is very important when dealing with this sort of
thing.  Reducing free form text down to something more constrained is
important.


> You really need to parse each word,
> correct spelling and consider it in the local context, i.e. does it
> "fit" where it is? (as a geocoding routine would do with street names)
> There are canonical forms for names and though these vary from country
> to country, they make good templates for a name parser.


I've tried doing something like this before, but tended to give up.  I
guess all the matching I've been doing has been small enough that it was
quicker to handle the odd cases by hand than spending more time writing
general purpose code that never seemed to be quite good enough.

-- Sam http://samason.me.uk/




[GENERAL] Need Help With a A Simple Query That's Not So Simple

2011-10-31 Thread Bill Thoen
I think this should be easy, but I can't seem to put the SQL together 
correctly and would appreciate any help. (I'm using Pg 8.4 in CentOS 
5.5, if that matters.)


I have a table of Farms and a table of Crops in a 1:M relationship of 
Farms : Crops. There are lots of different crops to choose from, but for 
now I'm only interested in two crops: corn and soybeans. Some farms grow 
only corn and some grow only soybeans, and some grow both. What I'd like 
to know is, which farms and how many are growing only corn, which and 
how many are growing only soybeans, and which and how many are growing 
both? I can easily get all the corn growers with:


SELECT a.*
  FROM farms a
  JOIN crops b
ON a.farm_id=b.farm_id
 WHERE crop_cd='0041'

I can do the same with soybeans (crop_cd = '0081'), and then I could 
subtract the sum of these from the total of all farms that grow either 
corn or soybeans to get the number of farms growing both, but having to 
do all those queries sounds very time consuming and inefficient. Is 
there a better way to get the farm counts or data by categories like 
farms growing only corn, farms growing only soybeans, or farms growing 
both? I'm also interested in possibly expanding to a general case where 
I could select more than two crops and get counts of the permutations.


Here's a sketch of the relevant pieces of the data base.

*Tables:*
farms crops
=== ===
farm_id  bigint (pkey) crop_id   (pkey)
type farm_idforeign key to farms
size crop_cd0041 = corn 0081=soybeans
...year
...

Any help would be much appreciated.

TIA,

- Bill Thoen





Re: [GENERAL] Need Help With a A Simple Query That's Not So Simple

2011-10-31 Thread Bill Thoen

On 10/31/2011 5:05 PM, David Johnston wrote:

From: pgsql-general-ow...@postgresql.org
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Bill Thoen
Sent: Monday, October 31, 2011 6:51 PM
To: Postgrresql
Subject: [GENERAL] Need Help With a A Simple Query That's Not So Simple

[...]
What I'd like to know is,
which Farms and how many are growing only corn, which and how many are
growing soybeans and which and how many are growing both?
[...]
Is there a better way to
get the farm counts or data by categories like farms growing only corn,
farms growing only soybeans, farms growing both? I'm also interested in
possibly expanding to a general case where I could select more than two
crops. and get counts of the permutations.
[...]
---

General Idea:

WITH crop_one AS (
SELECT farm_id, crop_cd AS crop_one_cd ...
), crop_two AS (
SELECT farm_id, crop_cd AS crop_two_cd
)
SELECT *
FROM crop_one
FULL OUTER JOIN crop_two USING (farm_id)
;

Records with NULL for "crop_one_cd" only grow crop 2, records with NULL for
"crop_two_cd" only grow crop 1, records where neither field is NULL grow
both.

Not sure regarding the general case.  You likely want to use ARRAY_AGG to
get a result like:

Farm_id_100, { 'CROP_CD_1', 'CROP_CD_2' }

You could then probably get a query to output something like:
  (crop_id, farms_exclusive, farms_shared, farms_without)
Where each of the "farms_" columns is an array of farm_ids that match the
particular conditional

= ALL (exclusive); != ALL && = ANY (shared); != ANY (without)

David J.
Thanks David! That worked great! When I filled in the query from the 
"general idea" in your example above like so:


WITH crop_one AS (
    SELECT farm_id, crop_cd AS corn FROM gfc_inilmoidia_2007
    WHERE crop_cd = '0041'
), crop_two AS (
    SELECT farm_id, crop_cd AS soybeans FROM gfc_inilmoidia_2007
    WHERE crop_cd = '0081'
)
SELECT *
FROM crop_one
FULL OUTER JOIN crop_two USING (farm_id)
;

It produced the following (which is essentially the base of what I'm 
looking for):


 farm_id | corn | soybeans
---------+------+----------
    1473 | 0041 | 0081
    1474 | 0041 | 0081
    1474 | 0041 | 0081
    1474 | 0041 | 0081
    1474 | 0041 | 0081
    1475 | 0041 |
    1475 | 0041 |
    1476 | 0041 | 0081
    1476 | 0041 | 0081
    1476 | 0041 | 0081
    1476 | 0041 | 0081
    1476 | 0041 | 0081
    1476 | 0041 | 0081
    1476 | 0041 | 0081
    1476 | 0041 | 0081
    1476 | 0041 | 0081
    1477 | 0041 |
    1478 | 0041 | 0081
    1479 | 0041 |
    1480 |      | 0081
    1480 |      | 0081

Thanks so much for the quick reply. You've also just opened up a whole 
new area of query possibilities for me of which I wasn't aware.
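
Following up on the ARRAY_AGG suggestion, here's a sketch of the more 
general form against the same table (farms growing both crops show both 
codes in the array):

SELECT farm_id, array_agg(DISTINCT crop_cd) AS crops
  FROM gfc_inilmoidia_2007
 WHERE crop_cd IN ('0041', '0081')
 GROUP BY farm_id;

-- e.g. farm 1473 -> {0041,0081}, farm 1475 -> {0041}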


- Bill Thoen





Re: [GENERAL] Re: [pgsql-general] Need Help With a A Simple Query That's Not So Simple

2011-11-01 Thread Bill Thoen
Thanks! Half the problem searching the 'Net for answers is knowing what it's 
called.

Regards,

Bill Thoen
GISnet
http://gisnet.com
303-786-9961

On Nov 1, 2011, at 10:01 AM, "James B. Byrne"  wrote:

> 
>> Date: Mon, 31 Oct 2011 16:51:02 -0600
>> From: Bill Thoen 
>> To: Postgrresql 
>> Subject: Need Help With a A Simple Query That's Not So
>> Simple
>> Message-ID: <4eaf2656.6020...@gisnet.com>
>> 
>> I think this should be easy, but I can't seem to put the
>> SQL together correctly and would appreciate any help.
>> (I'm using Pg 8.4 in CentOS 5.5, if that matters.)
>> 
>> I have a table of Farms and a table of crops in a 1:M
>> relationship of Farms : Crops. There are lots of
>> different crops to choose from, but for now I'm only
>> interested in two crops; corn and soybeans.
>> 
>> Some farms grow only corn and some grow only soybeans,
>> and some grow both.  What I'd like to know is, which
>> Farms and how many are growing only corn, which and
>> how many are growing soybeans and which and how many are
>> growing both? I can easily get all the corn growers with:
>> 
>> SELECT a.*
>>   FROM farms a
>>   JOIN crops b
>> ON a.farm_id=b.farm_id
>>  WHERE crop_cd='0041'
>> 
>> I can do the same with soybeans (crop_cd= '0081') and
>> then I could subtract the sum of these from the total
>> of all farms that grow either corn or soybeans to get
>> the number of farms growing both, but having to
>> do all those queries sounds very time consuming and
>> inefficient. Is there a better way to get the farm
>> counts or data by categories like farms growing only
>> corn, farms growing only soybeans, farms growing
>> both? I'm also interested in possibly expanding to a
>> general case where I could select more than two crops.
>> and get counts of the permutations.
>> 
>> Here's a sketch of the relevant pieces of the data base.
>> 
>> *Tables:*
>> farms crops
>> === ===
>> farm_id  bigint (pkey) crop_id   (pkey)
>> type farm_idforeign key to farms
>> size crop_cd0041 = corn 0081=soybeans
>> ...year
>> ...
>> 
>> Any help would be much appreciated.
>> 
>> TIA,
>> 
>> - Bill Thoen
> 
> I believe that what you are trying to do is called
> relational algebra division. Take a look at these
> references and see if either fits your needs:
> 
> http://en.wikipedia.org/wiki/Relational_algebra#Division_.28.C3.B7.29
> 
> http://www.cs.arizona.edu/~mccann/research/divpresentation.pdf
> 
> 
> 
> 
> 
> -- 
> ***  E-Mail is NOT a SECURE channel  ***
> James B. Byrnemailto:byrn...@harte-lyne.ca
> Harte & Lyne Limited  http://www.harte-lyne.ca
> 9 Brockley Drive  vox: +1 905 561 1241
> Hamilton, Ontario fax: +1 905 561 0757
> Canada  L8E 3C3
> 



[GENERAL] Using the internal data dictionary

2011-11-17 Thread Bill Thoen
I need to assemble a complete data dictionary for project documentation and 
other purposes, and I was wondering about the pros and cons of using the 
pg_catalog metadata. But I hesitate to poke around in there, because I don't 
know why it's kept so out of sight and not much documented. It seems like an 
ideal source of information to tap with a program to generate accurate, 
current reports of what's in the database.

Is this a bad idea? (Everything I'm thinking of doing would be read-only 
except for the description fields.) I'd just like to make sure that there's 
not some innocent-looking table in there that acts as a doomsday device if 
you so much as read its first record, etc. I'm just not sure why this isn't 
more widely used or talked about.

Regards,

Bill Thoen
GISnet
http://gisnet.com
303-786-9961


Re: [GENERAL] Using the internal data dictionary

2011-11-18 Thread Bill Thoen

Thanks, guys!

I'll take a closer look at the information_schema and pgAdmin and 
Maestro. Reinventing the wheel isn't a problem, as this job is not 
critical, but the educational experience of looking at the system from 
another POV may be the bigger prize.
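
A sketch of the kind of dictionary query I have in mind, against 
information_schema:

SELECT table_schema, table_name, column_name, data_type, is_nullable
  FROM information_schema.columns
 WHERE table_schema NOT IN ('pg_catalog', 'information_schema')
 ORDER BY table_schema, table_name, ordinal_position;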


- Bill


On 11/17/2011 8:34 PM, David Johnston wrote:

On Nov 17, 2011, at 22:17, Bill Thoen  wrote:


I need to assemble a complete data dictionary for project documentation and 
other purposes and I was wondering about the pros and cons of using the 
pg_catalog metadata. But I hesitate to poke around in here because I don't know 
why it's kept so out of sight and not much documented. But it seems like an 
ideal source of information to tap with a program to generate accurate, current 
reports of what's in the database.

Is this a bad idea (everything I'm thinking of doing would be read only except 
for the description fields) but I'd just like to make sure that there's not 
some innocent looking table in there that acts as a doomsday device if you so 
much as read its first record, etc.  I'm just not sure why this isn't more 
widely used or talked about.

Regards,

Bill Thoen
GISnet
http://gisnet.com
303-786-9961


information_schema is the more standardized point of entry into the database 
meta-data; pg_catalog is generally intended for internal use and thus has a less 
stable API contract.  That said, you are reinventing the wheel if you are 
looking for a straight dump of the current reality.  Various third-party tools 
already do this.  I've used, but am not affiliated with, PostgreSQL Maestro.  
Also pgAdmin, I think, provides access to this information as well (as does 
psql via its various commands).

You should never directly update the catalog, but instead use the appropriate SQL command. 
 For descriptions you need to use "COMMENT ON".  Reading it should never cause 
a problem.

David J.





[GENERAL] Permission Problems

2011-11-23 Thread Bill Thoen
I'm trying to put together a very simple web application to display 
information about any table in my database that a web user wants to see. 
The general idea is to present a list of schemata and their associated 
tables to the user who then picks one, which causes the server to send 
the list of fields in that table.  Pretty simple.


My thought was to use an internal, low-privileged account to do the 
look-up and pass the requested info back to the client, but I think I'm 
running into permission problems, and I don't know where to add all the 
"GRANTs" so that the go-between account has enough privilege to do its 
job, but not more.


I'm getting the list of schemata from the information_schema.schemata 
table, and using my superuser account it works fine. However, the 
account I've set up for this job isn't getting very far, and I'm getting 
nothing returned. I've granted permissions for SELECT and REFERENCES on 
all of my tables, and granted USAGE on the schemas, including 
information_schema and pg_catalog and the relevant views; I've even 
granted EXECUTE privilege on the functions used in the view, but I'm still 
not getting results for this pseudo-user, even though the SQL selection 
works fine for my account. I'd prefer not to just hand out superuser 
privilege to the database go-between, because my purpose is to keep this 
db-web interface role's reach short.


Am I digging too deep here, or what am I missing? Is there a better way 
to transfer info between my database and the web than by using a generic 
account? It sure seems like I'm granting too much access to too small a 
player. Any advice would be welcome.
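
For what it's worth, a sketch of one way to set up such a low-privilege 
role (the role name webview is made up; adjust schema names to suit):

CREATE ROLE webview LOGIN PASSWORD 'secret';
GRANT USAGE ON SCHEMA public TO webview;
-- Reading pg_catalog and information_schema is allowed by default;
-- per-table SELECT is what actually needs granting:
GRANT SELECT ON ALL TABLES IN SCHEMA public TO webview;  -- 9.0+ syntax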


TIA,
 - Bill Thoen