Re: [GENERAL] Strange discrepancy in query performance...

2007-10-01 Thread Jason L. Buberel

Tom-right-as-usual:

Yep - you were right: the query plan for the prepared statement (a 
sequential scan of the table) differed a bit from the directly-executed 
version :)


For reference, when using JasperReports .jrxml files as the basis for 
the query, I only had to do the following to 'force' postgres to 
treat the Jasper report parameter as a number and not text, thereby 
allowing the correct index to be used:


select * from city_summary where city_master_id = 
$P{city_master_id}::bigint ...


Query times went from 300+ seconds back down to ~100ms.

-jason

Tom Lane wrote:

"Jason L. Buberel" <[EMAIL PROTECTED]> writes:
  
In my syslog output, I see entries indicating that the 
JDBC-driver-originated query on a table named 'city_summary' are taking 
upwards of 300 seconds:



  

Oct  1 18:27:47 srv3 postgres-8.2[1625]: [12-1]
LOG:  duration: 307077.037 ms  execute S_42: select * from city_summary
  where  state = $1 and city_master_id = $2 and res_type = 
'single_family' and date = $3

  and range = 90 and zip = $4 and quartile  = '__ALL'
DETAIL:  parameters: $1 = 'CA', $2 = '291', $3 = '2007-09-28', $4 = '__ALL'



  
However, if I run the same query on the same host at the same time that 
the Java application is running, but from the psql  command line, it 
takes only 0.37 seconds:



  
time /opt/postgres-8.2.4/bin/psql --port 54824  -U postgres -d 

altos_research  -c 'select fact_id from city_summary where state = 
\'CA\' and city_master_id = 291 and zip = \'__ALL\' and quartile = 
\'__ALL\' and res_type = \'single_family\' and range = \'90\' and date = 
\'2007-09-28\';'



This is not, in fact, the same query --- the JDBC-originated one is
parameterized, which means it very possibly has a different plan
(since the planner doesn't know the particular values to plan for).

Try using PREPARE and EXPLAIN EXECUTE to examine the plan that is
being produced for the parameterized query.

regards, tom lane
  


Re: [GENERAL] Strange discrepancy in query performance...

2007-10-01 Thread Tom Lane
"Jason L. Buberel" <[EMAIL PROTECTED]> writes:
> In my syslog output, I see entries indicating that the 
> JDBC-driver-originated query on a table named 'city_summary' are taking 
> upwards of 300 seconds:

> Oct  1 18:27:47 srv3 postgres-8.2[1625]: [12-1]
> LOG:  duration: 307077.037 ms  execute S_42: select * from city_summary
>   where  state = $1 and city_master_id = $2 and res_type = 
> 'single_family' and date = $3
>   and range = 90 and zip = $4 and quartile  = '__ALL'
> DETAIL:  parameters: $1 = 'CA', $2 = '291', $3 = '2007-09-28', $4 = '__ALL'

> However, if I run the same query on the same host at the same time that 
> the Java application is running, but from the psql  command line, it 
> takes only 0.37 seconds:

>>> time /opt/postgres-8.2.4/bin/psql --port 54824  -U postgres -d 
> altos_research  -c 'select fact_id from city_summary where state = 
> \'CA\' and city_master_id = 291 and zip = \'__ALL\' and quartile = 
> \'__ALL\' and res_type = \'single_family\' and range = \'90\' and date = 
> \'2007-09-28\';'

This is not, in fact, the same query --- the JDBC-originated one is
parameterized, which means it very possibly has a different plan
(since the planner doesn't know the particular values to plan for).

Try using PREPARE and EXPLAIN EXECUTE to examine the plan that is
being produced for the parameterized query.
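
For example, a minimal sketch mirroring the logged query (the parameter
types here are assumptions; to reproduce the driver's behaviour you would
match whatever types it actually binds):

PREPARE s42 (text, bigint, text, text) AS
    select * from city_summary
    where state = $1 and city_master_id = $2 and res_type = 'single_family'
      and date = $3 and range = 90 and zip = $4 and quartile = '__ALL';
EXPLAIN EXECUTE s42 ('CA', 291, '2007-09-28', '__ALL');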

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[GENERAL] Strange discrepancy in query performance...

2007-10-01 Thread Jason L. Buberel
I'm hoping that someone on the list can help me understand an apparent 
discrepancy in the performance information that I'm collecting on a 
particularly troublesome query.


The configuration: pg-8.2.4 on RHEL4. log_min_duration_statement = 1m.

In my syslog output, I see entries indicating that the 
JDBC-driver-originated query on a table named 'city_summary' are taking 
upwards of 300 seconds:


Oct  1 18:27:47 srv3 postgres-8.2[1625]: [12-1]
LOG:  duration: 307077.037 ms  execute S_42: select * from city_summary
 where  state = $1 and city_master_id = $2 and res_type = 
'single_family' and date = $3

 and range = 90 and zip = $4 and quartile  = '__ALL'
DETAIL:  parameters: $1 = 'CA', $2 = '291', $3 = '2007-09-28', $4 = '__ALL'

However, if I run the same query on the same host at the same time that 
the Java application is running, but from the psql  command line, it 
takes only 0.37 seconds:


> time /opt/postgres-8.2.4/bin/psql --port 54824  -U postgres -d 
altos_research  -c 'select fact_id from city_summary where state = 
\'CA\' and city_master_id = 291 and zip = \'__ALL\' and quartile = 
\'__ALL\' and res_type = \'single_family\' and range = \'90\' and date = 
\'2007-09-28\';'


fact_id
--
46624925
(1 row)

0.00user 0.00system 0:00.37elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+285minor)pagefaults 0swaps

The output of 'explain' seems to indicate that the right index is being 
used:


QUERY PLAN
Index Scan using city_summary_pkey on city_summary  (cost=0.00..12.27 
rows=1 width=2721)

  Index Cond: ((date = '2007-09-28'::text) AND
   (state = 'CA'::text) AND
   (city_master_id = 334::bigint) AND
   (quartile = '__ALL'::text) AND
   (range = '90'::text))
  Filter: ((zip = '__ALL'::text) AND ((res_type)::text = 
'single_family'::text))

(3 rows)

The index looks like this:

# \d city_summary_pkey
Index "public.city_summary_pkey"
       Column        |  Type
---------------------+---------
 date                | text
 state               | text
 city_master_id      | bigint
 zip_master_id       | integer
 res_type_master_id  | bigint
 quartile            | text
 range               | text
primary key, btree, for table "public.city_summary"

Any ideas on why I am seeing such a big difference between the two 
measurements (JDBC/syslog vs. command line)?


Thanks,
Jason






---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] windows and pg 8.2 (change database to another server)

2007-10-01 Thread Terry Yapt

Magnus Hagander wrote:

On Sun, Sep 30, 2007 at 11:36:23PM +0200, Terry Yapt wrote:

First of all, I think this method is admissible. Isn't it?



It is.
  


Glad to read it  :-)

And second question: I think my problem is that some rights are wrong 
after copying data folder.  What are the right rights to apply to data 
folder ?



Yes, most likely. You need to grant the postgres service account "Change"
permissions (or Full Control, but Change is recommended) on the data
directory. If you didn't do anything specifically, it will just have
inherited from further up in the tree, which means that the service account
only has "Read" access.

//Magnus
  


I have tried a couple of combinations; none of them was successful.

I have tried to assign 'Full Control' to data folder and sub-folders and 
files.  Varying this 'Full Control' preserving inheritance, deleting 
inheritance.  I have tried to assign 'Full Control' to Administrators 
and SYSTEM accounts/groups too.  I have tried to do the same thing over 
sub-folders, files and so on.


Nothing was ok.

I got these application errors (in reverse chronological order, 
1 = most recent):


= 1 
postgres: could not find the database system
Expected to find it in the directory "C:/Archivos de 
programa/PostgreSQL/8.2/data",
but could not open file "C:/Archivos de 
programa/PostgreSQL/8.2/data/global/pg_control": Permission denied

= 2 
-2007-10-01 23:19:31 PANIC:  could not open control file 
"global/pg_control": Permission denied

= 3 
--2007-10-01 23:17:28 FATAL:  could not create lock file 
"postmaster.pid": Permission denied

===


Greetings...

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] Dump of table structure is not consistent

2007-10-01 Thread Poul Møller Hansen



\d my.table is showing me the primary key as:
"unitstat_pkey" PRIMARY KEY, btree (id)

But looking into the table structure produced by
pg_dump -s -n my -t table db

I'm getting the primary key shown as:
ADD CONSTRAINT unistat_pkey PRIMARY KEY (id);

That used to be its name, but I altered it by executing:
ALTER TABLE unistat_pkey RENAME TO unitstat_pkey


Solved it with this statement
update pg_catalog.pg_constraint set conname = 'unitstat_pkey' where 
conname = 'unistat_pkey'


Poul



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] sha1 function

2007-10-01 Thread Phoenix Kiula
On 01/10/2007, Martin Marques <[EMAIL PROTECTED]> wrote:
>
> 1) Is MD5's weakness true?


Yes, but not really for use in password functionality. You are
very unlikely to repeat a password, and in any case you will have the
user ID to make it unique.


> 2) Is there any sha1() functions in PostgreSQL?
>


It doesn't come with the default setup, but it's very simple to get.
Take a look at this: http://snipr.com/pg_sha1
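
For reference, a minimal sketch using the pgcrypto contrib module
(assuming it is installed in your database); digest() returns a bytea,
which encode() turns into a printable hex string:

SELECT encode(digest('my secret password', 'sha1'), 'hex');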

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Partitioned table limitation

2007-10-01 Thread Scott Marlowe
On 10/1/07, Goboxe <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Are there any limitations on number of child tables that can be use
> in
> partitioned table?
>
>
> I am currently having weekly partitioned tables (using partitioned
> view in SQL Server) that I kept for 2 years.
> In total, there will be 52 * 2 = 104 tables exist at one time in the
> partition.
>
>
> I am migrating from SQL Server to pg. Can pg support that number of
> tables?
>
>
> How about if I want to convert it as daily tables (356 * 2 years =
> 712
> tables).
> Is this recommended?

I've played around with as many as 1,000 child tables.  By then, the
planning time becomes noticeably longer than for a single table, but
the response time is still so much faster that it's worth it.  Note
I'm talking only a fraction of a second planning time, even at 1,000
tables.

If you are going over 100 tables, make sure you're using triggers for
updating the child tables, not rules, as rules are far too slow when
there are over 50 or so tables to choose from.
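
A minimal sketch of trigger-based routing into a child table (the table,
column, and partition names here are hypothetical):

CREATE OR REPLACE FUNCTION measurements_insert_trigger() RETURNS trigger AS $$
BEGIN
    -- route each row to the child table covering its week
    IF NEW.logdate >= DATE '2007-10-01' AND NEW.logdate < DATE '2007-10-08' THEN
        INSERT INTO measurements_2007_w40 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'no partition for date %', NEW.logdate;
    END IF;
    RETURN NULL;  -- the row is not also stored in the parent table
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER measurements_insert
    BEFORE INSERT ON measurements
    FOR EACH ROW EXECUTE PROCEDURE measurements_insert_trigger();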

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Porting Schema from MySQL

2007-10-01 Thread Michael Glaesemann


On Oct 1, 2007, at 14:54 , Farhan Khan wrote:


Any pointer for implementing this functionality in postgresql ??


Write a trigger that fires on update and replaces the value with  
CURRENT_TIMESTAMP.


You could probably do something like this with rules as well, but the  
trigger method is probably more straightforward.
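
A minimal sketch of such a trigger, using the mtime column from the
original post (the table name here is hypothetical):

CREATE OR REPLACE FUNCTION touch_mtime() RETURNS trigger AS $$
BEGIN
    NEW.mtime := CURRENT_TIMESTAMP;  -- overwrite whatever the client supplied
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER set_mtime
    BEFORE UPDATE ON my_table
    FOR EACH ROW EXECUTE PROCEDURE touch_mtime();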


Michael Glaesemann
grzm seespotcode net



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] It's time to support GRANT SELECT,UPDATE,...,...,... ON database.* to username

2007-10-01 Thread Scott Marlowe
On 9/30/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>
> I don't care if it's part of the SQL standard or not. I don't care if
> oracle does it or not.  You're losing mysql converts as they go
> through the tutorial and get to this point.

If that's all it takes for them to switch, seriously, I'd rather see
them go.  There are other solutions, like using a role instead of a
user for the target of the permissions.

> There's like a hundred posts asking for this for the past four years,
> and all they get in response is standards nazi's saying it won't be
> supported because it's not in the standard

Ummm.  no.  I've never seen that response.  I have seen plenty of
people saying that no one's had the urge to hack it into working code.
 Also, a common answer is to use roles (i.e. groups) for such things:
grant permission to the role, add users to the role, all done.
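
A minimal sketch of the role-based approach (the role, table, and user
names here are hypothetical):

CREATE ROLE app_readers NOLOGIN;
GRANT SELECT, UPDATE ON some_table TO app_readers;
GRANT app_readers TO alice;  -- alice now gets the role's permissions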

> and telling them to write
> their own functions.

About 75% of the time I see that response, it comes with the actual
code to do just that.  I.e. cut and paste and voila, you've got the
functions.

> You write the function. Fuck the standard and
> wake up.

Me?  What about you?  The fact is there's a limited number of hackers
capable of writing what you're asking for cleanly and correctly, and
they're working on other stuff.  Asking them politely has been known to
work.  Using the F word, not so much.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[GENERAL] Porting Schema from MySQL

2007-10-01 Thread Farhan Khan
Hi ... I am porting a db schema from MySQL to postgresql and having problem
in defining a column level constraint...

mtime TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,

gives me problem at  ^ON UPDATE ...

Any pointer for implementing this functionality in postgresql ??

Tx .. Farhan


Re: [GENERAL] Find clusters containing a schema?

2007-10-01 Thread Alvaro Herrera
Josh Trutwin wrote:
> Is it possible to somehow query the system catalog to find out which
> clusters/databases have a certain schema?

No, unless you connect to each database in turn.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Data cube in PostgreSQL

2007-10-01 Thread Dimitri Fontaine
Hi,

On Wednesday 26 September 2007 20:58:38, Gowrishankar wrote:
> Is there any other way of extending postgresql to include cubes?

Something like the cube contrib?
  http://developer.postgresql.org/cvsweb.cgi/pgsql/contrib/cube/

Contribs are part of the PostgreSQL sources and maintained as such, but not 
included in the 'core' project. They are certainly distributed along with 
PostgreSQL in your packaging system of choice.

Regards,
-- 
dim

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] importing large files

2007-10-01 Thread Dimitri Fontaine
Hi,

On Friday 28 September 2007 10:22:49, [EMAIL PROTECTED] wrote:
> I need to import between 100 millions to one billion records in a
> table. Each record is composed of  two char(16) fields. Input format
> is a huge csv file.I am running on a linux box with 4gb of ram.
> First I create the table. Second I 'copy from' the cvs file. Third I
> create the index on the first field.
> The overall process takes several hours. The cpu seems to be the
> limitation, not the memory or the IO.
> Are there any tips to improve the speed ?

If you don't need to fire any trigger and trust the input data, then you may 
benefit from the pgbulkload project:
  http://pgbulkload.projects.postgresql.org/

The "conditions of usage" may be lighter than what I think they are, though.
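
If you stay with plain COPY, a minimal sketch of the usual load order
(the file path, table name, and setting value here are illustrative):

SET maintenance_work_mem = '512MB';          -- speeds up the index build
CREATE TABLE pairs (a char(16), b char(16));
COPY pairs FROM '/tmp/input.csv' WITH CSV;   -- load first, index afterwards
CREATE INDEX pairs_a_idx ON pairs (a);
ANALYZE pairs;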

Regards,
-- 
dim

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] PostgreSQL Conference Fall 2007, final schedule

2007-10-01 Thread Joshua D. Drake
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Matthew T. O'Connor wrote:
> Joshua D. Drake wrote:
>> The PostgreSQL Conference Fall 2007 to date, has been a huge success. Of
>> course we haven't actually held the conference yet but already we have a
>> strong line of speakers and sponsors confirmed.
> [ snip ]
> 
> I can't attend, but wish I could, is there going to be a web cast?  Or
> some way to watch it later / after the fact?
> 

We are planning on recording all of the talks.

Joshua D. Drake

> Thanks,
> 
> Matt
> 
> 


- --

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997  http://www.commandprompt.com/
UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHAUlQATb/zqfZUUQRAqKCAKCsP1vdloallXCLhlHJ3nZxM6tN2ACfe8Gw
1RPgSO7Tu6x4FbaYUFO281o=
=E6Hl
-END PGP SIGNATURE-

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] PostgreSQL Conference Fall 2007, final schedule

2007-10-01 Thread Matthew T. O'Connor

Joshua D. Drake wrote:

The PostgreSQL Conference Fall 2007 to date, has been a huge success. Of
course we haven't actually held the conference yet but already we have a
strong line of speakers and sponsors confirmed.

[ snip ]

I can't attend, but wish I could, is there going to be a web cast?  Or 
some way to watch it later / after the fact?


Thanks,

Matt



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] usage of indexes for inner joins

2007-10-01 Thread Jan Theodore Galkowski
thanks for all your useful comments.  i will study all of them.

a couple of inline comments below, just for clarification to the group,
marked with asterisks.

On Mon, 1 Oct 2007 13:13:23 -0500, "Scott Marlowe"
<[EMAIL PROTECTED]> said:
> On 10/1/07, Jan Theodore Galkowski <[EMAIL PROTECTED]> wrote:
> > Scott,
> >
> > i didn't think this belonged in the general list, but the example i
> > gave for discussion was a toy, for illustration.  i could not very
> > well post the actual example for many reasons, including proprietary
> > ones and, given this is how things work, because the 1.5 million row
> > table in question is its own smallest description.
>
> This is the exact kind of question that belongs on -general.  But it
> does get asked a lot, especially by people coming from other
> databases.
>
> > while indexes are being used on that table, there's a companion
> > table which is much smaller -- a "mere" 75000 rows -- which is
> > suffering a sequential scan, and i was trying to eliminate those.
>
> Well, don't think of sequential scans as plain bad.  Sometimes they're
> the best choice, sometimes they're not.
>
> Now, if an index scan is provably and noticeably faster than the
> sequential scan, then the planner is making the wrong decision.  Have
> you tried running your query with
>
> set enable_seqscan=off;

***actually, yes.  the engine just ignored it.***
>
> to see how it behaves?  I've found many somewhat slow queries got
> really fast or really slow when I did that.
>
> Note that you shouldn't blindly run a query all the time with that
> setting, as there are many instances where seqscan is the right
> answer.  Also, your explain cost estimates will all be way off.
>
> > perhaps it is true that ANALYZE isn't being done often enough.
> perhaps VACUUMs aren't being done often enough either.  we're leery
> > of scheduling repeated VACUUMs having encountered a case where the
> > VACUUM took over an hour to complete.
>
> Run "analyze verbose" on your db and see what it says about number of
> page slots needed versus used.  that will help you tell if you're
> vacuuming enough.
>
> How long vacuum takes isn't really that important.  What is important
> is how much of an impact it's having on the system.  there are
> several vacuum parameters in the postgresql.conf file that can lower
> the impact vacuum has on your system I/O wise while increasing its
> run time.
>
> Vacuum full is another story.  Think of it as a recovery tool, not a
> periodic maintenance tool.
>
> > it may, too, be because the tables use user-defined types heavily
> > and the original UPDATE involved a SELECT ... IN ... having a GROUP
> > BY with a few references to columns deep within user-defined types.
>
> Hard to say without a query and an explain analyze output.  It's
> common for user defined functions to produce estimates in the
> planner that are way off.  user defined types, not so much.  But the
> more complex the query the more likely it is that the query planner
> will make a bad estimate of the number of rows somewhere and choose
> a bad method.
>
> >  that wouldn't have been my choice, but, then, they were given to
> >  me to work, not my design.  in fact, PG is the first relational
> >  database implementation i've used that offered such things in a
> >  big way.
>
> Extensibility is quite a useful tool.
>
> > i also don't understand some other things, which are surprising,
> > like why some UPDATEs take so much longer when wrapped in a BEGIN
> > TRANSACTION- COMMIT than when having the transaction at a statement
> > level.
>
> that is strange.  I'd expect that maybe you've got something happening
> with the transaction waiting on other transactions, so that it's not
> so much running hard as just tapping its toe waiting for the other
> transaction to commit or roll back.

*** yes, i thought it was odd, too.  there wasn't anything else in that
transaction, and the table was set up for an experiment.  of course, the
experiment was one of those "UPDATE foo SET foo.x = 1 + foo.x WHERE
foo.y < k" things. ***
>
> > I come from an Oracle, DB2, Informix world, and in my experience
> > plans for queries are more stable.   i have loitered in and around
> > MySQL for a while.   i'm not surprised there's a learning curve with
> > PG.  i am surprised it breaks so markedly with mainstay database
> > experience.
>
> Oh, I've seen Oracle get stupid due to lack of proper statistics as
> well.  You likely had a good DBA who kept all that stuff hidden from
> you  though.

No comment, actually, since I worked right alongside the DBA, and
sometimes did things myself.

>  But PostgreSQL and mainline db experience are often
> incompatible.  The very things that people are used to using to make
> other dbs fast (forcing index usage for instance) can make postgresql
> noticeably slower.
>
> You might find that partial indexes help in some circumstances.  If you
> are using a query that has a where clause that looks at a field that
>

[GENERAL] PostgreSQL Conference Fall 2007, final schedule

2007-10-01 Thread Joshua D. Drake
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hello,

The PostgreSQL Conference Fall 2007 to date, has been a huge success. Of
course we haven't actually held the conference yet but already we have a
strong line of speakers and sponsors confirmed.

If you haven't registered, now is your chance, just visit:

http://www.postgresqlconference.org

Schedule:

*  8:00 - 8:45 - Coffee / Breakfast / Social / Wake up / Go back to
hotel for socks (provided by conference)
* 8:45 - 9:00 - Joshua Drake - A word from our sponsors
* 9:00 - 9:25 - JoshB - Welcome to 8.3
* 9:25 - 10:20 - David Wheeler - Web 2.0 (Rails) applications with
PostgreSQL
* -- 10 minute break --
* 10:30 - 11:20 - Robert Hodges - Scaling PostgreSQL Performance with
uni/cluster
* 11:20 - 12:10 - Neil Conway - Understanding Query Execution in PostgreSQL
* 12:10 - 13:15 - Lunch
* 13:15 - 13:45 - Mark Wong - PostgreSQL Performance
* 13:45 - 14:15 - Joshua Drake - PL/Proxy and Horizontal Scaling
* 14:15 - 15:05 - Web Sprague - PostGIS (geographic database)
* -- 10 minute break --
* 15:15 - 16:05 - David Fetter - Babel of procedural languages
* 16:05 - 17:00 - Robert Treat - PostgreSQL Partitioning, semantics,
pitfalls and implementation
* 17:00 - 17:25 - Josh Berkus - Stupid Solaris tricks
* 17:25 - 17:30 - Closing Remarks, Thanks, Where's the party?
* 17:30 - 18:00 - Get to party/dinner (provided by conference)
* 18:00 -- Dinner/Party till they kick us out

And once again, thanks to our sponsors:

Command Prompt: http://www.commandprompt.com/
Continuent: http://www.continuent.com/
EnterpriseDB: http://www.enterprisedb.com/
Greenplum: http://www.greenplum.com/
OmniTI: http://www.omniti.com/
OTG: http://www.otg-nc.com/
Sun: http://www.sun.com/
Truviso: http://www.truviso.com/

Other Sponsors:

PDXPUG: http://pugs.postgresql.org/pdx
PSU: http://www.pdx.edu

Sincerely,

Joshua D. Drake



- --

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997  http://www.commandprompt.com/
UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHATlOATb/zqfZUUQRArmuAJ0e+hPMkVRkAfkz7k5RPgcHnweqvwCeJfC8
vOeSNyxle7KdoHRp1c0rauE=
=HUle
-END PGP SIGNATURE-

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] usage of indexes for inner joins

2007-10-01 Thread Scott Marlowe
On 10/1/07, Jan Theodore Galkowski <[EMAIL PROTECTED]> wrote:
> Scott,
>
> i didn't think this belonged in the general list, but the example i gave
> for discussion was a toy, for illustration.  i could not very well post
> the actual example for many reasons, including proprietary ones and,
> given this is how things work, because the 1.5 million row table in
> question is its own smallest description.

This is the exact kind of question that belongs on -general.  But it
does get asked a lot, especially by people coming from other
databases.

> while indexes are being used on that table, there's a companion table
> which is much smaller -- a "mere" 75000 rows -- which is suffering a
> sequential scan, and i was trying to eliminate those.

Well, don't think of sequential scans as plain bad.  Sometimes they're
the best choice, sometimes they're not.

Now, if an index scan is provably and noticeably faster than the
sequential scan, then the planner is making the wrong decision.  Have
you tried running your query with

set enable_seqscan=off;

to see how it behaves?  I've found many somewhat slow queries got
really fast or really slow when I did that.

Note that you shouldn't blindly run a query all the time with that
setting, as there are many instances where seqscan is the right
answer.  Also, your explain cost estimates will all be way off.

> perhaps it is true that ANALYZE isn't being done often enough.  perhaps
> VACUUMs aren't being done often enough either.  we're leery of
> scheduling repeated VACUUMs having encountered a case where the VACUUM
> took over an hour to complete.

Run "analyze verbose" on your db and see what it says about number of
page slots needed versus used.  that will help you tell if you're
vacuuming enough.

How long vacuum takes isn't really that important.  What is important
is how much of an impact it's having on the system.  there are several
vacuum parameters in the postgresql.conf file that can lower the
impact vacuum has on your system I/O wise while increasing its run
time.

Vacuum full is another story.  Think of it as a recovery tool, not a
periodic maintenance tool.

> it may, too, be because the tables use user-defined types heavily and
> the original UPDATE involved a SELECT ... IN ... having a GROUP BY with
> a few references to columns deep within user-defined types.

Hard to say without a query and an explain analyze output.  It's
common for user defined functions to produce estimates in the planner
that are way off.  user defined types, not so much.  But the more
complex the query the more likely it is that the query planner will
make a bad estimate of the number of rows somewhere and choose a bad
method.

>  that
> wouldn't have been my choice, but, then, they were given to me to work,
> not my design.  in fact, PG is the first relational database
> implementation i've used that offered such things in a big way.

Extensibility is quite a useful tool.

> i also don't understand some other things, which are surprising, like
> why some UPDATEs take so much longer when wrapped in a BEGIN
> TRANSACTION-
> COMMIT than when having the transaction at a statement level.

that is strange.  I'd expect that maybe you've got something happening
with the transaction waiting on other transactions, so that it's not
so much running hard as just tapping its toe waiting for the other
transaction to commit or roll back.

> I come from an Oracle, DB2, Informix world, and in my experience
> plans for queries are more stable.   i have loitered in and around MySQL
> for a while.   i'm not surprised there's a learning curve with PG.  i am
surprised it breaks so markedly with mainstay database experience.

Oh, I've seen Oracle get stupid due to lack of proper statistics as
well.  You likely had a good DBA who kept all that stuff hidden from you
though.  But PostgreSQL and mainline db experience are often
incompatible.  The very things that people are used to using to make
other dbs fast (forcing index usage for instance) can make postgresql
noticeably slower.

You might find that partial indexes help in some circumstances.  If you
are using a query that has a where clause that looks at a field that
has one value 99% of the time and another value 1% of the time, you
can index that 1% only, and an index scan will be ultra quick.  The
standard case for that is a boolean field.

create table test (id int, info text, btest bool);
-- populate with 100,000 rows, roughly 1% having btest = true, the rest false
insert into test
    select g, 'info ' || g::text, (g % 100 = 0) from generate_series(1, 100000) g;
create index test_btest_true on test(btest) where btest IS TRUE;
analyze test;
explain analyze select * from test where btest is true;

Generally, postgresql offers different ways to solve the same problems
as other database, and knowing those ways can really help troubleshoot
and fix poorly performing queries.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so 

Re: [GENERAL] Upgrading PG

2007-10-01 Thread Erik Jones

On Oct 1, 2007, at 12:26 PM, Gauthier, Dave wrote:

I’m going to move from v7.4.13 to v8.2.0 (suse-64). I have 2 pre- 
existing DBs.  Do I need to “convert” or port them to v8 in any way  
after I start up with a v8 postmaster?



Thanks


Moving between major release versions requires that you do a dump/restore
using the new version's pg_dump and pg_restore apps.
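
A minimal sketch of that procedure (the paths, ports, and database name
here are hypothetical), using the new installation's binaries against the
old server:

# create the target database on the new server first, e.g. with createdb
/opt/postgres-8.2.4/bin/pg_dump --port 5432 -Fc -f mydb.dump mydb
/opt/postgres-8.2.4/bin/pg_restore --port 5433 -d mydb mydb.dump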


Erik Jones

Software Developer | Emma®
[EMAIL PROTECTED]
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] more problems with count(*) on large table

2007-10-01 Thread Gregory Stark

"Mike Charnoky" <[EMAIL PROTECTED]> writes:

> Here is the output from EXPLAIN ANALYZE.  This is the same query run
> back to back, first time takes 42 minutes, second time takes less than 2
> minutes!

That doesn't really sound strange at all. It sounds like you have a very slow
disk and a very large amount of memory. 40 minutes to scan 11.4M records sounds
kind of high to me though.

How wide are these records anyways? That is, what is the table definition for
this table? If this is a single consumer drive 42 minutes sounds about right
for 2k wide records being randomly accessed.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Upgrading PG

2007-10-01 Thread Joshua D. Drake
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Gauthier, Dave wrote:
> I'm going to move from v7.4.13 to v8.2.0 (suse-64). I have 2
> pre-existing DBs.  Do I need to "convert" or port them to v8 in any way
> after I start up with a v8 postmaster?  
> 

1. v8.2.0 is a mistake, make sure you are running the latest stable
patch rev. v8.2.5.

2. You will need to dump and reload, use the v8.2.5 pg_dump to connect
to the v7.4.13 databases and restore.

3. There are specific behavior differences. The one that comes to mind
most prominently is that float/double can no longer be ''.

4. Read the release notes from the docs

5. 8.2 is a completely different beast in terms of performance and
configuration.

Sincerely,

Joshua D. Drake



> 
> Thanks
> 
> -dave
> 
> 


- --

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997  http://www.commandprompt.com/
UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHATIlATb/zqfZUUQRAh0AAJ9+m0Oq5/EmytFGe2A9QR+8UDlOfwCcCGQ6
VRxeRHkq6GlWfviZcdCPKJ0=
=SRz3
-END PGP SIGNATURE-

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] Upgrading PG

2007-10-01 Thread Alan Hodgson
On Monday 01 October 2007, "Gauthier, Dave" <[EMAIL PROTECTED]> wrote:
> I'm going to move from v7.4.13 to v8.2.0 (suse-64). I have 2
> pre-existing DBs.  Do I need to "convert" or port them to v8 in any way
> after I start up with a v8 postmaster?
>

All major version upgrades require a dump and reload. You should read all 
the update notes for intermediate versions.

You may also run into some bad-unicode data issues moving from 7.4 to 8.x. I 
would do a test run on a separate machine before committing to a conversion 
date (always a good idea IMO).

-- 
Ghawar is dying


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] Partitioned table limitation

2007-10-01 Thread paul rivers

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:pgsql-general-
> [EMAIL PROTECTED] On Behalf Of Goboxe
> Sent: Monday, October 01, 2007 2:18 AM
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] Partitioned table limitation
> 
> Hi,
> 
> Are there any limitations on number of child tables that can be use
> in
> partitioned table?
> 
> [snip]

We currently use partitioning by date and id, with a quarter of a year of dates and
approximately 10 IDs (and slowly increasing).  Each partition runs from
around 1 million to 20 million rows.   

Whether it's recommended or not, I don't know.  But for us, the partitioning
works exactly as advertised.  As with anything new, I'd take the time to
setup a simple test to see if it works for you, too.

In particular, be sure to check the documentation on caveats.  You'll find
these a little stricter than partitioning issues in Oracle or SQL Server.  

HTH,
Paul



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] more problems with count(*) on large table

2007-10-01 Thread Alvaro Herrera
Mike Charnoky wrote:
> This is strange... count(*) operations over a period of one day's worth
> of data now take ~1-2 minutes to run or ~40 minutes.  It seems that the
> first time the data is queried it takes about 40 minutes.  If I try the
> query again, it finishes in 1-2 minutes!

This is just cache effect.



-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] more problems with count(*) on large table

2007-10-01 Thread Bill Moran
In response to Mike Charnoky <[EMAIL PROTECTED]>:

> This is strange... count(*) operations over a period of one day's worth
> of data now take ~1-2 minutes to run or ~40 minutes.  It seems that the
> first time the data is queried it takes about 40 minutes.  If I try the
> query again, it finishes in 1-2 minutes!

This sounds like a caching issue.  My guess at what's happening is that
other operations are pushing this data out of the shared_buffers, so
when you run it, the system has to pull a bunch of tuples off the disk
to check them.  If you run it again immediately, the tuples are still in
memory, and it runs very fast.

If this is the case, you can speed up things by adding RAM/shared_buffers,
or by moving to faster disks.  The RAM solution is going to give you the
biggest performance improvement.

However, if there's enough other data on this system, you may have
difficulty getting enough RAM to mitigate the problem, in which case,
faster disks are going to be your best bet.

How much RAM do you have, and how much of it is allocated to shared_buffers?
What's your IO subsystem look like?

> Again, nothing else is happening on this db server except for a constant
> insertion into this table and a few others.  I have done "set statistics
> 100" for the evtime field in this table.
> 
> Here is the output from EXPLAIN ANALYZE.  This is the same query run
> back to back, first time takes 42 minutes, second time takes less than 2
> minutes!
> 
> mydb=# explain analyze select count(*) from prediction_accuracy where
> evtime between '2007-09-29' and '2007-09-30';
> 
>   QUERY PLAN
> 
> ---
>  Aggregate  (cost=3.02..3.03 rows=1 width=0) (actual
> time=2549854.351..2549854.352 rows=1 loops=1)
>->  Index Scan using pred_acc_evtime_index on prediction_accuracy
> (cost=0.00..3.02 rows=1 width=0) (actual time=97.676..2532824.892
> rows=11423786 loops=1)
>  Index Cond: ((evtime >= '2007-09-29 00:00:00-07'::timestamp
> with time zone) AND (evtime <= '2007-09-30 00:00:00-07'::timestamp with
> time zone))
>  Total runtime: 2549854.411 ms
> (4 rows)
> 
> Time: 2549943.506 ms
> mydb=# explain analyze select count(*) from prediction_accuracy where
> evtime between '2007-09-29' and '2007-09-30';
> 
>  QUERY PLAN
> 
> -
>  Aggregate  (cost=3.02..3.03 rows=1 width=0) (actual
> time=111200.943..111200.944 rows=1 loops=1)
>->  Index Scan using pred_acc_evtime_index on prediction_accuracy
> (cost=0.00..3.02 rows=1 width=0) (actual time=36.396..96347.483
> rows=11423786 loops=1)
>  Index Cond: ((evtime >= '2007-09-29 00:00:00-07'::timestamp
> with time zone) AND (evtime <= '2007-09-30 00:00:00-07'::timestamp with
> time zone))
>  Total runtime: 111201.000 ms
> (4 rows)
> 
> Time: 111298.695 ms
> 
> 
> Mike
> 
> Gregory Stark wrote:
> > "Mike Charnoky" <[EMAIL PROTECTED]> writes:
> > 
> >> I altered the table in question, with "set statistics 100" on the
> >> timestamp column, then ran analyze.  This seemed to help somewhat.  Now,
> >> queries don't seem to hang, but it still takes a long time to do the count:
> >>  * "where evtime between '2007-09-26' and '2007-09-27'"
> >>took 37 minutes to run (result was ~12 million)
> >>  * "where evtime between '2007-09-25' and '2007-09-26'"
> >>took 40 minutes to run (result was ~14 million)
> >>
> >> Still stymied about the seemingly random performance, especially since I
> >> have seen this query execute in 2 minutes.
> > 
> > 
> > And the "explain analyze" for these?
> > 
> > Are you still sure it's certain date ranges which are consistently problems
> > and others are consistently fast? Or could it be something unrelated.
> > 
> 
> ---(end of broadcast)---
> TIP 3: Have you checked our extensive FAQ?
> 
>http://www.postgresql.org/docs/faq


-- 
Bill Moran
http://www.potentialtech.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[GENERAL] Upgrading PG

2007-10-01 Thread Gauthier, Dave
I'm going to move from v7.4.13 to v8.2.0 (suse-64). I have 2
pre-existing DBs.  Do I need to "convert" or port them to v8 in any way
after I start up with a v8 postmaster?  


Thanks

-dave



Re: [GENERAL] using COPY, .CSV and ¿catalog?

2007-10-01 Thread Andrej Ricnik-Bay
On 10/1/07, pere roca ristol <[EMAIL PROTECTED]> wrote:
>   Hi everybody,

>   I want to enter a .CSV file using COPY comand and plpgsql.
>   It enters latitude,longitude and some data. In the CSV data there is no
> field (such as "user_name" or current_time) that allow distinguish future
> queries for different users (ex: select x,y from table where user_name=z;
> after entering lat,lon I would like to update the_geom in  another table
> using the lat,lon of ONLY this user).

>   I could update this kind of data ("user_name") using php but next user
> could not enter his data by COPY (it only lets fill the destination table if
> ALL the fields are the same as the CSV or text file).
I'm not 100% certain I understand what you're trying
to achieve, but to me this looks like a misuse.  Why
not use the relational features of your RDBMS and just
pick a location from a table created once, via a unique
value you assign during (or after) the COPY?
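
For example, a minimal sketch using a staging table (the table, column,
and file names here are hypothetical):

CREATE TEMP TABLE staging (lat numeric, lon numeric, val text);
COPY staging FROM '/tmp/points.csv' WITH CSV;
-- tag every loaded row with the user it belongs to
INSERT INTO points (user_name, lat, lon, val)
    SELECT 'pere', lat, lon, val FROM staging;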



>   Thanks,
>   Pere
Cheers,
Andrej



-- 
Please don't top post, and don't use HTML e-Mail :}  Make your quotes concise.

http://www.american.edu/econ/notes/htmlmail.htm

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] more problems with count(*) on large table

2007-10-01 Thread Alan Hodgson
On Monday 01 October 2007, Mike Charnoky <[EMAIL PROTECTED]> wrote:
> This is strange... count(*) operations over a period of one day's worth
> of data now take ~1-2 minutes to run or ~40 minutes.  It seems that the
> first time the data is queried it takes about 40 minutes.  If I try the
> query again, it finishes in 1-2 minutes!
>
> Again, nothing else is happening on this db server except for a constant
> insertion into this table and a few others.  I have done "set statistics
> 100" for the evtime field in this table.

The first time, you're reading from disk. The second time, you're reading 
from cache. Tens of millions of disk seeks don't come cheap.

-- 
We're Microsoft. Everything else is just patent infringement.


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[GENERAL] Find out encoding of data

2007-10-01 Thread joynes

Hi!
I have this problem that I'm not sure if my stored data has the correct
encoding. When I view the data from a postgres console I just see the
characters, and depending on my console encoding it looks different.

What I really want to see is the hexadecimal or octal value of the bytes of
the retrieved data. Can postgres give me this somehow (without exporting
tables to files and looking at the files)?

/br joynes
-- 
View this message in context: 
http://www.nabble.com/Find-out-encoding-of-data-tf4549554.html#a12983073
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


[GENERAL] Partitioned table limitation

2007-10-01 Thread Goboxe
Hi,

Are there any limitations on the number of child tables that can be used
in a partitioned table?


I am currently having weekly partitioned tables (using partitioned
view in SQL Server) that I kept for 2 years.
In total, there will be 52 * 2 = 104 tables exist at one time in the
partition.


I am migrating from SQL Server to pg. Can pg support that number of
tables?


How about if I want to convert it to daily tables (356 * 2 years =
712 tables)?
Is this recommended?


FYI, currently each weekly table stores between 28 and 32 million
records (approx 4-5 million recs daily)


Thanks,
G


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[GENERAL] importing large files

2007-10-01 Thread [EMAIL PROTECTED]
Hello,

I need to import between 100 million and one billion records in a
table. Each record is composed of two char(16) fields. Input format
is a huge csv file. I am running on a linux box with 4gb of ram.
First I create the table. Second I 'copy from' the csv file. Third I
create the index on the first field.
The overall process takes several hours. The cpu seems to be the
limitation, not the memory or the IO.
Are there any tips to improve the speed ?

Thanks very much,

Olivier


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] DAGs and recursive queries

2007-10-01 Thread paul.dorman
Thanks for your answers guys. I've got a cold right now and my brain
is mush, so I can't comment intelligently on your suggestions just
yet. I just wanted to express my thanks for your time.

Jeff, one book you might want to look at is Joe Celko's Trees and
Hierarchies in SQL for Smarties.

http://www.amazon.com/Hierarchies-Smarties-Kaufmann-Management-Systems/dp/1558609202

If the connectby() function is like the Oracle connectby function,
then perhaps it will suit my needs. The categorization scheme will
nearly always have multiple parents for all but the topmost node. Each
category stores serialized method calls for CRUD operations on objects
within the category, which are requested by the application for all
interactions with stored objects (though I'd like to be able to cache
them too, but that's in the application domain). Each object stored in
the database will belong to at least one category, but I expect they
will normally belong to many categories. When I create a new object of
categoryD, which is a child of categoryC and categoryB, which are
children of categoryA, then my application will need the CREATE method
calls from all parents, as well as the object category itself. I'd
like them all returned from one query, possibly ordered according to
the distance from the child node.
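
For reference, a minimal sketch of how the tablefunc contrib's connectby()
is usually invoked (the table and column names here are hypothetical); note
that it walks from a single starting key, so a graph with multiple parents
may need extra handling:

SELECT * FROM connectby('category_links', 'child_id', 'parent_id',
                        'categoryD', 0, '~')
    AS t(child_id text, parent_id text, level int, branch text);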

Oops, I'm trying to comment intelligently. Better stop now.

Cheers,
Paul


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[GENERAL] how to ignore invalid byte sequence for encoding without using sql_ascii?

2007-10-01 Thread [EMAIL PROTECTED]
I am now importing the dump file of wikipedia into my postgresql using
maintains/importDump.php. It fails on 'ERROR: invalid byte sequence
for encoding UTF-8'. Is there any way to let pgsql just ignore the
invalid characters (I mean drop the invalid ones), so that the
script will keep going without dying on this error?

I know that I can use sql_ascii or even modify the importDump.php,
but those are not as easy to do as I thought.

thanks for help


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[GENERAL] Find clusters containing a schema?

2007-10-01 Thread Josh Trutwin
Is it possible to somehow query the system catalog to find out which
clusters/databases have a certain schema?

When running the query:

SELECT * FROM pg_catalog.pg_namespace WHERE nspname = 'myschema';

It always only finds data for the current session, not all clusters,
even when connected as postgres.

Thanks,

Josh

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Autostart PostgreSQL in Ubuntu

2007-10-01 Thread Johann Maar
Hi,

> /var/run/ might be on a temporary file system.  So you need to adjust
> your init script to create that directory if it doesn't exist.

That is what I tried now and it works for now.

I never installed the Debian PostgreSQL packages; I once
manually installed that init.d script. Assigning it to the
different runlevels is possible by manually adding the symbolic links
or by using a runlevel editor GUI in Ubuntu.

Thanks very much for all your responses!

Sebastian


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


[GENERAL] Data cube in PostgreSQL

2007-10-01 Thread Gowrishankar
Hi All,

I want to implement data cube operator in PostGReSQL. I searched few
forums and found that I only can have interface to postgresql using
EFEU package which allows basic cube operations.

Is there any other way of extending postgresql to include cubes?

thanks
Gowrishankar


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[GENERAL] It's time to support GRANT SELECT,UPDATE,...,...,... ON database.* to username

2007-10-01 Thread dterrors

I don't care if it's part of the SQL standard or not. I don't care if
oracle does it or not.  You're losing mysql converts as they go
through the tutorial and get to this point. Or worse, they just "grant
all" because it's easier, thus causing security holes.  User
friendliness matters.

There's like a hundred posts asking for this for the past four years,
and all they get in response is standards nazi's saying it won't be
supported because it's not in the standard and telling them to write
their own functions.  You write the function. Fuck the standard and
wake up.


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] more problems with count(*) on large table

2007-10-01 Thread Mike Charnoky
This is strange... count(*) operations over a period of one day's worth
of data now take ~1-2 minutes to run or ~40 minutes.  It seems that the
first time the data is queried it takes about 40 minutes.  If I try the
query again, it finishes in 1-2 minutes!

Again, nothing else is happening on this db server except for a constant
insertion into this table and a few others.  I have done "set statistics
100" for the evtime field in this table.

Here is the output from EXPLAIN ANALYZE.  This is the same query run
back to back, first time takes 42 minutes, second time takes less than 2
minutes!

mydb=# explain analyze select count(*) from prediction_accuracy where
evtime between '2007-09-29' and '2007-09-30';

  QUERY PLAN

---
 Aggregate  (cost=3.02..3.03 rows=1 width=0) (actual
time=2549854.351..2549854.352 rows=1 loops=1)
   ->  Index Scan using pred_acc_evtime_index on prediction_accuracy
(cost=0.00..3.02 rows=1 width=0) (actual time=97.676..2532824.892
rows=11423786 loops=1)
 Index Cond: ((evtime >= '2007-09-29 00:00:00-07'::timestamp
with time zone) AND (evtime <= '2007-09-30 00:00:00-07'::timestamp with
time zone))
 Total runtime: 2549854.411 ms
(4 rows)

Time: 2549943.506 ms
mydb=# explain analyze select count(*) from prediction_accuracy where
evtime between '2007-09-29' and '2007-09-30';

 QUERY PLAN

-
 Aggregate  (cost=3.02..3.03 rows=1 width=0) (actual
time=111200.943..111200.944 rows=1 loops=1)
   ->  Index Scan using pred_acc_evtime_index on prediction_accuracy
(cost=0.00..3.02 rows=1 width=0) (actual time=36.396..96347.483
rows=11423786 loops=1)
 Index Cond: ((evtime >= '2007-09-29 00:00:00-07'::timestamp
with time zone) AND (evtime <= '2007-09-30 00:00:00-07'::timestamp with
time zone))
 Total runtime: 111201.000 ms
(4 rows)

Time: 111298.695 ms


Mike

Gregory Stark wrote:
> "Mike Charnoky" <[EMAIL PROTECTED]> writes:
> 
>> I altered the table in question, with "set statistics 100" on the
>> timestamp column, then ran analyze.  This seemed to help somewhat.  Now,
>> queries don't seem to hang, but it still takes a long time to do the count:
>>  * "where evtime between '2007-09-26' and '2007-09-27'"
>>took 37 minutes to run (result was ~12 million)
>>  * "where evtime between '2007-09-25' and '2007-09-26'"
>>took 40 minutes to run (result was ~14 million)
>>
>> Still stymied about the seemingly random performance, especially since I
>> have seen this query execute in 2 minutes.
> 
> 
> And the "explain analyze" for these?
> 
> Are you still sure it's certain date ranges which are consistently problems
> and others are consistently fast? Or could it be something unrelated.
> 

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] row->ARRAY or row->table casting?

2007-10-01 Thread Nico Sabbi

Gregory Stark ha scritto:

"Nico Sabbi" <[EMAIL PROTECTED]> writes:

  

nb1=# select r.a from (select row(tab1.*)::tab1 as r from tab1)x;
ERROR:  missing FROM-clause entry for table "r"
LINE 1: select r.a from (select row(tab1.*)::tab1 as r from tab1)x;
  ^
I tried many variations (including casting x as tab1) obtaining only syntax
errors.



r.a would be the column "a" in the table named "r", but the only table in the
FROM list is "x". So you have to use a workaround to make it clear to the
parser that you're referring to the column "r", it would look like 


SELECT (r).a from (select row(tab1.*)::tab1 as r from tab1)x;

  

yes, it works.

Thanks a lot,
   Nico

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [GENERAL] Inheritance fixing timeline? (Was "Inherited FK Indexing")

2007-10-01 Thread Alvaro Herrera
Webb Sprague wrote:
> > >   Is it
> > > possible to have  FK that spans into child tables?
> >
> > This is a well known (and documented, see [1]) deficiency. It's due to
> > the current implementation of indices, which are bound to exactly one
> > table, meaning they do return a position within the table, but cannot
> > point to different tables.
> 
> Is this set to be fixed in any particular release?

No, sorry.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Inheritance problem when restoring db

2007-10-01 Thread Tom Lane
"Sebastjan Trepca" <[EMAIL PROTECTED]> writes:
> Current state:

> Table B has a primary key with sequence b_seq. Table A also has a
> primary key with sequence a_seq.

In view of the fact that primary keys aren't inherited, and do not
"have sequences", this description is uselessly imprecise.  Please show
exactly how you created these two tables.  And which PG version is this?

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[GENERAL] Inheritance problem when restoring db

2007-10-01 Thread Sebastjan Trepca
Hi,

I noticed a small bug/problem when restoring a database that uses inheritance.

Lets say you have a table B that inherits from table A.

Current state:

Table B has a primary key with sequence b_seq. Table A also has a
primary key with sequence a_seq.

Now we create a backup and restore the database.

New state:

Table B has a primary key with sequence a_seq. Table A is the same as before.

Is this wrong or normal functionality? Do I have to set some extra
flags when doing the backup?

Thanks, Sebastjan

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [GENERAL] Normalized Tables & SELECT [was: Find "smallest common year"]

2007-10-01 Thread Alban Hertroys
Stefan Schwarzer wrote:
> 
>> An entirely different question is whether it is a good idea to write a
>> range as a value that the database cannot interpret correctly (referring
>> to the '1970-75' notation). You cannot group records by value this way
>> if you need to (for example) combine data from '1970' with data from
>> '1970-75'.
>>
>> But you seem to use these values just for labels, which I assume are
>> unique across years (eg. if you have a value '1970-75' you don't have
>> values '1970', 1971'..'1974'), in which case this is safe to use. As
>> pointed out by several people earlier, they make an excellent foreign
>> key too (provided they're unique).
> 
> Yep, this is question I posed myself too. In the moment, when doing for
> example "per Capita" calculations on the fly of a variable which has
> something like 1970-75, I would then sum up the Total Population over
> the given period, divide it through the number of years and then use it
> with the selected variable to get the "per Capita" data.
> 
> But if I would instead insert yearly data, it would mean that it had
> five lines with the same values. No problem with that?

Not entirely what I suggested, but also a viable solution, sure.

I was suggesting to add a column to your yearly data marking the end of
the range. Given your above examples, you could then do queries like:

SELECT population / num_years FROM my_data;

(Assuming you add the length of the interval as a number of years, which
seems plausible because you don't seem to calculate with any intervals
not divisible by a year).

Adding this additional column may justify putting the years (and their
durations) into their own table.

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] usage of indexes for inner joins

2007-10-01 Thread Scott Marlowe
On 9/27/07, Jan Theodore Galkowski <[EMAIL PROTECTED]> wrote:
> I fear this has been asked many times about PostgreSQL, and I have read
> the docs about how indexes are supposed to be defined and used, but I
> don't understand why the engine and optimizer is doing what it does in
> the simplest of situations.  Is it that its tuning is heavily data
> dependent?

This has in fact been discussed many times on the mailing list.
Searching the archives will probably return lots of discussions.

> Like *how* *come*?  There are indexes on both columns of the join.  Is
> it the NUMERIC datatype messing things up?  Unlikely, as I've seen the
> same with INTEGERs.

Postgresql doesn't store "visibility" information in indexes.  This
means that once you find the entry in an index, you then have to see
if it's visible to the current transaction, and that information is
only stored in the tables.  There are lots of discussions of why
that is in the archives as well.  Basically, race conditions make it
impossible to update the table and index concurrently without ugly
locking issues popping up.

So, in pgsql, whether there's an index or not, the db has to hit the
table in the end.

> If it is data dependent (these tables are presently empty), any
> suggestions as to how to tune a database for unknown mixes of data?

No, it isn't; it depends on how selective the query is.  If you had a
selective enough where clause then postgresql would choose an index over
a sequential scan.

Your biggest mistake here is thinking the simple solution (use
indexes) is always best.  PostgreSQL uses a cost-based planner that
tries to decide ahead of time what plan is going to be fastest.  The
real answer is to give it good information, i.e. run analyze
frequently enough, and set a high enough stats target for the
column(s) you're using.
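
For example, against the toy tables from the original post (the target of
100 here is just an illustration, not a recommendation):

  ALTER TABLE foo ALTER COLUMN projectid SET STATISTICS 100;
  ALTER TABLE bignum ALTER COLUMN thing SET STATISTICS 100;
  ANALYZE foo;
  ANALYZE bignum;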

That means pgsql is paying attention to how big your tables are as
well as what values are in there and what % you're going to get back.

Use explain analyze to see the differences between what the planner
expects and what it gets.  Like this part of your explain analyze
output:


   Seq Scan on foo a  (cost=0.00..11.80 rows=180 width=407) (actual
   time=0.003..0.003 rows=0 loops=1)

Note that the planner expected 180 rows but got 0.  That's a sign of
poor stats.  Run analyze and you should see something closer to a
match between expected and actual rows.

Also, try putting some real data in your db, and using a where clause
(unless you really are gonna grab every single row every time...)

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] more problems with count(*) on large table

2007-10-01 Thread Gregory Stark
"Mike Charnoky" <[EMAIL PROTECTED]> writes:

> I altered the table in question, with "set statistics 100" on the
> timestamp column, then ran analyze.  This seemed to help somewhat.  Now,
> queries don't seem to hang, but it still takes a long time to do the count:
>  * "where evtime between '2007-09-26' and '2007-09-27'"
>took 37 minutes to run (result was ~12 million)
>  * "where evtime between '2007-09-25' and '2007-09-26'"
>took 40 minutes to run (result was ~14 million)
>
> Still stymied about the seemingly random performance, especially since I
> have seen this query execute in 2 minutes.


And the "explain analyze" for these?

Are you still sure it's certain date ranges which are consistently problems
and others are consistently fast? Or could it be something unrelated.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [GENERAL] Normalized Tables & SELECT [was: Find "smallest common year"]

2007-10-01 Thread Stefan Schwarzer



An entirely different question is whether it is a good idea to write a
range as a value that the database cannot interpret correctly (referring
to the '1970-75' notation). You cannot group records by value this way
if you need to (for example) combine data from '1970' with data from
'1970-75'.

But you seem to use these values just for labels, which I assume are
unique across years (eg. if you have a value '1970-75' you don't have
values '1970', 1971'..'1974'), in which case this is safe to use. As
pointed out by several people earlier, they make an excellent foreign
key too (provided they're unique).


Yep, this is a question I asked myself too. At the moment, when doing
for example "per Capita" calculations on the fly for a variable which
covers something like 1970-75, I would sum up the Total Population
over the given period, divide it by the number of years and then
use it with the selected variable to get the "per Capita" data.


But if I inserted yearly data instead, it would mean five lines with the
same values. No problem with that?


Stef

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] more problems with count(*) on large table

2007-10-01 Thread Mike Charnoky
I altered the table in question, with "set statistics 100" on the
timestamp column, then ran analyze.  This seemed to help somewhat.  Now,
queries don't seem to hang, but it still takes a long time to do the count:
 * "where evtime between '2007-09-26' and '2007-09-27'"
   took 37 minutes to run (result was ~12 million)
 * "where evtime between '2007-09-25' and '2007-09-26'"
   took 40 minutes to run (result was ~14 million)

Still stymied about the seemingly random performance, especially since I
have seen this query execute in 2 minutes.

Nothing should be locking the table, the only things happening with this
database are: periodic autovacuums and a process which is constantly
inserting data into the large table in question (and a few other smaller
logging tables).

FYI: Here is the output from explain (I never knew about "\timing" in
psql, that is a big help!).  This was run immediately after I ran the
query a first time (without using the prepare and explain)... This
second run took a little over a minute!  Bizarre...

mydb=# prepare stmt as select count(*) from prediction_accuracy where
evtime between '2007-09-25' and '2007-09-26';
PREPARE
Time: 90.854 ms
mydb=# explain execute stmt;

QUERY PLAN

-
 Aggregate  (cost=642338.27..642338.28 rows=1 width=0)
   ->  Index Scan using pred_acc_evtime_index on prediction_accuracy
(cost=0.00..600219.37 rows=16847557 width=0)
 Index Cond: ((evtime >= '2007-09-25 00:00:00-07'::timestamp
with time zone) AND (evtime <= '2007-09-26 00:00:00-07'::timestamp with
time zone))
(3 rows)

Time: 131.559 ms
mydb=# execute stmt;
  count
----------
 14150928
(1 row)

Time: 101721.346 ms


Mike

Gregory Stark wrote:
> "Alban Hertroys" <[EMAIL PROTECTED]> writes:
> 
>> Mike Charnoky wrote:
>>> With respect to the ALTER TABLE SET STATISTICS... how do I determine a
>>> good value to use?  This wasn't really clear in the pg docs.  Also, do I
>>> need to run ANALYZE on the table after I change the statistics?
>>>
>>> Here are the EXPLAINs from the queries:
>>>
>>> db=# explain select count(*) from prediction_accuracy where evtime
>>> between '2007-09-25' and '2007-09-26';
>>>
>>> QUERY PLAN
>>>
>>> -
>>>  Aggregate  (cost=475677.40..475677.41 rows=1 width=0)
>>>->  Index Scan using pred_acc_evtime_index on prediction_accuracy
>>> (cost=0.00..51.44 rows=12490383 width=0)
>>>  Index Cond: ((evtime >= '2007-09-25 00:00:00-07'::timestamp
>>> with time zone) AND (evtime <= '2007-09-26 00:00:00-07'::timestamp with
>>> time zone))
>>> (3 rows)
>>>
>>> db=# explain select count(*) from prediction_accuracy where evtime
>>> between '2007-09-26' and '2007-09-27';
>>>
>>> QUERY PLAN
>>>
>>> -
>>>  Aggregate  (cost=486615.04..486615.05 rows=1 width=0)
>>>->  Index Scan using pred_acc_evtime_index on prediction_accuracy
>>> (cost=0.00..454671.07 rows=12777586 width=0)
>>>  Index Cond: ((evtime >= '2007-09-26 00:00:00-07'::timestamp
>>> with time zone) AND (evtime <= '2007-09-27 00:00:00-07'::timestamp with
>>> time zone))
>>> (3 rows)
>> Interesting, same plans and no sequential scans... Yet totally different
>> run times. Almost as if something prevents you to read some records
>> between 26 and 27 september...
> 
> Just to be sure we're looking at the right plan do this:
> 
> \timing
> PREPARE stmt AS 
>  SELECT count(*) 
>FROM prediction_accuracy 
>   WHERE evtime BETWEEN '2007-09-25' AND '2007-09-26';
> 
> EXPLAIN EXECUTE stmt;
> EXECUTE stmt;
> 
>> I'm no expert on locking in Postgres, but AFAIK locks that prevent you
>> from reading records are rather rare and probably only issued from
>> userland code.
> 
> Pages can be locked but never for very long.
> 
> What other work is going on in this server? Is there anything which might be
> locking the table periodically (either a VACUUM FULL, CLUSTER, ALTER TABLE,
> etc?) 
> 
> Perhaps there's a hardware problem, is there anything in your system logs from
> dmesg?
> 

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Normalized Tables & SELECT [was: Find "smallest common year"]

2007-10-01 Thread Alban Hertroys
Stefan Schwarzer wrote:
> 
> 
>> BTW, You didn't actually use type text for your year column, did you? No
>> quotes needed then. Otherwise you'd have to make sure your year values
>> are all the same length or sorting gets... interesting.
> 
> Yep, my comment just before concerns especially this paragraph, I guess.
> With not only years but also time spans for my data - so, not only 1970,
> 1971 but also 1970-75, 1975-80 etc. - I would indeed need to use text
> for my year column. Why would sorting get "interesting" then?
> 
> Stef

Text sorts alphabetically, not numerically:

integer text
--- ---
1   '1'
2   '10'
3   '2'
4   '3'
5   '4'
6   '5'
7   '6'
8   '7'
9   '8'
10  '9'


I guess you won't have any problems with sorting, as you use full
century prefixes and the differences in syntax ('1970' vs '1970-75') are
at the rear of your values.


An entirely different question is whether it is a good idea to write a
range as a value that the database cannot interpret correctly (referring
to the '1970-75' notation). You cannot group records by value this way
if you need to (for example) combine data from '1970' with data from
'1970-75'.

But you seem to use these values just for labels, which I assume are
unique across years (eg. if you have a value '1970-75' you don't have
values '1970', 1971'..'1974'), in which case this is safe to use. As
pointed out by several people earlier, they make an excellent foreign
key too (provided they're unique).

Other options to handle these years involve having a start/end year or
date (to_date('01/01/1970', 'MM/DD/YYYY'), to_date('12/31/1975',
'MM/DD/YYYY')) or a start date and an interval (to_date('01/01/1970',
'MM/DD/YYYY'), INTERVAL '1 year').

That makes operators like OVERLAPS and BETWEEN available to your
queries, which may be convenient ;)
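
For example (table and column names here are only illustrative), with a
start date and an interval stored per row you could write something like:

  SELECT *
    FROM my_data d
   WHERE (d.period_start, d.period_length)
         OVERLAPS (DATE '1972-01-01', INTERVAL '1 year');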

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [GENERAL] sha1 function

2007-10-01 Thread Richard Huxton

Martin Marques wrote:
We are at the moment planning on passing some passwords that are plain 
texted in our DB to some encrypted form as now they will be used for 
processes that require better security measures.


We started looking at md5() but found that it's easy to crack and one of 
the systems uses CHAP authentication, and so SHA1 hashing.


So, two questions arise.

1) Is MD5's weakness true?


Well (and I'm not a security expert so please check details), there are 
some issues with MD5:
1. There were some recent breakthroughs in generating pairs of documents 
that share the same MD5 hash (collisions). This is less of a threat to 
passwords and more to document forgery.
2. It's practical nowadays to pre-calculate "rainbow tables" to cover 
all possible combinations of a short password (hence the use of a "salt" 
to extend the password length).
3. You can't get from the hash back to a password easily. So - it's no 
use for storing passwords for other systems, just for checking supplied 
passwords.


What you need to ask is what you're protecting against. If someone has 
access to the database, have they also gained access to the process that 
uses these passwords, so allowing them to watch the unencrypted password 
be tested?



2) Is there any sha1() functions in PostgreSQL?


See contrib/pgcrypto in the source distribution or the equivalent in 
whatever package you use.
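
With pgcrypto installed, a SHA1 digest can be computed along these lines
(a minimal sketch; how you handle salting is up to you):

  SELECT encode(digest('secret password' || 'per-user salt', 'sha1'), 'hex');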


--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Normalized Tables & SELECT [was: Find "smallest common year"]

2007-10-01 Thread Stefan Schwarzer

I find it far easier to maintain normalized tables that produced
non-normalized ones (for things like data warehousing) than it is to
maintain non-normalized tables and trying to produce normalized data
from that.


Ok, I do understand that.

So, instead of the earlier mentioned database design, I would have
something like this (a rough SQL sketch follows below):


   - one table for the country names/ids/etc. (Afghanistan, 1; Albania, 2)
   - one table for the variable names/ids/etc. (GDP, 1; Population, 2;
     Fish Catch, 3; ...)
   - one table for the years names/ids/etc. (1970, 1; 1971, 2; 1973, 3; ...)

and

   - one table for all "statistical data" with four fields -
     id_variable, id_country, id_year, and the actual value
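
A minimal sketch of what those four tables could look like (names and
types are only illustrative, not a recommendation):

  CREATE TABLE countries (id serial PRIMARY KEY, name text NOT NULL);
  CREATE TABLE variables (id serial PRIMARY KEY, name text NOT NULL);
  CREATE TABLE years     (id serial PRIMARY KEY, label text NOT NULL);

  CREATE TABLE statistical_data (
      id_variable integer NOT NULL REFERENCES variables(id),
      id_country  integer NOT NULL REFERENCES countries(id),
      id_year     integer NOT NULL REFERENCES years(id),
      value       numeric,
      PRIMARY KEY (id_variable, id_country, id_year)
  );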


You say


I find it far easier to maintain normalized tables that produced
non-normalized ones (for things like data warehousing) than it is to
maintain non-normalized tables and trying to produce normalized data
from that.


What is your view about using a single table for all data versus one
table for each variable (having 500 different variables/data sets)?
In terms of "readability" I guess the second solution would be
better. But, then, I don't know...


Thanks for any views

Stef

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] Normalized Tables & SELECT [was: Find "smallest common year"]

2007-10-01 Thread Stefan Schwarzer



BTW, You didn't actually use type text for your year column, did you?
No quotes needed then. Otherwise you'd have to make sure your year values
are all the same length or sorting gets... interesting.


Yep, my comment just before concerns especially this paragraph, I guess.
With not only years but also time spans for my data - so, not only 1970,
1971 but also 1970-75, 1975-80 etc. - I would indeed need to use text
for my year column. Why would sorting get "interesting" then?


Stef

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


[GENERAL] Dump of table structure is not consistent

2007-10-01 Thread Poul Møller Hansen

Hi,

\d my.table is showing me the primary key as:
"unitstat_pkey" PRIMARY KEY, btree (id)

But looking into the table structure produced by
pg_dump -s -n my -t table db

I'm getting the primary key shown as:
ADD CONSTRAINT unistat_pkey PRIMARY KEY (id);

That used to be its name, but I renamed it by executing:
ALTER TABLE unistat_pkey RENAME TO unitstat_pkey

How can I make the dump consistent ?
PostgreSQL 8.1.9 on i486-pc-linux-gnu, compiled by GCC gcc-4.0.gcc-opt 
(GCC) 4.0.3 (Ubuntu 4.0.3-1ubuntu5)



Poul



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] Normalized Tables & SELECT [was: Find "smallest common year"]

2007-10-01 Thread Stefan Schwarzer





   - one table for the years names/ids/etc. (1970, 1; 1971, 2; 1973, 3; ...)


If you _do_ need this table (because you want to constrain your
statistical data to only contain a specific set of years, or because you
need a quick list of available years to select from): Make the year
primary key and drop the artificial index.

Years are perfectly fine data to constrain on, and it saves you the
joins with that table (the foreign key constraint constrains your data
sufficiently).


If my years are not only single years such as 1970, 1971... but time
spans such as 1970-75, would your proposal still be valid? It wouldn't
be an integer field anymore, but a text field instead...


Stef

---(end of broadcast)---
TIP 6: explain analyze is your friend


[GENERAL] sha1 function

2007-10-01 Thread Martin Marques
We are at the moment planning on passing some passwords that are plain 
texted in our DB to some encrypted form as now they will be used for 
processes that require better security measures.


We started looking at md5() but found that it's easy to crack and one of 
the systems uses CHAP authentication, and so SHA1 hashing.


So, two questions arise.

1) Is MD5's weakness true?

2) Is there any sha1() functions in PostgreSQL?

--
 21:50:04 up 2 days,  9:07,  0 users,  load average: 0.92, 0.37, 0.18
-
Lic. Martín Marqués |   SELECT 'mmarques' ||
Centro de Telemática|   '@' || 'unl.edu.ar';
Universidad Nacional|   DBA, Programador,
del Litoral |   Administrador
-

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] Inheritance fixing timeline? (Was "Inherited FK Indexing")

2007-10-01 Thread Scott Ribe
> Is this set to be fixed in any particular release?

Depending on what you're doing, this may be overkill, but: I have child
tables that not only need FK constraints, but also triggers and the
functions called by the triggers. So instead of writing this over and over
again, I eventually wrote a single procedure that takes the name of the
table, and using dynamic sql (execute command), generates the FKs and the
procedures and the triggers.
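
Roughly like this (a hedged sketch with made-up table and column names; the
real function has to match your own schema, of course):

  CREATE OR REPLACE FUNCTION foochild_setup(child text) RETURNS void AS $$
  BEGIN
      -- recreate the FK that inheritance does not copy to the child table
      EXECUTE 'ALTER TABLE ' || quote_ident(child)
           || ' ADD CONSTRAINT ' || quote_ident(child || '_parent_fk')
           || ' FOREIGN KEY (parent_id) REFERENCES parent (id)';
      -- trigger functions and the triggers themselves get generated the
      -- same way, with further EXECUTE statements
  END;
  $$ LANGUAGE plpgsql;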

You *could* take it a step further, and have a procedure which takes the
name of the base table, finds all inherited tables, and makes sure
everything is set up correctly. I haven't, and probably won't, because I'm a
solo developer and don't make additions to the schema at such a great rate
that I would have trouble remembering to run my current "FooChild_Setup"
function on a new table.

-- 
Scott Ribe
[EMAIL PROTECTED]
http://www.killerbytes.com/
(303) 722-0567 voice



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] more problems with count(*) on large table

2007-10-01 Thread Alban Hertroys
Albe Laurenz wrote:
> Alban Hertroys wrote:
>> A. Kretschmer wrote:
>>> Again: an index can't help! Because of MVCC: 'select count(*)'
> without
>>> WHERE-condition forces an seq. table-scan.
>> That has very little to do with MVCC.
>>
>> [...] For that it makes no difference whether a seq
>> scan or an index scan is performed - both cases need to check at the
>> record level whether it's visible (where the seq scan is
>> already looking at the actual record, of course).
> 
> If you do not use MVCC (say, you use DB2), you need not check
> the record itself because if it is there (which it is if there
> is an index entry), it will be 'visible'.

Still, that's not because of MVCC, but because of the way it is
implemented in PostgreSQL.

There has been talk in the past (regularly) about why the MVCC
information is not in the index and whether it should be, see the ML
archives.

Besides, there are still many situations where a sequential scan
(whether for count(*) or not) is faster than an index scan, no matter
whether you have MVCC or not.

As I said, MVCC has little to do with it.
The real problem is that in postgres you cannot tell from an index
whether a record is visible or not, while you can in DB2 (because it has
an index entry or not).

>> I plead not guilty ;)
> 
> Declined, sorry.

Overruled, sorry.

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] usage of indexes for inner joins

2007-10-01 Thread Ben Trewern
Sequential scans of an empty table are going to be faster than an index scan, 
so the database uses the sequential scan.  Put some data in the tables (some 
thousands or millions of records) and then see if it uses an index scan.
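
For instance, a quick way to fill the toy "foo" table from the original
post with dummy rows (the values are arbitrary, just enough to give the
planner something to work with):

  INSERT INTO foo (projectid, uid, name, ver, startdate, enddate,
                   status, percentdone, championuid, pmuid)
  SELECT g, g, 'project ' || g::text, '1.0', '2007-01-01', '2007-12-31',
         'open', 0.00, g, g
    FROM generate_series(1, 100000) AS g;
  ANALYZE foo;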

Ben

""Jan Theodore Galkowski"" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
>I fear this has been asked many times about PostgreSQL, and I have read
> the docs about how indexes are supposed to be defined and used, but I
> don't understand why the engine and optimizer is doing what it does in
> the simplest of situations.  Is it that its tuning is heavily data
> dependent?
>
> My case of interest is more complicated, but I decided to create a toy
> case to try to understand.  Here it is:
>
>
>  -- Table "foo" DDL
>  CREATE TABLE "public"."foo"(
>      "projectid" int4 NOT NULL ,
>      "uid" int4 NOT NULL ,
>      "name" varchar(254) NOT NULL ,
>      "ver" varchar(127) NOT NULL ,
>      "startdate" date NOT NULL ,
>      "enddate" date NOT NULL ,
>      "status" varchar(254) NOT NULL ,
>      "percentdone" numeric(7,2) NOT NULL ,
>      "championuid" int4 NOT NULL ,
>      "pmuid" int4 NOT NULL ,
>      PRIMARY KEY ("projectid")
>  )  WITHOUT OIDS;
>
>  -- Table "bignum" DDL
>  CREATE TABLE "public"."bignum"(
>      "thing" numeric(100) NOT NULL
>  )  WITHOUT OIDS;
>
>  CREATE INDEX "t" ON "public"."bignum" USING btree ("thing");
>
>
> Running
>
>    EXPLAIN ANALYZE SELECT A.* FROM bignum B, foo A WHERE A.projectid = B.thing;
>
> yields:
>
>    Nested Loop  (cost=0.00..15.51 rows=1 width=407) (actual time=0.041..0.041 rows=0 loops=1)
>      Join Filter: ((a.projectid)::numeric = b.thing)
>      ->  Seq Scan on bignum b  (cost=0.00..1.01 rows=1 width=16) (actual time=0.024..0.027 rows=1 loops=1)
>      ->  Seq Scan on foo a  (cost=0.00..11.80 rows=180 width=407) (actual time=0.003..0.003 rows=0 loops=1)
>    Total runtime: 0.169 ms
>
> Like *how* *come*?  There are indexes on both columns of the join.  Is
> it the NUMERIC datatype messing things up?  Unlikely, as I've seen the
> same with INTEGERs.
>
> If it is data dependent (these tables are presently empty), any
> suggestions as to how to tune a database for unknown mixes of data?
>
> This is run on the Windows version of PG, but I'm seeing the same kind
> of thing on Linux.
>
> Thanks.
>
> ---(end of broadcast)---
> TIP 6: explain analyze is your friend
> 



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


[GENERAL] Re: ERROR: invalid byte sequence from psql - Works perfectly from pgAdmin III query window

2007-10-01 Thread Nis Jørgensen
Albe Laurenz skrev:
> Anoo Sivadasan Pillai wrote:
>> Why the Fun_ABC1 is created and Fun_ABC12 is raising the 
>> following error, while run through psql, ( I Could create 
>> both the functions from PgAdmin III query ) 
>>
>> ERROR:  invalid byte sequence for encoding "UTF8": 0x93
> 
> Because the characters you entered into psql are not
> encoded in the character encoding that psql expects.
> 
> You probably create the function in psql by executing an
> SQL script, right?
> 
> From the error message I deduce that psql expects UTF8
> characters from you.
> 
> You can do two things:
> 1) change the SQL script to UTF8 before running it
> 2) change the client encoding in psql before running the
>SQL script.
>This is done with the SQL command
>SET client_encoding='...'

Third option: Stop using curly quotes in your SQL - even if it's just in
comments.

Nis


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] Query problem

2007-10-01 Thread Albe Laurenz
Naz Gassiep wrote:
> Aside from the messy nomenclature, is anyone able to spot why 
> the "sum" column from the first query is not returning 7, as 
> the second query suggests that it should? I know that this is
> probably simple, and that It's probably going to jump out at
> me the minute I hit "Send", but if I don't hit send, then I'll
> never see it hehe.
> 
> twerl=# SELECT ... SUM(contacts.id) ... FROM ... contacts ...
> ... | sum | ...
> ...-+-+-...
> ... | 594 | ...
> (1 row)
>  
> twerl=# select count(*) from contacts where groupid = 3;
>  count
> ---
>  7
> (1 row)

That's an easy one: SUM(contacts.id) adds up the "id" values, it does not
count rows.

I assume that the "id" values of the seven "contacts" rows with
groupid = 3 simply add up to 594.
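
Presumably something like this was intended (your original query with
SUM replaced by COUNT; with the LEFT OUTER JOIN, COUNT(contacts.id)
counts only the contacts that actually matched, so empty groups yield 0):

SELECT 'contactgroups',
       contactgroups.siteid,
       contactgroups.id,
       contactgroups.name,
       contactgroups.description,
       COUNT(contacts.id),
       contactgroups.trashed
  FROM contactgroups
       LEFT OUTER JOIN contacts ON (contactgroups.id = contacts.groupid)
 WHERE contactgroups.trashed IS NOT NULL
   AND contactgroups.deleted IS NULL
 GROUP BY contactgroups.siteid,
       contactgroups.id,
       contactgroups.name,
       contactgroups.description,
       contactgroups.trashed;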

Yours,
Laurenz Albe

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] It's time to support GRANT SELECT,UPDATE,...,...,... ON database.* to username

2007-10-01 Thread Ben Trewern
<[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
>
> I don't care if it's part of the SQL standard or not. I don't care if
> oracle does it or not.  You're losing mysql converts as they go
> through the tutorial and get to this point. Or worse, they just "grant
> all" because it's easier, thus causing security holes.  User
> friendliness matters.
>

You can use the pgAdmin's grant wizard to do what you want.

Regards,

Ben

BTW thanks for the polite e-mail. :-/ 



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org/


Re: [GENERAL] ERROR: invalid byte sequence from psql - Works perfectly from pgAdmin III query window

2007-10-01 Thread Albe Laurenz
Anoo Sivadasan Pillai wrote:
> Why the Fun_ABC1 is created and Fun_ABC12 is raising the 
> following error, while run through psql, ( I Could create 
> both the functions from PgAdmin III query ) 
> 
> ERROR:  invalid byte sequence for encoding "UTF8": 0x93

Because the characters you entered into psql are not
encoded in the character encoding that psql expects.

You probably create the function in psql by executing an
SQL script, right?

From the error message I deduce that psql expects UTF8
characters from you.

You can do two things:
1) change the SQL script to UTF8 before running it
2) change the client encoding in psql before running the
   SQL script.
   This is done with the SQL command
   SET client_encoding='...'
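
For example, if the script was saved by a Windows editor (a common source
of curly quotes), option 2 might look like this - the encoding name and
file name here are just an illustration:

   SET client_encoding = 'WIN1252';
   \i create_functions.sql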

Yours,
Laurenz Albe

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[GENERAL] Query problem

2007-10-01 Thread Naz Gassiep
Aside from the messy nomenclature, is anyone able to spot why 
the "sum" column from the first query is not returning 7, as 
the second query suggests that it should? I know that this is

probably simple, and that It's probably going to jump out at
me the minute I hit "Send", but if I don't hit send, then I'll
never see it hehe.


twerl=# SELECT 'contactgroups',
contactgroups.siteid,
contactgroups.id,
contactgroups.name,
contactgroups.description,
SUM(contacts.id),
contactgroups.trashed
   FROM contactgroups
LEFT OUTER JOIN contacts ON (contactgroups.id = contacts.groupid)
  WHERE contactgroups.trashed IS NOT NULL
AND contactgroups.deleted IS NULL
   GROUP BY contactgroups.siteid,
contactgroups.id,
contactgroups.name,
contactgroups.description,
contactgroups.trashed;
   ?column?    | siteid | id |        name        | description | sum |            trashed
---------------+--------+----+--------------------+-------------+-----+-------------------------------
 contactgroups |      1 |  3 | Telechoice / Optus |             | 594 | 2007-10-01 20:08:51.449825+10
(1 row)

twerl=# select count(*) from contacts where groupid = 3;
count
---
7
(1 row)


Thanks,
- Naz.



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] windows and pg 8.2 (change database to another server)

2007-10-01 Thread Magnus Hagander
On Sun, Sep 30, 2007 at 11:36:23PM +0200, Terry Yapt wrote:
> Hello all,
> 
> I would like to change a pg database to another server.
> 
> The source environment is: postgresql Windows v.8.2.4 (windows xp 
> workstation).
> The target environment is: postgresql Windows v.8.2.5 (windows 2003 Server).
> 
> I would like to do migration without pg_dumpall and I think I can do the 
> migration process _only_ copying this (after stop Postgresql service, of 
> course):
> 1.- Data directory (c:\program files\postgresql\8.2\data).
> 2.- Another TableSpace folders/directories.
> 
> But after copy that folders to my new server, service PostgreSQL doesn't 
> start with a 'cannot create postmaster.pid'.
> 
> First of all.  I think this method is admisible. Isn't it ?

It is.


> And second question: I think my problem is that some rights are wrong 
> after copying data folder.  What are the right rights to apply to data 
> folder ?

Yes, most likely. You need to grant the postgres service account "Change"
permissions (or Full Control, but Change is recommended) on the data
directory. If you didn't do anything specifically, it will just have
inherited from further up in the tree, which means that the service account
only has "Read" access.

//Magnus

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] ERROR: invalid byte sequence from psql - Works perfectly from pgAdmin III query window

2007-10-01 Thread Richard Huxton

Anoo Sivadasan Pillai wrote:

Why the Fun_ABC1 is created and Fun_ABC12 is raising the following
error, while run through psql, ( I Could create both the functions from
PgAdmin III query ) 


ERROR:  invalid byte sequence for encoding "UTF8": 0x93

HINT:  This error can also happen if the byte sequence does not match
the encoding expected by the server, which is controlled by
"client_encoding".


Check your "client_encoding" setting in psql perhaps?

Presumably the text-file contains non-UTF8 characters but the 
client_encoding says UTF8. Copying+pasting into pgAdmin is probably 
translating for you.


--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] more problems with count(*) on large table

2007-10-01 Thread Albe Laurenz
Alban Hertroys wrote:
> A. Kretschmer wrote:
>> Again: an index can't help! Because of MVCC: 'select count(*)'
without
>> WHERE-condition forces an seq. table-scan.
> 
> That has very little to do with MVCC.
> 
> [...] For that it makes no difference whether a seq
> scan or an index scan is performed - both cases need to check at the
> record level whether it's visible (where the seq scan is
> already looking at the actual record, of course).

If you do not use MVCC (say, you use DB2), you need not check
the record itself because if it is there (which it is if there
is an index entry), it will be 'visible'.

>> I plead not guilty ;)

Declined, sorry.

Yours,
Laurenz Albe

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] more problems with count(*) on large table

2007-10-01 Thread Gregory Stark
"Alban Hertroys" <[EMAIL PROTECTED]> writes:

> Mike Charnoky wrote:
>> With respect to the ALTER TABLE SET STATISTICS... how do I determine a
>> good value to use?  This wasn't really clear in the pg docs.  Also, do I
>> need to run ANALYZE on the table after I change the statistics?
>> 
>> Here are the EXPLAINs from the queries:
>> 
>> db=# explain select count(*) from prediction_accuracy where evtime
>> between '2007-09-25' and '2007-09-26';
>> 
>> QUERY PLAN
>> 
>> -
>>  Aggregate  (cost=475677.40..475677.41 rows=1 width=0)
>>->  Index Scan using pred_acc_evtime_index on prediction_accuracy
>> (cost=0.00..51.44 rows=12490383 width=0)
>>  Index Cond: ((evtime >= '2007-09-25 00:00:00-07'::timestamp
>> with time zone) AND (evtime <= '2007-09-26 00:00:00-07'::timestamp with
>> time zone))
>> (3 rows)
>> 
>> db=# explain select count(*) from prediction_accuracy where evtime
>> between '2007-09-26' and '2007-09-27';
>> 
>> QUERY PLAN
>> 
>> -
>>  Aggregate  (cost=486615.04..486615.05 rows=1 width=0)
>>->  Index Scan using pred_acc_evtime_index on prediction_accuracy
>> (cost=0.00..454671.07 rows=12777586 width=0)
>>  Index Cond: ((evtime >= '2007-09-26 00:00:00-07'::timestamp
>> with time zone) AND (evtime <= '2007-09-27 00:00:00-07'::timestamp with
>> time zone))
>> (3 rows)
>
> Interesting, same plans and no sequential scans... Yet totally different
> run times. Almost as if something prevents you to read some records
> between 26 and 27 september...

Just to be sure we're looking at the right plan do this:

\timing
PREPARE stmt AS 
 SELECT count(*) 
   FROM prediction_accuracy 
  WHERE evtime BETWEEN '2007-09-25' AND '2007-09-26';

EXPLAIN EXECUTE stmt;
EXECUTE stmt;

> I'm no expert on locking in Postgres, but AFAIK locks that prevent you
> from reading records are rather rare and probably only issued from
> userland code.

Pages can be locked but never for very long.

What other work is going on in this server? Is there anything which might be
locking the table periodically (either a VACUUM FULL, CLUSTER, ALTER TABLE,
etc?) 

Perhaps there's a hardware problem, is there anything in your system logs from
dmesg?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[GENERAL] ERROR: invalid byte sequence from psql - Works perfectly from pgAdmin III query window

2007-10-01 Thread Anoo Sivadasan Pillai
Why the Fun_ABC1 is created and Fun_ABC12 is raising the following
error, while run through psql, ( I Could create both the functions from
PgAdmin III query ) 

 

ERROR:  invalid byte sequence for encoding "UTF8": 0x93

HINT:  This error can also happen if the byte sequence does not match
the encoding expected by the server, which is controlled by
"client_encoding".

 

CREATE OR REPLACE FUNCTION "Fun_ABC1"() RETURNS VOID AS

$BODY$

/***Description   :  "Desc1" - Description contains character  ***/

 

CREATE OR REPLACE FUNCTION "Fun_ABC12"() RETURNS VOID AS

$BODY$

/***Description   :  "Desc1"  - Description contains character ***/

 

Note : This was posted in pg_hackers/ As per the suggestions now
reposting in general

 

 

 




Re: [GENERAL] COPY for .CSV files problem

2007-10-01 Thread Jorge Godoy
On Monday 01 October 2007 05:20:52 pere roca wrote:
>   Hi everybody,
>   I want to enter a .CSV file using COPY comand and plpgsql. It enters
> lat,lon and some data. In the CSV data there is no field (such as
> "user_name" or current_time) that allow distinguish future queries for
> different users (ex: select x,y from table where user_name=z; after
> entering lat,lon I would like to update the_geom in  another table using
> the lat,lon of ONLY this user).
>
>   I could update this kind of data ("user_name") using php but next user
> could not enter his data by COPY (it only lets fill the destination table
> if ALL the fields are the same as the CSV or text file).
>   So, I have a problem. I thought that may be I can reference/select data
> of a SINGLE USER using parameters like default_time without need to insert
> this data as a field in the table (otherwise, problems with COPY). Maybe
> can I use catalog?

COPY won't allow you to process your data.  You either write something to 
insert record by record or you change your data to include the needed 
information before using COPY.

I would change the input file to include the user name.
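
A common variant of that idea (sketch only, with made-up table, column and
file names): COPY the file as-is into a temporary staging table that matches
it exactly, then move the rows into the real table together with the user
name.

  CREATE TEMP TABLE staging (lat numeric, lon numeric, val numeric);
  COPY staging FROM '/tmp/data.csv' WITH CSV;
  INSERT INTO measurements (user_name, lat, lon, val)
  SELECT 'some_user', lat, lon, val FROM staging;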


-- 
Jorge Godoy  <[EMAIL PROTECTED]>


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] 3 tables join update

2007-10-01 Thread Alban Hertroys
rihad wrote:
> Richard Broersma Jr wrote:
> UPDATE Foo foo
> SET ...
> FROM LEFT JOIN Bar bar USING(common_field)
> WHERE blah='blah' AND bar.common_field IS NULL;
> 
> ERROR:  syntax error at or near "JOIN"
> 
> 
> I know I'm misusing UPDATE ... FROM because I don't really want Bar's
> values to go into Foo, but only using them for a conditional update
> (atomically I hope).

Oh, you mean:

UPDATE foo
SET ...
WHERE blah = 'blah'
  AND NOT EXISTS (
SELECT 1 FROM baz WHERE foo.common_field = baz.common_field
  )

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] more problems with count(*) on large table

2007-10-01 Thread Alban Hertroys
Mike Charnoky wrote:
> With respect to the ALTER TABLE SET STATISTICS... how do I determine a
> good value to use?  This wasn't really clear in the pg docs.  Also, do I
> need to run ANALYZE on the table after I change the statistics?
> 
> Here are the EXPLAINs from the queries:
> 
> db=# explain select count(*) from prediction_accuracy where evtime
> between '2007-09-25' and '2007-09-26';
> 
> QUERY PLAN
> 
> -
>  Aggregate  (cost=475677.40..475677.41 rows=1 width=0)
>->  Index Scan using pred_acc_evtime_index on prediction_accuracy
> (cost=0.00..51.44 rows=12490383 width=0)
>  Index Cond: ((evtime >= '2007-09-25 00:00:00-07'::timestamp
> with time zone) AND (evtime <= '2007-09-26 00:00:00-07'::timestamp with
> time zone))
> (3 rows)
> 
> db=# explain select count(*) from prediction_accuracy where evtime
> between '2007-09-26' and '2007-09-27';
> 
> QUERY PLAN
> 
> -
>  Aggregate  (cost=486615.04..486615.05 rows=1 width=0)
>->  Index Scan using pred_acc_evtime_index on prediction_accuracy
> (cost=0.00..454671.07 rows=12777586 width=0)
>  Index Cond: ((evtime >= '2007-09-26 00:00:00-07'::timestamp
> with time zone) AND (evtime <= '2007-09-27 00:00:00-07'::timestamp with
> time zone))
> (3 rows)

Interesting, same plans and no sequential scans... Yet totally different
run times. Almost as if something prevents you to read some records
between 26 and 27 september...

I'm no expert on locking in Postgres, but AFAIK locks that prevent you
from reading records are rather rare and probably only issued from
userland code.

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] more problems with count(*) on large table

2007-10-01 Thread Alban Hertroys
A. Kretschmer wrote:
> Again: an index can't help! Because of MVCC: 'select count(*)' without
> WHERE-condition forces an seq. table-scan. 

That has very little to do with MVCC.

If I understand correctly, MVCC is about the availability of records in
the current transaction. For that it makes no difference whether a seq
scan or an index scan is performed - both cases need to check at the
record level whether it's visible (where the seq scan is already looking
at the actual record, of course).

The only difference MVCC makes here is that the balance between a seq
scan or an index scan being more efficient is somewhat sooner in favour
of the seq scan than on some other databases, because the index scan
needs to look at the actual record for visibility.

I plead not guilty ;)

-- 
Alban Hertroys
[EMAIL PROTECTED]

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[GENERAL] COPY for .CSV files problem

2007-10-01 Thread pere roca

  Hi everybody,
  I want to enter a .CSV file using the COPY command and plpgsql. It enters
lat,lon and some data. In the CSV data there is no field (such as
"user_name" or current_time) that allows distinguishing future queries for
different users (ex: select x,y from table where user_name=z; after entering
lat,lon I would like to update the_geom in another table using the lat,lon
of ONLY this user). 
 
  I could update this kind of data ("user_name") using php, but the next user
could not enter his data by COPY (it only lets you fill the destination table
if ALL the fields are the same as in the CSV or text file).
  So, I have a problem. I thought that maybe I can reference/select data of
a SINGLE USER using parameters like default_time without needing to insert
this data as a field in the table (otherwise, problems with COPY). Maybe I
can use the catalog?

  Thanks in advance,
   Pere
-- 
View this message in context: 
http://www.nabble.com/COPY-for-.CSV-files-problem-tf4546778.html#a12974836
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[GENERAL] using COPY, .CSV and ¿catalog?

2007-10-01 Thread pere roca ristol
  Hi everybody,

  I want to enter a .CSV file using the COPY command and plpgsql.
  It enters latitude,longitude and some data. In the CSV data there is no
field (such as "user_name" or current_time) that allows distinguishing future
queries for different users (ex: select x,y from table where user_name=z;
after entering lat,lon I would like to update the_geom in another table
using the lat,lon of ONLY this user).
  I could update this kind of data ("user_name") using php, but the next user
could not enter his data by COPY (it only lets you fill the destination table
if ALL the fields are the same as in the CSV or text file).
  So, I have a problem. I thought that maybe I can reference/select data of
a SINGLE USER using parameters like default_time without needing to insert
this data as a field in the table (otherwise, problems with COPY). Maybe by
asking the catalog?

  Thanks,
  Pere