Re: [SQL] Question on triggers and plpgsql
On Apr 7, 2005, at 5:45 PM, Carlos Moreno wrote:

> The thing seems to work -- I had to go in a shell as user postgres and
> execute the command:
>
> $ createlang -d dbname plpgsql
>
> (I'm not sure I understand why that is necessary, or what implications
> -- positive or negative -- it may have)

As a security measure, no pl language is available by default. What you did is correct. There is not much (any?) risk with pl/pgsql, so you can install it in template1 so it will be available in any new database you create.

> Am I doing the right thing? Have I introduced some sort of catastrophe
> waiting to happen?

I did not notice any problems.

John DeSoi, Ph.D.
http://pgedit.com/
Power Tools for PostgreSQL
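[A minimal sketch of the template1 approach John describes, run in a shell as the postgres superuser; it mirrors the command quoted above with the database name swapped:

    $ createlang -d template1 plpgsql    # every future CREATE DATABASE copies template1
    $ createlang -l template1            # confirm: lists installed languages
]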
[SQL] getting count for a specific querry
Per a thread a while back, the discussion was along the lines of serving data up to the web quickly. Our app currently pulls a bunch of data to several query pages. I have tried and not gotten the queries to return as fast as they do now, which is a huge disappointment as the hardware is twice as powerful as our current production. I can get it pretty close on some things, but others are just plain slower.

My idea is to use LIMIT and OFFSET to return just the first 50 records; if they hit next I can set the offset. My understanding was this gets slower as you move further into the data, but we have several options to modify the search, and I do not believe our clients will page very far into a dataset.

One problem I think I will have, though, is they currently have a count of the records matching their request, and I would like to keep that as a display field.

So given a table of associates, my default query will be something like

    select * from tblassoc where clientnum = 'WAZ' and isdeleted is false

The user could add criteria like

    and firstname like 'Joel'

Currently it returns all records with a count and a display of the records you're viewing, like 1-50 of 470; next page is 51-100, etc.

Is there a fast way to get the count? Will this concept fly?

Also, I am getting heat that my search is now case sensitive. What is the best way to get a case-insensitive search? I could use ~* or perhaps do an upper(firstname) in the select, etc.?

Thanks for any ideas here. I have tried playing with various settings and have not seen my times change much; I will pursue this on the performance mailing list.

Joel Fradkin
Re: [SQL] Question on triggers and plpgsql
On Apr 8, 2005, at 8:28 AM, John DeSoi wrote:

> On Apr 7, 2005, at 5:45 PM, Carlos Moreno wrote:
>
>> The thing seems to work -- I had to go in a shell as user postgres and
>> execute the command:
>>
>> $ createlang -d dbname plpgsql
>>
>> (I'm not sure I understand why that is necessary, or what implications
>> -- positive or negative -- it may have)
>
> As a security measure, no pl language is available by default. What you
> did is correct. There is not much (any?) risk with pl/pgsql, so you can
> install it in template1 so it will be available in any new database you
> create.
>
>> Am I doing the right thing? Have I introduced some sort of catastrophe
>> waiting to happen?
>
> I did not notice any problems.

Just one detail, but in the form of a question. In the original posting, I think the trigger was doing the logging for something happening on a table as a BEFORE insert or update--I may be wrong on that detail. I would think of doing such actions AFTER the update/insert. In the world of transaction-safe operations, is there ANY danger in doing the logging as a BEFORE trigger rather than an AFTER trigger?

Thanks,
Sean
Re: [SQL] Question on triggers and plpgsql
On Apr 8, 2005, at 9:41 AM, Sean Davis wrote:

> Just one detail, but in the form of a question. In the original
> posting, I think the trigger was doing the logging for something
> happening on a table as a BEFORE insert or update--I may be wrong on
> that detail. I would think of doing such actions AFTER the
> update/insert. In the world of transaction-safe operations, is there
> ANY danger in doing the logging as a BEFORE trigger rather than an
> AFTER trigger?

Good point. I think both will work in this case, and it would depend on the application if it makes a difference. You definitely want an AFTER trigger if you need to see the final state of the row before making changes. In this case the assignment of the column does not depend on any other factors, so it would not seem to matter. But I agree that from a semantics point of view, an AFTER trigger might be a little better for this.

John DeSoi, Ph.D.
http://pgedit.com/
Power Tools for PostgreSQL
Re: [SQL] Question on triggers and plpgsql
Sean Davis <[EMAIL PROTECTED]> writes:
> Just one detail, but in the form of a question. In the original
> posting, I think the trigger was doing the logging for something
> happening on a table as a before insert or update--I may be wrong on
> that detail. I would think of doing such actions AFTER the
> update/insert. In the world of transaction-safe operations, is there
> ANY danger in doing the logging as a BEFORE trigger rather than an
> AFTER trigger?

No, actually Carlos wanted to do

	new.last_modified = now();

so he *must* use a BEFORE trigger --- AFTER is too late to change the data that will be stored.

Generalizing freely, I've seen three basic uses for triggers:

1. Modify the data that will be stored.
2. Check that data is valid (eg, consistent with another table).
3. Propagate updates in one place to other places.

Clearly #1 must be done in BEFORE triggers. #2 and #3 could be done either way. They are often done in AFTER triggers because that way you *know* that any case-1 triggers have done their work and you are looking at the correct final state of the row. But you could do them in a BEFORE trigger if you were willing to assume that no later-fired trigger would make a change that invalidates your check or propagation. AFTER triggers are relatively expensive (since the triggering event state has to be saved and then recalled) so I could see making that tradeoff if performance is critical.

AFAICS the only way that you could get into a can't-roll-back situation is if the trigger tries to propagate the update outside the database. For instance, the proverbial trigger to send mail: once sent you can't cancel it. But really this is dangerous even in an AFTER trigger --- the transaction could still be rolled back after the AFTER trigger fires.

			regards, tom lane
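[A minimal sketch of the case-1 trigger under discussion; the table, column, and trigger names here are hypothetical:

    CREATE OR REPLACE FUNCTION stamp_last_modified() RETURNS trigger AS $$
    BEGIN
        -- fires BEFORE INSERT OR UPDATE, so the row can still be changed
        NEW.last_modified := now();
        RETURN NEW;  -- the (possibly modified) row that will be stored
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER mytable_stamp_last_modified
        BEFORE INSERT OR UPDATE ON mytable
        FOR EACH ROW EXECUTE PROCEDURE stamp_last_modified();
]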
Re: [SQL] Question on triggers and plpgsql
On Fri, Apr 08, 2005 at 10:36:26AM -0400, Tom Lane wrote:
> AFAICS the only way that you could get into a can't-roll-back situation
> is if the trigger tries to propagate the update outside the database.
> For instance, the proverbial trigger to send mail: once sent you can't
> cancel it. But really this is dangerous even in an AFTER trigger ---
> the transaction could still be rolled back after the AFTER trigger
> fires.

People who know more about this will no doubt correct me, but isn't such a case crying out for LISTEN/NOTIFY instead? That is, your trigger puts the mail content into a table of mails to be sent, and wakes up the mail-sender client with the NOTIFY; the NOTIFY and the commit to the mail-it table only happen in that case if the transaction commits. And since mail is async anyway, the extra few seconds shouldn't make any difference, right?

A

--
Andrew Sullivan | [EMAIL PROTECTED]
The plural of anecdote is not data.
		--Roger Brinner
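[A minimal sketch of the pattern Andrew describes; all table, trigger, and channel names are hypothetical. The NOTIFY is delivered to listeners only if the surrounding transaction commits:

    CREATE TABLE pending_mail (
        id        serial PRIMARY KEY,
        recipient text NOT NULL,
        body      text NOT NULL,
        queued_at timestamptz NOT NULL DEFAULT now()
    );

    CREATE OR REPLACE FUNCTION notify_mail_sender() RETURNS trigger AS $$
    BEGIN
        NOTIFY mail_pending;  -- wakes any client that has run LISTEN mail_pending
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER pending_mail_notify
        AFTER INSERT ON pending_mail
        FOR EACH ROW EXECUTE PROCEDURE notify_mail_sender();

    -- the mail-sender client runs LISTEN mail_pending; and, when woken,
    -- reads rows from pending_mail and sends them
]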
Re: [SQL] getting count for a specific querry
On Fri, Apr 08, 2005 at 09:29:13AM -0400, Joel Fradkin wrote:
> My understanding was this gets slower as you move further into the data, but
> we have several options to modify the search, and I do not believe our
> clients will page very far into a dataset.

It gets slower because when you do an offset of 50, you have to pass through the first 50 before picking up the ones you want. If you offset 100, you scan through the first 100. &c. If you don't want to pay that, you need to use a cursor, but this causes the problem that you have to keep your cursor open across page views, which is a tricky issue on the Web.

> Currently it returns all records with a count and a display of the records
> you're viewing, like 1-50 of 470, next page is 51-100 etc.
>
> Is there a fast way to get the count?

Not really, no. You have to perform a count() to get it, which is possibly expensive. One way to do it, though, is to do

	SELECT count(*) FROM tablename WHERE condition LIMIT n;

or something like that. Assuming the condition is reasonably limited (i.e. it's not going to cost you a fortune to run this), you'll get the right number back if the number is < n or else you'll get n. If you have n, your application can say "viewing 1-50 of at least n records". This is something you see from time to time in this sort of application.

> getting heat that my search is now case sensitive. What is the best way to
> get a case insensitive search? I could use ~* or perhaps do an
> UPPER(firstname) in the select etc?

The upper() (or lower() -- whatever) strategy is what I'd use. In any case, you want to make sure you put functional indexes on all such columns, because otherwise you'll never get an index scan.

A

--
Andrew Sullivan | [EMAIL PROTECTED]
In the future this spectacle of the middle classes shocking the avant-garde will probably become the textbook definition of Postmodernism.
		--Brad Holland
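[A minimal sketch of the functional-index strategy against Joel's example table; the index name is hypothetical, and the predicate must match the indexed expression exactly for an index scan to be possible:

    CREATE INDEX tblassoc_lower_firstname ON tblassoc (lower(firstname));

    SELECT *
    FROM tblassoc
    WHERE clientnum = 'WAZ'
      AND isdeleted IS FALSE
      AND lower(firstname) = lower('Joel');
]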
Re: [SQL] Question on triggers and plpgsql
Andrew Sullivan <[EMAIL PROTECTED]> writes:
> On Fri, Apr 08, 2005 at 10:36:26AM -0400, Tom Lane wrote:
>> AFAICS the only way that you could get into a can't-roll-back situation
>> is if the trigger tries to propagate the update outside the database.
>> For instance, the proverbial trigger to send mail: once sent you can't
>> cancel it. But really this is dangerous even in an AFTER trigger ---
>> the transaction could still be rolled back after the AFTER trigger
>> fires.

> People who know more about this will no doubt correct me, but isn't
> such a case crying out for LISTEN/NOTIFY instead? That is, your
> trigger puts the mail content into a table of mails to be sent, and
> wakes up the mail-sender client with the NOTIFY; the NOTIFY and the
> commit to the mail-it table only happen in that case if the
> transaction commits. And since mail is async anyway, the extra few
> seconds shouldn't make any difference, right?

We do often recommend that, though it occurs to me that this just moves the failure case somewhere else. The hypothetical mail-sending process would presumably want to send mail and then delete the associated record from the table of pending mails ... so what if it fails after sending the mail and before committing the delete?

What this does do for you is replace the risk of phantom emails (mail sent but corresponding action inside the database never committed) with the risk of duplicate emails (mail-sender sends you another one after it restarts). In most cases I think I'd prefer the latter.

			regards, tom lane
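[A sketch of the sender's side of the queue, showing the window Tom describes; the names follow the hypothetical pending_mail table above:

    BEGIN;
    SELECT id, recipient, body FROM pending_mail ORDER BY id LIMIT 1 FOR UPDATE;
    -- ... client sends the mail here, outside the database ...
    DELETE FROM pending_mail WHERE id = 42;  -- the id fetched above
    COMMIT;
    -- a crash between sending and COMMIT rolls back the DELETE, so the
    -- same mail goes out again on restart: duplicates, not phantoms
]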
Re: [SQL] Question on triggers and plpgsql
Tom,

Thanks for setting the record straight. It has been a while since I have written a trigger and I forgot that you can't modify the row in the AFTER trigger. Makes perfect sense. For the record, here is what the docs say:

> Typically, row before triggers are used for checking or modifying the
> data that will be inserted or updated. For example, a before trigger
> might be used to insert the current time into a timestamp column, or
> to check that two elements of the row are consistent. Row after
> triggers are most sensibly used to propagate the updates to other
> tables, or make consistency checks against other tables. The reason
> for this division of labor is that an after trigger can be certain it
> is seeing the final value of the row, while a before trigger cannot;
> there might be other before triggers firing after it. If you have no
> specific reason to make a trigger before or after, the before case is
> more efficient, since the information about the operation doesn't have
> to be saved until end of statement.

It might be worth adding a sentence here that explicitly states modifications can only be made in the BEFORE trigger. I did not see that anywhere else in the document.

On Apr 8, 2005, at 10:36 AM, Tom Lane wrote:

> No, actually Carlos wanted to do
>
>	new.last_modified = now();
>
> so he *must* use a BEFORE trigger --- AFTER is too late to change the
> data that will be stored.
>
> [...]
>
> AFTER triggers are relatively expensive (since the triggering event
> state has to be saved and then recalled) so I could see making that
> tradeoff if performance is critical.

John DeSoi, Ph.D.
http://pgedit.com/
Power Tools for PostgreSQL
Re: [SQL] Question on triggers and plpgsql
On Fri, Apr 08, 2005 at 11:35:47AM -0400, Tom Lane wrote:
> What this does do for you is replace the risk of phantom emails (mail
> sent but corresponding action inside the database never committed)
> with the risk of duplicate emails (mail-sender sends you another one
> after it restarts). In most cases I think I'd prefer the latter.

Me too. Besides, you already have this risk with SMTP, because a message can be queued and accepted on the remote side when the local side goes away, so that the session is completed improperly. Depending on configuration and a bunch of painful start-up possibilities with the server, you might well get a duplicate copy of a mail transmitted later. (In the present age, given the remarkable quality of networks and mail servers everyone has, you almost never have this happen any more. But it's still strictly speaking possible.)

A

--
Andrew Sullivan | [EMAIL PROTECTED]
The fact that technology doesn't work is no bar to success in the marketplace.
		--Philip Greenspun
Re: [SQL] Question on triggers and plpgsql
Tom Lane wrote:
> The hypothetical mail-sending process would presumably want to send
> mail and then delete the associated record from the table of pending
> mails ... so what if it fails after sending the mail and before
> committing the delete?
>
> What this does do for you is replace the risk of phantom emails (mail
> sent but corresponding action inside the database never committed)
> with the risk of duplicate emails (mail-sender sends you another one
> after it restarts). In most cases I think I'd prefer the latter.

You have this possibility anyway. If a mailserver thinks it has failed to forward the message, it will resend. There is always a small window where the receiving mailserver might actually have received the message without the acknowledgement being logged by the sender.

--
Richard Huxton
Archonet Ltd
Re: [SQL] getting count for a specific querry
On Fri, 2005-04-08 at 11:07 -0400, Andrew Sullivan wrote:
> On Fri, Apr 08, 2005 at 09:29:13AM -0400, Joel Fradkin wrote:
>>
>> Is there a fast way to get the count?
>
> Not really, no. You have to perform a count() to get it, which is
> possibly expensive. One way to do it, though, is to do
>
>	SELECT count(*) FROM tablename WHERE condition LIMIT n;
>
> or something like that. Assuming the condition is reasonably limited
> (i.e. it's not going to cost you a fortune to run this), you'll get
> the right number back if the number is < n or else you'll get n.

come again ?

    test=# select count(*) from a;
     count
    -------
         3
    (1 row)

    test=# select count(*) from a limit 2;
     count
    -------
         3
    (1 row)

the LIMIT clause limits the number of rows returned by the select, in this case 1 row.

maybe you mean something like:

    test=# select count(*) from (select * from a limit 2) as foo;
     count
    -------
         2
    (1 row)

gnari
Re: [SQL] getting count for a specific querry
On Fri, 2005-04-08 at 09:29 -0400, Joel Fradkin wrote:
> Our app currently pulls a bunch of data to several query pages.
>
> My idea is to use the limit and offset to return just the first 50
> records, if they hit next I can set the offset.
>
> My understanding was this gets slower as you move further into the
> data, but we have several options to modify the search, and I do not
> believe our clients will page very far into a dataset.

you might reduce the performance loss if your dataset is ordered by a UNIQUE index.

    select * from mytable where somecondition
      ORDER by uniquecol limit 50;

and next:

    select * from mytable where somecondition AND uniquecol>?
      ORDER by uniquecol limit 50 OFFSET 50;

where the ? is placeholder for last value returned by last query.

if your unique index is a multi-column one, the method is slightly more complicated, but the same idea.

gnari
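[For the multi-column case gnari mentions, a sketch of the same idea on a hypothetical unique index over (lastname, id); the OR condition selects rows strictly after the last row of the previous page:

    SELECT *
    FROM mytable
    WHERE somecondition
      AND (lastname > 'Fradkin'
           OR (lastname = 'Fradkin' AND id > 1234))  -- last values seen
    ORDER BY lastname, id
    LIMIT 50;
]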
Re: [SQL] getting count for a specific querry
On Fri, Apr 08, 2005 at 04:17:45PM +0000, Ragnar Hafstað wrote:
> On Fri, 2005-04-08 at 11:07 -0400, Andrew Sullivan wrote:
>>
>> SELECT count(*) FROM tablename WHERE condition LIMIT n;

> the LIMIT clause limits the number of rows returned by the select,
> in this case 1 row.
>
> maybe you mean something like:
>
> test=# select count(*) from (select * from a limit 2) as foo;

Yes, that was stupid of me. That's what I meant, though.

A

--
Andrew Sullivan | [EMAIL PROTECTED]
When my information changes, I alter my conclusions. What do you do sir?
		--attr. John Maynard Keynes
Re: [SQL] getting count for a specific querry
Ragnar Hafstað <[EMAIL PROTECTED]> writes:
> you might reduce the performance loss if your dataset is ordered by
> a UNIQUE index.

> select * from mytable where somecondition
>   ORDER by uniquecol limit 50;

> and next:

> select * from mytable where somecondition AND uniquecol>?
>   ORDER by uniquecol limit 50 OFFSET 50;

> where the ? is placeholder for last value returned by last query.

Uh, you don't want the OFFSET there do you? But otherwise, yeah, this is a popular solution for paging through some rows. Doesn't really help with the problem of counting the total dataset though ...

			regards, tom lane
[SQL] Numeric and CSV under 8.0.1 ?
Hello Everyone,

Currently, here at work, I am doing the whole 'advocacy' part of postgreSQL. It's not really hard to do, as the other databases are MySQL and Sybase ;)

There is obviously a whole spate of data munging going on in the background, and I noticed that psql in 8.0.1 now accepts CSV! Great.

Except, it doesn't really appear to be 100% there. Numerics wrapped in '...' don't want to appear to go in. Is this a 'known problem'?

            Table "public.visitor_main"
      Column   |     Type      | Modifiers
    -----------+---------------+-----------
     iuserid   | numeric(12,0) | not null
     firstname | character(25) | not null
     lastname  | character(25) | not null

Sample data line:

    '3236','Alonzo','Peter'

    ERROR:  invalid input syntax for type numeric: "'3236'"
    CONTEXT:  COPY visitor_main, line 1, column iuserid: "'3236'"

Thoughts?

Regards
Steph
Re: [SQL] getting count for a specific querry
> > select * from mytable where somecondition AND uniquecol>?
> > ORDER by uniquecol limit 50 OFFSET 50;

> > where the ? is placeholder for last value returned by last query.
>
> Uh, you don't want the OFFSET there do you? But otherwise, yeah,
> this is a popular solution for paging through some rows. Doesn't really
> help with the problem of counting the total dataset though ...

In the past I've done an EXPLAIN and parsed the plan to see what PostgreSQL estimated for the number of rows. If less than $threshold, I would do a count(*). If more than $threshold I would display to the user "approx N records". This seemed to be good enough for most cases.

--
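[A server-side sketch of that trick: a plpgsql function (the name is hypothetical; variants of this circulate in the list archives) that runs EXPLAIN on an arbitrary query and pulls the planner's row estimate out of the plan text:

    CREATE OR REPLACE FUNCTION count_estimate(query text) RETURNS integer AS $$
    DECLARE
        rec record;
        n   integer;
    BEGIN
        FOR rec IN EXECUTE 'EXPLAIN ' || query LOOP
            -- the top plan line looks like: ... (cost=0.00..4.50 rows=123 width=42)
            n := substring(rec."QUERY PLAN" FROM ' rows=([[:digit:]]+)');
            EXIT WHEN n IS NOT NULL;
        END LOOP;
        RETURN n;
    END;
    $$ LANGUAGE plpgsql;

    -- usage:
    -- SELECT count_estimate('SELECT * FROM tblassoc WHERE clientnum = ''WAZ''');
]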
Re: [SQL] getting count for a specific querry
On Fri, 2005-04-08 at 12:32 -0400, Tom Lane wrote:
> Ragnar Hafstað <[EMAIL PROTECTED]> writes:
>> you might reduce the performance loss if your dataset is ordered by
>> a UNIQUE index.
>
>> select * from mytable where somecondition
>>   ORDER by uniquecol limit 50;
>
>> and next:
>
>> select * from mytable where somecondition AND uniquecol>?
>>   ORDER by uniquecol limit 50 OFFSET 50;
>
>> where the ? is placeholder for last value returned by last query.
>
> Uh, you don't want the OFFSET there do you?

ooops! of course not. the uniquecol>? is meant to REPLACE the OFFSET.

gnari
Re: [SQL] getting count for a specific querry
Thanks all.

I might have to add a button to do the count on command so they don't get the hit. I would want it to return the count of the condition, not the currently displayed number of rows.

Is there any other database engine that provides better performance? (We just spent 2 months moving to postgres and it is not live yet, but if I am going to get results back slower than my 2 proc box running MSSQL in 2 gig and 2 processors, I can't see any reason to move to it.) The Postgres is on a 4 proc Dell with 8 gigs of memory. I thought I could analyze our queries and our config to optimize.

Joel Fradkin
Re: [SQL] Question on triggers and plpgsql
On Apr 8, 2005, at 10:59 AM, Andrew Sullivan wrote:

> wakes up the mail-sender client with the NOTIFY; the NOTIFY and the
> commit to the mail-it table only happen in that case if the
> transaction commits. And since mail is async anyway, the extra few
> seconds shouldn't make any difference, right?

I have a lot of processing that could benefit from this type of synchronization, except the fact that there's no Pg command to "wait until I get a notify message". You have to constantly poll to see if you got one, which negates a lot of the benefit of async notification to rarely run processes.

Vivek Khera, Ph.D.
+1-301-869-4449 x806
Re: [SQL] getting count for a specific querry
On Fri, 2005-04-08 at 12:08, Joel Fradkin wrote:
> Thanks all.
> I might have to add a button to do the count on command so they don't
> get the hit. I would want it to return the count of the condition, not
> the currently displayed number of rows.
>
> Is there any other database engine that provides better performance?
> (We just spent 2 months moving to postgres and it is not live yet, but
> if I am going to get results back slower than my 2 proc box running
> MSSQL in 2 gig and 2 processors, I can't see any reason to move to it.)
> The Postgres is on a 4 proc Dell with 8 gigs of memory.
> I thought I could analyze our queries and our config to optimize.

Judging postgresql on one single data point (count(*) performance) is quite unfair. Unless your system only operates on static data and is used to mostly do things like counting, in which case, why are you using a database?

PostgreSQL is a great fit for certain loads, and a poor fit for others. Are you going to have lots of people updating the database WHILE the select count(*) queries are running? Are you going to be doing other, more interesting things than simply counting? If so, you really should build a test case that emulates what you're really going to be doing with the system.

I've found that the poor performance of aggregates in PostgreSQL is generally more than made up for by the outstanding behaviour it exhibits when under heavy parallel load.

Note that the basic design of PostgreSQL's MVCC system is such that without using some kind of trigger to maintain pre-calculated aggregate information, it will NEVER be as fast as most other databases at doing aggregates across large chunks of your data.
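[A minimal sketch of the trigger-maintained aggregate Scott mentions, keeping per-client row counts in a side table. All names are hypothetical; the summary table must be seeded once from a real count(*), and the single summary row per client becomes a lock hotspot under heavy concurrent writes -- that is the tradeoff:

    CREATE TABLE assoc_counts (
        clientnum varchar(3) PRIMARY KEY,
        n         bigint NOT NULL
    );

    CREATE OR REPLACE FUNCTION maintain_assoc_count() RETURNS trigger AS $$
    BEGIN
        IF TG_OP = 'INSERT' THEN
            UPDATE assoc_counts SET n = n + 1 WHERE clientnum = NEW.clientnum;
        ELSIF TG_OP = 'DELETE' THEN
            UPDATE assoc_counts SET n = n - 1 WHERE clientnum = OLD.clientnum;
        END IF;
        RETURN NULL;  -- return value is ignored for AFTER triggers
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER tblassoc_maintain_count
        AFTER INSERT OR DELETE ON tblassoc
        FOR EACH ROW EXECUTE PROCEDURE maintain_assoc_count();
]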
Re: [SQL] Numeric and CSV under 8.0.1 ?
On Fri, 8 Apr 2005 12:51:47 -0400, Stef wrote

> Hello Everyone,
> Currently, here at work, I am doing the whole 'advocacy' part of
> postgreSQL. It's not really hard to do, as the other databases are
> MySQL and Sybase ;)
>
> There is obviously a whole spate of data munging going on in the
> background, and I noticed that psql in 8.0.1 now accepts CSV! Great.
>
> Except, it doesn't really appear to be 100% there. Numerics wrapped in
> '...' don't want to appear to go in. Is this a 'known problem'?
>
>             Table "public.visitor_main"
>       Column   |     Type      | Modifiers
>     -----------+---------------+-----------
>      iuserid   | numeric(12,0) | not null
>      firstname | character(25) | not null
>      lastname  | character(25) | not null
>
> Sample data line:
> '3236','Alonzo','Peter'
>
> ERROR:  invalid input syntax for type numeric: "'3236'"
> CONTEXT:  COPY visitor_main, line 1, column iuserid: "'3236'"
>
> Thoughts?
> Regards
> Steph

Steph,

'3236' is a string, not a numeric. As I see it (novice that I am) you have three choices:

1) Write an external program (gawk, sed, etc.) to remove the quotes around that field.
2) Import the data into an intermediate table and then, using an after trigger, move and manipulate the data using CAST.
3) Import the data into your table using a BEFORE trigger and manipulate the data using CAST.

HTH

Kind Regards,
Keith
Re: [SQL] Numeric and CSV under 8.0.1 ?
Hello Keith,

Well, I can understand that 3236 appears to be a string, but I was under the impression that -everything- in a CSV format file could have ' ' around it? Is this not the case? Sorry if I am being completely insane here :)

Steph

On Fri, Apr 08, 2005 at 02:12:11PM -0400, Keith Worthington wrote:
> On Fri, 8 Apr 2005 12:51:47 -0400, Stef wrote
>> [...]
>> Sample data line:
>> '3236','Alonzo','Peter'
>>
>> ERROR:  invalid input syntax for type numeric: "'3236'"
>> CONTEXT:  COPY visitor_main, line 1, column iuserid: "'3236'"
>>
>> Thoughts?
>
> Steph,
>
> '3236' is a string, not a numeric. As I see it (novice that I am) you
> have three choices:
>
> 1) Write an external program (gawk, sed, etc.) to remove the quotes
> around that field.
> 2) Import the data into an intermediate table and then, using an after
> trigger, move and manipulate the data using CAST.
> 3) Import the data into your table using a BEFORE trigger and
> manipulate the data using CAST.
>
> HTH
>
> Kind Regards,
> Keith
Re: [SQL] Numeric and CSV under 8.0.1 ?
On Fri, Apr 08, 2005 at 02:25:13PM -0400, Stef wrote:
>
> well, I can understand that 3236 appears to be a
> string, but I was under the impression that -everything-
> in a CSV format file could have ' ' around it? Is this not
> the case ?

See the documentation for COPY -- the default quote character for CSV is a double quote, but you can change it with QUOTE.

http://www.postgresql.org/docs/8.0/interactive/sql-copy.html

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/
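[Following Michael's pointer, a sketch against Steph's table (the file path is hypothetical); QUOTE is set to a single quote, escaped by doubling it inside the string literal:

    COPY visitor_main FROM '/tmp/visitor_main.csv' WITH CSV QUOTE AS '''';
]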
Re: [SQL] Question on triggers and plpgsql
Vivek Khera <[EMAIL PROTECTED]> writes:
> I have a lot of processing that could benefit from this type of
> synchronization, except the fact that there's no Pg command to "wait
> until I get a notify message".

This is a client library deficiency, not a problem with the backend or the protocol. In libpq it is actually possible to do it, but you have to select() or poll() on the socket for yourself, which is a tad ugly. OTOH, most apps that want to do that also want to wait on other sockets at the same time, so a cleaner-looking API wouldn't necessarily be any more useful.

			regards, tom lane
Re: [SQL] getting count for a specific querry
On Apr 8, 2005 1:10 PM, Scott Marlowe <[EMAIL PROTECTED]> wrote:
> On Fri, 2005-04-08 at 12:08, Joel Fradkin wrote:
>> Thanks all.
>> I might have to add a button to do the count on command so they don't
>> get the hit. [...]
>
> Judging postgresql on one single data point (count(*) performance) is
> quite unfair. [...]
>
> Note that the basic design of PostgreSQL's MVCC system is such that
> without using some kind of trigger to maintain pre-calculated aggregate
> information, it will NEVER be as fast as most other databases at doing
> aggregates across large chunks of your data.

From a simple/high-level perspective, why is this? That is, why can't PostgreSQL do aggregates as well across large chunks of data? I'm assuming it's extremely complicated. Otherwise the folks around here would have churned out a fix in a month or less and made this issue a past story.
Re: [SQL] getting count for a specific querry
Bob Henkel <[EMAIL PROTECTED]> writes:
> From a simple/high-level perspective, why is this? That is, why can't
> PostgreSQL do aggregates as well across large chunks of data? I'm
> assuming it's extremely complicated. Otherwise the folks around here
> would have churned out a fix in a month or less and made this issue a
> past story.

You can find very detailed discussions of this in the archives, but the basic reason is that we have a very general/extensible view of aggregates (which is how come we can support custom aggregates). An aggregate is a function that you feed all the input rows to, one at a time, and then it produces the answer. Nice, general, extensible, and not at all optimizable :-(

Now in general that is the only way to do it, and so Scott's implication that we always suck compared to other databases is really an overstatement. Ask another database to do a standard deviation calculation, for instance, and it'll be just as slow. However there are special cases that other DBs can optimize that we don't even try to. The big ones are:

* COUNT(*) across a whole table --- most non-MVCC databases keep tabs of the physical number of the rows in the table, and so they can answer this very quickly. Postgres doesn't keep such a count, and under MVCC rules it wouldn't necessarily be the right answer if we had it. (BTW, count of rows satisfying a particular condition is a different ballgame entirely; in most cases that can't be optimized at all, AFAIK.) If you are willing to accept approximate answers there are various tricks you can use --- see the archives --- but we don't get to fudge on COUNT(*) itself because it's in the SQL standard.

* MIN or MAX of an indexed column --- most DBs can use an index scan to find such a row relatively quickly, although whether this trick works or not depends a whole lot on whether you have WHERE or GROUP BY and just what those conditions look like. You can fake the min/max answer in Postgres by doing the transformation to an indexable query by hand, for instance instead of MAX(col) do

	SELECT col FROM tab ORDER BY col DESC LIMIT 1;

There are periodic discussions in the hackers list about teaching the planner to do that automatically, and it will probably happen someday; but it's a complicated task and not exceedingly high on the priority list.

			regards, tom lane
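[To make the extensibility point concrete, a minimal custom aggregate (hypothetical; a sum of squares) -- the executor can only hand such an aggregate one row at a time, which is exactly why it cannot be optimized generically:

    -- transition function: fold one input value into the running state
    CREATE FUNCTION sumsq_step(double precision, double precision)
    RETURNS double precision
    AS 'SELECT $1 + $2 * $2;'
    LANGUAGE sql STRICT;

    CREATE AGGREGATE sumsq (
        BASETYPE = double precision,
        SFUNC    = sumsq_step,
        STYPE    = double precision,
        INITCOND = '0'
    );

    -- usage: SELECT sumsq(somecol) FROM tab;
]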
Re: [SQL] getting count for a specific querry
Believe me, I just spent two months converting our app; I do not wish to give up on that work. We do a great deal more than count. Specifically, many of our queries run much slower on postgres. As mentioned, I purchased a 4 proc box with 8 gigs of memory for this upgrade (Dell may have been a poor choice based on comments I have received). Even when I could see a query like

    select * from tblassoc where clientnum = 'WAZ'

using indexed joins on location and jobtitle, it is still taking 22 seconds to run compared to the 9 seconds on MSSQL on a 2 proc 2 gig box. I got one of my queries to run faster using a page cost of .2, but then the assoc query was running 50 seconds, so I adjusted to a cost of 2 (tried 1.2, 2, 3, and 4 and did not see huge changes in the assoc except it did not like .2).

I have placed a call to commandprompt.com and am going to pay for some support to see if they have anything meaningful to add. It could be something with my hardware, my hardware config, my postgres config. I am just not sure. I know I have worked diligently to try to learn all I can, and I used to think I was kinda smart.

I set up the data on 4 10k scsi drives in a powervault and my wal on 2 15k drives. I am using links to those from the install directory. It starts and stops ok this way, but maybe it should be different.

I can tell you I am very happy to have this forum, as I could not have gotten to the point I am without the many useful comments from folks on the list. I greatly appreciate everyone who has helped. But the truth is, if I can't get it to work better than I have, I may have to ditch the effort and bite the 70K bullet. It's compounded by using 3 developers' time for two months to yield an answer that my boss may just fire me for. I figured since my first test showed I could get data faster on the postgres box that I could, with enough study, get all our data to go faster, but I am afraid I have not been very successful.

My failure is not a reflection on postgres; as you mentioned, it is definitely great at some things. I have 90 some views, not to mention as many stored procedures, that have been converted. I wrote an app to move the data and it works great. But if it is too slow I just will not be able to use it for production.

Joel

> Judging postgresql on one single data point (count(*) performance) is
> quite unfair. Unless your system only operates on static data and is
> used to mostly do things like counting, in which case, why are you
> using a database?
>
> [...]
>
> Note that the basic design of PostgreSQL's MVCC system is such that
> without using some kind of trigger to maintain pre-calculated aggregate
> information, it will NEVER be as fast as most other databases at doing
> aggregates across large chunks of your data.
Re: [SQL] getting count for a specific querry
On Fri, Apr 08, 2005 at 03:23:25PM -0400, Joel Fradkin wrote:
> Believe me, I just spent two months converting our app; I do not wish
> to give up on that work. We do a great deal more than count.
> Specifically, many of our queries run much slower on postgres. As
> mentioned, I purchased a 4 proc

I suspect you want the -performance list. And it'd be real handy to get some EXPLAIN ANALYSE results for the offending queries in order to help you (where "handy" is read as "necessary").

A

--
Andrew Sullivan | [EMAIL PROTECTED]
Information security isn't a technological problem. It's an economics problem.
		--Bruce Schneier
Re: [SQL] getting count for a specific querry
As always, thanks Tom. I will definitely look at what I can do. Since it is a count of matched condition records, I may not have a way around it. I don't think my clients would like me to approximate, as it is a count of their records.

What I plan on doing, assuming I can get all my other problems fixed (as mentioned, I am going to try to get paid help to see if I goofed it up somewhere), is make the count a button, so they don't wait every time but can choose to wait if need be. Maybe I can store the last count with a count date for the generic search it defaults to, and just have them do a count on demand if they have a specific query. Our screens have several criteria fields in each application.

Joel Fradkin

-----Original Message-----
From: Tom Lane [mailto:[EMAIL PROTECTED]
Sent: Friday, April 08, 2005 2:28 PM
To: Bob Henkel
Cc: Scott Marlowe; Joel Fradkin; Andrew Sullivan; pgsql-sql@postgresql.org
Subject: Re: [SQL] getting count for a specific querry

Bob Henkel <[EMAIL PROTECTED]> writes:
> From a simple/high-level perspective, why is this? [...]

You can find very detailed discussions of this in the archives, but the basic reason is that we have a very general/extensible view of aggregates (which is how come we can support custom aggregates). An aggregate is a function that you feed all the input rows to, one at a time, and then it produces the answer. Nice, general, extensible, and not at all optimizable :-(

[...]

			regards, tom lane
Re: [SQL] getting count for a specific querry
On Apr 8, 2005 2:23 PM, Joel Fradkin <[EMAIL PROTECTED]> wrote:
> Believe me, I just spent two months converting our app; I do not wish
> to give up on that work. We do a great deal more than count.
> Specifically, many of our queries run much slower on postgres. As
> mentioned, I purchased a 4 proc box with 8 gigs of memory for this
> upgrade (Dell may have been a poor choice based on comments I have
> received). [...]
>
> My failure is not a reflection on postgres; as you mentioned, it is
> definitely great at some things. I have 90 some views, not to mention
> as many stored procedures, that have been converted. I wrote an app to
> move the data and it works great. But if it is too slow I just will
> not be able to use it for production.
>
> Joel

Have you posted your postgresql config files for the folks here to review?
I can't say I can help you with that, because I too can only read the docs and go from there. But for specific situations you need specific configs. I would think you can get more out of postgresql with some time and help from the people around here. Though count(*) looks like it may be slow.
Re: [SQL] getting count for a specific querry
Quoting Scott Marlowe <[EMAIL PROTECTED]>:

> On Fri, 2005-04-08 at 12:08, Joel Fradkin wrote:
>> I might have to add a button to do the count on command so they don't
>> get the hit. I would want it to return the count of the condition,
>> not the currently displayed number of rows.
>
> Judging postgresql on one single data point (count(*) performance) is
> quite unfair. Unless your system only operates on static data and is
> used to mostly do things like counting, in which case, why are you
> using a database?

For the general discussion of slowness of count(*), and given no entry on the subject in http://www.postgresql.org/docs/faqs.FAQ.html ... I guess everyone has to be pointed at:

http://archives.postgresql.org/pgsql-hackers/2005-01/msg00247.php

However, the gist of this person's problem is that an adhoc query, NOT just a 'select count(*) from table', can take remarkably long. Again, the problem is that PG can't just scan an index.

One workaround for this is to use EXPLAIN. THIS APPLIES TO CASES WHERE THE USER HAS WIDE LIBERTY IN QUERIES. It's pointless overhead, otherwise. default_statistics_target is cranked up to 200 on all such tables, and pg_autovacuum is running. (If there were anything to improve, it would be refining the thresholds on this.)

If the "(cost...rows=" string returns a number higher than the QUERY row limit, the user is derailed ("That's not specific enough to answer immediately; do you want an emailed report?").

Otherwise, it runs EXPLAIN ANALYZE, which is still faster than the query itself. If the "(actual...rows=...)" count is higher than the RESULT row limit (PAGE limit), it then runs the query, with the PAGE rows offset and limit --- and happily, practically everything that query needs is now in shared_buffers. The count from the EXPLAIN ANALYZE is displayed in the web page.

--
"Dreams come true, not free." -- S.Sondheim, ITW
Re: [SQL] getting count for a specific querry
I have asked specific questions and paid attention to the various threads on configuration. I will take my config files and post on the performance thread; that is a good suggestion (personally I have more faith in this forum than a paid consultant, but at this point I am willing to try both). Thanks again.

The count thing I can get around using stored results and on-demand counting, but some of my statistical reporting is just a must-have. I enclosed one of my views; I realize to get help I should also include tables and indexes etc., and maybe I will do that. It is just that there are so many of them. This one in particular did not run at all when I first got my data loaded. I ended up adding a few indexes and not sure what else, and got it to run faster on postgres. Now it is running horrid, so I am back to the drawing board; I change one thing and something else breaks. I am just frustrated; maybe Monday I will have better strength to figure it all out.

Joel Fradkin

    CREATE OR REPLACE VIEW viwcasecube as
    /*customer 1*/
    select c.clientnum, c.casenum, c.casereferencenum,
      coalesce((select sum(quantity * amount) as merchandiseamount
                from tblmerchandise m
                where m.caseid = c.caseid and m.clientnum = c.clientnum), 0)
        || coalesce(c.totaladmitted, 0) as casevalue,
      coalesce(c.totaladmitted, 0) as admitted,
      coalesce(c.totalrecovery, 0) as recovered,
      coalesce(c.ageatcase, 0) as ageatcase,
      1 as numberofcase,
      coalesce(ct.value, 'na') as casetype,
      s.value as status,
      c.opendate, c.closedate,
      a.value as action,
      u2.completename as closebyuser,
      cs.value as casesource,
      m.value as method,
      m2.value as method2,
      c.reportingagentfirstinitial, c.reportingagentlastname,
      case when c.monthsemployedatcase is null then 'na'
           else cast(c.monthsemployedatcase as varchar(3)) end as monthsemployedatcase,
      u1.completename as createdby,
      st.value as subjecttype,
      ot.value as offensetype,
      /*cust*/
      custpt.value as patrontype,
      'na' as jobtitle,
      0 as testscore,
      coalesce(cust.firstname, 'na') as firstname,
      coalesce(cust.lastname, 'na') as lastname,
      coalesce(cust.address, 'na') as address,
      coalesce(cust.city, 'na') as city,
      coalesce(cust.state, 'na') as state,
      coalesce(cust.zip, 'na') as zip,
      coalesce(crtt.value, 'na') as restitutiontype, /* type of restitution tracking */
      coalesce(tblsex.value, 'na') as gender,
      coalesce(eth.value, 'na') as ethnicity,
      custmbt.value as militarybranch,
      custmst.value as militarystatus,
      coalesce(secagentnum, 'not recorded') as secagentnum,
      l.locationnum,
      l.name as store,
      coalesce(l.address, 'na') as locationaddress,
      coalesce(l.city, 'na') as locationcity,
      coalesce(l.state, 'na') as locationstate,
      coalesce(l.zip, 'na') as locationzip,
      d.districtnum,
      d.districtname as district,
      r.regionnum,
      r.regionname as region,
      dv.divisionnum,
      dv.divisionname as division,
      case when c.apprehdate is null then c.opendate else c.apprehdate end as apprehdate,
      to_char(coalesce(c.apprehdate, c.opendate), 'YYYY') as year,
      to_char(coalesce(c.apprehdate, c.opendate), 'q') as quarter,
      to_char(coalesce(c.apprehdate, c.opendate), 'MM') as month,
      to_char(coalesce(c.apprehdate, c.opendate), 'D') as weekday,
      to_char(coalesce(c.apprehdate, c.opendate), 'WW') as week,
      to_char(coalesce(c.apprehdate, c.opendate), 'HH24:MI') as time,
      coalesce(c.sourcedocnum, 'none') as sourcedocnum,
      case coalesce(c.sourcemodule, 'n')
           when 'n' then 'none'
           when 'i' then 'incident'
           when 'g' then 'general investigation'
           when 'e' then 'employee investigation'
           else 'none' end as sourcemodule,
      case coalesce(tblcase1.clientnum, 'no') || coalesce(cdacase.clicasenumber, 'no')
           when 'nono' then 'no'
           else 'yes' end as civilcase,
      coalesce(lpr.lpregionnum, 'na') as lpregionnum,
      coalesce(lpr.managername, 'na') as lpmanager
    from tblcase c
      left outer join tblaction a
        on c.actionid = a.id and c.clientnum = a.clientnum and 1 = a.presentationid
      left outer join tblmethod m
        on c.methodid = m.id and c.clientnum = m.clientnum and 1 = m.presentationid
      left outer join tblmethod m2
        on c.methodid2 = m2.id and c.clientnum = m2.clientnum and 1 = m2.presentationid
      left outer join tblcasesource cs
        on c.casesourceid = cs.id and c.clientnum = cs.clientnum and 1 = cs.presentationid
      inner join tbllocation l
        left outer join tbllpregion lpr
          on l.lpregionid = lpr.lpregionid and l.clientnum = lpr.clientnum
             and 1 = lpr.presentationid
        on c.clientnum = l.clientnum and c.locationid = l.locationid
      inner join tbldistrict d
        on c.clientnum = d.clientnum and l.districtid = d.districtid
           and l.regionid = d.regionid and l.divisionid = d.divisionid
      inner join tblregion r on c.clientnu
Re: [SQL] getting count for a specific querry
I will also look at doing it the way you describe; they do have wide liberty. Thanks so much for the ideas. Sorry I did not do a perusal of the archives first (I normally try that, but I think I am brain dead today).

Joel Fradkin
Wazagua, Inc.
2520 Trailmate Dr
Sarasota, Florida 34243
Tel. 941-753-7111 ext 305
[EMAIL PROTECTED]
www.wazagua.com
Powered by Wazagua
Providing you with the latest Web-based technology & advanced tools.
(C) 2004. WAZAGUA, Inc. All rights reserved. WAZAGUA, Inc

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Mischa Sandberg
Sent: Friday, April 08, 2005 2:40 PM
To: Scott Marlowe
Cc: pgsql-sql@postgresql.org
Subject: Re: [SQL] getting count for a specific querry

For the general discussion of slowness of count(*), and given no entry on the subject in http://www.postgresql.org/docs/faqs.FAQ.html ... I guess everyone has to be pointed at:

http://archives.postgresql.org/pgsql-hackers/2005-01/msg00247.php

One workaround for this is to use EXPLAIN. THIS APPLIES TO CASES WHERE THE USER HAS WIDE LIBERTY IN QUERIES. It's pointless overhead, otherwise. [...]

--
"Dreams come true, not free." -- S.Sondheim, ITW
Re: [SQL] getting count for a specific querry
On Apr 8, 2005, at 3:23 PM, Joel Fradkin wrote:

> I set up the data on 4 10k scsi drives in a powervault and my wal on 2
> 15k drives. I am using links to those from the install directory. It
> starts and stops ok this way, but maybe it should be different.

Your problem might just be the choice of using a Dell RAID controller. I have a 1 year old box connected to a 14 disk powervault (PowerEdge 2650) and it is dog slow compared to a dual opteron with 8 disks that is replacing it. It is all I/O for me, and the Dells just are not known for speedy I/O.

Vivek Khera, Ph.D.
+1-301-869-4449 x806
Re: [SQL] getting count for a specific querry
On Fri, 2005-04-08 at 15:23, Vivek Khera wrote:

> On Apr 8, 2005, at 3:23 PM, Joel Fradkin wrote:
> > I set up the data on 4 10k scsi drives in a powervault and my wal on 2 15k drives. I am using links to those from the install directory. It starts and stops ok this way, but maybe it should be different.
>
> Your problem might just be the choice of using a Dell RAID controller. I have a 1 year old box connected to a 14 disk powervault (PowerEdge 2650) and it is dog slow compared to a dual opteron with 8 disks that is replacing it. It is all I/O for me, and the Dells just are not known for speedy I/O.

Note that there are several different RAID controllers you can get with a Dell. I had good luck with the PERC 4C (AMI MegaRAID based) at my last job. In a dual 2400 MHz machine with 2 gigs of RAM, it handily outran a 4-way (about 1200 MHz CPUs) Windows/MSSQL box with 4 gigs of RAM at most tasks, especially inserts/updates. The Windows machine had the more generic PERC 3i type controller in it.

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [SQL] getting count for a specific querry
On Apr 8, 2005, at 4:31 PM, Scott Marlowe wrote:

> Note that there are several different RAID controllers you can get with a Dell. I had good luck with the PERC 4C (AMI MegaRAID based) at my

I've had bad luck regarding speed with *all* of them, AMI MegaRAID and Adaptec based ones, under high load. Under moderate to low load they're acceptable.

Vivek Khera, Ph.D.
+1-301-869-4449 x806
Re: [SQL] getting count for a specific querry
On Apr 8, 2005, at 4:35 PM, Bob Henkel wrote:

> desktop SATA drive with no RAID? I'm not by any means as knowledgeable about I/O setup as many of you are, but my 2 cents wonders if the Dell RAID is really that much slower than a competitively priced/specced alternative? Would Joel's problems just fade away if he wasn't using a Dell RAID?

"Dell RAID" is not one thing. They sell "altered" RAID cards from Adaptec and LSI. Whatever alteration they do to them tends to make them run not so fast.

I have a Dell SATA RAID (Adaptec based) on the office server and it is OK, though not something I'd buy again. I have various PERC 3 and PERC 4 RAID controllers on my servers (SCSI based) and they suck under heavy I/O load.

I wonder why the name-brand LSI cards work so much faster... perhaps it is the motherboard? I don't know, and I don't care... :-) For me, high performance DB and Dell servers are mutually exclusive.

Vivek Khera, Ph.D.
+1-301-869-4449 x806
Re: [SQL] getting count for a specific querry
On Fri, 2005-04-08 at 15:35, Bob Henkel wrote:

> On Apr 8, 2005 3:23 PM, Vivek Khera <[EMAIL PROTECTED]> wrote:
> > On Apr 8, 2005, at 3:23 PM, Joel Fradkin wrote:
> > > I set up the data on 4 10k scsi drives in a powervault and my wal on 2 15k drives. I am using links to those from the install directory. It starts and stops ok this way, but maybe it should be different.
> >
> > Your problem might just be the choice of using a Dell RAID controller. I have a 1 year old box connected to a 14 disk powervault (PowerEdge 2650) and it is dog slow compared to a dual opteron with 8 disks that is replacing it. It is all I/O for me, and the Dells just are not known for speedy I/O.
> >
> > Vivek Khera, Ph.D.
> > +1-301-869-4449 x806
>
> But that is relative, I would think. Is the Dell RAID much faster than my desktop SATA drive with no RAID? I'm not by any means as knowledgeable about I/O setup as many of you are, but my 2 cents wonders if the Dell RAID is really that much slower than a competitively priced/specced alternative? Would Joel's problems just fade away if he wasn't using a Dell RAID?

My experience with the 3i controllers (see my earlier post) was that my old dual Pentium Pro 200 machine with 512 meg of RAM, a generic Ultra Wide SCSI card, and a half dozen drives running software RAID 5 was faster. Seriously. So was my P-II-350 desktop with the same controller, and an older dual P-III-750 with only UltraSCSI running in a RAID-1 mirror set. The 3i is REALLY slow (or at least WAS slow) under Linux.

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
Re: [SQL] getting count for a specific querry
On Apr 8, 2005 3:23 PM, Vivek Khera <[EMAIL PROTECTED]> wrote:

> On Apr 8, 2005, at 3:23 PM, Joel Fradkin wrote:
> > I set up the data on 4 10k scsi drives in a powervault and my wal on 2 15k drives. I am using links to those from the install directory. It starts and stops ok this way, but maybe it should be different.
>
> Your problem might just be the choice of using a Dell RAID controller. I have a 1 year old box connected to a 14 disk powervault (PowerEdge 2650) and it is dog slow compared to a dual opteron with 8 disks that is replacing it. It is all I/O for me, and the Dells just are not known for speedy I/O.
>
> Vivek Khera, Ph.D.
> +1-301-869-4449 x806

But that is relative, I would think. Is the Dell RAID much faster than my desktop SATA drive with no RAID? I'm not by any means as knowledgeable about I/O setup as many of you are, but my 2 cents wonders if the Dell RAID is really that much slower than a competitively priced/specced alternative? Would Joel's problems just fade away if he wasn't using a Dell RAID?
Re: [SQL] getting count for a specific querry
On Apr 8, 2005 3:42 PM, Scott Marlowe <[EMAIL PROTECTED]> wrote:

> On Fri, 2005-04-08 at 15:35, Bob Henkel wrote:
> > But that is relative, I would think. Is the Dell RAID much faster than my desktop SATA drive with no RAID? Would Joel's problems just fade away if he wasn't using a Dell RAID?
>
> My experience with the 3i controllers (see my earlier post) was that my old dual Pentium Pro 200 machine with 512 meg of RAM, a generic Ultra Wide SCSI card, and a half dozen drives running software RAID 5 was faster. Seriously. So was my P-II-350 desktop with the same controller, and an older dual P-III-750 with only UltraSCSI running in a RAID-1 mirror set. The 3i is REALLY slow (or at least WAS slow) under Linux.

Interesting... Maybe Joel, after a weekend of rest, can try it on a different setup, even if that setup is just a power user's development machine, to see if he gets the same or worse timings. It would be wonderful if it magically sped up.
Re: [SQL] getting count for a specific querry
On Fri, 2005-04-08 at 15:36, Vivek Khera wrote:

> On Apr 8, 2005, at 4:31 PM, Scott Marlowe wrote:
> > Note that there are several different RAID controllers you can get with a Dell. I had good luck with the PERC 4C (AMI MegaRAID based) at my
>
> I've had bad luck regarding speed with *all* of them, AMI MegaRAID and Adaptec based ones, under high load. Under moderate to low load they're acceptable.

Were you using battery-backed cache with write-back enabled on the 4C? The 3C is also MegaRAID based, but it is a pretty old design and not very fast.

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [SQL] getting count for a specific querry
On Fri, 2005-04-08 at 15:41, Vivek Khera wrote:

> "Dell RAID" is not one thing. They sell "altered" RAID cards from Adaptec and LSI. Whatever alteration they do to them tends to make them run not so fast.
>
> I wonder why the name-brand LSI cards work so much faster... perhaps it is the motherboard? I don't know, and I don't care... :-) For me, high performance DB and Dell servers are mutually exclusive.

It would be nice to be able to put a stock AMI MegaRAID in one and see.

Do you run your 2650s with hyperthreading on? I found that it slowed mine down under load, but we never had more than a couple dozen users hitting the db at once, so we may well have had a different load profile than what you're seeing.

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
Re: [SQL] getting count for a specific querry
I turned off hyperthreading (I saw on the list that it did not help on Linux). I am using a pretty lightweight Windows box: an Optiplex with IDE, 750 meg internal, and a 2.4 GHz CPU. My desktop has 2 gig, so it might not be a bad idea to try it locally (I have it installed), but methinks it's not totally a hardware issue for us.

Joel Fradkin

> Do you run your 2650s with hyperthreading on? I found that it slowed mine down under load, but we never had more than a couple dozen users hitting the db at once, so we may well have had a different load profile than what you're seeing.

---(end of broadcast)---
TIP 6: Have you searched our list archives?
http://archives.postgresql.org
Re: [SQL] getting count for a specific querry
> Since it is a count of matched condition records I may not have a way around.

What you could do is cache the search results (caching just the ids of the rows to display is enough and uses little space) in a cache table, numbering them with your sort order using a temporary sequence, so that you can:

    SELECT ... FROM cache
    WHERE row_position BETWEEN page_no*per_page AND (page_no+1)*per_page-1

and, to get the count:

    SELECT row_position FROM cache ORDER BY row_position DESC LIMIT 1

Add a session_id referencing your sessions table with an ON DELETE CASCADE, and the cache will be auto-purged when sessions expire.

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
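To make that suggestion concrete, here is a minimal sketch against Joel's tblassoc example. The names search_cache, cache_seq, and assoc_id, the sessions(id) key, the session value 42, and the id primary key on tblassoc are all illustrative assumptions, not from the original post:

    -- Hypothetical per-session result cache; names are illustrative.
    CREATE TABLE search_cache (
        session_id   integer NOT NULL REFERENCES sessions(id) ON DELETE CASCADE,
        row_position integer NOT NULL,
        assoc_id     integer NOT NULL,
        PRIMARY KEY (session_id, row_position)
    );

    -- Fill it once per search, numbering the rows in display order.
    CREATE TEMPORARY SEQUENCE cache_seq;
    INSERT INTO search_cache (session_id, row_position, assoc_id)
    SELECT 42, nextval('cache_seq'), s.id
    FROM (SELECT id FROM tblassoc
          WHERE clientnum = 'WAZ' AND isdeleted IS FALSE
          ORDER BY lastname) AS s;

    -- Page 2 at 50 rows per page (positions 51-100):
    SELECT a.*
    FROM search_cache c
    JOIN tblassoc a ON a.id = c.assoc_id
    WHERE c.session_id = 42 AND c.row_position BETWEEN 51 AND 100
    ORDER BY c.row_position;

    -- The total count is simply the highest row number:
    SELECT row_position FROM search_cache
    WHERE session_id = 42
    ORDER BY row_position DESC LIMIT 1;

Populating the cache still costs one scan of the matching rows, but every subsequent page --- and the count --- comes back from the small cache table instead of re-running the search.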
Re: [SQL] getting count for a specific querry
Please run this disk throughput test on your system: http://boutiquenumerique.com/pf/multi_io.py

It just spawns N threads which write a lot of data simultaneously to the disk, then measures the total time; same for the read. Modify the parameters in the source... it's set to generate 10G of files in the current directory and re-read them, all with 8 threads. How much I/O do you get?

Also, hdparm -t /dev/hd? would be interesting.

On Fri, 08 Apr 2005 21:51:02 +0200, Joel Fradkin <[EMAIL PROTECTED]> wrote:

> I will also look at doing it the way you describe; they do have wide liberty. Thanks so much for the ideas. Sorry I did not do a perusal of the archives first (I normally try that, but I think I am brain dead today).
---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [SQL] Numeric and CSV under 8.0.1 ?
Stef: Why is iuserid numeric? Are you going to do any math on the field? If not, change it to varchar. In the long run you'll be happier.

Stef wrote:

> Hello Everyone,
>
> Currently, here at work, I am doing the whole 'advocacy' part of postgreSQL. It's not really hard to do, as the other databases are MySQL and Sybase ;)
>
> There is obviously a whole spate of data munging going on in the background, and I noticed that psql in 8.0.1 now accepts CSV! Great. Except it doesn't really appear to be 100% there. Numerics wrapped in '...' don't appear to want to go in. Is this a 'known problem'?
>
>          Table "public.visitor_main"
>    Column   |     Type      | Modifiers
>  -----------+---------------+-----------
>   iuserid   | numeric(12,0) | not null
>   firstname | character(25) | not null
>   lastname  | character(25) | not null
>
> Sample data line:
> '3236','Alonzo','Peter'
>
> ERROR:  invalid input syntax for type numeric: "'3236'"
> CONTEXT:  COPY visitor_main, line 1, column iuserid: "'3236'"
>
> Thoughts?
> Regards,
> Steph

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
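For what it's worth, the error happens because COPY ... CSV treats the double quote, not the single quote, as its quoting character by default, so '3236' reaches the numeric column with the quotes still attached. If the file format can't be changed and the column stays numeric, 8.0's COPY can be told to use the single quote instead. A sketch, with a hypothetical file path:

    -- /tmp/visitor_main.csv is an assumed path, not from the original post.
    COPY visitor_main FROM '/tmp/visitor_main.csv' WITH CSV QUOTE AS '''';

The '''' is ordinary SQL escaping for a one-character string containing a single quote.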