Re: [SQL] Question on triggers and plpgsql
On Apr 7, 2005, at 5:45 PM, Carlos Moreno wrote:

> The thing seems to work -- I had to go in a shell as user postgres and
> execute the command:
>
> $ createlang -d dbname plpgsql
>
> (I'm not sure I understand why that is necessary, or what implications
> -- positive or negative -- it may have)

As a security measure, no pl language is available by default. What you did is correct. There is not much (any?) risk with pl/pgsql, so you can install it in template1 so it will be available in any new database you create.

> Am I doing the right thing? Have I introduced some sort of catastrophe
> waiting to happen?

I did not notice any problems.

John DeSoi, Ph.D.
http://pgedit.com/
Power Tools for PostgreSQL
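[A minimal sketch of the template1 approach John describes, run in a shell as the postgres superuser; it mirrors the command quoted above with the database name swapped:

    $ createlang -d template1 plpgsql    # every future CREATE DATABASE copies template1
    $ createlang -l template1            # confirm: lists installed languages
]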
[SQL] getting count for a specific querry
Per a thread a while back, the discussion was along the lines of serving data up to the web quickly. Our app currently pulls a bunch of data to several query pages. I have tried and not gotten the queries to return as fast as they do now, which is a huge disappointment as the hardware is twice as powerful as our current production. I can get it pretty close on some things, but others are just plain slower.

My idea is to use LIMIT and OFFSET to return just the first 50 records; if they hit next I can set the offset. My understanding was this gets slower as you move further into the data, but we have several options to modify the search, and I do not believe our clients will page very far into a dataset.

One problem I think I will have, though, is they currently have a count of the records matching their request, and I would like to keep that as a display field.

So given a table of associates, my default query will be something like

    select * from tblassoc where clientnum = 'WAZ' and isdeleted is false

The user could add criteria like

    and firstname like 'Joel'

Currently it returns all records with a count and a display of the records you're viewing, like 1-50 of 470; next page is 51-100, etc.

Is there a fast way to get the count? Will this concept fly?

Also, I am getting heat that my search is now case sensitive. What is the best way to get a case-insensitive search? I could use ~* or perhaps do an upper(firstname) in the select, etc.?

Thanks for any ideas here. I have tried playing with various settings and have not seen my times change much; I will pursue this on the performance mailing list.

Joel Fradkin
Re: [SQL] Question on triggers and plpgsql
On Apr 8, 2005, at 8:28 AM, John DeSoi wrote:

> On Apr 7, 2005, at 5:45 PM, Carlos Moreno wrote:
>
>> The thing seems to work -- I had to go in a shell as user postgres and
>> execute the command:
>>
>> $ createlang -d dbname plpgsql
>>
>> (I'm not sure I understand why that is necessary, or what implications
>> -- positive or negative -- it may have)
>
> As a security measure, no pl language is available by default. What you
> did is correct. There is not much (any?) risk with pl/pgsql, so you can
> install it in template1 so it will be available in any new database you
> create.
>
>> Am I doing the right thing? Have I introduced some sort of catastrophe
>> waiting to happen?
>
> I did not notice any problems.

Just one detail, but in the form of a question. In the original posting, I think the trigger was doing the logging for something happening on a table as a BEFORE insert or update--I may be wrong on that detail. I would think of doing such actions AFTER the update/insert. In the world of transaction-safe operations, is there ANY danger in doing the logging as a BEFORE trigger rather than an AFTER trigger?

Thanks,
Sean
Re: [SQL] Question on triggers and plpgsql
On Apr 8, 2005, at 9:41 AM, Sean Davis wrote:

> Just one detail, but in the form of a question. In the original
> posting, I think the trigger was doing the logging for something
> happening on a table as a BEFORE insert or update--I may be wrong on
> that detail. I would think of doing such actions AFTER the
> update/insert. In the world of transaction-safe operations, is there
> ANY danger in doing the logging as a BEFORE trigger rather than an
> AFTER trigger?

Good point. I think both will work in this case, and it would depend on the application if it makes a difference. You definitely want an AFTER trigger if you need to see the final state of the row before making changes. In this case the assignment of the column does not depend on any other factors, so it would not seem to matter. But I agree that from a semantics point of view, an AFTER trigger might be a little better for this.

John DeSoi, Ph.D.
http://pgedit.com/
Power Tools for PostgreSQL
Re: [SQL] Question on triggers and plpgsql
Sean Davis <[EMAIL PROTECTED]> writes:
> Just one detail, but in the form of a question. In the original
> posting, I think the trigger was doing the logging for something
> happening on a table as a before insert or update--I may be wrong on
> that detail. I would think of doing such actions AFTER the
> update/insert. In the world of transaction-safe operations, is there
> ANY danger in doing the logging as a BEFORE trigger rather than an
> AFTER trigger?

No, actually Carlos wanted to do

	new.last_modified = now();

so he *must* use a BEFORE trigger --- AFTER is too late to change the data that will be stored.

Generalizing freely, I've seen three basic uses for triggers:

1. Modify the data that will be stored.
2. Check that data is valid (eg, consistent with another table).
3. Propagate updates in one place to other places.

Clearly #1 must be done in BEFORE triggers. #2 and #3 could be done either way. They are often done in AFTER triggers because that way you *know* that any case-1 triggers have done their work and you are looking at the correct final state of the row. But you could do them in a BEFORE trigger if you were willing to assume that no later-fired trigger would make a change that invalidates your check or propagation. AFTER triggers are relatively expensive (since the triggering event state has to be saved and then recalled) so I could see making that tradeoff if performance is critical.

AFAICS the only way that you could get into a can't-roll-back situation is if the trigger tries to propagate the update outside the database. For instance, the proverbial trigger to send mail: once sent you can't cancel it. But really this is dangerous even in an AFTER trigger --- the transaction could still be rolled back after the AFTER trigger fires.

			regards, tom lane
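[A minimal sketch of the case-1 trigger under discussion; the table, column, and trigger names here are hypothetical:

    CREATE OR REPLACE FUNCTION stamp_last_modified() RETURNS trigger AS $$
    BEGIN
        -- fires BEFORE INSERT OR UPDATE, so the row can still be changed
        NEW.last_modified := now();
        RETURN NEW;  -- the (possibly modified) row that will be stored
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER mytable_stamp_last_modified
        BEFORE INSERT OR UPDATE ON mytable
        FOR EACH ROW EXECUTE PROCEDURE stamp_last_modified();
]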
Re: [SQL] Question on triggers and plpgsql
On Fri, Apr 08, 2005 at 10:36:26AM -0400, Tom Lane wrote:
> AFAICS the only way that you could get into a can't-roll-back situation
> is if the trigger tries to propagate the update outside the database.
> For instance, the proverbial trigger to send mail: once sent you can't
> cancel it. But really this is dangerous even in an AFTER trigger ---
> the transaction could still be rolled back after the AFTER trigger
> fires.

People who know more about this will no doubt correct me, but isn't such a case crying out for LISTEN/NOTIFY instead? That is, your trigger puts the mail content into a table of mails to be sent, and wakes up the mail-sender client with the NOTIFY; the NOTIFY and the commit to the mail-it table only happen in that case if the transaction commits. And since mail is async anyway, the extra few seconds shouldn't make any difference, right?

A

--
Andrew Sullivan | [EMAIL PROTECTED]
The plural of anecdote is not data.
		--Roger Brinner
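[A minimal sketch of the pattern Andrew describes; all table, trigger, and channel names are hypothetical. The NOTIFY is delivered to listeners only if the surrounding transaction commits:

    CREATE TABLE pending_mail (
        id        serial PRIMARY KEY,
        recipient text NOT NULL,
        body      text NOT NULL,
        queued_at timestamptz NOT NULL DEFAULT now()
    );

    CREATE OR REPLACE FUNCTION notify_mail_sender() RETURNS trigger AS $$
    BEGIN
        NOTIFY mail_pending;  -- wakes any client that has run LISTEN mail_pending
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER pending_mail_notify
        AFTER INSERT ON pending_mail
        FOR EACH ROW EXECUTE PROCEDURE notify_mail_sender();

    -- the mail-sender client runs LISTEN mail_pending; and, when woken,
    -- reads rows from pending_mail and sends them
]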
Re: [SQL] getting count for a specific querry
On Fri, Apr 08, 2005 at 09:29:13AM -0400, Joel Fradkin wrote:
> My understanding was this gets slower as you move further into the data, but
> we have several options to modify the search, and I do not believe our
> clients will page very far into a dataset.

It gets slower because when you do an offset of 50, you have to pass through the first 50 before picking up the ones you want. If you offset 100, you scan through the first 100. &c. If you don't want to pay that, you need to use a cursor, but this causes the problem that you have to keep your cursor open across page views, which is a tricky issue on the Web.

> Currently it returns all records with a count and a display of the records
> you're viewing, like 1-50 of 470, next page is 51-100 etc.
>
> Is there a fast way to get the count?

Not really, no. You have to perform a count() to get it, which is possibly expensive. One way to do it, though, is to do

	SELECT count(*) FROM tablename WHERE condition LIMIT n;

or something like that. Assuming the condition is reasonably limited (i.e. it's not going to cost you a fortune to run this), you'll get the right number back if the number is < n or else you'll get n. If you have n, your application can say "viewing 1-50 of at least n records". This is something you see from time to time in this sort of application.

> getting heat that my search is now case sensitive. What is the best way to
> get a case insensitive search? I could use ~* or perhaps do an
> UPPER(firstname) in the select etc?

The upper() (or lower() -- whatever) strategy is what I'd use. In any case, you want to make sure you put functional indexes on all such columns, because otherwise you'll never get an index scan.

A

--
Andrew Sullivan | [EMAIL PROTECTED]
In the future this spectacle of the middle classes shocking the avant-garde will probably become the textbook definition of Postmodernism.
		--Brad Holland
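[A minimal sketch of the functional-index strategy against Joel's example table; the index name is hypothetical, and the predicate must match the indexed expression exactly for an index scan to be possible:

    CREATE INDEX tblassoc_lower_firstname ON tblassoc (lower(firstname));

    SELECT *
    FROM tblassoc
    WHERE clientnum = 'WAZ'
      AND isdeleted IS FALSE
      AND lower(firstname) = lower('Joel');
]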
Re: [SQL] Question on triggers and plpgsql
Andrew Sullivan <[EMAIL PROTECTED]> writes:
> On Fri, Apr 08, 2005 at 10:36:26AM -0400, Tom Lane wrote:
>> AFAICS the only way that you could get into a can't-roll-back situation
>> is if the trigger tries to propagate the update outside the database.
>> For instance, the proverbial trigger to send mail: once sent you can't
>> cancel it. But really this is dangerous even in an AFTER trigger ---
>> the transaction could still be rolled back after the AFTER trigger
>> fires.

> People who know more about this will no doubt correct me, but isn't
> such a case crying out for LISTEN/NOTIFY instead? That is, your
> trigger puts the mail content into a table of mails to be sent, and
> wakes up the mail-sender client with the NOTIFY; the NOTIFY and the
> commit to the mail-it table only happen in that case if the
> transaction commits. And since mail is async anyway, the extra few
> seconds shouldn't make any difference, right?

We do often recommend that, though it occurs to me that this just moves the failure case somewhere else. The hypothetical mail-sending process would presumably want to send mail and then delete the associated record from the table of pending mails ... so what if it fails after sending the mail and before committing the delete?

What this does do for you is replace the risk of phantom emails (mail sent but corresponding action inside the database never committed) with the risk of duplicate emails (mail-sender sends you another one after it restarts). In most cases I think I'd prefer the latter.

			regards, tom lane
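[A sketch of the sender's side of the queue, showing the window Tom describes; the names follow the hypothetical pending_mail table above:

    BEGIN;
    SELECT id, recipient, body FROM pending_mail ORDER BY id LIMIT 1 FOR UPDATE;
    -- ... client sends the mail here, outside the database ...
    DELETE FROM pending_mail WHERE id = 42;  -- the id fetched above
    COMMIT;
    -- a crash between sending and COMMIT rolls back the DELETE, so the
    -- same mail goes out again on restart: duplicates, not phantoms
]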
Re: [SQL] Question on triggers and plpgsql
Tom,

Thanks for setting the record straight. It has been a while since I have written a trigger and I forgot that you can't modify the row in the AFTER trigger. Makes perfect sense. For the record, here is what the docs say:

> Typically, row before triggers are used for checking or modifying the
> data that will be inserted or updated. For example, a before trigger
> might be used to insert the current time into a timestamp column, or
> to check that two elements of the row are consistent. Row after
> triggers are most sensibly used to propagate the updates to other
> tables, or make consistency checks against other tables. The reason
> for this division of labor is that an after trigger can be certain it
> is seeing the final value of the row, while a before trigger cannot;
> there might be other before triggers firing after it. If you have no
> specific reason to make a trigger before or after, the before case is
> more efficient, since the information about the operation doesn't have
> to be saved until end of statement.

It might be worth adding a sentence here that explicitly states modifications can only be made in the BEFORE trigger. I did not see that anywhere else in the document.

On Apr 8, 2005, at 10:36 AM, Tom Lane wrote:

> No, actually Carlos wanted to do
>
>	new.last_modified = now();
>
> so he *must* use a BEFORE trigger --- AFTER is too late to change the
> data that will be stored.
>
> [...]
>
> AFTER triggers are relatively expensive (since the triggering event
> state has to be saved and then recalled) so I could see making that
> tradeoff if performance is critical.

John DeSoi, Ph.D.
http://pgedit.com/
Power Tools for PostgreSQL
Re: [SQL] Question on triggers and plpgsql
On Fri, Apr 08, 2005 at 11:35:47AM -0400, Tom Lane wrote:
> What this does do for you is replace the risk of phantom emails (mail
> sent but corresponding action inside the database never committed)
> with the risk of duplicate emails (mail-sender sends you another one
> after it restarts). In most cases I think I'd prefer the latter.

Me too. Besides, you already have this risk with SMTP, because a message can be queued and accepted on the remote side when the local side goes away, so that the session is completed improperly. Depending on configuration and a bunch of painful start-up possibilities with the server, you might well get a duplicate copy of a mail transmitted later. (In the present age, given the remarkable quality of networks and mail servers everyone has, you almost never have this happen any more. But it's still strictly speaking possible.)

A

--
Andrew Sullivan | [EMAIL PROTECTED]
The fact that technology doesn't work is no bar to success in the marketplace.
		--Philip Greenspun
Re: [SQL] Question on triggers and plpgsql
Tom Lane wrote:
> The hypothetical mail-sending process would presumably want to send
> mail and then delete the associated record from the table of pending
> mails ... so what if it fails after sending the mail and before
> committing the delete?
>
> What this does do for you is replace the risk of phantom emails (mail
> sent but corresponding action inside the database never committed)
> with the risk of duplicate emails (mail-sender sends you another one
> after it restarts). In most cases I think I'd prefer the latter.

You have this possibility anyway. If a mailserver thinks it has failed to forward the message, it will resend. There is always a small window where the receiving mailserver might actually have received the message without the acknowledgement being logged by the sender.

--
Richard Huxton
Archonet Ltd
Re: [SQL] getting count for a specific querry
On Fri, 2005-04-08 at 11:07 -0400, Andrew Sullivan wrote:
> On Fri, Apr 08, 2005 at 09:29:13AM -0400, Joel Fradkin wrote:
>>
>> Is there a fast way to get the count?
>
> Not really, no. You have to perform a count() to get it, which is
> possibly expensive. One way to do it, though, is to do
>
>	SELECT count(*) FROM tablename WHERE condition LIMIT n;
>
> or something like that. Assuming the condition is reasonably limited
> (i.e. it's not going to cost you a fortune to run this), you'll get
> the right number back if the number is < n or else you'll get n.

come again ?

    test=# select count(*) from a;
     count
    -------
         3
    (1 row)

    test=# select count(*) from a limit 2;
     count
    -------
         3
    (1 row)

the LIMIT clause limits the number of rows returned by the select, in this case 1 row.

maybe you mean something like:

    test=# select count(*) from (select * from a limit 2) as foo;
     count
    -------
         2
    (1 row)

gnari
Re: [SQL] getting count for a specific querry
On Fri, 2005-04-08 at 09:29 -0400, Joel Fradkin wrote:
> Our app currently pulls a bunch of data to several query pages.
>
> My idea is to use the limit and offset to return just the first 50
> records, if they hit next I can set the offset.
>
> My understanding was this gets slower as you move further into the
> data, but we have several options to modify the search, and I do not
> believe our clients will page very far into a dataset.

you might reduce the performance loss if your dataset is ordered by a UNIQUE index.

    select * from mytable where somecondition
      ORDER by uniquecol limit 50;

and next:

    select * from mytable where somecondition AND uniquecol>?
      ORDER by uniquecol limit 50 OFFSET 50;

where the ? is placeholder for last value returned by last query.

if your unique index is a multi-column one, the method is slightly more complicated, but the same idea.

gnari
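[For the multi-column case gnari mentions, a sketch of the same idea on a hypothetical unique index over (lastname, id); the OR condition selects rows strictly after the last row of the previous page:

    SELECT *
    FROM mytable
    WHERE somecondition
      AND (lastname > 'Fradkin'
           OR (lastname = 'Fradkin' AND id > 1234))  -- last values seen
    ORDER BY lastname, id
    LIMIT 50;
]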
Re: [SQL] getting count for a specific querry
On Fri, Apr 08, 2005 at 04:17:45PM +0000, Ragnar Hafstað wrote:
> On Fri, 2005-04-08 at 11:07 -0400, Andrew Sullivan wrote:
>>
>> SELECT count(*) FROM tablename WHERE condition LIMIT n;

> the LIMIT clause limits the number of rows returned by the select,
> in this case 1 row.
>
> maybe you mean something like:
>
> test=# select count(*) from (select * from a limit 2) as foo;

Yes, that was stupid of me. That's what I meant, though.

A

--
Andrew Sullivan | [EMAIL PROTECTED]
When my information changes, I alter my conclusions. What do you do sir?
		--attr. John Maynard Keynes
Re: [SQL] getting count for a specific querry
Ragnar Hafstað <[EMAIL PROTECTED]> writes:
> you might reduce the performance loss if your dataset is ordered by
> a UNIQUE index.

> select * from mytable where somecondition
>   ORDER by uniquecol limit 50;

> and next:

> select * from mytable where somecondition AND uniquecol>?
>   ORDER by uniquecol limit 50 OFFSET 50;

> where the ? is placeholder for last value returned by last query.

Uh, you don't want the OFFSET there do you? But otherwise, yeah, this is a popular solution for paging through some rows. Doesn't really help with the problem of counting the total dataset though ...

			regards, tom lane
[SQL] Numeric and CSV under 8.0.1 ?
Hello Everyone,

Currently, here at work, I am doing the whole 'advocacy' part of postgreSQL. It's not really hard to do, as the other databases are MySQL and Sybase ;)

There is obviously a whole spate of data munging going on in the background, and I noticed that psql in 8.0.1 now accepts CSV! Great.

Except, it doesn't really appear to be 100% there. Numerics wrapped in '...' don't want to appear to go in. Is this a 'known problem'?

            Table "public.visitor_main"
      Column   |     Type      | Modifiers
    -----------+---------------+-----------
     iuserid   | numeric(12,0) | not null
     firstname | character(25) | not null
     lastname  | character(25) | not null

Sample data line:

    '3236','Alonzo','Peter'

    ERROR:  invalid input syntax for type numeric: "'3236'"
    CONTEXT:  COPY visitor_main, line 1, column iuserid: "'3236'"

Thoughts?

Regards
Steph
Re: [SQL] getting count for a specific querry
> > select * from mytable where somecondition AND uniquecol>?
> > ORDER by uniquecol limit 50 OFFSET 50;

> > where the ? is placeholder for last value returned by last query.
>
> Uh, you don't want the OFFSET there do you? But otherwise, yeah,
> this is a popular solution for paging through some rows. Doesn't really
> help with the problem of counting the total dataset though ...

In the past I've done an EXPLAIN and parsed the plan to see what PostgreSQL estimated for the number of rows. If less than $threshold, I would do a count(*). If more than $threshold I would display to the user "approx N records". This seemed to be good enough for most cases.

--
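[A server-side sketch of that trick: a plpgsql function (the name is hypothetical; variants of this circulate in the list archives) that runs EXPLAIN on an arbitrary query and pulls the planner's row estimate out of the plan text:

    CREATE OR REPLACE FUNCTION count_estimate(query text) RETURNS integer AS $$
    DECLARE
        rec record;
        n   integer;
    BEGIN
        FOR rec IN EXECUTE 'EXPLAIN ' || query LOOP
            -- the top plan line looks like: ... (cost=0.00..4.50 rows=123 width=42)
            n := substring(rec."QUERY PLAN" FROM ' rows=([[:digit:]]+)');
            EXIT WHEN n IS NOT NULL;
        END LOOP;
        RETURN n;
    END;
    $$ LANGUAGE plpgsql;

    -- usage:
    -- SELECT count_estimate('SELECT * FROM tblassoc WHERE clientnum = ''WAZ''');
]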
Re: [SQL] getting count for a specific querry
On Fri, 2005-04-08 at 12:32 -0400, Tom Lane wrote:
> Ragnar Hafstað <[EMAIL PROTECTED]> writes:
>> you might reduce the performance loss if your dataset is ordered by
>> a UNIQUE index.
>
>> select * from mytable where somecondition
>>   ORDER by uniquecol limit 50;
>
>> and next:
>
>> select * from mytable where somecondition AND uniquecol>?
>>   ORDER by uniquecol limit 50 OFFSET 50;
>
>> where the ? is placeholder for last value returned by last query.
>
> Uh, you don't want the OFFSET there do you?

ooops! of course not. the uniquecol>? is meant to REPLACE the OFFSET.

gnari
Re: [SQL] getting count for a specific querry
Thanks all.

I might have to add a button to do the count on command so they don't get the hit. I would want it to return the count of the condition, not the currently displayed number of rows.

Is there any other database engine that provides better performance? (We just spent 2 months moving to postgres and it is not live yet, but if I am going to get results back slower than my 2 proc box running MSSQL in 2 gig and 2 processors, I can't see any reason to move to it.) The Postgres is on a 4 proc Dell with 8 gigs of memory. I thought I could analyze our queries and our config to optimize.

Joel Fradkin
Re: [SQL] Question on triggers and plpgsql
On Apr 8, 2005, at 10:59 AM, Andrew Sullivan wrote:

> wakes up the mail-sender client with the NOTIFY; the NOTIFY and the
> commit to the mail-it table only happen in that case if the
> transaction commits. And since mail is async anyway, the extra few
> seconds shouldn't make any difference, right?

I have a lot of processing that could benefit from this type of synchronization, except the fact that there's no Pg command to "wait until I get a notify message". You have to constantly poll to see if you got one, which negates a lot of the benefit of async notification to rarely run processes.

Vivek Khera, Ph.D.
+1-301-869-4449 x806
Re: [SQL] getting count for a specific querry
On Fri, 2005-04-08 at 12:08, Joel Fradkin wrote:
> Thanks all.
> I might have to add a button to do the count on command so they don't
> get the hit. I would want it to return the count of the condition, not
> the currently displayed number of rows.
>
> Is there any other database engine that provides better performance?
> (We just spent 2 months moving to postgres and it is not live yet, but
> if I am going to get results back slower than my 2 proc box running
> MSSQL in 2 gig and 2 processors, I can't see any reason to move to it.)
> The Postgres is on a 4 proc Dell with 8 gigs of memory.
> I thought I could analyze our queries and our config to optimize.

Judging postgresql on one single data point (count(*) performance) is quite unfair. Unless your system only operates on static data and is used to mostly do things like counting, in which case, why are you using a database?

PostgreSQL is a great fit for certain loads, and a poor fit for others. Are you going to have lots of people updating the database WHILE the select count(*) queries are running? Are you going to be doing other, more interesting things than simply counting? If so, you really should build a test case that emulates what you're really going to be doing with the system.

I've found that the poor performance of aggregates in PostgreSQL is generally more than made up for by the outstanding behaviour it exhibits when under heavy parallel load.

Note that the basic design of PostgreSQL's MVCC system is such that without using some kind of trigger to maintain pre-calculated aggregate information, it will NEVER be as fast as most other databases at doing aggregates across large chunks of your data.
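[A minimal sketch of the trigger-maintained aggregate Scott mentions, keeping per-client row counts in a side table. All names are hypothetical; the summary table must be seeded once from a real count(*), and the single summary row per client becomes a lock hotspot under heavy concurrent writes -- that is the tradeoff:

    CREATE TABLE assoc_counts (
        clientnum varchar(3) PRIMARY KEY,
        n         bigint NOT NULL
    );

    CREATE OR REPLACE FUNCTION maintain_assoc_count() RETURNS trigger AS $$
    BEGIN
        IF TG_OP = 'INSERT' THEN
            UPDATE assoc_counts SET n = n + 1 WHERE clientnum = NEW.clientnum;
        ELSIF TG_OP = 'DELETE' THEN
            UPDATE assoc_counts SET n = n - 1 WHERE clientnum = OLD.clientnum;
        END IF;
        RETURN NULL;  -- return value is ignored for AFTER triggers
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER tblassoc_maintain_count
        AFTER INSERT OR DELETE ON tblassoc
        FOR EACH ROW EXECUTE PROCEDURE maintain_assoc_count();
]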
Re: [SQL] Numeric and CSV under 8.0.1 ?
On Fri, 8 Apr 2005 12:51:47 -0400, Stef wrote

> Hello Everyone,
> Currently, here at work, I am doing the whole 'advocacy' part of
> postgreSQL. It's not really hard to do, as the other databases are
> MySQL and Sybase ;)
>
> There is obviously a whole spate of data munging going on in the
> background, and I noticed that psql in 8.0.1 now accepts CSV! Great.
>
> Except, it doesn't really appear to be 100% there. Numerics wrapped in
> '...' don't want to appear to go in. Is this a 'known problem'?
>
>             Table "public.visitor_main"
>       Column   |     Type      | Modifiers
>     -----------+---------------+-----------
>      iuserid   | numeric(12,0) | not null
>      firstname | character(25) | not null
>      lastname  | character(25) | not null
>
> Sample data line:
> '3236','Alonzo','Peter'
>
> ERROR:  invalid input syntax for type numeric: "'3236'"
> CONTEXT:  COPY visitor_main, line 1, column iuserid: "'3236'"
>
> Thoughts?
> Regards
> Steph

Steph,

'3236' is a string, not a numeric. As I see it (novice that I am) you have three choices:

1) Write an external program (gawk, sed, etc.) to remove the quotes around that field.
2) Import the data into an intermediate table and then, using an after trigger, move and manipulate the data using CAST.
3) Import the data into your table using a BEFORE trigger and manipulate the data using CAST.

HTH

Kind Regards,
Keith
Re: [SQL] Numeric and CSV under 8.0.1 ?
Hello Keith,

Well, I can understand that 3236 appears to be a string, but I was under the impression that -everything- in a CSV format file could have ' ' around it? Is this not the case? Sorry if I am being completely insane here :)

Steph

On Fri, Apr 08, 2005 at 02:12:11PM -0400, Keith Worthington wrote:
> On Fri, 8 Apr 2005 12:51:47 -0400, Stef wrote
>> [...]
>> Sample data line:
>> '3236','Alonzo','Peter'
>>
>> ERROR:  invalid input syntax for type numeric: "'3236'"
>> CONTEXT:  COPY visitor_main, line 1, column iuserid: "'3236'"
>>
>> Thoughts?
>
> Steph,
>
> '3236' is a string, not a numeric. As I see it (novice that I am) you
> have three choices:
>
> 1) Write an external program (gawk, sed, etc.) to remove the quotes
> around that field.
> 2) Import the data into an intermediate table and then, using an after
> trigger, move and manipulate the data using CAST.
> 3) Import the data into your table using a BEFORE trigger and
> manipulate the data using CAST.
>
> HTH
>
> Kind Regards,
> Keith
Re: [SQL] Numeric and CSV under 8.0.1 ?
On Fri, Apr 08, 2005 at 02:25:13PM -0400, Stef wrote:
>
> well, I can understand that 3236 appears to be a
> string, but I was under the impression that -everything-
> in a CSV format file could have ' ' around it? Is this not
> the case ?

See the documentation for COPY -- the default quote character for CSV is a double quote, but you can change it with QUOTE.

http://www.postgresql.org/docs/8.0/interactive/sql-copy.html

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/
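[Following Michael's pointer, a sketch against Steph's table (the file path is hypothetical); QUOTE is set to a single quote, escaped by doubling it inside the string literal:

    COPY visitor_main FROM '/tmp/visitor_main.csv' WITH CSV QUOTE AS '''';
]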
Re: [SQL] Question on triggers and plpgsql
Vivek Khera <[EMAIL PROTECTED]> writes:
> I have a lot of processing that could benefit from this type of
> synchronization, except the fact that there's no Pg command to "wait
> until I get a notify message".

This is a client library deficiency, not a problem with the backend or the protocol. In libpq it is actually possible to do it, but you have to select() or poll() on the socket for yourself, which is a tad ugly. OTOH, most apps that want to do that also want to wait on other sockets at the same time, so a cleaner-looking API wouldn't necessarily be any more useful.

			regards, tom lane
Re: [SQL] getting count for a specific querry
On Apr 8, 2005 1:10 PM, Scott Marlowe <[EMAIL PROTECTED]> wrote:
> On Fri, 2005-04-08 at 12:08, Joel Fradkin wrote:
>> Thanks all.
>> I might have to add a button to do the count on command so they don't
>> get the hit. [...]
>
> Judging postgresql on one single data point (count(*) performance) is
> quite unfair. [...]
>
> Note that the basic design of PostgreSQL's MVCC system is such that
> without using some kind of trigger to maintain pre-calculated aggregate
> information, it will NEVER be as fast as most other databases at doing
> aggregates across large chunks of your data.

From a simple/high-level perspective, why is this? That is, why can't PostgreSQL do aggregates as well across large chunks of data? I'm assuming it's extremely complicated. Otherwise the folks around here would have churned out a fix in a month or less and made this issue a past story.
Re: [SQL] getting count for a specific querry
Bob Henkel <[EMAIL PROTECTED]> writes:
> From a simple/high-level perspective, why is this? That is, why can't
> PostgreSQL do aggregates as well across large chunks of data? I'm
> assuming it's extremely complicated. Otherwise the folks around here
> would have churned out a fix in a month or less and made this issue a
> past story.

You can find very detailed discussions of this in the archives, but the basic reason is that we have a very general/extensible view of aggregates (which is how come we can support custom aggregates). An aggregate is a function that you feed all the input rows to, one at a time, and then it produces the answer. Nice, general, extensible, and not at all optimizable :-(

Now in general that is the only way to do it, and so Scott's implication that we always suck compared to other databases is really an overstatement. Ask another database to do a standard deviation calculation, for instance, and it'll be just as slow. However there are special cases that other DBs can optimize that we don't even try to. The big ones are:

* COUNT(*) across a whole table --- most non-MVCC databases keep tabs of the physical number of the rows in the table, and so they can answer this very quickly. Postgres doesn't keep such a count, and under MVCC rules it wouldn't necessarily be the right answer if we had it. (BTW, count of rows satisfying a particular condition is a different ballgame entirely; in most cases that can't be optimized at all, AFAIK.) If you are willing to accept approximate answers there are various tricks you can use --- see the archives --- but we don't get to fudge on COUNT(*) itself because it's in the SQL standard.

* MIN or MAX of an indexed column --- most DBs can use an index scan to find such a row relatively quickly, although whether this trick works or not depends a whole lot on whether you have WHERE or GROUP BY and just what those conditions look like. You can fake the min/max answer in Postgres by doing the transformation to an indexable query by hand, for instance instead of MAX(col) do

	SELECT col FROM tab ORDER BY col DESC LIMIT 1;

There are periodic discussions in the hackers list about teaching the planner to do that automatically, and it will probably happen someday; but it's a complicated task and not exceedingly high on the priority list.

			regards, tom lane
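[To make the extensibility point concrete, a minimal custom aggregate (hypothetical; a sum of squares) -- the executor can only hand such an aggregate one row at a time, which is exactly why it cannot be optimized generically:

    -- transition function: fold one input value into the running state
    CREATE FUNCTION sumsq_step(double precision, double precision)
    RETURNS double precision
    AS 'SELECT $1 + $2 * $2;'
    LANGUAGE sql STRICT;

    CREATE AGGREGATE sumsq (
        BASETYPE = double precision,
        SFUNC    = sumsq_step,
        STYPE    = double precision,
        INITCOND = '0'
    );

    -- usage: SELECT sumsq(somecol) FROM tab;
]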
Re: [SQL] getting count for a specific querry
Believe me, I just spent two months converting our app; I do not wish to give up on that work. We do a great deal more than count. Specifically, many of our queries run much slower on postgres. As mentioned, I purchased a 4 proc box with 8 gigs of memory for this upgrade (Dell may have been a poor choice based on comments I have received). Even when I could see a query like

    select * from tblassoc where clientnum = 'WAZ'

using indexed joins on location and jobtitle, it is still taking 22 seconds to run compared to the 9 seconds on MSSQL on a 2 proc 2 gig box. I got one of my queries to run faster using a page cost of .2, but then the assoc query was running 50 seconds, so I adjusted to a cost of 2 (tried 1.2, 2, 3, and 4 and did not see huge changes in the assoc except it did not like .2).

I have placed a call to commandprompt.com and am going to pay for some support to see if they have anything meaningful to add. It could be something with my hardware, my hardware config, my postgres config. I am just not sure. I know I have worked diligently to try to learn all I can, and I used to think I was kinda smart.

I set up the data on 4 10k scsi drives in a powervault and my wal on 2 15k drives. I am using links to those from the install directory. It starts and stops ok this way, but maybe it should be different.

I can tell you I am very happy to have this forum, as I could not have gotten to the point I am without the many useful comments from folks on the list. I greatly appreciate everyone who has helped. But the truth is, if I can't get it to work better than I have, I may have to ditch the effort and bite the 70K bullet. It's compounded by using 3 developers' time for two months to yield an answer that my boss may just fire me for. I figured since my first test showed I could get data faster on the postgres box that I could, with enough study, get all our data to go faster, but I am afraid I have not been very successful.

My failure is not a reflection on postgres; as you mentioned, it is definitely great at some things. I have 90 some views, not to mention as many stored procedures, that have been converted. I wrote an app to move the data and it works great. But if it is too slow I just will not be able to use it for production.

Joel

> Judging postgresql on one single data point (count(*) performance) is
> quite unfair. Unless your system only operates on static data and is
> used to mostly do things like counting, in which case, why are you
> using a database?
>
> [...]
>
> Note that the basic design of PostgreSQL's MVCC system is such that
> without using some kind of trigger to maintain pre-calculated aggregate
> information, it will NEVER be as fast as most other databases at doing
> aggregates across large chunks of your data.
Re: [SQL] getting count for a specific querry
On Fri, Apr 08, 2005 at 03:23:25PM -0400, Joel Fradkin wrote:
> Believe me, I just spent two months converting our app; I do not wish
> to give up on that work. We do a great deal more than count.
> Specifically, many of our queries run much slower on postgres. As
> mentioned, I purchased a 4 proc

I suspect you want the -performance list. And it'd be real handy to get some EXPLAIN ANALYSE results for the offending queries in order to help you (where "handy" is read as "necessary").

A

--
Andrew Sullivan | [EMAIL PROTECTED]
Information security isn't a technological problem. It's an economics problem.
		--Bruce Schneier
Re: [SQL] getting count for a specific querry
As always, thanks Tom. I will definitely look at what I can do. Since it is a count of matched condition records, I may not have a way around it. I don't think my clients would like me to approximate, as it is a count of their records.

What I plan on doing, assuming I can get all my other problems fixed (as mentioned, I am going to try to get paid help to see if I goofed it up somewhere), is make the count a button, so they don't wait every time but can choose to wait if need be. Maybe I can store the last count with a count date for the generic search it defaults to, and just have them do a count on demand if they have a specific query. Our screens have several criteria fields in each application.

Joel Fradkin

-----Original Message-----
From: Tom Lane [mailto:[EMAIL PROTECTED]
Sent: Friday, April 08, 2005 2:28 PM
To: Bob Henkel
Cc: Scott Marlowe; Joel Fradkin; Andrew Sullivan; pgsql-sql@postgresql.org
Subject: Re: [SQL] getting count for a specific querry

Bob Henkel <[EMAIL PROTECTED]> writes:
> From a simple/high-level perspective, why is this? [...]

You can find very detailed discussions of this in the archives, but the basic reason is that we have a very general/extensible view of aggregates (which is how come we can support custom aggregates). An aggregate is a function that you feed all the input rows to, one at a time, and then it produces the answer. Nice, general, extensible, and not at all optimizable :-(

[...]

			regards, tom lane
Re: [SQL] getting count for a specific querry
On Apr 8, 2005 2:23 PM, Joel Fradkin <[EMAIL PROTECTED]> wrote:
> Believe me, I just spent two months converting our app; I do not wish
> to give up on that work. We do a great deal more than count.
> Specifically, many of our queries run much slower on postgres. As
> mentioned, I purchased a 4 proc box with 8 gigs of memory for this
> upgrade (Dell may have been a poor choice based on comments I have
> received). [...]
>
> My failure is not a reflection on postgres; as you mentioned, it is
> definitely great at some things. I have 90 some views, not to mention
> as many stored procedures, that have been converted. I wrote an app to
> move the data and it works great. But if it is too slow I just will
> not be able to use it for production.
>
> Joel

Have you posted your postgresql config files for the folks here to review?
I can't say I can help you with that, because I too can only read the docs and go from there. But for specific situations you need specific configs. I would think you can get more out of postgresql with some time and help from the people around here. Though count(*) looks like it may be slow.
Re: [SQL] getting count for a specific querry
Quoting Scott Marlowe <[EMAIL PROTECTED]>:

> On Fri, 2005-04-08 at 12:08, Joel Fradkin wrote:
>> I might have to add a button to do the count on command so they don't
>> get the hit. I would want it to return the count of the condition,
>> not the currently displayed number of rows.
>
> Judging postgresql on one single data point (count(*) performance) is
> quite unfair. Unless your system only operates on static data and is
> used to mostly do things like counting, in which case, why are you
> using a database?

For the general discussion of slowness of count(*), and given no entry on the subject in http://www.postgresql.org/docs/faqs.FAQ.html ... I guess everyone has to be pointed at:

http://archives.postgresql.org/pgsql-hackers/2005-01/msg00247.php

However, the gist of this person's problem is that an adhoc query, NOT just a 'select count(*) from table', can take remarkably long. Again, the problem is that PG can't just scan an index.

One workaround for this is to use EXPLAIN. THIS APPLIES TO CASES WHERE THE USER HAS WIDE LIBERTY IN QUERIES. It's pointless overhead, otherwise. default_statistics_target is cranked up to 200 on all such tables, and pg_autovacuum is running. (If there were anything to improve, it would be refining the thresholds on this.)

If the "(cost...rows=" string returns a number higher than the QUERY row limit, the user is derailed ("That's not specific enough to answer immediately; do you want an emailed report?").

Otherwise, it runs EXPLAIN ANALYZE, which is still faster than the query itself. If the "(actual...rows=...)" count is higher than the RESULT row limit (PAGE limit), it then runs the query, with the PAGE rows offset and limit --- and happily, practically everything that query needs is now in shared_buffers. The count from the EXPLAIN ANALYZE is displayed in the web page.

--
"Dreams come true, not free." -- S.Sondheim, ITW
Re: [SQL] getting count for a specific querry
I have asked specific questions and paid attention to the various threads on configuration. I will take my config files and post on the performance thread; that is a good suggestion (personally I have more faith in this forum than a paid consultant, but at this point I am willing to try both). Thanks again.

The count thing I can get around using stored results and on-demand counting, but some of my statistical reporting is just a must-have. I enclosed one of my views; I realize to get help I should also include tables and indexes etc., and maybe I will do that. It is just that there are so many of them. This one in particular did not run at all when I first got my data loaded. I ended up adding a few indexes and not sure what else, and got it to run faster on postgres. Now it is running horrid, so I am back to the drawing board; I change one thing and something else breaks. I am just frustrated; maybe Monday I will have better strength to figure it all out.

Joel Fradkin

    CREATE OR REPLACE VIEW viwcasecube as
    /*customer 1*/
    select c.clientnum, c.casenum, c.casereferencenum,
      coalesce((select sum(quantity * amount) as merchandiseamount
                from tblmerchandise m
                where m.caseid = c.caseid and m.clientnum = c.clientnum), 0)
        || coalesce(c.totaladmitted, 0) as casevalue,
      coalesce(c.totaladmitted, 0) as admitted,
      coalesce(c.totalrecovery, 0) as recovered,
      coalesce(c.ageatcase, 0) as ageatcase,
      1 as numberofcase,
      coalesce(ct.value, 'na') as casetype,
      s.value as status,
      c.opendate, c.closedate,
      a.value as action,
      u2.completename as closebyuser,
      cs.value as casesource,
      m.value as method,
      m2.value as method2,
      c.reportingagentfirstinitial, c.reportingagentlastname,
      case when c.monthsemployedatcase is null then 'na'
           else cast(c.monthsemployedatcase as varchar(3)) end as monthsemployedatcase,
      u1.completename as createdby,
      st.value as subjecttype,
      ot.value as offensetype,
      /*cust*/
      custpt.value as patrontype,
      'na' as jobtitle,
      0 as testscore,
      coalesce(cust.firstname, 'na') as firstname,
      coalesce(cust.lastname, 'na') as lastname,
      coalesce(cust.address, 'na') as address,
      coalesce(cust.city, 'na') as city,
      coalesce(cust.state, 'na') as state,
      coalesce(cust.zip, 'na') as zip,
      coalesce(crtt.value, 'na') as restitutiontype, /* type of restitution tracking */
      coalesce(tblsex.value, 'na') as gender,
      coalesce(eth.value, 'na') as ethnicity,
      custmbt.value as militarybranch,
      custmst.value as militarystatus,
      coalesce(secagentnum, 'not recorded') as secagentnum,
      l.locationnum,
      l.name as store,
      coalesce(l.address, 'na') as locationaddress,
      coalesce(l.city, 'na') as locationcity,
      coalesce(l.state, 'na') as locationstate,
      coalesce(l.zip, 'na') as locationzip,
      d.districtnum,
      d.districtname as district,
      r.regionnum,
      r.regionname as region,
      dv.divisionnum,
      dv.divisionname as division,
      case when c.apprehdate is null then c.opendate else c.apprehdate end as apprehdate,
      to_char(coalesce(c.apprehdate, c.opendate), 'YYYY') as year,
      to_char(coalesce(c.apprehdate, c.opendate), 'q') as quarter,
      to_char(coalesce(c.apprehdate, c.opendate), 'MM') as month,
      to_char(coalesce(c.apprehdate, c.opendate), 'D') as weekday,
      to_char(coalesce(c.apprehdate, c.opendate), 'WW') as week,
      to_char(coalesce(c.apprehdate, c.opendate), 'HH24:MI') as time,
      coalesce(c.sourcedocnum, 'none') as sourcedocnum,
      case coalesce(c.sourcemodule, 'n')
           when 'n' then 'none'
           when 'i' then 'incident'
           when 'g' then 'general investigation'
           when 'e' then 'employee investigation'
           else 'none' end as sourcemodule,
      case coalesce(tblcase1.clientnum, 'no') || coalesce(cdacase.clicasenumber, 'no')
           when 'nono' then 'no'
           else 'yes' end as civilcase,
      coalesce(lpr.lpregionnum, 'na') as lpregionnum,
      coalesce(lpr.managername, 'na') as lpmanager
    from tblcase c
      left outer join tblaction a
        on c.actionid = a.id and c.clientnum = a.clientnum and 1 = a.presentationid
      left outer join tblmethod m
        on c.methodid = m.id and c.clientnum = m.clientnum and 1 = m.presentationid
      left outer join tblmethod m2
        on c.methodid2 = m2.id and c.clientnum = m2.clientnum and 1 = m2.presentationid
      left outer join tblcasesource cs
        on c.casesourceid = cs.id and c.clientnum = cs.clientnum and 1 = cs.presentationid
      inner join tbllocation l
        left outer join tbllpregion lpr
          on l.lpregionid = lpr.lpregionid and l.clientnum = lpr.clientnum
             and 1 = lpr.presentationid
        on c.clientnum = l.clientnum and c.locationid = l.locationid
      inner join tbldistrict d
        on c.clientnum = d.clientnum and l.districtid = d.districtid
           and l.regionid = d.regionid and l.divisionid = d.divisionid
      inner join tblregion r on c.clientnu
Re: [SQL] getting count for a specific querry
I will also look at doing it the way you describe; they do have wide liberty. Thanks so much for the ideas. Sorry I did not do a perusal of the archives first (I normally try that, but I think I am brain dead today).

Joel Fradkin
Wazagua, Inc.
2520 Trailmate Dr
Sarasota, Florida 34243
Tel. 941-753-7111 ext 305
[EMAIL PROTECTED]
www.wazagua.com
Powered by Wazagua
Providing you with the latest Web-based technology & advanced tools.
(C) 2004. WAZAGUA, Inc. All rights reserved. WAZAGUA, Inc

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Mischa Sandberg
Sent: Friday, April 08, 2005 2:40 PM
To: Scott Marlowe
Cc: pgsql-sql@postgresql.org
Subject: Re: [SQL] getting count for a specific querry

For the general discussion of slowness of count(*), and given no entry on the subject in http://www.postgresql.org/docs/faqs.FAQ.html ... I guess everyone has to be pointed at:

http://archives.postgresql.org/pgsql-hackers/2005-01/msg00247.php

One workaround for this is to use EXPLAIN. THIS APPLIES TO CASES WHERE THE USER HAS WIDE LIBERTY IN QUERIES. It's pointless overhead, otherwise. [...]

--
"Dreams come true, not free." -- S.Sondheim, ITW
Re: [SQL] getting count for a specific querry
On Apr 8, 2005, at 3:23 PM, Joel Fradkin wrote:

> I set up the data on 4 10k scsi drives in a powervault and my wal on 2
> 15k drives. I am using links to those from the install directory. It
> starts and stops ok this way, but maybe it should be different.

Your problem might just be the choice of using a Dell RAID controller. I have a 1 year old box connected to a 14 disk powervault (PowerEdge 2650) and it is dog slow compared to a dual opteron with 8 disks that is replacing it. It is all I/O for me, and the Dells just are not known for speedy I/O.

Vivek Khera, Ph.D.
+1-301-869-4449 x806
Re: [SQL] getting count for a specific querry
On Fri, 2005-04-08 at 15:23, Vivek Khera wrote:

> On Apr 8, 2005, at 3:23 PM, Joel Fradkin wrote:
> > I set up the data on 4 10k scsi drives in a powervault and my wal on 2 15k drives. I am using links to those from the install directory. It starts and stops ok this way, but maybe it should be different.
>
> Your problem might just be the choice of using a Dell RAID controller. I have a 1 year old box connected to a 14 disk powervault (PowerEdge 2650) and it is dog slow compared to a dual opteron with 8 disks that is replacing it. It is all I/O for me, and the Dells just are not known for speedy I/O.

Note that there are several different RAID controllers you can get with a Dell. I had good luck with the PERC 4C (AMI MegaRAID based) at my last job. In a dual 2400 MHz machine with 2 gigs of RAM, it handily outran a 4-way (about 1200 MHz CPUs) Windows/MSSQL box with 4 gigs of RAM at most tasks, especially inserts/updates. The Windows machine had the more generic PERC 3i type controller in it.

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [SQL] getting count for a specific querry
On Apr 8, 2005, at 4:31 PM, Scott Marlowe wrote:

> Note that there are several different RAID controllers you can get with a Dell. I had good luck with the PERC 4C (AMI MegaRAID based) at my

I've had bad luck regarding speed with *all* of them, AMI MegaRAID and Adaptec based ones, under high load. Under moderate to low load they're acceptable.

Vivek Khera, Ph.D.
+1-301-869-4449 x806
Re: [SQL] getting count for a specific querry
On Apr 8, 2005, at 4:35 PM, Bob Henkel wrote:

> desktop SATA drive with no RAID? I'm not by any means as knowledgeable about I/O setup as many of you are, but my 2 cents wonders if the Dell RAID is really that much slower than a competitively priced/specced alternative? Would Joel's problems just fade away if he wasn't using a Dell RAID?

"Dell RAID" is not one thing. They sell "altered" RAID cards from Adaptec and LSI. Whatever alteration they do to them tends to make them run not so fast.

I have a Dell SATA RAID (Adaptec based) on the office server and it is OK, though not something I'd buy again. I have various PERC 3 and PERC 4 RAID controllers on my servers (SCSI based) and they suck under heavy I/O load.

I wonder why the name-brand LSI cards work so much faster... perhaps it is the motherboard? I don't know, and I don't care... :-) For me, high performance DB and Dell servers are mutually exclusive.

Vivek Khera, Ph.D.
+1-301-869-4449 x806
Re: [SQL] getting count for a specific querry
On Fri, 2005-04-08 at 15:35, Bob Henkel wrote:

> On Apr 8, 2005 3:23 PM, Vivek Khera <[EMAIL PROTECTED]> wrote:
> > On Apr 8, 2005, at 3:23 PM, Joel Fradkin wrote:
> > > I set up the data on 4 10k scsi drives in a powervault and my wal on 2 15k drives. I am using links to those from the install directory. It starts and stops ok this way, but maybe it should be different.
> >
> > Your problem might just be the choice of using a Dell RAID controller. I have a 1 year old box connected to a 14 disk powervault (PowerEdge 2650) and it is dog slow compared to a dual opteron with 8 disks that is replacing it. It is all I/O for me, and the Dells just are not known for speedy I/O.
> >
> > Vivek Khera, Ph.D.
> > +1-301-869-4449 x806
>
> But that is relative, I would think. Is the Dell RAID much faster than my desktop SATA drive with no RAID? I'm not by any means as knowledgeable about I/O setup as many of you are, but my 2 cents wonders if the Dell RAID is really that much slower than a competitively priced/specced alternative? Would Joel's problems just fade away if he wasn't using a Dell RAID?

My experience with the 3i controllers (see my earlier post) was that my old dual Pentium Pro 200 machine with 512 meg of RAM, a generic Ultra Wide SCSI card, and a half dozen drives running software RAID 5 was faster. Seriously. So was my P-II-350 desktop with the same controller, and an older dual P-III-750 with only UltraSCSI running in a RAID-1 mirror set. The 3i is REALLY slow (or at least WAS slow) under Linux.

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
Re: [SQL] getting count for a specific querry
On Apr 8, 2005 3:23 PM, Vivek Khera <[EMAIL PROTECTED]> wrote:

> On Apr 8, 2005, at 3:23 PM, Joel Fradkin wrote:
> > I set up the data on 4 10k scsi drives in a powervault and my wal on 2 15k drives. I am using links to those from the install directory. It starts and stops ok this way, but maybe it should be different.
>
> Your problem might just be the choice of using a Dell RAID controller. I have a 1 year old box connected to a 14 disk powervault (PowerEdge 2650) and it is dog slow compared to a dual opteron with 8 disks that is replacing it. It is all I/O for me, and the Dells just are not known for speedy I/O.
>
> Vivek Khera, Ph.D.
> +1-301-869-4449 x806

But that is relative, I would think. Is the Dell RAID much faster than my desktop SATA drive with no RAID? I'm not by any means as knowledgeable about I/O setup as many of you are, but my 2 cents wonders if the Dell RAID is really that much slower than a competitively priced/specced alternative? Would Joel's problems just fade away if he wasn't using a Dell RAID?
Re: [SQL] getting count for a specific querry
On Apr 8, 2005 3:42 PM, Scott Marlowe <[EMAIL PROTECTED]> wrote:

> On Fri, 2005-04-08 at 15:35, Bob Henkel wrote:
> > But that is relative, I would think. Is the Dell RAID much faster than my desktop SATA drive with no RAID? Would Joel's problems just fade away if he wasn't using a Dell RAID?
>
> My experience with the 3i controllers (see my earlier post) was that my old dual Pentium Pro 200 machine with 512 meg of RAM, a generic Ultra Wide SCSI card, and a half dozen drives running software RAID 5 was faster. Seriously. So was my P-II-350 desktop with the same controller, and an older dual P-III-750 with only UltraSCSI running in a RAID-1 mirror set. The 3i is REALLY slow (or at least WAS slow) under Linux.

Interesting... Maybe Joel, after a weekend of rest, can try it on a different setup, even if that setup is just a power user's development machine, to see if he gets the same or worse timings. It would be wonderful if it magically sped up.
Re: [SQL] getting count for a specific querry
On Fri, 2005-04-08 at 15:36, Vivek Khera wrote:

> On Apr 8, 2005, at 4:31 PM, Scott Marlowe wrote:
> > Note that there are several different RAID controllers you can get with a Dell. I had good luck with the PERC 4C (AMI MegaRAID based) at my
>
> I've had bad luck regarding speed with *all* of them, AMI MegaRAID and Adaptec based ones, under high load. Under moderate to low load they're acceptable.

Were you using battery-backed cache with write-back enabled on the 4C? The 3C is also MegaRAID based, but it is a pretty old design and not very fast.

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [SQL] getting count for a specific querry
On Fri, 2005-04-08 at 15:41, Vivek Khera wrote:

> "Dell RAID" is not one thing. They sell "altered" RAID cards from Adaptec and LSI. Whatever alteration they do to them tends to make them run not so fast.
>
> I wonder why the name-brand LSI cards work so much faster... perhaps it is the motherboard? I don't know, and I don't care... :-) For me, high performance DB and Dell servers are mutually exclusive.

It would be nice to be able to put a stock AMI MegaRAID in one and see.

Do you run your 2650s with hyperthreading on? I found that it slowed mine down under load, but we never had more than a couple dozen users hitting the db at once, so we may well have had a different load profile than what you're seeing.

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
Re: [SQL] getting count for a specific querry
I turned off hyperthreading (I saw on the list that it did not help on Linux). I am using a pretty lightweight Windows box: an Optiplex with IDE, 750 meg internal, and a 2.4 GHz CPU. My desktop has 2 gig, so it might not be a bad idea to try it locally (I have it installed), but methinks it's not totally a hardware issue for us.

Joel Fradkin

> Do you run your 2650s with hyperthreading on? I found that it slowed mine down under load, but we never had more than a couple dozen users hitting the db at once, so we may well have had a different load profile than what you're seeing.

---(end of broadcast)---
TIP 6: Have you searched our list archives?
http://archives.postgresql.org
Re: [SQL] getting count for a specific querry
> Since it is a count of matched condition records I may not have a way around.

What you could do is cache the search results (caching just the ids of the rows to display is enough and uses little space) in a cache table, numbering them with your sort order using a temporary sequence, so that you can:

    SELECT ... FROM cache
    WHERE row_position BETWEEN page_no*per_page AND (page_no+1)*per_page-1

and, to get the count:

    SELECT row_position FROM cache ORDER BY row_position DESC LIMIT 1

Add a session_id referencing your sessions table with an ON DELETE CASCADE, and the cache will be auto-purged when sessions expire.

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
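To make that suggestion concrete, here is a minimal sketch against Joel's tblassoc example. The names search_cache, cache_seq, and assoc_id, the sessions(id) key, the session value 42, and the id primary key on tblassoc are all illustrative assumptions, not from the original post:

    -- Hypothetical per-session result cache; names are illustrative.
    CREATE TABLE search_cache (
        session_id   integer NOT NULL REFERENCES sessions(id) ON DELETE CASCADE,
        row_position integer NOT NULL,
        assoc_id     integer NOT NULL,
        PRIMARY KEY (session_id, row_position)
    );

    -- Fill it once per search, numbering the rows in display order.
    CREATE TEMPORARY SEQUENCE cache_seq;
    INSERT INTO search_cache (session_id, row_position, assoc_id)
    SELECT 42, nextval('cache_seq'), s.id
    FROM (SELECT id FROM tblassoc
          WHERE clientnum = 'WAZ' AND isdeleted IS FALSE
          ORDER BY lastname) AS s;

    -- Page 2 at 50 rows per page (positions 51-100):
    SELECT a.*
    FROM search_cache c
    JOIN tblassoc a ON a.id = c.assoc_id
    WHERE c.session_id = 42 AND c.row_position BETWEEN 51 AND 100
    ORDER BY c.row_position;

    -- The total count is simply the highest row number:
    SELECT row_position FROM search_cache
    WHERE session_id = 42
    ORDER BY row_position DESC LIMIT 1;

Populating the cache still costs one scan of the matching rows, but every subsequent page --- and the count --- comes back from the small cache table instead of re-running the search.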
Re: [SQL] getting count for a specific querry
Please run this disk throughput test on your system: http://boutiquenumerique.com/pf/multi_io.py

It just spawns N threads which write a lot of data simultaneously to the disk, then measures the total time; same for the read. Modify the parameters in the source... it's set to generate 10G of files in the current directory and re-read them, all with 8 threads. How much I/O do you get?

Also, hdparm -t /dev/hd? would be interesting.

On Fri, 08 Apr 2005 21:51:02 +0200, Joel Fradkin <[EMAIL PROTECTED]> wrote:

> I will also look at doing it the way you describe; they do have wide liberty. Thanks so much for the ideas. Sorry I did not do a perusal of the archives first (I normally try that, but I think I am brain dead today).
---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [SQL] Numeric and CSV under 8.0.1 ?
Stef: Why is iuserid numeric? Are you going to do any math on the field? If not, change it to varchar. In the long run you'll be happier.

Stef wrote:

> Hello Everyone,
>
> Currently, here at work, I am doing the whole 'advocacy' part of postgreSQL. It's not really hard to do, as the other databases are MySQL and Sybase ;)
>
> There is obviously a whole spate of data munging going on in the background, and I noticed that psql in 8.0.1 now accepts CSV! Great. Except it doesn't really appear to be 100% there. Numerics wrapped in '...' don't appear to want to go in. Is this a 'known problem'?
>
>          Table "public.visitor_main"
>    Column   |     Type      | Modifiers
>  -----------+---------------+-----------
>   iuserid   | numeric(12,0) | not null
>   firstname | character(25) | not null
>   lastname  | character(25) | not null
>
> Sample data line:
> '3236','Alonzo','Peter'
>
> ERROR:  invalid input syntax for type numeric: "'3236'"
> CONTEXT:  COPY visitor_main, line 1, column iuserid: "'3236'"
>
> Thoughts?
> Regards,
> Steph

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
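For what it's worth, the error happens because COPY ... CSV treats the double quote, not the single quote, as its quoting character by default, so '3236' reaches the numeric column with the quotes still attached. If the file format can't be changed and the column stays numeric, 8.0's COPY can be told to use the single quote instead. A sketch, with a hypothetical file path:

    -- /tmp/visitor_main.csv is an assumed path, not from the original post.
    COPY visitor_main FROM '/tmp/visitor_main.csv' WITH CSV QUOTE AS '''';

The '''' is ordinary SQL escaping for a one-character string containing a single quote.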