Re: [GENERAL] help w/ SRF function
am Mon, dem 17.09.2007, um 9:21:22 +0800 mailte Ow Mun Heng folgendes:

> CREATE OR REPLACE FUNCTION foo_func(fromdate timestamp, todate timestamp, code text)
>   RETURNS SETOF foo AS
> $BODY$
>   SELECT
>     TRH.ID,
>     TRH.data1,
>     TRH.data2
>   FROM D
>   INNER JOIN TS ON TS.id = D.id
>   INNER JOIN TRH ON TS.id = TRH.id
>   WHERE D.start_timestamp BETWEEN fromdate AND todate
>     AND D.code IN (code)
> $BODY$
> LANGUAGE 'sql' IMMUTABLE STRICT;
>
> How can I go about this? The above will fail due to missing columns
> fromdate/todate/code.

Use $1, $2 and $3 within the function body instead of fromdate, todate and code. Example:

test=# select * from n;
 feld1  | feld2
--------+-------
 Frank  |    23
 Frank  |    31
 Stefan |    32
 Stefan |    22
 Jochen |    29
(5 rows)

test=*# create or replace function nn(int) returns setof n as $$ select * from n where feld2 = $1; $$ language sql;
CREATE FUNCTION
test=*# select * from nn(22);
 feld1  | feld2
--------+-------
 Stefan |    22
(1 row)

Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
       http://www.postgresql.org/docs/faq
Re: [GENERAL] Problem dropping table
On 9/18/07, Ken Logan <[EMAIL PROTECTED]> wrote:
> On Tue, 2007-09-18 at 11:10 -0700, Alan Hodgson wrote:
> > On Tuesday 18 September 2007 10:30, Ken Logan <[EMAIL PROTECTED]> wrote:
> > > When we try to drop the table we get the error:
> > > ERROR: "member_pkey" is an index
> >
> > You have to remove the table from its Slony set before you can drop it.
> > Slony does some hackish things to subscriber tables that make them
> > unusable for normal DDL operations.
> >
> > Your master probably isn't too thrilled, either, since it probably
> > thinks the table is still in a set.
>
> We actually used slonik_drop_table so it's been properly removed from
> slony.
>
> > In any case, you'd probably have better luck with this on the Slony
> > list. I'm not at all sure your problem can be fixed without discarding
> > the slave, but someone there might know.
>
> I was afraid of that... since there doesn't seem to be any remnants left
> behind by slony affecting this table, I'm not sure I can ask a sensible
> question about it on the slony list, so I guess I'll need to either dig
> into the source for what the postgresql error message means or reinit
> the slave (again).

It is a known issue with slony (the foreign key causes it); check the slony mailing list archives for more info. AFAIK you have to use slonik and EXECUTE SCRIPT ... EXECUTE ONLY ON NODE x to remove the table.

Regards
MP
Re: [GENERAL] Q:Aggregrating Weekly Production Data. How do you do it?
On Tue, 2007-09-18 at 09:55 -0500, Scott Marlowe wrote:
> On 9/17/07, Ow Mun Heng <[EMAIL PROTECTED]> wrote:
> > Just wondering how everyone is doing aggregation of production data.
>
> Where I work, we aggregate by the minute in the application, then dump
> those stats entries into the database en masse from each machine.

[snip]

> The fail1/2/3 are types of failures, the totalresptime time is all the
> successful requests added together, so that totalresptime/success =
> average for that minute, and max is the longest running request ending
> in that minute.

So, it's basically a summation rather than using avg() all the way. That would mean using something like bigint to deal with the large numbers.

> > (getting min/max/count isn't much of an issue. Stdev is the main issue
> > I believe)
>
> Isn't stddev() working for you?

Stddev is based on a population of data. If I do slices, and I want the stddev across 2 months (8 weeks, e.g.), then it would be wrong.

> What I do is aggregate the minute time slices by grouping by
> date_trunc('xxx',timestamp) and then use that as a subselect to a
> query that does the stddev() outside of that. Works pretty well for
> us, and makes it easy to identify trends.

Trend analysis is very critical, and the only reference I've found on how to get "stdev" is based on what I posted.

> One of the things we started doing is to aggregate the last six weeks
> data by the day / hour and then comparing the last 24 hours worth of
> data to those six weeks worth to see trends for each hour of each day.
> The queries are huge and ugly, but they work, and run in about 4
> minutes on a fairly good sized chunk of data. We have about 150k to
> 300k entries a day put into this db.

I'm not sure how many K entries in a day (yet to count it), but I'm getting into trouble with one-to-many relationships and PG is choosing to do nested loops etc. (lots of left joins on the same tables). So, I've had to resort to using SRFs and function scans. A >2 hour complex query (on my laptop in PG; it runs in 20 min on high-end MSSQL) is reduced to ~60 secs.

Thanks for the response. BTW, are user queries an issue? Do they interfere with the loading? (That's one of my concerns over here.) I noticed that you do the calculation in the app before sending it to the DB, which is a good thing, and every minute too. (Is it advisable to do it in chunks of 1 min? Less data per minute vs per 30 min/1 hour.)

---(end of broadcast)---
TIP 6: explain analyze is your friend
[GENERAL] Uninstall mess Postgresql 7.3.4, 8.1.4 on Redhat 9
I had two versions of postgresql running on a Redhat 9 server. Today I shut down the 7.3.4 version and uninstalled it using the RPMs. Unfortunately, as far as I can tell, this deleted the postgres user account and the system's knowledge of the postgresql 8.1.4 software. The system will no longer run 8.1.4 even when I go to its /bin/ directory to run the commands. The data is still present and so is the 8.1.4 software. I remade the postgres account on the server and gave it permissions on the 8.1.4 directories, but still no dice. I have multiple backups of the database that are okay. Would my best bet be to download and install 8.2.5 and import the backup? Or do I need to get 8.1.4 running, and how do I do that? Please reply to me as well as the list as I am on the digest version. TIA

Margaret Gillon, IS Dept., Chromalloy Los Angeles, ext. 297

This e-mail message and any attachment(s) are for the sole use of the intended recipient(s) and may contain proprietary and/or confidential information which may be privileged or otherwise protected from disclosure. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient(s), please contact the sender by reply email and destroy the original message and any copies of the message as well as any attachment(s) to the original message.
[GENERAL] Version 8.2.5 for Windows doesn't startup normally after upgrading from 8.2.4
Hello,

I just upgraded from 8.2.4 to 8.2.5 on Windows but the service doesn't start up normally. This is the first time I have had trouble with an upgrade of version 8.2. I have the following non-standard configuration:

- Program directory: C:\Archivos de programa\PostgreSQL\8.2\bin
- Data directory: D:\PostgreSQL\data
- SSL on
- Working database is IOP_IPR (there is no POSTGRES database)

When using "pg_ctl start -D D:\PostgreSQL\data\" the server comes up normally; 2 messages appear:

LOG: could not load root certificate file "root.crt": No such file or directory
DETAIL: Will not verify client certificates.

The log file contains only:

2007-09-18 14:36:25 LOG: database system was shut down at 2007-09-18 14:33:27
2007-09-18 14:36:25 LOG: checkpoint record is at 29/D1D4100
2007-09-18 14:36:25 LOG: redo record is at 29/D1D4100; undo record is at 0/0; shutdown TRUE
2007-09-18 14:36:25 LOG: next transaction ID: 0/326345; next OID: 42275
2007-09-18 14:36:25 LOG: next MultiXactId: 1; next MultiXactOffset: 0
2007-09-18 14:36:25 LOG: database system is ready

When started as a service with the following value of pgsql-8.2\ImagePath in the registry:

"C:\Archivos de programa\PostgreSQL\8.2\bin\pg_ctl.exe" runservice -w -N "pgsql-8.2" -D "D:\PostgreSQL\data\"

the log file contains:

2007-09-18 14:28:34 LOG: database system was shut down at 2007-09-18 14:26:02
2007-09-18 14:28:34 LOG: checkpoint record is at 29/D1D4060
2007-09-18 14:28:34 LOG: redo record is at 29/D1D4060; undo record is at 0/0; shutdown TRUE
2007-09-18 14:28:34 LOG: next transaction ID: 0/326343; next OID: 42275
2007-09-18 14:28:34 LOG: next MultiXactId: 1; next MultiXactOffset: 0
2007-09-18 14:28:34 LOG: could not load root certificate file "root.crt": No such file or directory
2007-09-18 14:28:34 DETAIL: Will not verify client certificates.
2007-09-18 14:28:34 LOG: database system is ready
2007-09-18 14:28:35 127.0.0.1 postgres postgres FATAL: the database system is starting up

And then an endless list of:

2007-09-18 14:28:35 LOG: could not load root certificate file "root.crt": No such file or directory
2007-09-18 14:28:35 DETAIL: Will not verify client certificates.
2007-09-18 14:28:35 127.0.0.1 postgres postgres FATAL: database "postgres" does not exist
2007-09-18 14:28:36 LOG: could not load root certificate file "root.crt": No such file or directory
2007-09-18 14:28:36 DETAIL: Will not verify client certificates.
2007-09-18 14:28:36 127.0.0.1 postgres postgres FATAL: database "postgres" does not exist

And I have to abort the startup.

Maybe this is a hint: when I had blocked access to localhost with SSL=on (using hostnossl in pg_hba.conf) there was a constant complaint (2 times per second):

127.0.0.1 postgres postgres FATAL: no pg_hba.conf entry for host "127.0.0.1", user "postgres", database "postgres", SSL on

Is there something wrong with the configuration of the service?

Thanks in advance.

--
Walter Roeland

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
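The repeated `FATAL: database "postgres" does not exist` lines come from pg_ctl's `-w` switch, which in 8.2 verifies startup by trying to connect as user postgres to the "postgres" database. One possible workaround, an assumption worth testing rather than a confirmed fix, is simply to recreate that database; it can stay empty:

```sql
-- Connect to an existing database (e.g. IOP_IPR) as a superuser first.
-- Recreating the default maintenance database gives pg_ctl -w something
-- to connect to while it waits for startup.
CREATE DATABASE postgres;
```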
Re: [GENERAL] Tsearch2 - spanish
Hi,

You are right, the output of "show lc_ctype;" was C. So I ran

% initdb -D /YOUR/PATH -E LATIN1 --locale es_ES.ISO8859-1

(as you suggested) and "createdb -E iso8859-1 prueba1", and then installed tsearch2. Now:

prueba1=# show lc_ctype;
    lc_ctype
-----------------
 es_MX.ISO8859-1
(1 row)

and the original problem is resolved:

prueba1=# select to_tsvector('espanol','melón');
 to_tsvector
-------------
 'melón':1
(1 row)

but if I change the sentence to this:

prueba1=# select to_tsvector('espanol','melón perro mordelón');
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>

??? Lost the connection... but the server is up. Any idea? The synonym dictionary order is intentional.

Thanks in advance.

El mar, 18-09-2007 a las 21:40 +0400, Teodor Sigaev escribió:
> > LC_CTYPE="POSIX"
>
> pls, output of "show lc_ctype;" command. If it's C locale then I can
> identify the problem - characters with a diacritical mark (such as ó)
> are not alpha characters, and the ispell dictionary will fail. To fix
> that you should run initdb with options:
> % initdb -D /YOUR/PATH -E LATIN1 --locale es_ES.ISO8859-1
> or
> % initdb -D /YOUR/PATH -E UTF8 --locale es_ES.UTF8
>
> In the last case you should also recode all the dictionary's data files
> to utf8 encoding.
>
> > > > prueba=# select to_tsvector('espanol','melón');
> > > > ERROR: Affix parse error at 506 line
> > > and
> > > > prueba=# select lexize('sp','melón');
> > > > lexize
> > > > --------
> > > > {melon}
> > > > (1 row)
>
> sp is a Snowball stemmer, it doesn't require an affix file, so it works.
>
> By the way, why is the synonym dictionary placed after ispell? Is it
> intentional? Usually, the synonym dictionary goes first, then ispell,
> and after all of them snowball.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to [EMAIL PROTECTED] so that your
       message can get through to the mailing list cleanly
Re: [GENERAL] Problem dropping table
On Tue, 2007-09-18 at 11:10 -0700, Alan Hodgson wrote:
> On Tuesday 18 September 2007 10:30, Ken Logan <[EMAIL PROTECTED]> wrote:
> > When we try to drop the table we get the error:
> > ERROR: "member_pkey" is an index
>
> You have to remove the table from its Slony set before you can drop it.
> Slony does some hackish things to subscriber tables that make them
> unusable for normal DDL operations.
>
> Your master probably isn't too thrilled, either, since it probably
> thinks the table is still in a set.

We actually used slonik_drop_table so it's been properly removed from slony.

> In any case, you'd probably have better luck with this on the Slony
> list. I'm not at all sure your problem can be fixed without discarding
> the slave, but someone there might know.

I was afraid of that... since there doesn't seem to be any remnants left behind by slony affecting this table, I'm not sure I can ask a sensible question about it on the slony list, so I guess I'll need to either dig into the source for what the postgresql error message means or reinit the slave (again).
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
Richard Broersma Jr wrote:
> --- Gregory Williamson <[EMAIL PROTECTED]> wrote:
> > A very low fill factor means that pages are "sparse" and so inserts
> > and updates are less likely to trigger massive b-tree rebalancings.
>
> I take it that "massive b-tree rebalancings" could cause a problem with
> disk-write throughput from UPDATEs and INSERTs?

Precisely -- even if it can keep everything in RAM, it can occupy quite a few cycles to rebalance a large b-tree. And eventually those changes do need to get written to disk, so the next checkpoint (I think) will also have more work.

G

---(end of broadcast)---
TIP 4: Have you searched our list archives?
       http://archives.postgresql.org/
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
--- Gregory Williamson <[EMAIL PROTECTED]> wrote:
> A very low fill factor means that pages are "sparse" and so inserts and
> updates are less likely to trigger massive b-tree rebalancings.

I take it that "massive b-tree rebalancings" could cause a problem with disk-write throughput from UPDATEs and INSERTs?

Regards,
Richard Broersma Jr.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
       http://www.postgresql.org/docs/faq
[GENERAL] Windows Auto-Vacuum in 8.2.4 or 8.2.5
A while back it was pointed out that the Windows version of 8.2.3 had a bug that prevented auto-vacuum from working correctly. http://archives.postgresql.org/pgsql-general/2007-04/msg00139.php I wasn't able to determine from the release notes whether this bug was fixed in versions 8.2.4 or 8.2.5. Does anyone know if it was?

Regards,
Richard Broersma Jr.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
Sorry for top-posting -- challenged reader. Can't speak directly to PostgreSQL but in Informix the fill factor is useful for tweaking indexes. A very high fill factor is useful for tables that are static -- any inserts or changes to the index trigger a *lot* of moving of b-tree branches. But the high fill factor means that each page has more useful data references in it. A very low fill factor means that pages are "sparse" and so inserts and updates are less likely to trigger massive b-tree rebalancings. I've never used it on PostgreSQL (yet!) but am looking forward to it. Beware of premature optimization! HTH, Greg Williamson Senior DBA GlobeXplorer LLC, a DigitalGlobe company Confidentiality Notice: This e-mail message, including any attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information and must be protected in accordance with those provisions. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message. (My corporate masters made me say this.) -Original Message- From: [EMAIL PROTECTED] on behalf of Richard Broersma Jr Sent: Tue 9/18/2007 10:29 AM To: Phoenix Kiula; Bill Moran Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER --- Phoenix Kiula <[EMAIL PROTECTED]> wrote: > What constitutes a "small fill factor"? Would 70 be good? I guess my > current must have been the default, which the manual says is 100. On the following link: http://www.postgresql.org/docs/8.2/interactive/sql-createindex.html#SQL-CREATEINDEX-STORAGE-PARAMETERS I found this: "B-trees use a default fillfactor of 90, but any value from 10 to 100 can be selected." Regards, Richard Broersma Jr. 
---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
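For reference, in PostgreSQL 8.2 the fillfactor discussed above is set per index as a storage parameter; a short sketch (table and column names invented for illustration):

```sql
-- Leave 30% free space in each b-tree leaf page to absorb future
-- inserts and updates without page splits:
CREATE INDEX orders_customer_idx ON orders (customer_id)
    WITH (fillfactor = 70);

-- An existing index's fillfactor can be changed; the new value takes
-- effect the next time the index is rebuilt:
ALTER INDEX orders_customer_idx SET (fillfactor = 90);
REINDEX INDEX orders_customer_idx;
```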
Re: [GENERAL] Q:Aggregrating Weekly Production Data. How do you do it?
Ow Mun Heng wrote:
> Data which runs in the vicinity of a few million a week.
>
> What are the methods which will effectively provide the
> min/max/average/count/stdev of the weekly sample size based on different
> products/software mix etc.
>
> and still be able to answer correctly, what's the average of data_1 over
> the past 2 months?

So, are you viewing the problem as one of figuring out how to avoid having to store all this raw data permanently but still have the statistical summary values available without having to recalculate them each time?

> I can't just take the average of the 8 averages (one per week)
>
> eg:
> wk avg data_1
> w1 - 2
> ...
> average of past 2 months = avg(w1-w8), which is statistically wrong.

One approach would be to save the count of values as well as the average. Then your eight-week average is calculated as a weighted average, i.e., each week's average is weighted by the corresponding count of data values:

CREATE TABLE summary_data (
    summary_date  timestamp,
    num_of_values integer,
    avg_of_values numeric
);

Then, for instance, the eight-week average is computed as

SELECT SUM(num_of_values * avg_of_values) / SUM(num_of_values) AS eight_week_avg
FROM summary_data
WHERE summary_date BETWEEN (eight weeks ago) AND now();

(Note: the weighted sum divided by the total count, not an average of averages.)

> (getting min/max/count isn't much of an issue. Stdev is the main issue I
> believe)

You probably need to store each of those weekly summary statistic values too, at least for min and max, i.e.,

CREATE TABLE summary_data (
    summary_date  timestamp,
    num_of_values integer,
    avg_of_values numeric,
    min_of_values numeric,
    max_of_values numeric
);

Then, for instance, overall basic statistics are computed as

SELECT SUM(num_of_values) AS total_count,
       SUM(num_of_values * avg_of_values) / SUM(num_of_values) AS overall_avg,
       MIN(min_of_values),
       MAX(max_of_values)
FROM summary_data
WHERE summary_date BETWEEN (eight weeks ago) AND now();

Extending this design to include the variance is more complicated. While you can compute the average for the past eight weeks using a weighted average of each of the separate eight weeks, I think you actually need the raw data values for the whole eight weeks to figure the standard deviation -- I don't readily see how you could without them.

A different possibility would be to maintain a running average and variance (rather than windowing those statistics for the sliding eight-week period), and then take a weekly snapshot of the running values to use for trending.

> One such instance I've read about is..
>
> 1. Calculate sum of square of each sale
> ...
> 8. Stdev will be the square root of step 7
>
> The results are valid (verified with actual data) but I don't understand
> the logic. All the Statistical books I've read marked stdev as sqrt
> (sum(x - ave(x))^2 / (n - 1). The formula is very different, hence the
> confusion.

I think you copied the std dev formula incorrectly -- it should be sqrt(sum((x - avg(x))^2) / (n - 1)) -- but the eight-step algorithm can be shown to get you to the same point, after a little analytical manipulation. The advantage of one over the other is that the eight-step procedure can be used to implement an on-the-fly calculation of mean and variance, i.e., you can maintain a "running average" and update the statistics as you collect more data, without having to keep the entire detailed data history, as would be required by an attempt to implement the definition directly.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
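The eight-step procedure alluded to above is the standard sum-of-squares identity: var = (sum(x²) - (sum x)²/n) / (n - 1). If each weekly summary row also stores sum(x) and sum(x²), the combined stddev over any multi-week window can be recovered without the raw data. A sketch, with an invented table layout:

```sql
-- Hypothetical weekly rollup: besides the count, keep the raw sum and
-- the sum of squares for each week.
CREATE TABLE weekly_summary (
    week_start date PRIMARY KEY,
    n          bigint,   -- count of raw values that week
    sum_x      numeric,  -- sum of the values
    sum_x2     numeric   -- sum of the squared values
);

-- Combined sample stddev over the last eight weeks, via the identity
-- var = (sum(x^2) - (sum x)^2 / n) / (n - 1):
SELECT sqrt( (SUM(sum_x2) - SUM(sum_x)^2 / SUM(n)) / (SUM(n) - 1) ) AS stdev
FROM weekly_summary
WHERE week_start >= current_date - interval '8 weeks';
```

This is why the eight-step method and the textbook definition agree: they compute the same variance, but the sum-of-squares form composes across weeks.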
Re: [GENERAL] pgcrypto: is an IV needed with pgp_sym_encrypt()?
On Tue, Sep 18, 2007 at 11:26:17AM +0300, Marko Kreen wrote:
> A few bytes being the same is normal. Those are the PGP packet header,
> telling "this is a symmetrically encrypted session key packet, with
> length X" plus some more details.

I see. So you are saying there is no need to generate my own IV to prepend to the plain text before encrypting.

> If more than a few bytes are the same, and if the salt is not
> randomised, it _could_ be a sign of a problem. Either a pgcrypto bug or
> a failure to initialize the random generator. If you suspect a problem,
> please send me a few example encryptions with keys and your setup
> details (postgres version, openssl or not, os version).

No, it was only a few bytes that were similar, so the headers explain that.

> > Besides following the PCI DSS and external audit procedures, the plan
> > is to use pgcrypto (pgp_sym_encrypt() with AES-256) as part of a
> > credit card storage server. The server and db are SSL only and the
> > key is passed from the application and never stored anyplace (except
> > in memcached on other servers during the session). The key is a
> > user's plain text password plus an application-specific secret. So,
> > each row has its own key. Passwords must be changed periodically,
> > etc.
>
> I don't know the details of your setup, but I strongly suggest you look
> into using public-key crypto. That allows you separate keys for
> encryption and decryption. So on a webserver where users only input
> credit cards, you keep only public keys, so anybody cracking that won't
> be able to decrypt the data.

I need to look at that more. But I've seen that suggested where one needs to decrypt the data at a later time. We don't have that need. Our plan was to never store any keys. Every user must log in to the application with a password. Their account passwords are only stored hashed on disk, so we don't know their passwords. The plan is to encrypt their plain-text password with a secret known only by the application and store it in memcached. It's this plain-text password that will be sent to a separate server to encrypt (and decrypt) their credit card data when the user makes a transaction. We only need to store the credit card data to allow subsequent charges to their "card on file" -- and that only happens when a user logs in and processes a transaction. We don't have any way to decrypt the data without this password stored in the session.

If someone hacks an application server they could pluck active users' passwords from memcached and also find the application's secret word. Then if they also hacked the credit card server they could decrypt the data using the passwords they were able to sniff. See any glaring holes?

Thanks for the help!

-- Bill Moseley [EMAIL PROTECTED]

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
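For reference, the call under discussion takes its cipher choice through the options string and generates a random session key internally, which is the reason no manual IV is needed. A sketch (the passphrase, table, and column names here are placeholders, not from the thread):

```sql
-- Encrypt: pgp_sym_encrypt randomizes the session key itself, so two
-- encryptions of the same plaintext differ beyond the packet header.
SELECT pgp_sym_encrypt('4111111111111111',
                       'users-password-plus-app-secret',
                       'cipher-algo=aes256');

-- Decrypt with the same passphrase (stored_cards/card_data are
-- hypothetical names for illustration):
SELECT pgp_sym_decrypt(card_data, 'users-password-plus-app-secret')
FROM stored_cards
WHERE user_id = 42;
```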
Re: [GENERAL] Performance Issues (was: "like" vs "substring" again)
Christian Schröder wrote:
> Or would it be possible to tweak how the planner determines the
> selectivity? I have read in the docs (chapter 54.1) that in case of more
> than one condition in the where clause, independence is assumed. In my
> case ("... where test like '11%' and test not like '113%'") this is
> clearly not the case, so it might be an interesting point to address.

I think the planner does think about the interactions of inequalities, so if you can express your query with less-than and friends, or even with BETWEEN, you might get a better plan. I don't know the details of your setup, but you can do things like this with any ordered type:

  where test between '11' and '113' or test >= '114'

I know this does not match the exact semantics of your query, but hopefully you get the idea.

- John D. Burger
  MITRE

---(end of broadcast)---
TIP 4: Have you searched our list archives?
       http://archives.postgresql.org/
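Spelled out, the LIKE pair from the thread has an exact range equivalent, assuming a collation where strings sort bytewise (e.g. the C locale); the table name below is invented:

```sql
-- Original predicates: test LIKE '11%' AND test NOT LIKE '113%'
-- '11%' covers the range ['11','12'); NOT LIKE '113%' removes ['113','114'),
-- leaving ['11','113') union ['114','12'):
SELECT *
FROM mytable
WHERE (test >= '11' AND test < '113')
   OR (test >= '114' AND test < '12');
```

Because the planner understands how overlapping range conditions interact, this form avoids the independence assumption that makes the LIKE/NOT LIKE pair look far more selective than it is.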
Re: [GENERAL] Problem dropping table
On Tuesday 18 September 2007 10:30, Ken Logan <[EMAIL PROTECTED]> wrote:
> When we try to drop the table we get the error:
> ERROR: "member_pkey" is an index

You have to remove the table from its Slony set before you can drop it. Slony does some hackish things to subscriber tables that make them unusable for normal DDL operations.

Your master probably isn't too thrilled, either, since it probably thinks the table is still in a set.

In any case, you'd probably have better luck with this on the Slony list. I'm not at all sure your problem can be fixed without discarding the slave, but someone there might know.

--
"Corruptissima republica, plurimae leges" (The more corrupt the state, the more laws.) - Tacitus

---(end of broadcast)---
TIP 4: Have you searched our list archives?
       http://archives.postgresql.org/
[GENERAL] Problem dropping table
We are using postgresql 8.2.3 with slony1 1.2.8 and we're having problems trying to drop a table on the slony1 secondary that was successfully dropped on the master database. I'm sure this is just because I'm missing something, but it doesn't seem like there should be any reason the table cannot be dropped.

The problem seems to be due to a foreign key constraint on the table we are trying to drop; I don't believe any other tables reference the table we are trying to drop with foreign key constraints. The schema of the table is as follows:

    Table "public.member_note"
  Column   |  Type   | Modifiers
-----------+---------+-----------
 member_id | bigint  | not null
 notes_id  | bigint  | not null
 notes_idx | integer | not null
Foreign-key constraints:
    "fkb5d78a7737864f78" FOREIGN KEY (member_id) REFERENCES member(id)
    "fkb5d78a77503a5452" FOREIGN KEY (notes_id) REFERENCES note(id)

When we try to drop the table we get the error:

ERROR: "member_pkey" is an index

This error seems to have something to do with the primary key in the related member table:

              Table "public.member"
      Column       |          Type          | Modifiers
-------------------+------------------------+-----------
 discriminator     | character varying(31)  | not null
 id                | bigint                 | not null
 deleted           | boolean                | not null
 status            | character varying(255) | not null
 principal_id      | bigint                 |
 extravaluelist_id | bigint                 |
Indexes:
    "member_pkey" PRIMARY KEY, btree (id)
    "member_principal_id_key" btree (principal_id)
Foreign-key constraints:
    "fk892776ba530919c4" FOREIGN KEY (extravaluelist_id) REFERENCES extravaluelist(id)
    "fk892776baa88520c6" FOREIGN KEY (principal_id) REFERENCES principal(id)

We have also tried to drop the foreign key constraint on the table as an intermediate step to getting rid of the table:

alter table member_note drop constraint fkb5d78a7737864f78;
ERROR: "member_pkey" is an index

Does anyone know how to drop this table or what the error means?
[GENERAL] Performance Issues (was: "like" vs "substring" again)
Hi list,

I am still fighting with the really slow database queries (see http://www.nabble.com/%22like%22-vs-%22substring%22-again-t4447906.html), and I still believe that the cause of the problem is that the query planner makes incorrect estimations about the selectivity of the "where" clauses.

I wondered if it is possible to make the query planner perform a sequential scan over a table *before* it starts planning. If I know that a table has only about 3000 rows, the overhead of this sequential scan can be ignored. On the other hand, this would give the planner an exact data basis for its planning.

Or would it be possible to tweak how the planner determines the selectivity? I have read in the docs (chapter 54.1) that in case of more than one condition in the where clause, independence is assumed. In my case ("... where test like '11%' and test not like '113%'") this is clearly not the case, so it might be an interesting point to address.

Do you have any other tips for me?

Kind regards,
Christian

--
Deriva GmbH                   Tel.: +49 551 489500-42
Financial IT and Consulting   Fax:  +49 551 489500-91
Hans-Böckler-Straße 2         http://www.deriva.de
D-37079 Göttingen

Deriva CA Certificate: http://www.deriva.de/deriva-ca.cer

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] Optimizing "exists"
Steve Crawford <[EMAIL PROTECTED]> writes:
> If the sub-select returns a large result set, will there be any benefit
> to adding "limit 1" to the sub-select or does the query planner
> automatically deduce that "limit 1" is the correct interpretation?

It does, although poking at it I notice a bit of a bug:

regression=# explain select * from tenk1 a join tenk1 b on a.unique1 = b.ten;
                                       QUERY PLAN
------------------------------------------------------------------------------------------
 Merge Join  (cost=2287.89..2438.58 rows=1 width=488)
   Merge Cond: (a.unique1 = b.ten)
   ->  Index Scan using tenk1_unique1 on tenk1 a  (cost=0.00..1702.22 rows=1 width=244)
   ->  Materialize  (cost=2287.89..2412.89 rows=1 width=244)
         ->  Sort  (cost=2287.89..2312.89 rows=1 width=244)
               Sort Key: b.ten
               ->  Seq Scan on tenk1 b  (cost=0.00..458.00 rows=1 width=244)
(7 rows)

regression=# explain select exists(select * from tenk1 a join tenk1 b on a.unique1 = b.ten);
                                       QUERY PLAN
------------------------------------------------------------------------------------------
 Result  (cost=4822.00..4822.01 rows=1 width=0)
   InitPlan
     ->  Nested Loop  (cost=0.00..4822.00 rows=1 width=488)
           ->  Seq Scan on tenk1 b  (cost=0.00..458.00 rows=1 width=244)
           ->  Index Scan using tenk1_unique1 on tenk1 a  (cost=0.00..0.42 rows=1 width=244)
                 Index Cond: (a.unique1 = b.ten)
(6 rows)

The second case is correctly choosing a fast-start plan, but it's reporting the sub-select's cost up to the next plan level as being the full runtime instead of the expected partial runtime. That has no bad effect here, but might in a more complex situation where the estimated subselect cost affected upper join order or some such.
If you were up against such a situation, an explicit LIMIT 1 would probably help:

regression=# explain select exists(select * from tenk1 a join tenk1 b on a.unique1 = b.ten limit 1);
                                          QUERY PLAN
------------------------------------------------------------------------------------------------
 Result  (cost=0.48..0.49 rows=1 width=0)
   InitPlan
     ->  Limit  (cost=0.00..0.48 rows=1 width=488)
           ->  Nested Loop  (cost=0.00..4822.00 rows=1 width=488)
                 ->  Seq Scan on tenk1 b  (cost=0.00..458.00 rows=1 width=244)
                 ->  Index Scan using tenk1_unique1 on tenk1 a  (cost=0.00..0.42 rows=1 width=244)
                       Index Cond: (a.unique1 = b.ten)
(7 rows)

Same subplan, but a saner cost estimate at the upper level...

			regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] ON INSERT => execute AWK/SH/EXE?
--- "A. Kretschmer" <[EMAIL PROTECTED]> wrote: > You can do this, you need a untrusted language like plperlU or plsh. > Then you can write a TRIGGER and call external programs. This may be a silly question, will plsh work on a windows server? I am pretty sure that plbat doesn't exist :-). Regards, Richard Broersma Jr. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] update command question?
On Sun, Sep 16, 2007 at 08:03:56PM -0700, Charles.Hou wrote:
> i have the log analysis report by using the pgFouine. in the part of
> Slowest queries, the update commands "update com_status set
> travel=620.70001220703 where no=1" have the high Av.duration(s). how
> should i do to solve this problem? why this simple command can take up
> the most time?

Maybe you are missing an index? In any case, you're going to have to provide *much* more information (at the very least an EXPLAIN ANALYZE of that statement) if you want any more detailed answer.

Have a nice day,
-- Martijn van Oosterhout <[EMAIL PROTECTED]> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to
> litigate.
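Concretely, if com_status has no index on "no", every such UPDATE must scan the whole table. A hedged first step (assuming "no" is reasonably selective; the index name is invented):

```sql
-- Let the UPDATE find its one row via an index instead of a seq scan:
CREATE INDEX com_status_no_idx ON com_status (no);

-- Then compare the plan and timing before/after:
EXPLAIN ANALYZE
UPDATE com_status SET travel = 620.70001220703 WHERE no = 1;
```

If the plan still shows a sequential scan, the slowness likely lies elsewhere (table bloat, trigger overhead, lock contention), which is exactly the extra information the reply asks for.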
[GENERAL] tradeoffs for multi-schema or multi-db
Here's the situation... I have 2 different apps that both require a separate schema, or maybe db. There is actually one column in one table of each of these db/schemas that is in common, and a desire to "cross" between them in some cases.

For example, an app for keeping track of the census results and another app that keeps track of criminal cases in the justice system. The "shared" field is of course the citizen/defendant. Two different apps that should remain separate, but at times it would be nice to check the legal status of the defendant by looking at the census data.

Considering the somewhat rare query that will need to bridge these 2 data sources (dbs or schemas), what are the pros/cons of having 2 schemas in the same DB vs 2 DBs? What if the query is to be committed to a PL/pgSQL function/procedure? How awkward is it to bridge schemas vs bridging dbs in that form?

Thanks for any advice/help.

-dave
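For comparison, here is a sketch of what the rare "bridging" query looks like under each design (the schema, table, and column names are invented for illustration):

```sql
-- Two schemas in one database: the cross-app join is just schema-qualified,
-- and works identically inside a PL/pgSQL function.
SELECT c.full_name, j.case_no
FROM census.citizen c
JOIN justice.defendant j ON j.citizen_id = c.citizen_id;

-- Two separate databases: PostgreSQL has no cross-database queries, so you
-- need something like contrib/dblink, with an explicit connection string
-- and column-type list for the remote result:
SELECT c.full_name, j.case_no
FROM census.citizen c
JOIN dblink('dbname=justice',
            'SELECT citizen_id, case_no FROM defendant')
       AS j(citizen_id bigint, case_no integer)
     ON j.citizen_id = c.citizen_id;
```

The first form is plainly less awkward, in ad-hoc SQL and in functions alike, which is the usual argument for two schemas in one database when any cross-querying is anticipated.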
Re: [GENERAL] ON INSERT => execute AWK/SH/EXE?
On Sep 17, 2007, at 9:50 AM, Bima Djaloeis wrote: Hi there, I am new to PostgreSQL, is it possible to create something so that 1) If I insert / update / delete an item from my DB... 2) ... an awk / shell / external program is executed in my UNIX System? If yes, how do I do this and if no, thanks for telling. Yes it's possible, but it's probably a really bad idea, so I'm not going to tell you how. Instead, use a trigger to store a message in a queue table, then have an external persistent process poll the queue table (or use listen/notify to sleep until new messages are added to the queue). Cheers, Steve ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
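Steve's queue-table approach could look roughly like this (the table, trigger, and channel names are made up for illustration; the external worker is a separate process that runs LISTEN and drains the queue):

```sql
-- Queue table: the trigger only records that something happened.
CREATE TABLE event_queue (
    id        serial PRIMARY KEY,
    tbl       text,
    op        text,
    queued_at timestamp DEFAULT now()
);

CREATE OR REPLACE FUNCTION enqueue_event() RETURNS trigger AS $$
BEGIN
    INSERT INTO event_queue (tbl, op) VALUES (TG_RELNAME, TG_OP);
    NOTIFY event_queue;   -- wake any worker sleeping in LISTEN
    RETURN NULL;          -- return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

-- Attach to the table of interest (hypothetical table "items"):
CREATE TRIGGER items_enqueue AFTER INSERT OR UPDATE OR DELETE ON items
    FOR EACH ROW EXECUTE PROCEDURE enqueue_event();
```

The external daemon issues `LISTEN event_queue;`, sleeps until notified, then processes and deletes the queued rows. The actual shell/awk call happens entirely outside the database, so a slow or failing script can never stall or abort the transaction that fired the trigger.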
Re: [GENERAL] ON INSERT => execute AWK/SH/EXE?
On Sep 17, 2007, at 11:50 AM, Bima Djaloeis wrote: Hi there, I am new to PostgreSQL, is it possible to create something so that 1) If I insert / update / delete an item from my DB... 2) ... an awk / shell / external program is executed in my UNIX System? If yes, how do I do this and if no, thanks for telling. Thanks for reading, any help is appreciated. You could use a trigger function in an untrusted procedural language such as plperlu or plpythonu to do that. Erik Jones Software Developer | Emma® [EMAIL PROTECTED] 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] ON INSERT => execute AWK/SH/EXE?
On Mon, 17.09.2007 at 18:50:46 +0200, Bima Djaloeis wrote: > Hi there, > > I am new to PostgreSQL, is it possible to create something so that > > 1) If I insert / update / delete an item from my DB... > 2) ... an awk / shell / external program is executed in my UNIX System? > > If yes, how do I do this and if no, thanks for telling. You can do this, you need an untrusted language like plperlU or plsh. Then you can write a TRIGGER and call external programs. Hope that helps, Andreas -- Andreas Kretschmer Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header) GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
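A rough sketch of what Andreas describes, assuming plperlu is installed in the database (`createlang plperlu yourdb`); only a superuser may create plperlu functions, the script path is hypothetical, and the trigger table is made up:

```sql
-- Untrusted PL/Perl runs with no sandbox: the body executes as the
-- postgres OS user, so keep the external command harmless and fast.
CREATE OR REPLACE FUNCTION run_external() RETURNS trigger AS $$
    system('/usr/local/bin/on_change.sh');   # hypothetical script
    return;   # undef means: proceed with the row unmodified
$$ LANGUAGE plperlu;

CREATE TRIGGER items_run_external AFTER INSERT ON items
    FOR EACH ROW EXECUTE PROCEDURE run_external();
```

Note the trade-off versus the queue-table approach: here the external program runs synchronously inside the transaction, so a hung script blocks the INSERT.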
Re: [GENERAL] ON INSERT => execute AWK/SH/EXE?
On 9/17/07, Bima Djaloeis <[EMAIL PROTECTED]> wrote: > Hi there, > > I am new to PostgreSQL, is it possible to create something so that > > 1) If I insert / update / delete an item from my DB... > 2) ... an awk / shell / external program is executed in my UNIX System? > > If yes, how do I do this and if no, thanks for telling. Yes. you have to use an untrusted pl language, like pl/perlu or pl/tclu and you have to be a superuser to create user defined functions in those languages. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] ON INSERT => execute AWK/SH/EXE?
On 9/17/07, Bima Djaloeis <[EMAIL PROTECTED]> wrote: > Thanks for reading, any help is appreciated. Triggers + Untrusted PL/Perl, see: 1) http://www.postgresql.org/docs/8.2/static/plperl-triggers.html 2) http://www.postgresql.org/docs/8.2/static/plperl-trusted.html ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] Alter sequence restart with selected value...
Scott Marlowe wrote: On 9/18/07, Jeff Ross <[EMAIL PROTECTED]> wrote: I'm using copy to insert a bunch of rows into a new table with a unique primary key. Copy is correctly incrementing the primary key, but apparently the sequence itself is never updated because when I go to insert again I get a constraint violation. Try using setval. select setval('seqname', (select max(id) from tablename)); Oh, I knew it was going to be something simple! Thank you, Scott. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
--- Phoenix Kiula <[EMAIL PROTECTED]> wrote: > What constitutes a "small fill factor"? Would 70 be good? I guess my > current must have been the default, which the manual says is 100. On the following link: http://www.postgresql.org/docs/8.2/interactive/sql-createindex.html#SQL-CREATEINDEX-STORAGE-PARAMETERS I found this: "B-trees use a default fillfactor of 90, but any value from 10 to 100 can be selected." Regards, Richard Broersma Jr. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
index fillfactor (was Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER)
In response to "Phoenix Kiula" <[EMAIL PROTECTED]>: > Thanks for a very informative post! One question: > > > I'm not sure how to find the current value, but a smaller fill factor > > on busy tables should lead to less fragmentation, thus more efficient > > indexes over time. Keep in mind that a smaller fill factor will also > > lead to larger indexes initially. > > What constitutes a "small fill factor"? Would 70 be good? Unfortunately, I can't say. I have not yet had the opportunity to experiment with different fillfactors, so I can only speak in vague estimations on this topic. > I guess my > current must have been the default, which the manual says is 100. I expect it's at the default, but the docs say that is 90%: http://www.postgresql.org/docs/8.2/static/sql-createindex.html Where did you see 100? > Or > did you mean really small fill factor like 20? In this context, what > is "packing" in the manual -- is that some kind of compression? Hopefully, someone more knowledgeable will chime in with some wise suggestions. Barring that, I can only suggest you experiment to find what works for your workload, but don't rule out the possibility that extremely low fillfactor values might work well for you. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[GENERAL] Optimizing "exists"
Does the planner automatically add "limit 1" to "exists" sub-selects? In other words, take an update like: update foo set itexists = exists (select 1 from bar where bar.something = foo.something); If the sub-select returns a large result set, will there be any benefit to adding "limit 1" to the sub-select or does the query planner automatically deduce that "limit 1" is the correct interpretation? Cheers, Steve ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] Database reverse engineering
On 13 Sep, 06:12, [EMAIL PROTECTED] (Ow Mun Heng) wrote: > On Mon, 2007-09-10 at 13:00 -0600, RC Gobeille wrote: > > Or this one: > >http://schemaspy.sourceforge.net/ > > Can't seem to get it to connect to PG using the example. > > java -jar schemaSpy_3.1.1.jar -t pgsql -u operator -p operator -o > test_db -host localhost -db test_db As the end of the error message suggests, you might want to try something like this: java -jar schemaSpy_3.1.1.jar -cp postgresql-xxx.jar -t pgsql -u operator -p operator -o test_db -host localhost -db test_db The argument for -cp needs to be the full path to the .jar file, and must obviously use the real filename of that file itself. You may then get complaints about not finding the schema: I had to specify "-s public", I think. Paul ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
Thanks for a very informative post! One question: > I'm not sure how to find the current value, but a smaller fill factor > on busy tables should lead to less fragmentation, thus more efficient > indexes over time. Keep in mind that a smaller fill factor will also > lead to larger indexes initially. What constitutes a "small fill factor"? Would 70 be good? I guess my current must have been the default, which the manual says is 100. Or did you mean really small fill factor like 20? In this context, what is "packing" in the manual -- is that some kind of compression? ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
[GENERAL] help w/ SRF function
Hi, I want to use a SRF to return multiple rows. The current SRF is pretty static. create type foo as ( id smallint, data1 int, data2 int ) CREATE OR REPLACE FUNCTION foo_func() RETURNS SETOF foo AS $BODY$ SELECT TRH.ID, TRH.data1, TRH.data2 FROM D INNER JOIN TS ON TS.id = D.id inner join TRH on ts.id = trh.id WHERE D.start_timestamp BETWEEN '8/1/2007' AND '9/8/2007' And D.code IN ('ID_123') $BODY$ LANGUAGE 'sql' IMMUTABLE STRICT; I would like for the above to be a little bit more dynamic in that the start_timestamp and the code can be input fields. eg: CREATE OR REPLACE FUNCTION foo_func(fromdate timestamp, todate timestamp, code text) RETURNS SETOF foo AS $BODY$ SELECT TRH.ID, TRH.data1, TRH.data2 FROM D INNER JOIN TS ON TS.id = D.id inner join TRH on ts.id = trh.id WHERE D.start_timestamp BETWEEN fromdate AND todate And D.code IN (code) $BODY$ LANGUAGE 'sql' IMMUTABLE STRICT; How can I go about this? The above will fail due to missing columns fromdate/todate/code. Or should I use plpgsql as SQL cannot handle variable substitution? What about doing dynamic SQL eg: Dsql = select X,Y,Z from foo, join bar on bar.a = foo.a where D.start_timestamp between ' || fromdate ||' and ' || todate||' execute DSQL Thanks for any/all help. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
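As the replies in this thread point out, an SQL-language function body cannot see its arguments by name; they must be referenced positionally as $1, $2, $3. A sketch of the parameterized version (note also that a function which reads tables should be declared STABLE rather than IMMUTABLE, and `IN (code)` with a single scalar parameter is simply `= $3`):

```sql
CREATE OR REPLACE FUNCTION foo_func(fromdate timestamp,
                                    todate   timestamp,
                                    code     text)
RETURNS SETOF foo AS
$BODY$
SELECT
    TRH.ID,
    TRH.data1,
    TRH.data2
FROM D
INNER JOIN TS  ON TS.id = D.id
INNER JOIN TRH ON TS.id = TRH.id
WHERE D.start_timestamp BETWEEN $1 AND $2
  AND D.code = $3
$BODY$
LANGUAGE sql STABLE STRICT;

-- Usage:
-- SELECT * FROM foo_func('2007-08-01', '2007-09-08', 'ID_123');
```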
Re: [GENERAL] Alter sequence restart with selected value...
On 9/18/07, Jeff Ross <[EMAIL PROTECTED]> wrote: > I'm using copy to insert a bunch of rows into a new table with a unique > primary key. Copy is correctly incrementing the primary key, but > apparently the sequence itself is never updated because when I go to > insert again I get a constraint violation. Try using setval. select setval('seqname', (select max(id) from tablename)); ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[GENERAL] update command question?
I have a log analysis report generated by pgFouine. In the Slowest queries section, the update command "update com_status set travel=620.70001220703 where no=1" has a high Av.duration(s). What should I do to solve this problem? Why can this simple command take up the most time? Av.duration(s): 21.06 Times executed: 9 total duration: 3m9s ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] NOT NULL Issue
Tom Lane wrote: "Gustav Lindenberg" <[EMAIL PROTECTED]> writes: Why is '' not considered null in postgres (8.1.3) Because they're different. The SQL spec says that an empty string is different from NULL, and so does every database in the world except Oracle. Oracle, however, does not define the standard. If people would think of it in terms of an address it might make more sense to them. An empty string has an address; so do a string, an integer, and so on. When you think of NULL, think of it in the context of a NULL address. It's not addressable, it's nothing, it's not set, it's not there. I know it's not 100% accurate, but I think it helps folks understand the concept. -- Until later, Geoffrey Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. - Benjamin Franklin ---(end of broadcast)--- TIP 6: explain analyze is your friend
[GENERAL] ON INSERT => execute AWK/SH/EXE?
Hi there, I am new to PostgreSQL, is it possible to create something so that 1) If I insert / update / delete an item from my DB... 2) ... an awk / shell / external program is executed in my UNIX System? If yes, how do I do this and if no, thanks for telling. Thanks for reading, any help is appreciated.
Re: [GENERAL] keeping 3 tables in sync w/ each other
> Can I use inheritance? References? inheritance ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[GENERAL] Alter sequence restart with selected value...
I'm using copy to insert a bunch of rows into a new table with a unique primary key. Copy is correctly incrementing the primary key, but apparently the sequence itself is never updated because when I go to insert again I get a constraint violation. Here's the start of the new table creation: CREATE TABLE training_programs ( trg_prg_id integer NOT NULL primary key DEFAULT nextval('training_programs_trg_prg_id_seq'), I then use copy (select about half the columns in the original table) to '/tmp/training_programs.txt' and then copy training_programs from '/tmp/training_programs.txt' After this I get the following: wykids=# select max(trg_prg_id) from training_programs; max -- 4893 wykids=# select nextval('training_programs_trg_prg_id_seq'); nextval - 1 (1 row) Since I'm doing this against a copy of a live database in preparation for running it against the real thing, I never know how many records will be in training_programs. I'm trying, then, to do something like this: alter sequence training_programs_trg_prg_id_seq restart with (select (max(trg_prg_id) + 1) from training_programs); but that isn't working. Thanks in advance for any help! Jeff Ross ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] read-only queries on PITRslaves, any progress?
On 9/18/07, Merlin Moncure <[EMAIL PROTECTED]> wrote: > Florian has been hard at work on getting the lazy xid assignment patch > in for 8.3, which is now complete. AIUI, this is an important step > towards read only pitr slave (with other advantages too), and shows > that things are moving along. I don't have a crystal ball, but I'm > guessing you will start to hear more about this when the 8.3 beta is > wrapped. I was hoping for a tighter timeframe, but this is good to know, thanks. Alexander. ---(end of broadcast)--- TIP 6: explain analyze is your friend
[GENERAL] Inline Function documentation
Hi all, I have come across many postings that talk about INLINE FUNCTIONs and their utility in helping the optimizer decide the right plan. However, other than a few postings in different forums I haven't been able to get my hands on any sort of documentation that explains which functions can be converted to an INLINE function. I understand it has to be a simple function, and inline_function in backend/optimizer/util/clauses.c gave me some more pointers, but I was wondering if there is more comprehensive documentation available anywhere. Thanks, Brijesh
Re: [GENERAL] Tsearch2 - spanish
prueba=# select to_tsvector('espanol','melón'); ERROR: Affix parse error at 506 line and prueba=# select lexize('sp','melón'); lexize - {melon} (1 row) Looks very strange, can you provide a list of dictionaries and the configuration map? I tried many dictionaries with the same results. Also I changed the codeset of the .aff and .dict files (from latin1 to utf8 and from utf8 to iso88591) and got the same error. Where can I investigate to resolve this problem? My dictionary at line 506 had: Where did you take this file from? And what is the encoding/locale setting of your db? -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] stability issues
On 9/18/07, Willy-Bas Loos <[EMAIL PROTECTED]> wrote: > Hi, > > I'm running a PostgreSQL 8.1.9 server on Debian etch (default installation, > w/ PostGIS from Debian repositories). > The machine has double Xeon 5130, 4 GB of DDR2 ECC Reg. RAM and a > two-sata-disc Raid 0 cluster. I don't know the brand of the memory nor the > HDDs. The RAID controller is a 3Ware 8006-2. > > Lately i've been confronted by some failures in several PostgreSQL clusters. > 3 weeks ago a cluster crashed when i was reindexing it, beyond my repair. > Last week we've been getting Toast warnings and shared index problems on > another cluster. I solved the shared index issue, but the Toast trouble > remained, in more than one database on that cluster. I dumped the databases, > created another cluster and shut the troubled one down. (problems were > solved). > I've asked our provider to do a filesystem check, (e2fsck -f /dev/sda1) > which indicated that "there's nothing wrong with the filesystem". Too little information, really. This system needs to have serious diagnostics run on it to find out what the problem is. > This doesn't feel right. > Is there anything else i should check? Yes. If you can't get them to run real tests with memtest86 etc, then try running very large compiles, like the linux kernel with -j4 or -j8, and watch for sig 11s while doing it. Set it up to run the compiles in a loop, looking for errors while compiling. Better yet, find a hosting provider who knows what they're doing. Your data / uptime are worth it. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] stability issues
"Willy-Bas Loos" <[EMAIL PROTECTED]> writes: > I'm running a PostgreSQL 8.1.9 server on Debian etch (default installation, > w/ PostGIS from Debian repositories). > The machine has double Xeon 5130, 4 GB of DDR2 ECC Reg. RAM and a > two-sata-disc Raid 0 cluster. I don't know the brand of the memory nor the > HDDs. The Riad controller is a 3Ware 8006-2. > Lately i've been confronted by some failures in several PostgreSQL clusters. > 3 weeks ago a cluster crashed when i was reindexing it, beond my repair. > Last week we've been getting Toast warnings and shared index problems on > another cluster. I solved the shared index issue, but the Toast trouble > remained, in more than one database on that cluster. I dumped the databases, > created another cluster and shut the troubled one down. (problems were > solved). > I've asked our provider to do a filesystem check, (e2fsck -f /dev/sda1) > which indicated that "there's nothing wrong with the filesystem". > This doesn't feel right. No, it sure doesn't. It sounds to me like the hardware is getting flaky. Memory tests might be the first thing to run. There's also the old clean-and-reseat-all-the-boards-and-connectors exercise... regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] Q:Aggregrating Weekly Production Data. How do you do it?
On 9/17/07, Ow Mun Heng <[EMAIL PROTECTED]> wrote: > Just wondering how everyone is doing aggregation of production data. Where I work, we aggregate by the minute in the application, then dump those stats entries into the database en masse from each machine. So, we wind up with rows something like this (simplified):

id | parentid | host | timestamp | request | total | success | fail1 | fail2 | fail3 | totalresptime | maxresptime
1 | NULL | jboss01 | 2007-09-17 12:02:03 | authcc | 120 | 112 | 4 | 2 | 2 | 48 | 12039
2 | 1 | jboss01 | 2007-09-17 12:02:03 | ccconnect | 120 | 118 | 0 | 0 | 2 | 423900 | 10394

where the id comes from a sequence, and the parent id ties our children stats to their parents. I.e. in this example authcc called ccconnect, and so on. The fail1/2/3 are types of failures, the totalresptime is the time of all the successful requests added together, so that totalresptime/success = average for that minute, and max is the longest running request ending in that minute. Then we can aggregate those minutes together, monitor individual machine performance, etc. Ours is actually more complex than this, but you get the idea. We have a cron job that checks the statistics every x minutes for high failure rates and have it generate an alert email if any of our requests go over a preset threshold. This catches problems long before anything shows up interesting in the logs most of the time. > (getting min/max/count isn't much of an issue. Stdev is the main issue I > believe) > > One such instance I've read about is.. Isn't stddev() working for you? What I do is aggregate the minute time slices by grouping by date_trunc('xxx',timestamp) and then use that as a subselect to a query that does the stddev() outside of that. Works pretty well for us, and makes it easy to identify trends.
One of the things we started doing is to aggregate the last six weeks data by the day / hour and then comparing the last 24 hours worth of data to those six weeks worth to see trends for each hour of each day. The queries are huge and ugly, but they work, and run in about 4 minutes on a fairly good sized chunk of data. We have about 150k to 300k entries a day put into this db. ---(end of broadcast)--- TIP 6: explain analyze is your friend
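The pattern Scott describes (minute slices aggregated in a subselect, stddev computed outside) might look something like this; the `minute_stats` table and its columns are hypothetical, modeled on the sample rows above:

```sql
-- Per-hour average and spread of the per-minute average response time.
SELECT hour,
       avg(per_min_avg)    AS avg_resp,
       stddev(per_min_avg) AS stddev_resp
FROM (
    SELECT date_trunc('hour', "timestamp") AS hour,
           totalresptime::numeric / nullif(success, 0) AS per_min_avg
    FROM minute_stats
    WHERE "timestamp" >= now() - interval '6 weeks'
) AS slices
GROUP BY hour
ORDER BY hour;
```

The `nullif(success, 0)` guard keeps minutes with zero successes from dividing by zero; they simply drop out of the aggregates as NULLs.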
Re: [GENERAL] read-only queries on PITRslaves, any progress?
On 9/18/07, Alexander Staubo <[EMAIL PROTECTED]> wrote: > On 9/15/07, Marinos Yannikos <[EMAIL PROTECTED]> wrote: > > this might not be the best place to ask, but has there been any further > > progress or an alternative project to Florian Pflug's SoC 2007 proposal > > about enabling PITR slaves to serve read-only queries? It seems like an > > elegant way to load-balance a PostgreSQL database with very little overhead. > > I too would love to hear about this. Florian has been hard at work on getting the lazy xid assignment patch in for 8.3, which is now complete. AIUI, this is an important step towards read only pitr slave (with other advantages too), and shows that things are moving along. I don't have a crystal ball, but I'm guessing you will start to hear more about this when the 8.3 beta is wrapped. merlin ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
In response to "Phoenix Kiula" <[EMAIL PROTECTED]>: > > If you find that reindexing improves performance, then you should > > investigate further. Depending on the exact nature of the problem, > > there are many possible solutions, three that come to mind: > > * Add RAM/SHM > > Can I add SHM with merely by managing the entry in sysctl.conf? My > current values: > > kernel.shmmax = 536870912 > kernel.shmall = 536870912 These values define the max allowed. They exist to keep poorly written applications from sucking up all the available memory. Setting them higher than is needed does not cause any problems, unless a greedy or poorly-written application grabs all that memory. > My "shared_buffers" in postgresql.conf is "2". From the website > http://www.desknow.com/kb/idx/12/061/article/ I notice that shmmax > should be sharedbuffer*8192, so I suppose my shmmax can be much lower > than the above, but I raised it for performance. Am I wrong to do so? It's completely impossible to tell without knowing more about your physical hardware. The rule of thumb is 1/3 physical RAM to start, then adjust if more or less seems to help. That advice is for versions of PG >= 8. If you're still running a 7.X version, upgrade. How much RAM does this system have in it? Unless you have other applications running on this system using RAM, you should allocate more of it to shared_buffers. If 160M is 1/3 your RAM, you probably need to add more RAM. How big is your database? If it's possible to fit it all in shared_buffers, that will give you the best performance. > > * REINDEX on a regular schedule > > This is sadly not really feasible, because we need to offer a 100% > availability website. REINDEX does not work concurrently so it is not > really an option for us. My max_fsm_pages and max_fsm_relations are > way above the numbers that come up after the VACUUM ANALYZE VERBOSE > run. 
Hence my comment about "depending on your workload" and "investigating the situation" to determine the best solution. > But still, the autovacuum stuff seems like it is not working at all. > Some related entries in the conf file: > > autovacuum = on > autovacuum_vacuum_cost_delay = 20 > vacuum_cost_delay = 20 > autovacuum_naptime = 30 > stats_start_collector = on > stats_row_level = on > autovacuum_vacuum_threshold = 80 > autovacuum_analyze_threshold = 80 > > And yet, the db often slows down, at which point I manually login and > run a manual VACUUM ANALYZE and it seems fine for some more time. > Sometimes, I also restart pgsql and that seems to help for a while. You don't mention *_scale_factor settings. Those are going to be important as well. Based on your symptoms, it sounds like autovacuum is not getting those tables vacuumed enough. I recommend raising the debug level and watching the logs to see if autovacuum is actually getting tables vacuumed. Consider lowering your *_scale_factor values if not. Or even reducing autovacuum_naptime. > Another advice on these forums is to see "vmstat 1", without actually > specifying how to draw inferences from it. The "free" version of it is > coming up at decent rates, as follows:

procs ---memory-- ---swap-- -io --system-- cpu
r b swpd free buff cache si so bi bo in cs us sy id wa
1 0 29124 110760 108980 346773601 206 1400 4 2 1 85 12
0 0 29124 110632 108980 346773600 0 0 1052 108 0 0 100 0
2 0 29124 108840 108980 346773600 0 0 1112 299 1 1 98 0
1 0 29124 109288 108980 346773600 0 0 1073 319 2 1 98 0

Explaining how to interpret the output of this command and determine what to do with it is not something easily done in a short paragraph. However, it looks like you've got a lot of RAM being used for the disk cache. That memory would probably be better used as shared_buffers, so I suggest you increase that value considerably.
> > * (with newer version) reduce the fill factor and REINDEX > > I think some of my tables are updated very frequently so a smaller > fill factor will be nice. How can I find the current fill factor on my > tables? Also, is there some method or science to calculating a decent > fill factor -- size of table, number of indexes, frequency of updates, > and such? We have one major table which faces a lot of INSERTs and > UPDATES in a day (up to 10,000) but many many more SELECTs (up to 10 > million). I'm not sure how to find the current value, but a smaller fill factor on busy tables should lead to less fragmentation, thus more efficient indexes over time. Keep in mind that a smaller fill factor will also lead to larger indexes initially. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
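For 8.2, a sketch of setting, inspecting, and changing an index fillfactor (the table and index names are hypothetical); `pg_class.reloptions` shows the current value, which answers the "how to find it" question in this thread:

```sql
-- Set fillfactor at creation time (8.2+ syntax):
CREATE INDEX foo_busy_idx ON foo (busy_col) WITH (fillfactor = 70);

-- Inspect the current setting; NULL reloptions means the default (90
-- for B-trees) is in effect:
SELECT relname, reloptions
FROM pg_class
WHERE relname = 'foo_busy_idx';

-- Change it later: the new value takes effect on the next rebuild.
ALTER INDEX foo_busy_idx SET (fillfactor = 70);
REINDEX INDEX foo_busy_idx;
```

Note that REINDEX takes locks that block writes, so on a 100%-availability system this still has to be scheduled carefully.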
Re: [GENERAL] Q:Aggregrating Weekly Production Data. How do you do it?
Ow Mun Heng wrote: The results are valid (verified with actual data) but I don't understand the logic. All the statistics books I've read define the sample stddev as sqrt( sum((x - avg(x))^2) / (n - 1) ). The formula is very different, hence the confusion. A formula is not an algorithm. In particular, the naive way of calculating variance or standard deviation has massive numerical instability problems - anything involving sums of squares does. There are a variety of alternate algorithms for stddev/variance; I presume your other algorithm is similarly trying to avoid these same issues (but I have not looked closely at it). You can also see Wikipedia for one of the most well known, due to Knuth/Welford: http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance - John D. Burger MITRE ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
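The instability John mentions is easy to see in SQL terms: the one-pass shortcut sqrt((sum(x^2) - sum(x)^2/n)/(n-1)) is algebraically equal to the textbook formula but subtracts two huge, nearly equal numbers, losing precision. A two-pass version (sketched below against a hypothetical data(x) table) computes the mean first and stays stable:

```sql
-- Two-pass sample standard deviation: first pass computes the mean,
-- second pass sums the squared deviations from it.
SELECT sqrt( sum((x - m)^2) / (count(*) - 1) ) AS sample_stddev
FROM data,
     (SELECT avg(x) AS m FROM data) AS mean;
```

In practice PostgreSQL's built-in stddev() aggregate handles this for you; the sketch is only to illustrate why "the formula" and "the algorithm" can legitimately differ.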
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
Hi, > Now, I can merrily increase the shared_buffers, but the manual warns > me against increasing the value too much because it is "per > transaction" value. Shared_buffers is not per-transaction. Where did you find this information? - Sander ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] read-only queries on PITRslaves, any progress?
On 9/15/07, Marinos Yannikos <[EMAIL PROTECTED]> wrote: > this might not be the best place to ask, but has there been any further > progress or an alternative project to Florian Pflug's SoC 2007 proposal > about enabling PITR slaves to serve read-only queries? It seems like an > elegant way to load-balance a PostgreSQL database with very little overhead. I too would love to hear about this. Alexander. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
On 18/09/2007, Sander Steffann <[EMAIL PROTECTED]> wrote: > Hi, > > > Can I add SHM with merely by managing the entry in sysctl.conf? My > > current values: > > > > kernel.shmmax = 536870912 > > kernel.shmall = 536870912 > > > > My "shared_buffers" in postgresql.conf is "2". From the website > > http://www.desknow.com/kb/idx/12/061/article/ I notice that shmmax > > should be sharedbuffer*8192, so I suppose my shmmax can be much lower > > than the above, but I raised it for performance. Am I wrong to do so? > > You need to configure the kernel so it allows processes to use more shared > memory. This does not mean that a process automatically uses it. For > PostgreSQL you will need to increase shared_buffers to make it use the extra > available shared memory. With your shared memory settings you can probably > increase shared_buffers to about 65000. > Thanks, the ipcs command shows me this:

------ Shared Memory Segments --------
key        shmid    owner     perms  bytes      nattch  status
0x0052e2c1 6782976  postgres  600    176668672  2

Now, I can merrily increase the shared_buffers, but the manual warns me against increasing the value too much because it is a "per transaction" value. So here's the conflict for a novice like me: 1. Do not increase shared_buffers too much because it is per-transaction. 2. Do increase the SHM for performance, but it is only useful if you also increase shared_buffers. So which is it? Would it help to increase the effective_cache_size? It is currently at "512000". I have 4GB ram on the machine, but am willing to devote about 2GB to pgsql. Thanks! ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
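As an illustration only (the right numbers depend entirely on the workload, and these values are assumptions, not recommendations): a 4 GB machine devoting roughly a quarter of its RAM to the shared buffer cache on 8.1, where memory GUCs are counted in 8 kB pages, might use something like the fragment below. Note shared_buffers is allocated once at server start, not per transaction, and kernel.shmmax must be at least as large as the resulting segment.

```
# postgresql.conf -- illustrative values for a 4 GB machine (8.1: 8 kB pages)
shared_buffers = 131072         # 131072 * 8 kB = 1 GB, allocated once at startup
effective_cache_size = 262144   # ~2 GB; a planner hint only, allocates nothing

# /etc/sysctl.conf -- the shared memory segment must fit under shmmax
kernel.shmmax = 1342177280      # ~1.25 GB, comfortably above the segment size
```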
Re: [GENERAL] Q:Aggregrating Weekly Production Data. How do you do it?
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 09/17/07 23:34, Ow Mun Heng wrote: > Just wondering how everyone is doing aggregation of production data. > > Data which runs in the vicinity of a few million a week. > > What are the methods which will effectively provide the > min/max/average/count/stdev of the weekly sample size based on different > products/software mix etc. > > and still be able to answer correctly, what's the average of data_1 over > the past 2 months? That's the purpose of data warehouses and ETL, isn't it? - -- Ron Johnson, Jr. Jefferson LA USA Give a man a fish, and he eats for a day. Hit him with a fish, and he goes away for good! -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFG78prS9HxQb37XmcRAhhBAKCGoBYox6azDqxQpEbvMo/Zya8cAACgy5Y6 XtrDC35IE0TOcD29Iziorfs= =XCZw -END PGP SIGNATURE- ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Recall: August Monthly techdata split fi
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 09/18/07 05:54, Ow Mun Heng wrote: > On Tue, 2007-09-18 at 06:52 -0400, Geoffrey wrote: >> [EMAIL PROTECTED] wrote: >>> Robert Wickert would like to recall the message, "August Monthly >>> techdata split file printers for France and Denmark ". >> In my experience, attempting to 'recall' an email message is a fruitless >> endeavor. Seems to me that this is a 'Microsoft' creation. I really >> don't understand the purpose, because by the time you consider >> 'recalling' the email message, it's already sitting in 1000s of inboxes... > > It works for people that has the message _still_ in their exchange > boxes. For those that don't, then... all is lost unfortunately. Actually, it even works if you fetch mails from the Exchange server to your local box. But, as another poster mentioned, you still have to be attached to an Exchange domain for it to work. - -- Ron Johnson, Jr. Jefferson LA USA Give a man a fish, and he eats for a day. Hit him with a fish, and he goes away for good! -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFG78l2S9HxQb37XmcRAjmlAJ9pHW2sDN/c2y6Rng+mzv3te+h2LgCdFZqw QYs2/bEoaWFlMW0+priEzTs= =i/MW -END PGP SIGNATURE- ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
Hi, > Can I add SHM merely by managing the entry in sysctl.conf? My > current values: > > kernel.shmmax = 536870912 > kernel.shmall = 536870912 > > My "shared_buffers" in postgresql.conf is "2". From the website > http://www.desknow.com/kb/idx/12/061/article/ I notice that shmmax > should be shared_buffers*8192, so I suppose my shmmax can be much lower > than the above, but I raised it for performance. Am I wrong to do so? You need to configure the kernel so it allows processes to use more shared memory. This does not mean that a process automatically uses it. For PostgreSQL you will need to increase shared_buffers to make it use the extra available shared memory. With your shared memory settings you can probably increase shared_buffers to about 65000. With the 'ipcs' command you can see how much shared memory PostgreSQL uses. Look under 'Shared Memory Segments' for memory owned by the postgres user. - Sander
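The arithmetic behind the "about 65000" figure, as a sketch: 8192 bytes is the default PostgreSQL block size, and shared_buffers is measured in blocks in this era. Actual shared memory use also includes the FSM, lock tables, etc., so leave some headroom below shmmax:

```sql
-- Upper bound on shared_buffers that fits in a 512 MB shmmax:
SELECT 536870912 / 8192 AS shared_buffers_upper_bound;  -- 65536 blocks
```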
[GENERAL] New PostgreSQL RPM sets are available for Fedora / RHEL
Hi, The PostgreSQL New RPM Sets 2007-09-17 Versions: 8.2.5, 8.1.10, 8.0.14, 7.4.18, 7.3.20 Set labels: 8.2.5-1PGDG, 8.1.10-1PGDG, 8.0.14-1PGDG, 7.4.18-1PGDG, 7.3.20-1PGDG - Release Info: The PostgreSQL RPM Building Project has released RPMs for 7.3.20, 7.4.18, 8.0.14, 8.1.10 and 8.2.5, and they are available on the main FTP site and its mirrors. We currently have RPMs for: - Fedora 7 - Fedora 7-x86_64 - Red Hat Enterprise Linux Enterprise Server 3.0-x86_64 - Red Hat Enterprise Linux Enterprise Server 4 - Red Hat Enterprise Linux Enterprise Server 4-x86_64 - Red Hat Enterprise Linux Advanced Server 4 - Red Hat Enterprise Linux Advanced Server 4-x86_64 - Red Hat Enterprise Linux 5 - Red Hat Enterprise Linux 5 x86_64 More may (will) come later. I want to thank every package builder for this great number of supported platforms. Support for Fedora Core 6 will be abandoned in future releases, when it reaches EOL. Please let us know if you can assist us in building RPMs for the missing Red Hat / Fedora platforms. For a complete list of changes in the RPM sets, please refer to the changelogs in the RPMs. Use rpm -q --changelog package_name to query the changelog. Point releases generally do not require a dump/reload from the previous point, but please see the Release Notes to confirm procedures for upgrading, especially if your current version is older than the last point release. The SRPMs are also provided. Please note that we have one SRPM for all platforms. We also have a howto document about RPM installation of PostgreSQL: http://pgfoundry.org/docman/?group_id=148 Please follow the instructions before installing/upgrading. Most of the RPMs have been signed by the builder, and each directory contains a CURRENT_MAINTAINER file which includes the name/email of the package builder and a link to their PGP key. 
If you experience problems with the RPMs or if you have feature requests, please join pgsqlrpms-general ( at ) pgfoundry ( dot ) org. More info about the list is found at: http://lists.pgfoundry.org/mailman/listinfo/pgsqlrpms-general The project page is: http://pgfoundry.org/projects/pgsqlrpms Please do not use these resources for issues with running or using PostgreSQL once it is installed. Please download these files from: http://www.postgresql.org/ftp/binary/v8.2.5/linux/ http://www.postgresql.org/ftp/binary/v8.1.10/linux/ http://www.postgresql.org/ftp/binary/v8.0.14/linux/ http://www.postgresql.org/ftp/binary/v7.4.18/linux/ http://www.postgresql.org/ftp/binary/v7.3.20/linux/ Regards, -- Devrim GÜNDÜZ PostgreSQL Replication, Consulting, Custom Development, 24x7 support Managed Services, Shared and Dedicated Hosting Co-Authors: plPHP, ODBCng - http://www.commandprompt.com/
Re: [GENERAL] keeping 3 tables in sync w/ each other
Ow Mun Heng wrote: Hi, I have 3 tables foo foo_loading_source1 foo_loading_source2 which is something like create table foo (a int, b int, c int) create table foo_loading_source1 (a int, b int, c int) create table foo_loading_source2 (a int, b int, c int) Is there an easier way to keep these 3 tables' DDL in sync? The loading_sourceX tables are just temporary-in-transit tables for data \copy'ied into the DB before being inserted into the main foo table. Since these are temporary tables, why don't you just create them on the fly as temporary tables? CREATE TEMPORARY TABLE foo_loading_source1 (LIKE foo); CREATE TEMPORARY TABLE foo_loading_source2 (LIKE foo); Then do your loading process. Then you don't really have to worry about maintaining the loading tables at all.
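A sketch of what one load cycle could look like with that approach ('source1.dat' is a placeholder path; ON COMMIT DROP makes the staging table clean itself up when the transaction ends):

```sql
BEGIN;
CREATE TEMPORARY TABLE foo_loading_source1 (LIKE foo) ON COMMIT DROP;
\copy foo_loading_source1 from 'source1.dat'
INSERT INTO foo SELECT * FROM foo_loading_source1;
COMMIT;
```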
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
> Unfortunately, folks like Phoenix are looking for yes/no answers, and > with many of these questions, the _correct_ answer is "it depends on > your workload" I merely wanted to simplify the advice dispensed on this list, which often sounds conflicting to novice ears like mine. So I appreciate your notes very much. > If you find that reindexing improves performance, then you should > investigate further. Depending on the exact nature of the problem, > there are many possible solutions, three that come to mind: > * Add RAM/SHM Can I add SHM merely by managing the entry in sysctl.conf? My current values: kernel.shmmax = 536870912 kernel.shmall = 536870912 My "shared_buffers" in postgresql.conf is "2". From the website http://www.desknow.com/kb/idx/12/061/article/ I notice that shmmax should be shared_buffers*8192, so I suppose my shmmax can be much lower than the above, but I raised it for performance. Am I wrong to do so? > * REINDEX on a regular schedule This is sadly not really feasible, because we need to offer a 100% availability website. REINDEX does not work concurrently so it is not really an option for us. My max_fsm_pages and max_fsm_relations are way above the numbers that come up after the VACUUM ANALYZE VERBOSE run. But still, the autovacuum stuff seems like it is not working at all. Some related entries in the conf file: autovacuum = on autovacuum_vacuum_cost_delay = 20 vacuum_cost_delay = 20 autovacuum_naptime = 30 stats_start_collector = on stats_row_level = on autovacuum_vacuum_threshold = 80 autovacuum_analyze_threshold = 80 And yet, the db often slows down, at which point I manually log in and run a manual VACUUM ANALYZE and it seems fine for some more time. Sometimes, I also restart pgsql and that seems to help for a while. Another piece of advice on these forums is to watch "vmstat 1", though without specifying how to draw inferences from it. 
The "vmstat 1" output is coming up at decent rates, as follows:

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff   cache   si   so    bi    bo   in   cs us sy id wa
 1  0  29124 110760 108980 3467736    0    1   206   140    0    4  2  1 85 12
 0  0  29124 110632 108980 3467736    0    0     0     0 1052  108  0  0 100 0
 2  0  29124 108840 108980 3467736    0    0     0     0 1112  299  1  1 98  0
 1  0  29124 109288 108980 3467736    0    0     0     0 1073  319  2  1 98  0

> * (with newer version) reduce the fill factor and REINDEX
I think some of my tables are updated very frequently so a smaller fill factor will be nice. How can I find the current fill factor on my tables? Also, is there some method or science to calculating a decent fill factor -- size of table, number of indexes, frequency of updates, and such? We have one major table which faces a lot of INSERTs and UPDATEs in a day (up to 10,000) but many many more SELECTs (up to 10 million). Thanks.
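On the "how can I find the current fill factor" question: in 8.2 a non-default setting is stored in pg_class.reloptions, and NULL there means the default is in effect (90 for B-tree indexes, 100 for tables). A sketch:

```sql
-- Relations with a non-default fillfactor; a NULL reloptions = default.
SELECT relname, relkind, reloptions
FROM pg_class
WHERE relkind IN ('r', 'i')
  AND reloptions IS NOT NULL;
```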
Re: [GENERAL] statements of an unfinished transaction
On Tue, Sep 18, 2007 at 02:04:32PM +0300, Sabin Coanda wrote: > Hi there, > > I'd like to find the sessions that provide unclosed transactions (i.e. > marked as in transaction). > > Is there any way to find the SQL statements that belong to such a transaction, or > the transaction start time, or any other helpful data ? pg_stat_activity has that, surely... Have a nice day, -- Martijn van Oosterhout <[EMAIL PROTECTED]> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to > litigate.
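For example, against 8.2's pg_stat_activity (note the view only shows the latest statement per backend, so for the full statement history of a transaction you would need statement logging):

```sql
-- Sessions sitting idle inside an open transaction.
SELECT procpid, usename, query_start, current_query
FROM pg_stat_activity
WHERE current_query = '<IDLE> in transaction'
ORDER BY query_start;
```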
[GENERAL] statements of an unfinished transaction
Hi there, I'd like to find the sessions that provide unclosed transactions (i.e. marked as in transaction). Is there any way to find the SQL statements that belong to such a transaction, or the transaction start time, or any other helpful data? TIA, Sabin
Re: [GENERAL] Recall: August Monthly techdata split fi
Ow Mun Heng wrote: On Tue, 2007-09-18 at 06:52 -0400, Geoffrey wrote: [EMAIL PROTECTED] wrote: Robert Wickert would like to recall the message, "August Monthly techdata split file printers for France and Denmark ". In my experience, attempting to 'recall' an email message is a fruitless endeavor. Seems to me that this is a 'Microsoft' creation. I really don't understand the purpose, because by the time you consider 'recalling' the email message, it's already sitting in 1000s of inboxes... It works for people that still have the message in their Exchange boxes. For those that don't, then... all is lost unfortunately. My point exactly. -- Until later, Geoffrey Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. - Benjamin Franklin
Re: [GENERAL] Recall: August Monthly techdata split fi
On 9/18/07, Geoffrey <[EMAIL PROTECTED]> wrote: > In my experience, attempting to 'recall' an email message is a fruitless > endeavor. Seems to me that this is a 'Microsoft' creation. I really > don't understand the purpose, because by the time you consider > 'recalling' the email message, it's already sitting in 1000s of inboxes... It's an Exchange "feature". It only works if you send the email within the Exchange domain. So, yeah, it's a brain-dead feature... ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] Recall: August Monthly techdata split fi
On Tue, 2007-09-18 at 06:52 -0400, Geoffrey wrote: > [EMAIL PROTECTED] wrote: > > Robert Wickert would like to recall the message, "August Monthly > > techdata split file printers for France and Denmark ". > > In my experience, attempting to 'recall' an email message is a fruitless > endeavor. Seems to me that this is a 'Microsoft' creation. I really > don't understand the purpose, because by the time you consider > 'recalling' the email message, it's already sitting in 1000s of inboxes... It works for people that still have the message in their Exchange boxes. For those that don't, then... all is lost unfortunately. Bitten...
Re: [GENERAL] Recall: August Monthly techdata split fi
[EMAIL PROTECTED] wrote: Robert Wickert would like to recall the message, "August Monthly techdata split file printers for France and Denmark ". In my experience, attempting to 'recall' an email message is a fruitless endeavor. Seems to me that this is a 'Microsoft' creation. I really don't understand the purpose, because by the time you consider 'recalling' the email message, it's already sitting in 1000s of inboxes... -- Until later, Geoffrey Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. - Benjamin Franklin
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
On 9/18/07, Ow Mun Heng <[EMAIL PROTECTED]> wrote: > On Tue, 2007-09-18 at 06:01 -0400, Bill Moran wrote: > > * (with newer version) reduce the fill factor and REINDEX > > What is fill factor? See "Index Storage Parameters": http://www.postgresql.org/docs/8.2/static/sql-createindex.html ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
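For the archives, what the setting pointed to above looks like in use (8.2 syntax; foo/foo_idx are placeholder names):

```sql
-- Leave 30% of each leaf page free for future inserts/updates.
CREATE INDEX foo_idx ON foo (a) WITH (fillfactor = 70);

-- Or change an existing index; the new value only takes effect on rebuild:
ALTER INDEX foo_idx SET (fillfactor = 70);
REINDEX INDEX foo_idx;
```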
[GENERAL] FW: August Monthly techdata split file p
ok they're both done -Original Message- From: Robert Wickert Sent: 18 September 2007 11:05 To: Gurvan Meyer Subject: RE: August Monthly techdata split file printers for France and Denmark the reports can always be found here .. \\147.114.32.180\monthly_sales_reports\ just find the year and month and you're off! denmark is finished and france is still being run -Original Message- From: Gurvan Meyer Sent: 18 September 2007 09:51 To: Robert Wickert Subject: RE: August Monthly techdata split file printers for France and Denmark And when it's done , could you please tell me where you stored it . thanks gurvan -Original Message- From: Robert Wickert Sent: September 18 2007 09:33 To: Gurvan Meyer Subject: RE: August Monthly techdata split file printers for France and Denmark sorry saw it in title .. will run right now -Original Message- From: Gurvan Meyer Sent: 17 September 2007 17:21 To: Robert Wickert Cc: Nathalie Bourgoin; Eric Pouget Subject: August Monthly techdata split file printers for France and Denmark Importance: High Hello Robert We need urgently August Monthly Techdata Split file printers for France and Denmark . Thanks a lot . Gurvan * Robert Wickert Senior Software Developer CONTEXT Tel:+44 (0)20 8394 7739 Fax:+44 (0)20 8394 7701 Email Address: [EMAIL PROTECTED] Web Site: http://www.contextworld.com * This message and the information contained therein is intended for the use of the person(s) ("the intended recipient(s)" to whom it is addressed. It may contain information that is privileged and confidential within the meaning of applicable law. If you are not the intended recipient, please contact the sender as soon possible. The views expressed in this communication may not necessarily be the views held by Context or its subsidiaries. The contents of an attachment to this e-mail may contain viruses that could damage your own computer system. 
While every reasonable precaution has been taken to minimise this risk, Context and its subsidiaries cannot accept liability for any damage which you sustain as a result of software viruses. You should carry out your own virus checks before opening the attachment. Please notice that Context monitors e-mails sent or received. Further communication will signify your consent to this. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] JOIN with ORDER on both tables does a sort when it souldn't
Tom Lane <[EMAIL PROTECTED]> wrote: > Dániel Dénes <[EMAIL PROTECTED]> writes: > > But even then, it won't realize that the result are in correct > > order, and does a sort! Why? > > In general the output of a nestloop doesn't derive any ordering > properties from the inner scan. It might happen to work in your > particular case because on the outer side (site_id, order) is unique > and so the "order" values must be strictly increasing. But if there > could be multiple rows with the same "order" value coming from the > outer side, then it would be incorrect to claim that the join output is > sorted by (outer.order, inner.order). > > It's possible that the planner could be taught to recognize this > situation, but it looks to me like doing that would result in drastic > increases in planning time for many queries (due to having to consider > a lot more Paths) with a resulting win in only a very few. > > regards, tom lane When you wrote this answer, I thought maybe it's really a one-time problem, and it's not worth spending much time on it, because the tables involved had 10-100 rows, so a sort wasn't really that scary; I just wanted to know the cause. But now I ran into this again. There are 2 tables involved (simplified): banners_places: - id integer (PKEY) - pageid integer (FKEY to a table not involved now) - place text UNIQUE KEY: (pageid, place) banners_show: - id integer (PKEY) - bplid integer (FKEY to banners_places.id) - uptime timestamp INDEX: (bplid, uptime) My query is: SELECT * FROM banners_places AS bpl JOIN banners_show AS bsh ON bsh.bplid = bpl.id WHERE bpl.pageid = 123 ORDER BY bpl.place, bsh.uptime To me it looks like the best plan would be to get the desired rows from banners_places and then do a NestLoop join using the index on banners_show. This way no sorting should be necessary. 
But even though I forced PG to do my plan (disabled almost every alternative), the sort is there:

Sort
  Sort Key: bpl.place, bsh.uptime
  ->  Nested Loop
        ->  Index Scan using bpl_UNIQUE on banners_places bpl
              Index Cond: (pageid = 123)
        ->  Index Scan using bsh_INDEX on banners_show bsh
              Index Cond: (bsh.bplid = "outer".id)

Are you sure this can't be fixed without drastically increasing planning time? Or is there a way I can make this query not do a sort? Regards, Denes Daniel
[GENERAL] Recall: August Monthly techdata split fi
Robert Wickert would like to recall the message, "August Monthly techdata split file printers for France and Denmark ". * Robert Wickert Senior Software Developer CONTEXT Tel:+44 (0)20 8394 7739 Fax:+44 (0)20 8394 7701 Email Address: [EMAIL PROTECTED] Web Site: http://www.contextworld.com *
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
On Tue, 2007-09-18 at 06:01 -0400, Bill Moran wrote: > * (with newer version) reduce the fill factor and REINDEX What is fill factor? ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[GENERAL] stability issues
Hi, I'm running a PostgreSQL 8.1.9 server on Debian etch (default installation, w/ PostGIS from the Debian repositories). The machine has dual Xeon 5130s, 4 GB of DDR2 ECC Reg. RAM and a two-SATA-disk RAID 0 array. I don't know the brand of the memory nor the HDDs. The RAID controller is a 3Ware 8006-2. Lately I've been confronted by some failures in several PostgreSQL clusters. 3 weeks ago a cluster crashed beyond repair while I was reindexing it. Last week we've been getting TOAST warnings and shared index problems on another cluster. I solved the shared index issue, but the TOAST trouble remained, in more than one database on that cluster. I dumped the databases, created another cluster and shut the troubled one down (problems were solved). I've asked our provider to do a filesystem check (e2fsck -f /dev/sda1), which indicated that "there's nothing wrong with the filesystem". This doesn't feel right. Is there anything else I should check? thx, WBL
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
"Filip Rembiałkowski" <[EMAIL PROTECTED]> wrote: > > 2007/9/18, Joshua D. Drake <[EMAIL PROTECTED]>: > > > If you overrun your max_fsm_pages then vacuum analyze IS NOT ENOUGH. If > > you do not overrun your max_fsm_pages, yes vacuum analyze can deal with > > the issue. > > Are you sure? I have a situation where above is no true. postgres > version 8.1.8. while vacuum verbose says: > > INFO: free space map contains 2329221 pages in 490 relations > DETAIL: A total of 2345744 page slots are in use (including overhead). > 2345744 page slots are required to track all free space. > Current limits are: 1000 page slots, 1000 relations, using 58698 KB. > > ... and we have constant problem with index bloat and need to REINDEX > frequently. > > the database is very redundant and has quite hight data retention rate > (it's an ecommerce site) I've been involved in a number of the discussions on this, and I think part of the confusion stems from the fact that "index bloat" is an ambiguous term. If the index gets large enough that it no longer fits in shared memory, and reindexing it will reduce its size to where it _will_ fit in shared memory, then the index _could_ be said to be "bloated". However, an equally valid solution to that problem is to increase the amount of shared memory available (possibly by adding RAM). Unfortunately, folks like Phoenix are looking for yes/no answers, and with many of these questions, the _correct_ answer is "it depends on your workload" If you find that reindexing improves performance, then you should investigate further. Depending on the exact nature of the problem, there are many possible solutions, three that come to mind: * Add RAM/SHM * REINDEX on a regular schedule * (with newer version) reduce the fill factor and REINDEX -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
2007/9/18, Joshua D. Drake <[EMAIL PROTECTED]>: > If you overrun your max_fsm_pages then vacuum analyze IS NOT ENOUGH. If > you do not overrun your max_fsm_pages, yes vacuum analyze can deal with > the issue. Are you sure? I have a situation where the above is not true. postgres version 8.1.8. vacuum verbose says: INFO: free space map contains 2329221 pages in 490 relations DETAIL: A total of 2345744 page slots are in use (including overhead). 2345744 page slots are required to track all free space. Current limits are: 1000 page slots, 1000 relations, using 58698 KB. ... and we have a constant problem with index bloat and need to REINDEX frequently. The database is very redundant and has quite a high data retention rate (it's an ecommerce site) -- Filip Rembiałkowski
Re: [GENERAL] help w/ SRF function
On 9/17/07, Ow Mun Heng <[EMAIL PROTECTED]> wrote: > > > CREATE OR REPLACE FUNCTION foo_func(fromdate timestamp, todate > > > timestamp, code text) > > > LANGUAGE 'sql' IMMUTABLE STRICT; > > But If I were to use ALIASINg, I get an error > > > > eg: DECLARE > > DECLARE > > fromdate ALIAS for $1; > > todate ALIAS for $2; > > code ALIAS for $3; > > > > > > ERROR: syntax error at or near "ALIAS" > > LINE 5: fromdate ALIAS for $1; > anyone knows how come I can't use the reference fromdate/todate etc or > use aliases but have to resort to using $1/$2 etc? You seem to be confusing SQL with PL/pgSQL. If you want variables, aliases, flow control etc instead of a simple macro, you need to use a procedural language. http://www.postgresql.org/docs/8.2/static/xfunc-sql.html http://www.postgresql.org/docs/8.2/static/plpgsql.html ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
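A sketch of the same SRF moved to PL/pgSQL, where named parameters do work directly in the body. Note the p_ prefixes: in PL/pgSQL of this era, parameter names are substituted textually, so a parameter named "code" would collide with the column of the same name. The table and column names follow the original post:

```sql
CREATE OR REPLACE FUNCTION foo_func(p_fromdate timestamp, p_todate timestamp, p_code text)
RETURNS SETOF foo AS
$BODY$
DECLARE
  r foo%ROWTYPE;
BEGIN
  -- RETURN NEXT inside a loop is the pre-8.3 way to build a result set.
  FOR r IN
    SELECT trh.id, trh.data1, trh.data2
    FROM d
    JOIN ts  ON ts.id = d.id
    JOIN trh ON ts.id = trh.id
    WHERE d.start_timestamp BETWEEN p_fromdate AND p_todate
      AND d.code = p_code
  LOOP
    RETURN NEXT r;
  END LOOP;
  RETURN;
END;
$BODY$
LANGUAGE plpgsql STABLE;
```

STABLE rather than IMMUTABLE, since the function reads from tables.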
Re: [GENERAL] keeping 3 tables in sync w/ each other
2007/9/18, Ow Mun Heng <[EMAIL PROTECTED]>: > Hi, > > I have 3 tables > > foo > foo_loading_source1 > foo_loading_source2 > > which is something like > > create table foo (a int, b int, c int) > create table foo_loading_source1 (a int, b int, c int) > create table foo_loading_source2 (a int, b int, c int) > > Is there a way which can be made easier to keep these 3 tables DDL in > sync? > > the loading_sourceX tables are just a temporary-in-transit table for > data \copy'ied into the DB before being inserted into the main foo > table. > > Currently, each time I add a new column to foo, I have to "remember" to > add the same to the other 2 table. > > Can I use inheritance? References? Inheritance might work in this case. But it will be a bit weird, because you will see non-constraint data in parent unless you will SELECT ... FROM ONLY parent Try this example: create table parent ( id serial, data1 text ); create table child () inherits( parent ); \d child alter table only parent add check ( data1 like '%fits parent' ); insert into parent(data1) select 'this data fits parent'; insert into child(data1) select 'this data was inserted to child'; select * from parent; select * from only parent; select * from child; alter table parent add column data2 text default 'new column default'; \d child -- Filip Rembiałkowski ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] pgcrypto: is an IV needed with pgp_sym_encrypt()?
On 9/18/07, Bill Moseley <[EMAIL PROTECTED]> wrote: > I'm just starting with pgcrypto, and I'm curious if it's > needed/recommended to use an initialization vector/value (IV) with > the pgp_sym_encrypt() function. > > The docs hint that an IV is used automatically, but encrypting plain > text that starts the same seems to result in initial common cipher > text. So, I'm not clear. A few bytes being the same is normal. Those are the PGP packet header, telling "this is a symmetrically encrypted session key packet, with length X" plus some more details. You can use pgpdump (or www.pgpdump.net) to visualize the packet structure. It does not show you the IV, but does show the salt and S2K loop count, so you can check if those are randomized. Note that the random IV will be put into the second packet, so quite far from the start. > 2. Data is prefixed with block of random bytes. This is equal to > using random IV. > > So, I'm currently generating a substring of a md5 hash of a few items > and pre-pending that to the plain text I need to encrypt as the IV. > Then when I decrypt I remove that prefix. You could try with different (same-length) hashes; you'll still see that a few bytes are the same. Also, the PGP IV _must_ be there, so to check you can always try decrypting with gnupg, to see if the packet structure is sane. If more than a few bytes are the same, and if the salt is not randomized, it _could_ be a sign of a problem: either a pgcrypto bug or a failure to initialize the random generator. If you suspect a problem, please send me a few example encryptions with keys and your setup details (postgres version, openssl or not, os version) > BTW, this is for credit card storage, which is a business requirement. > > Besides following the PCI DSS and external audit procedures, the plan > is to use pgcrypto (pgp_sym_encrypt() with AES-256) as part of a > credit card storage server. 
The server and db are SSL only and the > key is passed from the application and never stored anyplace (except > in memcached on other servers during the session). The key is a > user's plain text password plus an application-specific secret. So, > each row has its own key. Passwords must be changed periodically, > etc. I don't know the details of your setup, but I strongly suggest you look into using public-key crypto. That allows you to use separate keys for encryption and decryption. So on the webserver, where users only input credit cards, you keep only public keys, so anybody cracking it won't be able to decrypt the data. Also, if you only want to check whether an inputted credit card matches the stored one, you don't need to store the credit card at all; just store a hash and then compare it with the hash of the user-inputted one. For that it's preferable to use the crypt() function with a crypt-blowfish hash, which is a couple of orders of magnitude stronger than MD5/SHA* for that purpose. > I'd welcome any comments or recommendations from others that have > implemented something similar. -- marko
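Sketches of both suggestions (the table, column names, key material and card number are all placeholders; both need the pgcrypto contrib module installed):

```sql
-- Public-key encryption: the webserver only ever holds the public key,
-- so a compromise there cannot decrypt previously stored data.
-- INSERT INTO cards (cc_enc)
--   VALUES (pgp_pub_encrypt('4111111111111111',
--                           dearmor('<ascii-armored public key block>')));
-- Decryption happens on a separate, locked-down host holding the
-- private key: pgp_pub_decrypt(cc_enc, dearmor(privkey), 'passphrase')

-- Match-only storage: keep a crypt-blowfish hash instead of the number.
INSERT INTO cards (cc_hash)
  VALUES (crypt('4111111111111111', gen_salt('bf', 8)));

-- Verify a user-supplied number against the stored hash:
SELECT 1 FROM cards
WHERE cc_hash = crypt('4111111111111111', cc_hash);
```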
[GENERAL] RFC : best way to distrubute IO from queries (low end server)
Final specs for the server are just an ordinary desktop fitted w/ 3 7200rpm 500GB drives & 1 7200rpm 80GB drive / 1 GB RAM / 2G processor (single core). The number of records will be between 3 and 30 million rows. Currently the process is 1. pull from mssql 2. \copy into PG temp table 3. insert into final table. Current tables are distributed via tablespaces. (The current test server is my laptop w/ 2 5400rpm drives, hda & hdc.) and I'm already seeing the strain of the concurrent select/delete/insert/update and the additional "client" pull. So, I wanted to ask the list for advice on how to proceed with regard to the server. Scenario 1: 1. temp table in its own tablespace 2. final tables in their own tablespace 3. pg_xlog in the OS tablespace. Scenario 2: 1. temp table in its own tablespace 2. final tables in their own tablespace (read-only copy) 3. final tables in their own tablespace (read-write copy) 4. pg_xlog in the OS tablespace. The idea of a read-only and a read/write copy is obtained from some presentation/article I read, whereby all updates of new data are inserted into the read/write copy up until some point (say lunch time / 5pm etc), whereby the read-write copy [3] will be "renamed" and made into a read-only copy and the previous read-only copy [2] will be made into a read-write copy. The only thing I can't wrap my head around is how to keep these 2 copies in sync. E.g.: when everything is being updated to [3] and users are querying [2], then at the switch-over instant, how will the new data be input into the read-only copy? Because if I switch back and forth like this, there's bound to be gaps in the data. Any comments appreciated.