Re: [GENERAL] Strange discrepancy in query performance...
Tom-right-as-usual: Yep - you were right that the query plan for the prepared statement (a sequential scan of the table) differed a bit from the directly-executed version :)

For reference, when using JasperReports .jrxml files as the basis for the query, I only had to do the following to 'force' postgres to treat the jasper report parameter as a number and not text, thereby allowing the correct index to be used:

    select * from city_summary where city_master_id = $P{city_master_id}::bigint ...

Query times went from 300+ seconds back down to ~100ms.

-jason

Tom Lane wrote:
> "Jason L. Buberel" <[EMAIL PROTECTED]> writes:
>> In my syslog output, I see entries indicating that the JDBC-driver-originated
>> query on a table named 'city_summary' are taking upwards of 300 seconds:
>>
>> Oct 1 18:27:47 srv3 postgres-8.2[1625]: [12-1] LOG: duration: 307077.037 ms
>>   execute S_42: select * from city_summary where state = $1 and
>>   city_master_id = $2 and res_type = 'single_family' and date = $3 and
>>   range = 90 and zip = $4 and quartile = '__ALL'
>>   DETAIL: parameters: $1 = 'CA', $2 = '291', $3 = '2007-09-28', $4 = '__ALL'
>>
>> However, if I run the same query on the same host at the same time that the
>> Java application is running, but from the psql command line, it takes only
>> 0.37 seconds:
>>
>>   time /opt/postgres-8.2.4/bin/psql --port 54824 -U postgres -d altos_research \
>>     -c 'select fact_id from city_summary where state = \'CA\' and
>>     city_master_id = 291 and zip = \'__ALL\' and quartile = \'__ALL\' and
>>     res_type = \'single_family\' and range = \'90\' and date = \'2007-09-28\';'
>
> This is not, in fact, the same query --- the JDBC-originated one is
> parameterized, which means it very possibly has a different plan (since the
> planner doesn't know the particular values to plan for). Try using PREPARE
> and EXPLAIN EXECUTE to examine the plan that is being produced for the
> parameterized query.
>
> regards, tom lane
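The same fix, sketched as a standalone server-side prepared statement (table and parameter names taken from the thread; this is an illustrative sketch, not Jason's exact code). Without the cast, the parameter arrives as text and the bigint index on city_master_id cannot be used; the explicit ::bigint restores it:

```sql
-- Cast the text parameter to bigint so the planner can match the
-- bigint index column; without it, the comparison degrades to a
-- sequential scan, as seen in the original log.
PREPARE city_q(text) AS
    SELECT * FROM city_summary
    WHERE city_master_id = $1::bigint;

EXPLAIN EXECUTE city_q('291');   -- should show an index scan, not a seq scan
```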
Re: [GENERAL] Strange discrepancy in query performance...
"Jason L. Buberel" <[EMAIL PROTECTED]> writes:
> In my syslog output, I see entries indicating that the JDBC-driver-originated
> query on a table named 'city_summary' are taking upwards of 300 seconds:
>
> Oct 1 18:27:47 srv3 postgres-8.2[1625]: [12-1] LOG: duration: 307077.037 ms
>   execute S_42: select * from city_summary where state = $1 and
>   city_master_id = $2 and res_type = 'single_family' and date = $3 and
>   range = 90 and zip = $4 and quartile = '__ALL'
>   DETAIL: parameters: $1 = 'CA', $2 = '291', $3 = '2007-09-28', $4 = '__ALL'
>
> However, if I run the same query on the same host at the same time that the
> Java application is running, but from the psql command line, it takes only
> 0.37 seconds:
>
>> time /opt/postgres-8.2.4/bin/psql --port 54824 -U postgres -d altos_research \
>>   -c 'select fact_id from city_summary where state = \'CA\' and
>>   city_master_id = 291 and zip = \'__ALL\' and quartile = \'__ALL\' and
>>   res_type = \'single_family\' and range = \'90\' and date = \'2007-09-28\';'

This is not, in fact, the same query --- the JDBC-originated one is parameterized, which means it very possibly has a different plan (since the planner doesn't know the particular values to plan for). Try using PREPARE and EXPLAIN EXECUTE to examine the plan that is being produced for the parameterized query.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your message
can get through to the mailing list cleanly
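Tom's PREPARE / EXPLAIN EXECUTE suggestion, sketched against the query from the log. The parameter types are guessed as text, matching what the JDBC driver apparently sent; on 8.2 this prepares, while newer releases may reject the text-vs-bigint comparison outright, which is itself the clue:

```sql
-- Recreate the JDBC statement S_42 by hand and inspect its generic plan.
PREPARE s42(text, text, text, text) AS
    SELECT * FROM city_summary
    WHERE state = $1 AND city_master_id = $2
      AND res_type = 'single_family' AND date = $3
      AND range = 90 AND zip = $4 AND quartile = '__ALL';

-- EXPLAIN EXECUTE shows the plan the server actually uses for the
-- parameterized form, which can differ from a plain EXPLAIN with literals.
EXPLAIN EXECUTE s42('CA', '291', '2007-09-28', '__ALL');
```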
[GENERAL] Strange discrepancy in query performance...
I'm hoping that someone on the list can help me understand an apparent discrepancy in the performance information that I'm collecting on a particularly troublesome query.

The configuration: pg-8.2.4 on RHEL4, log_min_duration_statement = 1m.

In my syslog output, I see entries indicating that the JDBC-driver-originated query on a table named 'city_summary' are taking upwards of 300 seconds:

Oct 1 18:27:47 srv3 postgres-8.2[1625]: [12-1] LOG: duration: 307077.037 ms
  execute S_42: select * from city_summary where state = $1 and
  city_master_id = $2 and res_type = 'single_family' and date = $3 and
  range = 90 and zip = $4 and quartile = '__ALL'
  DETAIL: parameters: $1 = 'CA', $2 = '291', $3 = '2007-09-28', $4 = '__ALL'

However, if I run the same query on the same host at the same time that the Java application is running, but from the psql command line, it takes only 0.37 seconds:

> time /opt/postgres-8.2.4/bin/psql --port 54824 -U postgres -d altos_research \
    -c 'select fact_id from city_summary where state = \'CA\' and
    city_master_id = 291 and zip = \'__ALL\' and quartile = \'__ALL\' and
    res_type = \'single_family\' and range = \'90\' and date = \'2007-09-28\';'

 fact_id
----------
 46624925
(1 row)

0.00user 0.00system 0:00.37elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+285minor)pagefaults 0swaps

The output of 'explain' seems to indicate that the right index is being used:

                                  QUERY PLAN
 Index Scan using city_summary_pkey on city_summary
     (cost=0.00..12.27 rows=1 width=2721)
   Index Cond: ((date = '2007-09-28'::text) AND (state = 'CA'::text) AND
     (city_master_id = 334::bigint) AND (quartile = '__ALL'::text) AND
     (range = '90'::text))
   Filter: ((zip = '__ALL'::text) AND ((res_type)::text = 'single_family'::text))
(3 rows)

The index looks like this:

# \d city_summary_pkey
Index "public.city_summary_pkey"
       Column       |  Type
--------------------+---------
 date               | text
 state              | text
 city_master_id     | bigint
 zip_master_id      | integer
 res_type_master_id | bigint
 quartile           | text
 range              | text
primary key, btree, for table "public.city_summary"

Any ideas on why I am seeing such a big difference between the two measurements (JDBC/syslog vs. command line)?

Thanks,
Jason
Re: [GENERAL] windows and pg 8.2 (change database to another server)
Magnus Hagander wrote:
> On Sun, Sep 30, 2007 at 11:36:23PM +0200, Terry Yapt wrote:
>> First of all, I think this method is admissible. Isn't it?
> It is.

Glad to read it :-)

>> And second question: I think my problem is that some rights are wrong after
>> copying the data folder. What are the right permissions to apply to the
>> data folder?
> Yes, most likely. You need to grant the postgres service account "Change"
> permissions (or Full Control, but Change is recommended) on the data
> directory. If you didn't do anything specifically, it will just have
> inherited from further up in the tree, which means that the service account
> only has "Read" access.
> //Magnus

I have tried a couple of combinations; none of them was successful.

I have tried to assign 'Full Control' to the data folder and its sub-folders and files, varying this 'Full Control' both preserving inheritance and deleting inheritance. I have tried to assign 'Full Control' to the Administrators and SYSTEM accounts/groups too. I have tried to do the same thing over sub-folders, files and so on. Nothing was ok.

I have got these application errors (in reverse chronological order, 1 = most recent):

===== 1 =====
postgres: could not find the database system
Expected to find it in the directory "C:/Archivos de programa/PostgreSQL/8.2/data",
but could not open file "C:/Archivos de programa/PostgreSQL/8.2/data/global/pg_control": Permission denied
===== 2 =====
2007-10-01 23:19:31 PANIC: could not open control file "global/pg_control": Permission denied
===== 3 =====
2007-10-01 23:17:28 FATAL: could not create lock file "postmaster.pid": Permission denied

Greetings...
Re: [GENERAL] Dump of table structure is not consistent
\d my.table is showing me the primary key as:

    "unitstat_pkey" PRIMARY KEY, btree (id)

But looking into the table structure produced by

    pg_dump -s -n my -t table db

I'm getting the primary key shown as:

    ADD CONSTRAINT unistat_pkey PRIMARY KEY (id);

That had been the name of it, but I altered it by executing:

    ALTER TABLE unistat_pkey RENAME TO unitstat_pkey

Solved it with this statement:

    update pg_catalog.pg_constraint set conname = 'unitstat_pkey'
    where conname = 'unistat_pkey'

Poul
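For the record, the ALTER TABLE ... RENAME above only renamed the index, not the constraint that owns it, which is why \d and pg_dump disagreed; the direct pg_catalog update works but bypasses the normal dependency checks. On PostgreSQL 9.2 and later (not available in 2007), the supported route renames both in one step. The table name below is hypothetical:

```sql
-- Renames the constraint and its backing index together (9.2+).
ALTER TABLE my.unitstat RENAME CONSTRAINT unistat_pkey TO unitstat_pkey;
```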
Re: [GENERAL] sha1 function
On 01/10/2007, Martin Marques <[EMAIL PROTECTED]> wrote:
> 1) Is MD5's weakness true?

Yes, but not really for use in password functionality. You are very unlikely to repeat a password, but in any case you will have the user ID to make it unique.

> 2) Is there any sha1() function in PostgreSQL?

It doesn't come with the default setup, but it's very simple to get. Take a look at this: http://snipr.com/pg_sha1
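One common route besides the linked snippet is the contrib/pgcrypto module, which ships with the PostgreSQL sources (installed separately from the server itself). A sketch using its digest() function:

```sql
-- digest() returns bytea; encode(..., 'hex') yields the familiar hex string.
SELECT encode(digest('my-password', 'sha1'), 'hex');

-- For password storage, a salted digest is the usual pattern
-- ('salt' here is a placeholder for a per-user random value):
SELECT encode(digest('salt' || 'my-password', 'sha1'), 'hex');
```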
Re: [GENERAL] Partitioned table limitation
On 10/1/07, Goboxe <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Are there any limitations on the number of child tables that can be used in
> a partitioned table?
>
> I currently have weekly partitioned tables (using partitioned views in SQL
> Server) that I keep for 2 years. In total, there will be 52 * 2 = 104
> tables at one time in the partition.
>
> I am migrating from SQL Server to pg. Can pg support that number of tables?
>
> How about if I want to convert it to daily tables (365 * 2 years = 730
> tables)? Is this recommended?

I've played around with as many as 1,000 child tables. By then, the planning time becomes noticeably longer than for a single table, but the response time is still so much faster that it's worth it. Note I'm talking only a fraction of a second of planning time, even at 1,000 tables.

If you are going over 100 tables, make sure you're using triggers for updating the child tables, not rules, as rules are far too slow when there are over 50 or so tables to choose from.
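A minimal sketch of the trigger-based routing described above, for one weekly child table (all table, column and date-range names here are invented for illustration):

```sql
-- Parent table plus one weekly partition constrained by CHECK.
CREATE TABLE measurements (logdate date NOT NULL, payload text);
CREATE TABLE measurements_2007w40 (
    CHECK (logdate >= '2007-10-01' AND logdate < '2007-10-08')
) INHERITS (measurements);

-- Routing trigger: redirect inserts on the parent into the right child.
CREATE OR REPLACE FUNCTION measurements_insert() RETURNS trigger AS $$
BEGIN
    IF NEW.logdate >= '2007-10-01' AND NEW.logdate < '2007-10-08' THEN
        INSERT INTO measurements_2007w40 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'no partition for date %', NEW.logdate;
    END IF;
    RETURN NULL;  -- row already redirected; suppress insert into the parent
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER measurements_route BEFORE INSERT ON measurements
    FOR EACH ROW EXECUTE PROCEDURE measurements_insert();
```

In practice the IF ladder (or a generated function body) grows one branch per active partition.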
Re: [GENERAL] Porting Schema from MySQL
On Oct 1, 2007, at 14:54 , Farhan Khan wrote:
> Any pointer for implementing this functionality in postgresql?

Write a trigger that fires on update and replaces the value with CURRENT_TIMESTAMP. You could probably do something like this with rules as well, but the trigger method is probably more straightforward.

Michael Glaesemann
grzm seespotcode net
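A sketch of the trigger Michael describes, emulating MySQL's ON UPDATE CURRENT_TIMESTAMP (the table and trigger names are placeholders; the mtime column comes from the original question):

```sql
-- Overwrite mtime on every UPDATE, regardless of what the client set.
CREATE OR REPLACE FUNCTION touch_mtime() RETURNS trigger AS $$
BEGIN
    NEW.mtime := CURRENT_TIMESTAMP;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER set_mtime BEFORE UPDATE ON my_table
    FOR EACH ROW EXECUTE PROCEDURE touch_mtime();
```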
Re: [GENERAL] It's time to support GRANT SELECT,UPDATE,...,...,... ON database.* to username
On 9/30/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> I don't care if it's part of the SQL standard or not. I don't care if
> oracle does it or not. You're losing mysql converts as they go through the
> tutorial and get to this point.

If that's all it takes for them to switch, seriously, I'd rather see them go. There are other solutions, like using a role instead of a user for the target of the permissions.

> There's like a hundred posts asking for this for the past four years, and
> all they get in response is standards nazis saying it won't be supported
> because it's not in the standard

Ummm, no. I've never seen that response. I have seen plenty of people saying that no one's had the urge to hack it into working code. Also, a common answer is to use roles (i.e. groups) for such things: grant permissions to the role, add users to the role, all done.

> and telling them to write their own functions.

About 75% of the time I see that response, it comes with the actual code to do just that. I.e. cut and paste and voila, you've got the functions.

> You write the function. Fuck the standard and wake up.

Me? What about you? The fact is there's a limited number of hackers capable of writing what you're asking for cleanly and correctly, and they're working on other stuff. Asking them politely has been known to work. Using the F word, not so much.
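For reference, the role-based approach mentioned above looks roughly like this (role, table and user names are invented for illustration):

```sql
-- Grant once to a role, then manage membership instead of per-user grants.
CREATE ROLE readers;
GRANT SELECT ON mytable1 TO readers;
GRANT SELECT ON mytable2 TO readers;
GRANT readers TO alice;

-- To cover a whole schema, generate the GRANT statements from the
-- catalogs and run the output:
SELECT 'GRANT SELECT ON ' || schemaname || '.' || tablename || ' TO readers;'
FROM pg_tables
WHERE schemaname = 'public';
```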
[GENERAL] Porting Schema from MySQL
Hi ... I am porting a db schema from MySQL to postgresql and having problem in defining a column level constraint... mtime TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, gives me problem at ^ON UPDATE ... Any pointer for implementing this functionality in postgresql ?? Tx .. Farhan
Re: [GENERAL] Find clusters containing a schema?
Josh Trutwin wrote:
> Is it possible to somehow query the system catalog to find out which
> clusters/databases have a certain schema?

No, unless you connect to each database in turn.

-- 
Alvaro Herrera                          http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
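Since the catalogs are per-database, the check has to loop from outside the server, e.g. with a small shell wrapper (connection options and the schema name are placeholders):

```shell
# Print every database in the cluster that contains a schema "myschema".
for db in $(psql -At -c "SELECT datname FROM pg_database WHERE datallowconn"); do
    found=$(psql -At -d "$db" \
        -c "SELECT 1 FROM pg_namespace WHERE nspname = 'myschema'")
    [ "$found" = "1" ] && echo "$db"
done
```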
Re: [GENERAL] Data cube in PostgreSQL
Hi,

On Wednesday 26 September 2007 20:58:38, Gowrishankar wrote:
> Is there any other way of extending postgresql to include cubes?

Something like the cube contrib? http://developer.postgresql.org/cvsweb.cgi/pgsql/contrib/cube/

Contribs are part of the PostgreSQL sources and maintained as such, but not included in the 'core' project. They are certainly distributed along with PostgreSQL in your packaging system of choice.

Regards,
-- 
dim
Re: [GENERAL] importing large files
Hi,

On Friday 28 September 2007 10:22:49, [EMAIL PROTECTED] wrote:
> I need to import between 100 million and one billion records into a table.
> Each record is composed of two char(16) fields. Input format is a huge csv
> file. I am running on a linux box with 4gb of ram.
> First I create the table. Second I 'copy from' the csv file. Third I
> create the index on the first field.
> The overall process takes several hours. The cpu seems to be the
> limitation, not the memory or the IO.
> Are there any tips to improve the speed?

If you don't need to fire any triggers and trust the input data, then you may benefit from the pgbulkload project: http://pgbulkload.projects.postgresql.org/

The "conditions of usage" may be lighter than what I think they are, though.

Regards,
-- 
dim
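Independent of pgbulkload, the poster's own COPY-then-index sequence can be tuned a little; the index build is typically the CPU-bound step, and raising maintenance_work_mem for that session helps it sort in memory. Table, column and file names below are placeholders:

```sql
-- Load first, index afterwards: building the index on a full table is
-- much faster than maintaining it row-by-row during the load.
CREATE TABLE pairs (a char(16), b char(16));
COPY pairs FROM '/path/to/input.csv' WITH CSV;

-- Give the index build plenty of sort memory (session-local setting).
SET maintenance_work_mem = '1GB';
CREATE INDEX pairs_a_idx ON pairs (a);
ANALYZE pairs;
```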
Re: [GENERAL] PostgreSQL Conference Fall 2007, final schedule
Matthew T. O'Connor wrote:
> Joshua D. Drake wrote:
>> The PostgreSQL Conference Fall 2007 to date, has been a huge success. Of
>> course we haven't actually held the conference yet, but already we have a
>> strong line of speakers and sponsors confirmed.
> [ snip ]
>
> I can't attend, but wish I could; is there going to be a web cast? Or some
> way to watch it later / after the fact?

We are planning on recording all of the talks.

Joshua D. Drake

-- 
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997   http://www.commandprompt.com/
UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
Re: [GENERAL] PostgreSQL Conference Fall 2007, final schedule
Joshua D. Drake wrote:
> The PostgreSQL Conference Fall 2007 to date, has been a huge success. Of
> course we haven't actually held the conference yet but already we have a
> strong line of speakers and sponsors confirmed.
[ snip ]

I can't attend, but wish I could; is there going to be a web cast? Or some way to watch it later / after the fact?

Thanks,

Matt
Re: [GENERAL] usage of indexes for inner joins
thanks for all your useful comments. i will study all of them. a couple of inline comments below, just for clarification to the group, marked with asterisks.

On Mon, 1 Oct 2007 13:13:23 -0500, "Scott Marlowe" <[EMAIL PROTECTED]> said:
> On 10/1/07, Jan Theodore Galkowski <[EMAIL PROTECTED]> wrote:
>> Scott,
>>
>> i didn't think this belonged in the general list, but the example i gave
>> for discussion was a toy, for illustration. i could not very well post the
>> actual example for many reasons, including proprietary ones and, given
>> this is how things work, because the 1.5 million row table in question is
>> its own smallest description.
>
> This is the exact kind of question that belongs on -general. But it does
> get asked a lot, especially by people coming from other databases.
>
>> while indexes are being used on that table, there's a companion table
>> which is much smaller -- a "mere" 75000 rows -- which is suffering a
>> sequential scan, and i was trying to eliminate those.
>
> Well, don't think of sequential scans as plain bad. Sometimes they're the
> best choice, sometimes they're not.
>
> Now, if an index scan is provably and noticeably faster than the
> sequential scan, then the planner is making the wrong decision. Have you
> tried running your query with
>
>     set enable_seqscan=off;

*** actually, yes. the engine just ignored it. ***

> to see how it behaves? I've found many somewhat slow queries got really
> fast or really slow when I did that.
>
> Note that you shouldn't blindly run a query all the time with that
> setting, as there are many instances where seqscan is the right answer.
> Also, your explain cost estimates will all be way off.
>
>> perhaps it is true that ANALYZE isn't being done often enough. perhaps
>> VACUUMs aren't being done often enough either. we're leery of scheduling
>> repeated VACUUMs having encountered a case where the VACUUM took over an
>> hour to complete.
>
> Run "vacuum verbose" on your db and see what it says about the number of
> page slots needed versus used. That will help you tell if you're vacuuming
> enough.
>
> How long vacuum takes isn't really that important. What is important is
> how much of an impact it's having on the system. There are several vacuum
> parameters in the postgresql.conf file that can lower the impact vacuum
> has on your system I/O-wise while increasing its run time.
>
> Vacuum full is another story. Think of it as a recovery tool, not a
> periodic maintenance tool.
>
>> it may, too, be because the tables use user-defined types heavily and the
>> original UPDATE involved a SELECT ... IN ... having a GROUP BY with a few
>> references to columns deep within user-defined types.
>
> Hard to say without a query and an explain analyze output. It's common for
> user-defined functions to produce estimates in the planner that are way
> off. User-defined types, not so much. But the more complex the query, the
> more likely it is that the query planner will make a bad estimate of the
> number of rows somewhere and choose a bad method.
>
>> that wouldn't have been my choice, but, then, they were given to me to
>> work, not my design. in fact, PG is the first relational database
>> implementation i've used that offered such things in a big way.
>
> Extensibility is quite a useful tool.
>
>> i also don't understand some other things, which are surprising, like why
>> some UPDATEs take so much longer when wrapped in a BEGIN
>> TRANSACTION-COMMIT than when having the transaction at a statement level.
>
> that is strange. I'd expect that maybe you've got something happening with
> the transaction waiting on other transactions, so that it's not so much
> running hard as just tapping its toe waiting for the other transaction to
> commit or roll back.

*** yes, i thought it was odd, too. there wasn't anything else in that transaction, and the table was set up for an experiment. of course, the experiment was one of those "UPDATE foo SET foo.x = 1 + foo.x WHERE foo.y < k" things. ***

>> I come from an Oracle, DB2, Informix world, and in my experience plans
>> for queries are more stable. i have loitered in and around MySQL for a
>> while. i'm not surprised there's a learning curve with PG. i am surprised
>> it breaks so markedly with mainstay database experience.
>
> Oh, I've seen Oracle get stupid due to lack of proper statistics as well.
> You likely had a good DBA who kept all that stuff hidden from you though.

No comment, actually, since I worked right alongside the DBA, and sometimes did things myself.

> But PostgreSQL and mainline db experience are often incompatible. The very
> things that people are used to using to make other dbs fast (forcing index
> usage for instance) can make postgresql noticeably slower.
>
> You might find that partial indexes help in some circumstances. If you are
> using a query that has a where clause that looks at a field that
[GENERAL] PostgreSQL Conference Fall 2007, final schedule
Hello,

The PostgreSQL Conference Fall 2007, to date, has been a huge success. Of course we haven't actually held the conference yet, but already we have a strong line of speakers and sponsors confirmed. If you haven't registered, now is your chance; just visit: http://www.postgresqlconference.org

Schedule:

* 8:00 - 8:45 - Coffee / Breakfast / Social / Wake up / Go back to hotel for socks (provided by conference)
* 8:45 - 9:00 - Joshua Drake - A word from our sponsors
* 9:00 - 9:25 - JoshB - Welcome to 8.3
* 9:25 - 10:20 - David Wheeler - Web 2.0 (Rails) applications with PostgreSQL
* -- 10 minute break --
* 10:30 - 11:20 - Robert Hodges - Scaling PostgreSQL Performance with uni/cluster
* 11:20 - 12:10 - Neil Conway - Understanding Query Execution in PostgreSQL
* 12:10 - 13:15 - Lunch
* 13:15 - 13:45 - Mark Wong - PostgreSQL Performance
* 13:45 - 14:15 - Joshua Drake - PL/Proxy and Horizontal Scaling
* 14:15 - 15:05 - Web Sprague - PostGIS (geographic database)
* -- 10 minute break --
* 15:15 - 16:05 - David Fetter - Babel of procedural languages
* 16:05 - 17:00 - Robert Treat - PostgreSQL Partitioning, semantics, pitfalls and implementation
* 17:00 - 17:25 - Josh Berkus - Stupid Solaris tricks
* 17:25 - 17:30 - Closing Remarks, Thanks, Where's the party?
* 17:30 - 18:00 - Get to party/dinner (provided by conference)
* 18:00 -- Dinner/Party till they kick us out

And once again, thanks to our sponsors:

Command Prompt: http://www.commandprompt.com/
Continuent: http://www.continuent.com/
EnterpriseDB: http://www.enterprisedb.com/
Greenplum: http://www.greenplum.com/
OmniTI: http://www.omniti.com/
OTG: http://www.otg-nc.com/
Sun: http://www.sun.com/
Truviso: http://www.truviso.com/

Other Sponsors:

PDXPUG: http://pugs.postgresql.org/pdx
PSU: http://www.pdx.edu

Sincerely,
Joshua D. Drake

-- 
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997   http://www.commandprompt.com/
UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
Re: [GENERAL] usage of indexes for inner joins
On 10/1/07, Jan Theodore Galkowski <[EMAIL PROTECTED]> wrote:
> Scott,
>
> i didn't think this belonged in the general list, but the example i gave
> for discussion was a toy, for illustration. i could not very well post the
> actual example for many reasons, including proprietary ones and, given this
> is how things work, because the 1.5 million row table in question is its
> own smallest description.

This is the exact kind of question that belongs on -general. But it does get asked a lot, especially by people coming from other databases.

> while indexes are being used on that table, there's a companion table which
> is much smaller -- a "mere" 75000 rows -- which is suffering a sequential
> scan, and i was trying to eliminate those.

Well, don't think of sequential scans as plain bad. Sometimes they're the best choice, sometimes they're not.

Now, if an index scan is provably and noticeably faster than the sequential scan, then the planner is making the wrong decision. Have you tried running your query with

    set enable_seqscan=off;

to see how it behaves? I've found many somewhat slow queries got really fast or really slow when I did that.

Note that you shouldn't blindly run a query all the time with that setting, as there are many instances where seqscan is the right answer. Also, your explain cost estimates will all be way off.

> perhaps it is true that ANALYZE isn't being done often enough. perhaps
> VACUUMs aren't being done often enough either. we're leery of scheduling
> repeated VACUUMs having encountered a case where the VACUUM took over an
> hour to complete.

Run "vacuum verbose" on your db and see what it says about the number of page slots needed versus used. That will help you tell if you're vacuuming enough.

How long vacuum takes isn't really that important. What is important is how much of an impact it's having on the system. There are several vacuum parameters in the postgresql.conf file that can lower the impact vacuum has on your system I/O-wise while increasing its run time.

Vacuum full is another story. Think of it as a recovery tool, not a periodic maintenance tool.

> it may, too, be because the tables use user-defined types heavily and the
> original UPDATE involved a SELECT ... IN ... having a GROUP BY with a few
> references to columns deep within user-defined types.

Hard to say without a query and an explain analyze output. It's common for user-defined functions to produce estimates in the planner that are way off. User-defined types, not so much. But the more complex the query, the more likely it is that the query planner will make a bad estimate of the number of rows somewhere and choose a bad method.

> that wouldn't have been my choice, but, then, they were given to me to
> work, not my design. in fact, PG is the first relational database
> implementation i've used that offered such things in a big way.

Extensibility is quite a useful tool.

> i also don't understand some other things, which are surprising, like why
> some UPDATEs take so much longer when wrapped in a BEGIN TRANSACTION-COMMIT
> than when having the transaction at a statement level.

That is strange. I'd expect that maybe you've got something happening with the transaction waiting on other transactions, so that it's not so much running hard as just tapping its toe waiting for the other transaction to commit or roll back.

> I come from an Oracle, DB2, Informix world, and in my experience plans for
> queries are more stable. i have loitered in and around MySQL for a while.
> i'm not surprised there's a learning curve with PG. i am surprised it
> breaks so markedly with mainstay database experience.

Oh, I've seen Oracle get stupid due to lack of proper statistics as well. You likely had a good DBA who kept all that stuff hidden from you though. But PostgreSQL and mainline db experience are often incompatible. The very things that people are used to using to make other dbs fast (forcing index usage for instance) can make postgresql noticeably slower.

You might find that partial indexes help in some circumstances. If you are using a query that has a where clause that looks at a field that has one value 99% of the time and another value 1% of the time, you can index that 1% only, and an index scan will be ultra quick. The standard case for that is a boolean field:

    create table test (id int, info text, btest bool);
    -- insert 100,000 rows, with 1% having btest=true, the rest false
    create index test_btest_true on test(btest) where btest IS TRUE;
    analyze test;
    explain analyze select * from test where btest is true;

Generally, postgresql offers different ways to solve the same problems as other databases, and knowing those ways can really help troubleshoot and fix poorly performing queries.
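Scott's partial-index sketch, filled out so it can be run end-to-end (row counts per his description; generate_series is used here to produce the 99%/1% split):

```sql
CREATE TABLE test (id int, info text, btest bool);

-- 100,000 rows; every 100th row has btest = true (~1%).
INSERT INTO test
SELECT g, 'row ' || g, (g % 100 = 0)
FROM generate_series(1, 100000) AS g;

-- The partial index covers only the ~1,000 true rows, so it stays tiny.
CREATE INDEX test_btest_true ON test (btest) WHERE btest IS TRUE;
ANALYZE test;

EXPLAIN ANALYZE SELECT * FROM test WHERE btest IS TRUE;
```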
Re: [GENERAL] Upgrading PG
On Oct 1, 2007, at 12:26 PM, Gauthier, Dave wrote:
> I'm going to move from v7.4.13 to v8.2.0 (suse-64). I have 2 pre-existing
> DBs. Do I need to "convert" or port them to v8 in any way after I start up
> with a v8 postmaster? Thanks

Moving between major release versions requires that you do a dump/restore using the new version's pg_dump and pg_restore apps.

Erik Jones

Software Developer | Emma®
[EMAIL PROTECTED]
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com
Re: [GENERAL] more problems with count(*) on large table
"Mike Charnoky" <[EMAIL PROTECTED]> writes:
> Here is the output from EXPLAIN ANALYZE. This is the same query run back to
> back; first time takes 42 minutes, second time takes less than 2 minutes!

That doesn't really sound strange at all. It sounds like you have a very slow disk and a very large amount of memory.

40 minutes to scan 11.4M records sounds kind of high to me though. How wide are these records anyway? That is, what is the table definition for this table? If this is a single consumer drive, 42 minutes sounds about right for 2k-wide records being randomly accessed.

-- 
Gregory Stark
EnterpriseDB  http://www.enterprisedb.com
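The "how wide" question can be answered from the catalogs rather than by eyeballing the table definition (the table name below is a placeholder):

```sql
-- Average row width in bytes, as estimated by the last ANALYZE:
SELECT sum(avg_width) AS est_row_bytes
FROM pg_stats
WHERE tablename = 'my_big_table';

-- Rough on-disk bytes per row (heap size over the planner's row estimate):
SELECT pg_relation_size(oid) / greatest(reltuples::bigint, 1) AS bytes_per_row
FROM pg_class
WHERE relname = 'my_big_table';
```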
Re: [GENERAL] Upgrading PG
Gauthier, Dave wrote:
> I'm going to move from v7.4.13 to v8.2.0 (suse-64). I have 2 pre-existing
> DBs. Do I need to "convert" or port them to v8 in any way after I start up
> with a v8 postmaster?

1. v8.2.0 is a mistake; make sure you are running the latest stable patch rev, v8.2.5.
2. You will need to dump and reload: use the v8.2.5 pg_dump to connect to the v7.4.13 databases and restore.
3. There are specific behavior differences. The one that comes to mind most prominently is that float/double can no longer be ''.
4. Read the release notes from the docs.
5. 8.2 is a completely different beast in terms of performance and configuration.

Sincerely,
Joshua D. Drake

-- 
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997   http://www.commandprompt.com/
UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
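Point 2 as a command line, assuming the old 7.4 server is still running alongside the new one (paths, ports and filenames are placeholders):

```shell
# Dump with the NEW (8.2.5) pg_dumpall so it knows how to translate old
# constructs, pointing it at the running 7.4 server...
/opt/pgsql-8.2.5/bin/pg_dumpall -h localhost -p 5432 > old_cluster.sql

# ...then restore into the freshly initdb'd 8.2.5 cluster on its own port.
/opt/pgsql-8.2.5/bin/psql -p 5433 -d postgres -f old_cluster.sql
```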
Re: [GENERAL] Upgrading PG
On Monday 01 October 2007, "Gauthier, Dave" <[EMAIL PROTECTED]> wrote:
> I'm going to move from v7.4.13 to v8.2.0 (suse-64). I have 2 pre-existing
> DBs. Do I need to "convert" or port them to v8 in any way after I start up
> with a v8 postmaster?

All major version upgrades require a dump and reload. You should read all the update notes for intermediate versions. You may also run into some bad-unicode data issues moving from 7.4 to 8.x. I would do a test run on a separate machine before committing to a conversion date (always a good idea IMO).

-- 
Ghawar is dying
Re: [GENERAL] Partitioned table limitation
> -----Original Message-----
> From: Goboxe
> Sent: Monday, October 01, 2007 2:18 AM
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] Partitioned table limitation
>
> Hi,
>
> Are there any limitations on the number of child tables that can be
> used in a partitioned table?
>
> [snip]

We currently use partitioning by date and id, with a quarter of a year's
worth of dates and approximately 10 IDs (and slowly increasing). Each
partition runs from around 1 million to 20 million rows.

Whether it's recommended or not, I don't know. But for us, the
partitioning works exactly as advertised. As with anything new, I'd take
the time to set up a simple test to see if it works for you, too. In
particular, be sure to check the documentation on caveats. You'll find
these a little stricter than partitioning in Oracle or SQL Server.

HTH,
Paul
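For reference, partitioning in PostgreSQL at this point is built on table inheritance plus CHECK constraints. A minimal sketch of a date-partitioned setup (all table and column names invented):

```sql
-- Parent table holds no data itself; children carry non-overlapping
-- CHECK constraints describing their slice of the date range.
CREATE TABLE measurement (logdate date NOT NULL, value integer);

CREATE TABLE measurement_y2007w40 (
    CHECK (logdate >= DATE '2007-10-01' AND logdate < DATE '2007-10-08')
) INHERITS (measurement);

-- With constraint exclusion on, a scan of the parent skips children
-- whose CHECK constraints rule out the WHERE clause.
SET constraint_exclusion = on;
SELECT count(*) FROM measurement WHERE logdate = DATE '2007-10-03';
```

The caveats Paul mentions (no cross-partition unique constraints, rules or triggers needed to route inserts) are in the "Partitioning" chapter of the docs.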
Re: [GENERAL] more problems with count(*) on large table
Mike Charnoky wrote:
> This is strange... count(*) operations over a period of one day's worth
> of data now take ~1-2 minutes to run or ~40 minutes. It seems that the
> first time the data is queried it takes about 40 minutes. If I try the
> query again, it finishes in 1-2 minutes!

This is just the cache effect.

--
Alvaro Herrera                 http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [GENERAL] more problems with count(*) on large table
In response to Mike Charnoky <[EMAIL PROTECTED]>:

> This is strange... count(*) operations over a period of one day's worth
> of data now take ~1-2 minutes to run or ~40 minutes. It seems that the
> first time the data is queried it takes about 40 minutes. If I try the
> query again, it finishes in 1-2 minutes!

This sounds like a caching issue. My guess at what's happening is that
other operations are pushing this data out of the shared_buffers, so when
you run it, the system has to pull a bunch of tuples off the disk to check
them. If you run it again immediately, the tuples are still in memory, and
it runs very fast.

If this is the case, you can speed things up by adding RAM/shared_buffers,
or by moving to faster disks. The RAM solution is going to give you the
biggest performance improvement. However, if there's enough other data on
this system, you may have difficulty getting enough RAM to mitigate the
problem, in which case faster disks are going to be your best bet.

How much RAM do you have, and how much of it is allocated to
shared_buffers? What does your IO subsystem look like?

> Again, nothing else is happening on this db server except for a constant
> insertion into this table and a few others. I have done "set statistics
> 100" for the evtime field in this table.
>
> Here is the output from EXPLAIN ANALYZE. This is the same query run
> back to back; the first time takes 42 minutes, the second time takes
> less than 2 minutes!
>
> mydb=# explain analyze select count(*) from prediction_accuracy where
> evtime between '2007-09-29' and '2007-09-30';
>
>  Aggregate  (cost=3.02..3.03 rows=1 width=0)
>    (actual time=2549854.351..2549854.352 rows=1 loops=1)
>    ->  Index Scan using pred_acc_evtime_index on prediction_accuracy
>        (cost=0.00..3.02 rows=1 width=0)
>        (actual time=97.676..2532824.892 rows=11423786 loops=1)
>          Index Cond: ((evtime >= '2007-09-29 00:00:00-07'::timestamp
>          with time zone) AND (evtime <= '2007-09-30
>          00:00:00-07'::timestamp with time zone))
>  Total runtime: 2549854.411 ms
> (4 rows)
>
> Time: 2549943.506 ms
>
> mydb=# explain analyze select count(*) from prediction_accuracy where
> evtime between '2007-09-29' and '2007-09-30';
>
>  Aggregate  (cost=3.02..3.03 rows=1 width=0)
>    (actual time=111200.943..111200.944 rows=1 loops=1)
>    ->  Index Scan using pred_acc_evtime_index on prediction_accuracy
>        (cost=0.00..3.02 rows=1 width=0)
>        (actual time=36.396..96347.483 rows=11423786 loops=1)
>          Index Cond: ((evtime >= '2007-09-29 00:00:00-07'::timestamp
>          with time zone) AND (evtime <= '2007-09-30
>          00:00:00-07'::timestamp with time zone))
>  Total runtime: 111201.000 ms
> (4 rows)
>
> Time: 111298.695 ms
>
> Mike
>
> Gregory Stark wrote:
>> "Mike Charnoky" <[EMAIL PROTECTED]> writes:
>>> I altered the table in question, with "set statistics 100" on the
>>> timestamp column, then ran analyze. This seemed to help somewhat.
>>> Now, queries don't seem to hang, but it still takes a long time to do
>>> the count:
>>>  * "where evtime between '2007-09-26' and '2007-09-27'"
>>>    took 37 minutes to run (result was ~12 million)
>>>  * "where evtime between '2007-09-25' and '2007-09-26'"
>>>    took 40 minutes to run (result was ~14 million)
>>>
>>> Still stymied about the seemingly random performance, especially
>>> since I have seen this query execute in 2 minutes.
>>
>> And the "explain analyze" for these?
>>
>> Are you still sure it's certain date ranges which are consistently
>> problems and others are consistently fast? Or could it be something
>> unrelated.

--
Bill Moran
http://www.potentialtech.com
[GENERAL] Upgrading PG
I'm going to move from v7.4.13 to v8.2.0 (suse-64). I have 2 pre-existing DBs. Do I need to "convert" or port them to v8 in any way after I start up with a v8 postmaster? Thanks -dave
Re: [GENERAL] using COPY, .CSV and ¿catalog?
On 10/1/07, pere roca ristol <[EMAIL PROTECTED]> wrote:
> Hi everybody,
>
> I want to enter a .CSV file using the COPY command and plpgsql. It
> enters latitude, longitude and some data. In the CSV data there is no
> field (such as "user_name" or current_time) that allows distinguishing
> future queries for different users (ex: select x,y from table where
> user_name = z; after entering lat,lon I would like to update the_geom
> in another table using the lat,lon of ONLY this user).
>
> I could update this kind of data ("user_name") using php, but the next
> user could not enter his data by COPY (it only lets you fill the
> destination table if ALL the fields are the same as in the CSV or text
> file).

I'm not 100% certain I understand what you're trying to achieve, but to
me this looks like a mis-use. Why not use the relational features of your
RDBMS and just pick a location from a table created once, via a unique
value you assign during (or after) the COPY?

> Thanks,
> Pere

Cheers,
Andrej

--
Please don't top post, and don't use HTML e-Mail :}  Make your quotes
concise.

http://www.american.edu/econ/notes/htmlmail.htm
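One common way to reconcile COPY's fixed column list with a per-user tag is a staging table. A sketch of that approach, with made-up table, column, and file names:

```sql
-- Staging table matches the CSV's columns exactly, so COPY works as-is.
CREATE TEMP TABLE staging (lat double precision,
                           lon double precision,
                           data text);

COPY staging FROM '/tmp/upload.csv' WITH CSV;

-- Tag the rows with the user while moving them into the real table.
INSERT INTO points (user_name, lat, lon, data)
SELECT 'pere', lat, lon, data FROM staging;
```

(Server-side COPY FROM a file requires superuser; from a client, psql's \copy does the same thing with a client-side file.)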
Re: [GENERAL] more problems with count(*) on large table
On Monday 01 October 2007, Mike Charnoky <[EMAIL PROTECTED]> wrote:
> This is strange... count(*) operations over a period of one day's worth
> of data now take ~1-2 minutes to run or ~40 minutes. It seems that the
> first time the data is queried it takes about 40 minutes. If I try the
> query again, it finishes in 1-2 minutes!
>
> Again, nothing else is happening on this db server except for a constant
> insertion into this table and a few others. I have done "set statistics
> 100" for the evtime field in this table.

The first time, you're reading from disk. The second time, you're reading
from cache. Tens of millions of disk seeks don't come cheap.

--
We're Microsoft. Everything else is just patent infringement.
[GENERAL] Find out encoding of data
Hi! I have this problem that I'm not sure whether my stored data has the
correct encoding. When I view the data from a postgres console I just see
the characters, and depending on my console encoding it looks different.
What I really want to see is the hexadecimal or octal value of the bytes
of the retrieved data. Can postgres give me this somehow (without
exporting tables to files and looking at the files)?

/br joynes
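One possible answer, assuming a reasonably recent PostgreSQL where `convert_to()` and `encode()` are available as built-ins (table and column names are illustrative):

```sql
-- convert_to() renders the string in a named encoding as bytea;
-- encode(..., 'hex') then shows the raw bytes as hex digits.
SELECT name, encode(convert_to(name, 'UTF8'), 'hex') AS hex_bytes
FROM mytable;
```

This shows the bytes after conversion to the named encoding, which is usually what you want when checking whether the stored data really is valid UTF-8.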
[GENERAL] Partitioned table limitation
Hi,

Are there any limitations on the number of child tables that can be used
in a partitioned table?

I currently keep weekly partitioned tables (using a partitioned view in
SQL Server) for 2 years. In total, there will be 52 * 2 = 104 tables in
the partition at any one time. I am migrating from SQL Server to pg. Can
pg support that number of tables?

How about if I want to convert it to daily tables (365 * 2 years = 730
tables)? Is this recommended?

FYI, currently each weekly table stores between 28 and 32 million records
(approx 4-5 million recs daily).

Thanks,
G
[GENERAL] importing large files
Hello,

I need to import between 100 million and one billion records into a
table. Each record is composed of two char(16) fields. The input format
is a huge CSV file. I am running on a linux box with 4gb of ram.

First I create the table. Second I 'copy from' the csv file. Third I
create the index on the first field. The overall process takes several
hours. The cpu seems to be the limitation, not the memory or the IO.

Are there any tips to improve the speed?

Thanks very much,
Olivier
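The order described (load first, index afterwards) is the right one; the usual additional lever is giving the index build more memory. A sketch with invented names (the memory value is illustrative and depends on available RAM):

```sql
-- Extra memory for the CREATE INDEX sort; set per-session, not globally.
SET maintenance_work_mem = '1GB';

CREATE TABLE pairs (a char(16), b char(16));

-- COPY is the fastest built-in load path.
COPY pairs FROM '/data/huge.csv' WITH CSV;

-- Build the index only after the data is in.
CREATE INDEX pairs_a_idx ON pairs (a);
```

Dropping or deferring any other indexes and constraints during the load, and raising checkpoint-related settings for the duration, follow the same principle.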
Re: [GENERAL] DAGs and recursive queries
Thanks for your answers guys. I've got a cold right now and my brain is
mush, so I can't comment intelligently on your suggestions just yet. I
just wanted to express my thanks for your time.

Jeff, one book you might want to look at is Joe Celko's Trees and
Hierarchies in SQL for Smarties.
http://www.amazon.com/Hierarchies-Smarties-Kaufmann-Management-Systems/dp/1558609202

If the connectby() function is like Oracle's CONNECT BY feature, then
perhaps it will suit my needs. The categorization scheme will nearly
always have multiple parents for all but the topmost node. Each category
stores serialized method calls for CRUD operations on objects within the
category, which are requested by the application for all interactions
with stored objects (though I'd like to be able to cache them too, but
that's in the application domain).

Each object stored in the database will belong to at least one category,
but I expect they will normally belong to many categories. When I create
a new object of categoryD, which is a child of categoryC and categoryB,
which are children of categoryA, then my application will need the CREATE
method calls from all parents, as well as the object category itself. I'd
like them all returned from one query, possibly ordered according to the
distance from the child node.

Oops, I'm trying to comment intelligently. Better stop now.

Cheers,
Paul
[GENERAL] how to ignore invalid byte sequence for encoding without using sql_ascii?
I am now importing the dump file of wikipedia into my postgresql using
maintenance/importDump.php. It fails on 'ERROR: invalid byte sequence for
encoding UTF-8'. Is there any way to let pgsql just ignore the invalid
characters (I mean drop the invalid ones), so that the script keeps going
without dying on this error? I know that I can use sql_ascii or even
modify importDump.php, but those are not as easy to do as I thought.

Thanks for help
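A third option, beyond sql_ascii and patching the script, is to clean the dump file before it reaches the database. A sketch using iconv, whose `-c` flag silently drops byte sequences that are invalid in the source encoding (file names are made up):

```shell
# Build a tiny demo file containing one invalid UTF-8 byte (\270 = 0xB8),
# then strip such bytes with iconv -c before feeding the dump to psql.
printf 'INSERT INTO t VALUES (1);\na\270b\n' > dump.sql
iconv -f UTF-8 -t UTF-8 -c dump.sql > dump.clean.sql
cat dump.clean.sql
```

The same one-liner applied to the real wikipedia dump produces a file PostgreSQL will accept as UTF-8, at the cost of losing the mangled characters.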
[GENERAL] Find clusters containing a schema?
Is it possible to somehow query the system catalog to find out which
clusters/databases have a certain schema? When running the query:

SELECT * FROM pg_catalog.pg_namespace WHERE nspname = 'myschema';

it always only finds data for the current session, not all clusters, even
when connected as postgres.

Thanks,
Josh
Re: [GENERAL] Autostart PostgreSQL in Ubuntu
Hi,

> /var/run/ might be on a temporary file system. So you need to adjust
> your init script to create that directory if it doesn't exist.

That is what I tried now, and it works. I never installed Debian's
PostgreSQL packages; I once manually installed that init.d script.
Assigning it to the different runlevels is possible by manually adding
the symbolic links or by using a runlevel editor GUI in Ubuntu.

Thanks very much for all your responses!

Sebastian
[GENERAL] Data cube in PostgreSQL
Hi All,

I want to implement the data cube operator in PostgreSQL. I searched a
few forums and found that I can only have an interface to postgresql
using the EFEU package, which allows basic cube operations. Is there any
other way of extending postgresql to include cubes?

thanks
Gowrishankar
[GENERAL] It's time to support GRANT SELECT,UPDATE,...,...,... ON database.* to username
I don't care if it's part of the SQL standard or not. I don't care if
oracle does it or not. You're losing mysql converts as they go through
the tutorial and get to this point. Or worse, they just "grant all"
because it's easier, thus causing security holes. User friendliness
matters.

There are like a hundred posts asking for this over the past four years,
and all they get in response is standards purists saying it won't be
supported because it's not in the standard, and telling them to write
their own functions. You write the function. Forget the standard and wake
up.
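For anyone who lands here needing the workaround today, the function in question is short. A sketch of such a helper (the function name is invented, and PL/pgSQL must be installed in the database):

```sql
-- Grant SELECT on every table in a schema by looping over pg_tables.
-- quote_ident() guards against odd identifiers.
CREATE OR REPLACE FUNCTION grant_select_all(p_schema text, p_grantee text)
RETURNS void AS $$
DECLARE
    t record;
BEGIN
    FOR t IN SELECT tablename FROM pg_tables
             WHERE schemaname = p_schema LOOP
        EXECUTE 'GRANT SELECT ON '
                || quote_ident(p_schema) || '.' || quote_ident(t.tablename)
                || ' TO ' || quote_ident(p_grantee);
    END LOOP;
END;
$$ LANGUAGE plpgsql;

SELECT grant_select_all('public', 'someuser');
```

Extending it to other privileges is a matter of parameterizing the GRANT verb.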
Re: [GENERAL] more problems with count(*) on large table
This is strange... count(*) operations over a period of one day's worth
of data now take ~1-2 minutes to run or ~40 minutes. It seems that the
first time the data is queried it takes about 40 minutes. If I try the
query again, it finishes in 1-2 minutes!

Again, nothing else is happening on this db server except for a constant
insertion into this table and a few others. I have done "set statistics
100" for the evtime field in this table.

Here is the output from EXPLAIN ANALYZE. This is the same query run back
to back; the first time takes 42 minutes, the second time takes less than
2 minutes!

mydb=# explain analyze select count(*) from prediction_accuracy where
evtime between '2007-09-29' and '2007-09-30';

 Aggregate  (cost=3.02..3.03 rows=1 width=0)
   (actual time=2549854.351..2549854.352 rows=1 loops=1)
   ->  Index Scan using pred_acc_evtime_index on prediction_accuracy
       (cost=0.00..3.02 rows=1 width=0)
       (actual time=97.676..2532824.892 rows=11423786 loops=1)
         Index Cond: ((evtime >= '2007-09-29 00:00:00-07'::timestamp
         with time zone) AND (evtime <= '2007-09-30
         00:00:00-07'::timestamp with time zone))
 Total runtime: 2549854.411 ms
(4 rows)

Time: 2549943.506 ms

mydb=# explain analyze select count(*) from prediction_accuracy where
evtime between '2007-09-29' and '2007-09-30';

 Aggregate  (cost=3.02..3.03 rows=1 width=0)
   (actual time=111200.943..111200.944 rows=1 loops=1)
   ->  Index Scan using pred_acc_evtime_index on prediction_accuracy
       (cost=0.00..3.02 rows=1 width=0)
       (actual time=36.396..96347.483 rows=11423786 loops=1)
         Index Cond: ((evtime >= '2007-09-29 00:00:00-07'::timestamp
         with time zone) AND (evtime <= '2007-09-30
         00:00:00-07'::timestamp with time zone))
 Total runtime: 111201.000 ms
(4 rows)

Time: 111298.695 ms

Mike

Gregory Stark wrote:
> "Mike Charnoky" <[EMAIL PROTECTED]> writes:
>> I altered the table in question, with "set statistics 100" on the
>> timestamp column, then ran analyze. This seemed to help somewhat. Now,
>> queries don't seem to hang, but it still takes a long time to do the
>> count:
>>  * "where evtime between '2007-09-26' and '2007-09-27'"
>>    took 37 minutes to run (result was ~12 million)
>>  * "where evtime between '2007-09-25' and '2007-09-26'"
>>    took 40 minutes to run (result was ~14 million)
>>
>> Still stymied about the seemingly random performance, especially since
>> I have seen this query execute in 2 minutes.
>
> And the "explain analyze" for these?
>
> Are you still sure it's certain date ranges which are consistently
> problems and others are consistently fast? Or could it be something
> unrelated.
Re: [GENERAL] row->ARRAY or row->table casting?
Gregory Stark wrote:
> "Nico Sabbi" <[EMAIL PROTECTED]> writes:
>> nb1=# select r.a from (select row(tab1.*)::tab1 as r from tab1)x;
>> ERROR:  missing FROM-clause entry for table "r"
>> LINE 1: select r.a from (select row(tab1.*)::tab1 as r from tab1)x;
>>                ^
>>
>> I tried many variations (including casting x as tab1), obtaining only
>> syntax errors.
>
> r.a would be the column "a" in the table named "r", but the only table
> in the FROM list is "x". So you have to use a workaround to make it
> clear to the parser that you're referring to the column "r"; it would
> look like
>
>   SELECT (r).a from (select row(tab1.*)::tab1 as r from tab1)x;

Yes, it works. Thanks a lot,

Nico
Re: [GENERAL] Inheritance fixing timeline? (Was "Inherited FK Indexing")
Webb Sprague wrote:
>>> Is it possible to have a FK that spans into child tables?
>>
>> This is a well known (and documented, see [1]) deficiency. It's due to
>> the current implementation of indexes, which are bound to exactly one
>> table, meaning they do return a position within the table, but cannot
>> point to different tables.
>
> Is this set to be fixed in any particular release?

No, sorry.

--
Alvaro Herrera                 http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [GENERAL] Inheritance problem when restoring db
"Sebastjan Trepca" <[EMAIL PROTECTED]> writes: > Current state: > Table B has a primary key with sequence b_seq. Table A also has a > primary key with sequence a_seq. In view of the fact that primary keys aren't inherited, and do not "have sequences", this description is uselessly imprecise. Please show exactly how you created these two tables. And which PG version is this? regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[GENERAL] Inheritance problem when restoring db
Hi,

I noticed a small bug/problem when restoring a database that uses
inheritance. Let's say you have a table B that inherits from table A.

Current state: Table B has a primary key with sequence b_seq. Table A
also has a primary key with sequence a_seq.

Now we create a backup and restore the database.

New state: Table B has a primary key with sequence a_seq. Table A is the
same as before.

Is this wrong or normal functionality? Do I have to set some extra flags
when doing the backup?

Thanks, Sebastjan
Re: [GENERAL] Normalized Tables & SELECT [was: Find "smallest common year"]
Stefan Schwarzer wrote:
>> An entirely different question is whether it is a good idea to write a
>> range as a value that the database cannot interpret correctly
>> (referring to the '1970-75' notation). You cannot group records by
>> value this way if you need to (for example) combine data from '1970'
>> with data from '1970-75'.
>>
>> But you seem to use these values just for labels, which I assume are
>> unique across years (eg. if you have a value '1970-75' you don't have
>> values '1970', '1971'..'1974'), in which case this is safe to use. As
>> pointed out by several people earlier, they make an excellent foreign
>> key too (provided they're unique).
>
> Yep, this is a question I posed myself too. At the moment, when doing
> for example "per capita" calculations on the fly for a variable which
> has something like 1970-75, I would sum up the total population over
> the given period, divide it by the number of years, and then use it
> with the selected variable to get the "per capita" data.
>
> But if I would instead insert yearly data, it would mean that it had
> five lines with the same values. No problem with that?

Not entirely what I suggested, but also a viable solution, sure. I was
suggesting to add a column to your yearly data marking the end of the
range. Given your above examples, you could then do queries like:

SELECT population / num_years FROM my_data;

(Assuming you add the length of the interval as a number of years, which
seems plausible because you don't seem to calculate with any intervals
not dividable by a year.)

Adding this additional column may justify putting the years (and their
durations) into their own table.

--
Alban Hertroys
[EMAIL PROTECTED]
magproductions b.v.
T: ++31(0)534346874
F: ++31(0)534346876
I: www.magproductions.nl
A: Postbus 416, 7500 AK Enschede
// Integrate Your World //
Re: [GENERAL] usage of indexes for inner joins
On 9/27/07, Jan Theodore Galkowski <[EMAIL PROTECTED]> wrote:
> I fear this has been asked many times about PostgreSQL, and I have read
> the docs about how indexes are supposed to be defined and used, but I
> don't understand why the engine and optimizer is doing what it does in
> the simplest of situations. Is it that its tuning is heavily data
> dependent?

This has in fact been discussed many times on the mailing list. Searching
the archives will probably return lots of discussions.

> Like *how* *come*? There are indexes on both columns of the join. Is
> it the NUMERIC datatype messing things up? Unlikely, as I've seen the
> same with INTEGERs.

PostgreSQL doesn't store "visibility" information in indexes. This means
that once you find an entry in an index, you then have to check whether
it's visible to the current transaction, and that information is only
stored in the tables. There are lots of discussions of why that is in the
archives as well. Basically, race conditions make it impossible to update
the table and index concurrently without ugly locking issues popping up.
So, in pgsql, whether there's an index or not, the db has to hit the
table in the end.

> If it is data dependent (these tables are presently empty), any
> suggestions as to how to tune a database for unknown mixes of data?

No, it isn't; it is range dependent. If you had a selective enough WHERE
clause then postgresql would choose an index over a sequential scan. Your
biggest mistake here is thinking the simple solution (use indexes) is
always best. PostgreSQL uses a cost-based planner that tries to decide
ahead of time what plan is going to be fastest. The real answer is to
give it good information (i.e. run analyze frequently enough, and have a
high enough stats target for the column(s) you're using). That means
pgsql is paying attention to how big your tables are as well as what
values are in there and what % you're going to get back.

Use explain analyze to see the differences between what the planner
expects and what it gets. Like this part of your explain analyze output:

Seq Scan on foo a (cost=0.00..11.80 rows=180 width=407)
                  (actual time=0.003..0.003 rows=0 loops=1)

Note that the planner expected 180 rows but got 0; that's a sign of poor
stats. Run analyze and you should see something closer to a match between
expected and actual rows. Also, try putting some real data in your db,
and use a where clause (unless you really are gonna grab every single row
every time...)
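The stats-target advice above can be sketched as follows (table and column names are made up; 100 is an example target, the default being much lower in this era):

```sql
-- Raise the per-column statistics target, rebuild the stats,
-- then compare the planner's row estimate against reality.
ALTER TABLE foo ALTER COLUMN bar SET STATISTICS 100;
ANALYZE foo;
EXPLAIN ANALYZE SELECT * FROM foo WHERE bar = 42;
```

In the EXPLAIN ANALYZE output, a close match between the estimated `rows=` and the actual `rows=` is the sign that the planner has enough information to choose well.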
Re: [GENERAL] more problems with count(*) on large table
"Mike Charnoky" <[EMAIL PROTECTED]> writes: > I altered the table in question, with "set statistics 100" on the > timestamp column, then ran analyze. This seemed to help somewhat. Now, > queries don't seem to hang, but it still takes a long time to do the count: > * "where evtime between '2007-09-26' and '2007-09-27'" >took 37 minutes to run (result was ~12 million) > * "where evtime between '2007-09-25' and '2007-09-26'" >took 40 minutes to run (result was ~14 million) > > Still stymied about the seemingly random performance, especially since I > have seen this query execute in 2 minutes. And the "explain analyze" for these? Are you still sure it's certain date ranges which are consistently problems and others are consistently fast? Or could it be something unrelated. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Normalized Tables & SELECT [was: Find "smallest common year"]
> An entirely different question is whether it is a good idea to write a
> range as a value that the database cannot interpret correctly
> (referring to the '1970-75' notation). You cannot group records by
> value this way if you need to (for example) combine data from '1970'
> with data from '1970-75'.
>
> But you seem to use these values just for labels, which I assume are
> unique across years (eg. if you have a value '1970-75' you don't have
> values '1970', '1971'..'1974'), in which case this is safe to use. As
> pointed out by several people earlier, they make an excellent foreign
> key too (provided they're unique).

Yep, this is a question I posed myself too. At the moment, when doing for
example "per capita" calculations on the fly for a variable which has
something like 1970-75, I would sum up the total population over the
given period, divide it by the number of years, and then use it with the
selected variable to get the "per capita" data.

But if I would instead insert yearly data, it would mean that it had five
lines with the same values. No problem with that?

Stef
Re: [GENERAL] more problems with count(*) on large table
I altered the table in question, with "set statistics 100" on the
timestamp column, then ran analyze. This seemed to help somewhat. Now,
queries don't seem to hang, but it still takes a long time to do the
count:
 * "where evtime between '2007-09-26' and '2007-09-27'"
   took 37 minutes to run (result was ~12 million)
 * "where evtime between '2007-09-25' and '2007-09-26'"
   took 40 minutes to run (result was ~14 million)

Still stymied about the seemingly random performance, especially since I
have seen this query execute in 2 minutes. Nothing should be locking the
table; the only things happening with this database are periodic
autovacuums and a process which is constantly inserting data into the
large table in question (and a few other smaller logging tables).

FYI: Here is the output from explain (I never knew about "\timing" in
psql, that is a big help!). This was run immediately after I ran the
query a first time (without using the prepare and explain)... This second
run took a little over a minute! Bizarre...

mydb=# prepare stmt as select count(*) from prediction_accuracy where
evtime between '2007-09-25' and '2007-09-26';
PREPARE
Time: 90.854 ms

mydb=# explain execute stmt;

 Aggregate  (cost=642338.27..642338.28 rows=1 width=0)
   ->  Index Scan using pred_acc_evtime_index on prediction_accuracy
       (cost=0.00..600219.37 rows=16847557 width=0)
         Index Cond: ((evtime >= '2007-09-25 00:00:00-07'::timestamp
         with time zone) AND (evtime <= '2007-09-26
         00:00:00-07'::timestamp with time zone))
(3 rows)

Time: 131.559 ms

mydb# execute stmt;
  count
----------
 14150928
(1 row)

Time: 101721.346 ms

Mike

Gregory Stark wrote:
> "Alban Hertroys" <[EMAIL PROTECTED]> writes:
>> Mike Charnoky wrote:
>>> With respect to the ALTER TABLE SET STATISTICS... how do I determine
>>> a good value to use? This wasn't really clear in the pg docs. Also,
>>> do I need to run ANALYZE on the table after I change the statistics?
>>>
>>> Here are the EXPLAINs from the queries:
>>>
>>> db=# explain select count(*) from prediction_accuracy where evtime
>>> between '2007-09-25' and '2007-09-26';
>>>
>>>  Aggregate  (cost=475677.40..475677.41 rows=1 width=0)
>>>    ->  Index Scan using pred_acc_evtime_index on prediction_accuracy
>>>        (cost=0.00..51.44 rows=12490383 width=0)
>>>          Index Cond: ((evtime >= '2007-09-25 00:00:00-07'::timestamp
>>>          with time zone) AND (evtime <= '2007-09-26
>>>          00:00:00-07'::timestamp with time zone))
>>> (3 rows)
>>>
>>> db=# explain select count(*) from prediction_accuracy where evtime
>>> between '2007-09-26' and '2007-09-27';
>>>
>>>  Aggregate  (cost=486615.04..486615.05 rows=1 width=0)
>>>    ->  Index Scan using pred_acc_evtime_index on prediction_accuracy
>>>        (cost=0.00..454671.07 rows=12777586 width=0)
>>>          Index Cond: ((evtime >= '2007-09-26 00:00:00-07'::timestamp
>>>          with time zone) AND (evtime <= '2007-09-27
>>>          00:00:00-07'::timestamp with time zone))
>>> (3 rows)
>>
>> Interesting, same plans and no sequential scans... Yet totally
>> different run times. Almost as if something prevents you from reading
>> some records between 26 and 27 september...
>
> Just to be sure we're looking at the right plan, do this:
>
> \timing
> PREPARE stmt AS
> SELECT count(*)
>   FROM prediction_accuracy
>  WHERE evtime BETWEEN '2007-09-25' AND '2007-09-26';
>
> EXPLAIN EXECUTE stmt;
> EXECUTE stmt;
>
>> I'm no expert on locking in Postgres, but AFAIK locks that prevent you
>> from reading records are rather rare and probably only issued from
>> userland code.
>
> Pages can be locked but never for very long.
>
> What other work is going on in this server? Is there anything which
> might be locking the table periodically (either a VACUUM FULL, CLUSTER,
> ALTER TABLE, etc?)
>
> Perhaps there's a hardware problem; is there anything in your system
> logs from dmesg?
Re: [GENERAL] Normalized Tables & SELECT [was: Find "smallest common year"]
Stefan Schwarzer wrote: > > >> BTW, You didn't actually use type text for your year column, did you? No >> quotes needed then. Otherwise you'd have to make sure your year values >> are all the same length or sorting gets... interesting. > > Yep, my comment just before concerns especially this paragraph, I guess. > With not only years but also time spans for my data - so, not only 1970, > 1971 but also 1970-75, 1975-80 etc. I would need indeed to use text > for my year column. Why get sorting than "interesting"? > > Stef Text sorts alphabetically, not numerically: integer text --- --- 1 '1' 2 '10' 3 '2' 4 '3' 5 '4' 6 '5' 7 '6' 8 '7' 9 '8' 10 '9' I guess you won't have any problems with sorting, as you use full century prefixes and the differences in syntax ('1970' vs '1970-75') are at the rear of your values. An entirely different question is whether it is a good idea to write a range as a value that the database cannot interpret correctly (referring to the '1970-75' notation). You cannot group records by value this way if you need to (for example) combine data from '1970' with data from '1970-75'. But you seem to use these values just for labels, which I assume are unique across years (eg. if you have a value '1970-75' you don't have values '1970', 1971'..'1974'), in which case this is safe to use. As pointed out by several people earlier, they make an excellent foreign key too (provided they're unique). Other options to handle these years involve having a start/end year or date (to_date('01/01/1970', 'MM/DD/'), to_date('12/31/1975', 'MM/DD/')) or a start date and an interval (to_date('01/01/1970', 'MM/DD/'), INTERVAL '1 year'). That makes operators like OVERLAPS and BETWEEN available to your queries, which may be convenient ;) -- Alban Hertroys [EMAIL PROTECTED] magproductions b.v. 
T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World // ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
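[Editor's note] The alphabetic-versus-numeric ordering discussed above can be seen directly in a psql session; this throwaway example needs no tables:

```sql
-- Text sorts character-by-character, so '10' comes before '2':
SELECT x FROM (VALUES ('1'), ('2'), ('10')) AS t(x) ORDER BY x;
-- returns: '1', '10', '2'

-- Casting to integer restores numeric order:
SELECT x FROM (VALUES ('1'), ('2'), ('10')) AS t(x) ORDER BY x::integer;
-- returns: '1', '2', '10'
```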
Re: [GENERAL] sha1 function
Martin Marques wrote: We are at the moment planning on passing some passwords that are plain texted in our DB to some encrypted form as now they will be used for processes that require better security measures. We started looking at md5() but found that it's easy to crack and one of the systems uses CHAP authentication, and so SHA1 hashing. So, two questions arise. 1) Is MD5's weakness true? Well (and I'm not a security expert so please check details), there are some issues with MD5: 1. There were some recent breakthroughs in generating multiple documents that match a given MD5 hash. This is less of a threat to passwords and more to document forgery. 2. It's practical nowadays to pre-calculate "rainbow tables" to cover all possible combinations of a short password (hence the use of a "salt" to extend the password length). 3. You can't get from the hash back to a password easily. So - it's no use for storing passwords for other systems, just for checking supplied passwords. What you need to ask is what you're protecting against. If someone has access to the database, have they also gained access to the process that uses these passwords, so allowing them to watch the unencrypted password be tested? 2) Is there any sha1() functions in PostgreSQL? See contrib/pgcrypto in the source distribution or the equivalent in whatever package you use. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
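[Editor's note] For reference, a minimal sketch of what contrib/pgcrypto offers once it is installed (digest(), crypt() and gen_salt() are pgcrypto functions; the password value here is made up):

```sql
-- Plain SHA1 digest of a string, hex-encoded:
SELECT encode(digest('secret', 'sha1'), 'hex');

-- For stored passwords, crypt() with a generated salt resists
-- precomputed rainbow-table lookups better than a bare hash:
SELECT crypt('secret', gen_salt('bf'));
```

The salted hash returned by crypt() embeds its own salt, so the same call with the stored value verifies a password: `crypt('secret', stored_hash) = stored_hash`.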
Re: [GENERAL] Normalized Tables & SELECT [was: Find "smallest common year"]
I find it far easier to maintain normalized tables that produced non-normalized ones (for things like data warehousing) than it is to maintain non-normalized tables and trying to produce normalized data from that. Ok, I do understand that. So, instead of the earlier mentioned database design, I would have something like this: - one table for the country names/ids/etc. (Afghanistan, 1; Albania, 2) - one table for the variable names/ids/etc. (GDP, 1; Population, 2; Fish Catch, 3;) - one table for the years names/ids/etc. (1970, 1; 1971, 2; 1973, 3; ) and - one table for all "statistical data" with four fields - id_variable, id_country, id_year, and the actual value You say I find it far easier to maintain normalized tables that produced non-normalized ones (for things like data warehousing) than it is to maintain non-normalized tables and trying to produce normalized data from that. What is your view about (having 500 different variables/data sets) using a single table for all data versus one table for each variable. In terms of "readability" I guess the second solution would be better. But, then, I don't know... Thanks for any views Stef ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
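[Editor's note] A minimal sketch of the normalized layout described above, with illustrative names (a sketch only, not the poster's actual schema):

```sql
-- Dimension tables: one row per country, variable, and year.
CREATE TABLE country  (id serial PRIMARY KEY, name  text NOT NULL);
CREATE TABLE variable (id serial PRIMARY KEY, name  text NOT NULL);
CREATE TABLE year     (id serial PRIMARY KEY, label text NOT NULL);

-- One fact table for all statistical data, keyed by the three dimensions.
CREATE TABLE statistical_data (
    id_variable integer NOT NULL REFERENCES variable(id),
    id_country  integer NOT NULL REFERENCES country(id),
    id_year     integer NOT NULL REFERENCES year(id),
    value       numeric,
    PRIMARY KEY (id_variable, id_country, id_year)
);
```

With 500 variables, this single "long" fact table avoids maintaining 500 near-identical table definitions; per-variable views can restore readability where needed.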
Re: [GENERAL] Normalized Tables & SELECT [was: Find "smallest common year"]
> BTW, you didn't actually use type text for your year column, did you? No quotes needed then. Otherwise you'd have to make sure your year values are all the same length or sorting gets... interesting.

Yep, my comment just before concerns especially this paragraph, I guess. With not only years but also time spans in my data - so, not only 1970, 1971 but also 1970-75, 1975-80 etc. - I would indeed need to use text for my year column. Why does sorting then get "interesting"?

Stef

---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[GENERAL] Dump of table structure is not consistent
Hi,

\d my.table is showing me the primary key as:

"unitstat_pkey" PRIMARY KEY, btree (id)

But looking into the table structure produced by

pg_dump -s -n my -t table db

I'm getting the primary key shown as:

ADD CONSTRAINT unistat_pkey PRIMARY KEY (id);

That used to be its name, but I altered it by executing:

ALTER TABLE unistat_pkey RENAME TO unitstat_pkey

How can I make the dump consistent?

PostgreSQL 8.1.9 on i486-pc-linux-gnu, compiled by GCC gcc-4.0.gcc-opt (GCC) 4.0.3 (Ubuntu 4.0.3-1ubuntu5)

Poul

---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
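[Editor's note] The discrepancy arises because \d reports the index name (from pg_class) while pg_dump emits the constraint name (from pg_constraint), and ALTER TABLE ... RENAME on the index only changes the former; 8.1 has no ALTER TABLE ... RENAME CONSTRAINT. One way to bring the two back in sync, assuming the table is my.unitstat (the table name here is guessed for illustration); note this briefly locks the table and rebuilds the index:

```sql
BEGIN;
-- Drops the constraint and its underlying index:
ALTER TABLE my.unitstat DROP CONSTRAINT unistat_pkey;
-- Recreates both under the desired name:
ALTER TABLE my.unitstat ADD CONSTRAINT unitstat_pkey PRIMARY KEY (id);
COMMIT;
```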
Re: [GENERAL] Normalized Tables & SELECT [was: Find "smallest common year"]
- one table for the years names/ids/etc. (1970, 1; 1971, 2; 1973, 3; ) If you _do_ need this table (because you want to constrain your statistical data to only contain a specific set of years, or because you need a quick list of available years to select from): Make the year primary key and drop the artificial index. Years are perfectly fine data to constrain on, and it saves you the joins with that table (the foreign key constraint constrains your data sufficiently). If my years are not only single years such as 1970, 1971... but time spans, such as 1970-75 should your proposal be still valid? It won't be anymore an integer field, but a text field instead... Stef ---(end of broadcast)--- TIP 6: explain analyze is your friend
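[Editor's note] One way to keep time spans both queryable and usable as a key is to store explicit start/end years rather than a text label (a sketch, not from the thread; assumes integer years):

```sql
CREATE TABLE year_span (
    start_year integer NOT NULL,
    end_year   integer NOT NULL,
    PRIMARY KEY (start_year, end_year),
    CHECK (start_year <= end_year)
);

-- A single year is just a span where both ends match:
INSERT INTO year_span VALUES (1970, 1970), (1970, 1975);

-- Find all spans covering a given year:
SELECT * FROM year_span WHERE 1972 BETWEEN start_year AND end_year;
```

This keeps the proposal of making the period itself the primary key valid, without falling back to a text column that the database cannot interpret.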
[GENERAL] sha1 function
We are at the moment planning on passing some passwords that are plain texted in our DB to some encrypted form as now they will be used for processes that require better security measures. We started looking at md5() but found that it's easy to crack and one of the systems uses CHAP authentication, and so SHA1 hashing. So, two questions arise. 1) Is MD5's weakness true? 2) Is there any sha1() functions in PostgreSQL? -- 21:50:04 up 2 days, 9:07, 0 users, load average: 0.92, 0.37, 0.18 - Lic. Martín Marqués | SELECT 'mmarques' || Centro de Telemática| '@' || 'unl.edu.ar'; Universidad Nacional| DBA, Programador, del Litoral | Administrador - ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Inheritance fixing timeline? (Was "Inherited FK Indexing")
> Is this set to be fixed in any particular release? Depending on what you're doing, this may be overkill, but: I have child tables that not only need FK constraints, but also triggers and the functions called by the triggers. So instead of writing this over and over again, I eventually wrote a single procedure that takes the name of the table, and using dynamic sql (execute command), generates the FKs and the procedures and the triggers. You *could* take it a step further, and have a procedure which takes the name of the base table, finds all inherited tables, and makes sure everything is set up correctly. I haven't, and probably won't, because I'm a solo developer and don't make additions to the schema at such a great rate that I would have trouble remembering to run my current "FooChild_Setup" function on a new table. -- Scott Ribe [EMAIL PROTECTED] http://www.killerbytes.com/ (303) 722-0567 voice ---(end of broadcast)--- TIP 6: explain analyze is your friend
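[Editor's note] The setup-procedure approach described above can be sketched in PL/pgSQL roughly like this (a sketch only: the constraint, trigger, and referenced objects -- owners, audit_row() -- are made-up names; pre-9.1 servers lack format(), so plain concatenation with quote_ident() is used):

```sql
CREATE OR REPLACE FUNCTION foochild_setup(child text) RETURNS void AS $$
BEGIN
    -- Inherited tables do not inherit FK constraints, so add one per child:
    EXECUTE 'ALTER TABLE ' || quote_ident(child)
         || ' ADD CONSTRAINT ' || quote_ident(child || '_owner_fkey')
         || ' FOREIGN KEY (owner_id) REFERENCES owners(id)';
    -- Triggers are not inherited either:
    EXECUTE 'CREATE TRIGGER ' || quote_ident(child || '_audit')
         || ' BEFORE INSERT OR UPDATE ON ' || quote_ident(child)
         || ' FOR EACH ROW EXECUTE PROCEDURE audit_row()';
END;
$$ LANGUAGE plpgsql;

-- Run once for each new child table:
-- SELECT foochild_setup('foo_child_2007');
```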
Re: [GENERAL] more problems with count(*) on large table
Albe Laurenz wrote: > Alban Hertroys wrote: >> A. Kretschmer wrote: >>> Again: an index can't help! Because of MVCC: 'select count(*)' > without >>> WHERE-condition forces an seq. table-scan. >> That has very little to do with MVCC. >> >> [...] For that it makes no difference whether a seq >> scan or an index scan is performed - both cases need to check at the >> record level whether it's visible (where the seq scan is >> already looking at the actual record, of course). > > If you do not use MVCC (say, you use DB2), you need not check > the record itself because if it is there (which it is if there > is an index entry), it will be 'visible'. Still, that's not because of MVCC, but because of the way it is implemented in PostgreSQL. There has been talk in the past (regularly) about why the MVCC information is not in the index and whether it should be, see the ML archives. Besides, there are still many situations where a sequential scan (whether for count(*) or not) is faster than an index scan, no matter whether you have MVCC or not. As I said, MVCC has little to do with it. The real problem is that in postgres you cannot tell from an index whether a record is visible or not, while you can in DB2 (because it has an index entry or not). >> I pleed not guilty ;) > > Declined, sorry. Overruled, sorry. -- Alban Hertroys [EMAIL PROTECTED] magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World // ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] usage of indexes for inner joins
Sequence scans of an empty table are going to be faster than an index scan, so the database uses the sequence scan. Put some data in the tables (some thousands or millions of records) and then see if it uses an index scan. Ben ""Jan Theodore Galkowski"" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] >I fear this has been asked many times about PostgreSQL, and I have read > the docs about how indexes are supposed to be defined and used, but I > don't understand why the engine and optimizer is doing what it does in > the simplest of situations. Is it that its tuning is heavily data > dependent? > > My case of interest is more complicated, but I decided to create a toy > case to try to understand. Here it is: > > > -- Table "foo" DDL > > CREATE TABLE "public"."foo"( > > "projectid" int4 NOT NULL , > > "uid" int4 NOT NULL , > > "name" varchar(254) NOT NULL , > > "ver" varchar(127) NOT NULL , > > "startdate" date NOT NULL , > > "enddate" date NOT NULL , > > "status" varchar(254) NOT NULL , > > "percentdone" numeric(7,2) NOT NULL , > > "championuid" int4 NOT NULL , > > "pmuid" int4 NOT NULL , > > PRIMARY KEY ("projectid") > > ) WITHOUT OIDS; > > > -- Table "bignum" DDL > > CREATE TABLE "public"."bignum"( > > "thing" numeric(100) NOT NULL > > ) WITHOUT OIDS; > > CREATE INDEX "t" ON "public"."bignum" USING btree ("thing"); > > > Running > >EXPLAIN ANALYZE SELECT A.* FROM bignum B, foo A WHERE A.projectid >= B.thing; > > yields: > >Nested Loop (cost=0.00..15.51 rows=1 width=407) (actual >time=0.041..0.041 rows=0 loops=1) > > Join Filter: ((a.projectid)::numeric = b.thing) -> > >Seq Scan on bignum b (cost=0.00..1.01 rows=1 width=16) (actual >time=0.024..0.027 rows=1 loops=1) -> > >Seq Scan on foo a (cost=0.00..11.80 rows=180 width=407) (actual >time=0.003..0.003 rows=0 loops=1) > >Total runtime: .169 ms ; > > Like *how* *come*? There are indexes on both columns of the join. Is > it the NUMERIC datatype messing things up? 
Unlikely, as I've seen the > same with INTEGERs. > > If it is data dependent (these tables are presently empty), any > suggestions as to how to tune a database for unknown mixes of data? > > This is run on the Windows version of PG, but I'm seeing the same kind > of thing on Linux. > > Thanks. > > ---(end of broadcast)--- > TIP 6: explain analyze is your friend > ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
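[Editor's note] Two things worth trying once the tables hold data (a sketch; the cast assumes the numeric values actually fit in an int4). The plan above shows the planner casting projectid to numeric ("(a.projectid)::numeric = b.thing"), which defeats the btree index on foo.projectid; statistics also drive the plan choice, so refresh them after loading:

```sql
ANALYZE foo;
ANALYZE bignum;

-- Casting the numeric side to the indexed column's type lets the
-- planner consider an index scan on foo.projectid:
EXPLAIN ANALYZE
SELECT a.* FROM bignum b JOIN foo a ON a.projectid = b.thing::int4;
```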
[GENERAL] Re: ERROR: invalid byte sequence from psql - Works perfectly from pgAdmin III query window
Albe Laurenz wrote:
> Anoo Sivadasan Pillai wrote:
>> Why is Fun_ABC1 created while Fun_ABC12 raises the following error
>> when run through psql? (I could create both functions from the
>> PgAdmin III query window.)
>>
>> ERROR: invalid byte sequence for encoding "UTF8": 0x93
>
> Because the characters you entered into psql are not
> encoded in the character encoding that psql expects.
>
> You probably create the function in psql by executing an
> SQL script, right?
>
> From the error message I deduce that psql expects UTF8
> characters from you.
>
> You can do two things:
> 1) change the SQL script to UTF8 before running it
> 2) change the client encoding in psql before running the
>    SQL script. This is done with the SQL command
>    SET client_encoding='...'

Third option: stop using curly quotes in your SQL - even if it is just in comments.

Nis

---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] Query problem
Naz Gassiep wrote: > Aside from the messy nomenclature, is anyone able to spot why > the "sum" column from the first query is not returning 7, as > the second query suggests that it should? I know that this is > probably simple, and that It's probably going to jump out at > me the minute I hit "Send", but if I don't hit send, then I'll > never see it hehe. > > twerl=# SELECT ... SUM(contacts.id) ... FROM ... contacts ... > ... | sum | ... > ...-+-+-... > ... | 594 | ... > (1 row) > > twerl=# select count(*) from contacts where groupid = 3; > count > --- > 7 > (1 row) That's an easy one. I assume that there are three rows in table "contacts", and the values of the "id" column are 42, 123 and 429 (or similar). Yours, Laurenz Albe ---(end of broadcast)--- TIP 6: explain analyze is your friend
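[Editor's note] In other words, SUM(contacts.id) adds up the id values of the joined rows rather than counting them. To get the number of contacts per group, aggregate with count() instead (a sketch trimmed to the relevant columns of the original query):

```sql
SELECT contactgroups.id, contactgroups.name,
       count(contacts.id) AS contact_count  -- counts matching rows only;
                                            -- NULLs from the outer join
                                            -- are not counted
FROM contactgroups
LEFT OUTER JOIN contacts ON contactgroups.id = contacts.groupid
GROUP BY contactgroups.id, contactgroups.name;
```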
Re: [GENERAL] It's time to support GRANT SELECT,UPDATE,...,...,... ON database.* to username
<[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > > I don't care if it's part of the SQL standard or not. I don't care if > oracle does it or not. You're losing mysql converts as they go > through the tutorial and get to this point. Or worse, they just "grant > all" because it's easier, thus causing security holes. User > friendliness matters. > You can use the pgAdmin's grant wizard to do what you want. Regards, Ben BTW thanks for the polite e-mail. :-/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] ERROR: invalid byte sequence from psql - Works perfectly from pgAdmin III query window
Anoo Sivadasan Pillai wrote: > Why the Fun_ABC1 is created and Fun_ABC12 is raising the > following error, while run through psql, ( I Could create > both the functions from PgAdmin III query ) > > ERROR: invalid byte sequence for encoding "UTF8": 0x93 Because the characters you entered into psql are not encoded in the character encoding that psql expects. You probably create the function in psql by executing an SQL script, right? From the error message I deduce that psql expects UTF8 characters from you. You can do two things: 1) change the SQL script to UTF8 before running it 2) change the client encoding in psql before running the SQL script. This is done with the SQL command SET client_encoding='...' Yours, Laurenz Albe ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
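[Editor's note] Concretely: 0x93 is the Windows-1252 "smart quote" character, which suggests the script was saved in that encoding. A sketch of both options (the file names are made up):

```sql
-- Option 2, inside psql, before running the script:
SET client_encoding = 'WIN1252';
\i create_functions.sql

-- Option 1, converting the file itself beforehand (shell, using iconv):
--   iconv -f WINDOWS-1252 -t UTF-8 create_functions.sql > fixed.sql
```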
[GENERAL] Query problem
Aside from the messy nomenclature, is anyone able to spot why the "sum" column from the first query is not returning 7, as the second query suggests that it should? I know that this is probably simple, and that It's probably going to jump out at me the minute I hit "Send", but if I don't hit send, then I'll never see it hehe. twerl=# SELECT 'contactgroups', contactgroups.siteid, contactgroups.id, contactgroups.name, contactgroups.description, SUM(contacts.id), contactgroups.trashed FROM contactgroups LEFT OUTER JOIN contacts ON (contactgroups.id = contacts.groupid) WHERE contactgroups.trashed IS NOT NULL AND contactgroups.deleted IS NULL GROUP BY contactgroups.siteid, contactgroups.id, contactgroups.name, contactgroups.description, contactgroups.trashed; ?column?| siteid | id |name| description | sum | trashed ---++++-+-+--- contactgroups | 1 | 3 | Telechoice / Optus | | 594 | 2007-10-01 20:08:51.449825+10 (1 row) twerl=# select count(*) from contacts where groupid = 3; count --- 7 (1 row) Thanks, - Naz. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] windows and pg 8.2 (change database to another server)
On Sun, Sep 30, 2007 at 11:36:23PM +0200, Terry Yapt wrote:
> Hello all,
>
> I would like to move a pg database to another server.
>
> The source environment is: postgresql Windows v.8.2.4 (windows xp workstation).
> The target environment is: postgresql Windows v.8.2.5 (windows 2003 Server).
>
> I would like to do the migration without pg_dumpall, and I think I can do the migration process _only_ by copying this (after stopping the PostgreSQL service, of course):
> 1.- Data directory (c:\program files\postgresql\8.2\data).
> 2.- Any other tablespace folders/directories.
>
> But after copying those folders to my new server, the PostgreSQL service doesn't start, with a 'cannot create postmaster.pid'.
>
> First of all: I think this method is admissible. Isn't it?

It is.

> And second question: I think my problem is that some rights are wrong after copying the data folder. What are the right permissions to apply to the data folder?

Yes, most likely. You need to grant the postgres service account "Change" permissions (or Full Control, but Change is recommended) on the data directory. If you didn't do anything specifically, it will just have inherited from further up in the tree, which means that the service account only has "Read" access.

//Magnus

---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] ERROR: invalid byte sequence from psql - Works perfectly from pgAdmin III query window
Anoo Sivadasan Pillai wrote: Why the Fun_ABC1 is created and Fun_ABC12 is raising the following error, while run through psql, ( I Could create both the functions from PgAdmin III query ) ERROR: invalid byte sequence for encoding "UTF8": 0x93 HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding". Check your "client_encoding" setting in psql perhaps? Presumably the text-file contains non-UTF8 characters but the client_encoding says UTF8. Copying+pasting into pgAdmin is probably translating for you. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] more problems with count(*) on large table
Alban Hertroys wrote: > A. Kretschmer wrote: >> Again: an index can't help! Because of MVCC: 'select count(*)' without >> WHERE-condition forces an seq. table-scan. > > That has very little to do with MVCC. > > [...] For that it makes no difference whether a seq > scan or an index scan is performed - both cases need to check at the > record level whether it's visible (where the seq scan is > already looking at the actual record, of course). If you do not use MVCC (say, you use DB2), you need not check the record itself because if it is there (which it is if there is an index entry), it will be 'visible'. > I pleed not guilty ;) Declined, sorry. Yours, Laurenz Albe ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] more problems with count(*) on large table
"Alban Hertroys" <[EMAIL PROTECTED]> writes: > Mike Charnoky wrote: >> With respect to the ALTER TABLE SET STATISTICS... how do I determine a >> good value to use? This wasn't really clear in the pg docs. Also, do I >> need to run ANALYZE on the table after I change the statistics? >> >> Here are the EXPLAINs from the queries: >> >> db=# explain select count(*) from prediction_accuracy where evtime >> between '2007-09-25' and '2007-09-26'; >> >> QUERY PLAN >> >> - >> Aggregate (cost=475677.40..475677.41 rows=1 width=0) >>-> Index Scan using pred_acc_evtime_index on prediction_accuracy >> (cost=0.00..51.44 rows=12490383 width=0) >> Index Cond: ((evtime >= '2007-09-25 00:00:00-07'::timestamp >> with time zone) AND (evtime <= '2007-09-26 00:00:00-07'::timestamp with >> time zone)) >> (3 rows) >> >> db=# explain select count(*) from prediction_accuracy where evtime >> between '2007-09-26' and '2007-09-27'; >> >> QUERY PLAN >> >> - >> Aggregate (cost=486615.04..486615.05 rows=1 width=0) >>-> Index Scan using pred_acc_evtime_index on prediction_accuracy >> (cost=0.00..454671.07 rows=12777586 width=0) >> Index Cond: ((evtime >= '2007-09-26 00:00:00-07'::timestamp >> with time zone) AND (evtime <= '2007-09-27 00:00:00-07'::timestamp with >> time zone)) >> (3 rows) > > Interesting, same plans and no sequential scans... Yet totally different > run times. Almost as if something prevents you to read some records > between 26 and 27 september... Just to be sure we're looking at the right plan do this: \timing PREPARE stmt AS SELECT count(*) FROM prediction_accuracy WHERE evtime BETWEEN '2007-09-25' AND '2007-09-26'; EXPLAIN EXECUTE stmt; EXECUTE stmt; > I'm no expert on locking in Postgres, but AFAIK locks that prevent you > from reading records are rather rare and probably only issued from > userland code. Pages can be locked but never for very long. What other work is going on in this server? 
Is there anything which might be locking the table periodically (either a VACUUM FULL, CLUSTER, ALTER TABLE, etc?) Perhaps there's a hardware problem, is there anything in your system logs from dmesg? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[GENERAL] ERROR: invalid byte sequence from psql - Works perfectly from pgAdmin III query window
Why is Fun_ABC1 created while Fun_ABC12 raises the following error when run through psql? (I could create both functions from the pgAdmin III query window.)

ERROR: invalid byte sequence for encoding "UTF8": 0x93 HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".

CREATE OR REPLACE FUNCTION "Fun_ABC1"() RETURNS VOID AS $BODY$ /***Description : "Desc1" - Description contains character ***/

CREATE OR REPLACE FUNCTION "Fun_ABC12"() RETURNS VOID AS $BODY$ /***Description : "Desc1" - Description contains character ***/

Note: This was posted in pgsql-hackers; as per the suggestions there, reposting in -general.
Re: [GENERAL] COPY for .CSV files problem
On Monday 01 October 2007 05:20:52 pere roca wrote: > Hi everybody, > I want to enter a .CSV file using COPY comand and plpgsql. It enters > lat,lon and some data. In the CSV data there is no field (such as > "user_name" or current_time) that allow distinguish future queries for > different users (ex: select x,y from table where user_name=z; after > entering lat,lon I would like to update the_geom in another table using > the lat,lon of ONLY this user). > > I could update this kind of data ("user_name") using php but next user > could not enter his data by COPY (it only lets fill the destination table > if ALL the fields are the same as the CSV or text file). > So, I have a problem. I thought that may be I can reference/select data > of a SINGLE USER using parameters like default_time without need to insert > this data as a field in the table (otherwise, problems with COPY). Maybe > can I use catalog? COPY won't allow you to process your data. You either write something to insert record by record or you change your data to include the needed information before using COPY. I would change the input file to include the user name. -- Jorge Godoy <[EMAIL PROTECTED]> ---(end of broadcast)--- TIP 6: explain analyze is your friend
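[Editor's note] One common way to do this without editing the file is a staging table (an editor's sketch, not from the thread; the table, column, and file names are illustrative, and server-side COPY reads files as the server user, which usually requires superuser rights):

```sql
-- Load the raw CSV into a session-local staging table:
CREATE TEMP TABLE staging (lat numeric, lon numeric, data text);
COPY staging FROM '/path/to/file.csv' WITH CSV;

-- Stamp the whole batch with the current user while moving it
-- into the real table:
INSERT INTO measurements (user_name, lat, lon, data)
SELECT current_user, lat, lon, data FROM staging;
```

From an ordinary client session, psql's `\copy staging from 'file.csv' with csv` does the same load without server-side file access.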
Re: [GENERAL] 3 tables join update
rihad wrote:
> Richard Broersma Jr wrote:
> UPDATE Foo foo
> SET ...
> FROM LEFT JOIN Bar bar USING(common_field)
> WHERE blah='blah' AND bar.common_field IS NULL;
>
> ERROR: syntax error at or near "JOIN"
>
> I know I'm misusing UPDATE ... FROM because I don't really want Bar's
> values to go into Foo, but only using them for a conditional update
> (atomically I hope).

Oh, you mean:

UPDATE foo
SET ...
WHERE blah = 'blah'
AND NOT EXISTS (
    SELECT 1
    FROM bar
    WHERE foo.common_field = bar.common_field
)

-- Alban Hertroys [EMAIL PROTECTED] magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World // ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
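[Editor's note] The syntax error comes from the missing first table after FROM: in PostgreSQL's UPDATE ... FROM, the FROM list starts its own join tree, so a bare LEFT JOIN has nothing to attach to; and the FROM list is effectively inner-joined to the target table anyway, so unmatched rows would never be visible there. Besides NOT EXISTS, a NOT IN subselect expresses the same anti-join (a sketch with the thread's table names; the SET assignment is a placeholder, and beware that NOT IN misbehaves if bar.common_field can be NULL):

```sql
UPDATE foo
SET some_column = 'new value'
WHERE blah = 'blah'
  AND common_field NOT IN (SELECT common_field FROM bar);
```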
Re: [GENERAL] more problems with count(*) on large table
Mike Charnoky wrote: > With respect to the ALTER TABLE SET STATISTICS... how do I determine a > good value to use? This wasn't really clear in the pg docs. Also, do I > need to run ANALYZE on the table after I change the statistics? > > Here are the EXPLAINs from the queries: > > db=# explain select count(*) from prediction_accuracy where evtime > between '2007-09-25' and '2007-09-26'; > > QUERY PLAN > > - > Aggregate (cost=475677.40..475677.41 rows=1 width=0) >-> Index Scan using pred_acc_evtime_index on prediction_accuracy > (cost=0.00..51.44 rows=12490383 width=0) > Index Cond: ((evtime >= '2007-09-25 00:00:00-07'::timestamp > with time zone) AND (evtime <= '2007-09-26 00:00:00-07'::timestamp with > time zone)) > (3 rows) > > db=# explain select count(*) from prediction_accuracy where evtime > between '2007-09-26' and '2007-09-27'; > > QUERY PLAN > > - > Aggregate (cost=486615.04..486615.05 rows=1 width=0) >-> Index Scan using pred_acc_evtime_index on prediction_accuracy > (cost=0.00..454671.07 rows=12777586 width=0) > Index Cond: ((evtime >= '2007-09-26 00:00:00-07'::timestamp > with time zone) AND (evtime <= '2007-09-27 00:00:00-07'::timestamp with > time zone)) > (3 rows) Interesting, same plans and no sequential scans... Yet totally different run times. Almost as if something prevents you to read some records between 26 and 27 september... I'm no expert on locking in Postgres, but AFAIK locks that prevent you from reading records are rather rare and probably only issued from userland code. -- Alban Hertroys [EMAIL PROTECTED] magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World // ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] more problems with count(*) on large table
A. Kretschmer wrote:
> Again: an index can't help! Because of MVCC: 'select count(*)' without
> WHERE-condition forces a seq. table-scan.

That has very little to do with MVCC. If I understand correctly, MVCC is about the visibility of records in the current transaction. For that it makes no difference whether a seq scan or an index scan is performed - both cases need to check at the record level whether it's visible (where the seq scan is already looking at the actual record, of course).

The only difference MVCC makes here is that the balance between a seq scan and an index scan tips somewhat sooner in favour of the seq scan than on some other databases, because the index scan needs to look at the actual record for visibility.

I plead not guilty ;)

-- Alban Hertroys [EMAIL PROTECTED] magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World // ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[GENERAL] COPY for .CSV files problem
Hi everybody,

I want to load a .CSV file using the COPY command and plpgsql. It contains lat, lon and some data. In the CSV data there is no field (such as "user_name" or current_time) that allows distinguishing future queries for different users (ex: select x,y from table where user_name=z; after entering lat,lon I would like to update the_geom in another table using the lat,lon of ONLY this user).

I could add this kind of data ("user_name") using php, but then the next user could not enter his data by COPY (it only lets you fill the destination table if ALL the fields are the same as in the CSV or text file).

So, I have a problem. I thought that maybe I can reference/select the data of a SINGLE USER using parameters like default_time without needing to insert this data as a field in the table (otherwise, problems with COPY). Maybe I can use the catalog?

Thanks in advance,
Pere

-- View this message in context: http://www.nabble.com/COPY-for-.CSV-files-problem-tf4546778.html#a12974836 Sent from the PostgreSQL - general mailing list archive at Nabble.com. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster