[GENERAL] Is "query" a reserved word in 8.3 plpgsql?
Hi,

I saw the item in the release notes about the new "return query" syntax in pl/pgsql, but I didn't see any note about "query" now being reserved. Perhaps an explicit mention should be added?

I loaded a dump from 8.2.4 into 8.3b2 without error. However, every function that uses "query" as a variable name now fails with a syntax error when it is executed:

ERROR:  syntax error at or near "query" at character 151

My workaround will be to extract the functions from the dump, edit them to rename the variable, and load them into the new db.

-- todd
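For illustration, a hypothetical function of the kind that broke; everything about it (name, body, error behavior shown in the comments) is made up, only the variable name matters:

CREATE OR REPLACE FUNCTION count_rows(tbl text) RETURNS bigint AS $$
DECLARE
    query  text;      -- variable name that the 8.3 beta parser rejected
    result bigint;
BEGIN
    query := 'SELECT count(*) FROM ' || quote_ident(tbl);
    EXECUTE query INTO result;
    RETURN result;
END;
$$ LANGUAGE plpgsql;

-- On 8.3b2 the first call to such a function failed with
--   ERROR:  syntax error at or near "query"
-- Renaming the variable (e.g. to sql_text) avoided the error.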
Re: [HACKERS] [GENERAL] Is "query" a reserved word in 8.3 plpgsql?
Tom Lane wrote:
> It turned out to be a very easy change, so it's done: QUERY isn't a reserved word anymore.

Thanks for your help. :)

-- todd
[GENERAL] FYI: Load times for a largish DB in 8.2 vs. 8.3 vs. 8.4
Hi,

First, the numbers:

PG Version   Load time   pg_database_size   autovac
----------   ---------   ----------------   -------------
8.2.13       179 min     92,807,992,820     on
8.3.7        180 min     84,048,744,044     on (defaults)
8.4b2        206 min     84,028,995,344     on (defaults)
8.4b2        183 min     84,028,839,696     off

The bulk of the data is in 16 tables, each having about 55 million rows of the form (int, int, smallint, smallint, int, int, int). Each table has a single partial index on one of the integer columns. The dump file was 14GB compressed.

The loads were all done on the same machine, with the DB going on a pair of SATA drives in a RAID-0 stripe. The machine has 2 non-HT Xeons and 8GB RAM. maintenance_work_mem was set to 512MB in all three cases.

-- todd
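For reference, a hypothetical sketch of the shape of one of the 16 load tables described above; the table name, column names, and the partial-index predicate are illustrative assumptions, not taken from the original schema:

-- One of 16 tables of roughly this shape, ~55 million rows each.
CREATE TABLE obs_01 (
    a integer,
    b integer,
    c smallint,
    d smallint,
    e integer,
    f integer,
    g integer
);

-- A single partial index on one integer column, as described.
CREATE INDEX obs_01_e_idx ON obs_01 (e) WHERE e IS NOT NULL;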
Re: [GENERAL] FYI: Load times for a largish DB in 8.2 vs. 8.3 vs. 8.4
Vick Khera wrote:
> On Wed, Jun 17, 2009 at 10:50 AM, Todd A. Cook wrote:
>> The loads were all done on the same machine, with the DB going on a pair of SATA drives in a RAID-0 stripe. The machine has 2 non-HT Xeons and 8GB RAM. maintenance_work_mem was set to 512MB in all three cases.
>
> What if you double or triple the number of checkpoint segments?

checkpoint_segments was set to 128 for all tests.

-- todd
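For context, the two bulk-load settings reported in this thread would look like this in postgresql.conf; only these two values come from the thread, everything else was left unspecified:

# Bulk-load settings used for all of the runs above.
maintenance_work_mem = 512MB    # speeds up index builds during the restore
checkpoint_segments  = 128      # fewer checkpoints while COPY is running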
Re: [GENERAL] FYI: Load times for a largish DB in 8.2 vs. 8.3 vs. 8.4
Tom Lane wrote: "Todd A. Cook" writes: First, the numbers: PG VersionLoad time pg_database_size autovac -- 8.2.13179 min 92,807,992,820on 8.3.7 180 min 84,048,744,044on (defaults) 8.4b2 206 min 84,028,995,344on (defaults) 8.4b2 183 min 84,028,839,696off The bulk of the data is in 16 tables, each having about 55 million rows of the form (int, int, smallint, smallint, int, int, int). Each table has a single partial index on one of the integer columns. Given that it's multiple tables, it'd be possible for autovacuum to kick in and ANALYZE the data inserted into earlier tables while the later ones were still being loaded. If so, the discrepancy might be explained by 8.4's more-aggressive statistics target, which means that a background ANALYZE will take about 10x more work than before. If you have time to repeat the experiments, it would be interesting to see what happens with consistent default_statistics_target across 8.3 and 8.4. That would seem to be it: 8.4b2 183 min 84,028,897,040 on (defaults, default_statistics_target=10) I'll run the test on 8.3.7 with default_statistics_target=100 over the weekend. -- todd regards, tom lane . -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] FYI: Load times for a largish DB in 8.2 vs. 8.3 vs. 8.4
Todd A. Cook wrote:
> Tom Lane wrote:
>> "Todd A. Cook" writes:
>>> First, the numbers:
>>>
>>> PG Version   Load time   pg_database_size   autovac
>>> ----------   ---------   ----------------   -------------
>>> 8.2.13       179 min     92,807,992,820     on
>>> 8.3.7        180 min     84,048,744,044     on (defaults)
>>> 8.4b2        206 min     84,028,995,344     on (defaults)
>>> 8.4b2        183 min     84,028,839,696     off
>>>
>>> The bulk of the data is in 16 tables, each having about 55 million rows of the form (int, int, smallint, smallint, int, int, int). Each table has a single partial index on one of the integer columns.
>>
>> Given that it's multiple tables, it'd be possible for autovacuum to kick in and ANALYZE the data inserted into earlier tables while the later ones were still being loaded. If so, the discrepancy might be explained by 8.4's more-aggressive statistics target, which means that a background ANALYZE will take about 10x more work than before. If you have time to repeat the experiments, it would be interesting to see what happens with consistent default_statistics_target across 8.3 and 8.4.
>
> That would seem to be it:
>
> 8.4b2        183 min     84,028,897,040     on (defaults, default_statistics_target=10)
>
> I'll run the test on 8.3.7 with default_statistics_target=100 over the weekend.

The results for this are also consistent with Tom's theory:

8.3.7        205 min     84,048,866,924     on (defaults, default_statistics_target=100)

-- todd
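For anyone repeating the comparison, a minimal sketch of where the knob lives; which value to use depends on which direction you want to make the two versions consistent:

# In postgresql.conf, so that the background ANALYZE runs started by
# autovacuum pick it up as well:
default_statistics_target = 10     # the pre-8.4 default; 8.4 raised it to 100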
Re: [GENERAL] scoring differences between bitmasks
Ben wrote:
> Just the number of bits, not which ones. Basically, the hamming distance.

I see. Could you pre-compute the bit counts for the vectors in the table? You could count the bits in the search vector as Martijn suggested, and then do a lookup based on the count.

-- todd
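A rough sketch of the pre-computation idea, assuming the vectors live in a bit(256) column on a recent enough release that a bit value can be cast to text; the table and column names are made up, and the count could just as easily be filled in by the loading application:

-- Hypothetical schema: vectors stored as bit(256) with a precomputed popcount.
ALTER TABLE vectors ADD COLUMN nbits integer;
UPDATE vectors SET nbits = length(replace(vec::text, '0', ''));  -- count of set bits
CREATE INDEX vectors_nbits_idx ON vectors (nbits);

-- Since |popcount(a) - popcount(b)| can never exceed the hamming distance,
-- rows whose stored count differs from the search vector's count by more
-- than d can be skipped before doing any exact comparison, e.g.:
-- SELECT * FROM vectors WHERE nbits BETWEEN search_bits - d AND search_bits + d;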
Re: [GENERAL] scoring differences between bitmasks
Hi,

It may be that I don't understand your problem. :) Are you searching the table for the closest vector? If so, is "closeness" defined only as the number of bits that are different? Or, do you need to know which bits as well?

-- todd

Ben wrote:
> Hrm, I don't understand. Can you give me an example with some reasonably sized vectors?
>
> On Oct 2, 2005, at 10:59 AM, Todd A. Cook wrote:
>> Hi,
>> Try breaking the vector into 4 bigint columns and building a multi-column index, with index columns going from the most evenly distributed to the least. Depending on the distribution of your data, you may only need 2 or 3 columns in the index. If you can cluster the table in that order, it should be really fast. (This structure is a tabular form of a linked trie.)
>> -- todd
>> Ben wrote:
>>> Yes, that's the straightforward way to do it. But given that my vectors are 256 bits in length, and that I'm going to eventually have about 4 million of them to search through, I was hoping greater minds than mine had figured out how to do it faster, or how to compute some kind of indexing... somehow.
Re: [GENERAL] scoring differences between bitmasks
Hi,

Try breaking the vector into 4 bigint columns and building a multi-column index, with index columns going from the most evenly distributed to the least. Depending on the distribution of your data, you may only need 2 or 3 columns in the index. If you can cluster the table in that order, it should be really fast. (This structure is a tabular form of a linked trie.)

-- todd

Ben wrote:
> Yes, that's the straightforward way to do it. But given that my vectors are 256 bits in length, and that I'm going to eventually have about 4 million of them to search through, I was hoping greater minds than mine had figured out how to do it faster, or how to compute some kind of indexing... somehow.
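A rough sketch of the layout being suggested; all names are made up, and packing the 256-bit value into the four words is left to the application:

-- Hypothetical layout: one 256-bit vector stored as four bigint words.
CREATE TABLE vec256 (
    w0 bigint NOT NULL,    -- put the most evenly distributed word first
    w1 bigint NOT NULL,
    w2 bigint NOT NULL,
    w3 bigint NOT NULL     -- least evenly distributed word last
);

CREATE INDEX vec256_idx ON vec256 (w0, w1, w2, w3);
CLUSTER vec256 USING vec256_idx;   -- newer syntax; older releases spell it CLUSTER vec256_idx ON vec256

-- An exact-match probe can then walk the index left to right:
-- SELECT * FROM vec256 WHERE w0 = $1 AND w1 = $2 AND w2 = $3 AND w3 = $4;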
Re: [GENERAL] scoring differences between bitmasks
Hi,

Sorry for being so obtuse; my 9 year old has been helping me with my work today. :)

The hamming distance between two bit vectors can also be found as the sum of distances between sub-vectors. Let's assume you're looking for all vectors less than d bits away from a search vector, and we'll divide the vectors into k-bit subvectors. The value of k will depend on d, how sparse your data is, and how much memory you have for lookup tables. The goal is to eliminate the vectors that are more than d bits away. (This assumes that most of the search vectors are more than d bits away; for the opposite case, just reverse the sense of the comparisons below.) Compute the set of k-bit values that are within d bits of each subvector of the search vector; since the per-subvector distances only add up, any stored vector whose subvector at some position falls outside the corresponding set is already more than d bits away and can be eliminated.

Ben wrote:
> Sure, but unless I can figure out some way to choose a small number of vectors, I'm left with computing the full N^2 set. Given that I'm shooting for N to be 4 million or larger, that's a lot of data to store.
>
> On Oct 2, 2005, at 12:14 PM, Todd A. Cook wrote:
>> Ben wrote:
>>> Just the number of bits, not which ones. Basically, the hamming distance.
>>
>> I see. Could you pre-compute the bit counts for the vectors in the table? You could count the bits in the search vector as Martijn suggested, and then do a lookup based on the count.
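To make the sum-of-subvector-distances point concrete, here is a sketch reusing the hypothetical 4x-bigint layout from earlier in the thread; popcount64 and the threshold are invented for illustration, and this version filters by scanning rather than by the lookup tables described above:

-- Hypothetical helper: population count of a bigint, checking each bit position.
CREATE OR REPLACE FUNCTION popcount64(x bigint) RETURNS integer AS $$
DECLARE
    n integer := 0;
BEGIN
    FOR i IN 0..63 LOOP
        IF ((x >> i) & 1) = 1 THEN
            n := n + 1;
        END IF;
    END LOOP;
    RETURN n;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

-- The hamming distance over the four words is just the sum of the per-word
-- distances; $1..$4 are the words of the search vector, 10 is an example d.
SELECT *
FROM   vec256
WHERE  popcount64(w0 # $1) + popcount64(w1 # $2)
     + popcount64(w2 # $3) + popcount64(w3 # $4) < 10;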
Re: [GENERAL] Out of memory error in 8.1.0 Win32
Hi,

I am consistently running into out-of-memory issues in 8.1.4 running on RHEL3 and 8.0.5 on RHEL4. The logs show entries like this:

  AggContext: -2130714624 total in 271 blocks; 9688 free (269 chunks); -2130724312 used
  TupleHashTable: 893902872 total in 119 blocks; 1088688 free (449 chunks); 892814184 used

which looks mighty suspicious to me. :-; I can provide a self-contained test case if anyone wants to look at it.

-- todd

Relyea, Mike wrote:
> So what's my next step? How do I track down what is causing this problem?
>
> -----Original Message-----
> From: Qingqing Zhou [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, June 21, 2006 11:01 PM
> To: Relyea, Mike
> Cc: pgsql-general@postgresql.org; Tom Lane
> Subject: RE: [GENERAL] Out of memory error in 8.1.0 Win32
>
> On Wed, 21 Jun 2006, Relyea, Mike wrote:
>> ExecutorState: 550339936 total in 123 blocks; 195003544 free (740135 chunks); 355336392 used
>> HashBatchContext: 293593176 total in 44 blocks; 3107384 free (80 chunks); 290485792 used
>> TIDBitmap: 2088960 total in 8 blocks; 924720 free (27 chunks); 1164240 used
>> HashBatchContext: 41877664 total in 11 blocks; 4845648 free (15 chunks); 37032016 used
>> TIDBitmap: 2088960 total in 8 blocks; 1012120 free (27 chunks); 1076840 used
>
> The same problem. ExecutorState uses much more memory than we expect -- but not sure where they are from :-(
>
> Regards,
> Qingqing
Re: [GENERAL] Out of memory error in 8.1.0 Win32
Tom Lane wrote:
> Misestimated hash aggregation, perhaps? What is the query and what does EXPLAIN show for it? What have you got work_mem set to?

oom_test=> \d oom_tab
      Table "public.oom_tab"
 Column |  Type   | Modifiers
--------+---------+-----------
 val    | integer |

oom_test=> explain select val,count(*) from oom_tab group by val;
                               QUERY PLAN
-------------------------------------------------------------------------
 HashAggregate  (cost=1163446.13..1163448.63 rows=200 width=4)
   ->  Seq Scan on oom_tab  (cost=0.00..867748.42 rows=59139542 width=4)

The row estimate for oom_tab is close to the actual value. Most of the values are unique, however, so the result should have around 59M rows too.

I've tried it with work_mem set to 32M, 512M, 1G, and 2G. It fails in all cases, but it hits the failure point quicker with work_mem=32M.

-- todd
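As an aside, one way to see whether the rows=200 group estimate is just the planner's no-statistics default is to ANALYZE and then look at pg_stats; the query is generic, and the numbers it returns depend entirely on the data:

ANALYZE oom_tab;

SELECT attname, n_distinct
FROM   pg_stats
WHERE  schemaname = 'public' AND tablename = 'oom_tab';

-- After an ANALYZE, n_distinct near -1 means "almost every row is unique",
-- and the GROUP BY estimate should jump from the default 200 to something
-- close to the table's row count.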
Re: [GENERAL] Out of memory error in 8.1.0 Win32
Tom Lane wrote: "Todd A. Cook" <[EMAIL PROTECTED]> writes: oom_test=> explain select val,count(*) from oom_tab group by val; QUERY PLAN - HashAggregate (cost=1163446.13..1163448.63 rows=200 width=4) -> Seq Scan on oom_tab (cost=0.00..867748.42 rows=59139542 width=4) The row estimitate for oom_tab is close to the actual value. Most of the values are unique, however, so the result should have around 59M rows too. Well, that's the problem right there :-(. Have you ANALYZEd this table? My production table and query are more complex. In the original, the query above was in a sub-select; the work-around was to create a temp table with the sub-query results, analyze it, and then do the larger query based off of the temp table. There have been off and on discussions on the pg lists about out of memory issues (see http://archives.postgresql.org/pgsql-bugs/2006-03/msg00102.php). I was just offering my test case as an example in case it might be of any use in tracking those problems down. :) -- todd ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Out of memory error in 8.1.0 Win32
Tom Lane wrote:
> Hmm. One of the things that's on my TODO list is to make the planner smarter about drilling down into sub-selects to extract statistics. I think that's what's called for here, but your example has eliminated all the interesting details. Can you show us the actual query, its EXPLAIN plan, and the definitions of the tables/views involved?

I've already altered the queries that ran into this problem. I'll dig the old ones out of CVS and send them to you tomorrow.

-- todd