[GENERAL] Is "query" a reserved word in 8.3 plpgsql?

2007-11-09 Thread Todd A. Cook

Hi,

I saw the item in the release notes about the new "return query" syntax in
pl/pgsql, but I didn't see any note about "query" being reserved now.
Perhaps an explicit mention should be added?

I loaded a dump from 8.2.4 into 8.3b2 without error.  However, every function
that uses "query" as a variable name now fails with a syntax error when it is
executed:

ERROR:  syntax error at or near "query" at character 151
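For illustration, a minimal sketch of the kind of function that breaks (the
function name and body here are hypothetical, not taken from the dump):

CREATE OR REPLACE FUNCTION count_rows(tbl text) RETURNS bigint AS $$
DECLARE
    query  text;    -- "query" as a variable name is what 8.3b2 rejects
    result bigint;
BEGIN
    query := 'SELECT count(*) FROM ' || quote_ident(tbl);
    EXECUTE query INTO result;
    RETURN result;
END;
$$ LANGUAGE plpgsql;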

My workaround will be to extract the functions from the dump, edit them to
rename the variable, and load them into the new db.

-- todd



Re: [HACKERS] [GENERAL] Is "query" a reserved word in 8.3 plpgsql?

2007-11-12 Thread Todd A. Cook

Tom Lane wrote:


It turned out to be a very easy change, so it's done: QUERY isn't a
reserved word anymore.


Thanks for your help. :)

-- todd



[GENERAL] FYI: Load times for a largish DB in 8.2 vs. 8.3 vs. 8.4

2009-06-17 Thread Todd A. Cook

Hi,

First, the numbers:

 PG Version   Load time   pg_database_size   autovac
 ----------   ---------   ----------------   -------------
 8.2.13       179 min     92,807,992,820     on
 8.3.7        180 min     84,048,744,044     on (defaults)
 8.4b2        206 min     84,028,995,344     on (defaults)
 8.4b2        183 min     84,028,839,696     off

The bulk of the data is in 16 tables, each having about 55 million rows of
the form (int, int, smallint, smallint, int, int, int).  Each table has a
single partial index on one of the integer columns.  The dump file was 14GB
compressed.
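For reference, a sketch of one table's shape as described (column names and
the index predicate are hypothetical; the originals weren't given):

CREATE TABLE data_01 (
    a int, b int,
    c smallint, d smallint,
    e int, f int, g int
);
-- the single partial index on one of the integer columns
CREATE INDEX data_01_e_idx ON data_01 (e) WHERE e > 0;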

The loads were all done on the same machine, with the DB going on a pair
of SATA drives in a RAID-0 stripe.  The machine has 2 non-HT Xeons and
8GB RAM.  maintenance_work_mem was set to 512MB in all three cases.

-- todd



Re: [GENERAL] FYI: Load times for a largish DB in 8.2 vs. 8.3 vs. 8.4

2009-06-18 Thread Todd A. Cook

Vick Khera wrote:

On Wed, Jun 17, 2009 at 10:50 AM, Todd A. Cook wrote:

The loads were all done on the same machine, with the DB going on a pair
of SATA drives in a RAID-0 stripe.  The machine has 2 non-HT Xeons and
8GB RAM.  maintenance_work_mem was set to 512MB in all three cases.


What if you double or triple the number of checkpoint segments?



checkpoint_segments was set to 128 for all tests.
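For completeness, the non-default settings common to all of the runs, as
given in this thread (postgresql.conf):

checkpoint_segments  = 128
maintenance_work_mem = 512MB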

-- todd




Re: [GENERAL] FYI: Load times for a largish DB in 8.2 vs. 8.3 vs. 8.4

2009-06-19 Thread Todd A. Cook

Tom Lane wrote:

"Todd A. Cook"  writes:

First, the numbers:



  PG Version   Load time   pg_database_size   autovac
  ----------   ---------   ----------------   -------------
  8.2.13       179 min     92,807,992,820     on
  8.3.7        180 min     84,048,744,044     on (defaults)
  8.4b2        206 min     84,028,995,344     on (defaults)
  8.4b2        183 min     84,028,839,696     off



The bulk of the data is in 16 tables, each having about 55 million rows of
the form (int, int, smallint, smallint, int, int, int).  Each table has a
single partial index on one of the integer columns.


Given that it's multiple tables, it'd be possible for autovacuum to
kick in and ANALYZE the data inserted into earlier tables while the
later ones were still being loaded.  If so, the discrepancy might be
explained by 8.4's more-aggressive statistics target, which means that
a background ANALYZE will take about 10x more work than before.

If you have time to repeat the experiments, it would be interesting to
see what happens with consistent default_statistics_target across 8.3
and 8.4.


That would seem to be it:

 8.4b2        183 min     84,028,897,040     on (defaults, default_statistics_target=10)

I'll run the test on 8.3.7 with default_statistics_target=100 over the weekend.
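For anyone reproducing this, the knob is a single postgresql.conf setting;
10 was the default through 8.3, and 8.4 raised it to 100:

default_statistics_target = 10    # match the old default on an 8.4 test run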

-- todd










Re: [GENERAL] FYI: Load times for a largish DB in 8.2 vs. 8.3 vs. 8.4

2009-06-22 Thread Todd A. Cook

Todd A. Cook wrote:

Tom Lane wrote:

"Todd A. Cook"  writes:

First, the numbers:
  PG Version   Load time   pg_database_size   autovac
  ----------   ---------   ----------------   -------------
  8.2.13       179 min     92,807,992,820     on
  8.3.7        180 min     84,048,744,044     on (defaults)
  8.4b2        206 min     84,028,995,344     on (defaults)
  8.4b2        183 min     84,028,839,696     off
The bulk of the data is in 16 tables, each having about 55 million rows of
the form (int, int, smallint, smallint, int, int, int).  Each table has a
single partial index on one of the integer columns.

Given that it's multiple tables, it'd be possible for autovacuum to
kick in and ANALYZE the data inserted into earlier tables while the
later ones were still being loaded.  If so, the discrepancy might be
explained by 8.4's more-aggressive statistics target, which means that
a background ANALYZE will take about 10x more work than before.

If you have time to repeat the experiments, it would be interesting to
see what happens with consistent default_statistics_target across 8.3
and 8.4.


That would seem to be it:

  8.4b2        183 min     84,028,897,040     on (defaults, default_statistics_target=10)

I'll run the test on 8.3.7 with default_statistics_target=100 over the weekend.


The results for this are also consistent with Tom's theory:

   8.3.7        205 min     84,048,866,924     on (defaults, default_statistics_target=100)

-- todd



Re: [GENERAL] scoring differences between bitmasks

2005-10-02 Thread Todd A. Cook

Ben wrote:

Just the number of bits, not which ones. Basically, the hamming distance.


I see.  Could you pre-compute the bit counts for the vectors in the table?
You could count the bits in the search vector as Martijn suggested, and then
do a lookup based on the count.
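A rough sketch of that pre-filter, assuming the vectors are stored as
bit(256) (table and column names here are invented):

-- precompute the population count once per row
ALTER TABLE vectors ADD COLUMN nbits int;
UPDATE vectors SET nbits = length(replace(vec::text, '0', ''));
CREATE INDEX vectors_nbits_idx ON vectors (nbits);

-- rows within hamming distance d of a search vector with s bits set
-- must have a count between s - d and s + d, e.g. s = 40, d = 5:
SELECT * FROM vectors WHERE nbits BETWEEN 35 AND 45;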

-- todd






Re: [GENERAL] scoring differences between bitmasks

2005-10-02 Thread Todd A. Cook

Hi,

It may be that I don't understand your problem. :)

Are you searching the table for the closest vector?  If so, is
"closeness" defined only as the number of bits that are different?
Or, do you need to know which bits as well?

-- todd


Ben wrote:
Hrm, I don't understand. Can you give me an example with some reasonably
sized vectors?


On Oct 2, 2005, at 10:59 AM, Todd A. Cook wrote:


Hi,

Try breaking the vector into 4 bigint columns and building a multi-column
index, with index columns going from the most evenly distributed to the
least.  Depending on the distribution of your data, you may only need 2
or 3 columns in the index.  If you can cluster the table in that order,
it should be really fast.  (This structure is a tabular form of a linked
trie.)

-- todd


Ben wrote:

Yes, that's the straightforward way to do it. But given that my vectors
are 256 bits in length, and that I'm going to eventually have about 4
million of them to search through, I was hoping greater minds than mine
had figured out how to do it faster, or how to compute some kind of
indexing... somehow.





Re: [GENERAL] scoring differences between bitmasks

2005-10-02 Thread Todd A. Cook

Hi,

Try breaking the vector into 4 bigint columns and building a multi-column
index, with index columns going from the most evenly distributed to the
least.  Depending on the distribution of your data, you may only need 2
or 3 columns in the index.  If you can cluster the table in that order,
it should be really fast.  (This structure is a tabular form of a linked
trie.)
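Under those assumptions, the layout would look something like this (names
invented; the 256-bit vector is split into four 64-bit words):

CREATE TABLE vecs (
    w0 bigint,   -- most evenly distributed word first
    w1 bigint,
    w2 bigint,
    w3 bigint
);
CREATE INDEX vecs_w_idx ON vecs (w0, w1, w2, w3);
CLUSTER vecs_w_idx ON vecs;   -- pre-8.3 CLUSTER syntax

-- a probe then descends the "trie" one word at a time:
SELECT * FROM vecs WHERE w0 = 12345 AND w1 = 67890;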

-- todd


Ben wrote:
Yes, that's the straightforward way to do it. But given that my vectors
are 256 bits in length, and that I'm going to eventually have about 4
million of them to search through, I was hoping greater minds than mine
had figured out how to do it faster, or how to compute some kind of
indexing... somehow.




Re: [GENERAL] scoring differences between bitmasks

2005-10-02 Thread Todd A. Cook

Hi,

Sorry for being so obtuse; my 9-year-old has been helping me with my
work today. :)

The hamming distance between two bit vectors can also be found as the
sum of distances between sub-vectors.  Let's assume you're looking
for all vectors less than d bits away from a search vector, and we'll
divide the vectors into k-bit subvectors.  The value of k will depend
on d, how sparse your data is, and how much memory you have for lookup
tables.  The goal is to eliminate the vectors that are more than d
bits away.  (This assumes that most of the search vectors are more
than d bits away; for the opposite case, just reverse the sense of
the comparisons below.)  Compute the set of k-bit vectors that are
within d bits of each subvector of the search vector; a row whose
subvector falls outside that set can't be within d bits overall.
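A rough SQL sketch of the per-subvector bound (lookup tables elided),
reusing the 4-bigint layout suggested earlier in this thread; the function
name, popcount idiom, and constants are mine:

-- hamming distance of two 64-bit words: XOR the bit strings (#) and
-- count the '1' characters in the text rendering
CREATE FUNCTION hamming64(a bigint, b bigint) RETURNS int AS $$
    SELECT length(replace((($1::bit(64)) # ($2::bit(64)))::text, '0', ''));
$$ LANGUAGE sql IMMUTABLE;

-- the total distance is the sum over the words, so any single word
-- more than d bits away already disqualifies the row (d = 5 here):
SELECT * FROM vecs
WHERE hamming64(w0, 1234567) <= 5
  AND hamming64(w1, 7654321) <= 5
  AND hamming64(w2, 1122334) <= 5
  AND hamming64(w3, 5566778) <= 5;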

Ben wrote:

Sure, but unless I can figure out some way to choose a small number of
vectors, I'm left with computing the full N^2 set. Given that I'm
shooting for N to be 4 million or larger, that's a lot of data to
store.


On Oct 2, 2005, at 12:14 PM, Todd A. Cook wrote:


Ben wrote:

Just the number of bits, not which ones. Basically, the hamming distance.




I see.  Could you pre-compute the bit counts for the vectors in the table?
You could count the bits in the search vector as Martijn suggested, and then
do a lookup based on the count.




Re: [GENERAL] Out of memory error in 8.1.0 Win32

2006-06-22 Thread Todd A. Cook

Hi,

I am consistently running into out-of-memory issues in 8.1.4 running on
RHEL3 and 8.0.5 on RHEL4.  The logs show entries like this:

AggContext: -2130714624 total in 271 blocks; 9688 free (269 chunks); -2130724312 used
TupleHashTable: 893902872 total in 119 blocks; 1088688 free (449 chunks); 892814184 used

which looks mighty suspicious to me. :-;  I can provide a self-contained
test case if anyone wants to look at it.

-- todd


Relyea, Mike wrote:

So what's my next step?  How do I track down what is causing this
problem? 


-Original Message-
From: Qingqing Zhou [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, June 21, 2006 11:01 PM

To: Relyea, Mike
Cc: pgsql-general@postgresql.org; Tom Lane
Subject: RE: [GENERAL] Out of memory error in 8.1.0 Win32 




On Wed, 21 Jun 2006, Relyea, Mike wrote:


ExecutorState: 550339936 total in 123 blocks; 195003544 free (740135 chunks); 355336392 used
HashBatchContext: 293593176 total in 44 blocks; 3107384 free (80 chunks); 290485792 used
TIDBitmap: 2088960 total in 8 blocks; 924720 free (27 chunks); 1164240 used
HashBatchContext: 41877664 total in 11 blocks; 4845648 free (15 chunks); 37032016 used
TIDBitmap: 2088960 total in 8 blocks; 1012120 free (27 chunks); 1076840 used


The same problem. ExecutorState uses much more memory than we expect --
but not sure where they are from :-(

Regards,
Qingqing







Re: [GENERAL] Out of memory error in 8.1.0 Win32

2006-06-22 Thread Todd A. Cook

Tom Lane wrote:


Misestimated hash aggregation, perhaps?  What is the query and what does
EXPLAIN show for it?  What have you got work_mem set to?


oom_test=> \d oom_tab
    Table "public.oom_tab"
 Column |  Type   | Modifiers
--------+---------+-----------
 val    | integer |

oom_test=> explain select val,count(*) from oom_tab group by val;
                              QUERY PLAN
------------------------------------------------------------------------
 HashAggregate  (cost=1163446.13..1163448.63 rows=200 width=4)
   ->  Seq Scan on oom_tab  (cost=0.00..867748.42 rows=59139542 width=4)

The row estimate for oom_tab is close to the actual value.  Most of
the values are unique, however, so the result should have around 59M
rows too.

I've tried it with work_mem set to 32M, 512M, 1G, and 2G.  It fails in
all cases, but it hits the failure point quicker with work_mem=32M.
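For the archives, a minimal sketch of the test case (my reconstruction from
the description above; the row count and value range are arbitrary):

CREATE TABLE oom_tab (val integer);
INSERT INTO oom_tab
    SELECT (random() * 2000000000)::int
    FROM generate_series(1, 59000000);
-- without stats, the planner assumes 200 distinct groups and picks
-- HashAggregate, which then has to hold ~59M groups in memory:
EXPLAIN SELECT val, count(*) FROM oom_tab GROUP BY val;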

-- todd



Re: [GENERAL] Out of memory error in 8.1.0 Win32

2006-06-22 Thread Todd A. Cook

Tom Lane wrote:

"Todd A. Cook" <[EMAIL PROTECTED]> writes:

oom_test=> explain select val,count(*) from oom_tab group by val;
                              QUERY PLAN
------------------------------------------------------------------------
 HashAggregate  (cost=1163446.13..1163448.63 rows=200 width=4)
   ->  Seq Scan on oom_tab  (cost=0.00..867748.42 rows=59139542 width=4)



The row estimate for oom_tab is close to the actual value.  Most of
the values are unique, however, so the result should have around 59M
rows too.


Well, that's the problem right there :-(.  Have you ANALYZEd this table?


My production table and query are more complex.  In the original, the
query above was in a sub-select; the work-around was to create a temp
table with the sub-query results, analyze it, and then do the larger
query based off of the temp table.
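A minimal sketch of that workaround, using the test-case names since the
production schema isn't shown:

CREATE TEMP TABLE grouped AS
    SELECT val, count(*) AS n
    FROM oom_tab
    GROUP BY val;
ANALYZE grouped;   -- the planner now sees the real number of groups
-- the larger query reads from the analyzed temp table instead of the
-- sub-select, so its hash sizes are estimated sensibly:
SELECT n, count(*) FROM grouped GROUP BY n;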

There have been off and on discussions on the pg lists about out of
memory issues (see 
http://archives.postgresql.org/pgsql-bugs/2006-03/msg00102.php).
I was just offering my test case as an example in case it might be of
any use in tracking those problems down. :)

-- todd




Re: [GENERAL] Out of memory error in 8.1.0 Win32

2006-06-22 Thread Todd A. Cook

Tom Lane wrote:


Hmm.  One of the things that's on my TODO list is to make the planner
smarter about drilling down into sub-selects to extract statistics.
I think that's what's called for here, but your example has eliminated
all the interesting details.  Can you show us the actual query, its
EXPLAIN plan, and the definitions of the tables/views involved?


I've already altered the queries that ran into this problem.  I'll
dig the old ones out of CVS and send them to you tomorrow.

-- todd
