Re: [HACKERS] Planner estimates and cast operations
On Mon, Sep 04, 2006 at 19:09:16 +0200, Hans-Juergen Schoenig <[EMAIL PROTECTED]> wrote:
> Setting work_mem to 2 GB does not help here ;) I set it to the max
> value on 8.0. This was my first try too. The problem is that there is
> no magic switch to mislead the planner a little without hacking the
> system stats (which is not what people should do, I would say ;) ).

Did you combine that with telling the planner not to use sorts? I am not
sure that will really work for GROUP BY, but it is an easy test. You can
run an EXPLAIN to see which plan it would choose, without actually
executing the query, in case it picks the poor plan again.

> My question is: is adding hooks for selectivity a feasible way of
> dealing with things like that?

I think the expectation is that you create a functional index; that is
how you tell the system to keep statistics for particular expressions.
I don't think data on the most common values is kept for functional
indexes at the moment, but the index itself will still give the planner
clues about the data.
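For reference, a minimal sketch of the test suggested above. The table
and column names are placeholders taken from the thread; enable_sort is
a standard planner setting:

-- Discourage sort-based plans for this session only, then look at the
-- chosen plan without executing the query.
SET enable_sort = off;

EXPLAIN
SELECT some_timestamp::date
FROM very_large_table
GROUP BY some_timestamp::date;

-- Restore the default afterwards.
RESET enable_sort;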
Re: [HACKERS] Planner estimates and cast operations
On Sep 4, 2006, at 7:04 PM, Bruno Wolff III wrote:
> On Mon, Sep 04, 2006 at 17:19:37 +0200, Hans-Juergen Schoenig <[EMAIL PROTECTED]> wrote:
>> I thought about creating an index on the expression, but the problem
>> is that this is hardly feasible. In 8.0 (what I have here) this would
>> block the table and I would run
>
> That may be hard to deal with.

It is ... but the problem is not primarily that I have trouble with one
particular query; that can be solved somehow. I am thinking about GROUP
BY and estimates in general here ... just wondering if there is a chance
to improve ...

>> out of disk space as well. This is a 600 GB beast :(
>
> I wouldn't expect this to be a problem. If you have 10^9 rows, I would
> expect the index to be less than 10% of your current size. If you are
> so close to your disk space limit that that is a problem, you have a
> problem in any case.

The index itself is not too large, but while it is being built the data
is written several times. That is no fun when dealing with this much
data ...

>> What about the planner approach? This would solve the problem for
>> some other issues as well. An index might not be flexible enough :(
>
> If you disable sorting you might be able to get it to switch plans.
> Lying about the amount of work memory, so that the planner thinks the
> hash will fit in memory despite misguessing the number of buckets,
> might also help.

Setting work_mem to 2 GB does not help here ;) I set it to the max value
on 8.0. This was my first try too. The problem is that there is no magic
switch to mislead the planner a little without hacking the system stats
(which is not what people should do, I would say ;) ).

My question is: is adding hooks for selectivity a feasible way of
dealing with things like that?

hans
Re: [HACKERS] Planner estimates and cast operations
On Mon, Sep 04, 2006 at 17:19:37 +0200, Hans-Juergen Schoenig <[EMAIL PROTECTED]> wrote:
> I thought about creating an index on the expression, but the problem
> is that this is hardly feasible. In 8.0 (what I have here) this would
> block the table and I would run

That may be hard to deal with.

> out of disk space as well. This is a 600 GB beast :(

I wouldn't expect this to be a problem. If you have 10^9 rows, I would
expect the index to be less than 10% of your current size. If you are so
close to your disk space limit that that is a problem, you have a
problem in any case.

> What about the planner approach? This would solve the problem for some
> other issues as well. An index might not be flexible enough :(

If you disable sorting you might be able to get it to switch plans.
Lying about the amount of work memory, so that the planner thinks the
hash will fit in memory despite misguessing the number of buckets, might
also help.
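A minimal sketch of the work_mem experiment described above, assuming
PostgreSQL 8.0 (where the value is an integer in kilobytes) and the
placeholder query from the thread:

-- Inflate work_mem for this session so the planner believes the hash
-- aggregate fits in memory; roughly 2 GB expressed in KB.
SET work_mem = 2000000;

EXPLAIN
SELECT some_timestamp::date
FROM very_large_table
GROUP BY some_timestamp::date;

-- Put the setting back when done.
RESET work_mem;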
Re: [HACKERS] Planner estimates and cast operations
hi tom ...

I thought about creating an index on the expression, but the problem is
that this is hardly feasible. In 8.0 (what I have here) this would block
the table, and I would run out of disk space as well. This is a 600 GB
beast :(

What about the planner approach? This would solve the problem for some
other issues as well. An index might not be flexible enough :(

many thanks,

hans

On Sep 4, 2006, at 4:57 PM, Tom Lane wrote:
> Hans-Juergen Schoenig <[EMAIL PROTECTED]> writes:
>> Consider the following:
>>
>> SELECT some_timestamp::date FROM very_large_table
>> GROUP BY some_timestamp::date
>>
>> My very_large_table has around 1 billion entries. The problem is that
>> the planner takes the (correct) estimates for the timestamp column.
>> This rules out a HashAggregate because the data set seems too large
>> for work_mem. What the planner cannot know is that the number of days
>> is quite limited (in my case around 1000 different values). I wonder
>> how to teach the planner to take the cast into consideration.
>
> Create an index on that expression.
>
> regression=# create table foo(x) as select x * '864 sec'::interval + now()::timestamp from generate_series(1,10000) x;
> SELECT
> regression=# analyze foo;
> ANALYZE
> regression=# explain select x::date from foo group by x::date;
>                            QUERY PLAN
> -----------------------------------------------------------------
>  HashAggregate  (cost=205.00..330.00 rows=10000 width=8)
>    ->  Seq Scan on foo  (cost=0.00..180.00 rows=10000 width=8)
> (2 rows)
>
> regression=# create index fooi on foo((x::date));
> CREATE INDEX
> regression=# analyze foo;
> ANALYZE
> regression=# explain select x::date from foo group by x::date;
>                            QUERY PLAN
> -----------------------------------------------------------------
>  HashAggregate  (cost=205.00..206.26 rows=101 width=8)
>    ->  Seq Scan on foo  (cost=0.00..180.00 rows=10000 width=8)
> (2 rows)
>
> regression=#
>
> I had to cheat a little bit here: I tried to do this example with a
> timestamptz column, and the index creation failed because timestamptz
> to date isn't immutable (it depends on TimeZone). If yours is too, you
> could perhaps do something involving AT TIME ZONE to generate an
> immutable conversion to date.
>
> It would perhaps make sense to provide a way to cue ANALYZE to compute
> stats on expressions that aren't actually being indexed, but I see no
> good reason to limit our attention to cast expressions.
>
> regards, tom lane
Re: [HACKERS] Planner estimates and cast operations
Hans-Juergen Schoenig <[EMAIL PROTECTED]> writes:
> Consider the following:
>
> SELECT some_timestamp::date FROM very_large_table
> GROUP BY some_timestamp::date
>
> My very_large_table has around 1 billion entries. The problem is that
> the planner takes the (correct) estimates for the timestamp column.
> This rules out a HashAggregate because the data set seems too large
> for work_mem. What the planner cannot know is that the number of days
> is quite limited (in my case around 1000 different values). I wonder
> how to teach the planner to take the cast into consideration.

Create an index on that expression.

regression=# create table foo(x) as select x * '864 sec'::interval + now()::timestamp from generate_series(1,10000) x;
SELECT
regression=# analyze foo;
ANALYZE
regression=# explain select x::date from foo group by x::date;
                           QUERY PLAN
-----------------------------------------------------------------
 HashAggregate  (cost=205.00..330.00 rows=10000 width=8)
   ->  Seq Scan on foo  (cost=0.00..180.00 rows=10000 width=8)
(2 rows)

regression=# create index fooi on foo((x::date));
CREATE INDEX
regression=# analyze foo;
ANALYZE
regression=# explain select x::date from foo group by x::date;
                           QUERY PLAN
-----------------------------------------------------------------
 HashAggregate  (cost=205.00..206.26 rows=101 width=8)
   ->  Seq Scan on foo  (cost=0.00..180.00 rows=10000 width=8)
(2 rows)

regression=#

I had to cheat a little bit here: I tried to do this example with a
timestamptz column, and the index creation failed because timestamptz to
date isn't immutable (it depends on TimeZone). If yours is too, you
could perhaps do something involving AT TIME ZONE to generate an
immutable conversion to date.

It would perhaps make sense to provide a way to cue ANALYZE to compute
stats on expressions that aren't actually being indexed, but I see no
good reason to limit our attention to cast expressions.

regards, tom lane
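A minimal sketch of the AT TIME ZONE workaround hinted at above,
assuming a hypothetical table foo with a timestamptz column x and a
fixed zone of 'UTC' (any fixed zone works). AT TIME ZONE converts the
timestamptz to a plain timestamp, and timestamp-to-date is immutable,
so the expression can be indexed:

-- Pinning the zone makes the conversion immutable, hence indexable.
CREATE INDEX foo_date_idx ON foo (((x AT TIME ZONE 'UTC')::date));

ANALYZE foo;

-- The query has to use the identical expression for the planner to
-- match it against the index and its statistics.
SELECT (x AT TIME ZONE 'UTC')::date
FROM foo
GROUP BY (x AT TIME ZONE 'UTC')::date;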
[HACKERS] Planner estimates and cast operations
I am looking at a corner case which might also cause trouble for other
people. Consider the following:

SELECT some_timestamp::date FROM very_large_table
GROUP BY some_timestamp::date

My very_large_table has around 1 billion entries. The problem is that
the planner takes the (correct) estimates for the timestamp column. This
rules out a HashAggregate because the data set seems too large for
work_mem. What the planner cannot know is that the number of days is
quite limited (in my case around 1000 different values). I wonder how to
teach the planner to take the cast into consideration. At the moment the
planner uses the per-column statistics; it cannot know that the cast
might change the number of distinct values.

How about the following?

Command:     CREATE CAST
Description: define a new cast
Syntax:
CREATE CAST (sourcetype AS targettype)
    [ USING SELECTIVITY number | funcname(argtypes) ]
    WITH FUNCTION funcname (argtypes)
    [ AS ASSIGNMENT | AS IMPLICIT ]

If it were possible to assign a constant or some function to the cast, I
think we could make the example above work. By default no costs would
change. If somebody is running some fancy query, it would be possible to
tweak GROUP BY planning by assigning a cleverly written function or a
constant to the cast. A constant would be useful for casts to boolean
and the like.

Does anybody have an idea which could help solve this issue?

best regards,

hans
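Purely for illustration, here is how the proposed clause might be used.
This is the hypothetical syntax from the proposal, not anything
PostgreSQL actually accepts, and the function name is made up:

-- Hypothetical: USING SELECTIVITY is the proposed extension, not real
-- syntax. The constant would tell the planner that casting this
-- timestamp type to date yields roughly 1000 distinct values.
CREATE CAST (timestamp AS date)
    USING SELECTIVITY 1000
    WITH FUNCTION timestamp_to_date(timestamp)
    AS ASSIGNMENT;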