I hope this is not an inappropriate time for 9.3 discussions. The flip side of asking for submissions in the first couple commitfests means that I need to submit proposals now.
What is a Range Join? See attached SQL for example. The key difference is that the join condition is not equality, but overlaps (&&). Problem statement: slow. Nested loops are the only option, although they can benefit from an inner GiST index if available. But if the join is happening up in the plan tree somewhere, then it's impossible for any index to be available. Proposed solution: a modified merge join that can handle ranges. 1. Order the ranges on both sides by the lower bound, then upper bound. Empty ranges can be excluded entirely. 2. Left := first range on left, Right := first range on right 3. If Left or Right is empty, terminate. 4. If lower(Left) > upper(Right), discard Right, goto 2 5. If lower(Right) > upper(Left), discard Left, goto 2 6. return (Left, Right) as joined tuple 7. Right := next range on right 8. goto 3 If we get step 4 or step 5 keeps getting triggered, and a btree index is available (ordered by lower bound), we can re-probe to go to the correct position, and consider that the new top range on that side. This is an optimization for the case where there are large numbers of ranges with no match on the other side. Thanks to Nathan Boley for helping me devise this algorithm. However, any bugs are mine alone ;) Weaknesses: I haven't thought through the optimization, but I suspect it will be hard to be very accurate in the costing. That might be OK, because there aren't very many options anyway, but I'll need to think it through. Questions: * Is this idea sane? -- that is, are ranges important enough that people are willing to maintain a new operator? * The more general problem might be "spatial joins" which can operate in N dimensions, and I doubt this would work very well in that case. Does someone know of a spatial join algorithm (without IP claims) that would be as good as this one for ranges? * Other thoughts? Regards, Jeff Davis
-- rate is billing rate in dollars per hour CREATE OR REPLACE FUNCTION bill(rate NUMERIC, during TSTZRANGE) RETURNS NUMERIC LANGUAGE plpgsql AS $$ DECLARE usage_s NUMERIC; BEGIN IF isempty(during) THEN usage_s := 0; ELSE usage_s := extract(epoch from (upper(during) - lower(during))); END IF; RETURN rate * (usage_s/3600.0); END; $$; CREATE TABLE billing_rate(rate NUMERIC, during TSTZRANGE); CREATE TABLE billing_usage(customer_id INT8, during TSTZRANGE); INSERT INTO billing_rate VALUES (12.50, '[2010-01-01 14:00, 2010-01-01 15:00)'); INSERT INTO billing_rate VALUES (14.50, '[2010-01-01 15:00, 2010-01-01 15:45)'); INSERT INTO billing_usage VALUES (123, '[2010-01-01 14:30, 2010-01-01 15:30)'); INSERT INTO billing_usage VALUES (234, '[2010-01-01 14:15, 2010-01-01 15:45)'); SELECT customer_id, bill(rate, range_intersect(u.during, r.during)) AS bill FROM billing_rate r, billing_usage u WHERE r.during && u.during;
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers