Re: [HACKERS] GEQO: ERX

2009-05-20 Thread Tobias Zahn
Hello Adriano,
thank you very much for posting your patch. I think it will help to make
further work easier, too. I hope you don't mind when I ask you some
questions.

When you said that this new approach is worse or equal than GEQO, did
you refer to performance or to the quality of results?
Why do you think that compressed annealing might be the better approach?

TIA and best regards,
Tobias Zahn


Adriano Lange schrieb:
 Robert Haas escreveu:
 On Wed, May 13, 2009 at 4:14 PM, Tobias Zahn tobias-z...@arcor.de
 wrote:
 Hello,
 thank you for posting the paper, it was quite interesting to read. I
 think it would be a good idea to give the two-phase optimization a try.
 As far as I know and understand the current (geqo) optimizer source,
 many important parts are already there. For example, we can calculate
 the costs of a given join order. Therefore, it would only be necessary
 to write an algorithm witch chooses the right input for the cost
 function.
 I would be interested in your opinion.

 I'm very interested in any improvements we can make to planning large
 join nests.  Unfortunately the paper seems to conclude that it's not
 really feasible to use heuristics, as had been my hope, but I'd be
 very interested in any other approaches we can come up with.  I
 probably do not have time to implement anything myself, but I'm happy
 to help with ideas and code review.

 
 I implemented the 2PO algorithm last month but I didn't have much time
 to do an extensive test and to comment all code. The code was posted in
 this list in a previous thread. In that occasion, I was interested in a
 kind of cache structure to avoid the constructing a complete RelOptInfo
 from scratch every time when the cheapest total_cost must be calculated
 (this occur in GEQO).
 
 I’m sending a patch for the 8.3 release.
 
 I also changed some GUC variables to facilitate some tests:
 (remove) geqo
 (remove) geqo_threshold
 (add) ljqo_enable (bool) – activate Large Join Query Optimizer
 (add) ljqo_algorithm {geqo|twopo}
 (add) ljqo_threshold (int) – like geqo_threshold
 (add) twopo_heuristic_states (bool) – initial heuristic states
 (add) twopo_ii_stop (int) – II phase loop
 (add) twopo_sa_phase (bool) – enable SA phase
 (add) twopo_sa_initial_temperature (float)
 (add) twopo_sa_temperature_reduction (float)
 (add) twopo_sa_equilibrium (int)
 
 In my little tests, this algorithm seems equal or worse than geqo,
 except when using heuristic in order to bias the initial state. Maybe
 some tunings are needed but I prefer spend yet some time reading more
 about the compressed annealing, cited in TODO list. Anyway, I think that
 to build another annealing-like algorithm might be easier if some
 structures and functions in 2PO source code are correct.
 
 Sincerely,
 
 Adriano Lange
 
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GEQO: ERX

2009-05-13 Thread Tobias Zahn
Hello,
thank you for posting the paper, it was quite interesting to read. I
think it would be a good idea to give the two-phase optimization a try.
As far as I know and understand the current (geqo) optimizer source,
many important parts are already there. For example, we can calculate
the costs of a given join order. Therefore, it would only be necessary
to write an algorithm witch chooses the right input for the cost function.
I would be interested in your opinion.

Regards,
Tobias

Robert Haas schrieb:
 On Sat, May 2, 2009 at 11:37 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Tobias Zahn tobias-z...@arcor.de writes:
 I didn't not get any response to my initial message below. Now I am
 wondering if nobody is into the optimizer or if my question was just too
 stupid. Could you please give me some clues? Your help would really be
 appreciated.
 Well, nobody's into GEQO very much.  I took a quick look and didn't
 think that deleting the ERX support would save anything noticeable,
 but you're welcome to try it if you think different.

 The real problem with working on GEQO, in my humble opinion, is that
 it's throwing good effort after bad.  That module doesn't need marginal
 fixing, it needs throwing away and rewriting from scratch.  Bad enough
 that it's convoluted and full of dead (experimental?) code; but I don't
 even believe that it's based on a good analogy.   The planning problem
 is not all that much like traveling salesman problems, so heuristics
 designed for TSP are of pretty questionable usefulness to start with.
 That complaint could have been refuted if the module performed well,
 but in fact if you check the archives you'll find many many complaints
 about it --- its ability to find good plans seems to be mostly dependent
 on luck.

 My knowledge of AI search algorithms is about 20 years obsolete, but
 last I heard simulated annealing had overtaken genetic algorithms for
 many purposes.  It might be interesting to try a rewrite based on SA;
 or maybe there's something better out there now.
 
 There's a 1997 article on this topic that's pretty interesting.
 
 Heuristic and randomized optimization for the join ordering problem
 http://reference.kfupm.edu.sa/content/h/e/heuristic_and_randomized_optimization_fo_87585.pdf
 
 Here's the basic conclusion:
 
 If good solutions are of highest importance, Two-Phase Optimization,
 the algorithm that performed best in our experiments, is a very good
 choice; other Simulated Annealing variants, for instance Toured
 Simulated Annealing (TSA, LVZ93]), that we did not implement, are
 likely to achieve quite similar results. The 'pure' Simulated
 Annealing algorithm has a much higher running time without yielding
 significantly better solutions. If short running time is more
 important, Iterative Improvement (IIIO), the genetic algo- rithm
 (BushyGenetic), and, to a lesser extent, Two-Phase Optimization (2PO)
 are feasible alternatives.
 
 I'm not sure if there's anything more recent out there.
 
 ...Robert
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GEQO: ERX

2009-05-02 Thread Tobias Zahn
Hello again,
I didn't not get any response to my initial message below. Now I am
wondering if nobody is into the optimizer or if my question was just too
stupid. Could you please give me some clues? Your help would really be
appreciated.

Regards,
Tobias

 Hello,
 I was digging through the optimizer code and have a question regarding
 the edge recombination crossover (ERX) of the GEQO.  It might be
 completely stupid and therefore I apologize for this in advance.
 
 As far as I understand it, the idea of the ERX is the minimization of
 edge failures. When reading in geqo_main.c and geqo_erx.c, it seams that
 every iterative round (generation) it is checked by gimme_tour() if
 there where any edge failures. When I understand the algorithm right,
 there should be no edge failures.
 Therefore I think about NOT checking for edge failures anymore, to save
 some time. (If it just to make sure, it could be done only once in the
 end.) Might that work or do I have some errors in my thoughts?
 
 Thanks in advance,
 Tobias
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] GEQO: ERX

2009-04-20 Thread Tobias Zahn
Hello,
I was digging through the optimizer code and have a question regarding
the edge recombination crossover (ERX) of the GEQO.  It might be
completely stupid and therefore I apologize for this in advance.

As far as I understand it, the idea of the ERX is the minimization of
edge failures. When reading in geqo_main.c and geqo_erx.c, it seams that
every iterative round (generation) it is checked by gimme_tour() if
there where any edge failures. When I understand the algorithm right,
there should be no edge failures.
Therefore I think about NOT checking for edge failures anymore, to save
some time. (If it just to make sure, it could be done only once in the
end.) Might that work or do I have some errors in my thoughts?

Thanks in advance,
Tobias

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers