Re: [HACKERS] GEQO: ERX
Is this a TODO item? --- Adriano Lange wrote: Hi Tobias Zahn escreveu: Hello Adriano, thank you very much for posting your patch. I think it will help to make further work easier, too. I hope you don't mind when I ask you some questions. When you said that this new approach is worse or equal than GEQO, did you refer to performance or to the quality of results? Not exactly this approach, but the implemented (and not configured) algorithm was worse than GEQO in a little test made. I just used a sequence of 8 executions of a query with 18 relations for each algorithm. The costs generated by GEQO was little better than 2PO, in average and standard deviation. But 8 executions and 1 query don't prove anything. I want to make some further tests, but this little difference seems good for me. Why do you think that compressed annealing might be the better approach? I don't think if compressed annealing is better or not. I don't read about it yet. However, an optimizer can be better in a context but worse in another. Regards, Adriano -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GEQO: ERX
Bruce Momjian br...@momjian.us writes: Is this a TODO item? We already have a TODO item about replacing GEQO. However, linking to this thread might be more useful than the 404 that's there now... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GEQO: ERX
Tom Lane wrote: Bruce Momjian br...@momjian.us writes: Is this a TODO item? We already have a TODO item about replacing GEQO. However, linking to this thread might be more useful than the 404 that's there now... Removed and added. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GEQO: ERX
Hello Adriano, thank you very much for posting your patch. I think it will help to make further work easier, too. I hope you don't mind when I ask you some questions. When you said that this new approach is worse or equal than GEQO, did you refer to performance or to the quality of results? Why do you think that compressed annealing might be the better approach? TIA and best regards, Tobias Zahn Adriano Lange schrieb: Robert Haas escreveu: On Wed, May 13, 2009 at 4:14 PM, Tobias Zahn tobias-z...@arcor.de wrote: Hello, thank you for posting the paper, it was quite interesting to read. I think it would be a good idea to give the two-phase optimization a try. As far as I know and understand the current (geqo) optimizer source, many important parts are already there. For example, we can calculate the costs of a given join order. Therefore, it would only be necessary to write an algorithm witch chooses the right input for the cost function. I would be interested in your opinion. I'm very interested in any improvements we can make to planning large join nests. Unfortunately the paper seems to conclude that it's not really feasible to use heuristics, as had been my hope, but I'd be very interested in any other approaches we can come up with. I probably do not have time to implement anything myself, but I'm happy to help with ideas and code review. I implemented the 2PO algorithm last month but I didn't have much time to do an extensive test and to comment all code. The code was posted in this list in a previous thread. In that occasion, I was interested in a kind of cache structure to avoid the constructing a complete RelOptInfo from scratch every time when the cheapest total_cost must be calculated (this occur in GEQO). I’m sending a patch for the 8.3 release. I also changed some GUC variables to facilitate some tests: (remove) geqo (remove) geqo_threshold (add) ljqo_enable (bool) – activate Large Join Query Optimizer (add) ljqo_algorithm {geqo|twopo} (add) ljqo_threshold (int) – like geqo_threshold (add) twopo_heuristic_states (bool) – initial heuristic states (add) twopo_ii_stop (int) – II phase loop (add) twopo_sa_phase (bool) – enable SA phase (add) twopo_sa_initial_temperature (float) (add) twopo_sa_temperature_reduction (float) (add) twopo_sa_equilibrium (int) In my little tests, this algorithm seems equal or worse than geqo, except when using heuristic in order to bias the initial state. Maybe some tunings are needed but I prefer spend yet some time reading more about the compressed annealing, cited in TODO list. Anyway, I think that to build another annealing-like algorithm might be easier if some structures and functions in 2PO source code are correct. Sincerely, Adriano Lange -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GEQO: ERX
Hi Tobias Zahn escreveu: Hello Adriano, thank you very much for posting your patch. I think it will help to make further work easier, too. I hope you don't mind when I ask you some questions. When you said that this new approach is worse or equal than GEQO, did you refer to performance or to the quality of results? Not exactly this approach, but the implemented (and not configured) algorithm was worse than GEQO in a little test made. I just used a sequence of 8 executions of a query with 18 relations for each algorithm. The costs generated by GEQO was little better than 2PO, in average and standard deviation. But 8 executions and 1 query don't prove anything. I want to make some further tests, but this little difference seems good for me. Why do you think that compressed annealing might be the better approach? I don't think if compressed annealing is better or not. I don't read about it yet. However, an optimizer can be better in a context but worse in another. Regards, Adriano -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GEQO: ERX
Robert Haas escreveu: On Wed, May 13, 2009 at 4:14 PM, Tobias Zahn tobias-z...@arcor.de wrote: Hello, thank you for posting the paper, it was quite interesting to read. I think it would be a good idea to give the two-phase optimization a try. As far as I know and understand the current (geqo) optimizer source, many important parts are already there. For example, we can calculate the costs of a given join order. Therefore, it would only be necessary to write an algorithm witch chooses the right input for the cost function. I would be interested in your opinion. I'm very interested in any improvements we can make to planning large join nests. Unfortunately the paper seems to conclude that it's not really feasible to use heuristics, as had been my hope, but I'd be very interested in any other approaches we can come up with. I probably do not have time to implement anything myself, but I'm happy to help with ideas and code review. I implemented the 2PO algorithm last month but I didn't have much time to do an extensive test and to comment all code. The code was posted in this list in a previous thread. In that occasion, I was interested in a kind of cache structure to avoid the constructing a complete RelOptInfo from scratch every time when the cheapest total_cost must be calculated (this occur in GEQO). I’m sending a patch for the 8.3 release. I also changed some GUC variables to facilitate some tests: (remove) geqo (remove) geqo_threshold (add) ljqo_enable (bool) – activate Large Join Query Optimizer (add) ljqo_algorithm {geqo|twopo} (add) ljqo_threshold (int) – like geqo_threshold (add) twopo_heuristic_states (bool) – initial heuristic states (add) twopo_ii_stop (int) – II phase loop (add) twopo_sa_phase (bool) – enable SA phase (add) twopo_sa_initial_temperature (float) (add) twopo_sa_temperature_reduction (float) (add) twopo_sa_equilibrium (int) In my little tests, this algorithm seems equal or worse than geqo, except when using heuristic in order to bias the initial state. Maybe some tunings are needed but I prefer spend yet some time reading more about the compressed annealing, cited in TODO list. Anyway, I think that to build another annealing-like algorithm might be easier if some structures and functions in 2PO source code are correct. Sincerely, Adriano Lange -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GEQO: ERX
Adriano Lange escreveu: I implemented the 2PO algorithm last month but I didn't have much time to do an extensive test and to comment all code. The code was posted in this list in a previous thread. In that occasion, I was interested in a kind of cache structure to avoid the constructing a complete RelOptInfo from scratch every time when the cheapest total_cost must be calculated (this occur in GEQO). I’m sending a patch for the 8.3 release. I forgot it. I also changed some GUC variables to facilitate some tests: (remove) geqo (remove) geqo_threshold (add) ljqo_enable (bool) – activate Large Join Query Optimizer (add) ljqo_algorithm {geqo|twopo} (add) ljqo_threshold (int) – like geqo_threshold (add) twopo_heuristic_states (bool) – initial heuristic states (add) twopo_ii_stop (int) – II phase loop (add) twopo_sa_phase (bool) – enable SA phase (add) twopo_sa_initial_temperature (float) (add) twopo_sa_temperature_reduction (float) (add) twopo_sa_equilibrium (int) In my little tests, this algorithm seems equal or worse than geqo, except when using heuristic in order to bias the initial state. Maybe some tunings are needed but I prefer spend yet some time reading more about the compressed annealing, cited in TODO list. Anyway, I think that to build another annealing-like algorithm might be easier if some structures and functions in 2PO source code are correct. Sincerely, Adriano Lange Index: src/include/optimizer/ljqo.h === --- src/include/optimizer/ljqo.h (revisão 0) +++ src/include/optimizer/ljqo.h (revisão 14) @@ -0,0 +1,39 @@ +/* + * + * Portions Copyright (c) 1996-2009, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * contributed by: + *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=* + * Adriano Lange * C3SL - Centro de Computação* + * adri...@c3sl.ufpr.br * CientÃfica e Software Livre / * + * * Departamento de Informática / * + * * Universidade Federal do Paraná * + * * Curitiba, Brasil * + * * * + * * http://www.c3sl.ufpr.br* + * * * + *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=* + */ + +#ifndef LJQO_H +#define LJQO_H + +#include nodes/relation.h + +typedef enum ljqo_algorithms { + ljqo_a_geqo, + ljqo_a_twopo, +} ljqo_algorithms; + +#define DEFAULT_LJQO_ENABLE true +#define MIN_LJQO_THRESHOLD 2 +#define DEFAULT_LJQO_THRESHOLD 10 +#define DEFAULT_LJQO_ALGORITHM ljqo_a_geqo +#define DEFAULT_LJQO_ALGORITHM_STR geqo + +const char * assign_ljqo_algorithm(const char *newval, bool doit); +RelOptInfo * ljqo(PlannerInfo *root, int levels_needed, List *initial_rels); + + +#endif /* LJQO_H */ Index: src/include/optimizer/twopo.h === --- src/include/optimizer/twopo.h (revisão 0) +++ src/include/optimizer/twopo.h (revisão 14) @@ -0,0 +1,52 @@ +/* + * twopo.h + *Two Phase Optimization + * + * Portions Copyright (c) 1996-2009, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * contributed by: + *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=* + * Adriano Lange * C3SL - Centro de Computação* + * adri...@c3sl.ufpr.br * CientÃfica e Software Livre / * + * * Departamento de Informática / * + * * Universidade Federal do Paraná * + * * Curitiba, Brasil * + * * * + * * http://www.c3sl.ufpr.br* + * * * + *=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=* + * + * Implementation based on: + * Y. E. Ioannidis and Y. Kang, Randomized algorithms for optimizing + * large join queries, SIGMOD Rec., vol. 19, no. 2, pp. 312â321, 1990. + */ + +#ifndef TWOPO_H +#define TWOPO_H + +#include nodes/relation.h + +#define DEFAULT_TWOPO_HEURISTIC_STATES true +#define DEFAULT_TWOPO_II_STOP 10 +#define MIN_TWOPO_II_STOP 1 +#define DEFAULT_TWOPO_SA_PHASE true +#define DEFAULT_TWOPO_SA_INITIAL_TEMPERATURE0.2 +#define MIN_TWOPO_SA_INITIAL_TEMPERATURE0.01 +#define DEFAULT_TWOPO_SA_TEMPERATURE_REDUCTION 0.95 +#define MIN_TWOPO_SA_TEMPERATURE_REDUCTION 0.1 +#define DEFAULT_TWOPO_SA_EQUILIBRIUM16 +#define
Re: [HACKERS] GEQO: ERX
Hello, thank you for posting the paper, it was quite interesting to read. I think it would be a good idea to give the two-phase optimization a try. As far as I know and understand the current (geqo) optimizer source, many important parts are already there. For example, we can calculate the costs of a given join order. Therefore, it would only be necessary to write an algorithm witch chooses the right input for the cost function. I would be interested in your opinion. Regards, Tobias Robert Haas schrieb: On Sat, May 2, 2009 at 11:37 AM, Tom Lane t...@sss.pgh.pa.us wrote: Tobias Zahn tobias-z...@arcor.de writes: I didn't not get any response to my initial message below. Now I am wondering if nobody is into the optimizer or if my question was just too stupid. Could you please give me some clues? Your help would really be appreciated. Well, nobody's into GEQO very much. I took a quick look and didn't think that deleting the ERX support would save anything noticeable, but you're welcome to try it if you think different. The real problem with working on GEQO, in my humble opinion, is that it's throwing good effort after bad. That module doesn't need marginal fixing, it needs throwing away and rewriting from scratch. Bad enough that it's convoluted and full of dead (experimental?) code; but I don't even believe that it's based on a good analogy. The planning problem is not all that much like traveling salesman problems, so heuristics designed for TSP are of pretty questionable usefulness to start with. That complaint could have been refuted if the module performed well, but in fact if you check the archives you'll find many many complaints about it --- its ability to find good plans seems to be mostly dependent on luck. My knowledge of AI search algorithms is about 20 years obsolete, but last I heard simulated annealing had overtaken genetic algorithms for many purposes. It might be interesting to try a rewrite based on SA; or maybe there's something better out there now. There's a 1997 article on this topic that's pretty interesting. Heuristic and randomized optimization for the join ordering problem http://reference.kfupm.edu.sa/content/h/e/heuristic_and_randomized_optimization_fo_87585.pdf Here's the basic conclusion: If good solutions are of highest importance, Two-Phase Optimization, the algorithm that performed best in our experiments, is a very good choice; other Simulated Annealing variants, for instance Toured Simulated Annealing (TSA, LVZ93]), that we did not implement, are likely to achieve quite similar results. The 'pure' Simulated Annealing algorithm has a much higher running time without yielding significantly better solutions. If short running time is more important, Iterative Improvement (IIIO), the genetic algo- rithm (BushyGenetic), and, to a lesser extent, Two-Phase Optimization (2PO) are feasible alternatives. I'm not sure if there's anything more recent out there. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GEQO: ERX
On Wed, May 13, 2009 at 4:14 PM, Tobias Zahn tobias-z...@arcor.de wrote: Hello, thank you for posting the paper, it was quite interesting to read. I think it would be a good idea to give the two-phase optimization a try. As far as I know and understand the current (geqo) optimizer source, many important parts are already there. For example, we can calculate the costs of a given join order. Therefore, it would only be necessary to write an algorithm witch chooses the right input for the cost function. I would be interested in your opinion. I'm very interested in any improvements we can make to planning large join nests. Unfortunately the paper seems to conclude that it's not really feasible to use heuristics, as had been my hope, but I'd be very interested in any other approaches we can come up with. I probably do not have time to implement anything myself, but I'm happy to help with ideas and code review. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GEQO: ERX
On Sat, May 2, 2009 at 11:37 AM, Tom Lane t...@sss.pgh.pa.us wrote: Tobias Zahn tobias-z...@arcor.de writes: I didn't not get any response to my initial message below. Now I am wondering if nobody is into the optimizer or if my question was just too stupid. Could you please give me some clues? Your help would really be appreciated. Well, nobody's into GEQO very much. I took a quick look and didn't think that deleting the ERX support would save anything noticeable, but you're welcome to try it if you think different. The real problem with working on GEQO, in my humble opinion, is that it's throwing good effort after bad. That module doesn't need marginal fixing, it needs throwing away and rewriting from scratch. Bad enough that it's convoluted and full of dead (experimental?) code; but I don't even believe that it's based on a good analogy. The planning problem is not all that much like traveling salesman problems, so heuristics designed for TSP are of pretty questionable usefulness to start with. That complaint could have been refuted if the module performed well, but in fact if you check the archives you'll find many many complaints about it --- its ability to find good plans seems to be mostly dependent on luck. My knowledge of AI search algorithms is about 20 years obsolete, but last I heard simulated annealing had overtaken genetic algorithms for many purposes. It might be interesting to try a rewrite based on SA; or maybe there's something better out there now. There's a 1997 article on this topic that's pretty interesting. Heuristic and randomized optimization for the join ordering problem http://reference.kfupm.edu.sa/content/h/e/heuristic_and_randomized_optimization_fo_87585.pdf Here's the basic conclusion: If good solutions are of highest importance, Two-Phase Optimization, the algorithm that performed best in our experiments, is a very good choice; other Simulated Annealing variants, for instance Toured Simulated Annealing (TSA, LVZ93]), that we did not implement, are likely to achieve quite similar results. The 'pure' Simulated Annealing algorithm has a much higher running time without yielding significantly better solutions. If short running time is more important, Iterative Improvement (IIIO), the genetic algo- rithm (BushyGenetic), and, to a lesser extent, Two-Phase Optimization (2PO) are feasible alternatives. I'm not sure if there's anything more recent out there. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GEQO: ERX
Hello again, I didn't not get any response to my initial message below. Now I am wondering if nobody is into the optimizer or if my question was just too stupid. Could you please give me some clues? Your help would really be appreciated. Regards, Tobias Hello, I was digging through the optimizer code and have a question regarding the edge recombination crossover (ERX) of the GEQO. It might be completely stupid and therefore I apologize for this in advance. As far as I understand it, the idea of the ERX is the minimization of edge failures. When reading in geqo_main.c and geqo_erx.c, it seams that every iterative round (generation) it is checked by gimme_tour() if there where any edge failures. When I understand the algorithm right, there should be no edge failures. Therefore I think about NOT checking for edge failures anymore, to save some time. (If it just to make sure, it could be done only once in the end.) Might that work or do I have some errors in my thoughts? Thanks in advance, Tobias -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GEQO: ERX
Tobias Zahn tobias-z...@arcor.de writes: I didn't not get any response to my initial message below. Now I am wondering if nobody is into the optimizer or if my question was just too stupid. Could you please give me some clues? Your help would really be appreciated. Well, nobody's into GEQO very much. I took a quick look and didn't think that deleting the ERX support would save anything noticeable, but you're welcome to try it if you think different. The real problem with working on GEQO, in my humble opinion, is that it's throwing good effort after bad. That module doesn't need marginal fixing, it needs throwing away and rewriting from scratch. Bad enough that it's convoluted and full of dead (experimental?) code; but I don't even believe that it's based on a good analogy. The planning problem is not all that much like traveling salesman problems, so heuristics designed for TSP are of pretty questionable usefulness to start with. That complaint could have been refuted if the module performed well, but in fact if you check the archives you'll find many many complaints about it --- its ability to find good plans seems to be mostly dependent on luck. My knowledge of AI search algorithms is about 20 years obsolete, but last I heard simulated annealing had overtaken genetic algorithms for many purposes. It might be interesting to try a rewrite based on SA; or maybe there's something better out there now. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GEQO: ERX
Hi, Le 2 mai 09 à 17:37, Tom Lane a écrit : My knowledge of AI search algorithms is about 20 years obsolete, but last I heard simulated annealing had overtaken genetic algorithms for many purposes. It might be interesting to try a rewrite based on SA; or maybe there's something better out there now. I've done very very few courses in the domain, and was tough SA too (about 10 years ago). But what I gathered more recently was that it's already being obsoleted by fuzzy logic ideas, which implementations are more and more reliable. The idea would be to offer typical queries and possible plans, and to tell apart the very good ones from the good and pretty bad ones. This would serve as a model for the fuzzy logic engine to take decisions and be able to give back best possible plan (as showed) for a given input set (the query). I'm very unclear how to better specify learning methods or models, etc, and probably I should have left this part away. But it well seems it's the better out there now trend. Recreating the same end-product from a given recipe and new constraint (raw milk enabled industrial products is forbidden in this country) is an example of application. Regards, -- dim -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] GEQO: ERX
Hello, I was digging through the optimizer code and have a question regarding the edge recombination crossover (ERX) of the GEQO. It might be completely stupid and therefore I apologize for this in advance. As far as I understand it, the idea of the ERX is the minimization of edge failures. When reading in geqo_main.c and geqo_erx.c, it seams that every iterative round (generation) it is checked by gimme_tour() if there where any edge failures. When I understand the algorithm right, there should be no edge failures. Therefore I think about NOT checking for edge failures anymore, to save some time. (If it just to make sure, it could be done only once in the end.) Might that work or do I have some errors in my thoughts? Thanks in advance, Tobias -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers