Wouldn't this be a first step towards a cost-based optimizer? I think it would pay off in the long run to start thinking about a general framework now. Hacking it into the current framework might provide only a short-term solution, and not a very elegant one. Probably the best place to start would be the database literature on query optimization.
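Just to make the idea concrete, here is a rough sketch (plain Python, not Pig code) of what statistics-driven join selection could look like, using the sorted-inputs case from the quoted message below. Everything in it is an assumption for illustration: `InputStats`, `choose_join`, the memory budget and the strategy names are hypothetical and do not correspond to any existing Pig class or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InputStats:
    """Hypothetical per-input statistics; not a Pig class."""
    size_bytes: int
    sorted_on: Optional[str] = None  # key the input is sorted on, if known


def choose_join(left: InputStats, right: InputStats, key: str,
                memory_budget_bytes: int = 100 * 1024 * 1024) -> str:
    """Pick a join strategy name from simple rules over the statistics."""
    # Both inputs already sorted on the join key: a merge join avoids a re-sort.
    if left.sorted_on == key and right.sorted_on == key:
        return "merge"
    # The smaller input fits the (assumed) memory budget: replicate it.
    if min(left.size_bytes, right.size_bytes) <= memory_budget_bytes:
        return "replicated"
    # No useful property known: keep the default join.
    return "default"


if __name__ == "__main__":
    big_sorted = InputStats(size_bytes=10**12, sorted_on="id")
    print(choose_join(big_sorted, big_sorted, "id"))
```

Note that this is rule-based rather than cost-based. A general framework would more likely estimate a cost for each candidate operator from the statistics and pick the cheapest, so the rules above would become just one possible cost model.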
I am also interested in the topic and would like to help where I can :)

Cheers,
--
Gianmarco

On Wed, Nov 3, 2010 at 06:09, Renato Marroquín Mogrovejo <renatoj.marroq...@gmail.com> wrote:
> A couple of weeks ago, in a list discussion, Alan suggested to me an
> interesting project: switching join operators based on some data
> properties. E.g. at logical plan compile time a specific join operator
> might be chosen, but this operator might not be the most suitable for
> the data. For example, if both data sources are ordered by their key,
> then a merge join would be the best operator.
> But at this point I dunno how I should proceed. I have some 'general doubts'
> about the approach that should be taken.
>
> 1. Data statistics can be passed to the LoadFunc by using the LoadMetadata
> interface, right? But how should these statistics be collected? Should I
> modify the LOLoad class to use a different LoadFunc?
> 2. And how would these statistics be passed to the optimizer to change (if
> it were the case) the join operator?
>
> Please correct me if I am wrong (which I probably am), and any suggestions
> or comments are highly appreciated.
> Thanks in advance.
>
> Renato M.