You can carry on using your own formula, but move the formula into a metadata
provider. You just don’t need to create a subclass in order for it to get
called. For example, if you’ve written
  public class DrillLogicalFilter extends LogicalFilter {
    public double getRows() {
      return <<my formula>>;
    }
  }
and getRows() is its only method, you can retire that subclass and register
the following metadata provider:
  public class DrillMdRowCount {
    public Double getRowCount(LogicalFilter filter) {
      return <<my formula>>;
    }
  }
Calcite uses double dispatch (dispatching to a method based on both the
provider and the type of its first argument), so the method will be called
automatically.
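
To illustrate the dispatch idea only, here is a minimal, self-contained
sketch. It does not use Calcite: Rel, FilterRel, JoinRel, RowCountProvider
and Dispatcher are hypothetical stand-ins, and the reflective lookup is an
assumption about the general technique, not a copy of how Calcite
implements it. The constants stand in for <<my formula>>.

```java
import java.lang.reflect.Method;

// Hypothetical stand-ins for a RelNode hierarchy; these are not Calcite classes.
class Rel {}
class FilterRel extends Rel {}
class JoinRel extends Rel {}

// Hypothetical provider: one handler per rel type, with no subclassing of the rels.
class RowCountProvider {
    public Double getRowCount(FilterRel filter) { return 25.0; }  // filter-specific estimate
    public Double getRowCount(Rel rel) { return 100.0; }          // fallback for any other rel
}

class Dispatcher {
    // Finds the handler whose parameter type is closest to the runtime class
    // of rel, walking up the superclass chain until one matches.
    static Double rowCount(Object provider, Rel rel) {
        for (Class<?> c = rel.getClass(); c != null; c = c.getSuperclass()) {
            try {
                Method m = provider.getClass().getMethod("getRowCount", c);
                return (Double) m.invoke(provider, rel);
            } catch (NoSuchMethodException e) {
                // No handler declared for this class; try its superclass.
            } catch (ReflectiveOperationException e) {
                throw new RuntimeException(e);
            }
        }
        throw new IllegalArgumentException("no handler for " + rel.getClass());
    }

    public static void main(String[] args) {
        RowCountProvider provider = new RowCountProvider();
        System.out.println(rowCount(provider, new FilterRel())); // prints 25.0
        System.out.println(rowCount(provider, new JoinRel()));   // prints 100.0
    }
}
```

The caller only ever holds a Rel, yet the FilterRel handler is chosen at
run time, which is why no rel subclass is needed to change the estimate.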
Julian
> On Nov 23, 2015, at 5:56 PM, Jinfeng Ni <[email protected]> wrote:
>
> My understanding is RelMetadataProvider gives the estimation of row
> count, distinct row count, etc. But it's still up to each Rel node to
> decide how to estimate its own cost, given the row count, distinct
> row count etc from MetadataProvider. Are you suggesting we completely
> remove the Drill's costing estimation method, and use Calcite's
> default one?
>
>
>
> On Mon, Nov 23, 2015 at 5:35 PM, Julian Hyde <[email protected]> wrote:
>> Yes. You don’t need an “implement” method (or yours can just throw).
>>
>> You could use your own serialization to/from JSON or you could use
>> RelJsonWriter/RelJsonReader.
>>
>> Julian
>>
>>
>>> On Nov 23, 2015, at 5:31 PM, Jacques Nadeau <[email protected]> wrote:
>>>
>>> We could create serializers and deserializers for the logical plan stuff.
>>> It looks like we can resolve the costing through metadata providers unless
>>> I misunderstood what Julian was suggesting.
>>>
>>>
>>>
>>> --
>>> Jacques Nadeau
>>> CTO and Co-Founder, Dremio
>>>
>>> On Mon, Nov 23, 2015 at 5:12 PM, Jinfeng Ni <[email protected]> wrote:
>>>
>>>> @Jacques,
>>>>
>>>> Every DrillLogicalRel has to override computeSelfCost(), and implement
>>>> implement() method. The latter is to get Logical Plan, which is one of
>>>> three input types Drill should accept (SQL, Logical Plan, Physical
>>>> Plan).
>>>>
>>>> So, for now, we do have to override/extend all DrillLogicalRel.
>>>>
>>>>
>>>> On Mon, Nov 23, 2015 at 4:55 PM, Julian Hyde <[email protected]> wrote:
>>>>> I’m not sure what properties / behavior you want to override but
>>>> remember that Calcite specifies a lot of things as traits or metadata.
>>>>>
>>>>> For example, “double RelNode.getRows()" is deprecated and you would
>>>> these days use RelMetadataQuery.getRowCount(). You would not need to
>>>> sub-class a RelNode to override its row-count estimate, just supply a
>>>> different metadata provider.
>>>>>
>>>>> Julian
>>>>>
>>>>>
>>>>>> On Nov 23, 2015, at 4:50 PM, Jacques Nadeau <[email protected]> wrote:
>>>>>>
>>>>>> Yes, my suggestion is removal of DRILL_LOGICAL. @Hsuan, this is
>>>> independent
>>>>>> from the number of phases and I'm not suggesting changing that.
>>>>>>
>>>>>> My main thought was: if we only need to override one or two rels, do
>>>> only
>>>>>> that rather than having a wholesale copy of every operator with a bunch
>>>> of
>>>>>> basic noop rules.
>>>>>>
>>>>>> --
>>>>>> Jacques Nadeau
>>>>>> CTO and Co-Founder, Dremio
>>>>>>
>>>>>> On Mon, Nov 23, 2015 at 4:37 PM, Jinfeng Ni <[email protected]>
>>>> wrote:
>>>>>>
>>>>>>> @Jacques, are you talking about removing the convention DRILL_LOGICAL?
>>>>>>>
>>>>>>> DrillRel extends Calcite's LogicalRel. It overrides some LogicalRel's
>>>>>>> methods, and adds new methods. Therefore, even if we remove
>>>>>>> DRILL_LOGICAL convention, we still have to maintain a set of extended
>>>>>>> class from Calcite Logical. I'm not clear what benefit we would get by
>>>>>>> removing the DRILL_LOGICAL convention.
>>>>>>>
>>>>>>> If we want to remove the complete set of DrillLogical classes, then
>>>>>>> I'm not sure where we put the Drill specific logic, for instance,
>>>>>>> Drill Join has certain restriction different from Calcite Join.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Nov 23, 2015 at 4:11 PM, Hsuan Yi Chu <[email protected]>
>>>> wrote:
>>>>>>>> My understanding is:
>>>>>>>> In logical planning, we determine the "structure" of the tree (e.g.,
>>>> join
>>>>>>>> order)
>>>>>>>> And then in physical, we determine the implementation (e.g., merge vs
>>>>>>> hash
>>>>>>>> join).
>>>>>>>>
>>>>>>>> This staging seems clean to me. So what is the motivation to merge
>>>> them
>>>>>>> all
>>>>>>>> together?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Nov 23, 2015 at 2:51 PM, Jacques Nadeau <[email protected]>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Anybody think we should just get rid of Drels (Rel > Drel > Prel) and
>>>>>>> use
>>>>>>>>> Calcite's logical representation directly (Rel > Prel)?
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Jacques Nadeau
>>>>>>>>> CTO and Co-Founder, Dremio
>>>>>>>>>
>>>>>>>>> On Mon, Nov 23, 2015 at 1:57 PM, Mehant Baid <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Currently all rules based on Calcite logical rels and Drill logical
>>>>>>> rels
>>>>>>>>>> are put together and are fired together. As part of DRILL-3996,
>>>>>>> Jinfeng
>>>>>>>>>> will break it down into different phases. I should be able to take
>>>>>>>>>> advantage of this and move the directory based partition pruning to
>>>>>>> fire
>>>>>>>>>> based on Calcite rels.
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> Mehant
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 11/23/15 10:58 AM, Hanifi GUNES wrote:
>>>>>>>>>>
>>>>>>>>>>> The general idea of multi-phase pruning makes sense to me. I am
>>>>>>>>> wondering,
>>>>>>>>>>> though, are we referring to introducing a new planning phase before
>>>>>>> the
>>>>>>>>>>> logical or separating out the logic so as to make directory pruning
>>>>>>> kick
>>>>>>>>>>> off ahead of column partitioning?
>>>>>>>>>>>
>>>>>>>>>>> 2015-11-23 10:33 GMT-08:00 Mehant Baid <[email protected]>:
>>>>>>>>>>>
>>>>>>>>>>> As part of DRILL-3996 <
>>>>>>> https://issues.apache.org/jira/browse/DRILL-3996
>>>>>>>>>>
>>>>>>>>>>>> Jinfeng mentioned that he plans to move the directory based
>>>> pruning
>>>>>>>>> rule
>>>>>>>>>>>> earlier than column based pruning. I want to expand on that a
>>>>>>> little,
>>>>>>>>>>>> provide the motivation and gather thoughts/ feedback.
>>>>>>>>>>>>
>>>>>>>>>>>> Currently both the directory based pruning and the column based
>>>>>>> pruning
>>>>>>>>>>>> is
>>>>>>>>>>>> fired in the same planning phase and are based on Drill logical
>>>>>>> rels.
>>>>>>>>>>>> This
>>>>>>>>>>>> is not optimal in the case where data is organized in such a way
>>>>>>> that
>>>>>>>>>>>> both
>>>>>>>>>>>> directory based pruning and column based pruning can be applied
>>>>>>> (when
>>>>>>>>> the
>>>>>>>>>>>> data is organized with a nested directory structure plus the
>>>>>>> individual
>>>>>>>>>>>> files contain partition columns). As part of creating the Drill
>>>>>>> logical
>>>>>>>>>>>> scan we read the footers of all the files involved. If the
>>>> directory
>>>>>>>>>>>> based
>>>>>>>>>>>> pruning rule is fired earlier (rule to fire based on calcite
>>>> logical
>>>>>>>>>>>> rels)
>>>>>>>>>>>> then we will be able to prune out unnecessary directories and save
>>>>>>> the
>>>>>>>>>>>> work
>>>>>>>>>>>> of reading the footers of these files.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>>>>>>>>>>>> Mehant
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>