[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140152#comment-17140152 ]
Danny Chen commented on CALCITE-3786: ------------------------------------- [~vladimirsitnikov] I didn't intend to do modify the RexNode normalization at first, but when i fix to the RexNode#equals instead of the pure string, i found that *i have to* because it seems not right that 2 RexCalls do not equals but have same digest. I can think of 2 more benefits to switch from pure string normalization to nodes: - We can add more normalizations logic more easily(put complex rexnodes behind), i.e. i plan to add a new component RexNormalizaiton - We should make the configuration with more flexibility, user/downstream can config through FrameworkConfig [~hyuan] Current patch already solved 2, 3 of your questions, for 1, i would fix if [~laurent] and i make agreements, but remove the Digest and add 2 more interfaces to RelNodes(#hashCode and #equals) goes too far away for these reasons: - We should be cautious for #equals #hashcode #explainTerms to have similar variables, it is error-prone for downstream projects to implement for their custom nodes - And it is hard to clarify the semantics between strict #equals1 and equivalent #equals1, what if i want a normal #equals1 there - The additional objects references should not be a bottleneck, removing it does not gain much. Based on that, i would definitely give a -1 for the commit f970df. As for 4, i'm looking forward to the RexNode memo(maybe), but it has no relationship with this issue. > Add Digest interface to enable efficient hashCode(equals) for RexNode and > RelNode > --------------------------------------------------------------------------------- > > Key: CALCITE-3786 > URL: https://issues.apache.org/jira/browse/CALCITE-3786 > Project: Calcite > Issue Type: New Feature > Components: core > Affects Versions: 1.21.0 > Reporter: Vladimir Sitnikov > Assignee: Danny Chen > Priority: Major > Fix For: 1.24.0 > > Time Spent: 6h 40m > Remaining Estimate: 0h > > Current digests for RexNode, RelNode, RelType, and similar cases use String > concatenation. > It is easy to implement, however, it has drawbacks: > 1) String objects cannot be reused. For instance, RexCall has operands, > however, the digest is duplicated. It causes extra memory use and extra CPU > for string copying > 2) There's no way to have multiple #toString() methods. RelType might need > multiple digests: "including field names", "excluding field names". > A suggested resolution might be behind the lines of > {code:java} > class Digest { // immutable > final int hashCode; // speedup hashCode and equals > final Object[] contents; // The values are either other Digest objects or > Strings > String toString(); // e.g. for debugging purposes > int compareTo(Digest); // e.g. for debugging purposes. > } > {code} > Note how fields in Kotlin are aligned much better, and it makes it easier to > read: > {code:java} > class Digest { // immutable > val hashCode: Int // speedup hashCode and equals > val contents: Array<Any> // The values are either other Digest objects or > Strings > fun toString(): String // e.g. for debugging purposes > fun compareTo(other: Digest): Int // e.g. for debugging purposes. > } > {code} > Then the digest for RexCall could be the bits relevant to RexCall itself + > digests of the operands (which can be reused as is) -- This message was sent by Atlassian Jira (v8.3.4#803005)