[ https://issues.apache.org/jira/browse/CALCITE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17132916#comment-17132916 ]
Haisheng Yuan commented on CALCITE-3786:
----------------------------------------

{quote}
That's not true, for RelNode we only bookkeep its class name and id, for rex node only an additional object reference.
{quote}
Let's use the Join operator as an example. It has 11 members (not counting the original string digest): left, right, rowType, cluster, id, traitset, condition, variablesSet, hints, joinType, joinInfo. Supposing all of them are held by reference, they use 44 bytes of shallow heap.

The digest of Join has four fields: hashCode, rel, digest, items. These use 16 bytes. items is a list with 10 slots (the default size is 10), which uses 40 bytes. Inside the items array list there are 4 pairs, \{"left", leftinput\}, \{"right", rightinput\}, \{"condition", condition\}, \{"joinType", jointype\}, which use 32 bytes. If the RelNode is registered inside VolcanoPlanner, the left and right inputs are subsets, which are converted to strings of the form set_ID+traitset; that can take more than 10 bytes for a single input. The digest by itself can therefore use more than 100 bytes.

{quote}
The digest behaves like a tool to unify the logic. We can also let each rel node to handle its "digest" but the code would be a mess.
{quote}
Do we really need a tool to unify the logic? It sounds like a forced marriage. The reality is that with the digest it is a mess. We can't even use an object's hashCode and equals to determine whether it is equal to another object; instead we need to override explainTerms(). Isn't that counter-intuitive?

{quote}
They are different instances, but all the operators expect to be a singleton, and each time we copy the RexCalls in the rule, we just pass around the object reference.
{quote}
Did you try to debug it? It is true that when you copy a RexCall it just passes the reference, but after column pruning or project transpose the RexCall might be a completely new object, even when an equivalent RexCall already exists.
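
To make the arithmetic above easy to check, here is a small Java tally. It is only a sketch: it assumes every slot (each reference, and the int hashCode) costs 4 bytes, as the 44-byte figure for 11 members implies, and it ignores object headers and alignment. The class and method names are made up for illustration, and 10 bytes per input string is just the lower bound mentioned above.

```java
// Back-of-the-envelope tally of the numbers in the comment; not a measurement.
final class DigestFootprint {
    // Assumption: each reference (and the int hashCode) occupies 4 bytes.
    static final int SLOT_BYTES = 4;

    // Join itself: 11 members, all counted as references.
    static int joinShallowBytes() {
        return 11 * SLOT_BYTES;
    }

    // The digest before counting the strings built for the inputs.
    static int digestBytesWithoutInputs() {
        int fields = 4 * SLOT_BYTES;      // hashCode, rel, digest, items
        int itemsArray = 10 * SLOT_BYTES; // backing array with 10 slots
        int pairs = 4 * 2 * SLOT_BYTES;   // 4 pairs, each a key and a value
        return fields + itemsArray + pairs;
    }

    // Adds the strings generated for the left and right inputs.
    static int digestBytesWithInputs(int bytesPerInputString) {
        return digestBytesWithoutInputs() + 2 * bytesPerInputString;
    }

    public static void main(String[] args) {
        System.out.println("Join shallow size:      " + joinShallowBytes());
        System.out.println("Digest without inputs:  " + digestBytesWithoutInputs());
        System.out.println("Digest with two inputs: " + digestBytesWithInputs(10));
    }
}
```

Under these assumptions the digest is already larger than the shallow size of the Join it describes, before any input string is counted.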
> Add Digest interface to enable efficient hashCode(equals) for RexNode and RelNode
> ---------------------------------------------------------------------------------
>
>                 Key: CALCITE-3786
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3786
>             Project: Calcite
>          Issue Type: New Feature
>          Components: core
>   Affects Versions: 1.21.0
>            Reporter: Vladimir Sitnikov
>            Assignee: Danny Chen
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Current digests for RexNode, RelNode, RelType, and similar cases use String concatenation.
> It is easy to implement, however, it has drawbacks:
> 1) String objects cannot be reused. For instance, RexCall has operands, however, the digest is duplicated. It causes extra memory use and extra CPU for string copying.
> 2) There's no way to have multiple #toString() methods. RelType might need multiple digests: "including field names", "excluding field names".
>
> A suggested resolution might be along the lines of
> {code:java}
> class Digest { // immutable
>   final int hashCode; // speedup hashCode and equals
>   final Object[] contents; // The values are either other Digest objects or Strings
>   String toString(); // e.g. for debugging purposes
>   int compareTo(Digest); // e.g. for debugging purposes.
> }
> {code}
> Note how fields in Kotlin are aligned much better, and it makes it easier to read:
> {code:java}
> class Digest { // immutable
>   val hashCode: Int // speedup hashCode and equals
>   val contents: Array<Any> // The values are either other Digest objects or Strings
>   fun toString(): String // e.g. for debugging purposes
>   fun compareTo(other: Digest): Int // e.g. for debugging purposes.
> }
> {code}
> Then the digest for RexCall could be the bits relevant to RexCall itself + digests of the operands (which can be reused as is)
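
For reference, the Digest shape proposed in the issue description could be fleshed out roughly as follows. This is a minimal, self-contained sketch and not Calcite code: the varargs constructor, the use of Arrays.hashCode and Arrays.equals, the parenthesised toString format, and the string-based compareTo are all assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical immutable digest: contents are Strings or other Digest objects.
final class Digest implements Comparable<Digest> {
    private final int hashCode;      // computed once, reused by hashCode and equals
    private final Object[] contents; // Strings or nested Digest instances

    Digest(Object... contents) {
        this.contents = contents.clone(); // defensive copy keeps the digest immutable
        // Nested digests contribute their own cached hash, so no string is rebuilt.
        this.hashCode = Arrays.hashCode(this.contents);
    }

    @Override public int hashCode() {
        return hashCode;
    }

    @Override public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof Digest)) {
            return false;
        }
        Digest other = (Digest) o;
        // Cheap rejection on the cached hash before comparing element by element.
        return hashCode == other.hashCode && Arrays.equals(contents, other.contents);
    }

    @Override public String toString() { // e.g. for debugging purposes
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < contents.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(contents[i]);
        }
        return sb.append(')').toString();
    }

    @Override public int compareTo(Digest other) { // e.g. for debugging purposes
        return toString().compareTo(other.toString());
    }
}
```

With this sketch, a digest for a call such as `new Digest("+", new Digest("$0"), new Digest("1"))` holds references to the operand digests rather than copies of their text, and two separately built but equivalent calls compare equal through equals and hashCode without either side producing a string.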