I’m looking at using RelFieldTrimmer, and I’m noticing that if a side of a join 
has unnecessary fields after a filter, there is no trim-fields project on that 
side to reduce the width of the row.
Is this expected, or is there a configuration or pre-processing step that I am 
missing?

For example, starting with this tree (these all look better in monospace; 
hopefully the formatting comes through):
4:Project(C5633_14509=[$4], C5633_486=[$8])
└── 3:Join(condition=[=($1, $6)], joinType=[inner])
....├── 1:Filter(condition=[<($2, 10)])
....│...└── 0:TableScan(table=[T902], Schema=[...6 fields...])
....└── 2:TableScan(table=[T895], Schema=[...64 fields...])

The result of RelFieldTrimmer is this:
9:Project(C5633_14509=[$2], C5633_486=[$4])
└── 8:Join(condition=[=($0, $3)], joinType=[inner])
....├── 6:Filter(condition=[<($1, 10)])
....│...└── 5:Project(C5633_14505=[$1], C5633_14506=[$2], C5633_14509=[$4])
....│.......└── 0:TableScan(table=[T902], Schema=[...6 fields...])
....└── 7:Project(ID=[$0], C5633_486=[$2])
........└── 2:TableScan(table=[T895], Schema=[...64 fields...])

Notice: $1 on the LHS of the join is not used *after* the filter, so a 
projection of only the $0 and $2 fields would reduce the width of the row 
before the join.
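
To spell out the bookkeeping I have in mind, here is a rough sketch in plain 
Python. It is not Calcite code and does not reflect how RelFieldTrimmer is 
implemented; the names plan_trim and remap_join_ref are made up for 
illustration. It only models which positions of the filter's output are still 
read above the filter, and how the references in the join would shift if a 
projection were placed there.

```python
def plan_trim(fields, filter_refs, parent_refs):
    """Decide which filter output positions a projection above the filter keeps.

    fields      -- field names of the filter's output, by position
    filter_refs -- positions read by the filter condition
    parent_refs -- positions read above the filter (join condition, top project)
    """
    kept = sorted(set(parent_refs))
    # Positions read by the filter but by nothing above it.
    filter_only = sorted(set(filter_refs) - set(kept))
    # Old position -> new position after the projection.
    mapping = {old: new for new, old in enumerate(kept)}
    return kept, filter_only, mapping


def remap_join_ref(ref, left_width, left_mapping):
    """Rewrite a reference into the join's row after trimming the left input."""
    if ref < left_width:
        return left_mapping[ref]
    # Right-side fields slide left by the number of left fields removed.
    return ref - left_width + len(left_mapping)


# Left side of join 8 above: output of Filter 6 (same fields as Project 5).
left_fields = ["C5633_14505", "C5633_14506", "C5633_14509"]

# The filter reads $1; the join condition reads $0 and the top project reads $2.
kept, filter_only, mapping = plan_trim(left_fields, filter_refs=[1], parent_refs=[0, 2])

kept_names = [left_fields[i] for i in kept]
new_join_condition = [remap_join_ref(r, len(left_fields), mapping) for r in (0, 3)]
new_top_project = [remap_join_ref(r, len(left_fields), mapping) for r in (2, 4)]
```

Under these assumptions, kept_names is the pair C5633_14505, C5633_14509, the 
join condition becomes =($0, $2), and the top project reads $1 and $3.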

However, I can force the insertion of a projection which is simply the identity 
(i.e., projecting all fields of the input row with no additions or subtractions):
5:Project(C5633_14509=[$4], C5633_486=[$8])
└── 4:Join(condition=[=($1, $6)], joinType=[inner])
....├── 2:Project(...Identity mapping, 6 fields...)
....│...└── 1:Filter(condition=[<($2, 10)])
....│.......└── 0:TableScan(table=[T902], Schema=[...6 fields...])
....└── 3:TableScan(table=[T895], Schema=[...64 fields...])

And the result is a projection which only has the 2 fields necessary after the 
filter:
11:Project(C5633_14509=[$1], C5633_486=[$3])
└── 10:Join(condition=[=($0, $2)], joinType=[inner])
....├── 8:Project(C5633_14505=[$0], C5633_14509=[$2]) <- trimmed
....│...└── 7:Filter(condition=[<($1, 10)])
....│.......└── 6:Project(C5633_14505=[$1], C5633_14506=[$2], C5633_14509=[$4])
....│...........└── 0:TableScan(table=[T902], Schema=[...6 fields...])
....└── 9:Project(ID=[$0], C5633_486=[$2])
........└── 3:TableScan(table=[T895], Schema=[...64 fields...])

Thanks!
-Ian
