[ 
https://issues.apache.org/jira/browse/FLINK-18687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Caizhi Weng closed FLINK-18687.
-------------------------------
    Resolution: Duplicate

> ProjectionCodeGenerator#generateProjectionExpression should remove for loop 
> optimization
> ----------------------------------------------------------------------------------------
>
>                 Key: FLINK-18687
>                 URL: https://issues.apache.org/jira/browse/FLINK-18687
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / Runtime
>    Affects Versions: 1.11.0
>            Reporter: Caizhi Weng
>            Priority: Critical
>             Fix For: 1.12.0
>
>
> If too many fields of the same type are projected, 
> {{ProjectionCodeGenerator#generateProjectionExpression}} currently performs a 
> "for loop optimization": instead of generating code separately for each 
> field, it squashes the writes for those fields into a single for loop.
> However, if the indices of the fields with the same type are not contiguous, 
> this optimization will not write fields in ascending index order. This is not 
> acceptable because {{BinaryWriter}}s expect users to write fields in 
> ascending index order (that is to say, we *have to* first write field 0, then 
> field 1, then...); otherwise the variable-length areas of two binary rows 
> with the same data might differ. Although we can use the {{getXX}} methods of 
> {{BinaryRow}} to read the fields correctly, states for streaming jobs compare 
> state keys by their binary bits, not by the contents of the keys. So we need 
> to make sure that the binary bits of two binary rows are the same if the rows 
> contain the same data.
> What's worse, as the current implementation of 
> {{ProjectionCodeGenerator#generateProjectionExpression}} uses a Scala 
> {{HashMap}}, the key order of the map might be different on different 
> workers. Even if the projection does not meet the condition to be optimized, 
> it will still be affected by this bug.
> What I suggest is to simply remove this optimization, because if we still 
> want it, we have to make sure that the fields of the same type have 
> contiguous indices, which is a very strict and rare condition.
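
The write-order problem described above can be illustrated with a small, self-contained sketch. It is a toy model, not Flink's actual {{BinaryRow}} layout or {{BinaryWriter}} API: the class WriteOrderDemo, the methods writeRow and readField, and the 8-byte (offset, length) slot per field are all assumptions made for illustration. It only shows that two rows holding the same values can be read back identically and still differ byte for byte when the fields were written in a different order.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class WriteOrderDemo {

    /**
     * Toy row writer (hypothetical layout): one 8-byte slot per field holding
     * (offset, length), followed by a variable-length area that is filled in
     * the order the fields are written.
     */
    public static byte[] writeRow(String[] values, int[] writeOrder) {
        int fixedSize = 8 * values.length;
        int varSize = 0;
        for (String v : values) {
            varSize += v.getBytes(StandardCharsets.UTF_8).length;
        }
        ByteBuffer buf = ByteBuffer.allocate(fixedSize + varSize);
        int cursor = fixedSize; // next free position in the variable-length area
        for (int field : writeOrder) {
            byte[] bytes = values[field].getBytes(StandardCharsets.UTF_8);
            buf.putInt(8 * field, cursor);           // where this field's data starts
            buf.putInt(8 * field + 4, bytes.length); // how long it is
            for (int i = 0; i < bytes.length; i++) {
                buf.put(cursor + i, bytes[i]);
            }
            cursor += bytes.length;
        }
        return buf.array();
    }

    /** Reads a field back through its (offset, length) slot. */
    public static String readField(byte[] row, int field) {
        ByteBuffer buf = ByteBuffer.wrap(row);
        int offset = buf.getInt(8 * field);
        int length = buf.getInt(8 * field + 4);
        return new String(row, offset, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String[] values = {"a", "bb", "ccc"};
        // Ascending index order, as the writer contract requires.
        byte[] ascending = writeRow(values, new int[] {0, 1, 2});
        // Fields 0 and 2 grouped into one "loop" and written before field 1.
        byte[] grouped = writeRow(values, new int[] {0, 2, 1});

        boolean sameContents = true;
        for (int i = 0; i < values.length; i++) {
            sameContents &= readField(ascending, i).equals(readField(grouped, i));
        }
        System.out.println("same contents: " + sameContents);
        System.out.println("same bytes:    " + Arrays.equals(ascending, grouped));
    }
}
```

In this toy model the getter-style read succeeds for both rows, but a byte-wise comparison of the two rows fails, which is the situation that breaks state key lookups.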



