fqshopify opened a new pull request, #174:
URL: https://github.com/apache/flink-connector-kafka/pull/174

   Fixes: https://issues.apache.org/jira/browse/FLINK-32609
   
   Implements the `SupportsProjectPushdown` interface for `KafkaDynamicSource`. 
   
   # Benefits
   
   1. Improved performance
       - Unneeded columns will be filtered out at an earlier stage of 
processing (specficially in the `TableSourceScan` node). 
       - The amount of improvement in performance will vary depending on:
           - The number of columns selected
           - The number of columns in the source data
           - The DecodingFormat 
           - etc.
   2. Improved resiliency
       - Currently SQL queries will fail if any field in the table undergoes a 
breaking schema change, even if the SQL query itself does not depend on that 
field. 
       - After the changes in this PR, SQL queries will continue to work even 
if fields that they do not depend on experience breaking schema changes. See 
`testBreakingSchemaChanges` for an example. 
       - This improvement is generally only applicable to 
`ProjectableDecodingFormat` that can decode messages independently/dynamically 
e.g. `json`, `avro-confluent`, `debezium-avro-confluent`. 
   
   # Limitations
   
   1. We cannot push projections all the way down into Kafka. 
       - Projection pushdown is primarily used as an I/O optimization technique 
by pushing projections down all the way into the storage layer. 
       - Unfortunately our storage layer, Kafka, does not support projection 
pushdown. As a result, we are not able to push projections down further than 
the deserialization step of our Flink pipelines. 
       - We're still able to improve performance by eliminating unnecessary 
columns at an earlier stage of processing within our Flink pipeline but the 
performance benefits are relatively lower than they would have been otherwise. 
   
   # Challenges
   
   1. `AvroDecodingFormat` is not actually projectable
       - The `AvroDecodingFormat` claims to be a `ProjectableDecodingFormat` 
but is not actually projectable AFAICT. 
       - It appears that I’m not the first person to discover this issue: 
FLINK-35324
       - Possible remediations: 
           - Fix the `AvroDecodingFormat` to actually be projectable
               - I think this is impossible based on how Avro works
           - Change `AvroDecodingFormat` so that it implements just 
`DecodingFormat`
               - I think this is the right solution but this is likely a 
breaking change
           - Add an optional configuration to disable pushing down projections 
into the decoder
               - This is what I've done in the prototype to unblock myself 
temporarily
   
   # Next steps
   
   - [ ] Get feedback from the community
   - [ ] Raise a FLIP if necessary
   - [ ] Align on a solution for `AvroDecodingFormat` issue
   - [ ] Clean up code, more unit tests, etc.
   - [ ] Open up for formal PR reviews (early reviews/comments are still 
welcome though!)
   
   # Reviewing notes
   
   Please note that the code included in this PR is currently in the 
working-prototype stage and is mostly intended to facilitate discussion. I can 
definitely clean up the code and I'm open to changing things. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to