b-slim opened a new pull request #1386:
URL: https://github.com/apache/samza/pull/1386


   This is an initial draft of how to support nested row access in Samza SQL.
   There are multiple interconnected items:
   
   1. Added the actual definition of a Calcite ROW data type [**EASY FINAL**].
   
   2. Added a row converter from the Samza type system to the Calcite type system [**Okay for now, but will need more work for types like timestamps**].
   3. Added a collector for the projects and filters that are pushed down to the remote table scan [**Complex and Needs Discussion**].
   
    - Why do we need this? Adding a nested row struct forces the addition of a project, and in general nothing stops the Calcite logical planner from adding such an identity project, so this is needed anyway.
   
    - How is this done? For now, I chose to minimize rewriting and refactoring, and added a queue that collects the call stack between the remote table scan and the join node. When the join is executed, the collected project and filter are applied after the join lookup. We still need to handle the case where the filter does not match: either null-pad the result or return null, per the current convention. To be honest, I am still debating whether to add the filter push-down at all, since there seems to be no real gain once the lookup has already been done.
   
   4. Need to add more type conversions to support legacy UDFs that operate on non-scalar types and assume everything is a SamzaRelRecord or a Java Map [**Not done yet, maybe a follow-up**].
   5. Need more code cleanup where map keys are mixed up between Java String and Avro Utf8 [**WIP**].
   6. Need more work on the union type case where there are more than 2 types [**Follow-up**].
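
Item 2 describes a converter from the Samza type system to the Calcite type system. The sketch below only illustrates the recursive shape such a converter can take for nested rows; `SamzaSchema`, `RelType`, and `RowTypeConverter` are hypothetical stand-ins, not the real Samza or Calcite classes (a real implementation would most likely build a Calcite `RelDataType` through a `RelDataTypeFactory`). Timestamps are deliberately left unsupported, since item 2 flags them as needing more work.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a Samza SQL field schema (not the real Samza class).
final class SamzaSchema {
    enum Kind { INT32, INT64, STRING, BOOLEAN, ROW }

    final Kind kind;
    // Only populated when kind == ROW; insertion order is the field order.
    final Map<String, SamzaSchema> fields;

    private SamzaSchema(Kind kind, Map<String, SamzaSchema> fields) {
        this.kind = kind;
        this.fields = fields;
    }

    static SamzaSchema of(Kind kind) {
        return new SamzaSchema(kind, Collections.<String, SamzaSchema>emptyMap());
    }

    static SamzaSchema row(Map<String, SamzaSchema> fields) {
        return new SamzaSchema(Kind.ROW, new LinkedHashMap<>(fields));
    }
}

// Hypothetical stand-in for a Calcite RelDataType: a SQL type name plus nested fields.
final class RelType {
    final String sqlTypeName;
    final List<String> fieldNames = new ArrayList<>();
    final List<RelType> fieldTypes = new ArrayList<>();

    RelType(String sqlTypeName) {
        this.sqlTypeName = sqlTypeName;
    }

    @Override
    public String toString() {
        if (!"ROW".equals(sqlTypeName)) {
            return sqlTypeName;
        }
        StringBuilder sb = new StringBuilder("ROW(");
        for (int i = 0; i < fieldNames.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(fieldNames.get(i)).append(' ').append(fieldTypes.get(i));
        }
        return sb.append(')').toString();
    }
}

final class RowTypeConverter {
    // Maps scalar kinds directly and recurses into ROW kinds field by field.
    static RelType convert(SamzaSchema schema) {
        switch (schema.kind) {
            case INT32:
                return new RelType("INTEGER");
            case INT64:
                return new RelType("BIGINT");
            case STRING:
                return new RelType("VARCHAR");
            case BOOLEAN:
                return new RelType("BOOLEAN");
            case ROW: {
                RelType row = new RelType("ROW");
                for (Map.Entry<String, SamzaSchema> e : schema.fields.entrySet()) {
                    row.fieldNames.add(e.getKey());
                    row.fieldTypes.add(convert(e.getValue())); // nested rows recurse here
                }
                return row;
            }
            default:
                throw new IllegalArgumentException("Unsupported kind: " + schema.kind);
        }
    }
}
```

For a schema with an INT64 `id` field and a nested `address` row holding a STRING `city` and an INT32 `zip`, `convert(...).toString()` yields `ROW(id BIGINT, address ROW(city VARCHAR, zip INTEGER))`.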
   
   
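Item 3 collects the projects and filters sitting between the remote table scan and the join, then applies them after the lookup. The following is a minimal sketch of that idea under several assumptions: rows are plain `Object[]` arrays, the operators are enqueued in bottom-up translation order (scan side first), and a filter that rejects the looked-up row is treated like a missed lookup, so the result is null-padded for a LEFT join and dropped (returned as null) for an INNER join. `PostLookupOp`, `RemoteLookupJoin`, and the in-memory `Map` standing in for the remote table are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Map;
import java.util.Queue;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical: one Project or Filter found between the remote table scan and the join.
final class PostLookupOp {
    final Function<Object[], Object[]> project; // set only for a project
    final Predicate<Object[]> filter;           // set only for a filter

    private PostLookupOp(Function<Object[], Object[]> project, Predicate<Object[]> filter) {
        this.project = project;
        this.filter = filter;
    }

    static PostLookupOp ofProject(Function<Object[], Object[]> project) {
        return new PostLookupOp(project, null);
    }

    static PostLookupOp ofFilter(Predicate<Object[]> filter) {
        return new PostLookupOp(null, filter);
    }
}

// Hypothetical join operator that defers the collected operators until after the lookup.
final class RemoteLookupJoin {
    enum JoinType { INNER, LEFT }

    // FIFO queue, filled in bottom-up translation order (scan side first).
    private final Queue<PostLookupOp> ops = new ArrayDeque<>();
    private final Map<Object, Object[]> remoteTable; // stand-in for the remote store
    private final JoinType joinType;
    private final int rightWidth; // column count of the right side after all projects

    RemoteLookupJoin(Map<Object, Object[]> remoteTable, JoinType joinType, int rightWidth) {
        this.remoteTable = remoteTable;
        this.joinType = joinType;
        this.rightWidth = rightWidth;
    }

    void collect(PostLookupOp op) {
        ops.add(op);
    }

    /** Returns the joined row, a null-padded row (LEFT), or null when the row is dropped (INNER). */
    Object[] join(Object[] left, Object key) {
        Object[] right = remoteTable.get(key); // the lookup always happens first
        // Iterate without draining so the same operators are reused for every record.
        for (PostLookupOp op : ops) {
            if (right == null) {
                break;
            }
            if (op.filter != null) {
                if (!op.filter.test(right)) {
                    right = null; // filter rejected the looked-up row: treat it as no match
                }
            } else {
                right = op.project.apply(right);
            }
        }
        if (right == null) {
            if (joinType == JoinType.INNER) {
                return null;
            }
            right = new Object[rightWidth]; // LEFT join: null-pad the right side
        }
        Object[] out = new Object[left.length + right.length];
        System.arraycopy(left, 0, out, 0, left.length);
        System.arraycopy(right, 0, out, left.length, right.length);
        return out;
    }
}
```

Because the filter only runs after `remoteTable.get(key)`, it saves no lookup I/O in this shape, which is the trade-off behind the open question about keeping the filter push-down.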

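Item 5 concerns map keys that are sometimes `java.lang.String` and sometimes Avro `Utf8`. `Utf8` implements `CharSequence` but is not `equals` to a `String` with the same content, so a lookup such as `map.get("name")` can miss. One possible cleanup, sketched here with a hypothetical `MapKeys.normalize` helper, is to copy such maps into `String`-keyed maps at the boundary, recursing into nested maps; the test uses `StringBuilder` as a dependency-free stand-in for `Utf8`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: copies a map whose keys may be any CharSequence (such as
// Avro's Utf8) into a map keyed by java.lang.String, recursing into nested maps.
final class MapKeys {
    static Map<String, Object> normalize(Map<? extends CharSequence, ?> in) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<? extends CharSequence, ?> e : in.entrySet()) {
            Object value = e.getValue();
            if (value instanceof Map) {
                // Nested rows often surface as nested maps; normalize their keys too.
                @SuppressWarnings("unchecked")
                Map<? extends CharSequence, ?> nested = (Map<? extends CharSequence, ?>) value;
                value = normalize(nested);
            }
            // If two keys have the same text (e.g. a String and a Utf8), the later entry wins.
            out.put(e.getKey().toString(), value);
        }
        return out;
    }
}
```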
