b-slim opened a new pull request #1386:
URL: https://github.com/apache/samza/pull/1386
This is an initial draft of how to support nested row access in Samza
SQL.
There are multiple interconnected items.
1. Added the actual definition of a ROW Calcite data type [**EASY FINAL**].
2. Added a row converter from the Samza type system to the Calcite type system
[**Okay for now, but will need more work for types like timestamps**].
3. Added a collector for projects and filters that are pushed to the remote
table scan [**Complex and needs discussion**].
- Why do we need this? Adding a nested row struct forces the addition of a
project, and in general nothing stops the Calcite logical planner from adding
such an identity project, so this is needed anyway.
- How is this done? For now I chose to minimize the amount of rewriting and
refactoring and added a queue to collect the call stack between the remote table
scan and the join node. When doing the join, the project and filter are then
applied after the join lookup. We need to handle the case where the filter does
not match and either null-pad the result or return null, as per the current
convention. To be honest, I am still debating whether to add the filter
push-down; there seems to be no real gain, since we have already done the lookup.
4. Need to add more type conversions to support legacy UDFs that operate on
non-scalar types and assume everything is a SamzaRelRecord or a Java Map [**Not
done yet, maybe a follow-up**].
5. Need more code cleanup where the type of the map key is mixed up between
Java String and Avro Utf8 [**WIP**].
6. Need more work on the union type system for the case where we have more than
2 types [**Followup**].
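
To make the discussion of item 3 more concrete, here is a minimal, standalone sketch of the idea: record the projects and filters found between the remote table scan and the join, then replay them on the row returned by the lookup. It is not the code in this PR. The class `DeferredTableOps`, its methods, and the use of `List<Object>` as a row are all hypothetical names chosen for illustration, and the sketch assumes the operations are recorded in the order they should be applied and that a miss is null-padded.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical stand-in for the collector in item 3; not Samza's actual API. */
final class DeferredTableOps {
  // Each recorded operation maps a row to a new row, or to null when a filter rejects it.
  private final Deque<Function<List<Object>, List<Object>>> ops = new ArrayDeque<>();

  /** Records a projection seen between the remote table scan and the join. */
  DeferredTableOps project(Function<List<Object>, List<Object>> projection) {
    ops.addLast(projection);
    return this;
  }

  /** Records a filter; a row that does not match becomes null. */
  DeferredTableOps filter(Predicate<List<Object>> predicate) {
    ops.addLast(row -> predicate.test(row) ? row : null);
    return this;
  }

  /** Replays the recorded operations, in recording order, on the looked-up row. */
  List<Object> apply(List<Object> lookedUpRow) {
    List<Object> current = lookedUpRow;
    for (Function<List<Object>, List<Object>> op : ops) {
      if (current == null) {
        return null; // lookup miss, or an earlier filter rejected the row
      }
      current = op.apply(current);
    }
    return current;
  }

  /** Appends the table side to the stream row, null-padding when the table row is absent. */
  static List<Object> join(List<Object> streamRow, List<Object> tableRow, int tableWidth) {
    List<Object> out = new ArrayList<>(streamRow);
    if (tableRow == null) {
      for (int i = 0; i < tableWidth; i++) {
        out.add(null); // assumed null-padding convention for a non-matching row
      }
    } else {
      out.addAll(tableRow);
    }
    return out;
  }
}
```

A caller would build the queue once at plan time, for example `new DeferredTableOps().filter(...).project(...)`, and then call `apply` on every lookup result before `join`. The sketch also shows why pushing the filter down gains little here: the lookup has already happened by the time the filter runs.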
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]