Github user fhueske commented on the issue:
https://github.com/apache/flink/pull/3149
Hi @ramkrish86, @tonycox, and @wuchong,
sorry for joining the discussion a bit late. I haven't looked at the code
yet, but I think the discussion is going in the right direction.
I had a look at [how Apache Drill provides access to HBase
tables](https://drill.apache.org/docs/querying-hbase/). Drill also uses a
nested schema of `[rowkey, colfamily1[col1, col2, ...], colfamily2[col1, col2,
...] ...]`, so basically the same as what we are discussing here.
Regarding the field types: The serialization is not under our control, so we
should also offer to just return the raw bytes (as Drill does). If users have
custom data types or serialization logic, they can use a user-defined scalar
function to extract the value. I don't know what the standard serialization
format for primitives in HBase is (or if there is one at all).
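To illustrate the raw-bytes approach, here is a minimal sketch of the decoding logic such a user-defined function could wrap. The class `RawCellDecoder` and its method names are hypothetical and not part of Flink or HBase, and the encodings (4-byte big-endian integers, UTF-8 strings) are assumptions about how the writer serialized the cells, not a format prescribed by the table.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not a Flink or HBase class.
final class RawCellDecoder {
    private RawCellDecoder() {}

    // Assumption: the cell value was written as a 4-byte big-endian int.
    static int toInt(byte[] raw) {
        if (raw == null || raw.length != Integer.BYTES) {
            throw new IllegalArgumentException("expected exactly 4 bytes");
        }
        // ByteBuffer reads in big-endian order by default.
        return ByteBuffer.wrap(raw).getInt();
    }

    // Assumption: the cell value was written as a UTF-8 encoded string.
    static String toUtf8String(byte[] raw) {
        if (raw == null) {
            throw new IllegalArgumentException("raw bytes must not be null");
        }
        return new String(raw, StandardCharsets.UTF_8);
    }
}
```

In a table program, the body of a scalar function's evaluation method could delegate to helpers like these, so the TableSource itself stays agnostic about the encoding.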
Regarding restricting the scan with rowkeys: @tonycox's PR for [filterable
TableSources](https://github.com/apache/flink/pull/3166) can be used to set the
scan range. This would be much better than "hardcoding" the scan ranges in the
TableSource.
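As a rough sketch of the idea, pushed-down rowkey predicates could be folded into a scan range like this. The class `RowKeyRange` is hypothetical and does not reflect the API in the linked PR; it only assumes that rowkeys compare as unsigned bytes in lexicographic order and that the range has an inclusive start and an exclusive stop, which matches the default behavior of an HBase scan.

```java
// Hypothetical sketch for illustration; not the filterable TableSource API.
final class RowKeyRange {
    private byte[] start; // inclusive lower bound, null means unbounded
    private byte[] stop;  // exclusive upper bound, null means unbounded

    // Lexicographic comparison of unsigned bytes, the order assumed for rowkeys.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    // Fold in a pushed-down predicate "rowkey >= key": keep the larger lower bound.
    void atLeast(byte[] key) {
        if (start == null || compare(key, start) > 0) {
            start = key;
        }
    }

    // Fold in a pushed-down predicate "rowkey < key": keep the smaller upper bound.
    void lessThan(byte[] key) {
        if (stop == null || compare(key, stop) < 0) {
            stop = key;
        }
    }

    byte[] start() { return start; }

    byte[] stop() { return stop; }
}
```

The resulting bounds would then be handed to the scan when the source is opened, so the range follows from the query instead of being fixed in the TableSource.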
Best, Fabian