[GitHub] flink issue #3149: FLINK-2168 Add HBaseTableSource

fhueske Fri, 20 Jan 2017 16:43:58 -0800

Github user fhueske commented on the issue:

    https://github.com/apache/flink/pull/3149
  
    Hi @ramkrish86, @tonycox, and @wuchong,
    
    sorry for joining the discussion a bit late. I haven't looked at the code 
yet, but I think the discussion is going in the right direction. 
    
    I had a look at [how Apache Drill provides access to HBase 
tables](https://drill.apache.org/docs/querying-hbase/). Drill also uses a 
nested schema of `[rowkey, colfamily1[col1, col2, ...], colfamily2[col1, col2, 
...] ...]`, which is basically the same as what we are discussing here.
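    
    To make the nested layout concrete, here is a minimal sketch in plain Java. `NestedHBaseRow` and its methods are hypothetical names used only for illustration; they are not classes from Flink, Drill, or the PR. Cell values are kept as raw `byte[]`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical container (not a Flink or Drill class): one HBase row laid out as
// [rowkey, family1[col1, col2, ...], family2[col1, ...], ...] with raw byte values.
final class NestedHBaseRow {
    final byte[] rowKey;
    // column family -> (column qualifier -> raw cell value)
    final Map<String, Map<String, byte[]>> families = new LinkedHashMap<>();

    NestedHBaseRow(byte[] rowKey) {
        this.rowKey = rowKey;
    }

    // Adds a cell under the given family, creating the family on first use.
    NestedHBaseRow put(String family, String qualifier, byte[] value) {
        families.computeIfAbsent(family, f -> new LinkedHashMap<>()).put(qualifier, value);
        return this;
    }

    // Returns the raw cell value, or null if the family or qualifier is absent.
    byte[] get(String family, String qualifier) {
        Map<String, byte[]> columns = families.get(family);
        return columns == null ? null : columns.get(qualifier);
    }
}
```

    A query against such a row would address a cell as `colfamily1.col1`, which corresponds to `row.get("colfamily1", "col1")` in this sketch.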
    
    Regarding the field types: the serialization is not under our control, so 
we should also offer to just return the raw bytes (as Drill does). If users have 
custom data types or serialization logic, they can use a user-defined scalar 
function to extract the value. I don't know what the standard serialization 
format for primitives in HBase is (or whether there is one at all). 
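    
    As a rough sketch of what such extraction logic could look like, the plain-Java helper below decodes raw bytes under the assumption that integers were written as fixed-width big-endian values and strings as UTF-8. That encoding is an assumption about the writer, not something the source can know. `RawBytesDecoder` is a hypothetical name; in the Table API the same logic would sit in the `eval` method of a user-defined scalar function.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decodes raw cell bytes under an ASSUMED encoding
// (fixed-width big-endian integers, UTF-8 strings). Other writers may differ.
final class RawBytesDecoder {
    private RawBytesDecoder() {}

    // Decodes a 4-byte big-endian int; ByteBuffer is big-endian by default.
    static int toInt(byte[] raw) {
        requireLength(raw, Integer.BYTES);
        return ByteBuffer.wrap(raw).getInt();
    }

    // Decodes an 8-byte big-endian long.
    static long toLong(byte[] raw) {
        requireLength(raw, Long.BYTES);
        return ByteBuffer.wrap(raw).getLong();
    }

    // Decodes the bytes as a UTF-8 string.
    static String toUtf8String(byte[] raw) {
        if (raw == null) {
            throw new IllegalArgumentException("raw bytes must not be null");
        }
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Rejects null input and values whose width does not match the target type.
    private static void requireLength(byte[] raw, int expected) {
        if (raw == null || raw.length != expected) {
            throw new IllegalArgumentException("expected " + expected + " bytes");
        }
    }
}
```

    Returning raw bytes by default and decoding in a separate function keeps the source independent of any particular encoding; users with a different format only swap out the decoding function.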
    
    Regarding restricting the scan by rowkey: @tonycox's PR for [filterable 
TableSources](https://github.com/apache/flink/pull/3166) can be used to set the 
scan range. This would be much better than hardcoding the scan ranges in the 
TableSource.
    
    Best, Fabian


