Hi guys,

I am looking for a way to read HBase tables through an MPP database
(Postgres-XC), and am hoping to get some suggestions that either validate
or invalidate the approach.

It is kind of like Apache Drill, but through PostgreSQL. It is a long story
why Postgres, and how C/C++ will give me headaches for months to come. :-) I
will leave it at that for now.

The design is to install distributed Postgres-XC on the same cluster as
HBase, so that Postgres' datanodes are on the same physical nodes as HBase's
RegionServers, and to connect to HBase from PostgreSQL through the existing
HBase client code.

Step 1: At the Postgres coordinator node (like the Master of HBase), use
HTable.getRegionLocations to get all regions of a particular table:
NavigableMap<HRegionInfo, ServerName>
Step 2: Iterate through the above NavigableMap to map each HBase ServerName
to a PG-XC datanode. The goal is to let each Postgres datanode handle the
regions on its own physical machine.
Step 3: The Postgres coordinator node sends the execution plan to the
Postgres datanodes through an existing framework called foreign data wrapper.
Step 4: Each Postgres datanode iterates through its assigned regions and
opens an HBase client Scan with .setStartRow and .setStopRow so that it only
reads the assigned region. I was hoping to use HRegionInfo.regionId directly,
but cannot find such an API in the client Scan.
Step 5: The Postgres datanode further analyses the retrieved data.
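
To make Steps 1 and 2 concrete, here is a rough sketch of the planning I
have in mind on the coordinator. It is plain Java with no HBase dependency:
Region is a simplified stand-in for HRegionInfo, the host strings stand in
for ServerName, and RegionAssignment, assign and the fallback datanode are
names I made up for illustration. In the real thing the region map would
come from HTable.getRegionLocations, and each datanode would turn its
regions' start/end keys into .setStartRow/.setStopRow calls.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical coordinator-side planning for Steps 1-2 (not HBase API).
class RegionAssignment {

    // Simplified stand-in for HRegionInfo: only the key range is kept.
    static final class Region implements Comparable<Region> {
        final String startKey; // inclusive; "" means start of table
        final String endKey;   // exclusive; "" means end of table

        Region(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }

        @Override
        public int compareTo(Region other) {
            return startKey.compareTo(other.startKey);
        }

        @Override
        public String toString() {
            return "[" + startKey + ", " + endKey + ")";
        }
    }

    // Group regions by the datanode that shares a host with their
    // RegionServer. Regions on a host without a datanode go to the
    // fallback datanode (an assumption; they would be read remotely).
    static Map<String, List<Region>> assign(
            NavigableMap<Region, String> regionToHost,
            Map<String, String> hostToDataNode,
            String fallbackDataNode) {
        Map<String, List<Region>> plan = new TreeMap<>();
        for (Map.Entry<Region, String> e : regionToHost.entrySet()) {
            String dataNode = hostToDataNode.get(e.getValue());
            if (dataNode == null) {
                dataNode = fallbackDataNode;
            }
            List<Region> regions = plan.get(dataNode);
            if (regions == null) {
                regions = new ArrayList<>();
                plan.put(dataNode, regions);
            }
            regions.add(e.getKey());
        }
        return plan;
    }

    public static void main(String[] args) {
        NavigableMap<Region, String> regionToHost = new TreeMap<>();
        regionToHost.put(new Region("", "g"), "rs1");
        regionToHost.put(new Region("g", "p"), "rs2");
        regionToHost.put(new Region("p", ""), "rs1");

        Map<String, String> hostToDataNode = new HashMap<>();
        hostToDataNode.put("rs1", "dn1");
        hostToDataNode.put("rs2", "dn2");

        // Each datanode would then open one Scan per assigned key range.
        System.out.println(assign(regionToHost, hostToDataNode, "dn1"));
    }
}
```

One thing the sketch glosses over: regions can move or split between the
time the plan is built and the time the scans run, so the locality is only
a best effort.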

So in short, the architectural design is to leverage the Postgres optimizer
to parse the SQL query, and use the Postgres datanodes as HBase clients to
read HBase regions directly in parallel, with the hope to 1) read each
HRegion locally; 2) leverage existing HBase filters.

On Step 4 above, is there a way to talk to a RegionServer directly without
communicating with the HMaster?

Similar ideas (Drill for one; how about HP Vertica?) have been brought up
and discussed before. So before I head down the same road, can I pick your
brains? Please shed some light for me, or prevent me from doing something
stupid.

Many thanks

Demai
