For 1 > I think read-only access to HBase should be OK, because most
updates to HBase go through either the HBase client, REST via Stargate [1],
or Thrift.

For 2 > In Apache Gora we use Avro to map types to columns and to
generate Java POJOs with the Avro compiler.

For 3 > This is the one I am kind of torn on. Apache Phoenix (incubating)
tries to provide SQL on top of HBase [2] via extra indexing and caching. I
think this defeats the purpose of having NoSQL databases, which serve a
different purpose than relational databases.

I am not sure MetaModel should touch NoSQL databases of the
column-oriented type. These databases are designed for large data sets,
with access primarily by key rather than through a query mechanism.

Just my 2 cents.


[1] http://wiki.apache.org/hadoop/Hbase/Stargate
[2] http://phoenix.incubator.apache.org/

On Fri, Jan 24, 2014 at 11:35 AM, Kasper Sørensen
<[email protected]> wrote:
> Hi everyone,
>
> I was looking at our "hbase-module" branch and as much as I like this idea,
> I think we've been a bit too idle with the branch. Maybe we should try to
> make something final e.g. for a version 4.1.
>
> So I thought I would give an overview/status of the module's current
> capabilities and its shortcomings. We should figure out if we think this
> is good enough for a first version, or if we want to do some improvements
> to the module before adding it to our portfolio of MetaModel modules.
>
> 1) The module only offers read-only/query access to HBase. That is in my
> opinion OK for now, we have several such modules, and this is something we
> can better add later if we straighten out the remaining topics in this mail.
>
> 2) With regards to metadata mapping: HBase is different because it has both
> column families and in column families there are columns. For the sake of
> our view on HBase I would describe column families simply as "a logical
> group of columns". Column families are fixed within a table, but rows in
> a table may
> contain arbitrary numbers of columns within each column family. So... You
> can instantiate the HBaseDataContext in two ways:
>
> 2a) You can let MetaModel discover the metadata. This unfortunately has a
> severe limitation. We discover the table names and column families using
> the HBase API. But the actual columns and their contents cannot be provided
> by the API. So instead we simply expose the column families with a MAP data
> type. The trouble with this is that the keys and values of the maps will
> simply be byte arrays ... Usually not very useful! But it's sort of the
> only thing (as far as I can see) that's "safe" in HBase, since HBase allows
> anything (byte arrays) in its columns.
>
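
To illustrate why raw byte arrays are hard to use without metadata, here is
a minimal sketch in plain Java (no HBase or MetaModel classes; the class
name CellDecoding is made up for this example). The same four bytes decode
to two unrelated values depending on which type the reader assumes:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of HBase or MetaModel.
class CellDecoding {
    // Interpret a 4-byte cell value as a big-endian int.
    static int asInt(byte[] value) {
        return ByteBuffer.wrap(value).getInt();
    }

    // Interpret the same bytes as UTF-8 text.
    static String asString(byte[] value) {
        return new String(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] cell = "2014".getBytes(StandardCharsets.UTF_8);
        System.out.println(asString(cell)); // prints 2014
        System.out.println(asInt(cell));    // prints the decimal value of 0x32303134
    }
}
```

Nothing in the bytes themselves says which reading is right, which is why
the discovered schema can only offer maps of byte arrays.
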
> 2b) Like in e.g. MongoDb or CouchDb modules you can provide an array of
> tables (SimpleTableDef). That way the user defines the metadata himself and
> the implementation assumes that it is correct (or else it will break). The
> good thing about this is that the user can define the proper data types
> etc. for columns. The user defines the column family and column name by
> defining the MetaModel column name like this: "family:name"
> (consistent with most HBase tools and API calls).
>
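
A rough sketch of how a "family:name" column name could be split into its
two parts, in plain Java. The class QualifiedColumn is hypothetical and not
taken from the branch. It assumes the family never contains a colon, so the
split is made at the first colon and the qualifier may contain further
colons or be empty:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical value class, not part of HBase or MetaModel.
final class QualifiedColumn {
    final String family;
    final String qualifier;

    private QualifiedColumn(String family, String qualifier) {
        this.family = family;
        this.qualifier = qualifier;
    }

    // Split "family:name" at the first colon; reject names without a family.
    static QualifiedColumn parse(String columnName) {
        int idx = columnName.indexOf(':');
        if (idx <= 0) {
            throw new IllegalArgumentException(
                    "Expected 'family:name' but got: " + columnName);
        }
        return new QualifiedColumn(
                columnName.substring(0, idx),
                columnName.substring(idx + 1));
    }

    // HBase addresses families and qualifiers as byte arrays.
    byte[] familyBytes() {
        return family.getBytes(StandardCharsets.UTF_8);
    }

    byte[] qualifierBytes() {
        return qualifier.getBytes(StandardCharsets.UTF_8);
    }
}
```

For example, QualifiedColumn.parse("details:first:name") yields the family
"details" and the qualifier "first:name".
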
> 3) With regards to querying: We've implemented basic query capabilities
> using the MetaModel query postprocessor. But not all queries are very
> efficient... In addition to, of course, full table scans, we have optimized
> support for COUNT queries and for table scans with maxRows.
>
> We could rather easily add optimized support for a couple of other typical
> queries:
>  * lookup record by ID
>  * paged table scans (both firstRow and maxRows)
>  * queries with simple filters/where items
>
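
To make the first two items concrete, here is a small stand-in sketch in
plain Java. It uses a TreeMap in place of an HBase table (rows sorted by
key), so it is not HBase client code, and SortedRowStore is a made-up name.
It assumes firstRow is 1-based. The point is only that a key lookup needs no
scan at all, and a paged scan can stop as soon as the page is full:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical stand-in for a table whose rows are sorted by row key.
class SortedRowStore {
    private final NavigableMap<String, String> rows = new TreeMap<>();

    void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    // Lookup record by ID: direct key access, no scan involved.
    String get(String rowKey) {
        return rows.get(rowKey);
    }

    // Paged scan: skip rows before firstRow (1-based, an assumption here),
    // then collect at most maxRows row keys and stop early.
    List<String> page(int firstRow, int maxRows) {
        List<String> result = new ArrayList<>();
        int position = 0;
        for (Map.Entry<String, String> entry : rows.entrySet()) {
            position++;
            if (position < firstRow) {
                continue;
            }
            if (result.size() >= maxRows) {
                break;
            }
            result.add(entry.getKey());
        }
        return result;
    }
}
```

With rows r1..r4 stored, page(2, 2) returns the keys r2 and r3 without
visiting r4.
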
> 4) With regards to dependencies: The module right now depends on the
> artifact called "hbase-client". This dependency has a lot of transitive
> dependencies so the size of the module is quite extreme. As an example, it
> includes stuff like jetty, jersey, jackson and of course hadoop... But I am
> wondering if we can have a thinner client side than that! If anyone knows
> whether we can, e.g., use the REST interface easily, that would maybe be
> better. I'm not an expert on HBase though, so please enlighten me!
>
> Kind regards,
> Kasper
