Re: [DISCUSS] State of the work-in-progress HBase branch

Kasper Sørensen Mon, 17 Feb 2014 09:35:24 -0800

Thanks for the input, Henry. With your experience, do you also happen to
know of a good thin client-side library? I imagine that we could maybe
use a REST client instead of the full client we currently use. That would
save us a ton of dependency overhead, I think. Or is it a non-issue in your
mind, since HBase users are used to this overhead?
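
To make the REST idea concrete, here is a minimal sketch of a row lookup
that needs nothing beyond the JDK (Java 11+). It only builds the request and
does not send it. The host, port, table name and row key are made up, and the
<base>/<table>/<rowKey> path layout is an assumption about the REST gateway
that should be checked against the HBase REST documentation.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Sketch only: builds (but does not send) a GET request for a single row.
class RestRowLookupSketch {

    // Percent-encode one path segment. URLEncoder emits '+' for spaces,
    // which is only valid in query strings, so convert it to %20.
    static String encodeSegment(String segment) {
        return URLEncoder.encode(segment, StandardCharsets.UTF_8).replace("+", "%20");
    }

    // ASSUMPTION: the REST gateway exposes rows at <base>/<table>/<rowKey>.
    static HttpRequest rowLookup(String baseUrl, String table, String rowKey) {
        URI uri = URI.create(baseUrl + "/" + encodeSegment(table) + "/" + encodeSegment(rowKey));
        return HttpRequest.newBuilder(uri)
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical host, table and row key, for illustration only.
        HttpRequest request = rowLookup("http://localhost:8080", "people", "row 1");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

A client along these lines would pull in no Hadoop, Jetty or Jersey jars, at
the cost of parsing the response payload by hand.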



2014-02-16 7:16 GMT+01:00 Henry Saputra <[email protected]>:

> For 1) I think adding read-only access to HBase should be OK, because most
> updates to HBase go through either the HBase client, REST via Stargate [1],
> or Thrift.
>
> For 2) In Apache Gora we use Avro to do the type mapping to columns and
> generate Java POJOs via the Avro compiler.
>
> For 3) This is the one I am kind of torn on. Apache Phoenix (incubating)
> tries to provide SQL on HBase [2] via extra indexing and caching. I think
> this defeats the purpose of having NoSQL databases, which serve a
> different purpose than relational databases.
>
> I am not sure MetaModel should touch NoSQL databases of the more
> column-oriented type. These databases are designed for large data, with
> access primarily via key and not via a query mechanism.
>
> Just my 2 cents.
>
>
> [1] http://wiki.apache.org/hadoop/Hbase/Stargate
> [2] http://phoenix.incubator.apache.org/
>
> On Fri, Jan 24, 2014 at 11:35 AM, Kasper Sørensen
> <[email protected]> wrote:
> > Hi everyone,
> >
> > I was looking at our "hbase-module" branch and as much as I like this
> > idea, I think we've been a bit too idle with the branch. Maybe we should
> > try to make something final e.g. for a version 4.1.
> >
> > So I thought I'd give an overview/status of the module's current
> > capabilities and its shortcomings. We should figure out if we think this
> > is good enough for a first version, or if we want to make some improvements
> > to the module before adding it to our portfolio of MetaModel modules.
> >
> > 1) The module only offers read-only/query access to HBase. That is in my
> > opinion OK for now; we have several such modules, and write support is
> > something we can better add later, once we have straightened out the
> > remaining topics in this mail.
> >
> > 2) With regards to metadata mapping: HBase is different because it has
> > both column families and, within column families, columns. For the sake of
> > our view on HBase I would describe column families simply as "a logical
> > grouping of columns". Column families are fixed within a table, but rows in
> > a table may contain arbitrary numbers of columns within each column family.
> > So... You can instantiate the HBaseDataContext in two ways:
> >
> > 2a) You can let MetaModel discover the metadata. This unfortunately has a
> > severe limitation. We discover the table names and column families using
> > the HBase API. But the actual columns and their contents cannot be provided
> > by the API. So instead we simply expose the column families with a MAP data
> > type. The trouble with this is that the keys and values of the maps will
> > simply be byte arrays... Usually not very useful! But it's sort of the
> > only thing (as far as I can see) that's "safe" in HBase, since HBase allows
> > anything (byte arrays) in its columns.
> >
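
To illustrate the limitation in 2a: a consumer of such a MAP column has to
guess an encoding before the data becomes readable. The sketch below assumes
that both qualifiers and values are UTF-8 strings, which HBase does not
guarantee. The class and method names are hypothetical and not part of
MetaModel or HBase.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: turns a raw column-family map into readable strings.
class FamilyMapDecoder {

    // ASSUMPTION: every qualifier and every value is UTF-8 encoded text.
    // Binary values (e.g. serialized longs) would come out as garbage here.
    static Map<String, String> decodeUtf8(Map<byte[], byte[]> raw) {
        Map<String, String> decoded = new LinkedHashMap<>();
        for (Map.Entry<byte[], byte[]> entry : raw.entrySet()) {
            String qualifier = new String(entry.getKey(), StandardCharsets.UTF_8);
            String value = new String(entry.getValue(), StandardCharsets.UTF_8);
            decoded.put(qualifier, value);
        }
        return decoded;
    }

    public static void main(String[] args) {
        Map<byte[], byte[]> raw = new LinkedHashMap<>();
        raw.put("name".getBytes(StandardCharsets.UTF_8), "Alice".getBytes(StandardCharsets.UTF_8));
        System.out.println(decodeUtf8(raw));
    }
}
```
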
> > 2b) Like in e.g. the MongoDB or CouchDB modules, you can provide an array
> > of tables (SimpleTableDef). That way the user defines the metadata himself
> > and the implementation assumes that it is correct (or else it will break).
> > The good thing about this is that the user can define the proper data types
> > etc. for columns. The user defines the column family and column name by
> > setting the MetaModel column name like this: "family:name"
> > (consistent with most HBase tools and API calls).
> >
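
The "family:name" convention in 2b could be handled along these lines. This
is a minimal sketch, not the code in the branch: the QualifiedColumn class is
hypothetical, and splitting on the first colon rests on the assumption that
family names never contain a colon while qualifiers may.

```java
import java.nio.charset.StandardCharsets;

// Sketch only: splits a MetaModel column name of the form "family:name".
class QualifiedColumn {
    final String family;
    final String qualifier;

    private QualifiedColumn(String family, String qualifier) {
        this.family = family;
        this.qualifier = qualifier;
    }

    // ASSUMPTION: the first ':' separates family from qualifier, so a
    // qualifier may itself contain further colons.
    static QualifiedColumn parse(String columnName) {
        int separator = columnName.indexOf(':');
        if (separator <= 0) {
            throw new IllegalArgumentException("Expected \"family:name\" but got: " + columnName);
        }
        return new QualifiedColumn(columnName.substring(0, separator),
                columnName.substring(separator + 1));
    }

    // HBase works with byte arrays, so both parts are converted before use.
    byte[] familyBytes() {
        return family.getBytes(StandardCharsets.UTF_8);
    }

    byte[] qualifierBytes() {
        return qualifier.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        QualifiedColumn column = parse("details:first_name");
        System.out.println(column.family + " / " + column.qualifier);
    }
}
```
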
> > 3) With regards to querying: We've implemented basic query capabilities
> > using the MetaModel query postprocessor. But not all queries are very
> > efficient... In addition to (of course) full table scans, we have optimized
> > support for COUNT queries and for table scans with maxRows.
> >
> > We could rather easily add optimized support for a couple of other typical
> > queries:
> >  * lookup record by ID
> >  * paged table scans (both firstRow and maxRows)
> >  * queries with simple filters/where items
> >
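
As a rough illustration of the paged scan case, the sketch below applies
firstRow and maxRows to any row iterator and stops reading as soon as the
page is full. It assumes firstRow is 1-based, and it skips rows on the client
side; a properly optimized version would presumably push the start position
down to HBase instead. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Sketch only: client-side paging over a scan result.
class PagedScanSketch {

    // ASSUMPTION: firstRow is 1-based, so firstRow = 1 means "skip nothing".
    static <T> List<T> page(Iterator<T> rows, int firstRow, int maxRows) {
        if (firstRow < 1 || maxRows < 0) {
            throw new IllegalArgumentException("firstRow must be >= 1 and maxRows >= 0");
        }
        // Skip the rows that come before the requested page.
        for (int position = 1; position < firstRow && rows.hasNext(); position++) {
            rows.next();
        }
        // Collect at most maxRows rows, then stop consuming the iterator.
        List<T> result = new ArrayList<>();
        while (result.size() < maxRows && rows.hasNext()) {
            result.add(rows.next());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> rows = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        System.out.println(page(rows.iterator(), 3, 4)); // prints [3, 4, 5, 6]
    }
}
```
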
> > 4) With regards to dependencies: The module right now depends on the
> > artifact called "hbase-client". This dependency has a lot of transitive
> > dependencies, so the size of the module is quite extreme. As an example, it
> > includes stuff like jetty, jersey, jackson and of course hadoop... But I am
> > wondering if we can have a thinner client side than that! If anyone knows
> > whether we can e.g. use the REST interface easily, that would maybe be
> > better. I'm not an expert on HBase though, so please enlighten me!
> >
> > Kind regards,
> > Kasper
>
