Hi Henry,

Yeah, the Phoenix project is definitely an interesting approach to making
MetaModel capable of working with HBase. The only downside, to me, is that
they seem to do a lot of intrusive things to HBase, like creating new index
tables etc. I would normally not "allow" that for a simple connector.

Maybe we should simply support both styles. In the case of Phoenix, I guess
we could go through MetaModel's JDBC module and connect via their JDBC
driver. Do you know if that would be a viable route?

- Kasper


2014-02-24 6:37 GMT+01:00 Henry Saputra <[email protected]>:

> We could use the HBase client library from the store, I suppose.
> What I am actually worried about is that adding real query support
> for a column-based datastore is kind of a big task.
> Apache Phoenix tried to do that, so maybe we could leverage its SQL
> planner layer to implement query execution on top of the HBase
> layer?
>
> - Henry
>
>
> On Mon, Feb 17, 2014 at 9:33 AM, Kasper Sørensen
> <[email protected]> wrote:
> > Thanks for the input, Henry. With your experience, do you also happen
> > to know of a good thin client-side library? I imagine that we could maybe
> > use a REST client instead of the full client we currently use. That would
> > save us a ton of dependency overhead, I think. Or is it a non-issue in
> > your mind, since HBase users are used to this overhead?
> >
> >
> > 2014-02-16 7:16 GMT+01:00 Henry Saputra <[email protected]>:
> >
> >> For 1 > I think read-only access to HBase should be OK, because most
> >> updates to HBase go either through the HBase client, through REST via
> >> Stargate [1], or through Thrift.
> >>
> >> For 2 > In Apache Gora we use Avro to map types to columns and
> >> generate Java POJOs via the Avro compiler.
> >>
> >> For 3 > This is the one I am kind of torn on. Apache Phoenix (incubating)
> >> tries to provide SQL on HBase [2] via extra indexing and caching. I think
> >> this defeats the purpose of having NoSQL databases, which serve a
> >> different purpose than relational databases.
> >>
> >> I am not sure MetaModel should touch NoSQL databases that are more like
> >> column stores. These databases are designed for large data, with access
> >> primarily via key rather than via a query mechanism.
> >>
> >> Just my 2 cents
> >>
> >>
> >> [1] http://wiki.apache.org/hadoop/Hbase/Stargate
> >> [2] http://phoenix.incubator.apache.org/
> >>
> >> On Fri, Jan 24, 2014 at 11:35 AM, Kasper Sørensen
> >> <[email protected]> wrote:
> >> > Hi everyone,
> >> >
> >> > I was looking at our "hbase-module" branch and as much as I like this
> >> > idea, I think we've been a bit too idle with the branch. Maybe we
> >> > should try to make something final, e.g. for a version 4.1.
> >> >
> >> > So I thought I'd give an overview of the module's current status: its
> >> > capabilities and its shortcomings. We should figure out whether we
> >> > think this is good enough for a first version, or whether we want to
> >> > make some improvements to the module before adding it to our
> >> > portfolio of MetaModel modules.
> >> >
> >> > 1) The module only offers read-only/query access to HBase. In my
> >> > opinion that is OK for now; we have several such modules, and write
> >> > support is something we can better add later, once we straighten out
> >> > the remaining topics in this mail.
> >> >
> >> > 2) With regards to metadata mapping: HBase is different because it has
> >> > column families, and within column families there are columns. For the
> >> > sake of our view on HBase, I would describe a column family simply as
> >> > "a logical group of columns". Column families are fixed within a table,
> >> > but rows in a table may contain arbitrary numbers of columns within
> >> > each column family. So you can instantiate the HBaseDataContext in
> >> > two ways:
> >> >
> >> > 2a) You can let MetaModel discover the metadata. Unfortunately, this
> >> > has a severe limitation. We discover the table names and column
> >> > families using the HBase API, but the API cannot provide the actual
> >> > columns and their contents. So instead we simply expose the column
> >> > families with a MAP data type. The trouble with this is that the keys
> >> > and values of the maps will simply be byte arrays, which is usually
> >> > not very useful! But as far as I can see it's the only thing that's
> >> > "safe" in HBase, since HBase allows anything (byte arrays) in its
> >> > columns.
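To make the limitation of 2a concrete, here is a minimal Java sketch (plain JDK 9+, no HBase dependency) of what such a MAP-typed value looks like. It mirrors the NavigableMap<byte[], byte[]> that the HBase client's Result.getFamilyMap(...) returns, ordered the way HBase orders qualifiers (unsigned lexicographic bytes). FamilyMapSketch and its methods are hypothetical names, not the module's actual code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical stand-in for what option 2a exposes: one column family of a row
// as a MAP whose keys (qualifiers) and values are raw byte arrays.
class FamilyMapSketch {

    // HBase sorts qualifiers as unsigned lexicographic byte strings. A plain
    // HashMap<byte[], byte[]> would not work, since arrays use identity equality.
    static NavigableMap<byte[], byte[]> newFamilyMap() {
        Comparator<byte[]> unsignedOrder = Arrays::compareUnsigned;
        return new TreeMap<>(unsignedOrder);
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        NavigableMap<byte[], byte[]> family = newFamilyMap();
        family.put(utf8("name"), utf8("Kasper"));
        // 8 bytes: a long? a double? text? We cannot tell without metadata.
        family.put(utf8("age"), new byte[] {0, 0, 0, 0, 0, 0, 0, 42});
        for (Map.Entry<byte[], byte[]> e : family.entrySet()) {
            // Qualifiers happen to be UTF-8 here; values can only be shown as raw bytes.
            System.out.println(new String(e.getKey(), StandardCharsets.UTF_8)
                    + " -> " + Arrays.toString(e.getValue()));
        }
    }
}
```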
> >> >
> >> > 2b) As in e.g. the MongoDB or CouchDB modules, you can provide an array
> >> > of table definitions (SimpleTableDef). That way the user defines the
> >> > metadata himself, and the implementation assumes that it is correct (or
> >> > else it will break). The good thing about this is that the user can
> >> > define the proper data types etc. for columns. The user specifies the
> >> > column family and column name by setting the MetaModel column name to
> >> > "family:name" (consistent with most HBase tools and API calls).
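For 2b, the user-supplied metadata is what lets the connector turn those bytes into proper values. Below is a minimal sketch under two assumptions: column names follow the "family:name" convention described above, and values were written with HBase Bytes.toBytes-style encodings (big-endian numbers, UTF-8 strings). TypedColumnSketch and ColumnType are hypothetical names, not the module's API. Splitting at the first ':' works because HBase family names may not contain ':' while qualifiers may.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not MetaModel's actual code: applying a user-defined
// "family:name" column and its declared type to raw HBase cell bytes.
class TypedColumnSketch {

    enum ColumnType { STRING, INTEGER, BIGINT }

    // Splits "family:qualifier" at the first ':'; the qualifier may itself contain ':'.
    static String[] splitColumnName(String columnName) {
        int idx = columnName.indexOf(':');
        if (idx <= 0) {
            throw new IllegalArgumentException("Expected \"family:name\", got: " + columnName);
        }
        return new String[] {columnName.substring(0, idx), columnName.substring(idx + 1)};
    }

    private static void requireLength(byte[] value, int expected) {
        if (value.length != expected) {
            throw new IllegalArgumentException(
                    "Expected " + expected + " bytes, got " + value.length);
        }
    }

    // Decodes raw bytes according to the type declared in the table definition.
    // Assumes big-endian numbers and UTF-8 strings (HBase Bytes.toBytes style).
    static Object decode(byte[] value, ColumnType type) {
        switch (type) {
            case STRING:
                return new String(value, StandardCharsets.UTF_8);
            case INTEGER:
                requireLength(value, 4);
                return ByteBuffer.wrap(value).getInt();
            case BIGINT:
                requireLength(value, 8);
                return ByteBuffer.wrap(value).getLong();
            default:
                throw new IllegalArgumentException("Unsupported type: " + type);
        }
    }

    public static void main(String[] args) {
        String[] parts = splitColumnName("person:age");
        byte[] raw = ByteBuffer.allocate(8).putLong(42L).array();
        // prints "person / age = 42"
        System.out.println(parts[0] + " / " + parts[1] + " = " + decode(raw, ColumnType.BIGINT));
    }
}
```

The same 8 bytes that 2a can only show as a byte array become the number 42 once the user has declared the column as a BIGINT.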
> >> >
> >> > 3) With regards to querying: We've implemented basic query capabilities
> >> > using the MetaModel query postprocessor, but not all queries are very
> >> > efficient. Besides plain full table scans, we have optimized support
> >> > for COUNT queries and for table scans with maxRows.
> >> >
> >> > We could rather easily add optimized support for a couple of other
> >> > typical queries:
> >> >  * lookup record by ID
> >> >  * paged table scans (both firstRow and maxRows)
> >> >  * queries with simple filters/where items
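The three optimizations listed above all rely on HBase keeping rows sorted by row key. The following sketch uses a TreeMap as a hypothetical in-memory stand-in for an HBase table to show the access patterns: a point lookup instead of a scan, a paged scan that stops as soon as the page is full, and a filtered scan that also stops early. In real HBase these would presumably map to a Get, a Scan that is closed early, and a server-side filter such as SingleColumnValueFilter. firstRow is treated as 1-based, which I assume matches MetaModel's Query.setFirstRow; ScanSketch and its methods are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.Predicate;

// Hypothetical in-memory stand-in for an HBase table: row key -> ("family:name" -> value),
// with rows kept sorted by row key as HBase does.
class ScanSketch {

    // Lookup by ID: a single point read (an HBase Get) instead of a full table scan.
    static Map<String, String> lookupById(NavigableMap<String, Map<String, String>> table, String rowKey) {
        return table.get(rowKey);
    }

    // Paged scan: skip rows before firstRow (1-based), stop once maxRows rows are collected.
    static List<String> pagedScan(NavigableMap<String, Map<String, String>> table, int firstRow, int maxRows) {
        List<String> page = new ArrayList<>();
        int position = 0;
        for (String rowKey : table.keySet()) {
            if (page.size() >= maxRows) {
                break; // never read further than the page needs
            }
            position++;
            if (position >= firstRow) {
                page.add(rowKey);
            }
        }
        return page;
    }

    // Simple where item: evaluate the condition per row and stop early at maxRows.
    static List<String> filteredScan(NavigableMap<String, Map<String, String>> table,
                                     String column, Predicate<String> condition, int maxRows) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : table.entrySet()) {
            if (matches.size() >= maxRows) {
                break;
            }
            String value = row.getValue().get(column);
            if (value != null && condition.test(value)) {
                matches.add(row.getKey());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        NavigableMap<String, Map<String, String>> table = new TreeMap<>();
        table.put("row3", Map.of("info:city", "Aarhus"));
        table.put("row1", Map.of("info:city", "Copenhagen"));
        table.put("row2", Map.of("info:city", "Aarhus"));
        System.out.println(lookupById(table, "row2"));          // {info:city=Aarhus}
        System.out.println(pagedScan(table, 2, 1));             // [row2]
        System.out.println(filteredScan(table, "info:city", "Aarhus"::equals, 10)); // [row2, row3]
    }
}
```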
> >> >
> >> > 4) With regards to dependencies: The module currently depends on the
> >> > artifact called "hbase-client". This dependency has a lot of transitive
> >> > dependencies, so the size of the module is quite extreme. For example,
> >> > it includes stuff like Jetty, Jersey, Jackson and of course Hadoop.
> >> > I am wondering if we can have a thinner client side than that! If
> >> > anyone knows whether we can e.g. easily use the REST interface, that
> >> > might be better. I'm not an expert on HBase though, so please
> >> > enlighten me!
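Regarding the REST question: a thin client could issue plain HTTP requests against the HBase REST gateway (Stargate), which addresses a single cell as /<table>/<row>/<family>:<qualifier>. The sketch below only builds such a request with the JDK's java.net.http API (Java 11+) and does not send it. The base URL, port 8080 and the class name are assumptions for illustration, and row keys are treated as UTF-8 strings rather than arbitrary bytes.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical thin client for a single-cell read via the HBase REST gateway.
// Only the JDK is needed; no hbase-client dependency.
class RestCellRequestSketch {

    // Percent-encodes one path segment. URLEncoder does form encoding, so '+' becomes %20.
    static String encodeSegment(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8).replace("+", "%20");
    }

    // Builds GET <baseUrl>/<table>/<row>/<family>:<qualifier>. Asking for
    // application/octet-stream should make the gateway return the raw cell bytes.
    static HttpRequest cellRequest(String baseUrl, String table, String row,
                                   String family, String qualifier) {
        String path = "/" + encodeSegment(table)
                + "/" + encodeSegment(row)
                + "/" + encodeSegment(family) + ":" + encodeSegment(qualifier);
        return HttpRequest.newBuilder(URI.create(baseUrl + path))
                .header("Accept", "application/octet-stream")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = cellRequest("http://localhost:8080", "users", "row 1", "info", "name");
        // prints "GET http://localhost:8080/users/row%201/info:name"
        System.out.println(request.method() + " " + request.uri());
        // Sending it would look like:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofByteArray())
    }
}
```

Whether the REST gateway covers everything the module needs (scans, metadata discovery) would still have to be checked; this only shows that the client side can be tiny.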
> >> >
> >> > Kind regards,
> >> > Kasper
> >>
>
