Thanks for the input, Henry. With your experience, do you happen to know of a good thin client-side library? I imagine we could use a REST client instead of the full client we currently use, which I think would save us a ton of dependency overhead. Or is it a non-issue in your mind, since HBase users are used to this overhead?
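To make the REST idea a bit more concrete, here is a rough sketch of what a single-cell lookup over the HBase REST gateway might look like using only JDK classes (Java 11+). It is not tested against a real cluster. The class name `StargateSketch`, the table and row names, and the base URL `http://localhost:8080` are placeholders I made up; the path layout `/<table>/<row>/<family:qualifier>` and the Base64-encoded values in the JSON response are how I understand the REST gateway to behave, so treat them as assumptions to verify.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for illustration; not part of MetaModel or HBase.
class StargateSketch {

    // Percent-encodes one path segment. URLEncoder emits '+' for spaces,
    // which is only meaningful in query strings, so it is rewritten to %20.
    static String segment(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8).replace("+", "%20");
    }

    // Builds the URI for one cell: /<table>/<row>/<family:qualifier> (assumed layout).
    static URI rowUri(String baseUrl, String table, String rowKey, String column) {
        return URI.create(baseUrl + "/" + segment(table) + "/" + segment(rowKey)
                + "/" + segment(column));
    }

    // Builds (but does not send) a GET request that asks for a JSON response.
    static HttpRequest rowRequest(URI uri) {
        return HttpRequest.newBuilder(uri)
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    // Decodes a Base64 cell value, assuming the stored bytes are UTF-8 text.
    static String decodeCell(String base64Value) {
        byte[] raw = Base64.getDecoder().decode(base64Value);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        URI uri = rowUri("http://localhost:8080", "users", "row 1", "info:name");
        System.out.println(rowRequest(uri).method() + " " + uri);
        System.out.println(decodeCell("aGVsbG8="));
    }
}
```

Sending the request would only need `java.net.http.HttpClient` plus a JSON parser, which is a far smaller footprint than pulling in hbase-client, at the cost of doing the byte decoding ourselves.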
2014-02-16 7:16 GMT+01:00 Henry Saputra <[email protected]>:

> For 1
> I think adding read only to HBase should be OK because most updates to
> HBase go either through the HBase client, REST via Stargate [1], or
> Thrift.
>
> For 2
> In Apache Gora we use Avro to do type mapping to columns and generate
> Java POJOs via the Avro compiler.
>
> For 3
> This is the one I am kinda torn on. Apache Phoenix (incubating) tries to
> provide SQL on HBase [2] via extra indexing and caching. I think this
> defeats the purpose of having NoSQL databases that serve a different
> purpose than relational databases.
>
> I am not sure MetaModel should touch NoSQL databases which are more like
> column types. These databases are designed for large data with access
> primarily via key and not via a query mechanism.
>
> Just my 2 cents
>
> [1] http://wiki.apache.org/hadoop/Hbase/Stargate
> [2] http://phoenix.incubator.apache.org/
>
> On Fri, Jan 24, 2014 at 11:35 AM, Kasper Sørensen
> <[email protected]> wrote:
> > Hi everyone,
> >
> > I was looking at our "hbase-module" branch and as much as I like this
> > idea, I think we've been a bit too idle with the branch. Maybe we should
> > try to make something final, e.g. for a version 4.1.
> >
> > So I thought to give an overview/status of the module's current
> > capabilities and its shortcomings. We should figure out if we think this
> > is good enough for a first version, or if we want to do some improvements
> > to the module before adding it to our portfolio of MetaModel modules.
> >
> > 1) The module only offers read-only/query access to HBase. That is in my
> > opinion OK for now; we have several such modules, and this is something
> > we can better add later if we straighten out the remaining topics in
> > this mail.
> >
> > 2) With regards to metadata mapping: HBase is different because it has
> > both column families and, within column families, columns.
> > For the sake of our view on HBase I would describe column families
> > simply as "a logical group of columns". Column families are fixed within
> > a table, but rows in a table may contain arbitrary numbers of columns
> > within each column family. So... You can instantiate the
> > HBaseDataContext in two ways:
> >
> > 2a) You can let MetaModel discover the metadata. This unfortunately has a
> > severe limitation. We discover the table names and column families using
> > the HBase API. But the actual columns and their contents cannot be
> > provided by the API. So instead we simply expose the column families
> > with a MAP data type. The trouble with this is that the keys and values
> > of the maps will simply be byte arrays... Usually not very useful! But
> > it's sort of the only thing (as far as I can see) that's "safe" in
> > HBase, since HBase allows anything (byte arrays) in its columns.
> >
> > 2b) Like in e.g. the MongoDb or CouchDb modules you can provide an array
> > of tables (SimpleTableDef). That way the user defines the metadata
> > himself and the implementation assumes that it is correct (or else it
> > will break). The good thing about this is that the user can define the
> > proper data types etc. for columns. The user defines the column family
> > and column name by defining the MetaModel column name like this:
> > "family:name" (consistent with most HBase tools and API calls).
> >
> > 3) With regards to querying: We've implemented basic query capabilities
> > using the MetaModel query postprocessor. But not all queries are very
> > efficient... In addition to, of course, full table scans, we have
> > optimized support for COUNT queries and for table scans with maxRows.
> >
> > We could rather easily add optimized support for a couple of other
> > typical queries:
> > * lookup record by ID
> > * paged table scans (both firstRow and maxRows)
> > * queries with simple filters/where items
> >
> > 4) With regards to dependencies: The module right now depends on the
> > artifact called "hbase-client". This dependency has a lot of transitive
> > dependencies, so the size of the module is quite extreme. As an example,
> > it includes stuff like jetty, jersey, jackson and of course hadoop... But
> > I am wondering if we can have a thinner client side than that! If anyone
> > knows whether we can e.g. use the REST interface easily, that would
> > maybe be better. I'm not an expert on HBase though, so please enlighten
> > me!
> >
> > Kind regards,
> > Kasper
>
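
As a small illustration of the "family:name" convention from point 2b and the byte-array problem from point 2a, here is a sketch of how a user-declared column name could be split into family and qualifier, and how raw cell bytes could be decoded once the type is known. The class `QualifiedColumn` and its methods are hypothetical and are not what the hbase-module branch actually contains. The decoding also assumes values were written as UTF-8 strings or 8-byte big-endian longs, which only holds if the writer used that encoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not MetaModel's actual implementation.
class QualifiedColumn {
    final String family;
    final String qualifier; // null when only a column family was given

    QualifiedColumn(String family, String qualifier) {
        this.family = family;
        this.qualifier = qualifier;
    }

    // Splits "family:name" at the first colon; a name without a colon
    // is treated as a bare column family.
    static QualifiedColumn parse(String columnName) {
        int idx = columnName.indexOf(':');
        if (idx < 0) {
            return new QualifiedColumn(columnName, null);
        }
        return new QualifiedColumn(columnName.substring(0, idx),
                columnName.substring(idx + 1));
    }

    // Decodes a cell as text, assuming it was stored as UTF-8.
    static String asString(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Decodes a cell as a long, assuming 8 bytes in big-endian order
    // (ByteBuffer's default byte order).
    static long asLong(byte[] raw) {
        return ByteBuffer.wrap(raw).getLong();
    }
}
```

With something like this, `QualifiedColumn.parse("info:name")` gives the family and qualifier to request, and the data type declared in the SimpleTableDef decides which decoder to apply. Without declared types (case 2a) there is no safe choice of decoder, which is why the MAP of byte arrays is the fallback.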
