Yep. Or in slightly more technical terms: it means that the HBaseDataContext only implements DataContext, which has these two significant methods:

* getSchemas()
* executeQuery(...)

(Plus a bunch more methods, but those two give you the general impression: explore metadata and fire queries / reads.)

But not UpdateableDataContext, which has the write operations:

* executeUpdate(...)

Regards,
Kasper

2014-03-24 22:37 GMT+01:00 Henry Saputra <[email protected]>:

> Hmm, what does it mean by read only? You can use it to read data from
> HBase?
>
> - Henry
>
> On Mon, Mar 24, 2014 at 2:34 PM, Kasper Sørensen
> <[email protected]> wrote:
> > A quick update on this since the module has now been merged into the
> > master branch:
> >
> > 1) Module is still read-only. This is accepted for now (unless someone
> > wants to help change it of course).
> >
> > 2) Metadata mapping is still working in two modes: a) we discover the
> > column families and expose them as byte-array maps (not very useful, but
> > works as a "lowest common denominator") and b) the user provides a set of
> > SimpleTableDef (which now has a convenient parser btw. :)) and gets his
> > table mapping as he wants it.
> >
> > 3) Querying now has special support for lookup-by-id type queries, where
> > we will use HBase Get instead of Scan. We also have good support for
> > LIMIT/"maxRows", but not OFFSET/"firstRow" (in those cases we will scan
> > past the first records on the client side).
> >
> > 4) Dependencies still seem to be a pain. HBase and Hadoop come in many
> > flavours and not all are compatible. I doubt there's a lot we can do
> > about it, except ask the users to provide their own HBase dependency as
> > per their backend version. We should probably thus make all our
> > HBase/Hadoop dependencies <optional>true</optional> in order to not
> > influence the typical clients.
> >
> > Kasper
> >
> >
> > 2014-02-24 17:08 GMT+01:00 Kasper Sørensen <[email protected]>:
> >
> >> Hi Henry,
> >>
> >> Yea, the Phoenix project is definitely an interesting approach to
> >> making MM capable of working with HBase.
> >> The only downside to me is that it seems they do a lot of intrusive
> >> stuff to HBase, like creating new index tables etc... I would normally
> >> not "allow" that for a simple connector.
> >>
> >> Maybe we should simply support both styles. And in the case of Phoenix,
> >> I guess we could simply go through the JDBC module of MetaModel and
> >> connect via their JDBC driver... Is that maybe a route, do you know?
> >>
> >> - Kasper
> >>
> >>
> >> 2014-02-24 6:37 GMT+01:00 Henry Saputra <[email protected]>:
> >>
> >>> We could use the HBase client library from the store, I suppose.
> >>> The issue I am actually worried about is that adding real query
> >>> support for a column-based datastore is kind of a big task.
> >>> Apache Phoenix tried to do that, so maybe we could leverage the SQL
> >>> planner layer to provide the implementation of the query execution
> >>> for the HBase layer?
> >>>
> >>> - Henry
> >>>
> >>>
> >>> On Mon, Feb 17, 2014 at 9:33 AM, Kasper Sørensen
> >>> <[email protected]> wrote:
> >>> > Thanks for the input Henry. With your experience, do you then also
> >>> > happen to know of a good thin client-side library? I imagine that
> >>> > we could maybe use a REST client instead of the full client we
> >>> > currently use. That would save us a ton of dependency overhead, I
> >>> > think. Or is it a non-issue in your mind, since HBase users are
> >>> > used to this overhead?
> >>> >
> >>> >
> >>> > 2014-02-16 7:16 GMT+01:00 Henry Saputra <[email protected]>:
> >>> >
> >>> >> For 1 > I think adding read-only access to HBase should be ok
> >>> >> because most updates to HBase go either through the HBase client
> >>> >> or REST via Stargate [1] or Thrift.
> >>> >>
> >>> >> For 2 > In Apache Gora we use Avro to do type mapping to columns
> >>> >> and generate POJO Java via the Avro compiler.
> >>> >>
> >>> >> For 3 > This is the one I am kinda torn on. Apache Phoenix
> >>> >> (incubating) tries to provide SQL on HBase [2] via extra indexing
> >>> >> and caching.
> >>> >> I think this defeats the purpose of having NoSQL databases that
> >>> >> serve a different purpose than relational databases.
> >>> >>
> >>> >> I am not sure MetaModel should touch the NoSQL databases which are
> >>> >> more column-oriented. These databases are designed for large data
> >>> >> with access primarily via key and not a query mechanism.
> >>> >>
> >>> >> Just my 2 cents
> >>> >>
> >>> >>
> >>> >> [1] http://wiki.apache.org/hadoop/Hbase/Stargate
> >>> >> [2] http://phoenix.incubator.apache.org/
> >>> >>
> >>> >> On Fri, Jan 24, 2014 at 11:35 AM, Kasper Sørensen
> >>> >> <[email protected]> wrote:
> >>> >> > Hi everyone,
> >>> >> >
> >>> >> > I was looking at our "hbase-module" branch and as much as I like
> >>> >> > this idea, I think we've been a bit too idle with the branch.
> >>> >> > Maybe we should try to make something final, e.g. for a version
> >>> >> > 4.1.
> >>> >> >
> >>> >> > So I thought to give an overview/status of the module's current
> >>> >> > capabilities and its shortcomings. We should figure out if we
> >>> >> > think this is good enough for a first version, or if we want to
> >>> >> > do some improvements to the module before adding it to our
> >>> >> > portfolio of MetaModel modules.
> >>> >> >
> >>> >> > 1) The module only offers read-only/query access to HBase. That
> >>> >> > is in my opinion OK for now; we have several such modules, and
> >>> >> > this is something we can better add later if we straighten out
> >>> >> > the remaining topics in this mail.
> >>> >> >
> >>> >> > 2) With regards to metadata mapping: HBase is different because
> >>> >> > it has both column families and, within column families,
> >>> >> > columns. For the sake of our view on HBase I would describe
> >>> >> > column families simply as "a logical grouping of columns".
> >>> >> > Column families are fixed within a table, but rows in a table
> >>> >> > may contain arbitrary numbers of columns within each column
> >>> >> > family. So... You can instantiate the HBaseDataContext in two
> >>> >> > ways:
> >>> >> >
> >>> >> > 2a) You can let MetaModel discover the metadata. This
> >>> >> > unfortunately has a severe limitation. We discover the table
> >>> >> > names and column families using the HBase API. But the actual
> >>> >> > columns and their contents cannot be provided by the API. So
> >>> >> > instead we simply expose the column families with a MAP data
> >>> >> > type. The trouble with this is that the keys and values of the
> >>> >> > maps will simply be byte arrays ... Usually not very useful!
> >>> >> > But it's sort of the only thing (as far as I can see) that's
> >>> >> > "safe" in HBase, since HBase allows anything (byte arrays) in
> >>> >> > its columns.
> >>> >> >
> >>> >> > 2b) Like in e.g. the MongoDB or CouchDB modules, you can
> >>> >> > provide an array of tables (SimpleTableDef). That way the user
> >>> >> > defines the metadata himself and the implementation assumes
> >>> >> > that it is correct (or else it will break). The good thing
> >>> >> > about this is that the user can define the proper data types
> >>> >> > etc. for columns. The user defines the column family and column
> >>> >> > name by setting the MetaModel column name like this:
> >>> >> > "family:name" (consistent with most HBase tools and API calls).
> >>> >> >
> >>> >> > 3) With regards to querying: We've implemented basic query
> >>> >> > capabilities using the MetaModel query postprocessor. But not
> >>> >> > all queries are very effective... In addition to of course full
> >>> >> > table scans, we have optimized support of COUNT queries and of
> >>> >> > table scans with maxRows.
> >>> >> >
> >>> >> > We could rather easily add optimized support for a couple of
> >>> >> > other typical queries:
> >>> >> > * lookup record by ID
> >>> >> > * paged table scans (both firstRow and maxRows)
> >>> >> > * queries with simple filters/where items
> >>> >> >
> >>> >> > 4) With regards to dependencies: The module right now depends
> >>> >> > on the artifact called "hbase-client". This dependency has a
> >>> >> > lot of transitive dependencies, so the size of the module is
> >>> >> > quite extreme. As an example, it includes stuff like Jetty,
> >>> >> > Jersey, Jackson and of course Hadoop... But I am wondering if
> >>> >> > we can have a thinner client side than that! If anyone knows if
> >>> >> > e.g. we can use the REST interface easily or so, that would
> >>> >> > maybe be better. I'm not an expert on HBase though, so please
> >>> >> > enlighten me!
> >>> >> >
> >>> >> > Kind regards,
> >>> >> > Kasper
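[Editor's note] The read-only distinction discussed at the top of the thread can be sketched with simplified stand-in interfaces. These are hypothetical signatures for illustration, not the actual org.apache.metamodel API (which returns Schema and DataSet objects rather than lists of strings):

```java
import java.util.Arrays;
import java.util.List;

// Simplified stand-ins for MetaModel's interface split.
interface DataContext {
    List<String> getSchemas();            // explore metadata
    List<String> executeQuery(String q);  // fire queries / reads
}

interface UpdateableDataContext extends DataContext {
    void executeUpdate(String update);    // write operations
}

// A read-only backend, like the HBase module, implements only DataContext,
// so callers have no way to reach a write path through it.
class ReadOnlyHBaseLikeContext implements DataContext {
    public List<String> getSchemas() {
        return Arrays.asList("HBase");
    }
    public List<String> executeQuery(String q) {
        return Arrays.asList("row-1", "row-2"); // placeholder result
    }
}
```

A caller can then discover at runtime that writes are unsupported by checking `instanceof UpdateableDataContext`, which is exactly what "read-only" means in this thread.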
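[Editor's note] The "family:name" column naming convention from point 2b can be illustrated with a small stdlib-only helper. This is a hypothetical sketch; the module does its own parsing internally:

```java
// Split a MetaModel-style HBase column name ("family:qualifier") into its
// column family and qualifier parts. Hypothetical helper for illustration.
class HBaseColumnName {
    static String[] split(String columnName) {
        int idx = columnName.indexOf(':');
        if (idx < 0) {
            // No family prefix: treat the whole name as the qualifier.
            return new String[] { "", columnName };
        }
        return new String[] { columnName.substring(0, idx),
                              columnName.substring(idx + 1) };
    }
}
```

Splitting on the first ':' matches the convention of most HBase tools, since qualifiers themselves may contain further colons.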
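[Editor's note] Point 3 of the status update (HBase Get for lookup-by-id, Scan otherwise, and client-side skipping when firstRow/OFFSET is requested) amounts to a small planning decision. The following is a sketch with made-up names, not the module's actual code:

```java
// Decide the HBase access path for a simple query, mirroring the
// optimizations described in the thread (hypothetical helper).
class AccessPathPlanner {
    static String choose(String whereColumn, String operator,
                         Integer firstRow, Integer maxRows) {
        if ("_id".equals(whereColumn) && "=".equals(operator)) {
            return "GET";            // point lookup by row key
        }
        if (firstRow != null) {
            // HBase Scan has no offset concept; the first rows must be
            // scanned past and discarded on the client side.
            return "SCAN+CLIENT_SKIP";
        }
        if (maxRows != null) {
            return "SCAN+LIMIT";     // the scan can stop after maxRows rows
        }
        return "SCAN";               // full table scan
    }
}
```

This also shows why LIMIT is cheap while OFFSET is not: a scan can simply stop early, but it cannot start "in the middle" without reading past the skipped rows.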
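[Editor's note] The suggestion in point 4 of the status update, marking the HBase/Hadoop dependencies as optional so they do not leak into typical clients, would look roughly like this in the module's pom.xml. The version shown is illustrative of that era, not a recommendation:

```xml
<!-- Sketch of an <optional> dependency as suggested in the thread.
     Clients must then declare their own hbase-client matching their
     cluster's HBase/Hadoop flavour. Version number is illustrative. -->
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-client</artifactId>
  <version>0.98.0-hadoop2</version>
  <optional>true</optional>
</dependency>
```

With `<optional>true</optional>`, Maven excludes the dependency from transitive resolution, so downstream projects only pull in HBase if they ask for it explicitly.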
