Hmm, I was kind of hoping we wouldn't have to... But that's just because I'm lazy and prefer "live" (editable online) documentation where possible; that way you can easily react if someone points out missing parts. I think either way is doable, but you're right that if we use wiki pages, each page should clearly state which versions it applies to, if it is version-specific.

2014-03-24 23:03 GMT+01:00 Henry Saputra <[email protected]>:

> Hmm seems like we need to bundle the doc for each release. For
> example, 4.0.0 does not have the HBase store.
>
> Most projects have docs for each release on top of the project homepage,
> like Zookeeper http://zookeeper.apache.org/doc/r3.4.6/ or Spark
> http://spark.apache.org/docs/0.9.0/
>
> Thoughts?
>
> - Henry
>
> On Mon, Mar 24, 2014 at 2:50 PM, Kasper Sørensen
> <[email protected]> wrote:
> > Hmm I suppose a wiki page would be good. I guess we have wiki pages for
> > some of the DataContext implementations already like Salesforce [1], POJO
> > [2] and Composite [3] ... Maybe we should even have a page for *every*
> > DataContext implementation there is, simply for completeness and
> > referenceability of documentation.
> >
> > [1] http://wiki.apache.org/metamodel/examples/SalesforceDataContext
> > [2] http://wiki.apache.org/metamodel/examples/PojoDataContext
> > [3] http://wiki.apache.org/metamodel/examples/CompositeDataContext
> >
> > 2014-03-24 22:44 GMT+01:00 Henry Saputra <[email protected]>:
> >
> >> Ok +1
> >>
> >> How do you propose to document this feature? As another page in the
> >> doc svn repo?
> >>
> >> - Henry
> >>
> >> On Mon, Mar 24, 2014 at 2:42 PM, Kasper Sørensen
> >> <[email protected]> wrote:
> >> > Yep. Or in slightly more technical terms: It means that the
> >> > HBaseDataContext only implements DataContext, which has these two
> >> > significant methods:
> >> >
> >> > * getSchemas()
> >> > * executeQuery(...)
> >> >
> >> > (Plus a bunch more methods, but those two give you the general
> >> > impression: Explore metadata and fire queries / reads)
> >> > But not UpdateableDataContext, which has the write operations:
> >> >
> >> > * executeUpdate(...)
> >> >
> >> > Regards,
> >> > Kasper
> >> >
> >> > 2014-03-24 22:37 GMT+01:00 Henry Saputra <[email protected]>:
> >> >
> >> >> Hmm, what does it mean by read only? You can use it to read data from
> >> >> HBase?
> >> >>
> >> >> - Henry
> >> >>
> >> >> On Mon, Mar 24, 2014 at 2:34 PM, Kasper Sørensen
> >> >> <[email protected]> wrote:
> >> >> > A quick update on this since the module has now been merged into the
> >> >> > master branch:
> >> >> >
> >> >> > 1) Module is still read-only. This is accepted for now (unless someone
> >> >> > wants to help change it of course).
> >> >> >
> >> >> > 2) Metadata mapping is still working in two modes: a) we discover the
> >> >> > column families and expose them as byte-array maps (not very useful, but
> >> >> > works as a "lowest common denominator") and b) the user provides a set of
> >> >> > SimpleTableDef (which now has a convenient parser btw. :)) and gets his
> >> >> > table mapping as he wants it.
> >> >> >
> >> >> > 3) Querying now has special support for lookup-by-id type queries where
> >> >> > we will use HBase Get instead of Scan. We also have good support for
> >> >> > LIMIT/"maxRows", but not OFFSET/"firstRow" (in those cases we will scan
> >> >> > past the first records on the client side).
> >> >> >
> >> >> > 4) Dependencies still seem to be a pain. HBase and Hadoop come in many
> >> >> > flavours and not all are compatible. I doubt there's a lot we can do
> >> >> > about it, except ask the users to provide their own HBase dependency as
> >> >> > per their backend version. We should probably thus make all our
> >> >> > HBase/Hadoop dependencies <optional>true</optional> in order to not
> >> >> > influence the typical clients.
> >> >> >
> >> >> > Kasper
> >> >> >
> >> >> > 2014-02-24 17:08 GMT+01:00 Kasper Sørensen <[email protected]>:
> >> >> >
> >> >> >> Hi Henry,
> >> >> >>
> >> >> >> Yea the Phoenix project is definitely an interesting approach to making
> >> >> >> MM capable of working with HBase. The only downside to me is that it
> >> >> >> seems they do a lot of intrusive stuff to HBase like creating new index
> >> >> >> tables etc... I would normally not "allow" that for a simple connector.
> >> >> >>
> >> >> >> Maybe we should simply support both styles. And in the case of Phoenix,
> >> >> >> I guess we could simply go through the JDBC module of MetaModel and
> >> >> >> connect via their JDBC driver... Is that maybe a route, do you know?
> >> >> >>
> >> >> >> - Kasper
> >> >> >>
> >> >> >> 2014-02-24 6:37 GMT+01:00 Henry Saputra <[email protected]>:
> >> >> >>
> >> >> >>> We could use the HBase client library from the store I suppose.
> >> >> >>> The issue I am actually worried about is that adding real query
> >> >> >>> support for a column-based datastore is kind of a big task.
> >> >> >>> Apache Phoenix tried to do that so maybe we could leverage the SQL
> >> >> >>> planner layer to provide the implementation of the query execution
> >> >> >>> to the HBase layer?
> >> >> >>>
> >> >> >>> - Henry
> >> >> >>>
> >> >> >>> On Mon, Feb 17, 2014 at 9:33 AM, Kasper Sørensen
> >> >> >>> <[email protected]> wrote:
> >> >> >>> > Thanks for the input Henry. With your experience, do you then also
> >> >> >>> > happen to know of a good thin client-side library? I imagine that we
> >> >> >>> > could maybe use a REST client instead of the full client we currently
> >> >> >>> > use. That would save us a ton of dependency-overhead I think. Or is it
> >> >> >>> > a non-issue in your mind, since HBase users are used to this overhead?
> >> >> >>> >
> >> >> >>> > 2014-02-16 7:16 GMT+01:00 Henry Saputra <[email protected]>:
> >> >> >>> >
> >> >> >>> >> For 1 > I think adding read only to HBase should be ok because most
> >> >> >>> >> updates to HBase go either through the HBase client or REST via
> >> >> >>> >> Stargate [1] or Thrift.
> >> >> >>> >>
> >> >> >>> >> For 2 > In Apache Gora we use Avro to do type mapping to columns and
> >> >> >>> >> generate POJO Java via the Avro compiler.
> >> >> >>> >>
> >> >> >>> >> For 3 > This is the one I am kinda torn on. Apache Phoenix (incubating)
> >> >> >>> >> tries to provide SQL on HBase [2] via extra indexing and caching. I
> >> >> >>> >> think this defeats the purpose of having NoSQL databases that serve
> >> >> >>> >> a different purpose than relational databases.
> >> >> >>> >>
> >> >> >>> >> I am not sure MetaModel should touch NoSQL databases which are more
> >> >> >>> >> like column types. These databases are designed for large data with
> >> >> >>> >> access primarily via key and not a query mechanism.
> >> >> >>> >>
> >> >> >>> >> Just my 2 cents
> >> >> >>> >>
> >> >> >>> >> [1] http://wiki.apache.org/hadoop/Hbase/Stargate
> >> >> >>> >> [2] http://phoenix.incubator.apache.org/
> >> >> >>> >>
> >> >> >>> >> On Fri, Jan 24, 2014 at 11:35 AM, Kasper Sørensen
> >> >> >>> >> <[email protected]> wrote:
> >> >> >>> >> > Hi everyone,
> >> >> >>> >> >
> >> >> >>> >> > I was looking at our "hbase-module" branch and as much as I like this
> >> >> >>> >> > idea, I think we've been a bit too idle with the branch. Maybe we
> >> >> >>> >> > should try to make something final e.g. for a version 4.1.
> >> >> >>> >> >
> >> >> >>> >> > So I thought to give an overview/status of the module's current
> >> >> >>> >> > capabilities and its shortcomings. We should figure out if we think
> >> >> >>> >> > this is good enough for a first version, or if we want to do some
> >> >> >>> >> > improvements to the module before adding it to our portfolio of
> >> >> >>> >> > MetaModel modules.
> >> >> >>> >> >
> >> >> >>> >> > 1) The module only offers read-only/query access to HBase. That is in
> >> >> >>> >> > my opinion OK for now, we have several such modules, and this is
> >> >> >>> >> > something we can better add later if we straighten out the remaining
> >> >> >>> >> > topics in this mail.
> >> >> >>> >> >
> >> >> >>> >> > 2) With regards to metadata mapping: HBase is different because it has
> >> >> >>> >> > both column families and, within column families, columns. For the
> >> >> >>> >> > sake of our view on HBase I would describe column families simply as
> >> >> >>> >> > "a logical group of columns". Column families are fixed within a
> >> >> >>> >> > table, but rows in a table may contain arbitrary numbers of columns
> >> >> >>> >> > within each column family. So... You can instantiate the
> >> >> >>> >> > HBaseDataContext in two ways:
> >> >> >>> >> >
> >> >> >>> >> > 2a) You can let MetaModel discover the metadata. This unfortunately
> >> >> >>> >> > has a severe limitation. We discover the table names and column
> >> >> >>> >> > families using the HBase API. But the actual columns and their
> >> >> >>> >> > contents cannot be provided by the API. So instead we simply expose
> >> >> >>> >> > the column families with a MAP data type. The trouble with this is
> >> >> >>> >> > that the keys and values of the maps will simply be byte-arrays ...
> >> >> >>> >> > Usually not very useful! But it's sort of the only thing (as far as I
> >> >> >>> >> > can see) that's "safe" in HBase, since HBase allows anything (byte
> >> >> >>> >> > arrays) in its columns.
> >> >> >>> >> >
> >> >> >>> >> > 2b) Like in e.g. the MongoDb or CouchDb modules you can provide an
> >> >> >>> >> > array of tables (SimpleTableDef). That way the user defines the
> >> >> >>> >> > metadata himself and the implementation assumes that it is correct
> >> >> >>> >> > (or else it will break). The good thing about this is that the user
> >> >> >>> >> > can define the proper data types etc. for columns. The user defines
> >> >> >>> >> > the column family and column name by defining the MetaModel column
> >> >> >>> >> > name like this: "family:name" (consistent with most HBase tools and
> >> >> >>> >> > API calls).
> >> >> >>> >> >
> >> >> >>> >> > 3) With regards to querying: We've implemented basic query
> >> >> >>> >> > capabilities using the MetaModel query postprocessor. But not all
> >> >> >>> >> > queries are very efficient... In addition to of course full table
> >> >> >>> >> > scans, we have optimized support for COUNT queries and for table
> >> >> >>> >> > scans with maxRows.
> >> >> >>> >> >
> >> >> >>> >> > We could rather easily add optimized support for a couple of other
> >> >> >>> >> > typical queries:
> >> >> >>> >> > * lookup record by ID
> >> >> >>> >> > * paged table scans (both firstRow and maxRows)
> >> >> >>> >> > * queries with simple filters/where items
> >> >> >>> >> >
> >> >> >>> >> > 4) With regards to dependencies: The module right now depends on the
> >> >> >>> >> > artifact called "hbase-client". This dependency has a lot of
> >> >> >>> >> > transitive dependencies so the size of the module is quite extreme.
> >> >> >>> >> > As an example, it includes stuff like jetty, jersey, jackson and of
> >> >> >>> >> > course hadoop... But I am wondering if we can have a thinner client
> >> >> >>> >> > side than that! If anyone knows if e.g. we can use the REST interface
> >> >> >>> >> > easily or so, that would maybe be better. I'm not an expert on HBase
> >> >> >>> >> > though, so please enlighten me!
> >> >> >>> >> >
> >> >> >>> >> > Kind regards,
> >> >> >>> >> > Kasper
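
To make the "family:name" convention from point 2b of the quoted thread a bit more concrete, here is a minimal sketch in plain Java. It does not use the MetaModel or HBase APIs; the class HBaseColumnName and its parse method are hypothetical names invented for this illustration, and the sketch assumes that a column family never contains a colon, so the first colon is the separator.

```java
import java.util.Objects;

/**
 * Hypothetical helper (not part of MetaModel or HBase): splits a column name
 * written as "family:name" into its column family and column qualifier.
 */
final class HBaseColumnName {
    final String family;
    final String qualifier; // null when the name refers to a whole column family

    private HBaseColumnName(String family, String qualifier) {
        this.family = family;
        this.qualifier = qualifier;
    }

    static HBaseColumnName parse(String columnName) {
        Objects.requireNonNull(columnName, "columnName");
        // Assumption: the family contains no ':', so the first colon separates
        // family from qualifier; later colons belong to the qualifier.
        int colon = columnName.indexOf(':');
        if (columnName.isEmpty() || colon == 0) {
            throw new IllegalArgumentException("Missing column family: '" + columnName + "'");
        }
        if (colon < 0) {
            // No colon at all: treat the name as a bare column family.
            return new HBaseColumnName(columnName, null);
        }
        return new HBaseColumnName(columnName.substring(0, colon),
                columnName.substring(colon + 1));
    }

    public static void main(String[] args) {
        HBaseColumnName column = parse("details:first_name");
        System.out.println(column.family + " / " + column.qualifier);
    }
}
```

A name without a colon is treated as a bare column family here, which loosely mirrors mode 2a where whole families are exposed; whether the real module makes the same choice is not stated in the thread.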

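
Point 3 of the March update says that OFFSET/"firstRow" is handled by scanning past the first records on the client side, while LIMIT/"maxRows" is supported well. A rough sketch of that idea in plain Java follows. It is not the module's actual code: ClientSidePaging and page are hypothetical names, the row source is just an Iterator standing in for a scan result, and firstRow is treated as 1-based, which is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical illustration of client-side firstRow handling with a maxRows cap. */
final class ClientSidePaging {

    /**
     * Skips rows before firstRow (1-based, an assumption of this sketch) and
     * then collects at most maxRows rows from the remaining iterator.
     */
    static <T> List<T> page(Iterator<T> rows, int firstRow, int maxRows) {
        if (firstRow < 1) {
            throw new IllegalArgumentException("firstRow must be >= 1");
        }
        if (maxRows < 0) {
            throw new IllegalArgumentException("maxRows must be >= 0");
        }
        // The skipped rows are still fetched from the source, which is why
        // this approach is less efficient than a server-side offset.
        for (int position = 1; position < firstRow && rows.hasNext(); position++) {
            rows.next();
        }
        List<T> result = new ArrayList<>();
        while (result.size() < maxRows && rows.hasNext()) {
            result.add(rows.next());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> scanned = Arrays.asList("row1", "row2", "row3", "row4", "row5");
        System.out.println(page(scanned.iterator(), 3, 2));
    }
}
```

The cost of this approach grows with firstRow, since every skipped row still travels to the client; that matches the thread's remark that paged scans are a candidate for further optimization.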