Some projects do link back from the homepage to wiki pages. I think the key thing is to have separate docs for each release.
What do you think?

- Henry

On Tue, Mar 25, 2014 at 4:47 AM, Kasper Sørensen <[email protected]> wrote:

Hmm, was kinda hoping we wouldn't have to... But that's just because I am lazy and I prefer "live" (editable online) documentation where possible (that way you can easily react if someone starts pointing at missing parts). I think either way is doable, but you're right that in case we use wiki pages, each wiki page should clearly state which versions it applies to, if it is version-specific.

2014-03-24 23:03 GMT+01:00 Henry Saputra <[email protected]>:

Hmm, seems like we need to bundle the docs with each release. For example, 4.0.0 does not have the HBase store.

Most projects have docs for each release on top of the project homepage, like ZooKeeper http://zookeeper.apache.org/doc/r3.4.6/ or Spark http://spark.apache.org/docs/0.9.0/

Thoughts?

- Henry

On Mon, Mar 24, 2014 at 2:50 PM, Kasper Sørensen <[email protected]> wrote:

Hmm, I suppose a wiki page would be good. We have wiki pages for some of the DataContext implementations already, like Salesforce [1], POJO [2] and Composite [3]... Maybe we should even have a page for *every* DataContext implementation there is, simply for completeness and referenceability of the documentation.

[1] http://wiki.apache.org/metamodel/examples/SalesforceDataContext
[2] http://wiki.apache.org/metamodel/examples/PojoDataContext
[3] http://wiki.apache.org/metamodel/examples/CompositeDataContext

2014-03-24 22:44 GMT+01:00 Henry Saputra <[email protected]>:

Ok, +1.

How do you propose to document this feature? As another page in the doc svn repo?

- Henry

On Mon, Mar 24, 2014 at 2:42 PM, Kasper Sørensen <[email protected]> wrote:

Yep.
Or in slightly more technical terms: it means that the HBaseDataContext only implements DataContext, which has these two significant methods:

* getSchemas()
* executeQuery(...)

(Plus a bunch more methods, but those two give you the general impression: explore metadata and fire queries/reads.) But it does not implement UpdateableDataContext, which has the write operation:

* executeUpdate(...)

Regards,
Kasper

2014-03-24 22:37 GMT+01:00 Henry Saputra <[email protected]>:

Hmm, what does it mean that it's read-only? You can use it to read data from HBase?

- Henry

On Mon, Mar 24, 2014 at 2:34 PM, Kasper Sørensen <[email protected]> wrote:

A quick update on this since the module has now been merged into the master branch:

1) The module is still read-only. This is accepted for now (unless someone wants to help change it, of course).

2) Metadata mapping still works in two modes: a) we discover the column families and expose them as byte-array maps (not very useful, but it works as a "lowest common denominator"), and b) the user provides a set of SimpleTableDefs (which now has a convenient parser, btw :)) and gets his table mapping as he wants it.

3) Querying now has special support for lookup-by-id type queries, where we will use an HBase Get instead of a Scan. We also have good support for LIMIT/"maxRows", but not OFFSET/"firstRow" (in those cases we will scan past the first records on the client side).

4) Dependencies still seem to be a pain. HBase and Hadoop come in many flavours, and not all of them are compatible.
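[Editor's note: the DataContext/UpdateableDataContext split described earlier in this mail can be sketched as follows. The method signatures are simplified placeholders (the real MetaModel API uses Query and Schema types rather than plain strings), so treat this as a shape sketch of the idea, not the actual API.]

```java
// Sketch of the interface split: reads live on DataContext, writes on a
// separate sub-interface. Signatures are hypothetical simplifications.
interface DataContext {
    String[] getSchemas();             // explore metadata
    Object executeQuery(String query); // fire queries / reads
}

// Write support is opt-in. A connector that only implements DataContext
// (like HBaseDataContext at this point) is read-only by construction.
interface UpdateableDataContext extends DataContext {
    void executeUpdate(String update);
}

class ReadOnlyCheck {
    // Callers can detect write support with a plain instanceof check.
    static boolean isUpdateable(DataContext dc) {
        return dc instanceof UpdateableDataContext;
    }
}
```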
I doubt there's a lot we can do about it, except ask users to provide their own HBase dependency matching their backend version. We should probably thus make all our HBase/Hadoop dependencies <optional>true</optional> in order not to influence typical clients.

Kasper

2014-02-24 17:08 GMT+01:00 Kasper Sørensen <[email protected]>:

Hi Henry,

Yeah, the Phoenix project is definitely an interesting approach to making MM capable of working with HBase. The only downside to me is that they seem to do a lot of intrusive stuff to HBase, like creating new index tables etc... I would normally not "allow" that for a simple connector.

Maybe we should simply support both styles. And in the case of Phoenix, I guess we could simply go through the JDBC module of MetaModel and connect via their JDBC driver... Is that maybe a route, do you know?

- Kasper

2014-02-24 6:37 GMT+01:00 Henry Saputra <[email protected]>:

We could use the HBase client library from the store, I suppose. The issue I am actually worried about is that adding real query support for a column-based datastore is kind of a big task. Apache Phoenix tried to do that, so maybe we could leverage its SQL planner layer to provide the implementation of query execution against the HBase layer?

- Henry

On Mon, Feb 17, 2014 at 9:33 AM, Kasper Sørensen <[email protected]> wrote:

Thanks for the input, Henry.
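[Editor's note: the `<optional>true</optional>` approach proposed above would look roughly like this in the module's pom.xml. The version property is illustrative; users would override it with the hbase-client version matching their own backend.]

```xml
<!-- Marked optional so the heavy HBase/Hadoop dependency tree is not
     pulled in transitively by clients that don't use the HBase module.
     Users declare their own hbase-client matching their backend version. -->
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-client</artifactId>
  <version>${hbase.version}</version>
  <optional>true</optional>
</dependency>
```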
With your experience, do you happen to know of a good thin client-side library? I imagine we could maybe use a REST client instead of the full client we currently use. That would save us a ton of dependency overhead, I think. Or is it a non-issue in your mind, since HBase users are used to this overhead?

2014-02-16 7:16 GMT+01:00 Henry Saputra <[email protected]>:

For 1: I think adding read-only access to HBase should be OK, because most updates to HBase go either through the HBase client or via REST through Stargate [1] or Thrift.

For 2: In Apache Gora we use Avro to do type mapping to columns, and we generate POJO Java classes via the Avro compiler.

For 3: This is the one I am kind of torn on. Apache Phoenix (incubating) tries to provide SQL on HBase [2] via extra indexing and caching. I think this defeats the purpose of having NoSQL databases, which serve a different purpose than relational databases.

I am not sure MetaModel should touch the more column-oriented NoSQL databases. These databases are designed for large data with access primarily via key, not via a query mechanism.
Just my 2 cents.

[1] http://wiki.apache.org/hadoop/Hbase/Stargate
[2] http://phoenix.incubator.apache.org/

On Fri, Jan 24, 2014 at 11:35 AM, Kasper Sørensen <[email protected]> wrote:

Hi everyone,

I was looking at our "hbase-module" branch, and as much as I like this idea, I think we've been a bit too idle with the branch. Maybe we should try to make something final, e.g. for a version 4.1.

So I thought I'd give an overview/status of the module's current capabilities and its shortcomings. We should figure out whether we think this is good enough for a first version, or whether we want to make some improvements to the module before adding it to our portfolio of MetaModel modules.

1) The module only offers read-only/query access to HBase. That is in my opinion OK for now; we have several such modules, and this is something we can better add later, once we straighten out the remaining topics in this mail.

2) With regards to metadata mapping: HBase is different because it has column families, and within column families there are columns. For the sake of our view on HBase, I would describe a column family simply as "a logical grouping of columns".
Column families are fixed within a table, but rows in a table may contain arbitrary numbers of columns within each column family. So... you can instantiate the HBaseDataContext in two ways:

2a) You can let MetaModel discover the metadata. This unfortunately has a severe limitation. We discover the table names and column families using the HBase API, but the actual columns and their contents cannot be provided by the API. So instead we simply expose the column families with a MAP data type. The trouble with this is that the keys and values of the maps will simply be byte arrays... usually not very useful! But it's sort of the only thing (as far as I can see) that's "safe" in HBase, since HBase allows anything (byte arrays) in its columns.

2b) Like in e.g. the MongoDB or CouchDB modules, you can provide an array of table definitions (SimpleTableDef). That way the user defines the metadata himself, and the implementation assumes it is correct (or else it will break). The good thing about this is that the user can define the proper data types etc. for columns. The user defines the column family and column name by setting the MetaModel column name like this: "family:name" (consistent with most HBase tools and API calls).
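[Editor's note: the "family:name" convention from 2b can be illustrated with a small parser. This is a hypothetical helper for illustration, not MetaModel code.]

```java
// Hypothetical helper that splits a MetaModel column name of the form
// "family:name" into its HBase column family and column qualifier,
// following the convention described in 2b above.
class HBaseColumnName {
    final String family;
    final String qualifier;

    HBaseColumnName(String metaModelColumnName) {
        int idx = metaModelColumnName.indexOf(':');
        if (idx < 0) {
            throw new IllegalArgumentException(
                    "Expected 'family:name', got: " + metaModelColumnName);
        }
        family = metaModelColumnName.substring(0, idx);
        qualifier = metaModelColumnName.substring(idx + 1);
    }
}
```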
3) With regards to querying: We've implemented basic query capabilities using the MetaModel query postprocessor. But not all queries are very effective... In addition to full table scans, we have optimized support for COUNT queries and for table scans with maxRows.

We could rather easily add optimized support for a couple of other typical queries:

* lookup of a record by ID
* paged table scans (both firstRow and maxRows)
* queries with simple filters/where items

4) With regards to dependencies: The module right now depends on the artifact called "hbase-client". This dependency has a lot of transitive dependencies, so the size of the module is quite extreme. As an example, it includes stuff like Jetty, Jersey, Jackson and of course Hadoop... But I am wondering if we can have a thinner client side than that! If anyone knows whether we can easily use e.g. the REST interface, that would maybe be better. I'm not an expert on HBase though, so please enlighten me!

Kind regards,
Kasper
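[Editor's note: the client-side OFFSET handling mentioned in this thread (rows before "firstRow" are scanned past on the client when the optimization can't be pushed to the server) amounts to something like the sketch below. The class is hypothetical, and a plain Iterator stands in for an HBase result scanner; firstRow is assumed to be 1-based, matching MetaModel's convention.]

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class ClientSidePager {
    // Emulates OFFSET ("firstRow") and LIMIT ("maxRows") on the client:
    // rows before firstRow are fetched and discarded, which is why this is
    // less efficient than a server-side optimization would be.
    static <T> List<T> page(Iterator<T> scanner, int firstRow, int maxRows) {
        List<T> result = new ArrayList<>();
        int seen = 0;
        while (scanner.hasNext() && result.size() < maxRows) {
            T row = scanner.next();
            seen++;
            if (seen >= firstRow) { // firstRow is 1-based
                result.add(row);
            }
        }
        return result;
    }
}
```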
