OK cool. I've created a remote branch called 'hbase-module' for this.
Here's an overview of what is there and what is still missing or
doubtful in my opinion. Let's discuss these items a bit so we agree on
the design:

* The module is working and the test case should run on an
out-of-the-box install of HBase on localhost.
* The module is so far read/query-only. I believe we should also add
write/update capabilities, but I wanted to agree on some of these
issues before implementing...

* Which version of HBase should we depend on? The dependency doesn't
seem completely stable, and I am not sure if someone who upgrades or
downgrades the current version (0.95.1-hadoop1) would still have a
functional library. It would be great if there was a more "client
oriented" JAR to depend on, since the one we currently employ pulls in
the full set of Hadoop dependencies and I was initially aiming for a
thin(ner) client.

* How do we do schema definition/detection? The current implementation
allows the user to define their own schema model using the
SimpleTableDef class. If they don't, then every column family will be
detected and represented as a column with a MAP data type (since a
column family is not a single field, but a family of fields).
Furthermore, the contents of this map will be binary (Map<byte[],
byte[]>), since HBase does not reveal anything about the type or
contents of the column families (indeed, they are dynamically
populated, so they can contain anything). Should we keep it like this
(binary by default, but manually overrideable) or should we go further
with autodetection? In CouchDB and MongoDB, for instance, we
autodetect by querying the first 1000 documents and building a schema
model based on those. Still not an exact science...
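To make the binary-by-default idea concrete, here is a minimal,
self-contained sketch of what a consumer of such a Map<byte[], byte[]>
column family would have to do. The class name ColumnFamilyDecode is
just for illustration (not part of the module), and the UTF-8
assumption is ours - HBase itself gives no type information:

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class ColumnFamilyDecode {

    // Decode a binary column family map into readable strings.
    // Assumes the qualifiers and values are UTF-8 text, which the
    // user has to know - HBase stores only byte arrays.
    static Map<String, String> decode(Map<byte[], byte[]> family) {
        Map<String, String> result = new LinkedHashMap<String, String>();
        for (Map.Entry<byte[], byte[]> e : family.entrySet()) {
            result.put(new String(e.getKey(), StandardCharsets.UTF_8),
                    new String(e.getValue(), StandardCharsets.UTF_8));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<byte[], byte[]> family = new LinkedHashMap<byte[], byte[]>();
        family.put("foo".getBytes(StandardCharsets.UTF_8),
                "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(decode(family)); // prints {foo=hello}
    }
}
```

Autodetection would essentially mean guessing which decoding to apply
per column, instead of leaving it to the user like this.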

* How do we represent a column in a column family? So far I have used
what seemed like a convention: a user can specify a column "foo" in a
column family "bar" with a colon in between, like this: "bar:foo".
They can thereby specify in their SimpleTableDef a column named
"bar:foo" and apply the data type VARCHAR, which will cause MetaModel
to automatically read the binary contents as a string. So far only
string types are converted this way, but we should add more
(HBaseRow.java line 71).
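The colon convention boils down to a simple split. A sketch of the
parsing logic (the class and method names here are illustrative, not
the actual code in the branch):

```java
public class QualifiedColumn {

    // Split a "family:qualifier" column name per the colon convention.
    // A name without a colon refers to the column family as a whole,
    // so the qualifier part is left as null.
    static String[] parse(String name) {
        int idx = name.indexOf(':');
        if (idx == -1) {
            return new String[] { name, null };
        }
        return new String[] { name.substring(0, idx), name.substring(idx + 1) };
    }

    public static void main(String[] args) {
        String[] parts = parse("bar:foo");
        System.out.println(parts[0] + " / " + parts[1]); // prints bar / foo
    }
}
```

One open question with this convention is whether a qualifier itself
may contain a colon; splitting on the first colon (as above) would
allow that.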

I think that was all from me. Please share your inputs.

Best regards,
Kasper


2013/7/30 sameer arora <[email protected]>:
> The remote branch approach should work well, as we might not be making a lot
> of changes in the other MetaModel modules or MM-core. I have followed this
> approach earlier and never faced trouble while merging the branch back onto
> master.
>
> regards
> Sameer
>
>
> On Tue, Jul 30, 2013 at 12:28 PM, Kasper Sørensen <
> [email protected]> wrote:
>
>> Hi everyone,
>>
>> Spoke with Sameer who originally started the HBase branch of MetaModel
>> on the old infrastructure and we thought it would be good to also make
>> a remote branch in the Git repo for this effort. The HBase module does
>> pose a few challenges [1], so having it on a branch would allow
>> developers to work on it without having to deliver a fully fledged
>> solution from the first patch.
>>
>> Is that a good use of a Git remote branch, or do we need some Git
>> education on this? :-)
>>
>> Regards,
>> Kasper
>>
>> [1] One of the distinct challenges of HBase so far has been that the
>> data types are not directly visible - everything is more or less just
>> a byte array. Another issue is how we should represent the nested
>> column structure of HBase (every column has a 'column family'). And
>> there are probably more...
>>
