I also just pushed some updates to the branch. I realized that I had only
done local commits, and due to a conflict they were not getting pushed to
the central git repo. So apologies: I have been mentioning stuff that wasn't
quite the same in others' repos.


2014-01-28 Kasper Sørensen <[email protected]>

> Regarding point no. 4 ... I was just investigating and tried making a
> "thinner" HBase client simply by adding Maven <exclusion>s to the
> hbase-client dependency. I eventually came up with this quite long list of
> exclusions that at least do not affect our (tested) usage of HBase:
>
> <exclusions>
>   <exclusion>
>     <artifactId>log4j</artifactId>
>     <groupId>log4j</groupId>
>   </exclusion>
>   <exclusion>
>     <artifactId>commons-logging</artifactId>
>     <groupId>commons-logging</groupId>
>   </exclusion>
>   <exclusion>
>     <artifactId>netty</artifactId>
>     <groupId>io.netty</groupId>
>   </exclusion>
>   <exclusion>
>     <artifactId>jersey-json</artifactId>
>     <groupId>com.sun.jersey</groupId>
>   </exclusion>
>   <exclusion>
>     <artifactId>jersey-server</artifactId>
>     <groupId>com.sun.jersey</groupId>
>   </exclusion>
>   <exclusion>
>     <artifactId>jersey-core</artifactId>
>     <groupId>com.sun.jersey</groupId>
>   </exclusion>
>   <exclusion>
>     <artifactId>jackson-mapper-asl</artifactId>
>     <groupId>org.codehaus.jackson</groupId>
>   </exclusion>
>   <exclusion>
>     <artifactId>jsp-2.1</artifactId>
>     <groupId>org.mortbay.jetty</groupId>
>   </exclusion>
>   <exclusion>
>     <artifactId>jsp-api-2.1</artifactId>
>     <groupId>org.mortbay.jetty</groupId>
>   </exclusion>
>   <exclusion>
>     <artifactId>jasper-compiler</artifactId>
>     <groupId>tomcat</groupId>
>   </exclusion>
>   <exclusion>
>     <artifactId>jasper-runtime</artifactId>
>     <groupId>tomcat</groupId>
>   </exclusion>
>   <exclusion>
>     <artifactId>jetty-util</artifactId>
>     <groupId>org.mortbay.jetty</groupId>
>   </exclusion>
>   <exclusion>
>     <artifactId>jetty</artifactId>
>     <groupId>org.mortbay.jetty</groupId>
>   </exclusion>
>   <exclusion>
>     <artifactId>commons-httpclient</artifactId>
>     <groupId>commons-httpclient</groupId>
>   </exclusion>
>   <exclusion>
>     <artifactId>findbugs-annotations</artifactId>
>     <groupId>com.github.stephenc.findbugs</groupId>
>   </exclusion>
>   <exclusion>
>     <artifactId>commons-cli</artifactId>
>     <groupId>commons-cli</groupId>
>   </exclusion>
>   <exclusion>
>     <artifactId>commons-el</artifactId>
>     <groupId>commons-el</groupId>
>   </exclusion>
>   <exclusion>
>     <artifactId>commons-net</artifactId>
>     <groupId>commons-net</groupId>
>   </exclusion>
>   <exclusion>
>     <artifactId>xmlenc</artifactId>
>     <groupId>xmlenc</groupId>
>   </exclusion>
>   <exclusion>
>     <artifactId>commons-math</artifactId>
>     <groupId>org.apache.commons</groupId>
>   </exclusion>
>   <exclusion>
>     <artifactId>jsr305</artifactId>
>     <groupId>com.google.code.findbugs</groupId>
>   </exclusion>
> </exclusions>
>
> Quite a long list ... I'm not super happy about committing this, but using
> the native HBase client seems to be the best option, and with these
> exclusions it is at least trimmed down to just the dependencies that we
> actually need.
>
>
> 2014-01-27 Henry Saputra <[email protected]>
>
> Kasper, sorry typo =)
>>
>> On Mon, Jan 27, 2014 at 1:07 PM, Henry Saputra <[email protected]> wrote:
>> > Sorry Kapser, a bit busy and hectic with my schedule, so I have to punt my
>> > response until later. Apologies for the delay.
>> >
>> > - Henry
>> >
>> > On Mon, Jan 27, 2014 at 12:18 PM, Kasper Sørensen <[email protected]> wrote:
>> >> OK, to kick things off, let me provide my own input for this discussion.
>> >> Please find below my thoughts on the issues and what we need to do. Your
>> >> feedback is very, very welcome.
>> >>
>> >>
>> >> 2014-01-24 Kasper Sørensen <[email protected]>
>> >>
>> >>> Hi everyone,
>> >>>
>> >>> I was looking at our "hbase-module" branch and as much as I like this
>> >>> idea, I think we've been a bit too idle with the branch. Maybe we should
>> >>> try to make something final, e.g. for version 4.1.
>> >>>
>> >>> So I thought I'd give an overview/status of the module's current
>> >>> capabilities and its shortcomings. We should figure out whether this is
>> >>> good enough for a first version, or whether we want to make some
>> >>> improvements to the module before adding it to our portfolio of
>> >>> MetaModel modules.
>> >>>
>> >>> 1) The module only offers read-only/query access to HBase. That is, in
>> >>> my opinion, OK for now: we have several such modules, and write access
>> >>> is something we can better add later, once we straighten out the
>> >>> remaining topics in this mail.
>> >>>
>> >>
>> >> No problem
>> >>
>> >>
>> >>> 2) With regards to metadata mapping: HBase is different because it has
>> >>> column families, and within column families there are columns. For the
>> >>> sake of our view on HBase I would describe column families simply as "a
>> >>> logical grouping of columns". Column families are fixed within a table,
>> >>> but rows in a table may contain arbitrary numbers of columns within each
>> >>> column family. So... You can instantiate the HBaseDataContext in two
>> >>> ways:
>> >>>
>> >>> 2a) You can let MetaModel discover the metadata. This unfortunately has
>> >>> a severe limitation. We discover the table names and column families
>> >>> using the HBase API, but the actual columns and their contents cannot be
>> >>> provided by the API. So instead we simply expose each column family with
>> >>> a MAP data type. The trouble with this is that the keys and values of the
>> >>> maps will simply be byte arrays... Usually not very useful! But it's sort
>> >>> of the only thing (as far as I can see) that's "safe" in HBase, since
>> >>> HBase allows anything (byte arrays) in its columns.
>> >>>
>> >>
>> >> I think we could maybe add a flag here to allow MetaModel to assume that
>> >> column keys are of String type. That would at least make the discovered
>> >> metadata more meaningful, since we can expose columns and not just column
>> >> families. It's still going to be tough to figure out the value types, but
>> >> we could e.g. make the Column implementations mutable and allow setting
>> >> ColumnType on a "live" HBaseColumn.
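
A minimal sketch of the "assume String column keys" flag, in plain Java with no HBase dependency. It assumes a column family arrives as a NavigableMap<byte[], byte[]> from qualifier to value (the shape HBase's Result.getFamilyMap returns) and decodes only the qualifiers as UTF-8, leaving values as raw bytes since their types are still unknown. FamilyMapDecoding and its methods are hypothetical names, not part of the module.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class FamilyMapDecoding {

    // Unsigned, lexicographic byte[] ordering (byte[] has no natural ordering in Java).
    public static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // Hypothetical "String keys" mode: qualifiers are decoded as UTF-8,
    // values stay raw bytes because their types are still unknown.
    public static Map<String, byte[]> decodeQualifiers(NavigableMap<byte[], byte[]> familyMap) {
        Map<String, byte[]> columns = new LinkedHashMap<>();
        for (Map.Entry<byte[], byte[]> entry : familyMap.entrySet()) {
            columns.put(new String(entry.getKey(), StandardCharsets.UTF_8), entry.getValue());
        }
        return columns;
    }

    public static void main(String[] args) {
        NavigableMap<byte[], byte[]> family = new TreeMap<>(FamilyMapDecoding::compareBytes);
        family.put("name".getBytes(StandardCharsets.UTF_8), "alice".getBytes(StandardCharsets.UTF_8));
        family.put("age".getBytes(StandardCharsets.UTF_8), new byte[] {0, 0, 0, 42});

        Map<String, byte[]> columns = decodeQualifiers(family);
        System.out.println(columns.keySet()); // [age, name]
    }
}
```

Qualifiers that are not valid UTF-8 would be mangled by this decoding, which is one reason to make it an opt-in flag rather than the default.
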
>> >>
>> >>
>> >>> 2b) Like in e.g. the MongoDb or CouchDb modules, you can provide an
>> >>> array of tables (SimpleTableDef). That way the user defines the metadata
>> >>> himself, and the implementation assumes that it is correct (or else it
>> >>> will break). The good thing about this is that the user can define the
>> >>> proper data types etc. for columns. The user defines the column family
>> >>> and column name by setting the MetaModel column name to "family:name"
>> >>> (consistent with most HBase tools and API calls).
>> >>>
>> >>
>> >> This is good, but requires more of the user.
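
A small sketch of the "family:name" convention from 2b: splitting a MetaModel column name into an HBase column family and qualifier. It splits at the first colon, on the assumption that family names cannot contain ':' while qualifiers may. HBaseColumnName is a hypothetical helper, not an existing class in the module.

```java
public final class HBaseColumnName {

    private final String family;
    private final String qualifier;

    private HBaseColumnName(String family, String qualifier) {
        this.family = family;
        this.qualifier = qualifier;
    }

    // Splits at the first ':' so that qualifiers containing ':' survive intact.
    // This sketch rejects names with an empty family or an empty qualifier.
    public static HBaseColumnName parse(String columnName) {
        int idx = columnName.indexOf(':');
        if (idx <= 0 || idx == columnName.length() - 1) {
            throw new IllegalArgumentException("Expected \"family:name\" but got: " + columnName);
        }
        return new HBaseColumnName(columnName.substring(0, idx), columnName.substring(idx + 1));
    }

    public String getFamily() {
        return family;
    }

    public String getQualifier() {
        return qualifier;
    }

    public static void main(String[] args) {
        HBaseColumnName c = parse("address:city");
        System.out.println(c.getFamily() + " / " + c.getQualifier()); // address / city
    }
}
```

Failing fast on a malformed name matches the "or else it will break" caveat above, but moves the breakage to DataContext construction instead of query time.
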
>> >>
>> >>
>> >>> 3) With regards to querying: We've implemented basic query capabilities
>> >>> using the MetaModel query postprocessor, but not all queries are very
>> >>> efficient... Apart from full table scans, of course, we have optimized
>> >>> support for COUNT queries and for table scans with maxRows.
>> >>>
>> >>> We could rather easily add optimized support for a couple of other
>> >>> typical queries:
>> >>>  * lookup record by ID
>> >>>  * paged table scans (both firstRow and maxRows)
>> >>>  * queries with simple filters/where items
>> >>>
>> >>
>> >> I think "lookup record by ID" is a MUST, since this is a whole other class
>> >> of queries in HBase (Get instead of Scan).
>> >>
>> >> Other optimizations would be nice too, but for my own usage I could live
>> >> without them in the first release.
>> >>
>> >>
>> >>> 4) With regards to dependencies: The module right now depends on the
>> >>> artifact called "hbase-client". This dependency has a lot of transitive
>> >>> dependencies, so the size of the module is quite extreme. As an example,
>> >>> it includes stuff like jetty, jersey, jackson and of course hadoop... But
>> >>> I am wondering if we can have a thinner client side than that! If anyone
>> >>> knows whether e.g. we can easily use the REST interface, that might be
>> >>> better. I'm not an expert on HBase though, so please enlighten me!
>> >>>
>> >>
>> >> This is a big problem IMO. Anyone with HBase client experience? It would
>> >> be a lot better with a thin client somehow.
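
On the REST question: as a rough illustration of how thin such a client could be, here is a sketch that uses only the JDK (java.net.http, Java 11+). It assumes the HBase REST gateway serves a whole row at /<table>/<row> and that, with "Accept: application/json", row keys, column names and cell values come back base64-encoded; both assumptions should be checked against the REST documentation for the HBase version in use. encodeSegment, rowUri, decodeCell and fetchRowJson are hypothetical helpers, and JSON parsing is left out to keep the sketch dependency-free.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class HBaseRestSketch {

    // Percent-encodes a URI path segment byte by byte; RFC 3986 unreserved chars pass through.
    public static String encodeSegment(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            char c = (char) (b & 0xff);
            if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')
                    || c == '-' || c == '.' || c == '_' || c == '~') {
                sb.append(c);
            } else {
                sb.append('%').append(String.format("%02X", b & 0xff));
            }
        }
        return sb.toString();
    }

    // Hypothetical helper: URI of a single row, assuming the /<table>/<row> layout.
    public static URI rowUri(String baseUrl, String table, String rowKey) {
        return URI.create(baseUrl + "/" + encodeSegment(table) + "/" + encodeSegment(rowKey));
    }

    // Decodes one base64-encoded key or value from the JSON representation.
    public static String decodeCell(String base64) {
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }

    // Performs a real HTTP call, so it needs a running REST gateway.
    public static String fetchRowJson(HttpClient client, URI uri) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(uri)
                .header("Accept", "application/json")
                .GET()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rowUri("http://localhost:8080", "users", "user 1")); // http://localhost:8080/users/user%201
        System.out.println(decodeCell("aGVsbG8=")); // hello
        if (args.length == 1) {
            // Pass the gateway base URL as the only argument to try a live request.
            System.out.println(fetchRowJson(HttpClient.newHttpClient(), rowUri(args[0], "users", "user 1")));
        }
    }
}
```

The trade-off is an extra network hop through the gateway and hand-written decoding, in exchange for a dependency footprint of zero beyond the JDK.
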
>> >>
>> >>
>> >>> Kind regards,
>> >>> Kasper
>> >>>
>> >>>
>> >>>
>>
>
>
