I also just pushed some updates to the branch. I realized that I had only done local commits, and due to some conflict they were not getting pushed to the central git repo. So apologies: I have been mentioning stuff that didn't quite match what was in others' repos.
2014-01-28 Kasper Sørensen <[email protected]>

> Regarding point no. 4 ... I was just investigating and tried making a
> "thinner" HBase client simply by adding Maven <exclusion>s to the
> hbase-client dependency. I eventually came up with this quite long list of
> exclusions that at least do not affect our (tested) usage of HBase:
>
> <exclusion>
>   <artifactId>log4j</artifactId>
>   <groupId>log4j</groupId>
> </exclusion>
> <exclusion>
>   <artifactId>commons-logging</artifactId>
>   <groupId>commons-logging</groupId>
> </exclusion>
> <exclusion>
>   <artifactId>netty</artifactId>
>   <groupId>io.netty</groupId>
> </exclusion>
> <exclusion>
>   <artifactId>jersey-json</artifactId>
>   <groupId>com.sun.jersey</groupId>
> </exclusion>
> <exclusion>
>   <artifactId>jersey-server</artifactId>
>   <groupId>com.sun.jersey</groupId>
> </exclusion>
> <exclusion>
>   <artifactId>jersey-core</artifactId>
>   <groupId>com.sun.jersey</groupId>
> </exclusion>
> <exclusion>
>   <artifactId>jackson-mapper-asl</artifactId>
>   <groupId>org.codehaus.jackson</groupId>
> </exclusion>
> <exclusion>
>   <artifactId>jsp-2.1</artifactId>
>   <groupId>org.mortbay.jetty</groupId>
> </exclusion>
> <exclusion>
>   <artifactId>jsp-api-2.1</artifactId>
>   <groupId>org.mortbay.jetty</groupId>
> </exclusion>
> <exclusion>
>   <artifactId>jasper-compiler</artifactId>
>   <groupId>tomcat</groupId>
> </exclusion>
> <exclusion>
>   <artifactId>jasper-runtime</artifactId>
>   <groupId>tomcat</groupId>
> </exclusion>
> <exclusion>
>   <artifactId>jetty-util</artifactId>
>   <groupId>org.mortbay.jetty</groupId>
> </exclusion>
> <exclusion>
>   <artifactId>jetty</artifactId>
>   <groupId>org.mortbay.jetty</groupId>
> </exclusion>
> <exclusion>
>   <artifactId>commons-httpclient</artifactId>
>   <groupId>commons-httpclient</groupId>
> </exclusion>
> <exclusion>
>   <artifactId>findbugs-annotations</artifactId>
>   <groupId>com.github.stephenc.findbugs</groupId>
> </exclusion>
> <exclusion>
>   <artifactId>commons-cli</artifactId>
>   <groupId>commons-cli</groupId>
> </exclusion>
> <exclusion>
>   <artifactId>commons-el</artifactId>
>   <groupId>commons-el</groupId>
> </exclusion>
> <exclusion>
>   <artifactId>commons-net</artifactId>
>   <groupId>commons-net</groupId>
> </exclusion>
> <exclusion>
>   <artifactId>xmlenc</artifactId>
>   <groupId>xmlenc</groupId>
> </exclusion>
> <exclusion>
>   <artifactId>commons-math</artifactId>
>   <groupId>org.apache.commons</groupId>
> </exclusion>
> <exclusion>
>   <artifactId>jsr305</artifactId>
>   <groupId>com.google.code.findbugs</groupId>
> </exclusion>
>
> Quite a long list ... I'm not feeling super happy to commit this, but it
> seems the best option to use the native HBase client and with these
> exclusions it is at least trimmed down to just the dependencies that we
> actually need.
>
>
> 2014-01-27 Henry Saputra <[email protected]>
>
> > Kasper, sorry typo =)
>>
>> On Mon, Jan 27, 2014 at 1:07 PM, Henry Saputra <[email protected]>
>> wrote:
>> > Sorry Kapser, a bit busy and hectic with my schedule so I have to punt my
>> > response until later. Apologies for the delay.
>> >
>> > - Henry
>> >
>> > On Mon, Jan 27, 2014 at 12:18 PM, Kasper Sørensen
>> > <[email protected]> wrote:
>> >> OK to kick things off, let me provide my own input for this discussion.
>> >> Please find below my thoughts on the issues and what we need to do. Your
>> >> feedback is very very welcome.
>> >>
>> >>
>> >> 2014-01-24 Kasper Sørensen <[email protected]>
>> >>
>> >>> Hi everyone,
>> >>>
>> >>> I was looking at our "hbase-module" branch and as much as I like this
>> >>> idea, I think we've been a bit too idle with the branch. Maybe we should
>> >>> try to make something final e.g. for a version 4.1.
>> >>>
>> >>> So I thought to give an overview/status of the module's current
>> >>> capabilities and its shortcomings. We should figure out if we think this
>> >>> is good enough for a first version, or if we want to do some improvements
>> >>> to the module before adding it to our portfolio of MetaModel modules.
>> >>>
>> >>> 1) The module only offers read-only/query access to HBase. That is in my
>> >>> opinion OK for now, we have several such modules, and this is something we
>> >>> can better add later if we straighten out the remaining topics in this mail.
>> >>>
>> >>
>> >> No problem
>> >>
>> >>
>> >>> 2) With regards to metadata mapping: HBase is different because it has
>> >>> both column families and, within column families, columns. For the sake
>> >>> of our view on HBase I would describe column families simply as "a logical
>> >>> group of columns". Column families are fixed within a table, but rows in a
>> >>> table may contain arbitrary numbers of columns within each column family.
>> >>> So... You can instantiate the HBaseDataContext in two ways:
>> >>>
>> >>> 2a) You can let MetaModel discover the metadata. This unfortunately has a
>> >>> severe limitation. We discover the table names and column families using
>> >>> the HBase API. But the actual columns and their contents cannot be provided
>> >>> by the API. So instead we simply expose the column families with a MAP data
>> >>> type. The trouble with this is that the keys and values of the maps will
>> >>> simply be byte-arrays ... Usually not very useful! But it's sort of the
>> >>> only thing (as far as I can see) that's "safe" in HBase, since HBase allows
>> >>> anything (byte arrays) in its columns.
>> >>>
>> >>
>> >> I think we could maybe add a flag here to allow MetaModel to assume that
>> >> column keys are of String type. That would at least make the discovered
>> >> metadata more meaningful since we can expose columns and not just column
>> >> families. It's still going to be tough to figure out the value types, but
>> >> we could e.g. make the Column implementations mutable and allow setting
>> >> ColumnType on a "live" HBaseColumn.
>> >>
>> >>
>> >>> 2b) Like in e.g. the MongoDB or CouchDB modules you can provide an array of
>> >>> tables (SimpleTableDef). That way the user defines the metadata himself and
>> >>> the implementation assumes that it is correct (or else it will break). The
>> >>> good thing about this is that the user can define the proper data types
>> >>> etc. for columns. The user defines the column family and column name by
>> >>> defining the MetaModel column name like this: "family:name"
>> >>> (consistent with most HBase tools and API calls).
>> >>>
>> >>
>> >> This is good, but requires more of the user.
>> >>
>> >>
>> >>> 3) With regards to querying: We've implemented basic query capabilities
>> >>> using the MetaModel query postprocessor. But not all queries are very
>> >>> efficient... In addition to of course full table scans, we have optimized
>> >>> support for COUNT queries and for table scans with maxRows.
>> >>>
>> >>> We could rather easily add optimized support for a couple of other typical
>> >>> queries:
>> >>> * lookup record by ID
>> >>> * paged table scans (both firstRow and maxRows)
>> >>> * queries with simple filters/where items
>> >>>
>> >>
>> >> I think "lookup record by ID" is a MUST, since this is a whole other class
>> >> of queries in HBase (Get instead of Scan).
>> >>
>> >> Other optimizations would be nice too, but for the usage I have I could
>> >> live without it in the first release.
>> >>
>> >>
>> >>> 4) With regards to dependencies: The module right now depends on the
>> >>> artifact called "hbase-client". This dependency has a lot of transitive
>> >>> dependencies so the size of the module is quite extreme. As an example, it
>> >>> includes stuff like jetty, jersey, jackson and of course hadoop... But I am
>> >>> wondering if we can have a thinner client side than that! If anyone knows
>> >>> if e.g. we can use the REST interface easily or so, that would maybe be
>> >>> better.
>> >>> I'm not an expert on HBase though, so please enlighten me!
>> >>>
>> >>
>> >> This is a big problem IMO. Anyone with HBase client experience? Would be a
>> >> lot better with a thin client somehow.
>> >>
>> >>
>> >>> Kind regards,
>> >>> Kasper
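
For reference, the "family:name" naming convention from point 2b could be handled roughly as in the sketch below. This is only an illustration: the FamilyQualifier class and its methods are hypothetical and not part of MetaModel or HBase. It assumes that the first colon separates the family from the column name (so a column name may itself contain colons), that a name without a colon refers to a whole column family, and that names are encoded as UTF-8 bytes.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper (not part of MetaModel or HBase) that splits a
 * MetaModel column name of the form "family:name" into its two parts.
 */
final class FamilyQualifier {

    private final String family;
    private final String qualifier; // null when only a family was given

    private FamilyQualifier(String family, String qualifier) {
        this.family = family;
        this.qualifier = qualifier;
    }

    /** Splits at the FIRST colon; everything after it is the column name. */
    static FamilyQualifier parse(String columnName) {
        if (columnName == null || columnName.isEmpty()) {
            throw new IllegalArgumentException("Column name must not be empty");
        }
        final int idx = columnName.indexOf(':');
        if (idx < 0) {
            // No colon: treat the whole name as a column family (assumption).
            return new FamilyQualifier(columnName, null);
        }
        if (idx == 0) {
            throw new IllegalArgumentException("Missing column family in: " + columnName);
        }
        return new FamilyQualifier(columnName.substring(0, idx), columnName.substring(idx + 1));
    }

    String getFamily() {
        return family;
    }

    String getQualifier() {
        return qualifier;
    }

    /** Family as bytes; UTF-8 is an assumption made for this sketch. */
    byte[] familyBytes() {
        return family.getBytes(StandardCharsets.UTF_8);
    }

    /** Column name as bytes, or null when only a family was given. */
    byte[] qualifierBytes() {
        return qualifier == null ? null : qualifier.getBytes(StandardCharsets.UTF_8);
    }
}
```

With this, FamilyQualifier.parse("cf:name") yields the family "cf" and the column name "name", and the two byte arrays are the kind of values that HBase's byte-oriented client calls expect.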
