[ 
https://issues.apache.org/jira/browse/HBASE-17257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15750429#comment-15750429
 ] 

Daniel Vimont commented on HBASE-17257:
---------------------------------------

Here is a partial list of outstanding questions regarding this JIRA issue (to 
add a column-aliasing capability to HBase):

*META-ASSESSMENT*: Is this proposed feature (i.e. column-aliasing) of 
sufficient potential importance/usefulness to the HBase ecosystem to justify 
people taking the time to assess its technical viability via this JIRA entry? 
([~aoxiang] commented above "I think it is a very useful feature." This is 
helpful, and others have anecdotally given me similar comments in other venues 
in the past few months, but input from others would still be welcome, 
particularly from those who might take a contrary viewpoint.)

*QUESTION 1* -- Aside from the question of where to persist alias-mapping data 
(that question is below) -- _Is the overall architecture for aliasing as 
presented in patch-v3 (and as outlined in the specifications above)..._
(a) ... viable?
(b) ... optimal?

*QUESTION 2a* -- _Where should alias-mapping data (referred to hereafter as 
"hbase:alias") be persisted?_
I currently see three options:
(1) An existing system table (hbase:meta has been suggested). This might entail 
scalability problems, since system tables apparently cannot be split into more 
than one region.
(2) A separate table in the "hbase" namespace, but created/administered as a 
"normal" HBase table on RegionServer(s). [This is currently what is offered in 
patch-v3: "hbase:alias" is in the system namespace, but it is a standard HBase 
table created on a "normal" RegionServer. During my original design phase, I 
chose this option after realizing the scalability limitations inherent in 
Option 1, above.]
(3) A separate aliasMapping table for every alias-enabled Table.

*QUESTION 2b* -- _How can the fault tolerance of the hbase:alias table be 
improved?_
Given that read/write access to all alias-enabled column-families in a cluster 
is completely dependent upon uninterrupted read/write access to hbase:alias, 
are there heightened "fault tolerance" options available to decrease the 
chances of an unscheduled outage of hbase:alias? (Concerns about this have 
been relayed to me in offline discussions.)

*QUESTION 3* -- _Does basic read/write access to alias-enabled column-families 
need to be guaranteed when/if the Master server becomes unavailable?_
It seems to be a publicized feature of HBase that basic read/write access to 
HBase tables (i.e. to RegionServers) can continue uninterrupted even when the 
Master server becomes unavailable. However, in the current, patch-v3 
architecture, column-aliasing requires access to the Master server (to fulfill 
#getTableDescriptor RPC invocations that look up aliasing metadata for a table 
and its families). I have experimented with a potential "patch-v4" architecture 
in which all required metadata for a column-family is stored in hbase:alias, 
negating the need for invocation of #getTableDescriptor, and thus making 
alias-enabled tables still ostensibly accessible in the event of an outage of 
the Master server. I can finalize and submit this architectural variation upon 
request.

*THANKS TO ALL WHO HAVE READ TO THIS POINT!! I VERY MUCH APPRECIATE YOUR 
TIME!!!*

> Add column-aliasing capability to hbase-client
> ----------------------------------------------
>
>                 Key: HBASE-17257
>                 URL: https://issues.apache.org/jira/browse/HBASE-17257
>             Project: HBase
>          Issue Type: New Feature
>          Components: Client
>    Affects Versions: 2.0.0
>            Reporter: Daniel Vimont
>            Assignee: Daniel Vimont
>              Labels: features
>         Attachments: HBASE-17257-v2.patch, HBASE-17257-v3.patch, 
> HBASE-17257.patch
>
>
> Review Board link: https://reviews.apache.org/r/54635/
> Column aliasing will provide the option for a 1, 2, or 4 byte alias value to 
> be stored in each cell of an "alias enabled" column-family, in place of the 
> full-length column-qualifier. Aliasing is intended to operate completely 
> invisibly to the end-user developer, with absolutely no "awareness" of 
> aliasing required to be coded into a front-end application. No new public 
> hbase-client interfaces are to be introduced, and only a few new public 
> methods should need to be added to existing interfaces, primarily to allow an 
> administrator to designate that a new column-family is to be alias-enabled by 
> setting its aliasSize attribute to 1, 2, or 4.
> To facilitate such functionality, new subclasses of HTable, 
> BufferedMutatorImpl, and HTableMultiplexer are to be provided. The overriding 
> methods of these new subclasses will invoke methods of the new AliasManager 
> class to facilitate qualifier-to-alias conversions (for user-submitted Gets, 
> Scans, and Mutations) and alias-to-qualifier conversions (for Results 
> returned from HBase) for any Table that has one or more alias-enabled column 
> families. All conversion logic will be encapsulated in the new AliasManager 
> class, and all qualifier-to-alias mappings will be persisted in a new 
> aliasMappingTable in a new, reserved namespace.
> An informal poll of HBase users at HBaseCon East and at the 
> Strata/Hadoop-World conference in Sept. 2016 suggested that Column Aliasing 
> could be a popular enhancement to standard HBase functionality. Because the 
> full column-qualifier is stored in each cell, reducing this qualifier 
> storage requirement to 1, 2, or 4 bytes per cell could reduce storage 
> and bandwidth needs. Aliasing is 
> intended chiefly for column-families which are of the "narrow and tall" 
> variety (i.e., that are designed to use relatively few distinct 
> column-qualifiers throughout a large number of rows, throughout the lifespan 
> of the column-family). A column-family that is set up with an alias size of 
> 1 byte can contain up to 255 unique column-qualifiers; a 2-byte alias size 
> allows for up to 65,535 unique column-qualifiers; and a 4-byte alias size 
> allows for up to 4,294,967,295 unique column-qualifiers.
> Fuller specifications will be entered into the comments section below. Note 
> that it may well not be viable to add aliasing support in the new "async" 
> classes that appear to be currently under development.
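
To make the capacity figures in the description above concrete, here is a minimal in-memory sketch of qualifier-to-alias conversion for a single column-family. It is an illustration only and is not the AliasManager code from the patch: the names AliasSketch, FamilyAliases, maxQualifiers, toAlias, and toQualifier are hypothetical, the HashMaps stand in for whatever is persisted in hbase:alias, and the big-endian encoding and the choice to leave alias value 0 unused are assumptions made so that the limits come out to 255, 65,535, and 4,294,967,295.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of column aliasing; not taken from the HBASE-17257 patch. */
public class AliasSketch {

    /** Limit quoted in the description: 2^(8 * aliasSize) - 1 (assumes one value is unused). */
    public static long maxQualifiers(int aliasSize) {
        if (aliasSize != 1 && aliasSize != 2 && aliasSize != 4) {
            throw new IllegalArgumentException("aliasSize must be 1, 2, or 4");
        }
        return (1L << (8 * aliasSize)) - 1;
    }

    /** In-memory stand-in for the persisted mappings of one alias-enabled column-family. */
    public static class FamilyAliases {
        private final int aliasSize;
        private final Map<String, Long> qualifierToAlias = new HashMap<>();
        private final Map<Long, String> aliasToQualifier = new HashMap<>();

        public FamilyAliases(int aliasSize) {
            maxQualifiers(aliasSize); // validates the size
            this.aliasSize = aliasSize;
        }

        /** Returns the fixed-width alias for a qualifier, assigning the next free one if needed. */
        public byte[] toAlias(byte[] qualifier) {
            // ISO-8859-1 maps every byte to exactly one char, so arbitrary bytes round-trip.
            String key = new String(qualifier, StandardCharsets.ISO_8859_1);
            Long alias = qualifierToAlias.get(key);
            if (alias == null) {
                if (qualifierToAlias.size() >= maxQualifiers(aliasSize)) {
                    throw new IllegalStateException("alias space exhausted for this family");
                }
                alias = (long) qualifierToAlias.size() + 1; // aliases start at 1 (assumption)
                qualifierToAlias.put(key, alias);
                aliasToQualifier.put(alias, key);
            }
            byte[] out = new byte[aliasSize];
            long value = alias;
            for (int i = aliasSize - 1; i >= 0; i--) { // big-endian encoding (assumption)
                out[i] = (byte) (value & 0xFF);
                value >>>= 8;
            }
            return out;
        }

        /** Converts a stored alias back to the full qualifier, as would be done for Results. */
        public byte[] toQualifier(byte[] alias) {
            long value = 0;
            for (byte b : alias) {
                value = (value << 8) | (b & 0xFF);
            }
            String key = aliasToQualifier.get(value);
            if (key == null) {
                throw new IllegalArgumentException("unknown alias: " + value);
            }
            return key.getBytes(StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) {
        for (int size : new int[] {1, 2, 4}) {
            System.out.println(size + "-byte alias: up to " + maxQualifiers(size) + " qualifiers");
        }
        FamilyAliases family = new FamilyAliases(2);
        byte[] qualifier = "temperature_celsius".getBytes(StandardCharsets.UTF_8);
        byte[] alias = family.toAlias(qualifier);
        System.out.println(qualifier.length + " qualifier bytes stored as " + alias.length);
    }
}
```

In this sketch a 19-byte qualifier is replaced by a 2-byte alias in every cell, which is the per-cell saving the description refers to. A real implementation would also have to make alias assignment safe across concurrent clients, which this single-process example does not attempt.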



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
