[ https://issues.apache.org/jira/browse/ACCUMULO-164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573687#comment-13573687 ]

Christopher Tubbs commented on ACCUMULO-164:
--------------------------------------------

{quote}
The feature in its simplest sense does conflict with the original idea behind 
locality groups, but is that always "bad"? I'm not sure, but it's definitely 
different.
{quote}
We've already extended the original idea behind locality groups by allowing 
users to specify more than one column family for a locality group. And I think 
that is definitely not "bad" ("good", even). This is just an easier way to 
select multiple families to put in a locality group, based on a common 
characteristic (like a common prefix).
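
To make the "common prefix" selection concrete: Accumulo's 
`TableOperations.setLocalityGroups` takes a map from group name to a set of 
column families (as `Text`), so with today's API a prefix rule amounts to 
expanding the prefix into an explicit family set. A minimal sketch, assuming 
the families are known ahead of time and using plain `String`s in place of 
`Text`; `groupByPrefix` is a hypothetical helper, not an Accumulo API.

```java
import java.util.*;

final class PrefixLocalityGroups {
    // Hypothetical helper (not part of Accumulo): collect every known family
    // that starts with `prefix` into a single locality group named `groupName`.
    // The result has the same shape as the map passed to setLocalityGroups,
    // with String standing in for Text.
    static Map<String, Set<String>> groupByPrefix(String groupName, String prefix,
                                                  Collection<String> families) {
        Set<String> selected = new TreeSet<>();
        for (String family : families) {
            if (family.startsWith(prefix)) {
                selected.add(family);
            }
        }
        Map<String, Set<String>> groups = new HashMap<>();
        groups.put(groupName, selected);
        return groups;
    }

    public static void main(String[] args) {
        List<String> families = List.of("meta_a", "meta_b", "content", "meta_c", "index");
        System.out.println(groupByPrefix("meta", "meta_", families));
        // prints {meta=[meta_a, meta_b, meta_c]}
    }
}
```

Note that this only covers families that already exist when the groups are 
configured; families created later would be missed, which is presumably the 
gap a wildcard/regex setting would fill.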

However, I question why something like "common prefix" should be a desirable 
selection mechanism for multiple families in the first place. Not only are these 
data (in the case of the common prefix) already stored next to each other within 
a row without any use of locality groups, since keys sort by column family, but 
it's also not clear to me that "common prefix" is the most sensible way to group 
related families in the general case. I'm not sure there *is* a general case, 
though. Perhaps len < 4 is more useful than a common prefix for some users? 
Further, the only application for this that I can think of is when users 
introduce variability into the family that allows the number of distinct 
families to grow continuously (which, I think, can and should be done in the 
qualifier instead). So, I personally see little benefit to it, at least for the 
common prefix case (though full regexes or suffixes would certainly have greater 
benefit).
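
The "already grouped" point can be checked with a small sketch: within one row, 
keys are ordered by column family, so families sharing a prefix are already 
contiguous, whereas a rule like len < 4, a suffix, or a regex picks out families 
scattered through the row. This assumes a single row and ASCII family names 
(where Java `String` order matches Accumulo's byte order); the class and method 
names are hypothetical, and `Pattern.asMatchPredicate` needs Java 11+.

```java
import java.util.*;
import java.util.function.Predicate;
import java.util.regex.Pattern;

final class FamilySelectors {
    // Sort the families of a single row the way its keys are ordered
    // (by column family; for ASCII names String order equals byte order).
    static List<String> sortedFamilies(Collection<String> families) {
        List<String> sorted = new ArrayList<>(families);
        Collections.sort(sorted);
        return sorted;
    }

    // Positions, in sorted order, of the families accepted by `selector`.
    static List<Integer> positions(List<String> sorted, Predicate<String> selector) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            if (selector.test(sorted.get(i))) {
                out.add(i);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> sorted = sortedFamilies(
                List.of("tag_b", "abc", "tag_a", "xyz_idx", "body", "zz"));
        // sorted is [abc, body, tag_a, tag_b, xyz_idx, zz]
        Predicate<String> prefix = f -> f.startsWith("tag_");
        Predicate<String> shortName = f -> f.length() < 4;
        Predicate<String> suffix = f -> f.endsWith("_idx");
        Predicate<String> regex = Pattern.compile("[a-z]+_(a|idx)").asMatchPredicate();
        System.out.println("prefix:  " + positions(sorted, prefix));    // [2, 3] contiguous
        System.out.println("len < 4: " + positions(sorted, shortName)); // [0, 5] scattered
        System.out.println("suffix:  " + positions(sorted, suffix));    // [4]
        System.out.println("regex:   " + positions(sorted, regex));     // [2, 4] scattered
    }
}
```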

Maybe the most useful and general thing we could do to let users select 
families for a locality group is to allow them to inject a user-defined hash 
function (maybe in JEXL?) that bins families into discrete localities by 
whatever method they choose?
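
A minimal sketch of the binning idea, with a plain Java 
`Function<String, String>` standing in for a JEXL expression (a substitution, 
not what is proposed above verbatim): each family maps to a locality group name, 
or to `null` to stay in the default group. `FamilyBinning` and `bin` are 
hypothetical, and how a tablet server would evaluate such a function at write or 
compaction time is not modeled.

```java
import java.util.*;
import java.util.function.Function;

final class FamilyBinning {
    // Hypothetical extension point: `binner` maps a column family to the name
    // of a locality group, or to null to leave it in the default group.
    static Map<String, Set<String>> bin(Collection<String> families,
                                        Function<String, String> binner) {
        Map<String, Set<String>> groups = new TreeMap<>();
        for (String family : families) {
            String group = binner.apply(family);
            if (group != null) {
                groups.computeIfAbsent(group, g -> new TreeSet<>()).add(family);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> families = List.of("tag_a", "abc", "tag_b", "xyz_idx", "body", "zz");
        // Example user-defined binner: short names and index families get
        // their own groups; everything else stays in the default group.
        Function<String, String> binner = f -> {
            if (f.length() < 4) return "short";
            if (f.endsWith("_idx")) return "index";
            return null;
        };
        System.out.println(bin(families, binner)); // {index=[xyz_idx], short=[abc, zz]}
    }
}
```

Because a function returns exactly one bin per family, it cannot place a family 
in two groups, and the prefix, length, suffix, and regex rules all become 
special cases of it.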

{quote}
Do you have any ideas on how to present such a feature that would avoid 
steering the common user toward it? Is healthy warning/documentation sufficient?
{quote}
If implemented, I think documentation should be sufficient to address all of my 
concerns. The main thing is just to make it clear that the feature is used to 
*select* multiple column families, so that it's not implied that families with 
variability *are* the same "family". The API treats non-equal families as 
distinct, and that's how we should discuss them.
                
> Add support for wildcards/regexes in locality group setting.
> ------------------------------------------------------------
>
>                 Key: ACCUMULO-164
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-164
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: client, master, tserver
>            Reporter: John Vines
>             Fix For: 1.6.0
>
>
> We should look into adding the ability to specify locality group columns as 
> either wildcarding or regexes. I'm unsure of the feasibility of this, hence 
> the lack of fix date.
