[ 
https://issues.apache.org/jira/browse/HBASE-18075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16019020#comment-16019020
 ] 

Josh Elser commented on HBASE-18075:
------------------------------------

{code}
import java.util.HashSet;
import java.util.Set;

public class ZkDisallowedCheck {
  public static void main(String[] args) {
    // Code points that ZooKeeper's data model docs disallow in znode names.
    Set<Integer> disallowedUnicodeValues = new HashSet<>();
    // U+0001 through U+0019
    for (int i = 1; i <= 25; i++) {
      disallowedUnicodeValues.add(i);
    }
    // U+007F through U+009F
    for (int i = 127; i <= 159; i++) {
      disallowedUnicodeValues.add(i);
    }
    // U+D800 through U+F8FF
    for (int i = 55296; i <= 63743; i++) {
      disallowedUnicodeValues.add(i);
    }
    // U+FFF0 through U+FFFF
    for (int i = 65520; i <= 65535; i++) {
      disallowedUnicodeValues.add(i);
    }
    // Report any disallowed code point that Java considers alphabetic.
    for (int unicodeValue : disallowedUnicodeValues) {
      if (Character.isAlphabetic(unicodeValue)) {
        System.out.println(unicodeValue + " is alphabetic");
      }
    }
    System.out.println("Done");
  }
}
{code}

The above simply prints "Done", so none of the code points that ZooKeeper 
disallows are alphabetic. Let me modify the patch to remove the explicit 
check and just update the documentation to point to ZK for future devs in this 
part of the code.
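
For anyone following along, here is a minimal sketch of the kind of pattern 
the issue description proposes. The class name, the LEGAL_NAME constant, and 
the isLegalName helper are hypothetical and are not taken from the patch; the 
pattern is just the approximate regex from the description with a-zA-Z 
swapped for Java's IsAlphabetic property.

```java
import java.util.regex.Pattern;

// Hypothetical sketch only; this is not HBase's actual name validation code.
class NameCheckSketch {
  // Assumed pattern: the approximate [a-zA-Z0-9]+ from the issue description,
  // with a-zA-Z replaced by the Unicode Alphabetic property.
  static final Pattern LEGAL_NAME = Pattern.compile("[\\p{IsAlphabetic}0-9]+");

  // Returns true when every character is alphabetic (in any script) or 0-9.
  static boolean isLegalName(String name) {
    return LEGAL_NAME.matcher(name).matches();
  }

  public static void main(String[] args) {
    System.out.println(isLegalName("table1"));                                     // Latin
    System.out.println(isLegalName("\u0442\u0430\u0431\u043b\u0438\u0446\u0430")); // Cyrillic
    System.out.println(isLegalName("\u03b1\u03b2\u03b3"));                         // Greek
    System.out.println(isLegalName("bad\u0001name"));                              // control character
  }
}
```

Under these assumptions the first three names match and the last one does 
not, which is consistent with the result above: the alphabetic class never 
admits a character that ZooKeeper would reject.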

> Support namespaces and tables with non-latin alphabetical characters
> --------------------------------------------------------------------
>
>                 Key: HBASE-18075
>                 URL: https://issues.apache.org/jira/browse/HBASE-18075
>             Project: HBase
>          Issue Type: Improvement
>          Components: Client
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>             Fix For: 2.0.0
>
>         Attachments: HBASE-18075.001.patch, HBASE-18075.002.patch
>
>
> On the heels of HBASE-18067, it would be nice to support namespaces and 
> tables with names that fall outside of Latin alphabetical characters and 
> numbers.
> Our current regex for allowable characters is approximately 
> {{\[a-zA-Z0-9\]+}}.
> It would be nice to replace {{a-zA-Z}} with Java's {{\p\{IsAlphabetic\}}}, 
> which will naturally restrict the Unicode character space down to just those 
> characters that are alphabetic in each script (e.g. Latin, Cyrillic, Greek).
> Technically, our possible scope of allowable characters is, best as I can 
> tell, only limited by the limitations of ZooKeeper itself 
> https://zookeeper.apache.org/doc/r3.4.10/zookeeperProgrammers.html#ch_zkDataModel
>  (as both table and namespace are created as znodes).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
