ctubbsii commented on issue #88:
URL: https://github.com/apache/accumulo-access/issues/88#issuecomment-3644093932

   I think these are the actionable items:
   
   1. Be more specific in the EBNF to clarify what kinds of characters are 
valid (some subset of [Unicode character 
categories](https://www.unicode.org/reports/tr44/#General_Category_Values)).
      1. EBNF defines characters, not bytes
      2. Characters *MUST* be valid Unicode codepoints
      3. Characters *SHOULD* be human-readable (noncharacters, reserved 
codepoints, and control characters are not recommended, but are technically 
still allowed, because checking for them would be hard and I'm not sure it's 
worth it)
      4. UTF-8 *SHOULD* be used for persistence
   2. Deprecate and phase out APIs that allow inputting Authorizations and 
ColumnVisibility as bytes
      1. Use String or CharSequence to ensure that what is specified is a 
sequence of Unicode characters
      2. Add validation to deprecated APIs that still accept bytes
      3. Check on upgrade to see if any existing authorizations contain 
disallowed characters (unlikely, but not a bad thing to check for)
   3. When reading existing data, decode persisted bytes using a UTF-8 decoder, 
and treat decoding errors as an invalid visibility expression, in the same way 
that mismatched parentheses are treated as an invalid visibility 
expression.
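
A minimal sketch of item 3, assuming only the JDK: a `CharsetDecoder`
configured with `CodingErrorAction.REPORT` fails on malformed input instead of
silently substituting U+FFFD. The class name `StrictUtf8` and the choice of
`IllegalArgumentException` are hypothetical and are not part of the
accumulo-access API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not an accumulo-access class.
final class StrictUtf8 {
  private StrictUtf8() {}

  /** Decodes bytes as UTF-8 and rejects malformed sequences rather than replacing them. */
  static String decode(byte[] bytes) {
    try {
      return StandardCharsets.UTF_8.newDecoder()
          .onMalformedInput(CodingErrorAction.REPORT)
          .onUnmappableCharacter(CodingErrorAction.REPORT)
          .decode(ByteBuffer.wrap(bytes))
          .toString();
    } catch (CharacterCodingException e) {
      // Assumption: surface the failure the same way a parse error would be surfaced.
      throw new IllegalArgumentException("invalid visibility expression: not valid UTF-8", e);
    }
  }
}
```

Note that `new String(bytes, StandardCharsets.UTF_8)` would not work for this
purpose, because it replaces malformed sequences instead of reporting them.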
   
   I would be interested in a second beta release of accumulo-access whose API 
was String-centric, rather than one that allowed arbitrary bytes. I understand 
that there may be issues with that, but I'd be curious to see if we could make 
it work. I think at its core, the main issue here is that we assume people are 
going to use Unicode characters, but we allow raw bytes, which can break our 
assumptions.
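
For a String-centric API, the checks from items 1.2 and 1.3 could be sketched
roughly as follows. This assumes only the JDK; the class and method names are
hypothetical. One caveat: a Java `String` or `CharSequence` can still hold
unpaired surrogates, so the *MUST* check has to be done explicitly.

```java
// Hypothetical helper for illustration; not an accumulo-access class.
final class UnicodeChecks {
  private UnicodeChecks() {}

  /** MUST check: true if the sequence has no unpaired surrogates (well-formed UTF-16). */
  static boolean isWellFormed(CharSequence s) {
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (Character.isHighSurrogate(c)) {
        if (i + 1 >= s.length() || !Character.isLowSurrogate(s.charAt(i + 1))) {
          return false;
        }
        i++; // skip the low surrogate that completes the pair
      } else if (Character.isLowSurrogate(c)) {
        return false;
      }
    }
    return true;
  }

  /** SHOULD check: true if any code point is a control, surrogate, or unassigned code point. */
  static boolean hasDiscouraged(CharSequence s) {
    return s.codePoints().anyMatch(cp -> {
      int type = Character.getType(cp);
      return type == Character.CONTROL
          || type == Character.SURROGATE
          || type == Character.UNASSIGNED;
    });
  }
}
```

Under this sketch, a failed `isWellFormed` would be an error, while
`hasDiscouraged` would at most trigger a warning, matching the *MUST* versus
*SHOULD* split above.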


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
