keith-turner commented on issue #88:
URL: https://github.com/apache/accumulo-access/issues/88#issuecomment-3642831573

   One path we could consider is following the blueprint of what we did with 
removing table properties from the system config (see 
https://github.com/apache/accumulo/issues/4537). That could look something 
like the following.
   
    1. In 2.1.5 or later, offer a new tool that can scan tables and find 
non-UTF-8 visibility fields. The tool would return the offending keys so the 
user could take corrective action on them.
    2. In the 2.1.5 pre-upgrade step, warn users that they should run this 
tool if they suspect they might have offending visibility fields.
    3. In 4.0, the code would simply fail when it sees non-UTF-8 characters on 
ingest and scan.
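
A minimal sketch of the validity check that steps 1 and 3 both depend on, assuming the visibility field is available as raw bytes. The class and method names (`VisibilityUtf8Check`, `isValidUtf8`) are hypothetical, not an existing Accumulo API. Note that `new String(bytes, StandardCharsets.UTF_8)` silently substitutes U+FFFD for malformed input, so the sketch configures a `CharsetDecoder` to report errors instead:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class VisibilityUtf8Check {

  /** Returns true if the bytes form well-formed UTF-8 (hypothetical helper). */
  public static boolean isValidUtf8(byte[] bytes) {
    // A fresh decoder per call: CharsetDecoder is stateful and not thread-safe.
    CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT);
    try {
      // Treats the buffer as complete input, so a truncated sequence is an error too.
      decoder.decode(ByteBuffer.wrap(bytes));
      return true;
    } catch (CharacterCodingException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    byte[] ok = "A&(B|C)".getBytes(StandardCharsets.UTF_8);
    byte[] bad = {'A', '&', (byte) 0xFF}; // 0xFF never appears in UTF-8
    System.out.println(isValidUtf8(ok));  // true
    System.out.println(isValidUtf8(bad)); // false
  }
}
```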
   
   Writing an iterator that only returns keys with non-UTF-8 visibilities would 
be easy. The harder part would be running it over all the data; that would 
probably need to be something that could be done piecemeal.
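
A rough, self-contained sketch of such a scan, assuming the table can be modeled as a sorted map from row key to visibility bytes. `NonUtf8VisibilityScan`, `scanBatch`, and `Batch` are hypothetical names. In Accumulo itself the predicate would more likely live in a `Filter` subclass whose `accept(Key, Value)` inspects `Key.getColumnVisibilityData()`; here, processing a bounded batch and returning a resume key stands in for the piecemeal pass over all data:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class NonUtf8VisibilityScan {

  /** Offending keys from one batch, the last key examined, and whether the end was reached. */
  public record Batch(List<String> offending, String lastKey, boolean done) {}

  public static boolean isValidUtf8(byte[] bytes) {
    try {
      StandardCharsets.UTF_8.newDecoder()
          .onMalformedInput(CodingErrorAction.REPORT)
          .onUnmappableCharacter(CodingErrorAction.REPORT)
          .decode(ByteBuffer.wrap(bytes));
      return true;
    } catch (CharacterCodingException e) {
      return false;
    }
  }

  /** Examines at most batchSize entries strictly after resumeAfter (null = start of table). */
  public static Batch scanBatch(NavigableMap<String, byte[]> table, String resumeAfter,
      int batchSize) {
    NavigableMap<String, byte[]> view =
        resumeAfter == null ? table : table.tailMap(resumeAfter, false);
    List<String> offending = new ArrayList<>();
    String last = resumeAfter;
    int seen = 0;
    for (Map.Entry<String, byte[]> e : view.entrySet()) {
      if (seen == batchSize) {
        return new Batch(offending, last, false); // more entries remain
      }
      // Filter predicate: keep only entries whose visibility is NOT valid UTF-8.
      if (!isValidUtf8(e.getValue())) {
        offending.add(e.getKey());
      }
      last = e.getKey();
      seen++;
    }
    return new Batch(offending, last, true); // reached the end of the table
  }

  public static void main(String[] args) {
    NavigableMap<String, byte[]> table = new TreeMap<>();
    table.put("row1", "A&B".getBytes(StandardCharsets.UTF_8));
    table.put("row2", new byte[] {'A', (byte) 0xC3}); // truncated 2-byte sequence
    table.put("row3", "C|D".getBytes(StandardCharsets.UTF_8));
    table.put("row4", new byte[] {(byte) 0xFF});

    List<String> all = new ArrayList<>();
    String checkpoint = null;
    while (true) {
      Batch b = scanBatch(table, checkpoint, 2);
      all.addAll(b.offending());
      if (b.done()) {
        break;
      }
      checkpoint = b.lastKey(); // could be persisted to resume a later run
    }
    System.out.println(all); // [row2, row4]
  }
}
```

Because each batch reports the last key it examined, a long-running pass could save that checkpoint and resume from it later instead of scanning everything in one go.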
   
   One thing I am not sure about with the above is doing the validation at 
ingest time: what would that look like for bulk import and batch write? If 
scans fail on non-UTF-8 data, then we would want to make it really hard to 
ingest that data into the system. If we do checks before the 2.1 -> 4.0 
upgrade but it's still easy for users to ingest offending data after the 
upgrade, that could cause unwanted surprises at scan time.
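
For the batch-write path, one option is a check with the same shape as an Accumulo constraint, whose `check` method returns violation codes for a `Mutation`. The sketch below mimics that shape without Accumulo on the classpath; `Update`, `check`, and the violation code value are hypothetical. Since constraints act on mutations, a check like this would presumably not cover bulk-imported files, which would still need their own validation step:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class Utf8VisibilityWriteCheck {

  /** Hypothetical stand-in for one column update within a mutation. */
  public record Update(String family, String qualifier, byte[] visibility) {}

  /** Hypothetical violation code for a visibility that is not valid UTF-8. */
  public static final short NON_UTF8_VISIBILITY = 1;

  static boolean isValidUtf8(byte[] bytes) {
    try {
      StandardCharsets.UTF_8.newDecoder()
          .onMalformedInput(CodingErrorAction.REPORT)
          .onUnmappableCharacter(CodingErrorAction.REPORT)
          .decode(ByteBuffer.wrap(bytes));
      return true;
    } catch (CharacterCodingException e) {
      return false;
    }
  }

  /** Returns an empty list if the updates are acceptable, else the violation codes. */
  public static List<Short> check(List<Update> mutationUpdates) {
    for (Update u : mutationUpdates) {
      if (!isValidUtf8(u.visibility())) {
        // Reject the whole mutation as soon as one visibility is malformed.
        return List.of(NON_UTF8_VISIBILITY);
      }
    }
    return List.of();
  }

  public static void main(String[] args) {
    List<Update> good = List.of(new Update("f", "q", "A&B".getBytes(StandardCharsets.UTF_8)));
    List<Update> bad = List.of(
        new Update("f", "q1", "A".getBytes(StandardCharsets.UTF_8)),
        new Update("f", "q2", new byte[] {(byte) 0xFF}));
    System.out.println(check(good)); // []
    System.out.println(check(bad));  // [1]
  }
}
```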
   
   


-- 
This is an automated message from the Apache Git Service.