keith-turner commented on issue #88: URL: https://github.com/apache/accumulo-access/issues/88#issuecomment-3642831573
One path we could consider is following the blueprint of what we did when removing table properties from the system config (see https://github.com/apache/accumulo/issues/4537). That could look something like the following:

1. In 2.1.5 or later, offer a new tool that can scan tables and find visibility fields that are not valid UTF-8. The tool would return the offending keys, and the user could then take corrective action on them.
2. In the 2.1.5 pre-upgrade step, warn users that they should run this tool if they suspect they might have offending visibility fields.
3. In 4.0, the code will simply fail when it sees non-UTF-8 characters on ingest and scan.

Writing an iterator that only returns keys with non-UTF-8 visibilities would be easy. The harder part would be running it over all the data; that would probably need to be something that could be done piecemeal.

One thing I am not sure about with the above is doing the validation at ingest time: what would that look like for bulk import and batch write? If scans would fail on non-UTF-8 data, then we would want to make it really hard to ingest that data into the system. If we do checks before the 2.1 to 4.0 upgrade but it's still easy for users to ingest offending data after the upgrade, then there is potential for unwanted surprises at scan time.
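The core check behind the iterator mentioned above (return only keys whose visibility is not valid UTF-8) could look roughly like the following. This is a minimal sketch in plain Java, not Accumulo code: `NonUtf8VisibilityCheck`, `isValidUtf8`, and `findOffendingRows` are hypothetical names, and visibilities are modeled as a simple row-to-bytes map rather than real Accumulo `Key`s. In Accumulo itself this test would presumably sit inside an iterator, for example a `Filter` subclass whose `accept(Key, Value)` inspects the key's column visibility bytes; that wiring is omitted so the sketch runs without Accumulo on the classpath.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the check an "offending visibility" finder could use.
public class NonUtf8VisibilityCheck {

  // Returns true only if the bytes decode as strict UTF-8.
  public static boolean isValidUtf8(byte[] bytes) {
    // A fresh decoder per call, since CharsetDecoder instances are not thread-safe.
    CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
        .onMalformedInput(CodingErrorAction.REPORT)        // throw instead of substituting U+FFFD
        .onUnmappableCharacter(CodingErrorAction.REPORT);
    try {
      decoder.decode(ByteBuffer.wrap(bytes)); // treats the buffer as complete input
      return true;
    } catch (CharacterCodingException e) {
      return false; // malformed or truncated byte sequence
    }
  }

  // Returns the rows whose visibility bytes are not valid UTF-8, in map iteration order.
  public static List<String> findOffendingRows(Map<String, byte[]> visibilityByRow) {
    List<String> offending = new ArrayList<>();
    for (Map.Entry<String, byte[]> e : visibilityByRow.entrySet()) {
      if (!isValidUtf8(e.getValue())) {
        offending.add(e.getKey());
      }
    }
    return offending;
  }

  public static void main(String[] args) {
    Map<String, byte[]> sample = new LinkedHashMap<>();
    sample.put("row1", "A&B".getBytes(StandardCharsets.UTF_8));
    sample.put("row2", "\u00e9t\u00e9".getBytes(StandardCharsets.UTF_8)); // valid multi-byte UTF-8
    sample.put("row3", new byte[] {'A', '&', (byte) 0xFF});              // 0xFF never occurs in UTF-8
    sample.put("row4", new byte[] {(byte) 0xC3});                        // truncated two-byte sequence
    System.out.println(findOffendingRows(sample)); // prints [row3, row4]
  }
}
```

Strict decoding with `CodingErrorAction.REPORT` is used because the default `new String(bytes, UTF_8)` silently replaces bad bytes, which would hide exactly the data the tool is meant to find. The same `isValidUtf8` test is the kind of check ingest-time validation would need to apply to each visibility, though where it would hook into bulk import and batch write is the open question raised above.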
