[
https://issues.apache.org/jira/browse/ORC-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16254276#comment-16254276
]
Sandeep More commented on ORC-256:
----------------------------------
Thanks for the review [~owen.omalley] !
bq. We'd do better with an explicit Range class that held the start/end and
make an ArrayList of those. It would be much faster to iterate over. There
isn't any need for a synchronized container class in this context.
I am assuming you are talking about
[unmaskIndexRanges|https://github.com/apache/orc/pull/184/files#diff-620b37519a0f29b19e5f3a3c1ed34b5fR124]
map
The reason I did not use a separate Range class is because I wanted to keep the
Map sorted based on the index (basically when I normalize the negative
indexes). I use TreeMap to make sure that the map is always sorted (needed for
string substitution operations). Using LinkedList and Range class would work
but then I would have to call Collections.sort() every time I update the map.
Let me know if you prefer LinkedList implementation over the TreeMap
implementation, I can change it.
I can probably lose the synchronization on unmaskIndexRanges.
bq. You have some spurious whitespace changes at line 271 & 291.
Will fix this !
bq. It probably is easier to reason about the numerics if we: 1. Use the
original code if there is no unmasking. 2.Convert the original value to a
string, use maskString, and then convert it back.
I think I may not be following you completely here. The patch takes care of
this, for e.g. the function to mask long
[unmaskRangeLongValue|https://github.com/apache/orc/pull/184/files#diff-620b37519a0f29b19e5f3a3c1ed34b5fR915]
does the same, if no masking is needed (map unmaskIndexRanges is empty) return
original else convert the original value to string (String.valueOf(value)) and
mask it then convert it back.
Let me know your thoughts, and thanks again for the review !
P.S. do you think we should keep the the unmasking option for decimal given the
subtleties (precision, length) ?
> Add unmasked ranges option for redact mask
> ------------------------------------------
>
> Key: ORC-256
> URL: https://issues.apache.org/jira/browse/ORC-256
> Project: ORC
> Issue Type: Sub-task
> Reporter: Owen O'Malley
> Assignee: Sandeep More
>
> It would be good to extend the Redact DataMask so that you could leave
> certain ranges of strings unmasked.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)