[ 
https://issues.apache.org/jira/browse/ORC-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16254276#comment-16254276
 ] 

Sandeep More commented on ORC-256:
----------------------------------

Thanks for the review [~owen.omalley] !

bq. We'd do better with an explicit Range class that held the start/end and 
make an ArrayList of those. It would be much faster to iterate over. There 
isn't any need for a synchronized container class in this context.

I am assuming you are talking about 
[unmaskIndexRanges|https://github.com/apache/orc/pull/184/files#diff-620b37519a0f29b19e5f3a3c1ed34b5fR124]
 map 
The reason I did not use a separate Range class is because I wanted to keep the 
Map sorted based on the index (basically when I normalize the negative 
indexes). I use TreeMap to make sure that the map is always sorted (needed for 
string substitution operations). Using LinkedList and Range class would work 
but then I would have to call Collections.sort() every time I update the map. 
Let me know if you prefer LinkedList implementation over the TreeMap 
implementation, I can change it.

I can probably lose the synchronization on unmaskIndexRanges.

bq. You have some spurious whitespace changes at line 271 & 291.

Will fix this !

bq.  It probably is easier to reason about the numerics if we: 1. Use the 
original code if there is no unmasking. 2.Convert the original value to a 
string, use maskString, and then convert it back.

I think I may not be following you completely here. The patch takes care of 
this, for e.g. the function to mask long 
[unmaskRangeLongValue|https://github.com/apache/orc/pull/184/files#diff-620b37519a0f29b19e5f3a3c1ed34b5fR915]
 does the same, if no masking is needed (map unmaskIndexRanges is empty) return 
original else convert the original value to string (String.valueOf(value)) and 
mask it then convert it back.

Let me know your thoughts, and thanks again for the review !

P.S. do you think we should keep the the unmasking option for decimal given the 
subtleties (precision, length) ?







> Add unmasked ranges option for redact mask
> ------------------------------------------
>
>                 Key: ORC-256
>                 URL: https://issues.apache.org/jira/browse/ORC-256
>             Project: ORC
>          Issue Type: Sub-task
>            Reporter: Owen O'Malley
>            Assignee: Sandeep More
>
> It would be good to extend the Redact DataMask so that you could leave 
> certain ranges of strings unmasked.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to