[ 
https://issues.apache.org/jira/browse/CASSANDRA-17940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702693#comment-17702693
 ] 

Andres de la Peña commented on CASSANDRA-17940:
-----------------------------------------------

Thanks for the review.

I have been doing some profiling, as suggested by [~blerer]. It has shown that 
[this call to 
List#addAll|https://github.com/apache/cassandra/blob/641b906be5974776544a45b3f5ec3c81d7b11c53/src/java/org/apache/cassandra/cql3/functions/masking/ColumnMask.java#L123]
 on {{ColumnMask#mask}} was taking a toll on performance:
!read_ddm_enabled_default_addall.png|width=627,height=687!

I have make some changes to store the partial arguments as an array, use 
{{System.arraycopy}} when there are multiple arguments, and optimize for the 
case where there are no partial arguments. 

A further improvement, at least for native functions, is exploiting the fact 
that all the partial arguments in a column mask are fixed and can be either be 
memoized or ignored if the function doesn't take such arguments. I have added a 
new {{NativeFunction#Masker}} class to represent this. The {{ColumnMask}} 
caches the {{{}Masker{}}}, so it only has to call {{Masker#mask(ByteBuffer)}} 
with the value of the masked column. That allows us to avoid playing with 
collections on every call to {{{}ColumnMask#mask{}}}. It also skips the 
unserializing of the partial arguments on very function call.

After these two improvements, the flame graphs for running some queries look 
like:
!select_query_with_ddm.jpg|width=816,height=365!

 

Here with more detail on request processing:
!select_query_with_ddm_detail.jpg|width=816,height=365!

I don't see any noticable difference between unpatched trunk and the feature 
branch with DDM disabled. Enabling DDM without masking any column doesn't seem 
to make a difference either.

If we mask the queried columns with either {{mask_default}} or {{mask_replace}} 
the query times seem to increase a little bit. That increase seems to be caused 
not for the mask application but from the use of {{SelectionWithProcessing}} 
instead of {{{}SimpleSelection{}}}.

If we use {{mask_inner}} we see a more noticeable impact of masking. This is 
due to the masking function unserializing the column value and serializing its 
masked value, 
[here|https://github.com/apache/cassandra/blob/641b906be5974776544a45b3f5ec3c81d7b11c53/src/java/org/apache/cassandra/cql3/functions/masking/PartialMaskingFunction.java#L119-L121].
 I'd say this is acceptable, although perhaps we could try to optimize that 
function by delegating its action to the type serializer. In any case, that 
seems like something to do in a separate ticket. 

 

> CEP-20: Dynamic Data Masking
> ----------------------------
>
>                 Key: CASSANDRA-17940
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17940
>             Project: Cassandra
>          Issue Type: Epic
>          Components: Feature/Dynamic Data Masking
>            Reporter: Andres de la Peña
>            Assignee: Andres de la Peña
>            Priority: Normal
>             Fix For: 5.x
>
>         Attachments: read_ddm_enabled_default_addall.jpg, 
> read_ddm_enabled_default_addall.png, select_query_with_ddm.jpg, 
> select_query_with_ddm_detail.jpg
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Implementation of dynamic data masking as described in 
> [CEP-20|https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-20%3A+Dynamic+Data+Masking].
> Dynamic data masking (DDM) allows to obscure sensitive information while 
> still allowing access to the masked columns, and without changing the stored 
> data. The CEP aims to provide DDM for Apache Cassandra using both native and 
> user-defined CQL functions. These functions can be either be used on their 
> own or be attached to specific columns in the table definition.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to