[jira] [Comment Edited] (CASSANDRA-14781) Log message when mutation passed to CommitLog#add(Mutation) is too large is not descriptive enough

Jordan West (Jira) Mon, 06 Jan 2020 13:30:24 -0800


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-14781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009182#comment-17009182
 ]


Jordan West edited comment on CASSANDRA-14781 at 1/6/20 9:29 PM:
-----------------------------------------------------------------

Thanks for picking this up [~n.v.harikrishna]. Comments below. 

 

General:
 * Since this change logs keys which in some use cases may contain PII that 
users may not want logged, should we add a flag to gate this feature? 
 * I don’t believe this address [~aleksey]'s concern that the exception be 
propagated to the client vs. timing out. At a minimum, we should add an in-jvm 
dtest to show it does (I’m happy to help contribute this unless you’d like to). 
Its possible to split the patch into two parts (one that changes the logging 
and one that propagates the exception), however, in prod I’ve seen the lack of 
this logging and the timeouts be an issue so they are definitely both worth 
addressing. 

CommitLog:
 * With your change to validate the size using serializedSize, it looks like 
its no longer be necessary to use the scratch buffer (which seems to be used 
primarily to get the length w/o calculating serialized size). However, I wonder 
if the original approach is better performance wise since we are already 
serializing the mutation. It would be good to benchmark workloads with all 
mutations exceeding max size, no mutations exceeding max size, and mixed to 
check because I can imagine how what I said is wrong as well. We should check 
if there is any difference or if its negligible. 

SizeHolder:
 * Since the only call to {{#getSize()}} currently is 
{{IMutation#validateSize}} which is only called once from CommitLog.java. 
Consider removing the memoization. 
 * If the memoization is still desired, I don’t think its necessary to have 
fields for each version since a mutation is executed within a single version of 
Cassandra (it may cross network boundaries and thus versions but at this point 
we have a new SizeHolder). Using a single variable would cleanup {{#getSize}} 
considerably. Further, it would be cleaner/less error prone to memoize the size 
in the mutation since nothing maps the object passed to {{#getSize}} to the 
stored value. 

MutationExceededMaxSizeException:
 * {{prepareMessage}} recalculates the mutation size. While this is an 
exceptional case it is still common. Perhaps a place where the memoization 
could be helpful? 
 * Spelling error: maxMutaionSize
 * I almost always prefer a configurable limit to something hardcoded and 
requiring recompilation to change 

Other:
 * I don’t think {{IMutation.getMaxMutationSize()}} is necessary. Use 
{{DatabaseDescriptor}} instead. This is more typical of Cassandra config code. 
Further, the original code only called 
{{DatabaseDescriptor.getMaxMutationSize()}} once instead of for every write. 
Unless we are going to make it a hot-prop its worth putting back to how it was. 
 * We have a few {{Serializer}} interfaces in C* already, consider renaming 
{{SizeHolder.Serializer}} if its kept


was (Author: jrwest):
Thanks for picking this up [~n.v.harikrishna]. Comments below. 

 
 * General:
 * Since this change logs keys which in some use cases may contain PII that 
users may not want logged, should we add a flag to gate this feature? 
 * I don’t believe this address [~aleksey]'s concern that the exception be 
propagated to the client vs. timing out. At a minimum, we should add an in-jvm 
dtest to show it does (I’m happy to help contribute this unless you’d like to). 
Its possible to split the patch into two parts (one that changes the logging 
and one that propagates the exception), however, in prod I’ve seen the lack of 
this logging and the timeouts be an issue so they are definitely both worth 
addressing. 


 * CommitLog:
 * With your change to validate the size using serializedSize, it looks like 
its no longer be necessary to use the scratch buffer (which seems to be used 
primarily to get the length w/o calculating serialized size). However, I wonder 
if the original approach is better performance wise since we are already 
serializing the mutation. It would be good to benchmark workloads with all 
mutations exceeding max size, no mutations exceeding max size, and mixed to 
check because I can imagine how what I said is wrong as well. We should check 
if there is any difference or if its negligible. 


 * SizeHolder:
 * Since the only call to \{{#getSize()}} currently is 
\{{IMutation#validateSize}} which is only called once from CommitLog.java. 
Consider removing the memoization. 
 * If the memoization is still desired, I don’t think its necessary to have 
fields for each version since a mutation is executed within a single version of 
Cassandra (it may cross network boundaries and thus versions but at this point 
we have a new SizeHolder). Using a single variable would cleanup \{{#getSize}} 
considerably. Further, it would be cleaner/less error prone to memoize the size 
in the mutation since nothing maps the object passed to \{{#getSize}} to the 
stored value. 


 * MutationExceededMaxSizeException:
 * {\{prepareMessage}} recalculates the mutation size. While this is an 
exceptional case it is still common. Perhaps a place where the memoization 
could be helpful? 
 * Spelling error: maxMutaionSize
 * I almost always prefer a configurable limit to something hardcoded and 
requiring recompilation to change 


 * I don’t think \{{IMutation.getMaxMutationSize()}} is necessary. Use 
\{{DatabaseDescriptor}} instead. This is more typical of Cassandra config code. 
Further, the original code only called 
\{{DatabaseDescriptor.getMaxMutationSize()}} once instead of for every write. 
Unless we are going to make it a hot-prop its worth putting back to how it was. 
 * We have a few \{{Serializer}} interfaces in C* already, consider renaming 
\{{SizeHolder.Serializer}} if its kept

> Log message when mutation passed to CommitLog#add(Mutation) is too large is 
> not descriptive enough
> --------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14781
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14781
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Hints, Local/Commit Log, Messaging/Client
>            Reporter: Jordan West
>            Assignee: Tom Petracca
>            Priority: Normal
>              Labels: protocolv5
>             Fix For: 4.0-beta
>
>         Attachments: CASSANDRA-14781.patch, CASSANDRA-14781_3.0.patch, 
> CASSANDRA-14781_3.11.patch
>
>
> When hitting 
> [https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/commitlog/CommitLog.java#L256-L257],
>  the log message produced does not help the operator track down what data is 
> being written. At a minimum the keyspace and cfIds involved would be useful 
> (and are available) – more detail might not be reasonable to include. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Comment Edited] (CASSANDRA-14781) Log message when mutation passed to CommitLog#add(Mutation) is too large is not descriptive enough

Reply via email to