[jira] [Commented] (CASSANDRA-16218) Simplify the almost duplicated code for calculating serialization size and serialization

2020-10-21 Thread Yifan Cai (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17218634#comment-17218634
 ] 

Yifan Cai commented on CASSANDRA-16218:
---

The benchmark result shows the new approach takes *14% more time* on 
calculating the serialized size. Attached the benchmark. So.. it is probably 
not worthy. We may instead just fix CASSANDRA-16103 and pay more attention when 
reviewing changes in serialize() and serializedSize().

 
{code:java}
c-16218
 [java] BenchmarkMode  Cnt  Score   Error  
Units
 [java] SerializationSizeBench.getSerializeSize  avgt6  2.290 ± 0.489  
ns/op

trunk
 [java] BenchmarkMode  Cnt  Score   Error  
Units
 [java] SerializationSizeBench.getSerializeSize  avgt6  2.003 ± 0.163  
ns/op{code}
 

> Simplify the almost duplicated code for calculating serialization size and 
> serialization
> 
>
> Key: CASSANDRA-16218
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16218
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Messaging/Client, Messaging/Internode
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: 4.0-beta
>
> Attachments: SerializationSizeBench.java
>
>
> The current pattern of counting the serialization size and the actual 
> serialization in the codebase is error-prone and hard to maintain. Those 2 
> code paths have almost the same code repeated. 
>  
> The pattern can be found in {{org.apache.cassandra.net.Message#Serializer}} 
> and numerous locations that use {{org.apache.cassandra.transport.CBCodec}}. 
>  
> I am proposing a new approach that lets both methods share the same code 
> path. The code basically looks like the below (in the case of 
> {{org.apache.cassandra.net.Message#Serializer}}).
> {code:java}
> // A fake DataOutputPlus that simply increment the size when invoking write* 
> methods
> public class SizeCountingDataOutput implements DataOutputPlus
> {
>  private int size = 0;
>  public int getSize()
>  {
>return size;
>  }
>  public void write(int b)
>  {
>size += 1;
>  }
>  public void write(byte[] b)
>  {
>size += b.length;
>  }
>  ...
> }
> {code}
> Therefore, in the size calculation, we can supply the fake data output and 
> call serialize to get the size.
> {code:java}
> private  int serializedSize(Message message, int version)
> {
>  SizeCountingDataOutput out = new SizeCountingDataOutput();
>  try
>  {
>serialize(message, out, version);
>  }
>  catch (IOException exception)
>  {
>throw new AssertionError("It should not happen!", exception);
>  }
>  return out.getSize();
> // The original implementation
> // return version >= VERSION_40 ? serializedSizePost40(message, version) : 
> serializedSizePre40(message, version);
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16218) Simplify the almost duplicated code for calculating serialization size and serialization

2020-10-21 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17218577#comment-17218577
 ] 

David Capwell commented on CASSANDRA-16218:
---

Idea sounds good to me, would simplify serializer logic.  I took a quick scan 
and the only negative I can find is extra function calls in some paths, but the 
common case had roughly equal.

> Simplify the almost duplicated code for calculating serialization size and 
> serialization
> 
>
> Key: CASSANDRA-16218
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16218
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Messaging/Client, Messaging/Internode
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: 4.0-beta
>
>
> The current pattern of counting the serialization size and the actual 
> serialization in the codebase is error-prone and hard to maintain. Those 2 
> code paths have almost the same code repeated. 
>  
> The pattern can be found in {{org.apache.cassandra.net.Message#Serializer}} 
> and numerous locations that use {{org.apache.cassandra.transport.CBCodec}}. 
>  
> I am proposing a new approach that lets both methods share the same code 
> path. The code basically looks like the below (in the case of 
> {{org.apache.cassandra.net.Message#Serializer}}).
> {code:java}
> // A fake DataOutputPlus that simply increment the size when invoking write* 
> methods
> public class SizeCountingDataOutput implements DataOutputPlus
> {
>  private int size = 0;
>  public int getSize()
>  {
>return size;
>  }
>  public void write(int b)
>  {
>size += 1;
>  }
>  public void write(byte[] b)
>  {
>size += b.length;
>  }
>  ...
> }
> {code}
> Therefore, in the size calculation, we can supply the fake data output and 
> call serialize to get the size.
> {code:java}
> private  int serializedSize(Message message, int version)
> {
>  SizeCountingDataOutput out = new SizeCountingDataOutput();
>  try
>  {
>serialize(message, out, version);
>  }
>  catch (IOException exception)
>  {
>throw new AssertionError("It should not happen!", exception);
>  }
>  return out.getSize();
> // The original implementation
> // return version >= VERSION_40 ? serializedSizePost40(message, version) : 
> serializedSizePre40(message, version);
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org