[ https://issues.apache.org/jira/browse/BEAM-5404?focusedWorklogId=144855&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-144855 ]
ASF GitHub Bot logged work on BEAM-5404:
----------------------------------------

Author: ASF GitHub Bot
Created on: 17/Sep/18 14:26
Start Date: 17/Sep/18 14:26
Worklog Time Spent: 10m
Work Description: nielm opened a new pull request #6407: [BEAM-5404] Use Java serialization for MutationGroup objects.
URL: https://github.com/apache/beam/pull/6407

The Cloud Spanner connector uses a custom serialization system for MutationGroup objects.

Java serialization is much more efficient than the custom serialization system used by MutationGroupEncoder in both speed and space: the encoded byte arrays are one tenth of the size.

This PR replaces the custom serialization with simple Java serialization using the Beam SerializableCoder class.

@chamikaramj
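The change described above relies on standard Java object serialization, which Beam's `SerializableCoder` applies to any `Serializable` type. The stdlib-only sketch below shows that round trip. `SimpleMutation` and `SimpleMutationGroup` are hypothetical stand-ins for the Spanner `Mutation` and Beam `MutationGroup` classes, and the `encode`/`decode` helpers only approximate what `SerializableCoder` does internally. This is not the connector's actual code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class JavaSerializationSketch {

  // Hypothetical stand-in for a Spanner Mutation: a table, an operation and column values.
  static final class SimpleMutation implements Serializable {
    private static final long serialVersionUID = 1L;
    final String table;
    final String op;
    final Map<String, String> values;

    SimpleMutation(String table, String op, Map<String, String> values) {
      this.table = table;
      this.op = op;
      this.values = new LinkedHashMap<>(values); // serializable copy of the column values
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof SimpleMutation)) return false;
      SimpleMutation m = (SimpleMutation) o;
      return table.equals(m.table) && op.equals(m.op) && values.equals(m.values);
    }

    @Override
    public int hashCode() {
      return Objects.hash(table, op, values);
    }
  }

  // Hypothetical stand-in for MutationGroup: one primary mutation plus attached ones.
  static final class SimpleMutationGroup implements Serializable {
    private static final long serialVersionUID = 1L;
    final SimpleMutation primary;
    final List<SimpleMutation> attached;

    SimpleMutationGroup(SimpleMutation primary, List<SimpleMutation> attached) {
      this.primary = primary;
      this.attached = new ArrayList<>(attached);
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof SimpleMutationGroup)) return false;
      SimpleMutationGroup g = (SimpleMutationGroup) o;
      return primary.equals(g.primary) && attached.equals(g.attached);
    }

    @Override
    public int hashCode() {
      return Objects.hash(primary, attached);
    }
  }

  // Writes the whole object graph to a byte array with standard Java serialization.
  static byte[] encode(Serializable value) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(value);
    } // closing the stream flushes it before the bytes are read
    return bytes.toByteArray();
  }

  // Reads the object graph back and casts it to the expected type.
  static <T> T decode(byte[] data, Class<T> type) throws IOException, ClassNotFoundException {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
      return type.cast(in.readObject());
    }
  }

  public static void main(String[] args) throws Exception {
    SimpleMutationGroup group = new SimpleMutationGroup(
        new SimpleMutation("Singers", "INSERT", Map.of("SingerId", "1", "Name", "Alice")),
        List.of(new SimpleMutation("Albums", "INSERT", Map.of("SingerId", "1", "AlbumId", "7"))));

    byte[] encoded = encode(group);
    SimpleMutationGroup decoded = decode(encoded, SimpleMutationGroup.class);
    System.out.println("encoded bytes: " + encoded.length);
    System.out.println("round trip equal: " + group.equals(decoded));
  }
}
```

In a Beam pipeline, the PR's approach would correspond to setting something like `SerializableCoder.of(MutationGroup.class)` on the relevant collection, so the coder performs this kind of round trip for each element and the connector no longer needs hand-written encoding logic.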
Issue Time Tracking
-------------------

Worklog Id: (was: 144855)
Time Spent: 10m
Remaining Estimate: 0h

> Inefficient Serialization of Spanner MutationGroup in pipeline
> --------------------------------------------------------------
>
> Key: BEAM-5404
> URL: https://issues.apache.org/jira/browse/BEAM-5404
> Project: Beam
> Issue Type: Bug
> Components: io-java-gcp
> Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.6.0
> Reporter: Niel Markwick
> Assignee: Chamikara Jayalath
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The Cloud Spanner connector uses a custom serialization mechanism to convert
> MutationGroup objects into a byte array.
> This mechanism is very inefficient, producing byte arrays approximately 10x
> larger than simple Java serialization of the MutationGroup objects, which
> increases the resources needed by the connector to ~40x the size of the
> original mutations.
> There are no obvious benefits to using this custom serialization system, as
> the objects are deserialized within the pipeline itself.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)