Heejong Lee created BEAM-7008:
---------------------------------

Summary: standardize UTF-8 string coder encodings
Key: BEAM-7008
URL: https://issues.apache.org/jira/browse/BEAM-7008
Project: Beam
Issue Type: Bug
Components: sdk-java-core, sdk-py-core
Reporter: Heejong Lee
Assignee: Heejong Lee
The UTF-8 string coders in the Java and Python SDKs appear to use different encoding schemes. StringUtf8Coder in the Java SDK writes the varint-encoded length of the string's UTF-8 bytes before the actual data bytes, whereas StrUtf8Coder in the Python SDK encodes the input string directly to bytes, with no length prefix. We should unify the encoding of UTF-8 strings across the SDKs and make it a standard coder.
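To illustrate the mismatch, here is a minimal Python sketch of the two schemes as described above. It is a simplified model, not either SDK's actual code. The varint is assumed to be an unsigned base-128 (LEB128-style) encoding, and all helper names (`encode_varint`, `decode_varint`, `encode_length_prefixed`, `encode_raw`, `decode_length_prefixed`) are hypothetical.

```python
from typing import Tuple


def encode_varint(n: int) -> bytes:
    """Assumed varint format: unsigned base-128, low 7 bits first, MSB = 'more bytes follow'."""
    if n < 0:
        raise ValueError("length must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(byte)         # final byte: continuation bit clear
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode a varint starting at data[pos]; return (value, position after it)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_length_prefixed(s: str) -> bytes:
    """Hypothetical model of the Java-side scheme: varint byte length, then UTF-8 bytes."""
    payload = s.encode("utf-8")
    return encode_varint(len(payload)) + payload


def encode_raw(s: str) -> bytes:
    """Hypothetical model of the Python-side scheme: UTF-8 bytes only, no length."""
    return s.encode("utf-8")


def decode_length_prefixed(data: bytes, pos: int = 0) -> Tuple[str, int]:
    """Read one length-prefixed string; return (string, position after it)."""
    length, pos = decode_varint(data, pos)
    end = pos + length
    return data[pos:end].decode("utf-8"), end


if __name__ == "__main__":
    print(encode_length_prefixed("héllo"))  # b'\x06h\xc3\xa9llo'
    print(encode_raw("héllo"))              # b'h\xc3\xa9llo'
```

Because the length-prefixed form is self-delimiting, several values written back to back can be decoded one after another. The raw form carries no such boundary, so bytes produced by one scheme cannot be read correctly by a decoder expecting the other.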