[ 
https://issues.apache.org/jira/browse/BEAM-7008?focusedWorklogId=223296&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-223296
 ]

ASF GitHub Bot logged work on BEAM-7008:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 04/Apr/19 22:14
            Start Date: 04/Apr/19 22:14
    Worklog Time Spent: 10m 
      Work Description: robertwb commented on pull request #8228: [BEAM-7008] standardize UTF-8 string coder encodings
URL: https://github.com/apache/beam/pull/8228#discussion_r272387474
 
 

 ##########
 File path: sdks/python/apache_beam/coders/coder_impl.py
 ##########
 @@ -433,6 +433,18 @@ def decode(self, encoded):
     return encoded
 
 
+class StrUtf8CoderImpl(StreamCoderImpl):
+  """For internal use only; no backwards-compatibility guarantees."""
+  def encode_to_stream(self, value, out, nested):
+    byte_value = value.encode('utf-8')
+    out.write_var_int64(len(byte_value))
+    out.write(byte_value)
+
+  def decode_from_stream(self, in_stream, nested):
+    byte_length = in_stream.read_var_int64()
 
 Review comment:
   Similarly, read the length prefix only when `nested` is true; otherwise read 
to the end of the stream. This can be done with `in_stream.read_all(nested)`.
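
The behaviour this comment asks for (a length prefix only in the nested context, and a read to the end of the stream otherwise) can be illustrated without Beam's own stream classes. The following is a minimal sketch, assuming plain `io.BytesIO` streams; the helpers `write_var_int64`, `read_var_int64`, `encode_str` and `decode_str` are hypothetical stand-ins written for this example, not the code from the pull request.

```python
import io


def write_var_int64(out, n):
    """Write a non-negative int as a base-128 varint (7 bits per byte)."""
    while True:
        bits = n & 0x7F
        n >>= 7
        if n:
            out.write(bytes([bits | 0x80]))  # more bytes follow
        else:
            out.write(bytes([bits]))  # last byte, high bit clear
            return


def read_var_int64(in_stream):
    """Read a varint produced by write_var_int64."""
    result = 0
    shift = 0
    while True:
        byte = in_stream.read(1)[0]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7


def encode_str(value, out, nested):
    """Hypothetical helper: prefix the length only when nested."""
    data = value.encode('utf-8')
    if nested:
        write_var_int64(out, len(data))
    out.write(data)


def decode_str(in_stream, nested):
    """Hypothetical helper: read a prefixed length, else read to the end."""
    if nested:
        data = in_stream.read(read_var_int64(in_stream))
    else:
        data = in_stream.read()
    return data.decode('utf-8')
```

In this sketch a nested value can be followed by further data in the same stream, while a non-nested value is assumed to occupy the rest of the stream, which is why it needs no length prefix.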
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 223296)
    Time Spent: 20m  (was: 10m)

> standardize UTF-8 string coder encodings
> ----------------------------------------
>
>                 Key: BEAM-7008
>                 URL: https://issues.apache.org/jira/browse/BEAM-7008
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-core, sdk-py-core
>            Reporter: Heejong Lee
>            Assignee: Heejong Lee
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> It looks like the UTF-8 string coders in the Java and Python SDKs use 
> different encoding schemes. StringUtf8Coder in the Java SDK writes the varint 
> length of the encoded string before the actual data bytes, whereas 
> StrUtf8Coder in the Python SDK encodes the input string directly to bytes 
> with no length prefix. We should unify the encoding scheme for UTF-8 strings 
> across the SDKs and make it a standard coder.
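
To make the mismatch described in the issue concrete, the two byte layouts can be compared for a short string. This is only an illustrative sketch of the layouts as the issue describes them, not output captured from either SDK; the variable names are chosen for this example.

```python
text = 'héllo'
payload = text.encode('utf-8')

# Layout the issue attributes to the Python SDK: the UTF-8 bytes alone.
raw = payload

# Layout the issue attributes to the Java SDK: varint length, then the bytes.
# A length below 128 fits in a single varint byte, so one byte suffices here.
assert len(payload) < 128
length_prefixed = bytes([len(payload)]) + payload

print(raw)
print(length_prefixed)
```

A reader expecting one layout cannot decode the other: the length-prefixed form carries an extra leading byte that a raw decoder would treat as part of the text.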



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
