[jira] [Work logged] (BEAM-5270) Finish Python 3 porting for coders module
[ https://issues.apache.org/jira/browse/BEAM-5270?focusedWorklogId=148686=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-148686 ]

ASF GitHub Bot logged work on BEAM-5270:

    Author: ASF GitHub Bot
    Created on: 27/Sep/18 11:06
    Start Date: 27/Sep/18 11:06
    Worklog Time Spent: 10m
    Work Description: robertwb closed pull request #6497: [BEAM-5270] Fix ToString coder to return bytes objects in Python 3.
    URL: https://github.com/apache/beam/pull/6497

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/apache_beam/coders/coder_impl.py b/sdks/python/apache_beam/coders/coder_impl.py
index 6fd9b169ed6..eb6f9a1e510 100644
--- a/sdks/python/apache_beam/coders/coder_impl.py
+++ b/sdks/python/apache_beam/coders/coder_impl.py
@@ -197,6 +197,10 @@ def get_estimated_size_and_observables(self, value, nested=False):
     return self.estimate_size(value, nested), []
 
+  def __repr__(self):
+    return 'CallbackCoderImpl[encoder=%s, decoder=%s]' % (
+        self._encoder, self._decoder)
+
 
 class DeterministicFastPrimitivesCoderImpl(CoderImpl):
   """For internal use only; no backwards-compatibility guarantees."""
diff --git a/sdks/python/apache_beam/coders/coders.py b/sdks/python/apache_beam/coders/coders.py
index ad4edbbb374..f0ed6dcbeb9 100644
--- a/sdks/python/apache_beam/coders/coders.py
+++ b/sdks/python/apache_beam/coders/coders.py
@@ -22,6 +22,7 @@
 from __future__ import absolute_import
 
 import base64
+import sys
 from builtins import object
 
 import google.protobuf.wrappers_pb2
@@ -314,13 +315,17 @@ def is_deterministic(self):
 class ToStringCoder(Coder):
   """A default string coder used if no sink coder is specified."""
 
-  def encode(self, value):
-    try:               # Python 2
-      if isinstance(value, unicode):   # pylint: disable=unicode-builtin
-        return value.encode('utf-8')
-    except NameError:  # Python 3
-      pass
-    return str(value)
+  if sys.version_info.major == 2:
+
+    def encode(self, value):
+      # pylint: disable=unicode-builtin
+      return (value.encode('utf-8') if isinstance(value, unicode)  # noqa: F821
+              else str(value))
+
+  else:
+
+    def encode(self, value):
+      return value if isinstance(value, bytes) else str(value).encode('utf-8')
 
   def decode(self, _):
     raise NotImplementedError('ToStringCoder cannot be used for decoding.')

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id: (was: 148686)
    Time Spent: 3h 10m (was: 3h)

> Finish Python 3 porting for coders module
> -----------------------------------------
>
> Key: BEAM-5270
> URL: https://issues.apache.org/jira/browse/BEAM-5270
> Project: Beam
> Issue Type: Sub-task
> Components: sdk-py-core
> Reporter: Robbe
> Assignee: Robbe
> Priority: Major
> Fix For: Not applicable
>
> Time Spent: 3h 10m
> Remaining Estimate: 0h
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
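
Editorial note: the Python 3 branch of ToStringCoder.encode in the diff above can be tried in isolation. The sketch below is only an illustration, assuming Python 3; the function name to_string_bytes is hypothetical and is not part of Beam. It copies the one-line expression from the patch to show that the result is always a bytes object.

```python
def to_string_bytes(value):
    """Hypothetical standalone copy of the Python 3 encode() expression."""
    # bytes pass through untouched; anything else is converted with str()
    # and then UTF-8 encoded, so the caller always receives bytes.
    return value if isinstance(value, bytes) else str(value).encode('utf-8')


# A few sample inputs: raw bytes, a non-ASCII str, and an int.
for sample in (b'raw', u'caf\u00e9', 42):
    encoded = to_string_bytes(sample)
    print(type(encoded).__name__, repr(encoded))
```

With the pre-patch code, str(value) on Python 3 would have returned a text string rather than bytes, which appears to be the mismatch the PR title refers to.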
[jira] [Work logged] (BEAM-5270) Finish Python 3 porting for coders module
[ https://issues.apache.org/jira/browse/BEAM-5270?focusedWorklogId=148345=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-148345 ]

ASF GitHub Bot logged work on BEAM-5270:

    Author: ASF GitHub Bot
    Created on: 26/Sep/18 20:11
    Start Date: 26/Sep/18 20:11
    Worklog Time Spent: 10m
    Work Description: RobbeSneyders commented on issue #6497: [BEAM-5270] Fix ToString coder to return bytes objects in Python 3.
    URL: https://github.com/apache/beam/pull/6497#issuecomment-424853444

LGTM, thank you!

Issue Time Tracking
-------------------

    Worklog Id: (was: 148345)
    Time Spent: 3h (was: 2h 50m)
[jira] [Work logged] (BEAM-5270) Finish Python 3 porting for coders module
[ https://issues.apache.org/jira/browse/BEAM-5270?focusedWorklogId=148143=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-148143 ]

ASF GitHub Bot logged work on BEAM-5270:

    Author: ASF GitHub Bot
    Created on: 26/Sep/18 14:05
    Start Date: 26/Sep/18 14:05
    Worklog Time Spent: 10m
    Work Description: robertwb commented on issue #6497: [BEAM-5270] Fix ToString coder to return bytes objects in Python 3.
    URL: https://github.com/apache/beam/pull/6497#issuecomment-424727978

R: @RobbeSneyders

Issue Time Tracking
-------------------

    Worklog Id: (was: 148143)
    Time Spent: 2h 50m (was: 2h 40m)
[jira] [Work logged] (BEAM-5270) Finish Python 3 porting for coders module
[ https://issues.apache.org/jira/browse/BEAM-5270?focusedWorklogId=148142=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-148142 ]

ASF GitHub Bot logged work on BEAM-5270:

    Author: ASF GitHub Bot
    Created on: 26/Sep/18 14:04
    Start Date: 26/Sep/18 14:04
    Worklog Time Spent: 10m
    Work Description: robertwb opened a new pull request #6497: [BEAM-5270] Fix ToString coder to return bytes objects in Python 3.
    URL: https://github.com/apache/beam/pull/6497

Follow this checklist to help us incorporate your contribution quickly and easily:

 - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
 - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

It will help us expedite review of your Pull Request if you tag someone (e.g. `@username`) to look at it.

Issue Time Tracking
-------------------

    Worklog Id: (was: 148142)
    Time Spent: 2h 40m (was: 2.5h)
[jira] [Work logged] (BEAM-5270) Finish Python 3 porting for coders module
[ https://issues.apache.org/jira/browse/BEAM-5270?focusedWorklogId=141999=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-141999 ]

ASF GitHub Bot logged work on BEAM-5270:

    Author: ASF GitHub Bot
    Created on: 07/Sep/18 01:44
    Start Date: 07/Sep/18 01:44
    Worklog Time Spent: 10m
    Work Description: aaltay closed pull request #6310: [BEAM-5270] Finish Python 3 porting for coders subpackage
    URL: https://github.com/apache/beam/pull/6310

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/apache_beam/coders/coders.py b/sdks/python/apache_beam/coders/coders.py
index cf4b9b5d520..ad4edbbb374 100644
--- a/sdks/python/apache_beam/coders/coders.py
+++ b/sdks/python/apache_beam/coders/coders.py
@@ -62,12 +62,13 @@ def serialize_coder(coder):
   from apache_beam.internal import pickler
-  return '%s$%s' % (coder.__class__.__name__, pickler.dumps(coder))
+  return b'%s$%s' % (coder.__class__.__name__.encode('utf-8'),
+                     pickler.dumps(coder))
 
 
 def deserialize_coder(serialized):
   from apache_beam.internal import pickler
-  return pickler.loads(serialized.split('$', 1)[1])
+  return pickler.loads(serialized.split(b'$', 1)[1])
 
 
 # pylint: enable=wrong-import-order, wrong-import-position
diff --git a/sdks/python/apache_beam/coders/coders_test_common.py b/sdks/python/apache_beam/coders/coders_test_common.py
index 0b8b4c20fde..969c1de10c4 100644
--- a/sdks/python/apache_beam/coders/coders_test_common.py
+++ b/sdks/python/apache_beam/coders/coders_test_common.py
@@ -20,6 +20,7 @@
 import logging
 import math
+import sys
 import unittest
 from builtins import range
 
@@ -41,7 +42,7 @@ class CustomCoder(coders.Coder):
 
   def encode(self, x):
-    return str(x+1)
+    return str(x+1).encode('utf-8')
 
   def decode(self, encoded):
     return int(encoded) - 1
@@ -56,6 +57,9 @@ class CodersTest(unittest.TestCase):
   def setUpClass(cls):
     cls.seen = set()
     cls.seen_nested = set()
+    # Method has been renamed in Python 3
+    if sys.version_info[0] < 3:
+      cls.assertCountEqual = cls.assertItemsEqual
 
   @classmethod
   def tearDownClass(cls):
@@ -272,7 +276,7 @@ def iter_generator(count):
         yield i
 
     iterable_coder = coders.IterableCoder(coders.VarIntCoder())
-    self.assertItemsEqual(list(iter_generator(count)),
+    self.assertCountEqual(list(iter_generator(count)),
                           iterable_coder.decode(
                               iterable_coder.encode(iter_generator(count))))
 
@@ -374,8 +378,8 @@ def test_global_window_coder(self):
     self.assertEqual({'@type': 'kind:global_window'}, coder.as_cloud_object())
     # Test binary representation
-    self.assertEqual('', coder.encode(value))
-    self.assertEqual(value, coder.decode(''))
+    self.assertEqual(b'', coder.encode(value))
+    self.assertEqual(value, coder.decode(b''))
     # Test unnested
     self.check_coder(coder, value)
     # Test nested
diff --git a/sdks/python/apache_beam/coders/slow_stream.py b/sdks/python/apache_beam/coders/slow_stream.py
index da27a49883a..4bdece6072b 100644
--- a/sdks/python/apache_beam/coders/slow_stream.py
+++ b/sdks/python/apache_beam/coders/slow_stream.py
@@ -22,6 +22,7 @@
 from __future__ import absolute_import
 
 import struct
+import sys
 from builtins import chr
 from builtins import object
 
@@ -70,7 +71,7 @@ def write_bigendian_double(self, v):
     self.write(struct.pack('>d', v))
 
   def get(self):
-    return ''.join(self.data)
+    return b''.join(self.data)
 
   def size(self):
     return len(self.data)
@@ -114,6 +115,19 @@ def __init__(self, data):
     self.data = data
     self.pos = 0
 
+    # The behavior of looping over a byte-string and obtaining byte characters
+    # has been changed between python 2 and 3.
+    # b = b'\xff\x01'
+    # Python 2:
+    #   b[0] = '\xff'
+    #   ord(b[0]) = 255
+    # Python 3:
+    #   b[0] = 255
+    if sys.version_info[0] >= 3:
+      self.read_byte = self.read_byte_py3
+    else:
+      self.read_byte = self.read_byte_py2
+
   def size(self):
     return len(self.data) - self.pos
 
@@ -124,10 +138,14 @@ def read(self, size):
   def read_all(self, nested):
     return self.read(self.read_var_int64() if nested else self.size())
 
-  def read_byte(self):
+  def read_byte_py2(self):
     self.pos += 1
     return ord(self.data[self.pos - 1])
 
+  def read_byte_py3(self):
+    self.pos += 1
+    return self.data[self.pos - 1]
+
   def read_var_int64(self):
     shift = 0
     result = 0
diff --git a/sdks/python/apache_beam/coders/standard_coders_test.py b/sdks/python/apache_beam/coders/standard_coders_test.py
index
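
Editorial note: the byte-indexing difference described in the slow_stream.py comment above can be checked with a few lines. This is a minimal sketch, not the approach taken in the patch: the free function read_byte below is hypothetical and decides per call, whereas the patch binds read_byte_py2 or read_byte_py3 once in __init__, presumably to avoid a type check on every byte read.

```python
def read_byte(data, pos):
    """Return the byte at ``pos`` as an int on both Python 2 and Python 3."""
    item = data[pos]
    # Python 3: indexing a bytes object already yields an int (e.g. 255).
    # Python 2: indexing yields a one-character str, so ord() is needed.
    return item if isinstance(item, int) else ord(item)


data = b'\xff\x01'
print([read_byte(data, i) for i in range(len(data))])  # [255, 1]
```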
[jira] [Work logged] (BEAM-5270) Finish Python 3 porting for coders module
[ https://issues.apache.org/jira/browse/BEAM-5270?focusedWorklogId=141997=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-141997 ]

ASF GitHub Bot logged work on BEAM-5270:

    Author: ASF GitHub Bot
    Created on: 07/Sep/18 01:42
    Start Date: 07/Sep/18 01:42
    Worklog Time Spent: 10m
    Work Description: tvalentyn commented on issue #6310: [BEAM-5270] Finish Python 3 porting for coders subpackage
    URL: https://github.com/apache/beam/pull/6310#issuecomment-419293776

Thank you, @RobbeSneyders I also checked coders microbenchmark to verify performance was not affected, although changes seem safe.

Issue Time Tracking
-------------------

    Worklog Id: (was: 141997)
    Time Spent: 2h 10m (was: 2h)
[jira] [Work logged] (BEAM-5270) Finish Python 3 porting for coders module
[ https://issues.apache.org/jira/browse/BEAM-5270?focusedWorklogId=141998=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-141998 ]

ASF GitHub Bot logged work on BEAM-5270:

    Author: ASF GitHub Bot
    Created on: 07/Sep/18 01:42
    Start Date: 07/Sep/18 01:42
    Worklog Time Spent: 10m
    Work Description: tvalentyn commented on issue #6310: [BEAM-5270] Finish Python 3 porting for coders subpackage
    URL: https://github.com/apache/beam/pull/6310#issuecomment-419293783

@aaltay this is ready to merge.

Issue Time Tracking
-------------------

    Worklog Id: (was: 141998)
    Time Spent: 2h 20m (was: 2h 10m)
[jira] [Work logged] (BEAM-5270) Finish Python 3 porting for coders module
[ https://issues.apache.org/jira/browse/BEAM-5270?focusedWorklogId=141932=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-141932 ]

ASF GitHub Bot logged work on BEAM-5270:

    Author: ASF GitHub Bot
    Created on: 06/Sep/18 20:48
    Start Date: 06/Sep/18 20:48
    Worklog Time Spent: 10m
    Work Description: RobbeSneyders commented on issue #6310: [BEAM-5270] Finish Python 3 porting for coders subpackage
    URL: https://github.com/apache/beam/pull/6310#issuecomment-419236279

Rebased @tvalentyn

Issue Time Tracking
-------------------

    Worklog Id: (was: 141932)
    Time Spent: 2h (was: 1h 50m)
[jira] [Work logged] (BEAM-5270) Finish Python 3 porting for coders module
[ https://issues.apache.org/jira/browse/BEAM-5270?focusedWorklogId=141819=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-141819 ]

ASF GitHub Bot logged work on BEAM-5270:

    Author: ASF GitHub Bot
    Created on: 06/Sep/18 15:30
    Start Date: 06/Sep/18 15:30
    Worklog Time Spent: 10m
    Work Description: tvalentyn edited a comment on issue #6310: [BEAM-5270] Finish Python 3 porting for coders subpackage
    URL: https://github.com/apache/beam/pull/6310#issuecomment-419138336

@RobbeSneyders Could you rebase this please on top of master?

Issue Time Tracking
-------------------

    Worklog Id: (was: 141819)
    Time Spent: 1h 50m (was: 1h 40m)
[jira] [Work logged] (BEAM-5270) Finish Python 3 porting for coders module
[ https://issues.apache.org/jira/browse/BEAM-5270?focusedWorklogId=141818=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-141818 ]

ASF GitHub Bot logged work on BEAM-5270:

    Author: ASF GitHub Bot
    Created on: 06/Sep/18 15:30
    Start Date: 06/Sep/18 15:30
    Worklog Time Spent: 10m
    Work Description: tvalentyn commented on issue #6310: [BEAM-5270] Finish Python 3 porting for coders subpackage
    URL: https://github.com/apache/beam/pull/6310#issuecomment-419138336

Could you rebase this please?

Issue Time Tracking
-------------------

    Worklog Id: (was: 141818)
    Time Spent: 1h 40m (was: 1.5h)