[ https://issues.apache.org/jira/browse/BEAM-5428?focusedWorklogId=320028&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-320028 ]
ASF GitHub Bot logged work on BEAM-5428:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 28/Sep/19 16:22
            Start Date: 28/Sep/19 16:22
    Worklog Time Spent: 10m
      Work Description: mxm commented on pull request #9418: [BEAM-5428] Implement cross-bundle user state caching in the Python SDK
URL: https://github.com/apache/beam/pull/9418#discussion_r329316747

 ##########
 File path: sdks/python/apache_beam/runners/worker/bundle_processor.py
 ##########
 @@ -199,26 +199,19 @@ def finish(self):


 class _StateBackedIterable(object):
-  def __init__(self, state_handler, state_key, coder_or_impl):
+  def __init__(self, state_handler, state_key, coder_or_impl,
+               is_cached=False):
     self._state_handler = state_handler
     self._state_key = state_key
     if isinstance(coder_or_impl, coders.Coder):
       self._coder_impl = coder_or_impl.get_impl()
     else:
       self._coder_impl = coder_or_impl
+    self._is_cached = is_cached

   def __iter__(self):
-    # This is the continuation token this might be useful
-    data, continuation_token = self._state_handler.blocking_get(self._state_key)
-    while True:
-      input_stream = coder_impl.create_InputStream(data)
-      while input_stream.size() > 0:
-        yield self._coder_impl.decode_from_stream(input_stream, True)
-      if not continuation_token:
-        break
-      else:
-        data, continuation_token = self._state_handler.blocking_get(
-            self._state_key, continuation_token)
+    return self._state_handler.blocking_get(

 Review comment:
   >The code that is removed keeps fetching the state until there is no token.

   This code has _not_ been removed. It has simply been refactored to support fetching all state at once instead of just one element at a time. The logic regarding the continuation token is unchanged. Please look inside the `materialize_iter` method.

   Of course I will address all remaining comments before merging the PR, as soon as I get a chance.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 320028)
    Time Spent: 26h 10m  (was: 26h)

> Implement cross-bundle state caching.
> -------------------------------------
>
>                 Key: BEAM-5428
>                 URL: https://issues.apache.org/jira/browse/BEAM-5428
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py-harness
>            Reporter: Robert Bradshaw
>            Assignee: Maximilian Michels
>            Priority: Major
>          Time Spent: 26h 10m
>  Remaining Estimate: 0h
>
> Tech spec:
> [https://docs.google.com/document/d/1BOozW0bzBuz4oHJEuZNDOHdzaV5Y56ix58Ozrqm2jFg/edit#heading=h.7ghoih5aig5m]
> Relevant document:
> [https://docs.google.com/document/d/1ltVqIW0XxUXI6grp17TgeyIybk3-nDF8a0-Nqw-s9mY/edit#|https://docs.google.com/document/d/1ltVqIW0XxUXI6grp17TgeyIybk3-nDF8a0-Nqw-s9mY/edit]
> Mailing list link:
> [https://lists.apache.org/thread.html/caa8d9bc6ca871d13de2c5e6ba07fdc76f85d26497d95d90893aa1f6@%3Cdev.beam.apache.org%3E]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
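
For readers following the review comment above, here is a minimal sketch of the continuation-token pattern it refers to: keep calling `blocking_get` until no token is returned. It does not reproduce the PR's `materialize_iter` method. `FakeStateHandler` and `iter_all_pages` are hypothetical names made up for illustration, the token format (a page index as a string) is an assumption, and decoding with a coder is left out so the loop itself stays visible.

```python
class FakeStateHandler:
    """Hypothetical stand-in for a state handler; serves pages from a list."""

    def __init__(self, pages):
        self._pages = pages
        self.calls = 0  # counts round trips, to show one call per page

    def blocking_get(self, state_key, continuation_token=None):
        # Assumption: the token is just the index of the next page, as a string.
        self.calls += 1
        if not self._pages:
            return [], None
        index = int(continuation_token) if continuation_token else 0
        has_more = index + 1 < len(self._pages)
        next_token = str(index + 1) if has_more else None
        return self._pages[index], next_token


def iter_all_pages(state_handler, state_key):
    """Yield every element, following continuation tokens until none remain."""
    data, token = state_handler.blocking_get(state_key)
    while True:
        for element in data:
            yield element
        if not token:
            break
        data, token = state_handler.blocking_get(state_key, token)


handler = FakeStateHandler([[1, 2], [3], [4, 5]])
values = list(iter_all_pages(handler, "some-key"))
```

Wrapping the generator in `list(...)` fetches all pages at once, which is roughly the distinction the comment draws between materializing all state and yielding one element at a time.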