[
https://issues.apache.org/jira/browse/BEAM-8645?focusedWorklogId=354775&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354775
]
ASF GitHub Bot logged work on BEAM-8645:
----------------------------------------
Author: ASF GitHub Bot
Created on: 05/Dec/19 23:18
Start Date: 05/Dec/19 23:18
Worklog Time Spent: 10m
Work Description: robertwb commented on pull request #10143: [BEAM-8645]
To test state backed iterable coder in py sdk.
URL: https://github.com/apache/beam/pull/10143#discussion_r354597042
##########
File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
##########
@@ -1579,6 +1580,49 @@ def test_lull_logging(self):
'.*There has been a processing lull of over.*',
'Unable to find a lull logged for this job.')
+class StateBackedTestElementType(object):
+  element_count = 0
+
+  def __init__(self, num_elems):
+    self.num_elems = num_elems
+    self.value = ['a' for _ in range(num_elems)]
+    StateBackedTestElementType.element_count += 1
+    # Because a state-backed iterable is used, we expect only a few instances
+    # to be alive at any given time.
+    if StateBackedTestElementType.element_count > 5:
+      raise RuntimeError('Too many live instances.')
+
+  def __del__(self):
+    StateBackedTestElementType.element_count -= 1
+
+  def __reduce__(self):
+    return (self.__class__, (self.num_elems, ))
+
+@attr('ValidatesRunner')
+class FnApiBasedStateBackedCoderTest(unittest.TestCase):
+
+  class ElementDoFn(beam.DoFn):
+    def process(self, elements):
+      unused_key, ts = elements
+
+      yield sum([item.num_elems for item in ts])
+
+  def create_pipeline(self):
+    return beam.Pipeline(
Review comment:
We'd need to use TestPipeline for this to be a ValidatesRunner test. Of course,
in that case we couldn't manually pass use_state_iterables (unless we make it a
pipeline option).
However, I'm +1 on this test going in now, with future work getting it to
run on other runners (possibly by just overriding the create_pipeline method
altogether).
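The test above relies on CPython reference counting: `__init__` increments a class-level counter, `__del__` decrements it, and `__reduce__` makes unpickling go through `__init__` so decoded copies are counted too. The stdlib-only sketch below illustrates why a lazily decoded iterable keeps only a few instances alive while a fully materialized list trips the limit. The names `CountedElement`, `decoded_values`, and `sum_sizes` are hypothetical, and a pickle round-trip is an assumed stand-in for the SDK's coder; this is not how the Fn API runner actually implements state-backed iterables.

```python
import pickle


class CountedElement(object):
  """Hypothetical stand-in for StateBackedTestElementType: counts live instances."""
  live_count = 0
  max_live = 5

  def __init__(self, num_elems):
    self.num_elems = num_elems
    self.value = ['a'] * num_elems
    CountedElement.live_count += 1
    if CountedElement.live_count > CountedElement.max_live:
      raise RuntimeError('Too many live instances.')

  def __del__(self):
    # Runs when CPython's reference count for the instance drops to zero.
    CountedElement.live_count -= 1

  def __reduce__(self):
    # Unpickling calls CountedElement(num_elems), so decoded copies go through
    # __init__ and are counted. The default reduction would bypass __init__
    # while __del__ still decrements the counter.
    return (self.__class__, (self.num_elems,))


def decoded_values(count, num_elems):
  """Lazily yields decoded elements, loosely imitating a state-backed iterable."""
  for _ in range(count):
    # Assumption: a pickle round-trip stands in for the SDK coder. The
    # temporary original is released once dumps() returns.
    yield pickle.loads(pickle.dumps(CountedElement(num_elems)))


def sum_sizes(values):
  """Mirrors ElementDoFn.process: consumes the grouped values one at a time."""
  return sum(item.num_elems for item in values)
```

Calling `sum_sizes(decoded_values(100, 10))` streams the elements and returns 1000 with at most two instances alive at once, whereas `list(decoded_values(100, 10))` keeps every decoded element and raises `RuntimeError` when the sixth instance is created.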
Issue Time Tracking
-------------------
Worklog Id: (was: 354775)
> TimestampCombiner incorrect in beam python
> ------------------------------------------
>
> Key: BEAM-8645
> URL: https://issues.apache.org/jira/browse/BEAM-8645
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Ruoyun Huang
> Priority: Major
> Time Spent: 9h 20m
> Remaining Estimate: 0h
>
> When we combine TimestampedValue elements:
> {code:python}
> main_stream = (p
>     | 'main TestStream' >> TestStream()
>         .add_elements([window.TimestampedValue(('k', 100), 0)])
>         .add_elements([window.TimestampedValue(('k', 400), 9)])
>         .advance_watermark_to_infinity()
>     | 'main windowInto' >> beam.WindowInto(
>         window.FixedWindows(10),
>         timestamp_combiner=TimestampCombiner.OUTPUT_AT_LATEST)
>     | 'Combine' >> beam.CombinePerKey(sum))
> {code}
> For each timestamp_combiner setting, the expected results are:
> LATEST: (('k', 500), Timestamp(9)),
> EARLIEST: (('k', 500), Timestamp(0)),
> END_OF_WINDOW: (('k', 500), Timestamp(10)),
> But the Python SDK in streaming mode currently gives the following results:
> LATEST: (('k', 500), Timestamp(10)),
> EARLIEST: (('k', 500), Timestamp(10)),
> END_OF_WINDOW: (('k', 500), Timestamp(9.99999999)),
> More details and discussion:
> https://lists.apache.org/thread.html/d3af1f2f84a2e59a747196039eae77812b78a991f0f293c717e5f4e1@%3Cdev.beam.apache.org%3E
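The expected values listed in the issue can be reproduced with a small model of the pipeline. The stdlib-only sketch below is hypothetical (`fixed_window_start` and `combine_per_key_with_timestamps` are illustrative names, not Beam APIs). It assigns elements to fixed windows, sums the values per key and window, and picks the output timestamp by the three rules the issue lists. It uses whole-second integer timestamps and takes END_OF_WINDOW to be the window end, matching the expected values above, rather than reproducing the SDK's internal timestamp handling.

```python
from collections import defaultdict


def fixed_window_start(ts, size):
  # Start of the fixed window [start, start + size) that contains ts.
  return ts - (ts % size)


def combine_per_key_with_timestamps(elements, size, combiner):
  """Hypothetical model of WindowInto(FixedWindows(size)) + CombinePerKey(sum).

  elements: iterable of ((key, value), timestamp) pairs.
  combiner: 'EARLIEST', 'LATEST' or 'END_OF_WINDOW'.
  Returns a sorted list of ((key, summed_value), output_timestamp).
  """
  groups = defaultdict(list)
  for (key, value), ts in elements:
    groups[(key, fixed_window_start(ts, size))].append((value, ts))

  results = []
  for (key, start), items in groups.items():
    total = sum(value for value, _ in items)
    timestamps = [ts for _, ts in items]
    if combiner == 'EARLIEST':
      out_ts = min(timestamps)
    elif combiner == 'LATEST':
      out_ts = max(timestamps)
    elif combiner == 'END_OF_WINDOW':
      out_ts = start + size
    else:
      raise ValueError('Unknown combiner: %r' % (combiner,))
    results.append(((key, total), out_ts))
  return sorted(results)
```

With the issue's input, `[(('k', 100), 0), (('k', 400), 9)]` and a window size of 10, both elements fall into the window [0, 10). The model yields `(('k', 500), 9)` for LATEST, `(('k', 500), 0)` for EARLIEST, and `(('k', 500), 10)` for END_OF_WINDOW.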
--
This message was sent by Atlassian Jira
(v8.3.4#803005)