[ https://issues.apache.org/jira/browse/BEAM-8645?focusedWorklogId=354816&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-354816 ]

ASF GitHub Bot logged work on BEAM-8645:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 06/Dec/19 00:25
            Start Date: 06/Dec/19 00:25
    Worklog Time Spent: 10m 
      Work Description: HuangLED commented on pull request #10143: [BEAM-8645] 
To test state backed iterable coder in py sdk.
URL: https://github.com/apache/beam/pull/10143#discussion_r354615011
 
 

 ##########
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
 ##########
 @@ -1579,6 +1580,49 @@ def test_lull_logging(self):
         '.*There has been a processing lull of over.*',
         'Unable to find a lull logged for this job.')
 
+class StateBackedTestElementType(object):
+  element_count = 0
+
+  def __init__(self, num_elems):
+    self.num_elems = num_elems
+    self.value = ['a' for _ in range(num_elems)]
+    StateBackedTestElementType.element_count += 1
+    # Because a state-backed iterable is used, we expect only a few
+    # instances to be alive at any given time.
+    if StateBackedTestElementType.element_count > 5:
+      raise RuntimeError('Too many live instances.')
+
+  def __del__(self):
+    StateBackedTestElementType.element_count -= 1
+
+  def __reduce__(self):
+    return (self.__class__, (self.num_elems, ))
 
 Review comment:
  Updated. I believe the motivation is to make the element large only when 
serialization happens, so the actual element no longer holds a blob of 
data.
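To illustrate the pattern the test class relies on, here is a minimal standalone sketch (hypothetical class name, not part of the Beam PR): defining `__reduce__` to return `(class, constructor_args)` means pickling stores only the constructor arguments, so the serialized form stays small even though each live instance holds a large payload that is rebuilt on unpickling.

```python
import pickle


class LazyLarge(object):
    """Hypothetical sketch: serialize only the constructor args via
    __reduce__, so the pickled form is tiny while the live instance
    holds a large in-memory payload."""

    def __init__(self, num_elems):
        self.num_elems = num_elems
        # Large payload, present only on live (deserialized) instances.
        self.value = ['a' for _ in range(num_elems)]

    def __reduce__(self):
        # Pickle as (callable, args): unpickling calls
        # LazyLarge(self.num_elems) and rebuilds the payload.
        return (self.__class__, (self.num_elems,))


obj = LazyLarge(100000)
data = pickle.dumps(obj)
restored = pickle.loads(data)
# The pickled bytes contain only the class reference and one integer,
# not the 100000-element list.
```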
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 354816)
    Time Spent: 9h 50m  (was: 9h 40m)

> TimestampCombiner incorrect in beam python
> ------------------------------------------
>
>                 Key: BEAM-8645
>                 URL: https://issues.apache.org/jira/browse/BEAM-8645
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>            Reporter: Ruoyun Huang
>            Priority: Major
>          Time Spent: 9h 50m
>  Remaining Estimate: 0h
>
> When we have a TimestampedValue on a combine: 
> {code:python}
> main_stream = (
>     p
>     | 'main TestStream' >> TestStream()
>         .add_elements([window.TimestampedValue(('k', 100), 0)])
>         .add_elements([window.TimestampedValue(('k', 400), 9)])
>         .advance_watermark_to_infinity()
>     | 'main windowInto' >> beam.WindowInto(
>         window.FixedWindows(10),
>         timestamp_combiner=TimestampCombiner.OUTPUT_AT_LATEST)
>     | 'Combine' >> beam.CombinePerKey(sum))
> {code}
> The expected timestamps should be:
> LATEST:        (('k', 500), Timestamp(9))
> EARLIEST:      (('k', 500), Timestamp(0))
> END_OF_WINDOW: (('k', 500), Timestamp(10))
> But current Python streaming gives the following results: 
> LATEST:        (('k', 500), Timestamp(10))
> EARLIEST:      (('k', 500), Timestamp(10))
> END_OF_WINDOW: (('k', 500), Timestamp(9.99999999))
> More details and discussions:
> https://lists.apache.org/thread.html/d3af1f2f84a2e59a747196039eae77812b78a991f0f293c717e5f4e1@%3Cdev.beam.apache.org%3E
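The expected values in the report follow directly from the TimestampCombiner semantics. A plain-Python sketch (not the Beam implementation) of what each mode should produce for two elements at timestamps 0 and 9 falling into the same FixedWindows(10) window [0, 10):

```python
# Two elements at timestamps 0 and 9, windowed by FixedWindows(10).
element_timestamps = [0, 9]
window_size = 10

# The fixed window containing both elements is [0, 10).
window_start = (min(element_timestamps) // window_size) * window_size
window_end = window_start + window_size

# OUTPUT_AT_LATEST: the latest input timestamp in the window.
output_at_latest = max(element_timestamps)      # expected Timestamp(9)
# OUTPUT_AT_EARLIEST: the earliest input timestamp in the window.
output_at_earliest = min(element_timestamps)    # expected Timestamp(0)
# OUTPUT_AT_EOW: the end of the window itself.
output_at_end_of_window = window_end            # expected Timestamp(10)
```

This matches the "expected" column in the report; the bug is that Python streaming collapsed all three modes toward the end of the window.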



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
