[ 
https://issues.apache.org/jira/browse/BEAM-4006?focusedWorklogId=130033&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-130033
 ]

ASF GitHub Bot logged work on BEAM-4006:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 02/Aug/18 00:08
            Start Date: 02/Aug/18 00:08
    Worklog Time Spent: 10m 
      Work Description: tvalentyn commented on a change in pull request #5729: 
[BEAM-4006] Futurize transforms subpackage
URL: https://github.com/apache/beam/pull/5729#discussion_r207068499
 
 

 ##########
 File path: sdks/python/apache_beam/transforms/window.py
 ##########
 @@ -246,10 +263,33 @@ def __init__(self, value, timestamp):
     self.value = value
     self.timestamp = Timestamp.of(timestamp)
 
-  def __cmp__(self, other):
-    if type(self) is not type(other):
-      return cmp(type(self), type(other))
-    return cmp((self.value, self.timestamp), (other.value, other.timestamp))
+  def __eq__(self, other):
 
 Review comment:
   I checked the performance of `windowed_value`, `interval_window`,
`timestamped_value`, and `bounded_window` in dictionaries and ordered lists,
with and without this PR. For the most part, performance is unchanged or
improved, and `@total_ordering` does not significantly affect it. My only
concern is that including `hash(type(self))` when hashing objects may be
unnecessary in most cases, and it slightly decreases performance here:
https://github.com/apache/beam/pull/5729/files#diff-d7dfd884622fb59806ba9276cf3bd8fbR242.
I therefore left some more comments suggesting simpler hash functions. The
change above was also the trigger for the test flakiness, although ultimately
the test was at fault.
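
To make the pattern under discussion concrete, here is a minimal sketch of
replacing `__cmp__` with `__eq__`, `__lt__`, and `__hash__` plus
`functools.total_ordering`. It is not the code from the PR: `Stamped` is a
hypothetical stand-in class, and the hash shown is only one possible
simplification that leaves out `hash(type(self))`.

```python
from functools import total_ordering


@total_ordering
class Stamped(object):
  """Hypothetical stand-in for a value/timestamp pair; not Beam's class."""

  def __init__(self, value, timestamp):
    self.value = value
    self.timestamp = timestamp

  def __eq__(self, other):
    return (type(self) == type(other)
            and self.value == other.value
            and self.timestamp == other.timestamp)

  def __ne__(self, other):
    # Needed on Python 2, where __ne__ is not derived from __eq__.
    return not self == other

  def __lt__(self, other):
    if type(self) != type(other):
      return NotImplemented
    return (self.value, self.timestamp) < (other.value, other.timestamp)

  def __hash__(self):
    # Simplified hash (an assumption for illustration): the type is left
    # out because __eq__ already tells different types apart.
    return hash((self.value, self.timestamp))
```

Equal objects still hash equally under this scheme; dropping the type from
the hash only allows extra collisions between objects of different types
that hold the same fields, which `__eq__` then resolves.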
   
   Without PR:
   ```
   wv_with_one_window: dict, 10000 element(s)      : per element median time cost: 4.71699e-06 sec, relative std: 5.93%
   wv_with_multiple_windows: dict, 10000 element(s): per element median time cost: 4.02698e-05 sec, relative std: 0.60%
   interval_window: dict, 10000 element(s)         : per element median time cost: 1.5276e-06 sec, relative std: 1.78%
   timestamped_value: dict, 10000 element(s)       : per element median time cost: 1.39499e-07 sec, relative std: 7.44%
   interval_window: sorting., 10000 element(s)     : per element median time cost: 4.04392e-05 sec, relative std: 0.63%
   timestamped_value: sorting., 10000 element(s)   : per element median time cost: 1.80363e-05 sec, relative std: 1.35%
   bounded_window: sorting., 10000 element(s)      : per element median time cost: 4.06633e-05 sec, relative std: 1.26%
   ```
   With PR (including the change suggested in the last iteration):
   
   ```
   wv_with_one_window: dict, 10000 element(s)      : per element median time cost: 5.047e-06 sec, relative std: 2.16%
   wv_with_multiple_windows: dict, 10000 element(s): per element median time cost: 4.0575e-05 sec, relative std: 2.20%
   interval_window: dict, 10000 element(s)         : per element median time cost: 1.53821e-06 sec, relative std: 2.43%
   timestamped_value: dict, 10000 element(s)       : per element median time cost: 1.27995e-06 sec, relative std: 6.11%
   interval_window: sorting., 10000 element(s)     : per element median time cost: 1.83087e-05 sec, relative std: 1.28%
   timestamped_value: sorting., 10000 element(s)   : per element median time cost: 8.4375e-06 sec, relative std: 2.62%
   bounded_window: sorting., 10000 element(s)      : per element median time cost: 1.80462e-05 sec, relative std: 3.56%
   ```
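
The script that produced these figures is not shown in this comment. As a
rough sketch of how comparable "per element median time cost" and "relative
std" numbers could be collected (Python 3; `per_element_stats`, `make_keys`,
and `fill_dict` are hypothetical names, and plain tuples stand in for the
Beam objects):

```python
import statistics
import time


def per_element_stats(make_input, run, num_elements=10000, repeats=5):
  """Returns (median seconds per element, relative std in percent)."""
  costs = []
  for _ in range(repeats):
    data = make_input(num_elements)
    start = time.perf_counter()
    run(data)
    costs.append((time.perf_counter() - start) / num_elements)
  median = statistics.median(costs)
  rel_std = 100.0 * statistics.pstdev(costs) / statistics.mean(costs)
  return median, rel_std


def make_keys(n):
  # Tuples stand in for hashable, orderable objects such as windows.
  return [(i % 97, i) for i in range(n)]


def fill_dict(keys):
  # Exercises __hash__ and __eq__ of every key.
  return {k: None for k in keys}


if __name__ == '__main__':
  for name, run in [('dict', fill_dict), ('sorting', sorted)]:
    median, rel_std = per_element_stats(make_keys, run)
    print('%s, 10000 element(s): per element median time cost: %g sec, '
          'relative std: %.2f%%' % (name, median, rel_std))
```

Swapping `make_keys` for a factory that builds the objects under test would
exercise their own `__hash__`, `__eq__`, and `__lt__` in the dict and sorting
cases.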
   
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 130033)
    Time Spent: 15.5h  (was: 15h 20m)

> Futurize and fix python 2 compatibility for transforms subpackage
> -----------------------------------------------------------------
>
>                 Key: BEAM-4006
>                 URL: https://issues.apache.org/jira/browse/BEAM-4006
>             Project: Beam
>          Issue Type: Sub-task
>          Components: sdk-py-core
>            Reporter: Robbe
>            Assignee: Matthias Feys
>            Priority: Major
>          Time Spent: 15.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
