[ 
https://issues.apache.org/jira/browse/BEAM-8618?focusedWorklogId=379877&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-379877
 ]

ASF GitHub Bot logged work on BEAM-8618:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 31/Jan/20 09:27
            Start Date: 31/Jan/20 09:27
    Worklog Time Spent: 10m 
      Work Description: mxm commented on pull request #10655: [BEAM-8618] Tear 
down unused DoFns periodically in Python SDK harness.
URL: https://github.com/apache/beam/pull/10655#discussion_r373384948
 
 

 ##########
 File path: sdks/python/apache_beam/runners/worker/sdk_worker.py
 ##########
 @@ -280,6 +283,7 @@ def get(self, instruction_id, bundle_descriptor_id):
     try:
       # pop() is threadsafe
       processor = self.cached_bundle_processors[bundle_descriptor_id].pop()
+      self.last_access_time[bundle_descriptor_id] = time.time()
     except IndexError:
 
 Review comment:
   >If the bundle processor is newly created, it means that the cached bundle 
processor list is empty. This is the main reason that the last access time is 
only updated when bundle processor is retrieved from the cache. 
   
   Consider the case where we have just a single bundle processor. When we call 
`get` for the first time, the processor is newly created, so we won't update the 
last-used time. Every time we retrieve it afterwards, we will update the time, 
but the list of cached bundle processors will remain empty.
   
   I think we should either (1) always update the last-used timestamp in `get`, 
regardless of creation or (2) update it only on `release`.
   
   I'm leaning towards (2) because while a bundle processor is in use, it can't 
be removed anyway. We would update the timestamp when we put it back in `release`.
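
   A minimal sketch of option (2), to make the suggestion concrete. This is not 
the actual code in `sdk_worker.py`: the class name, the `create_processor` 
factory, the injectable `clock`, and the `evict_idle` helper are all 
hypothetical and only illustrate where the timestamp would be written.

```python
import collections
import time


class BundleProcessorCacheSketch:
    """Hypothetical, simplified stand-in for the cache discussed above."""

    def __init__(self, create_processor, clock=time.time):
        # create_processor and clock are assumptions made for this sketch.
        self._create_processor = create_processor
        self._clock = clock
        self.cached_bundle_processors = collections.defaultdict(list)
        self.last_access_time = {}

    def get(self, bundle_descriptor_id):
        try:
            # Reuse an idle processor if one is cached.
            return self.cached_bundle_processors[bundle_descriptor_id].pop()
        except IndexError:
            # Cache is empty: create a new processor. No timestamp update
            # here, because an in-use processor cannot be evicted anyway.
            return self._create_processor(bundle_descriptor_id)

    def release(self, bundle_descriptor_id, processor):
        # Option (2): the timestamp is written only when the processor is
        # put back, so it covers newly created and reused processors alike.
        self.cached_bundle_processors[bundle_descriptor_id].append(processor)
        self.last_access_time[bundle_descriptor_id] = self._clock()

    def evict_idle(self, max_idle_seconds):
        # Remove and return cached processors that were released longer
        # ago than max_idle_seconds; the caller would tear them down.
        now = self._clock()
        evicted = []
        for descriptor_id, last_used in list(self.last_access_time.items()):
            if now - last_used > max_idle_seconds:
                evicted.extend(
                    self.cached_bundle_processors.pop(descriptor_id, []))
                del self.last_access_time[descriptor_id]
        return evicted
```

   With this shape, the single-processor case gets a timestamp on its first 
`release`, even though its first `get` created it rather than popping it from 
the cache.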
 


Issue Time Tracking
-------------------

    Worklog Id:     (was: 379877)
    Time Spent: 2h 50m  (was: 2h 40m)

> Tear down unused DoFns periodically in Python SDK harness
> ---------------------------------------------------------
>
>                 Key: BEAM-8618
>                 URL: https://issues.apache.org/jira/browse/BEAM-8618
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py-harness
>            Reporter: sunjincheng
>            Assignee: sunjincheng
>            Priority: Major
>             Fix For: 2.20.0
>
>          Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Per the discussion on the mailing list (details in [1]), the teardown of DoFns 
> should be supported in the portability framework. It happens in two places:
> 1) Upon termination of the control service
> 2) Periodically, for unused DoFns
> The aim of this JIRA is to add support for tearing down unused DoFns 
> periodically in the Python SDK harness.
> [1] 
> https://lists.apache.org/thread.html/0c4a4cf83cf2e35c3dfeb9d906e26cd82d3820968ba6f862f91739e4@%3Cdev.beam.apache.org%3E



