[ 
https://issues.apache.org/jira/browse/BEAM-7949?focusedWorklogId=364948&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-364948
 ]

ASF GitHub Bot logged work on BEAM-7949:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 31/Dec/19 10:14
            Start Date: 31/Dec/19 10:14
    Worklog Time Spent: 10m 
      Work Description: sunjincheng121 commented on pull request #10246: 
[BEAM-7949] Add time-based cache threshold support in the data service of the 
Python SDK harness
URL: https://github.com/apache/beam/pull/10246#discussion_r362187710
 
 

 ##########
 File path: sdks/python/apache_beam/runners/worker/sdk_worker_main.py
 ##########
 @@ -200,6 +203,29 @@ def _get_state_cache_size(pipeline_options):
   return 0
 
 
+def _get_data_buffer_time_limit_ms(pipeline_options):
+  """Defines the time limt of the outbound data buffering.
+
+  Note: data_buffer_time_limit_ms is an experimental flag and might
+  not be available in future releases.
+
+  Returns:
+    an int indicating the time limit in milliseconds of the the outbound
+      data buffering. Default is 0 (disabled)
+  """
+  experiments = pipeline_options.view_as(DebugOptions).experiments
+  experiments = experiments if experiments else []
+
+  for experiment in experiments:
+    # There should only be 1 match so returning from the loop
+    if re.match(r'data_buffer_time_limit_ms=', experiment):
+      return int(
+          re.match(
+              r'data_buffer_time_limit_ms=(?P<data_buffer_time_limit_ms>.*)',
 
 Review comment:
   I have also thought about this question when preparing this PR. The reason I 
have not done that in this PR is because I found that most config keys in the 
Java SDK harness starts with "beam_fn_api_" and it's not the same case for the 
config keys in the Python SDK harness.
    I'm fine to unify the config key if you don't think that's a problem. 
   What's your thought? @lukecwik @mxm 
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 364948)
    Time Spent: 4h  (was: 3h 50m)

> Add time-based cache threshold support in the data service of the Python SDK 
> harness
> ------------------------------------------------------------------------------------
>
>                 Key: BEAM-7949
>                 URL: https://issues.apache.org/jira/browse/BEAM-7949
>             Project: Beam
>          Issue Type: Sub-task
>          Components: sdk-py-harness
>            Reporter: sunjincheng
>            Assignee: sunjincheng
>            Priority: Major
>             Fix For: 2.19.0
>
>          Time Spent: 4h
>  Remaining Estimate: 0h
>
> Currently only size-based cache threshold is supported in the data service of 
> Python SDK harness. It should also support the time-based cache threshold. 
> This is very important, especially for streaming jobs which are sensitive to 
> the delay. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to