[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-08-15 Thread ASF GitHub Bot (JIRA)


 [ https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=135160&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-135160 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 15/Aug/18 21:28
Start Date: 15/Aug/18 21:28
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5356: [BEAM-2732][BEAM-4028] Logging relies on StateSampler for context
URL: https://github.com/apache/beam/pull/5356#discussion_r210414347
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/logger.py
 ##
 @@ -34,7 +34,6 @@ class _PerThreadWorkerData(threading.local):
 
   def __init__(self):
     super(_PerThreadWorkerData, self).__init__()
-    # TODO(robertwb): Consider starting with an initial (ignored) ~20 elements
 
 Review comment:
   Part of this TODO was accidentally deleted.  Please fix.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 135160)
Time Spent: 19h 40m  (was: 19.5h)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
> Fix For: 2.6.0
>
>  Time Spent: 19h 40m
>  Remaining Estimate: 0h
>
> E.g., logging and metrics keep state separately. State tracking should be 
> unified.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-08-15 Thread ASF GitHub Bot (JIRA)


 [ https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=135159&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-135159 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 15/Aug/18 21:24
Start Date: 15/Aug/18 21:24
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #5356: 
[BEAM-2732][BEAM-4028] Logging relies on StateSampler for context
URL: https://github.com/apache/beam/pull/5356#issuecomment-413342234
 
 
   This was meant to be removed, because optimizations are no longer needed.
   This list only ever contains one element per work item.
   
   On Wed, Aug 15, 2018 at 2:23 PM Charles Chen 
   wrote:
   
   > *@charlesccychen* commented on this pull request.
   >
   > In sdks/python/apache_beam/runners/worker/logger.py
   > :
   >
   > > @@ -34,7 +34,6 @@ class _PerThreadWorkerData(threading.local):
   >
   >   def __init__(self):
   >     super(_PerThreadWorkerData, self).__init__()
   > -    # TODO(robertwb): Consider starting with an initial (ignored) ~20 elements
   >
   > Part of this TODO was accidentally deleted. Please fix.
   >
   




Issue Time Tracking
---

Worklog Id: (was: 135159)
Time Spent: 19.5h  (was: 19h 20m)



[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-08-15 Thread ASF GitHub Bot (JIRA)


 [ https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=135158&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-135158 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 15/Aug/18 21:23
Start Date: 15/Aug/18 21:23
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5356: [BEAM-2732][BEAM-4028] Logging relies on StateSampler for context
URL: https://github.com/apache/beam/pull/5356#discussion_r210414347
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/logger.py
 ##
 @@ -34,7 +34,6 @@ class _PerThreadWorkerData(threading.local):
 
   def __init__(self):
     super(_PerThreadWorkerData, self).__init__()
-    # TODO(robertwb): Consider starting with an initial (ignored) ~20 elements
 
 Review comment:
   Part of this TODO was accidentally deleted.  Please fix.




Issue Time Tracking
---

Worklog Id: (was: 135158)
Time Spent: 19h 20m  (was: 19h 10m)



[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=120985&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-120985 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 09/Jul/18 19:47
Start Date: 09/Jul/18 19:47
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #5356: 
[BEAM-2732][BEAM-4028] Logging relies on StateSampler for context
URL: https://github.com/apache/beam/pull/5356#issuecomment-403598107
 
 
   Merged. Thanks @charlesccychen for reviewing the large-ish change : )




Issue Time Tracking
---

Worklog Id: (was: 120985)
Time Spent: 19h 10m  (was: 19h)



[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=120984&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-120984 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 09/Jul/18 19:46
Start Date: 09/Jul/18 19:46
Worklog Time Spent: 10m 
  Work Description: pabloem closed pull request #5356: 
[BEAM-2732][BEAM-4028] Logging relies on StateSampler for context
URL: https://github.com/apache/beam/pull/5356
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/apache_beam/runners/common.pxd 
b/sdks/python/apache_beam/runners/common.pxd
index 5c5eba22732..4bb226492ba 100644
--- a/sdks/python/apache_beam/runners/common.pxd
+++ b/sdks/python/apache_beam/runners/common.pxd
@@ -81,9 +81,7 @@ cdef class PerWindowInvoker(DoFnInvoker):
 
 cdef class DoFnRunner(Receiver):
   cdef DoFnContext context
-  cdef LoggingContext logging_context
   cdef object step_name
-  cdef ScopedMetricsContainer scoped_metrics_container
   cdef list side_inputs
   cdef DoFnInvoker do_fn_invoker
 
@@ -112,15 +110,5 @@ cdef class DoFnContext(object):
   cpdef set_element(self, WindowedValue windowed_value)
 
 
-cdef class LoggingContext(object):
-  # TODO(robertwb): Optimize "with [cdef class]"
-  cpdef enter(self)
-  cpdef exit(self)
-
-
-cdef class _LoggingContextAdapter(LoggingContext):
-  cdef object underlying
-
-
 cdef class _ReceiverAdapter(Receiver):
   cdef object underlying
diff --git a/sdks/python/apache_beam/runners/common.py b/sdks/python/apache_beam/runners/common.py
index d5f35de988f..88745c778e3 100644
--- a/sdks/python/apache_beam/runners/common.py
+++ b/sdks/python/apache_beam/runners/common.py
@@ -119,16 +119,6 @@ def logging_name(self):
     return self.user_name
 
 
-class LoggingContext(object):
-  """For internal use only; no backwards-compatibility guarantees."""
-
-  def enter(self):
-    pass
-
-  def exit(self):
-    pass
-
-
 class Receiver(object):
   """For internal use only; no backwards-compatibility guarantees.
 
@@ -551,20 +541,15 @@ def __init__(self,
       windowing: windowing properties of the output PCollection(s)
       tagged_receivers: a dict of tag name to Receiver objects
       step_name: the name of this step
-      logging_context: a LoggingContext object
+      logging_context: DEPRECATED [BEAM-4728]
       state: handle for accessing DoFn state
-      scoped_metrics_container: Context switcher for metrics container
+      scoped_metrics_container: DEPRECATED
       operation_name: The system name assigned by the runner for this operation.
     """
     # Need to support multiple iterations.
     side_inputs = list(side_inputs)
 
-    from apache_beam.metrics.execution import ScopedMetricsContainer
-
-    self.scoped_metrics_container = (
-        scoped_metrics_container or ScopedMetricsContainer())
     self.step_name = step_name
-    self.logging_context = logging_context or LoggingContext()
     self.context = DoFnContext(step_name, state=state)
 
     do_fn_signature = DoFnSignature(fn)
@@ -595,26 +580,16 @@ def receive(self, windowed_value):
 
   def process(self, windowed_value):
     try:
-      self.logging_context.enter()
-      self.scoped_metrics_container.enter()
       self.do_fn_invoker.invoke_process(windowed_value)
     except BaseException as exn:
       self._reraise_augmented(exn)
-    finally:
-      self.scoped_metrics_container.exit()
-      self.logging_context.exit()
 
   def _invoke_bundle_method(self, bundle_method):
     try:
-      self.logging_context.enter()
-      self.scoped_metrics_container.enter()
       self.context.set_element(None)
       bundle_method()
     except BaseException as exn:
       self._reraise_augmented(exn)
-    finally:
-      self.scoped_metrics_container.exit()
-      self.logging_context.exit()
 
   def start(self):
     self._invoke_bundle_method(self.do_fn_invoker.invoke_start_bundle)
diff --git a/sdks/python/apache_beam/runners/worker/bundle_processor.py 
b/sdks/python/apache_beam/runners/worker/bundle_processor.py
index 4193ea2debb..958731d0ce4 100644
--- a/sdks/python/apache_beam/runners/worker/bundle_processor.py
+++ b/sdks/python/apache_beam/runners/worker/bundle_processor.py
@@ -63,13 +63,12 @@
 class RunnerIOOperation(operations.Operation):
   """Common baseclass for runner harness IO operations."""
 
-  def __init__(self, operation_name, step_name, consumers, counter_factory,
+  def __init__(self, name_context, step_name, consumers, counter_factory,
state_sampler, windowed_coder, target, data_channel):
 super(RunnerIOOperation, self).__init__(
-operation_name, None, counter_factory, 

[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=120962&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-120962 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 09/Jul/18 18:54
Start Date: 09/Jul/18 18:54
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #5356: 
[BEAM-2732][BEAM-4028] Logging relies on StateSampler for context
URL: https://github.com/apache/beam/pull/5356#issuecomment-403583667
 
 
   Run Python Dataflow ValidatesRunner




Issue Time Tracking
---

Worklog Id: (was: 120962)
Time Spent: 18h 50m  (was: 18h 40m)



[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=120961&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-120961 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 09/Jul/18 18:53
Start Date: 09/Jul/18 18:53
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #5356: 
[BEAM-2732][BEAM-4028] Logging relies on StateSampler for context
URL: https://github.com/apache/beam/pull/5356#issuecomment-403583607
 
 
   Squashed commits and resolved conflicts. Rerunning tests.




Issue Time Tracking
---

Worklog Id: (was: 120961)
Time Spent: 18h 40m  (was: 18.5h)



[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=120897&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-120897 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 09/Jul/18 17:14
Start Date: 09/Jul/18 17:14
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #5356: 
[BEAM-2732][BEAM-4028] Logging relies on StateSampler for context
URL: https://github.com/apache/beam/pull/5356#issuecomment-403552409
 
 
   Tests passing. I'll squash and merge this after lunch.




Issue Time Tracking
---

Worklog Id: (was: 120897)
Time Spent: 18.5h  (was: 18h 20m)



[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-07-09 Thread ASF GitHub Bot (JIRA)


 [ https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=120824&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-120824 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 09/Jul/18 15:53
Start Date: 09/Jul/18 15:53
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #5356: 
[BEAM-2732][BEAM-4028] Logging relies on StateSampler for context
URL: https://github.com/apache/beam/pull/5356#issuecomment-403527745
 
 
   Run Python Dataflow ValidatesRunner




Issue Time Tracking
---

Worklog Id: (was: 120824)
Time Spent: 18h 20m  (was: 18h 10m)



[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-07-06 Thread ASF GitHub Bot (JIRA)


 [ https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=120079&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-120079 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 06/Jul/18 21:38
Start Date: 06/Jul/18 21:38
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on issue #5356: 
[BEAM-2732][BEAM-4028] Logging relies on StateSampler for context
URL: https://github.com/apache/beam/pull/5356#issuecomment-403153187
 
 
   Thanks Pablo!  This LGTM.  Can you rebase?  It looks like this is a great 
performance improvement too, cutting down processing overhead.




Issue Time Tracking
---

Worklog Id: (was: 120079)
Time Spent: 18h 10m  (was: 18h)



[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-07-03 Thread ASF GitHub Bot (JIRA)


 [ https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=118899&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118899 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 03/Jul/18 23:32
Start Date: 03/Jul/18 23:32
Worklog Time Spent: 10m 
  Work Description: pabloem edited a comment on issue #5356: 
[BEAM-2732][BEAM-4028] Logging relies on StateSampler for context
URL: https://github.com/apache/beam/pull/5356#issuecomment-402320036
 
 
   Results of the `map_fn_microbenchmark`. For some reason (floating-point 
related, I'd think), it reports the per-element cost as zero, but comparing the 
tables row by row, the cost with the changes is slightly lower than the cost on 
master.
   
   On master:
   ```
        1 element  0.798268 sec
     1001 elements 1.01165 sec
     2001 elements 1.05419 sec
     3001 elements 1.1398 sec
     4001 elements 1.37623 sec
     5001 elements 1.47872 sec
     6001 elements 1.68769 sec
     7001 elements 1.68809 sec
     8001 elements 1.8503 sec
     9001 elements 2.0606 sec
   Fixed cost   0.8104043092640963
   Per-element  0.0
   R^2          0.9845043457059202
   ```
   
   With these changes:
   ```
        1 element  0.796835 sec
     1001 elements 0.952501 sec
     2001 elements 1.01314 sec
     3001 elements 1.17117 sec
     4001 elements 1.31416 sec
     5001 elements 1.3791 sec
     6001 elements 1.54986 sec
     7001 elements 1.68663 sec
     8001 elements 1.71841 sec
     9001 elements 1.93632 sec
   Fixed cost   0.8011857992764676
   Per-element  0.0
   R^2          0.9922532858766798
   ```
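
One way to check the row-by-row observation is to refit the line directly from the two tables above. The sketch below is not the benchmark's own fitting code; `fit_line` is a hypothetical helper that applies ordinary least squares to the timings as printed, assuming the model `time = fixed + per_element * n`.

```python
# Timings copied from the two tables above (seconds per run).
SIZES = [1 + 1000 * i for i in range(10)]  # 1, 1001, ..., 9001 elements
MASTER = [0.798268, 1.01165, 1.05419, 1.1398, 1.37623,
          1.47872, 1.68769, 1.68809, 1.8503, 2.0606]
CHANGED = [0.796835, 0.952501, 1.01314, 1.17117, 1.31416,
           1.3791, 1.54986, 1.68663, 1.71841, 1.93632]


def fit_line(xs, ys):
  """Ordinary least squares for y = fixed + per_element * x (hypothetical helper)."""
  n = float(len(xs))
  mean_x = sum(xs) / n
  mean_y = sum(ys) / n
  sxx = sum((x - mean_x) ** 2 for x in xs)
  sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
  per_element = sxy / sxx
  fixed = mean_y - per_element * mean_x
  return fixed, per_element


for label, timings in (("master", MASTER), ("with changes", CHANGED)):
  fixed, per_element = fit_line(SIZES, timings)
  print("%-13s fixed=%.4f sec  per-element=%.3e sec" % (label, fixed, per_element))
```

Refit this way, the fixed costs come out close to the reported ones (about 0.810 and 0.801 sec), and the per-element cost is small but nonzero: roughly 1.3e-4 sec on master against roughly 1.2e-4 sec with the changes. That suggests the reported `0.0` is an artifact of how the value is computed or printed, not a real zero slope.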




Issue Time Tracking
---

Worklog Id: (was: 118899)
Time Spent: 18h  (was: 17h 50m)



[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-07-03 Thread ASF GitHub Bot (JIRA)


 [ https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=118898&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118898 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 03/Jul/18 23:29
Start Date: 03/Jul/18 23:29
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #5356: 
[BEAM-2732][BEAM-4028] Logging relies on StateSampler for context
URL: https://github.com/apache/beam/pull/5356#issuecomment-402320036
 
 
   Results of the `map_fn_microbenchmark`. For some floating-point reason, it 
reports the per-element cost as zero, but comparing the tables row by row, the 
cost with the changes is slightly lower than the cost on master.
   
   On master:
   ```
        1 element  0.798268 sec
     1001 elements 1.01165 sec
     2001 elements 1.05419 sec
     3001 elements 1.1398 sec
     4001 elements 1.37623 sec
     5001 elements 1.47872 sec
     6001 elements 1.68769 sec
     7001 elements 1.68809 sec
     8001 elements 1.8503 sec
     9001 elements 2.0606 sec
   Fixed cost   0.8104043092640963
   Per-element  0.0
   R^2          0.9845043457059202
   ```
   
   With these changes:
   ```
        1 element  0.796835 sec
     1001 elements 0.952501 sec
     2001 elements 1.01314 sec
     3001 elements 1.17117 sec
     4001 elements 1.31416 sec
     5001 elements 1.3791 sec
     6001 elements 1.54986 sec
     7001 elements 1.68663 sec
     8001 elements 1.71841 sec
     9001 elements 1.93632 sec
   Fixed cost   0.8011857992764676
   Per-element  0.0
   R^2          0.9922532858766798
   ```




Issue Time Tracking
---

Worklog Id: (was: 118898)
Time Spent: 17h 50m  (was: 17h 40m)



[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-07-03 Thread ASF GitHub Bot (JIRA)


 [ https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=118894&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118894 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 03/Jul/18 23:16
Start Date: 03/Jul/18 23:16
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #5356: 
[BEAM-2732][BEAM-4028] Logging relies on StateSampler for context
URL: https://github.com/apache/beam/pull/5356#issuecomment-402318032
 
 
   Run Python Dataflow ValidatesRunner




Issue Time Tracking
---

Worklog Id: (was: 118894)
Time Spent: 17h 40m  (was: 17.5h)



[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-07-03 Thread ASF GitHub Bot (JIRA)


 [ https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=118883&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118883 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 03/Jul/18 22:41
Start Date: 03/Jul/18 22:41
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #5356: 
[BEAM-2732][BEAM-4028] Logging relies on StateSampler for context
URL: https://github.com/apache/beam/pull/5356#issuecomment-402312367
 
 
   Also removed the extra path in `statesampler` for `CounterFactory`




Issue Time Tracking
---

Worklog Id: (was: 118883)
Time Spent: 17.5h  (was: 17h 20m)



[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-07-03 Thread ASF GitHub Bot (JIRA)


 [ https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=118879&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118879 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 03/Jul/18 22:40
Start Date: 03/Jul/18 22:40
Worklog Time Spent: 10m 
  Work Description: pabloem commented on a change in pull request #5356: 
[BEAM-2732][BEAM-4028] Logging relies on StateSampler for context
URL: https://github.com/apache/beam/pull/5356#discussion_r199965066
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/bundle_processor.py
 ##
 @@ -59,13 +59,12 @@
 class RunnerIOOperation(operations.Operation):
   """Common baseclass for runner harness IO operations."""
 
-  def __init__(self, operation_name, step_name, consumers, counter_factory,
+  def __init__(self, name_context, step_name, consumers, counter_factory,
                state_sampler, windowed_coder, target, data_channel):
     super(RunnerIOOperation, self).__init__(
-        operation_name, None, counter_factory, state_sampler)
+        name_context, None, counter_factory, state_sampler)
     self.windowed_coder = windowed_coder
     self.windowed_coder_impl = windowed_coder.get_impl()
-    self.step_name = step_name
 
 Review comment:
   This is part of my goal with BEAM-4028. The step name is meant to only be 
retrievable through the name context.
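
As a rough illustration of that goal, the sketch below shows an operation that keeps no `step_name` attribute of its own and reads the name through its name context instead. The classes are simplified, hypothetical stand-ins written for this example; they are not the Beam classes, and the constructor signatures are assumptions.

```python
class NameContext(object):
  """Hypothetical stand-in: holds the names that identify a step."""

  def __init__(self, step_name):
    self.step_name = step_name


class Operation(object):
  """Hypothetical base class; it stores only the name context."""

  def __init__(self, name_context):
    self.name_context = name_context


class RunnerIOOperation(Operation):
  """Reads the step name through the name context, not a private copy."""

  def __init__(self, name_context, windowed_coder):
    super(RunnerIOOperation, self).__init__(name_context)
    self.windowed_coder = windowed_coder

  def describe(self):
    # The only path to the step name goes through self.name_context.
    return "step=%s" % self.name_context.step_name


op = RunnerIOOperation(NameContext("s1"), windowed_coder=None)
print(op.describe())
```

Keeping a single source for the name means a renamed or re-contextualized step cannot leave a stale copy behind on the operation.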




Issue Time Tracking
---

Worklog Id: (was: 118879)
Time Spent: 17h  (was: 16h 50m)



[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-07-03 Thread ASF GitHub Bot (JIRA)


 [ https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=118880&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118880 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 03/Jul/18 22:40
Start Date: 03/Jul/18 22:40
Worklog Time Spent: 10m 
  Work Description: pabloem commented on a change in pull request #5356: 
[BEAM-2732][BEAM-4028] Logging relies on StateSampler for context
URL: https://github.com/apache/beam/pull/5356#discussion_r199965280
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/logger.py
 ##
 @@ -34,7 +34,6 @@ class _PerThreadWorkerData(threading.local):
 
   def __init__(self):
     super(_PerThreadWorkerData, self).__init__()
-    # TODO(robertwb): Consider starting with an initial (ignored) ~20 elements
 
 Review comment:
   The logging context will be removed once it's no longer useful (that is, 
after it's removed from Google code), so optimizations should not be considered 
anymore. I'll remove it ASAP as part of BEAM-4728.




Issue Time Tracking
---

Worklog Id: (was: 118880)
Time Spent: 17h 10m  (was: 17h)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 17h 10m
>  Remaining Estimate: 0h
>
> e.g., logging and metrics keep state separately. State tracking should be 
> unified.





[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-07-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=118882=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118882
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 03/Jul/18 22:40
Start Date: 03/Jul/18 22:40
Worklog Time Spent: 10m 
  Work Description: pabloem commented on a change in pull request #5356: 
[BEAM-2732][BEAM-4028] Logging relies on StateSampler for context
URL: https://github.com/apache/beam/pull/5356#discussion_r199958565
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/logger.py
 ##
 @@ -49,7 +48,7 @@ def get_data(self):
 per_thread_worker_data = _PerThreadWorkerData()
 
 
-class PerThreadLoggingContext(LoggingContext):
+class PerThreadLoggingContext(object):
 
 Review comment:
   This class is used internally at Google, so we need to remove it from there 
first.




Issue Time Tracking
---

Worklog Id: (was: 118882)
Time Spent: 17h 20m  (was: 17h 10m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 17h 20m
>  Remaining Estimate: 0h
>
> e.g., logging and metrics keep state separately. State tracking should be 
> unified.





[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-07-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=118881=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118881
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 03/Jul/18 22:40
Start Date: 03/Jul/18 22:40
Worklog Time Spent: 10m 
  Work Description: pabloem commented on a change in pull request #5356: 
[BEAM-2732][BEAM-4028] Logging relies on StateSampler for context
URL: https://github.com/apache/beam/pull/5356#discussion_r199965341
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/operation_specs.py
 ##
 @@ -376,9 +376,9 @@ def __init__(self, operations, stage_name,
                step_names=None,
                original_names=None,
                name_contexts=None):
+
     self.operations = operations
     self.stage_name = stage_name
-    # TODO(BEAM-4028): Remove arguments other than name_contexts.
 
 Review comment:
   Oops. Not obsolete. Good catch!




Issue Time Tracking
---

Worklog Id: (was: 118881)
Time Spent: 17h 10m  (was: 17h)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 17h 10m
>  Remaining Estimate: 0h
>
> e.g., logging and metrics keep state separately. State tracking should be 
> unified.





[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-07-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=118878=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118878
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 03/Jul/18 22:40
Start Date: 03/Jul/18 22:40
Worklog Time Spent: 10m 
  Work Description: pabloem commented on a change in pull request #5356: 
[BEAM-2732][BEAM-4028] Logging relies on StateSampler for context
URL: https://github.com/apache/beam/pull/5356#discussion_r199963501
 
 

 ##
 File path: sdks/python/apache_beam/runners/common.py
 ##
 @@ -539,19 +529,14 @@ def __init__(self,
   windowing: windowing properties of the output PCollection(s)
   tagged_receivers: a dict of tag name to Receiver objects
   step_name: the name of this step
-  logging_context: a LoggingContext object
+  logging_context: DEPRECATED
 
 Review comment:
   Added BEAM-4728.




Issue Time Tracking
---

Worklog Id: (was: 118878)
Time Spent: 16h 50m  (was: 16h 40m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 16h 50m
>  Remaining Estimate: 0h
>
> e.g., logging and metrics keep state separately. State tracking should be 
> unified.





[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=118552=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118552
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 03/Jul/18 01:34
Start Date: 03/Jul/18 01:34
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5356: [BEAM-2732][BEAM-4028] Logging relies on StateSampler for context
URL: https://github.com/apache/beam/pull/5356#discussion_r199661824
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/bundle_processor.py
 ##
 @@ -59,13 +59,12 @@
 class RunnerIOOperation(operations.Operation):
   """Common baseclass for runner harness IO operations."""
 
-  def __init__(self, operation_name, step_name, consumers, counter_factory,
+  def __init__(self, name_context, step_name, consumers, counter_factory,
                state_sampler, windowed_coder, target, data_channel):
     super(RunnerIOOperation, self).__init__(
-        operation_name, None, counter_factory, state_sampler)
+        name_context, None, counter_factory, state_sampler)
     self.windowed_coder = windowed_coder
     self.windowed_coder_impl = windowed_coder.get_impl()
-    self.step_name = step_name
 
 Review comment:
   Is this deletion intentional?  If so, can you add a comment / JIRA reference 
to clean up step_name in the arguments?
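
One possible reason the assignment could be redundant, sketched under assumptions the thread does not confirm: if the operation already receives a name context that carries the step name, a separate `step_name` attribute can be derived from it. `NameContext` and `Operation` below are simplified, hypothetical stand-ins, not the real `apache_beam` classes.

```python
class NameContext(object):
  """Hypothetical stand-in that bundles the names describing one step."""

  def __init__(self, step_name):
    self.step_name = step_name

  def __eq__(self, other):
    return isinstance(other, NameContext) and self.step_name == other.step_name

  def __hash__(self):
    return hash(self.step_name)

  def __repr__(self):
    return 'NameContext(%r)' % self.step_name


class Operation(object):
  """Hypothetical operation that derives step_name from its name context."""

  def __init__(self, name_context):
    # Accept a plain string for callers that have not migrated yet.
    if isinstance(name_context, str):
      name_context = NameContext(name_context)
    self.name_context = name_context

  @property
  def step_name(self):
    # The step name is read from the context instead of being stored twice.
    return self.name_context.step_name
```

With this shape, a leftover `step_name` constructor argument is dead weight, which is why a follow-up cleanup ticket is a reasonable request.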




Issue Time Tracking
---

Worklog Id: (was: 118552)
Time Spent: 16.5h  (was: 16h 20m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 16.5h
>  Remaining Estimate: 0h
>
> e.g., logging and metrics keep state separately. State tracking should be 
> unified.





[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=118554=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118554
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 03/Jul/18 01:34
Start Date: 03/Jul/18 01:34
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5356: [BEAM-2732][BEAM-4028] Logging relies on StateSampler for context
URL: https://github.com/apache/beam/pull/5356#discussion_r199662355
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/statesampler.py
 ##
 @@ -62,12 +64,22 @@ def __init__(self, prefix, counter_factory,
                sampling_period_ms=DEFAULT_SAMPLING_PERIOD_MS):
     self.states_by_name = {}
     self._prefix = prefix
-    self._counter_factory = counter_factory
+    self._counter_factory = counter_factory or CounterFactory()
 
 Review comment:
   It looks like the second branch is only used by tests.  Can we have the 
tests pass empty `CounterFactory()`s instead of adding this optional behavior 
in the actual code?
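
A minimal sketch of the alternative suggested in this comment, using stand-in classes rather than Beam's real ones: the constructor requires a counter factory, and the test helper builds an empty one explicitly. `CounterFactory`, `StateSampler`, and `make_test_sampler` here are simplified, hypothetical names for illustration only.

```python
class CounterFactory(object):
  """Hypothetical stand-in for a counter factory; not Beam's real class."""

  def __init__(self):
    self.counters = {}

  def get_counter(self, name):
    # Create the counter on first use, starting at zero.
    return self.counters.setdefault(name, [0])


class StateSampler(object):
  """Hypothetical stand-in: counter_factory is required, with no fallback."""

  def __init__(self, prefix, counter_factory):
    if counter_factory is None:
      raise ValueError('counter_factory is required')
    self.states_by_name = {}
    self._prefix = prefix
    self._counter_factory = counter_factory


def make_test_sampler():
  # Tests pass an empty factory explicitly instead of relying on a default.
  return StateSampler('test-', CounterFactory())
```

Keeping the fallback out of the production constructor means a missing factory fails loudly instead of silently producing counters that nothing reports.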




Issue Time Tracking
---

Worklog Id: (was: 118554)
Time Spent: 16h 40m  (was: 16.5h)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 16h 40m
>  Remaining Estimate: 0h
>
> e.g., logging and metrics keep state separately. State tracking should be 
> unified.





[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=118551=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118551
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 03/Jul/18 01:34
Start Date: 03/Jul/18 01:34
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5356: [BEAM-2732][BEAM-4028] Logging relies on StateSampler for context
URL: https://github.com/apache/beam/pull/5356#discussion_r199661924
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/logger.py
 ##
 @@ -49,7 +48,7 @@ def get_data(self):
 per_thread_worker_data = _PerThreadWorkerData()
 
 
-class PerThreadLoggingContext(LoggingContext):
+class PerThreadLoggingContext(object):
 
 Review comment:
   Are we able to get rid of this class entirely?  It looks like you removed 
the only usage in operations.py.




Issue Time Tracking
---

Worklog Id: (was: 118551)
Time Spent: 16.5h  (was: 16h 20m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 16.5h
>  Remaining Estimate: 0h
>
> e.g., logging and metrics keep state separately. State tracking should be 
> unified.





[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=118553=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118553
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 03/Jul/18 01:34
Start Date: 03/Jul/18 01:34
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5356: [BEAM-2732][BEAM-4028] Logging relies on StateSampler for context
URL: https://github.com/apache/beam/pull/5356#discussion_r199662052
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/logger.py
 ##
 @@ -34,7 +34,6 @@ class _PerThreadWorkerData(threading.local):
 
   def __init__(self):
     super(_PerThreadWorkerData, self).__init__()
-    # TODO(robertwb): Consider starting with an initial (ignored) ~20 elements
 
 Review comment:
   Accidental deletion?




Issue Time Tracking
---

Worklog Id: (was: 118553)
Time Spent: 16h 40m  (was: 16.5h)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 16h 40m
>  Remaining Estimate: 0h
>
> e.g., logging and metrics keep state separately. State tracking should be 
> unified.





[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=118549=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118549
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 03/Jul/18 01:34
Start Date: 03/Jul/18 01:34
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5356: [BEAM-2732][BEAM-4028] Logging relies on StateSampler for context
URL: https://github.com/apache/beam/pull/5356#discussion_r199661623
 
 

 ##
 File path: sdks/python/apache_beam/runners/common.py
 ##
 @@ -539,19 +529,14 @@ def __init__(self,
   windowing: windowing properties of the output PCollection(s)
   tagged_receivers: a dict of tag name to Receiver objects
   step_name: the name of this step
-  logging_context: a LoggingContext object
+  logging_context: DEPRECATED
 
 Review comment:
   Can you add a JIRA to remove this?




Issue Time Tracking
---

Worklog Id: (was: 118549)
Time Spent: 16h 20m  (was: 16h 10m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 16h 20m
>  Remaining Estimate: 0h
>
> e.g., logging and metrics keep state separately. State tracking should be 
> unified.





[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-07-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=118550=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-118550
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 03/Jul/18 01:34
Start Date: 03/Jul/18 01:34
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on a change in pull request 
#5356: [BEAM-2732][BEAM-4028] Logging relies on StateSampler for context
URL: https://github.com/apache/beam/pull/5356#discussion_r199255236
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/operation_specs.py
 ##
 @@ -376,9 +376,9 @@ def __init__(self, operations, stage_name,
                step_names=None,
                original_names=None,
                name_contexts=None):
+
     self.operations = operations
     self.stage_name = stage_name
-    # TODO(BEAM-4028): Remove arguments other than name_contexts.
 
 Review comment:
   Is this obsolete?  The Jira is still open.




Issue Time Tracking
---

Worklog Id: (was: 118550)
Time Spent: 16.5h  (was: 16h 20m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 16.5h
>  Remaining Estimate: 0h
>
> e.g., logging and metrics keep state separately. State tracking should be 
> unified.





[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=116892=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-116892
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 28/Jun/18 16:39
Start Date: 28/Jun/18 16:39
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #5356: 
[BEAM-2732][BEAM-4028] Logging relies on StateSampler for context
URL: https://github.com/apache/beam/pull/5356#issuecomment-401097706
 
 
   r: @charlesccychen 
   This unifies context management in Python, so that logging will use the 
state sampler to retrieve its current state. Also, NameContext is used more 
widely to improve step name management.
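
The idea described above can be sketched as follows, assuming a single per-thread tracker that both the sampler and logging read from. All names here (`_tracker`, `ScopedState`, `current_step`, `StepNameFilter`) are hypothetical illustrations, not the classes in `apache_beam`; only the standard-library `logging` and `threading` APIs are real.

```python
import logging
import threading

# Hypothetical single source of per-thread state shared by all consumers.
_tracker = threading.local()


class ScopedState(object):
  """Context manager that records the active step name for this thread."""

  def __init__(self, step_name):
    self.step_name = step_name
    self._previous = None

  def __enter__(self):
    # Remember the enclosing step so nested scopes restore it on exit.
    self._previous = getattr(_tracker, 'step_name', None)
    _tracker.step_name = self.step_name
    return self

  def __exit__(self, exc_type, exc_value, traceback):
    _tracker.step_name = self._previous


def current_step():
  """Returns the step name active on the calling thread, or None."""
  return getattr(_tracker, 'step_name', None)


class StepNameFilter(logging.Filter):
  """Logging filter that reads the step from the shared tracker."""

  def filter(self, record):
    # Logging keeps no context of its own; it asks the tracker instead.
    record.step_name = current_step()
    return True
```

Because logging only reads from the tracker, there is one place where the current step is written, which is the duplication the issue title refers to.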




Issue Time Tracking
---

Worklog Id: (was: 116892)
Time Spent: 16h 10m  (was: 16h)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 16h 10m
>  Remaining Estimate: 0h
>
> e.g., logging and metrics keep state separately. State tracking should be 
> unified.





[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=116890=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-116890
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 28/Jun/18 16:37
Start Date: 28/Jun/18 16:37
Worklog Time Spent: 10m 
  Work Description: pabloem edited a comment on issue #5356: 
[BEAM-2732][BEAM-4028] Logging relies on StateSampler for context
URL: https://github.com/apache/beam/pull/5356#issuecomment-394432491
 
 
   This unifies context management in Python, which simplifies further feature 
work, and also expands the use of NameContext, which should improve the 
separation of runner and sdk harness.




Issue Time Tracking
---

Worklog Id: (was: 116890)
Time Spent: 16h  (was: 15h 50m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 16h
>  Remaining Estimate: 0h
>
> e.g., logging and metrics keep state separately. State tracking should be 
> unified.





[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-06-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=116687=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-116687
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 28/Jun/18 00:25
Start Date: 28/Jun/18 00:25
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #5356: 
[BEAM-2732][BEAM-4028] Logging relies on StateSampler for context
URL: https://github.com/apache/beam/pull/5356#issuecomment-400871602
 
 
   Run Python Dataflow ValidatesRunner




Issue Time Tracking
---

Worklog Id: (was: 116687)
Time Spent: 15h 50m  (was: 15h 40m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 15h 50m
>  Remaining Estimate: 0h
>
> e.g., logging and metrics keep state separately. State tracking should be 
> unified.





[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-06-27 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=116615=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-116615
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 27/Jun/18 21:28
Start Date: 27/Jun/18 21:28
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #5356: 
[BEAM-2732][BEAM-4028] Logging relies on StateSampler for context
URL: https://github.com/apache/beam/pull/5356#issuecomment-400834725
 
 
   Run Python Dataflow ValidatesRunner




Issue Time Tracking
---

Worklog Id: (was: 116615)
Time Spent: 15h 40m  (was: 15.5h)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 15h 40m
>  Remaining Estimate: 0h
>
> e.g., logging and metrics keep state separately. State tracking should be 
> unified.





[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-06-26 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=116250=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-116250
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 27/Jun/18 01:12
Start Date: 27/Jun/18 01:12
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #5356: 
[BEAM-2732][BEAM-4028] Logging relies on StateSampler for context
URL: https://github.com/apache/beam/pull/5356#issuecomment-400511400
 
 
   Run Python PreCommit




Issue Time Tracking
---

Worklog Id: (was: 116250)
Time Spent: 15.5h  (was: 15h 20m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 15.5h
>  Remaining Estimate: 0h
>
> e.g., logging and metrics keep state separately. State tracking should be 
> unified.





[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-06-26 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=116218=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-116218
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 27/Jun/18 00:37
Start Date: 27/Jun/18 00:37
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #5356: 
[BEAM-2732][BEAM-4028] Logging relies on StateSampler for context
URL: https://github.com/apache/beam/pull/5356#issuecomment-400506299
 
 
   Run Python PreCommit




Issue Time Tracking
---

Worklog Id: (was: 116218)
Time Spent: 15h 20m  (was: 15h 10m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 15h 20m
>  Remaining Estimate: 0h
>
> e.g., logging and metrics keep state separately. State tracking should be 
> unified.





[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-06-26 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=116169=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-116169
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 26/Jun/18 22:06
Start Date: 26/Jun/18 22:06
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #5356: 
[BEAM-2732][BEAM-4028] Logging relies on StateSampler for context
URL: https://github.com/apache/beam/pull/5356#issuecomment-400478419
 
 
   Run Python PreCommit




Issue Time Tracking
---

Worklog Id: (was: 116169)
Time Spent: 15h 10m  (was: 15h)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 15h 10m
>  Remaining Estimate: 0h
>
> e.g., logging and metrics keep state separately. State tracking should be 
> unified.





[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-06-26 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=116108=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-116108
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 26/Jun/18 19:10
Start Date: 26/Jun/18 19:10
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #5356: 
[BEAM-2732][BEAM-4028] Logging relies on StateSampler for context
URL: https://github.com/apache/beam/pull/5356#issuecomment-400429277
 
 
   Run Python Dataflow ValidatesRunner




Issue Time Tracking
---

Worklog Id: (was: 116108)
Time Spent: 15h  (was: 14h 50m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 15h
>  Remaining Estimate: 0h
>
> e.g., logging and metrics keep state separately. State tracking should be 
> unified.





[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-06-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=108674=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-108674
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 04/Jun/18 17:21
Start Date: 04/Jun/18 17:21
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #5356: 
[BEAM-2732][BEAM-4028] Logging relies on StateSampler for context
URL: https://github.com/apache/beam/pull/5356#issuecomment-394432491
 
 
   r: @aaltay 
   This unifies context management in Python, which simplifies further feature 
work, and also expands the use of NameContext, which should improve the 
separation of runner and sdk harness.




Issue Time Tracking
---

Worklog Id: (was: 108674)
Time Spent: 14h 50m  (was: 14h 40m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 14h 50m
>  Remaining Estimate: 0h
>
> e.g., logging and metrics keep state separately. State tracking should be 
> unified.





[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-05-09 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=100247=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-100247
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 09/May/18 20:27
Start Date: 09/May/18 20:27
Worklog Time Spent: 10m 
  Work Description: pabloem closed pull request #5321: [BEAM-2732] 
Improving Cython annotations for State Sampler
URL: https://github.com/apache/beam/pull/5321
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/apache_beam/runners/worker/opcounters.pxd b/sdks/python/apache_beam/runners/worker/opcounters.pxd
index 0bcd42848d2..1d7f296c5ce 100644
--- a/sdks/python/apache_beam/runners/worker/opcounters.pxd
+++ b/sdks/python/apache_beam/runners/worker/opcounters.pxd
@@ -19,13 +19,14 @@ cimport cython
 cimport libc.stdint
 
 from apache_beam.utils.counters cimport Counter
+from apache_beam.runners.worker cimport statesampler_fast
 
 
 cdef class TransformIOCounter(object):
   cdef readonly object _counter_factory
   cdef readonly object _state_sampler
   cdef Counter bytes_read_counter
-  cdef object scoped_state
+  cdef statesampler_fast.ScopedState scoped_state
   cdef object _latest_step
 
   cpdef update_current_step(self)
diff --git a/sdks/python/apache_beam/runners/worker/statesampler_fast.pxd b/sdks/python/apache_beam/runners/worker/statesampler_fast.pxd
new file mode 100644
index 00000000000..a808a8e4a89
--- /dev/null
+++ b/sdks/python/apache_beam/runners/worker/statesampler_fast.pxd
@@ -0,0 +1,59 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+cimport cython
+
+from apache_beam.metrics.execution cimport MetricsContainer
+
+from cpython cimport pythread
+from libc.stdint cimport int32_t, int64_t
+
+cdef class StateSampler(object):
+  """Tracks time spent in states during pipeline execution."""
+  cdef int _sampling_period_ms
+
+  cdef list scoped_states_by_index
+
+  cdef public bint started
+  cdef public bint finished
+  cdef object sampling_thread
+
+  # This lock guards members that are shared between threads, specificaly
+  # finished, scoped_states_by_index, and the nsecs field of each state therein.
+  cdef pythread.PyThread_type_lock lock
+
+  cdef public int64_t state_transition_count
+  cdef public int64_t time_since_transition
+
+  cdef int32_t current_state_index
+
+  cpdef _scoped_state(self, counter_name, output_counter, metrics_container)
+
+cdef class ScopedState(object):
+  """Context manager class managing transitions for a given sampler state."""
+
+  cdef readonly StateSampler sampler
+  cdef readonly int32_t state_index
+  cdef readonly object counter
+  cdef readonly object name
+  cdef readonly int64_t _nsecs
+  cdef int32_t old_state_index
+  cdef readonly MetricsContainer _metrics_container
+
+  cpdef __enter__(self)
+
+  cpdef __exit__(self, unused_exc_type, unused_exc_value, unused_traceback)
diff --git a/sdks/python/apache_beam/runners/worker/statesampler_fast.pyx b/sdks/python/apache_beam/runners/worker/statesampler_fast.pyx
index 0fc58445f3b..21b09e626ec 100644
--- a/sdks/python/apache_beam/runners/worker/statesampler_fast.pyx
+++ b/sdks/python/apache_beam/runners/worker/statesampler_fast.pyx
@@ -70,23 +70,6 @@ cdef inline int64_t get_nsec_time() nogil:
 
 cdef class StateSampler(object):
   """Tracks time spent in states during pipeline execution."""
-  cdef int _sampling_period_ms
-
-  cdef list scoped_states_by_index
-
-  cdef public bint started
-  cdef public bint finished
-  cdef object sampling_thread
-
-  # This lock guards members that are shared between threads, specificaly
-  # finished, scoped_states_by_index, and the nsecs field of each state therein.
-  cdef pythread.PyThread_type_lock lock
-
-  cdef public int64_t state_transition_count
-  cdef public int64_t time_since_transition
-
-  cdef int32_t current_state_index
-
   def __init__(self, sampling_period_ms, *args):
  

[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-05-09 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=100202=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-100202
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 09/May/18 18:43
Start Date: 09/May/18 18:43
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #5321: [BEAM-2732] Improving 
state sampler Cython annotations for State Sampler
URL: https://github.com/apache/beam/pull/5321#issuecomment-387837012
 
 
   Run Python PostCommit




Issue Time Tracking
---

Worklog Id: (was: 100202)
Time Spent: 14.5h  (was: 14h 20m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 14.5h
>  Remaining Estimate: 0h
>
> e.g logging and metrics keep state separately. State tracking should be 
> unified.





[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-05-09 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=100201=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-100201
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 09/May/18 18:43
Start Date: 09/May/18 18:43
Worklog Time Spent: 10m 
  Work Description: pabloem opened a new pull request #5321: [BEAM-2732] 
Improving state sampler Cython annotations for State Sampler
URL: https://github.com/apache/beam/pull/5321
 
 
   This makes sure that other classes can rely on the C types that are exported 
by statesampler. It is used in TransformIOCounter to show that it compiles.




Issue Time Tracking
---

Worklog Id: (was: 100201)
Time Spent: 14h 20m  (was: 14h 10m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 14h 20m
>  Remaining Estimate: 0h
>
> e.g logging and metrics keep state separately. State tracking should be 
> unified.





[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-05-09 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=100048=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-100048
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 09/May/18 15:39
Start Date: 09/May/18 15:39
Worklog Time Spent: 10m 
  Work Description: pabloem closed pull request #5299: [BEAM-2732] 
StateSampler knows the execution thread it tracks.
URL: https://github.com/apache/beam/pull/5299
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/apache_beam/runners/worker/statesampler.py b/sdks/python/apache_beam/runners/worker/statesampler.py
index d3980928ac7..8a00079c984 100644
--- a/sdks/python/apache_beam/runners/worker/statesampler.py
+++ b/sdks/python/apache_beam/runners/worker/statesampler.py
@@ -47,7 +47,10 @@ def get_current_tracker():
 
 StateSamplerInfo = namedtuple(
     'StateSamplerInfo',
-    ['state_name', 'transition_count', 'time_since_transition'])
+    ['state_name',
+     'transition_count',
+     'time_since_transition',
+     'tracked_thread'])
 
 
 # Default period for sampling current state of pipeline execution.
@@ -63,6 +66,7 @@ def __init__(self, prefix, counter_factory,
     self._counter_factory = counter_factory
     self._states_by_name = {}
     self.sampling_period_ms = sampling_period_ms
+    self.tracked_thread = None
     super(StateSampler, self).__init__(sampling_period_ms)
 
   def stop_if_still_running(self):
@@ -70,6 +74,7 @@ def stop_if_still_running(self):
       self.stop()
 
   def start(self):
+    self.tracked_thread = threading.current_thread()
     set_current_tracker(self)
     execution.metrics_startup()
     super(StateSampler, self).start()
@@ -80,7 +85,8 @@ def get_info(self):
     return StateSamplerInfo(
         self.current_state().name,
         self.state_transition_count,
-        self.time_since_transition)
+        self.time_since_transition,
+        self.tracked_thread)
 
   def scoped_state(self,
                    step_name,


 




Issue Time Tracking
---

Worklog Id: (was: 100048)
Time Spent: 14h 10m  (was: 14h)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 14h 10m
>  Remaining Estimate: 0h
>
> e.g logging and metrics keep state separately. State tracking should be 
> unified.





[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-05-07 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=99361=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-99361
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 08/May/18 04:22
Start Date: 08/May/18 04:22
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #5299: [BEAM-2732] 
StateSampler knows the execution thread it tracks.
URL: https://github.com/apache/beam/pull/5299#issuecomment-387280122
 
 
   r: @tvalentyn 
   To improve debuggability of Python pipelines




Issue Time Tracking
---

Worklog Id: (was: 99361)
Time Spent: 14h  (was: 13h 50m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 14h
>  Remaining Estimate: 0h
>
> e.g logging and metrics keep state separately. State tracking should be 
> unified.





[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-05-07 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=99333=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-99333
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 08/May/18 02:26
Start Date: 08/May/18 02:26
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #5299: [BEAM-2732] 
StateSampler knows the execution thread it tracks.
URL: https://github.com/apache/beam/pull/5299#issuecomment-387241569
 
 
   Run Python PostCommit




Issue Time Tracking
---

Worklog Id: (was: 99333)
Time Spent: 13h 40m  (was: 13.5h)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 13h 40m
>  Remaining Estimate: 0h
>
> e.g logging and metrics keep state separately. State tracking should be 
> unified.





[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-05-07 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=99334=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-99334
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 08/May/18 02:26
Start Date: 08/May/18 02:26
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #5299: [BEAM-2732] 
StateSampler knows the execution thread it tracks.
URL: https://github.com/apache/beam/pull/5299#issuecomment-387264663
 
 
   Run Python PostCommit




Issue Time Tracking
---

Worklog Id: (was: 99334)
Time Spent: 13h 50m  (was: 13h 40m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 13h 50m
>  Remaining Estimate: 0h
>
> e.g logging and metrics keep state separately. State tracking should be 
> unified.





[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-05-07 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=99281=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-99281
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 07/May/18 23:55
Start Date: 07/May/18 23:55
Worklog Time Spent: 10m 
  Work Description: pabloem opened a new pull request #5299: [BEAM-2732] 
StateSampler knows the execution thread it tracks.
URL: https://github.com/apache/beam/pull/5299
 
 
   This will enable features such as logging a stack trace when the execution 
thread is stuck in a specific execution state.
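
As a rough illustration of such a feature (a sketch under stated assumptions, not code from this PR): once the thread being tracked is known, its current stack can be read with the standard-library `sys._current_frames()` and formatted with `traceback.format_stack`. The helper `format_thread_stack` and the stand-in worker `_stuck_worker` are hypothetical names.

```python
import sys
import threading
import traceback


def format_thread_stack(thread):
  """Returns the current stack of `thread` as text, or None if it is not running."""
  # sys._current_frames() maps thread identifiers to each thread's topmost frame.
  frame = sys._current_frames().get(thread.ident)
  if frame is None:
    return None
  return ''.join(traceback.format_stack(frame))


def _stuck_worker(ready, release):
  # Hypothetical stand-in for an execution thread blocked in one state.
  ready.set()
  release.wait()


ready, release = threading.Event(), threading.Event()
worker = threading.Thread(target=_stuck_worker, args=(ready, release))
worker.start()
ready.wait()
stack_text = format_thread_stack(worker)  # Taken while the worker is still alive.
release.set()
worker.join()
stack_after_exit = format_thread_stack(worker)  # The thread has finished by now.
```

A monitor could call a helper like this with the `tracked_thread` value when the time since the last state transition exceeds some threshold; the threshold and the monitor itself are left out of this sketch.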




Issue Time Tracking
---

Worklog Id: (was: 99281)
Time Spent: 13h 20m  (was: 13h 10m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 13h 20m
>  Remaining Estimate: 0h
>
> e.g logging and metrics keep state separately. State tracking should be 
> unified.





[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-05-07 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=99282=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-99282
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 07/May/18 23:55
Start Date: 07/May/18 23:55
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #5299: [BEAM-2732] 
StateSampler knows the execution thread it tracks.
URL: https://github.com/apache/beam/pull/5299#issuecomment-387241569
 
 
   Run Python PostCommit




Issue Time Tracking
---

Worklog Id: (was: 99282)
Time Spent: 13.5h  (was: 13h 20m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 13.5h
>  Remaining Estimate: 0h
>
> e.g logging and metrics keep state separately. State tracking should be 
> unified.





[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-24 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=94470=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94470
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 24/Apr/18 06:10
Start Date: 24/Apr/18 06:10
Worklog Time Spent: 10m 
  Work Description: robertwb closed pull request #4387: [BEAM-2732] Metrics 
rely on statesampler state
URL: https://github.com/apache/beam/pull/4387
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/apache_beam/metrics/execution.py b/sdks/python/apache_beam/metrics/execution.py
index f6c790de5d4..310faf6c9c8 100644
--- a/sdks/python/apache_beam/metrics/execution.py
+++ b/sdks/python/apache_beam/metrics/execution.py
@@ -127,25 +127,34 @@ def set_metrics_supported(self, supported):
     with self._METRICS_SUPPORTED_LOCK:
       self.METRICS_SUPPORTED = supported
 
-  def current_container(self):
+  def _old_style_container(self):
+    """Gets the current MetricsContainer based on the container stack.
+
+    The container stack is the old method, and will be deprecated. Should
+    rely on StateSampler instead."""
     self.set_container_stack()
     index = len(self.PER_THREAD.container) - 1
     if index < 0:
       return None
     return self.PER_THREAD.container[index]
 
-  def set_current_container(self, container):
-    self.set_container_stack()
-    self.PER_THREAD.container.append(container)
-
-  def unset_current_container(self):
-    self.set_container_stack()
-    self.PER_THREAD.container.pop()
+  def current_container(self):
+    """Returns the current MetricsContainer."""
+    sampler = statesampler.get_current_tracker()
+    if sampler is None:
+      return self._old_style_container()
+    return sampler.current_state().metrics_container
 
 
 MetricsEnvironment = _MetricsEnvironment()
 
 
+def metrics_startup():
+  """Initialize metrics context to run."""
+  global statesampler  # pylint: disable=global-variable-not-assigned
+  from apache_beam.runners.worker import statesampler
+
+
 class MetricsContainer(object):
   """Holds the metrics of a single step and a single bundle."""
   def __init__(self, step_name):
@@ -227,10 +236,12 @@ def __init__(self, container=None):
     self._container = container
 
   def enter(self):
-    self._stack.append(self._container)
+    if self._container:
+      self._stack.append(self._container)
 
   def exit(self):
-    self._stack.pop()
+    if self._container:
+      self._stack.pop()
 
   def __enter__(self):
     self.enter()
diff --git a/sdks/python/apache_beam/metrics/execution_test.py b/sdks/python/apache_beam/metrics/execution_test.py
index 2367e35df4d..37d24f3407b 100644
--- a/sdks/python/apache_beam/metrics/execution_test.py
+++ b/sdks/python/apache_beam/metrics/execution_test.py
@@ -18,11 +18,7 @@
 import unittest
 
 from apache_beam.metrics.cells import CellCommitState
-from apache_beam.metrics.execution import MetricKey
 from apache_beam.metrics.execution import MetricsContainer
-from apache_beam.metrics.execution import MetricsEnvironment
-from apache_beam.metrics.execution import ScopedMetricsContainer
-from apache_beam.metrics.metric import Metrics
 from apache_beam.metrics.metricbase import MetricName
 
 
@@ -33,29 +29,6 @@ def test_create_new_counter(self):
     mc.get_counter(MetricName('namespace', 'name'))
     self.assertTrue(MetricName('namespace', 'name') in mc.counters)
 
-  def test_scoped_container(self):
-    c1 = MetricsContainer('mystep')
-    c2 = MetricsContainer('myinternalstep')
-    with ScopedMetricsContainer(c1):
-      self.assertEqual(c1, MetricsEnvironment.current_container())
-      counter = Metrics.counter('ns', 'name')
-      counter.inc(2)
-
-      with ScopedMetricsContainer(c2):
-        self.assertEqual(c2, MetricsEnvironment.current_container())
-        counter = Metrics.counter('ns', 'name')
-        counter.inc(3)
-        self.assertEqual(
-            list(c2.get_cumulative().counters.items()),
-            [(MetricKey('myinternalstep', MetricName('ns', 'name')), 3)])
-
-      self.assertEqual(c1, MetricsEnvironment.current_container())
-      counter = Metrics.counter('ns', 'name')
-      counter.inc(4)
-      self.assertEqual(
-          list(c1.get_cumulative().counters.items()),
-          [(MetricKey('mystep', MetricName('ns', 'name')), 6)])
-
   def test_add_to_counter(self):
     mc = MetricsContainer('astep')
     counter = mc.get_counter(MetricName('namespace', 'name'))
@@ -118,29 +91,5 @@ def test_get_cumulative_or_updates(self):
                      set([v.value for _, v in cumulative.gauges.items()]))
 
 
-class 

[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-23 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=94170=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-94170
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 23/Apr/18 17:29
Start Date: 23/Apr/18 17:29
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #4387: [BEAM-2732] Metrics 
rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#issuecomment-383656913
 
 
   I believe this is ready to be merged.




Issue Time Tracking
---

Worklog Id: (was: 94170)
Time Spent: 13h  (was: 12h 50m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 13h
>  Remaining Estimate: 0h
>
> e.g logging and metrics keep state separately. State tracking should be 
> unified.





[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-19 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=92624=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92624
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 19/Apr/18 15:28
Start Date: 19/Apr/18 15:28
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #4387: [BEAM-2732] Metrics 
rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#issuecomment-382778796
 
 
   Run Python PostCommit




Issue Time Tracking
---

Worklog Id: (was: 92624)
Time Spent: 12h 50m  (was: 12h 40m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 12h 50m
>  Remaining Estimate: 0h
>
> e.g logging and metrics keep state separately. State tracking should be 
> unified.





[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=92253=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92253
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 18/Apr/18 21:53
Start Date: 18/Apr/18 21:53
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #4387: [BEAM-2732] Metrics 
rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#issuecomment-382541364
 
 
   https://github.com/apache/beam/pull/5167 should address those errors.




Issue Time Tracking
---

Worklog Id: (was: 92253)
Time Spent: 12h 40m  (was: 12.5h)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 12h 40m
>  Remaining Estimate: 0h
>
> e.g logging and metrics keep state separately. State tracking should be 
> unified.





[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=92231=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92231
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 18/Apr/18 20:27
Start Date: 18/Apr/18 20:27
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #4387: [BEAM-2732] Metrics 
rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#issuecomment-382518272
 
 
   Monsterpaste:
   
   ```
   ==
   ERROR: test_basics_with_type_check (apache_beam.examples.cookbook.group_with_coder_test.GroupWithCoderTest)
   --
   Traceback (most recent call last):
     File "/usr/local/google/home/pabloem/codes/global-sampler-metrics/sdks/python/apache_beam/examples/cookbook/group_with_coder_test.py", line 53, in test_basics_with_type_check
       '--output=%s.result' % temp_path])
     File "/usr/local/google/home/pabloem/codes/global-sampler-metrics/sdks/python/apache_beam/examples/cookbook/group_with_coder.py", line 118, in run
       | WriteToText(known_args.output))
     File "/usr/local/google/home/pabloem/codes/global-sampler-metrics/sdks/python/apache_beam/io/textio.py", line 522, in __init__
       skip_header_lines=skip_header_lines)
     File "/usr/local/google/home/pabloem/codes/global-sampler-metrics/sdks/python/apache_beam/io/textio.py", line 117, in __init__
       validate=validate)
     File "/usr/local/google/home/pabloem/codes/global-sampler-metrics/sdks/python/apache_beam/io/filebasedsource.py", line 119, in __init__
       self._validate()
     File "/usr/local/google/home/pabloem/codes/global-sampler-metrics/sdks/python/apache_beam/options/value_provider.py", line 124, in _f
       return fnc(self, *args, **kwargs)
     File "/usr/local/google/home/pabloem/codes/global-sampler-metrics/sdks/python/apache_beam/io/filebasedsource.py", line 176, in _validate
       match_result = FileSystems.match([pattern], limits=[1])[0]
     File "/usr/local/google/home/pabloem/codes/global-sampler-metrics/sdks/python/apache_beam/io/filesystems.py", line 166, in match
       return filesystem.match(patterns, limits)
     File "/usr/local/google/home/pabloem/codes/global-sampler-metrics/sdks/python/apache_beam/io/filesystem.py", line 600, in match
       raise BeamIOError("Match operation failed", exceptions)
   BeamIOError: Match operation failed with exceptions {'/tmp/tmpJcX3Kr*': BeamIOError("List operation failed with exceptions {'/tmp': OSError(2, 'No such file or directory')}",)}
    >> begin captured logging << 
   root: INFO: Missing pipeline option (runner). Executing pipeline using the default runner: DirectRunner.
   - >> end captured logging << -
   
   ==
   ERROR: test_basics_without_type_check (apache_beam.examples.cookbook.group_with_coder_test.GroupWithCoderTest)
   --
   Traceback (most recent call last):
     File "/usr/local/google/home/pabloem/codes/global-sampler-metrics/sdks/python/apache_beam/examples/cookbook/group_with_coder_test.py", line 74, in test_basics_without_type_check
       '--output=%s.result' % temp_path])
     File "/usr/local/google/home/pabloem/codes/global-sampler-metrics/sdks/python/apache_beam/examples/cookbook/group_with_coder.py", line 118, in run
       | WriteToText(known_args.output))
     File "/usr/local/google/home/pabloem/codes/global-sampler-metrics/sdks/python/apache_beam/io/textio.py", line 522, in __init__
       skip_header_lines=skip_header_lines)
     File "/usr/local/google/home/pabloem/codes/global-sampler-metrics/sdks/python/apache_beam/io/textio.py", line 117, in __init__
       validate=validate)
     File "/usr/local/google/home/pabloem/codes/global-sampler-metrics/sdks/python/apache_beam/io/filebasedsource.py", line 119, in __init__
       self._validate()
     File "/usr/local/google/home/pabloem/codes/global-sampler-metrics/sdks/python/apache_beam/options/value_provider.py", line 124, in _f
       return fnc(self, *args, **kwargs)
     File "/usr/local/google/home/pabloem/codes/global-sampler-metrics/sdks/python/apache_beam/io/filebasedsource.py", line 176, in _validate
       match_result = FileSystems.match([pattern], limits=[1])[0]
     File "/usr/local/google/home/pabloem/codes/global-sampler-metrics/sdks/python/apache_beam/io/filesystems.py", line 166, in match
       return filesystem.match(patterns, limits)
     File "/usr/local/google/home/pabloem/codes/global-sampler-metrics/sdks/python/apache_beam/io/filesystem.py", line 600, in match
   raise 

[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=92227=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92227
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 18/Apr/18 20:01
Start Date: 18/Apr/18 20:01
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #4387: [BEAM-2732] Metrics 
rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#issuecomment-382510866
 
 
   What errors were you getting on your machine? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 92227)
Time Spent: 12h 20m  (was: 12h 10m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 12h 20m
>  Remaining Estimate: 0h
>
> e.g. logging and metrics keep state separately. State tracking should be 
> unified.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=92212=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92212
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 18/Apr/18 18:48
Start Date: 18/Apr/18 18:48
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #4387: [BEAM-2732] Metrics 
rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#issuecomment-382490595
 
 
   PreCommits seem to be broken, also on my machine after I rebased.




Issue Time Tracking
---

Worklog Id: (was: 92212)
Time Spent: 12h 10m  (was: 12h)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 12h 10m
>  Remaining Estimate: 0h
>
> e.g. logging and metrics keep state separately. State tracking should be 
> unified.





[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-18 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=92144=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-92144
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 18/Apr/18 15:34
Start Date: 18/Apr/18 15:34
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #4387: [BEAM-2732] Metrics 
rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#issuecomment-382430043
 
 
   Squashed commits. Letting tests run.




Issue Time Tracking
---

Worklog Id: (was: 92144)
Time Spent: 11h 50m  (was: 11h 40m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 11h 50m
>  Remaining Estimate: 0h
>
> e.g. logging and metrics keep state separately. State tracking should be 
> unified.





[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-17 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=91812=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-91812
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 17/Apr/18 16:41
Start Date: 17/Apr/18 16:41
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #4387: [BEAM-2732] Metrics 
rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#issuecomment-382055245
 
 
   Retest this please.




Issue Time Tracking
---

Worklog Id: (was: 91812)
Time Spent: 11h 40m  (was: 11.5h)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 11h 40m
>  Remaining Estimate: 0h
>
> e.g. logging and metrics keep state separately. State tracking should be 
> unified.





[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-17 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=91801=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-91801
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 17/Apr/18 16:25
Start Date: 17/Apr/18 16:25
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #4387: [BEAM-2732] Metrics 
rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#issuecomment-382055245
 
 
   Retest this please.




Issue Time Tracking
---

Worklog Id: (was: 91801)
Time Spent: 11.5h  (was: 11h 20m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 11.5h
>  Remaining Estimate: 0h
>
> e.g. logging and metrics keep state separately. State tracking should be 
> unified.





[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-17 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=91800=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-91800
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 17/Apr/18 16:25
Start Date: 17/Apr/18 16:25
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #4387: [BEAM-2732] Metrics 
rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#issuecomment-382029439
 
 
   Retest this please.




Issue Time Tracking
---

Worklog Id: (was: 91800)
Time Spent: 11h 20m  (was: 11h 10m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 11h 20m
>  Remaining Estimate: 0h
>
> e.g. logging and metrics keep state separately. State tracking should be 
> unified.





[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-17 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=91776=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-91776
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 17/Apr/18 15:12
Start Date: 17/Apr/18 15:12
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #4387: [BEAM-2732] Metrics 
rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#issuecomment-382029439
 
 
   Retest this please.




Issue Time Tracking
---

Worklog Id: (was: 91776)
Time Spent: 11h 10m  (was: 11h)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h
>
> e.g. logging and metrics keep state separately. State tracking should be 
> unified.





[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-16 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=91521=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-91521
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 16/Apr/18 22:09
Start Date: 16/Apr/18 22:09
Worklog Time Spent: 10m 
  Work Description: pabloem commented on a change in pull request #4387: 
[BEAM-2732] Metrics rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#discussion_r181900638
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/operations.py
 ##
 @@ -133,24 +133,25 @@ def __init__(self, name_context, spec, counter_factory, state_sampler):
 
     # These are overwritten in the legacy harness.
     self.metrics_container = MetricsContainer(self.name_context.metrics_name())
-    self.scoped_metrics_container = ScopedMetricsContainer(
-        self.metrics_container)
+    self.scoped_metrics_container = ScopedMetricsContainer()
 
 Review comment:
   Done. Thanks Robert.




Issue Time Tracking
---

Worklog Id: (was: 91521)
Time Spent: 11h  (was: 10h 50m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 11h
>  Remaining Estimate: 0h
>
> e.g. logging and metrics keep state separately. State tracking should be 
> unified.





[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-16 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=91520=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-91520
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 16/Apr/18 22:09
Start Date: 16/Apr/18 22:09
Worklog Time Spent: 10m 
  Work Description: pabloem commented on a change in pull request #4387: 
[BEAM-2732] Metrics rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#discussion_r181900611
 
 

 ##
 File path: sdks/python/apache_beam/runners/direct/executor.py
 ##
 @@ -290,70 +293,87 @@ def __init__(self, transform_evaluator_registry, evaluation_context,
     self._retry_count = 0
     self._max_retries_per_bundle = TransformExecutor._MAX_RETRY_PER_BUNDLE
 
-  def call(self):
+  def call(self, state_sampler):
     self._call_count += 1
     assert self._call_count <= (1 + len(self._applied_ptransform.side_inputs))
     metrics_container = MetricsContainer(self._applied_ptransform.full_label)
-    scoped_metrics_container = ScopedMetricsContainer(metrics_container)
-
-    for side_input in self._applied_ptransform.side_inputs:
-      # Find the projection of main's window onto the side input's window.
-      window_mapping_fn = side_input._view_options().get(
-          'window_mapping_fn', sideinputs._global_window_mapping_fn)
-      main_onto_side_window = window_mapping_fn(self._latest_main_input_window)
-      block_until = main_onto_side_window.end
-
-      if side_input not in self._side_input_values:
-        value = self._evaluation_context.get_value_or_block_until_ready(
-            side_input, self, block_until)
-        if not value:
-          # Monitor task will reschedule this executor once the side input is
-          # available.
-          return
-        self._side_input_values[side_input] = value
-    side_input_values = [self._side_input_values[side_input]
-                         for side_input in self._applied_ptransform.side_inputs]
-
-    while self._retry_count < self._max_retries_per_bundle:
-      try:
-        self.attempt_call(metrics_container,
-                          scoped_metrics_container,
-                          side_input_values)
-        break
-      except Exception as e:
-        self._retry_count += 1
-        logging.error(
-            'Exception at bundle %r, due to an exception.\n %s',
-            self._input_bundle, traceback.format_exc())
-        if self._retry_count == self._max_retries_per_bundle:
-          logging.error('Giving up after %s attempts.',
-                        self._max_retries_per_bundle)
-          self._completion_callback.handle_exception(self, e)
+    start_state = state_sampler.scoped_state(
+        self._applied_ptransform.full_label,
+        'start',
+        metrics_container=metrics_container)
+    process_state = state_sampler.scoped_state(
+        self._applied_ptransform.full_label,
+        'process',
+        metrics_container=metrics_container)
+    finish_state = state_sampler.scoped_state(
+        self._applied_ptransform.full_label,
+        'finish',
+        metrics_container=metrics_container)
+
+    with start_state:
+      for side_input in self._applied_ptransform.side_inputs:
+        # Find the projection of main's window onto the side input's window.
+        window_mapping_fn = side_input._view_options().get(
+            'window_mapping_fn', sideinputs._global_window_mapping_fn)
+        main_onto_side_window = window_mapping_fn(
+            self._latest_main_input_window)
+        block_until = main_onto_side_window.end
+
+        if side_input not in self._side_input_values:
+          value = self._evaluation_context.get_value_or_block_until_ready(
+              side_input, self, block_until)
+          if not value:
+            # Monitor task will reschedule this executor once the side input is
+            # available.
+            return
+          self._side_input_values[side_input] = value
+      side_input_values = [
+          self._side_input_values[side_input]
+          for side_input in self._applied_ptransform.side_inputs]
+
+      while self._retry_count < self._max_retries_per_bundle:
+        try:
+          self.attempt_call(metrics_container,
+                            side_input_values,
+                            process_state,
+                            finish_state)
+          break
+        except Exception as e:
+          self._retry_count += 1
+          logging.error(
+              'Exception at bundle %r, due to an exception.\n %s',
+              self._input_bundle, traceback.format_exc())
+          if self._retry_count == self._max_retries_per_bundle:
+            logging.error('Giving up after %s attempts.',
+                          self._max_retries_per_bundle)
+            self._completion_callback.handle_exception(self, e)
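
The hunk above creates one scoped state per phase up front and then enters them with `with`, so whatever is tracking execution knows which step and phase are active. A minimal sketch of that pattern follows, under stated assumptions: `ToyStateSampler`, `ToyScopedState`, and `call_with_retries` are hypothetical stand-ins rather than Beam's StateSampler API, a plain dict stands in for the MetricsContainer, and the final failure is re-raised here instead of being handed to a completion callback.

```python
class ToyScopedState(object):
  """Hypothetical reusable context manager marking one phase of one step."""

  def __init__(self, sampler, step_name, phase, metrics_container):
    self.sampler = sampler
    self.step_name = step_name
    self.phase = phase
    self.metrics_container = metrics_container

  def __enter__(self):
    self.sampler.stack.append(self)
    return self

  def __exit__(self, exc_type, exc_value, tb):
    self.sampler.stack.pop()
    return False  # Never swallow exceptions raised inside the scope.


class ToyStateSampler(object):
  """Hypothetical stand-in that only remembers which states are active."""

  def __init__(self):
    self.stack = []

  def scoped_state(self, step_name, phase, metrics_container=None):
    return ToyScopedState(self, step_name, phase, metrics_container)

  def current_state(self):
    return self.stack[-1] if self.stack else None


def call_with_retries(sampler, label, attempt, max_retries=4):
  """Runs attempt() under scoped states; returns the number of failed tries."""
  metrics_container = {}  # Stand-in for a MetricsContainer.
  start_state = sampler.scoped_state(label, 'start', metrics_container)
  process_state = sampler.scoped_state(label, 'process', metrics_container)
  retry_count = 0
  with start_state:
    while retry_count < max_retries:
      try:
        with process_state:
          attempt()
        break
      except Exception:
        retry_count += 1
        if retry_count == max_retries:
          raise  # Assumption: re-raise instead of calling a callback.
  return retry_count
```

Because the states are class-based context managers, the same `process_state` object can be re-entered on every retry, which a generator-based context manager would not allow.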
 
 

[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-16 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=91517=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-91517
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 16/Apr/18 21:59
Start Date: 16/Apr/18 21:59
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #4387: 
[BEAM-2732] Metrics rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#discussion_r181898590
 
 

 ##
 File path: sdks/python/apache_beam/runners/direct/executor.py
 ##
 @@ -290,70 +293,87 @@ def __init__(self, transform_evaluator_registry, evaluation_context,
     self._retry_count = 0
     self._max_retries_per_bundle = TransformExecutor._MAX_RETRY_PER_BUNDLE
 
-  def call(self):
+  def call(self, state_sampler):
     self._call_count += 1
     assert self._call_count <= (1 + len(self._applied_ptransform.side_inputs))
     metrics_container = MetricsContainer(self._applied_ptransform.full_label)
-    scoped_metrics_container = ScopedMetricsContainer(metrics_container)
-
-    for side_input in self._applied_ptransform.side_inputs:
-      # Find the projection of main's window onto the side input's window.
-      window_mapping_fn = side_input._view_options().get(
-          'window_mapping_fn', sideinputs._global_window_mapping_fn)
-      main_onto_side_window = window_mapping_fn(self._latest_main_input_window)
-      block_until = main_onto_side_window.end
-
-      if side_input not in self._side_input_values:
-        value = self._evaluation_context.get_value_or_block_until_ready(
-            side_input, self, block_until)
-        if not value:
-          # Monitor task will reschedule this executor once the side input is
-          # available.
-          return
-        self._side_input_values[side_input] = value
-    side_input_values = [self._side_input_values[side_input]
-                         for side_input in self._applied_ptransform.side_inputs]
-
-    while self._retry_count < self._max_retries_per_bundle:
-      try:
-        self.attempt_call(metrics_container,
-                          scoped_metrics_container,
-                          side_input_values)
-        break
-      except Exception as e:
-        self._retry_count += 1
-        logging.error(
-            'Exception at bundle %r, due to an exception.\n %s',
-            self._input_bundle, traceback.format_exc())
-        if self._retry_count == self._max_retries_per_bundle:
-          logging.error('Giving up after %s attempts.',
-                        self._max_retries_per_bundle)
-          self._completion_callback.handle_exception(self, e)
+    start_state = state_sampler.scoped_state(
+        self._applied_ptransform.full_label,
+        'start',
+        metrics_container=metrics_container)
+    process_state = state_sampler.scoped_state(
+        self._applied_ptransform.full_label,
+        'process',
+        metrics_container=metrics_container)
+    finish_state = state_sampler.scoped_state(
+        self._applied_ptransform.full_label,
+        'finish',
+        metrics_container=metrics_container)
+
+    with start_state:
+      for side_input in self._applied_ptransform.side_inputs:
+        # Find the projection of main's window onto the side input's window.
+        window_mapping_fn = side_input._view_options().get(
+            'window_mapping_fn', sideinputs._global_window_mapping_fn)
+        main_onto_side_window = window_mapping_fn(
+            self._latest_main_input_window)
+        block_until = main_onto_side_window.end
+
+        if side_input not in self._side_input_values:
+          value = self._evaluation_context.get_value_or_block_until_ready(
+              side_input, self, block_until)
+          if not value:
+            # Monitor task will reschedule this executor once the side input is
+            # available.
+            return
+          self._side_input_values[side_input] = value
+      side_input_values = [
+          self._side_input_values[side_input]
+          for side_input in self._applied_ptransform.side_inputs]
+
+      while self._retry_count < self._max_retries_per_bundle:
+        try:
+          self.attempt_call(metrics_container,
+                            side_input_values,
+                            process_state,
+                            finish_state)
+          break
+        except Exception as e:
+          self._retry_count += 1
+          logging.error(
+              'Exception at bundle %r, due to an exception.\n %s',
+              self._input_bundle, traceback.format_exc())
+          if self._retry_count == self._max_retries_per_bundle:
+            logging.error('Giving up after %s attempts.',
+                          self._max_retries_per_bundle)
+            self._completion_callback.handle_exception(self, e)
 
 

[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-16 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=91516=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-91516
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 16/Apr/18 21:57
Start Date: 16/Apr/18 21:57
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #4387: 
[BEAM-2732] Metrics rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#discussion_r181898009
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/operations.py
 ##
 @@ -133,24 +133,25 @@ def __init__(self, name_context, spec, counter_factory, state_sampler):
 
     # These are overwritten in the legacy harness.
     self.metrics_container = MetricsContainer(self.name_context.metrics_name())
-    self.scoped_metrics_container = ScopedMetricsContainer(
-        self.metrics_container)
+    self.scoped_metrics_container = ScopedMetricsContainer()
 
 Review comment:
   OK. Please add a comment (with a JIRA) here about that. 




Issue Time Tracking
---

Worklog Id: (was: 91516)
Time Spent: 10.5h  (was: 10h 20m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 10.5h
>  Remaining Estimate: 0h
>
> e.g. logging and metrics keep state separately. State tracking should be 
> unified.





[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-16 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=91486=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-91486
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 16/Apr/18 21:01
Start Date: 16/Apr/18 21:01
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #4387: [BEAM-2732] Metrics 
rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#issuecomment-381748225
 
 
   Run Python PostCommit




Issue Time Tracking
---

Worklog Id: (was: 91486)
Time Spent: 10h 20m  (was: 10h 10m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> e.g. logging and metrics keep state separately. State tracking should be 
> unified.





[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-16 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=91485=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-91485
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 16/Apr/18 21:01
Start Date: 16/Apr/18 21:01
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #4387: [BEAM-2732] Metrics 
rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#issuecomment-381702920
 
 
   Run Python PostCommit




Issue Time Tracking
---

Worklog Id: (was: 91485)
Time Spent: 10h 10m  (was: 10h)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 10h 10m
>  Remaining Estimate: 0h
>
> e.g. logging and metrics keep state separately. State tracking should be 
> unified.





[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-16 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=91483=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-91483
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 16/Apr/18 20:49
Start Date: 16/Apr/18 20:49
Worklog Time Spent: 10m 
  Work Description: pabloem commented on a change in pull request #4387: 
[BEAM-2732] Metrics rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#discussion_r181880514
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/operations.py
 ##
 @@ -133,24 +133,25 @@ def __init__(self, name_context, spec, counter_factory, state_sampler):
 
     # These are overwritten in the legacy harness.
     self.metrics_container = MetricsContainer(self.name_context.metrics_name())
-    self.scoped_metrics_container = ScopedMetricsContainer(
-        self.metrics_container)
+    self.scoped_metrics_container = ScopedMetricsContainer()
 
 Review comment:
   Ah, never mind. The op.scoped_metrics_container is used by Dataflow worker 
code, so it needs to remain available.




Issue Time Tracking
---

Worklog Id: (was: 91483)
Time Spent: 10h  (was: 9h 50m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 10h
>  Remaining Estimate: 0h
>
> e.g. logging and metrics keep state separately. State tracking should be 
> unified.





[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-16 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=91477=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-91477
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 16/Apr/18 20:38
Start Date: 16/Apr/18 20:38
Worklog Time Spent: 10m 
  Work Description: pabloem commented on a change in pull request #4387: 
[BEAM-2732] Metrics rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#discussion_r181877336
 
 

 ##
 File path: sdks/python/apache_beam/runners/direct/direct_runner.py
 ##
 @@ -338,6 +337,7 @@ def run_pipeline(self, pipeline):
     from apache_beam.runners.direct.transform_evaluator import \
       TransformEvaluatorRegistry
     from apache_beam.testing.test_stream import TestStream
+    from apache_beam.metrics.execution import MetricsEnvironment
 
 Review comment:
   That is reasonable, and I agree. I've fixed this by having a lazy import in 
the metrics execution module.
   Currently, we're working under the idea that the state sampler is the global 
context provider, so metrics rely on it. I'd think that makes it a module 
that provides a service to other modules that require/handle context 
management. Though I'd agree that looser coupling would be quite desirable. 
Let me know what you think : )
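
The "global context provider" idea in this comment can be illustrated with a minimal sketch, assuming a per-thread tracker that metrics code consults instead of keeping its own state. All names here (`ToyTracker`, `current_tracker`, `ScopedContainer`, `inc_counter`) are hypothetical and are not Beam's API; a plain dict stands in for a metrics container.

```python
import threading

_thread_local = threading.local()


class ToyTracker(object):
  """Hypothetical per-thread tracker of the active metrics containers."""

  def __init__(self):
    self.containers = []  # Innermost active container is last.


def current_tracker():
  """Returns the calling thread's tracker, creating it on first use."""
  tracker = getattr(_thread_local, 'tracker', None)
  if tracker is None:
    tracker = _thread_local.tracker = ToyTracker()
  return tracker


class ScopedContainer(object):
  """Makes a container the active one for the duration of a with-block."""

  def __init__(self, container):
    self._container = container

  def __enter__(self):
    current_tracker().containers.append(self._container)
    return self._container

  def __exit__(self, exc_type, exc_value, tb):
    current_tracker().containers.pop()
    return False


def inc_counter(name, delta=1):
  """Metrics code asks the tracker where to record, rather than owning state."""
  containers = current_tracker().containers
  if not containers:
    return False  # No active context on this thread; the update is dropped.
  container = containers[-1]
  container[name] = container.get(name, 0) + delta
  return True
```

In this sketch the metrics function depends on the tracker, but the tracker knows nothing about metrics, which is one way to keep the coupling one-directional.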




Issue Time Tracking
---

Worklog Id: (was: 91477)
Time Spent: 9h 40m  (was: 9.5h)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> e.g. logging and metrics keep state separately. State tracking should be 
> unified.





[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-16 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=91476=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-91476
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 16/Apr/18 20:38
Start Date: 16/Apr/18 20:38
Worklog Time Spent: 10m 
  Work Description: pabloem commented on a change in pull request #4387: 
[BEAM-2732] Metrics rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#discussion_r181877336
 
 

 ##
 File path: sdks/python/apache_beam/runners/direct/direct_runner.py
 ##
 @@ -338,6 +337,7 @@ def run_pipeline(self, pipeline):
     from apache_beam.runners.direct.transform_evaluator import \
       TransformEvaluatorRegistry
     from apache_beam.testing.test_stream import TestStream
+    from apache_beam.metrics.execution import MetricsEnvironment
 
 Review comment:
   That is reasonable, and I agree. I've fixed this by having a lazy import in 
the metrics execution module.
   Currently, we're working under the idea that the state sampler is the global 
context provider, so metrics rely on it. I'd think that makes it a module 
that provides a service to other modules that require/handle context 
management. Though I'd agree that looser coupling would be good to have. 
Let me know what you think : )
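
For readers unfamiliar with the lazy import mentioned in this comment, here is a minimal self-contained sketch of the technique. The two modules, `toy_sampler` and `toy_metrics`, are hypothetical and built in memory only so the example runs on its own; they are not Beam's modules. The sampler imports metrics at load time, while metrics defers its import of the sampler into the function that needs it, so neither module requires the other to be fully loaded first.

```python
import sys
import types


def load_module(name, source):
  """Registers an in-memory module so that `import name` resolves to it."""
  module = types.ModuleType(name)
  sys.modules[name] = module
  exec(source, module.__dict__)
  return module


# Hypothetical sampler module: imports the metrics module at load time.
TOY_SAMPLER = '''
import toy_metrics

def current_container():
  return toy_metrics.DEFAULT_CONTAINER
'''

# Hypothetical metrics module: needs the sampler only when inc() runs,
# so the import is deferred into the function body.
TOY_METRICS = '''
DEFAULT_CONTAINER = {}

def inc(name):
  import toy_sampler  # Lazy import, resolved on first call.
  container = toy_sampler.current_container()
  container[name] = container.get(name, 0) + 1
  return container[name]
'''

toy_metrics = load_module('toy_metrics', TOY_METRICS)
toy_sampler = load_module('toy_sampler', TOY_SAMPLER)
```

The trade-off is that a function-level import hides the dependency from a quick scan of the module header, which is presumably why looser coupling is raised as the preferable long-term fix.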




Issue Time Tracking
---

Worklog Id: (was: 91476)
Time Spent: 9.5h  (was: 9h 20m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 9.5h
>  Remaining Estimate: 0h
>
> e.g. logging and metrics keep state separately. State tracking should be 
> unified.





[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-16 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=91474=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-91474
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 16/Apr/18 20:38
Start Date: 16/Apr/18 20:38
Worklog Time Spent: 10m 
  Work Description: pabloem commented on a change in pull request #4387: 
[BEAM-2732] Metrics rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#discussion_r181877325
 
 

 ##
 File path: sdks/python/apache_beam/transforms/util.py
 ##
 @@ -221,6 +220,9 @@ def __init__(self,
     self._clock = clock
     self._data = []
     self._ignore_next_timing = False
+
+    from apache_beam.metrics import Metrics
 
 Review comment:
   Done.




Issue Time Tracking
---

Worklog Id: (was: 91474)
Time Spent: 9h 10m  (was: 9h)



[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-16 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=91473=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-91473
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 16/Apr/18 20:38
Start Date: 16/Apr/18 20:38
Worklog Time Spent: 10m 
  Work Description: pabloem commented on a change in pull request #4387: 
[BEAM-2732] Metrics rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#discussion_r181877312
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/operations.py
 ##
 @@ -133,24 +133,25 @@ def __init__(self, name_context, spec, counter_factory, state_sampler):
 
     # These are overwritten in the legacy harness.
     self.metrics_container = MetricsContainer(self.name_context.metrics_name())
-    self.scoped_metrics_container = ScopedMetricsContainer(
-        self.metrics_container)
+    self.scoped_metrics_container = ScopedMetricsContainer()
 
 Review comment:
   Right. Done.




Issue Time Tracking
---

Worklog Id: (was: 91473)
Time Spent: 9h  (was: 8h 50m)



[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-16 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=91478=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-91478
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 16/Apr/18 20:38
Start Date: 16/Apr/18 20:38
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #4387: [BEAM-2732] Metrics 
rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#issuecomment-381741327
 
 
   Thanks Robert!




Issue Time Tracking
---

Worklog Id: (was: 91478)
Time Spent: 9h 50m  (was: 9h 40m)



[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-16 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=91475=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-91475
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 16/Apr/18 20:38
Start Date: 16/Apr/18 20:38
Worklog Time Spent: 10m 
  Work Description: pabloem commented on a change in pull request #4387: 
[BEAM-2732] Metrics rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#discussion_r181877336
 
 

 ##
 File path: sdks/python/apache_beam/runners/direct/direct_runner.py
 ##
 @@ -338,6 +337,7 @@ def run_pipeline(self, pipeline):
     from apache_beam.runners.direct.transform_evaluator import \
       TransformEvaluatorRegistry
     from apache_beam.testing.test_stream import TestStream
+    from apache_beam.metrics.execution import MetricsEnvironment
 
 Review comment:
   That is reasonable, and I agree. I've fixed this by using a lazy import in 
the metrics execution module.
   Currently, we're working under the idea that the state sampler is the global 
context provider, so metrics rely on it. I think that makes it a module that 
provides a service to other modules that require or handle context management, 
though I agree that looser coupling would be good to have.




Issue Time Tracking
---

Worklog Id: (was: 91475)
Time Spent: 9h 20m  (was: 9h 10m)



[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-16 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=91472=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-91472
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 16/Apr/18 20:38
Start Date: 16/Apr/18 20:38
Worklog Time Spent: 10m 
  Work Description: pabloem commented on a change in pull request #4387: 
[BEAM-2732] Metrics rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#discussion_r181877301
 
 

 ##
 File path: sdks/python/apache_beam/runners/direct/executor.py
 ##
 @@ -290,70 +293,87 @@ def __init__(self, transform_evaluator_registry, evaluation_context,
     self._retry_count = 0
     self._max_retries_per_bundle = TransformExecutor._MAX_RETRY_PER_BUNDLE
 
-  def call(self):
+  def call(self, state_sampler):
     self._call_count += 1
     assert self._call_count <= (1 + len(self._applied_ptransform.side_inputs))
     metrics_container = MetricsContainer(self._applied_ptransform.full_label)
-    scoped_metrics_container = ScopedMetricsContainer(metrics_container)
-
-    for side_input in self._applied_ptransform.side_inputs:
-      # Find the projection of main's window onto the side input's window.
-      window_mapping_fn = side_input._view_options().get(
-          'window_mapping_fn', sideinputs._global_window_mapping_fn)
-      main_onto_side_window = window_mapping_fn(self._latest_main_input_window)
-      block_until = main_onto_side_window.end
-
-      if side_input not in self._side_input_values:
-        value = self._evaluation_context.get_value_or_block_until_ready(
-            side_input, self, block_until)
-        if not value:
-          # Monitor task will reschedule this executor once the side input is
-          # available.
-          return
-        self._side_input_values[side_input] = value
-    side_input_values = [self._side_input_values[side_input]
-                         for side_input in self._applied_ptransform.side_inputs]
-
-    while self._retry_count < self._max_retries_per_bundle:
-      try:
-        self.attempt_call(metrics_container,
-                          scoped_metrics_container,
-                          side_input_values)
-        break
-      except Exception as e:
-        self._retry_count += 1
-        logging.error(
-            'Exception at bundle %r, due to an exception.\n %s',
-            self._input_bundle, traceback.format_exc())
-        if self._retry_count == self._max_retries_per_bundle:
-          logging.error('Giving up after %s attempts.',
-                        self._max_retries_per_bundle)
-          self._completion_callback.handle_exception(self, e)
+    start_state = state_sampler.scoped_state(
+        self._applied_ptransform.full_label,
+        'start',
+        metrics_container=metrics_container)
+    process_state = state_sampler.scoped_state(
+        self._applied_ptransform.full_label,
+        'process',
+        metrics_container=metrics_container)
+    finish_state = state_sampler.scoped_state(
+        self._applied_ptransform.full_label,
+        'finish',
+        metrics_container=metrics_container)
+
+    with start_state:
+      for side_input in self._applied_ptransform.side_inputs:
+        # Find the projection of main's window onto the side input's window.
+        window_mapping_fn = side_input._view_options().get(
+            'window_mapping_fn', sideinputs._global_window_mapping_fn)
+        main_onto_side_window = window_mapping_fn(
+            self._latest_main_input_window)
+        block_until = main_onto_side_window.end
+
+        if side_input not in self._side_input_values:
+          value = self._evaluation_context.get_value_or_block_until_ready(
+              side_input, self, block_until)
+          if not value:
+            # Monitor task will reschedule this executor once the side input is
+            # available.
+            return
+          self._side_input_values[side_input] = value
+      side_input_values = [
+          self._side_input_values[side_input]
+          for side_input in self._applied_ptransform.side_inputs]
+
+      while self._retry_count < self._max_retries_per_bundle:
+        try:
+          self.attempt_call(metrics_container,
+                            side_input_values,
+                            process_state,
+                            finish_state)
+          break
+        except Exception as e:
+          self._retry_count += 1
+          logging.error(
+              'Exception at bundle %r, due to an exception.\n %s',
+              self._input_bundle, traceback.format_exc())
+          if self._retry_count == self._max_retries_per_bundle:
+            logging.error('Giving up after %s attempts.',
+                          self._max_retries_per_bundle)
+            self._completion_callback.handle_exception(self, e)
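
The hunk above replaces an explicit ScopedMetricsContainer with states obtained from state_sampler.scoped_state(...) and entered with a `with` block. As a rough illustration of that idea only, the following is a minimal sketch and not Beam's implementation: `StateSamplerSketch`, `ScopedStateSketch`, `set_current_sampler`, and `current_metrics_container` are hypothetical names, and a plain dict stands in for a metrics container. It assumes one sampler per thread, with the innermost entered state deciding which container is current.

```python
import threading


class ScopedStateSketch(object):
  """Context manager marking one (step, phase) state as current."""

  def __init__(self, sampler, step_name, state_name, metrics_container=None):
    self.sampler = sampler
    self.step_name = step_name
    self.state_name = state_name
    self.metrics_container = metrics_container

  def __enter__(self):
    # Entering the state makes it the innermost (current) one.
    self.sampler._stack.append(self)
    return self

  def __exit__(self, exc_type, exc_value, tb):
    # Leaving the state restores whatever was current before it.
    self.sampler._stack.pop()
    return False  # Do not swallow exceptions.


class StateSamplerSketch(object):
  """Toy sampler that only tracks the stack of entered states."""

  def __init__(self):
    self._stack = []

  def scoped_state(self, step_name, state_name, metrics_container=None):
    return ScopedStateSketch(self, step_name, state_name, metrics_container)

  def current_state(self):
    return self._stack[-1] if self._stack else None


_thread_data = threading.local()


def set_current_sampler(sampler):
  """Registers the sampler for the calling thread."""
  _thread_data.sampler = sampler


def current_metrics_container():
  """Returns the container of the innermost state, or None outside any."""
  sampler = getattr(_thread_data, 'sampler', None)
  state = sampler.current_state() if sampler else None
  return state.metrics_container if state else None


# Usage: a dict plays the role of the metrics container.
sampler = StateSamplerSketch()
set_current_sampler(sampler)
container = {'elements': 0}
with sampler.scoped_state('MyStep', 'process', metrics_container=container):
  current_metrics_container()['elements'] += 1
```

With this arrangement, code that records a metric does not need a container passed to it; it asks the per-thread sampler for the current one, which is the sense in which the sampler acts as the context provider.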
 
 

[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-16 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=91442=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-91442
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 16/Apr/18 19:30
Start Date: 16/Apr/18 19:30
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #4387: 
[BEAM-2732] Metrics rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#discussion_r181858411
 
 

 ##
 File path: sdks/python/apache_beam/transforms/util.py
 ##
 @@ -221,6 +220,9 @@ def __init__(self,
     self._clock = clock
     self._data = []
     self._ignore_next_timing = False
+
+    from apache_beam.metrics import Metrics
 
 Review comment:
   Undo this change. Transforms should be free to use metrics (at least the 
public metrics write API). 




Issue Time Tracking
---

Worklog Id: (was: 91442)
Time Spent: 8h 20m  (was: 8h 10m)



[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-16 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=91441=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-91441
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 16/Apr/18 19:30
Start Date: 16/Apr/18 19:30
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #4387: 
[BEAM-2732] Metrics rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#discussion_r181856996
 
 

 ##
 File path: sdks/python/apache_beam/runners/direct/executor.py
 ##
 @@ -290,70 +293,87 @@ def __init__(self, transform_evaluator_registry, evaluation_context,
     self._retry_count = 0
     self._max_retries_per_bundle = TransformExecutor._MAX_RETRY_PER_BUNDLE
 
-  def call(self):
+  def call(self, state_sampler):
     self._call_count += 1
     assert self._call_count <= (1 + len(self._applied_ptransform.side_inputs))
     metrics_container = MetricsContainer(self._applied_ptransform.full_label)
-    scoped_metrics_container = ScopedMetricsContainer(metrics_container)
-
-    for side_input in self._applied_ptransform.side_inputs:
-      # Find the projection of main's window onto the side input's window.
-      window_mapping_fn = side_input._view_options().get(
-          'window_mapping_fn', sideinputs._global_window_mapping_fn)
-      main_onto_side_window = window_mapping_fn(self._latest_main_input_window)
-      block_until = main_onto_side_window.end
-
-      if side_input not in self._side_input_values:
-        value = self._evaluation_context.get_value_or_block_until_ready(
-            side_input, self, block_until)
-        if not value:
-          # Monitor task will reschedule this executor once the side input is
-          # available.
-          return
-        self._side_input_values[side_input] = value
-    side_input_values = [self._side_input_values[side_input]
-                         for side_input in self._applied_ptransform.side_inputs]
-
-    while self._retry_count < self._max_retries_per_bundle:
-      try:
-        self.attempt_call(metrics_container,
-                          scoped_metrics_container,
-                          side_input_values)
-        break
-      except Exception as e:
-        self._retry_count += 1
-        logging.error(
-            'Exception at bundle %r, due to an exception.\n %s',
-            self._input_bundle, traceback.format_exc())
-        if self._retry_count == self._max_retries_per_bundle:
-          logging.error('Giving up after %s attempts.',
-                        self._max_retries_per_bundle)
-          self._completion_callback.handle_exception(self, e)
+    start_state = state_sampler.scoped_state(
+        self._applied_ptransform.full_label,
+        'start',
+        metrics_container=metrics_container)
+    process_state = state_sampler.scoped_state(
+        self._applied_ptransform.full_label,
+        'process',
+        metrics_container=metrics_container)
+    finish_state = state_sampler.scoped_state(
+        self._applied_ptransform.full_label,
+        'finish',
+        metrics_container=metrics_container)
+
+    with start_state:
+      for side_input in self._applied_ptransform.side_inputs:
+        # Find the projection of main's window onto the side input's window.
+        window_mapping_fn = side_input._view_options().get(
+            'window_mapping_fn', sideinputs._global_window_mapping_fn)
+        main_onto_side_window = window_mapping_fn(
+            self._latest_main_input_window)
+        block_until = main_onto_side_window.end
+
+        if side_input not in self._side_input_values:
+          value = self._evaluation_context.get_value_or_block_until_ready(
+              side_input, self, block_until)
+          if not value:
+            # Monitor task will reschedule this executor once the side input is
+            # available.
+            return
+          self._side_input_values[side_input] = value
+      side_input_values = [
+          self._side_input_values[side_input]
+          for side_input in self._applied_ptransform.side_inputs]
+
+      while self._retry_count < self._max_retries_per_bundle:
+        try:
+          self.attempt_call(metrics_container,
+                            side_input_values,
+                            process_state,
+                            finish_state)
+          break
+        except Exception as e:
+          self._retry_count += 1
+          logging.error(
+              'Exception at bundle %r, due to an exception.\n %s',
+              self._input_bundle, traceback.format_exc())
+          if self._retry_count == self._max_retries_per_bundle:
+            logging.error('Giving up after %s attempts.',
+                          self._max_retries_per_bundle)
+            self._completion_callback.handle_exception(self, e)
 
 

[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-16 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=91440=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-91440
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 16/Apr/18 19:30
Start Date: 16/Apr/18 19:30
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #4387: 
[BEAM-2732] Metrics rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#discussion_r181857518
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/operations.py
 ##
 @@ -133,24 +133,25 @@ def __init__(self, name_context, spec, counter_factory, state_sampler):
 
     # These are overwritten in the legacy harness.
     self.metrics_container = MetricsContainer(self.name_context.metrics_name())
-    self.scoped_metrics_container = ScopedMetricsContainer(
-        self.metrics_container)
+    self.scoped_metrics_container = ScopedMetricsContainer()
 
 Review comment:
   Can this just be deleted?




Issue Time Tracking
---

Worklog Id: (was: 91440)
Time Spent: 8h 10m  (was: 8h)



[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-16 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=91443=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-91443
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 16/Apr/18 19:30
Start Date: 16/Apr/18 19:30
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #4387: 
[BEAM-2732] Metrics rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#discussion_r181855332
 
 

 ##
 File path: sdks/python/apache_beam/runners/direct/direct_runner.py
 ##
 @@ -338,6 +337,7 @@ def run_pipeline(self, pipeline):
     from apache_beam.runners.direct.transform_evaluator import \
       TransformEvaluatorRegistry
     from apache_beam.testing.test_stream import TestStream
+    from apache_beam.metrics.execution import MetricsEnvironment
 
 Review comment:
   Undo this change? Metrics should not be changed to depend on anything in the 
runners package. (Either that or the lazy import should be made there.)




Issue Time Tracking
---

Worklog Id: (was: 91443)
Time Spent: 8.5h  (was: 8h 20m)



[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-16 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=91444=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-91444
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 16/Apr/18 19:30
Start Date: 16/Apr/18 19:30
Worklog Time Spent: 10m 
  Work Description: robertwb commented on a change in pull request #4387: 
[BEAM-2732] Metrics rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#discussion_r181858771
 
 

 ##
 File path: sdks/python/apache_beam/runners/direct/direct_runner.py
 ##
 @@ -338,6 +337,7 @@ def run_pipeline(self, pipeline):
     from apache_beam.runners.direct.transform_evaluator import \
       TransformEvaluatorRegistry
     from apache_beam.testing.test_stream import TestStream
+    from apache_beam.metrics.execution import MetricsEnvironment
 
 Review comment:
   Or perhaps don't import metrics.execution when importing the public (write) 
Metrics API.




Issue Time Tracking
---

Worklog Id: (was: 91444)
Time Spent: 8h 40m  (was: 8.5h)



[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-16 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=91426=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-91426
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 16/Apr/18 18:26
Start Date: 16/Apr/18 18:26
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #4387: [BEAM-2732] Metrics 
rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#issuecomment-381702920
 
 
   Run Python PostCommit




Issue Time Tracking
---

Worklog Id: (was: 91426)
Time Spent: 8h  (was: 7h 50m)



[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-16 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=91421=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-91421
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 16/Apr/18 17:55
Start Date: 16/Apr/18 17:55
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #4387: [BEAM-2732] Metrics 
rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#issuecomment-381693446
 
 
   Retest this please.




Issue Time Tracking
---

Worklog Id: (was: 91421)
Time Spent: 7h 50m  (was: 7h 40m)



[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-16 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=91420=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-91420
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 16/Apr/18 17:55
Start Date: 16/Apr/18 17:55
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #4387: [BEAM-2732] Metrics 
rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#issuecomment-381668215
 
 
   Retest this please




Issue Time Tracking
---

Worklog Id: (was: 91420)
Time Spent: 7h 40m  (was: 7.5h)



[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-16 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=91389=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-91389
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 16/Apr/18 16:37
Start Date: 16/Apr/18 16:37
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #4387: [BEAM-2732] Metrics 
rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#issuecomment-381656696
 
 
   Run Python PostCommit.




Issue Time Tracking
---

Worklog Id: (was: 91389)
Time Spent: 7.5h  (was: 7h 20m)



[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-16 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=91388=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-91388
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 16/Apr/18 16:36
Start Date: 16/Apr/18 16:36
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #4387: [BEAM-2732] Metrics 
rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#issuecomment-381668215
 
 
   Retest this please




Issue Time Tracking
---

Worklog Id: (was: 91388)
Time Spent: 7h 20m  (was: 7h 10m)



[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-16 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=91375=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-91375
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 16/Apr/18 16:00
Start Date: 16/Apr/18 16:00
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #4387: [BEAM-2732] Metrics 
rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#issuecomment-381656611
 
 
   Rebased change.




Issue Time Tracking
---

Worklog Id: (was: 91375)
Time Spent: 7h  (was: 6h 50m)



[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-16 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=91376=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-91376
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 16/Apr/18 16:00
Start Date: 16/Apr/18 16:00
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #4387: [BEAM-2732] Metrics 
rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#issuecomment-381656696
 
 
   Run Python PostCommit.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 91376)
Time Spent: 7h 10m  (was: 7h)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> e.g logging and metrics keep state separately. State tracking should be 
> unified.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-12 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=90480&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90480
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 12/Apr/18 16:44
Start Date: 12/Apr/18 16:44
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #4387: [BEAM-2732] Metrics 
rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#issuecomment-380870668
 
 
   @robertwb PTAL


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 90480)
Time Spent: 6h 50m  (was: 6h 40m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> e.g logging and metrics keep state separately. State tracking should be 
> unified.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-11 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=90214&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90214
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 11/Apr/18 22:53
Start Date: 11/Apr/18 22:53
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #4387: [BEAM-2732] Metrics 
rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#issuecomment-380620750
 
 
   Run Python PostCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 90214)
Time Spent: 6h 40m  (was: 6.5h)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> e.g logging and metrics keep state separately. State tracking should be 
> unified.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-11 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=90170&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90170
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 11/Apr/18 21:17
Start Date: 11/Apr/18 21:17
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #4387: [BEAM-2732] Metrics 
rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#issuecomment-380598583
 
 
   Rebased change.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 90170)
Time Spent: 6.5h  (was: 6h 20m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> e.g logging and metrics keep state separately. State tracking should be 
> unified.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-04 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=87836&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87836
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 05/Apr/18 00:32
Start Date: 05/Apr/18 00:32
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #4387: [BEAM-2732] Metrics 
rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#issuecomment-378786407
 
 
   Python Postcommit tests are passing. I'm not sure why the `mvn clean install 
-pl sdks/python -am -am...` checks are broken, as they are a subset of the 
Postcommit suite:
   
   
![image](https://user-images.githubusercontent.com/1301740/38341543-00e4e738-382e-11e8-9b24-35e45dffbc10.png)
   
   Since all tests pass with Postcommit, @robertwb PTAL.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 87836)
Time Spent: 6h 20m  (was: 6h 10m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> e.g logging and metrics keep state separately. State tracking should be 
> unified.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-03 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=87283&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87283
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 03/Apr/18 21:23
Start Date: 03/Apr/18 21:23
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #4387: [BEAM-2732] Metrics 
rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#issuecomment-378403789
 
 
   @pabloem I looked at the post commit tests. They seem to be failing because 
of a quota issue.
   
   ```
   
   Workflow failed. Causes: Project apache-beam-testing has insufficient 
quota(s) to execute this workflow with 1 instances in region us-central1. Quota 
summary (required/available): 1/1441 instances, 1/0 CPUs, 250/14440 disk GB, 
0/1998 SSD disk GB, 1/63 instance groups, 1/13 managed instance groups, 1/39 
instance templates, 1/293 in-use IP addresses.
   
   Please see https://cloud.google.com/compute/docs/resource-quotas about 
requesting more quota.
   ```
   @alanmyrvold Is it possible to increase the quota for the project?
   
   Also, @markflyhigh is currently working on fixing an issue with 
`test_streaming_wordcount_it` (other than the timeout). That would probably fix 
your test issue as well.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 87283)
Time Spent: 5h 50m  (was: 5h 40m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> e.g logging and metrics keep state separately. State tracking should be 
> unified.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-03 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=87253&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87253
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 03/Apr/18 20:20
Start Date: 03/Apr/18 20:20
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #4387: [BEAM-2732] Metrics 
rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#issuecomment-378383669
 
 
   `test_streaming_wordcount_it` seems to be timing out, but not showing any 
errors related to my change. Mark also reports that it has been timing out. 
@robertwb can you take a look please?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 87253)
Time Spent: 5h 40m  (was: 5.5h)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> e.g logging and metrics keep state separately. State tracking should be 
> unified.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-03 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=87252&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87252
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 03/Apr/18 20:19
Start Date: 03/Apr/18 20:19
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #4387: [BEAM-2732] Metrics 
rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#issuecomment-378378390
 
 
   Run Python PostCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 87252)
Time Spent: 5.5h  (was: 5h 20m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> e.g logging and metrics keep state separately. State tracking should be 
> unified.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-03 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=87235&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87235
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 03/Apr/18 20:05
Start Date: 03/Apr/18 20:05
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #4387: [BEAM-2732] Metrics 
rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#issuecomment-378378390
 
 
   Run Python PostCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 87235)
Time Spent: 5h 20m  (was: 5h 10m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> e.g logging and metrics keep state separately. State tracking should be 
> unified.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-03 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=87206&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87206
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 03/Apr/18 19:01
Start Date: 03/Apr/18 19:01
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #4387: [BEAM-2732] Metrics 
rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#issuecomment-378360678
 
 
   All integration tests but `test_streaming_wordcount_it` are passing. I'll 
try to fix that.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 87206)
Time Spent: 5h 10m  (was: 5h)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> e.g logging and metrics keep state separately. State tracking should be 
> unified.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-03 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=87127&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87127
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 03/Apr/18 15:45
Start Date: 03/Apr/18 15:45
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #4387: [BEAM-2732] Metrics 
rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#issuecomment-377969909
 
 
   Run Python PostCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 87127)
Time Spent: 4.5h  (was: 4h 20m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> e.g logging and metrics keep state separately. State tracking should be 
> unified.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-03 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=87130&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87130
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 03/Apr/18 15:45
Start Date: 03/Apr/18 15:45
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #4387: [BEAM-2732] Metrics 
rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#issuecomment-367066658
 
 
   Rebased


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 87130)
Time Spent: 5h  (was: 4h 50m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> e.g logging and metrics keep state separately. State tracking should be 
> unified.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-03 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=87129&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87129
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 03/Apr/18 15:45
Start Date: 03/Apr/18 15:45
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #4387: [BEAM-2732] Metrics 
rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#issuecomment-378054412
 
 
   Run Python PostCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 87129)
Time Spent: 4h 50m  (was: 4h 40m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> e.g logging and metrics keep state separately. State tracking should be 
> unified.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-03 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=87128&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87128
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 03/Apr/18 15:45
Start Date: 03/Apr/18 15:45
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #4387: [BEAM-2732] Metrics 
rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#issuecomment-367423884
 
 
   Run Python PostCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 87128)
Time Spent: 4h 40m  (was: 4.5h)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> e.g logging and metrics keep state separately. State tracking should be 
> unified.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-03 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=87126&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-87126
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 03/Apr/18 15:44
Start Date: 03/Apr/18 15:44
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #4387: [BEAM-2732] Metrics 
rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#issuecomment-378297046
 
 
   Run Python PostCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 87126)
Time Spent: 4h 20m  (was: 4h 10m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> e.g logging and metrics keep state separately. State tracking should be 
> unified.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-02 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=86879&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-86879
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 03/Apr/18 00:20
Start Date: 03/Apr/18 00:20
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #4387: [BEAM-2732] Metrics 
rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#issuecomment-378088637
 
 
   Run Python PostCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 86879)
Time Spent: 4h 10m  (was: 4h)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> e.g logging and metrics keep state separately. State tracking should be 
> unified.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-02 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=86789&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-86789
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 02/Apr/18 21:36
Start Date: 02/Apr/18 21:36
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #4387: [BEAM-2732] Metrics 
rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#issuecomment-378054412
 
 
   Run Python PostCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 86789)
Time Spent: 4h  (was: 3h 50m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> e.g logging and metrics keep state separately. State tracking should be 
> unified.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-02 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=86630&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-86630
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 02/Apr/18 16:27
Start Date: 02/Apr/18 16:27
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #4387: [BEAM-2732] Metrics 
rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#issuecomment-377969909
 
 
   Run Python PostCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 86630)
Time Spent: 3h 20m  (was: 3h 10m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> e.g logging and metrics keep state separately. State tracking should be 
> unified.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-02 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=86631&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-86631
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 02/Apr/18 16:27
Start Date: 02/Apr/18 16:27
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #4387: [BEAM-2732] Metrics 
rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#issuecomment-376212792
 
 
   Run Python PostCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 86631)
Time Spent: 3.5h  (was: 3h 20m)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> e.g logging and metrics keep state separately. State tracking should be 
> unified.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-2732) State tracking in Python is inefficient and has duplicated code

2018-04-02 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-2732?focusedWorklogId=86632&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-86632
 ]

ASF GitHub Bot logged work on BEAM-2732:


Author: ASF GitHub Bot
Created on: 02/Apr/18 16:27
Start Date: 02/Apr/18 16:27
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #4387: [BEAM-2732] Metrics 
rely on statesampler state
URL: https://github.com/apache/beam/pull/4387#issuecomment-377634927
 
 
   Run Python PostCommit


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 86632)
Time Spent: 3h 40m  (was: 3.5h)

> State tracking in Python is inefficient and has duplicated code
> ---
>
> Key: BEAM-2732
> URL: https://issues.apache.org/jira/browse/BEAM-2732
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> e.g logging and metrics keep state separately. State tracking should be 
> unified.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

