[jira] [Work logged] (BEAM-4776) Java PortableRunner should support metrics

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4776?focusedWorklogId=348003&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348003
 ]

ASF GitHub Bot logged work on BEAM-4776:


Author: ASF GitHub Bot
Created on: 22/Nov/19 08:43
Start Date: 22/Nov/19 08:43
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #10105: [BEAM-4776] 
Add metrics support to Java PortableRunner
URL: https://github.com/apache/beam/pull/10105#discussion_r349483248
 
 

 ##
 File path: runners/flink/job-server/flink_job_server.gradle
 ##
 @@ -139,12 +139,9 @@ def portableValidatesRunnerTask(String name, Boolean streaming) {
   includeCategories 'org.apache.beam.sdk.testing.ValidatesRunner'
   excludeCategories 'org.apache.beam.sdk.testing.FlattenWithHeterogeneousCoders'
   excludeCategories 'org.apache.beam.sdk.testing.LargeKeys$Above100MB'
-  excludeCategories 'org.apache.beam.sdk.testing.UsesAttemptedMetrics'
 
 Review comment:
   Not a blocker for this PR, but out of curiosity: do these tests pass when 
enabled in the Portable Spark Runner? If so, it would be a good idea to enable 
them there too; if not, report the errors so they can be fixed.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 348003)
Time Spent: 6h 10m  (was: 6h)

> Java PortableRunner should support metrics
> --
>
> Key: BEAM-4776
> URL: https://issues.apache.org/jira/browse/BEAM-4776
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Eugene Kirpichov
>Assignee: Michal Walenia
>Priority: Major
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> BEAM-4775 concerns adding metrics to the JobService API; the current issue is 
> about making PortableRunner understand them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8746) Allow the local job service to work from inside docker

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8746?focusedWorklogId=348037&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348037
 ]

ASF GitHub Bot logged work on BEAM-8746:


Author: ASF GitHub Bot
Created on: 22/Nov/19 09:56
Start Date: 22/Nov/19 09:56
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #10161: [BEAM-8746] Make 
local job service accessible from external machines
URL: https://github.com/apache/beam/pull/10161#discussion_r349514933
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/local_job_service.py
 ##
 @@ -95,7 +95,7 @@ def create_beam_job(self, preparation_id, job_name, pipeline, options):
 
   def start_grpc_server(self, port=0):
     self._server = grpc.server(UnboundedThreadPoolExecutor())
-    port = self._server.add_insecure_port('localhost:%d' % port)
+    port = self._server.add_insecure_port('[::]:%d' % port)
 
 Review comment:
   It looks ok, but one idea:
   
   >there are two separate hostnames required, one for opening the port for the 
server, and one which is delivered to the client for reconnecting to the 
staging service.
   
   Wouldn't it work to allow configuration of a `bind` address (e.g. `[::]`) 
and a `connect` address (e.g. `service_name`, what is currently returned in 
`get_hostname()`)? That way we would give the subclass full control over what 
it wants to do. Maybe return this as a tuple to avoid multiple overrides.
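
The bind/connect split suggested in this comment could be sketched roughly as follows. This is a minimal illustration, not Beam's actual API: the class names, `get_bind_and_connect_hosts()` and `addresses()` are hypothetical, and the sketch only builds the address strings instead of starting a gRPC server.

```python
class LocalJobServiceSketch(object):
    """Hypothetical stand-in for the local job service (not Beam's class)."""

    def get_bind_and_connect_hosts(self):
        # Hypothetical hook: one override point that returns both the host
        # the server listens on and the host handed back to clients.
        return '[::]', 'localhost'

    def addresses(self, port):
        # Build the two 'host:port' strings from the single tuple.
        bind_host, connect_host = self.get_bind_and_connect_hosts()
        return '%s:%d' % (bind_host, port), '%s:%d' % (connect_host, port)


class DockerJobServiceSketch(LocalJobServiceSketch):
    """Hypothetical subclass for a service reachable by a container name."""

    def get_bind_and_connect_hosts(self):
        # Listen on all interfaces, but advertise the service name to clients.
        return '[::]', 'service_name'


bind_address, connect_address = DockerJobServiceSketch().addresses(8099)
```

Returning both hosts from one method gives a subclass a single override point, which is the reason given above for the tuple.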
 



Issue Time Tracking
---

Worklog Id: (was: 348037)
Time Spent: 1h 40m  (was: 1.5h)

> Allow the local job service to work from inside docker
> --
>
> Key: BEAM-8746
> URL: https://issues.apache.org/jira/browse/BEAM-8746
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Currently the connection is refused.  It's a simple fix. 





[jira] [Work logged] (BEAM-7636) Migrate SqsIO to AWS SDK for Java 2

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7636?focusedWorklogId=348045&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348045
 ]

ASF GitHub Bot logged work on BEAM-7636:


Author: ASF GitHub Bot
Created on: 22/Nov/19 10:20
Start Date: 22/Nov/19 10:20
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on pull request #9935: 
[BEAM-7636] Migrate SqsIO to AWS SDK for Java 2
URL: https://github.com/apache/beam/pull/9935
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 348045)
Time Spent: 2h 20m  (was: 2h 10m)

> Migrate SqsIO to AWS SDK for Java 2
> ---
>
> Key: BEAM-7636
> URL: https://issues.apache.org/jira/browse/BEAM-7636
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-aws
>Reporter: Ismaël Mejía
>Assignee: Cam Mach
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (BEAM-7636) Migrate SqsIO to AWS SDK V2 for Java

2019-11-22 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-7636:
---
Summary: Migrate SqsIO to AWS SDK V2 for Java  (was: Migrate SqsIO to AWS 
SDK for Java 2)

> Migrate SqsIO to AWS SDK V2 for Java
> 
>
> Key: BEAM-7636
> URL: https://issues.apache.org/jira/browse/BEAM-7636
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-aws
>Reporter: Ismaël Mejía
>Assignee: Cam Mach
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (BEAM-7636) Migrate SqsIO to AWS SDK V2 for Java

2019-11-22 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko resolved BEAM-7636.

Fix Version/s: 2.18.0
   Resolution: Fixed

> Migrate SqsIO to AWS SDK V2 for Java
> 
>
> Key: BEAM-7636
> URL: https://issues.apache.org/jira/browse/BEAM-7636
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-aws
>Reporter: Ismaël Mejía
>Assignee: Cam Mach
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (BEAM-7636) Migrate SqsIO to AWS SDK for Java 2

2019-11-22 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-7636:
---
Summary: Migrate SqsIO to AWS SDK for Java 2  (was: Migrate SqsIO to AWS 
SDK V2 for Java)

> Migrate SqsIO to AWS SDK for Java 2
> ---
>
> Key: BEAM-7636
> URL: https://issues.apache.org/jira/browse/BEAM-7636
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-aws
>Reporter: Ismaël Mejía
>Assignee: Cam Mach
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>






[jira] [Commented] (BEAM-8512) Add integration tests for Python "flink_runner.py"

2019-11-22 Thread Maximilian Michels (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980070#comment-16980070
 ] 

Maximilian Michels commented on BEAM-8512:
--

This can be used as the embedded testing cluster with the REST API: 
https://github.com/apache/beam/compare/master...mxm:flink-mini-cluster?expand=1

> Add integration tests for Python "flink_runner.py"
> --
>
> Key: BEAM-8512
> URL: https://issues.apache.org/jira/browse/BEAM-8512
> Project: Beam
>  Issue Type: Test
>  Components: runner-flink, sdk-py-core
>Reporter: Maximilian Michels
>Assignee: Kyle Weaver
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> There are currently no integration tests for the Python FlinkRunner. We need 
> a set of tests similar to {{flink_runner_test.py}}, which currently uses the 
> PortableRunner and not the FlinkRunner.
> CC [~robertwb] [~ibzib] [~thw]





[jira] [Work logged] (BEAM-8619) Tear down the DoFns upon the control service termination in Java SDK harness

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8619?focusedWorklogId=348104&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348104
 ]

ASF GitHub Bot logged work on BEAM-8619:


Author: ASF GitHub Bot
Created on: 22/Nov/19 13:02
Start Date: 22/Nov/19 13:02
Worklog Time Spent: 10m 
  Work Description: sunjincheng121 commented on pull request #10126: 
[BEAM-8619] Tear down the DoFns upon the control service termination …
URL: https://github.com/apache/beam/pull/10126#discussion_r349584586
 
 

 ##
 File path: 
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/GaugeCell.java
 ##
 @@ -50,6 +50,12 @@ public GaugeCell(MetricName name) {
     this.name = name;
   }
 
+  @Override
+  public void reset() {
+    dirty.afterModification();
 
 Review comment:
   A newly constructed DirtyState object is DIRTY by default 
(https://github.com/apache/beam/blob/e51aa5f978c05f094310d67c4704017009121948/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/DirtyState.java#L55).
 If reset should leave it CLEAN, I think we should also change that default. 
However, I'm not sure whether the default is by design, so I have not changed 
it for now. What do you think?
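
To make the question concrete, here is a simplified Python model of the behaviour under discussion. It is only a sketch: the two-state `DirtyStateSketch`, the `mark_committed()` helper and `GaugeCellSketch` are hypothetical stand-ins, not ports of Beam's Java classes, and clearing the value in `reset()` is an assumption about what a reset does.

```python
DIRTY = 'DIRTY'
CLEAN = 'CLEAN'


class DirtyStateSketch(object):
    """Simplified two-state model (assumption: not Beam's DirtyState)."""

    def __init__(self):
        # As noted in the comment, a new object starts out DIRTY.
        self.state = DIRTY

    def after_modification(self):
        self.state = DIRTY

    def mark_committed(self):
        # Hypothetical helper standing in for a successful commit.
        self.state = CLEAN


class GaugeCellSketch(object):
    """Hypothetical gauge cell holding one value and its dirty flag."""

    def __init__(self):
        self.dirty = DirtyStateSketch()
        self.value = None

    def set(self, value):
        self.value = value
        self.dirty.after_modification()

    def reset(self):
        # Assumed reset semantics: clear the value, then mark the cell dirty
        # (as in the diff), so it matches a freshly constructed cell.
        self.value = None
        self.dirty.after_modification()


cell = GaugeCellSketch()
cell.set(5)
cell.dirty.mark_committed()
cell.reset()
```

With this choice a reset cell and a new cell are indistinguishable; making reset leave the cell CLEAN would only be consistent if new cells also started CLEAN.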
 



Issue Time Tracking
---

Worklog Id: (was: 348104)
Time Spent: 1h 40m  (was: 1.5h)

> Tear down the DoFns upon the control service termination in Java SDK harness
> 
>
> Key: BEAM-8619
> URL: https://issues.apache.org/jira/browse/BEAM-8619
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-harness
>Affects Versions: 2.18.0
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Per the discussion on the mailing list (details in [1]), the teardown of 
> DoFns should be supported in the portability framework. It happens in two 
> places:
> 1) Upon the control service termination
> 2) Tear down the unused DoFns periodically
> The aim of this JIRA is to add support for tearing down the DoFns upon the 
> control service termination in the Java SDK harness.
> [1] 
> https://lists.apache.org/thread.html/0c4a4cf83cf2e35c3dfeb9d906e26cd82d3820968ba6f862f91739e4@%3Cdev.beam.apache.org%3E





[jira] [Work logged] (BEAM-8619) Tear down the DoFns upon the control service termination in Java SDK harness

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8619?focusedWorklogId=348126&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348126
 ]

ASF GitHub Bot logged work on BEAM-8619:


Author: ASF GitHub Bot
Created on: 22/Nov/19 13:17
Start Date: 22/Nov/19 13:17
Worklog Time Spent: 10m 
  Work Description: sunjincheng121 commented on issue #10126: [BEAM-8619] 
Tear down the DoFns upon the control service termination …
URL: https://github.com/apache/beam/pull/10126#issuecomment-557527718
 
 
   Thanks for the review. I have updated the PR accordingly. @lukecwik 
   There is only one comment I am not quite sure about, and I have left you a message about it :)
   I would appreciate it if you could have another look. 
   Best,
   Jincheng
 



Issue Time Tracking
---

Worklog Id: (was: 348126)
Time Spent: 1h 50m  (was: 1h 40m)

> Tear down the DoFns upon the control service termination in Java SDK harness
> 
>
> Key: BEAM-8619
> URL: https://issues.apache.org/jira/browse/BEAM-8619
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-harness
>Affects Versions: 2.18.0
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>





[jira] [Commented] (BEAM-8198) Investigate possible performance regression of Wordcount 1GB batch benchmark on Py3.

2019-11-22 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980190#comment-16980190
 ] 

Brian Hulette commented on BEAM-8198:
-

Thomas definitely 
[said|https://lists.apache.org/thread.html/079413d5700a89fb974cf690d996d8b663b4251dcb0f5efa4f1d72da@%3Cdev.beam.apache.org%3E]
 this was just a problem with cython setup:
{quote}I found that the regression was caused by our own Cython setup.{quote}

So I think this is safe to close. He did have a 
[suggestion|https://lists.apache.org/thread.html/efab49bb1938780681d82771924b0cb0398d7a3e6bebf4b0ffad5a6b@%3Cdev.beam.apache.org%3E]
 to better document Cython setup, but that could be another JIRA if we want to 
track it.

> Investigate possible performance regression of Wordcount 1GB batch benchmark 
> on Py3.
> 
>
> Key: BEAM-8198
> URL: https://issues.apache.org/jira/browse/BEAM-8198
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, testing
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Major
> Fix For: 2.17.0
>
>
> context: 
> https://lists.apache.org/thread.html/51e000f16481451c207c00ac5e881aa4a46fa020922eddffd00ad527@%3Cdev.beam.apache.org%3E
> Setting fix version to 2.16.0 to understand the cause, hopefully before the 
> vote.
> cc: [~altay] [~thw] [~markflyhigh]





[jira] [Created] (BEAM-8806) Add integration tests for SqsIO

2019-11-22 Thread Alexey Romanenko (Jira)
Alexey Romanenko created BEAM-8806:
--

 Summary: Add integration tests for SqsIO
 Key: BEAM-8806
 URL: https://issues.apache.org/jira/browse/BEAM-8806
 Project: Beam
  Issue Type: Improvement
  Components: io-java-aws
Reporter: Alexey Romanenko








[jira] [Created] (BEAM-8807) Add integration tests for SnsIO

2019-11-22 Thread Alexey Romanenko (Jira)
Alexey Romanenko created BEAM-8807:
--

 Summary: Add integration tests for SnsIO
 Key: BEAM-8807
 URL: https://issues.apache.org/jira/browse/BEAM-8807
 Project: Beam
  Issue Type: Improvement
  Components: io-java-aws
Reporter: Alexey Romanenko








[jira] [Updated] (BEAM-8806) Add integration tests for SqsIO

2019-11-22 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-8806:
---
Priority: Major  (was: Minor)

> Add integration tests for SqsIO
> ---
>
> Key: BEAM-8806
> URL: https://issues.apache.org/jira/browse/BEAM-8806
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-aws
>Reporter: Alexey Romanenko
>Priority: Major
>






[jira] [Updated] (BEAM-8807) Add integration tests for SnsIO

2019-11-22 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-8807:
---
Status: Open  (was: Triage Needed)

> Add integration tests for SnsIO
> ---
>
> Key: BEAM-8807
> URL: https://issues.apache.org/jira/browse/BEAM-8807
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-aws
>Reporter: Alexey Romanenko
>Priority: Major
>






[jira] [Updated] (BEAM-8806) Add integration tests for SqsIO

2019-11-22 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-8806:
---
Status: Open  (was: Triage Needed)

> Add integration tests for SqsIO
> ---
>
> Key: BEAM-8806
> URL: https://issues.apache.org/jira/browse/BEAM-8806
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-aws
>Reporter: Alexey Romanenko
>Priority: Major
>






[jira] [Commented] (BEAM-8732) Add support for additional structured types to Schemas/RowCoders

2019-11-22 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980206#comment-16980206
 ] 

Brian Hulette commented on BEAM-8732:
-

I'm definitely fine with you taking this on, but I'm not sure that calling 
{{typing_to_runner_api()}} is all that needs to happen. Since that function 
generates a random UUID for the schema, I think it'll get a different UUID when 
executed at pipeline construction time than when executed at pipeline execution 
time (on the workers), so the workers won't be able to look up the correct 
class in the registry and will generate one instead. Perhaps one solution would 
be to make the UUID generation deterministic somehow? But I'm not sure how 
feasible that is.
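
One way to read the "deterministic UUID" idea is to derive the ID from the type's structure instead of calling `uuid4()`. The sketch below only illustrates that assumption: `deterministic_schema_id`, the namespace string and the (name, type) field representation are hypothetical and not part of Beam.

```python
import uuid

# Hypothetical fixed namespace; any constant UUID would serve the same purpose.
SCHEMA_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, 'schemas.example.invalid')


def deterministic_schema_id(type_name, fields):
    """Return the same ID for the same type name and ordered field list.

    `fields` is assumed to be a list of (field_name, field_type_name) pairs.
    """
    canonical = '%s(%s)' % (
        type_name, ','.join('%s:%s' % (name, kind) for name, kind in fields))
    # uuid5 hashes the namespace and name, so construction-time and
    # worker-side calls agree as long as they see the same structure.
    return str(uuid.uuid5(SCHEMA_NAMESPACE, canonical))


construction_id = deterministic_schema_id(
    'Point', [('x', 'INT64'), ('y', 'INT64')])
worker_id = deterministic_schema_id(
    'Point', [('x', 'INT64'), ('y', 'INT64')])
```

A structural ID like this makes two distinct classes with identical fields share one ID, so whether it is feasible depends on how the registry is meant to distinguish them.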

In general the solution to this problem in the Java SDK has been to include 
serialized java classes in the pipeline graph, e.g. 
[SchemaCoder|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/SchemaCoder.java]
 is currently just represented as a big serialized class including the user 
class and functions for converting to/from Row. We also currently allow for 
[java-specific logical 
types|https://github.com/apache/beam/blob/07d952f313477ee18cdc706100ba7e1810b1ef4f/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SchemaTranslation.java#L91-L102]
 which serialize a class for converting to/from user types.

cc: [~reuvenlax] who is more familiar with the Java side of this than me.

> Add support for additional structured types to Schemas/RowCoders
> 
>
> Key: BEAM-8732
> URL: https://issues.apache.org/jira/browse/BEAM-8732
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Priority: Major
>
> Currently we can convert between a {{NamedTuple}} type and its {{Schema}} 
> protos using {{named_tuple_from_schema}} and {{named_tuple_to_schema}}. I'd 
> like to introduce a system to support additional types, starting with 
> structured types like {{attrs}}, {{dataclasses}}, and {{TypedDict}}.
> I've only just started digesting the code, but this task seems pretty 
> straightforward. For example, I think the type-to-schema code would look 
> roughly like this:
> {code:python}
> def typing_to_runner_api(type_):
>   # type: (Type) -> schema_pb2.FieldType
>   structured_handler = _get_structured_handler(type_)
>   if structured_handler:
>     schema = None
>     if hasattr(type_, 'id'):
>       schema = SCHEMA_REGISTRY.get_schema_by_id(type_.id)
>     if schema is None:
>       fields = structured_handler.get_fields()
>       type_id = str(uuid4())
>       schema = schema_pb2.Schema(fields=fields, id=type_id)
>       SCHEMA_REGISTRY.add(type_, schema)
>     return schema_pb2.FieldType(
>         row_type=schema_pb2.RowType(
>             schema=schema))
> The rest of the work would be in implementing a class hierarchy for working 
> with structured types, such as getting a list of fields from an instance, and 
> instantiation from a list of fields. Eventually we can extend this behavior 
> to arbitrary, unstructured types.  
> Going in the schema-to-type direction, we have the problem of choosing which 
> type to use for a given schema. I believe that as long as 
> {{typing_to_runner_api()}} has been called on our structured type in the 
> current python session, it should be added to the registry and thus round 
> trip ok, so I think we just need a public function for registering schemas 
> for structured types.
> [~bhulette] Did you want to tackle this or are you ok with me going after it?
>  





[jira] [Work logged] (BEAM-8747) Remove Unused non-vendored Guava compile dependencies

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8747?focusedWorklogId=348191&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348191
 ]

ASF GitHub Bot logged work on BEAM-8747:


Author: ASF GitHub Bot
Created on: 22/Nov/19 15:35
Start Date: 22/Nov/19 15:35
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on pull request #10172: 
[BEAM-8747] Guava dependency cleanup
URL: https://github.com/apache/beam/pull/10172#discussion_r349656312
 
 

 ##
 File path: sdks/java/io/kinesis/build.gradle
 ##
 @@ -34,6 +34,7 @@ dependencies {
   compile library.java.slf4j_api
   compile library.java.joda_time
   compile library.java.jackson_dataformat_cbor
+  compile library.java.guava
 
 Review comment:
   @kennknowles Thank you for the details. Yes, there is a known discrepancy 
between the Guava versions used in different Kinesis libs, and I believe we test it 
sufficiently with unit and integration tests. Though, we still have a task to 
move the IT into the Apache Jenkins env and run it there (the problem is that we run it 
against real AWS instances, but that's another question, not related to this PR).
   
   Back to the initial question, to make it clear: I don't insist, I just asked to 
clarify =) And I agree that since we know that it's working we should not worry 
too much about that. The Kinesis client library is [open 
source](https://github.com/awslabs/amazon-kinesis-client), so yes, we can check 
it if needed.
 



Issue Time Tracking
---

Worklog Id: (was: 348191)
Time Spent: 2h 10m  (was: 2h)

> Remove Unused non-vendored Guava compile dependencies
> -
>
> Key: BEAM-8747
> URL: https://issues.apache.org/jira/browse/BEAM-8747
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Major
> Attachments: Guava used as fully-qualified class name.png
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> [~kenn] says:
> BeamModulePlugin just contains lists of versions to ease coordination across 
> Beam modules, but mostly does not create dependencies. Most of Beam's modules 
> only depend on a few things there. For example Guava is not a core 
> dependency, but here is where it is actually depended upon:
> $ find . -name build.gradle | xargs grep library.java.guava
> ./sdks/java/core/build.gradle:  shadowTest library.java.guava_testlib
> ./sdks/java/extensions/sql/jdbc/build.gradle:  compile library.java.guava
> ./sdks/java/io/google-cloud-platform/build.gradle:  compile library.java.guava
> ./sdks/java/io/kinesis/build.gradle:  testCompile library.java.guava_testlib
> These results appear to be misleading. Grepping for 'import 
> com.google.common', I see this as the actual state of things:
>  - GCP connector does not appear to actually depend on Guava in compile scope
>  - The Beam SQL JDBC driver does not appear to actually depend on Guava in 
> compile scope
>  - The Dataflow Java worker does depend on Guava at compile scope but has 
> incorrect dependencies (and it probably shouldn't)
>  - KinesisIO does depend on Guava at compile scope but has incorrect 
> dependencies (Kinesis libs have Guava on API surface so it is OK here, but 
> should be correctly declared)
>  - ZetaSQL translator does depend on Guava at compile scope but has incorrect 
> dependencies (ZetaSQL has it on API surface so it is OK here, but should be 
> correctly declared)
> We used to have an analysis that prevented this class of error.
> Once the errors are fixed, the guava_version is simply a version that we have 
> discovered that seems to work for both Kinesis and ZetaSQL, libraries we do 
> not control. Kinesis producer is built against 18.0. Kinesis client against 
> 26.0-jre. ZetaSQL against 26.0-android.
> (or maybe I messed up in my analysis)
> Kenn





[jira] [Work logged] (BEAM-8747) Remove Unused non-vendored Guava compile dependencies

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8747?focusedWorklogId=348194&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348194
 ]

ASF GitHub Bot logged work on BEAM-8747:


Author: ASF GitHub Bot
Created on: 22/Nov/19 15:42
Start Date: 22/Nov/19 15:42
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on pull request #10172: 
[BEAM-8747] Guava dependency cleanup
URL: https://github.com/apache/beam/pull/10172#discussion_r349659798
 
 

 ##
 File path: sdks/java/io/kinesis/build.gradle
 ##
 @@ -34,6 +34,7 @@ dependencies {
   compile library.java.slf4j_api
   compile library.java.joda_time
   compile library.java.jackson_dataformat_cbor
+  compile library.java.guava
 
 Review comment:
   So, it's LGTM for KinesisIO from my side
 



Issue Time Tracking
---

Worklog Id: (was: 348194)
Time Spent: 2h 20m  (was: 2h 10m)

> Remove Unused non-vendored Guava compile dependencies
> -
>
> Key: BEAM-8747
> URL: https://issues.apache.org/jira/browse/BEAM-8747
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Major
> Attachments: Guava used as fully-qualified class name.png
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (BEAM-7278) Upgrade some Beam dependencies

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7278?focusedWorklogId=348196&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348196
 ]

ASF GitHub Bot logged work on BEAM-7278:


Author: ASF GitHub Bot
Created on: 22/Nov/19 15:46
Start Date: 22/Nov/19 15:46
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #10184: [BEAM-7278, 
BEAM-2530] Add support for using a Java linkage testing tool to aid upgrading 
dependencies.
URL: https://github.com/apache/beam/pull/10184#issuecomment-557583841
 
 
   @kennknowles 
   
   > I have opinions about maybe more user friendly ways to do it but with 
unsolved gaps so I might just be having wishful thinking.
   
   If that's for Linkage Checker, I'd appreciate it if you could file your thoughts at 
  https://github.com/GoogleCloudPlatform/cloud-opensource-java/issues .
 



Issue Time Tracking
---

Worklog Id: (was: 348196)
Time Spent: 3h 20m  (was: 3h 10m)

> Upgrade some Beam dependencies
> --
>
> Key: BEAM-7278
> URL: https://issues.apache.org/jira/browse/BEAM-7278
> Project: Beam
>  Issue Type: Task
>  Components: dependencies
>Reporter: Etienne Chauchot
>Assignee: Mujuzi Moses
>Priority: Critical
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Some dependencies need to be upgraded.





[jira] [Work logged] (BEAM-8523) Add useful timestamp to job servicer GetJobs

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8523?focusedWorklogId=348197&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348197
 ]

ASF GitHub Bot logged work on BEAM-8523:


Author: ASF GitHub Bot
Created on: 22/Nov/19 15:49
Start Date: 22/Nov/19 15:49
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9959: [BEAM-8523] JobAPI: 
Give access to timestamped state change history
URL: https://github.com/apache/beam/pull/9959#issuecomment-557584883
 
 
   R: @mxm 
   R: @lukecwik
   
 



Issue Time Tracking
---

Worklog Id: (was: 348197)
Time Spent: 2.5h  (was: 2h 20m)

> Add useful timestamp to job servicer GetJobs
> 
>
> Key: BEAM-8523
> URL: https://issues.apache.org/jira/browse/BEAM-8523
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> As a user querying jobs with JobService.GetJobs, it would be useful if the 
> JobInfo result contained timestamps indicating various state changes that may 
> have been missed by a client.   Useful timestamps include:
>  
>  * submitted (prepared to the job service)
>  * started (executor enters the RUNNING state)
>  * completed (executor enters a terminal state)
>  
>  





[jira] [Comment Edited] (BEAM-8787) Python setup issues

2019-11-22 Thread Tomo Suzuki (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979785#comment-16979785
 ] 

Tomo Suzuki edited comment on BEAM-8787 at 11/22/19 4:58 PM:
-

h1. Problem

In my environment, Python3.6 was missing the required module 
{{distutils.sysconfig}}, and the latest python3-distutils does not support 
Python3.6.

h1. Solution

Build Python3.6 from the source:
{noformat}
suztomo@suxtomo24:/tmp$ sudo apt-get install libbz2-dev # This is needed for 
Python3.6's _bz2 module

suztomo@suxtomo24:/tmp$ git clone --branch v3.6.8 
https://github.com/python/cpython.git
...
suztomo@suxtomo24:/tmp$ cd cpython
suztomo@suxtomo24:/tmp/cpython$ git status
Not currently on any branch.
nothing to commit, working tree clean
suztomo@suxtomo24:/tmp/cpython$ git log -1
commit 3c6b436a57893dd1fae4e072768f41a199076252 (HEAD, tag: v3.6.8)
Author: Ned Deily 
Date:   Sun Dec 23 16:37:14 2018 -0500

3.6.8final
suztomo@suxtomo24:/tmp/cpython$ ./configure --prefix=$HOME/local # pick up your 
preference
...
suztomo@suxtomo24:/tmp/cpython$ make install
{noformat}
Add the directory to the path with "/bin" appended. In {{~/.bashrc}}:
{noformat}
export PATH=$HOME/local/bin:$PATH
{noformat}
Now the distutils.sysconfig module is available for Python3.6:
{noformat}
suztomo@suxtomo24:/tmp/cpython$ python3.6
Python 3.6.8 (tags/v3.6.8:3c6b436a57, Nov 21 2019, 21:11:37) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from distutils import sysconfig
>>> 
{noformat}
 
Now {{:sdks:python:test-suites:tox:py35:setupVirtualenv}} succeeds

{noformat}
suztomo@suxtomo24:~/beam4$ ./gradlew -p sdks/python/test-suites/tox/py35 
setupVirtualenv
...
> Task :sdks:python:test-suites:tox:py35:setupVirtualenv
...
BUILD SUCCESSFUL in 5s
{noformat}



h2. testPy36Gcp failure

The {{:sdks:python:test-suites:tox:py36:testPy36Gcp}} was failing:
https://gist.github.com/suztomo/ebfc110652b8ffaf7fede64276d7a053

It seemed that the _bz2 module was missing from Python3.6. I followed 
[Stackoverflow: No module named '_bz2' in 
python3|https://stackoverflow.com/questions/20280726/how-to-git-clone-a-specific-tag/24102558].


h2. testPy35Cython failure


{noformat}
./gradlew :sdks:python:test-suites:tox:py35:testPy35Cython
...
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall 
-Wstrict-prototypes -g 
-fdebug-prefix-map=/build/python3.5-ta1Uke/python3.5-3.5.4=. 
-fstack-protector-strong -Wformat -Werror=format-security -Wdate-time 
-D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.5m 
-I/usr/local/google/home/suztomo/beam4/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/target/.tox-py35-cython/py35-cython/include/python3.5m
 -c apache_beam/coders/stream.c -o 
build/temp.linux-x86_64-3.5/apache_beam/coders/stream.o
apache_beam/coders/stream.c:17:10: fatal error: Python.h: No such file or 
directory
 #include "Python.h"
  ^~
compilation terminated.
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
...
FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':sdks:python:test-suites:tox:py35:testPy35Cython'.

{noformat}

Tried {{sudo apt-get install python3-dev}}, but it didn't help. Building 
Python3.5 from source resolved the issue.

{noformat}
suztomo@suxtomo24:/tmp/cpython$ git checkout v3.5.9 
suztomo@suxtomo24:/tmp/cpython$ ./configure --prefix=$HOME/local
suztomo@suxtomo24:/tmp/cpython$ make install
{noformat}




was (Author: suztomo):
h1. Problem

In my environment, Python3.6 was missing the required module 
{{distutils.sysconfig}}, and the latest python3-distutils does not support 
Python3.6.

h1. Solution

Build Python from the source:
{noformat}
suztomo@suxtomo24:/tmp$ sudo apt-get install libbz2-dev # This is needed for 
Python3.6's _bz2 module

suztomo@suxtomo24:/tmp$ git clone --branch v3.6.8 
https://github.com/python/cpython.git
...
suztomo@suxtomo24:/tmp$ cd cpython
suztomo@suxtomo24:/tmp/cpython$ git status
Not currently on any branch.
nothing to commit, working tree clean
suztomo@suxtomo24:/tmp/cpython$ git log -1
commit 3c6b436a57893dd1fae4e072768f41a199076252 (HEAD, tag: v3.6.8)
Author: Ned Deily 
Date:   Sun Dec 23 16:37:14 2018 -0500

3.6.8final
suztomo@suxtomo24:/tmp/cpython$ ./configure --prefix=$HOME/local # pick up your 
preference
...
suztomo@suxtomo24:/tmp/cpython$ make install
{noformat}
Add the directory to the path with "/bin" appended. In {{~/.bashrc}}:
{noformat}
export PATH=$HOME/local/bin:$PATH
{noformat}
Now the distutils.sysconfig module is available for Python3.6:
{noformat}
suztomo@suxtomo24:/tmp/cpython$ python3.6
Python 3.6.8 (tags/v3.6.8:3c6b436a57, Nov 21 2019, 21:11:37) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from distutils import sysconfig
>>> 
{noformat}
 

[jira] [Comment Edited] (BEAM-8787) Python setup issues

2019-11-22 Thread Tomo Suzuki (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979785#comment-16979785
 ] 

Tomo Suzuki edited comment on BEAM-8787 at 11/22/19 4:59 PM:
-

h1. Problem

In my environment, Python3.6 was missing the required module 
{{distutils.sysconfig}}, and the latest python3-distutils does not support 
Python3.6.

h1. Solution

Build Python3.6 from the source:
{noformat}
suztomo@suxtomo24:/tmp$ sudo apt-get install libbz2-dev # This is needed for 
Python3.6's _bz2 module

suztomo@suxtomo24:/tmp$ git clone --branch v3.6.8 
https://github.com/python/cpython.git
...
suztomo@suxtomo24:/tmp$ cd cpython
suztomo@suxtomo24:/tmp/cpython$ git status
Not currently on any branch.
nothing to commit, working tree clean
suztomo@suxtomo24:/tmp/cpython$ git log -1
commit 3c6b436a57893dd1fae4e072768f41a199076252 (HEAD, tag: v3.6.8)
Author: Ned Deily 
Date:   Sun Dec 23 16:37:14 2018 -0500

3.6.8final
suztomo@suxtomo24:/tmp/cpython$ ./configure --prefix=$HOME/local # pick up your 
preference
...
suztomo@suxtomo24:/tmp/cpython$ make install
{noformat}
Add the directory to the path with "/bin" appended. In {{~/.bashrc}}:
{noformat}
export PATH=$HOME/local/bin:$PATH
{noformat}
Now the distutils.sysconfig module is available for Python3.6:
{noformat}
suztomo@suxtomo24:/tmp/cpython$ python3.6
Python 3.6.8 (tags/v3.6.8:3c6b436a57, Nov 21 2019, 21:11:37) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from distutils import sysconfig
>>> 
{noformat}
 
Now {{:sdks:python:test-suites:tox:py35:setupVirtualenv}} succeeds

{noformat}
suztomo@suxtomo24:~/beam4$ ./gradlew -p sdks/python/test-suites/tox/py35 
setupVirtualenv
...
> Task :sdks:python:test-suites:tox:py35:setupVirtualenv
...
BUILD SUCCESSFUL in 5s
{noformat}



h2. testPy36Gcp failure

The {{:sdks:python:test-suites:tox:py36:testPy36Gcp}} was failing:
https://gist.github.com/suztomo/ebfc110652b8ffaf7fede64276d7a053

It seemed that the _bz2 module was missing from Python3.6. I followed 
[Stackoverflow: No module named '_bz2' in 
python3|https://stackoverflow.com/questions/20280726/how-to-git-clone-a-specific-tag/24102558].


h2. testPy35Cython failure


{noformat}
./gradlew :sdks:python:test-suites:tox:py35:testPy35Cython
...
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall 
-Wstrict-prototypes -g 
-fdebug-prefix-map=/build/python3.5-ta1Uke/python3.5-3.5.4=. 
-fstack-protector-strong -Wformat -Werror=format-security -Wdate-time 
-D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.5m 
-I/usr/local/google/home/suztomo/beam4/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/target/.tox-py35-cython/py35-cython/include/python3.5m
 -c apache_beam/coders/stream.c -o 
build/temp.linux-x86_64-3.5/apache_beam/coders/stream.o
apache_beam/coders/stream.c:17:10: fatal error: Python.h: No such file or 
directory
 #include "Python.h"
  ^~
compilation terminated.
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
...
FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':sdks:python:test-suites:tox:py35:testPy35Cython'.

{noformat}

Tried {{sudo apt-get install python3-dev}}, but it didn't help. Building 
Python3.5 from source resolved the issue.

{noformat}
suztomo@suxtomo24:/tmp/cpython$ git checkout v3.5.9 
suztomo@suxtomo24:/tmp/cpython$ ./configure --prefix=$HOME/local
suztomo@suxtomo24:/tmp/cpython$ make install
suztomo@suxtomo24:/tmp/cpython$ which python3.5
/usr/local/google/home/suztomo/local/bin/python3.5
{noformat}




was (Author: suztomo):
h1. Problem

In my environment, Python3.6 was missing the required module 
{{distutils.sysconfig}}, and the latest python3-distutils does not support 
Python3.6.

h1. Solution

Build Python3.6 from the source:
{noformat}
suztomo@suxtomo24:/tmp$ sudo apt-get install libbz2-dev # This is needed for 
Python3.6's _bz2 module

suztomo@suxtomo24:/tmp$ git clone --branch v3.6.8 
https://github.com/python/cpython.git
...
suztomo@suxtomo24:/tmp$ cd cpython
suztomo@suxtomo24:/tmp/cpython$ git status
Not currently on any branch.
nothing to commit, working tree clean
suztomo@suxtomo24:/tmp/cpython$ git log -1
commit 3c6b436a57893dd1fae4e072768f41a199076252 (HEAD, tag: v3.6.8)
Author: Ned Deily 
Date:   Sun Dec 23 16:37:14 2018 -0500

3.6.8final
suztomo@suxtomo24:/tmp/cpython$ ./configure --prefix=$HOME/local # pick up your 
preference
...
suztomo@suxtomo24:/tmp/cpython$ make install
{noformat}
Add the directory to the path with "/bin" appended. In {{~/.bashrc}}:
{noformat}
export PATH=$HOME/local/bin:$PATH
{noformat}
Now the distutils.sysconfig module is available for Python3.6:
{noformat}
suztomo@suxtomo24:/tmp/cpython$ python3.6
Python 3.6.8 (tags/v3.6.8:3c6b436a57, Nov 21 2019, 21:11:37) 
[GCC 8.3.0] on linux
Type "help", "copyrigh

[jira] [Comment Edited] (BEAM-8787) Python setup issues

2019-11-22 Thread Tomo Suzuki (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979785#comment-16979785
 ] 

Tomo Suzuki edited comment on BEAM-8787 at 11/22/19 4:59 PM:
-

h1. Problem

In my environment, Python3.6 was missing the required module 
{{distutils.sysconfig}}, and the latest python3-distutils does not support 
Python3.6.

h1. Solution

Build Python3.6 from the source:
{noformat}
suztomo@suxtomo24:/tmp$ sudo apt-get install libbz2-dev # This is needed for 
Python3.6's _bz2 module

suztomo@suxtomo24:/tmp$ git clone --branch v3.6.8 
https://github.com/python/cpython.git
...
suztomo@suxtomo24:/tmp$ cd cpython
suztomo@suxtomo24:/tmp/cpython$ git status
Not currently on any branch.
nothing to commit, working tree clean
suztomo@suxtomo24:/tmp/cpython$ git log -1
commit 3c6b436a57893dd1fae4e072768f41a199076252 (HEAD, tag: v3.6.8)
Author: Ned Deily 
Date:   Sun Dec 23 16:37:14 2018 -0500

3.6.8final
suztomo@suxtomo24:/tmp/cpython$ ./configure --prefix=$HOME/local # pick up your 
preference
...
suztomo@suxtomo24:/tmp/cpython$ make install
{noformat}
Add the directory to the path with "/bin" appended. In {{~/.bashrc}}:
{noformat}
export PATH=$HOME/local/bin:$PATH
{noformat}
Now the distutils.sysconfig module is available for Python3.6:
{noformat}
suztomo@suxtomo24:/tmp/cpython$ python3.6
Python 3.6.8 (tags/v3.6.8:3c6b436a57, Nov 21 2019, 21:11:37) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from distutils import sysconfig
>>> 
{noformat}
 
Now {{:sdks:python:test-suites:tox:py35:setupVirtualenv}} succeeds

{noformat}
suztomo@suxtomo24:~/beam4$ ./gradlew -p sdks/python/test-suites/tox/py35 
setupVirtualenv
...
> Task :sdks:python:test-suites:tox:py35:setupVirtualenv
...
BUILD SUCCESSFUL in 5s
{noformat}



h2. testPy36Gcp failure

The {{:sdks:python:test-suites:tox:py36:testPy36Gcp}} was failing:
https://gist.github.com/suztomo/ebfc110652b8ffaf7fede64276d7a053

It seemed that the _bz2 module was missing from Python3.6. I followed 
[Stackoverflow: No module named '_bz2' in 
python3|https://stackoverflow.com/questions/20280726/how-to-git-clone-a-specific-tag/24102558].


h2. testPy35Cython failure


{noformat}
./gradlew :sdks:python:test-suites:tox:py35:testPy35Cython
...
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall 
-Wstrict-prototypes -g 
-fdebug-prefix-map=/build/python3.5-ta1Uke/python3.5-3.5.4=. 
-fstack-protector-strong -Wformat -Werror=format-security -Wdate-time 
-D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.5m 
-I/usr/local/google/home/suztomo/beam4/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/target/.tox-py35-cython/py35-cython/include/python3.5m
 -c apache_beam/coders/stream.c -o 
build/temp.linux-x86_64-3.5/apache_beam/coders/stream.o
apache_beam/coders/stream.c:17:10: fatal error: Python.h: No such file or 
directory
 #include "Python.h"
  ^~
compilation terminated.
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
...
FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':sdks:python:test-suites:tox:py35:testPy35Cython'.

{noformat}

Tried {{sudo apt-get install python3-dev}} but it didn't work. Installing 
python3.5 from source code resolved the issue.

{noformat}
suztomo@suxtomo24:/tmp/cpython$ git checkout v3.5.9 
suztomo@suxtomo24:/tmp/cpython$ ./configure --prefix=$HOME/local
suztomo@suxtomo24:/tmp/cpython$ make install
suztomo@suxtomo24:/tmp/cpython$ which python3.5
/usr/local/google/home/suztomo/local/bin/python3.5
{noformat}




was (Author: suztomo):
h1. Problem

In my environment, Python3.6 was missing the required module 
{{distutils.sysconfig}}, and the latest python3-distutils does not support 
Python3.6.

h1. Solution

Build Python3.6 from the source:
{noformat}
suztomo@suxtomo24:/tmp$ sudo apt-get install libbz2-dev # This is needed for 
Python3.6's _bz2 module

suztomo@suxtomo24:/tmp$ git clone --branch v3.6.8 
https://github.com/python/cpython.git
...
suztomo@suxtomo24:/tmp$ cd cpython
suztomo@suxtomo24:/tmp/cpython$ git status
Not currently on any branch.
nothing to commit, working tree clean
suztomo@suxtomo24:/tmp/cpython$ git log -1
commit 3c6b436a57893dd1fae4e072768f41a199076252 (HEAD, tag: v3.6.8)
Author: Ned Deily 
Date:   Sun Dec 23 16:37:14 2018 -0500

3.6.8final
suztomo@suxtomo24:/tmp/cpython$ ./configure --prefix=$HOME/local # pick up your 
preference
...
suztomo@suxtomo24:/tmp/cpython$ make install
{noformat}
Add the directory to the path with "/bin" appended. In {{~/.bashrc}}:
{noformat}
export PATH=$HOME/local/bin:$PATH
{noformat}
Now the distutils.sysconfig module is available for Python3.6:
{noformat}
suztomo@suxtomo24:/tmp/cpython$ python3.6
Python 3.6.8 (tags/v3.6.8:3c6b436a57, Nov 21 2019, 21:11:37) 
[GCC 8.3.0] on linux
Type "help", "copyright

[jira] [Comment Edited] (BEAM-8787) Python setup issues

2019-11-22 Thread Tomo Suzuki (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979785#comment-16979785
 ] 

Tomo Suzuki edited comment on BEAM-8787 at 11/22/19 5:00 PM:
-

h1. Problem

In my environment, Python3.6 was missing the required module 
{{distutils.sysconfig}}, and the latest python3-distutils does not support 
Python3.6.

h1. Solution

Build and install the required Python version from source after installing 
libbz2-dev:
{noformat}
suztomo@suxtomo24:/tmp$ sudo apt-get install libbz2-dev # This is needed for 
Python3.6's _bz2 module

suztomo@suxtomo24:/tmp$ git clone --branch v3.6.8 
https://github.com/python/cpython.git
...
suztomo@suxtomo24:/tmp$ cd cpython
suztomo@suxtomo24:/tmp/cpython$ git status
Not currently on any branch.
nothing to commit, working tree clean
suztomo@suxtomo24:/tmp/cpython$ git log -1
commit 3c6b436a57893dd1fae4e072768f41a199076252 (HEAD, tag: v3.6.8)
Author: Ned Deily 
Date:   Sun Dec 23 16:37:14 2018 -0500

3.6.8final
suztomo@suxtomo24:/tmp/cpython$ ./configure --prefix=$HOME/local # pick up your 
preference
...
suztomo@suxtomo24:/tmp/cpython$ make install
{noformat}
Add the directory to the path with "/bin" appended. In {{~/.bashrc}}:
{noformat}
export PATH=$HOME/local/bin:$PATH
{noformat}
Now the distutils.sysconfig module is available for Python3.6:
{noformat}
suztomo@suxtomo24:/tmp/cpython$ python3.6
Python 3.6.8 (tags/v3.6.8:3c6b436a57, Nov 21 2019, 21:11:37) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from distutils import sysconfig
>>> 
{noformat}
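
The interactive check above can also be scripted. The following is a minimal sketch that assumes only the standard library; {{missing_modules}} is a hypothetical helper name, and the two module names are the ones discussed in this comment. Note that distutils was removed from the standard library in Python 3.12, so newer interpreters are expected to report it as missing.

```python
import importlib


def missing_modules(names):
  """Return the subset of `names` that cannot be imported by this interpreter."""
  missing = []
  for name in names:
    try:
      importlib.import_module(name)
    except ImportError:
      # ModuleNotFoundError is a subclass of ImportError, so both are covered.
      missing.append(name)
  return missing


# Modules whose absence caused the failures described in this comment.
print(missing_modules(['distutils.sysconfig', '_bz2']))
```

An empty list suggests that the interpreter on the path was built with both modules available.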
 
Now {{:sdks:python:test-suites:tox:py35:setupVirtualenv}} succeeds

{noformat}
suztomo@suxtomo24:~/beam4$ ./gradlew -p sdks/python/test-suites/tox/py35 
setupVirtualenv
...
> Task :sdks:python:test-suites:tox:py35:setupVirtualenv
...
BUILD SUCCESSFUL in 5s
{noformat}



h2. testPy36Gcp failure

The {{:sdks:python:test-suites:tox:py36:testPy36Gcp}} was failing:
https://gist.github.com/suztomo/ebfc110652b8ffaf7fede64276d7a053

It seemed that the _bz2 module was missing from Python3.6. I followed 
[Stackoverflow: No module named '_bz2' in 
python3|https://stackoverflow.com/questions/20280726/how-to-git-clone-a-specific-tag/24102558].


h2. testPy35Cython failure


{noformat}
./gradlew :sdks:python:test-suites:tox:py35:testPy35Cython
...
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall 
-Wstrict-prototypes -g 
-fdebug-prefix-map=/build/python3.5-ta1Uke/python3.5-3.5.4=. 
-fstack-protector-strong -Wformat -Werror=format-security -Wdate-time 
-D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.5m 
-I/usr/local/google/home/suztomo/beam4/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/target/.tox-py35-cython/py35-cython/include/python3.5m
 -c apache_beam/coders/stream.c -o 
build/temp.linux-x86_64-3.5/apache_beam/coders/stream.o
apache_beam/coders/stream.c:17:10: fatal error: Python.h: No such file or 
directory
 #include "Python.h"
  ^~
compilation terminated.
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
...
FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':sdks:python:test-suites:tox:py35:testPy35Cython'.

{noformat}

Tried {{sudo apt-get install python3-dev}} but it didn't work. Installing 
python3.5 from source code resolved the issue.

{noformat}
suztomo@suxtomo24:/tmp/cpython$ git checkout v3.5.9 
suztomo@suxtomo24:/tmp/cpython$ ./configure --prefix=$HOME/local
suztomo@suxtomo24:/tmp/cpython$ make install
suztomo@suxtomo24:/tmp/cpython$ which python3.5
/usr/local/google/home/suztomo/local/bin/python3.5
{noformat}




was (Author: suztomo):
h1. Problem

In my environment, Python3.6 was missing the required module 
{{distutils.sysconfig}}, and the latest python3-distutils does not support 
Python3.6.

h1. Solution

Build Python3.6 from the source:
{noformat}
suztomo@suxtomo24:/tmp$ sudo apt-get install libbz2-dev # This is needed for 
Python3.6's _bz2 module

suztomo@suxtomo24:/tmp$ git clone --branch v3.6.8 
https://github.com/python/cpython.git
...
suztomo@suxtomo24:/tmp$ cd cpython
suztomo@suxtomo24:/tmp/cpython$ git status
Not currently on any branch.
nothing to commit, working tree clean
suztomo@suxtomo24:/tmp/cpython$ git log -1
commit 3c6b436a57893dd1fae4e072768f41a199076252 (HEAD, tag: v3.6.8)
Author: Ned Deily 
Date:   Sun Dec 23 16:37:14 2018 -0500

3.6.8final
suztomo@suxtomo24:/tmp/cpython$ ./configure --prefix=$HOME/local # pick up your 
preference
...
suztomo@suxtomo24:/tmp/cpython$ make install
{noformat}
Add the directory to the path with "/bin" appended. In {{~/.bashrc}}:
{noformat}
export PATH=$HOME/local/bin:$PATH
{noformat}
Now the distutils.sysconfig module is available for Python3.6:
{noformat}
suztomo@suxtomo24:/tmp/cpython$ python3.6
Python 3.6.8 (tags/v3.6.8:3c6b436a57, Nov 21 2019, 21:11

[jira] [Work logged] (BEAM-8746) Allow the local job service to work from inside docker

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8746?focusedWorklogId=348219&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348219
 ]

ASF GitHub Bot logged work on BEAM-8746:


Author: ASF GitHub Bot
Created on: 22/Nov/19 17:12
Start Date: 22/Nov/19 17:12
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #10161: [BEAM-8746] 
Make local job service accessible from external machines
URL: https://github.com/apache/beam/pull/10161#discussion_r349703328
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/local_job_service.py
 ##
 @@ -95,7 +95,7 @@ def create_beam_job(self, preparation_id, job_name, 
pipeline, options):
 
   def start_grpc_server(self, port=0):
 self._server = grpc.server(UnboundedThreadPoolExecutor())
-port = self._server.add_insecure_port('localhost:%d' % port)
+port = self._server.add_insecure_port('[::]:%d' % port)
 
 Review comment:
   Yeah, that's probably the most future-proof and versatile solution. 
   
   What do you think about terminology for this?
   
   1. `get_bind_address()`, `get_connect_address()`
   2. `get_bind_address()`, `get_service_address()`
   3. `get_bind_address()`, `get_hostname()`
   
   Something else?
   
 



Issue Time Tracking
---

Worklog Id: (was: 348219)
Time Spent: 1h 50m  (was: 1h 40m)

> Allow the local job service to work from inside docker
> --
>
> Key: BEAM-8746
> URL: https://issues.apache.org/jira/browse/BEAM-8746
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Currently the connection is refused.  It's a simple fix. 





[jira] [Work logged] (BEAM-3865) Incorrect timestamp on merging window outputs.

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3865?focusedWorklogId=348221&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348221
 ]

ASF GitHub Bot logged work on BEAM-3865:


Author: ASF GitHub Bot
Created on: 22/Nov/19 17:14
Start Date: 22/Nov/19 17:14
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on issue #10192: [BEAM-3865] 
Stronger trigger tests.
URL: https://github.com/apache/beam/pull/10192#issuecomment-557616357
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 348221)
Time Spent: 2h 20m  (was: 2h 10m)

> Incorrect timestamp on merging window outputs.
> --
>
> Key: BEAM-3865
> URL: https://issues.apache.org/jira/browse/BEAM-3865
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.2.0, 2.3.0
>Reporter: Robert Bradshaw
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Looks like we're setting multiple watermark holds with one arbitrarily being 
> held. 





[jira] [Work logged] (BEAM-8746) Allow the local job service to work from inside docker

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8746?focusedWorklogId=348220&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348220
 ]

ASF GitHub Bot logged work on BEAM-8746:


Author: ASF GitHub Bot
Created on: 22/Nov/19 17:14
Start Date: 22/Nov/19 17:14
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #10161: [BEAM-8746] 
Make local job service accessible from external machines
URL: https://github.com/apache/beam/pull/10161#discussion_r349704203
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/local_job_service.py
 ##
 @@ -95,7 +95,7 @@ def create_beam_job(self, preparation_id, job_name, 
pipeline, options):
 
   def start_grpc_server(self, port=0):
 self._server = grpc.server(UnboundedThreadPoolExecutor())
-port = self._server.add_insecure_port('localhost:%d' % port)
+port = self._server.add_insecure_port('[::]:%d' % port)
 
 Review comment:
   ```python
   def get_bind_address(self):
     """Return the address used to open the port on the gRPC server."""
     return self.get_connect_address()

   def get_connect_address(self):
     """Return the host name at which this server will be accessible.

     In particular, this is provided to the client upon connection as the
     artifact staging endpoint.
     """
     return 'localhost'

   def start_grpc_server(self, port=0):
     self._server = grpc.server(UnboundedThreadPoolExecutor())
     port = self._server.add_insecure_port(
         '%s:%d' % (self.get_bind_address(), port))
     beam_job_api_pb2_grpc.add_JobServiceServicer_to_server(self, self._server)
     beam_artifact_api_pb2_grpc.add_ArtifactStagingServiceServicer_to_server(
         self._artifact_service, self._server)
     hostname = self.get_connect_address()
     self._artifact_staging_endpoint = endpoints_pb2.ApiServiceDescriptor(
         url='%s:%d' % (hostname, port))
     self._server.start()
     _LOGGER.info('Grpc server started at %s on port %d' % (hostname, port))
     return port
   ```
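
The split between a bind address and a connect address could be exercised by a subclass that binds on all interfaces while advertising a routable host name. The sketch below is an illustration only: `AddressPolicy` and `DockerAddressPolicy` are hypothetical names that are not part of Beam, and the gRPC server is left out so that the snippet runs with the standard library alone.

```python
import socket


class AddressPolicy(object):
  """Hypothetical helper (not a Beam class): default bind/connect addresses."""

  def get_bind_address(self):
    # By default, bind to the same address that clients connect to.
    return self.get_connect_address()

  def get_connect_address(self):
    return 'localhost'

  def endpoint_urls(self, port):
    """Return (bind_url, connect_url) for the given port."""
    return ('%s:%d' % (self.get_bind_address(), port),
            '%s:%d' % (self.get_connect_address(), port))


class DockerAddressPolicy(AddressPolicy):
  """Hypothetical override: listen on all interfaces, advertise the host name."""

  def get_bind_address(self):
    return '[::]'

  def get_connect_address(self):
    return socket.gethostname()
```

With this shape, the default keeps both addresses at `localhost`, and only deployments such as Docker would need to override either method.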
 



Issue Time Tracking
---

Worklog Id: (was: 348220)
Time Spent: 2h  (was: 1h 50m)

> Allow the local job service to work from inside docker
> --
>
> Key: BEAM-8746
> URL: https://issues.apache.org/jira/browse/BEAM-8746
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Currently the connection is refused.  It's a simple fix. 





[jira] [Work logged] (BEAM-4776) Java PortableRunner should support metrics

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4776?focusedWorklogId=348223&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348223
 ]

ASF GitHub Bot logged work on BEAM-4776:


Author: ASF GitHub Bot
Created on: 22/Nov/19 17:30
Start Date: 22/Nov/19 17:30
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #10105: [BEAM-4776] Add 
metrics support to Java PortableRunner
URL: https://github.com/apache/beam/pull/10105#issuecomment-557621776
 
 
   @mxm How can I go about manually adding gauges? Does that mean changing the 
FlinkRunner to publish gauge metrics?
 



Issue Time Tracking
---

Worklog Id: (was: 348223)
Time Spent: 6h 20m  (was: 6h 10m)

> Java PortableRunner should support metrics
> --
>
> Key: BEAM-4776
> URL: https://issues.apache.org/jira/browse/BEAM-4776
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Eugene Kirpichov
>Assignee: Michal Walenia
>Priority: Major
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> BEAM-4775 concerns adding metrics to the JobService API; the current issue is 
> about making PortableRunner understand them.





[jira] [Created] (BEAM-8808) TestBigQueryOptions is never registered

2019-11-22 Thread Brian Hulette (Jira)
Brian Hulette created BEAM-8808:
---

 Summary: TestBigQueryOptions is never registered
 Key: BEAM-8808
 URL: https://issues.apache.org/jira/browse/BEAM-8808
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-core
Reporter: Brian Hulette
Assignee: Brian Hulette


So it's not possible to set targetDataset





[jira] [Commented] (BEAM-8198) Investigate possible performance regression of Wordcount 1GB batch benchmark on Py3.

2019-11-22 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980365#comment-16980365
 ] 

Valentyn Tymofieiev commented on BEAM-8198:
---

There was a suspicion of another regression on the thread but I was not able to 
reproduce it. Closing for now.

> Investigate possible performance regression of Wordcount 1GB batch benchmark 
> on Py3.
> 
>
> Key: BEAM-8198
> URL: https://issues.apache.org/jira/browse/BEAM-8198
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, testing
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Major
> Fix For: 2.17.0
>
>
> context: 
> https://lists.apache.org/thread.html/51e000f16481451c207c00ac5e881aa4a46fa020922eddffd00ad527@%3Cdev.beam.apache.org%3E
> Setting fix version to 2.16.0 to understand the cause, hopefully before the 
> vote.
> cc: [~altay] [~thw] [~markflyhigh]





[jira] [Resolved] (BEAM-8198) Investigate possible performance regression of Wordcount 1GB batch benchmark on Py3.

2019-11-22 Thread Valentyn Tymofieiev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev resolved BEAM-8198.
---
Fix Version/s: (was: 2.17.0)
   Not applicable
   Resolution: Cannot Reproduce

> Investigate possible performance regression of Wordcount 1GB batch benchmark 
> on Py3.
> 
>
> Key: BEAM-8198
> URL: https://issues.apache.org/jira/browse/BEAM-8198
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, testing
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Major
> Fix For: Not applicable
>
>
> context: 
> https://lists.apache.org/thread.html/51e000f16481451c207c00ac5e881aa4a46fa020922eddffd00ad527@%3Cdev.beam.apache.org%3E
> Setting fix version to 2.16.0 to understand the cause, hopefully before the 
> vote.
> cc: [~altay] [~thw] [~markflyhigh]





[jira] [Commented] (BEAM-8198) Investigate possible performance regression of Wordcount 1GB batch benchmark on Py3.

2019-11-22 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980370#comment-16980370
 ] 

Valentyn Tymofieiev commented on BEAM-8198:
---

Sorry for the delayed response; I didn't realize it was tagged as a release blocker.

> Investigate possible performance regression of Wordcount 1GB batch benchmark 
> on Py3.
> 
>
> Key: BEAM-8198
> URL: https://issues.apache.org/jira/browse/BEAM-8198
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, testing
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Major
> Fix For: Not applicable
>
>
> context: 
> https://lists.apache.org/thread.html/51e000f16481451c207c00ac5e881aa4a46fa020922eddffd00ad527@%3Cdev.beam.apache.org%3E
> Setting fix version to 2.16.0 to understand the cause, hopefully before the 
> vote.
> cc: [~altay] [~thw] [~markflyhigh]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8747) Remove Unused non-vendored Guava compile dependencies

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8747?focusedWorklogId=348232&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348232
 ]

ASF GitHub Bot logged work on BEAM-8747:


Author: ASF GitHub Bot
Created on: 22/Nov/19 18:01
Start Date: 22/Nov/19 18:01
Worklog Time Spent: 10m 
  Work Description: suztomo commented on pull request #10172: [BEAM-8747] 
Guava dependency cleanup
URL: https://github.com/apache/beam/pull/10172#discussion_r349722981
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/counters/CounterUpdateAggregators.java
 ##
 @@ -18,11 +18,11 @@
 package org.apache.beam.runners.dataflow.worker.counters;
 
 import com.google.api.services.dataflow.model.CounterUpdate;
-import com.google.common.collect.ImmutableMap;
 import java.util.Collections;
 import java.util.List;
 import java.util.Map;
 import 
org.apache.beam.runners.dataflow.worker.MetricsToCounterUpdateConverter.Kind;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap;
 
 Review comment:
   Yes, I see it's disabled.
   
https://github.com/apache/beam/blob/c2f0d28/runners/google-cloud-dataflow-java/worker/build.gradle#L111-L113
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 348232)
Time Spent: 2.5h  (was: 2h 20m)

> Remove Unused non-vendored Guava compile dependencies
> -
>
> Key: BEAM-8747
> URL: https://issues.apache.org/jira/browse/BEAM-8747
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Major
> Attachments: Guava used as fully-qualified class name.png
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> [~kenn] says:
> BeamModulePlugin just contains lists of versions to ease coordination across 
> Beam modules, but mostly does not create dependencies. Most of Beam's modules 
> only depend on a few things there. For example Guava is not a core 
> dependency, but here is where it is actually depended upon:
> $ find . -name build.gradle | xargs grep library.java.guava
> ./sdks/java/core/build.gradle:  shadowTest library.java.guava_testlib
> ./sdks/java/extensions/sql/jdbc/build.gradle:  compile library.java.guava
> ./sdks/java/io/google-cloud-platform/build.gradle:  compile library.java.guava
> ./sdks/java/io/kinesis/build.gradle:  testCompile library.java.guava_testlib
> These results appear to be misleading. Grepping for 'import 
> com.google.common', I see this as the actual state of things:
>  - GCP connector does not appear to actually depend on Guava in compile scope
>  - The Beam SQL JDBC driver does not appear to actually depend on Guava in 
> compile scope
>  - The Dataflow Java worker does depend on Guava at compile scope but has 
> incorrect dependencies (and it probably shouldn't)
>  - KinesisIO does depend on Guava at compile scope but has incorrect 
> dependencies (Kinesis libs have Guava on API surface so it is OK here, but 
> should be correctly declared)
>  - ZetaSQL translator does depend on Guava at compile scope but has incorrect 
> dependencies (ZetaSQL has it on API surface so it is OK here, but should be 
> correctly declared)
> We used to have an analysis that prevented this class of error.
> Once the errors are fixed, the guava_version is simply a version that we have 
> discovered that seems to work for both Kinesis and ZetaSQL, libraries we do 
> not control. Kinesis producer is built against 18.0. Kinesis client against 
> 26.0-jre. ZetaSQL against 26.0-android.
> (or maybe I messed up in my analysis)
> Kenn
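
The 'import com.google.common' check described above can be sketched in a few lines of Python. This is only an illustration, not part of the Beam build: the function names and the regular expression are made up for this example, and it assumes that vendored Guava imports always start with org.apache.beam.vendor.guava, as in the diff quoted earlier in this thread.

```python
import re
from pathlib import Path

# Matches unvendored Guava imports such as
#   import com.google.common.collect.ImmutableMap;
# Vendored imports begin with "org.apache.beam.vendor.guava..." and do not match,
# because the package name must follow "import" (or "import static") directly.
UNVENDORED_GUAVA = re.compile(
    r"^\s*import\s+(?:static\s+)?com\.google\.common\.", re.MULTILINE)


def count_unvendored_guava_imports(source: str) -> int:
    """Count unvendored Guava import statements in one Java source string."""
    return len(UNVENDORED_GUAVA.findall(source))


def scan_tree(root: str) -> dict:
    """Map each .java file under root to its number of unvendored Guava imports.

    Files with no such imports are left out of the result.
    """
    hits = {}
    for path in Path(root).rglob("*.java"):
        text = path.read_text(encoding="utf-8", errors="ignore")
        count = count_unvendored_guava_imports(text)
        if count:
            hits[str(path)] = count
    return hits
```

Running scan_tree on a module directory and comparing the result with that module's declared dependencies would show which modules use Guava at compile scope without declaring it.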



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-3342) Create a Cloud Bigtable IO connector for Python

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3342?focusedWorklogId=348234&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348234
 ]

ASF GitHub Bot logged work on BEAM-3342:


Author: ASF GitHub Bot
Created on: 22/Nov/19 18:04
Start Date: 22/Nov/19 18:04
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #8457: [BEAM-3342] 
Create a Cloud Bigtable IO connector for Python
URL: https://github.com/apache/beam/pull/8457#issuecomment-557633005
 
 
   Please update the status. It's good if we can get this in by the end of the 
year.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 348234)
Time Spent: 41.5h  (was: 41h 20m)

> Create a Cloud Bigtable IO connector for Python
> ---
>
> Key: BEAM-3342
> URL: https://issues.apache.org/jira/browse/BEAM-3342
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Solomon Duskis
>Assignee: Solomon Duskis
>Priority: Major
>  Time Spent: 41.5h
>  Remaining Estimate: 0h
>
> I would like to create a Cloud Bigtable python connector.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-3342) Create a Cloud Bigtable IO connector for Python

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3342?focusedWorklogId=348236&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348236
 ]

ASF GitHub Bot logged work on BEAM-3342:


Author: ASF GitHub Bot
Created on: 22/Nov/19 18:11
Start Date: 22/Nov/19 18:11
Worklog Time Spent: 10m 
  Work Description: mf2199 commented on issue #8457: [BEAM-3342] Create a 
Cloud Bigtable IO connector for Python
URL: https://github.com/apache/beam/pull/8457#issuecomment-557635542
 
 
   As per discussion with Dataflow team, further testing is warranted under 
Python 3 environment - to be done shortly.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 348236)
Time Spent: 41h 40m  (was: 41.5h)

> Create a Cloud Bigtable IO connector for Python
> ---
>
> Key: BEAM-3342
> URL: https://issues.apache.org/jira/browse/BEAM-3342
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Solomon Duskis
>Assignee: Solomon Duskis
>Priority: Major
>  Time Spent: 41h 40m
>  Remaining Estimate: 0h
>
> I would like to create a Cloud Bigtable python connector.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8791) [2.17.0 Release Validation] Run Python PreCommit times out

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8791?focusedWorklogId=348241&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348241
 ]

ASF GitHub Bot logged work on BEAM-8791:


Author: ASF GitHub Bot
Created on: 22/Nov/19 18:26
Start Date: 22/Nov/19 18:26
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10197: [BEAM-8791] 
Cherry-pick PR # 9985 to 2.17.0 release branch to reduce precommit times.
URL: https://github.com/apache/beam/pull/10197#issuecomment-557640445
 
 
   R: @Ardagan 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 348241)
Time Spent: 20m  (was: 10m)

> [2.17.0 Release Validation] Run Python PreCommit times out
> --
>
> Key: BEAM-8791
> URL: https://issues.apache.org/jira/browse/BEAM-8791
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Mikhail Gryzykhin
>Assignee: Valentyn Tymofieiev
>Priority: Blocker
> Fix For: 2.17.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Code version: 
> [https://github.com/apache/beam/pull/9884/commits/d52355c9712b5ed85900a941f50317c5ab9252cb]
> [Job: 
> https://builds.apache.org/job/beam_PreCommit_Python_Phrase/1051/|https://builds.apache.org/job/beam_PreCommit_Python_Phrase/1051/]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8658) Optionally set artifact staging port in FlinkUberJarJobServer

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8658?focusedWorklogId=348242&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348242
 ]

ASF GitHub Bot logged work on BEAM-8658:


Author: ASF GitHub Bot
Created on: 22/Nov/19 18:26
Start Date: 22/Nov/19 18:26
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #10163: [BEAM-8658] 
[BEAM-8781] Optionally set jar and artifact staging port …
URL: https://github.com/apache/beam/pull/10163
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 348242)
Time Spent: 1h 10m  (was: 1h)

> Optionally set artifact staging port in FlinkUberJarJobServer
> -
>
> Key: BEAM-8658
> URL: https://issues.apache.org/jira/browse/BEAM-8658
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-flink
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> In certain network environments, port forwarding is necessary for our GRPC 
> servers, such as the artifact staging server. Currently, the port for 
> FlinkUberJarJobServer's artifact staging server is chosen randomly (0). We 
> will need to let the user choose it if they are to forward that port.
> https://github.com/apache/beam/blob/802e7cd86024c21d7b2eeb45f0e7c8e370661610/sdks/python/apache_beam/runners/portability/flink_uber_jar_job_server.py#L129
>  
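
As a rough illustration of the difference between a randomly chosen port (0) and a user-chosen one, here is a minimal sketch using plain TCP sockets from the Python standard library. It is not the FlinkUberJarJobServer code; the function name bind_server_socket and the use of 127.0.0.1 are assumptions made for this example.

```python
import socket


def bind_server_socket(port: int = 0) -> socket.socket:
    """Bind a listening TCP socket on localhost.

    Passing port=0 asks the operating system to pick any free port, which is
    hard to forward because the number is not known in advance. Passing a
    fixed port lets the user set up forwarding for that port beforehand.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", port))
    sock.listen(1)
    return sock
```

With port=0 the assigned port can only be discovered after binding, through sock.getsockname()[1], which is why a configurable port is needed when the port must be forwarded ahead of time.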



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8791) [2.17.0 Release Validation] Run Python PreCommit times out

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8791?focusedWorklogId=348240&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348240
 ]

ASF GitHub Bot logged work on BEAM-8791:


Author: ASF GitHub Bot
Created on: 22/Nov/19 18:26
Start Date: 22/Nov/19 18:26
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #10197: [BEAM-8791] 
Cherry-pick PR # 9985 to 2.17.0 release branch to reduce precommit times.
URL: https://github.com/apache/beam/pull/10197
 
 
   This is a cherry-pick of PR #9985. 
   
   

[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=348244&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348244
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 22/Nov/19 18:27
Start Date: 22/Nov/19 18:27
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on issue #10173: [BEAM-8575] 
Added two unit tests in CombineTest class to test AccumulatingCombine
URL: https://github.com/apache/beam/pull/10173#issuecomment-557640601
 
 
   Reviewer's feedback is addressed. Waiting for reviewer to resolve 
conversations.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 348244)
Time Spent: 18h 40m  (was: 18.5h)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 18h 40m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=348243&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348243
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 22/Nov/19 18:27
Start Date: 22/Nov/19 18:27
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on issue #10173: [BEAM-8575] 
Added two unit tests in CombineTest class to test AccumulatingCombine
URL: https://github.com/apache/beam/pull/10173#issuecomment-557640601
 
 
   Reviewer's feedback is addressed. Waiting for reviewer to resolve 
conversations.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 348243)
Time Spent: 18.5h  (was: 18h 20m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 18.5h
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=348245&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348245
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 22/Nov/19 18:28
Start Date: 22/Nov/19 18:28
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on issue #10173: [BEAM-8575] 
Added two unit tests in CombineTest class to test AccumulatingCombine
URL: https://github.com/apache/beam/pull/10173#issuecomment-557641095
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 348245)
Time Spent: 18h 50m  (was: 18h 40m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 18h 50m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8581) Python SDK labels ontime empty panes as late

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8581?focusedWorklogId=348246&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348246
 ]

ASF GitHub Bot logged work on BEAM-8581:


Author: ASF GitHub Bot
Created on: 22/Nov/19 18:29
Start Date: 22/Nov/19 18:29
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #10035: 
[BEAM-8581] and [BEAM-8582] watermark and trigger fixes
URL: https://github.com/apache/beam/pull/10035#discussion_r349733975
 
 

 ##
 File path: sdks/python/apache_beam/transforms/trigger.py
 ##
 @@ -965,18 +1000,21 @@ class TriggerDriver(with_metaclass(ABCMeta, object)):
   """Breaks a series of bundle and timer firings into window (pane)s."""
 
   @abstractmethod
-  def process_elements(self, state, windowed_values, output_watermark):
+  def process_elements(self, state, windowed_values, output_watermark,
+   input_watermark=MIN_TIMESTAMP):
 pass
 
   @abstractmethod
-  def process_timer(self, window_id, name, time_domain, timestamp, state):
+  def process_timer(self, window_id, name, time_domain, timestamp, state,
+input_watermark=None):
 pass
 
   def process_entire_key(
-  self, key, windowed_values, output_watermark=MIN_TIMESTAMP):
+  self, key, windowed_values, input_watermark=MIN_TIMESTAMP,
 
 Review comment:
   Yeah, the only other place where these are called directly is in the Batch 
Dataflow Python Worker. So by setting these to MIN_TIMESTAMP we go back to the 
old implementation, which didn't affect Batch.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 348246)
Time Spent: 4h 40m  (was: 4.5h)

> Python SDK labels ontime empty panes as late
> 
>
> Key: BEAM-8581
> URL: https://issues.apache.org/jira/browse/BEAM-8581
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> The GeneralTriggerDriver does not put watermark holds on timers, leading to 
> the ontime empty pane being considered late data.
> Fix: Add a new notion of whether a trigger has an ontime pane. If it does, 
> then set a watermark hold to end of window - 1.
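
The proposed fix can be sketched as follows. This is a simplified model, not the GeneralTriggerDriver code: the Window class, the ontime_hold function, and the use of plain integers for timestamps are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Window:
    """A hypothetical window with integer timestamps in arbitrary units."""
    start: int  # inclusive
    end: int    # exclusive


def ontime_hold(window: Window, has_ontime_pane: bool) -> Optional[int]:
    """Return the timestamp at which to hold the watermark, or None for no hold.

    If the trigger has an ontime pane, hold the watermark at the last
    timestamp inside the window (end of window - 1) so that the ontime pane
    is not treated as late when it is emitted.
    """
    if not has_ontime_pane:
        return None
    return window.end - 1
```

For a window covering [0, 10), the hold would be placed at 9 when the trigger has an ontime pane, and no hold would be set otherwise.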



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8581) Python SDK labels ontime empty panes as late

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8581?focusedWorklogId=348247&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348247
 ]

ASF GitHub Bot logged work on BEAM-8581:


Author: ASF GitHub Bot
Created on: 22/Nov/19 18:29
Start Date: 22/Nov/19 18:29
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #10035: 
[BEAM-8581] and [BEAM-8582] watermark and trigger fixes
URL: https://github.com/apache/beam/pull/10035#discussion_r349733981
 
 

 ##
 File path: sdks/python/apache_beam/transforms/trigger.py
 ##
 @@ -1036,14 +1074,17 @@ class BatchGlobalTriggerDriver(TriggerDriver):
   index=0,
   nonspeculative_index=0)
 
-  def process_elements(self, state, windowed_values, unused_output_watermark):
+  def process_elements(self, state, windowed_values,
+   unused_output_watermark=MIN_TIMESTAMP,
 
 Review comment:
   Ack, I set these to None.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 348247)
Time Spent: 4h 50m  (was: 4h 40m)

> Python SDK labels ontime empty panes as late
> 
>
> Key: BEAM-8581
> URL: https://issues.apache.org/jira/browse/BEAM-8581
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> The GeneralTriggerDriver does not put watermark holds on timers, leading to 
> the ontime empty pane being considered late data.
> Fix: Add a new notion of whether a trigger has an ontime pane. If it does, 
> then set a watermark hold to end of window - 1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8651) Python 3 portable pipelines sometimes fail with errors in StockUnpickler.find_class()

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8651?focusedWorklogId=348248&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348248
 ]

ASF GitHub Bot logged work on BEAM-8651:


Author: ASF GitHub Bot
Created on: 22/Nov/19 18:30
Start Date: 22/Nov/19 18:30
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10185: [BEAM-8651] 
Cherrypick PR #10167 to the release branch. 
URL: https://github.com/apache/beam/pull/10185#issuecomment-557641695
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 348248)
Time Spent: 3h  (was: 2h 50m)

> Python 3 portable pipelines sometimes fail with errors in 
> StockUnpickler.find_class()
> -
>
> Key: BEAM-8651
> URL: https://issues.apache.org/jira/browse/BEAM-8651
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Blocker
> Fix For: 2.17.0
>
> Attachments: beam8651.py
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Several Beam users reported an intermittent error which happens during 
> unpickling in StockUnpickler.find_class. A similar error happens consistently 
> when user's pipelines have instances of super() in their main module, and use 
> --save_main_session, see: 
> [BEAM-6158|https://issues.apache.org/jira/browse/BEAM-6158?focusedCommentId=16919945&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16919945].
>  
> In this case the error happens only sometimes, and super() calls don't play a 
> role.  
> So far I've seen reports of the error on Python 3.5, 3.6, and 3.7.1, on Flink 
> and Dataflow runners. On Dataflow runner so far I have seen this in streaming 
> pipelines only, which use portable SDK worker.
> Typical stack trace:
> {noformat}
> File 
> "python3.5/site-packages/apache_beam/runners/worker/bundle_processor.py", 
> line 1148, in _create_pardo_operation
>     dofn_data = pickler.loads(serialized_fn)  
>  
>   File "python3.5/site-packages/apache_beam/internal/pickler.py", line 265, 
> in loads
>     return dill.loads(s)  
>  
>   File "python3.5/site-packages/dill/_dill.py", line 317, in loads
>  
>     return load(file, ignore) 
>  
>   File "python3.5/site-packages/dill/_dill.py", line 305, in load 
>  
>     obj = pik.load()  
>  
>   File "python3.5/site-packages/dill/_dill.py", line 474, in find_class   
>  
>     return StockUnpickler.find_class(self, module, name)  
>  
> AttributeError: Can't get attribute 'ClassName' on  'python3.5/site-packages/filename.py'>
> {noformat}
> According to Guenther from [1]:
> {quote}
> This looks exactly like a race condition that we've encountered on Python
> 3.7.1: There's a bug in some older 3.7.x releases that breaks the
> thread-safety of the unpickler, as concurrent unpickle threads can access a
> module before it has been fully imported. See
> https://bugs.python.org/issue34572 for more information.
> The traceback shows a Python 3.6 venv so this could be a different issue
> (the unpickle bug was introduced in version 3.7). If it's the same bug then
> upgrading to Python 3.7.3 or higher should fix that issue. One potential
> workaround is to ensure that all of the modules get imported during the
> initialization of the sdk_worker, as this bug only affects imports done by
> the unpickler.
> {quote}
> Opening this for visibility. Current open questions are:
> 1. Find a minimal example to reproduce this issue.
> 2. Figure out whether users are still affected by this issue on Python 3.7.3.
> 3. Communicate workarounds to Python 3.5 and 3.6 users affected by this.
> [1] 
> https://lists.apache.org/thread.html/5581ddfcf6d2ae10d25b834b8a61ebee265ffbcf650c6ec8d1e69408@%3Cdev.beam.apache.org%3E
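
The workaround suggested in the quote (import all needed modules during initialization, before any unpickling happens on other threads) can be sketched like this. It is a minimal illustration using the standard pickle module rather than dill or Beam's pickler, and the function names preimport and unpickle_all are hypothetical; it is not the sdk_worker implementation.

```python
import importlib
import pickle
from concurrent.futures import ThreadPoolExecutor


def preimport(module_names):
    """Import each module eagerly on the calling thread.

    Doing this once, before worker threads start, means the unpickler never
    has to trigger an import itself, which is where the race was reported.
    """
    return [importlib.import_module(name) for name in module_names]


def unpickle_all(payloads, module_names, max_workers=4):
    """Pre-import the given modules, then unpickle the payloads concurrently."""
    preimport(module_names)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(pickle.loads, payloads))
```

The list of module names has to be known up front, which is the main limitation of this approach; it only helps for modules the pipeline author can enumerate before the worker starts processing.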



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8581) Python SDK labels ontime empty panes as late

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8581?focusedWorklogId=348251&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348251
 ]

ASF GitHub Bot logged work on BEAM-8581:


Author: ASF GitHub Bot
Created on: 22/Nov/19 18:35
Start Date: 22/Nov/19 18:35
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on issue #10035: [BEAM-8581] and 
[BEAM-8582] watermark and trigger fixes
URL: https://github.com/apache/beam/pull/10035#issuecomment-557643361
 
 
   Argh, sorry for the force pushes. Local repo got into a weird state.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 348251)
Time Spent: 5h  (was: 4h 50m)

> Python SDK labels ontime empty panes as late
> 
>
> Key: BEAM-8581
> URL: https://issues.apache.org/jira/browse/BEAM-8581
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> The GeneralTriggerDriver does not put watermark holds on timers, leading to 
> the ontime empty pane being considered late data.
> Fix: Add a new notion of whether a trigger has an ontime pane. If it does, 
> then set a watermark hold to end of window - 1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8743) Add support for flat schemas in pubsub

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8743?focusedWorklogId=348252&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348252
 ]

ASF GitHub Bot logged work on BEAM-8743:


Author: ASF GitHub Bot
Created on: 22/Nov/19 18:41
Start Date: 22/Nov/19 18:41
Worklog Time Spent: 10m 
  Work Description: alexvanboxel commented on issue #10158: [BEAM-8743] Add 
support for flat schemas in pubsub
URL: https://github.com/apache/beam/pull/10158#issuecomment-557645856
 
 
   I suppose this PR doesn't include the OPTION syntax as proposed in the 
mailing list. If the OPTION syntax is introduced would this include 3 mapping 
possibilities?
   
   1) if option is available, use the options
   2) otherwise: use this PR
   3) default back to classic
   
   Isn't this problematic?
 



Issue Time Tracking
---

Worklog Id: (was: 348252)
Time Spent: 2.5h  (was: 2h 20m)

> Add support for flat schemas in pubsub
> --
>
> Key: BEAM-8743
> URL: https://issues.apache.org/jira/browse/BEAM-8743
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> See 
> https://lists.apache.org/thread.html/bf4c37f21bda194d7f8c40f6e7b9a776262415755cc1658412af3c76@%3Cdev.beam.apache.org%3E





[jira] [Work logged] (BEAM-7961) Add tests for all runner native transforms and some widely used composite transforms to cross-language validates runner test suite

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7961?focusedWorklogId=348259&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348259
 ]

ASF GitHub Bot logged work on BEAM-7961:


Author: ASF GitHub Bot
Created on: 22/Nov/19 18:50
Start Date: 22/Nov/19 18:50
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #10051: [WIP/BEAM-7961] Add 
tests for all runner native transforms for XLang
URL: https://github.com/apache/beam/pull/10051#issuecomment-557648937
 
 
   run java precommit
 



Issue Time Tracking
---

Worklog Id: (was: 348259)
Time Spent: 1h 10m  (was: 1h)

> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite
> --
>
> Key: BEAM-7961
> URL: https://issues.apache.org/jira/browse/BEAM-7961
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite





[jira] [Work logged] (BEAM-4776) Java PortableRunner should support metrics

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4776?focusedWorklogId=348260&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348260
 ]

ASF GitHub Bot logged work on BEAM-4776:


Author: ASF GitHub Bot
Created on: 22/Nov/19 18:50
Start Date: 22/Nov/19 18:50
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on pull request #10105: [BEAM-4776] 
Add metrics support to Java PortableRunner
URL: https://github.com/apache/beam/pull/10105#discussion_r349742458
 
 

 ##
 File path: runners/flink/job-server/flink_job_server.gradle
 ##
 @@ -139,12 +139,9 @@ def portableValidatesRunnerTask(String name, Boolean 
streaming) {
   includeCategories 'org.apache.beam.sdk.testing.ValidatesRunner'
   excludeCategories 
'org.apache.beam.sdk.testing.FlattenWithHeterogeneousCoders'
   excludeCategories 'org.apache.beam.sdk.testing.LargeKeys$Above100MB'
-  excludeCategories 'org.apache.beam.sdk.testing.UsesAttemptedMetrics'
 
 Review comment:
   I'm not sure. I can create a PR to check this; it's a topic worth 
investigating.
 



Issue Time Tracking
---

Worklog Id: (was: 348260)
Time Spent: 6.5h  (was: 6h 20m)

> Java PortableRunner should support metrics
> --
>
> Key: BEAM-4776
> URL: https://issues.apache.org/jira/browse/BEAM-4776
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Eugene Kirpichov
>Assignee: Michal Walenia
>Priority: Major
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> BEAM-4775 concerns adding metrics to the JobService API; the current issue is 
> about making PortableRunner understand them.





[jira] [Work logged] (BEAM-7961) Add tests for all runner native transforms and some widely used composite transforms to cross-language validates runner test suite

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7961?focusedWorklogId=348258&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348258
 ]

ASF GitHub Bot logged work on BEAM-7961:


Author: ASF GitHub Bot
Created on: 22/Nov/19 18:50
Start Date: 22/Nov/19 18:50
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #10051: [WIP/BEAM-7961] Add 
tests for all runner native transforms for XLang
URL: https://github.com/apache/beam/pull/10051#issuecomment-557648937
 
 
   run java precommit
 



Issue Time Tracking
---

Worklog Id: (was: 348258)
Time Spent: 1h  (was: 50m)

> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite
> --
>
> Key: BEAM-7961
> URL: https://issues.apache.org/jira/browse/BEAM-7961
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite





[jira] [Work logged] (BEAM-7594) test_read_from_text_with_file_name_file_pattern is flaky

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7594?focusedWorklogId=348261&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348261
 ]

ASF GitHub Bot logged work on BEAM-7594:


Author: ASF GitHub Bot
Created on: 22/Nov/19 18:51
Start Date: 22/Nov/19 18:51
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #10194: [BEAM-7594] Fix 
flaky filename generation
URL: https://github.com/apache/beam/pull/10194#discussion_r349742809
 
 

 ##
 File path: sdks/python/apache_beam/io/textio_test.py
 ##
 @@ -101,17 +100,19 @@ def write_data(
 return f.name, [line.decode('utf-8') for line in all_data]
 
 
-def write_pattern(lines_per_file, no_data=False):
+def write_pattern(lines_per_file, no_data=False, return_filenames=False):
 
 Review comment:
   I don't believe any of the other tests care about the filenames.
 



Issue Time Tracking
---

Worklog Id: (was: 348261)
Time Spent: 1h  (was: 50m)

> test_read_from_text_with_file_name_file_pattern is flaky
> 
>
> Key: BEAM-7594
> URL: https://issues.apache.org/jira/browse/BEAM-7594
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Udi Meiri
>Priority: Critical
>  Labels: currently-failing, flake
> Fix For: Not applicable
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> cc: [~lcaggio] [~chamikara]
> {noformat}
> 22:05:08 
> ==
> 22:05:08 ERROR: test_read_from_text_with_file_name_file_pattern 
> (apache_beam.io.textio_test.TextSourceTest)
> 22:05:08 
> --
> 22:05:08 Traceback (most recent call last):
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/io/textio_test.py",
>  line 517, in test_read_from_text_with_file_name_file_pattern
> 22:05:08 pipeline.run()
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/testing/test_pipeline.py",
>  line 107, in run
> 22:05:08 else test_runner_api))
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 406, in run
> 22:05:08 self._options).run(False)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 419, in run
> 22:05:08 return self.runner.run_pipeline(self, self._options)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/direct/direct_runner.py",
>  line 128, in run_pipeline
> 22:05:08 return runner.run_pipeline(pipeline, options)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 294, in run_pipeline
> 22:05:08 default_environment=self._default_environment))
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 301, in run_via_runner_api
> 22:05:08 return self.run_stages(stage_context, stages)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 383, in run_stages
> 22:05:08 stage_context.safe_coders)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 655, in _run_stage
> 22:05:08 result, splits = bundle_manager.process_bundle(data_input, 
> data_output)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam

[jira] [Work logged] (BEAM-7948) Add time-based cache threshold support in the Java data service

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7948?focusedWorklogId=348262&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348262
 ]

ASF GitHub Bot logged work on BEAM-7948:


Author: ASF GitHub Bot
Created on: 22/Nov/19 18:52
Start Date: 22/Nov/19 18:52
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #9949: [BEAM-7948] 
Add time-based cache threshold support in the Java data s…
URL: https://github.com/apache/beam/pull/9949
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 348262)
Time Spent: 3h 40m  (was: 3.5h)

> Add time-based cache threshold support in the Java data service
> ---
>
> Key: BEAM-7948
> URL: https://issues.apache.org/jira/browse/BEAM-7948
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Currently only size-based cache threshold is supported in data service. It 
> should also support the time-based cache threshold. This is very important, 
> especially for streaming jobs which are sensitive to the delay.





[jira] [Work logged] (BEAM-7594) test_read_from_text_with_file_name_file_pattern is flaky

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7594?focusedWorklogId=348263&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348263
 ]

ASF GitHub Bot logged work on BEAM-7594:


Author: ASF GitHub Bot
Created on: 22/Nov/19 18:52
Start Date: 22/Nov/19 18:52
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #10194: [BEAM-7594] Fix flaky 
filename generation
URL: https://github.com/apache/beam/pull/10194#issuecomment-557649654
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 348263)
Time Spent: 1h 10m  (was: 1h)

> test_read_from_text_with_file_name_file_pattern is flaky
> 
>
> Key: BEAM-7594
> URL: https://issues.apache.org/jira/browse/BEAM-7594
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Udi Meiri
>Priority: Critical
>  Labels: currently-failing, flake
> Fix For: Not applicable
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> cc: [~lcaggio] [~chamikara]
> {noformat}
> 22:05:08 
> ==
> 22:05:08 ERROR: test_read_from_text_with_file_name_file_pattern 
> (apache_beam.io.textio_test.TextSourceTest)
> 22:05:08 
> --
> 22:05:08 Traceback (most recent call last):
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/io/textio_test.py",
>  line 517, in test_read_from_text_with_file_name_file_pattern
> 22:05:08 pipeline.run()
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/testing/test_pipeline.py",
>  line 107, in run
> 22:05:08 else test_runner_api))
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 406, in run
> 22:05:08 self._options).run(False)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 419, in run
> 22:05:08 return self.runner.run_pipeline(self, self._options)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/direct/direct_runner.py",
>  line 128, in run_pipeline
> 22:05:08 return runner.run_pipeline(pipeline, options)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 294, in run_pipeline
> 22:05:08 default_environment=self._default_environment))
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 301, in run_via_runner_api
> 22:05:08 return self.run_stages(stage_context, stages)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 383, in run_stages
> 22:05:08 stage_context.safe_coders)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 655, in _run_stage
> 22:05:08 result, splits = bundle_manager.process_bundle(data_input, 
> data_output)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 1471, in process_bundle
> 22:05:08 result_future = 
> self._controller.control_handler.push(process_bundle_req)
> 22:05:08   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Phrase/src/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 99

[jira] [Created] (BEAM-8809) AvroWriteRequest should have a public constructor

2019-11-22 Thread Steve Niemitz (Jira)
Steve Niemitz created BEAM-8809:
---

 Summary: AvroWriteRequest should have a public constructor
 Key: BEAM-8809
 URL: https://issues.apache.org/jira/browse/BEAM-8809
 Project: Beam
  Issue Type: Improvement
  Components: io-java-gcp
Reporter: Steve Niemitz
Assignee: Steve Niemitz


AvroWriteRequest currently has an internal constructor, which prevents users 
from unit testing their avro format functions for the BQ writer.  This 
constructor should be public.





[jira] [Work logged] (BEAM-8809) AvroWriteRequest should have a public constructor

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8809?focusedWorklogId=348265&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348265
 ]

ASF GitHub Bot logged work on BEAM-8809:


Author: ASF GitHub Bot
Created on: 22/Nov/19 18:55
Start Date: 22/Nov/19 18:55
Worklog Time Spent: 10m 
  Work Description: steveniemitz commented on pull request #10199: 
[BEAM-8809] Make the constructor for AvroWriteRequest public
URL: https://github.com/apache/beam/pull/10199
 
 
   It's currently not possible to test format functions for the BQ writer 
because it's impossible to construct instances of AvroWriteRequest from outside 
of the IO.  Users should be able to construct instances of it for testing 
purposes.
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [x] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [x] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   

[jira] [Work logged] (BEAM-4776) Java PortableRunner should support metrics

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4776?focusedWorklogId=348267&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348267
 ]

ASF GitHub Bot logged work on BEAM-4776:


Author: ASF GitHub Bot
Created on: 22/Nov/19 18:55
Start Date: 22/Nov/19 18:55
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on pull request #10105: [BEAM-4776] 
Add metrics support to Java PortableRunner
URL: https://github.com/apache/beam/pull/10105#discussion_r349744114
 
 

 ##
 File path: runners/flink/job-server/flink_job_server.gradle
 ##
 @@ -139,12 +139,9 @@ def portableValidatesRunnerTask(String name, Boolean 
streaming) {
   includeCategories 'org.apache.beam.sdk.testing.ValidatesRunner'
   excludeCategories 
'org.apache.beam.sdk.testing.FlattenWithHeterogeneousCoders'
   excludeCategories 'org.apache.beam.sdk.testing.LargeKeys$Above100MB'
-  excludeCategories 'org.apache.beam.sdk.testing.UsesAttemptedMetrics'
 
 Review comment:
   https://github.com/apache/beam/pull/10198 it's here
 



Issue Time Tracking
---

Worklog Id: (was: 348267)
Time Spent: 6h 40m  (was: 6.5h)

> Java PortableRunner should support metrics
> --
>
> Key: BEAM-4776
> URL: https://issues.apache.org/jira/browse/BEAM-4776
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Eugene Kirpichov
>Assignee: Michal Walenia
>Priority: Major
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> BEAM-4775 concerns adding metrics to the JobService API; the current issue is 
> about making PortableRunner understand them.





[jira] [Work logged] (BEAM-8743) Add support for flat schemas in pubsub

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8743?focusedWorklogId=348276&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348276
 ]

ASF GitHub Bot logged work on BEAM-8743:


Author: ASF GitHub Bot
Created on: 22/Nov/19 19:07
Start Date: 22/Nov/19 19:07
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #10158: [BEAM-8743] Add 
support for flat schemas in pubsub
URL: https://github.com/apache/beam/pull/10158#issuecomment-557655251
 
 
   I was thinking that when we add something like the OPTION syntax we would 
get rid of (3). I only kept it there now so that this is not a regression (i.e. 
anyone who is reading attributes with the classic approach can continue to do 
so). Does that seem reasonable?
 



Issue Time Tracking
---

Worklog Id: (was: 348276)
Time Spent: 2h 40m  (was: 2.5h)

> Add support for flat schemas in pubsub
> --
>
> Key: BEAM-8743
> URL: https://issues.apache.org/jira/browse/BEAM-8743
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> See 
> https://lists.apache.org/thread.html/bf4c37f21bda194d7f8c40f6e7b9a776262415755cc1658412af3c76@%3Cdev.beam.apache.org%3E





[jira] [Created] (BEAM-8810) Dataflow runner - Work stuck in state COMMITTING with streaming commit rpcs

2019-11-22 Thread Sam Whittle (Jira)
Sam Whittle created BEAM-8810:
-

 Summary: Dataflow runner - Work stuck in state COMMITTING with 
streaming commit rpcs
 Key: BEAM-8810
 URL: https://issues.apache.org/jira/browse/BEAM-8810
 Project: Beam
  Issue Type: Bug
  Components: runner-dataflow
Reporter: Sam Whittle
Assignee: Sam Whittle


In several pipelines using streaming engine and thus the streaming commit rpcs, 
work became stuck in state COMMITTING indefinitely.  Such stuckness coincided 
with repeated streaming rpc failures.

The status page shows that the key has work in state COMMITTING, and has 1 
queued work item.
There is a single active commit stream, with 0 pending requests.

The stream could exist past the stream deadline because the StreamCache only 
closes a stream due to the deadline when a stream is retrieved, which only occurs 
if there are other commits.  Since the pipeline is stuck due to this event, 
there are no other commits.
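
The behaviour described above can be illustrated with a minimal Python sketch. It is not the Dataflow worker's StreamCache: `LazyDeadlineCache`, `FakeStream`, and the injected `clock` are hypothetical names, used only to show why a deadline that is checked solely on retrieval never fires once retrievals stop.

```python
class FakeStream:
    """Hypothetical stand-in for a commit stream; only tracks whether it was closed."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class LazyDeadlineCache:
    """Hypothetical cache that enforces the stream deadline only inside get()."""

    def __init__(self, factory, deadline_seconds, clock):
        self._factory = factory
        self._deadline = deadline_seconds
        self._clock = clock
        self._stream = None
        self._created_at = None

    def get(self):
        now = self._clock()
        expired = (
            self._stream is not None
            and now - self._created_at >= self._deadline
        )
        if expired:
            # The only place an over-deadline stream is ever closed.
            self._stream.close()
            self._stream = None
        if self._stream is None:
            self._stream = self._factory()
            self._created_at = now
        return self._stream


# Simulated clock so the example does not need to sleep.
now = [0.0]
cache = LazyDeadlineCache(FakeStream, deadline_seconds=10.0, clock=lambda: now[0])

first = cache.get()
now[0] = 100.0                   # far past the deadline, but nobody calls get()
stale_but_open = not first.closed

second = cache.get()             # the next retrieval finally closes and replaces it
```

In this sketch the stale stream stays open for as long as no caller asks for a stream, which mirrors the stuck pipeline: no new commits means no retrieval, so the deadline is never applied.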

It therefore seems that there is either a race on the commitStream between 
onNewStream and commitWork that prevents work from being retried, an exception 
thrown between when the pending request is removed and when the callback is 
called, or some corruption of the activeWork data structure. 





[jira] [Work logged] (BEAM-8802) Timestamp combiner not respected across bundles in streaming mode.

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8802?focusedWorklogId=348278&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348278
 ]

ASF GitHub Bot logged work on BEAM-8802:


Author: ASF GitHub Bot
Created on: 22/Nov/19 19:10
Start Date: 22/Nov/19 19:10
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10191: [BEAM-8802] 
Don't clear watermark hold when adding elements.
URL: https://github.com/apache/beam/pull/10191#discussion_r349750760
 
 

 ##
 File path: sdks/python/apache_beam/transforms/timeutil.py
 ##
 @@ -64,19 +64,19 @@ class TimestampCombinerImpl(with_metaclass(ABCMeta, 
object)):
 
   @abstractmethod
   def assign_output_time(self, window, input_timestamp):
-pass
+raise NotImplementedError
 
   @abstractmethod
   def combine(self, output_timestamp, other_output_timestamp):
-pass
+raise NotImplementedError
 
   def combine_all(self, merging_timestamps):
 """Apply combine to list of timestamps."""
 combined_output_time = None
 for output_time in merging_timestamps:
   if combined_output_time is None:
 combined_output_time = output_time
-  else:
+  elif output_time is not None:
 combined_output_time = self.combine(
 combined_output_time, output_time)
 
 Review comment:
   Currently the code can handle `None`s up until the first non-None value, and 
not after that. This makes it more consistent by allowing any values to be 
`None`. The else is to go onto the next value. 
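
   A standalone sketch of the behaviour change discussed in this comment, written as free functions rather than the `TimestampCombinerImpl` method. The names `combine_all_old` and `combine_all_new`, and the use of `min` as the combine function, are assumptions for illustration only.

```python
def combine_all_old(timestamps, combine):
    """Pre-change logic: tolerates None only before the first real value."""
    combined = None
    for ts in timestamps:
        if combined is None:
            combined = ts
        else:
            combined = combine(combined, ts)
    return combined


def combine_all_new(timestamps, combine):
    """Post-change logic: None entries are skipped wherever they appear."""
    combined = None
    for ts in timestamps:
        if combined is None:
            combined = ts
        elif ts is not None:
            combined = combine(combined, ts)
    return combined


timestamps = [None, 30, None, 10]

earliest = combine_all_new(timestamps, min)

try:
    combine_all_old(timestamps, min)
    old_failed = False
except TypeError:
    # In Python 3, min(30, None) cannot compare int with NoneType.
    old_failed = True
```

   With `elif output_time is not None`, a `None` after the first real value is simply skipped instead of being passed to `combine`.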
 



Issue Time Tracking
---

Worklog Id: (was: 348278)
Time Spent: 1h  (was: 50m)

> Timestamp combiner not respected across bundles in streaming mode.
> --
>
> Key: BEAM-8802
> URL: https://issues.apache.org/jira/browse/BEAM-8802
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8802) Timestamp combiner not respected across bundles in streaming mode.

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8802?focusedWorklogId=348280&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348280
 ]

ASF GitHub Bot logged work on BEAM-8802:


Author: ASF GitHub Bot
Created on: 22/Nov/19 19:10
Start Date: 22/Nov/19 19:10
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10191: [BEAM-8802] 
Don't clear watermark hold when adding elements.
URL: https://github.com/apache/beam/pull/10191
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 348280)
Time Spent: 1h 10m  (was: 1h)

> Timestamp combiner not respected across bundles in streaming mode.
> --
>
> Key: BEAM-8802
> URL: https://issues.apache.org/jira/browse/BEAM-8802
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-4776) Java PortableRunner should support metrics

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4776?focusedWorklogId=348279&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348279
 ]

ASF GitHub Bot logged work on BEAM-4776:


Author: ASF GitHub Bot
Created on: 22/Nov/19 19:10
Start Date: 22/Nov/19 19:10
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on pull request #10105: [BEAM-4776] 
Add metrics support to Java PortableRunner
URL: https://github.com/apache/beam/pull/10105#discussion_r349750769
 
 

 ##
 File path: runners/flink/job-server/flink_job_server.gradle
 ##
 @@ -139,12 +139,9 @@ def portableValidatesRunnerTask(String name, Boolean streaming) {
   includeCategories 'org.apache.beam.sdk.testing.ValidatesRunner'
   excludeCategories 'org.apache.beam.sdk.testing.FlattenWithHeterogeneousCoders'
   excludeCategories 'org.apache.beam.sdk.testing.LargeKeys$Above100MB'
-  excludeCategories 'org.apache.beam.sdk.testing.UsesAttemptedMetrics'
 
 Review comment:
   Hmm, now that I think of it, you probably wanted to check the impact of my 
changes on the Spark runner, right?
 



Issue Time Tracking
---

Worklog Id: (was: 348279)
Time Spent: 6h 50m  (was: 6h 40m)

> Java PortableRunner should support metrics
> --
>
> Key: BEAM-4776
> URL: https://issues.apache.org/jira/browse/BEAM-4776
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Eugene Kirpichov
>Assignee: Michal Walenia
>Priority: Major
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> BEAM-4775 concerns adding metrics to the JobService API; the current issue is 
> about making PortableRunner understand them.





[jira] [Created] (BEAM-8811) Upgrade Beam pipeline diagrams in docs

2019-11-22 Thread Cyrus Maden (Jira)
Cyrus Maden created BEAM-8811:
-

 Summary: Upgrade Beam pipeline diagrams in docs
 Key: BEAM-8811
 URL: https://issues.apache.org/jira/browse/BEAM-8811
 Project: Beam
  Issue Type: Improvement
  Components: website
Reporter: Cyrus Maden
Assignee: Cyrus Maden








[jira] [Commented] (BEAM-8803) Default behaviour for Python BQ Streaming inserts sink should be to retry always

2019-11-22 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980434#comment-16980434
 ] 

Pablo Estrada commented on BEAM-8803:
-

After discussing with Cham, we will change the default behavior to always 
retry. We will also add notices for 2.15.0 and 2.16.0 noting that the default 
behavior is to retry only on transient errors, not permanent ones, and that 
relying on the default can lose rows that hit permanent errors if the 
dead-letter PCollection is not consumed downstream.
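
To make the concern concrete, here is a minimal pure-Python sketch. It is not Beam code: `RetryPolicy`, `insert_rows`, and the outcome labels are hypothetical names invented for this illustration. It simulates a sink that can either retry every failure or retry only transient ones. In the second mode, rows that fail permanently go to a dead-letter list and are effectively lost unless the caller reads that list.

```python
from enum import Enum
from typing import Callable, List, Tuple


class RetryPolicy(Enum):
  # Hypothetical names, loosely modeled on the two behaviors discussed.
  RETRY_ALWAYS = "always"
  RETRY_ON_TRANSIENT_ERROR = "transient-only"


def insert_rows(
    rows: List[dict],
    try_insert: Callable[[dict], str],
    policy: RetryPolicy,
    max_attempts: int = 5,
) -> Tuple[List[dict], List[dict]]:
  """Returns (inserted, dead_letter).

  try_insert returns "ok", "transient", or "permanent" for a row.
  max_attempts only bounds this simulation; a real retry-always sink
  may keep retrying indefinitely.
  """
  inserted, dead_letter = [], []
  for row in rows:
    for _ in range(max_attempts):
      outcome = try_insert(row)
      if outcome == "ok":
        inserted.append(row)
        break
      if (outcome == "permanent"
          and policy is RetryPolicy.RETRY_ON_TRANSIENT_ERROR):
        # Not retried: the row survives only if someone consumes
        # the dead-letter output.
        dead_letter.append(row)
        break
    else:
      # Attempts exhausted (reachable only in this bounded simulation).
      dead_letter.append(row)
  return inserted, dead_letter
```

A caller that ignores the second element of the returned tuple silently drops the permanently failing rows, which is the data-loss scenario the notices are meant to warn about.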

> Default behaviour for Python BQ Streaming inserts sink should be to retry 
> always
> 
>
> Key: BEAM-8803
> URL: https://issues.apache.org/jira/browse/BEAM-8803
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=348286&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348286
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 22/Nov/19 19:21
Start Date: 22/Nov/19 19:21
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on issue #10190: [BEAM-8575] 
Added two unit tests to CombineTest class to test that Co…
URL: https://github.com/apache/beam/pull/10190#issuecomment-557659922
 
 
   This test (test_ConcatIntCombineFn_combine) is very similar to #10173 
(test_MeanCombineFn_combine), except that it is meant to be used for testing 
counters.
   
   We can either keep this test (test_ConcatIntCombineFn_combine) 
and then write the counter test using this pipeline,
   or skip this test and write the counter test using #10173 
(test_MeanCombineFn_combine) instead.
   Which one do you think is better?
   
   (An email with more details is sent to the reviewer.)
 



Issue Time Tracking
---

Worklog Id: (was: 348286)
Time Spent: 19h  (was: 18h 50m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 19h
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.





[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=348296&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348296
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 22/Nov/19 19:33
Start Date: 22/Nov/19 19:33
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on pull request #10159: 
[BEAM-8575] Added a unit test to CombineTest class to test that Combi…
URL: https://github.com/apache/beam/pull/10159#discussion_r349760651
 
 

 ##
 File path: sdks/python/apache_beam/transforms/combiners_test.py
 ##
 @@ -393,6 +398,18 @@ def test_global_fanout(self):
   | beam.CombineGlobally(combine.MeanCombineFn()).with_fanout(11))
   assert_that(result, equal_to([49.5]))
 
+  @attr('ValidatesRunner')
 
 Review comment:
   When should we use ValidatesRunner, and when should we not?
 



Issue Time Tracking
---

Worklog Id: (was: 348296)
Time Spent: 19h 10m  (was: 19h)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 19h 10m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.





[jira] [Work logged] (BEAM-8586) [SQL] Add a server for MongoDb Integration Test

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8586?focusedWorklogId=348294&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348294
 ]

ASF GitHub Bot logged work on BEAM-8586:


Author: ASF GitHub Bot
Created on: 22/Nov/19 19:33
Start Date: 22/Nov/19 19:33
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on issue #10061: [BEAM-8586] [SQL] 
Fix MongoDb integration tests
URL: https://github.com/apache/beam/pull/10061#issuecomment-555291639
 
 
   Run sql postcommit
 



Issue Time Tracking
---

Worklog Id: (was: 348294)
Time Spent: 1.5h  (was: 1h 20m)

> [SQL] Add a server for MongoDb Integration Test
> ---
>
> Key: BEAM-8586
> URL: https://issues.apache.org/jira/browse/BEAM-8586
> Project: Beam
>  Issue Type: Test
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Priority: Minor
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> We need to pass pipeline options with server information to the 
> MongoDbReadWriteIT.
> For now that test is ignored and excluded from the build.gradle file.





[jira] [Work logged] (BEAM-8586) [SQL] Add a server for MongoDb Integration Test

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8586?focusedWorklogId=348295&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348295
 ]

ASF GitHub Bot logged work on BEAM-8586:


Author: ASF GitHub Bot
Created on: 22/Nov/19 19:33
Start Date: 22/Nov/19 19:33
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on issue #10061: [BEAM-8586] [SQL] 
Fix MongoDb integration tests
URL: https://github.com/apache/beam/pull/10061#issuecomment-557664454
 
 
   Run sql postcommit
 



Issue Time Tracking
---

Worklog Id: (was: 348295)
Time Spent: 1h 40m  (was: 1.5h)

> [SQL] Add a server for MongoDb Integration Test
> ---
>
> Key: BEAM-8586
> URL: https://issues.apache.org/jira/browse/BEAM-8586
> Project: Beam
>  Issue Type: Test
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Priority: Minor
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> We need to pass pipeline options with server information to the 
> MongoDbReadWriteIT.
> For now that test is ignored and excluded from the build.gradle file.





[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=348298&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348298
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 22/Nov/19 19:35
Start Date: 22/Nov/19 19:35
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on pull request #10159: 
[BEAM-8575] Added a unit test to CombineTest class to test that Combi…
URL: https://github.com/apache/beam/pull/10159#discussion_r349761098
 
 

 ##
 File path: sdks/python/apache_beam/transforms/combiners_test.py
 ##
 @@ -393,6 +398,18 @@ def test_global_fanout(self):
   | beam.CombineGlobally(combine.MeanCombineFn()).with_fanout(11))
   assert_that(result, equal_to([49.5]))
 
+  @attr('ValidatesRunner')
+  def test_hot_key_combining_with_accumulation_mode(self):
 +    with TestPipeline() as p:
 +      result = (p
 +                | beam.Create([1, 2, 3, 4, 5])
 +                | beam.WindowInto(GlobalWindows(),
 +                                  trigger=Repeatedly(AfterCount(1)),
 +                                  accumulation_mode=
 +                                  AccumulationMode.ACCUMULATING)
 +                | beam.CombineGlobally(sum).without_defaults().with_fanout(2))
 
 Review comment:
   It is really unnecessary; I was following the Java test for parity.
 



Issue Time Tracking
---

Worklog Id: (was: 348298)
Time Spent: 19h 20m  (was: 19h 10m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 19h 20m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.





[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=348299&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348299
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 22/Nov/19 19:35
Start Date: 22/Nov/19 19:35
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on pull request #10159: 
[BEAM-8575] Added a unit test to CombineTest class to test that Combi…
URL: https://github.com/apache/beam/pull/10159#discussion_r349761098
 
 

 ##
 File path: sdks/python/apache_beam/transforms/combiners_test.py
 ##
 @@ -393,6 +398,18 @@ def test_global_fanout(self):
   | beam.CombineGlobally(combine.MeanCombineFn()).with_fanout(11))
   assert_that(result, equal_to([49.5]))
 
+  @attr('ValidatesRunner')
+  def test_hot_key_combining_with_accumulation_mode(self):
 +    with TestPipeline() as p:
 +      result = (p
 +                | beam.Create([1, 2, 3, 4, 5])
 +                | beam.WindowInto(GlobalWindows(),
 +                                  trigger=Repeatedly(AfterCount(1)),
 +                                  accumulation_mode=
 +                                  AccumulationMode.ACCUMULATING)
 +                | beam.CombineGlobally(sum).without_defaults().with_fanout(2))
 
 Review comment:
   It is really unnecessary; I was following the Java test for parity. I'll remove it.
 



Issue Time Tracking
---

Worklog Id: (was: 348299)
Time Spent: 19.5h  (was: 19h 20m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 19.5h
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.





[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=348301&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348301
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 22/Nov/19 19:36
Start Date: 22/Nov/19 19:36
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on pull request #10159: 
[BEAM-8575] Added a unit test to CombineTest class to test that Combi…
URL: https://github.com/apache/beam/pull/10159#discussion_r349761098
 
 

 ##
 File path: sdks/python/apache_beam/transforms/combiners_test.py
 ##
 @@ -393,6 +398,18 @@ def test_global_fanout(self):
   | beam.CombineGlobally(combine.MeanCombineFn()).with_fanout(11))
   assert_that(result, equal_to([49.5]))
 
+  @attr('ValidatesRunner')
+  def test_hot_key_combining_with_accumulation_mode(self):
 +    with TestPipeline() as p:
 +      result = (p
 +                | beam.Create([1, 2, 3, 4, 5])
 +                | beam.WindowInto(GlobalWindows(),
 +                                  trigger=Repeatedly(AfterCount(1)),
 +                                  accumulation_mode=
 +                                  AccumulationMode.ACCUMULATING)
 +                | beam.CombineGlobally(sum).without_defaults().with_fanout(2))
 
 Review comment:
   It is really unnecessary; I was following the Java test for parity. There 
are quite a few things that don't make sense in the Java test. I'll remove it.
 



Issue Time Tracking
---

Worklog Id: (was: 348301)
Time Spent: 19h 40m  (was: 19.5h)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 19h 40m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.





[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=348317&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348317
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 22/Nov/19 19:41
Start Date: 22/Nov/19 19:41
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r349756938
 
 

 ##
 File path: sdks/python/apache_beam/transforms/window.py
 ##
 @@ -139,13 +148,15 @@ class MergeContext(object):
 """Context passed to WindowFn.merge() to perform merging, if any."""
 
 def __init__(self, windows):
+  # type: (Iterable[Union[IntervalWindow, GlobalWindow]]) -> None
 
 Review comment:
   Iterable[BoundedWindow]
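
A small sketch of what this suggestion amounts to, using stand-in classes rather than the real ones from `apache_beam.transforms.window`: when every accepted type shares a base class, annotating with the base class is shorter than a `Union` of subclasses and stays valid as new subclasses appear. The class bodies and the `max_end` helper are hypothetical and exist only for this illustration.

```python
from typing import Iterable


class BoundedWindow(object):
  """Stand-in base class; it only carries an end timestamp here."""

  def __init__(self, end):
    # type: (float) -> None
    self.end = end


class IntervalWindow(BoundedWindow):
  def __init__(self, start, end):
    # type: (float, float) -> None
    super(IntervalWindow, self).__init__(end)
    self.start = start


class GlobalWindow(BoundedWindow):
  def __init__(self):
    # type: () -> None
    super(GlobalWindow, self).__init__(float('inf'))


def max_end(windows):
  # type: (Iterable[BoundedWindow]) -> float
  """Accepts any mix of BoundedWindow subclasses without a Union."""
  return max(w.end for w in windows)
```

With this annotation a type checker accepts a list mixing `IntervalWindow` and `GlobalWindow` instances, since both are subtypes of `BoundedWindow`.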
 



Issue Time Tracking
---

Worklog Id: (was: 348317)
Time Spent: 28h 20m  (was: 28h 10m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 28h 20m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  





[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=348318&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348318
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 22/Nov/19 19:41
Start Date: 22/Nov/19 19:41
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r349760006
 
 

 ##
 File path: sdks/python/apache_beam/utils/timestamp.py
 ##
 @@ -136,14 +140,17 @@ def to_rfc3339(self):
 return self.to_utc_datetime().isoformat() + 'Z'
 
   def __float__(self):
+# type: () -> float
 
 Review comment:
   I'm surprised mypy doesn't know these. 
 



Issue Time Tracking
---

Worklog Id: (was: 348318)
Time Spent: 28.5h  (was: 28h 20m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 28.5h
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  





[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=348316&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348316
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 22/Nov/19 19:41
Start Date: 22/Nov/19 19:41
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r349757301
 
 

 ##
 File path: sdks/python/apache_beam/transforms/window.py
 ##
 @@ -367,7 +381,10 @@ class FixedWindows(NonMergingWindowFn):
   range.
   """
 
-  def __init__(self, size, offset=0):
+  def __init__(self,
+   size,  # type: Union[int, float, Duration]
 
 Review comment:
   Maybe make this into a named type? (Similarly for the Timestamp one.)
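
One way to read this suggestion, sketched with stand-in classes: define the `Union` once as a module-level alias and reuse it in each signature. The alias name `DurationLike`, the minimal `Duration` class, the `to_micros` helper, and `FixedWindowsSketch` are assumptions made for illustration, not names taken from the PR.

```python
from typing import Union


class Duration(object):
  """Stand-in for a duration type that stores whole microseconds."""

  def __init__(self, micros):
    # type: (int) -> None
    self.micros = micros


# Named alias: signatures refer to DurationLike instead of repeating the Union.
DurationLike = Union[int, float, Duration]


def to_micros(value):
  # type: (DurationLike) -> int
  """Normalizes seconds (int or float) or a Duration to microseconds."""
  if isinstance(value, Duration):
    return value.micros
  return int(round(value * 1000000))


class FixedWindowsSketch(object):
  """Hypothetical constructor showing the alias in a type comment."""

  def __init__(self, size, offset=0):
    # type: (DurationLike, DurationLike) -> None
    self.size_micros = to_micros(size)
    self.offset_micros = to_micros(offset)
```

The same approach would apply to the timestamp-accepting signatures: one alias next to the class definition, referenced wherever the union is needed.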
 



Issue Time Tracking
---

Worklog Id: (was: 348316)
Time Spent: 28h 10m  (was: 28h)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 28h 10m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  





[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=348319&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348319
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 22/Nov/19 19:41
Start Date: 22/Nov/19 19:41
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r349762049
 
 

 ##
 File path: sdks/python/mypy.ini
 ##
 @@ -0,0 +1,60 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
 +#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+[mypy]
+python_version = 3.6
+ignore_missing_imports = true
+follow_imports = true
+warn_no_return = true
+no_implicit_optional = true
+warn_redundant_casts = true
+warn_unused_ignores = true
+show_error_codes = true
+files = apache_beam
+color_output = true
+# uncomment this to see how close we are to being complete
+# check_untyped_defs = true
+
+[mypy-apache_beam.coders.proto2_coder_test_messages_pb2]
+ignore_errors = true
+
+[mypy-apache_beam.examples.*]
+ignore_errors = true
+
+[mypy-apache_beam.io.gcp.gcsfilesystem_test]
+# error: Cannot infer type of lambda  [misc]
+ignore_errors = true
+
+[mypy-apache_beam.io.gcp.internal.clients.storage.storage_v1_client]
+ignore_errors = true
+
+[mypy-apache_beam.io.gcp.internal.clients.bigquery.bigquery_v2_client]
+ignore_errors = true
+
+[mypy-apache_beam.portability.api.*]
+ignore_errors = true
+
+[mypy-apache_beam.runners.dataflow.internal.clients.dataflow.dataflow_v1b3_client]
+ignore_errors = true
+
+[mypy-apache_beam.typehints.typed_pipeline_test_py3]
+# error: Signature of "process" incompatible with supertype "DoFn"  [override]
+ignore_errors = true
+
+[mypy-apache_beam.typehints.typehints_test_py3]
+# error: Signature of "process" incompatible with supertype "DoFn"  [override]
+ignore_errors = true
 
 Review comment:
   newline
 



Issue Time Tracking
---

Worklog Id: (was: 348319)
Time Spent: 28h 40m  (was: 28.5h)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 28h 40m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  





[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=348321&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348321
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 22/Nov/19 19:41
Start Date: 22/Nov/19 19:41
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r349761222
 
 

 ##
 File path: sdks/python/apache_beam/utils/windowed_value.py
 ##
 @@ -237,6 +255,7 @@ def create(value, timestamp_micros, windows, pane_info=PANE_INFO_UNKNOWN):
 
 
 try:
+  # FIXME: for review: why not add this as a class attribute?
 
 Review comment:
   See comment below. (This is an artifact of trying to make Cython an 
/optional/ dependency.)
 



Issue Time Tracking
---

Worklog Id: (was: 348321)
Time Spent: 28h 50m  (was: 28h 40m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 28h 50m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  





[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=348315&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348315
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 22/Nov/19 19:41
Start Date: 22/Nov/19 19:41
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r349761709
 
 

 ##
 File path: sdks/python/apache_beam/utils/windowed_value.py
 ##
 @@ -249,10 +268,13 @@ class _IntervalWindowBase(object):
   """Optimized form of IntervalWindow storing only microseconds for endpoints.
   """
 
-  def __init__(self, start, end):
+  def __init__(self,
+   start,  # type: Optional[Union[int, float, Timestamp]]
 
 Review comment:
   Hmm... why are these (and below) optional?
 



Issue Time Tracking
---

Worklog Id: (was: 348315)
Time Spent: 28h  (was: 27h 50m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 28h
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  





[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=348314&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348314
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 22/Nov/19 19:41
Start Date: 22/Nov/19 19:41
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r349759842
 
 

 ##
 File path: sdks/python/apache_beam/utils/timestamp.py
 ##
 @@ -57,6 +60,7 @@ def __init__(self, seconds=0, micros=0):
 
   @staticmethod
   def of(seconds):
+# type: (Union[int, float, Timestamp]) -> Timestamp
 
 Review comment:
   Somewhere in this file would be a good place to define this Union type.
 



Issue Time Tracking
---

Worklog Id: (was: 348314)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 27h 50m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  





[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=348320&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348320
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 22/Nov/19 19:41
Start Date: 22/Nov/19 19:41
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r349762313
 
 

 ##
 File path: sdks/python/setup.py
 ##
 @@ -25,20 +25,52 @@
 import sys
 import warnings
 from distutils import log
+from distutils.errors import DistutilsError
 from distutils.version import StrictVersion
 
 # Pylint and isort disagree here.
 # pylint: disable=ungrouped-imports
 import setuptools
 from pkg_resources import DistributionNotFound
 from pkg_resources import get_distribution
+from pkg_resources import normalize_path
+from pkg_resources import to_filename
+from setuptools import Command
 from setuptools.command.build_py import build_py
-# TODO: (BEAM-8411): re-enable lint check.
-from setuptools.command.develop import develop  # pylint: disable-all
+from setuptools.command.develop import develop
 from setuptools.command.egg_info import egg_info
 from setuptools.command.test import test
 
 
+class mypy(Command):
+  user_options = []
+
+  def initialize_options(self):
+    """Abstract method that is required to be overwritten"""
+
+  def finalize_options(self):
+    """Abstract method that is required to be overwritten"""
+
+  def get_project_path(self):
+    self.run_command('egg_info')
+
+    # Build extensions in-place
+    self.reinitialize_command('build_ext', inplace=1)
+    self.run_command('build_ext')
+
+    ei_cmd = self.get_finalized_command("egg_info")
+
+    project_path = normalize_path(ei_cmd.egg_base)
+    return os.path.join(project_path, to_filename(ei_cmd.egg_name))
+
+  def run(self):
+    import subprocess
+    args = ['mypy', self.get_project_path()]
+    result = subprocess.call(args)
+    if result != 0:
+      raise DistutilsError()
 
 Review comment:
   No message?
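   A minimal sketch of what a more informative failure could look like. `CheckFailed` and `run_checker` are hypothetical names; `CheckFailed` stands in for `DistutilsError` so the example does not depend on `distutils`.

```python
import subprocess
import sys


class CheckFailed(Exception):
  """Stand-in for distutils.errors.DistutilsError in this sketch."""


def run_checker(args):
  """Run a command and raise an error that names it and its exit code."""
  result = subprocess.call(args)
  if result != 0:
    raise CheckFailed(
        'command %r failed with exit code %d' % (' '.join(args), result))


if __name__ == '__main__':
  # A child process that exits with status 3 stands in for a failing mypy run.
  try:
    run_checker([sys.executable, '-c', 'import sys; sys.exit(3)'])
  except CheckFailed as exc:
    print(exc)
```

   Including the command and its exit code in the message lets whoever reads the build log tell which step failed without rerunning it.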
 



Issue Time Tracking
---

Worklog Id: (was: 348320)
Time Spent: 28h 50m  (was: 28h 40m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 28h 50m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  





[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=348313&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348313
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 22/Nov/19 19:41
Start Date: 22/Nov/19 19:41
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r349757429
 
 

 ##
 File path: sdks/python/apache_beam/transforms/window.py
 ##
 @@ -505,7 +527,8 @@ def get_window_coder(self):
     return coders.IntervalWindowCoder()
 
   def merge(self, merge_context):
-    to_merge = []
+    # type: (WindowFn.MergeContext) -> None
+    to_merge = []  # type: List[Union[IntervalWindow, GlobalWindow]]
 
 Review comment:
   List[BoundedWindow]
 



Issue Time Tracking
---

Worklog Id: (was: 348313)
Time Spent: 27h 50m  (was: 27h 40m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 27h 50m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  





[jira] [Work logged] (BEAM-2572) Implement an S3 filesystem for Python SDK

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2572?focusedWorklogId=348336&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348336
 ]

ASF GitHub Bot logged work on BEAM-2572:


Author: ASF GitHub Bot
Created on: 22/Nov/19 20:01
Start Date: 22/Nov/19 20:01
Worklog Time Spent: 10m 
  Work Description: tamera-lanham commented on pull request #9955: 
[BEAM-2572] Python SDK S3 Filesystem
URL: https://github.com/apache/beam/pull/9955#discussion_r349771600
 
 

 ##
 File path: sdks/python/apache_beam/io/aws/s3io.py
 ##
 @@ -0,0 +1,604 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+"""AWS S3 client
+"""
+
+from __future__ import absolute_import
+
+import errno
+import io
+import logging
+import re
+import time
+import traceback
+from builtins import object
+
+from apache_beam.io.aws.clients.s3 import messages
+from apache_beam.io.filesystemio import Downloader
+from apache_beam.io.filesystemio import DownloaderStream
+from apache_beam.io.filesystemio import Uploader
+from apache_beam.io.filesystemio import UploaderStream
+from apache_beam.utils import retry
+
+try:
+  # pylint: disable=wrong-import-order, wrong-import-position
+  # pylint: disable=ungrouped-imports
+  from apache_beam.io.aws.clients.s3 import boto3_client
+  BOTO3_INSTALLED = True
+except ImportError:
+  BOTO3_INSTALLED = False
+
+MAX_BATCH_OPERATION_SIZE = 100
+
+
+def parse_s3_path(s3_path, object_optional=False):
+  """Return the bucket and object names of the given s3:// path."""
+  match = re.match('^s3://([^/]+)/(.*)$', s3_path)
+  if match is None or (match.group(2) == '' and not object_optional):
+    raise ValueError('S3 path must be in the form s3://<bucket>/<name>.')
+  return match.group(1), match.group(2)
+
+
+class S3IO(object):
+  """S3 I/O client."""
+
+  def __init__(self, client=None):
+    if client is not None:
+      self.client = client
+    elif BOTO3_INSTALLED:
+      self.client = boto3_client.Client()
+    else:
+      message = 'AWS dependencies are not installed, and no alternative ' \
+      'client was provided to S3IO.'
+      raise RuntimeError(message)
+
+  def open(self,
+           filename,
+           mode='r',
+           read_buffer_size=16*1024*1024,
+           mime_type='application/octet-stream'):
+    """Open an S3 file path for reading or writing.
+
+    Args:
+      filename (str): S3 file path in the form ``s3://<bucket>/<name>``.
+      mode (str): ``'r'`` for reading or ``'w'`` for writing.
+      read_buffer_size (int): Buffer size to use during read operations.
+      mime_type (str): Mime type to set for write operations.
 
 Review comment:
   This is addressed with the most recent changes, so the `mime_type` that the 
user sets will be reflected in the `ContentType` of the object in S3
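   The behaviour described here can be sketched as follows. Everything in this block is hypothetical (`FakeS3Client`, `write_object`, and the bucket and key names); it only illustrates the idea of forwarding the caller's `mime_type` so that it ends up as the stored object's `ContentType`, and is not the PR's implementation.

```python
class FakeS3Client(object):
  """Hypothetical in-memory client used only for this sketch."""

  def __init__(self):
    self.objects = {}

  def put_object(self, bucket, key, body, content_type):
    # Record the content type next to the payload, as S3 does with metadata.
    self.objects[(bucket, key)] = {'Body': body, 'ContentType': content_type}


def write_object(client, bucket, key, data,
                 mime_type='application/octet-stream'):
  """Store data and record the caller's mime_type as the content type."""
  client.put_object(bucket, key, data, content_type=mime_type)


client = FakeS3Client()
write_object(client, 'my-bucket', 'report.csv', b'a,b\n', mime_type='text/csv')
```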
 



Issue Time Tracking
---

Worklog Id: (was: 348336)
Time Spent: 2h 10m  (was: 2h)

> Implement an S3 filesystem for Python SDK
> -
>
> Key: BEAM-2572
> URL: https://issues.apache.org/jira/browse/BEAM-2572
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-core
>Reporter: Dmitry Demeshchuk
>Priority: Minor
>  Labels: GSoC2019, gsoc, gsoc2019, mentor, outreachy19dec
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> There are two paths worth exploring, to my understanding:
> 1. Sticking to the HDFS-based approach (like it's done in Java).
> 2. Using boto/boto3 for accessing S3 through its common API endpoints.
> I personally prefer the second approach, for a few reasons:
> 1. In real life, HDFS and S3 have different consistency guarantees, therefore 
> their behaviors may contradict each other in some edge cases (say, we write 
> something to S3, but it's not immediately 

[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=348339&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348339
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 22/Nov/19 20:19
Start Date: 22/Nov/19 20:19
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r349778092
 
 

 ##
 File path: sdks/python/apache_beam/utils/timestamp.py
 ##
 @@ -136,14 +140,17 @@ def to_rfc3339(self):
     return self.to_utc_datetime().isoformat() + 'Z'
 
   def __float__(self):
+    # type: () -> float
 
 Review comment:
   It may, but I didn't test it.  
 



Issue Time Tracking
---

Worklog Id: (was: 348339)
Time Spent: 29h  (was: 28h 50m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 29h
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  





[jira] [Work logged] (BEAM-8803) Default behaviour for Python BQ Streaming inserts sink should be to retry always

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8803?focusedWorklogId=348338&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348338
 ]

ASF GitHub Bot logged work on BEAM-8803:


Author: ASF GitHub Bot
Created on: 22/Nov/19 20:19
Start Date: 22/Nov/19 20:19
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10195: [BEAM-8803] BigQuery 
Streaming Inserts are always retried by default.
URL: https://github.com/apache/beam/pull/10195#issuecomment-557679462
 
 
   r: @chamikaramj 
 



Issue Time Tracking
---

Worklog Id: (was: 348338)
Time Spent: 20m  (was: 10m)

> Default behaviour for Python BQ Streaming inserts sink should be to retry 
> always
> 
>
> Key: BEAM-8803
> URL: https://issues.apache.org/jira/browse/BEAM-8803
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=348340&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348340
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 22/Nov/19 20:20
Start Date: 22/Nov/19 20:20
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r349778574
 
 

 ##
 File path: sdks/python/apache_beam/transforms/window.py
 ##
 @@ -505,7 +527,8 @@ def get_window_coder(self):
     return coders.IntervalWindowCoder()
 
   def merge(self, merge_context):
-    to_merge = []
+    # type: (WindowFn.MergeContext) -> None
+    to_merge = []  # type: List[Union[IntervalWindow, GlobalWindow]]
 
 Review comment:
   Changing this and the other use of `Union[IntervalWindow, GlobalWindow]` to 
`BoundedWindow` results in these errors:
   
   ```
   apache_beam/transforms/window.py:541: error: "BoundedWindow" has no attribute "start"  [attr-defined]
   apache_beam/transforms/window.py:543: error: "BoundedWindow" has no attribute "start"  [attr-defined]
   apache_beam/transforms/window.py:550: error: "BoundedWindow" has no attribute "start"  [attr-defined]
   apache_beam/transforms/window.py:557: error: "BoundedWindow" has no attribute "start"  [attr-defined]
   ```
   
   The docs for `BoundedWindow` say that it's for timestamps in the range 
(-infinity, end).  
   
   How about we add a `start` property to `BoundedWindow`:
   
   ```python
   class BoundedWindow(object):
     """A window for timestamps in range (-infinity, end).
   
     Attributes:
       end: End of window.
     """
   
     def __init__(self, end):
       # type: (Union[int, float, Timestamp]) -> None
       self._end = Timestamp.of(end)
   
     @property
     def start(self):
       # type: () -> Timestamp
       return MIN_TIMESTAMP
   ```
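   To illustrate why declaring `start` on the base class should silence the `attr-defined` errors, here is a self-contained sketch. The classes are simplified stand-ins rather than Beam's real ones, timestamps are plain floats, `earliest_start` is a hypothetical helper, and it assumes the lower bound is the sensible `start` for a window that is unbounded on the left.

```python
from typing import List

MIN_TIMESTAMP = float('-inf')  # stand-in for Beam's MIN_TIMESTAMP constant


class BoundedWindow(object):
  """A window for timestamps in range (-infinity, end)."""

  def __init__(self, end):
    # type: (float) -> None
    self.end = end

  @property
  def start(self):
    # type: () -> float
    return MIN_TIMESTAMP


class IntervalWindow(BoundedWindow):
  """A window for timestamps in range [start, end)."""

  def __init__(self, start, end):
    # type: (float, float) -> None
    super(IntervalWindow, self).__init__(end)
    self._start = start

  @property
  def start(self):
    # type: () -> float
    return self._start


def earliest_start(windows):
  # type: (List[BoundedWindow]) -> float
  # Accessing .start type-checks because the base class declares it.
  return min(w.start for w in windows)
```

   With the property on the base class, a `List[BoundedWindow]` annotation is enough for code that reads `.start`, and subclasses override it where they have a real lower bound.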
   
   
   
 



Issue Time Tracking
---

Worklog Id: (was: 348340)
Time Spent: 29h 10m  (was: 29h)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 29h 10m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  





[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=348342&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348342
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 22/Nov/19 20:21
Start Date: 22/Nov/19 20:21
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r349778923
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/bundle_processor.py
 ##
 @@ -477,16 +581,22 @@ def __init__(
     self._key_coder = key_coder
     self._window_coder = window_coder
     self._timer_specs = timer_specs
-    self._timer_receivers = None
-    self._all_states = {}
+    self._timer_receivers = None  # type: Optional[Dict[str, operations.ConsumerSet]]
+    self._all_states = {}  # type: Dict[tuple, Union[SynchronousBagRuntimeState, SynchronousSetRuntimeState, CombiningValueRuntimeState]]
 
 Review comment:
   This is still unresolved.  Shall I add an abstract `commit` method to 
`AccumulatingRuntimeState`, and rename `_commit` to `commit` on the concrete 
implementations?
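
   A minimal sketch of the shape this proposal could take. The base class here is a simplified stand-in, `BagState` is a hypothetical concrete state, and `abc.ABC` assumes Python 3; none of this is the actual `bundle_processor.py` code.

```python
import abc


class AccumulatingRuntimeState(abc.ABC):
  """Simplified stand-in for the base class discussed above."""

  @abc.abstractmethod
  def commit(self):
    # type: () -> None
    """Flush pending changes; every concrete state must implement this."""


class BagState(AccumulatingRuntimeState):
  """Hypothetical concrete state used only for illustration."""

  def __init__(self):
    # type: () -> None
    self.pending = []  # type: list
    self.committed = []  # type: list

  def add(self, value):
    self.pending.append(value)

  def commit(self):
    # type: () -> None
    # Move everything that was buffered into the committed list.
    self.committed.extend(self.pending)
    self.pending = []
```

   If every state shares a public `commit`, the dictionary could plausibly be annotated with the base class alone instead of a three-way `Union`.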

 



Issue Time Tracking
---

Worklog Id: (was: 348342)
Time Spent: 29h 20m  (was: 29h 10m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 29h 20m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  





[jira] [Work logged] (BEAM-8619) Tear down the DoFns upon the control service termination in Java SDK harness

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8619?focusedWorklogId=348346&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348346
 ]

ASF GitHub Bot logged work on BEAM-8619:


Author: ASF GitHub Bot
Created on: 22/Nov/19 20:33
Start Date: 22/Nov/19 20:33
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10126: [BEAM-8619] 
Tear down the DoFns upon the control service termination …
URL: https://github.com/apache/beam/pull/10126#discussion_r349783029
 
 

 ##
 File path: 
runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/GaugeCell.java
 ##
 @@ -50,6 +50,12 @@ public GaugeCell(MetricName name) {
     this.name = name;
   }
 
+  @Override
+  public void reset() {
+    dirty.afterModification();
 
 Review comment:
   You're right, I was under the impression that it wasn't dirty initially so 
please ignore my comments about changing reset & dirty.
 



Issue Time Tracking
---

Worklog Id: (was: 348346)
Time Spent: 2h  (was: 1h 50m)

> Tear down the DoFns upon the control service termination in Java SDK harness
> 
>
> Key: BEAM-8619
> URL: https://issues.apache.org/jira/browse/BEAM-8619
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-harness
>Affects Versions: 2.18.0
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Per the discussion on the mailing list (details can be found in [1]), the 
> teardown of DoFns should be supported in the portability framework. It 
> happens in two places:
> 1) Upon the control service termination
> 2) Tear down the unused DoFns periodically
> The aim of this JIRA is to add support for tearing down the DoFns upon the 
> control service termination in the Java SDK harness.
> [1] 
> https://lists.apache.org/thread.html/0c4a4cf83cf2e35c3dfeb9d906e26cd82d3820968ba6f862f91739e4@%3Cdev.beam.apache.org%3E





[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=348349&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348349
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 22/Nov/19 20:39
Start Date: 22/Nov/19 20:39
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r349785083
 
 

 ##
 File path: sdks/python/apache_beam/utils/windowed_value.py
 ##
 @@ -249,10 +268,13 @@ class _IntervalWindowBase(object):
   """Optimized form of IntervalWindow storing only microseconds for endpoints.
   """
 
-  def __init__(self, start, end):
+  def __init__(self,
+               start,  # type: Optional[Union[int, float, Timestamp]]
 
 Review comment:
   Yeah, this code is a bit obtuse. 
   
   There is one place in the code that I can see None being passed, in 
`IntervalWindowCoderImpl`:
   
   ```python
     def decode_from_stream(self, in_, nested):
       # type: (create_InputStream, bool) -> IntervalWindow
       if not TYPE_CHECKING:
         global IntervalWindow
         if IntervalWindow is None:
           from apache_beam.transforms.window import IntervalWindow
       typed_value = IntervalWindow(None, None)
       typed_value._end_micros = (
           1000 * self._to_normal_time(in_.read_bigendian_uint64()))
       typed_value._start_micros = (
           typed_value._end_micros - 1000 * in_.read_var_int64())
       return typed_value
   ```
   
   The problem with the `None` code path is that calling almost any method on 
the resulting instance will result in `AttributeError` because neither 
`_start_micros` nor `_end_micros` is created.  You can see in the example 
above, those values are patched in, which works, but it's pretty hacky.
   
   It seems like this would be more straightforward:
   
   ```python
     def decode_from_stream(self, in_, nested):
       # type: (create_InputStream, bool) -> IntervalWindow
       if not TYPE_CHECKING:
         global IntervalWindow
         if IntervalWindow is None:
           from apache_beam.transforms.window import IntervalWindow
       end_micros = 1000 * self._to_normal_time(in_.read_bigendian_uint64())
       end = Timestamp(0, end_micros)
       start = Timestamp(0, end_micros - 1000 * in_.read_var_int64())
       typed_value = IntervalWindow(start, end)
       return typed_value
   ```
   
   
   
 



Issue Time Tracking
---

Worklog Id: (was: 348349)
Time Spent: 29.5h  (was: 29h 20m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 29.5h
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  





[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=348351&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348351
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 22/Nov/19 20:40
Start Date: 22/Nov/19 20:40
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r349785515
 
 

 ##
 File path: sdks/python/setup.py
 ##
 @@ -25,20 +25,52 @@
 import sys
 import warnings
 from distutils import log
+from distutils.errors import DistutilsError
 from distutils.version import StrictVersion
 
 # Pylint and isort disagree here.
 # pylint: disable=ungrouped-imports
 import setuptools
 from pkg_resources import DistributionNotFound
 from pkg_resources import get_distribution
+from pkg_resources import normalize_path
+from pkg_resources import to_filename
+from setuptools import Command
 from setuptools.command.build_py import build_py
-# TODO: (BEAM-8411): re-enable lint check.
-from setuptools.command.develop import develop  # pylint: disable-all
+from setuptools.command.develop import develop
 from setuptools.command.egg_info import egg_info
 from setuptools.command.test import test
 
 
+class mypy(Command):
+  user_options = []
+
+  def initialize_options(self):
+    """Abstract method that is required to be overwritten"""
+
+  def finalize_options(self):
+    """Abstract method that is required to be overwritten"""
+
+  def get_project_path(self):
+    self.run_command('egg_info')
+
+    # Build extensions in-place
+    self.reinitialize_command('build_ext', inplace=1)
+    self.run_command('build_ext')
+
+    ei_cmd = self.get_finalized_command("egg_info")
+
+    project_path = normalize_path(ei_cmd.egg_base)
+    return os.path.join(project_path, to_filename(ei_cmd.egg_name))
+
+  def run(self):
+    import subprocess
+    args = ['mypy', self.get_project_path()]
+    result = subprocess.call(args)
+    if result != 0:
+      raise DistutilsError()
 
 Review comment:
   fixed
 



Issue Time Tracking
---

Worklog Id: (was: 348351)
Time Spent: 29h 50m  (was: 29h 40m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 29h 50m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  





[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=348350&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348350
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 22/Nov/19 20:40
Start Date: 22/Nov/19 20:40
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r349785473
 
 

 ##
 File path: sdks/python/mypy.ini
 ##
 @@ -0,0 +1,60 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+[mypy]
+python_version = 3.6
+ignore_missing_imports = true
+follow_imports = true
+warn_no_return = true
+no_implicit_optional = true
+warn_redundant_casts = true
+warn_unused_ignores = true
+show_error_codes = true
+files = apache_beam
+color_output = true
+# uncomment this to see how close we are to being complete
+# check_untyped_defs = true
+
+[mypy-apache_beam.coders.proto2_coder_test_messages_pb2]
+ignore_errors = true
+
+[mypy-apache_beam.examples.*]
+ignore_errors = true
+
+[mypy-apache_beam.io.gcp.gcsfilesystem_test]
+# error: Cannot infer type of lambda  [misc]
+ignore_errors = true
+
+[mypy-apache_beam.io.gcp.internal.clients.storage.storage_v1_client]
+ignore_errors = true
+
+[mypy-apache_beam.io.gcp.internal.clients.bigquery.bigquery_v2_client]
+ignore_errors = true
+
+[mypy-apache_beam.portability.api.*]
+ignore_errors = true
+
+[mypy-apache_beam.runners.dataflow.internal.clients.dataflow.dataflow_v1b3_client]
+ignore_errors = true
+
+[mypy-apache_beam.typehints.typed_pipeline_test_py3]
+# error: Signature of "process" incompatible with supertype "DoFn"  [override]
+ignore_errors = true
+
+[mypy-apache_beam.typehints.typehints_test_py3]
+# error: Signature of "process" incompatible with supertype "DoFn"  [override]
+ignore_errors = true
 
 Review comment:
   fixed
 



Issue Time Tracking
---

Worklog Id: (was: 348350)
Time Spent: 29h 40m  (was: 29.5h)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 29h 40m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  





[jira] [Work logged] (BEAM-8619) Tear down the DoFns upon the control service termination in Java SDK harness

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8619?focusedWorklogId=348352&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348352
 ]

ASF GitHub Bot logged work on BEAM-8619:


Author: ASF GitHub Bot
Created on: 22/Nov/19 20:42
Start Date: 22/Nov/19 20:42
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10126: [BEAM-8619] 
Tear down the DoFns upon the control service termination …
URL: https://github.com/apache/beam/pull/10126#discussion_r349783828
 
 

 ##
 File path: runners/core-java/src/test/java/org/apache/beam/runners/core/metrics/ExecutionStateTrackerTest.java
 ##
 @@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.core.metrics;
+
+import static org.hamcrest.Matchers.equalTo;
+import static org.junit.Assert.assertThat;
+import static org.mockito.Mockito.mock;
+
+import java.io.Closeable;
+import java.util.concurrent.TimeUnit;
+import org.apache.beam.runners.core.metrics.ExecutionStateTracker.ExecutionState;
+import org.joda.time.DateTimeUtils.MillisProvider;
+import org.junit.Before;
+import org.junit.Test;
+
+/** Tests for {@link ExecutionStateTracker}. */
+public class ExecutionStateTrackerTest {
+
+  private MillisProvider clock;
+  private ExecutionStateSampler sampler;
+
+  @Before
+  public void setUp() {
+    clock = mock(MillisProvider.class);
+    sampler = ExecutionStateSampler.newForTest(clock);
+  }
+
+  private static class TestExecutionState extends ExecutionState {
+
+    private long totalMillis = 0;
+
+    public TestExecutionState(String stateName) {
+      super(stateName);
+    }
+
+    @Override
+    public void takeSample(long millisSinceLastSample) {
+      totalMillis += millisSinceLastSample;
+    }
+
+    @Override
+    public void reportLull(Thread trackedThread, long millis) {}
+  }
+
+  private final TestExecutionState testExecutionState = new TestExecutionState("activity");
+
+  @Test
+  public void testReset() throws Exception {
+    ExecutionStateTracker tracker = createTracker();
+    try (Closeable c1 = tracker.activate(new Thread())) {
 
 Review comment:
   ```suggestion
   try (Closeable c1 = tracker.activate(Thread.currentThread())) {
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 348352)
Time Spent: 2h 10m  (was: 2h)

> Tear down the DoFns upon the control service termination in Java SDK harness
> 
>
> Key: BEAM-8619
> URL: https://issues.apache.org/jira/browse/BEAM-8619
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-harness
>Affects Versions: 2.18.0
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Per the discussion on the mailing list (details in [1]), the teardown of 
> DoFns should be supported in the portability framework. It happens in two 
> places:
> 1) Upon the control service termination
> 2) Tear down the unused DoFns periodically
> The aim of this JIRA is to add support for tearing down the DoFns upon the 
> control service termination in the Java SDK harness.
> [1] 
> https://lists.apache.org/thread.html/0c4a4cf83cf2e35c3dfeb9d906e26cd82d3820968ba6f862f91739e4@%3Cdev.beam.apache.org%3E



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8619) Tear down the DoFns upon the control service termination in Java SDK harness

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8619?focusedWorklogId=348353&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348353
 ]

ASF GitHub Bot logged work on BEAM-8619:


Author: ASF GitHub Bot
Created on: 22/Nov/19 20:42
Start Date: 22/Nov/19 20:42
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10126: [BEAM-8619] Tear 
down the DoFns upon the control service termination …
URL: https://github.com/apache/beam/pull/10126#issuecomment-557686663
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 348353)
Time Spent: 2h 20m  (was: 2h 10m)

> Tear down the DoFns upon the control service termination in Java SDK harness
> 
>
> Key: BEAM-8619
> URL: https://issues.apache.org/jira/browse/BEAM-8619
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-harness
>Affects Versions: 2.18.0
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Per the discussion on the mailing list (details in [1]), the teardown of 
> DoFns should be supported in the portability framework. It happens in two 
> places:
> 1) Upon the control service termination
> 2) Tear down the unused DoFns periodically
> The aim of this JIRA is to add support for tearing down the DoFns upon the 
> control service termination in the Java SDK harness.
> [1] 
> https://lists.apache.org/thread.html/0c4a4cf83cf2e35c3dfeb9d906e26cd82d3820968ba6f862f91739e4@%3Cdev.beam.apache.org%3E



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8733) The "KeyError: u'-47'" error from line 305 of sdk_worker.py

2019-11-22 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980494#comment-16980494
 ] 

Luke Cwik commented on BEAM-8733:
-

You're not wrong, the Java implementation is also bad and makes registration more 
complicated than it needs to be, since process bundle would never need to block 
if the descriptor doesn't exist and could error out immediately.

 

All the portable Beam SDKs should do the registration on the request thread, 
rather than pushing it to another thread, so that the register and process 
requests can be pipelined.
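
To illustrate the point above, here is a minimal Python sketch of doing the 
registration on the request thread. It is not the actual sdk_worker.py code: 
the names BundleDescriptorRegistry, UnknownDescriptorError and handle_requests 
are hypothetical, and requests are modelled as plain tuples. Because register 
completes before the next request is read, a later process bundle request can 
either find its descriptor or fail immediately, without blocking.

```python
import threading


class UnknownDescriptorError(Exception):
    """Raised when a process bundle request names an unregistered descriptor."""


class BundleDescriptorRegistry(object):
    """Hypothetical registry of bundle descriptors, keyed by descriptor id."""

    def __init__(self):
        self._lock = threading.Lock()
        self._descriptors = {}

    def register(self, descriptor_id, descriptor):
        # Called directly on the thread that received the register request,
        # so the descriptor is stored before the next request is handled.
        with self._lock:
            self._descriptors[descriptor_id] = descriptor

    def get(self, descriptor_id):
        # No waiting: an unknown id is reported as an error right away.
        with self._lock:
            if descriptor_id not in self._descriptors:
                raise UnknownDescriptorError(
                    "Unknown bundle descriptor id: %r" % descriptor_id)
            return self._descriptors[descriptor_id]


def handle_requests(registry, requests):
    """Handles (kind, descriptor_id, payload) tuples in arrival order."""
    results = []
    for kind, descriptor_id, payload in requests:
        if kind == "register":
            registry.register(descriptor_id, payload)
            results.append("registered")
        elif kind == "process_bundle":
            try:
                registry.get(descriptor_id)
                results.append("processed")
            except UnknownDescriptorError:
                results.append("error")
        else:
            raise ValueError("Unexpected request kind: %r" % kind)
    return results
```

With this ordering, a register request followed by a process bundle request 
for the same id always succeeds, and a process bundle request for an id that 
was never registered fails fast instead of raising a bare KeyError later.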

> The "KeyError: u'-47'" error from line 305 of sdk_worker.py
> ---
>
> Key: BEAM-8733
> URL: https://issues.apache.org/jira/browse/BEAM-8733
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.18.0
>
>
> The issue reported by [~chamikara], error message as follows:
> apache_beam/runners/worker/sdk_worker.py", line 305, in get
> self.fns[bundle_descriptor_id],
> KeyError: u'-47'
> {code}
> at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
> at org.apache.beam.sdk.util.MoreFutures.get(MoreFutures.java:57)
> at org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation.finish(RegisterAndProcessBundleOperation.java:330)
> at org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:85)
> at org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor.execute(BeamFnMapTaskExecutor.java:125)
> at org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411)
> at org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380)
> at org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305)
> at org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.start(DataflowRunnerHarness.java:195)
> at org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.main(DataflowRunnerHarness.java:123)
> Suppressed: java.lang.IllegalStateException: Already closed.
>   at org.apache.beam.sdk.fn.data.BeamFnDataBufferingOutboundObserver.close(BeamFnDataBufferingOutboundObserver.java:93)
>   at org.apache.beam.runners.dataflow.worker.fn.data.RemoteGrpcPortWriteOperation.abort(RemoteGrpcPortWriteOperation.java:220)
>   at org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:91)
> {code}
> More discussion info can be found here: 
> https://github.com/apache/beam/pull/10004



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8791) [2.17.0 Release Validation] Run Python PreCommit times out

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8791?focusedWorklogId=348359&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348359
 ]

ASF GitHub Bot logged work on BEAM-8791:


Author: ASF GitHub Bot
Created on: 22/Nov/19 20:49
Start Date: 22/Nov/19 20:49
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on pull request #10197: [BEAM-8791] 
Cherry-pick PR # 9985 to 2.17.0 release branch to reduce precommit times.
URL: https://github.com/apache/beam/pull/10197
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 348359)
Time Spent: 0.5h  (was: 20m)

> [2.17.0 Release Validation] Run Python PreCommit times out
> --
>
> Key: BEAM-8791
> URL: https://issues.apache.org/jira/browse/BEAM-8791
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Mikhail Gryzykhin
>Assignee: Valentyn Tymofieiev
>Priority: Blocker
> Fix For: 2.17.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Code version: 
> [https://github.com/apache/beam/pull/9884/commits/d52355c9712b5ed85900a941f50317c5ab9252cb]
> [Job: 
> https://builds.apache.org/job/beam_PreCommit_Python_Phrase/1051/|https://builds.apache.org/job/beam_PreCommit_Python_Phrase/1051/]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=348360&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348360
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 22/Nov/19 20:49
Start Date: 22/Nov/19 20:49
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on pull request #10159: 
[BEAM-8575] Added a unit test to CombineTest class to test that Combi…
URL: https://github.com/apache/beam/pull/10159#discussion_r349788653
 
 

 ##
 File path: sdks/python/apache_beam/transforms/combiners_test.py
 ##
 @@ -393,6 +398,18 @@ def test_global_fanout(self):
   | beam.CombineGlobally(combine.MeanCombineFn()).with_fanout(11))
   assert_that(result, equal_to([49.5]))
 
+  @attr('ValidatesRunner')
+  def test_hot_key_combining_with_accumulation_mode(self):
+    with TestPipeline() as p:
+      result = (p
+                | beam.Create([1, 2, 3, 4, 5])
+                | beam.WindowInto(GlobalWindows(),
+                                  trigger=Repeatedly(AfterCount(1)),
 
 Review comment:
   I think the same as you. We need at least Timestamped values.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 348360)
Time Spent: 19h 50m  (was: 19h 40m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 19h 50m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8811) Upgrade Beam pipeline diagrams in docs

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8811?focusedWorklogId=348363&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348363
 ]

ASF GitHub Bot logged work on BEAM-8811:


Author: ASF GitHub Bot
Created on: 22/Nov/19 21:02
Start Date: 22/Nov/19 21:02
Worklog Time Spent: 10m 
  Work Description: soyrice commented on issue #10200: [BEAM-8811] Upgrade 
Beam pipeline diagrams in docs
URL: https://github.com/apache/beam/pull/10200#issuecomment-557693079
 
 
   Staging links:
   
http://apache-beam-website-pull-requests.storage.googleapis.com/10200/get-started/wordcount-example/index.html
   
http://apache-beam-website-pull-requests.storage.googleapis.com/10200/documentation/programming-guide/index.html
   
http://apache-beam-website-pull-requests.storage.googleapis.com/10200/documentation/pipelines/design-your-pipeline/index.html
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 348363)
Remaining Estimate: 0h
Time Spent: 10m

> Upgrade Beam pipeline diagrams in docs
> --
>
> Key: BEAM-8811
> URL: https://issues.apache.org/jira/browse/BEAM-8811
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8805) Remove obsolete worker_threads experiment in tests

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8805?focusedWorklogId=348367&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348367
 ]

ASF GitHub Bot logged work on BEAM-8805:


Author: ASF GitHub Bot
Created on: 22/Nov/19 21:10
Start Date: 22/Nov/19 21:10
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10193: [BEAM-8805] Remove 
obsolete worker_threads experiment in tests
URL: https://github.com/apache/beam/pull/10193#issuecomment-557695714
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 348367)
Time Spent: 20m  (was: 10m)

> Remove obsolete worker_threads experiment in tests
> --
>
> Key: BEAM-8805
> URL: https://issues.apache.org/jira/browse/BEAM-8805
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> As of https://github.com/apache/beam/pull/10123 the worker_threads experiment 
> is obsolete and should be removed from our test scripts.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8805) Remove obsolete worker_threads experiment in tests

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8805?focusedWorklogId=348368&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348368
 ]

ASF GitHub Bot logged work on BEAM-8805:


Author: ASF GitHub Bot
Created on: 22/Nov/19 21:11
Start Date: 22/Nov/19 21:11
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10193: [BEAM-8805] Remove 
obsolete worker_threads experiment in tests
URL: https://github.com/apache/beam/pull/10193#issuecomment-557695781
 
 
   Thanks for the clean-up.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 348368)
Time Spent: 0.5h  (was: 20m)

> Remove obsolete worker_threads experiment in tests
> --
>
> Key: BEAM-8805
> URL: https://issues.apache.org/jira/browse/BEAM-8805
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> As of https://github.com/apache/beam/pull/10123 the worker_threads experiment 
> is obsolete and should be removed from our test scripts.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=348376&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348376
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 22/Nov/19 21:42
Start Date: 22/Nov/19 21:42
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r349806381
 
 

 ##
 File path: sdks/python/apache_beam/utils/timestamp.py
 ##
 @@ -57,6 +60,7 @@ def __init__(self, seconds=0, micros=0):
 
   @staticmethod
   def of(seconds):
+    # type: (Union[int, float, Timestamp]) -> Timestamp
 
 Review comment:
   done
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 348376)
Time Spent: 30h  (was: 29h 50m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 30h
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  
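
For readers unfamiliar with the comment-based form of PEP 484 hints used in 
the diff above, here is a small self-contained sketch. It is illustrative 
only: the Duration class and the window_start function below are simplified 
stand-ins written for this example, not the Beam implementations. It shows 
both the single-line signature comment and the per-argument style.

```python
from typing import Union


class Duration(object):
    """Simplified stand-in for a duration stored in microseconds."""

    def __init__(self, seconds=0, micros=0):
        # type: (Union[int, float], int) -> None
        self.micros = int(seconds * 1000000) + int(micros)

    @staticmethod
    def of(seconds):
        # type: (Union[int, float, Duration]) -> Duration
        # Single-line form: argument types in parentheses, then the return type.
        if isinstance(seconds, Duration):
            return seconds
        return Duration(seconds)


def window_start(timestamp_micros,  # type: int
                 size,  # type: Union[int, float, Duration]
                 offset=0  # type: Union[int, float, Duration]
                ):
    # type: (...) -> int
    # Per-argument form: each parameter carries its own comment, and the
    # signature comment uses "..." in place of the argument list.
    size_micros = Duration.of(size).micros
    offset_micros = Duration.of(offset).micros
    return timestamp_micros - (timestamp_micros - offset_micros) % size_micros
```

A static analyzer such as mypy reads these comments the same way it reads 
inline annotations, so the code stays valid Python 2 syntax while still being 
checkable.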



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=348377&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348377
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 22/Nov/19 21:42
Start Date: 22/Nov/19 21:42
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r349806418
 
 

 ##
 File path: sdks/python/apache_beam/transforms/window.py
 ##
 @@ -367,7 +381,10 @@ class FixedWindows(NonMergingWindowFn):
   range.
   """
 
-  def __init__(self, size, offset=0):
+  def __init__(self,
+               size,  # type: Union[int, float, Duration]
 
 Review comment:
   done
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 348377)
Time Spent: 30h 10m  (was: 30h)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 30h 10m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-4776) Java PortableRunner should support metrics

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4776?focusedWorklogId=348375&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348375
 ]

ASF GitHub Bot logged work on BEAM-4776:


Author: ASF GitHub Bot
Created on: 22/Nov/19 21:42
Start Date: 22/Nov/19 21:42
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #10105: [BEAM-4776] Add 
metrics support to Java PortableRunner
URL: https://github.com/apache/beam/pull/10105#issuecomment-557705225
 
 
   Run Java Spark PortableValidatesRunner Batch
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 348375)
Time Spent: 7h  (was: 6h 50m)

> Java PortableRunner should support metrics
> --
>
> Key: BEAM-4776
> URL: https://issues.apache.org/jira/browse/BEAM-4776
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Eugene Kirpichov
>Assignee: Michal Walenia
>Priority: Major
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> BEAM-4775 concerns adding metrics to the JobService API; the current issue is 
> about making PortableRunner understand them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=348378&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-348378
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 22/Nov/19 21:43
Start Date: 22/Nov/19 21:43
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r349806569
 
 

 ##
 File path: sdks/python/apache_beam/runners/runner.py
 ##
 @@ -133,7 +152,10 @@ def run_async(self, transform, options=None):
   transform(PBegin(p))
 return p.run()
 
-  def run_pipeline(self, pipeline, options):
+  def run_pipeline(self,
+                   pipeline,  # type: Pipeline
+                   options  # type: PipelineOptions
+                  ):
 
 Review comment:
   done
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 348378)
Time Spent: 30h 20m  (was: 30h 10m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 30h 20m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8569) Support Hadoop 3 on Beam

2019-11-22 Thread Tomo Suzuki (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980532#comment-16980532
 ] 

Tomo Suzuki commented on BEAM-8569:
---

I was trying to upgrade bigtable-client-core to the latest version, 1.12.1 
([PR|https://github.com/apache/beam/pull/10144]), but it conflicts with 
Hadoop version 2:

Conflict:
- Bigtable-client-core 1.12.1 needs Guava version >= 23.1 for 
{{ImmutableList.builderWithExpectedSize}} method 
([note|https://github.com/GoogleCloudPlatform/cloud-opensource-java/issues/1028#issuecomment-557633483])
- Hadoop-client 2.7.3 needs Guava version <= 20 for {{Objects.toStringHelper}} 
method 
([note|https://github.com/GoogleCloudPlatform/cloud-opensource-java/issues/1028#issuecomment-557697439])

I'm looking forward to this Hadoop upgrade!
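
The conflict can be checked mechanically. The sketch below is only an 
illustration: the candidate Guava versions listed are example values chosen 
for this note, and the helper names parse_version and satisfying are 
hypothetical. It applies the two constraints stated above (>= 23.1 for 
bigtable-client-core 1.12.1, <= 20 for hadoop-client 2.7.3) and shows that no 
candidate satisfies both.

```python
def parse_version(text, width=3):
    """Turns '23.1' into (23, 1, 0) so versions compare numerically."""
    parts = [int(part) for part in text.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def satisfying(candidates, minimum=None, maximum=None):
    """Returns the candidates within the inclusive [minimum, maximum] range."""
    result = []
    for candidate in candidates:
        version = parse_version(candidate)
        if minimum is not None and version < parse_version(minimum):
            continue
        if maximum is not None and version > parse_version(maximum):
            continue
        result.append(candidate)
    return result


# Example candidate versions (assumed for illustration).
candidates = ["19.0", "20.0", "23.1", "25.1"]

# Constraint from bigtable-client-core 1.12.1: Guava >= 23.1.
bigtable_ok = set(satisfying(candidates, minimum="23.1"))
# Constraint from hadoop-client 2.7.3: Guava <= 20.
hadoop_ok = set(satisfying(candidates, maximum="20"))

# The intersection is empty, which is the conflict described above.
both = bigtable_ok & hadoop_ok
```

Since the lower bound (23.1) is above the upper bound (20), the intersection 
is empty for any candidate list, not just this one.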


> Support Hadoop 3 on Beam
> 
>
> Key: BEAM-8569
> URL: https://issues.apache.org/jira/browse/BEAM-8569
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-hadoop-file-system, io-java-hadoop-format, 
> runner-spark
>Reporter: Ismaël Mejía
>Priority: Minor
>
> It seems that Hadoop 3 in production is finally happening. CDH supports it in 
> their latest version and Spark 3 will include support for Hadoop 3 too.
> This is an uber ticket to cover the required changes to the codebase to 
> ensure compliance with Hadoop 3.x
> Hadoop dependencies in Beam are mostly provided and the APIs are compatible up 
> to a point, but we may require changes in the CI to test that new changes 
> work both in Hadoop 2 and Hadoop 3 until we decide to remove support for 
> Hadoop 2.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

