[jira] [Work logged] (BEAM-7970) Regenerate Go SDK proto files in correct version

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7970?focusedWorklogId=307635=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307635
 ]

ASF GitHub Bot logged work on BEAM-7970:


Author: ASF GitHub Bot
Created on: 06/Sep/19 05:38
Start Date: 06/Sep/19 05:38
Worklog Time Spent: 10m 
  Work Description: youngoli commented on pull request #9367: [BEAM-7970] 
Regenerate Go SDK protos in correct version.
URL: https://github.com/apache/beam/pull/9367
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307635)
Time Spent: 0.5h  (was: 20m)

> Regenerate Go SDK proto files in correct version
> 
>
> Key: BEAM-7970
> URL: https://issues.apache.org/jira/browse/BEAM-7970
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Generated proto files in the Go SDK currently include this bit:
> {{// This is a compile-time assertion to ensure that this generated file}}
> {{// is compatible with the proto package it is being compiled against.}}
> {{// A compilation error at this line likely means your copy of the}}
> {{// proto package needs to be updated.}}
> {{const _ = proto.ProtoPackageIsVersion2 // please upgrade the proto package}}
>  
> This indicates that the protos are being generated as proto v2 for whatever 
> reason. Most likely, as mentioned in this post by someone with a similar 
> issue, it is because the proto generation binary needs to be rebuilt before 
> generating the files again: 
> [https://github.com/golang/protobuf/issues/449#issuecomment-340884839]
> This hasn't caused any errors so far, but might eventually cause errors if we 
> hit version differences between the v2 and v3 protos.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7970) Regenerate Go SDK proto files in correct version

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7970?focusedWorklogId=307634=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307634
 ]

ASF GitHub Bot logged work on BEAM-7970:


Author: ASF GitHub Bot
Created on: 06/Sep/19 05:38
Start Date: 06/Sep/19 05:38
Worklog Time Spent: 10m 
  Work Description: youngoli commented on issue #9367: [BEAM-7970] 
Regenerate Go SDK protos in correct version.
URL: https://github.com/apache/beam/pull/9367#issuecomment-528716476
 
 
   Closing because it turns out attempting to regen protos with an updated 
version of the proto compiler also requires updating a bunch of the Go SDK's 
dependencies, which is a huge can of worms.
   
   And it's mostly unnecessary anyway. The real way to avoid issues is to use 
an older version of the proto compiler, so the change that really needs to be 
made is updating the documentation to mention which version to use.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307634)
Time Spent: 20m  (was: 10m)

> Regenerate Go SDK proto files in correct version
> 
>
> Key: BEAM-7970
> URL: https://issues.apache.org/jira/browse/BEAM-7970
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Generated proto files in the Go SDK currently include this bit:
> {{// This is a compile-time assertion to ensure that this generated file}}
> {{// is compatible with the proto package it is being compiled against.}}
> {{// A compilation error at this line likely means your copy of the}}
> {{// proto package needs to be updated.}}
> {{const _ = proto.ProtoPackageIsVersion2 // please upgrade the proto package}}
>  
> This indicates that the protos are being generated as proto v2 for whatever 
> reason. Most likely, as mentioned in this post by someone with a similar 
> issue, it is because the proto generation binary needs to be rebuilt before 
> generating the files again: 
> [https://github.com/golang/protobuf/issues/449#issuecomment-340884839]
> This hasn't caused any errors so far, but might eventually cause errors if we 
> hit version differences between the v2 and v3 protos.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (BEAM-3165) Mongo document read with non hex objectid

2019-09-05 Thread Maxim Ermilov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-3165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923932#comment-16923932
 ] 

Maxim Ermilov commented on BEAM-3165:
-

Python's apache_beam.io.mongodbio has the same bug.

 

From [https://docs.mongodb.com/manual/core/document/#the-id-field]

"The {{_id}} field may contain values of any [BSON data 
type|https://docs.mongodb.com/manual/reference/bson-types/], other than an 
array."

So the code should not assume that _id is an 
[ObjectId|https://docs.mongodb.com/manual/reference/bson-types/#objectid].
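To make the point concrete, here is a small, hypothetical Java sketch (the class name and the sample JSON are mine, not from the issue) using the same org.bson classes that appear in the stack trace below. It shows that a string _id is fine when read as a generic value, and that the reported IllegalArgumentException only appears when the JSON forces an ObjectId conversion on a non-hex value:

{code:java}
// Hypothetical sketch, not from the issue: read _id as a generic BSON value
// rather than forcing it into org.bson.types.ObjectId.
import org.bson.Document;

public class NonHexIdSketch {
  public static void main(String[] args) {
    // A string _id parses fine when left untyped...
    Document doc = Document.parse("{\"_id\": \"somestring\", \"value\": 1}");
    Object id = doc.get("_id");          // may be String, ObjectId, number, ...
    System.out.println(id.getClass());   // -> class java.lang.String

    // ...the IllegalArgumentException in the stack trace below comes from JSON
    // that wraps a non-hex value in the ObjectId(...) constructor form, e.g.:
    // Document.parse("{\"_id\": ObjectId(\"somestring\")}");  // throws
  }
}
{code}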

> Mongo document read with non hex objectid
> -
>
> Key: BEAM-3165
> URL: https://issues.apache.org/jira/browse/BEAM-3165
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-mongodb
>Affects Versions: 2.1.0
>Reporter: Utkarsh Sopan
>Priority: Major
>
> I have a Mongo collection which has non-hex '_id' values in the form of strings.
> I can't read them into a PCollection; I get the following exception:
> Exception in thread "main" java.lang.IllegalArgumentException: invalid 
> hexadecimal representation of an ObjectId: [somestring]
>   at org.bson.types.ObjectId.parseHexString(ObjectId.java:523)
>   at org.bson.types.ObjectId.<init>(ObjectId.java:237)
>   at 
> org.bson.json.JsonReader.visitObjectIdConstructor(JsonReader.java:674)
>   at org.bson.json.JsonReader.readBsonType(JsonReader.java:197)
>   at org.bson.codecs.DocumentCodec.decode(DocumentCodec.java:139)
>   at org.bson.codecs.DocumentCodec.decode(DocumentCodec.java:45)
>   at org.bson.codecs.configuration.LazyCodec.decode(LazyCodec.java:47)
>   at org.bson.codecs.DocumentCodec.readValue(DocumentCodec.java:215)
>   at org.bson.codecs.DocumentCodec.decode(DocumentCodec.java:141)
>   at org.bson.codecs.DocumentCodec.decode(DocumentCodec.java:45)
>   at org.bson.codecs.DocumentCodec.readValue(DocumentCodec.java:215)
>   at org.bson.codecs.DocumentCodec.readList(DocumentCodec.java:222)
>   at org.bson.codecs.DocumentCodec.readValue(DocumentCodec.java:208)
>   at org.bson.codecs.DocumentCodec.decode(DocumentCodec.java:141)
>   at org.bson.codecs.DocumentCodec.decode(DocumentCodec.java:45)
>   at org.bson.Document.parse(Document.java:105)
>   at org.bson.Document.parse(Document.java:90)
>   at 
> org.apache.beam.sdk.io.mongodb.MongoDbIO$BoundedMongoDbReader.start(MongoDbIO.java:472)
>   at 
> org.apache.beam.runners.direct.BoundedReadEvaluatorFactory$BoundedReadEvaluator.processElement(BoundedReadEvaluatorFactory.java:141)
>   at 
> org.apache.beam.runners.direct.TransformExecutor.processElements(TransformExecutor.java:146)
>   at 
> org.apache.beam.runners.direct.TransformExecutor.run(TransformExecutor.java:110)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Comment Edited] (BEAM-8036) [beam_PostCommit_SQL] [DataCatalogBigQueryIT > testReadWrite] No such method

2019-09-05 Thread Rui Wang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923930#comment-16923930
 ] 

Rui Wang edited comment on BEAM-8036 at 9/6/19 5:11 AM:


I believe the root cause is from 
[#9158|https://github.com/apache/beam/pull/9158] as 
[#9210|https://github.com/apache/beam/pull/9210] is directly built from it. 


was (Author: amaliujia):
I believe the root cause is from #9158 as #9210 is directly built from it. 

> [beam_PostCommit_SQL] [DataCatalogBigQueryIT > testReadWrite] No such method
> 
>
> Key: BEAM-8036
> URL: https://issues.apache.org/jira/browse/BEAM-8036
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Mikhail Gryzykhin
>Assignee: Rui Wang
>Priority: Major
>  Labels: currently-failing
> Fix For: Not applicable
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> _Use this form to file an issue for test failure:_
>  * [Jenkins Job|https://builds.apache.org/job/beam_PostCommit_SQL/2417/console]
>  * [Gradle Build Scan|TODO]
>  * [Test source code|TODO]
> Initial investigation:
> *09:03:27* > Task :sdks:java:extensions:sql:datacatalog:integrationTest
> *09:03:27* org.apache.beam.sdk.extensions.sql.meta.provider.datacatalog.DataCatalogBigQueryIT > testReadWrite FAILED
> *09:03:27* java.lang.NoSuchMethodError at DataCatalogBigQueryIT.java:69
> *09:03:27* 1 test completed, 1 failed
> *09:03:28* > Task :sdks:java:extensions:sql:datacatalog:integrationTest FAILED
> *09:03:28* FAILURE: Build failed with an exception.
>  
> 
> _After you've filled out the above details, please [assign the issue to an 
> individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist].
>  Assignee should [treat test failures as 
> high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test],
>  helping to fix the issue or find a more appropriate owner. See [Apache Beam 
> Post-Commit 
> Policies|https://beam.apache.org/contribute/postcommits-policies]._



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (BEAM-8036) [beam_PostCommit_SQL] [DataCatalogBigQueryIT > testReadWrite] No such method

2019-09-05 Thread Rui Wang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923930#comment-16923930
 ] 

Rui Wang commented on BEAM-8036:


I believe the root cause is from #9158 as #9210 is directly built from it. 

> [beam_PostCommit_SQL] [DataCatalogBigQueryIT > testReadWrite] No such method
> 
>
> Key: BEAM-8036
> URL: https://issues.apache.org/jira/browse/BEAM-8036
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Mikhail Gryzykhin
>Assignee: Rui Wang
>Priority: Major
>  Labels: currently-failing
> Fix For: Not applicable
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> _Use this form to file an issue for test failure:_
>  * [Jenkins Job|https://builds.apache.org/job/beam_PostCommit_SQL/2417/console]
>  * [Gradle Build Scan|TODO]
>  * [Test source code|TODO]
> Initial investigation:
> *09:03:27* > Task :sdks:java:extensions:sql:datacatalog:integrationTest
> *09:03:27* org.apache.beam.sdk.extensions.sql.meta.provider.datacatalog.DataCatalogBigQueryIT > testReadWrite FAILED
> *09:03:27* java.lang.NoSuchMethodError at DataCatalogBigQueryIT.java:69
> *09:03:27* 1 test completed, 1 failed
> *09:03:28* > Task :sdks:java:extensions:sql:datacatalog:integrationTest FAILED
> *09:03:28* FAILURE: Build failed with an exception.
>  
> 
> _After you've filled out the above details, please [assign the issue to an 
> individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist].
>  Assignee should [treat test failures as 
> high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test],
>  helping to fix the issue or find a more appropriate owner. See [Apache Beam 
> Post-Commit 
> Policies|https://beam.apache.org/contribute/postcommits-policies]._



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7947) Improves the interfaces of classes such as FnDataService, BundleProcessor, ActiveBundle, etc to change the parameter type from WindowedValue to T

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7947?focusedWorklogId=307626=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307626
 ]

ASF GitHub Bot logged work on BEAM-7947:


Author: ASF GitHub Bot
Created on: 06/Sep/19 05:07
Start Date: 06/Sep/19 05:07
Worklog Time Spent: 10m 
  Work Description: sunjincheng121 commented on issue #9496: [BEAM-7947] 
Improves the interfaces of classes such as FnDataService,…
URL: https://github.com/apache/beam/pull/9496#issuecomment-528710018
 
 
   R: @kennknowles @robertwb @lukecwik 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307626)
Time Spent: 20m  (was: 10m)

> Improves the interfaces of classes such as FnDataService, BundleProcessor, 
> ActiveBundle, etc to change the parameter type from WindowedValue to T
> 
>
> Key: BEAM-7947
> URL: https://issues.apache.org/jira/browse/BEAM-7947
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Both `Coder<WindowedValue<T>>` and `FnDataReceiver<WindowedValue<T>>` use 
> `WindowedValue<T>` as the data structure that both sides, the Runner and the SDK 
> Harness, agree on. Control Plane/Data Plane/State Plane/Logging is a 
> high-level abstraction; parts of it, such as the Control Plane and Logging, are 
> common requirements for all multi-language platforms. For example, the Flink 
> community is also discussing how to support Python UDFs, as well as how to 
> deal with the docker environment, how to transfer data, how to access state, how 
> to do logging, etc. If Beam can further abstract these service interfaces, i.e., 
> make the interface definitions compatible with multiple engines, and finally 
> provide them to other projects in the form of class libraries, it will definitely 
> help other platforms that want to support multiple languages. Here I am 
> throwing out a minnow to catch a whale: take the FnDataService#receive interface 
> as an example, and turn `WindowedValue<T>` into `T` so that other platforms 
> can extend it arbitrarily, as follows:
> {code}
> <T> InboundDataClient receive(LogicalEndpoint inputLocation, Coder<T> coder,
>     FnDataReceiver<T> listener);
> {code}
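For contrast, here is a hedged sketch of the current shape of the interface next to the proposed one. The generic parameters and import paths are reconstructed from the issue text and the Beam code layout of the time, not quoted verbatim from the source tree:

{code:java}
// Sketch only: types reconstructed from the issue text, not verbatim Beam source.
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.fn.data.FnDataReceiver;
import org.apache.beam.sdk.fn.data.InboundDataClient;
import org.apache.beam.sdk.fn.data.LogicalEndpoint;
import org.apache.beam.sdk.util.WindowedValue;

interface FnDataServiceToday {
  // Current shape: the element type is pinned to WindowedValue<T>.
  <T> InboundDataClient receive(
      LogicalEndpoint inputLocation,
      Coder<WindowedValue<T>> coder,
      FnDataReceiver<WindowedValue<T>> listener);
}

interface FnDataServiceProposed {
  // Proposed shape: the caller chooses the element type directly.
  <T> InboundDataClient receive(
      LogicalEndpoint inputLocation,
      Coder<T> coder,
      FnDataReceiver<T> listener);
}
{code}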



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-8151) Allow the Python SDK to use many many threads

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8151?focusedWorklogId=307614=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307614
 ]

ASF GitHub Bot logged work on BEAM-8151:


Author: ASF GitHub Bot
Created on: 06/Sep/19 04:24
Start Date: 06/Sep/19 04:24
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #9477: [BEAM-8151, 
BEAM-7848] Up the max number of threads inside the SDK harness to a default of 
10k
URL: https://github.com/apache/beam/pull/9477#issuecomment-528702292
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307614)
Time Spent: 4h 40m  (was: 4.5h)

> Allow the Python SDK to use many many threads
> -
>
> Key: BEAM-8151
> URL: https://issues.apache.org/jira/browse/BEAM-8151
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> We need to use a thread pool which shrinks the number of active threads when 
> they are not being used.
>  
> This is to prevent any stuckness issues related to a runner scheduling more 
> work items than there are "work" threads inside the SDK harness.
>  
> By default, the control plane should have all "requests" processed in 
> parallel, and the runner is responsible for not overloading the SDK with too 
> much work.
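The "shrinking" behavior described above can be illustrated with Java's standard ThreadPoolExecutor (a sketch only; the actual change targets the Python SDK harness, and the 10k cap simply mirrors the default mentioned in the PR title):

{code:java}
// Analogy in Java, not the actual change: a pool that grows under load up to a
// cap and tears idle threads down afterwards.
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ShrinkingPoolSketch {
  public static void main(String[] args) throws InterruptedException {
    ThreadPoolExecutor pool =
        new ThreadPoolExecutor(
            0,                     // no threads kept around when idle
            10_000,                // upper bound on concurrent work items
            60, TimeUnit.SECONDS,  // idle threads die after 60s
            new SynchronousQueue<>());
    pool.execute(() -> System.out.println("work item on " + Thread.currentThread().getName()));
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.SECONDS);
  }
}
{code}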



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7947) Improves the interfaces of classes such as FnDataService, BundleProcessor, ActiveBundle, etc to change the parameter type from WindowedValue to T

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7947?focusedWorklogId=307611=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307611
 ]

ASF GitHub Bot logged work on BEAM-7947:


Author: ASF GitHub Bot
Created on: 06/Sep/19 04:11
Start Date: 06/Sep/19 04:11
Worklog Time Spent: 10m 
  Work Description: sunjincheng121 commented on pull request #9496: 
[BEAM-7947] Improves the interfaces of classes such as FnDataService,…
URL: https://github.com/apache/beam/pull/9496
 
 
   Currently, both `Coder<WindowedValue<T>>` and 
`FnDataReceiver<WindowedValue<T>>` use `WindowedValue<T>` as the data structure 
that both sides, the Runner and the SDK Harness, agree on. Control Plane/Data 
Plane/State Plane/Logging is a high-level abstraction; parts of it, such as the 
Control Plane and Logging, are common requirements for all multi-language 
platforms. For example, the Flink community is also discussing how to support 
Python UDFs, as well as how to deal with the docker environment, how to transfer 
data, how to access state, how to do logging, etc. If Beam can further abstract 
these service interfaces, i.e., make the interface definitions compatible with 
multiple engines, and finally provide them to other projects in the form of class 
libraries, it will definitely help other platforms that want to support multiple 
languages. Take the FnDataService#receive interface as an example, and turn 
`WindowedValue<T>` into `T` so that other platforms can extend it 
arbitrarily, as follows:
   ```
   <T> InboundDataClient receive(LogicalEndpoint inputLocation, Coder<T> coder,
       FnDataReceiver<T> listener);
   ```
   Details of the discussion can be found here:
   
https://lists.apache.org/list.html?d...@beam.apache.org:lte=1M:%5BDISCUSS%5D%20Turn%20%60WindowedValue
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   (Go/Java/Python build status badge links omitted; message truncated)

[jira] [Work logged] (BEAM-7945) Allow runner to configure "semi_persist_dir" which is used in the SDK harness

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7945?focusedWorklogId=307599=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307599
 ]

ASF GitHub Bot logged work on BEAM-7945:


Author: ASF GitHub Bot
Created on: 06/Sep/19 03:21
Start Date: 06/Sep/19 03:21
Worklog Time Spent: 10m 
  Work Description: sunjincheng121 commented on pull request #9452: 
[BEAM-7945] Allow runner to configure semi_persist_dir which is used …
URL: https://github.com/apache/beam/pull/9452#discussion_r321562506
 
 

 ##
 File path: model/fn-execution/src/main/proto/beam_fn_api.proto
 ##
 @@ -815,6 +815,7 @@ message StartWorkerRequest {
   org.apache.beam.model.pipeline.v1.ApiServiceDescriptor logging_endpoint = 3;
   org.apache.beam.model.pipeline.v1.ApiServiceDescriptor artifact_endpoint = 4;
  org.apache.beam.model.pipeline.v1.ApiServiceDescriptor provision_endpoint = 5;
+  string semi_persist_dir = 6;
 
 Review comment:
   Oh, yes, I see, this change is unnecessary. `pipeline_options` is already defined 
in the `ProvisionInfo`.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307599)
Time Spent: 1h 20m  (was: 1h 10m)

> Allow runner to configure "semi_persist_dir" which is used in the SDK harness
> -
>
> Key: BEAM-7945
> URL: https://issues.apache.org/jira/browse/BEAM-7945
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution, sdk-go, sdk-java-core, sdk-py-core
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently "semi_persist_dir" is not configurable. This may become a problem 
> in certain scenarios. For example, the default value of "semi_persist_dir" is 
> "/tmp" 
> ([https://github.com/apache/beam/blob/master/sdks/python/container/boot.go#L48])
>  in the Python SDK harness. When the environment type is "PROCESS", the "/tmp" 
> disk may fill up and unexpected issues will occur in a production 
> environment. We should provide a way to configure "semi_persist_dir" in 
> EnvironmentFactory on the runner side. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (BEAM-8088) PCollection boundedness should be tracked and propagated in python sdk

2019-09-05 Thread Chad Dombrova (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chad Dombrova updated BEAM-8088:

Summary: PCollection boundedness should be tracked and propagated in python 
sdk  (was: [python] PCollection boundedness should be tracked and propagated)

> PCollection boundedness should be tracked and propagated in python sdk
> --
>
> Key: BEAM-8088
> URL: https://issues.apache.org/jira/browse/BEAM-8088
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> As far as I can tell Python does not care about boundedness of PCollections 
> even in streaming mode, but external transforms _do_.  In my ongoing effort 
> to get PubsubIO external transforms working I discovered that I could not 
> generate an unbounded write. 
> My pipeline looks like this:
> {code:python}
> (
> pipe
> | 'PubSubInflow' >> 
> external.pubsub.ReadFromPubSub(subscription=subscription, 
> with_attributes=True)
> | 'PubSubOutflow' >> 
> external.pubsub.WriteToPubSub(topic=OUTPUT_TOPIC, with_attributes=True)
> )
> {code}
> The PCollections returned from the external Read are Unbounded, as expected, 
> but python is responsible for creating the intermediate PCollection, which is 
> always Bounded, and thus external Write is always Bounded. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (BEAM-8088) [python] PCollection boundedness should be tracked and propagated

2019-09-05 Thread Chad Dombrova (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chad Dombrova updated BEAM-8088:

Summary: [python] PCollection boundedness should be tracked and propagated  
(was: PCollection boundedness should be tracked and propagated)

> [python] PCollection boundedness should be tracked and propagated
> -
>
> Key: BEAM-8088
> URL: https://issues.apache.org/jira/browse/BEAM-8088
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> As far as I can tell Python does not care about boundedness of PCollections 
> even in streaming mode, but external transforms _do_.  In my ongoing effort 
> to get PubsubIO external transforms working I discovered that I could not 
> generate an unbounded write. 
> My pipeline looks like this:
> {code:python}
> (
> pipe
> | 'PubSubInflow' >> 
> external.pubsub.ReadFromPubSub(subscription=subscription, 
> with_attributes=True)
> | 'PubSubOutflow' >> 
> external.pubsub.WriteToPubSub(topic=OUTPUT_TOPIC, with_attributes=True)
> )
> {code}
> The PCollections returned from the external Read are Unbounded, as expected, 
> but python is responsible for creating the intermediate PCollection, which is 
> always Bounded, and thus external Write is always Bounded. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-8161) Upgrade to joda time 2.10.3 to get updated TZDB

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8161?focusedWorklogId=307598=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307598
 ]

ASF GitHub Bot logged work on BEAM-8161:


Author: ASF GitHub Bot
Created on: 06/Sep/19 03:18
Start Date: 06/Sep/19 03:18
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #9495: [BEAM-8161] 
Update joda time to 2.10.3 to get TZDB 2019b
URL: https://github.com/apache/beam/pull/9495
 
 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   (Go/Java/Python build status badge links omitted; message truncated)

[jira] [Commented] (BEAM-5530) Migrate to java.time lib instead of joda-time

2019-09-05 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-5530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923892#comment-16923892
 ] 

Luke Cwik commented on BEAM-5530:
-

One advantage with jodatime is that the TZDB is typically updated more often 
and you don't have to wait for a JVM release or muck with the JVM installation 
to get an updated TZDB.

Once custom containers are used for the Java SDK everywhere, this point will be 
moot because then anybody will be able to roll their own container containing 
an updated TZDB.

> Migrate to java.time lib instead of joda-time
> -
>
> Key: BEAM-5530
> URL: https://issues.apache.org/jira/browse/BEAM-5530
> Project: Beam
>  Issue Type: Improvement
>  Components: dependencies
>Reporter: Alexey Romanenko
>Priority: Major
> Fix For: 3.0.0
>
>
> Joda-Time was used until the move to Java 8. For now, these two time 
> libraries are used together. It will finally make sense to move everywhere to 
> only one lib - *java.time* - as the standard Java time library (see mailing list 
> discussion: 
> [https://lists.apache.org/thread.html/b10f6f9daed44f5fa65e315a44b68b2f57c3e80225f5d549b84918af@%3Cdev.beam.apache.org%3E]).
>  
> Since this migration will introduce breaking API changes, we should 
> address it in the 3.0 release.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (BEAM-8161) Upgrade to joda time 2.10.3 to get updated TZDB

2019-09-05 Thread Luke Cwik (Jira)
Luke Cwik created BEAM-8161:
---

 Summary: Upgrade to joda time 2.10.3 to get updated TZDB
 Key: BEAM-8161
 URL: https://issues.apache.org/jira/browse/BEAM-8161
 Project: Beam
  Issue Type: Improvement
  Components: sdk-java-core
Reporter: Luke Cwik
Assignee: Luke Cwik






--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (BEAM-8161) Upgrade to joda time 2.10.3 to get updated TZDB

2019-09-05 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik updated BEAM-8161:

Status: Open  (was: Triage Needed)

> Upgrade to joda time 2.10.3 to get updated TZDB
> ---
>
> Key: BEAM-8161
> URL: https://issues.apache.org/jira/browse/BEAM-8161
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-8151) Allow the Python SDK to use many many threads

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8151?focusedWorklogId=307590=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307590
 ]

ASF GitHub Bot logged work on BEAM-8151:


Author: ASF GitHub Bot
Created on: 06/Sep/19 02:56
Start Date: 06/Sep/19 02:56
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #9477: [BEAM-8151, 
BEAM-7848] Up the max number of threads inside the SDK harness to a default of 
10k
URL: https://github.com/apache/beam/pull/9477#issuecomment-528687056
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307590)
Time Spent: 4.5h  (was: 4h 20m)

> Allow the Python SDK to use many many threads
> -
>
> Key: BEAM-8151
> URL: https://issues.apache.org/jira/browse/BEAM-8151
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> We need to use a thread pool which shrinks the number of active threads when 
> they are not being used.
>  
> This is to prevent any stuckness issues related to a runner scheduling more 
> work items than there are "work" threads inside the SDK harness.
>  
> By default, the control plane should have all "requests" processed in 
> parallel, and the runner is responsible for not overloading the SDK with too 
> much work.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7984) [python] The coder returned for typehints.List should be IterableCoder

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7984?focusedWorklogId=307570=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307570
 ]

ASF GitHub Bot logged work on BEAM-7984:


Author: ASF GitHub Bot
Created on: 06/Sep/19 01:49
Start Date: 06/Sep/19 01:49
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #9344: [BEAM-7984] The 
coder returned for typehints.List should be IterableCoder
URL: https://github.com/apache/beam/pull/9344#issuecomment-528673785
 
 
   Long queue...
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307570)
Time Spent: 4h  (was: 3h 50m)

> [python] The coder returned for typehints.List should be IterableCoder
> --
>
> Key: BEAM-7984
> URL: https://issues.apache.org/jira/browse/BEAM-7984
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> IterableCoder encodes a list and decodes to a list, but 
> typecoders.registry.get_coder(typehints.List[bytes]) returns a 
> FastPrimitiveCoder.  I don't see any reason why this would be advantageous. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7984) [python] The coder returned for typehints.List should be IterableCoder

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7984?focusedWorklogId=307572=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307572
 ]

ASF GitHub Bot logged work on BEAM-7984:


Author: ASF GitHub Bot
Created on: 06/Sep/19 01:49
Start Date: 06/Sep/19 01:49
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9344: [BEAM-7984] 
The coder returned for typehints.List should be IterableCoder
URL: https://github.com/apache/beam/pull/9344
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307572)
Time Spent: 4h 10m  (was: 4h)

> [python] The coder returned for typehints.List should be IterableCoder
> --
>
> Key: BEAM-7984
> URL: https://issues.apache.org/jira/browse/BEAM-7984
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> IterableCoder encodes a list and decodes to a list, but 
> typecoders.registry.get_coder(typehints.List[bytes]) returns a 
> FastPrimitiveCoder.  I don't see any reason why this would be advantageous. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7984) [python] The coder returned for typehints.List should be IterableCoder

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7984?focusedWorklogId=307565=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307565
 ]

ASF GitHub Bot logged work on BEAM-7984:


Author: ASF GitHub Bot
Created on: 06/Sep/19 01:29
Start Date: 06/Sep/19 01:29
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9344: [BEAM-7984] The coder 
returned for typehints.List should be IterableCoder
URL: https://github.com/apache/beam/pull/9344#issuecomment-528669756
 
 
   yay, it works!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307565)
Time Spent: 3h 50m  (was: 3h 40m)

> [python] The coder returned for typehints.List should be IterableCoder
> --
>
> Key: BEAM-7984
> URL: https://issues.apache.org/jira/browse/BEAM-7984
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> IterableCoder encodes a list and decodes to a list, but 
> typecoders.registry.get_coder(typehints.List[bytes]) returns a 
> FastPrimitiveCoder.  I don't see any reason why this would be advantageous. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=307561=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307561
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 06/Sep/19 01:01
Start Date: 06/Sep/19 01:01
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9257: [BEAM-7389] Add 
DoFn methods sample
URL: https://github.com/apache/beam/pull/9257
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307561)
Time Spent: 55h 50m  (was: 55h 40m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 55h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-8146) SchemaCoder/RowCoder have no equals() function

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8146?focusedWorklogId=307552=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307552
 ]

ASF GitHub Bot logged work on BEAM-8146:


Author: ASF GitHub Bot
Created on: 06/Sep/19 00:42
Start Date: 06/Sep/19 00:42
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9493: 
[BEAM-8146] Add equals and hashCode to SchemaCoder and RowCoder
URL: https://github.com/apache/beam/pull/9493
 
 
   Adds a `typeDescriptor` parameter to SchemaCoder. Two SchemaCoders are 
considered equal if their type descriptors are equal and their schemas are 
equal (i.e. the schemas have the same UUID). This change required plumbing 
through a TypeDescriptor everywhere a SchemaCoder is created. Fortunately, most 
of these occurrences are already using a TypeDescriptor to look up a Schema 
from a SchemaRegistry. There are a few exceptions to this rule, where I have to 
construct a `TypeDescriptor.of(clazz)`; we may be able to eliminate this with a 
bit of refactoring.
   
   This PR also adds a SchemaCoderTest file which just contains tests of 
`SchemaCoder::equals`.
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   (Go/Java/Python build status badge links omitted; message truncated)

[jira] [Work logged] (BEAM-5428) Implement cross-bundle state caching.

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5428?focusedWorklogId=307545=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307545
 ]

ASF GitHub Bot logged work on BEAM-5428:


Author: ASF GitHub Bot
Created on: 06/Sep/19 00:04
Start Date: 06/Sep/19 00:04
Worklog Time Spent: 10m 
  Work Description: tweise commented on pull request #9418: [BEAM-5428] 
Implement cross-bundle user state caching in the Python SDK
URL: https://github.com/apache/beam/pull/9418#discussion_r321527154
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/sdk_worker_main.py
 ##
 @@ -149,6 +149,7 @@ def main(unused_argv):
 control_address=service_descriptor.url,
 worker_count=_get_worker_count(sdk_pipeline_options),
 worker_id=_worker_id,
+state_cache_size=_get_state_cache_size(sdk_pipeline_options),
 
 Review comment:
   It might be better to pass the pipeline options to SdkHarness and have it 
extract what it needs. There is currently quite a lot of (parameter-related) noise 
wherever SdkHarness is constructed in general.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307545)
Time Spent: 17h 50m  (was: 17h 40m)

> Implement cross-bundle state caching.
> -
>
> Key: BEAM-5428
> URL: https://issues.apache.org/jira/browse/BEAM-5428
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Rakesh Kumar
>Priority: Major
>  Time Spent: 17h 50m
>  Remaining Estimate: 0h
>
> Tech spec: 
> [https://docs.google.com/document/d/1BOozW0bzBuz4oHJEuZNDOHdzaV5Y56ix58Ozrqm2jFg/edit#heading=h.7ghoih5aig5m]
> Relevant document: 
> [https://docs.google.com/document/d/1ltVqIW0XxUXI6grp17TgeyIybk3-nDF8a0-Nqw-s9mY/edit#|https://docs.google.com/document/d/1ltVqIW0XxUXI6grp17TgeyIybk3-nDF8a0-Nqw-s9mY/edit]
> Mailing list link: 
> [https://lists.apache.org/thread.html/caa8d9bc6ca871d13de2c5e6ba07fdc76f85d26497d95d90893aa1f6@%3Cdev.beam.apache.org%3E]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7600) Spark portable runner: reuse SDK harness

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7600?focusedWorklogId=307543=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307543
 ]

ASF GitHub Bot logged work on BEAM-7600:


Author: ASF GitHub Bot
Created on: 05/Sep/19 23:57
Start Date: 05/Sep/19 23:57
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #9095: [BEAM-7600] 
borrow SDK harness management code into Spark runner
URL: https://github.com/apache/beam/pull/9095#discussion_r321531229
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/DefaultExecutableStageContext.java
 ##
 @@ -102,7 +100,8 @@ private JobFactoryState(int maxFactories) {
 new ConcurrentHashMap<>();
 
 @Override
-public FlinkExecutableStageContext get(JobInfo jobInfo) {
+public ExecutableStageContext get(
+JobInfo jobInfo, SerializableFunction 
isReleaseSynchronous) {
 
 Review comment:
   As discussed offline, I made separate contexts for Flink and Spark.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307543)
Time Spent: 6h 50m  (was: 6h 40m)

> Spark portable runner: reuse SDK harness
> 
>
> Key: BEAM-7600
> URL: https://issues.apache.org/jira/browse/BEAM-7600
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-spark
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> Right now, we're creating a new SDK harness every time an executable stage is 
> run [1], which is expensive. We should be able to re-use code from the Flink 
> runner to re-use the SDK harness [2].
>  
> [1] 
> [https://github.com/apache/beam/blob/c9fb261bc7666788402840bb6ce1b0ce2fd445d1/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkExecutableStageFunction.java#L135]
> [2] 
> [https://github.com/apache/beam/blob/c9fb261bc7666788402840bb6ce1b0ce2fd445d1/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/FlinkDefaultExecutableStageContext.java]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-8113) FlinkRunner: Stage files from context classloader

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8113?focusedWorklogId=307542=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307542
 ]

ASF GitHub Bot logged work on BEAM-8113:


Author: ASF GitHub Bot
Created on: 05/Sep/19 23:56
Start Date: 05/Sep/19 23:56
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #9451: [BEAM-8113] 
Stage files from context classloader
URL: https://github.com/apache/beam/pull/9451#discussion_r321526565
 
 

 ##
 File path: 
runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/PipelineResourcesTest.java
 ##
 @@ -44,24 +49,98 @@
   @Rule public transient TemporaryFolder tmpFolder = new TemporaryFolder();
   @Rule public transient ExpectedException thrown = ExpectedException.none();
 
+  ClassLoader currentContext = null;
 
 Review comment:
   nit: Consider making this a test rule like 
[RestoreSystemProperties](https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/testing/RestoreSystemProperties.java)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307542)
Time Spent: 7h 10m  (was: 7h)

> FlinkRunner: Stage files from context classloader
> -
>
> Key: BEAM-8113
> URL: https://issues.apache.org/jira/browse/BEAM-8113
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Jan Lukavský
>Assignee: Jan Lukavský
>Priority: Major
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> Currently, only files from {{FlinkRunner.class.getClassLoader()}} are staged 
> by default. Add also files from 
> {{Thread.currentThread().getContextClassLoader()}}.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-8113) FlinkRunner: Stage files from context classloader

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8113?focusedWorklogId=307541=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307541
 ]

ASF GitHub Bot logged work on BEAM-8113:


Author: ASF GitHub Bot
Created on: 05/Sep/19 23:56
Start Date: 05/Sep/19 23:56
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #9451: [BEAM-8113] 
Stage files from context classloader
URL: https://github.com/apache/beam/pull/9451#discussion_r321530670
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PipelineResources.java
 ##
 @@ -40,34 +44,42 @@
 public class PipelineResources {
 
   /**
-   * Attempts to detect all the resources the class loader has access to. This does not recurse to
-   * class loader parents stopping it from pulling in resources from the system class loader.
+   * Detects all URLs that are present in all class loaders in between context class loader of
+   * calling thread and class loader of class passed as parameter. It doesn't follow parents above
+   * this class loader stopping it from pulling in resources from the system class loader.
    *
-   * @param classLoader The URLClassLoader to use to detect resources to stage.
-   * @throws IllegalArgumentException If either the class loader is not a URLClassLoader or one of
-   *     the resources the class loader exposes is not a file resource.
+   * @param cls Class whose class loader stops recursion into parent loaders
+   * @throws IllegalArgumentException no classloader in context hierarchy is URLClassloader or if
+   *     one of the resources any class loader exposes is not a file resource.
    * @return A list of absolute paths to the resources the class loader uses.
    */
-  public static List<String> detectClassPathResourcesToStage(ClassLoader classLoader) {
-    if (!(classLoader instanceof URLClassLoader)) {
-      String message =
-          String.format(
-              "Unable to use ClassLoader to detect classpath elements. "
-                  + "Current ClassLoader is %s, only URLClassLoaders are supported.",
-              classLoader);
-      throw new IllegalArgumentException(message);
-    }
+  public static List<String> detectClassPathResourcesToStage(Class<?> cls) {
+    return detectClassPathResourcesToStage(cls.getClassLoader());
+  }
 
-    List<String> files = new ArrayList<>();
-    for (URL url : ((URLClassLoader) classLoader).getURLs()) {
-      try {
-        files.add(new File(url.toURI()).getAbsolutePath());
-      } catch (IllegalArgumentException | URISyntaxException e) {
-        String message = String.format("Unable to convert url (%s) to file.", url);
-        throw new IllegalArgumentException(message, e);
-      }
+  /**
+   * Detect resources to stage, by using hierarchy between current context classloader and the given
+   * one. Always include the given class loader in the process.
+   *
+   * @param loader the loader to use as stopping condition when traversing context class loaders
+   * @return A list of absolute paths to the resources the class loader uses.
+   */
+  @VisibleForTesting
+  static List<String> detectClassPathResourcesToStage(final ClassLoader loader) {
+    Set<String> files = new HashSet<>();
 
 Review comment:
   Making this a set even when there is a single classloader means that we will reorder that classloader's classpath, which will cause lots of dependency conflict issues.
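   As a rough sketch only (not the PR's final code), the hierarchy walk can keep classpath order stable by collecting into an insertion-ordered set; all names below are illustrative:

    import java.io.File;
    import java.net.URISyntaxException;
    import java.net.URL;
    import java.net.URLClassLoader;
    import java.util.ArrayList;
    import java.util.LinkedHashSet;
    import java.util.List;
    import java.util.Set;

    class ClassPathScanner {
      /** Walks from the thread's context class loader up through its parents, including stopAt. */
      static List<String> scan(ClassLoader stopAt) {
        // LinkedHashSet keeps first-seen order, so a single classloader's classpath is not reordered.
        Set<String> files = new LinkedHashSet<>();
        ClassLoader current = Thread.currentThread().getContextClassLoader();
        while (current != null) {
          if (current instanceof URLClassLoader) {
            for (URL url : ((URLClassLoader) current).getURLs()) {
              try {
                files.add(new File(url.toURI()).getAbsolutePath());
              } catch (IllegalArgumentException | URISyntaxException e) {
                throw new IllegalArgumentException("Unable to convert url (" + url + ") to file.", e);
              }
            }
          }
          if (current == stopAt) {
            break; // do not climb past the given loader into the system class loader
          }
          current = current.getParent();
        }
        return new ArrayList<>(files);
      }
    }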
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307541)
Time Spent: 7h 10m  (was: 7h)

> FlinkRunner: Stage files from context classloader
> -
>
> Key: BEAM-8113
> URL: https://issues.apache.org/jira/browse/BEAM-8113
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Jan Lukavský
>Assignee: Jan Lukavský
>Priority: Major
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> Currently, only files from {{FlinkRunner.class.getClassLoader()}} are staged 
> by default. Also add files from 
> {{Thread.currentThread().getContextClassLoader()}}.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-5428) Implement cross-bundle state caching.

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5428?focusedWorklogId=307536=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307536
 ]

ASF GitHub Bot logged work on BEAM-5428:


Author: ASF GitHub Bot
Created on: 05/Sep/19 23:43
Start Date: 05/Sep/19 23:43
Worklog Time Spent: 10m 
  Work Description: tweise commented on pull request #9418: [BEAM-5428] 
Implement cross-bundle user state caching in the Python SDK
URL: https://github.com/apache/beam/pull/9418#discussion_r321527901
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/worker_pool_main.py
 ##
 @@ -47,21 +47,26 @@
 class BeamFnExternalWorkerPoolServicer(
 beam_fn_api_pb2_grpc.BeamFnExternalWorkerPoolServicer):
 
-  def __init__(self, worker_threads, use_process=False,
-   container_executable=None):
+  def __init__(self, worker_threads,
+   use_process=False,
+   container_executable=None,
+   state_cache_size=0):
 self._worker_threads = worker_threads
 self._use_process = use_process
 self._container_executable = container_executable
+self._state_cache_size = state_cache_size
 self._worker_processes = {}
 
   @classmethod
   def start(cls, worker_threads=1, use_process=False, port=0,
-container_executable=None):
+state_cache_size=0, container_executable=None):
 worker_server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
 worker_address = 'localhost:%s' % worker_server.add_insecure_port(
 '[::]:%s' % port)
-worker_pool = cls(worker_threads, use_process=use_process,
-  container_executable=container_executable)
+worker_pool = cls(worker_threads,
+  use_process=use_process,
+  container_executable=container_executable,
+  state_cache_size=state_cache_size)
 
 Review comment:
   This should not be set here but should come from the pipeline options provided by the provisioning endpoint.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307536)
Time Spent: 17h 40m  (was: 17.5h)

> Implement cross-bundle state caching.
> -
>
> Key: BEAM-5428
> URL: https://issues.apache.org/jira/browse/BEAM-5428
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Rakesh Kumar
>Priority: Major
>  Time Spent: 17h 40m
>  Remaining Estimate: 0h
>
> Tech spec: 
> [https://docs.google.com/document/d/1BOozW0bzBuz4oHJEuZNDOHdzaV5Y56ix58Ozrqm2jFg/edit#heading=h.7ghoih5aig5m]
> Relevant document: 
> [https://docs.google.com/document/d/1ltVqIW0XxUXI6grp17TgeyIybk3-nDF8a0-Nqw-s9mY/edit#|https://docs.google.com/document/d/1ltVqIW0XxUXI6grp17TgeyIybk3-nDF8a0-Nqw-s9mY/edit]
> Mailing list link: 
> [https://lists.apache.org/thread.html/caa8d9bc6ca871d13de2c5e6ba07fdc76f85d26497d95d90893aa1f6@%3Cdev.beam.apache.org%3E]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-5428) Implement cross-bundle state caching.

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5428?focusedWorklogId=307529=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307529
 ]

ASF GitHub Bot logged work on BEAM-5428:


Author: ASF GitHub Bot
Created on: 05/Sep/19 23:38
Start Date: 05/Sep/19 23:38
Worklog Time Spent: 10m 
  Work Description: tweise commented on pull request #9418: [BEAM-5428] 
Implement cross-bundle user state caching in the Python SDK
URL: https://github.com/apache/beam/pull/9418#discussion_r321527154
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/sdk_worker_main.py
 ##
 @@ -149,6 +149,7 @@ def main(unused_argv):
 control_address=service_descriptor.url,
 worker_count=_get_worker_count(sdk_pipeline_options),
 worker_id=_worker_id,
+state_cache_size=_get_state_cache_size(sdk_pipeline_options),
 
 Review comment:
   It might be better to pass the pipeline options to SdkHarness and have it extract what it needs. There is currently quite a bit of noise wherever SdkHarness is constructed.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307529)
Time Spent: 17.5h  (was: 17h 20m)

> Implement cross-bundle state caching.
> -
>
> Key: BEAM-5428
> URL: https://issues.apache.org/jira/browse/BEAM-5428
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Rakesh Kumar
>Priority: Major
>  Time Spent: 17.5h
>  Remaining Estimate: 0h
>
> Tech spec: 
> [https://docs.google.com/document/d/1BOozW0bzBuz4oHJEuZNDOHdzaV5Y56ix58Ozrqm2jFg/edit#heading=h.7ghoih5aig5m]
> Relevant document: 
> [https://docs.google.com/document/d/1ltVqIW0XxUXI6grp17TgeyIybk3-nDF8a0-Nqw-s9mY/edit#|https://docs.google.com/document/d/1ltVqIW0XxUXI6grp17TgeyIybk3-nDF8a0-Nqw-s9mY/edit]
> Mailing list link: 
> [https://lists.apache.org/thread.html/caa8d9bc6ca871d13de2c5e6ba07fdc76f85d26497d95d90893aa1f6@%3Cdev.beam.apache.org%3E]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (BEAM-8160) Add instructions about how to set FnApi multi-threads/processes

2019-09-05 Thread Hannah Jiang (Jira)
Hannah Jiang created BEAM-8160:
--

 Summary: Add instructions about how to set FnApi 
multi-threads/processes
 Key: BEAM-8160
 URL: https://issues.apache.org/jira/browse/BEAM-8160
 Project: Beam
  Issue Type: Task
  Components: sdk-py-core
Reporter: Hannah Jiang
Assignee: Hannah Jiang


Add instructions to Beam site or Beam wiki for easy discovery.

 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (BEAM-8158) Loosen Python dependency restrictions: httplib2, oauth2client

2019-09-05 Thread Jonathan Jin (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923795#comment-16923795
 ] 

Jonathan Jin commented on BEAM-8158:


[~udim]: The product in question is unfortunately stuck on Python 2 for now. 
We're planning a migration to Python 3 over the next several months, but we'd 
ideally like to be able to integrate cleanly in Python 2 without resorting to 
stopgaps.

> Loosen Python dependency restrictions: httplib2, oauth2client
> -
>
> Key: BEAM-8158
> URL: https://issues.apache.org/jira/browse/BEAM-8158
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Jonathan Jin
>Assignee: Chamikara Jayalath
>Priority: Major
>
> The Beam Python SDK's current pinned dependencies create dependency conflict 
> issues for my team at Twitter.
> I'd like the following expansions of the Python SDK's dependency ranges:
>  * oauth2client>=2.0.1,<*4* to at least oauth2client>=2.0.1,<=*4.1.2*
>  * httplib2>=0.8,<=*0.12.0* to at least httplib2>=0.8,<=*0.12.3*
> I understand, from pull request 
> [8653|https://github.com/apache/beam/pull/8653] by [~altay], that the latter 
> is blocked in turn by googledatastore.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7984) [python] The coder returned for typehints.List should be IterableCoder

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7984?focusedWorklogId=307504=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307504
 ]

ASF GitHub Bot logged work on BEAM-7984:


Author: ASF GitHub Bot
Created on: 05/Sep/19 22:51
Start Date: 05/Sep/19 22:51
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9344: [BEAM-7984] The coder 
returned for typehints.List should be IterableCoder
URL: https://github.com/apache/beam/pull/9344#issuecomment-528622418
 
 
   wow, Python PreCommit takes _forever_
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307504)
Time Spent: 3h 40m  (was: 3.5h)

> [python] The coder returned for typehints.List should be IterableCoder
> --
>
> Key: BEAM-7984
> URL: https://issues.apache.org/jira/browse/BEAM-7984
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> IterableCoder encodes a list and decodes to list, but 
> typecoders.registry.get_coder(typehints.List[bytes]) returns a 
> FastPrimitiveCoder.  I don't see any reason why this would be advantageous. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-8151) Allow the Python SDK to use many many threads

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8151?focusedWorklogId=307498=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307498
 ]

ASF GitHub Bot logged work on BEAM-8151:


Author: ASF GitHub Bot
Created on: 05/Sep/19 22:37
Start Date: 05/Sep/19 22:37
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9477: [BEAM-8151, 
BEAM-7848] Up the max number of threads inside the SDK harness to a default of 
10k
URL: https://github.com/apache/beam/pull/9477#issuecomment-528618322
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307498)
Time Spent: 4h 20m  (was: 4h 10m)

> Allow the Python SDK to use many many threads
> -
>
> Key: BEAM-8151
> URL: https://issues.apache.org/jira/browse/BEAM-8151
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> We need to use a thread pool which shrinks the number of active threads when 
> they are not being used.
>  
> This is to prevent any stuckness issues related to a runner scheduling more 
> work items than there are "work" threads inside the SDK harness.
>  
> By default the control plane should have all "requests" being processed in 
> parallel and the runner is responsible for not overloading the SDK with too 
> much work.
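The component in question is the Python SDK harness, but the idea of a pool that only keeps threads alive while they are in use can be sketched in Java purely for illustration; the numbers and names below are assumptions, not Beam code:

    import java.util.concurrent.SynchronousQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    class ShrinkingPool {
      /** A pool that can grow up to maxThreads but reclaims idle threads after 60 seconds. */
      static ThreadPoolExecutor create(int maxThreads) {
        return new ThreadPoolExecutor(
            0,                         // keep no core threads alive when idle
            maxThreads,                // a very high ceiling, e.g. 10000
            60, TimeUnit.SECONDS,      // idle workers are torn down after a minute
            new SynchronousQueue<>()); // hand work directly to a free or newly created thread
      }
    }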



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7967) Execute portable Flink application jar

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7967?focusedWorklogId=307497=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307497
 ]

ASF GitHub Bot logged work on BEAM-7967:


Author: ASF GitHub Bot
Created on: 05/Sep/19 22:27
Start Date: 05/Sep/19 22:27
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #9408:  [BEAM-7967] Execute 
portable Flink application jar
URL: https://github.com/apache/beam/pull/9408#issuecomment-528615272
 
 
   Run Seed Job
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307497)
Time Spent: 3h 10m  (was: 3h)

> Execute portable Flink application jar
> --
>
> Key: BEAM-7967
> URL: https://issues.apache.org/jira/browse/BEAM-7967
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-flink
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> [https://docs.google.com/document/d/1kj_9JWxGWOmSGeZ5hbLVDXSTv-zBrx4kQRqOq85RYD4/edit#heading=h.oes73844vmhl]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=307493=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307493
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 05/Sep/19 22:26
Start Date: 05/Sep/19 22:26
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9262: [BEAM-7389] Add 
code examples for Regex page
URL: https://github.com/apache/beam/pull/9262
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307493)
Time Spent: 55h 40m  (was: 55.5h)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 55h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7967) Execute portable Flink application jar

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7967?focusedWorklogId=307496=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307496
 ]

ASF GitHub Bot logged work on BEAM-7967:


Author: ASF GitHub Bot
Created on: 05/Sep/19 22:26
Start Date: 05/Sep/19 22:26
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #9408:  [BEAM-7967] 
Execute portable Flink application jar
URL: https://github.com/apache/beam/pull/9408#discussion_r321509256
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineRunner.java
 ##
 @@ -120,4 +132,62 @@ private PortablePipelineResult 
createPortablePipelineResult(
   return flinkRunnerResult;
 }
   }
+
+  public static void main(String[] args) throws Exception {
 
 Review comment:
   Done.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307496)
Time Spent: 3h  (was: 2h 50m)

> Execute portable Flink application jar
> --
>
> Key: BEAM-7967
> URL: https://issues.apache.org/jira/browse/BEAM-7967
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-flink
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> [https://docs.google.com/document/d/1kj_9JWxGWOmSGeZ5hbLVDXSTv-zBrx4kQRqOq85RYD4/edit#heading=h.oes73844vmhl]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=307492=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307492
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 05/Sep/19 22:25
Start Date: 05/Sep/19 22:25
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #9262: [BEAM-7389] Add code 
examples for Regex page
URL: https://github.com/apache/beam/pull/9262#issuecomment-528614780
 
 
   We can wait until the next release for pydocs. Their main use is for 
released versions really.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307492)
Time Spent: 55.5h  (was: 55h 20m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 55.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-8151) Allow the Python SDK to use many many threads

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8151?focusedWorklogId=307491=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307491
 ]

ASF GitHub Bot logged work on BEAM-8151:


Author: ASF GitHub Bot
Created on: 05/Sep/19 22:24
Start Date: 05/Sep/19 22:24
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9477: [BEAM-8151, 
BEAM-7848] Up the max number of threads inside the SDK harness to a default of 
10k
URL: https://github.com/apache/beam/pull/9477#issuecomment-528614423
 
 
   Thanks Luke. This should be good for the worker.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307491)
Time Spent: 4h 10m  (was: 4h)

> Allow the Python SDK to use many many threads
> -
>
> Key: BEAM-8151
> URL: https://issues.apache.org/jira/browse/BEAM-8151
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> We need to use a thread pool which shrinks the number of active threads when 
> they are not being used.
>  
> This is to prevent any stuckness issues related to a runner scheduling more 
> work items than there are "work" threads inside the SDK harness.
>  
> By default the control plane should have all "requests" being processed in 
> parallel and the runner is responsible for not overloading the SDK with too 
> much work.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7967) Execute portable Flink application jar

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7967?focusedWorklogId=307489=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307489
 ]

ASF GitHub Bot logged work on BEAM-7967:


Author: ASF GitHub Bot
Created on: 05/Sep/19 22:18
Start Date: 05/Sep/19 22:18
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #9408:  [BEAM-7967] 
Execute portable Flink application jar
URL: https://github.com/apache/beam/pull/9408#discussion_r321506835
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/jobsubmission/PortablePipelineJarUtils.java
 ##
 @@ -51,4 +83,102 @@
   ARTIFACT_STAGING_FOLDER_PATH + "/artifact-manifest.json";
   static final String PIPELINE_PATH = PIPELINE_FOLDER_PATH + "/pipeline.json";
   static final String PIPELINE_OPTIONS_PATH = PIPELINE_FOLDER_PATH + 
"/pipeline-options.json";
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(PortablePipelineJarCreator.class);
+
+  private static InputStream getResourceFromClassPath(String resourcePath) 
throws IOException {
+InputStream inputStream = 
PortablePipelineJarUtils.class.getResourceAsStream(resourcePath);
+if (inputStream == null) {
+  throw new FileNotFoundException(
+  String.format("Resource %s not found on classpath.", resourcePath));
+}
+return inputStream;
+  }
+
+  /** Populates {@code builder} using the JSON resource specified by {@code 
resourcePath}. */
+  private static void parseJsonResource(String resourcePath, Builder builder) 
throws IOException {
+try (InputStream inputStream = getResourceFromClassPath(resourcePath)) {
+  String contents = new String(ByteStreams.toByteArray(inputStream), 
StandardCharsets.UTF_8);
+  JsonFormat.parser().merge(contents, builder);
+}
+  }
+
+  public static Pipeline getPipelineFromClasspath() throws IOException {
+Pipeline.Builder builder = Pipeline.newBuilder();
+parseJsonResource("/" + PIPELINE_PATH, builder);
+return builder.build();
+  }
+
+  public static Struct getPipelineOptionsFromClasspath() throws IOException {
+Struct.Builder builder = Struct.newBuilder();
+parseJsonResource("/" + PIPELINE_OPTIONS_PATH, builder);
+return builder.build();
+  }
+
+  public static ProxyManifest getArtifactManifestFromClassPath() throws 
IOException {
+ProxyManifest.Builder builder = ProxyManifest.newBuilder();
+parseJsonResource("/" + ARTIFACT_MANIFEST_PATH, builder);
+return builder.build();
+  }
+
+  /** Writes artifacts listed in {@code proxyManifest}. */
+  public static String stageArtifacts(
+  ProxyManifest proxyManifest,
+  PipelineOptions options,
+  String invocationId,
+  String artifactStagingPath)
+  throws Exception {
+    Collection<StagedFile> filesToStage =
+        prepareArtifactsForStaging(proxyManifest, options, invocationId);
+    try (GrpcFnServer<BeamFileSystemArtifactStagingService> artifactServer =
+        GrpcFnServer.allocatePortAndCreateFor(
+            new BeamFileSystemArtifactStagingService(), InProcessServerFactory.create())) {
+  ManagedChannel grpcChannel =
+  InProcessManagedChannelFactory.create()
+  .forDescriptor(artifactServer.getApiServiceDescriptor());
+  ArtifactServiceStager stager = 
ArtifactServiceStager.overChannel(grpcChannel);
+  String stagingSessionToken =
+  BeamFileSystemArtifactStagingService.generateStagingSessionToken(
+  invocationId, artifactStagingPath);
+  String retrievalToken = stager.stage(stagingSessionToken, filesToStage);
+  // Clean up.
+  for (StagedFile file : filesToStage) {
+if (!file.getFile().delete()) {
+  LOG.warn("Failed to delete file {}", file.getFile());
+}
+  }
+  grpcChannel.shutdown();
+  return retrievalToken;
+}
+  }
+
+  /**
+   * Artifacts are expected to exist as resources on the classpath, located 
using {@code
+   * proxyManifest}. Write them to tmp files so they can be staged.
+   */
+  private static Collection<StagedFile> prepareArtifactsForStaging(
+      ProxyManifest proxyManifest, PipelineOptions options, String invocationId)
+      throws IOException {
+    List<StagedFile> filesToStage = new ArrayList<>();
+Path outputFolderPath =
+Paths.get(
+MoreObjects.firstNonNull(
+options.getTempLocation(), 
System.getProperty("java.io.tmpdir")),
+invocationId);
+if (!outputFolderPath.toFile().mkdir()) {
+  throw new IOException("Failed to create folder " + outputFolderPath);
+}
+for (Location location : proxyManifest.getLocationList()) {
+  try (InputStream inputStream = 
getResourceFromClassPath(location.getUri())) {
 
 Review comment:
   As far as I know, the jar's contents only exist as class path resources (streams), not regular files, and the artifact staging code expects that we are staging regular files.
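   As a rough illustration of that constraint (a sketch only; the class and method names below are hypothetical, not Beam APIs), a classpath-only resource has to be materialized into a regular file before it can be staged:

    import java.io.FileNotFoundException;
    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardCopyOption;

    class ResourceMaterializer {
      /** Copies a classpath resource into targetDir and returns the resulting regular file. */
      static Path materialize(String resourcePath, Path targetDir, String fileName) throws IOException {
        try (InputStream in = ResourceMaterializer.class.getResourceAsStream(resourcePath)) {
          if (in == null) {
            throw new FileNotFoundException("Resource " + resourcePath + " not found on classpath.");
          }
          Path target = targetDir.resolve(fileName);
          // This copy is what later has to be cleaned up once staging finishes.
          Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
          return target;
        }
      }
    }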

[jira] [Commented] (BEAM-7978) ArithmeticExceptions on getting backlog bytes

2019-09-05 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923775#comment-16923775
 ] 

Kenneth Knowles commented on BEAM-7978:
---

I just happened to dig into this, and really Minutes.between(instant, instant) is not safe 
unless you handle the overflow and convert to UNKNOWN behavior. Localizing this to the 
simplified Kinesis client seems best. That keeps the troublesome behavior to a minimum, and 
it is better than forcing a complex spec outside this local function.
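Purely as a sketch of that localization (assuming joda-time's Minutes.minutesBetween, which is what the stack trace in this issue goes through; the class name and UNKNOWN sentinel below are hypothetical):

    import org.joda.time.Instant;
    import org.joda.time.Minutes;

    class BacklogAge {
      /** Sentinel meaning the age could not be computed. */
      static final int UNKNOWN_MINUTES = -1;

      /** Whole minutes between two instants, or UNKNOWN_MINUTES if the difference overflows an int. */
      static int safeMinutesBetween(Instant start, Instant end) {
        try {
          return Minutes.minutesBetween(start, end).getMinutes();
        } catch (ArithmeticException e) {
          // e.g. when start is still the initial minimum timestamp, far enough back to overflow
          return UNKNOWN_MINUTES;
        }
      }
    }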

> ArithmeticExceptions on getting backlog bytes 
> --
>
> Key: BEAM-7978
> URL: https://issues.apache.org/jira/browse/BEAM-7978
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kinesis
>Affects Versions: 2.14.0
>Reporter: Mateusz
>Assignee: Alexey Romanenko
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Hello,
> Beam 2.14.0
>  (and to be more precise 
> [commit|https://github.com/apache/beam/commit/9fe0a0387ad1f894ef02acf32edbc9623a3b13ad#diff-b4964a457006b1555c7042c739b405ec])
>  introduced a change in watermark calculation in Kinesis IO causing below 
> error:
> {code:java}
> exception:  "java.lang.RuntimeException: Unknown kinesis failure, when trying 
> to reach kinesis
>   at 
> org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.wrapExceptions(SimplifiedKinesisClient.java:227)
>   at 
> org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.getBacklogBytes(SimplifiedKinesisClient.java:167)
>   at 
> org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.getBacklogBytes(SimplifiedKinesisClient.java:155)
>   at 
> org.apache.beam.sdk.io.kinesis.KinesisReader.getTotalBacklogBytes(KinesisReader.java:158)
>   at 
> org.apache.beam.runners.dataflow.worker.StreamingModeExecutionContext.flushState(StreamingModeExecutionContext.java:433)
>   at 
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1289)
>   at 
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:149)
>   at 
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1024)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ArithmeticException: Value cannot fit in an int: 
> 153748963401
>   at org.joda.time.field.FieldUtils.safeToInt(FieldUtils.java:229)
>   at 
> org.joda.time.field.BaseDurationField.getDifference(BaseDurationField.java:141)
>   at 
> org.joda.time.base.BaseSingleFieldPeriod.between(BaseSingleFieldPeriod.java:72)
>   at org.joda.time.Minutes.minutesBetween(Minutes.java:101)
>   at 
> org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.lambda$getBacklogBytes$3(SimplifiedKinesisClient.java:169)
>   at 
> org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.wrapExceptions(SimplifiedKinesisClient.java:210)
>   ... 10 more
> {code}
> We spotted this issue on Dataflow runner. It's problematic as inability to 
> get backlog bytes seems to result in constant recreation of KinesisReader.
> The issue happens if the backlog bytes are retrieved before watermark value 
> is updated from initial default value. Easy way to reproduce it is to create 
> a pipeline with Kinesis source for a stream where no records are being put. 
> While debugging it locally, you can observe that the watermark is set to a 
> value in the past (like: "-290308-12-21T19:59:05.225Z"). After two minutes 
> (the default watermark idle duration threshold is set to 2 minutes), the 
> watermark is set to value of 
> [watermarkIdleThreshold|https://github.com/apache/beam/blob/9fe0a0387ad1f894ef02acf32edbc9623a3b13ad/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/WatermarkPolicyFactory.java#L110]),
>  so the next backlog bytes retrieval should be correct. However, as described 
> before, running the pipeline on Dataflow runner results in KinesisReader 
> being closed just after creation, so the watermark won't be fixed.
> The reason of the issue is following: The introduced watermark policies are 
> relying on 
> [WatermarkParameters|https://github.com/apache/beam/blob/9fe0a0387ad1f894ef02acf32edbc9623a3b13ad/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/WatermarkParameters.java]
>  which initialises currentWatermark and eventTime to 
> [BoundedWindow.TIMESTAMP_MIN_VALUE|https://github.com/apache/beam/commit/9fe0a0387ad1f894ef02acf32edbc9623a3b13ad#diff-b4964a457006b1555c7042c739b405ecR52].
>  This result in watermark being set to 

[jira] [Work logged] (BEAM-8151) Allow the Python SDK to use many many threads

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8151?focusedWorklogId=307488=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307488
 ]

ASF GitHub Bot logged work on BEAM-8151:


Author: ASF GitHub Bot
Created on: 05/Sep/19 22:17
Start Date: 05/Sep/19 22:17
Worklog Time Spent: 10m 
  Work Description: angoenka commented on pull request #9477: [BEAM-8151, 
BEAM-7848] Up the max number of threads inside the SDK harness to a default of 
10k
URL: https://github.com/apache/beam/pull/9477#discussion_r321506498
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/sdk_worker_main.py
 ##
 @@ -202,7 +202,7 @@ def _get_worker_count(pipeline_options):
          re.match(r'worker_threads=(?P<worker_threads>.*)',
                   experiment).group('worker_threads'))
 
-  return 12
+  return 1
 
 Review comment:
   Also +1 for this change if it does not have a performance impact, as it will solve a problem we discussed on the mailing thread 
https://lists.apache.org/thread.html/0cbf73d696e0d3a5bb8e93618ac9d6bb81daecf2c9c8e11ee220c8ae@%3Cdev.beam.apache.org%3E
   where this was the 2nd option in the proposal.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307488)
Time Spent: 4h  (was: 3h 50m)

> Allow the Python SDK to use many many threads
> -
>
> Key: BEAM-8151
> URL: https://issues.apache.org/jira/browse/BEAM-8151
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> We need to use a thread pool which shrinks the number of active threads when 
> they are not being used.
>  
> This is to prevent any stuckness issues related to a runner scheduling more 
> work items than there are "work" threads inside the SDK harness.
>  
> By default the control plane should have all "requests" being processed in 
> parallel and the runner is responsible for not overloading the SDK with too 
> much work.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-8151) Allow the Python SDK to use many many threads

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8151?focusedWorklogId=307487=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307487
 ]

ASF GitHub Bot logged work on BEAM-8151:


Author: ASF GitHub Bot
Created on: 05/Sep/19 22:16
Start Date: 05/Sep/19 22:16
Worklog Time Spent: 10m 
  Work Description: angoenka commented on pull request #9477: [BEAM-8151, 
BEAM-7848] Up the max number of threads inside the SDK harness to a default of 
10k
URL: https://github.com/apache/beam/pull/9477#discussion_r321506498
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/sdk_worker_main.py
 ##
 @@ -202,7 +202,7 @@ def _get_worker_count(pipeline_options):
          re.match(r'worker_threads=(?P<worker_threads>.*)',
                   experiment).group('worker_threads'))
 
-  return 12
+  return 1
 
 Review comment:
   Also +1 for this change if it does not have a performance impact, as it will solve a problem we discussed on the mailing thread 
https://lists.apache.org/thread.html/0cbf73d696e0d3a5bb8e93618ac9d6bb81daecf2c9c8e11ee220c8ae@%3Cdev.beam.apache.org%3E
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307487)
Time Spent: 3h 50m  (was: 3h 40m)

> Allow the Python SDK to use many many threads
> -
>
> Key: BEAM-8151
> URL: https://issues.apache.org/jira/browse/BEAM-8151
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> We need to use a thread pool which shrinks the number of active threads when 
> they are not being used.
>  
> This is to prevent any stuckness issues related to a runner scheduling more 
> work items than there are "work" threads inside the SDK harness.
>  
> By default the control plane should have all "requests" being processed in 
> parallel and the runner is responsible for not overloading the SDK with too 
> much work.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-5539) Beam Dependency Update Request: google-cloud-pubsub

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5539?focusedWorklogId=307482=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307482
 ]

ASF GitHub Bot logged work on BEAM-5539:


Author: ASF GitHub Bot
Created on: 05/Sep/19 22:13
Start Date: 05/Sep/19 22:13
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9491: [BEAM-5539] 
Upgrade google-cloud-pubsub and bigquery packages.
URL: https://github.com/apache/beam/pull/9491
 
 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   

[jira] [Work logged] (BEAM-8151) Allow the Python SDK to use many many threads

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8151?focusedWorklogId=307478=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307478
 ]

ASF GitHub Bot logged work on BEAM-8151:


Author: ASF GitHub Bot
Created on: 05/Sep/19 22:12
Start Date: 05/Sep/19 22:12
Worklog Time Spent: 10m 
  Work Description: angoenka commented on pull request #9477: [BEAM-8151, 
BEAM-7848] Up the max number of threads inside the SDK harness to a default of 
10k
URL: https://github.com/apache/beam/pull/9477#discussion_r321505439
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/sdk_worker_main.py
 ##
 @@ -202,7 +202,7 @@ def _get_worker_count(pipeline_options):
          re.match(r'worker_threads=(?P<worker_threads>.*)',
                   experiment).group('worker_threads'))
 
-  return 12
+  return 1
 
 Review comment:
   > Do we know why 10K is a good number? Would not this practically accept all 
work?
   > 
   > How did we come up with 12 in the first place. (@angoenka may have 
information)
   
   It was just some basic experimentation; the job I was experimenting on did not show much benefit beyond 12 threads.
   
   For TFX pipelines, we recommend 100 threads when running on Flink because of some scheduling issues where we used to get deadlocks in situations similar to the one mentioned by Luke.
   
   I am not sure how it will impact the resource usage if we have that many standing threads for a single core.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307478)
Time Spent: 3h 40m  (was: 3.5h)

> Allow the Python SDK to use many many threads
> -
>
> Key: BEAM-8151
> URL: https://issues.apache.org/jira/browse/BEAM-8151
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> We need to use a thread pool which shrinks the number of active threads when 
> they are not being used.
>  
> This is to prevent any stuckness issues related to a runner scheduling more 
> work items than there are "work" threads inside the SDK harness.
>  
> By default the control plane should have all "requests" being processed in 
> parallel and the runner is responsible for not overloading the SDK with too 
> much work.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7967) Execute portable Flink application jar

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7967?focusedWorklogId=307477=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307477
 ]

ASF GitHub Bot logged work on BEAM-7967:


Author: ASF GitHub Bot
Created on: 05/Sep/19 22:09
Start Date: 05/Sep/19 22:09
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #9408:  [BEAM-7967] Execute 
portable Flink application jar
URL: https://github.com/apache/beam/pull/9408#issuecomment-528610103
 
 
   Run Seed Job
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307477)
Time Spent: 2h 40m  (was: 2.5h)

> Execute portable Flink application jar
> --
>
> Key: BEAM-7967
> URL: https://issues.apache.org/jira/browse/BEAM-7967
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-flink
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> [https://docs.google.com/document/d/1kj_9JWxGWOmSGeZ5hbLVDXSTv-zBrx4kQRqOq85RYD4/edit#heading=h.oes73844vmhl]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7967) Execute portable Flink application jar

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7967?focusedWorklogId=307476=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307476
 ]

ASF GitHub Bot logged work on BEAM-7967:


Author: ASF GitHub Bot
Created on: 05/Sep/19 22:08
Start Date: 05/Sep/19 22:08
Worklog Time Spent: 10m 
  Work Description: angoenka commented on issue #9408:  [BEAM-7967] Execute 
portable Flink application jar
URL: https://github.com/apache/beam/pull/9408#issuecomment-528609802
 
 
   Thanks!
   This will be really useful.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307476)
Time Spent: 2.5h  (was: 2h 20m)

> Execute portable Flink application jar
> --
>
> Key: BEAM-7967
> URL: https://issues.apache.org/jira/browse/BEAM-7967
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-flink
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> [https://docs.google.com/document/d/1kj_9JWxGWOmSGeZ5hbLVDXSTv-zBrx4kQRqOq85RYD4/edit#heading=h.oes73844vmhl]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7967) Execute portable Flink application jar

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7967?focusedWorklogId=307474=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307474
 ]

ASF GitHub Bot logged work on BEAM-7967:


Author: ASF GitHub Bot
Created on: 05/Sep/19 22:08
Start Date: 05/Sep/19 22:08
Worklog Time Spent: 10m 
  Work Description: angoenka commented on pull request #9408:  [BEAM-7967] 
Execute portable Flink application jar
URL: https://github.com/apache/beam/pull/9408#discussion_r321504185
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/jobsubmission/PortablePipelineJarUtils.java
 ##
 @@ -51,4 +83,102 @@
   ARTIFACT_STAGING_FOLDER_PATH + "/artifact-manifest.json";
   static final String PIPELINE_PATH = PIPELINE_FOLDER_PATH + "/pipeline.json";
   static final String PIPELINE_OPTIONS_PATH = PIPELINE_FOLDER_PATH + 
"/pipeline-options.json";
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(PortablePipelineJarCreator.class);
+
+  private static InputStream getResourceFromClassPath(String resourcePath) 
throws IOException {
+InputStream inputStream = 
PortablePipelineJarUtils.class.getResourceAsStream(resourcePath);
+if (inputStream == null) {
+  throw new FileNotFoundException(
+  String.format("Resource %s not found on classpath.", resourcePath));
+}
+return inputStream;
+  }
+
+  /** Populates {@code builder} using the JSON resource specified by {@code 
resourcePath}. */
+  private static void parseJsonResource(String resourcePath, Builder builder) 
throws IOException {
+try (InputStream inputStream = getResourceFromClassPath(resourcePath)) {
+  String contents = new String(ByteStreams.toByteArray(inputStream), 
StandardCharsets.UTF_8);
+  JsonFormat.parser().merge(contents, builder);
+}
+  }
+
+  public static Pipeline getPipelineFromClasspath() throws IOException {
+Pipeline.Builder builder = Pipeline.newBuilder();
+parseJsonResource("/" + PIPELINE_PATH, builder);
+return builder.build();
+  }
+
+  public static Struct getPipelineOptionsFromClasspath() throws IOException {
+Struct.Builder builder = Struct.newBuilder();
+parseJsonResource("/" + PIPELINE_OPTIONS_PATH, builder);
+return builder.build();
+  }
+
+  public static ProxyManifest getArtifactManifestFromClassPath() throws 
IOException {
+ProxyManifest.Builder builder = ProxyManifest.newBuilder();
+parseJsonResource("/" + ARTIFACT_MANIFEST_PATH, builder);
+return builder.build();
+  }
+
+  /** Writes artifacts listed in {@code proxyManifest}. */
+  public static String stageArtifacts(
+  ProxyManifest proxyManifest,
+  PipelineOptions options,
+  String invocationId,
+  String artifactStagingPath)
+  throws Exception {
+    Collection<StagedFile> filesToStage =
+        prepareArtifactsForStaging(proxyManifest, options, invocationId);
+    try (GrpcFnServer<BeamFileSystemArtifactStagingService> artifactServer =
+        GrpcFnServer.allocatePortAndCreateFor(
+            new BeamFileSystemArtifactStagingService(), InProcessServerFactory.create())) {
+  ManagedChannel grpcChannel =
+  InProcessManagedChannelFactory.create()
+  .forDescriptor(artifactServer.getApiServiceDescriptor());
+  ArtifactServiceStager stager = 
ArtifactServiceStager.overChannel(grpcChannel);
+  String stagingSessionToken =
+  BeamFileSystemArtifactStagingService.generateStagingSessionToken(
+  invocationId, artifactStagingPath);
+  String retrievalToken = stager.stage(stagingSessionToken, filesToStage);
+  // Clean up.
+  for (StagedFile file : filesToStage) {
+if (!file.getFile().delete()) {
+  LOG.warn("Failed to delete file {}", file.getFile());
+}
+  }
+  grpcChannel.shutdown();
+  return retrievalToken;
+}
+  }
+
+  /**
+   * Artifacts are expected to exist as resources on the classpath, located 
using {@code
+   * proxyManifest}. Write them to tmp files so they can be staged.
+   */
+  private static Collection<StagedFile> prepareArtifactsForStaging(
+      ProxyManifest proxyManifest, PipelineOptions options, String invocationId)
+      throws IOException {
+    List<StagedFile> filesToStage = new ArrayList<>();
+Path outputFolderPath =
+Paths.get(
+MoreObjects.firstNonNull(
+options.getTempLocation(), 
System.getProperty("java.io.tmpdir")),
+invocationId);
+if (!outputFolderPath.toFile().mkdir()) {
+  throw new IOException("Failed to create folder " + outputFolderPath);
+}
+for (Location location : proxyManifest.getLocationList()) {
+  try (InputStream inputStream = 
getResourceFromClassPath(location.getUri())) {
 
 Review comment:
   Can we use the jar directly instead of copying the artifacts to the local file system?
   Copying files to the local file system will also call for cleanup.
 

[jira] [Work logged] (BEAM-7967) Execute portable Flink application jar

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7967?focusedWorklogId=307475=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307475
 ]

ASF GitHub Bot logged work on BEAM-7967:


Author: ASF GitHub Bot
Created on: 05/Sep/19 22:08
Start Date: 05/Sep/19 22:08
Worklog Time Spent: 10m 
  Work Description: angoenka commented on pull request #9408:  [BEAM-7967] 
Execute portable Flink application jar
URL: https://github.com/apache/beam/pull/9408#discussion_r321496144
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineRunner.java
 ##
 @@ -120,4 +132,62 @@ private PortablePipelineResult 
createPortablePipelineResult(
   return flinkRunnerResult;
 }
   }
+
+  public static void main(String[] args) throws Exception {
 
 Review comment:
   Let's add a comment as to why this main method is needed.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307475)
Time Spent: 2h 20m  (was: 2h 10m)

> Execute portable Flink application jar
> --
>
> Key: BEAM-7967
> URL: https://issues.apache.org/jira/browse/BEAM-7967
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-flink
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> [https://docs.google.com/document/d/1kj_9JWxGWOmSGeZ5hbLVDXSTv-zBrx4kQRqOq85RYD4/edit#heading=h.oes73844vmhl]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (BEAM-7060) Design Py3-compatible typehints annotation support in Beam 3.

2019-09-05 Thread Udi Meiri (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923767#comment-16923767
 ] 

Udi Meiri commented on BEAM-7060:
-

Thanks! Standard typing migration is tracked in 
https://issues.apache.org/jira/browse/BEAM-8156

> Design Py3-compatible typehints annotation support in Beam 3.
> -
>
> Key: BEAM-7060
> URL: https://issues.apache.org/jira/browse/BEAM-7060
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Udi Meiri
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 15.5h
>  Remaining Estimate: 0h
>
> Existing [Typehints implementation in 
> Beam|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/] 
> heavily relies on internal details of the CPython implementation, and some of 
> the assumptions of this implementation broke as of Python 3.6, see for 
> example: https://issues.apache.org/jira/browse/BEAM-6877, which makes  
> typehints support unusable on Python 3.6 as of now. [Python 3 Kanban 
> Board|https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245=detail]
>  lists several specific typehints-related breakages, prefixed with "TypeHints 
> Py3 Error".
> We need to decide whether to:
> - Deprecate in-house typehints implementation.
> - Continue to support the in-house implementation, which at this point is stale 
> code and has other known issues.
> - Attempt to use some off-the-shelf libraries for supporting 
> type-annotations, like  Pytype, Mypy, PyAnnotate.
> WRT to this decision we also need to plan on immediate next steps to unblock 
> adoption of Beam for  Python 3.6+ users. One potential option may be to have 
> Beam SDK ignore any typehint annotations on Py 3.6+.
> cc: [~udim], [~altay], [~robertwb].



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7616) urlopen calls could get stuck without a timeout

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7616?focusedWorklogId=307470=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307470
 ]

ASF GitHub Bot logged work on BEAM-7616:


Author: ASF GitHub Bot
Created on: 05/Sep/19 22:04
Start Date: 05/Sep/19 22:04
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #9401: [BEAM-7616] apitools 
use urllib with the global timeout. Set it to 60 seconds # to prevent network 
related stuckness issues.
URL: https://github.com/apache/beam/pull/9401#issuecomment-528608447
 
 
   Run Python 2 PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307470)
Time Spent: 5h 20m  (was: 5h 10m)

> urlopen calls could get stuck without a timeout
> ---
>
> Key: BEAM-7616
> URL: https://issues.apache.org/jira/browse/BEAM-7616
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Udi Meiri
>Priority: Blocker
> Fix For: 2.14.0, 2.16.0
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7616) urlopen calls could get stuck without a timeout

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7616?focusedWorklogId=307468=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307468
 ]

ASF GitHub Bot logged work on BEAM-7616:


Author: ASF GitHub Bot
Created on: 05/Sep/19 22:03
Start Date: 05/Sep/19 22:03
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #9401: [BEAM-7616] apitools 
use urllib with the global timeout. Set it to 60 seconds # to prevent network 
related stuckness issues.
URL: https://github.com/apache/beam/pull/9401#issuecomment-528608240
 
 
   This version seems to work. I can find the new logs in stackdriver logs.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307468)
Time Spent: 5h 10m  (was: 5h)

> urlopen calls could get stuck without a timeout
> ---
>
> Key: BEAM-7616
> URL: https://issues.apache.org/jira/browse/BEAM-7616
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Udi Meiri
>Priority: Blocker
> Fix For: 2.14.0, 2.16.0
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-8113) FlinkRunner: Stage files from context classloader

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8113?focusedWorklogId=307465=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307465
 ]

ASF GitHub Bot logged work on BEAM-8113:


Author: ASF GitHub Bot
Created on: 05/Sep/19 21:54
Start Date: 05/Sep/19 21:54
Worklog Time Spent: 10m 
  Work Description: je-ik commented on issue #9451: [BEAM-8113] Stage files 
from context classloader
URL: https://github.com/apache/beam/pull/9451#issuecomment-528605188
 
 
   @lukecwik sure ... the related code is apparently going through fairly rapid 
development, as there are more and more uses of the modified method in core. 
That is essentially why I wanted to keep this in the Flink runner code only. 
But if we want to solve the issue for all runners, then it should be placed in 
core, although that will probably make this PR more complicated to merge.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307465)
Time Spent: 7h  (was: 6h 50m)

> FlinkRunner: Stage files from context classloader
> -
>
> Key: BEAM-8113
> URL: https://issues.apache.org/jira/browse/BEAM-8113
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Jan Lukavský
>Assignee: Jan Lukavský
>Priority: Major
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> Currently, only files from {{FlinkRunner.class.getClassLoader()}} are staged 
> by default. Add also files from 
> {{Thread.currentThread().getContextClassLoader()}}.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (BEAM-8159) Support TIME ADD/SUB

2019-09-05 Thread Rui Wang (Jira)
Rui Wang created BEAM-8159:
--

 Summary: Support TIME ADD/SUB
 Key: BEAM-8159
 URL: https://issues.apache.org/jira/browse/BEAM-8159
 Project: Beam
  Issue Type: Sub-task
  Components: dsl-sql-zetasql
Reporter: Rui Wang


https://github.com/google/zetasql/blob/master/docs/time_functions.md#time_add
https://github.com/google/zetasql/blob/master/docs/time_functions.md#time_sub



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (BEAM-8159) Support TIME ADD/SUB

2019-09-05 Thread Rui Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang updated BEAM-8159:
---
Status: Open  (was: Triage Needed)

> Support TIME ADD/SUB
> 
>
> Key: BEAM-8159
> URL: https://issues.apache.org/jira/browse/BEAM-8159
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql-zetasql
>Reporter: Rui Wang
>Priority: Major
>
> https://github.com/google/zetasql/blob/master/docs/time_functions.md#time_add
> https://github.com/google/zetasql/blob/master/docs/time_functions.md#time_sub



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (BEAM-8158) Loosen Python dependency restrictions: httplib2, oauth2client

2019-09-05 Thread Udi Meiri (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923750#comment-16923750
 ] 

Udi Meiri commented on BEAM-8158:
-

Jonathan, are you using Python 2 or 3? Perhaps we can conditionally increase 
the httplib2 maximum version only on Python 3.
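A minimal sketch of what such a conditional bound could look like, using PEP 508 
environment markers in a setup.py requirements list; the exact ranges are 
illustrative, not a decided change:

{noformat}
# Hypothetical excerpt from install_requires: keep the current httplib2
# ceiling on Python 2 and allow a newer ceiling only on Python 3.
REQUIRED_PACKAGES = [
    'httplib2>=0.8,<=0.12.0; python_version < "3.0"',
    'httplib2>=0.8,<=0.12.3; python_version >= "3.0"',
    'oauth2client>=2.0.1,<4',
]
{noformat}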

> Loosen Python dependency restrictions: httplib2, oauth2client
> -
>
> Key: BEAM-8158
> URL: https://issues.apache.org/jira/browse/BEAM-8158
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Jonathan Jin
>Assignee: Chamikara Jayalath
>Priority: Major
>
> The Beam Python SDK's current pinned dependencies create dependency conflict 
> issues for my team at Twitter.
> I'd like the following expansions of the Python SDK's dependency ranges:
>  * oauth2client>=2.0.1,<*4* to at least oauth2client>=2.0.1,<=*4.1.2*
>  * httplib2>=0.8,<=*0.12.0* to at least httplib2>=0.8,<=*0.12.3*
> I understand, from pull request 
> [8653|https://github.com/apache/beam/pull/8653] by [~altay], that the latter 
> is blocked in turn by googledatastore.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Assigned] (BEAM-8158) Loosen Python dependency restrictions: httplib2, oauth2client

2019-09-05 Thread Ahmet Altay (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay reassigned BEAM-8158:
-

Assignee: Chamikara Jayalath  (was: Ahmet Altay)

> Loosen Python dependency restrictions: httplib2, oauth2client
> -
>
> Key: BEAM-8158
> URL: https://issues.apache.org/jira/browse/BEAM-8158
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Jonathan Jin
>Assignee: Chamikara Jayalath
>Priority: Major
>
> The Beam Python SDK's current pinned dependencies create dependency conflict 
> issues for my team at Twitter.
> I'd like the following expansions of the Python SDK's dependency ranges:
>  * oauth2client>=2.0.1,<*4* to at least oauth2client>=2.0.1,<=*4.1.2*
>  * httplib2>=0.8,<=*0.12.0* to at least httplib2>=0.8,<=*0.12.3*
> I understand, from pull request 
> [8653|https://github.com/apache/beam/pull/8653] by [~altay], that the latter 
> is blocked in turn by googledatastore.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (BEAM-8158) Loosen Python dependency restrictions: httplib2, oauth2client

2019-09-05 Thread Ahmet Altay (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923745#comment-16923745
 ] 

Ahmet Altay commented on BEAM-8158:
---

/cc [~udim]

> Loosen Python dependency restrictions: httplib2, oauth2client
> -
>
> Key: BEAM-8158
> URL: https://issues.apache.org/jira/browse/BEAM-8158
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Jonathan Jin
>Assignee: Chamikara Jayalath
>Priority: Major
>
> The Beam Python SDK's current pinned dependencies create dependency conflict 
> issues for my team at Twitter.
> I'd like the following expansions of the Python SDK's dependency ranges:
>  * oauth2client>=2.0.1,<*4* to at least oauth2client>=2.0.1,<=*4.1.2*
>  * httplib2>=0.8,<=*0.12.0* to at least httplib2>=0.8,<=*0.12.3*
> I understand, from pull request 
> [8653|https://github.com/apache/beam/pull/8653] by [~altay], that the latter 
> is blocked in turn by googledatastore.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Assigned] (BEAM-8158) Loosen Python dependency restrictions: httplib2, oauth2client

2019-09-05 Thread Ahmet Altay (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay reassigned BEAM-8158:
-

Assignee: Ahmet Altay

> Loosen Python dependency restrictions: httplib2, oauth2client
> -
>
> Key: BEAM-8158
> URL: https://issues.apache.org/jira/browse/BEAM-8158
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Jonathan Jin
>Assignee: Ahmet Altay
>Priority: Major
>
> The Beam Python SDK's current pinned dependencies create dependency conflict 
> issues for my team at Twitter.
> I'd like the following expansions of the Python SDK's dependency ranges:
>  * oauth2client>=2.0.1,<*4* to at least oauth2client>=2.0.1,<=*4.1.2*
>  * httplib2>=0.8,<=*0.12.0* to at least httplib2>=0.8,<=*0.12.3*
> I understand, from pull request 
> [8653|https://github.com/apache/beam/pull/8653] by [~altay], that the latter 
> is blocked in turn by googledatastore.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (BEAM-8158) Loosen Python dependency restrictions: httplib2, oauth2client

2019-09-05 Thread Ahmet Altay (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923744#comment-16923744
 ] 

Ahmet Altay commented on BEAM-8158:
---

[~jinnovation] no updates, but let's explore a few options.

[~chamikara] could we update the googledatastore dependency to 7.1.0 or make 
this dependency optional? 

> Loosen Python dependency restrictions: httplib2, oauth2client
> -
>
> Key: BEAM-8158
> URL: https://issues.apache.org/jira/browse/BEAM-8158
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Jonathan Jin
>Priority: Major
>
> The Beam Python SDK's current pinned dependencies create dependency conflict 
> issues for my team at Twitter.
> I'd like the following expansions of the Python SDK's dependency ranges:
>  * oauth2client>=2.0.1,<*4* to at least oauth2client>=2.0.1,<=*4.1.2*
>  * httplib2>=0.8,<=*0.12.0* to at least httplib2>=0.8,<=*0.12.3*
> I understand, from pull request 
> [8653|https://github.com/apache/beam/pull/8653] by [~altay], that the latter 
> is blocked in turn by googledatastore.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Resolved] (BEAM-7839) Distinguish unknown and unrecognized states in Dataflow runner.

2019-09-05 Thread Valentyn Tymofieiev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev resolved BEAM-7839.
---
Fix Version/s: 2.16.0
 Assignee: Valentyn Tymofieiev
   Resolution: Fixed

> Distinguish unknown and unrecognized states in Dataflow runner.
> ---
>
> Key: BEAM-7839
> URL: https://issues.apache.org/jira/browse/BEAM-7839
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In BEAM-7766 [~kenn] mentioned that when reasoning about pipeline state, the 
> Dataflow runner should distinguish between a pipeline state that is unknown 
> to the service and a pipeline state that is unrecognized by the SDK. An 
> unrecognized state may happen when the service API introduces a new state 
> that older versions of the SDK will not know about. Filing this issue to 
> track introducing that distinction.
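A small sketch of the distinction in Python, with illustrative names rather than 
the actual Dataflow runner code: any state string the SDK does not recognize maps 
to an explicit UNRECOGNIZED value instead of being conflated with UNKNOWN.

{noformat}
# Illustration only: separate "the service has no state to report" from
# "the service reported a state this SDK version does not recognize".
KNOWN_STATES = {
    'JOB_STATE_STOPPED', 'JOB_STATE_RUNNING', 'JOB_STATE_DONE',
    'JOB_STATE_FAILED', 'JOB_STATE_CANCELLED', 'JOB_STATE_UPDATED',
}

def interpret_state(api_state):
    if not api_state or api_state == 'JOB_STATE_UNKNOWN':
        return 'UNKNOWN'       # the service itself reports no state
    if api_state in KNOWN_STATES:
        return api_state
    return 'UNRECOGNIZED'      # a newer service state this SDK predates
{noformat}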



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (BEAM-7060) Design Py3-compatible typehints annotation support in Beam 3.

2019-09-05 Thread Chad Dombrova (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923739#comment-16923739
 ] 

Chad Dombrova commented on BEAM-7060:
-

Note that work on type annotations is continuing in BEAM-7746


> Design Py3-compatible typehints annotation support in Beam 3.
> -
>
> Key: BEAM-7060
> URL: https://issues.apache.org/jira/browse/BEAM-7060
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Udi Meiri
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 15.5h
>  Remaining Estimate: 0h
>
> The existing [Typehints implementation in 
> Beam|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/]
>  relies heavily on internal details of the CPython implementation, and some of 
> the assumptions of this implementation broke as of Python 3.6; see for 
> example: https://issues.apache.org/jira/browse/BEAM-6877, which makes 
> typehints support unusable on Python 3.6 as of now. The [Python 3 Kanban 
> Board|https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245=detail]
>  lists several specific typehints-related breakages, prefixed with "TypeHints 
> Py3 Error".
> We need to decide whether to:
> - Deprecate the in-house typehints implementation.
> - Continue to support the in-house implementation, which at this point is 
> stale code and has other known issues.
> - Attempt to use some off-the-shelf libraries for supporting 
> type annotations, such as Pytype, Mypy, or PyAnnotate.
> With respect to this decision, we also need to plan immediate next steps to 
> unblock adoption of Beam for Python 3.6+ users. One potential option may be 
> to have the Beam SDK ignore any typehint annotations on Py 3.6+.
> cc: [~udim], [~altay], [~robertwb].



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7839) Distinguish unknown and unrecognized states in Dataflow runner.

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7839?focusedWorklogId=307450=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307450
 ]

ASF GitHub Bot logged work on BEAM-7839:


Author: ASF GitHub Bot
Created on: 05/Sep/19 21:22
Start Date: 05/Sep/19 21:22
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #9132: [BEAM-7839] 
Default to UNRECOGNIZED state when a state cannot be accurately interpreted by 
the SDK.
URL: https://github.com/apache/beam/pull/9132
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307450)
Time Spent: 40m  (was: 0.5h)

> Distinguish unknown and unrecognized states in Dataflow runner.
> ---
>
> Key: BEAM-7839
> URL: https://issues.apache.org/jira/browse/BEAM-7839
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Valentyn Tymofieiev
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In BEAM-7766 [~kenn] mentioned that when reasoning about pipeline state, the 
> Dataflow runner should distinguish between a pipeline state that is unknown 
> to the service and a pipeline state that is unrecognized by the SDK. An 
> unrecognized state may happen when the service API introduces a new state 
> that older versions of the SDK will not know about. Filing this issue to 
> track introducing that distinction.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Resolved] (BEAM-7060) Design Py3-compatible typehints annotation support in Beam 3.

2019-09-05 Thread Udi Meiri (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udi Meiri resolved BEAM-7060.
-
Resolution: Fixed

> Design Py3-compatible typehints annotation support in Beam 3.
> -
>
> Key: BEAM-7060
> URL: https://issues.apache.org/jira/browse/BEAM-7060
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Udi Meiri
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 15.5h
>  Remaining Estimate: 0h
>
> The existing [Typehints implementation in 
> Beam|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/]
>  relies heavily on internal details of the CPython implementation, and some of 
> the assumptions of this implementation broke as of Python 3.6; see for 
> example: https://issues.apache.org/jira/browse/BEAM-6877, which makes 
> typehints support unusable on Python 3.6 as of now. The [Python 3 Kanban 
> Board|https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245=detail]
>  lists several specific typehints-related breakages, prefixed with "TypeHints 
> Py3 Error".
> We need to decide whether to:
> - Deprecate the in-house typehints implementation.
> - Continue to support the in-house implementation, which at this point is 
> stale code and has other known issues.
> - Attempt to use some off-the-shelf libraries for supporting 
> type annotations, such as Pytype, Mypy, or PyAnnotate.
> With respect to this decision, we also need to plan immediate next steps to 
> unblock adoption of Beam for Python 3.6+ users. One potential option may be 
> to have the Beam SDK ignore any typehint annotations on Py 3.6+.
> cc: [~udim], [~altay], [~robertwb].



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (BEAM-7060) Design Py3-compatible typehints annotation support in Beam 3.

2019-09-05 Thread Udi Meiri (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923735#comment-16923735
 ] 

Udi Meiri commented on BEAM-7060:
-

Py3 annotations support has been merged. Closing.
Other aspects such as mypy support and using the official types in the typing 
module will be handled in other JIRAs.
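As a small illustration of what the merged support enables (a sketch assuming the 
usual apache_beam import), native Python 3 annotations on a callable can now 
serve as Beam type hints instead of the decorator form:

{noformat}
import typing
import apache_beam as beam

def split_words(line: str) -> typing.List[str]:
    # With Py3 annotation support, these annotations can be picked up as type
    # hints, replacing @beam.typehints.with_input_types(str) and friends.
    return line.split()

with beam.Pipeline() as p:
    _ = (p
         | beam.Create(['hello world'])
         | beam.FlatMap(split_words))
{noformat}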

> Design Py3-compatible typehints annotation support in Beam 3.
> -
>
> Key: BEAM-7060
> URL: https://issues.apache.org/jira/browse/BEAM-7060
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Udi Meiri
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 15.5h
>  Remaining Estimate: 0h
>
> The existing [Typehints implementation in 
> Beam|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/]
>  relies heavily on internal details of the CPython implementation, and some of 
> the assumptions of this implementation broke as of Python 3.6; see for 
> example: https://issues.apache.org/jira/browse/BEAM-6877, which makes 
> typehints support unusable on Python 3.6 as of now. The [Python 3 Kanban 
> Board|https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245=detail]
>  lists several specific typehints-related breakages, prefixed with "TypeHints 
> Py3 Error".
> We need to decide whether to:
> - Deprecate the in-house typehints implementation.
> - Continue to support the in-house implementation, which at this point is 
> stale code and has other known issues.
> - Attempt to use some off-the-shelf libraries for supporting 
> type annotations, such as Pytype, Mypy, or PyAnnotate.
> With respect to this decision, we also need to plan immediate next steps to 
> unblock adoption of Beam for Python 3.6+ users. One potential option may be 
> to have the Beam SDK ignore any typehint annotations on Py 3.6+.
> cc: [~udim], [~altay], [~robertwb].



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7859) Portable Wordcount on Spark runner does not work in DOCKER execution mode.

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7859?focusedWorklogId=307446=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307446
 ]

ASF GitHub Bot logged work on BEAM-7859:


Author: ASF GitHub Bot
Created on: 05/Sep/19 21:16
Start Date: 05/Sep/19 21:16
Worklog Time Spent: 10m 
  Work Description: angoenka commented on pull request #9421: [BEAM-7859] 
set SDK worker parallelism to 1 in word count test
URL: https://github.com/apache/beam/pull/9421
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307446)
Time Spent: 1h  (was: 50m)

> Portable Wordcount on Spark runner does not work in DOCKER execution mode.
> --
>
> Key: BEAM-7859
> URL: https://issues.apache.org/jira/browse/BEAM-7859
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark, sdk-py-harness
>Reporter: Valentyn Tymofieiev
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-spark
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The error was observed during Beam 2.14.0 release validation, see: 
> https://issues.apache.org/jira/browse/BEAM-7224?focusedCommentId=16896831=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16896831
> Looks like master currently fails with a different error, both in Loopback 
> and Docker modes.
> [~ibzib] [~altay] [~robertwb] [~angoenka]
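For reference, the setting mentioned in the PR title can be expressed from the 
Python side roughly as follows; this is a sketch, and the job endpoint and 
environment values are placeholders:

{noformat}
from apache_beam.options.pipeline_options import PipelineOptions

# Sketch: ask the portable runner for a single SDK worker, mirroring the
# "set SDK worker parallelism to 1" change referenced above.
options = PipelineOptions([
    '--runner=PortableRunner',
    '--job_endpoint=localhost:8099',   # assumed local job server address
    '--environment_type=DOCKER',
    '--sdk_worker_parallelism=1',
])
{noformat}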



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7060) Design Py3-compatible typehints annotation support in Beam 3.

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7060?focusedWorklogId=307444=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307444
 ]

ASF GitHub Bot logged work on BEAM-7060:


Author: ASF GitHub Bot
Created on: 05/Sep/19 21:13
Start Date: 05/Sep/19 21:13
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9283: [BEAM-7060] Type 
hints from Python 3 annotations
URL: https://github.com/apache/beam/pull/9283
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307444)
Time Spent: 15.5h  (was: 15h 20m)

> Design Py3-compatible typehints annotation support in Beam 3.
> -
>
> Key: BEAM-7060
> URL: https://issues.apache.org/jira/browse/BEAM-7060
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Udi Meiri
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 15.5h
>  Remaining Estimate: 0h
>
> The existing [Typehints implementation in 
> Beam|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/]
>  relies heavily on internal details of the CPython implementation, and some of 
> the assumptions of this implementation broke as of Python 3.6; see for 
> example: https://issues.apache.org/jira/browse/BEAM-6877, which makes 
> typehints support unusable on Python 3.6 as of now. The [Python 3 Kanban 
> Board|https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245=detail]
>  lists several specific typehints-related breakages, prefixed with "TypeHints 
> Py3 Error".
> We need to decide whether to:
> - Deprecate the in-house typehints implementation.
> - Continue to support the in-house implementation, which at this point is 
> stale code and has other known issues.
> - Attempt to use some off-the-shelf libraries for supporting 
> type annotations, such as Pytype, Mypy, or PyAnnotate.
> With respect to this decision, we also need to plan immediate next steps to 
> unblock adoption of Beam for Python 3.6+ users. One potential option may be 
> to have the Beam SDK ignore any typehint annotations on Py 3.6+.
> cc: [~udim], [~altay], [~robertwb].



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7984) [python] The coder returned for typehints.List should be IterableCoder

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7984?focusedWorklogId=307439=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307439
 ]

ASF GitHub Bot logged work on BEAM-7984:


Author: ASF GitHub Bot
Created on: 05/Sep/19 21:07
Start Date: 05/Sep/19 21:07
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #9344: [BEAM-7984] The 
coder returned for typehints.List should be IterableCoder
URL: https://github.com/apache/beam/pull/9344#issuecomment-528582137
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307439)
Time Spent: 3.5h  (was: 3h 20m)

> [python] The coder returned for typehints.List should be IterableCoder
> --
>
> Key: BEAM-7984
> URL: https://issues.apache.org/jira/browse/BEAM-7984
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> IterableCoder encodes a list and decodes to a list, but 
> typecoders.registry.get_coder(typehints.List[bytes]) returns a 
> FastPrimitiveCoder. I don't see any reason why this would be advantageous.
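The behavior under discussion can be reproduced in a few lines; a sketch using 
the module paths quoted in the issue description:

{noformat}
from apache_beam import coders
from apache_beam.coders import typecoders
from apache_beam.typehints import typehints

# Inspect which coder the registry currently picks for a List typehint.
coder = typecoders.registry.get_coder(typehints.List[bytes])
print(type(coder).__name__)  # the issue reports a fast primitives coder here

# For comparison, the coder the issue argues should be returned instead.
print(type(coders.IterableCoder(coders.BytesCoder())).__name__)
{noformat}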



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7738) Support PubSubIO to be configured externally for use with other SDKs

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7738?focusedWorklogId=307440=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307440
 ]

ASF GitHub Bot logged work on BEAM-7738:


Author: ASF GitHub Bot
Created on: 05/Sep/19 21:07
Start Date: 05/Sep/19 21:07
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9268: [BEAM-7738] Add 
external transform support to PubsubIO
URL: https://github.com/apache/beam/pull/9268#issuecomment-528582296
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307440)
Time Spent: 3h  (was: 2h 50m)

> Support PubSubIO to be configured externally for use with other SDKs
> 
>
> Key: BEAM-7738
> URL: https://issues.apache.org/jira/browse/BEAM-7738
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp, runner-flink, sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Labels: portability
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Now that KafkaIO is supported via the external transform API (BEAM-7029) we 
> should add support for PubSub.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-8000) Add Delete method to gRPC JobService

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8000?focusedWorklogId=307438=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307438
 ]

ASF GitHub Bot logged work on BEAM-8000:


Author: ASF GitHub Bot
Created on: 05/Sep/19 21:05
Start Date: 05/Sep/19 21:05
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9489: [BEAM-8000] Add 
Delete method to JobService for clearing old jobs from memory
URL: https://github.com/apache/beam/pull/9489#issuecomment-528581673
 
 
   R: @lukecwik 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307438)
Time Spent: 20m  (was: 10m)

> Add Delete method to gRPC JobService
> 
>
> Key: BEAM-8000
> URL: https://issues.apache.org/jira/browse/BEAM-8000
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Labels: portability
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> As a user of the InMemoryJobService, I want a method to purge jobs from 
> memory when they are no longer needed, so that the service does not balloon 
> in memory usage over time.
> I was planning to name this Delete.  Also considering the name Purge.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-8000) Add Delete method to gRPC JobService

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8000?focusedWorklogId=307437=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307437
 ]

ASF GitHub Bot logged work on BEAM-8000:


Author: ASF GitHub Bot
Created on: 05/Sep/19 21:05
Start Date: 05/Sep/19 21:05
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9489: [BEAM-8000] 
Add Delete method to JobService for clearing old jobs from memory
URL: https://github.com/apache/beam/pull/9489
 
 
   As a user of the InMemoryJobService, I want a method to remove jobs from 
memory when they are no longer needed, so that the service does not grow in 
memory usage over time.
   
   
   

[jira] [Commented] (BEAM-8148) [python] allow tests to be specified when using tox

2019-09-05 Thread Chad Dombrova (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923730#comment-16923730
 ] 

Chad Dombrova commented on BEAM-8148:
-

PR is merged, so this is fixed!

> [python] allow tests to be specified when using tox
> ---
>
> Key: BEAM-8148
> URL: https://issues.apache.org/jira/browse/BEAM-8148
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> I don't know how to specify individual tests using gradle, and as a python 
> developer, it's way too opaque for me to figure out on my own.  
> The developer wiki suggests calling the tests directly:
> {noformat}
> python setup.py nosetests --tests :.
> {noformat}
> But this defeats the purpose of using tox, which is there to help users 
> manage the myriad virtual envs required to run the different tests.
> Luckily there is an easier way!  You just add {posargs} to the tox commands.  
> PR is coming shortly.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7600) Spark portable runner: reuse SDK harness

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7600?focusedWorklogId=307435=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307435
 ]

ASF GitHub Bot logged work on BEAM-7600:


Author: ASF GitHub Bot
Created on: 05/Sep/19 21:02
Start Date: 05/Sep/19 21:02
Worklog Time Spent: 10m 
  Work Description: angoenka commented on pull request #9095: [BEAM-7600] 
borrow SDK harness management code into Spark runner
URL: https://github.com/apache/beam/pull/9095#discussion_r321478397
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/DefaultExecutableStageContext.java
 ##
 @@ -102,7 +100,8 @@ private JobFactoryState(int maxFactories) {
 new ConcurrentHashMap<>();
 
 @Override
-public FlinkExecutableStageContext get(JobInfo jobInfo) {
+public ExecutableStageContext get(
+JobInfo jobInfo, SerializableFunction 
isReleaseSynchronous) {
 
 Review comment:
   I still feel that we should not have `isReleaseSynchronous` in the `get` 
method, as the usage can be unpredictable if we call `get` at two locations for 
the same jobInfo but with different `isReleaseSynchronous` functions. Because 
we are caching, the first `isReleaseSynchronous` will be applied, which can 
lead to a concurrency bug.
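To make the concern concrete, a language-neutral sketch (written in Python, with 
made-up names) of the pitfall described above: once a per-job context is cached, 
whichever callback the first caller passed is the one that sticks.

{noformat}
_contexts = {}  # cache keyed by job id

def get_context(job_info, is_release_synchronous):
    # Pitfall: only the first caller's is_release_synchronous is captured.
    # A later caller passing a different function silently gets the context
    # built with the first one, which is the concurrency hazard noted above.
    if job_info not in _contexts:
        _contexts[job_info] = {'job': job_info,
                               'release_sync': is_release_synchronous}
    return _contexts[job_info]
{noformat}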
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307435)
Time Spent: 6h 40m  (was: 6.5h)

> Spark portable runner: reuse SDK harness
> 
>
> Key: BEAM-7600
> URL: https://issues.apache.org/jira/browse/BEAM-7600
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-spark
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> Right now, we're creating a new SDK harness every time an executable stage is 
> run [1], which is expensive. We should be able to re-use code from the Flink 
> runner to re-use the SDK harness [2].
>  
> [1] 
> [https://github.com/apache/beam/blob/c9fb261bc7666788402840bb6ce1b0ce2fd445d1/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkExecutableStageFunction.java#L135]
> [2] 
> [https://github.com/apache/beam/blob/c9fb261bc7666788402840bb6ce1b0ce2fd445d1/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/FlinkDefaultExecutableStageContext.java]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7967) Execute portable Flink application jar

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7967?focusedWorklogId=307429=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307429
 ]

ASF GitHub Bot logged work on BEAM-7967:


Author: ASF GitHub Bot
Created on: 05/Sep/19 20:57
Start Date: 05/Sep/19 20:57
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #9408:  [BEAM-7967] Execute 
portable Flink application jar
URL: https://github.com/apache/beam/pull/9408#issuecomment-528578625
 
 
   Run PortableJar_Flink PostCommit
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307429)
Time Spent: 2h  (was: 1h 50m)

> Execute portable Flink application jar
> --
>
> Key: BEAM-7967
> URL: https://issues.apache.org/jira/browse/BEAM-7967
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-flink
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> [https://docs.google.com/document/d/1kj_9JWxGWOmSGeZ5hbLVDXSTv-zBrx4kQRqOq85RYD4/edit#heading=h.oes73844vmhl]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7967) Execute portable Flink application jar

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7967?focusedWorklogId=307431=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307431
 ]

ASF GitHub Bot logged work on BEAM-7967:


Author: ASF GitHub Bot
Created on: 05/Sep/19 20:57
Start Date: 05/Sep/19 20:57
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #9408:  [BEAM-7967] Execute 
portable Flink application jar
URL: https://github.com/apache/beam/pull/9408#issuecomment-528578775
 
 
   Run PortableJar_Flink PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307431)
Time Spent: 2h 10m  (was: 2h)

> Execute portable Flink application jar
> --
>
> Key: BEAM-7967
> URL: https://issues.apache.org/jira/browse/BEAM-7967
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-flink
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> [https://docs.google.com/document/d/1kj_9JWxGWOmSGeZ5hbLVDXSTv-zBrx4kQRqOq85RYD4/edit#heading=h.oes73844vmhl]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-8155) Add PubsubIO.readMessagesWithCoderAndParseFn method

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8155?focusedWorklogId=307427=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307427
 ]

ASF GitHub Bot logged work on BEAM-8155:


Author: ASF GitHub Bot
Created on: 05/Sep/19 20:52
Start Date: 05/Sep/19 20:52
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #9473: [BEAM-8155] Add 
static PubsubIO.readMessagesWithCoderAndParseFn method
URL: https://github.com/apache/beam/pull/9473#issuecomment-528576937
 
 
   Thanks.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307427)
Remaining Estimate: 0h
Time Spent: 10m

> Add PubsubIO.readMessagesWithCoderAndParseFn method
> ---
>
> Key: BEAM-8155
> URL: https://issues.apache.org/jira/browse/BEAM-8155
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Claire McGinty
>Assignee: Claire McGinty
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Complements [#8879|https://github.com/apache/beam/pull/8879]; the constructor 
> for a generic-typed PubsubIO.Read is private, so some kind of public 
> constructor is needed to use custom parse functions with generic types.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-8155) Add PubsubIO.readMessagesWithCoderAndParseFn method

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8155?focusedWorklogId=307428=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307428
 ]

ASF GitHub Bot logged work on BEAM-8155:


Author: ASF GitHub Bot
Created on: 05/Sep/19 20:52
Start Date: 05/Sep/19 20:52
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #9473: 
[BEAM-8155] Add static PubsubIO.readMessagesWithCoderAndParseFn method
URL: https://github.com/apache/beam/pull/9473
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307428)
Time Spent: 20m  (was: 10m)

> Add PubsubIO.readMessagesWithCoderAndParseFn method
> ---
>
> Key: BEAM-8155
> URL: https://issues.apache.org/jira/browse/BEAM-8155
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Claire McGinty
>Assignee: Claire McGinty
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Complements [#8879|https://github.com/apache/beam/pull/8879]; the constructor 
> for a generic-typed PubsubIO.Read is private, so some kind of public 
> constructor is needed to use custom parse functions with generic types.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-8113) FlinkRunner: Stage files from context classloader

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8113?focusedWorklogId=307426=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307426
 ]

ASF GitHub Bot logged work on BEAM-8113:


Author: ASF GitHub Bot
Created on: 05/Sep/19 20:48
Start Date: 05/Sep/19 20:48
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #9451: [BEAM-8113] Stage 
files from context classloader
URL: https://github.com/apache/beam/pull/9451#issuecomment-528575435
 
 
   I have to take another look through the comments/updates.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307426)
Time Spent: 6h 50m  (was: 6h 40m)

> FlinkRunner: Stage files from context classloader
> -
>
> Key: BEAM-8113
> URL: https://issues.apache.org/jira/browse/BEAM-8113
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Jan Lukavský
>Assignee: Jan Lukavský
>Priority: Major
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> Currently, only files from {{FlinkRunner.class.getClassLoader()}} are staged 
> by default. Add also files from 
> {{Thread.currentThread().getContextClassLoader()}}.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=307424=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307424
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 05/Sep/19 20:44
Start Date: 05/Sep/19 20:44
Worklog Time Spent: 10m 
  Work Description: davidcavazos commented on issue #9257: [BEAM-7389] Add 
DoFn methods sample
URL: https://github.com/apache/beam/pull/9257#issuecomment-528574027
 
 
   @aaltay Tests are passing. More detailed descriptions will be available in 
the docs.
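For readers following along, a compact sketch of the DoFn lifecycle methods the 
sample covers, using the standard apache_beam API with illustrative bodies:

{noformat}
import apache_beam as beam

class CountCharacters(beam.DoFn):
    def setup(self):
        # Called once per DoFn instance, e.g. to open clients or connections.
        pass

    def start_bundle(self):
        # Called before each bundle of elements.
        self.bundle_size = 0

    def process(self, element):
        # Called once per element; may yield zero or more outputs.
        self.bundle_size += 1
        yield len(element)

    def finish_bundle(self):
        # Called after each bundle finishes.
        pass

    def teardown(self):
        # Called once when the instance is discarded.
        pass
{noformat}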
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307424)
Time Spent: 55h 20m  (was: 55h 10m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 55h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7967) Execute portable Flink application jar

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7967?focusedWorklogId=307421=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307421
 ]

ASF GitHub Bot logged work on BEAM-7967:


Author: ASF GitHub Bot
Created on: 05/Sep/19 20:43
Start Date: 05/Sep/19 20:43
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #9408:  [BEAM-7967] Execute 
portable Flink application jar
URL: https://github.com/apache/beam/pull/9408#issuecomment-528573675
 
 
   Run Seed Job
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307421)
Time Spent: 1h 50m  (was: 1h 40m)

> Execute portable Flink application jar
> --
>
> Key: BEAM-7967
> URL: https://issues.apache.org/jira/browse/BEAM-7967
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-flink
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> [https://docs.google.com/document/d/1kj_9JWxGWOmSGeZ5hbLVDXSTv-zBrx4kQRqOq85RYD4/edit#heading=h.oes73844vmhl]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-8151) Allow the Python SDK to use many many threads

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8151?focusedWorklogId=307420=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307420
 ]

ASF GitHub Bot logged work on BEAM-8151:


Author: ASF GitHub Bot
Created on: 05/Sep/19 20:35
Start Date: 05/Sep/19 20:35
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #9477: [BEAM-8151, 
BEAM-7848] Up the max number of threads inside the SDK harness to a default of 
10k
URL: https://github.com/apache/beam/pull/9477#issuecomment-528571071
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307420)
Time Spent: 3.5h  (was: 3h 20m)

> Allow the Python SDK to use many many threads
> -
>
> Key: BEAM-8151
> URL: https://issues.apache.org/jira/browse/BEAM-8151
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> We need to use a thread pool which shrinks the number of active threads when 
> they are not being used.
>  
> This is to prevent any stuckness issues related to a runner scheduling more 
> work items than there are "work" threads inside the SDK harness.
>  
> By default the control plane should have all "requests" being processed in 
> parallel and the runner is responsible for not overloading the SDK with too 
> much work.
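A minimal sketch of the idea in the description, independent of the actual SDK 
harness code: worker threads are created on demand and exit after sitting idle, 
so the pool shrinks back down when the runner stops sending work.

{noformat}
import queue
import threading

class ShrinkingPool(object):
    """Illustrative pool whose idle worker threads exit after idle_secs."""

    def __init__(self, max_workers=10000, idle_secs=10):
        self._tasks = queue.Queue()
        self._idle_secs = idle_secs
        self._max_workers = max_workers
        self._workers = 0
        self._lock = threading.Lock()

    def submit(self, fn, *args):
        self._tasks.put((fn, args))
        with self._lock:
            # Naive policy for the sketch: spawn a worker per submission while
            # under the cap; surplus workers time out and exit once idle.
            if self._workers < self._max_workers:
                self._workers += 1
                threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            try:
                fn, args = self._tasks.get(timeout=self._idle_secs)
            except queue.Empty:
                with self._lock:
                    self._workers -= 1
                return  # idle too long: exit so the pool shrinks
            fn(*args)
{noformat}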



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (BEAM-8158) Loosen Python dependency restrictions: httplib2, oauth2client

2019-09-05 Thread Jonathan (Jira)
Jonathan created BEAM-8158:
--

 Summary: Loosen Python dependency restrictions: httplib2, 
oauth2client
 Key: BEAM-8158
 URL: https://issues.apache.org/jira/browse/BEAM-8158
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-core
Reporter: Jonathan


The Beam Python SDK's current pinned dependencies create dependency conflict 
issues for my team at Twitter.

I'd like the following expansions of the Python SDK's dependency ranges:
 * oauth2client>=2.0.1,<*4* to at least oauth2client>=2.0.1,<=*4.1.2*
 * httplib2>=0.8,<=*0.12.0* to at least httplib2>=0.8,<=*0.12.3*

I understand, from pull request [8653|https://github.com/apache/beam/pull/8653] 
by [~altay], that the latter is blocked in turn by googledatastore.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (BEAM-2572) Implement an S3 filesystem for Python SDK

2019-09-05 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923701#comment-16923701
 ] 

Pablo Estrada commented on BEAM-2572:
-

[~yajunwong] thanks for working on this. I think it's worth contributing; it 
just requires healthy unit tests.

> Implement an S3 filesystem for Python SDK
> -
>
> Key: BEAM-2572
> URL: https://issues.apache.org/jira/browse/BEAM-2572
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-core
>Reporter: Dmitry Demeshchuk
>Priority: Minor
>  Labels: GSoC2019, gsoc, gsoc2019, mentor, outreachy19dec
>
> There are two paths worth exploring, to my understanding:
> 1. Sticking to the HDFS-based approach (like it's done in Java).
> 2. Using boto/boto3 for accessing S3 through its common API endpoints.
> I personally prefer the second approach, for a few reasons:
> 1. In real life, HDFS and S3 have different consistency guarantees, therefore 
> their behaviors may contradict each other in some edge cases (say, we write 
> something to S3, but it's not immediately accessible for reading from another 
> end).
> 2. There are other AWS-based sources and sinks we may want to create in the 
> future: DynamoDB, Kinesis, SQS, etc.
> 3. boto3 already provides reasonably good logic for basic things like 
> retrying.
> Whatever path we choose, there's another problem related to this: we 
> currently cannot pass any global settings (say, pipeline options, or just an 
> arbitrary kwarg) to a filesystem. Because of that, we'd have to set up the 
> runner nodes to have AWS keys set up in the environment, which is not trivial 
> to achieve and doesn't look too clean either (I'd rather see one single place 
> for configuring the runner options).
> Also, it's worth mentioning that I already have a janky S3 filesystem 
> implementation that only supports DirectRunner at the moment (because of the 
> previous paragraph). I'm perfectly fine finishing it myself, with some 
> guidance from the maintainers.
> Where should I move on from here, and whose input should I be looking for?
> Thanks!
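To illustrate the second path, a sketch only, assuming boto3 is installed and AWS 
credentials are already available to the process (which is precisely the 
configuration problem raised above):

{noformat}
import boto3

def read_s3_object(bucket, key):
    # boto3 resolves credentials from the environment, config files, or an
    # instance profile; on arbitrary runner nodes that is the part which is
    # awkward to configure today.
    s3 = boto3.client('s3')
    response = s3.get_object(Bucket=bucket, Key=key)
    return response['Body'].read()
{noformat}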



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-8113) FlinkRunner: Stage files from context classloader

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8113?focusedWorklogId=307406=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307406
 ]

ASF GitHub Bot logged work on BEAM-8113:


Author: ASF GitHub Bot
Created on: 05/Sep/19 19:44
Start Date: 05/Sep/19 19:44
Worklog Time Spent: 10m 
  Work Description: je-ik commented on issue #9451: [BEAM-8113] Stage files 
from context classloader
URL: https://github.com/apache/beam/pull/9451#issuecomment-528549486
 
 
   @lukecwik can I take your last comment as that you are okay with this PR? 
I'd like to merge it tomorrow, so that I don't have to rebase it again and 
again. :-)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307406)
Time Spent: 6h 40m  (was: 6.5h)

> FlinkRunner: Stage files from context classloader
> -
>
> Key: BEAM-8113
> URL: https://issues.apache.org/jira/browse/BEAM-8113
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Jan Lukavský
>Assignee: Jan Lukavský
>Priority: Major
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> Currently, only files from {{FlinkRunner.class.getClassLoader()}} are staged 
> by default. Add also files from 
> {{Thread.currentThread().getContextClassLoader()}}.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-8113) FlinkRunner: Stage files from context classloader

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8113?focusedWorklogId=307405=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307405
 ]

ASF GitHub Bot logged work on BEAM-8113:


Author: ASF GitHub Bot
Created on: 05/Sep/19 19:42
Start Date: 05/Sep/19 19:42
Worklog Time Spent: 10m 
  Work Description: je-ik commented on issue #9451: [BEAM-8113] Stage files 
from context classloader
URL: https://github.com/apache/beam/pull/9451#issuecomment-528548813
 
 
   Run Flink ValidatesRunner
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307405)
Time Spent: 6.5h  (was: 6h 20m)

> FlinkRunner: Stage files from context classloader
> -
>
> Key: BEAM-8113
> URL: https://issues.apache.org/jira/browse/BEAM-8113
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Jan Lukavský
>Assignee: Jan Lukavský
>Priority: Major
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> Currently, only files from {{FlinkRunner.class.getClassLoader()}} are staged 
> by default. Add also files from 
> {{Thread.currentThread().getContextClassLoader()}}.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-8157) Flink state requests return wrong state in timers when encoded key is length-prefixed

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8157?focusedWorklogId=307397=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307397
 ]

ASF GitHub Bot logged work on BEAM-8157:


Author: ASF GitHub Bot
Created on: 05/Sep/19 19:18
Start Date: 05/Sep/19 19:18
Worklog Time Spent: 10m 
  Work Description: tweise commented on pull request #9484: [BEAM-8157] 
Remove length prefix from state key for Flink's state backend
URL: https://github.com/apache/beam/pull/9484#discussion_r321437183
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperator.java
 ##
 @@ -343,11 +343,11 @@ public void clear(K key, W window) {
 }
 
 private void prepareStateBackend(K key) {
-  // Key for state request is shipped already encoded as ByteString,
-  // this is mostly a wrapping with ByteBuffer. We still follow the
-  // usual key encoding procedure.
-  // final ByteBuffer encodedKey = FlinkKeyUtils.encodeKey(key, 
keyCoder);
-  final ByteBuffer encodedKey = ByteBuffer.wrap(key.toByteArray());
+  // Key for state request is shipped already encoded as ByteString, 
but it is
 
 Review comment:
   https://gist.github.com/tweise/e1d95369234a52b12f90794f6032f039
   
   That's working on 2.14 and also on master as of 
4460f03cefbde5cf66c797408439321de2f40692
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307397)
Time Spent: 2h  (was: 1h 50m)

> Flink state requests return wrong state in timers when encoded key is 
> length-prefixed
> -
>
> Key: BEAM-8157
> URL: https://issues.apache.org/jira/browse/BEAM-8157
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.13.0
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Due to multiple changes made in BEAM-7126, the Flink internal key encoding is 
> broken when the key is encoded with a length prefix. The Flink runner 
> requires the internal key to be encoded without a length prefix. 
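
The mismatch is easy to see at the byte level: a length-prefixed encoding of a 
key is not byte-equal to the raw encoding, so a state lookup keyed on one form 
misses entries stored under the other. A rough illustration using the Python 
SDK's coders (an analogy only; the runner code in question is the Java 
ExecutableStageDoFnOperator discussed in the review thread above):

{code}
# Illustration only: contrast the raw byte encoding of a key with its
# length-prefixed form. A state backend needs both sides of a lookup to
# agree on which form is used.
from apache_beam.coders.coders import BytesCoder, LengthPrefixCoder

key = b'user-42'
raw = BytesCoder().encode(key)                           # just the key bytes
prefixed = LengthPrefixCoder(BytesCoder()).encode(key)   # varint length + key bytes

print(raw)               # b'user-42'
print(prefixed)          # e.g. b'\x07user-42'
print(raw == prefixed)   # False
{code}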



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7060) Design Py3-compatible typehints annotation support in Beam 3.

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7060?focusedWorklogId=307388=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307388
 ]

ASF GitHub Bot logged work on BEAM-7060:


Author: ASF GitHub Bot
Created on: 05/Sep/19 18:15
Start Date: 05/Sep/19 18:15
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #9283: [BEAM-7060] Type hints 
from Python 3 annotations
URL: https://github.com/apache/beam/pull/9283#issuecomment-528509353
 
 
   Run Python 2 PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307388)
Time Spent: 15h 20m  (was: 15h 10m)

> Design Py3-compatible typehints annotation support in Beam 3.
> -
>
> Key: BEAM-7060
> URL: https://issues.apache.org/jira/browse/BEAM-7060
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Udi Meiri
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 15h 20m
>  Remaining Estimate: 0h
>
> Existing [Typehints implementation in 
> Beam|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/] 
> heavily relies on internal details of the CPython implementation, and some of 
> the assumptions of this implementation broke as of Python 3.6; see for 
> example https://issues.apache.org/jira/browse/BEAM-6877, which makes 
> typehints support unusable on Python 3.6 as of now. [Python 3 Kanban 
> Board|https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245=detail]
>  lists several specific typehints-related breakages, prefixed with "TypeHints 
> Py3 Error".
> We need to decide whether to:
> - Deprecate the in-house typehints implementation.
> - Continue to support the in-house implementation, which at this point is stale 
> code and has other known issues.
> - Attempt to use some off-the-shelf libraries for supporting 
> type annotations, such as Pytype, Mypy, or PyAnnotate.
> WRT this decision we also need to plan immediate next steps to unblock 
> adoption of Beam for Python 3.6+ users. One potential option may be to have the 
> Beam SDK ignore any typehint annotations on Py 3.6+.
> cc: [~udim], [~altay], [~robertwb].
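
For concreteness, the kind of annotation the referenced work aims to pick up 
looks like the following; a hypothetical, minimal example of Python 3 function 
annotations used as type information (the function name is illustrative, not 
taken from the Beam codebase):

{code}
# Hypothetical example: Python 3 annotations carrying the element types that
# previously required Beam's decorator-based typehints.
from typing import Iterable

def split_words(line: str) -> Iterable[str]:
    # str in, an iterable of str out.
    return line.split()

print(split_words("hello beam"))
{code}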



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (BEAM-7819) PubsubMessage message parsing is lacking non-attribute fields

2019-09-05 Thread Udi Meiri (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923683#comment-16923683
 ] 

Udi Meiri commented on BEAM-7819:
-

Also, you might get inconsistent results from Dataflow: sometimes the 
message_id and publish_time will be filled and sometimes they will be empty. 
The feature is currently being rolled out.

> PubsubMessage message parsing is lacking non-attribute fields
> -
>
> Key: BEAM-7819
> URL: https://issues.apache.org/jira/browse/BEAM-7819
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Ahmet Altay
>Assignee: Matthew Darwin
>Priority: Major
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> User reported issue: 
> https://lists.apache.org/thread.html/139b0c15abc6471a2e2202d76d915c645a529a23ecc32cd9cfecd315@%3Cuser.beam.apache.org%3E
> """
> Looking at the source code, with my untrained python eyes, I think if the 
> intention is to include the message id and the publish time in the attributes 
> attribute of the PubSubMessage type, then the protobuf mapping is missing 
> something:-
> @staticmethod
> def _from_proto_str(proto_msg):
> """Construct from serialized form of ``PubsubMessage``.
> Args:
> proto_msg: String containing a serialized protobuf of type
> https://cloud.google.com/pubsub/docs/reference/rpc/google.pubsub.v1#google.pubsub.v1.PubsubMessage
> Returns:
> A new PubsubMessage object.
> """
> msg = pubsub.types.pubsub_pb2.PubsubMessage()
> msg.ParseFromString(proto_msg)
> # Convert ScalarMapContainer to dict.
> attributes = dict((key, msg.attributes[key]) for key in msg.attributes)
> return PubsubMessage(msg.data, attributes)
> The protobuf definition is here:-
> https://cloud.google.com/pubsub/docs/reference/rpc/google.pubsub.v1#google.pubsub.v1.PubsubMessage
> and so it looks as if the message_id and publish_time are not being parsed as 
> they are separate from the attributes. Perhaps the PubsubMessage class needs 
> expanding to include these as attributes, or they would need adding to the 
> dictionary for attributes. This would only need doing for the _from_proto_str 
> as obviously they would not need to be populated when transmitting a message 
> to PubSub.
> My python is not great, I'm assuming the latter option would need to look 
> something like this?
> attributes = dict((key, msg.attributes[key]) for key in msg.attributes)
> attributes.update({'message_id': msg.message_id, 'publish_time': 
> msg.publish_time})
> return PubsubMessage(msg.data, attributes)
> """



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Comment Edited] (BEAM-7819) PubsubMessage message parsing is lacking non-attribute fields

2019-09-05 Thread Udi Meiri (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923665#comment-16923665
 ] 

Udi Meiri edited comment on BEAM-7819 at 9/5/19 6:06 PM:
-

Adding '--kms_key_name ""' should fix the issue.
Don't forget to rerun "python setup.py sdist" to recreate the tarball passed to 
sdk_location every time.
I'm not sure what the worker_jar does but I'd leave it in. :)

{code}
python setup.py sdist && ./scripts/run_integration_test.sh --project 
[my-project] --gcs_location gs://[my-project]/tmp --test_opts 
--tests=apache_beam.io.gcp.pubsub_integration_test --runner TestDataflowRunner 
--sdk_location ./dist/apache-beam-2.16.0.dev0.tar.gz --worker_jar 
../../runners/google-cloud-dataflow-java/worker/build/libs/beam-runners-google-cloud-dataflow-java-fn-api-worker-2.16.0-SNAPSHOT.jar
 --kms_key_name ""
{code}


was (Author: udim):
Adding '--kms_key_name ""' should fix the issue.
Don't forget to rerun "python setup.py sdist" to recreate the tarball passed to 
sdk_location every time.
I'm not sure what the worker_jar does but I'd leave it in. :)

{code}
python setup.py sdist && ./scripts/run_integration_test.sh --project 
[my-project] --gcs_location gs://[my-project]/tmp --test_opts 
--tests=apache_beam.io.gcp.pubsub_integration_test --runner TestDataflowRunner 
--sdk_location ./dist/apache-beam-2.16.0.dev0.tar.gz --worker_jar 
../../runners/google-cloud-dataflow-java/worker/build/libs/beam-runners-google-cloud-dataflow-java-fn-api-worker-2.16.0-SNAPSHOT.jar
 --kms_key_name=""
{code}

> PubsubMessage message parsing is lacking non-attribute fields
> -
>
> Key: BEAM-7819
> URL: https://issues.apache.org/jira/browse/BEAM-7819
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Ahmet Altay
>Assignee: Matthew Darwin
>Priority: Major
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> User reported issue: 
> https://lists.apache.org/thread.html/139b0c15abc6471a2e2202d76d915c645a529a23ecc32cd9cfecd315@%3Cuser.beam.apache.org%3E
> """
> Looking at the source code, with my untrained python eyes, I think if the 
> intention is to include the message id and the publish time in the attributes 
> attribute of the PubSubMessage type, then the protobuf mapping is missing 
> something:-
> @staticmethod
> def _from_proto_str(proto_msg):
> """Construct from serialized form of ``PubsubMessage``.
> Args:
> proto_msg: String containing a serialized protobuf of type
> https://cloud.google.com/pubsub/docs/reference/rpc/google.pubsub.v1#google.pubsub.v1.PubsubMessage
> Returns:
> A new PubsubMessage object.
> """
> msg = pubsub.types.pubsub_pb2.PubsubMessage()
> msg.ParseFromString(proto_msg)
> # Convert ScalarMapContainer to dict.
> attributes = dict((key, msg.attributes[key]) for key in msg.attributes)
> return PubsubMessage(msg.data, attributes)
> The protobuf definition is here:-
> https://cloud.google.com/pubsub/docs/reference/rpc/google.pubsub.v1#google.pubsub.v1.PubsubMessage
> and so it looks as if the message_id and publish_time are not being parsed as 
> they are separate from the attributes. Perhaps the PubsubMessage class needs 
> expanding to include these as attributes, or they would need adding to the 
> dictionary for attributes. This would only need doing for the _from_proto_str 
> as obviously they would not need to be populated when transmitting a message 
> to PubSub.
> My python is not great, I'm assuming the latter option would need to look 
> something like this?
> attributes = dict((key, msg.attributes[key]) for key in msg.attributes)
> attributes.update({'message_id': msg.message_id, 'publish_time': 
> msg.publish_time})
> return PubsubMessage(msg.data, attributes)
> """



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (BEAM-7819) PubsubMessage message parsing is lacking non-attribute fields

2019-09-05 Thread Udi Meiri (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923665#comment-16923665
 ] 

Udi Meiri commented on BEAM-7819:
-

Adding '--kms_key_name ""' should fix the issue.
Don't forget to rerun "python setup.py sdist" to recreate the tarball passed to 
sdk_location every time.
I'm not sure what the worker_jar does but I'd leave it in. :)

{code}
python setup.py sdist && ./scripts/run_integration_test.sh --project 
[my-project] --gcs_location gs://[my-project]/tmp --test_opts 
--tests=apache_beam.io.gcp.pubsub_integration_test --runner TestDataflowRunner 
--sdk_location ./dist/apache-beam-2.16.0.dev0.tar.gz --worker_jar 
../../runners/google-cloud-dataflow-java/worker/build/libs/beam-runners-google-cloud-dataflow-java-fn-api-worker-2.16.0-SNAPSHOT.jar
 --kms_key_name=""
{code}

> PubsubMessage message parsing is lacking non-attribute fields
> -
>
> Key: BEAM-7819
> URL: https://issues.apache.org/jira/browse/BEAM-7819
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Ahmet Altay
>Assignee: Matthew Darwin
>Priority: Major
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> User reported issue: 
> https://lists.apache.org/thread.html/139b0c15abc6471a2e2202d76d915c645a529a23ecc32cd9cfecd315@%3Cuser.beam.apache.org%3E
> """
> Looking at the source code, with my untrained python eyes, I think if the 
> intention is to include the message id and the publish time in the attributes 
> attribute of the PubSubMessage type, then the protobuf mapping is missing 
> something:-
> @staticmethod
> def _from_proto_str(proto_msg):
> """Construct from serialized form of ``PubsubMessage``.
> Args:
> proto_msg: String containing a serialized protobuf of type
> https://cloud.google.com/pubsub/docs/reference/rpc/google.pubsub.v1#google.pubsub.v1.PubsubMessage
> Returns:
> A new PubsubMessage object.
> """
> msg = pubsub.types.pubsub_pb2.PubsubMessage()
> msg.ParseFromString(proto_msg)
> # Convert ScalarMapContainer to dict.
> attributes = dict((key, msg.attributes[key]) for key in msg.attributes)
> return PubsubMessage(msg.data, attributes)
> The protobuf definition is here:-
> https://cloud.google.com/pubsub/docs/reference/rpc/google.pubsub.v1#google.pubsub.v1.PubsubMessage
> and so it looks as if the message_id and publish_time are not being parsed as 
> they are separate from the attributes. Perhaps the PubsubMessage class needs 
> expanding to include these as attributes, or they would need adding to the 
> dictionary for attributes. This would only need doing for the _from_proto_str 
> as obviously they would not need to be populated when transmitting a message 
> to PubSub.
> My python is not great, I'm assuming the latter option would need to look 
> something like this?
> attributes = dict((key, msg.attributes[key]) for key in msg.attributes)
> attributes.update({'message_id': msg.message_id, 'publish_time': 
> msg.publish_time})
> return PubsubMessage(msg.data, attributes)
> """



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7859) Portable Wordcount on Spark runner does not work in DOCKER execution mode.

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7859?focusedWorklogId=307374=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307374
 ]

ASF GitHub Bot logged work on BEAM-7859:


Author: ASF GitHub Bot
Created on: 05/Sep/19 17:51
Start Date: 05/Sep/19 17:51
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #9421: [BEAM-7859] set SDK 
worker parallelism to 1 in word count test
URL: https://github.com/apache/beam/pull/9421#issuecomment-528496517
 
 
   Ping @angoenka -- this should be a no-op
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307374)
Time Spent: 50m  (was: 40m)

> Portable Wordcount on Spark runner does not work in DOCKER execution mode.
> --
>
> Key: BEAM-7859
> URL: https://issues.apache.org/jira/browse/BEAM-7859
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark, sdk-py-harness
>Reporter: Valentyn Tymofieiev
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-spark
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The error was observed during Beam 2.14.0 release validation, see: 
> https://issues.apache.org/jira/browse/BEAM-7224?focusedCommentId=16896831=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16896831
> Looks like master currently fails with a different error, both in Loopback 
> and Docker modes.
> [~ibzib] [~altay] [~robertwb] [~angoenka]
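
The PR referenced above pins SDK worker parallelism to 1 in the word count 
test. A hedged sketch of how that option is typically passed from the Python 
side (flag names taken from the standard portable pipeline options, not from 
the PR diff; the job endpoint is a placeholder):

{code}
# Hypothetical illustration: options for running a portable pipeline against
# a job server with SDK worker parallelism pinned to 1.
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    '--runner=PortableRunner',
    '--job_endpoint=localhost:8099',   # placeholder job server address
    '--environment_type=DOCKER',
    '--sdk_worker_parallelism=1',
])
{code}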



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (BEAM-8153) PubSubIntegrationTest failing in post-commit

2019-09-05 Thread Udi Meiri (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923655#comment-16923655
 ] 

Udi Meiri commented on BEAM-8153:
-

Yes, the immediate issue is resolved.

> PubSubIntegrationTest failing in post-commit
> 
>
> Key: BEAM-8153
> URL: https://issues.apache.org/jira/browse/BEAM-8153
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Udi Meiri
>Assignee: Matthew Darwin
>Priority: Major
>
> Most likely due to: https://github.com/apache/beam/pull/9232
> {code}
> 11:44:31 
> ==
> 11:44:31 ERROR: test_streaming_with_attributes 
> (apache_beam.io.gcp.pubsub_integration_test.PubSubIntegrationTest)
> 11:44:31 
> --
> 11:44:31 Traceback (most recent call last):
> 11:44:31   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/io/gcp/pubsub_integration_test.py",
>  line 199, in test_streaming_with_attributes
> 11:44:31 self._test_streaming(with_attributes=True)
> 11:44:31   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/io/gcp/pubsub_integration_test.py",
>  line 191, in _test_streaming
> 11:44:31 timestamp_attribute=self.TIMESTAMP_ATTRIBUTE)
> 11:44:31   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/io/gcp/pubsub_it_pipeline.py",
>  line 91, in run_pipeline
> 11:44:31 result = p.run()
> 11:44:31   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/pipeline.py",
>  line 420, in run
> 11:44:31 return self.runner.run_pipeline(self, self._options)
> 11:44:31   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/runners/direct/test_direct_runner.py",
>  line 51, in run_pipeline
> 11:44:31 hc_assert_that(self.result, pickler.loads(on_success_matcher))
> 11:44:31   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/build/gradleenv/1398941891/lib/python3.7/site-packages/hamcrest/core/assert_that.py",
>  line 43, in assert_that
> 11:44:31 _assert_match(actual=arg1, matcher=arg2, reason=arg3)
> 11:44:31   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/build/gradleenv/1398941891/lib/python3.7/site-packages/hamcrest/core/assert_that.py",
>  line 49, in _assert_match
> 11:44:31 if not matcher.matches(actual):
> 11:44:31   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/build/gradleenv/1398941891/lib/python3.7/site-packages/hamcrest/core/core/allof.py",
>  line 16, in matches
> 11:44:31 if not matcher.matches(item):
> 11:44:31   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/build/gradleenv/1398941891/lib/python3.7/site-packages/hamcrest/core/base_matcher.py",
>  line 28, in matches
> 11:44:31 match_result = self._matches(item)
> 11:44:31   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/io/gcp/tests/pubsub_matcher.py",
>  line 91, in _matches
> 11:44:31 return Counter(self.messages) == Counter(self.expected_msg)
> 11:44:31   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/build/gradleenv/1398941891/lib/python3.7/collections/__init__.py",
>  line 566, in __init__
> 11:44:31 self.update(*args, **kwds)
> 11:44:31   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/build/gradleenv/1398941891/lib/python3.7/collections/__init__.py",
>  line 653, in update
> 11:44:31 _count_elements(self, iterable)
> 11:44:31   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/io/gcp/pubsub.py",
>  line 83, in __hash__
> 11:44:31 self.message_id, self.publish_time.seconds,
> 11:44:31 AttributeError: 'NoneType' object has no attribute 'seconds'
> {code}
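
The traceback points at a __hash__ that dereferences publish_time 
unconditionally, so it fails when Dataflow has not populated the field and it 
is None. A hedged sketch of a defensive variant, using a stand-in class 
(field names follow the traceback; the actual fix in Beam may differ):

{code}
# Hypothetical stand-in showing a __hash__ that tolerates publish_time=None.
class _Msg(object):
  def __init__(self, data, attributes, message_id=None, publish_time=None):
    self.data = data
    self.attributes = attributes
    self.message_id = message_id
    self.publish_time = publish_time

  def __hash__(self):
    ts = self.publish_time
    seconds = ts.seconds if ts is not None else 0
    nanos = ts.nanos if ts is not None else 0
    return hash((self.data, frozenset(self.attributes.items()),
                 self.message_id, seconds, nanos))

print(hash(_Msg(b'payload', {'k': 'v'})))  # works even without publish_time
{code}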



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7984) [python] The coder returned for typehints.List should be IterableCoder

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7984?focusedWorklogId=307367=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307367
 ]

ASF GitHub Bot logged work on BEAM-7984:


Author: ASF GitHub Bot
Created on: 05/Sep/19 17:42
Start Date: 05/Sep/19 17:42
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #9344: [BEAM-7984] The 
coder returned for typehints.List should be IterableCoder
URL: https://github.com/apache/beam/pull/9344#issuecomment-528493036
 
 
   Thanks for tracking this down. https://github.com/apache/beam/pull/9486
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307367)
Time Spent: 3h 20m  (was: 3h 10m)

> [python] The coder returned for typehints.List should be IterableCoder
> --
>
> Key: BEAM-7984
> URL: https://issues.apache.org/jira/browse/BEAM-7984
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> IterableCoder encodes a list and decodes to a list, but 
> typecoders.registry.get_coder(typehints.List[bytes]) returns a 
> FastPrimitiveCoder. I don't see any reason why this would be advantageous. 
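
A quick way to reproduce the observation, assuming the registry API named in 
the description lives in apache_beam.coders.typecoders (a sketch, not a test 
from the PR):

{code}
# Minimal reproduction sketch: ask the coder registry which coder it picks
# for a List[bytes] typehint.
from apache_beam.coders import typecoders
from apache_beam.typehints import typehints

coder = typecoders.registry.get_coder(typehints.List[bytes])
print(type(coder).__name__)  # reported to be a fast-primitives coder, not IterableCoder
{code}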



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Comment Edited] (BEAM-7819) PubsubMessage message parsing is lacking non-attribute fields

2019-09-05 Thread Matthew Darwin (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923646#comment-16923646
 ] 

Matthew Darwin edited comment on BEAM-7819 at 9/5/19 5:40 PM:
--

Struggling a little with running the integration test locally; does it need 
both the tarball build and the worker_jar?

./scripts/run_integration_test.sh --project [my-project] --gcs_location 
gs://[my-project]/tmp --test_opts 
--tests=apache_beam.io.gcp.pubsub_integration_test --runner TestDataflowRunner 
--sdk_location ./dist/apache-beam-2.16.0.dev0.tar.gz --worker_jar 
../../runners/google-cloud-dataflow-java/worker/build/libs/beam-runners-google-cloud-dataflow-java-fn-api-worker-2.16.0-SNAPSHOT.jar

{{AssertionError: }}
 {{Expected: (Test pipeline expected terminated in state: RUNNING and Expected 
4 messages.)}}
 {{ but: Test pipeline expected terminated in state: RUNNING Test pipeline job 
terminated in state: FAILED}}

In addition, for the pubsub integration test, given that pubsub will generate 
the message_id and publish_time, I'm not sure how exactly to handle that in the 
expected messages?


was (Author: matt-darwin):
Struggling a little with running the integration test locally; does it need 
both the tarball build and the worker_jar?

./scripts/run_integration_test.sh --project  [my-project] --gcs_location 
gs://[my-project]/tmp --test_opts 
--tests=apache_beam.io.gcp.pubsub_integration_test --runner TestDataflowRunner 
--sdk_location ./dist/apache-beam-2.16.0.dev0.tar.gz

{{AssertionError: }}
{{Expected: (Test pipeline expected terminated in state: RUNNING and Expected 4 
messages.)}}
{{ but: Test pipeline expected terminated in state: RUNNING Test pipeline job 
terminated in state: FAILED}}

In addition, for the pubsub integration test, given that pubsub will generate 
the message_id and publish_time, I'm not sure how exactly to handle that in the 
expected messages?

> PubsubMessage message parsing is lacking non-attribute fields
> -
>
> Key: BEAM-7819
> URL: https://issues.apache.org/jira/browse/BEAM-7819
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Ahmet Altay
>Assignee: Matthew Darwin
>Priority: Major
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> User reported issue: 
> https://lists.apache.org/thread.html/139b0c15abc6471a2e2202d76d915c645a529a23ecc32cd9cfecd315@%3Cuser.beam.apache.org%3E
> """
> Looking at the source code, with my untrained python eyes, I think if the 
> intention is to include the message id and the publish time in the attributes 
> attribute of the PubSubMessage type, then the protobuf mapping is missing 
> something:-
> @staticmethod
> def _from_proto_str(proto_msg):
> """Construct from serialized form of ``PubsubMessage``.
> Args:
> proto_msg: String containing a serialized protobuf of type
> https://cloud.google.com/pubsub/docs/reference/rpc/google.pubsub.v1#google.pubsub.v1.PubsubMessage
> Returns:
> A new PubsubMessage object.
> """
> msg = pubsub.types.pubsub_pb2.PubsubMessage()
> msg.ParseFromString(proto_msg)
> # Convert ScalarMapContainer to dict.
> attributes = dict((key, msg.attributes[key]) for key in msg.attributes)
> return PubsubMessage(msg.data, attributes)
> The protobuf definition is here:-
> https://cloud.google.com/pubsub/docs/reference/rpc/google.pubsub.v1#google.pubsub.v1.PubsubMessage
> and so it looks as if the message_id and publish_time are not being parsed as 
> they are separate from the attributes. Perhaps the PubsubMessage class needs 
> expanding to include these as attributes, or they would need adding to the 
> dictionary for attributes. This would only need doing for the _from_proto_str 
> as obviously they would not need to be populated when transmitting a message 
> to PubSub.
> My python is not great, I'm assuming the latter option would need to look 
> something like this?
> attributes = dict((key, msg.attributes[key]) for key in msg.attributes)
> attributes.update({'message_id': msg.message_id, 'publish_time': 
> msg.publish_time})
> return PubsubMessage(msg.data, attributes)
> """



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7967) Execute portable Flink application jar

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7967?focusedWorklogId=307365=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307365
 ]

ASF GitHub Bot logged work on BEAM-7967:


Author: ASF GitHub Bot
Created on: 05/Sep/19 17:39
Start Date: 05/Sep/19 17:39
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #9408:  [BEAM-7967] Execute 
portable Flink application jar
URL: https://github.com/apache/beam/pull/9408#issuecomment-528491791
 
 
   > When is this test intended to run? Postcommit only?
   
   @tweise  Postcommit or trigger only
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307365)
Time Spent: 1h 40m  (was: 1.5h)

> Execute portable Flink application jar
> --
>
> Key: BEAM-7967
> URL: https://issues.apache.org/jira/browse/BEAM-7967
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-flink
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> [https://docs.google.com/document/d/1kj_9JWxGWOmSGeZ5hbLVDXSTv-zBrx4kQRqOq85RYD4/edit#heading=h.oes73844vmhl]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Assigned] (BEAM-7819) PubsubMessage message parsing is lacking non-attribute fields

2019-09-05 Thread Matthew Darwin (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthew Darwin reassigned BEAM-7819:


Assignee: Matthew Darwin  (was: Udi Meiri)

> PubsubMessage message parsing is lacking non-attribute fields
> -
>
> Key: BEAM-7819
> URL: https://issues.apache.org/jira/browse/BEAM-7819
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Ahmet Altay
>Assignee: Matthew Darwin
>Priority: Major
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> User reported issue: 
> https://lists.apache.org/thread.html/139b0c15abc6471a2e2202d76d915c645a529a23ecc32cd9cfecd315@%3Cuser.beam.apache.org%3E
> """
> Looking at the source code, with my untrained python eyes, I think if the 
> intention is to include the message id and the publish time in the attributes 
> attribute of the PubSubMessage type, then the protobuf mapping is missing 
> something:-
> @staticmethod
> def _from_proto_str(proto_msg):
> """Construct from serialized form of ``PubsubMessage``.
> Args:
> proto_msg: String containing a serialized protobuf of type
> https://cloud.google.com/pubsub/docs/reference/rpc/google.pubsub.v1#google.pubsub.v1.PubsubMessage
> Returns:
> A new PubsubMessage object.
> """
> msg = pubsub.types.pubsub_pb2.PubsubMessage()
> msg.ParseFromString(proto_msg)
> # Convert ScalarMapContainer to dict.
> attributes = dict((key, msg.attributes[key]) for key in msg.attributes)
> return PubsubMessage(msg.data, attributes)
> The protobuf definition is here:-
> https://cloud.google.com/pubsub/docs/reference/rpc/google.pubsub.v1#google.pubsub.v1.PubsubMessage
> and so it looks as if the message_id and publish_time are not being parsed as 
> they are seperate from the attributes. Perhaps the PubsubMessage class needs 
> expanding to include these as attributes, or they would need adding to the 
> dictionary for attributes. This would only need doing for the _from_proto_str 
> as obviously they would not need to be populated when transmitting a message 
> to PubSub.
> My python is not great, I'm assuming the latter option would need to look 
> something like this?
> attributes = dict((key, msg.attributes[key]) for key in msg.attributes)
> attributes.update({'message_id': msg.message_id, 'publish_time': 
> msg.publish_time})
> return PubsubMessage(msg.data, attributes)
> """



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (BEAM-7819) PubsubMessage message parsing is lacking non-attribute fields

2019-09-05 Thread Matthew Darwin (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923646#comment-16923646
 ] 

Matthew Darwin commented on BEAM-7819:
--

Struggling a little with running the integration test locally; does it need 
both the tarball build and the worker_jar?

./scripts/run_integration_test.sh --project  [my-project] --gcs_location 
gs://[my-project]/tmp --test_opts 
--tests=apache_beam.io.gcp.pubsub_integration_test --runner TestDataflowRunner 
--sdk_location ./dist/apache-beam-2.16.0.dev0.tar.gz

{{AssertionError: }}
{{Expected: (Test pipeline expected terminated in state: RUNNING and Expected 4 
messages.)}}
{{ but: Test pipeline expected terminated in state: RUNNING Test pipeline job 
terminated in state: FAILED}}

In addition, for the pubsub integration test, given that pubsub will generate 
the message_id and publish_time, I'm not sure how exactly to handle that in the 
expected messages?

> PubsubMessage message parsing is lacking non-attribute fields
> -
>
> Key: BEAM-7819
> URL: https://issues.apache.org/jira/browse/BEAM-7819
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Ahmet Altay
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> User reported issue: 
> https://lists.apache.org/thread.html/139b0c15abc6471a2e2202d76d915c645a529a23ecc32cd9cfecd315@%3Cuser.beam.apache.org%3E
> """
> Looking at the source code, with my untrained python eyes, I think if the 
> intention is to include the message id and the publish time in the attributes 
> attribute of the PubSubMessage type, then the protobuf mapping is missing 
> something:-
> @staticmethod
> def _from_proto_str(proto_msg):
> """Construct from serialized form of ``PubsubMessage``.
> Args:
> proto_msg: String containing a serialized protobuf of type
> https://cloud.google.com/pubsub/docs/reference/rpc/google.pubsub.v1#google.pubsub.v1.PubsubMessage
> Returns:
> A new PubsubMessage object.
> """
> msg = pubsub.types.pubsub_pb2.PubsubMessage()
> msg.ParseFromString(proto_msg)
> # Convert ScalarMapContainer to dict.
> attributes = dict((key, msg.attributes[key]) for key in msg.attributes)
> return PubsubMessage(msg.data, attributes)
> The protobuf definition is here:-
> https://cloud.google.com/pubsub/docs/reference/rpc/google.pubsub.v1#google.pubsub.v1.PubsubMessage
> and so it looks as if the message_id and publish_time are not being parsed as 
> they are separate from the attributes. Perhaps the PubsubMessage class needs 
> expanding to include these as attributes, or they would need adding to the 
> dictionary for attributes. This would only need doing for the _from_proto_str 
> as obviously they would not need to be populated when transmitting a message 
> to PubSub.
> My python is not great, I'm assuming the latter option would need to look 
> something like this?
> attributes = dict((key, msg.attributes[key]) for key in msg.attributes)
> attributes.update({'message_id': msg.message_id, 'publish_time': 
> msg.publish_time})
> return PubsubMessage(msg.data, attributes)
> """



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7967) Execute portable Flink application jar

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7967?focusedWorklogId=307364=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307364
 ]

ASF GitHub Bot logged work on BEAM-7967:


Author: ASF GitHub Bot
Created on: 05/Sep/19 17:37
Start Date: 05/Sep/19 17:37
Worklog Time Spent: 10m 
  Work Description: tweise commented on issue #9408:  [BEAM-7967] Execute 
portable Flink application jar
URL: https://github.com/apache/beam/pull/9408#issuecomment-528491002
 
 
   I mean it is great to have a backdoor to publish container snapshots, but..
   
   When is this test intended to run? Postcommit only?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307364)
Time Spent: 1.5h  (was: 1h 20m)

> Execute portable Flink application jar
> --
>
> Key: BEAM-7967
> URL: https://issues.apache.org/jira/browse/BEAM-7967
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-flink
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> [https://docs.google.com/document/d/1kj_9JWxGWOmSGeZ5hbLVDXSTv-zBrx4kQRqOq85RYD4/edit#heading=h.oes73844vmhl]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7967) Execute portable Flink application jar

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7967?focusedWorklogId=307357=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307357
 ]

ASF GitHub Bot logged work on BEAM-7967:


Author: ASF GitHub Bot
Created on: 05/Sep/19 17:30
Start Date: 05/Sep/19 17:30
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #9408:  [BEAM-7967] Execute 
portable Flink application jar
URL: https://github.com/apache/beam/pull/9408#issuecomment-528488379
 
 
   Run Seed Job
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307357)
Time Spent: 1h 20m  (was: 1h 10m)

> Execute portable Flink application jar
> --
>
> Key: BEAM-7967
> URL: https://issues.apache.org/jira/browse/BEAM-7967
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-flink
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> [https://docs.google.com/document/d/1kj_9JWxGWOmSGeZ5hbLVDXSTv-zBrx4kQRqOq85RYD4/edit#heading=h.oes73844vmhl]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-8157) Flink state requests return wrong state in timers when encoded key is length-prefixed

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8157?focusedWorklogId=307355=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307355
 ]

ASF GitHub Bot logged work on BEAM-8157:


Author: ASF GitHub Bot
Created on: 05/Sep/19 17:25
Start Date: 05/Sep/19 17:25
Worklog Time Spent: 10m 
  Work Description: tweise commented on pull request #9484: [BEAM-8157] 
Remove length prefix from state key for Flink's state backend
URL: https://github.com/apache/beam/pull/9484#discussion_r321387508
 
 

 ##
 File path: runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperator.java
 ##
 @@ -343,11 +343,11 @@ public void clear(K key, W window) {
 }
 
 private void prepareStateBackend(K key) {
-  // Key for state request is shipped already encoded as ByteString,
-  // this is mostly a wrapping with ByteBuffer. We still follow the
-  // usual key encoding procedure.
-  // final ByteBuffer encodedKey = FlinkKeyUtils.encodeKey(key, keyCoder);
-  final ByteBuffer encodedKey = ByteBuffer.wrap(key.toByteArray());
+  // Key for state request is shipped already encoded as ByteString, but it is
 
 Review comment:
   There is something that I'm missing. State and timers work as of release 
2.14 with Flink 1.8. What has caused it to break?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307355)
Time Spent: 1h 50m  (was: 1h 40m)

> Flink state requests return wrong state in timers when encoded key is 
> length-prefixed
> -
>
> Key: BEAM-8157
> URL: https://issues.apache.org/jira/browse/BEAM-8157
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Affects Versions: 2.13.0
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Due to multiple changes made in BEAM-7126, the Flink internal key encoding is 
> broken when the key is encoded with a length prefix. The Flink runner 
> requires the internal key to be encoded without a length prefix. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7616) urlopen calls could get stuck without a timeout

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7616?focusedWorklogId=307354=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307354
 ]

ASF GitHub Bot logged work on BEAM-7616:


Author: ASF GitHub Bot
Created on: 05/Sep/19 17:23
Start Date: 05/Sep/19 17:23
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #9401: [BEAM-7616] apitools 
use urllib with the global timeout. Set it to 60 seconds # to prevent network 
related stuckness issues.
URL: https://github.com/apache/beam/pull/9401#issuecomment-528485156
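
The PR title above describes the approach: apitools goes through urllib, and 
urlopen calls made without an explicit timeout honor the process-wide socket 
default. A minimal sketch of that global-timeout idea (the 60-second figure is 
from the PR title; where Beam sets it is not shown here):

{code}
# Sketch of the global-timeout approach: urlopen calls that pass no explicit
# timeout fall back to the process-wide socket default set below.
import socket

socket.setdefaulttimeout(60)  # seconds; prevents indefinitely hung requests
{code}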
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307354)
Time Spent: 5h  (was: 4h 50m)

> urlopen calls could get stuck without a timeout
> ---
>
> Key: BEAM-7616
> URL: https://issues.apache.org/jira/browse/BEAM-7616
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Udi Meiri
>Priority: Blocker
> Fix For: 2.14.0, 2.16.0
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7616) urlopen calls could get stuck without a timeout

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7616?focusedWorklogId=307353=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307353
 ]

ASF GitHub Bot logged work on BEAM-7616:


Author: ASF GitHub Bot
Created on: 05/Sep/19 17:22
Start Date: 05/Sep/19 17:22
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #9401: [BEAM-7616] apitools 
use urllib with the global timeout. Set it to 60 seconds # to prevent network 
related stuckness issues.
URL: https://github.com/apache/beam/pull/9401#issuecomment-528485086
 
 
   Run Python 2 PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307353)
Time Spent: 4h 50m  (was: 4h 40m)

> urlopen calls could get stuck without a timeout
> ---
>
> Key: BEAM-7616
> URL: https://issues.apache.org/jira/browse/BEAM-7616
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Assignee: Udi Meiri
>Priority: Blocker
> Fix For: 2.14.0, 2.16.0
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-8131) Provide Kubernetes setup with Prometheus

2019-09-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8131?focusedWorklogId=307352=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307352
 ]

ASF GitHub Bot logged work on BEAM-8131:


Author: ASF GitHub Bot
Created on: 05/Sep/19 17:21
Start Date: 05/Sep/19 17:21
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on issue #9482: [BEAM-8131] Provide 
Kubernetes setup for Prometheus
URL: https://github.com/apache/beam/pull/9482#issuecomment-528484358
 
 
   In the doc I see that we decided to move from BQ to Prometheus.
   Can you add some background on this decision compared to using the Grafana BQ 
connector?
   https://grafana.com/grafana/plugins/doitintl-bigquery-datasource/installation
   
   (I added comment in doc as well)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 307352)
Time Spent: 1h  (was: 50m)

> Provide Kubernetes setup with Prometheus
> 
>
> Key: BEAM-8131
> URL: https://issues.apache.org/jira/browse/BEAM-8131
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


  1   2   >