[jira] [Work logged] (BEAM-8376) Add FirestoreIO connector to Java SDK

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8376?focusedWorklogId=359742&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359742
 ]

ASF GitHub Bot logged work on BEAM-8376:


Author: ASF GitHub Bot
Created on: 14/Dec/19 02:13
Start Date: 14/Dec/19 02:13
Worklog Time Spent: 10m 
  Work Description: fredzqm commented on issue #10187: [BEAM-8376] Initial 
version of firestore connector JavaSDK
URL: https://github.com/apache/beam/pull/10187#issuecomment-565671552
 
 
   Please note that a WriteBatch is an atomic transaction.
   A large WriteBatch can lead to contention and high error rates.
   
   We are working on launching a batchWrite API for non-atomic data ingestion use cases. Until it launches, the next best option is to write each document separately.
   
   (BTW, there is a limit of 500 writes per commit.)
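   Given the 500-writes-per-commit limit mentioned above, a connector that batches writes has to split its input into commit-sized chunks first. A minimal, self-contained sketch of that partitioning step (plain Java, no Firestore client; `BatchChunker` and `maxPerBatch` are illustrative names, not part of the PR):

```java
import java.util.ArrayList;
import java.util.List;

public class BatchChunker {
    // Firestore rejects commits with more than 500 writes, so split the
    // input into sublists that each fit in a single WriteBatch.
    static <T> List<List<T>> chunk(List<T> input, int maxPerBatch) {
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < input.size(); i += maxPerBatch) {
            chunks.add(input.subList(i, Math.min(i + maxPerBatch, input.size())));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 1201; i++) {
            docs.add(i);
        }
        // 1201 writes -> sublists of 500, 500, and 201.
        System.out.println(chunk(docs, 500).size()); // prints 3
    }
}
```

   Each sublist would then be committed through its own WriteBatch (or, once the non-atomic batchWrite API launches, sent through that instead).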
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 359742)
Time Spent: 2h 10m  (was: 2h)

> Add FirestoreIO connector to Java SDK
> -
>
> Key: BEAM-8376
> URL: https://issues.apache.org/jira/browse/BEAM-8376
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Stefan Djelekar
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Motivation:
> There is no Firestore connector for Java SDK at the moment.
> Having it will enhance the integrations with database options on the Google 
> Cloud Platform.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8376) Add FirestoreIO connector to Java SDK

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8376?focusedWorklogId=359740&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359740
 ]

ASF GitHub Bot logged work on BEAM-8376:


Author: ASF GitHub Bot
Created on: 14/Dec/19 02:12
Start Date: 14/Dec/19 02:12
Worklog Time Spent: 10m 
  Work Description: fredzqm commented on pull request #10187: [BEAM-8376] 
Initial version of firestore connector JavaSDK
URL: https://github.com/apache/beam/pull/10187#discussion_r357890339
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreBatchRequester.java
 ##
 @@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.sdk.io.gcp.firestore;
+
+import com.google.api.core.ApiFuture;
+import com.google.cloud.firestore.DocumentReference;
+import com.google.cloud.firestore.Firestore;
+import com.google.cloud.firestore.WriteBatch;
+import com.google.cloud.firestore.WriteResult;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.List;
+import java.util.concurrent.ExecutionException;
+
+public class FirestoreBatchRequester<T> {
+    private Firestore firestoreClient;
+    private static final Logger LOG = LoggerFactory.getLogger(FirestoreBatchRequester.class);
+
+    public FirestoreBatchRequester(Firestore firestoreClient) {
+        this.firestoreClient = firestoreClient;
+    }
+
+    public void commit(List<T> input, String collection, String documentId) throws ExecutionException, InterruptedException {
+        WriteBatch batch = firestoreClient.batch();
+
+        for (T object : input) {
+            DocumentReference docRef = getDocRef(collection, documentId);
+            batch.set(docRef, object);
+        }
+
+        ApiFuture<List<WriteResult>> commit = batch.commit();
 
 Review comment:
   Please a
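   The quoted hunk ends with the `commit` future merely assigned; whatever follows in the file must block on it (or chain a callback) so that commit failures are not silently dropped. A sketch of that blocking step, using `CompletableFuture` as a stand-in for `ApiFuture` so it runs without a Firestore client (`CommitAwait` and `awaitCommit` are illustrative names, not part of the PR):

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

public class CommitAwait {
    // Block until the (stand-in) batch commit completes; a failed commit
    // surfaces here as an ExecutionException instead of being lost.
    static int awaitCommit(CompletableFuture<List<String>> commitFuture)
            throws ExecutionException, InterruptedException {
        List<String> results = commitFuture.get();
        return results.size();
    }

    public static void main(String[] args) throws Exception {
        CompletableFuture<List<String>> fake =
                CompletableFuture.completedFuture(Arrays.asList("w1", "w2"));
        System.out.println(awaitCommit(fake)); // prints 2
    }
}
```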
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 359740)
Time Spent: 2h  (was: 1h 50m)

> Add FirestoreIO connector to Java SDK
> -
>
> Key: BEAM-8376
> URL: https://issues.apache.org/jira/browse/BEAM-8376
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Stefan Djelekar
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Motivation:
> There is no Firestore connector for Java SDK at the moment.
> Having it will enhance the integrations with database options on the Google 
> Cloud Platform.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8376) Add FirestoreIO connector to Java SDK

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8376?focusedWorklogId=359738&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359738
 ]

ASF GitHub Bot logged work on BEAM-8376:


Author: ASF GitHub Bot
Created on: 14/Dec/19 02:12
Start Date: 14/Dec/19 02:12
Worklog Time Spent: 10m 
  Work Description: fredzqm commented on pull request #10187: [BEAM-8376] 
Initial version of firestore connector JavaSDK
URL: https://github.com/apache/beam/pull/10187#discussion_r357890423
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreBatchRequest.java
 ##
 @@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.sdk.io.gcp.firestore;
+
+import com.google.cloud.firestore.DocumentReference;
+import com.google.cloud.firestore.Firestore;
+import com.google.cloud.firestore.WriteBatch;
+
+import java.util.List;
+
+
+public class FirestoreBatchRequest<T> {
+    private Firestore firestoreClient;
+
+    public FirestoreBatchRequest(Firestore firestoreClient) {
+        this.firestoreClient = firestoreClient;
+    }
+
+    public WriteBatch batchWithKey(List<T> input, String collection, String key) {
+        WriteBatch batch = firestoreClient.batch();
 
 Review comment:
   Please make sure this is addressed.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 359738)
Time Spent: 1h 50m  (was: 1h 40m)

> Add FirestoreIO connector to Java SDK
> -
>
> Key: BEAM-8376
> URL: https://issues.apache.org/jira/browse/BEAM-8376
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Stefan Djelekar
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Motivation:
> There is no Firestore connector for Java SDK at the moment.
> Having it will enhance the integrations with database options on the Google 
> Cloud Platform.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8376) Add FirestoreIO connector to Java SDK

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8376?focusedWorklogId=359739&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359739
 ]

ASF GitHub Bot logged work on BEAM-8376:


Author: ASF GitHub Bot
Created on: 14/Dec/19 02:12
Start Date: 14/Dec/19 02:12
Worklog Time Spent: 10m 
  Work Description: fredzqm commented on pull request #10187: [BEAM-8376] 
Initial version of firestore connector JavaSDK
URL: https://github.com/apache/beam/pull/10187#discussion_r357890339
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/firestore/FirestoreBatchRequester.java
 ##
 @@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.beam.sdk.io.gcp.firestore;
+
+import com.google.api.core.ApiFuture;
+import com.google.cloud.firestore.DocumentReference;
+import com.google.cloud.firestore.Firestore;
+import com.google.cloud.firestore.WriteBatch;
+import com.google.cloud.firestore.WriteResult;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.List;
+import java.util.concurrent.ExecutionException;
+
+public class FirestoreBatchRequester<T> {
+    private Firestore firestoreClient;
+    private static final Logger LOG = LoggerFactory.getLogger(FirestoreBatchRequester.class);
+
+    public FirestoreBatchRequester(Firestore firestoreClient) {
+        this.firestoreClient = firestoreClient;
+    }
+
+    public void commit(List<T> input, String collection, String documentId) throws ExecutionException, InterruptedException {
+        WriteBatch batch = firestoreClient.batch();
+
+        for (T object : input) {
+            DocumentReference docRef = getDocRef(collection, documentId);
+            batch.set(docRef, object);
+        }
+
+        ApiFuture<List<WriteResult>> commit = batch.commit();
 
 Review comment:
   Please a
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 359739)
Time Spent: 1h 50m  (was: 1h 40m)

> Add FirestoreIO connector to Java SDK
> -
>
> Key: BEAM-8376
> URL: https://issues.apache.org/jira/browse/BEAM-8376
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Stefan Djelekar
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Motivation:
> There is no Firestore connector for Java SDK at the moment.
> Having it will enhance the integrations with database options on the Google 
> Cloud Platform.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8933) BigQuery IO should support read/write in Arrow format

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8933?focusedWorklogId=359734&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359734
 ]

ASF GitHub Bot logged work on BEAM-8933:


Author: ASF GitHub Bot
Created on: 14/Dec/19 01:46
Start Date: 14/Dec/19 01:46
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #10369: [BEAM-8933] 
BigQueryIO Arrow for read
URL: https://github.com/apache/beam/pull/10369#issuecomment-565669088
 
 
   Turns out the issue was the incompatible Arrow version. The latest Spark release, v2.4.4, depends on Arrow 0.10; they've upgraded to Arrow 0.15.1 for Spark v3.0.0.
   
   I force-pushed to my branch, including a downgrade to 0.10.0, which seems to fix it. I also put it up as PR #10384.
   There may be some compatibility issues with the BQ Read API, though; I'm not sure.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 359734)
Time Spent: 2h 50m  (was: 2h 40m)

> BigQuery IO should support read/write in Arrow format
> -
>
> Key: BEAM-8933
> URL: https://issues.apache.org/jira/browse/BEAM-8933
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> As of right now BigQuery uses Avro format for reading and writing.
> We should add a config to BigQueryIO to specify which format to use (with 
> Avro as default).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8933) BigQuery IO should support read/write in Arrow format

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8933?focusedWorklogId=359732&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359732
 ]

ASF GitHub Bot logged work on BEAM-8933:


Author: ASF GitHub Bot
Created on: 14/Dec/19 01:42
Start Date: 14/Dec/19 01:42
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #10384: [WIP] 
[BEAM-8933] Utilities for converting Arrow schemas and reading Arrow batches as 
Rows
URL: https://github.com/apache/beam/pull/10384
 
 
   Adds `ArrowSchema` and `ArrowSchemaTest`
   
   Post-Commit Tests Status (on master branch)
   
   [standard Beam PR-template matrix of builds.apache.org build-status badges omitted]

[jira] [Resolved] (BEAM-8342) upgrade samza runner to use samza 1.3

2019-12-13 Thread Hai Lu (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hai Lu resolved BEAM-8342.
--
Fix Version/s: Not applicable
   Resolution: Fixed

> upgrade samza runner to use samza 1.3
> -
>
> Key: BEAM-8342
> URL: https://issues.apache.org/jira/browse/BEAM-8342
> Project: Beam
>  Issue Type: Task
>  Components: runner-samza
>Reporter: Hai Lu
>Assignee: Hai Lu
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> some changes are needed to support v1.3 of samza



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359717&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359717
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 14/Dec/19 00:25
Start Date: 14/Dec/19 00:25
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r357879617
 
 

 ##
 File path: sdks/java/io/thrift/src/main/antlr/DocumentGenerator.g
 ##
 @@ -0,0 +1,262 @@
+/*
+ * Copyright 2012 Facebook, Inc.
 
 Review comment:
   Yes, the original file can be found [here](https://github.com/facebookarchive/swift/blob/master/swift-idl-parser/src/main/antlr3/com/facebook/swift/parser/antlr/DocumentGenerator.g). SpotlessApply seems to have removed it in some of the .java files.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 359717)
Time Spent: 4h  (was: 3h 50m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359716&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359716
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 14/Dec/19 00:25
Start Date: 14/Dec/19 00:25
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r357879617
 
 

 ##
 File path: sdks/java/io/thrift/src/main/antlr/DocumentGenerator.g
 ##
 @@ -0,0 +1,262 @@
+/*
+ * Copyright 2012 Facebook, Inc.
 
 Review comment:
   Yes, the original file can be found [here](https://github.com/facebookarchive/swift/blob/master/swift-idl-parser/src/main/antlr3/com/facebook/swift/parser/antlr/Thrift.g). SpotlessApply seems to have removed it in some of the .java files.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 359716)
Time Spent: 3h 50m  (was: 3h 40m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=359714&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359714
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 14/Dec/19 00:24
Start Date: 14/Dec/19 00:24
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on issue #10190: [BEAM-8575] 
Added two unit tests to CombineTest class to test that Co…
URL: https://github.com/apache/beam/pull/10190#issuecomment-565658734
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 359714)
Time Spent: 35h 40m  (was: 35.5h)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 35h 40m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359715&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359715
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 14/Dec/19 00:24
Start Date: 14/Dec/19 00:24
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r357879437
 
 

 ##
 File path: sdks/java/io/thrift/src/main/antlr/Thrift.g
 ##
 @@ -0,0 +1,290 @@
+/*
+ *  Copyright 2008 Martin Traverso
+ *  Copyright 2012 Facebook, Inc.
 
 Review comment:
   Yes, the original file can be found [here](https://github.com/facebookarchive/swift/blob/master/swift-idl-parser/src/main/antlr3/com/facebook/swift/parser/antlr/Thrift.g). We've included it as part of the parser.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 359715)
Time Spent: 3h 40m  (was: 3.5h)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359712&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359712
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 14/Dec/19 00:04
Start Date: 14/Dec/19 00:04
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r357876447
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/parser/model/ConstInteger.java
 ##
 @@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift.parser.model;
+
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects;
+
+public class ConstInteger extends ConstValue {
 
 Review comment:
   Yes, this class is reused for `i64`.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 359712)
Time Spent: 3.5h  (was: 3h 20m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359711&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359711
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 14/Dec/19 00:01
Start Date: 14/Dec/19 00:01
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r357875927
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,708 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static java.lang.String.format;
+import static java.util.stream.Collectors.joining;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import java.nio.charset.Charset;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.io.Compression;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.io.thrift.parser.ThriftIdlParser;
+import org.apache.beam.sdk.io.thrift.parser.model.BaseType;
+import org.apache.beam.sdk.io.thrift.parser.model.Const;
+import org.apache.beam.sdk.io.thrift.parser.model.Definition;
+import org.apache.beam.sdk.io.thrift.parser.model.Document;
+import org.apache.beam.sdk.io.thrift.parser.model.Header;
+import org.apache.beam.sdk.io.thrift.parser.model.IdentifierType;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnumField;
+import org.apache.beam.sdk.io.thrift.parser.model.ListType;
+import org.apache.beam.sdk.io.thrift.parser.model.MapType;
+import org.apache.beam.sdk.io.thrift.parser.model.Service;
+import org.apache.beam.sdk.io.thrift.parser.model.StringEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.Struct;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftException;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftField;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftMethod;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftType;
+import org.apache.beam.sdk.io.thrift.parser.model.TypeAnnotation;
+import org.apache.beam.sdk.io.thrift.parser.model.Typedef;
+import org.apache.beam.sdk.io.thrift.parser.model.VoidType;
+import org.apache.beam.sdk.options.ValueProvider;
+import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PDone;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteSource;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing Thrift files.
+ *
+ * <h3>Reading Thrift Files</h3>
+ *
+ * <p>For simple reading, use {@link ThriftIO#read} with the desired file pattern to read from.
+ *
+ * <p>For example:
+ *
+ * <pre>{@code
+ * PCollection<Document> documents = pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }</pre>
+ *
+ * <p>For more advanced use cases, like reading each file in a {@link PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * <p>For example:
+ *
+ * <pre>{@code
+ * PCollection<FileIO.ReadableFile> files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern()))
+ *   .apply(FileIO.readMatches());
+ *

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359709&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359709
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 14/Dec/19 00:00
Start Date: 14/Dec/19 00:00
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r357875805
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359710&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359710
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 14/Dec/19 00:00
Start Date: 14/Dec/19 00:00
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r357875805
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##

[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=359708&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359708
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 13/Dec/19 23:54
Start Date: 13/Dec/19 23:54
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on issue #10383: [BEAM-8575] 
Added a unit test to test that Combine works with FixedWi…
URL: https://github.com/apache/beam/pull/10383#issuecomment-565653382
 
 
   R:  @chamikaramj
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 359708)
Time Spent: 35.5h  (was: 35h 20m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 35.5h
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359707&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359707
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 13/Dec/19 23:54
Start Date: 13/Dec/19 23:54
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r357874789
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359704&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359704
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 13/Dec/19 23:36
Start Date: 13/Dec/19 23:36
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on issue #10290: [BEAM-8561] Add 
ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-565650003
 
 
   Thanks @gsteelman. We would love to have the sample Thrift schema and 
anything else we can utilize for testing!
 



Issue Time Tracking
---

Worklog Id: (was: 359704)
Time Spent: 2h 40m  (was: 2.5h)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.





[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359703&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359703
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 13/Dec/19 23:34
Start Date: 13/Dec/19 23:34
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r357871234
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##

[jira] [Work logged] (BEAM-8810) Dataflow runner - Work stuck in state COMMITTING with streaming commit rpcs

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8810?focusedWorklogId=359699&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359699
 ]

ASF GitHub Bot logged work on BEAM-8810:


Author: ASF GitHub Bot
Created on: 13/Dec/19 23:22
Start Date: 13/Dec/19 23:22
Worklog Time Spent: 10m 
  Work Description: dpmills commented on pull request #10311: [BEAM-8810] 
Detect stuck commits in StreamingDataflowWorker
URL: https://github.com/apache/beam/pull/10311#discussion_r357868037
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingDataflowWorker.java
 ##
 @@ -2110,20 +2152,22 @@ public MapTask getMapTask() {
 /** Mark the given key and work as active. */
 public boolean activateWork(ByteString key, Work work) {
   synchronized (activeWork) {
-Queue<Work> queue = activeWork.get(key);
-if (queue == null) {
-  queue = new ArrayDeque<>();
-  activeWork.put(key, queue);
-  queue.add(work);
-  // Fall through to execute without the lock held.
-} else {
-  if (queue.peek().getWorkItem().getWorkToken() != work.getWorkItem().getWorkToken()) {
+Deque<Work> queue = activeWork.get(key);
+if (queue != null) {
+  Preconditions.checkState(!queue.isEmpty());
+  if (queue.peekLast().getWorkItem().getWorkToken() == work.getWorkItem().getWorkToken()) {
 
 Review comment:
   Check against everything in queue
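
The suggestion to check against everything in the queue, rather than only peeking at one end, can be sketched with plain JDK collections. The `Work` class below is a hypothetical stand-in for the worker's actual work item; only the token comparison matters here:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ActiveWorkCheck {
  // Hypothetical stand-in for the worker's Work item; only the token matters here.
  static final class Work {
    final long workToken;
    Work(long workToken) { this.workToken = workToken; }
  }

  // Returns true if any queued work item carries the same work token,
  // scanning the whole deque instead of only peeking at the head or tail.
  static boolean isDuplicate(Deque<Work> queue, Work work) {
    for (Work queued : queue) {
      if (queued.workToken == work.workToken) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    Deque<Work> queue = new ArrayDeque<>();
    queue.add(new Work(1));
    queue.add(new Work(2));
    System.out.println(isDuplicate(queue, new Work(2))); // true: token 2 is queued
    System.out.println(isDuplicate(queue, new Work(3))); // false: token 3 is new
  }
}
```

Scanning the whole deque guards against a stale duplicate sitting behind the head as well as a re-delivery of the most recent item.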
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 359699)
Time Spent: 1h 20m  (was: 1h 10m)

> Dataflow runner - Work stuck in state COMMITTING with streaming commit rpcs
> ---
>
> Key: BEAM-8810
> URL: https://issues.apache.org/jira/browse/BEAM-8810
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Sam Whittle
>Assignee: Sam Whittle
>Priority: Minor
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> In several pipelines using the streaming engine, and thus the streaming 
> commit RPCs, work became stuck in state COMMITTING indefinitely. These 
> stalls coincided with repeated streaming RPC failures.
> The status page shows that the key has work in state COMMITTING and has 1 
> queued work item.
> There is a single active commit stream, with 0 pending requests.
> The stream can outlive the stream deadline because the StreamCache only 
> closes a stream due to the deadline when a stream is retrieved, which only 
> occurs if there are other commits. Since the pipeline is stuck due to this 
> event, there are no other commits.
> It therefore seems there is either a race on the commitStream between 
> onNewStream and commitWork that prevents work from being retried, an 
> exception triggered between when the pending request is removed and the 
> callback is called, or some corruption of the activeWork data structure. 





[jira] [Work logged] (BEAM-8967) Maven artifact beam-sdks-java-core does not have JSR305 specified as "compile"

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8967?focusedWorklogId=359692&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359692
 ]

ASF GitHub Bot logged work on BEAM-8967:


Author: ASF GitHub Bot
Created on: 13/Dec/19 22:51
Start Date: 13/Dec/19 22:51
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #10382: [BEAM-8967] Declare 
JSR305 dependency as 'shadow'
URL: https://github.com/apache/beam/pull/10382#issuecomment-565638827
 
 
   R: @iemejia, @lukecwik, @Ardagan, @udim
 



Issue Time Tracking
---

Worklog Id: (was: 359692)
Time Spent: 50m  (was: 40m)

> Maven artifact beam-sdks-java-core does not have JSR305 specified as "compile"
> --
>
> Key: BEAM-8967
> URL: https://issues.apache.org/jira/browse/BEAM-8967
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Maven artifact beam-sdks-java-core does not have dependencies specified as 
> "compile".
> This is a followup of [~iemejia]'s finding:
> {quote}
> Just double checked with today's SNAPSHOTs after the merge and the pom of 
> core is not modified, however the deps look good in master, not sure if the 
> change was applied before the SNAPSHOT generation, but still to double check.
> https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-core/2.19.0-SNAPSHOT/beam-sdks-java-core-2.19.0-20191213.072102-9.pom
> {quote} 
> in [jsr305 dependency declaration for Nullable 
> class|https://github.com/apache/beam/pull/10324#issuecomment-565516004].
> Other 4 dependencies are not found in the snapshot pom either:
> {code:groovy}
>   compile library.java.antlr_runtime
>   compile library.java.protobuf_java
>   compile library.java.commons_compress
>   compile library.java.commons_lang3
> {code}
> h1. Compile-declared dependencies needed at runtime?
> h2. protobuf-java
> They are shaded. For example, Beam's TextBasedReader uses 
> {{com.google.protobuf.ByteString}} from protobuf-java. The shaded ByteString 
> class is in the published JAR file:
> {noformat}
> suztomo-macbookpro44:beam suztomo$ jar tf 
> ~/Downloads/beam-sdks-java-core-2.16.0.jar |grep protobuf.ByteString
> org/apache/beam/repackaged/core/com/google/protobuf/ByteString$1.class
> {noformat}
> h2. commons-compress
> They are shaded. For example, Beam's {{org.apache.beam.sdk.io.Compression}} 
> uses 
> {{org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream}}. 
> The shaded class is in the published JAR file:
> {noformat}
> suztomo-macbookpro44:beam suztomo$ jar tf 
> ~/Downloads/beam-sdks-java-core-2.16.0.jar |grep BZip2CompressorInputStream
> org/apache/beam/repackaged/core/org/apache/commons/compress/compressors/bzip2/BZip2CompressorInputStream$Data.class
> org/apache/beam/repackaged/core/org/apache/commons/compress/compressors/bzip2/BZip2CompressorInputStream.class
> {noformat}
> h2. commons-lang3
> They are shaded. For example, Beam's 
> {{org.apache.beam.sdk.io.LocalFileSystem}} uses 
> {{org.apache.commons.lang3.SystemUtils}}. The shaded class is in the 
> published JAR file:
> {noformat}
> suztomo-macbookpro44:beam suztomo$ jar tf 
> ~/Downloads/beam-sdks-java-core-2.16.0.jar |grep SystemUtils
> org/apache/beam/repackaged/core/org/apache/commons/lang3/SystemUtils.class
> {noformat}
> h2. antlr-runtime
> Same.
> {noformat}
> suztomo-macbookpro44:beam suztomo$ jar tf 
> ~/Downloads/beam-sdks-java-core-2.16.0.jar |grep org.antlr.v4 |head
> org/apache/beam/repackaged/core/org/antlr/v4/
> org/apache/beam/repackaged/core/org/antlr/v4/runtime/
> org/apache/beam/repackaged/core/org/antlr/v4/runtime/ANTLRErrorListener.class
> {noformat}
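The `jar tf ... | grep ...` checks quoted above can be scripted when verifying several artifacts at once. A minimal sketch using Python's `zipfile` (the jar path in the usage comment is a local download, not a fetched artifact):

```python
import zipfile

def shaded_classes(jar_path, needle):
    """List jar entries whose path contains `needle` --
    the scripted equivalent of `jar tf <jar> | grep <needle>`."""
    with zipfile.ZipFile(jar_path) as jar:
        return [name for name in jar.namelist() if needle in name]

# e.g. shaded_classes("beam-sdks-java-core-2.16.0.jar",
#                     "org/apache/beam/repackaged/core/org/antlr/v4/")
```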



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=359689&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359689
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 13/Dec/19 22:45
Start Date: 13/Dec/19 22:45
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on issue #10374: [BEAM-8575] 
Added a unit test to test that Combine works with FixedWi…
URL: https://github.com/apache/beam/pull/10374#issuecomment-565637382
 
 
   Don't review this PR. It's polluted. Please review 
https://github.com/apache/beam/pull/10383 instead.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 359689)
Time Spent: 35h 20m  (was: 35h 10m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 35h 20m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.





[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=359687&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359687
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 13/Dec/19 22:43
Start Date: 13/Dec/19 22:43
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on pull request #10383: 
[BEAM-8575] Added a unit test to test that Combine works with FixedWi…
URL: https://github.com/apache/beam/pull/10383
 
 
   …ndows.
   
   [BEAM-8575] Added a unit test to test that Combine works with FixedWindows.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 

[jira] [Commented] (BEAM-8969) runtime_type_check DoFn wrapper doesn't call setup and teardown

2019-12-13 Thread Robert Bradshaw (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16995967#comment-16995967
 ] 

Robert Bradshaw commented on BEAM-8969:
---

https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/typecheck.py#L46
 should be fixed to delegate these methods. 
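A minimal sketch of the delegation being asked for — not Beam's actual `typecheck.py` code, only an illustration of the pattern using the standard DoFn lifecycle method names: a wrapper that instruments `process` must still forward `setup`/`teardown` (and the bundle hooks) to the wrapped DoFn, otherwise those hooks are silently dropped.

```python
class TypeCheckWrapperDoFn:
    """Illustrative wrapper: instruments process() but forwards the
    lifecycle methods it does not care about to the wrapped DoFn."""

    def __init__(self, dofn):
        self._dofn = dofn

    # Without these two delegations, the wrapped DoFn's one-time
    # initialization and cleanup never run -- the bug described above.
    def setup(self):
        return self._dofn.setup()

    def teardown(self):
        return self._dofn.teardown()

    def start_bundle(self):
        return self._dofn.start_bundle()

    def finish_bundle(self):
        return self._dofn.finish_bundle()

    def process(self, element):
        # Runtime type checks would wrap this call in the real code.
        return self._dofn.process(element)
```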

> runtime_type_check DoFn wrapper doesn't call setup and teardown
> ---
>
> Key: BEAM-8969
> URL: https://issues.apache.org/jira/browse/BEAM-8969
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Priority: Major
>  Labels: beginner
>






[jira] [Created] (BEAM-8969) runtime_type_check DoFn wrapper doesn't call setup and teardown

2019-12-13 Thread Robert Bradshaw (Jira)
Robert Bradshaw created BEAM-8969:
-

 Summary: runtime_type_check DoFn wrapper doesn't call setup and 
teardown
 Key: BEAM-8969
 URL: https://issues.apache.org/jira/browse/BEAM-8969
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Robert Bradshaw








[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359680&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359680
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 13/Dec/19 22:01
Start Date: 13/Dec/19 22:01
Worklog Time Spent: 10m 
  Work Description: gsteelman commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r357798688
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,708 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static java.lang.String.format;
+import static java.util.stream.Collectors.joining;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import java.nio.charset.Charset;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.io.Compression;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.io.thrift.parser.ThriftIdlParser;
+import org.apache.beam.sdk.io.thrift.parser.model.BaseType;
+import org.apache.beam.sdk.io.thrift.parser.model.Const;
+import org.apache.beam.sdk.io.thrift.parser.model.Definition;
+import org.apache.beam.sdk.io.thrift.parser.model.Document;
+import org.apache.beam.sdk.io.thrift.parser.model.Header;
+import org.apache.beam.sdk.io.thrift.parser.model.IdentifierType;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnumField;
+import org.apache.beam.sdk.io.thrift.parser.model.ListType;
+import org.apache.beam.sdk.io.thrift.parser.model.MapType;
+import org.apache.beam.sdk.io.thrift.parser.model.Service;
+import org.apache.beam.sdk.io.thrift.parser.model.StringEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.Struct;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftException;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftField;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftMethod;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftType;
+import org.apache.beam.sdk.io.thrift.parser.model.TypeAnnotation;
+import org.apache.beam.sdk.io.thrift.parser.model.Typedef;
+import org.apache.beam.sdk.io.thrift.parser.model.VoidType;
+import org.apache.beam.sdk.options.ValueProvider;
+import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PDone;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteSource;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing Thrift files.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#read} with the desired file 
pattern to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection<Document> documents = pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }
+ *
+ * For more advanced use cases, like reading each file in a {@link 
PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection<FileIO.ReadableFile> files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern())
+ *   .apply(FileIO.readMatches());
+ *
+ * 

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359676&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359676
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 13/Dec/19 22:01
Start Date: 13/Dec/19 22:01
Worklog Time Spent: 10m 
  Work Description: gsteelman commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r357844088
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/parser/model/ConstList.java
 ##
 @@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift.parser.model;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import java.util.ArrayList;
+import java.util.List;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList;
+
+public class ConstList extends ConstValue {
+  private final List<ConstValue> values;
+
+  public ConstList(List<ConstValue> values) {
+    this.values = ImmutableList.copyOf(checkNotNull(values, "values"));
 
 Review comment:
   Docs unclear. Does this create a deep copy or an immutable wrapper?
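For reference: Guava's `ImmutableList.copyOf` returns an immutable *shallow* copy — it copies the element references into a new list (unlike `Collections.unmodifiableList`, which is a live view of the original), but does not clone the elements themselves. The shallow-vs-deep distinction, sketched in Python for brevity:

```python
import copy

source = [["x"]]

# Shallow copy (what ImmutableList.copyOf does, plus immutability):
# a new container holding the same element references.
shallow = list(source)
source.append(["y"])
assert len(shallow) == 1            # structural changes don't propagate
source[0].append("!")
assert shallow[0] == ["x", "!"]     # shared elements still alias

# A deep copy would also clone the elements.
deep = copy.deepcopy(source)
source[0].append("?")
assert deep[0] == ["x", "!"]        # unaffected by later element mutation
```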
 



Issue Time Tracking
---

Worklog Id: (was: 359676)
Time Spent: 2h 10m  (was: 2h)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.





[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359671&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359671
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 13/Dec/19 22:01
Start Date: 13/Dec/19 22:01
Worklog Time Spent: 10m 
  Work Description: gsteelman commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r357798355
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,708 @@

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359668&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359668
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 13/Dec/19 22:01
Start Date: 13/Dec/19 22:01
Worklog Time Spent: 10m 
  Work Description: gsteelman commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r357796170
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,708 @@

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359679&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359679
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 13/Dec/19 22:01
Start Date: 13/Dec/19 22:01
Worklog Time Spent: 10m 
  Work Description: gsteelman commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r357815885
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,708 @@

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359664&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359664
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 13/Dec/19 22:01
Start Date: 13/Dec/19 22:01
Worklog Time Spent: 10m 
  Work Description: gsteelman commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r357778411
 
 

 ##
 File path: sdks/java/io/thrift/src/main/antlr/DocumentGenerator.g
 ##
 @@ -0,0 +1,262 @@
+/*
+ * Copyright 2012 Facebook, Inc.
 
 Review comment:
   Is this correct?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 359664)
Time Spent: 1h 20m  (was: 1h 10m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359673&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359673
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 13/Dec/19 22:01
Start Date: 13/Dec/19 22:01
Worklog Time Spent: 10m 
  Work Description: gsteelman commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r357838118
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/parser/model/ConstInteger.java
 ##
 @@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift.parser.model;
+
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects;
+
+public class ConstInteger extends ConstValue {
 
 Review comment:
   Is this class reused for `i64` type? Otherwise this should be an `int` as 
the class field type, not a `long`. 
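For readers skimming the archive, the distinction this comment draws can be shown stand-alone. The sketch below is hypothetical (the name `ConstIntegerSketch` and its methods are not from the PR); it only illustrates why an integer-constant class that must also cover Thrift's `i64` needs a Java `long` field, since legal `i64` values overflow `int`:

```java
// Hypothetical sketch, not the PR's ConstInteger: a single integer-constant
// holder backed by a Java long, wide enough for both Thrift i32 and i64.
public class ConstIntegerSketch {
    private final long value;

    public ConstIntegerSketch(long value) {
        this.value = value;
    }

    public long getValue() {
        return value;
    }

    // True when the stored constant would also fit a Thrift i32 (Java int).
    public boolean fitsInI32() {
        return value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        // 2^40 is a legal i64 constant but exceeds Integer.MAX_VALUE.
        System.out.println(new ConstIntegerSketch(1L << 40).fitsInI32()); // prints "false"
    }
}
```

If the class were only ever used for `i32` constants, an `int` field would suffice, which is exactly the reviewer's question.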
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 359673)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359677&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359677
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 13/Dec/19 22:01
Start Date: 13/Dec/19 22:01
Worklog Time Spent: 10m 
  Work Description: gsteelman commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r357817721
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,708 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static java.lang.String.format;
+import static java.util.stream.Collectors.joining;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import java.nio.charset.Charset;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.io.Compression;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.io.thrift.parser.ThriftIdlParser;
+import org.apache.beam.sdk.io.thrift.parser.model.BaseType;
+import org.apache.beam.sdk.io.thrift.parser.model.Const;
+import org.apache.beam.sdk.io.thrift.parser.model.Definition;
+import org.apache.beam.sdk.io.thrift.parser.model.Document;
+import org.apache.beam.sdk.io.thrift.parser.model.Header;
+import org.apache.beam.sdk.io.thrift.parser.model.IdentifierType;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnumField;
+import org.apache.beam.sdk.io.thrift.parser.model.ListType;
+import org.apache.beam.sdk.io.thrift.parser.model.MapType;
+import org.apache.beam.sdk.io.thrift.parser.model.Service;
+import org.apache.beam.sdk.io.thrift.parser.model.StringEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.Struct;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftException;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftField;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftMethod;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftType;
+import org.apache.beam.sdk.io.thrift.parser.model.TypeAnnotation;
+import org.apache.beam.sdk.io.thrift.parser.model.Typedef;
+import org.apache.beam.sdk.io.thrift.parser.model.VoidType;
+import org.apache.beam.sdk.options.ValueProvider;
+import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PDone;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteSource;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing Thrift files.
+ *
+ * <h3>Reading Thrift Files</h3>
+ *
+ * <p>For simple reading, use {@link ThriftIO#read} with the desired file pattern to read from.
+ *
+ * <p>For example:
+ *
+ * <pre>{@code
+ * PCollection<Document> documents = pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }</pre>
+ *
+ * <p>For more advanced use cases, like reading each file in a {@link PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * <p>For example:
+ *
+ * <pre>{@code
+ * PCollection<FileIO.ReadableFile> files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern()))
+ *   .apply(FileIO.readMatches());
+ *

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359670&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359670
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 13/Dec/19 22:01
Start Date: 13/Dec/19 22:01
Worklog Time Spent: 10m 
  Work Description: gsteelman commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r357795561
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,708 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static java.lang.String.format;
+import static java.util.stream.Collectors.joining;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import java.nio.charset.Charset;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.io.Compression;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.io.thrift.parser.ThriftIdlParser;
+import org.apache.beam.sdk.io.thrift.parser.model.BaseType;
+import org.apache.beam.sdk.io.thrift.parser.model.Const;
+import org.apache.beam.sdk.io.thrift.parser.model.Definition;
+import org.apache.beam.sdk.io.thrift.parser.model.Document;
+import org.apache.beam.sdk.io.thrift.parser.model.Header;
+import org.apache.beam.sdk.io.thrift.parser.model.IdentifierType;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnumField;
+import org.apache.beam.sdk.io.thrift.parser.model.ListType;
+import org.apache.beam.sdk.io.thrift.parser.model.MapType;
+import org.apache.beam.sdk.io.thrift.parser.model.Service;
+import org.apache.beam.sdk.io.thrift.parser.model.StringEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.Struct;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftException;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftField;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftMethod;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftType;
+import org.apache.beam.sdk.io.thrift.parser.model.TypeAnnotation;
+import org.apache.beam.sdk.io.thrift.parser.model.Typedef;
+import org.apache.beam.sdk.io.thrift.parser.model.VoidType;
+import org.apache.beam.sdk.options.ValueProvider;
+import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PDone;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteSource;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing Thrift files.
+ *
+ * <h3>Reading Thrift Files</h3>
+ *
+ * <p>For simple reading, use {@link ThriftIO#read} with the desired file pattern to read from.
+ *
+ * <p>For example:
+ *
+ * <pre>{@code
+ * PCollection<Document> documents = pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }</pre>
+ *
+ * <p>For more advanced use cases, like reading each file in a {@link PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * <p>For example:
+ *
+ * <pre>{@code
+ * PCollection<FileIO.ReadableFile> files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern()))
+ *   .apply(FileIO.readMatches());
+ *

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359665&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359665
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 13/Dec/19 22:01
Start Date: 13/Dec/19 22:01
Worklog Time Spent: 10m 
  Work Description: gsteelman commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r357782880
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,708 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static java.lang.String.format;
+import static java.util.stream.Collectors.joining;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import java.nio.charset.Charset;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.io.Compression;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.io.thrift.parser.ThriftIdlParser;
+import org.apache.beam.sdk.io.thrift.parser.model.BaseType;
+import org.apache.beam.sdk.io.thrift.parser.model.Const;
+import org.apache.beam.sdk.io.thrift.parser.model.Definition;
+import org.apache.beam.sdk.io.thrift.parser.model.Document;
+import org.apache.beam.sdk.io.thrift.parser.model.Header;
+import org.apache.beam.sdk.io.thrift.parser.model.IdentifierType;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnumField;
+import org.apache.beam.sdk.io.thrift.parser.model.ListType;
+import org.apache.beam.sdk.io.thrift.parser.model.MapType;
+import org.apache.beam.sdk.io.thrift.parser.model.Service;
+import org.apache.beam.sdk.io.thrift.parser.model.StringEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.Struct;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftException;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftField;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftMethod;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftType;
+import org.apache.beam.sdk.io.thrift.parser.model.TypeAnnotation;
+import org.apache.beam.sdk.io.thrift.parser.model.Typedef;
+import org.apache.beam.sdk.io.thrift.parser.model.VoidType;
+import org.apache.beam.sdk.options.ValueProvider;
+import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PDone;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteSource;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing Thrift files.
+ *
+ * <h3>Reading Thrift Files</h3>
+ *
+ * <p>For simple reading, use {@link ThriftIO#read} with the desired file pattern to read from.
+ *
+ * <p>For example:
+ *
+ * <pre>{@code
+ * PCollection<Document> documents = pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }</pre>
+ *
+ * <p>For more advanced use cases, like reading each file in a {@link PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * <p>For example:
+ *
+ * <pre>{@code
+ * PCollection<FileIO.ReadableFile> files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern()))
+ *   .apply(FileIO.readMatches());
+ *

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359681&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359681
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 13/Dec/19 22:01
Start Date: 13/Dec/19 22:01
Worklog Time Spent: 10m 
  Work Description: gsteelman commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r357848868
 
 

 ##
 File path: 
sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/DocumentTest.java
 ##
 @@ -0,0 +1,390 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import org.apache.beam.sdk.io.thrift.parser.model.BaseType;
+import org.apache.beam.sdk.io.thrift.parser.model.Const;
+import org.apache.beam.sdk.io.thrift.parser.model.ConstInteger;
+import org.apache.beam.sdk.io.thrift.parser.model.Document;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnumField;
+import org.apache.beam.sdk.io.thrift.parser.model.Service;
+import org.apache.beam.sdk.io.thrift.parser.model.StringEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.Struct;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftException;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftField;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftMethod;
+import org.apache.beam.sdk.io.thrift.parser.model.TypeAnnotation;
+import org.apache.beam.sdk.io.thrift.parser.model.Typedef;
+import org.apache.beam.sdk.io.thrift.parser.model.Union;
+import org.apache.beam.sdk.io.thrift.parser.model.VoidType;
+import org.junit.Assert;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+/** Tests for {@link Document} class. */
+@RunWith(JUnit4.class)
+public class DocumentTest {
+
+  /** Tests {@link Document#addIncludes(List)}. */
+  @Test
+  public void testAddIncludes() {
+    Document document = Document.emptyDocument();
+    List<String> includesExpected = new ArrayList<>();
+    includesExpected.add("simple_test.thrift");
+    includesExpected.add("shared.thrift");
+    document.addIncludes(includesExpected);
+
+    List<String> includesActual = document.getHeader().getIncludes();
+
+    Assert.assertEquals(includesExpected, includesActual);
+  }
+
+  /** Tests {@link Document#addCppIncludes(List)}. */
+  @Test
+  public void testAddCppIncludes() {
+    Document document = Document.emptyDocument();
+    List<String> cppIncludesExpected = new ArrayList<>();
+    cppIncludesExpected.add("iostream");
+    cppIncludesExpected.add("set");
+    document.addCppIncludes(cppIncludesExpected);
+
+    List<String> cppIncludesActual = document.getHeader().getCppIncludes();
+
+    Assert.assertEquals(cppIncludesExpected, cppIncludesActual);
+  }
+
+  /** Tests {@link Document#removeDefinition(String)}. */
+  @Test
+  public void testRemoveDefinition() {
+    Document document = Document.emptyDocument();
+    List<TypeAnnotation> emptyAnnotations = new ArrayList<>();
+    String constName = "STRINGCONSTANT";
+    document.addConstString(constName, emptyAnnotations, "test_string");
+    Assert.assertEquals(1, document.getDefinitions().size());
+
+    document.removeDefinition(constName);
+    Assert.assertEquals(0, document.getDefinitions().size());
+  }
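The add/remove tests above exercise a name-keyed definition registry. As a stand-alone illustration (the `MiniDocument` class below is hypothetical, not the Beam parser's actual `Document`), the pattern reduces to:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature of the definition registry exercised above;
// not the actual org.apache.beam.sdk.io.thrift.parser.model.Document.
public class MiniDocument {
    public static final class Definition {
        private final String name;

        public Definition(String name) {
            this.name = name;
        }

        public String getName() {
            return name;
        }
    }

    private final List<Definition> definitions = new ArrayList<>();

    public void addDefinition(String name) {
        definitions.add(new Definition(name));
    }

    // Drop every definition registered under the given name.
    public void removeDefinition(String name) {
        definitions.removeIf(d -> d.getName().equals(name));
    }

    public List<Definition> getDefinitions() {
        return definitions;
    }
}
```

A registry like this makes the test's assertion pair (size 1 after add, size 0 after remove) the natural smoke test for the API.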
+
+  /** Tests {@link Document#addConst(String, ThriftType, ConstValue)}. */
+  @Test
+  public void testAddConst() {
+    Document document = Document.emptyDocument();
+    List<TypeAnnotation> emptyAnnotations = new ArrayList<>();
+    String constName = "INT32CONSTANT";
+    document.addConst(
+        constName, new BaseType(BaseType.Type.I32, emptyAnnotations), new ConstInteger(252));
+
+    String constNameActual = document.getDefinitions().get(0).getName();
+
+    Assert.assertEquals(constName, constNameActual);
+    Assert.assertTrue(document.getDefinitions().get(0) instanceof Const);
+  }
+
+  /** Tests {@link 

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359667&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359667
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 13/Dec/19 22:01
Start Date: 13/Dec/19 22:01
Worklog Time Spent: 10m 
  Work Description: gsteelman commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r357782420
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,708 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static java.lang.String.format;
+import static java.util.stream.Collectors.joining;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import java.nio.charset.Charset;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.io.Compression;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.io.thrift.parser.ThriftIdlParser;
+import org.apache.beam.sdk.io.thrift.parser.model.BaseType;
+import org.apache.beam.sdk.io.thrift.parser.model.Const;
+import org.apache.beam.sdk.io.thrift.parser.model.Definition;
+import org.apache.beam.sdk.io.thrift.parser.model.Document;
+import org.apache.beam.sdk.io.thrift.parser.model.Header;
+import org.apache.beam.sdk.io.thrift.parser.model.IdentifierType;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnumField;
+import org.apache.beam.sdk.io.thrift.parser.model.ListType;
+import org.apache.beam.sdk.io.thrift.parser.model.MapType;
+import org.apache.beam.sdk.io.thrift.parser.model.Service;
+import org.apache.beam.sdk.io.thrift.parser.model.StringEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.Struct;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftException;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftField;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftMethod;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftType;
+import org.apache.beam.sdk.io.thrift.parser.model.TypeAnnotation;
+import org.apache.beam.sdk.io.thrift.parser.model.Typedef;
+import org.apache.beam.sdk.io.thrift.parser.model.VoidType;
+import org.apache.beam.sdk.options.ValueProvider;
+import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PDone;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteSource;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing Thrift files.
+ *
+ * <h3>Reading Thrift Files</h3>
+ *
+ * <p>For simple reading, use {@link ThriftIO#read} with the desired file pattern to read from.
+ *
+ * <p>For example:
+ *
+ * <pre>{@code
+ * PCollection<Document> documents = pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }</pre>
+ *
+ * <p>For more advanced use cases, like reading each file in a {@link PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * <p>For example:
+ *
+ * <pre>{@code
+ * PCollection<FileIO.ReadableFile> files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern()))
+ *   .apply(FileIO.readMatches());
+ *

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359675&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359675
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 13/Dec/19 22:01
Start Date: 13/Dec/19 22:01
Worklog Time Spent: 10m 
  Work Description: gsteelman commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r357817397
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,708 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static java.lang.String.format;
+import static java.util.stream.Collectors.joining;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import java.nio.charset.Charset;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.io.Compression;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.io.thrift.parser.ThriftIdlParser;
+import org.apache.beam.sdk.io.thrift.parser.model.BaseType;
+import org.apache.beam.sdk.io.thrift.parser.model.Const;
+import org.apache.beam.sdk.io.thrift.parser.model.Definition;
+import org.apache.beam.sdk.io.thrift.parser.model.Document;
+import org.apache.beam.sdk.io.thrift.parser.model.Header;
+import org.apache.beam.sdk.io.thrift.parser.model.IdentifierType;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnumField;
+import org.apache.beam.sdk.io.thrift.parser.model.ListType;
+import org.apache.beam.sdk.io.thrift.parser.model.MapType;
+import org.apache.beam.sdk.io.thrift.parser.model.Service;
+import org.apache.beam.sdk.io.thrift.parser.model.StringEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.Struct;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftException;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftField;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftMethod;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftType;
+import org.apache.beam.sdk.io.thrift.parser.model.TypeAnnotation;
+import org.apache.beam.sdk.io.thrift.parser.model.Typedef;
+import org.apache.beam.sdk.io.thrift.parser.model.VoidType;
+import org.apache.beam.sdk.options.ValueProvider;
+import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PDone;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteSource;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing Thrift files.
+ *
+ * <h3>Reading Thrift Files</h3>
+ *
+ * <p>For simple reading, use {@link ThriftIO#read} with the desired file pattern to read from.
+ *
+ * <p>For example:
+ *
+ * <pre>{@code
+ * PCollection<Document> documents = pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }</pre>
+ *
+ * <p>For more advanced use cases, like reading each file in a {@link PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * <p>For example:
+ *
+ * <pre>{@code
+ * PCollection<FileIO.ReadableFile> files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern()))
+ *   .apply(FileIO.readMatches());
+ *
+ * 

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359674&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359674
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 13/Dec/19 22:01
Start Date: 13/Dec/19 22:01
Worklog Time Spent: 10m 
  Work Description: gsteelman commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r357845858
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/parser/model/Document.java
 ##
 @@ -0,0 +1,424 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift.parser.model;
+
+import static java.util.Collections.emptyList;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.GenericRecordBuilder;
+import org.apache.avro.reflect.ReflectData;
+import org.apache.beam.sdk.io.thrift.parser.visitor.DocumentVisitor;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList;
+
+/**
+ * The {@link Document} class holds the elements of a Thrift file.
+ *
+ * <p>A {@link Document} is made up of:
+ *
+ * <ul>
+ *   <li>{@link Header} - Contains: includes, cppIncludes, namespaces, and defaultNamespace.
+ *   <li>{@link Document#definitions} - Contains list of Thrift {@link Definition}.
+ * </ul>
+ */
+public class Document implements Serializable {
+  private Header header;
+  private List<Definition> definitions;
+
+  public Document(Header header, List<Definition> definitions) {
+    this.header = checkNotNull(header, "header");
+    this.definitions = ImmutableList.copyOf(checkNotNull(definitions, "definitions"));
+  }
+
+  /** Returns an empty {@link Document}. */
+  public static Document emptyDocument() {
+    List<String> includes = emptyList();
+    List<String> cppIncludes = emptyList();
+    String defaultNamespace = null;
+    Map<String, String> namespaces = Collections.emptyMap();
+    Header header = new Header(includes, cppIncludes, defaultNamespace, namespaces);
+    List<Definition> definitions = emptyList();
+    return new Document(header, definitions);
+  }
+
+  public Document getDocument() {
+    return this;
+  }
+
+  public Header getHeader() {
+    return this.header;
+  }
+
+  public void setHeader(Header header) {
+    this.header = header;
+  }
+
+  public List<Definition> getDefinitions() {
+    return definitions;
+  }
+
+  public void setDefinitions(List<Definition> definitions) {
+    this.definitions = definitions;
+  }
+
+  public void visit(final DocumentVisitor visitor) throws IOException {
+    Preconditions.checkNotNull(visitor, "the visitor must not be null!");
+
+    for (Definition definition : definitions) {
+      if (visitor.accept(definition)) {
+        definition.visit(visitor);
+      }
+    }
+  }
+
+  /** Gets Avro {@link Schema} for the object. */
+  public Schema getSchema() {
+    return ReflectData.get().getSchema(Document.class);
+  }
+
+  /** Gets {@link Document} as a {@link GenericRecord}. */
+  public GenericRecord getAsGenericRecord() {
+    GenericRecordBuilder genericRecordBuilder = new GenericRecordBuilder(this.getSchema());
+    genericRecordBuilder.set("header", this.getHeader()).set("definitions", this.getDefinitions());
+
+    return genericRecordBuilder.build();
+  }
+
+  /** Adds list of includes to {@link Document#header}. */
+  public void addIncludes(List<String> includes) {
+    checkNotNull(includes, "includes");
+    List<String> currentIncludes = new ArrayList<>(this.getHeader().getIncludes());
+    currentIncludes.addAll(includes);
+

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359672&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359672
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 13/Dec/19 22:01
Start Date: 13/Dec/19 22:01
Worklog Time Spent: 10m 
  Work Description: gsteelman commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r357836567
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/parser/ThriftIdlParser.java
 ##
 @@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift.parser;
+
+import java.io.IOException;
+import java.io.Reader;
+import org.antlr.runtime.ANTLRReaderStream;
+import org.antlr.runtime.CommonTokenStream;
+import org.antlr.runtime.RecognitionException;
+import org.antlr.runtime.tree.BufferedTreeNodeStream;
+import org.antlr.runtime.tree.Tree;
+import org.antlr.runtime.tree.TreeNodeStream;
+import org.apache.beam.sdk.io.thrift.parser.antlr.DocumentGenerator;
+import org.apache.beam.sdk.io.thrift.parser.antlr.ThriftLexer;
+import org.apache.beam.sdk.io.thrift.parser.antlr.ThriftParser;
+import org.apache.beam.sdk.io.thrift.parser.model.Document;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.CharSource;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class ThriftIdlParser {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ThriftIdlParser.class);
+
+  /** Generates {@link Document} from {@link org.antlr.runtime.tree.Tree}. */
+  public static Document parseThriftIdl(CharSource input) throws IOException {
+    Tree tree = parseTree(input);
+    TreeNodeStream stream = new BufferedTreeNodeStream(tree);
+    DocumentGenerator generator = new DocumentGenerator(stream);
+    try {
+      return generator.document().value;
+    } catch (RecognitionException e) {
+      LOG.error("Failed to generate document: " + e.getMessage());
+      throw new RuntimeException(e);
+    }
+  }
+
+  /** Generates {@link org.antlr.runtime.tree.Tree} from input. */
+  static Tree parseTree(CharSource input) throws IOException {
+    try (Reader reader = input.openStream()) {
+      ThriftLexer lexer = new ThriftLexer(new ANTLRReaderStream(reader));
+      ThriftParser parser = new ThriftParser(new CommonTokenStream(lexer));
+      try {
+        Tree tree = (Tree) parser.document().getTree();
+        if (parser.getNumberOfSyntaxErrors() > 0) {
+          LOG.error("Parsing generated " + parser.getNumberOfSyntaxErrors() + " errors.");
+          throw new RuntimeException("syntax error");
 
 Review comment:
   Is there any additional information we could re-throw here, perhaps what the 
actual errors are? Or is that covered in the `catch (RecognitionException e) {` 
case below.
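
The reviewer's suggestion — surfacing the collected error details rather than a bare "syntax error" string — can be sketched as follows. This is a hedged Python illustration, not the PR's actual ANTLR/Java code; `ThriftParseError` and `check_syntax` are invented names for the pattern.

```python
class ThriftParseError(RuntimeError):
    """Raised when the IDL contains syntax errors (hypothetical name)."""


def check_syntax(errors):
    # Instead of raising a bare "syntax error", include the collected
    # messages so callers can see what actually went wrong.
    if errors:
        detail = "; ".join(errors)
        raise ThriftParseError(
            f"parsing generated {len(errors)} error(s): {detail}")


check_syntax([])  # no errors: returns silently
try:
    check_syntax(["line 3: missing ';'", "line 7: unknown type 'i128'"])
except ThriftParseError as e:
    print(e)
```

In the ANTLR case, the per-error messages would come from an error listener attached to the parser rather than from a plain list.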
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 359672)
Time Spent: 2h  (was: 1h 50m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359669&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359669
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 13/Dec/19 22:01
Start Date: 13/Dec/19 22:01
Worklog Time Spent: 10m 
  Work Description: gsteelman commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r357797134
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,708 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static java.lang.String.format;
+import static java.util.stream.Collectors.joining;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import java.nio.charset.Charset;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.io.Compression;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.io.thrift.parser.ThriftIdlParser;
+import org.apache.beam.sdk.io.thrift.parser.model.BaseType;
+import org.apache.beam.sdk.io.thrift.parser.model.Const;
+import org.apache.beam.sdk.io.thrift.parser.model.Definition;
+import org.apache.beam.sdk.io.thrift.parser.model.Document;
+import org.apache.beam.sdk.io.thrift.parser.model.Header;
+import org.apache.beam.sdk.io.thrift.parser.model.IdentifierType;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnumField;
+import org.apache.beam.sdk.io.thrift.parser.model.ListType;
+import org.apache.beam.sdk.io.thrift.parser.model.MapType;
+import org.apache.beam.sdk.io.thrift.parser.model.Service;
+import org.apache.beam.sdk.io.thrift.parser.model.StringEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.Struct;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftException;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftField;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftMethod;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftType;
+import org.apache.beam.sdk.io.thrift.parser.model.TypeAnnotation;
+import org.apache.beam.sdk.io.thrift.parser.model.Typedef;
+import org.apache.beam.sdk.io.thrift.parser.model.VoidType;
+import org.apache.beam.sdk.options.ValueProvider;
+import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PDone;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteSource;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing Thrift files.
+ *
+ * <h3>Reading Thrift Files</h3>
+ *
+ * <p>For simple reading, use {@link ThriftIO#read} with the desired file pattern to read from.
+ *
+ * <p>For example:
+ *
+ * <pre>{@code
+ * PCollection<Document> documents = pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }</pre>
+ *
+ * <p>For more advanced use cases, like reading each file in a {@link PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * <p>For example:
+ *
+ * <pre>{@code
+ * PCollection<FileIO.ReadableFile> files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern()))
+ *   .apply(FileIO.readMatches());
+ *
+ * 

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359678&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359678
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 13/Dec/19 22:01
Start Date: 13/Dec/19 22:01
Worklog Time Spent: 10m 
  Work Description: gsteelman commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r357846633
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/parser/model/IntegerEnum.java
 ##
 @@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift.parser.model;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import java.util.List;
+import java.util.Objects;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects;
 
 Review comment:
   The more I think about it, the more I am concerned about importing an 
explicit version of guava. This would disallow Beam from updating to new 
versions of guava without significant code changes. Typically versions are 
specified in the build config files. Why the need to specify the version here? 
Applies to everywhere else this particular guava import is used.
 



Issue Time Tracking
---

Worklog Id: (was: 359678)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.





[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=359666&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359666
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 13/Dec/19 22:01
Start Date: 13/Dec/19 22:01
Worklog Time Spent: 10m 
  Work Description: gsteelman commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r357780284
 
 

 ##
 File path: sdks/java/io/thrift/src/main/antlr/Thrift.g
 ##
 @@ -0,0 +1,290 @@
+/*
+ *  Copyright 2008 Martin Traverso
+ *  Copyright 2012 Facebook, Inc.
 
 Review comment:
   Are these correct?
 



Issue Time Tracking
---

Worklog Id: (was: 359666)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.





[jira] [Commented] (BEAM-8837) PCollectionVisualizationTest: possible bug

2019-12-13 Thread Ning Kang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16995948#comment-16995948
 ] 

Ning Kang commented on BEAM-8837:
-

We'll just run the real logic to prepare data as inputs for visualization 
instead of patching test data for PCollections to be visualized.

 

 

> PCollectionVisualizationTest: possible bug
> --
>
> Key: BEAM-8837
> URL: https://issues.apache.org/jira/browse/BEAM-8837
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> This seems like a bug, even though the test passes:
> {code}
> test_display_plain_text_when_kernel_has_no_frontend 
> (apache_beam.runners.interactive.display.pcoll_visualization_test.PCollectionVisualizationTest)
>  ... Exception in thread Thread-4405:
> Traceback (most recent call last):
>   File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner
> self.run()
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/.eggs/timeloop-1.0.2-py3.7.egg/timeloop/job.py",
>  line 19, in run
> self.execute(*self.args, **self.kwargs)
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py",
>  line 132, in continuous_update_display
> updated_pv.display_facets(updating_pv=pv)
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py",
>  line 209, in display_facets
> data = self._to_dataframe()
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py",
>  line 278, in _to_dataframe
> for el in self._to_element_list():
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py",
>  line 266, in _to_element_list
> if ie.current_env().cache_manager().exists('full', self._cache_key):
> AttributeError: 'NoneType' object has no attribute 'exists'
> ok
> {code}
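
The `AttributeError` in the log above comes from dereferencing the result of `cache_manager()` when it is `None`. A minimal self-contained sketch reproduces the failure mode and shows a guard; the `Env` class below is a hypothetical stand-in for Beam's interactive environment, mirroring only the method names that appear in the traceback.

```python
class Env:
    """Hypothetical stand-in for the interactive environment in the log."""

    def __init__(self, cache_manager=None):
        self._cache_manager = cache_manager

    def cache_manager(self):
        # May legitimately be None when no interactive pipeline
        # has populated a cache yet.
        return self._cache_manager


env = Env()

# Unguarded chained call reproduces the error from the test output.
try:
    env.cache_manager().exists('full', 'some_cache_key')
except AttributeError as e:
    print(e)  # 'NoneType' object has no attribute 'exists'

# Guarded version: check for None before calling methods on the result.
cm = env.cache_manager()
exists = cm is not None and cm.exists('full', 'some_cache_key')
print(exists)  # False
```

The fix adopted in the PR discussion — running a real pipeline so the cache manager actually exists — removes the need for such a guard in the test itself.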





[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=359663&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359663
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 13/Dec/19 21:54
Start Date: 13/Dec/19 21:54
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10159: [BEAM-8575] 
Added a unit test to CombineTest class to test that Combi…
URL: https://github.com/apache/beam/pull/10159#discussion_r357848112
 
 

 ##
 File path: sdks/python/apache_beam/transforms/combiners_test.py
 ##
 @@ -399,6 +418,43 @@ def test_global_fanout(self):
   | beam.CombineGlobally(combine.MeanCombineFn()).with_fanout(11))
   assert_that(result, equal_to([49.5]))
 
+  def test_combining_with_accumulation_mode(self):
+# PCollection will contain elements from 1 to 5.
+elements = [i for i in range(1, 6)]
+
+ts = TestStream().advance_watermark_to(0)
+for i in elements:
+  ts.add_elements([i])
+ts.advance_watermark_to_infinity()
+
+options = PipelineOptions()
+options.view_as(StandardOptions).streaming = True
+with TestPipeline(options=options) as p:
+  result = (p
+| ts
+| beam.WindowInto(
+GlobalWindows(),
+accumulation_mode=trigger.AccumulationMode.ACCUMULATING,
+trigger=AfterWatermark(early=AfterAll(AfterCount(1)))
+)
+| beam.CombineGlobally(sum).without_defaults().with_fanout(2)
+| beam.ParDo(self.record_dofn()))
+
+# The trigger should fire repeatedly for each newly added element,
+# and at least once for advancing the watermark to infinity.
+# The firings should accumulate the output.
+# First firing: 1 = 1
+# Second firing: 3 = 1 + 2
+# Third firing: 6 = 1 + 2 + 3
+# Fourth firing: 10 = 1 + 2 + 3 + 4
+# Fifth firing: 15 = 1 + 2 + 3 + 4 + 5
+# Next firings: 15 = 15 + 0  (advancing the watermark to infinity)
+# The exact number of firings may vary,
 
 Review comment:
   This was due to a bug fix for firing for discarding windows. The latter is 
correct. 
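
The accumulation the test comment walks through (1, 3, 6, 10, 15, then a repeat at 15) is just a running sum over the elements; a minimal sketch outside Beam:

```python
from itertools import accumulate

# Elements 1..5 arrive one at a time; in ACCUMULATING mode each early
# firing re-emits the combined sum of everything seen so far.
elements = range(1, 6)
firings = list(accumulate(elements))  # early firings: [1, 3, 6, 10, 15]

# Advancing the watermark to infinity fires at least once more with no
# new input, so the final accumulated value is emitted again.
firings.append(firings[-1])
print(firings)  # [1, 3, 6, 10, 15, 15]
```

As the comment notes, the exact number of trailing firings may vary between runners; only the accumulated values are deterministic.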
 



Issue Time Tracking
---

Worklog Id: (was: 359663)
Time Spent: 35h  (was: 34h 50m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 35h
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.





[jira] [Updated] (BEAM-7970) Regenerate Go SDK proto files in correct version

2019-12-13 Thread Daniel Oliveira (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Oliveira updated BEAM-7970:
--
Status: Open  (was: Triage Needed)

> Regenerate Go SDK proto files in correct version
> 
>
> Key: BEAM-7970
> URL: https://issues.apache.org/jira/browse/BEAM-7970
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Generated proto files in the Go SDK currently include this bit:
> {{// This is a compile-time assertion to ensure that this generated file}}
> {{// is compatible with the proto package it is being compiled against.}}
> {{// A compilation error at this line likely means your copy of the}}
> {{// proto package needs to be updated.}}
> {{const _ = proto.ProtoPackageIsVersion2 // please upgrade the proto package}}
>  
> This indicates that the protos are being generated as proto v2 for whatever 
> reason. Most likely, as mentioned by this post with someone with a similar 
> issue, because the proto generation binary needs to be rebuilt before 
> generating the files again: 
> [https://github.com/golang/protobuf/issues/449#issuecomment-340884839]
> This hasn't caused any errors so far, but might eventually cause errors if we 
> hit version differences between the v2 and v3 protos.





[jira] [Work logged] (BEAM-8837) PCollectionVisualizationTest: possible bug

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8837?focusedWorklogId=359654&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359654
 ]

ASF GitHub Bot logged work on BEAM-8837:


Author: ASF GitHub Bot
Created on: 13/Dec/19 21:14
Start Date: 13/Dec/19 21:14
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on issue #10321: [BEAM-8837] Fix 
pcoll_visualization tests
URL: https://github.com/apache/beam/pull/10321#issuecomment-565543900
 
 
   R: @udim 
   Hi Udi, could you please take another look to see if we can merge the PR? 
Thanks!
   ___
   
   Discussed offline.
   I've removed irrelevant changes for this unit test and removed usages of 
patched test PCollection data. Instead, we'll produce test data for 
visualization logic with real interactive runner pipeline runs.
 



Issue Time Tracking
---

Worklog Id: (was: 359654)
Time Spent: 2h 10m  (was: 2h)

> PCollectionVisualizationTest: possible bug
> --
>
> Key: BEAM-8837
> URL: https://issues.apache.org/jira/browse/BEAM-8837
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> This seems like a bug, even though the test passes:
> {code}
> test_display_plain_text_when_kernel_has_no_frontend 
> (apache_beam.runners.interactive.display.pcoll_visualization_test.PCollectionVisualizationTest)
>  ... Exception in thread Thread-4405:
> Traceback (most recent call last):
>   File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner
> self.run()
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/.eggs/timeloop-1.0.2-py3.7.egg/timeloop/job.py",
>  line 19, in run
> self.execute(*self.args, **self.kwargs)
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py",
>  line 132, in continuous_update_display
> updated_pv.display_facets(updating_pv=pv)
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py",
>  line 209, in display_facets
> data = self._to_dataframe()
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py",
>  line 278, in _to_dataframe
> for el in self._to_element_list():
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py",
>  line 266, in _to_element_list
> if ie.current_env().cache_manager().exists('full', self._cache_key):
> AttributeError: 'NoneType' object has no attribute 'exists'
> ok
> {code}
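The traceback boils down to the background update thread calling `.exists(...)` on a cache manager that is already `None` after test teardown. A minimal sketch of the defensive check, using hypothetical stand-in names rather than the actual Beam interactive API:

```python
class InteractiveEnv:
    """Hypothetical stand-in for the interactive environment object."""

    def __init__(self, cache_manager=None):
        self._cache_manager = cache_manager

    def cache_manager(self):
        # May legitimately be None once a test has torn the environment down.
        return self._cache_manager


def cache_key_exists(env, cache_key):
    """Guarded version of the failing call in _to_element_list."""
    cm = env.cache_manager()
    if cm is None:
        # The background thread raced with teardown: treat as "no cached data".
        return False
    return cm.exists('full', cache_key)
```

With such a guard the late-running thread degrades to a no-op instead of raising AttributeError.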



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=359653&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359653
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 13/Dec/19 21:13
Start Date: 13/Dec/19 21:13
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9915: [BEAM-7746] Add 
python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#issuecomment-565610604
 
 
   Is that not part of the precommit suite?  The tests for this PR definitely
   ran and passed.  The traceback seemed to indicate that google.protobuf is
   not installed...
   
   On Fri, Dec 13, 2019 at 11:31 AM Udi Meiri  wrote:
   
   > I believe this PR broke
   > :sdks:python:test-suites:direct:py37:hdfsIntegrationTest
   > Opened https://issues.apache.org/jira/browse/BEAM-8966
   >
   > —
   > You are receiving this because you modified the open/close state.
   > Reply to this email directly, view it on GitHub
   > 
,
   > or unsubscribe
   > 

   > .
   >
   
 



Issue Time Tracking
---

Worklog Id: (was: 359653)
Time Spent: 36h  (was: 35h 50m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 36h
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  
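As a small illustration of what PEP 484 annotations buy (an illustrative function, not Beam code): the parameter and return types below are visible to IDE completion and checkable by mypy.

```python
from typing import Iterable, List


def split_words(lines: Iterable[str]) -> List[str]:
    """Flatten lines into words; mypy rejects e.g. split_words(42)."""
    words: List[str] = []
    for line in lines:
        words.extend(line.split())
    return words
```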





[jira] [Commented] (BEAM-8968) portableWordCount test for Spark/Flink failing: jar not found

2019-12-13 Thread Kyle Weaver (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16995907#comment-16995907
 ] 

Kyle Weaver commented on BEAM-8968:
---

I suspect this started happening after switching the Beam version tag to 
2.19.0. I verified that the runners:flink:1.9:job-server:shadowJar is running 
prior to portableWordCountFlinkRunnerBatch, so the shadowJar task must not be 
(re)building the jar with the correct new name.

> portableWordCount test for Spark/Flink failing: jar not found
> -
>
> Key: BEAM-8968
> URL: https://issues.apache.org/jira/browse/BEAM-8968
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>
> This affects portableWordCountSparkRunnerBatch, 
> portableWordCountFlinkRunnerBatch, and portableWordCountFlinkRunnerStreaming.
> 22:43:23 RuntimeError: 
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/build/gradleenv/2022703441/lib/runners/flink/1.9/job-server/build/libs/beam-runners-flink-1.9-job-server-2.19.0-SNAPSHOT.jar
>  not found. Please build the server with 
> 22:43:23   cd 
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/build/gradleenv/2022703441/lib;
>  ./gradlew runners:flink:1.9:job-server:shadowJar





[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=359647&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359647
 ]

ASF GitHub Bot logged work on BEAM-8564:


Author: ASF GitHub Bot
Created on: 13/Dec/19 20:56
Start Date: 13/Dec/19 20:56
Worklog Time Spent: 10m 
  Work Description: gsteelman commented on issue #10254: [BEAM-8564] Add 
LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#issuecomment-565605851
 
 
   @amoght I don't have enough context to make the call on that, as I am very 
new to Beam. I have reached out to some others at Twitter to also review this 
change, as they will have more context. 
 



Issue Time Tracking
---

Worklog Id: (was: 359647)
Time Spent: 4h 40m  (was: 4.5h)

> Add LZO compression and decompression support
> -
>
> Key: BEAM-8564
> URL: https://issues.apache.org/jira/browse/BEAM-8564
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Amogh Tiwari
>Assignee: Amogh Tiwari
>Priority: Minor
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> LZO is a lossless data compression algorithm focused on compression and 
> decompression speed.
> This will enable the Apache Beam SDK to compress/decompress files using the 
> LZO compression algorithm. 
> This will include the following functionalities:
>  # compress() : for compressing files into an LZO archive
>  # decompress() : for decompressing files archived using LZO compression
> Appropriate input and output streams will also be added to enable working 
> with LZO files.





[jira] [Updated] (BEAM-8968) portableWordCount test for Spark/Flink failing: jar not found

2019-12-13 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver updated BEAM-8968:
--
Status: Open  (was: Triage Needed)

> portableWordCount test for Spark/Flink failing: jar not found
> -
>
> Key: BEAM-8968
> URL: https://issues.apache.org/jira/browse/BEAM-8968
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>
> This affects portableWordCountSparkRunnerBatch, 
> portableWordCountFlinkRunnerBatch, and portableWordCountFlinkRunnerStreaming.
> 22:43:23 RuntimeError: 
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/build/gradleenv/2022703441/lib/runners/flink/1.9/job-server/build/libs/beam-runners-flink-1.9-job-server-2.19.0-SNAPSHOT.jar
>  not found. Please build the server with 
> 22:43:23   cd 
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/build/gradleenv/2022703441/lib;
>  ./gradlew runners:flink:1.9:job-server:shadowJar





[jira] [Resolved] (BEAM-8273) Improve documentation for environment_type=PROCESS

2019-12-13 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver resolved BEAM-8273.
---
Fix Version/s: Not applicable
   Resolution: Fixed

> Improve documentation for environment_type=PROCESS
> --
>
> Key: BEAM-8273
> URL: https://issues.apache.org/jira/browse/BEAM-8273
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> When environment_type=PROCESS, environment_config specifies the command to 
> run the worker processes. Right now, it defaults to None and errors if not 
> set (`TypeError: expected string or buffer`).
> It might not be feasible to offer a one-size-fits-all executable for 
> providing as environment_config, but we could at least:
> a) make it easier to build one (right now I only see the executable being 
> built in a test script that depends on docker: 
> [https://github.com/apache/beam/blob/cbf8a900819c52940a0edd90f59bf6aec55c817a/sdks/python/test-suites/portable/py2/build.gradle#L146-L165])
> b) document the process
> c) link to the documentation when no environment_config is provided
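For context, the PROCESS environment's `environment_config` is commonly passed as a JSON string whose main field is the worker launch command. A small hypothetical helper for building that value (the key names here are an assumption based on common usage, not an authoritative spec — check the Beam docs):

```python
import json


def process_environment_config(command, env_vars=None):
    """Build a JSON environment_config value for environment_type=PROCESS.

    `command` is the worker boot executable; `env_vars` (optional) are extra
    environment variables for the worker process. Field names are assumed,
    not taken from an official schema.
    """
    config = {"command": command}
    if env_vars:
        config["env"] = env_vars
    return json.dumps(config)
```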





[jira] [Updated] (BEAM-8273) Improve documentation for environment_type=PROCESS

2019-12-13 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver updated BEAM-8273:
--
Summary: Improve documentation for environment_type=PROCESS  (was: Improve 
worker script for environment_type=PROCESS)

> Improve documentation for environment_type=PROCESS
> --
>
> Key: BEAM-8273
> URL: https://issues.apache.org/jira/browse/BEAM-8273
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> When environment_type=PROCESS, environment_config specifies the command to 
> run the worker processes. Right now, it defaults to None and errors if not 
> set (`TypeError: expected string or buffer`).
> It might not be feasible to offer a one-size-fits-all executable for 
> providing as environment_config, but we could at least:
> a) make it easier to build one (right now I only see the executable being 
> built in a test script that depends on docker: 
> [https://github.com/apache/beam/blob/cbf8a900819c52940a0edd90f59bf6aec55c817a/sdks/python/test-suites/portable/py2/build.gradle#L146-L165])
> b) document the process
> c) link to the documentation when no environment_config is provided





[jira] [Work logged] (BEAM-8481) Python 3.7 Postcommit test -- frequent timeouts

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8481?focusedWorklogId=359643&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359643
 ]

ASF GitHub Bot logged work on BEAM-8481:


Author: ASF GitHub Bot
Created on: 13/Dec/19 20:46
Start Date: 13/Dec/19 20:46
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #10378: [BEAM-8481] Fix a race 
condition in proto stubs generation.
URL: https://github.com/apache/beam/pull/10378#issuecomment-565602857
 
 
   Filed and assigned self 
[BEAM-8968](https://issues.apache.org/jira/browse/BEAM-8968) for missing jar 
issue.
 



Issue Time Tracking
---

Worklog Id: (was: 359643)
Time Spent: 2h 40m  (was: 2.5h)

> Python 3.7 Postcommit test -- frequent timeouts
> ---
>
> Key: BEAM-8481
> URL: https://issues.apache.org/jira/browse/BEAM-8481
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Ahmet Altay
>Assignee: Valentyn Tymofieiev
>Priority: Critical
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/job/beam_PostCommit_Python37/] – this suite 
> seems to be timing out frequently. Other suites are not affected by these 
> timeouts. Judging from the build history, the issues started before Oct 10, 
> but we cannot pinpoint the cause because the history is lost.





[jira] [Created] (BEAM-8968) portableWordCount test for Spark/Flink failing: jar not found

2019-12-13 Thread Kyle Weaver (Jira)
Kyle Weaver created BEAM-8968:
-

 Summary: portableWordCount test for Spark/Flink failing: jar not 
found
 Key: BEAM-8968
 URL: https://issues.apache.org/jira/browse/BEAM-8968
 Project: Beam
  Issue Type: Bug
  Components: runner-flink, runner-spark
Reporter: Kyle Weaver
Assignee: Kyle Weaver


This affects portableWordCountSparkRunnerBatch, 
portableWordCountFlinkRunnerBatch, and portableWordCountFlinkRunnerStreaming.

22:43:23 RuntimeError: 
/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/build/gradleenv/2022703441/lib/runners/flink/1.9/job-server/build/libs/beam-runners-flink-1.9-job-server-2.19.0-SNAPSHOT.jar
 not found. Please build the server with 
22:43:23   cd 
/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/build/gradleenv/2022703441/lib;
 ./gradlew runners:flink:1.9:job-server:shadowJar






[jira] [Commented] (BEAM-7116) Remove KV from Schema transforms

2019-12-13 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16995897#comment-16995897
 ] 

Brian Hulette commented on BEAM-7116:
-

Yep! marked it as resolved, thanks

> Remove KV from Schema transforms
> 
>
> Key: BEAM-7116
> URL: https://issues.apache.org/jira/browse/BEAM-7116
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Brian Hulette
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Instead of returning KV objects, we should return a Schema with two fields. 
> The Convert transform should be able to convert these to KV objects.





[jira] [Work logged] (BEAM-3713) Consider moving away from nose to nose2 or pytest.

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3713?focusedWorklogId=359642&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359642
 ]

ASF GitHub Bot logged work on BEAM-3713:


Author: ASF GitHub Bot
Created on: 13/Dec/19 20:43
Start Date: 13/Dec/19 20:43
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #10377: [BEAM-3713] 
pytest migration: py3x-{gcp,cython}
URL: https://github.com/apache/beam/pull/10377
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 359642)
Time Spent: 15h 40m  (was: 15.5h)

> Consider moving away from nose to nose2 or pytest.
> --
>
> Key: BEAM-3713
> URL: https://issues.apache.org/jira/browse/BEAM-3713
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: Robert Bradshaw
>Assignee: Udi Meiri
>Priority: Minor
>  Time Spent: 15h 40m
>  Remaining Estimate: 0h
>
> Per [https://nose.readthedocs.io/en/latest/], nose is in maintenance mode.





[jira] [Closed] (BEAM-8955) AvroSchemaTest.testAvroPipelineGroupBy broken on Spark runner

2019-12-13 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette closed BEAM-8955.
---
Fix Version/s: 2.19.0
   Resolution: Fixed

> AvroSchemaTest.testAvroPipelineGroupBy broken on Spark runner
> -
>
> Key: BEAM-8955
> URL: https://issues.apache.org/jira/browse/BEAM-8955
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Brian Hulette
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> https://github.com/apache/beam/pull/10151 seems to be the cause





[jira] [Resolved] (BEAM-7116) Remove KV from Schema transforms

2019-12-13 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette resolved BEAM-7116.
-
Fix Version/s: 2.19.0
   Resolution: Fixed

> Remove KV from Schema transforms
> 
>
> Key: BEAM-7116
> URL: https://issues.apache.org/jira/browse/BEAM-7116
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Brian Hulette
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Instead of returning KV objects, we should return a Schema with two fields. 
> The Convert transform should be able to convert these to KV objects.





[jira] [Resolved] (BEAM-6756) Support lazy iterables in schemas

2019-12-13 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette resolved BEAM-6756.
-
Fix Version/s: 2.18.0
   Resolution: Fixed

> Support lazy iterables in schemas
> -
>
> Key: BEAM-6756
> URL: https://issues.apache.org/jira/browse/BEAM-6756
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> The iterables returned by GroupByKey and CoGroupByKey are lazy; this allows a 
> runner to page data into memory if the full iterable is too large. We 
> currently don't support this in Schemas, so the Schema Group and CoGroup 
> transforms materialize all data into memory. We should add support for this.
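The idea behind such lazy iterables can be sketched independently of the Java implementation. The following illustrative Python sketch (hypothetical, not Beam code) yields elements page by page on demand, so the full collection is never materialized in memory:

```python
def paged(fetch_page, page_size=100):
    """Lazily yield elements from `fetch_page(start, count)` one page at a
    time; iteration stops when a fetch returns an empty page."""
    start = 0
    while True:
        page = fetch_page(start, page_size)
        if not page:
            return
        yield from page
        start += len(page)
```

A runner-backed implementation would have `fetch_page` pull the next chunk of grouped values from storage rather than from an in-memory list.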





[jira] [Work logged] (BEAM-8917) javax.annotation.Nullable is missing for org.apache.beam.sdk.schemas.FieldValueTypeInformation

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8917?focusedWorklogId=359641&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359641
 ]

ASF GitHub Bot logged work on BEAM-8917:


Author: ASF GitHub Bot
Created on: 13/Dec/19 20:41
Start Date: 13/Dec/19 20:41
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #10324: [BEAM-8917] jsr305 
dependency declaration for Nullable class
URL: https://github.com/apache/beam/pull/10324#issuecomment-565601374
 
 
   PR to fix this #10382 
 



Issue Time Tracking
---

Worklog Id: (was: 359641)
Time Spent: 11h 10m  (was: 11h)

> javax.annotation.Nullable is missing for 
> org.apache.beam.sdk.schemas.FieldValueTypeInformation
> --
>
> Key: BEAM-8917
> URL: https://issues.apache.org/jira/browse/BEAM-8917
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h
>
> This ticket is from the result of static analysis by Linkage Checker 
> ([detail|https://github.com/GoogleCloudPlatform/cloud-opensource-java/issues/1045])
> h1. Example Project
> Example project to produce an issue: 
> https://github.com/suztomo/beam-java-sdk-missing-nullable .
> I think the Maven artifact {{org.apache.beam:beam-sdks-java-core}}, which 
> contains {{org.apache.beam.sdk.schemas.FieldValueTypeInformation}}, should 
> declare the dependency to {{com.google.code.findbugs:jsr305}}.
> h1. Why there's no problem in compilation and tests of sdks/java/core?
> The compilation succeeds because the {{Nullable}} annotation is in the 
> transitive dependency of compileOnly {{spotbugs-annotations}} dependency:
> {noformat}
> compileOnly - Compile only dependencies for source set 'main'.
> ...
> +--- com.github.spotbugs:spotbugs-annotations:3.1.12
> |\--- com.google.code.findbugs:jsr305:3.0.2
> ...
> {noformat}
> The tests succeed because the {{Nullable}} annotation is in the transitive 
> dependency of {{guava-testlib}}.
> {noformat}
> testRuntime - Runtime dependencies for source set 'test' (deprecated, use 
> 'testRuntimeOnly' instead).
> ...
> +--- com.google.guava:guava-testlib:20.0
> |+--- com.google.code.findbugs:jsr305:1.3.9
> {noformat}





[jira] [Work logged] (BEAM-8967) Maven artifact beam-sdks-java-core does not have JSR305 specified as "compile"

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8967?focusedWorklogId=359640&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359640
 ]

ASF GitHub Bot logged work on BEAM-8967:


Author: ASF GitHub Bot
Created on: 13/Dec/19 20:40
Start Date: 13/Dec/19 20:40
Worklog Time Spent: 10m 
  Work Description: suztomo commented on pull request #10382: [BEAM-8967] 
Declare JSR305 dependency as 'shadow'
URL: https://github.com/apache/beam/pull/10382#discussion_r357823255
 
 

 ##
 File path: sdks/java/core/build.gradle
 ##
 @@ -69,7 +69,7 @@ dependencies {
   compile library.java.protobuf_java
   compile library.java.commons_compress
   compile library.java.commons_lang3
-  compile library.java.jsr305
+  shadow library.java.jsr305
 
 Review comment:
   
[BeamModulePlugin.groovy#L629](https://github.com/apache/beam/blob/master/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L629)
 says:
   ```
   // When the shadowClosure argument is specified, the shadow plugin is 
enabled to perform shading
   // of commonly found dependencies. Because of this it is important that 
dependencies are added
   // to the correct configuration. Dependencies should fall into one of 
these four configurations:
   //  * compile - Required during compilation or runtime of the main 
source set.
   //  This configuration represents all dependencies that 
must also be shaded away
   //  otherwise the generated Maven pom will be missing 
this dependency.
   //  * shadow  - Required during compilation or runtime of the main 
source set.
   //  Will become a runtime dependency of the generated 
Maven pom.
   ```
   
   JSR305 is needed at runtime 
([BEAM-8917](https://issues.apache.org/jira/browse/BEAM-8917)).
 



Issue Time Tracking
---

Worklog Id: (was: 359640)
Time Spent: 40m  (was: 0.5h)

> Maven artifact beam-sdks-java-core does not have JSR305 specified as "compile"
> --
>
> Key: BEAM-8967
> URL: https://issues.apache.org/jira/browse/BEAM-8967
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Maven artifact beam-sdks-java-core does not have dependencies specified as 
> "compile".
> This is a followup of [~iemejia]'s finding:
> {quote}
> Just double checked with today's SNAPSHOTs after the merge and the pom of 
> core is not modified, however the deps look good in master, not sure if the 
> change was applied before the SNAPSHOT generation, but still to double check.
> https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-core/2.19.0-SNAPSHOT/beam-sdks-java-core-2.19.0-20191213.072102-9.pom
> {quote} 
> in [jsr305 dependency declaration for Nullable 
> class|https://github.com/apache/beam/pull/10324#issuecomment-565516004].
> Other 4 dependencies are not found in the snapshot pom either:
> {code:groovy}
>   compile library.java.antlr_runtime
>   compile library.java.protobuf_java
>   compile library.java.commons_compress
>   compile library.java.commons_lang3
> {code}
> h1. Compile-declared dependencies needed at runtime?
> h2. protobuf-java
> They are shaded. For example, Beam's TextBasedReader uses 
> {{com.google.protobuf.ByteString}} from protobuf-java. The shaded ByteString 
> class is in the published JAR file:
> {noformat}
> suztomo-macbookpro44:beam suztomo$ jar tf 
> ~/Downloads/beam-sdks-java-core-2.16.0.jar |grep protobuf.ByteString
> org/apache/beam/repackaged/core/com/google/protobuf/ByteString$1.class
> {noformat}
> h2. commons-compress
> They are shaded. For example, Beam's {{org.apache.beam.sdk.io.Compression}} 
> uses 
> {{org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream}}. 
> The shaded class is in the published JAR file:
> {noformat}
> suztomo-macbookpro44:beam suztomo$ jar tf 
> ~/Downloads/beam-sdks-java-core-2.16.0.jar |grep BZip2CompressorInputStream
> org/apache/beam/repackaged/core/org/apache/commons/compress/compressors/bzip2/BZip2CompressorInputStream$Data.class
> org/apache/beam/repackaged/core/org/apache/commons/compress/compressors/bzip2/BZip2CompressorInputStream.class
> {noformat}
> h2. commons-lang3
> They are shaded. For example, Beam's 
> 

[jira] [Work logged] (BEAM-8967) Maven artifact beam-sdks-java-core does not have JSR305 specified as "compile"

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8967?focusedWorklogId=359638&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359638
 ]

ASF GitHub Bot logged work on BEAM-8967:


Author: ASF GitHub Bot
Created on: 13/Dec/19 20:40
Start Date: 13/Dec/19 20:40
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #10382: [BEAM-8967] Declare 
JSR305 dependency as 'shadow'
URL: https://github.com/apache/beam/pull/10382#issuecomment-565600986
 
 
   Run Java Postcommit
 



Issue Time Tracking
---

Worklog Id: (was: 359638)
Time Spent: 20m  (was: 10m)

> Maven artifact beam-sdks-java-core does not have JSR305 specified as "compile"
> --
>
> Key: BEAM-8967
> URL: https://issues.apache.org/jira/browse/BEAM-8967
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Maven artifact beam-sdks-java-core does not have dependencies specified as 
> "compile".
> This is a followup of [~iemejia]'s finding:
> {quote}
> Just double checked with today's SNAPSHOTs after the merge and the pom of 
> core is not modified, however the deps look good in master, not sure if the 
> change was applied before the SNAPSHOT generation, but still to double check.
> https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-core/2.19.0-SNAPSHOT/beam-sdks-java-core-2.19.0-20191213.072102-9.pom
> {quote} 
> in [jsr305 dependency declaration for Nullable 
> class|https://github.com/apache/beam/pull/10324#issuecomment-565516004].
> Other 4 dependencies are not found in the snapshot pom either:
> {code:groovy}
>   compile library.java.antlr_runtime
>   compile library.java.protobuf_java
>   compile library.java.commons_compress
>   compile library.java.commons_lang3
> {code}
> h1. Compile-declared dependencies needed at runtime?
> h2. protobuf-java
> They are shaded. For example, Beam's TextBasedReader uses 
> {{com.google.protobuf.ByteString}} from protobuf-java. The shaded ByteString 
> class is in the published JAR file:
> {noformat}
> suztomo-macbookpro44:beam suztomo$ jar tf 
> ~/Downloads/beam-sdks-java-core-2.16.0.jar |grep protobuf.ByteString
> org/apache/beam/repackaged/core/com/google/protobuf/ByteString$1.class
> {noformat}
> h2. commons-compress
> They are shaded. For example, Beam's {{org.apache.beam.sdk.io.Compression}} 
> uses 
> {{org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream}}. 
> The shaded class is in the published JAR file:
> {noformat}
> suztomo-macbookpro44:beam suztomo$ jar tf 
> ~/Downloads/beam-sdks-java-core-2.16.0.jar |grep BZip2CompressorInputStream
> org/apache/beam/repackaged/core/org/apache/commons/compress/compressors/bzip2/BZip2CompressorInputStream$Data.class
> org/apache/beam/repackaged/core/org/apache/commons/compress/compressors/bzip2/BZip2CompressorInputStream.class
> {noformat}
> h2. commons-lang3
> They are shaded. For example, Beam's 
> {{org.apache.beam.sdk.io.LocalFileSystem}} uses 
> {{org.apache.commons.lang3.SystemUtils}}. The shaded class is in the 
> published JAR file:
> {noformat}
> suztomo-macbookpro44:beam suztomo$ jar tf 
> ~/Downloads/beam-sdks-java-core-2.16.0.jar |grep SystemUtils
> org/apache/beam/repackaged/core/org/apache/commons/lang3/SystemUtils.class
> {noformat}
> h2. antlr-runtime
> Same.
> {noformat}
> suztomo-macbookpro44:beam suztomo$ jar tf 
> ~/Downloads/beam-sdks-java-core-2.16.0.jar |grep org.antlr.v4 |head
> org/apache/beam/repackaged/core/org/antlr/v4/
> org/apache/beam/repackaged/core/org/antlr/v4/runtime/
> org/apache/beam/repackaged/core/org/antlr/v4/runtime/ANTLRErrorListener.class
> {noformat}





[jira] [Work logged] (BEAM-8967) Maven artifact beam-sdks-java-core does not have JSR305 specified as "compile"

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8967?focusedWorklogId=359639&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359639
 ]

ASF GitHub Bot logged work on BEAM-8967:


Author: ASF GitHub Bot
Created on: 13/Dec/19 20:40
Start Date: 13/Dec/19 20:40
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #10382: [BEAM-8967] Declare 
JSR305 dependency as 'shadow'
URL: https://github.com/apache/beam/pull/10382#issuecomment-565601059
 
 
   Run SQL PostCommit
 



Issue Time Tracking
---

Worklog Id: (was: 359639)
Time Spent: 0.5h  (was: 20m)

> Maven artifact beam-sdks-java-core does not have JSR305 specified as "compile"
> --
>
> Key: BEAM-8967
> URL: https://issues.apache.org/jira/browse/BEAM-8967
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Maven artifact beam-sdks-java-core does not have dependencies specified as 
> "compile".
> This is a followup of [~iemejia]'s finding:
> {quote}
> Just double checked with today's SNAPSHOTs after the merge and the pom of 
> core is not modified, however the deps look good in master, not sure if the 
> change was applied before the SNAPSHOT generation, but still to double check.
> https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-core/2.19.0-SNAPSHOT/beam-sdks-java-core-2.19.0-20191213.072102-9.pom
> {quote} 
> in [jsr305 dependency declaration for Nullable 
> class|https://github.com/apache/beam/pull/10324#issuecomment-565516004].
> Other 4 dependencies are not found in the snapshot pom either:
> {code:groovy}
>   compile library.java.antlr_runtime
>   compile library.java.protobuf_java
>   compile library.java.commons_compress
>   compile library.java.commons_lang3
> {code}
> h1. Compile-declared dependencies needed at runtime?
> h2. protobuf-java
> They are shaded. For example, Beam's TextBasedReader uses 
> {{com.google.protobuf.ByteString}} from protobuf-java. The shaded ByteString 
> class is in the published JAR file:
> {noformat}
> suztomo-macbookpro44:beam suztomo$ jar tf 
> ~/Downloads/beam-sdks-java-core-2.16.0.jar |grep protobuf.ByteString
> org/apache/beam/repackaged/core/com/google/protobuf/ByteString$1.class
> {noformat}
> h2. commons-compress
> They are shaded. For example, Beam's {{org.apache.beam.sdk.io.Compression}} 
> uses 
> {{org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream}}. 
> The shaded class is in the published JAR file:
> {noformat}
> suztomo-macbookpro44:beam suztomo$ jar tf 
> ~/Downloads/beam-sdks-java-core-2.16.0.jar |grep BZip2CompressorInputStream
> org/apache/beam/repackaged/core/org/apache/commons/compress/compressors/bzip2/BZip2CompressorInputStream$Data.class
> org/apache/beam/repackaged/core/org/apache/commons/compress/compressors/bzip2/BZip2CompressorInputStream.class
> {noformat}
> h2. commons-lang3
> They are shaded. For example, Beam's 
> {{org.apache.beam.sdk.io.LocalFileSystem}} uses 
> {{org.apache.commons.lang3.SystemUtils}}. The shaded class is in the 
> published JAR file:
> {noformat}
> suztomo-macbookpro44:beam suztomo$ jar tf 
> ~/Downloads/beam-sdks-java-core-2.16.0.jar |grep SystemUtils
> org/apache/beam/repackaged/core/org/apache/commons/lang3/SystemUtils.class
> {noformat}
> h2. antlr-runtime
> Same.
> {noformat}
> suztomo-macbookpro44:beam suztomo$ jar tf 
> ~/Downloads/beam-sdks-java-core-2.16.0.jar |grep org.antlr.v4 |head
> org/apache/beam/repackaged/core/org/antlr/v4/
> org/apache/beam/repackaged/core/org/antlr/v4/runtime/
> org/apache/beam/repackaged/core/org/antlr/v4/runtime/ANTLRErrorListener.class
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
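The `jar tf ... | grep ...` checks quoted above all follow the same pattern: list the jar's entries and look for the relocated `org/apache/beam/repackaged/core/` prefix. The same check can be scripted with plain `java.util.zip` (a sketch; the jar path is a placeholder for a locally downloaded beam-sdks-java-core artifact):

```java
import java.io.IOException;
import java.util.List;
import java.util.stream.Collectors;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Rough equivalent of `jar tf <jar> | grep <needle>`. */
public class ShadedClassCheck {

    /** Returns all entry names in the jar that contain the given substring. */
    static List<String> entriesContaining(ZipFile jar, String needle) {
        return jar.stream()
                  .map(ZipEntry::getName)
                  .filter(name -> name.contains(needle))
                  .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // args[0]: a local jar, e.g. a downloaded beam-sdks-java-core-2.16.0.jar.
        try (ZipFile jar = new ZipFile(args[0])) {
            // Shaded (relocated) classes live under org/apache/beam/repackaged/core/.
            entriesContaining(jar, "repackaged/core/org/apache/commons/lang3")
                .forEach(System.out::println);
        }
    }
}
```

An empty result for a class the SDK uses at runtime would confirm the dependency is neither shaded nor declared, which is exactly the gap this issue tracks.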


[jira] [Work logged] (BEAM-8967) Maven artifact beam-sdks-java-core does not have JSR305 specified as "compile"

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8967?focusedWorklogId=359635&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359635
 ]

ASF GitHub Bot logged work on BEAM-8967:


Author: ASF GitHub Bot
Created on: 13/Dec/19 20:36
Start Date: 13/Dec/19 20:36
Worklog Time Spent: 10m 
  Work Description: suztomo commented on pull request #10382: [BEAM-8967] 
Declare JSR305 dependency as 'shadow'
URL: https://github.com/apache/beam/pull/10382
 
 
   https://issues.apache.org/jira/browse/BEAM-8967 (Followup of 
https://github.com/apache/beam/pull/10324).
   
   Because sdks/java/core uses shadowClosure, JSR305 should be declared as 
"shadow", even though Gradle does not shade it. (explanation in 
[BeamModulePlugin.groovy#L629](https://github.com/apache/beam/blob/master/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L629)).
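   The distinction can be sketched with a minimal Gradle fragment (hypothetical
   project, reusing the `library.java.*` aliases quoted elsewhere in this thread;
   not Beam's actual build script): `compile` dependencies are the ones the shadow
   closure relocates into the jar, while a `shadow` dependency is left external
   but still declared in the published pom so consumers can resolve it.

```groovy
dependencies {
  // Relocated into the shaded jar under org.apache.beam.repackaged.*
  compile library.java.protobuf_java

  // Not relocated; stays an external dependency and is written into the
  // published pom so consumers still resolve the javax.annotation classes.
  shadow library.java.jsr305
}
```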
   
   CC: @iemejia, @lukecwik, @Ardagan (2.17 release manager), @udim (2.18 
release manager)
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 

[jira] [Assigned] (BEAM-8967) Maven artifact beam-sdks-java-core does not have JSR305 specified as "compile"

2019-12-13 Thread Tomo Suzuki (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomo Suzuki reassigned BEAM-8967:
-

Assignee: Tomo Suzuki




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8967) Maven artifact beam-sdks-java-core does not have JSR305 specified as "compile"

2019-12-13 Thread Tomo Suzuki (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomo Suzuki updated BEAM-8967:
--
Summary: Maven artifact beam-sdks-java-core does not have JSR305 specified 
as "compile"  (was: Maven artifact beam-sdks-java-core does not have 
dependencies specified as "compile")




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8967) Maven artifact beam-sdks-java-core does not have dependencies specified as "compile"

2019-12-13 Thread Tomo Suzuki (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomo Suzuki updated BEAM-8967:
--
Description: 
Maven artifact beam-sdks-java-core does not have dependencies specified as 
"compile".

This is a followup of [~iemejia]'s finding:

{quote}
Just double checked with today's SNAPSHOTs after the merge and the pom of core 
is not modified, however the deps look good in master, not sure if the change 
was applied before the SNAPSHOT generation, but still to double check.
https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-core/2.19.0-SNAPSHOT/beam-sdks-java-core-2.19.0-20191213.072102-9.pom
{quote} 

in [jsr305 dependency declaration for Nullable 
class|https://github.com/apache/beam/pull/10324#issuecomment-565516004].

Other 4 dependencies are not found in the snapshot pom either:

{code:groovy}
  compile library.java.antlr_runtime
  compile library.java.protobuf_java
  compile library.java.commons_compress
  compile library.java.commons_lang3
{code}

h1. Compile-declared dependencies needed at runtime?

h2. protobuf-java

They are shaded. For example, Beam's TextBasedReader uses 
{{com.google.protobuf.ByteString}} from protobuf-java. The shaded ByteString 
class is in the published JAR file:

{noformat}
suztomo-macbookpro44:beam suztomo$ jar tf 
~/Downloads/beam-sdks-java-core-2.16.0.jar |grep protobuf.ByteString
org/apache/beam/repackaged/core/com/google/protobuf/ByteString$1.class
{noformat}

h2. commons-compress

They are shaded. For example, Beam's {{org.apache.beam.sdk.io.Compression}} 
uses 
{{org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream}}. 
The shaded class is in the published JAR file:

{noformat}
suztomo-macbookpro44:beam suztomo$ jar tf 
~/Downloads/beam-sdks-java-core-2.16.0.jar |grep BZip2CompressorInputStream
org/apache/beam/repackaged/core/org/apache/commons/compress/compressors/bzip2/BZip2CompressorInputStream$Data.class
org/apache/beam/repackaged/core/org/apache/commons/compress/compressors/bzip2/BZip2CompressorInputStream.class
{noformat}

h2. commons-lang3

They are shaded. For example, Beam's {{org.apache.beam.sdk.io.LocalFileSystem}} 
uses {{org.apache.commons.lang3.SystemUtils}}. The shaded class is in the 
published JAR file:

{noformat}
suztomo-macbookpro44:beam suztomo$ jar tf 
~/Downloads/beam-sdks-java-core-2.16.0.jar |grep SystemUtils
org/apache/beam/repackaged/core/org/apache/commons/lang3/SystemUtils.class
{noformat}

h2. antlr-runtime

Same.

{noformat}
suztomo-macbookpro44:beam suztomo$ jar tf 
~/Downloads/beam-sdks-java-core-2.16.0.jar |grep org.antlr.v4 |head
org/apache/beam/repackaged/core/org/antlr/v4/
org/apache/beam/repackaged/core/org/antlr/v4/runtime/
org/apache/beam/repackaged/core/org/antlr/v4/runtime/ANTLRErrorListener.class
{noformat}







  was:
Maven artifact beam-sdks-java-core does not have dependencies specified as 
"compile".

This is a followup of [~iemejia]'s finding:

{quote}
Just double checked with today's SNAPSHOTs after the merge and the pom of core 
is not modified, however the deps look good in master, not sure if the change 
was applied before the SNAPSHOT generation, but still to double check.
https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-core/2.19.0-SNAPSHOT/beam-sdks-java-core-2.19.0-20191213.072102-9.pom
{quote} 

in [jsr305 dependency declaration for Nullable 
class|https://github.com/apache/beam/pull/10324#issuecomment-565516004].

Other 4 dependencies are not found in the snapshot pom either:

{code:groovy}
  compile library.java.antlr_runtime
  compile library.java.protobuf_java
  compile library.java.commons_compress
  compile library.java.commons_lang3
{code}

h1. Compile-declared dependencies needed at runtime?

h2. protobuf-java

They are shaded. For example, Beam's TextBasedReader uses 
{{com.google.protobuf.ByteString}} from protobuf-java. The shaded ByteString 
class is in the published JAR file:

{noformat}
suztomo-macbookpro44:beam suztomo$ jar tf 
~/Downloads/beam-sdks-java-core-2.16.0.jar |grep protobuf.ByteString
org/apache/beam/repackaged/core/com/google/protobuf/ByteString$1.class
{noformat}

h2. commons-compress

They are shaded. For example, Beam's {{org.apache.beam.sdk.io.Compression}} 
uses 
{{org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream}}. 
The shaded class is in the published JAR file:

{noformat}
suztomo-macbookpro44:beam suztomo$ jar tf 
~/Downloads/beam-sdks-java-core-2.16.0.jar |grep BZip2CompressorInputStream
org/apache/beam/repackaged/core/org/apache/commons/compress/compressors/bzip2/BZip2CompressorInputStream$Data.class
org/apache/beam/repackaged/core/org/apache/commons/compress/compressors/bzip2/BZip2CompressorInputStream.class
{noformat}

h2. commons-lang3

They are shaded. For example, Beam's {{org.apache.beam.sdk.io.LocalFileSystem}} 
uses 

[jira] [Updated] (BEAM-8967) Maven artifact beam-sdks-java-core does not have dependencies specified as "compile"

2019-12-13 Thread Tomo Suzuki (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomo Suzuki updated BEAM-8967:
--
Description: 
Maven artifact beam-sdks-java-core does not have dependencies specified as 
"compile".

This is a followup of [~iemejia]'s finding:

{quote}
Just double checked with today's SNAPSHOTs after the merge and the pom of core 
is not modified, however the deps look good in master, not sure if the change 
was applied before the SNAPSHOT generation, but still to double check.
https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-core/2.19.0-SNAPSHOT/beam-sdks-java-core-2.19.0-20191213.072102-9.pom
{quote} 

in [jsr305 dependency declaration for Nullable 
class|https://github.com/apache/beam/pull/10324#issuecomment-565516004].

Other 4 dependencies are not found in the snapshot pom either:

{code:groovy}
  compile library.java.antlr_runtime
  compile library.java.protobuf_java
  compile library.java.commons_compress
  compile library.java.commons_lang3
{code}

h1. Compile-declared dependencies needed at runtime?

h2. protobuf-java

They are shaded. For example, Beam's TextBasedReader uses 
{{com.google.protobuf.ByteString}} from protobuf-java. The shaded ByteString 
class is in the published JAR file:

{noformat}
suztomo-macbookpro44:beam suztomo$ jar tf 
~/Downloads/beam-sdks-java-core-2.16.0.jar |grep protobuf.ByteString
org/apache/beam/repackaged/core/com/google/protobuf/ByteString$1.class
{noformat}

h2. commons-compress

They are shaded. For example, Beam's {{org.apache.beam.sdk.io.Compression}} 
uses 
{{org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream}}. 
The shaded class is in the published JAR file:

{noformat}
suztomo-macbookpro44:beam suztomo$ jar tf 
~/Downloads/beam-sdks-java-core-2.16.0.jar |grep BZip2CompressorInputStream
org/apache/beam/repackaged/core/org/apache/commons/compress/compressors/bzip2/BZip2CompressorInputStream$Data.class
org/apache/beam/repackaged/core/org/apache/commons/compress/compressors/bzip2/BZip2CompressorInputStream.class
{noformat}

h2. commons-lang3

They are shaded. For example, Beam's {{org.apache.beam.sdk.io.LocalFileSystem}} 
uses {{org.apache.commons.lang3.SystemUtils}}. The shaded class is in the 
published JAR file:

{noformat}
suztomo-macbookpro44:beam suztomo$ jar tf 
~/Downloads/beam-sdks-java-core-2.16.0.jar |grep SystemUtils
org/apache/beam/repackaged/core/org/apache/commons/lang3/SystemUtils.class
{noformat}






  was:
Maven artifact beam-sdks-java-core does not have dependencies specified as 
"compile".

This is a followup of [~iemejia]'s finding:

{quote}
Just double checked with today's SNAPSHOTs after the merge and the pom of core 
is not modified, however the deps look good in master, not sure if the change 
was applied before the SNAPSHOT generation, but still to double check.
https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-core/2.19.0-SNAPSHOT/beam-sdks-java-core-2.19.0-20191213.072102-9.pom
{quote} 

in [jsr305 dependency declaration for Nullable 
class|https://github.com/apache/beam/pull/10324#issuecomment-565516004].

Other 4 dependencies are not found in the snapshot pom either:

{code:groovy}
  compile library.java.antlr_runtime
  compile library.java.protobuf_java
  compile library.java.commons_compress
  compile library.java.commons_lang3
{code}

h1. Compile-declared dependencies needed at runtime?

h2. protobuf-java: Yes

TextBasedReader uses {{com.google.protobuf.ByteString}} from protobuf-java.

h2. commons-compress: Yes

{{org.apache.beam.sdk.io.Compression}} uses 
{{org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream}}.

h2. commons-lang3: Yes

{{org.apache.beam.sdk.io.LocalFileSystem}} uses 
{{org.apache.commons.lang3.SystemUtils}}





[jira] [Updated] (BEAM-8967) Maven artifact beam-sdks-java-core does not have dependencies specified as "compile"

2019-12-13 Thread Tomo Suzuki (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomo Suzuki updated BEAM-8967:
--
Description: 
Maven artifact beam-sdks-java-core does not have dependencies specified as 
"compile".

This is a followup of [~iemejia]'s finding:

{quote}
Just double checked with today's SNAPSHOTs after the merge and the pom of core 
is not modified, however the deps look good in master, not sure if the change 
was applied before the SNAPSHOT generation, but still to double check.
https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-core/2.19.0-SNAPSHOT/beam-sdks-java-core-2.19.0-20191213.072102-9.pom
{quote} 

in [jsr305 dependency declaration for Nullable 
class|https://github.com/apache/beam/pull/10324#issuecomment-565516004].

Other 4 dependencies are not found in the snapshot pom either:

{code:groovy}
  compile library.java.antlr_runtime
  compile library.java.protobuf_java
  compile library.java.commons_compress
  compile library.java.commons_lang3
{code}

h1. Compile-declared dependencies needed at runtime?

h2. protobuf-java: Yes

TextBasedReader uses {{com.google.protobuf.ByteString}} from protobuf-java.

h2. commons-compress: Yes

{{org.apache.beam.sdk.io.Compression}} uses 
{{org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream}}.

h2. commons-lang3: Yes

{{org.apache.beam.sdk.io.LocalFileSystem}} uses 
{{org.apache.commons.lang3.SystemUtils}}



  was:
Maven artifact beam-sdks-java-core does not have dependencies specified as 
"compile".

This is a followup of [~iemejia]'s finding:

{quote}
Just double checked with today's SNAPSHOTs after the merge and the pom of core 
is not modified, however the deps look good in master, not sure if the change 
was applied before the SNAPSHOT generation, but still to double check.
https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-core/2.19.0-SNAPSHOT/beam-sdks-java-core-2.19.0-20191213.072102-9.pom
{quote} 

in [jsr305 dependency declaration for Nullable 
class|https://github.com/apache/beam/pull/10324#issuecomment-565516004].

Other 3 dependencies are not found in the snapshot pom either:

{code:groovy}
  compile library.java.antlr_runtime
  compile library.java.protobuf_java
  compile library.java.commons_compress
  compile library.java.commons_lang3
{code}







--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8933) BigQuery IO should support read/write in Arrow format

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8933?focusedWorklogId=359618&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359618
 ]

ASF GitHub Bot logged work on BEAM-8933:


Author: ASF GitHub Bot
Created on: 13/Dec/19 20:12
Start Date: 13/Dec/19 20:12
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #10369: [BEAM-8933] 
BigQueryIO Arrow for read
URL: https://github.com/apache/beam/pull/10369#issuecomment-565591991
 
 
   My `ArrowUtils` addition seems to have caused some mysterious failures in 
the spark runner tests (in Java PreCommit).
   
   From 
[`org.apache.beam.runners.spark.CacheTest.shouldCacheTest`](https://builds.apache.org/job/beam_PreCommit_Java_Commit/9227/testReport/junit/org.apache.beam.runners.spark/CacheTest/shouldCacheTest/)
   ```
   java.lang.NoSuchMethodError: io.netty.util.internal.ReflectionUtil.trySetAccessible(Ljava/lang/reflect/AccessibleObject;)Ljava/lang/Throwable;
       at io.netty.channel.nio.NioEventLoop$5.run(NioEventLoop.java:217)
       at java.security.AccessController.doPrivileged(Native Method)
       at io.netty.channel.nio.NioEventLoop.openSelector(NioEventLoop.java:210)
       at io.netty.channel.nio.NioEventLoop.<init>(NioEventLoop.java:149)
       at io.netty.channel.nio.NioEventLoopGroup.newChild(NioEventLoopGroup.java:127)
       at io.netty.channel.nio.NioEventLoopGroup.newChild(NioEventLoopGroup.java:36)
       at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:84)
       at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:58)
       at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:47)
       at io.netty.channel.MultithreadEventLoopGroup.<init>(MultithreadEventLoopGroup.java:59)
       at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:77)
       at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:72)
       at io.netty.channel.nio.NioEventLoopGroup.<init>(NioEventLoopGroup.java:59)
       at org.apache.spark.network.util.NettyUtils.createEventLoop(NettyUtils.java:50)
       at org.apache.spark.network.client.TransportClientFactory.<init>(TransportClientFactory.java:102)
       at org.apache.spark.network.TransportContext.createClientFactory(TransportContext.java:99)
       at org.apache.spark.rpc.netty.NettyRpcEnv.<init>(NettyRpcEnv.scala:71)
   ```
   
I think this must be because I added Arrow as a dependency for 
`:sdks:java:core`.
   
   @kennknowles do you have any idea why this would happen? Is there something 
we need to re-run when updating core java dependencies?
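   A `NoSuchMethodError` like this usually means two different netty versions
   ended up on the runtime classpath and the older one won. A generic diagnostic
   (a sketch, not specific to Beam or Spark; the class name is taken from the
   stack trace above) is to ask a class's `ProtectionDomain` where it was loaded
   from:

   ```java
   /** Prints the jar (or directory) a class was loaded from, to spot version conflicts. */
   public class WhichJar {

       /** Returns the code-source location of a class, or a marker for bootstrap classes. */
       static String locationOf(Class<?> cls) {
           java.security.CodeSource src = cls.getProtectionDomain().getCodeSource();
           return src == null ? "<bootstrap or unknown>" : String.valueOf(src.getLocation());
       }

       public static void main(String[] args) throws ClassNotFoundException {
           // Run on the failing test's classpath to see which netty jar supplied
           // the class whose method signature no longer matched.
           System.out.println(locationOf(Class.forName("io.netty.util.internal.ReflectionUtil")));
       }
   }
   ```

   Comparing that location against the output of Gradle's `dependencies` report
   for the Spark runner module would narrow down which dependency pulled in the
   conflicting version.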
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 359618)
Time Spent: 2.5h  (was: 2h 20m)

> BigQuery IO should support read/write in Arrow format
> -
>
> Key: BEAM-8933
> URL: https://issues.apache.org/jira/browse/BEAM-8933
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> As of right now BigQuery uses Avro format for reading and writing.
> We should add a config to BigQueryIO to specify which format to use (with 
> Avro as default).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8967) Maven artifact beam-sdks-java-core does not have dependencies specified as "compile"

2019-12-13 Thread Tomo Suzuki (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomo Suzuki updated BEAM-8967:
--
Description: 
Maven artifact beam-sdks-java-core does not have dependencies specified as 
"compile".

This is a followup of [~iemejia]'s finding:

{quote}
Just double checked with today's SNAPSHOTs after the merge and the pom of core 
is not modified, however the deps look good in master, not sure if the change 
was applied before the SNAPSHOT generation, but still to double check.
https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-core/2.19.0-SNAPSHOT/beam-sdks-java-core-2.19.0-20191213.072102-9.pom
{quote} 

in [jsr305 dependency declaration for Nullable 
class|https://github.com/apache/beam/pull/10324#issuecomment-565516004].

The other dependencies below are not found in the snapshot pom either:

{code:groovy}
  compile library.java.antlr_runtime
  compile library.java.protobuf_java
  compile library.java.commons_compress
  compile library.java.commons_lang3
{code}



  was:
Maven artifact beam-sdks-java-core does not have dependencies specified as 
"compile".

This is a followup of [~iemejia]'s finding:

{quote}
Just double checked with today's SNAPSHOTs after the merge and the pom of core 
is not modified, however the deps look good in master, not sure if the change 
was applied before the SNAPSHOT generation, but still to double check.
https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-core/2.19.0-SNAPSHOT/beam-sdks-java-core-2.19.0-20191213.072102-9.pom
{quote} 

in [jsr305 dependency declaration for Nullable 
class|https://github.com/apache/beam/pull/10324#issuecomment-565516004]



> Maven artifact beam-sdks-java-core does not have dependencies specified as 
> "compile"
> 
>
> Key: BEAM-8967
> URL: https://issues.apache.org/jira/browse/BEAM-8967
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Tomo Suzuki
>Priority: Major
>
> Maven artifact beam-sdks-java-core does not have dependencies specified as 
> "compile".
> This is a followup of [~iemejia]'s finding:
> {quote}
> Just double checked with today's SNAPSHOTs after the merge and the pom of 
> core is not modified, however the deps look good in master, not sure if the 
> change was applied before the SNAPSHOT generation, but still to double check.
> https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-core/2.19.0-SNAPSHOT/beam-sdks-java-core-2.19.0-20191213.072102-9.pom
> {quote} 
> in [jsr305 dependency declaration for Nullable 
> class|https://github.com/apache/beam/pull/10324#issuecomment-565516004].
> The other dependencies below are not found in the snapshot pom either:
> {code:groovy}
>   compile library.java.antlr_runtime
>   compile library.java.protobuf_java
>   compile library.java.commons_compress
>   compile library.java.commons_lang3
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-8967) Maven artifact beam-sdks-java-core does not have dependencies specified as "compile"

2019-12-13 Thread Tomo Suzuki (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16995860#comment-16995860
 ] 

Tomo Suzuki edited comment on BEAM-8967 at 12/13/19 8:05 PM:
-

Suspecting around this piece of code in BeamModulePlugin.groovy:

{noformat}
            // TODO: Should we use the runtime scope instead of the compile scope
            // which forces all our consumers to declare what they consume?
            generateDependenciesFromConfiguration(
                    configuration: (configuration.shadowClosure ? 'shadow' : 'compile'), scope: 'compile')
            generateDependenciesFromConfiguration(configuration: 'provided', scope: 'provided')
{noformat}

ShadowClosure field of a project?


{code:java}
/**
 * If unset, no shading is performed. The jar and test jar archives are used during publishing.
 * Otherwise the shadowJar and shadowTestJar artifacts are used during publishing.
 *
 * The shadowJar / shadowTestJar tasks execute the specified closure to configure themselves.
 */
Closure shadowClosure;
{code}


Found the root cause, documented at [BeamModulePlugin.groovy line 
629|https://github.com/apache/beam/blob/master/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L629]:
 because sdks/java/core uses shadowClosure, JSR305 (which is needed at runtime) 
must be declared as "shadow", even though it does not actually need to be shaded.


{code:java}
// When the shadowClosure argument is specified, the shadow plugin is enabled to perform shading
// of commonly found dependencies. Because of this it is important that dependencies are added
// to the correct configuration. Dependencies should fall into one of these four configurations:
//  * compile - Required during compilation or runtime of the main source set.
//              This configuration represents all dependencies that must also be shaded away
//              otherwise the generated Maven pom will be missing this dependency.
//  * shadow  - Required during compilation or runtime of the main source set.
//              Will become a runtime dependency of the generated Maven pom.
{code}
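
Under that scheme, a runtime dependency of a shadowed project has to live in the 
shadow configuration to survive into the published pom. A hypothetical Gradle 
sketch of the fix (illustrative only, not the actual Beam patch; the closure 
body passed to applyJavaNature is an assumption):

```groovy
// Illustrative sdks/java/core build.gradle fragment -- not the actual patch.
// Because the project passes shadowClosure, only 'shadow' dependencies become
// runtime dependencies of the generated Maven pom; 'compile' dependencies are
// shaded away and therefore vanish from the pom.
applyJavaNature(shadowClosure: {})  // shading enabled for this project

dependencies {
  // Needed by consumers at runtime (e.g. @Nullable), so it must be 'shadow'
  // to appear in the published pom:
  shadow library.java.jsr305
  // Shaded into the jar, intentionally absent from the pom:
  compile library.java.guava
}
```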





was (Author: suztomo):
Suspecting around this piece of code in BeamModulePlugin.groovy:

{noformat}
            // TODO: Should we use the runtime scope instead of the compile scope
            // which forces all our consumers to declare what they consume?
            generateDependenciesFromConfiguration(
                    configuration: (configuration.shadowClosure ? 'shadow' : 'compile'), scope: 'compile')
            generateDependenciesFromConfiguration(configuration: 'provided', scope: 'provided')
{noformat}

ShadowClosure field of a project?


{noformat}
/**
 * If unset, no shading is performed. The jar and test jar archives are used during publishing.
 * Otherwise the shadowJar and shadowTestJar artifacts are used during publishing.
 *
 * The shadowJar / shadowTestJar tasks execute the specified closure to configure themselves.
 */
Closure shadowClosure;
{noformat}



> Maven artifact beam-sdks-java-core does not have dependencies specified as 
> "compile"
> 
>
> Key: BEAM-8967
> URL: https://issues.apache.org/jira/browse/BEAM-8967
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Tomo Suzuki
>Priority: Major
>
> Maven artifact beam-sdks-java-core does not have dependencies specified as 
> "compile".
> This is a followup of [~iemejia]'s finding:
> {quote}
> Just double checked with today's SNAPSHOTs after the merge and the pom of 
> core is not modified, however the deps look good in master, not sure if the 
> change was applied before the SNAPSHOT generation, but still to double check.
> https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-core/2.19.0-SNAPSHOT/beam-sdks-java-core-2.19.0-20191213.072102-9.pom
> {quote} 
> in [jsr305 dependency declaration for Nullable 
> class|https://github.com/apache/beam/pull/10324#issuecomment-565516004]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-8967) Maven artifact beam-sdks-java-core does not have dependencies specified as "compile"

2019-12-13 Thread Tomo Suzuki (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16995860#comment-16995860
 ] 

Tomo Suzuki edited comment on BEAM-8967 at 12/13/19 7:52 PM:
-

Suspecting around this piece of code in BeamModulePlugin.groovy:

{noformat}
            // TODO: Should we use the runtime scope instead of the compile scope
            // which forces all our consumers to declare what they consume?
            generateDependenciesFromConfiguration(
                    configuration: (configuration.shadowClosure ? 'shadow' : 'compile'), scope: 'compile')
            generateDependenciesFromConfiguration(configuration: 'provided', scope: 'provided')
{noformat}

ShadowClosure field of a project?


{noformat}
/**
 * If unset, no shading is performed. The jar and test jar archives are used during publishing.
 * Otherwise the shadowJar and shadowTestJar artifacts are used during publishing.
 *
 * The shadowJar / shadowTestJar tasks execute the specified closure to configure themselves.
 */
Closure shadowClosure;
{noformat}




was (Author: suztomo):
Suspecting around this piece of code in BeamModulePlugin.groovy:

{noformat}
            // TODO: Should we use the runtime scope instead of the compile scope
            // which forces all our consumers to declare what they consume?
            generateDependenciesFromConfiguration(
                    configuration: (configuration.shadowClosure ? 'shadow' : 'compile'), scope: 'compile')
            generateDependenciesFromConfiguration(configuration: 'provided', scope: 'provided')
{noformat}


> Maven artifact beam-sdks-java-core does not have dependencies specified as 
> "compile"
> 
>
> Key: BEAM-8967
> URL: https://issues.apache.org/jira/browse/BEAM-8967
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Tomo Suzuki
>Priority: Major
>
> Maven artifact beam-sdks-java-core does not have dependencies specified as 
> "compile".
> This is a followup of [~iemejia]'s finding:
> {quote}
> Just double checked with today's SNAPSHOTs after the merge and the pom of 
> core is not modified, however the deps look good in master, not sure if the 
> change was applied before the SNAPSHOT generation, but still to double check.
> https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-core/2.19.0-SNAPSHOT/beam-sdks-java-core-2.19.0-20191213.072102-9.pom
> {quote} 
> in [jsr305 dependency declaration for Nullable 
> class|https://github.com/apache/beam/pull/10324#issuecomment-565516004]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8967) Maven artifact beam-sdks-java-core does not have dependencies specified as "compile"

2019-12-13 Thread Tomo Suzuki (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16995860#comment-16995860
 ] 

Tomo Suzuki commented on BEAM-8967:
---

Suspecting around this piece of code in BeamModulePlugin.groovy:

{noformat}
            // TODO: Should we use the runtime scope instead of the compile scope
            // which forces all our consumers to declare what they consume?
            generateDependenciesFromConfiguration(
                    configuration: (configuration.shadowClosure ? 'shadow' : 'compile'), scope: 'compile')
            generateDependenciesFromConfiguration(configuration: 'provided', scope: 'provided')
{noformat}


> Maven artifact beam-sdks-java-core does not have dependencies specified as 
> "compile"
> 
>
> Key: BEAM-8967
> URL: https://issues.apache.org/jira/browse/BEAM-8967
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Tomo Suzuki
>Priority: Major
>
> Maven artifact beam-sdks-java-core does not have dependencies specified as 
> "compile".
> This is a followup of [~iemejia]'s finding:
> {quote}
> Just double checked with today's SNAPSHOTs after the merge and the pom of 
> core is not modified, however the deps look good in master, not sure if the 
> change was applied before the SNAPSHOT generation, but still to double check.
> https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-core/2.19.0-SNAPSHOT/beam-sdks-java-core-2.19.0-20191213.072102-9.pom
> {quote} 
> in [jsr305 dependency declaration for Nullable 
> class|https://github.com/apache/beam/pull/10324#issuecomment-565516004]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8967) Maven artifact beam-sdks-java-core does not have dependencies specified as "compile"

2019-12-13 Thread Tomo Suzuki (Jira)
Tomo Suzuki created BEAM-8967:
-

 Summary: Maven artifact beam-sdks-java-core does not have 
dependencies specified as "compile"
 Key: BEAM-8967
 URL: https://issues.apache.org/jira/browse/BEAM-8967
 Project: Beam
  Issue Type: Improvement
  Components: build-system
Reporter: Tomo Suzuki


Maven artifact beam-sdks-java-core does not have dependencies specified as 
"compile".

This is a followup of [~iemejia]'s finding:

{quote}
Just double checked with today's SNAPSHOTs after the merge and the pom of core 
is not modified, however the deps look good in master, not sure if the change 
was applied before the SNAPSHOT generation, but still to double check.
https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-core/2.19.0-SNAPSHOT/beam-sdks-java-core-2.19.0-20191213.072102-9.pom
{quote} 

in [jsr305 dependency declaration for Nullable 
class|https://github.com/apache/beam/pull/10324#issuecomment-565516004]




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8825) OOM when writing large numbers of 'narrow' rows

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8825?focusedWorklogId=359613&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359613
 ]

ASF GitHub Bot logged work on BEAM-8825:


Author: ASF GitHub Bot
Created on: 13/Dec/19 19:39
Start Date: 13/Dec/19 19:39
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #10380: [BEAM-8825] Add limit 
on number of mutated rows to batching/sorting stages.
URL: https://github.com/apache/beam/pull/10380#issuecomment-565580084
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 359613)
Time Spent: 1.5h  (was: 1h 20m)

> OOM when writing large numbers of 'narrow' rows
> ---
>
> Key: BEAM-8825
> URL: https://issues.apache.org/jira/browse/BEAM-8825
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.9.0, 2.10.0, 2.11.0, 2.12.0, 2.13.0, 2.14.0, 2.15.0, 
> 2.16.0, 2.17.0
>Reporter: Niel Markwick
>Assignee: Niel Markwick
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> SpannerIO can OOM when writing large numbers of 'narrow' rows. 
>  
> SpannerIO puts input mutation elements into batches for efficient writing.
> These batches are limited by the number of cells mutated and the size of data 
> written (5,000 cells, 1 MB of data). SpannerIO groups enough mutations to 
> build 1,000 of these batches (5M cells, 1 GB of data), then sorts and batches 
> them.
> When the number of cells and the size of data per mutation are very small 
> (<5 cells, <100 bytes), the memory overhead of storing millions of mutations 
> for batching is significant and can lead to OOMs.
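
The batching described above can be sketched in plain Python (a hypothetical 
illustration of the per-batch limits, not SpannerIO's actual implementation):

```python
# Hypothetical sketch of SpannerIO-style batching: group mutations into
# batches capped by cell count and byte size, mirroring the limits the
# issue describes (5,000 cells, 1 MB per batch).
def batch_mutations(mutations, max_cells=5000, max_bytes=1_000_000):
    """Yield lists of (cells, size_bytes) tuples without exceeding the caps."""
    batch, cells, size = [], 0, 0
    for m_cells, m_bytes in mutations:
        # Flush the current batch if adding this mutation would exceed a cap.
        if batch and (cells + m_cells > max_cells or size + m_bytes > max_bytes):
            yield batch
            batch, cells, size = [], 0, 0
        batch.append((m_cells, m_bytes))
        cells += m_cells
        size += m_bytes
    if batch:
        yield batch

# 10,000 "narrow" rows (2 cells, 40 bytes each): every batch fills up by cell
# count (2,500 mutations per batch) long before the byte cap is reached, so
# the grouping stage holds huge numbers of tiny objects -- the memory overhead
# the issue describes.
narrow = [(2, 40)] * 10_000
batches = list(batch_mutations(narrow))
```

With millions of such rows instead of 10,000, each batch still fills by cell 
count, so the sorting/grouping stage must keep millions of small mutation 
objects in memory at once.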



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8966) failure in :sdks:python:test-suites:direct:py37:hdfsIntegrationTest

2019-12-13 Thread Udi Meiri (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udi Meiri updated BEAM-8966:

Description: 
I believe this is due to https://github.com/apache/beam/pull/9915

{code}
Collecting mypy-protobuf==1.12
  Using cached 
https://files.pythonhosted.org/packages/b6/28/041dea47c93564bfc0ece050362894292ec4f173caa92fa82994a6d061d1/mypy_protobuf-1.12-py3-none-any.whl
Installing collected packages: mypy-protobuf
Successfully installed mypy-protobuf-1.12
beam_fn_api.proto: warning: Import google/protobuf/descriptor.proto but 
not used.
beam_fn_api.proto: warning: Import google/protobuf/wrappers.proto but not used.
Traceback (most recent call last):
  File "/usr/local/bin/protoc-gen-mypy", line 13, in <module>
import google.protobuf.descriptor_pb2 as d
ModuleNotFoundError: No module named 'google'
--mypy_out: protoc-gen-mypy: Plugin failed with status code 1.
Process Process-1:
Traceback (most recent call last):
  File "/app/sdks/python/gen_protos.py", line 104, in generate_proto_files
from grpc_tools import protoc
ModuleNotFoundError: No module named 'grpc_tools'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/multiprocessing/process.py", line 297, in 
_bootstrap
self.run()
  File "/usr/local/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
  File "/app/sdks/python/gen_protos.py", line 189, in 
_install_grpcio_tools_and_generate_proto_files
generate_proto_files()
  File "/app/sdks/python/gen_protos.py", line 144, in generate_proto_files
'%s' % ret_code)
RuntimeError: Protoc returned non-zero status (see logs for details): 1
Traceback (most recent call last):
  File "/app/sdks/python/gen_protos.py", line 104, in generate_proto_files
from grpc_tools import protoc
ModuleNotFoundError: No module named 'grpc_tools'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "setup.py", line 295, in <module>
'mypy': generate_protos_first(mypy),
  File "/usr/local/lib/python3.7/site-packages/setuptools/__init__.py", line 
145, in setup
return distutils.core.setup(**attrs)
  File "/usr/local/lib/python3.7/distutils/core.py", line 148, in setup
dist.run_commands()
  File "/usr/local/lib/python3.7/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
  File "/usr/local/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
  File "/usr/local/lib/python3.7/site-packages/setuptools/command/sdist.py", 
line 44, in run
self.run_command('egg_info')
  File "/usr/local/lib/python3.7/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
  File "/usr/local/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
  File "setup.py", line 220, in run
gen_protos.generate_proto_files(log=log)
  File "/app/sdks/python/gen_protos.py", line 121, in generate_proto_files
raise ValueError("Proto generation failed (see log for details).")
ValueError: Proto generation failed (see log for details).
Service 'test' failed to build: The command '/bin/sh -c cd sdks/python &&   
  python setup.py sdist && pip install --no-cache-dir $(ls 
dist/apache-beam-*.tar.gz | tail -n1)[gcp]' returned a non-zero code: 1
{code}
https://builds.apache.org/job/beam_PostCommit_Python37/1114/consoleText

  was:
I believe this is due to https://github.com/apache/beam/pull/9915

{code}
Collecting mypy-protobuf==1.12
  Using cached 
https://files.pythonhosted.org/packages/b6/28/041dea47c93564bfc0ece050362894292ec4f173caa92fa82994a6d061d1/mypy_protobuf-1.12-py3-none-any.whl
Installing collected packages: mypy-protobuf
Successfully installed mypy-protobuf-1.12
beam_fn_api.proto: warning: Import google/protobuf/descriptor.proto but 
not used.
beam_fn_api.proto: warning: Import google/protobuf/wrappers.proto but not used.
Traceback (most recent call last):
  File "/usr/local/bin/protoc-gen-mypy", line 13, in <module>
import google.protobuf.descriptor_pb2 as d
ModuleNotFoundError: No module named 'google'
--mypy_out: protoc-gen-mypy: Plugin failed with status code 1.
Process Process-1:
Traceback (most recent call last):
  File "/app/sdks/python/gen_protos.py", line 104, in generate_proto_files
from grpc_tools import protoc
ModuleNotFoundError: No module named 'grpc_tools'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/multiprocessing/process.py", line 297, in 
_bootstrap
self.run()
  File "/usr/local/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
  File "/app/sdks/python/gen_protos.py", line 189, in 

[jira] [Updated] (BEAM-8965) WriteToBigQuery failed in BundleBasedDirectRunner

2019-12-13 Thread Wenbing Bai (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenbing Bai updated BEAM-8965:
--
Status: Open  (was: Triage Needed)

> WriteToBigQuery failed in BundleBasedDirectRunner
> -
>
> Key: BEAM-8965
> URL: https://issues.apache.org/jira/browse/BEAM-8965
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Affects Versions: 2.16.0
>Reporter: Wenbing Bai
>Priority: Major
>
> *WriteToBigQuery* failed in *BundleBasedDirectRunner* with the error: 
> PCollection of size 2 with more than one element accessed as a singleton view.
> Here is the code
>  
> {code:java}
> with Pipeline() as p:
> query_results = (
> p 
> | beam.io.Read(beam.io.BigQuerySource(
> query='SELECT ... FROM ...')
> )
> query_results | beam.io.gcp.WriteToBigQuery(
> table=,
> method=WriteToBigQuery.Method.FILE_LOADS,
> schema={"fields": []}
> )
> {code}
>  
> Here is the error
>  
> {code:java}
>   File "apache_beam/runners/common.py", line 778, in 
> apache_beam.runners.common.DoFnRunner.process
>     def process(self, windowed_value):
>   File "apache_beam/runners/common.py", line 782, in 
> apache_beam.runners.common.DoFnRunner.process
>     self._reraise_augmented(exn)
>   File "apache_beam/runners/common.py", line 849, in 
> apache_beam.runners.common.DoFnRunner._reraise_augmented
>     raise_with_traceback(new_exn)
>   File "apache_beam/runners/common.py", line 780, in 
> apache_beam.runners.common.DoFnRunner.process
>     return self.do_fn_invoker.invoke_process(windowed_value)
>   File "apache_beam/runners/common.py", line 587, in 
> apache_beam.runners.common.PerWindowInvoker.invoke_process
>     self._invoke_process_per_window(
>   File "apache_beam/runners/common.py", line 610, in 
> apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
>     [si[global_window] for si in self.side_inputs]))
>   File 
> "/home/wbai/terra/terra_py2/local/lib/python2.7/site-packages/apache_beam/transforms/sideinputs.py",
>  line 65, in __getitem__
>     _FilteringIterable(self._iterable, target_window), self._view_options)
>   File 
> "/home/wbai/terra/terra_py2/local/lib/python2.7/site-packages/apache_beam/pvalue.py",
>  line 443, in _from_runtime_iterable
>     len(head), str(head[0]), str(head[1])))
> ValueError: PCollection of size 2 with more than one element accessed as a 
> singleton view. First two elements encountered are 
> "gs://temp-dev/temp/bq_load/3edbf2172dd540edb5c8e9597206b10f", 
> "gs://temp-dev/temp/bq_load/3edbf2172dd540edb5c8e9597206b10f". [while running 
> 'WriteToBigQuery/BigQueryBatchFileLoads/ParDo(WriteRecordsToFile)/ParDo(WriteRecordsToFile)']
> {code}
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8965) WriteToBigQuery failed in BundleBasedDirectRunner

2019-12-13 Thread Wenbing Bai (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16995849#comment-16995849
 ] 

Wenbing Bai commented on BEAM-8965:
---

I did a small investigation; I think the singleton object is somehow evaluated 
twice in BundleBasedDirectRunner.

Here is the singleton object: 
[https://github.com/apache/beam/blob/de30361359b70e9fe9729f0f3d52f6c6e8462cfb/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L802]

And the singleton object is used twice in 
[https://github.com/apache/beam/blob/de30361359b70e9fe9729f0f3d52f6c6e8462cfb/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L656]

*Also note*: this happens only with BundleBasedDirectRunner; it works fine with 
DataflowRunner and FnApiRunner.
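
The check that produces this error can be mimicked in plain Python (a 
simplified analogue of the singleton side-input check in 
apache_beam/pvalue.py, not Beam's exact code):

```python
# Simplified analogue (an approximation, not Beam's exact code) of the
# singleton side-input check in apache_beam/pvalue.py: a side input declared
# as a singleton must see exactly one element per window.
def from_runtime_iterable(iterable):
    head = list(iterable)[:2]
    if len(head) == 0:
        raise ValueError('Empty PCollection accessed as a singleton view.')
    if len(head) == 2:
        # This is the failure mode in the report: the same file path was
        # produced twice, so the "singleton" holds two elements.
        raise ValueError(
            'PCollection of size 2 with more than one element accessed as '
            'a singleton view. First two elements encountered are "%s", "%s".'
            % (str(head[0]), str(head[1])))
    return head[0]
```

If the transform producing the side input really is evaluated twice under 
BundleBasedDirectRunner, even two identical elements trip this check, which 
would match the error above showing the same GCS path twice.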

 

> WriteToBigQuery failed in BundleBasedDirectRunner
> -
>
> Key: BEAM-8965
> URL: https://issues.apache.org/jira/browse/BEAM-8965
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Affects Versions: 2.16.0
>Reporter: Wenbing Bai
>Priority: Major
>
> *WriteToBigQuery* failed in *BundleBasedDirectRunner* with the error: 
> PCollection of size 2 with more than one element accessed as a singleton view.
> Here is the code
>  
> {code:java}
> with Pipeline() as p:
> query_results = (
> p 
> | beam.io.Read(beam.io.BigQuerySource(
> query='SELECT ... FROM ...')
> )
> query_results | beam.io.gcp.WriteToBigQuery(
> table=,
> method=WriteToBigQuery.Method.FILE_LOADS,
> schema={"fields": []}
> )
> {code}
>  
> Here is the error
>  
> {code:java}
>   File "apache_beam/runners/common.py", line 778, in 
> apache_beam.runners.common.DoFnRunner.process
>     def process(self, windowed_value):
>   File "apache_beam/runners/common.py", line 782, in 
> apache_beam.runners.common.DoFnRunner.process
>     self._reraise_augmented(exn)
>   File "apache_beam/runners/common.py", line 849, in 
> apache_beam.runners.common.DoFnRunner._reraise_augmented
>     raise_with_traceback(new_exn)
>   File "apache_beam/runners/common.py", line 780, in 
> apache_beam.runners.common.DoFnRunner.process
>     return self.do_fn_invoker.invoke_process(windowed_value)
>   File "apache_beam/runners/common.py", line 587, in 
> apache_beam.runners.common.PerWindowInvoker.invoke_process
>     self._invoke_process_per_window(
>   File "apache_beam/runners/common.py", line 610, in 
> apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
>     [si[global_window] for si in self.side_inputs]))
>   File 
> "/home/wbai/terra/terra_py2/local/lib/python2.7/site-packages/apache_beam/transforms/sideinputs.py",
>  line 65, in __getitem__
>     _FilteringIterable(self._iterable, target_window), self._view_options)
>   File 
> "/home/wbai/terra/terra_py2/local/lib/python2.7/site-packages/apache_beam/pvalue.py",
>  line 443, in _from_runtime_iterable
>     len(head), str(head[0]), str(head[1])))
> ValueError: PCollection of size 2 with more than one element accessed as a 
> singleton view. First two elements encountered are 
> "gs://temp-dev/temp/bq_load/3edbf2172dd540edb5c8e9597206b10f", 
> "gs://temp-dev/temp/bq_load/3edbf2172dd540edb5c8e9597206b10f". [while running 
> 'WriteToBigQuery/BigQueryBatchFileLoads/ParDo(WriteRecordsToFile)/ParDo(WriteRecordsToFile)']
> {code}
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8481) Python 3.7 Postcommit test -- frequent timeouts

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8481?focusedWorklogId=359612&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359612
 ]

ASF GitHub Bot logged work on BEAM-8481:


Author: ASF GitHub Bot
Created on: 13/Dec/19 19:32
Start Date: 13/Dec/19 19:32
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #10378: [BEAM-8481] Fix a race 
condition in proto stubs generation.
URL: https://github.com/apache/beam/pull/10378#issuecomment-565578097
 
 
   There are other errors though which I haven't triaged, such as 
`beam-runners-flink-1.9-job-server-2.19.0-SNAPSHOT.jar not found`
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 359612)
Time Spent: 2.5h  (was: 2h 20m)

> Python 3.7 Postcommit test -- frequent timeouts
> ---
>
> Key: BEAM-8481
> URL: https://issues.apache.org/jira/browse/BEAM-8481
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Ahmet Altay
>Assignee: Valentyn Tymofieiev
>Priority: Critical
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/job/beam_PostCommit_Python37/] – this suite 
> seemingly frequently timing out. Other suites are not affected by these 
> timeouts. From the history, the issues started before Oct 10 and we cannot 
> pinpoint because history is lost.  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8966) failure in :sdks:python:test-suites:direct:py37:hdfsIntegrationTest

2019-12-13 Thread Udi Meiri (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udi Meiri updated BEAM-8966:

Status: Open  (was: Triage Needed)

> failure in :sdks:python:test-suites:direct:py37:hdfsIntegrationTest
> ---
>
> Key: BEAM-8966
> URL: https://issues.apache.org/jira/browse/BEAM-8966
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Udi Meiri
>Assignee: Chad Dombrova
>Priority: Major
>
> I believe this is due to https://github.com/apache/beam/pull/9915
> {code}
> Collecting mypy-protobuf==1.12
>   Using cached 
> https://files.pythonhosted.org/packages/b6/28/041dea47c93564bfc0ece050362894292ec4f173caa92fa82994a6d061d1/mypy_protobuf-1.12-py3-none-any.whl
> Installing collected packages: mypy-protobuf
> Successfully installed mypy-protobuf-1.12
> beam_fn_api.proto: warning: Import google/protobuf/descriptor.proto but 
> not used.
> beam_fn_api.proto: warning: Import google/protobuf/wrappers.proto but not 
> used.
> Traceback (most recent call last):
>   File "/usr/local/bin/protoc-gen-mypy", line 13, in <module>
> import google.protobuf.descriptor_pb2 as d
> ModuleNotFoundError: No module named 'google'
> --mypy_out: protoc-gen-mypy: Plugin failed with status code 1.
> Process Process-1:
> Traceback (most recent call last):
>   File "/app/sdks/python/gen_protos.py", line 104, in generate_proto_files
> from grpc_tools import protoc
> ModuleNotFoundError: No module named 'grpc_tools'
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.7/multiprocessing/process.py", line 297, in 
> _bootstrap
> self.run()
>   File "/usr/local/lib/python3.7/multiprocessing/process.py", line 99, in run
> self._target(*self._args, **self._kwargs)
>   File "/app/sdks/python/gen_protos.py", line 189, in 
> _install_grpcio_tools_and_generate_proto_files
> generate_proto_files()
>   File "/app/sdks/python/gen_protos.py", line 144, in generate_proto_files
> '%s' % ret_code)
> RuntimeError: Protoc returned non-zero status (see logs for details): 1
> Traceback (most recent call last):
>   File "/app/sdks/python/gen_protos.py", line 104, in generate_proto_files
> from grpc_tools import protoc
> ModuleNotFoundError: No module named 'grpc_tools'
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "setup.py", line 295, in <module>
> 'mypy': generate_protos_first(mypy),
>   File "/usr/local/lib/python3.7/site-packages/setuptools/__init__.py", line 
> 145, in setup
> return distutils.core.setup(**attrs)
>   File "/usr/local/lib/python3.7/distutils/core.py", line 148, in setup
> dist.run_commands()
>   File "/usr/local/lib/python3.7/distutils/dist.py", line 966, in run_commands
> self.run_command(cmd)
>   File "/usr/local/lib/python3.7/distutils/dist.py", line 985, in run_command
> cmd_obj.run()
>   File "/usr/local/lib/python3.7/site-packages/setuptools/command/sdist.py", 
> line 44, in run
> self.run_command('egg_info')
>   File "/usr/local/lib/python3.7/distutils/cmd.py", line 313, in run_command
> self.distribution.run_command(command)
>   File "/usr/local/lib/python3.7/distutils/dist.py", line 985, in run_command
> cmd_obj.run()
>   File "setup.py", line 220, in run
> gen_protos.generate_proto_files(log=log)
>   File "/app/sdks/python/gen_protos.py", line 121, in generate_proto_files
> raise ValueError("Proto generation failed (see log for details).")
> ValueError: Proto generation failed (see log for details).
> Service 'test' failed to build: The command '/bin/sh -c cd sdks/python && 
> python setup.py sdist && pip install --no-cache-dir $(ls 
> dist/apache-beam-*.tar.gz | tail -n1)[gcp]' returned a non-zero code: 1
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=359610&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359610
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 13/Dec/19 19:31
Start Date: 13/Dec/19 19:31
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #9915: [BEAM-7746] Add python 
type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#issuecomment-565577591
 
 
   I believe this PR broke 
:sdks:python:test-suites:direct:py37:hdfsIntegrationTest
   Opened https://issues.apache.org/jira/browse/BEAM-8966
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 359610)
Time Spent: 35h 50m  (was: 35h 40m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 35h 50m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8481) Python 3.7 Postcommit test -- frequent timeouts

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8481?focusedWorklogId=359611&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359611
 ]

ASF GitHub Bot logged work on BEAM-8481:


Author: ASF GitHub Bot
Created on: 13/Dec/19 19:31
Start Date: 13/Dec/19 19:31
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #10378: [BEAM-8481] Fix a race 
condition in proto stubs generation.
URL: https://github.com/apache/beam/pull/10378#issuecomment-56559
 
 
   Opened https://issues.apache.org/jira/browse/BEAM-8966 for 
:sdks:python:test-suites:direct:py37:hdfsIntegrationTest failure
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 359611)
Time Spent: 2h 20m  (was: 2h 10m)

> Python 3.7 Postcommit test -- frequent timeouts
> ---
>
> Key: BEAM-8481
> URL: https://issues.apache.org/jira/browse/BEAM-8481
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Ahmet Altay
>Assignee: Valentyn Tymofieiev
>Priority: Critical
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/job/beam_PostCommit_Python37/] – this suite 
> seems to be timing out frequently. Other suites are not affected by these 
> timeouts. From the history, the issues started before Oct 10, and we cannot 
> pinpoint the cause because the history is lost.  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=359607&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359607
 ]

ASF GitHub Bot logged work on BEAM-8564:


Author: ASF GitHub Bot
Created on: 13/Dec/19 19:30
Start Date: 13/Dec/19 19:30
Worklog Time Spent: 10m 
  Work Description: amoght commented on issue #10254: [BEAM-8564] Add LZO 
compression and decompression support
URL: https://github.com/apache/beam/pull/10254#issuecomment-565577445
 
 
   @gsteelman we used the airlift/aircompressor library only for its 
compression and decompression mechanism; the Input/Output stream implementation 
there introduces the transitive dependency, which can be removed and replaced 
with the Apache Hadoop Common library. This also significantly reduces the 
size.
   So, here are the 2 possible options:
   1) We use only the compression and decompression mechanism from 
airlift/aircompressor and design the Input/Output streams for Beam accordingly. 
These will need to be updated if those classes change on airlift/aircompressor's 
end, but since we will only be using the compression and decompression 
mechanism, such updates will be small and quite rare. Therefore, this won't be 
that big of an issue.
   2) We introduce LZO as an optional package for Beam. This gives users the 
option to manage their Beam size (if that is a constraint) or to skip LZO if it 
is not required.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 359607)
Time Spent: 4.5h  (was: 4h 20m)

> Add LZO compression and decompression support
> -
>
> Key: BEAM-8564
> URL: https://issues.apache.org/jira/browse/BEAM-8564
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Amogh Tiwari
>Assignee: Amogh Tiwari
>Priority: Minor
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> LZO is a lossless data compression algorithm focused on compression and 
> decompression speed.
> This will enable the Apache Beam SDK to compress/decompress files using the 
> LZO compression algorithm. 
> This will include the following functionalities:
>  # compress() : for compressing files into an LZO archive
>  # decompress() : for decompressing files archived using LZO compression
> Appropriate input and output streams will also be added to enable working 
> with LZO files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8966) failure in :sdks:python:test-suites:direct:py37:hdfsIntegrationTest

2019-12-13 Thread Udi Meiri (Jira)
Udi Meiri created BEAM-8966:
---

 Summary: failure in 
:sdks:python:test-suites:direct:py37:hdfsIntegrationTest
 Key: BEAM-8966
 URL: https://issues.apache.org/jira/browse/BEAM-8966
 Project: Beam
  Issue Type: Bug
  Components: test-failures
Reporter: Udi Meiri
Assignee: Chad Dombrova


I believe this is due to https://github.com/apache/beam/pull/9915

{code}
Collecting mypy-protobuf==1.12
  Using cached 
https://files.pythonhosted.org/packages/b6/28/041dea47c93564bfc0ece050362894292ec4f173caa92fa82994a6d061d1/mypy_protobuf-1.12-py3-none-any.whl
Installing collected packages: mypy-protobuf
Successfully installed mypy-protobuf-1.12
beam_fn_api.proto: warning: Import google/protobuf/descriptor.proto but 
not used.
beam_fn_api.proto: warning: Import google/protobuf/wrappers.proto but not used.
Traceback (most recent call last):
  File "/usr/local/bin/protoc-gen-mypy", line 13, in <module>
import google.protobuf.descriptor_pb2 as d
ModuleNotFoundError: No module named 'google'
--mypy_out: protoc-gen-mypy: Plugin failed with status code 1.
Process Process-1:
Traceback (most recent call last):
  File "/app/sdks/python/gen_protos.py", line 104, in generate_proto_files
from grpc_tools import protoc
ModuleNotFoundError: No module named 'grpc_tools'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/multiprocessing/process.py", line 297, in 
_bootstrap
self.run()
  File "/usr/local/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
  File "/app/sdks/python/gen_protos.py", line 189, in 
_install_grpcio_tools_and_generate_proto_files
generate_proto_files()
  File "/app/sdks/python/gen_protos.py", line 144, in generate_proto_files
'%s' % ret_code)
RuntimeError: Protoc returned non-zero status (see logs for details): 1
Traceback (most recent call last):
  File "/app/sdks/python/gen_protos.py", line 104, in generate_proto_files
from grpc_tools import protoc
ModuleNotFoundError: No module named 'grpc_tools'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "setup.py", line 295, in <module>
'mypy': generate_protos_first(mypy),
  File "/usr/local/lib/python3.7/site-packages/setuptools/__init__.py", line 
145, in setup
return distutils.core.setup(**attrs)
  File "/usr/local/lib/python3.7/distutils/core.py", line 148, in setup
dist.run_commands()
  File "/usr/local/lib/python3.7/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
  File "/usr/local/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
  File "/usr/local/lib/python3.7/site-packages/setuptools/command/sdist.py", 
line 44, in run
self.run_command('egg_info')
  File "/usr/local/lib/python3.7/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
  File "/usr/local/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
  File "setup.py", line 220, in run
gen_protos.generate_proto_files(log=log)
  File "/app/sdks/python/gen_protos.py", line 121, in generate_proto_files
raise ValueError("Proto generation failed (see log for details).")
ValueError: Proto generation failed (see log for details).
Service 'test' failed to build: The command '/bin/sh -c cd sdks/python &&   
  python setup.py sdist && pip install --no-cache-dir $(ls 
dist/apache-beam-*.tar.gz | tail -n1)[gcp]' returned a non-zero code: 1
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8965) WriteToBigQuery failed in BundleBasedDirectRunner

2019-12-13 Thread Wenbing Bai (Jira)
Wenbing Bai created BEAM-8965:
-

 Summary: WriteToBigQuery failed in BundleBasedDirectRunner
 Key: BEAM-8965
 URL: https://issues.apache.org/jira/browse/BEAM-8965
 Project: Beam
  Issue Type: Bug
  Components: io-py-gcp
Affects Versions: 2.16.0
Reporter: Wenbing Bai


*WriteToBigQuery* failed in *BundleBasedDirectRunner* with the error "PCollection 
of size 2 with more than one element accessed as a singleton view".

Here is the code

 
{code:java}
with Pipeline() as p:
    query_results = (
        p
        | beam.io.Read(beam.io.BigQuerySource(
            query='SELECT ... FROM ...'))
    )
    query_results | beam.io.gcp.WriteToBigQuery(
        table=,
        method=WriteToBigQuery.Method.FILE_LOADS,
        schema={"fields": []}
    )
{code}
 

Here is the error

 
{code:java}
  File "apache_beam/runners/common.py", line 778, in 
apache_beam.runners.common.DoFnRunner.process
    def process(self, windowed_value):
  File "apache_beam/runners/common.py", line 782, in 
apache_beam.runners.common.DoFnRunner.process
    self._reraise_augmented(exn)
  File "apache_beam/runners/common.py", line 849, in 
apache_beam.runners.common.DoFnRunner._reraise_augmented
    raise_with_traceback(new_exn)
  File "apache_beam/runners/common.py", line 780, in 
apache_beam.runners.common.DoFnRunner.process
    return self.do_fn_invoker.invoke_process(windowed_value)
  File "apache_beam/runners/common.py", line 587, in 
apache_beam.runners.common.PerWindowInvoker.invoke_process
    self._invoke_process_per_window(
  File "apache_beam/runners/common.py", line 610, in 
apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
    [si[global_window] for si in self.side_inputs]))
  File 
"/home/wbai/terra/terra_py2/local/lib/python2.7/site-packages/apache_beam/transforms/sideinputs.py",
 line 65, in __getitem__
    _FilteringIterable(self._iterable, target_window), self._view_options)
  File 
"/home/wbai/terra/terra_py2/local/lib/python2.7/site-packages/apache_beam/pvalue.py",
 line 443, in _from_runtime_iterable
    len(head), str(head[0]), str(head[1])))
ValueError: PCollection of size 2 with more than one element accessed as a 
singleton view. First two elements encountered are 
"gs://temp-dev/temp/bq_load/3edbf2172dd540edb5c8e9597206b10f", 
"gs://temp-dev/temp/bq_load/3edbf2172dd540edb5c8e9597206b10f". [while running 
'WriteToBigQuery/BigQueryBatchFileLoads/ParDo(WriteRecordsToFile)/ParDo(WriteRecordsToFile)']
{code}
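For context on the error above, here is a minimal pure-Python sketch of the singleton-view check, modeled loosely on the `_from_runtime_iterable` frame in the traceback (the function name and exact message format are illustrative, not Beam's actual code):

```python
def as_singleton(iterable):
    # A side input accessed as a singleton view must contain exactly one
    # element; two load-job files materialized into the same view trip this.
    head = list(iterable)[:2]
    if not head:
        raise ValueError("Empty PCollection accessed as a singleton view.")
    if len(head) > 1:
        raise ValueError(
            "PCollection of size %d with more than one element accessed as a "
            "singleton view. First two elements encountered are %s, %s." % (
                len(head), head[0], head[1]))
    return head[0]
```

With a single temp-file path the call returns that path; with the two identical `gs://temp-dev/...` paths from the log it raises the same ValueError shown in the stack trace.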


--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8917) javax.annotation.Nullable is missing for org.apache.beam.sdk.schemas.FieldValueTypeInformation

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8917?focusedWorklogId=359597&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359597
 ]

ASF GitHub Bot logged work on BEAM-8917:


Author: ASF GitHub Bot
Created on: 13/Dec/19 19:17
Start Date: 13/Dec/19 19:17
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #10324: [BEAM-8917] jsr305 
dependency declaration for Nullable class
URL: https://github.com/apache/beam/pull/10324#issuecomment-565572550
 
 
   Investigating that. I see the same problem in my local installation. 
`/Users/suztomo/.m2/repository//org/apache/beam/beam-sdks-java-core/2.19.0-SNAPSHOT/beam-sdks-java-core-2.19.0-SNAPSHOT.pom`
 does not have jsr305.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 359597)
Time Spent: 11h  (was: 10h 50m)

> javax.annotation.Nullable is missing for 
> org.apache.beam.sdk.schemas.FieldValueTypeInformation
> --
>
> Key: BEAM-8917
> URL: https://issues.apache.org/jira/browse/BEAM-8917
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 11h
>  Remaining Estimate: 0h
>
> This ticket is from the result of static analysis by Linkage Checker 
> ([detail|https://github.com/GoogleCloudPlatform/cloud-opensource-java/issues/1045])
> h1. Example Project
> Example project to produce an issue: 
> https://github.com/suztomo/beam-java-sdk-missing-nullable .
> I think the Maven artifact {{org.apache.beam:beam-sdks-java-core}}, which 
> contains {{org.apache.beam.sdk.schemas.FieldValueTypeInformation}}, should 
> declare the dependency to {{com.google.code.findbugs:jsr305}}.
> h1. Why there's no problem in compilation and tests of sdks/java/core?
> The compilation succeeds because the {{Nullable}} annotation is in the 
> transitive dependencies of the compileOnly {{spotbugs-annotations}} dependency:
> {noformat}
> compileOnly - Compile only dependencies for source set 'main'.
> ...
> +--- com.github.spotbugs:spotbugs-annotations:3.1.12
> |\--- com.google.code.findbugs:jsr305:3.0.2
> ...
> {noformat}
> The tests succeed because the {{Nullable}} annotation is in the transitive 
> dependency of {{guava-testlib}}.
> {noformat}
> testRuntime - Runtime dependencies for source set 'test' (deprecated, use 
> 'testRuntimeOnly' instead).
> ...
> +--- com.google.guava:guava-testlib:20.0
> |+--- com.google.code.findbugs:jsr305:1.3.9
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=359590&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359590
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 13/Dec/19 19:06
Start Date: 13/Dec/19 19:06
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on pull request #10159: 
[BEAM-8575] Added a unit test to CombineTest class to test that Combi…
URL: https://github.com/apache/beam/pull/10159#discussion_r357789470
 
 

 ##
 File path: sdks/python/apache_beam/transforms/combiners_test.py
 ##
 @@ -399,6 +418,43 @@ def test_global_fanout(self):
   | beam.CombineGlobally(combine.MeanCombineFn()).with_fanout(11))
   assert_that(result, equal_to([49.5]))
 
+  def test_combining_with_accumulation_mode(self):
+# PCollection will contain elements from 1 to 5.
+elements = [i for i in range(1, 6)]
+
+ts = TestStream().advance_watermark_to(0)
+for i in elements:
+  ts.add_elements([i])
+ts.advance_watermark_to_infinity()
+
+options = PipelineOptions()
+options.view_as(StandardOptions).streaming = True
+with TestPipeline(options=options) as p:
+  result = (p
+| ts
+| beam.WindowInto(
+GlobalWindows(),
+accumulation_mode=trigger.AccumulationMode.ACCUMULATING,
+trigger=AfterWatermark(early=AfterAll(AfterCount(1)))
+)
+| beam.CombineGlobally(sum).without_defaults().with_fanout(2)
+| beam.ParDo(self.record_dofn()))
+
+# The trigger should fire repeatedly for each newly added element,
+# and at least once for advancing the watermark to infinity.
+# The firings should accumulate the output.
+# First firing: 1 = 1
+# Second firing: 3 = 1 + 2
+# Third firing: 6 = 1 + 2 + 3
+# Fourth firing: 10 = 1 + 2 + 3 + 4
+# Fifth firing: 15 = 1 + 2 + 3 + 4 + 5
+# Next firings: 15 = 15 + 0  (advancing the watermark to infinity)
+# The exact number of firings may vary,
 
 Review comment:
    The firings were [1, 3, 6, 10, 15, 15] last week, before I synced to master.
    They became [1, 3, 6, 10, 15, 15, 15] after I fetched and rebased last Sunday.
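The accumulating firings discussed above can be sketched as prefix sums in plain Python, independent of Beam:

```python
from itertools import accumulate

elements = [1, 2, 3, 4, 5]

# In ACCUMULATING mode each early firing re-emits the running total of all
# elements seen so far; the watermark firing(s) then repeat the final total.
firings = list(accumulate(elements))
assert firings == [1, 3, 6, 10, 15]
```

The extra trailing 15s observed in the test correspond to firings triggered by advancing the watermark to infinity, which add nothing to the accumulated sum.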
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 359590)
Time Spent: 34h 50m  (was: 34h 40m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 34h 50m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7970) Regenerate Go SDK proto files in correct version

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7970?focusedWorklogId=359588&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359588
 ]

ASF GitHub Bot logged work on BEAM-7970:


Author: ASF GitHub Bot
Created on: 13/Dec/19 19:04
Start Date: 13/Dec/19 19:04
Worklog Time Spent: 10m 
  Work Description: youngoli commented on pull request #10361: [BEAM-7970] 
Improved error help in Go PROTOBUF.md
URL: https://github.com/apache/beam/pull/10361
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 359588)
Time Spent: 1h 50m  (was: 1h 40m)

> Regenerate Go SDK proto files in correct version
> 
>
> Key: BEAM-7970
> URL: https://issues.apache.org/jira/browse/BEAM-7970
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Generated proto files in the Go SDK currently include this bit:
> {{// This is a compile-time assertion to ensure that this generated file}}
> {{// is compatible with the proto package it is being compiled against.}}
> {{// A compilation error at this line likely means your copy of the}}
> {{// proto package needs to be updated.}}
> {{const _ = proto.ProtoPackageIsVersion2 // please upgrade the proto package}}
>  
> This indicates that the protos are being generated as proto v2 for whatever 
> reason. Most likely, as mentioned in this post by someone with a similar 
> issue, it is because the proto generation binary needs to be rebuilt before 
> generating the files again: 
> [https://github.com/golang/protobuf/issues/449#issuecomment-340884839]
> This hasn't caused any errors so far, but might eventually cause errors if we 
> hit version differences between the v2 and v3 protos.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-3713) Consider moving away from nose to nose2 or pytest.

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3713?focusedWorklogId=359587&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359587
 ]

ASF GitHub Bot logged work on BEAM-3713:


Author: ASF GitHub Bot
Created on: 13/Dec/19 19:01
Start Date: 13/Dec/19 19:01
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #10377: [BEAM-3713] pytest 
migration: py3x-{gcp,cython}
URL: https://github.com/apache/beam/pull/10377#issuecomment-565566850
 
 
   LGTM
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 359587)
Time Spent: 15.5h  (was: 15h 20m)

> Consider moving away from nose to nose2 or pytest.
> --
>
> Key: BEAM-3713
> URL: https://issues.apache.org/jira/browse/BEAM-3713
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: Robert Bradshaw
>Assignee: Udi Meiri
>Priority: Minor
>  Time Spent: 15.5h
>  Remaining Estimate: 0h
>
> Per [https://nose.readthedocs.io/en/latest/], nose is in maintenance mode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8864) BigQueryQueryToTableIT.test_big_query_legacy_sql - fails in post commit tests

2019-12-13 Thread Chamikara Madhusanka Jayalath (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16995808#comment-16995808
 ] 

Chamikara Madhusanka Jayalath commented on BEAM-8864:
-

We expected to see two records [1] and two records were written to BQ [2]. So 
this is possibly a timing issue.

 

[1][https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/big_query_query_to_table_it_test.py#L74]

[2] 
[https://pantheon.corp.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2019-12-02_10_15_03-14204280782580035511?project=apache-beam-testing]

> BigQueryQueryToTableIT.test_big_query_legacy_sql - fails in post commit tests
> -
>
> Key: BEAM-8864
> URL: https://issues.apache.org/jira/browse/BEAM-8864
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp, test-failures
>Reporter: Ahmet Altay
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Critical
> Fix For: Not applicable
>
>
> Logs: 
> [https://builds.apache.org/job/beam_PostCommit_Python35/1123/testReport/junit/apache_beam.io.gcp.big_query_query_to_table_it_test/BigQueryQueryToTableIT/test_big_query_legacy_sql/]
> Error Message
> Expected: (Test pipeline expected terminated in state: DONE and Expected 
> checksum is 158a8ea1c254fcf40d4ed3e7c0242c3ea0a29e72)
>  but: Expected checksum is 158a8ea1c254fcf40d4ed3e7c0242c3ea0a29e72 Actual 
> checksum is da39a3ee5e6b4b0d3255bfef95601890afd80709
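One detail worth noting about the failure above: the actual checksum `da39a3ee5e6b4b0d3255bfef95601890afd80709` is the SHA-1 digest of empty input, which suggests the verifier saw zero records rather than incorrect records — consistent with a timing issue where the check ran before the rows landed. This can be confirmed with the standard library:

```python
import hashlib

# SHA-1 of zero bytes -- the well-known digest of empty input.
empty_digest = hashlib.sha1(b"").hexdigest()
assert empty_digest == "da39a3ee5e6b4b0d3255bfef95601890afd80709"
```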



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8837) PCollectionVisualizationTest: possible bug

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8837?focusedWorklogId=359558&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359558
 ]

ASF GitHub Bot logged work on BEAM-8837:


Author: ASF GitHub Bot
Created on: 13/Dec/19 18:01
Start Date: 13/Dec/19 18:01
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on issue #10321: [BEAM-8837] Fix 
pcoll_visualization tests
URL: https://github.com/apache/beam/pull/10321#issuecomment-565543900
 
 
   R: @udim 
   Hi Udi, could you please take another look to see if we can merge the PR? 
Thanks!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 359558)
Time Spent: 2h  (was: 1h 50m)

> PCollectionVisualizationTest: possible bug
> --
>
> Key: BEAM-8837
> URL: https://issues.apache.org/jira/browse/BEAM-8837
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> This seems like a bug, even though the test passes:
> {code}
> test_display_plain_text_when_kernel_has_no_frontend 
> (apache_beam.runners.interactive.display.pcoll_visualization_test.PCollectionVisualizationTest)
>  ... Exception in thread Thread-4405:
> Traceback (most recent call last):
>   File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner
> self.run()
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/.eggs/timeloop-1.0.2-py3.7.egg/timeloop/job.py",
>  line 19, in run
> self.execute(*self.args, **self.kwargs)
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py",
>  line 132, in continuous_update_display
> updated_pv.display_facets(updating_pv=pv)
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py",
>  line 209, in display_facets
> data = self._to_dataframe()
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py",
>  line 278, in _to_dataframe
> for el in self._to_element_list():
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py",
>  line 266, in _to_element_list
> if ie.current_env().cache_manager().exists('full', self._cache_key):
> AttributeError: 'NoneType' object has no attribute 'exists'
> ok
> {code}
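A minimal sketch of the defensive check implied by the traceback above: `_to_element_list` calls `.exists(...)` on whatever `ie.current_env().cache_manager()` returns, so a `None` cache manager raises `AttributeError`. The guard below is hypothetical (the names and return behavior are illustrative, not Beam's actual fix):

```python
def to_element_list(cache_manager, cache_key):
    # Hypothetical guard: if no cache manager was set up (as in the test
    # environment above), yield no elements instead of raising AttributeError.
    if cache_manager is None:
        return []
    if not cache_manager.exists('full', cache_key):
        return []
    return list(cache_manager.read('full', cache_key))
```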



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8824) Add support for allowed lateness in python sdk

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8824?focusedWorklogId=359548&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359548
 ]

ASF GitHub Bot logged work on BEAM-8824:


Author: ASF GitHub Bot
Created on: 13/Dec/19 17:58
Start Date: 13/Dec/19 17:58
Worklog Time Spent: 10m 
  Work Description: y1chi commented on issue #10216: [BEAM-8824] Add 
support to allow specify window allowed_lateness in python sdk
URL: https://github.com/apache/beam/pull/10216#issuecomment-565542800
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 359548)
Time Spent: 6h 40m  (was: 6.5h)

> Add support for allowed lateness in python sdk
> --
>
> Key: BEAM-8824
> URL: https://issues.apache.org/jira/browse/BEAM-8824
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Major
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8824) Add support for allowed lateness in python sdk

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8824?focusedWorklogId=359549&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359549
 ]

ASF GitHub Bot logged work on BEAM-8824:


Author: ASF GitHub Bot
Created on: 13/Dec/19 17:58
Start Date: 13/Dec/19 17:58
Worklog Time Spent: 10m 
  Work Description: y1chi commented on issue #10216: [BEAM-8824] Add 
support to allow specify window allowed_lateness in python sdk
URL: https://github.com/apache/beam/pull/10216#issuecomment-565253504
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 359549)
Time Spent: 6h 50m  (was: 6h 40m)

> Add support for allowed lateness in python sdk
> --
>
> Key: BEAM-8824
> URL: https://issues.apache.org/jira/browse/BEAM-8824
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Major
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8933) BigQuery IO should support read/write in Arrow format

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8933?focusedWorklogId=359537&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359537
 ]

ASF GitHub Bot logged work on BEAM-8933:


Author: ASF GitHub Bot
Created on: 13/Dec/19 17:44
Start Date: 13/Dec/19 17:44
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #10369: [BEAM-8933] 
BigQueryIO Arrow for read
URL: https://github.com/apache/beam/pull/10369#issuecomment-565537437
 
 
   This is awesome! I'll try to put up a PR with ArrowUtils by itself today.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 359537)
Time Spent: 2h 20m  (was: 2h 10m)

> BigQuery IO should support read/write in Arrow format
> -
>
> Key: BEAM-8933
> URL: https://issues.apache.org/jira/browse/BEAM-8933
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> As of right now BigQuery uses Avro format for reading and writing.
> We should add a config to BigQueryIO to specify which format to use (with 
> Avro as default).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8932) Expose complete Cloud Pub/Sub messages through PubsubIO API

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8932?focusedWorklogId=359525&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359525
 ]

ASF GitHub Bot logged work on BEAM-8932:


Author: ASF GitHub Bot
Created on: 13/Dec/19 17:37
Start Date: 13/Dec/19 17:37
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #10331: [BEAM-8932]  
Modify PubsubClient to use the proto message throughout.
URL: https://github.com/apache/beam/pull/10331#issuecomment-565534717
 
 
   If I recall correctly, the GCP connectors already leak proto, so it cannot 
be vendored. (I'm on mobile now, so I can't look into it effectively.) If this 
use is independent of that, then the vendored version should be used. If you 
can wait a bit, I'll give a more thorough review.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 359525)
Time Spent: 1h  (was: 50m)

> Expose complete Cloud Pub/Sub messages through PubsubIO API
> ---
>
> Key: BEAM-8932
> URL: https://issues.apache.org/jira/browse/BEAM-8932
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
>Reporter: Daniel Collins
>Assignee: Daniel Collins
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The PubsubIO API only exposes a subset of the fields in the underlying 
> PubsubMessage protocol buffer. To accommodate future feature changes, as well 
> as for greater compatibility with code using the Cloud Pub/Sub APIs, a method 
> to read and write these protocol messages should be exposed.
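The information loss described above can be sketched in plain Java: an API that only carries the payload and attributes silently drops the remaining PubsubMessage fields. The map-based "message" below is a stand-in for illustration, not a real Beam or Pub/Sub type:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: reducing a full Pub/Sub message to the subset of
// fields PubsubIO currently exposes loses the rest of the record.
class PubsubSubsetDemo {
    static Map<String, Object> reduceToPayloadAndAttributes(Map<String, Object> full) {
        Map<String, Object> reduced = new HashMap<>();
        reduced.put("data", full.get("data"));
        reduced.put("attributes", full.get("attributes"));
        // Fields such as messageId and publishTime are not carried over;
        // exposing the full protocol message would preserve them.
        return reduced;
    }
}
```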



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=359518&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359518
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 13/Dec/19 17:30
Start Date: 13/Dec/19 17:30
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on issue #10374: [BEAM-8575] 
Added a unit test to test that Combine works with FixedWi…
URL: https://github.com/apache/beam/pull/10374#issuecomment-565532004
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 359518)
Time Spent: 34h 40m  (was: 34.5h)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 34h 40m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8825) OOM when writing large numbers of 'narrow' rows

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8825?focusedWorklogId=359513&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359513
 ]

ASF GitHub Bot logged work on BEAM-8825:


Author: ASF GitHub Bot
Created on: 13/Dec/19 17:14
Start Date: 13/Dec/19 17:14
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #10380: [BEAM-8825] Add limit 
on number of mutated rows to batching/sorting stages.
URL: https://github.com/apache/beam/pull/10380#issuecomment-565524747
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 359513)
Time Spent: 1h 20m  (was: 1h 10m)

> OOM when writing large numbers of 'narrow' rows
> ---
>
> Key: BEAM-8825
> URL: https://issues.apache.org/jira/browse/BEAM-8825
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.9.0, 2.10.0, 2.11.0, 2.12.0, 2.13.0, 2.14.0, 2.15.0, 
> 2.16.0, 2.17.0
>Reporter: Niel Markwick
>Assignee: Niel Markwick
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> SpannerIO can OOM when writing large numbers of 'narrow' rows.
>  
> SpannerIO puts input mutation elements into batches for efficient writing.
> These batches are limited by the number of cells mutated and the size of data 
> written (5000 cells, 1 MB data). SpannerIO groups enough mutations to build 
> 1000 of these groups (5M cells, 1 GB data), then sorts and batches them.
> When the number of cells and the size of data are very small (<5 cells, <100 
> bytes), the memory overhead of storing millions of mutations for batching is 
> significant and can lead to OOMs.
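The linked PR adds a row-count limit on top of the existing cell and byte limits, so millions of tiny mutations can no longer be buffered at once. A self-contained sketch of that batching policy (the cell and byte constants come from the issue text; the row cap of 1000 is an illustrative value, not necessarily what the PR uses):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the batching policy described above. Each mutation is modeled
// as {cellCount, byteSize}; real SpannerIO operates on Mutation objects.
class MutationBatcher {
    static final long MAX_CELLS = 5000;      // per-batch cell limit
    static final long MAX_BYTES = 1 << 20;   // per-batch byte limit (1 MB)
    static final long MAX_ROWS = 1000;       // illustrative new row cap

    static List<List<int[]>> batch(List<int[]> mutations) {
        List<List<int[]>> batches = new ArrayList<>();
        List<int[]> current = new ArrayList<>();
        long cells = 0, bytes = 0;
        for (int[] m : mutations) {
            // Flush when adding this mutation would exceed any limit.
            if (!current.isEmpty()
                    && (cells + m[0] > MAX_CELLS
                        || bytes + m[1] > MAX_BYTES
                        || current.size() >= MAX_ROWS)) {
                batches.add(current);
                current = new ArrayList<>();
                cells = 0;
                bytes = 0;
            }
            current.add(m);
            cells += m[0];
            bytes += m[1];
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

With only the cell limit, 2500 narrow mutations of 2 cells each would all land in one 5000-cell batch; the row cap splits them into bounded groups, which is what keeps memory use proportional to the cap rather than to the row count.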



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8825) OOM when writing large numbers of 'narrow' rows

2019-12-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8825?focusedWorklogId=359511&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359511
 ]

ASF GitHub Bot logged work on BEAM-8825:


Author: ASF GitHub Bot
Created on: 13/Dec/19 17:13
Start Date: 13/Dec/19 17:13
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #10380: [BEAM-8825] Add limit 
on number of mutated rows to batching/sorting stages.
URL: https://github.com/apache/beam/pull/10380#issuecomment-565523737
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 359511)
Time Spent: 1h 10m  (was: 1h)

> OOM when writing large numbers of 'narrow' rows
> ---
>
> Key: BEAM-8825
> URL: https://issues.apache.org/jira/browse/BEAM-8825
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.9.0, 2.10.0, 2.11.0, 2.12.0, 2.13.0, 2.14.0, 2.15.0, 
> 2.16.0, 2.17.0
>Reporter: Niel Markwick
>Assignee: Niel Markwick
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> SpannerIO can OOM when writing large numbers of 'narrow' rows.
>  
> SpannerIO puts input mutation elements into batches for efficient writing.
> These batches are limited by the number of cells mutated and the size of data 
> written (5000 cells, 1 MB data). SpannerIO groups enough mutations to build 
> 1000 of these groups (5M cells, 1 GB data), then sorts and batches them.
> When the number of cells and the size of data are very small (<5 cells, <100 
> bytes), the memory overhead of storing millions of mutations for batching is 
> significant and can lead to OOMs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

