[jira] [Created] (BEAM-10004) ZeroDivisionError if source bundle smaller than 1mb

2020-05-15 Thread Corvin Deboeser (Jira)
Corvin Deboeser created BEAM-10004:
--

 Summary: ZeroDivisionError if source bundle smaller than 1mb
 Key: BEAM-10004
 URL: https://issues.apache.org/jira/browse/BEAM-10004
 Project: Beam
  Issue Type: Bug
  Components: io-py-mongodb
Affects Versions: 2.20.0
Reporter: Corvin Deboeser
Assignee: Yichi Zhang


If the desired_bundle_size is smaller than 1 MB, split returns only 
SourceBundles with weight=0, which leads to a ZeroDivisionError down the line. 
{noformat}
ZeroDivisionError: float division by zero
{noformat}

This error is raised from _compute_cumulative_weights here:

[https://github.com/apache/beam/blob/9f0cb649d39ee6236ea27f111acb4b66591a80ec/sdks/python/apache_beam/io/concat_source.py#L154]

 

What worked for me: pulling the truncation out of _get_split_keys 
([here|https://github.com/apache/beam/blob/9f0cb649d39ee6236ea27f111acb4b66591a80ec/sdks/python/apache_beam/io/mongodbio.py#L226])
 and into split instead.
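For illustration, a minimal sketch of the failure mode (hypothetical helper names, not the actual Beam code): if weights are derived by integer-truncating bundle sizes to megabytes, every bundle under 1 MB gets weight 0, and normalizing the weights then divides by zero:

```python
# Sketch of the failure mode: weights computed by integer-truncating
# bundle sizes to whole megabytes, so any bundle under 1 MB gets weight 0.
def bundle_weights_mb(bundle_sizes_bytes):
    return [size // (1 << 20) for size in bundle_sizes_bytes]

def compute_cumulative_weights(weights):
    total = sum(weights)
    # If every bundle is smaller than 1 MB, total == 0 and this raises
    # ZeroDivisionError, mirroring the error from _compute_cumulative_weights.
    return [w / total for w in weights]

weights = bundle_weights_mb([200_000, 500_000])  # two bundles, both < 1 MB
try:
    compute_cumulative_weights(weights)
except ZeroDivisionError as e:
    print(e)  # the same class of failure the ticket describes
```

Truncating inside split (on the desired bundle size, before weights are computed) rather than per bundle avoids the all-zero case.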

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10002) Mongo cursor timeout leads to CursorNotFound error

2020-05-15 Thread Corvin Deboeser (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corvin Deboeser updated BEAM-10002:
---
Summary: Mongo cursor timeout leads to CursorNotFound error  (was: Cursor 
not found if work items take a long time)

> Mongo cursor timeout leads to CursorNotFound error
> --
>
> Key: BEAM-10002
> URL: https://issues.apache.org/jira/browse/BEAM-10002
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-mongodb
>Affects Versions: 2.20.0
>Reporter: Corvin Deboeser
>Assignee: Yichi Zhang
>Priority: Major
>
> If some work items take a long time to process and a bundle's cursor is not 
> queried for too long, MongoDB times out the cursor, which 
> results in
> {code:python}
> pymongo.errors.CursorNotFound: cursor id ... not found
> {code}
>  
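For context, a hedged pymongo sketch of one common workaround (hypothetical helper, not Beam code; assumes a reachable MongoDB server, so it is only illustrative here): `no_cursor_timeout=True` asks the server not to reap the idle cursor, at the cost of having to close it explicitly:

```python
# Hypothetical sketch; requires the pymongo package and a running MongoDB.
def process_collection(uri, db_name, coll_name, process):
    from pymongo import MongoClient  # imported lazily; pymongo is an assumption

    client = MongoClient(uri)
    # By default the server reaps a cursor that has been idle for ~10 minutes;
    # if process() is slow enough, the next batch fetch raises CursorNotFound.
    # no_cursor_timeout=True disables that reaping, but the cursor must then
    # be closed explicitly to free server resources.
    cursor = client[db_name][coll_name].find({}, no_cursor_timeout=True)
    try:
        for doc in cursor:
            process(doc)  # may take arbitrarily long per element
    finally:
        cursor.close()
        client.close()
```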





[jira] [Work logged] (BEAM-9825) Transforms for Intersect, IntersectAll, Except, ExceptAll, Union, UnionAll

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9825?focusedWorklogId=433595&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433595
 ]

ASF GitHub Bot logged work on BEAM-9825:


Author: ASF GitHub Bot
Created on: 15/May/20 08:21
Start Date: 15/May/20 08:21
Worklog Time Spent: 10m 
  Work Description: darshanj commented on a change in pull request #11610:
URL: https://github.com/apache/beam/pull/11610#discussion_r425642444



##
File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/SetFns.java
##
@@ -0,0 +1,618 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.transforms;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import java.util.ArrayList;
+import java.util.List;
+import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.transforms.join.CoGbkResult;
+import org.apache.beam.sdk.transforms.join.CoGroupByKey;
+import org.apache.beam.sdk.transforms.join.KeyedPCollectionTuple;
+import org.apache.beam.sdk.transforms.windowing.Trigger;
+import org.apache.beam.sdk.transforms.windowing.WindowFn;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionList;
+import org.apache.beam.sdk.values.TupleTag;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables;
+
+public class SetFns {
+
+  /**
+   * Returns a new {@code PTransform} that follows SET DISTINCT semantics to compute the
+   * intersection with the provided {@code PCollection<T>}.
+   *
+   * <p>The argument should not be modified after this is called.
+   *
+   * <p>The elements of the output {@link PCollection} will be all the distinct elements that
+   * are present in both the {@link PCollection} this transform is applied to and the provided
+   * {@link PCollection}.
+   *
+   * <p>Note that this transform requires the {@code Coder} of all the {@code PCollection}s
+   * to be deterministic (see {@link Coder#verifyDeterministic()}). If a collection's
+   * {@code Coder} is not deterministic, an exception is thrown at pipeline construction time.
+   *
+   * <p>All inputs must have equal {@link WindowFn}s and compatible triggers (see {@link
+   * Trigger#isCompatible(Trigger)}).
+   *
+   * <p>By default, the output {@code PCollection} encodes its elements using the same {@code
+   * Coder} as the input {@code PCollection}.
+   *
+   * <pre>{@code
+   * Pipeline p = ...;
+   *
+   * PCollection<String> left = p.apply(Create.of("1", "2", "3", "3", "4", "5"));
+   * PCollection<String> right = p.apply(Create.of("1", "3", "4", "4", "6"));
+   *
+   * PCollection<String> results =
+   *     left.apply(SetFns.intersectDistinct(right)); // contains: "1", "3", "4"
+   * }</pre>
+   *
+   * @param <T> the type of the elements in the input and output {@code PCollection}s.
+   */
+  public static <T> SetImpl<T> intersectDistinct(PCollection<T> rightCollection) {
+    checkNotNull(rightCollection, "rightCollection argument is null");
+    return new SetImpl<>(rightCollection, intersectDistinct());
+  }
+
+  /**
+   * Returns a {@code PTransform} that takes a {@code PCollectionList<PCollection<T>>} and
+   * returns a {@code PCollection<T>} containing the intersection of the collections, computed
+   * in order over all collections in the {@code PCollectionList}.
+   *
+   * <p>Returns a new {@code PTransform} that follows SET DISTINCT semantics, taking a
+   * {@code PCollectionList<PCollection<T>>} and returning a {@code PCollection<T>} containing
+   * the intersection of the collections, computed in order over all collections in the {@code
+   * PCollectionList}.
+   *
+   * <p>The elements of the output {@link PCollection} will be all the distinct elements that
+   * are present in both the current and the next {@link PCollection} in the list, applied to
+   * all collections in order.
+   *
+   * <p>Note that this transform requires the {@code Coder} of all the {@code PCollection}s
+   * to be deterministic (see {

[jira] [Work logged] (BEAM-9825) Transforms for Intersect, IntersectAll, Except, ExceptAll, Union, UnionAll

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9825?focusedWorklogId=433596&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433596
 ]

ASF GitHub Bot logged work on BEAM-9825:


Author: ASF GitHub Bot
Created on: 15/May/20 08:26
Start Date: 15/May/20 08:26
Worklog Time Spent: 10m 
  Work Description: darshanj commented on a change in pull request #11610:
URL: https://github.com/apache/beam/pull/11610#discussion_r425645372



##
File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/SetFns.java
##
@@ -0,0 +1,618 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.transforms;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import java.util.ArrayList;
+import java.util.List;
+import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.transforms.join.CoGbkResult;
+import org.apache.beam.sdk.transforms.join.CoGroupByKey;
+import org.apache.beam.sdk.transforms.join.KeyedPCollectionTuple;
+import org.apache.beam.sdk.transforms.windowing.Trigger;
+import org.apache.beam.sdk.transforms.windowing.WindowFn;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionList;
+import org.apache.beam.sdk.values.TupleTag;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables;
+
+public class SetFns {

Review comment:
   Just to add: my original intention was to have only a binary 
operation. We can make this a 2-to-n-ary transform. Since we added the 
PCollectionList API, we simply allowed a PCollectionList with size == 1. It was 
never meant to perform as Distinct.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 433596)
Remaining Estimate: 87h 40m  (was: 87h 50m)
Time Spent: 8h 20m  (was: 8h 10m)

> Transforms for Intersect, IntersectAll, Except, ExceptAll, Union, UnionAll
> --
>
> Key: BEAM-9825
> URL: https://issues.apache.org/jira/browse/BEAM-9825
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Darshan Jani
>Assignee: Darshan Jani
>Priority: Major
>   Original Estimate: 96h
>  Time Spent: 8h 20m
>  Remaining Estimate: 87h 40m
>
> I'd like to propose the following new high-level transforms.
>  * Intersect
> Compute the intersection between the elements of two PCollections.
> Given _leftCollection_ and _rightCollection_, this transform returns a 
> collection containing the elements that are common to both _leftCollection_ 
> and _rightCollection_.
>  
>  * Except
> Compute the difference between the elements of two PCollections.
> Given _leftCollection_ and _rightCollection_, this transform returns a 
> collection containing the elements that are in _leftCollection_ but not in 
> _rightCollection_.
>  * Union
> Find the elements that are in either of two PCollections.
> Also implement the IntersectAll, ExceptAll and UnionAll variants of these 
> transforms.
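As a hedged illustration of the intended semantics (plain Python, not the proposed Java Beam API): the distinct variants follow set semantics, while the *All variants keep multiplicities:

```python
from collections import Counter

def intersect_distinct(left, right):
    # SET DISTINCT semantics: each common element appears once.
    return sorted(set(left) & set(right))

def except_distinct(left, right):
    # Elements of left that do not appear in right, deduplicated.
    return sorted(set(left) - set(right))

def intersect_all(left, right):
    # *All semantics keep multiplicities: min(count in left, count in right).
    return sorted((Counter(left) & Counter(right)).elements())

left = ["1", "2", "3", "3", "4", "5"]
right = ["1", "3", "4", "4", "6"]
print(intersect_distinct(left, right))  # ['1', '3', '4']
```

In the PR under discussion the same per-element counting is done with a CoGroupByKey over keyed inputs rather than in-memory Counters.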





[jira] [Work logged] (BEAM-9825) Transforms for Intersect, IntersectAll, Except, ExceptAll, Union, UnionAll

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9825?focusedWorklogId=433597&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433597
 ]

ASF GitHub Bot logged work on BEAM-9825:


Author: ASF GitHub Bot
Created on: 15/May/20 08:27
Start Date: 15/May/20 08:27
Worklog Time Spent: 10m 
  Work Description: darshanj commented on a change in pull request #11610:
URL: https://github.com/apache/beam/pull/11610#discussion_r425645372



##
File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/SetFns.java
##
@@ -0,0 +1,618 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.transforms;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import java.util.ArrayList;
+import java.util.List;
+import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.transforms.join.CoGbkResult;
+import org.apache.beam.sdk.transforms.join.CoGroupByKey;
+import org.apache.beam.sdk.transforms.join.KeyedPCollectionTuple;
+import org.apache.beam.sdk.transforms.windowing.Trigger;
+import org.apache.beam.sdk.transforms.windowing.WindowFn;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionList;
+import org.apache.beam.sdk.values.TupleTag;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables;
+
+public class SetFns {

Review comment:
   Just to add: my original intention was to have only a binary 
operation. We can make this a 2-to-n-ary transform. Since we added the 
PCollectionList API, we simply allowed a PCollectionList with size == 1. It was 
never meant to behave as Distinct for one collection.







Issue Time Tracking
---

Worklog Id: (was: 433597)
Remaining Estimate: 87.5h  (was: 87h 40m)
Time Spent: 8.5h  (was: 8h 20m)

> Transforms for Intersect, IntersectAll, Except, ExceptAll, Union, UnionAll
> --
>
> Key: BEAM-9825
> URL: https://issues.apache.org/jira/browse/BEAM-9825
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Darshan Jani
>Assignee: Darshan Jani
>Priority: Major
>   Original Estimate: 96h
>  Time Spent: 8.5h
>  Remaining Estimate: 87.5h
>
> I'd like to propose the following new high-level transforms.
>  * Intersect
> Compute the intersection between the elements of two PCollections.
> Given _leftCollection_ and _rightCollection_, this transform returns a 
> collection containing the elements that are common to both _leftCollection_ 
> and _rightCollection_.
>  
>  * Except
> Compute the difference between the elements of two PCollections.
> Given _leftCollection_ and _rightCollection_, this transform returns a 
> collection containing the elements that are in _leftCollection_ but not in 
> _rightCollection_.
>  * Union
> Find the elements that are in either of two PCollections.
> Also implement the IntersectAll, ExceptAll and UnionAll variants of these 
> transforms.





[jira] [Commented] (BEAM-3926) Support MetricsPusher in Dataflow Runner

2020-05-15 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108082#comment-17108082
 ] 

Etienne Chauchot commented on BEAM-3926:


[~foegler], [~pabloem], I have a user asking for this feature in Dataflow. Is 
there a willingness to implement it for the Dataflow runner? 

If so, since the MetricsPusher needs to be instantiated on the engine side (cf. 
the arguments in the description of this ticket), I was wondering whether the 
worker part of the Dataflow runner could be the correct spot; as that code was 
donated, it would enable the community to implement the feature for Dataflow.

> Support MetricsPusher in Dataflow Runner
> 
>
> Key: BEAM-3926
> URL: https://issues.apache.org/jira/browse/BEAM-3926
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow
>Reporter: Scott Wegner
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> See [relevant email 
> thread|https://lists.apache.org/thread.html/2e87f0adcdf8d42317765f298e3e6fdba72917a72d4a12e71e67e4b5@%3Cdev.beam.apache.org%3E].
>  From [~echauchot]:
>   
> _AFAIK Dataflow being a cloud-hosted engine, the related runner is very 
> different from the others. It just submits a job to the cloud-hosted engine. 
> So, no access to metrics containers etc. from the runner. So I think that 
> the MetricsPusher (the component responsible for merging metrics and pushing 
> them to a sink backend) must not be instantiated in DataflowRunner, otherwise 
> it would be more of a client (driver) piece of code and we would lose all the 
> benefit of being close to the execution engine (among other things, 
> instrumentation of the execution of the pipelines). I think that the 
> MetricsPusher needs to be instantiated in the actual Dataflow engine._
>  
>   
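As a rough sketch of the component under discussion (hypothetical names, not the actual Beam MetricsPusher): a periodic loop that polls metric snapshots and pushes them to a sink, which is the kind of thing that only makes sense running engine-side, where the metrics containers live:

```python
import threading
import time

class MetricsPusherSketch:
    """Hypothetical, simplified stand-in for the MetricsPusher component."""

    def __init__(self, poll_metrics, push_to_sink, period_s=5.0):
        self._poll_metrics = poll_metrics  # callable returning {name: value}
        self._push_to_sink = push_to_sink  # callable taking the merged metrics
        self._period_s = period_s
        self._stop = threading.Event()

    def _run(self):
        # Push the current merged snapshot, then sleep until the next period
        # or until stop() is called.
        while not self._stop.is_set():
            self._push_to_sink(self._poll_metrics())
            self._stop.wait(self._period_s)

    def start(self):
        threading.Thread(target=self._run, daemon=True).start()

    def stop(self):
        self._stop.set()
```

Running this in the submitting client would see only job-submission-time state; running it in the workers gives it live access to the execution-time metrics, which is the argument made above.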





[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=433603&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433603
 ]

ASF GitHub Bot logged work on BEAM-9468:


Author: ASF GitHub Bot
Created on: 15/May/20 08:47
Start Date: 15/May/20 08:47
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on pull request #11339:
URL: https://github.com/apache/beam/pull/11339#issuecomment-629113897


   @pabloem everything should be bundled, `build.gradle` for 
`io.google-cloud-platform` contains everything that's needed. Maybe there's 
some other error?
   
   @jaketf Can you elaborate on the `java.lang.NoClassDefFoundError: Could not 
initialize class org.apache.beam.sdk.io.gcp.healthcare.FhirIOTestUtil`? What 
exactly causes this error?
   
   >Are test utility classes not bundled up and set to dataflow?
   
   what do you mean by this?





Issue Time Tracking
---

Worklog Id: (was: 433603)
Time Spent: 42h 10m  (was: 42h)

> Add Google Cloud Healthcare API IO Connectors
> -
>
> Key: BEAM-9468
> URL: https://issues.apache.org/jira/browse/BEAM-9468
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Jacob Ferriero
>Assignee: Jacob Ferriero
>Priority: Minor
>  Time Spent: 42h 10m
>  Remaining Estimate: 0h
>
> Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud 
> Healthcare API|https://cloud.google.com/healthcare/docs/]
> HL7v2IO
> FHIRIO
> DICOM 





[jira] [Comment Edited] (BEAM-3926) Support MetricsPusher in Dataflow Runner

2020-05-15 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108082#comment-17108082
 ] 

Etienne Chauchot edited comment on BEAM-3926 at 5/15/20, 8:47 AM:
--

[~foegler], [~pabloem], [~ajam...@google.com], I have a user asking for this 
feature in Dataflow. Is there a willingness to implement it for the Dataflow 
runner? 

If so, since the MetricsPusher needs to be instantiated on the engine side (cf. 
the arguments in the description of this ticket), I was wondering whether the 
worker part of the Dataflow runner could be the correct spot; as that code was 
donated, it would enable the community to implement the feature for Dataflow.


was (Author: echauchot):
[~foegler], [~pabloem], I have a user who asks for this feature in Dataflow. Is 
there a willingness to implement it for the Dataflow runner? 

If so, as the MetricsPusher needs to be instanciated at the engine side (cf 
arguments in the description of the ticket), I was wondering if the worker part 
of the dataflow runner could be the correct spot and as it was donated it would 
enable the community to implement the feature for Dataflow.

> Support MetricsPusher in Dataflow Runner
> 
>
> Key: BEAM-3926
> URL: https://issues.apache.org/jira/browse/BEAM-3926
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow
>Reporter: Scott Wegner
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> See [relevant email 
> thread|https://lists.apache.org/thread.html/2e87f0adcdf8d42317765f298e3e6fdba72917a72d4a12e71e67e4b5@%3Cdev.beam.apache.org%3E].
>  From [~echauchot]:
>   
> _AFAIK Dataflow being a cloud-hosted engine, the related runner is very 
> different from the others. It just submits a job to the cloud-hosted engine. 
> So, no access to metrics containers etc. from the runner. So I think that 
> the MetricsPusher (the component responsible for merging metrics and pushing 
> them to a sink backend) must not be instantiated in DataflowRunner, otherwise 
> it would be more of a client (driver) piece of code and we would lose all the 
> benefit of being close to the execution engine (among other things, 
> instrumentation of the execution of the pipelines). I think that the 
> MetricsPusher needs to be instantiated in the actual Dataflow engine._
>  
>   





[jira] [Updated] (BEAM-9961) Python MongoDBIO does not apply projection

2020-05-15 Thread Corvin Deboeser (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corvin Deboeser updated BEAM-9961:
--
Component/s: (was: sdk-py-core)

> Python MongoDBIO does not apply projection
> --
>
> Key: BEAM-9961
> URL: https://issues.apache.org/jira/browse/BEAM-9961
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-mongodb
>Affects Versions: 2.20.0
>Reporter: Corvin Deboeser
>Priority: Minor
>
> ReadFromMongoDB does not apply the provided projection when reading from the 
> client; only the filter is applied, as can be seen here:
> https://github.com/apache/beam/blob/9f0cb649d39ee6236ea27f111acb4b66591a80ec/sdks/python/apache_beam/io/mongodbio.py#L204
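For illustration, a hedged sketch (hypothetical helper, not the actual mongodbio code) of what filter-plus-projection semantics look like, which is the behavior one would expect ReadFromMongoDB to delegate to the client:

```python
# Hypothetical illustration of filter vs. projection in a pymongo-style find().
# A projection of {'name': 1} should return only _id and name for each document.
def apply_filter_and_projection(docs, filter_fn, projection):
    for doc in docs:
        if not filter_fn(doc):
            continue  # the filter drops whole documents
        if projection:
            # The projection drops fields; _id is included by default.
            keep = {k for k, v in projection.items() if v} | {"_id"}
            yield {k: v for k, v in doc.items() if k in keep}
        else:
            yield doc

docs = [{"_id": 1, "name": "a", "secret": "x"}]
out = list(apply_filter_and_projection(docs, lambda d: True, {"name": 1}))
print(out)  # [{'_id': 1, 'name': 'a'}]
```

In pymongo itself the projection is simply the second argument to `Collection.find(filter, projection)`, so the fix amounts to passing the stored projection through to that call.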





[jira] [Closed] (BEAM-9306) Python MongoDBIO not applying projection

2020-05-15 Thread Corvin Deboeser (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corvin Deboeser closed BEAM-9306.
-
Fix Version/s: Not applicable
   Resolution: Duplicate

> Python MongoDBIO not applying projection
> 
>
> Key: BEAM-9306
> URL: https://issues.apache.org/jira/browse/BEAM-9306
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.19.0
>Reporter: Corvin Deboeser
>Priority: Major
>  Labels: mongodb
> Fix For: Not applicable
>
>
> The ReadFromMongoDB does not apply the projection (stored in self.projection):
> [https://github.com/apache/beam/blob/ff8ce834ff24e847c90872a4ea545f987e3e0a2d/sdks/python/apache_beam/io/mongodbio.py#L200]
>  





[jira] [Work logged] (BEAM-9825) Transforms for Intersect, IntersectAll, Except, ExceptAll, Union, UnionAll

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9825?focusedWorklogId=433622&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433622
 ]

ASF GitHub Bot logged work on BEAM-9825:


Author: ASF GitHub Bot
Created on: 15/May/20 09:45
Start Date: 15/May/20 09:45
Worklog Time Spent: 10m 
  Work Description: darshanj commented on a change in pull request #11610:
URL: https://github.com/apache/beam/pull/11610#discussion_r425689211



##
File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/SetFns.java
##
@@ -0,0 +1,618 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.transforms;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import java.util.ArrayList;
+import java.util.List;
+import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.transforms.join.CoGbkResult;
+import org.apache.beam.sdk.transforms.join.CoGroupByKey;
+import org.apache.beam.sdk.transforms.join.KeyedPCollectionTuple;
+import org.apache.beam.sdk.transforms.windowing.Trigger;
+import org.apache.beam.sdk.transforms.windowing.WindowFn;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionList;
+import org.apache.beam.sdk.values.TupleTag;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables;
+
+public class SetFns {

Review comment:
   Thanks. Done.







Issue Time Tracking
---

Worklog Id: (was: 433622)
Remaining Estimate: 87h 20m  (was: 87.5h)
Time Spent: 8h 40m  (was: 8.5h)

> Transforms for Intersect, IntersectAll, Except, ExceptAll, Union, UnionAll
> --
>
> Key: BEAM-9825
> URL: https://issues.apache.org/jira/browse/BEAM-9825
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Darshan Jani
>Assignee: Darshan Jani
>Priority: Major
>   Original Estimate: 96h
>  Time Spent: 8h 40m
>  Remaining Estimate: 87h 20m
>
> I'd like to propose following new high-level transforms.
>  * Intersect
> Compute the intersection between elements of two PCollection.
> Given _leftCollection_ and _rightCollection_, this transform returns a 
> collection containing elements that common to both _leftCollection_ and 
> _rightCollection_
>  
>  * Except
> Compute the difference between elements of two PCollection.
> Given _leftCollection_ and _rightCollection_, this transform returns a 
> collection containing elements that are in _leftCollection_ but not in 
> _rightCollection_
>  * Union
> Find the elements that are either of two PCollection.
> Implement IntersetAll, ExceptAll and UnionAll variants of transforms.





[jira] [Work logged] (BEAM-9825) Transforms for Intersect, IntersectAll, Except, ExceptAll, Union, UnionAll

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9825?focusedWorklogId=433623&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433623
 ]

ASF GitHub Bot logged work on BEAM-9825:


Author: ASF GitHub Bot
Created on: 15/May/20 09:46
Start Date: 15/May/20 09:46
Worklog Time Spent: 10m 
  Work Description: darshanj commented on a change in pull request #11610:
URL: https://github.com/apache/beam/pull/11610#discussion_r425689324



##
File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/SetFns.java
##
@@ -0,0 +1,618 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.transforms;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import java.util.ArrayList;
+import java.util.List;
+import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.transforms.join.CoGbkResult;
+import org.apache.beam.sdk.transforms.join.CoGroupByKey;
+import org.apache.beam.sdk.transforms.join.KeyedPCollectionTuple;
+import org.apache.beam.sdk.transforms.windowing.Trigger;
+import org.apache.beam.sdk.transforms.windowing.WindowFn;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionList;
+import org.apache.beam.sdk.values.TupleTag;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables;
+
+public class SetFns {
+
+  /**
+   * Returns a new {@code PTransform} that follows SET DISTINCT semantics to compute the
+   * intersection with the provided {@code PCollection<T>}.
+   *
+   * <p>The argument should not be modified after this is called.
+   *
+   * <p>The elements of the output {@link PCollection} will be all the distinct elements that
+   * are present in both the {@link PCollection} this transform is applied to and the provided
+   * {@link PCollection}.
+   *
+   * <p>Note that this transform requires the {@code Coder} of all the {@code PCollection}s
+   * to be deterministic (see {@link Coder#verifyDeterministic()}). If a collection's
+   * {@code Coder} is not deterministic, an exception is thrown at pipeline construction time.
+   *
+   * <p>All inputs must have equal {@link WindowFn}s and compatible triggers (see {@link
+   * Trigger#isCompatible(Trigger)}).
+   *
+   * <p>By default, the output {@code PCollection} encodes its elements using the same {@code
+   * Coder} as the input {@code PCollection}.
+   *
+   * <pre>{@code
+   * Pipeline p = ...;
+   *
+   * PCollection<String> left = p.apply(Create.of("1", "2", "3", "3", "4", "5"));
+   * PCollection<String> right = p.apply(Create.of("1", "3", "4", "4", "6"));
+   *
+   * PCollection<String> results =
+   *     left.apply(SetFns.intersectDistinct(right)); // contains: "1", "3", "4"
+   * }</pre>
+   *
+   * @param <T> the type of the elements in the input and output {@code PCollection}s.
+   */
+  public static <T> SetImpl<T> intersectDistinct(PCollection<T> rightCollection) {
+    checkNotNull(rightCollection, "rightCollection argument is null");
+    return new SetImpl<>(rightCollection, intersectDistinct());
+  }
+
+  /**
+   * Returns a {@code PTransform} that takes a {@code PCollectionList<PCollection<T>>} and
+   * returns a {@code PCollection<T>} containing the intersection of the collections, computed
+   * in order over all collections in the {@code PCollectionList}.
+   *
+   * <p>Returns a new {@code PTransform} that follows SET DISTINCT semantics, taking a
+   * {@code PCollectionList<PCollection<T>>} and returning a {@code PCollection<T>} containing
+   * the intersection of the collections, computed in order over all collections in the {@code
+   * PCollectionList}.
+   *
+   * <p>The elements of the output {@link PCollection} will be all the distinct elements that
+   * are present in both the current and the next {@link PCollection} in the list, applied to
+   * all collections in order.
+   *
+   * <p>Note that this transform requires the {@code Coder} of all the {@code PCollection}s
+   * to be deterministic (see {

[jira] [Work logged] (BEAM-9825) Transforms for Intersect, IntersectAll, Except, ExceptAll, Union, UnionAll

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9825?focusedWorklogId=433624&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433624
 ]

ASF GitHub Bot logged work on BEAM-9825:


Author: ASF GitHub Bot
Created on: 15/May/20 09:47
Start Date: 15/May/20 09:47
Worklog Time Spent: 10m 
  Work Description: darshanj commented on a change in pull request #11610:
URL: https://github.com/apache/beam/pull/11610#discussion_r425689729



##
File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/SetFns.java
##
@@ -0,0 +1,618 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.transforms;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import java.util.ArrayList;
+import java.util.List;
+import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.transforms.join.CoGbkResult;
+import org.apache.beam.sdk.transforms.join.CoGroupByKey;
+import org.apache.beam.sdk.transforms.join.KeyedPCollectionTuple;
+import org.apache.beam.sdk.transforms.windowing.Trigger;
+import org.apache.beam.sdk.transforms.windowing.WindowFn;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionList;
+import org.apache.beam.sdk.values.TupleTag;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables;
+
+public class SetFns {
+
+  /**
+   * Returns a new {@code PTransform} that follows SET DISTINCT semantics to compute the
+   * intersection with the provided {@code PCollection<T>}.
+   *
+   * The argument should not be modified after this is called.
+   *
+   * The output {@link PCollection} will contain all distinct elements that are present in
+   * both the {@link PCollection} this transform is applied to and the provided {@link
+   * PCollection}.
+   *
+   * Note that this transform requires the {@code Coder} of all input {@code PCollection}s
+   * to be deterministic (see {@link Coder#verifyDeterministic()}). If a collection's {@code Coder}
+   * is not deterministic, an exception is thrown at pipeline construction time.
+   *
+   * All inputs must have equal {@link WindowFn}s and compatible triggers (see {@link
+   * Trigger#isCompatible(Trigger)}).
+   *
+   * By default, the output {@code PCollection} encodes its elements using the same {@code
+   * Coder} as the input {@code PCollection}.
+   *
+   * {@code
+   * Pipeline p = ...;
+   *
+   * PCollection<String> left = p.apply(Create.of("1", "2", "3", "3", "4", "5"));
+   * PCollection<String> right = p.apply(Create.of("1", "3", "4", "4", "6"));
+   *
+   * PCollection<String> results =
+   *     left.apply(SetFns.intersectDistinct(right)); // results will contain: "1", "3", "4"
+   *
+   * }
+   *
+   * @param <T> the type of the elements in the input and output {@code PCollection}s.
+   */
+  public static <T> SetImpl<T> intersectDistinct(PCollection<T> rightCollection) {
+    checkNotNull(rightCollection, "rightCollection argument is null");
+    return new SetImpl<>(rightCollection, intersectDistinct());
+  }
+
+  /**
+   * Returns a new {@code PTransform} that follows SET DISTINCT semantics: it takes a
+   * {@code PCollectionList<T>} and returns a {@code PCollection<T>} containing the
+   * intersection of all collections in the list, computed in order.
+   *
+   * The output {@link PCollection} will contain the distinct elements that are present in
+   * every {@link PCollection} in the list, with the intersection applied pairwise, in order,
+   * over all collections.
+   *
+   * Note that this transform requires the {@code Coder} of all input {@code PCollection}s
+   * to be deterministic (see {
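The SET DISTINCT intersection semantics described in the Javadoc above can be sketched outside of Beam on plain in-memory collections. The helper class below is hypothetical and not part of the proposed patch; the real transform computes the same result over PCollections.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model of SET DISTINCT intersection semantics.
public class IntersectDistinctModel {
  public static <T> Set<T> intersectDistinct(List<T> left, List<T> right) {
    // Keep only the distinct left elements that also occur on the right.
    Set<T> result = new LinkedHashSet<>(left);
    result.retainAll(new LinkedHashSet<>(right));
    return result;
  }
}
```

With the Javadoc's sample inputs ("1", "2", "3", "3", "4", "5" and "1", "3", "4", "4", "6") this yields the documented result {"1", "3", "4"}.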

[jira] [Work logged] (BEAM-9825) Transforms for Intersect, IntersectAll, Except, ExceptAll, Union, UnionAll

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9825?focusedWorklogId=433625&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433625
 ]

ASF GitHub Bot logged work on BEAM-9825:


Author: ASF GitHub Bot
Created on: 15/May/20 09:49
Start Date: 15/May/20 09:49
Worklog Time Spent: 10m 
  Work Description: darshanj commented on a change in pull request #11610:
URL: https://github.com/apache/beam/pull/11610#discussion_r422465224



##
File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/SetFns.java
##
@@ -0,0 +1,261 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.transforms;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import org.apache.beam.sdk.transforms.join.CoGbkResult;
+import org.apache.beam.sdk.transforms.join.CoGroupByKey;
+import org.apache.beam.sdk.transforms.join.KeyedPCollectionTuple;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.TupleTag;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables;
+
+public class SetFns {
+
+  /**
+   * Returns a new {@code SetFns.SetImpl} transform that computes the intersection with the
+   * provided {@code PCollection}.
+   *
+   * The argument should not be modified after this is called.
+   *
+   * The output {@link PCollection} will contain all distinct elements that are present in
+   * both the {@link PCollection} this transform is applied to and the provided {@link
+   * PCollection}.
+   *
+   * {@code
+   * Pipeline p = ...;
+   *
+   * PCollection<String> left = p.apply(Create.of("1", "2", "3", "4", "5"));
+   * PCollection<String> right = p.apply(Create.of("1", "3", "4", "6"));
+   *
+   * PCollection<String> results =
+   *     left.apply(SetFns.intersect(right));
+   * }
+   */
+  public static <T> SetImpl<T> intersect(PCollection<T> rightCollection) {
+    checkNotNull(rightCollection, "rightCollection argument is null");
+    SerializableBiFunction<Long, Long, Long> intersectFn =
+        (numberOfElementsinLeft, numberOfElementsinRight) ->
+            (numberOfElementsinLeft > 0 && numberOfElementsinRight > 0) ? 1L : 0L;
+    return new SetImpl<>(rightCollection, intersectFn);
+  }
+
+  /**
+   * Returns a new {@code SetFns.SetImpl} transform that computes the intersection-all with
+   * the provided {@code PCollection}.
+   *
+   * The argument should not be modified after this is called.
+   *
+   * The output {@link PCollection} follows INTERSECT ALL semantics: given m occurrences of
+   * an element in the {@link PCollection} this transform is applied to (left) and n
+   * occurrences in the provided {@link PCollection} (right), the output will contain
+   * MIN(m, n) occurrences of that element.
+   *
+   * {@code
+   * Pipeline p = ...;
+   *
+   * PCollection<String> left = p.apply(Create.of("1", "2", "3", "4", "5"));
+   * PCollection<String> right = p.apply(Create.of("1", "3", "4", "6"));
+   *
+   * PCollection<String> results =
+   *     left.apply(SetFns.intersectAll(right));
+   * }
+   */
+  public static <T> SetImpl<T> intersectAll(PCollection<T> rightCollection) {
+    checkNotNull(rightCollection, "rightCollection argument is null");
+    SerializableBiFunction<Long, Long, Long> intersectFn =
+        (numberOfElementsinLeft, numberOfElementsinRight) ->
+            (numberOfElementsinLeft > 0 && numberOfElementsinRight > 0)
+                ? Math.min(numberOfElementsinLeft, numberOfElementsinRight) : 0L;
+    return new SetImpl<>(rightCollection, intersectFn);
+  }
+
+  /**
+   * Returns a new {@code SetFns.SetImpl} transform that computes the difference (except)
+   * with the provided {@code PCollection}.
+   *
+   * The argument should not be modified after this is called.
+   *
+   * The output {@link PCollection} will contain all distinct elements that are present in
+   * the {@link PCollection} this transform is applied to but not in the provided {@link
+   * PCollection}.
+   *
+   * {@code
+   * Pipeline p = ...;
+   *
+   * PCollection left = p.apply(Create.of("1", "2", "3", "4", "5"));
+   * PCollection right = p.apply(Crea
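The SerializableBiFunction lambdas above reduce each set operation to a per-element decision on occurrence counts. The sketch below is a hypothetical plain-Java model of that idea, not the Beam implementation (which gathers the per-key counts with CoGroupByKey); the EXCEPT_ALL function is added by analogy and is an assumption.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiFunction;

// Plain-Java model of the count-based set semantics: for each distinct
// element, a function of its occurrence counts on the left and right
// decides how many copies appear in the output.
public class SetCountModel {
  // Same logic as the SerializableBiFunction lambdas in the patch.
  public static final BiFunction<Long, Long, Long> INTERSECT =
      (m, n) -> (m > 0 && n > 0) ? 1L : 0L;
  public static final BiFunction<Long, Long, Long> INTERSECT_ALL =
      (m, n) -> (m > 0 && n > 0) ? Math.min(m, n) : 0L;
  // Hypothetical EXCEPT ALL counterpart: keep the left-over left occurrences.
  public static final BiFunction<Long, Long, Long> EXCEPT_ALL =
      (m, n) -> Math.max(m - n, 0L);

  // Count the occurrences of each element in a list.
  private static <T> Map<T, Long> counts(List<T> xs) {
    Map<T, Long> c = new HashMap<>();
    for (T x : xs) {
      c.merge(x, 1L, Long::sum);
    }
    return c;
  }

  // Apply a count function to every element seen on either side; elements
  // with a zero result are dropped from the output multiset.
  public static <T> Map<T, Long> apply(
      BiFunction<Long, Long, Long> fn, List<T> left, List<T> right) {
    Map<T, Long> l = counts(left);
    Map<T, Long> r = counts(right);
    Set<T> keys = new HashSet<>(l.keySet());
    keys.addAll(r.keySet());
    Map<T, Long> out = new HashMap<>();
    for (T k : keys) {
      long emit = fn.apply(l.getOrDefault(k, 0L), r.getOrDefault(k, 0L));
      if (emit > 0) {
        out.put(k, emit);
      }
    }
    return out;
  }
}
```

This makes the difference between the variants concrete: INTERSECT emits at most one copy per common element, while INTERSECT_ALL keeps the smaller multiplicity.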

[jira] [Work logged] (BEAM-9825) Transforms for Intersect, IntersectAll, Except, ExceptAll, Union, UnionAll

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9825?focusedWorklogId=433626&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433626
 ]

ASF GitHub Bot logged work on BEAM-9825:


Author: ASF GitHub Bot
Created on: 15/May/20 09:51
Start Date: 15/May/20 09:51
Worklog Time Spent: 10m 
  Work Description: darshanj commented on a change in pull request #11610:
URL: https://github.com/apache/beam/pull/11610#discussion_r425692155



##
File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/SetFns.java
##
@@ -0,0 +1,618 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.transforms;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import java.util.ArrayList;
+import java.util.List;
+import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.transforms.join.CoGbkResult;
+import org.apache.beam.sdk.transforms.join.CoGroupByKey;
+import org.apache.beam.sdk.transforms.join.KeyedPCollectionTuple;
+import org.apache.beam.sdk.transforms.windowing.Trigger;
+import org.apache.beam.sdk.transforms.windowing.WindowFn;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionList;
+import org.apache.beam.sdk.values.TupleTag;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables;
+
+public class SetFns {

Review comment:
   I have removed this to support a PCollectionList with only one PCollection.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 433626)
Remaining Estimate: 86h 40m  (was: 86h 50m)
Time Spent: 9h 20m  (was: 9h 10m)

> Transforms for Intersect, IntersectAll, Except, ExceptAll, Union, UnionAll
> --
>
> Key: BEAM-9825
> URL: https://issues.apache.org/jira/browse/BEAM-9825
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Darshan Jani
>Assignee: Darshan Jani
>Priority: Major
>   Original Estimate: 96h
>  Time Spent: 9h 20m
>  Remaining Estimate: 86h 40m
>
> I'd like to propose the following new high-level transforms.
>  * Intersect
> Compute the intersection between elements of two PCollections.
> Given _leftCollection_ and _rightCollection_, this transform returns a 
> collection containing elements that are common to both _leftCollection_ and 
> _rightCollection_
>  
>  * Except
> Compute the difference between elements of two PCollections.
> Given _leftCollection_ and _rightCollection_, this transform returns a 
> collection containing elements that are in _leftCollection_ but not in 
> _rightCollection_
>  * Union
> Find the elements that are in either of two PCollections.
> Implement IntersectAll, ExceptAll and UnionAll variants of these transforms.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-2530) Make Beam compatible with next Java LTS version (Java 11)

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2530?focusedWorklogId=433628&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433628
 ]

ASF GitHub Bot logged work on BEAM-2530:


Author: ASF GitHub Bot
Created on: 15/May/20 10:07
Start Date: 15/May/20 10:07
Worklog Time Spent: 10m 
  Work Description: kamilwu merged pull request #11659:
URL: https://github.com/apache/beam/pull/11659


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 433628)
Time Spent: 6h 20m  (was: 6h 10m)

> Make Beam compatible with next Java LTS version (Java 11)
> -
>
> Key: BEAM-2530
> URL: https://issues.apache.org/jira/browse/BEAM-2530
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Affects Versions: Not applicable
>Reporter: Ismaël Mejía
>Priority: Major
>  Labels: Java11, java9
> Fix For: Not applicable
>
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> The goal of this task is to validate that the Java SDK and the Java Direct 
> Runner (and its tests) work as intended on the next Java LTS version (Java 11 
> /18.9). For this we will base the compilation on the java.base profile and 
> include other core Java modules when needed.  
> *Notes:*
> - Ideally validation of the IOs/extensions will be included but if serious 
> issues are found they will be tracked independently.
> - The goal of using the Java Platform module system is out of the scope of 
> this work.
> - Support for other runners will be tracked as a separate effort because 
> other runners depend strongly on the support of the native runner ecosystems.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9722) Add batch SnowflakeIO.Read to Java SDK

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9722?focusedWorklogId=433680&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433680
 ]

ASF GitHub Bot logged work on BEAM-9722:


Author: ASF GitHub Bot
Created on: 15/May/20 13:20
Start Date: 15/May/20 13:20
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on pull request #11360:
URL: https://github.com/apache/beam/pull/11360#issuecomment-629232037


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 433680)
Time Spent: 3h 50m  (was: 3h 40m)

> Add batch SnowflakeIO.Read to Java SDK
> --
>
> Key: BEAM-9722
> URL: https://issues.apache.org/jira/browse/BEAM-9722
> Project: Beam
>  Issue Type: New Feature
>  Components: io-ideas
>Reporter: Kasia Kucharczyk
>Assignee: Dariusz Aniszewski
>Priority: Major
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9722) Add batch SnowflakeIO.Read to Java SDK

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9722?focusedWorklogId=433681&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433681
 ]

ASF GitHub Bot logged work on BEAM-9722:


Author: ASF GitHub Bot
Created on: 15/May/20 13:21
Start Date: 15/May/20 13:21
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on pull request #11360:
URL: https://github.com/apache/beam/pull/11360#issuecomment-629232725


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 433681)
Time Spent: 4h  (was: 3h 50m)

> Add batch SnowflakeIO.Read to Java SDK
> --
>
> Key: BEAM-9722
> URL: https://issues.apache.org/jira/browse/BEAM-9722
> Project: Beam
>  Issue Type: New Feature
>  Components: io-ideas
>Reporter: Kasia Kucharczyk
>Assignee: Dariusz Aniszewski
>Priority: Major
>  Time Spent: 4h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9722) Add batch SnowflakeIO.Read to Java SDK

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9722?focusedWorklogId=433682&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433682
 ]

ASF GitHub Bot logged work on BEAM-9722:


Author: ASF GitHub Bot
Created on: 15/May/20 13:22
Start Date: 15/May/20 13:22
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev removed a comment on pull request #11360:
URL: https://github.com/apache/beam/pull/11360#issuecomment-629232037


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 433682)
Time Spent: 4h 10m  (was: 4h)

> Add batch SnowflakeIO.Read to Java SDK
> --
>
> Key: BEAM-9722
> URL: https://issues.apache.org/jira/browse/BEAM-9722
> Project: Beam
>  Issue Type: New Feature
>  Components: io-ideas
>Reporter: Kasia Kucharczyk
>Assignee: Dariusz Aniszewski
>Priority: Major
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-10001) Change the code block colors from grey to blue to increase the contrast between text and background.

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10001?focusedWorklogId=433693&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433693
 ]

ASF GitHub Bot logged work on BEAM-10001:
-

Author: ASF GitHub Bot
Created on: 15/May/20 13:39
Start Date: 15/May/20 13:39
Worklog Time Spent: 10m 
  Work Description: bntnam opened a new pull request #11719:
URL: https://github.com/apache/beam/pull/11719


   Change the code block colors from grey to blue to increase the contrast 
between text and background.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   

[jira] [Work logged] (BEAM-10001) Change the code block colors from grey to blue to increase the contrast between text and background.

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10001?focusedWorklogId=433694&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433694
 ]

ASF GitHub Bot logged work on BEAM-10001:
-

Author: ASF GitHub Bot
Created on: 15/May/20 13:40
Start Date: 15/May/20 13:40
Worklog Time Spent: 10m 
  Work Description: bntnam commented on pull request #11719:
URL: https://github.com/apache/beam/pull/11719#issuecomment-629241952


   R: @pabloem 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 433694)
Time Spent: 20m  (was: 10m)

> Change the code block colors from grey to blue to increase the contrast 
> between text and background.
> 
>
> Key: BEAM-10001
> URL: https://issues.apache.org/jira/browse/BEAM-10001
> Project: Beam
>  Issue Type: Task
>  Components: website
>Reporter: Aizhamal Nurmamat kyzy
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Example: [https://beam.apache.org/get-started/try-apache-beam/]
> The old background color: 
> [http://apache-beam-website-pull-requests.storage.googleapis.com/11705/documentation/programming-guide/index.html#creating-a-pipeline]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-10005) Unable to use ApproximateQuantiles.globally when not windowed by GlobalWindows

2020-05-15 Thread Darshan Jani (Jira)
Darshan Jani created BEAM-10005:
---

 Summary: Unable to use ApproximateQuantiles.globally when not 
windowed by GlobalWindows
 Key: BEAM-10005
 URL: https://issues.apache.org/jira/browse/BEAM-10005
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-core
Affects Versions: 2.20.0
Reporter: Darshan Jani


Unable to use ApproximateQuantiles.globally when the input is not windowed by 
GlobalWindows.
To make it run, we need to call either 
{code:java}
.withoutDefaults()
{code}
or
{code:java}
.asSingletonView()
{code}

Currently we can't call either of the above on ApproximateQuantiles.globally(), as 
it does not return the underlying Combine.Globally but an opaque PTransform.

Example failing case:

{code:java}
PCollection<Long> elements = p.apply(GenerateSequence.from(0).to(100)
    .withRate(1, Duration.millis(1)).withTimestampFn(Instant::new));

PCollection<List<Long>> input = elements
    .apply(Window.into(SlidingWindows.of(Duration.millis(3)).every(Duration.millis(1))))
    .apply(ApproximateQuantiles.globally(17));
{code}

It throws expected error from internal Combine.globally() transform:

{code:java}
Default values are not supported in Combine.globally() if the input PCollection 
is not windowed by GlobalWindows. Instead, use 
Combine.globally().withoutDefaults() to output an empty PCollection if the 
input PCollection is empty, or Combine.globally().asSingletonView() to get the 
default output of the CombineFn if the input PCollection is empty.
{code}
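The restriction exists because with SlidingWindows each element belongs to several windows and Combine emits one result per window, so a single global default value is not well-defined. The sketch below is a hypothetical plain-Java helper, not Beam code, mirroring the assignment rule of SlidingWindows.of(size).every(period) to show how many windows a single timestamp falls into:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of sliding-window assignment: every window of length
// `size`, starting every `period`, that contains timestamp `ts`.
public class SlidingWindowModel {
  // Returns the [start, end) interval of each window containing ts,
  // from the latest-starting window to the earliest.
  public static List<long[]> assignWindows(long ts, long size, long period) {
    List<long[]> windows = new ArrayList<>();
    long lastStart = ts - Math.floorMod(ts, period);
    for (long start = lastStart; start > ts - size; start -= period) {
      windows.add(new long[] {start, start + size});
    }
    return windows;
  }
}
```

For the example above (size 3 ms, period 1 ms), a timestamp lands in three windows, and Combine.globally() would have to produce a quantiles output for every one of them, including windows that end up empty; hence the need for withoutDefaults() or asSingletonView().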




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-10005) Unable to use ApproximateQuantiles.globally when not windowed by GlobalWindows

2020-05-15 Thread Darshan Jani (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Darshan Jani reassigned BEAM-10005:
---

Assignee: Darshan Jani

> Unable to use ApproximateQuantiles.globally when not windowed by GlobalWindows
> --
>
> Key: BEAM-10005
> URL: https://issues.apache.org/jira/browse/BEAM-10005
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.20.0
>Reporter: Darshan Jani
>Assignee: Darshan Jani
>Priority: Major
>
> Unable to use ApproximateQuantiles.globally when the input is not windowed by 
> GlobalWindows.
> To make it run we need to set either 
> {code:java}
> .withoutDefaults()
> {code}
> or
> {code:java}
> .asSingletonView()
> {code}
> Currently we can't call either of the above on ApproximateQuantiles.globally(), 
> as it does not return the underlying Combine.Globally but an opaque PTransform.
> Example failing case:
> {code:java}
> PCollection<Long> elements = p.apply(GenerateSequence.from(0).to(100)
>     .withRate(1, Duration.millis(1)).withTimestampFn(Instant::new));
>
> PCollection<List<Long>> input = elements
>     .apply(Window.into(SlidingWindows.of(Duration.millis(3)).every(Duration.millis(1))))
>     .apply(ApproximateQuantiles.globally(17));
> {code}
> It throws expected error from internal Combine.globally() transform:
> {code:java}
> Default values are not supported in Combine.globally() if the input 
> PCollection is not windowed by GlobalWindows. Instead, use 
> Combine.globally().withoutDefaults() to output an empty PCollection if the 
> input PCollection is empty, or Combine.globally().asSingletonView() to get 
> the default output of the CombineFn if the input PCollection is empty.
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10005) Unable to use ApproximateQuantiles.globally/ApproximateUniques.globally when inputs not windowed by GlobalWindows

2020-05-15 Thread Darshan Jani (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Darshan Jani updated BEAM-10005:

Description: 
Unable to use ApproximateQuantiles.globally or ApproximateUniques.globally when 
the input is not windowed by GlobalWindows.
To make it run we need to set either 
{code:java}
.withoutDefaults()
{code}
or
{code:java}
.asSingletonView()
{code}

Currently we can't call either of the above on ApproximateQuantiles.globally(), as 
it does not return the underlying Combine.Globally but an opaque PTransform.

Example failing case:

{code:java}
PCollection<Long> elements = p.apply(GenerateSequence.from(0).to(100)
    .withRate(1, Duration.millis(1)).withTimestampFn(Instant::new));

PCollection<List<Long>> input = elements
    .apply(Window.into(SlidingWindows.of(Duration.millis(3)).every(Duration.millis(1))))
    .apply(ApproximateQuantiles.globally(17));
{code}

It throws expected error from internal Combine.globally() transform:

{code:java}
Default values are not supported in Combine.globally() if the input PCollection 
is not windowed by GlobalWindows. Instead, use 
Combine.globally().withoutDefaults() to output an empty PCollection if the 
input PCollection is empty, or Combine.globally().asSingletonView() to get the 
default output of the CombineFn if the input PCollection is empty.
{code}
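The underlying problem is an API-shape issue: the convenience wrapper composes the inner Combine.Globally instead of returning it, so its withoutDefaults()/asSingletonView() are unreachable from the caller. A minimal Python sketch of that pattern (all names hypothetical, not Beam's actual API):

```python
class CombineGlobally:
    """Stand-in for the inner combine transform."""

    def without_defaults(self):
        # variant that is safe for non-global windowing
        return "combine-without-defaults"


class ApproximateQuantilesGlobally:
    """Convenience wrapper that hides the inner combine.

    Callers receive only this wrapper, so the inner transform's
    configuration methods cannot be reached or forwarded.
    """

    def __init__(self, num_quantiles):
        self.num_quantiles = num_quantiles
        self._inner = CombineGlobally()  # hidden implementation detail


wrapper = ApproximateQuantilesGlobally(17)
# The method exists on the inner object but not on what the caller holds:
assert hasattr(wrapper._inner, "without_defaults")
assert not hasattr(wrapper, "without_defaults")
```

A fix along these lines would either return the inner transform's concrete type or forward the two configuration methods on the wrapper.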


  was:
Unable to use ApproximateQuantiles.globally with input windowed not using 
GlobalWindows.
To make it run we need to set either 
{code:java}
.withoutDefaults()
{code}
or
{code:java}
.asSingletonView()
{code}

Currently we can't call any of the above on ApproximateQuantiles.globally() as 
it does not return underlying Combine.globally, but PTransform.

Example failing case:

{code:java}
PCollection<Long> elements = p.apply(GenerateSequence.from(0).to(100)
    .withRate(1, Duration.millis(1)).withTimestampFn(Instant::new));

PCollection<List<Long>> input = elements
    .apply(Window.into(SlidingWindows.of(Duration.millis(3)).every(Duration.millis(1))))
    .apply(ApproximateQuantiles.globally(17));
{code}

It throws expected error from internal Combine.globally() transform:

{code:java}
Default values are not supported in Combine.globally() if the input PCollection 
is not windowed by GlobalWindows. Instead, use 
Combine.globally().withoutDefaults() to output an empty PCollection if the 
input PCollection is empty, or Combine.globally().asSingletonView() to get the 
default output of the CombineFn if the input PCollection is empty.
{code}



> Unable to use ApproximateQuantiles.globally/ApproximateUniques.globally when 
> inputs not windowed by GlobalWindows
> -
>
> Key: BEAM-10005
> URL: https://issues.apache.org/jira/browse/BEAM-10005
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.20.0
>Reporter: Darshan Jani
>Assignee: Darshan Jani
>Priority: Major
>
> Unable to use ApproximateQuantiles.globally or ApproximateUniques.globally
> with input that is not windowed by GlobalWindows.
> To make it run we need to set either 
> {code:java}
> .withoutDefaults()
> {code}
> or
> {code:java}
> .asSingletonView()
> {code}
> Currently we can't call any of the above on ApproximateQuantiles.globally() 
> as it does not return underlying Combine.globally, but PTransform.
> Example failing case:
> {code:java}
> PCollection<Long> elements = p.apply(GenerateSequence.from(0).to(100)
>   .withRate(1, Duration.millis(1)).withTimestampFn(Instant::new));
>   PCollection<List<Long>> input = elements
>   .apply(Window.into(SlidingWindows.of(Duration.millis(3)).every(Duration.millis(1))))
>   .apply(ApproximateQuantiles.globally(17));
> {code}
> It throws expected error from internal Combine.globally() transform:
> {code:java}
> Default values are not supported in Combine.globally() if the input 
> PCollection is not windowed by GlobalWindows. Instead, use 
> Combine.globally().withoutDefaults() to output an empty PCollection if the 
> input PCollection is empty, or Combine.globally().asSingletonView() to get 
> the default output of the CombineFn if the input PCollection is empty.
> {code}





[jira] [Updated] (BEAM-10005) Unable to use ApproximateQuantiles.globally/ApproximateUnique.globally when inputs not windowed by GlobalWindows

2020-05-15 Thread Darshan Jani (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Darshan Jani updated BEAM-10005:

Summary: Unable to use 
ApproximateQuantiles.globally/ApproximateUnique.globally when inputs not 
windowed by GlobalWindows  (was: Unable to use 
ApproximateQuantiles.globally/ApproximateUniques.globally when inputs not 
windowed by GlobalWindows)

> Unable to use ApproximateQuantiles.globally/ApproximateUnique.globally when 
> inputs not windowed by GlobalWindows
> 
>
> Key: BEAM-10005
> URL: https://issues.apache.org/jira/browse/BEAM-10005
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.20.0
>Reporter: Darshan Jani
>Assignee: Darshan Jani
>Priority: Major
>
> Unable to use ApproximateQuantiles.globally or ApproximateUnique.globally
> with input that is not windowed by GlobalWindows.
> To make it run we need to set either 
> {code:java}
> .withoutDefaults()
> {code}
> or
> {code:java}
> .asSingletonView()
> {code}
> Currently we can't call any of the above on ApproximateQuantiles.globally() 
> as it does not return underlying Combine.globally, but PTransform.
> Example failing case:
> {code:java}
> PCollection<Long> elements = p.apply(GenerateSequence.from(0).to(100)
>   .withRate(1, Duration.millis(1)).withTimestampFn(Instant::new));
>   PCollection<List<Long>> input = elements
>   .apply(Window.into(SlidingWindows.of(Duration.millis(3)).every(Duration.millis(1))))
>   .apply(ApproximateQuantiles.globally(17));
> {code}
> It throws expected error from internal Combine.globally() transform:
> {code:java}
> Default values are not supported in Combine.globally() if the input 
> PCollection is not windowed by GlobalWindows. Instead, use 
> Combine.globally().withoutDefaults() to output an empty PCollection if the 
> input PCollection is empty, or Combine.globally().asSingletonView() to get 
> the default output of the CombineFn if the input PCollection is empty.
> {code}





[jira] [Updated] (BEAM-10005) Unable to use ApproximateQuantiles.globally/ApproximateUniques.globally when inputs not windowed by GlobalWindows

2020-05-15 Thread Darshan Jani (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Darshan Jani updated BEAM-10005:

Summary: Unable to use 
ApproximateQuantiles.globally/ApproximateUniques.globally when inputs not 
windowed by GlobalWindows  (was: Unable to use ApproximateQuantiles.globally 
when not windowed by GlobalWindows)

> Unable to use ApproximateQuantiles.globally/ApproximateUniques.globally when 
> inputs not windowed by GlobalWindows
> -
>
> Key: BEAM-10005
> URL: https://issues.apache.org/jira/browse/BEAM-10005
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.20.0
>Reporter: Darshan Jani
>Assignee: Darshan Jani
>Priority: Major
>
> Unable to use ApproximateQuantiles.globally with input windowed not using 
> GlobalWindows.
> To make it run we need to set either 
> {code:java}
> .withoutDefaults()
> {code}
> or
> {code:java}
> .asSingletonView()
> {code}
> Currently we can't call any of the above on ApproximateQuantiles.globally() 
> as it does not return underlying Combine.globally, but PTransform.
> Example failing case:
> {code:java}
> PCollection<Long> elements = p.apply(GenerateSequence.from(0).to(100)
>   .withRate(1, Duration.millis(1)).withTimestampFn(Instant::new));
>   PCollection<List<Long>> input = elements
>   .apply(Window.into(SlidingWindows.of(Duration.millis(3)).every(Duration.millis(1))))
>   .apply(ApproximateQuantiles.globally(17));
> {code}
> It throws expected error from internal Combine.globally() transform:
> {code:java}
> Default values are not supported in Combine.globally() if the input 
> PCollection is not windowed by GlobalWindows. Instead, use 
> Combine.globally().withoutDefaults() to output an empty PCollection if the 
> input PCollection is empty, or Combine.globally().asSingletonView() to get 
> the default output of the CombineFn if the input PCollection is empty.
> {code}





[jira] [Updated] (BEAM-10005) Unable to use ApproximateQuantiles.globally/ApproximateUnique.globally when inputs not windowed by GlobalWindows

2020-05-15 Thread Darshan Jani (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Darshan Jani updated BEAM-10005:

Description: 
Unable to use ApproximateQuantiles.globally or ApproximateUnique.globally with
input that is not windowed by GlobalWindows.
To make it run we need to set either 
{code:java}
.withoutDefaults()
{code}
or
{code:java}
.asSingletonView()
{code}

Currently we can't call either of the above on
ApproximateQuantiles.globally()/ApproximateUnique.globally(), because they do
not return the underlying Combine.Globally but a plain PTransform (or
ApproximateUnique.Globally in the case of ApproximateUnique).

Example failing case:

{code:java}
PCollection<Long> elements = p.apply(GenerateSequence.from(0).to(100)
    .withRate(1, Duration.millis(1)).withTimestampFn(Instant::new));

PCollection<List<Long>> input = elements
    .apply(Window.into(SlidingWindows.of(Duration.millis(3)).every(Duration.millis(1))))
    .apply(ApproximateQuantiles.globally(17));
{code}

It throws expected error from internal Combine.globally() transform:

{code:java}
Default values are not supported in Combine.globally() if the input PCollection 
is not windowed by GlobalWindows. Instead, use 
Combine.globally().withoutDefaults() to output an empty PCollection if the 
input PCollection is empty, or Combine.globally().asSingletonView() to get the 
default output of the CombineFn if the input PCollection is empty.
{code}


  was:
Unable to use ApproximateQuantiles.globally or ApproximateQuantiles.globally 
with input windowed not using GlobalWindows.
To make it run we need to set either 
{code:java}
.withoutDefaults()
{code}
or
{code:java}
.asSingletonView()
{code}

Currently we can't call any of the above on ApproximateQuantiles.globally() as 
it does not return underlying Combine.globally, but PTransform.

Example failing case:

{code:java}
PCollection<Long> elements = p.apply(GenerateSequence.from(0).to(100)
    .withRate(1, Duration.millis(1)).withTimestampFn(Instant::new));

PCollection<List<Long>> input = elements
    .apply(Window.into(SlidingWindows.of(Duration.millis(3)).every(Duration.millis(1))))
    .apply(ApproximateQuantiles.globally(17));
{code}

It throws expected error from internal Combine.globally() transform:

{code:java}
Default values are not supported in Combine.globally() if the input PCollection 
is not windowed by GlobalWindows. Instead, use 
Combine.globally().withoutDefaults() to output an empty PCollection if the 
input PCollection is empty, or Combine.globally().asSingletonView() to get the 
default output of the CombineFn if the input PCollection is empty.
{code}



> Unable to use ApproximateQuantiles.globally/ApproximateUnique.globally when 
> inputs not windowed by GlobalWindows
> 
>
> Key: BEAM-10005
> URL: https://issues.apache.org/jira/browse/BEAM-10005
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.20.0
>Reporter: Darshan Jani
>Assignee: Darshan Jani
>Priority: Major
>
> Unable to use ApproximateQuantiles.globally or ApproximateUnique.globally
> with input that is not windowed by GlobalWindows.
> To make it run we need to set either 
> {code:java}
> .withoutDefaults()
> {code}
> or
> {code:java}
> .asSingletonView()
> {code}
> Currently we can't call either of the above on
> ApproximateQuantiles.globally()/ApproximateUnique.globally(), because they do
> not return the underlying Combine.Globally but a plain PTransform (or
> ApproximateUnique.Globally in the case of ApproximateUnique).
> Example failing case:
> {code:java}
> PCollection<Long> elements = p.apply(GenerateSequence.from(0).to(100)
>   .withRate(1, Duration.millis(1)).withTimestampFn(Instant::new));
>   PCollection<List<Long>> input = elements
>   .apply(Window.into(SlidingWindows.of(Duration.millis(3)).every(Duration.millis(1))))
>   .apply(ApproximateQuantiles.globally(17));
> {code}
> It throws expected error from internal Combine.globally() transform:
> {code:java}
> Default values are not supported in Combine.globally() if the input 
> PCollection is not windowed by GlobalWindows. Instead, use 
> Combine.globally().withoutDefaults() to output an empty PCollection if the 
> input PCollection is empty, or Combine.globally().asSingletonView() to get 
> the default output of the CombineFn if the input PCollection is empty.
> {code}





[jira] [Work logged] (BEAM-9966) Investigate variance in checkpoint duration of ParDo streaming tests

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9966?focusedWorklogId=433720&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433720
 ]

ASF GitHub Bot logged work on BEAM-9966:


Author: ASF GitHub Bot
Created on: 15/May/20 15:03
Start Date: 15/May/20 15:03
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #11693:
URL: https://github.com/apache/beam/pull/11693#issuecomment-629284358


   Run Python Load Tests ParDo Flink Streaming



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 433720)
Time Spent: 1h 10m  (was: 1h)

> Investigate variance in checkpoint duration of ParDo streaming tests
> 
>
> Key: BEAM-9966
> URL: https://issues.apache.org/jira/browse/BEAM-9966
> Project: Beam
>  Issue Type: Bug
>  Components: build-system, runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> We need to take a closer look at the variance in checkpoint duration which, 
> for different test runs, fluctuates between one second and one minute. 
> https://apache-beam-testing.appspot.com/explore?dashboard=5751884853805056





[jira] [Work logged] (BEAM-9966) Investigate variance in checkpoint duration of ParDo streaming tests

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9966?focusedWorklogId=433729&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433729
 ]

ASF GitHub Bot logged work on BEAM-9966:


Author: ASF GitHub Bot
Created on: 15/May/20 15:26
Start Date: 15/May/20 15:26
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #11693:
URL: https://github.com/apache/beam/pull/11693#issuecomment-629307107


   Run Python Load Tests ParDo Flink Streaming





Issue Time Tracking
---

Worklog Id: (was: 433729)
Time Spent: 1h 20m  (was: 1h 10m)

> Investigate variance in checkpoint duration of ParDo streaming tests
> 
>
> Key: BEAM-9966
> URL: https://issues.apache.org/jira/browse/BEAM-9966
> Project: Beam
>  Issue Type: Bug
>  Components: build-system, runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> We need to take a closer look at the variance in checkpoint duration which, 
> for different test runs, fluctuates between one second and one minute. 
> https://apache-beam-testing.appspot.com/explore?dashboard=5751884853805056





[jira] [Work logged] (BEAM-2939) Fn API SDF support

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2939?focusedWorklogId=433734&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433734
 ]

ASF GitHub Bot logged work on BEAM-2939:


Author: ASF GitHub Bot
Created on: 15/May/20 15:26
Start Date: 15/May/20 15:26
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #11716:
URL: https://github.com/apache/beam/pull/11716#issuecomment-629307948


   LGTM. Thanks.





Issue Time Tracking
---

Worklog Id: (was: 433734)
Time Spent: 32h 20m  (was: 32h 10m)

> Fn API SDF support
> --
>
> Key: BEAM-2939
> URL: https://issues.apache.org/jira/browse/BEAM-2939
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Henning Rohde
>Assignee: Luke Cwik
>Priority: Major
>  Labels: portability
>  Time Spent: 32h 20m
>  Remaining Estimate: 0h
>
> The Fn API should support streaming SDF. Detailed design TBD.
> Once design is ready, expand subtasks similarly to BEAM-2822.





[jira] [Work logged] (BEAM-9966) Investigate variance in checkpoint duration of ParDo streaming tests

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9966?focusedWorklogId=433730&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433730
 ]

ASF GitHub Bot logged work on BEAM-9966:


Author: ASF GitHub Bot
Created on: 15/May/20 15:26
Start Date: 15/May/20 15:26
Worklog Time Spent: 10m 
  Work Description: mxm removed a comment on pull request #11693:
URL: https://github.com/apache/beam/pull/11693#issuecomment-629284358


   Run Python Load Tests ParDo Flink Streaming





Issue Time Tracking
---

Worklog Id: (was: 433730)
Time Spent: 1.5h  (was: 1h 20m)

> Investigate variance in checkpoint duration of ParDo streaming tests
> 
>
> Key: BEAM-9966
> URL: https://issues.apache.org/jira/browse/BEAM-9966
> Project: Beam
>  Issue Type: Bug
>  Components: build-system, runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> We need to take a closer look at the variance in checkpoint duration which, 
> for different test runs, fluctuates between one second and one minute. 
> https://apache-beam-testing.appspot.com/explore?dashboard=5751884853805056





[jira] [Work logged] (BEAM-9646) [Java] PTransform that integrates Cloud Vision functionality

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9646?focusedWorklogId=433736&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433736
 ]

ASF GitHub Bot logged work on BEAM-9646:


Author: ASF GitHub Bot
Created on: 15/May/20 15:33
Start Date: 15/May/20 15:33
Worklog Time Spent: 10m 
  Work Description: tysonjh commented on a change in pull request #11331:
URL: https://github.com/apache/beam/pull/11331#discussion_r425882497



##
File path: 
sdks/java/extensions/ml/src/main/java/org/apache/beam/sdk/extensions/ml/AnnotateImages.java
##
@@ -0,0 +1,209 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.ml;
+
+import com.google.cloud.vision.v1.AnnotateImageRequest;
+import com.google.cloud.vision.v1.AnnotateImageResponse;
+import com.google.cloud.vision.v1.BatchAnnotateImagesResponse;
+import com.google.cloud.vision.v1.Feature;
+import com.google.cloud.vision.v1.ImageAnnotatorClient;
+import com.google.cloud.vision.v1.ImageContext;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.Random;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.GroupIntoBatches;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.values.KV;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionView;
+
+/**
+ * Parent class for transforms utilizing the Cloud Vision API.
+ *
+ * @param <T> Type of input PCollection.
+ */
+public abstract class AnnotateImages<T>
+    extends PTransform<PCollection<T>, PCollection<List<AnnotateImageResponse>>> {
+
+  private static final Long MIN_BATCH_SIZE = 1L;
+  private static final Long MAX_BATCH_SIZE = 5L;
+
+  protected final PCollectionView<Map<T, ImageContext>> contextSideInput;
+  protected final List<Feature> featureList;
+  private long batchSize;
+
+  public AnnotateImages(
+      PCollectionView<Map<T, ImageContext>> contextSideInput,
+      List<Feature> featureList,
+      long batchSize) {
+    this.contextSideInput = contextSideInput;
+    this.featureList = featureList;
+    checkBatchSizeCorrectness(batchSize);
+    this.batchSize = batchSize;
+  }
+
+  public AnnotateImages(List<Feature> featureList, long batchSize) {
+    contextSideInput = null;
+    this.featureList = featureList;
+    checkBatchSizeCorrectness(batchSize);
+    this.batchSize = batchSize;
+  }
+
+  private void checkBatchSizeCorrectness(long batchSize) {
+    if (batchSize > MAX_BATCH_SIZE) {
+      throw new IllegalArgumentException(
+          String.format(
+              "Max batch size exceeded.\n" + "Batch size needs to be equal or smaller than %d",
+              MAX_BATCH_SIZE));
+    } else if (batchSize < MIN_BATCH_SIZE) {
+      throw new IllegalArgumentException(
+          String.format(
+              "Min batch size not reached.\n" + "Batch size needs to be larger than %d",
+              MIN_BATCH_SIZE));
+    }
+  }
+
+  /**
+   * Applies all necessary transforms to call the Vision API. In order to group requests into
+   * batches, we assign keys to the requests, as {@link GroupIntoBatches} works only on {@link KV}s.
+   */
+  @Override
+  public PCollection<List<AnnotateImageResponse>> expand(PCollection<T> input) {
+    ParDo.SingleOutput<T, AnnotateImageRequest> inputToRequestMapper;
+    if (contextSideInput != null) {
+      inputToRequestMapper =
+          ParDo.of(new MapInputToRequest(contextSideInput)).withSideInputs(contextSideInput);
+    } else {
+      inputToRequestMapper = ParDo.of(new MapInputToRequest(null));
+    }
+    return input
+        .apply(inputToRequestMapper)
+        .apply(ParDo.of(new AssignRandomKeys()))
+        .apply(GroupIntoBatches.ofSize(batchSize))
+        .apply(ParDo.of(new ExtractValues()))
+        .apply(ParDo.of(new PerformImageAnnotation()));
+  }
+
+  /**
+   * Input type to {@link AnnotateImageRequest} mapper. Needs to be implemented by child classes.
+   *
+   * @param input Input element.
+   * @param ctx optional image context.
+   * @return A valid {@link AnnotateImageRequest} object.
+   */
+  public abstract AnnotateImageRequest mapToRequest(T input, ImageContext ctx);
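The expand() chain above works around GroupIntoBatches operating only on KVs by assigning random keys, batching, and then dropping the keys again. The same keying-and-batching idea in a plain-Python sketch (illustrative only, not the Beam implementation):

```python
import random
from collections import defaultdict


def group_into_batches(elements, batch_size, num_keys=4):
    """Batch elements by first spreading them across random keys."""
    # AssignRandomKeys: attach a random key so batches can form in parallel.
    keyed = [(random.randrange(num_keys), e) for e in elements]

    # GroupIntoBatches: collect values per key.
    per_key = defaultdict(list)
    for key, value in keyed:
        per_key[key].append(value)

    # ExtractValues: emit fixed-size batches per key, flushing any remainder.
    batches = []
    for values in per_key.values():
        for i in range(0, len(values), batch_size):
            batches.append(values[i:i + batch_size])
    return batches


batches = group_into_batches(range(10), batch_size=5)
assert all(1 <= len(b) <= 5 for b in batches)
assert sorted(x for b in batches for x in b) == list(range(10))
```

The random keyspace trades batch fullness for parallelism: more keys means more concurrent batches but more remainder flushes.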

[jira] [Commented] (BEAM-9994) Cannot create a virtualenv using Python 3.8 on Jenkins machines

2020-05-15 Thread Kamil Wasilewski (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108408#comment-17108408
 ] 

Kamil Wasilewski commented on BEAM-9994:


Tried python3.8 -m venv env, the result was:

{noformat}
Error: Command '['/home/kamilwu/env/bin/python3.8', '-Im', 'ensurepip', 
'--upgrade', '--default-pip']' returned non-zero exit status 1.
{noformat}

However, python3.8 -m venv --without-pip env does work.

I cloned a Jenkins VM image, the name is *jenkins-worker*, if someone is
willing to take a look.
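For reference, the working python3.8 -m venv --without-pip corresponds to the stdlib builder with the ensurepip bootstrap disabled, which is exactly the step that failed above; a quick sketch:

```python
import os
import tempfile
import venv

# with_pip=False skips the `ensurepip --upgrade --default-pip` bootstrap,
# the step that returned non-zero exit status 1 on the Jenkins workers.
with tempfile.TemporaryDirectory() as tmp:
    env_dir = os.path.join(tmp, "env")
    venv.EnvBuilder(with_pip=False).create(env_dir)
    # The environment itself is created fine; pip can be installed into it
    # afterwards (e.g. via get-pip.py).
    assert os.path.isfile(os.path.join(env_dir, "pyvenv.cfg"))
```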


> Cannot create a virtualenv using Python 3.8 on Jenkins machines
> ---
>
> Key: BEAM-9994
> URL: https://issues.apache.org/jira/browse/BEAM-9994
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Kamil Wasilewski
>Priority: Blocker
>
> Command: *virtualenv --python /usr/bin/python3.8 env*
> Output:
> {noformat}
> Running virtualenv with interpreter /usr/bin/python3.8
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.5/dist-packages/virtualenv.py", line 22, in 
> 
> import distutils.spawn
> ModuleNotFoundError: No module named 'distutils.spawn'
> {noformat}
> Example test affected: 
> https://builds.apache.org/job/beam_PreCommit_PythonFormatter_Commit/1723/console





[jira] [Created] (BEAM-10006) PipelineOptions can pick up definitions from unrelated tests

2020-05-15 Thread Brian Hulette (Jira)
Brian Hulette created BEAM-10006:


 Summary: PipelineOptions can pick up definitions from unrelated 
tests
 Key: BEAM-10006
 URL: https://issues.apache.org/jira/browse/BEAM-10006
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Brian Hulette


Since PipelineOptions uses {{__subclasses__}} to look for all definitions, when 
used in tests it can sometimes pick up sub-classes that were created in 
previously executed tests.

See BEAM-9975 for more details.
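The mechanism is reproducible in plain Python: `__subclasses__` is process-global, so a subclass defined by an earlier test remains visible to later ones as long as anything still references it (sketch with hypothetical names, not Beam's actual PipelineOptions):

```python
class PipelineOptions:
    """Stand-in for Beam's options base class."""

    @classmethod
    def discovered_options(cls):
        # Mimics discovering option definitions via __subclasses__.
        return [sub.__name__ for sub in cls.__subclasses__()]


_registry = []  # simulates an earlier test module keeping a reference


def test_one():
    # A "test" that defines its own options subclass locally.
    class TestOneOptions(PipelineOptions):
        pass

    _registry.append(TestOneOptions)


def test_two():
    # test_two never defined TestOneOptions, yet discovery still sees it.
    return PipelineOptions.discovered_options()


test_one()
assert "TestOneOptions" in test_two()
```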





[jira] [Updated] (BEAM-10006) PipelineOptions can pick up definitions from unrelated tests

2020-05-15 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-10006:
-
Status: Open  (was: Triage Needed)

> PipelineOptions can pick up definitions from unrelated tests
> 
>
> Key: BEAM-10006
> URL: https://issues.apache.org/jira/browse/BEAM-10006
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Brian Hulette
>Priority: Major
>
> Since PipelineOptions uses {{\_\_subclasses\_\_}} to look for all 
> definitions, when used in tests it can sometimes pick up sub-classes that 
> were created in previously executed tests.
> See BEAM-9975 for more details.





[jira] [Updated] (BEAM-10006) PipelineOptions can pick up definitions from unrelated tests

2020-05-15 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-10006:
-
Description: 
Since PipelineOptions uses {{\_\_subclasses\_\_}} to look for all definitions, 
when used in tests it can sometimes pick up sub-classes that were created in 
previously executed tests.

See BEAM-9975 for more details.

  was:
Since PipelineOptions uses {{__subclasses__}} to look for all definitions, when 
used in tests it can sometimes pick up sub-classes that were created in 
previously executed tests.

See BEAM-9975 for more details.


> PipelineOptions can pick up definitions from unrelated tests
> 
>
> Key: BEAM-10006
> URL: https://issues.apache.org/jira/browse/BEAM-10006
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Brian Hulette
>Priority: Major
>
> Since PipelineOptions uses {{\_\_subclasses\_\_}} to look for all 
> definitions, when used in tests it can sometimes pick up sub-classes that 
> were created in previously executed tests.
> See BEAM-9975 for more details.





[jira] [Created] (BEAM-10007) PortableRunner doesn't handle ValueProvider instances when converting pipeline options

2020-05-15 Thread Brian Hulette (Jira)
Brian Hulette created BEAM-10007:


 Summary: PortableRunner doesn't handle ValueProvider instances 
when converting pipeline options
 Key: BEAM-10007
 URL: https://issues.apache.org/jira/browse/BEAM-10007
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Brian Hulette


We attempt to convert ValueProvider instances directly to JSON with 
json_format, leading to errors like the one described in BEAM-9975.
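The failure mode can be sketched without Beam: serializing the options dict chokes on an object-valued entry, while resolving via .get() first serializes fine (the StaticValueProvider below is a simplified stand-in, not Beam's class):

```python
import json


class StaticValueProvider:
    """Stand-in for Beam's ValueProvider; wraps a deferred value."""

    def __init__(self, value):
        self.value = value

    def get(self):
        return self.value


options = {"input": StaticValueProvider("gs://bucket/file")}

# Direct serialization fails: the provider object is not JSON-serializable.
try:
    json.dumps(options)
    raised = False
except TypeError:
    raised = True
assert raised

# Calling .get() first (the proposed workaround) serializes fine.
resolved = {k: v.get() if isinstance(v, StaticValueProvider) else v
            for k, v in options.items()}
assert json.dumps(resolved) == '{"input": "gs://bucket/file"}'
```

As noted in the comments below, the catch is that a consumer expecting a ValueProvider would then call .get() on an already-resolved plain value.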





[jira] [Commented] (BEAM-10006) PipelineOptions can pick up definitions from unrelated tests

2020-05-15 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108410#comment-17108410
 ] 

Brian Hulette commented on BEAM-10006:
--

Unfortunately I can't find a way to reproduce this locally, but we've seen it 
happen in Jenkins, as in BEAM-9975

> PipelineOptions can pick up definitions from unrelated tests
> 
>
> Key: BEAM-10006
> URL: https://issues.apache.org/jira/browse/BEAM-10006
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Brian Hulette
>Priority: Major
>
> Since PipelineOptions uses {{\_\_subclasses\_\_}} to look for all 
> definitions, when used in tests it can sometimes pick up sub-classes that 
> were created in previously executed tests.
> See BEAM-9975 for more details.





[jira] [Commented] (BEAM-10007) PortableRunner doesn't handle ValueProvider instances when converting pipeline options

2020-05-15 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108411#comment-17108411
 ] 

Brian Hulette commented on BEAM-10007:
--

The solution here may be as simple as just calling {{.get()}} on any 
ValueProvider instances, but I'm not sure.

> PortableRunner doesn't handle ValueProvider instances when converting 
> pipeline options
> --
>
> Key: BEAM-10007
> URL: https://issues.apache.org/jira/browse/BEAM-10007
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Brian Hulette
>Priority: Major
>
> We attempt to convert ValueProvider instances directly to JSON with 
> json_format, leading to errors like the one described in BEAM-9975.





[jira] [Updated] (BEAM-10007) PortableRunner doesn't handle ValueProvider instances when converting pipeline options

2020-05-15 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-10007:
-
Status: Open  (was: Triage Needed)

> PortableRunner doesn't handle ValueProvider instances when converting 
> pipeline options
> --
>
> Key: BEAM-10007
> URL: https://issues.apache.org/jira/browse/BEAM-10007
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Brian Hulette
>Priority: Major
>
> We attempt to convert ValueProvider instances directly to JSON with 
> json_format, leading to errors like the one described in BEAM-9975.





[jira] [Commented] (BEAM-10007) PortableRunner doesn't handle ValueProvider instances when converting pipeline options

2020-05-15 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108414#comment-17108414
 ] 

Brian Hulette commented on BEAM-10007:
--

My concern is that a consumer of the option will expect it to be a 
ValueProvider instance, and attempt to call .get() itself.

> PortableRunner doesn't handle ValueProvider instances when converting 
> pipeline options
> --
>
> Key: BEAM-10007
> URL: https://issues.apache.org/jira/browse/BEAM-10007
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Brian Hulette
>Priority: Major
>
> We attempt to convert ValueProvider instances directly to JSON with 
> json_format, leading to errors like the one described in BEAM-9975.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9814) Error occurred when transforming from row to a new row without setCoder

2020-05-15 Thread Brendan Stennett (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108415#comment-17108415
 ] 

Brendan Stennett commented on BEAM-9814:


[~reuvenlax] this appears even in simple examples without manipulating the 
RowCoder

 

 ```java
Pipeline p = Pipeline.create(options);

Schema schema = Schema.builder()
    .addField("value", Schema.FieldType.STRING)
    .build();

p.apply(Create.of("row1", "row2", "row3"))
    .apply(ParDo.of(new DoFn<String, Row>() {
      @ProcessElement
      public void processElement(@Element String input, OutputReceiver<Row> out) {
        Row row = Row.withSchema(schema)
            .addValue(input)
            .build();

        out.output(row);
      }
    }))
    .apply(ParDo.of(new DoFn<Row, String>() {
      @ProcessElement
      public void processElement(@Element Row row, OutputReceiver<String> out) {
        out.output(row.getString("value"));
      }
    }))
    .apply("Write", TextIO.write().to(options.getOutput()));
 ```

> Error occurred when transforming from row to a new row without setCoder
> ---
>
> Key: BEAM-9814
> URL: https://issues.apache.org/jira/browse/BEAM-9814
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.19.0
>Reporter: Ruixue Liao
>Assignee: Reuven Lax
>Priority: Major
>
> The output row from the transform function is verified against the input 
> row's schema, which causes an error. Ex:
> {code}
> .apply(MapElements.via(
>     new SimpleFunction<Row, Row>() {
>         @Override
>         public Row apply(Row line) {
>             return Row.withSchema(newSchema).addValues("a", 1, "b").build();
>         }
>     }));
> {code}
> An error is raised when the output row's schema is not the same as the input 
> row's. Adding {{.setCoder(RowCoder.of(newSchema))}} after the transform 
> function makes it work.
> Related link: 
> [https://stackoverflow.com/questions/61236972/beam-sql-udf-to-split-one-column-into-multiple-columns]
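
The failure mode described above can be shown in miniature (a hypothetical sketch, not Beam's code — `run_stage`, `SchemaMismatchError`, and the dict-based "rows" are invented for illustration): when the output coder defaults to the input schema, rows built with a new schema fail validation until the output schema is set explicitly, analogous to calling setCoder(RowCoder.of(newSchema)).

```python
# Hypothetical miniature of the coder-inference problem (not Beam's code):
# the stage's output schema defaults to the input schema unless set explicitly.

class SchemaMismatchError(Exception):
    pass

def run_stage(rows, input_schema, transform, output_schema=None):
    # Without an explicit output schema (the "setCoder" step), the input
    # schema is reused to validate the transform's output rows.
    coder_schema = output_schema or input_schema
    out = []
    for row in rows:
        result = transform(row)
        if set(result) != set(coder_schema):
            raise SchemaMismatchError(
                f"row fields {sorted(result)} != schema {sorted(coder_schema)}")
        out.append(result)
    return out

rows = [{"line": "x"}]
widen = lambda row: {"a": "a", "b": 1, "c": "b"}

# The inferred coder (input schema) rejects the widened row ...
try:
    run_stage(rows, ["line"], widen)
except SchemaMismatchError as e:
    print(type(e).__name__)  # SchemaMismatchError

# ... while an explicit output schema makes it work.
print(run_stage(rows, ["line"], widen, output_schema=["a", "b", "c"]))
```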





[jira] [Comment Edited] (BEAM-9814) Error occurred when transforming from row to a new row without setCoder

2020-05-15 Thread Brendan Stennett (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108415#comment-17108415
 ] 

Brendan Stennett edited comment on BEAM-9814 at 5/15/20, 3:57 PM:
--

[~reuvenlax] this appears even in simple examples without manipulating the 
RowCoder

 

{code:language=java}
Pipeline p = Pipeline.create(options);

Schema schema = Schema.builder()
    .addField("value", Schema.FieldType.STRING)
    .build();

p.apply(Create.of("row1", "row2", "row3"))
    .apply(ParDo.of(new DoFn<String, Row>() {
      @ProcessElement
      public void processElement(@Element String input, OutputReceiver<Row> out) {
        Row row = Row.withSchema(schema)
            .addValue(input)
            .build();

        out.output(row);
      }
    }))
    .apply(ParDo.of(new DoFn<Row, String>() {
      @ProcessElement
      public void processElement(@Element Row row, OutputReceiver<String> out) {
        out.output(row.getString("value"));
      }
    }))
    .apply("Write", TextIO.write().to(options.getOutput()));
{code}


was (Author: brendan6):
[~reuvenlax] this appears even in simple examples without manipulating the 
RowCoder

 

 ```java
Pipeline p = Pipeline.create(options);

Schema schema = Schema.builder()
    .addField("value", Schema.FieldType.STRING)
    .build();

p.apply(Create.of("row1", "row2", "row3"))
    .apply(ParDo.of(new DoFn<String, Row>() {
      @ProcessElement
      public void processElement(@Element String input, OutputReceiver<Row> out) {
        Row row = Row.withSchema(schema)
            .addValue(input)
            .build();

        out.output(row);
      }
    }))
    .apply(ParDo.of(new DoFn<Row, String>() {
      @ProcessElement
      public void processElement(@Element Row row, OutputReceiver<String> out) {
        out.output(row.getString("value"));
      }
    }))
    .apply("Write", TextIO.write().to(options.getOutput()));
 ```

> Error occurred when transforming from row to a new row without setCoder
> ---
>
> Key: BEAM-9814
> URL: https://issues.apache.org/jira/browse/BEAM-9814
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.19.0
>Reporter: Ruixue Liao
>Assignee: Reuven Lax
>Priority: Major
>
> The output row from the transform function is verified against the input 
> row's schema, which causes an error. Ex:
> {code}
> .apply(MapElements.via(
>     new SimpleFunction<Row, Row>() {
>         @Override
>         public Row apply(Row line) {
>             return Row.withSchema(newSchema).addValues("a", 1, "b").build();
>         }
>     }));
> {code}
> An error is raised when the output row's schema is not the same as the input 
> row's. Adding {{.setCoder(RowCoder.of(newSchema))}} after the transform 
> function makes it work.
> Related link: 
> [https://stackoverflow.com/questions/61236972/beam-sql-udf-to-split-one-column-into-multiple-columns]





[jira] [Comment Edited] (BEAM-9814) Error occurred when transforming from row to a new row without setCoder

2020-05-15 Thread Brendan Stennett (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108415#comment-17108415
 ] 

Brendan Stennett edited comment on BEAM-9814 at 5/15/20, 4:02 PM:
--

[~reuvenlax] this appears even in simple examples without manipulating the 
RowCoder

 
{code:java}
Pipeline p = Pipeline.create(options);

Schema schema = Schema.builder()
    .addField("value", Schema.FieldType.STRING)
    .build();

p.apply(Create.of("row1", "row2", "row3"))
    .apply(ParDo.of(new DoFn<String, Row>() {
      @ProcessElement
      public void processElement(@Element String input, OutputReceiver<Row> out) {
        Row row = Row.withSchema(schema)
            .addValue(input)
            .build();

        out.output(row);
      }
    }))
    .apply(ParDo.of(new DoFn<Row, String>() {
      @ProcessElement
      public void processElement(@Element Row row, OutputReceiver<String> out) {
        out.output(row.getString("value"));
      }
    }))
    .apply("Write", TextIO.write().to(options.getOutput()));
{code}

Edit: This exact example works in 2.20.0 but not in 2.21.0-SNAPSHOT or 
2.22.0-SNAPSHOT


was (Author: brendan6):
[~reuvenlax] this appears even in simple examples without manipulating the 
RowCoder

 

{code:language=java}
Pipeline p = Pipeline.create(options);

Schema schema = Schema.builder()
    .addField("value", Schema.FieldType.STRING)
    .build();

p.apply(Create.of("row1", "row2", "row3"))
    .apply(ParDo.of(new DoFn<String, Row>() {
      @ProcessElement
      public void processElement(@Element String input, OutputReceiver<Row> out) {
        Row row = Row.withSchema(schema)
            .addValue(input)
            .build();

        out.output(row);
      }
    }))
    .apply(ParDo.of(new DoFn<Row, String>() {
      @ProcessElement
      public void processElement(@Element Row row, OutputReceiver<String> out) {
        out.output(row.getString("value"));
      }
    }))
    .apply("Write", TextIO.write().to(options.getOutput()));
{code}

> Error occurred when transforming from row to a new row without setCoder
> ---
>
> Key: BEAM-9814
> URL: https://issues.apache.org/jira/browse/BEAM-9814
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.19.0
>Reporter: Ruixue Liao
>Assignee: Reuven Lax
>Priority: Major
>
> The output row from the transform function is verified against the input 
> row's schema, which causes an error. Ex:
> {code}
> .apply(MapElements.via(
>     new SimpleFunction<Row, Row>() {
>         @Override
>         public Row apply(Row line) {
>             return Row.withSchema(newSchema).addValues("a", 1, "b").build();
>         }
>     }));
> {code}
> An error is raised when the output row's schema is not the same as the input 
> row's. Adding {{.setCoder(RowCoder.of(newSchema))}} after the transform 
> function makes it work.
> Related link: 
> [https://stackoverflow.com/questions/61236972/beam-sql-udf-to-split-one-column-into-multiple-columns]





[jira] [Work logged] (BEAM-9964) Setting workerCacheMb to make its way to the WindmillStateCache Constructor

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9964?focusedWorklogId=433741&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433741
 ]

ASF GitHub Bot logged work on BEAM-9964:


Author: ASF GitHub Bot
Created on: 15/May/20 16:04
Start Date: 15/May/20 16:04
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #11710:
URL: https://github.com/apache/beam/pull/11710#issuecomment-629342947


   Run Java PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 433741)
Time Spent: 2h 40m  (was: 2.5h)

> Setting workerCacheMb to make its way to the WindmillStateCache Constructor
> ---
>
> Key: BEAM-9964
> URL: https://issues.apache.org/jira/browse/BEAM-9964
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Omar Ismail
>Assignee: Omar Ismail
>Priority: Minor
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Setting --workerCacheMB seems to affect batch pipelines only. For Streaming, 
> the cache seems to be hardcoded to 100 MB [1]. If possible, I would like to 
> make it possible to change the cache size in Streaming when setting 
> --workerCacheMB.
> I've never made changes to the Beam SDK, so I am super excited to work on 
> this! 
>  
> [[1] 
> https://github.com/apache/beam/blob/5e659bb80bcbf70795f6806e05a255ee72706d9f/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillStateCache.java#L73|https://github.com/apache/beam/blob/5e659bb80bcbf70795f6806e05a255ee72706d9f/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillStateCache.java#L73]





[jira] [Work logged] (BEAM-9964) Setting workerCacheMb to make its way to the WindmillStateCache Constructor

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9964?focusedWorklogId=433742&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433742
 ]

ASF GitHub Bot logged work on BEAM-9964:


Author: ASF GitHub Bot
Created on: 15/May/20 16:06
Start Date: 15/May/20 16:06
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #11710:
URL: https://github.com/apache/beam/pull/11710#issuecomment-629343652


   Run Java PreCommit





Issue Time Tracking
---

Worklog Id: (was: 433742)
Time Spent: 2h 50m  (was: 2h 40m)

> Setting workerCacheMb to make its way to the WindmillStateCache Constructor
> ---
>
> Key: BEAM-9964
> URL: https://issues.apache.org/jira/browse/BEAM-9964
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Omar Ismail
>Assignee: Omar Ismail
>Priority: Minor
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Setting --workerCacheMB seems to affect batch pipelines only. For Streaming, 
> the cache seems to be hardcoded to 100 MB [1]. If possible, I would like to 
> make it possible to change the cache size in Streaming when setting 
> --workerCacheMB.
> I've never made changes to the Beam SDK, so I am super excited to work on 
> this! 
>  
> [[1] 
> https://github.com/apache/beam/blob/5e659bb80bcbf70795f6806e05a255ee72706d9f/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillStateCache.java#L73|https://github.com/apache/beam/blob/5e659bb80bcbf70795f6806e05a255ee72706d9f/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillStateCache.java#L73]





[jira] [Work logged] (BEAM-8280) re-enable IOTypeHints.from_callable

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8280?focusedWorklogId=433744&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433744
 ]

ASF GitHub Bot logged work on BEAM-8280:


Author: ASF GitHub Bot
Created on: 15/May/20 16:07
Start Date: 15/May/20 16:07
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #11070:
URL: https://github.com/apache/beam/pull/11070#issuecomment-629344436


   > Hi @udim I have my edits ready, but I can't push to this branch. I can 
make a PR against it, if you'd like. There's an option for each PR to "allow 
edits from maintainers", which could help, but I'm not sure if I'm considered a 
Beam maintainer or not.
   > 
   > Alternately, I can provide my edits as "suggestions" in a review, and you 
can apply them that way.
   
   @chadrik, do you have the "edit file" option available?
   
![xns3dSvObwH](https://user-images.githubusercontent.com/127695/82071819-7a052700-968b-11ea-990c-452cb5d04b5e.png)
   





Issue Time Tracking
---

Worklog Id: (was: 433744)
Time Spent: 13h 10m  (was: 13h)

> re-enable IOTypeHints.from_callable
> ---
>
> Key: BEAM-8280
> URL: https://issues.apache.org/jira/browse/BEAM-8280
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 13h 10m
>  Remaining Estimate: 0h
>
> See https://issues.apache.org/jira/browse/BEAM-8279





[jira] [Work logged] (BEAM-9977) Build Kafka Read on top of Java SplittableDoFn

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9977?focusedWorklogId=433743&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433743
 ]

ASF GitHub Bot logged work on BEAM-9977:


Author: ASF GitHub Bot
Created on: 15/May/20 16:07
Start Date: 15/May/20 16:07
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on pull request #11715:
URL: https://github.com/apache/beam/pull/11715#issuecomment-629344196


   R: @lukecwik 
   CC: @robertwb 





Issue Time Tracking
---

Worklog Id: (was: 433743)
Time Spent: 0.5h  (was: 20m)

> Build Kafka Read on top of Java SplittableDoFn
> --
>
> Key: BEAM-9977
> URL: https://issues.apache.org/jira/browse/BEAM-9977
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-kafka
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Assigned] (BEAM-9865) Clean up jenkins workspaces for successful jobs

2020-05-15 Thread Udi Meiri (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udi Meiri reassigned BEAM-9865:
---

Assignee: (was: Udi Meiri)

> Clean up jenkins workspaces for successful jobs
> ---
>
> Key: BEAM-9865
> URL: https://issues.apache.org/jira/browse/BEAM-9865
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Udi Meiri
>Priority: Major
>
> Two recent bugs (and many more in the past) citing lack of disk space:
> https://issues.apache.org/jira/browse/BEAM-9854
> https://issues.apache.org/jira/browse/BEAM-9462
> There are around 150 workspaces on each Jenkins machine: 
> apache-beam-jenkins-1..15.
> Total size:
> 1: 175G
> 7: 158G
> 8: 173G
> The majority of jobs use a clone of the Beam repo and read/write files under 
> src/, which is wiped out at the start of the job (wipeOutWorkspace()), so 
> there is really no point in keeping workspace files around after the job has 
> completed successfully.





[jira] [Work logged] (BEAM-9977) Build Kafka Read on top of Java SplittableDoFn

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9977?focusedWorklogId=433745&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433745
 ]

ASF GitHub Bot logged work on BEAM-9977:


Author: ASF GitHub Bot
Created on: 15/May/20 16:10
Start Date: 15/May/20 16:10
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on a change in pull request #11715:
URL: https://github.com/apache/beam/pull/11715#discussion_r425885914



##
File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/GrowableOffsetRangeTracker.java
##
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.transforms.splittabledofn;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.annotations.Experimental.Kind;
+import org.apache.beam.sdk.io.range.OffsetRange;
+
+/**
+ * A special {@link OffsetRangeTracker} for tracking a growable offset range.
+ * Long.MAX_VALUE is used as the range end to indicate the possibility of infinity.
+ *
+ * An offset range is considered growable when the end offset could grow (or
+ * change) during execution time (e.g., Kafka backlog, appended file).
+ */
+@Experimental(Kind.SPLITTABLE_DO_FN)
+public class GrowableOffsetRangeTracker extends OffsetRangeTracker {
+  /**
+   * An interface that should be implemented to fetch the estimated end offset
+   * of the range.
+   *
+   * {@code estimateRangeEnd} is called to give the end offset when {@code trySplit}
+   * or {@code getProgress} is invoked. The end offset is exclusive for the range.
+   * The estimate need not increase monotonically; it is only taken into account
+   * when it is larger than the current position. Returning Long.MAX_VALUE as the
+   * estimate means the largest possible position for the range is Long.MAX_VALUE - 1.
+   * A good estimate is important for providing a good signal of progress and for
+   * splitting at a proper position.
+   */
+  public interface OffsetPoller {
+long estimateRangeEnd();
+  }

Review comment:
   ```suggestion
 @FunctionalInterface
 public interface RangeEndEstimator {
   long estimate();
 }
   ```
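
The tracker under review consults a caller-supplied estimator for the range end when splitting. A minimal Python sketch of that idea (simplified names and split arithmetic; not the actual Beam class — `GrowableOffsetRangeTracker`, `try_claim`, and `try_split` here are illustrative stand-ins):

```python
# Hedged sketch of a growable offset range tracker: the end is "possibly
# infinite" (Long.MAX_VALUE stand-in) and a caller-supplied estimator
# supplies a range end for computing the split point.

MAX_OFFSET = 2**63 - 1  # stands in for Java's Long.MAX_VALUE


class GrowableOffsetRangeTracker:
    def __init__(self, start, estimate_range_end):
        self._start = start
        self._stop = MAX_OFFSET
        self._last_claimed = None
        self._estimate_range_end = estimate_range_end

    def try_claim(self, offset):
        if offset >= self._stop:
            return False
        self._last_claimed = offset
        return True

    def try_split(self, fraction):
        cur = self._last_claimed if self._last_claimed is not None else self._start - 1
        # The estimate only matters when it is beyond the current position.
        estimated_end = max(self._estimate_range_end(), cur + 1)
        split_offset = cur + max(1, round((estimated_end - cur) * fraction))
        if split_offset >= self._stop:
            return None
        residual = (split_offset, self._stop)
        self._stop = split_offset
        return (self._start, self._stop), residual


# A backlog whose estimated end has grown to offset 100 by split time:
tracker = GrowableOffsetRangeTracker(0, lambda: 100)
tracker.try_claim(10)
primary, residual = tracker.try_split(0.5)
print(primary, residual)  # (0, 55) (55, 9223372036854775807)
```

A better estimator yields a split point closer to the true backlog midpoint, which is why the estimate matters for progress and split quality.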

##
File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/GrowableOffsetRangeTracker.java
##
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.transforms.splittabledofn;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.annotations.Experimental.Kind;
+import org.apache.beam.sdk.io.range.OffsetRange;
+
+/**
+ * A special {@link OffsetRangeTracker} for tracking a growable offset range.
+ * Long.MAX_VALUE is used as the range end to indicate the possibility of infinity.
+ *
+ * An offset range is considered growable when the end offset could grow (or
+ * change) during execution time (e.g., Kafka backlog, appended file).
+ */
+@Experimental(Kind.SPLITTABLE_DO_FN)
+public class GrowableOffsetRangeTracker extends OffsetRangeTracker {
+  /**
+   * An interface that should be implemented to fetch estim

[jira] [Work logged] (BEAM-2939) Fn API SDF support

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2939?focusedWorklogId=433746&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433746
 ]

ASF GitHub Bot logged work on BEAM-2939:


Author: ASF GitHub Bot
Created on: 15/May/20 16:11
Start Date: 15/May/20 16:11
Worklog Time Spent: 10m 
  Work Description: lukecwik merged pull request #11716:
URL: https://github.com/apache/beam/pull/11716


   





Issue Time Tracking
---

Worklog Id: (was: 433746)
Time Spent: 32.5h  (was: 32h 20m)

> Fn API SDF support
> --
>
> Key: BEAM-2939
> URL: https://issues.apache.org/jira/browse/BEAM-2939
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Henning Rohde
>Assignee: Luke Cwik
>Priority: Major
>  Labels: portability
>  Time Spent: 32.5h
>  Remaining Estimate: 0h
>
> The Fn API should support streaming SDF. Detailed design TBD.
> Once design is ready, expand subtasks similarly to BEAM-2822.





[jira] [Work logged] (BEAM-9966) Investigate variance in checkpoint duration of ParDo streaming tests

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9966?focusedWorklogId=433747&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433747
 ]

ASF GitHub Bot logged work on BEAM-9966:


Author: ASF GitHub Bot
Created on: 15/May/20 16:12
Start Date: 15/May/20 16:12
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #11693:
URL: https://github.com/apache/beam/pull/11693#issuecomment-629346778


   I'm going to proceed merging this to ensure the load test runs correctly. 
   
   CC @tweise 





Issue Time Tracking
---

Worklog Id: (was: 433747)
Time Spent: 1h 40m  (was: 1.5h)

> Investigate variance in checkpoint duration of ParDo streaming tests
> 
>
> Key: BEAM-9966
> URL: https://issues.apache.org/jira/browse/BEAM-9966
> Project: Beam
>  Issue Type: Bug
>  Components: build-system, runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> We need to take a closer look at the variance in checkpoint duration which, 
> for different test runs, fluctuates between one second and one minute. 
> https://apache-beam-testing.appspot.com/explore?dashboard=5751884853805056





[jira] [Commented] (BEAM-10002) Mongo cursor timeout leads to CursorNotFound error

2020-05-15 Thread Yichi Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108429#comment-17108429
 ] 

Yichi Zhang commented on BEAM-10002:


Thanks for the issue report [~corvin]. Would you also like to contribute a PR 
to address it? I'm not sure when I'll get time to work on this at the 
moment. 

> Mongo cursor timeout leads to CursorNotFound error
> --
>
> Key: BEAM-10002
> URL: https://issues.apache.org/jira/browse/BEAM-10002
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-mongodb
>Affects Versions: 2.20.0
>Reporter: Corvin Deboeser
>Assignee: Yichi Zhang
>Priority: Major
>
> If some work items take a lot of processing time and the cursor of a bundle 
> is not queried for too long, then MongoDB will time out the cursor, which 
> results in
> {code:java}
> pymongo.errors.CursorNotFound: cursor id ... not found
> {code}
>  





[jira] [Work logged] (BEAM-9966) Investigate variance in checkpoint duration of ParDo streaming tests

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9966?focusedWorklogId=433748&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433748
 ]

ASF GitHub Bot logged work on BEAM-9966:


Author: ASF GitHub Bot
Created on: 15/May/20 16:14
Start Date: 15/May/20 16:14
Worklog Time Spent: 10m 
  Work Description: mxm merged pull request #11693:
URL: https://github.com/apache/beam/pull/11693


   





Issue Time Tracking
---

Worklog Id: (was: 433748)
Time Spent: 1h 50m  (was: 1h 40m)

> Investigate variance in checkpoint duration of ParDo streaming tests
> 
>
> Key: BEAM-9966
> URL: https://issues.apache.org/jira/browse/BEAM-9966
> Project: Beam
>  Issue Type: Bug
>  Components: build-system, runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> We need to take a closer look at the variance in checkpoint duration which, 
> for different test runs, fluctuates between one second and one minute. 
> https://apache-beam-testing.appspot.com/explore?dashboard=5751884853805056





[jira] [Work logged] (BEAM-6733) Use Flink 1.6/1.7 prepareSnapshotPreBarrier to replace BufferedOutputManager

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6733?focusedWorklogId=433749&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433749
 ]

ASF GitHub Bot logged work on BEAM-6733:


Author: ASF GitHub Bot
Created on: 15/May/20 16:14
Start Date: 15/May/20 16:14
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #11678:
URL: https://github.com/apache/beam/pull/11678#issuecomment-629348473


   Run Java PreCommit





Issue Time Tracking
---

Worklog Id: (was: 433749)
Time Spent: 3h 10m  (was: 3h)

> Use Flink 1.6/1.7 prepareSnapshotPreBarrier to replace BufferedOutputManager
> 
>
> Key: BEAM-6733
> URL: https://issues.apache.org/jira/browse/BEAM-6733
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Flink 1.6/1.7 provides a hook to execute an action before the snapshot 
> barrier is emitted by the operator. At the moment (<=1.5) the Flink Runner 
> has to buffer any elements which are emitted during a snapshot because the 
> barrier has already been emitted. This leads to a lot of code complexity.
> We can remove the buffering in favor of finishing the current bundle in 
> {{DoFnOperator}}'s {{prepareSnapshotPreBarrier}}. The 1.5/1.6/1.7 build setup 
> poses a challenge to do that in a way that does not lead to much code 
> duplication.





[jira] [Work logged] (BEAM-8280) re-enable IOTypeHints.from_callable

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8280?focusedWorklogId=433752&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433752
 ]

ASF GitHub Bot logged work on BEAM-8280:


Author: ASF GitHub Bot
Created on: 15/May/20 16:15
Start Date: 15/May/20 16:15
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #11070:
URL: https://github.com/apache/beam/pull/11070#issuecomment-629349337


   > @chadrik, do you have the "edit file" option available?
   
   Nope :(





Issue Time Tracking
---

Worklog Id: (was: 433752)
Time Spent: 13h 20m  (was: 13h 10m)

> re-enable IOTypeHints.from_callable
> ---
>
> Key: BEAM-8280
> URL: https://issues.apache.org/jira/browse/BEAM-8280
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 13h 20m
>  Remaining Estimate: 0h
>
> See https://issues.apache.org/jira/browse/BEAM-8279





[jira] [Work logged] (BEAM-6733) Use Flink 1.6/1.7 prepareSnapshotPreBarrier to replace BufferedOutputManager

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6733?focusedWorklogId=433750&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433750
 ]

ASF GitHub Bot logged work on BEAM-6733:


Author: ASF GitHub Bot
Created on: 15/May/20 16:15
Start Date: 15/May/20 16:15
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #11678:
URL: https://github.com/apache/beam/pull/11678#issuecomment-629348851


   Load Tests Python ParDo Flink Streaming suite





Issue Time Tracking
---

Worklog Id: (was: 433750)
Time Spent: 3h 20m  (was: 3h 10m)

> Use Flink 1.6/1.7 prepareSnapshotPreBarrier to replace BufferedOutputManager
> 
>
> Key: BEAM-6733
> URL: https://issues.apache.org/jira/browse/BEAM-6733
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Flink 1.6/1.7 provides a hook to execute an action before the snapshot 
> barrier is emitted by the operator. At the moment (<=1.5) the Flink Runner 
> has to buffer any elements which are emitted during a snapshot because the 
> barrier has already been emitted. This leads to a lot of code complexity.
> We can remove the buffering in favor of finishing the current bundle in 
> {{DoFnOperator}}'s {{prepareSnapshotPreBarrier}}. The 1.5/1.6/1.7 build setup 
> poses a challenge to do that in a way that does not lead to much code 
> duplication.
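The idea in the description can be illustrated with a hedged, Flink-free toy model (this is not Beam's {{DoFnOperator}}; {{BundlingOperator}} and its method names are illustrative, and only {{prepareSnapshotPreBarrier(long)}} mirrors the shape of the real Flink hook):

```java
// Toy model: rather than buffering elements emitted while a checkpoint
// barrier is in flight, the operator finishes (flushes) its open bundle in a
// pre-barrier hook, so there is nothing left to buffer when the snapshot runs.
import java.util.ArrayList;
import java.util.List;

public class BundlingOperator {
    private final List<String> openBundle = new ArrayList<>();
    public final List<String> emitted = new ArrayList<>();

    public void processElement(String element) {
        // Elements accumulate in the current bundle until it is finished.
        openBundle.add(element);
    }

    // Mirrors the shape of Flink's prepareSnapshotPreBarrier(long):
    // invoked before the operator forwards the barrier downstream.
    public void prepareSnapshotPreBarrier(long checkpointId) {
        emitted.addAll(openBundle); // finish the bundle: emit everything now
        openBundle.clear();         // the snapshot sees no in-flight elements
    }

    public boolean hasBufferedElements() {
        return !openBundle.isEmpty();
    }

    public static void main(String[] args) {
        BundlingOperator op = new BundlingOperator();
        op.processElement("a");
        op.processElement("b");
        op.prepareSnapshotPreBarrier(1L);
        System.out.println(op.emitted); // bundle flushed before the barrier
    }
}
```

With the pre-barrier flush, the buffering path (and its complexity) disappears, because no element is ever emitted between barrier emission and snapshot completion.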





[jira] [Created] (BEAM-10008) Test TIMESTAMP support in Python SqlTransform

2020-05-15 Thread Brian Hulette (Jira)
Brian Hulette created BEAM-10008:


 Summary: Test TIMESTAMP support in Python SqlTransform
 Key: BEAM-10008
 URL: https://issues.apache.org/jira/browse/BEAM-10008
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-core
Reporter: Brian Hulette








[jira] [Work logged] (BEAM-6733) Use Flink 1.6/1.7 prepareSnapshotPreBarrier to replace BufferedOutputManager

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6733?focusedWorklogId=433753&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433753
 ]

ASF GitHub Bot logged work on BEAM-6733:


Author: ASF GitHub Bot
Created on: 15/May/20 16:18
Start Date: 15/May/20 16:18
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #11678:
URL: https://github.com/apache/beam/pull/11678#issuecomment-629351467


   Load Tests Python ParDo Flink Streaming





Issue Time Tracking
---

Worklog Id: (was: 433753)
Time Spent: 3.5h  (was: 3h 20m)

> Use Flink 1.6/1.7 prepareSnapshotPreBarrier to replace BufferedOutputManager
> 
>
> Key: BEAM-6733
> URL: https://issues.apache.org/jira/browse/BEAM-6733
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Flink 1.6/1.7 provides a hook to execute an action before the snapshot 
> barrier is emitted by the operator. At the moment (<=1.5) the Flink Runner 
> has to buffer any elements which are emitted during a snapshot because the 
> barrier has already been emitted. This leads to a lot of code complexity.
> We can remove the buffering in favor of finishing the current bundle in 
> {{DoFnOperator}}'s {{prepareSnapshotPreBarrier}}. The 1.5/1.6/1.7 build setup 
> poses a challenge to do that in a way that does not lead to much code 
> duplication.





[jira] [Work logged] (BEAM-6733) Use Flink 1.6/1.7 prepareSnapshotPreBarrier to replace BufferedOutputManager

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6733?focusedWorklogId=433755&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433755
 ]

ASF GitHub Bot logged work on BEAM-6733:


Author: ASF GitHub Bot
Created on: 15/May/20 16:19
Start Date: 15/May/20 16:19
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #11678:
URL: https://github.com/apache/beam/pull/11678#issuecomment-629351722


   Run Python Load Tests ParDo Flink Streaming





Issue Time Tracking
---

Worklog Id: (was: 433755)
Time Spent: 3h 50m  (was: 3h 40m)

> Use Flink 1.6/1.7 prepareSnapshotPreBarrier to replace BufferedOutputManager
> 
>
> Key: BEAM-6733
> URL: https://issues.apache.org/jira/browse/BEAM-6733
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Flink 1.6/1.7 provides a hook to execute an action before the snapshot 
> barrier is emitted by the operator. At the moment (<=1.5) the Flink Runner 
> has to buffer any elements which are emitted during a snapshot because the 
> barrier has already been emitted. This leads to a lot of code complexity.
> We can remove the buffering in favor of finishing the current bundle in 
> {{DoFnOperator}}'s {{prepareSnapshotPreBarrier}}. The 1.5/1.6/1.7 build setup 
> poses a challenge to do that in a way that does not lead to much code 
> duplication.





[jira] [Created] (BEAM-10009) Support for date times in Python schemas

2020-05-15 Thread Brian Hulette (Jira)
Brian Hulette created BEAM-10009:


 Summary: Support for date times in Python schemas
 Key: BEAM-10009
 URL: https://issues.apache.org/jira/browse/BEAM-10009
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-core
Reporter: Brian Hulette


When DATETIME is converted to a MillisInstant logical type, add support for it 
to RowCoder and python schemas.

Probably we should map MillisInstant to datetime with 
[datetime.datetime.fromtimestamp|https://docs.python.org/3/library/datetime.html#datetime.datetime.fromtimestamp].
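A minimal sketch of the suggested mapping (not Beam code; the helper name is illustrative), converting a MillisInstant value (epoch milliseconds) to a Python datetime:

```python
from datetime import datetime, timezone

def millis_instant_to_datetime(millis: int) -> datetime:
    # datetime.fromtimestamp expects seconds, so scale down from milliseconds.
    # Passing tz=timezone.utc yields an aware datetime and avoids depending on
    # the local timezone of the worker.
    return datetime.fromtimestamp(millis / 1000.0, tz=timezone.utc)
```

RowCoder would then encode the datetime back to epoch milliseconds on the way out, so the round trip stays lossless at millisecond precision.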





[jira] [Work logged] (BEAM-6733) Use Flink 1.6/1.7 prepareSnapshotPreBarrier to replace BufferedOutputManager

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6733?focusedWorklogId=433756&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433756
 ]

ASF GitHub Bot logged work on BEAM-6733:


Author: ASF GitHub Bot
Created on: 15/May/20 16:19
Start Date: 15/May/20 16:19
Worklog Time Spent: 10m 
  Work Description: mxm removed a comment on pull request #11678:
URL: https://github.com/apache/beam/pull/11678#issuecomment-629351467


   Load Tests Python ParDo Flink Streaming





Issue Time Tracking
---

Worklog Id: (was: 433756)
Time Spent: 4h  (was: 3h 50m)

> Use Flink 1.6/1.7 prepareSnapshotPreBarrier to replace BufferedOutputManager
> 
>
> Key: BEAM-6733
> URL: https://issues.apache.org/jira/browse/BEAM-6733
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Flink 1.6/1.7 provides a hook to execute an action before the snapshot 
> barrier is emitted by the operator. At the moment (<=1.5) the Flink Runner 
> has to buffer any elements which are emitted during a snapshot because the 
> barrier has already been emitted. This leads to a lot of code complexity.
> We can remove the buffering in favor of finishing the current bundle in 
> {{DoFnOperator}}'s {{prepareSnapshotPreBarrier}}. The 1.5/1.6/1.7 build setup 
> poses a challenge to do that in a way that does not lead to much code 
> duplication.





[jira] [Work logged] (BEAM-6733) Use Flink 1.6/1.7 prepareSnapshotPreBarrier to replace BufferedOutputManager

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6733?focusedWorklogId=433754&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433754
 ]

ASF GitHub Bot logged work on BEAM-6733:


Author: ASF GitHub Bot
Created on: 15/May/20 16:19
Start Date: 15/May/20 16:19
Worklog Time Spent: 10m 
  Work Description: mxm removed a comment on pull request #11678:
URL: https://github.com/apache/beam/pull/11678#issuecomment-629348851


   Load Tests Python ParDo Flink Streaming suite





Issue Time Tracking
---

Worklog Id: (was: 433754)
Time Spent: 3h 40m  (was: 3.5h)

> Use Flink 1.6/1.7 prepareSnapshotPreBarrier to replace BufferedOutputManager
> 
>
> Key: BEAM-6733
> URL: https://issues.apache.org/jira/browse/BEAM-6733
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Flink 1.6/1.7 provides a hook to execute an action before the snapshot 
> barrier is emitted by the operator. At the moment (<=1.5) the Flink Runner 
> has to buffer any elements which are emitted during a snapshot because the 
> barrier has already been emitted. This leads to a lot of code complexity.
> We can remove the buffering in favor of finishing the current bundle in 
> {{DoFnOperator}}'s {{prepareSnapshotPreBarrier}}. The 1.5/1.6/1.7 build setup 
> poses a challenge to do that in a way that does not lead to much code 
> duplication.





[jira] [Updated] (BEAM-10009) Support for date times in Python schemas

2020-05-15 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-10009:
-
Status: Open  (was: Triage Needed)

> Support for date times in Python schemas
> 
>
> Key: BEAM-10009
> URL: https://issues.apache.org/jira/browse/BEAM-10009
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Brian Hulette
>Priority: Major
>
> When DATETIME is converted to a MillisInstant logical type, add support for 
> it to RowCoder and python schemas.
> Probably we should map MillisInstant to datetime with 
> [datetime.datetime.fromtimestamp|https://docs.python.org/3/library/datetime.html#datetime.datetime.fromtimestamp].





[jira] [Updated] (BEAM-10008) Test TIMESTAMP support in Python SqlTransform

2020-05-15 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-10008:
-
Status: Open  (was: Triage Needed)

> Test TIMESTAMP support in Python SqlTransform
> -
>
> Key: BEAM-10008
> URL: https://issues.apache.org/jira/browse/BEAM-10008
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Brian Hulette
>Priority: Major
>






[jira] [Created] (BEAM-10010) Test Python SqlTransform on fn_api_runner

2020-05-15 Thread Brian Hulette (Jira)
Brian Hulette created BEAM-10010:


 Summary: Test Python SqlTransform on fn_api_runner
 Key: BEAM-10010
 URL: https://issues.apache.org/jira/browse/BEAM-10010
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-core
Reporter: Brian Hulette


It should be possible to run with the fn_api_runner.





[jira] [Created] (BEAM-10011) Test Python SqlTransform on Dataflow

2020-05-15 Thread Brian Hulette (Jira)
Brian Hulette created BEAM-10011:


 Summary: Test Python SqlTransform on Dataflow
 Key: BEAM-10011
 URL: https://issues.apache.org/jira/browse/BEAM-10011
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-core
Reporter: Brian Hulette
Assignee: Brian Hulette








[jira] [Updated] (BEAM-10011) Test Python SqlTransform on Dataflow

2020-05-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-10011:
---
Status: Open  (was: Triage Needed)

> Test Python SqlTransform on Dataflow
> 
>
> Key: BEAM-10011
> URL: https://issues.apache.org/jira/browse/BEAM-10011
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>






[jira] [Work started] (BEAM-10010) Test Python SqlTransform on fn_api_runner

2020-05-15 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-10010 started by Brian Hulette.

> Test Python SqlTransform on fn_api_runner
> -
>
> Key: BEAM-10010
> URL: https://issues.apache.org/jira/browse/BEAM-10010
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>
> It should be possible to run with the fn_api_runner.





[jira] [Assigned] (BEAM-10010) Test Python SqlTransform on fn_api_runner

2020-05-15 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette reassigned BEAM-10010:


Assignee: Brian Hulette

> Test Python SqlTransform on fn_api_runner
> -
>
> Key: BEAM-10010
> URL: https://issues.apache.org/jira/browse/BEAM-10010
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>
> It should be possible to run with the fn_api_runner.





[jira] [Updated] (BEAM-10010) Test Python SqlTransform on fn_api_runner

2020-05-15 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-10010:
-
Status: Open  (was: Triage Needed)

> Test Python SqlTransform on fn_api_runner
> -
>
> Key: BEAM-10010
> URL: https://issues.apache.org/jira/browse/BEAM-10010
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Brian Hulette
>Priority: Major
>
> It should be possible to run with the fn_api_runner.





[jira] [Work logged] (BEAM-6733) Use Flink 1.6/1.7 prepareSnapshotPreBarrier to replace BufferedOutputManager

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6733?focusedWorklogId=433758&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433758
 ]

ASF GitHub Bot logged work on BEAM-6733:


Author: ASF GitHub Bot
Created on: 15/May/20 16:24
Start Date: 15/May/20 16:24
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #11678:
URL: https://github.com/apache/beam/pull/11678#issuecomment-629353811









Issue Time Tracking
---

Worklog Id: (was: 433758)
Time Spent: 4h 10m  (was: 4h)

> Use Flink 1.6/1.7 prepareSnapshotPreBarrier to replace BufferedOutputManager
> 
>
> Key: BEAM-6733
> URL: https://issues.apache.org/jira/browse/BEAM-6733
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Flink 1.6/1.7 provides a hook to execute an action before the snapshot 
> barrier is emitted by the operator. At the moment (<=1.5) the Flink Runner 
> has to buffer any elements which are emitted during a snapshot because the 
> barrier has already been emitted. This leads to a lot of code complexity.
> We can remove the buffering in favor of finishing the current bundle in 
> {{DoFnOperator}}'s {{prepareSnapshotPreBarrier}}. The 1.5/1.6/1.7 build setup 
> poses a challenge to do that in a way that does not lead to much code 
> duplication.





[jira] [Work logged] (BEAM-6733) Use Flink 1.6/1.7 prepareSnapshotPreBarrier to replace BufferedOutputManager

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6733?focusedWorklogId=433759&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433759
 ]

ASF GitHub Bot logged work on BEAM-6733:


Author: ASF GitHub Bot
Created on: 15/May/20 16:24
Start Date: 15/May/20 16:24
Worklog Time Spent: 10m 
  Work Description: mxm removed a comment on pull request #11678:
URL: https://github.com/apache/beam/pull/11678#issuecomment-629351722









Issue Time Tracking
---

Worklog Id: (was: 433759)
Time Spent: 4h 20m  (was: 4h 10m)

> Use Flink 1.6/1.7 prepareSnapshotPreBarrier to replace BufferedOutputManager
> 
>
> Key: BEAM-6733
> URL: https://issues.apache.org/jira/browse/BEAM-6733
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Minor
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Flink 1.6/1.7 provides a hook to execute an action before the snapshot 
> barrier is emitted by the operator. At the moment (<=1.5) the Flink Runner 
> has to buffer any elements which are emitted during a snapshot because the 
> barrier has already been emitted. This leads to a lot of code complexity.
> We can remove the buffering in favor of finishing the current bundle in 
> {{DoFnOperator}}'s {{prepareSnapshotPreBarrier}}. The 1.5/1.6/1.7 build setup 
> poses a challenge to do that in a way that does not lead to much code 
> duplication.





[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=433760&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433760
 ]

ASF GitHub Bot logged work on BEAM-9468:


Author: ASF GitHub Bot
Created on: 15/May/20 16:29
Start Date: 15/May/20 16:29
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #11339:
URL: https://github.com/apache/beam/pull/11339#issuecomment-629356272


   We have a failure from a test that checks the API surface of the gcp io 
package: 
https://builds.apache.org/job/beam_PreCommit_Java_Commit/11386/testReport/junit/org.apache.beam.sdk.io.gcp/GcpApiSurfaceTest/testGcpApiSurface/





Issue Time Tracking
---

Worklog Id: (was: 433760)
Time Spent: 42h 20m  (was: 42h 10m)

> Add Google Cloud Healthcare API IO Connectors
> -
>
> Key: BEAM-9468
> URL: https://issues.apache.org/jira/browse/BEAM-9468
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Jacob Ferriero
>Assignee: Jacob Ferriero
>Priority: Minor
>  Time Spent: 42h 20m
>  Remaining Estimate: 0h
>
> Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud 
> Healthcare API|https://cloud.google.com/healthcare/docs/]
> HL7v2IO
> FHIRIO
> DICOM 





[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=433763&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433763
 ]

ASF GitHub Bot logged work on BEAM-9468:


Author: ASF GitHub Bot
Created on: 15/May/20 16:32
Start Date: 15/May/20 16:32
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #11339:
URL: https://github.com/apache/beam/pull/11339#issuecomment-629357697


   This test checks that we're not increasing the dependencies that we expose 
in the API. We can add the Healthcare IO classes.
   For the other classes, I think we should try to figure out how to avoid 
exposing them. It seems that they mostly come from 
`HttpHealthcareApiClient$HealthcareHttpException`.





Issue Time Tracking
---

Worklog Id: (was: 433763)
Time Spent: 42.5h  (was: 42h 20m)

> Add Google Cloud Healthcare API IO Connectors
> -
>
> Key: BEAM-9468
> URL: https://issues.apache.org/jira/browse/BEAM-9468
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Jacob Ferriero
>Assignee: Jacob Ferriero
>Priority: Minor
>  Time Spent: 42.5h
>  Remaining Estimate: 0h
>
> Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud 
> Healthcare API|https://cloud.google.com/healthcare/docs/]
> HL7v2IO
> FHIRIO
> DICOM 





[jira] [Work logged] (BEAM-8280) re-enable IOTypeHints.from_callable

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8280?focusedWorklogId=433766&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433766
 ]

ASF GitHub Bot logged work on BEAM-8280:


Author: ASF GitHub Bot
Created on: 15/May/20 16:38
Start Date: 15/May/20 16:38
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #11070:
URL: https://github.com/apache/beam/pull/11070#issuecomment-629360659


   > > @chadrik, do you have the "edit file" option available?
   > 
   > Nope :(
   
   I think that maybe it's because you're not on 
https://github.com/orgs/apache/teams/beam-committers ? (you are on 
https://github.com/orgs/apache/teams/apache-committers though)
   Maybe ask on the mailing list.
   
   Meanwhile, if you make suggestions I think that'd be easiest. (I hope)





Issue Time Tracking
---

Worklog Id: (was: 433766)
Time Spent: 13.5h  (was: 13h 20m)

> re-enable IOTypeHints.from_callable
> ---
>
> Key: BEAM-8280
> URL: https://issues.apache.org/jira/browse/BEAM-8280
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 13.5h
>  Remaining Estimate: 0h
>
> See https://issues.apache.org/jira/browse/BEAM-8279





[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=433770&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433770
 ]

ASF GitHub Bot logged work on BEAM-9468:


Author: ASF GitHub Bot
Created on: 15/May/20 16:46
Start Date: 15/May/20 16:46
Worklog Time Spent: 10m 
  Work Description: jaketf commented on pull request #11339:
URL: https://github.com/apache/beam/pull/11339#issuecomment-629364334


   @mwalenia 
   > What exactly causes this 
error?https://builds.apache.org/job/beam_PostCommit_Java_PR/362/testReport/junit/org.apache.beam.sdk.io.gcp.healthcare/FhirIOWriteIT/testFhirIO_ExecuteBundle_R4_/
   
   Potentially there was an issue staging all the appropriate files:
   ```
   May 14, 2020 2:05:45 AM 
org.apache.beam.runners.dataflow.options.DataflowPipelineOptions$StagingLocationFactory
 create
   INFO: No stagingLocation provided, falling back to gcpTempLocation
   May 14, 2020 2:05:45 AM org.apache.beam.runners.dataflow.DataflowRunner 
fromOptions
   INFO: PipelineOptions.filesToStage was not specified. Defaulting to files 
from the classpath: will stage 183 files. Enable logging at DEBUG level to see 
which files will be staged.
   May 14, 2020 2:05:46 AM 
org.apache.beam.runners.dataflow.options.DataflowPipelineOptions$StagingLocationFactory
 create
   INFO: No stagingLocation provided, falling back to gcpTempLocation
   May 14, 2020 2:05:46 AM org.apache.beam.runners.dataflow.DataflowRunner 
fromOptions
   INFO: PipelineOptions.filesToStage was not specified. Defaulting to files 
from the classpath: will stage 183 files. Enable logging at DEBUG level to see 
which files will be staged.
   May 14, 2020 2:05:46 AM 
org.apache.beam.runners.dataflow.options.DataflowPipelineOptions$StagingLocationFactory
 create
   INFO: No stagingLocation provided, falling back to gcpTempLocation
   May 14, 2020 2:05:46 AM org.apache.beam.runners.dataflow.DataflowRunner 
fromOptions
   INFO: PipelineOptions.filesToStage was not specified. Defaulting to files 
from the classpath: will stage 183 files. Enable logging at DEBUG level to see 
which files will be staged.
   ```





Issue Time Tracking
---

Worklog Id: (was: 433770)
Time Spent: 42h 40m  (was: 42.5h)

> Add Google Cloud Healthcare API IO Connectors
> -
>
> Key: BEAM-9468
> URL: https://issues.apache.org/jira/browse/BEAM-9468
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Jacob Ferriero
>Assignee: Jacob Ferriero
>Priority: Minor
>  Time Spent: 42h 40m
>  Remaining Estimate: 0h
>
> Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud 
> Healthcare API|https://cloud.google.com/healthcare/docs/]
> HL7v2IO
> FHIRIO
> DICOM 





[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=433775&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433775
 ]

ASF GitHub Bot logged work on BEAM-9468:


Author: ASF GitHub Bot
Created on: 15/May/20 16:51
Start Date: 15/May/20 16:51
Worklog Time Spent: 10m 
  Work Description: jaketf commented on pull request #11339:
URL: https://github.com/apache/beam/pull/11339#issuecomment-629367380


   @pabloem re: api surface.
   So the history here is that the client library's executeBundle method is 
broken, so we were forced to use our own http client (and our own exceptions).
   
   I think I can make the constructors / factories that include the apache http 
deps private, as they wouldn't be constructed by users. I'll take a look.





Issue Time Tracking
---

Worklog Id: (was: 433775)
Time Spent: 42h 50m  (was: 42h 40m)

> Add Google Cloud Healthcare API IO Connectors
> -
>
> Key: BEAM-9468
> URL: https://issues.apache.org/jira/browse/BEAM-9468
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Jacob Ferriero
>Assignee: Jacob Ferriero
>Priority: Minor
>  Time Spent: 42h 50m
>  Remaining Estimate: 0h
>
> Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud 
> Healthcare API|https://cloud.google.com/healthcare/docs/]
> HL7v2IO
> FHIRIO
> DICOM 





[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=433777&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433777
 ]

ASF GitHub Bot logged work on BEAM-9136:


Author: ASF GitHub Bot
Created on: 15/May/20 16:55
Start Date: 15/May/20 16:55
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #11584:
URL: https://github.com/apache/beam/pull/11584#issuecomment-629369467


   Run Python PreCommit





Issue Time Tracking
---

Worklog Id: (was: 433777)
Time Spent: 30h 50m  (was: 30h 40m)

> Add LICENSES and NOTICES to docker images
> -
>
> Key: BEAM-9136
> URL: https://issues.apache.org/jira/browse/BEAM-9136
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 30h 50m
>  Remaining Estimate: 0h
>
> Scan dependencies and add licenses and notices of the dependencies to SDK 
> docker images.





[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=433776&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433776
 ]

ASF GitHub Bot logged work on BEAM-9136:


Author: ASF GitHub Bot
Created on: 15/May/20 16:55
Start Date: 15/May/20 16:55
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #11717:
URL: https://github.com/apache/beam/pull/11717#issuecomment-629369332


   R: @ibzib 





Issue Time Tracking
---

Worklog Id: (was: 433776)
Time Spent: 30h 40m  (was: 30.5h)

> Add LICENSES and NOTICES to docker images
> -
>
> Key: BEAM-9136
> URL: https://issues.apache.org/jira/browse/BEAM-9136
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 30h 40m
>  Remaining Estimate: 0h
>
> Scan dependencies and add licenses and notices of the dependencies to SDK 
> docker images.





[jira] [Assigned] (BEAM-10003) Need two PR to submit snippets to website

2020-05-15 Thread Ahmet Altay (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay reassigned BEAM-10003:
--

Assignee: Aizhamal Nurmamat kyzy

> Need two PR to submit snippets to website
> -
>
> Key: BEAM-10003
> URL: https://issues.apache.org/jira/browse/BEAM-10003
> Project: Beam
>  Issue Type: New Feature
>  Components: website
>Reporter: Reza ardeshir rokni
>Assignee: Aizhamal Nurmamat kyzy
>Priority: Minor
>
> Looks like build_github_samples.sh uses code already on the repo to build 
> local serving:
> do
>   fileName=$(echo "$url" | sed -e 's/\//_/g')
>   curl -o "$DIST_DIR"/"$fileName" "https://raw.githubusercontent.com/$url"
> done
> So when trying to test locally, the code needs to already be in Beam. 
> Ideally the script should make use of local code when building, so that:
> 1- It is easier to build & test changes.
> 2- There is no need to raise two PRs for what is a single change.
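The loop quoted above could prefer a local checkout before falling back to GitHub. A minimal Python sketch of that idea — the function name, `local_root` parameter, and path layout are hypothetical, not the actual script:

```python
import os
import shutil
import urllib.request

def fetch_snippet(url_path, dist_dir, local_root=None):
    """Copy a snippet from a local checkout if present, else fetch from GitHub."""
    # Same name-mangling as the shell loop: replace '/' with '_'.
    file_name = url_path.replace("/", "_")
    dest = os.path.join(dist_dir, file_name)
    if local_root is not None:
        local_path = os.path.join(local_root, url_path)
        if os.path.exists(local_path):
            # Local working copy found: no need for the code to be merged first.
            shutil.copy(local_path, dest)
            return dest
    # Fallback: behave like the existing curl-based script.
    urllib.request.urlretrieve(
        "https://raw.githubusercontent.com/" + url_path, dest)
    return dest
```

With `local_root` pointing at the PR branch's checkout, snippets can be built and previewed before anything is merged, addressing both points above.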





[jira] [Commented] (BEAM-10003) Need two PR to submit snippets to website

2020-05-15 Thread Ahmet Altay (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108469#comment-17108469
 ] 

Ahmet Altay commented on BEAM-10003:


Nam, could you change the script to generate snippets from the branch (PR 
branch or current branch) instead of what is in master?

> Need two PR to submit snippets to website
> -
>
> Key: BEAM-10003
> URL: https://issues.apache.org/jira/browse/BEAM-10003
> Project: Beam
>  Issue Type: New Feature
>  Components: website
>Reporter: Reza ardeshir rokni
>Assignee: Aizhamal Nurmamat kyzy
>Priority: Minor
>
> Looks like build_github_samples.sh uses code already on the repo to build 
> local serving:
> do
>   fileName=$(echo "$url" | sed -e 's/\//_/g')
>   curl -o "$DIST_DIR"/"$fileName" "https://raw.githubusercontent.com/$url"
> done
> So when trying to test locally, the code needs to already be in Beam. 
> Ideally the script should make use of local code when building, so that:
> 1- It is easier to build & test changes.
> 2- There is no need to raise two PRs for what is a single change.





[jira] [Work logged] (BEAM-9729) Cleanup bundle registration now that SDKs can pull.

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9729?focusedWorklogId=433783&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433783
 ]

ASF GitHub Bot logged work on BEAM-9729:


Author: ASF GitHub Bot
Created on: 15/May/20 17:09
Start Date: 15/May/20 17:09
Worklog Time Spent: 10m 
  Work Description: robertwb opened a new pull request #11720:
URL: https://github.com/apache/beam/pull/11720


   Instead always pull bundle descriptors from the runner.
   
   This avoids having potentially duplicated caches of all bundle
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java

[jira] [Work logged] (BEAM-9729) Cleanup bundle registration now that SDKs can pull.

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9729?focusedWorklogId=433785&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433785
 ]

ASF GitHub Bot logged work on BEAM-9729:


Author: ASF GitHub Bot
Created on: 15/May/20 17:10
Start Date: 15/May/20 17:10
Worklog Time Spent: 10m 
  Work Description: robertwb edited a comment on pull request #11720:
URL: https://github.com/apache/beam/pull/11720#issuecomment-629376271


   R: @ananvay





Issue Time Tracking
---

Worklog Id: (was: 433785)
Time Spent: 2h 40m  (was: 2.5h)

> Cleanup bundle registration now that SDKs can pull.
> ---
>
> Key: BEAM-9729
> URL: https://issues.apache.org/jira/browse/BEAM-9729
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Once all runners (in particular dataflow) support pull descriptors, we can 
> clean things up by removing the push registration code.





[jira] [Work logged] (BEAM-9729) Cleanup bundle registration now that SDKs can pull.

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9729?focusedWorklogId=433784&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433784
 ]

ASF GitHub Bot logged work on BEAM-9729:


Author: ASF GitHub Bot
Created on: 15/May/20 17:10
Start Date: 15/May/20 17:10
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11720:
URL: https://github.com/apache/beam/pull/11720#issuecomment-629376271


   R: @ ananvay





Issue Time Tracking
---

Worklog Id: (was: 433784)
Time Spent: 2.5h  (was: 2h 20m)

> Cleanup bundle registration now that SDKs can pull.
> ---
>
> Key: BEAM-9729
> URL: https://issues.apache.org/jira/browse/BEAM-9729
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Once all runners (in particular dataflow) support pull descriptors, we can 
> clean things up by removing the push registration code.





[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=433786&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433786
 ]

ASF GitHub Bot logged work on BEAM-9468:


Author: ASF GitHub Bot
Created on: 15/May/20 17:11
Start Date: 15/May/20 17:11
Worklog Time Spent: 10m 
  Work Description: jaketf commented on pull request #11339:
URL: https://github.com/apache/beam/pull/11339#issuecomment-629376974


   @pabloem 843104d was a pretty simple fix for api surface test
   ```
   git rev-parse HEAD && ./gradlew 
:sdks:java:io:google-cloud-platform:cleanTest 
:sdks:java:io:google-cloud-platform:test --tests 
"org.apache.beam.sdk.io.gcp.GcpApiSurfaceTest"
   843104dacaa9a7064247d6870032c9d69d51afe8
   Configuration on demand is an incubating feature.
   
   > Task :sdks:java:io:google-cloud-platform:compileTestJava
   Note: Some input files use or override a deprecated API.
   Note: Recompile with -Xlint:deprecation for details.
   Note: Some input files use unchecked or unsafe operations.
   Note: Recompile with -Xlint:unchecked for details.
   
   Deprecated Gradle features were used in this build, making it incompatible 
with Gradle 6.0.
   Use '--warning-mode all' to show the individual deprecation warnings.
   See 
https://docs.gradle.org/5.2.1/userguide/command_line_interface.html#sec:command_line_warnings
   
   BUILD SUCCESSFUL in 13s
   68 actionable tasks: 2 executed, 66 up-to-date
   ```





Issue Time Tracking
---

Worklog Id: (was: 433786)
Time Spent: 43h  (was: 42h 50m)

> Add Google Cloud Healthcare API IO Connectors
> -
>
> Key: BEAM-9468
> URL: https://issues.apache.org/jira/browse/BEAM-9468
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Jacob Ferriero
>Assignee: Jacob Ferriero
>Priority: Minor
>  Time Spent: 43h
>  Remaining Estimate: 0h
>
> Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud 
> Healthcare API|https://cloud.google.com/healthcare/docs/]
> HL7v2IO
> FHIRIO
> DICOM 





[jira] [Resolved] (BEAM-3544) Fn API metrics in Python SDK Harness

2020-05-15 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-3544.
-
Fix Version/s: 2.21.0
   Resolution: Fixed

> Fn API metrics in Python SDK Harness
> 
>
> Key: BEAM-3544
> URL: https://issues.apache.org/jira/browse/BEAM-3544
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-harness
>Reporter: Kenneth Knowles
>Assignee: Robert Bradshaw
>Priority: Major
> Fix For: 2.21.0
>
>






[jira] [Assigned] (BEAM-3544) Fn API metrics in Python SDK Harness

2020-05-15 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik reassigned BEAM-3544:
---

Assignee: Robert Bradshaw

> Fn API metrics in Python SDK Harness
> 
>
> Key: BEAM-3544
> URL: https://issues.apache.org/jira/browse/BEAM-3544
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-harness
>Reporter: Kenneth Knowles
>Assignee: Robert Bradshaw
>Priority: Major
>






[jira] [Assigned] (BEAM-3543) Fn API metrics in Java SDK Harness

2020-05-15 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik reassigned BEAM-3543:
---

Assignee: Luke Cwik

> Fn API metrics in Java SDK Harness
> --
>
> Key: BEAM-3543
> URL: https://issues.apache.org/jira/browse/BEAM-3543
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-harness
>Reporter: Kenneth Knowles
>Assignee: Luke Cwik
>Priority: Major
>  Labels: portability
>






[jira] [Resolved] (BEAM-3543) Fn API metrics in Java SDK Harness

2020-05-15 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-3543.
-
Fix Version/s: 2.21.0
   Resolution: Fixed

> Fn API metrics in Java SDK Harness
> --
>
> Key: BEAM-3543
> URL: https://issues.apache.org/jira/browse/BEAM-3543
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-harness
>Reporter: Kenneth Knowles
>Assignee: Luke Cwik
>Priority: Major
>  Labels: portability
> Fix For: 2.21.0
>
>






[jira] [Resolved] (BEAM-3545) Fn API metrics in Go SDK harness

2020-05-15 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-3545.
-
Fix Version/s: 2.21.0
   Resolution: Fixed

Any additional styles of metrics/capabilities have been tracked under other 
bugs.

> Fn API metrics in Go SDK harness
> 
>
> Key: BEAM-3545
> URL: https://issues.apache.org/jira/browse/BEAM-3545
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-go
>Reporter: Kenneth Knowles
>Assignee: Robert Burke
>Priority: Major
>  Labels: portability
> Fix For: 2.21.0
>
>  Time Spent: 17h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-9064) Add pytype to lint checks

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9064?focusedWorklogId=433789&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433789
 ]

ASF GitHub Bot logged work on BEAM-9064:


Author: ASF GitHub Bot
Created on: 15/May/20 17:19
Start Date: 15/May/20 17:19
Worklog Time Spent: 10m 
  Work Description: stale[bot] commented on pull request #10528:
URL: https://github.com/apache/beam/pull/10528#issuecomment-629380085


   This pull request has been marked as stale due to 60 days of inactivity. It 
will be closed in 1 week if no further activity occurs. If you think that’s 
incorrect or this pull request requires a review, please simply write any 
comment. If closed, you can revive the PR at any time and @mention a reviewer 
or discuss it on the d...@beam.apache.org list. Thank you for your 
contributions.
   





Issue Time Tracking
---

Worklog Id: (was: 433789)
Time Spent: 40m  (was: 0.5h)

> Add pytype to lint checks
> -
>
> Key: BEAM-9064
> URL: https://issues.apache.org/jira/browse/BEAM-9064
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, testing
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> [~chadrik]





[jira] [Work logged] (BEAM-1866) Fn API support for Metrics

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1866?focusedWorklogId=433791&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433791
 ]

ASF GitHub Bot logged work on BEAM-1866:


Author: ASF GitHub Bot
Created on: 15/May/20 17:23
Start Date: 15/May/20 17:23
Worklog Time Spent: 10m 
  Work Description: lukecwik opened a new pull request #11721:
URL: https://github.com/apache/beam/pull/11721


   
   

[jira] [Work logged] (BEAM-9770) Add BigQuery DeadLetter pattern to Patterns Page

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9770?focusedWorklogId=433792&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433792
 ]

ASF GitHub Bot logged work on BEAM-9770:


Author: ASF GitHub Bot
Created on: 15/May/20 17:31
Start Date: 15/May/20 17:31
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #11437:
URL: https://github.com/apache/beam/pull/11437#issuecomment-629386383


   retest this please





Issue Time Tracking
---

Worklog Id: (was: 433792)
Time Spent: 3h  (was: 2h 50m)

> Add BigQuery DeadLetter pattern to Patterns Page
> 
>
> Key: BEAM-9770
> URL: https://issues.apache.org/jira/browse/BEAM-9770
> Project: Beam
>  Issue Type: New Feature
>  Components: website
>Reporter: Reza ardeshir rokni
>Assignee: Reza ardeshir rokni
>Priority: Trivial
>  Time Spent: 3h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-1866) Fn API support for Metrics

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1866?focusedWorklogId=433804&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433804
 ]

ASF GitHub Bot logged work on BEAM-1866:


Author: ASF GitHub Bot
Created on: 15/May/20 17:47
Start Date: 15/May/20 17:47
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11721:
URL: https://github.com/apache/beam/pull/11721#issuecomment-629393804


   R: @ananvay 





Issue Time Tracking
---

Worklog Id: (was: 433804)
Time Spent: 3h 40m  (was: 3.5h)

> Fn API support for Metrics
> --
>
> Key: BEAM-1866
> URL: https://issues.apache.org/jira/browse/BEAM-1866
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Dan Halperin
>Priority: Major
>  Labels: portability
> Fix For: Not applicable
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> As part of the Fn API work, we need to define a Metrics interface between the 
> Runner and the SDK. Right now, Metrics are simply lost.





[jira] [Commented] (BEAM-10006) PipelineOptions can pick up definitions from unrelated tests

2020-05-15 Thread Ahmet Altay (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108499#comment-17108499
 ] 

Ahmet Altay commented on BEAM-10006:


Flakiness probably depends on the test order. This is a known implementation 
limitation. Adding namespaces might improve things a bit 
(https://issues.apache.org/jira/browse/BEAM-6531). Alternatively, a new way to 
discover options needs to be implemented (e.g. based on registering options), 
but that would be a bigger change.

> PipelineOptions can pick up definitions from unrelated tests
> 
>
> Key: BEAM-10006
> URL: https://issues.apache.org/jira/browse/BEAM-10006
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Brian Hulette
>Priority: Major
>
> Since PipelineOptions uses {{\_\_subclasses\_\_}} to look for all 
> definitions, when used in tests it can sometimes pick up sub-classes that 
> were created in previously executed tests.
> See BEAM-9975 for more details.
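The `__subclasses__` behaviour described in this issue is easy to reproduce in plain Python. A minimal sketch — the class names are made up, this is not Beam's actual PipelineOptions code:

```python
class Options:
    """Stand-in for a base class that discovers options via __subclasses__."""
    @classmethod
    def discovered(cls):
        return [sub.__name__ for sub in cls.__subclasses__()]

# Imagine this subclass is defined at module scope of one test module...
class TestAOptions(Options):
    pass

# ...then an unrelated test that enumerates options sees it too:
assert "TestAOptions" in Options.discovered()
```

Because `__subclasses__` is global to the interpreter, which subclasses are visible depends on which test modules have been imported so far — hence the order-dependent flakes.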





[jira] [Created] (BEAM-10012) Update Python SDK to construct Dataflow job requests from Beam runner API protos

2020-05-15 Thread Chamikara Madhusanka Jayalath (Jira)
Chamikara Madhusanka Jayalath created BEAM-10012:


 Summary: Update Python SDK to construct Dataflow job requests from 
Beam runner API protos
 Key: BEAM-10012
 URL: https://issues.apache.org/jira/browse/BEAM-10012
 Project: Beam
  Issue Type: New Feature
  Components: sdk-py-core
Reporter: Chamikara Madhusanka Jayalath


Currently, portable runners are expected to do the following when constructing 
a runner-specific job:

SDK-specific job graph -> Beam runner API proto -> Runner-specific job request

Portable Spark and Flink follow this model.

Dataflow does the following:

SDK-specific job graph -> Runner-specific job request

Beam runner API proto -> Upload to GCS -> Download at workers

We should update Dataflow to follow the former path, which all portable 
runners are expected to follow.

This will simplify the cross-language transform job construction logic for 
Dataflow.

We can probably start by implementing this in the Python SDK for the portions 
of the pipeline received by expanding external transforms.

cc: [~lcwik] [~robertwb]
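The two construction paths above can be sketched as a pair of translation steps. These are purely illustrative stand-ins (dicts and lists in place of protos and job requests), not Dataflow's actual job-submission code:

```python
def sdk_graph_to_runner_api_proto(job_graph):
    """SDK-specific job graph -> Beam runner API proto (dict as stand-in)."""
    return {"components": job_graph}

def runner_api_proto_to_dataflow_request(proto):
    """Beam runner API proto -> runner-specific job request (portable path)."""
    return {"steps": proto["components"]}

def sdk_graph_to_dataflow_request(job_graph):
    """Current Dataflow path: SDK graph converted directly into a request."""
    return {"steps": job_graph}

graph = ["Read", "ParDo", "Write"]
# The proposal routes Dataflow through the proto, as Spark/Flink already do,
# while producing an equivalent request:
portable = runner_api_proto_to_dataflow_request(
    sdk_graph_to_runner_api_proto(graph))
assert portable == sdk_graph_to_dataflow_request(graph)
```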

 





[jira] [Work logged] (BEAM-9383) Staging Dataflow artifacts from environment

2020-05-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9383?focusedWorklogId=433806&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-433806
 ]

ASF GitHub Bot logged work on BEAM-9383:


Author: ASF GitHub Bot
Created on: 15/May/20 17:49
Start Date: 15/May/20 17:49
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11039:
URL: https://github.com/apache/beam/pull/11039#issuecomment-629394803


   @ihji Just thinking about timing, were we hoping to get this into 2.22? 





Issue Time Tracking
---

Worklog Id: (was: 433806)
Time Spent: 8h 40m  (was: 8.5h)

> Staging Dataflow artifacts from environment
> ---
>
> Key: BEAM-9383
> URL: https://issues.apache.org/jira/browse/BEAM-9383
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>
> Staging Dataflow artifacts from environment





[jira] [Commented] (BEAM-10007) PortableRunner doesn't handle ValueProvider instances when converting pipeline options

2020-05-15 Thread Ahmet Altay (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108503#comment-17108503
 ] 

Ahmet Altay commented on BEAM-10007:


Converting valueproviders to actual values should always be fine. Consumers 
usually expect either a value or a valueprovider. If a valueprovider already 
has a ready value, we can convert it.

> PortableRunner doesn't handle ValueProvider instances when converting 
> pipeline options
> --
>
> Key: BEAM-10007
> URL: https://issues.apache.org/jira/browse/BEAM-10007
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Brian Hulette
>Priority: Major
>
> We attempt to convert ValueProvider instances directly to JSON with 
> json_format, leading to errors like the one described in BEAM-9975.
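A sketch of the conversion suggested in the comment above: resolve any ready ValueProvider to its plain value before the options dict reaches `json_format`. The `StaticValueProvider` stand-in and the helper name here are illustrative, not Beam's actual classes:

```python
class StaticValueProvider:
    """Toy stand-in for a ValueProvider whose value is already available."""
    def __init__(self, value):
        self._value = value
    def is_accessible(self):
        return True
    def get(self):
        return self._value

def resolve_value_providers(options_dict):
    """Replace ready ValueProviders with their concrete values."""
    resolved = {}
    for key, value in options_dict.items():
        if hasattr(value, "get") and hasattr(value, "is_accessible"):
            if not value.is_accessible():
                continue  # unready provider: skip rather than crash json_format
            value = value.get()
        resolved[key] = value
    return resolved

opts = {"runner": "PortableRunner", "num_shards": StaticValueProvider(5)}
assert resolve_value_providers(opts) == {"runner": "PortableRunner",
                                         "num_shards": 5}
```

The resulting dict contains only JSON-compatible values, so `json_format.ParseDict` no longer sees an unexpected type.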





[jira] [Commented] (BEAM-9975) PortableRunnerTest flake "ParseError: Unexpected type for Value message."

2020-05-15 Thread Ahmet Altay (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108506#comment-17108506
 ] 

Ahmet Altay commented on BEAM-9975:
---

Thank you Brian. 
- Solving the subclass problem is probably a larger question. If we drop the 
current mechanism of options discovery, we will break users and will need to 
find a less fragile way of registering known pipeline options. (/cc 
[~tvalentyn] and [~robertwb] on this one.)
- Encoding valueproviders by calling get() and using their value instead should 
be safe. As long as value providers have a value, it is fine to use the value 
in place of the valueprovider itself.

> PortableRunnerTest flake "ParseError: Unexpected type for Value message."
> -
>
> Key: BEAM-9975
> URL: https://issues.apache.org/jira/browse/BEAM-9975
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Error looks similar to the one in BEAM-9907. Example from 
> https://builds.apache.org/job/beam_PreCommit_Python_Cron/2732
> {code}
> apache_beam/runners/portability/fn_api_runner/fn_runner_test.py:569: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> apache_beam/pipeline.py:550: in __exit__
> self.run().wait_until_finish()
> apache_beam/pipeline.py:529: in run
> return self.runner.run_pipeline(self, self._options)
> apache_beam/runners/portability/portable_runner.py:426: in run_pipeline
> job_service_handle.submit(proto_pipeline)
> apache_beam/runners/portability/portable_runner.py:107: in submit
> prepare_response = self.prepare(proto_pipeline)
> apache_beam/runners/portability/portable_runner.py:184: in prepare
> pipeline_options=self.get_pipeline_options()),
> apache_beam/runners/portability/portable_runner.py:174: in 
> get_pipeline_options
> return job_utils.dict_to_struct(p_options)
> apache_beam/runners/job/utils.py:33: in dict_to_struct
> return json_format.ParseDict(dict_obj, struct_pb2.Struct())
> target/.tox-py36-cython/py36-cython/lib/python3.6/site-packages/google/protobuf/json_format.py:450:
>  in ParseDict
> parser.ConvertMessage(js_dict, message)
> target/.tox-py36-cython/py36-cython/lib/python3.6/site-packages/google/protobuf/json_format.py:479:
>  in ConvertMessage
> methodcaller(_WKTJSONMETHODS[full_name][1], value, message)(self)
> target/.tox-py36-cython/py36-cython/lib/python3.6/site-packages/google/protobuf/json_format.py:667:
>  in _ConvertStructMessage
> self._ConvertValueMessage(value[key], message.fields[key])
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> self = 
> value =  0x7f69eb7b3ac8>
> message = 
> def _ConvertValueMessage(self, value, message):
>   """Convert a JSON representation into Value message."""
>   if isinstance(value, dict):
> self._ConvertStructMessage(value, message.struct_value)
>   elif isinstance(value, list):
> self._ConvertListValueMessage(value, message.list_value)
>   elif value is None:
> message.null_value = 0
>   elif isinstance(value, bool):
> message.bool_value = value
>   elif isinstance(value, six.string_types):
> message.string_value = value
>   elif isinstance(value, _INT_OR_FLOAT):
> message.number_value = value
>   else:
> >   raise ParseError('Unexpected type for Value message.')
> E   google.protobuf.json_format.ParseError: Unexpected type for Value 
> message.
> {code}
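The failing branch is the final `else` in the traceback above: `json_format` only converts plain JSON-compatible types, so any stray object (such as an unresolved ValueProvider) raises. A stdlib-only re-implementation of that type check, for illustration — the names are ours, not protobuf's:

```python
class ParseError(Exception):
    pass

def convert_value(value):
    """Mirror of _ConvertValueMessage's type dispatch, minus protobuf."""
    if value is None or isinstance(value, (dict, list, bool, str, int, float)):
        return value
    raise ParseError("Unexpected type for Value message.")

class ValueProviderStandIn:
    """Stand-in for the non-JSON object that slipped into pipeline options."""

assert convert_value(5) == 5  # plain JSON types pass through unchanged
try:
    convert_value(ValueProviderStandIn())
except ParseError as e:
    assert str(e) == "Unexpected type for Value message."
```

This is why resolving or dropping such objects before calling `json_format.ParseDict` (as discussed in BEAM-10007) makes the error go away.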




