[jira] [Work logged] (BEAM-9331) The Row object needs better builders

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9331?focusedWorklogId=389356&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389356
 ]

ASF GitHub Bot logged work on BEAM-9331:


Author: ASF GitHub Bot
Created on: 19/Feb/20 09:07
Start Date: 19/Feb/20 09:07
Worklog Time Spent: 10m 
  Work Description: alexvanboxel commented on pull request #10883: 
[BEAM-9331] Add better Row builders
URL: https://github.com/apache/beam/pull/10883#discussion_r381147867
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/values/Row.java
 ##
 @@ -599,272 +709,555 @@ public Builder addArray(Object... values) {
   return this;
 }
 
-// Values are attached. No verification is done, and no conversions are 
done. LogicalType
-// values must be specified as the base type.
+// Values are attached. No verification is done, and no conversions are 
done. LogicalType values
+// must be specified
+// as the base type. This method should be used with great care, as no 
validation is done. If
+// incorrect values are
+// passed in, it could result in strange errors later in the pipeline. 
This method is largely
+// used internal
+// to Beam.
+@Internal
 public Builder attachValues(List values) {
 
 Review comment:
   Maybe this is an opportunity to change the attachValues. No values should be 
set before or after the attach. I see 2 options to improve this:
   - In the attach first see if values are already set. Let the attachValues 
return the new Row directly. This is maybe a bit strange as it violates a 
builder pattern.
   - Have 4 build in builders. The starting one (that includes an` 
attachValues`, `add` and `withFieldValue`), all of them return a specific 
builder: the new `ModifyingBuilder` and a new `AddValuesBuilder` that only has 
the add methods and an `AttachBuilder` that only has build. This also 
eliminates some elaborate if/then/else's in the builder().
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389356)
Time Spent: 1h  (was: 50m)

> The Row object needs better builders
> 
>
> Key: BEAM-9331
> URL: https://issues.apache.org/jira/browse/BEAM-9331
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Users should be able to build a Row object by specifying field names. Desired 
> syntax:
>  
> Row.withSchema(schema)
>    .withFieldName("field1", "value)
>   .withFieldName("field2.field3", value)
>   .build()
>  
> Users should also have a builder that allows taking an existing row and 
> changing specific fields.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9331) The Row object needs better builders

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9331?focusedWorklogId=389358&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389358
 ]

ASF GitHub Bot logged work on BEAM-9331:


Author: ASF GitHub Bot
Created on: 19/Feb/20 09:07
Start Date: 19/Feb/20 09:07
Worklog Time Spent: 10m 
  Work Description: alexvanboxel commented on pull request #10883: 
[BEAM-9331] Add better Row builders
URL: https://github.com/apache/beam/pull/10883#discussion_r381157953
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/values/Row.java
 ##
 @@ -539,37 +568,118 @@ public String toString() {
   }
 
   /**
-   * Creates a record builder with specified {@link #getSchema()}. {@link 
Builder#build()} will
-   * throw an {@link IllegalArgumentException} if number of fields in {@link 
#getSchema()} does not
-   * match the number of fields specified.
+   * Creates a row builder with specified {@link #getSchema()}. {@link 
Builder#build()} will throw
+   * an {@link IllegalArgumentException} if number of fields in {@link 
#getSchema()} does not match
+   * the number of fields specified. If any of the arguments don't match the 
expected types for the
+   * schema fields, {@link Builder#build()} will throw a {@link 
ClassCastException}.
*/
   public static Builder withSchema(Schema schema) {
 return new Builder(schema);
   }
 
+  /**
+   * Creates a row builder based on the specified row. Field values in the new 
row can be explicitly
+   * set using {@link ModifyingBuilder#withFieldValue}. Any values not so 
overridden will be the
+   * same as the values in the original row.
+   */
+  public static ModifyingBuilder fromRow(Row row) {
+return new ModifyingBuilder(row);
+  }
+
+  /** Builder for {@link Row} that bases a row on another row. */
+  public static class ModifyingBuilder {
+private final Row sourceRow;
+private final Map fieldValues = 
Maps.newHashMap();
+
+private ModifyingBuilder(Row sourceRow) {
+  this.sourceRow = sourceRow;
+}
+
+public Schema getSchema() {
+  return sourceRow.getSchema();
+}
+
+public ModifyingBuilder withFieldValue(String fieldName, Object value) {
 
 Review comment:
   Wouldn't it be useful to have a `withFieldValue` with an index?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389358)
Time Spent: 1h 10m  (was: 1h)

> The Row object needs better builders
> 
>
> Key: BEAM-9331
> URL: https://issues.apache.org/jira/browse/BEAM-9331
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Users should be able to build a Row object by specifying field names. Desired 
> syntax:
>  
> Row.withSchema(schema)
>    .withFieldName("field1", "value)
>   .withFieldName("field2.field3", value)
>   .build()
>  
> Users should also have a builder that allows taking an existing row and 
> changing specific fields.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9331) The Row object needs better builders

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9331?focusedWorklogId=389357&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389357
 ]

ASF GitHub Bot logged work on BEAM-9331:


Author: ASF GitHub Bot
Created on: 19/Feb/20 09:07
Start Date: 19/Feb/20 09:07
Worklog Time Spent: 10m 
  Work Description: alexvanboxel commented on pull request #10883: 
[BEAM-9331] Add better Row builders
URL: https://github.com/apache/beam/pull/10883#discussion_r381152948
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/values/Row.java
 ##
 @@ -539,37 +568,118 @@ public String toString() {
   }
 
   /**
-   * Creates a record builder with specified {@link #getSchema()}. {@link 
Builder#build()} will
-   * throw an {@link IllegalArgumentException} if number of fields in {@link 
#getSchema()} does not
-   * match the number of fields specified.
+   * Creates a row builder with specified {@link #getSchema()}. {@link 
Builder#build()} will throw
+   * an {@link IllegalArgumentException} if number of fields in {@link 
#getSchema()} does not match
+   * the number of fields specified. If any of the arguments don't match the 
expected types for the
+   * schema fields, {@link Builder#build()} will throw a {@link 
ClassCastException}.
*/
   public static Builder withSchema(Schema schema) {
 return new Builder(schema);
   }
 
+  /**
+   * Creates a row builder based on the specified row. Field values in the new 
row can be explicitly
+   * set using {@link ModifyingBuilder#withFieldValue}. Any values not so 
overridden will be the
+   * same as the values in the original row.
+   */
+  public static ModifyingBuilder fromRow(Row row) {
+return new ModifyingBuilder(row);
+  }
+
+  /** Builder for {@link Row} that bases a row on another row. */
+  public static class ModifyingBuilder {
+private final Row sourceRow;
+private final Map fieldValues = 
Maps.newHashMap();
+
+private ModifyingBuilder(Row sourceRow) {
+  this.sourceRow = sourceRow;
+}
+
+public Schema getSchema() {
+  return sourceRow.getSchema();
+}
+
+public ModifyingBuilder withFieldValue(String fieldName, Object value) {
+  FieldAccessDescriptor fieldAccessDescriptor =
+  FieldAccessDescriptor.withFieldNames(fieldName).resolve(getSchema());
+  checkArgument(fieldAccessDescriptor.referencesSingleField(), "");
+  fieldValues.put(fieldAccessDescriptor, new FieldOverride(value));
+  return this;
+}
+
+public ModifyingBuilder withFieldValues(Map values) {
+  fieldValues.putAll(
+  values.entrySet().stream()
+  .collect(
+  Collectors.toMap(
+  e -> FieldAccessDescriptor.withFieldNames(e.getKey()),
+  e -> new FieldOverride(e.getValue();
+  return this;
+}
+
+public Row build() {
+  Row row =
+  (Row)
+  new RowFieldMatcher()
+  .match(
+  new CapturingRowCases(getSchema(), this.fieldValues, 
false),
+  FieldType.row(getSchema()),
+  FieldAccessDescriptor.create(),
+  sourceRow);
+  return row;
+}
+  }
+
   /** Builder for {@link Row}. */
   public static class Builder {
+private final Map fieldValues = 
Maps.newHashMap();
 private List values = Lists.newArrayList();
 private boolean attached = false;
 @Nullable private Factory> fieldValueGetterFactory;
 @Nullable private Object getterTarget;
-private Schema schema;
+private final Schema schema;
 
 Builder(Schema schema) {
   this.schema = schema;
 }
 
-public int nextFieldId() {
-  if (fieldValueGetterFactory != null) {
-throw new RuntimeException("Not supported");
-  }
-  return values.size();
-}
-
+/** Return the schema for the row being built. */
 public Schema getSchema() {
   return schema;
 }
 
+/**
+ * Set a field value using the field name. Nested values can be set using 
the field selection
+ * syntax.
+ */
+public Builder withFieldValue(String fieldName, Object value) {
 
 Review comment:
   Maybe this one can return the ModifyingBuilder so that no other methods can 
used (no attachValues, no add's).
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389357)

> The Row object needs better builders
> 
>
> Key: BEAM-9331
> URL: https://issues.apache.org/jira/browse/B

[jira] [Work logged] (BEAM-9331) The Row object needs better builders

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9331?focusedWorklogId=389359&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389359
 ]

ASF GitHub Bot logged work on BEAM-9331:


Author: ASF GitHub Bot
Created on: 19/Feb/20 09:07
Start Date: 19/Feb/20 09:07
Worklog Time Spent: 10m 
  Work Description: alexvanboxel commented on pull request #10883: 
[BEAM-9331] Add better Row builders
URL: https://github.com/apache/beam/pull/10883#discussion_r381151951
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/values/Row.java
 ##
 @@ -539,37 +568,118 @@ public String toString() {
   }
 
   /**
-   * Creates a record builder with specified {@link #getSchema()}. {@link 
Builder#build()} will
-   * throw an {@link IllegalArgumentException} if number of fields in {@link 
#getSchema()} does not
-   * match the number of fields specified.
+   * Creates a row builder with specified {@link #getSchema()}. {@link 
Builder#build()} will throw
+   * an {@link IllegalArgumentException} if number of fields in {@link 
#getSchema()} does not match
+   * the number of fields specified. If any of the arguments don't match the 
expected types for the
+   * schema fields, {@link Builder#build()} will throw a {@link 
ClassCastException}.
*/
   public static Builder withSchema(Schema schema) {
 return new Builder(schema);
   }
 
+  /**
+   * Creates a row builder based on the specified row. Field values in the new 
row can be explicitly
+   * set using {@link ModifyingBuilder#withFieldValue}. Any values not so 
overridden will be the
+   * same as the values in the original row.
+   */
+  public static ModifyingBuilder fromRow(Row row) {
+return new ModifyingBuilder(row);
+  }
+
+  /** Builder for {@link Row} that bases a row on another row. */
+  public static class ModifyingBuilder {
 
 Review comment:
   Can't this be tweaked a bit, that this is the builder specifically for use 
with `withFieldValue`. Meaning that if when nit doesn't have a source row it 
just assumes null values for the fields not set. See remark on `withFieldValue` 
on initial builder as well. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389359)
Time Spent: 1h 20m  (was: 1h 10m)

> The Row object needs better builders
> 
>
> Key: BEAM-9331
> URL: https://issues.apache.org/jira/browse/BEAM-9331
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Users should be able to build a Row object by specifying field names. Desired 
> syntax:
>  
> Row.withSchema(schema)
>    .withFieldName("field1", "value)
>   .withFieldName("field2.field3", value)
>   .build()
>  
> Users should also have a builder that allows taking an existing row and 
> changing specific fields.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9331) The Row object needs better builders

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9331?focusedWorklogId=389360&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389360
 ]

ASF GitHub Bot logged work on BEAM-9331:


Author: ASF GitHub Bot
Created on: 19/Feb/20 09:08
Start Date: 19/Feb/20 09:08
Worklog Time Spent: 10m 
  Work Description: alexvanboxel commented on pull request #10883: 
[BEAM-9331] Add better Row builders
URL: https://github.com/apache/beam/pull/10883#discussion_r381158593
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/values/Row.java
 ##
 @@ -55,8 +59,31 @@
  * {@link Row} is an immutable tuple-like schema to represent one element in a 
{@link PCollection}.
  * The fields are described with a {@link Schema}.
  *
- * {@link Schema} contains the names for each field and the coder for the 
whole record,
- * {see @link Schema#getRowCoder()}.
+ * {@link Schema} contains the names and types for each field.
+ *
+ * There are several ways to build a new Row object. To build a row from 
scratch using a schema
+ * object, {@link Row#withSchema} can be used. Schema fields can be specified 
by name, and nested
+ * fields can be specified using the field selection syntax. For example:
+ *
+ * {@code
+ * Row row = Row.withSchema(schema)
+ *  .withFieldValue("userId", "user1)
+ *  .withFieldValue("location.city", "seattle")
+ *  .withFieldValue("location.state", "wa")
+ *  .build();
+ * }
+ *
+ * The {@link Row#fromRow} builder can be used to base a row off of another 
row. The builder can
+ * be used to specify values for specific fields, and all the remaining values 
will be taken from
+ * the original row. For example, the following produces a row identical to 
the above row except for
+ * the location.city field.
+ *
+ * {@code
+ * Row modifiedRow =
+ * Row.fromRow(row)
+ *.withFieldValue("location.city", "tacoma")
 
 Review comment:
   Nope, I like the proposed `withFieldValue`.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389360)
Time Spent: 1.5h  (was: 1h 20m)

> The Row object needs better builders
> 
>
> Key: BEAM-9331
> URL: https://issues.apache.org/jira/browse/BEAM-9331
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Users should be able to build a Row object by specifying field names. Desired 
> syntax:
>  
> Row.withSchema(schema)
>    .withFieldName("field1", "value)
>   .withFieldName("field2.field3", value)
>   .build()
>  
> Users should also have a builder that allows taking an existing row and 
> changing specific fields.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9085) Performance regression in np.random.RandomState() skews performance test results across Python 2/3 on Dataflow

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9085?focusedWorklogId=389361&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389361
 ]

ASF GitHub Bot logged work on BEAM-9085:


Author: ASF GitHub Bot
Created on: 19/Feb/20 09:11
Start Date: 19/Feb/20 09:11
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on pull request #10885: [BEAM-9085] 
Fix performance regression in SyntheticSource
URL: https://github.com/apache/beam/pull/10885#discussion_r381160085
 
 

 ##
 File path: sdks/python/apache_beam/testing/synthetic_pipeline.py
 ##
 @@ -415,19 +418,24 @@ def get_range_tracker(self, start_position, 
stop_position):
   tracker = range_trackers.UnsplittableRangeTracker(tracker)
 return tracker
 
+  @staticmethod
+  def random_bytes(length):
+"""Return random bytes."""
+return b''.join(
+(struct.pack('B', random.getrandbits(8)) for _ in xrange(length)))
 
 Review comment:
   Alright. I was a bit concerned about performance of `range` on Python 2, but 
the parameter is almost always small so the impact is negligible. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389361)
Time Spent: 40m  (was: 0.5h)

> Performance regression in np.random.RandomState() skews performance test 
> results across Python 2/3 on Dataflow
> --
>
> Key: BEAM-9085
> URL: https://issues.apache.org/jira/browse/BEAM-9085
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Tests show that the performance of core Beam operations in Python 3.x on 
> Dataflow can be a few time slower than in Python 2.7. We should investigate 
> what's the cause of the problem.
> Currently, we have one ParDo test that is run both in Py3 and Py2 [1]. A 
> dashboard with runtime results can be found here [2].
> [1] sdks/python/apache_beam/testing/load_tests/pardo_test.py
> [2] https://apache-beam-testing.appspot.com/explore?dashboard=5678187241537536



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9331) The Row object needs better builders

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9331?focusedWorklogId=389364&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389364
 ]

ASF GitHub Bot logged work on BEAM-9331:


Author: ASF GitHub Bot
Created on: 19/Feb/20 09:13
Start Date: 19/Feb/20 09:13
Worklog Time Spent: 10m 
  Work Description: alexvanboxel commented on pull request #10883: 
[BEAM-9331] Add better Row builders
URL: https://github.com/apache/beam/pull/10883#discussion_r381160859
 
 

 ##
 File path: sdks/java/core/src/test/java/org/apache/beam/sdk/values/RowTest.java
 ##
 @@ -477,6 +477,172 @@ public void testCreateMapWithRowValue() {
 assertEquals(data, row.getMap("map"));
   }
 
+  @Test
 
 Review comment:
   What get's people confused on the current API is that you can't loop over 
the values of a Row with `getValue()` and can't supply them to the Builder as 
the `addValue()` expects the input type, while the getValue returns a base type.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389364)
Time Spent: 1h 40m  (was: 1.5h)

> The Row object needs better builders
> 
>
> Key: BEAM-9331
> URL: https://issues.apache.org/jira/browse/BEAM-9331
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Users should be able to build a Row object by specifying field names. Desired 
> syntax:
>  
> Row.withSchema(schema)
>    .withFieldName("field1", "value)
>   .withFieldName("field2.field3", value)
>   .build()
>  
> Users should also have a builder that allows taking an existing row and 
> changing specific fields.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9276) python: create a class to encapsulate the work required to submit a pipeline to a job service

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9276?focusedWorklogId=389367&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389367
 ]

ASF GitHub Bot logged work on BEAM-9276:


Author: ASF GitHub Bot
Created on: 19/Feb/20 09:29
Start Date: 19/Feb/20 09:29
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #10811: [BEAM-9276] 
Create a class to encapsulate the steps required to submit a pipeline
URL: https://github.com/apache/beam/pull/10811#discussion_r381169290
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/portable_runner.py
 ##
 @@ -74,6 +79,149 @@
 _LOGGER = logging.getLogger(__name__)
 
 
+class PortableJobServicePlan(object):
 
 Review comment:
   Not sure about the name. Perhaps `JobServiceHandle`? 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389367)
Time Spent: 50m  (was: 40m)

> python: create a class to encapsulate the work required to submit a pipeline 
> to a job service
> -
>
> Key: BEAM-9276
> URL: https://issues.apache.org/jira/browse/BEAM-9276
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> {{PortableRunner.run_pipeline}} is somewhat of a monolithic method for 
> submitting a pipeline.  It would be useful to factor out the code responsible 
> for interacting with the job and artifact services (prepare, stage, run) to 
> make this easier to modify this behavior in portable runner subclasses, as 
> well as in tests. 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9240) Check for Nullability in typesEqual() method of FieldType class

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9240?focusedWorklogId=389393&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389393
 ]

ASF GitHub Bot logged work on BEAM-9240:


Author: ASF GitHub Bot
Created on: 19/Feb/20 11:14
Start Date: 19/Feb/20 11:14
Worklog Time Spent: 10m 
  Work Description: rahul8383 commented on issue #10744: [BEAM-9240]: Check 
for Nullability in typesEqual() method of FieldTyp…
URL: https://github.com/apache/beam/pull/10744#issuecomment-588168540
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389393)
Time Spent: 1h 40m  (was: 1.5h)

> Check for Nullability in typesEqual() method of FieldType class
> ---
>
> Key: BEAM-9240
> URL: https://issues.apache.org/jira/browse/BEAM-9240
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.18.0
>Reporter: Rahul Patwari
>Assignee: Rahul Patwari
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> {{If two schemas are created like this:}}
> {{Schema schema1 = Schema.builder().addStringField("col1").build();}}
>  {{Schema schema2 = Schema.builder().addNullableField("col1", 
> FieldType.STRING).build();}}
>  
> {{schema1.typeEquals(schema2) returns "true" even though the schemas differ 
> by Nullability}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9295) Add Flink 1.10 build target and Make FlinkRunner compatible with Flink 1.10

2020-02-19 Thread Jira


[ 
https://issues.apache.org/jira/browse/BEAM-9295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17039923#comment-17039923
 ] 

David Morávek commented on BEAM-9295:
-

Great to see you pushing this forward! This should allow us to make great 
improvements to the batch runner, such as ParDo chaining! ;)

> Add Flink 1.10 build target and Make FlinkRunner compatible with Flink 1.10
> ---
>
> Key: BEAM-9295
> URL: https://issues.apache.org/jira/browse/BEAM-9295
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.20.0
>
>
> Apache Flink 1.10 has completed the final release vote, see [1]. So, I would 
> like to add Flink 1.10 build target and make Flink Runner compatible with 
> Flink 1.10.
> And I appreciate it if you can leave your suggestions or comments!
> [1] 
> https://lists.apache.org/thread.html/r97672d4d1e47372cebf23e6643a6cc30a06bfbdf3f277b0be3695b15%40%3Cdev.flink.apache.org%3E



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=389395&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389395
 ]

ASF GitHub Bot logged work on BEAM-9146:


Author: ASF GitHub Bot
Created on: 19/Feb/20 11:27
Start Date: 19/Feb/20 11:27
Worklog Time Spent: 10m 
  Work Description: EDjur commented on issue #10764: [BEAM-9146] Integrate 
GCP Video Intelligence functionality for Python SDK
URL: https://github.com/apache/beam/pull/10764#issuecomment-588173559
 
 
   Seems yapf doesn't format docstrings 🤷‍♂  Should be fixed now.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389395)
Time Spent: 12.5h  (was: 12h 20m)

> [Python] PTransform that integrates Video Intelligence functionality
> 
>
> Key: BEAM-9146
> URL: https://issues.apache.org/jira/browse/BEAM-9146
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 12.5h
>  Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Video 
> Intelligence functionality [1].
> The transform should be able to take both video GCS location or video data 
> bytes as an input.
> [1] https://cloud.google.com/video-intelligence/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=389410&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389410
 ]

ASF GitHub Bot logged work on BEAM-9146:


Author: ASF GitHub Bot
Created on: 19/Feb/20 12:12
Start Date: 19/Feb/20 12:12
Worklog Time Spent: 10m 
  Work Description: kkucharc commented on issue #10764: [BEAM-9146] 
Integrate GCP Video Intelligence functionality for Python SDK
URL: https://github.com/apache/beam/pull/10764#issuecomment-588193406
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389410)
Time Spent: 12h 40m  (was: 12.5h)

> [Python] PTransform that integrates Video Intelligence functionality
> 
>
> Key: BEAM-9146
> URL: https://issues.apache.org/jira/browse/BEAM-9146
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 12h 40m
>  Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Video 
> Intelligence functionality [1].
> The transform should be able to take both video GCS location or video data 
> bytes as an input.
> [1] https://cloud.google.com/video-intelligence/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=389411&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389411
 ]

ASF GitHub Bot logged work on BEAM-9146:


Author: ASF GitHub Bot
Created on: 19/Feb/20 12:13
Start Date: 19/Feb/20 12:13
Worklog Time Spent: 10m 
  Work Description: kkucharc commented on issue #10764: [BEAM-9146] 
Integrate GCP Video Intelligence functionality for Python SDK
URL: https://github.com/apache/beam/pull/10764#issuecomment-588193611
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389411)
Time Spent: 12h 50m  (was: 12h 40m)

> [Python] PTransform that integrates Video Intelligence functionality
> 
>
> Key: BEAM-9146
> URL: https://issues.apache.org/jira/browse/BEAM-9146
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 12h 50m
>  Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Video 
> Intelligence functionality [1].
> The transform should be able to take both video GCS location or video data 
> bytes as an input.
> [1] https://cloud.google.com/video-intelligence/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=389417&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389417
 ]

ASF GitHub Bot logged work on BEAM-9146:


Author: ASF GitHub Bot
Created on: 19/Feb/20 12:35
Start Date: 19/Feb/20 12:35
Worklog Time Spent: 10m 
  Work Description: kkucharc commented on issue #10764: [BEAM-9146] 
Integrate GCP Video Intelligence functionality for Python SDK
URL: https://github.com/apache/beam/pull/10764#issuecomment-588203696
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389417)
Time Spent: 13h  (was: 12h 50m)

> [Python] PTransform that integrates Video Intelligence functionality
> 
>
> Key: BEAM-9146
> URL: https://issues.apache.org/jira/browse/BEAM-9146
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 13h
>  Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Video 
> Intelligence functionality [1].
> The transform should be able to take both video GCS location or video data 
> bytes as an input.
> [1] https://cloud.google.com/video-intelligence/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=389418&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389418
 ]

ASF GitHub Bot logged work on BEAM-9146:


Author: ASF GitHub Bot
Created on: 19/Feb/20 12:36
Start Date: 19/Feb/20 12:36
Worklog Time Spent: 10m 
  Work Description: kkucharc commented on issue #10764: [BEAM-9146] 
Integrate GCP Video Intelligence functionality for Python SDK
URL: https://github.com/apache/beam/pull/10764#issuecomment-588193406
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389418)
Time Spent: 13h 10m  (was: 13h)

> [Python] PTransform that integrates Video Intelligence functionality
> 
>
> Key: BEAM-9146
> URL: https://issues.apache.org/jira/browse/BEAM-9146
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 13h 10m
>  Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Video 
> Intelligence functionality [1].
> The transform should be able to take both video GCS location or video data 
> bytes as an input.
> [1] https://cloud.google.com/video-intelligence/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=389419&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389419
 ]

ASF GitHub Bot logged work on BEAM-9146:


Author: ASF GitHub Bot
Created on: 19/Feb/20 12:36
Start Date: 19/Feb/20 12:36
Worklog Time Spent: 10m 
  Work Description: kkucharc commented on issue #10764: [BEAM-9146] 
Integrate GCP Video Intelligence functionality for Python SDK
URL: https://github.com/apache/beam/pull/10764#issuecomment-588193611
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389419)
Time Spent: 13h 20m  (was: 13h 10m)

> [Python] PTransform that integrates Video Intelligence functionality
> 
>
> Key: BEAM-9146
> URL: https://issues.apache.org/jira/browse/BEAM-9146
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 13h 20m
>  Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Video 
> Intelligence functionality [1].
> The transform should be able to take both video GCS location or video data 
> bytes as an input.
> [1] https://cloud.google.com/video-intelligence/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-7274) Protobuf Beam Schema support

2020-02-19 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-7274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7274:
---
Fix Version/s: 2.20.0

> Protobuf Beam Schema support
> 
>
> Key: BEAM-7274
> URL: https://issues.apache.org/jira/browse/BEAM-7274
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Alex Van Boxel
>Assignee: Alex Van Boxel
>Priority: Minor
> Fix For: 2.20.0
>
>  Time Spent: 24h 10m
>  Remaining Estimate: 0h
>
> Add support for the new Beam Schema to the Protobuf extension.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9085) Performance regression in np.random.RandomState() skews performance test results across Python 2/3 on Dataflow

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9085?focusedWorklogId=389440&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389440
 ]

ASF GitHub Bot logged work on BEAM-9085:


Author: ASF GitHub Bot
Created on: 19/Feb/20 13:23
Start Date: 19/Feb/20 13:23
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on pull request #10885: [BEAM-9085] 
Fix performance regression in SyntheticSource
URL: https://github.com/apache/beam/pull/10885#discussion_r381285675
 
 

 ##
 File path: sdks/python/apache_beam/testing/synthetic_pipeline.py
 ##
 @@ -415,19 +418,24 @@ def get_range_tracker(self, start_position, 
stop_position):
   tracker = range_trackers.UnsplittableRangeTracker(tracker)
 return tracker
 
+  @staticmethod
+  def random_bytes(length):
+"""Return random bytes."""
+return b''.join(
+(struct.pack('B', random.getrandbits(8)) for _ in xrange(length)))
+
   def _gen_kv_pair(self, index):
-r = np.random.RandomState(index)
-rand = r.random_sample()
+random.seed(index)
 
 Review comment:
   I'm pretty sure it's possible while having multiple range trackers and 
multiple workers.
   I wrote a fix and ran the benchmark. The performance is a bit worse:
   ```
   Python 2: 7.77728295326 (was 6.37386107445)
   Python 3: 6.88878361798 (was 5.698147751005)
   ```
   However, we could create a cache for generators (instances of 
`random.Random`). Then we could create a generator for each range tracker, not 
for each index. This reduces the offset of instantiating multiple generators:
   ```
   Python 2: 6.42752218246
   Python 3: 5.78995331198
   ```
   WDYT?
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389440)
Time Spent: 50m  (was: 40m)

> Performance regression in np.random.RandomState() skews performance test 
> results across Python 2/3 on Dataflow
> --
>
> Key: BEAM-9085
> URL: https://issues.apache.org/jira/browse/BEAM-9085
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Tests show that the performance of core Beam operations in Python 3.x on 
> Dataflow can be a few time slower than in Python 2.7. We should investigate 
> what's the cause of the problem.
> Currently, we have one ParDo test that is run both in Py3 and Py2 [1]. A 
> dashboard with runtime results can be found here [2].
> [1] sdks/python/apache_beam/testing/load_tests/pardo_test.py
> [2] https://apache-beam-testing.appspot.com/explore?dashboard=5678187241537536



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9085) Performance regression in np.random.RandomState() skews performance test results across Python 2/3 on Dataflow

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9085?focusedWorklogId=389464&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389464
 ]

ASF GitHub Bot logged work on BEAM-9085:


Author: ASF GitHub Bot
Created on: 19/Feb/20 15:01
Start Date: 19/Feb/20 15:01
Worklog Time Spent: 10m 
  Work Description: kkucharc commented on issue #10885: [BEAM-9085] Fix 
performance regression in SyntheticSource
URL: https://github.com/apache/beam/pull/10885#issuecomment-588277004
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389464)
Time Spent: 1h  (was: 50m)

> Performance regression in np.random.RandomState() skews performance test 
> results across Python 2/3 on Dataflow
> --
>
> Key: BEAM-9085
> URL: https://issues.apache.org/jira/browse/BEAM-9085
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Tests show that the performance of core Beam operations in Python 3.x on 
> Dataflow can be a few time slower than in Python 2.7. We should investigate 
> what's the cause of the problem.
> Currently, we have one ParDo test that is run both in Py3 and Py2 [1]. A 
> dashboard with runtime results can be found here [2].
> [1] sdks/python/apache_beam/testing/load_tests/pardo_test.py
> [2] https://apache-beam-testing.appspot.com/explore?dashboard=5678187241537536



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9085) Performance regression in np.random.RandomState() skews performance test results across Python 2/3 on Dataflow

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9085?focusedWorklogId=389465&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389465
 ]

ASF GitHub Bot logged work on BEAM-9085:


Author: ASF GitHub Bot
Created on: 19/Feb/20 15:02
Start Date: 19/Feb/20 15:02
Worklog Time Spent: 10m 
  Work Description: kkucharc commented on issue #10885: [BEAM-9085] Fix 
performance regression in SyntheticSource
URL: https://github.com/apache/beam/pull/10885#issuecomment-588277313
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389465)
Time Spent: 1h 10m  (was: 1h)

> Performance regression in np.random.RandomState() skews performance test 
> results across Python 2/3 on Dataflow
> --
>
> Key: BEAM-9085
> URL: https://issues.apache.org/jira/browse/BEAM-9085
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Tests show that the performance of core Beam operations in Python 3.x on 
> Dataflow can be a few time slower than in Python 2.7. We should investigate 
> what's the cause of the problem.
> Currently, we have one ParDo test that is run both in Py3 and Py2 [1]. A 
> dashboard with runtime results can be found here [2].
> [1] sdks/python/apache_beam/testing/load_tests/pardo_test.py
> [2] https://apache-beam-testing.appspot.com/explore?dashboard=5678187241537536



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9316) FileIO.Write.relativeFileNaming should not be public

2020-02-19 Thread Claire McGinty (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17040149#comment-17040149
 ] 

Claire McGinty commented on BEAM-9316:
--

thanks for taking a look [~jkff] and [~iemejia] ! my issue with it is that it 
actually has a bug resolving the relative filename when it's nested like this.

For example, I ran this snippet on DirectRunner and DataflowRunner (sorry, more 
Scala code...):

 
{code:java}
val pipeline: Pipeline = ...

pipeline.apply(
  Create.of(ImmutableList.of("a1", "b1"))
).apply(
  FileIO.writeDynamic()
.by((element: String) => element.charAt(0).toString)
.to("gs://some_bucket/write_dynamic")
.via(TextIO.sink())
.withNaming((dst: String) =>
  FileIO.Write.relativeFileNaming(
StaticValueProvider.of("nested/directory"),
new FileNaming {
  override def getFilename(
window: BoundedWindow,
pane: PaneInfo,
numShards: Int,
shardIndex: Int,
compression: Compression
  ): String = s"file_${shardIndex}_${dst}.txt"
}
  ))
.withDestinationCoder(StringUtf8Coder.of())
)
{code}
 

 

In DataflowRunner, files are written to 
gs://some_bucket/write_dynamic//nested/directory/file_0_b.txt and 
gs://some_bucket/write_dynamic//nested/directory/file_0_a.txt (note the extra 
forward slash–I think the filesystems API is prepending it to nested/directory 
when it matches the resource).

In DirectRunner, files are written to 
gs://some_bucket/write_dynamic//Users/clairemcginty/nested/directory/file_0_a.txt
 and 
gs://some_bucket/write_dynamic//Users/clairemcginty/nested/directory/file_0_b.txt.
 In this case it resolves the absolute path of "nested/directory" on my local 
FS and appends that to the gs://some_bucket/write_dynamic, again with the 
double forward slash.

It seems that the double forward slash is getting prepended when 
FileIO.Write.relativeFileNaming resolves the filename with the given 
baseDirectory (in this case "nested/directory" becomes "/nested/directory"). 
(If I get rid of .to() and just use FileIO.Write.relativeFileNaming( 
StaticValueProvider.of("gs://some_bucket/write_dynamic/nested/directory"), it 
works as expected.)

I think this is actually a difficult thing to fix since 
FileSystems.matchNewResource tries to parse the path scheme, so passing just a 
directory name doesn't work correctly. It seems more straightforward to just 
make the API easier to use correctly by renaming/deprecating things as you 
suggested.

 

> FileIO.Write.relativeFileNaming should not be public
> 
>
> Key: BEAM-9316
> URL: https://issues.apache.org/jira/browse/BEAM-9316
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-files
>Reporter: Claire McGinty
>Priority: Major
>
> I think the existing FileIO.writeDynamic is a bit easy to misuse, as 
> something like this looks correct, and compiles:
>  
> {{ FileIO.writeDynamic()}}
> {{  .by(...)}}
> {{  .withNaming(new SerializableFunction[String, FileNaming] {}}
> {{     override def apply(str: String): FileNaming =}}
> {{       FileIO.Write.relativeFileNaming(}}
> {{         "some/directory",}}
> {{         new FileNaming {}}
> {{           override defFilename(window: BoundedWindow, pane: PaneInfo, 
> numShards: Int, shardIndex: Int, compression: Compression): String = 
> "some_filename.txt"}}{{}}}
> {{  .via(...)}}
> {{  .to("gs://some/bucket")}}
>  
> However, for dynamic writes, if `outputDirectory` (.to("...")) is set, under 
> the hood, Beam will wrap the provided `fileNamingFn` in 
> `FileIO.Write.relativeFileNaming(...)` as well, so it ends up as a nested 
> `relativeFileNaming` function. 
> ([https://github.com/apache/beam/blob/da9e17288e8473925674a4691d9e86252e67d7d7/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java#L1243)|https://github.com/apache/beam/blob/da9e17288e8473925674a4691d9e86252e67d7d7/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java#L1243])]
>  
> IMO, `relativeFileNaming` should either be made private, so that it's only 
> used internally by FileIO.Write, or a precondition should be added when a 
> dynamic FileIO.Write is expanded, to check that `outputDirectory` can't be 
> set if the provided `fileNamingFn` is relative.
>  
> wdyt?
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9308) Optimize state cleanup at end-of-window

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9308?focusedWorklogId=389483&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389483
 ]

ASF GitHub Bot logged work on BEAM-9308:


Author: ASF GitHub Bot
Created on: 19/Feb/20 16:16
Start Date: 19/Feb/20 16:16
Worklog Time Spent: 10m 
  Work Description: steveniemitz commented on issue #10852: [BEAM-9308] 
Decorrelate state cleanup timers
URL: https://github.com/apache/beam/pull/10852#issuecomment-588303100
 
 
   cc @reuvenlax maybe?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389483)
Time Spent: 0.5h  (was: 20m)

> Optimize state cleanup at end-of-window
> ---
>
> Key: BEAM-9308
> URL: https://issues.apache.org/jira/browse/BEAM-9308
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Steve Niemitz
>Assignee: Steve Niemitz
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When using state with a large keyspace, you can end up with a large amount of 
> state cleanup timers set to fire all 1ms after the end of a window.  This can 
> cause a momentary (I've observed 1-3 minute) lag in processing while windmill 
> and the java harness fire and process these cleanup timers.
> By spreading the firing over a short period after the end of the window, we 
> can decorrelate the firing of the timers and smooth the load out, resulting 
> in much less impact from state cleanup.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=389490&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389490
 ]

ASF GitHub Bot logged work on BEAM-9146:


Author: ASF GitHub Bot
Created on: 19/Feb/20 16:36
Start Date: 19/Feb/20 16:36
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #10764: [BEAM-9146] 
Integrate GCP Video Intelligence functionality for Python SDK
URL: https://github.com/apache/beam/pull/10764
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389490)
Time Spent: 13.5h  (was: 13h 20m)

> [Python] PTransform that integrates Video Intelligence functionality
> 
>
> Key: BEAM-9146
> URL: https://issues.apache.org/jira/browse/BEAM-9146
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 13.5h
>  Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Video 
> Intelligence functionality [1].
> The transform should be able to take both video GCS location or video data 
> bytes as an input.
> [1] https://cloud.google.com/video-intelligence/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=389491&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389491
 ]

ASF GitHub Bot logged work on BEAM-9146:


Author: ASF GitHub Bot
Created on: 19/Feb/20 16:38
Start Date: 19/Feb/20 16:38
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #10764: [BEAM-9146] Integrate 
GCP Video Intelligence functionality for Python SDK
URL: https://github.com/apache/beam/pull/10764#issuecomment-588315698
 
 
   @EDjur -- Thank you, merged this.
   
   Could we update 'google-cloud-videointelligence>=1.8.0<=1.12.1', to use the 
recently released 1.13.0 version as well? Do we need anything special?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389491)
Time Spent: 13h 40m  (was: 13.5h)

> [Python] PTransform that integrates Video Intelligence functionality
> 
>
> Key: BEAM-9146
> URL: https://issues.apache.org/jira/browse/BEAM-9146
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 13h 40m
>  Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Video 
> Intelligence functionality [1].
> The transform should be able to take both video GCS location or video data 
> bytes as an input.
> [1] https://cloud.google.com/video-intelligence/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9326) JsonToRow transform should not use bounded Wildcards for its input

2020-02-19 Thread Jira


[ 
https://issues.apache.org/jira/browse/BEAM-9326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17040225#comment-17040225
 ] 

Ismaël Mejía commented on BEAM-9326:


Thanks for answering [~kenn]  I see the covariant argument makes sense for 
MapElements since the type is generic, but what I found odd in the case of 
JsonToRow was that the type accepted was `? extends String` because String is 
final in Java so for basic definition nobody can inherit String. It is a pretty 
minor fix, but worth the adjustment for the sake of 'correctness'.

> JsonToRow transform should not use bounded Wildcards for its input
> --
>
> Key: BEAM-9326
> URL: https://issues.apache.org/jira/browse/BEAM-9326
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
> Fix For: 2.20.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The JsonToRow PTransform input is a String (a final class in Java) so no 
> reason
> to define a bounded wildcard as its argument.
> We should use  in Beam's codebase only when required by Java
> Generics constraints.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9327) Avoid creating a new accumulator for each add input in Combine translation

2020-02-19 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9327:
---
Status: Open  (was: Triage Needed)

> Avoid creating a new accumulator for each add input in Combine translation
> --
>
> Key: BEAM-9327
> URL: https://issues.apache.org/jira/browse/BEAM-9327
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: structured-streaming
>
> similar to latest inprovement in the current runner. See: 
> [https://www.youtube.com/watch?v=ZIFtmx8nBow&t=721s] min 12



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9333) DataCatalogPipelineOptions is not registered

2020-02-19 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9333:
---
Status: Open  (was: Triage Needed)

> DataCatalogPipelineOptions is not registered
> 
>
> Key: BEAM-9333
> URL: https://issues.apache.org/jira/browse/BEAM-9333
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> ... so its not possible to set Data Catalog options from the command line.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7304) Twister2 Beam runner

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7304?focusedWorklogId=389503&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389503
 ]

ASF GitHub Bot logged work on BEAM-7304:


Author: ASF GitHub Bot
Created on: 19/Feb/20 17:04
Start Date: 19/Feb/20 17:04
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #10888: [BEAM-7304] Twister2 
Beam runner
URL: https://github.com/apache/beam/pull/10888#issuecomment-588329418
 
 
   Yes that sadly is common, for the moment ignore such errors if they repeat 
it means they are 'flaky tests', please report in the mailing list so someone 
takes a look.
   
   I took an ultra quick look at your PR. It is currently not running the tests 
because you forgot to add the twister2 runner in the settings.gradle file.
   
   Please also try to fix the name of the classes 
`Twister2PiplineExecutionEnvironment` and `Twister2PiplineResult` so they are 
`Pipeline` instead of `Pipline`.
   
   I also look at the Validates runner tests that are excluded and I would 
strongly recommend you to try to make the UsesParDoLifecycle tests pass, those 
matter. If your ParDo translation is correct (and you respect the lifecycle) 
you will easily be able to remove the `UsesBoundedSplittableParDo` exclusion 
too because they are expanded into normal ParDos by default.
   
   Finally worth also to try to get the `UsesSchema` tests passing because that 
should not be difficult, that's just passing around some object you are 
probably not doing in the translation and the Twister2 users will win the 
guarantee that SQL/Schema-based PCollections are working.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389503)
Time Spent: 3.5h  (was: 3h 20m)

> Twister2 Beam runner
> 
>
> Key: BEAM-7304
> URL: https://issues.apache.org/jira/browse/BEAM-7304
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-ideas
>Reporter: Pulasthi Wickramasinghe
>Assignee: Pulasthi Wickramasinghe
>Priority: Minor
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Twister2 is a big data framework which supports both batch and stream 
> processing [1] [2]. The goal is to develop an beam runner for Twister2. 
> [1] [https://github.com/DSC-SPIDAL/twister2]
> [2] [https://twister2.org/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7304) Twister2 Beam runner

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7304?focusedWorklogId=389504&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389504
 ]

ASF GitHub Bot logged work on BEAM-7304:


Author: ASF GitHub Bot
Created on: 19/Feb/20 17:07
Start Date: 19/Feb/20 17:07
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #10888: [BEAM-7304] Twister2 
Beam runner
URL: https://github.com/apache/beam/pull/10888#issuecomment-588331367
 
 
   Forgot to mention, can you please do a rebase of your branch, your commits 
look a bit messed up.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389504)
Time Spent: 3h 40m  (was: 3.5h)

> Twister2 Beam runner
> 
>
> Key: BEAM-7304
> URL: https://issues.apache.org/jira/browse/BEAM-7304
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-ideas
>Reporter: Pulasthi Wickramasinghe
>Assignee: Pulasthi Wickramasinghe
>Priority: Minor
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Twister2 is a big data framework which supports both batch and stream 
> processing [1] [2]. The goal is to develop an beam runner for Twister2. 
> [1] [https://github.com/DSC-SPIDAL/twister2]
> [2] [https://twister2.org/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9333) DataCatalogPipelineOptions is not registered

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9333?focusedWorklogId=389505&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389505
 ]

ASF GitHub Bot logged work on BEAM-9333:


Author: ASF GitHub Bot
Created on: 19/Feb/20 17:09
Start Date: 19/Feb/20 17:09
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #10896: 
[BEAM-9333] Add DataCatalogPipelineOptionsRegistrar
URL: https://github.com/apache/beam/pull/10896
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389505)
Time Spent: 0.5h  (was: 20m)

> DataCatalogPipelineOptions is not registered
> 
>
> Key: BEAM-9333
> URL: https://issues.apache.org/jira/browse/BEAM-9333
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> ... so its not possible to set Data Catalog options from the command line.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7304) Twister2 Beam runner

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7304?focusedWorklogId=389507&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389507
 ]

ASF GitHub Bot logged work on BEAM-7304:


Author: ASF GitHub Bot
Created on: 19/Feb/20 17:10
Start Date: 19/Feb/20 17:10
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #10888: [BEAM-7304] Twister2 
Beam runner
URL: https://github.com/apache/beam/pull/10888#issuecomment-588332646
 
 
   WindowingTests are failing in the ValidatesRunner suite. So take a look at 
those. I would be a good idea to add to this PR a Jenkins invocation of the 
Twister2 ValidatesRunner (VR) tests see for reference 
https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Flink.groovy
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389507)
Time Spent: 4h  (was: 3h 50m)

> Twister2 Beam runner
> 
>
> Key: BEAM-7304
> URL: https://issues.apache.org/jira/browse/BEAM-7304
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-ideas
>Reporter: Pulasthi Wickramasinghe
>Assignee: Pulasthi Wickramasinghe
>Priority: Minor
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Twister2 is a big data framework which supports both batch and stream 
> processing [1] [2]. The goal is to develop an beam runner for Twister2. 
> [1] [https://github.com/DSC-SPIDAL/twister2]
> [2] [https://twister2.org/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7304) Twister2 Beam runner

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7304?focusedWorklogId=389506&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389506
 ]

ASF GitHub Bot logged work on BEAM-7304:


Author: ASF GitHub Bot
Created on: 19/Feb/20 17:10
Start Date: 19/Feb/20 17:10
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #10888: [BEAM-7304] Twister2 
Beam runner
URL: https://github.com/apache/beam/pull/10888#issuecomment-588332646
 
 
   WindowingTests are failing in the ValidatesRunner suite. So take a look at 
those. I would be a good idea to add to this PR a Jenkins invocation of the 
twister2 validatesrunner tests see for reference 
https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Flink.groovy
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389506)
Time Spent: 3h 50m  (was: 3h 40m)

> Twister2 Beam runner
> 
>
> Key: BEAM-7304
> URL: https://issues.apache.org/jira/browse/BEAM-7304
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-ideas
>Reporter: Pulasthi Wickramasinghe
>Assignee: Pulasthi Wickramasinghe
>Priority: Minor
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Twister2 is a big data framework which supports both batch and stream 
> processing [1] [2]. The goal is to develop an beam runner for Twister2. 
> [1] [https://github.com/DSC-SPIDAL/twister2]
> [2] [https://twister2.org/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7304) Twister2 Beam runner

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7304?focusedWorklogId=389508&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389508
 ]

ASF GitHub Bot logged work on BEAM-7304:


Author: ASF GitHub Bot
Created on: 19/Feb/20 17:11
Start Date: 19/Feb/20 17:11
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #10888: [BEAM-7304] Twister2 
Beam runner
URL: https://github.com/apache/beam/pull/10888#issuecomment-588332646
 
 
   WindowingTests are failing in the ValidatesRunner suite. So take a look at 
those. It would be a good idea to add to this PR a Jenkins invocation of the 
Twister2 ValidatesRunner (VR) tests see for reference 
https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Flink.groovy
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389508)
Time Spent: 4h 10m  (was: 4h)

> Twister2 Beam runner
> 
>
> Key: BEAM-7304
> URL: https://issues.apache.org/jira/browse/BEAM-7304
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-ideas
>Reporter: Pulasthi Wickramasinghe
>Assignee: Pulasthi Wickramasinghe
>Priority: Minor
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Twister2 is a big data framework which supports both batch and stream 
> processing [1] [2]. The goal is to develop an beam runner for Twister2. 
> [1] [https://github.com/DSC-SPIDAL/twister2]
> [2] [https://twister2.org/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (BEAM-8965) WriteToBigQuery failed in BundleBasedDirectRunner

2020-02-19 Thread Wenbing Bai (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-8965 started by Wenbing Bai.
-
> WriteToBigQuery failed in BundleBasedDirectRunner
> -
>
> Key: BEAM-8965
> URL: https://issues.apache.org/jira/browse/BEAM-8965
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Affects Versions: 2.16.0, 2.17.0, 2.18.0, 2.19.0
>Reporter: Wenbing Bai
>Assignee: Wenbing Bai
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> *{{WriteToBigQuery}}* fails in *{{BundleBasedDirectRunner}}* with error 
> {{PCollection of size 2 with more than one element accessed as a singleton 
> view.}}
> Here is the code
>  
> {code:python}
> with Pipeline() as p:
> query_results = (
> p 
> | beam.io.Read(beam.io.BigQuerySource(
> query='SELECT ... FROM ...')
> )
> query_results | beam.io.gcp.WriteToBigQuery(
> table=,
> method=WriteToBigQuery.Method.FILE_LOADS,
> schema={"fields": []}
> )
> {code}
>  
> Here is the error
>  
> {code:none}
>   File "apache_beam/runners/common.py", line 778, in 
> apache_beam.runners.common.DoFnRunner.process
>     def process(self, windowed_value):
>   File "apache_beam/runners/common.py", line 782, in 
> apache_beam.runners.common.DoFnRunner.process
>     self._reraise_augmented(exn)
>   File "apache_beam/runners/common.py", line 849, in 
> apache_beam.runners.common.DoFnRunner._reraise_augmented
>     raise_with_traceback(new_exn)
>   File "apache_beam/runners/common.py", line 780, in 
> apache_beam.runners.common.DoFnRunner.process
>     return self.do_fn_invoker.invoke_process(windowed_value)
>   File "apache_beam/runners/common.py", line 587, in 
> apache_beam.runners.common.PerWindowInvoker.invoke_process
>     self._invoke_process_per_window(
>   File "apache_beam/runners/common.py", line 610, in 
> apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
>     [si[global_window] for si in self.side_inputs]))
>   File 
> "/home/wbai/terra/terra_py2/local/lib/python2.7/site-packages/apache_beam/transforms/sideinputs.py",
>  line 65, in __getitem__
>     _FilteringIterable(self._iterable, target_window), self._view_options)
>   File 
> "/home/wbai/terra/terra_py2/local/lib/python2.7/site-packages/apache_beam/pvalue.py",
>  line 443, in _from_runtime_iterable
>     len(head), str(head[0]), str(head[1])))
> ValueError: PCollection of size 2 with more than one element accessed as a 
> singleton view. First two elements encountered are 
> "gs://temp-dev/temp/bq_load/3edbf2172dd540edb5c8e9597206b10f", 
> "gs://temp-dev/temp/bq_load/3edbf2172dd540edb5c8e9597206b10f". [while running 
> 'WriteToBigQuery/BigQueryBatchFileLoads/ParDo(WriteRecordsToFile)/ParDo(WriteRecordsToFile)']
> {code}
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9085) Performance regression in np.random.RandomState() skews performance test results across Python 2/3 on Dataflow

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9085?focusedWorklogId=389512&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389512
 ]

ASF GitHub Bot logged work on BEAM-9085:


Author: ASF GitHub Bot
Created on: 19/Feb/20 17:33
Start Date: 19/Feb/20 17:33
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #10885: [BEAM-9085] 
Fix performance regression in SyntheticSource
URL: https://github.com/apache/beam/pull/10885#discussion_r381431532
 
 

 ##
 File path: sdks/python/apache_beam/testing/synthetic_pipeline.py
 ##
 @@ -415,19 +418,24 @@ def get_range_tracker(self, start_position, 
stop_position):
   tracker = range_trackers.UnsplittableRangeTracker(tracker)
 return tracker
 
+  @staticmethod
+  def random_bytes(length):
+"""Return random bytes."""
+return b''.join(
+(struct.pack('B', random.getrandbits(8)) for _ in xrange(length)))
 
 Review comment:
   We can add `from builtins import range` to use py3 implementation of range 
on py2.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389512)
Time Spent: 1h 20m  (was: 1h 10m)

> Performance regression in np.random.RandomState() skews performance test 
> results across Python 2/3 on Dataflow
> --
>
> Key: BEAM-9085
> URL: https://issues.apache.org/jira/browse/BEAM-9085
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Tests show that the performance of core Beam operations in Python 3.x on 
> Dataflow can be a few time slower than in Python 2.7. We should investigate 
> what's the cause of the problem.
> Currently, we have one ParDo test that is run both in Py3 and Py2 [1]. A 
> dashboard with runtime results can be found here [2].
> [1] sdks/python/apache_beam/testing/load_tests/pardo_test.py
> [2] https://apache-beam-testing.appspot.com/explore?dashboard=5678187241537536



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389515&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389515
 ]

ASF GitHub Bot logged work on BEAM-8564:


Author: ASF GitHub Bot
Created on: 19/Feb/20 17:38
Start Date: 19/Feb/20 17:38
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10254: [BEAM-8564] Add LZO 
compression and decompression support
URL: https://github.com/apache/beam/pull/10254#issuecomment-588347547
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389515)
Time Spent: 10h 40m  (was: 10.5h)

> Add LZO compression and decompression support
> -
>
> Key: BEAM-8564
> URL: https://issues.apache.org/jira/browse/BEAM-8564
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Amogh Tiwari
>Assignee: Amogh Tiwari
>Priority: Minor
>  Time Spent: 10h 40m
>  Remaining Estimate: 0h
>
> LZO is a lossless data compression algorithm which is focused on compression 
> and decompression speeds.
> This will enable Apache Beam sdk to compress/decompress files using LZO 
> compression algorithm. 
> This will include the following functionalities:
>  # compress() : for compressing files into an LZO archive
>  # decompress() : for decompressing files archived using LZO compression
> Appropriate Input and Output stream will also be added to enable working with 
> LZO files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389517&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389517
 ]

ASF GitHub Bot logged work on BEAM-8564:


Author: ASF GitHub Bot
Created on: 19/Feb/20 17:41
Start Date: 19/Feb/20 17:41
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10254: [BEAM-8564] 
Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#discussion_r381435599
 
 

 ##
 File path: 
sdks/java/core/src/test/java/org/apache/beam/sdk/io/CompressedSourceTest.java
 ##
 @@ -738,7 +1069,133 @@ public void testGzipProgress() throws IOException {
   assertThat(readerOrig, instanceOf(CompressedReader.class));
   CompressedReader reader = (CompressedReader) readerOrig;
   // before starting
-  assertEquals(0.0, reader.getFractionConsumed(), 1e-6);
+  assertEquals(0.0, reader.getFractionConsumed(), DELTA);
+  assertEquals(0, reader.getSplitPointsConsumed());
+  assertEquals(1, reader.getSplitPointsRemaining());
+
+  // confirm has three records
+  for (int i = 0; i < numRecords; ++i) {
+if (i == 0) {
+  assertTrue(reader.start());
+} else {
+  assertTrue(reader.advance());
+}
+assertEquals(0, reader.getSplitPointsConsumed());
+assertEquals(1, reader.getSplitPointsRemaining());
+  }
+  assertFalse(reader.advance());
+
+  // after reading empty source
 
 Review comment:
   Best to add that comment over the LZOP enum since nobody reading the 
documentation is going to find the comment in the tests.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389517)
Time Spent: 10h 50m  (was: 10h 40m)

> Add LZO compression and decompression support
> -
>
> Key: BEAM-8564
> URL: https://issues.apache.org/jira/browse/BEAM-8564
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Amogh Tiwari
>Assignee: Amogh Tiwari
>Priority: Minor
>  Time Spent: 10h 50m
>  Remaining Estimate: 0h
>
> LZO is a lossless data compression algorithm which is focused on compression 
> and decompression speeds.
> This will enable Apache Beam sdk to compress/decompress files using LZO 
> compression algorithm. 
> This will include the following functionalities:
>  # compress() : for compressing files into an LZO archive
>  # decompress() : for decompressing files archived using LZO compression
> Appropriate Input and Output stream will also be added to enable working with 
> LZO files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389528&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389528
 ]

ASF GitHub Bot logged work on BEAM-8564:


Author: ASF GitHub Bot
Created on: 19/Feb/20 17:51
Start Date: 19/Feb/20 17:51
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10254: [BEAM-8564] 
Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#discussion_r381437087
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/Compression.java
 ##
 @@ -152,6 +153,54 @@ public WritableByteChannel 
writeCompressed(WritableByteChannel channel) throws I
 }
   },
 
+  /**
+   * LZO compression using LZO Codec. .lzo_deflate extension is specified for 
the files which just
+   * use the LZO algorithm without headers.
+   *
+   * The Beam Java SDK does not pull in the required libraries for LZO 
compression by default, so
+   * it is the user's responsibility to declare an explicit dependency on 
{@code
+   * airlift/aircompressor} and {@code presto-hadoop-apache2}. Attempts to 
read or write
 
 Review comment:
   ```suggestion
  * io.airlift:aircompressor} and {@code 
com.facebook.presto.hadoop:hadoop-apache2}. Attempts to read or write
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389528)
Time Spent: 11h 10m  (was: 11h)

> Add LZO compression and decompression support
> -
>
> Key: BEAM-8564
> URL: https://issues.apache.org/jira/browse/BEAM-8564
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Amogh Tiwari
>Assignee: Amogh Tiwari
>Priority: Minor
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h
>
> LZO is a lossless data compression algorithm which is focused on compression 
> and decompression speeds.
> This will enable Apache Beam sdk to compress/decompress files using LZO 
> compression algorithm. 
> This will include the following functionalities:
>  # compress() : for compressing files into an LZO archive
>  # decompress() : for decompressing files archived using LZO compression
> Appropriate Input and Output stream will also be added to enable working with 
> LZO files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389525&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389525
 ]

ASF GitHub Bot logged work on BEAM-8564:


Author: ASF GitHub Bot
Created on: 19/Feb/20 17:51
Start Date: 19/Feb/20 17:51
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10254: [BEAM-8564] 
Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#discussion_r381437508
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/Compression.java
 ##
 @@ -152,6 +153,54 @@ public WritableByteChannel 
writeCompressed(WritableByteChannel channel) throws I
 }
   },
 
+  /**
+   * LZO compression using LZO Codec. .lzo_deflate extension is specified for 
the files which just
+   * use the LZO algorithm without headers.
+   *
+   * The Beam Java SDK does not pull in the required libraries for LZO 
compression by default, so
+   * it is the user's responsibility to declare an explicit dependency on 
{@code
+   * airlift/aircompressor} and {@code presto-hadoop-apache2}. Attempts to 
read or write
+   * .lzo_deflate files without {@code airlift/aircompressor} and {@code 
presto-hadoop-apache2}
 
 Review comment:
   ```suggestion
  * {@code .lzo_deflate} files without {@code io.airlift:aircompressor} and 
{@code com.facebook.presto.hadoop:hadoop-apache2}
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389525)
Time Spent: 11h  (was: 10h 50m)

> Add LZO compression and decompression support
> -
>
> Key: BEAM-8564
> URL: https://issues.apache.org/jira/browse/BEAM-8564
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Amogh Tiwari
>Assignee: Amogh Tiwari
>Priority: Minor
>  Time Spent: 11h
>  Remaining Estimate: 0h
>
> LZO is a lossless data compression algorithm which is focused on compression 
> and decompression speeds.
> This will enable Apache Beam sdk to compress/decompress files using LZO 
> compression algorithm. 
> This will include the following functionalities:
>  # compress() : for compressing files into an LZO archive
>  # decompress() : for decompressing files archived using LZO compression
> Appropriate Input and Output stream will also be added to enable working with 
> LZO files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389527&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389527
 ]

ASF GitHub Bot logged work on BEAM-8564:


Author: ASF GitHub Bot
Created on: 19/Feb/20 17:51
Start Date: 19/Feb/20 17:51
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10254: [BEAM-8564] 
Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#discussion_r381438625
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/Compression.java
 ##
 @@ -152,6 +153,54 @@ public WritableByteChannel 
writeCompressed(WritableByteChannel channel) throws I
 }
   },
 
+  /**
+   * LZO compression using LZO Codec. .lzo_deflate extension is specified for 
the files which just
+   * use the LZO algorithm without headers.
+   *
+   * The Beam Java SDK does not pull in the required libraries for LZO 
compression by default, so
+   * it is the user's responsibility to declare an explicit dependency on 
{@code
+   * airlift/aircompressor} and {@code presto-hadoop-apache2}. Attempts to 
read or write
+   * .lzo_deflate files without {@code airlift/aircompressor} and {@code 
presto-hadoop-apache2}
+   * loaded will result in {@code NoClassDefFoundError} at runtime.
+   */
+  LZO(".lzo_deflate", ".lzo_deflate") {
+@Override
+public ReadableByteChannel readDecompressed(ReadableByteChannel channel) 
throws IOException {
+  return Channels.newChannel(
+  
LzoCompression.createLzoInputStream(Channels.newInputStream(channel)));
+}
+
+@Override
+public WritableByteChannel writeCompressed(WritableByteChannel channel) 
throws IOException {
+  return Channels.newChannel(
+  
LzoCompression.createLzoOutputStream(Channels.newOutputStream(channel)));
+}
+  },
+
+  /**
+   * LZOP compression using LZOP Codec. .lzo extension is specified for the 
files with magic bytes
+   * and headers.
+   *
+   * The Beam Java SDK does not pull in the required libraries for LZOP 
compression by default,
+   * so it is the user's responsibility to declare an explicit dependency on 
{@code
+   * airlift/aircompressor} and {@code presto-hadoop-apache2}. Attempts to 
read or write .lzo files
 
 Review comment:
   ```suggestion
  * io.airlift:aircompressor} and {@code 
com.facebook.presto.hadoop:hadoop-apache2}. Attempts to read or write {@code 
.lzo} files
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389527)
Time Spent: 11h 10m  (was: 11h)

> Add LZO compression and decompression support
> -
>
> Key: BEAM-8564
> URL: https://issues.apache.org/jira/browse/BEAM-8564
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Amogh Tiwari
>Assignee: Amogh Tiwari
>Priority: Minor
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h
>
> LZO is a lossless data compression algorithm which is focused on compression 
> and decompression speeds.
> This will enable Apache Beam sdk to compress/decompress files using LZO 
> compression algorithm. 
> This will include the following functionalities:
>  # compress() : for compressing files into an LZO archive
>  # decompress() : for decompressing files archived using LZO compression
> Appropriate Input and Output stream will also be added to enable working with 
> LZO files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389531&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389531
 ]

ASF GitHub Bot logged work on BEAM-8564:


Author: ASF GitHub Bot
Created on: 19/Feb/20 17:51
Start Date: 19/Feb/20 17:51
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10254: [BEAM-8564] 
Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#discussion_r381439166
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/Compression.java
 ##
 @@ -152,6 +153,54 @@ public WritableByteChannel 
writeCompressed(WritableByteChannel channel) throws I
 }
   },
 
+  /**
+   * LZO compression using LZO Codec. .lzo_deflate extension is specified for 
the files which just
+   * use the LZO algorithm without headers.
+   *
+   * The Beam Java SDK does not pull in the required libraries for LZO 
compression by default, so
+   * it is the user's responsibility to declare an explicit dependency on 
{@code
+   * airlift/aircompressor} and {@code presto-hadoop-apache2}. Attempts to 
read or write
+   * .lzo_deflate files without {@code airlift/aircompressor} and {@code 
presto-hadoop-apache2}
+   * loaded will result in {@code NoClassDefFoundError} at runtime.
+   */
+  LZO(".lzo_deflate", ".lzo_deflate") {
+@Override
+public ReadableByteChannel readDecompressed(ReadableByteChannel channel) 
throws IOException {
+  return Channels.newChannel(
+  
LzoCompression.createLzoInputStream(Channels.newInputStream(channel)));
+}
+
+@Override
+public WritableByteChannel writeCompressed(WritableByteChannel channel) 
throws IOException {
+  return Channels.newChannel(
+  
LzoCompression.createLzoOutputStream(Channels.newOutputStream(channel)));
+}
+  },
+
+  /**
+   * LZOP compression using LZOP Codec. .lzo extension is specified for the 
files with magic bytes
+   * and headers.
+   *
+   * The Beam Java SDK does not pull in the required libraries for LZOP 
compression by default,
+   * so it is the user's responsibility to declare an explicit dependency on 
{@code
+   * airlift/aircompressor} and {@code presto-hadoop-apache2}. Attempts to 
read or write .lzo files
+   * without {@code airlift/aircompressor} and {@code presto-hadoop-apache2} 
loaded will result in
 
 Review comment:
   ```suggestion
  * without {@code io.airlift:aircompressor} and {@code 
com.facebook.presto.hadoop:hadoop-apache2} loaded will result in a
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389531)
Time Spent: 11h 20m  (was: 11h 10m)

> Add LZO compression and decompression support
> -
>
> Key: BEAM-8564
> URL: https://issues.apache.org/jira/browse/BEAM-8564
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Amogh Tiwari
>Assignee: Amogh Tiwari
>Priority: Minor
>  Time Spent: 11h 20m
>  Remaining Estimate: 0h
>
> LZO is a lossless data compression algorithm which is focused on compression 
> and decompression speeds.
> This will enable Apache Beam sdk to compress/decompress files using LZO 
> compression algorithm. 
> This will include the following functionalities:
>  # compress() : for compressing files into an LZO archive
>  # decompress() : for decompressing files archived using LZO compression
> Appropriate Input and Output stream will also be added to enable working with 
> LZO files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389529&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389529
 ]

ASF GitHub Bot logged work on BEAM-8564:


Author: ASF GitHub Bot
Created on: 19/Feb/20 17:51
Start Date: 19/Feb/20 17:51
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10254: [BEAM-8564] 
Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#discussion_r381437988
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/Compression.java
 ##
 @@ -152,6 +153,54 @@ public WritableByteChannel 
writeCompressed(WritableByteChannel channel) throws I
 }
   },
 
+  /**
+   * LZO compression using LZO Codec. .lzo_deflate extension is specified for 
the files which just
+   * use the LZO algorithm without headers.
+   *
+   * The Beam Java SDK does not pull in the required libraries for LZO 
compression by default, so
+   * it is the user's responsibility to declare an explicit dependency on 
{@code
+   * airlift/aircompressor} and {@code presto-hadoop-apache2}. Attempts to 
read or write
+   * .lzo_deflate files without {@code airlift/aircompressor} and {@code 
presto-hadoop-apache2}
+   * loaded will result in {@code NoClassDefFoundError} at runtime.
+   */
+  LZO(".lzo_deflate", ".lzo_deflate") {
+@Override
+public ReadableByteChannel readDecompressed(ReadableByteChannel channel) 
throws IOException {
+  return Channels.newChannel(
+  
LzoCompression.createLzoInputStream(Channels.newInputStream(channel)));
+}
+
+@Override
+public WritableByteChannel writeCompressed(WritableByteChannel channel) 
throws IOException {
+  return Channels.newChannel(
+  
LzoCompression.createLzoOutputStream(Channels.newOutputStream(channel)));
+}
+  },
+
+  /**
+   * LZOP compression using LZOP Codec. .lzo extension is specified for the 
files with magic bytes
 
 Review comment:
   ```suggestion
  * LZOP compression using LZOP codec. {@code .lzo} extension is specified 
for files with magic bytes
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389529)
Time Spent: 11h 10m  (was: 11h)

> Add LZO compression and decompression support
> -
>
> Key: BEAM-8564
> URL: https://issues.apache.org/jira/browse/BEAM-8564
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Amogh Tiwari
>Assignee: Amogh Tiwari
>Priority: Minor
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h
>
> LZO is a lossless data compression algorithm which is focused on compression 
> and decompression speeds.
> This will enable Apache Beam sdk to compress/decompress files using LZO 
> compression algorithm. 
> This will include the following functionalities:
>  # compress() : for compressing files into an LZO archive
>  # decompress() : for decompressing files archived using LZO compression
> Appropriate Input and Output stream will also be added to enable working with 
> LZO files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389524&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389524
 ]

ASF GitHub Bot logged work on BEAM-8564:


Author: ASF GitHub Bot
Created on: 19/Feb/20 17:51
Start Date: 19/Feb/20 17:51
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10254: [BEAM-8564] 
Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#discussion_r381436101
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/Compression.java
 ##
 @@ -152,6 +153,54 @@ public WritableByteChannel 
writeCompressed(WritableByteChannel channel) throws I
 }
   },
 
+  /**
+   * LZO compression using LZO Codec. .lzo_deflate extension is specified for 
the files which just
 
 Review comment:
   ```suggestion
  * LZO compression using LZO codec. {@code .lzo_deflate} extension is 
specified for files which
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389524)
Time Spent: 11h  (was: 10h 50m)

> Add LZO compression and decompression support
> -
>
> Key: BEAM-8564
> URL: https://issues.apache.org/jira/browse/BEAM-8564
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Amogh Tiwari
>Assignee: Amogh Tiwari
>Priority: Minor
>  Time Spent: 11h
>  Remaining Estimate: 0h
>
> LZO is a lossless data compression algorithm which is focused on compression 
> and decompression speeds.
> This will enable Apache Beam sdk to compress/decompress files using LZO 
> compression algorithm. 
> This will include the following functionalities:
>  # compress() : for compressing files into an LZO archive
>  # decompress() : for decompressing files archived using LZO compression
> Appropriate Input and Output stream will also be added to enable working with 
> LZO files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389526&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389526
 ]

ASF GitHub Bot logged work on BEAM-8564:


Author: ASF GitHub Bot
Created on: 19/Feb/20 17:51
Start Date: 19/Feb/20 17:51
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10254: [BEAM-8564] 
Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#discussion_r381440755
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/Compression.java
 ##
 @@ -152,6 +153,54 @@ public WritableByteChannel 
writeCompressed(WritableByteChannel channel) throws I
 }
   },
 
+  /**
+   * LZO compression using LZO Codec. .lzo_deflate extension is specified for 
the files which just
+   * use the LZO algorithm without headers.
+   *
+   * The Beam Java SDK does not pull in the required libraries for LZO 
compression by default, so
+   * it is the user's responsibility to declare an explicit dependency on 
{@code
+   * airlift/aircompressor} and {@code presto-hadoop-apache2}. Attempts to 
read or write
+   * .lzo_deflate files without {@code airlift/aircompressor} and {@code 
presto-hadoop-apache2}
+   * loaded will result in {@code NoClassDefFoundError} at runtime.
+   */
+  LZO(".lzo_deflate", ".lzo_deflate") {
+@Override
+public ReadableByteChannel readDecompressed(ReadableByteChannel channel) 
throws IOException {
+  return Channels.newChannel(
+  
LzoCompression.createLzoInputStream(Channels.newInputStream(channel)));
+}
+
+@Override
+public WritableByteChannel writeCompressed(WritableByteChannel channel) 
throws IOException {
+  return Channels.newChannel(
+  
LzoCompression.createLzoOutputStream(Channels.newOutputStream(channel)));
+}
+  },
+
+  /**
+   * LZOP compression using LZOP Codec. .lzo extension is specified for the 
files with magic bytes
+   * and headers.
+   *
 
 Review comment:
   ```suggestion
  *
  * Warning: The LZOP codec being used does not support 
concatenated LZOP streams and will
  * silently ignore data after the end of the first LZOP stream.
  *
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389526)
Time Spent: 11h 10m  (was: 11h)

> Add LZO compression and decompression support
> -
>
> Key: BEAM-8564
> URL: https://issues.apache.org/jira/browse/BEAM-8564
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Amogh Tiwari
>Assignee: Amogh Tiwari
>Priority: Minor
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h
>
> LZO is a lossless data compression algorithm which is focused on compression 
> and decompression speeds.
> This will enable Apache Beam sdk to compress/decompress files using LZO 
> compression algorithm. 
> This will include the following functionalities:
>  # compress() : for compressing files into an LZO archive
>  # decompress() : for decompressing files archived using LZO compression
> Appropriate Input and Output stream will also be added to enable working with 
> LZO files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389530&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389530
 ]

ASF GitHub Bot logged work on BEAM-8564:


Author: ASF GitHub Bot
Created on: 19/Feb/20 17:51
Start Date: 19/Feb/20 17:51
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10254: [BEAM-8564] 
Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#discussion_r381437634
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/Compression.java
 ##
 @@ -152,6 +153,54 @@ public WritableByteChannel 
writeCompressed(WritableByteChannel channel) throws I
 }
   },
 
+  /**
+   * LZO compression using LZO Codec. .lzo_deflate extension is specified for 
the files which just
+   * use the LZO algorithm without headers.
+   *
+   * The Beam Java SDK does not pull in the required libraries for LZO 
compression by default, so
+   * it is the user's responsibility to declare an explicit dependency on 
{@code
+   * airlift/aircompressor} and {@code presto-hadoop-apache2}. Attempts to 
read or write
+   * .lzo_deflate files without {@code airlift/aircompressor} and {@code 
presto-hadoop-apache2}
+   * loaded will result in {@code NoClassDefFoundError} at runtime.
 
 Review comment:
   ```suggestion
  * loaded will result in a {@code NoClassDefFoundError} at runtime.
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389530)
Time Spent: 11h 20m  (was: 11h 10m)

> Add LZO compression and decompression support
> -
>
> Key: BEAM-8564
> URL: https://issues.apache.org/jira/browse/BEAM-8564
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Amogh Tiwari
>Assignee: Amogh Tiwari
>Priority: Minor
>  Time Spent: 11h 20m
>  Remaining Estimate: 0h
>
> LZO is a lossless data compression algorithm which is focused on compression 
> and decompression speeds.
> This will enable Apache Beam sdk to compress/decompress files using LZO 
> compression algorithm. 
> This will include the following functionalities:
>  # compress() : for compressing files into an LZO archive
>  # decompress() : for decompressing files archived using LZO compression
> Appropriate Input and Output stream will also be added to enable working with 
> LZO files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389532&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389532
 ]

ASF GitHub Bot logged work on BEAM-8564:


Author: ASF GitHub Bot
Created on: 19/Feb/20 17:52
Start Date: 19/Feb/20 17:52
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10254: [BEAM-8564] Add LZO 
compression and decompression support
URL: https://github.com/apache/beam/pull/10254#issuecomment-588354735
 
 
   After committing the comments, you may need to run spotlessApply again.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389532)
Time Spent: 11.5h  (was: 11h 20m)

> Add LZO compression and decompression support
> -
>
> Key: BEAM-8564
> URL: https://issues.apache.org/jira/browse/BEAM-8564
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Amogh Tiwari
>Assignee: Amogh Tiwari
>Priority: Minor
>  Time Spent: 11.5h
>  Remaining Estimate: 0h
>
> LZO is a lossless data compression algorithm which is focused on compression 
> and decompression speeds.
> This will enable Apache Beam sdk to compress/decompress files using LZO 
> compression algorithm. 
> This will include the following functionalities:
>  # compress() : for compressing files into an LZO archive
>  # decompress() : for decompressing files archived using LZO compression
> Appropriate Input and Output stream will also be added to enable working with 
> LZO files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9035) Typed options for Row Schema and Fields

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9035?focusedWorklogId=389533&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389533
 ]

ASF GitHub Bot logged work on BEAM-9035:


Author: ASF GitHub Bot
Created on: 19/Feb/20 17:54
Start Date: 19/Feb/20 17:54
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #10413: 
[BEAM-9035] Typed options for Row Schema and Field
URL: https://github.com/apache/beam/pull/10413#discussion_r381443180
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/Schema.java
 ##
 @@ -950,6 +1007,338 @@ public int hashCode() {
 }
   }
 
+  public static class Options implements Serializable {
+private Map options;
+
+@Override
+public String toString() {
+  TreeMap sorted = new TreeMap(options);
+  return "{" + sorted + '}';
+}
+
+Map getAllOptions() {
+  return options;
+}
+
+public Set getOptionNames() {
+  return options.keySet();
+}
+
+public boolean hasOptions() {
+  return options.size() > 0;
+}
+
+@Override
+public boolean equals(Object o) {
+  if (this == o) {
+return true;
+  }
+  if (o == null || getClass() != o.getClass()) {
+return false;
+  }
+  Options options1 = (Options) o;
+  if (!options.keySet().equals(options1.options.keySet())) {
+return false;
+  }
+  for (Map.Entry optionEntry : options.entrySet()) {
+Option thisOption = optionEntry.getValue();
+Option otherOption = options1.options.get(optionEntry.getKey());
+if (!thisOption.getType().equals(otherOption.getType())) {
+  return false;
+}
+switch (thisOption.getType().getTypeName()) {
+  case BYTE:
+  case INT16:
+  case INT32:
+  case INT64:
+  case DECIMAL:
+  case FLOAT:
+  case DOUBLE:
+  case STRING:
+  case DATETIME:
+  case BOOLEAN:
+  case ARRAY:
+  case ITERABLE:
+  case MAP:
+  case ROW:
+  case LOGICAL_TYPE:
+if (!thisOption.getValue().equals(otherOption.getValue())) {
+  return false;
+}
+break;
+  case BYTES:
+if (!Arrays.equals((byte[]) thisOption.getValue(), 
otherOption.getValue())) {
+  return false;
+}
+}
+  }
+  return true;
+}
+
+@Override
+public int hashCode() {
+  return Objects.hash(options);
+}
+
+static class Option implements Serializable {
+  Option(FieldType type, Object value) {
+this.type = type;
+this.value = value;
+  }
+
+  private FieldType type;
+  private Object value;
+
+  @SuppressWarnings("TypeParameterUnusedInFormals")
+   T getValue() {
+return (T) value;
+  }
+
+  FieldType getType() {
+return type;
+  }
+
+  @Override
+  public String toString() {
+return "Option{type=" + type + ", value=" + value + '}';
+  }
+
+  @Override
+  public boolean equals(Object o) {
+if (this == o) {
+  return true;
+}
+if (o == null || getClass() != o.getClass()) {
+  return false;
+}
+Option option = (Option) o;
+return Objects.equals(type, option.type) && Objects.equals(value, 
option.value);
+  }
+
+  @Override
+  public int hashCode() {
+return Objects.hash(type, value);
+  }
+}
+
+public static class Builder {
+  private Map options;
+
+  Builder(Map init) {
+this.options = new HashMap<>(init);
+  }
+
+  Builder() {
+this(new HashMap<>());
+  }
+
+  public Builder setByteOption(String optionName, Byte value) {
+setOption(optionName, FieldType.BYTE, value);
+return this;
+  }
+
+  public Builder setBytesOption(String optionName, byte[] value) {
+setOption(optionName, FieldType.BYTES, value);
+return this;
+  }
+
+  public Builder setInt16Option(String optionName, Short value) {
+setOption(optionName, FieldType.INT16, value);
+return this;
+  }
+
+  public Builder setInt32Option(String optionName, Integer value) {
+setOption(optionName, FieldType.INT32, value);
+return this;
+  }
+
+  public Builder setInt64Option(String optionName, Long value) {
+setOption(optionName, FieldType.INT64, value);
+return this;
+  }
+
+  public Builder setDecimalOption(String optionName, BigDecimal value) {
+setOption(optionName, FieldType.DECIMAL, value);
+return this;
+  }
+
+  public Builder setFloatOption(String optionName, Float value) {
+setOption(optionName, FieldType.FLOAT, value);
+return this;
+  }

[jira] [Work started] (BEAM-9228) _SDFBoundedSourceWrapper doesn't distribute data to multiple workers

2020-02-19 Thread Hannah Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-9228 started by Hannah Jiang.
--
> _SDFBoundedSourceWrapper doesn't distribute data to multiple workers
> 
>
> Key: BEAM-9228
> URL: https://issues.apache.org/jira/browse/BEAM-9228
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.16.0, 2.18.0, 2.19.0
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> A user reported following issue.
> -
> I have a set of tfrecord files, obtained by converting parquet files with 
> Spark. Each file is roughly 1GB and I have 11 of those.
> I would expect simple statistics gathering (ie counting number of items of 
> all files) to scale linearly with respect to the number of cores on my system.
> I am able to reproduce the issue with the minimal snippet below
> {code:java}
> import apache_beam as beam
> from apache_beam.options.pipeline_options import PipelineOptions
> from apache_beam.runners.portability import fn_api_runner
> from apache_beam.portability.api import beam_runner_api_pb2
> from apache_beam.portability import python_urns
> import sys
> pipeline_options = PipelineOptions(['--direct_num_workers', '4'])
> file_pattern = 'part-r-00*
> runner=fn_api_runner.FnApiRunner(
>   default_environment=beam_runner_api_pb2.Environment(
>   urn=python_urns.SUBPROCESS_SDK,
>   payload=b'%s -m apache_beam.runners.worker.sdk_worker_main'
> % sys.executable.encode('ascii')))
> p = beam.Pipeline(runner=runner, options=pipeline_options)
> lines = (p | 'read' >> beam.io.tfrecordio.ReadFromTFRecord(file_pattern)
>  | beam.combiners.Count.Globally()
>  | beam.io.WriteToText('/tmp/output'))
> p.run()
> {code}
> Only one combination of apache_beam revision / worker type seems to work (I 
> refer to https://beam.apache.org/documentation/runners/direct/ for the worker 
> types)
> * beam 2.16; neither multithread nor multiprocess achieve high cpu usage on 
> multiple cores
> * beam 2.17: able to achieve high cpu usage on all 4 cores
> * beam 2.18: not tested the mulithreaded mode but the multiprocess mode fails 
> when trying to serialize the Environment instance most likely because of a 
> change from 2.17 to 2.18.
> I also tried briefly SparkRunner with version 2.16 but was no able to achieve 
> any throughput.
> What is the recommnended way to achieve what I am trying to ? How can I 
> troubleshoot ?
> --
> This is caused by [this 
> PR|https://github.com/apache/beam/commit/02f8ad4eee3ec0ea8cbdc0f99c1dad29f00a9f60].
> A [workaround|https://github.com/apache/beam/pull/10729] is tried, which is 
> rolling back iobase.py not to use _SDFBoundedSourceWrapper. This confirmed 
> that data is distributed to multiple workers, however, there are some 
> regressions with SDF wrapper tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8382) Add polling interval to KinesisIO.Read

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8382?focusedWorklogId=389539&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389539
 ]

ASF GitHub Bot logged work on BEAM-8382:


Author: ASF GitHub Bot
Created on: 19/Feb/20 18:12
Start Date: 19/Feb/20 18:12
Worklog Time Spent: 10m 
  Work Description: jfarr commented on issue #9765: [WIP][BEAM-8382] Add 
rate limit policy to KinesisIO.Read
URL: https://github.com/apache/beam/pull/9765#issuecomment-588364642
 
 
   No problem, @iemejia, thanks for the heads up.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389539)
Time Spent: 13h 10m  (was: 13h)

> Add polling interval to KinesisIO.Read
> --
>
> Key: BEAM-8382
> URL: https://issues.apache.org/jira/browse/BEAM-8382
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kinesis
>Affects Versions: 2.13.0, 2.14.0, 2.15.0
>Reporter: Jonothan Farr
>Assignee: Jonothan Farr
>Priority: Major
>  Time Spent: 13h 10m
>  Remaining Estimate: 0h
>
> With the current implementation we are observing Kinesis throttling due to 
> ReadProvisionedThroughputExceeded on the order of hundreds of times per 
> second, regardless of the actual Kinesis throughput. This is because the 
> ShardReadersPool readLoop() method is polling getRecords() as fast as 
> possible.
> From the KDS documentation:
> {quote}Each shard can support up to five read transactions per second.
> {quote}
> and
> {quote}For best results, sleep for at least 1 second (1,000 milliseconds) 
> between calls to getRecords to avoid exceeding the limit on getRecords 
> frequency.
> {quote}
> [https://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html]
> [https://docs.aws.amazon.com/streams/latest/dev/developing-consumers-with-sdk.html]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (BEAM-9136) Add LICENSES and NOTICES to docker images

2020-02-19 Thread Hannah Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-9136 started by Hannah Jiang.
--
> Add LICENSES and NOTICES to docker images
> -
>
> Key: BEAM-9136
> URL: https://issues.apache.org/jira/browse/BEAM-9136
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
> Fix For: 2.20.0
>
>
> Scan dependencies and add licenses and notices of the dependencies to SDK 
> docker images.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=389541&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389541
 ]

ASF GitHub Bot logged work on BEAM-8889:


Author: ASF GitHub Bot
Created on: 19/Feb/20 18:15
Start Date: 19/Feb/20 18:15
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #10769: [BEAM-8889] 
Upgrades gcsio to 2.0.0
URL: https://github.com/apache/beam/pull/10769#issuecomment-588366147
 
 
   Yeah, I'll go ahead and merge since the regular pre-commits and the 
ValidatesRunner test seems to have all passed.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389541)
Remaining Estimate: 148h 40m  (was: 148h 50m)
Time Spent: 19h 20m  (was: 19h 10m)

> Make GcsUtil use GoogleCloudStorage
> ---
>
> Key: BEAM-8889
> URL: https://issues.apache.org/jira/browse/BEAM-8889
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.16.0
>Reporter: Esun Kim
>Assignee: VASU NORI
>Priority: Major
>  Labels: gcs
>   Original Estimate: 168h
>  Time Spent: 19h 20m
>  Remaining Estimate: 148h 40m
>
> [GcsUtil|https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java]
>  is a primary class to access Google Cloud Storage on Apache Beam. Current 
> implementation directly creates GoogleCloudStorageReadChannel and 
> GoogleCloudStorageWriteChannel by itself to read and write GCS data rather 
> than using 
> [GoogleCloudStorage|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java]
>  which is an abstract class providing basic IO capability which eventually 
> creates channel objects. This request is about updating GcsUtil to use 
> GoogleCloudStorage to create read and write channel, which is expected 
> flexible because it can easily pick up the new change; e.g. new channel 
> implementation using new protocol without code change.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=389542&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389542
 ]

ASF GitHub Bot logged work on BEAM-8889:


Author: ASF GitHub Bot
Created on: 19/Feb/20 18:15
Start Date: 19/Feb/20 18:15
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #10769: 
[BEAM-8889] Upgrades gcsio to 2.0.0
URL: https://github.com/apache/beam/pull/10769
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389542)
Remaining Estimate: 148.5h  (was: 148h 40m)
Time Spent: 19.5h  (was: 19h 20m)

> Make GcsUtil use GoogleCloudStorage
> ---
>
> Key: BEAM-8889
> URL: https://issues.apache.org/jira/browse/BEAM-8889
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.16.0
>Reporter: Esun Kim
>Assignee: VASU NORI
>Priority: Major
>  Labels: gcs
>   Original Estimate: 168h
>  Time Spent: 19.5h
>  Remaining Estimate: 148.5h
>
> [GcsUtil|https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java]
>  is a primary class to access Google Cloud Storage on Apache Beam. Current 
> implementation directly creates GoogleCloudStorageReadChannel and 
> GoogleCloudStorageWriteChannel by itself to read and write GCS data rather 
> than using 
> [GoogleCloudStorage|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java]
>  which is an abstract class providing basic IO capability which eventually 
> creates channel objects. This request is about updating GcsUtil to use 
> GoogleCloudStorage to create read and write channel, which is expected 
> flexible because it can easily pick up the new change; e.g. new channel 
> implementation using new protocol without code change.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9136) Add LICENSES and NOTICES to docker images

2020-02-19 Thread Hannah Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hannah Jiang updated BEAM-9136:
---
Fix Version/s: (was: 2.20.0)

> Add LICENSES and NOTICES to docker images
> -
>
> Key: BEAM-9136
> URL: https://issues.apache.org/jira/browse/BEAM-9136
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
>
> Scan dependencies and add licenses and notices of the dependencies to SDK 
> docker images.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9326) JsonToRow transform should not use bounded Wildcards for its input

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9326?focusedWorklogId=389543&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389543
 ]

ASF GitHub Bot logged work on BEAM-9326:


Author: ASF GitHub Bot
Created on: 19/Feb/20 18:17
Start Date: 19/Feb/20 18:17
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #10879: 
[BEAM-9326] Make JsonToRow transform input  instead of 
URL: https://github.com/apache/beam/pull/10879#discussion_r381455475
 
 

 ##
 File path: sdks/java/core/src/test/java/org/apache/beam/sdk/PipelineTest.java
 ##
 @@ -239,8 +239,7 @@ public void testMultipleApply() {
 pipeline.run();
   }
 
-  private static PTransform, 
PCollection> addSuffix(
-  final String suffix) {
+  private static MapElements addSuffix(final String suffix) {
 
 Review comment:
   Here, too.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389543)
Time Spent: 20m  (was: 10m)

> JsonToRow transform should not use bounded Wildcards for its input
> --
>
> Key: BEAM-9326
> URL: https://issues.apache.org/jira/browse/BEAM-9326
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
> Fix For: 2.20.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The JsonToRow PTransform input is a String (a final class in Java) so no 
> reason
> to define a bounded wildcard as its argument.
> We should use  in Beam's codebase only when required by Java
> Generics constraints.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9326) JsonToRow transform should not use bounded Wildcards for its input

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9326?focusedWorklogId=389544&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389544
 ]

ASF GitHub Bot logged work on BEAM-9326:


Author: ASF GitHub Bot
Created on: 19/Feb/20 18:17
Start Date: 19/Feb/20 18:17
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #10879: 
[BEAM-9326] Make JsonToRow transform input  instead of 
URL: https://github.com/apache/beam/pull/10879#discussion_r381455388
 
 

 ##
 File path: sdks/java/core/src/test/java/org/apache/beam/sdk/PipelineTest.java
 ##
 @@ -225,7 +225,7 @@ public void testPipelineSDKExceptionHandling() {
   @Test
   @Category(ValidatesRunner.class)
   public void testMultipleApply() {
-PTransform, PCollection> myTransform 
= addSuffix("+");
+MapElements myTransform = addSuffix("+");
 
 Review comment:
   Suggest changing `? extends String` to just `String` but otherwise leaving 
the type as `PTransform`. This is a good way to indicate that you don't intend 
to use any other facts about the transform.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389544)
Time Spent: 20m  (was: 10m)

> JsonToRow transform should not use bounded Wildcards for its input
> --
>
> Key: BEAM-9326
> URL: https://issues.apache.org/jira/browse/BEAM-9326
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
> Fix For: 2.20.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The JsonToRow PTransform input is a String (a final class in Java) so no 
> reason
> to define a bounded wildcard as its argument.
> We should use  in Beam's codebase only when required by Java
> Generics constraints.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-9211) Spark portable jar test script is missing

2020-02-19 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver resolved BEAM-9211.
---
Fix Version/s: Not applicable
   Resolution: Fixed

> Spark portable jar test script is missing
> -
>
> Key: BEAM-9211
> URL: https://issues.apache.org/jira/browse/BEAM-9211
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-spark
> Fix For: Not applicable
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> beam_PostCommit_PortableJar_Spark has been failing since its creation because 
> I forgot to upload the test script it calls. Whoops.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-9200) Portable job jar postcommits failing

2020-02-19 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver resolved BEAM-9200.
---
Fix Version/s: Not applicable
   Resolution: Fixed

> Portable job jar postcommits failing
> 
>
> Key: BEAM-9200
> URL: https://issues.apache.org/jira/browse/BEAM-9200
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-flink, portability-spark
> Fix For: Not applicable
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> 15:25:58 Execution failed for task 
> ':runners:spark:job-server:testJavaJarCreatorPy37'.
> 15:25:58 > Could not get unknown property 'python_sdk_version' for project 
> ':runners:spark:job-server' of type org.gradle.api.Project.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9276) python: create a class to encapsulate the work required to submit a pipeline to a job service

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9276?focusedWorklogId=389547&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389547
 ]

ASF GitHub Bot logged work on BEAM-9276:


Author: ASF GitHub Bot
Created on: 19/Feb/20 18:27
Start Date: 19/Feb/20 18:27
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #10811: [BEAM-9276] 
Create a class to encapsulate the steps required to submit a pipeline
URL: https://github.com/apache/beam/pull/10811#discussion_r381461755
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/portable_runner.py
 ##
 @@ -74,6 +79,149 @@
 _LOGGER = logging.getLogger(__name__)
 
 
+class PortableJobServicePlan(object):
 
 Review comment:
   yeah, I was unsure on that as well.  Open to input. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389547)
Time Spent: 1h  (was: 50m)

> python: create a class to encapsulate the work required to submit a pipeline 
> to a job service
> -
>
> Key: BEAM-9276
> URL: https://issues.apache.org/jira/browse/BEAM-9276
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> {{PortableRunner.run_pipeline}} is somewhat of a monolithic method for 
> submitting a pipeline.  It would be useful to factor out the code responsible 
> for interacting with the job and artifact services (prepare, stage, run) to 
> make this easier to modify this behavior in portable runner subclasses, as 
> well as in tests. 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9035) Typed options for Row Schema and Fields

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9035?focusedWorklogId=389550&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389550
 ]

ASF GitHub Bot logged work on BEAM-9035:


Author: ASF GitHub Bot
Created on: 19/Feb/20 18:34
Start Date: 19/Feb/20 18:34
Worklog Time Spent: 10m 
  Work Description: alexvanboxel commented on pull request #10413: 
[BEAM-9035] Typed options for Row Schema and Field
URL: https://github.com/apache/beam/pull/10413#discussion_r381465187
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/Schema.java
 ##
 @@ -950,6 +1007,338 @@ public int hashCode() {
 }
   }
 
+  public static class Options implements Serializable {
+private Map options;
+
+@Override
+public String toString() {
+  TreeMap sorted = new TreeMap(options);
+  return "{" + sorted + '}';
+}
+
+Map getAllOptions() {
+  return options;
+}
+
+public Set getOptionNames() {
+  return options.keySet();
+}
+
+public boolean hasOptions() {
+  return options.size() > 0;
+}
+
+@Override
+public boolean equals(Object o) {
+  if (this == o) {
+return true;
+  }
+  if (o == null || getClass() != o.getClass()) {
+return false;
+  }
+  Options options1 = (Options) o;
+  if (!options.keySet().equals(options1.options.keySet())) {
+return false;
+  }
+  for (Map.Entry optionEntry : options.entrySet()) {
+Option thisOption = optionEntry.getValue();
+Option otherOption = options1.options.get(optionEntry.getKey());
+if (!thisOption.getType().equals(otherOption.getType())) {
+  return false;
+}
+switch (thisOption.getType().getTypeName()) {
+  case BYTE:
+  case INT16:
+  case INT32:
+  case INT64:
+  case DECIMAL:
+  case FLOAT:
+  case DOUBLE:
+  case STRING:
+  case DATETIME:
+  case BOOLEAN:
+  case ARRAY:
+  case ITERABLE:
+  case MAP:
+  case ROW:
+  case LOGICAL_TYPE:
+if (!thisOption.getValue().equals(otherOption.getValue())) {
+  return false;
+}
+break;
+  case BYTES:
+if (!Arrays.equals((byte[]) thisOption.getValue(), 
otherOption.getValue())) {
+  return false;
+}
+}
+  }
+  return true;
+}
+
+@Override
+public int hashCode() {
+  return Objects.hash(options);
+}
+
+static class Option implements Serializable {
+  Option(FieldType type, Object value) {
+this.type = type;
+this.value = value;
+  }
+
+  private FieldType type;
+  private Object value;
+
+  @SuppressWarnings("TypeParameterUnusedInFormals")
+   T getValue() {
+return (T) value;
+  }
+
+  FieldType getType() {
+return type;
+  }
+
+  @Override
+  public String toString() {
+return "Option{type=" + type + ", value=" + value + '}';
+  }
+
+  @Override
+  public boolean equals(Object o) {
+if (this == o) {
+  return true;
+}
+if (o == null || getClass() != o.getClass()) {
+  return false;
+}
+Option option = (Option) o;
+return Objects.equals(type, option.type) && Objects.equals(value, 
option.value);
+  }
+
+  @Override
+  public int hashCode() {
+return Objects.hash(type, value);
+  }
+}
+
+public static class Builder {
+  private Map options;
+
+  Builder(Map init) {
+this.options = new HashMap<>(init);
+  }
+
+  Builder() {
+this(new HashMap<>());
+  }
+
+  public Builder setByteOption(String optionName, Byte value) {
+setOption(optionName, FieldType.BYTE, value);
+return this;
+  }
+
+  public Builder setBytesOption(String optionName, byte[] value) {
+setOption(optionName, FieldType.BYTES, value);
+return this;
+  }
+
+  public Builder setInt16Option(String optionName, Short value) {
+setOption(optionName, FieldType.INT16, value);
+return this;
+  }
+
+  public Builder setInt32Option(String optionName, Integer value) {
+setOption(optionName, FieldType.INT32, value);
+return this;
+  }
+
+  public Builder setInt64Option(String optionName, Long value) {
+setOption(optionName, FieldType.INT64, value);
+return this;
+  }
+
+  public Builder setDecimalOption(String optionName, BigDecimal value) {
+setOption(optionName, FieldType.DECIMAL, value);
+return this;
+  }
+
+  public Builder setFloatOption(String optionName, Float value) {
+setOption(optionName, FieldType.FLOAT, value);
+return this;
+  }

[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389551&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389551
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 19/Feb/20 18:35
Start Date: 19/Feb/20 18:35
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on issue #10899: [BEAM-8335] 
Background Caching job
URL: https://github.com/apache/beam/pull/10899#issuecomment-588376040
 
 
   R:  @rohdesamuel,
   
   Hi Sam, could you please take a look at the change for `background caching 
job` without `StreamingCache` in place? Thanks!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389551)
Time Spent: 60h 10m  (was: 60h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 60h 10m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9085) Performance regression in np.random.RandomState() skews performance test results across Python 2/3 on Dataflow

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9085?focusedWorklogId=389555&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389555
 ]

ASF GitHub Bot logged work on BEAM-9085:


Author: ASF GitHub Bot
Created on: 19/Feb/20 18:41
Start Date: 19/Feb/20 18:41
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #10885: [BEAM-9085] 
Fix performance regression in SyntheticSource
URL: https://github.com/apache/beam/pull/10885#discussion_r381469542
 
 

 ##
 File path: sdks/python/apache_beam/testing/synthetic_pipeline.py
 ##
 @@ -418,29 +420,32 @@ def get_range_tracker(self, start_position, 
stop_position):
 return tracker
 
   @staticmethod
-  def random_bytes(length):
+  def random_bytes(length, generator):
 """Return random bytes."""
 return b''.join(
-(struct.pack('B', random.getrandbits(8)) for _ in range(length)))
+(struct.pack('B', generator.getrandbits(8)) for _ in range(length)))
 
-  def _gen_kv_pair(self, index):
-random.seed(index)
-rand = random.random()
+  def _gen_kv_pair(self, generator, index):
+generator.seed(index)
+rand = generator.random()
 
 # Determines whether to generate hot key or not.
 if rand < self._hot_key_fraction:
   # Generate hot key.
   # An integer is randomly selected from the range [0, numHotKeys-1]
   # with equal probability.
-  random.seed(index % self._num_hot_keys)
-return self.random_bytes(self._key_size), self.random_bytes(
-  self._value_size)
+  generator.seed(index % self._num_hot_keys)
+return self.random_bytes(self._key_size, generator), self.random_bytes(
+  self._value_size, generator)
 
   def read(self, range_tracker):
 index = range_tracker.start_position()
+# Get an instance of pseudo-random number generator
+generator = self._generators[(
 
 Review comment:
   I don't see a difference in performance if we instantiate the generator in 
`read`, but don't use the cache. Do we need the cache?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389555)
Time Spent: 1.5h  (was: 1h 20m)

> Performance regression in np.random.RandomState() skews performance test 
> results across Python 2/3 on Dataflow
> --
>
> Key: BEAM-9085
> URL: https://issues.apache.org/jira/browse/BEAM-9085
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Tests show that the performance of core Beam operations in Python 3.x on 
> Dataflow can be a few time slower than in Python 2.7. We should investigate 
> what's the cause of the problem.
> Currently, we have one ParDo test that is run both in Py3 and Py2 [1]. A 
> dashboard with runtime results can be found here [2].
> [1] sdks/python/apache_beam/testing/load_tests/pardo_test.py
> [2] https://apache-beam-testing.appspot.com/explore?dashboard=5678187241537536



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=389556&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389556
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 19/Feb/20 18:42
Start Date: 19/Feb/20 18:42
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on issue #10731: [BEAM-7926] 
Data-centric Interactive Part3
URL: https://github.com/apache/beam/pull/10731#issuecomment-588379763
 
 
   Rebased, @aaltay, could you please help me trigger the retest? The test 
didn't run. Thanks!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389556)
Time Spent: 47.5h  (was: 47h 20m)

> Show PCollection with Interactive Beam in a data-centric user flow
> --
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 47.5h
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline defined as
>  
> {code:java}
> p = beam.Pipeline(InteractiveRunner())
> pcoll = p | 'Transform' >> transform()
> pcoll2 = ...
> pcoll3 = ...{code}
> The use can call a single function and get auto-magical charting of the data.
> e.g.,
> {code:java}
> show(pcoll, pcoll2)
> {code}
> Throughout the process, a pipeline fragment is built to include only 
> transforms necessary to produce the desired pcolls (pcoll and pcoll2) and 
> execute that fragment.
> This makes the Interactive Beam user flow data-centric.
>  
> Detailed 
> [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8280) re-enable IOTypeHints.from_callable

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8280?focusedWorklogId=389561&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389561
 ]

ASF GitHub Bot logged work on BEAM-8280:


Author: ASF GitHub Bot
Created on: 19/Feb/20 19:07
Start Date: 19/Feb/20 19:07
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #10894: [BEAM-8280] Enable and 
improve IOTypeHints debug_str traceback
URL: https://github.com/apache/beam/pull/10894#issuecomment-588393384
 
 
   R: @kennknowles @robertwb 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389561)
Time Spent: 4h 50m  (was: 4h 40m)

> re-enable IOTypeHints.from_callable
> ---
>
> Key: BEAM-8280
> URL: https://issues.apache.org/jira/browse/BEAM-8280
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> See https://issues.apache.org/jira/browse/BEAM-8279



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389562&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389562
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 19/Feb/20 19:08
Start Date: 19/Feb/20 19:08
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on issue #10892: [BEAM-8335] Make 
TestStream to/from runner_api include the output_tags property.
URL: https://github.com/apache/beam/pull/10892#issuecomment-588393723
 
 
   R: @davidyan74 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389562)
Time Spent: 60h 20m  (was: 60h 10m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 60h 20m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8201) clean up the current container API

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8201?focusedWorklogId=389574&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389574
 ]

ASF GitHub Bot logged work on BEAM-8201:


Author: ASF GitHub Bot
Created on: 19/Feb/20 19:23
Start Date: 19/Feb/20 19:23
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #10843: [BEAM-8201] Pass 
all other endpoints through provisioning service.
URL: https://github.com/apache/beam/pull/10843#issuecomment-588401221
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389574)
Time Spent: 1h 40m  (was: 1.5h)

> clean up the current container API
> --
>
> Key: BEAM-8201
> URL: https://issues.apache.org/jira/browse/BEAM-8201
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> From [~robertwb]
> As part of this project, I propose we look at and clean up the current 
> container API before we "release" it as public and stable. IIRC, we currently 
> provide the worker arguments through a combination of (1) environment 
> variables (2) command line parameters to docker and (3) via the provisioning 
> API. It would be good to have a more principled approach to specifying 
> arguments (either all the same way, or if they vary, good reason for doing so 
> rather than by historical accident).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8201) clean up the current container API

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8201?focusedWorklogId=389575&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389575
 ]

ASF GitHub Bot logged work on BEAM-8201:


Author: ASF GitHub Bot
Created on: 19/Feb/20 19:23
Start Date: 19/Feb/20 19:23
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #10843: [BEAM-8201] Pass 
all other endpoints through provisioning service.
URL: https://github.com/apache/beam/pull/10843#issuecomment-588401264
 
 
   Run Python2_PVR_Flink PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389575)
Time Spent: 1h 50m  (was: 1h 40m)

> clean up the current container API
> --
>
> Key: BEAM-8201
> URL: https://issues.apache.org/jira/browse/BEAM-8201
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> From [~robertwb]
> As part of this project, I propose we look at and clean up the current 
> container API before we "release" it as public and stable. IIRC, we currently 
> provide the worker arguments through a combination of (1) environment 
> variables (2) command line parameters to docker and (3) via the provisioning 
> API. It would be good to have a more principled approach to specifying 
> arguments (either all the same way, or if they vary, good reason for doing so 
> rather than by historical accident).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8280) re-enable IOTypeHints.from_callable

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8280?focusedWorklogId=389580&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389580
 ]

ASF GitHub Bot logged work on BEAM-8280:


Author: ASF GitHub Bot
Created on: 19/Feb/20 19:39
Start Date: 19/Feb/20 19:39
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #10904: [BEAM-8280] 
no_annotations decorator
URL: https://github.com/apache/beam/pull/10904
 
 
   Adds `@decorators.no_annotations` decorator to disable on a specific
   function.
   
   Split out of #10717
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommi

[jira] [Work logged] (BEAM-8280) re-enable IOTypeHints.from_callable

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8280?focusedWorklogId=389581&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389581
 ]

ASF GitHub Bot logged work on BEAM-8280:


Author: ASF GitHub Bot
Created on: 19/Feb/20 19:40
Start Date: 19/Feb/20 19:40
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #10904: [BEAM-8280] 
no_annotations decorator
URL: https://github.com/apache/beam/pull/10904#issuecomment-588410361
 
 
   R: @robertwb @kennknowles 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389581)
Time Spent: 5h 10m  (was: 5h)

> re-enable IOTypeHints.from_callable
> ---
>
> Key: BEAM-8280
> URL: https://issues.apache.org/jira/browse/BEAM-8280
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> See https://issues.apache.org/jira/browse/BEAM-8279



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389585&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389585
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 19/Feb/20 19:50
Start Date: 19/Feb/20 19:50
Worklog Time Spent: 10m 
  Work Description: davidyan74 commented on pull request #10892: 
[BEAM-8335] Make TestStream to/from runner_api include the output_tags property.
URL: https://github.com/apache/beam/pull/10892#discussion_r381506690
 
 

 ##
 File path: sdks/python/apache_beam/testing/test_stream.py
 ##
 @@ -133,15 +140,17 @@ def __eq__(self, other):
 return self.new_watermark == other.new_watermark and self.tag == other.tag
 
   def __hash__(self):
-return hash(self.new_watermark)
+return hash(str(self.new_watermark) + str(self.tag))
 
   def __lt__(self, other):
 return self.new_watermark < other.new_watermark
 
   def to_runner_api(self, unused_element_coder):
+tag = 'None' if self.tag is None else self.tag
 
 Review comment:
   Is there a reason why the string 'None' is used as the tag here (instead of 
just None)? Is there a way to distinguish between a non-existent tag and a tag 
named "None"?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389585)
Time Spent: 61h  (was: 60h 50m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 61h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389584&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389584
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 19/Feb/20 19:50
Start Date: 19/Feb/20 19:50
Worklog Time Spent: 10m 
  Work Description: davidyan74 commented on pull request #10892: 
[BEAM-8335] Make TestStream to/from runner_api include the output_tags property.
URL: https://github.com/apache/beam/pull/10892#discussion_r381504168
 
 

 ##
 File path: sdks/python/apache_beam/testing/test_stream.py
 ##
 @@ -276,8 +295,11 @@ def to_runner_api_parameter(self, context):
   @PTransform.register_urn(
   common_urns.primitives.TEST_STREAM.urn,
   beam_runner_api_pb2.TestStreamPayload)
-  def from_runner_api_parameter(payload, context):
+  def from_runner_api_parameter(ptransform, payload, context):
 coder = context.coders.get_by_id(payload.coder_id)
+output_tags = set(
+None if k == 'None' else k for k in ptransform.outputs.keys())
 
 Review comment:
   Following up my other comment, what happens if the user tags their output to 
be the string 'None'?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389584)
Time Spent: 60h 50m  (was: 60h 40m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 60h 50m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389583&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389583
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 19/Feb/20 19:50
Start Date: 19/Feb/20 19:50
Worklog Time Spent: 10m 
  Work Description: davidyan74 commented on pull request #10892: 
[BEAM-8335] Make TestStream to/from runner_api include the output_tags property.
URL: https://github.com/apache/beam/pull/10892#discussion_r381502828
 
 

 ##
 File path: sdks/python/apache_beam/testing/test_stream.py
 ##
 @@ -188,7 +198,16 @@ def _infer_output_coder(self, input_type=None, 
input_coder=None):
   def expand(self, pbegin):
 assert isinstance(pbegin, pvalue.PBegin)
 self.pipeline = pbegin.pipeline
-return pvalue.PCollection(self.pipeline, is_bounded=False)
+if len(self.output_tags) == 0:
+  self.output_tags = set([None])
+
+if len(self.output_tags) == 1:
 
 Review comment:
   Is there a reason why we treat the case with only one output_tag as a 
special case (i.e. not returning a dict) ? Please add the reason as a comment 
in this code.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389583)
Time Spent: 60h 40m  (was: 60.5h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 60h 40m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389582&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389582
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 19/Feb/20 19:50
Start Date: 19/Feb/20 19:50
Worklog Time Spent: 10m 
  Work Description: davidyan74 commented on pull request #10892: 
[BEAM-8335] Make TestStream to/from runner_api include the output_tags property.
URL: https://github.com/apache/beam/pull/10892#discussion_r381501573
 
 

 ##
 File path: sdks/python/apache_beam/testing/test_stream.py
 ##
 @@ -188,7 +198,16 @@ def _infer_output_coder(self, input_type=None, 
input_coder=None):
   def expand(self, pbegin):
 assert isinstance(pbegin, pvalue.PBegin)
 self.pipeline = pbegin.pipeline
-return pvalue.PCollection(self.pipeline, is_bounded=False)
+if len(self.output_tags) == 0:
 
 Review comment:
   nit: this can be simplified to just
   ```
   if not self.output_tags:
   ```
   Google's style guide recommends using boolean context whenever possible. Not 
sure whether there is such a rule in Beam. But since you're already using 
boolean context elsewhere in the code, it's good to be consistent.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389582)
Time Spent: 60.5h  (was: 60h 20m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 60.5h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5605) Support Portable SplittableDoFn for batch

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5605?focusedWorklogId=389586&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389586
 ]

ASF GitHub Bot logged work on BEAM-5605:


Author: ASF GitHub Bot
Created on: 19/Feb/20 19:51
Start Date: 19/Feb/20 19:51
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on pull request #10893: [BEAM-5605] 
Honor the bounded source timestamps timestamp in the SDF wrapper.
URL: https://github.com/apache/beam/pull/10893#discussion_r381507574
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/Read.java
 ##
 @@ -242,18 +243,19 @@ public void splitRestriction(
 }
 
 @NewTracker
-public RestrictionTracker, Object[]> restrictionTracker(
+public RestrictionTracker, TimestampedValue[]> 
restrictionTracker(
 @Restriction BoundedSource restriction, PipelineOptions 
pipelineOptions) {
   return new BoundedSourceAsSDFRestrictionTracker<>(restriction, 
pipelineOptions);
 }
 
 @ProcessElement
 public void processElement(
-RestrictionTracker, Object[]> tracker, 
OutputReceiver receiver)
+RestrictionTracker, TimestampedValue[]> tracker,
+OutputReceiver receiver)
 throws IOException {
-  Object[] out = new Object[1];
+  TimestampedValue[] out = new TimestampedValue[1];
   while (tracker.tryClaim(out)) {
-receiver.output((T) out[0]);
+receiver.outputWithTimestamp(out[0].getValue(), out[0].getTimestamp());
 
 Review comment:
   Is the timestamp for the output timestamp of records from BoundedSource?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389586)
Time Spent: 16h 20m  (was: 16h 10m)

> Support Portable SplittableDoFn for batch
> -
>
> Key: BEAM-5605
> URL: https://issues.apache.org/jira/browse/BEAM-5605
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Scott Wegner
>Assignee: Luke Cwik
>Priority: Major
>  Labels: portability
>  Time Spent: 16h 20m
>  Remaining Estimate: 0h
>
> Roll-up item tracking work towards supporting portable SplittableDoFn for 
> batch



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389587&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389587
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 19/Feb/20 19:55
Start Date: 19/Feb/20 19:55
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10892: [BEAM-8335] 
Make TestStream to/from runner_api include the output_tags property.
URL: https://github.com/apache/beam/pull/10892#discussion_r381495341
 
 

 ##
 File path: sdks/python/apache_beam/testing/test_stream.py
 ##
 @@ -133,15 +140,17 @@ def __eq__(self, other):
 return self.new_watermark == other.new_watermark and self.tag == other.tag
 
   def __hash__(self):
-return hash(self.new_watermark)
+return hash(str(self.new_watermark) + str(self.tag))
 
   def __lt__(self, other):
 return self.new_watermark < other.new_watermark
 
   def to_runner_api(self, unused_element_coder):
+tag = 'None' if self.tag is None else self.tag
 return beam_runner_api_pb2.TestStreamPayload.Event(
 watermark_event=beam_runner_api_pb2.TestStreamPayload.Event.
-AdvanceWatermark(new_watermark=self.new_watermark.micros // 1000))
+AdvanceWatermark(
+new_watermark=self.new_watermark.micros // 1000, tag=tag))
 
 Review comment:
   assert that we aren't losing precision here.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389587)
Time Spent: 61h 10m  (was: 61h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 61h 10m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389588&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389588
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 19/Feb/20 19:55
Start Date: 19/Feb/20 19:55
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10892: [BEAM-8335] 
Make TestStream to/from runner_api include the output_tags property.
URL: https://github.com/apache/beam/pull/10892#discussion_r381501175
 
 

 ##
 File path: sdks/python/apache_beam/testing/test_stream_test.py
 ##
 @@ -528,6 +529,56 @@ def process(
 
 p.run()
 
+  def test_roundtrip_proto(self):
+test_stream = (TestStream()
+   .advance_processing_time(1)
+   .advance_watermark_to(2)
+   .add_elements([1, 2, 3])) # yapf: disable
+
+p = TestPipeline(options=StandardOptions(streaming=True))
+p | test_stream
+
+pipeline_proto, context = p.to_runner_api(return_context=True)
+
+for t in pipeline_proto.components.transforms.values():
+  if t.spec.urn == common_urns.primitives.TEST_STREAM.urn:
+test_stream_proto = t
+
+self.assertTrue(test_stream_proto)
+roundtrip_test_stream = TestStream().from_runner_api(
+test_stream_proto, context)
+
+self.assertListEqual(test_stream._events, roundtrip_test_stream._events)
+self.assertSetEqual(
+test_stream.output_tags, roundtrip_test_stream.output_tags)
+self.assertEqual(test_stream.coder, roundtrip_test_stream.coder)
+
+  def test_roundtrip_proto_multi(self):
+test_stream = (TestStream(output_tags=['a', 'b'])
 
 Review comment:
   I don't think you should need to specify output_tags here since they will 
get added via the add/advance calls below.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389588)
Time Spent: 61h 10m  (was: 61h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 61h 10m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389590&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389590
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 19/Feb/20 19:55
Start Date: 19/Feb/20 19:55
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10892: [BEAM-8335] 
Make TestStream to/from runner_api include the output_tags property.
URL: https://github.com/apache/beam/pull/10892#discussion_r381503725
 
 

 ##
 File path: sdks/python/apache_beam/testing/test_stream.py
 ##
 @@ -171,13 +180,14 @@ class TestStream(PTransform):
   time. After all of the specified elements are emitted, ceases to produce
   output.
   """
-  def __init__(self, coder=coders.FastPrimitivesCoder(), events=None):
+  def __init__(
+  self, coder=coders.FastPrimitivesCoder(), events=None, output_tags=None):
 super(TestStream, self).__init__()
 assert coder is not None
 self.coder = coder
 self.watermarks = {None: timestamp.MIN_TIMESTAMP}
 self._events = [] if events is None else list(events)
 
 Review comment:
   assert that events tags is a subset of output_tags
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389590)
Time Spent: 61.5h  (was: 61h 20m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 61.5h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389589&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389589
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 19/Feb/20 19:55
Start Date: 19/Feb/20 19:55
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10892: [BEAM-8335] 
Make TestStream to/from runner_api include the output_tags property.
URL: https://github.com/apache/beam/pull/10892#discussion_r381501929
 
 

 ##
 File path: sdks/python/apache_beam/testing/test_stream.py
 ##
 @@ -171,13 +180,14 @@ class TestStream(PTransform):
   time. After all of the specified elements are emitted, ceases to produce
   output.
   """
 
 Review comment:
   Why do we need to declare output_tags here?
   Are you trying to allow for outputs that have no events, otherwise shouldn't 
the tags come from the list of events?
   
   The answer here impacts what we should be doing in expand in the no 
output_tags case in expand below.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389589)
Time Spent: 61h 20m  (was: 61h 10m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 61h 20m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389591&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389591
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 19/Feb/20 19:55
Start Date: 19/Feb/20 19:55
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10892: [BEAM-8335] 
Make TestStream to/from runner_api include the output_tags property.
URL: https://github.com/apache/beam/pull/10892#discussion_r381507573
 
 

 ##
 File path: sdks/python/apache_beam/runners/direct/test_stream_impl.py
 ##
 @@ -45,11 +46,69 @@ class _WatermarkController(PTransform):
- If the instance receives an ElementEvent, it emits all specified elements
  to the Global Window with the event time set to the element's timestamp.
   """
+  def __init__(self, output_tag):
+self.output_tag = output_tag
+
   def get_windowing(self, _):
 return core.Windowing(window.GlobalWindows())
 
   def expand(self, pcoll):
-return pvalue.PCollection.from_(pcoll)
+ret = pvalue.PCollection.from_(pcoll)
+ret.tag = self.output_tag
+return ret
+
+
+class _ExpandableTestStream(PTransform):
+  def __init__(self, test_stream):
+self.test_stream = test_stream
+
+  def expand(self, pbegin):
+"""Expands the TestStream into the DirectRunner implementation.
+
+
+Takes the TestStream transform and creates a _TestStream -> multiplexer ->
+_WatermarkController.
+"""
+
+assert isinstance(pbegin, pvalue.PBegin)
+
+# If there is only one tag there is no need to add the multiplexer.
+if len(self.test_stream.output_tags) == 1:
+  return (
+  pbegin
+  | _TestStream(
+  self.test_stream.output_tags,
+  events=self.test_stream._events,
+  coder=self.test_stream.coder)
+  | _WatermarkController(list(self.test_stream.output_tags)[0]))
+
+# This multiplexing the  multiple output PCollections.
 
 Review comment:
   ```suggestion
   # Multiplex to the correct PCollection based upon the event tag.
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389591)
Time Spent: 61.5h  (was: 61h 20m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 61.5h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389592&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389592
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 19/Feb/20 19:55
Start Date: 19/Feb/20 19:55
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10892: [BEAM-8335] 
Make TestStream to/from runner_api include the output_tags property.
URL: https://github.com/apache/beam/pull/10892#discussion_r381505885
 
 

 ##
 File path: sdks/python/apache_beam/runners/direct/direct_runner.py
 ##
 @@ -73,60 +73,12 @@ class SwitchingDirectRunner(PipelineRunner):
   def is_fnapi_compatible(self):
 return BundleBasedDirectRunner.is_fnapi_compatible()
 
-  def apply_TestStream(self, transform, pbegin, options):
 
 Review comment:
   nice cleanup
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389592)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 61.5h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389593&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389593
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 19/Feb/20 19:56
Start Date: 19/Feb/20 19:56
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10892: [BEAM-8335] Make 
TestStream to/from runner_api include the output_tags property.
URL: https://github.com/apache/beam/pull/10892#issuecomment-588418608
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389593)
Time Spent: 61h 40m  (was: 61.5h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 61h 40m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9336) beam_PostCommit_Py_ValCont tests timeout

2020-02-19 Thread Yichi Zhang (Jira)
Yichi Zhang created BEAM-9336:
-

 Summary:  beam_PostCommit_Py_ValCont tests timeout
 Key: BEAM-9336
 URL: https://issues.apache.org/jira/browse/BEAM-9336
 Project: Beam
  Issue Type: Bug
  Components: test-failures
Reporter: Yichi Zhang


_Use this form to file an issue for test failure:_
 * [[https://builds.apache.org/job/beam_PostCommit_Py_ValCont/]]

Initial investigation:

(Add any investigation notes so far)

_After you've filled out the above details, please [assign the issue to an 
individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist].
 Assignee should [treat test failures as 
high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test],
 helping to fix the issue or find a more appropriate owner. See [Apache Beam 
Post-Commit Policies|https://beam.apache.org/contribute/postcommits-policies]._



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9336) beam_PostCommit_Py_ValCont tests timeout

2020-02-19 Thread Yichi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yichi Zhang updated BEAM-9336:
--
Description: 
 
 * [[https://builds.apache.org/job/beam_PostCommit_Py_ValCont/]]

Initial investigation:

The tests seem to fail due to the pytest global timeout.

_After you've filled out the above details, please [assign the issue to an 
individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist].
 Assignee should [treat test failures as 
high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test],
 helping to fix the issue or find a more appropriate owner. See [Apache Beam 
Post-Commit Policies|https://beam.apache.org/contribute/postcommits-policies]._

  was:
_Use this form to file an issue for test failure:_
 * [[https://builds.apache.org/job/beam_PostCommit_Py_ValCont/]]

Initial investigation:

(Add any investigation notes so far)

_After you've filled out the above details, please [assign the issue to an 
individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist].
 Assignee should [treat test failures as 
high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test],
 helping to fix the issue or find a more appropriate owner. See [Apache Beam 
Post-Commit Policies|https://beam.apache.org/contribute/postcommits-policies]._


>  beam_PostCommit_Py_ValCont tests timeout
> -
>
> Key: BEAM-9336
> URL: https://issues.apache.org/jira/browse/BEAM-9336
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Yichi Zhang
>Priority: Minor
>  Labels: currently-failing
>
>  
>  * [[https://builds.apache.org/job/beam_PostCommit_Py_ValCont/]]
> Initial investigation:
> The tests seem to fail due to the pytest global timeout.
> 
> _After you've filled out the above details, please [assign the issue to an 
> individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist].
>  Assignee should [treat test failures as 
> high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test],
>  helping to fix the issue or find a more appropriate owner. See [Apache Beam 
> Post-Commit 
> Policies|https://beam.apache.org/contribute/postcommits-policies]._



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9336) beam_PostCommit_Py_ValCont tests timeout

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9336?focusedWorklogId=389613&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389613
 ]

ASF GitHub Bot logged work on BEAM-9336:


Author: ASF GitHub Bot
Created on: 19/Feb/20 20:42
Start Date: 19/Feb/20 20:42
Worklog Time Spent: 10m 
  Work Description: y1chi commented on pull request #10905: [BEAM-9336] 
Disable pytest timeout on python validate runner tests
URL: https://github.com/apache/beam/pull/10905
 
 
   **Please** add a meaningful description for your change here
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCom

[jira] [Work logged] (BEAM-9336) beam_PostCommit_Py_ValCont tests timeout

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9336?focusedWorklogId=389614&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389614
 ]

ASF GitHub Bot logged work on BEAM-9336:


Author: ASF GitHub Bot
Created on: 19/Feb/20 20:45
Start Date: 19/Feb/20 20:45
Worklog Time Spent: 10m 
  Work Description: y1chi commented on issue #10905: [BEAM-9336] Disable 
pytest timeout on python validate runner tests
URL: https://github.com/apache/beam/pull/10905#issuecomment-588449231
 
 
   R: @udim Udi could you take a look?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389614)
Time Spent: 20m  (was: 10m)

>  beam_PostCommit_Py_ValCont tests timeout
> -
>
> Key: BEAM-9336
> URL: https://issues.apache.org/jira/browse/BEAM-9336
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Yichi Zhang
>Priority: Minor
>  Labels: currently-failing
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
>  
>  * [[https://builds.apache.org/job/beam_PostCommit_Py_ValCont/]]
> Initial investigation:
> The tests seem to fail due to the pytest global timeout.
> 
> _After you've filled out the above details, please [assign the issue to an 
> individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist].
>  Assignee should [treat test failures as 
> high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test],
>  helping to fix the issue or find a more appropriate owner. See [Apache Beam 
> Post-Commit 
> Policies|https://beam.apache.org/contribute/postcommits-policies]._



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9198) BeamSQL aggregation analytics functionality

2020-02-19 Thread Rui Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang updated BEAM-9198:
---
Summary: BeamSQL aggregation analytics functionality   (was: BeamSQL 
aggregation analytics functions )

> BeamSQL aggregation analytics functionality 
> 
>
> Key: BEAM-9198
> URL: https://issues.apache.org/jira/browse/BEAM-9198
> Project: Beam
>  Issue Type: Task
>  Components: dsl-sql
>Reporter: Rui Wang
>Priority: Major
>  Labels: gsoc, gsoc2020, mentor
>
> BeamSQL has a long list of of aggregation/aggregation analytics 
> functionalities to support. 
> To begin with, you will need to support this syntax:
> {code:sql}
> analytic_function_name ( [ argument_list ] )
>   OVER (
> [ PARTITION BY partition_expression_list ]
> [ ORDER BY expression [{ ASC | DESC }] [, ...] ]
> [ window_frame_clause ]
>   )
> {code}
> This will requires touch core components of BeamSQL:
> 1. SQL parser to support the syntax above.
> 2. SQL core to implement physical relational operator.
> 3. Distributed algorithms to implement a list of functions in a distributed 
> manner. 
> 4. Build benchmarks to measure performance of your implementation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9198) BeamSQL aggregation analytics functionality

2020-02-19 Thread Rui Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang updated BEAM-9198:
---
Description: 
BeamSQL has a long list of of aggregation/aggregation analytics functionalities 
to support. 


To begin with, you will need to support this syntax:

{code:sql}
analytic_function_name ( [ argument_list ] )
  OVER (
[ PARTITION BY partition_expression_list ]
[ ORDER BY expression [{ ASC | DESC }] [, ...] ]
[ window_frame_clause ]
  )
{code}




This will requires touch core components of BeamSQL:
1. SQL parser to support the syntax above.
2. SQL core to implement physical relational operator.
3. Distributed algorithms to implement a list of functions in a distributed 
manner. 
4. Build benchmarks to measure performance of your implementation.



To understand what SQL analytics functionality is, you could check this great 
explanation doc: 
https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts.

To know about Beam's programming model, check: 

  was:
BeamSQL has a long list of of aggregation/aggregation analytics functionalities 
to support. 


To begin with, you will need to support this syntax:

{code:sql}
analytic_function_name ( [ argument_list ] )
  OVER (
[ PARTITION BY partition_expression_list ]
[ ORDER BY expression [{ ASC | DESC }] [, ...] ]
[ window_frame_clause ]
  )
{code}




This will requires touch core components of BeamSQL:
1. SQL parser to support the syntax above.
2. SQL core to implement physical relational operator.
3. Distributed algorithms to implement a list of functions in a distributed 
manner. 
4. Build benchmarks to measure performance of your implementation.


> BeamSQL aggregation analytics functionality 
> 
>
> Key: BEAM-9198
> URL: https://issues.apache.org/jira/browse/BEAM-9198
> Project: Beam
>  Issue Type: Task
>  Components: dsl-sql
>Reporter: Rui Wang
>Priority: Major
>  Labels: gsoc, gsoc2020, mentor
>
> BeamSQL has a long list of of aggregation/aggregation analytics 
> functionalities to support. 
> To begin with, you will need to support this syntax:
> {code:sql}
> analytic_function_name ( [ argument_list ] )
>   OVER (
> [ PARTITION BY partition_expression_list ]
> [ ORDER BY expression [{ ASC | DESC }] [, ...] ]
> [ window_frame_clause ]
>   )
> {code}
> This will requires touch core components of BeamSQL:
> 1. SQL parser to support the syntax above.
> 2. SQL core to implement physical relational operator.
> 3. Distributed algorithms to implement a list of functions in a distributed 
> manner. 
> 4. Build benchmarks to measure performance of your implementation.
> To understand what SQL analytics functionality is, you could check this great 
> explanation doc: 
> https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts.
> To know about Beam's programming model, check: 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9198) BeamSQL aggregation analytics functionality

2020-02-19 Thread Rui Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang updated BEAM-9198:
---
Description: 
BeamSQL has a long list of of aggregation/aggregation analytics functionalities 
to support. 


To begin with, you will need to support this syntax:

{code:sql}
analytic_function_name ( [ argument_list ] )
  OVER (
[ PARTITION BY partition_expression_list ]
[ ORDER BY expression [{ ASC | DESC }] [, ...] ]
[ window_frame_clause ]
  )
{code}




This will requires touch core components of BeamSQL:
1. SQL parser to support the syntax above.
2. SQL core to implement physical relational operator.
3. Distributed algorithms to implement a list of functions in a distributed 
manner. 
4. Build benchmarks to measure performance of your implementation.



To understand what SQL analytics functionality is, you could check this great 
explanation doc: 
https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts.

To know about Beam's programming model, check: 
https://beam.apache.org/documentation/programming-guide/#overview



  was:
BeamSQL has a long list of of aggregation/aggregation analytics functionalities 
to support. 


To begin with, you will need to support this syntax:

{code:sql}
analytic_function_name ( [ argument_list ] )
  OVER (
[ PARTITION BY partition_expression_list ]
[ ORDER BY expression [{ ASC | DESC }] [, ...] ]
[ window_frame_clause ]
  )
{code}




This will requires touch core components of BeamSQL:
1. SQL parser to support the syntax above.
2. SQL core to implement physical relational operator.
3. Distributed algorithms to implement a list of functions in a distributed 
manner. 
4. Build benchmarks to measure performance of your implementation.



To understand what SQL analytics functionality is, you could check this great 
explanation doc: 
https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts.

To know about Beam's programming model, check: 


> BeamSQL aggregation analytics functionality 
> 
>
> Key: BEAM-9198
> URL: https://issues.apache.org/jira/browse/BEAM-9198
> Project: Beam
>  Issue Type: Task
>  Components: dsl-sql
>Reporter: Rui Wang
>Priority: Major
>  Labels: gsoc, gsoc2020, mentor
>
> BeamSQL has a long list of of aggregation/aggregation analytics 
> functionalities to support. 
> To begin with, you will need to support this syntax:
> {code:sql}
> analytic_function_name ( [ argument_list ] )
>   OVER (
> [ PARTITION BY partition_expression_list ]
> [ ORDER BY expression [{ ASC | DESC }] [, ...] ]
> [ window_frame_clause ]
>   )
> {code}
> This will requires touch core components of BeamSQL:
> 1. SQL parser to support the syntax above.
> 2. SQL core to implement physical relational operator.
> 3. Distributed algorithms to implement a list of functions in a distributed 
> manner. 
> 4. Build benchmarks to measure performance of your implementation.
> To understand what SQL analytics functionality is, you could check this great 
> explanation doc: 
> https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts.
> To know about Beam's programming model, check: 
> https://beam.apache.org/documentation/programming-guide/#overview



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9021) Flink portable runner postcommit times out

2020-02-19 Thread Kyle Weaver (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17040414#comment-17040414
 ] 

Kyle Weaver commented on BEAM-9021:
---

In the Flink UI, the job is reported as finished.

> Flink portable runner postcommit times out
> --
>
> Key: BEAM-9021
> URL: https://issues.apache.org/jira/browse/BEAM-9021
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink, test-failures
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: currently-failing, portability-flink
>
> Build times appear to be inconsistent, with some builds taking 10-30 mins and 
> others timing out at 2 hours.
> https://builds.apache.org/job/beam_PostCommit_PortableJar_Flink/buildTimeTrend



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9326) JsonToRow transform should not use bounded Wildcards for its input

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9326?focusedWorklogId=389616&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389616
 ]

ASF GitHub Bot logged work on BEAM-9326:


Author: ASF GitHub Bot
Created on: 19/Feb/20 21:05
Start Date: 19/Feb/20 21:05
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #10879: [BEAM-9326] 
Make JsonToRow transform input  instead of 
URL: https://github.com/apache/beam/pull/10879#discussion_r381544015
 
 

 ##
 File path: sdks/java/core/src/test/java/org/apache/beam/sdk/PipelineTest.java
 ##
 @@ -225,7 +225,7 @@ public void testPipelineSDKExceptionHandling() {
   @Test
   @Category(ValidatesRunner.class)
   public void testMultipleApply() {
-PTransform, PCollection> myTransform 
= addSuffix("+");
+MapElements myTransform = addSuffix("+");
 
 Review comment:
   Yes I agree that the PTransform form is more abstract so better, but because 
of the capricious Java bounds checking this is not possible without doing the 
`` so I prefer to revert this change and let only  the 
JsonToRow one, which is the one I care about (and the real goal of this PR).
   Thanks for the review Kenn.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389616)
Time Spent: 0.5h  (was: 20m)

> JsonToRow transform should not use bounded Wildcards for its input
> --
>
> Key: BEAM-9326
> URL: https://issues.apache.org/jira/browse/BEAM-9326
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
> Fix For: 2.20.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The JsonToRow PTransform input is a String (a final class in Java) so no 
> reason
> to define a bounded wildcard as its argument.
> We should use  in Beam's codebase only when required by Java
> Generics constraints.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9326) JsonToRow transform should not use bounded Wildcards for its input

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9326?focusedWorklogId=389618&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389618
 ]

ASF GitHub Bot logged work on BEAM-9326:


Author: ASF GitHub Bot
Created on: 19/Feb/20 21:07
Start Date: 19/Feb/20 21:07
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #10879: [BEAM-9326] 
Make JsonToRow transform input  instead of 
URL: https://github.com/apache/beam/pull/10879
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389618)
Time Spent: 50m  (was: 40m)

> JsonToRow transform should not use bounded Wildcards for its input
> --
>
> Key: BEAM-9326
> URL: https://issues.apache.org/jira/browse/BEAM-9326
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
> Fix For: 2.20.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The JsonToRow PTransform input is a String (a final class in Java) so no 
> reason
> to define a bounded wildcard as its argument.
> We should use  in Beam's codebase only when required by Java
> Generics constraints.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9326) JsonToRow transform should not use bounded Wildcards for its input

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9326?focusedWorklogId=389617&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389617
 ]

ASF GitHub Bot logged work on BEAM-9326:


Author: ASF GitHub Bot
Created on: 19/Feb/20 21:07
Start Date: 19/Feb/20 21:07
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #10879: [BEAM-9326] Make 
JsonToRow transform input  instead of 
URL: https://github.com/apache/beam/pull/10879#issuecomment-588470766
 
 
   Merged manually with the fixes
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389617)
Time Spent: 40m  (was: 0.5h)

> JsonToRow transform should not use bounded Wildcards for its input
> --
>
> Key: BEAM-9326
> URL: https://issues.apache.org/jira/browse/BEAM-9326
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
> Fix For: 2.20.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The JsonToRow PTransform input is a String (a final class in Java) so no 
> reason
> to define a bounded wildcard as its argument.
> We should use  in Beam's codebase only when required by Java
> Generics constraints.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-9326) JsonToRow transform should not use bounded Wildcards for its input

2020-02-19 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía resolved BEAM-9326.

Resolution: Fixed

> JsonToRow transform should not use bounded Wildcards for its input
> --
>
> Key: BEAM-9326
> URL: https://issues.apache.org/jira/browse/BEAM-9326
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
> Fix For: 2.20.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The JsonToRow PTransform input is a String (a final class in Java) so no 
> reason
> to define a bounded wildcard as its argument.
> We should use  in Beam's codebase only when required by Java
> Generics constraints.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9336) beam_PostCommit_Py_ValCont tests timeout

2020-02-19 Thread Udi Meiri (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17040440#comment-17040440
 ] 

Udi Meiri commented on BEAM-9336:
-

I've already opened a bug here for this: 
https://issues.apache.org/jira/browse/BEAM-9271
My conclusion is that it's not pytest but some other slowness.

>  beam_PostCommit_Py_ValCont tests timeout
> -
>
> Key: BEAM-9336
> URL: https://issues.apache.org/jira/browse/BEAM-9336
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Yichi Zhang
>Priority: Minor
>  Labels: currently-failing
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
>  
>  * [[https://builds.apache.org/job/beam_PostCommit_Py_ValCont/]]
> Initial investigation:
> The tests seem to fail due to the pytest global timeout.
> 
> _After you've filled out the above details, please [assign the issue to an 
> individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist].
>  Assignee should [treat test failures as 
> high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test],
>  helping to fix the issue or find a more appropriate owner. See [Apache Beam 
> Post-Commit 
> Policies|https://beam.apache.org/contribute/postcommits-policies]._



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9336) beam_PostCommit_Py_ValCont tests timeout

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9336?focusedWorklogId=389619&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389619
 ]

ASF GitHub Bot logged work on BEAM-9336:


Author: ASF GitHub Bot
Created on: 19/Feb/20 21:10
Start Date: 19/Feb/20 21:10
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #10905: [BEAM-9336] Disable 
pytest timeout on python validate runner tests
URL: https://github.com/apache/beam/pull/10905#issuecomment-588472147
 
 
   Please see comment in bug. I don't believe that pytest is being used at all 
to run these tests
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389619)
Time Spent: 0.5h  (was: 20m)

>  beam_PostCommit_Py_ValCont tests timeout
> -
>
> Key: BEAM-9336
> URL: https://issues.apache.org/jira/browse/BEAM-9336
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Yichi Zhang
>Priority: Minor
>  Labels: currently-failing
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
>  
>  * [[https://builds.apache.org/job/beam_PostCommit_Py_ValCont/]]
> Initial investigation:
> The tests seem to fail due to the pytest global timeout.
> 
> _After you've filled out the above details, please [assign the issue to an 
> individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist].
>  Assignee should [treat test failures as 
> high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test],
>  helping to fix the issue or find a more appropriate owner. See [Apache Beam 
> Post-Commit 
> Policies|https://beam.apache.org/contribute/postcommits-policies]._



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9336) beam_PostCommit_Py_ValCont tests timeout

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9336?focusedWorklogId=389622&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389622
 ]

ASF GitHub Bot logged work on BEAM-9336:


Author: ASF GitHub Bot
Created on: 19/Feb/20 21:12
Start Date: 19/Feb/20 21:12
Worklog Time Spent: 10m 
  Work Description: y1chi commented on issue #10905: [BEAM-9336] Disable 
pytest timeout on python validate runner tests
URL: https://github.com/apache/beam/pull/10905#issuecomment-588472851
 
 
   > Please see comment in bug. I don't believe that pytest is being used at 
all to run these tests
   
   I see, thanks.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389622)
Time Spent: 40m  (was: 0.5h)

>  beam_PostCommit_Py_ValCont tests timeout
> -
>
> Key: BEAM-9336
> URL: https://issues.apache.org/jira/browse/BEAM-9336
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Yichi Zhang
>Priority: Minor
>  Labels: currently-failing
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
>  
>  * [[https://builds.apache.org/job/beam_PostCommit_Py_ValCont/]]
> Initial investigation:
> The tests seem to fail due to the pytest global timeout.
> 
> _After you've filled out the above details, please [assign the issue to an 
> individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist].
>  Assignee should [treat test failures as 
> high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test],
>  helping to fix the issue or find a more appropriate owner. See [Apache Beam 
> Post-Commit 
> Policies|https://beam.apache.org/contribute/postcommits-policies]._



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9336) beam_PostCommit_Py_ValCont tests timeout

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9336?focusedWorklogId=389631&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389631
 ]

ASF GitHub Bot logged work on BEAM-9336:


Author: ASF GitHub Bot
Created on: 19/Feb/20 21:12
Start Date: 19/Feb/20 21:12
Worklog Time Spent: 10m 
  Work Description: y1chi commented on pull request #10905: [BEAM-9336] 
Disable pytest timeout on python validate runner tests
URL: https://github.com/apache/beam/pull/10905
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389631)
Time Spent: 50m  (was: 40m)

>  beam_PostCommit_Py_ValCont tests timeout
> -
>
> Key: BEAM-9336
> URL: https://issues.apache.org/jira/browse/BEAM-9336
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Yichi Zhang
>Priority: Minor
>  Labels: currently-failing
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
>  
>  * [[https://builds.apache.org/job/beam_PostCommit_Py_ValCont/]]
> Initial investigation:
> The tests seem to fail due to the pytest global timeout.
> 
> _After you've filled out the above details, please [assign the issue to an 
> individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist].
>  Assignee should [treat test failures as 
> high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test],
>  helping to fix the issue or find a more appropriate owner. See [Apache Beam 
> Post-Commit 
> Policies|https://beam.apache.org/contribute/postcommits-policies]._



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389621&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389621
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 19/Feb/20 21:12
Start Date: 19/Feb/20 21:12
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #10899: [BEAM-8335] 
Background Caching job
URL: https://github.com/apache/beam/pull/10899#discussion_r381537531
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/background_caching_job.py
 ##
 @@ -19,29 +19,113 @@
 
 For internal use only; no backwards-compatibility guarantees.
 
-A background caching job is a job that caches events for all unbounded sources
-of a given pipeline. With Interactive Beam, one such job is started when a
-pipeline run happens (which produces a main job in contrast to the background
+A background caching job is a job that captures events for all capturable
+sources of a given pipeline. With Interactive Beam, one such job is started 
when
+a pipeline run happens (which produces a main job in contrast to the background
 caching job) and meets the following conditions:
 
-  #. The pipeline contains unbounded sources.
+  #. The pipeline contains capturable sources, configured through
+ interactive_beam.options.capturable_sources.
   #. No such background job is running.
   #. No such background job has completed successfully and the cached events 
are
- still valid (invalidated when unbounded sources change in the pipeline).
+ still valid (invalidated when capturable sources change in the pipeline).
 
 Once started, the background caching job runs asynchronously until it hits some
-cache size limit. Meanwhile, the main job and future main jobs from the 
pipeline
-will run using the deterministic replay-able cached events until they are
-invalidated.
+capture limit configured in interactive_beam.options. Meanwhile, the main job
+and future main jobs from the pipeline will run using the deterministic
+replayable captured events until they are invalidated.
 """
 
 # pytype: skip-file
 
 from __future__ import absolute_import
 
+import logging
+import threading
+import time
+
 import apache_beam as beam
-from apache_beam import runners
 from apache_beam.runners.interactive import interactive_environment as ie
+from apache_beam.runners.interactive.caching import streaming_cache
+from apache_beam.runners.runner import PipelineState
+
+_LOGGER = logging.getLogger(__name__)
+_LOGGER.setLevel(logging.INFO)
+
+
+class BackgroundCachingJob(object):
+  """A simple abstraction that controls necessary components of a timed and
+  [disk] space limited background caching job.
 
 Review comment:
   Why disk is in [] ?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389621)
Time Spent: 62h  (was: 61h 50m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 62h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389630&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389630
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 19/Feb/20 21:12
Start Date: 19/Feb/20 21:12
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #10899: [BEAM-8335] 
Background Caching job
URL: https://github.com/apache/beam/pull/10899#discussion_r381545257
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/interactive_beam.py
 ##
 @@ -34,6 +34,58 @@
 from __future__ import absolute_import
 
 from apache_beam.runners.interactive import interactive_environment as ie
+from apache_beam.runners.interactive.options import interactive_options
+
+
+class Options(interactive_options.InteractiveOptions):
+  """Options that guide how Interactive Beam works."""
+  @property
+  def enable_capture_replay(self):
+"""Whether replayable source data capture should be replayed for multiple
+PCollection evaluations and pipeline runs as long as the data captured is
+still valid."""
+return self.capture_control._enable_capture_replay
+
+  @enable_capture_replay.setter
+  def enable_capture_replay(self, value):
+"""Sets whether source data capture should be replayed. True - Enables
+capture of replayable source data so that following PCollection evaluations
+and pipeline runs always use the same data captured; False - Disables
+capture of replayable source data so that following PCollection evaluation
+and pipeline runs always use new data from sources."""
+self.capture_control._enable_capture_replay = value
+
+  @property
+  def capturable_sources(self):
+"""Interactive Beam automatically captures data from sources in this 
set."""
+return self.capture_control._capturable_sources
+
+  @property
+  def capture_duration(self):
+"""The data capture of sources ends as soon as the background caching job
+has run for this long."""
+return self.capture_control._capture_duration
+
+  @capture_duration.setter
+  def capture_duration(self, value):
+"""Sets the capture duration as a timedelta.
+
+Example::
+
+  # Sets the capture duration limit to 10 seconds.
+  interactive_beam.options.capture_duration = timedelta(seconds=10)
+  # Evicts all captured data if there is any.
+  interactive_beam.evict_captured_data()
+  # The next PCollection evaluation will capture fresh data from sources,
+  # and the data captured will be replayed until another eviction.
+"""
+self.capture_control._capture_duration = value
+
+  # TODO(BEAM-8335): add capture_size options when they are supported.
+
+
+# Users can set options to guide how Interactive Beam works.
+options = Options()
 
 Review comment:
   How do they set it?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389630)
Time Spent: 62h 40m  (was: 62.5h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 62h 40m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389628&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389628
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 19/Feb/20 21:12
Start Date: 19/Feb/20 21:12
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #10899: [BEAM-8335] 
Background Caching job
URL: https://github.com/apache/beam/pull/10899#discussion_r381546507
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/options/capture_control.py
 ##
 @@ -0,0 +1,80 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Module to control how Interactive Beam captures data from sources for
+deterministic replayable PCollection evaluation and pipeline runs.
+
+For internal use only; no backwards-compatibility guarantees.
+"""
+
+# pytype: skip-file
+
+from __future__ import absolute_import
+
+import logging
+from datetime import timedelta
+
+from apache_beam.io.gcp.pubsub import ReadFromPubSub
+from apache_beam.runners.interactive import background_caching_job as bcj
+from apache_beam.runners.interactive import interactive_environment as ie
+
+_LOGGER = logging.getLogger(__name__)
+_LOGGER.setLevel(logging.INFO)
+
+
+class CaptureControl(object):
+  """Options and their utilities that controls how Interactive Beam captures
+  deterministic replayable data from sources."""
+  def __init__(self):
+self._enable_capture_replay = True
+self._capturable_sources = {
+ReadFromPubSub,
+}  # yapf: disable
 
 Review comment:
   why, disable yapf here?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389628)
Time Spent: 62.5h  (was: 62h 20m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 62.5h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389620&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389620
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 19/Feb/20 21:12
Start Date: 19/Feb/20 21:12
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #10899: [BEAM-8335] 
Background Caching job
URL: https://github.com/apache/beam/pull/10899#discussion_r381536109
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/options/capture_control.py
 ##
 @@ -15,6 +15,16 @@
 # limitations under the License.
 #
 
+"""Module to control how Interactive Beam captures data from sources for
+deterministic replayable PCollection evaluation and pipeline runs.
+
+For internal use only; no backwards-compatibility guarantees.
+"""
+
+# pytype: skip-file
 
 Review comment:
   Why?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389620)
Time Spent: 61h 50m  (was: 61h 40m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 61h 50m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


  1   2   3   >