[jira] [Work logged] (BEAM-6428) Allow textual selection syntax for schema fields

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6428?focusedWorklogId=397332&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397332
 ]

ASF GitHub Bot logged work on BEAM-6428:


Author: ASF GitHub Bot
Created on: 04/Mar/20 07:58
Start Date: 04/Mar/20 07:58
Worklog Time Spent: 10m 
  Work Description: alexvanboxel commented on pull request #11025: 
[BEAM-6428] Improve select performance with codegen
URL: https://github.com/apache/beam/pull/11025#discussion_r387499286
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Select.java
 ##
 @@ -28,8 +28,8 @@
 import org.apache.beam.sdk.schemas.FieldAccessDescriptor;
 import org.apache.beam.sdk.schemas.Schema;
 import org.apache.beam.sdk.schemas.utils.RowSelector;
-import org.apache.beam.sdk.schemas.utils.SelectByteBuddyHelpers;
 
 Review comment:
   oeps my bad. You are right (early morning). Resoving, looking further.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397332)
Time Spent: 2h 20m  (was: 2h 10m)

> Allow textual selection syntax for schema fields
> 
>
> Key: BEAM-6428
> URL: https://issues.apache.org/jira/browse/BEAM-6428
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Affects Versions: 2.9.0
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6428) Allow textual selection syntax for schema fields

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6428?focusedWorklogId=397326&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397326
 ]

ASF GitHub Bot logged work on BEAM-6428:


Author: ASF GitHub Bot
Created on: 04/Mar/20 07:44
Start Date: 04/Mar/20 07:44
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #11025: [BEAM-6428] 
Improve select performance with codegen
URL: https://github.com/apache/beam/pull/11025#issuecomment-594372891
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397326)
Time Spent: 2h 10m  (was: 2h)

> Allow textual selection syntax for schema fields
> 
>
> Key: BEAM-6428
> URL: https://issues.apache.org/jira/browse/BEAM-6428
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Affects Versions: 2.9.0
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6428) Allow textual selection syntax for schema fields

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6428?focusedWorklogId=397324&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397324
 ]

ASF GitHub Bot logged work on BEAM-6428:


Author: ASF GitHub Bot
Created on: 04/Mar/20 07:43
Start Date: 04/Mar/20 07:43
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on pull request #11025: [BEAM-6428] 
Improve select performance with codegen
URL: https://github.com/apache/beam/pull/11025#discussion_r387493703
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Select.java
 ##
 @@ -28,8 +28,8 @@
 import org.apache.beam.sdk.schemas.FieldAccessDescriptor;
 import org.apache.beam.sdk.schemas.Schema;
 import org.apache.beam.sdk.schemas.utils.RowSelector;
-import org.apache.beam.sdk.schemas.utils.SelectByteBuddyHelpers;
 
 Review comment:
   I think you are looking at an individual commit.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397324)
Time Spent: 2h  (was: 1h 50m)

> Allow textual selection syntax for schema fields
> 
>
> Key: BEAM-6428
> URL: https://issues.apache.org/jira/browse/BEAM-6428
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Affects Versions: 2.9.0
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6428) Allow textual selection syntax for schema fields

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6428?focusedWorklogId=397323&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397323
 ]

ASF GitHub Bot logged work on BEAM-6428:


Author: ASF GitHub Bot
Created on: 04/Mar/20 07:43
Start Date: 04/Mar/20 07:43
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on pull request #11025: [BEAM-6428] 
Improve select performance with codegen
URL: https://github.com/apache/beam/pull/11025#discussion_r387493491
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Select.java
 ##
 @@ -28,8 +28,8 @@
 import org.apache.beam.sdk.schemas.FieldAccessDescriptor;
 import org.apache.beam.sdk.schemas.Schema;
 import org.apache.beam.sdk.schemas.utils.RowSelector;
-import org.apache.beam.sdk.schemas.utils.SelectByteBuddyHelpers;
 
 Review comment:
   This PR introduces SelectByteBuddyHelpers.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397323)
Time Spent: 1h 50m  (was: 1h 40m)

> Allow textual selection syntax for schema fields
> 
>
> Key: BEAM-6428
> URL: https://issues.apache.org/jira/browse/BEAM-6428
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Affects Versions: 2.9.0
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6428) Allow textual selection syntax for schema fields

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6428?focusedWorklogId=397317&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397317
 ]

ASF GitHub Bot logged work on BEAM-6428:


Author: ASF GitHub Bot
Created on: 04/Mar/20 07:36
Start Date: 04/Mar/20 07:36
Worklog Time Spent: 10m 
  Work Description: alexvanboxel commented on pull request #11025: 
[BEAM-6428] Improve select performance with codegen
URL: https://github.com/apache/beam/pull/11025#discussion_r387491023
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Select.java
 ##
 @@ -28,8 +28,8 @@
 import org.apache.beam.sdk.schemas.FieldAccessDescriptor;
 import org.apache.beam.sdk.schemas.Schema;
 import org.apache.beam.sdk.schemas.utils.RowSelector;
-import org.apache.beam.sdk.schemas.utils.SelectByteBuddyHelpers;
 
 Review comment:
   I was wondering if SelectByteBuddyHelpers is still used after this removal 
(and would be able to be deleted). But I don't find it 
(org.apache.beam.sdk.schemas.utils.SelectByteBuddyHelpers) in the current 
master.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397317)
Time Spent: 1h 40m  (was: 1.5h)

> Allow textual selection syntax for schema fields
> 
>
> Key: BEAM-6428
> URL: https://issues.apache.org/jira/browse/BEAM-6428
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Affects Versions: 2.9.0
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9424) Grouping By LogicalTypes is not supported

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9424?focusedWorklogId=397315&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397315
 ]

ASF GitHub Bot logged work on BEAM-9424:


Author: ASF GitHub Bot
Created on: 04/Mar/20 07:35
Start Date: 04/Mar/20 07:35
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #11034: [BEAM-9424] Allow 
grouping by LogicalType
URL: https://github.com/apache/beam/pull/11034#issuecomment-594370177
 
 
   I'm not sure I understand the problem or the solution. Why don't logical 
types work in Group? Why does creating a new LogicalTypeCoder solve this?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397315)
Time Spent: 2h 50m  (was: 2h 40m)

> Grouping By LogicalTypes is not supported
> -
>
> Key: BEAM-9424
> URL: https://issues.apache.org/jira/browse/BEAM-9424
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.19.0
>Reporter: fdiazgon
>Assignee: fdiazgon
>Priority: Minor
>  Labels: sql
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Creating a schema from a BQ schema that has either TIME, DATE or DATETIME 
> columns, and grouping by one of these fields throws NullPointerException.
> {code:java}
> Pipeline pipeline = Pipeline.create();
> Schema beamSchemaWithLogicalTypes =
> BigQueryUtils.fromTableSchema(
> new TableSchema()
> .setFields(
> Arrays.asList(
> new TableFieldSchema().setName("fTime").setType("TIME"),
> new TableFieldSchema().setName("fDate").setType("DATE"),
> new 
> TableFieldSchema().setName("fDatetime").setType("DATETIME";
> Row row =
> Row.withSchema(beamSchemaWithLogicalTypes)
> .addValues(
> DateTime.parse("2020-02-02"),
> DateTime.parse("2020-02-02"),
> DateTime.parse("2020-02-02T00:00:00"))
> .build();
> PCollection outputRow =
> pipeline
> .apply(Create.of(row))
> .setRowSchema(beamSchemaWithLogicalTypes)
> .apply(
> SqlTransform.query(
> "SELECT fTime, fDate, fDatetime FROM PCOLLECTION GROUP BY 
> fTime, fDate, fDatetime"));
> pipeline.run();
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9396) Docker image names in Jenkins jobs don't match generated ones

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9396?focusedWorklogId=397314&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397314
 ]

ASF GitHub Bot logged work on BEAM-9396:


Author: ASF GitHub Bot
Created on: 04/Mar/20 07:31
Start Date: 04/Mar/20 07:31
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #11026: [BEAM-9396] Fix 
Docker image name in CoGBK test for Python on Flink
URL: https://github.com/apache/beam/pull/11026#issuecomment-594368961
 
 
   Run Load Tests Python CoGBK Flink Batch
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397314)
Time Spent: 2h 40m  (was: 2.5h)

> Docker image names in Jenkins jobs don't match generated ones
> -
>
> Key: BEAM-9396
> URL: https://issues.apache.org/jira/browse/BEAM-9396
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Michał Walenia
>Assignee: Michał Walenia
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9396) Docker image names in Jenkins jobs don't match generated ones

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9396?focusedWorklogId=397311&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397311
 ]

ASF GitHub Bot logged work on BEAM-9396:


Author: ASF GitHub Bot
Created on: 04/Mar/20 07:24
Start Date: 04/Mar/20 07:24
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #11026: [BEAM-9396] Fix 
Docker image name in CoGBK test for Python on Flink
URL: https://github.com/apache/beam/pull/11026#issuecomment-594366639
 
 
   run seed job
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397311)
Time Spent: 2.5h  (was: 2h 20m)

> Docker image names in Jenkins jobs don't match generated ones
> -
>
> Key: BEAM-9396
> URL: https://issues.apache.org/jira/browse/BEAM-9396
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Michał Walenia
>Assignee: Michał Walenia
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=397262&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397262
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 04/Mar/20 05:35
Start Date: 04/Mar/20 05:35
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10497: [BEAM-8335] Add the 
ReverseTestStream
URL: https://github.com/apache/beam/pull/10497#issuecomment-594336522
 
 
   Run Python 3.7 PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397262)
Time Spent: 87h 40m  (was: 87.5h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 87h 40m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=397243&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397243
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 04/Mar/20 04:36
Start Date: 04/Mar/20 04:36
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10497: [BEAM-8335] Add the 
ReverseTestStream
URL: https://github.com/apache/beam/pull/10497#issuecomment-594322268
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397243)
Time Spent: 87.5h  (was: 87h 20m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 87.5h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=397241&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397241
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 04/Mar/20 04:33
Start Date: 04/Mar/20 04:33
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10497: [BEAM-8335] Add the 
ReverseTestStream
URL: https://github.com/apache/beam/pull/10497#issuecomment-594321561
 
 
   Run Python 3.7 PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397241)
Time Spent: 87h 20m  (was: 87h 10m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 87h 20m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-6831) python sdk WriteToBigQuery excessive usage of metered API

2020-03-03 Thread Keiji Yoshida (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-6831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17050880#comment-17050880
 ] 

Keiji Yoshida commented on BEAM-6831:
-

That is because the bigquery.tables.get API is called every time a bundle in a 
PCollection is processed in Apache Beam 2.10.0 
([code|https://github.com/apache/beam/blob/v2.10.0/sdks/python/apache_beam/io/gcp/bigquery.py#L1365-L1367]).

In the latest version of Apache Beam (2.19.0), the bigquery.tables.get API is 
not called as long as `create_disposition` is set to `CREATE_NEVER` 
([code|https://github.com/apache/beam/blob/v2.19.0/sdks/python/apache_beam/io/gcp/bigquery.py#L989-L1009]).
 So, you can avoid the rate limit error by using Apache Beam 2.19.0 and setting 
`create_disposition` to `CREATE_NEVER`.

> python sdk WriteToBigQuery excessive usage of metered API
> -
>
> Key: BEAM-6831
> URL: https://issues.apache.org/jira/browse/BEAM-6831
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.10.0
>Reporter: Pesach Weinstock
>Assignee: Pablo Estrada
>Priority: Major
>  Labels: bigquery, dataflow, gcp, python
> Attachments: apache-beam-py-sdk-gcp-bq-api-issue.png
>
>
> Right now, there is a potential issue with the python sdk where 
> {{beam.io.gcp.bigquery.WriteToBigQuery}} calls the following api more often 
> than needed:
> [https://www.googleapis.com/bigquery/v2/projects//datasets//tables/?alt=json|https://www.googleapis.com/bigquery/v2/projects/%3Cproject-name%3E/datasets/%3Cdataset-name%3E/tables/%3Ctable-name%3E?alt=json]
> The above request falls under specific bigquery API quotas which are excluded 
> from bigquery streaming inserts. When used in a streaming pipeline, we hit 
> this quota pretty quickly, and cannot proceed to write any further data to 
> bigquery.
> Dispositions being used are:
>  * create_disposition: {{beam.io.BigQueryDisposition.CREATE_NEVER}}
>  * write_disposition: {{beam.io.BigQueryDisposition.WRITE_APPEND}}
> This is currently blocking us from using bigqueryIO in a streaming pipeline 
> to write to bigquery, and required us to formally request an API quota 
> increase from Google to temporarily correct the situation.
> Our pipeline uses DataflowRunner. Error seen is below, and in attached 
> screenshot of stackdriver trace.
> {code:java}
>   "errors": [
> {
>   "message": "Exceeded rate limits: too many api requests per user per 
> method for this user_method. For more information, see 
> https://cloud.google.com/bigquery/troubleshooting-errors";,
>   "domain": "usageLimits",
>   "reason": "rateLimitExceeded"
> }
>   ],
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9413) [beam_PostCommit_Py_ValCont] build failed

2020-03-03 Thread Rui Wang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17050757#comment-17050757
 ] 

Rui Wang commented on BEAM-9413:


[~hannahjiang]

I will see how if I can include your PR into RC0.

> [beam_PostCommit_Py_ValCont] build failed
> -
>
> Key: BEAM-9413
> URL: https://issues.apache.org/jira/browse/BEAM-9413
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Yueyang Qiu
>Assignee: Hannah Jiang
>Priority: Major
>  Labels: currently-failing
> Fix For: 2.20.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> See [https://builds.apache.org/job/beam_PostCommit_Py_ValCont/5706/]
> Error:
>  
> *16:12:13* The push refers to repository 
> [us.gcr.io/apache-beam-testing/jenkins/python2.7_sdk]*16:12:13* An image does 
> not exist locally with the tag: 
> us.gcr.io/apache-beam-testing/jenkins/python2.7_sdk*16:12:14* Build step 
> 'Execute shell' marked build as failure*16:12:15* Sending e-mails to: 
> bui...@beam.apache.org*16:12:15* Recording test results*16:12:16* ERROR: Step 
> 'Publish JUnit test result report' failed: No test report files were found. 
> Configuration error?*16:12:18* No emails were triggered.*16:12:18* Finished: 
> FAILURE



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=397209&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397209
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 04/Mar/20 03:59
Start Date: 04/Mar/20 03:59
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10497: [BEAM-8335] Add the 
ReverseTestStream
URL: https://github.com/apache/beam/pull/10497#issuecomment-594313852
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397209)
Time Spent: 87h 10m  (was: 87h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 87h 10m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9432) Create a separate expansion service package.

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9432?focusedWorklogId=397208&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397208
 ]

ASF GitHub Bot logged work on BEAM-9432:


Author: ASF GitHub Bot
Created on: 04/Mar/20 03:55
Start Date: 04/Mar/20 03:55
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #11035: [BEAM-9432] Move 
expansion service into its own project.
URL: https://github.com/apache/beam/pull/11035#issuecomment-594312772
 
 
   Run XVR_Flink PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397208)
Time Spent: 20m  (was: 10m)

> Create a separate expansion service package.
> 
>
> Key: BEAM-9432
> URL: https://issues.apache.org/jira/browse/BEAM-9432
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model, sdk-java-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7304) Twister2 Beam runner

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7304?focusedWorklogId=397204&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397204
 ]

ASF GitHub Bot logged work on BEAM-7304:


Author: ASF GitHub Bot
Created on: 04/Mar/20 03:30
Start Date: 04/Mar/20 03:30
Worklog Time Spent: 10m 
  Work Description: pulasthi commented on issue #10888: [BEAM-7304] 
Twister2 Beam runner
URL: https://github.com/apache/beam/pull/10888#issuecomment-594306907
 
 
   @iemejia Thanks, no inconvenience at all. Hopefully, we can get this into 
2.21 as you mentioned
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397204)
Time Spent: 13h 40m  (was: 13.5h)

> Twister2 Beam runner
> 
>
> Key: BEAM-7304
> URL: https://issues.apache.org/jira/browse/BEAM-7304
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-ideas
>Reporter: Pulasthi Wickramasinghe
>Assignee: Pulasthi Wickramasinghe
>Priority: Minor
> Fix For: 2.21.0
>
>  Time Spent: 13h 40m
>  Remaining Estimate: 0h
>
> Twister2 is a big data framework which supports both batch and stream 
> processing [1] [2]. The goal is to develop an beam runner for Twister2. 
> [1] [https://github.com/DSC-SPIDAL/twister2]
> [2] [https://twister2.org/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=397200&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397200
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 04/Mar/20 02:52
Start Date: 04/Mar/20 02:52
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #11020: [BEAM-7926] 
Update Data Visualization
URL: https://github.com/apache/beam/pull/11020#discussion_r387422917
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/interactive_environment.py
 ##
 @@ -43,6 +43,55 @@
 
 _LOGGER = logging.getLogger(__name__)
 
+# By `format(customized_script=xxx)`, the given `customized_script` is
+# guaranteed to be executed within access to a jquery with datatable plugin
+# configured which is useful so that any `customized_script` is resilient to
+# browser refresh. Inside `customized_script`, use `$` as jQuery.
+_JQUERY_WITH_DATATABLE_TEMPLATE = """
+if (typeof window.jquery341 == 'undefined') {{
+  var jqueryScript = document.createElement('script');
+  jqueryScript.src = 
'https://code.jquery.com/jquery-3.4.1.slim.min.js';
+  jqueryScript.type = 'text/javascript';
+  jqueryScript.onload = function() {{
+var datatableScript = document.createElement('script');
+datatableScript.src = 
'https://cdn.datatables.net/1.10.20/js/jquery.dataTables.min.js';
+datatableScript.type = 'text/javascript';
+datatableScript.onload = function() {{
+  window.jquery341 = jQuery.noConflict(true);
+  window.jquery341(document).ready(function($){{
+{customized_script}
+  }});
+}}
+document.head.appendChild(datatableScript);
+  }};
+  document.head.appendChild(jqueryScript);
+}} else {{
+  window.jquery341(document).ready(function($){{
+{customized_script}
+  }});
+}}"""
+
+_HTML_IMPORT_TEMPLATE = """
 
 Review comment:
   Added comments.
   
   https://developer.mozilla.org/en-US/docs/Web/Web_Components/HTML_Imports 
explains it.
   
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397200)
Time Spent: 54h 20m  (was: 54h 10m)

> Show PCollection with Interactive Beam in a data-centric user flow
> --
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 54h 20m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline defined as
>  
> {code:java}
> p = beam.Pipeline(InteractiveRunner())
> pcoll = p | 'Transform' >> transform()
> pcoll2 = ...
> pcoll3 = ...{code}
> The use can call a single function and get auto-magical charting of the data.
> e.g.,
> {code:java}
> show(pcoll, pcoll2)
> {code}
> Throughout the process, a pipeline fragment is built to include only 
> transforms necessary to produce the desired pcolls (pcoll and pcoll2) and 
> execute that fragment.
> This makes the Interactive Beam user flow data-centric.
>  
> Detailed 
> [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=397197&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397197
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 04/Mar/20 02:47
Start Date: 04/Mar/20 02:47
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #11020: [BEAM-7926] 
Update Data Visualization
URL: https://github.com/apache/beam/pull/11020#discussion_r387420719
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/interactive_environment.py
 ##
 @@ -43,6 +43,55 @@
 
 _LOGGER = logging.getLogger(__name__)
 
+# By `format(customized_script=xxx)`, the given `customized_script` is
+# guaranteed to be executed within access to a jquery with datatable plugin
+# configured which is useful so that any `customized_script` is resilient to
+# browser refresh. Inside `customized_script`, use `$` as jQuery.
+_JQUERY_WITH_DATATABLE_TEMPLATE = """
+if (typeof window.jquery341 == 'undefined') {{
 
 Review comment:
   I've changed it to `interactive_beam_jquery`.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397197)
Time Spent: 54h 10m  (was: 54h)

> Show PCollection with Interactive Beam in a data-centric user flow
> --
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 54h 10m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline defined as
>  
> {code:java}
> p = beam.Pipeline(InteractiveRunner())
> pcoll = p | 'Transform' >> transform()
> pcoll2 = ...
> pcoll3 = ...{code}
> The use can call a single function and get auto-magical charting of the data.
> e.g.,
> {code:java}
> show(pcoll, pcoll2)
> {code}
> Throughout the process, a pipeline fragment is built to include only 
> transforms necessary to produce the desired pcolls (pcoll and pcoll2) and 
> execute that fragment.
> This makes the Interactive Beam user flow data-centric.
>  
> Detailed 
> [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=397195&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397195
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 04/Mar/20 02:42
Start Date: 04/Mar/20 02:42
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #11020: [BEAM-7926] 
Update Data Visualization
URL: https://github.com/apache/beam/pull/11020#discussion_r387419572
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py
 ##
 @@ -238,31 +322,57 @@ def _display_dive(self, data, update=None):
   display(HTML(html))
 
   def _display_overview(self, data, update=None):
+if (not data.empty and self._include_window_info and
+all(column in data.columns
+for column in ('event_time', 'windows', 'pane_info'))):
+  data = data.drop(['event_time', 'windows', 'pane_info'], axis=1)
+
 gfsg = GenericFeatureStatisticsGenerator()
 proto = gfsg.ProtoFromDataFrames([{'name': 'data', 'table': data}])
 protostr = base64.b64encode(proto.SerializeToString()).decode('utf-8')
 if update:
   script = _OVERVIEW_SCRIPT_TEMPLATE.format(
-  display_id=update, protostr=protostr)
+  display_id=update._overview_display_id, protostr=protostr)
   display_javascript(Javascript(script))
 else:
   html = _OVERVIEW_HTML_TEMPLATE.format(
   display_id=self._overview_display_id, protostr=protostr)
   display(HTML(html))
 
   def _display_dataframe(self, data, update=None):
-if update:
-  table_id = 'table_{}'.format(update)
-  html = _DATAFRAME_PAGINATION_TEMPLATE.format(
-  dataframe_html=data.to_html(notebook=True, table_id=table_id),
-  table_id=table_id)
-  update_display(HTML(html), display_id=update)
+table_id = 'table_{}'.format(
+update._df_display_id if update else self._df_display_id)
+columns = [{
+'title': ''
+}] + [{
+'title': str(column)
+} for column in data.columns]
+format_window_info_in_dataframe(data)
+rows = data.applymap(lambda x: str(x)).to_dict('split')['data']
 
 Review comment:
   Added the comments.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397195)
Time Spent: 54h  (was: 53h 50m)

> Show PCollection with Interactive Beam in a data-centric user flow
> --
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 54h
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline defined as
>  
> {code:java}
> p = beam.Pipeline(InteractiveRunner())
> pcoll = p | 'Transform' >> transform()
> pcoll2 = ...
> pcoll3 = ...{code}
> The use can call a single function and get auto-magical charting of the data.
> e.g.,
> {code:java}
> show(pcoll, pcoll2)
> {code}
> Throughout the process, a pipeline fragment is built to include only 
> transforms necessary to produce the desired pcolls (pcoll and pcoll2) and 
> execute that fragment.
> This makes the Interactive Beam user flow data-centric.
>  
> Detailed 
> [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=397194&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397194
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 04/Mar/20 02:37
Start Date: 04/Mar/20 02:37
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #11020: [BEAM-7926] 
Update Data Visualization
URL: https://github.com/apache/beam/pull/11020#discussion_r387415941
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py
 ##
 @@ -52,42 +56,89 @@
 except ImportError:
   _pcoll_visualization_ready = False
 
+_LOGGER = logging.getLogger(__name__)
+
 # 1-d types that need additional normalization to be compatible with DataFrame.
 _one_dimension_types = (int, float, str, bool, list, tuple)
 
+_CSS = """
+
+.p-Widget.jp-OutputPrompt.jp-OutputArea-prompt:empty {{
+  padding: 0;
+  border: 0;
+}}
+
.p-Widget.jp-RenderedJavaScript.jp-mod-trusted.jp-OutputArea-output:empty {{
+  padding: 0;
+  border: 0;
+}}
+"""
 _DIVE_SCRIPT_TEMPLATE = """
-document.querySelector("#{display_id}").data = {jsonstr};"""
-_DIVE_HTML_TEMPLATE = """
+try {{
+  document.querySelector("#{display_id}").data = {jsonstr};
+}} catch (e) {{
+  console.log("#{display_id} is not rendered yet.");
+}}"""
+_DIVE_HTML_TEMPLATE = _CSS + """
 https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/1.3.3/webcomponents-lite.js";>
 https://raw.githubusercontent.com/PAIR-code/facets/1.0.0/facets-dist/facets-jupyter.html";>
 
 
   document.querySelector("#{display_id}").data = {jsonstr};
 """
 _OVERVIEW_SCRIPT_TEMPLATE = """
-  document.querySelector("#{display_id}").protoInput = 
"{protostr}";
-  """
-_OVERVIEW_HTML_TEMPLATE = """
+  try {{
+document.querySelector("#{display_id}").protoInput = 
"{protostr}";
+  }} catch (e) {{
+console.log("#{display_id} is not rendered yet.");
 
 Review comment:
   If this fails, it means the initially displayed widgets have been cleared 
from the DOM or the initial display hasn't completed yet (maybe because of some 
racing conditions). NOOP should be the right way to handle it because it either 
means the user has cleared the output or the script has no target to execute on.
   
   The error is supposed to be logged in the console. However, if not caught, 
it also gets displayed in notebook output areas. By doing this, we kept the log 
and also avoid the output area pollution.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397194)
Time Spent: 53h 50m  (was: 53h 40m)

> Show PCollection with Interactive Beam in a data-centric user flow
> --
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 53h 50m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline defined as
>  
> {code:java}
> p = beam.Pipeline(InteractiveRunner())
> pcoll = p | 'Transform' >> transform()
> pcoll2 = ...
> pcoll3 = ...{code}
> The use can call a single function and get auto-magical charting of the data.
> e.g.,
> {code:java}
> show(pcoll, pcoll2)
> {code}
> Throughout the process, a pipeline fragment is built to include only 
> transforms necessary to produce the desired pcolls (pcoll and pcoll2) and 
> execute that fragment.
> This makes the Interactive Beam user flow data-centric.
>  
> Detailed 
> [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9383) Staging Dataflow artifacts from environment

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9383?focusedWorklogId=397190&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397190
 ]

ASF GitHub Bot logged work on BEAM-9383:


Author: ASF GitHub Bot
Created on: 04/Mar/20 02:30
Start Date: 04/Mar/20 02:30
Worklog Time Spent: 10m 
  Work Description: ihji commented on pull request #11039: [BEAM-9383] 
Staging Dataflow artifacts from environment
URL: https://github.com/apache/beam/pull/11039
 
 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![B

[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=397189&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397189
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 04/Mar/20 02:29
Start Date: 04/Mar/20 02:29
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #11020: [BEAM-7926] 
Update Data Visualization
URL: https://github.com/apache/beam/pull/11020#discussion_r387415941
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py
 ##
 @@ -52,42 +56,89 @@
 except ImportError:
   _pcoll_visualization_ready = False
 
+_LOGGER = logging.getLogger(__name__)
+
 # 1-d types that need additional normalization to be compatible with DataFrame.
 _one_dimension_types = (int, float, str, bool, list, tuple)
 
+_CSS = """
+
+.p-Widget.jp-OutputPrompt.jp-OutputArea-prompt:empty {{
+  padding: 0;
+  border: 0;
+}}
+
.p-Widget.jp-RenderedJavaScript.jp-mod-trusted.jp-OutputArea-output:empty {{
+  padding: 0;
+  border: 0;
+}}
+"""
 _DIVE_SCRIPT_TEMPLATE = """
-document.querySelector("#{display_id}").data = {jsonstr};"""
-_DIVE_HTML_TEMPLATE = """
+try {{
+  document.querySelector("#{display_id}").data = {jsonstr};
+}} catch (e) {{
+  console.log("#{display_id} is not rendered yet.");
+}}"""
+_DIVE_HTML_TEMPLATE = _CSS + """
 https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/1.3.3/webcomponents-lite.js";>
 https://raw.githubusercontent.com/PAIR-code/facets/1.0.0/facets-dist/facets-jupyter.html";>
 
 
   document.querySelector("#{display_id}").data = {jsonstr};
 """
 _OVERVIEW_SCRIPT_TEMPLATE = """
-  document.querySelector("#{display_id}").protoInput = 
"{protostr}";
-  """
-_OVERVIEW_HTML_TEMPLATE = """
+  try {{
+document.querySelector("#{display_id}").protoInput = 
"{protostr}";
+  }} catch (e) {{
+console.log("#{display_id} is not rendered yet.");
 
 Review comment:
   If this fails, it means the initially displayed widgets have been cleared 
from the DOM or the initial display hasn't completed yet (maybe because of some 
racing conditions). NOOP should be the right way to do so.
   
   The error is supposed to be logged in the console. However, if not caught, 
it also gets displayed in notebook output areas. By doing this, we kept the log 
and also avoid the output area pollution.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397189)
Time Spent: 53h 40m  (was: 53.5h)

> Show PCollection with Interactive Beam in a data-centric user flow
> --
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 53h 40m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline defined as
>  
> {code:java}
> p = beam.Pipeline(InteractiveRunner())
> pcoll = p | 'Transform' >> transform()
> pcoll2 = ...
> pcoll3 = ...{code}
> The use can call a single function and get auto-magical charting of the data.
> e.g.,
> {code:java}
> show(pcoll, pcoll2)
> {code}
> Throughout the process, a pipeline fragment is built to include only 
> transforms necessary to produce the desired pcolls (pcoll and pcoll2) and 
> execute that fragment.
> This makes the Interactive Beam user flow data-centric.
>  
> Detailed 
> [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=397185&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397185
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 04/Mar/20 02:26
Start Date: 04/Mar/20 02:26
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #11020: [BEAM-7926] 
Update Data Visualization
URL: https://github.com/apache/beam/pull/11020#discussion_r387414898
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py
 ##
 @@ -52,42 +56,89 @@
 except ImportError:
   _pcoll_visualization_ready = False
 
+_LOGGER = logging.getLogger(__name__)
+
 # 1-d types that need additional normalization to be compatible with DataFrame.
 _one_dimension_types = (int, float, str, bool, list, tuple)
 
+_CSS = """
+
+.p-Widget.jp-OutputPrompt.jp-OutputArea-prompt:empty {{
+  padding: 0;
+  border: 0;
+}}
+
.p-Widget.jp-RenderedJavaScript.jp-mod-trusted.jp-OutputArea-output:empty {{
+  padding: 0;
+  border: 0;
+}}
+"""
 _DIVE_SCRIPT_TEMPLATE = """
-document.querySelector("#{display_id}").data = {jsonstr};"""
-_DIVE_HTML_TEMPLATE = """
+try {{
+  document.querySelector("#{display_id}").data = {jsonstr};
+}} catch (e) {{
+  console.log("#{display_id} is not rendered yet.");
+}}"""
+_DIVE_HTML_TEMPLATE = _CSS + """
 https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/1.3.3/webcomponents-lite.js";>
 https://raw.githubusercontent.com/PAIR-code/facets/1.0.0/facets-dist/facets-jupyter.html";>
 
 
   document.querySelector("#{display_id}").data = {jsonstr};
 """
 _OVERVIEW_SCRIPT_TEMPLATE = """
-  document.querySelector("#{display_id}").protoInput = 
"{protostr}";
-  """
-_OVERVIEW_HTML_TEMPLATE = """
+  try {{
+document.querySelector("#{display_id}").protoInput = 
"{protostr}";
+  }} catch (e) {{
+console.log("#{display_id} is not rendered yet.");
+  }}"""
+_OVERVIEW_HTML_TEMPLATE = _CSS + """
 https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/1.3.3/webcomponents-lite.js";>
 https://raw.githubusercontent.com/PAIR-code/facets/1.0.0/facets-dist/facets-jupyter.html";>
 
 

[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=397186&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397186
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 04/Mar/20 02:26
Start Date: 04/Mar/20 02:26
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #11038: [BEAM-7746] 
More typing fixes
URL: https://github.com/apache/beam/pull/11038
 
 
   Another round of fixes.  
   
   After this and the filesystem PR are merged, there will be only 32 errors 
remaining.  I'd like to hand the remainder of the errors over to @robertwb and 
@udim to resolve, since many of the require some specific Beam knowledge to 
solve.   I will of course be happy to help (we could connect on slack via the 
Beam channel?).
   
   Then we can put the PythonLint job into effect!
   
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spar

[jira] [Commented] (BEAM-9434) Performance improvements processiong a large number of Avro files in S3+Spark

2020-03-03 Thread Emiliano Capoccia (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17050688#comment-17050688
 ] 

Emiliano Capoccia commented on BEAM-9434:
-

In the case outlined of a large number of very small (kb) avro files, the idea 
is to expose a new hint in the AvroIO class that can handle the reading of the 
input files with a pre determined number of parallel tasks.

Both extremes of having a very high or a very low number of tasks should be 
avoided, as they are suboptimal in terms of performance: too many tasks yield 
to very high overhead whereas a too few tasks (or a single one) result in an 
unacceptable serialisation of reading on too little node, with the cluster 
being under utilised. 

In the tests that I carried out, I was reading 6578 Avro files from S3, each 
containing a single record.

The performance of the reading using the proposed pull request #11037 improved 
using 10 partitions, from 16 minutes to 2.3 minutes for performing the same 
exact work.

Even more importantly, the memory used by every node is 1/10th roughly of the 
case with a single node.

*Reference run*, 6578 files, 1 task/executor, shuffle read 164kb, 6578 records, 
shuffle write 58Mb, 16 minutes execution time.

*PR #11037*, 10 tasks/executors, 660 files per task average, totalling 6578; 
23kb average shuffle read per task, 6 Mb average shuffle write per task, 2.3 
minutes execution time per executor in parallel.

> Performance improvements processiong a large number of Avro files in S3+Spark
> -
>
> Key: BEAM-9434
> URL: https://issues.apache.org/jira/browse/BEAM-9434
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-aws, sdk-java-core
>Affects Versions: 2.19.0
>Reporter: Emiliano Capoccia
>Assignee: Emiliano Capoccia
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> There is a performance issue when processing in Spark on K8S a large number 
> of small Avro files (tens of thousands or more).
> The recommended way of reading a pattern of Avro files in Beam is by means of:
>  
> {code:java}
> PCollection records = p.apply(AvroIO.read(AvroGenClass.class)
> .from("s3://my-bucket/path-to/*.avro").withHintMatchesManyFiles())
> {code}
> However, in the case of many small files the above results in the entire 
> reading taking place in a single task/node, which is considerably slow and 
> has scalability issues.
> The option of omitting the hint is not viable, as it results in too many 
> tasks being spawn and the cluster busy doing coordination of tiny tasks with 
> high overhead.
> There are a few workarounds on the internet which mainly revolve around 
> compacting the input files before processing, so that a reduced number of 
> bulky files is processed in parallel.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9415) fix postcommit xlang validate runner

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9415?focusedWorklogId=397166&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397166
 ]

ASF GitHub Bot logged work on BEAM-9415:


Author: ASF GitHub Bot
Created on: 04/Mar/20 01:22
Start Date: 04/Mar/20 01:22
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #11018: 
[BEAM-9415] fix postcommit xvr tests
URL: https://github.com/apache/beam/pull/11018
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397166)
Time Spent: 1.5h  (was: 1h 20m)

> fix postcommit xlang validate runner
> 
>
> Key: BEAM-9415
> URL: https://issues.apache.org/jira/browse/BEAM-9415
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> broken since [https://builds.apache.org/job/beam_PostCommit_XVR_Flink/1838/]
> The proposed PR checks whether the coder is compatible with Java SDK before 
> rehydrating it from expanded components. The coder id renaming is only needed 
> for Java SDK compatible coders so the change is safe.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9434) Performance improvements processiong a large number of Avro files in S3+Spark

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9434?focusedWorklogId=397161&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397161
 ]

ASF GitHub Bot logged work on BEAM-9434:


Author: ASF GitHub Bot
Created on: 04/Mar/20 01:15
Start Date: 04/Mar/20 01:15
Worklog Time Spent: 10m 
  Work Description: ecapoccia commented on issue #11037: [BEAM-9434] 
performance improvements reading many Avro files in S3
URL: https://github.com/apache/beam/pull/11037#issuecomment-594254609
 
 
   @iemejia @andeb please review and let me have your comments. I have more 
evidence of the tests that I've been carried out, and I'm happy with the 
performance gains. However, I'm keen to understand if the approach is sound. I 
look forward to hearing from you
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397161)
Time Spent: 20m  (was: 10m)

> Performance improvements processiong a large number of Avro files in S3+Spark
> -
>
> Key: BEAM-9434
> URL: https://issues.apache.org/jira/browse/BEAM-9434
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-aws, sdk-java-core
>Affects Versions: 2.19.0
>Reporter: Emiliano Capoccia
>Assignee: Emiliano Capoccia
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> There is a performance issue when processing in Spark on K8S a large number 
> of small Avro files (tens of thousands or more).
> The recommended way of reading a pattern of Avro files in Beam is by means of:
>  
> {code:java}
> PCollection records = p.apply(AvroIO.read(AvroGenClass.class)
> .from("s3://my-bucket/path-to/*.avro").withHintMatchesManyFiles())
> {code}
> However, in the case of many small files the above results in the entire 
> reading taking place in a single task/node, which is considerably slow and 
> has scalability issues.
> The option of omitting the hint is not viable, as it results in too many 
> tasks being spawn and the cluster busy doing coordination of tiny tasks with 
> high overhead.
> There are a few workarounds on the internet which mainly revolve around 
> compacting the input files before processing, so that a reduced number of 
> bulky files is processed in parallel.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9434) Performance improvements processiong a large number of Avro files in S3+Spark

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9434?focusedWorklogId=397160&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397160
 ]

ASF GitHub Bot logged work on BEAM-9434:


Author: ASF GitHub Bot
Created on: 04/Mar/20 01:13
Start Date: 04/Mar/20 01:13
Worklog Time Spent: 10m 
  Work Description: ecapoccia commented on pull request #11037: [BEAM-9434] 
performance improvements reading many Avro files in S3
URL: https://github.com/apache/beam/pull/11037
 
 
   The implemented solution consists in an extension of the FileSystem classes 
for S3, that allows filtering of the matched objects. The filter consists in a 
mod hash function of the filename. 
   
   This mechanism is exploited by a parallel partitioned read of the files in 
the bucket, realised in the AvroIO and FileIO classes by means of the new hint 
. withHintPartitionInto(int) 
   
   The net effect is that a controlled number of tasks is spawn to different 
executors, each having a different partition number; each executor reads the 
entire list of files in the bucket, but matches only those files whose modhash 
matches the partition number. 
   
   In this way, all the executors in the cluster read in parallel.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostComm

[jira] [Work logged] (BEAM-9274) Support running yapf in a git pre-commit hook

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9274?focusedWorklogId=397156&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397156
 ]

ASF GitHub Bot logged work on BEAM-9274:


Author: ASF GitHub Bot
Created on: 04/Mar/20 01:04
Start Date: 04/Mar/20 01:04
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #10810: [BEAM-9274] Support 
running yapf in a git pre-commit hook
URL: https://github.com/apache/beam/pull/10810#issuecomment-594251714
 
 
   @robertwb are you still opposed to this? I'd like to get this in
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397156)
Time Spent: 3h 20m  (was: 3h 10m)

> Support running yapf in a git pre-commit hook
> -
>
> Key: BEAM-9274
> URL: https://issues.apache.org/jira/browse/BEAM-9274
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> As a developer I want to be able to automatically run yapf before I make a 
> commit so that I don't waste time with failures on jenkins. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=397155&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397155
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 04/Mar/20 01:00
Start Date: 04/Mar/20 01:00
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10497: [BEAM-8335] Add the 
ReverseTestStream
URL: https://github.com/apache/beam/pull/10497#issuecomment-594250601
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397155)
Time Spent: 87h  (was: 86h 50m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 87h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=397154&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397154
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 04/Mar/20 01:00
Start Date: 04/Mar/20 01:00
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10497: [BEAM-8335] Add the 
ReverseTestStream
URL: https://github.com/apache/beam/pull/10497#issuecomment-594250531
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397154)
Time Spent: 86h 50m  (was: 86h 40m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 86h 50m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=397153&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397153
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 04/Mar/20 00:43
Start Date: 04/Mar/20 00:43
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10497: [BEAM-8335] Add the 
ReverseTestStream
URL: https://github.com/apache/beam/pull/10497#issuecomment-594246062
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397153)
Time Spent: 86h 40m  (was: 86.5h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 86h 40m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=397152&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397152
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 04/Mar/20 00:41
Start Date: 04/Mar/20 00:41
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10497: [BEAM-8335] Add the 
ReverseTestStream
URL: https://github.com/apache/beam/pull/10497#issuecomment-594245646
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397152)
Time Spent: 86.5h  (was: 86h 20m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 86.5h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=397150&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397150
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 04/Mar/20 00:23
Start Date: 04/Mar/20 00:23
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11005: [BEAM-8335] 
Modify the StreamingCache to subclass the CacheManager 
URL: https://github.com/apache/beam/pull/11005#discussion_r387371561
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py
 ##
 @@ -19,15 +19,298 @@
 
 from __future__ import absolute_import
 
+import itertools
+import os
+import shutil
+import tempfile
+import time
+
+import apache_beam as beam
+from apache_beam.portability.api.beam_interactive_api_pb2 import 
TestStreamFileHeader
+from apache_beam.portability.api.beam_interactive_api_pb2 import 
TestStreamFileRecord
 from apache_beam.portability.api.beam_runner_api_pb2 import TestStreamPayload
+from apache_beam.runners.interactive.cache_manager import CacheManager
+from apache_beam.runners.interactive.cache_manager import 
SafeFastPrimitivesCoder
+from apache_beam.testing.test_stream import ReverseTestStream
 from apache_beam.utils import timestamp
 
 
-class StreamingCache(object):
+class StreamingCacheSink(beam.PTransform):
+  """A PTransform that writes TestStreamFile(Header|Records)s to file.
+
+  This transform takes in an arbitrary element stream and writes the 
best-effort
+  list of TestStream events (as TestStreamFileRecords) to file.
+
+  Note that this PTransform is assumed to be only run on a single machine where
+  the following assumptions are correct: elements come in ordered, no two
+  transforms are writing to the same file. This PTransform is assumed to only
+  run correctly with the DirectRunner.
+  """
+  def __init__(
+  self,
+  cache_dir,
+  filename,
+  sample_resolution_sec,
+  coder=SafeFastPrimitivesCoder()):
+self._cache_dir = cache_dir
+self._filename = filename
+self._sample_resolution_sec = sample_resolution_sec
+self._coder = coder
+self._path = os.path.join(self._cache_dir, self._filename)
+
+  @property
+  def path(self):
+"""Returns the path the sink leads to."""
+return self._path
+
+  def expand(self, pcoll):
+class StreamingWriteToText(beam.DoFn):
+  """DoFn that performs the writing.
+
+  Note that the other file writing methods cannot be used in streaming
+  contexts.
+  """
+  def __init__(self, full_path, coder=SafeFastPrimitivesCoder()):
+self._full_path = full_path
+self._coder = coder
+
+# Try and make the given path.
+os.makedirs(os.path.dirname(full_path), exist_ok=True)
+
+  def start_bundle(self):
+# Open the file for 'append-mode' and writing 'bytes'.
+self._fh = open(self._full_path, 'ab')
+
+  def finish_bundle(self):
+self._fh.close()
+
+  def process(self, e):
+"""Appends the given element to the file.
+"""
+self._fh.write(self._coder.encode(e))
+self._fh.write(b'\n')
+
+return (
+pcoll
+| ReverseTestStream(
+output_tag=self._filename,
+sample_resolution_sec=self._sample_resolution_sec,
+output_format=ReverseTestStream.Format.
+SERIALIZED_TEST_STREAM_FILE_RECORDS,
+coder=self._coder)
+| beam.ParDo(
+StreamingWriteToText(full_path=self._path, coder=self._coder)))
+
+
+class StreamingCacheSource:
+  """A class that reads and parses TestStreamFile(Header|Reader)s.
+
+  This source operates in the following way:
+
+1. Wait for up to `timeout_secs` for the file to be available.
+2. Read, parse, and emit the entire contents of the file
+3. Wait for more events to come or until `is_cache_complete` returns True
+4. If there are more events, then go to 2
+5. Otherwise, stop emitting.
+
+  This class is used to read from file and send its to the TestStream via the
+  StreamingCacheManager.Reader.
+  """
+  def __init__(
+  self,
+  cache_dir,
+  labels,
+  is_cache_complete=None,
+  coder=SafeFastPrimitivesCoder()):
+self._cache_dir = cache_dir
+self._coder = coder
+self._labels = labels
+self._is_cache_complete = (
+is_cache_complete if is_cache_complete else lambda: True)
+
+  def _wait_until_file_exists(self, timeout_secs=30):
+"""Blocks until the file exists for a maximum of timeout_secs.
+"""
+f = None
+now_secs = time.time()
+timeout_timestamp_secs = now_secs + timeout_secs
+
+# Wait for up to `timeout_secs` for the file to be available.
+while f is None and now_secs < timeout_timestamp_secs:
+  now_secs = time.time()
+  try:
+path = os.path.join(self

[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=397146&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397146
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 04/Mar/20 00:23
Start Date: 04/Mar/20 00:23
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11005: [BEAM-8335] 
Modify the StreamingCache to subclass the CacheManager 
URL: https://github.com/apache/beam/pull/11005#discussion_r387366883
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py
 ##
 @@ -19,15 +19,298 @@
 
 from __future__ import absolute_import
 
+import itertools
+import os
+import shutil
+import tempfile
+import time
+
+import apache_beam as beam
+from apache_beam.portability.api.beam_interactive_api_pb2 import 
TestStreamFileHeader
+from apache_beam.portability.api.beam_interactive_api_pb2 import 
TestStreamFileRecord
 from apache_beam.portability.api.beam_runner_api_pb2 import TestStreamPayload
+from apache_beam.runners.interactive.cache_manager import CacheManager
+from apache_beam.runners.interactive.cache_manager import 
SafeFastPrimitivesCoder
+from apache_beam.testing.test_stream import ReverseTestStream
 from apache_beam.utils import timestamp
 
 
-class StreamingCache(object):
+class StreamingCacheSink(beam.PTransform):
+  """A PTransform that writes TestStreamFile(Header|Records)s to file.
+
+  This transform takes in an arbitrary element stream and writes the 
best-effort
 
 Review comment:
   Why is it best-effort?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397146)
Time Spent: 86h  (was: 85h 50m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 86h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=397149&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397149
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 04/Mar/20 00:23
Start Date: 04/Mar/20 00:23
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11005: [BEAM-8335] 
Modify the StreamingCache to subclass the CacheManager 
URL: https://github.com/apache/beam/pull/11005#discussion_r387367517
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py
 ##
 @@ -19,15 +19,298 @@
 
 from __future__ import absolute_import
 
+import itertools
+import os
+import shutil
+import tempfile
+import time
+
+import apache_beam as beam
+from apache_beam.portability.api.beam_interactive_api_pb2 import 
TestStreamFileHeader
+from apache_beam.portability.api.beam_interactive_api_pb2 import 
TestStreamFileRecord
 from apache_beam.portability.api.beam_runner_api_pb2 import TestStreamPayload
+from apache_beam.runners.interactive.cache_manager import CacheManager
+from apache_beam.runners.interactive.cache_manager import 
SafeFastPrimitivesCoder
+from apache_beam.testing.test_stream import ReverseTestStream
 from apache_beam.utils import timestamp
 
 
-class StreamingCache(object):
+class StreamingCacheSink(beam.PTransform):
+  """A PTransform that writes TestStreamFile(Header|Records)s to file.
+
+  This transform takes in an arbitrary element stream and writes the 
best-effort
+  list of TestStream events (as TestStreamFileRecords) to file.
+
+  Note that this PTransform is assumed to be only run on a single machine where
+  the following assumptions are correct: elements come in ordered, no two
+  transforms are writing to the same file. This PTransform is assumed to only
+  run correctly with the DirectRunner.
 
 Review comment:
   Can you please file a JIRA to make this more general? (I don't see anything 
yet that would preclude writing something that works for all runners.)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397149)
Time Spent: 86h 10m  (was: 86h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 86h 10m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=397148&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397148
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 04/Mar/20 00:23
Start Date: 04/Mar/20 00:23
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11005: [BEAM-8335] 
Modify the StreamingCache to subclass the CacheManager 
URL: https://github.com/apache/beam/pull/11005#discussion_r387368473
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py
 ##
 @@ -19,15 +19,298 @@
 
 from __future__ import absolute_import
 
+import itertools
+import os
+import shutil
+import tempfile
+import time
+
+import apache_beam as beam
+from apache_beam.portability.api.beam_interactive_api_pb2 import 
TestStreamFileHeader
+from apache_beam.portability.api.beam_interactive_api_pb2 import 
TestStreamFileRecord
 from apache_beam.portability.api.beam_runner_api_pb2 import TestStreamPayload
+from apache_beam.runners.interactive.cache_manager import CacheManager
+from apache_beam.runners.interactive.cache_manager import 
SafeFastPrimitivesCoder
+from apache_beam.testing.test_stream import ReverseTestStream
 from apache_beam.utils import timestamp
 
 
-class StreamingCache(object):
+class StreamingCacheSink(beam.PTransform):
+  """A PTransform that writes TestStreamFile(Header|Records)s to file.
+
+  This transform takes in an arbitrary element stream and writes the 
best-effort
+  list of TestStream events (as TestStreamFileRecords) to file.
+
+  Note that this PTransform is assumed to be only run on a single machine where
+  the following assumptions are correct: elements come in ordered, no two
+  transforms are writing to the same file. This PTransform is assumed to only
+  run correctly with the DirectRunner.
+  """
+  def __init__(
+  self,
+  cache_dir,
+  filename,
+  sample_resolution_sec,
+  coder=SafeFastPrimitivesCoder()):
+self._cache_dir = cache_dir
+self._filename = filename
+self._sample_resolution_sec = sample_resolution_sec
+self._coder = coder
+self._path = os.path.join(self._cache_dir, self._filename)
+
+  @property
+  def path(self):
+"""Returns the path the sink leads to."""
+return self._path
+
+  def expand(self, pcoll):
+class StreamingWriteToText(beam.DoFn):
+  """DoFn that performs the writing.
+
+  Note that the other file writing methods cannot be used in streaming
+  contexts.
+  """
+  def __init__(self, full_path, coder=SafeFastPrimitivesCoder()):
+self._full_path = full_path
+self._coder = coder
+
+# Try and make the given path.
+os.makedirs(os.path.dirname(full_path), exist_ok=True)
+
+  def start_bundle(self):
+# Open the file for 'append-mode' and writing 'bytes'.
+self._fh = open(self._full_path, 'ab')
+
+  def finish_bundle(self):
+self._fh.close()
+
+  def process(self, e):
+"""Appends the given element to the file.
+"""
+self._fh.write(self._coder.encode(e))
+self._fh.write(b'\n')
+
+return (
+pcoll
+| ReverseTestStream(
+output_tag=self._filename,
+sample_resolution_sec=self._sample_resolution_sec,
+output_format=ReverseTestStream.Format.
+SERIALIZED_TEST_STREAM_FILE_RECORDS,
+coder=self._coder)
+| beam.ParDo(
+StreamingWriteToText(full_path=self._path, coder=self._coder)))
+
+
+class StreamingCacheSource:
+  """A class that reads and parses TestStreamFile(Header|Reader)s.
+
+  This source operates in the following way:
+
+1. Wait for up to `timeout_secs` for the file to be available.
+2. Read, parse, and emit the entire contents of the file
+3. Wait for more events to come or until `is_cache_complete` returns True
+4. If there are more events, then go to 2
+5. Otherwise, stop emitting.
+
+  This class is used to read from file and send its to the TestStream via the
+  StreamingCacheManager.Reader.
+  """
+  def __init__(
+  self,
+  cache_dir,
+  labels,
+  is_cache_complete=None,
+  coder=SafeFastPrimitivesCoder()):
+self._cache_dir = cache_dir
+self._coder = coder
+self._labels = labels
+self._is_cache_complete = (
+is_cache_complete if is_cache_complete else lambda: True)
+
+  def _wait_until_file_exists(self, timeout_secs=30):
+"""Blocks until the file exists for a maximum of timeout_secs.
+"""
+f = None
+now_secs = time.time()
+timeout_timestamp_secs = now_secs + timeout_secs
+
+# Wait for up to `timeout_secs` for the file to be available.
+while f is None and now_secs < timeout_timestamp_secs:
+  now_secs = time.time()
+  try:
+path = os.path.join(self

[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=397147&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397147
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 04/Mar/20 00:23
Start Date: 04/Mar/20 00:23
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11005: [BEAM-8335] 
Modify the StreamingCache to subclass the CacheManager 
URL: https://github.com/apache/beam/pull/11005#discussion_r387373091
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py
 ##
 @@ -19,15 +19,298 @@
 
 from __future__ import absolute_import
 
+import itertools
+import os
+import shutil
+import tempfile
+import time
+
+import apache_beam as beam
+from apache_beam.portability.api.beam_interactive_api_pb2 import 
TestStreamFileHeader
+from apache_beam.portability.api.beam_interactive_api_pb2 import 
TestStreamFileRecord
 from apache_beam.portability.api.beam_runner_api_pb2 import TestStreamPayload
+from apache_beam.runners.interactive.cache_manager import CacheManager
+from apache_beam.runners.interactive.cache_manager import 
SafeFastPrimitivesCoder
+from apache_beam.testing.test_stream import ReverseTestStream
 from apache_beam.utils import timestamp
 
 
-class StreamingCache(object):
+class StreamingCacheSink(beam.PTransform):
+  """A PTransform that writes TestStreamFile(Header|Records)s to file.
+
+  This transform takes in an arbitrary element stream and writes the 
best-effort
+  list of TestStream events (as TestStreamFileRecords) to file.
+
+  Note that this PTransform is assumed to be only run on a single machine where
+  the following assumptions are correct: elements come in ordered, no two
+  transforms are writing to the same file. This PTransform is assumed to only
+  run correctly with the DirectRunner.
+  """
+  def __init__(
+  self,
+  cache_dir,
+  filename,
+  sample_resolution_sec,
+  coder=SafeFastPrimitivesCoder()):
+self._cache_dir = cache_dir
+self._filename = filename
+self._sample_resolution_sec = sample_resolution_sec
+self._coder = coder
+self._path = os.path.join(self._cache_dir, self._filename)
+
+  @property
+  def path(self):
+"""Returns the path the sink leads to."""
+return self._path
+
+  def expand(self, pcoll):
+class StreamingWriteToText(beam.DoFn):
+  """DoFn that performs the writing.
+
+  Note that the other file writing methods cannot be used in streaming
+  contexts.
+  """
+  def __init__(self, full_path, coder=SafeFastPrimitivesCoder()):
+self._full_path = full_path
+self._coder = coder
+
+# Try and make the given path.
+os.makedirs(os.path.dirname(full_path), exist_ok=True)
+
+  def start_bundle(self):
+# Open the file for 'append-mode' and writing 'bytes'.
+self._fh = open(self._full_path, 'ab')
+
+  def finish_bundle(self):
+self._fh.close()
+
+  def process(self, e):
+"""Appends the given element to the file.
+"""
+self._fh.write(self._coder.encode(e))
+self._fh.write(b'\n')
+
+return (
+pcoll
+| ReverseTestStream(
+output_tag=self._filename,
+sample_resolution_sec=self._sample_resolution_sec,
+output_format=ReverseTestStream.Format.
+SERIALIZED_TEST_STREAM_FILE_RECORDS,
+coder=self._coder)
+| beam.ParDo(
+StreamingWriteToText(full_path=self._path, coder=self._coder)))
+
+
+class StreamingCacheSource:
+  """A class that reads and parses TestStreamFile(Header|Reader)s.
+
+  This source operates in the following way:
+
+1. Wait for up to `timeout_secs` for the file to be available.
+2. Read, parse, and emit the entire contents of the file
+3. Wait for more events to come or until `is_cache_complete` returns True
+4. If there are more events, then go to 2
+5. Otherwise, stop emitting.
+
+  This class is used to read from file and send its to the TestStream via the
+  StreamingCacheManager.Reader.
+  """
+  def __init__(
+  self,
+  cache_dir,
+  labels,
+  is_cache_complete=None,
+  coder=SafeFastPrimitivesCoder()):
+self._cache_dir = cache_dir
+self._coder = coder
+self._labels = labels
+self._is_cache_complete = (
+is_cache_complete if is_cache_complete else lambda: True)
+
+  def _wait_until_file_exists(self, timeout_secs=30):
+"""Blocks until the file exists for a maximum of timeout_secs.
+"""
+f = None
+now_secs = time.time()
+timeout_timestamp_secs = now_secs + timeout_secs
+
+# Wait for up to `timeout_secs` for the file to be available.
+while f is None and now_secs < timeout_timestamp_secs:
+  now_secs = time.time()
+  try:
+path = os.path.join(self

[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=397144&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397144
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 04/Mar/20 00:23
Start Date: 04/Mar/20 00:23
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11005: [BEAM-8335] 
Modify the StreamingCache to subclass the CacheManager 
URL: https://github.com/apache/beam/pull/11005#discussion_r387345481
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/cache_manager.py
 ##
 @@ -167,20 +177,35 @@ def load_pcoder(self, *labels):
 self._saved_pcoders[self._path(*labels)])
 
   def read(self, *labels):
+# Return an iterator to an empty list if it doesn't exist.
 if not self.exists(*labels):
-  return [], -1
+  return [].__iter__(), -1
 
-source = self.source(*labels)
+# Otherwise, return a generator to the cached PCollection.
+source = self._source(*labels)
 range_tracker = source.get_range_tracker(None, None)
-result = list(source.read(range_tracker))
+reader = source.read(range_tracker)
 version = self._latest_version(*labels)
-return result, version
+return reader, version
+
+  def write(self, values, *labels):
+sink = self._sink(*labels)
+path = self._path(*labels)
+with open(path, 'wb') as f:
+  for v in values:
+sink.write_record(f, v)
 
 Review comment:
   This is not part of the public API for sinks. Instead, do 
   
   ```
 writer = sink.open_writer(init_result, path)
 for v in values:
   writer.write(v)
 write_results.append(writer.close())
   ```
   
   
https://github.com/apache/beam/blob/release-2.18.0/sdks/python/apache_beam/io/iobase.py#L668
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397144)
Time Spent: 85h 50m  (was: 85h 40m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 85h 50m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=397145&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397145
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 04/Mar/20 00:23
Start Date: 04/Mar/20 00:23
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11005: [BEAM-8335] 
Modify the StreamingCache to subclass the CacheManager 
URL: https://github.com/apache/beam/pull/11005#discussion_r387341699
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/cache_manager.py
 ##
 @@ -167,20 +177,35 @@ def load_pcoder(self, *labels):
 self._saved_pcoders[self._path(*labels)])
 
   def read(self, *labels):
+# Return an iterator to an empty list if it doesn't exist.
 if not self.exists(*labels):
-  return [], -1
+  return [].__iter__(), -1
 
 Review comment:
   Thanks .`iter([])` is slightly more idiomatic, but this is fine. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397145)
Time Spent: 86h  (was: 85h 50m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 86h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9434) Performance improvements processiong a large number of Avro files in S3+Spark

2020-03-03 Thread Emiliano Capoccia (Jira)
Emiliano Capoccia created BEAM-9434:
---

 Summary: Performance improvements processiong a large number of 
Avro files in S3+Spark
 Key: BEAM-9434
 URL: https://issues.apache.org/jira/browse/BEAM-9434
 Project: Beam
  Issue Type: Improvement
  Components: io-java-aws, sdk-java-core
Affects Versions: 2.19.0
Reporter: Emiliano Capoccia
Assignee: Emiliano Capoccia


There is a performance issue when processing in Spark on K8S a large number of 
small Avro files (tens of thousands or more).

The recommended way of reading a pattern of Avro files in Beam is by means of:

 
{code:java}
PCollection records = p.apply(AvroIO.read(AvroGenClass.class)
.from("s3://my-bucket/path-to/*.avro").withHintMatchesManyFiles())
{code}
However, in the case of many small files the above results in the entire 
reading taking place in a single task/node, which is considerably slow and has 
scalability issues.

The option of omitting the hint is not viable, as it results in too many tasks 
being spawn and the cluster busy doing coordination of tiny tasks with high 
overhead.

There are a few workarounds on the internet which mainly revolve around 
compacting the input files before processing, so that a reduced number of bulky 
files is processed in parallel.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9413) [beam_PostCommit_Py_ValCont] build failed

2020-03-03 Thread Hannah Jiang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17050653#comment-17050653
 ] 

Hannah Jiang commented on BEAM-9413:


This was merged. [~amaliujia]

> [beam_PostCommit_Py_ValCont] build failed
> -
>
> Key: BEAM-9413
> URL: https://issues.apache.org/jira/browse/BEAM-9413
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Yueyang Qiu
>Assignee: Hannah Jiang
>Priority: Major
>  Labels: currently-failing
> Fix For: 2.20.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> See [https://builds.apache.org/job/beam_PostCommit_Py_ValCont/5706/]
> Error:
>  
> *16:12:13* The push refers to repository 
> [us.gcr.io/apache-beam-testing/jenkins/python2.7_sdk]*16:12:13* An image does 
> not exist locally with the tag: 
> us.gcr.io/apache-beam-testing/jenkins/python2.7_sdk*16:12:14* Build step 
> 'Execute shell' marked build as failure*16:12:15* Sending e-mails to: 
> bui...@beam.apache.org*16:12:15* Recording test results*16:12:16* ERROR: Step 
> 'Publish JUnit test result report' failed: No test report files were found. 
> Configuration error?*16:12:18* No emails were triggered.*16:12:18* Finished: 
> FAILURE



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=397142&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397142
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 04/Mar/20 00:06
Start Date: 04/Mar/20 00:06
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10979: [BEAM-8841] Support 
writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-594235682
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397142)
Time Spent: 5h 40m  (was: 5.5h)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9413) [beam_PostCommit_Py_ValCont] build failed

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9413?focusedWorklogId=397140&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397140
 ]

ASF GitHub Bot logged work on BEAM-9413:


Author: ASF GitHub Bot
Created on: 04/Mar/20 00:01
Start Date: 04/Mar/20 00:01
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #11023: 
[BEAM-9413]Fix a py post commit failure caused by docker migration
URL: https://github.com/apache/beam/pull/11023
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397140)
Time Spent: 2h  (was: 1h 50m)

> [beam_PostCommit_Py_ValCont] build failed
> -
>
> Key: BEAM-9413
> URL: https://issues.apache.org/jira/browse/BEAM-9413
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Yueyang Qiu
>Assignee: Hannah Jiang
>Priority: Major
>  Labels: currently-failing
> Fix For: 2.20.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> See [https://builds.apache.org/job/beam_PostCommit_Py_ValCont/5706/]
> Error:
>  
> *16:12:13* The push refers to repository 
> [us.gcr.io/apache-beam-testing/jenkins/python2.7_sdk]*16:12:13* An image does 
> not exist locally with the tag: 
> us.gcr.io/apache-beam-testing/jenkins/python2.7_sdk*16:12:14* Build step 
> 'Execute shell' marked build as failure*16:12:15* Sending e-mails to: 
> bui...@beam.apache.org*16:12:15* Recording test results*16:12:16* ERROR: Step 
> 'Publish JUnit test result report' failed: No test report files were found. 
> Configuration error?*16:12:18* No emails were triggered.*16:12:18* Finished: 
> FAILURE



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-9433) Create an expansion service artifact for common IOs

2020-03-03 Thread Robert Bradshaw (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Bradshaw reassigned BEAM-9433:
-

Assignee: Robert Bradshaw

> Create an expansion service artifact for common IOs
> ---
>
> Key: BEAM-9433
> URL: https://issues.apache.org/jira/browse/BEAM-9433
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-kafka, sdk-java-core, sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>
> This will allow users to easily leverage Java IOs from Python/Go/... 
> pipelines. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9433) Create an expansion service artifact for common IOs

2020-03-03 Thread Robert Bradshaw (Jira)
Robert Bradshaw created BEAM-9433:
-

 Summary: Create an expansion service artifact for common IOs
 Key: BEAM-9433
 URL: https://issues.apache.org/jira/browse/BEAM-9433
 Project: Beam
  Issue Type: New Feature
  Components: io-java-kafka, sdk-java-core, sdk-py-core
Reporter: Robert Bradshaw


This will allow users to easily leverage Java IOs from Python/Go/... pipelines. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-9432) Create a separate expansion service package.

2020-03-03 Thread Robert Bradshaw (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Bradshaw reassigned BEAM-9432:
-

Assignee: Robert Bradshaw

> Create a separate expansion service package.
> 
>
> Key: BEAM-9432
> URL: https://issues.apache.org/jira/browse/BEAM-9432
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model, sdk-java-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9288) Conscrypt shaded dependency

2020-03-03 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17050643#comment-17050643
 ] 

Luke Cwik commented on BEAM-9288:
-

It seems we still are adding the conscrypt native libraries as part in META-INF

$ jar tvf beam-vendor-grpc-1_26_0-0.3.jar | grep conscrypt
2218176 Mon Sep 17 10:36:24 CEST 2018
META-INF/native/libconscrypt_openjdk_jni-linux-x86_64.so
1720832 Mon Sep 17 10:36:24 CEST 2018
META-INF/native/conscrypt_openjdk_jni-windows-x86.dll
2637496 Mon Sep 17 10:36:24 CEST 2018
META-INF/native/libconscrypt_openjdk_jni-osx-x86_64.dylib
2557952 Mon Sep 17 10:36:24 CEST 2018
META-INF/native/conscrypt_openjdk_jni-windows-x86_64.dll

> Conscrypt shaded dependency
> ---
>
> Key: BEAM-9288
> URL: https://issues.apache.org/jira/browse/BEAM-9288
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Esun Kim
>Assignee: sunjincheng
>Priority: Critical
> Fix For: 2.20.0
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Conscrypt is not designed to be shaded properly mainly because of so files. I 
> happened to see BEAM-9030 (*1) creating a new vendored gRPC shading Conscrypt 
> (*2) in it. I think this could make a problem when new Conscrypt is brought 
> by new gcsio depending on gRPC-alts (*4) in a dependency chain. (*5) In this 
> case, it may have a conflict when finding proper so files for Conscrypt. 
> *1: https://issues.apache.org/jira/browse/BEAM-9030
> *2:  
> [https://github.com/apache/beam/blob/e24d1e51cbabe27cb3cc381fd95b334db639c45d/buildSrc/src/main/groovy/org/apache/beam/gradle/GrpcVendoring_1_26_0.groovy#L78]
> *3: https://issues.apache.org/jira/browse/BEAM-6136
> *4: [https://mvnrepository.com/artifact/io.grpc/grpc-alts/1.27.0]
> *5: https://issues.apache.org/jira/browse/BEAM-8889
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=397130&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397130
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 03/Mar/20 23:40
Start Date: 03/Mar/20 23:40
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #11020: [BEAM-7926] 
Update Data Visualization
URL: https://github.com/apache/beam/pull/11020#discussion_r387360492
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/interactive_environment.py
 ##
 @@ -43,6 +43,55 @@
 
 _LOGGER = logging.getLogger(__name__)
 
+# By `format(customized_script=xxx)`, the given `customized_script` is
+# guaranteed to be executed within access to a jquery with datatable plugin
+# configured which is useful so that any `customized_script` is resilient to
+# browser refresh. Inside `customized_script`, use `$` as jQuery.
+_JQUERY_WITH_DATATABLE_TEMPLATE = """
+if (typeof window.jquery341 == 'undefined') {{
 
 Review comment:
   jquery341 is probably not a good name. What would happen if we upgrade to a 
different jquery version? Maybe it would be better to call it jquery_singleton 
or something like that?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397130)
Time Spent: 53h 10m  (was: 53h)

> Show PCollection with Interactive Beam in a data-centric user flow
> --
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 53h 10m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline defined as
>  
> {code:java}
> p = beam.Pipeline(InteractiveRunner())
> pcoll = p | 'Transform' >> transform()
> pcoll2 = ...
> pcoll3 = ...{code}
> The use can call a single function and get auto-magical charting of the data.
> e.g.,
> {code:java}
> show(pcoll, pcoll2)
> {code}
> Throughout the process, a pipeline fragment is built to include only 
> transforms necessary to produce the desired pcolls (pcoll and pcoll2) and 
> execute that fragment.
> This makes the Interactive Beam user flow data-centric.
>  
> Detailed 
> [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=397131&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397131
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 03/Mar/20 23:40
Start Date: 03/Mar/20 23:40
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #11020: [BEAM-7926] 
Update Data Visualization
URL: https://github.com/apache/beam/pull/11020#discussion_r387360704
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/interactive_environment.py
 ##
 @@ -43,6 +43,55 @@
 
 _LOGGER = logging.getLogger(__name__)
 
+# By `format(customized_script=xxx)`, the given `customized_script` is
+# guaranteed to be executed within access to a jquery with datatable plugin
+# configured which is useful so that any `customized_script` is resilient to
+# browser refresh. Inside `customized_script`, use `$` as jQuery.
+_JQUERY_WITH_DATATABLE_TEMPLATE = """
+if (typeof window.jquery341 == 'undefined') {{
+  var jqueryScript = document.createElement('script');
+  jqueryScript.src = 
'https://code.jquery.com/jquery-3.4.1.slim.min.js';
+  jqueryScript.type = 'text/javascript';
+  jqueryScript.onload = function() {{
+var datatableScript = document.createElement('script');
+datatableScript.src = 
'https://cdn.datatables.net/1.10.20/js/jquery.dataTables.min.js';
+datatableScript.type = 'text/javascript';
+datatableScript.onload = function() {{
+  window.jquery341 = jQuery.noConflict(true);
+  window.jquery341(document).ready(function($){{
+{customized_script}
+  }});
+}}
+document.head.appendChild(datatableScript);
+  }};
+  document.head.appendChild(jqueryScript);
+}} else {{
+  window.jquery341(document).ready(function($){{
+{customized_script}
+  }});
+}}"""
+
+_HTML_IMPORT_TEMPLATE = """
 
 Review comment:
   could you add comments related to this?
   
   Why is it no longer supported by chrome?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397131)
Time Spent: 53h 20m  (was: 53h 10m)

> Show PCollection with Interactive Beam in a data-centric user flow
> --
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 53h 20m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline defined as
>  
> {code:java}
> p = beam.Pipeline(InteractiveRunner())
> pcoll = p | 'Transform' >> transform()
> pcoll2 = ...
> pcoll3 = ...{code}
> The use can call a single function and get auto-magical charting of the data.
> e.g.,
> {code:java}
> show(pcoll, pcoll2)
> {code}
> Throughout the process, a pipeline fragment is built to include only 
> transforms necessary to produce the desired pcolls (pcoll and pcoll2) and 
> execute that fragment.
> This makes the Interactive Beam user flow data-centric.
>  
> Detailed 
> [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=397129&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397129
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 03/Mar/20 23:38
Start Date: 03/Mar/20 23:38
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #11020: [BEAM-7926] 
Update Data Visualization
URL: https://github.com/apache/beam/pull/11020#discussion_r387359940
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/interactive_beam.py
 ##
 @@ -86,10 +85,57 @@ def capture_duration(self, value):
   # The next PCollection evaluation will capture fresh data from sources,
   # and the data captured will be replayed until another eviction.
 """
+assert value.total_seconds() > 0, 'Duration must be a positive value.'
 self.capture_control._capture_duration = value
 
   # TODO(BEAM-8335): add capture_size options when they are supported.
 
+  @property
+  def display_timestamp_format(self):
+"""The format in which timestamps are displayed.
+
+Default is '%Y-%m-%d %H:%M:%S.%f%z', e.g. 2020-02-01 15:05:06.15-08:00.
 
 Review comment:
   How do we plan to keep the defaults in sync between here and 
interactive_options?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397129)
Time Spent: 53h  (was: 52h 50m)

> Show PCollection with Interactive Beam in a data-centric user flow
> --
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 53h
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline defined as
>  
> {code:java}
> p = beam.Pipeline(InteractiveRunner())
> pcoll = p | 'Transform' >> transform()
> pcoll2 = ...
> pcoll3 = ...{code}
> The use can call a single function and get auto-magical charting of the data.
> e.g.,
> {code:java}
> show(pcoll, pcoll2)
> {code}
> Throughout the process, a pipeline fragment is built to include only 
> transforms necessary to produce the desired pcolls (pcoll and pcoll2) and 
> execute that fragment.
> This makes the Interactive Beam user flow data-centric.
>  
> Detailed 
> [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=397127&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397127
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 03/Mar/20 23:37
Start Date: 03/Mar/20 23:37
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #11020: [BEAM-7926] 
Update Data Visualization
URL: https://github.com/apache/beam/pull/11020#discussion_r387359585
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py
 ##
 @@ -238,31 +322,57 @@ def _display_dive(self, data, update=None):
   display(HTML(html))
 
   def _display_overview(self, data, update=None):
+if (not data.empty and self._include_window_info and
+all(column in data.columns
+for column in ('event_time', 'windows', 'pane_info'))):
+  data = data.drop(['event_time', 'windows', 'pane_info'], axis=1)
+
 gfsg = GenericFeatureStatisticsGenerator()
 proto = gfsg.ProtoFromDataFrames([{'name': 'data', 'table': data}])
 protostr = base64.b64encode(proto.SerializeToString()).decode('utf-8')
 if update:
   script = _OVERVIEW_SCRIPT_TEMPLATE.format(
-  display_id=update, protostr=protostr)
+  display_id=update._overview_display_id, protostr=protostr)
   display_javascript(Javascript(script))
 else:
   html = _OVERVIEW_HTML_TEMPLATE.format(
   display_id=self._overview_display_id, protostr=protostr)
   display(HTML(html))
 
   def _display_dataframe(self, data, update=None):
-if update:
-  table_id = 'table_{}'.format(update)
-  html = _DATAFRAME_PAGINATION_TEMPLATE.format(
-  dataframe_html=data.to_html(notebook=True, table_id=table_id),
-  table_id=table_id)
-  update_display(HTML(html), display_id=update)
+table_id = 'table_{}'.format(
+update._df_display_id if update else self._df_display_id)
+columns = [{
+'title': ''
+}] + [{
+'title': str(column)
+} for column in data.columns]
+format_window_info_in_dataframe(data)
+rows = data.applymap(lambda x: str(x)).to_dict('split')['data']
 
 Review comment:
   Could you explain that in a comment there?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397127)
Time Spent: 52h 40m  (was: 52.5h)

> Show PCollection with Interactive Beam in a data-centric user flow
> --
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 52h 40m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline defined as
>  
> {code:java}
> p = beam.Pipeline(InteractiveRunner())
> pcoll = p | 'Transform' >> transform()
> pcoll2 = ...
> pcoll3 = ...{code}
> The use can call a single function and get auto-magical charting of the data.
> e.g.,
> {code:java}
> show(pcoll, pcoll2)
> {code}
> Throughout the process, a pipeline fragment is built to include only 
> transforms necessary to produce the desired pcolls (pcoll and pcoll2) and 
> execute that fragment.
> This makes the Interactive Beam user flow data-centric.
>  
> Detailed 
> [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=397128&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397128
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 03/Mar/20 23:37
Start Date: 03/Mar/20 23:37
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #11020: [BEAM-7926] 
Update Data Visualization
URL: https://github.com/apache/beam/pull/11020#discussion_r387359733
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py
 ##
 @@ -291,3 +402,81 @@ def _to_dataframe(self):
 
   def _is_one_dimension_type(self, val):
 return type(val) in _one_dimension_types
+
+
+def format_window_info_in_dataframe(data):
+  if 'event_time' in data.columns:
+data['event_time'] = data['event_time'].apply(event_time_formatter)
+  if 'windows' in data.columns:
+data['windows'] = data['windows'].apply(windows_formatter)
+  if 'pane_info' in data.columns:
+data['pane_info'] = data['pane_info'].apply(pane_info_formatter)
+
+
+def event_time_formatter(event_time_us):
+  options = ie.current_env().options
+  to_tz = options.display_timezone
+  try:
+return (
+datetime.datetime.utcfromtimestamp(event_time_us / 100).replace(
+tzinfo=tz.tzutc()).astimezone(to_tz).strftime(
+options.display_timestamp_format))
+  except ValueError:
+if event_time_us < 0:
+  return 'Min Timestamp'
+return 'Max Timestamp'
+
+
+def windows_formatter(windows):
+  result = []
+  for w in windows:
+if isinstance(w, GlobalWindow):
+  result.append(str(w))
+elif isinstance(w, IntervalWindow):
+  # First get the duration in terms of hours, minutes, seconds, and
+  # micros.
+  duration = w.end.micros - w.start.micros
+  duration_secs = duration // 100
+  hours, remainder = divmod(duration_secs, 3600)
+  minutes, seconds = divmod(remainder, 60)
+  micros = (duration - duration_secs * 100) % 100
+
+  # Construct the duration string. Try and write the string in such a
 
 Review comment:
   Is there any other standard function that will do this?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397128)
Time Spent: 52h 50m  (was: 52h 40m)

> Show PCollection with Interactive Beam in a data-centric user flow
> --
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 52h 50m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline defined as
>  
> {code:java}
> p = beam.Pipeline(InteractiveRunner())
> pcoll = p | 'Transform' >> transform()
> pcoll2 = ...
> pcoll3 = ...{code}
> The use can call a single function and get auto-magical charting of the data.
> e.g.,
> {code:java}
> show(pcoll, pcoll2)
> {code}
> Throughout the process, a pipeline fragment is built to include only 
> transforms necessary to produce the desired pcolls (pcoll and pcoll2) and 
> execute that fragment.
> This makes the Interactive Beam user flow data-centric.
>  
> Detailed 
> [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=397126&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397126
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 03/Mar/20 23:36
Start Date: 03/Mar/20 23:36
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #11020: [BEAM-7926] 
Update Data Visualization
URL: https://github.com/apache/beam/pull/11020#discussion_r387359374
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py
 ##
 @@ -52,42 +56,89 @@
 except ImportError:
   _pcoll_visualization_ready = False
 
+_LOGGER = logging.getLogger(__name__)
+
 # 1-d types that need additional normalization to be compatible with DataFrame.
 _one_dimension_types = (int, float, str, bool, list, tuple)
 
+_CSS = """
+
+.p-Widget.jp-OutputPrompt.jp-OutputArea-prompt:empty {{
+  padding: 0;
+  border: 0;
+}}
+
.p-Widget.jp-RenderedJavaScript.jp-mod-trusted.jp-OutputArea-output:empty {{
+  padding: 0;
+  border: 0;
+}}
+"""
 _DIVE_SCRIPT_TEMPLATE = """
-document.querySelector("#{display_id}").data = {jsonstr};"""
-_DIVE_HTML_TEMPLATE = """
+try {{
+  document.querySelector("#{display_id}").data = {jsonstr};
+}} catch (e) {{
+  console.log("#{display_id} is not rendered yet.");
+}}"""
+_DIVE_HTML_TEMPLATE = _CSS + """
 https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/1.3.3/webcomponents-lite.js";>
 https://raw.githubusercontent.com/PAIR-code/facets/1.0.0/facets-dist/facets-jupyter.html";>
 
 
   document.querySelector("#{display_id}").data = {jsonstr};
 """
 _OVERVIEW_SCRIPT_TEMPLATE = """
-  document.querySelector("#{display_id}").protoInput = 
"{protostr}";
-  """
-_OVERVIEW_HTML_TEMPLATE = """
+  try {{
+document.querySelector("#{display_id}").protoInput = 
"{protostr}";
+  }} catch (e) {{
+console.log("#{display_id} is not rendered yet.");
 
 Review comment:
   1. If this try fails, what happens?
   2. Is the console.log statement valuable in this case?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397126)
Time Spent: 52.5h  (was: 52h 20m)

> Show PCollection with Interactive Beam in a data-centric user flow
> --
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 52.5h
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline defined as
>  
> {code:java}
> p = beam.Pipeline(InteractiveRunner())
> pcoll = p | 'Transform' >> transform()
> pcoll2 = ...
> pcoll3 = ...{code}
> The use can call a single function and get auto-magical charting of the data.
> e.g.,
> {code:java}
> show(pcoll, pcoll2)
> {code}
> Throughout the process, a pipeline fragment is built to include only 
> transforms necessary to produce the desired pcolls (pcoll and pcoll2) and 
> execute that fragment.
> This makes the Interactive Beam user flow data-centric.
>  
> Detailed 
> [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9413) [beam_PostCommit_Py_ValCont] build failed

2020-03-03 Thread Hannah Jiang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17050639#comment-17050639
 ] 

Hannah Jiang commented on BEAM-9413:


It's not a blocker. However, I am almost done with this ticket. If it can be 
merged before you create RC0, I would be happy to include it. If you're ready 
to cut RC, please go ahead.

> [beam_PostCommit_Py_ValCont] build failed
> -
>
> Key: BEAM-9413
> URL: https://issues.apache.org/jira/browse/BEAM-9413
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Yueyang Qiu
>Assignee: Hannah Jiang
>Priority: Major
>  Labels: currently-failing
> Fix For: 2.20.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> See [https://builds.apache.org/job/beam_PostCommit_Py_ValCont/5706/]
> Error:
>  
> *16:12:13* The push refers to repository 
> [us.gcr.io/apache-beam-testing/jenkins/python2.7_sdk]*16:12:13* An image does 
> not exist locally with the tag: 
> us.gcr.io/apache-beam-testing/jenkins/python2.7_sdk*16:12:14* Build step 
> 'Execute shell' marked build as failure*16:12:15* Sending e-mails to: 
> bui...@beam.apache.org*16:12:15* Recording test results*16:12:16* ERROR: Step 
> 'Publish JUnit test result report' failed: No test report files were found. 
> Configuration error?*16:12:18* No emails were triggered.*16:12:18* Finished: 
> FAILURE



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=397125&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397125
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 03/Mar/20 23:31
Start Date: 03/Mar/20 23:31
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #11020: [BEAM-7926] 
Update Data Visualization
URL: https://github.com/apache/beam/pull/11020#discussion_r387357575
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py
 ##
 @@ -52,42 +56,89 @@
 except ImportError:
   _pcoll_visualization_ready = False
 
+_LOGGER = logging.getLogger(__name__)
+
 # 1-d types that need additional normalization to be compatible with DataFrame.
 _one_dimension_types = (int, float, str, bool, list, tuple)
 
+_CSS = """
+
+.p-Widget.jp-OutputPrompt.jp-OutputArea-prompt:empty {{
+  padding: 0;
+  border: 0;
+}}
+
.p-Widget.jp-RenderedJavaScript.jp-mod-trusted.jp-OutputArea-output:empty {{
+  padding: 0;
+  border: 0;
+}}
+"""
 _DIVE_SCRIPT_TEMPLATE = """
-document.querySelector("#{display_id}").data = {jsonstr};"""
-_DIVE_HTML_TEMPLATE = """
+try {{
+  document.querySelector("#{display_id}").data = {jsonstr};
+}} catch (e) {{
+  console.log("#{display_id} is not rendered yet.");
+}}"""
+_DIVE_HTML_TEMPLATE = _CSS + """
 https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/1.3.3/webcomponents-lite.js";>
 https://raw.githubusercontent.com/PAIR-code/facets/1.0.0/facets-dist/facets-jupyter.html";>
 
 
   document.querySelector("#{display_id}").data = {jsonstr};
 """
 _OVERVIEW_SCRIPT_TEMPLATE = """
-  document.querySelector("#{display_id}").protoInput = 
"{protostr}";
-  """
-_OVERVIEW_HTML_TEMPLATE = """
+  try {{
+document.querySelector("#{display_id}").protoInput = 
"{protostr}";
+  }} catch (e) {{
+console.log("#{display_id} is not rendered yet.");
+  }}"""
+_OVERVIEW_HTML_TEMPLATE = _CSS + """
 https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/1.3.3/webcomponents-lite.js";>
 https://raw.githubusercontent.com/PAIR-code/facets/1.0.0/facets-dist/facets-jupyter.html";>
 
 

[jira] [Work logged] (BEAM-3925) Support ValueProviders in KafkaIO

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3925?focusedWorklogId=397121&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397121
 ]

ASF GitHub Bot logged work on BEAM-3925:


Author: ASF GitHub Bot
Created on: 03/Mar/20 23:14
Start Date: 03/Mar/20 23:14
Worklog Time Spent: 10m 
  Work Description: rangadi commented on issue #6636: [BEAM-3925] [DO NOT 
MERGE] KafkaIO : Value provider support for reader configuration. 
URL: https://github.com/apache/beam/pull/6636#issuecomment-594220066
 
 
   @pseudomuto @davidheryanto : The main issue we need to address is proper 
support for `.numSplits()` interface. If user sets `numSplits` to 20 and Kafka 
topic has only 5 partitions, 15 splits (`readers`) will not have any work. We 
just need to handle that properly. I don't think it is that much more work. 
   cc: @aromanenko-dev. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397121)
Time Spent: 7h 40m  (was: 7.5h)

> Support ValueProviders in KafkaIO
> -
>
> Key: BEAM-3925
> URL: https://issues.apache.org/jira/browse/BEAM-3925
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Reporter: Sameer Abhyankar
>Assignee: Pramod Upamanyu
>Priority: Major
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> Add ValueProvider support for the various methods in KafkaIO. This would 
> allow us to use KafkaIO in reusable pipeline templates.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9424) Grouping By LogicalTypes is not supported

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9424?focusedWorklogId=397120&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397120
 ]

ASF GitHub Bot logged work on BEAM-9424:


Author: ASF GitHub Bot
Created on: 03/Mar/20 23:08
Start Date: 03/Mar/20 23:08
Worklog Time Spent: 10m 
  Work Description: kanterov commented on pull request #11034: [BEAM-9424] 
Allow grouping by LogicalType
URL: https://github.com/apache/beam/pull/11034#discussion_r387349711
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/coders/RowCoderGenerator.java
 ##
 @@ -401,6 +404,20 @@ private static StackManipulation 
getCoder(Schema.FieldType fieldType) {
 }
   }
 
+  private static StackManipulation logicalTypeCoder(
+  Schema.LogicalType logicalType, StackManipulation baseCoder) {
+throw new UnsupportedOperationException("not implemented");
 
 Review comment:
   This section is left unimplemented, and that's why the test is failing
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397120)
Time Spent: 2h 40m  (was: 2.5h)

> Grouping By LogicalTypes is not supported
> -
>
> Key: BEAM-9424
> URL: https://issues.apache.org/jira/browse/BEAM-9424
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.19.0
>Reporter: fdiazgon
>Assignee: fdiazgon
>Priority: Minor
>  Labels: sql
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Creating a schema from a BQ schema that has either TIME, DATE or DATETIME 
> columns, and grouping by one of these fields throws NullPointerException.
> {code:java}
> Pipeline pipeline = Pipeline.create();
> Schema beamSchemaWithLogicalTypes =
> BigQueryUtils.fromTableSchema(
> new TableSchema()
> .setFields(
> Arrays.asList(
> new TableFieldSchema().setName("fTime").setType("TIME"),
> new TableFieldSchema().setName("fDate").setType("DATE"),
> new 
> TableFieldSchema().setName("fDatetime").setType("DATETIME";
> Row row =
> Row.withSchema(beamSchemaWithLogicalTypes)
> .addValues(
> DateTime.parse("2020-02-02"),
> DateTime.parse("2020-02-02"),
> DateTime.parse("2020-02-02T00:00:00"))
> .build();
> PCollection outputRow =
> pipeline
> .apply(Create.of(row))
> .setRowSchema(beamSchemaWithLogicalTypes)
> .apply(
> SqlTransform.query(
> "SELECT fTime, fDate, fDatetime FROM PCOLLECTION GROUP BY 
> fTime, fDate, fDatetime"));
> pipeline.run();
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9424) Grouping By LogicalTypes is not supported

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9424?focusedWorklogId=397119&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397119
 ]

ASF GitHub Bot logged work on BEAM-9424:


Author: ASF GitHub Bot
Created on: 03/Mar/20 23:07
Start Date: 03/Mar/20 23:07
Worklog Time Spent: 10m 
  Work Description: kanterov commented on issue #11034: [BEAM-9424] Allow 
grouping by LogicalType
URL: https://github.com/apache/beam/pull/11034#issuecomment-594217964
 
 
   @reuvenlax it looks like there is a couple of places where LogicalType don't 
work since we introduced to/fromBaseType functions. Looking into the code we 
would need to support it in RowCoderGenerator that doesn't look straightforward 
because LogicalType doesn't have a static factory we can call.
   
   Do you have thoughts on how we can fix that?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397119)
Time Spent: 2.5h  (was: 2h 20m)

> Grouping By LogicalTypes is not supported
> -
>
> Key: BEAM-9424
> URL: https://issues.apache.org/jira/browse/BEAM-9424
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.19.0
>Reporter: fdiazgon
>Assignee: fdiazgon
>Priority: Minor
>  Labels: sql
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Creating a schema from a BQ schema that has either TIME, DATE or DATETIME 
> columns, and grouping by one of these fields throws NullPointerException.
> {code:java}
> Pipeline pipeline = Pipeline.create();
> Schema beamSchemaWithLogicalTypes =
> BigQueryUtils.fromTableSchema(
> new TableSchema()
> .setFields(
> Arrays.asList(
> new TableFieldSchema().setName("fTime").setType("TIME"),
> new TableFieldSchema().setName("fDate").setType("DATE"),
> new 
> TableFieldSchema().setName("fDatetime").setType("DATETIME";
> Row row =
> Row.withSchema(beamSchemaWithLogicalTypes)
> .addValues(
> DateTime.parse("2020-02-02"),
> DateTime.parse("2020-02-02"),
> DateTime.parse("2020-02-02T00:00:00"))
> .build();
> PCollection outputRow =
> pipeline
> .apply(Create.of(row))
> .setRowSchema(beamSchemaWithLogicalTypes)
> .apply(
> SqlTransform.query(
> "SELECT fTime, fDate, fDatetime FROM PCOLLECTION GROUP BY 
> fTime, fDate, fDatetime"));
> pipeline.run();
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9432) Create a separate expansion service package.

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9432?focusedWorklogId=397118&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397118
 ]

ASF GitHub Bot logged work on BEAM-9432:


Author: ASF GitHub Bot
Created on: 03/Mar/20 23:05
Start Date: 03/Mar/20 23:05
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11035: [BEAM-9432] 
Move expansion service into its own project.
URL: https://github.com/apache/beam/pull/11035
 
 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python |

[jira] [Work logged] (BEAM-9424) Grouping By LogicalTypes is not supported

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9424?focusedWorklogId=397116&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397116
 ]

ASF GitHub Bot logged work on BEAM-9424:


Author: ASF GitHub Bot
Created on: 03/Mar/20 23:03
Start Date: 03/Mar/20 23:03
Worklog Time Spent: 10m 
  Work Description: kanterov commented on pull request #11034: [BEAM-9424] 
Allow grouping by LogicalType
URL: https://github.com/apache/beam/pull/11034
 
 
   Add LogicalTypeCoder.
   
   Support LogicalTypes in Group transform and SchemaCoder generation.
   
   Related to: https://github.com/apache/beam/pull/11015
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastComple

[jira] [Created] (BEAM-9432) Create a separate expansion service package.

2020-03-03 Thread Robert Bradshaw (Jira)
Robert Bradshaw created BEAM-9432:
-

 Summary: Create a separate expansion service package.
 Key: BEAM-9432
 URL: https://issues.apache.org/jira/browse/BEAM-9432
 Project: Beam
  Issue Type: New Feature
  Components: beam-model, sdk-java-core
Reporter: Robert Bradshaw






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=397110&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397110
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 03/Mar/20 22:53
Start Date: 03/Mar/20 22:53
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #11020: [BEAM-7926] 
Update Data Visualization
URL: https://github.com/apache/beam/pull/11020#discussion_r387267410
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py
 ##
 @@ -52,42 +56,89 @@
 except ImportError:
   _pcoll_visualization_ready = False
 
+_LOGGER = logging.getLogger(__name__)
+
 # 1-d types that need additional normalization to be compatible with DataFrame.
 _one_dimension_types = (int, float, str, bool, list, tuple)
 
+_CSS = """
+
+.p-Widget.jp-OutputPrompt.jp-OutputArea-prompt:empty {{
+  padding: 0;
+  border: 0;
+}}
+
.p-Widget.jp-RenderedJavaScript.jp-mod-trusted.jp-OutputArea-output:empty {{
+  padding: 0;
+  border: 0;
+}}
+"""
 _DIVE_SCRIPT_TEMPLATE = """
-document.querySelector("#{display_id}").data = {jsonstr};"""
-_DIVE_HTML_TEMPLATE = """
+try {{
+  document.querySelector("#{display_id}").data = {jsonstr};
+}} catch (e) {{
+  console.log("#{display_id} is not rendered yet.");
+}}"""
+_DIVE_HTML_TEMPLATE = _CSS + """
 https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/1.3.3/webcomponents-lite.js";>
 https://raw.githubusercontent.com/PAIR-code/facets/1.0.0/facets-dist/facets-jupyter.html";>
 
 
   document.querySelector("#{display_id}").data = {jsonstr};
 """
 _OVERVIEW_SCRIPT_TEMPLATE = """
-  document.querySelector("#{display_id}").protoInput = 
"{protostr}";
-  """
-_OVERVIEW_HTML_TEMPLATE = """
+  try {{
+document.querySelector("#{display_id}").protoInput = 
"{protostr}";
+  }} catch (e) {{
+console.log("#{display_id} is not rendered yet.");
 
 Review comment:
   Facets widgets doesn't depend on the jQuery we setup.
   The JS also doesn't depend on the webcomponent (it's for HTML import).
   So there is no need to wait for `onload` of anything.
   The DOM changes when the output area containing the widgets being updated 
gets deleted by the user in the notebook and some JS exceptions could be thrown 
out.
   This is to avoid `display_javascript` polluting the output area of notebooks 
in this scenario.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397110)
Time Spent: 52h 10m  (was: 52h)

> Show PCollection with Interactive Beam in a data-centric user flow
> --
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 52h 10m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline defined as
>  
> {code:java}
> p = beam.Pipeline(InteractiveRunner())
> pcoll = p | 'Transform' >> transform()
> pcoll2 = ...
> pcoll3 = ...{code}
> The use can call a single function and get auto-magical charting of the data.
> e.g.,
> {code:java}
> show(pcoll, pcoll2)
> {code}
> Throughout the process, a pipeline fragment is built to include only 
> transforms necessary to produce the desired pcolls (pcoll and pcoll2) and 
> execute that fragment.
> This makes the Interactive Beam user flow data-centric.
>  
> Detailed 
> [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=397098&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397098
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 03/Mar/20 22:50
Start Date: 03/Mar/20 22:50
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #11020: [BEAM-7926] 
Update Data Visualization
URL: https://github.com/apache/beam/pull/11020#discussion_r387267410
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py
 ##
 @@ -52,42 +56,89 @@
 except ImportError:
   _pcoll_visualization_ready = False
 
+_LOGGER = logging.getLogger(__name__)
+
 # 1-d types that need additional normalization to be compatible with DataFrame.
 _one_dimension_types = (int, float, str, bool, list, tuple)
 
+_CSS = """
+
+.p-Widget.jp-OutputPrompt.jp-OutputArea-prompt:empty {{
+  padding: 0;
+  border: 0;
+}}
+
.p-Widget.jp-RenderedJavaScript.jp-mod-trusted.jp-OutputArea-output:empty {{
+  padding: 0;
+  border: 0;
+}}
+"""
 _DIVE_SCRIPT_TEMPLATE = """
-document.querySelector("#{display_id}").data = {jsonstr};"""
-_DIVE_HTML_TEMPLATE = """
+try {{
+  document.querySelector("#{display_id}").data = {jsonstr};
+}} catch (e) {{
+  console.log("#{display_id} is not rendered yet.");
+}}"""
+_DIVE_HTML_TEMPLATE = _CSS + """
 https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/1.3.3/webcomponents-lite.js";>
 https://raw.githubusercontent.com/PAIR-code/facets/1.0.0/facets-dist/facets-jupyter.html";>
 
 
   document.querySelector("#{display_id}").data = {jsonstr};
 """
 _OVERVIEW_SCRIPT_TEMPLATE = """
-  document.querySelector("#{display_id}").protoInput = 
"{protostr}";
-  """
-_OVERVIEW_HTML_TEMPLATE = """
+  try {{
+document.querySelector("#{display_id}").protoInput = 
"{protostr}";
+  }} catch (e) {{
+console.log("#{display_id} is not rendered yet.");
 
 Review comment:
   Facets widgets doesn't depend on the jQuery we setup.
   The JS also doesn't depend on the webcomponent (it's for HTML import).
   So there is no need to wait for `onload` of anything.
   This is to avoid the DOM change when the output area containing the widgets 
being updated gets deleted by the user in the notebook and some JS exceptions 
could be thrown out.
   `display_javascript` would pollute the output area in this scenario.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397098)
Time Spent: 51h 10m  (was: 51h)

> Show PCollection with Interactive Beam in a data-centric user flow
> --
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 51h 10m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline defined as
>  
> {code:java}
> p = beam.Pipeline(InteractiveRunner())
> pcoll = p | 'Transform' >> transform()
> pcoll2 = ...
> pcoll3 = ...{code}
> The use can call a single function and get auto-magical charting of the data.
> e.g.,
> {code:java}
> show(pcoll, pcoll2)
> {code}
> Throughout the process, a pipeline fragment is built to include only 
> transforms necessary to produce the desired pcolls (pcoll and pcoll2) and 
> execute that fragment.
> This makes the Interactive Beam user flow data-centric.
>  
> Detailed 
> [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=397104&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397104
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 03/Mar/20 22:50
Start Date: 03/Mar/20 22:50
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #11020: [BEAM-7926] 
Update Data Visualization
URL: https://github.com/apache/beam/pull/11020#discussion_r387281998
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py
 ##
 @@ -238,31 +322,57 @@ def _display_dive(self, data, update=None):
   display(HTML(html))
 
   def _display_overview(self, data, update=None):
+if (not data.empty and self._include_window_info and
+all(column in data.columns
+for column in ('event_time', 'windows', 'pane_info'))):
+  data = data.drop(['event_time', 'windows', 'pane_info'], axis=1)
+
 gfsg = GenericFeatureStatisticsGenerator()
 proto = gfsg.ProtoFromDataFrames([{'name': 'data', 'table': data}])
 protostr = base64.b64encode(proto.SerializeToString()).decode('utf-8')
 if update:
   script = _OVERVIEW_SCRIPT_TEMPLATE.format(
-  display_id=update, protostr=protostr)
+  display_id=update._overview_display_id, protostr=protostr)
   display_javascript(Javascript(script))
 else:
   html = _OVERVIEW_HTML_TEMPLATE.format(
   display_id=self._overview_display_id, protostr=protostr)
   display(HTML(html))
 
   def _display_dataframe(self, data, update=None):
-if update:
-  table_id = 'table_{}'.format(update)
-  html = _DATAFRAME_PAGINATION_TEMPLATE.format(
-  dataframe_html=data.to_html(notebook=True, table_id=table_id),
-  table_id=table_id)
-  update_display(HTML(html), display_id=update)
+table_id = 'table_{}'.format(
+update._df_display_id if update else self._df_display_id)
+columns = [{
+'title': ''
+}] + [{
+'title': str(column)
+} for column in data.columns]
+format_window_info_in_dataframe(data)
+rows = data.applymap(lambda x: str(x)).to_dict('split')['data']
 
 Review comment:
   First, we get all the string `data` from the `split` 
[orient](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_dict.html)
 of `dataframe.to_dict`. 
   Now the `rows` is a `list` of `row`s of values.
   Each `row` looks like `[column_1_val, column_2_val, ...]`
   
   Then we are going to add datatable column index for the values in each `row`.
   The index starts from 1 because we are also going to add a column `0` 
later., so we have `{k+1: v}`.
   Each `row` now becomes `{1: column_1_val, 2: column_2_val, ...}`
   
   Then we add column `0` (`row[0] = k`) of the datatable with values of int 
based index (which will be the default order column just as the original 
dataframe).
   Each `row` now becomes `{1: column_1_val, 2: column_2_val, ..., 0: 
int_index_in_dataframe}`
   
   Then the list of above `row`s get supplied as string in the Javascript to 
load the data into the table.
   
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397104)
Time Spent: 51h 50m  (was: 51h 40m)

> Show PCollection with Interactive Beam in a data-centric user flow
> --
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 51h 50m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline defined as
>  
> {code:java}
> p = beam.Pipeline(InteractiveRunner())
> pcoll = p | 'Transform' >> transform()
> pcoll2 = ...
> pcoll3 = ...{code}
> The use can call a single function and get auto-magical charting of the data.
> e.g.,
> {code:java}
> show(pcoll, pcoll2)
> {code}
> Throughout the process, a pipeline fragment is built to include only 
> transforms necessary to produce the desired pcolls (pcoll and pcoll2) and 
> execute that fragment.
> This makes the Interactive Beam user flow data-centric.
>  
> Detailed 
> [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarz

[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=397099&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397099
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 03/Mar/20 22:50
Start Date: 03/Mar/20 22:50
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #11020: [BEAM-7926] 
Update Data Visualization
URL: https://github.com/apache/beam/pull/11020#discussion_r387285131
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/interactive_beam.py
 ##
 @@ -86,10 +85,57 @@ def capture_duration(self, value):
   # The next PCollection evaluation will capture fresh data from sources,
   # and the data captured will be replayed until another eviction.
 """
+assert value.total_seconds() > 0, 'Duration must be a positive value.'
 self.capture_control._capture_duration = value
 
   # TODO(BEAM-8335): add capture_size options when they are supported.
 
+  @property
+  def display_timestamp_format(self):
+"""The format in which timestamps are displayed.
+
+Default is '%Y-%m-%d %H:%M:%S.%f%z', e.g. 2020-02-01 15:05:06.15-08:00.
 
 Review comment:
   docstrings in this module are for notebook users. So keeping them here 
allows `Shift+Tab` in notebooks to invoke the docstrings pop up. They function 
as in-notebook user guide.
   
   The default is defined in the `interactive_options` module where we hide the 
implementation details that are not exposed APIs.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397099)
Time Spent: 51h 20m  (was: 51h 10m)

> Show PCollection with Interactive Beam in a data-centric user flow
> --
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 51h 20m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline defined as
>  
> {code:java}
> p = beam.Pipeline(InteractiveRunner())
> pcoll = p | 'Transform' >> transform()
> pcoll2 = ...
> pcoll3 = ...{code}
> The use can call a single function and get auto-magical charting of the data.
> e.g.,
> {code:java}
> show(pcoll, pcoll2)
> {code}
> Throughout the process, a pipeline fragment is built to include only 
> transforms necessary to produce the desired pcolls (pcoll and pcoll2) and 
> execute that fragment.
> This makes the Interactive Beam user flow data-centric.
>  
> Detailed 
> [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=397108&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397108
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 03/Mar/20 22:50
Start Date: 03/Mar/20 22:50
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #11020: [BEAM-7926] 
Update Data Visualization
URL: https://github.com/apache/beam/pull/11020#discussion_r387320751
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py
 ##
 @@ -291,3 +402,81 @@ def _to_dataframe(self):
 
   def _is_one_dimension_type(self, val):
 return type(val) in _one_dimension_types
+
+
+def format_window_info_in_dataframe(data):
+  if 'event_time' in data.columns:
+data['event_time'] = data['event_time'].apply(event_time_formatter)
+  if 'windows' in data.columns:
+data['windows'] = data['windows'].apply(windows_formatter)
+  if 'pane_info' in data.columns:
+data['pane_info'] = data['pane_info'].apply(pane_info_formatter)
+
+
+def event_time_formatter(event_time_us):
+  options = ie.current_env().options
+  to_tz = options.display_timezone
+  try:
+return (
+datetime.datetime.utcfromtimestamp(event_time_us / 100).replace(
+tzinfo=tz.tzutc()).astimezone(to_tz).strftime(
+options.display_timestamp_format))
+  except ValueError:
+if event_time_us < 0:
+  return 'Min Timestamp'
+return 'Max Timestamp'
+
+
+def windows_formatter(windows):
+  result = []
+  for w in windows:
+if isinstance(w, GlobalWindow):
+  result.append(str(w))
+elif isinstance(w, IntervalWindow):
+  # First get the duration in terms of hours, minutes, seconds, and
+  # micros.
+  duration = w.end.micros - w.start.micros
+  duration_secs = duration // 100
+  hours, remainder = divmod(duration_secs, 3600)
+  minutes, seconds = divmod(remainder, 60)
+  micros = (duration - duration_secs * 100) % 100
+
+  # Construct the duration string. Try and write the string in such a
 
 Review comment:
   This is trying to format a duration potentially with precision at micros, 
not exactly a `datetime`.  It's more like pretty print a `timedelta`. So the 
`strftime` function is not applicable.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397108)
Time Spent: 52h  (was: 51h 50m)

> Show PCollection with Interactive Beam in a data-centric user flow
> --
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 52h
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline defined as
>  
> {code:java}
> p = beam.Pipeline(InteractiveRunner())
> pcoll = p | 'Transform' >> transform()
> pcoll2 = ...
> pcoll3 = ...{code}
> The use can call a single function and get auto-magical charting of the data.
> e.g.,
> {code:java}
> show(pcoll, pcoll2)
> {code}
> Throughout the process, a pipeline fragment is built to include only 
> transforms necessary to produce the desired pcolls (pcoll and pcoll2) and 
> execute that fragment.
> This makes the Interactive Beam user flow data-centric.
>  
> Detailed 
> [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=397109&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397109
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 03/Mar/20 22:50
Start Date: 03/Mar/20 22:50
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #11020: [BEAM-7926] 
Update Data Visualization
URL: https://github.com/apache/beam/pull/11020#discussion_r387292177
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/interactive_environment.py
 ##
 @@ -43,6 +43,55 @@
 
 _LOGGER = logging.getLogger(__name__)
 
+# By `format(customized_script=xxx)`, the given `customized_script` is
+# guaranteed to be executed within access to a jquery with datatable plugin
+# configured which is useful so that any `customized_script` is resilient to
+# browser refresh. Inside `customized_script`, use `$` as jQuery.
+_JQUERY_WITH_DATATABLE_TEMPLATE = """
+if (typeof window.jquery341 == 'undefined') {{
 
 Review comment:
   It's an arbitrary name we give to the jQuery v3.4.1 we imported. It's like a 
namesapce. Note the magic happens in `window.jquery341 = 
jQuery.noConflict(true);`.
   The problem here is that:
   1. A frontend can connect to the kernel at any time: code executed by kernel 
in the past does not have any effect to new frontends.
   2. Multiple frontends can connect to the same kernel: each frontend has its 
own state (browser: HTML and JS), the rendered HTML+JS cannot assume the 
existence of any global variable, function definition or libraries.
   
   This ensures no matter how many jQuery gets imported at any time, the 
interactive notebook always checks and uses the single jQuery configured by 
interactive modules with Datatable plugin initialized.
   And the `function($)` signature ensures that any customized script executed 
will use `$` as the singleton instance  `window.jquery341`. This ensures that 
code reading `$` as jQuery will always work.
   The advantage of doing this isolation is:
   1. The JS imported by interactive modules to any frontend does not alter 
their existing states. Everything in the notebook still works as it was no 
matter what libraries and global vars have been used.
   2. HTML with JS rendered by interactive modules will have determined 
behavior because it always uses the same libraries.
   3. Whether/when a frontend is connected to the kernel doesn't matter now. 
The visualization HTML contains everything it needs to setup and/or execute 
scripts.
   4. Arbitrary DOM changes doesn't matter now. Even if the user screws the 
notebook's HTML, the data visualization broadcast from kernels will always be 
rendered correctly.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397109)
Time Spent: 52h  (was: 51h 50m)

> Show PCollection with Interactive Beam in a data-centric user flow
> --
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 52h
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline defined as
>  
> {code:java}
> p = beam.Pipeline(InteractiveRunner())
> pcoll = p | 'Transform' >> transform()
> pcoll2 = ...
> pcoll3 = ...{code}
> The use can call a single function and get auto-magical charting of the data.
> e.g.,
> {code:java}
> show(pcoll, pcoll2)
> {code}
> Throughout the process, a pipeline fragment is built to include only 
> transforms necessary to produce the desired pcolls (pcoll and pcoll2) and 
> execute that fragment.
> This makes the Interactive Beam user flow data-centric.
>  
> Detailed 
> [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=397105&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397105
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 03/Mar/20 22:50
Start Date: 03/Mar/20 22:50
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #11020: [BEAM-7926] 
Update Data Visualization
URL: https://github.com/apache/beam/pull/11020#discussion_r387342551
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/interactive_beam.py
 ##
 @@ -211,20 +280,27 @@ def show(*pcolls):
 watched_pcollections.add(val)
   for pcoll in pcolls:
 if pcoll not in watched_pcollections:
-  watch({re.sub(r'[\[\]\(\)]', '_', str(pcoll)): pcoll})
+  watch({'anonymous_pcollection_{}'.format(id(pcoll)): pcoll})
 
+  import warnings
+  warnings.filterwarnings('ignore', category=DeprecationWarning)
 
 Review comment:
   Change the filtering to catch a specific message and only takes effect when 
`is_in_ipython` when the user invokes `show` for the first time.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397105)

> Show PCollection with Interactive Beam in a data-centric user flow
> --
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 51h 50m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline defined as
>  
> {code:java}
> p = beam.Pipeline(InteractiveRunner())
> pcoll = p | 'Transform' >> transform()
> pcoll2 = ...
> pcoll3 = ...{code}
> The use can call a single function and get auto-magical charting of the data.
> e.g.,
> {code:java}
> show(pcoll, pcoll2)
> {code}
> Throughout the process, a pipeline fragment is built to include only 
> transforms necessary to produce the desired pcolls (pcoll and pcoll2) and 
> execute that fragment.
> This makes the Interactive Beam user flow data-centric.
>  
> Detailed 
> [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=397102&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397102
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 03/Mar/20 22:50
Start Date: 03/Mar/20 22:50
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #11020: [BEAM-7926] 
Update Data Visualization
URL: https://github.com/apache/beam/pull/11020#discussion_r387268984
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py
 ##
 @@ -52,42 +56,89 @@
 except ImportError:
   _pcoll_visualization_ready = False
 
+_LOGGER = logging.getLogger(__name__)
+
 # 1-d types that need additional normalization to be compatible with DataFrame.
 _one_dimension_types = (int, float, str, bool, list, tuple)
 
+_CSS = """
+
+.p-Widget.jp-OutputPrompt.jp-OutputArea-prompt:empty {{
+  padding: 0;
+  border: 0;
+}}
+
.p-Widget.jp-RenderedJavaScript.jp-mod-trusted.jp-OutputArea-output:empty {{
+  padding: 0;
+  border: 0;
+}}
+"""
 _DIVE_SCRIPT_TEMPLATE = """
-document.querySelector("#{display_id}").data = {jsonstr};"""
-_DIVE_HTML_TEMPLATE = """
+try {{
+  document.querySelector("#{display_id}").data = {jsonstr};
+}} catch (e) {{
+  console.log("#{display_id} is not rendered yet.");
+}}"""
+_DIVE_HTML_TEMPLATE = _CSS + """
 https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/1.3.3/webcomponents-lite.js";>
 https://raw.githubusercontent.com/PAIR-code/facets/1.0.0/facets-dist/facets-jupyter.html";>
 
 
   document.querySelector("#{display_id}").data = {jsonstr};
 """
 _OVERVIEW_SCRIPT_TEMPLATE = """
-  document.querySelector("#{display_id}").protoInput = 
"{protostr}";
-  """
-_OVERVIEW_HTML_TEMPLATE = """
+  try {{
+document.querySelector("#{display_id}").protoInput = 
"{protostr}";
+  }} catch (e) {{
+console.log("#{display_id} is not rendered yet.");
+  }}"""
+_OVERVIEW_HTML_TEMPLATE = _CSS + """
 https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/1.3.3/webcomponents-lite.js";>
 https://raw.githubusercontent.com/PAIR-code/facets/1.0.0/facets-dist/facets-jupyter.html";>
 
 

[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=397106&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397106
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 03/Mar/20 22:50
Start Date: 03/Mar/20 22:50
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #11020: [BEAM-7926] 
Update Data Visualization
URL: https://github.com/apache/beam/pull/11020#discussion_r387269492
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py
 ##
 @@ -181,6 +247,12 @@ def __init__(self, pcoll):
 self._overview_display_id = 'facets_overview_{}_{}'.format(
 self._cache_key, id(self))
 self._df_display_id = 'df_{}_{}'.format(self._cache_key, id(self))
+# Whether the visualization should include window info.
 
 Review comment:
   Got it, removing these comments.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397106)
Time Spent: 52h  (was: 51h 50m)

> Show PCollection with Interactive Beam in a data-centric user flow
> --
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 52h
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline defined as
>  
> {code:java}
> p = beam.Pipeline(InteractiveRunner())
> pcoll = p | 'Transform' >> transform()
> pcoll2 = ...
> pcoll3 = ...{code}
> The use can call a single function and get auto-magical charting of the data.
> e.g.,
> {code:java}
> show(pcoll, pcoll2)
> {code}
> Throughout the process, a pipeline fragment is built to include only 
> transforms necessary to produce the desired pcolls (pcoll and pcoll2) and 
> execute that fragment.
> This makes the Interactive Beam user flow data-centric.
>  
> Detailed 
> [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=397100&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397100
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 03/Mar/20 22:50
Start Date: 03/Mar/20 22:50
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #11020: [BEAM-7926] 
Update Data Visualization
URL: https://github.com/apache/beam/pull/11020#discussion_r387315589
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/interactive_environment.py
 ##
 @@ -43,6 +43,55 @@
 
 _LOGGER = logging.getLogger(__name__)
 
+# By `format(customized_script=xxx)`, the given `customized_script` is
+# guaranteed to be executed within access to a jquery with datatable plugin
+# configured which is useful so that any `customized_script` is resilient to
+# browser refresh. Inside `customized_script`, use `$` as jQuery.
+_JQUERY_WITH_DATATABLE_TEMPLATE = """
+if (typeof window.jquery341 == 'undefined') {{
+  var jqueryScript = document.createElement('script');
+  jqueryScript.src = 
'https://code.jquery.com/jquery-3.4.1.slim.min.js';
+  jqueryScript.type = 'text/javascript';
+  jqueryScript.onload = function() {{
+var datatableScript = document.createElement('script');
+datatableScript.src = 
'https://cdn.datatables.net/1.10.20/js/jquery.dataTables.min.js';
+datatableScript.type = 'text/javascript';
+datatableScript.onload = function() {{
+  window.jquery341 = jQuery.noConflict(true);
+  window.jquery341(document).ready(function($){{
+{customized_script}
+  }});
+}}
+document.head.appendChild(datatableScript);
+  }};
+  document.head.appendChild(jqueryScript);
+}} else {{
+  window.jquery341(document).ready(function($){{
+{customized_script}
+  }});
+}}"""
+
+_HTML_IMPORT_TEMPLATE = """
 
 Review comment:
   This uses something called `HTML import` where static HTML will be imported 
and embedded into current HTML.
   Here the HTML we desire is facets-jupyter.html.
   
   This feature is not supported by chrome anymore, thus requires the 
webcomponents JS lib.
   Similar to the jQuery template, we check if `HTML import` is supported by 
the browser, if so, import HTMLs else setup webcomponents and chain the `HTML 
import` to the end of `onload`.
   Note, we import HTMLs in the head for several reasons:
   1. In a notebook, DOM changes all the time. Keeping imported HTMLs in head 
makes sure all dependency HTMLs available all the time.
   2. `HTML import` only happens once per page load. There is no way to recover 
an imported HTML if you delete it from DOM unless you refresh the page.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397100)
Time Spent: 51.5h  (was: 51h 20m)

> Show PCollection with Interactive Beam in a data-centric user flow
> --
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 51.5h
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline defined as
>  
> {code:java}
> p = beam.Pipeline(InteractiveRunner())
> pcoll = p | 'Transform' >> transform()
> pcoll2 = ...
> pcoll3 = ...{code}
> The use can call a single function and get auto-magical charting of the data.
> e.g.,
> {code:java}
> show(pcoll, pcoll2)
> {code}
> Throughout the process, a pipeline fragment is built to include only 
> transforms necessary to produce the desired pcolls (pcoll and pcoll2) and 
> execute that fragment.
> This makes the Interactive Beam user flow data-centric.
>  
> Detailed 
> [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=397103&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397103
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 03/Mar/20 22:50
Start Date: 03/Mar/20 22:50
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #11020: [BEAM-7926] 
Update Data Visualization
URL: https://github.com/apache/beam/pull/11020#discussion_r387330714
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/options/interactive_options.py
 ##
 @@ -24,6 +24,8 @@
 
 from __future__ import absolute_import
 
+from dateutil import tz
 
 Review comment:
   The user can use both. Here internally we use dateutil.tz get the local 
timezone info.
   Externally, the user can use `pytz.timezone` or `dateutil.tz.gettz` because 
the `to_tz` just needs to be a subclass of datetime.tzinfo.
   
   Added the comments in the exposed API.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397103)
Time Spent: 51h 50m  (was: 51h 40m)

> Show PCollection with Interactive Beam in a data-centric user flow
> --
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 51h 50m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline defined as
>  
> {code:java}
> p = beam.Pipeline(InteractiveRunner())
> pcoll = p | 'Transform' >> transform()
> pcoll2 = ...
> pcoll3 = ...{code}
> The use can call a single function and get auto-magical charting of the data.
> e.g.,
> {code:java}
> show(pcoll, pcoll2)
> {code}
> Throughout the process, a pipeline fragment is built to include only 
> transforms necessary to produce the desired pcolls (pcoll and pcoll2) and 
> execute that fragment.
> This makes the Interactive Beam user flow data-centric.
>  
> Detailed 
> [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=397101&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397101
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 03/Mar/20 22:50
Start Date: 03/Mar/20 22:50
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #11020: [BEAM-7926] 
Update Data Visualization
URL: https://github.com/apache/beam/pull/11020#discussion_r387276995
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py
 ##
 @@ -215,20 +287,32 @@ def display_facets(self, updating_pv=None):
 # Ensures that dive, overview and table render the same data because the
 # materialized PCollection data might being updated continuously.
 data = self._to_dataframe()
+# String-ify the dictionaries for display because elements of type dict
+# cannot be ordered.
+data = data.applymap(lambda x: str(x) if isinstance(x, dict) else x)
 if updating_pv:
-  self._display_dive(data, updating_pv._dive_display_id)
-  self._display_overview(data, updating_pv._overview_display_id)
-  self._display_dataframe(data, updating_pv._df_display_id)
+  # Only updates when data is not empty. Otherwise, consider it a bad
+  # iteration and noop since there is nothing to be updated.
+  if data.empty:
+_LOGGER.debug('Skip a visualization update due to empty data.')
+  else:
+self._display_dataframe(data.copy(deep=True), updating_pv)
+if self._display_facets:
+  self._display_dive(data.copy(deep=True), updating_pv)
 
 Review comment:
   Because we make different changes (such as formatting and dropping some 
columns) to the dataframe before displaying it in these 3 widgets.
   For example, window info needs to be formatted for facets-dive and datatable 
while getting dropped in facets-overview.
   
   If they share the same instance, the 3 widgets will be altering the same 
dataframe object in arbitrary order, get arbitrary mixed output or run into all 
kinds of mapping errors.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397101)
Time Spent: 51h 40m  (was: 51.5h)

> Show PCollection with Interactive Beam in a data-centric user flow
> --
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 51h 40m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline defined as
>  
> {code:java}
> p = beam.Pipeline(InteractiveRunner())
> pcoll = p | 'Transform' >> transform()
> pcoll2 = ...
> pcoll3 = ...{code}
> The use can call a single function and get auto-magical charting of the data.
> e.g.,
> {code:java}
> show(pcoll, pcoll2)
> {code}
> Throughout the process, a pipeline fragment is built to include only 
> transforms necessary to produce the desired pcolls (pcoll and pcoll2) and 
> execute that fragment.
> This makes the Interactive Beam user flow data-centric.
>  
> Detailed 
> [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=397107&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397107
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 03/Mar/20 22:50
Start Date: 03/Mar/20 22:50
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #11020: [BEAM-7926] 
Update Data Visualization
URL: https://github.com/apache/beam/pull/11020#discussion_r387334402
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/interactive_beam.py
 ##
 @@ -211,20 +280,27 @@ def show(*pcolls):
 watched_pcollections.add(val)
   for pcoll in pcolls:
 if pcoll not in watched_pcollections:
-  watch({re.sub(r'[\[\]\(\)]', '_', str(pcoll)): pcoll})
+  watch({'anonymous_pcollection_{}'.format(id(pcoll)): pcoll})
 
+  import warnings
 
 Review comment:
   Done.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397107)
Time Spent: 52h  (was: 51h 50m)

> Show PCollection with Interactive Beam in a data-centric user flow
> --
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 52h
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline defined as
>  
> {code:java}
> p = beam.Pipeline(InteractiveRunner())
> pcoll = p | 'Transform' >> transform()
> pcoll2 = ...
> pcoll3 = ...{code}
> The use can call a single function and get auto-magical charting of the data.
> e.g.,
> {code:java}
> show(pcoll, pcoll2)
> {code}
> Throughout the process, a pipeline fragment is built to include only 
> transforms necessary to produce the desired pcolls (pcoll and pcoll2) and 
> execute that fragment.
> This makes the Interactive Beam user flow data-centric.
>  
> Detailed 
> [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9431) test_streaming_pipeline_returns_expected_user_metrics_fnapi_it is not supported with streaming engine

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9431?focusedWorklogId=397097&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397097
 ]

ASF GitHub Bot logged work on BEAM-9431:


Author: ASF GitHub Bot
Created on: 03/Mar/20 22:49
Start Date: 03/Mar/20 22:49
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on issue #11033: [BEAM-9431] Remove 
ReadFromPubSub/Read-out0-ElementCount from the met…
URL: https://github.com/apache/beam/pull/11033#issuecomment-594211677
 
 
   Can you elaborate a bit more on why we remove counter validation?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397097)
Time Spent: 40m  (was: 0.5h)

> test_streaming_pipeline_returns_expected_user_metrics_fnapi_it is not 
> supported with streaming engine
> -
>
> Key: BEAM-9431
> URL: https://issues.apache.org/jira/browse/BEAM-9431
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, testing
>Reporter: Ankur Goenka
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> test_streaming_pipeline_returns_expected_user_metrics_fnapi_it is failing on 
> Dataflow V2 as ReadFromPubSub/Read-out0-ElementCount is not implemented in 
> with streaming engine.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=397089&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397089
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 03/Mar/20 22:36
Start Date: 03/Mar/20 22:36
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #10497: [BEAM-8335] Add the 
ReverseTestStream
URL: https://github.com/apache/beam/pull/10497#issuecomment-594207171
 
 
   This sounds like interesting voodoo. Could you eventually bring and 
explanation of this and motivation to the ML
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397089)
Time Spent: 85h 40m  (was: 85.5h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 85h 40m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7304) Twister2 Beam runner

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7304?focusedWorklogId=397088&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397088
 ]

ASF GitHub Bot logged work on BEAM-7304:


Author: ASF GitHub Bot
Created on: 03/Mar/20 22:34
Start Date: 03/Mar/20 22:34
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #10888: [BEAM-7304] Twister2 
Beam runner
URL: https://github.com/apache/beam/pull/10888#issuecomment-594206246
 
 
   Oh ignore my previous comment it worked now. ARRRGHHH Jenkins!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397088)
Time Spent: 13.5h  (was: 13h 20m)

> Twister2 Beam runner
> 
>
> Key: BEAM-7304
> URL: https://issues.apache.org/jira/browse/BEAM-7304
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-ideas
>Reporter: Pulasthi Wickramasinghe
>Assignee: Pulasthi Wickramasinghe
>Priority: Minor
> Fix For: 2.21.0
>
>  Time Spent: 13.5h
>  Remaining Estimate: 0h
>
> Twister2 is a big data framework which supports both batch and stream 
> processing [1] [2]. The goal is to develop an beam runner for Twister2. 
> [1] [https://github.com/DSC-SPIDAL/twister2]
> [2] [https://twister2.org/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7304) Twister2 Beam runner

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7304?focusedWorklogId=397087&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397087
 ]

ASF GitHub Bot logged work on BEAM-7304:


Author: ASF GitHub Bot
Created on: 03/Mar/20 22:34
Start Date: 03/Mar/20 22:34
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #10888: [BEAM-7304] Twister2 
Beam runner
URL: https://github.com/apache/beam/pull/10888#issuecomment-594206039
 
 
   @pulasthi can you please do a `git pull origin master --rebase` and push 
force on this one, it seems that when you did the PR something was failing and 
the full tests are not running. Sorry for so many inconviniences.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397087)
Time Spent: 13h 20m  (was: 13h 10m)

> Twister2 Beam runner
> 
>
> Key: BEAM-7304
> URL: https://issues.apache.org/jira/browse/BEAM-7304
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-ideas
>Reporter: Pulasthi Wickramasinghe
>Assignee: Pulasthi Wickramasinghe
>Priority: Minor
> Fix For: 2.21.0
>
>  Time Spent: 13h 20m
>  Remaining Estimate: 0h
>
> Twister2 is a big data framework which supports both batch and stream 
> processing [1] [2]. The goal is to develop an beam runner for Twister2. 
> [1] [https://github.com/DSC-SPIDAL/twister2]
> [2] [https://twister2.org/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=397086&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397086
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 03/Mar/20 22:32
Start Date: 03/Mar/20 22:32
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10951: [BEAM-8575] 
Modified the test to work for different runners.
URL: https://github.com/apache/beam/pull/10951
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397086)
Time Spent: 59h  (was: 58h 50m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 59h
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=397084&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397084
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 03/Mar/20 22:30
Start Date: 03/Mar/20 22:30
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #11030: [BEAM-8335] Add 
PCollection to Dataframe logic for InteractiveRunner.
URL: https://github.com/apache/beam/pull/11030#issuecomment-594204586
 
 
   All precommit tests passing now, LGTM. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397084)
Time Spent: 85h 20m  (was: 85h 10m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 85h 20m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=397085&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397085
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 03/Mar/20 22:30
Start Date: 03/Mar/20 22:30
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11030: [BEAM-8335] 
Add PCollection to Dataframe logic for InteractiveRunner.
URL: https://github.com/apache/beam/pull/11030
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397085)
Time Spent: 85.5h  (was: 85h 20m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 85.5h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=397078&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397078
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 03/Mar/20 22:23
Start Date: 03/Mar/20 22:23
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10497: [BEAM-8335] Add the 
ReverseTestStream
URL: https://github.com/apache/beam/pull/10497#issuecomment-594201817
 
 
   Docs error: `
   
   
/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/apache_beam/runners/direct/transform_evaluator.py:docstring
 of apache_beam.runners.direct.transform_evaluator.PairWithTimingEvaluator:1: 
WARNING: py:class reference target not found: 
apache_beam.runners.direct.transform_evaluator._TransformEvaluator
   --
   
   
   `
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397078)
Time Spent: 85h  (was: 84h 50m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 85h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=397079&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397079
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 03/Mar/20 22:23
Start Date: 03/Mar/20 22:23
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10497: [BEAM-8335] Add the 
ReverseTestStream
URL: https://github.com/apache/beam/pull/10497#issuecomment-594201817
 
 
   Docs error: 
   ```
   
   
/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/pycommon/build/srcs/sdks/python/apache_beam/runners/direct/transform_evaluator.py:docstring
 of apache_beam.runners.direct.transform_evaluator.PairWithTimingEvaluator:1: 
WARNING: py:class reference target not found: 
apache_beam.runners.direct.transform_evaluator._TransformEvaluator
   --
   
   
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397079)
Time Spent: 85h 10m  (was: 85h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 85h 10m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-8989) Backwards incompatible change in ParDo.getSideInputs (caught by failure when running Apache Nemo quickstart)

2020-03-03 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía reassigned BEAM-8989:
--

Assignee: (was: Won Wook Song)

> Backwards incompatible change in ParDo.getSideInputs (caught by failure when 
> running Apache Nemo quickstart)
> 
>
> Key: BEAM-8989
> URL: https://issues.apache.org/jira/browse/BEAM-8989
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.16.0, 2.17.0, 2.18.0, 2.19.0
>Reporter: Luke Cwik
>Priority: Critical
>  Labels: backward-incompatible
> Fix For: 2.19.0
>
>
> [PR/9275|https://github.com/apache/beam/pull/9275] changed 
> *ParDo.getSideInputs* from *List* to *Map PCollectionView>* which is backwards incompatible change and was released as 
> part of Beam 2.16.0 erroneously.
> Running the Apache Nemo Quickstart fails with:
>  
> {code:java}
> Exception in thread "main" java.lang.RuntimeException: Translator private 
> static void 
> org.apache.nemo.compiler.frontend.beam.PipelineTranslator.parDoMultiOutputTranslator(org.apache.nemo.compiler.frontend.beam.PipelineTranslationContext,org.apache.beam.sdk.runners.TransformHierarchy$Node,org.apache.beam.sdk.transforms.ParDo$MultiOutput)
>  have failed to translate 
> org.apache.beam.examples.WordCount$ExtractWordsFn@600b9d27Exception in thread 
> "main" java.lang.RuntimeException: Translator private static void 
> org.apache.nemo.compiler.frontend.beam.PipelineTranslator.parDoMultiOutputTranslator(org.apache.nemo.compiler.frontend.beam.PipelineTranslationContext,org.apache.beam.sdk.runners.TransformHierarchy$Node,org.apache.beam.sdk.transforms.ParDo$MultiOutput)
>  have failed to translate 
> org.apache.beam.examples.WordCount$ExtractWordsFn@600b9d27 at 
> org.apache.nemo.compiler.frontend.beam.PipelineTranslator.translatePrimitive(PipelineTranslator.java:113)
>  at 
> org.apache.nemo.compiler.frontend.beam.PipelineVisitor.visitPrimitiveTransform(PipelineVisitor.java:46)
>  at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:665)
>  at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657)
>  at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657)
>  at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657)
>  at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.access$600(TransformHierarchy.java:317)
>  at 
> org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:251)
>  at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:460) at 
> org.apache.nemo.compiler.frontend.beam.NemoRunner.run(NemoRunner.java:80) at 
> org.apache.nemo.compiler.frontend.beam.NemoRunner.run(NemoRunner.java:31) at 
> org.apache.beam.sdk.Pipeline.run(Pipeline.java:315) at 
> org.apache.beam.sdk.Pipeline.run(Pipeline.java:301) at 
> org.apache.beam.examples.WordCount.runWordCount(WordCount.java:185) at 
> org.apache.beam.examples.WordCount.main(WordCount.java:192)Caused by: 
> java.lang.reflect.InvocationTargetException at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.apache.nemo.compiler.frontend.beam.PipelineTranslator.translatePrimitive(PipelineTranslator.java:109)
>  ... 14 moreCaused by: java.lang.NoSuchMethodError: 
> org.apache.beam.sdk.transforms.ParDo$MultiOutput.getSideInputs()Ljava/util/List;
>  at 
> org.apache.nemo.compiler.frontend.beam.PipelineTranslator.parDoMultiOutputTranslator(PipelineTranslator.java:236)
>  ... 19 more{code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8989) Backwards incompatible change in ParDo.getSideInputs (caught by failure when running Apache Nemo quickstart)

2020-03-03 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-8989:
---
Fix Version/s: (was: 2.20.0)
   2.19.0

> Backwards incompatible change in ParDo.getSideInputs (caught by failure when 
> running Apache Nemo quickstart)
> 
>
> Key: BEAM-8989
> URL: https://issues.apache.org/jira/browse/BEAM-8989
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.16.0, 2.17.0, 2.18.0, 2.19.0
>Reporter: Luke Cwik
>Assignee: Won Wook Song
>Priority: Critical
>  Labels: backward-incompatible
> Fix For: 2.19.0
>
>
> [PR/9275|https://github.com/apache/beam/pull/9275] changed 
> *ParDo.getSideInputs* from *List* to *Map PCollectionView>* which is backwards incompatible change and was released as 
> part of Beam 2.16.0 erroneously.
> Running the Apache Nemo Quickstart fails with:
>  
> {code:java}
> Exception in thread "main" java.lang.RuntimeException: Translator private 
> static void 
> org.apache.nemo.compiler.frontend.beam.PipelineTranslator.parDoMultiOutputTranslator(org.apache.nemo.compiler.frontend.beam.PipelineTranslationContext,org.apache.beam.sdk.runners.TransformHierarchy$Node,org.apache.beam.sdk.transforms.ParDo$MultiOutput)
>  have failed to translate 
> org.apache.beam.examples.WordCount$ExtractWordsFn@600b9d27Exception in thread 
> "main" java.lang.RuntimeException: Translator private static void 
> org.apache.nemo.compiler.frontend.beam.PipelineTranslator.parDoMultiOutputTranslator(org.apache.nemo.compiler.frontend.beam.PipelineTranslationContext,org.apache.beam.sdk.runners.TransformHierarchy$Node,org.apache.beam.sdk.transforms.ParDo$MultiOutput)
>  have failed to translate 
> org.apache.beam.examples.WordCount$ExtractWordsFn@600b9d27 at 
> org.apache.nemo.compiler.frontend.beam.PipelineTranslator.translatePrimitive(PipelineTranslator.java:113)
>  at 
> org.apache.nemo.compiler.frontend.beam.PipelineVisitor.visitPrimitiveTransform(PipelineVisitor.java:46)
>  at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:665)
>  at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657)
>  at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657)
>  at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657)
>  at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.access$600(TransformHierarchy.java:317)
>  at 
> org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:251)
>  at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:460) at 
> org.apache.nemo.compiler.frontend.beam.NemoRunner.run(NemoRunner.java:80) at 
> org.apache.nemo.compiler.frontend.beam.NemoRunner.run(NemoRunner.java:31) at 
> org.apache.beam.sdk.Pipeline.run(Pipeline.java:315) at 
> org.apache.beam.sdk.Pipeline.run(Pipeline.java:301) at 
> org.apache.beam.examples.WordCount.runWordCount(WordCount.java:185) at 
> org.apache.beam.examples.WordCount.main(WordCount.java:192)Caused by: 
> java.lang.reflect.InvocationTargetException at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.apache.nemo.compiler.frontend.beam.PipelineTranslator.translatePrimitive(PipelineTranslator.java:109)
>  ... 14 moreCaused by: java.lang.NoSuchMethodError: 
> org.apache.beam.sdk.transforms.ParDo$MultiOutput.getSideInputs()Ljava/util/List;
>  at 
> org.apache.nemo.compiler.frontend.beam.PipelineTranslator.parDoMultiOutputTranslator(PipelineTranslator.java:236)
>  ... 19 more{code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-8989) Backwards incompatible change in ParDo.getSideInputs (caught by failure when running Apache Nemo quickstart)

2020-03-03 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía resolved BEAM-8989.

Resolution: Fixed

> Backwards incompatible change in ParDo.getSideInputs (caught by failure when 
> running Apache Nemo quickstart)
> 
>
> Key: BEAM-8989
> URL: https://issues.apache.org/jira/browse/BEAM-8989
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.16.0, 2.17.0, 2.18.0, 2.19.0
>Reporter: Luke Cwik
>Priority: Critical
>  Labels: backward-incompatible
> Fix For: 2.19.0
>
>
> [PR/9275|https://github.com/apache/beam/pull/9275] changed 
> *ParDo.getSideInputs* from *List* to *Map PCollectionView>* which is backwards incompatible change and was released as 
> part of Beam 2.16.0 erroneously.
> Running the Apache Nemo Quickstart fails with:
>  
> {code:java}
> Exception in thread "main" java.lang.RuntimeException: Translator private 
> static void 
> org.apache.nemo.compiler.frontend.beam.PipelineTranslator.parDoMultiOutputTranslator(org.apache.nemo.compiler.frontend.beam.PipelineTranslationContext,org.apache.beam.sdk.runners.TransformHierarchy$Node,org.apache.beam.sdk.transforms.ParDo$MultiOutput)
>  have failed to translate 
> org.apache.beam.examples.WordCount$ExtractWordsFn@600b9d27Exception in thread 
> "main" java.lang.RuntimeException: Translator private static void 
> org.apache.nemo.compiler.frontend.beam.PipelineTranslator.parDoMultiOutputTranslator(org.apache.nemo.compiler.frontend.beam.PipelineTranslationContext,org.apache.beam.sdk.runners.TransformHierarchy$Node,org.apache.beam.sdk.transforms.ParDo$MultiOutput)
>  have failed to translate 
> org.apache.beam.examples.WordCount$ExtractWordsFn@600b9d27 at 
> org.apache.nemo.compiler.frontend.beam.PipelineTranslator.translatePrimitive(PipelineTranslator.java:113)
>  at 
> org.apache.nemo.compiler.frontend.beam.PipelineVisitor.visitPrimitiveTransform(PipelineVisitor.java:46)
>  at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:665)
>  at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657)
>  at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657)
>  at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657)
>  at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.access$600(TransformHierarchy.java:317)
>  at 
> org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:251)
>  at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:460) at 
> org.apache.nemo.compiler.frontend.beam.NemoRunner.run(NemoRunner.java:80) at 
> org.apache.nemo.compiler.frontend.beam.NemoRunner.run(NemoRunner.java:31) at 
> org.apache.beam.sdk.Pipeline.run(Pipeline.java:315) at 
> org.apache.beam.sdk.Pipeline.run(Pipeline.java:301) at 
> org.apache.beam.examples.WordCount.runWordCount(WordCount.java:185) at 
> org.apache.beam.examples.WordCount.main(WordCount.java:192)Caused by: 
> java.lang.reflect.InvocationTargetException at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.apache.nemo.compiler.frontend.beam.PipelineTranslator.translatePrimitive(PipelineTranslator.java:109)
>  ... 14 moreCaused by: java.lang.NoSuchMethodError: 
> org.apache.beam.sdk.transforms.ParDo$MultiOutput.getSideInputs()Ljava/util/List;
>  at 
> org.apache.nemo.compiler.frontend.beam.PipelineTranslator.parDoMultiOutputTranslator(PipelineTranslator.java:236)
>  ... 19 more{code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9429) Python PostCommit: wordcount_xlang.py hangs

2020-03-03 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9429:
---
Status: Open  (was: Triage Needed)

> Python PostCommit: wordcount_xlang.py hangs
> ---
>
> Key: BEAM-9429
> URL: https://issues.apache.org/jira/browse/BEAM-9429
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, sdk-py-harness, test-failures
>Reporter: Kamil Wasilewski
>Assignee: Heejong Lee
>Priority: Major
>  Labels: currently-failing
>
> `wordcount_xlang.py` receives an exception right after starting and hangs 
> indefenitely. Because of this, Python PostCommit is being aborted.
> Log message:
> {code:java}
> Error: A JNI error has occurred, please check your installation and try again
> Exception in thread "main" java.lang.SecurityException: Invalid signature 
> file digest for Manifest main attributes
>   at 
> sun.security.util.SignatureFileVerifier.processImpl(SignatureFileVerifier.java:330)
>   at 
> sun.security.util.SignatureFileVerifier.process(SignatureFileVerifier.java:263)
>   at java.util.jar.JarVerifier.processEntry(JarVerifier.java:275)
>   at java.util.jar.JarVerifier.update(JarVerifier.java:230)
>   at java.util.jar.JarFile.initializeVerifier(JarFile.java:383)
>   at java.util.jar.JarFile.ensureInitialization(JarFile.java:612)
>   at 
> java.util.jar.JavaUtilJarAccessImpl.ensureInitialization(JavaUtilJarAccessImpl.java:69)
>   at sun.misc.URLClassPath$JarLoader$2.getManifest(URLClassPath.java:991)
>   at java.net.URLClassLoader.defineClass(URLClassLoader.java:451)
>   at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>   at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:495)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-2546) Create InfluxDbIO

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2546?focusedWorklogId=397069&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397069
 ]

ASF GitHub Bot logged work on BEAM-2546:


Author: ASF GitHub Bot
Created on: 03/Mar/20 22:13
Start Date: 03/Mar/20 22:13
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #11028: BEAM-2546 Beam IO 
for InfluxDB
URL: https://github.com/apache/beam/pull/11028#issuecomment-594197974
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397069)
Time Spent: 7.5h  (was: 7h 20m)

> Create InfluxDbIO
> -
>
> Key: BEAM-2546
> URL: https://issues.apache.org/jira/browse/BEAM-2546
> Project: Beam
>  Issue Type: New Feature
>  Components: io-ideas
>Reporter: Jean-Baptiste Onofré
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-2546) Create InfluxDbIO

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2546?focusedWorklogId=397070&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397070
 ]

ASF GitHub Bot logged work on BEAM-2546:


Author: ASF GitHub Bot
Created on: 03/Mar/20 22:13
Start Date: 03/Mar/20 22:13
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #11028: BEAM-2546 Beam IO 
for InfluxDB
URL: https://github.com/apache/beam/pull/11028#issuecomment-594197974
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397070)
Time Spent: 7h 40m  (was: 7.5h)

> Create InfluxDbIO
> -
>
> Key: BEAM-2546
> URL: https://issues.apache.org/jira/browse/BEAM-2546
> Project: Beam
>  Issue Type: New Feature
>  Components: io-ideas
>Reporter: Jean-Baptiste Onofré
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7304) Twister2 Beam runner

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7304?focusedWorklogId=397065&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397065
 ]

ASF GitHub Bot logged work on BEAM-7304:


Author: ASF GitHub Bot
Created on: 03/Mar/20 22:10
Start Date: 03/Mar/20 22:10
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #10888: [BEAM-7304] Twister2 
Beam runner
URL: https://github.com/apache/beam/pull/10888#issuecomment-594196873
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397065)
Time Spent: 13h 10m  (was: 13h)

> Twister2 Beam runner
> 
>
> Key: BEAM-7304
> URL: https://issues.apache.org/jira/browse/BEAM-7304
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-ideas
>Reporter: Pulasthi Wickramasinghe
>Assignee: Pulasthi Wickramasinghe
>Priority: Minor
> Fix For: 2.21.0
>
>  Time Spent: 13h 10m
>  Remaining Estimate: 0h
>
> Twister2 is a big data framework which supports both batch and stream 
> processing [1] [2]. The goal is to develop an beam runner for Twister2. 
> [1] [https://github.com/DSC-SPIDAL/twister2]
> [2] [https://twister2.org/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-7333) Select need the ability to rename fields

2020-03-03 Thread Reuven Lax (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reuven Lax resolved BEAM-7333.
--
Fix Version/s: 2.20.0
   Resolution: Fixed

> Select need the ability to rename fields
> 
>
> Key: BEAM-7333
> URL: https://issues.apache.org/jira/browse/BEAM-7333
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Priority: Major
> Fix For: 2.20.0
>
>
> There needs to be a way to rename fields in Select - i.e. 
> Select("field1").as("field2").
> While in many cases the RenameFields transform could be used, that doesn't 
> help when a Select would return conflicting names - i.e. Select("a.c", 
> "b.c"). In this case the Select transform would fail, because the output 
> schema would have two fields of the same name. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=397054&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397054
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 03/Mar/20 21:46
Start Date: 03/Mar/20 21:46
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10497: [BEAM-8335] Add the 
ReverseTestStream
URL: https://github.com/apache/beam/pull/10497#issuecomment-594186429
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397054)
Time Spent: 84h 50m  (was: 84h 40m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 84h 50m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=397056&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397056
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 03/Mar/20 21:46
Start Date: 03/Mar/20 21:46
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10979: [BEAM-8841] Support 
writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-594186585
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397056)
Time Spent: 5.5h  (was: 5h 20m)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=397053&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397053
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 03/Mar/20 21:36
Start Date: 03/Mar/20 21:36
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10497: [BEAM-8335] Add the 
ReverseTestStream
URL: https://github.com/apache/beam/pull/10497#issuecomment-594182116
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397053)
Time Spent: 84h 40m  (was: 84.5h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 84h 40m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=397047&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397047
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 03/Mar/20 21:22
Start Date: 03/Mar/20 21:22
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #11022: [BEAM-7746] 
Resolve typing issues in filesystem
URL: https://github.com/apache/beam/pull/11022#discussion_r387300879
 
 

 ##
 File path: sdks/python/apache_beam/io/filesystemio.py
 ##
 @@ -175,10 +190,31 @@ def readall(self):
   res.append(data)
 return b''.join(res)
 
+  @classmethod
+  def create_buffered(cls,  # type: Type[DownloaderStreamT]
+  downloader,  # type: Downloader
+  read_buffer_size=io.DEFAULT_BUFFER_SIZE, mode='rb'):
+# type: (...) -> BinaryIO
+# In the python3 typeshed, io.BufferedReader and io.BufferedWriter do not
+# inherit from typing.BinaryIO, and BinaryIO is not a Protocol, so a class
+# must inherit from it.  We could roll our own BinaryIO Protocol, but the
+# stubs for io.Buffered* do not have the required 'mode' or 'name' attrs to
+# meet the protocol (the classes seem to expose 'mode' and 'name'
+# conditionally, if their underlying io.RawIOBase possess the attributes).
+# Thus, it is necessary to cast these types to BinaryIO either way.
+return cast(
+BinaryIO,
+io.BufferedReader(
+cls(downloader, read_buffer_size=read_buffer_size, mode=mode),
+buffer_size=read_buffer_size))
 
 Review comment:
   should I create separate args for the buffer size of the `DownloaderStream` 
and the `io.BufferedReader`?   The treatment of these varies between 
filesystems. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397047)
Time Spent: 71h 10m  (was: 71h)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 71h 10m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9429) Python PostCommit: wordcount_xlang.py hangs

2020-03-03 Thread Chamikara Madhusanka Jayalath (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17050567#comment-17050567
 ] 

Chamikara Madhusanka Jayalath commented on BEAM-9429:
-

Heejong, this seems to be related to artifact staging.

> Python PostCommit: wordcount_xlang.py hangs
> ---
>
> Key: BEAM-9429
> URL: https://issues.apache.org/jira/browse/BEAM-9429
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, sdk-py-harness, test-failures
>Reporter: Kamil Wasilewski
>Priority: Major
>  Labels: currently-failing
>
> `wordcount_xlang.py` receives an exception right after starting and hangs 
> indefenitely. Because of this, Python PostCommit is being aborted.
> Log message:
> {code:java}
> Error: A JNI error has occurred, please check your installation and try again
> Exception in thread "main" java.lang.SecurityException: Invalid signature 
> file digest for Manifest main attributes
>   at 
> sun.security.util.SignatureFileVerifier.processImpl(SignatureFileVerifier.java:330)
>   at 
> sun.security.util.SignatureFileVerifier.process(SignatureFileVerifier.java:263)
>   at java.util.jar.JarVerifier.processEntry(JarVerifier.java:275)
>   at java.util.jar.JarVerifier.update(JarVerifier.java:230)
>   at java.util.jar.JarFile.initializeVerifier(JarFile.java:383)
>   at java.util.jar.JarFile.ensureInitialization(JarFile.java:612)
>   at 
> java.util.jar.JavaUtilJarAccessImpl.ensureInitialization(JavaUtilJarAccessImpl.java:69)
>   at sun.misc.URLClassPath$JarLoader$2.getManifest(URLClassPath.java:991)
>   at java.net.URLClassLoader.defineClass(URLClassLoader.java:451)
>   at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>   at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:495)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-9429) Python PostCommit: wordcount_xlang.py hangs

2020-03-03 Thread Chamikara Madhusanka Jayalath (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chamikara Madhusanka Jayalath reassigned BEAM-9429:
---

Assignee: Heejong Lee

> Python PostCommit: wordcount_xlang.py hangs
> ---
>
> Key: BEAM-9429
> URL: https://issues.apache.org/jira/browse/BEAM-9429
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, sdk-py-harness, test-failures
>Reporter: Kamil Wasilewski
>Assignee: Heejong Lee
>Priority: Major
>  Labels: currently-failing
>
> `wordcount_xlang.py` receives an exception right after starting and hangs 
> indefenitely. Because of this, Python PostCommit is being aborted.
> Log message:
> {code:java}
> Error: A JNI error has occurred, please check your installation and try again
> Exception in thread "main" java.lang.SecurityException: Invalid signature 
> file digest for Manifest main attributes
>   at 
> sun.security.util.SignatureFileVerifier.processImpl(SignatureFileVerifier.java:330)
>   at 
> sun.security.util.SignatureFileVerifier.process(SignatureFileVerifier.java:263)
>   at java.util.jar.JarVerifier.processEntry(JarVerifier.java:275)
>   at java.util.jar.JarVerifier.update(JarVerifier.java:230)
>   at java.util.jar.JarFile.initializeVerifier(JarFile.java:383)
>   at java.util.jar.JarFile.ensureInitialization(JarFile.java:612)
>   at 
> java.util.jar.JavaUtilJarAccessImpl.ensureInitialization(JavaUtilJarAccessImpl.java:69)
>   at sun.misc.URLClassPath$JarLoader$2.getManifest(URLClassPath.java:991)
>   at java.net.URLClassLoader.defineClass(URLClassLoader.java:451)
>   at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>   at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:495)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=397044&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397044
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 03/Mar/20 21:08
Start Date: 03/Mar/20 21:08
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on issue #10497: [BEAM-8335] Add 
the ReverseTestStream
URL: https://github.com/apache/beam/pull/10497#issuecomment-594169195
 
 
   Taking a look. For some reason it's failing with a pickling error:
   PicklingError: Can't pickle : it's not found as 
apache_beam.testing.test_stream.Format
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397044)
Time Spent: 84.5h  (was: 84h 20m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 84.5h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9431) test_streaming_pipeline_returns_expected_user_metrics_fnapi_it is not supported with streaming engine

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9431?focusedWorklogId=397043&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397043
 ]

ASF GitHub Bot logged work on BEAM-9431:


Author: ASF GitHub Bot
Created on: 03/Mar/20 21:07
Start Date: 03/Mar/20 21:07
Worklog Time Spent: 10m 
  Work Description: ananvay commented on issue #11033: [BEAM-9431] Remove 
ReadFromPubSub/Read-out0-ElementCount from the met…
URL: https://github.com/apache/beam/pull/11033#issuecomment-594168837
 
 
   LGTM. Thanks Ankur!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397043)
Time Spent: 0.5h  (was: 20m)

> test_streaming_pipeline_returns_expected_user_metrics_fnapi_it is not 
> supported with streaming engine
> -
>
> Key: BEAM-9431
> URL: https://issues.apache.org/jira/browse/BEAM-9431
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow, testing
>Reporter: Ankur Goenka
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> test_streaming_pipeline_returns_expected_user_metrics_fnapi_it is failing on 
> Dataflow V2 as ReadFromPubSub/Read-out0-ElementCount is not implemented in 
> with streaming engine.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


  1   2   3   >