[jira] [Closed] (BEAM-9461) CLONE - To use ByteArrayOutput/InputStream without synchronization
[ https://issues.apache.org/jira/browse/BEAM-9461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kyoungha Min closed BEAM-9461.
--
Fix Version/s: Not applicable
Resolution: Abandoned

> CLONE - To use ByteArrayOutput/InputStream without synchronization
> --
>
> Key: BEAM-9461
> URL: https://issues.apache.org/jira/browse/BEAM-9461
> Project: Beam
> Issue Type: Wish
> Components: sdk-java-core
> Reporter: Kyoungha Min
> Priority: Minor
> Fix For: Not applicable
>
> It would be nice to see Beam use custom ByteArrayInput/OutputStream classes
> without synchronization. Beam currently uses `ThreadLocal`, so a thread-safe
> stream seems unnecessary, and no stream should ever be accessed by more than
> one thread in the first place.
> Simply getting rid of the synchronized keyword will speed up single-byte
> access roughly 500 times. Something like org.apache.beam.sdk.util.VarInt
> would benefit significantly.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9461) CLONE - To use ByteArrayOutput/InputStream without synchronization
Kyoungha Min created BEAM-9461:
--

Summary: CLONE - To use ByteArrayOutput/InputStream without synchronization
Key: BEAM-9461
URL: https://issues.apache.org/jira/browse/BEAM-9461
Project: Beam
Issue Type: Wish
Components: sdk-java-core
Reporter: Kyoungha Min

It would be nice to see Beam use custom ByteArrayInput/OutputStream classes
without synchronization. Beam currently uses `ThreadLocal`, so a thread-safe
stream seems unnecessary, and no stream should ever be accessed by more than
one thread in the first place.
Simply getting rid of the synchronized keyword will speed up single-byte
access roughly 500 times. Something like org.apache.beam.sdk.util.VarInt
would benefit significantly.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
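A minimal sketch of the kind of stream this wish describes, for illustration only. The class name UnsyncByteArrayOutputStream and its methods are hypothetical and are not part of Beam; the only difference that matters here is that write(int) carries no synchronized keyword, unlike java.io.ByteArrayOutputStream. Whether it gives the speed-up reported above would have to be measured.

```java
import java.io.OutputStream;
import java.util.Arrays;

/** Hypothetical class for illustration; not part of Beam. Not thread-safe by design. */
final class UnsyncByteArrayOutputStream extends OutputStream {
  private byte[] buf;
  private int count;

  UnsyncByteArrayOutputStream(int initialCapacity) {
    buf = new byte[Math.max(1, initialCapacity)];
  }

  // Grows the backing array geometrically so repeated single-byte writes stay cheap.
  private void ensureCapacity(int minCapacity) {
    if (minCapacity > buf.length) {
      buf = Arrays.copyOf(buf, Math.max(buf.length * 2, minCapacity));
    }
  }

  @Override
  public void write(int b) { // no synchronized keyword here
    ensureCapacity(count + 1);
    buf[count++] = (byte) b;
  }

  @Override
  public void write(byte[] b, int off, int len) {
    if (off < 0 || len < 0 || len > b.length - off) {
      throw new IndexOutOfBoundsException();
    }
    ensureCapacity(count + len);
    System.arraycopy(b, off, buf, count, len);
    count += len;
  }

  /** Returns a copy of the bytes written so far. */
  byte[] toByteArray() {
    return Arrays.copyOf(buf, count);
  }

  int size() {
    return count;
  }

  public static void main(String[] args) {
    UnsyncByteArrayOutputStream out = new UnsyncByteArrayOutputStream(4);
    for (int i = 0; i < 10; i++) {
      out.write(i); // single-byte path, the case the issue is concerned with
    }
    System.out.println(out.size() + " bytes: " + Arrays.toString(out.toByteArray()));
  }
}
```

Such a class is only safe when each instance stays on one thread, which is the assumption the issue states.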
[jira] [Work logged] (BEAM-9035) BIP-1: Typed options for Row Schema and Fields
[ https://issues.apache.org/jira/browse/BEAM-9035?focusedWorklogId=398962&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398962 ]

ASF GitHub Bot logged work on BEAM-9035:

Author: ASF GitHub Bot
Created on: 06/Mar/20 07:04
Start Date: 06/Mar/20 07:04
Worklog Time Spent: 10m
Work Description: alexvanboxel commented on issue #10413: [BEAM-9035] Typed options for Row Schema and Field
URL: https://github.com/apache/beam/pull/10413#issuecomment-595630682

@reuvenlax can you have a look at the changes?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 398962)
Time Spent: 5h 50m (was: 5h 40m)

> BIP-1: Typed options for Row Schema and Fields
> --
>
> Key: BEAM-9035
> URL: https://issues.apache.org/jira/browse/BEAM-9035
> Project: Beam
> Issue Type: Sub-task
> Components: sdk-java-core
> Reporter: Alex Van Boxel
> Assignee: Alex Van Boxel
> Priority: Major
> Fix For: 2.19.0
>
> Time Spent: 5h 50m
> Remaining Estimate: 0h
>
> This is the first issue of a multipart commit: this ticket implements the
> basic infrastructure of options on row and field.
> Full explanation:
> Introduce the concept of Options in Beam Schemas to add extra context to
> fields and schema. In contrast to metadata, options would be added to
> fields, logical types and rows. In the options, schema converters can add
> options/annotations/decorators that were in the original schema; this context
> can be used in the rest of the pipeline for specific transformations or to
> augment the end schema in the target output.
> Examples of options are:
> * informational: like the source of the data, ...
> * drive decisions further in the pipeline: flatten a row into another,
> rename a field, ...
> * influence something in the output: like cluster index, primary key, ...
> * logical type information
> An option is a key/typed value combination. The advantages of typed
> values are:
> * Strongly typed options would give *Logical Types a portable way* to
> carry structured information that could be shared across different languages.
> * This could keep the type intact when mapping from formats that have
> strongly typed options (example: Protobuf).
> This is part of a multi-ticket implementation. The following tickets are
> related:
> # Typed options for Row Schema and Fields
> # Convert Proto Options to Beam Schema options
> # Convert Avro extra information for Beam string options
> # Replace meta data with Logical Type options
> # Extract meta data in Calcite SQL to Beam options
> # Extract meta data in Zeta SQL to Beam options
> # Add java example of using option in a transform
> This feature was discussed with Reuven Lax and Brian Hulette.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
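To make the "key/typed value" idea in this ticket concrete, here is a small sketch. It is purely illustrative: the class TypedOptionsSketch, its set/get methods, and the option names are hypothetical, and this is not Beam's Schema API. It only shows why storing the declared type next to the value lets a reader get the value back with its type intact, or fail loudly when the type does not match.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of key/typed-value options; not Beam's Schema API. */
final class TypedOptionsSketch {
  /** A value stored together with the type it was declared with. */
  static final class Option {
    final Class<?> type;
    final Object value;

    Option(Class<?> type, Object value) {
      this.type = type;
      this.value = value;
    }
  }

  private final Map<String, Option> options = new LinkedHashMap<>();

  /** Stores a value under a name, remembering its declared type. */
  <T> TypedOptionsSketch set(String name, Class<T> type, T value) {
    options.put(name, new Option(type, type.cast(value)));
    return this;
  }

  /** Returns the value only if it was stored with the requested type. */
  <T> T get(String name, Class<T> type) {
    Option option = options.get(name);
    if (option == null) {
      throw new IllegalArgumentException("No option named " + name);
    }
    if (!option.type.equals(type)) {
      throw new IllegalArgumentException(
          "Option " + name + " has type " + option.type.getSimpleName());
    }
    return type.cast(option.value);
  }

  public static void main(String[] args) {
    // Option names are made up for the example.
    TypedOptionsSketch fieldOptions = new TypedOptionsSketch()
        .set("primary_key", Boolean.class, true)
        .set("source", String.class, "orders.proto");
    System.out.println(fieldOptions.get("primary_key", Boolean.class));
    System.out.println(fieldOptions.get("source", String.class));
  }
}
```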
[jira] [Updated] (BEAM-9325) UnownedOutputStream not overriding Array write method.
[ https://issues.apache.org/jira/browse/BEAM-9325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kyoungha Min updated BEAM-9325:
---
Description:
org.apache.beam.sdk.util.UnownedOutputStream does not override the method
`public void write(byte b[], int off, int len) throws IOException`,
resulting in extremely slow writes.
This is because the version inherited from `java.io.FilterOutputStream` writes the range one byte at a time.

was:
org.apache.beam.sdk.util.UnownedOutputStream does not override the method
`public void write(byte b[], int off, int len) throws IOException`,
resulting in extremely slow writes.
This is because the version inherited from `java.io.FilterOutputStream` writes the range one byte at a time.

The throughput degradation is significant enough to classify this as a bug.

Anything that uses `UnownedInputStream`, including `CoderUtils.decodeFromByteArray`, `CoderUtils.decodeFromSafeStream` etc., is extremely slow.

Issue Type: Improvement (was: Bug)

> UnownedOutputStream not overriding Array write method.
> --
>
> Key: BEAM-9325
> URL: https://issues.apache.org/jira/browse/BEAM-9325
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-core
> Affects Versions: 2.19.0
> Reporter: Kyoungha Min
> Priority: Major
> Fix For: Not applicable
>
> Original Estimate: 1m
> Remaining Estimate: 1m
>
> org.apache.beam.sdk.util.UnownedOutputStream does not override the method
> `public void write(byte b[], int off, int len) throws IOException`,
> resulting in extremely slow writes.
> This is because the version inherited from `java.io.FilterOutputStream`
> writes the range one byte at a time.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9325) UnownedOutputStream not overriding Array write method.
[ https://issues.apache.org/jira/browse/BEAM-9325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kyoungha Min updated BEAM-9325:
---
Description:
org.apache.beam.sdk.util.UnownedOutputStream does not override the method
`public void write(byte b[], int off, int len) throws IOException`,
resulting in extremely slow writes.
This is because the version inherited from `java.io.FilterOutputStream` writes the range one byte at a time.

The throughput degradation is significant enough to classify this as a bug.

Anything that uses `UnownedInputStream`, including `CoderUtils.decodeFromByteArray`, `CoderUtils.decodeFromSafeStream` etc., is extremely slow.

was:
org.apache.beam.sdk.util.UnownedOutputStream does not override the method
`public void write(byte b[], int off, int len) throws IOException`,
resulting in extremely slow writes.
This is because the version inherited from `java.io.FilterOutputStream` writes the range one byte at a time.

Issue Type: Bug (was: Improvement)

> UnownedOutputStream not overriding Array write method.
> --
>
> Key: BEAM-9325
> URL: https://issues.apache.org/jira/browse/BEAM-9325
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Affects Versions: 2.19.0
> Reporter: Kyoungha Min
> Priority: Major
> Fix For: Not applicable
>
> Original Estimate: 1m
> Remaining Estimate: 1m
>
> org.apache.beam.sdk.util.UnownedOutputStream does not override the method
> `public void write(byte b[], int off, int len) throws IOException`,
> resulting in extremely slow writes.
> This is because the version inherited from `java.io.FilterOutputStream`
> writes the range one byte at a time.
>
> The throughput degradation is significant enough to classify this as a bug.
>
> Anything that uses `UnownedInputStream`, including
> `CoderUtils.decodeFromByteArray`, `CoderUtils.decodeFromSafeStream` etc., is
> extremely slow.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
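The behavior this issue describes can be reproduced with the JDK alone. In the sketch below, the inherited java.io.FilterOutputStream.write(byte[], int, int) forwards each byte through write(int), while a subclass that overrides it hands the whole range to the delegate in one call. CountingSink, BulkForwardingOutputStream and BulkWriteDemo are hypothetical names for illustration; this is not Beam's UnownedOutputStream, and it does not show the fix that was or was not applied there.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

final class BulkWriteDemo {
  public static void main(String[] args) throws IOException {
    byte[] payload = new byte[1000];

    // Plain FilterOutputStream: the array write is broken into single-byte writes.
    CountingSink plainSink = new CountingSink();
    new FilterOutputStream(plainSink).write(payload, 0, payload.length);

    // Wrapper with the override: the array reaches the delegate in one call.
    CountingSink bulkSink = new CountingSink();
    new BulkForwardingOutputStream(bulkSink).write(payload, 0, payload.length);

    System.out.println("FilterOutputStream single-byte writes: " + plainSink.singleByteWrites);
    System.out.println("Override bulk writes: " + bulkSink.bulkWrites);
  }
}

/** Hypothetical sink that only counts how it is called. */
final class CountingSink extends OutputStream {
  int singleByteWrites;
  int bulkWrites;

  @Override
  public void write(int b) {
    singleByteWrites++;
  }

  @Override
  public void write(byte[] b, int off, int len) {
    bulkWrites++;
  }
}

/** Hypothetical wrapper showing the override; not Beam's UnownedOutputStream. */
final class BulkForwardingOutputStream extends FilterOutputStream {
  BulkForwardingOutputStream(OutputStream delegate) {
    super(delegate);
  }

  @Override
  public void write(byte[] b, int off, int len) throws IOException {
    // Hand the whole range to the delegate instead of looping over write(int).
    out.write(b, off, len);
  }
}
```

Because FilterOutputStream.write(byte[]) delegates to the three-argument method, the single override covers both array overloads.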
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398951&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398951 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 06/Mar/20 05:55 Start Date: 06/Mar/20 05:55 Worklog Time Spent: 10m Work Description: pabloem commented on issue #11032: [BEAM-8335] Display rather than logging when is_in_notebook. URL: https://github.com/apache/beam/pull/11032#issuecomment-595612913 thanks Ning! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398951) Time Spent: 99h 20m (was: 99h 10m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 99h 20m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398950&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398950 ]

ASF GitHub Bot logged work on BEAM-8335:

Author: ASF GitHub Bot
Created on: 06/Mar/20 05:55
Start Date: 06/Mar/20 05:55
Worklog Time Spent: 10m
Work Description: pabloem commented on pull request #11032: [BEAM-8335] Display rather than logging when is_in_notebook.
URL: https://github.com/apache/beam/pull/11032

Issue Time Tracking
---
Worklog Id: (was: 398950)
Time Spent: 99h 10m (was: 99h)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398934&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398934 ]

ASF GitHub Bot logged work on BEAM-8335:

Author: ASF GitHub Bot
Created on: 06/Mar/20 05:14
Start Date: 06/Mar/20 05:14
Worklog Time Spent: 10m
Work Description: pabloem commented on issue #11032: [BEAM-8335] Display rather than logging when is_in_notebook.
URL: https://github.com/apache/beam/pull/11032#issuecomment-595604106

Run PythonFormatter PreCommit

Issue Time Tracking
---
Worklog Id: (was: 398934)
Time Spent: 99h (was: 98h 50m)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398933&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398933 ]

ASF GitHub Bot logged work on BEAM-8335:

Author: ASF GitHub Bot
Created on: 06/Mar/20 05:13
Start Date: 06/Mar/20 05:13
Worklog Time Spent: 10m
Work Description: pabloem commented on issue #11032: [BEAM-8335] Display rather than logging when is_in_notebook.
URL: https://github.com/apache/beam/pull/11032#issuecomment-595603969

Run Portable_Python PreCommit

Issue Time Tracking
---
Worklog Id: (was: 398933)
Time Spent: 98h 50m (was: 98h 40m)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398932&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398932 ]

ASF GitHub Bot logged work on BEAM-8335:

Author: ASF GitHub Bot
Created on: 06/Mar/20 05:13
Start Date: 06/Mar/20 05:13
Worklog Time Spent: 10m
Work Description: pabloem commented on issue #11032: [BEAM-8335] Display rather than logging when is_in_notebook.
URL: https://github.com/apache/beam/pull/11032#issuecomment-595603902

Run Portable_Python PreCommit

Issue Time Tracking
---
Worklog Id: (was: 398932)
Time Spent: 98h 40m (was: 98.5h)
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=398908&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398908 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 06/Mar/20 04:29 Start Date: 06/Mar/20 04:29 Worklog Time Spent: 10m Work Description: chadrik commented on issue #11038: [BEAM-7746] More typing fixes URL: https://github.com/apache/beam/pull/11038#issuecomment-595594678 Run Portable_Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398908) Time Spent: 71h 50m (was: 71h 40m) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 71h 50m > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9460) Unable to Start DataFlow Runner in latest version 2.19
[ https://issues.apache.org/jira/browse/BEAM-9460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

karthik updated BEAM-9460:
--
Component/s: (was: beam-community)
             runner-dataflow
             dependencies

> Unable to Start DataFlow Runner in latest version 2.19
> --
>
> Key: BEAM-9460
> URL: https://issues.apache.org/jira/browse/BEAM-9460
> Project: Beam
> Issue Type: Bug
> Components: dependencies, runner-dataflow
> Affects Versions: 2.19.0
> Reporter: karthik
> Assignee: Aizhamal Nurmamat kyzy
> Priority: Major
>
> *Unable to Start DataFlow Runner. It was working in old version 2.18.
> Exception trace in the latest version*
> INFO: No stagingLocation provided, falling back to gcpTempLocation
> [WARNING]
> java.lang.RuntimeException: Failed to construct instance from factory method DataflowRunner#fromOptions(interface org.apache.beam.sdk.options.PipelineOptions)
> at org.apache.beam.sdk.util.InstanceBuilder.buildFromMethod (InstanceBuilder.java:224)
> at org.apache.beam.sdk.util.InstanceBuilder.build (InstanceBuilder.java:155)
> at org.apache.beam.sdk.PipelineRunner.fromOptions (PipelineRunner.java:55)
> at org.apache.beam.sdk.Pipeline.create (Pipeline.java:147)
> at com.pearson.gap.analytics.activeusers.ActiveUsersCube.run (ActiveUsersCube.java:84)
> at com.pearson.gap.analytics.activeusers.ActiveUsersCube.main (ActiveUsersCube.java:109)
> at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke (Method.java:498)
> at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:282)
> at java.lang.Thread.run (Thread.java:745)
> Caused by: java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke (Method.java:498)
> at org.apache.beam.sdk.util.InstanceBuilder.buildFromMethod (InstanceBuilder.java:214)
> at org.apache.beam.sdk.util.InstanceBuilder.build (InstanceBuilder.java:155)
> at org.apache.beam.sdk.PipelineRunner.fromOptions (PipelineRunner.java:55)
> at org.apache.beam.sdk.Pipeline.create (Pipeline.java:147)
> at com.pearson.gap.analytics.activeusers.ActiveUsersCube.run (ActiveUsersCube.java:84)
> at com.pearson.gap.analytics.activeusers.ActiveUsersCube.main (ActiveUsersCube.java:109)
> at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke (Method.java:498)
> at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:282)
> at java.lang.Thread.run (Thread.java:745)
> Caused by: java.lang.IllegalArgumentException: No files to stage has been found.
> at org.apache.beam.runners.dataflow.DataflowRunner.fromOptions (DataflowRunner.java:281)
> at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke (Method.java:498)
> at org.apache.beam.sdk.util.InstanceBuilder.buildFromMethod (InstanceBuilder.java:214)
> at org.apache.beam.sdk.util.InstanceBuilder.build (InstanceBuilder.java:155)
> at org.apache.beam.sdk.PipelineRunner.fromOptions (PipelineRunner.java:55)
> at org.apache.beam.sdk.Pipeline.create (Pipeline.java:147)
> at com.pearson.gap.analytics.activeusers.ActiveUsersCube.run (ActiveUsersCube.java:84)
> at com.pearson.gap.analytics.activeusers.ActiveUsersCube.main (ActiveUsersCube.java:109)
> at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke (Method.java:498)
> at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:282)
> at java.lang.Thread.run (Thread.java:745)

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow
[ https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=398872&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398872 ]

ASF GitHub Bot logged work on BEAM-7926:

Author: ASF GitHub Bot
Created on: 06/Mar/20 02:37
Start Date: 06/Mar/20 02:37
Worklog Time Spent: 10m
Work Description: aaltay commented on issue #11020: [BEAM-7926] Update Data Visualization
URL: https://github.com/apache/beam/pull/11020#issuecomment-595568670

Run PythonLint PreCommit

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 398872)
Time Spent: 57h 40m (was: 57.5h)

> Show PCollection with Interactive Beam in a data-centric user flow
> --
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
> Issue Type: New Feature
> Components: runner-py-interactive
> Reporter: Ning Kang
> Assignee: Ning Kang
> Priority: Major
> Time Spent: 57h 40m
> Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection
> with Interactive Beam.
> Say an Interactive Beam pipeline is defined as
>
> {code:java}
> p = beam.Pipeline(InteractiveRunner())
> pcoll = p | 'Transform' >> transform()
> pcoll2 = ...
> pcoll3 = ...{code}
> The user can call a single function and get auto-magical charting of the data,
> e.g.,
> {code:java}
> show(pcoll, pcoll2)
> {code}
> Throughout the process, a pipeline fragment is built that includes only the
> transforms necessary to produce the desired pcolls (pcoll and pcoll2), and
> that fragment is executed.
> This makes the Interactive Beam user flow data-centric.
>
> Detailed
> [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz].

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9460) Unable to Start DataFlow Runner in latest version 2.19
karthik created BEAM-9460:
-

Summary: Unable to Start DataFlow Runner in latest version 2.19
Key: BEAM-9460
URL: https://issues.apache.org/jira/browse/BEAM-9460
Project: Beam
Issue Type: Bug
Components: beam-community
Affects Versions: 2.19.0
Reporter: karthik
Assignee: Aizhamal Nurmamat kyzy

*Unable to Start DataFlow Runner. It was working in old version 2.18.
Exception trace in the latest version*
INFO: No stagingLocation provided, falling back to gcpTempLocation
[WARNING]
java.lang.RuntimeException: Failed to construct instance from factory method DataflowRunner#fromOptions(interface org.apache.beam.sdk.options.PipelineOptions)
at org.apache.beam.sdk.util.InstanceBuilder.buildFromMethod (InstanceBuilder.java:224)
at org.apache.beam.sdk.util.InstanceBuilder.build (InstanceBuilder.java:155)
at org.apache.beam.sdk.PipelineRunner.fromOptions (PipelineRunner.java:55)
at org.apache.beam.sdk.Pipeline.create (Pipeline.java:147)
at com.pearson.gap.analytics.activeusers.ActiveUsersCube.run (ActiveUsersCube.java:84)
at com.pearson.gap.analytics.activeusers.ActiveUsersCube.main (ActiveUsersCube.java:109)
at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke (Method.java:498)
at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:282)
at java.lang.Thread.run (Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke (Method.java:498)
at org.apache.beam.sdk.util.InstanceBuilder.buildFromMethod (InstanceBuilder.java:214)
at org.apache.beam.sdk.util.InstanceBuilder.build (InstanceBuilder.java:155)
at org.apache.beam.sdk.PipelineRunner.fromOptions (PipelineRunner.java:55)
at org.apache.beam.sdk.Pipeline.create (Pipeline.java:147)
at com.pearson.gap.analytics.activeusers.ActiveUsersCube.run (ActiveUsersCube.java:84)
at com.pearson.gap.analytics.activeusers.ActiveUsersCube.main (ActiveUsersCube.java:109)
at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke (Method.java:498)
at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:282)
at java.lang.Thread.run (Thread.java:745)
Caused by: java.lang.IllegalArgumentException: No files to stage has been found.
at org.apache.beam.runners.dataflow.DataflowRunner.fromOptions (DataflowRunner.java:281)
at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke (Method.java:498)
at org.apache.beam.sdk.util.InstanceBuilder.buildFromMethod (InstanceBuilder.java:214)
at org.apache.beam.sdk.util.InstanceBuilder.build (InstanceBuilder.java:155)
at org.apache.beam.sdk.PipelineRunner.fromOptions (PipelineRunner.java:55)
at org.apache.beam.sdk.Pipeline.create (Pipeline.java:147)
at com.pearson.gap.analytics.activeusers.ActiveUsersCube.run (ActiveUsersCube.java:84)
at com.pearson.gap.analytics.activeusers.ActiveUsersCube.main (ActiveUsersCube.java:109)
at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke (Method.java:498)
at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:282)
at java.lang.Thread.run (Thread.java:745)

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-8841) Add ability to perform BigQuery file loads using avro
[ https://issues.apache.org/jira/browse/BEAM-8841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chun Yang resolved BEAM-8841. - Resolution: Fixed > Add ability to perform BigQuery file loads using avro > - > > Key: BEAM-8841 > URL: https://issues.apache.org/jira/browse/BEAM-8841 > Project: Beam > Issue Type: Improvement > Components: io-py-gcp >Reporter: Chun Yang >Assignee: Chun Yang >Priority: Minor > Fix For: 2.21.0 > > Time Spent: 8h > Remaining Estimate: 0h > > Currently, JSON format is used for file loads into BigQuery in the Python > SDK. JSON has some disadvantages including size of serialized data and > inability to represent NaN and infinity float values. > BigQuery supports loading files in avro format, which can overcome these > disadvantages. The Java SDK already supports loading files using avro format > (BEAM-2879) so it makes sense to support it in the Python SDK as well. > The change will be somewhere around > [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (BEAM-8841) Add ability to perform BigQuery file loads using avro
[ https://issues.apache.org/jira/browse/BEAM-8841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-8841 started by Chun Yang. --- > Add ability to perform BigQuery file loads using avro > - > > Key: BEAM-8841 > URL: https://issues.apache.org/jira/browse/BEAM-8841 > Project: Beam > Issue Type: Improvement > Components: io-py-gcp >Reporter: Chun Yang >Assignee: Chun Yang >Priority: Minor > Fix For: 2.21.0 > > Time Spent: 8h > Remaining Estimate: 0h > > Currently, JSON format is used for file loads into BigQuery in the Python > SDK. JSON has some disadvantages including size of serialized data and > inability to represent NaN and infinity float values. > BigQuery supports loading files in avro format, which can overcome these > disadvantages. The Java SDK already supports loading files using avro format > (BEAM-2879) so it makes sense to support it in the Python SDK as well. > The change will be somewhere around > [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8841) Add ability to perform BigQuery file loads using avro
[ https://issues.apache.org/jira/browse/BEAM-8841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chun Yang updated BEAM-8841: Fix Version/s: 2.21.0 > Add ability to perform BigQuery file loads using avro > - > > Key: BEAM-8841 > URL: https://issues.apache.org/jira/browse/BEAM-8841 > Project: Beam > Issue Type: Improvement > Components: io-py-gcp >Reporter: Chun Yang >Assignee: Chun Yang >Priority: Minor > Fix For: 2.21.0 > > Time Spent: 8h > Remaining Estimate: 0h > > Currently, JSON format is used for file loads into BigQuery in the Python > SDK. JSON has some disadvantages including size of serialized data and > inability to represent NaN and infinity float values. > BigQuery supports loading files in avro format, which can overcome these > disadvantages. The Java SDK already supports loading files using avro format > (BEAM-2879) so it makes sense to support it in the Python SDK as well. > The change will be somewhere around > [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9459) Go Postcommit failing at GBK
[ https://issues.apache.org/jira/browse/BEAM-9459?focusedWorklogId=398830&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398830 ] ASF GitHub Bot logged work on BEAM-9459: Author: ASF GitHub Bot Created on: 06/Mar/20 01:15 Start Date: 06/Mar/20 01:15 Worklog Time Spent: 10m Work Description: lostluck commented on pull request #11061: [BEAM-9459] Revert "[BEAM-6374] Emit PCollection metrics from GoSDK" URL: https://github.com/apache/beam/pull/11061 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398830) Remaining Estimate: 0h Time Spent: 10m > Go Postcommit failing at GBK > > > Key: BEAM-9459 > URL: https://issues.apache.org/jira/browse/BEAM-9459 > Project: Beam > Issue Type: Bug > Components: sdk-go, test-failures >Reporter: Daniel Oliveira >Assignee: Robert Burke >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Example: [https://builds.apache.org/job/beam_PostCommit_Go_PR/106/] > [https://scans.gradle.com/s/es67rfaomu26m] > > {noformat} > 2020/03/06 00:47:41 Submitted job: 2020-03-05_16_47_40-13139296997856231782 > 2020/03/06 00:47:41 Submitted job: 2020-03-05_16_47_40-13139296997856231782 > 2020/03/06 00:47:41 Console: > https://console.cloud.google.com/dataflow/job/2020-03-05_16_47_40-13139296997856231782?project=apache-beam-testing > 2020/03/06 00:47:41 Logs: > https://console.cloud.google.com/logs/viewer?project=apache-beam-testing&resource=dataflow_step%2Fjob_id%2F2020-03-05_16_47_40-13139296997856231782 > ... 
> 2020/03/06 00:50:41 Test cogbk:cogbk failed: job > 2020-03-05_16_47_40-13139296997856231782 failed{noformat} > And then in the console logs: > [https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing&resource=dataflow_step%2Fjob_id%2F2020-03-05_16_47_40-13139296997856231782&minLogLevel=500&expandAll=false&timestamp=2020-03-06T01:01:14.21000Z&customFacets=&limitCustomFacetWidth=true&dateRangeStart=2020-03-06T00:01:14.460Z&dateRangeEnd=2020-03-06T01:01:14.460Z&interval=PT1H&scrollTimestamp=2020-03-06T00:49:14.413355915Z] > > {code:java} > exception: "java.util.concurrent.ExecutionException: > java.lang.RuntimeException: Error received from SDK harness for instruction > -165: process bundle failed for instruction -165 using plan -122 : panic: > Unexpected coder: > CoGBK goroutine 81 > [running]: > runtime/debug.Stack(0xc001103970, 0xd2c5e0, 0xc000bd7f40) > /usr/lib/go-1.12/src/runtime/debug/stack.go:24 +0x9d > github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.callNoPanic.func1(0xc001103b90) > > /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/util.go:40 > +0x60 > panic(0xd2c5e0, 0xc000bd7f40) > /usr/lib/go-1.12/src/runtime/panic.go:522 +0x1b5 > github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.MakeElementEncoder(0xc000b99cc0, > 0xc000aa4930, 0xc000b64a00) > > /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/coder.go:91 > +0x479 > github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.(*PCollection).Up(0xc000af3dd0, > 0x10018e0, 0xc000b57f80, 0x0, 0xc000346b50) > > 
/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/pcollection.go:59 > +0xfe > github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.callNoPanic(0x10018e0, > 0xc000b57f80, 0xc000346c28, 0x0, 0x0) > > /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/util.go:43 > +0x6c > github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.(*Plan).Execute(0xc0002623f0, > 0x10018e0, 0xc000b57f80, 0xc0002365a0, 0x4, 0xff0340, 0xc000aa4750, > 0xff0380, 0xc000b57fc0, 0xc000346d
[jira] [Work logged] (BEAM-8280) re-enable IOTypeHints.from_callable
[ https://issues.apache.org/jira/browse/BEAM-8280?focusedWorklogId=398828&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398828 ] ASF GitHub Bot logged work on BEAM-8280: Author: ASF GitHub Bot Created on: 06/Mar/20 01:08 Start Date: 06/Mar/20 01:08 Worklog Time Spent: 10m Work Description: udim commented on issue #10717: [BEAM-8280] Enable type hint annotations URL: https://github.com/apache/beam/pull/10717#issuecomment-595524860 R: @robertwb This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398828) Time Spent: 7h 20m (was: 7h 10m) > re-enable IOTypeHints.from_callable > --- > > Key: BEAM-8280 > URL: https://issues.apache.org/jira/browse/BEAM-8280 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > Time Spent: 7h 20m > Remaining Estimate: 0h > > See https://issues.apache.org/jira/browse/BEAM-8279 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9459) Go Postcommit failing at GBK
[ https://issues.apache.org/jira/browse/BEAM-9459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira updated BEAM-9459: -- Description: Example: [https://builds.apache.org/job/beam_PostCommit_Go_PR/106/] [https://scans.gradle.com/s/es67rfaomu26m] {noformat} 2020/03/06 00:47:41 Submitted job: 2020-03-05_16_47_40-13139296997856231782 2020/03/06 00:47:41 Submitted job: 2020-03-05_16_47_40-13139296997856231782 2020/03/06 00:47:41 Console: https://console.cloud.google.com/dataflow/job/2020-03-05_16_47_40-13139296997856231782?project=apache-beam-testing 2020/03/06 00:47:41 Logs: https://console.cloud.google.com/logs/viewer?project=apache-beam-testing&resource=dataflow_step%2Fjob_id%2F2020-03-05_16_47_40-13139296997856231782 ... 2020/03/06 00:50:41 Test cogbk:cogbk failed: job 2020-03-05_16_47_40-13139296997856231782 failed{noformat} And then in the console logs: [https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing&resource=dataflow_step%2Fjob_id%2F2020-03-05_16_47_40-13139296997856231782&minLogLevel=500&expandAll=false&timestamp=2020-03-06T01:01:14.21000Z&customFacets=&limitCustomFacetWidth=true&dateRangeStart=2020-03-06T00:01:14.460Z&dateRangeEnd=2020-03-06T01:01:14.460Z&interval=PT1H&scrollTimestamp=2020-03-06T00:49:14.413355915Z] {code:java} exception: "java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error received from SDK harness for instruction -165: process bundle failed for instruction -165 using plan -122 : panic: Unexpected coder: CoGBK goroutine 81 [running]: runtime/debug.Stack(0xc001103970, 0xd2c5e0, 0xc000bd7f40) /usr/lib/go-1.12/src/runtime/debug/stack.go:24 +0x9d github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.callNoPanic.func1(0xc001103b90) 
/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/util.go:40 +0x60 panic(0xd2c5e0, 0xc000bd7f40) /usr/lib/go-1.12/src/runtime/panic.go:522 +0x1b5 github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.MakeElementEncoder(0xc000b99cc0, 0xc000aa4930, 0xc000b64a00) /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/coder.go:91 +0x479 github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.(*PCollection).Up(0xc000af3dd0, 0x10018e0, 0xc000b57f80, 0x0, 0xc000346b50) /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/pcollection.go:59 +0xfe github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.callNoPanic(0x10018e0, 0xc000b57f80, 0xc000346c28, 0x0, 0x0) /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/util.go:43 +0x6c github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.(*Plan).Execute(0xc0002623f0, 0x10018e0, 0xc000b57f80, 0xc0002365a0, 0x4, 0xff0340, 0xc000aa4750, 0xff0380, 0xc000b57fc0, 0xc000346de0, ...) 
/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/plan.go:93 +0xdf github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness.(*control).handleInstruction(0xc0001f4680, 0x10017a0, 0xc0001bafc0, 0xc000b57dc0, 0xc0001bafc0) /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness/harness.go:211 +0xa34 github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness.Main.func2(0x10017a0, 0xc0001bafc0, 0xc000b57dc0) /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness/harness.go:118 +0x1cf created by github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness.Main /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go
[jira] [Updated] (BEAM-9459) Go Postcommit failing at GBK
[ https://issues.apache.org/jira/browse/BEAM-9459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira updated BEAM-9459: -- Description: Example: [https://builds.apache.org/job/beam_PostCommit_Go_PR/106/] [https://scans.gradle.com/s/es67rfaomu26m] {noformat} 2020/03/06 00:47:41 Submitted job: 2020-03-05_16_47_40-13139296997856231782 2020/03/06 00:47:41 Submitted job: 2020-03-05_16_47_40-13139296997856231782 2020/03/06 00:47:41 Console: https://console.cloud.google.com/dataflow/job/2020-03-05_16_47_40-13139296997856231782?project=apache-beam-testing 2020/03/06 00:47:41 Logs: https://console.cloud.google.com/logs/viewer?project=apache-beam-testing&resource=dataflow_step%2Fjob_id%2F2020-03-05_16_47_40-13139296997856231782 ... 2020/03/06 00:50:41 Test cogbk:cogbk failed: job 2020-03-05_16_47_40-13139296997856231782 failed{noformat} And then in the console logs: [https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing&resource=dataflow_step%2Fjob_id%2F2020-03-05_16_47_40-13139296997856231782&minLogLevel=500&expandAll=false&timestamp=2020-03-06T01:01:14.21000Z&customFacets=&limitCustomFacetWidth=true&dateRangeStart=2020-03-06T00:01:14.460Z&dateRangeEnd=2020-03-06T01:01:14.460Z&interval=PT1H&scrollTimestamp=2020-03-06T00:49:14.413355915Z] {code:java} Error message from worker: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error received from SDK harness for instruction -489: process bundle failed for instruction -489 using plan -446 : panic: Unexpected coder: CoGBK goroutine 87 [running]: runtime/debug.Stack(0xc0010ff970, 0xd2c5e0, 0xc00022e3d0) /usr/lib/go-1.12/src/runtime/debug/stack.go:24 +0x9d github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.callNoPanic.func1(0xc0010ffb90) 
/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/util.go:40 +0x60 panic(0xd2c5e0, 0xc00022e3d0) /usr/lib/go-1.12/src/runtime/panic.go:522 +0x1b5 github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.MakeElementEncoder(0xc0013dc460, 0xc0002466c0, 0xc000166000) /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/coder.go:91 +0x479 github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.(*PCollection).Up(0xc001313dd0, 0x10018e0, 0xc000268080, 0x0, 0xc0013f3b50) /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/pcollection.go:59 +0xfe github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.callNoPanic(0x10018e0, 0xc000268080, 0xc0013f3c28, 0x0, 0x0) /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/util.go:43 +0x6c github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.(*Plan).Execute(0xc0013e8000, 0x10018e0, 0xc000268080, 0xc000d14008, 0x4, 0xff0340, 0xc0002461e0, 0xff0380, 0xc0002680c0, 0xc0013f3de0, ...) 
/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/plan.go:93 +0xdf github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness.(*control).handleInstruction(0xc0001f4680, 0x10017a0, 0xc0001bafc0, 0xc00136d9c0, 0xc0001bafc0) /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness/harness.go:211 +0xa34 github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness.Main.func2(0x10017a0, 0xc0001bafc0, 0xc00136d9c0) /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness/harness.go:118 +0x1cf created by github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness.Main /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/c
[jira] [Work logged] (BEAM-3301) Go SplittableDoFn support
[ https://issues.apache.org/jira/browse/BEAM-3301?focusedWorklogId=398822&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398822 ] ASF GitHub Bot logged work on BEAM-3301: Author: ASF GitHub Bot Created on: 06/Mar/20 01:03 Start Date: 06/Mar/20 01:03 Worklog Time Spent: 10m Work Description: youngoli commented on issue #10991: [BEAM-3301] Refactor DoFn validation & allow specifying main inputs. URL: https://github.com/apache/beam/pull/10991#issuecomment-595523525 Done: https://jira.apache.org/jira/browse/BEAM-9459 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398822) Time Spent: 3h 50m (was: 3h 40m) > Go SplittableDoFn support > - > > Key: BEAM-3301 > URL: https://issues.apache.org/jira/browse/BEAM-3301 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Henning Rohde >Assignee: Daniel Oliveira >Priority: Major > Time Spent: 3h 50m > Remaining Estimate: 0h > > SDFs will be the only way to add streaming and liquid sharded IO for Go. > Design doc: https://s.apache.org/splittable-do-fn -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9459) Go Postcommit failing at GBK
Daniel Oliveira created BEAM-9459: - Summary: Go Postcommit failing at GBK Key: BEAM-9459 URL: https://issues.apache.org/jira/browse/BEAM-9459 Project: Beam Issue Type: Bug Components: sdk-go, test-failures Reporter: Daniel Oliveira Assignee: Robert Burke Example: [https://builds.apache.org/job/beam_PostCommit_Go_PR/106/] [https://scans.gradle.com/s/es67rfaomu26m] {noformat} 2020/03/06 00:47:41 Submitted job: 2020-03-05_16_47_40-13139296997856231782 2020/03/06 00:47:41 Submitted job: 2020-03-05_16_47_40-13139296997856231782 2020/03/06 00:47:41 Console: https://console.cloud.google.com/dataflow/job/2020-03-05_16_47_40-13139296997856231782?project=apache-beam-testing 2020/03/06 00:47:41 Logs: https://console.cloud.google.com/logs/viewer?project=apache-beam-testing&resource=dataflow_step%2Fjob_id%2F2020-03-05_16_47_40-13139296997856231782 ... 2020/03/06 00:50:41 Test cogbk:cogbk failed: job 2020-03-05_16_47_40-13139296997856231782 failed{noformat} And then in the console logs: [https://pantheon.corp.google.com/logs/viewer?project=apache-beam-testing&resource=dataflow_step%2Fjob_id%2F2020-03-05_16_47_40-13139296997856231782&minLogLevel=500&expandAll=false&timestamp=2020-03-06T01:01:14.21000Z&customFacets=&limitCustomFacetWidth=true&dateRangeStart=2020-03-06T00:01:14.460Z&dateRangeEnd=2020-03-06T01:01:14.460Z&interval=PT1H&scrollTimestamp=2020-03-06T00:49:14.413355915Z] {noformat} Error message from worker: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error received from SDK harness for instruction -489: process bundle failed for instruction -489 using plan -446 : panic: Unexpected coder: CoGBK goroutine 87 [running]: runtime/debug.Stack(0xc0010ff970, 0xd2c5e0, 0xc00022e3d0) /usr/lib/go-1.12/src/runtime/debug/stack.go:24 +0x9d github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.callNoPanic.func1(0xc0010ffb90) 
/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/util.go:40 +0x60 panic(0xd2c5e0, 0xc00022e3d0) /usr/lib/go-1.12/src/runtime/panic.go:522 +0x1b5 github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.MakeElementEncoder(0xc0013dc460, 0xc0002466c0, 0xc000166000) /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/coder.go:91 +0x479 github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.(*PCollection).Up(0xc001313dd0, 0x10018e0, 0xc000268080, 0x0, 0xc0013f3b50) /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/pcollection.go:59 +0xfe github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.callNoPanic(0x10018e0, 0xc000268080, 0xc0013f3c28, 0x0, 0x0) /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/util.go:43 +0x6c github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.(*Plan).Execute(0xc0013e8000, 0x10018e0, 0xc000268080, 0xc000d14008, 0x4, 0xff0340, 0xc0002461e0, 0xff0380, 0xc0002680c0, 0xc0013f3de0, ...) 
/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/plan.go:93 +0xdf github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness.(*control).handleInstruction(0xc0001f4680, 0x10017a0, 0xc0001bafc0, 0xc00136d9c0, 0xc0001bafc0) /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness/harness.go:211 +0xa34 github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness.Main.func2(0x10017a0, 0xc0001bafc0, 0xc00136d9c0) /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness/harness.go:118 +0x1cf created by github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness.Main /home/
[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398819&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398819 ] ASF GitHub Bot logged work on BEAM-8841: Author: ASF GitHub Bot Created on: 06/Mar/20 01:00 Start Date: 06/Mar/20 01:00 Worklog Time Spent: 10m Work Description: pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK URL: https://github.com/apache/beam/pull/10979#issuecomment-595522865 exciting. thanks @chunyang This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398819) Time Spent: 8h (was: 7h 50m) > Add ability to perform BigQuery file loads using avro > - > > Key: BEAM-8841 > URL: https://issues.apache.org/jira/browse/BEAM-8841 > Project: Beam > Issue Type: Improvement > Components: io-py-gcp >Reporter: Chun Yang >Assignee: Chun Yang >Priority: Minor > Time Spent: 8h > Remaining Estimate: 0h > > Currently, JSON format is used for file loads into BigQuery in the Python > SDK. JSON has some disadvantages including size of serialized data and > inability to represent NaN and infinity float values. > BigQuery supports loading files in avro format, which can overcome these > disadvantages. The Java SDK already supports loading files using avro format > (BEAM-2879) so it makes sense to support it in the Python SDK as well. > The change will be somewhere around > [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398820&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398820 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 06/Mar/20 01:01 Start Date: 06/Mar/20 01:01 Worklog Time Spent: 10m Work Description: pabloem commented on issue #11032: [BEAM-8335] Display rather than logging when is_in_notebook. URL: https://github.com/apache/beam/pull/11032#issuecomment-595523014 Run PythonFormatter PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398820) Time Spent: 98.5h (was: 98h 20m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 98.5h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398818&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398818 ] ASF GitHub Bot logged work on BEAM-8841: Author: ASF GitHub Bot Created on: 06/Mar/20 01:00 Start Date: 06/Mar/20 01:00 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK URL: https://github.com/apache/beam/pull/10979 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398818) Time Spent: 7h 50m (was: 7h 40m) > Add ability to perform BigQuery file loads using avro > - > > Key: BEAM-8841 > URL: https://issues.apache.org/jira/browse/BEAM-8841 > Project: Beam > Issue Type: Improvement > Components: io-py-gcp >Reporter: Chun Yang >Assignee: Chun Yang >Priority: Minor > Time Spent: 7h 50m > Remaining Estimate: 0h > > Currently, JSON format is used for file loads into BigQuery in the Python > SDK. JSON has some disadvantages including size of serialized data and > inability to represent NaN and infinity float values. > BigQuery supports loading files in avro format, which can overcome these > disadvantages. The Java SDK already supports loading files using avro format > (BEAM-2879) so it makes sense to support it in the Python SDK as well. > The change will be somewhere around > [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-3301) Go SplittableDoFn support
[ https://issues.apache.org/jira/browse/BEAM-3301?focusedWorklogId=398806&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398806 ] ASF GitHub Bot logged work on BEAM-3301: Author: ASF GitHub Bot Created on: 06/Mar/20 00:50 Start Date: 06/Mar/20 00:50 Worklog Time Spent: 10m Work Description: lostluck commented on issue #10991: [BEAM-3301] Refactor DoFn validation & allow specifying main inputs. URL: https://github.com/apache/beam/pull/10991#issuecomment-595520053 Could you file a JIRA with the trace and assign it to me please? I'm in the middle of packing. https://github.com/apache/beam/pull/11061 is the revert. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398806) Time Spent: 3h 40m (was: 3.5h) > Go SplittableDoFn support > - > > Key: BEAM-3301 > URL: https://issues.apache.org/jira/browse/BEAM-3301 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Henning Rohde >Assignee: Daniel Oliveira >Priority: Major > Time Spent: 3h 40m > Remaining Estimate: 0h > > SDFs will be the only way to add streaming and liquid sharded IO for Go. > Design doc: https://s.apache.org/splittable-do-fn -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-6374) "elements added" for input and output collections is always empty
[ https://issues.apache.org/jira/browse/BEAM-6374?focusedWorklogId=398804&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398804 ] ASF GitHub Bot logged work on BEAM-6374: Author: ASF GitHub Bot Created on: 06/Mar/20 00:49 Start Date: 06/Mar/20 00:49 Worklog Time Spent: 10m Work Description: lostluck commented on issue #11061: Revert "[BEAM-6374] Emit PCollection metrics from GoSDK" URL: https://github.com/apache/beam/pull/11061#issuecomment-595519768 Run Go Postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398804) Time Spent: 1h 20m (was: 1h 10m) > "elements added" for input and output collections is always empty > - > > Key: BEAM-6374 > URL: https://issues.apache.org/jira/browse/BEAM-6374 > Project: Beam > Issue Type: Bug > Components: runner-dataflow, sdk-go >Reporter: Andrew Brampton >Assignee: Robert Burke >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > The field for "Elements added" and "Estimated size" is always blank when > running a Go binary on Dataflow. For example when running the word count > example: https://pasteboard.co/HVf80BU.png -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-6374) "elements added" for input and output collections is always empty
[ https://issues.apache.org/jira/browse/BEAM-6374?focusedWorklogId=398801&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398801 ] ASF GitHub Bot logged work on BEAM-6374: Author: ASF GitHub Bot Created on: 06/Mar/20 00:48 Start Date: 06/Mar/20 00:48 Worklog Time Spent: 10m Work Description: lostluck commented on pull request #11061: Revert "[BEAM-6374] Emit PCollection metrics from GoSDK" URL: https://github.com/apache/beam/pull/11061 Reverts apache/beam#10942 Seems to be breaking the post commit. Since I'm going on vacation tonight, I'm rolling it back and will look into it when I get back. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398801) Time Spent: 1h 10m (was: 1h) > "elements added" for input and output collections is always empty > - > > Key: BEAM-6374 > URL: https://issues.apache.org/jira/browse/BEAM-6374 > Project: Beam > Issue Type: Bug > Components: runner-dataflow, sdk-go >Reporter: Andrew Brampton >Assignee: Robert Burke >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > The "Elements added" and "Estimated size" fields are always blank when > running a Go binary on Dataflow. For example, when running the word count > example: https://pasteboard.co/HVf80BU.png -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-3301) Go SplittableDoFn support
[ https://issues.apache.org/jira/browse/BEAM-3301?focusedWorklogId=398800&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398800 ] ASF GitHub Bot logged work on BEAM-3301: Author: ASF GitHub Bot Created on: 06/Mar/20 00:47 Start Date: 06/Mar/20 00:47 Worklog Time Spent: 10m Work Description: lostluck commented on issue #10991: [BEAM-3301] Refactor DoFn validation & allow specifying main inputs. URL: https://github.com/apache/beam/pull/10991#issuecomment-595519138 No, but it looks like it's somehow related to mine. I'm going to roll it back. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398800) Time Spent: 3.5h (was: 3h 20m) > Go SplittableDoFn support > - > > Key: BEAM-3301 > URL: https://issues.apache.org/jira/browse/BEAM-3301 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Henning Rohde >Assignee: Daniel Oliveira >Priority: Major > Time Spent: 3.5h > Remaining Estimate: 0h > > SDFs will be the only way to add streaming and liquid sharded IO for Go. > Design doc: https://s.apache.org/splittable-do-fn -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8280) re-enable IOTypeHints.from_callable
[ https://issues.apache.org/jira/browse/BEAM-8280?focusedWorklogId=398787&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398787 ] ASF GitHub Bot logged work on BEAM-8280: Author: ASF GitHub Bot Created on: 06/Mar/20 00:30 Start Date: 06/Mar/20 00:30 Worklog Time Spent: 10m Work Description: udim commented on issue #10717: [BEAM-8280] Enable type hint annotations URL: https://github.com/apache/beam/pull/10717#issuecomment-595514536 Run Python2_PVR_Flink PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398787) Time Spent: 7h 10m (was: 7h) > re-enable IOTypeHints.from_callable > --- > > Key: BEAM-8280 > URL: https://issues.apache.org/jira/browse/BEAM-8280 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > Time Spent: 7h 10m > Remaining Estimate: 0h > > See https://issues.apache.org/jira/browse/BEAM-8279 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8280) re-enable IOTypeHints.from_callable
[ https://issues.apache.org/jira/browse/BEAM-8280?focusedWorklogId=398786&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398786 ] ASF GitHub Bot logged work on BEAM-8280: Author: ASF GitHub Bot Created on: 06/Mar/20 00:30 Start Date: 06/Mar/20 00:30 Worklog Time Spent: 10m Work Description: udim commented on issue #10717: [BEAM-8280] Enable type hint annotations URL: https://github.com/apache/beam/pull/10717#issuecomment-595514519 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398786) Time Spent: 7h (was: 6h 50m) > re-enable IOTypeHints.from_callable > --- > > Key: BEAM-8280 > URL: https://issues.apache.org/jira/browse/BEAM-8280 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > Time Spent: 7h > Remaining Estimate: 0h > > See https://issues.apache.org/jira/browse/BEAM-8279 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-3301) Go SplittableDoFn support
[ https://issues.apache.org/jira/browse/BEAM-3301?focusedWorklogId=398784&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398784 ] ASF GitHub Bot logged work on BEAM-3301: Author: ASF GitHub Bot Created on: 06/Mar/20 00:25 Start Date: 06/Mar/20 00:25 Worklog Time Spent: 10m Work Description: youngoli commented on issue #10991: [BEAM-3301] Refactor DoFn validation & allow specifying main inputs. URL: https://github.com/apache/beam/pull/10991#issuecomment-595512896 Run Go PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398784) Time Spent: 3h 20m (was: 3h 10m) > Go SplittableDoFn support > - > > Key: BEAM-3301 > URL: https://issues.apache.org/jira/browse/BEAM-3301 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Henning Rohde >Assignee: Daniel Oliveira >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > > SDFs will be the only way to add streaming and liquid sharded IO for Go. > Design doc: https://s.apache.org/splittable-do-fn -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-3301) Go SplittableDoFn support
[ https://issues.apache.org/jira/browse/BEAM-3301?focusedWorklogId=398783&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398783 ] ASF GitHub Bot logged work on BEAM-3301: Author: ASF GitHub Bot Created on: 06/Mar/20 00:24 Start Date: 06/Mar/20 00:24 Worklog Time Spent: 10m Work Description: youngoli commented on issue #10991: [BEAM-3301] Refactor DoFn validation & allow specifying main inputs. URL: https://github.com/apache/beam/pull/10991#issuecomment-595512813 The Postcommit error doesn't seem to be directly related to my change from what I can tell:
> Error message from worker: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error received from SDK harness for instruction -488: process bundle failed for instruction -488 using plan -445 : panic: Unexpected coder: CoGBK
> goroutine 87 [running]:
> runtime/debug.Stack(0xc00109d970, 0xd2c5e0, 0xc00113cb00)
>     /usr/lib/go-1.12/src/runtime/debug/stack.go:24 +0x9d
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.callNoPanic.func1(0xc00109db90)
>     /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/util.go:40 +0x60
> panic(0xd2c5e0, 0xc00113cb00)
>     /usr/lib/go-1.12/src/runtime/panic.go:522 +0x1b5
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.MakeElementEncoder(0xc9bdb0, 0xc00114b620, 0xc000822000)
>     /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/coder.go:91 +0x479
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.(*PCollection).Up(0xc000c20fc0, 0x10018e0, 0xc000c40f00, 0x0, 0xc0010b7b50)
>     /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/pcollection.go:59 +0xfe
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.callNoPanic(0x10018e0, 0xc000c40f00, 0xc0010b7c28, 0x0, 0x0)
>     /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/util.go:43 +0x6c
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec.(*Plan).Execute(0xc001222ee0, 0x10018e0, 0xc000c40f00, 0xc000d1a490, 0x4, 0xff0340, 0xc00114b440, 0xff0380, 0xc000c40f40, 0xc0010b7de0, ...)
>     /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/exec/plan.go:93 +0xdf
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness.(*control).handleInstruction(0xc0001f4480, 0x10017a0, 0xc0001bafc0, 0xc000c40d40, 0xc0001bafc0)
>     /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness/harness.go:211 +0xa34
> github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness.Main.func2(0x10017a0, 0xc0001bafc0, 0xc000c40d40)
>     /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness/harness.go:118 +0x1cf
> created by github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness.Main
>     /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Go_PR/src/sdks/go/test/.gogradle/project_gopath/src/github.com/apache/beam/sdks/go/test/vendor/github.com/apache/beam/sdks/go/pkg/beam/core/runtime/harness/harness.go:131 +0x6e8
>
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> ...
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 3
[jira] [Work logged] (BEAM-9448) Misleading log line: says "downloading" when using cache
[ https://issues.apache.org/jira/browse/BEAM-9448?focusedWorklogId=398779&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398779 ] ASF GitHub Bot logged work on BEAM-9448: Author: ASF GitHub Bot Created on: 06/Mar/20 00:14 Start Date: 06/Mar/20 00:14 Worklog Time Spent: 10m Work Description: ibzib commented on issue #11051: [BEAM-9448] Fix log message for job server cache. URL: https://github.com/apache/beam/pull/11051#issuecomment-595509792 Run RAT PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398779) Time Spent: 40m (was: 0.5h) > Misleading log line: says "downloading" when using cache > > > Key: BEAM-9448 > URL: https://issues.apache.org/jira/browse/BEAM-9448 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Trivial > Labels: portability-flink > Time Spent: 40m > Remaining Estimate: 0h > > https://github.com/apache/beam/blob/8d253ac99d78ef5345245ed71c7cf34328c55d9f/sdks/python/apache_beam/utils/subprocess_server.py#L197 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9458) Make Dataflow executed UnboundedSources using SDF as the default
[ https://issues.apache.org/jira/browse/BEAM-9458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-9458: --- Status: Open (was: Triage Needed) > Make Dataflow executed UnboundedSources using SDF as the default > > > Key: BEAM-9458 > URL: https://issues.apache.org/jira/browse/BEAM-9458 > Project: Beam > Issue Type: Sub-task > Components: runner-dataflow >Reporter: Luke Cwik >Assignee: Luke Cwik >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398776&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398776 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 05/Mar/20 23:56 Start Date: 05/Mar/20 23:56 Worklog Time Spent: 10m Work Description: pabloem commented on issue #11032: [BEAM-8335] Display rather than logging when is_in_notebook. URL: https://github.com/apache/beam/pull/11032#issuecomment-595504593 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398776) Time Spent: 98h 20m (was: 98h 10m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 98h 20m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398775&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398775 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 05/Mar/20 23:56 Start Date: 05/Mar/20 23:56 Worklog Time Spent: 10m Work Description: pabloem commented on issue #11032: [BEAM-8335] Display rather than logging when is_in_notebook. URL: https://github.com/apache/beam/pull/11032#issuecomment-595504547 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398775) Time Spent: 98h 10m (was: 98h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 98h 10m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398774&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398774 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 05/Mar/20 23:55 Start Date: 05/Mar/20 23:55 Worklog Time Spent: 10m Work Description: pabloem commented on issue #11032: [BEAM-8335] Display rather than logging when is_in_notebook. URL: https://github.com/apache/beam/pull/11032#issuecomment-595504514 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398774) Time Spent: 98h (was: 97h 50m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 98h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398773&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398773 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 05/Mar/20 23:50 Start Date: 05/Mar/20 23:50 Worklog Time Spent: 10m Work Description: pabloem commented on issue #11032: [BEAM-8335] Display rather than logging when is_in_notebook. URL: https://github.com/apache/beam/pull/11032#issuecomment-595503135 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398773) Time Spent: 97h 50m (was: 97h 40m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 97h 50m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398771&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398771 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 05/Mar/20 23:50 Start Date: 05/Mar/20 23:50 Worklog Time Spent: 10m Work Description: pabloem commented on issue #11032: [BEAM-8335] Display rather than logging when is_in_notebook. URL: https://github.com/apache/beam/pull/11032#issuecomment-595503103 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398771) Time Spent: 97h 40m (was: 97.5h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 97h 40m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow
[ https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=398766&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398766 ] ASF GitHub Bot logged work on BEAM-7926: Author: ASF GitHub Bot Created on: 05/Mar/20 23:42 Start Date: 05/Mar/20 23:42 Worklog Time Spent: 10m Work Description: aaltay commented on issue #11020: [BEAM-7926] Update Data Visualization URL: https://github.com/apache/beam/pull/11020#issuecomment-595500738 Run PythonLint PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398766) Time Spent: 57.5h (was: 57h 20m) > Show PCollection with Interactive Beam in a data-centric user flow > -- > > Key: BEAM-7926 > URL: https://issues.apache.org/jira/browse/BEAM-7926 > Project: Beam > Issue Type: New Feature > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Time Spent: 57.5h > Remaining Estimate: 0h > > Support auto plotting / charting of materialized data of a given PCollection > with Interactive Beam. > Say an Interactive Beam pipeline is defined as > > {code:java} > p = beam.Pipeline(InteractiveRunner()) > pcoll = p | 'Transform' >> transform() > pcoll2 = ... > pcoll3 = ...{code} > The user can call a single function and get auto-magical charting of the data. > e.g., > {code:java} > show(pcoll, pcoll2) > {code} > Throughout the process, a pipeline fragment is built to include only > transforms necessary to produce the desired pcolls (pcoll and pcoll2) and > execute that fragment. > This makes the Interactive Beam user flow data-centric. > > Detailed > [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
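The "pipeline fragment" idea in the BEAM-7926 description can be illustrated with a small graph walk. This is only a sketch under stated assumptions, not Interactive Beam's implementation: the pipeline is modelled as two plain dictionaries, and `required_transforms`, `producers`, and `inputs` are hypothetical names invented for this illustration.

```python
from collections import deque


def required_transforms(producers, inputs, targets):
    """Return the transforms needed to compute the target PCollections.

    producers: maps a PCollection name to the transform that produces it
               (hypothetical representation, not a Beam API).
    inputs:    maps a transform name to the PCollections it consumes.
    targets:   the PCollection names the user asked to see.
    """
    needed = set()
    seen = set()
    queue = deque(targets)
    while queue:
        pcoll = queue.popleft()
        if pcoll in seen:
            continue
        seen.add(pcoll)
        transform = producers.get(pcoll)
        if transform is None:
            # A PCollection with no known producer; nothing more to walk.
            continue
        needed.add(transform)
        # Walk upstream: everything this transform reads must also be produced.
        queue.extend(inputs.get(transform, []))
    return needed


# Toy pipeline mirroring the description: pcoll -> pcoll2 -> pcoll3.
producers = {"pcoll": "Transform", "pcoll2": "Transform2", "pcoll3": "Transform3"}
inputs = {"Transform": [], "Transform2": ["pcoll"], "Transform3": ["pcoll2"]}

fragment = required_transforms(producers, inputs, ["pcoll", "pcoll2"])
print(sorted(fragment))
```

In this toy graph, asking for `pcoll` and `pcoll2` leaves `Transform3` out of the fragment, which is the saving the description is after.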
[jira] [Created] (BEAM-9458) Make Dataflow executed UnboundedSources using SDF as the default
Luke Cwik created BEAM-9458: --- Summary: Make Dataflow executed UnboundedSources using SDF as the default Key: BEAM-9458 URL: https://issues.apache.org/jira/browse/BEAM-9458 Project: Beam Issue Type: Sub-task Components: runner-dataflow Reporter: Luke Cwik Assignee: Luke Cwik -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8932) Expose complete Cloud Pub/Sub messages through PubsubIO API
[ https://issues.apache.org/jira/browse/BEAM-8932?focusedWorklogId=398763&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398763 ] ASF GitHub Bot logged work on BEAM-8932: Author: ASF GitHub Bot Created on: 05/Mar/20 23:28 Start Date: 05/Mar/20 23:28 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #10478: [BEAM-8932][Cleanup] Extract PubsubBoundedWriter from PubsubIO URL: https://github.com/apache/beam/pull/10478#issuecomment-595496877 Retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398763) Time Spent: 16h 50m (was: 16h 40m) > Expose complete Cloud Pub/Sub messages through PubsubIO API > --- > > Key: BEAM-8932 > URL: https://issues.apache.org/jira/browse/BEAM-8932 > Project: Beam > Issue Type: Bug > Components: beam-model >Reporter: Daniel Collins >Assignee: Daniel Collins >Priority: Major > Time Spent: 16h 50m > Remaining Estimate: 0h > > The PubsubIO API only exposes a subset of the fields in the underlying > PubsubMessage protocol buffer. To accommodate future feature changes as well > as for greater compatibility with code using the Cloud Pub/Sub APIs, a method > to read and write these protocol messages should be exposed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8932) Expose complete Cloud Pub/Sub messages through PubsubIO API
[ https://issues.apache.org/jira/browse/BEAM-8932?focusedWorklogId=398765&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398765 ] ASF GitHub Bot logged work on BEAM-8932: Author: ASF GitHub Bot Created on: 05/Mar/20 23:28 Start Date: 05/Mar/20 23:28 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #10478: [BEAM-8932][Cleanup] Extract PubsubBoundedWriter from PubsubIO URL: https://github.com/apache/beam/pull/10478#issuecomment-595496977 Run Java PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398765) Time Spent: 17h 10m (was: 17h) > Expose complete Cloud Pub/Sub messages through PubsubIO API > --- > > Key: BEAM-8932 > URL: https://issues.apache.org/jira/browse/BEAM-8932 > Project: Beam > Issue Type: Bug > Components: beam-model >Reporter: Daniel Collins >Assignee: Daniel Collins >Priority: Major > Time Spent: 17h 10m > Remaining Estimate: 0h > > The PubsubIO API only exposes a subset of the fields in the underlying > PubsubMessage protocol buffer. To accommodate future feature changes as well > as for greater compatibility with code using the Cloud Pub/Sub APIs, a method > to read and write these protocol messages should be exposed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8932) Expose complete Cloud Pub/Sub messages through PubsubIO API
[ https://issues.apache.org/jira/browse/BEAM-8932?focusedWorklogId=398764&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398764 ] ASF GitHub Bot logged work on BEAM-8932: Author: ASF GitHub Bot Created on: 05/Mar/20 23:28 Start Date: 05/Mar/20 23:28 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #10478: [BEAM-8932][Cleanup] Extract PubsubBoundedWriter from PubsubIO URL: https://github.com/apache/beam/pull/10478#issuecomment-595496951 Run Dataflow ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398764) Time Spent: 17h (was: 16h 50m) > Expose complete Cloud Pub/Sub messages through PubsubIO API > --- > > Key: BEAM-8932 > URL: https://issues.apache.org/jira/browse/BEAM-8932 > Project: Beam > Issue Type: Bug > Components: beam-model >Reporter: Daniel Collins >Assignee: Daniel Collins >Priority: Major > Time Spent: 17h > Remaining Estimate: 0h > > The PubsubIO API only exposes a subset of the fields in the underlying > PubsubMessage protocol buffer. To accommodate future feature changes as well > as for greater compatibility with code using the Cloud Pub/Sub APIs, a method > to read and write these protocol messages should be exposed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=398761&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398761 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 05/Mar/20 23:15 Start Date: 05/Mar/20 23:15 Worklog Time Spent: 10m Work Description: chadrik commented on issue #11038: [BEAM-7746] More typing fixes URL: https://github.com/apache/beam/pull/11038#issuecomment-595493314 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398761) Time Spent: 71h 40m (was: 71.5h) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 71h 40m > Remaining Estimate: 0h > > As a developer of the Beam source code, I would like the code to use PEP 484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398757&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398757 ] ASF GitHub Bot logged work on BEAM-8841: Author: ASF GitHub Bot Created on: 05/Mar/20 23:02 Start Date: 05/Mar/20 23:02 Worklog Time Spent: 10m Work Description: pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK URL: https://github.com/apache/beam/pull/10979#issuecomment-595489779 yup it seems like flaky/unrelated This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398757) Time Spent: 7h 40m (was: 7.5h) > Add ability to perform BigQuery file loads using avro > - > > Key: BEAM-8841 > URL: https://issues.apache.org/jira/browse/BEAM-8841 > Project: Beam > Issue Type: Improvement > Components: io-py-gcp >Reporter: Chun Yang >Assignee: Chun Yang >Priority: Minor > Time Spent: 7h 40m > Remaining Estimate: 0h > > Currently, JSON format is used for file loads into BigQuery in the Python > SDK. JSON has some disadvantages including size of serialized data and > inability to represent NaN and infinity float values. > BigQuery supports loading files in avro format, which can overcome these > disadvantages. The Java SDK already supports loading files using avro format > (BEAM-2879) so it makes sense to support it in the Python SDK as well. > The change will be somewhere around > [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
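The JSON limitation named in the BEAM-8841 description (no representation for NaN or infinity) can be seen with the Python standard library alone. This is a minimal sketch that does not use Beam or BigQuery; `try_serialize` and the `score` field are hypothetical names chosen for the illustration.

```python
import json


def try_serialize(row):
    """Serialize a row as strict JSON, or return None if it cannot be represented.

    allow_nan=False makes json.dumps reject NaN and +/-Infinity with a
    ValueError instead of emitting the non-standard NaN / Infinity tokens.
    """
    try:
        return json.dumps(row, allow_nan=False)
    except ValueError:
        return None


ok = try_serialize({"score": 1.5})
nan_row = try_serialize({"score": float("nan")})
inf_row = try_serialize({"score": float("inf")})

# By default Python emits a bare NaN token, which is not valid JSON.
lenient = json.dumps({"score": float("nan")})

print(ok, nan_row, inf_row, lenient)
```

A binary format with IEEE 754 float types, such as Avro, does not have this restriction, which is one of the motivations the issue gives for Avro file loads.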
[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398756&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398756 ] ASF GitHub Bot logged work on BEAM-8841: Author: ASF GitHub Bot Created on: 05/Mar/20 23:02 Start Date: 05/Mar/20 23:02 Worklog Time Spent: 10m Work Description: pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK URL: https://github.com/apache/beam/pull/10979#issuecomment-595489733 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398756) Time Spent: 7.5h (was: 7h 20m) > Add ability to perform BigQuery file loads using avro > - > > Key: BEAM-8841 > URL: https://issues.apache.org/jira/browse/BEAM-8841 > Project: Beam > Issue Type: Improvement > Components: io-py-gcp >Reporter: Chun Yang >Assignee: Chun Yang >Priority: Minor > Time Spent: 7.5h > Remaining Estimate: 0h > > Currently, JSON format is used for file loads into BigQuery in the Python > SDK. JSON has some disadvantages including size of serialized data and > inability to represent NaN and infinity float values. > BigQuery supports loading files in avro format, which can overcome these > disadvantages. The Java SDK already supports loading files using avro format > (BEAM-2879) so it makes sense to support it in the Python SDK as well. > The change will be somewhere around > [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398750&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398750 ] ASF GitHub Bot logged work on BEAM-8841: Author: ASF GitHub Bot Created on: 05/Mar/20 23:00 Start Date: 05/Mar/20 23:00 Worklog Time Spent: 10m Work Description: chunyang commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK URL: https://github.com/apache/beam/pull/10979#issuecomment-595489197 Flaky/unrelated tests? I can't seem to reproduce locally. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398750) Time Spent: 7h 20m (was: 7h 10m) > Add ability to perform BigQuery file loads using avro > - > > Key: BEAM-8841 > URL: https://issues.apache.org/jira/browse/BEAM-8841 > Project: Beam > Issue Type: Improvement > Components: io-py-gcp >Reporter: Chun Yang >Assignee: Chun Yang >Priority: Minor > Time Spent: 7h 20m > Remaining Estimate: 0h > > Currently, JSON format is used for file loads into BigQuery in the Python > SDK. JSON has some disadvantages including size of serialized data and > inability to represent NaN and infinity float values. > BigQuery supports loading files in avro format, which can overcome these > disadvantages. The Java SDK already supports loading files using avro format > (BEAM-2879) so it makes sense to support it in the Python SDK as well. > The change will be somewhere around > [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398747&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398747 ] ASF GitHub Bot logged work on BEAM-8841: Author: ASF GitHub Bot Created on: 05/Mar/20 22:55 Start Date: 05/Mar/20 22:55 Worklog Time Spent: 10m Work Description: pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK URL: https://github.com/apache/beam/pull/10979#issuecomment-595487536 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398747) Time Spent: 7h 10m (was: 7h) > Add ability to perform BigQuery file loads using avro > - > > Key: BEAM-8841 > URL: https://issues.apache.org/jira/browse/BEAM-8841 > Project: Beam > Issue Type: Improvement > Components: io-py-gcp >Reporter: Chun Yang >Assignee: Chun Yang >Priority: Minor > Time Spent: 7h 10m > Remaining Estimate: 0h > > Currently, JSON format is used for file loads into BigQuery in the Python > SDK. JSON has some disadvantages including size of serialized data and > inability to represent NaN and infinity float values. > BigQuery supports loading files in avro format, which can overcome these > disadvantages. The Java SDK already supports loading files using avro format > (BEAM-2879) so it makes sense to support it in the Python SDK as well. > The change will be somewhere around > [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
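The BEAM-8841 description above notes that JSON cannot represent NaN and infinity float values. A minimal sketch of that limitation using only the Python standard library follows; the helper names `to_strict_json` and `find_non_finite_fields` are hypothetical and are not part of the Beam SDK or of PR #10979.

```python
import json
import math


def to_strict_json(row):
    """Serialize a row to JSON, refusing values that strict JSON cannot hold."""
    # allow_nan=False makes json.dumps raise ValueError on NaN/Infinity
    # instead of emitting the non-standard tokens NaN / Infinity.
    return json.dumps(row, allow_nan=False)


def find_non_finite_fields(row):
    """Return the sorted names of float fields holding NaN or +/-infinity."""
    return sorted(
        name for name, value in row.items()
        if isinstance(value, float) and not math.isfinite(value)
    )


row = {"id": 1, "score": float("nan"), "limit": float("inf")}

try:
    to_strict_json(row)
except ValueError as err:
    print("strict JSON rejected the row:", err)

print("non-finite fields:", find_non_finite_fields(row))
```

Avro's `float` and `double` types are IEEE 754 values, so a file format based on them would not need this kind of pre-check, which is presumably part of the motivation stated in the issue.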
[jira] [Updated] (BEAM-9457) Allow WriteToBigQuery with external data resource
[ https://issues.apache.org/jira/browse/BEAM-9457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenbing Bai updated BEAM-9457: -- Status: Open (was: Triage Needed) > Allow WriteToBigQuery with external data resource > - > > Key: BEAM-9457 > URL: https://issues.apache.org/jira/browse/BEAM-9457 > Project: Beam > Issue Type: New Feature > Components: io-py-gcp >Reporter: Wenbing Bai >Priority: Major > > Add another WriteToBigQuery.Method that lets users write to BigQuery via an > external data source such as GCS, instead of loading the data into BigQuery. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9457) Allow WriteToBigQuery with external data resource
Wenbing Bai created BEAM-9457: - Summary: Allow WriteToBigQuery with external data resource Key: BEAM-9457 URL: https://issues.apache.org/jira/browse/BEAM-9457 Project: Beam Issue Type: New Feature Components: io-py-gcp Reporter: Wenbing Bai Add another WriteToBigQuery.Method that lets users write to BigQuery via an external data source such as GCS, instead of loading the data into BigQuery. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-3301) Go SplittableDoFn support
[ https://issues.apache.org/jira/browse/BEAM-3301?focusedWorklogId=398741&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398741 ] ASF GitHub Bot logged work on BEAM-3301: Author: ASF GitHub Bot Created on: 05/Mar/20 22:44 Start Date: 05/Mar/20 22:44 Worklog Time Spent: 10m Work Description: youngoli commented on issue #10991: [BEAM-3301] Refactor DoFn validation & allow specifying main inputs. URL: https://github.com/apache/beam/pull/10991#issuecomment-595484373 Run Go PostCommit Issue Time Tracking --- Worklog Id: (was: 398741) Time Spent: 3h (was: 2h 50m) > Go SplittableDoFn support > - > > Key: BEAM-3301 > URL: https://issues.apache.org/jira/browse/BEAM-3301 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Henning Rohde >Assignee: Daniel Oliveira >Priority: Major > Time Spent: 3h > Remaining Estimate: 0h > > SDFs will be the only way to add streaming and liquid sharded IO for Go. > Design doc: https://s.apache.org/splittable-do-fn -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-3301) Go SplittableDoFn support
[ https://issues.apache.org/jira/browse/BEAM-3301?focusedWorklogId=398740&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398740 ] ASF GitHub Bot logged work on BEAM-3301: Author: ASF GitHub Bot Created on: 05/Mar/20 22:43 Start Date: 05/Mar/20 22:43 Worklog Time Spent: 10m Work Description: youngoli commented on pull request #10991: [BEAM-3301] Refactor DoFn validation & allow specifying main inputs. URL: https://github.com/apache/beam/pull/10991#discussion_r388609914 ## File path: sdks/go/pkg/beam/core/graph/fn.go ## @@ -209,21 +209,74 @@ func (f *DoFn) RestrictionT() *reflect.Type { // a KV or not based on the other signatures (unless we're more loose about which // sideinputs are present). Bind should respect that. +// The following constants prefixed with "Main" represent possible numbers of +// DoFn main inputs for DoFn construction and validation. Any value not defined +// here is an invalid number of main inputs. +const ( + MainUnknown = -1 // The number of main inputs is unknown for DoFn validation. Review comment: I'm leaving it exported only because AsDoFn is currently exported and takes one of these constants as an input. Making this unexported would make it impossible to call AsDoFn with the existing behavior (unknown num. of inputs). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398740) Time Spent: 2h 50m (was: 2h 40m) > Go SplittableDoFn support > - > > Key: BEAM-3301 > URL: https://issues.apache.org/jira/browse/BEAM-3301 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Henning Rohde >Assignee: Daniel Oliveira >Priority: Major > Time Spent: 2h 50m > Remaining Estimate: 0h > > SDFs will be the only way to add streaming and liquid sharded IO for Go. 
> Design doc: https://s.apache.org/splittable-do-fn -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-3301) Go SplittableDoFn support
[ https://issues.apache.org/jira/browse/BEAM-3301?focusedWorklogId=398736&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398736 ] ASF GitHub Bot logged work on BEAM-3301: Author: ASF GitHub Bot Created on: 05/Mar/20 22:40 Start Date: 05/Mar/20 22:40 Worklog Time Spent: 10m Work Description: youngoli commented on pull request #10991: [BEAM-3301] Refactor DoFn validation & allow specifying main inputs. URL: https://github.com/apache/beam/pull/10991#discussion_r388608923 ## File path: sdks/go/pkg/beam/core/graph/fn.go ## @@ -209,21 +209,74 @@ func (f *DoFn) RestrictionT() *reflect.Type { // a KV or not based on the other signatures (unless we're more loose about which // sideinputs are present). Bind should respect that. +// The following constants prefixed with "Main" represent possible numbers of Review comment: I definitely like those options better. Went with the unexported constant type, since it makes the code more self-documenting as opposed to raw numbers. Also removed the validation check on that parameter, like you suggested. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398736) Time Spent: 2h 40m (was: 2.5h) > Go SplittableDoFn support > - > > Key: BEAM-3301 > URL: https://issues.apache.org/jira/browse/BEAM-3301 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Henning Rohde >Assignee: Daniel Oliveira >Priority: Major > Time Spent: 2h 40m > Remaining Estimate: 0h > > SDFs will be the only way to add streaming and liquid sharded IO for Go. > Design doc: https://s.apache.org/splittable-do-fn -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
[ https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=398731&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398731 ] ASF GitHub Bot logged work on BEAM-9446: Author: ASF GitHub Bot Created on: 05/Mar/20 22:24 Start Date: 05/Mar/20 22:24 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #11052: [BEAM-9446] Add missing parallelism and execution mode args. URL: https://github.com/apache/beam/pull/11052#discussion_r388602678 ## File path: sdks/python/apache_beam/options/pipeline_options.py ## @@ -1075,6 +1075,22 @@ def _add_argparse_args(cls, parser): ' directly, rather than starting up a job server.' ' Only applies when flink_master is set to a' ' cluster address. Requires Python 3.6+.') +parser.add_argument( +'--parallelism', +default=-1, +type=int, +help='The degree of parallelism to be used when distributing ' + 'operations onto workers. If the parallelism is not set, the ' + 'configured Flink default is used, or 1 if none can be found.' +) +parser.add_argument( +'--execution_mode_for_batch', Review comment: I agree, though as discussed earlier we might have difficulties parsing non-string options. I'll try it and see how it goes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398731) Time Spent: 1h (was: 50m) > FlinkRunner discards parallelism and execution_mode_for_batch pipeline options > -- > > Key: BEAM-9446 > URL: https://issues.apache.org/jira/browse/BEAM-9446 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Time Spent: 1h > Remaining Estimate: 0h > > I need these options for TFX, but they're being discarded (I believe they are > normally supplied by the job server). -- This message was sent by Atlassian Jira (v8.3.4#803005)
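The two options added in the diff above can be sketched as a standalone `argparse` parser. This is only an illustration under stated assumptions: the function name `build_flink_parser` is hypothetical, the shortened help strings are paraphrased, and the `choices` list (the values of Flink's `ExecutionMode` enum) is an assumption about how the option could be validated, not something the review thread confirms was adopted.

```python
import argparse


def build_flink_parser():
    """Build a parser with the two Flink options discussed in the review."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--parallelism',
        default=-1,  # -1 means "not set": fall back to the Flink default
        type=int,
        help='Degree of parallelism used when distributing operations '
        'onto workers.')
    parser.add_argument(
        '--execution_mode_for_batch',
        default='PIPELINED',
        # Assumption: restrict the string to Flink ExecutionMode names.
        choices=['PIPELINED', 'PIPELINED_FORCED', 'BATCH', 'BATCH_FORCED'],
        help='Flink mode for data exchange of batch pipelines.')
    return parser


defaults = build_flink_parser().parse_args([])
print(defaults.parallelism, defaults.execution_mode_for_batch)

custom = build_flink_parser().parse_args(
    ['--parallelism', '4', '--execution_mode_for_batch', 'BATCH'])
print(custom.parallelism, custom.execution_mode_for_batch)
```

Keeping the option a plain string with a fixed set of choices sidesteps the difficulty with parsing non-string options that the review comment mentions, while still rejecting misspelled modes at parse time.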
[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
[ https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=398730&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398730 ] ASF GitHub Bot logged work on BEAM-9446: Author: ASF GitHub Bot Created on: 05/Mar/20 22:22 Start Date: 05/Mar/20 22:22 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #11052: [BEAM-9446] Add missing parallelism and execution mode args. URL: https://github.com/apache/beam/pull/11052#discussion_r388601783 ## File path: sdks/python/apache_beam/options/pipeline_options.py ## @@ -1075,6 +1075,22 @@ def _add_argparse_args(cls, parser): ' directly, rather than starting up a job server.' ' Only applies when flink_master is set to a' ' cluster address. Requires Python 3.6+.') +parser.add_argument( +'--parallelism', +default=-1, +type=int, +help='The degree of parallelism to be used when distributing ' + 'operations onto workers. If the parallelism is not set, the ' + 'configured Flink default is used, or 1 if none can be found.' +) +parser.add_argument( +'--execution_mode_for_batch', +default='PIPELINED', +help='Flink mode for data exchange of batch pipelines. ' Review comment: I think that's what experiment(s) are for: https://github.com/apache/beam/blob/35beffc5775636eb96e33eb57c6e5f213cfe033a/sdks/python/apache_beam/options/pipeline_options.py#L803-L811 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398730) Time Spent: 50m (was: 40m) > FlinkRunner discards parallelism and execution_mode_for_batch pipeline options > -- > > Key: BEAM-9446 > URL: https://issues.apache.org/jira/browse/BEAM-9446 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Time Spent: 50m > Remaining Estimate: 0h > > I need these options for TFX, but they're being discarded (I believe they are > normally supplied by the job server). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9448) Misleading log line: says "downloading" when using cache
[ https://issues.apache.org/jira/browse/BEAM-9448?focusedWorklogId=398728&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398728 ] ASF GitHub Bot logged work on BEAM-9448: Author: ASF GitHub Bot Created on: 05/Mar/20 22:18 Start Date: 05/Mar/20 22:18 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #11051: [BEAM-9448] Fix log message for job server cache. URL: https://github.com/apache/beam/pull/11051#discussion_r388600537
## File path: sdks/python/apache_beam/utils/subprocess_server.py
## @@ -194,9 +194,11 @@ def local_jar(cls, url):
     if os.path.exists(url):
       return url
     else:
-      _LOGGER.warning('Downloading job server jar from %s' % url)
       cached_jar = os.path.join(cls.JAR_CACHE, os.path.basename(url))
-      if not os.path.exists(cached_jar):
+      if os.path.exists(cached_jar):
+        _LOGGER.warning('Using cached job server jar from %s' % url)
+      else:
+        _LOGGER.warning('Downloading job server jar from %s' % url)
Review comment: Changed it to `info`.
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398728) Time Spent: 0.5h (was: 20m) > Misleading log line: says "downloading" when using cache > > > Key: BEAM-9448 > URL: https://issues.apache.org/jira/browse/BEAM-9448 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Trivial > Labels: portability-flink > Time Spent: 0.5h > Remaining Estimate: 0h > > https://github.com/apache/beam/blob/8d253ac99d78ef5345245ed71c7cf34328c55d9f/sdks/python/apache_beam/utils/subprocess_server.py#L197 -- This message was sent by Atlassian Jira (v8.3.4#803005)
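The fix discussed in the BEAM-9448 review above (log "downloading" only when the jar is not already cached, and log at `info` level) can be sketched in isolation. This is a simplified stand-in, not the code in `subprocess_server.py`: `resolve_jar`, its `cache_dir` parameter and the injected `download` callable are hypothetical names chosen so the example runs without network access.

```python
import logging
import os

_LOGGER = logging.getLogger(__name__)


def resolve_jar(url, cache_dir, download):
    """Return a local path for `url`, downloading it only on a cache miss.

    `download(url, dest)` is a caller-supplied function (an assumption made
    for this sketch) that writes the jar to `dest`.
    """
    if os.path.exists(url):
        # `url` is already a local file; nothing to cache or download.
        return url
    cached_jar = os.path.join(cache_dir, os.path.basename(url))
    if os.path.exists(cached_jar):
        _LOGGER.info('Using cached job server jar from %s', url)
    else:
        _LOGGER.info('Downloading job server jar from %s', url)
        os.makedirs(cache_dir, exist_ok=True)
        download(url, cached_jar)
    return cached_jar
```

Calling `resolve_jar` twice with the same URL should invoke `download` once; the second call takes the "Using cached" branch, which is the behavior the corrected log line is meant to reflect.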
[jira] [Updated] (BEAM-9456) Upgrade to gradle 6.2
[ https://issues.apache.org/jira/browse/BEAM-9456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Van Boxel updated BEAM-9456: - Status: Open (was: Triage Needed) > Upgrade to gradle 6.2 > - > > Key: BEAM-9456 > URL: https://issues.apache.org/jira/browse/BEAM-9456 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Alex Van Boxel >Assignee: Alex Van Boxel >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9456) Upgrade to gradle 6.2
Alex Van Boxel created BEAM-9456: Summary: Upgrade to gradle 6.2 Key: BEAM-9456 URL: https://issues.apache.org/jira/browse/BEAM-9456 Project: Beam Issue Type: Task Components: build-system Reporter: Alex Van Boxel Assignee: Alex Van Boxel -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9434) Performance improvements processing a large number of Avro files in S3+Spark
[ https://issues.apache.org/jira/browse/BEAM-9434?focusedWorklogId=398717&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398717 ] ASF GitHub Bot logged work on BEAM-9434: Author: ASF GitHub Bot Created on: 05/Mar/20 21:44 Start Date: 05/Mar/20 21:44 Worklog Time Spent: 10m Work Description: ecapoccia commented on issue #11037: [BEAM-9434] performance improvements reading many Avro files in S3 URL: https://github.com/apache/beam/pull/11037#issuecomment-595462888 R: @lukecwik do you mind having a look and giving me feedback on this PR? Thanks I look forward to hearing from you This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398717) Time Spent: 40m (was: 0.5h) > Performance improvements processing a large number of Avro files in S3+Spark > > > Key: BEAM-9434 > URL: https://issues.apache.org/jira/browse/BEAM-9434 > Project: Beam > Issue Type: Improvement > Components: io-java-aws, sdk-java-core >Affects Versions: 2.19.0 >Reporter: Emiliano Capoccia >Assignee: Emiliano Capoccia >Priority: Minor > Time Spent: 40m > Remaining Estimate: 0h > > There is a performance issue when processing a large number of small Avro > files in Spark on K8S (tens of thousands or more). > The recommended way of reading a pattern of Avro files in Beam is by means of: > > {code:java} > PCollection records = p.apply(AvroIO.read(AvroGenClass.class) > .from("s3://my-bucket/path-to/*.avro").withHintMatchesManyFiles()) > {code} > However, in the case of many small files, the above results in the entire > reading taking place in a single task/node, which is considerably slow and > has scalability issues. 
> The option of omitting the hint is not viable, as it results in too many > tasks being spawn, and the cluster being busy doing coordination of tiny > tasks with high overhead. > There are a few workarounds on the internet which mainly revolve around > compacting the input files before processing, so that a reduced number of > bulky files is processed in parallel. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9442) Schema Select does not properly handle nested nullable fields
[ https://issues.apache.org/jira/browse/BEAM-9442?focusedWorklogId=398716&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398716 ] ASF GitHub Bot logged work on BEAM-9442: Author: ASF GitHub Bot Created on: 05/Mar/20 21:44 Start Date: 05/Mar/20 21:44 Worklog Time Spent: 10m Work Description: alexvanboxel commented on issue #11046: [BEAM-9442] Properly handle nullable fields in Select URL: https://github.com/apache/beam/pull/11046#issuecomment-595462750 > @alexvanboxel are you talking about the RabbitMQ failure? yes (rookie, mistake of me) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398716) Time Spent: 1.5h (was: 1h 20m) > Schema Select does not properly handle nested nullable fields > - > > Key: BEAM-9442 > URL: https://issues.apache.org/jira/browse/BEAM-9442 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-harness >Reporter: Reuven Lax >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > A select of a nested field should be nullable if any of its parents are > nullable. So for example, a select of "a.b" should return a field named b > that is nullable if _either_ of a or b is nullable. Today we only examine b > to see if the selected fields should be nullable. > Also the Select transform itself does not properly check for null values, and > throws NullPointerExceptions when some row values are null. -- This message was sent by Atlassian Jira (v8.3.4#803005)
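The nullability rule stated in the BEAM-9442 description above (a select of "a.b" is nullable if either a or b is nullable) can be illustrated with a small model. This sketch is not Beam's Schema API: the `Field` dataclass, `selected_field_is_nullable` and `select_value` are hypothetical names, and rows are modeled as plain nested dicts.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Field:
    """A toy schema field: a nullability flag plus optional nested fields."""
    nullable: bool = False
    children: Dict[str, "Field"] = field(default_factory=dict)


def selected_field_is_nullable(schema, path):
    """A selected field is nullable if ANY field along the path is nullable."""
    nullable = False
    fields = schema
    for name in path.split("."):
        current = fields[name]
        nullable = nullable or current.nullable
        fields = current.children
    return nullable


def select_value(row, path):
    """Follow `path` through nested dicts, yielding None at a null parent."""
    value = row
    for name in path.split("."):
        if value is None:
            return None  # a null parent makes the selected value null
        value = value[name]
    return value


schema = {"a": Field(nullable=True, children={"b": Field(nullable=False)})}
print(selected_field_is_nullable(schema, "a.b"))
print(select_value({"a": None}, "a.b"))
```

Checking only the leaf field, which the issue says is the current behavior, would report "a.b" as non-nullable here even though a row with a null `a` has no `b` to return.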
[jira] [Comment Edited] (BEAM-9434) Performance improvements processing a large number of Avro files in S3+Spark
[ https://issues.apache.org/jira/browse/BEAM-9434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17050688#comment-17050688 ] Emiliano Capoccia edited comment on BEAM-9434 at 3/5/20, 9:41 PM: -- For the case outlined, a large number of very small (kb-sized) Avro files, the idea is to expose a new hint in the AvroIO class that reads the input files with a predetermined number of parallel tasks. Both extremes of a very high or a very low number of tasks should be avoided, as they are suboptimal in terms of performance: too many tasks lead to very high overhead, whereas too few (or a single one) serialise the reading on a few nodes and leave the cluster underutilised. In my tests I read 6578 Avro files from S3, each containing a single record. With the proposed pull request #11037 and 10 partitions, the time to read the files improved from 16 minutes to 2.3 minutes. Even more importantly, the memory used by every node is roughly 1/10th of that in the single-node case. *Reference run*, 6578 files, 1 task/executor, shuffle read 164kb, 6578 records, shuffle write 58Mb, 16 minutes execution time. *PR #11037*, 10 tasks/executors, 660 files per task average, totalling 6578; 23kb average shuffle read per task, 6 Mb average shuffle write per task, 2.3 minutes execution time per executor in parallel. was (Author: ecapoccia): In the case outlined of a large number of very small (kb) avro files, the idea is to expose a new hint in the AvroIO class that can handle the reading of the input files with a pre determined number of parallel tasks. Both extremes of having a very high or a very low number of tasks should be avoided, as they are suboptimal in terms of performance: too many tasks yield to very high overhead whereas a too few tasks (or a single one) result in an unacceptable serialisation of reading on too little node, with the cluster being under utilised. 
In the tests that I carried out, I was reading 6578 Avro files from S3, each containing a single record. The performance of the reading using the proposed pull request #11037 improved using 10 partitions, from 16 minutes to 2.3 minutes for performing the same exact work. Even more importantly, the memory used by every node is 1/10th roughly of the case with a single node. *Reference run*, 6578 files, 1 task/executor, shuffle read 164kb, 6578 records, shuffle write 58Mb, 16 minutes execution time. *PR #11037*, 10 tasks/executors, 660 files per task average, totalling 6578; 23kb average shuffle read per task, 6 Mb average shuffle write per task, 2.3 minutes execution time per executor in parallel. > Performance improvements processing a large number of Avro files in S3+Spark > > > Key: BEAM-9434 > URL: https://issues.apache.org/jira/browse/BEAM-9434 > Project: Beam > Issue Type: Improvement > Components: io-java-aws, sdk-java-core >Affects Versions: 2.19.0 >Reporter: Emiliano Capoccia >Assignee: Emiliano Capoccia >Priority: Minor > Time Spent: 0.5h > Remaining Estimate: 0h > > There is a performance issue when processing a large number of small Avro > files in Spark on K8S (tens of thousands or more). > The recommended way of reading a pattern of Avro files in Beam is by means of: > > {code:java} > PCollection records = p.apply(AvroIO.read(AvroGenClass.class) > .from("s3://my-bucket/path-to/*.avro").withHintMatchesManyFiles()) > {code} > However, in the case of many small files, the above results in the entire > reading taking place in a single task/node, which is considerably slow and > has scalability issues. > The option of omitting the hint is not viable, as it results in too many > tasks being spawn, and the cluster being busy doing coordination of tiny > tasks with high overhead. 
> There are a few workarounds on the internet which mainly revolve around > compacting the input files before processing, so that a reduced number of > bulky files is processed in parallel. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
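The comment above describes reading 6578 files with a predetermined number of parallel tasks (10 partitions). The arithmetic behind the per-task file counts can be sketched as follows; this is not the mechanism implemented in PR #11037, only a round-robin split with hypothetical names (`partition_round_robin`, the generated file names) to show how a fixed partition count spreads the work.

```python
def partition_round_robin(items, num_partitions):
    """Distribute items over a fixed number of buckets, one at a time."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    buckets = [[] for _ in range(num_partitions)]
    for index, item in enumerate(items):
        buckets[index % num_partitions].append(item)
    return buckets


# Hypothetical file names; only the count (6578) comes from the comment.
files = [f"s3://my-bucket/path-to/part-{i:05d}.avro" for i in range(6578)]
buckets = partition_round_robin(files, 10)
sizes = [len(bucket) for bucket in buckets]

print("files per task:", min(sizes), "to", max(sizes))
print("reported speedup: about %.1fx" % (16 / 2.3))
```

With 10 buckets each task receives 657 or 658 files, in line with the roughly 660 files per task quoted in the comment, and the reported times (16 minutes versus 2.3 minutes) correspond to a speedup of about 7x rather than a full 10x.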
[jira] [Updated] (BEAM-9434) Performance improvements processing a large number of Avro files in S3+Spark
[ https://issues.apache.org/jira/browse/BEAM-9434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Emiliano Capoccia updated BEAM-9434: Description: There is a performance issue when processing a large number of small Avro files in Spark on K8S (tens of thousands or more). The recommended way of reading a pattern of Avro files in Beam is by means of: {code:java} PCollection records = p.apply(AvroIO.read(AvroGenClass.class) .from("s3://my-bucket/path-to/*.avro").withHintMatchesManyFiles()) {code} However, in the case of many small files, the above results in the entire reading taking place in a single task/node, which is considerably slow and has scalability issues. The option of omitting the hint is not viable, as it results in too many tasks being spawn, and the cluster being busy doing coordination of tiny tasks with high overhead. There are a few workarounds on the internet which mainly revolve around compacting the input files before processing, so that a reduced number of bulky files is processed in parallel. was: There is a performance issue when processing in Spark on K8S a large number of small Avro files (tens of thousands or more). The recommended way of reading a pattern of Avro files in Beam is by means of: {code:java} PCollection records = p.apply(AvroIO.read(AvroGenClass.class) .from("s3://my-bucket/path-to/*.avro").withHintMatchesManyFiles()) {code} However, in the case of many small files the above results in the entire reading taking place in a single task/node, which is considerably slow and has scalability issues. The option of omitting the hint is not viable, as it results in too many tasks being spawn and the cluster busy doing coordination of tiny tasks with high overhead. There are a few workarounds on the internet which mainly revolve around compacting the input files before processing, so that a reduced number of bulky files is processed in parallel. 
> Performance improvements processing a large number of Avro files in S3+Spark > > > Key: BEAM-9434 > URL: https://issues.apache.org/jira/browse/BEAM-9434 > Project: Beam > Issue Type: Improvement > Components: io-java-aws, sdk-java-core >Affects Versions: 2.19.0 >Reporter: Emiliano Capoccia >Assignee: Emiliano Capoccia >Priority: Minor > Time Spent: 0.5h > Remaining Estimate: 0h > > There is a performance issue when processing a large number of small Avro > files in Spark on K8S (tens of thousands or more). > The recommended way of reading a pattern of Avro files in Beam is by means of: > > {code:java} > PCollection records = p.apply(AvroIO.read(AvroGenClass.class) > .from("s3://my-bucket/path-to/*.avro").withHintMatchesManyFiles()) > {code} > However, in the case of many small files, the above results in the entire > reading taking place in a single task/node, which is considerably slow and > has scalability issues. > The option of omitting the hint is not viable, as it results in too many > tasks being spawn, and the cluster being busy doing coordination of tiny > tasks with high overhead. > There are a few workarounds on the internet which mainly revolve around > compacting the input files before processing, so that a reduced number of > bulky files is processed in parallel. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9450) Update www.apache.org/dist/ links to point to downloads.apache.org
[ https://issues.apache.org/jira/browse/BEAM-9450?focusedWorklogId=398715&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398715 ] ASF GitHub Bot logged work on BEAM-9450: Author: ASF GitHub Bot Created on: 05/Mar/20 21:37 Start Date: 05/Mar/20 21:37 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #11054: [BEAM-9450] Update www.apache.org/dist/ links to downloads.apache.org URL: https://github.com/apache/beam/pull/11054 Issue Time Tracking --- Worklog Id: (was: 398715) Time Spent: 20m (was: 10m) > Update www.apache.org/dist/ links to point to downloads.apache.org > -- > > Key: BEAM-9450 > URL: https://issues.apache.org/jira/browse/BEAM-9450 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Fix For: Not applicable > > Time Spent: 20m > Remaining Estimate: 0h > > Infra is deprecating /dist for downloads, for ref > [https://blogs.apache.org/infra/entry/more-secure-and-robust-downloads] > {quote}As of March 2020, we are deprecating www.apache.org/dist/ in favor of > [https://downloads.apache.org/] > for backup downloads as well as signature and checksum verification. The > primary driver has been splitting up web site visits and downloads to gain > better control and offer a better service for both downloads and web site > visits. > {quote} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-9450) Update www.apache.org/dist/ links to point to downloads.apache.org
[ https://issues.apache.org/jira/browse/BEAM-9450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía resolved BEAM-9450. Fix Version/s: Not applicable Resolution: Fixed > Update www.apache.org/dist/ links to point to downloads.apache.org > -- > > Key: BEAM-9450 > URL: https://issues.apache.org/jira/browse/BEAM-9450 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Fix For: Not applicable > > Time Spent: 20m > Remaining Estimate: 0h > > Infra is deprecating /dist for downloads, for ref > [https://blogs.apache.org/infra/entry/more-secure-and-robust-downloads] > {quote}As of March 2020, we are deprecating www.apache.org/dist/ in favor of > [https://downloads.apache.org/] > for backup downloads as well as signature and checksum verification. The > primary driver has been splitting up web site visits and downloads to gain > better control and offer a better service for both downloads and web site > visits. > {quote} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9434) Performance improvements processing a large number of Avro files in S3+Spark
[ https://issues.apache.org/jira/browse/BEAM-9434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Emiliano Capoccia updated BEAM-9434: Summary: Performance improvements processing a large number of Avro files in S3+Spark (was: Performance improvements processiong a large number of Avro files in S3+Spark) > Performance improvements processing a large number of Avro files in S3+Spark > > > Key: BEAM-9434 > URL: https://issues.apache.org/jira/browse/BEAM-9434 > Project: Beam > Issue Type: Improvement > Components: io-java-aws, sdk-java-core >Affects Versions: 2.19.0 >Reporter: Emiliano Capoccia >Assignee: Emiliano Capoccia >Priority: Minor > Time Spent: 0.5h > Remaining Estimate: 0h > > There is a performance issue when processing a large number of small Avro > files (tens of thousands or more) in Spark on K8S. > The recommended way of reading a pattern of Avro files in Beam is by means of: > > {code:java} > PCollection<AvroGenClass> records = p.apply(AvroIO.read(AvroGenClass.class) > .from("s3://my-bucket/path-to/*.avro").withHintMatchesManyFiles()); > {code} > However, in the case of many small files, the above results in the entire > read taking place in a single task/node, which is considerably slow and > has scalability issues. > The option of omitting the hint is not viable, as it results in too many > tasks being spawned and the cluster being busy coordinating tiny tasks with > high overhead. > There are a few workarounds on the internet which mainly revolve around > compacting the input files before processing, so that a reduced number of > bulky files is processed in parallel. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9442) Schema Select does not properly handle nested nullable fields
[ https://issues.apache.org/jira/browse/BEAM-9442?focusedWorklogId=398714&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398714 ] ASF GitHub Bot logged work on BEAM-9442: Author: ASF GitHub Bot Created on: 05/Mar/20 21:35 Start Date: 05/Mar/20 21:35 Worklog Time Spent: 10m Work Description: reuvenlax commented on issue #11046: [BEAM-9442] Properly handle nullable fields in Select URL: https://github.com/apache/beam/pull/11046#issuecomment-595459415 @alexvanboxel are you talking about the RabbitMQ failure? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398714) Time Spent: 1h 20m (was: 1h 10m) > Schema Select does not properly handle nested nullable fields > - > > Key: BEAM-9442 > URL: https://issues.apache.org/jira/browse/BEAM-9442 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-harness >Reporter: Reuven Lax >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > A select of a nested field should be nullable if any of its parents are > nullable. So for example, a select of "a.b" should return a field named b > that is nullable if _either_ of a or b is nullable. Today we only examine b > to see if the selected fields should be nullable. > Also the Select transform itself does not properly check for null values, and > throws NullPointerExceptions when some row values are null. -- This message was sent by Atlassian Jira (v8.3.4#803005)
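The nullability rule described in BEAM-9442 can be illustrated with a minimal plain-Java sketch. It does not use Beam's Schema or Select classes: the `Field` type, the map-based row layout, and the method name `isSelectedFieldNullable` are hypothetical and exist only for this example. The idea is simply to OR together the nullability of every field along the selected path, rather than looking only at the last one.

```java
import java.util.Collections;
import java.util.Map;

public class NestedNullabilitySketch {

  /** Hypothetical stand-in for a schema field; this is not Beam's Schema.Field. */
  public static final class Field {
    final boolean nullable;
    final Map<String, Field> children; // empty for leaf fields

    public Field(boolean nullable, Map<String, Field> children) {
      this.nullable = nullable;
      this.children = children;
    }
  }

  /**
   * Returns true if selecting the dotted path (for example "a.b") should yield
   * a nullable field, i.e. if any field along the path is nullable.
   */
  public static boolean isSelectedFieldNullable(Map<String, Field> rowFields, String path) {
    boolean nullable = false;
    Map<String, Field> level = rowFields;
    for (String name : path.split("\\.")) {
      Field field = level.get(name);
      if (field == null) {
        throw new IllegalArgumentException("Unknown field in path: " + name);
      }
      // A nullable parent makes everything selected beneath it nullable.
      nullable = nullable || field.nullable;
      level = field.children;
    }
    return nullable;
  }

  public static void main(String[] args) {
    // b itself is not nullable, but its parent a is.
    Field b = new Field(false, Collections.<String, Field>emptyMap());
    Field a = new Field(true, Collections.singletonMap("b", b));
    Map<String, Field> row = Collections.singletonMap("a", a);
    System.out.println(isSelectedFieldNullable(row, "a.b")); // prints "true"
  }
}
```

Checking only `b` would report the selected field as non-nullable here, which is the behavior the issue describes as incorrect.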
[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398712&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398712 ] ASF GitHub Bot logged work on BEAM-8841: Author: ASF GitHub Bot Created on: 05/Mar/20 21:35 Start Date: 05/Mar/20 21:35 Worklog Time Spent: 10m Work Description: pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK URL: https://github.com/apache/beam/pull/10979#issuecomment-595459219 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398712) Time Spent: 7h (was: 6h 50m) > Add ability to perform BigQuery file loads using avro > - > > Key: BEAM-8841 > URL: https://issues.apache.org/jira/browse/BEAM-8841 > Project: Beam > Issue Type: Improvement > Components: io-py-gcp >Reporter: Chun Yang >Assignee: Chun Yang >Priority: Minor > Time Spent: 7h > Remaining Estimate: 0h > > Currently, JSON format is used for file loads into BigQuery in the Python > SDK. JSON has some disadvantages including size of serialized data and > inability to represent NaN and infinity float values. > BigQuery supports loading files in avro format, which can overcome these > disadvantages. The Java SDK already supports loading files using avro format > (BEAM-2879) so it makes sense to support it in the Python SDK as well. > The change will be somewhere around > [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398710&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398710 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 05/Mar/20 21:33 Start Date: 05/Mar/20 21:33 Worklog Time Spent: 10m Work Description: pabloem commented on issue #10994: [BEAM-8335] TeststreamService integration with DirectRunner URL: https://github.com/apache/beam/pull/10994#issuecomment-595458238 Run Python2_PVR_Flink PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398710) Time Spent: 97.5h (was: 97h 20m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 97.5h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9442) Schema Select does not properly handle nested nullable fields
[ https://issues.apache.org/jira/browse/BEAM-9442?focusedWorklogId=398703&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398703 ] ASF GitHub Bot logged work on BEAM-9442: Author: ASF GitHub Bot Created on: 05/Mar/20 21:25 Start Date: 05/Mar/20 21:25 Worklog Time Spent: 10m Work Description: alexvanboxel commented on issue #11046: [BEAM-9442] Properly handle nullable fields in Select URL: https://github.com/apache/beam/pull/11046#issuecomment-595455457 The error in the test is my fault, approved a fix without seeing the tests were ran. Just need to be rebased. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398703) Time Spent: 1h 10m (was: 1h) > Schema Select does not properly handle nested nullable fields > - > > Key: BEAM-9442 > URL: https://issues.apache.org/jira/browse/BEAM-9442 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-harness >Reporter: Reuven Lax >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > A select of a nested field should be nullable if any of its parents are > nullable. So for example, a select of "a.b" should return a field named b > that is nullable if _either_ of a or b is nullable. Today we only examine b > to see if the selected fields should be nullable. > Also the Select transform itself does not properly check for null values, and > throws NullPointerExceptions when some row values are null. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9250) Improve beam release script based on 2.19.0 release experience
[ https://issues.apache.org/jira/browse/BEAM-9250?focusedWorklogId=398697&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398697 ] ASF GitHub Bot logged work on BEAM-9250: Author: ASF GitHub Bot Created on: 05/Mar/20 21:22 Start Date: 05/Mar/20 21:22 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #10791: [BEAM-9250] Update release guide with more instructions. URL: https://github.com/apache/beam/pull/10791#issuecomment-595453999 LGTM This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398697) Time Spent: 2h 20m (was: 2h 10m) > Improve beam release script based on 2.19.0 release experience > -- > > Key: BEAM-9250 > URL: https://issues.apache.org/jira/browse/BEAM-9250 > Project: Beam > Issue Type: Improvement > Components: testing >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Fix For: Not applicable > > Time Spent: 2h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-7556) Enable to upgrade proxy generation independently of beam for java support
[ https://issues.apache.org/jira/browse/BEAM-7556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía resolved BEAM-7556. Fix Version/s: Not applicable Resolution: Duplicate > Enable to upgrade proxy generation independently of beam for java support > - > > Key: BEAM-7556 > URL: https://issues.apache.org/jira/browse/BEAM-7556 > Project: Beam > Issue Type: Task > Components: sdk-java-core >Affects Versions: 2.13.0 >Reporter: Romain Manni-Bucau >Priority: Major > Fix For: Not applicable > > > Beam is now using a custom shaded version of bytebuddy, which makes it impossible > - until you reshade - to upgrade bytebuddy without requiring a new Beam > release. > However, with the fast release rate of the JVM it is important to be able to > upgrade bytebuddy - at least while Beam is using it, which is technically not > a strong requirement - in order to run on new JVMs. > For example, the last Beam release does not support recent Java: > {code} > Caused by: java.lang.UnsupportedOperationException: Cannot define class using > reflection: Cannot define nest member class > java.lang.reflect.AccessibleObject$Cache + within different package then > class > org.apache.beam.repackaged.beam_sdks_java_core.net.bytebuddy.mirror.AccessibleObject > {code} > My preference for fixing this issue would be to relax the proxying definition to > just use a "proxy classloader" where the proxy would be defined, but that > requires being able to attach it to an execution - where Beam is not yet > super clean. > An alternative is to have an SPI for the asm usage and enable the user to replace > the bytebuddy impl with either an unshaded version or even a pure asm one, to > let them control the dependencies. > Romain -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9453) Fix potential UnsupportedEncodingException
[ https://issues.apache.org/jira/browse/BEAM-9453?focusedWorklogId=398694&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398694 ] ASF GitHub Bot logged work on BEAM-9453: Author: ASF GitHub Bot Created on: 05/Mar/20 21:17 Start Date: 05/Mar/20 21:17 Worklog Time Spent: 10m Work Description: alexvanboxel commented on issue #11017: [BEAM-9453] Fix potential UnsupportedEncodingException URL: https://github.com/apache/beam/pull/11017#issuecomment-595452062 > @alexvanboxel it causes broken jenkins test in spotless check on master branch > > https://builds.apache.org/job/beam_PreCommit_Spotless_Commit/7888/console sorry, should have seen that it didn't have tests attached. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398694) Time Spent: 40m (was: 0.5h) > Fix potential UnsupportedEncodingException > -- > > Key: BEAM-9453 > URL: https://issues.apache.org/jira/browse/BEAM-9453 > Project: Beam > Issue Type: Improvement > Components: io-java-rabbitmq >Affects Versions: 2.16.0 >Reporter: Henry Tang >Priority: Trivial > Labels: pull-request-available > Fix For: Not applicable > > Original Estimate: 0h > Time Spent: 40m > Remaining Estimate: 0h > > Currently the code assigns a new string with > {code:java} > String s = new String(bytes, "UTF-8"); > {code} > This has the possibility of throwing an UnsupportedEncodingException. > > Using > {code:java} > new String(bytes, StandardCharsets.UTF_8){code} > avoids the possibility of throwing an UnsupportedEncodingException > -- This message was sent by Atlassian Jira (v8.3.4#803005)
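The difference between the two constructors mentioned in BEAM-9453 can be seen in a small standalone sketch. The class and method names below (`Utf8DecodeSketch`, `decodeByName`, `decodeByCharset`) are made up for illustration and are not taken from the RabbitMQ IO code; only `new String(byte[], String)`, `new String(byte[], Charset)` and `StandardCharsets.UTF_8` are standard JDK API.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class Utf8DecodeSketch {

  // The charset-name overload declares a checked UnsupportedEncodingException,
  // so every caller has to catch or propagate it.
  public static String decodeByName(byte[] bytes) {
    try {
      return new String(bytes, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      // Every JVM must support UTF-8, but the compiler still forces this handler.
      throw new IllegalStateException("UTF-8 is not supported", e);
    }
  }

  // The Charset overload declares no checked exception, so no handler is needed.
  public static String decodeByCharset(byte[] bytes) {
    return new String(bytes, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    byte[] bytes = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
    // Both overloads decode to the same string.
    System.out.println(decodeByName(bytes).equals(decodeByCharset(bytes))); // prints "true"
  }
}
```

Both methods return the same result; the second form just removes the exception handling that can never usefully fire.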
[jira] [Resolved] (BEAM-8421) Job API relies on org.apache.beam.vendor.
[ https://issues.apache.org/jira/browse/BEAM-8421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía resolved BEAM-8421. Fix Version/s: Not applicable Resolution: Won't Fix > Job API relies on org.apache.beam.vendor. > - > > Key: BEAM-8421 > URL: https://issues.apache.org/jira/browse/BEAM-8421 > Project: Beam > Issue Type: Bug > Components: beam-model >Affects Versions: 2.16.0 >Reporter: Romain Manni-Bucau >Priority: Major > Fix For: Not applicable > > > API shouldn't rely on any internal -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8421) Job API relies on org.apache.beam.vendor.
[ https://issues.apache.org/jira/browse/BEAM-8421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-8421: --- Status: Open (was: Triage Needed) > Job API relies on org.apache.beam.vendor. > - > > Key: BEAM-8421 > URL: https://issues.apache.org/jira/browse/BEAM-8421 > Project: Beam > Issue Type: Bug > Components: beam-model >Affects Versions: 2.16.0 >Reporter: Romain Manni-Bucau >Priority: Major > > API shouldn't rely on any internal -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-7891) gRPC vendoring contains overlapping classes
[ https://issues.apache.org/jira/browse/BEAM-7891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía reassigned BEAM-7891: -- Assignee: Luke Cwik (was: Ismaël Mejía) > gRPC vendoring contains overlapping classes > --- > > Key: BEAM-7891 > URL: https://issues.apache.org/jira/browse/BEAM-7891 > Project: Beam > Issue Type: Bug > Components: build-system >Reporter: Romain Manni-Bucau >Assignee: Luke Cwik >Priority: Major > Fix For: 2.15.0 > > > In 2.14 the overlapping bug between modules is still not fixed; it still > prevents using Beam with some JVMs, heavily pollutes shading/uber jar > creation, and can prevent Beam from running under some classloading setups > (potentially in an engine/runner). Here is one example: > > {code:java} > [INFO] [WARNING] beam-vendor-grpc-1_13_1-0.2.jar, > beam-vendor-sdks-java-extensions-protobuf-2.14.0.jar define 1814 overlapping > classes: > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.collect.ImmutableMapValues$1 > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.util.concurrent.ImmediateFuture$ImmediateCancelledFuture > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.base.Converter$ReverseConverter > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.hash.HashCode$IntHashCode > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.collect.Iterables$8$1 > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.collect.HashBiMap > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.cache.CacheBuilderSpec$WriteDurationParser > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.collect.Multiset$Entry > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.graph.AbstractValueGraph > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.util.concurrent.InterruptibleTask{code} > This 
task is about fixing the overlaps but also ensuring they can't > come back in 2.15, since all versions have been affected since vendoring was set up > and it has never been cleanly fixed across the whole build. > > Thanks -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-7891) gRPC vendoring contains overlapping classes
[ https://issues.apache.org/jira/browse/BEAM-7891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía reassigned BEAM-7891: -- Assignee: Ismaël Mejía > gRPC vendoring contains overlapping classes > --- > > Key: BEAM-7891 > URL: https://issues.apache.org/jira/browse/BEAM-7891 > Project: Beam > Issue Type: Bug > Components: build-system >Reporter: Romain Manni-Bucau >Assignee: Ismaël Mejía >Priority: Major > Fix For: 2.15.0 > > > In 2.14 the overlapping bug between modules is still not fixed; it still > prevents using Beam with some JVMs, heavily pollutes shading/uber jar > creation, and can prevent Beam from running under some classloading setups > (potentially in an engine/runner). Here is one example: > > {code:java} > [INFO] [WARNING] beam-vendor-grpc-1_13_1-0.2.jar, > beam-vendor-sdks-java-extensions-protobuf-2.14.0.jar define 1814 overlapping > classes: > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.collect.ImmutableMapValues$1 > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.util.concurrent.ImmediateFuture$ImmediateCancelledFuture > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.base.Converter$ReverseConverter > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.hash.HashCode$IntHashCode > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.collect.Iterables$8$1 > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.collect.HashBiMap > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.cache.CacheBuilderSpec$WriteDurationParser > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.collect.Multiset$Entry > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.graph.AbstractValueGraph > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.util.concurrent.InterruptibleTask{code} > This task is 
about fixing the overlaps but also ensuring they can't > come back in 2.15, since all versions have been affected since vendoring was set up > and it has never been cleanly fixed across the whole build. > > Thanks -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398687&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398687 ] ASF GitHub Bot logged work on BEAM-8841: Author: ASF GitHub Bot Created on: 05/Mar/20 21:04 Start Date: 05/Mar/20 21:04 Worklog Time Spent: 10m Work Description: pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK URL: https://github.com/apache/beam/pull/10979#issuecomment-595446366 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398687) Time Spent: 6h 50m (was: 6h 40m) > Add ability to perform BigQuery file loads using avro > - > > Key: BEAM-8841 > URL: https://issues.apache.org/jira/browse/BEAM-8841 > Project: Beam > Issue Type: Improvement > Components: io-py-gcp >Reporter: Chun Yang >Assignee: Chun Yang >Priority: Minor > Time Spent: 6h 50m > Remaining Estimate: 0h > > Currently, JSON format is used for file loads into BigQuery in the Python > SDK. JSON has some disadvantages including size of serialized data and > inability to represent NaN and infinity float values. > BigQuery supports loading files in avro format, which can overcome these > disadvantages. The Java SDK already supports loading files using avro format > (BEAM-2879) so it makes sense to support it in the Python SDK as well. > The change will be somewhere around > [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9056) Staging artifacts from environment
[ https://issues.apache.org/jira/browse/BEAM-9056?focusedWorklogId=398680&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398680 ] ASF GitHub Bot logged work on BEAM-9056: Author: ASF GitHub Bot Created on: 05/Mar/20 20:53 Start Date: 05/Mar/20 20:53 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #10621: [BEAM-9056] Staging artifacts from environment URL: https://github.com/apache/beam/pull/10621#discussion_r388558761 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java ## @@ -261,14 +263,20 @@ public String registerCoder(Coder coder) throws IOException { * return the same unique ID. */ public String registerEnvironment(Environment env) { +String environmentId; String existing = environmentIds.get(env); if (existing != null) { - return existing; + environmentId = existing; +} else { + String name = uniqify(env.getUrn(), environmentIds.values()); + environmentIds.put(env, name); + componentsBuilder.putEnvironments(name, env); + environmentId = name; } -String name = uniqify(env.getUrn(), environmentIds.values()); -environmentIds.put(env, name); -componentsBuilder.putEnvironments(name, env); -return name; +if (defaultEnvironmentId == null) { Review comment: Ok. Let's still do this immediately after this one though so that we do not forget about it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398680) Time Spent: 5h 40m (was: 5.5h) > Staging artifacts from environment > -- > > Key: BEAM-9056 > URL: https://issues.apache.org/jira/browse/BEAM-9056 > Project: Beam > Issue Type: Sub-task > Components: java-fn-execution >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 5h 40m > Remaining Estimate: 0h > > staging artifacts from artifact information embedded in environment proto. > detail: > https://docs.google.com/document/d/1L7MJcfyy9mg2Ahfw5XPhUeBe-dyvAPMOYOiFA1-kAog -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8932) Expose complete Cloud Pub/Sub messages through PubsubIO API
[ https://issues.apache.org/jira/browse/BEAM-8932?focusedWorklogId=398679&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398679 ] ASF GitHub Bot logged work on BEAM-8932: Author: ASF GitHub Bot Created on: 05/Mar/20 20:51 Start Date: 05/Mar/20 20:51 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #10477: [BEAM-8932][Cleanup] Cleanup pubsubio by removing optionality and adding defaults to builders. URL: https://github.com/apache/beam/pull/10477 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398679) Time Spent: 16h 40m (was: 16.5h) > Expose complete Cloud Pub/Sub messages through PubsubIO API > --- > > Key: BEAM-8932 > URL: https://issues.apache.org/jira/browse/BEAM-8932 > Project: Beam > Issue Type: Bug > Components: beam-model >Reporter: Daniel Collins >Assignee: Daniel Collins >Priority: Major > Time Spent: 16h 40m > Remaining Estimate: 0h > > The PubsubIO API only exposes a subset of the fields in the underlying > PubsubMessage protocol buffer. To accommodate future feature changes as well > as for greater compatibility with code using the Cloud Pub/Sub APIs, a method > to read and write these protocol messages should be exposed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398678&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398678 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 05/Mar/20 20:49 Start Date: 05/Mar/20 20:49 Worklog Time Spent: 10m Work Description: pabloem commented on issue #10994: [BEAM-8335] TeststreamService integration with DirectRunner URL: https://github.com/apache/beam/pull/10994#issuecomment-595439800 Run Python2_PVR_Flink PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398678) Time Spent: 97h 20m (was: 97h 10m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 97h 20m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398674&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398674 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 05/Mar/20 20:42 Start Date: 05/Mar/20 20:42 Worklog Time Spent: 10m Work Description: pabloem commented on issue #10994: [BEAM-8335] TeststreamService integration with DirectRunner URL: https://github.com/apache/beam/pull/10994#issuecomment-595436740 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398674) Time Spent: 97h 10m (was: 97h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 97h 10m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-7891) gRPC vendoring contains overlapping classes
[ https://issues.apache.org/jira/browse/BEAM-7891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke Cwik updated BEAM-7891: Fix Version/s: 2.15.0 > gRPC vendoring contains overlapping classes > --- > > Key: BEAM-7891 > URL: https://issues.apache.org/jira/browse/BEAM-7891 > Project: Beam > Issue Type: Bug > Components: build-system >Reporter: Romain Manni-Bucau >Priority: Major > Fix For: 2.15.0 > > > In 2.14 the overlapping-classes bug between modules is still not fixed: it still > prevents using Beam with some JVMs, considerably pollutes shadowing/uber jar > creation, and can prevent Beam from running under some classloading setups > (potentially in an engine/runner). Here is one example: > > {code:java} > [INFO] [WARNING] beam-vendor-grpc-1_13_1-0.2.jar, > beam-vendor-sdks-java-extensions-protobuf-2.14.0.jar define 1814 overlapping > classes: > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.collect.ImmutableMapValues$1 > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.util.concurrent.ImmediateFuture$ImmediateCancelledFuture > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.base.Converter$ReverseConverter > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.hash.HashCode$IntHashCode > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.collect.Iterables$8$1 > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.collect.HashBiMap > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.cache.CacheBuilderSpec$WriteDurationParser > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.collect.Multiset$Entry > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.graph.AbstractValueGraph > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.util.concurrent.InterruptibleTask{code} > This task is about fixing the overlaps but also ensuring they can't > recur in 2.15, since all versions have been affected since vendoring was set up > and the problem has never been cleanly fixed across the whole build. > > Thanks -- This message was sent by Atlassian Jira (v8.3.4#803005)
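The overlap warning quoted in this issue can be reproduced outside the build with a small check. The following is a minimal sketch, not the tooling Beam or Maven actually uses: it assumes two jar paths are passed on the command line, and the class name `OverlappingClasses` and its helper methods are hypothetical. It lists the `.class` entries of each jar and reports the names present in both.

```java
import java.io.IOException;
import java.util.Set;
import java.util.TreeSet;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

public class OverlappingClasses {
    // Collect the names of all .class entries in a jar (hypothetical helper).
    static Set<String> classEntries(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.stream()
                .map(entry -> entry.getName())
                .filter(name -> name.endsWith(".class"))
                .collect(Collectors.toCollection(TreeSet::new));
        }
    }

    // Return the entry names that appear in both sets, sorted.
    static Set<String> overlap(Set<String> first, Set<String> second) {
        Set<String> common = new TreeSet<>(first);
        common.retainAll(second);
        return common;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: OverlappingClasses <first.jar> <second.jar>");
            return;
        }
        Set<String> common = overlap(classEntries(args[0]), classEntries(args[1]));
        System.out.println(common.size() + " overlapping classes");
        common.forEach(name -> System.out.println("  - " + name));
    }
}
```

Run against the two jars named in the warning, a check like this should list the same kind of duplicated vendored Guava classes, assuming the jars are available locally.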
[jira] [Resolved] (BEAM-7891) gRPC vendoring contains overlapping classes
[ https://issues.apache.org/jira/browse/BEAM-7891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke Cwik resolved BEAM-7891. - Resolution: Fixed > gRPC vendoring contains overlapping classes > --- > > Key: BEAM-7891 > URL: https://issues.apache.org/jira/browse/BEAM-7891 > Project: Beam > Issue Type: Bug > Components: build-system >Reporter: Romain Manni-Bucau >Priority: Major > Fix For: 2.15.0 > > > In 2.14 the overlapping-classes bug between modules is still not fixed: it still > prevents using Beam with some JVMs, considerably pollutes shadowing/uber jar > creation, and can prevent Beam from running under some classloading setups > (potentially in an engine/runner). Here is one example: > > {code:java} > [INFO] [WARNING] beam-vendor-grpc-1_13_1-0.2.jar, > beam-vendor-sdks-java-extensions-protobuf-2.14.0.jar define 1814 overlapping > classes: > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.collect.ImmutableMapValues$1 > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.util.concurrent.ImmediateFuture$ImmediateCancelledFuture > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.base.Converter$ReverseConverter > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.hash.HashCode$IntHashCode > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.collect.Iterables$8$1 > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.collect.HashBiMap > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.cache.CacheBuilderSpec$WriteDurationParser > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.collect.Multiset$Entry > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.graph.AbstractValueGraph > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.util.concurrent.InterruptibleTask{code} > This task is about fixing the overlaps but also ensuring they can't > recur in 2.15, since all versions have been affected since vendoring was set up > and the problem has never been cleanly fixed across the whole build. > > Thanks -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-7891) gRPC vendoring contains overlapping classes
[ https://issues.apache.org/jira/browse/BEAM-7891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052492#comment-17052492 ] Luke Cwik commented on BEAM-7891: - This was fixed in 2.15. The jar dropped from ~3 MB to ~30 KB. > gRPC vendoring contains overlapping classes > --- > > Key: BEAM-7891 > URL: https://issues.apache.org/jira/browse/BEAM-7891 > Project: Beam > Issue Type: Bug > Components: build-system >Reporter: Romain Manni-Bucau >Priority: Major > > In 2.14 the overlapping-classes bug between modules is still not fixed: it still > prevents using Beam with some JVMs, considerably pollutes shadowing/uber jar > creation, and can prevent Beam from running under some classloading setups > (potentially in an engine/runner). Here is one example: > > {code:java} > [INFO] [WARNING] beam-vendor-grpc-1_13_1-0.2.jar, > beam-vendor-sdks-java-extensions-protobuf-2.14.0.jar define 1814 overlapping > classes: > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.collect.ImmutableMapValues$1 > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.util.concurrent.ImmediateFuture$ImmediateCancelledFuture > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.base.Converter$ReverseConverter > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.hash.HashCode$IntHashCode > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.collect.Iterables$8$1 > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.collect.HashBiMap > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.cache.CacheBuilderSpec$WriteDurationParser > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.collect.Multiset$Entry > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.graph.AbstractValueGraph > [INFO] [WARNING] - > org.apache.beam.vendor.grpc.v1p13p1.com.google.common.util.concurrent.InterruptibleTask{code} > This task is about fixing the overlaps but also ensuring they can't > recur in 2.15, since all versions have been affected since vendoring was set up > and the problem has never been cleanly fixed across the whole build. > > Thanks -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow
[ https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=398662&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398662 ] ASF GitHub Bot logged work on BEAM-7926: Author: ASF GitHub Bot Created on: 05/Mar/20 20:23 Start Date: 05/Mar/20 20:23 Worklog Time Spent: 10m Work Description: aaltay commented on issue #11020: [BEAM-7926] Update Data Visualization URL: https://github.com/apache/beam/pull/11020#issuecomment-595429224 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398662) Time Spent: 57h 20m (was: 57h 10m) > Show PCollection with Interactive Beam in a data-centric user flow > -- > > Key: BEAM-7926 > URL: https://issues.apache.org/jira/browse/BEAM-7926 > Project: Beam > Issue Type: New Feature > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Time Spent: 57h 20m > Remaining Estimate: 0h > > Support auto plotting / charting of materialized data of a given PCollection > with Interactive Beam. > Say an Interactive Beam pipeline is defined as > > {code:java} > p = beam.Pipeline(InteractiveRunner()) > pcoll = p | 'Transform' >> transform() > pcoll2 = ... > pcoll3 = ...{code} > The user can call a single function and get auto-magical charting of the data, > e.g., > {code:java} > show(pcoll, pcoll2) > {code} > Throughout the process, a pipeline fragment is built that includes only the > transforms necessary to produce the desired pcolls (pcoll and pcoll2), and > that fragment is executed. > This makes the Interactive Beam user flow data-centric. > > Detailed > [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=398663&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398663 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 05/Mar/20 20:23 Start Date: 05/Mar/20 20:23 Worklog Time Spent: 10m Work Description: stale[bot] commented on issue #9056: [BEAM-7746] Add python type hints URL: https://github.com/apache/beam/pull/9056#issuecomment-595429286 This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the d...@beam.apache.org list. Thank you for your contributions. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398663) Time Spent: 71.5h (was: 71h 20m) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 71.5h > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398659&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398659 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 05/Mar/20 20:18 Start Date: 05/Mar/20 20:18 Worklog Time Spent: 10m Work Description: pabloem commented on issue #10994: [BEAM-8335] TeststreamService integration with DirectRunner URL: https://github.com/apache/beam/pull/10994#issuecomment-595427003 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398659) Time Spent: 96h 50m (was: 96h 40m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 96h 50m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398660&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398660 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 05/Mar/20 20:18 Start Date: 05/Mar/20 20:18 Worklog Time Spent: 10m Work Description: pabloem commented on issue #10994: [BEAM-8335] TeststreamService integration with DirectRunner URL: https://github.com/apache/beam/pull/10994#issuecomment-595427106 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398660) Time Spent: 97h (was: 96h 50m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 97h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9442) Schema Select does not properly handle nested nullable fields
[ https://issues.apache.org/jira/browse/BEAM-9442?focusedWorklogId=398637&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398637 ] ASF GitHub Bot logged work on BEAM-9442: Author: ASF GitHub Bot Created on: 05/Mar/20 20:01 Start Date: 05/Mar/20 20:01 Worklog Time Spent: 10m Work Description: reuvenlax commented on issue #11046: [BEAM-9442] Properly handle nullable fields in Select URL: https://github.com/apache/beam/pull/11046#issuecomment-595420264 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398637) Time Spent: 1h (was: 50m) > Schema Select does not properly handle nested nullable fields > - > > Key: BEAM-9442 > URL: https://issues.apache.org/jira/browse/BEAM-9442 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-harness >Reporter: Reuven Lax >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > A select of a nested field should be nullable if any of its parents are > nullable. So for example, a select of "a.b" should return a field named b > that is nullable if _either_ of a or b is nullable. Today we only examine b > to see if the selected fields should be nullable. > Also the Select transform itself does not properly check for null values, and > throws NullPointerExceptions when some row values are null. -- This message was sent by Atlassian Jira (v8.3.4#803005)
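The nullability rule described in this issue can be illustrated with a small sketch. This is not Beam's Select implementation: the `Field` class and the `selectedFieldIsNullable` method below are hypothetical stand-ins, used only to show that a selected nested field is treated as nullable when any field on the path to it is nullable.

```java
import java.util.Arrays;
import java.util.List;

public class NestedNullability {
    /** Hypothetical stand-in for a schema field: a name plus a nullable flag. */
    static final class Field {
        final String name;
        final boolean nullable;

        Field(String name, boolean nullable) {
            this.name = name;
            this.nullable = nullable;
        }
    }

    /**
     * The path is ordered from the outermost parent to the selected leaf,
     * e.g. [a, b] for a select of "a.b". The result is nullable if any
     * field along the path is nullable, not just the leaf.
     */
    static boolean selectedFieldIsNullable(List<Field> path) {
        for (Field field : path) {
            if (field.nullable) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Field a = new Field("a", true);   // nullable parent
        Field b = new Field("b", false);  // non-nullable leaf
        // Checking only b would report false; checking the whole path reports true.
        System.out.println(selectedFieldIsNullable(Arrays.asList(a, b)));
    }
}
```

Checking only the leaf field, as the issue says happens today, would miss the case shown in `main`, where the parent is nullable and the leaf is not.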
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398636&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398636 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 05/Mar/20 20:01 Start Date: 05/Mar/20 20:01 Worklog Time Spent: 10m Work Description: pabloem commented on issue #10994: [BEAM-8335] TeststreamService integration with DirectRunner URL: https://github.com/apache/beam/pull/10994#issuecomment-595420007 there are some test failures This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398636) Time Spent: 96h 40m (was: 96.5h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 96h 40m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9453) Fix potential UnsupportedEncodingException
[ https://issues.apache.org/jira/browse/BEAM-9453?focusedWorklogId=398629&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398629 ] ASF GitHub Bot logged work on BEAM-9453: Author: ASF GitHub Bot Created on: 05/Mar/20 19:44 Start Date: 05/Mar/20 19:44 Worklog Time Spent: 10m Work Description: vectorijk commented on issue #11017: [BEAM-9453] Fix potential UnsupportedEncodingException URL: https://github.com/apache/beam/pull/11017#issuecomment-595412338 @alexvanboxel it causes broken jenkins test in spotless check on master branch https://builds.apache.org/job/beam_PreCommit_Spotless_Commit/7888/console This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398629) Time Spent: 0.5h (was: 20m) > Fix potential UnsupportedEncodingException > -- > > Key: BEAM-9453 > URL: https://issues.apache.org/jira/browse/BEAM-9453 > Project: Beam > Issue Type: Improvement > Components: io-java-rabbitmq >Affects Versions: 2.16.0 >Reporter: Henry Tang >Priority: Trivial > Labels: pull-request-available > Fix For: Not applicable > > Original Estimate: 0h > Time Spent: 0.5h > Remaining Estimate: 0h > > Currently the code assigns a new string with > {code:java} > String s = new String(bytes, "UTF-8"); > {code} > This has the possibility of throwing an UnsupportedEncodingException. > > Using > {code:java} > new String(bytes, StandardCharsets.UTF_8){code} > avoids the possibility of throwing an UnsupportedEncodingException > -- This message was sent by Atlassian Jira (v8.3.4#803005)
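The difference between the two constructors mentioned in this issue can be shown side by side. This is a minimal sketch; the class and method names (`CharsetExample`, `decodeByName`, `decodeByCharset`) are made up for illustration and do not come from the RabbitMQ IO code. The overload that takes a charset name declares the checked `UnsupportedEncodingException`, while the overload that takes a `Charset` does not.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class CharsetExample {
    // Charset-name overload: the compiler forces handling of the checked
    // UnsupportedEncodingException, even though "UTF-8" is always available.
    static String decodeByName(byte[] bytes) {
        try {
            return new String(bytes, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Charset overload: no checked exception is declared, so no try/catch is needed.
    static String decodeByCharset(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = "Beam".getBytes(StandardCharsets.UTF_8);
        // Both overloads decode the same bytes to the same string.
        System.out.println(decodeByName(bytes).equals(decodeByCharset(bytes)));
    }
}
```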
[jira] [Updated] (BEAM-9455) Environment-sensitive provisioning for Dataflow
[ https://issues.apache.org/jira/browse/BEAM-9455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Heejong Lee updated BEAM-9455: -- Parent: BEAM-9238 Issue Type: Sub-task (was: Improvement) > Environment-sensitive provisioning for Dataflow > --- > > Key: BEAM-9455 > URL: https://issues.apache.org/jira/browse/BEAM-9455 > Project: Beam > Issue Type: Sub-task > Components: runner-dataflow >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > > Environment-sensitive provisioning for Dataflow. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9455) Environment-sensitive provisioning for Dataflow
Heejong Lee created BEAM-9455: - Summary: Environment-sensitive provisioning for Dataflow Key: BEAM-9455 URL: https://issues.apache.org/jira/browse/BEAM-9455 Project: Beam Issue Type: Improvement Components: runner-dataflow Reporter: Heejong Lee Assignee: Heejong Lee Environment-sensitive provisioning for Dataflow. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9455) Environment-sensitive provisioning for Dataflow
[ https://issues.apache.org/jira/browse/BEAM-9455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Heejong Lee updated BEAM-9455: -- Status: Open (was: Triage Needed) > Environment-sensitive provisioning for Dataflow > --- > > Key: BEAM-9455 > URL: https://issues.apache.org/jira/browse/BEAM-9455 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > > Environment-sensitive provisioning for Dataflow. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-9229) Adding dependency information to Environment proto
[ https://issues.apache.org/jira/browse/BEAM-9229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Heejong Lee resolved BEAM-9229. --- Fix Version/s: 2.20.0 Resolution: Fixed > Adding dependency information to Environment proto > -- > > Key: BEAM-9229 > URL: https://issues.apache.org/jira/browse/BEAM-9229 > Project: Beam > Issue Type: Sub-task > Components: beam-model >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Fix For: 2.20.0 > > Time Spent: 7.5h > Remaining Estimate: 0h > > Adding dependency information to Environment proto. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9383) Staging Dataflow artifacts from environment
[ https://issues.apache.org/jira/browse/BEAM-9383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Heejong Lee updated BEAM-9383: -- Parent: BEAM-9238 Issue Type: Sub-task (was: Improvement) > Staging Dataflow artifacts from environment > --- > > Key: BEAM-9383 > URL: https://issues.apache.org/jira/browse/BEAM-9383 > Project: Beam > Issue Type: Sub-task > Components: java-fn-execution >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Staging Dataflow artifacts from environment -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9056) Staging artifacts from environment
[ https://issues.apache.org/jira/browse/BEAM-9056?focusedWorklogId=398618&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398618 ] ASF GitHub Bot logged work on BEAM-9056: Author: ASF GitHub Bot Created on: 05/Mar/20 19:21 Start Date: 05/Mar/20 19:21 Worklog Time Spent: 10m Work Description: ihji commented on pull request #10621: [BEAM-9056] Staging artifacts from environment URL: https://github.com/apache/beam/pull/10621#discussion_r388509693
## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java ##
@@ -261,14 +263,20 @@ public String registerCoder(Coder coder) throws IOException {
    * return the same unique ID.
    */
   public String registerEnvironment(Environment env) {
+    String environmentId;
     String existing = environmentIds.get(env);
     if (existing != null) {
-      return existing;
+      environmentId = existing;
+    } else {
+      String name = uniqify(env.getUrn(), environmentIds.values());
+      environmentIds.put(env, name);
+      componentsBuilder.putEnvironments(name, env);
+      environmentId = name;
     }
-    String name = uniqify(env.getUrn(), environmentIds.values());
-    environmentIds.put(env, name);
-    componentsBuilder.putEnvironments(name, env);
-    return name;
+    if (defaultEnvironmentId == null) {
Review comment: If we change the signature of `registerEnvironment`, a number of test files (*TranslationTest, *RunnerTest) also need to be touched. I think it will create unnecessary noise in this PR. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398618) Time Spent: 5.5h (was: 5h 20m) > Staging artifacts from environment > -- > > Key: BEAM-9056 > URL: https://issues.apache.org/jira/browse/BEAM-9056 > Project: Beam > Issue Type: Sub-task > Components: java-fn-execution >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 5.5h > Remaining Estimate: 0h > > staging artifacts from artifact information embedded in environment proto. > detail: > https://docs.google.com/document/d/1L7MJcfyy9mg2Ahfw5XPhUeBe-dyvAPMOYOiFA1-kAog -- This message was sent by Atlassian Jira (v8.3.4#803005)
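The register-or-reuse pattern visible in the quoted diff can be sketched in isolation. This is an illustration under stated assumptions, not the actual `SdkComponents` code: the `Env` class is a hypothetical stand-in for the Environment proto, the `uniqify` helper here is an invented implementation (the real one is not shown in the message), and the `defaultEnvironmentId` handling is left out because the quoted diff is cut off at that point.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class EnvironmentRegistry {
    /** Hypothetical stand-in for an Environment: a URN plus an opaque payload. */
    static final class Env {
        final String urn;
        final String payload;

        Env(String urn, String payload) {
            this.urn = urn;
            this.payload = payload;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof Env)) {
                return false;
            }
            Env that = (Env) other;
            return urn.equals(that.urn) && payload.equals(that.payload);
        }

        @Override
        public int hashCode() {
            return Objects.hash(urn, payload);
        }
    }

    private final Map<Env, String> environmentIds = new HashMap<>();

    /** Assumed behavior: append a counter to the base name until it is unused. */
    static String uniqify(String base, Collection<String> taken) {
        String candidate = base;
        int suffix = 1;
        while (taken.contains(candidate)) {
            candidate = base + suffix++;
        }
        return candidate;
    }

    /** Returns the existing ID for an equal environment, or assigns a new unique one. */
    public String register(Env env) {
        String environmentId;
        String existing = environmentIds.get(env);
        if (existing != null) {
            environmentId = existing;
        } else {
            String name = uniqify(env.urn, environmentIds.values());
            environmentIds.put(env, name);
            environmentId = name;
        }
        return environmentId;
    }
}
```

Routing both branches through a single `environmentId` variable, as the diff does, leaves room for a step after the lookup without changing the method signature, which appears to be the concern raised in the review comment.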