[jira] [Commented] (BEAM-8251) Add worker_region and worker_zone options
[ https://issues.apache.org/jira/browse/BEAM-8251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213478#comment-17213478 ]

Jing Chen commented on BEAM-8251:
---------------------------------

It seems all subtasks are done, so we should be able to close the issue.

> Add worker_region and worker_zone options
> -----------------------------------------
>
> Key: BEAM-8251
> URL: https://issues.apache.org/jira/browse/BEAM-8251
> Project: Beam
> Issue Type: Improvement
> Components: runner-dataflow
> Reporter: Kyle Weaver
> Priority: P3
> Time Spent: 4h 20m
> Remaining Estimate: 0h
>
> We are refining the way the user specifies worker regions and zones to the
> Dataflow service. We need to add worker_region and worker_zone pipeline
> options that will be preferred over the old experiments=worker_region and
> --zone flags. I will create subtasks for adding these options to each SDK.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (BEAM-8253) (Go SDK) Add worker_region and worker_zone options
[ https://issues.apache.org/jira/browse/BEAM-8253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Chen updated BEAM-8253: Status: Resolved (was: Open) > (Go SDK) Add worker_region and worker_zone options > -- > > Key: BEAM-8253 > URL: https://issues.apache.org/jira/browse/BEAM-8253 > Project: Beam > Issue Type: Sub-task > Components: runner-dataflow, sdk-go >Reporter: Kyle Weaver >Assignee: Jing Chen >Priority: P3 > Labels: stale-assigned > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8253) (Go SDK) Add worker_region and worker_zone options
[ https://issues.apache.org/jira/browse/BEAM-8253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Chen updated BEAM-8253: Status: Resolved (was: Resolved) > (Go SDK) Add worker_region and worker_zone options > -- > > Key: BEAM-8253 > URL: https://issues.apache.org/jira/browse/BEAM-8253 > Project: Beam > Issue Type: Sub-task > Components: runner-dataflow, sdk-go >Reporter: Kyle Weaver >Assignee: Jing Chen >Priority: P3 > Labels: stale-assigned > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-8253) (Go SDK) Add worker_region and worker_zone options
[ https://issues.apache.org/jira/browse/BEAM-8253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Chen reassigned BEAM-8253: --- Assignee: Jing Chen > (Go SDK) Add worker_region and worker_zone options > -- > > Key: BEAM-8253 > URL: https://issues.apache.org/jira/browse/BEAM-8253 > Project: Beam > Issue Type: Sub-task > Components: runner-dataflow, sdk-go >Reporter: Kyle Weaver >Assignee: Jing Chen >Priority: P3 > Labels: stale-assigned > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-9959) Mistakes Computing Composite Inputs and Outputs
[ https://issues.apache.org/jira/browse/BEAM-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Chen reassigned BEAM-9959:
------------------------------

    Assignee: Jing Chen

> Mistakes Computing Composite Inputs and Outputs
> -----------------------------------------------
>
> Key: BEAM-9959
> URL: https://issues.apache.org/jira/browse/BEAM-9959
> Project: Beam
> Issue Type: Bug
> Components: sdk-go
> Reporter: Robert Burke
> Assignee: Jing Chen
> Priority: P3
>
> The Go SDK uses a Scope object to manage beam Composites.
> A bug was discovered when consuming a PCollection in both the composite that
> created it, and in a separate composite.
> Further, the Go SDK should verify that the root hypergraph structure is a DAG
> and provide a reasonable error if it is not. In particular, the leaf nodes of
> the graph could form a DAG, but due to how the beam.Scope object is used, the
> hypergraph might not be a DAG.
> E.g. it's possible to write the following in the Go SDK.
> PTransforms A, B, C and PCollections colA, colB, and Composites a, b.
> A and C are in a, and B is in b.
> A generates colA.
> B consumes colA, and generates colB.
> C consumes colA and colB.
> ```
> a := s.Scope(a)
> b := s.Scope(b)
> colA := beam.Impulse(a)
> colB := beam.ParDo(b, /* DoFn elided in the original */, colA)
> beam.ParDo0(a, /* DoFn elided in the original */, colA, beam.SideInput{colB})
> ```
> If it doesn't already, the Go SDK must emit a clear error, and fail pipeline
> construction.
> If the affected composites are roots in the graph, the cycle makes it
> impossible to topologically sort the root ptransforms for the pipeline graph,
> which can adversely affect runners.
> The recommendation is always to wrap uses of scope in functions or other
> scopes to prevent such incorrect constructions.
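The A/B/C example in this issue can be checked mechanically. The following is only a minimal sketch of one way such a check could work, not the Go SDK's implementation: the `transform` struct, `compositeEdges`, and `checkAcyclic` are hypothetical names invented for illustration, and only the standard library is used. It lifts the leaf-level data flow to the composite level and runs a depth-first search for a cycle.

```go
package main

import (
	"errors"
	"fmt"
)

// transform is a hypothetical leaf PTransform: the composite that owns it,
// the collections it consumes, and the collections it produces.
type transform struct {
	name      string
	composite string
	inputs    []string
	outputs   []string
}

// compositeEdges lifts leaf-level data flow to the composite level: an edge
// x -> y means a collection produced inside x is consumed inside y.
func compositeEdges(ts []transform) map[string]map[string]bool {
	producer := map[string]string{} // collection -> producing composite
	edges := map[string]map[string]bool{}
	for _, t := range ts {
		if edges[t.composite] == nil {
			edges[t.composite] = map[string]bool{}
		}
		for _, out := range t.outputs {
			producer[out] = t.composite
		}
	}
	for _, t := range ts {
		for _, in := range t.inputs {
			if p, ok := producer[in]; ok && p != t.composite {
				edges[p][t.composite] = true
			}
		}
	}
	return edges
}

var errCycle = errors.New("composite graph is not a DAG")

const (
	unvisited = iota
	inProgress
	done
)

// checkAcyclic runs a depth-first search and reports an error as soon as it
// reaches a composite that is still on the current search path.
func checkAcyclic(edges map[string]map[string]bool) error {
	state := map[string]int{}
	var visit func(n string) error
	visit = func(n string) error {
		switch state[n] {
		case inProgress:
			return fmt.Errorf("%w: cycle through composite %q", errCycle, n)
		case done:
			return nil
		}
		state[n] = inProgress
		for next := range edges[n] {
			if err := visit(next); err != nil {
				return err
			}
		}
		state[n] = done
		return nil
	}
	for n := range edges {
		if err := visit(n); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	// The construction from the issue: A and C live in a, B lives in b.
	ts := []transform{
		{name: "A", composite: "a", outputs: []string{"colA"}},
		{name: "B", composite: "b", inputs: []string{"colA"}, outputs: []string{"colB"}},
		{name: "C", composite: "a", inputs: []string{"colA", "colB"}},
	}
	if err := checkAcyclic(compositeEdges(ts)); err != nil {
		fmt.Println("pipeline construction failed:", err)
	}
}
```

At the leaf level the edges A -> B, A -> C, B -> C form a DAG, but at the composite level colA gives a -> b and colB gives b -> a, so the sketch reports a cycle. Which composite is named in the message depends on map iteration order.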
[jira] [Commented] (BEAM-8253) (Go SDK) Add worker_region and worker_zone options
[ https://issues.apache.org/jira/browse/BEAM-8253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210699#comment-17210699 ]

Jing Chen commented on BEAM-8253:
---------------------------------

Do you mind sharing more information about the task? I may be able to pick it up.

> (Go SDK) Add worker_region and worker_zone options
> --------------------------------------------------
>
> Key: BEAM-8253
> URL: https://issues.apache.org/jira/browse/BEAM-8253
> Project: Beam
> Issue Type: Sub-task
> Components: runner-dataflow, sdk-go
> Reporter: Kyle Weaver
> Priority: P3
> Labels: stale-assigned
[jira] [Updated] (BEAM-8017) Plumb errors and remove panics from package graphx
[ https://issues.apache.org/jira/browse/BEAM-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Chen updated BEAM-8017:
---------------------------

    Status: Resolved (was: Open)

> Plumb errors and remove panics from package graphx
> --------------------------------------------------
>
> Key: BEAM-8017
> URL: https://issues.apache.org/jira/browse/BEAM-8017
> Project: Beam
> Issue Type: Improvement
> Components: sdk-go
> Reporter: Robert Burke
> Assignee: Jing Chen
> Priority: P3
> Labels: Novice, beginner, noob, starter
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> The graphx package, and in particular serialize.go and coder.go, should be
> returning errors back up rather than panicking when issues occur deeper when
> marshalling types. Panicking makes errors harder to follow, since there's now
> an unnecessary panic trace to skip rather than a clearly constructed error
> message.
> Not difficult, but may be tedious. Requires plumbing the errors and
> handling/wrapping them appropriately instead of using panic. Most error
> handling is presently correctly wrapped anyway.
> The graphx package as a rule is intended for beam internal use, and not part
> of the user surface, so making the API changes (which aren't backwards
> compatible) isn't the worst. Most of the affected methods are unexported.
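To make the requested change concrete, here is a minimal sketch of the pattern: a deep helper returns an error instead of panicking, and each caller wraps it with context on the way up. This is not graphx code. `encodeKind`, `encodeStruct`, `encodeValue`, `errUnsupported`, and the two record types are hypothetical names used only for illustration; the wrapping itself uses the standard `fmt.Errorf` with `%w` and `errors.Is`.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

// errUnsupported is a hypothetical sentinel that callers can test for.
var errUnsupported = errors.New("unsupported type")

// encodeKind stands in for a deep marshalling helper. Instead of calling
// panic on an unknown kind, it returns an error for the caller to handle.
func encodeKind(t reflect.Type) (string, error) {
	switch t.Kind() {
	case reflect.Int, reflect.Int64:
		return "int64", nil
	case reflect.String:
		return "string", nil
	default:
		return "", fmt.Errorf("%w: kind %v", errUnsupported, t.Kind())
	}
}

// encodeStruct plumbs errors from encodeKind upward, adding the field and
// type names so the final message reads as one clear chain.
func encodeStruct(t reflect.Type) ([]string, error) {
	if t.Kind() != reflect.Struct {
		return nil, fmt.Errorf("encoding %v: %w: not a struct", t, errUnsupported)
	}
	out := make([]string, 0, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		enc, err := encodeKind(f.Type)
		if err != nil {
			return nil, fmt.Errorf("encoding field %s of %v: %w", f.Name, t, err)
		}
		out = append(out, f.Name+":"+enc)
	}
	return out, nil
}

// encodeValue is the entry point used in the examples below.
func encodeValue(v interface{}) ([]string, error) {
	return encodeStruct(reflect.TypeOf(v))
}

// goodRecord has only supported field kinds; badRecord has a channel field.
type goodRecord struct {
	ID   int
	Name string
}

type badRecord struct {
	ID int
	Ch chan int
}

func main() {
	if out, err := encodeValue(goodRecord{}); err == nil {
		fmt.Println("encoded:", out)
	}
	if _, err := encodeValue(badRecord{}); err != nil {
		fmt.Println("error:", err)
		fmt.Println("is unsupported:", errors.Is(err, errUnsupported))
	}
}
```

Because every layer wraps with `%w`, the caller gets one readable message naming the field and type, and can still match the underlying cause with `errors.Is`, with no panic trace involved.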
[jira] [Assigned] (BEAM-8017) Plumb errors and remove panics from package graphx
[ https://issues.apache.org/jira/browse/BEAM-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Chen reassigned BEAM-8017:
------------------------------

    Assignee: Jing Chen

> Plumb errors and remove panics from package graphx
> --------------------------------------------------
>
> Key: BEAM-8017
> URL: https://issues.apache.org/jira/browse/BEAM-8017
> Project: Beam
> Issue Type: Improvement
> Components: sdk-go
> Reporter: Robert Burke
> Assignee: Jing Chen
> Priority: P3
> Labels: Novice, beginner, noob, starter
>
> The graphx package, and in particular serialize.go and coder.go, should be
> returning errors back up rather than panicking when issues occur deeper when
> marshalling types. Panicking makes errors harder to follow, since there's now
> an unnecessary panic trace to skip rather than a clearly constructed error
> message.
> Not difficult, but may be tedious. Requires plumbing the errors and
> handling/wrapping them appropriately instead of using panic. Most error
> handling is presently correctly wrapped anyway.
> The graphx package as a rule is intended for beam internal use, and not part
> of the user surface, so making the API changes (which aren't backwards
> compatible) isn't the worst. Most of the affected methods are unexported.
[jira] [Commented] (BEAM-10660) [Go SDK] Implement Timer support
[ https://issues.apache.org/jira/browse/BEAM-10660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17197443#comment-17197443 ]

Jing Chen commented on BEAM-10660:
----------------------------------

Hi, do you have more details on the issue?

> [Go SDK] Implement Timer support
> --------------------------------
>
> Key: BEAM-10660
> URL: https://issues.apache.org/jira/browse/BEAM-10660
> Project: Beam
> Issue Type: New Feature
> Components: sdk-go
> Reporter: Robert Burke
> Priority: P2
[jira] [Commented] (BEAM-10159) Support Reading data from Databricks Delta
[ https://issues.apache.org/jira/browse/BEAM-10159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149026#comment-17149026 ]

Jing Chen commented on BEAM-10159:
----------------------------------

Is there any analog within Beam that supports such reading?

> Support Reading data from Databricks Delta
> ------------------------------------------
>
> Key: BEAM-10159
> URL: https://issues.apache.org/jira/browse/BEAM-10159
> Project: Beam
> Issue Type: New Feature
> Components: io-ideas
> Reporter: Ismaël Mejía
> Priority: P2
>
> Databricks Delta is an open source storage layer on top of different
> filesystems. The current implementation of Delta is strongly coupled with
> Spark so we cannot rely on it because it would break Beam portability.
> However, there is now an open specification for Delta's protocol:
> https://github.com/delta-io/delta/blob/master/PROTOCOL.md
> Another possible approach could be to investigate whether Beam could use a
> manifest-based approach like Presto does:
> https://docs.databricks.com/delta/presto-integration.html
[jira] [Assigned] (BEAM-5504) PubsubAvroTable
[ https://issues.apache.org/jira/browse/BEAM-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Chen reassigned BEAM-5504: --- Assignee: Jing Chen > PubsubAvroTable > --- > > Key: BEAM-5504 > URL: https://issues.apache.org/jira/browse/BEAM-5504 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Rui Wang >Assignee: Jing Chen >Priority: P2 > Time Spent: 6h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-5504) PubsubAvroTable
[ https://issues.apache.org/jira/browse/BEAM-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014758#comment-17014758 ] Jing Chen commented on BEAM-5504: - [~amaliujia] curious if there is an easy way to run integration locally. > PubsubAvroTable > --- > > Key: BEAM-5504 > URL: https://issues.apache.org/jira/browse/BEAM-5504 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Rui Wang >Assignee: Jing Chen >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-5504) PubsubAvroTable
[ https://issues.apache.org/jira/browse/BEAM-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014758#comment-17014758 ] Jing Chen edited comment on BEAM-5504 at 1/14/20 12:59 AM: --- [~amaliujia] curious if there is an easy way to run integration tests locally. was (Author: jingc): [~amaliujia] curious if there is an easy way to run integration locally. > PubsubAvroTable > --- > > Key: BEAM-5504 > URL: https://issues.apache.org/jira/browse/BEAM-5504 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Rui Wang >Assignee: Jing Chen >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (BEAM-5504) PubsubAvroTable
[ https://issues.apache.org/jira/browse/BEAM-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-5504 started by Jing Chen. --- > PubsubAvroTable > --- > > Key: BEAM-5504 > URL: https://issues.apache.org/jira/browse/BEAM-5504 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Rui Wang >Assignee: Jing Chen >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8801) PubsubMessageToRow should not check useFlatSchema() in processElement
[ https://issues.apache.org/jira/browse/BEAM-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16990289#comment-16990289 ]

Jing Chen commented on BEAM-8801:
---------------------------------

[~bhulette] thank you for the explanation. I am able to pick up the ticket.

I am working on [BEAM-5504|https://issues.apache.org/jira/browse/BEAM-5504] to add Avro support for the Pubsub table, which might benefit from the refactoring. Feel free to let me know if you have any comments :)

> PubsubMessageToRow should not check useFlatSchema() in processElement
> ---------------------------------------------------------------------
>
> Key: BEAM-8801
> URL: https://issues.apache.org/jira/browse/BEAM-8801
> Project: Beam
> Issue Type: Improvement
> Components: dsl-sql
> Reporter: Brian Hulette
> Assignee: Brian Hulette
> Priority: Major
>
> Currently we check useFlatSchema() for every element that's processed.
> Instead, we should check it once at pipeline construction time. See
> [comment|https://github.com/apache/beam/pull/10158#discussion_r348805530].
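The idea of checking a flag once at construction time, rather than once per element, can be sketched as follows. This is an illustration in Go and not the Java code of PubsubMessageToRow: `message`, `converter`, `flatRow`, `nestedRow`, and `newConverter` are hypothetical names, and the two row layouts are arbitrary placeholders.

```go
package main

import "fmt"

// message is a hypothetical stand-in for a Pubsub message.
type message struct {
	payload    string
	attributes map[string]string
}

// converter turns one message into a list of column values.
type converter func(m message) []string

// flatRow and nestedRow are placeholder layouts; the real schemas differ.
func flatRow(m message) []string {
	return []string{m.payload}
}

func nestedRow(m message) []string {
	return []string{fmt.Sprint(len(m.attributes)), m.payload}
}

// newConverter inspects the schema flag exactly once, when the transform is
// built, and returns the function that every element will then go through.
func newConverter(useFlatSchema bool) converter {
	if useFlatSchema {
		return flatRow
	}
	return nestedRow
}

func main() {
	convert := newConverter(true) // decided once, before any element is seen
	for _, m := range []message{{payload: "a"}, {payload: "b"}} {
		fmt.Println(convert(m)) // no per-element flag check here
	}
}
```

The per-element path contains no branch on the schema flag; the decision is baked into which function was selected when the converter was constructed.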
[jira] [Assigned] (BEAM-8801) PubsubMessageToRow should not check useFlatSchema() in processElement
[ https://issues.apache.org/jira/browse/BEAM-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Chen reassigned BEAM-8801: --- Assignee: Jing Chen (was: Brian Hulette) > PubsubMessageToRow should not check useFlatSchema() in processElement > - > > Key: BEAM-8801 > URL: https://issues.apache.org/jira/browse/BEAM-8801 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Brian Hulette >Assignee: Jing Chen >Priority: Major > > Currently we check useFlatSchema() for every element that's processed. > Instead, we should check it once at pipeline construction time. See > [comment|https://github.com/apache/beam/pull/10158#discussion_r348805530]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8801) PubsubMessageToRow should not check useFlatSchema() in processElement
[ https://issues.apache.org/jira/browse/BEAM-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987401#comment-16987401 ]

Jing Chen commented on BEAM-8801:
---------------------------------

I happened to see the TODO in the code base. I am curious what you mean by `pipeline construction time`.

> PubsubMessageToRow should not check useFlatSchema() in processElement
> ---------------------------------------------------------------------
>
> Key: BEAM-8801
> URL: https://issues.apache.org/jira/browse/BEAM-8801
> Project: Beam
> Issue Type: Improvement
> Components: dsl-sql
> Reporter: Brian Hulette
> Assignee: Brian Hulette
> Priority: Major
>
> Currently we check useFlatSchema() for every element that's processed.
> Instead, we should check it once at pipeline construction time. See
> [comment|https://github.com/apache/beam/pull/10158#discussion_r348805530].
[jira] [Commented] (BEAM-5504) PubsubAvroTable
[ https://issues.apache.org/jira/browse/BEAM-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987351#comment-16987351 ] Jing Chen commented on BEAM-5504: - thanks for the heads up. will put both in the same pr. > PubsubAvroTable > --- > > Key: BEAM-5504 > URL: https://issues.apache.org/jira/browse/BEAM-5504 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Rui Wang >Assignee: Jing Chen >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-5504) PubsubAvroTable
[ https://issues.apache.org/jira/browse/BEAM-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Chen reassigned BEAM-5504: --- Assignee: Jing Chen > PubsubAvroTable > --- > > Key: BEAM-5504 > URL: https://issues.apache.org/jira/browse/BEAM-5504 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Rui Wang >Assignee: Jing Chen >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-5504) PubsubAvroTable
[ https://issues.apache.org/jira/browse/BEAM-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987277#comment-16987277 ] Jing Chen commented on BEAM-5504: - [~amaliujia] just need to clarify that we need to support avro format in both pubsub read and write, am i correct? > PubsubAvroTable > --- > > Key: BEAM-5504 > URL: https://issues.apache.org/jira/browse/BEAM-5504 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-7996) Add support for remaining data types in python RowCoder
[ https://issues.apache.org/jira/browse/BEAM-7996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Chen reassigned BEAM-7996:
------------------------------

    Assignee: Jing Chen

> Add support for remaining data types in python RowCoder
> -------------------------------------------------------
>
> Key: BEAM-7996
> URL: https://issues.apache.org/jira/browse/BEAM-7996
> Project: Beam
> Issue Type: New Feature
> Components: sdk-py-core
> Reporter: Brian Hulette
> Assignee: Jing Chen
> Priority: Major
>
> In the initial [python RowCoder
> implementation|https://github.com/apache/beam/pull/9188] we only added
> support for the data types that already had coders in the Python SDK. We
> should add support for the remaining data types that are not currently
> supported:
> * INT8 (ByteCoder in Java)
> * INT16 (BigEndianShortCoder in Java)
> * FLOAT (FloatCoder in Java)
> * BOOLEAN (BooleanCoder in Java)
> * Map (MapCoder in Java)
> We might consider making those coders standard so they can be tested
> independently from RowCoder in standard_coders.yaml. Or, if we don't do that
> we should probably add a more robust testing framework for RowCoder itself,
> because it will be challenging to test all of these types as part of the
> RowCoder tests in standard_coders.yaml.
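As a rough illustration of what the missing primitive encodings might look like on the wire, here is a sketch in Go (not Python, and not Beam code). It assumes the Java coders named in the issue use fixed-width big-endian layouts in the style of `java.io.DataOutputStream`, with one byte for INT8 and BOOLEAN, two bytes for INT16, and four bytes of IEEE 754 for FLOAT. That assumption should be checked against the Beam coder sources before relying on it. All function names are hypothetical, and Map is left out because it depends on the key and value coders.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// encodeInt8 writes a signed 8-bit value as a single byte.
func encodeInt8(v int8) []byte {
	return []byte{byte(v)}
}

// encodeInt16 writes a signed 16-bit value as two big-endian bytes.
func encodeInt16(v int16) []byte {
	b := make([]byte, 2)
	binary.BigEndian.PutUint16(b, uint16(v))
	return b
}

// decodeInt16 is the inverse of encodeInt16 and rejects wrong-sized input.
func decodeInt16(b []byte) (int16, error) {
	if len(b) != 2 {
		return 0, fmt.Errorf("int16 needs 2 bytes, got %d", len(b))
	}
	return int16(binary.BigEndian.Uint16(b)), nil
}

// encodeFloat32 writes the IEEE 754 bit pattern as four big-endian bytes.
func encodeFloat32(v float32) []byte {
	b := make([]byte, 4)
	binary.BigEndian.PutUint32(b, math.Float32bits(v))
	return b
}

// encodeBool writes true as 1 and false as 0 in a single byte.
func encodeBool(v bool) []byte {
	if v {
		return []byte{1}
	}
	return []byte{0}
}

func main() {
	fmt.Printf("int8(-1):     % x\n", encodeInt8(-1))
	fmt.Printf("int16(-2):    % x\n", encodeInt16(-2))
	fmt.Printf("float32(1.5): % x\n", encodeFloat32(1.5))
	fmt.Printf("bool(true):   % x\n", encodeBool(true))
}
```

Byte-level cases like these are the kind of input and output pairs that a shared test file such as standard_coders.yaml could hold, which is why making the coders standard would let them be tested independently of RowCoder.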
[jira] [Resolved] (BEAM-8406) TextTable support JSON format
[ https://issues.apache.org/jira/browse/BEAM-8406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Chen resolved BEAM-8406. - Fix Version/s: Not applicable Resolution: Done > TextTable support JSON format > - > > Key: BEAM-8406 > URL: https://issues.apache.org/jira/browse/BEAM-8406 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Rui Wang >Assignee: Jing Chen >Priority: Major > Fix For: Not applicable > > Time Spent: 1.5h > Remaining Estimate: 0h > > Have a JSON table implementation similar to [1]. > [1]: > https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/text/TextTable.java -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-5504) Pubsub Write by BeamSQL
[ https://issues.apache.org/jira/browse/BEAM-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16983874#comment-16983874 ] Jing Chen commented on BEAM-5504: - [~amaliujia] do you mind sharing more context and information? > Pubsub Write by BeamSQL > --- > > Key: BEAM-5504 > URL: https://issues.apache.org/jira/browse/BEAM-5504 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (BEAM-8298) Implement state caching for side inputs
[ https://issues.apache.org/jira/browse/BEAM-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-8298 started by Jing Chen. --- > Implement state caching for side inputs > --- > > Key: BEAM-8298 > URL: https://issues.apache.org/jira/browse/BEAM-8298 > Project: Beam > Issue Type: Improvement > Components: runner-core, sdk-py-harness >Reporter: Maximilian Michels >Assignee: Jing Chen >Priority: Major > > Caching is currently only implemented for user state. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-3788) Implement a Kafka IO for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16982762#comment-16982762 ] Jing Chen commented on BEAM-3788: - i am curious if there is a plan to integrate confluent's schema registry etc or simply vanilla kafka > Implement a Kafka IO for Python SDK > --- > > Key: BEAM-3788 > URL: https://issues.apache.org/jira/browse/BEAM-3788 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Priority: Major > > This will be implemented using the Splittable DoFn framework. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (BEAM-8406) TextTable support JSON format
[ https://issues.apache.org/jira/browse/BEAM-8406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-8406 started by Jing Chen. --- > TextTable support JSON format > - > > Key: BEAM-8406 > URL: https://issues.apache.org/jira/browse/BEAM-8406 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Rui Wang >Assignee: Jing Chen >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > Have a JSON table implementation similar to [1]. > [1]: > https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/text/TextTable.java -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-8406) TextTable support JSON format
[ https://issues.apache.org/jira/browse/BEAM-8406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Chen reassigned BEAM-8406: --- Assignee: Jing Chen > TextTable support JSON format > - > > Key: BEAM-8406 > URL: https://issues.apache.org/jira/browse/BEAM-8406 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Rui Wang >Assignee: Jing Chen >Priority: Major > > Have a JSON table implementation similar to [1]. > [1]: > https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/text/TextTable.java -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-8298) Implement state caching for side inputs
[ https://issues.apache.org/jira/browse/BEAM-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Chen reassigned BEAM-8298: --- Assignee: Jing Chen > Implement state caching for side inputs > --- > > Key: BEAM-8298 > URL: https://issues.apache.org/jira/browse/BEAM-8298 > Project: Beam > Issue Type: Improvement > Components: runner-core, sdk-py-harness >Reporter: Maximilian Michels >Assignee: Jing Chen >Priority: Major > > Caching is currently only implemented for user state. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8338) Support ES 7.x for ElasticsearchIO
[ https://issues.apache.org/jira/browse/BEAM-8338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969957#comment-16969957 ]

Jing Chen commented on BEAM-8338:
---------------------------------

[~mbrunat] thanks for the heads-up. Feel free to self-assign the issue :)

> Support ES 7.x for ElasticsearchIO
> ----------------------------------
>
> Key: BEAM-8338
> URL: https://issues.apache.org/jira/browse/BEAM-8338
> Project: Beam
> Issue Type: Improvement
> Components: io-java-elasticsearch
> Reporter: Michal Brunát
> Priority: Major
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> Elasticsearch has released 7.4 but ElasticsearchIO only supports 2.x, 5.x, 6.x.
> We should support ES 7.x for ElasticsearchIO.
> [https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html]
>
> [https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java]
[jira] [Assigned] (BEAM-8338) Support ES 7.x for ElasticsearchIO
[ https://issues.apache.org/jira/browse/BEAM-8338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Chen reassigned BEAM-8338: --- Assignee: (was: Jing Chen) > Support ES 7.x for ElasticsearchIO > -- > > Key: BEAM-8338 > URL: https://issues.apache.org/jira/browse/BEAM-8338 > Project: Beam > Issue Type: Improvement > Components: io-java-elasticsearch >Reporter: Michal Brunát >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > Elasticsearch has released 7.4 but ElasticsearchIO only supports 2x,5.x,6.x. > We should support ES 7.x for ElasticsearchIO. > [https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html] > > [https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8298) Implement state caching for side inputs
[ https://issues.apache.org/jira/browse/BEAM-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969667#comment-16969667 ]

Jing Chen commented on BEAM-8298:
---------------------------------

[~mxm] would you mind sharing details on the issue? I am interested in working on it if it is still free.

> Implement state caching for side inputs
> ---------------------------------------
>
> Key: BEAM-8298
> URL: https://issues.apache.org/jira/browse/BEAM-8298
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py-core
> Reporter: Maximilian Michels
> Priority: Major
>
> Caching is currently only implemented for user state.
[jira] [Assigned] (BEAM-8338) Support ES 7.x for ElasticsearchIO
[ https://issues.apache.org/jira/browse/BEAM-8338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Chen reassigned BEAM-8338: --- Assignee: Jing Chen > Support ES 7.x for ElasticsearchIO > -- > > Key: BEAM-8338 > URL: https://issues.apache.org/jira/browse/BEAM-8338 > Project: Beam > Issue Type: Improvement > Components: io-java-elasticsearch >Reporter: Michal Brunát >Assignee: Jing Chen >Priority: Major > > Elasticsearch has released 7.4 but ElasticsearchIO only supports 2x,5.x,6.x. > We should support ES 7.x for ElasticsearchIO. > [https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html] > > [https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8376) Add FirestoreIO connector to Java SDK
[ https://issues.apache.org/jira/browse/BEAM-8376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968627#comment-16968627 ]

Jing Chen commented on BEAM-8376:
---------------------------------

I chatted with [~chamikara] offline; there are ongoing efforts on the issue, so I am unassigning myself.

> Add FirestoreIO connector to Java SDK
> -------------------------------------
>
> Key: BEAM-8376
> URL: https://issues.apache.org/jira/browse/BEAM-8376
> Project: Beam
> Issue Type: New Feature
> Components: io-java-gcp
> Reporter: Stefan Djelekar
> Priority: Major
>
> Motivation:
> There is no Firestore connector for Java SDK at the moment.
> Having it will enhance the integrations with database options on the Google
> Cloud Platform.
[jira] [Assigned] (BEAM-8376) Add FirestoreIO connector to Java SDK
[ https://issues.apache.org/jira/browse/BEAM-8376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Chen reassigned BEAM-8376: --- Assignee: (was: Jing Chen) > Add FirestoreIO connector to Java SDK > - > > Key: BEAM-8376 > URL: https://issues.apache.org/jira/browse/BEAM-8376 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Stefan Djelekar >Priority: Major > > Motivation: > There is no Firestore connector for Java SDK at the moment. > Having it will enhance the integrations with database options on the Google > Cloud Platform. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Chen reassigned BEAM-8561: --- Assignee: (was: Jing Chen) > Add ThriftIO to Support IO for Thrift Files > --- > > Key: BEAM-8561 > URL: https://issues.apache.org/jira/browse/BEAM-8561 > Project: Beam > Issue Type: New Feature > Components: io-java-files >Reporter: Chris Larsen >Priority: Minor > > Similar to AvroIO it would be very useful to support reading and writing > to/from Thrift files with a native connector. > Functionality would include: > # read() - Reading from one or more Thrift files. > # write() - Writing to one or more Thrift files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968555#comment-16968555 ]

Jing Chen commented on BEAM-8561:
---------------------------------

[~clarsen] that would be great. I will reassign the ticket to you; let me know if there is anything I can help with :)

> Add ThriftIO to Support IO for Thrift Files
> -------------------------------------------
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
> Issue Type: New Feature
> Components: io-java-files
> Reporter: Chris Larsen
> Assignee: Jing Chen
> Priority: Minor
>
> Similar to AvroIO it would be very useful to support reading and writing
> to/from Thrift files with a native connector.
> Functionality would include:
> # read() - Reading from one or more Thrift files.
> # write() - Writing to one or more Thrift files.
[jira] [Assigned] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Chen reassigned BEAM-8561: --- Assignee: Jing Chen > Add ThriftIO to Support IO for Thrift Files > --- > > Key: BEAM-8561 > URL: https://issues.apache.org/jira/browse/BEAM-8561 > Project: Beam > Issue Type: New Feature > Components: io-java-files >Reporter: Chris Larsen >Assignee: Jing Chen >Priority: Minor > > Similar to AvroIO it would be very useful to support reading and writing > to/from Thrift files with a native connector. > Functionality would include: > # read() - Reading from one or more Thrift files. > # write() - Writing to one or more Thrift files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-7732) Allow access to SpannerOptions in Beam
[ https://issues.apache.org/jira/browse/BEAM-7732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968186#comment-16968186 ]

Jing Chen commented on BEAM-7732:
---------------------------------

I am curious whether the issue is still open, [~nielm].

> Allow access to SpannerOptions in Beam
> --------------------------------------
>
> Key: BEAM-7732
> URL: https://issues.apache.org/jira/browse/BEAM-7732
> Project: Beam
> Issue Type: Improvement
> Components: io-java-gcp
> Affects Versions: 2.12.0, 2.13.0
> Reporter: Niel Markwick
> Priority: Minor
> Time Spent: 3.5h
> Remaining Estimate: 0h
>
> Beam hides the
> [SpannerOptions|https://github.com/googleapis/google-cloud-java/blob/master/google-cloud-clients/google-cloud-spanner/src/main/java/com/google/cloud/spanner/SpannerOptions.java]
> object behind a
> [SpannerConfig|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerConfig.java]
> object because the SpannerOptions object is not serializable.
> This means that the only options that can be set are those that can be
> specified in SpannerConfig - limited to host, project, instance, database.
> Suggestion: add the possibility to set a SpannerOptionsFactory in
> SpannerConfig:
> {code:java}
> public interface SpannerOptionsFactory extends Serializable {
>   public SpannerOptions create();
> }
> {code}
> This would allow the user to use this factory class to specify custom
> SpannerOptions before they are passed on to the connectToSpanner() method;
> connectToSpanner() would then become:
> {code:java}
> public SpannerAccessor connectToSpanner() {
>
>   SpannerOptions.Builder builder = spannerOptionsFactory.create().toBuilder();
>   // rest of connectToSpanner follows, setting project, host, etc.
> {code}
>
[jira] [Commented] (BEAM-8376) Add FirestoreIO connector to Java SDK
[ https://issues.apache.org/jira/browse/BEAM-8376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968160#comment-16968160 ]

Jing Chen commented on BEAM-8376:
---------------------------------

[~chamikara], do you have any context you could share? I could ping you offline for more information and resources on this, if you are OK with that.

> Add FirestoreIO connector to Java SDK
> -------------------------------------
>
> Key: BEAM-8376
> URL: https://issues.apache.org/jira/browse/BEAM-8376
> Project: Beam
> Issue Type: New Feature
> Components: io-java-gcp
> Reporter: Stefan Djelekar
> Priority: Major
>
> Motivation:
> There is no Firestore connector for Java SDK at the moment.
> Having it will enhance the integrations with database options on the Google
> Cloud Platform.
[jira] [Assigned] (BEAM-8376) Add FirestoreIO connector to Java SDK
[ https://issues.apache.org/jira/browse/BEAM-8376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Chen reassigned BEAM-8376:
------------------------------

Assignee: Jing Chen

> Add FirestoreIO connector to Java SDK
> -------------------------------------
>
> Key: BEAM-8376
> URL: https://issues.apache.org/jira/browse/BEAM-8376
> Project: Beam
> Issue Type: New Feature
> Components: io-java-gcp
> Reporter: Stefan Djelekar
> Assignee: Jing Chen
> Priority: Major
>
> Motivation:
> There is no Firestore connector for Java SDK at the moment.
> Having it will enhance the integrations with database options on the Google
> Cloud Platform.
[jira] [Commented] (BEAM-3493) Prevent users from "implementing" PipelineOptions
[ https://issues.apache.org/jira/browse/BEAM-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968095#comment-16968095 ]

Jing Chen commented on BEAM-3493:
---------------------------------

This issue should already be fixed by lines 69 and 70 of [https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptionsValidator.java#L69].

I will close the ticket once either [~kenn] or [~lcwik] confirms.

Thanks,
Jing

> Prevent users from "implementing" PipelineOptions
> -------------------------------------------------
>
> Key: BEAM-3493
> URL: https://issues.apache.org/jira/browse/BEAM-3493
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Reporter: Kenneth Knowles
> Assignee: Jing Chen
> Priority: Minor
> Labels: newbie, starter
>
> I've seen a user implement {{PipelineOptions}}. This implies that it is
> backwards-incompatible to add new options, which is of course not our intent.
> We should at least document very loudly that it is not to be implemented, and
> preferably have some automation that will fail on load if they have
> implemented it. Ideas?
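One way a validator can reject hand-written implementations of an options interface is to require that the instance be a java.lang.reflect.Proxy created by a known factory. The sketch below illustrates that idea in isolation. It is an assumption about the general technique, not a copy of PipelineOptionsValidator, and DemoOptions, DemoHandler, and ProxyCheckSketch are hypothetical names.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Hypothetical options interface standing in for PipelineOptions.
interface DemoOptions {
  String getName();
}

// Hypothetical handler type; only the factory method below creates it.
final class DemoHandler implements InvocationHandler {
  @Override
  public Object invoke(Object proxy, Method method, Object[] args) {
    switch (method.getName()) {
      case "getName":
        return "demo";
      case "hashCode":
        return System.identityHashCode(proxy);
      case "equals":
        return proxy == args[0];
      case "toString":
        return "DemoOptions(proxy)";
      default:
        throw new UnsupportedOperationException(method.getName());
    }
  }
}

final class ProxyCheckSketch {
  // The only supported way to obtain a DemoOptions instance.
  static DemoOptions create() {
    return (DemoOptions) Proxy.newProxyInstance(
        DemoOptions.class.getClassLoader(),
        new Class<?>[] {DemoOptions.class},
        new DemoHandler());
  }

  // Fails if the instance is a user-written class or a proxy with a foreign handler.
  static void validate(DemoOptions options) {
    if (!Proxy.isProxyClass(options.getClass())) {
      throw new IllegalArgumentException("Use ProxyCheckSketch.create() to build options.");
    }
    if (!(Proxy.getInvocationHandler(options) instanceof DemoHandler)) {
      throw new IllegalArgumentException("Options proxy was not created by the factory.");
    }
  }

  public static void main(String[] args) {
    validate(create()); // factory-built instance passes
    try {
      validate(() -> "hand-written"); // user-implemented instance is rejected
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Because the check runs on the instance rather than at compile time, it cannot stop a user from writing an implementing class; it only makes such an instance fail as soon as it is validated.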
[jira] [Assigned] (BEAM-3493) Prevent users from "implementing" PipelineOptions
[ https://issues.apache.org/jira/browse/BEAM-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Chen reassigned BEAM-3493:
------------------------------

Assignee: Jing Chen

> Prevent users from "implementing" PipelineOptions
> -------------------------------------------------
>
> Key: BEAM-3493
> URL: https://issues.apache.org/jira/browse/BEAM-3493
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Reporter: Kenneth Knowles
> Assignee: Jing Chen
> Priority: Minor
> Labels: newbie, starter
>
> I've seen a user implement {{PipelineOptions}}. This implies that it is
> backwards-incompatible to add new options, which is of course not our intent.
> We should at least document very loudly that it is not to be implemented, and
> preferably have some automation that will fail on load if they have
> implemented it. Ideas?
[jira] [Issue Comment Deleted] (BEAM-2857) Create FileIO in Python
[ https://issues.apache.org/jira/browse/BEAM-2857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Chen updated BEAM-2857:
---------------------------

Comment: was deleted

(was: I want to take a stab at this ticket. Can you please add me to the contributor list? Thanks)

> Create FileIO in Python
> -----------------------
>
> Key: BEAM-2857
> URL: https://issues.apache.org/jira/browse/BEAM-2857
> Project: Beam
> Issue Type: New Feature
> Components: sdk-py-core
> Reporter: Eugene Kirpichov
> Assignee: Pablo Estrada
> Priority: Major
> Labels: gsoc, gsoc2019, mentor, triaged
>
> Beam Java has a FileIO with operations: match()/matchAll(), readMatches(),
> which together cover the majority of needs for general-purpose file
> ingestion. Beam Python should have something similar.
> An early design document for this: https://s.apache.org/fileio-beam-python

--
This message was sent by Atlassian JIRA (v7.6.3#76005)