[jira] [Work logged] (BEAM-9035) BIP-1: Typed options for Row Schema and Fields
[ https://issues.apache.org/jira/browse/BEAM-9035?focusedWorklogId=395830=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-395830 ] ASF GitHub Bot logged work on BEAM-9035: Author: ASF GitHub Bot Created on: 02/Mar/20 07:38 Start Date: 02/Mar/20 07:38 Worklog Time Spent: 10m Work Description: alexvanboxel commented on issue #10413: [BEAM-9035] Typed options for Row Schema and Field URL: https://github.com/apache/beam/pull/10413#issuecomment-593262817 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 395830) Time Spent: 4h 10m (was: 4h) > BIP-1: Typed options for Row Schema and Fields > -- > > Key: BEAM-9035 > URL: https://issues.apache.org/jira/browse/BEAM-9035 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Alex Van Boxel >Assignee: Alex Van Boxel >Priority: Major > Fix For: 2.19.0 > > Time Spent: 4h 10m > Remaining Estimate: 0h > > This is the first issue of a multipart commit: this ticket implements the > basic infrastructure of options on row and field. > Full explanation: > Introduce the concept of Options in Beam Schema’s to add extra context to > fields and schema. In contracts to metadata, options would be added to > fields, logical types and rows. In the options schema convertors can add > options/annotations/decorators that were in the original schema, this context > can be used in the rest of the pipeline for specific transformations or > augment the end schema in the target output. > Examples of options are: > * informational: like the source of the data, ... > * drive decisions further in the pipeline: flatten a row into another, > rename a field, ... > * influence something in the output: like cluster index, primary key, ... > * logical type information > And option is a key/typed value combination. The advantages of having the > value types is: > * Having strongly typed options would give a *portable way of Logical Types* > to have structured information that could be shared over different languages. > * This could keep the type intact when mapping from a formats that have > strongly typed options (example: Protobuf). > This is part of a multi ticket implementation. The following tickets are > related: > # Typed options for Row Schema and Fields > # Convert Proto Options to Beam Schema options > # Convert Avro extra information for Beam string options > # Replace meta data with Logical Type options > # Extract meta data in Calcite SQL to Beam options > # Extract meta data in Zeta SQL to Beam options > # Add java example of using option in a transform > This feature is discussed with Reuven Lax, Brian Hulette -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9035) BIP-1: Typed options for Row Schema and Fields
[ https://issues.apache.org/jira/browse/BEAM-9035?focusedWorklogId=395829=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-395829 ] ASF GitHub Bot logged work on BEAM-9035: Author: ASF GitHub Bot Created on: 02/Mar/20 07:38 Start Date: 02/Mar/20 07:38 Worklog Time Spent: 10m Work Description: alexvanboxel commented on issue #10413: [BEAM-9035] Typed options for Row Schema and Field URL: https://github.com/apache/beam/pull/10413#issuecomment-593262766 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 395829) Time Spent: 4h (was: 3h 50m) > BIP-1: Typed options for Row Schema and Fields > -- > > Key: BEAM-9035 > URL: https://issues.apache.org/jira/browse/BEAM-9035 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Alex Van Boxel >Assignee: Alex Van Boxel >Priority: Major > Fix For: 2.19.0 > > Time Spent: 4h > Remaining Estimate: 0h > > This is the first issue of a multipart commit: this ticket implements the > basic infrastructure of options on row and field. > Full explanation: > Introduce the concept of Options in Beam Schema’s to add extra context to > fields and schema. In contracts to metadata, options would be added to > fields, logical types and rows. In the options schema convertors can add > options/annotations/decorators that were in the original schema, this context > can be used in the rest of the pipeline for specific transformations or > augment the end schema in the target output. > Examples of options are: > * informational: like the source of the data, ... > * drive decisions further in the pipeline: flatten a row into another, > rename a field, ... > * influence something in the output: like cluster index, primary key, ... > * logical type information > And option is a key/typed value combination. The advantages of having the > value types is: > * Having strongly typed options would give a *portable way of Logical Types* > to have structured information that could be shared over different languages. > * This could keep the type intact when mapping from a formats that have > strongly typed options (example: Protobuf). > This is part of a multi ticket implementation. The following tickets are > related: > # Typed options for Row Schema and Fields > # Convert Proto Options to Beam Schema options > # Convert Avro extra information for Beam string options > # Replace meta data with Logical Type options > # Extract meta data in Calcite SQL to Beam options > # Extract meta data in Zeta SQL to Beam options > # Add java example of using option in a transform > This feature is discussed with Reuven Lax, Brian Hulette -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9371) Implement SideInput load test in Java
[ https://issues.apache.org/jira/browse/BEAM-9371?focusedWorklogId=395826=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-395826 ] ASF GitHub Bot logged work on BEAM-9371: Author: ASF GitHub Bot Created on: 02/Mar/20 07:26 Start Date: 02/Mar/20 07:26 Worklog Time Spent: 10m Work Description: mwalenia commented on issue #10949: [BEAM-9371] Add SideInputLoadTest to Java SDK URL: https://github.com/apache/beam/pull/10949#issuecomment-593258607 Pinging again, @Ardagan This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 395826) Time Spent: 40m (was: 0.5h) > Implement SideInput load test in Java > - > > Key: BEAM-9371 > URL: https://issues.apache.org/jira/browse/BEAM-9371 > Project: Beam > Issue Type: Improvement > Components: testing >Reporter: Michał Walenia >Assignee: Michał Walenia >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9418) Support ANY_VALUE aggregation functions
Rui Wang created BEAM-9418: -- Summary: Support ANY_VALUE aggregation functions Key: BEAM-9418 URL: https://issues.apache.org/jira/browse/BEAM-9418 Project: Beam Issue Type: Task Components: dsl-sql Reporter: Rui Wang Support the following functionality in BeamSQL: {code:java} "select t.key, ANY_VALUE(t.column) from t group by t.key"; {code} Spec link: https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#any_value -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9412) Fix linkage errors in vendored calcite
[ https://issues.apache.org/jira/browse/BEAM-9412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17048783#comment-17048783 ] Luke Cwik commented on BEAM-9412: - Please make the logging dependencies runtime dependencies instead of shading them so that we get useful logs when integrated with SLF4J frontends. > Fix linkage errors in vendored calcite > -- > > Key: BEAM-9412 > URL: https://issues.apache.org/jira/browse/BEAM-9412 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Luke Cwik >Assignee: Kai Jiang >Priority: Minor > > As of [https://github.com/apache/beam/pull/10559], the linkage errors are: > {code:java} > Class org.slf4j.LoggerFactory is not found; > referenced by 29 class files > > org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.pretty.SqlPrettyWriter > (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar) > > org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.AbstractMaterializedViewRule > (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar) > org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.Benchmark > (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar) > > org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RexImplicationChecker > (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar) > > org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.VisitorDataContext > (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar) > > org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlDialectFactoryImpl > (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar) > org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlDialect > (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar) > > org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.validate.SqlValidatorException > (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar) > > org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.runtime.ResultSetEnumerable > (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar) > > org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.trace.CalciteTrace > (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar) > > org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.runtime.CalciteException > (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar) > > org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.remote.AvaticaHttpClientFactoryImpl > (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar) > > org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.remote.RemoteProtobufService > (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar) > > org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.remote.KerberosConnection > (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar) > > org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.remote.AvaticaCommonsHttpClientImpl > (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar) > > org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.remote.ClientKeytabJaasConf > (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar) > > org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.remote.AvaticaCommonsHttpClientSpnegoImpl > (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar) > > org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.remote.Driver > (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar) > > org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.remote.ProtobufTranslationImpl > (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar) > > org.apache.beam.vendor.calcite.v1_20_0.com.jayway.jsonpath.internal.filter.FilterCompiler > (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar) > > org.apache.beam.vendor.calcite.v1_20_0.com.jayway.jsonpath.internal.filter.RelationalExpressionNode > (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar) > > org.apache.beam.vendor.calcite.v1_20_0.com.jayway.jsonpath.internal.filter.ValueNode > (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar) > > org.apache.beam.vendor.calcite.v1_20_0.com.jayway.jsonpath.internal.JsonContext > (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar) > > org.apache.beam.vendor.calcite.v1_20_0.com.jayway.jsonpath.internal.path.ArrayPathToken > (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar) > > org.apache.beam.vendor.calcite.v1_20_0.com.jayway.jsonpath.internal.path.CompiledPath > (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar) > > org.apache.beam.vendor.calcite.v1_20_0.com.jayway.jsonpath.internal.path.PredicateContextImpl > (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar) > > org.apache.beam.vendor.calcite.v1_20_0.com.jayway.jsonpath.spi.json.JsonOrgJsonProvider > (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar) > > org.apache.beam.vendor.calcite.v1_20_0.com.jayway.jsonpath.spi.mapper.GsonMappingProvider >
[jira] [Work logged] (BEAM-9342) Update bytebuddy to version 1.10.8
[ https://issues.apache.org/jira/browse/BEAM-9342?focusedWorklogId=395765=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-395765 ] ASF GitHub Bot logged work on BEAM-9342: Author: ASF GitHub Bot Created on: 02/Mar/20 04:07 Start Date: 02/Mar/20 04:07 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #11009: [BEAM-9342[ Update bytebuddy to version 1.10.8 URL: https://github.com/apache/beam/pull/11009#issuecomment-593209029 Run Go PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 395765) Time Spent: 3h 50m (was: 3h 40m) > Update bytebuddy to version 1.10.8 > -- > > Key: BEAM-9342 > URL: https://issues.apache.org/jira/browse/BEAM-9342 > Project: Beam > Issue Type: Improvement > Components: build-system, sdk-java-core >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Time Spent: 3h 50m > Remaining Estimate: 0h > > The version of bytebuddy Beam is using does not officially support Java 11 > and it is more than 1 year old. We should upgrade to a more recent version so > we can benefit of the new improvements of the library as well as of ASM. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9342) Update bytebuddy to version 1.10.8
[ https://issues.apache.org/jira/browse/BEAM-9342?focusedWorklogId=395764=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-395764 ] ASF GitHub Bot logged work on BEAM-9342: Author: ASF GitHub Bot Created on: 02/Mar/20 04:06 Start Date: 02/Mar/20 04:06 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #11009: [BEAM-9342[ Update bytebuddy to version 1.10.8 URL: https://github.com/apache/beam/pull/11009#issuecomment-593209006 Run Portable_Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 395764) Time Spent: 3h 40m (was: 3.5h) > Update bytebuddy to version 1.10.8 > -- > > Key: BEAM-9342 > URL: https://issues.apache.org/jira/browse/BEAM-9342 > Project: Beam > Issue Type: Improvement > Components: build-system, sdk-java-core >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Time Spent: 3h 40m > Remaining Estimate: 0h > > The version of bytebuddy Beam is using does not officially support Java 11 > and it is more than 1 year old. We should upgrade to a more recent version so > we can benefit of the new improvements of the library as well as of ASM. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9342) Update bytebuddy to version 1.10.8
[ https://issues.apache.org/jira/browse/BEAM-9342?focusedWorklogId=395763=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-395763 ] ASF GitHub Bot logged work on BEAM-9342: Author: ASF GitHub Bot Created on: 02/Mar/20 04:06 Start Date: 02/Mar/20 04:06 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #11009: [BEAM-9342[ Update bytebuddy to version 1.10.8 URL: https://github.com/apache/beam/pull/11009#issuecomment-593208979 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 395763) Time Spent: 3.5h (was: 3h 20m) > Update bytebuddy to version 1.10.8 > -- > > Key: BEAM-9342 > URL: https://issues.apache.org/jira/browse/BEAM-9342 > Project: Beam > Issue Type: Improvement > Components: build-system, sdk-java-core >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Time Spent: 3.5h > Remaining Estimate: 0h > > The version of bytebuddy Beam is using does not officially support Java 11 > and it is more than 1 year old. We should upgrade to a more recent version so > we can benefit of the new improvements of the library as well as of ASM. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9288) Conscrypt shaded dependency
[ https://issues.apache.org/jira/browse/BEAM-9288?focusedWorklogId=395760=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-395760 ] ASF GitHub Bot logged work on BEAM-9288: Author: ASF GitHub Bot Created on: 02/Mar/20 03:47 Start Date: 02/Mar/20 03:47 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10940: [BEAM-9288] Not bundle conscrypt in gRPC vendor URL: https://github.com/apache/beam/pull/10940 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 395760) Time Spent: 3h 20m (was: 3h 10m) > Conscrypt shaded dependency > --- > > Key: BEAM-9288 > URL: https://issues.apache.org/jira/browse/BEAM-9288 > Project: Beam > Issue Type: Bug > Components: build-system >Reporter: Esun Kim >Assignee: sunjincheng >Priority: Critical > Fix For: 2.20.0 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > Conscrypt is not designed to be shaded properly mainly because of so files. I > happened to see BEAM-9030 (*1) creating a new vendored gRPC shading Conscrypt > (*2) in it. I think this could make a problem when new Conscrypt is brought > by new gcsio depending on gRPC-alts (*4) in a dependency chain. (*5) In this > case, it may have a conflict when finding proper so files for Conscrypt. > *1: https://issues.apache.org/jira/browse/BEAM-9030 > *2: > [https://github.com/apache/beam/blob/e24d1e51cbabe27cb3cc381fd95b334db639c45d/buildSrc/src/main/groovy/org/apache/beam/gradle/GrpcVendoring_1_26_0.groovy#L78] > *3: https://issues.apache.org/jira/browse/BEAM-6136 > *4: [https://mvnrepository.com/artifact/io.grpc/grpc-alts/1.27.0] > *5: https://issues.apache.org/jira/browse/BEAM-8889 > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8382) Add polling interval to KinesisIO.Read
[ https://issues.apache.org/jira/browse/BEAM-8382?focusedWorklogId=395712=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-395712 ] ASF GitHub Bot logged work on BEAM-8382: Author: ASF GitHub Bot Created on: 02/Mar/20 02:09 Start Date: 02/Mar/20 02:09 Worklog Time Spent: 10m Work Description: jfarr commented on issue #9765: [BEAM-8382] Add rate limit policy to KinesisIO.Read URL: https://github.com/apache/beam/pull/9765#issuecomment-593186593 Hi @aromanenko-dev as long as the `Supplier` param is OK then I think this PR is ready. What do you think? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 395712) Time Spent: 14h 20m (was: 14h 10m) > Add polling interval to KinesisIO.Read > -- > > Key: BEAM-8382 > URL: https://issues.apache.org/jira/browse/BEAM-8382 > Project: Beam > Issue Type: Improvement > Components: io-java-kinesis >Affects Versions: 2.13.0, 2.14.0, 2.15.0 >Reporter: Jonothan Farr >Assignee: Jonothan Farr >Priority: Major > Time Spent: 14h 20m > Remaining Estimate: 0h > > With the current implementation we are observing Kinesis throttling due to > ReadProvisionedThroughputExceeded on the order of hundreds of times per > second, regardless of the actual Kinesis throughput. This is because the > ShardReadersPool readLoop() method is polling getRecords() as fast as > possible. > From the KDS documentation: > {quote}Each shard can support up to five read transactions per second. > {quote} > and > {quote}For best results, sleep for at least 1 second (1,000 milliseconds) > between calls to getRecords to avoid exceeding the limit on getRecords > frequency. > {quote} > [https://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html] > [https://docs.aws.amazon.com/streams/latest/dev/developing-consumers-with-sdk.html] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8382) Add polling interval to KinesisIO.Read
[ https://issues.apache.org/jira/browse/BEAM-8382?focusedWorklogId=395685=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-395685 ] ASF GitHub Bot logged work on BEAM-8382: Author: ASF GitHub Bot Created on: 01/Mar/20 22:54 Start Date: 01/Mar/20 22:54 Worklog Time Spent: 10m Work Description: jfarr commented on issue #9765: [WIP][BEAM-8382] Add rate limit policy to KinesisIO.Read URL: https://github.com/apache/beam/pull/9765#issuecomment-593157293 @aromanenko-dev if I wanted to make FixedDelayRateLimiter take a function as an argument, can I just declare it as `Supplier` or should I declare it some other way due to Beam's serializability requirements? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 395685) Time Spent: 14h 10m (was: 14h) > Add polling interval to KinesisIO.Read > -- > > Key: BEAM-8382 > URL: https://issues.apache.org/jira/browse/BEAM-8382 > Project: Beam > Issue Type: Improvement > Components: io-java-kinesis >Affects Versions: 2.13.0, 2.14.0, 2.15.0 >Reporter: Jonothan Farr >Assignee: Jonothan Farr >Priority: Major > Time Spent: 14h 10m > Remaining Estimate: 0h > > With the current implementation we are observing Kinesis throttling due to > ReadProvisionedThroughputExceeded on the order of hundreds of times per > second, regardless of the actual Kinesis throughput. This is because the > ShardReadersPool readLoop() method is polling getRecords() as fast as > possible. > From the KDS documentation: > {quote}Each shard can support up to five read transactions per second. > {quote} > and > {quote}For best results, sleep for at least 1 second (1,000 milliseconds) > between calls to getRecords to avoid exceeding the limit on getRecords > frequency. > {quote} > [https://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html] > [https://docs.aws.amazon.com/streams/latest/dev/developing-consumers-with-sdk.html] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8382) Add polling interval to KinesisIO.Read
[ https://issues.apache.org/jira/browse/BEAM-8382?focusedWorklogId=395684=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-395684 ] ASF GitHub Bot logged work on BEAM-8382: Author: ASF GitHub Bot Created on: 01/Mar/20 22:53 Start Date: 01/Mar/20 22:53 Worklog Time Spent: 10m Work Description: jfarr commented on issue #9765: [WIP][BEAM-8382] Add rate limit policy to KinesisIO.Read URL: https://github.com/apache/beam/pull/9765#issuecomment-593157293 @aromanenko-dev if I wanted to make FixedDelayRateLimiter take a function as an argument, can I just declare it as `Supplier` or should I declare it some other way due to Beam's serializability requirements? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 395684) Time Spent: 14h (was: 13h 50m) > Add polling interval to KinesisIO.Read > -- > > Key: BEAM-8382 > URL: https://issues.apache.org/jira/browse/BEAM-8382 > Project: Beam > Issue Type: Improvement > Components: io-java-kinesis >Affects Versions: 2.13.0, 2.14.0, 2.15.0 >Reporter: Jonothan Farr >Assignee: Jonothan Farr >Priority: Major > Time Spent: 14h > Remaining Estimate: 0h > > With the current implementation we are observing Kinesis throttling due to > ReadProvisionedThroughputExceeded on the order of hundreds of times per > second, regardless of the actual Kinesis throughput. This is because the > ShardReadersPool readLoop() method is polling getRecords() as fast as > possible. > From the KDS documentation: > {quote}Each shard can support up to five read transactions per second. > {quote} > and > {quote}For best results, sleep for at least 1 second (1,000 milliseconds) > between calls to getRecords to avoid exceeding the limit on getRecords > frequency. > {quote} > [https://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html] > [https://docs.aws.amazon.com/streams/latest/dev/developing-consumers-with-sdk.html] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8382) Add polling interval to KinesisIO.Read
[ https://issues.apache.org/jira/browse/BEAM-8382?focusedWorklogId=395683=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-395683 ] ASF GitHub Bot logged work on BEAM-8382: Author: ASF GitHub Bot Created on: 01/Mar/20 22:50 Start Date: 01/Mar/20 22:50 Worklog Time Spent: 10m Work Description: jfarr commented on issue #9765: [WIP][BEAM-8382] Add rate limit policy to KinesisIO.Read URL: https://github.com/apache/beam/pull/9765#issuecomment-593156655 Hi @aromanenko-dev, sure we have tested it. Here's what read throttles look like with a 1s fixed delay: https://user-images.githubusercontent.com/1551631/75635518-a997ad80-5bcb-11ea-89df-a3bcb663adef.png;> This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 395683) Time Spent: 13h 50m (was: 13h 40m) > Add polling interval to KinesisIO.Read > -- > > Key: BEAM-8382 > URL: https://issues.apache.org/jira/browse/BEAM-8382 > Project: Beam > Issue Type: Improvement > Components: io-java-kinesis >Affects Versions: 2.13.0, 2.14.0, 2.15.0 >Reporter: Jonothan Farr >Assignee: Jonothan Farr >Priority: Major > Time Spent: 13h 50m > Remaining Estimate: 0h > > With the current implementation we are observing Kinesis throttling due to > ReadProvisionedThroughputExceeded on the order of hundreds of times per > second, regardless of the actual Kinesis throughput. This is because the > ShardReadersPool readLoop() method is polling getRecords() as fast as > possible. > From the KDS documentation: > {quote}Each shard can support up to five read transactions per second. > {quote} > and > {quote}For best results, sleep for at least 1 second (1,000 milliseconds) > between calls to getRecords to avoid exceeding the limit on getRecords > frequency. > {quote} > [https://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html] > [https://docs.aws.amazon.com/streams/latest/dev/developing-consumers-with-sdk.html] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-8437) Consider using native doubles in standard_coders_test.yaml
[ https://issues.apache.org/jira/browse/BEAM-8437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Hulette resolved BEAM-8437. - Fix Version/s: 2.20.0 Assignee: Luke Cwik Resolution: Won't Fix Sounds good, thanks for the explanation. > Consider using native doubles in standard_coders_test.yaml > -- > > Key: BEAM-8437 > URL: https://issues.apache.org/jira/browse/BEAM-8437 > Project: Beam > Issue Type: Task > Components: testing >Reporter: Brian Hulette >Assignee: Luke Cwik >Priority: Minor > Fix For: 2.20.0 > > > Context: https://github.com/apache/beam/pull/9188#discussion_r332233044 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9342) Update bytebuddy to version 1.10.8
[ https://issues.apache.org/jira/browse/BEAM-9342?focusedWorklogId=395609=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-395609 ] ASF GitHub Bot logged work on BEAM-9342: Author: ASF GitHub Bot Created on: 01/Mar/20 14:17 Start Date: 01/Mar/20 14:17 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #11009: [BEAM-9342[ Update bytebuddy to version 1.10.8 URL: https://github.com/apache/beam/pull/11009#discussion_r386081545 ## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java ## @@ -514,9 +514,7 @@ public void proccessElement(ProcessContext c) {} SingleOutput, String> parDo = ParDo.of(fn); // Use the parDo in a pipeline to cause state coders to be inferred. - pipeline - .apply(Create.of(KV.of("input", "value"))) - .apply(parDo); + pipeline.apply(Create.of(KV.of("input", "value"))).apply(parDo); Review comment: changes in this class are unrelated but auto applied by `spotlessApply` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 395609) Time Spent: 3h 20m (was: 3h 10m) > Update bytebuddy to version 1.10.8 > -- > > Key: BEAM-9342 > URL: https://issues.apache.org/jira/browse/BEAM-9342 > Project: Beam > Issue Type: Improvement > Components: build-system, sdk-java-core >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Time Spent: 3h 20m > Remaining Estimate: 0h > > The version of bytebuddy Beam is using does not officially support Java 11 > and it is more than 1 year old. We should upgrade to a more recent version so > we can benefit of the new improvements of the library as well as of ASM. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9417) Unable to Read form BigQuery and File system in same pipeline
Deepak Verma created BEAM-9417: -- Summary: Unable to Read form BigQuery and File system in same pipeline Key: BEAM-9417 URL: https://issues.apache.org/jira/browse/BEAM-9417 Project: Beam Issue Type: Bug Components: io-py-gcp Environment: macbook pro cataline, python3.7, apache-beam[gcp]===2.19.0 Reporter: Deepak Verma I am trying to read from Bigquery and Local file system in my apache beam[gcp] pipeline. {code:java} pipeline_options = PipelineOptions() options = pipeline_options.view_as(PreProcessOptions) options.view_as(SetupOptions).save_main_session = True p = beam.Pipeline(options=options) apn_query = "select * from `{bq_project}.config.apn` where customer='{customer}'"\ .format(bq_project=options.bq_project, customer=options.customer) file_path = "mycsv.csv.gz" apn = p | beam.io.Read(beam.io.BigQuerySource(query=apn_query, use_standard_sql=True)) preprocess_rows = p | beam.io.ReadFromText(file_path, coder=UnicodeCoder()) {code} When I am running this job, I am getting below error {code:java} Traceback (most recent call last): File "/etl/dataflow/etlTXLPreprocessor.py", line 125, in run() File "/etl/dataflow/etlTXLPreprocessor.py", line 120, in run p.run().wait_until_finish() File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", line 461, in run self._options).run(False) File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", line 474, in run return self.runner.run_pipeline(self, self._options) File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/runners/direct/direct_runner.py", line 182, in run_pipeline return runner.run_pipeline(pipeline, options) File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/runners/direct/direct_runner.py", line 413, in run_pipeline pipeline.replace_all(_get_transform_overrides(options)) File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", line 443, in replace_all self._replace(override) File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", line 340, in _replace self.visit(TransformUpdater(self)) File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", line 503, in visit self._root_transform().visit(visitor, self, visited) File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", line 939, in visit part.visit(visitor, pipeline, visited) File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", line 939, in visit part.visit(visitor, pipeline, visited) File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", line 939, in visit part.visit(visitor, pipeline, visited) File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", line 942, in visit visitor.visit_transform(self) File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", line 338, in visit_transform self._replace_if_needed(transform_node) File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", line 301, in _replace_if_needed new_output = replacement_transform.expand(input_node) File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/runners/direct/sdf_direct_runner.py", line 87, in expand invoker = DoFnInvoker.create_invoker(signature, process_invocation=False) File "apache_beam/runners/common.py", line 360, in apache_beam.runners.common.DoFnInvoker.create_invoker TypeError: create_invoker() takes at least 2 positional arguments (1 given){code} But If I run my code like this {code:java} pipeline_options = PipelineOptions() options = pipeline_options.view_as(PreProcessOptions) options.view_as(SetupOptions).save_main_session = True p = beam.Pipeline(options=options) file_path = "mycsv.csv.gz" preprocess_rows = p | beam.io.ReadFromText(file_path, coder=UnicodeCoder()) {code} or like this {code:java} pipeline_options = PipelineOptions() options = pipeline_options.view_as(PreProcessOptions) options.view_as(SetupOptions).save_main_session = True p = beam.Pipeline(options=options) apn_query = "select * from `{bq_project}.config.apn` where customer='{customer}'"\ .format(bq_project=options.bq_project, customer=options.customer) apn = p | beam.io.Read(beam.io.BigQuerySource(query=apn_query, use_standard_sql=True)) {code} or even like this {code:java} pipeline_options = PipelineOptions() options = pipeline_options.view_as(PreProcessOptions) options.view_as(SetupOptions).save_main_session = True p = beam.Pipeline(options=options) apn_query = "select * from `{bq_project}.config.apn` where customer='{customer}'"\ .format(bq_project=options.bq_project, customer=options.customer) apn = p | beam.io.Read(beam.io.BigQuerySource(query=apn_query, use_standard_sql=True)) apn = p |