[jira] [Work logged] (BEAM-9035) BIP-1: Typed options for Row Schema and Fields

2020-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9035?focusedWorklogId=395830&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-395830
 ]

ASF GitHub Bot logged work on BEAM-9035:


Author: ASF GitHub Bot
Created on: 02/Mar/20 07:38
Start Date: 02/Mar/20 07:38
Worklog Time Spent: 10m 
  Work Description: alexvanboxel commented on issue #10413: [BEAM-9035] 
Typed options for Row Schema and Field
URL: https://github.com/apache/beam/pull/10413#issuecomment-593262817
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 395830)
Time Spent: 4h 10m  (was: 4h)

> BIP-1: Typed options for Row Schema and Fields
> --
>
> Key: BEAM-9035
> URL: https://issues.apache.org/jira/browse/BEAM-9035
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Alex Van Boxel
>Assignee: Alex Van Boxel
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> This is the first issue of a multipart commit: this ticket implements the 
> basic infrastructure of options on row and field.
> Full explanation:
> Introduce the concept of Options in Beam Schemas to add extra context to 
> fields and schemas. In contrast to metadata, options would be added to 
> fields, logical types and rows. Schema converters can use options to carry 
> over the options/annotations/decorators that were in the original schema; 
> this context can then be used in the rest of the pipeline for specific 
> transformations, or to augment the end schema in the target output.
> Examples of options are:
>  * informational: like the source of the data, ...
>  * drive decisions further in the pipeline: flatten a row into another, 
> rename a field, ...
>  * influence something in the output: like cluster index, primary key, ...
>  * logical type information
> An option is a key/typed-value combination. The advantages of typed 
> values are: 
>  * Strongly typed options would give Logical Types a *portable way* 
> to carry structured information that can be shared across languages.
>  * The type can be kept intact when mapping from formats that have 
> strongly typed options (example: Protobuf).
> This is part of a multi-ticket implementation. The following tickets are 
> related:
>  # Typed options for Row Schema and Fields
>  # Convert Proto Options to Beam Schema options
>  # Convert Avro extra information for Beam string options
>  # Replace meta data with Logical Type options
>  # Extract meta data in Calcite SQL to Beam options
>  # Extract meta data in Zeta SQL to Beam options
>  # Add java example of using option in a transform 
> This feature was discussed with Reuven Lax and Brian Hulette.
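
The "key/typed value" idea in the description above can be illustrated with a minimal, standalone sketch. This is not Beam's API: the `Option` and `Options` classes and the `TYPE_TAGS` table are hypothetical names chosen for illustration, and the set of type tags is an assumption. The sketch only shows why carrying a declared type next to each value lets a consumer validate and recover the type later.

```python
from dataclasses import dataclass
from typing import Any, Dict

# Hypothetical type tags for illustration; the ticket does not define this set.
TYPE_TAGS = {"STRING": str, "INT64": int, "BOOLEAN": bool, "DOUBLE": float}


@dataclass(frozen=True)
class Option:
    """One option: a name plus a value tagged with its declared type."""
    name: str
    type_tag: str
    value: Any

    def __post_init__(self):
        expected = TYPE_TAGS.get(self.type_tag)
        if expected is None:
            raise ValueError(f"unknown type tag: {self.type_tag}")
        # bool is a subclass of int in Python, so reject it explicitly for INT64.
        if expected is int and isinstance(self.value, bool):
            raise TypeError(f"{self.name}: expected INT64, got bool")
        if not isinstance(self.value, expected):
            raise TypeError(f"{self.name}: expected {self.type_tag}")


class Options:
    """A small name -> Option map, as could be attached to a field or a row."""

    def __init__(self):
        self._by_name: Dict[str, Option] = {}

    def set(self, name, type_tag, value):
        self._by_name[name] = Option(name, type_tag, value)
        return self  # allow chaining

    def get(self, name):
        return self._by_name[name].value

    def type_of(self, name):
        return self._by_name[name].type_tag
```

A converter could then attach, say, `Options().set("primary_key", "BOOLEAN", True)` to a field, and a later transform could read back both the value and its declared type.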



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9035) BIP-1: Typed options for Row Schema and Fields

2020-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9035?focusedWorklogId=395829&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-395829
 ]

ASF GitHub Bot logged work on BEAM-9035:


Author: ASF GitHub Bot
Created on: 02/Mar/20 07:38
Start Date: 02/Mar/20 07:38
Worklog Time Spent: 10m 
  Work Description: alexvanboxel commented on issue #10413: [BEAM-9035] 
Typed options for Row Schema and Field
URL: https://github.com/apache/beam/pull/10413#issuecomment-593262766
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 395829)
Time Spent: 4h  (was: 3h 50m)



[jira] [Work logged] (BEAM-9371) Implement SideInput load test in Java

2020-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9371?focusedWorklogId=395826&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-395826
 ]

ASF GitHub Bot logged work on BEAM-9371:


Author: ASF GitHub Bot
Created on: 02/Mar/20 07:26
Start Date: 02/Mar/20 07:26
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #10949: [BEAM-9371] Add 
SideInputLoadTest to Java SDK
URL: https://github.com/apache/beam/pull/10949#issuecomment-593258607
 
 
   Pinging again, @Ardagan 
 



Issue Time Tracking
---

Worklog Id: (was: 395826)
Time Spent: 40m  (was: 0.5h)

> Implement SideInput load test in Java
> -
>
> Key: BEAM-9371
> URL: https://issues.apache.org/jira/browse/BEAM-9371
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Michał Walenia
>Assignee: Michał Walenia
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Created] (BEAM-9418) Support ANY_VALUE aggregation functions

2020-03-01 Thread Rui Wang (Jira)
Rui Wang created BEAM-9418:
--

 Summary: Support ANY_VALUE aggregation functions
 Key: BEAM-9418
 URL: https://issues.apache.org/jira/browse/BEAM-9418
 Project: Beam
  Issue Type: Task
  Components: dsl-sql
Reporter: Rui Wang


Support the following functionality in BeamSQL:
{code:java}
"select t.key, ANY_VALUE(t.column) from t group by t.key";
{code}




Spec link: 
https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#any_value
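
As a rough illustration of the requested semantics (not of how BeamSQL would implement it), the sketch below groups `(key, value)` pairs and keeps one value per key. The function name `any_value_by_key` is hypothetical, and keeping the first value seen is just one valid choice: any value from the group would satisfy `ANY_VALUE`.

```python
def any_value_by_key(rows):
    """Return one arbitrary value per key from an iterable of (key, value) pairs."""
    result = {}
    for key, value in rows:
        # Keep the first value seen for each key; any other pick from the
        # same group would be an equally valid ANY_VALUE result.
        result.setdefault(key, value)
    return result


rows = [("a", 1), ("b", 2), ("a", 3)]
picked = any_value_by_key(rows)
```

Callers should not rely on which value of a group is returned, only that it comes from that group.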





[jira] [Commented] (BEAM-9412) Fix linkage errors in vendored calcite

2020-03-01 Thread Luke Cwik (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17048783#comment-17048783
 ] 

Luke Cwik commented on BEAM-9412:
-

Please make the logging dependencies runtime dependencies instead of shading 
them so that we get useful logs when integrated with SLF4J frontends.

> Fix linkage errors in vendored calcite
> --
>
> Key: BEAM-9412
> URL: https://issues.apache.org/jira/browse/BEAM-9412
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Luke Cwik
>Assignee: Kai Jiang
>Priority: Minor
>
> As of [https://github.com/apache/beam/pull/10559], the linkage errors are:
> {code:java}
> Class org.slf4j.LoggerFactory is not found;
>   referenced by 29 class files
> 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.pretty.SqlPrettyWriter
>  (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar)
> 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rules.AbstractMaterializedViewRule
>  (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar)
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.Benchmark 
> (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar)
> 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RexImplicationChecker
>  (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar)
> 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.VisitorDataContext
>  (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar)
> 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlDialectFactoryImpl
>  (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar)
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlDialect 
> (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar)
> 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.validate.SqlValidatorException
>  (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar)
> 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.runtime.ResultSetEnumerable
>  (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar)
> 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.trace.CalciteTrace
>  (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar)
> 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.runtime.CalciteException
>  (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar)
> 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.remote.AvaticaHttpClientFactoryImpl
>  (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar)
> 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.remote.RemoteProtobufService
>  (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar)
> 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.remote.KerberosConnection
>  (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar)
> 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.remote.AvaticaCommonsHttpClientImpl
>  (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar)
> 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.remote.ClientKeytabJaasConf
>  (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar)
> 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.remote.AvaticaCommonsHttpClientSpnegoImpl
>  (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar)
> 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.remote.Driver
>  (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar)
> 
> org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.avatica.remote.ProtobufTranslationImpl
>  (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar)
> 
> org.apache.beam.vendor.calcite.v1_20_0.com.jayway.jsonpath.internal.filter.FilterCompiler
>  (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar)
> 
> org.apache.beam.vendor.calcite.v1_20_0.com.jayway.jsonpath.internal.filter.RelationalExpressionNode
>  (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar)
> 
> org.apache.beam.vendor.calcite.v1_20_0.com.jayway.jsonpath.internal.filter.ValueNode
>  (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar)
> 
> org.apache.beam.vendor.calcite.v1_20_0.com.jayway.jsonpath.internal.JsonContext
>  (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar)
> 
> org.apache.beam.vendor.calcite.v1_20_0.com.jayway.jsonpath.internal.path.ArrayPathToken
>  (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar)
> 
> org.apache.beam.vendor.calcite.v1_20_0.com.jayway.jsonpath.internal.path.CompiledPath
>  (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar)
> 
> org.apache.beam.vendor.calcite.v1_20_0.com.jayway.jsonpath.internal.path.PredicateContextImpl
>  (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar)
> 
> org.apache.beam.vendor.calcite.v1_20_0.com.jayway.jsonpath.spi.json.JsonOrgJsonProvider
>  (beam-vendor-calcite-1_20_0-0.1-SNAPSHOT.jar)
> 
> org.apache.beam.vendor.calcite.v1_20_0.com.jayway.jsonpath.spi.mapper.GsonMappingProvider
>  

[jira] [Work logged] (BEAM-9342) Update bytebuddy to version 1.10.8

2020-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9342?focusedWorklogId=395765&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-395765
 ]

ASF GitHub Bot logged work on BEAM-9342:


Author: ASF GitHub Bot
Created on: 02/Mar/20 04:07
Start Date: 02/Mar/20 04:07
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #11009: [BEAM-9342] Update 
bytebuddy to version 1.10.8
URL: https://github.com/apache/beam/pull/11009#issuecomment-593209029
 
 
   Run Go PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 395765)
Time Spent: 3h 50m  (was: 3h 40m)

> Update bytebuddy to version 1.10.8
> --
>
> Key: BEAM-9342
> URL: https://issues.apache.org/jira/browse/BEAM-9342
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system, sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> The version of bytebuddy Beam is using does not officially support Java 11 
> and is more than 1 year old. We should upgrade to a more recent version so 
> we can benefit from the new improvements of the library as well as of ASM.





[jira] [Work logged] (BEAM-9342) Update bytebuddy to version 1.10.8

2020-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9342?focusedWorklogId=395764&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-395764
 ]

ASF GitHub Bot logged work on BEAM-9342:


Author: ASF GitHub Bot
Created on: 02/Mar/20 04:06
Start Date: 02/Mar/20 04:06
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #11009: [BEAM-9342] Update 
bytebuddy to version 1.10.8
URL: https://github.com/apache/beam/pull/11009#issuecomment-593209006
 
 
   Run Portable_Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 395764)
Time Spent: 3h 40m  (was: 3.5h)



[jira] [Work logged] (BEAM-9342) Update bytebuddy to version 1.10.8

2020-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9342?focusedWorklogId=395763&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-395763
 ]

ASF GitHub Bot logged work on BEAM-9342:


Author: ASF GitHub Bot
Created on: 02/Mar/20 04:06
Start Date: 02/Mar/20 04:06
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #11009: [BEAM-9342] Update 
bytebuddy to version 1.10.8
URL: https://github.com/apache/beam/pull/11009#issuecomment-593208979
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 395763)
Time Spent: 3.5h  (was: 3h 20m)



[jira] [Work logged] (BEAM-9288) Conscrypt shaded dependency

2020-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9288?focusedWorklogId=395760&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-395760
 ]

ASF GitHub Bot logged work on BEAM-9288:


Author: ASF GitHub Bot
Created on: 02/Mar/20 03:47
Start Date: 02/Mar/20 03:47
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10940: [BEAM-9288] 
Not bundle conscrypt in gRPC vendor
URL: https://github.com/apache/beam/pull/10940
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 395760)
Time Spent: 3h 20m  (was: 3h 10m)

> Conscrypt shaded dependency
> ---
>
> Key: BEAM-9288
> URL: https://issues.apache.org/jira/browse/BEAM-9288
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Esun Kim
>Assignee: sunjincheng
>Priority: Critical
> Fix For: 2.20.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Conscrypt is not designed to be shaded properly, mainly because of its .so 
> files. I happened to see BEAM-9030 (*1) creating a new vendored gRPC that 
> shades Conscrypt (*2) in it. I think this could cause a problem when a new 
> Conscrypt is brought in by a new gcsio depending on gRPC-alts (*4) in a 
> dependency chain (*5). In this case, there may be a conflict when finding 
> the proper .so files for Conscrypt. 
> *1: https://issues.apache.org/jira/browse/BEAM-9030
> *2:  
> [https://github.com/apache/beam/blob/e24d1e51cbabe27cb3cc381fd95b334db639c45d/buildSrc/src/main/groovy/org/apache/beam/gradle/GrpcVendoring_1_26_0.groovy#L78]
> *3: https://issues.apache.org/jira/browse/BEAM-6136
> *4: [https://mvnrepository.com/artifact/io.grpc/grpc-alts/1.27.0]
> *5: https://issues.apache.org/jira/browse/BEAM-8889
>  





[jira] [Work logged] (BEAM-8382) Add polling interval to KinesisIO.Read

2020-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8382?focusedWorklogId=395712&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-395712
 ]

ASF GitHub Bot logged work on BEAM-8382:


Author: ASF GitHub Bot
Created on: 02/Mar/20 02:09
Start Date: 02/Mar/20 02:09
Worklog Time Spent: 10m 
  Work Description: jfarr commented on issue #9765: [BEAM-8382] Add rate 
limit policy to KinesisIO.Read
URL: https://github.com/apache/beam/pull/9765#issuecomment-593186593
 
 
   Hi @aromanenko-dev as long as the `Supplier` param is OK then I 
think this PR is ready. What do you think?
 



Issue Time Tracking
---

Worklog Id: (was: 395712)
Time Spent: 14h 20m  (was: 14h 10m)

> Add polling interval to KinesisIO.Read
> --
>
> Key: BEAM-8382
> URL: https://issues.apache.org/jira/browse/BEAM-8382
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kinesis
>Affects Versions: 2.13.0, 2.14.0, 2.15.0
>Reporter: Jonothan Farr
>Assignee: Jonothan Farr
>Priority: Major
>  Time Spent: 14h 20m
>  Remaining Estimate: 0h
>
> With the current implementation we are observing Kinesis throttling due to 
> ReadProvisionedThroughputExceeded on the order of hundreds of times per 
> second, regardless of the actual Kinesis throughput. This is because the 
> ShardReadersPool readLoop() method is polling getRecords() as fast as 
> possible.
> From the KDS documentation:
> {quote}Each shard can support up to five read transactions per second.
> {quote}
> and
> {quote}For best results, sleep for at least 1 second (1,000 milliseconds) 
> between calls to getRecords to avoid exceeding the limit on getRecords 
> frequency.
> {quote}
> [https://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html]
> [https://docs.aws.amazon.com/streams/latest/dev/developing-consumers-with-sdk.html]





[jira] [Work logged] (BEAM-8382) Add polling interval to KinesisIO.Read

2020-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8382?focusedWorklogId=395685&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-395685
 ]

ASF GitHub Bot logged work on BEAM-8382:


Author: ASF GitHub Bot
Created on: 01/Mar/20 22:54
Start Date: 01/Mar/20 22:54
Worklog Time Spent: 10m 
  Work Description: jfarr commented on issue #9765: [WIP][BEAM-8382] Add 
rate limit policy to KinesisIO.Read
URL: https://github.com/apache/beam/pull/9765#issuecomment-593157293
 
 
   @aromanenko-dev if I wanted to make FixedDelayRateLimiter take a function as 
an argument, can I just declare it as `Supplier` or should I declare 
it some other way due to Beam's serializability requirements?
 



Issue Time Tracking
---

Worklog Id: (was: 395685)
Time Spent: 14h 10m  (was: 14h)



[jira] [Work logged] (BEAM-8382) Add polling interval to KinesisIO.Read

2020-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8382?focusedWorklogId=395684&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-395684
 ]

ASF GitHub Bot logged work on BEAM-8382:


Author: ASF GitHub Bot
Created on: 01/Mar/20 22:53
Start Date: 01/Mar/20 22:53
Worklog Time Spent: 10m 
  Work Description: jfarr commented on issue #9765: [WIP][BEAM-8382] Add 
rate limit policy to KinesisIO.Read
URL: https://github.com/apache/beam/pull/9765#issuecomment-593157293
 
 
   @aromanenko-dev if I wanted to make FixedDelayRateLimiter take a function as 
an argument, can I just declare it as `Supplier` or should I declare 
it some other way due to Beam's serializability requirements?
 



Issue Time Tracking
---

Worklog Id: (was: 395684)
Time Spent: 14h  (was: 13h 50m)



[jira] [Work logged] (BEAM-8382) Add polling interval to KinesisIO.Read

2020-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8382?focusedWorklogId=395683&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-395683
 ]

ASF GitHub Bot logged work on BEAM-8382:


Author: ASF GitHub Bot
Created on: 01/Mar/20 22:50
Start Date: 01/Mar/20 22:50
Worklog Time Spent: 10m 
  Work Description: jfarr commented on issue #9765: [WIP][BEAM-8382] Add 
rate limit policy to KinesisIO.Read
URL: https://github.com/apache/beam/pull/9765#issuecomment-593156655
 
 
   Hi @aromanenko-dev, sure we have tested it. Here's what read throttles look 
like with a 1s fixed delay:
   https://user-images.githubusercontent.com/1551631/75635518-a997ad80-5bcb-11ea-89df-a3bcb663adef.png
   
 



Issue Time Tracking
---

Worklog Id: (was: 395683)
Time Spent: 13h 50m  (was: 13h 40m)



[jira] [Resolved] (BEAM-8437) Consider using native doubles in standard_coders_test.yaml

2020-03-01 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette resolved BEAM-8437.
-
Fix Version/s: 2.20.0
 Assignee: Luke Cwik
   Resolution: Won't Fix

Sounds good, thanks for the explanation.

> Consider using native doubles in standard_coders_test.yaml
> --
>
> Key: BEAM-8437
> URL: https://issues.apache.org/jira/browse/BEAM-8437
> Project: Beam
>  Issue Type: Task
>  Components: testing
>Reporter: Brian Hulette
>Assignee: Luke Cwik
>Priority: Minor
> Fix For: 2.20.0
>
>
> Context: https://github.com/apache/beam/pull/9188#discussion_r332233044





[jira] [Work logged] (BEAM-9342) Update bytebuddy to version 1.10.8

2020-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9342?focusedWorklogId=395609&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-395609
 ]

ASF GitHub Bot logged work on BEAM-9342:


Author: ASF GitHub Bot
Created on: 01/Mar/20 14:17
Start Date: 01/Mar/20 14:17
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #11009: [BEAM-9342] 
Update bytebuddy to version 1.10.8
URL: https://github.com/apache/beam/pull/11009#discussion_r386081545
 
 

 ##
 File path: 
sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java
 ##
 @@ -514,9 +514,7 @@ public void proccessElement(ProcessContext c) {}
   SingleOutput, String> parDo = ParDo.of(fn);
 
   // Use the parDo in a pipeline to cause state coders to be inferred.
-  pipeline
-  .apply(Create.of(KV.of("input", "value")))
-  .apply(parDo);
+  pipeline.apply(Create.of(KV.of("input", "value"))).apply(parDo);
 
 Review comment:
   changes in this class are unrelated but auto applied by `spotlessApply`
 



Issue Time Tracking
---

Worklog Id: (was: 395609)
Time Spent: 3h 20m  (was: 3h 10m)

> Update bytebuddy to version 1.10.8
> --
>
> Key: BEAM-9342
> URL: https://issues.apache.org/jira/browse/BEAM-9342
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system, sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> The version of bytebuddy Beam is using does not officially support Java 11 
> and it is more than 1 year old. We should upgrade to a more recent version so 
> we can benefit of the new improvements of the library as well as of ASM.





[jira] [Created] (BEAM-9417) Unable to Read from BigQuery and File system in same pipeline

2020-03-01 Thread Deepak Verma (Jira)
Deepak Verma created BEAM-9417:
--

 Summary: Unable to Read from BigQuery and File system in same 
pipeline
 Key: BEAM-9417
 URL: https://issues.apache.org/jira/browse/BEAM-9417
 Project: Beam
  Issue Type: Bug
  Components: io-py-gcp
 Environment: MacBook Pro (macOS Catalina), Python 3.7, apache-beam[gcp]==2.19.0
Reporter: Deepak Verma


I am trying to read from BigQuery and the local file system in the same 
apache-beam[gcp] pipeline.
{code:java}
pipeline_options = PipelineOptions()
options = pipeline_options.view_as(PreProcessOptions)
options.view_as(SetupOptions).save_main_session = True

p = beam.Pipeline(options=options)

apn_query = "select * from `{bq_project}.config.apn` where customer='{customer}'"\
 .format(bq_project=options.bq_project, customer=options.customer)

file_path = "mycsv.csv.gz"

apn = p | beam.io.Read(beam.io.BigQuerySource(query=apn_query, use_standard_sql=True))
preprocess_rows = p | beam.io.ReadFromText(file_path, coder=UnicodeCoder())

{code}
 

When I run this job, I get the following error:

 
{code:java}
Traceback (most recent call last):
  File "/etl/dataflow/etlTXLPreprocessor.py", line 125, in <module>
    run()
  File "/etl/dataflow/etlTXLPreprocessor.py", line 120, in run
    p.run().wait_until_finish()
  File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", line 461, in run
    self._options).run(False)
  File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", line 474, in run
    return self.runner.run_pipeline(self, self._options)
  File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/runners/direct/direct_runner.py", line 182, in run_pipeline
    return runner.run_pipeline(pipeline, options)
  File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/runners/direct/direct_runner.py", line 413, in run_pipeline
    pipeline.replace_all(_get_transform_overrides(options))
  File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", line 443, in replace_all
    self._replace(override)
  File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", line 340, in _replace
    self.visit(TransformUpdater(self))
  File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", line 503, in visit
    self._root_transform().visit(visitor, self, visited)
  File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", line 939, in visit
    part.visit(visitor, pipeline, visited)
  File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", line 939, in visit
    part.visit(visitor, pipeline, visited)
  File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", line 939, in visit
    part.visit(visitor, pipeline, visited)
  File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", line 942, in visit
    visitor.visit_transform(self)
  File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", line 338, in visit_transform
    self._replace_if_needed(transform_node)
  File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/pipeline.py", line 301, in _replace_if_needed
    new_output = replacement_transform.expand(input_node)
  File "/etl/dataflow/venv3/lib/python3.7/site-packages/apache_beam/runners/direct/sdf_direct_runner.py", line 87, in expand
    invoker = DoFnInvoker.create_invoker(signature, process_invocation=False)
  File "apache_beam/runners/common.py", line 360, in apache_beam.runners.common.DoFnInvoker.create_invoker
TypeError: create_invoker() takes at least 2 positional arguments (1 given)
{code}
 

But if I run my code like this
{code:java}
 
pipeline_options = PipelineOptions()
options = pipeline_options.view_as(PreProcessOptions)
options.view_as(SetupOptions).save_main_session = True

p = beam.Pipeline(options=options)
file_path = "mycsv.csv.gz"
preprocess_rows = p | beam.io.ReadFromText(file_path, coder=UnicodeCoder())
{code}
 

or like this
{code:java}
 
pipeline_options = PipelineOptions()
options = pipeline_options.view_as(PreProcessOptions)
options.view_as(SetupOptions).save_main_session = True

p = beam.Pipeline(options=options)

apn_query = "select * from `{bq_project}.config.apn` where customer='{customer}'"\
 .format(bq_project=options.bq_project, customer=options.customer)


apn = p | beam.io.Read(beam.io.BigQuerySource(query=apn_query, use_standard_sql=True))
{code}
 

or even like this 
{code:java}
pipeline_options = PipelineOptions()
options = pipeline_options.view_as(PreProcessOptions)
options.view_as(SetupOptions).save_main_session = True

p = beam.Pipeline(options=options)

apn_query = "select * from `{bq_project}.config.apn` where customer='{customer}'"\
 .format(bq_project=options.bq_project, customer=options.customer)


apn = p | beam.io.Read(beam.io.BigQuerySource(query=apn_query, use_standard_sql=True))
apn = p |