[jira] [Work started] (BEAM-14101) [CdapIO] Design and implement Spark Receiver Builder

2022-06-07 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-14101 started by Elizaveta Lomteva.

> [CdapIO] Design and implement Spark Receiver Builder
> 
>
> Key: BEAM-14101
> URL: https://issues.apache.org/jira/browse/BEAM-14101
> Project: Beam
>  Issue Type: Task
>  Components: io-java-cdap
>Reporter: Elizaveta Lomteva
>Assignee: Elizaveta Lomteva
>Priority: P2
>  Labels: cdap-io-sprint-4
>  Time Spent: 12h 50m
>  Remaining Estimate: 0h
>
> h3. Context:
> CDAP plugins that support streaming sources include Receiver classes (ex. 
> [HubSpotReceiver|https://github.com/data-integrations/hubspot/blob/develop/src/main/java/io/cdap/plugin/hubspot/source/streaming/HubspotReceiver.java])
>  that extend the {{org.apache.spark.streaming.receiver.Receiver}} abstract class. 
> Receiver classes are used by plugin Streaming Utils classes (ex. 
> [HubSpotStreamingUtils|https://github.com/data-integrations/hubspot/blob/develop/src/main/java/io/cdap/plugin/hubspot/source/streaming/HubspotStreamingUtil.java])
>  to provide the {{getStream()}} method to Streaming Source classes (ex. 
> [HubSpotStreamingSource|https://github.com/data-integrations/hubspot/blob/develop/src/main/java/io/cdap/plugin/hubspot/source/streaming/HubspotStreamingSource.java]),
>  and they are usually placed in the {{cdap/plugin/<plugin name>/source/streaming}} 
> folder (ex. [HubSpot plugin 
> repo|https://github.com/data-integrations/hubspot/tree/develop/src/main/java/io/cdap/plugin/hubspot/source/streaming]).
> Reference information:
>  * [Streaming plug-ins 
> integration|https://docs.google.com/document/d/1T-bhd0Qk7DBePIfgHEPagYiA1oLP4z5kYEd0S1SOGxQ/edit#heading=h.o88i6p9b13o9]
>  (Apache CDAP Connection Design Doc)
>  * [Plugin integration process 
> description|https://docs.google.com/document/d/1T-bhd0Qk7DBePIfgHEPagYiA1oLP4z5kYEd0S1SOGxQ/edit#heading=h.1h6udb1b52xc]
>  (Apache CDAP Connection Design Doc)
>  * [Streaming wrapper 
> design|https://docs.google.com/document/d/1T-bhd0Qk7DBePIfgHEPagYiA1oLP4z5kYEd0S1SOGxQ/edit#heading=h.fcafz0ydsso1]
>  (Apache CDAP Connection Design Doc)
> h3. Task Description:
> It is required to design a builder class for custom Spark receivers so that 
> such receivers can be used in an Apache Beam connector via the SparkReceiverIO 
> interface (used by CdapIO as a dependency).
> h3. Acceptance criteria:
> A design for the builder class(es) that will create custom Spark receivers in 
> Apache Beam connectors ({{SparkReceiverIO}}).
> h4. Note:
> This builder class must be independent of CDAP receivers so that it can be 
> used with any other custom Spark receiver as part of SparkReceiverIO.
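> A minimal sketch of one possible shape for such a builder (all names here are 
> illustrative, not a final API): it stores only the receiver class and its 
> constructor arguments, both serializable, and instantiates the {{Receiver}} 
> reflectively, so nothing in it depends on CDAP.
> {code:java}
> import java.io.Serializable;
> import java.lang.reflect.Constructor;
> import org.apache.spark.streaming.receiver.Receiver;
> 
> // Hypothetical sketch, not the final SparkReceiverIO API: holds only the
> // receiver class and constructor args and builds the Receiver reflectively
> // where it is needed.
> public class ReceiverBuilder<T> implements Serializable {
> 
>   private final Class<? extends Receiver<T>> receiverClass;
>   private Object[] constructorArgs = new Object[0];
> 
>   public ReceiverBuilder(Class<? extends Receiver<T>> receiverClass) {
>     this.receiverClass = receiverClass;
>   }
> 
>   public ReceiverBuilder<T> withConstructorArgs(Object... args) {
>     this.constructorArgs = args;
>     return this;
>   }
> 
>   public Receiver<T> build() throws ReflectiveOperationException {
>     // Naive matching by arity only; a real implementation would also
>     // check parameter types.
>     for (Constructor<?> c : receiverClass.getDeclaredConstructors()) {
>       if (c.getParameterCount() == constructorArgs.length) {
>         c.setAccessible(true);
>         @SuppressWarnings("unchecked")
>         Receiver<T> receiver = (Receiver<T>) c.newInstance(constructorArgs);
>         return receiver;
>       }
>     }
>     throw new IllegalStateException("No matching constructor in " + receiverClass);
>   }
> }
> {code}
> Usage could then look like {{new 
> ReceiverBuilder<>(HubspotReceiver.class).withConstructorArgs(config).build()}}, 
> with no CDAP-specific types in the builder itself.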



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (BEAM-12918) TPC-DS: Add Jenkins jobs

2022-06-02 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-12918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-12918:

Summary: TPC-DS: Add Jenkins jobs  (was: TPC-DS: add Jenkins jobs)

> TPC-DS: Add Jenkins jobs
> 
>
> Key: BEAM-12918
> URL: https://issues.apache.org/jira/browse/BEAM-12918
> Project: Beam
>  Issue Type: Test
>  Components: testing-tpcds
>Reporter: Alexey Romanenko
>Assignee: Alexey Romanenko
>Priority: P2
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (BEAM-14548) Jenkins job beam_SeedJob keeps failing

2022-06-02 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-14548:

Status: Open  (was: Triage Needed)

> Jenkins job beam_SeedJob keeps failing
> --
>
> Key: BEAM-14548
> URL: https://issues.apache.org/jira/browse/BEAM-14548
> Project: Beam
>  Issue Type: Bug
>  Components: infrastructure, testing
>Reporter: Alexey Romanenko
>Priority: P1
>  Labels: jenkins
>
> Jenkins job [beam_SeedJob|https://ci-beam.apache.org/job/beam_SeedJob/] keeps 
> failing since May 19th, the date of the [last successful 
> build|https://ci-beam.apache.org/job/beam_SeedJob/lastSuccessfulBuild/].
> The first failed job is [https://ci-beam.apache.org/job/beam_SeedJob/9696/]
> It fails with this error (which does not say much):
>  
> {code}
> Processing DSL script .test-infra/jenkins/job_00_seed.groovy
> Processing DSL script .test-infra/jenkins/job_CancelStaleDataflowJobs.groovy
> Processing DSL script .test-infra/jenkins/job_CleanUpPrebuiltSDKImages.groovy
> Processing DSL script .test-infra/jenkins/job_Dependency_Check.groovy
> ERROR: java.io.IOException: Failed to persist config.xml
> {code} 
> This blocks the testing and development process for adding new Jenkins jobs 
> via DSL.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (BEAM-14548) Jenkins job beam_SeedJob keeps failing

2022-06-02 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-14548:

Description: 
Jenkins job [beam_SeedJob|https://ci-beam.apache.org/job/beam_SeedJob/] keeps 
failing since May 19th, the date of the [last successful 
build|https://ci-beam.apache.org/job/beam_SeedJob/lastSuccessfulBuild/].

The first failed job is [https://ci-beam.apache.org/job/beam_SeedJob/9696/]
It fails with this error (which does not say much):
 
{code}
Processing DSL script .test-infra/jenkins/job_00_seed.groovy
Processing DSL script .test-infra/jenkins/job_CancelStaleDataflowJobs.groovy
Processing DSL script .test-infra/jenkins/job_CleanUpPrebuiltSDKImages.groovy
Processing DSL script .test-infra/jenkins/job_Dependency_Check.groovy
ERROR: java.io.IOException: Failed to persist config.xml
{code} 

This blocks the testing and development process for adding new Jenkins jobs via DSL.

  was:
Jenkins job [beam_SeedJob|https://ci-beam.apache.org/job/beam_SeedJob/] keeps 
failing starting from May 19th, the [last successful 
build|[https://ci-beam.apache.org/job/beam_SeedJob/lastSuccessfulBuild/].]

The first failed job is [https://ci-beam.apache.org/job/beam_SeedJob/9696/]
It fails with this error (that says not so much):
 
Processing DSL script .test-infra/jenkins/job_00_seed.groovy
Processing DSL script .test-infra/jenkins/job_CancelStaleDataflowJobs.groovy
Processing DSL script .test-infra/jenkins/job_CleanUpPrebuiltSDKImages.groovy
Processing DSL script .test-infra/jenkins/job_Dependency_Check.groovy
ERROR: java.io.IOException: Failed to persist config.xml
 

It blocs a testing and development process to add the new Jenkins jobs with DSL


> Jenkins job beam_SeedJob keeps failing
> --
>
> Key: BEAM-14548
> URL: https://issues.apache.org/jira/browse/BEAM-14548
> Project: Beam
>  Issue Type: Bug
>  Components: infrastructure, testing
>Reporter: Alexey Romanenko
>Priority: P1
>
> Jenkins job [beam_SeedJob|https://ci-beam.apache.org/job/beam_SeedJob/] keeps 
> failing since May 19th, the date of the [last successful 
> build|https://ci-beam.apache.org/job/beam_SeedJob/lastSuccessfulBuild/].
> The first failed job is [https://ci-beam.apache.org/job/beam_SeedJob/9696/]
> It fails with this error (which does not say much):
>  
> {code}
> Processing DSL script .test-infra/jenkins/job_00_seed.groovy
> Processing DSL script .test-infra/jenkins/job_CancelStaleDataflowJobs.groovy
> Processing DSL script .test-infra/jenkins/job_CleanUpPrebuiltSDKImages.groovy
> Processing DSL script .test-infra/jenkins/job_Dependency_Check.groovy
> ERROR: java.io.IOException: Failed to persist config.xml
> {code} 
> This blocks the testing and development process for adding new Jenkins jobs 
> via DSL.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (BEAM-14548) Jenkins job beam_SeedJob keeps failing

2022-06-02 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-14548:

Labels: jenkins  (was: )

> Jenkins job beam_SeedJob keeps failing
> --
>
> Key: BEAM-14548
> URL: https://issues.apache.org/jira/browse/BEAM-14548
> Project: Beam
>  Issue Type: Bug
>  Components: infrastructure, testing
>Reporter: Alexey Romanenko
>Priority: P1
>  Labels: jenkins
>
> Jenkins job [beam_SeedJob|https://ci-beam.apache.org/job/beam_SeedJob/] keeps 
> failing since May 19th, the date of the [last successful 
> build|https://ci-beam.apache.org/job/beam_SeedJob/lastSuccessfulBuild/].
> The first failed job is [https://ci-beam.apache.org/job/beam_SeedJob/9696/]
> It fails with this error (which does not say much):
>  
> {code}
> Processing DSL script .test-infra/jenkins/job_00_seed.groovy
> Processing DSL script .test-infra/jenkins/job_CancelStaleDataflowJobs.groovy
> Processing DSL script .test-infra/jenkins/job_CleanUpPrebuiltSDKImages.groovy
> Processing DSL script .test-infra/jenkins/job_Dependency_Check.groovy
> ERROR: java.io.IOException: Failed to persist config.xml
> {code} 
> This blocks the testing and development process for adding new Jenkins jobs 
> via DSL.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (BEAM-14548) Jenkins job beam_SeedJob keeps failing

2022-06-02 Thread Alexey Romanenko (Jira)
Alexey Romanenko created BEAM-14548:
---

 Summary: Jenkins job beam_SeedJob keeps failing
 Key: BEAM-14548
 URL: https://issues.apache.org/jira/browse/BEAM-14548
 Project: Beam
  Issue Type: Bug
  Components: infrastructure, testing
Reporter: Alexey Romanenko


Jenkins job [beam_SeedJob|https://ci-beam.apache.org/job/beam_SeedJob/] keeps 
failing since May 19th, the date of the [last successful 
build|https://ci-beam.apache.org/job/beam_SeedJob/lastSuccessfulBuild/].

The first failed job is [https://ci-beam.apache.org/job/beam_SeedJob/9696/]
It fails with this error (which does not say much):
 
Processing DSL script .test-infra/jenkins/job_00_seed.groovy
Processing DSL script .test-infra/jenkins/job_CancelStaleDataflowJobs.groovy
Processing DSL script .test-infra/jenkins/job_CleanUpPrebuiltSDKImages.groovy
Processing DSL script .test-infra/jenkins/job_Dependency_Check.groovy
ERROR: java.io.IOException: Failed to persist config.xml
 

This blocks the testing and development process for adding new Jenkins jobs via DSL.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (BEAM-12406) Progressing watermark for not available Kinesis stream

2022-06-02 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-12406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-12406:

Status: Open  (was: Triage Needed)

> Progressing watermark for not available Kinesis stream
> --
>
> Key: BEAM-12406
> URL: https://issues.apache.org/jira/browse/BEAM-12406
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kinesis
>Affects Versions: 2.27.0
>Reporter: Mateusz Rataj
>Priority: P3
>
> We use Dataflow with Apache Beam to read events from Kinesis streams. 
> Recently, we spotted that when one of the streams became unavailable in the 
> middle of event processing (due to removal or a problem with the credentials), 
> the data watermark for this stream was still being updated. 
>  
> Imagine the following scenario:
>  # Permissions allow reading from stream A
>  # Data is read from stream A
>  # Permissions are changed and no longer allow reading from stream A
>  # The watermark for stream A keeps progressing (but stream data is not read 
> due to the permissions issue)
>  # Permissions are fixed to allow reading from stream A again
>  # Data is read from stream A, but starting from the updated watermark
> As a result, stream data between steps 3-5 is lost and the client doesn't 
> know it.
> Additionally, it may be confusing from the Dataflow console perspective, as 
> it suggests that events are still being read from the stream. It is also hard 
> to rely on the watermark as a source metric for alerting purposes.
> A brief investigation suggests that the _KinesisReader.getWatermark()_ logic 
> may not consider the state of the stream (i.e., whether it is available or 
> not) and treats a removed stream as a stream without traffic. The watermark 
> calculation should be adjusted to take that information into account.
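> To illustrate the suggested adjustment, a hedged sketch (class and method 
> names here are hypothetical, not the actual KinesisReader code):
> {code:java}
> import org.joda.time.Instant;
> 
> // Hypothetical sketch: the watermark only advances while the last read
> // attempt is known to have succeeded, so an unreadable stream holds it
> // back instead of looking like an idle stream without traffic.
> class StreamAwareWatermark {
>   private Instant current = new Instant(0);
> 
>   Instant update(boolean lastReadSucceeded, Instant lastRecordTimestamp) {
>     if (lastReadSucceeded && lastRecordTimestamp.isAfter(current)) {
>       current = lastRecordTimestamp;
>     }
>     // If the last read failed (removed stream, credentials issue, ...),
>     // keep the watermark where it is so no data range is silently skipped.
>     return current;
>   }
> }
> {code}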



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (BEAM-14542) TPC-DS: Add user documentation on website

2022-06-02 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-14542:

Summary: TPC-DS: Add user documentation on website  (was: TPC-DS: add 
documentation on website)

> TPC-DS: Add user documentation on website
> -
>
> Key: BEAM-14542
> URL: https://issues.apache.org/jira/browse/BEAM-14542
> Project: Beam
>  Issue Type: Improvement
>  Components: testing-tpcds, website
>Reporter: Alexey Romanenko
>Priority: P2
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (BEAM-14542) TPC-DS: add documentation on website

2022-06-01 Thread Alexey Romanenko (Jira)
Alexey Romanenko created BEAM-14542:
---

 Summary: TPC-DS: add documentation on website
 Key: BEAM-14542
 URL: https://issues.apache.org/jira/browse/BEAM-14542
 Project: Beam
  Issue Type: Improvement
  Components: testing-tpcds, website
Reporter: Alexey Romanenko






--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (BEAM-14377) Make it easier to run seed job against open PRs

2022-06-01 Thread Alexey Romanenko (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-14377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17544883#comment-17544883
 ] 

Alexey Romanenko commented on BEAM-14377:
-

Is the seed job still broken, or is it broken again? The latest successful 
build was on May 18th.

> Make it easier to run seed job against open PRs
> ---
>
> Key: BEAM-14377
> URL: https://issues.apache.org/jira/browse/BEAM-14377
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Valentyn Tymofieiev
>Assignee: Pablo Estrada
>Priority: P2
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (BEAM-3549) Move KinesisIO into the amazon-web-services module

2022-05-30 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-3549:
---
Fix Version/s: Not applicable
   Resolution: Won't Do
   Status: Resolved  (was: Open)

There is no need to do this since KinesisIO was fully reimplemented with AWS SDK 
v2 under the {{amazon-web-services2}} package.

> Move KinesisIO into the amazon-web-services module
> --
>
> Key: BEAM-3549
> URL: https://issues.apache.org/jira/browse/BEAM-3549
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kinesis
>Reporter: Ismaël Mejía
>Priority: P3
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (BEAM-14418) [Website] Add Arrows to Carousel on Home page

2022-05-20 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-14418:

Status: Open  (was: Triage Needed)

> [Website] Add Arrows to Carousel on Home page
> -
>
> Key: BEAM-14418
> URL: https://issues.apache.org/jira/browse/BEAM-14418
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Daria Bezkorovaina
>Priority: P2
>  Labels: website
> Attachments: image-2022-05-04-21-55-31-526.png
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Add arrows to the sides of Carousel: 
> Figma: 
> [https://www.figma.com/file/gzeYx965q4DwV6ugDzp0iG/Apache-Beam-Playground?node-id=1977%3A7857]
>  
>  
> !image-2022-05-04-21-55-31-526.png|width=825,height=394!



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (BEAM-14479) Create SolaceIO

2022-05-18 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-14479:

Status: Open  (was: Triage Needed)

> Create SolaceIO
> ---
>
> Key: BEAM-14479
> URL: https://issues.apache.org/jira/browse/BEAM-14479
> Project: Beam
>  Issue Type: New Feature
>  Components: io-ideas
>Affects Versions: Not applicable
>Reporter: Dennis Brinley
>Priority: P2
>
> Solace would like to contribute SolaceIO to the Apache Beam project. SolaceIO 
> will fall into the Messaging I/O category. It will provide connectivity to 
> the Solace PubSub+ Event broker. The initial feature will be for Java.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (BEAM-14479) Create SolaceIO

2022-05-18 Thread Alexey Romanenko (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-14479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17538922#comment-17538922
 ] 

Alexey Romanenko commented on BEAM-14479:
-

Thanks, sounds interesting!
 * All 3rd-party license details: [https://www.apache.org/legal/resolved.html] 
Feel free to ask if you have specific questions.
 * Just fork the Beam repo, create your own branch from master, and then create 
a PR for review. More details: [https://beam.apache.org/contribute/]
 * The more, the better; but usually it's required to document all public API 
classes/methods and provide usage examples for the classes that users are 
supposed to use in their code. Feel free to take any of the existing Java IOs 
as an example.

Additionally:
 * Please use the SplittableDoFn API for the read part (a minimal skeleton is 
sketched after this list).
 * Don't hesitate to share your intent and ask more questions (if any) on 
d...@beam.apache.org
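
For reference, a minimal splittable read DoFn skeleton (the queue/message names 
are placeholders, not a real Solace API):
{code:java}
import org.apache.beam.sdk.io.range.OffsetRange;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker;

// Minimal SplittableDoFn skeleton: the element is a queue name and the
// output is a placeholder string (a real SolaceIO would fetch from the
// broker). OffsetRange has a default tracker, so no @NewTracker is needed.
class ReadFromQueueFn extends DoFn<String, String> {

  @GetInitialRestriction
  public OffsetRange getInitialRestriction(@Element String queueName) {
    // Illustrative fixed range; a real reader would size or grow this dynamically.
    return new OffsetRange(0, 1000);
  }

  @ProcessElement
  public void processElement(
      @Element String queueName,
      RestrictionTracker<OffsetRange, Long> tracker,
      OutputReceiver<String> out) {
    for (long i = tracker.currentRestriction().getFrom(); tracker.tryClaim(i); i++) {
      out.output("message-" + i + " from " + queueName); // placeholder for a broker fetch
    }
  }
}
{code}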

> Create SolaceIO
> ---
>
> Key: BEAM-14479
> URL: https://issues.apache.org/jira/browse/BEAM-14479
> Project: Beam
>  Issue Type: New Feature
>  Components: io-ideas
>Affects Versions: Not applicable
>Reporter: Dennis Brinley
>Priority: P2
>
> Solace would like to contribute SolaceIO to the Apache Beam project. SolaceIO 
> will fall into the Messaging I/O category. It will provide connectivity to 
> the Solace PubSub+ Event broker. The initial feature will be for Java.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Assigned] (BEAM-13049) TPC-DS: Add Grafana dashboard

2022-05-10 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko reassigned BEAM-13049:
---

Assignee: Alexey Romanenko

> TPC-DS: Add Grafana dashboard
> -
>
> Key: BEAM-13049
> URL: https://issues.apache.org/jira/browse/BEAM-13049
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing-tpcds
>Reporter: Alexey Romanenko
>Assignee: Alexey Romanenko
>Priority: P2
>  Labels: benchmark
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work started] (BEAM-12918) TPC-DS: add Jenkins jobs

2022-04-27 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-12918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-12918 started by null.
---
> TPC-DS: add Jenkins jobs
> 
>
> Key: BEAM-12918
> URL: https://issues.apache.org/jira/browse/BEAM-12918
> Project: Beam
>  Issue Type: Test
>  Components: testing-tpcds
>Reporter: Alexey Romanenko
>Priority: P2
>  Time Spent: 6h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Assigned] (BEAM-12918) TPC-DS: add Jenkins jobs

2022-04-27 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-12918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko reassigned BEAM-12918:
---

Assignee: Alexey Romanenko

> TPC-DS: add Jenkins jobs
> 
>
> Key: BEAM-12918
> URL: https://issues.apache.org/jira/browse/BEAM-12918
> Project: Beam
>  Issue Type: Test
>  Components: testing-tpcds
>Reporter: Alexey Romanenko
>Assignee: Alexey Romanenko
>Priority: P2
>  Time Spent: 6h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Assigned] (BEAM-14363) Kinesis WatermarkParameters builder clobbers values

2022-04-27 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko reassigned BEAM-14363:
---

Assignee: Nick Caballero

> Kinesis WatermarkParameters builder clobbers values
> ---
>
> Key: BEAM-14363
> URL: https://issues.apache.org/jira/browse/BEAM-14363
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kinesis
>Affects Versions: 2.38.0
>Reporter: Nick Caballero
>Assignee: Nick Caballero
>Priority: P2
>  Labels: aws-sdk-v1, aws-sdk-v2
> Fix For: 2.39.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The `WatermarkParameters` builder for the Kinesis watermark policy doesn't 
> retain previously set values, which means you can only set either 
> `timestampFn` or `idleDurationThreshold`, but not both.
>  
> The issue is that the exposed methods call `builder()` instead of 
> `toBuilder()`, losing the previous values.
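> A plain-Java sketch of this pattern (field names mirror the report, but this 
> is not the actual Beam source):
> {code:java}
> import java.time.Duration;
> import java.util.function.Function;
> 
> // Each with*() method returns a new immutable instance. The buggy variant
> // rebuilds from defaults (like calling builder()), silently clobbering the
> // other field; the fixed variant copies the current instance (like toBuilder()).
> final class WatermarkParams {
>   final Duration idleDurationThreshold;
>   final Function<Long, Long> timestampFn;
> 
>   WatermarkParams(Duration idleDurationThreshold, Function<Long, Long> timestampFn) {
>     this.idleDurationThreshold = idleDurationThreshold;
>     this.timestampFn = timestampFn;
>   }
> 
>   static WatermarkParams defaults() {
>     return new WatermarkParams(Duration.ofMinutes(2), ts -> ts);
>   }
> 
>   // BUGGY: starts from defaults, so a previously set timestampFn is lost.
>   WatermarkParams withIdleDurationThresholdBuggy(Duration idle) {
>     return new WatermarkParams(idle, defaults().timestampFn);
>   }
> 
>   // FIXED: carries over this instance's timestampFn.
>   WatermarkParams withIdleDurationThreshold(Duration idle) {
>     return new WatermarkParams(idle, this.timestampFn);
>   }
> }
> {code}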



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (BEAM-14363) Kinesis WatermarkParameters builder clobbers values

2022-04-27 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-14363:

Status: Open  (was: Triage Needed)

> Kinesis WatermarkParameters builder clobbers values
> ---
>
> Key: BEAM-14363
> URL: https://issues.apache.org/jira/browse/BEAM-14363
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kinesis
>Affects Versions: 2.38.0
>Reporter: Nick Caballero
>Priority: P2
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The `WatermarkParameters` builder for the Kinesis watermark policy doesn't 
> retain previously set values, which means you can only set either 
> `timestampFn` or `idleDurationThreshold`, but not both.
>  
> The issue is that the exposed methods call `builder()` instead of 
> `toBuilder()`, losing the previous values.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work started] (BEAM-14363) Kinesis WatermarkParameters builder clobbers values

2022-04-27 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-14363 started by null.
---
> Kinesis WatermarkParameters builder clobbers values
> ---
>
> Key: BEAM-14363
> URL: https://issues.apache.org/jira/browse/BEAM-14363
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kinesis
>Affects Versions: 2.38.0
>Reporter: Nick Caballero
>Priority: P2
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The `WatermarkParameters` builder for the Kinesis watermark policy doesn't 
> retain previously set values, which means you can only set either 
> `timestampFn` or `idleDurationThreshold`, but not both.
>  
> The issue is that the exposed methods call `builder()` instead of 
> `toBuilder()`, losing the previous values.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (BEAM-14345) Hadoop version tests are failing for Spark 3

2022-04-21 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-14345:

Fix Version/s: 2.39.0
   Resolution: Fixed
   Status: Resolved  (was: Open)

> Hadoop version tests are failing for Spark 3
> 
>
> Key: BEAM-14345
> URL: https://issues.apache.org/jira/browse/BEAM-14345
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark, testing
>Reporter: Moritz Mack
>Assignee: Moritz Mack
>Priority: P2
> Fix For: 2.39.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The issue is the well-known paranamer problem with Scala 2.12.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (BEAM-13374) ToJson() of a Row with a DateTime field fails with an exception

2022-04-21 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-13374:

Fix Version/s: 2.36.0

> ToJson() of a Row with a DateTime field fails with an exception
> ---
>
> Key: BEAM-13374
> URL: https://issues.apache.org/jira/browse/BEAM-13374
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model, sdk-java-core
>Affects Versions: 2.33.0
>Reporter: Sergei Lilichenko
>Assignee: Andrei Gurau
>Priority: P2
> Fix For: 2.36.0, 2.37.0
>
>
> The following code fails with "class org.joda.time.Instant cannot be cast to 
> class org.joda.time.DateTime" exception at ToJson.of() transform. 
> The root cause of it appears to be [this conversion to 
> Instant|https://github.com/apache/beam/blob/85a122735f84c0ee46ba0fb583d9ff9e05dcf2fc/sdks/java/core/src/main/java/org/apache/beam/sdk/values/RowUtils.java#L554].
> This bug appeared in a JDBC processing pipeline where a TIMESTAMP column is 
> part of the result set retrieved using JdbcIO.readRows(). 
>  
> {code:java}
> Pipeline p = Pipeline.create();
> Schema schema = Schema.of(Field.of("timestamp", FieldType.DATETIME));
> p
>     .apply("DateTime values", Create.of(new DateTime()))
>     .apply("To Row", ParDo.of(new DoFn<DateTime, Row>() {
>       @ProcessElement
>       public void toRow(@Element DateTime dateTime, OutputReceiver<Row> rowOutputReceiver) {
>         rowOutputReceiver.output(
>             Row.withSchema(schema)
>                 .withFieldValue("timestamp", dateTime)
>                 .build());
>       }
>     }))
>     .setCoder(RowCoder.of(schema))
>     .apply("To Json", ToJson.of())
>     .apply("Print to Console", ParDo.of(new DoFn<String, Void>() {
>       @ProcessElement
>       public void print(@Element String expectedJson) {
>         System.out.println("JSON: " + expectedJson);
>       }
>     }));
> p.run().waitUntilFinish(); {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (BEAM-14335) Spotless sources for Spark runner

2022-04-21 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-14335:

Fix Version/s: 2.39.0
   Resolution: Fixed
   Status: Resolved  (was: Open)

> Spotless sources for Spark runner
> -
>
> Key: BEAM-14335
> URL: https://issues.apache.org/jira/browse/BEAM-14335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Moritz Mack
>Assignee: Moritz Mack
>Priority: P2
> Fix For: 2.39.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Shared base sources of the Spark runner are currently not formatted properly.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (BEAM-13374) ToJson() of a Row with a DateTime field fails with an exception

2022-04-20 Thread Alexey Romanenko (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-13374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525128#comment-17525128
 ] 

Alexey Romanenko commented on BEAM-13374:
-

[~andreigurau] Thanks for checking this. Is it fixed only in 2.37.0, or in 
earlier versions as well?

> ToJson() of a Row with a DateTime field fails with an exception
> ---
>
> Key: BEAM-13374
> URL: https://issues.apache.org/jira/browse/BEAM-13374
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model, sdk-java-core
>Affects Versions: 2.33.0
>Reporter: Sergei Lilichenko
>Assignee: Andrei Gurau
>Priority: P2
> Fix For: 2.37.0
>
>
> The following code fails with "class org.joda.time.Instant cannot be cast to 
> class org.joda.time.DateTime" exception at ToJson.of() transform. 
> The root cause of it appears to be [this conversion to 
> Instant|https://github.com/apache/beam/blob/85a122735f84c0ee46ba0fb583d9ff9e05dcf2fc/sdks/java/core/src/main/java/org/apache/beam/sdk/values/RowUtils.java#L554].
> This bug appeared in a JDBC processing pipeline where a TIMESTAMP column is 
> part of the result set retrieved using JdbcIO.readRows(). 
>  
> {code:java}
> Pipeline p = Pipeline.create();
> Schema schema = Schema.of(Field.of("timestamp", FieldType.DATETIME));
> p
>     .apply("DateTime values", Create.of(new DateTime()))
>     .apply("To Row", ParDo.of(new DoFn<DateTime, Row>() {
>       @ProcessElement
>       public void toRow(@Element DateTime dateTime, OutputReceiver<Row> rowOutputReceiver) {
>         rowOutputReceiver.output(
>             Row.withSchema(schema)
>                 .withFieldValue("timestamp", dateTime)
>                 .build());
>       }
>     }))
>     .setCoder(RowCoder.of(schema))
>     .apply("To Json", ToJson.of())
>     .apply("Print to Console", ParDo.of(new DoFn<String, Void>() {
>       @ProcessElement
>       public void print(@Element String expectedJson) {
>         System.out.println("JSON: " + expectedJson);
>       }
>     }));
> p.run().waitUntilFinish(); {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (BEAM-13374) ToJson() of a Row with a DateTime field fails with an exception

2022-04-20 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-13374:

Component/s: sdk-java-core

> ToJson() of a Row with a DateTime field fails with an exception
> ---
>
> Key: BEAM-13374
> URL: https://issues.apache.org/jira/browse/BEAM-13374
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model, sdk-java-core
>Affects Versions: 2.33.0
>Reporter: Sergei Lilichenko
>Assignee: Andrei Gurau
>Priority: P2
> Fix For: 2.37.0
>
>
> The following code fails with "class org.joda.time.Instant cannot be cast to 
> class org.joda.time.DateTime" exception at ToJson.of() transform. 
> The root cause of it appears to be [this conversion to 
> Instant|https://github.com/apache/beam/blob/85a122735f84c0ee46ba0fb583d9ff9e05dcf2fc/sdks/java/core/src/main/java/org/apache/beam/sdk/values/RowUtils.java#L554].
> This bug appeared in a JDBC processing pipeline where a TIMESTAMP column is 
> part of the result set retrieved using JdbcIO.readRows(). 
>  
> {code:java}
> Pipeline p = Pipeline.create();
> Schema schema = Schema.of(Field.of("timestamp", FieldType.DATETIME));
> p
>     .apply("DateTime values", Create.of(new DateTime()))
>     .apply("To Row", ParDo.of(new DoFn<DateTime, Row>() {
>       @ProcessElement
>       public void toRow(@Element DateTime dateTime, OutputReceiver<Row> rowOutputReceiver) {
>         rowOutputReceiver.output(
>             Row.withSchema(schema)
>                 .withFieldValue("timestamp", dateTime)
>                 .build());
>       }
>     }))
>     .setCoder(RowCoder.of(schema))
>     .apply("To Json", ToJson.of())
>     .apply("Print to Console", ParDo.of(new DoFn<String, Void>() {
>       @ProcessElement
>       public void print(@Element String expectedJson) {
>         System.out.println("JSON: " + expectedJson);
>       }
>     }));
> p.run().waitUntilFinish(); {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (BEAM-14323) Improve IDE integration for Spark cross version builds

2022-04-20 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-14323:

Fix Version/s: 2.39.0
   Resolution: Fixed
   Status: Resolved  (was: In Progress)

> Improve IDE integration for Spark cross version builds
> --
>
> Key: BEAM-14323
> URL: https://issues.apache.org/jira/browse/BEAM-14323
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Moritz Mack
>Assignee: Moritz Mack
>Priority: P2
> Fix For: 2.39.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> With the current build setup, the developer experience is fairly poor when 
> working with cross-version builds for Spark (and similarly for Flink):
>  * Sources for version-specific overrides are copied to a new location and 
> referenced as Gradle sources from there. First of all, this is totally 
> unnecessary: these sources are not shared and should be used in place. But, 
> much more troublesome, the actual sources won't be resolved / checked by the 
> IDE anymore and can't be properly worked on that way, sadly for no reason at 
> all.
>  * The actual shared sources, on the other hand, are referenced (added to 
> srcDirs) in place. The IDE will randomly assign them to one Spark version 
> module. Typically, for IntelliJ at least, that's the first (lower) one and 
> not the one developers are actively working on.
> The suggested changes are:
>  * Don't copy version-specific overrides.
>  * Only copy shared sources conditionally, based on a flag. This allows 
> developers to disable copying and pick the primary version they intend to 
> work on. 
> Note: This is primarily a cosmetic flag to improve IDE integration and has no 
> impact on builds, even if all modules disable copying.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (BEAM-8218) Implement Apache PulsarIO

2022-04-11 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-8218:
---
Component/s: io-java-pulsar

> Implement Apache PulsarIO
> -
>
> Key: BEAM-8218
> URL: https://issues.apache.org/jira/browse/BEAM-8218
> Project: Beam
>  Issue Type: Task
>  Components: io-ideas, io-java-pulsar
>Reporter: Alex Van Boxel
>Assignee: Marco Robles
>Priority: P3
> Fix For: 2.39.0
>
>  Time Spent: 17h 50m
>  Remaining Estimate: 0h
>
> Apache Pulsar is starting to gain popularity. Having a native Beam PulsarIO 
> could be beneficial.
> [https://pulsar.apache.org/|https://pulsar.apache.org/en/]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-8218) Implement Apache PulsarIO

2022-04-11 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-8218:
---
Fix Version/s: 2.39.0
   Resolution: Fixed
   Status: Resolved  (was: Open)

I'll move forward and resolve this Jira since PulsarIO was already merged and 
will be available in the next Beam release (2.39.0). I will create a dedicated 
Jira component to track other issues for this connector.

> Implement Apache PulsarIO
> -
>
> Key: BEAM-8218
> URL: https://issues.apache.org/jira/browse/BEAM-8218
> Project: Beam
>  Issue Type: Task
>  Components: io-ideas
>Reporter: Alex Van Boxel
>Assignee: Marco Robles
>Priority: P3
> Fix For: 2.39.0
>
>  Time Spent: 17h 50m
>  Remaining Estimate: 0h
>
> Apache Pulsar is starting to gain popularity. Having a native Beam PulsarIO 
> could be beneficial.
> [https://pulsar.apache.org/|https://pulsar.apache.org/en/]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Comment Edited] (BEAM-8218) Implement Apache PulsarIO

2022-04-11 Thread Alexey Romanenko (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518694#comment-17518694
 ] 

Alexey Romanenko edited comment on BEAM-8218 at 4/11/22 9:16 AM:
-

I see that the second PR was merged on 25/02 but the 2.38.0 release branch was 
cut on 24/02. So, it seems that this connector will be released in Beam 2.39.0, 
which has not started yet.

Can we resolve this Jira, or does it require some additional work?


was (Author: aromanenko):
I see that a second PR was merged on 25/02 but the 2.38.0 release branch was 
cut on 24/02. So, seems that this connector will be released in the Beam 2.39.0 
which is not yet started.

Can we resolve this jira or it requires some additional work?

> Implement Apache PulsarIO
> -
>
> Key: BEAM-8218
> URL: https://issues.apache.org/jira/browse/BEAM-8218
> Project: Beam
>  Issue Type: Task
>  Components: io-ideas
>Reporter: Alex Van Boxel
>Assignee: Marco Robles
>Priority: P3
>  Time Spent: 17h 50m
>  Remaining Estimate: 0h
>
> Apache Pulsar is starting to gain popularity. Having a native Beam PulsarIO 
> could be beneficial.
> [https://pulsar.apache.org/|https://pulsar.apache.org/en/]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (BEAM-8218) Implement Apache PulsarIO

2022-04-07 Thread Alexey Romanenko (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518694#comment-17518694
 ] 

Alexey Romanenko commented on BEAM-8218:


I see that a second PR was merged on 25/02 but the 2.38.0 release branch was 
cut on 24/02. So, it seems that this connector will be released in Beam 2.39.0, 
which has not started yet.

Can we resolve this Jira, or does it require some additional work?

> Implement Apache PulsarIO
> -
>
> Key: BEAM-8218
> URL: https://issues.apache.org/jira/browse/BEAM-8218
> Project: Beam
>  Issue Type: Task
>  Components: io-ideas
>Reporter: Alex Van Boxel
>Assignee: Marco Robles
>Priority: P3
>  Time Spent: 17h 50m
>  Remaining Estimate: 0h
>
> Apache Pulsar is starting to gain popularity. Having a native Beam PulsarIO 
> could be beneficial.
> [https://pulsar.apache.org/|https://pulsar.apache.org/en/]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-13610) Conflict with Spark 3.1.2 scala version

2022-03-25 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-13610:

Labels:   (was: stale-P2)

> Conflict with Spark 3.1.2 scala version
> ---
>
> Key: BEAM-13610
> URL: https://issues.apache.org/jira/browse/BEAM-13610
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Affects Versions: 2.34.0, 2.35.0
>Reporter: Ronan SY
>Priority: P2
> Fix For: Not applicable
>
>
> When trying to use Beam with Spark 3.1.2, we run into this issue:
> {code:java}
> InvalidClassException: scala.collection.mutable.WrappedArray {code}
> As explained here: 
> [https://www.mail-archive.com/issues@spark.apache.org/msg297820.html]
> It's an incompatibility issue: Spark 3.1.2 is compiled with Scala 2.12.10, 
> but this issue is fixed only in Scala >= 2.12.14.
>  
> It seems that this issue can be fixed:
> => either by compiling Spark & Beam with a Scala version >= 2.12.14
> => or by upgrading the Spark version used by Beam to 3.2.0, which can be 
> compiled with Scala 2.13
>  
> Do you know if the switch to Spark 3.2.0 is planned?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-13610) Conflict with Spark 3.1.2 scala version

2022-03-25 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-13610:

Fix Version/s: Not applicable
   Resolution: Duplicate
   Status: Resolved  (was: Open)

I'm closing this as a duplicate of BEAM-12762. 

> Conflict with Spark 3.1.2 scala version
> ---
>
> Key: BEAM-13610
> URL: https://issues.apache.org/jira/browse/BEAM-13610
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Affects Versions: 2.34.0, 2.35.0
>Reporter: Ronan SY
>Priority: P2
>  Labels: stale-P2
> Fix For: Not applicable
>
>
> When trying to use Beam with Spark 3.1.2, we run into this issue:
> {code:java}
> InvalidClassException: scala.collection.mutable.WrappedArray {code}
> As explained here: 
> [https://www.mail-archive.com/issues@spark.apache.org/msg297820.html]
> It's an incompatibility issue: Spark 3.1.2 is compiled with Scala 2.12.10, 
> but this issue is fixed only in Scala >= 2.12.14.
>  
> It seems that this issue can be fixed:
> => either by compiling Spark & Beam with a Scala version >= 2.12.14
> => or by upgrading the Spark version used by Beam to 3.2.0, which can be 
> compiled with Scala 2.13
>  
> Do you know if the switch to Spark 3.2.0 is planned?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work started] (BEAM-13049) TPC-DS: Add Grafana dashboard

2022-03-25 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-13049 started by Alexey Romanenko.
---
> TPC-DS: Add Grafana dashboard
> -
>
> Key: BEAM-13049
> URL: https://issues.apache.org/jira/browse/BEAM-13049
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing-tpcds
>Reporter: Alexey Romanenko
>Assignee: Alexey Romanenko
>Priority: P2
>  Labels: benchmark
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (BEAM-13049) TPC-DS: Add Grafana dashboard

2022-03-25 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko reassigned BEAM-13049:
---

Assignee: Alexey Romanenko

> TPC-DS: Add Grafana dashboard
> -
>
> Key: BEAM-13049
> URL: https://issues.apache.org/jira/browse/BEAM-13049
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing-tpcds
>Reporter: Alexey Romanenko
>Assignee: Alexey Romanenko
>Priority: P2
>  Labels: benchmark
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-13232) Underlying AWS Kinesis clients are not closed

2022-03-23 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-13232:

Fix Version/s: 2.38.0
   Resolution: Fixed
   Status: Resolved  (was: In Progress)

> Underlying AWS Kinesis clients are not closed
> -
>
> Key: BEAM-13232
> URL: https://issues.apache.org/jira/browse/BEAM-13232
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-aws
>Reporter: Moritz Mack
>Assignee: Moritz Mack
>Priority: P3
>  Labels: aws, aws-sdk-v2, kinesis
> Fix For: 2.38.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> All across _KinesisSource_ and related code, Kinesis clients are neither shared 
> nor closed. AWS clients are fairly resource-heavy (thread/connection pool), so 
> this should be fixed in the following places (see the sketch after this list):
>  * SimplifiedKinesisClient
>  * KinesisSource.split
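> For example, a short-lived client used for a split could be scoped with 
> try-with-resources (AWS SDK v2 clients implement {{SdkAutoCloseable}}); a 
> sketch of the fix direction, not the actual Beam code:
> {code:java}
> import software.amazon.awssdk.services.kinesis.KinesisClient;
> 
> // Close the underlying client deterministically once the split/backlog
> // work is done, releasing its thread and connection pools.
> try (KinesisClient client = KinesisClient.create()) {
>   // ... list shards / estimate backlog for KinesisSource.split ...
> }
> {code}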



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (BEAM-11325) KafkaIO should be able to read from new added topic/partition automatically during pipeline execution time

2022-03-22 Thread Alexey Romanenko (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-11325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510789#comment-17510789
 ] 

Alexey Romanenko commented on BEAM-11325:
-

So, should we change the resolution for this one since it was already released?

> KafkaIO should be able to read from new added topic/partition automatically 
> during pipeline execution time
> --
>
> Key: BEAM-11325
> URL: https://issues.apache.org/jira/browse/BEAM-11325
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-kafka
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: P2
> Fix For: 2.29.0
>
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (BEAM-12880) KafkaIO Connector-updating/overwriting consumer groupid with random prefix

2022-03-18 Thread Alexey Romanenko (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-12880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17508998#comment-17508998
 ] 

Alexey Romanenko commented on BEAM-12880:
-

[~mhp] Sorry, I didn't work on this. Do you have any suggestions?

> KafkaIO Connector-updating/overwriting consumer groupid with random prefix
> --
>
> Key: BEAM-12880
> URL: https://issues.apache.org/jira/browse/BEAM-12880
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kafka
>Affects Versions: 2.27.0
>Reporter: Logeshwaran
>Priority: P3
>
> Apache beam version: 2.27
> Connector: KafkaIO
> cloud service : GCP Dataflow
> language : JAVA11
> We are trying to read Avro messages from a Confluent Kafka topic using the 
> Dataflow service as a consumer (using the KafkaIO connector).
> While trying to access the schema registry using the provided details (schema 
> URL, subject, version, SSL configuration (keystore, truststore, etc.)), we 
> get the error below.
>  
> {color:#ef6950}Error message from worker: 
> org.apache.kafka.common.errors.GroupAuthorizationException:{color}
>  {color:#ef6950}Not authorized to access group: 
> initialOffset_offset_consumer_1179967555_kafka-connectivity-test{color}
>  
> Expected result: the consumer group id should not change, and the Kafka 
> consumer should be able to connect.
>  
> Actual result: 
> Although the provided group id was kafka-connectivity-test, somehow the value 
> is changed to 
> initialOffset_offset_consumer_1179967555_kafka-connectivity-test.
>  
> PFA related code snippets. 
>  
> !image-2021-09-14-17-34-21-672.png!
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-14125) Update IO matrix for website recommending usage of AWS 2

2022-03-18 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-14125:

Fix Version/s: Not applicable
   Resolution: Fixed
   Status: Resolved  (was: Open)

> Update IO matrix for website recommending usage of AWS 2
> 
>
> Key: BEAM-14125
> URL: https://issues.apache.org/jira/browse/BEAM-14125
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-aws
>Reporter: Moritz Mack
>Assignee: Moritz Mack
>Priority: P2
>  Labels: aws-sdk-v2, web
> Fix For: Not applicable
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Comment Edited] (BEAM-14099) Pipeline with large number of PTransforms fails with StackOverflowError

2022-03-18 Thread Alexey Romanenko (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-14099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17508989#comment-17508989
 ] 

Alexey Romanenko edited comment on BEAM-14099 at 3/18/22, 6:06 PM:
---

Actually, this issue is caused by the way Apache Spark handles the lineage 
of its pipelines. To make execution fault tolerant, the driver process keeps 
the whole DAG on the stack, which may take a significant amount of stack memory 
for large pipelines with many transforms that don't materialise or checkpoint 
results along the way. So, it's not really a Beam Spark Runner problem but 
rather a particularity of Spark itself.

Some workarounds can be used to overcome this issue:
- increase the "-Xss" option for the Spark driver process 
- materialise results from time to time with an empty side input and 
{{Reshuffle}} (sketched below), like it's done in 
[JdbcIO.Reparallelize|https://github.com/apache/beam/blob/d7f7f8e587a56fbc7fbfbdd8b010a8f6991234ee/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L2084]
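
A hedged sketch of the second workaround, applied to the reproduction loop from 
this issue (the break interval of 20 is an arbitrary illustration):
{code:java}
// Inserting a Reshuffle every 20 filters materialises the intermediate
// result and caps the lineage depth the Spark driver has to serialize.
PCollection<String> words = ...; // as in the issue's code snippet
for (int i = 0; i < N; i++) {
  words = words.apply(Filter.by((String word) -> !word.isEmpty()));
  if (i % 20 == 19) {
    words = words.apply("BreakLineage" + i, Reshuffle.viaRandomKey());
  }
}
{code}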


was (Author: aromanenko):
Actually, this issue is caused by the way how Apache Spark handles the lineage 
of its pipelines. To make it fault tolerant, the driver process keeps all DAG 
in stack and it may take quite significant amount of memory for large pipelines 
with many transforms that don't materialise or checkpointing the results. So, 
it's mostly not a Beam Spark Runner problem rather than just a Spark particular 
feature.

Some workarounds can be used to overcome this issue:
- increase "-Xss" option for Spark driver process 
- materialise results from time to time with an empty side input and 
{{Reshuffle}}, like it's done in 
[JdbcIO.Reparallelize|https://github.com/apache/beam/blob/d7f7f8e587a56fbc7fbfbdd8b010a8f6991234ee/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L2084]

> Pipeline with large number of PTransforms fails with StackOverflowError 
> 
>
> Key: BEAM-14099
> URL: https://issues.apache.org/jira/browse/BEAM-14099
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark, sdk-java-core
>Affects Versions: 2.37.0
>Reporter: Alexey Romanenko
>Assignee: Alexey Romanenko
>Priority: P1
> Fix For: Not applicable
>
> Attachments: BEAM-14099_spark.log
>
>
> If a pipeline written with the Java SDK contains a large number of 
> PTransforms, it fails with a {{java.lang.StackOverflowError}}.
> Code snippet to reproduce (based on WordCount example):
> {code}
> public class WordCountWithNFilters {
>   private static final int N = 100;
> 
>   public static void main(String[] args) {
>     PipelineOptions options =
>         PipelineOptionsFactory.fromArgs(args).withValidation().create();
>     Pipeline p = Pipeline.create(options);
>     PCollection<String> words =
>         p.apply(TextIO.read().from("file://tmp/input.txt"))
>             .apply(
>                 FlatMapElements.into(TypeDescriptors.strings())
>                     .via((String line) -> Arrays.asList(line.split("[^\\p{L}]+"))));
>     for (int i = 0; i < N; i++) {
>       words = words.apply(Filter.by((String word) -> !word.isEmpty()));
>     }
>     words.apply(Count.perElement())
>         .apply(
>             MapElements.into(TypeDescriptors.strings())
>                 .via(
>                     (KV<String, Long> wordCount) ->
>                         wordCount.getKey() + ": " + wordCount.getValue()))
>         .apply(TextIO.write().to("wordcounts"));
>     p.run().waitUntilFinish();
>   }
> }
> {code}
> Log while running with SparkRunner:
> {code}
> 2022-03-14 19:01:30,465 [pool-3-thread-1] INFO  
> org.apache.beam.runners.spark.SparkRunner$Evaluator  - Evaluating 
> View.CreatePCollectionView
> [WARNING] 
> org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
> java.lang.StackOverflowError
> at org.apache.beam.runners.spark.SparkPipelineResult.beamExceptionFrom 
> (SparkPipelineResult.java:73)
> at org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish 
> (SparkPipelineResult.java:104)
> at org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish 
> (SparkPipelineResult.java:92)
> at org.apache.beam.samples.sql.WordCountWithNFilters.main 
> (WordCountWithNFilters.java:39)
> at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:254)
> at java.lang.Thread.run (Thread.java:748)
> Caused by: java.lang.StackOverflowError
> at java.lang.ReflectiveOperationException.<init> 
> (ReflectiveOperationException.java:89)
> at java.lang.reflect.InvocationTargetException.<init> 
> (InvocationTargetException.java:72)
> at sun.reflect.GeneratedMethodAccessor39.invoke (Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke 
> (DelegatingMethodAccessorImpl.java:43)
> at java.lang.refl

[jira] [Updated] (BEAM-14099) Pipeline with large number of PTransforms fails with StackOverflowError

2022-03-18 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-14099:

Fix Version/s: Not applicable
   Resolution: Not A Bug
   Status: Resolved  (was: Open)

> Pipeline with large number of PTransforms fails with StackOverflowError 
> 
>
> Key: BEAM-14099
> URL: https://issues.apache.org/jira/browse/BEAM-14099
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark, sdk-java-core
>Affects Versions: 2.37.0
>Reporter: Alexey Romanenko
>Assignee: Alexey Romanenko
>Priority: P1
> Fix For: Not applicable
>
> Attachments: BEAM-14099_spark.log
>
>
> If a pipeline written with the Java SDK contains a large number of 
> PTransforms, it fails with a {{java.lang.StackOverflowError}}.
> Code snippet to reproduce (based on WordCount example):
> {code}
> public class WordCountWithNFilters {
>   private static final int N = 100;
> 
>   public static void main(String[] args) {
>     PipelineOptions options =
>         PipelineOptionsFactory.fromArgs(args).withValidation().create();
>     Pipeline p = Pipeline.create(options);
>     PCollection<String> words =
>         p.apply(TextIO.read().from("file://tmp/input.txt"))
>             .apply(
>                 FlatMapElements.into(TypeDescriptors.strings())
>                     .via((String line) -> Arrays.asList(line.split("[^\\p{L}]+"))));
>     for (int i = 0; i < N; i++) {
>       words = words.apply(Filter.by((String word) -> !word.isEmpty()));
>     }
>     words.apply(Count.perElement())
>         .apply(
>             MapElements.into(TypeDescriptors.strings())
>                 .via(
>                     (KV<String, Long> wordCount) ->
>                         wordCount.getKey() + ": " + wordCount.getValue()))
>         .apply(TextIO.write().to("wordcounts"));
>     p.run().waitUntilFinish();
>   }
> }
> {code}
> Log while running with SparkRunner:
> {code}
> 2022-03-14 19:01:30,465 [pool-3-thread-1] INFO  
> org.apache.beam.runners.spark.SparkRunner$Evaluator  - Evaluating 
> View.CreatePCollectionView
> [WARNING] 
> org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
> java.lang.StackOverflowError
> at org.apache.beam.runners.spark.SparkPipelineResult.beamExceptionFrom 
> (SparkPipelineResult.java:73)
> at org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish 
> (SparkPipelineResult.java:104)
> at org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish 
> (SparkPipelineResult.java:92)
> at org.apache.beam.samples.sql.WordCountWithNFilters.main 
> (WordCountWithNFilters.java:39)
> at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:254)
> at java.lang.Thread.run (Thread.java:748)
> Caused by: java.lang.StackOverflowError
> at java.lang.ReflectiveOperationException.<init> 
> (ReflectiveOperationException.java:89)
> at java.lang.reflect.InvocationTargetException.<init> 
> (InvocationTargetException.java:72)
> at sun.reflect.GeneratedMethodAccessor39.invoke (Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke 
> (DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke (Method.java:498)
> at java.io.ObjectStreamClass.invokeWriteReplace 
> (ObjectStreamClass.java:1244)
> at java.io.ObjectOutputStream.writeObject0 (ObjectOutputStream.java:1136)
> at java.io.ObjectOutputStream.defaultWriteFields 
> (ObjectOutputStream.java:1548)
> at java.io.ObjectOutputStream.writeSerialData 
> (ObjectOutputStream.java:1509)
> at java.io.ObjectOutputStream.writeOrdinaryObject 
> (ObjectOutputStream.java:1432)
> at java.io.ObjectOutputStream.writeObject0 (ObjectOutputStream.java:1178)
> at java.io.ObjectOutputStream.defaultWriteFields 
> (ObjectOutputStream.java:1548)
> at java.io.ObjectOutputStream.writeSerialData 
> (ObjectOutputStream.java:1509)
> at java.io.ObjectOutputStream.writeOrdinaryObject 
> (ObjectOutputStream.java:1432)
> at java.io.ObjectOutputStream.writeObject0 (ObjectOutputStream.java:1178)
> at java.io.ObjectOutputStream.writeObject (ObjectOutputStream.java:348)
> at scala.collection.immutable.List$SerializationProxy.writeObject 
> (List.scala:479)
> at sun.reflect.GeneratedMethodAccessor40.invoke (Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke 
> (DelegatingMethodAccessorImpl.java:43)
> ...
> {code}
> It seems that {{N}} depends on environment configuration.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (BEAM-14099) Pipeline with large number of PTransforms fails with StackOverflowError

2022-03-18 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko reassigned BEAM-14099:
---

Assignee: Alexey Romanenko

> Pipeline with large number of PTransforms fails with StackOverflowError 
> 
>
> Key: BEAM-14099
> URL: https://issues.apache.org/jira/browse/BEAM-14099
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark, sdk-java-core
>Affects Versions: 2.37.0
>Reporter: Alexey Romanenko
>Assignee: Alexey Romanenko
>Priority: P1
> Attachments: BEAM-14099_spark.log
>
>
> If a pipeline written with the Java SDK contains a large number of 
> PTransforms, it fails with a {{java.lang.StackOverflowError}}.
> Code snippet to reproduce (based on WordCount example):
> {code}
> public class WordCountWithNFilters {
>   private static final int N = 100;
>   public static void main(String[] args) {
> PipelineOptions options = 
> PipelineOptionsFactory.fromArgs(args).withValidation().create();
> Pipeline p = Pipeline.create(options);
> PCollection words = 
> p.apply(TextIO.read().from("file://tmp/input.txt"))
> .apply(
> FlatMapElements.into(TypeDescriptors.strings())
> .via((String line) -> 
> Arrays.asList(line.split("[^\\p{L}]+";
> for (int i = 0; i < N; i++) {
>   words = words.apply(Filter.by((String word) -> !word.isEmpty()));
> }
> words.apply(Count.perElement())
> .apply(
> MapElements.into(TypeDescriptors.strings())
> .via(
> (KV<String, Long> wordCount) ->
> wordCount.getKey() + ": " + wordCount.getValue()))
> .apply(TextIO.write().to("wordcounts"));
> p.run().waitUntilFinish();
>   }
> }
> {code}
> Log while running with SparkRunner:
> {code}
> 2022-03-14 19:01:30,465 [pool-3-thread-1] INFO  
> org.apache.beam.runners.spark.SparkRunner$Evaluator  - Evaluating 
> View.CreatePCollectionView
> [WARNING] 
> org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
> java.lang.StackOverflowError
> at org.apache.beam.runners.spark.SparkPipelineResult.beamExceptionFrom 
> (SparkPipelineResult.java:73)
> at org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish 
> (SparkPipelineResult.java:104)
> at org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish 
> (SparkPipelineResult.java:92)
> at org.apache.beam.samples.sql.WordCountWithNFilters.main 
> (WordCountWithNFilters.java:39)
> at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:254)
> at java.lang.Thread.run (Thread.java:748)
> Caused by: java.lang.StackOverflowError
> at java.lang.ReflectiveOperationException.<init> 
> (ReflectiveOperationException.java:89)
> at java.lang.reflect.InvocationTargetException.<init> 
> (InvocationTargetException.java:72)
> at sun.reflect.GeneratedMethodAccessor39.invoke (Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke 
> (DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke (Method.java:498)
> at java.io.ObjectStreamClass.invokeWriteReplace 
> (ObjectStreamClass.java:1244)
> at java.io.ObjectOutputStream.writeObject0 (ObjectOutputStream.java:1136)
> at java.io.ObjectOutputStream.defaultWriteFields 
> (ObjectOutputStream.java:1548)
> at java.io.ObjectOutputStream.writeSerialData 
> (ObjectOutputStream.java:1509)
> at java.io.ObjectOutputStream.writeOrdinaryObject 
> (ObjectOutputStream.java:1432)
> at java.io.ObjectOutputStream.writeObject0 (ObjectOutputStream.java:1178)
> at java.io.ObjectOutputStream.defaultWriteFields 
> (ObjectOutputStream.java:1548)
> at java.io.ObjectOutputStream.writeSerialData 
> (ObjectOutputStream.java:1509)
> at java.io.ObjectOutputStream.writeOrdinaryObject 
> (ObjectOutputStream.java:1432)
> at java.io.ObjectOutputStream.writeObject0 (ObjectOutputStream.java:1178)
> at java.io.ObjectOutputStream.writeObject (ObjectOutputStream.java:348)
> at scala.collection.immutable.List$SerializationProxy.writeObject 
> (List.scala:479)
> at sun.reflect.GeneratedMethodAccessor40.invoke (Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke 
> (DelegatingMethodAccessorImpl.java:43)
> ...
> {code}
> It seems that the value of {{N}} needed to trigger the error depends on the environment configuration.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (BEAM-14099) Pipeline with large number of PTransforms fails with StackOverflowError

2022-03-18 Thread Alexey Romanenko (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-14099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17508989#comment-17508989
 ] 

Alexey Romanenko commented on BEAM-14099:
-

Actually, this issue is caused by the way Apache Spark handles the lineage 
of its pipelines. To make it fault tolerant, the driver process keeps the whole 
DAG on the stack, and this may take a significant amount of memory for large 
pipelines with many transforms that don't materialise or checkpoint their 
results. So it's not really a Beam Spark Runner problem, but rather a 
particular feature of Spark.

Some workarounds can be used to overcome this issue:
- increase the "-Xss" option for the Spark driver process 
- materialise results from time to time with an empty side input and 
{{Reshuffle}} (see the sketch below), as it's done in 
[JdbcIO.Reparallelize|https://github.com/apache/beam/blob/d7f7f8e587a56fbc7fbfbdd8b010a8f6991234ee/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L2084]
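
For illustration, a minimal sketch of the second workaround (the helper name and 
the batch size of 10 are arbitrary choices, not from the original pipeline). The 
first workaround would instead pass something like 
{{--conf spark.driver.extraJavaOptions=-Xss16m}} to spark-submit, where 16m is 
only an example value.

{code:java}
// Sketch only: periodically break the chain of transforms with a Reshuffle
// so the lineage that the Spark driver serializes stays short.
import org.apache.beam.sdk.transforms.Filter;
import org.apache.beam.sdk.transforms.Reshuffle;
import org.apache.beam.sdk.values.PCollection;

public class ReshuffleWorkaround {
  static PCollection<String> applyFilters(PCollection<String> words, int n) {
    for (int i = 0; i < n; i++) {
      words = words.apply("Filter-" + i, Filter.by((String word) -> !word.isEmpty()));
      if (i % 10 == 9) {
        // Materialise intermediate results to cut the lineage kept so far.
        words = words.apply("Reshuffle-" + i, Reshuffle.viaRandomKey());
      }
    }
    return words;
  }
}
{code}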

> Pipeline with large number of PTransforms fails with StackOverflowError 
> 
>
> Key: BEAM-14099
> URL: https://issues.apache.org/jira/browse/BEAM-14099
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark, sdk-java-core
>Affects Versions: 2.37.0
>Reporter: Alexey Romanenko
>Priority: P1
> Attachments: BEAM-14099_spark.log
>
>
> If a pipeline written in the Java SDK contains a large number of PTransforms, 
> then it fails with a {{java.lang.StackOverflowError}}.
> Code snippet to reproduce (based on WordCount example):
> {code}
> public class WordCountWithNFilters {
>   private static final int N = 100;
>   public static void main(String[] args) {
> PipelineOptions options = 
> PipelineOptionsFactory.fromArgs(args).withValidation().create();
> Pipeline p = Pipeline.create(options);
> PCollection<String> words = 
> p.apply(TextIO.read().from("file://tmp/input.txt"))
> .apply(
> FlatMapElements.into(TypeDescriptors.strings())
> .via((String line) -> 
> Arrays.asList(line.split("[^\\p{L}]+"))));
> for (int i = 0; i < N; i++) {
>   words = words.apply(Filter.by((String word) -> !word.isEmpty()));
> }
> words.apply(Count.perElement())
> .apply(
> MapElements.into(TypeDescriptors.strings())
> .via(
> (KV<String, Long> wordCount) ->
> wordCount.getKey() + ": " + wordCount.getValue()))
> .apply(TextIO.write().to("wordcounts"));
> p.run().waitUntilFinish();
>   }
> }
> {code}
> Log while running with SparkRunner:
> {code}
> 2022-03-14 19:01:30,465 [pool-3-thread-1] INFO  
> org.apache.beam.runners.spark.SparkRunner$Evaluator  - Evaluating 
> View.CreatePCollectionView
> [WARNING] 
> org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
> java.lang.StackOverflowError
> at org.apache.beam.runners.spark.SparkPipelineResult.beamExceptionFrom 
> (SparkPipelineResult.java:73)
> at org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish 
> (SparkPipelineResult.java:104)
> at org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish 
> (SparkPipelineResult.java:92)
> at org.apache.beam.samples.sql.WordCountWithNFilters.main 
> (WordCountWithNFilters.java:39)
> at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:254)
> at java.lang.Thread.run (Thread.java:748)
> Caused by: java.lang.StackOverflowError
> at java.lang.ReflectiveOperationException.<init> 
> (ReflectiveOperationException.java:89)
> at java.lang.reflect.InvocationTargetException.<init> 
> (InvocationTargetException.java:72)
> at sun.reflect.GeneratedMethodAccessor39.invoke (Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke 
> (DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke (Method.java:498)
> at java.io.ObjectStreamClass.invokeWriteReplace 
> (ObjectStreamClass.java:1244)
> at java.io.ObjectOutputStream.writeObject0 (ObjectOutputStream.java:1136)
> at java.io.ObjectOutputStream.defaultWriteFields 
> (ObjectOutputStream.java:1548)
> at java.io.ObjectOutputStream.writeSerialData 
> (ObjectOutputStream.java:1509)
> at java.io.ObjectOutputStream.writeOrdinaryObject 
> (ObjectOutputStream.java:1432)
> at java.io.ObjectOutputStream.writeObject0 (ObjectOutputStream.java:1178)
> at java.io.ObjectOutputStream.defaultWriteFields 
> (ObjectOutputStream.java:1548)
> at java.io.ObjectOutputStream.writeSerialData 
> (ObjectOutputStream.java:1509)
> at java.io.ObjectOutputStream.writeOrdinaryObject 
> (ObjectOutputStream.java:1432)
> at java.io.ObjectOutputStream.writeObject0 (ObjectOutputStream.java:1178)
> at java.io.ObjectOutputStream.writeObject (ObjectOutputStream.java:348)
>   

[jira] [Updated] (BEAM-13174) Deprecate amazon-web-services & kinesis module (SDK v1)

2022-03-16 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-13174:

Labels: aws-sdk-v1  (was: aws-sdk-v1 stale-assigned)

> Deprecate amazon-web-services & kinesis module (SDK v1)
> ---
>
> Key: BEAM-13174
> URL: https://issues.apache.org/jira/browse/BEAM-13174
> Project: Beam
>  Issue Type: Task
>  Components: io-java-aws
>Reporter: Moritz Mack
>Assignee: Moritz Mack
>Priority: P2
>  Labels: aws-sdk-v1
>
> As discussed with [~aromanenko] and previously in the community, the goal is 
> to keep only the IOs for AWS SDK v2. The module for AWS SDK v1 will be 
> deprecated and eventually removed to avoid maintaining two code bases.
> As part of this ticket we need to evaluate if there are potentially missing 
> features in aws2 that would prevent deprecation / removal.
> In particular, the lack of writes for KinesisIO in aws2 is a known gap and 
> has to be addressed.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work started] (BEAM-13174) Deprecate amazon-web-services & kinesis module (SDK v1)

2022-03-16 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-13174 started by Moritz Mack.
--
> Deprecate amazon-web-services & kinesis module (SDK v1)
> ---
>
> Key: BEAM-13174
> URL: https://issues.apache.org/jira/browse/BEAM-13174
> Project: Beam
>  Issue Type: Task
>  Components: io-java-aws
>Reporter: Moritz Mack
>Assignee: Moritz Mack
>Priority: P2
>  Labels: aws-sdk-v1, stale-assigned
>
> As discussed with [~aromanenko] and previously in the community, the goal is 
> to keep only the IOs for AWS SDK v2. The module for AWS SDK v1 will be 
> deprecated and eventually removed to avoid maintaining two code bases.
> As part of this ticket we need to evaluate if there are potentially missing 
> features in aws2 that would prevent deprecation / removal.
> In particular, the lack of writes for KinesisIO in aws2 is a known gap and 
> has to be addressed.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (BEAM-13174) Deprecate amazon-web-services & kinesis module (SDK v1)

2022-03-16 Thread Alexey Romanenko (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17507732#comment-17507732
 ] 

Alexey Romanenko commented on BEAM-13174:
-

Work is in progress on this and almost finished.

> Deprecate amazon-web-services & kinesis module (SDK v1)
> ---
>
> Key: BEAM-13174
> URL: https://issues.apache.org/jira/browse/BEAM-13174
> Project: Beam
>  Issue Type: Task
>  Components: io-java-aws
>Reporter: Moritz Mack
>Assignee: Moritz Mack
>Priority: P2
>  Labels: aws-sdk-v1, stale-assigned
>
> As discussed with [~aromanenko] and previously in the community, the goal is 
> to keep only the IOs for AWS SDK v2. The module for AWS SDK v1 will be 
> deprecated and eventually removed to avoid maintaining two code bases.
> As part of this ticket we need to evaluate if there are potentially missing 
> features in aws2 that would prevent deprecation / removal.
> In particular, the lack of writes for KinesisIO in aws2 is a known gap and 
> has to be addressed.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-13175) Implement missing KinesisIO.Write for aws2

2022-03-15 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-13175:

Fix Version/s: 2.38.0
   Resolution: Fixed
   Status: Resolved  (was: In Progress)

> Implement missing KinesisIO.Write for aws2
> --
>
> Key: BEAM-13175
> URL: https://issues.apache.org/jira/browse/BEAM-13175
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-aws
>Reporter: Moritz Mack
>Assignee: Moritz Mack
>Priority: P2
>  Labels: aws-sdk-v2
> Fix For: 2.38.0
>
>  Time Spent: 11.5h
>  Remaining Estimate: 0h
>
> The Kinesis Producer library KPL isn't available for aws2. Hence, we cannot 
> trivially port the old KinesisIO.Write over.
> But at the same time KPL also doesn't align with the ideas behind SDFs. So 
> it's a good opportunity to implement it properly.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-14099) Pipeline with large number of PTransforms fails with StackOverflowError

2022-03-14 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-14099:

Description: 
If a pipeline written in the Java SDK contains a large number of PTransforms, 
then it fails with a {{java.lang.StackOverflowError}}.

Code snippet to reproduce (based on WordCount example):
{code}
public class WordCountWithNFilters {
  private static final int N = 100;

  public static void main(String[] args) {
PipelineOptions options = 
PipelineOptionsFactory.fromArgs(args).withValidation().create();
Pipeline p = Pipeline.create(options);

PCollection<String> words = 
p.apply(TextIO.read().from("file://tmp/input.txt"))
.apply(
FlatMapElements.into(TypeDescriptors.strings())
.via((String line) -> Arrays.asList(line.split("[^\\p{L}]+"))));
for (int i = 0; i < N; i++) {
  words = words.apply(Filter.by((String word) -> !word.isEmpty()));
}
words.apply(Count.perElement())
.apply(
MapElements.into(TypeDescriptors.strings())
.via(
(KV<String, Long> wordCount) ->
wordCount.getKey() + ": " + wordCount.getValue()))
.apply(TextIO.write().to("wordcounts"));

p.run().waitUntilFinish();
  }
}
{code}

Log while running with SparkRunner:
{code}
2022-03-14 19:01:30,465 [pool-3-thread-1] INFO  
org.apache.beam.runners.spark.SparkRunner$Evaluator  - Evaluating 
View.CreatePCollectionView
[WARNING] 
org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
java.lang.StackOverflowError
at org.apache.beam.runners.spark.SparkPipelineResult.beamExceptionFrom 
(SparkPipelineResult.java:73)
at org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish 
(SparkPipelineResult.java:104)
at org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish 
(SparkPipelineResult.java:92)
at org.apache.beam.samples.sql.WordCountWithNFilters.main 
(WordCountWithNFilters.java:39)
at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:254)
at java.lang.Thread.run (Thread.java:748)
Caused by: java.lang.StackOverflowError
at java.lang.ReflectiveOperationException.<init> 
(ReflectiveOperationException.java:89)
at java.lang.reflect.InvocationTargetException.<init> 
(InvocationTargetException.java:72)
at sun.reflect.GeneratedMethodAccessor39.invoke (Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke 
(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke (Method.java:498)
at java.io.ObjectStreamClass.invokeWriteReplace 
(ObjectStreamClass.java:1244)
at java.io.ObjectOutputStream.writeObject0 (ObjectOutputStream.java:1136)
at java.io.ObjectOutputStream.defaultWriteFields 
(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData (ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject 
(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0 (ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.defaultWriteFields 
(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData (ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject 
(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0 (ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.writeObject (ObjectOutputStream.java:348)
at scala.collection.immutable.List$SerializationProxy.writeObject 
(List.scala:479)
at sun.reflect.GeneratedMethodAccessor40.invoke (Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke 
(DelegatingMethodAccessorImpl.java:43)
...
{code}

It seems that the value of {{N}} needed to trigger the error depends on the environment configuration.

  was:
If a pipeline written in the Java SDK contains a large number of PTransforms, 
then it fails with a {{java.lang.StackOverflowError}}.

Code snippet to reproduce (based on WordCount example):
{code}
public class WordCountWithNFilters {
  private static final int N = 100;

  public static void main(String[] args) {
PipelineOptions options = 
PipelineOptionsFactory.fromArgs(args).withValidation().create();
Pipeline p = Pipeline.create(options);

PCollection<String> words = 
p.apply(TextIO.read().from("file://tmp/input.txt"))
.apply(
FlatMapElements.into(TypeDescriptors.strings())
.via((String line) -> Arrays.asList(line.split("[^\\p{L}]+"))));
for (int i = 0; i < N; i++) {
  words = words.apply(Filter.by((String word) -> !word.isEmpty()));
}
words.apply(Count.perElement())
.apply(
MapElements.into(TypeDescriptors.strings())
.via(
(KV<String, Long> wordCount) ->
wordCount.getKey() + ": " + wordCount.getValue()))
.apply(TextIO.write().to("wordcounts"));

p.run().waitUntilFinish();
  }
}
{code}

Log while running with SparkRunner:

[jira] [Updated] (BEAM-14099) Pipeline with large number of PTransforms fails with StackOverflowError

2022-03-14 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-14099:

Attachment: BEAM-14099_spark.log

> Pipeline with large number of PTransforms fails with StackOverflowError 
> 
>
> Key: BEAM-14099
> URL: https://issues.apache.org/jira/browse/BEAM-14099
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark, sdk-java-core
>Affects Versions: 2.37.0
>Reporter: Alexey Romanenko
>Priority: P1
> Attachments: BEAM-14099_spark.log
>
>
> If a pipeline written in the Java SDK contains a large number of PTransforms, 
> then it fails with a {{java.lang.StackOverflowError}}.
> Code snippet to reproduce (based on WordCount example):
> {code}
> public class WordCountWithNFilters {
>   private static final int N = 100;
>   public static void main(String[] args) {
> PipelineOptions options = 
> PipelineOptionsFactory.fromArgs(args).withValidation().create();
> Pipeline p = Pipeline.create(options);
> PCollection<String> words = 
> p.apply(TextIO.read().from("file://tmp/input.txt"))
> .apply(
> FlatMapElements.into(TypeDescriptors.strings())
> .via((String line) -> 
> Arrays.asList(line.split("[^\\p{L}]+"))));
> for (int i = 0; i < N; i++) {
>   words = words.apply(Filter.by((String word) -> !word.isEmpty()));
> }
> words.apply(Count.perElement())
> .apply(
> MapElements.into(TypeDescriptors.strings())
> .via(
> (KV<String, Long> wordCount) ->
> wordCount.getKey() + ": " + wordCount.getValue()))
> .apply(TextIO.write().to("wordcounts"));
> p.run().waitUntilFinish();
>   }
> }
> {code}
> Log while running with SparkRunner:
> {code}
> 2022-03-14 19:01:30,465 [pool-3-thread-1] INFO  
> org.apache.beam.runners.spark.SparkRunner$Evaluator  - Evaluating 
> View.CreatePCollectionView
> [WARNING] 
> org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
> java.lang.StackOverflowError
> at org.apache.beam.runners.spark.SparkPipelineResult.beamExceptionFrom 
> (SparkPipelineResult.java:73)
> at org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish 
> (SparkPipelineResult.java:104)
> at org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish 
> (SparkPipelineResult.java:92)
> at org.apache.beam.samples.sql.WordCountWithNFilters.main 
> (WordCountWithNFilters.java:39)
> at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:254)
> at java.lang.Thread.run (Thread.java:748)
> Caused by: java.lang.StackOverflowError
> at java.lang.ReflectiveOperationException.<init> 
> (ReflectiveOperationException.java:89)
> at java.lang.reflect.InvocationTargetException.<init> 
> (InvocationTargetException.java:72)
> at sun.reflect.GeneratedMethodAccessor39.invoke (Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke 
> (DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke (Method.java:498)
> at java.io.ObjectStreamClass.invokeWriteReplace 
> (ObjectStreamClass.java:1244)
> at java.io.ObjectOutputStream.writeObject0 (ObjectOutputStream.java:1136)
> at java.io.ObjectOutputStream.defaultWriteFields 
> (ObjectOutputStream.java:1548)
> at java.io.ObjectOutputStream.writeSerialData 
> (ObjectOutputStream.java:1509)
> at java.io.ObjectOutputStream.writeOrdinaryObject 
> (ObjectOutputStream.java:1432)
> at java.io.ObjectOutputStream.writeObject0 (ObjectOutputStream.java:1178)
> at java.io.ObjectOutputStream.defaultWriteFields 
> (ObjectOutputStream.java:1548)
> at java.io.ObjectOutputStream.writeSerialData 
> (ObjectOutputStream.java:1509)
> at java.io.ObjectOutputStream.writeOrdinaryObject 
> (ObjectOutputStream.java:1432)
> at java.io.ObjectOutputStream.writeObject0 (ObjectOutputStream.java:1178)
> at java.io.ObjectOutputStream.writeObject (ObjectOutputStream.java:348)
> at scala.collection.immutable.List$SerializationProxy.writeObject 
> (List.scala:479)
> at sun.reflect.GeneratedMethodAccessor40.invoke (Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke 
> (DelegatingMethodAccessorImpl.java:43)
> ...
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-14099) Pipeline with large number of PTransforms fails with StackOverflowError

2022-03-14 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-14099:

Status: Open  (was: Triage Needed)

> Pipeline with large number of PTransforms fails with StackOverflowError 
> 
>
> Key: BEAM-14099
> URL: https://issues.apache.org/jira/browse/BEAM-14099
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark, sdk-java-core
>Affects Versions: 2.37.0
>Reporter: Alexey Romanenko
>Priority: P1
>
> If a pipeline written in the Java SDK contains a large number of PTransforms, 
> then it fails with a {{java.lang.StackOverflowError}}.
> Code snippet to reproduce (based on WordCount example):
> {code}
> public class WordCountWithNFilters {
>   private static final int N = 100;
>   public static void main(String[] args) {
> PipelineOptions options = 
> PipelineOptionsFactory.fromArgs(args).withValidation().create();
> Pipeline p = Pipeline.create(options);
> PCollection<String> words = 
> p.apply(TextIO.read().from("file://tmp/input.txt"))
> .apply(
> FlatMapElements.into(TypeDescriptors.strings())
> .via((String line) -> 
> Arrays.asList(line.split("[^\\p{L}]+"))));
> for (int i = 0; i < N; i++) {
>   words = words.apply(Filter.by((String word) -> !word.isEmpty()));
> }
> words.apply(Count.perElement())
> .apply(
> MapElements.into(TypeDescriptors.strings())
> .via(
> (KV<String, Long> wordCount) ->
> wordCount.getKey() + ": " + wordCount.getValue()))
> .apply(TextIO.write().to("wordcounts"));
> p.run().waitUntilFinish();
>   }
> }
> {code}
> Log while running with SparkRunner:
> {code}
> 2022-03-14 19:01:30,465 [pool-3-thread-1] INFO  
> org.apache.beam.runners.spark.SparkRunner$Evaluator  - Evaluating 
> View.CreatePCollectionView
> [WARNING] 
> org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
> java.lang.StackOverflowError
> at org.apache.beam.runners.spark.SparkPipelineResult.beamExceptionFrom 
> (SparkPipelineResult.java:73)
> at org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish 
> (SparkPipelineResult.java:104)
> at org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish 
> (SparkPipelineResult.java:92)
> at org.apache.beam.samples.sql.WordCountWithNFilters.main 
> (WordCountWithNFilters.java:39)
> at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:254)
> at java.lang.Thread.run (Thread.java:748)
> Caused by: java.lang.StackOverflowError
> at java.lang.ReflectiveOperationException.<init> 
> (ReflectiveOperationException.java:89)
> at java.lang.reflect.InvocationTargetException.<init> 
> (InvocationTargetException.java:72)
> at sun.reflect.GeneratedMethodAccessor39.invoke (Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke 
> (DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke (Method.java:498)
> at java.io.ObjectStreamClass.invokeWriteReplace 
> (ObjectStreamClass.java:1244)
> at java.io.ObjectOutputStream.writeObject0 (ObjectOutputStream.java:1136)
> at java.io.ObjectOutputStream.defaultWriteFields 
> (ObjectOutputStream.java:1548)
> at java.io.ObjectOutputStream.writeSerialData 
> (ObjectOutputStream.java:1509)
> at java.io.ObjectOutputStream.writeOrdinaryObject 
> (ObjectOutputStream.java:1432)
> at java.io.ObjectOutputStream.writeObject0 (ObjectOutputStream.java:1178)
> at java.io.ObjectOutputStream.defaultWriteFields 
> (ObjectOutputStream.java:1548)
> at java.io.ObjectOutputStream.writeSerialData 
> (ObjectOutputStream.java:1509)
> at java.io.ObjectOutputStream.writeOrdinaryObject 
> (ObjectOutputStream.java:1432)
> at java.io.ObjectOutputStream.writeObject0 (ObjectOutputStream.java:1178)
> at java.io.ObjectOutputStream.writeObject (ObjectOutputStream.java:348)
> at scala.collection.immutable.List$SerializationProxy.writeObject 
> (List.scala:479)
> at sun.reflect.GeneratedMethodAccessor40.invoke (Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke 
> (DelegatingMethodAccessorImpl.java:43)
> ...
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-14099) Pipeline with large number of PTransforms fails with StackOverflowError

2022-03-14 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-14099:

Affects Version/s: 2.37.0

> Pipeline with large number of PTransforms fails with StackOverflowError 
> 
>
> Key: BEAM-14099
> URL: https://issues.apache.org/jira/browse/BEAM-14099
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark, sdk-java-core
>Affects Versions: 2.37.0
>Reporter: Alexey Romanenko
>Priority: P1
>
> If a pipeline written in the Java SDK contains a large number of PTransforms, 
> then it fails with a {{java.lang.StackOverflowError}}.
> Code snippet to reproduce (based on WordCount example):
> {code}
> public class WordCountWithNFilters {
>   private static final int N = 100;
>   public static void main(String[] args) {
> PipelineOptions options = 
> PipelineOptionsFactory.fromArgs(args).withValidation().create();
> Pipeline p = Pipeline.create(options);
> PCollection<String> words = 
> p.apply(TextIO.read().from("file://tmp/input.txt"))
> .apply(
> FlatMapElements.into(TypeDescriptors.strings())
> .via((String line) -> 
> Arrays.asList(line.split("[^\\p{L}]+"))));
> for (int i = 0; i < N; i++) {
>   words = words.apply(Filter.by((String word) -> !word.isEmpty()));
> }
> words.apply(Count.perElement())
> .apply(
> MapElements.into(TypeDescriptors.strings())
> .via(
> (KV<String, Long> wordCount) ->
> wordCount.getKey() + ": " + wordCount.getValue()))
> .apply(TextIO.write().to("wordcounts"));
> p.run().waitUntilFinish();
>   }
> }
> {code}
> Log while running with SparkRunner:
> {code}
> 2022-03-14 19:01:30,465 [pool-3-thread-1] INFO  
> org.apache.beam.runners.spark.SparkRunner$Evaluator  - Evaluating 
> View.CreatePCollectionView
> [WARNING] 
> org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
> java.lang.StackOverflowError
> at org.apache.beam.runners.spark.SparkPipelineResult.beamExceptionFrom 
> (SparkPipelineResult.java:73)
> at org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish 
> (SparkPipelineResult.java:104)
> at org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish 
> (SparkPipelineResult.java:92)
> at org.apache.beam.samples.sql.WordCountWithNFilters.main 
> (WordCountWithNFilters.java:39)
> at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:254)
> at java.lang.Thread.run (Thread.java:748)
> Caused by: java.lang.StackOverflowError
> at java.lang.ReflectiveOperationException.<init> 
> (ReflectiveOperationException.java:89)
> at java.lang.reflect.InvocationTargetException.<init> 
> (InvocationTargetException.java:72)
> at sun.reflect.GeneratedMethodAccessor39.invoke (Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke 
> (DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke (Method.java:498)
> at java.io.ObjectStreamClass.invokeWriteReplace 
> (ObjectStreamClass.java:1244)
> at java.io.ObjectOutputStream.writeObject0 (ObjectOutputStream.java:1136)
> at java.io.ObjectOutputStream.defaultWriteFields 
> (ObjectOutputStream.java:1548)
> at java.io.ObjectOutputStream.writeSerialData 
> (ObjectOutputStream.java:1509)
> at java.io.ObjectOutputStream.writeOrdinaryObject 
> (ObjectOutputStream.java:1432)
> at java.io.ObjectOutputStream.writeObject0 (ObjectOutputStream.java:1178)
> at java.io.ObjectOutputStream.defaultWriteFields 
> (ObjectOutputStream.java:1548)
> at java.io.ObjectOutputStream.writeSerialData 
> (ObjectOutputStream.java:1509)
> at java.io.ObjectOutputStream.writeOrdinaryObject 
> (ObjectOutputStream.java:1432)
> at java.io.ObjectOutputStream.writeObject0 (ObjectOutputStream.java:1178)
> at java.io.ObjectOutputStream.writeObject (ObjectOutputStream.java:348)
> at scala.collection.immutable.List$SerializationProxy.writeObject 
> (List.scala:479)
> at sun.reflect.GeneratedMethodAccessor40.invoke (Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke 
> (DelegatingMethodAccessorImpl.java:43)
> ...
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (BEAM-14099) Pipeline with large number of PTransforms fails with StackOverflowError

2022-03-14 Thread Alexey Romanenko (Jira)
Alexey Romanenko created BEAM-14099:
---

 Summary: Pipeline with large number of PTransforms fails with 
StackOverflowError 
 Key: BEAM-14099
 URL: https://issues.apache.org/jira/browse/BEAM-14099
 Project: Beam
  Issue Type: Bug
  Components: runner-spark, sdk-java-core
Reporter: Alexey Romanenko


If a pipeline written in the Java SDK contains a large number of PTransforms, 
then it fails with a {{java.lang.StackOverflowError}}.

Code snippet to reproduce (based on WordCount example):
{code}
public class WordCountWithNFilters {
  private static final int N = 100;

  public static void main(String[] args) {
PipelineOptions options = 
PipelineOptionsFactory.fromArgs(args).withValidation().create();
Pipeline p = Pipeline.create(options);

PCollection<String> words = 
p.apply(TextIO.read().from("file://tmp/input.txt"))
.apply(
FlatMapElements.into(TypeDescriptors.strings())
.via((String line) -> Arrays.asList(line.split("[^\\p{L}]+"))));
for (int i = 0; i < N; i++) {
  words = words.apply(Filter.by((String word) -> !word.isEmpty()));
}
words.apply(Count.perElement())
.apply(
MapElements.into(TypeDescriptors.strings())
.via(
(KV<String, Long> wordCount) ->
wordCount.getKey() + ": " + wordCount.getValue()))
.apply(TextIO.write().to("wordcounts"));

p.run().waitUntilFinish();
  }
}
{code}

Log while running with SparkRunner:
{code}
2022-03-14 19:01:30,465 [pool-3-thread-1] INFO  
org.apache.beam.runners.spark.SparkRunner$Evaluator  - Evaluating 
View.CreatePCollectionView
[WARNING] 
org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
java.lang.StackOverflowError
at org.apache.beam.runners.spark.SparkPipelineResult.beamExceptionFrom 
(SparkPipelineResult.java:73)
at org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish 
(SparkPipelineResult.java:104)
at org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish 
(SparkPipelineResult.java:92)
at org.apache.beam.samples.sql.WordCountWithNFilters.main 
(WordCountWithNFilters.java:39)
at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:254)
at java.lang.Thread.run (Thread.java:748)
Caused by: java.lang.StackOverflowError
at java.lang.ReflectiveOperationException.<init> 
(ReflectiveOperationException.java:89)
at java.lang.reflect.InvocationTargetException.<init> 
(InvocationTargetException.java:72)
at sun.reflect.GeneratedMethodAccessor39.invoke (Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke 
(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke (Method.java:498)
at java.io.ObjectStreamClass.invokeWriteReplace 
(ObjectStreamClass.java:1244)
at java.io.ObjectOutputStream.writeObject0 (ObjectOutputStream.java:1136)
at java.io.ObjectOutputStream.defaultWriteFields 
(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData (ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject 
(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0 (ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.defaultWriteFields 
(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData (ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject 
(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0 (ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.writeObject (ObjectOutputStream.java:348)
at scala.collection.immutable.List$SerializationProxy.writeObject 
(List.scala:479)
at sun.reflect.GeneratedMethodAccessor40.invoke (Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke 
(DelegatingMethodAccessorImpl.java:43)
...
{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (BEAM-13298) Add "withResults()" for KafkaIO.write

2022-03-14 Thread Alexey Romanenko (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-13298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506385#comment-17506385
 ] 

Alexey Romanenko commented on BEAM-13298:
-

Hi [~weifonghsia], do you have any progress on this Jira?

> Add "withResults()" for KafkaIO.write
> -
>
> Key: BEAM-13298
> URL: https://issues.apache.org/jira/browse/BEAM-13298
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-ideas, io-java-kafka
> Environment: Production
>Reporter: Ranjan Dahal
>Assignee: Wei Fong Hsia
>Priority: P2
>  Labels: KafkaIO, stale-assigned
>   Original Estimate: 4h
>  Time Spent: 10m
>  Remaining Estimate: 3h 50m
>
> I am looking at a use case where we have to wait until the Kafka Write 
> operation is completed before we can move forward with another transform. 
> Currently, JdbcIO supports withResults(), which waits for the previous 
> transform to complete as part of Wait.on(Signal) and moves on to the next. 
> Similarly, it would be very beneficial to have this capability on KafkaIO 
> (and others like PubSubIO, BigQueryIO etc.). 
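
For reference, a minimal sketch of the Wait.on pattern the request refers to, as 
it works with JdbcIO today ({{MyRecord}}, {{rows}}, {{otherInput}} and the 
configuration values are hypothetical placeholders):

{code:java}
// Sketch only: JdbcIO's withResults() makes the write emit a signal
// collection that a downstream step can wait on, window by window.
PCollection<Void> signal =
    rows.apply(
        JdbcIO.<MyRecord>write()
            .withDataSourceConfiguration(dataSourceConfiguration)
            .withStatement("INSERT INTO my_table VALUES (?)")
            .withPreparedStatementSetter(statementSetter)
            .withResults());

otherInput
    .apply(Wait.on(signal)) // holds back otherInput until the write is done
    .apply(nextTransform);
{code}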



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-2766) Hadoop(Input)FormatIO should support Void/null key/values

2022-03-11 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-2766:
---
Summary: Hadoop(Input)FormatIO should support Void/null key/values  (was: 
HadoopInputFormatIO should support Void/null key/values)

> Hadoop(Input)FormatIO should support Void/null key/values
> -
>
> Key: BEAM-2766
> URL: https://issues.apache.org/jira/browse/BEAM-2766
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-hadoop-format
>Affects Versions: 2.2.0
>Reporter: Neville Li
>Assignee: Vitaly Terentyev
>Priority: P3
>  Labels: cdap-io-sprint-2
> Fix For: 2.38.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Many Hadoop {{InputFormat}} implementations use {{Void}} as the key/value type 
> and generate null values, which causes a {{NullPointerException}} in 
> https://github.com/apache/beam/blob/master/sdks/java/io/hadoop/input-format/src/main/java/org/apache/beam/sdk/io/hadoop/inputformat/HadoopInputFormatIO.java#L714
> {{HadoopInputFormatIO}} should ignore these and not clone them.
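
A minimal sketch of the proposed null-safe handling (an illustrative helper, not 
the actual Beam code):

{code:java}
// Sketch only: skip cloning for the null values that Void-typed
// InputFormats produce, instead of passing them to the coder.
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.coders.CoderException;
import org.apache.beam.sdk.util.CoderUtils;

static <T> T cloneIfNotNull(T value, Coder<T> coder) throws CoderException {
  return value == null ? null : CoderUtils.clone(coder, value);
}
{code}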



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-2766) HadoopInputFormatIO should support Void/null key/values

2022-03-11 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-2766:
---
Fix Version/s: 2.38.0

> HadoopInputFormatIO should support Void/null key/values
> ---
>
> Key: BEAM-2766
> URL: https://issues.apache.org/jira/browse/BEAM-2766
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-hadoop-format
>Affects Versions: 2.2.0
>Reporter: Neville Li
>Assignee: Vitaly Terentyev
>Priority: P3
>  Labels: cdap-io-sprint-2
> Fix For: 2.38.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Many Hadoop {{InputFormat}} implementations use {{Void}} as the key/value type 
> and generate null values, which causes a {{NullPointerException}} in 
> https://github.com/apache/beam/blob/master/sdks/java/io/hadoop/input-format/src/main/java/org/apache/beam/sdk/io/hadoop/inputformat/HadoopInputFormatIO.java#L714
> {{HadoopInputFormatIO}} should ignore these and not clone them.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-14011) AWS SDK2 S3FileSystem MultiPart Copy sets incorrect request parameters

2022-03-04 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-14011:

Fix Version/s: 2.38.0
   Resolution: Fixed
   Status: Resolved  (was: In Progress)

> AWS SDK2 S3FileSystem MultiPart Copy sets incorrect request parameters
> --
>
> Key: BEAM-14011
> URL: https://issues.apache.org/jira/browse/BEAM-14011
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-aws
>Affects Versions: 2.29.0, 2.30.0, 2.31.0, 2.32.0, 2.33.0, 2.34.0, 2.35.0, 
> 2.36.0
>Reporter: Stephen Patel
>Assignee: Stephen Patel
>Priority: P2
> Fix For: 2.38.0
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> The following code locations use incorrect parameters:
> [Here|https://github.com/apache/beam/blob/v2.36.0/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/s3/S3FileSystem.java#L518-L519]
>  and 
> [Here|https://github.com/apache/beam/blob/v2.36.0/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/s3/S3FileSystem.java#L539-L540]
>  specify the sourcePath instead of the destinationPath.
> [Here|https://github.com/apache/beam/blob/v2.36.0/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/s3/S3FileSystem.java#L541]
>  specifies destinationPath.getBucket() instead of sourcePath.getBucket()
> [Here|https://github.com/apache/beam/blob/v2.36.0/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/s3/S3FileSystem.java#L556]
>  specifies a constant part number of 1, instead of using the partNumber 
> variable.
> Taken together, these issues cause multipart copies to fail due to:
> {noformat}
> software.amazon.awssdk.services.s3.model.NoSuchUploadException: The specified 
> upload does not exist. The upload ID may be invalid, or the upload may have 
> been aborted or completed.
> {noformat}
> If the object references are fixed, the part number issue causes multipart 
> copies to fail due to:
> {noformat}
> software.amazon.awssdk.services.s3.model.S3Exception: The list of parts was 
> not in ascending order. Parts must be ordered by part number. 
> {noformat}
> Note: I checked the AWS SDK1 S3FileSystem and did not see the same issues.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work started] (BEAM-14011) AWS SDK2 S3FileSystem MultiPart Copy sets incorrect request parameters

2022-03-04 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-14011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-14011 started by Stephen Patel.

> AWS SDK2 S3FileSystem MultiPart Copy sets incorrect request parameters
> --
>
> Key: BEAM-14011
> URL: https://issues.apache.org/jira/browse/BEAM-14011
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-aws
>Affects Versions: 2.29.0, 2.30.0, 2.31.0, 2.32.0, 2.33.0, 2.34.0, 2.35.0, 
> 2.36.0
>Reporter: Stephen Patel
>Assignee: Stephen Patel
>Priority: P2
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> The following code locations use incorrect parameters:
> [Here|https://github.com/apache/beam/blob/v2.36.0/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/s3/S3FileSystem.java#L518-L519]
>  and 
> [Here|https://github.com/apache/beam/blob/v2.36.0/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/s3/S3FileSystem.java#L539-L540]
>  specify the sourcePath instead of the destinationPath.
> [Here|https://github.com/apache/beam/blob/v2.36.0/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/s3/S3FileSystem.java#L541]
>  specifies destinationPath.getBucket() instead of sourcePath.getBucket()
> [Here|https://github.com/apache/beam/blob/v2.36.0/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/s3/S3FileSystem.java#L556]
>  specifies a constant part number of 1, instead of using the partNumber 
> variable.
> Taken together, these issues cause multipart copies to fail due to:
> {noformat}
> software.amazon.awssdk.services.s3.model.NoSuchUploadException: The specified 
> upload does not exist. The upload ID may be invalid, or the upload may have 
> been aborted or completed.
> {noformat}
> If the object references are fixed, the part number issue causes multipart 
> copies to fail due to:
> {noformat}
> software.amazon.awssdk.services.s3.model.S3Exception: The list of parts was 
> not in ascending order. Parts must be ordered by part number. 
> {noformat}
> Note: I checked the AWS SDK1 S3FileSystem and did not see the same issues.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-5577) Beam Dependency Update Request: org.mongodb:mongo-java-driver

2022-03-03 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-5577:
---
Fix Version/s: 2.38.0
   Resolution: Fixed
   Status: Resolved  (was: Open)

> Beam Dependency Update Request: org.mongodb:mongo-java-driver
> -
>
> Key: BEAM-5577
> URL: https://issues.apache.org/jira/browse/BEAM-5577
> Project: Beam
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Priority: P3
> Fix For: 2.38.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
>  - 2018-10-01 19:31:53.116006 
> -
> Please consider upgrading the dependency 
> org.mongodb:mongo-java-driver. 
> The current version is 3.2.2. The latest version is 3.8.2 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-10-08 12:19:50.933043 
> -
> Please consider upgrading the dependency 
> org.mongodb:mongo-java-driver. 
> The current version is 3.2.2. The latest version is 3.8.2 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-10-15 12:13:43.378953 
> -
> Please consider upgrading the dependency 
> org.mongodb:mongo-java-driver. 
> The current version is 3.2.2. The latest version is 3.8.2 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-10-22 12:14:02.941839 
> -
> Please consider upgrading the dependency 
> org.mongodb:mongo-java-driver. 
> The current version is 3.2.2. The latest version is 3.8.2 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-10-29 12:18:53.503708 
> -
> Please consider upgrading the dependency 
> org.mongodb:mongo-java-driver. 
> The current version is 3.2.2. The latest version is 3.8.2 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-11-05 12:15:38.415073 
> -
> Please consider upgrading the dependency 
> org.mongodb:mongo-java-driver. 
> The current version is 3.2.2. The latest version is 3.8.2 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-11-12 12:15:38.514349 
> -
> Please consider upgrading the dependency 
> org.mongodb:mongo-java-driver. 
> The current version is 3.2.2. The latest version is 3.9.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-11-19 12:16:19.793075 
> -
> Please consider upgrading the dependency 
> org.mongodb:mongo-java-driver. 
> The current version is 3.2.2. The latest version is 3.9.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-11-26 12:15:20.150770 
> -
> Please consider upgrading the dependency 
> org.mongodb:mongo-java-driver. 
> The current version is 3.2.2. The latest version is 3.9.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-12-23 12:23:04.576425 
> -
> Please consider upgrading the dependency 
> org.mongodb:mongo-java-driver. 
> The current version is 3.9.1. The latest version is 3.12.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-12-30 14:17:39.058418 
> -
>  

[jira] [Commented] (BEAM-13731) FhirIO: Add support for BATCH bundle errors

2022-03-02 Thread Alexey Romanenko (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-13731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500082#comment-17500082
 ] 

Alexey Romanenko commented on BEAM-13731:
-

I believe it will be included in the 2.37.0 release.

> FhirIO: Add support for BATCH bundle errors
> ---
>
> Key: BEAM-13731
> URL: https://issues.apache.org/jira/browse/BEAM-13731
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-healthcare
>Reporter: Milena Bukal
>Assignee: Milena Bukal
>Priority: P2
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> BATCH bundles return a 200 status code even if entries fail, which results in 
> us only incrementing the success counter and not the failure counters, even 
> if all entries fail to be written to the FHIR store. (Meanwhile, transaction 
> bundles - which is all FhirIO has been tested with - return a non-20X HTTP 
> response if _any_ of the entries don't succeed.)
> Therefore, we need to add parsing and metrics around this to be able to 
> support bundles of type BATCH.
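
To illustrate what that parsing could look like, here is a sketch based on the 
standard FHIR bundle response shape (not FhirIO's actual implementation; the 
Gson-based parsing is an assumption):

{code:java}
// Sketch only: a batch-response bundle returns HTTP 200 overall, but each
// entry carries its own status, e.g. "201 Created" or "400 Bad Request".
import com.google.gson.JsonElement;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;

static int countFailedEntries(String bundleResponseJson) {
  JsonObject bundle = JsonParser.parseString(bundleResponseJson).getAsJsonObject();
  int failures = 0;
  for (JsonElement entry : bundle.getAsJsonArray("entry")) {
    String status =
        entry.getAsJsonObject().getAsJsonObject("response").get("status").getAsString();
    if (!status.startsWith("2")) {
      failures++; // count entries whose own status is not a 2xx
    }
  }
  return failures;
}
{code}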



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-13731) FhirIO: Add support for BATCH bundle errors

2022-03-02 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-13731:

Fix Version/s: 2.37.0
   Resolution: Fixed
   Status: Resolved  (was: Open)

> FhirIO: Add support for BATCH bundle errors
> ---
>
> Key: BEAM-13731
> URL: https://issues.apache.org/jira/browse/BEAM-13731
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-healthcare
>Reporter: Milena Bukal
>Assignee: Milena Bukal
>Priority: P2
> Fix For: 2.37.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> BATCH bundles return a 200 status code even if entries fail, which results in 
> us only incrementing the success counter and not the failure counters, even 
> if all entries fail to be written to the FHIR store. (Meanwhile, transaction 
> bundles - which is all FhirIO has been tested with - return a non-20X HTTP 
> response if _any_ of the entries don't succeed.)
> Therefore, we need to add parsing and metrics around this to be able to 
> support bundles of type BATCH.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (BEAM-14011) AWS SDK2 S3FileSystem MultiPart Copy sets incorrect request parameters

2022-03-02 Thread Alexey Romanenko (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-14011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500010#comment-17500010
 ] 

Alexey Romanenko commented on BEAM-14011:
-

Thanks! I briefly took a look at the AWS v1 version and it seems there is no 
such issue with multipartCopy. 

Also, the AWS v2 {{S3FileSystem}} currently uses a deprecated API for multipart 
copy, so we need to fix that as well.
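
For context, a sketch of how each part-copy request should pair source and 
destination (assuming the newer, non-deprecated AWS SDK v2 builder methods; all 
values are placeholders):

{code:java}
// Sketch only: the three issues from the report, fixed in one builder.
import software.amazon.awssdk.services.s3.model.UploadPartCopyRequest;

UploadPartCopyRequest request =
    UploadPartCopyRequest.builder()
        .sourceBucket(sourceBucket)           // read from the source object
        .sourceKey(sourceKey)
        .destinationBucket(destinationBucket) // write into the destination upload
        .destinationKey(destinationKey)
        .uploadId(uploadId)                   // upload started on the destination
        .partNumber(partNumber)               // per-part counter, not a constant 1
        .build();
{code}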

> AWS SDK2 S3FileSystem MultiPart Copy sets incorrect request parameters
> --
>
> Key: BEAM-14011
> URL: https://issues.apache.org/jira/browse/BEAM-14011
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-aws
>Affects Versions: 2.29.0, 2.30.0, 2.31.0, 2.32.0, 2.33.0, 2.34.0, 2.35.0, 
> 2.36.0
>Reporter: Stephen Patel
>Assignee: Stephen Patel
>Priority: P2
>
> The following code locations use incorrect parameters:
> [Here|https://github.com/apache/beam/blob/v2.36.0/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/s3/S3FileSystem.java#L518-L519]
>  and 
> [Here|https://github.com/apache/beam/blob/v2.36.0/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/s3/S3FileSystem.java#L539-L540]
>  specify the sourcePath instead of the destinationPath.
> [Here|https://github.com/apache/beam/blob/v2.36.0/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/s3/S3FileSystem.java#L541]
>  specifies destinationPath.getBucket() instead of sourcePath.getBucket()
> [Here|https://github.com/apache/beam/blob/v2.36.0/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/s3/S3FileSystem.java#L556]
>  specifies a constant part number of 1, instead of using the partNumber 
> variable.
> Taken together, these issues cause multipart copies to fail due to:
> {noformat}
> software.amazon.awssdk.services.s3.model.NoSuchUploadException: The specified 
> upload does not exist. The upload ID may be invalid, or the upload may have 
> been aborted or completed.
> {noformat}
> If the object references are fixed, the part number issue causes multipart 
> copies to fail due to:
> {noformat}
> software.amazon.awssdk.services.s3.model.S3Exception: The list of parts was 
> not in ascending order. Parts must be ordered by part number. 
> {noformat}
> Note: I checked the AWS SDK1 S3FileSystem and did not see the same issues.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (BEAM-12877) Beam Row Avro conversion: fixed and bytes decimals

2022-02-16 Thread Alexey Romanenko (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-12877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17493337#comment-17493337
 ] 

Alexey Romanenko commented on BEAM-12877:
-

[~kegelink] Are you going to add your path as a pull request? If yes then you 
can find some details on this 
[here|https://beam.apache.org/contribute/#make-your-change].

> Beam Row Avro conversion: fixed and bytes decimals
> --
>
> Key: BEAM-12877
> URL: https://issues.apache.org/jira/browse/BEAM-12877
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Affects Versions: 2.32.0
>Reporter: Koen Egelink
>Assignee: Koen Egelink
>Priority: P2
>  Labels: stale-assigned
> Attachments: beam-sdks-java-core-2.32.0-fixed-decimal.patch
>
>
> An Avro decimal logical type annotates Avro bytes or fixed types.
> The current Row to Avro conversion is limited to the bytes type and in 
> addition hardcodes precision to MAX_INT and scale to 0. 
> I have attached a patch that adds support for decimal bytes and fixed types.
> I could think of 2 possible ways to solve this:
>  # Change Row decimal type to a logical type
>  # Use Beam field options to pass additional metadata required to serialize 
> decimals
> I felt that overhauling Beam schema types might not be a good idea. Instead I 
> went with option 2.
> Passes ./gradlew check
>  
>  
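
For illustration, what the two decimal shapes look like on the Avro side (a 
sketch with arbitrary precision, scale and fixed-type names):

{code:java}
// Sketch only: an Avro decimal logical type can annotate either bytes or
// a fixed type; the patch adds Row conversion support for both.
import org.apache.avro.LogicalTypes;
import org.apache.avro.Schema;

Schema bytesDecimal =
    LogicalTypes.decimal(20, 4).addToSchema(Schema.create(Schema.Type.BYTES));
Schema fixedDecimal =
    LogicalTypes.decimal(20, 4)
        .addToSchema(Schema.createFixed("Amount", null, "example.avro", 16));
{code}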



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Comment Edited] (BEAM-12877) Beam Row Avro conversion: fixed and bytes decimals

2022-02-16 Thread Alexey Romanenko (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-12877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17493337#comment-17493337
 ] 

Alexey Romanenko edited comment on BEAM-12877 at 2/16/22, 4:36 PM:
---

[~kegelink] Are you going to add your patch as a pull request? If yes then you 
can find some details on this 
[here|https://beam.apache.org/contribute/#make-your-change].


was (Author: aromanenko):
[~kegelink] Are you going to add your path as a pull request? If yes then you 
can find some details on this 
[here|https://beam.apache.org/contribute/#make-your-change].

> Beam Row Avro conversion: fixed and bytes decimals
> --
>
> Key: BEAM-12877
> URL: https://issues.apache.org/jira/browse/BEAM-12877
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Affects Versions: 2.32.0
>Reporter: Koen Egelink
>Assignee: Koen Egelink
>Priority: P2
>  Labels: stale-assigned
> Attachments: beam-sdks-java-core-2.32.0-fixed-decimal.patch
>
>
> An Avro decimal logical type annotates Avro bytes or fixed types.
> The current Row to Avro conversion is limited to the bytes type and in 
> addition hardcodes precision to MAX_INT and scale to 0. 
> I have attached a patch that adds support for decimal bytes and fixed types.
> I could think of 2 possible ways to solve this:
>  # Change Row decimal type to a logical type
>  # Use Beam field options to pass additional metadata required to serialize 
> decimals
> I felt that overhauling Beam schema types might not be a good idea. Instead I 
> went with option 2.
> Passes ./gradlew check
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (BEAM-13950) PVR_Spark2_Streaming perma-red

2022-02-15 Thread Alexey Romanenko (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-13950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17492886#comment-17492886
 ] 

Alexey Romanenko commented on BEAM-13950:
-

I agree with [~ibzib], it should not be a release blocker. Actually, we need to 
discuss whether we still need to support Spark 2, since most (or maybe even 
all) cloud providers have already moved to Spark 3.

> PVR_Spark2_Streaming perma-red
> --
>
> Key: BEAM-13950
> URL: https://issues.apache.org/jira/browse/BEAM-13950
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark, test-failures
>Affects Versions: 2.37.0
>Reporter: Brian Hulette
>Priority: P1
>
> https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark2_Streaming has 
> been failing a variable number of tests for a while. 
> Last successful run was Dec 28, 2021 
> (https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark2_Streaming/1021/),
>  which was approximately coincident with gradle 7 changes.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work started] (BEAM-13049) TPC-DS: Add Grafana dashboard

2022-02-15 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-13049 started by Alexey Romanenko.
---
> TPC-DS: Add Grafana dashboard
> -
>
> Key: BEAM-13049
> URL: https://issues.apache.org/jira/browse/BEAM-13049
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing-tpcds
>Reporter: Alexey Romanenko
>Assignee: Alexey Romanenko
>Priority: P2
>  Labels: benchmark
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-13049) TPC-DS: Add Grafana dashboard

2022-02-15 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-13049:

Labels: benchmark  (was: stale-assigned)

> TPC-DS: Add Grafana dashboard
> -
>
> Key: BEAM-13049
> URL: https://issues.apache.org/jira/browse/BEAM-13049
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing-tpcds
>Reporter: Alexey Romanenko
>Assignee: Alexey Romanenko
>Priority: P2
>  Labels: benchmark
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (BEAM-13854) Document casting trick for Avro value serializer in KafkaIO

2022-02-15 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko reassigned BEAM-13854:
---

Assignee: Matt Casters

> Document casting trick for Avro value serializer in KafkaIO
> ---
>
> Key: BEAM-13854
> URL: https://issues.apache.org/jira/browse/BEAM-13854
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Affects Versions: 2.36.0
>Reporter: Matt Casters
>Assignee: Matt Casters
>Priority: P3
>  Labels: pull-request-available
> Fix For: 2.38.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Consider we want to write Avro values to Kafka with for example the following 
> code:
> {code:java}
> KafkaIO.Write stringsToKafka =
>  KafkaIO.write()
>   .withBootstrapServers(bootstrapServers)
>   .withTopic(topic)
>   .withKeySerializer(StringSerializer.class)
>   .withValueSerializer(KafkaAvroSerializer.class)
>   .withProducerConfigUpdates(producerConfigUpdates);{code}
>  The KafkaAvroSerializer.class argument can't be passed directly, as would 
> normally be the case with the producer option: 
> value.serializer=io.confluent.kafka.serializers.KafkaAvroSerializer
> So the question is which class we should pass, or how to cast it. IntelliJ 
> IDEA suggests a cast which doesn't compile.
> In the end the answer is simply:
> {code:java}
>   .withValueSerializer((Class)KafkaAvroSerializer.class) {code}
> I think it's worth documenting this little trick more clearly in the Javadoc 
> of KafkaIO to prevent others from bumping into the same issue. 
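To make the trick concrete, here is a minimal sketch under assumed placeholders 
(broker address, topic name), assuming Confluent's kafka-avro-serializer is on 
the classpath. The raw cast works because withValueSerializer expects a 
Class<? extends Serializer<V>> while KafkaAvroSerializer implements 
Serializer<Object>:

{code:java}
KafkaIO.Write<String, GenericRecord> writeToKafka =
    KafkaIO.<String, GenericRecord>write()
        .withBootstrapServers("localhost:9092")  // placeholder broker
        .withTopic("avro-topic")                 // placeholder topic
        .withKeySerializer(StringSerializer.class)
        // Unchecked raw cast silences the generics mismatch described above.
        .withValueSerializer((Class) KafkaAvroSerializer.class);
{code}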



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-13854) Document casting trick for Avro value serializer in KafkaIO

2022-02-15 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-13854:

Fix Version/s: 2.38.0
   (was: 2.37.0)
   Resolution: Fixed
   Status: Resolved  (was: Open)

> Document casting trick for Avro value serializer in KafkaIO
> ---
>
> Key: BEAM-13854
> URL: https://issues.apache.org/jira/browse/BEAM-13854
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Affects Versions: 2.36.0
>Reporter: Matt Casters
>Priority: P3
>  Labels: pull-request-available
> Fix For: 2.38.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Consider we want to write Avro values to Kafka with for example the following 
> code:
> {code:java}
> KafkaIO.Write stringsToKafka =
>  KafkaIO.write()
>   .withBootstrapServers(bootstrapServers)
>   .withTopic(topic)
>   .withKeySerializer(StringSerializer.class)
>   .withValueSerializer(KafkaAvroSerializer.class)
>   .withProducerConfigUpdates(producerConfigUpdates);{code}
>  The KafkaAvroSerializer.class argument can't be passed directly, as would 
> normally be the case with the producer option: 
> value.serializer=io.confluent.kafka.serializers.KafkaAvroSerializer
> So the question is which class we should pass, or how to cast it. IntelliJ 
> IDEA suggests a cast which doesn't compile.
> In the end the answer is simply:
> {code:java}
>   .withValueSerializer((Class)KafkaAvroSerializer.class) {code}
> I think it's worth documenting this little trick more clearly in the Javadoc 
> of KafkaIO to prevent others from bumping into the same issue. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (BEAM-13854) Document casting trick for Avro value serializer in KafkaIO

2022-02-15 Thread Alexey Romanenko (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-13854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17492603#comment-17492603
 ] 

Alexey Romanenko commented on BEAM-13854:
-

Well, I think in the short term we may just document this casting trick. In 
the long term, we need to discuss how to do it better (or whether to do it at 
all =) ). 

[~mcasters] Are you going to create a PR based on the diff that you posted 
above? 

> Document casting trick for Avro value serializer in KafkaIO
> ---
>
> Key: BEAM-13854
> URL: https://issues.apache.org/jira/browse/BEAM-13854
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Affects Versions: 2.36.0
>Reporter: Matt Casters
>Priority: P3
> Fix For: 2.37.0
>
>
> Consider we want to write Avro values to Kafka with for example the following 
> code:
> {code:java}
> KafkaIO.Write stringsToKafka =
>  KafkaIO.write()
>   .withBootstrapServers(bootstrapServers)
>   .withTopic(topic)
>   .withKeySerializer(StringSerializer.class)
>   .withValueSerializer(KafkaAvroSerializer.class)
>   .withProducerConfigUpdates(producerConfigUpdates);{code}
>  The KafkaAvroSerializer.class argument can't be passed directly, as would 
> normally be the case with the producer option: 
> value.serializer=io.confluent.kafka.serializers.KafkaAvroSerializer
> So the question is which class we should pass, or how to cast it. IntelliJ 
> IDEA suggests a cast which doesn't compile.
> In the end the answer is simply:
> {code:java}
>   .withValueSerializer((Class)KafkaAvroSerializer.class) {code}
> I think it's worth documenting this little trick more clearly in the Javadoc 
> of KafkaIO to prevent others from bumping into the same issue. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (BEAM-13891) Bump spark3 dependencies to 3.2.1

2022-02-10 Thread Alexey Romanenko (Jira)
Alexey Romanenko created BEAM-13891:
---

 Summary: Bump spark3 dependencies to 3.2.1
 Key: BEAM-13891
 URL: https://issues.apache.org/jira/browse/BEAM-13891
 Project: Beam
  Issue Type: Improvement
  Components: dependencies, runner-spark
Reporter: Alexey Romanenko






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (BEAM-13854) Document casting trick for Avro value serializer in KafkaIO

2022-02-09 Thread Alexey Romanenko (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-13854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17489739#comment-17489739
 ] 

Alexey Romanenko commented on BEAM-13854:
-

[~mosche] Well, it was developed long before me, so I can only guess why it is 
so. Maybe BEAM-1573 and the related PR can help?

[~rangadi] Could you give us some insights on this since, iirc, you reviewed 
that Jira back in the day? 

> Document casting trick for Avro value serializer in KafkaIO
> ---
>
> Key: BEAM-13854
> URL: https://issues.apache.org/jira/browse/BEAM-13854
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Affects Versions: 2.36.0
>Reporter: Matt Casters
>Priority: P3
> Fix For: 2.37.0
>
>
> Consider we want to write Avro values to Kafka with for example the following 
> code:
> {code:java}
> KafkaIO.Write stringsToKafka =
>  KafkaIO.write()
>   .withBootstrapServers(bootstrapServers)
>   .withTopic(topic)
>   .withKeySerializer(StringSerializer.class)
>   .withValueSerializer(KafkaAvroSerializer.class)
>   .withProducerConfigUpdates(producerConfigUpdates);{code}
>  The KafkaAvroSerializer.class argument can't be passed directly, as would 
> normally be the case with the producer option: 
> value.serializer=io.confluent.kafka.serializers.KafkaAvroSerializer
> So the question is which class we should pass, or how to cast it. IntelliJ 
> IDEA suggests a cast which doesn't compile.
> In the end the answer is simply:
> {code:java}
>   .withValueSerializer((Class)KafkaAvroSerializer.class) {code}
> I think it's worth documenting this little trick more clearly in the Javadoc 
> of KafkaIO to prevent others from bumping into the same issue. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-13246) Add support for S3 Bucket Key at the object level (AWS SDK v2)

2022-02-08 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-13246:

Fix Version/s: 2.37.0
   Resolution: Fixed
   Status: Resolved  (was: In Progress)

> Add support for S3 Bucket Key at the object level (AWS SDK v2)
> --
>
> Key: BEAM-13246
> URL: https://issues.apache.org/jira/browse/BEAM-13246
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-aws
>Reporter: Moritz Mack
>Assignee: Moritz Mack
>Priority: P2
>  Labels: aws, aws-s3
> Fix For: 2.37.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-13839) Upgrade zstd-jni to version 1.5.2-1

2022-02-08 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-13839:

Affects Version/s: (was: 2.37.0)

> Upgrade zstd-jni to version 1.5.2-1
> ---
>
> Key: BEAM-13839
> URL: https://issues.apache.org/jira/browse/BEAM-13839
> Project: Beam
>  Issue Type: Improvement
>  Components: dependencies, sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: P3
> Fix For: 2.37.0
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> This upgrade brings the latest improvements of zstd and fixes errors that 
> were breaking Beam Java SDK Core tests on Mac M1:
> {{java.lang.UnsatisfiedLinkError: no zstd-jni in java.library.path}}
> {{Unsupported OS/arch, cannot find /darwin/aarch64/libzstd-jni.dylib or load 
> zstd-jni from system libraries. Please try building from source the jar or 
> providing libzstd-jni in your system.}}
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-13839) Upgrade zstd-jni to version 1.5.2-1

2022-02-08 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-13839:

Fix Version/s: 2.37.0
   Resolution: Fixed
   Status: Resolved  (was: Open)

> Upgrade zstd-jni to version 1.5.2-1
> ---
>
> Key: BEAM-13839
> URL: https://issues.apache.org/jira/browse/BEAM-13839
> Project: Beam
>  Issue Type: Improvement
>  Components: dependencies, sdk-java-core
>Affects Versions: 2.37.0
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: P3
> Fix For: 2.37.0
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> This upgrade brings the latest improvements of zstd and fixes errors that 
> were breaking Beam Java SDK Core tests on Mac M1:
> {{java.lang.UnsatisfiedLinkError: no zstd-jni in java.library.path}}
> {{Unsupported OS/arch, cannot find /darwin/aarch64/libzstd-jni.dylib or load 
> zstd-jni from system libraries. Please try building from source the jar or 
> providing libzstd-jni in your system.}}
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (BEAM-13203) Potential data loss when using SnsIO.writeAsync

2022-02-04 Thread Alexey Romanenko (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-13203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17487118#comment-17487118
 ] 

Alexey Romanenko commented on BEAM-13203:
-

Can we close this one once it's deprecated?

> Potential data loss when using SnsIO.writeAsync
> ---
>
> Key: BEAM-13203
> URL: https://issues.apache.org/jira/browse/BEAM-13203
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-aws
>Reporter: Moritz Mack
>Assignee: Moritz Mack
>Priority: P1
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This needs to be investigated: reading the code suggests we might be losing 
> data under certain conditions, e.g. when terminating the pipeline. The async 
> processing model here is far too simplistic.
> The bundle won't ever know about pending writes and won't block to wait for 
> any such operation. In the same way, exceptions are thrown into nowhere. Test 
> cases don't capture this, as they operate on completed futures only (so 
> exceptions in the callbacks get thrown on the thread of processElement).
> {code:java}
> client.publish(publishRequest).whenComplete((response, ex) -> {
>   if (ex == null) {
> SnsResponse snsResponse = SnsResponse.of(context.element(), response);
> context.output(snsResponse);
>   } else {
> LOG.error("Error while publishing request to SNS", ex);
> throw new SnsWriteException("Error while publishing request to SNS", ex);
>   }
> }); {code}
> Also, this entirely removes backpressure from a stream. When used with a much 
> faster source we will continue to accumulate more and more memory as the 
> number of concurrent pending async operations is not limited.
> Spotify's scio contains a 
> [JavaAsyncDoFn|https://github.com/spotify/scio/blob/main/scio-core/src/main/java/com/spotify/scio/transforms/JavaAsyncDoFn.java]
>  that illustrates how it can be done.
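A hypothetical sketch of the usual remedy, not necessarily the eventual fix: 
keep the futures returned by the async client and block in @FinishBundle, so 
failures surface before the bundle commits (the pending list and field names 
are illustrative):

{code:java}
private final List<CompletableFuture<PublishResponse>> pending = new ArrayList<>();

@ProcessElement
public void processElement(ProcessContext context) {
  // Keep the future instead of firing and forgetting.
  pending.add(client.publish(publishRequest));
}

@FinishBundle
public void finishBundle() {
  // join() rethrows any publish failure instead of losing it in a callback;
  // capping pending.size() would also restore backpressure.
  CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
  pending.clear();
}
{code}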



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-13563) Generalize AWS client provider to be independent of client type

2022-02-02 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-13563:

Fix Version/s: 2.37.0
   Resolution: Fixed
   Status: Resolved  (was: In Progress)

> Generalize AWS client provider to be independent of client type
> ---
>
> Key: BEAM-13563
> URL: https://issues.apache.org/jira/browse/BEAM-13563
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-aws
>Reporter: Moritz Mack
>Assignee: Moritz Mack
>Priority: P3
>  Labels: aws-sdk-v2
> Fix For: 2.37.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently each AWS IO uses its own client provider, in some cases also 
> separate ones for sync and async clients.
> Besides adding lots of boilerplate code across these IOs, this makes it 
> impossible to switch to an async implementation without breaking APIs. 
> The approach below would require only one common client provider to build all 
> kinds of clients:
> {code:java}
> public <BuilderT extends AwsClientBuilder<BuilderT, ClientT>, ClientT> 
> ClientT buildClient(BuilderT builder) {
>   if (endpoint != null) {
> builder.endpointOverride(URI.create(endpoint));
>   }
>   return builder
>   .credentialsProvider(credentialsProvider)
>   .region(Region.of(region))
>   .build();
> }
> buildClient(DynamoDbClient.builder());
> buildClient(DynamoDbAsyncClient.builder());
> buildClient(S3Client.builder());
> buildClient(S3AsyncClient.builder());
> ...{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work started] (BEAM-13563) Generalize AWS client provider to be independent of client type

2022-02-02 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-13563 started by Moritz Mack.
--
> Generalize AWS client provider to be independent of client type
> ---
>
> Key: BEAM-13563
> URL: https://issues.apache.org/jira/browse/BEAM-13563
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-aws
>Reporter: Moritz Mack
>Assignee: Moritz Mack
>Priority: P3
>  Labels: aws-sdk-v2
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently each AWS IO uses its own client provider, in some cases also 
> separate ones for sync and async clients.
> Besides adding lots of boilerplate code across these IOs, this makes it 
> impossible to switch to an async implementation without breaking APIs. 
> The approach below would require only one common client provider to build all 
> kinds of clients:
> {code:java}
> public <BuilderT extends AwsClientBuilder<BuilderT, ClientT>, ClientT> 
> ClientT buildClient(BuilderT builder) {
>   if (endpoint != null) {
> builder.endpointOverride(URI.create(endpoint));
>   }
>   return builder
>   .credentialsProvider(credentialsProvider)
>   .region(Region.of(region))
>   .build();
> }
> buildClient(DynamoDbClient.builder());
> buildClient(DynamoDbAsyncClient.builder());
> buildClient(S3Client.builder());
> buildClient(S3AsyncClient.builder());
> ...{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (BEAM-13563) Generalize AWS client provider to be independent of client type

2022-02-02 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko reassigned BEAM-13563:
---

Assignee: Moritz Mack

> Generalize AWS client provider to be independent of client type
> ---
>
> Key: BEAM-13563
> URL: https://issues.apache.org/jira/browse/BEAM-13563
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-aws
>Reporter: Moritz Mack
>Assignee: Moritz Mack
>Priority: P3
>  Labels: aws-sdk-v2
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently each AWS IO uses its own client provider, in some cases also 
> separate ones for sync and async clients.
> Besides adding lots of boilerplate code across these IOs, this makes it 
> impossible to switch to an async implementation without breaking APIs. 
> The approach below would require only one common client provider to build all 
> kinds of clients:
> {code:java}
> public <BuilderT extends AwsClientBuilder<BuilderT, ClientT>, ClientT> 
> ClientT buildClient(BuilderT builder) {
>   if (endpoint != null) {
> builder.endpointOverride(URI.create(endpoint));
>   }
>   return builder
>   .credentialsProvider(credentialsProvider)
>   .region(Region.of(region))
>   .build();
> }
> buildClient(DynamoDbClient.builder());
> buildClient(DynamoDbAsyncClient.builder());
> buildClient(S3Client.builder());
> buildClient(S3AsyncClient.builder());
> ...{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (BEAM-13731) FhirIO: Add support for BATCH bundle errors

2022-02-02 Thread Alexey Romanenko (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-13731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17485652#comment-17485652
 ] 

Alexey Romanenko commented on BEAM-13731:
-

Since the PR was merged, can we close this Jira?

> FhirIO: Add support for BATCH bundle errors
> ---
>
> Key: BEAM-13731
> URL: https://issues.apache.org/jira/browse/BEAM-13731
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-healthcare
>Reporter: Milena Bukal
>Assignee: Milena Bukal
>Priority: P2
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> BATCH bundles return a 200 status code even if entries fail, which results in 
> us only incrementing the success counter and not the failure counters, even 
> if all entries fail to be written to the FHIR store. (Meanwhile, transaction 
> bundles - which are all FhirIO has been tested with - return a non-20X HTTP 
> response if _any_ of the entries don't succeed.)
> Therefore, we need to add parsing and metrics around this to be able to 
> support bundles of type BATCH.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (BEAM-13298) Add "withResults()" for KafkaIO.write

2022-01-31 Thread Alexey Romanenko (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-13298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17484832#comment-17484832
 ] 

Alexey Romanenko commented on BEAM-13298:
-

[~rdahal] Thanks for the clarification, I think you have a point and I fully 
agree with you. We need to take this into account while developing this 
feature. 

Regarding the output collection, it may also contain the results of the write 
operation for every record, if we are talking about a Kafka sink, and some 
other meta information for other types of sinks.

> Add "withResults()" for KafkaIO.write
> -
>
> Key: BEAM-13298
> URL: https://issues.apache.org/jira/browse/BEAM-13298
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-ideas, io-java-kafka
> Environment: Production
>Reporter: Ranjan Dahal
>Assignee: Wei Fong Hsia
>Priority: P2
>  Labels: KafkaIO
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> I am looking at a use case where we have to wait until the Kafka Write 
> operation is completed before we can move forward with another transform. 
> Currently, JdbcIO supports withResults(), which waits for the previous 
> transform to complete as part of Wait.on(Signal) and moves on to the next. 
> Similarly, it would be very beneficial to have this capability on KafkaIO 
> (and others like PubSubIO, BigQueryIO, etc.). 
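For reference, a minimal sketch of the existing JdbcIO pattern the issue asks 
to replicate (the pipeline values, statement, and DownstreamFn are 
illustrative):

{code:java}
PCollection<Void> signal =
    rows.apply(JdbcIO.<Row>write()
        .withDataSourceConfiguration(dataSourceConfiguration)
        .withStatement("INSERT INTO t VALUES (?)")
        .withPreparedStatementSetter(setter)
        .withResults());

// Wait.on defers the downstream transform until the write has completed.
otherInput
    .apply(Wait.on(signal))
    .apply(ParDo.of(new DownstreamFn()));
{code}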



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-13245) Generalize S3FileSystem to support multiple URI schemes (AWS SDK v2)

2022-01-31 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-13245:

Fix Version/s: 2.37.0
   Resolution: Fixed
   Status: Resolved  (was: Open)

> Generalize S3FileSystem to support multiple URI schemes (AWS SDK v2)
> 
>
> Key: BEAM-13245
> URL: https://issues.apache.org/jira/browse/BEAM-13245
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-aws
>Reporter: Moritz Mack
>Assignee: Matt Rudary
>Priority: P2
>  Labels: aws, aws-s3
> Fix For: 2.37.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> S3FileSystem for SDK v2 should be generalized the same way as done for v1 
> here: [https://github.com/apache/beam/pull/15036]
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (BEAM-13298) Add "withResults()" for KafkaIO.write

2022-01-28 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko reassigned BEAM-13298:
---

Assignee: Wei Fong Hsia  (was: Alexey Romanenko)

> Add "withResults()" for KafkaIO.write
> -
>
> Key: BEAM-13298
> URL: https://issues.apache.org/jira/browse/BEAM-13298
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-ideas, io-java-kafka
> Environment: Production
>Reporter: Ranjan Dahal
>Assignee: Wei Fong Hsia
>Priority: P2
>  Labels: KafkaIO
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> I am looking at a use case where we have to wait until the Kafka Write 
> operation is completed before we can move forward with another transform. 
> Currently, JdbcIO supports withResults(), which waits for the previous 
> transform to complete as part of Wait.on(Signal) and moves on to the next. 
> Similarly, it would be very beneficial to have this capability on KafkaIO 
> (and others like PubSubIO, BigQueryIO, etc.). 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (BEAM-13302) KafkaIO: Add support for outputting a PCollection

2022-01-28 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko reassigned BEAM-13302:
---

Assignee: Wei Fong Hsia

> KafkaIO: Add support for outputting a PCollection
> -
>
> Key: BEAM-13302
> URL: https://issues.apache.org/jira/browse/BEAM-13302
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Reporter: Wei Hsia
>Assignee: Wei Fong Hsia
>Priority: P3
>
> Have KafkaIO’s WriteRecord PTransform output a PCollection of an extensible 
> WriteResult type, similar to BigQuery’s 
> [WriteResult|https://beam.apache.org/releases/javadoc/2.6.0/org/apache/beam/sdk/io/gcp/bigquery/WriteResult.html]
>  output. 
> The original input, of [ProducerRecord 
> type|https://www.javadoc.io/doc/org.apache.kafka/kafka-clients/2.4.1/org/apache/kafka/clients/producer/ProducerRecord.html],
>  will be returned as part of the WriteResult (or similar type), with a getter 
> function. 
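For comparison, a minimal sketch of the existing BigQueryIO pattern this 
proposal mirrors (table reference and schema are placeholders):

{code:java}
WriteResult result =
    tableRows.apply(BigQueryIO.writeTableRows()
        .to("project:dataset.table")  // placeholder
        .withSchema(tableSchema));

// WriteResult exposes getters over the write outcome, e.g. rows that
// failed on streaming inserts.
PCollection<TableRow> failedRows = result.getFailedInserts();
{code}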



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (BEAM-13298) Add "withResults()" for KafkaIO.write

2022-01-28 Thread Alexey Romanenko (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-13298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17483865#comment-17483865
 ] 

Alexey Romanenko commented on BEAM-13298:
-

[~weifonghsia] Sounds similar to what I did for my implementation. Could you 
create a draft PR with your changes? It would be easier to discuss it there 
(and test it) than in a fork or any branch. 

Also, I can assign this Jira to you if you don't mind.

[~rdahal] Thanks for the update; though, some users want to have an output 
collection with write results too. Would you be interested in reviewing 
[~weifonghsia]'s PR? 

> Add "withResults()" for KafkaIO.write
> -
>
> Key: BEAM-13298
> URL: https://issues.apache.org/jira/browse/BEAM-13298
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-ideas, io-java-kafka
> Environment: Production
>Reporter: Ranjan Dahal
>Assignee: Alexey Romanenko
>Priority: P2
>  Labels: KafkaIO
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> I am looking at a use case where we have to wait until the Kafka Write 
> operation is completed before we can move forward with another transform. 
> Currently, JdbcIO supports withResults(), which waits for the previous 
> transform to complete as part of Wait.on(Signal) and moves on to the next. 
> Similarly, it would be very beneficial to have this capability on KafkaIO 
> (and others like PubSubIO, BigQueryIO, etc.). 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (BEAM-13298) Add "withResults()" for KafkaIO.write

2022-01-26 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko reassigned BEAM-13298:
---

Assignee: Alexey Romanenko

> Add "withResults()" for KafkaIO.write
> -
>
> Key: BEAM-13298
> URL: https://issues.apache.org/jira/browse/BEAM-13298
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-ideas, io-java-kafka
> Environment: Production
>Reporter: Ranjan Dahal
>Assignee: Alexey Romanenko
>Priority: P2
>  Labels: KafkaIO
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> I am looking at a use case where we have to wait until the Kafka Write 
> operation is completed before we can move forward with another transform. 
> Currently, JdbcIO supports withResults(), which waits for the previous 
> transform to complete as part of Wait.on(Signal) and moves on to the next. 
> Similarly, it would be very beneficial to have this capability on KafkaIO 
> (and others like PubSubIO, BigQueryIO, etc.). 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work started] (BEAM-13298) Add "withResults()" for KafkaIO.write

2022-01-26 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-13298 started by Alexey Romanenko.
---
> Add "withResults()" for KafkaIO.write
> -
>
> Key: BEAM-13298
> URL: https://issues.apache.org/jira/browse/BEAM-13298
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-ideas, io-java-kafka
> Environment: Production
>Reporter: Ranjan Dahal
>Assignee: Alexey Romanenko
>Priority: P2
>  Labels: KafkaIO
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> I am looking at a use case where we have to wait until the Kafka Write 
> operation is completed before we can move forward with another transform. 
> Currently, JdbcIO supports withResults(), which waits for the previous 
> transform to complete as part of Wait.on(Signal) and moves on to the next. 
> Similarly, it would be very beneficial to have this capability on KafkaIO 
> (and others like PubSubIO, BigQueryIO, etc.). 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-13746) Fix nullability check in AwsModule for nullable fields in SSECustomerKey

2022-01-26 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-13746:

Fix Version/s: 2.37.0
   Resolution: Fixed
   Status: Resolved  (was: Triage Needed)

> Fix nullability check in AwsModule for nullable fields in SSECustomerKey
> 
>
> Key: BEAM-13746
> URL: https://issues.apache.org/jira/browse/BEAM-13746
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-aws
>Reporter: Moritz Mack
>Assignee: Moritz Mack
>Priority: P2
>  Labels: aws-sdk-v2
> Fix For: 2.37.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-8807) Add integration tests for SnsIO

2022-01-26 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-8807:
---
Fix Version/s: 2.37.0
   Resolution: Fixed
   Status: Resolved  (was: Open)

> Add integration tests for SnsIO
> ---
>
> Key: BEAM-8807
> URL: https://issues.apache.org/jira/browse/BEAM-8807
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-aws
>Reporter: Alexey Romanenko
>Assignee: Moritz Mack
>Priority: P3
>  Labels: aws, aws-sdk-v1, aws-sdk-v2
> Fix For: 2.37.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-13510) SQS reader retries on invalid (expired) receipt handles (AWS SDK v2)

2022-01-25 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-13510:

Fix Version/s: 2.37.0
   Status: Resolved  (was: Resolved)

> SQS reader retries on invalid (expired) receipt handles (AWS SDK v2)
> 
>
> Key: BEAM-13510
> URL: https://issues.apache.org/jira/browse/BEAM-13510
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-aws
>Reporter: Moritz Mack
>Assignee: Moritz Mack
>Priority: P2
>  Labels: SQS, aws-sdk-v2
> Fix For: 2.37.0
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> The extension of visibility timeouts as well as deletes with invalid 
> (expired) receipt handles will never succeed and therefore cannot be retried. 
> Investigate if this is also an issue in SDK v1.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-13653) Improved handling of topicArn in SnsIO.write

2022-01-24 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-13653:

Fix Version/s: 2.37.0
   Resolution: Fixed
   Status: Resolved  (was: In Progress)

> Improved handling of topicArn in SnsIO.write 
> -
>
> Key: BEAM-13653
> URL: https://issues.apache.org/jira/browse/BEAM-13653
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-aws
>Reporter: Moritz Mack
>Assignee: Moritz Mack
>Priority: P2
>  Labels: aws, aws-sdk-v2
> Fix For: 2.37.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Usage of SnsIO.write is rather unintuitive at the moment.
> topicArn is a required configuration, and during expansion the existence of 
> the topic is validated.
> However, the user also has to provide a function to build an SNS publish 
> request. The topicArn for that publish request has to be set as well, but can 
> be different from the one configured and validated in the writer. 
> This is confusing and makes any validation pointless.
> Also, when validating, the SNS client instance is not closed.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-13298) Add "withResults()" for KafkaIO.write

2022-01-21 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-13298:

Summary: Add "withResults()" for KafkaIO.write  (was: JdbcIO like 
withResults() for KafkaIO)

> Add "withResults()" for KafkaIO.write
> -
>
> Key: BEAM-13298
> URL: https://issues.apache.org/jira/browse/BEAM-13298
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-ideas, io-java-kafka
> Environment: Production
>Reporter: Ranjan Dahal
>Priority: P2
>  Labels: KafkaIO
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> I am looking at a use case where we have to wait until the Kafka Write 
> operation is completed before we can move forward with another transform. 
> Currently, JdbcIO supports withResults(), which waits for the previous 
> transform to complete as part of Wait.on(Signal) and moves on to the next. 
> Similarly, it would be very beneficial to have this capability on KafkaIO 
> (and others like PubSubIO, BigQueryIO, etc.). 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work started] (BEAM-13653) Improved handling of topicArn in SnsIO.write

2022-01-21 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-13653 started by Moritz Mack.
--
> Improved handling of topicArn in SnsIO.write 
> -
>
> Key: BEAM-13653
> URL: https://issues.apache.org/jira/browse/BEAM-13653
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-aws
>Reporter: Moritz Mack
>Assignee: Moritz Mack
>Priority: P2
>  Labels: aws, aws-sdk-v2
>
> Usage of SnsIO.write is rather unintuitive at the moment.
> topicArn is a required configuration, and during expansion the existence of 
> the topic is validated.
> However, the user also has to provide a function to build an SNS publish 
> request. The topicArn for that publish request has to be set as well, but can 
> be different from the one configured and validated in the writer. 
> This is confusing and makes any validation pointless.
> Also, when validating, the SNS client instance is not closed.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (BEAM-13353) KafkaIO should raise an error if both .withReadCommitted() and .commitOffsetsInFinalize() are used

2022-01-18 Thread Alexey Romanenko (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-13353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17477965#comment-17477965
 ] 

Alexey Romanenko commented on BEAM-13353:
-

[~lcwik] Could you add some details on why it's needed?

> KafkaIO should raise an error if both .withReadCommitted() and 
> .commitOffsetsInFinalize() are used
> --
>
> Key: BEAM-13353
> URL: https://issues.apache.org/jira/browse/BEAM-13353
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: P2
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-8806) Add integration tests for SqsIO

2022-01-18 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-8806:
---
Fix Version/s: 2.37.0
   Resolution: Fixed
   Status: Resolved  (was: In Progress)

> Add integration tests for SqsIO
> ---
>
> Key: BEAM-8806
> URL: https://issues.apache.org/jira/browse/BEAM-8806
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-aws
>Reporter: Alexey Romanenko
>Assignee: Moritz Mack
>Priority: P3
>  Labels: SQS, aws-sdk-v1, aws-sdk-v2
> Fix For: 2.37.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (BEAM-13631) SQS read IO requires deterministic coder for SQS message to work in batch mode.

2022-01-18 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-13631:

Fix Version/s: 2.37.0
   Resolution: Fixed
   Status: Resolved  (was: In Progress)

> SQS read IO requires deterministic coder for SQS message to work in batch 
> mode.
> 
>
> Key: BEAM-13631
> URL: https://issues.apache.org/jira/browse/BEAM-13631
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-aws
>Reporter: Moritz Mack
>Assignee: Moritz Mack
>Priority: P2
>  Labels: SQS, aws-sdk-v1
> Fix For: 2.37.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently the SQS read IO uses SerializableCoder.of(Message.class), which 
> isn't deterministic. This may cause issues when used in batch mode (based on 
> BoundedReadFromUnboundedSource). The mutation detector will throw in such a 
> case:
> {code:java}
> Jan 10, 2022 11:37:05 AM 
> org.apache.beam.sdk.util.MutationDetectors$CodedValueMutationDetector 
> verifyUnmodifiedThrowingCheckedExceptions
> WARNING: Coder of type class org.apache.beam.sdk.coders.SerializableCoder has 
> a #structuralValue method which does not return true when the encoding of the 
> elements is equal. Element 
> Shard{source=org.apache.beam.sdk.io.aws.sqs.SqsUnboundedSource@5f19451c, 
> maxNumRecords=1, maxReadTime=null}
> Coder of type class org.apache.beam.sdk.coders.SerializableCoder has a 
> #structuralValue method which does not return true when the encoding of the 
> elements is equal. Element 
> Shard{source=org.apache.beam.sdk.io.aws.sqs.SqsUnboundedSource@5f19451c, 
> maxNumRecords=1, maxReadTime=null}
> Exception in thread "main" org.apache.beam.sdk.util.IllegalMutationException: 
> PTransform SqsIO.Read/Read(SqsUnboundedSource)/Read/ParMultiDo(Read) mutated 
> value ValueWithRecordId{id=[98, 55, 50, 51, 56, 51, 102, 57, 45, 97, 52, 100, 
> 56, 45, 52, 99, 100, 50, 45, 97, 49, 55, 49, 45, 48, 57, 100, 48, 100, 53, 
> 50, 51, 99, 50, 54, 51], value={MessageId: 
> b72383f9-a4d8-4cd2-a171-09d0d523c263,ReceiptHandle: AQEBj2FXnTVQ==,MD5OfBody: 
> 38db8cbd101e4c1cfbf47e31c2aaab75,Body: {"test-key": "test-value"},Attributes: 
> {SentTimestamp=1641794775474},MessageAttributes: 
> {requestTimeMsSinceEpoch={StringValue: 1641794824800,StringListValues: 
> [],BinaryListValues: [], after it was output (new value was 
> ValueWithRecordId{id=[98, 55, 50, 51, 56, 51, 102, 57, 45, 97, 52, 100, 56, 
> 45, 52, 99, 100, 50, 45, 97, 49, 55, 49, 45, 48, 57, 100, 48, 100, 53, 50, 
> 51, 99, 50, 54, 51], value={MessageId: 
> b72383f9-a4d8-4cd2-a171-09d0d523c263,ReceiptHandle: 
> DeVRF8vQATm1f+rHIvR3eaejlRHksL1R7WE4zDT7lSwdIs9gJCYKXFXnTVQ==,MD5OfBody: 
> 38db8cbd101e4c1cfbf47e31c2aaab75,Body: {"test-key": "test-value"},Attributes: 
> {SentTimestamp=1641794775474},MessageAttributes: 
> {requestTimeMsSinceEpoch={StringValue: 1641794824800,StringListValues: 
> [],BinaryListValues: [],). Values must not be mutated in any way after 
> being output.
> at 
> org.apache.beam.runners.direct.ImmutabilityCheckingBundleFactory$ImmutabilityEnforcingBundle.commit(ImmutabilityCheckingBundleFactory.java:137)
> at 
> org.apache.beam.runners.direct.EvaluationContext.commitBundles(EvaluationContext.java:231)
> at 
> org.apache.beam.runners.direct.EvaluationContext.handleResult(EvaluationContext.java:163)
> at 
> org.apache.beam.runners.direct.QuiescenceDriver$TimerIterableCompletionCallback.handleResult(QuiescenceDriver.java:292)
> at 
> org.apache.beam.runners.direct.DirectTransformExecutor.finishBundle(DirectTransformExecutor.java:194)
> at 
> org.apache.beam.runners.direct.DirectTransformExecutor.run(DirectTransformExecutor.java:131)
> at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: org.apache.beam.sdk.util.IllegalMutationException: Value 
> ValueWithRecordId{id=[98, 55, 50, 51, 56, 51, 102, 57, 45, 97, 52, 100, 56, 
> 45, 52, 99, 100, 50, 45, 97, 49, 55, 49, 45, 48, 57, 100, 48, 100, 53, 50, 
> 51, 99, 50, 54, 51], value={MessageId: 
> b72383f9-a4d8-4cd2-a171-09d0d523c263,ReceiptHandle: AQEBj2KQ==,MD5OfBody: 
> 38db8cbd101e4c1cfbf47e31c2aaab75,Body: {"test-key": "test-value"},Attributes: 
> {SentTimestamp=1641794775474},MessageAttributes: 
> {requestTimeMsSinceEpoch={StringValue: 1641794824800,StringListValues: 
> [],BinaryListValues: [], mutated illegall

[jira] [Updated] (BEAM-13400) JDBC IO does not support UUID and JSONB PostgreSQL types and OTHER JDBC types in general

2022-01-14 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-13400:

Fix Version/s: 2.37.0
   Resolution: Fixed
   Status: Resolved  (was: In Progress)

> JDBC IO does not support UUID and JSONB PostgreSQL types and OTHER JDBC 
> types in general
> -
>
> Key: BEAM-13400
> URL: https://issues.apache.org/jira/browse/BEAM-13400
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Affects Versions: 2.34.0
>Reporter: Vitaly Ivanov
>Assignee: Vitaly Ivanov
>Priority: P2
> Fix For: 2.37.0
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> The following exception occurs when trying to read rows from a table which 
> contains fields of type JSONB and UUID. They have JDBCType OTHER.
> {noformat}
> java.lang.UnsupportedOperationException: Converting OTHER to Beam schema type 
> is not supported
>     at 
> org.apache.beam.sdk.io.jdbc.SchemaUtil.jdbcTypeToBeamFieldConverter(SchemaUtil.java:161)
>     at 
> org.apache.beam.sdk.io.jdbc.SchemaUtil.toBeamSchema(SchemaUtil.java:172)
>     at 
> org.apache.beam.sdk.io.jdbc.JdbcIO$ReadRows.inferBeamSchema(JdbcIO.java:655)
>     at org.apache.beam.sdk.io.jdbc.JdbcIO$ReadRows.expand(JdbcIO.java:632)
>     at org.apache.beam.sdk.io.jdbc.JdbcIO$ReadRows.expand(JdbcIO.java:551)
>     at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:548)
>     at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:499)
>     at org.apache.beam.sdk.values.PBegin.apply(PBegin.java:56){noformat}
> I suppose the issue is quite important because the UUID type is widespread in 
> PostgreSQL.
>  
> This is relevant for Oracle BLOB as well.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work started] (BEAM-13631) SQS read IO requires deterministic coder for SQS message to work in batch mode.

2022-01-14 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-13631 started by Moritz Mack.
--
> SQS read IO requires deterministic coder for SQS message to work in batch 
> mode.
> 
>
> Key: BEAM-13631
> URL: https://issues.apache.org/jira/browse/BEAM-13631
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-aws
>Reporter: Moritz Mack
>Assignee: Moritz Mack
>Priority: P2
>  Labels: SQS, aws-sdk-v1
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently the SQS read IO uses SerializableCoder.of(Message.class), which 
> isn't deterministic. This may cause issues when used in batch mode (based on 
> BoundedReadFromUnboundedSource). The mutation detector will throw in such a 
> case:
> {code:java}
> Jan 10, 2022 11:37:05 AM 
> org.apache.beam.sdk.util.MutationDetectors$CodedValueMutationDetector 
> verifyUnmodifiedThrowingCheckedExceptions
> WARNING: Coder of type class org.apache.beam.sdk.coders.SerializableCoder has 
> a #structuralValue method which does not return true when the encoding of the 
> elements is equal. Element 
> Shard{source=org.apache.beam.sdk.io.aws.sqs.SqsUnboundedSource@5f19451c, 
> maxNumRecords=1, maxReadTime=null}
> Coder of type class org.apache.beam.sdk.coders.SerializableCoder has a 
> #structuralValue method which does not return true when the encoding of the 
> elements is equal. Element 
> Shard{source=org.apache.beam.sdk.io.aws.sqs.SqsUnboundedSource@5f19451c, 
> maxNumRecords=1, maxReadTime=null}
> Exception in thread "main" org.apache.beam.sdk.util.IllegalMutationException: 
> PTransform SqsIO.Read/Read(SqsUnboundedSource)/Read/ParMultiDo(Read) mutated 
> value ValueWithRecordId{id=[98, 55, 50, 51, 56, 51, 102, 57, 45, 97, 52, 100, 
> 56, 45, 52, 99, 100, 50, 45, 97, 49, 55, 49, 45, 48, 57, 100, 48, 100, 53, 
> 50, 51, 99, 50, 54, 51], value={MessageId: 
> b72383f9-a4d8-4cd2-a171-09d0d523c263,ReceiptHandle: AQEBj2FXnTVQ==,MD5OfBody: 
> 38db8cbd101e4c1cfbf47e31c2aaab75,Body: {"test-key": "test-value"},Attributes: 
> {SentTimestamp=1641794775474},MessageAttributes: 
> {requestTimeMsSinceEpoch={StringValue: 1641794824800,StringListValues: 
> [],BinaryListValues: [], after it was output (new value was 
> ValueWithRecordId{id=[98, 55, 50, 51, 56, 51, 102, 57, 45, 97, 52, 100, 56, 
> 45, 52, 99, 100, 50, 45, 97, 49, 55, 49, 45, 48, 57, 100, 48, 100, 53, 50, 
> 51, 99, 50, 54, 51], value={MessageId: 
> b72383f9-a4d8-4cd2-a171-09d0d523c263,ReceiptHandle: 
> DeVRF8vQATm1f+rHIvR3eaejlRHksL1R7WE4zDT7lSwdIs9gJCYKXFXnTVQ==,MD5OfBody: 
> 38db8cbd101e4c1cfbf47e31c2aaab75,Body: {"test-key": "test-value"},Attributes: 
> {SentTimestamp=1641794775474},MessageAttributes: 
> {requestTimeMsSinceEpoch={StringValue: 1641794824800,StringListValues: 
> [],BinaryListValues: [],). Values must not be mutated in any way after 
> being output.
> at 
> org.apache.beam.runners.direct.ImmutabilityCheckingBundleFactory$ImmutabilityEnforcingBundle.commit(ImmutabilityCheckingBundleFactory.java:137)
> at 
> org.apache.beam.runners.direct.EvaluationContext.commitBundles(EvaluationContext.java:231)
> at 
> org.apache.beam.runners.direct.EvaluationContext.handleResult(EvaluationContext.java:163)
> at 
> org.apache.beam.runners.direct.QuiescenceDriver$TimerIterableCompletionCallback.handleResult(QuiescenceDriver.java:292)
> at 
> org.apache.beam.runners.direct.DirectTransformExecutor.finishBundle(DirectTransformExecutor.java:194)
> at 
> org.apache.beam.runners.direct.DirectTransformExecutor.run(DirectTransformExecutor.java:131)
> at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: org.apache.beam.sdk.util.IllegalMutationException: Value 
> ValueWithRecordId{id=[98, 55, 50, 51, 56, 51, 102, 57, 45, 97, 52, 100, 56, 
> 45, 52, 99, 100, 50, 45, 97, 49, 55, 49, 45, 48, 57, 100, 48, 100, 53, 50, 
> 51, 99, 50, 54, 51], value={MessageId: 
> b72383f9-a4d8-4cd2-a171-09d0d523c263,ReceiptHandle: AQEBj2KQ==,MD5OfBody: 
> 38db8cbd101e4c1cfbf47e31c2aaab75,Body: {"test-key": "test-value"},Attributes: 
> {SentTimestamp=1641794775474},MessageAttributes: 
> {requestTimeMsSinceEpoch={StringValue: 1641794824800,StringListValues: 
> [],BinaryListValues: [], mutated illegally, new value was 
> ValueWithRecordId{id=[98, 55, 50, 51, 56, 51, 102, 57, 45, 97, 52, 100, 56, 
> 45, 52, 99, 100, 50, 4

[jira] [Work started] (BEAM-13175) Implement missing KinesisIO.Write for aws2

2022-01-13 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-13175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-13175 started by Moritz Mack.
--
> Implement missing KinesisIO.Write for aws2
> --
>
> Key: BEAM-13175
> URL: https://issues.apache.org/jira/browse/BEAM-13175
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-aws
>Reporter: Moritz Mack
>Assignee: Moritz Mack
>Priority: P2
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> The Kinesis Producer Library (KPL) isn't available for aws2. Hence, we cannot 
> trivially port the old KinesisIO.Write over.
> But at the same time, KPL also doesn't align with the ideas behind SDFs. So 
> it's a good opportunity to implement it properly.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (BEAM-13175) Implement missing KinesisIO.Write for aws2

2022-01-13 Thread Alexey Romanenko (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-13175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17475571#comment-17475571
 ] 

Alexey Romanenko commented on BEAM-13175:
-

Work in progress for this one.

> Implement missing KinesisIO.Write for aws2
> --
>
> Key: BEAM-13175
> URL: https://issues.apache.org/jira/browse/BEAM-13175
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-aws
>Reporter: Moritz Mack
>Assignee: Moritz Mack
>Priority: P2
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> The Kinesis Producer Library (KPL) isn't available for aws2. Hence, we cannot 
> trivially port the old KinesisIO.Write over.
> But at the same time, KPL also doesn't align with the ideas behind SDFs. So 
> it's a good opportunity to implement it properly.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (BEAM-10889) Document BatchElements

2022-01-13 Thread Alexey Romanenko (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17475570#comment-17475570
 ] 

Alexey Romanenko commented on BEAM-10889:
-

[~robertwb] Can we close this one?

> Document BatchElements
> --
>
> Key: BEAM-10889
> URL: https://issues.apache.org/jira/browse/BEAM-10889
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Robert Bradshaw
>Priority: P3
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> This could be part of 
> http://localhost:1313/documentation/transforms/python/aggregation/groupintobatches/
>  or a new page.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (BEAM-10889) Document BatchElements

2022-01-13 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko reassigned BEAM-10889:
---

Assignee: (was: Robert Bradshaw)

> Document BatchElements
> --
>
> Key: BEAM-10889
> URL: https://issues.apache.org/jira/browse/BEAM-10889
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Robert Bradshaw
>Priority: P3
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> This could be part of 
> http://localhost:1313/documentation/transforms/python/aggregation/groupintobatches/
>  or a new page.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

