Re: [ANNOUNCE] New Committer: Svetak Sundhar

2024-02-14 Thread John Casey via dev
Congrats Svetak!

On Wed, Feb 14, 2024 at 9:00 AM Ahmed Abualsaud 
wrote:

> Congrats Svetak!
>
> On 2024/02/14 02:05:02 Priyans Desai via dev wrote:
> > Congratulations Svetak!!
> >
> > On Tue, Feb 13, 2024 at 8:09 PM Chamikara Jayalath via dev <
> > dev@beam.apache.org> wrote:
> >
> > > Congrats Svetak!
> > >
> > > On Tue, Feb 13, 2024 at 4:39 PM Svetak Sundhar via dev <
> > > dev@beam.apache.org> wrote:
> > >
> > >> Thanks everyone!! Looking forward to the continued collaboration :)
> > >>
> > >>
> > >> Svetak Sundhar
> > >>
> > >>   Data Engineer
> > >> svetaksund...@google.com
> > >>
> > >>
> > >>
> > >> On Mon, Feb 12, 2024 at 9:58 PM Byron Ellis via dev <
> dev@beam.apache.org>
> > >> wrote:
> > >>
> > >>> Congrats Svetak!
> > >>>
> > >>> On Mon, Feb 12, 2024 at 6:57 PM Shunping Huang via dev <
> > >>> dev@beam.apache.org> wrote:
> > >>>
> >  Congratulations, Svetak!
> > 
> >  On Mon, Feb 12, 2024 at 9:50 PM XQ Hu via dev 
> >  wrote:
> > 
> > > Great job, Svetak! Thanks for all your contributions to Beam!!!
> > >
> > > On Mon, Feb 12, 2024 at 4:44 PM Valentyn Tymofieiev via dev <
> > > dev@beam.apache.org> wrote:
> > >
> > >> Congrats, Svetak!
> > >>
> > >> On Mon, Feb 12, 2024 at 11:20 AM Kenneth Knowles
> > >> wrote:
> > >>
> > >>> Hi all,
> > >>>
> > >>> Please join me and the rest of the Beam PMC in welcoming a new
> > >>> committer: Svetak Sundhar (sve...@apache.org).
> > >>>
> > >>> Svetak has been with Beam since 2021. Svetak has contributed
> code to
> > >>> many areas of Beam, including notebooks, Beam Quest, dataframes,
> and IOs.
> > >>> We also want to especially highlight the effort Svetak has put
> into
> > >>> improving Beam's documentation, participating in release
> validation, and
> > >>> evangelizing Beam.
> > >>>
> > >>> Considering his contributions to the project over this timeframe,
> > >>> the Beam PMC trusts Svetak with the responsibilities of a Beam
> committer.
> > >>> [1]
> > >>>
> > >>> Thank you Svetak! And we are looking to see more of your
> > >>> contributions!
> > >>>
> > >>> Kenn, on behalf of the Apache Beam PMC
> > >>>
> > >>> [1]
> > >>>
> > >>>
> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
> > >>>
> > >>
> >
>


ByteBuddy DoFnInvokers Write Up

2024-01-10 Thread John Casey via dev
The team at Google recently held an internal hackathon, and my hack
involved modifying how our ByteBuddy DoFnInvokers work. My hack didn't end
up going anywhere, but I learned a lot about how our code generation works.
It turns out we have no documentation or design docs about our code
generation, so I wrote up what I learned.

Please take a look, and let me know if I got anything wrong or if you are
looking for more detail.

s.apache.org/beam-bytebuddy-dofninvoker

John
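
For anyone who hasn't touched ByteBuddy before, the canonical "hello world"
of runtime class generation below shows the mechanism DoFnInvokers are built
on. This is a toy sketch, not Beam's actual invoker code:

    import static net.bytebuddy.matcher.ElementMatchers.named;

    import net.bytebuddy.ByteBuddy;
    import net.bytebuddy.implementation.FixedValue;

    public class ByteBuddyHello {
        public static void main(String[] args) throws Exception {
            // Define and load a subclass of Object whose toString() returns a
            // fixed value -- all generated at runtime, no .java file involved.
            Class<?> dynamicType = new ByteBuddy()
                .subclass(Object.class)
                .method(named("toString"))
                .intercept(FixedValue.value("Hello from generated code!"))
                .make()
                .load(ByteBuddyHello.class.getClassLoader())
                .getLoaded();
            System.out.println(dynamicType.getDeclaredConstructor().newInstance());
        }
    }

Beam's DoFnInvokers use this same API to generate, per DoFn class, an invoker
that calls the user's @ProcessElement (and friends) without reflection on the
hot path.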


Re: Issue #21005

2023-12-15 Thread John Casey via dev
Hi Asmita,

Those both make sense to me, feel free to go ahead.

I'll be happy to review your PR when it's ready.

On Thu, Dec 14, 2023 at 11:44 AM Asmita Mutgekar 
wrote:

> Hi Team,
>
> I have picked up issue #21005: Add documentation and improved errors for
> QueryFn in MongoDbIO.
> I did some initial analysis of the code and came up with the solution below:
> 1. For documentation, it's straightforward to list the classes that the
> method can handle, so I will be documenting the 2 classes that withQueryFn
> accepts.
> 2. For the split() method, I am thinking of adding a check for whether the
> object is an AggregationQuery, similar to the existing FindQuery case, with
> an "else" branch that throws an exception saying "unknown type" (a sketch
> follows below).
>
> Let me know if I am going in the correct direction.
>
> Thanks & Regards,
> Asmita
>
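
As referenced above, a minimal sketch of the split() check in point 2.
FindQuery and AggregationQuery are MongoDbIO's real query classes; the
surrounding method and the queryFn variable are illustrative assumptions,
not the actual MongoDbIO code:

    import org.apache.beam.sdk.io.mongodb.AggregationQuery;
    import org.apache.beam.sdk.io.mongodb.FindQuery;

    class QueryFnSplitCheck {
        static void checkQueryFn(Object queryFn) {
            if (queryFn instanceof FindQuery) {
                // existing splitting logic for find() queries
            } else if (queryFn instanceof AggregationQuery) {
                // analogous splitting logic for aggregation pipelines
            } else {
                throw new IllegalArgumentException(
                    "Unknown query type: " + queryFn.getClass().getName());
            }
        }
    }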


Re: Upgrading Avro dependencies

2023-11-15 Thread John Casey via dev
Alright, it looks like something was broken with my setup. I think a
straightforward upgrade is actually possible, so I'm going to continue down
that path.

On Wed, Nov 15, 2023 at 10:17 AM John Casey  wrote:

> So, that's the thing. I've upgraded to 1.11.3, but the plugin still seems
> to be generating Avro code that isn't compatible with 1.11.3.
>
> Looking at the PR https://github.com/apache/beam/pull/29390, it looks
> like it generates Avro code based on 1.9.2, which ends up being
> incompatible.
>
> I don't think we can continue to support every Avro version with our
> current setup.
>
> John
>
> On Tue, Nov 14, 2023 at 4:20 PM Alexey Romanenko 
> wrote:
>
>> Thanks! Please, let me know if you need any help on this.
>>
>> —
>> Alexey
>>
>> On 14 Nov 2023, at 17:52, John Casey  wrote:
>>
>> The vulnerability said to upgrade to 1.11.3, so I think that would be my
>> starting point.
>>
>>
>> On Mon, Nov 13, 2023 at 12:23 PM Alexey Romanenko <
>> aromanenko@gmail.com> wrote:
>>
>>>
>>>
>>> On 10 Nov 2023, at 19:23, John Casey  wrote:
>>>
>>> I guess I'm a bit confused as to why specifically generateTestAvroJava
>>> seems to use the wrong version. I see our version-specific generated code,
>>> but this action appears to be inherited from the plugin, and is configured
>>> with whichever Avro version is provided. Given that I tried to just change
>>> to 1.11.3, I'm confused as to why it's generating invalid Java files for
>>> the provided Avro version.
>>>
>>> Unlike the classes generated out of the JavaExec you referenced, this
>>> appears to only generate one version of the files.
>>>
>>>
>>> It was supposed to generate files with a specific Avro version every
>>> time, to run the same tests against this specific Avro version.
>>>
>>> It may be that we don't need this action, but it still seems to run, as
>>> we depend on it in the applyAvroNature() action.
>>>
>>>
>>> I started to think if we really still need this action.
>>>
>>> We could remove this entirely. The JavaExec task only generates code for
>>> the pre-configured test versions anyway.
>>>
>>>
>>> Right. The question is how many places in Beam need to generate these
>>> files, and which version(s) of Avro to use.
>>>
>>> —
>>> Alexey
>>>
>>>
>>> On Fri, Nov 10, 2023 at 12:53 PM Alexey Romanenko <
>>> aromanenko@gmail.com> wrote:
>>>
>>>> Hi John,
>>>>
>>>> This old Avro version in Beam is a very long story. Briefly, since it
>>>> was initially tightly integrated into the Java SDK “core” module, it was
>>>> not possible to upgrade the Avro version without breaking changes for
>>>> users (because of some incompatible Avro changes, as you have noticed
>>>> before). So, we decided to extract the Avro-related classes from Beam
>>>> “core” into a dedicated Avro extension [2] that supports, and is actually
>>>> tested with, different Avro versions. More details on this work are
>>>> here [1].
>>>>
>>>> Regarding auto-generated classes. Initially, we used a Gradle plugin
>>>> for that, but it’s limited to only one Avro version per instance of the
>>>> plugin, so it was not possible to generate these classes with different
>>>> Avro versions. So, we do this with a special Gradle task (“JavaExec”) that
>>>> executes “org.apache.avro.tool.Main” and generates Avro classes for every
>>>> tested Avro version [3].
>>>>
>>>> We still keep the old Avro version 1.8.2 as the default dependency
>>>> version, but it will be overridden if users have a more recent one as a
>>>> project dependency in their classpath.
>>>>
>>>> I think we need to completely remove the Avro Gradle plugin (using the
>>>> “JavaExec” task to generate Avro classes with a provided Avro version
>>>> instead) and update the default Avro version to a more recent one, since
>>>> it’s no longer part of the Java “core” module.
>>>>
>>>> Any thoughts?
>>>>
>>>> —
>>>> Alexey
>>>>
>>>>
>>>> [1] https://github.com/apache/beam/issues/24292
>>>> [2]
>>>> https://github.com/apache/beam/tree/master/sdks/java/extensions/avro
>>>> [3]
>>>> https://github.com/apache/beam/blob/c713425e1ac2cdc3ec2ec264c9bf61f7356856bd/sdks/java/extensions/avro/build.gradle#L135
>>>>
>>>>
>>>>
>>>> On 10 Nov 2023, at 18:05, John Casey via dev 
>>>> wrote:
>>>>
>>>> Hi All,
>>>>
>>>> There was a CVE detected in Avro 1.8.2 (CVE-2023-39410), so I'm trying
>>>> to upgrade to Avro 1.11.3.
>>>>
>>>> Unfortunately, it seems that our auto-generated Avro test classes
>>>> aren't being generated properly with this new version. I've updated our
>>>> Avro generation plugin as well, but for whatever reason, it seems that the
>>>> generated AvroTest file is being generated with references to classes that
>>>> did exist in 1.8.2, but no longer exist in 1.11.3.
>>>>
>>>> It seems like our autogeneration is being run with the wrong Avro
>>>> version, but I can't seem to find where that would be configured.
>>>>
>>>> Here is the PR with my changes so far:
>>>> https://github.com/apache/beam/pull/29390
>>>>
>>>> Is anyone familiar with what might be misconfigured here?
>>>>
>>>> John
>>>>
>>>>
>>>>
>>>
>>
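
For anyone following the per-version generation idea above, a minimal sketch
of what the JavaExec approach boils down to: invoking Avro's own compiler
with whichever Avro version is on the classpath. SpecificCompiler is Avro's
real API from the avro-compiler artifact; the schema path and output
directory here are assumptions:

    import java.io.File;
    import org.apache.avro.Schema;
    import org.apache.avro.compiler.specific.SpecificCompiler;

    public class GenerateAvroSources {
        public static void main(String[] args) throws Exception {
            // Parse the test schema and emit Java sources. Running this once per
            // tested Avro version (with that version on the classpath) yields the
            // version-specific generated code the thread discusses.
            Schema schema = new Schema.Parser().parse(new File("src/test/avro/user.avsc"));
            SpecificCompiler compiler = new SpecificCompiler(schema);
            compiler.compileToDestination(null, new File("build/generated-avro"));
        }
    }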


Re: Upgrading Avro dependencies

2023-11-15 Thread John Casey via dev
So, that's the thing. I've upgraded to 1.11.3, but the plugin still seems to
be generating Avro code that isn't compatible with 1.11.3.

Looking at the PR https://github.com/apache/beam/pull/29390, it looks like
it generates Avro code based on 1.9.2, which ends up being incompatible.

I don't think we can continue to support every Avro version with our
current setup.

John

On Tue, Nov 14, 2023 at 4:20 PM Alexey Romanenko 
wrote:

> Thanks! Please, let me know if you need any help on this.
>
> —
> Alexey
>
> On 14 Nov 2023, at 17:52, John Casey  wrote:
>
> The vulnerability said to upgrade to 1.11.3, so I think that would be my
> starting point.
>
>
> On Mon, Nov 13, 2023 at 12:23 PM Alexey Romanenko <
> aromanenko@gmail.com> wrote:
>
>>
>>
>> On 10 Nov 2023, at 19:23, John Casey  wrote:
>>
>> I guess I'm a bit confused as to why specifically generateTestAvroJava
>> seems to use the wrong version. I see our version-specific generated code,
>> but this action appears to be inherited from the plugin, and is configured
>> with whichever Avro version is provided. Given that I tried to just change
>> to 1.11.3, I'm confused as to why it's generating invalid Java files for
>> the provided Avro version.
>>
>> Unlike the classes generated out of the JavaExec you referenced, this
>> appears to only generate one version of the files.
>>
>>
>> It was supposed to generate files with a specific Avro version every
>> time, to run the same tests against this specific Avro version.
>>
>> It may be that we don't need this action, but it still seems to run, as
>> we depend on it in the applyAvroNature() action.
>>
>>
>> I started to think if we really still need this action.
>>
>> We could remove this entirely. The JavaExec task only generates code for
>> the pre-configured test versions anyway.
>>
>>
>> Right. The question is how many places in Beam need to generate these
>> files, and which version(s) of Avro to use.
>>
>> —
>> Alexey
>>
>>
>> On Fri, Nov 10, 2023 at 12:53 PM Alexey Romanenko <
>> aromanenko@gmail.com> wrote:
>>
>>> Hi John,
>>>
>>> This old Avro version in Beam is a very long story. Briefly, since it
>>> was initially tightly integrated into the Java SDK “core” module, it was
>>> not possible to upgrade the Avro version without breaking changes for
>>> users (because of some incompatible Avro changes, as you have noticed
>>> before). So, we decided to extract the Avro-related classes from Beam
>>> “core” into a dedicated Avro extension [2] that supports, and is actually
>>> tested with, different Avro versions. More details on this work are
>>> here [1].
>>>
>>> Regarding auto-generated classes. Initially, we used a Gradle plugin
>>> for that, but it’s limited to only one Avro version per instance of the
>>> plugin, so it was not possible to generate these classes with different
>>> Avro versions. So, we do this with a special Gradle task (“JavaExec”) that
>>> executes “org.apache.avro.tool.Main” and generates Avro classes for every
>>> tested Avro version [3].
>>>
>>> We still keep the old Avro version 1.8.2 as the default dependency
>>> version, but it will be overridden if users have a more recent one as a
>>> project dependency in their classpath.
>>>
>>> I think we need to completely remove the Avro Gradle plugin (using the
>>> “JavaExec” task to generate Avro classes with a provided Avro version
>>> instead) and update the default Avro version to a more recent one, since
>>> it’s no longer part of the Java “core” module.
>>>
>>> Any thoughts?
>>>
>>> —
>>> Alexey
>>>
>>>
>>> [1] https://github.com/apache/beam/issues/24292
>>> [2] https://github.com/apache/beam/tree/master/sdks/java/extensions/avro
>>> [3]
>>> https://github.com/apache/beam/blob/c713425e1ac2cdc3ec2ec264c9bf61f7356856bd/sdks/java/extensions/avro/build.gradle#L135
>>>
>>>
>>>
>>> On 10 Nov 2023, at 18:05, John Casey via dev 
>>> wrote:
>>>
>>> Hi All,
>>>
>>> There was a CVE detected in Avro 1.8.2 (CVE-2023-39410), so I'm trying
>>> to upgrade to Avro 1.11.3.
>>>
>>> Unfortunately, it seems that our auto-generated Avro test classes aren't
>>> being generated properly with this new version. I've updated our Avro
>>> generation plugin as well, but for whatever reason, it seems that the
>>> generated AvroTest file is being generated with references to classes that
>>> did exist in 1.8.2, but no longer exist in 1.11.3.
>>>
>>> It seems like our autogeneration is being run with the wrong Avro
>>> version, but I can't seem to find where that would be configured.
>>>
>>> Here is the PR with my changes so far:
>>> https://github.com/apache/beam/pull/29390
>>>
>>> Is anyone familiar with what might be misconfigured here?
>>>
>>> John
>>>
>>>
>>>
>>
>


Re: Upgrading Avro dependencies

2023-11-14 Thread John Casey via dev
The vulnerability said to upgrade to 1.11.3, so I think that would be my
starting point.


On Mon, Nov 13, 2023 at 12:23 PM Alexey Romanenko 
wrote:

>
>
> On 10 Nov 2023, at 19:23, John Casey  wrote:
>
> I guess I'm a bit confused as to why specifically generateTestAvroJava
> seems to use the wrong version. I see our version-specific generated code,
> but this action appears to be inherited from the plugin, and is configured
> with whichever Avro version is provided. Given that I tried to just change
> to 1.11.3, I'm confused as to why it's generating invalid Java files for
> the provided Avro version.
>
> Unlike the classes generated out of the JavaExec you referenced, this
> appears to only generate one version of the files.
>
>
> It was supposed to generate files with a specific Avro version every
> time, to run the same tests against this specific Avro version.
>
> It may be that we don't need this action, but it still seems to run, as we
> depend on it in the applyAvroNature() action.
>
>
> I started to think if we really still need this action.
>
> We could remove this entirely. The JavaExec task only generates code for
> the pre-configured test versions anyway.
>
>
> Right. The question is how many places in Beam need to generate these
> files, and which version(s) of Avro to use.
>
> —
> Alexey
>
>
> On Fri, Nov 10, 2023 at 12:53 PM Alexey Romanenko <
> aromanenko@gmail.com> wrote:
>
>> Hi John,
>>
>> This old Avro version in Beam is a very long story. Briefly, since it
>> was initially tightly integrated into the Java SDK “core” module, it was
>> not possible to upgrade the Avro version without breaking changes for
>> users (because of some incompatible Avro changes, as you have noticed
>> before). So, we decided to extract the Avro-related classes from Beam
>> “core” into a dedicated Avro extension [2] that supports, and is actually
>> tested with, different Avro versions. More details on this work are
>> here [1].
>>
>> Regarding auto-generated classes. Initially, we used a Gradle plugin
>> for that, but it’s limited to only one Avro version per instance of the
>> plugin, so it was not possible to generate these classes with different
>> Avro versions. So, we do this with a special Gradle task (“JavaExec”) that
>> executes “org.apache.avro.tool.Main” and generates Avro classes for every
>> tested Avro version [3].
>>
>> We still keep the old Avro version 1.8.2 as the default dependency
>> version, but it will be overridden if users have a more recent one as a
>> project dependency in their classpath.
>>
>> I think we need to completely remove the Avro Gradle plugin (using the
>> “JavaExec” task to generate Avro classes with a provided Avro version
>> instead) and update the default Avro version to a more recent one, since
>> it’s no longer part of the Java “core” module.
>>
>> Any thoughts?
>>
>> —
>> Alexey
>>
>>
>> [1] https://github.com/apache/beam/issues/24292
>> [2] https://github.com/apache/beam/tree/master/sdks/java/extensions/avro
>> [3]
>> https://github.com/apache/beam/blob/c713425e1ac2cdc3ec2ec264c9bf61f7356856bd/sdks/java/extensions/avro/build.gradle#L135
>>
>>
>>
>> On 10 Nov 2023, at 18:05, John Casey via dev  wrote:
>>
>> Hi All,
>>
>> There was a CVE detected in Avro 1.8.2 (CVE-2023-39410), so I'm trying
>> to upgrade to Avro 1.11.3.
>>
>> Unfortunately, it seems that our auto-generated Avro test classes aren't
>> being generated properly with this new version. I've updated our Avro
>> generation plugin as well, but for whatever reason, it seems that the
>> generated AvroTest file is being generated with references to classes that
>> did exist in 1.8.2, but no longer exist in 1.11.3.
>>
>> It seems like our autogeneration is being run with the wrong Avro
>> version, but I can't seem to find where that would be configured.
>>
>> Here is the PR with my changes so far:
>> https://github.com/apache/beam/pull/29390
>>
>> Is anyone familiar with what might be misconfigured here?
>>
>> John
>>
>>
>>
>


Re: Upgrading Avro dependencies

2023-11-10 Thread John Casey via dev
I guess I'm a bit confused as to why specifically generateTestAvroJava
seems to use the wrong version. I see our version-specific generated code,
but this action appears to be inherited from the plugin, and is configured
with whichever Avro version is provided. Given that I tried to just change
to 1.11.3, I'm confused as to why it's generating invalid Java files for
the provided Avro version.

Unlike the classes generated out of the JavaExec you referenced, this
appears to only generate one version of the files.
It may be that we don't need this action, but it still seems to run, as we
depend on it in the applyAvroNature() action.

We could remove this entirely. The JavaExec task only generates code for
the pre-configured test versions anyway.

On Fri, Nov 10, 2023 at 12:53 PM Alexey Romanenko 
wrote:

> Hi John,
>
> This old Avro version in Beam is a very long story. Briefly, since it
> was initially tightly integrated into the Java SDK “core” module, it was
> not possible to upgrade the Avro version without breaking changes for
> users (because of some incompatible Avro changes, as you have noticed
> before). So, we decided to extract the Avro-related classes from Beam
> “core” into a dedicated Avro extension [2] that supports, and is actually
> tested with, different Avro versions. More details on this work are
> here [1].
>
> Regarding auto-generated classes. Initially, we used a Gradle plugin
> for that, but it’s limited to only one Avro version per instance of the
> plugin, so it was not possible to generate these classes with different
> Avro versions. So, we do this with a special Gradle task (“JavaExec”) that
> executes “org.apache.avro.tool.Main” and generates Avro classes for every
> tested Avro version [3].
>
> We still keep the old Avro version 1.8.2 as the default dependency
> version, but it will be overridden if users have a more recent one as a
> project dependency in their classpath.
>
> I think we need to completely remove the Avro Gradle plugin (using the
> “JavaExec” task to generate Avro classes with a provided Avro version
> instead) and update the default Avro version to a more recent one, since
> it’s no longer part of the Java “core” module.
>
> Any thoughts?
>
> —
> Alexey
>
>
> [1] https://github.com/apache/beam/issues/24292
> [2] https://github.com/apache/beam/tree/master/sdks/java/extensions/avro
> [3]
> https://github.com/apache/beam/blob/c713425e1ac2cdc3ec2ec264c9bf61f7356856bd/sdks/java/extensions/avro/build.gradle#L135
>
>
>
> On 10 Nov 2023, at 18:05, John Casey via dev  wrote:
>
> Hi All,
>
> There was a CVE detected in Avro 1.8.2 (CVE-2023-39410), so I'm trying to
> upgrade to Avro 1.11.3.
>
> Unfortunately, it seems that our auto-generated Avro test classes aren't
> being generated properly with this new version. I've updated our Avro
> generation plugin as well, but for whatever reason, it seems that the
> generated AvroTest file is being generated with references to classes that
> did exist in 1.8.2, but no longer exist in 1.11.3.
>
> It seems like our autogeneration is being run with the wrong Avro version,
> but I can't seem to find where that would be configured.
>
> Here is the PR with my changes so far:
> https://github.com/apache/beam/pull/29390
>
> Is anyone familiar with what might be misconfigured here?
>
> John
>
>
>


Upgrading Avro dependencies

2023-11-10 Thread John Casey via dev
Hi All,

There was a CVE detected in Avro 1.8.2 (CVE-2023-39410), so I'm trying to
upgrade to Avro 1.11.3.

Unfortunately, it seems that our auto-generated Avro test classes aren't
being generated properly with this new version. I've updated our Avro
generation plugin as well, but for whatever reason, it seems that the
generated AvroTest file is being generated with references to classes that
did exist in 1.8.2, but no longer exist in 1.11.3.

It seems like our autogeneration is being run with the wrong Avro version,
but I can't seem to find where that would be configured.

Here is the PR with my changes so far:
https://github.com/apache/beam/pull/29390

Is anyone familiar with what might be misconfigured here?

John


Adding Dead Letter Queues to Beam IOs

2023-11-08 Thread John Casey via dev
Hi All,

I've written up a design for adding DLQs to existing Beam IOs. It's been
through a round of reviews with some Dataflow folks at Google, but I'd
appreciate any comments from the rest of the Beam community on how to refine
the design.

TL;DR: Make it easy for a user to configure IOs to route bad data to an
alternate sink instead of crashing the pipeline or having the record be
retried indefinitely.

https://docs.google.com/document/d/1NGeCk6tOqF-TiGEAV7ixd_vhIiWz9sHPlCa1P_77Ajs/edit?usp=sharing

Thanks!

John
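
To make the TL;DR concrete, here is the classic hand-rolled version of the
pattern the design generalizes: a ParDo with a second output for records
that fail, written to an alternate sink. This is a hedged sketch against the
standard Java SDK API, not the proposal's new API; the input data and output
path are made up:

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.TextIO;
    import org.apache.beam.sdk.transforms.Create;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.ParDo;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.PCollectionTuple;
    import org.apache.beam.sdk.values.TupleTag;
    import org.apache.beam.sdk.values.TupleTagList;

    public class DeadLetterExample {
        // Tags are anonymous subclasses so the element types are reified.
        static final TupleTag<KV<String, Integer>> PARSED =
            new TupleTag<KV<String, Integer>>() {};
        static final TupleTag<String> DEAD_LETTER = new TupleTag<String>() {};

        public static void main(String[] args) {
            Pipeline p = Pipeline.create();
            PCollectionTuple results = p
                .apply(Create.of("a,1", "b,2", "not-a-pair"))
                .apply("Parse", ParDo.of(new DoFn<String, KV<String, Integer>>() {
                    @ProcessElement
                    public void processElement(@Element String line, MultiOutputReceiver out) {
                        try {
                            String[] parts = line.split(",");
                            out.get(PARSED).output(KV.of(parts[0], Integer.parseInt(parts[1])));
                        } catch (RuntimeException e) {
                            // Bad record goes to the DLQ output instead of failing the bundle.
                            out.get(DEAD_LETTER).output(line);
                        }
                    }
                }).withOutputTags(PARSED, TupleTagList.of(DEAD_LETTER)));

            results.get(DEAD_LETTER).apply("WriteDLQ", TextIO.write().to("/tmp/dlq/bad-records"));
            p.run().waitUntilFinish();
        }
    }

The proposal's point is that IOs should offer this routing as built-in
configuration, so users don't have to hand-write it around every connector.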


Re: [External Sender] Re: [Question] Error handling for IO Write Functions

2023-11-08 Thread John Casey via dev
Yep, that's a common misunderstanding with Beam.

The code that is actually executed in the try block is just for pipeline
construction; no data is processed at this point in time.

Once the pipeline is constructed, the various ParDos are serialized and
sent to the runners, where they are actually executed.

In this case, if there was an exception in the ParDo that converts rows to
Avro, you would see the "Exception when converting Beam Row to Avro Record"
log in whatever logs your runner provides you, and the exception would
propagate up to your runner.

Likewise, your log line log.info("Finished writing Parquet file to path {}",
writePath); is misleading: it will log when the pipeline is constructed,
not when the Parquet write completes.
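
To make the two execution times concrete, a minimal sketch (direct runner
semantics assumed; the DoFn body is purely illustrative):

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.transforms.Create;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.ParDo;

    public class TwoExecutionTimes {
        public static void main(String[] args) {
            Pipeline p = Pipeline.create();
            try {
                p.apply(Create.of("a", "b"))
                 .apply(ParDo.of(new DoFn<String, String>() {
                     @ProcessElement
                     public void processElement(@Element String s, OutputReceiver<String> out) {
                         // Runs on the runner's workers, NOT inside this try block.
                         throw new RuntimeException("surfaces at run time");
                     }
                 }));
            } catch (Exception e) {
                // Only construction-time problems (bad configuration, coder
                // inference failures, ...) are caught here.
            }
            // The DoFn's exception propagates from here instead (as a
            // PipelineExecutionException on the direct runner).
            p.run().waitUntilFinish();
        }
    }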

On Wed, Nov 8, 2023 at 10:51 AM Ramya Prasad via dev 
wrote:

> Hey John,
>
> Yes that's how my code is set up, I have the FileIO.write() in its own
> try-catch block. I took a second look at where exactly the code is failing,
> and it's actually in a ParDo function which is happening before I call
> FileIO.write(). But even within that, I've tried adding a try-catch but the
> error isn't stopping the actual application run in a Spark cluster. In the
> cluster, I see that the exception is being thrown from my ParDo, but then
> immediately after that, I see the line INFO ApplicationMaster: Final app
> status: SUCCEEDED, exitCode: 0. This is roughly what my code setup looks
> like:
>
> @Slf4j
> public class ParquetWriteActionStrategy {
>
>     public void executeWriteAction(Pipeline p) throws Exception {
>
>         try {
>             // transform PCollection from type Row to GenericRecords
>             PCollection<GenericRecord> records = p.apply(
>                 "transform PCollection from type Row to GenericRecords",
>                 ParDo.of(new DoFn<Row, GenericRecord>() {
>                     @ProcessElement
>                     public void processElement(@Element Row row,
>                             OutputReceiver<GenericRecord> out) {
>                         try {
>                             // (row-to-GenericRecord conversion elided)
>                         } catch (Exception e) {
>                             log.error("Exception when converting Beam Row"
>                                 + " to Avro Record: {}", e.getMessage());
>                             throw e;
>                         }
>                     }
>                 })).setCoder(AvroCoder.of(avroSchema));
>
>             records.apply("Writing Parquet Output File",
>                 FileIO.<GenericRecord>write()
>                     .via(/* sink elided */)
>                     .to(writePath)
>                     .withSuffix(".parquet"));
>
>             log.info("Finished writing Parquet file to path {}", writePath);
>         } catch (Exception e) {
>             log.error("Error in Parquet Write Action. {}", e.getMessage());
>             throw e;
>         }
>     }
> }
>
>
> On Wed, Nov 8, 2023 at 9:16 AM John Casey via dev 
> wrote:
>
>> There are 2 execution times when using Beam. The first execution is
>> local, when a pipeline is constructed, and the second is remote on the
>> runner, processing data.
>>
>> Based on what you said, it sounds like you are wrapping pipeline
>> construction in a try-catch, and constructing FileIO isn't failing.
>>
>> e.g.
>>
>> try {
>>
>> FileIO.write().someOtherconfigs()
>>
>> } catch ...
>>
>> this will catch any exceptions in constructing FileIO, but the running
>> pipeline won't propagate exceptions through this exception block.
>>
>> On Tue, Nov 7, 2023 at 5:21 PM Robert Bradshaw via dev <
>> dev@beam.apache.org> wrote:
>>
>>> File write failures should be throwing exceptions that will
>>> terminate the pipeline on failure. (Generally a distributed runner will
>>> make multiple attempts before abandoning the entire pipeline of course.)
>>>
>>> Are you seeing files failing to be written but no exceptions being
>>> thrown? If so, this is definitely a bug that we want to resolve.
>>>
>>>
>>> On Tue, Nov 7, 2023 at 11:17 AM Ramya Prasad via dev <
>>> dev@beam.apache.org> wrote:
>>>
>>>> Hello,
>>>>
>>>> I am a developer using Apache Beam in my Java application, and I need
>>>> some help on how to handle exceptions when writing a file to S3. I have
>>>> tried wrapping my code within a try-catch block, but no exception is being
>>>> thrown within the try block. I'm assuming that FileIO doesn't throw any
>>>> exceptions upon failure. Is there a way in which I can either terminate
>>>> the program on failure or at least be made aware of if any of my write
>>>> operations fail?

Re: [Question] Error handling for IO Write Functions

2023-11-08 Thread John Casey via dev
There are 2 execution times when using Beam. The first execution is local,
when a pipeline is constructed, and the second is remote on the runner,
processing data.

Based on what you said, it sounds like you are wrapping pipeline
construction in a try-catch, and constructing FileIO isn't failing.

e.g.

try {

FileIO.write().someOtherconfigs()

} catch ...

this will catch any exceptions in constructing FileIO, but the running
pipeline won't propagate exceptions through this exception block.

On Tue, Nov 7, 2023 at 5:21 PM Robert Bradshaw via dev 
wrote:

> File write failures should be throwing exceptions that will terminate the
> pipeline on failure. (Generally a distributed runner will make multiple
> attempts before abandoning the entire pipeline of course.)
>
> Are you seeing files failing to be written but no exceptions being thrown?
> If so, this is definitely a bug that we want to resolve.
>
>
> On Tue, Nov 7, 2023 at 11:17 AM Ramya Prasad via dev 
> wrote:
>
>> Hello,
>>
>> I am a developer using Apache Beam in my Java application, and I need
>> some help on how to handle exceptions when writing a file to S3. I have
>> tried wrapping my code within a try-catch block, but no exception is being
>> thrown within the try block. I'm assuming that FileIO doesn't throw any
>> exceptions upon failure. Is there a way in which I can either terminate the
>> program on failure or at least be made aware of if any of my write
>> operations fail?
>>
>> Thanks and sincerely,
>> Ramya
>> --
>>
>> The information contained in this e-mail may be confidential and/or
>> proprietary to Capital One and/or its affiliates and may only be used
>> solely in performance of work or services for Capital One. The information
>> transmitted herewith is intended only for use by the individual or entity
>> to which it is addressed. If the reader of this message is not the intended
>> recipient, you are hereby notified that any review, retransmission,
>> dissemination, distribution, copying or other use of, or taking of any
>> action in reliance upon this information is strictly prohibited. If you
>> have received this communication in error, please contact the sender and
>> delete the material from your computer.
>>
>>
>>
>>
>>


Re: Reshuffle PTransform Design Doc

2023-10-05 Thread John Casey via dev
Given that this is a hint, I'm not sure Redistribute should be a PTransform
as opposed to some other way to hint to a runner.

I'm not sure what the syntax of that would be, but a semantic no-op
transform that the runner may or may not do anything with is odd.
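
For context, a hedged usage sketch of the transform under discussion.
Reshuffle.viaRandomKey() is the real Java API; "lines" is an assumed
PCollection<String>, and Redistribute is the proposed name, not an existing
transform:

    import org.apache.beam.sdk.transforms.Reshuffle;
    import org.apache.beam.sdk.values.PCollection;

    // Applied today purely as a redistribution hint / fusion break:
    PCollection<String> redistributed =
        lines.apply("BreakFusion", Reshuffle.viaRandomKey());

    // Under the proposal this might read (hypothetical name, not a real API):
    // lines.apply("BreakFusion", Redistribute.arbitrarily());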

On Thu, Oct 5, 2023 at 11:30 AM Kenneth Knowles  wrote:

> So a high level suggestion from Robert that I want to highlight as a
> top-post:
>
> Instead of focusing on just fixing the SDKs and runners Reshuffle, this
> could be an opportunity to introduce Redistribute which was proposed in the
> long-ago thread. The semantics are identical but it is more clear that it
> *only* is a hint about redistributing data and there is no expectation of
> a checkpoint.
>
> This new name may also be an opportunity to maintain update compatibility
> (though this may actually be leaving unsafe code in user's hands) and/or
> separate @RequiresStableInput/checkpointing uses of Reshuffle from
> redistribution-only uses of Reshuffle.
>
> Any other thoughts on this one high level bit?
>
> Kenn
>
> On Thu, Oct 5, 2023 at 11:15 AM Kenneth Knowles  wrote:
>
>>
>> On Wed, Oct 4, 2023 at 7:45 PM Robert Burke  wrote:
>>
>>> LGTM.
>>>
>>> It looks like the Go SDK already adheres to these semantics as well for
>>> the reference impl (well, reshuffle/redistribute_randomly; _by_key isn't
>>> implemented in the Go SDK, which only uses the existing unqualified
>>> reshuffle URN [0]).
>>>
>>> The original windowing strategy is preserved, and then for every element
>>> the original Window, TS, and Pane are all serialized, shuffled, and then
>>> deserialized downstream.
>>>
>>>
>>> https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/exec/reshuffle.go#L65
>>>
>>>
>>> https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/exec/reshuffle.go#L145
>>>
>>> Prism at the moment vacuously implements reshuffle by omitting the
>>> node and rewriting the inputs and outputs [1], as it's a local runner with
>>> single-transform-per-bundle execution, but I was intending to make it a
>>> fusion break regardless. Ultimately prism's "test" variant will default to
>>> executing the SDK's dictated reference implementation for the composite(s),
>>> and any "fast" or "prod" variant would simply do the current implementation.
>>>
>>
>> Very nice!
>>
>> And of course I should have linked out to the existing reshuffle URN in
>> the proto.
>>
>> Kenn
>>
>>
>>
>>>
>>> Robert Burke
>>> Beam Go Busybody
>>>
>>> [0]:
>>> https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/graphx/translate.go#L46C3-L46C50
>>> [1]:
>>> https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/runners/prism/internal/handlerunner.go#L82
>>>
>>>
>>>
>>> On 2023/09/26 15:43:53 Kenneth Knowles wrote:
>>> > Hi everyone,
>>> >
>>> > Recently there was a bug [1] caused by discrepancies between two of
>>> > Dataflow's reshuffle implementations. I think the reference
>>> implementation
>>> > in the Java SDK [2] also does not match. This all led to discussion on
>>> the
>>> > bug and the pull request [3] about what the actual semantics should
>>> be. I
>>> > got it wrong, maybe multiple times. So I wrote up a very short
>>> document to
>>> > finish the discussion:
>>> >
>>> > https://s.apache.org/beam-reshuffle
>>> >
>>> > This is also probably among the simplest imaginable use of
>>> > http://s.apache.org/ptransform-design-doc in case you want to see
>>> kind of
>>> > how I intended it to be used.
>>> >
>>> > Kenn
>>> >
>>> > [1] https://github.com/apache/beam/issues/28219
>>> > [2]
>>> >
>>> https://github.com/apache/beam/blob/d52b077ad505c8b50f10ec6a4eb83d385cdaf96a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Reshuffle.java#L84
>>> > [3] https://github.com/apache/beam/pull/28272
>>> >
>>>
>>


Re: [ANNOUNCE] New PMC Member: Robert Burke

2023-10-05 Thread John Casey via dev
Congrats!

On Thu, Oct 5, 2023 at 4:07 AM Ismaël Mejía  wrote:

> Congratulations Robert, well deserved ! long live go !
>
> On Wed, Oct 4, 2023 at 11:58 PM Chamikara Jayalath 
> wrote:
>
>> Congrats Rebo!
>>
>> On Wed, Oct 4, 2023 at 1:42 AM Jan Lukavský  wrote:
>>
>>> Congrats Robert!
>>> On 10/4/23 10:29, Alexey Romanenko wrote:
>>>
>>> Congrats Robert, very well deserved!
>>>
>>> —
>>> Alexey
>>>
>>> On 4 Oct 2023, at 00:39, Austin Bennett 
>>>  wrote:
>>>
>>> Thanks for all you do @Robert Burke  !
>>>
>>> On Tue, Oct 3, 2023 at 12:53 PM Ahmed Abualsaud <
>>> ahmedabuals...@apache.org> wrote:
>>>
 Congrats Rebo!

 On 2023/10/03 18:39:47 Kenneth Knowles wrote:
 > Hi all,
 >
 > Please join me and the rest of the Beam PMC in welcoming Robert Burke
 <
 > lostl...@apache.org> as our newest PMC member.
 >
 > Robert has been a part of the Beam community since 2017. He is our
 resident
 > Gopher, producing the Go SDK and most recently the local, portable,
 Prism
 > runner. Robert has presented on Beam many times, having written not
 just
 > core Beam code but quite interesting pipelines too :-)
 >
 > Congratulations Robert and thanks for being a part of Apache Beam!
 >
 > Kenn, on behalf of the Beam PMC (which now includes Robert)
 >

>>>
>>>


Re: Contribution of Asgarde: Error Handling for Beam?

2023-09-06 Thread John Casey via dev
Agreed on documentation and on keeping it in a separate repo.

We have a few pretty significant Beam extensions (Scio and Dataflow
Templates also come to mind) that Beam should highlight, but that are kept in
separate repos for their own governance, contribution, and release reasons.

The difference with Asgarde is that we might want to use it in Beam itself,
which makes it more reasonable to include in the main repo.

On Tue, Sep 5, 2023 at 8:36 PM Robert Bradshaw via dev 
wrote:

> I think this is a great library. I'm on the fence of whether it makes
> sense to include with Beam proper vs. be a library that builds on top of
> Beam. (Would there be benefits of tighter integration? There is the
> maintenance/loss of governance issue.) I am definitely not on the side that
> the entire Beam ecosystem needs to be distributed/maintained by Beam
> itself.
>
> Regardless of the direction we go, I think it could make a lot of sense to
> put pointers to it in our documentation.
>
>
> On Tue, Sep 5, 2023 at 7:21 AM Danny McCormick via dev <
> dev@beam.apache.org> wrote:
>
>> I think my only concerns here are around the toil we'll be taking on, and
>> whether we'll be leaving the Asgarde project in a better or worse place.
>>
>> From a release standpoint, we would need to release it with the same
>> cadence as Beam. Adding asgarde into our standard release process seems
>> fairly straightforward, though, so I'm not too worried about it - looks
>> like it's basically (1) add a commit like this, (2) run this workflow,
>> and (3) tag/mark the release as released on GitHub.
>>
>> In terms of bug fixes and improvements, though, I'm a little worried that
>> we might be leaving things in a worse state since Mazlum has been the only
>> contributor thus far, and he would lose some governance (and possibly the
>> ability to commit code on his own). An extra motivated community member or
>> two could change the math a bit, but I'm not sure if there are actually
>> clear advantages to including it in Apache other than visibility. Would
>> adding links to our docs calling Asgarde out as an option accomplish the
>> same purpose?
>>
>> > Let's be careful about whether these tests are included in our
>> presubmits. Contrib code with flaky tests has been a major pain point in
>> the past.
>>
>> +1 - I think if we do this I'd vote that it be in a separate repo (
>> github.com/apache/beam-asgarde made sense to me).
>>
>> ---
>>
>> Overall, I'm probably a slight -1 to adding this to the Apache workspace,
>> but +1 to at least adding links from the Beam docs to Asgarde.
>>
>> Thanks,
>> Danny
>>
>>
>>
>> On Tue, Sep 5, 2023 at 12:03 AM Reuven Lax via dev 
>> wrote:
>>
>>> Let's be careful about whether these tests are included in our
>>> presubmits. Contrib code with flaky tests has been a major pain point in
>>> the past.
>>>
>>> On Sat, Sep 2, 2023 at 12:02 PM Austin Bennett 
>>> wrote:
>>>
 Wanting us to not miss this. @Mazlum TOSUN  is
 happy to donate Asgarde to our project.

 It looks like he'd need an SGA and CCLA [ 1 ] on file; anything else?

 I recalled the donation of Euphoria [ 2 ] , so I looked at those
 threads [ 3 ]  for insights into the process.  It didn't look like there
 was a needed VOTE, so mostly a matter of ensuring necessary signatures, and
 ideally some sort of consensus [ or non-opposition ] to the donation.


 [ 1 ] https://www.apache.org/licenses/contributor-agreements.html
 [ 2 ] https://beam.apache.org/documentation/sdks/java/euphoria/
 [ 3 ] https://lists.apache.org/thread/xzlx4rm2tvc36mmwvhyvtdvsw7bnjscp



 On Thu, Jun 15, 2023 at 7:05 AM Kerry Donny-Clark via dev <
 dev@beam.apache.org> wrote:

> This looks like an excellent contribution. I can easily understand the
> motivation, and I think Beam would benefit from a higher level abstraction
> for error handling.
> Kerry
>
> On Wed, Jun 14, 2023, 6:31 PM Austin Bennett 
> wrote:
>
>> Hi Beam Devs,
>>
>> @Mazlum  was
>> suggested to consider donating Asgarde
>>  to Beam for Java/Kotlin error
>> handling to Beam [ see:
>> https://2022.beamsummit.org/sessions/error-handling-asgarde/ for
>> last year's Beam Summit talk ], he is also the author of Pasgarde [ for
>> Python ] and Milgard [ for a simplified Kotlin API ].
>>
>> Would Asgarde be a good contribution, something the Beam community
>> would be willing to accept?  I imagine we might want it to live at
>> github.com/apache/beam-asgarde ?  Or perhaps there is a good place
>> in github.com/apache/beam ??
>>
>

Re: [ANNOUNCE] New committer: Ahmed Abualsaud

2023-08-25 Thread John Casey via dev
Congrats Ahmed!

On Fri, Aug 25, 2023 at 10:43 AM Bjorn Pedersen via dev 
wrote:

> Congrats Ahmed! Well deserved!
>
> On Fri, Aug 25, 2023 at 10:36 AM Yi Hu via dev 
> wrote:
>
>> Congrats Ahmed!
>>
>> On Fri, Aug 25, 2023 at 10:11 AM Ritesh Ghorse via dev <
>> dev@beam.apache.org> wrote:
>>
>>> Congrats Ahmed!
>>>
>>> On Fri, Aug 25, 2023 at 9:53 AM Kerry Donny-Clark via dev <
>>> dev@beam.apache.org> wrote:
>>>
 Well done Ahmed!

 On Fri, Aug 25, 2023 at 9:17 AM Danny McCormick via dev <
 dev@beam.apache.org> wrote:

> Congrats Ahmed!
>
> On Fri, Aug 25, 2023 at 3:16 AM Jan Lukavský  wrote:
>
>> Congrats Ahmed!
>> On 8/25/23 07:56, Anand Inguva via dev wrote:
>>
>> Congratulations Ahmed :)
>>
>> On Fri, Aug 25, 2023 at 1:17 AM Damon Douglas <
>> damondoug...@apache.org> wrote:
>>
>>> Well deserved! Congratulations, Ahmed! I'm so happy for you.
>>>
>>> On Thu, Aug 24, 2023, 5:46 PM Byron Ellis via dev <
>>> dev@beam.apache.org> wrote:
>>>
 Congratulations!

 On Thu, Aug 24, 2023 at 5:34 PM Robert Burke 
 wrote:

> Congratulations Ahmed!!
>
> On Thu, Aug 24, 2023, 4:08 PM Chamikara Jayalath via dev <
> dev@beam.apache.org> wrote:
>
>> Congrats Ahmed!!
>>
>> On Thu, Aug 24, 2023 at 4:06 PM Bruno Volpato via dev <
>> dev@beam.apache.org> wrote:
>>
>>> Congratulations, Ahmed!
>>>
>>> Very well deserved!
>>>
>>>
>>> On Thu, Aug 24, 2023 at 6:09 PM XQ Hu via dev <
>>> dev@beam.apache.org> wrote:
>>>
 Congratulations, Ahmed!

 On Thu, Aug 24, 2023, 5:49 PM Ahmet Altay via dev <
 dev@beam.apache.org> wrote:

> Hi all,
>
> Please join me and the rest of the Beam PMC in welcoming a new
> committer: Ahmed Abualsaud (ahmedabuals...@apache.org).
>
> Ahmed has been part of the Beam community since January 2022,
> working mostly on IO connectors, made a large amount of 
> contributions to
> make Beam IOs more usable, performant, and reliable. And at the 
> same time
> Ahmed was active in the user list and at the Beam summit helping 
> users by
> sharing his knowledge.
>
> Considering their contributions to the project over this
> timeframe, the Beam PMC trusts Ahmed with the responsibilities of 
> a Beam
> committer. [1]
>
> Thank you Ahmed! And we are looking to see more of your
> contributions!
>
> Ahmet, on behalf of the Apache Beam PMC
>
> [1]
>
> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>
>


Re: Beam IO Connector

2023-08-14 Thread John Casey via dev
I believe Devon Peticolas wrote a similar tool to create an IO that wrote
to configurable sinks; that might fit your use case.

On Sat, Aug 12, 2023 at 12:18 PM Bruno Volpato via dev 
wrote:

> Hi Jeremy,
>
> Apparently you are trying to use Beam's DirectRunner, which is mostly
> focused on small pipelines / testing purposes.
> Even if it runs in the JVM, there are protections in place to make sure
> your pipeline will be able to be distributed correctly when choosing a
> production-ready runner (e.g., Dataflow, Spark, Flink), from the link above:
>
> - enforcing immutability of elements
> - enforcing encodability of elements
>
> There are ways to disable those checks (--enforceEncodability=false,
> --enforceImmutability=false), but to make sure you take the best out of
> Beam and can run the pipeline in one of the runners in the future, I
> believe the best way would be to write to a file, and read it back in the
> GUI application (for the sink part).
>
> For the source part, you may want to use Create to create a PCollection
> with specific elements for the in-memory scenario.
>
> If you are getting exceptions for the supported scenarios you've
> mentioned, there are a few things to check -- for example, if you are using
> lambdas, sometimes Java will try to serialize the entire instance that holds
> the members being used. Creating your own DoFn classes and passing in only
> the Serializable values you need may resolve this.
>
>
> Best,
> Bruno
>
>
>
>
> On Sat, Aug 12, 2023 at 11:34 AM Jeremy Bloom 
> wrote:
>
>> Hello-
>> I am fairly new to Beam but have been working with Apache Spark for a
>> number of years. The application I am developing uses a data pipeline to
>> ingest JSON with a particular schema, uses it to prepare data for a service
>> that I do not control (a mathematical optimization solver), runs the
>> application and recovers its results, and then publishes the results in
>> JSON (same schema).  Although I work in Java, colleagues of mine are
>> implementing in Python. This is an open-source, non-commercial project.
>>
>> The application has three kinds of IO sources/sinks: file system files
>> (using Windows now, but Unix in the future), URL, and in-memory (string,
>> byte buffer, etc). The last is primarily used for debugging, displayed in a
>> JTextArea.
>>
>> I have not found a Beam IO connector that handles all three data
>> sources/sinks, particularly the in-memory sink. I have tried adapting
>> FileIO and TextIO, however, I continually run up against objects that are
>> not serializable, particularly Java OutputStream and its subclasses. I have
>> looked at the code for FileIO and TextIO as well as several other custom IO
>> implementations, but none of them addresses this particular bug.
>>
>> The CSVSink example in the FileIO Javadoc uses a PrintWriter, which is
>> not serializable; when I tried the same thing, I got a not-serializable
>> exception. How does this example actually avoid this error? In the code for
>> TextIO.Sink, the PrintWriter field is marked transient, meaning that it is
>> not serialized, but again, when I tried the same thing, I got an exception.
>>
>> Please explain, in particular, how to write a Sink that avoids the not
>> serializable exception. In general, please explain how I can use a Beam IO
>> connector for the three kinds of data sources/sinks I want to use (file
>> system, url, and in-memory).
>>
>> After the frustrations I had with Spark, I have high hopes for Beam. This
>> issue is a blocker for me.
>>
>> Thank you.
>> Jeremy Bloom
>>
>
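
On Jeremy's Sink question specifically: the trick in the TextIO/CSVSink
examples is that the writer field is never serialized at all; it is created
in open(), which runs on the worker after the Sink has been deserialized. A
minimal sketch (FileIO.Sink is the real interface; the class itself is
illustrative):

    import java.io.IOException;
    import java.io.PrintWriter;
    import java.nio.channels.Channels;
    import java.nio.channels.WritableByteChannel;
    import org.apache.beam.sdk.io.FileIO;

    class LineSink implements FileIO.Sink<String> {
        // transient: excluded from serialization. Safe only because open() is
        // guaranteed to run (and re-create it) before write()/flush() are called.
        private transient PrintWriter writer;

        @Override
        public void open(WritableByteChannel channel) throws IOException {
            writer = new PrintWriter(Channels.newOutputStream(channel));
        }

        @Override
        public void write(String element) throws IOException {
            writer.println(element);
        }

        @Override
        public void flush() throws IOException {
            writer.flush();
        }
    }

A non-transient PrintWriter field gets captured at construction time and
fails serialization, which matches the exception described above; a transient
field assigned anywhere other than open() simply arrives null on the worker.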


How to Write a Beam IO

2023-06-13 Thread John Casey via dev
By request, I'm resharing my slides and doc on how to write a beam IO


https://docs.google.com/document/d/1-WxZTNu9RrLhh5O7Dl5PbnKqz3e5gm1x3gDBBhszVF8/edit?usp=sharing

https://docs.google.com/presentation/d/14PjBNFoCOFOROiQCdR3hkbg1fkDGSz-0Aer4L88P8Uk/edit?usp=sharing

Please feel free to comment on any questions you have, or places you feel
need more detail

John


Dataloss Bug in BigQuery IO Storage Write when used in Batch

2023-05-03 Thread John Casey via dev
Hi All,

Per https://github.com/apache/beam/issues/26521 and
https://github.com/apache/beam/issues/26520, there is an issue in Beam
versions 2.33 - 2.47 where data can be lost when using the Storage Write
API in Batch. This issue is much more likely to occur in versions 2.44-2.47.

The bugs themselves have been resolved; thanks @Reuven Lax for the quick
fixes.

For now, the recommended workaround is to use the File Loads method of
BigQuery Write instead.
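
For reference, switching methods is a one-line change on the write transform.
A hedged sketch (the method enums are real BigQueryIO API; "rows" is an
assumed PCollection<TableRow>, and the table name and disposition are
placeholders):

    rows.apply("WriteToBigQuery",
        BigQueryIO.writeTableRows()
            .to("my-project:my_dataset.my_table")
            // FILE_LOADS instead of STORAGE_WRITE_API until on a fixed version.
            .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
            .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));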

I am ooo for the next week, but we will be doing a postmortem when I return
to identify how a gap like this occurred, and how to prevent it in
the future.

Thanks,
John


Re: [Proposal] Automate Release Signing

2023-05-03 Thread John Casey via dev
+1 to this as well.

On Wed, May 3, 2023 at 3:10 PM Robert Burke  wrote:

> +1 to simplifying release processes, since it leads to a more consistent
> experience.
>
> If we continue to reduce release overhead we'll be able to react with more
> agility when CVEs come a knocking.
>
> On Wed, May 3, 2023, 12:08 PM Jack McCluskey via dev 
> wrote:
>
>> +1 to automating release signing. As it stands now, this step requires a
>> PMC member to add a new release manager's GPG key which can add time to
>> getting a release started. This also results in the public key used to sign
>> each release changing from one version to the next, as different release
>> managers have different keys. Making releases easier to perform and
>> providing a standard signing key for each release both seem like wins here.
>>
>> On Wed, May 3, 2023 at 2:40 PM Danny McCormick via dev <
>> dev@beam.apache.org> wrote:
>>
>>> Hey everyone, I'm currently working on improving our release process so
>>> that it's easier and faster for us to release. As part of this work, I'd
>>> like to propose automating our release signing step (the push java
>>> artifacts step of build_release_candidate.sh
>>> )
>>> using GitHub Actions.
>>>
>>> To do this, we can follow the guide here and ask the Infra team to add a
>>> signing key that we can use to run the workflow. Basically, the asks
>>> would be:
>>>
>>> 1) Add a signing key (and passphrase) as GH Actions Secrets so that we
>>> can sign the artifacts.
>>> 2) Allowlist a GitHub Action (crazy-max/ghaction-import-gpg) to use the
>>> key to sign artifacts.
>>> 3) Add an Apache token (name and password) as GH Actions Secrets so that
>>> we can upload the signed artifacts to Nexus.
>>>
>>> Please let me know if you have any questions or concerns. If nobody
>>> objects or raises more discussion points, I will assume lazy consensus
>>>  after 72
>>> hours.
>>>
>>> Thanks,
>>> Danny
>>>
>>


Re: [ANNOUNCE] New committer: Damon Douglas

2023-04-25 Thread John Casey via dev
Congrats Damon!

On Tue, Apr 25, 2023 at 9:36 AM Yi Hu via dev  wrote:

> Congrats Damon!
>
> On Tue, Apr 25, 2023 at 8:55 AM Ritesh Ghorse via dev 
> wrote:
>
>> Congratulations Damon!
>>
>> On Tue, Apr 25, 2023 at 12:03 AM Byron Ellis via dev 
>> wrote:
>>
>>> Congrats Damon!
>>>
>>> On Mon, Apr 24, 2023 at 8:57 PM Austin Bennett 
>>> wrote:
>>>
 thanks for all you do @Damon Douglas  !

 On Mon, Apr 24, 2023 at 1:00 PM Robert Burke 
 wrote:

> Congratulations Damon!!!
>
> On Mon, Apr 24, 2023, 12:52 PM Kenneth Knowles 
> wrote:
>
>> Hi all,
>>
>> Please join me and the rest of the Beam PMC in welcoming a new
>> committer: Damon Douglas (damondoug...@apache.org)
>>
>> Damon has contributed widely: Beam Katas, playground, infrastructure,
>> and many IO connectors. Damon does lots of code review in addition to 
>> code.
>> (yes, you can review code as a non-committer!)
>>
>> Considering their contributions to the project over this timeframe,
>> the Beam PMC trusts Damon with the responsibilities of a Beam committer. 
>> [1]
>>
>> Thank you Damon! And we are looking to see more of your contributions!
>>
>> Kenn, on behalf of the Apache Beam PMC
>>
>> [1]
>>
>> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>>
>


Re: [VOTE] Release 2.45.0, Release Candidate #1

2023-02-16 Thread John Casey via dev
Thanks Luke

On Thu, Feb 16, 2023 at 12:06 PM Luke Cwik  wrote:

> All the PMC finalization tasks have been completed.
>
> On Thu, Feb 16, 2023 at 8:56 AM Luke Cwik  wrote:
>
>> I'll help out.
>>
>> On Thu, Feb 16, 2023 at 7:08 AM John Casey via dev 
>> wrote:
>>
>>> Can a PMC member help me with PMC only release finalization?
>>> https://beam.apache.org/contribute/release-guide/#pmc-only-finalization
>>>
>>> Thanks,
>>> John
>>>
>>> On Wed, Feb 15, 2023 at 12:22 PM John Casey 
>>> wrote:
>>>
>>>> With 7 approving votes, of which 5 are binding, 2.45.0 RC1 has been
>>>> approved.
>>>>
>>>> Binding votes are:
>>>>
>>>> Luke Cwik
>>>> Chamikara Jayalath
>>>> Ahmet Altay
>>>> Alexey Romanenko
>>>> Robert Bradshaw
>>>>
>>>> There are no disapproving votes.
>>>>
>>>> Thanks for your validations everyone,
>>>> John
>>>>
>>>> On Mon, Feb 13, 2023 at 5:19 PM Ritesh Ghorse via dev <
>>>> dev@beam.apache.org> wrote:
>>>>
>>>>> +1 (non-binding)
>>>>> Validated:
>>>>> 1. Go SDK Quickstart on Direct & Dataflow runner.
>>>>> 2. Dataframe wrapper
>>>>> 3. RunInference Wrapper for Sklearn
>>>>>
>>>>> On Mon, Feb 13, 2023 at 2:56 PM Robert Bradshaw via dev <
>>>>> dev@beam.apache.org> wrote:
>>>>>
>>>>>> +1 (binding)
>>>>>>
>>>>>> Validated release artifacts and signatures and tested a couple of
>>>>>> Python pipelines.
>>>>>>
>>>>>> On Mon, Feb 13, 2023 at 8:57 AM Alexey Romanenko
>>>>>>  wrote:
>>>>>> >
>>>>>> > +1 (binding)
>>>>>> >
>>>>>> > Tested with  https://github.com/Talend/beam-samples/
>>>>>> > (Java SDK v8/v11/v17, Spark 3.x runner).
>>>>>> >
>>>>>> > ---
>>>>>> > Alexey
>>>>>> >
>>>>>> > On 13 Feb 2023, at 17:54, Ahmet Altay via dev 
>>>>>> wrote:
>>>>>> >
>>>>>> > +1 (binding) - I validated python quick starts on direct runner and
>>>>>> python streaming quickstart on dataflow.
>>>>>> >
>>>>>> > Thank you!
>>>>>> >
>>>>>> > On Mon, Feb 13, 2023 at 5:17 AM Bruno Volpato via dev <
>>>>>> dev@beam.apache.org> wrote:
>>>>>> >>
>>>>>> >> +1 (non-binding)
>>>>>> >>
>>>>>> >> Tested with
>>>>>> https://github.com/GoogleCloudPlatform/DataflowTemplates (Java SDK
>>>>>> 11, Dataflow runner).
>>>>>> >>
>>>>>> >>
>>>>>> >> Thanks!
>>>>>> >>
>>>>>> >> On Mon, Feb 13, 2023 at 1:13 AM Chamikara Jayalath via dev <
>>>>>> dev@beam.apache.org> wrote:
>>>>>> >>>
>>>>>> >>> +1 (binding)
>>>>>> >>>
>>>>>> >>> Tried several Java and Python multi-language pipelines.
>>>>>> >>>
>>>>>> >>> Thanks,
>>>>>> >>> Cham
>>>>>> >>>
>>>>>> >>> On Fri, Feb 10, 2023 at 1:52 PM Luke Cwik via dev <
>>>>>> dev@beam.apache.org> wrote:
>>>>>> >>>>
>>>>>> >>>> +1
>>>>>> >>>>
>>>>>> >>>> Validated release artifact signatures and verified the Java
>>>>>> Flink and Spark quickstarts.
>>>>>> >>>>
>>>>>> >>>> On Fri, Feb 10, 2023 at 9:27 AM John Casey via dev <
>>>>>> dev@beam.apache.org> wrote:
>>>>>> >>>>>
>>>>>> >>>>> Addendum to above email.
>>>>>> >>>>>
>>>>>> >>>>> Java artifacts were built with Gradle 7.5.1 and OpenJDK
>>>>>> 1.8.0_362
>>>>>> >>>>>
>>>>>> >>>>> On Fri, Feb 10

Re: [ANNOUNCE] New PMC Member: Jan Lukavský

2023-02-16 Thread John Casey via dev
Thanks Jan!

On Thu, Feb 16, 2023 at 11:11 AM Danny McCormick via dev <
dev@beam.apache.org> wrote:

> Congratulations!
>
> On Thu, Feb 16, 2023 at 11:09 AM Reza Rokni via dev 
> wrote:
>
>> Congratulations!
>>
>> On Thu, Feb 16, 2023 at 7:47 AM Robert Burke  wrote:
>>
>>> Congratulations!🎉
>>>
>>> On Thu, Feb 16, 2023, 7:44 AM Danielle Syse via dev 
>>> wrote:
>>>
 Congrats, Jan! That's awesome news. Thank you for your continued
 contributions!

 On Thu, Feb 16, 2023 at 10:42 AM Alexey Romanenko <
 aromanenko@gmail.com> wrote:

> Hi all,
>
> Please join me and the rest of the Beam PMC in welcoming Jan Lukavský <
> j...@apache.org> as our newest PMC member.
>
> Jan has been a part of Beam community and a long time contributor
> since 2018 in many significant ways, including code contributions in
> different areas, participating in technical discussions, advocating for
> users, giving a talk at Beam Summit and even writing one of the few Beam
> books!
>
> Congratulations Jan and thanks for being a part of Apache Beam!
>
> ---
> Alexey




Re: [VOTE] Release 2.45.0, Release Candidate #1

2023-02-16 Thread John Casey via dev
Can a PMC member help me with PMC only release finalization?
https://beam.apache.org/contribute/release-guide/#pmc-only-finalization

Thanks,
John

On Wed, Feb 15, 2023 at 12:22 PM John Casey  wrote:

> With 7 approving votes, of which 5 are binding, 2.45.0 RC1 has been
> approved.
>
> Binding votes are:
>
> Luke Cwik
> Chamikara Jayalath
> Ahmet Altay
> Alexey Romanenko
> Robert Bradshaw
>
> There are no disapproving votes.
>
> Thanks for your validations everyone,
> John
>
> On Mon, Feb 13, 2023 at 5:19 PM Ritesh Ghorse via dev 
> wrote:
>
>> +1 (non-binding)
>> Validated:
>> 1. Go SDK Quickstart on Direct & Dataflow runner.
>> 2. Dataframe wrapper
>> 3. RunInference Wrapper for Sklearn
>>
>> On Mon, Feb 13, 2023 at 2:56 PM Robert Bradshaw via dev <
>> dev@beam.apache.org> wrote:
>>
>>> +1 (binding)
>>>
>>> Validated release artifacts and signatures and tested a couple of
>>> Python pipelines.
>>>
>>> On Mon, Feb 13, 2023 at 8:57 AM Alexey Romanenko
>>>  wrote:
>>> >
>>> > +1 (binding)
>>> >
>>> > Tested with  https://github.com/Talend/beam-samples/
>>> > (Java SDK v8/v11/v17, Spark 3.x runner).
>>> >
>>> > ---
>>> > Alexey
>>> >
>>> > On 13 Feb 2023, at 17:54, Ahmet Altay via dev 
>>> wrote:
>>> >
>>> > +1 (binding) - I validated python quick starts on direct runner and
>>> python streaming quickstart on dataflow.
>>> >
>>> > Thank you!
>>> >
>>> > On Mon, Feb 13, 2023 at 5:17 AM Bruno Volpato via dev <
>>> dev@beam.apache.org> wrote:
>>> >>
>>> >> +1 (non-binding)
>>> >>
>>> >> Tested with https://github.com/GoogleCloudPlatform/DataflowTemplates
>>> (Java SDK 11, Dataflow runner).
>>> >>
>>> >>
>>> >> Thanks!
>>> >>
>>> >> On Mon, Feb 13, 2023 at 1:13 AM Chamikara Jayalath via dev <
>>> dev@beam.apache.org> wrote:
>>> >>>
>>> >>> +1 (binding)
>>> >>>
>>> >>> Tried several Java and Python multi-language pipelines.
>>> >>>
>>> >>> Thanks,
>>> >>> Cham
>>> >>>
>>> >>> On Fri, Feb 10, 2023 at 1:52 PM Luke Cwik via dev <
>>> dev@beam.apache.org> wrote:
>>> >>>>
>>> >>>> +1
>>> >>>>
>>> >>>> Validated release artifact signatures and verified the Java Flink
>>> and Spark quickstarts.
>>> >>>>
>>> >>>> On Fri, Feb 10, 2023 at 9:27 AM John Casey via dev <
>>> dev@beam.apache.org> wrote:
>>> >>>>>
>>> >>>>> Addendum to above email.
>>> >>>>>
>>> >>>>> Java artifacts were built with Gradle 7.5.1 and OpenJDK 1.8.0_362
>>> >>>>>
>>> >>>>> On Fri, Feb 10, 2023 at 11:14 AM John Casey <
>>> theotherj...@google.com> wrote:
>>> >>>>>>
>>> >>>>>> Hi everyone,
>>> >>>>>> Please review and vote on the release candidate #1 for the
>>> version 2.45.0, as follows:
>>> >>>>>> [ ] +1, Approve the release
>>> >>>>>> [ ] -1, Do not approve the release (please provide specific
>>> comments)
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> Reviewers are encouraged to test their own use cases with the
>>> release candidate, and vote +1 if no issues are found.
>>> >>>>>>
>>> >>>>>> The complete staging area is available for your review, which
>>> includes:
>>> >>>>>> * GitHub Release notes [1],
>>> >>>>>> * the official Apache source release to be deployed to
>>> dist.apache.org [2], which is signed with the key with fingerprint
>>> 921F35F5EC5F5DDE [3],
>>> >>>>>> * all artifacts to be deployed to the Maven Central Repository
>>> [4],
>>> >>>>>> * source code tag "v2.45.0-RC1" [5],
>>> >>>>>> * website pull request listing the release [6], the blog post
>>> [6], and publishing the API reference manual [7].

Re: [VOTE] Release 2.45.0, Release Candidate #1

2023-02-15 Thread John Casey via dev
With 7 approving votes, of which 5 are binding, 2.45.0 RC1 has been
approved.

Binding votes are:

Luke Cwik
Chamikara Jayalath
Ahmet Altay
Alexey Romanenko
Robert Bradshaw

There are no disapproving votes.

Thanks for your validations everyone,
John

On Mon, Feb 13, 2023 at 5:19 PM Ritesh Ghorse via dev 
wrote:

> +1 (non-binding)
> Validated:
> 1. Go SDK Quickstart on Direct & Dataflow runner.
> 2. Dataframe wrapper
> 3. RunInference Wrapper for Sklearn
>
> On Mon, Feb 13, 2023 at 2:56 PM Robert Bradshaw via dev <
> dev@beam.apache.org> wrote:
>
>> +1 (binding)
>>
>> Validated release artifacts and signatures and tested a couple of
>> Python pipelines.
>>
>> On Mon, Feb 13, 2023 at 8:57 AM Alexey Romanenko
>>  wrote:
>> >
>> > +1 (binding)
>> >
>> > Tested with  https://github.com/Talend/beam-samples/
>> > (Java SDK v8/v11/v17, Spark 3.x runner).
>> >
>> > ---
>> > Alexey
>> >
>> > On 13 Feb 2023, at 17:54, Ahmet Altay via dev 
>> wrote:
>> >
>> > +1 (binding) - I validated python quick starts on direct runner and
>> python streaming quickstart on dataflow.
>> >
>> > Thank you!
>> >
>> > On Mon, Feb 13, 2023 at 5:17 AM Bruno Volpato via dev <
>> dev@beam.apache.org> wrote:
>> >>
>> >> +1 (non-binding)
>> >>
>> >> Tested with https://github.com/GoogleCloudPlatform/DataflowTemplates
>> (Java SDK 11, Dataflow runner).
>> >>
>> >>
>> >> Thanks!
>> >>
>> >> On Mon, Feb 13, 2023 at 1:13 AM Chamikara Jayalath via dev <
>> dev@beam.apache.org> wrote:
>> >>>
>> >>> +1 (binding)
>> >>>
>> >>> Tried several Java and Python multi-language pipelines.
>> >>>
>> >>> Thanks,
>> >>> Cham
>> >>>
>> >>> On Fri, Feb 10, 2023 at 1:52 PM Luke Cwik via dev <
>> dev@beam.apache.org> wrote:
>> >>>>
>> >>>> +1
>> >>>>
>> >>>> Validated release artifact signatures and verified the Java Flink
>> and Spark quickstarts.
>> >>>>
>> >>>> On Fri, Feb 10, 2023 at 9:27 AM John Casey via dev <
>> dev@beam.apache.org> wrote:
>> >>>>>
>> >>>>> Addendum to above email.
>> >>>>>
>> >>>>> Java artifacts were built with Gradle 7.5.1 and OpenJDK 1.8.0_362
>> >>>>>
>> >>>>> On Fri, Feb 10, 2023 at 11:14 AM John Casey <
>> theotherj...@google.com> wrote:
>> >>>>>>
>> >>>>>> Hi everyone,
>> >>>>>> Please review and vote on the release candidate #1 for the version
>> 2.45.0, as follows:
>> >>>>>> [ ] +1, Approve the release
>> >>>>>> [ ] -1, Do not approve the release (please provide specific
>> comments)
>> >>>>>>
>> >>>>>>
>> >>>>>> Reviewers are encouraged to test their own use cases with the
>> release candidate, and vote +1 if no issues are found.
>> >>>>>>
>> >>>>>> The complete staging area is available for your review, which
>> includes:
>> >>>>>> * GitHub Release notes [1],
>> >>>>>> * the official Apache source release to be deployed to
>> dist.apache.org [2], which is signed with the key with fingerprint
>> 921F35F5EC5F5DDE [3],
>> >>>>>> * all artifacts to be deployed to the Maven Central Repository [4],
>> >>>>>> * source code tag "v2.45.0-RC1" [5],
>> >>>>>> * website pull request listing the release [6], the blog post [6],
>> and publishing the API reference manual [7].
>> >>>>>> * Java artifacts were built with Gradle GRADLE_VERSION and
>> OpenJDK/Oracle JDK JDK_VERSION.
>> >>>>>> * Python artifacts are deployed along with the source release to
>> the dist.apache.org [2] and PyPI[8].
>> >>>>>> * Go artifacts and documentation are available at pkg.go.dev [9]
>> >>>>>> * Validation sheet with a tab for the 2.45.0 release to help with
>> validation [10].
>> >>>>>> * Docker images published to Docker Hub [11].
>> >>>>>>
>> >>>>>> The vote will be open for at least 72 hours. It is adopted by
>> majority approval, with at least 3 PMC affirmative votes.
>> >>>>>>
>> >>>>>> For guidelines on how to try the release in your projects, check
>> out our blog post at /blog/validate-beam-release/.
>> >>>>>>
>> >>>>>> Thanks,
>> >>>>>> John Casey
>> >>>>>>
>> >>>>>> [1] https://github.com/apache/beam/milestone/8
>> >>>>>> [2] https://dist.apache.org/repos/dist/dev/beam/2.45.0/
>> >>>>>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>> >>>>>> [4]
>> https://repository.apache.org/content/repositories/orgapachebeam-1293/
>> >>>>>> [5] https://github.com/apache/beam/tree/v2.45.0-RC1
>> >>>>>> [6] https://github.com/apache/beam/pull/25407
>> >>>>>> [7] https://github.com/apache/beam-site/pull/640
>> >>>>>> [8] https://pypi.org/project/apache-beam/2.45.0rc1/
>> >>>>>> [9]
>> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.45.0-RC1/go/pkg/beam
>> >>>>>> [10]
>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=2030665842
>> >>>>>> [11] https://hub.docker.com/search?q=apache%2Fbeam&type=image
>> >
>> >
>>
>


Re: [VOTE] Release 2.45.0, Release Candidate #1

2023-02-10 Thread John Casey via dev
Addendum to above email.

Java artifacts were built with Gradle 7.5.1 and OpenJDK 1.8.0_362

On Fri, Feb 10, 2023 at 11:14 AM John Casey  wrote:

> Hi everyone,
> Please review and vote on the release candidate #1 for the version 2.45.0,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
>
> Reviewers are encouraged to test their own use cases with the release
> candidate, and vote +1 if no issues are found.
>
> The complete staging area is available for your review, which includes:
> * GitHub Release notes [1],
> * the official Apache source release to be deployed to dist.apache.org
> [2], which is signed with the key with fingerprint 921F35F5EC5F5DDE [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "v2.45.0-RC1" [5],
> * website pull request listing the release [6], the blog post [6], and
> publishing the API reference manual [7].
> * Java artifacts were built with Gradle GRADLE_VERSION and OpenJDK/Oracle
> JDK JDK_VERSION.
> * Python artifacts are deployed along with the source release to the
> dist.apache.org [2] and PyPI[8].
> * Go artifacts and documentation are available at pkg.go.dev [9]
> * Validation sheet with a tab for the 2.45.0 release to help with validation
> [10].
> * Docker images published to Docker Hub [11].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> For guidelines on how to try the release in your projects, check out our
> blog post at /blog/validate-beam-release/.
>
> Thanks,
> John Casey
>
> [1] https://github.com/apache/beam/milestone/8
> [2] https://dist.apache.org/repos/dist/dev/beam/2.45.0/
> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> [4] https://repository.apache.org/content/repositories/orgapachebeam-1293/
> [5] https://github.com/apache/beam/tree/v2.45.0-RC1
> [6] https://github.com/apache/beam/pull/25407
> [7] https://github.com/apache/beam-site/pull/640
> [8] https://pypi.org/project/apache-beam/2.45.0rc1/
> [9]
> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.45.0-RC1/go/pkg/beam
> [10]
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=2030665842
> [11] https://hub.docker.com/search?q=apache%2Fbeam&type=image
>


[VOTE] Release 2.45.0, Release Candidate #1

2023-02-10 Thread John Casey via dev
Hi everyone,
Please review and vote on the release candidate #1 for the version 2.45.0,
as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)


Reviewers are encouraged to test their own use cases with the release
candidate, and vote +1 if no issues are found.

The complete staging area is available for your review, which includes:
* GitHub Release notes [1],
* the official Apache source release to be deployed to dist.apache.org [2],
which is signed with the key with fingerprint 921F35F5EC5F5DDE [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "v2.45.0-RC1" [5],
* website pull request listing the release [6], the blog post [6], and
publishing the API reference manual [7].
* Java artifacts were built with Gradle GRADLE_VERSION and OpenJDK/Oracle
JDK JDK_VERSION.
* Python artifacts are deployed along with the source release to the
dist.apache.org [2] and PyPI[8].
* Go artifacts and documentation are available at pkg.go.dev [9]
* Validation sheet with a tab for the 2.45.0 release to help with validation
[10].
* Docker images published to Docker Hub [11].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

For guidelines on how to try the release in your projects, check out our
blog post at /blog/validate-beam-release/.

Thanks,
John Casey

[1] https://github.com/apache/beam/milestone/8
[2] https://dist.apache.org/repos/dist/dev/beam/2.45.0/
[3] https://dist.apache.org/repos/dist/release/beam/KEYS
[4] https://repository.apache.org/content/repositories/orgapachebeam-1293/
[5] https://github.com/apache/beam/tree/v2.45.0-RC1
[6] https://github.com/apache/beam/pull/25407
[7] https://github.com/apache/beam-site/pull/640
[8] https://pypi.org/project/apache-beam/2.45.0rc1/
[9]
https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.45.0-RC1/go/pkg/beam
[10]
https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=2030665842
[11] https://hub.docker.com/search?q=apache%2Fbeam&type=image


Beam Release 2.45

2023-01-10 Thread John Casey via dev
Hi All,

I propose we cut 2.45 on January 18, and I nominate myself as the release
manager.
This is a week later than the scheduled Jan 11 date, but it would give the
2.44 release time to finish its processes, allowing 2.45 to pick up any
fixes from the 2.44 release.

Thanks,
John


How to write an IO guide draft

2023-01-09 Thread John Casey via dev
Hi All,

I spent the last few weeks of December drafting a "How to write an IO
guide":
https://docs.google.com/document/d/1-WxZTNu9RrLhh5O7Dl5PbnKqz3e5gm1x3gDBBhszVF8/edit#

and an associated code sample: https://github.com/apache/beam/pull/24799

My goal is to make it easier for a new IO developer to create a new IO from
scratch. This is intended to complement the various standards documents that
have been floating around: where those prescribe the structure of an IO, this
guide focuses on the mechanics of its internal design.

Please take a look and let me know what you think,

John


Re: SchemaTransformProvider | Java class naming convention

2022-11-15 Thread John Casey via dev
One distinction here is the difference between the URN for a provider /
transform and the class name in Java.

We should have a standard for both, but they are distinct.
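
For illustration only (the URN is a hypothetical value following the
cross-language URN scheme Cham links below; the class name follows normal
Java package naming):

    // Java class name - unique by construction under package naming:
    org.apache.beam.sdk.io.gcp.bigquery.BigQueryWriteSchemaTransformProvider

    // URN returned by SchemaTransformProvider.identifier() - unique and
    // stable only by convention, so it needs its own standard:
    "beam:schematransform:org.apache.beam:bigquery_write:v1"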

On Tue, Nov 15, 2022 at 3:39 PM Chamikara Jayalath via dev <
dev@beam.apache.org> wrote:

>
>
> On Tue, Nov 15, 2022 at 11:50 AM Damon Douglas via dev <
> dev@beam.apache.org> wrote:
>
>> Hello Everyone,
>>
>> Do we like the following Java class naming convention for
>> SchemaTransformProviders [1]?  The proposal is:
>>
>> <IO>(Read|Write)SchemaTransformProvider
>>
>>
>> *For those new to Beam, even if this is your first day, consider
>> yourselves a welcome contributor to this conversation.  Below are
>> definitions/references and a suggested learning guide to understand this
>> email.*
>>
>> Explanation
>>
>> The <IO> prefix identifies the Beam I/O [2] and Read or Write identifies a
>> read or write PTransform, respectively.
>>
>
> Schema-aware transforms are not restricted to I/Os. An arbitrary transform
> can be a SchemaTransform. Also, the designation Read/Write does not map to an
> arbitrary transform. Probably we should try to make this more generic?
>
> Also, probably what's more important is the identifier of the
> SchemaTransformProvider being unique, not the class name (the latter is
> guaranteed to be unique if we follow the Java package naming guidelines).
>
> FWIW, we came up with a similar generic URN naming scheme for
> cross-language transforms:
> https://beam.apache.org/documentation/programming-guide/#1314-defining-a-urn
>
> Thanks,
> Cham
>
>
>> For example, a SchemaTransformProvider [1] for
>> BigQueryIO.Write[7] would look like:
>>
>> BigQueryWriteSchemaTransformProvider
>>
>>
>> And a SchemaTransformProvider for PubSubIO.Read[8] would
>> look like:
>>
>> PubsubReadSchemaTransformProvider
>>
>>
>> Definitions/References
>>
>> [1] *SchemaTransformProvider*: A way for us to instantiate Beam IO
>> transforms using a language-agnostic configuration.
>> SchemaTransformProvider builds a SchemaTransform[3] from a Beam Row[4] that
>> functions as the configuration of that SchemaTransform.
>>
>> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/transforms/SchemaTransformProvider.html
>>
>> [2] *Beam I/O*: PTransform for reading from or writing to sources and
>> sinks.
>> https://beam.apache.org/documentation/programming-guide/#pipeline-io
>>
>> [3] *SchemaTransform*: An interface containing a buildTransform method
>> that returns a PCollectionRowTuple[6]-to-PCollectionRowTuple PTransform.
>>
>> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/transforms/SchemaTransform.html
>>
>> [4] *Row*: A Beam Row is a generic element of data whose properties are
>> defined by a Schema[5].
>>
>> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/values/Row.html
>>
>> [5] *Schema*: A description of expected field names and their data types.
>>
>> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/Schema.html
>>
>> [6] *PCollectionRowTuple*: A grouping of Beam Rows[4] into a single
>> PInput or POutput tagged by a String name.
>>
>> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/values/PCollectionRowTuple.html
>>
>> [7] *BigQueryIO.Write*: A PTransform for writing Beam elements to a
>> BigQuery table.
>>
>> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html
>>
>> [8] *PubSubIO.Read*: A PTransform for reading from Pub/Sub and emitting
>> message payloads into a PCollection.
>>
>> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/pubsub/PubsubIO.Read.html
>>
>> Suggested Learning/Reading to understand this email
>>
>> 1. https://beam.apache.org/documentation/programming-guide/#overview
>> 2. https://beam.apache.org/documentation/programming-guide/#transforms
>> (Up to 4.1)
>> 3. https://beam.apache.org/documentation/programming-guide/#pipeline-io
>> 4. https://beam.apache.org/documentation/programming-guide/#schemas
>>
>


Re: [ANNOUNCE] New committer: Yi Hu

2022-11-09 Thread John Casey via dev
Congrats! this is well deserved YI

On Wed, Nov 9, 2022 at 2:58 PM Austin Bennett 
wrote:

> Congrats, and Thanks, Yi!
>
> On Wed, Nov 9, 2022 at 11:24 AM Valentyn Tymofieiev via dev <
> dev@beam.apache.org> wrote:
>
>> I am with the Beam PMC on this, congratulations and very well deserved,
>> Yi!
>>
>> On Wed, Nov 9, 2022 at 11:08 AM Byron Ellis via dev 
>> wrote:
>>
>>> Congratulations!
>>>
>>> On Wed, Nov 9, 2022 at 11:00 AM Pablo Estrada via dev <
>>> dev@beam.apache.org> wrote:
>>>
 +1 thanks Yi : D

 On Wed, Nov 9, 2022 at 10:47 AM Danny McCormick via dev <
 dev@beam.apache.org> wrote:

> Congrats Yi! I've really appreciated the ways you've consistently
> taken responsibility for improving our team's infra and working through
> sharp edges in the codebase that others have ignored. This is definitely
> well deserved!
>
> Thanks,
> Danny
>
> On Wed, Nov 9, 2022 at 1:37 PM Anand Inguva via dev <
> dev@beam.apache.org> wrote:
>
>> Congratulations Yi!
>>
>> On Wed, Nov 9, 2022 at 1:35 PM Ritesh Ghorse via dev <
>> dev@beam.apache.org> wrote:
>>
>>> Congratulations Yi!
>>>
>>> On Wed, Nov 9, 2022 at 1:34 PM Ahmed Abualsaud via dev <
>>> dev@beam.apache.org> wrote:
>>>
 Congrats Yi!

 On Wed, Nov 9, 2022 at 1:33 PM Sachin Agarwal via dev <
 dev@beam.apache.org> wrote:

> Congratulations Yi!
>
> On Wed, Nov 9, 2022 at 10:32 AM Kenneth Knowles 
> wrote:
>
>> Hi all,
>>
>> Please join me and the rest of the Beam PMC in welcoming a new
>> committer: Yi Hu (y...@apache.org)
>>
>> Yi started contributing to Beam in early 2022. Yi's contributions
>> are very diverse! I/Os, performance tests, Jenkins, support for 
>> Schema
>> logical types. Not only code but a very large amount of code review. 
>> Yi is
>> also noted for picking up smaller issues that normally would be left 
>> on the
>> backburner and filing issues that he finds rather than ignoring them.
>>
>> Considering their contributions to the project over this
>> timeframe, the Beam PMC trusts Yi with the responsibilities of a Beam
>> committer. [1]
>>
>> Thank you Yi! And we are looking to see more of your
>> contributions!
>>
>> Kenn, on behalf of the Apache Beam PMC
>>
>> [1]
>>
>> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>>
>


Re: [ANNOUNCE] New committer: Ritesh Ghorse

2022-11-04 Thread John Casey via dev
Congrats!

On Fri, Nov 4, 2022 at 10:36 AM Ahmed Abualsaud via dev 
wrote:

> Congrats Ritesh!
>
> On Fri, Nov 4, 2022 at 10:29 AM Andy Ye via dev 
> wrote:
>
>> Congrats Ritesh!
>>
>> On Fri, Nov 4, 2022 at 9:26 AM Kerry Donny-Clark via dev <
>> dev@beam.apache.org> wrote:
>>
>>> Congratulations Ritesh, I'm happy to see your hard work and community
>>> spirit recognized!
>>>
>>> On Fri, Nov 4, 2022 at 10:16 AM Jack McCluskey via dev <
>>> dev@beam.apache.org> wrote:
>>>
 Congrats Ritesh!

 On Thu, Nov 3, 2022 at 10:12 PM Danny McCormick via dev <
 dev@beam.apache.org> wrote:

> Congrats Ritesh! This is definitely well deserved!
>
> On Thu, Nov 3, 2022 at 8:08 PM Robert Burke 
> wrote:
>
>> Woohoo! Well done Ritesh! :D
>>
>> On Thu, Nov 3, 2022, 5:04 PM Anand Inguva via dev <
>> dev@beam.apache.org> wrote:
>>
>>> Congratulations Ritesh.
>>>
>>> On Thu, Nov 3, 2022 at 7:51 PM Yi Hu via dev 
>>> wrote:
>>>
 Congratulations Ritesh!

 On Thu, Nov 3, 2022 at 7:23 PM Byron Ellis via dev <
 dev@beam.apache.org> wrote:

> Congratulations!
>
> On Thu, Nov 3, 2022 at 4:21 PM Austin Bennett <
> whatwouldausti...@gmail.com> wrote:
>
>> Congratulations, and Thanks @riteshgho...@apache.org!
>>
>> On Thu, Nov 3, 2022 at 4:17 PM Sachin Agarwal via dev <
>> dev@beam.apache.org> wrote:
>>
>>> Congrats Ritesh!
>>>
>>> On Thu, Nov 3, 2022 at 4:16 PM Kenneth Knowles 
>>> wrote:
>>>
 Hi all,

 Please join me and the rest of the Beam PMC in welcoming a new
 committer: Ritesh Ghorse (riteshgho...@apache.org)

 Ritesh started contributing to Beam in mid-2021 and has
 contributed immensely to bringing the Go SDK to fruition, in
 addition to
 contributions to Java and Python and release validation.

 Considering their contributions to the project over this
 timeframe, the Beam PMC trusts Ritesh with the responsibilities of 
 a Beam
 committer. [1]

 Thank you Ritesh! And we are looking to see more of your
 contributions!

 Kenn, on behalf of the Apache Beam PMC

 [1]

 https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer

>>>


Re: Rest connector

2022-09-12 Thread John Casey via dev
We don't have a generic REST API IO at the moment.

There has been some discussion, but as far as I know there haven't been any
IOs developed.

A factor here is that most HTTP-based services aren't strictly REST, which
makes the development of an easy-to-use, but sufficiently generic, IO
challenging.
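
In the meantime, the usual workaround is a plain DoFn wrapping an HTTP
client. A minimal sketch, assuming a hypothetical JSON endpoint (the URL,
payload type, and class name are placeholders, not an existing Beam API):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import org.apache.beam.sdk.transforms.DoFn;

    class PostToServiceFn extends DoFn<String, String> {
      private transient HttpClient client;

      @Setup
      public void setup() {
        // One client per DoFn instance, reused across bundles.
        client = HttpClient.newHttpClient();
      }

      @ProcessElement
      public void processElement(@Element String json, OutputReceiver<String> out)
          throws Exception {
        HttpRequest request =
            HttpRequest.newBuilder()
                .uri(URI.create("https://example.com/api/records")) // placeholder
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
        HttpResponse<String> response =
            client.send(request, HttpResponse.BodyHandlers.ofString());
        out.output(response.body());
      }
    }

You'd still need to handle retries, batching, and rate limiting yourself,
which is exactly the part that makes a sufficiently generic IO hard.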

On Mon, Sep 12, 2022 at 3:03 PM Ahmet Altay  wrote:

> /cc @John Casey 
>
> On Mon, Sep 12, 2022 at 12:03 PM Reddy annapureddy, Vijayas (V.) <
> vredd...@ford.com> wrote:
>
>> Hi Team,
>>
>> We are planning to use Apache Beam for a new microservice project which
>> is in the discovery phase. The Apache Beam pipeline will read data from GCS
>> and send data to another microservice via a REST API.
>>
>> Wanted to check if there is any connector already in place for this use
>> case.
>>
>> Thanks,
>> Vijay
>>
>


Re: SingleStore IO design doc

2022-09-07 Thread John Casey via dev
Hi Adalbert,

This looks good. I've added a few comments to consider.

John

On Wed, Sep 7, 2022 at 5:33 AM Adalbert Makarovych <
amakarovych...@singlestore.com> wrote:

> Hi,
>
> I'm working on the SingleStore IO connector.
> Here  is a GitHub task for
> it.
> It is going to be a large change, so I created a design doc
> 
> for it.
> I would appreciate any review/comments regarding it :)
>
> --
> Adalbert Makarovych
> Software Engineer at SingleStore
>
>
> 
>


Re: SingleStore IO

2022-08-25 Thread John Casey via dev
Hi Adalbert,

The nature of scheduling work with splittable DoFns is such that trying to
start all splits at the same time isn't really supported. In addition, the
general assumption of splitting work in Beam is that a split can be retried
in isolation from other splits, which doesn't appear to be supported by
SingleStore's parallel read.

That said, this looks really promising, so I'd be happy to get on a call to
help better understand your design, and see if we can find a solution.

John

On Thu, Aug 25, 2022 at 10:16 AM Adalbert Makarovych <
amakarovych...@singlestore.com> wrote:

> Hello,
>
> I'm working on the SingleStore IO connector and would like to discuss it
> with Beam developers.
> It would be great if the connector could use SingleStore parallel read.
> In the ideal case, the connector should use Single-read mode as it is
> faster than Multiple-read and consumes much less memory.
>
> One of the problems is that in Single-read mode, each reader must initiate
> its read query before any readers will receive data. Is it possible to
> somehow configure Beam to start all DoFns at the same time? Or to get the
> numbers of started DoFns at the runtime?
>
> The other problem is that Single-read allows reading data from a partition
> only once, so if one reading thread fails, all others should be restarted
> to retry. Is it possible to achieve this behavior? Or to at least
> gracefully fail without additional retries?
>
> Here are the first drafts of the design documentation
> 
> .
> I would appreciate any help with this stuff :)
>
> --
> Adalbert Makarovych
> Software Engineer at SingleStore
>
>
> 
>


Re: KafkaIO.java.configuredKafkaCommit() inconsistency

2022-08-09 Thread John Casey via dev
Thanks for the quick turnaround on this

On Mon, Aug 8, 2022 at 9:34 PM Balázs Németh  wrote:

> thanks, see https://github.com/apache/beam/issues/22631 +
> https://github.com/apache/beam/pull/22633
>
> John Casey via dev  ezt írta (időpont: 2022. aug.
> 8., H, 21:30):
>
>> Which, looking at your message again, would imply that the
>> configuredKafkaCommit() method shouldn't inspect isolation.level.
>>
>> On Mon, Aug 8, 2022 at 3:27 PM John Casey  wrote:
>>
>>> .withReadCommitted() doesn't commit messages when read; instead, it
>>> specifies that the Kafka consumer should only read messages that have
>>> themselves been committed to Kafka.
>>>
>>> Its use is for exactly-once applications.
>>>
>>>
>>>
>>> On Mon, Aug 8, 2022 at 3:16 PM Balázs Németh 
>>> wrote:
>>>
>>>> I have been reading from Kafka and trying to figure out which offset
>>>> management would be the best for my use-case. During that I noticed
>>>> something odd.
>>>>
>>>>
>>>> https://github.com/apache/beam/blob/c9c161d018d116504fd5f83b48853a7071ec9ce4/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L2359-L2362
>>>>
>>>> private boolean configuredKafkaCommit() {
>>>>   return getConsumerConfig().get("isolation.level") ==
>>>> "read_committed"
>>>>   ||
>>>> Boolean.TRUE.equals(getConsumerConfig().get(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG));
>>>> }
>>>>
>>>> https://github.com/apache/beam/blob/c9c161d018d116504fd5f83b48853a7071ec9ce4/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L2292-L2298
>>>>
>>>> https://github.com/apache/beam/blob/c9c161d018d116504fd5f83b48853a7071ec9ce4/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L2321-L2334
>>>>
>>>> The name of the method, and how it's being used in the code certainly
>>>> suggest that using read_committed isolation level handles and commits
>>>> Kafka offsets. Seemed strange, but I'm not a Kafka pro, so let's test it.
>>>> Well, it does not.
>>>>
>>>> - using ONLY ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG does commit it
>>>> - using ONLY commitOffsetsInFinalize() does commit it
>>>>
>>>> - using ONLY withReadCommitted() does NOT commit it
>>>>
>>>> Dataflow, 2.40.0 Java SDK, without explicitly enabling SDF-read
>>>>
>>>> So is it a bug, or what am I missing here?
>>>>
>>>> If it is indeed a bug, then is it with the read_committed (so it should
>>>> commit it although found no explicit documentation about that anywhere), or
>>>> having that isolation level shouldn't prefer the commit in the finalize and
>>>> that method is wrong?
>>>>
>>>>
>>>>
>>>>


Re: KafkaIO.java.configuredKafkaCommit() inconsistency

2022-08-08 Thread John Casey via dev
Which, looking at your message again, would imply that the
configuredKafkaCommit() method shouldn't inspect isolation.level.
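
i.e., something along these lines (a sketch of the direction only, not
necessarily what an actual fix would look like):

    private boolean configuredKafkaCommit() {
      // isolation.level only controls which records the consumer *reads*
      // (read_committed vs. read_uncommitted); it never commits offsets, so
      // only the consumer's auto-commit counts as Kafka-managed committing.
      return Boolean.TRUE.equals(
          getConsumerConfig().get(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG));
    }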

On Mon, Aug 8, 2022 at 3:27 PM John Casey  wrote:

> .withReadCommitted() doesn't commit messages when read; instead, it
> specifies that the Kafka consumer should only read messages that have
> themselves been committed to Kafka.
>
> Its use is for exactly-once applications.
>
>
>
> On Mon, Aug 8, 2022 at 3:16 PM Balázs Németh 
> wrote:
>
>> I have been reading from Kafka and trying to figure out which offset
>> management would be the best for my use-case. During that I noticed
>> something odd.
>>
>>
>> https://github.com/apache/beam/blob/c9c161d018d116504fd5f83b48853a7071ec9ce4/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L2359-L2362
>>
>> private boolean configuredKafkaCommit() {
>>   return getConsumerConfig().get("isolation.level") ==
>> "read_committed"
>>   ||
>> Boolean.TRUE.equals(getConsumerConfig().get(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG));
>> }
>>
>> https://github.com/apache/beam/blob/c9c161d018d116504fd5f83b48853a7071ec9ce4/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L2292-L2298
>>
>> https://github.com/apache/beam/blob/c9c161d018d116504fd5f83b48853a7071ec9ce4/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L2321-L2334
>>
>> The name of the method, and how it's being used in the code certainly
>> suggest that using read_committed isolation level handles and commits
>> Kafka offsets. Seemed strange, but I'm not a Kafka pro, so let's test it.
>> Well, it does not.
>>
>> - using ONLY ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG does commit it
>> - using ONLY commitOffsetsInFinalize() does commit it
>>
>> - using ONLY withReadCommitted() does NOT commit it
>>
>> Dataflow, 2.40.0 Java SDK, without explicitly enabling SDF-read
>>
>> So is it a bug, or what am I missing here?
>>
>> If it is indeed a bug, then is it with the read_committed (so it should
>> commit it although found no explicit documentation about that anywhere), or
>> having that isolation level shouldn't prefer the commit in the finalize and
>> that method is wrong?
>>
>>
>>
>>


Re: KafkaIO.java.configuredKafkaCommit() inconsistency

2022-08-08 Thread John Casey via dev
.withReadCommitted() doesn't commit messages when read; instead, it
specifies that the Kafka consumer should only read messages that have
themselves been committed to Kafka.

Its use is for exactly-once applications.
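
To make the distinction concrete, here is a minimal sketch (bootstrap
servers, topic, and deserializers are placeholder values):

    KafkaIO.<String, String>read()
        .withBootstrapServers("localhost:9092") // placeholder
        .withTopic("events") // placeholder
        .withKeyDeserializer(StringDeserializer.class)
        .withValueDeserializer(StringDeserializer.class)
        // Only *read* records from committed transactions; commits nothing:
        .withReadCommitted()
        // To actually commit consumed offsets back to Kafka, use either this:
        .commitOffsetsInFinalize();
        // ... or the consumer's auto-commit:
        // .withConsumerConfigUpdates(
        //     ImmutableMap.of(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, true))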



On Mon, Aug 8, 2022 at 3:16 PM Balázs Németh  wrote:

> I have been reading from Kafka and trying to figure out which offset
> management would be the best for my use-case. During that I noticed
> something odd.
>
>
> https://github.com/apache/beam/blob/c9c161d018d116504fd5f83b48853a7071ec9ce4/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L2359-L2362
>
> private boolean configuredKafkaCommit() {
>   return getConsumerConfig().get("isolation.level") == "read_committed"
>   ||
> Boolean.TRUE.equals(getConsumerConfig().get(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG));
> }
>
> https://github.com/apache/beam/blob/c9c161d018d116504fd5f83b48853a7071ec9ce4/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L2292-L2298
>
> https://github.com/apache/beam/blob/c9c161d018d116504fd5f83b48853a7071ec9ce4/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L2321-L2334
>
> The name of the method, and how it's being used in the code certainly
> suggest that using read_committed isolation level handles and commits
> Kafka offsets. Seemed strange, but I'm not a Kafka pro, so let's test it.
> Well, it does not.
>
> - using ONLY ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG does commit it
> - using ONLY commitOffsetsInFinalize() does commit it
>
> - using ONLY withReadCommitted() does NOT commit it
>
> Dataflow, 2.40.0 Java SDK, without explicitly enabling SDF-read
>
> So is it a bug, or what am I missing here?
>
> If it is indeed a bug, then is it with the read_committed (so it should
> commit it although found no explicit documentation about that anywhere), or
> having that isolation level shouldn't prefer the commit in the finalize and
> that method is wrong?
>
>
>
>


Re: [ANNOUNCE] New committer: John Casey

2022-08-01 Thread John Casey via dev
Thanks! I'm looking forward to continuing to improve all of our connectors.

On Mon, Aug 1, 2022 at 1:52 PM Yichi Zhang via dev 
wrote:

> Congratulations John!
>
> On Sat, Jul 30, 2022 at 4:23 AM Robert Burke  wrote:
>
>> Woohoo! Congrats John and welcome to committership!
>>
>> On Fri, Jul 29, 2022, 10:07 PM Kenneth Knowles  wrote:
>>
>>> Hi all,
>>>
>>> Please join me and the rest of the Beam PMC in welcoming
>>> a new committer: John Casey (johnca...@apache.org)
>>>
>>> John started contributing to Beam in late 2021. John has quickly become
>>> our resident expert on KafkaIO - identifying bugs, making enhancements,
>>> helping users - in addition to a variety of other contributions.
>>>
>>> Considering his contributions to the project over this timeframe, the
>>> Beam PMC trusts John with the responsibilities of a Beam committer. [1]
>>>
>>> Thank you John! And we are looking to see more of your contributions!
>>>
>>> Kenn, on behalf of the Apache Beam PMC
>>>
>>> [1]
>>> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>>>
>>


Re: Checkpoints timing out upgrading from Beam version 2.29 with Flink 1.12 to Beam 2.38 and Flink 1.14

2022-07-27 Thread John Casey via dev
Would it be possible to recreate the experiments to try and isolate
variables? Right now the three cases change both the Beam and Flink versions.



On Tue, Jul 26, 2022 at 11:35 PM Kenneth Knowles  wrote:

> Bumping this and adding +John Casey  who knows
> about KafkaIO and unbounded sources, though probably less about the
> FlinkRunner. It seems you have isolated it to the Flink translation logic.
> I'm not sure who would be the best expert to evaluate if that logic is
> still OK.
>
> Kenn
>
> On Wed, Jun 29, 2022 at 11:07 AM Kathula, Sandeep <
> sandeep_kath...@intuit.com> wrote:
>
>> Hi,
>>
>>We have a stateless application which
>>
>>
>>
>>    1. Reads from Kafka
>>    2. Does some stateless transformations by reading from in-memory
>>    databases and updating the records
>>    3. Writes back to Kafka.
>>
>>
>>
>>
>>
>>
>>
>> *With Beam 2.23 and Flink 1.9, we are seeing checkpoints working fine
>> (they take at most 1 min).*
>>
>>
>>
>> *With Beam 2.29 and Flink 1.12, we are seeing checkpoints taking longer
>> (sometimes up to 6-7 min).*
>>
>>
>>
>> *With Beam 2.38 and Flink 1.14, we are seeing checkpoints timing out
>> after 10 minutes.*
>>
>>
>>
>>
>>
>> I have been checking the Beam code, and after some logging and analysis
>> found that the problem is at
>> https://github.com/apache/beam/blob/release-2.38.0/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/UnboundedSourceWrapper.java#L287-L307
>>
>>
>>
>>
>>
>> We are still using the old API to read from Kafka and not yet using KafkaIO
>> based on SplittableDoFn.
>>
>>
>>
>> There are two threads
>>
>>    1. Legacy source thread reading from Kafka and doing the entire
>>    processing.
>>    2. Thread which emits the watermark on a timer
>>
>> https://github.com/apache/beam/blob/release-2.38.0/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/UnboundedSourceWrapper.java#L454-L474
>>
>>
>>
>> Both of these code blocks are in a synchronized block waiting for the same
>> checkpoint lock. Under heavy load, the thread reading from Kafka runs
>> forever in the while loop, so the thread emitting the watermarks waits
>> forever for the lock, never emits the watermarks, and the checkpoint times
>> out.
>>
>>
>>
>>
>>
>> Is this a known issue, and do we have any solution here? For now we are
>> adding a Thread.sleep(1) once every 10 seconds after the synchronized block
>> so that the thread emitting the watermarks can be unblocked and run.
>>
>>
>>
>> One of my colleagues tried to follow up on this (attaching the previous
>> email here) but we didn’t get any reply. Any help on this would be
>> appreciated.
>>
>>
>>
>> Thanks,
>>
>> Sandeep
>>
>


Disabling Kafka IO SDF implementation

2022-07-15 Thread John Casey via dev
Hi All,

There is an issue right now where Kafka IO's SDF implementation isn't
resuming properly when the pipeline restarts:
https://github.com/apache/beam/issues/21730.

In addition, there was an issue where Kafka SDF wasn't committing properly
when 'commit in finalize' was specified, and I believe there may also be an
issue with restriction tracking, though I haven't confirmed that.

Because of these issues, I don't have a good degree of trust in Kafka SDF,
and because these aren't edge cases, I don't believe there are many Kafka
SDF users at the moment.

As such, I've raised https://github.com/apache/beam/pull/22261 to disable
Kafka SDF temporarily (Thanks @Chamikara Jayalath  for
setting up the new experiment that will allow Kafka Unbounded to continue
working wrapped in SDF for runners that require SDF).

In addition, https://github.com/apache/beam/issues/22303 tracks the work
required to test, fix, and ensure that Kafka SDF is stable.

Thanks,
John