Re: [Kafka Connect] Dead letter queues for source connectors?

2024-03-07 Thread Chris Egerton
Hey Greg,

Thinking more, I do like the idea of a source-side equivalent of the
ErrantRecordReporter interface!

However, I also suspect we may have to reason more carefully about what
users could do with this kind of information in a DLQ topic. Yes, it's an
option to reset the connector (or a copy of it) to the earliest unprocessed
partition/offset in the DLQ topic and start processing data anew from
there, but this might lead to a flood of duplicates. IIRC it also wouldn't
even be this simple if we were to tolerate duplicates--users would have to
reset the connector to the offset just before the one present in the DLQ,
since resetting to the actual offset would make it appear to the connector
as if the record with that offset were produced and committed successfully.
Maybe we could include both the offset for the failed record, and the
offset for the last record before that one with the same source
partition--basically saying to the user "This is the one that failed, and
this is the one that needs to be reset to if you want to try again".

I'm starting to wonder if, for now, we could try to design something that's
useful for metadata and for manual intervention, but perhaps not for direct
usage with a connector (e.g., we wouldn't expect people to take the
contents of the DLQ topic and use it to start up a second connector for the
sole purpose of slurping up previously-dropped records). Thoughts?
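
For concreteness, a rough sketch of what such a metadata-only DLQ entry might carry -- this is a hypothetical class for illustration, not an existing or proposed Kafka Connect API:

```java
import java.util.Map;

/**
 * Hypothetical sketch only -- no such class exists in Kafka Connect today.
 * It models a DLQ entry that records both the offset of the record that
 * failed and the offset of the last record before it for the same source
 * partition: "this is the one that failed, and this is the one to reset to
 * if you want to try again".
 */
public class SourceDlqEntry {
    private final Map<String, ?> sourcePartition;
    private final Map<String, ?> failedOffset;
    private final Map<String, ?> resetOffset; // offset just before the failed one

    public SourceDlqEntry(Map<String, ?> sourcePartition,
                          Map<String, ?> failedOffset,
                          Map<String, ?> resetOffset) {
        this.sourcePartition = sourcePartition;
        this.failedOffset = failedOffset;
        this.resetOffset = resetOffset;
    }

    public Map<String, ?> resetOffset() {
        return resetOffset;
    }

    public static void main(String[] args) {
        // Illustrative partition/offset maps; real values depend entirely
        // on the connector.
        SourceDlqEntry entry = new SourceDlqEntry(
                Map.of("table", "orders"),  // source partition
                Map.of("pos", 42L),         // offset of the record that failed
                Map.of("pos", 41L));        // offset a user would reset to
        System.out.println("reset to: " + entry.resetOffset());
    }
}
```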

Cheers,

Chris

On Tue, Mar 5, 2024 at 6:19 PM Greg Harris 
wrote:

> Hey Chris,
>
> That's a cool idea! That can certainly be applied for failures other
> than poll(), and could be useful when combined with the Offsets
> modification API.
>
> Perhaps failures inside of poll() can be handled by an extra
> mechanism, similar to the ErrantRecordReporter, which allows reporting
> affected source partition/source offsets when a meaningful key or
> value cannot be read.
>
> Thanks,
> Greg
>
> On Tue, Mar 5, 2024 at 3:03 PM Chris Egerton 
> wrote:
> >
> > Hi Greg,
> >
> > This was my understanding as well--if we can't turn a record into a byte
> > array on the source side, it's difficult to know exactly what to write
> > to a DLQ topic.
> >
> > One idea I've toyed with recently is that we could write the source
> > partition and offset for the failed record (assuming, hopefully safely,
> > that these can at least be serialized). This may not cover all bases, is
> > highly dependent on how user-friendly the offsets published by the
> > connector are, and does come with the risk of data loss (if the upstream
> > system is wiped before skipped records can be recovered), but could be
> > useful in some scenarios.
> >
> > Thoughts?
> >
> > Chris
> >
> > On Tue, Mar 5, 2024 at 5:49 PM Greg Harris  >
> > wrote:
> >
> > > Hi Yeikel,
> > >
> > > Thanks for your question. It certainly isn't clear from the original
> > > KIP-298, the attached discussion, or the follow-up KIP-610 as to why
> > > the situation is asymmetric.
> > >
> > > The reason as I understand it is: Source connectors are responsible
> > > for importing data to Kafka. If an error occurs during this process,
> > > then writing useful information to a dead letter queue about the
> > > failure is at least as difficult as importing the record correctly.
> > >
> > > For some examples:
> > > * If an error occurs during poll(), the external data has not yet been
> > > transformed into a SourceRecord that the framework can transform or
> > > serialize.
> > > * If an error occurs during conversion/serialization, the external
> > > data cannot be reasonably serialized to be forwarded to the DLQ.
> > > * If a record cannot be written to Kafka, such as due to being too
> > > large, the same failure is likely to happen with writing to the DLQ as
> > > well.
> > >
> > > For the Sink side, we already know that the data was properly
> > > serializable and appeared as a ConsumerRecord. That can
> > > be forwarded to the DLQ as-is with a reasonable expectation for
> > > success, with the same data formatting as the source topic.
> > >
> > > If you have a vision for how this can be improved and are interested,
> > > please consider opening a KIP! The situation can certainly be made
> > > better than it is today.
> > >
> > > Thanks!
> > > Greg
> > >
> > > On Tue, Mar 5, 2024 at 5:35 AM Yeikel Santana 
> wrote:
> > > >
> > > > Hi all,
> > > >
> > > > Sink connectors support Dead Letter Queues[1], but Source connectors
> > > 

Re: [Kafka Connect] Dead letter queues for source connectors?

2024-03-05 Thread Chris Egerton
Hi Greg,

This was my understanding as well--if we can't turn a record into a byte
array on the source side, it's difficult to know exactly what to write to a
DLQ topic.

One idea I've toyed with recently is that we could write the source
partition and offset for the failed record (assuming, hopefully safely,
that these can at least be serialized). This may not cover all bases, is
highly dependent on how user-friendly the offsets published by the
connector are, and does come with the risk of data loss (if the upstream
system is wiped before skipped records can be recovered), but could be
useful in some scenarios.
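To illustrate the idea: source partitions and offsets in Connect are plain String-keyed maps of simple values, so they can usually be serialized even when the record itself can't be. A toy sketch (hypothetical helper, not a Connect API; real code would use a proper JSON serializer):

```java
import java.util.Map;
import java.util.stream.Collectors;

/**
 * Illustrative sketch only: a DLQ entry for an unserializable source record
 * could carry just the partition/offset metadata. The hand-rolled JSON-ish
 * formatting here (every value rendered as a string) is for illustration,
 * not production use.
 */
public class DlqMetadata {
    public static String toJson(Map<String, ?> map) {
        return map.entrySet().stream()
                .map(e -> "\"" + e.getKey() + "\":\"" + e.getValue() + "\"")
                .collect(Collectors.joining(",", "{", "}"));
    }

    public static void main(String[] args) {
        // An example of a "user-friendly" offset a human could act on;
        // opaque binary offsets would make the DLQ far less useful.
        Map<String, ?> partition = Map.of("file", "orders.csv");
        Map<String, ?> offset = Map.of("position", 1024L);
        System.out.println("partition=" + toJson(partition)
                + " offset=" + toJson(offset));
    }
}
```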

Thoughts?

Chris

On Tue, Mar 5, 2024 at 5:49 PM Greg Harris 
wrote:

> Hi Yeikel,
>
> Thanks for your question. It certainly isn't clear from the original
> KIP-298, the attached discussion, or the follow-up KIP-610 as to why
> the situation is asymmetric.
>
> The reason as I understand it is: Source connectors are responsible
> for importing data to Kafka. If an error occurs during this process,
> then writing useful information to a dead letter queue about the
> failure is at least as difficult as importing the record correctly.
>
> For some examples:
> * If an error occurs during poll(), the external data has not yet been
> transformed into a SourceRecord that the framework can transform or
> serialize.
> * If an error occurs during conversion/serialization, the external
> data cannot be reasonably serialized to be forwarded to the DLQ.
> * If a record cannot be written to Kafka, such as due to being too
> large, the same failure is likely to happen with writing to the DLQ as
> well.
>
> For the Sink side, we already know that the data was properly
> serializable and appeared as a ConsumerRecord. That can
> be forwarded to the DLQ as-is with a reasonable expectation for
> success, with the same data formatting as the source topic.
>
> If you have a vision for how this can be improved and are interested,
> please consider opening a KIP! The situation can certainly be made
> better than it is today.
>
> Thanks!
> Greg
>
> On Tue, Mar 5, 2024 at 5:35 AM Yeikel Santana  wrote:
> >
> > Hi all,
> >
> > Sink connectors support Dead Letter Queues[1], but Source connectors
> > don't seem to.
> >
> > What is the reason that we decided to do that?
> >
> > In my data pipeline, I'd like to apply some transformations to the
> messages before they are written to the sink, but that leaves me vulnerable to failures as
> I need to either fail the connector or employ logging to track source
> failures
> >
> > It seems that for now, I'll need to apply the transformations as a sink
> and possibly reinsert them back to Kafka for downstream consumption, but
> that sounds unnecessary
> >
> >
> > [1]
> https://cwiki.apache.org/confluence/plugins/servlet/mobile?contentId=80453065#content/view/80453065
>


Re: [ANNOUNCE] Apache Kafka 3.7.0

2024-02-27 Thread Chris Egerton
Thanks for running this release, Stanislav! And thanks to all the
contributors who helped implement all the bug fixes and new features we got
to put out this time around.

On Tue, Feb 27, 2024, 13:03 Stanislav Kozlovski <
stanislavkozlov...@apache.org> wrote:

> The Apache Kafka community is pleased to announce the release of
> Apache Kafka 3.7.0
>
> This is a minor release that includes new features, fixes, and
> improvements from 296 JIRAs
>
> An overview of the release and its notable changes can be found in the
> release blog post:
> https://kafka.apache.org/blog#apache_kafka_370_release_announcement
>
> All of the changes in this release can be found in the release notes:
> https://www.apache.org/dist/kafka/3.7.0/RELEASE_NOTES.html
>
> You can download the source and binary release (Scala 2.12, 2.13) from:
> https://kafka.apache.org/downloads#3.7.0
>
>
> ---
>
>
> Apache Kafka is a distributed streaming platform with four core APIs:
>
>
> ** The Producer API allows an application to publish a stream of records to
> one or more Kafka topics.
>
> ** The Consumer API allows an application to subscribe to one or more
> topics and process the stream of records produced to them.
>
> ** The Streams API allows an application to act as a stream processor,
> consuming an input stream from one or more topics and producing an
> output stream to one or more output topics, effectively transforming the
> input streams to output streams.
>
> ** The Connector API allows building and running reusable producers or
> consumers that connect Kafka topics to existing applications or data
> systems. For example, a connector to a relational database might
> capture every change to a table.
>
>
> With these APIs, Kafka can be used for two broad classes of application:
>
> ** Building real-time streaming data pipelines that reliably get data
> between systems or applications.
>
> ** Building real-time streaming applications that transform or react
> to the streams of data.
>
>
> Apache Kafka is in use at large and small companies worldwide, including
> Capital One, Goldman Sachs, ING, LinkedIn, Netflix, Pinterest, Rabobank,
> Target, The New York Times, Uber, Yelp, and Zalando, among others.
>
> A big thank you to the following 146 contributors to this release!
> (Please report an unintended omission)
>
> Abhijeet Kumar, Akhilesh Chaganti, Alieh, Alieh Saeedi, Almog Gavra,
> Alok Thatikunta, Alyssa Huang, Aman Singh, Andras Katona, Andrew
> Schofield, Anna Sophie Blee-Goldman, Anton Agestam, Apoorv Mittal,
> Arnout Engelen, Arpit Goyal, Artem Livshits, Ashwin Pankaj,
> ashwinpankaj, atu-sharm, bachmanity1, Bob Barrett, Bruno Cadonna,
> Calvin Liu, Cerchie, chern, Chris Egerton, Christo Lolov, Colin
> Patrick McCabe, Colt McNealy, Crispin Bernier, David Arthur, David
> Jacot, David Mao, Deqi Hu, Dimitar Dimitrov, Divij Vaidya, Dongnuo
> Lyu, Eaugene Thomas, Eduwer Camacaro, Eike Thaden, Federico Valeri,
> Florin Akermann, Gantigmaa Selenge, Gaurav Narula, gongzhongqiang,
> Greg Harris, Guozhang Wang, Gyeongwon, Do, Hailey Ni, Hanyu Zheng, Hao
> Li, Hector Geraldino, hudeqi, Ian McDonald, Iblis Lin, Igor Soarez,
> iit2009060, Ismael Juma, Jakub Scholz, James Cheng, Jason Gustafson,
> Jay Wang, Jeff Kim, Jim Galasyn, John Roesler, Jorge Esteban Quilcate
> Otoya, Josep Prat, José Armando García Sancio, Jotaniya Jeel, Jouni
> Tenhunen, Jun Rao, Justine Olshan, Kamal Chandraprakash, Kirk True,
> kpatelatwork, kumarpritam863, Laglangyue, Levani Kokhreidze, Lianet
> Magrans, Liu Zeyu, Lucas Brutschy, Lucia Cerchie, Luke Chen, maniekes,
> Manikumar Reddy, mannoopj, Maros Orsak, Matthew de Detrich, Matthias
> J. Sax, Max Riedel, Mayank Shekhar Narula, Mehari Beyene, Michael
> Westerby, Mickael Maison, Nick Telford, Nikhil Ramakrishnan, Nikolay,
> Okada Haruki, olalamichelle, Omnia G.H Ibrahim, Owen Leung, Paolo
> Patierno, Philip Nee, Phuc-Hong-Tran, Proven Provenzano, Purshotam
> Chauhan, Qichao Chu, Matthias J. Sax, Rajini Sivaram, Renaldo Baur
> Filho, Ritika Reddy, Robert Wagner, Rohan, Ron Dagostino, Roon, runom,
> Ruslan Krivoshein, rykovsi, Sagar Rao, Said Boudjelda, Satish Duggana,
> shuoer86, Stanislav Kozlovski, Taher Ghaleb, Tang Yunzi, TapDang,
> Taras Ledkov, tkuramoto33, Tyler Bertrand, vamossagar12, Vedarth
> Sharma, Viktor Somogyi-Vass, Vincent Jiang, Walker Carlson,
> Wuzhengyu97, Xavier Léauté, Xiaobing Fang, yangy, Ritika Reddy,
> Yanming Zhou, Yash Mayya, yuyli, zhaohaidao, Zihao Lin, Ziming Deng
>
> We welcome your help and feedback. For more information on how to
> report problems, and to get involved, visit the project website at
> https://kafka.apache.org/
>
> Thank you!
>
>
> Regards,
>
> Stanislav Kozlovski
> Release Manager for Apache Kafka 3.7.0
>


Re: [PROPOSAL] Add commercial support page on website

2024-01-11 Thread Chris Egerton
Hi François,

Is it an official policy of the ASF that projects provide a listing of
commercial support options for themselves? I understand that other projects
have chosen to provide one, but this doesn't necessarily imply that all
projects should do the same, and I can't say I find this point very
convincing as a rebuttal to some of the good-faith concerns raised by the
PMC and members of the community so far. However, if there's an official
ASF stance on this topic, then I acknowledge that Apache Kafka should align
with it.

Best,

Chris


On Thu, Jan 11, 2024, 14:50 fpapon  wrote:

> Hi Justine,
>
> I'm not sure I see the difference between "happy users" and vendors
> that advertise their products in the company list on the
> "Powered by" page.
>
> Btw, the initial purpose of my proposal was to help users find
> production support rather than searching on Google.
>
> I don't think this is a bad thing, because this is something that already
> exists in many ASF projects, like:
>
> https://hop.apache.org/community/commercial/
> https://struts.apache.org/commercial-support.html
> https://directory.apache.org/commercial-support.html
> https://tomee.apache.org/commercial-support.html
> https://plc4x.apache.org/users/commercial-support.html
> https://camel.apache.org/community/support/
> https://openmeetings.apache.org/commercial-support.html
> https://guacamole.apache.org/support/
>
> https://cwiki.apache.org/confluence/display/HADOOP2/Distributions+and+Commercial+Support
> https://activemq.apache.org/support
> https://karaf.apache.org/community.html
> https://netbeans.apache.org/front/main/help/commercial-support/
> https://royale.apache.org/royale-commercial-support/
>
> https://karaf.apache.org/community.html
>
> As I understand it, for now the channels for users to find production
> support are:
>
> - The mailing list (u...@kafka.apache.org / d...@kafka.apache.org)
>
> - The official #kafka ASF Slack channel (maybe we can add it to the
> website, because I didn't find it there =>
> https://kafka.apache.org/contact)
>
> - Search in google for commercial support only
>
> I can update my PR to mention only the 3 points above for the "get
> support" page if people think that having a support page makes sense.
>
> regards,
>
> François
>
> On 11/01/2024 19:34, Justine Olshan wrote:
> > I think there is a difference between the "Powered by" page and a page
> for
> > vendors to advertise their products and services.
> >
> > The idea is that the companies on that page are "powered by" Kafka. They
> > serve as examples of happy users of Kafka.
> > I don't think it is meant only as a place just for those companies to
> > advertise.
> >
> > I'm a little confused by
> >
> >> In this case, I'm ok to say that the commercial support section in the
> >> "Get support" page is not needed, as we can use this page.
> >
> > If you plan to submit for this page, please include a description on how
> > your company uses Kafka.
> >
> > I'm happy to hear other folks' opinions on this page as well.
> >
> > Thanks,
> > Justine
> >
> >
> >
> > On Thu, Jan 11, 2024 at 8:57 AM fpapon  wrote:
> >
> >> Hi,
> >>
> >> About the vendors list and neutrality, what is the policy of the
> >> "Powered by" page?
> >>
> >> https://kafka.apache.org/powered-by
> >>
> >> We can see companies with logos: some are talking about their product
> >> (Agoora), some are offering services (Instaclustr, Aiven), and we can
> >> also see some that just put their logo and a link to their website
> >> without any explanation (GoldmanSachs).
> >>
> >> So, as I understand after reading the text in the footer of this
> >> page, every company can add themselves by submitting a PR, right?
> >>
> >> "Want to appear on this page?
> >> Submit a pull request or send a quick description of your organization
> >> and usage to the mailing list and we'll add you."
> >>
> >> In this case, I'm ok to say that the commercial support section in the
> >> "Get support" page is not needed, as we can use this page.
> >>
> >> regards,
> >>
> >> François
> >>
> >>
> >> On 10/01/2024 19:03, Kenneth Eversole wrote:
> >>> I agree with Divij here, and to be more pointed: I worry that if we go
> >>> down the path of adding vendors to a list, it comes off as endorsing
> >>> their products, not to mention it could be a huge security risk for
> >>> novice users. I would rather this be a callout to other purely open
> >>> source tooling, such as Cruise Control.
> >>>
> >>> Divij brings up a good question:
> >>> 1. What value does the addition of this page bring to the users of
> >>> Apache Kafka?
> >>>
> >>> I think the community would be better served by a more synchronous
> >>> line of communication such as Slack/Discord, and we could call that
> >>> out here. It would be more in line with other major open source
> >>> projects.
> >>>
> >>> ---
> >>> Kenneth Eversole
> >>>
> >>> On Wed, Jan 10, 2024 at 10:30 AM Divij Vaidya  >
> >>> wrote:
> >>>
>  I don't see a need for this. What 

[DISCUSS] Kafka Connect source task interruption semantics

2023-12-12 Thread Chris Egerton
Hi all,

I'd like to solicit input from users and maintainers on a problem we've
been dealing with for source task cleanup logic.

If you'd like to pore over some Jira history, here's the primary link:
https://issues.apache.org/jira/browse/KAFKA-15090

To summarize, we accidentally introduced a breaking change for Kafka
Connect in https://github.com/apache/kafka/pull/9669. Before that change,
the SourceTask::stop method [1] would be invoked on a separate thread from
the one that did the actual data processing for the task (polling the task
for records, transforming and converting those records, then sending them
to Kafka). After that change, we began invoking SourceTask::stop on the
same thread that handled data processing for the task. This had the effect
that tasks which blocked indefinitely in the SourceTask::poll method [2]
with the expectation that they could stop blocking when SourceTask::stop
was invoked would no longer be capable of graceful shutdown, and may even
hang forever.

This breaking change was introduced in the 3.0.0 release, a little over two
years ago. Since then, source connectors may have been modified to adapt to
the change in behavior by the Connect framework. As a result, we are
hesitant to go back to the prior logic of invoking SourceTask::stop on a
separate thread (see the linked Jira ticket for more detail on this front).

In https://github.com/apache/kafka/pull/14316, I proposed that we begin
interrupting the data processing thread for the source task after it had
exhausted its graceful shutdown timeout (i.e., when the Kafka Connect
runtime decides to cancel [3], [4], [5] the task). I believe this change is
fairly non-controversial--once a task has failed to shut down gracefully,
the runtime can and should do whatever it wants to force a shutdown,
graceful or otherwise.

With all that context out of the way, the question I'd like to ask is: do
we believe it's also appropriate to interrupt the data processing thread
when the task is scheduled for shutdown [6], [7]? This interruption would
ideally be followed up by a graceful shutdown of the task, which may
require the Kafka Connect runtime to handle a potential
InterruptedException from SourceTask::poll. Other exceptions (such as a
wrapped InterruptedException) would be impossible to handle gracefully, and
may lead to spurious error messages in the logs and failed final offset
commits for connectors that do not work well with this new behavior.

Finally, one important note: in the official documentation for
SourceTask::poll, we do already state that this method should not block for
too long:

> If no data is currently available, this method should block but return
control to the caller regularly (by returning null) in order for the task
to transition to the PAUSED state if requested to do so.
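
A minimal sketch of that pattern -- illustrative class and names only, not the real Connect interfaces -- where poll() blocks with a bounded wait, returns null when there's no data, and surfaces interrupts as InterruptedException:

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/**
 * Illustrative sketch of the cooperative polling pattern described in the
 * SourceTask.poll() javadoc: block with a bounded wait and return null when
 * no data is available, so the calling thread regains control regularly.
 */
public class CooperativePoller {
    private final BlockingQueue<String> buffer = new LinkedBlockingQueue<>();

    public List<String> poll() throws InterruptedException {
        // Bounded wait instead of blocking indefinitely; an interrupt during
        // the wait surfaces here as InterruptedException, which a runtime
        // could treat as a shutdown signal.
        String record = buffer.poll(50, TimeUnit.MILLISECONDS);
        return record == null ? null : List.of(record);
    }

    public void enqueue(String record) {
        buffer.add(record);
    }

    public static void main(String[] args) throws InterruptedException {
        CooperativePoller poller = new CooperativePoller();
        System.out.println(poller.poll());   // no data yet -> null
        poller.enqueue("record-1");
        System.out.println(poller.poll());   // -> [record-1]
    }
}
```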

Looking forward to everyone's thoughts on this tricky issue!

Cheers,

Chris

[1] -
https://kafka.apache.org/36/javadoc/org/apache/kafka/connect/source/SourceTask.html#stop()
[2] -
https://kafka.apache.org/36/javadoc/org/apache/kafka/connect/source/SourceTask.html#poll()
[3] -
https://github.com/apache/kafka/blob/c5ee82cab447b094ad7491200afa319515d5f467/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/Worker.java#L1037
[4] -
https://github.com/apache/kafka/blob/c5ee82cab447b094ad7491200afa319515d5f467/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerTask.java#L129-L136
[5] -
https://github.com/apache/kafka/blob/c5ee82cab447b094ad7491200afa319515d5f467/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/AbstractWorkerSourceTask.java#L284-L297
[6] -
https://github.com/apache/kafka/blob/c5ee82cab447b094ad7491200afa319515d5f467/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/Worker.java#L1014
[7] -
https://github.com/apache/kafka/blob/c5ee82cab447b094ad7491200afa319515d5f467/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerTask.java#L112-L127


Re: [ANNOUNCE] Apache Kafka 3.6.0

2023-10-11 Thread Chris Egerton
Thanks Satish for all your hard work as release manager, and thanks to
everyone else for their contributions!

On Wed, Oct 11, 2023 at 6:07 AM Mickael Maison 
wrote:

> Thanks Satish and to everyone who contributed to this release!
>
> Mickael
>
>
> On Wed, Oct 11, 2023 at 11:09 AM Divij Vaidya 
> wrote:
> >
> > Thank you for all the hard work, Satish.
> >
> > Many years after the KIP-405 was written, we have it implemented and
> > finally available for beta testing for the users. It's a big milestone in
> > 3.6.0. Kudos again to you for driving it to this milestone. I am looking
> > forward to hearing the feedback from users so that we can fix the paper
> > cuts in 3.7.0.
> >
> > --
> > Divij Vaidya
> >
> >
> >
> > On Wed, Oct 11, 2023 at 9:32 AM Viktor Somogyi-Vass
> >  wrote:
> >
> > > Thanks for the release Satish! :)
> > >
> > > On Wed, Oct 11, 2023, 09:30 Bruno Cadonna  wrote:
> > >
> > > > Thanks for the release, Satish!
> > > >
> > > > Best,
> > > > Bruno
> > > >
> > > > On 10/11/23 8:29 AM, Luke Chen wrote:
> > > > > Thanks for running the release, Satish!
> > > > >
> > > > > BTW, 3.6.0 should be a major release, not a minor one. :)
> > > > >
> > > > > Luke
> > > > >
> > > > > On Wed, Oct 11, 2023 at 1:39 PM Satish Duggana  >
> > > > wrote:
> > > > >
> > > > >> The Apache Kafka community is pleased to announce the release for
> > > > >> Apache Kafka 3.6.0
> > > > >>
> > > > >> This is a minor release and it includes fixes and improvements
> from
> > > 238
> > > > >> JIRAs.
> > > > >>
> > > > >> All of the changes in this release can be found in the release
> notes:
> > > > >> https://www.apache.org/dist/kafka/3.6.0/RELEASE_NOTES.html
> > > > >>
> > > > >> An overview of the release can be found in our announcement blog
> post:
> > > > >> https://kafka.apache.org/blog
> > > > >>
> > > > >> You can download the source and binary release (Scala 2.12 and
> Scala
> > > > 2.13)
> > > > >> from:
> > > > >> https://kafka.apache.org/downloads#3.6.0
> > > > >>
> > > > >>
> > > > >>
> > > >
> > >
> ---
> > > > >>
> > > > >>
> > > > >> Apache Kafka is a distributed streaming platform with four core
> APIs:
> > > > >>
> > > > >>
> > > > >> ** The Producer API allows an application to publish a stream of
> > > > records to
> > > > >> one or more Kafka topics.
> > > > >>
> > > > >> ** The Consumer API allows an application to subscribe to one or
> more
> > > > >> topics and process the stream of records produced to them.
> > > > >>
> > > > >> ** The Streams API allows an application to act as a stream
> processor,
> > > > >> consuming an input stream from one or more topics and producing an
> > > > >> output stream to one or more output topics, effectively
> transforming
> > > the
> > > > >> input streams to output streams.
> > > > >>
> > > > >> ** The Connector API allows building and running reusable
> producers or
> > > > >> consumers that connect Kafka topics to existing applications or
> data
> > > > >> systems. For example, a connector to a relational database might
> > > > >> capture every change to a table.
> > > > >>
> > > > >>
> > > > >> With these APIs, Kafka can be used for two broad classes of
> > > application:
> > > > >>
> > > > >> ** Building real-time streaming data pipelines that reliably get
> data
> > > > >> between systems or applications.
> > > > >>
> > > > >> ** Building real-time streaming applications that transform or
> react
> > > > >> to the streams of data.
> > > > >>
> > > > >>
> > > > >> Apache Kafka is in use at large and small

Re: Kafka Connect - Customize REST request headers

2023-10-07 Thread Chris Egerton
Hi Yeikel,

Neat question! And thanks for the link to the RestClient code; very helpful.

I don't believe there's a way to configure Kafka Connect to add these
headers to forwarded requests right now. You may be able to do some kind of
out-of-band proxy magic to intercept forwarded requests and insert the
proper headers there?

I don't see a reason for Kafka Connect to only forward authorization
headers, even after examining the PR [1] and corresponding Jira ticket [2]
that altered the RestClient class to begin including authorization headers
in forwarded REST requests. We may be able to tweak the RestClient to
include all headers instead of just the authorization header. I know that
this doesn't help your immediate situation, but if other committers and
contributors agree that the change would be beneficial, we may be able to
include it in the next release (which may be 3.7.0, or a patch release for
3.4, 3.5, or 3.6). Alternatively, we may have to gate such a change behind
a feature flag (either a coarse-grained boolean that enables/disables
forwarding of all non-authorization headers, or more fine-grained logic
such as include/exclude lists or even regexes), which would require a KIP
and may take longer to release.

I've CC'd the dev list to gather their perspective on this potential
change, and to solicit their input on possible workarounds that may be
useful to you sooner than the next release takes place.

[1] - https://github.com/apache/kafka/pull/6791
[2] - https://issues.apache.org/jira/browse/KAFKA-8404
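
To make the current behavior concrete, here's a toy sketch -- plain maps rather than the real RestClient/Jetty types -- of the filtering that effectively happens today, where only the Authorization header survives the worker-to-worker hop:

```java
import java.util.Map;
import java.util.stream.Collectors;

/**
 * Illustrative only: models today's behavior (forwarding just the
 * Authorization header) versus the possible change discussed above
 * (forwarding all request headers). Not the actual RestClient code.
 */
public class HeaderForwarding {
    public static Map<String, String> authOnly(Map<String, String> incoming) {
        // Keep only the Authorization header, dropping everything else
        return incoming.entrySet().stream()
                .filter(e -> e.getKey().equalsIgnoreCase("Authorization"))
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args) {
        Map<String, String> incoming = Map.of(
                "Authorization", "Bearer abc",
                "X-Firewall-Token", "secret"); // custom header a firewall might require
        // Today only Authorization survives the forwarded request:
        System.out.println(authOnly(incoming));
    }
}
```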

Cheers,

Chris

On Fri, Oct 6, 2023 at 10:14 PM Yeikel Santana  wrote:

> Hello everyone,
>
> I'm currently running Kafka Connect behind a firewall that mandates the
> inclusion of a specific header. This situation becomes particularly
> challenging when forwarding requests among multiple workers, as it appears
> that only the Authorization header is included in the request.
>
> I'm wondering if there's a way to customize the headers of Kafka Connect
> before they are forwarded between workers. From my observations, it seems
> that this capability may not be available[1], and only the response headers
> can be customized.
>
> I'd appreciate any realistic alternatives or suggestions you may have in
> mind.
>
> Thanks!
>
>
>
>
>
>
> [1]
> https://github.com/apache/kafka/blob/trunk/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/rest/RestClient.java#L191-L198


Re: requesting ability to be assigned JIRA ticket

2023-08-15 Thread Chris Egerton
Hi Neil,

You should be good to go now. Thanks for your interest in contributing to
Apache Kafka!

Cheers,

Chris

On Tue, Aug 15, 2023 at 12:28 PM Neil Buesing  wrote:

> Looking to make the minor fix to the documentation for a bug I reported,
> KAFKA-13945, so I need to get my JIRA ID added.
>
> username: nbuesing
>
> Thanks,
> Neil
>
> >if you have not been added to the list, send an email to the users@kafka
> mailing list to request for it).
>


Re: [VOTE] 3.5.1 RC1

2023-07-17 Thread Chris Egerton
Hi Divij,

Thanks for running this release!

To verify, I:
- Built from source using Java 11 with both:
- - the 3.5.1-rc1 tag on GitHub
- - the kafka-3.5.1-src.tgz artifact from
https://home.apache.org/~divijv/kafka-3.5.1-rc1/
- Checked signatures and checksums
- Ran the quickstart using the kafka_2.13-3.5.1.tgz artifact from
https://home.apache.org/~divijv/kafka-3.5.1-rc1/ with Java 11 and Scala 13
in KRaft mode
- Ran all unit tests
- Ran all integration tests for Connect and MM2
- Verified that only version 1.1.10.1 of Snappy is present in the libs/
directory of the unpacked kafka_2.12-3.5.1.tgz and kafka_2.13-3.5.1.tgz
artifacts
- Verified that case-insensitive validation of the security.protocol
property is restored for Kafka clients by setting it to "pLAiNTexT" with
the bin/kafka-topics.sh command (using the --command-config option), and
with a standalone Connect worker (by adjusting the security.protocol,
consumer.security.protocol, producer.security.protocol, and
admin.security.protocol properties in the worker config file)

Everything looks good to me!

+1 (binding)

Cheers,

Chris

On Mon, Jul 17, 2023 at 12:29 PM Federico Valeri 
wrote:

> Hi Divij, I did the following checks:
>
> - Checked signature, checksum, licenses
> - Spot checked documentation and javadoc
> - Built from source with Java 17 and Scala 2.13
> - Ran full unit and integration test suites
> - Ran test Java app using staging Maven artifacts
>
> +1 (non binding)
>
> Cheers
> Fede
>
> On Mon, Jul 17, 2023 at 10:27 AM Divij Vaidya 
> wrote:
> >
> > Hello Kafka users, developers and client-developers,
> >
> > This is the second candidate (RC1) for release of Apache Kafka 3.5.1.
> First
> > release candidate (RC0) was discarded due to incorrect license files.
> They
> > have been fixed since then.
> >
> > This release is a security patch release. It upgrades the dependency,
> > snappy-java, to a version which is not vulnerable to CVE-2023-34455. You
> > can find more information about the CVE at the Kafka CVE list.
> >
> > Additionally, this release fixes a regression introduced in 3.3.0, which
> > caused security.protocol configuration values to be restricted to upper
> > case only. With this release, security.protocol values are
> > case insensitive. See KAFKA-15053 for details.
> >
> > Release notes for the 3.5.1 release:
> > https://home.apache.org/~divijv/kafka-3.5.1-rc1/RELEASE_NOTES.html
> >
> > *** Please download, test and vote by Thursday, July 20, 9am PT
> >
> > Kafka's KEYS file containing PGP keys we use to sign the release:
> > https://kafka.apache.org/KEYS
> >
> > Release artifacts to be voted upon (source and binary):
> > https://home.apache.org/~divijv/kafka-3.5.1-rc1/
> >
> > Maven artifacts to be voted upon:
> > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> >
> > Javadoc:
> > https://home.apache.org/~divijv/kafka-3.5.1-rc1/javadoc/
> >
> > Tag to be voted upon (off 3.5 branch) is the 3.5.1 tag:
> > https://github.com/apache/kafka/releases/tag/3.5.1-rc1
> >
> > Documentation:
> > https://kafka.apache.org/35/documentation.html
> > Please note that documentation will be updated with upgrade notes (
> >
> https://github.com/apache/kafka/commit/4c78fd64454e25e3536e8c7ed5725d3fbe944a49
> )
> > after the release is complete.
> >
> > Protocol:
> > https://kafka.apache.org/35/protocol.html
> >
> > Unit/integration tests:
> > https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.5/43/ (2
> failures)
> > https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.5/42/ (6
> failures)
> > https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.5/39/ (9
> failures)
> >
> > In all 3 runs above, there are no common tests which are failing, which
> > leads me to believe that they are flaky. I have also verified that
> > unit/integration tests on my local machine successfully pass (JDK 17 +
> > Scala 2.13)
> >
> > System tests:
> > Not planning to run system tests since this is a patch release.
> >
> > Thank you.
> >
> > --
> > Divij Vaidya
> > Release Manager for Apache Kafka 3.5.1
>


Re: Kafka Connect exactly-once semantic and very large transactions

2023-06-08 Thread Chris Egerton
Hi Vojta,

From my limited understanding of the Debezium snapshot process, I believe
that you're correct that producing the entire snapshot in a transaction is
the way to provide exactly-once semantics during that phase. If there's a
way to recover in-progress snapshots and skip over already-produced
records, then that could be a suitable alternative.

You're correct that a large transaction timeout may be required to
accommodate this case (we even try to call this out in the error message
that users see on transaction timeouts [1]). I'm not very familiar with
broker logic but with my limited understanding, your assessment of the
impact of delayed log compaction also seems valid.

The only other issue that comes to my mind is that latency will be higher
for downstream consumers since they won't be able to read any records until
the entire transaction is complete, assuming they're using the
read_committed isolation level. But given that this is the snapshotting
phase and you're presumably moving historical data instead of real-time
updates to your database, this should hopefully be acceptable for most
users.
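
The knobs touched on above are standard client and broker configs; a rough sketch for reference (the values are illustrative only, not recommendations, and the `producer.override.` prefix assumes the Connect per-connector client override mechanism is enabled):

```properties
# Source-connector producer: allow a long-running snapshot transaction
# (overrides the default; must not exceed the broker cap below)
producer.override.transaction.timeout.ms=86400000

# Broker: raise the cap on client-requested transaction timeouts to match
transaction.max.timeout.ms=86400000

# Downstream consumers: only read records from committed transactions
isolation.level=read_committed
```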

I'd be interested to hear what someone more familiar with client and broker
internals has to say! Going to be following this thread.

[1] -
https://github.com/apache/kafka/blob/513e1c641d63c5e15144f9fcdafa1b56c5e5ba09/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/ExactlyOnceWorkerSourceTask.java#L357

Cheers,

Chris

On Thu, Jun 8, 2023 at 7:53 AM Vojtech Juranek  wrote:

> Hi,
> I'm investigating possibilities of exactly-once semantics for Debezium [1]
> Kafka Connect source connectors, which implement change data capture for
> various databases. Debezium has two phases: an initial snapshot phase and a
> streaming phase. The initial snapshot phase loads existing data from the
> database and sends it to Kafka; the subsequent streaming phase captures any
> changes to the data.
>
> Exactly-once delivery seems to work really well during the streaming
> phase. Now, I'm investigating how to ensure exactly-once delivery for the
> initial snapshot phase. If the snapshot fails (e.g. due to a DB connection
> drop or a worker node crash), we force a new snapshot after the restart, as
> the data may change during the restart and the snapshot has to reflect the
> state of the data at the time when it was executed. However, re-taking the
> snapshot produces duplicate records in the related Kafka topics.
>
> Probably the easiest solution to this issue is to run the whole snapshot
> in a single Kafka transaction. This may result in a huge transaction,
> containing millions of records, in some cases even billions. As these
> records cannot be consumed until the transaction is committed, and the
> logs therefore cannot be compacted, this could lead to a huge increase in
> the size of the Kafka logs. Also, as this is a time-consuming process for
> large DBs, it would very likely result in transaction timeouts (unless the
> timeout is set to a very large value).
>
> Is my understanding of the impact of very large transactions correct? Are
> there any other drawbacks I'm missing (e.g. can it also result in some
> memory
> issue or something similar)?
>
> Thanks in advance!
> Vojta
>
> [1] https://debezium.io/


Re: [VOTE] 3.4.1 RC3

2023-05-30 Thread Chris Egerton
Hi Luke,

Many thanks for your continued work on this release!

To verify, I:
- Built from source using Java 11 with both:
- - the 3.4.1-rc3 tag on GitHub
- - the kafka-3.4.1-src.tgz artifact from
https://home.apache.org/~showuon/kafka-3.4.1-rc3/
- Checked signatures and checksums
- Ran the quickstart using the kafka_2.13-3.4.1.tgz artifact from
https://home.apache.org/~showuon/kafka-3.4.1-rc3/ with Java 11 and Scala 2.13
in KRaft mode
- Ran all unit tests
- Ran all integration tests for Connect and MM2
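
In case it helps other verifiers, the signature/checksum step can be scripted; a rough sketch (the checksum pattern is demonstrated on a locally created file, and the RC file names are taken from this thread):

```shell
# Checksum verification pattern, shown on a locally created demo file:
printf 'demo artifact' > kafka-demo.tgz
sha512sum kafka-demo.tgz > kafka-demo.tgz.sha512
sha512sum -c kafka-demo.tgz.sha512   # prints "kafka-demo.tgz: OK" on success

# For the real RC artifacts the same pattern applies:
#   gpg --import KEYS                # from https://kafka.apache.org/KEYS
#   gpg --verify kafka-3.4.1-src.tgz.asc kafka-3.4.1-src.tgz
#   sha512sum kafka-3.4.1-src.tgz    # compare against kafka-3.4.1-src.tgz.sha512
```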

+1 (binding)

Cheers,

Chris

On Tue, May 30, 2023 at 11:16 AM Mickael Maison 
wrote:

> Hi Luke,
>
> I built from source with Java 11 and Scala 2.13 and ran the unit and
> integration tests. It took a few retries to get some of them to pass.
> I verified signatures and hashes and also ran the zookeeper quickstart.
>
> +1 (binding)
>
> Thanks,
> Mickael
>
> On Sat, May 27, 2023 at 12:58 PM Jakub Scholz  wrote:
> >
> > +1 (non-binding) ... I used the staged binaries and Maven artifacts to
> run
> > my tests and all seems to work fine.
> >
> > Thanks for running the release.
> >
> > Jakub
> >
> > On Fri, May 26, 2023 at 9:34 AM Luke Chen  wrote:
> >
> > > Hello Kafka users, developers and client-developers,
> > >
> > > This is the 4th candidate for release of Apache Kafka 3.4.1.
> > >
> > > This is a bugfix release with several fixes since the release of
> 3.4.0. A
> > > few of the major issues include:
> > > - core
> > > KAFKA-14644 
> Process
> > > should stop after failure in raft IO thread
> > > KAFKA-14946  KRaft
> > > controller node shutting down while renouncing leadership
> > > KAFKA-14887  ZK
> session
> > > timeout can cause broker to shutdown
> > > - client
> > > KAFKA-14639  Kafka
> > > CooperativeStickyAssignor revokes/assigns partition in one rebalance
> cycle
> > > - connect
> > > KAFKA-12558  MM2
> may
> > > not
> > > sync partition offsets correctly
> > > KAFKA-14666  MM2
> should
> > > translate consumer group offsets behind replication flow
> > > - stream
> > > KAFKA-14172  bug:
> State
> > > stores lose state when tasks are reassigned under EOS
> > >
> > >
> > > Release notes for the 3.4.1 release:
> > > https://home.apache.org/~showuon/kafka-3.4.1-rc3/RELEASE_NOTES.html
> > >
> > > *** Please download, test and vote by Jun 2, 2023
> > >
> > > Kafka's KEYS file containing PGP keys we use to sign the release:
> > > https://kafka.apache.org/KEYS
> > >
> > > * Release artifacts to be voted upon (source and binary):
> > > https://home.apache.org/~showuon/kafka-3.4.1-rc3/
> > >
> > > * Maven artifacts to be voted upon:
> > > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> > >
> > > * Javadoc:
> > > https://home.apache.org/~showuon/kafka-3.4.1-rc3/javadoc/
> > >
> > > * Tag to be voted upon (off 3.4 branch) is the 3.4.1 tag:
> > > https://github.com/apache/kafka/releases/tag/3.4.1-rc3
> > >
> > > * Documentation: (will be updated after released)
> > > https://kafka.apache.org/34/documentation.html
> > >
> > > * Protocol: (will be updated after released)
> > > https://kafka.apache.org/34/protocol.html
> > >
> > > The most recent build has had test failures. These all appear to be
> due to
> > > flakiness, but it would be nice if someone more familiar with the
> failed
> > > tests could confirm this. I may update this thread with passing build
> links
> > > if I can get one, or start a new release vote thread if test failures
> must
> > > be addressed beyond re-running builds until they pass.
> > >
> > > Unit/integration tests:
> > > https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.4/141/
> > >
> > > System tests:
> > > Will update the results later
> > >
> > > Thank you
> > > Luke
> > >
>


Re: [VOTE] 3.4.1 RC1

2023-05-22 Thread Chris Egerton
Hi Luke,

Thanks for running the release!

Steps I took to verify:

- Built from source with Java 11
- Checked signatures and checksums
- Ran the quickstart with Java 11 in KRaft mode
- Ran all unit tests
- - The org.apache.kafka.common.utils.UtilsTest.testToLogDateTimeFormat
test case failed consistently. It appears this is because we haven't
backported a fix for KAFKA-14836 onto the 3.4 branch; after applying those
changes to my local copy of the RC's source code, the test began to pass. I
don't know if we want to count this as a blocker since the test failure is
not indicative of actual issues with the main code base, but it does seem
like a smooth backport is possible and would fix this test failure if we
want to generate a new RC.
- Ran all integration tests for Connect and MM2

Aside from the noted unit test failure, everything else looks good.

Cheers,

Chris

On Mon, May 22, 2023, 10:50 Federico Valeri  wrote:

> Hi Luke,
>
> - Source signature and checksum
> - Build from source with Java 17 and Scala 2.13
> - Full unit and integration test suite
> - Java app with staging Maven artifacts
>
> +1 (non binding)
>
> Thanks
> Fede
>
> PS: Links still point to RC0, but I checked and RC1 artifacts are
> there, including Maven. The only risk I see is that you may actually
> test with the wrong artifacts. To avoid any confusion, I would suggest
> to resend them on this thread.
>
>
>
>
>
>
> On Mon, May 22, 2023 at 2:53 PM Luke Chen  wrote:
> >
> > Hello Kafka users, developers and client-developers,
> >
> > This is the 2nd candidate for release of Apache Kafka 3.4.1.
> >
> > This is a bugfix release with several fixes since the release of 3.4.0. A
> > few of the major issues include:
> > - core
> > KAFKA-14644  Process
> > should stop after failure in raft IO thread
> > KAFKA-14946  KRaft
> > controller node shutting down while renouncing leadership
> > KAFKA-14887  ZK
> session
> > timeout can cause broker to shutdown
> > - client
> > KAFKA-14639  Kafka
> > CooperativeStickyAssignor revokes/assigns partition in one rebalance
> cycle
> > - connect
> > KAFKA-12558  MM2 may
> not
> > sync partition offsets correctly
> > KAFKA-14666  MM2
> should
> > translate consumer group offsets behind replication flow
> > - stream
> > KAFKA-14172  bug:
> State
> > stores lose state when tasks are reassigned under EOS
> >
> > Release notes for the 3.4.1 release:
> > https://home.apache.org/~showuon/kafka-3.4.1-rc0/RELEASE_NOTES.html
> >
> > *** Please download, test and vote by May 29, 2023
> > Kafka's KEYS file containing PGP keys we use to sign the release:
> > https://kafka.apache.org/KEYS
> >
> > * Release artifacts to be voted upon (source and binary):
> > https://home.apache.org/~showuon/kafka-3.4.1-rc0/
> >
> > * Maven artifacts to be voted upon:
> > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> >
> > * Javadoc:
> > https://home.apache.org/~showuon/kafka-3.4.1-rc0/javadoc/
> >
> > * Tag to be voted upon (off 3.4 branch) is the 3.4.1 tag:
> > https://github.com/apache/kafka/releases/tag/3.4.1-rc0
> >
> > * Documentation: (will be updated after released)
> > https://kafka.apache.org/34/documentation.html
> >
> > * Protocol: (will be updated after released)
> > https://kafka.apache.org/34/protocol.html
> >
> > The most recent build has had test failures. These all appear to be due
> to
> > flakiness, but it would be nice if someone more familiar with the failed
> > tests could confirm this. I may update this thread with passing build
> links
> > if I can get one, or start a new release vote thread if test failures
> must
> > be addressed beyond re-running builds until they pass.
> >
> > Unit/integration tests:
> > https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.4/135/
> >
> > System tests:
> > Will update the results later
> >
> > Confirmed Maven artifacts are in staging repository.
> >
> > Thank you.
> > Luke
>


Re: MirrorMaker 2.0 question

2023-03-20 Thread Chris Egerton
Hi Miguel,

How many nodes are you running MM2 with? Just one?

Separately, do you notice anything at ERROR level in the logs?
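
In case it turns out to be a configuration issue: a minimal mm2.properties for a one-way A->B flow looks roughly like this (a sketch based on the stock connect-mirror-maker.properties; the single-node replication factors are examples and should be raised for production):

```properties
clusters = A, B
A.bootstrap.servers = 127.0.0.1:29092
B.bootstrap.servers = 127.0.0.1:29093

# Enable the A->B flow and pick which topics to replicate
A->B.enabled = true
A->B.topics = .*

# Single-broker demo settings; internal topic creation fails silently
# if these exceed the number of available brokers
replication.factor = 1
checkpoints.topic.replication.factor = 1
heartbeats.topic.replication.factor = 1
offset-syncs.topic.replication.factor = 1
offset.storage.replication.factor = 1
status.storage.replication.factor = 1
config.storage.replication.factor = 1
```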

Cheers,

Chris

On Mon, Mar 20, 2023 at 5:35 PM Miguel Ángel Fernández Fernández <
miguelangelprogramac...@gmail.com> wrote:

> Hello,
>
> I'm doing some tests with MirrorMaker 2 but I'm stuck. I have a couple of
> kafka clusters, I think everything is set up correctly. However, when I run
>
> /bin/connect-mirror-maker /var/lib/kafka/data/mm2.properties
>
> the result I get is the creation of the topics
>
> mm2-configs.A.internal,
> mm2-offsets.A.internal,
> mm2-status.A.internal
>
> but not of the specific topics found in cluster A (topic1).
>
> The mm2.properties configuration is basically a copy of
>
>
> https://github.com/apache/kafka/blob/trunk/config/connect-mirror-maker.properties
>
> modifying bootstrap servers
>
> A.bootstrap.servers = 127.0.0.1:29092
> B.bootstrap.servers = 127.0.0.1:29093
>
> Can anyone advise me with this problem? Is it possible that some
> configuration in mm2.properties is missing? Are there any real requirements
> regarding the configuration of the kafka clusters that may be overlooked?
>
> Thank you for your help,
> Miguel Ángel
>


Re: Exactly once kafka connect query

2023-03-16 Thread Chris Egerton
Hi Nitty,

I understand your concerns about preserving intellectual property. Perhaps
to avoid these altogether, instead of a call, you can provide a
reproduction of your issues that is acceptable to share with the public? If
I'm able to successfully diagnose the problem, I can share a summary on the
mailing list, and if things are still unclear, I can join a call to discuss
the publicly-visible example and what seems to be the issue.

Cheers,

Chris

On Thu, Mar 16, 2023 at 5:54 AM NITTY BENNY  wrote:

> Hi Xiaoxia,
>
> I am not able to see the attachments you shared with me, and I don't
> understand the problem you are talking about.
> What do you want me to look at?
>
> Thanks,
> Nitty
>
> On Thu, Mar 16, 2023 at 1:54 AM 小侠 <747359...@qq.com.invalid> wrote:
>
> > Hi Nitty,
> > I'm so sorry to forget the signature.
> > Looking forward to your reply.
> >
> >
> > Thank you,
> > Xiaoxia
> >
> >
> >
> >
> > -- Original Message --
> > *From:* "users" ;
> > *Sent:* Wednesday, March 15, 2023, 6:38 PM
> > *To:* "users";
> > *Subject:* Re: Exactly once kafka connect query
> >
> > Hi Chris,
> >
> > We won't be able to share the source code since it is proprietary
> > Amdocs code.
> >
> > If you have time for a call, I can show you the code and
> > reproduction scenario over the call. I strongly believe you can find the
> > issue with that.
> >
> > Thanks,
> > Nitty
> >
> > On Tue, Mar 14, 2023 at 3:04 PM Chris Egerton 
> > wrote:
> >
> > > Hi Nitty,
> > >
> > > Sorry, I should have clarified. The reason I'm thinking about shutdown
> > here
> > > is that, when exactly-once support is enabled on a Kafka Connect
> cluster
> > > and a new set of task configurations is generated for a connector, the
> > > Connect framework makes an effort to shut down all the old task
> instances
> > > for that connector, and then fences out the transactional producers for
> > all
> > > of those instances. I was thinking that this may lead to the producer
> > > exceptions you are seeing but, after double-checking this assumption,
> > that
> > > does not appear to be the case.
> > >
> > > Would it be possible to share the source code for your connector and a
> > > reproduction scenario for what you're seeing? That may be easier than
> > > coordinating a call.
> > >
> > > Cheers,
> > >
> > > Chris
> > >
> > > On Tue, Mar 14, 2023 at 6:15 AM NITTY BENNY 
> > wrote:
> > >
> > > > Hi Chris,
> > > >
> > > > Is there any possibility to have a call with you? This is actually
> > > > blocking our delivery, and I really want to get it sorted.
> > > >
> > > > Thanks,
> > > > Nitty
> > > >
> > > > On Mon, Mar 13, 2023 at 8:18 PM NITTY BENNY 
> > > wrote:
> > > >
> > > > > Hi Chris,
> > > > >
> > > > > I really don't understand why a graceful shutdown will happen
> during
> > a
> > > > > commit operation? Am I understanding something wrong here? I see
> > > > > this happens when I have a batch of 2 valid records and in the
> second
> > > > > batch the record is invalid. In that case I want to commit the
> valid
> > > > > records. So I called commit and sent an empty list for the current
> > > batch
> > > > to
> > > > > poll() and then when the next file comes in and poll sees new
> > records,
> > > I
> > > > > see InvalidProducerEpochException.
> > > > > Please advise me.
> > > > >
> > > > > Thanks,
> > > > > Nitty
> > > > >
> > > > > On Mon, Mar 13, 2023 at 5:33 PM NITTY BENNY 
> > > > wrote:
> > > > >
> > > > >> Hi Chris,
> > > > >>
> > > > >> The difference is in the Task Classes, no difference for value/key
> > > > >> convertors.
> > > > >>
> > > > >> I don’t see log messages for graceful shutdown. I am not clear on
> > what
> > > > >> you mean by shutting down the task.
> > > > >>
> > > > >> I called the commit operation for the successful records. Should I
> > > > >> perf

Re: Ask to join the contribution list.

2023-03-16 Thread Chris Egerton
Hi Gary,

You should be good to go now.

Cheers,

Chris

On Thu, Mar 16, 2023 at 10:14 AM Gary Lee  wrote:

> Hi,
>
> I just spotted an issue related to Kafka Connect (
> https://github.com/apache/kafka/pull/13398)
>
> I think this issue has been opened at
> https://issues.apache.org/jira/browse/KAFKA-6891. I am willing to resolve
> this issue. But according to the rule I have to join the contribution list
> first. So I am asking for permission here.
>
> My JIRA id is garyparrottt.
>
> Many thanks in advance!
>


Re: Exactly once kafka connect query

2023-03-14 Thread Chris Egerton
Hi Nitty,

Sorry, I should have clarified. The reason I'm thinking about shutdown here
is that, when exactly-once support is enabled on a Kafka Connect cluster
and a new set of task configurations is generated for a connector, the
Connect framework makes an effort to shut down all the old task instances
for that connector, and then fences out the transactional producers for all
of those instances. I was thinking that this may lead to the producer
exceptions you are seeing but, after double-checking this assumption, that
does not appear to be the case.

Would it be possible to share the source code for your connector and a
reproduction scenario for what you're seeing? That may be easier than
coordinating a call.

Cheers,

Chris

On Tue, Mar 14, 2023 at 6:15 AM NITTY BENNY  wrote:

> Hi Chris,
>
> Is there any possibility to have a call with you? This is actually blocking
> our delivery, and I really want to get it sorted.
>
> Thanks,
> Nitty
>
> On Mon, Mar 13, 2023 at 8:18 PM NITTY BENNY  wrote:
>
> > Hi Chris,
> >
> > I really don't understand why a graceful shutdown will happen during a
> > commit operation? Am I understanding something wrong here? I see
> > this happens when I have a batch of 2 valid records and in the second
> > batch the record is invalid. In that case I want to commit the valid
> > records. So I called commit and sent an empty list for the current batch
> to
> > poll() and then when the next file comes in and poll sees new records, I
> > see InvalidProducerEpochException.
> > Please advise me.
> >
> > Thanks,
> > Nitty
> >
> > On Mon, Mar 13, 2023 at 5:33 PM NITTY BENNY 
> wrote:
> >
> >> Hi Chris,
> >>
> >> The difference is in the Task Classes, no difference for value/key
> >> convertors.
> >>
> >> I don’t see log messages for graceful shutdown. I am not clear on what
> >> you mean by shutting down the task.
> >>
> >> I called the commit operation for the successful records. Should I
> >> perform any other steps if I have an invalid record?
> >> Please advise.
> >>
> >> Thanks,
> >> Nitty
> >>
> >> On Mon, Mar 13, 2023 at 3:42 PM Chris Egerton 
> >> wrote:
> >>
> >>> Hi Nitty,
> >>>
> >>> Thanks again for all the details here, especially the log messages.
> >>>
> >>> > The below mentioned issue is happening for Json connector only. Is
> >>> there
> >>> any difference between the asn1, binary, csv and json connectors?
> >>>
> >>> Can you clarify if the difference here is in the Connector/Task
> classens,
> >>> or if it's in the key/value converters that are configured for the
> >>> connector? The key/value converters are configured using the
> >>> "key.converter" and "value.converter" property and, if problems arise
> >>> with
> >>> them, the task will fail and, if it has a non-empty ongoing
> transaction,
> >>> that transaction will be automatically aborted since we close the
> task's
> >>> Kafka producer when it fails (or shuts down gracefully).
> >>>
> >>> With regards to these log messages:
> >>>
> >>> > org.apache.kafka.common.errors.ProducerFencedException: There is a
> >>> newer
> >>> producer with the same transactionalId which fences the current one.
> >>>
> >>> It looks like your tasks aren't shutting down gracefully in time, which
> >>> causes them to be fenced out by the Connect framework later on. Do you
> >>> see
> >>> messages like "Graceful stop of task  failed" in the logs
> >>> for
> >>> your Connect worker?
> >>>
> >>> Cheers,
> >>>
> >>> Chris
> >>>
> >>> On Mon, Mar 13, 2023 at 10:58 AM NITTY BENNY 
> >>> wrote:
> >>>
> >>> > Hi Chris,
> >>> >
> >>> > As you said, the below message is coming when I call an abort if
> there
> >>> is
> >>> > an invalid record, then for the next transaction I can see the below
> >>> > message and then the connector will be stopped.
> >>> > 2023-03-13 14:28:26,043 INFO [json-sftp-source-connector|task-0]
> >>> Aborting
> >>> > transaction for batch as requested by connector
> >>> > (org.apache.kafka.connect.runtime.ExactlyOnceWorkerSourceTask)
> >>> > [task-thread-json-sftp-source-connector-0]
> >>

Re: Exactly once kafka connect query

2023-03-13 Thread Chris Egerton
-thread |
> >> connector-producer-json-sftp-source-connector-0]
> >> 2023-03-12 11:32:45,220 ERROR [json-sftp-source-connector|task-0]
> >> ExactlyOnceWorkerSourceTask{id=json-sftp-source-connector-0} failed to
> send
> >> record to streams-input:
> >> (org.apache.kafka.connect.runtime.AbstractWorkerSourceTask)
> >> [kafka-producer-network-thread |
> >> connector-producer-json-sftp-source-connector-0]
> >> org.apache.kafka.common.errors.InvalidProducerEpochException: Producer
> >> attempted to produce with an old epoch.
> >> 2023-03-12 11:32:45,222 INFO [json-sftp-source-connector|task-0]
> >> [Producer clientId=connector-producer-json-sftp-source-connector-0,
> >> transactionalId=connect-cluster-json-sftp-source-connector-0]
> Transiting to
> >> fatal error state due to
> >> org.apache.kafka.common.errors.ProducerFencedException: There is a newer
> >> producer with the same transactionalId which fences the current one.
> >> (org.apache.kafka.clients.producer.internals.TransactionManager)
> >> [kafka-producer-network-thread |
> >> connector-producer-json-sftp-source-connector-0]
> >> 2023-03-12 11:32:45,222 ERROR [json-sftp-source-connector|task-0]
> >> [Producer clientId=connector-producer-json-sftp-source-connector-0,
> >> transactionalId=connect-cluster-json-sftp-source-connector-0] Aborting
> >> producer batches due to fatal error
> >> (org.apache.kafka.clients.producer.internals.Sender)
> >> [kafka-producer-network-thread |
> >> connector-producer-json-sftp-source-connector-0]
> >> org.apache.kafka.common.errors.ProducerFencedException: There is a newer
> >> producer with the same transactionalId which fences the current one.
> >> 2023-03-12 11:32:45,222 ERROR [json-sftp-source-connector|task-0]
> >> ExactlyOnceWorkerSourceTask{id=json-sftp-source-connector-0} Failed to
> >> flush offsets to storage:
> >> (org.apache.kafka.connect.runtime.ExactlyOnceWorkerSourceTask)
> >> [kafka-producer-network-thread |
> >> connector-producer-json-sftp-source-connector-0]
> >> org.apache.kafka.common.errors.ProducerFencedException: There is a newer
> >> producer with the same transactionalId which fences the current one.
> >> 2023-03-12 11:32:45,224 ERROR [json-sftp-source-connector|task-0]
> >> ExactlyOnceWorkerSourceTask{id=json-sftp-source-connector-0} failed to
> send
> >> record to streams-input:
> >> (org.apache.kafka.connect.runtime.AbstractWorkerSourceTask)
> >> [kafka-producer-network-thread |
> >> connector-producer-json-sftp-source-connector-0]
> >> org.apache.kafka.common.errors.ProducerFencedException: There is a newer
> >> producer with the same transactionalId which fences the current one.
> >> 2023-03-12 11:32:45,222 ERROR
> [json-sftp-source-connector|task-0|offsets]
> >> ExactlyOnceWorkerSourceTask{id=json-sftp-source-connector-0} Failed to
> >> commit producer transaction
> >> (org.apache.kafka.connect.runtime.ExactlyOnceWorkerSourceTask)
> >> [task-thread-json-sftp-source-connector-0]
> >> org.apache.kafka.common.errors.ProducerFencedException: There is a newer
> >> producer with the same transactionalId which fences the current one.
> >> 2023-03-12 11:32:45,225 ERROR [json-sftp-source-connector|task-0]
> >> ExactlyOnceWorkerSourceTask{id=json-sftp-source-connector-0} Task threw
> an
> >> uncaught and unrecoverable exception. Task is being killed and will not
> >> recover until manually restarted
> >> (org.apache.kafka.connect.runtime.WorkerTask)
> >> [task-thread-json-sftp-source-connector-0]
> >>
> >> Do you know why it is showing an abort state even if I call commit?
> >>
> >> I tested one more scenario, When I call the commit I saw the below
> >>
> connect-cluster-json-sftp-source-connector-0::TransactionMetadata(transactionalId=connect-cluster-json-sftp-source-connector-0,
> >> producerId=11, producerEpoch=2, txnTimeoutMs=6, state=*Ongoing*,
> >> pendingState=None, topicPartitions=HashSet(streams-input-2),
> >> txnStartTimestamp=1678620463834, txnLastUpdateTimestamp=1678620463834)
> >> Then, before changing the states to Abort, I dropped the next file then
> I
> >> dont see any issues. Previous transaction
> >> as well as the current transaction are committed.
> >>
> >> Thank you for your support.
> >>
> >> Thanks,
> >> Nitty
> >>
> >> On Fri, Mar 10, 2023 at 8:04 PM Chris Egerton 
> >> wrote:
> >>

Re: Exactly once kafka connect query

2023-03-10 Thread Chris Egerton
Hi Nitty,

> I called commitTransaction when I reach the first error record, but
commit is not happening for me. Kafka connect tries to abort the
transaction automatically

This is really interesting--are you certain that your task never invoked
TransactionContext::abortTransaction in this case? I'm looking over the
code base and it seems fairly clear that the only thing that could trigger
a call to KafkaProducer::abortTransaction is a request by the task to abort
a transaction (either for a next batch, or for a specific record). It may
help to run the connector in a debugger and/or look for "Aborting
transaction for batch as requested by connector" or "Aborting transaction
for record on topic  as requested by connector" log
messages (which will be emitted at INFO level by
the org.apache.kafka.connect.runtime.ExactlyOnceWorkerSourceTask class if
the task is requesting an abort).

Regardless, I'll work on a fix for the bug with aborting empty
transactions. Thanks for helping uncover that one!
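
The fix could take the shape of a worker-side guard; here is a toy, self-contained sketch of the idea (the names are made up for illustration and are not the actual Connect runtime code):

```java
public class AbortGuardSketch {
    // Tracks whether any records have been returned since the last
    // commit/abort, i.e. whether a transaction is meaningfully "open".
    static boolean transactionOpen = false;
    static int abortsForwarded = 0;

    static void recordsReturned() {
        transactionOpen = true;
    }

    // Only forward an abort request to the producer when a transaction is
    // actually open; aborting with nothing open is the failure mode
    // described in KAFKA-14799.
    static void maybeAbort() {
        if (transactionOpen) {
            abortsForwarded++; // stand-in for producer.abortTransaction()
            transactionOpen = false;
        }
    }

    public static void main(String[] args) {
        maybeAbort();          // nothing open: safely ignored
        recordsReturned();
        maybeAbort();          // open: forwarded once
        System.out.println(abortsForwarded); // 1
    }
}
```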

Cheers,

Chris

On Thu, Mar 9, 2023 at 6:36 PM NITTY BENNY  wrote:

> Hi Chris,
>
> We have a use case to commit previous successful records and stop the
> processing of the current file and move on with the next file. To achieve
> that I called commitTransaction when I reach the first error record, but
> commit is not happening for me. Kafka connect tries to abort the
> transaction automatically, I checked the _transaction_state topic and
> states marked as PrepareAbort and CompleteAbort. Do you know why kafka
> connect automatically invokes abort instead of the explicit commit I
> called?
> Then as a result, when I try to parse the next file - say ABC, I saw the
> log messages "Aborting incomplete transaction" and ERROR: "Failed to send
> record to topic", and we lost the first batch of records from the current
> transaction in the file ABC.
>
> Is it possible that there's a case where an abort is being requested while
> the current transaction is empty (i.e., the task hasn't returned any
> records from SourceTask::poll since the last transaction was
> committed/aborted)? --- Yes, that case is possible for us. There is a case
> where the first record itself an error record.
>
> Thanks,
> Nitty
>
> On Thu, Mar 9, 2023 at 3:48 PM Chris Egerton 
> wrote:
>
> > Hi Nitty,
> >
> > Thanks for the code examples and the detailed explanations, this is
> really
> > helpful!
> >
> > > Say if I have a file with 5 records and batch size is 2, and in my 3rd
> > batch I have one error record then in that batch, I dont have a valid
> > record to call commit or abort. But I want to commit all the previous
> > batches that were successfully parsed. How do I do that?
> >
> > An important thing to keep in mind with the TransactionContext API is
> that
> > all records that a task returns from SourceTask::poll are implicitly
> > included in a transaction. Invoking SourceTaskContext::transactionContext
> > doesn't alter this or cause transactions to start being used; everything
> is
> > already in a transaction, and the Connect runtime automatically begins
> > transactions for any records it sees from the task if it hasn't already
> > begun one. It's also valid to return a null or empty list of records from
> > SourceTask::poll. So in this case, you can invoke
> > transactionContext.commitTransaction() (the no-args variant) and return
> an
> > empty batch from SourceTask::poll, which will cause the transaction
> > containing the 4 valid records that were returned in the last 2 batches
> to
> > be committed.
> >
> > FWIW, I would be a little cautious about this approach. Many times it's
> > better to fail fast on invalid data; it might be worth it to at least
> allow
> > users to configure whether the connector fails on invalid data, or
> silently
> > skips over it (which is what happens when transactions are aborted).
> >
> > > Why is abort not working without adding the last record to the list?
> >
> > Is it possible that there's a case where an abort is being requested
> while
> > the current transaction is empty (i.e., the task hasn't returned any
> > records from SourceTask::poll since the last transaction was
> > committed/aborted)? I think this may be a bug in the Connect framework
> > where we don't check to see if a transaction is already open when a task
> > requests that a transaction be aborted, which can cause tasks to fail
> (see
> > https://issues.apache.org/jira/browse/KAFKA-14799 for more details).
> >
> > Cheers,
> >
> > Chris
> >
> >
> > On Wed, Mar 8, 2023 at 6:44 PM NITTY BENNY  wrote:

Re: Exactly once kafka connect query

2023-03-09 Thread Chris Egerton
hen in that batch, I dont have a valid record to call commit or abort. But
>> I want to commit all the previous batches that were successfully parsed.
>> How do I do that?
>>
>> Second use case is where I want to abort a transaction if the record
>> count doesn't match.
>> Code Snippet :
>> [image: image.png]
>> There are no error records in this case. If you see I added the condition
>> of transactionContext check to implement exactly once, without
>> transaction it was just throwing the exception without calling the
>> addLastRecord() method and in the catch block it just logs the message and
>> return the list of records without the last record to poll().To make it
>> work, I called the method addLastRecord() in this case, so it is not
>> throwing the exception and list has last record as well. Then I called the
>> abort, everything got aborted. Why is abort not working without adding the
>> last record to the list?
>> [image: image.png]
>>
>> Code to call abort.
>>
>>
>>
>>
>> Thanks,
>> Nitty
>>
>> On Wed, Mar 8, 2023 at 4:26 PM Chris Egerton 
>> wrote:
>>
>>> Hi Nitty,
>>>
>>> I'm a little confused about what you mean by this part:
>>>
> >>> > transaction is not getting completed because it is not committing the
> >>> transaction offset.
>>>
>>> The only conditions required for a transaction to be completed when a
>>> connector is defining its own transaction boundaries are:
>>>
>>> 1. The task requests a transaction commit/abort from the
>>> TransactionContext
>>> 2. The task returns a batch of records from SourceTask::poll (and, if
>>> using
>>> the per-record API provided by the TransactionContext class, includes at
>>> least one record that should trigger a transaction commit/abort in that
>>> batch)
>>>
>>> The Connect runtime should automatically commit source offsets to Kafka
>>> whenever a transaction is completed, either by commit or abort. This is
>>> because transactions should only be aborted for data that should never be
>>> re-read by the connector; if there is a validation error that should be
>>> handled by reconfiguring the connector, then the task should throw an
>>> exception instead of aborting the transaction.
>>>
>>> If possible, do you think you could provide a brief code snippet
>>> illustrating what your task is doing that's causing issues?
>>>
>>> Cheers,
>>>
>>> Chris (not Chrise )
>>>
>>> On Tue, Mar 7, 2023 at 10:17 AM NITTY BENNY 
>>> wrote:
>>>
>>> > Hi Chrise,
>>> >
>>> > Thanks for sharing the details.
>>> >
>>> > Regarding the use case: for the Asn1 source connector we have a use case
>>> > to validate the number of records in the file against the number of
>>> > records in the header. So currently, if validation fails we are not
>>> > sending the last record to the topic. But after introducing exactly-once
>>> > with the connector transaction boundary, I can see that if I call an
>>> > abort when the validation fails, the transaction is not getting
>>> > completed because it is not committing the transaction offset. I saw
>>> > that the transaction state changed to CompleteAbort. So for my next
>>> > transaction I am getting InvalidProducerEpochException, and then the
>>> > task stopped after that. I tried calling the abort after sending the
>>> > last record to the topic, and then the transaction got completed.
>>> >
>>> > I don't know if I am doing anything wrong here.
>>> >
>>> > Please advise.
>>> > Thanks,
>>> > Nitty
>>> >
>>> > On Tue 7 Mar 2023 at 2:21 p.m., Chris Egerton >> >
>>> > wrote:
>>> >
>>> > > Hi Nitty,
>>> > >
>>> > > We've recently added some documentation on implementing exactly-once
>>> > source
>>> > > connectors here:
>>> > >
>>> >
>>> https://kafka.apache.org/documentation/#connect_exactlyoncesourceconnectors
>>> > > .
>>> > > To quote a relevant passage from those docs:
>>> > >
>>> > > > In order for a source connector to take advantage of this support,
>>> it
>>> > > must be able to provide meaningful source offsets for each record
>>>

Re: Exactly once kafka connect query

2023-03-08 Thread Chris Egerton
Hi Nitty,

I'm a little confused about what you mean by this part:

> transaction is not getting completed because it is not committing the
> transaction offset.

The only conditions required for a transaction to be completed when a
connector is defining its own transaction boundaries are:

1. The task requests a transaction commit/abort from the TransactionContext
2. The task returns a batch of records from SourceTask::poll (and, if using
the per-record API provided by the TransactionContext class, includes at
least one record that should trigger a transaction commit/abort in that
batch)

The Connect runtime should automatically commit source offsets to Kafka
whenever a transaction is completed, either by commit or abort. This is
because transactions should only be aborted for data that should never be
re-read by the connector; if there is a validation error that should be
handled by reconfiguring the connector, then the task should throw an
exception instead of aborting the transaction.
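The two conditions above can be sketched in code. Below is a minimal, self-contained illustration of the poll-side control flow; note that `TransactionContext` here is a simplified stand-in for `org.apache.kafka.connect.source.TransactionContext` (whose per-record methods take `SourceRecord` arguments, not `String`s), and the record values are placeholders:

```java
import java.util.ArrayList;
import java.util.List;

public class TransactionalPollSketch {
    // Stand-in for org.apache.kafka.connect.source.TransactionContext
    // (only the two per-record methods used below).
    interface TransactionContext {
        void commitTransaction(String record);
        void abortTransaction(String record);
    }

    // Sketch of the control flow inside SourceTask::poll when the connector
    // defines its own transaction boundaries: every returned batch ends with
    // a record marked for commit or abort, so the transaction always
    // completes and the runtime can commit source offsets.
    static List<String> poll(List<String> batch, boolean batchIsValid, TransactionContext ctx) {
        List<String> out = new ArrayList<>(batch);
        String last = out.get(out.size() - 1);
        if (batchIsValid) {
            ctx.commitTransaction(last);
        } else {
            // The failing record must still be returned from poll() for the
            // abort to complete; aborted data is never re-read, so only
            // abort records that should be skipped permanently.
            ctx.abortTransaction(last);
        }
        return out;
    }

    // Tiny driver exercising both paths.
    static List<String> demo() {
        List<String> marks = new ArrayList<>();
        TransactionContext ctx = new TransactionContext() {
            public void commitTransaction(String r) { marks.add("commit:" + r); }
            public void abortTransaction(String r) { marks.add("abort:" + r); }
        };
        poll(List.of("r1", "r2"), true, ctx);   // valid batch -> commit on r2
        poll(List.of("r3"), false, ctx);        // failed validation -> abort on r3
        return marks;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [commit:r2, abort:r3]
    }
}
```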

If possible, do you think you could provide a brief code snippet
illustrating what your task is doing that's causing issues?

Cheers,

Chris (not Chrise )

On Tue, Mar 7, 2023 at 10:17 AM NITTY BENNY  wrote:

> Hi Chrise,
>
> Thanks for sharing the details.
>
> Regarding the use case: for the Asn1 source connector we have a use case to
> validate the number of records in the file against the number of records in
> the header. So currently, if validation fails we are not sending the last
> record to the topic. But after introducing exactly-once with the connector
> transaction boundary, I can see that if I call an abort when the validation
> fails, the transaction is not getting completed because it is not committing
> the transaction offset. I saw that the transaction state changed to
> CompleteAbort. So for my next transaction I am getting
> InvalidProducerEpochException, and then the task stopped after that. I tried
> calling the abort after sending the last record to the topic, and then the
> transaction got completed.
>
> I don't know if I am doing anything wrong here.
>
> Please advise.
> Thanks,
> Nitty
>
> On Tue 7 Mar 2023 at 2:21 p.m., Chris Egerton 
> wrote:
>
> > Hi Nitty,
> >
> > We've recently added some documentation on implementing exactly-once
> source
> > connectors here:
> >
> https://kafka.apache.org/documentation/#connect_exactlyoncesourceconnectors
> > .
> > To quote a relevant passage from those docs:
> >
> > > In order for a source connector to take advantage of this support, it
> > must be able to provide meaningful source offsets for each record that it
> > emits, and resume consumption from the external system at the exact
> > position corresponding to any of those offsets without dropping or
> > duplicating messages.
> >
> > So, as long as your source connector is able to use the Kafka Connect
> > framework's offsets API correctly, it shouldn't be necessary to make any
> > other code changes to the connector.
> >
> > To enable exactly-once support for source connectors on your Connect
> > cluster, see the docs section here:
> > https://kafka.apache.org/documentation/#connect_exactlyoncesource
> >
> > With regard to transactions, a transactional producer is always created
> > automatically for your connector by the Connect runtime when exactly-once
> > support is enabled on the worker. The only reason to set
> > "transaction.boundary" to "connector" is if your connector would like to
> > explicitly define its own transaction boundaries. In this case, it sounds
>>> > > like that may be what you want; I just want to make sure to call out that in
> > either case, you should not be directly instantiating a producer in your
> > connector code, but let the Kafka Connect runtime do that for you, and
> just
> > worry about returning the right records from SourceTask::poll (and
> possibly
> > defining custom transactions using the TransactionContext API).
> >
> > With respect to your question about committing or aborting in certain
> > circumstances, it'd be useful to know more about your use case, since it
> > may not be necessary to define custom transaction boundaries in your
> > connector at all.
> >
> > Cheers,
> >
> > Chris
> >
> >
> >
> > On Tue, Mar 7, 2023 at 7:21 AM NITTY BENNY  wrote:
> >
> > > Hi Team,
> > >
> > > Adding on top of this, I tried creating a TransactionContext object and
> > > calling the commitTransaction and abortTransaction methods in source
> > > connectors.
> > > But the main problem I saw is that if there is any error while parsing
> > the
> > > record, connect is cal

Re: Exactly once kafka connect query

2023-03-07 Thread Chris Egerton
Hi Nitty,

We've recently added some documentation on implementing exactly-once source
connectors here:
https://kafka.apache.org/documentation/#connect_exactlyoncesourceconnectors.
To quote a relevant passage from those docs:

> In order for a source connector to take advantage of this support, it
must be able to provide meaningful source offsets for each record that it
emits, and resume consumption from the external system at the exact
position corresponding to any of those offsets without dropping or
duplicating messages.

So, as long as your source connector is able to use the Kafka Connect
framework's offsets API correctly, it shouldn't be necessary to make any
other code changes to the connector.
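As a concrete illustration of "meaningful source offsets": the maps below are what a file-reading task might pass as the first two arguments of the `SourceRecord` constructor, so that after a restart it can read the offset back (via `context.offsetStorageReader()`) and resume from the exact position it left off at. The `filename`/`line` field names are hypothetical:

```java
import java.util.Map;

public class OffsetSketch {
    // In a real connector these maps are the first two arguments to
    // new SourceRecord(sourcePartition, sourceOffset, topic, ...).
    static Map<String, ?> sourcePartition(String fileName) {
        // Identifies the independent stream being read (here: one file).
        return Map.of("filename", fileName);
    }

    static Map<String, ?> sourceOffset(long lineNumber) {
        // Must identify the exact resume position within that partition;
        // after a restart the task seeks back to this point.
        return Map.of("line", lineNumber);
    }

    public static void main(String[] args) {
        System.out.println(sourcePartition("orders.asn1")); // {filename=orders.asn1}
        System.out.println(sourceOffset(42L));              // {line=42}
    }
}
```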

To enable exactly-once support for source connectors on your Connect
cluster, see the docs section here:
https://kafka.apache.org/documentation/#connect_exactlyoncesource

With regard to transactions, a transactional producer is always created
automatically for your connector by the Connect runtime when exactly-once
support is enabled on the worker. The only reason to set
"transaction.boundary" to "connector" is if your connector would like to
explicitly define its own transaction boundaries. In this case, it sounds
like that may be what you want; I just want to make sure to call out that in
either case, you should not be directly instantiating a producer in your
connector code, but let the Kafka Connect runtime do that for you, and just
worry about returning the right records from SourceTask::poll (and possibly
defining custom transactions using the TransactionContext API).
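For reference, enabling this end to end involves one worker-level setting plus the two connector-level properties quoted later in this thread. A sketch (property names are from the Kafka docs; values are illustrative):

```properties
# Worker config (connect-distributed.properties): required on every worker
exactly.once.source.support=enabled

# Connector config: only needed when the connector defines its own
# transaction boundaries via the TransactionContext API
exactly.once.support=required
transaction.boundary=connector
```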

With respect to your question about committing or aborting in certain
circumstances, it'd be useful to know more about your use case, since it
may not be necessary to define custom transaction boundaries in your
connector at all.

Cheers,

Chris



On Tue, Mar 7, 2023 at 7:21 AM NITTY BENNY  wrote:

> Hi Team,
>
> Adding on top of this, I tried creating a TransactionContext object and
> calling the commitTransaction and abortTransaction methods in source
> connectors.
> But the main problem I saw is that if there is any error while parsing the
> record, connect is calling an abort but we have a use case to call commit
> in some cases. Is it a valid use case in terms of kafka connect?
>
> Another question - should I use a transactional producer instead of
> creating an object of TransactionContext? Below is the connector
> configuration I am using.
>
>   exactly.once.support: "required"
>   transaction.boundary: "connector"
>
> Could you please help me here?
>
> Thanks,
> Nitty
>
> On Tue, Mar 7, 2023 at 12:29 AM NITTY BENNY  wrote:
>
> > Hi Team,
> > I am trying to implement exactly once behavior in our source connector.
> Is
> > there any sample source connector implementation available to have a look
> > at?
> > Regards,
> > Nitty
> >
>


[ANNOUNCE] Apache Kafka 3.3.2

2023-01-23 Thread Chris Egerton
The Apache Kafka community is pleased to announce the release for Apache
Kafka 3.3.2

Apache Kafka 3.3.2 is a bugfix release and it contains, among other things,
fixes for 20 issues reported since 3.3.1.

All of the changes in this release can be found in the release notes:
https://www.apache.org/dist/kafka/3.3.2/RELEASE_NOTES.html


You can download the source and binary release (Scala 2.12 and 2.13) from:
https://kafka.apache.org/downloads#3.3.2

---


Apache Kafka is a distributed streaming platform with four core APIs:


** The Producer API allows an application to publish a stream of records to
one or more Kafka topics.

** The Consumer API allows an application to subscribe to one or more
topics and process the stream of records produced to them.

** The Streams API allows an application to act as a stream processor,
consuming an input stream from one or more topics and producing an
output stream to one or more output topics, effectively transforming the
input streams to output streams.

** The Connector API allows building and running reusable producers or
consumers that connect Kafka topics to existing applications or data
systems. For example, a connector to a relational database might
capture every change to a table.


With these APIs, Kafka can be used for two broad classes of application:

** Building real-time streaming data pipelines that reliably get data
between systems or applications.

** Building real-time streaming applications that transform or react
to the streams of data.


Apache Kafka is in use at large and small companies worldwide, including
Capital One, Goldman Sachs, ING, LinkedIn, Netflix, Pinterest, Rabobank,
Target, The New York Times, Uber, Yelp, and Zalando, among others.

A big thank you to the following 39 contributors to this release!

A. Sophie Blee-Goldman, Alyssa Huang, Artem Livshits, Bill Bejeck, Calvin
Liu, Chia-Ping Tsai, Chris Egerton, Christo Lolov, Colin Patrick McCabe,
Dan Stelljes, David Arthur, David Jacot, Divij Vaidya, FUNKYE, Greg Harris,
Huilin Shi, Igor Soarez, Ismael Juma, Jason Gustafson, Jeff Kim, Jorge
Esteban Quilcate Otoya, José Armando García Sancio, Justine Olshan, Kirk
True, liuzhuang2017, Lucas Brutschy, Luke Chen, Matthias J. Sax, Mickael
Maison, Niket, Pratim SC, Purshotam Chauhan, Rohan, Ron Dagostino, Shawn,
srishti-saraswat, Sushant Mahajan, Vicky Papavasileiou, zou shengfu

We welcome your help and feedback. For more information on how to
report problems, and to get involved, visit the project website at
https://kafka.apache.org/

Thank you!


Regards,

Chris


Re: JIRA access

2023-01-23 Thread Chris Egerton
Hi Titouan,

I've added you to our Jira project; you should be good to go now.

Cheers,

Chris

On Mon, Jan 23, 2023 at 10:41 AM Titouan Chary 
wrote:

> Hi,
>
> It seems that JIRA access are disabled by default. Is it the right email to
> reach out in order to get a new JIRA account. I would like to participate
> and give additional data points to the following ticket:
> https://issues.apache.org/jira/browse/KAFKA-13077
>
> Thanks in advance,
> Regards,
> Titouan Chary
>


[RESULTS] [VOTE] Release Apache Kafka version 3.3.2

2023-01-11 Thread Chris Egerton
This vote passes with 8 +1 votes (3 bindings) and no 0 or -1 votes.

+1 votes
PMC Members:
* Manikumar Reddy
* Mickael Maison
* Tom Bentley

Committers:
* Satish Duggana

Community:
* Yash Mayya
* Divij Vaidya
* Jakub Scholz
* Federico Valeri

0 votes
* No votes

-1 votes
* No votes

Vote thread:
https://lists.apache.org/thread/dqcjd6srrqhfn7bpo97hv2gny7ob42q4

I'll continue with the release process and the release announcement will
follow in the next few days.

Thank you to all who participated in this release!

Cheers,

Chris


[VOTE] 3.3.2 RC1

2022-12-21 Thread Chris Egerton
Hello Kafka users, developers and client-developers,

This is the second candidate for release of Apache Kafka 3.3.2.

This is a bugfix release with several fixes since the release of 3.3.1. A
few of the major issues include:

* KAFKA-14358 Users should not be able to create a regular topic name
__cluster_metadata
* KAFKA-14379 Consumer should refresh preferred read replica on update
metadata
* KAFKA-13586 Prevent exception thrown during connector update from
crashing distributed herder


Release notes for the 3.3.2 release:
https://home.apache.org/~cegerton/kafka-3.3.2-rc1/RELEASE_NOTES.html

*** Please download, test and vote by Friday, January 6, 2023, 10pm UTC
(this date is chosen to accommodate the various upcoming holidays that
members of the community will be taking and give everyone enough time to
test out the release candidate, without unduly delaying the release)

Kafka's KEYS file containing PGP keys we use to sign the release:
https://kafka.apache.org/KEYS

* Release artifacts to be voted upon (source and binary):
https://home.apache.org/~cegerton/kafka-3.3.2-rc1/

* Maven artifacts to be voted upon:
https://repository.apache.org/content/groups/staging/org/apache/kafka/

* Javadoc:
https://home.apache.org/~cegerton/kafka-3.3.2-rc1/javadoc/

* Tag to be voted upon (off 3.3 branch) is the 3.3.2 tag:
https://github.com/apache/kafka/releases/tag/3.3.2-rc1

* Documentation:
https://kafka.apache.org/33/documentation.html

* Protocol:
https://kafka.apache.org/33/protocol.html

The most recent build has had test failures. These all appear to be due to
flakiness, but it would be nice if someone more familiar with the failed
tests could confirm this. I may update this thread with passing build links
if I can get one, or start a new release vote thread if test failures must
be addressed beyond re-running builds until they pass.

Unit/integration tests:
https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.3/142/testReport/

José, would it be possible to re-run the system tests for 3.3 on the latest
commit for 3.3 (e3212f2), and share the results on this thread?

Cheers,

Chris


[VOTE] 3.3.2 RC0

2022-12-15 Thread Chris Egerton
Hello Kafka users, developers and client-developers,

This is the first candidate for release of Apache Kafka 3.3.2.

This is a bugfix release with several fixes since the release of 3.3.1. A
few of the major issues include:

* KAFKA-14358 Users should not be able to create a regular topic name
__cluster_metadata
* KAFKA-14379 Consumer should refresh preferred read replica on update
metadata
* KAFKA-13586 Prevent exception thrown during connector update from
crashing distributed herder


Release notes for the 3.3.2 release:
https://home.apache.org/~cegerton/kafka-3.3.2-rc0/RELEASE_NOTES.html



*** Please download, test and vote by Tuesday, December 20, 10pm UTC

Kafka's KEYS file containing PGP keys we use to sign the release:
https://kafka.apache.org/KEYS

* Release artifacts to be voted upon (source and binary):
https://home.apache.org/~cegerton/kafka-3.3.2-rc0/

* Maven artifacts to be voted upon:
https://repository.apache.org/content/groups/staging/org/apache/kafka/

* Javadoc:
https://home.apache.org/~cegerton/kafka-3.3.2-rc0/javadoc/

* Tag to be voted upon (off 3.3 branch) is the 3.3.2 tag:
https://github.com/apache/kafka/releases/tag/3.3.2-rc0

* Documentation:
https://kafka.apache.org/33/documentation.html

* Protocol:
https://kafka.apache.org/33/protocol.html

The most recent build has had test failures. These all appear to be due to
flakiness, but it would be nice if someone more familiar with the failed
tests could confirm this. I may update this thread with passing build links
if I can get one, or start a new release vote thread if test failures must
be addressed beyond re-running builds until they pass.

Unit/integration tests:
https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.3/135/testReport/

System tests:
http://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/system-test-kafka-branch-builder--1670984851--apache--3.3--22af3f29ce/2022-12-13--001./2022-12-13--001./report.html
(initial with three flaky failures)
Follow-up system tests:
https://home.apache.org/~cegerton/system_tests/2022-12-14--015/report.html,
https://home.apache.org/~cegerton/system_tests/2022-12-14--016/report.html,
http://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/system-test-kafka-branch-builder--1671061000--apache--3.3--69fbaf2457/2022-12-14--001./2022-12-14--001./report.html

(Note that the exact commit used for some of the system test runs will not
precisely match the commit for the release candidate, but that all
differences between those two commits should have no effect on the
relevance or accuracy of the test results.)

Thanks,

Chris


[ANNOUNCE] Call for papers: Kafka Summit London 2023

2022-11-30 Thread Chris Egerton
Hi everyone,

The call for papers (https://sessionize.com/kafka-summit-london-2023/) is
now open for Kafka Summit London 2023, and you are all welcome to submit a
talk.

We are looking for the most interesting, most informative, most advanced,
and most generally applicable talks on Apache Kafka® and the tools,
technologies, and techniques in the Kafka ecosystem.

People from all industries, backgrounds, and experience levels are
encouraged to submit!

If you have any questions about submitting, reach out to Danica Fine, the
program chair, at df...@confluent.io.

The call for papers closes on Monday, January 9, 2023 at 23:59 GMT.

Thanks,

Chris


Re: Granting permission for join jira

2022-11-09 Thread Chris Egerton
Hi,

You should be good to go now.

Cheers,

Chris

On Wed, Nov 9, 2022 at 12:52 AM hzhkafka  wrote:

> Jira id: hzh0425@apache
>


Re: [ANNOUNCE] New Kafka PMC Member: Bruno Cadonna

2022-11-01 Thread Chris Egerton
Congrats!

On Tue, Nov 1, 2022, 15:44 Bill Bejeck  wrote:

> Congrats Bruno! Well deserved.
>
> -Bill
>
> On Tue, Nov 1, 2022 at 3:36 PM Guozhang Wang  wrote:
>
> > Hi everyone,
> >
> > I'd like to introduce our new Kafka PMC member, Bruno.
> >
> > Bruno has been a committer since April. 2021 and has been very active in
> > the community. He's a key contributor to Kafka Streams, and also helped
> > review a lot of horizontal improvements such as Mockito. It is my
> pleasure
> > to announce that Bruno has agreed to join the Kafka PMC.
> >
> > Congratulations, Bruno!
> >
> > -- Guozhang Wang, on behalf of Apache Kafka PMC
> >
>


Re: Granting permission for Kafka Contributor

2022-10-28 Thread Chris Egerton
Hi,

You should be good to go now.

Cheers,

Chris

On Fri, Oct 28, 2022 at 4:14 PM yuachieve1234 <260738...@qq.com.invalid>
wrote:

> Jira ID:yuxj109
>
>
>
>
> yuachieve1234
> 260738...@qq.com
>
>
>
> 


Re: Entire Kafka Connect cluster stuck because of a stuck sink connector

2022-10-12 Thread Chris Egerton
Hi,

What version of Kafka Connect are you running? This sounds like a bug that
was fixed a few releases ago.

Cheers,

Chris

On Wed, Oct 12, 2022, 21:27 Hemanth Savasere 
wrote:

> We have stumbled upon an issue on a running cluster with multiple
> source/sink connectors:
>
>1. One of our connectors was a JDBC sink connector connected to an SQL
>Server database (using the oracle JDBC driver).
>2. It turns out that the DB instance had a problem causing all queries
>to be stuck forever, which in turn made the start method of the
> connector
>hang forever.
>3. After some time, the entire Kafka Connect cluster was unavailable and
>the REST API was not responding giving
> {"error_code":500,"message":"Request
>timed out"} for most requests.
>4. Pausing (just before the deletion of the consumer group) or deleting
>the problematic connector allowed the cluster to run normally again.
>
> We could reproduce the same issue by adding Thread.sleep(30) in the
> start method or in the put method of the ConnectorTask.
>
> Wanted to know if there's any wiki/documentation provided that mentions how
> to handle this issue. My approach would be to throw a timeout after waiting
> for a particular time period and make the connector fail fast.
>
> --
> Thanks & Regards,
> Hemanth
>
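The fail-fast approach suggested above — bounding the blocking call rather than letting it wedge the worker — can be sketched with an `ExecutorService` timeout. The `connect()` method below is a hypothetical stand-in for whatever call hangs (e.g. opening the JDBC connection); in a real task you would throw a `ConnectException` on timeout so the task fails instead of blocking:

```java
import java.util.concurrent.*;

public class BoundedStart {
    // Hypothetical stand-in for the blocking call (e.g. opening a JDBC
    // connection) that can hang forever when the database is unhealthy.
    static String connect(long simulatedDelayMs) throws InterruptedException {
        Thread.sleep(simulatedDelayMs);
        return "connected";
    }

    // Bound the call: fail fast instead of blocking indefinitely.
    static String startWithTimeout(long delayMs, long timeoutMs) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> f = pool.submit(() -> connect(delayMs));
            return f.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // In a real SourceTask/SinkTask, throw ConnectException here so
            // the task transitions to FAILED.
            return "failed: timed out";
        } finally {
            pool.shutdownNow(); // interrupt the stuck thread
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(startWithTimeout(10, 500));  // connected
        System.out.println(startWithTimeout(500, 10));  // failed: timed out
    }
}
```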


Re: Apply to be a contributor of kafka

2022-10-09 Thread Chris Egerton
Hi Junyang,

You should be good to go now, and I've also assigned the ticket to you.

Cheers,

Chris

On Sun, Oct 9, 2022 at 11:36 AM Junyang Liu  wrote:

> Hi,
> I’m a developer of kafka, and want to contribute to the project. I have
> made a issue and a PR resolving the issue(Kafka-14285), but I cannot assign
> the issue to myself. Can I apply to be a contributor of kafka?
>
>
> Thank you
> — Junyang Liu


Re: [ANNOUNCE] New Kafka PMC Member: A. Sophie Blee-Goldman

2022-08-02 Thread Chris Egerton
Congrats, Sophie!

On Mon, Aug 1, 2022 at 9:21 PM Luke Chen  wrote:

> Congrats Sophie! :)
>
> Luke
>
> On Tue, Aug 2, 2022 at 7:56 AM Adam Bellemare 
> wrote:
>
> > Congratulations Sophie! I’m glad to see you made as a PMC member! Well
> > earned.
> >
> > > On Aug 1, 2022, at 6:42 PM, Guozhang Wang  wrote:
> > >
> > > Hi everyone,
> > >
> > > I'd like to introduce our new Kafka PMC member, Sophie. She has been a
> > > committer since Oct. 2020 and has been contributing to the community
> > > consistently, especially around Kafka Streams and Kafka java consumer.
> > She
> > > has also presented about Kafka Streams at Kafka Summit London this
> year.
> > It
> > > is my pleasure to announce that Sophie agreed to join the Kafka PMC.
> > >
> > > Congratulations, Sophie!
> > >
> > > -- Guozhang Wang, on behalf of Apache Kafka PMC
> >
>


Re: [ANNOUNCE] New Committer: Chris Egerton

2022-07-25 Thread Chris Egerton
Thanks to Mickael and the PMC for this privilege, and to everyone here for
their well wishes. I look forward to continuing to work with this wonderful
community.

Cheers,

Chris

On Mon, Jul 25, 2022, 17:39 Anna McDonald  wrote:

> Congratulations Chris! Time to Cellobrate!
>
> anna
>
> On Mon, Jul 25, 2022 at 4:23 PM Martin Gainty  wrote:
>
> > Congratulations Chris!
> >
> > martin~
> > 
> > From: Mickael Maison 
> > Sent: Monday, July 25, 2022 12:25 PM
> > To: dev ; Users 
> > Subject: [ANNOUNCE] New Committer: Chris Egerton
> >
> > Hi all,
> >
> > The PMC for Apache Kafka has invited Chris Egerton as a committer, and
> > we are excited to announce that he accepted!
> >
> > Chris has been contributing to Kafka since 2017. He has made over 80
> > commits mostly around Kafka Connect. His most notable contributions
> > include KIP-507: Securing Internal Connect REST Endpoints and KIP-618:
> > Exactly-Once Support for Source Connectors.
> >
> > He has been an active participant in discussions and reviews on the
> > mailing lists and on Github.
> >
> > Thanks for all of your contributions Chris. Congratulations!
> >
> > -- Mickael, on behalf of the Apache Kafka PMC
> >
>


Re: a little problem in quickstart

2022-06-26 Thread Chris Egerton
Hi Mason,

You're correct that the quickstart should use 'libs' instead of 'lib'. This
has already been fixed in the docs for the upcoming 3.3.0 release with
https://github.com/apache/kafka/pull/12252. We might consider backporting
that change; I've CC'd Luke Chen, who merged that fix and might be able to
help with backporting it (I'd take it on myself but I'm not well-versed in
how the site docs work, especially with making changes for already-released
versions).

Cheers,

Chris

On Sun, Jun 26, 2022 at 11:12 AM Men Lim  wrote:

> You don't need to put the jar file name in the plugin.path variable.
> Something like plugin.path=/kafka/plugin. Then have the jar file in that
> plugin folder. Restart the worker and it will pick it up.
>
> On Sun, Jun 26, 2022 at 8:04 AM mason lee  wrote:
>
> >  Hi, I’m new to Kafka and I cannot pass step 6 in
> > https://kafka.apache.org/quickstart, finally I found that the word ‘lib’
> > in
> > 'echo "plugin.path=lib/connect-file-3.2.0.jar’ should be ‘libs’.
> > It bothered me for a while, I think a change would be better.
> >
>


Re: Kafka Connect - offset.storage.topic reuse across clusters

2022-03-30 Thread Chris Egerton
Connectors overwriting each other's offsets is the primary concern. If you
have a guarantee that there will only ever be one connector with a given
name running at once on any of the Connect clusters that use the same
offsets topic, and you want offsets to be shared for all source connectors
on any of those clusters, then that concern is addressed. It does inflict
an operational burden on the administrators for your Connect clusters, and
for people creating/managing connectors on those clusters. But if you're
willing to accept that burden and the footguns it comes with, this is an
option for you.

Also worth noting that this would actually cause cross-Connect-cluster
offset tracking logic to behave the same for source connectors and sink
connectors, which already commit consumer offsets to Kafka based solely on
connector name and with no distinction made between which Connect cluster
the connector is running on. (This can technically be addressed by manually
overriding the sink connector's group ID; I'm just outlining the default
behavior.)

One other potential cause for concern is that Connect workers do a read to
the end of the offsets topic every time a source task reads offsets, so if
you're hammering the offsets topic with a ton of writes from across several
Connect clusters, there may be a performance impact for source connectors
that read offsets frequently. But this shouldn't be any different than
running a monolithic cluster with the same number of workers as the sum of
all workers across your multi-cluster setup, and it's generally not a good
idea for source connectors to read offsets apart from when they're starting
up.
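For context on why connector names are the collision point: each record in the offsets topic is keyed by the connector name plus the record's source partition, with no cluster or group.id component. An illustrative key/value pair as serialized by the JSON converter (connector name and field names are hypothetical):

```
key:   ["my-source-connector", {"filename": "orders.csv"}]
value: {"position": 4251}
```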

I'm mostly curious about the motivation to use a different group ID,
though--if failover is the idea here, is there any specific scenario you
have in mind that makes this option less appealing?


On Wed, Mar 30, 2022, 11:42 Jordan Wyatt  wrote:

> Hi Robin,
>
> I'm interested in a use case in which I need to be able to have a connect
> cluster fail, and then bring up a new cluster with the same offset topics
> and connectors. By new cluster I mean a cluster with a new `group.id`. I
> am
> aware I could just use the same group id as before but I would like to
> explore this route.
>
> I'm keen to learn more about the reasons the described case above, and
> those in my original thread, aren't recommended.
>
> Thank you,
> Jordan
>
> On Wed, 30 Mar 2022 at 14:00, Robin Moffatt 
> wrote:
>
> > Hi Jordan,
> >
> > Is there a good reason for wanting to do this? I can think of multiple
> > reasons why you shouldn't do this even if technically it works in some
> > cases.
> > Or it's just curiosity as to whether you can/should?
> >
> > thanks, Robin.
> >
> >
> > --
> >
> > Robin Moffatt | Principal Developer Advocate | ro...@confluent.io |
> @rmoff
> >
> >
> > On Wed, 30 Mar 2022 at 13:36, Jordan Wyatt  wrote:
> >
> > > Hi,
> > >
> > > I've recently been experimenting with setting the values of the
> > > `offset`, `config` and `status` storage topics within Kafka Connect.
> > >
> > > I'm aware from various sources (Robin Moffatt blogs, StackOverflow,
> > > Confluent Kafka Connect docs) that these topics should not be shared
> > across
> > > different connect **clusters**.  e.g for each  unique set of workers
> > with a
> > > given `group.id`, a unique set of internal storage topics should be
> > used.
> > >
> > > These discussions and documentations usually talk about sharing all
> three
> > > topics at once, however, I am interested in reusing only the offset
> > storage
> > > topic. I am struggling to find the risks of sharing this offset topic
> > > between different connect clusters.
> > >
> > > I'm aware of issues with sharing the config and status topics from
> blogs
> > > and my own testing (clusters can end up running connectors from other
> > > clusters, for example), but I cannot find a case for not sharing the
> > offset
> > > topic despite guidance to avoid this.
> > >
> > > The use cases I am interested in are:
> > >
> > > 1. Sharing an offset topic between clusters, but never in parallel.
> > >
> > >
> > > *e.g cluster 1 running connector A uses the offset topic, cluster 1 and
> > > connector A are deleted, then cluster 2 running connector B is created
> > uses
> > > the offset topic. *
> > >
> > > 2. As above, but using the offset topic in parallel.
> > >
> > > As the offset.storage topic is keyed by connector name (from the source
> > > connectors I've tried) I do not understand the risk of both of the
> above
> > > cases **unless** > 1  connector exists with the same name in separate
> > > clusters, as there would then be the risk of key collision as group.id
> > is
> > > not referenced in the offset topic keys.
> > >
> > > Any insights into why sharing the offset topic between clusters for the
> > > cases described would be greatly appreciated, thank you.
> > >
> >
>


Re: Running multiple MM2 instances

2022-03-23 Thread Chris Egerton
Hi Julia,

Sounds like KAFKA-9981 [1]. This is a known issue with MirrorMaker 2 that
impacts horizontal scalability and has not yet been addressed. There is
some work in progress to fix this issue [2], but the effort hasn't received
much attention to date. There may be other issues as well, but until
KAFKA-9981 is addressed, running MirrorMaker 2 in a multi-node cluster will
be at best difficult and worst, impossible.

[1] - https://issues.apache.org/jira/browse/KAFKA-9981
[2] -
https://cwiki.apache.org/confluence/display/KAFKA/KIP-710%3A+Full+support+for+distributed+mode+in+dedicated+MirrorMaker+2.0+clusters

Cheers,

Chris

On Wed, Mar 23, 2022 at 11:15 AM Kalimova, Julia
 wrote:

>
>
> Hello!
>
>
>
> I’m wondering if it’s possible to scale up MM2 by running multiple
> instances per data center?
>
> For scalability purposes, I would like to run 2 instances of
> connect-mirror-maker.sh in one data center. However, I cannot get 2
> instances of mirror maker to work at the same time: once I start up the
> second mirror maker, it takes over for the first one, and the first one
> completely stops replicating: i.e. rather than scaling up and rebalancing,
> the whole workload is still handled by a single instance.
>
> What am I missing here? Would greatly appreciate any guidance with this!
> Thank you!
>
>
> This message may contain information that is confidential or privileged.
> If you are not the intended recipient, please advise the sender immediately
> and delete this message. See
> http://www.blackrock.com/corporate/compliance/email-disclaimers for
> further information.  Please refer to
> http://www.blackrock.com/corporate/compliance/privacy-policy for more
> information about BlackRock’s Privacy Policy.
>
>
> For a list of BlackRock's office addresses worldwide, see
> http://www.blackrock.com/corporate/about-us/contacts-locations.
>
> © 2022 BlackRock, Inc. All rights reserved.
>


Re: securing sasl/scram username and password in kafka connect

2022-03-07 Thread Chris Egerton
It looks like the file config provider isn't actually set up on the Connect
worker. What does your Connect worker config look like (usually a file
called something like connect-distributed.properties)? Feel free to replace
any sensitive values with a placeholder string, but please don't remove the
properties themselves entirely (they may be necessary for debugging).

On Mon, Mar 7, 2022 at 4:39 PM Men Lim  wrote:

> Thanks for the response Chris.  I went thru the setup again and it appeared
> I might have had a typo somewhere last friday.  Currently, I'm running into
> a file permission issue.
>
> the file has the following permissions:
>
> -rw-r--r-- 1 adm admn 88 Mar  7 21:23 connector_credentials.properties
>
> I have tried changing the pwd to 700 but still the same error:
>
> Unable to connect: Access denied for user
> '${file:/app/data/cred/connector_credentials.prop'@'172.x.x.x' (using
> password: YES)
>
> On Mon, Mar 7, 2022 at 1:55 PM Chris Egerton 
> wrote:
>
> > Hi Men,
> >
> > That config snippet has a small syntax error: all double quotes should be
> > escaped. Assuming you tried something like this:
> >
> > "database.history.producer.sasl.jaas.config":
> > "org.apache.kafka.common.security.scram.ScramLoginModule required
> > username=\"${file:/path/file.pro:user}\"
> > password=\"${file:/path/file.pro:password}\";"
> >
> > and still ran into issues, we'd probably need to see log files or, at the
> > very least, the stack trace for the task from the REST API (if it failed
> at
> > all) in order to follow up and provide more help.
> >
> > Cheers,
> >
> > Chris
> >
> > On Mon, Mar 7, 2022 at 3:26 PM Men Lim  wrote:
> >
> > > Hi Chris,
> > > I was getting an unauthorized/authentication error message when I was
> > > trying it out last Friday.  I tried looking for the exact message in
> the
> > > connect.log.* files but was not very successful.  In my connector
> file, I
> > > have
> > >
> > > {
> > >  "name":"blah",
> > >  "config": {
> > >  ...
> > >  ...
> > >  "database.history.producer.sasl.jaas.config":
> > > "org.apache.kafka.common.security.scram.ScramLoginModule required
> > > username=\"000\" password=\"00\";",
> > >  ...
> > >   }
> > > }
> > >
> > > I changed the database.history.producer.sasl.jaas.config to:
> > >
> > > "database.history.producer.sasl.jaas.config":
> > > "org.apache.kafka.common.security.scram.ScramLoginModule required
> > > username="${file:/path/file.pro:user"} password="${file:/path/file.pro
> :
> > > password}";",
> > >
> > > On Mon, Mar 7, 2022 at 9:46 AM Chris Egerton 
> > > wrote:
> > >
> > > > Hi Men,
> > > >
> > > > The config provider mechanism should work for every property in a
> > > connector
> > > > config, and every property in a worker config except for the
> > plugin.path
> > > > property (see KAFKA-9845 [1]). You can also use it for only part of a
> > > > single property, or even multiple parts, like in this example
> > (assuming a
> > > > config provider named "file"):
> > > >
> > > >
> > sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule
> > > > required username="${file:/some/file.properties:username}"
> > > > password="${file:/some/file.properties:password}"
> > > >
> > > > What sorts of errors are you seeing when trying to use a config
> > provider
> > > > with sasl/scram credentials?
> > > >
> > > > [1] - https://issues.apache.org/jira/browse/KAFKA-9845
> > > >
> > > > Cheers,
> > > >
> > > > Chris
> > > >
> > > > On Mon, Mar 7, 2022 at 10:35 AM Men Lim  wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > recently, I found out about
> > > > >
> > > > > config.providers=file
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> config.providers.file.class=org.apache.kafka.common.config.provider.FileConfigProvider
> > > > >
> > > > > This works great to remove our embedded database password into an
> > > > external
> > > > > file.  However, it does not work when I tried to do the same thing
> > with
> > > > the
> > > > > sasl/scram username and password found in the distributor or
> > connector
> > > > file
> > > > > for kafka connect:
> > > > >
> > > > >
> > >
> sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule
> > > > > required \
> > > > > username="000" password="some_password";
> > > > >
> > > > > I was wondering if there's a way to secure these passwords as well?
> > > > >
> > > > > Thanks,
> > > > >
> > > >
> > >
> >
>


Re: securing sasl/scram username and password in kafka connect

2022-03-07 Thread Chris Egerton
Hi Men,

That config snippet has a small syntax error: all double quotes should be
escaped. Assuming you tried something like this:

"database.history.producer.sasl.jaas.config":
"org.apache.kafka.common.security.scram.ScramLoginModule required
username=\"${file:/path/file.pro:user}\"
password=\"${file:/path/file.pro:password}\";"

and still ran into issues, we'd probably need to see log files or, at the
very least, the stack trace for the task from the REST API (if it failed at
all) in order to follow up and provide more help.

Cheers,

Chris
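
[Editor's note: escaping mistakes like the one above are easy to avoid by
generating the connector JSON programmatically rather than writing it by
hand. A minimal sketch, reusing the property name and placeholder path from
this thread; json.dumps takes care of the backslash escaping that the
Connect REST API expects:]

```python
import json

# Write the JAAS value as a plain string; the ${file:...} references are
# left untouched for the worker's config provider to resolve at runtime.
jaas = (
    'org.apache.kafka.common.security.scram.ScramLoginModule required '
    'username="${file:/path/file.pro:user}" '
    'password="${file:/path/file.pro:password}";'
)

config = {
    "name": "blah",
    "config": {
        "database.history.producer.sasl.jaas.config": jaas,
    },
}

# json.dumps escapes the inner double quotes as \" automatically.
payload = json.dumps(config, indent=2)
print(payload)  # ready to POST to the Connect REST API
```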

On Mon, Mar 7, 2022 at 3:26 PM Men Lim  wrote:

> Hi Chris,
> I was getting an unauthorized/authentication error message when I was
> trying it out last Friday.  I tried looking for the exact message in the
> connect.log.* files but was not very successful.  In my connector file, I
> have
>
> {
>  "name":"blah",
>  "config": {
>  ...
>  ...
>  "database.history.producer.sasl.jaas.config":
> "org.apache.kafka.common.security.scram.ScramLoginModule required
> username=\"000\" password=\"00\";",
>  ...
>   }
> }
>
> I changed the database.history.producer.sasl.jaas.config to:
>
> "database.history.producer.sasl.jaas.config":
> "org.apache.kafka.common.security.scram.ScramLoginModule required
> username="${file:/path/file.pro:user"} password="${file:/path/file.pro:
> password}";",
>
> On Mon, Mar 7, 2022 at 9:46 AM Chris Egerton 
> wrote:
>
> > Hi Men,
> >
> > The config provider mechanism should work for every property in a
> connector
> > config, and every property in a worker config except for the plugin.path
> > property (see KAFKA-9845 [1]). You can also use it for only part of a
> > single property, or even multiple parts, like in this example (assuming a
> > config provider named "file"):
> >
> > sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule
> > required username="${file:/some/file.properties:username}"
> > password="${file:/some/file.properties:password}"
> >
> > What sorts of errors are you seeing when trying to use a config provider
> > with sasl/scram credentials?
> >
> > [1] - https://issues.apache.org/jira/browse/KAFKA-9845
> >
> > Cheers,
> >
> > Chris
> >
> > On Mon, Mar 7, 2022 at 10:35 AM Men Lim  wrote:
> >
> > > Hi all,
> > >
> > > recently, I found out about
> > >
> > > config.providers=file
> > >
> > >
> > >
> >
> config.providers.file.class=org.apache.kafka.common.config.provider.FileConfigProvider
> > >
> > > This works great to remove our embedded database password into an
> > external
> > > file.  However, it does not work when I tried to do the same thing with
> > the
> > > sasl/scram username and password found in the distributor or connector
> > file
> > > for kafka connect:
> > >
> > >
> sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule
> > > required \
> > > username="000" password="some_password";
> > >
> > > I was wondering if there's a way to secure these passwords as well?
> > >
> > > Thanks,
> > >
> >
>


Re: securing sasl/scram username and password in kafka connect

2022-03-07 Thread Chris Egerton
Hi Men,

The config provider mechanism should work for every property in a connector
config, and every property in a worker config except for the plugin.path
property (see KAFKA-9845 [1]). You can also use it for only part of a
single property, or even multiple parts, like in this example (assuming a
config provider named "file"):

sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule
required username="${file:/some/file.properties:username}"
password="${file:/some/file.properties:password}"

What sorts of errors are you seeing when trying to use a config provider
with sasl/scram credentials?

[1] - https://issues.apache.org/jira/browse/KAFKA-9845

Cheers,

Chris
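
[Editor's note: for completeness, a sketch of the worker-side pieces the
config provider mechanism needs. The file path and key names are
illustrative, not from this thread:]

```properties
# connect-distributed.properties (excerpt)
config.providers=file
config.providers.file.class=org.apache.kafka.common.config.provider.FileConfigProvider

# /some/file.properties -- must be readable by the OS user running the
# Connect worker; referenced as ${file:/some/file.properties:username} etc.
username=alice
password=some_password
```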

On Mon, Mar 7, 2022 at 10:35 AM Men Lim  wrote:

> Hi all,
>
> recently, I found out about
>
> config.providers=file
>
>
> config.providers.file.class=org.apache.kafka.common.config.provider.FileConfigProvider
>
> This works great for moving our embedded database password into an external
> file.  However, it does not work when I tried to do the same thing with the
> sasl/scram username and password found in the distributor or connector file
> for kafka connect:
>
> sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule
> required \
> username="000" password="some_password";
>
> I was wondering if there's a way to secure these passwords as well?
>
> Thanks,
>


Re: Is MirrorMaker 2 horizontally scalable?

2022-03-03 Thread Chris Egerton
Hi Dmitri,

There's at least one issue with MirrorMaker 2 that impacts horizontal
scalability and has not yet been addressed:
https://issues.apache.org/jira/browse/KAFKA-9981. There is some work in
progress to fix it (
https://cwiki.apache.org/confluence/display/KAFKA/KIP-710%3A+Full+support+for+distributed+mode+in+dedicated+MirrorMaker+2.0+clusters),
but the effort hasn't received much attention to date.

There may be other issues as well, but until KAFKA-9981 is resolved,
running MirrorMaker 2 in a multi-node cluster will be at best difficult and
at worst, impossible.

Cheers,

Chris
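
[Editor's note: the single-node setup from the KIP-382 walkthrough referenced
above is driven by a properties file along these lines; cluster aliases and
bootstrap addresses are illustrative:]

```properties
# mm2.properties (sketch)
clusters = A, B
A.bootstrap.servers = broker-a:9092
B.bootstrap.servers = broker-b:9092

# replicate all topics from cluster A to cluster B
A->B.enabled = true
A->B.topics = .*
```

Until KAFKA-9981 is resolved, starting a second connect-mirror-maker.sh node
with the same file will not spread this load across nodes.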

On Thu, Mar 3, 2022 at 10:00 AM Dmitri Pavlov  wrote:

> Hi,
>
> A quick question, maybe you can help?
>
> Trying to follow this article
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0
> -> "Walkthrough: Running MirrorMaker 2.0", and the last lines in the
> paragraph are
>
> ==
> Second, launch one or more MirrorMaker cluster nodes:
> $ ./bin/connect-mirror-maker.sh mm2.properties
> ==
>
> But apparently it does not work this way: one of 2 simultaneously started
> instances will remain idle, confirmed with Jconsole -> Mbeans.
> Setup: Broker A (in cluster A) -> MM2 2 instances -> Broker B (cluster B),
> for simplicity there is only one broker per cluster.
> A simple experiment is, when one instance is configured and started to
> replicate topic A only and another topic B only, only one topic will be
> replicated, when 2 instances are running in parallel. While, if only one of
> the instances is running at a time, each topic will be replicated correctly.
>
> The main question -> is Mirrormaker 2 horizontally scalable? And if yes,
> would be possible to share a link to a document that describes the setup
> process?
>
> Thanks in advance,
> Dmitri.
>
>
> This e-mail may contain information that is privileged or confidential. If
> you are not the intended recipient, please delete the e-mail and any
> attachments and notify us immediately.
>
>


Re: [VOTE] 2.8.1 RC1

2021-09-14 Thread Chris Egerton
Hi David,

I took a look at the Javadocs and noticed a small issue. Using the search
bar at the top of the landing page (
https://home.apache.org/~dajac/kafka-2.8.1-rc1/javadoc/), I entered
"KafkaProducer" and clicked on the search item that came up. This brought
me to
https://home.apache.org/~dajac/kafka-2.8.1-rc1/javadoc/undefined/org/apache/kafka/clients/producer/KafkaProducer.html,
which displayed a 404 "Not Found" error.

This doesn't appear to be a regression as the same behavior occurs for me
when browsing the 2.8.0 Javadocs (
https://kafka.apache.org/28/javadoc/index.html?overview-summary.html), but
I wanted to bring it up just in case. Normally I don't use the search bar
but with the removal of sidebar frames from Javadocs I've had to start
relying on it a little bit more.

Cheers,

Chris

On Tue, Sep 14, 2021 at 12:56 PM Randall Hauch  wrote:

> Thanks for generating a new RC1 with the corrected site docs, David.
>
> I was able to successfully complete the following:
>
> - Installed 2.8.1 RC0 and performed quickstart for broker and Connect
> - Verified signatures and checksums
> - Verified the tag
> - Manually compared the release notes to JIRA
> - Build release archive from the tag, ran Connect tests, installed locally,
> and ran a portion of quickstart
> - Manually spotchecked the Javadocs
> - Verified the site docs at https://kafka.apache.org/28/documentation.html
> has corrected version references, except for the tar and cd commands in
> step 1 of the https://kafka.apache.org/28/documentation.html#quickstart.
>
> I think that last issue is minor and not worth another RC, since the other
> version references in https://kafka.apache.org/28/documentation.html do
> reference 2.8.1 and we can easily fix it on the website, optionally as part
> of the other post-vote changes to the site.
>
> So I'm +1 (binding)
>
> Best regards,
>
> Randall
>
> On Tue, Sep 14, 2021 at 8:39 AM David Jacot 
> wrote:
>
> > Hello Kafka users, developers and client-developers,
> >
> > This is the second candidate for release of Apache Kafka 2.8.1.
> >
> > Apache Kafka 2.8.1 is a bugfix release and fixes 49 issues since the
> 2.8.0
> > release. Please see the release notes for more information.
> >
> > Release notes for the 2.8.1 release:
> > https://home.apache.org/~dajac/kafka-2.8.1-rc1/RELEASE_NOTES.html
> >
> > *** Please download, test and vote by Friday, September 17, 9am PT
> >
> > Kafka's KEYS file containing PGP keys we use to sign the release:
> > https://kafka.apache.org/KEYS
> >
> > * Release artifacts to be voted upon (source and binary):
> > https://home.apache.org/~dajac/kafka-2.8.1-rc1/
> >
> > * Maven artifacts to be voted upon:
> > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> >
> > * Javadoc:
> > https://home.apache.org/~dajac/kafka-2.8.1-rc1/javadoc/
> >
> > * Tag to be voted upon (off 2.8 branch) is the 2.8.1 tag:
> > https://github.com/apache/kafka/releases/tag/2.8.1-rc1
> >
> > * Documentation:
> > https://kafka.apache.org/28/documentation.html
> >
> > * Protocol:
> > https://kafka.apache.org/28/protocol.html
> >
> > * Successful Jenkins builds for the 2.8 branch:
> > Unit/integration tests:
> > https://ci-builds.apache.org/job/Kafka/job/kafka/job/2.8/80/
> > System tests:
> > https://jenkins.confluent.io/job/system-test-kafka/job/2.8/214/
> >
> > /**
> >
> > Thanks,
> > David
> >
>


New Kafka Connector

2016-08-22 Thread Chris Egerton
Hi there,

We've recently open-sourced a BigQuery sink connector and would like to
request that it be added to the Kafka Connector Hub (
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Connector+Hub). The
project can be found at https://github.com/wepay/kafka-connect-bigquery, and
the connector itself has been deployed to Maven Central (latest version is
0.2.1, but it may still be in the process of syncing at the time of
writing). Is there anything else you'd like to know about it before posting
it to your page?

Cheers!

Chris Egerton
Software Engineering Intern, WePay