Re: [VOTE] 2.4.1 RC0

2020-03-09 Thread Sean Glover
> > > > om source and ran unit and integration tests. They passed. There
> > > > was a large number of skipped tests, but I'm assuming that is
> > > > intentional.
> > > >
> > > > Cheers
> > > > Eno
> > > >
> > > > On Tue, Mar 3, 2020 at 8:42 PM Eric Lalonde <e...@autonomic.ai> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I ran:
> > > > > $ https://github.com/elalonde/kafka/blob/master/bin/verify-kafka-rc.sh
> > > > > 2.4.1 https://home.apache.org/~bbejeck/kafka-2.4.1-rc0
> > > > >
> > > > > All checksums and signatures are good and all unit and integration
> > > > > tests that were executed passed successfully.
> > > > >
> > > > > - Eric
> > > > >
> > > > > > On Mar 2, 2020, at 6:39 PM, Bill Bejeck <bbej...@gmail.com> wrote:
> > > > > >
> > > > > > Hello Kafka users, developers and client-developers,
> > > > > >
> > > > > > This is the first candidate for release of Apache Kafka 2.4.1.
> > > > > >
> > > > > > This is a bug fix release and it includes fixes and improvements
> > > > > > from 38 JIRAs, including a few critical bugs.
> > > > > >
> > > > > > Release notes for the 2.4.1 release:
> > > > > > https://home.apache.org/~bbejeck/kafka-2.4.1-rc0/RELEASE_NOTES.html
> > > > > >
> > > > > > *Please download, test and vote by Thursday, March 5, 9 am PT*
> > > > > >
> > > > > > Kafka's KEYS file containing PGP keys we use to sign the release:
> > > > > > https://kafka.apache.org/KEYS
> > > > > >
> > > > > > * Release artifacts to be voted upon (source and binary):
> > > > > > https://home.apache.org/~bbejeck/kafka-2.4.1-rc0/
> > > > > >
> > > > > > * Maven artifacts to be voted upon:
> > > > > > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> > > > > >
> > > > > > * Javadoc:
> > > > > > https://home.apache.org/~bbejeck/kafka-2.4.1-rc0/javadoc/
> > > > > >
> > > > > > * Tag to be voted upon (off 2.4 branch) is the 2.4.1 tag:
> > > > > > https://github.com/apache/kafka/releases/tag/2.4.1-rc0
> > > > > >
> > > > > > * Documentation:
> > > > > > https://kafka.apache.org/24/documentation.html
> > > > > >
> > > > > > * Protocol:
> > > > > > https://kafka.apache.org/24/protocol.html
> > > > > >
> > > > > > * Successful Jenkins builds for the 2.4 branch:
> > > > > > Unit/integration tests: Links to successful unit/integration test
> > > > > > build to follow
> > > > > > System tests:
> > > > > > https://jenkins.confluent.io/job/system-test-kafka/job/2.4/152/
> > > > > >
> > > > > > Thanks,
> > > > > > Bill Bejeck
> > >
> > > --
> > > David Arthur
>
> --
>
> Thanks!
> --Vahid


-- 
Sean Glover
Principal Engineer, Akka Streams, Lightbend, Inc.
@seg1o, in/seanaglover


Re: [DISCUSS] KIP-489 Kafka Consumer Record Latency Metric

2020-01-17 Thread Sean Glover
Hi Habib,

With regards to your earlier question about timezones, I've updated the KIP
to remove the LatencyTime abstraction since it is no longer relevant.  I
added a note about epoch time as well.

Thanks,
Sean

On Wed, Jan 15, 2020 at 8:28 AM Habib Nahas  wrote:

> Hi Sean,
>
> Thats great, look forward to it.
>
> Thanks,
> Habib
>
> On Tue, Jan 14, 2020, at 2:55 PM, Sean Glover wrote:
> > Hi Habib,
> >
> > Thank you for the reminder. I'll update the KIP this week and address the
> > feedback from you and Gokul.
> >
> > Regards,
> > Sean
> >
> > On Tue, Jan 14, 2020 at 9:06 AM Habib Nahas  wrote:
> >
> > > Any chance of an update on the KIP? We are interested in seeing this
> move
> > > forward.
> > >
> > > Thanks,
> > > Habib
> > > Sr SDE, AWS
> > >
> > > On Wed, Dec 18, 2019, at 3:27 PM, Habib Nahas wrote:
> > > > Thanks Sean. Look forward to the updated KIP.
> > > >
> > > > Regards,
> > > > Habib
> > > >
> > > > On Fri, Dec 13, 2019, at 6:22 AM, Sean Glover wrote:
> > > > > Hi,
> > > > >
> > > > > After my last reply I had a nagging feeling something wasn't right,
> > > and I
> > > > > remembered that epoch time is UTC. This makes the discussion about
> > > > > timezone irrelevant, since we're always using UTC. This makes the
> need
> > > for
> > > > > the LatencyTime interface that I proposed in the design irrelevant
> as
> > > well,
> > > > > since I can no longer think about how that might be useful. I'll
> update
> > > > > the KIP. I'll also review KIP-32 to understand message timestamps
> > > better
> > > > > so I can explain the different types of latency results that could
> be
> > > > > reported with this metric.
> > > > >
> > > > > Regards,
> > > > > Sean
> > > > >
> > > > > On Thu, Dec 12, 2019 at 6:25 PM Sean Glover <
> sean.glo...@lightbend.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > Hi Habib,
> > > > > >
> > > > > > Thanks for question! If the consumer is in a different timezone
> than
> > > the
> > > > > > timezone used to produce messages to a partition then you can
> use an
> > > > > > implementation of LatencyTime to return the current time of that
> > > timezone.
> > > > > > The current design assumes that messages produced to a partition
> > > must all
> > > > > > be produced from the same timezone. If timezone metadata were
> > > encoded into
> > > > > > each message then it would be possible to automatically
> determine the
> > > > > > source timezone and calculate latency, however the current design
> > > will not
> > > > > > pass individual messages into LatencyTime to retrieve message
> > > metadata.
> > > > > > Instead, the LatencyTime.getWallClockTime method is only called
> once
> > > per
> > > > > > fetch request response per partition and then the metric is
> recorded
> > > once
> > > > > > the latency calculation is complete. This follows the same
> design as
> > > the
> > > > > > current consumer lag metric which calculates offset lag based on
> the
> > > last
> > > > > > message of the fetch request response for a partition. Since the
> > > metric is
> > > > > > just an aggregate (max/mean) over some time window we only need
> to
> > > > > > occasionally calculate latency, which will have negligible
> impact on
> > > the
> > > > > > performance of consumer polling.
> > > > > >
> > > > > > A simple implementation of LatencyTime that returns wall clock
> time
> > > for
> > > > > > the Asia/Singapore timezone for all partitions:
> > > > > >
> > > > > > import java.time.*;
> > > > > >
> > > > > > class SingaporeTime implements LatencyTime {
> > > > > > ZoneId zoneSingapore = ZoneId.of("Asia/Singapore");
> > > > > > Clock clockSingapore = Clock.system(zoneSingapore);
> > > > > >
> > > > > > @Override
> > > > > > public long getWallClockTime(TopicPartition tp) {
> > > > > > return clockSingapore.instant().getEpochSecond();
> > > > > > }
> > > > > >
> > > > > > ...
> > > > > > }
> > > > > >
> > > > > > Regards,
> > > > > > Sean

Re: [DISCUSS] KIP-489 Kafka Consumer Record Latency Metric

2020-01-17 Thread Sean Glover
Hi Gokul,

Thank you for your detailed review.  I've summarized the updates I've made
to the KIP inline below.  Please review the updated KIP when you have time.

On Fri, Dec 20, 2019 at 6:56 AM Gokul Ramanan Subramanian <
gokul24...@gmail.com> wrote:

> Hi Sean.
>
> Thanks for writing this KIP. Sounds like a great addition. Few comments.
>
> 1. Currently, I see that you have proposed partition-level records-latency
> metrics and a global records-latency-max metric across all partitions for a
> given consumer group. Some Kafka users may organize their topics such that
> some topics are more important than others. *Why not have the latency
> metric at the topic level as well?* Although one could imagine having
> metrics aggregation outside of JMX to generate the topic-level metrics, I
> suppose having topic-level metrics will allow Kafka users to set up alarms
> at the topic level with greater ease. IMHO, this KIP should address this
> use case. Even if you believe we should not expose topic level metrics, it
> would be nice to see the KIP explain why.
>

I added a topic-level metric.  Something to consider is how to represent
latency at an aggregate (non-partition) level.  For both the client- and
topic-level metrics I specify in the KIP that the maximum of all
record-latency-max metrics for partitions assigned to that client should be
used.  Other aggregates could also be used, such as the median or some set
of percentiles, but one aggregate that does not make sense is to sum
latency across many partitions, because partitions are typically
consumed in parallel.
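
To make the aggregation concrete, here is a minimal client-side sketch of
taking the max across the per-partition values.  It assumes the metric name
("records-latency-max", tagged per partition) lands as proposed in the KIP;
none of these names are a shipped API yet:

import java.util.Map;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;

public final class LatencyAggregation {
  // Max over all per-partition records-latency-max metrics, mirroring how
  // the KIP proposes to derive the client- and topic-level values.
  public static double maxRecordLatency(KafkaConsumer<?, ?> consumer) {
    double max = Double.NaN;
    for (Map.Entry<MetricName, ? extends Metric> e : consumer.metrics().entrySet()) {
      MetricName n = e.getKey();
      if (n.name().equals("records-latency-max") && n.tags().containsKey("partition")) {
        double v = ((Number) e.getValue().metricValue()).doubleValue();
        if (Double.isNaN(max) || v > max) max = v;
      }
    }
    return max; // summing these values instead would be meaningless
  }
}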


>
> 2. Some existing solutions already expose the consumer group lag in time.
> See
>
> https://www.lightbend.com/blog/monitor-kafka-consumer-group-latency-with-kafka-lag-exporter
> for
> an example. *The KIP should reference existing solutions and suggest the
> benefits of using the native solution that you propose*.
>

I added Kafka Lag Exporter to the rejected alternatives section.  To
summarize, this project will *estimate* the latency of a partition, and can
only do so when the group partition offsets are committed back to Kafka.

For full disclosure, I am the author of Kafka Lag Exporter (and the linked
blog post), and although I recommend it as being a convenient option to
measure latency I think that a KafkaConsumer metric would be more ideal.


>
> 3. If a message was produced a long time ago, and a new consumer group has
> been created, then the latency metrics are going to be very high in value
> until the consumer group catches up. This is especially true in the context
> of KIP-405 (
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage
> )
> which allows reading very old messages. Therefore, a consumer application
> that relies on reading all messages from the past will report a high
> records-latency for a while. *I think that the KIP should note down the
> caveat that setting SLAs on records-latency makes sense only in steady
> state, and not for bootstrapping new consumer groups.*
>
>
Great point.  I've added your note almost verbatim to the KIP.


> 4. Since the community has used the term consumer-lag so often, *why not
> call the metric consumer-lag-millis which makes the units clear as well*.
> records-latency is a bit confusing at least for me.
>

I agree that "lag" and "latency" are overloaded terms.  Since "latency"
doesn't appear often in Kafka literature, and in my experience, is usually
used to refer to time-based lag, I thought it would be the best name for
the metric.  According to queueing theory it could also be called "wait
time", but this would depend on the timestamp type as well (user-defined,
CreateTime, or LogAppendTime).  I'm curious what others think.
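
For illustration, the per-record "wait time" reduces to a simple epoch-millis
difference; what that difference means depends on the record's timestamp
type.  A sketch, not part of the proposed metric implementation:

import org.apache.kafka.clients.consumer.ConsumerRecord;

public final class RecordLatency {
  // With CreateTime this measures produce-to-consume latency; with
  // LogAppendTime it measures broker-append-to-consume latency; a
  // user-defined timestamp can mean whatever the producer encoded.
  public static long latencyMs(ConsumerRecord<?, ?> record) {
    return System.currentTimeMillis() - record.timestamp(); // both epoch millis (UTC)
  }
}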


>
> Cheers.
>
> On Wed, Dec 18, 2019 at 3:28 PM Habib Nahas  wrote:
>
> > Thanks Sean. Look forward to the updated KIP.
> >
> > Regards,
> > Habib
> >
> > On Fri, Dec 13, 2019, at 6:22 AM, Sean Glover wrote:
> > > Hi,
> > >
> > > After my last reply I had a nagging feeling something wasn't right,
> and I
> > > remembered that epoch time is UTC. This makes the discussion about
> > > timezone irrelevant, since we're always using UTC. This makes the need
> > for
> > > the LatencyTime interface that I proposed in the design irrelevant as
> > well,
> > > since I can no longer think about how that might be useful. I'll update
> > > the KIP. I'll also review KIP-32 to understand message timestamps
> better
> > > so I can explain the different types of latency results that could be
> > > reported with this metric.
> > >
> > > Regards,
> > > Sean
> > >
>

Re: [ANNOUNCE] New Kafka PMC Members: Colin, Vahid and Manikumar

2020-01-15 Thread Sean Glover
Congratulations Colin, Vahid and Manikumar and thank you for all your
excellent work on Apache Kafka!

On Wed, Jan 15, 2020 at 8:42 AM Ron Dagostino  wrote:

> Congratulations!
>
> > On Jan 15, 2020, at 5:04 AM, Viktor Somogyi-Vass <
> viktorsomo...@gmail.com> wrote:
> >
> > Congrats to you guys, it's a great accomplishment! :)
> >
> >> On Wed, Jan 15, 2020 at 10:20 AM David Jacot 
> wrote:
> >>
> >> Congrats!
> >>
> >>> On Wed, Jan 15, 2020 at 12:00 AM James Cheng 
> wrote:
> >>>
> >>> Congrats Colin, Vahid, and Manikumar!
> >>>
> >>> -James
> >>>
>  On Jan 14, 2020, at 10:59 AM, Tom Bentley 
> wrote:
> 
>  Congratulations!
> 
>  On Tue, Jan 14, 2020 at 6:57 PM Rajini Sivaram <
> >> rajinisiva...@gmail.com>
>  wrote:
> 
> > Congratulations Colin, Vahid and Manikumar!
> >
> > Regards,
> > Rajini
> >
> > On Tue, Jan 14, 2020 at 6:32 PM Mickael Maison <
> >>> mickael.mai...@gmail.com>
> > wrote:
> >
> >> Congrats Colin, Vahid and Manikumar!
> >>
> >> On Tue, Jan 14, 2020 at 5:43 PM Ismael Juma 
> >> wrote:
> >>>
> >>> Congratulations Colin, Vahid and Manikumar!
> >>>
> >>> Ismael
> >>>
> >>> On Tue, Jan 14, 2020 at 9:30 AM Gwen Shapira 
> > wrote:
> >>>
>  Hi everyone,
> 
>  I'm happy to announce that Colin McCabe, Vahid Hashemian and Manikumar
>  Reddy are now members of Apache Kafka PMC.
> 
>  Colin and Manikumar became committers on Sept 2018 and Vahid on Jan
>  2019. They all contributed many patches, code reviews and participated
>  in many KIP discussions. We appreciate their contributions and are
>  looking forward to many more to come.
> 
>  Congrats Colin, Vahid and Manikumar!
> 
>  Gwen, on behalf of Apache Kafka PMC
> 
> >>
> >
> >>>
> >>>
> >>
>


Re: [DISCUSS] KIP-489 Kafka Consumer Record Latency Metric

2020-01-14 Thread Sean Glover
Hi Habib,

Thank you for the reminder.  I'll update the KIP this week and address the
feedback from you and Gokul.

Regards,
Sean

On Tue, Jan 14, 2020 at 9:06 AM Habib Nahas  wrote:

> Any chance of an update on the KIP? We are interested in seeing this move
> forward.
>
> Thanks,
> Habib
> Sr SDE, AWS
>
> On Wed, Dec 18, 2019, at 3:27 PM, Habib Nahas wrote:
> > Thanks Sean. Look forward to the updated KIP.
> >
> > Regards,
> > Habib
> >
> > On Fri, Dec 13, 2019, at 6:22 AM, Sean Glover wrote:
> > > Hi,
> > >
> > > After my last reply I had a nagging feeling something wasn't right,
> and I
> > > remembered that epoch time is UTC. This makes the discussion about
> > > timezone irrelevant, since we're always using UTC. This makes the need
> for
> > > the LatencyTime interface that I proposed in the design irrelevant as
> well,
> > > since I can no longer think about how that might be useful. I'll update
> > > the KIP. I'll also review KIP-32 to understand message timestamps
> better
> > > so I can explain the different types of latency results that could be
> > > reported with this metric.
> > >
> > > Regards,
> > > Sean
> > >
> > > On Thu, Dec 12, 2019 at 6:25 PM Sean Glover  >
> > > wrote:
> > >
> > > > Hi Habib,
> > > >
> > > > Thanks for question! If the consumer is in a different timezone than
> the
> > > > timezone used to produce messages to a partition then you can use an
> > > > implementation of LatencyTime to return the current time of that
> timezone.
> > > > The current design assumes that messages produced to a partition
> must all
> > > > be produced from the same timezone. If timezone metadata were
> encoded into
> > > > each message then it would be possible to automatically determine the
> > > > source timezone and calculate latency, however the current design
> will not
> > > > pass individual messages into LatencyTime to retrieve message
> metadata.
> > > > Instead, the LatencyTime.getWallClockTime method is only called once
> per
> > > > fetch request response per partition and then the metric is recorded
> once
> > > > the latency calculation is complete. This follows the same design as
> the
> > > > current consumer lag metric which calculates offset lag based on the
> last
> > > > message of the fetch request response for a partition. Since the
> metric is
> > > > just an aggregate (max/mean) over some time window we only need to
> > > > occasionally calculate latency, which will have negligible impact on
> the
> > > > performance of consumer polling.
> > > >
> > > > A simple implementation of LatencyTime that returns wall clock time
> for
> > > > the Asia/Singapore timezone for all partitions:
> > > >
> > > > import java.time.*;
> > > >
> > > > class SingaporeTime implements LatencyTime {
> > > > ZoneId zoneSingapore = ZoneId.of("Asia/Singapore");
> > > > Clock clockSingapore = Clock.system(zoneSingapore);
> > > >
> > > > @Override
> > > > public long getWallClockTime(TopicPartition tp) {
> > > > return clockSingapore.instant().getEpochSecond();
> > > > }
> > > >
> > > > ...
> > > > }
> > > >
> > > > Regards,
> > > > Sean
> > > >
> > > >
> > > > On Thu, Dec 12, 2019 at 6:18 AM Habib Nahas  wrote:
> > > >
> > > >> Hi Sean,
> > > >>
> > > >> Thanks for the KIP.
> > > >>
> > > >> As I understand it users are free to set their own timestamp on
> > > >> ProducerRecord. What is the recommendation for the proposed metric
> in a
> > > >> scenario where the user sets this timestamp in timezone A and
> consumes the
> > > >> record in timezone B. Its not clear to me if a custom
> implementation of
> > > >> LatencyTime will help here.
> > > >>
> > > >> Thanks,
> > > >> Habib
> > > >>
> > > >> On Wed, Dec 11, 2019, at 4:52 PM, Sean Glover wrote:
> > > >> > Hello again,
> > > >> >
> > > >> > There has been some interest in this KIP recently. I'm bumping the
> > > >> thread
> > > >> > to encourage feedback on the design.

Re: [DISCUSS] KIP-489 Kafka Consumer Record Latency Metric

2019-12-12 Thread Sean Glover
Hi,

After my last reply I had a nagging feeling something wasn't right, and I
remembered that epoch time is UTC.  This makes the discussion about
timezone irrelevant, since we're always using UTC.  This makes the need for
the LatencyTime interface that I proposed in the design irrelevant as well,
since I can no longer think about how that might be useful.  I'll update
the KIP.  I'll also review KIP-32 to understand message timestamps better
so I can explain the different types of latency results that could be
reported with this metric.
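
A quick demonstration of why the timezone question dissolves: Clock.millis()
returns the same epoch value regardless of the clock's zone, so no zone-aware
abstraction is needed for the latency calculation.

import java.time.Clock;
import java.time.ZoneId;

public final class EpochDemo {
  public static void main(String[] args) {
    Clock utc = Clock.systemUTC();
    Clock singapore = Clock.system(ZoneId.of("Asia/Singapore"));
    // Both print (modulo a few ms of scheduling noise) the same value:
    // milliseconds since 1970-01-01T00:00:00Z.  The zone only affects how
    // an instant renders as a local date/time, not the instant itself.
    System.out.println(utc.millis());
    System.out.println(singapore.millis());
  }
}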

Regards,
Sean

On Thu, Dec 12, 2019 at 6:25 PM Sean Glover 
wrote:

> Hi Habib,
>
> Thanks for question! If the consumer is in a different timezone than the
> timezone used to produce messages to a partition then you can use an
> implementation of LatencyTime to return the current time of that timezone.
> The current design assumes that messages produced to a partition must all
> be produced from the same timezone.  If timezone metadata were encoded into
> each message then it would be possible to automatically determine the
> source timezone and calculate latency, however the current design will not
> pass individual messages into LatencyTime to retrieve message metadata.
> Instead, the LatencyTime.getWallClockTime method is only called once per
> fetch request response per partition and then the metric is recorded once
> the latency calculation is complete.  This follows the same design as the
> current consumer lag metric which calculates offset lag based on the last
> message of the fetch request response for a partition.  Since the metric is
> just an aggregate (max/mean) over some time window we only need to
> occasionally calculate latency, which will have negligible impact on the
> performance of consumer polling.
>
> A simple implementation of LatencyTime that returns wall clock time for
> the Asia/Singapore timezone for all partitions:
>
> import java.time.*;
>
> class SingaporeTime implements LatencyTime {
>   ZoneId zoneSingapore = ZoneId.of("Asia/Singapore");
>   Clock clockSingapore = Clock.system(zoneSingapore);
>
>   @Override
>   public long getWallClockTime(TopicPartition tp) {
>     return clockSingapore.instant().getEpochSecond();
>   }
>
>   ...
> }
>
> Regards,
> Sean
>
>
> On Thu, Dec 12, 2019 at 6:18 AM Habib Nahas  wrote:
>
>> Hi Sean,
>>
>> Thanks for the KIP.
>>
>> As I understand it users are free to set their own timestamp on
>> ProducerRecord. What is the recommendation for the proposed metric in a
>> scenario where the user sets this timestamp in timezone A and consumes the
>> record in timezone B. Its not clear to me if a custom implementation of
>> LatencyTime will help here.
>>
>> Thanks,
>> Habib
>>
>> On Wed, Dec 11, 2019, at 4:52 PM, Sean Glover wrote:
>> > Hello again,
>> >
>> > There has been some interest in this KIP recently. I'm bumping the
>> thread
>> > to encourage feedback on the design.
>> >
>> > Regards,
>> > Sean
>> >
>> > On Mon, Jul 15, 2019 at 9:01 AM Sean Glover 
>> > wrote:
>> >
>> > > To hopefully spark some discussion I've copied the motivation section
>> from
>> > > the KIP:
>> > >
>> > > Consumer lag is a useful metric to monitor how many records are
>> queued to
>> > > be processed. We can look at individual lag per partition or we may
>> > > aggregate metrics. For example, we may want to monitor what the
>> maximum lag
>> > > of any particular partition in our consumer subscription so we can
>> identify
>> > > hot partitions, caused by an insufficient producing partitioning
>> strategy.
>> > > We may want to monitor a sum of lag across all partitions so we have a
>> > > sense as to our total backlog of messages to consume. Lag in offsets
>> is
>> > > useful when you have a good understanding of your messages and
>> processing
>> > > characteristics, but it doesn’t tell us how far behind *in time* we
>> are.
>> > > This is known as wait time in queueing theory, or more informally it’s
>> > > referred to as latency.
>> > >
>> > > The latency of a message can be defined as the difference between when
>> > > that message was first produced to when the message is received by a
>> > > consumer. The latency of records in a partition correlates with lag,
>> but a
>> > > larger lag doesn’t necessarily mean a larger latency. For example, a
>> topic
>> > > consumed by two separate application consumer groups

Re: [DISCUSS] KIP-489 Kafka Consumer Record Latency Metric

2019-12-12 Thread Sean Glover
Hi Habib,

Thanks for question! If the consumer is in a different timezone than the
timezone used to produce messages to a partition then you can use an
implementation of LatencyTime to return the current time of that timezone.
The current design assumes that messages produced to a partition must all
be produced from the same timezone.  If timezone metadata were encoded into
each message then it would be possible to automatically determine the
source timezone and calculate latency, however the current design will not
pass individual messages into LatencyTime to retrieve message metadata.
Instead, the LatencyTime.getWallClockTime method is only called once per
fetch request response per partition and then the metric is recorded once
the latency calculation is complete.  This follows the same design as the
current consumer lag metric which calculates offset lag based on the last
message of the fetch request response for a partition.  Since the metric is
just an aggregate (max/mean) over some time window we only need to
occasionally calculate latency, which will have negligible impact on the
performance of consumer polling.

A simple implementation of LatencyTime that returns wall clock time for the
Asia/Singapore timezone for all partitions:

import java.time.*;

class SingaporeTime implements LatencyTime {
  ZoneId zoneSingapore = ZoneId.of("Asia/Singapore");
  Clock clockSingapore = Clock.system(zoneSingapore);

  @Override
  public long getWallClockTime(TopicPartition tp) {
    return clockSingapore.instant().getEpochSecond();
  }

  ...
}

Regards,
Sean


On Thu, Dec 12, 2019 at 6:18 AM Habib Nahas  wrote:

> Hi Sean,
>
> Thanks for the KIP.
>
> As I understand it users are free to set their own timestamp on
> ProducerRecord. What is the recommendation for the proposed metric in a
> scenario where the user sets this timestamp in timezone A and consumes the
> record in timezone B. Its not clear to me if a custom implementation of
> LatencyTime will help here.
>
> Thanks,
> Habib
>
> On Wed, Dec 11, 2019, at 4:52 PM, Sean Glover wrote:
> > Hello again,
> >
> > There has been some interest in this KIP recently. I'm bumping the thread
> > to encourage feedback on the design.
> >
> > Regards,
> > Sean
> >
> > On Mon, Jul 15, 2019 at 9:01 AM Sean Glover 
> > wrote:
> >
> > > To hopefully spark some discussion I've copied the motivation section
> from
> > > the KIP:
> > >
> > > Consumer lag is a useful metric to monitor how many records are queued
> to
> > > be processed. We can look at individual lag per partition or we may
> > > aggregate metrics. For example, we may want to monitor what the
> maximum lag
> > > of any particular partition in our consumer subscription so we can
> identify
> > > hot partitions, caused by an insufficient producing partitioning
> strategy.
> > > We may want to monitor a sum of lag across all partitions so we have a
> > > sense as to our total backlog of messages to consume. Lag in offsets is
> > > useful when you have a good understanding of your messages and
> processing
> > > characteristics, but it doesn’t tell us how far behind *in time* we
> are.
> > > This is known as wait time in queueing theory, or more informally it’s
> > > referred to as latency.
> > >
> > > The latency of a message can be defined as the difference between when
> > > that message was first produced to when the message is received by a
> > > consumer. The latency of records in a partition correlates with lag,
> but a
> > > larger lag doesn’t necessarily mean a larger latency. For example, a
> topic
> > > consumed by two separate application consumer groups A and B may have
> > > similar lag, but different latency per partition. Application A is a
> > > consumer which performs CPU intensive business logic on each message it
> > > receives. It’s distributed across many consumer group members to
> handle the
> > > load quickly enough, but since its processing time is slower, it takes
> > > longer to process each message per partition. Meanwhile, Application B
> is
> > > a consumer which performs a simple ETL operation to land streaming
> data in
> > > another system, such as HDFS. It may have similar lag to Application
> A, but
> > > because it has a faster processing time its latency per partition is
> > > significantly less.
> > >
> > > If the Kafka Consumer reported a latency metric it would be easier to
> > > build Service Level Agreements (SLAs) based on non-functional
> requirements
> > > of the streaming system. For example,

Re: [DISCUSS] KIP-489 Kafka Consumer Record Latency Metric

2019-12-11 Thread Sean Glover
Hello again,

There has been some interest in this KIP recently.  I'm bumping the thread
to encourage feedback on the design.

Regards,
Sean

On Mon, Jul 15, 2019 at 9:01 AM Sean Glover 
wrote:

> To hopefully spark some discussion I've copied the motivation section from
> the KIP:
>
> Consumer lag is a useful metric to monitor how many records are queued to
> be processed.  We can look at individual lag per partition or we may
> aggregate metrics. For example, we may want to monitor what the maximum lag
> of any particular partition in our consumer subscription so we can identify
> hot partitions, caused by an insufficient producing partitioning strategy.
> We may want to monitor a sum of lag across all partitions so we have a
> sense as to our total backlog of messages to consume. Lag in offsets is
> useful when you have a good understanding of your messages and processing
> characteristics, but it doesn’t tell us how far behind *in time* we are.
> This is known as wait time in queueing theory, or more informally it’s
> referred to as latency.
>
> The latency of a message can be defined as the difference between when
> that message was first produced to when the message is received by a
> consumer.  The latency of records in a partition correlates with lag, but a
> larger lag doesn’t necessarily mean a larger latency. For example, a topic
> consumed by two separate application consumer groups A and B may have
> similar lag, but different latency per partition.  Application A is a
> consumer which performs CPU intensive business logic on each message it
> receives. It’s distributed across many consumer group members to handle the
> load quickly enough, but since its processing time is slower, it takes
> longer to process each message per partition.  Meanwhile, Application B is
> a consumer which performs a simple ETL operation to land streaming data in
> another system, such as HDFS. It may have similar lag to Application A, but
> because it has a faster processing time its latency per partition is
> significantly less.
>
> If the Kafka Consumer reported a latency metric it would be easier to
> build Service Level Agreements (SLAs) based on non-functional requirements
> of the streaming system.  For example, the system must never have a latency
> of greater than 10 minutes. This SLA could be used in monitoring alerts or
> as input to automatic scaling solutions.
>
> On Thu, Jul 11, 2019 at 12:36 PM Sean Glover 
> wrote:
>
>> Hi kafka-dev,
>>
>> I've created KIP-489 as a proposal for adding latency metrics to the
>> Kafka Consumer in a similar way as record-lag metrics are implemented.
>>
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/489%3A+Kafka+Consumer+Record+Latency+Metric
>>
>> Regards,
>> Sean
>>
>> --
>> Principal Engineer, Lightbend, Inc.
>>
>> <http://lightbend.com>
>>
>> @seg1o <https://twitter.com/seg1o>, in/seanaglover
>> <https://www.linkedin.com/in/seanaglover/>
>>
>
>
> --
> Principal Engineer, Lightbend, Inc.
>
> <http://lightbend.com>
>
> @seg1o <https://twitter.com/seg1o>, in/seanaglover
> <https://www.linkedin.com/in/seanaglover/>
>


Re: [VOTE] 2.4.0 RC1

2019-11-26 Thread Sean Glover
Hi,

I also used Eric's test script.  I had a few issues running it that I
address below[0][1], otherwise looks good.

- Signing keys all good
- All md5, sha1sums and sha512sums are good
- A couple transient test failures that passed on a second run
(ReassignPartitionsClusterTest.shouldMoveSinglePartitionWithinBroker,
SaslScramSslEndToEndAuthorizationTest.
testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl)
- Passes our own test suite for Alpakka Kafka (
https://travis-ci.org/akka/alpakka-kafka/builds/616861540,
https://github.com/akka/alpakka-kafka/pull/971)

+1 (non-binding)

..

Issues while running test script:

[0] Error with Eric's test script.  I had an issue running the script with my
version of bash (TMPDIR was unassigned), which I provided a PR for (
https://github.com/elalonde/kafka/pull/1)
[1] Gradle incompatibility. I ran into difficulty running the gradle build
with the latest version of gradle (6.0.1).  I had to revert to the last
patch of version 5 (5.6.4):

 ✘ seglo@slice  /tmp/verify-kafka-SP06GE1GpP/10169.out/kafka-2.4.0-src 
gradle wrapper --warning-mode all

> Configure project :
The maven plugin has been deprecated. This is scheduled to be removed in
Gradle 7.0. Please use the maven-publish plugin instead.
at
build_c0129pbfzzxjolwxmds3lsevz$_run_closure5.doCall(/tmp/verify-kafka-SP06GE1GpP/10169.out/kafka-2.4.0-src/build.gradle:160)
(Run with --stacktrace to get the full stack trace of this
deprecation warning.)

FAILURE: Build failed with an exception.

* Where:
Build file
'/tmp/verify-kafka-SP06GE1GpP/10169.out/kafka-2.4.0-src/build.gradle' line:
472

* What went wrong:
A problem occurred evaluating root project 'kafka-2.4.0-src'.
> Could not create task ':clients:spotbugsMain'.
   > Could not create task of type 'SpotBugsTask'.
  > Could not create an instance of type
com.github.spotbugs.internal.SpotBugsReportsImpl.
 >
org.gradle.api.reporting.internal.TaskReportContainer.<init>(Ljava/lang/Class;Lorg/gradle/api/Task;)V

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or
--debug option to get more log output. Run with --scan to get full insights.

* Get more help at https://help.gradle.org

BUILD FAILED in 699ms

On Tue, Nov 26, 2019 at 1:31 PM Manikumar  wrote:

> Hi All,
>
> Please download, test and vote the RC1 in order to provide quality
> assurance for the forthcoming 2.4 release.
>
> Thanks.
>
> On Tue, Nov 26, 2019 at 8:11 PM Adam Bellemare 
> wrote:
>
> > Hello,
> >
> > Ran Eric's test script:
> > $ git clone https://github.com/elalonde/kafka
> > $ ./kafka/bin/verify-kafka-rc.sh 2.4.0
> > https://home.apache.org/~manikumar/kafka-2.4.0-rc1
> > 
> >
> > - All PGP signatures are good
> > - All md5, sha1sums and sha512sums pass
> > - Had a few intermittent failures in tests that passed upon rerunning.
> >
> > +1 (non-binding) from me.
> >
> > Adam
> >
> > On Wed, Nov 20, 2019 at 10:37 AM Manikumar 
> > wrote:
> >
> > > Hello Kafka users, developers and client-developers,
> > >
> > > This is the second candidate for release of Apache Kafka 2.4.0.
> > >
> > > This release includes many new features, including:
> > > - Allow consumers to fetch from closest replica
> > > - Support for incremental cooperative rebalancing to the consumer
> > rebalance
> > > protocol
> > > - MirrorMaker 2.0 (MM2), a new multi-cluster, cross-datacenter
> > replication
> > > engine
> > > - New Java authorizer Interface
> > > - Support for  non-key joining in KTable
> > > - Administrative API for replica reassignment
> > > - Sticky partitioner
> > > - Return topic metadata and configs in CreateTopics response
> > > - Securing Internal connect REST endpoints
> > > - API to delete consumer offsets and expose it via the AdminClient.
> > >
> > > Release notes for the 2.4.0 release:
> > > https://home.apache.org/~manikumar/kafka-2.4.0-rc1/RELEASE_NOTES.html
> > >
> > > ** Please download, test and vote by Tuesday, November 26, 9am PT **
> > >
> > > Kafka's KEYS file containing PGP keys we use to sign the release:
> > > https://kafka.apache.org/KEYS
> > >
> > > * Release artifacts to be voted upon (source and binary):
> > > https://home.apache.org/~manikumar/kafka-2.4.0-rc1/
> > >
> > > * Maven artifacts to be voted upon:
> > > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> > >
> > > * Javadoc:
> > > https://home.apache.org/~manikumar/kafka-2.4.0-rc1/javadoc/
> > >
> > > * Tag to be voted upon (off 2.4 branch) is the 2.4.0 tag:
> > > https://github.com/apache/kafka/releases/tag/2.4.0-rc1
> > >
> > > * Documentation:
> > > https://kafka.apache.org/24/documentation.html
> > >
> > > * Protocol:
> > > https://kafka.apache.org/24/protocol.html
> > >
> > > Thanks,
> > > Manikumar
> > >
> >
>


Re: Preliminary blog post about the Apache Kafka 2.4.0 release

2019-11-18 Thread Sean Glover
Here's a summary that can go under the section "What’s new in Kafka broker,
producer, and consumer" as an "improvement".  Feel free to rephrase as you
see fit.

> When a partition is paused by the user in the consumer, the partition is
> considered "unfetchable".  If the consumer has already fetched data for a
> partition and the partition is then paused, all data from "unfetchable"
> partitions will be discarded in the next consumer poll.  In use cases
> where pausing and resuming partitions is common during regular operation of
> the consumer this can result in discarding pre-fetched data when it's not
> necessary.  Once the partition is resumed then new fetch requests will be
> generated and sent to the broker to get the same partition data again.
> Depending on the frequency of pausing and resuming of partitions this can
> impact a number of different aspects of consumer polling including:
> broker/consumer throughput, number of consumer fetch requests, and
> NIO-related GC concerns for regularly dereferenced byte buffers of
> partition data.  This issue is now resolved by retaining completed fetch
> data for partitions that are paused so that it may be returned in a future
> consumer poll once the partition is resumed by the user.
>


> See [KAFKA-7548](https://issues.apache.org/jira/browse/KAFKA-7548) for more
> details.
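
For context, this is the consumer pattern that benefits.  A minimal sketch
using only the public pause/resume/poll API (topic and timing details are
made up):

import java.time.Duration;
import java.util.Collections;
import java.util.Set;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public final class PauseResumeExample {
  static void throttlePartition(KafkaConsumer<String, String> consumer, TopicPartition tp) {
    Set<TopicPartition> one = Collections.singleton(tp);
    // tp becomes "unfetchable"; before this fix, its pre-fetched data could be dropped
    consumer.pause(one);
    // records may still include data for other, unpaused partitions
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    // ... process records from the partitions that are still fetchable ...
    // with KAFKA-7548, retained completed fetches for tp can be returned after resuming
    consumer.resume(one);
  }
}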


Regards,
Sean

On Mon, Nov 18, 2019 at 11:45 AM Ismael Juma  wrote:

> That makes sense to me.
>
> Ismael
>
> On Mon, Nov 18, 2019 at 8:40 AM Sean Glover 
> wrote:
>
> > Hi Manikumar,
> >
> > I'm putting together an akka.io blog post regarding [KAFKA-7548] -
> > KafkaConsumer should not throw away already fetched data for paused
> > partitions.  Since it doesn't change any user-facing APIs it has no KIP,
> > but it has a significant impact on consumer use cases that frequently
> pause
> > and resume partitions, such as in Alpakka Kafka.  I can provide a small
> > summary for you to include in your blog post if you think it's
> appropriate.
> >
> > Regards,
> > Sean
> >
> > On Mon, Nov 18, 2019 at 11:25 AM Manikumar 
> > wrote:
> >
> > > Thanks Chris. will update the blog content.
> > >
> > > On Fri, Nov 15, 2019 at 12:34 AM Chris Egerton 
> > > wrote:
> > >
> > > > Hi Manikumar,
> > > >
> > > > It looks like the header for KIP-440 is accurate ("KIP-440: Extend
> > > Connect
> > > > Converter to support headers") but the content appears to correspond
> to
> > > > KIP-481 ("SerDe Improvements for Connect Decimal type in JSON")
> > instead.
> > > > Could we double-check and make sure that the summary for KIP-440
> > matches
> > > > what was contributed for it (and it nothing was, alter the summary to
> > > more
> > > > closely reflect what KIP-440 accomplished)?
> > > >
> > > > Cheers,
> > > >
> > > > Chris
> > > >
> > > > On Thu, Nov 14, 2019 at 10:41 AM Manikumar <
> manikumar.re...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I've prepared a preliminary blog post about the upcoming Apache
> Kafka
> > > > 2.4.0
> > > > > release.
> > > > > Please take a look and let me know if you want to add/modify
> details.
> > > > > Thanks to all who contributed to this blog post.
> > > > >
> > > > >
> > > >
> > >
> >
> https://blogs.apache.org/preview/kafka/?previewEntry=what-s-new-in-apache1
> > > > >
> > > > > Thanks,
> > > > > Manikumar
> > > > >
> > > >
> > >
> >


Re: Preliminary blog post about the Apache Kafka 2.4.0 release

2019-11-18 Thread Sean Glover
Hi Manikumar,

I'm putting together an akka.io blog post regarding [KAFKA-7548] -
KafkaConsumer should not throw away already fetched data for paused
partitions.  Since it doesn't change any user-facing APIs it has no KIP,
but it has a significant impact on consumer use cases that frequently pause
and resume partitions, such as in Alpakka Kafka.  I can provide a small
summary for you to include in your blog post if you think it's appropriate.

Regards,
Sean

On Mon, Nov 18, 2019 at 11:25 AM Manikumar 
wrote:

> Thanks Chris. will update the blog content.
>
> On Fri, Nov 15, 2019 at 12:34 AM Chris Egerton 
> wrote:
>
> > Hi Manikumar,
> >
> > It looks like the header for KIP-440 is accurate ("KIP-440: Extend
> Connect
> > Converter to support headers") but the content appears to correspond to
> > KIP-481 ("SerDe Improvements for Connect Decimal type in JSON") instead.
> > Could we double-check and make sure that the summary for KIP-440 matches
> > what was contributed for it (and it nothing was, alter the summary to
> more
> > closely reflect what KIP-440 accomplished)?
> >
> > Cheers,
> >
> > Chris
> >
> > On Thu, Nov 14, 2019 at 10:41 AM Manikumar 
> > wrote:
> >
> > > Hi all,
> > >
> > > I've prepared a preliminary blog post about the upcoming Apache Kafka
> > 2.4.0
> > > release.
> > > Please take a look and let me know if you want to add/modify details.
> > > Thanks to all who contributed to this blog post.
> > >
> > >
> >
> https://blogs.apache.org/preview/kafka/?previewEntry=what-s-new-in-apache1
> > >
> > > Thanks,
> > > Manikumar
> > >
> >
>


Re: [VOTE] KIP-531: Drop support for Scala 2.11 in Kafka 2.5

2019-11-18 Thread Sean Glover
+1 (non-binding)

Good idea.  It will streamline the Streams Scala DSL nicely.  It will also
affect 2.11 users of embedded-kafka.  Are there any other non-broker
dependencies that could be affected?

Sean

On Mon, Nov 18, 2019 at 10:43 AM Ismael Juma  wrote:

> Yes, everyone is encouraged to vote. Committer votes are binding, but we
> are interested in what the wider community thinks too.
>
> Ismael
>
> On Mon, Nov 18, 2019 at 5:40 AM Ivan Yurchenko 
> wrote:
>
> > Do I understand correctly, that non-commiters can also vote, despite
> their
> > votes don't decide?
> >
> > If so, then +1 from me.
> >
> > Ivan
> >
> >
> > On Mon, 18 Nov 2019 at 15:19, Ismael Juma  wrote:
> >
> > > Hi all,
> > >
> > > People seemed supportive in general, so I'd like to start a vote on
> > > KIP-531:
> > >
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-531%3A+Drop+support+for+Scala+2.11+in+Kafka+2.5
> > >
> > > Ismael
> > >
> >
>


-- 
Sean Glover
Principal Engineer, Alpakka, Lightbend, Inc. <https://lightbend.com>
@seg1o <https://twitter.com/seg1o>, in/seanaglover
<https://www.linkedin.com/in/seanaglover/>


Re: [ANNOUNCE] New committer: Mickael Maison

2019-11-08 Thread Sean Glover
Congratulations Mickael :)

On Fri, Nov 8, 2019 at 12:28 PM Ankit Kumar  wrote:

> Congratulations Mickael!!
>
> *Best regards,*
> *Ankit Kumar.*
>
>
> On Fri, Nov 8, 2019 at 9:08 PM Viktor Somogyi-Vass <
> viktorsomo...@gmail.com>
> wrote:
>
> > Congrats Mickael!! :)
> >
> > On Fri, Nov 8, 2019 at 1:24 PM Satish Duggana 
> > wrote:
> >
> > > Congratulations Mickael!!
> > >
> > > On Fri, Nov 8, 2019 at 2:50 PM Rajini Sivaram  >
> > > wrote:
> > > >
> > > > Congratulations, Mickael, well deserved!!
> > > >
> > > > Regards,
> > > >
> > > > Rajini
> > > >
> > > > On Fri, Nov 8, 2019 at 9:08 AM David Jacot 
> > wrote:
> > > >
> > > > > Congrats Mickeal, well deserved!
> > > > >
> > > > > On Fri, Nov 8, 2019 at 8:56 AM Tom Bentley 
> > > wrote:
> > > > >
> > > > > > Congratulations Mickael!
> > > > > >
> > > > > > On Fri, Nov 8, 2019 at 6:41 AM Vahid Hashemian <
> > > > > vahid.hashem...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Congrats Mickael,
> > > > > > >
> > > > > > > Well deserved!
> > > > > > >
> > > > > > > --Vahid
> > > > > > >
> > > > > > > On Thu, Nov 7, 2019 at 9:10 PM Maulin Vasavada <
> > > > > > maulin.vasav...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Congratulations Mickael!
> > > > > > > >
> > > > > > > > On Thu, Nov 7, 2019 at 8:27 PM Manikumar <
> > > manikumar.re...@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Congrats Mickeal!
> > > > > > > > >
> > > > > > > > > On Fri, Nov 8, 2019 at 9:05 AM Dong Lin <
> lindon...@gmail.com
> > >
> > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Congratulations Mickael!
> > > > > > > > > >
> > > > > > > > > > On Thu, Nov 7, 2019 at 1:38 PM Jun Rao  >
> > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi, Everyone,
> > > > > > > > > > >
> > > > > > > > > > > The PMC of Apache Kafka is pleased to announce a new
> > Kafka
> > > > > > > committer
> > > > > > > > > > > Mickael
> > > > > > > > > > > Maison.
> > > > > > > > > > >
> > > > > > > > > > > Mickael has been contributing to Kafka since 2016. He
> > > > > > > > > > > proposed and implemented multiple KIPs. He has also been
> > > > > > > > > > > promoting Kafka through blogs and public talks.
> > > > > > > > > > >
> > > > > > > > > > > Congratulations, Mickael!
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > >
> > > > > > > > > > > Jun (on behalf of the Apache Kafka PMC)
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > >
> > > > > > > Thanks!
> > > > > > > --Vahid
> > > > > > >
> > > > > >
> > > > >
> > >
> >
>


-- 
Sean Glover
Principal Engineer, Alpakka, Lightbend, Inc. <https://lightbend.com>
@seg1o <https://twitter.com/seg1o>, in/seanaglover
<https://www.linkedin.com/in/seanaglover/>


Daily build artifacts

2019-09-12 Thread Sean Glover
Hi,

I'm curious if there's a daily build of trunk that can be accessed
publicly?  Specifically, I'm interested in referencing client libraries
from a public maven or ivy repository, if one exists.  I scanned the Apache
Kafka builds and the Release Process wiki, but I didn't see any mention of
a daily build.

Regards,
Sean


[jira] [Created] (KAFKA-8822) Remove CompletedFetch type from Fetcher

2019-08-20 Thread Sean Glover (Jira)
Sean Glover created KAFKA-8822:
--

 Summary: Remove CompletedFetch type from Fetcher
 Key: KAFKA-8822
 URL: https://issues.apache.org/jira/browse/KAFKA-8822
 Project: Kafka
  Issue Type: Improvement
  Components: clients, consumer
Reporter: Sean Glover
Assignee: Sean Glover


In KAFKA-7548 we re-factored the {{Fetcher}} to create an instance of 
{{PartitionRecords}} immediately in the fetch response handler of 
{{Fetcher.sendFetches}}.  The instance variable {{Fetcher.completedFetches}} 
had its type changed to {{ConcurrentLinkedQueue<PartitionRecords>}}, and 
therefore there is no longer a need to keep completed fetch partition data in 
a superfluous type ({{CompletedFetch}}).



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (KAFKA-8814) Consumer benchmark test for paused partitions

2019-08-18 Thread Sean Glover (JIRA)
Sean Glover created KAFKA-8814:
--

 Summary: Consumer benchmark test for paused partitions
 Key: KAFKA-8814
 URL: https://issues.apache.org/jira/browse/KAFKA-8814
 Project: Kafka
  Issue Type: New Feature
  Components: consumer, system tests, tools
Reporter: Sean Glover
Assignee: Sean Glover


A new performance benchmark and a corresponding {{ConsumerPerformance}} tools 
addition to support the paused-partition performance improvement implemented in 
KAFKA-7548.  Before the fix, when the user polled for completed fetched 
records for partitions that were paused, the consumer would throw away the data 
because it was no longer fetchable.  If the partition was resumed then the data 
would have to be fetched over again.  The fix caches completed fetched 
records for paused partitions indefinitely so they can potentially be 
returned once the partition is resumed.

In the Jira issue KAFKA-7548 there are several informal test results shown, 
based on a number of different paused-partition scenarios, but it was suggested 
that a test in the benchmarks test suite would be ideal to demonstrate the 
performance improvement.  In order to implement this benchmark we must add a 
new feature to {{ConsumerPerformance}}, which is used by the benchmark 
test suite and the {{kafka-consumer-perf-test.sh}} bin script, to pause 
partitions.  I added the following parameter:

{code:scala}
val pausedPartitionsOpt = parser.accepts("paused-partitions-percent",
    "The percentage [0-1] of subscribed partitions to pause each poll.")
  .withOptionalArg()
  .describedAs("percent")
  .withValuesConvertedBy(regex("^0(\\.\\d+)?|1\\.0$")) // matches [0-1] with decimals
  .ofType(classOf[Float])
  .defaultsTo(0F)
{code}

This allows the user to specify a percentage (represented as a floating-point 
value from {{0..1}}) of partitions to pause each poll interval.  When the value 
is greater than {{0}} we will take the next _n_ partitions to pause.  I 
ran the test on `trunk` and rebased onto the `2.3.0` tag for the following test 
summaries of 
{{kafkatest.benchmarks.core.benchmark_test.Benchmark.test_consumer_throughput}}.
  The test will rotate through pausing {{80%}} of assigned partitions (5/6) 
each poll interval.  I ran this on my laptop.

{{trunk}} ({{aa4ba8eee8e6f52a9d80a98fb2530b5bcc1b9a11}})

{code}

SESSION REPORT (ALL TESTS)
ducktape version: 0.7.5
session_id:   2019-08-18--010
run time: 2 minutes 29.104 seconds
tests run:1
passed:   1
failed:   0
ignored:  0

test_id:
kafkatest.benchmarks.core.benchmark_test.Benchmark.test_consumer_throughput.paused_partitions_percent=0.8
status: PASS
run time:   2 minutes 29.048 seconds
{"records_per_sec": 450207.0953, "mb_per_sec": 42.9351}

{code}

{{2.3.0}}

{code}

SESSION REPORT (ALL TESTS)
ducktape version: 0.7.5
session_id:   2019-08-18--011
run time: 2 minutes 41.228 seconds
tests run:1
passed:   1
failed:   0
ignored:  0

test_id:
kafkatest.benchmarks.core.benchmark_test.Benchmark.test_consumer_throughput.paused_partitions_percent=0.8
status: PASS
run time:   2 minutes 41.168 seconds
{"records_per_sec": 246574.6024, "mb_per_sec": 23.5152}

{code}

The increase in record and data throughput is significant.  Based on other 
consumer fetch metrics there are also improvements to fetch rate.  Depending on 
how often partitions are paused and resumed it's possible to save a lot of data 
transfer between the consumer and broker as well.

Please see the pull request for the associated changes.  I was unsure if I 
needed to create a KIP because while technically I added a new public api to 
the {{ConsumerPerformance}} tool, it was only to enable this benchmark to run.  
If you feel that a KIP is necessary I'll create one.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


Re: [DISCUSS] KIP-489 Kafka Consumer Record Latency Metric

2019-07-15 Thread Sean Glover
To hopefully spark some discussion I've copied the motivation section from
the KIP:

Consumer lag is a useful metric to monitor how many records are queued to
be processed.  We can look at individual lag per partition or we may
aggregate metrics. For example, we may want to monitor the maximum lag of any
particular partition in our consumer subscription so we can identify hot
partitions caused by an insufficient producer partitioning strategy.
We may want to monitor a sum of lag across all partitions so we have a
sense as to our total backlog of messages to consume. Lag in offsets is
useful when you have a good understanding of your messages and processing
characteristics, but it doesn’t tell us how far behind *in time* we are.
This is known as wait time in queueing theory, or more informally it’s
referred to as latency.

The latency of a message can be defined as the difference between when that
message was first produced to when the message is received by a consumer.
The latency of records in a partition correlates with lag, but a larger lag
doesn’t necessarily mean a larger latency. For example, a topic consumed by
two separate application consumer groups A and B may have similar lag, but
different latency per partition.  Application A is a consumer which
performs CPU intensive business logic on each message it receives. It’s
distributed across many consumer group members to handle the load quickly
enough, but since its processing time is slower, it takes longer to process
each message per partition.  Meanwhile, Application B is a consumer which
performs a simple ETL operation to land streaming data in another system,
such as HDFS. It may have similar lag to Application A, but because it has
a faster processing time its latency per partition is significantly less.

If the Kafka Consumer reported a latency metric it would be easier to build
Service Level Agreements (SLAs) based on non-functional requirements of the
streaming system.  For example, the system must never have a latency of
greater than 10 minutes. This SLA could be used in monitoring alerts or as
input to automatic scaling solutions.

On Thu, Jul 11, 2019 at 12:36 PM Sean Glover 
wrote:

> Hi kafka-dev,
>
> I've created KIP-489 as a proposal for adding latency metrics to the Kafka
> Consumer in a similar way as record-lag metrics are implemented.
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/489%3A+Kafka+Consumer+Record+Latency+Metric
>
> Regards,
> Sean
>
> --
> Principal Engineer, Lightbend, Inc.
>
> <http://lightbend.com>
>
> @seg1o <https://twitter.com/seg1o>, in/seanaglover
> <https://www.linkedin.com/in/seanaglover/>
>


-- 
Principal Engineer, Lightbend, Inc.

<http://lightbend.com>

@seg1o <https://twitter.com/seg1o>, in/seanaglover
<https://www.linkedin.com/in/seanaglover/>


[DISCUSS] KIP-489 Kafka Consumer Record Latency Metric

2019-07-11 Thread Sean Glover
Hi kafka-dev,

I've created KIP-489 as a proposal for adding latency metrics to the Kafka
Consumer in a similar way as record-lag metrics are implemented.

https://cwiki.apache.org/confluence/display/KAFKA/489%3A+Kafka+Consumer+Record+Latency+Metric

Regards,
Sean

-- 
Principal Engineer, Lightbend, Inc.



@seg1o , in/seanaglover



[jira] [Created] (KAFKA-8656) Kafka Consumer Record Latency Metric

2019-07-11 Thread Sean Glover (JIRA)
Sean Glover created KAFKA-8656:
--

 Summary: Kafka Consumer Record Latency Metric
 Key: KAFKA-8656
 URL: https://issues.apache.org/jira/browse/KAFKA-8656
 Project: Kafka
  Issue Type: New Feature
  Components: metrics
Reporter: Sean Glover
Assignee: Sean Glover


Consumer lag is a useful metric to monitor how many records are queued to be 
processed.  We can look at individual lag per partition or we may aggregate 
metrics. For example, we may want to monitor the maximum lag of any 
particular partition in our consumer subscription so we can identify hot 
partitions caused by an insufficient producer partitioning strategy.  We may 
want to monitor a sum of lag across all partitions so we have a sense as to our 
total backlog of messages to consume. Lag in offsets is useful when you have a 
good understanding of your messages and processing characteristics, but it 
doesn’t tell us how far behind _in time_ we are.  This is known as wait time in 
queueing theory, or more informally it’s referred to as latency.

The latency of a message can be defined as the difference between when that 
message was first produced to when the message is received by a consumer.  The 
latency of records in a partition correlates with lag, but a larger lag doesn’t 
necessarily mean a larger latency. For example, a topic consumed by two 
separate application consumer groups A and B may have similar lag, but 
different latency per partition.  Application A is a consumer which performs 
CPU intensive business logic on each message it receives. It’s distributed 
across many consumer group members to handle the load quickly enough, but since 
its processing time is slower, it takes longer to process each message per 
partition.  Meanwhile, Application B is a consumer which performs a simple ETL 
operation to land streaming data in another system, such as HDFS. It may have 
similar lag to Application A, but because it has a faster processing time, its 
latency per partition is significantly lower.

If the Kafka Consumer reported a latency metric it would be easier to build 
Service Level Agreements (SLAs) based on non-functional requirements of the 
streaming system.  For example, the system must never have a latency greater 
than 10 minutes. This SLA could be used in monitoring alerts or as input to 
automatic scaling solutions.

[KIP-489|https://cwiki.apache.org/confluence/display/KAFKA/489%3A+Kafka+Consumer+Record+Latency+Metric]

 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


Request access to create a KIP

2019-07-10 Thread Sean Glover
Hi,

I'd like permission to create a new KIP page in the wiki.  Confluence
username: sean.glover

Regards,
Sean

-- 
Principal Engineer, Lightbend, Inc.



@seg1o , in/seanaglover



Re: [DISCUSS] KIP-385: Provide configuration allowing consumer to no throw away prefetched data

2019-06-27 Thread Sean Glover
Hi everyone,

I want to revive a solution to this issue.  I created a new PR that
accommodates Jason Gustafson's suggestion in the original PR to re-add
paused completed fetches back to the completed-fetches queue for less
bookkeeping.  If someone could jump in and do a review it would be
appreciated!

I also updated KAFKA-7548 with more information.

Patch: https://github.com/apache/kafka/pull/6988 (original:
https://github.com/apache/kafka/pull/5844)
Jira: https://issues.apache.org/jira/browse/KAFKA-7548
Sample project:
https://github.com/seglo/kafka-consumer-tests/tree/seglo/KAFKA-7548
Grafana snapshot:
https://snapshot.raintank.io/dashboard/snapshot/RDFTsgNvzP5bTmuc8X6hq7vLixp9tUtL?orgId=2
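
For context, a minimal sketch (Scala, standard Java consumer API) of the
pause/resume pattern this patch optimizes; before the change, pausing a
partition threw away any data already prefetched for it, so every resume
paid the fetch round-trip again. The busy predicate and record handler
are illustrative:

import java.time.Duration
import scala.jdk.CollectionConverters._
import org.apache.kafka.clients.consumer.{ConsumerRecord, KafkaConsumer}
import org.apache.kafka.common.TopicPartition

def pollWithBackpressure(consumer: KafkaConsumer[String, String],
                         busy: TopicPartition => Boolean,
                         process: ConsumerRecord[String, String] => Unit): Unit = {
  // Split the current assignment into saturated and ready partitions.
  val (toPause, toResume) = consumer.assignment().asScala.partition(busy)
  consumer.pause(toPause.asJava)    // stop fetching for saturated partitions
  consumer.resume(toResume.asJava)  // re-enable the ones that caught up
  consumer.poll(Duration.ofMillis(500)).asScala.foreach(process)
}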

Regards,
Sean

On Wed, Oct 31, 2018 at 8:41 PM Zahari Dichev 
wrote:

> Just looked at it,
>
> Great work. Thanks a lot for the patch. This should certainly improve
> > things!
>
> Zahari
>
> On Wed, Oct 31, 2018 at 6:25 PM  wrote:
>
> > > Hi there, I will take a look first thing when I get home.
> >
> > Zahari
> >
> > > On 31 Oct 2018, at 18:23, Mayuresh Gharat 
> > wrote:
> > >
> > > Hi Colin, Zahari,
> > >
> > > Wanted to check if you can review the patch and let me know, if we need
> > to
> > > make any changes?
> > >
> > > Thanks,
> > >
> > > Mayuresh
> > >
> > > On Fri, Oct 26, 2018 at 1:41 PM Zahari Dichev 
> > > wrote:
> > >
> > >> Thanks for participating in the discussion. Indeed, I learned quite a
> lot.
> > >> Will take a look at the patch as well and spend some time hunting for
> > some
> > >> other interesting issue to work on :)
> > >>
> > >> Cheers,
> > >> Zahari
> > >>
> > >>> On Fri, Oct 26, 2018 at 8:49 PM Colin McCabe 
> > wrote:
> > >>>
> > >>> Hi Zahari,
> > >>>
> > >>> I think we can retire the KIP, since the KAFKA-7548 patch should
> solve
> > >> the
> > >>> issue without any changes that require a KIP.  This is actually the
> > best
> > >>> thing we could do for our users, since things will "just work" more
> > >>> efficiently without a lot of configuration knobs.
> > >>>
> > >>> I think you did an excellent job raising this issue and discussing
> it.
> > >>> It's a very good contribution to the project even if you don't end up
> > >>> writing the patch yourself.  I'm going to take a look at the patch
> > today.
> > >>> If you want to take a look, that would also be good.
> > >>>
> > >>> best,
> > >>> Colin
> > >>>
> > >>>
> >  On Thu, Oct 25, 2018, at 12:25, Zahari Dichev wrote:
> >  Hi there Mayuresh,
> > 
> >  Great to hear that this is actually working well in production for
> > some
> >  time now. I have changed the details of the KIP to reflect the fact
> > >> that
> > >>> as
> >  already discussed - we do not really need any kind of configuration
> as
> > >>> this
> >  data should not be thrown away at all.  Submitting a PR sounds
> great,
> >  although I feel a bit jealous you (LinkedIn) beat me to my first
> kafka
> >  commit  ;)  Not sure how things stand with the voting process?
> > 
> >  Zahari
> > 
> > 
> > 
> >  On Thu, Oct 25, 2018 at 7:39 PM Mayuresh Gharat <
> > >>> gharatmayures...@gmail.com>
> >  wrote:
> > 
> > > Hi Colin/Zahari,
> > >
> > > I have created a ticket for the similar/same feature :
> > > https://issues.apache.org/jira/browse/KAFKA-7548
> > > We (Linkedin) had a use case in Samza at Linkedin when they moved
> > >> from
> > >>> the
> > > SimpleConsumer to KafkaConsumer and they wanted to do this pause
> and
> > >>> resume
> > > pattern.
> > > They realized there was performance degradation when they started
> > >> using
> > > KafkaConsumer.assign() and pausing and unPausing partitions. We
> > >>> realized
> > > that not throwing away the prefetched data for paused partitions
> > >> might
> > > improve the performance. We wrote a benchmark (I can share it if
> > >>> needed) to
> > > prove this. I have attached the findings in the ticket.
> > > We have been running the hotfix internally for quite a while now.
> > >> When
> > > samza ran this fix in production, they realized a 30% improvement in
> > >>> their
> > > app performance.
> > > I have the patch ready on our internal branch and would like to
> > >> submit
> > >>> a PR
> > > for this on the above ticket asap.
> > > I am not sure, if we need a separate config for this as we haven't
> > >>> seen a
> > > lot of memory overhead due to this in our systems. We have had this
> > >>> running
> > > in production for a considerable amount of time without any issues.
> > > It would be great if you guys can review the PR once its up and see
> > >> if
> > >>> that
> > > satisfies your requirement. If it doesn't then we can think more on
> > >> the
> > > config driven approach.
> > > Thoughts??
> > >
> > > Thanks,
> > >
> > > Mayuresh
> > >
> > >
> > > On Thu, Oct 25, 2018 at 8:21 AM Colin McCabe 
> > >>> wrote:
> > >
> > >> Hi 

Re: [ANNOUNCE] Apache Kafka 2.0.0 Released

2018-07-30 Thread Sean Glover
pper API for our Kafka Streams DSL,
> > > which provides better type inference and better type safety during
> > compile
> > > time. Scala users can have less boilerplate in their code, notably
> > > regarding
> > > Serdes with new implicit Serdes.
> > >
> > > ** Message headers are now supported in the Kafka Streams Processor
> API,
> > > allowing users to add and manipulate headers read from the source
> topics
> > > and propagate them to the sink topics.
> > >
> > > ** Windowed aggregations performance in Kafka Streams has been largely
> > > improved (sometimes by an order of magnitude) thanks to the new
> > > single-key-fetch API.
> > >
> > > ** We have further improved unit testability of Kafka Streams with the
> > > kafka-streams-testutil artifact.
> > >
> > >
> > >
> > >
> > >
> > > All of the changes in this release can be found in the release notes:
> > >
> > > https://www.apache.org/dist/kafka/2.0.0/RELEASE_NOTES.html
> > >
> > >
> > >
> > >
> > >
> > > You can download the source and binary release (Scala 2.11 and Scala
> > 2.12)
> > > from:
> > >
> > > https://kafka.apache.org/downloads#2.0.0
> > > <https://kafka.apache.org/downloads#2.0.0>
> > >
> > >
> > >
> > >
> > >
> >
> ---
> > >
> > >
> > >
> > >
> > >
> > > Apache Kafka is a distributed streaming platform with four core APIs:
> > >
> > >
> > >
> > > ** The Producer API allows an application to publish a stream of records
> to
> > >
> > > one or more Kafka topics.
> > >
> > >
> > >
> > > ** The Consumer API allows an application to subscribe to one or more
> > >
> > > topics and process the stream of records produced to them.
> > >
> > >
> > >
> > > ** The Streams API allows an application to act as a stream processor,
> > >
> > > consuming an input stream from one or more topics and producing an
> > >
> > > output stream to one or more output topics, effectively transforming
> the
> > >
> > > input streams to output streams.
> > >
> > >
> > >
> > > ** The Connector API allows building and running reusable producers or
> > >
> > > consumers that connect Kafka topics to existing applications or data
> > >
> > > systems. For example, a connector to a relational database might
> > >
> > > capture every change to a table.
> > >
> > >
> > >
> > >
> > >
> > > With these APIs, Kafka can be used for two broad classes of
> application:
> > >
> > >
> > >
> > > ** Building real-time streaming data pipelines that reliably get data
> > >
> > > between systems or applications.
> > >
> > >
> > >
> > > ** Building real-time streaming applications that transform or react
> > >
> > > to the streams of data.
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > Apache Kafka is in use at large and small companies worldwide,
> including
> > >
> > > Capital One, Goldman Sachs, ING, LinkedIn, Netflix, Pinterest,
> Rabobank,
> > >
> > > Target, The New York Times, Uber, Yelp, and Zalando, among others.
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > A big thank you for the following 131 contributors to this release!
> > >
> > >
> > >
> > > Adem Efe Gencer, Alex D, Alex Dunayevsky, Allen Wang, Andras Beni,
> > >
> > > Andy Bryant, Andy Coates, Anna Povzner, Arjun Satish, asutosh936,
> > >
> > > Attila Sasvari, bartdevylder, Benedict Jin, Bill Bejeck, Blake Miller,
> > >
> > > Boyang Chen, cburroughs, Chia-Ping Tsai, Chris Egerton, Colin P.
> Mccabe,
> > >
> > > Colin Patrick McCabe, ConcurrencyPractitioner, Damian Guy, dan norwood,
> > >
> > > Daniel Shuy, Daniel Wojda, Dark, David Glasser, Debasish Ghosh,
> Detharon,
> > >
> > > Dhruvil Shah, Dmitry Minkovsky, Dong Lin, Edoardo Comar, emmanuel
> Harel,
> > >
> > > Eugene Sevastyanov, Ewen Cheslack-Postava, Fedor Bobin,
> >

Re: [DISCUSS] KIP-270 A Scala wrapper library for Kafka Streams

2018-03-19 Thread Sean Glover
Type names: I vote for option 2.  The user must explicitly add a dependency
to this library and the wrapper types are in a different package.  It seems
reasonable to expect them to do an import rename if there's a need to drop
down to the Java API.
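
(As a side note, a sketch of what that rename looks like under option 2,
assuming the wrapper lands under an org.apache.kafka.streams.scala
package as the KIP proposes:

import org.apache.kafka.streams.kstream.{KStream => JKStream} // Java API
import org.apache.kafka.streams.scala.kstream.KStream         // Scala wrapper

so the unified name only costs one rename in the rare file that uses both
APIs.)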

Test Utils: The test utils in kafka-streams-scala are nice and lean, but
I'm not sure they provide much more value than other options that exist
in the community.  There's an embedded Kafka/ZK implementation for
ScalaTest that's popular and active: manub/scalatest-embedded-kafka.  It
implies you must also use ScalaTest, which I acknowledge isn't everyone's
first choice of Scala test framework, but it is probably one of the most
popular, if not the most popular, libraries.  It includes a DSL for Kafka
Streams too.  If this KIP is accepted then perhaps a PR to that project
could be made to support the new wrapper implementations.

https://github.com/manub/scalatest-embedded-kafka#scalatest-embedded-kafka-streams
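
For anyone unfamiliar with it, a rough sketch of what a test looks like
with that library (trait and helper names as I recall them; the ports
are arbitrary choices):

import net.manub.embeddedkafka.{EmbeddedKafka, EmbeddedKafkaConfig}
import org.scalatest.WordSpec

class RoundTripSpec extends WordSpec with EmbeddedKafka {
  // Embedded broker/ZK ports for this test run (arbitrary).
  implicit val config: EmbeddedKafkaConfig =
    EmbeddedKafkaConfig(kafkaPort = 6001, zooKeeperPort = 6000)

  "an embedded Kafka" should {
    "round-trip a message" in {
      withRunningKafka {
        publishStringMessageToKafka("test-topic", "hello")
        assert(consumeFirstStringMessageFrom("test-topic") == "hello")
      }
    }
  }
}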

Sean

On Sun, Mar 18, 2018 at 4:05 AM, Debasish Ghosh <
debasish.gh...@lightbend.com> wrote:

> >
> > > Should this be 1.2  (maybe it's even better to not put any version at
> > all)
>
>
> Actually wanted to emphasize that the support is from 1.0.0 onwards ..
> Should we make that explicit ? Like ..
>
> kafka-streams-scala only depends on the Scala standard library and Kafka
> > Streams 1.0.0+.
>
>
>  In 1.1 release, we add a new module `kafka-streams-test-utils` to simplify
> > testing for Kafka Streams applications. Are those test utils suitable for
> > Scala users or should we add Scala wrappers for those, too?
>
>
> I will check up and let you know ..
>
> Also I am not clear about the decision on renaming of Scala abstractions.
> Can we have a consensus on this ? Here's the summary ..
>
> *Option 1:* Keep names separate (KStream for Java class, KStreamS for
> Scala). No renaming of imports required.
> *Option 2:* Unify names (KStream for Java and Scala class names). No
> conflict since they will reside in different packages. But if we need to
> use both abstractions, renaming of imports are required. But again, this
> may not be a too frequent use case.
>
> Suggestions ?
>
> regards.
>
> On Sat, Mar 17, 2018 at 3:07 AM, Matthias J. Sax 
> wrote:
>
> > Thanks a lot for the KIP! Two questions:
> >
> > 1) the KIP states:
> >
> > > kafka-streams-scala only depends on the Scala standard library and
> Kafka
> > Streams 1.0.0.
> >
> > Should this be 1.2  (maybe it's even better to not put any version at
> all)
> >
> >
> > 2) In 1.1 release, we add a new module `kafka-streams-test-utils` to
> > simplify testing for Kafka Streams applications. Are those test utils
> > suitable for Scala users or should we add Scala wrappers for those, too?
> >
> >
> > -Matthias
> >
> >
> > On 3/16/18 11:10 AM, Ted Yu wrote:
> > > Import renames seem to be fine.
> > >
> > > The class names with trailing 'S' look clean.
> > >
> > > Cheers
> > >
> > > On Fri, Mar 16, 2018 at 11:04 AM, Ismael Juma 
> wrote:
> > >
> > >> If this is rare (as it sounds), relying on import renames seems fine
> to
> > me.
> > >> Let's see what others think.
> > >>
> > >> Ismael
> > >>
> > >> On Fri, Mar 16, 2018 at 10:51 AM, Debasish Ghosh <
> > >> debasish.gh...@lightbend.com> wrote:
> > >>
> > >>> I am not sure if this is practical or not. But theoretically a user
> may
> > >>> want to extract the unsafe Java abstraction from the Scala ones and
> use
> > >>> Java APIs on them .. e.g.
> > >>>
> > >>> val userClicksStream: KStreamS[String, Long] =
> > >>> builder.stream(userClicksTopic) // Scala abstraction
> > >>>
> > >>> val jStream: KStream[String, Long] = userClicksStream.inner //
> > publishes
> > >>> the underlying Java abstraction
> > >>>
> > >>> //.. work with Java, may be pass to some function written in Java
> > >>>
> > >>> I do realize this is somewhat of a convoluted use case and may not be
> > >>> practically useful ..
> > >>>
> > >>> Otherwise we can very well work on the suggested approach of unifying
> > the
> > >>> names ..
> > >>>
> > >>> regards.
> > >>>
> > >>>
> > >>>
> > >>> On Fri, Mar 16, 2018 at 10:28 PM, Ismael Juma 
> > wrote:
> > >>>
> >  What does "mixed mode application" mean? What are the cases where a
> > >> user
> >  would want to use both APIs? I think that would help understand the
> >  reasoning.
> > 
> >  Thanks,
> >  Ismael
> > 
> >  On Fri, Mar 16, 2018 at 8:48 AM, Debasish Ghosh <
> >  debasish.gh...@lightbend.com> wrote:
> > 
> > > Hi Damian -
> > >
> > > We could. But in case the user wants to use both Scala and Java
> APIs
> > >>> (may
> > > be for some mixed mode application), won't that be confusing ? She
> > >> will
> > > have to do something like ..
> > >
> > > import o.a.k.s.scala.{KStream => KStreamS}
> > >
> > > to rename Scala imports or the other way round for imported Java
> > >>> classes.
> > >
> > > regards.
> > 

[jira] [Commented] (KAFKA-903) [0.8.0 - windows] FATAL - [highwatermark-checkpoint-thread1] (Logging.scala:109) - Attempt to swap the new high watermark file with the old one failed

2013-11-26 Thread Sean Glover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13832845#comment-13832845
 ] 

Sean Glover commented on KAFKA-903:
---

I still see this exception on Windows 7 Pro SP1 with kafka 0.8.0 beta 1.

[2013-11-26 13:20:29,374] FATAL Attempt to swap the new high watermark file 
with the old one failed (kafka.server.HighwaterMarkCheckpoint)
[2013-11-26 13:20:29,391] INFO [Kafka Server 0], shutting down 
(kafka.server.KafkaServer)
[2013-11-26 13:20:29,393] INFO [Socket Server on Broker 0], shutting down 
(kafka.network.SocketServer)
[2013-11-26 13:20:29,399] INFO [Socket Server on Broker 0], shut down 
completely (kafka.network.SocketServer)
[2013-11-26 13:20:29,401] INFO [Kafka Request Handler on Broker 0], shutting 
down (kafka.server.KafkaRequestHandlerPool)
[2013-11-26 13:20:29,403] INFO [Kafka Request Handler on Broker 0], shutted 
down completely (kafka.server.KafkaRequestHandlerPool)
[2013-11-26 13:20:29,405] INFO Shutdown Kafka scheduler 
(kafka.utils.KafkaScheduler)
[2013-11-26 13:20:29,648] INFO Closing zookeeper client... 
(kafka.server.KafkaZooKeeper)
[2013-11-26 13:20:29,660] INFO [Replica Manager on Broker 0]: Shut down 
(kafka.server.ReplicaManager)
[2013-11-26 13:20:29,671] INFO [ReplicaFetcherManager on broker 0] shutting 
down (kafka.server.ReplicaFetcherManager)
[2013-11-26 13:20:29,674] INFO [ReplicaFetcherManager on broker 0] shutdown 
completed (kafka.server.ReplicaFetcherManager)

 [0.8.0 - windows]  FATAL - [highwatermark-checkpoint-thread1] 
 (Logging.scala:109) - Attempt to swap the new high watermark file with the 
 old one failed
 ---

 Key: KAFKA-903
 URL: https://issues.apache.org/jira/browse/KAFKA-903
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8
 Environment: Windows 7 with SP 1; jdk 7_0_17; scala-library-2.8.2, 
 probably copied on 4/30. kafka-0.8, built current on 4/30.
 -rwx--+ 1 reefedjib None   41123 Mar 19  2009 commons-cli-1.2.jar
 -rwx--+ 1 reefedjib None   58160 Jan 11 13:45 commons-codec-1.4.jar
 -rwx--+ 1 reefedjib None  575389 Apr 18 13:41 
 commons-collections-3.2.1.jar
 -rwx--+ 1 reefedjib None  143847 May 21  2009 commons-compress-1.0.jar
 -rwx--+ 1 reefedjib None   52543 Jan 11 13:45 commons-exec-1.1.jar
 -rwx--+ 1 reefedjib None   57779 Jan 11 13:45 commons-fileupload-1.2.1.jar
 -rwx--+ 1 reefedjib None  109043 Jan 20  2008 commons-io-1.4.jar
 -rwx--+ 1 reefedjib None  279193 Jan 11 13:45 commons-lang-2.5.jar
 -rwx--+ 1 reefedjib None   60686 Jan 11 13:45 commons-logging-1.1.1.jar
 -rwx--+ 1 reefedjib None 1891110 Apr 18 13:41 guava-13.0.1.jar
 -rwx--+ 1 reefedjib None  206866 Apr  7 21:24 jackson-core-2.1.4.jar
 -rwx--+ 1 reefedjib None  232245 Apr  7 21:24 jackson-core-asl-1.9.12.jar
 -rwx--+ 1 reefedjib None   69314 Apr  7 21:24 
 jackson-dataformat-smile-2.1.4.jar
 -rwx--+ 1 reefedjib None  780385 Apr  7 21:24 
 jackson-mapper-asl-1.9.12.jar
 -rwx--+ 1 reefedjib None   47913 May  9 23:39 jopt-simple-3.0-rc2.jar
 -rwx--+ 1 reefedjib None 2365575 Apr 30 13:06 
 kafka_2.8.0-0.8.0-SNAPSHOT.jar
 -rwx--+ 1 reefedjib None  481535 Jan 11 13:46 log4j-1.2.16.jar
 -rwx--+ 1 reefedjib None   20647 Apr 18 13:41 log4j-over-slf4j-1.6.6.jar
 -rwx--+ 1 reefedjib None  251784 Apr 18 13:41 logback-classic-1.0.6.jar
 -rwx--+ 1 reefedjib None  349706 Apr 18 13:41 logback-core-1.0.6.jar
 -rwx--+ 1 reefedjib None   82123 Nov 26 13:11 metrics-core-2.2.0.jar
 -rwx--+ 1 reefedjib None 1540457 Jul 12  2012 ojdbc14.jar
 -rwx--+ 1 reefedjib None 6418368 Apr 30 08:23 scala-library-2.8.2.jar
 -rwx--+ 1 reefedjib None 3114958 Apr  2 07:47 scalatest_2.10-1.9.1.jar
 -rwx--+ 1 reefedjib None   25962 Apr 18 13:41 slf4j-api-1.6.5.jar
 -rwx--+ 1 reefedjib None   62269 Nov 29 03:26 zkclient-0.2.jar
 -rwx--+ 1 reefedjib None  601677 Apr 18 13:41 zookeeper-3.3.3.jar
Reporter: Rob Withers
Assignee: Jun Rao
Priority: Blocker
 Fix For: 0.8

 Attachments: kafka-903.patch, kafka-903_v2.patch, kafka-903_v3.patch, 
 kafka_2.8.0-0.8.0-SNAPSHOT.jar


 This FATAL shuts down both brokers on windows, 
 {2013-05-10 18:23:57,636} DEBUG [local-vat] (Logging.scala:51) - Sending 1 
 messages with no compression to [robert_v_2x0,0]
 {2013-05-10 18:23:57,637} DEBUG [local-vat] (Logging.scala:51) - Producer 
 sending messages with correlation id 178 for topics [robert_v_2x0,0] to 
 broker 1 on 192.168.1.100:9093
 {2013-05-10 18:23:57,689} FATAL [highwatermark-checkpoint-thread1] 
 (Logging.scala:109) - Attempt to swap the new high watermark file with the 
 old one failed
 {2013-05-10 18:23

[jira] [Comment Edited] (KAFKA-903) [0.8.0 - windows] FATAL - [highwatermark-checkpoint-thread1] (Logging.scala:109) - Attempt to swap the new high watermark file with the old one failed

2013-11-26 Thread Sean Glover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13832845#comment-13832845
 ] 

Sean Glover edited comment on KAFKA-903 at 11/26/13 6:53 PM:
-

I still see this exception on Windows 7 Pro SP1 with kafka 0.8.0 beta 1.

[2013-11-26 13:20:29,374] FATAL Attempt to swap the new high watermark file 
with the old one failed (kafka.server.HighwaterMarkCheckpoint)
[2013-11-26 13:20:29,391] INFO [Kafka Server 0], shutting down 
(kafka.server.KafkaServer)
[2013-11-26 13:20:29,393] INFO [Socket Server on Broker 0], shutting down 
(kafka.network.SocketServer)
[2013-11-26 13:20:29,399] INFO [Socket Server on Broker 0], shut down 
completely (kafka.network.SocketServer)
[2013-11-26 13:20:29,401] INFO [Kafka Request Handler on Broker 0], shutting 
down (kafka.server.KafkaRequestHandlerPool)
[2013-11-26 13:20:29,403] INFO [Kafka Request Handler on Broker 0], shutted 
down completely (kafka.server.KafkaRequestHandlerPool)
[2013-11-26 13:20:29,405] INFO Shutdown Kafka scheduler 
(kafka.utils.KafkaScheduler)
[2013-11-26 13:20:29,648] INFO Closing zookeeper client... 
(kafka.server.KafkaZooKeeper)
[2013-11-26 13:20:29,660] INFO [Replica Manager on Broker 0]: Shut down 
(kafka.server.ReplicaManager)
[2013-11-26 13:20:29,671] INFO [ReplicaFetcherManager on broker 0] shutting 
down (kafka.server.ReplicaFetcherManager)
[2013-11-26 13:20:29,674] INFO [ReplicaFetcherManager on broker 0] shutdown 
completed (kafka.server.ReplicaFetcherManager)

EDIT:

I think it may have been due to the fact that I set log.dir in my kafka 
server.properties file to a relative directory: ..\\data\\kafka-logs.  I did 
this just to keep everything related to kafka in one place.  We only use kafka 
on windows on development machines.

When I updated log.dir to an absolute path I have not seen the Exception.
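
For reference, the fix amounted to swapping the relative path in
server.properties for an absolute one (the absolute path below is
illustrative):

# before: relative to the bin directory, triggered the failed swap on Windows
#log.dir=..\\data\\kafka-logs
# after: absolute path, no more FATAL
log.dir=C:\\kafka\\data\\kafka-logs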


was (Author: seglo):
I still see this exception on Windows 7 Pro SP1 with kafka 0.8.0 beta 1.

[2013-11-26 13:20:29,374] FATAL Attempt to swap the new high watermark file 
with the old one failed (kafka.server.HighwaterMarkCheckpoint)
[2013-11-26 13:20:29,391] INFO [Kafka Server 0], shutting down 
(kafka.server.KafkaServer)
[2013-11-26 13:20:29,393] INFO [Socket Server on Broker 0], shutting down 
(kafka.network.SocketServer)
[2013-11-26 13:20:29,399] INFO [Socket Server on Broker 0], shut down 
completely (kafka.network.SocketServer)
[2013-11-26 13:20:29,401] INFO [Kafka Request Handler on Broker 0], shutting 
down (kafka.server.KafkaRequestHandlerPool)
[2013-11-26 13:20:29,403] INFO [Kafka Request Handler on Broker 0], shutted 
down completely (kafka.server.KafkaRequestHandlerPool)
[2013-11-26 13:20:29,405] INFO Shutdown Kafka scheduler 
(kafka.utils.KafkaScheduler)
[2013-11-26 13:20:29,648] INFO Closing zookeeper client... 
(kafka.server.KafkaZooKeeper)
[2013-11-26 13:20:29,660] INFO [Replica Manager on Broker 0]: Shut down 
(kafka.server.ReplicaManager)
[2013-11-26 13:20:29,671] INFO [ReplicaFetcherManager on broker 0] shutting 
down (kafka.server.ReplicaFetcherManager)
[2013-11-26 13:20:29,674] INFO [ReplicaFetcherManager on broker 0] shutdown 
completed (kafka.server.ReplicaFetcherManager)

 [0.8.0 - windows]  FATAL - [highwatermark-checkpoint-thread1] 
 (Logging.scala:109) - Attempt to swap the new high watermark file with the 
 old one failed
 ---

 Key: KAFKA-903
 URL: https://issues.apache.org/jira/browse/KAFKA-903
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8
 Environment: Windows 7 with SP 1; jdk 7_0_17; scala-library-2.8.2, 
 probably copied on 4/30. kafka-0.8, built current on 4/30.
 -rwx--+ 1 reefedjib None   41123 Mar 19  2009 commons-cli-1.2.jar
 -rwx--+ 1 reefedjib None   58160 Jan 11 13:45 commons-codec-1.4.jar
 -rwx--+ 1 reefedjib None  575389 Apr 18 13:41 
 commons-collections-3.2.1.jar
 -rwx--+ 1 reefedjib None  143847 May 21  2009 commons-compress-1.0.jar
 -rwx--+ 1 reefedjib None   52543 Jan 11 13:45 commons-exec-1.1.jar
 -rwx--+ 1 reefedjib None   57779 Jan 11 13:45 commons-fileupload-1.2.1.jar
 -rwx--+ 1 reefedjib None  109043 Jan 20  2008 commons-io-1.4.jar
 -rwx--+ 1 reefedjib None  279193 Jan 11 13:45 commons-lang-2.5.jar
 -rwx--+ 1 reefedjib None   60686 Jan 11 13:45 commons-logging-1.1.1.jar
 -rwx--+ 1 reefedjib None 1891110 Apr 18 13:41 guava-13.0.1.jar
 -rwx--+ 1 reefedjib None  206866 Apr  7 21:24 jackson-core-2.1.4.jar
 -rwx--+ 1 reefedjib None  232245 Apr  7 21:24 jackson-core-asl-1.9.12.jar
 -rwx--+ 1 reefedjib None   69314 Apr  7 21:24 
 jackson-dataformat-smile-2.1.4.jar
 -rwx--+ 1 reefedjib None  780385 Apr  7 21:24 
 jackson-mapper-asl-1.9.12.jar
 -rwx

[jira] [Comment Edited] (KAFKA-903) [0.8.0 - windows] FATAL - [highwatermark-checkpoint-thread1] (Logging.scala:109) - Attempt to swap the new high watermark file with the old one failed

2013-11-26 Thread Sean Glover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13832845#comment-13832845
 ] 

Sean Glover edited comment on KAFKA-903 at 11/26/13 7:15 PM:
-

I still see this exception on Windows 7 Pro SP1 with kafka 0.8.0 beta 1.

[2013-11-26 13:20:29,374] FATAL Attempt to swap the new high watermark file 
with the old one failed (kafka.server.HighwaterMarkCheckpoint)
[2013-11-26 13:20:29,391] INFO [Kafka Server 0], shutting down 
(kafka.server.KafkaServer)
[2013-11-26 13:20:29,393] INFO [Socket Server on Broker 0], shutting down 
(kafka.network.SocketServer)
[2013-11-26 13:20:29,399] INFO [Socket Server on Broker 0], shut down 
completely (kafka.network.SocketServer)
[2013-11-26 13:20:29,401] INFO [Kafka Request Handler on Broker 0], shutting 
down (kafka.server.KafkaRequestHandlerPool)
[2013-11-26 13:20:29,403] INFO [Kafka Request Handler on Broker 0], shutted 
down completely (kafka.server.KafkaRequestHandlerPool)
[2013-11-26 13:20:29,405] INFO Shutdown Kafka scheduler 
(kafka.utils.KafkaScheduler)
[2013-11-26 13:20:29,648] INFO Closing zookeeper client... 
(kafka.server.KafkaZooKeeper)
[2013-11-26 13:20:29,660] INFO [Replica Manager on Broker 0]: Shut down 
(kafka.server.ReplicaManager)
[2013-11-26 13:20:29,671] INFO [ReplicaFetcherManager on broker 0] shutting 
down (kafka.server.ReplicaFetcherManager)
[2013-11-26 13:20:29,674] INFO [ReplicaFetcherManager on broker 0] shutdown 
completed (kafka.server.ReplicaFetcherManager)

EDIT:

I think it may have been due to the fact that I set log.dir in my kafka 
server.properties file to a relative directory: ..\\data\\kafka-logs.  I did 
this just to keep everything related to kafka in one place.  We only use kafka 
on windows on development machines.

When I updated log.dir to an absolute path I have not seen the Exception.

EDIT 2:

Upon further investigation this may only have to do with my usage of parent 
directory ..  

I seem to be able to use relative directories (relative to the bin directory in 
the release package).


was (Author: seglo):
I still see this exception on Windows 7 Pro SP1 with kafka 0.8.0 beta 1.

[2013-11-26 13:20:29,374] FATAL Attempt to swap the new high watermark file 
with the old one failed (kafka.server.HighwaterMarkCheckpoint)
[2013-11-26 13:20:29,391] INFO [Kafka Server 0], shutting down 
(kafka.server.KafkaServer)
[2013-11-26 13:20:29,393] INFO [Socket Server on Broker 0], shutting down 
(kafka.network.SocketServer)
[2013-11-26 13:20:29,399] INFO [Socket Server on Broker 0], shut down 
completely (kafka.network.SocketServer)
[2013-11-26 13:20:29,401] INFO [Kafka Request Handler on Broker 0], shutting 
down (kafka.server.KafkaRequestHandlerPool)
[2013-11-26 13:20:29,403] INFO [Kafka Request Handler on Broker 0], shutted 
down completely (kafka.server.KafkaRequestHandlerPool)
[2013-11-26 13:20:29,405] INFO Shutdown Kafka scheduler 
(kafka.utils.KafkaScheduler)
[2013-11-26 13:20:29,648] INFO Closing zookeeper client... 
(kafka.server.KafkaZooKeeper)
[2013-11-26 13:20:29,660] INFO [Replica Manager on Broker 0]: Shut down 
(kafka.server.ReplicaManager)
[2013-11-26 13:20:29,671] INFO [ReplicaFetcherManager on broker 0] shutting 
down (kafka.server.ReplicaFetcherManager)
[2013-11-26 13:20:29,674] INFO [ReplicaFetcherManager on broker 0] shutdown 
completed (kafka.server.ReplicaFetcherManager)

EDIT:

I think it may have been due to the fact that I set log.dir in my kafka 
server.properties file to a relative directory: ..\\data\\kafka-logs.  I did 
this just to keep everything related to kafka in one place.  We only use kafka 
on windows on development machines.

When I updated log.dir to an absolute path I have not seen the Exception.

 [0.8.0 - windows]  FATAL - [highwatermark-checkpoint-thread1] 
 (Logging.scala:109) - Attempt to swap the new high watermark file with the 
 old one failed
 ---

 Key: KAFKA-903
 URL: https://issues.apache.org/jira/browse/KAFKA-903
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8
 Environment: Windows 7 with SP 1; jdk 7_0_17; scala-library-2.8.2, 
 probably copied on 4/30. kafka-0.8, built current on 4/30.
 -rwx--+ 1 reefedjib None   41123 Mar 19  2009 commons-cli-1.2.jar
 -rwx--+ 1 reefedjib None   58160 Jan 11 13:45 commons-codec-1.4.jar
 -rwx--+ 1 reefedjib None  575389 Apr 18 13:41 
 commons-collections-3.2.1.jar
 -rwx--+ 1 reefedjib None  143847 May 21  2009 commons-compress-1.0.jar
 -rwx--+ 1 reefedjib None   52543 Jan 11 13:45 commons-exec-1.1.jar
 -rwx--+ 1 reefedjib None   57779 Jan 11 13:45 commons-fileupload-1.2.1.jar
 -rwx--+ 1 reefedjib None  109043 Jan 20  2008 commons-io-1.4.jar

[jira] [Comment Edited] (KAFKA-903) [0.8.0 - windows] FATAL - [highwatermark-checkpoint-thread1] (Logging.scala:109) - Attempt to swap the new high watermark file with the old one failed

2013-11-26 Thread Sean Glover (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13832845#comment-13832845
 ] 

Sean Glover edited comment on KAFKA-903 at 11/26/13 7:58 PM:
-

I still see this exception on Windows 7 Pro SP1 with kafka 0.8.0 beta 1.

[2013-11-26 13:20:29,374] FATAL Attempt to swap the new high watermark file 
with the old one failed (kafka.server.HighwaterMarkCheckpoint)
[2013-11-26 13:20:29,391] INFO [Kafka Server 0], shutting down 
(kafka.server.KafkaServer)
[2013-11-26 13:20:29,393] INFO [Socket Server on Broker 0], shutting down 
(kafka.network.SocketServer)
[2013-11-26 13:20:29,399] INFO [Socket Server on Broker 0], shut down 
completely (kafka.network.SocketServer)
[2013-11-26 13:20:29,401] INFO [Kafka Request Handler on Broker 0], shutting 
down (kafka.server.KafkaRequestHandlerPool)
[2013-11-26 13:20:29,403] INFO [Kafka Request Handler on Broker 0], shutted 
down completely (kafka.server.KafkaRequestHandlerPool)
[2013-11-26 13:20:29,405] INFO Shutdown Kafka scheduler 
(kafka.utils.KafkaScheduler)
[2013-11-26 13:20:29,648] INFO Closing zookeeper client... 
(kafka.server.KafkaZooKeeper)
[2013-11-26 13:20:29,660] INFO [Replica Manager on Broker 0]: Shut down 
(kafka.server.ReplicaManager)
[2013-11-26 13:20:29,671] INFO [ReplicaFetcherManager on broker 0] shutting 
down (kafka.server.ReplicaFetcherManager)
[2013-11-26 13:20:29,674] INFO [ReplicaFetcherManager on broker 0] shutdown 
completed (kafka.server.ReplicaFetcherManager)

EDIT:

I think it may have been due to the fact that I set log.dir in my kafka 
server.properties file to a relative directory: ..\\data\\kafka-logs.  I did 
this just to keep everything related to kafka in one place.  We only use kafka 
on windows on development machines.

When I updated log.dir to an absolute path I have not seen the Exception.

EDIT 3: Removed EDIT 2, irrelevant.


was (Author: seglo):
I still see this exception on Windows 7 Pro SP1 with kafka 0.8.0 beta 1.

[2013-11-26 13:20:29,374] FATAL Attempt to swap the new high watermark file 
with the old one failed (kafka.server.HighwaterMarkCheckpoint)
[2013-11-26 13:20:29,391] INFO [Kafka Server 0], shutting down 
(kafka.server.KafkaServer)
[2013-11-26 13:20:29,393] INFO [Socket Server on Broker 0], shutting down 
(kafka.network.SocketServer)
[2013-11-26 13:20:29,399] INFO [Socket Server on Broker 0], shut down 
completely (kafka.network.SocketServer)
[2013-11-26 13:20:29,401] INFO [Kafka Request Handler on Broker 0], shutting 
down (kafka.server.KafkaRequestHandlerPool)
[2013-11-26 13:20:29,403] INFO [Kafka Request Handler on Broker 0], shutted 
down completely (kafka.server.KafkaRequestHandlerPool)
[2013-11-26 13:20:29,405] INFO Shutdown Kafka scheduler 
(kafka.utils.KafkaScheduler)
[2013-11-26 13:20:29,648] INFO Closing zookeeper client... 
(kafka.server.KafkaZooKeeper)
[2013-11-26 13:20:29,660] INFO [Replica Manager on Broker 0]: Shut down 
(kafka.server.ReplicaManager)
[2013-11-26 13:20:29,671] INFO [ReplicaFetcherManager on broker 0] shutting 
down (kafka.server.ReplicaFetcherManager)
[2013-11-26 13:20:29,674] INFO [ReplicaFetcherManager on broker 0] shutdown 
completed (kafka.server.ReplicaFetcherManager)

EDIT:

I think it may have been due to the fact that I set log.dir in my kafka 
server.properties file to a relative directory: ..\\data\\kafka-logs.  I did 
this just to keep everything related to kafka in one place.  We only use kafka 
on windows on development machines.

When I updated log.dir to an absolute path I have not seen the Exception.

EDIT 2:

Upon further investigation this may only have to do with my usage of parent 
directory ..  

I seem to be able to use relative directories (relative to the bin directory in 
the release package).

 [0.8.0 - windows]  FATAL - [highwatermark-checkpoint-thread1] 
 (Logging.scala:109) - Attempt to swap the new high watermark file with the 
 old one failed
 ---

 Key: KAFKA-903
 URL: https://issues.apache.org/jira/browse/KAFKA-903
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8
 Environment: Windows 7 with SP 1; jdk 7_0_17; scala-library-2.8.2, 
 probably copied on 4/30. kafka-0.8, built current on 4/30.
 -rwx--+ 1 reefedjib None   41123 Mar 19  2009 commons-cli-1.2.jar
 -rwx--+ 1 reefedjib None   58160 Jan 11 13:45 commons-codec-1.4.jar
 -rwx--+ 1 reefedjib None  575389 Apr 18 13:41 
 commons-collections-3.2.1.jar
 -rwx--+ 1 reefedjib None  143847 May 21  2009 commons-compress-1.0.jar
 -rwx--+ 1 reefedjib None   52543 Jan 11 13:45 commons-exec-1.1.jar
 -rwx--+ 1 reefedjib None   57779 Jan 11 13:45 commons-fileupload-1.2.1.jar
 -rwx--+ 1 reefedjib None