[jira] [Created] (KAFKA-8678) LeaveGroup request versioning on throttle time is incorrect

2019-07-17 Thread Boyang Chen (JIRA)
Boyang Chen created KAFKA-8678:
--

 Summary: LeaveGroup request versioning on throttle time is 
incorrect
 Key: KAFKA-8678
 URL: https://issues.apache.org/jira/browse/KAFKA-8678
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Affects Versions: 2.3.0, 2.2.0
Reporter: Boyang Chen
Assignee: Boyang Chen


[https://github.com/apache/kafka/pull/6188] accidentally changed the version of 
setting throttle time from v1 to v2. We should fix this change.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


Re: [DISCUSS] KIP-479: Add Materialized to Join

2019-07-17 Thread Guozhang Wang
Hi Bill, thanks for your explanations. I'm on board with your decision too.


Guozhang

On Wed, Jul 17, 2019 at 10:20 AM Bill Bejeck  wrote:

> Thanks for the response, John.
>
> > If I can offer my thoughts, it seems better to just document on the
> > Stream join javadoc for the Materialized parameter that it will not
> > make the join result queriable. I'm not opposed to the queriable flag
> > in general, but introducing it is a much larger consideration that has
> > previously derailed this KIP discussion. In the interest of just
> > closing the gap and keeping the API change small, it seems better to
> > just go with documentation for now.
>
> I agree with your statement here.  IMHO the most important goal of this KIP
> is to not breaking existing users and gain some consistency of the API.
>
> I'll update the KIP accordingly.
>
> -Bill
>
> On Tue, Jul 16, 2019 at 11:55 AM John Roesler  wrote:
>
> > Hi Bill,
> >
> > Thanks for driving this KIP toward a conclusion. I'm on board with
> > your decision.
> >
> > You didn't mention whether you're still proposing to add the
> > "queriable" flag to the Materialized config object, or just document
> > that a Stream join is never queriable. Both options have come up
> > earlier in the discussion, so it would be good to pin this down.
> >
> > If I can offer my thoughts, it seems better to just document on the
> > Stream join javadoc for the Materialized parameter that it will not
> > make the join result queriable. I'm not opposed to the queriable flag
> > in general, but introducing it is a much larger consideration that has
> > previously derailed this KIP discussion. In the interest of just
> > closing the gap and keeping the API change small, it seems better to
> > just go with documentation for now.
> >
> > Thanks again,
> > -John
> >
> > On Thu, Jul 11, 2019 at 2:45 PM Bill Bejeck  wrote:
> > >
> > > Thanks all for the great discussion so far.
> > >
> > > Everyone has made excellent points, and I appreciate the detail
> everyone
> > > has put into their arguments.
> > >
> > > However, after carefully evaluating all the points made so far,
> creating
> > an
> > > overload with Materialized is still my #1 option.
> > > My reasoning for saying so is two-fold:
> > >
> > >1. It's a small change, and IMHO since it's consistent with our
> > current
> > >API concerning state store usage, the cognitive load on users will
> be
> > >minimal.
> > >2. It achieves the most important goal of this KIP, namely to close
> > the
> > >gap of naming state stores independently of the join operator name.
> > >
> > > Additionally, I agree with the points made by Matthias earlier (I
> realize
> > > there is some overlap here).
> > >
> > > >  - the main purpose of this KIP is to close the naming gap what we
> > achieve
> > > >  - we can allow people to use the new in-memory store
> > > >  - we allow people to enable/disable caching
> > > >  - we unify the API
> > > >  - we decouple querying from naming
> > > >  - it's a small API change
> > >
> > > Although it's not a perfect solution,  IMHO the positives of using
> > > Materialize far outweigh the negatives, and from what we've discussed
> so
> > > far, anything we implement seems to involve an additional change down
> the
> > > road.
> > >
> > > If others are still strongly opposed to using Materialized, my other
> > > preferences would be
> > >
> > >1. Add a "withStoreName" to Joined.  Although I agree with Guozhang
> > that
> > >having a parameter that only applies to one use-case would be
> clumsy.
> > >2. Add a String overload for naming the store, but this would be my
> > >least favorite option as IMHO this seems to be a step backward from
> > why we
> > >introduced configuration objects in the first place.
> > >
> > > Thanks,
> > > Bill
> > >
> > > On Thu, Jun 27, 2019 at 4:45 PM Matthias J. Sax  >
> > > wrote:
> > >
> > > > Thanks for the KIP Bill!
> > > >
> > > > Great discussion to far.
> > > >
> > > > About John's idea about querying upstream stores and don't
> materialize
> > a
> > > > store: I agree with Bill that this seems to be an orthogonal
> question,
> > > > and it might be better to treat it as an independent optimization and
> > > > exclude from this KIP.
> > > >
> > > > > What should be the behavior if there is no store
> > > > > configured (e.g., if Materialized with only serdes) and querying is
> > > > > enabled?
> > > >
> > > > IMHO, this could be an error case. If one wants to query a store,
> they
> > > > need to provide a name -- if you don't know the name, how would you
> > > > actually query the store (even if it would be possible to get the
> name
> > > > from the `TopologyDescription`, it seems clumsy).
> > > >
> > > > If we don't want to throw an error, materializing seems to be the
> right
> > > > option, to exclude "query optimization" from this KIP. I would be ok
> > > > with this option, even if it's clumsy to get the name from
> > > > 

Re: [DISCUSS] KIP-221: Repartition Topic Hints in Streams

2019-07-17 Thread John Roesler
Yes, I believe that's what I had in mind. Again, not totally sure it
makes sense, but I believe something similar is the rationale for
having the partitioner option in Produced.

Thanks,
-John

On Wed, Jul 17, 2019 at 3:20 PM Levani Kokhreidze
 wrote:
>
> Hey John,
>
> Oh that’s interesting use-case.
> Do I understand this correctly, in your example I would first issue 
> repartition(Repartitioned) with proper partitioner that essentially would be 
> the same as the topic I want to join with and then do the KStream#join with 
> DSL?
>
> Regards,
> Levani
>
> > On Jul 17, 2019, at 11:11 PM, John Roesler  wrote:
> >
> > Hey, all, just to chime in,
> >
> > I think it might be useful to have an option to specify the
> > partitioner. The case I have in mind is that some data may get
> > repartitioned and then joined with an input topic. If the right-side
> > input topic uses a custom partitioning strategy, then the
> > repartitioned stream also needs to be partitioned with the same
> > strategy.
> >
> > Does that make sense, or did I maybe miss something important?
> >
> > Thanks,
> > -John
> >
> > On Wed, Jul 17, 2019 at 2:48 PM Levani Kokhreidze
> >  wrote:
> >>
> >> Yes, I was thinking about it as well. To be honest I’m not sure about it 
> >> yet.
> >> As Kafka Streams DSL user, I don’t really think I would need control over 
> >> partitioner for internal topics.
> >> As a user, I would assume that Kafka Streams knows best how to partition 
> >> data for internal topics.
> >> In this KIP I wrote that Produced should be used only for topics that are 
> >> created by user In advance.
> >> In those cases maybe it make sense to have possibility to specify the 
> >> partitioner.
> >> I don’t have clear answer on that yet, but I guess specifying the 
> >> partitioner can be added as well if there’s agreement on this.
> >>
> >> Regards,
> >> Levani
> >>
> >>> On Jul 17, 2019, at 10:42 PM, Sophie Blee-Goldman  
> >>> wrote:
> >>>
> >>> Thanks for clearing that up. I agree that Repartitioned would be a useful
> >>> addition. I'm wondering if it might also need to have
> >>> a withStreamPartitioner method/field, similar to Produced? I'm not sure 
> >>> how
> >>> widely this feature is really used, but seems it should be available for
> >>> repartition topics.
> >>>
> >>> On Wed, Jul 17, 2019 at 11:26 AM Levani Kokhreidze 
> >>> 
> >>> wrote:
> >>>
>  Hey Sophie,
> 
>  In both cases KStream#repartition and KStream#repartition(Repartitioned)
>  topic will be created and managed by Kafka Streams.
>  Idea of Repartitioned is to give user more control over the topic such as
>  num of partitions.
>  I feel like Repartitioned parameter is something that is missing in
>  current DSL design.
>  Essentially giving user control over parallelism by configuring num of
>  partitions for internal topics.
> 
>  Hope this answers your question.
> 
>  Regards,
>  Levani
> 
> > On Jul 17, 2019, at 9:02 PM, Sophie Blee-Goldman 
>  wrote:
> >
> > Hey Levani,
> >
> > Thanks for the KIP! Can you clarify one thing for me -- for the
> > KStream#repartition signature taking a Repartitioned, will the topic be
> > auto-created by Streams (which seems to be the case for the signature
> > without a Repartitioned) or does it have to be pre-created? The wording
>  in
> > the KIP makes it seem like one version of the method will auto-create
> > topics while the other will not.
> >
> > Cheers,
> > Sophie
> >
> > On Wed, Jul 17, 2019 at 10:15 AM Levani Kokhreidze <
>  levani.co...@gmail.com>
> > wrote:
> >
> >> Hello,
> >>
> >> One more bump about KIP-221 (
> >>
>  https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> >> <
> >>
>  https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> > )
> >> so it doesn’t get lost in mailing list :)
> >> Would love to hear communities opinions/concerns about this KIP.
> >>
> >> Regards,
> >> Levani
> >>
> >>
> >>> On Jul 12, 2019, at 5:27 PM, Levani Kokhreidze  >
> >> wrote:
> >>>
> >>> Hello,
> >>>
> >>> Kind reminder about this KIP:
> >>
>  https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> >> <
> >>
>  https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> >>>
> >>>
> >>> Regards,
> >>> Levani
> >>>
>  On Jul 9, 2019, at 11:38 AM, Levani Kokhreidze <
>  levani.co...@gmail.com
> >> > wrote:
> 
>  Hello,
> 
>  In order to move this KIP forward, I’ve updated 

Re: [VOTE] KIP-480 : Sticky Partitioner

2019-07-17 Thread Justine Olshan
Hello all,

I wanted to let you all know the KIP has been updated. The
ComputedPartition class has been removed in favor of simply returning an
integer to represent the record's partition.
In short, the implications of this change mean that keyed records will also
trigger a change in the sticky partition. This was done for a case in which
there may be keyed and non-keyed records.
Upon testing, this did not significantly change the latency for records
with keyed values.

Thank you,
Justine

On Sun, Jul 14, 2019 at 3:07 AM M. Manna  wrote:

> +1(na)
>
> On Sat, 13 Jul 2019 at 22:17, Stanislav Kozlovski 
> wrote:
>
> > +1 (non-binding)
> >
> > Thanks!
> >
> > On Fri, Jul 12, 2019 at 6:02 PM Gwen Shapira  wrote:
> >
> > > +1 (binding)
> > >
> > > Thank you for the KIP. This was long awaited.
> > >
> > > On Tue, Jul 9, 2019 at 5:15 PM Justine Olshan 
> > > wrote:
> > > >
> > > > Hello all,
> > > >
> > > > I'd like to start the vote for KIP-480 : Sticky Partitioner.
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
> > > >
> > > > Thank you,
> > > > Justine Olshan
> > >
> > >
> > >
> > > --
> > > Gwen Shapira
> > > Product Manager | Confluent
> > > 650.450.2760 | @gwenshap
> > > Follow us: Twitter | blog
> > >
> >
> >
> > --
> > Best,
> > Stanislav
> >
>


Re: [VOTE] KIP-455: Create an Administrative API for Replica Reassignment

2019-07-17 Thread Colin McCabe
Thanks, Stanislav.  Let's restart the vote to reflect the fact that we've made 
significant changes.  The new vote will go for 3 days as usual.

I'll start with my +1 (binding).

best,
Colin


On Wed, Jul 17, 2019, at 08:56, Stanislav Kozlovski wrote:
> Hey everybody,
> 
> We have further iterated on the KIP in the accompanying discussion thread
> and I'd like to propose we resume the vote.
> 
> Some notable changes:
> - we will store reassignment information in the `/brokers/topics/[topic]`
> - we will internally use two collections to represent a reassignment -
> "addingReplicas" and "removingReplicas". LeaderAndIsr has been updated
> accordingly
> - the Alter API will still use the "targetReplicas" collection, but the
> List API will now return three separate collections - the full replica set,
> the replicas we are adding as part of this reassignment ("addingReplicas")
> and the replicas we are removing ("removingReplicas")
> - cancellation of a reassignment now means a proper rollback of the
> assignment to its original state prior to the API call
> 
> As always, you can re-read the KIP here
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment
> 
> Best,
> Stanislav
> 
> On Wed, May 22, 2019 at 6:12 PM Colin McCabe  wrote:
> 
> > Hi George,
> >
> > Thanks for taking a look.  I am working on getting a PR done as a
> > proof-of-concept.  I'll post it soon.  Then we'll finish up the vote.
> >
> > best,
> > Colin
> >
> > On Tue, May 21, 2019, at 17:33, George Li wrote:
> > >  Hi Colin,
> > >
> > >  Great! Looking forward to these features.+1 (non-binding)
> > >
> > > What is the estimated timeline to have this implemented?  If any help
> > > is needed in the implementation of cancelling reassignments,  I can
> > > help if there is spare cycle.
> > >
> > >
> > > Thanks,
> > > George
> > >
> > >
> > >
> > > On Thursday, May 16, 2019, 9:48:56 AM PDT, Colin McCabe
> > >  wrote:
> > >
> > >  Hi George,
> > >
> > > Yes, KIP-455 allows the reassignment of individual partitions to be
> > > cancelled.  I think it's very important for these operations to be at
> > > the partition level.
> > >
> > > best,
> > > Colin
> > >
> > > On Tue, May 14, 2019, at 16:34, George Li wrote:
> > > >  Hi Colin,
> > > >
> > > > Thanks for the updated KIP.  It has very good improvements of Kafka
> > > > reassignment operations.
> > > >
> > > > One question, looks like the KIP includes the Cancellation of
> > > > individual pending reassignments as well when the
> > > > AlterPartitionReasisgnmentRequest has empty replicas for the
> > > > topic/partition. Will you also be implementing the the partition
> > > > cancellation/rollback in the PR ?If yes,  it will make KIP-236 (it
> > > > has PR already) trivial, since the cancel all pending reassignments,
> > > > one just needs to do a ListPartitionRessignmentRequest, then submit
> > > > empty replicas for all those topic/partitions in
> > > > one AlterPartitionReasisgnmentRequest.
> > > >
> > > >
> > > > Thanks,
> > > > George
> > > >
> > > >On Friday, May 10, 2019, 8:44:31 PM PDT, Colin McCabe
> > > >  wrote:
> > > >
> > > >  On Fri, May 10, 2019, at 17:34, Colin McCabe wrote:
> > > > > On Fri, May 10, 2019, at 16:43, Jason Gustafson wrote:
> > > > > > Hi Colin,
> > > > > >
> > > > > > I think storing reassignment state at the partition level is the
> > right move
> > > > > > and I also agree that replicas should understand that there is a
> > > > > > reassignment in progress. This makes KIP-352 a trivial follow-up
> > for
> > > > > > example. The only doubt I have is whether the leader and isr znode
> > is the
> > > > > > right place to store the target reassignment. It is a bit odd to
> > keep the
> > > > > > target assignment in a separate place from the current assignment,
> > right? I
> > > > > > assume the thinking is probably that although the current
> > assignment should
> > > > > > probably be in the leader and isr znode as well, it is hard to
> > move the
> > > > > > state in a compatible way. Is that right? But if we have no plan
> > to remove
> > > > > > the assignment znode, do you see a downside to storing the target
> > > > > > assignment there as well?
> > > > > >
> > > > >
> > > > > Hi Jason,
> > > > >
> > > > > That's a good point -- it's probably better to keep the target
> > > > > assignment in the same znode as the current assignment, for
> > > > > consistency.  I'll change the KIP.
> > > >
> > > > Hi Jason,
> > > >
> > > > Thanks again for the review.
> > > >
> > > > I took another look at this, and I think we should stick with the
> > > > initial proposal of putting the reassignment state into
> > > > /brokers/topics/[topic]/partitions/[partitionId]/state.  The reason is
> > > > because we'll want to bump the leader epoch for the partition when
> > > > changing the reassignment state, and the leader epoch resides in that
> > > > znode anyway.  I agree there is some inconsistency here, 

Build failed in Jenkins: kafka-trunk-jdk11 #702

2019-07-17 Thread Apache Jenkins Server
See 


Changes:

[github] MINOR: Fix api exception single argument constructor usage (#6956)

--
[...truncated 2.56 MB...]

org.apache.kafka.connect.json.JsonConverterTest > byteToConnect STARTED

org.apache.kafka.connect.json.JsonConverterTest > byteToConnect PASSED

org.apache.kafka.connect.json.JsonConverterTest > nullSchemaPrimitiveToConnect 
STARTED

org.apache.kafka.connect.json.JsonConverterTest > nullSchemaPrimitiveToConnect 
PASSED

org.apache.kafka.connect.json.JsonConverterTest > byteToJson STARTED

org.apache.kafka.connect.json.JsonConverterTest > byteToJson PASSED

org.apache.kafka.connect.json.JsonConverterTest > intToConnect STARTED

org.apache.kafka.connect.json.JsonConverterTest > intToConnect PASSED

org.apache.kafka.connect.json.JsonConverterTest > dateToJson STARTED

org.apache.kafka.connect.json.JsonConverterTest > dateToJson PASSED

org.apache.kafka.connect.json.JsonConverterTest > noSchemaToConnect STARTED

org.apache.kafka.connect.json.JsonConverterTest > noSchemaToConnect PASSED

org.apache.kafka.connect.json.JsonConverterTest > noSchemaToJson STARTED

org.apache.kafka.connect.json.JsonConverterTest > noSchemaToJson PASSED

org.apache.kafka.connect.json.JsonConverterTest > nullSchemaAndPrimitiveToJson 
STARTED

org.apache.kafka.connect.json.JsonConverterTest > nullSchemaAndPrimitiveToJson 
PASSED

org.apache.kafka.connect.json.JsonConverterTest > 
decimalToConnectOptionalWithDefaultValue STARTED

org.apache.kafka.connect.json.JsonConverterTest > 
decimalToConnectOptionalWithDefaultValue PASSED

org.apache.kafka.connect.json.JsonConverterTest > mapToJsonStringKeys STARTED

org.apache.kafka.connect.json.JsonConverterTest > mapToJsonStringKeys PASSED

org.apache.kafka.connect.json.JsonConverterTest > arrayToJson STARTED

org.apache.kafka.connect.json.JsonConverterTest > arrayToJson PASSED

org.apache.kafka.connect.json.JsonConverterTest > nullToConnect STARTED

org.apache.kafka.connect.json.JsonConverterTest > nullToConnect PASSED

org.apache.kafka.connect.json.JsonConverterTest > timeToJson STARTED

org.apache.kafka.connect.json.JsonConverterTest > timeToJson PASSED

org.apache.kafka.connect.json.JsonConverterTest > structToJson STARTED

org.apache.kafka.connect.json.JsonConverterTest > structToJson PASSED

org.apache.kafka.connect.json.JsonConverterTest > 
testConnectSchemaMetadataTranslation STARTED

org.apache.kafka.connect.json.JsonConverterTest > 
testConnectSchemaMetadataTranslation PASSED

org.apache.kafka.connect.json.JsonConverterTest > shortToJson STARTED

org.apache.kafka.connect.json.JsonConverterTest > shortToJson PASSED

org.apache.kafka.connect.json.JsonConverterTest > dateToConnect STARTED

org.apache.kafka.connect.json.JsonConverterTest > dateToConnect PASSED

org.apache.kafka.connect.json.JsonConverterTest > doubleToJson STARTED

org.apache.kafka.connect.json.JsonConverterTest > doubleToJson PASSED

org.apache.kafka.connect.json.JsonConverterTest > testStringHeaderToJson STARTED

org.apache.kafka.connect.json.JsonConverterTest > testStringHeaderToJson PASSED

org.apache.kafka.connect.json.JsonConverterTest > decimalToConnectOptional 
STARTED

org.apache.kafka.connect.json.JsonConverterTest > decimalToConnectOptional 
PASSED

org.apache.kafka.connect.json.JsonConverterTest > structSchemaIdentical STARTED

org.apache.kafka.connect.json.JsonConverterTest > structSchemaIdentical PASSED

org.apache.kafka.connect.json.JsonConverterTest > timeToConnectWithDefaultValue 
STARTED

org.apache.kafka.connect.json.JsonConverterTest > timeToConnectWithDefaultValue 
PASSED

org.apache.kafka.connect.json.JsonConverterTest > timeToConnect STARTED

org.apache.kafka.connect.json.JsonConverterTest > timeToConnect PASSED

org.apache.kafka.connect.json.JsonConverterTest > 
dateToConnectOptionalWithDefaultValue STARTED

org.apache.kafka.connect.json.JsonConverterTest > 
dateToConnectOptionalWithDefaultValue PASSED

org.apache.kafka.connect.json.JsonConverterTest > mapToConnectStringKeys STARTED

org.apache.kafka.connect.json.JsonConverterTest > mapToConnectStringKeys PASSED

org.apache.kafka.connect.json.JsonConverterTest > dateToConnectOptional STARTED

org.apache.kafka.connect.json.JsonConverterTest > dateToConnectOptional PASSED

org.apache.kafka.connect.json.JsonConverterTest > 
timeToConnectOptionalWithDefaultValue STARTED

org.apache.kafka.connect.json.JsonConverterTest > 
timeToConnectOptionalWithDefaultValue PASSED

org.apache.kafka.connect.json.JsonConverterTest > floatToConnect STARTED

org.apache.kafka.connect.json.JsonConverterTest > floatToConnect PASSED

org.apache.kafka.connect.json.JsonConverterTest > 
decimalToConnectWithDefaultValue STARTED

org.apache.kafka.connect.json.JsonConverterTest > 
decimalToConnectWithDefaultValue PASSED

org.apache.kafka.connect.json.JsonConverterTest > decimalToJson STARTED

org.apache.kafka.connect.json.JsonConverterTest 

Re: [VOTE] KIP-488: Clean up Sum,Count,Total Metrics

2019-07-17 Thread Bill Bejeck
+1 (binding) for the updated KIP.

On Wed, Jul 17, 2019 at 4:09 PM John Roesler  wrote:

> Hey, Bruno and Bill,
>
> Since you cast your votes before the KIP was updated, do you mind
> re-casting just so we can be sure you're still in favor?
>
> Thanks,
> -John
>
> On Wed, Jul 17, 2019 at 2:01 PM Guozhang Wang  wrote:
> >
> > +1 (binging).
> >
> > This is a great cleanup, thanks John!
> >
> > Guozhang
> >
> > On Wed, Jul 17, 2019 at 11:26 AM Ryanne Dolan 
> wrote:
> >
> > > +1 (non-binding)
> > >
> > > Thanks for the interesting discussion.
> > >
> > > Ryanne
> > >
> > > On Fri, Jul 12, 2019, 2:49 PM Ryanne Dolan 
> wrote:
> > >
> > > > John, I'm glad to learn I'm not the only one who's re-read the
> metrics
> > > > code multiple times.
> > > >
> > > > I do wonder if the proposed names could be improved further though,
> given
> > > > that "sum", "total", and "count" are roughly synonymous. I'm already
> > > > scratching my head at what "TotalSum" means. It's clear in the
> context of
> > > > your matrix, juxtaposed with the alternatives, but when I come
> across the
> > > > name in isolation I suspect I'll be back looking at the
> implementation
> > > > again.
> > > >
> > > > Ryanne
> > > >
> > > > On Fri, Jul 12, 2019, 1:45 PM John Roesler 
> wrote:
> > > >
> > > >> Hi Kafka devs,
> > > >>
> > > >> Yesterday, I proposed KIP-488 as a minor cleanup of some of our
> metric
> > > >> implementations.
> > > >>
> > > >> KIP-488: https://cwiki.apache.org/confluence/x/kkAyBw
> > > >>
> > > >> The change seems pretty uncontroversial, so I'm just going to open
> the
> > > >> vote now.
> > > >>
> > > >> Feel free to veto or just request more discussion if you disagree
> with
> > > >> the KIP. The vote will remain open for 72 hours.
> > > >>
> > > >> Thanks,
> > > >> -John
> > > >>
> > > >
> > >
> >
> >
> > --
> > -- Guozhang
>


Re: [DISCUSS] KIP-221: Repartition Topic Hints in Streams

2019-07-17 Thread Levani Kokhreidze
Hey John,

Oh that’s interesting use-case. 
Do I understand this correctly, in your example I would first issue 
repartition(Repartitioned) with proper partitioner that essentially would be 
the same as the topic I want to join with and then do the KStream#join with DSL?

Regards,
Levani

> On Jul 17, 2019, at 11:11 PM, John Roesler  wrote:
> 
> Hey, all, just to chime in,
> 
> I think it might be useful to have an option to specify the
> partitioner. The case I have in mind is that some data may get
> repartitioned and then joined with an input topic. If the right-side
> input topic uses a custom partitioning strategy, then the
> repartitioned stream also needs to be partitioned with the same
> strategy.
> 
> Does that make sense, or did I maybe miss something important?
> 
> Thanks,
> -John
> 
> On Wed, Jul 17, 2019 at 2:48 PM Levani Kokhreidze
>  wrote:
>> 
>> Yes, I was thinking about it as well. To be honest I’m not sure about it yet.
>> As Kafka Streams DSL user, I don’t really think I would need control over 
>> partitioner for internal topics.
>> As a user, I would assume that Kafka Streams knows best how to partition 
>> data for internal topics.
>> In this KIP I wrote that Produced should be used only for topics that are 
>> created by user In advance.
>> In those cases maybe it make sense to have possibility to specify the 
>> partitioner.
>> I don’t have clear answer on that yet, but I guess specifying the 
>> partitioner can be added as well if there’s agreement on this.
>> 
>> Regards,
>> Levani
>> 
>>> On Jul 17, 2019, at 10:42 PM, Sophie Blee-Goldman  
>>> wrote:
>>> 
>>> Thanks for clearing that up. I agree that Repartitioned would be a useful
>>> addition. I'm wondering if it might also need to have
>>> a withStreamPartitioner method/field, similar to Produced? I'm not sure how
>>> widely this feature is really used, but seems it should be available for
>>> repartition topics.
>>> 
>>> On Wed, Jul 17, 2019 at 11:26 AM Levani Kokhreidze 
>>> wrote:
>>> 
 Hey Sophie,
 
 In both cases KStream#repartition and KStream#repartition(Repartitioned)
 topic will be created and managed by Kafka Streams.
 Idea of Repartitioned is to give user more control over the topic such as
 num of partitions.
 I feel like Repartitioned parameter is something that is missing in
 current DSL design.
 Essentially giving user control over parallelism by configuring num of
 partitions for internal topics.
 
 Hope this answers your question.
 
 Regards,
 Levani
 
> On Jul 17, 2019, at 9:02 PM, Sophie Blee-Goldman 
 wrote:
> 
> Hey Levani,
> 
> Thanks for the KIP! Can you clarify one thing for me -- for the
> KStream#repartition signature taking a Repartitioned, will the topic be
> auto-created by Streams (which seems to be the case for the signature
> without a Repartitioned) or does it have to be pre-created? The wording
 in
> the KIP makes it seem like one version of the method will auto-create
> topics while the other will not.
> 
> Cheers,
> Sophie
> 
> On Wed, Jul 17, 2019 at 10:15 AM Levani Kokhreidze <
 levani.co...@gmail.com>
> wrote:
> 
>> Hello,
>> 
>> One more bump about KIP-221 (
>> 
 https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
>> <
>> 
 https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> )
>> so it doesn’t get lost in mailing list :)
>> Would love to hear communities opinions/concerns about this KIP.
>> 
>> Regards,
>> Levani
>> 
>> 
>>> On Jul 12, 2019, at 5:27 PM, Levani Kokhreidze  
>> wrote:
>>> 
>>> Hello,
>>> 
>>> Kind reminder about this KIP:
>> 
 https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
>> <
>> 
 https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
>>> 
>>> 
>>> Regards,
>>> Levani
>>> 
 On Jul 9, 2019, at 11:38 AM, Levani Kokhreidze <
 levani.co...@gmail.com
>> > wrote:
 
 Hello,
 
 In order to move this KIP forward, I’ve updated confluence page with
>> the new proposal
>> 
 https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
>> <
>> 
 https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
>>> 
 I’ve also filled “Rejected Alternatives” section.
 
 Looking forward to discuss this KIP :)
 
 King regards,
 Levani

Build failed in Jenkins: kafka-trunk-jdk8 #3795

2019-07-17 Thread Apache Jenkins Server
See 

--
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on H35 (ubuntu xenial) in workspace 

[WS-CLEANUP] Deleting project workspace...
[WS-CLEANUP] Deferred wipeout is used...
[WS-CLEANUP] Done
No credentials specified
Cloning the remote Git repository
Cloning repository https://github.com/apache/kafka.git
 > git init  # timeout=10
Fetching upstream changes from https://github.com/apache/kafka.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
ERROR: Error cloning remote repo 'origin'
hudson.plugins.git.GitException: Command "git fetch --tags --progress 
https://github.com/apache/kafka.git +refs/heads/*:refs/remotes/origin/*" 
returned status code 128:
stdout: 
stderr: fatal: unable to access 'https://github.com/apache/kafka.git/': Could 
not resolve host: github.com

at 
org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:2042)
at 
org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandWithCredentials(CliGitAPIImpl.java:1761)
at 
org.jenkinsci.plugins.gitclient.CliGitAPIImpl.access$400(CliGitAPIImpl.java:72)
at 
org.jenkinsci.plugins.gitclient.CliGitAPIImpl$1.execute(CliGitAPIImpl.java:442)
at 
org.jenkinsci.plugins.gitclient.CliGitAPIImpl$2.execute(CliGitAPIImpl.java:655)
at 
org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1.call(RemoteGitImpl.java:153)
at 
org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1.call(RemoteGitImpl.java:146)
at hudson.remoting.UserRequest.perform(UserRequest.java:212)
at hudson.remoting.UserRequest.perform(UserRequest.java:54)
at hudson.remoting.Request$2.run(Request.java:369)
at 
hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Suppressed: hudson.remoting.Channel$CallSiteStackTrace: Remote call to 
H35
at 
hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1741)
at 
hudson.remoting.UserRequest$ExceptionResponse.retrieve(UserRequest.java:357)
at hudson.remoting.Channel.call(Channel.java:955)
at 
org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.execute(RemoteGitImpl.java:146)
at sun.reflect.GeneratedMethodAccessor974.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.invoke(RemoteGitImpl.java:132)
at com.sun.proxy.$Proxy135.execute(Unknown Source)
at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1152)
at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1192)
at hudson.scm.SCM.checkout(SCM.java:504)
at 
hudson.model.AbstractProject.checkout(AbstractProject.java:1208)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:574)
at 
jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:499)
at hudson.model.Run.execute(Run.java:1810)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at 
hudson.model.ResourceController.execute(ResourceController.java:97)
at hudson.model.Executor.run(Executor.java:429)
ERROR: Error cloning remote repo 'origin'
Retrying after 10 seconds
No credentials specified
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/kafka.git # timeout=10
Fetching upstream changes from https://github.com/apache/kafka.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
ERROR: Error fetching remote repo 'origin'
hudson.plugins.git.GitException: Failed to fetch from 
https://github.com/apache/kafka.git
at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:894)
at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1161)
at 

Re: [DISCUSS] KIP-221: Repartition Topic Hints in Streams

2019-07-17 Thread John Roesler
Hey, all, just to chime in,

I think it might be useful to have an option to specify the
partitioner. The case I have in mind is that some data may get
repartitioned and then joined with an input topic. If the right-side
input topic uses a custom partitioning strategy, then the
repartitioned stream also needs to be partitioned with the same
strategy.

Does that make sense, or did I maybe miss something important?

Thanks,
-John

On Wed, Jul 17, 2019 at 2:48 PM Levani Kokhreidze
 wrote:
>
> Yes, I was thinking about it as well. To be honest I’m not sure about it yet.
> As Kafka Streams DSL user, I don’t really think I would need control over 
> partitioner for internal topics.
> As a user, I would assume that Kafka Streams knows best how to partition data 
> for internal topics.
> In this KIP I wrote that Produced should be used only for topics that are 
> created by user In advance.
> In those cases maybe it make sense to have possibility to specify the 
> partitioner.
> I don’t have clear answer on that yet, but I guess specifying the partitioner 
> can be added as well if there’s agreement on this.
>
> Regards,
> Levani
>
> > On Jul 17, 2019, at 10:42 PM, Sophie Blee-Goldman  
> > wrote:
> >
> > Thanks for clearing that up. I agree that Repartitioned would be a useful
> > addition. I'm wondering if it might also need to have
> > a withStreamPartitioner method/field, similar to Produced? I'm not sure how
> > widely this feature is really used, but seems it should be available for
> > repartition topics.
> >
> > On Wed, Jul 17, 2019 at 11:26 AM Levani Kokhreidze 
> > wrote:
> >
> >> Hey Sophie,
> >>
> >> In both cases KStream#repartition and KStream#repartition(Repartitioned)
> >> topic will be created and managed by Kafka Streams.
> >> Idea of Repartitioned is to give user more control over the topic such as
> >> num of partitions.
> >> I feel like Repartitioned parameter is something that is missing in
> >> current DSL design.
> >> Essentially giving user control over parallelism by configuring num of
> >> partitions for internal topics.
> >>
> >> Hope this answers your question.
> >>
> >> Regards,
> >> Levani
> >>
> >>> On Jul 17, 2019, at 9:02 PM, Sophie Blee-Goldman 
> >> wrote:
> >>>
> >>> Hey Levani,
> >>>
> >>> Thanks for the KIP! Can you clarify one thing for me -- for the
> >>> KStream#repartition signature taking a Repartitioned, will the topic be
> >>> auto-created by Streams (which seems to be the case for the signature
> >>> without a Repartitioned) or does it have to be pre-created? The wording
> >> in
> >>> the KIP makes it seem like one version of the method will auto-create
> >>> topics while the other will not.
> >>>
> >>> Cheers,
> >>> Sophie
> >>>
> >>> On Wed, Jul 17, 2019 at 10:15 AM Levani Kokhreidze <
> >> levani.co...@gmail.com>
> >>> wrote:
> >>>
>  Hello,
> 
>  One more bump about KIP-221 (
> 
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
>  <
> 
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> >>> )
>  so it doesn’t get lost in mailing list :)
>  Would love to hear communities opinions/concerns about this KIP.
> 
>  Regards,
>  Levani
> 
> 
> > On Jul 12, 2019, at 5:27 PM, Levani Kokhreidze  >>>
>  wrote:
> >
> > Hello,
> >
> > Kind reminder about this KIP:
> 
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
>  <
> 
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> >
> >
> > Regards,
> > Levani
> >
> >> On Jul 9, 2019, at 11:38 AM, Levani Kokhreidze <
> >> levani.co...@gmail.com
>  > wrote:
> >>
> >> Hello,
> >>
> >> In order to move this KIP forward, I’ve updated confluence page with
>  the new proposal
> 
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
>  <
> 
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> >
> >> I’ve also filled “Rejected Alternatives” section.
> >>
> >> Looking forward to discuss this KIP :)
> >>
> >> King regards,
> >> Levani
> >>
> >>
> >>> On Jul 3, 2019, at 1:08 PM, Levani Kokhreidze <
> >> levani.co...@gmail.com
>  > wrote:
> >>>
> >>> Hello Matthias,
> >>>
> >>> Thanks for the feedback and ideas.
> >>> I like the idea of introducing dedicated `Topic` class for topic
>  configuration for internal operators like `groupedBy`.
> >>> Would be great to hear others opinion 

Re: [VOTE] KIP-488: Clean up Sum,Count,Total Metrics

2019-07-17 Thread John Roesler
Hey, Bruno and Bill,

Since you cast your votes before the KIP was updated, do you mind
re-casting just so we can be sure you're still in favor?

Thanks,
-John

On Wed, Jul 17, 2019 at 2:01 PM Guozhang Wang  wrote:
>
> +1 (binging).
>
> This is a great cleanup, thanks John!
>
> Guozhang
>
> On Wed, Jul 17, 2019 at 11:26 AM Ryanne Dolan  wrote:
>
> > +1 (non-binding)
> >
> > Thanks for the interesting discussion.
> >
> > Ryanne
> >
> > On Fri, Jul 12, 2019, 2:49 PM Ryanne Dolan  wrote:
> >
> > > John, I'm glad to learn I'm not the only one who's re-read the metrics
> > > code multiple times.
> > >
> > > I do wonder if the proposed names could be improved further though, given
> > > that "sum", "total", and "count" are roughly synonymous. I'm already
> > > scratching my head at what "TotalSum" means. It's clear in the context of
> > > your matrix, juxtaposed with the alternatives, but when I come across the
> > > name in isolation I suspect I'll be back looking at the implementation
> > > again.
> > >
> > > Ryanne
> > >
> > > On Fri, Jul 12, 2019, 1:45 PM John Roesler  wrote:
> > >
> > >> Hi Kafka devs,
> > >>
> > >> Yesterday, I proposed KIP-488 as a minor cleanup of some of our metric
> > >> implementations.
> > >>
> > >> KIP-488: https://cwiki.apache.org/confluence/x/kkAyBw
> > >>
> > >> The change seems pretty uncontroversial, so I'm just going to open the
> > >> vote now.
> > >>
> > >> Feel free to veto or just request more discussion if you disagree with
> > >> the KIP. The vote will remain open for 72 hours.
> > >>
> > >> Thanks,
> > >> -John
> > >>
> > >
> >
>
>
> --
> -- Guozhang


Re: [DISCUSS] KIP-488: Clean up Sum,Count,Total Metrics

2019-07-17 Thread John Roesler
Cool. I'll update the KIP, then, and we can re-start the vote. (looks
like you've already cast a vote :) )

-John


On Wed, Jul 17, 2019 at 1:23 PM Ryanne Dolan  wrote:
>
> John, makes sense to me! Thanks.
>
> Ryanne
>
> On Wed, Jul 17, 2019, 1:16 PM John Roesler  wrote:
>
> > Agreed. I think the names are actually not ambiguous once you recall
> > that the stats summarize measurements and each measurement is a
> > floating point number, but there's enough overlap that I also was
> > initially confused as well. I do plan to make this super clear in the
> > documentation.
> >
> > Thanks,
> > -John
> >
> > On Wed, Jul 17, 2019 at 1:08 PM Sophie Blee-Goldman 
> > wrote:
> > >
> > > Sounds good to me
> > >
> > > By the way, while I agree that we can't really do better than Sum and
> > Count
> > > I will say I also found the distinction a bit unclear at first glance. We
> > > should at least document clearly that "Sum" is a "sum of values" whereas
> > > "Count" is a "number of things" -- but that doesn't need to be part of
> > the
> > > KIP
> > >
> > > On Wed, Jul 17, 2019 at 11:00 AM John Roesler  wrote:
> > >
> > > > Thanks for the replies.
> > > >
> > > > I guess that if we did add (e.g.) ExponentiallyWeightedWindowedX or
> > > > something, it should still be pretty obvious that WindowedX is the
> > > > unweighted version? In that case, I buy the argument that we don't
> > > > need "Simple" and we can just go with:
> > > >
> > > > WindowedSum, WindowedCount
> > > > CumulativeSum, CumulativeCount
> > > >
> > > > Sound good?
> > > > Thanks,
> > > > -John
> > > >
> > > > On Wed, Jul 17, 2019 at 11:53 AM Sophie Blee-Goldman
> > > >  wrote:
> > > > >
> > > > > Thanks for the crash course in statistical terms :)
> > > > >
> > > > > In light of this I definitely support Cumulative{Sum,Count}, but I'm
> > > > really
> > > > > not crazy about SimpleWindowed{Sum,Count} (vs just Windowed). Not so
> > much
> > > > > because of its unfortunate length (although that is unfortunate it
> > > > > shouldn't be a deciding factor) but because it seems to have the
> > > > potential
> > > > > to confuse further. I'm not sure what we gain by adding "Simple"
> > since to
> > > > > me at least, the unweighted-ness is obvious and the definition of
> > simple
> > > > is
> > > > > not. To those who haven't been exposed to the finer details of
> > > > statistical
> > > > > definitions, I think they are more likely to read "SimpleXX" and
> > wonder
> > > > "is
> > > > > there an 'advanced' or non-simple kind of Windowed?" than they are to
> > > > > wonder what is the weighting behind these metrics.
> > > > >
> > > > > Sophie
> > > > >
> > > > > On Wed, Jul 17, 2019 at 8:18 AM John Roesler 
> > wrote:
> > > > >
> > > > > > Thanks for the discussion, all.
> > > > > >
> > > > > > I've done a little more research into the statistical terminology.
> > > > > > Matthias is correct, "running" and "moving" appear to be synonyms.
> > > > > > Unfortunately, both can be computed either over a window of the
> > last N
> > > > > > measurements or over all prior measurements. "Moving" just
> > signifies
> > > > > > that the statistic is computed over a "live" data set, i.e., a
> > > > > > continuous stream of measurements, and the expectation is that the
> > > > > > stat would be updated in response to new measurements.
> > > > > >
> > > > > > I found https://en.wikipedia.org/wiki/Moving_average to have a
> > pretty
> > > > > > good overview of the whole picture.
> > > > > >
> > > > > > After considering the discussion so far and some light reading, it
> > > > > > seems like "Cumulative" is truly the correct term for the all-time
> > > > > > metrics:
> > > > > >
> > > > > > > In a cumulative moving average, the data arrive in an
> > > > > > > ordered datum stream, and the user would like to get
> > > > > > > the average of all of the data up until the current datum
> > > > > > > point.
> > > > > >
> > > > > > I know that we previously felt that "cumulative" was too much of a
> > > > > > mouthful, but it seems like our quest for a terser term led us
> > into a
> > > > > > briar patch. Also, now there is an independent source (the wiki
> > page)
> > > > > > indicating that this is indeed the correct term, and it doesn't
> > offer
> > > > > > any synonyms to choose from. Maybe we can take comfort in the fact
> > > > > > that we'll rarely be saying the name of the classes out loud.
> > > > > >
> > > > > > As far as moving stats that operate over a window of the last N
> > > > > > measurements, there are multiple options, including Simple
> > > > > > (unweighted), Weighted, and Exponentially Weighted, and presumably
> > > > > > infinite variations with other weighting functions. In our domain,
> > > > > > there is only one weighting function available, but it's still more
> > > > > > self-documenting and future-proof to specify the type of windowed
> > > > > > statistic. Therefore, I'm proposing "Simple" as the term for the
> > > > > > windowed (aka 

Re: [DISCUSS] KIP-221: Repartition Topic Hints in Streams

2019-07-17 Thread Levani Kokhreidze
Yes, I was thinking about it as well. To be honest I’m not sure about it yet.
As Kafka Streams DSL user, I don’t really think I would need control over 
partitioner for internal topics.
As a user, I would assume that Kafka Streams knows best how to partition data 
for internal topics.
In this KIP I wrote that Produced should be used only for topics that are 
created by user In advance. 
In those cases maybe it make sense to have possibility to specify the 
partitioner.
I don’t have clear answer on that yet, but I guess specifying the partitioner 
can be added as well if there’s agreement on this.

Regards,
Levani

> On Jul 17, 2019, at 10:42 PM, Sophie Blee-Goldman  wrote:
> 
> Thanks for clearing that up. I agree that Repartitioned would be a useful
> addition. I'm wondering if it might also need to have
> a withStreamPartitioner method/field, similar to Produced? I'm not sure how
> widely this feature is really used, but seems it should be available for
> repartition topics.
> 
> On Wed, Jul 17, 2019 at 11:26 AM Levani Kokhreidze 
> wrote:
> 
>> Hey Sophie,
>> 
>> In both cases KStream#repartition and KStream#repartition(Repartitioned)
>> topic will be created and managed by Kafka Streams.
>> Idea of Repartitioned is to give user more control over the topic such as
>> num of partitions.
>> I feel like Repartitioned parameter is something that is missing in
>> current DSL design.
>> Essentially giving user control over parallelism by configuring num of
>> partitions for internal topics.
>> 
>> Hope this answers your question.
>> 
>> Regards,
>> Levani
>> 
>>> On Jul 17, 2019, at 9:02 PM, Sophie Blee-Goldman 
>> wrote:
>>> 
>>> Hey Levani,
>>> 
>>> Thanks for the KIP! Can you clarify one thing for me -- for the
>>> KStream#repartition signature taking a Repartitioned, will the topic be
>>> auto-created by Streams (which seems to be the case for the signature
>>> without a Repartitioned) or does it have to be pre-created? The wording
>> in
>>> the KIP makes it seem like one version of the method will auto-create
>>> topics while the other will not.
>>> 
>>> Cheers,
>>> Sophie
>>> 
>>> On Wed, Jul 17, 2019 at 10:15 AM Levani Kokhreidze <
>> levani.co...@gmail.com>
>>> wrote:
>>> 
 Hello,
 
 One more bump about KIP-221 (
 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
 <
 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
>>> )
 so it doesn’t get lost in mailing list :)
 Would love to hear communities opinions/concerns about this KIP.
 
 Regards,
 Levani
 
 
> On Jul 12, 2019, at 5:27 PM, Levani Kokhreidze >> 
 wrote:
> 
> Hello,
> 
> Kind reminder about this KIP:
 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
 <
 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> 
> 
> Regards,
> Levani
> 
>> On Jul 9, 2019, at 11:38 AM, Levani Kokhreidze <
>> levani.co...@gmail.com
 > wrote:
>> 
>> Hello,
>> 
>> In order to move this KIP forward, I’ve updated confluence page with
 the new proposal
 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
 <
 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> 
>> I’ve also filled “Rejected Alternatives” section.
>> 
>> Looking forward to discuss this KIP :)
>> 
>> King regards,
>> Levani
>> 
>> 
>>> On Jul 3, 2019, at 1:08 PM, Levani Kokhreidze <
>> levani.co...@gmail.com
 > wrote:
>>> 
>>> Hello Matthias,
>>> 
>>> Thanks for the feedback and ideas.
>>> I like the idea of introducing dedicated `Topic` class for topic
 configuration for internal operators like `groupedBy`.
>>> Would be great to hear others opinion about this as well.
>>> 
>>> Kind regards,
>>> Levani
>>> 
>>> 
 On Jul 3, 2019, at 7:00 AM, Matthias J. Sax >>> > wrote:
 
 Levani,
 
 Thanks for picking up this KIP! And thanks for summarizing
>> everything.
 Even if some points may have been discussed already (can't really
 remember), it's helpful to get a good summary to refresh the
 discussion.
 
 I think your reasoning makes sense. With regard to the distinction
 between operators that manage topics and operators that use
 user-created
 topics: Following this argument, it might indicate that leaving

Re: [DISCUSS] KIP-221: Repartition Topic Hints in Streams

2019-07-17 Thread Sophie Blee-Goldman
Thanks for clearing that up. I agree that Repartitioned would be a useful
addition. I'm wondering if it might also need to have
a withStreamPartitioner method/field, similar to Produced? I'm not sure how
widely this feature is really used, but seems it should be available for
repartition topics.

On Wed, Jul 17, 2019 at 11:26 AM Levani Kokhreidze 
wrote:

> Hey Sophie,
>
> In both cases KStream#repartition and KStream#repartition(Repartitioned)
> topic will be created and managed by Kafka Streams.
> Idea of Repartitioned is to give user more control over the topic such as
> num of partitions.
> I feel like Repartitioned parameter is something that is missing in
> current DSL design.
> Essentially giving user control over parallelism by configuring num of
> partitions for internal topics.
>
> Hope this answers your question.
>
> Regards,
> Levani
>
> > On Jul 17, 2019, at 9:02 PM, Sophie Blee-Goldman 
> wrote:
> >
> > Hey Levani,
> >
> > Thanks for the KIP! Can you clarify one thing for me -- for the
> > KStream#repartition signature taking a Repartitioned, will the topic be
> > auto-created by Streams (which seems to be the case for the signature
> > without a Repartitioned) or does it have to be pre-created? The wording
> in
> > the KIP makes it seem like one version of the method will auto-create
> > topics while the other will not.
> >
> > Cheers,
> > Sophie
> >
> > On Wed, Jul 17, 2019 at 10:15 AM Levani Kokhreidze <
> levani.co...@gmail.com>
> > wrote:
> >
> >> Hello,
> >>
> >> One more bump about KIP-221 (
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> >> <
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> >)
> >> so it doesn’t get lost in mailing list :)
> >> Would love to hear communities opinions/concerns about this KIP.
> >>
> >> Regards,
> >> Levani
> >>
> >>
> >>> On Jul 12, 2019, at 5:27 PM, Levani Kokhreidze  >
> >> wrote:
> >>>
> >>> Hello,
> >>>
> >>> Kind reminder about this KIP:
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> >> <
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> >>>
> >>>
> >>> Regards,
> >>> Levani
> >>>
>  On Jul 9, 2019, at 11:38 AM, Levani Kokhreidze <
> levani.co...@gmail.com
> >> > wrote:
> 
>  Hello,
> 
>  In order to move this KIP forward, I’ve updated confluence page with
> >> the new proposal
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> >> <
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> >>>
>  I’ve also filled “Rejected Alternatives” section.
> 
>  Looking forward to discuss this KIP :)
> 
>  King regards,
>  Levani
> 
> 
> > On Jul 3, 2019, at 1:08 PM, Levani Kokhreidze <
> levani.co...@gmail.com
> >> > wrote:
> >
> > Hello Matthias,
> >
> > Thanks for the feedback and ideas.
> > I like the idea of introducing dedicated `Topic` class for topic
> >> configuration for internal operators like `groupedBy`.
> > Would be great to hear others opinion about this as well.
> >
> > Kind regards,
> > Levani
> >
> >
> >> On Jul 3, 2019, at 7:00 AM, Matthias J. Sax  >> > wrote:
> >>
> >> Levani,
> >>
> >> Thanks for picking up this KIP! And thanks for summarizing
> everything.
> >> Even if some points may have been discussed already (can't really
> >> remember), it's helpful to get a good summary to refresh the
> >> discussion.
> >>
> >> I think your reasoning makes sense. With regard to the distinction
> >> between operators that manage topics and operators that use
> >> user-created
> >> topics: Following this argument, it might indicate that leaving
> >> `through()` as-is (as an operator that uses use-defined topics) and
> >> introducing a new `repartition()` operator (an operator that manages
> >> topics itself) might be good. Otherwise, there is one operator
> >> `through()` that sometimes manages topics but sometimes not; a
> >> different
> >> name, ie, new operator would make the distinction clearer.
> >>
> >> About adding `numOfPartitions` to `Grouped`. I am wondering if the
> >> same
> >> argument as for `Produced` does apply and adding it is semantically
> >> questionable? Might be good to get opinions of others on this, too.
> I
> >> am
> >> not sure myself what solution I prefer atm.
> >>
> >> So far, KS uses configuration objects that allow to configure a

Build failed in Jenkins: kafka-trunk-jdk8 #3794

2019-07-17 Thread Apache Jenkins Server
See 

--
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on H35 (ubuntu xenial) in workspace 

[WS-CLEANUP] Deleting project workspace...
[WS-CLEANUP] Deferred wipeout is used...
[WS-CLEANUP] Done
No credentials specified
Cloning the remote Git repository
Cloning repository https://github.com/apache/kafka.git
 > git init  # timeout=10
Fetching upstream changes from https://github.com/apache/kafka.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
ERROR: Error cloning remote repo 'origin'
hudson.plugins.git.GitException: Command "git fetch --tags --progress 
https://github.com/apache/kafka.git +refs/heads/*:refs/remotes/origin/*" 
returned status code 128:
stdout: 
stderr: fatal: unable to access 'https://github.com/apache/kafka.git/': Could 
not resolve host: github.com

at 
org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:2042)
at 
org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandWithCredentials(CliGitAPIImpl.java:1761)
at 
org.jenkinsci.plugins.gitclient.CliGitAPIImpl.access$400(CliGitAPIImpl.java:72)
at 
org.jenkinsci.plugins.gitclient.CliGitAPIImpl$1.execute(CliGitAPIImpl.java:442)
at 
org.jenkinsci.plugins.gitclient.CliGitAPIImpl$2.execute(CliGitAPIImpl.java:655)
at 
org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1.call(RemoteGitImpl.java:153)
at 
org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1.call(RemoteGitImpl.java:146)
at hudson.remoting.UserRequest.perform(UserRequest.java:212)
at hudson.remoting.UserRequest.perform(UserRequest.java:54)
at hudson.remoting.Request$2.run(Request.java:369)
at 
hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Suppressed: hudson.remoting.Channel$CallSiteStackTrace: Remote call to 
H35
at 
hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1741)
at 
hudson.remoting.UserRequest$ExceptionResponse.retrieve(UserRequest.java:357)
at hudson.remoting.Channel.call(Channel.java:955)
at 
org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.execute(RemoteGitImpl.java:146)
at sun.reflect.GeneratedMethodAccessor974.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.invoke(RemoteGitImpl.java:132)
at com.sun.proxy.$Proxy135.execute(Unknown Source)
at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1152)
at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1192)
at hudson.scm.SCM.checkout(SCM.java:504)
at 
hudson.model.AbstractProject.checkout(AbstractProject.java:1208)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:574)
at 
jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:499)
at hudson.model.Run.execute(Run.java:1810)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at 
hudson.model.ResourceController.execute(ResourceController.java:97)
at hudson.model.Executor.run(Executor.java:429)
ERROR: Error cloning remote repo 'origin'
Retrying after 10 seconds
No credentials specified
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/kafka.git # timeout=10
Fetching upstream changes from https://github.com/apache/kafka.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
ERROR: Error fetching remote repo 'origin'
hudson.plugins.git.GitException: Failed to fetch from 
https://github.com/apache/kafka.git
at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:894)
at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1161)
at 

Re: [VOTE] KIP-488: Clean up Sum,Count,Total Metrics

2019-07-17 Thread Guozhang Wang
+1 (binging).

This is a great cleanup, thanks John!

Guozhang

On Wed, Jul 17, 2019 at 11:26 AM Ryanne Dolan  wrote:

> +1 (non-binding)
>
> Thanks for the interesting discussion.
>
> Ryanne
>
> On Fri, Jul 12, 2019, 2:49 PM Ryanne Dolan  wrote:
>
> > John, I'm glad to learn I'm not the only one who's re-read the metrics
> > code multiple times.
> >
> > I do wonder if the proposed names could be improved further though, given
> > that "sum", "total", and "count" are roughly synonymous. I'm already
> > scratching my head at what "TotalSum" means. It's clear in the context of
> > your matrix, juxtaposed with the alternatives, but when I come across the
> > name in isolation I suspect I'll be back looking at the implementation
> > again.
> >
> > Ryanne
> >
> > On Fri, Jul 12, 2019, 1:45 PM John Roesler  wrote:
> >
> >> Hi Kafka devs,
> >>
> >> Yesterday, I proposed KIP-488 as a minor cleanup of some of our metric
> >> implementations.
> >>
> >> KIP-488: https://cwiki.apache.org/confluence/x/kkAyBw
> >>
> >> The change seems pretty uncontroversial, so I'm just going to open the
> >> vote now.
> >>
> >> Feel free to veto or just request more discussion if you disagree with
> >> the KIP. The vote will remain open for 72 hours.
> >>
> >> Thanks,
> >> -John
> >>
> >
>


-- 
-- Guozhang


Re: [DISCUSS] KIP-221: Repartition Topic Hints in Streams

2019-07-17 Thread Levani Kokhreidze
Hey Sophie,

In both cases KStream#repartition and KStream#repartition(Repartitioned) topic 
will be created and managed by Kafka Streams. 
Idea of Repartitioned is to give user more control over the topic such as num 
of partitions. 
I feel like Repartitioned parameter is something that is missing in current DSL 
design. 
Essentially giving user control over parallelism by configuring num of 
partitions for internal topics.

Hope this answers your question.

Regards,
Levani

> On Jul 17, 2019, at 9:02 PM, Sophie Blee-Goldman  wrote:
> 
> Hey Levani,
> 
> Thanks for the KIP! Can you clarify one thing for me -- for the
> KStream#repartition signature taking a Repartitioned, will the topic be
> auto-created by Streams (which seems to be the case for the signature
> without a Repartitioned) or does it have to be pre-created? The wording in
> the KIP makes it seem like one version of the method will auto-create
> topics while the other will not.
> 
> Cheers,
> Sophie
> 
> On Wed, Jul 17, 2019 at 10:15 AM Levani Kokhreidze 
> wrote:
> 
>> Hello,
>> 
>> One more bump about KIP-221 (
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
>> <
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint>)
>> so it doesn’t get lost in mailing list :)
>> Would love to hear communities opinions/concerns about this KIP.
>> 
>> Regards,
>> Levani
>> 
>> 
>>> On Jul 12, 2019, at 5:27 PM, Levani Kokhreidze 
>> wrote:
>>> 
>>> Hello,
>>> 
>>> Kind reminder about this KIP:
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
>> <
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
>>> 
>>> 
>>> Regards,
>>> Levani
>>> 
 On Jul 9, 2019, at 11:38 AM, Levani Kokhreidze > > wrote:
 
 Hello,
 
 In order to move this KIP forward, I’ve updated confluence page with
>> the new proposal
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
>> <
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
>>> 
 I’ve also filled “Rejected Alternatives” section.
 
 Looking forward to discuss this KIP :)
 
 King regards,
 Levani
 
 
> On Jul 3, 2019, at 1:08 PM, Levani Kokhreidze > > wrote:
> 
> Hello Matthias,
> 
> Thanks for the feedback and ideas.
> I like the idea of introducing dedicated `Topic` class for topic
>> configuration for internal operators like `groupedBy`.
> Would be great to hear others opinion about this as well.
> 
> Kind regards,
> Levani
> 
> 
>> On Jul 3, 2019, at 7:00 AM, Matthias J. Sax > > wrote:
>> 
>> Levani,
>> 
>> Thanks for picking up this KIP! And thanks for summarizing everything.
>> Even if some points may have been discussed already (can't really
>> remember), it's helpful to get a good summary to refresh the
>> discussion.
>> 
>> I think your reasoning makes sense. With regard to the distinction
>> between operators that manage topics and operators that use
>> user-created
>> topics: Following this argument, it might indicate that leaving
>> `through()` as-is (as an operator that uses use-defined topics) and
>> introducing a new `repartition()` operator (an operator that manages
>> topics itself) might be good. Otherwise, there is one operator
>> `through()` that sometimes manages topics but sometimes not; a
>> different
>> name, ie, new operator would make the distinction clearer.
>> 
>> About adding `numOfPartitions` to `Grouped`. I am wondering if the
>> same
>> argument as for `Produced` does apply and adding it is semantically
>> questionable? Might be good to get opinions of others on this, too. I
>> am
>> not sure myself what solution I prefer atm.
>> 
>> So far, KS uses configuration objects that allow to configure a
>> certain
>> "entity" like a consumer, producer, store. If we assume that a topic
>> is
>> a similar entity, I am wonder if we should have a
>> `Topic#withNumberOfPartitions()` class and method instead of a plain
>> integer? This would allow us to add other configs, like replication
>> factor, retention-time etc, easily, without the need to change the
>> "main
>> API".
>> 
>> Just want to give some ideas. Not sure if I like them myself. :)
>> 
>> 
>> -Matthias
>> 
>> 
>> 
>> On 7/1/19 1:04 AM, Levani Kokhreidze wrote:
>>> Actually, giving it more though - maybe enhancing Produced with num
>> of 

Re: [VOTE] KIP-488: Clean up Sum,Count,Total Metrics

2019-07-17 Thread Ryanne Dolan
+1 (non-binding)

Thanks for the interesting discussion.

Ryanne

On Fri, Jul 12, 2019, 2:49 PM Ryanne Dolan  wrote:

> John, I'm glad to learn I'm not the only one who's re-read the metrics
> code multiple times.
>
> I do wonder if the proposed names could be improved further though, given
> that "sum", "total", and "count" are roughly synonymous. I'm already
> scratching my head at what "TotalSum" means. It's clear in the context of
> your matrix, juxtaposed with the alternatives, but when I come across the
> name in isolation I suspect I'll be back looking at the implementation
> again.
>
> Ryanne
>
> On Fri, Jul 12, 2019, 1:45 PM John Roesler  wrote:
>
>> Hi Kafka devs,
>>
>> Yesterday, I proposed KIP-488 as a minor cleanup of some of our metric
>> implementations.
>>
>> KIP-488: https://cwiki.apache.org/confluence/x/kkAyBw
>>
>> The change seems pretty uncontroversial, so I'm just going to open the
>> vote now.
>>
>> Feel free to veto or just request more discussion if you disagree with
>> the KIP. The vote will remain open for 72 hours.
>>
>> Thanks,
>> -John
>>
>


Re: [DISCUSS] KIP-488: Clean up Sum,Count,Total Metrics

2019-07-17 Thread Ryanne Dolan
John, makes sense to me! Thanks.

Ryanne

On Wed, Jul 17, 2019, 1:16 PM John Roesler  wrote:

> Agreed. I think the names are actually not ambiguous once you recall
> that the stats summarize measurements and each measurement is a
> floating point number, but there's enough overlap that I also was
> initially confused as well. I do plan to make this super clear in the
> documentation.
>
> Thanks,
> -John
>
> On Wed, Jul 17, 2019 at 1:08 PM Sophie Blee-Goldman 
> wrote:
> >
> > Sounds good to me
> >
> > By the way, while I agree that we can't really do better than Sum and
> Count
> > I will say I also found the distinction a bit unclear at first glance. We
> > should at least document clearly that "Sum" is a "sum of values" whereas
> > "Count" is a "number of things" -- but that doesn't need to be part of
> the
> > KIP
> >
> > On Wed, Jul 17, 2019 at 11:00 AM John Roesler  wrote:
> >
> > > Thanks for the replies.
> > >
> > > I guess that if we did add (e.g.) ExponentiallyWeightedWindowedX or
> > > something, it should still be pretty obvious that WindowedX is the
> > > unweighted version? In that case, I buy the argument that we don't
> > > need "Simple" and we can just go with:
> > >
> > > WindowedSum, WindowedCount
> > > CumulativeSum, CumulativeCount
> > >
> > > Sound good?
> > > Thanks,
> > > -John
> > >
> > > On Wed, Jul 17, 2019 at 11:53 AM Sophie Blee-Goldman
> > >  wrote:
> > > >
> > > > Thanks for the crash course in statistical terms :)
> > > >
> > > > In light of this I definitely support Cumulative{Sum,Count}, but I'm
> > > really
> > > > not crazy about SimpleWindowed{Sum,Count} (vs just Windowed). Not so
> much
> > > > because of its unfortunate length (although that is unfortunate it
> > > > shouldn't be a deciding factor) but because it seems to have the
> > > potential
> > > > to confuse further. I'm not sure what we gain by adding "Simple"
> since to
> > > > me at least, the unweighted-ness is obvious and the definition of
> simple
> > > is
> > > > not. To those who haven't been exposed to the finer details of
> > > statistical
> > > > definitions, I think they are more likely to read "SimpleXX" and
> wonder
> > > "is
> > > > there an 'advanced' or non-simple kind of Windowed?" than they are to
> > > > wonder what is the weighting behind these metrics.
> > > >
> > > > Sophie
> > > >
> > > > On Wed, Jul 17, 2019 at 8:18 AM John Roesler 
> wrote:
> > > >
> > > > > Thanks for the discussion, all.
> > > > >
> > > > > I've done a little more research into the statistical terminology.
> > > > > Matthias is correct, "running" and "moving" appear to be synonyms.
> > > > > Unfortunately, both can be computed either over a window of the
> last N
> > > > > measurements or over all prior measurements. "Moving" just
> signifies
> > > > > that the statistic is computed over a "live" data set, i.e., a
> > > > > continuous stream of measurements, and the expectation is that the
> > > > > stat would be updated in response to new measurements.
> > > > >
> > > > > I found https://en.wikipedia.org/wiki/Moving_average to have a
> pretty
> > > > > good overview of the whole picture.
> > > > >
> > > > > After considering the discussion so far and some light reading, it
> > > > > seems like "Cumulative" is truly the correct term for the all-time
> > > > > metrics:
> > > > >
> > > > > > In a cumulative moving average, the data arrive in an
> > > > > > ordered datum stream, and the user would like to get
> > > > > > the average of all of the data up until the current datum
> > > > > > point.
> > > > >
> > > > > I know that we previously felt that "cumulative" was too much of a
> > > > > mouthful, but it seems like our quest for a terser term led us
> into a
> > > > > briar patch. Also, now there is an independent source (the wiki
> page)
> > > > > indicating that this is indeed the correct term, and it doesn't
> offer
> > > > > any synonyms to choose from. Maybe we can take comfort in the fact
> > > > > that we'll rarely be saying the name of the classes out loud.
> > > > >
> > > > > As far as moving stats that operate over a window of the last N
> > > > > measurements, there are multiple options, including Simple
> > > > > (unweighted), Weighted, and Exponentially Weighted, and presumably
> > > > > infinite variations with other weighting functions. In our domain,
> > > > > there is only one weighting function available, but it's still more
> > > > > self-documenting and future-proof to specify the type of windowed
> > > > > statistic. Therefore, I'm proposing "Simple" as the term for the
> > > > > windowed (aka sampled) stats, while keeping Windowed in the name to
> > > > > distinguish it from the all-time metrics.
> > > > >
> > > > > > In financial applications a simple moving average (SMA)
> > > > > > is the unweighted mean of the previous n data.
> > > > >
> > > > > Therefore, we would have the proposed matrix:
> > > > >
> > > > > SimpleWindowedSum, SimpleWindowedCount
> > > > > CumulativeSum, CumulativeCount

Re: [DISCUSS] KIP-488: Clean up Sum,Count,Total Metrics

2019-07-17 Thread John Roesler
Agreed. I think the names are actually not ambiguous once you recall
that the stats summarize measurements and each measurement is a
floating point number, but there's enough overlap that I also was
initially confused as well. I do plan to make this super clear in the
documentation.

Thanks,
-John

On Wed, Jul 17, 2019 at 1:08 PM Sophie Blee-Goldman  wrote:
>
> Sounds good to me
>
> By the way, while I agree that we can't really do better than Sum and Count
> I will say I also found the distinction a bit unclear at first glance. We
> should at least document clearly that "Sum" is a "sum of values" whereas
> "Count" is a "number of things" -- but that doesn't need to be part of the
> KIP
>
> On Wed, Jul 17, 2019 at 11:00 AM John Roesler  wrote:
>
> > Thanks for the replies.
> >
> > I guess that if we did add (e.g.) ExponentiallyWeightedWindowedX or
> > something, it should still be pretty obvious that WindowedX is the
> > unweighted version? In that case, I buy the argument that we don't
> > need "Simple" and we can just go with:
> >
> > WindowedSum, WindowedCount
> > CumulativeSum, CumulativeCount
> >
> > Sound good?
> > Thanks,
> > -John
> >
> > On Wed, Jul 17, 2019 at 11:53 AM Sophie Blee-Goldman
> >  wrote:
> > >
> > > Thanks for the crash course in statistical terms :)
> > >
> > > In light of this I definitely support Cumulative{Sum,Count}, but I'm
> > really
> > > not crazy about SimpleWindowed{Sum,Count} (vs just Windowed). Not so much
> > > because of its unfortunate length (although that is unfortunate it
> > > shouldn't be a deciding factor) but because it seems to have the
> > potential
> > > to confuse further. I'm not sure what we gain by adding "Simple" since to
> > > me at least, the unweighted-ness is obvious and the definition of simple
> > is
> > > not. To those who haven't been exposed to the finer details of
> > statistical
> > > definitions, I think they are more likely to read "SimpleXX" and wonder
> > "is
> > > there an 'advanced' or non-simple kind of Windowed?" than they are to
> > > wonder what is the weighting behind these metrics.
> > >
> > > Sophie
> > >
> > > On Wed, Jul 17, 2019 at 8:18 AM John Roesler  wrote:
> > >
> > > > Thanks for the discussion, all.
> > > >
> > > > I've done a little more research into the statistical terminology.
> > > > Matthias is correct, "running" and "moving" appear to be synonyms.
> > > > Unfortunately, both can be computed either over a window of the last N
> > > > measurements or over all prior measurements. "Moving" just signifies
> > > > that the statistic is computed over a "live" data set, i.e., a
> > > > continuous stream of measurements, and the expectation is that the
> > > > stat would be updated in response to new measurements.
> > > >
> > > > I found https://en.wikipedia.org/wiki/Moving_average to have a pretty
> > > > good overview of the whole picture.
> > > >
> > > > After considering the discussion so far and some light reading, it
> > > > seems like "Cumulative" is truly the correct term for the all-time
> > > > metrics:
> > > >
> > > > > In a cumulative moving average, the data arrive in an
> > > > > ordered datum stream, and the user would like to get
> > > > > the average of all of the data up until the current datum
> > > > > point.
> > > >
> > > > I know that we previously felt that "cumulative" was too much of a
> > > > mouthful, but it seems like our quest for a terser term led us into a
> > > > briar patch. Also, now there is an independent source (the wiki page)
> > > > indicating that this is indeed the correct term, and it doesn't offer
> > > > any synonyms to choose from. Maybe we can take comfort in the fact
> > > > that we'll rarely be saying the name of the classes out loud.
> > > >
> > > > As far as moving stats that operate over a window of the last N
> > > > measurements, there are multiple options, including Simple
> > > > (unweighted), Weighted, and Exponentially Weighted, and presumably
> > > > infinite variations with other weighting functions. In our domain,
> > > > there is only one weighting function available, but it's still more
> > > > self-documenting and future-proof to specify the type of windowed
> > > > statistic. Therefore, I'm proposing "Simple" as the term for the
> > > > windowed (aka sampled) stats, while keeping Windowed in the name to
> > > > distinguish it from the all-time metrics.
> > > >
> > > > > In financial applications a simple moving average (SMA)
> > > > > is the unweighted mean of the previous n data.
> > > >
> > > > Therefore, we would have the proposed matrix:
> > > >
> > > > SimpleWindowedSum, SimpleWindowedCount
> > > > CumulativeSum, CumulativeCount
> > > >
> > > > Again, all these proposed names are less pithy than we might wish, but
> > > > the whole point of this exercise is to demystify and disambiguate
> > > > them. It seems like the discussion so far illustrates the futility of
> > > > trying to find names that are both short and descriptive.
> > > >
> > > > How does 

Build failed in Jenkins: kafka-trunk-jdk8 #3793

2019-07-17 Thread Apache Jenkins Server
See 

--
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on H35 (ubuntu xenial) in workspace 

[WS-CLEANUP] Deleting project workspace...
[WS-CLEANUP] Deferred wipeout is used...
[WS-CLEANUP] Done
No credentials specified
Cloning the remote Git repository
Cloning repository https://github.com/apache/kafka.git
 > git init  # timeout=10
Fetching upstream changes from https://github.com/apache/kafka.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
ERROR: Error cloning remote repo 'origin'
hudson.plugins.git.GitException: Command "git fetch --tags --progress 
https://github.com/apache/kafka.git +refs/heads/*:refs/remotes/origin/*" 
returned status code 128:
stdout: 
stderr: fatal: unable to access 'https://github.com/apache/kafka.git/': Could 
not resolve host: github.com

at 
org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:2042)
at 
org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandWithCredentials(CliGitAPIImpl.java:1761)
at 
org.jenkinsci.plugins.gitclient.CliGitAPIImpl.access$400(CliGitAPIImpl.java:72)
at 
org.jenkinsci.plugins.gitclient.CliGitAPIImpl$1.execute(CliGitAPIImpl.java:442)
at 
org.jenkinsci.plugins.gitclient.CliGitAPIImpl$2.execute(CliGitAPIImpl.java:655)
at 
org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1.call(RemoteGitImpl.java:153)
at 
org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1.call(RemoteGitImpl.java:146)
at hudson.remoting.UserRequest.perform(UserRequest.java:212)
at hudson.remoting.UserRequest.perform(UserRequest.java:54)
at hudson.remoting.Request$2.run(Request.java:369)
at 
hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Suppressed: hudson.remoting.Channel$CallSiteStackTrace: Remote call to 
H35
at 
hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1741)
at 
hudson.remoting.UserRequest$ExceptionResponse.retrieve(UserRequest.java:357)
at hudson.remoting.Channel.call(Channel.java:955)
at 
org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.execute(RemoteGitImpl.java:146)
at sun.reflect.GeneratedMethodAccessor974.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.invoke(RemoteGitImpl.java:132)
at com.sun.proxy.$Proxy135.execute(Unknown Source)
at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1152)
at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1192)
at hudson.scm.SCM.checkout(SCM.java:504)
at 
hudson.model.AbstractProject.checkout(AbstractProject.java:1208)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:574)
at 
jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:499)
at hudson.model.Run.execute(Run.java:1810)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at 
hudson.model.ResourceController.execute(ResourceController.java:97)
at hudson.model.Executor.run(Executor.java:429)
ERROR: Error cloning remote repo 'origin'
Retrying after 10 seconds
No credentials specified
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/kafka.git # timeout=10
Fetching upstream changes from https://github.com/apache/kafka.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
ERROR: Error fetching remote repo 'origin'
hudson.plugins.git.GitException: Failed to fetch from 
https://github.com/apache/kafka.git
at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:894)
at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1161)
at 

Re: [DISCUSS] KIP-488: Clean up Sum,Count,Total Metrics

2019-07-17 Thread Sophie Blee-Goldman
Sounds good to me

By the way, while I agree that we can't really do better than Sum and Count
I will say I also found the distinction a bit unclear at first glance. We
should at least document clearly that "Sum" is a "sum of values" whereas
"Count" is a "number of things" -- but that doesn't need to be part of the
KIP

On Wed, Jul 17, 2019 at 11:00 AM John Roesler  wrote:

> Thanks for the replies.
>
> I guess that if we did add (e.g.) ExponentiallyWeightedWindowedX or
> something, it should still be pretty obvious that WindowedX is the
> unweighted version? In that case, I buy the argument that we don't
> need "Simple" and we can just go with:
>
> WindowedSum, WindowedCount
> CumulativeSum, CumulativeCount
>
> Sound good?
> Thanks,
> -John
>
> On Wed, Jul 17, 2019 at 11:53 AM Sophie Blee-Goldman
>  wrote:
> >
> > Thanks for the crash course in statistical terms :)
> >
> > In light of this I definitely support Cumulative{Sum,Count}, but I'm
> really
> > not crazy about SimpleWindowed{Sum,Count} (vs just Windowed). Not so much
> > because of its unfortunate length (although that is unfortunate it
> > shouldn't be a deciding factor) but because it seems to have the
> potential
> > to confuse further. I'm not sure what we gain by adding "Simple" since to
> > me at least, the unweighted-ness is obvious and the definition of simple
> is
> > not. To those who haven't been exposed to the finer details of
> statistical
> > definitions, I think they are more likely to read "SimpleXX" and wonder
> "is
> > there an 'advanced' or non-simple kind of Windowed?" than they are to
> > wonder what is the weighting behind these metrics.
> >
> > Sophie
> >
> > On Wed, Jul 17, 2019 at 8:18 AM John Roesler  wrote:
> >
> > > Thanks for the discussion, all.
> > >
> > > I've done a little more research into the statistical terminology.
> > > Matthias is correct, "running" and "moving" appear to be synonyms.
> > > Unfortunately, both can be computed either over a window of the last N
> > > measurements or over all prior measurements. "Moving" just signifies
> > > that the statistic is computed over a "live" data set, i.e., a
> > > continuous stream of measurements, and the expectation is that the
> > > stat would be updated in response to new measurements.
> > >
> > > I found https://en.wikipedia.org/wiki/Moving_average to have a pretty
> > > good overview of the whole picture.
> > >
> > > After considering the discussion so far and some light reading, it
> > > seems like "Cumulative" is truly the correct term for the all-time
> > > metrics:
> > >
> > > > In a cumulative moving average, the data arrive in an
> > > > ordered datum stream, and the user would like to get
> > > > the average of all of the data up until the current datum
> > > > point.
> > >
> > > I know that we previously felt that "cumulative" was too much of a
> > > mouthful, but it seems like our quest for a terser term led us into a
> > > briar patch. Also, now there is an independent source (the wiki page)
> > > indicating that this is indeed the correct term, and it doesn't offer
> > > any synonyms to choose from. Maybe we can take comfort in the fact
> > > that we'll rarely be saying the name of the classes out loud.
> > >
> > > As far as moving stats that operate over a window of the last N
> > > measurements, there are multiple options, including Simple
> > > (unweighted), Weighted, and Exponentially Weighted, and presumably
> > > infinite variations with other weighting functions. In our domain,
> > > there is only one weighting function available, but it's still more
> > > self-documenting and future-proof to specify the type of windowed
> > > statistic. Therefore, I'm proposing "Simple" as the term for the
> > > windowed (aka sampled) stats, while keeping Windowed in the name to
> > > distinguish it from the all-time metrics.
> > >
> > > > In financial applications a simple moving average (SMA)
> > > > is the unweighted mean of the previous n data.
> > >
> > > Therefore, we would have the proposed matrix:
> > >
> > > SimpleWindowedSum, SimpleWindowedCount
> > > CumulativeSum, CumulativeCount
> > >
> > > Again, all these proposed names are less pithy than we might wish, but
> > > the whole point of this exercise is to demystify and disambiguate
> > > them. It seems like the discussion so far illustrates the futility of
> > > trying to find names that are both short and descriptive.
> > >
> > > How does that sound?
> > > -John
> > >
> > > On Tue, Jul 16, 2019 at 4:43 PM Matthias J. Sax  >
> > > wrote:
> > > >
> > > > It's a fair point that Ryanne raises. However, "running sum" is the
> same
> > > > as "moving sum" from my understanding.
> > > >
> > > > The issue is still, that `Sum` and `Count` which seem to be the
> cleanest
> > > > names cannot be used. While I agree that `TotalSum` and `TotalCount`
> is
> > > > somewhat redundant, I still think it the best suggestion so far.
> > > >
> > > > For the "sampled" version, I am personally fine with either
> 

Re: [DISCUSS] KIP-221: Repartition Topic Hints in Streams

2019-07-17 Thread Sophie Blee-Goldman
Hey Levani,

Thanks for the KIP! Can you clarify one thing for me -- for the
KStream#repartition signature taking a Repartitioned, will the topic be
auto-created by Streams (which seems to be the case for the signature
without a Repartitioned) or does it have to be pre-created? The wording in
the KIP makes it seem like one version of the method will auto-create
topics while the other will not.

Cheers,
Sophie

On Wed, Jul 17, 2019 at 10:15 AM Levani Kokhreidze 
wrote:

> Hello,
>
> One more bump about KIP-221 (
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint>)
> so it doesn’t get lost in mailing list :)
> Would love to hear communities opinions/concerns about this KIP.
>
> Regards,
> Levani
>
>
> > On Jul 12, 2019, at 5:27 PM, Levani Kokhreidze 
> wrote:
> >
> > Hello,
> >
> > Kind reminder about this KIP:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> >
> >
> > Regards,
> > Levani
> >
> >> On Jul 9, 2019, at 11:38 AM, Levani Kokhreidze  > wrote:
> >>
> >> Hello,
> >>
> >> In order to move this KIP forward, I’ve updated confluence page with
> the new proposal
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221:+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
> >
> >> I’ve also filled “Rejected Alternatives” section.
> >>
> >> Looking forward to discuss this KIP :)
> >>
> >> King regards,
> >> Levani
> >>
> >>
> >>> On Jul 3, 2019, at 1:08 PM, Levani Kokhreidze  > wrote:
> >>>
> >>> Hello Matthias,
> >>>
> >>> Thanks for the feedback and ideas.
> >>> I like the idea of introducing dedicated `Topic` class for topic
> configuration for internal operators like `groupedBy`.
> >>> Would be great to hear others opinion about this as well.
> >>>
> >>> Kind regards,
> >>> Levani
> >>>
> >>>
>  On Jul 3, 2019, at 7:00 AM, Matthias J. Sax  > wrote:
> 
>  Levani,
> 
>  Thanks for picking up this KIP! And thanks for summarizing everything.
>  Even if some points may have been discussed already (can't really
>  remember), it's helpful to get a good summary to refresh the
> discussion.
> 
>  I think your reasoning makes sense. With regard to the distinction
>  between operators that manage topics and operators that use
> user-created
>  topics: Following this argument, it might indicate that leaving
>  `through()` as-is (as an operator that uses use-defined topics) and
>  introducing a new `repartition()` operator (an operator that manages
>  topics itself) might be good. Otherwise, there is one operator
>  `through()` that sometimes manages topics but sometimes not; a
> different
>  name, ie, new operator would make the distinction clearer.
> 
>  About adding `numOfPartitions` to `Grouped`. I am wondering if the
> same
>  argument as for `Produced` does apply and adding it is semantically
>  questionable? Might be good to get opinions of others on this, too. I
> am
>  not sure myself what solution I prefer atm.
> 
>  So far, KS uses configuration objects that allow to configure a
> certain
>  "entity" like a consumer, producer, store. If we assume that a topic
> is
>  a similar entity, I am wonder if we should have a
>  `Topic#withNumberOfPartitions()` class and method instead of a plain
>  integer? This would allow us to add other configs, like replication
>  factor, retention-time etc, easily, without the need to change the
> "main
>  API".
> 
>  Just want to give some ideas. Not sure if I like them myself. :)
> 
> 
>  -Matthias
> 
> 
> 
>  On 7/1/19 1:04 AM, Levani Kokhreidze wrote:
> > Actually, giving it more though - maybe enhancing Produced with num
> of partitions configuration is not the best approach. Let me explain why:
> >
> > 1) If we enhance Produced class with this configuration, this will
> also affect KStream#to operation. Since KStream#to is the final sink of the
> topology, for me, it seems to be reasonable assumption that user needs to
> manually create sink topic in advance. And in that case, having num of
> partitions configuration doesn’t make much sense.
> >
> > 2) Looking at Produced class, based on API contract, seems like
> Produced is designed to be something that is explicitly for producer (key
> serializer, value serializer, partitioner those 

Re: [DISCUSS] KIP-488: Clean up Sum,Count,Total Metrics

2019-07-17 Thread John Roesler
Thanks for the replies.

I guess that if we did add (e.g.) ExponentiallyWeightedWindowedX or
something, it should still be pretty obvious that WindowedX is the
unweighted version? In that case, I buy the argument that we don't
need "Simple" and we can just go with:

WindowedSum, WindowedCount
CumulativeSum, CumulativeCount

Sound good?
Thanks,
-John

On Wed, Jul 17, 2019 at 11:53 AM Sophie Blee-Goldman
 wrote:
>
> Thanks for the crash course in statistical terms :)
>
> In light of this I definitely support Cumulative{Sum,Count}, but I'm really
> not crazy about SimpleWindowed{Sum,Count} (vs just Windowed). Not so much
> because of its unfortunate length (although that is unfortunate it
> shouldn't be a deciding factor) but because it seems to have the potential
> to confuse further. I'm not sure what we gain by adding "Simple" since to
> me at least, the unweighted-ness is obvious and the definition of simple is
> not. To those who haven't been exposed to the finer details of statistical
> definitions, I think they are more likely to read "SimpleXX" and wonder "is
> there an 'advanced' or non-simple kind of Windowed?" than they are to
> wonder what is the weighting behind these metrics.
>
> Sophie
>
> On Wed, Jul 17, 2019 at 8:18 AM John Roesler  wrote:
>
> > Thanks for the discussion, all.
> >
> > I've done a little more research into the statistical terminology.
> > Matthias is correct, "running" and "moving" appear to be synonyms.
> > Unfortunately, both can be computed either over a window of the last N
> > measurements or over all prior measurements. "Moving" just signifies
> > that the statistic is computed over a "live" data set, i.e., a
> > continuous stream of measurements, and the expectation is that the
> > stat would be updated in response to new measurements.
> >
> > I found https://en.wikipedia.org/wiki/Moving_average to have a pretty
> > good overview of the whole picture.
> >
> > After considering the discussion so far and some light reading, it
> > seems like "Cumulative" is truly the correct term for the all-time
> > metrics:
> >
> > > In a cumulative moving average, the data arrive in an
> > > ordered datum stream, and the user would like to get
> > > the average of all of the data up until the current datum
> > > point.
> >
> > I know that we previously felt that "cumulative" was too much of a
> > mouthful, but it seems like our quest for a terser term led us into a
> > briar patch. Also, now there is an independent source (the wiki page)
> > indicating that this is indeed the correct term, and it doesn't offer
> > any synonyms to choose from. Maybe we can take comfort in the fact
> > that we'll rarely be saying the name of the classes out loud.
> >
> > As far as moving stats that operate over a window of the last N
> > measurements, there are multiple options, including Simple
> > (unweighted), Weighted, and Exponentially Weighted, and presumably
> > infinite variations with other weighting functions. In our domain,
> > there is only one weighting function available, but it's still more
> > self-documenting and future-proof to specify the type of windowed
> > statistic. Therefore, I'm proposing "Simple" as the term for the
> > windowed (aka sampled) stats, while keeping Windowed in the name to
> > distinguish it from the all-time metrics.
> >
> > > In financial applications a simple moving average (SMA)
> > > is the unweighted mean of the previous n data.
> >
> > Therefore, we would have the proposed matrix:
> >
> > SimpleWindowedSum, SimpleWindowedCount
> > CumulativeSum, CumulativeCount
> >
> > Again, all these proposed names are less pithy than we might wish, but
> > the whole point of this exercise is to demystify and disambiguate
> > them. It seems like the discussion so far illustrates the futility of
> > trying to find names that are both short and descriptive.
> >
> > How does that sound?
> > -John
> >
> > On Tue, Jul 16, 2019 at 4:43 PM Matthias J. Sax 
> > wrote:
> > >
> > > It's a fair point that Ryanne raises. However, "running sum" is the same
> > > as "moving sum" from my understanding.
> > >
> > > The issue is still, that `Sum` and `Count` which seem to be the cleanest
> > > names cannot be used. While I agree that `TotalSum` and `TotalCount` is
> > > somewhat redundant, I still think it the best suggestion so far.
> > >
> > > For the "sampled" version, I am personally fine with either `MovingXxx`,
> > > `WindowedXxx`, or `RunningXxx` -- to me, that are all equally good to
> > > describe the semantics.
> > >
> > >
> > >
> > > -Matthias
> > >
> > >
> > >
> > > On 7/16/19 2:25 PM, Sophie Blee-Goldman wrote:
> > > > I'm +1 on Windowed, was about to suggest that as I was catching up on
> > the
> > > > discussion but Bill beat me to it :)
> > > >
> > > > On Tue, Jul 16, 2019 at 2:23 PM Bill Bejeck  wrote:
> > > >
> > > >> Hi John,
> > > >>
> > > >> Thanks for the updates.
> > > >>
> > > >> I like RunningCount and RunningSum.
> > > >>
> > > >> What about WindowedCount, 

Re: Add to contribution list

2019-07-17 Thread Bill Bejeck
No problem Omar.  You're all set now in the wiki as well.

-Bill

On Wed, Jul 17, 2019 at 1:19 PM Omar Al-Safi  wrote:

> Awesome, thanks a lot! Correct apparently it doesn't use Jira crowd
> directory. I have created a username there, the same username: omarsmak
>
> Thanks again!
> Omar
>
> On Wed, 17 Jul 2019 at 19:06, Bill Bejeck  wrote:
>
> > Thanks for your interest in Apache Kafka.
> >
> > You're all set with Jira, but I couldn't find your user name in the wiki.
> > The wiki requires a separate username creation.
> >
> > Can you add yourself there and ping the list again and we'll get you
> added.
> >
> > Thanks,
> > Bill
> >
> > On Wed, Jul 17, 2019 at 6:42 AM Omar Al-Safi  wrote:
> >
> > > Hi guys,
> > >
> > > I would like to be added to the contribution list in Jira and Wiki in
> > order
> > > to work on some issues. My username is : omarsmak
> > >
> > > Thanks
> > >
> >
>


Build failed in Jenkins: kafka-trunk-jdk8 #3792

2019-07-17 Thread Apache Jenkins Server
See 

--
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on H35 (ubuntu xenial) in workspace 

[WS-CLEANUP] Deleting project workspace...
[WS-CLEANUP] Deferred wipeout is used...
[WS-CLEANUP] Done
No credentials specified
Cloning the remote Git repository
Cloning repository https://github.com/apache/kafka.git
 > git init  # timeout=10
Fetching upstream changes from https://github.com/apache/kafka.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
ERROR: Error cloning remote repo 'origin'
hudson.plugins.git.GitException: Command "git fetch --tags --progress 
https://github.com/apache/kafka.git +refs/heads/*:refs/remotes/origin/*" 
returned status code 128:
stdout: 
stderr: fatal: unable to access 'https://github.com/apache/kafka.git/': Could 
not resolve host: github.com

at 
org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:2042)
at 
org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandWithCredentials(CliGitAPIImpl.java:1761)
at 
org.jenkinsci.plugins.gitclient.CliGitAPIImpl.access$400(CliGitAPIImpl.java:72)
at 
org.jenkinsci.plugins.gitclient.CliGitAPIImpl$1.execute(CliGitAPIImpl.java:442)
at 
org.jenkinsci.plugins.gitclient.CliGitAPIImpl$2.execute(CliGitAPIImpl.java:655)
at 
org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1.call(RemoteGitImpl.java:153)
at 
org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1.call(RemoteGitImpl.java:146)
at hudson.remoting.UserRequest.perform(UserRequest.java:212)
at hudson.remoting.UserRequest.perform(UserRequest.java:54)
at hudson.remoting.Request$2.run(Request.java:369)
at 
hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Suppressed: hudson.remoting.Channel$CallSiteStackTrace: Remote call to 
H35
at 
hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1741)
at 
hudson.remoting.UserRequest$ExceptionResponse.retrieve(UserRequest.java:357)
at hudson.remoting.Channel.call(Channel.java:955)
at 
org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.execute(RemoteGitImpl.java:146)
at sun.reflect.GeneratedMethodAccessor974.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.invoke(RemoteGitImpl.java:132)
at com.sun.proxy.$Proxy135.execute(Unknown Source)
at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1152)
at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1192)
at hudson.scm.SCM.checkout(SCM.java:504)
at 
hudson.model.AbstractProject.checkout(AbstractProject.java:1208)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:574)
at 
jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:499)
at hudson.model.Run.execute(Run.java:1810)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at 
hudson.model.ResourceController.execute(ResourceController.java:97)
at hudson.model.Executor.run(Executor.java:429)
ERROR: Error cloning remote repo 'origin'
Retrying after 10 seconds
No credentials specified
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/kafka.git # timeout=10
Fetching upstream changes from https://github.com/apache/kafka.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
ERROR: Error fetching remote repo 'origin'
hudson.plugins.git.GitException: Failed to fetch from 
https://github.com/apache/kafka.git
at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:894)
at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1161)
at 

Re: [DISCUSS] KIP-479: Add Materialized to Join

2019-07-17 Thread Bill Bejeck
Thanks for the response, John.

> If I can offer my thoughts, it seems better to just document on the
> Stream join javadoc for the Materialized parameter that it will not
> make the join result queriable. I'm not opposed to the queriable flag
> in general, but introducing it is a much larger consideration that has
> previously derailed this KIP discussion. In the interest of just
> closing the gap and keeping the API change small, it seems better to
> just go with documentation for now.

I agree with your statement here.  IMHO the most important goal of this KIP
is to not breaking existing users and gain some consistency of the API.

I'll update the KIP accordingly.

-Bill

On Tue, Jul 16, 2019 at 11:55 AM John Roesler  wrote:

> Hi Bill,
>
> Thanks for driving this KIP toward a conclusion. I'm on board with
> your decision.
>
> You didn't mention whether you're still proposing to add the
> "queriable" flag to the Materialized config object, or just document
> that a Stream join is never queriable. Both options have come up
> earlier in the discussion, so it would be good to pin this down.
>
> If I can offer my thoughts, it seems better to just document on the
> Stream join javadoc for the Materialized parameter that it will not
> make the join result queriable. I'm not opposed to the queriable flag
> in general, but introducing it is a much larger consideration that has
> previously derailed this KIP discussion. In the interest of just
> closing the gap and keeping the API change small, it seems better to
> just go with documentation for now.
>
> Thanks again,
> -John
>
> On Thu, Jul 11, 2019 at 2:45 PM Bill Bejeck  wrote:
> >
> > Thanks all for the great discussion so far.
> >
> > Everyone has made excellent points, and I appreciate the detail everyone
> > has put into their arguments.
> >
> > However, after carefully evaluating all the points made so far, creating
> an
> > overload with Materialized is still my #1 option.
> > My reasoning for saying so is two-fold:
> >
> >1. It's a small change, and IMHO since it's consistent with our
> current
> >API concerning state store usage, the cognitive load on users will be
> >minimal.
> >2. It achieves the most important goal of this KIP, namely to close
> the
> >gap of naming state stores independently of the join operator name.
> >
> > Additionally, I agree with the points made by Matthias earlier (I realize
> > there is some overlap here).
> >
> > >  - the main purpose of this KIP is to close the naming gap what we
> achieve
> > >  - we can allow people to use the new in-memory store
> > >  - we allow people to enable/disable caching
> > >  - we unify the API
> > >  - we decouple querying from naming
> > >  - it's a small API change
> >
> > Although it's not a perfect solution,  IMHO the positives of using
> > Materialize far outweigh the negatives, and from what we've discussed so
> > far, anything we implement seems to involve an additional change down the
> > road.
> >
> > If others are still strongly opposed to using Materialized, my other
> > preferences would be
> >
> >1. Add a "withStoreName" to Joined.  Although I agree with Guozhang
> that
> >having a parameter that only applies to one use-case would be clumsy.
> >2. Add a String overload for naming the store, but this would be my
> >least favorite option as IMHO this seems to be a step backward from
> why we
> >introduced configuration objects in the first place.
> >
> > Thanks,
> > Bill
> >
> > On Thu, Jun 27, 2019 at 4:45 PM Matthias J. Sax 
> > wrote:
> >
> > > Thanks for the KIP Bill!
> > >
> > > Great discussion to far.
> > >
> > > About John's idea about querying upstream stores and don't materialize
> a
> > > store: I agree with Bill that this seems to be an orthogonal question,
> > > and it might be better to treat it as an independent optimization and
> > > exclude from this KIP.
> > >
> > > > What should be the behavior if there is no store
> > > > configured (e.g., if Materialized with only serdes) and querying is
> > > > enabled?
> > >
> > > IMHO, this could be an error case. If one wants to query a store, they
> > > need to provide a name -- if you don't know the name, how would you
> > > actually query the store (even if it would be possible to get the name
> > > from the `TopologyDescription`, it seems clumsy).
> > >
> > > If we don't want to throw an error, materializing seems to be the right
> > > option, to exclude "query optimization" from this KIP. I would be ok
> > > with this option, even if it's clumsy to get the name from
> > > `TopologyDescription`; hence, I would prefer to treat it as an error.
> > >
> > > > To get back to the current behavior, users would have to
> > > > add a "bytes store supplier" to the Materialized to indicate that,
> > > > yes, they really want a state store there.
> > >
> > > This sound like a quite subtle semantic difference on how to use the
> > > API. Might be hard to explain to users. I would prefer to 

Re: Add to contribution list

2019-07-17 Thread Omar Al-Safi
Awesome, thanks a lot! Correct apparently it doesn't use Jira crowd
directory. I have created a username there, the same username: omarsmak

Thanks again!
Omar

On Wed, 17 Jul 2019 at 19:06, Bill Bejeck  wrote:

> Thanks for your interest in Apache Kafka.
>
> You're all set with Jira, but I couldn't find your user name in the wiki.
> The wiki requires a separate username creation.
>
> Can you add yourself there and ping the list again and we'll get you added.
>
> Thanks,
> Bill
>
> On Wed, Jul 17, 2019 at 6:42 AM Omar Al-Safi  wrote:
>
> > Hi guys,
> >
> > I would like to be added to the contribution list in Jira and Wiki in
> order
> > to work on some issues. My username is : omarsmak
> >
> > Thanks
> >
>


Re: [DISCUSS] KIP-221: Repartition Topic Hints in Streams

2019-07-17 Thread Levani Kokhreidze
Hello,

One more bump about KIP-221 
(https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
 
)
 so it doesn’t get lost in mailing list :)
Would love to hear communities opinions/concerns about this KIP.

Regards,
Levani


> On Jul 12, 2019, at 5:27 PM, Levani Kokhreidze  wrote:
> 
> Hello,
> 
> Kind reminder about this KIP: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
>  
> 
> 
> Regards,
> Levani
> 
>> On Jul 9, 2019, at 11:38 AM, Levani Kokhreidze > > wrote:
>> 
>> Hello,
>> 
>> In order to move this KIP forward, I’ve updated confluence page with the new 
>> proposal 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Enhance+KStream+with+Connecting+Topic+Creation+and+Repartition+Hint
>>  
>> 
>> I’ve also filled “Rejected Alternatives” section. 
>> 
>> Looking forward to discuss this KIP :)
>> 
>> King regards,
>> Levani
>> 
>> 
>>> On Jul 3, 2019, at 1:08 PM, Levani Kokhreidze >> > wrote:
>>> 
>>> Hello Matthias,
>>> 
>>> Thanks for the feedback and ideas. 
>>> I like the idea of introducing dedicated `Topic` class for topic 
>>> configuration for internal operators like `groupedBy`.
>>> Would be great to hear others opinion about this as well.
>>> 
>>> Kind regards,
>>> Levani 
>>> 
>>> 
 On Jul 3, 2019, at 7:00 AM, Matthias J. Sax >>> > wrote:
 
 Levani,
 
 Thanks for picking up this KIP! And thanks for summarizing everything.
 Even if some points may have been discussed already (can't really
 remember), it's helpful to get a good summary to refresh the discussion.
 
 I think your reasoning makes sense. With regard to the distinction
 between operators that manage topics and operators that use user-created
 topics: Following this argument, it might indicate that leaving
 `through()` as-is (as an operator that uses use-defined topics) and
 introducing a new `repartition()` operator (an operator that manages
 topics itself) might be good. Otherwise, there is one operator
 `through()` that sometimes manages topics but sometimes not; a different
 name, ie, new operator would make the distinction clearer.
 
 About adding `numOfPartitions` to `Grouped`. I am wondering if the same
 argument as for `Produced` does apply and adding it is semantically
 questionable? Might be good to get opinions of others on this, too. I am
 not sure myself what solution I prefer atm.
 
 So far, KS uses configuration objects that allow to configure a certain
 "entity" like a consumer, producer, store. If we assume that a topic is
 a similar entity, I am wonder if we should have a
 `Topic#withNumberOfPartitions()` class and method instead of a plain
 integer? This would allow us to add other configs, like replication
 factor, retention-time etc, easily, without the need to change the "main
 API".
 
 Just want to give some ideas. Not sure if I like them myself. :)
 
 
 -Matthias
 
 
 
 On 7/1/19 1:04 AM, Levani Kokhreidze wrote:
> Actually, giving it more though - maybe enhancing Produced with num of 
> partitions configuration is not the best approach. Let me explain why:
> 
> 1) If we enhance Produced class with this configuration, this will also 
> affect KStream#to operation. Since KStream#to is the final sink of the 
> topology, for me, it seems to be reasonable assumption that user needs to 
> manually create sink topic in advance. And in that case, having num of 
> partitions configuration doesn’t make much sense. 
> 
> 2) Looking at Produced class, based on API contract, seems like Produced 
> is designed to be something that is explicitly for producer (key 
> serializer, value serializer, partitioner those all are producer specific 
> configurations) and num of partitions is topic level configuration. And I 
> don’t think mixing topic and producer level configurations together in 
> one class is the good approach.
> 
> 3) Looking at KStream interface, seems like Produced parameter is for 
> operations that work with non-internal (e.g topics created and managed 
> internally by Kafka Streams) topics and I think we should leave it as it 
> is in that case.
> 
> Taking all this things into account, I think we should distinguish 
> between DSL 

Re: Add to contribution list

2019-07-17 Thread Bill Bejeck
Thanks for your interest in Apache Kafka.

You're all set with Jira, but I couldn't find your user name in the wiki.
The wiki requires a separate username creation.

Can you add yourself there and ping the list again and we'll get you added.

Thanks,
Bill

On Wed, Jul 17, 2019 at 6:42 AM Omar Al-Safi  wrote:

> Hi guys,
>
> I would like to be added to the contribution list in Jira and Wiki in order
> to work on some issues. My username is : omarsmak
>
> Thanks
>


[jira] [Created] (KAFKA-8677) Flakey test GroupEndToEndAuthorizationTest#testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl

2019-07-17 Thread Boyang Chen (JIRA)
Boyang Chen created KAFKA-8677:
--

 Summary: Flakey test 
GroupEndToEndAuthorizationTest#testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl
 Key: KAFKA-8677
 URL: https://issues.apache.org/jira/browse/KAFKA-8677
 Project: Kafka
  Issue Type: Bug
Reporter: Boyang Chen


[https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/6325/console]



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


Re: [DISCUSS] KIP-488: Clean up Sum,Count,Total Metrics

2019-07-17 Thread Sophie Blee-Goldman
Thanks for the crash course in statistical terms :)

In light of this I definitely support Cumulative{Sum,Count}, but I'm really
not crazy about SimpleWindowed{Sum,Count} (vs just Windowed). Not so much
because of its unfortunate length (although that is unfortunate it
shouldn't be a deciding factor) but because it seems to have the potential
to confuse further. I'm not sure what we gain by adding "Simple" since to
me at least, the unweighted-ness is obvious and the definition of simple is
not. To those who haven't been exposed to the finer details of statistical
definitions, I think they are more likely to read "SimpleXX" and wonder "is
there an 'advanced' or non-simple kind of Windowed?" than they are to
wonder what is the weighting behind these metrics.

Sophie

On Wed, Jul 17, 2019 at 8:18 AM John Roesler  wrote:

> Thanks for the discussion, all.
>
> I've done a little more research into the statistical terminology.
> Matthias is correct, "running" and "moving" appear to be synonyms.
> Unfortunately, both can be computed either over a window of the last N
> measurements or over all prior measurements. "Moving" just signifies
> that the statistic is computed over a "live" data set, i.e., a
> continuous stream of measurements, and the expectation is that the
> stat would be updated in response to new measurements.
>
> I found https://en.wikipedia.org/wiki/Moving_average to have a pretty
> good overview of the whole picture.
>
> After considering the discussion so far and some light reading, it
> seems like "Cumulative" is truly the correct term for the all-time
> metrics:
>
> > In a cumulative moving average, the data arrive in an
> > ordered datum stream, and the user would like to get
> > the average of all of the data up until the current datum
> > point.
>
> I know that we previously felt that "cumulative" was too much of a
> mouthful, but it seems like our quest for a terser term led us into a
> briar patch. Also, now there is an independent source (the wiki page)
> indicating that this is indeed the correct term, and it doesn't offer
> any synonyms to choose from. Maybe we can take comfort in the fact
> that we'll rarely be saying the name of the classes out loud.
>
> As far as moving stats that operate over a window of the last N
> measurements, there are multiple options, including Simple
> (unweighted), Weighted, and Exponentially Weighted, and presumably
> infinite variations with other weighting functions. In our domain,
> there is only one weighting function available, but it's still more
> self-documenting and future-proof to specify the type of windowed
> statistic. Therefore, I'm proposing "Simple" as the term for the
> windowed (aka sampled) stats, while keeping Windowed in the name to
> distinguish it from the all-time metrics.
>
> > In financial applications a simple moving average (SMA)
> > is the unweighted mean of the previous n data.
>
> Therefore, we would have the proposed matrix:
>
> SimpleWindowedSum, SimpleWindowedCount
> CumulativeSum, CumulativeCount
>
> Again, all these proposed names are less pithy than we might wish, but
> the whole point of this exercise is to demystify and disambiguate
> them. It seems like the discussion so far illustrates the futility of
> trying to find names that are both short and descriptive.
>
> How does that sound?
> -John
>
> On Tue, Jul 16, 2019 at 4:43 PM Matthias J. Sax 
> wrote:
> >
> > It's a fair point that Ryanne raises. However, "running sum" is the same
> > as "moving sum" from my understanding.
> >
> > The issue is still, that `Sum` and `Count` which seem to be the cleanest
> > names cannot be used. While I agree that `TotalSum` and `TotalCount` is
> > somewhat redundant, I still think it the best suggestion so far.
> >
> > For the "sampled" version, I am personally fine with either `MovingXxx`,
> > `WindowedXxx`, or `RunningXxx` -- to me, that are all equally good to
> > describe the semantics.
> >
> >
> >
> > -Matthias
> >
> >
> >
> > On 7/16/19 2:25 PM, Sophie Blee-Goldman wrote:
> > > I'm +1 on Windowed, was about to suggest that as I was catching up on
> the
> > > discussion but Bill beat me to it :)
> > >
> > > On Tue, Jul 16, 2019 at 2:23 PM Bill Bejeck  wrote:
> > >
> > >> Hi John,
> > >>
> > >> Thanks for the updates.
> > >>
> > >> I like RunningCount and RunningSum.
> > >>
> > >> What about WindowedCount, WindowedSum instead of Moving?
> > >> I'm just throwing this out there as Windowed seems more intuitive to
> me,
> > >> but I'm not married to the idea.
> > >>
> > >> -Bill
> > >>
> > >> On Tue, Jul 16, 2019 at 5:09 PM John Roesler 
> wrote:
> > >>
> > >>> No worries! Choosing good public API names is a high-impact design
> > >>> activity.
> > >>>
> > >>> Matthias, Bruno, Bill, and Stanislav,
> > >>>
> > >>> You've all contributed to this discussion or the vote so far... How
> do
> > >>> you feel about the proposed name change:
> > >>>
> > >>> MovingCount, MovingSum   (instead of Sampled)
> > >>> RunningCount, RunningSum   

Re: [DISCUSS] KIP-488: Clean up Sum,Count,Total Metrics

2019-07-17 Thread Bill Bejeck
I'm fine with the new names.

While I was previously in the "brevity" camp when it came to the Cumulative
name, I'll take clarity over brevity.

Thanks for the updates, John.

-Bill

On Wed, Jul 17, 2019 at 11:18 AM John Roesler  wrote:

> Thanks for the discussion, all.
>
> I've done a little more research into the statistical terminology.
> Matthias is correct, "running" and "moving" appear to be synonyms.
> Unfortunately, both can be computed either over a window of the last N
> measurements or over all prior measurements. "Moving" just signifies
> that the statistic is computed over a "live" data set, i.e., a
> continuous stream of measurements, and the expectation is that the
> stat would be updated in response to new measurements.
>
> I found https://en.wikipedia.org/wiki/Moving_average to have a pretty
> good overview of the whole picture.
>
> After considering the discussion so far and some light reading, it
> seems like "Cumulative" is truly the correct term for the all-time
> metrics:
>
> > In a cumulative moving average, the data arrive in an
> > ordered datum stream, and the user would like to get
> > the average of all of the data up until the current datum
> > point.
>
> I know that we previously felt that "cumulative" was too much of a
> mouthful, but it seems like our quest for a terser term led us into a
> briar patch. Also, now there is an independent source (the wiki page)
> indicating that this is indeed the correct term, and it doesn't offer
> any synonyms to choose from. Maybe we can take comfort in the fact
> that we'll rarely be saying the name of the classes out loud.
>
> As far as moving stats that operate over a window of the last N
> measurements, there are multiple options, including Simple
> (unweighted), Weighted, and Exponentially Weighted, and presumably
> infinite variations with other weighting functions. In our domain,
> there is only one weighting function available, but it's still more
> self-documenting and future-proof to specify the type of windowed
> statistic. Therefore, I'm proposing "Simple" as the term for the
> windowed (aka sampled) stats, while keeping Windowed in the name to
> distinguish it from the all-time metrics.
>
> > In financial applications a simple moving average (SMA)
> > is the unweighted mean of the previous n data.
>
> Therefore, we would have the proposed matrix:
>
> SimpleWindowedSum, SimpleWindowedCount
> CumulativeSum, CumulativeCount
>
> Again, all these proposed names are less pithy than we might wish, but
> the whole point of this exercise is to demystify and disambiguate
> them. It seems like the discussion so far illustrates the futility of
> trying to find names that are both short and descriptive.
>
> How does that sound?
> -John
>
> On Tue, Jul 16, 2019 at 4:43 PM Matthias J. Sax 
> wrote:
> >
> > It's a fair point that Ryanne raises. However, "running sum" is the same
> > as "moving sum" from my understanding.
> >
> > The issue is still, that `Sum` and `Count` which seem to be the cleanest
> > names cannot be used. While I agree that `TotalSum` and `TotalCount` is
> > somewhat redundant, I still think it the best suggestion so far.
> >
> > For the "sampled" version, I am personally fine with either `MovingXxx`,
> > `WindowedXxx`, or `RunningXxx` -- to me, that are all equally good to
> > describe the semantics.
> >
> >
> >
> > -Matthias
> >
> >
> >
> > On 7/16/19 2:25 PM, Sophie Blee-Goldman wrote:
> > > I'm +1 on Windowed, was about to suggest that as I was catching up on
> the
> > > discussion but Bill beat me to it :)
> > >
> > > On Tue, Jul 16, 2019 at 2:23 PM Bill Bejeck  wrote:
> > >
> > >> Hi John,
> > >>
> > >> Thanks for the updates.
> > >>
> > >> I like RunningCount and RunningSum.
> > >>
> > >> What about WindowedCount, WindowedSum instead of Moving?
> > >> I'm just throwing this out there as Windowed seems more intuitive to
> me,
> > >> but I'm not married to the idea.
> > >>
> > >> -Bill
> > >>
> > >> On Tue, Jul 16, 2019 at 5:09 PM John Roesler 
> wrote:
> > >>
> > >>> No worries! Choosing good public API names is a high-impact design
> > >>> activity.
> > >>>
> > >>> Matthias, Bruno, Bill, and Stanislav,
> > >>>
> > >>> You've all contributed to this discussion or the vote so far... How
> do
> > >>> you feel about the proposed name change:
> > >>>
> > >>> MovingCount, MovingSum   (instead of Sampled)
> > >>> RunningCount, RunningSum   (Instead of Total)
> > >>>
> > >>> Thanks,
> > >>> -John
> > >>>
> > >>> On Tue, Jul 16, 2019 at 3:04 PM Ryanne Dolan 
> > >>> wrote:
> > 
> >  John, that makes sense to me. Sorry for the bikeshedding.
> > 
> >  Ryanne
> > 
> >  On Tue, Jul 16, 2019 at 12:49 PM John Roesler 
> > >> wrote:
> > 
> > > Thanks for the explanation and the suggestion, Ryanne,
> > >
> > > I went with "sampled" just because these are instances of
> > >> SampledStat,
> > > which in the Kafka Metrics ecosystem are computed from a window of
> > > recent samples. 

[jira] [Resolved] (KAFKA-8218) IllegalStateException while accessing context in Transformer

2019-07-17 Thread Guozhang Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-8218.
--
Resolution: Not A Problem

> IllegalStateException while accessing context in Transformer
> 
>
> Key: KAFKA-8218
> URL: https://issues.apache.org/jira/browse/KAFKA-8218
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.1.1
>Reporter: Bartłomiej Kępa
>Priority: Major
>
> Custom Kotlin implementation of Transformer throws 
> {code}
> java.lang.IllegalStateException: This should not happen as headers() should 
> only be called while a record is processed
> {code}
> while being plugged into the stream topology that actually works. Invocation 
> of transform() method has valid arguments (Key and GenericRecord).
> The exception is being thrown because in our implementation of transform we 
> need to access headers from context.  
> {code:java}
>  override fun transform(key: String?, value: GenericRecord): 
> KeyValue {
>   val headers = context.headers()
>   ...
> }
>  {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (KAFKA-7176) State store metrics for migrated tasks are not removed

2019-07-17 Thread Guozhang Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-7176.
--
   Resolution: Fixed
Fix Version/s: 2.3.0

> State store metrics for migrated tasks are not removed
> --
>
> Key: KAFKA-7176
> URL: https://issues.apache.org/jira/browse/KAFKA-7176
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 1.1.0
>Reporter: Sam Lendle
>Priority: Major
> Fix For: 2.3.0
>
>
> I observed that state store metrics for tasks that have been migrated to 
> other instances are not removed and are still being updated with phantom 
> values, (when viewed for example via jmx mbeans). 
> For all tasks/threads on the same instance (including for migrated tasks), 
> the values of state store metrics are all (nearly) the same. For the rate 
> metrics at least, the value reported for each task is the rate I expect for 
> all active tasks on that instance, so things are apparently being counted 
> multiple times. Presumably, this is how migrated task metrics are being 
> updated.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (KAFKA-7850) Remove deprecated KStreamTestDriver

2019-07-17 Thread Guozhang Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-7850.
--
   Resolution: Fixed
Fix Version/s: 2.3.0

KStreamTestDriver has been removed since 2.3.0 release

> Remove deprecated KStreamTestDriver
> ---
>
> Key: KAFKA-7850
> URL: https://issues.apache.org/jira/browse/KAFKA-7850
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams, unit tests
>Reporter: Richard Yu
>Assignee: Richard Yu
>Priority: Major
> Fix For: 2.3.0
>
>
> Eversince a series of new test improvements were made to KafkaStreams test 
> suite, KStreamTestDriver was deprecated in favor of TopologyTestDriver. 
> However, a couple existing unit tests continues to use KStreamTestDriver. We 
> wish to migrate all remaining classes to TopologyTestDriver so we could 
> remove the deprecated class.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


Re: [DISCUSS] KIP-487: Automatic Topic Creation on Producer

2019-07-17 Thread Justine Olshan
Hello all,

I was looking at this KIP again, and there is a decision I made that I
think is worth discussing.

In the case where both the broker and producer's
'auto.create.topics.enable' are set to true, we have to choose either the
broker configs or the producer configs for the replication
factor/partitions.

Currently, the decision is to have the broker defaults take precedence. (It
is easier to do this in the implementation.) It also makes some sense for
this behavior to take precedence since this behavior already occurs as the
default.

However, I was wondering if it would be odd for those who can only see the
producer side to set configs for replication factor/partitions and see
different results. Currently the documentation for the config states that
the config values are only used when the broker config is not enabled, but
this might not always be clear to the user.  Changing the code to have the
producer's configurations take precedence is possible, but I was wondering
what everyone thought.

Thank you,
Justine

On Fri, Jul 12, 2019 at 2:49 PM Justine Olshan  wrote:

> Just a quick update--
>
> It seems that enabling both the broker and producer configs works fine,
> except that the broker configurations for partitions, replication factor
> take precedence.
> I don't know if that is something we would want to change, but I'll be
> updating the KIP for now to reflect this. Perhaps we would want to add more
> to the documentation of the the producer configs to clarify.
>
> Thank you,
> Justine
>
> On Fri, Jul 12, 2019 at 9:28 AM Justine Olshan 
> wrote:
>
>> Hi Colin,
>>
>> Thanks for looking at the KIP. I can definitely add to the title to make
>> it more clear.
>>
>> It makes sense that both configurations could be turned on since there
>> are many cases where the user can not control the server-side
>> configurations. I was a little concerned about how both interacting would
>> work out -- if there would be to many requests for new topics, for example.
>> But it since it does make sense to allow both configurations enabled, I
>> will test out some scenarios and I'll change the KIP to support this.
>>
>> I also agree with documentation about distinguishing the differences. I
>> was having some trouble with the wording but I like the phrases
>> "server-side" and "client-side." That's a good distinction I can use when
>> describing.
>>
>> I'll try to update the KIP soon keeping everyone's input in mind.
>>
>> Thanks,
>> Justine
>>
>> On Thu, Jul 11, 2019 at 5:39 PM Colin McCabe  wrote:
>>
>>> Hi Justine,
>>>
>>> Thanks for the KIP.  This seems like a good step towards removing
>>> server-side topic auto-creation.
>>>
>>> We should add included "client-side" to the title of the KIP somewhere,
>>> to make it clear that we're talking about client-side auto creation.
>>>
>>> The KIP says:
>>> > In order to automatically create topics with the producer, the
>>> producer's
>>> > auto.create.topics.enable config must be set to true and the broker
>>> config should be set to false
>>>
>>> From a user's point of view, this seems counter-intuitive.  In order to
>>> auto-create topics the broker's auto.create.topics.enable config should be
>>> set to false?  It seems like the server-side auto-create is unrelated to
>>> the client-side auto-create.  We could have both turned on (and I'm sure
>>> that in the real world, people will try this configuration...)  There's no
>>> reason not to support this, I think.
>>>
>>> We should add some documentation explaining the difference between
>>> server-side and client-side auto-creation.  Without documentation, an admin
>>> might think that they had disabled all forms of auto-creation by setting
>>> the -side setting to false-- but this is not the case, of course.
>>>
>>> best,
>>> Colin
>>>
>>>
>>> On Thu, Jul 11, 2019, at 16:22, Justine Olshan wrote:
>>> > Hi Dhruvil,
>>> >
>>> > Thanks for reading the KIP!
>>> > That was the general idea for deprecation. We would log a warning when
>>> the
>>> > config is enabled on the broker.
>>> > I also believe that there would be a change to documentation.
>>> > If there is anything else that should be done, please let me know!
>>> >
>>> > Justine
>>> >
>>> > On Thu, Jul 11, 2019 at 4:17 PM Dhruvil Shah 
>>> wrote:
>>> >
>>> > > Hi Justine,
>>> > >
>>> > > Thanks for the KIP, this is great!
>>> > >
>>> > > Could you add some more information about what deprecating the broker
>>> > > configuration means? Would we log a warning in the logs when auto
>>> topic
>>> > > creation is enabled on the broker, for example?
>>> > >
>>> > > Thanks,
>>> > > Dhruvil
>>> > >
>>> > > On Thu, Jul 11, 2019 at 10:28 AM Justine Olshan <
>>> jols...@confluent.io>
>>> > > wrote:
>>> > >
>>> > > > Hello all,
>>> > > >
>>> > > > I'd like to start a discussion thread for KIP-487.
>>> > > > This KIP plans to deprecate the current system of auto-creating
>>> topics
>>> > > > through requests to the metadata and give the producer the ability
>>> to
>>> > > 

Re: [VOTE] KIP-455: Create an Administrative API for Replica Reassignment

2019-07-17 Thread Stanislav Kozlovski
Hey everybody,

We have further iterated on the KIP in the accompanying discussion thread
and I'd like to propose we resume the vote.

Some notable changes:
- we will store reassignment information in the `/brokers/topics/[topic]`
- we will internally use two collections to represent a reassignment -
"addingReplicas" and "removingReplicas". LeaderAndIsr has been updated
accordingly
- the Alter API will still use the "targetReplicas" collection, but the
List API will now return three separate collections - the full replica set,
the replicas we are adding as part of this reassignment ("addingReplicas")
and the replicas we are removing ("removingReplicas")
- cancellation of a reassignment now means a proper rollback of the
assignment to its original state prior to the API call

As always, you can re-read the KIP here
https://cwiki.apache.org/confluence/display/KAFKA/KIP-455%3A+Create+an+Administrative+API+for+Replica+Reassignment

Best,
Stanislav

On Wed, May 22, 2019 at 6:12 PM Colin McCabe  wrote:

> Hi George,
>
> Thanks for taking a look.  I am working on getting a PR done as a
> proof-of-concept.  I'll post it soon.  Then we'll finish up the vote.
>
> best,
> Colin
>
> On Tue, May 21, 2019, at 17:33, George Li wrote:
> >  Hi Colin,
> >
> >  Great! Looking forward to these features.+1 (non-binding)
> >
> > What is the estimated timeline to have this implemented?  If any help
> > is needed in the implementation of cancelling reassignments,  I can
> > help if there is spare cycle.
> >
> >
> > Thanks,
> > George
> >
> >
> >
> > On Thursday, May 16, 2019, 9:48:56 AM PDT, Colin McCabe
> >  wrote:
> >
> >  Hi George,
> >
> > Yes, KIP-455 allows the reassignment of individual partitions to be
> > cancelled.  I think it's very important for these operations to be at
> > the partition level.
> >
> > best,
> > Colin
> >
> > On Tue, May 14, 2019, at 16:34, George Li wrote:
> > >  Hi Colin,
> > >
> > > Thanks for the updated KIP.  It has very good improvements of Kafka
> > > reassignment operations.
> > >
> > > One question, looks like the KIP includes the Cancellation of
> > > individual pending reassignments as well when the
> > > AlterPartitionReasisgnmentRequest has empty replicas for the
> > > topic/partition. Will you also be implementing the the partition
> > > cancellation/rollback in the PR ?If yes,  it will make KIP-236 (it
> > > has PR already) trivial, since the cancel all pending reassignments,
> > > one just needs to do a ListPartitionRessignmentRequest, then submit
> > > empty replicas for all those topic/partitions in
> > > one AlterPartitionReasisgnmentRequest.
> > >
> > >
> > > Thanks,
> > > George
> > >
> > >On Friday, May 10, 2019, 8:44:31 PM PDT, Colin McCabe
> > >  wrote:
> > >
> > >  On Fri, May 10, 2019, at 17:34, Colin McCabe wrote:
> > > > On Fri, May 10, 2019, at 16:43, Jason Gustafson wrote:
> > > > > Hi Colin,
> > > > >
> > > > > I think storing reassignment state at the partition level is the
> right move
> > > > > and I also agree that replicas should understand that there is a
> > > > > reassignment in progress. This makes KIP-352 a trivial follow-up
> for
> > > > > example. The only doubt I have is whether the leader and isr znode
> is the
> > > > > right place to store the target reassignment. It is a bit odd to
> keep the
> > > > > target assignment in a separate place from the current assignment,
> right? I
> > > > > assume the thinking is probably that although the current
> assignment should
> > > > > probably be in the leader and isr znode as well, it is hard to
> move the
> > > > > state in a compatible way. Is that right? But if we have no plan
> to remove
> > > > > the assignment znode, do you see a downside to storing the target
> > > > > assignment there as well?
> > > > >
> > > >
> > > > Hi Jason,
> > > >
> > > > That's a good point -- it's probably better to keep the target
> > > > assignment in the same znode as the current assignment, for
> > > > consistency.  I'll change the KIP.
> > >
> > > Hi Jason,
> > >
> > > Thanks again for the review.
> > >
> > > I took another look at this, and I think we should stick with the
> > > initial proposal of putting the reassignment state into
> > > /brokers/topics/[topic]/partitions/[partitionId]/state.  The reason is
> > > because we'll want to bump the leader epoch for the partition when
> > > changing the reassignment state, and the leader epoch resides in that
> > > znode anyway.  I agree there is some inconsistency here, but so be it:
> > > if we were to greenfield these zookeeper data structures, we might do
> > > it differently, but the proposed scheme will work fine and be
> > > extensible for the future.
> > >
> > > >
> > > > > A few additional questions:
> > > > >
> > > > > 1. Should `alterPartitionReassignments` be
> `alterPartitionAssignments`?
> > > > > It's the current assignment we're altering, right?
> > > >
> > > > That's fair.  AlterPartitionAssigments reads a little better, and
> I'll
> > > > 

Re: [DISCUSS] KIP-488: Clean up Sum,Count,Total Metrics

2019-07-17 Thread John Roesler
Thanks for the discussion, all.

I've done a little more research into the statistical terminology.
Matthias is correct, "running" and "moving" appear to be synonyms.
Unfortunately, both can be computed either over a window of the last N
measurements or over all prior measurements. "Moving" just signifies
that the statistic is computed over a "live" data set, i.e., a
continuous stream of measurements, and the expectation is that the
stat would be updated in response to new measurements.

I found https://en.wikipedia.org/wiki/Moving_average to have a pretty
good overview of the whole picture.

After considering the discussion so far and some light reading, it
seems like "Cumulative" is truly the correct term for the all-time
metrics:

> In a cumulative moving average, the data arrive in an
> ordered datum stream, and the user would like to get
> the average of all of the data up until the current datum
> point.

I know that we previously felt that "cumulative" was too much of a
mouthful, but it seems like our quest for a terser term led us into a
briar patch. Also, now there is an independent source (the wiki page)
indicating that this is indeed the correct term, and it doesn't offer
any synonyms to choose from. Maybe we can take comfort in the fact
that we'll rarely be saying the name of the classes out loud.

As far as moving stats that operate over a window of the last N
measurements, there are multiple options, including Simple
(unweighted), Weighted, and Exponentially Weighted, and presumably
infinite variations with other weighting functions. In our domain,
there is only one weighting function available, but it's still more
self-documenting and future-proof to specify the type of windowed
statistic. Therefore, I'm proposing "Simple" as the term for the
windowed (aka sampled) stats, while keeping Windowed in the name to
distinguish it from the all-time metrics.

> In financial applications a simple moving average (SMA)
> is the unweighted mean of the previous n data.

Therefore, we would have the proposed matrix:

SimpleWindowedSum, SimpleWindowedCount
CumulativeSum, CumulativeCount

Again, all these proposed names are less pithy than we might wish, but
the whole point of this exercise is to demystify and disambiguate
them. It seems like the discussion so far illustrates the futility of
trying to find names that are both short and descriptive.

How does that sound?
-John

On Tue, Jul 16, 2019 at 4:43 PM Matthias J. Sax  wrote:
>
> It's a fair point that Ryanne raises. However, "running sum" is the same
> as "moving sum" from my understanding.
>
> The issue is still, that `Sum` and `Count` which seem to be the cleanest
> names cannot be used. While I agree that `TotalSum` and `TotalCount` is
> somewhat redundant, I still think it the best suggestion so far.
>
> For the "sampled" version, I am personally fine with either `MovingXxx`,
> `WindowedXxx`, or `RunningXxx` -- to me, that are all equally good to
> describe the semantics.
>
>
>
> -Matthias
>
>
>
> On 7/16/19 2:25 PM, Sophie Blee-Goldman wrote:
> > I'm +1 on Windowed, was about to suggest that as I was catching up on the
> > discussion but Bill beat me to it :)
> >
> > On Tue, Jul 16, 2019 at 2:23 PM Bill Bejeck  wrote:
> >
> >> Hi John,
> >>
> >> Thanks for the updates.
> >>
> >> I like RunningCount and RunningSum.
> >>
> >> What about WindowedCount, WindowedSum instead of Moving?
> >> I'm just throwing this out there as Windowed seems more intuitive to me,
> >> but I'm not married to the idea.
> >>
> >> -Bill
> >>
> >> On Tue, Jul 16, 2019 at 5:09 PM John Roesler  wrote:
> >>
> >>> No worries! Choosing good public API names is a high-impact design
> >>> activity.
> >>>
> >>> Matthias, Bruno, Bill, and Stanislav,
> >>>
> >>> You've all contributed to this discussion or the vote so far... How do
> >>> you feel about the proposed name change:
> >>>
> >>> MovingCount, MovingSum   (instead of Sampled)
> >>> RunningCount, RunningSum   (Instead of Total)
> >>>
> >>> Thanks,
> >>> -John
> >>>
> >>> On Tue, Jul 16, 2019 at 3:04 PM Ryanne Dolan 
> >>> wrote:
> 
>  John, that makes sense to me. Sorry for the bikeshedding.
> 
>  Ryanne
> 
>  On Tue, Jul 16, 2019 at 12:49 PM John Roesler 
> >> wrote:
> 
> > Thanks for the explanation and the suggestion, Ryanne,
> >
> > I went with "sampled" just because these are instances of
> >> SampledStat,
> > which in the Kafka Metrics ecosystem are computed from a window of
> > recent samples. Thinking more about it, the fact that they are
> >> sampled
> > and the fact that they are windowed are orthogonal, which is what
> > you're pointing out... sampling by itself doesn't indicate that it's
> >> a
> > moving average.
> >
> > Since there is no way in Kafka Metrics for a metric to be sampled and
> > not windowed/moving/decaying, calling them Sampled would never be
> > incorrect. But to someone unfamiliar with the code, it wouldn't
> > 

Jenkins build is back to normal : kafka-trunk-jdk11 #701

2019-07-17 Thread Apache Jenkins Server
See 




Re: Any documents on this

2019-07-17 Thread Omar Al-Safi
ii) Are we talking here about a Kafka OnPremise to Kafka into the cloud
migration or you would like to move the data from OracleDB OnPremise to
Kafka OnCloud?

On Mon, 15 Jul 2019 at 14:01, Habeebullah khwaja  wrote:

> Hi,
> I got few questions. Can anyone please give me a feedback
> i)To upgrade to latest version of kafka from cluster to cloud. What should
> I adopt. How will it be helpful
> ii)How will you move the data from cluster  into the  cloud ?We are
> currently using oracle DB.
> Any help will be appreciated. Suggest any documents for upgradation from
> cluster to cloud.
> Thanks,
> Bahir
>


Add to contribution list

2019-07-17 Thread Omar Al-Safi
Hi guys,

I would like to be added to the contribution list in Jira and Wiki in order
to work on some issues. My username is : omarsmak

Thanks


Re: [DISCUSS] KIP-492 Add java security providers in Kafka Security config

2019-07-17 Thread Rajini Sivaram
Hi Sandeep,

Thanks for the KIP. A few questions below:

   1. Is the main use case for this KIP adding security providers for SSL?
   If so, wouldn't a more generic solution like KIP-383 work for this?
   2. Presumably this config would also apply to clients. If so, have we
   thought through the implications of changing static JVM-wide security
   providers in the client applications?
   3. Since client applications can programmatically invoke the Java
   Security API anyway, isn't the system property described in `Rejected
   Alternatives` a reasonable solution for brokers?
   4. We have SASL login modules in Kafka that automatically add security
   providers for SASL mechanisms not supported by the JVM. We should describe
   the impact of this KIP on those and whether we would now require a config
   to enable these security providers.
   5. We have been moving away from JVM-wide configs like the default JAAS
   config since they are hard to test reliably or update dynamically. The
   replacement config `sasl.jaas.config` doesn't insert a JVM-wide
   configuration. Have we investigated similar options for the specific
   scenario we are addressing here?
   6. Are we always going to insert new providers at the start of the
   provider list?


Regards,

Rajini



On Wed, Jul 17, 2019 at 5:05 AM Harsha  wrote:

> Thanks for the KIP Sandeep. LGTM.
>
> Mani & Rajini, can you please look at the KIP as well.
>
> Thanks,
> Harsha
>
> On Tue, Jul 16, 2019, at 2:54 PM, Sandeep Mopuri wrote:
> > Thanks for the suggestions, made changes accordingly.
> >
> > On Tue, Jul 16, 2019 at 9:27 AM Satish Duggana  >
> > wrote:
> >
> > > Hi Sandeep,
> > > Thanks for the KIP, I have few comments below.
> > >
> > > >>“To take advantage of these custom algorithms, we want to support
> java
> > > security provider parameter in security config. This param can be used
> by
> > > kafka brokers or kafka clients(when connecting to the kafka brokers).
> The
> > > security providers can also be used for configuring security
> algorithms in
> > > SASL based communication.”
> > >
> > > You may want to mention use case like
> > > spiffe.provider.SpiffeProvider[1] in streaming applications like
> > > Flink, Spark or Storm etc.
> > >
> > > >>"We add new config parameter in KafkaConfig named
> > > “security.provider.class”. The value of “security.provider” is
> expected to
> > > be a string representing the provider’s full classname. This provider
> class
> > > will be added to the JVM properties through Security.addProvider api.
> > > Security class can be used to programmatically add the provider
> classes to
> > > the JVM."
> > >
> > > It is good to have this property as a list of providers instead of a
> > > single property. This will allow configuring multiple providers if it
> > > is needed in the future without introducing hacky solutions like
> > > security.provider.class.name.x, where x is a sequence number. You can
> > > change the property name to “security.provider.class.names” and its
> > > value is a list of fully qualified provider class names separated by
> > > ‘,'.
> > > For example:
> > >
> > >
> security.provider.class.names=spiffe.provider.SpiffeProvider,com.foo.MyProvider
> > >
> > > Typo in existing properties section:
> > > “ssl.provider” instead of “ssl.providers”.
> > >
> > > Thanks,
> > > Satish.
> > >
> > > 1. https://github.com/spiffe/java-spiffe
> > >
> > >
> > > On Mon, Jul 15, 2019 at 11:41 AM Sandeep Mopuri 
> wrote:
> > > >
> > > > Hello all,
> > > >
> > > > I'd like to start a discussion thread for KIP-492.
> > > > This KIP plans on introducing a new security config parameter for a
> > > custom
> > > > security providers. Please take a look and let me know what do you
> think.
> > > >
> > > > More information can be found here:
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-492%3A+Add+java+security+providers+in+Kafka+Security+config
> > > > --
> > > > Thanks,
> > > > Sai Sandeep
> > >
> >
> >
> > --
> > Thanks,
> > M.Sai Sandeep
> >
>


Build failed in Jenkins: kafka-trunk-jdk8 #3791

2019-07-17 Thread Apache Jenkins Server
See 


Changes:

[jason] KAFKA-8024; Fix `UtilsTest` failure under non-english locales (#6351)

--
[...truncated 2.56 MB...]

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testWithSchemaNullValueToUnix PASSED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testWithSchemaUnixToTimestamp STARTED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testWithSchemaUnixToTimestamp PASSED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testWithSchemaNullValueToString STARTED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testWithSchemaNullValueToString PASSED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testSchemalessIdentity STARTED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testSchemalessIdentity PASSED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testWithSchemaNullFieldToTimestamp STARTED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testWithSchemaNullFieldToTimestamp PASSED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testWithSchemaFieldConversion STARTED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testWithSchemaFieldConversion PASSED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testWithSchemaNullFieldToString STARTED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testWithSchemaNullFieldToString PASSED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testSchemalessNullValueToTimestamp STARTED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testSchemalessNullValueToTimestamp PASSED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testSchemalessNullValueToString STARTED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testSchemalessNullValueToString PASSED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testWithSchemaDateToTimestamp STARTED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testWithSchemaDateToTimestamp PASSED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testWithSchemaIdentity STARTED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testWithSchemaIdentity PASSED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testSchemalessTimestampToDate STARTED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testSchemalessTimestampToDate PASSED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testSchemalessTimestampToTime STARTED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testSchemalessTimestampToTime PASSED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testSchemalessTimestampToUnix STARTED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testSchemalessTimestampToUnix PASSED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testConfigMissingFormat STARTED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testConfigMissingFormat PASSED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testSchemalessNullValueToDate STARTED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testSchemalessNullValueToDate PASSED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testSchemalessNullValueToTime STARTED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testSchemalessNullValueToTime PASSED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testSchemalessNullValueToUnix STARTED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testSchemalessNullValueToUnix PASSED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testWithSchemaNullFieldToDate STARTED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testWithSchemaNullFieldToDate PASSED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testWithSchemaNullFieldToTime STARTED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testWithSchemaNullFieldToTime PASSED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testWithSchemaNullFieldToUnix STARTED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testWithSchemaNullFieldToUnix PASSED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testConfigNoTargetType STARTED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testConfigNoTargetType PASSED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testWithSchemaStringToTimestamp STARTED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testWithSchemaStringToTimestamp PASSED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testWithSchemaTimeToTimestamp STARTED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testWithSchemaTimeToTimestamp PASSED


[jira] [Created] (KAFKA-8676) Avoid Stopping Unnecessary Connectors and Tasks

2019-07-17 Thread Luying Liu (JIRA)
Luying Liu created KAFKA-8676:
-

 Summary: Avoid Stopping Unnecessary Connectors and Tasks 
 Key: KAFKA-8676
 URL: https://issues.apache.org/jira/browse/KAFKA-8676
 Project: Kafka
  Issue Type: Improvement
  Components: KafkaConnect
Affects Versions: 2.3.0
 Environment: centOS
Reporter: Luying Liu
 Fix For: 2.3.0


When adding a new connector or changing a connector configuration, Kafka 
Connect 2.3.0 will stop all existing tasks and start all the tasks, including 
the new tasks and the existing ones. However, it is not necessary at all. Only 
the new connector and tasks need to be started. As the rebalancing can be 
applied for both running and suspended tasks.The following patch will fix this 
problem and starts only the new tasks and connectors.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (KAFKA-8675) "Main" consumers are not unsubsribed on KafkaStreams.close()

2019-07-17 Thread Modestas Vainius (JIRA)
Modestas Vainius created KAFKA-8675:
---

 Summary: "Main" consumers are not unsubsribed on 
KafkaStreams.close()
 Key: KAFKA-8675
 URL: https://issues.apache.org/jira/browse/KAFKA-8675
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 2.2.1
Reporter: Modestas Vainius


Hi!

It seems that {{KafkaStreams.close()}} never unsubscribes "main" kafka 
consumers. As far as I can tell, 
{{org.apache.kafka.streams.processor.internals.TaskManager#shutdown}} does 
unsubscribe only {{restoreConsumer}}. This results into Kafka Group coordinator 
having to throw away consumer from the consumer group in a non-clean way. 
{{KafkaStreams.close()}} does {{close()}} those consumers but it seems that is 
not enough for clean exit.

Kafka Streams connects to Kafka:
{code:java}
kafka| [2019-07-17 08:02:35,707] INFO [GroupCoordinator 1]: Preparing 
to rebalance group 1-streams-test in state PreparingRebalance with old 
generation 0 (__consumer_offsets-44) (reason: Adding new member 
1-streams-test-70da298b-6c8e-4ef0-8c2a-e7cba079ec9d-StreamThread-1-consumer-9af23416-d14e-49b8-b2ae-70837f2df0db)
 (kafka.coordinator.group.GroupCoordinator)
kafka| [2019-07-17 08:02:35,717] INFO [GroupCoordinator 1]: Stabilized 
group 1-streams-test generation 1 (__consumer_offsets-44) 
(kafka.coordinator.group.GroupCoordinator)
kafka| [2019-07-17 08:02:35,730] INFO [GroupCoordinator 1]: Assignment 
received from leader for group 1-streams-test for generation 1 
(kafka.coordinator.group.GroupCoordinator)
{code}
Processing finishes in 2 secs but after 10 seconds I see this in Kafka logs:
{code:java}
kafka| [2019-07-17 08:02:45,749] INFO [GroupCoordinator 1]: Member 
1-streams-test-70da298b-6c8e-4ef0-8c2a-e7cba079ec9d-StreamThread-1-consumer-9af23416-d14e-49b8-b2ae-70837f2df0db
 in group 1-streams-test has failed, removing it from the group 
(kafka.coordinator.group.GroupCoordinator)
kafka| [2019-07-17 08:02:45,749] INFO [GroupCoordinator 1]: Preparing 
to rebalance group 1-streams-test in state PreparingRebalance with old 
generation 1 (__consumer_offsets-44) (reason: removing member 
1-streams-test-70da298b-6c8e-4ef0-8c2a-e7cba079ec9d-StreamThread-1-consumer-9af23416-d14e-49b8-b2ae-70837f2df0db
 on heartbeat expiration) (kafka.coordinator.group.GroupCoordinator)
kafka| [2019-07-17 08:02:45,749] INFO [GroupCoordinator 1]: Group 
1-streams-test with generation 2 is now empty (__consumer_offsets-44) 
(kafka.coordinator.group.GroupCoordinator)
{code}
Topology is kind of similar to [kafka testing 
example|https://kafka.apache.org/22/documentation/streams/developer-guide/testing.html]
 but I tried on real kafka instance (one node):
{code:java}
new Topology().with {
it.addSource("sourceProcessor", "input-topic")
it.addProcessor("aggregator", new 
CustomMaxAggregatorSupplier(), "sourceProcessor")
it.addStateStore(
Stores.keyValueStoreBuilder(
Stores.inMemoryKeyValueStore("aggStore"),
Serdes.String(),
Serdes.Long()).withLoggingDisabled(), // need to 
disable logging to allow aggregatorStore pre-populating
"aggregator")
it.addSink(
"sinkProcessor",
"result-topic",
"aggregator"
)
it
}
{code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)