Build failed in Jenkins: kafka-trunk-jdk8 #4069

2019-11-24 Thread Apache Jenkins Server
See 


Changes:

[matthias] KAFKA-8960: Move Task determineCommitId in gradle.build to Project 
Level


--
[...truncated 2.74 MB...]
org.apache.kafka.streams.test.TestRecordTest > testFields PASSED

org.apache.kafka.streams.test.TestRecordTest > testProducerRecord STARTED

org.apache.kafka.streams.test.TestRecordTest > testProducerRecord PASSED

org.apache.kafka.streams.test.TestRecordTest > testEqualsAndHashCode STARTED

org.apache.kafka.streams.test.TestRecordTest > testEqualsAndHashCode PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordsFromKeyValuePairs STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordsFromKeyValuePairs PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithNullKeyAndDefaultTimestamp 
STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithNullKeyAndDefaultTimestamp 
PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithDefaultTimestamp 
STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithDefaultTimestamp 
PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithNullKey STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithNullKey PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateNullKeyConsumerRecordWithOtherTopicNameAndTimestampWithTimetamp 
STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateNullKeyConsumerRecordWithOtherTopicNameAndTimestampWithTimetamp 
PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordWithTimestamp STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordWithTimestamp PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullHeaders STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullHeaders PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithDefaultTimestamp STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithDefaultTimestamp PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicName STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicName PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithNullKey STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithNullKey PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecord STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecord PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateNullKeyConsumerRecord STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateNullKeyConsumerRecord PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordWithOtherTopicName STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordWithOtherTopicName PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > shouldAdvanceTime 
STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > shouldAdvanceTime 
PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithKeyValuePairs STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithKeyValuePairs PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithKeyValuePairsAndCustomTimestamps
 STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithKeyValuePairsAndCustomTimestamps
 PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithKeyValuePairs 
STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithKeyValuePairs PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordWithOtherTopicNameAndTimestamp STARTED


[jira] [Resolved] (KAFKA-8960) Move Task determineCommitId in gradle.build to Project Level

2019-11-24 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-8960.

Fix Version/s: 2.5.0
 Assignee: Bruno Cadonna  (was: Rabi Kumar K C)
   Resolution: Fixed

> Move Task determineCommitId in gradle.build to Project Level
> 
>
> Key: KAFKA-8960
> URL: https://issues.apache.org/jira/browse/KAFKA-8960
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, streams
>Reporter: Bruno Cadonna
>Assignee: Bruno Cadonna
>Priority: Minor
>  Labels: newbie
> Fix For: 2.5.0
>
>
> Currently, the gradle task {{determineCommitId}} is present twice in 
> {{gradle.build}}, once in subproject {{clients}} and once in subproject 
> {{streams}}. Task {{determineCommitId}}  shall be moved to project-level such 
> that both subprojects (and also other subprojects) can call it without being 
> dependent on an implementation detail of another subproject.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: KIP-457: Add DISCONNECTED state to Kafka Streams

2019-11-24 Thread Matthias J. Sax
Moved this KIP into status "inactive". Feel free to resume at any time.

-Matthias


On 6/27/19 4:37 PM, Richard Yu wrote:
>  Hi Matthias and Hachikuji,
> Sorry for getting back to you so late. Currently on a trip, so I haven't had
> the time to respond.
> Currently, I'm not sure which approach we should take, considering that
> Guozhang posed multiple possibilities in the previous email. Do you have any
> preferences as to which approach we should take?
> It would greatly help in the implementation of the issue.
> Cheers,
> Richard
> On Thursday, June 13, 2019, 4:55:29 PM GMT+8, Richard Yu 
>  wrote:  
>  
>   Hi Guozhang,
> Thanks for the input! Then I guess, from the approach you have listed above,
> no API changes will be needed in the Kafka consumer. That will greatly
> simplify things, although when implementing these approaches, there might be
> some unexpected issues that show up.
> Cheers,
> Richard
>     On Thursday, June 13, 2019, 4:29:29 AM GMT+8, Guozhang Wang 
>  wrote:  
>  
>  Hi Richard,
> 
> Sorry for being late on this; I've finally got some time to take a look
> at https://github.com/apache/kafka/pull/6594 as well as the KIP itself.
> Here are some thoughts:
> 
> 1. The main motivation of this KIP is to be able to distinguish the case
> where
> 
> a. "Streams client is in an unhealthy situation and hence cannot proceed"
> (for which we have an ERROR state) and
> b. "Streams client is perfectly healthy, but it cannot get to the target
> brokers and hence cannot proceed", and this should also be distinguishable
> from
> c. "both Streams and brokers are healthy, there's just no data available
> for processing and hence cannot proceed").
> 
> And we want to have a way to notify the users about the second case b),
> distinguished from the others.
> 
> 2. Following this, when I first thought about the solution I was thinking
> about adding a new state in the FSM of Kafka Streams, but after reviewing
> the code and the KIP, I felt this may be an overkill to complicate the FSM.
> Now I'm wondering if we can achieve the same thing with a single metric.
> For example:
> 
> 2.a) we know that in Streams we always rely on consumer membership to
> allocate partitions to instances, which means that the heartbeat thread has
> to be working if the consumer wants to ever receive some data. What we can
> do is let users monitor this metric directly, e.g. if the
> heartbeat-rate drops to zero BUT the state is still in RUNNING it means we
> are in case b) above.
> 
> 2.b) if we want to provide a streams-level metric out-of-the-box rather
> than letting users monitor consumer metrics, another idea is to
> leverage the existing "public Set<TopicPartition> assignment()" method of
> KafkaConsumer, and record the time when it returns empty, meaning that
> nothing was assigned. And expose this as a boolean metric indicating
> nothing was assigned and hence we are likely in case b) above --- note this
> could also mean that we have fewer partitions than necessary so that some
> instance does not have any assignment indeed, which is not the same as b),
> but I feel consolidating these two cases with a single metric seems also fine.
> 
> 
> 
> Guozhang
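
A minimal sketch of option 2.a as an operator-side check, assuming the embedded consumer's standard "heartbeat-rate" metric (group "consumer-coordinator-metrics") is visible through KafkaStreams#metrics(); the class and method names below are illustrative only:

```
import java.util.Map;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;
import org.apache.kafka.streams.KafkaStreams;

public final class DisconnectCheck {

    // Case b) heuristic: the instance reports RUNNING, but the embedded
    // consumers have stopped heart-beating to the group coordinator.
    public static boolean probablyDisconnected(final KafkaStreams streams) {
        if (streams.state() != KafkaStreams.State.RUNNING) {
            return false; // covered by the existing ERROR / REBALANCING states
        }
        double heartbeatRate = 0.0;
        for (final Map.Entry<MetricName, ? extends Metric> entry : streams.metrics().entrySet()) {
            final MetricName name = entry.getKey();
            if ("consumer-coordinator-metrics".equals(name.group())
                    && "heartbeat-rate".equals(name.name())) {
                heartbeatRate += ((Number) entry.getValue().metricValue()).doubleValue();
            }
        }
        return heartbeatRate == 0.0;
    }
}
```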
> 
> 
> 
> 
> On Wed, Apr 17, 2019 at 2:30 PM Richard Yu 
> wrote:
> 
>> Alright, so I made a few changes to the KIP.
>> I realized that there might be an easier way to give the user information
>> on the connection state of Kafka Streams.
>> In implementation, if one wishes to have DISCONNECTED as a state, then one
>> would have to factor in proper state transitions.
>> The other approach is now outlined in the KIP: instead, we could just
>> add a method which I think achieves the same effect.
>> If any of you thinks there is something wrong with this approach, please let me know.
>> :)
>>
>> Cheers,
>> Richard
>>
>> On Wed, Apr 17, 2019 at 11:49 AM Richard Yu 
>> wrote:
>>
>>> I just realized something.
>>>
>>> Hi Matthias, might need your input here.
>>> I realized that when implementing this change, as noted in the JIRA, we
>>> would need to "check the behaviour of the consumer", since it's the consumer's
>>> connection with the broker that we are dealing with.
>>>
>>> So doesn't that mean we would be dealing with consumer API changes as
>>> well?
>>> I don't think the consumer has any methods which would give us the state of a
>>> connection either.
>>>
>>> - Richard
>>>
>>> On Wed, Apr 17, 2019 at 8:43 AM Richard Yu 
>>> wrote:
>>>
 Hi Michael,

 Yeah, those are some points I should've clarified.
 No problem. Have got it done.



 On Wed, Apr 17, 2019 at 6:42 AM Michael Noll 
 wrote:

> Richard,
>
> thanks for looking into this!
>
> However, I have some concerns. The KIP you created (
>
>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-457%3A+Add+DISCONNECTED+status+to+Kafka+Streams
> )
> doesn't yet address open questions such as the ones mentioned by
> Matthias:
>
> 1) What is 

Re: [DISCUSS]: KIP-159: Introducing Rich functions to Streams

2019-11-24 Thread Matthias J. Sax
Moved this KIP into status "inactive". Feel free to resume at any time.

-Matthias

On 12/18/17 4:09 PM, Matthias J. Sax wrote:
> I just want to point out that the basic idea is great and that we should
> apply optimizations like "filter first" and others. But we should NOT
> convolute this KIP with orthogonal improvements.
> 
> In fact, we have an umbrella JIRA for DSL optimization already:
> https://issues.apache.org/jira/browse/KAFKA-6034
> 
> @Jan: feel free to create sub-tasks for new optimization ideas and we
> can take it from there.
> 
> 
> -Matthias
> 
> 
> On 12/18/17 7:55 AM, Bill Bejeck wrote:
>> Jan,
>>
>> I apologize for the delayed response.
>>
>> my suggestion would be that instead of
>>>
>>> SOURCE -> KTABLESOURCE -> KTABLEFILTER -> JOIN -> SINK
>>>
>>> we build
>>>
>>> SOURCE  -> KTABLEFILTER ->  KTABLESOURCE -> JOIN -> SINK
>>
>>
>> I agree that filtering before the KTable source makes sense and would be a
>> positive change to implement.
>>
>> But the situation above is just one scenario out of many we need to
>> consider.  I'm not sure we can cover all the implications from different
>> use cases ahead of time.
>>
>> So I'm inclined to agree with Guozhang that we come up with clear "rules"
>> (I use the word rules for lack of a better term) for RecordContext usage
>> and inheritance. That way going forward we can have distinct expectations
>> of different use cases.
>>
>> -Bill
>>
>> On Fri, Dec 15, 2017 at 3:57 PM, Guozhang Wang  wrote:
>>
>>> Regarding the record context inheritance: I agree it may be a better idea
>>> for now to drop the information when we cannot come up with a consensus
>>> about how the record context should be inherited. Like Bill I was a bit
>>> worried about the lacking of such data lineage information for trouble
>>> shooting in operations or debugging in coding; but I think we can still try
>>> to come up with better solutions in the future by extending the current
>>> protocol, rather than coming up with something that we realize we need to
>>> change in the future.
>>>
>>> Regarding the table / filter question: I agree with Jan that we could
>>> consider update the builder so that we will push down the filter earlier
>>> than KTable source that materialized the store; on the other hand, I think
>>> Matthias' point is that even doing this does not completely exclude the
>>> scenarios that you'd have the old/new pairs in your Tables, for example,
>>> consider:
>>>
>>> table1 = stream1.groupBy(...).aggregate(...)
>>> table2 = table1.filter(..., Materialized.as(...))
>>>
>>> In this case table2 is filtering on table1, which is not read from the
>>> source, and hence it already outputs the old/new pairs, so we still
>>> need to consider how to handle it.
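
For readers following along, the table1/table2 example above could look roughly like this in the DSL; the topic name, key/value types, and store names are assumed for illustration only:

```
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

final StreamsBuilder builder = new StreamsBuilder();
final KStream<String, String> stream1 =
    builder.stream("input-topic", Consumed.with(Serdes.String(), Serdes.String()));

// table1 is produced by an aggregation, so it is NOT a source KTable.
final KTable<String, Long> table1 = stream1
    .groupByKey()
    .count(Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("counts-store"));

// table2 filters table1; because table1 is not read directly from a topic,
// the filter already sees old/new value pairs, which pushing the filter
// before a KTable source cannot eliminate.
final KTable<String, Long> table2 = table1.filter(
    (key, count) -> count > 10,
    Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("filtered-counts-store"));
```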
>>>
>>>
>>> So I'd suggest the following execution plan towards KIP-159:
>>>
>>> 1) revisit our record context (topic, partition, offset, timestamp)
>>> protocols that is used at the DSL layer, make it clear at which high-level
>>> operators we should apply certain inheritance rule, and which others we
>>> should drop such information.
>>> 1.1) modify the lower-level PAPI that DSL leverages, to allow the
>>> caller (DSL) to modify the record context (note that today for lower-level
>>> API, the record context is always passed through when forwarding to the
>>> next processor node)
>>> 2) at the same time, consider optimizing the source KTable filter cases (I
>>> think we already have certain JIRA tickets for this) so that the filter
>>> operator is pushed earlier than the KTABLESOURCE node where materialization
>>> happens.
>>> 3) after 1) is done, come back to KIP-159 and add the proposed APIs.
>>>
>>>
>>> Guozhang
>>>
>>>
>>> On Thu, Dec 7, 2017 at 12:27 PM, Jan Filipiak 
>>> wrote:
>>>
 Thank you Bill,

 I think this is reasonable. Do you have any suggestion
 for handling oldValues in cases like

 builder.table().filter(RichPredicate).join()

 where we process a Change with old and new value and don't have a record
 context for the old value.

 my suggestion would be that instead of

 SOURCE -> KTABLESOURCE -> KTABLEFILTER -> JOIN -> SINK

 we build

 SOURCE  -> KTABLEFILTER ->  KTABLESOURCE -> JOIN -> SINK

 We should build a topology like this from the beginning and not have
 an optimisation phase afterwards.

 Any opinions?

 Best Jan




 On 05.12.2017 17:34, Bill Bejeck wrote:

> Matthias,
>
> Overall I agree with what you've presented here.
>
> Initially, I was hesitant to remove information from the context of the
> result records (Joins or Aggregations) with the thought that when there
> are
> unexpected results, the source information would be useful for tracing
> back
> where the error could have occurred.  But in the case of Joins and
> Aggregations, the amount of data needed to do meaningful analysis could
>>> be
> too much. For 

Re: [DISCUSS] - KIP-314: KTable to GlobalKTable Bi-directional Join

2019-11-24 Thread Matthias J. Sax
Moved this KIP into status "inactive". Feel free to resume at any time.

-Matthias

On 7/1/18 9:47 PM, Guozhang Wang wrote:
> Hello Adam,
> 
> Sorry for being late on this thread. I've read through your updated wiki
> and here are some thoughts:
> 
> * I agree with your assessed impediments. In fact, today although the
> Global KTable has its own checkpoint files and restoration process,
> during its restoration it will always try to bootstrap the backing global
> store up to the Kafka topic's end offset before starting any
> processing. And hence on each machine node the checkpoint offsets  For the
> proposed change that Global KTable also needs to trigger joins, it means
> its restoration process would not fit as well. On the other hand, even for
> Global KTable - Stream joins we do not have such guarantees either: imagine
> if there is a crash, or even graceful shutdown scenario, when the task is
> back online and the Global KTable is bootstrapped, it is not guaranteed to be
> at the same offset position as when it stopped anyway. So I think one can
> argue that users of a Global KTable - KTable join should not expect these
> semantics either.
> 
> * The other issue, which I mentioned above, is that the updates of the
> joins triggered by the Global KTable, and hence executed by the global
> thread, now need to be propagated into the downstream operators, and more
> importantly following the order of the join: i.e. if there is a record coming
> from the Global KTable, and then later another record coming from the other
> KTable. It means that the global thread and the stream thread need to
> be synchronized (note that today these threads are totally in parallel to
> each other).
> 
> 
> With that, I think we can 1) continue working on KIP-213 for local KTable
> joins, and 2) continue this KIP for Global KTable - KTable joins, while
> educating users that, similar to Global KTable - KStream joins, the Global
> KTable will not trigger joins.
> 
> Guozhang
> 
> 
> On Mon, Jun 25, 2018 at 11:05 AM, Adam Bellemare 
> wrote:
> 
>> Thanks for your help so far guys.
>>
>> While I do think that I have a fairly reasonable way forward for
>> restructuring the topologies and threads, there is, unfortunately, what I
>> believe is a fatal flaw that cannot be easily resolved. I have updated the
>> page (
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> 314%3A+KTable+to+GlobalKTable+Bi-directional+Join
>> ) with the impediments to the solution, all of which revolve around
>> ensuring that data consistency is maintained. It seems to me that
>> GlobalKTables are not the way forward here and that I may be best
>> redirecting my efforts towards KIP-213 ( https://cwiki.apache.org/
>> confluence/display/KAFKA/KIP-213+Support+non-key+joining+in+KTable ).
>>
>> I would appreciate being proven wrong on my impediment listings, but if
>> there are no other ideas I think we should close this KIP and the
>> associated JIRA. A KTable to GlobalKTable join driven just by the KTable is
>> simply performed by a stream to GKT join with a groupByKey and reduce to
>> form a state-store, so I would see no need to keep it open otherwise
>> (unless just for the shorthand notation).
>>
>> Thanks again
>>
>> Adam
>>
>> On Fri, Jun 22, 2018 at 9:00 PM, Guozhang Wang  wrote:
>>
>>> Hello Adam,
>>>
>>> Please see my comments inline.
>>>
>>> On Thu, Jun 21, 2018 at 8:14 AM, Adam Bellemare <
>> adam.bellem...@gmail.com>
>>> wrote:
>>>
 Hi Guozhang

 *Re: Questions*
 *1)* I do not yet have a solution to this, but I also did not look that
 closely at it when I began this KIP. I admit that I was unaware of
>>> exactly
 how the GlobalKTable worked alongside the KTable/KStream topologies.
>> You
 mention "It means the two topologies will be merged, and that merged
 topology can only be executed as a single task, by a single thread. " -
>>> is
 the problem here that the merged topology would be parallelized to
>> other
 threads/instances? While I am becoming familiar with how the topologies
>>> are
 created under the hood, I am not yet fully clear on the implications of
 your statement. I will look into this further.


>>> Yes. The issue is that today each task is executed by a single thread
>> only
>>> at any given time, and hence any state stores are only accessed by a
>> single
>>> thread (except for interactive queries, and for global tables where the
>>> global update thread writes to the global store, and the local thread reads
>>> from the global store). If we let the global store update thread also
>>> trigger joins and send the results into the downstream operators,
>>> then it means that the global store update thread can access any state
>>> stores in the subsequent part of the topology, breaking our current
>>> threading model.
>>>
>>>
 *2)* " do you mean that although we have a duplicated state store:
 ModifiedEvents in addition to 

Re: [DISCUSS] KIP-326: Schedulable KTable as Graph source

2019-11-24 Thread Matthias J. Sax
Moved this KIP into status "inactive". Feel free to resume at any time.

-Matthias

On 7/15/18 6:55 PM, Matthias J. Sax wrote:
> I think it would make a lot of sense to provide a simple DSL abstraction.
> 
> Something like:
> 
> KStream stream = ...
> KTable count = stream.count();
> 
> The missing groupBy() or groupByKey() call indicates a global counting
> operation. The JavaDocs should highlight the impact.
> 
> One open question is, what key we want to use for the result KTable?
> 
> Also, the details about optional parameters like `Materialized` need to
> be discussed in detail.
> 
> 
> 
> -Matthias
> 
> 
> On 7/6/18 2:43 PM, Guozhang Wang wrote:
>> That's a lot of email exchanges for me to catch up :)
>>
>> My original proposed alternative solution is indeed relying on
>> pre-aggregate before sending to the single-partition topic, so that the
>> traffic on that single-partition topic would not be huge (I called it
>> partial-aggregate but the intent was the same).
>>
>> What I was thinking is that, given such a scenario could be common, if
>> we've decided to go down this route, should we provide a new API that wraps
>> John's proposed topology (right now with KIP-328 users still need to
>> leverage this trick manually):
>>
>>
>> --
>>
>> final KStream siteEvents = builder.stream("/site-events");
>>
>> final KStream keyedByPartition = siteEvents.transform(/*
>> generate KeyValue(key, 1) for the pre-aggregate*/);
>>
>> final KTable countsByPartition =
>> keyedByPartition.groupByKey().count();   /* pre-aggregate */
>>
>> final KGroupedTable singlePartition =
>> countsByPartition.groupBy((key, value) -> new KeyValue<>("ALL", value));
>>  /* sent the suppressed pre-aggregate values to the single partition topic
>> */
>>
>> final KTable totalCount = singlePartition.reduce((l, r) -> l +
>> r, (l, r) -> l - r);   /* read from the single partition topic, do reduce
>> on the data*/
>>
>> --
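
Since the generic type parameters in the snippet above were stripped by the mail archive, here is a hedged reconstruction with assumed types (String-serialized events, each counted as 1) using the current Consumed/Grouped API; it is an illustration, not the exact code from the thread:

```
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KGroupedTable;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.processor.ProcessorContext;

final StreamsBuilder builder = new StreamsBuilder();

final KStream<String, String> siteEvents =
    builder.stream("site-events", Consumed.with(Serdes.String(), Serdes.String()));

// Re-key each event by the partition it came from, so the pre-aggregate stays
// local to each input partition.
final KStream<Integer, Long> keyedByPartition = siteEvents.<Integer, Long>transform(() ->
    new Transformer<String, String, KeyValue<Integer, Long>>() {
        private ProcessorContext context;

        @Override
        public void init(final ProcessorContext context) { this.context = context; }

        @Override
        public KeyValue<Integer, Long> transform(final String key, final String value) {
            return KeyValue.pair(context.partition(), 1L);
        }

        @Override
        public void close() {}
    });

// Pre-aggregate per partition; this is where caching (or KIP-328 suppression)
// would rate-limit what is sent to the single-partition repartition topic.
final KTable<Integer, Long> countsByPartition =
    keyedByPartition.groupByKey(Grouped.with(Serdes.Integer(), Serdes.Long())).count();

// Funnel all per-partition counts onto a single key (and hence one partition).
final KGroupedTable<String, Long> singlePartition = countsByPartition.groupBy(
    (partition, count) -> KeyValue.pair("ALL", count),
    Grouped.with(Serdes.String(), Serdes.Long()));

// Adder adds the new per-partition count; subtractor removes the value it replaces.
final KTable<String, Long> totalCount =
    singlePartition.reduce((agg, newValue) -> agg + newValue, (agg, oldValue) -> agg - oldValue);
```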
>>
>> Note that if we wrap them all into a new operator, users would need to
>> provide two functions, for the aggregate and for the final "reduce" (in my
>> previous email I called it merger function, but for the same intent).
>>
>>
>>
>> Guozhang
>>
>>
>>
>> On Thu, Jul 5, 2018 at 3:38 PM, John Roesler  wrote:
>>
>>> Ok, I didn't get quite as far as I hoped, and several things are far from
>>> ready, but here's what I have so far:
>>> https://github.com/apache/kafka/pull/5337
>>>
>>> The "unit" test works, and is a good example of how you should expect it to
>>> behave:
>>> https://github.com/apache/kafka/pull/5337/files#diff-
>>> 2fdec52b9cc3d0e564f0c12a199bed77
>>>
>>> I have one working integration test, but it's slow going getting the timing
>>> right, so no promises of any kind ;)
>>>
>>> Let me know what you think!
>>>
>>> Thanks,
>>> -John
>>>
>>> On Thu, Jul 5, 2018 at 8:39 AM John Roesler  wrote:
>>>
 Hey Flávio,

 Thanks! I haven't got anything usable yet, but I'm working on it now. I'm
 hoping to push up my branch by the end of the day.

 I don't know if you've seen it but Streams actually already has something
 like this, in the form of caching on materialized stores. If you pass in
>>> a
 "Materialized.withCachingEnabled()", you should be able to get a POC
 working by setting the max cache size pretty high and setting the commit
 interval for your desired rate:
 https://docs.confluent.io/current/streams/developer-
>>> guide/memory-mgmt.html#streams-developer-guide-memory-management
 .
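
A minimal sketch of the caching-based workaround John describes, using the standard Streams settings; the store name and sizes are arbitrary:

```
import java.util.Properties;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "rate-limited-agg");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
// A large cache plus a long commit interval means downstream updates are
// flushed roughly once per commit interval instead of once per input record.
props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 100 * 1024 * 1024L);
props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 30_000L);

// Enable caching on the store backing the aggregation ("counts" is an assumed name).
Materialized<String, Long, KeyValueStore<Bytes, byte[]>> materialized =
    Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("counts")
        .withCachingEnabled();
```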

 There are a couple of cases in joins and whatnot where it doesn't work,
 but for the aggregations we discussed, it should. The reason for KIP-328
>>> is
 to provide finer control and hopefully a more straightforward API.

 Let me know if that works, and I'll drop a message in here when I create
 the draft PR for KIP-328. I'd really appreciate your feedback.

 Thanks,
 -John

 On Wed, Jul 4, 2018 at 10:17 PM flaviost...@gmail.com <
 flaviost...@gmail.com> wrote:

> John, that was fantastic, man!
> Have you built any custom implementation of your KIP in your machine so
> that I could test it out here? I wish I could test it out.
> If you need any help implementing this feature, please tell me.
>
> Thanks.
>
> -Flávio Stutz
>
>
>
>
> On 2018/07/03 18:04:52, John Roesler  wrote:
>> Hi Flávio,
>> Thanks! I think that we can actually do this, but the API could be
> better.
>> I've included Java code below, but I'll copy and modify your example
>>> so
>> we're on the same page.
>>
>> EXERCISE 1:
>>   - The case is "total counting of events for a huge website"
>>   - Tasks from Application A will have something like:
>>  .stream(/site-events)
>>  .transform( re-key s.t. the new key is the partition id)
>>  .groupByKey() // you have to do this before count
>>  .count()
>>  

Re: [DISCUSS] KIP-335 Consider configurations for Kafka Streams

2019-11-24 Thread Matthias J. Sax
Moved this KIP into status "inactive". Feel free to resume at any time.

-Matthias

On 7/19/18 10:57 AM, Matthias J. Sax wrote:
> Thanks for the KIP.
> 
> I just double checked the code. It seems that `retries` and
> `retry.backoff.ms` are now only used in `GlobalStateManagerImpl`
> to catch a timeout exception and retry.
> 
> Thus, it seems reasonable to me to deprecate those configs and rewrite
> `GlobalStateManager` accordingly.
> 
> It also makes sense to me to introduce a global config to set the
> blocking timeout.
> 
> Note that there is already `poll.ms`, which is passed into `poll()` as the
> timeout value. Thus, the question arises whether `poll.ms` should be kept or
> replaced.
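
For reference, a small sketch of how these configs are set today (values are arbitrary; the constant names are those exposed by StreamsConfig at the time of this thread):

```
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
// Bounds each Consumer#poll() call inside the StreamThread loop.
props.put(StreamsConfig.POLL_MS_CONFIG, 100L);
// At the time of this thread, only honored by GlobalStateManagerImpl when it
// hits a TimeoutException while bootstrapping global stores.
props.put(StreamsConfig.RETRIES_CONFIG, 5);
props.put(StreamsConfig.RETRY_BACKOFF_MS_CONFIG, 1_000L);
```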
> 
> What name do you suggest for the new config?
> 
> I also think that the section about compatibility should be more detailed.
> 
> 
> 
> -Matthias
> 
> On 7/8/18 9:25 PM, Richard Yu wrote:
>>  Hi Matthias,
>> It would be nice to get your opinions on this.
>>
>> On Monday, July 9, 2018, 12:17:33 PM GMT+8, Richard Yu 
>>  wrote:  
>>  
>>  Hi all,
>>
>>
>> Ever since KIP-266 was concluded, there has been a pressing need to migrate 
>> Kafka Streams as well. For the link, please click here:
>>
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-335%3A+Consider+configurations+for+KafkaStreams
>>
>>
>> Thanks,
>> Richard Yu  
>>
> 





Re: [DISCUSS] KIP-362: Dynamic Session Window Support

2019-11-24 Thread Matthias J. Sax
Moved this KIP into status "inactive". Feel free to resume at any time.

-Matthias


On 1/9/19 10:18 PM, Guozhang Wang wrote:
> Hello Lei,
> 
> Just checking what's the current status of this KIP. We have a KIP deadline
> for 2.2 on 24th and wondering if this one may be able to make it.
> 
> 
> Guozhang
> 
> On Sat, Dec 15, 2018 at 1:01 PM Lei Chen  wrote:
> 
>> Sorry for the late reply Matthias. Have been busy with other work recently.
>> I'll restart the discussion and update the KIP accordingly.
>>
>> Lei
>>
>> On Tue, Nov 6, 2018 at 3:11 PM Matthias J. Sax 
>> wrote:
>>
>>> Any update on this KIP?
>>>
>>> On 9/20/18 3:30 PM, Matthias J. Sax wrote:
 Thanks for following up. Very nice examples!

 I think, that the window definition for Flink is semantically
 questionable. If there is only a single record, why is the window
 defined as [ts, ts+gap]? To me, this definition is not sound and seems
 to be arbitrary. To define the windows as [ts-gap,ts+gap] as you
>> mention
 would be semantically more useful -- still, I think that defining the
 window as [ts,ts] as we do currently in Kafka Streams is semantically
 the best.

 I have the impression that Flink only defines them differently because
 it solves the issues in the implementation. (I.e., an implementation
 detail leaks into the semantics, which is usually not desired.)

 However, I believe that we could change the implementation accordingly.
 We could store the windowed keys, as [ts-gap,ts+gap] (or [ts,ts+gap])
>> in
 RocksDB, but at API level we return [ts,ts]. This way, we can still
>> find
 all windows we need and provide the same deterministic behavior and
>> keep
 the current window boundaries on the semantic level (there is no need
>> to
 store the window start and/or end time). With this technique, we can
 also implement dynamic session gaps. I think, we would need to store
>> the
 used "gap" for each window, too. But again, this would be an
 implementation detail.

 Let's see what others think.

 One tricky question we would need to address is, how we can be backward
 compatible. I am currently working on KIP-258 that should help to
 address this backward compatibility issue though.


 -Matthias



 On 9/19/18 5:17 PM, Lei Chen wrote:
> Thanks Matthias. That makes sense.
>
> You're right that symmetric merge is necessary to ensure consistency.
>> On
> the other hand, I kinda feel it defeats the purpose of dynamic gap,
>>> which
> is to update the gap from the old value to the new value. The symmetric merge
> always honors the larger gap in both directions, rather than honoring the gap
> carried by the record with the larger timestamp. I wasn't able to find any
>>> semantic
> definitions w.r.t this particular aspect online, but spent some time
> looking into other streaming engines like Apache Flink.
>
> Apache Flink defines the window differently; it uses (start time, start
> time + gap).
>
> so our previous example (10, 10), (19,5),(15,3) in Flink's case will
>> be:
> [10,20]
> [19,24] => merged to [10,24]
> [15,18] => merged to [10,24]
>
> while example (15,3)(19,5)(10,10) will be
> [15,18]
> [19,24] => no merge
> [10,20] => merged to [10,24]
>
> however, since it only records the gap in the future direction, not the past, a
> late record might not trigger any merge, whereas with a symmetric merge it would.
> (7,2),(10, 10), (19,5),(15,3)
> [7,9]
> [10,20]
> [19,24] => merged to [10,24]
> [15,18] => merged to [10,24]
> so at the end
> two windows [7,9][10,24] are there.
>
> As you can see, in Flink, the gap semantic is more toward the way that
> a gap carried by one record only affects how this record merges with future
> records, e.g. a later event (T2, G2) will only be merged with (T1, G1) if
> T2 is less than T1+G1, but not when T1 is less than T2 - G2. Let's call
> this the "forward-merge" way of handling this. I just went through some
> source code, and if my understanding of Flink's implementation is incorrect,
> please correct me.
>
> On the other hand, if we want to do symmetric merge in Kafka Streams,
>> we
> can change the window definition to [start time - gap, start time +
>>> gap].
> This way the example (7,2),(10, 10), (19,5),(15,3) will be
> [5,9]
> [0,20] => merged to [0,20]
> [14,24] => merged to [0,24]
> [12,18] => merged to [0,24]
>
>  (19,5),(15,3)(7,2),(10, 10) will generate same result
> [14,24]
> [12,18] => merged to [12,24]
> [5,9] => no merge
> [0,20] => merged to [0,24]
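
A tiny sketch of the merge check implied by the [ts - gap, ts + gap] window definition used in the examples above; the method name is illustrative:

```
// Two session windows [start1, end1] and [start2, end2] (with start = ts - gap
// and end = ts + gap) are merged when their intervals overlap.
static boolean overlaps(final long start1, final long end1,
                        final long start2, final long end2) {
    return start1 <= end2 && start2 <= end1;
}

// e.g. for (7,2) -> [5,9] and (10,10) -> [0,20]: overlaps(5, 9, 0, 20) == true
```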
>
> Note that symmetric-merge would require us to change the way Kafka
> Streams fetches windows now: instead of fetching the range from timestamp-gap
>>> to
> timestamp+gap, we 

Re: [DISCUSS] KIP-378: Enable Dependency Injection for Kafka Streams handlers

2019-11-24 Thread Matthias J. Sax
Moved this KIP into status "inactive". Feel free to resume at any time.

-Matthias

On 4/26/19 2:00 AM, Matthias J. Sax wrote:
> What is the status of this KIP?
> 
> -Matthias
> 
> 
> On 3/18/19 5:11 PM, Guozhang Wang wrote:
>> Hello Wladimir,
>>
>> Thanks for the replies.
>>
>> What do you mean by "the community has opted for the second more complex
>> solution"? Could you elaborate a bit more?
>>
>> Guozhang
>>
>>
>> On Sun, Mar 17, 2019 at 3:45 PM Wladimir Schmidt  wrote:
>>
>>> Hi Matthias,
>>>
>>> Sorry, due to other commitments I haven't started the other
>>> implementation yet.
>>> In the meantime, the community has opted for the second, more complex
>>> solution.
>>> I already had ideas in this regard, but their elaboration needs to be
>>> discussed more.
>>>
>>>
>>> Best greetings,
>>> Wladimir
>>>
>>>
>>> On 21-Feb-19 09:33, Matthias J. Sax wrote:
 Hi Wladimir,

 what is the status of this KIP?

 -Matthias

 On 1/9/19 4:17 PM, Guozhang Wang wrote:
> Hello Wladimir,
>
> Just checking if you are still working on this KIP. We have the 2.2 KIP
> freeze deadline by 24th this month, and it'll be great to complete this
>>> KIP
> by then so 2.2.0 release could have this feature.
>
>
> Guozhang
>
> On Mon, Dec 3, 2018 at 11:26 PM Guozhang Wang 
>>> wrote:
>
>> Hello Wladimir,
>>
>> I've thought about the two options and I think I'm sold on the second
>> option, and actually I think it is better to generalize it to be
>>> potentially
>> used for other clients (producer, consumer) as well, since they also
>>> have
>> similar dependency injection requests for metrics reporter,
>>> partitioner,
>> partition assignor etc.
>>
>> So I'd suggest we add the following to AbstractConfig directly (note I
>> intentionally renamed the class to ConfiguredInstanceFactory to be
>>> used for
>> other clients as well):
>>
>> ```
>> AbstractConfig(ConfigDef definition, Map originals,
>> ConfiguredInstanceFactory, boolean doLog)
>> ```
>>
>> And then in StreamsConfig add:
>>
>> ```
>> StreamsConfig(Map props, ConfiguredInstanceFactory)
>> ```
>>
>> which would call the above AbstractConfig constructor (we can leave it to the
>> core team to decide when they want to add it for producer and consumer);
>>
>> And in KafkaStreams / TopologyTestDriver we can add one overloaded
>> constructor each that includes all the parameters including the
>> ConfiguredInstanceFactory --- for those who only want `factory` but not
>> `client-suppliers` for example, they can set it to `null` and the
>>> streams
>> library will just use the default one.
>>
>>
>> Guozhang
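
Purely as an illustration of the shape being discussed (ConfiguredInstanceFactory does not exist in Kafka; this is a hypothetical sketch, not the proposed API):

```
import java.util.Map;

// Hypothetical sketch only: illustrates the kind of hook being discussed,
// which AbstractConfig could consult instead of reflectively invoking a
// no-arg constructor when instantiating configured classes.
public interface ConfiguredInstanceFactory {
    <T> T getInstance(Class<T> clazz, Map<String, Object> configs);
}

// Hypothetical usage mirroring the overloads proposed above:
//   new KafkaStreams(topology, props, myConfiguredInstanceFactory);
```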
>>
>>
>> On Sun, Dec 2, 2018 at 12:13 PM Wladimir Schmidt 
>> wrote:
>>
>>> Hello Guozhang,
>>> sure, the first approach is very straight-forward and allows minimal
>>> changes to the Kafka Streams API.
>>> On the other hand, the second approach with the interface implementation
>>> looks cleaner to me.
>>> I totally agree that this should be first discussed before will be
>>> implemented.
>>>
>>> Thanks, Wladimir
>>>
>>>
>>> On 17-Nov-18 23:37, Guozhang Wang wrote:
>>>
>>> Hello folks,
>>>
>>> I'd like to revive this thread for discussion. After reading the
>>> previous
>>> emails I think I'm still a bit leaning towards re-enabling passing a
>>> StreamsConfig to Kafka Streams constructors, compared with a
>>> ConfiguredStreamsFactory as an additional parameter to overloaded
>>> KafkaStreams constructors: although the former seems less clean as
>>> it
>>> requires users to read through the usage of AbstractConfig to know
>>> how to
>>> use it in their frameworks, this to me is a solvable problem through
>>> documentation, plus AbstractConfig is a public interface already and
>>> hence
>>> the additional ConfiguredStreamsFactory to me is really a bit
>>> overlapping
>>> in functionality.
>>>
>>>
>>> Guozhang
>>>
>>>
>>>
>>> On Sun, Oct 21, 2018 at 1:41 PM Wladimir Schmidt 
>>>  wrote:
>>>
>>>
>>> Hi Damian,
>>>
>>> The first approach was added only because it had been initially
>>> proposed
>>> in my pull request,
>>> which started a discussion and thus, the KIP-378 was born.
>>>
>>> Yes, I would like to have something "injectable". In this regard, a
>>> `ConfiguredStreamsFactory` (name is a subject to discussion)
>>> is a good option to be introduced into `KafkaStreams` constructor.
>>>
>>> Even though I consider the second approach to be cleaner, it
>>> involves a
>>> certain amount of refactoring of the streams library.
>>> The first approach, on the contrary, adds (or removes deprecated
>>> annotation, if the method has not been removed yet) only additional
>>> 

Re: KIP-406: GlobalStreamThread should honor custom reset policy

2019-11-24 Thread Matthias J. Sax
Moved this KIP into status "inactive". Feel free to resume at any time.

-Matthias

On 12/17/18 1:49 PM, Richard Yu wrote:
> Hi Matthias,
> 
> It would be great if we got your input on this.
> 
> 
> On Sun, Dec 16, 2018 at 3:06 PM Richard Yu 
> wrote:
> 
>> Hi everybody,
>>
>> There is a new KIP regarding the resilience of GlobalStreamThread which
>> could be seen below:
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-406%3A+GlobalStreamThread+should+honor+custom+reset+policy
>>
>> We are considering the addition of some new reset policies. It would be
>> great if you could pitch in!
>>
>> Thanks,
>> Richard Yu
>>
> 





Re: [DISCUSS] KIP-408: Add Asynchronous Processing to Kafka Streams

2019-11-24 Thread Matthias J. Sax
Moved this KIP into status "inactive". Feel free to resume at any time.

-Matthias

On 1/4/19 3:42 PM, Richard Yu wrote:
> Hi all,
> 
> Just want to hear some opinions on this KIP from the PMCs. It would be nice
> if we got input from them.
> Don't want to drag this KIP for too long! :)
> 
> Hope we get some input :)
> 
> Thanks,
> Richard
> 
> On Thu, Jan 3, 2019 at 8:26 PM Richard Yu 
> wrote:
> 
>> Hi Boyang,
>>
>> Interesting article, although something crossed my mind. When skipping bad
>> records, we couldn't go back to them to process again and guarantee ordering,
>> i.e. both exactly-once and at-least-once would not be supported, only
>> at-most-once. Also, in Kafka, when it comes to individually acking every
>> single record, the resulting latency is horrible (from what I heard). We
>> actually discussed something like this in
>> https://issues.apache.org/jira/browse/KAFKA-7432. It might give you some
>> insight since it is a related issue.
>>
>> I hope this helps,
>> Richard
>>
>>
>>
>>
>> On Thu, Jan 3, 2019 at 7:29 PM Boyang Chen  wrote:
>>
>>> Hey Richard,
>>>
>>> thanks for the explanation. Recently I read an interesting blog post<
>>> https://streaml.io/blog/pulsar-streaming-queuing> from Apache Pulsar
>>> (written a long time ago), where they define the concept of individual acks,
>>> which means we could skip records and leave certain records on the
>>> queue for later processing. This should be something similar to KIP-408
>>> which also shares some motivations for us to invest.
>>>
>>> Boyang
>>>
>>> 
>>> From: Richard Yu 
>>> Sent: Friday, January 4, 2019 5:42 AM
>>> To: dev@kafka.apache.org
>>> Subject: Re: [DISCUSS] KIP-408: Add Asynchronous Processing to Kafka
>>> Streams
>>>
>>> Hi all,
>>>
>>> Just bumping this KIP. Would be great if we got some discussion.
>>>
>>>
>>> On Sun, Dec 30, 2018 at 5:13 PM Richard Yu 
>>> wrote:
>>>
 Hi all,

 I made some recent changes to the KIP. It should be more relevant with
>>> the
 issue now (involves Processor API in detail).
 It would be great if you could comment.

 Thanks,
 Richard

 On Wed, Dec 26, 2018 at 10:01 PM Richard Yu >>>
 wrote:

> Hi all,
>
> Just changing the title of the KIP. Discovered it wasn't right.
> Thats about it. :)
>
> On Mon, Dec 24, 2018 at 7:57 PM Richard Yu >>>
> wrote:
>
>> Sorry, just making a correction.
>>
>> Even if we are processing records out of order, we will still have to
>> checkpoint offset ranges.
>> So it doesn't really change anything even if we are doing in-order
>> processing.
>>
>> Thinking this over, I'm leaning slightly towards maintaining the
>> ordering guarantee.
>> Although when implementing this change, there might be some kinks that
>> we have not thought about which could throw a monkey wrench into the
>>> works.
>>
>> But definitely worth trying out,
>> Richard
>>
>> On Mon, Dec 24, 2018 at 6:51 PM Richard Yu <
>>> yohan.richard...@gmail.com>
>> wrote:
>>
>>> Hi Boyang,
>>>
>>> I could see where you are going with this. Well, I suppose I should
>>> have added this to alternatives, but I might as well mention it now.
>>>
>>> It had crossed my mind that we consider returning in-order even if
>>> there are multiple threads processing on the same thread. But for
>>> this to
>>> happen, we must block for the offsets in-between which have not been
>>> processed yet. For example, offsets 1-50 are being processed by
>>> thread1,
>>> while the offsets 51 - 100 are being processed by thread2. We will
>>> have to
>>> wait for thread1 to finish processing its offsets first before we
>>> return
>>> the records processed by thread2. So in other words, once thread1 is
>>> done,
>>> thread2's work up to that point will be returned in one go, but not
>>> before
>>> that.
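
A minimal sketch of the "wait for the earlier segment before emitting the later one" idea described above, using plain futures; process() and emit() are hypothetical helpers and all names are illustrative:

```
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative only: two workers process disjoint offset segments in parallel,
// but results are emitted in offset order by waiting on the earlier segment first.
final ExecutorService pool = Executors.newFixedThreadPool(2);

final CompletableFuture<List<String>> first =
    CompletableFuture.supplyAsync(() -> process(1, 50), pool);    // offsets 1-50
final CompletableFuture<List<String>> second =
    CompletableFuture.supplyAsync(() -> process(51, 100), pool);  // offsets 51-100

// Emit segment 1 as soon as it is done, then wait for segment 2 and emit it
// in one go, so downstream consumers still observe offset order.
first.thenAccept(segment -> emit(segment))
     .thenCompose(ignored -> second)
     .thenAccept(segment -> emit(segment))
     .join();

pool.shutdown();
```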
>>>
>>> I suppose this could work, but the client will have to wait some time
>>> before the advantages of multithreaded processing can be seen (i.e.
>>> the
>>> first thread has to finish processing its segment of the records
>>> first
>>> before any others are returned to guarantee ordering). Another point
>>> I
>>> would like to make is that the threads are *asynchronous. *So for us
>>> to know when a thread is done processing a certain segment, we will
>>> probably have a similar policy to how getMetadataAsync() works (i.e.
>>> have a
>>> parent thread be notified of when the children threads are done).
>>> [image: image.png]
>>> Just pulling this from the KIP. But instead, we would apply this to
>>> metadata segments instead of just a callback.
>>> I don't know whether or not the tradeoffs are acceptable to the
>>> client.
>>> Ordering could be guaranteed, but it would be hard to do. For
>>> example, if
>>> there was a crash, we might 

[jira] [Resolved] (KAFKA-7996) KafkaStreams does not pass timeout when closing Producer

2019-11-24 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-7996.

Resolution: Fixed

> KafkaStreams does not pass timeout when closing Producer
> 
>
> Key: KAFKA-7996
> URL: https://issues.apache.org/jira/browse/KAFKA-7996
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.1.0
>Reporter: Patrik Kleindl
>Priority: Major
>  Labels: needs-kip
>
> KIP WIP: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-459%3A+Improve+KafkaStreams%23close]
> [https://confluentcommunity.slack.com/messages/C48AHTCUQ/convo/C48AHTCUQ-1550831721.026100/]
> We are running 2.1 and have a case where the shutdown of a streams 
> application takes several minutes
>  I noticed that although we call streams.close with a timeout of 30 seconds 
> the log says
>  [Producer 
> clientId=…-8be49feb-8a2e-4088-bdd7-3c197f6107bb-StreamThread-1-producer] 
> Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms.
> Matthias J Sax [vor 3 Tagen]
>  I just checked the code, and yes, we don't provide a timeout for the 
> producer on close()...
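
For context, a sketch of the close call the report describes (topology construction omitted; the timeout value is the one from the report):

```
import java.time.Duration;
import org.apache.kafka.streams.KafkaStreams;

// Given an already-constructed KafkaStreams instance `streams`:
streams.start();

// The 30-second bound applies to the stream threads; per this report, the
// internally created producer was still closed without a timeout, so the
// overall shutdown could take much longer than requested.
boolean cleanShutdown = streams.close(Duration.ofSeconds(30));
```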



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (KAFKA-7996) KafkaStreams does not pass timeout when closing Producer

2019-11-24 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax reopened KAFKA-7996:


> KafkaStreams does not pass timeout when closing Producer
> 
>
> Key: KAFKA-7996
> URL: https://issues.apache.org/jira/browse/KAFKA-7996
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.1.0
>Reporter: Patrik Kleindl
>Priority: Major
>  Labels: needs-kip
>
> KIP WIP: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-459%3A+Improve+KafkaStreams%23close]
> [https://confluentcommunity.slack.com/messages/C48AHTCUQ/convo/C48AHTCUQ-1550831721.026100/]
> We are running 2.1 and have a case where the shutdown of a streams 
> application takes several minutes
>  I noticed that although we call streams.close with a timeout of 30 seconds 
> the log says
>  [Producer 
> clientId=…-8be49feb-8a2e-4088-bdd7-3c197f6107bb-StreamThread-1-producer] 
> Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms.
> Matthias J Sax [vor 3 Tagen]
>  I just checked the code, and yes, we don't provide a timeout for the 
> producer on close()...



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] KIP-459: Improve KafkaStreams#close

2019-11-24 Thread Matthias J. Sax
Moved this KIP into status "inactive". Feel free to resume at any time.

-Matthias

On 7/9/19 3:51 PM, Dongjin Lee wrote:
> Hi Matthias,
> 
> Have you thought about this issue?
> 
> Thanks,
> Dongjin
> 
> On Wed, Jun 19, 2019 at 5:07 AM Dongjin Lee  wrote:
> 
>> Hello.
>>
>> I just uploaded the draft implementation of the three proposed
>> alternatives.
>>
>> - Type A: define a close timeout constant -
>> https://github.com/dongjinleekr/kafka/tree/feature/KAFKA-7996-a
>> - Type B: Provide a new configuration option, 'close.wait.ms' -
>> https://github.com/dongjinleekr/kafka/tree/feature/KAFKA-7996-b
>> - Type C: Extend KafkaStreams constructor to support a close timeout
>> parameter -
>> https://github.com/dongjinleekr/kafka/tree/feature/KAFKA-7996-c
>>
>> As you can see in the branches, Types B and C are a little bit more
>> complicated than A, since they provide an option to control the timeout used to
>> close the AdminClient and Producer. To provide that functionality, B and C
>> share a refactoring commit, which replaces KafkaStreams#create with
>> KafkaStreams.builder. That is why they consist of two commits.
>>
>> Please have a look when you are free.
>>
>> Thanks,
>> Dongjin
>>
>> On Thu, May 30, 2019 at 12:26 PM Dongjin Lee  wrote:
>>
>>> I just updated the KIP document reflecting what I found about the clients
>>> API inconsistency and Matthias's comments. Since it is now obvious that
>>> modifying the default close timeout for the client is not feasible, the
>>> updated document proposes totally different alternatives. (see Rejected
>>> Alternatives section)
>>>
>>>
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-459%3A+Improve+KafkaStreams%23close
>>>
>>> Please have a look when you are free. All kinds of feedback are welcome!
>>>
>>> Thanks,
>>> Dongjin
>>>
>>> On Fri, May 24, 2019 at 1:07 PM Matthias J. Sax 
>>> wrote:
>>>
 Thanks for digging into the background.

 I think it would be good to get feedback from people who work on
 clients, too.


 -Matthias


 On 5/19/19 12:58 PM, Dongjin Lee wrote:
> Hi Matthias,
>
> I investigated the inconsistencies between `close` semantics of
 `Producer`,
> `Consumer`, and `AdminClient`. And I found that this inconsistency was
> intended. Here are the details:
>
> The current `KafkaConsumer#close`'s default timeout, 30 seconds, was
> introduced in KIP-102 (0.10.2.0)[^1]. According to the document, there
 are
> two differences between `Consumer` and `Producer`;
>
> 1. `Consumer`s don't have large requests.
> 2. `Consumer#close` is affected by consumer coordinator, whose close
> operation is affected by `request.timeout.ms`.
>
> For the above reasons, the Consumer's default timeout was set a little bit
> differently.[^3] (It was done by Rajini.)
>
> In the initial proposal, I proposed to change the default timeout
 value of
> `[Producer, AdminClient]#close` from `Long.MAX_VALUE` to another one;
> however, since it is now clear that the current implementation is
 totally
> reasonable, *it seems like changing the approach to just providing a
> close timeout to the clients used by KafkaStreams is a more suitable
> one.*[^4]
> This approach has the following advantages:
>
> 1. The problem described in KAFKA-7996 is now resolved, since the Producer
 doesn't
> hang during its `close` operation.
> 2. We don't have to change the semantics of `Producer#close`,
> `AdminClient#close` nor `KafkaStreams#close`. As you pointed out, these
> kinds of changes are hard for users to reason about.
>
> How do you think?
>
> Thanks,
> Dongjin
>
> [^1]:
>
 https://cwiki.apache.org/confluence/display/KAFKA/KIP-102+-+Add+close+with+timeout+for+consumers
> [^2]: "The existing close() method without a timeout will attempt to
 close
> the consumer gracefully with a default timeout of 30 seconds. This is
> different from the producer default of Long.MAX_VALUE since consumers
 don't
> have large requests."
> [^3]: 'Rejected Alternatives' section explains it.
> [^4]: In the case of Streams reset tool, `KafkaAdminClient`'s close
 timeout
> is 60 seconds (KIP-198):
 https://github.com/apache/kafka/pull/3927/files
>
> On Fri, Apr 26, 2019 at 5:16 PM Matthias J. Sax 
> wrote:
>
>> Thanks for the KIP.
>>
>> Overall, I agree with the sentiment of the KIP. The current semantics
 of
>> `KafkaStreams#close(timeout)` are not well defined. Also the general
>> client inconsistencies are annoying.
>>
>>
>>> This KIP does not make any change to public interfaces; however, it makes a
>> subtle change to the existing API's semantics. If this KIP is
 accepted,
>> documenting these semantics with as much detail as possible may be much
 better.
>>
>> I am not sure if I would call this change 

Re: [DISCUSS] KIP-463: Auto-configure serdes passed alongside TopologyBuilder

2019-11-24 Thread Matthias J. Sax
Moved this KIP into status "inactive". Feel free to resume at any time.

-Matthias


On 4/25/19 7:01 PM, Richard Yu wrote:
> Hi all,
> 
> Due to issues that were discovered during the first attempt to implement a
> solution for KAFKA-3729 (
> https://issues.apache.org/jira/browse/KAFKA-3729),
> a KIP was thought to be necessary. There are a couple of alternatives by
> which we can proceed, so it would be good if we could discern the pros and
> cons of each approach.
> 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-463%3A+Auto-configure+non-default+Serdes+passed+alongside+the+TopologyBuilder
> 
> Hope this helps,
> Richard Yu
> 





Re: [DISCUSS] KIP-508: Make Suppression State Queriable

2019-11-24 Thread Matthias J. Sax
Moved this KIP into status "inactive". Feel free to resume at any time.

-Matthias

On 8/16/19 7:08 AM, Dongjin Lee wrote:
> Hi all,
> 
> I would like to start a discussion of KIP-508, making suppression state
> queriable. Please give it a read when you are free and give some feedback.
> 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-508%3A+Make+Suppression+State+Queriable
> 
> Thanks,
> Dongjin
> 





[jira] [Reopened] (KAFKA-8403) Suppress needs a Materialized variant

2019-11-24 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax reopened KAFKA-8403:


> Suppress needs a Materialized variant
> -
>
> Key: KAFKA-8403
> URL: https://issues.apache.org/jira/browse/KAFKA-8403
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: John Roesler
>Priority: Major
>  Labels: needs-kip
>
> The newly added KTable Suppress operator lacks a Materialized variant, which 
> would be useful if you wanted to query the results of the suppression.
> Suppression results will eventually match the upstream results, but the 
> intermediate distinction may be meaningful for some applications. For 
> example, you could want to query only the final results of a windowed 
> aggregation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-8403) Suppress needs a Materialized variant

2019-11-24 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-8403.

Resolution: Fixed

> Suppress needs a Materialized variant
> -
>
> Key: KAFKA-8403
> URL: https://issues.apache.org/jira/browse/KAFKA-8403
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: John Roesler
>Priority: Major
>  Labels: needs-kip
>
> The newly added KTable Suppress operator lacks a Materialized variant, which 
> would be useful if you wanted to query the results of the suppression.
> Suppression results will eventually match the upstream results, but the 
> intermediate distinction may be meaningful for some applications. For 
> example, you could want to query only the final results of a windowed 
> aggregation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Build failed in Jenkins: kafka-trunk-jdk11 #984

2019-11-24 Thread Apache Jenkins Server
See 


Changes:

[github] MINOR: Remove unnecessary license generation code in wrapper.gradle


--
[...truncated 2.75 MB...]
org.apache.kafka.streams.TestTopicsTest > testKeyValuesToMapWithNull STARTED

org.apache.kafka.streams.TestTopicsTest > testKeyValuesToMapWithNull PASSED

org.apache.kafka.streams.TestTopicsTest > testNonExistingOutputTopic STARTED

org.apache.kafka.streams.TestTopicsTest > testNonExistingOutputTopic PASSED

org.apache.kafka.streams.TestTopicsTest > testMultipleTopics STARTED

org.apache.kafka.streams.TestTopicsTest > testMultipleTopics PASSED

org.apache.kafka.streams.TestTopicsTest > testKeyValueList STARTED

org.apache.kafka.streams.TestTopicsTest > testKeyValueList PASSED

org.apache.kafka.streams.TestTopicsTest > 
shouldNotAllowToCreateOutputWithNullDriver STARTED

org.apache.kafka.streams.TestTopicsTest > 
shouldNotAllowToCreateOutputWithNullDriver PASSED

org.apache.kafka.streams.TestTopicsTest > testValueList STARTED

org.apache.kafka.streams.TestTopicsTest > testValueList PASSED

org.apache.kafka.streams.TestTopicsTest > testRecordList STARTED

org.apache.kafka.streams.TestTopicsTest > testRecordList PASSED

org.apache.kafka.streams.TestTopicsTest > testNonExistingInputTopic STARTED

org.apache.kafka.streams.TestTopicsTest > testNonExistingInputTopic PASSED

org.apache.kafka.streams.TestTopicsTest > testKeyValuesToMap STARTED

org.apache.kafka.streams.TestTopicsTest > testKeyValuesToMap PASSED

org.apache.kafka.streams.TestTopicsTest > testRecordsToList STARTED

org.apache.kafka.streams.TestTopicsTest > testRecordsToList PASSED

org.apache.kafka.streams.TestTopicsTest > testKeyValueListDuration STARTED

org.apache.kafka.streams.TestTopicsTest > testKeyValueListDuration PASSED

org.apache.kafka.streams.TestTopicsTest > testInputToString STARTED

org.apache.kafka.streams.TestTopicsTest > testInputToString PASSED

org.apache.kafka.streams.TestTopicsTest > testTimestamp STARTED

org.apache.kafka.streams.TestTopicsTest > testTimestamp PASSED

org.apache.kafka.streams.TestTopicsTest > testWithHeaders STARTED

org.apache.kafka.streams.TestTopicsTest > testWithHeaders PASSED

org.apache.kafka.streams.TestTopicsTest > testKeyValue STARTED

org.apache.kafka.streams.TestTopicsTest > testKeyValue PASSED

org.apache.kafka.streams.TestTopicsTest > 
shouldNotAllowToCreateTopicWithNullTopicName STARTED

org.apache.kafka.streams.TestTopicsTest > 
shouldNotAllowToCreateTopicWithNullTopicName PASSED

org.apache.kafka.streams.MockProcessorContextTest > 
shouldStoreAndReturnStateStores STARTED

org.apache.kafka.streams.MockProcessorContextTest > 
shouldStoreAndReturnStateStores PASSED

org.apache.kafka.streams.MockProcessorContextTest > 
shouldCaptureOutputRecordsUsingTo STARTED

org.apache.kafka.streams.MockProcessorContextTest > 
shouldCaptureOutputRecordsUsingTo PASSED

org.apache.kafka.streams.MockProcessorContextTest > shouldCaptureOutputRecords 
STARTED

org.apache.kafka.streams.MockProcessorContextTest > shouldCaptureOutputRecords 
PASSED

org.apache.kafka.streams.MockProcessorContextTest > 
fullConstructorShouldSetAllExpectedAttributes STARTED

org.apache.kafka.streams.MockProcessorContextTest > 
fullConstructorShouldSetAllExpectedAttributes PASSED

org.apache.kafka.streams.MockProcessorContextTest > 
shouldCaptureCommitsAndAllowReset STARTED

org.apache.kafka.streams.MockProcessorContextTest > 
shouldCaptureCommitsAndAllowReset PASSED

org.apache.kafka.streams.MockProcessorContextTest > 
shouldThrowIfForwardedWithDeprecatedChildName STARTED

org.apache.kafka.streams.MockProcessorContextTest > 
shouldThrowIfForwardedWithDeprecatedChildName PASSED

org.apache.kafka.streams.MockProcessorContextTest > 
shouldThrowIfForwardedWithDeprecatedChildIndex STARTED

org.apache.kafka.streams.MockProcessorContextTest > 
shouldThrowIfForwardedWithDeprecatedChildIndex PASSED

org.apache.kafka.streams.MockProcessorContextTest > 
shouldCaptureApplicationAndRecordMetadata STARTED

org.apache.kafka.streams.MockProcessorContextTest > 
shouldCaptureApplicationAndRecordMetadata PASSED

org.apache.kafka.streams.MockProcessorContextTest > 
shouldCaptureRecordsOutputToChildByName STARTED

org.apache.kafka.streams.MockProcessorContextTest > 
shouldCaptureRecordsOutputToChildByName PASSED

org.apache.kafka.streams.MockProcessorContextTest > shouldCapturePunctuator 
STARTED

org.apache.kafka.streams.MockProcessorContextTest > shouldCapturePunctuator 
PASSED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > shouldReturnIsOpen 
STARTED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > shouldReturnIsOpen 
PASSED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > 
shouldDeleteAndReturnPlainValue STARTED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > 
shouldDeleteAndReturnPlainValue PASSED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest >