date:20190620

See 


Changes:

[wangguoz] KAFKA-8569: integrate warning message under static membership (#6972)

--
[...truncated 2.52 MB...]

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareKeyValueWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordForCompareValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordForCompareValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentForCompareKeyValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentForCompareKeyValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentForCompareKeyValueTimestampWithProducerRecord STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentForCompareKeyValueTimestampWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueIsEqualWithNullForCompareValueWithProducerRecord STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueIsEqualWithNullForCompareValueWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReverseForCompareValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReverseForCompareValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReverseForCompareValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReverseForCompareValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfTimestampIsDifferentForCompareValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfTimestampIsDifferentForCompareValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueAndTimestampIsEqualWithNullForCompareKeyValueTimestampWithProducerRecord
 STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueAndTimestampIsEqualWithNullForCompareKeyValueTimestampWithProducerRecord
 PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareKeyValueWithProducerRecord STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareKeyValueWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareKeyValueTimestampWithProducerRecord 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareKeyValueTimestampWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordWithExpectedRecordForCompareValueTimestamp 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordWithExpectedRecordForCompareValueTimestamp 
PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullExpectedRecordForCompareValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullExpectedRecordForCompareValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReversForCompareKeyValueTimestampWithProducerRecord
 STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReversForCompareKeyValueTimestampWithProducerRecord
 PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordForCompareKeyValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordForCompareKeyValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReversForCompareKeyValueWithProducerRecord 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReversForCompareKeyValueWithProducerRecord 
PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueAndTimestampIsEqualForCompareKeyValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueAndTimestampIsEqualForCompareKeyValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest >

[jira] [Created] (KAFKA-8576) Consumer failed to join the coordinator

2019-06-20 Thread yanrui (JIRA)

yanrui created KAFKA-8576:
-

 Summary: Consumer failed to join the coordinator
 Key: KAFKA-8576
 URL: https://issues.apache.org/jira/browse/KAFKA-8576
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 2.1.1
Reporter: yanrui
 Attachments: image-2019-06-21-10-52-38-762.png

Environment：
 single node kafka （2.1.1）6g 4c
 client（0.11.0.1 ）
 consumer group number：1170
After running for a while, consumers can’t join the coordinator.The report is 
not the correct coordinator when describing the group.The consumer is  endless 
trap in the discovery group,then marking  the group coordinator dead .Ask for 
help analyzing the reason, thank you very much

   



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Re: Partition Reassignment in Cloud

2019-06-20 Thread Varun Kumar

The idea is to expand the cluster, i.e increase the number of brokers and 
repartition the partition to distribute the load. So at the end of process both 
the broker needs to be up and running.
Would like to know if it is possible to move disk/log.dir from one broker to 
another and updating metadata(partition assignment) will work in all cases ? I 
tried this is Kafka 2.2 and it worked, but not sure if there are any special 
cases where it might fail.

From: George Li 
Sent: Friday, June 21, 2019 2:17 AM
To: dev
Subject: Re: Partition Reassignment in Cloud

The new broker host meta.properties file can have the broker.id set to the 
original broker_id (with original host shutdown/decommission), the new host has 
the storage of the original host (either by copying or by change the network 
storage mount from original to new host).  This way, it saves time running 
reassignments to change old_broker_id => new_broker_id ?

On Wednesday, June 19, 2019, 9:19:58 PM PDT, Varun Kumar 
 wrote:

 Hi

I have been trying a small experiment with partition reassignment in cloud. 
where instead of copying data between brokers using network, I moved the disk 
between the 2 brokers and ran the partition reassignment. This actually 
increased the speed of partition reassignment significantly. (As it had to 
catchup/fetch only down time data)


I tried this experiment this in Kafka 2.2.1 and it worked. I validated the 
data-consistency using "kafka-replica-verification.sh" script.

Few more details of the experiment:

  *  Both the brokers from and to which the partitions are moving had to be 
shutdown.
  *  All the partitions in the disk are moved at once to new broker.
  *  Had to update broker.id property in meta.properties file for the moved log 
directory before broker restart .
  *  Had to re-balance Leaders after brokers restart.

Can you please let me know if this approach will work in production ? Is there 
any scenario where it might truncate/delete all data in moved disk and copy 
complete partition over network ?

Thanks
Varun

Build failed in Jenkins: kafka-trunk-jdk11 #647

See 


Changes:

[wangguoz] KAFKA-8569: integrate warning message under static membership (#6972)

--
[...truncated 2.52 MB...]

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueAndTimestampIsEqualWithNullForCompareValueTimestampWithProducerRecord
 PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareKeyValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareKeyValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullForCompareKeyValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullForCompareKeyValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentForCompareKeyValueWithProducerRecord STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentForCompareKeyValueWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareKeyValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareKeyValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfTimestampIsDifferentForCompareValueTimestampWithProducerRecord 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfTimestampIsDifferentForCompareValueTimestampWithProducerRecord 
PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueIsEqualForCompareKeyValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueIsEqualForCompareKeyValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordForCompareKeyValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordForCompareKeyValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueIsEqualWithNullForCompareKeyValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueIsEqualWithNullForCompareKeyValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueAndTimestampIsEqualForCompareValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueAndTimestampIsEqualForCompareValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullExpectedRecordForCompareKeyValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullExpectedRecordForCompareKeyValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareValueTimestampWithProducerRecord STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareValueTimestampWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReversForCompareKeyValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReversForCompareKeyValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullExpectedRecordForCompareKeyValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullExpectedRecordForCompareKeyValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareValueWithProducerRecord STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareValueWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueIsEqualForCompareValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueIsEqualForCompareValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordWithExpectedRecordForCompareKeyValueTimestamp 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordWithExpectedRecordForCompareKeyValueTimestamp 
PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentForCompareKeyValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentForCompareKeyValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordWithExpectedRecordForCompareKeyValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordWithExpectedRecordForCompareKeyValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullForCompareKeyValueWithProducerRecord STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullForCompareKeyVa

Jenkins build is back to normal : kafka-trunk-jdk8 #3738

See

Re: [DISCUSS] KIP-467: Augment ProduceResponse error messaging

Hi Jun,

Thanks for your comments.

1. Yeah I think APIException would not make a distinct call here anymore,
and what really matters is the RetriableException. Updated the wiki.

2. Makes sense. Updated the wiki.

3. My current thoughts is to return the first ever hit error for that
partition, and also the encoded error_records would only includes the
relative offsets of records that hitting that exact error as well.


Guozhang

On Wed, Jun 12, 2019 at 3:38 PM Jun Rao  wrote:

> Hi, Guozhang,
>
> Thanks for the KIP. A few comments below.
>
> 1. "If the error_records is not empty and the error code is not API
> exception and is not retriable, still retry by creating a new batch ".
> InvalidTimestampException
> is an ApiException. It seems we should still retry the non-error records in
> the batch.
>
> 2. error_records => [INT64] : Since we don't have more than 2 billion
> messages per batch, we can just use INT32. It would also be useful to
> describe what those numbers are. I guess they are the relative offset in
> the batch?
>
> 3. It's possible that a batch of records hit more than one type of error
> for different records, which error code and error message should the server
> return to the client?
>
> Jun
>
> On Sat, May 11, 2019 at 12:44 PM Guozhang Wang  wrote:
>
> > Hello everyone,
> >
> > I'd like to start a discussion thread on this newly created KIP to
> improve
> > error communication and handling for producer response:
> >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-467%3A+Augment+ProduceResponse+error+messaging+for+specific+culprit+records
> >
> > Thanks,
> > --
> > -- Guozhang
> >
>


-- 
-- Guozhang

Re: [VOTE] KIP-474: To deprecate WindowStore#put(key, value)

2019-06-20 Thread Bill Bejeck

Sorry for being late to the party.

I've reviewed the KIP and it's a great step in the right direction.

+1 (binding)

Thanks,
Bill

On Thu, Jun 20, 2019 at 11:42 AM John Roesler  wrote:

> Not that it changes the outcome, but I'm also +1 (nonbinding).
>
> Been wanting to do this for a while, thanks Omkar!
>
> Now, if we can just get one more binding vote...
>
> -John
>
> On Sun, Jun 9, 2019 at 1:04 AM omkar mestry  wrote:
> >
> > Hi,
> >
> > Ok the voting thread is still open.
> >
> > Thanks Regards
> > Omkar Mestry
> >
> > On Sun, 9 Jun 2019 at 11:33 AM, Matthias J. Sax 
> > wrote:
> >
> > > Omkar,
> > >
> > > a KIP is accepted if there are 3 binding votes for it. So far, there
> are
> > > only 2. Hence, the KIP is not accepted yet. The vote stays open.
> > >
> > > Just wait a little longer, until you get one more binding vote.
> > >
> > >
> > > -Matthias
> > >
> > > On 6/7/19 3:33 AM, omkar mestry wrote:
> > > > Thanks to everyone who have voted! I'm closing this vote thread with
> a
> > > > final count:
> > > >
> > > > binding +1: 2 (Matthias, Guozhang)
> > > >
> > > > non-binding +1: 2 (Boyang, Dongjin)
> > > >
> > > > Thanks & Regards
> > > > Omkar Mestry
> > > >
> > > > On Tue, Jun 4, 2019 at 3:05 AM Guozhang Wang 
> wrote:
> > > >
> > > >> +1 (binding).
> > > >>
> > > >> On Sat, Jun 1, 2019 at 3:19 PM Matthias J. Sax <
> matth...@confluent.io>
> > > >> wrote:
> > > >>
> > > >>> +1 (binding)
> > > >>>
> > > >>> On 5/31/19 10:58 PM, Dongjin Lee wrote:
> > >  +1 (non-binding).
> > > 
> > >  Thanks,
> > >  Dongjin
> > > 
> > >  <
> > > >>>
> > > >>
> > >
> https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail&utm_term=icon
> > > 
> > >  Virus-free.
> > >  www.avast.com
> > >  <
> > > >>>
> > > >>
> > >
> https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail&utm_term=link
> > > 
> > >  <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2>
> > > 
> > >  On Sat, Jun 1, 2019 at 2:45 PM Boyang Chen 
> > > >> wrote:
> > > 
> > > > Thanks omkar for taking the initiative, +1 (non-binding).
> > > >
> > > > 
> > > > From: omkar mestry 
> > > > Sent: Saturday, June 1, 2019 1:40 PM
> > > > To: dev@kafka.apache.org
> > > > Subject: [VOTE] KIP-474: To deprecate WindowStore#put(key, value)
> > > >
> > > > Hi all,
> > > >
> > > > Since we seem to have an agreement in the discussion I would
> like to
> > > > start the vote on KIP-474.
> > > >
> > > > KIP 474 :-
> > > >
> > > >>>
> > > >>
> > >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=115526545
> > > >
> > > > Thanks & Regards
> > > > Omkar Mestry
> > > >
> > > 
> > > 
> > > >>>
> > > >>>
> > > >>
> > > >> --
> > > >> -- Guozhang
> > > >>
> > > >
> > >
> > >
>

[jira] [Created] (KAFKA-8575) Investigate cleaning up task suspension

2019-06-20 Thread Sophie Blee-Goldman (JIRA)

Sophie Blee-Goldman created KAFKA-8575:
--

 Summary: Investigate cleaning up task suspension
 Key: KAFKA-8575
 URL: https://issues.apache.org/jira/browse/KAFKA-8575
 Project: Kafka
  Issue Type: Sub-task
Reporter: Sophie Blee-Goldman


With KIP-429 the suspend/resume of tasks may have minimal gains while adding a 
lot of complexity and potential bugs. We should consider removing/cleaning it 
up.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Re: [DISCUSS] KIP-478 Strongly Typed Processor API

Thanks for the feedback, Guozhang and Matthias,

Regarding motivation: I'll update the wiki. Briefly:
* Any processor can benefit. Imagine a pure user of the ProcessorAPI
who has very complex processing logic. I have seen several processor
implementation that are hundreds of lines long and call
`context.forward` in many different locations and branches. In such an
implementation, it would be very easy to have a bug in a rarely used
branch that forwards the wrong kind of value. This would structurally
prevent that from happening.
* Also, anyone who heavily uses the ProcessorAPI would likely have
developed helper methods to wire together processors, just as we have
in the DSL implementation. This change would enable them to ensure at
compile time that they are actually wiring together compatible types.
This was actually _my_ original motivation, since I found it very
difficult and time consuming to follow the Streams DSL internal
builders.

Regarding breaking the source compatibility of Processor: I would
_love_ to side-step the naming problem, but I really don't know if
it's excusable to break compatibility. I suspect that our oldest and
dearest friends are using the ProcessorAPI in some form or another,
and all their source code would break. It sucks to have to create a
whole new interface to get around this, but it feels like the right
thing to do. Would be nice to get even more feedback on this point,
though.

Regarding the types of stores, as I said in my response to Sophie,
it's not an issue.

Regarding the change to StreamsBuilder, it doesn't pin the types in
any way, since all the types are bounded by Object only, and there are
no extra constraints between arguments (each type is used only once in
one argument). But maybe I missed the point you were asking about.
Since the type takes generic paramters, we should allow users to pass
in parameterized arguments. Otherwise, they would _have to_ give us a
raw type, and they would be forced to get a "rawtyes" warning from the
compiler. So, it's our obligation in any API that accepts a
parameterized-type parameter to allow people to actually pass a
parameterized type, even if we don't actually use the parameters.

The naming question is a complex one, as I took pains to detail
previously. Please don't just pick out one minor point, call it weak,
and then claim that it invalidates the whole decision. I don't think
there's a clear best choice, so I'm more than happy for someone to
advocate for renaming the class instead of the package. Can you
provide some reasons why you think that would be better?

Regarding the deprecated methods, you're absolutely right. I'll update the KIP.

Thanks again for all the feedback!
-John

On Thu, Jun 20, 2019 at 4:34 PM Matthias J. Sax  wrote:
>
> Just want to second what Sophie said about the stores. The type of a
> used stores is completely independent of input/output types.
>
> This related to change `addGlobalStore()` method. Why do you want to pin
> the types? In fact, people request the ability to filter() and maybe
> even map() the data before they are put into the global store. Limiting
> the types seems to be a step backward here?
>
>
>
> Also, the pack name is questionable.
>
> > This wouldn't be the first project to do something like this...
>
> Not a strong argument. I would actually propose to not a a new package,
> but just a new class `TypedProcessor`.
>
>
> For `ProcessorContext#forward` methods -- some of those methods are
> already deprecated. While the will still be affected, it would be worth
> to mark them as deprecated in the wiki page, too.
>
>
> @Guozhang: I dont' think we should break source compatibility in a minor
> release.
>
>
>
> -Matthias
>
>
>
> On 6/20/19 1:43 PM, Guozhang Wang wrote:
> > Hi John,
> >
> > Thanks for KIP! I've a few comments below:
> >
> > 1. So far the "Motivation" section is very general, and the only concrete
> > example that I have in mind is `TransformValues#punctuate`. Do we have any
> > other concrete issues that drive this KIP? If not then I feel better to
> > narrow the scope of this KIP to:
> >
> > 1.a) modifying ProcessorContext only with the output types on forward.
> > 1.b) modifying Transformer signature to have generics of ProcessorContext,
> > and then lift the restricting of not using punctuate: if user did not
> > follow the enforced typing and just code without generics, they will get
> > warning at compile time and get run-time error if they forward wrong-typed
> > records, which I think would be acceptable.
> >
> > I feel this would be a good solution for this specific issue; again, feel
> > free to update the wiki page with other known issues that cannot be
> > resolved.
> >
> > 2. If, we want to go with the current scope then my next question would be,
> > how much breakage we would introducing if we just modify the Processor
> > signature directly? My feeling is that DSL users would be most likely not
> > affected and PAPI users only need to modify a few

Build failed in Jenkins: kafka-trunk-jdk11 #646

See 


Changes:

[jason] MINOR: Fix Partition::toString method (#6971)

--
[...truncated 2.43 MB...]
org.apache.kafka.streams.scala.kstream.SuppressedTest > 
Suppressed.untilWindowCloses should produce the correct suppression STARTED

org.apache.kafka.streams.scala.kstream.SuppressedTest > 
Suppressed.untilWindowCloses should produce the correct suppression PASSED

org.apache.kafka.streams.scala.kstream.SuppressedTest > 
Suppressed.untilTimeLimit should produce the correct suppression STARTED

org.apache.kafka.streams.scala.kstream.SuppressedTest > 
Suppressed.untilTimeLimit should produce the correct suppression PASSED

org.apache.kafka.streams.scala.kstream.SuppressedTest > BufferConfig.maxRecords 
should produce the correct buffer config STARTED

org.apache.kafka.streams.scala.kstream.SuppressedTest > BufferConfig.maxRecords 
should produce the correct buffer config PASSED

org.apache.kafka.streams.scala.kstream.SuppressedTest > BufferConfig.maxBytes 
should produce the correct buffer config STARTED

org.apache.kafka.streams.scala.kstream.SuppressedTest > BufferConfig.maxBytes 
should produce the correct buffer config PASSED

org.apache.kafka.streams.scala.kstream.SuppressedTest > BufferConfig.unbounded 
should produce the correct buffer config STARTED

org.apache.kafka.streams.scala.kstream.SuppressedTest > BufferConfig.unbounded 
should produce the correct buffer config PASSED

org.apache.kafka.streams.scala.kstream.SuppressedTest > BufferConfig should 
support very long chains of factory methods STARTED

org.apache.kafka.streams.scala.kstream.SuppressedTest > BufferConfig should 
support very long chains of factory methods PASSED

org.apache.kafka.streams.scala.kstream.ConsumedTest > Create a Consumed should 
create a Consumed with Serdes STARTED

org.apache.kafka.streams.scala.kstream.ConsumedTest > Create a Consumed should 
create a Consumed with Serdes PASSED

org.apache.kafka.streams.scala.kstream.ConsumedTest > Create a Consumed with 
timestampExtractor and resetPolicy should create a Consumed with Serdes, 
timestampExtractor and resetPolicy STARTED

org.apache.kafka.streams.scala.kstream.ConsumedTest > Create a Consumed with 
timestampExtractor and resetPolicy should create a Consumed with Serdes, 
timestampExtractor and resetPolicy PASSED

org.apache.kafka.streams.scala.kstream.ConsumedTest > Create a Consumed with 
timestampExtractor should create a Consumed with Serdes and timestampExtractor 
STARTED

org.apache.kafka.streams.scala.kstream.ConsumedTest > Create a Consumed with 
timestampExtractor should create a Consumed with Serdes and timestampExtractor 
PASSED

org.apache.kafka.streams.scala.kstream.ConsumedTest > Create a Consumed with 
resetPolicy should create a Consumed with Serdes and resetPolicy STARTED

org.apache.kafka.streams.scala.kstream.ConsumedTest > Create a Consumed with 
resetPolicy should create a Consumed with Serdes and resetPolicy PASSED

org.apache.kafka.streams.scala.kstream.ProducedTest > Create a Produced should 
create a Produced with Serdes STARTED

org.apache.kafka.streams.scala.kstream.ProducedTest > Create a Produced should 
create a Produced with Serdes PASSED

org.apache.kafka.streams.scala.kstream.ProducedTest > Create a Produced with 
timestampExtractor and resetPolicy should create a Consumed with Serdes, 
timestampExtractor and resetPolicy STARTED

org.apache.kafka.streams.scala.kstream.ProducedTest > Create a Produced with 
timestampExtractor and resetPolicy should create a Consumed with Serdes, 
timestampExtractor and resetPolicy PASSED

org.apache.kafka.streams.scala.kstream.GroupedTest > Create a Grouped should 
create a Grouped with Serdes STARTED

org.apache.kafka.streams.scala.kstream.GroupedTest > Create a Grouped should 
create a Grouped with Serdes PASSED

org.apache.kafka.streams.scala.kstream.GroupedTest > Create a Grouped with 
repartition topic name should create a Grouped with Serdes, and repartition 
topic name STARTED

org.apache.kafka.streams.scala.kstream.GroupedTest > Create a Grouped with 
repartition topic name should create a Grouped with Serdes, and repartition 
topic name PASSED

org.apache.kafka.streams.scala.kstream.JoinedTest > Create a Joined should 
create a Joined with Serdes STARTED

org.apache.kafka.streams.scala.kstream.JoinedTest > Create a Joined should 
create a Joined with Serdes PASSED

org.apache.kafka.streams.scala.kstream.JoinedTest > Create a Joined should 
create a Joined with Serdes and repartition topic name STARTED

org.apache.kafka.streams.scala.kstream.JoinedTest > Create a Joined should 
create a Joined with Serdes and repartition topic name PASSED

org.apache.kafka.streams.scala.kstream.KTableTest > filter a KTable should 
filter records satisfying the predicate STARTED

org.apache.kafka.streams.scala.kstream.KTableTest > filter a KTable should 
filter records satisfying the predicat

Re: [DISCUSS] KIP-439: Deprecate Interface WindowStoreIterator

I have performance concerns about this proposal, because each time a
window/session store is accessed, a new (unnecessary) object needs to be
created, and accessing the store in the processors is on the hot code path.


-Matthias


On 6/20/19 10:57 AM, John Roesler wrote:
> Hi again, all,
> 
> After wrestling with some other issues around the window interface,
> I'd propose that we consider normalizing the WindowStore (and
> SessionStore) interfaces with respect to KeyValueStore. We can't
> actually make the classes related because they'll clash on the
> deprecated methods, but we can align them. Doing so would be an
> ergonomic advantage, since all our provided store interfaces would
> have the same "shape".
> 
> Specifically, this means that we would deprecate any method that takes
> a raw "K key, long windowStartTime" or returns a raw "K key", and
> instead always take as arguments Windowed keys and also return
> Windowed results. This proposal already comes close to this, by
> re-using the KeyValueIterator>, so it would just be a
> few tweaks to "perfect" it.
> 
> What do you think?
> 
> Thanks,
> -John
> 
> On Tue, May 28, 2019 at 2:58 PM Matthias J. Sax  wrote:
>>
>> Let's see what other think about (2)
>>
>>
>>
>> (3) Interesting. My reasoning follows the code:
>>
>> For example, `RocksDbKeyValueBytesStoreSupplier#metricsScope()`returns
>> "rocksdb-state" and this is concatenated later in `MeteredKeyValueStore`
>> that adds a tag
>>
>>   key:   metricScope + "-id"
>>   value: name()  // this is the store name
>>
>> Hence, the `-state` part comes from the supplier and thus it seems to be
>> part of `[store-type]` (ie, `store-type` == metricScope()` -- not sure
>> if this interpretation holds).
>>
>> If it's not part of `[store-type]` the supplier should not return it as
>> `metricScope()` IMHO, but the `MeteredKeyValueStore` should add it
>> together with `-id` suffix from my understanding.
>>
>> Thoughts?
>>
>>
>>
>> (5) Renaming to `streams` might imply that we need to rename _all_
>> metrics, not just the store metrics that are affected, to avoid
>> inconsistencies.
>>
>> If we really want to rename it, I would rather suggest to do it as part
>> of KIP-444, that is a larger rework anyway. It seems to be out of scope
>> for this KIP.
>>
>>
>> -Matthias
>>
>>
>> On 5/28/19 12:30 PM, Bruno Cadonna wrote:
>>> Hi Matthias,
>>>
>>> 2)
>>> Yes, this is how I meant it.
>>> I am still in favour of `value-by-...` instead of `get`, because it
>>> gives you immediately the meaning of the metric without the need to
>>> know the API of the stores.
>>>
>>> 3)
>>> I asked to avoid misunderstandings because in KIP-444, `-state-` is
>>> stated explicitly in the tag.
>>>
>>> 5)
>>> The question was not about correctness. I know that we use `stream` in
>>> the metrics group. The question was rather if we want to change it.
>>> The streams metrics interface is called `StreamsMetrics`. I understand
>>> that this has low priority. Just wanted to mention it.
>>>
>>> Best,
>>> Bruno
>>>
>>>
>>> On Tue, May 28, 2019 at 9:08 PM Matthias J. Sax  
>>> wrote:

 Thanks Bruno.

 I updated the KIP without changing the metric name yet -- want to
 discuss this first.



 1) I am also ok to keep `all()`


 2) That's a good idea. To make sure I understand your suggestion
 correctly, below the mapping between metric name an methods.
 (Some metric names are use by multiple stores):

 (ReadOnly)KeyValueStore:

>> value-by-key : ReadOnlyKeyValueStore#get(K key)

 -> replaces `get` metric

 I am actually not sure if I like this. Just keeping `get` instead of
 `value-by-key`, `value-by-key-and-time` seems more reasonable to me.



 (ReadOnly)WindowStore:

>> value-by-key-and-time : ReadOnlyWindowStore#get(K key, long 
>> windowStartTime)

 I think a simple `get` might be better

>> range-by-key-and-time-range : ReadOnlyWindowStore#range(K key, Instant 
>> from, Instant to)
>> range-by-key-and-time-range : WindowStore#range(K key, long fromTime, 
>> long toTime)

>> range-by-key-range-and-time-range : ReadOnlyWindowStore#range(K from, K 
>> to, Instant from, Instant to)
>> range-by-key-range-and-time-range : WindowStore#range(K from, K to, long 
>> fromTime, long toTime)

>> range-by-time-range : ReadOnlyWindowStore#range(Instant from, Instant to)
>> range-by-time-range : WindowStore#range(long fromTime, long toTime)

>> range-all : ReadOnlyWindowStore#all()


 (ReadOnly)SessionStore:

>> range-by-key : ReadOnlySessionStore#range(K key)

>> range-by-key-range : ReadOnlySessionStore#range(K from, K to)
>> range-by-key-and-time-range : SessionStore#range(K key, long 
>> earliestSessionEndTime, long latestSessionStartTime)

>> range-by-key-range-and-time-range : SessionStore#range(K keyFrom,

Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

For encoding the list-type: I see John's point about re-encoding the
list-type redundantly. However, I also don't like the idea that the
Deserializer returns a fixed type...

Maybe it's best allow users to specify the target list type on
deserialization via config?

Similar for the primitive types: I don't think we need to encode the
type size, but users could specify the type on the deserializer (via a
config again)?


About generics: nesting could be arbitrarily deep. Hence, I doubt we can
support this and a cast will be necessary at some point in the user code.



-Matthias



On 6/20/19 1:21 PM, John Roesler wrote:
> Hey Daniyar,
> 
> Thanks for looking at it!
> 
> Something like your screenshot is more along the lines of what I was
> thinking. Sorry, but I didn't follow what you mean, how would that not
> be "vanilla java"?
> 
> Unfortunately the deserializer needs more information, though. For
> example, what if the inner type is a Map? The serde could
> only be used to produce a LinkedList, thus, we'd still need an
> inner serde, like you have in the KIP (Serde innerSerde).
> 
> Something more like Serde> = Serdes.listSerde(
>   /**list type**/ LinkedList.class,
>   /**inner serde**/ new MyRecordSerde()
> )
> 
> And in configuration, it's something like:
> default.key.serde: org...ListSerde
> default.key.list.serde.type: java.util.LinkedList
> default.key.list.serde.inner: com.mycompany.MyRecordSerde
> 
> 
> What do you think?
> Thanks,
> -John
> 
> On Thu, Jun 20, 2019 at 2:46 PM Development  > wrote:
> 
> Hey John,
> 
> I gave read about TypeReference. It could work for the list serde.
> However, it is not directly
> supported: https://github.com/FasterXML/jackson-databind/issues/1490
> The only way is to pass an actual class object into the constructor,
> something like:
> 
> It could be an option, but not a pretty one. What do you think of my
> approach to use vanilla java and canonical class name? (As described
> previously)
> 
> Best,
> Daniyar Yeralin
> 
>> On Jun 20, 2019, at 2:45 PM, Development > > wrote:
>>
>> Hi John,
>>
>> Thank you for your input! Yes, my idea looks a little bit over
>> engineered :)
>>
>> I also wanted to see a feedback from Mathias as well since he gave
>> me an idea about storing fixed/variable size entries.
>>
>> Best,
>> Daniyar Yeralin
>>
>>> On Jun 18, 2019, at 6:06 PM, John Roesler >> > wrote:
>>>
>>> Hi Daniyar,
>>>
>>> That's a very clever solution!
>>>
>>> One observation is that, now, this is what we might call a
>>> polymorphic
>>> serde. That is, you're detecting the actual concrete type and then
>>> promising to produce the exact same concrete type on read. There are
>>> some inherent problems with this approach, which in general require
>>> some kind of  schema registry (not necessarily Schema Registry, just
>>> any registry for schemas) to solve.
>>>
>>> Notice that every serialized record has quite a bit of duplicated
>>> information: the concrete type as well as a byte to indicate whether
>>> the value type is a fixed size, and, if so, an integer to
>>> indicate the
>>> actual size. These constitute a schema, of sorts, because they
>>> tell us
>>> later how exactly to deserialize the data. Unfortunately, this
>>> information is completely redundant. In all likelihood, the
>>> information will be exactly the same for every record in the topic.
>>> This problem is essentially the core motivation for serializations
>>> like Avro: to move the schema outside of the serialization itself, so
>>> that the records won't contain so much redundant information.
>>>
>>> In this light, I'm wondering if it makes sense to go back to
>>> something
>>> like what you had earlier in which you don't support perfectly
>>> preserving the concrete type for _this_ serde, but instead just
>>> support deserializing to _some_ List. Then, you could defer full,
>>> perfect, type preservation to serdes that have an external system in
>>> which to register their type information.
>>>
>>> There does exist an alternative, if we really do want to preserve the
>>> concrete type (which does seem kind of nice). You can add a
>>> configuration option specifically for the serde to configure what the
>>> list type will be, and maybe what the element type is, as well.
>>>
>>> As far as "related work" goes, you might be interested to take a look
>>> at how Jackson can be configured to deserialize into a specific,
>>> arbitrarily nested, generically parameterized class structure.
>>> Specifically, you might find
>>> 
>>> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
>>> interesting.
>>>
>>> Thanks,
>>> -John
>>>
>>>

[jira] [Created] (KAFKA-8574) EOS race condition during task transition leads to LocalStateStore truncation in Kafka Streams 2.0.1

2019-06-20 Thread William Greer (JIRA)

William Greer created KAFKA-8574:

Summary: EOS race condition during task transition leads to
LocalStateStore truncation in Kafka Streams 2.0.1
Key: KAFKA-8574
URL: https://issues.apache.org/jira/browse/KAFKA-8574
Project: Kafka
Issue Type: Bug
Components: streams
Affects Versions: 2.0.1
Reporter: William Greer

*Overview*
While using EOS in Kafka Stream there is a race condition where the checkpoint
file is written by the previous owning thread (Thread A) after the new owning
thread (Thread B) reads the checkpoint file. Thread B then starts a restoration
since no checkpoint file was found. A re-balance occurs before Thread B
completes the restoration and a third Thread (Thread C) becomes the owning
thread (Thread C) reads the checkpoint file written by Thread A which does not
correspond to the current state of the RocksDB state store. When this race
condition occurs the state store will have the most recent records and some
amount of the oldest records but will be missing some amount of records in
between. If A->Z represents the entire changelog to the present then when this
scenario occurs the state store would contain records [A->K and Y->Z] where the
state store is missing records K->Y.
This race condition is possible due to dirty writes and dirty reads of the
checkpoint file.
*Example:*
Thread refers to a Kafka Streams StreamThread [0]
Thread A, B and C are running in the same JVM in the same streams application.
Scenario:
Thread-A is in RUNNING state and up to date on partition 1.
Thread-A is suspended on 1. This does not write a checkpoint file because EOS
is enabled [1]
Thread-B is assigned to 1
Thread-B does not find checkpoint in StateManager [2]
Thread-A is assigned a different partition. Task writes suspended tasks
checkpoints to disk. Checkpoint for 1 is written. [3]
Thread-B deletes LocalStore and starts restoring. The deletion of the
LocalStore does not delete checkpoint file. [4]
Thread-C is revoked
Thread-A is revoked
Thread-B is revoked from the assigned status. Does not write a checkpoint file
- Note Thread-B never reaches the running state, it remains in the
PARTITIONS_ASSIGNED state until it transitions to the PARTITIONS_REVOKED state
Thread-C is assigned 1
Thread-C finds checkpoint in StateManager. This checkpoint corresponds to where
Thread-A left the state store for partition 1 at and not where Thread-B left
the state store at.
Thread-C begins restoring from checkpoint. The state store is missing an
unknown number of records at this point
Thread-B is assigned does not write a checkpoint file for partition 1, because
it had not reached a running status before being revoked
[0]
https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamThread.java
[1]
https://github.com/apache/kafka/blob/2.0/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamTask.java#L522-L553
[2]
https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/ProcessorStateManager.java#L98
[3]
https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/TaskManager.java#L104-L105
&
https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/AssignedTasks.java#L316-L331
[4]
https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StoreChangelogReader.java#L228
&
https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/AbstractStateManager.java#L62-L123
Specifically
https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/AbstractStateManager.java#L107-L119
is where the state store is deleted but the checkpoint file is not.
*How we recovered:*
1. Deleted the impacted state store. This triggered multiple exceptions and
initiated a re-balance.
*Possible approaches to address this issue:*
1. Add a collection of global task locks for concurrency protection of the
checkpoint file. With the lock for suspended tasks being released after
closeNonAssignedSuspendedTasks and the locks being acquired after lock release
for the assigned tasks.
2. Delete checkpoint file in EOS when partitions are revoked. This doesn't
address the race condition but would make it so that the checkpoint file would
never be ahead of the LocalStore in EOS, this would increase the likelihood of
triggering a full restoration of a LocalStore on partition movement between
threads on one host.
3. Configure task stickiness for StreamThreads. E.G. if a host with multiple
StreamThreads is assigned a task the host had before prefer to assign the task
to the thread on the host that had the task befor

Re: [DISCUSS] KIP-478 Strongly Typed Processor API

Just want to second what Sophie said about the stores. The type of a
used stores is completely independent of input/output types.

This related to change `addGlobalStore()` method. Why do you want to pin
the types? In fact, people request the ability to filter() and maybe
even map() the data before they are put into the global store. Limiting
the types seems to be a step backward here?



Also, the pack name is questionable.

> This wouldn't be the first project to do something like this...

Not a strong argument. I would actually propose to not a a new package,
but just a new class `TypedProcessor`.


For `ProcessorContext#forward` methods -- some of those methods are
already deprecated. While the will still be affected, it would be worth
to mark them as deprecated in the wiki page, too.


@Guozhang: I dont' think we should break source compatibility in a minor
release.



-Matthias



On 6/20/19 1:43 PM, Guozhang Wang wrote:
> Hi John,
> 
> Thanks for KIP! I've a few comments below:
> 
> 1. So far the "Motivation" section is very general, and the only concrete
> example that I have in mind is `TransformValues#punctuate`. Do we have any
> other concrete issues that drive this KIP? If not then I feel better to
> narrow the scope of this KIP to:
> 
> 1.a) modifying ProcessorContext only with the output types on forward.
> 1.b) modifying Transformer signature to have generics of ProcessorContext,
> and then lift the restricting of not using punctuate: if user did not
> follow the enforced typing and just code without generics, they will get
> warning at compile time and get run-time error if they forward wrong-typed
> records, which I think would be acceptable.
> 
> I feel this would be a good solution for this specific issue; again, feel
> free to update the wiki page with other known issues that cannot be
> resolved.
> 
> 2. If, we want to go with the current scope then my next question would be,
> how much breakage we would introducing if we just modify the Processor
> signature directly? My feeling is that DSL users would be most likely not
> affected and PAPI users only need to modify a few lines on class
> declaration. I feel it worth doing some research on this part and then
> decide if we really want to bite the bullet of duplicated Processor /
> ProcessorSupplier classes for maintaining compatibility.
> 
> 
> Guozhang
> 
> 
> 
> On Wed, Jun 19, 2019 at 12:21 PM John Roesler  wrote:
> 
>> Hi all,
>>
>> In response to the feedback so far, I changed the package name from
>> `processor2` to `processor.generic`.
>>
>> Thanks,
>> -John
>>
>> On Mon, Jun 17, 2019 at 4:49 PM John Roesler  wrote:
>>>
>>> Thanks for the feedback, Sophie!
>>>
>>> I actually felt a little uneasy when I wrote that remark, because it's
>>> not restricted at all in the API, it's just available to you if you
>>> choose to give your stores and context the same parameters. So, I
>>> think your use case is valid, and also perfectly permissable under the
>>> current KIP. Sorry for sowing confusion on my own discussion thread!
>>>
>>> I'm not crazy about the package name, either. I went with it only
>>> because there's seemingly nothing special about the new package except
>>> that it can't have the same name as the old one. Otherwise, the
>>> existing "processor" and "Processor" names for the package and class
>>> are perfectly satisfying. Rather than pile on additional semantics, it
>>> seemed cleaner to just add a number to the package name.
>>>
>>> This wouldn't be the first project to do something like this... Apache
>>> Commons, for example, has added a "2" to the end of some of their
>>> packages for exactly the same reason.
>>>
>>> I'm open to any suggestions. For example, we could do something like
>>> org.apache.kafka.streams.typedprocessor.Processor or
>>> org.apache.kafka.streams.processor.typed.Processor , which would have
>>> just about the same effect. One microscopic thought is that, if
>>> there's another interface in the "processor" package that we wish to
>>> do the same thing to, would _could_ pile it in to "processor2", but we
>>> couldn't do the same if we use a package that has "typed" in the name,
>>> unless that change is _also_ related to types in some way. But this
>>> seems like a very minor concern.
>>>
>>> What's your preference?
>>> -John
>>>
>>> On Mon, Jun 17, 2019 at 3:56 PM Sophie Blee-Goldman 
>> wrote:

 Hey John, thanks for writing this up! I like the proposal but there's
>> one
 point that I think may be too restrictive:

 "A processor that happens to use a typed store is actually emitting the
 same types that it is storing."

 I can imagine someone could want to leverage this new type safety
>> without
 also limiting how they can interact with/use their store. As an
>> (admittedly
 contrived) example, say you have an input stream of purchases of a
>> certain
 type (entertainment, food, etc), and on seeing a new record you want to
 output how many typ

Re: [VOTE] KIP-471: Expose RocksDB Metrics in Kafka Streams

+1 (binding)


On 6/20/19 11:53 AM, Guozhang Wang wrote:
> +1 (binding)
> 
> Thanks Bruno!
> 
> Would also be interested to see how much overhead it may incur by enabling
> DEBUG metrics now, if it is huge we may consider doing finer-grained
> metrics enabling, but that would be another follow-up task.
> 
> Guozhang
> 
> On Wed, Jun 19, 2019 at 1:37 PM Patrik Kleindl  wrote:
> 
>> +1 (non-binding)
>> Thanks!
>> Best regards
>> Patrik
>>
>>> Am 19.06.2019 um 21:55 schrieb Bill Bejeck :
>>>
>>> +1 (binding)
>>>
>>> Thanks,
>>> Bill
>>>
 On Wed, Jun 19, 2019 at 1:19 PM John Roesler  wrote:

 I'm +1 (nonbinding)

 Thanks!
 -John

> On Tue, Jun 18, 2019 at 7:48 AM Bruno Cadonna 
>> wrote:
>
> Hi,
>
> I would like to start the voting on KIP-471:
>

>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-471%3A+Expose+RocksDB+Metrics+in+Kafka+Streams
>
> You can find the discussion here:
>

>> https://lists.apache.org/thread.html/125bdd984fe0667962018da6ce10bce6d5895c5103955a8e4c730fef@%3Cdev.kafka.apache.org%3E
>
> Best,
> Bruno

>>
> 
> 



signature.asc
Description: OpenPGP digital signature

Re: [DISCUSS] KIP-470: TopologyTestDriver test input and output usability improvements

Hello Jukka,

Thanks for writing the KIP, I have a couple of quick questions:

1) Is "TestRecord" an existing class that you propose to piggy-back on?
Right now we have a scala TestRecord case class but I doubt that was your
proposal, or are you proposing to add a new Java class?

2) Would the new API only allow a single input / output topic with
`createInput/OutputTopic`? If not, when we call pipeInput how to determine
which topic this record should be pipe to?


Guozhang

On Mon, Jun 17, 2019 at 1:34 PM John Roesler  wrote:

> Woah, I wasn't aware of that Hamcrest test style. Awesome!
>
> Thanks for the updates. I look forward to hearing what others think.
>
> -John
>
> On Mon, Jun 17, 2019 at 4:12 AM Jukka Karvanen
>  wrote:
> >
> > Wiki page updated:
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-470%3A+TopologyTestDriver+test+input+and+output+usability+improvements
> >
> >
> > ClientRecord removed and replaced with TestRecord in method calls.
> > TestRecordFactory removed (time tracking functionality to be included to
> > TestInputTopic)
> > OutputVerifier deprecated
> > TestRecord topic removed and getters added
> >
> > Getters in TestRecord enable writing test ignoring selected fields with
> > hamcrest like this:
> >
> > assertThat(outputTopic.readRecord(), allOf(
> > hasProperty("key", equalTo(1L)),
> > hasProperty("value", equalTo("Hello")),
> > hasProperty("headers", equalTo(headers;
> >
> >
> > Jukka
> >
> > la 15. kesäk. 2019 klo 1.10 John Roesler (j...@confluent.io) kirjoitti:
> >
> > > Sounds good. Thanks as always for considering my feedback!
> > > -John
> > >
> > > On Fri, Jun 14, 2019 at 12:12 PM Jukka Karvanen
> > >  wrote:
> > > >
> > > > Ok, I will modify KIP Public Interface in a wiki based on the
> feedback.
> > > >
> > > > TestRecordFactory / ConsumerRecordFactory was used by TestInputTopic
> with
> > > > the version I had with KIP456, but maybe I can merge That
> functionality
> > > to
> > > > InputTopic or  TestRecordFactory   can kept non public maybe moving
> it to
> > > > internals package.
> > > >
> > > > I will make the proposal with a slim down interface.
> > > > I don't want to go to so slim as you proposed with only TestRecord or
> > > > List, because you then still end up doing helper methods
> to
> > > > construct List of TestRecord.
> > > > The list of values is easier to write and clearer to read than if you
> > > need
> > > > to contruct list of TestRecords.
> > > >
> > > > For example:
> > > >
> > > > final List inputValues = Arrays.asList(
> > > > "Apache Kafka Streams Example",
> > > > "Using Kafka Streams Test Utils",
> > > > "Reading and Writing Kafka Topic"
> > > > );
> > > > inputTopic.pipeValueList(inputValues);
> > > >
> > > >
> > > > Let's check after the next iteration is it still worth reducing the
> > > methods.
> > > >
> > > >
> > > > Jukka
> > > >
> > > >
> > > > pe 14. kesäk. 2019 klo 18.27 John Roesler (j...@confluent.io)
> kirjoitti:
> > > >
> > > > > Thanks, Jukka,
> > > > >
> > > > > Ok, I buy this reasoning.
> > > > >
> > > > > Just to echo what I think I read, you would just drop ClientRecord
> > > > > from the proposal, and TestRecord would stand on its own, with the
> > > > > same methods and properties you proposed, and the "input topic"
> would
> > > > > accept TestRecord, and the "output topic" would produce TestRecord?
> > > > > Further, the "input and output topic" classes would internally
> handle
> > > > > the conversion to and from ConsumerRecord and ProducerRecord to
> pass
> > > > > to and from the TopologyTestDriver?
> > > > >
> > > > > This seems good to me.
> > > > >
> > > > > Since the object coming out of the "output topic" is much more
> > > > > ergonomic, I suspect we won't need the OutputVerifier at all. It
> was
> > > > > mostly needed because of all the extra junk in ProducerRecord you
> need
> > > > > to ignore. It seems better to just deprecate it. If in the future
> it
> > > > > turns out there is some actual use case for a verifier, we can
> have a
> > > > > very small KIP to add one. But reading your response, I suspect
> that
> > > > > existing test verification libraries would be sufficient on their
> own.
> > > > >
> > > > > Similarly, it seems like we can shrink the total interface by
> removing
> > > > > the TestRecordFactory from the proposal. If TestRecord already
> offers
> > > > > all the constructors you'd want, then the only benefit of the
> factory
> > > > > is to auto-increment the timestamps, but then again, the "input
> topic"
> > > > > can already do that (e.g., it can do it if the record timestamp is
> not
> > > > > set).
> > > > >
> > > > > Likewise, if the TestRecords are easy to create, then we don't need
> > > > > all the redundant methods in "input topic" to pipe values, or
> > > > > key/values, or key/value/timestamp, etc. We can do with just two
> > > > > methods, one for a single TestRecord and one for a collection of
> them.
> > >

Re: Partition Reassignment in Cloud

2019-06-20 Thread George Li

 The new broker host meta.properties file can have the broker.id set to the 
original broker_id (with original host shutdown/decommission), the new host has 
the storage of the original host (either by copying or by change the network 
storage mount from original to new host).  This way, it saves time running 
reassignments to change old_broker_id => new_broker_id ? 

On Wednesday, June 19, 2019, 9:19:58 PM PDT, Varun Kumar 
 wrote:  
 
 Hi

I have been trying a small experiment with partition reassignment in cloud. 
where instead of copying data between brokers using network, I moved the disk 
between the 2 brokers and ran the partition reassignment. This actually 
increased the speed of partition reassignment significantly. (As it had to 
catchup/fetch only down time data)


I tried this experiment this in Kafka 2.2.1 and it worked. I validated the 
data-consistency using "kafka-replica-verification.sh" script.

Few more details of the experiment:

  *  Both the brokers from and to which the partitions are moving had to be 
shutdown.
  *  All the partitions in the disk are moved at once to new broker.
  *  Had to update broker.id property in meta.properties file for the moved log 
directory before broker restart .
  *  Had to re-balance Leaders after brokers restart.

Can you please let me know if this approach will work in production ? Is there 
any scenario where it might truncate/delete all data in moved disk and copy 
complete partition over network ?

Thanks
Varun

Re: [DISCUSS] KIP-478 Strongly Typed Processor API

Hi John,

Thanks for KIP! I've a few comments below:

1. So far the "Motivation" section is very general, and the only concrete
example that I have in mind is `TransformValues#punctuate`. Do we have any
other concrete issues that drive this KIP? If not then I feel better to
narrow the scope of this KIP to:

1.a) modifying ProcessorContext only with the output types on forward.
1.b) modifying Transformer signature to have generics of ProcessorContext,
and then lift the restricting of not using punctuate: if user did not
follow the enforced typing and just code without generics, they will get
warning at compile time and get run-time error if they forward wrong-typed
records, which I think would be acceptable.

I feel this would be a good solution for this specific issue; again, feel
free to update the wiki page with other known issues that cannot be
resolved.

2. If, we want to go with the current scope then my next question would be,
how much breakage we would introducing if we just modify the Processor
signature directly? My feeling is that DSL users would be most likely not
affected and PAPI users only need to modify a few lines on class
declaration. I feel it worth doing some research on this part and then
decide if we really want to bite the bullet of duplicated Processor /
ProcessorSupplier classes for maintaining compatibility.


Guozhang



On Wed, Jun 19, 2019 at 12:21 PM John Roesler  wrote:

> Hi all,
>
> In response to the feedback so far, I changed the package name from
> `processor2` to `processor.generic`.
>
> Thanks,
> -John
>
> On Mon, Jun 17, 2019 at 4:49 PM John Roesler  wrote:
> >
> > Thanks for the feedback, Sophie!
> >
> > I actually felt a little uneasy when I wrote that remark, because it's
> > not restricted at all in the API, it's just available to you if you
> > choose to give your stores and context the same parameters. So, I
> > think your use case is valid, and also perfectly permissable under the
> > current KIP. Sorry for sowing confusion on my own discussion thread!
> >
> > I'm not crazy about the package name, either. I went with it only
> > because there's seemingly nothing special about the new package except
> > that it can't have the same name as the old one. Otherwise, the
> > existing "processor" and "Processor" names for the package and class
> > are perfectly satisfying. Rather than pile on additional semantics, it
> > seemed cleaner to just add a number to the package name.
> >
> > This wouldn't be the first project to do something like this... Apache
> > Commons, for example, has added a "2" to the end of some of their
> > packages for exactly the same reason.
> >
> > I'm open to any suggestions. For example, we could do something like
> > org.apache.kafka.streams.typedprocessor.Processor or
> > org.apache.kafka.streams.processor.typed.Processor , which would have
> > just about the same effect. One microscopic thought is that, if
> > there's another interface in the "processor" package that we wish to
> > do the same thing to, would _could_ pile it in to "processor2", but we
> > couldn't do the same if we use a package that has "typed" in the name,
> > unless that change is _also_ related to types in some way. But this
> > seems like a very minor concern.
> >
> > What's your preference?
> > -John
> >
> > On Mon, Jun 17, 2019 at 3:56 PM Sophie Blee-Goldman 
> wrote:
> > >
> > > Hey John, thanks for writing this up! I like the proposal but there's
> one
> > > point that I think may be too restrictive:
> > >
> > > "A processor that happens to use a typed store is actually emitting the
> > > same types that it is storing."
> > >
> > > I can imagine someone could want to leverage this new type safety
> without
> > > also limiting how they can interact with/use their store. As an
> (admittedly
> > > contrived) example, say you have an input stream of purchases of a
> certain
> > > type (entertainment, food, etc), and on seeing a new record you want to
> > > output how many types of purchase a shopper has made more than 5
> purchases
> > > of in the last month. Your state store will probably be holding some
> more
> > > complicated PurchaseHistory object (keyed by user), but your output is
> just
> > > a 
> > >
> > > I'm also not crazy about "processor2" as the package name ... not sure
> what
> > > a better one would be though (something with "typed"?)
> > >
> > > On Mon, Jun 17, 2019 at 12:47 PM John Roesler 
> wrote:
> > >
> > > > Hi all,
> > > >
> > > > I'd like to propose KIP-478 (
> https://cwiki.apache.org/confluence/x/2SkLBw
> > > > ).
> > > >
> > > > This proposal would add output type bounds to the Processor interface
> > > > in Kafka Streams, which enables static checking of a number of useful
> > > > properties:
> > > > * A processor B that consumes the output of processor A is actually
> > > > expecting the same types that processor A produces.
> > > > * A processor that happens to use a typed store is actually emitting
> > > > the same types that it is stor

Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Hey Daniyar,

Thanks for looking at it!

Something like your screenshot is more along the lines of what I was
thinking. Sorry, but I didn't follow what you mean, how would that not be
"vanilla java"?

Unfortunately the deserializer needs more information, though. For example,
what if the inner type is a Map? The serde could only be
used to produce a LinkedList, thus, we'd still need an inner serde,
like you have in the KIP (Serde innerSerde).

Something more like Serde> = Serdes.listSerde(
  /**list type**/ LinkedList.class,
  /**inner serde**/ new MyRecordSerde()
)

And in configuration, it's something like:
default.key.serde: org...ListSerde
default.key.list.serde.type: java.util.LinkedList
default.key.list.serde.inner: com.mycompany.MyRecordSerde


What do you think?
Thanks,
-John

On Thu, Jun 20, 2019 at 2:46 PM Development  wrote:

> Hey John,
>
> I gave read about TypeReference. It could work for the list serde.
> However, it is not directly supported:
> https://github.com/FasterXML/jackson-databind/issues/1490
> The only way is to pass an actual class object into the constructor,
> something like:
>
> It could be an option, but not a pretty one. What do you think of my
> approach to use vanilla java and canonical class name? (As described
> previously)
>
> Best,
> Daniyar Yeralin
>
> On Jun 20, 2019, at 2:45 PM, Development  wrote:
>
> Hi John,
>
> Thank you for your input! Yes, my idea looks a little bit over engineered
> :)
>
> I also wanted to see a feedback from Mathias as well since he gave me an
> idea about storing fixed/variable size entries.
>
> Best,
> Daniyar Yeralin
>
> On Jun 18, 2019, at 6:06 PM, John Roesler  wrote:
>
> Hi Daniyar,
>
> That's a very clever solution!
>
> One observation is that, now, this is what we might call a polymorphic
> serde. That is, you're detecting the actual concrete type and then
> promising to produce the exact same concrete type on read. There are
> some inherent problems with this approach, which in general require
> some kind of  schema registry (not necessarily Schema Registry, just
> any registry for schemas) to solve.
>
> Notice that every serialized record has quite a bit of duplicated
> information: the concrete type as well as a byte to indicate whether
> the value type is a fixed size, and, if so, an integer to indicate the
> actual size. These constitute a schema, of sorts, because they tell us
> later how exactly to deserialize the data. Unfortunately, this
> information is completely redundant. In all likelihood, the
> information will be exactly the same for every record in the topic.
> This problem is essentially the core motivation for serializations
> like Avro: to move the schema outside of the serialization itself, so
> that the records won't contain so much redundant information.
>
> In this light, I'm wondering if it makes sense to go back to something
> like what you had earlier in which you don't support perfectly
> preserving the concrete type for _this_ serde, but instead just
> support deserializing to _some_ List. Then, you could defer full,
> perfect, type preservation to serdes that have an external system in
> which to register their type information.
>
> There does exist an alternative, if we really do want to preserve the
> concrete type (which does seem kind of nice). You can add a
> configuration option specifically for the serde to configure what the
> list type will be, and maybe what the element type is, as well.
>
> As far as "related work" goes, you might be interested to take a look
> at how Jackson can be configured to deserialize into a specific,
> arbitrarily nested, generically parameterized class structure.
> Specifically, you might find
>
> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
> interesting.
>
> Thanks,
> -John
>
> On Mon, Jun 17, 2019 at 12:38 PM Development  wrote:
>
>
> bump
>
>
>
>

Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

2019-06-20 Thread Development

Hey John,

I gave read about TypeReference. It could work for the list serde. However, it 
is not directly supported: 
https://github.com/FasterXML/jackson-databind/issues/1490 

The only way is to pass an actual class object into the constructor, something 
like:


It could be an option, but not a pretty one. What do you think of my approach 
to use vanilla java and canonical class name? (As described previously)

Best,
Daniyar Yeralin

> On Jun 20, 2019, at 2:45 PM, Development  wrote:
> 
> Hi John,
> 
> Thank you for your input! Yes, my idea looks a little bit over engineered :)
> 
> I also wanted to see a feedback from Mathias as well since he gave me an idea 
> about storing fixed/variable size entries.
> 
> Best,
> Daniyar Yeralin
> 
>> On Jun 18, 2019, at 6:06 PM, John Roesler  wrote:
>> 
>> Hi Daniyar,
>> 
>> That's a very clever solution!
>> 
>> One observation is that, now, this is what we might call a polymorphic
>> serde. That is, you're detecting the actual concrete type and then
>> promising to produce the exact same concrete type on read. There are
>> some inherent problems with this approach, which in general require
>> some kind of  schema registry (not necessarily Schema Registry, just
>> any registry for schemas) to solve.
>> 
>> Notice that every serialized record has quite a bit of duplicated
>> information: the concrete type as well as a byte to indicate whether
>> the value type is a fixed size, and, if so, an integer to indicate the
>> actual size. These constitute a schema, of sorts, because they tell us
>> later how exactly to deserialize the data. Unfortunately, this
>> information is completely redundant. In all likelihood, the
>> information will be exactly the same for every record in the topic.
>> This problem is essentially the core motivation for serializations
>> like Avro: to move the schema outside of the serialization itself, so
>> that the records won't contain so much redundant information.
>> 
>> In this light, I'm wondering if it makes sense to go back to something
>> like what you had earlier in which you don't support perfectly
>> preserving the concrete type for _this_ serde, but instead just
>> support deserializing to _some_ List. Then, you could defer full,
>> perfect, type preservation to serdes that have an external system in
>> which to register their type information.
>> 
>> There does exist an alternative, if we really do want to preserve the
>> concrete type (which does seem kind of nice). You can add a
>> configuration option specifically for the serde to configure what the
>> list type will be, and maybe what the element type is, as well.
>> 
>> As far as "related work" goes, you might be interested to take a look
>> at how Jackson can be configured to deserialize into a specific,
>> arbitrarily nested, generically parameterized class structure.
>> Specifically, you might find
>> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
>> interesting.
>> 
>> Thanks,
>> -John
>> 
>> On Mon, Jun 17, 2019 at 12:38 PM Development  wrote:
>>> 
>>> bump
>

Re: [VOTE] KIP-471: Expose RocksDB Metrics in Kafka Streams

+1 (binding)

Thanks Bruno!

Would also be interested to see how much overhead it may incur by enabling
DEBUG metrics now, if it is huge we may consider doing finer-grained
metrics enabling, but that would be another follow-up task.

Guozhang

On Wed, Jun 19, 2019 at 1:37 PM Patrik Kleindl  wrote:

> +1 (non-binding)
> Thanks!
> Best regards
> Patrik
>
> > Am 19.06.2019 um 21:55 schrieb Bill Bejeck :
> >
> > +1 (binding)
> >
> > Thanks,
> > Bill
> >
> >> On Wed, Jun 19, 2019 at 1:19 PM John Roesler  wrote:
> >>
> >> I'm +1 (nonbinding)
> >>
> >> Thanks!
> >> -John
> >>
> >>> On Tue, Jun 18, 2019 at 7:48 AM Bruno Cadonna 
> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I would like to start the voting on KIP-471:
> >>>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-471%3A+Expose+RocksDB+Metrics+in+Kafka+Streams
> >>>
> >>> You can find the discussion here:
> >>>
> >>
> https://lists.apache.org/thread.html/125bdd984fe0667962018da6ce10bce6d5895c5103955a8e4c730fef@%3Cdev.kafka.apache.org%3E
> >>>
> >>> Best,
> >>> Bruno
> >>
>


-- 
-- Guozhang

Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

2019-06-20 Thread Development

Hi John,

Thank you for your input! Yes, my idea looks a little bit over engineered :)

I also wanted to see a feedback from Mathias as well since he gave me an idea 
about storing fixed/variable size entries.

Best,
Daniyar Yeralin

> On Jun 18, 2019, at 6:06 PM, John Roesler  wrote:
> 
> Hi Daniyar,
> 
> That's a very clever solution!
> 
> One observation is that, now, this is what we might call a polymorphic
> serde. That is, you're detecting the actual concrete type and then
> promising to produce the exact same concrete type on read. There are
> some inherent problems with this approach, which in general require
> some kind of  schema registry (not necessarily Schema Registry, just
> any registry for schemas) to solve.
> 
> Notice that every serialized record has quite a bit of duplicated
> information: the concrete type as well as a byte to indicate whether
> the value type is a fixed size, and, if so, an integer to indicate the
> actual size. These constitute a schema, of sorts, because they tell us
> later how exactly to deserialize the data. Unfortunately, this
> information is completely redundant. In all likelihood, the
> information will be exactly the same for every record in the topic.
> This problem is essentially the core motivation for serializations
> like Avro: to move the schema outside of the serialization itself, so
> that the records won't contain so much redundant information.
> 
> In this light, I'm wondering if it makes sense to go back to something
> like what you had earlier in which you don't support perfectly
> preserving the concrete type for _this_ serde, but instead just
> support deserializing to _some_ List. Then, you could defer full,
> perfect, type preservation to serdes that have an external system in
> which to register their type information.
> 
> There does exist an alternative, if we really do want to preserve the
> concrete type (which does seem kind of nice). You can add a
> configuration option specifically for the serde to configure what the
> list type will be, and maybe what the element type is, as well.
> 
> As far as "related work" goes, you might be interested to take a look
> at how Jackson can be configured to deserialize into a specific,
> arbitrarily nested, generically parameterized class structure.
> Specifically, you might find
> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
> interesting.
> 
> Thanks,
> -John
> 
> On Mon, Jun 17, 2019 at 12:38 PM Development  wrote:
>> 
>> bump

Re: [DISCUSS] KIP-479: Add Materialized to Join

Hello Bill,

Thanks for the KIP. Glad to see that we can likely shooting two birds with
one stone. I have some concerns though about those "two birds" themselves:

1. About not breaking compatibility of stream-stream join materialized
stores: I think this is a valid issue to tackle, but after thinking about
it once more I'm not sure if exposing Materialized would be a good solution
here. My rationles:

1.a For stream-stream join, our current usage of window-store is not ideal,
and we want to modify it in the near future to be more efficient. Not
allowing users to override such state store backend gives us such freedom
(which was also considered in the original DSL design), whereas getting a
Materialized basically kicks out that freedom out of the
window.
1.b For strema-stream join, in our original design we intend to "never"
want users to query the state, since it is just for buffering the upcoming
records from the stream. Now I know that some users may indeed want to
query it from the debugging perspective, but still I concerned about
whether leveraging IQ for debugging purposes would be the right solution
here. And adding Materialized object opens the door to let users query
about it (unless we did something intentionally to still forbids it), which
also restricts us in the future.

2. About the coupling between Materialized.name() and queryable: again I
think this is a valid issue. But I'm not sure if the current
"withQuerryingDisabled / Enabled" at Materialized is the best approach.
Here I think I agree with John, that generally speaking it's better be a
control function on the `KTable` itself, rather than on `Materialized`, so
fixing it via adding functions through `Materialized` seems not a natural
approach either.

Overall, I'm thinking maybe we should still use two stones rather than one
to kill these two birds, and probably for this KIP we just focus on 1)
above. And for that I'd like to not expose the Materialized either for
rationales that I've listed above. Instead, we just restrict KIP-307 to NOT
use the Joined.name for state store names and always use internal names as
well, which admittedly indeed leaves a hole of not being able to cover all
internal names here, but now I feel this `hole` may better be filled by,
e.g. not creating changelog topics but just use the upstream to
re-bootstrap the materialized store, more concretely: when materializing
the store, try to piggy-back the changelog topic on an existing topic, e.g.
a) if the stream is coming directly from some source topic (including
repartition topic), make that as changelog topic and if it is repartition
topic change the retention / data purging policy necessarily as well; b) if
the stream is coming from some stateless operators, delegate that stateless
operator to the parent stream similar as a); if the stream is coming from a
stream-stream join which is the only stateful operator that can result in a
stream, consider merging the join into multi-way joins (yes, this is a very
hand-wavy thought, but the point here is that we do not try to tackle it
now but leave it for a better solution :).

Guozhang

On Wed, Jun 19, 2019 at 11:41 AM John Roesler  wrote:

> Hi Bill,
>
> Thanks for the KIP! Awesome job catching this unexpected consequence
> of the prior KIPs before it was released.
>
> The proposal looks good to me. On top of just fixing the problem, it
> seems to address two other pain points:
> * that naming a state store automatically causes it to become queriable.
> * that there's currently no way to configure the bytes store for join
> windows.
>
> It's awesome that we can fix this issue and two others with one feature.
>
> I'm wondering about a missing quadrant from the truth table involving
> whether a Materialized is stored or not and querying is
> enabled/disabled... What should be the behavior if there is no store
> configured (e.g., if Materialized with only serdes) and querying is
> enabled?
>
> It seems we have two choices:
> 1. we can force creation of a state store in this case, so the store
> can be used to serve the queries
> 2. we can provide just a queriable view, basically letting IQ query
> into the "KTableValueGetter", which would transparently construct the
> query response by applying the operator logic to the upstream state if
> the operator state isn't already stored.
>
> Offhand, it seems like the second is actually a pretty awesome
> capability. But it might have an awkward interaction with the current
> semantics. Presently, if I provide a Materialized.withName, it implies
> that querying should be enabled AND that the view should actually be
> stored in a state store. Under option 2 above, this behavior would
> change to NOT provision a state store and instead just consult the
> ValueGetter. To get back to the current behavior, users would have to
> add a "bytes store supplier" to the Materialized to indicate that,
> yes, they really want a state store there.
>
> Behavior changes are always kind of s

Re: [DISCUSS] KIP-439: Deprecate Interface WindowStoreIterator

Hi again, all,

After wrestling with some other issues around the window interface,
I'd propose that we consider normalizing the WindowStore (and
SessionStore) interfaces with respect to KeyValueStore. We can't
actually make the classes related because they'll clash on the
deprecated methods, but we can align them. Doing so would be an
ergonomic advantage, since all our provided store interfaces would
have the same "shape".

Specifically, this means that we would deprecate any method that takes
a raw "K key, long windowStartTime" or returns a raw "K key", and
instead always take as arguments Windowed keys and also return
Windowed results. This proposal already comes close to this, by
re-using the KeyValueIterator>, so it would just be a
few tweaks to "perfect" it.

What do you think?

Thanks,
-John

On Tue, May 28, 2019 at 2:58 PM Matthias J. Sax  wrote:
>
> Let's see what other think about (2)
>
>
>
> (3) Interesting. My reasoning follows the code:
>
> For example, `RocksDbKeyValueBytesStoreSupplier#metricsScope()`returns
> "rocksdb-state" and this is concatenated later in `MeteredKeyValueStore`
> that adds a tag
>
>   key:   metricScope + "-id"
>   value: name()  // this is the store name
>
> Hence, the `-state` part comes from the supplier and thus it seems to be
> part of `[store-type]` (ie, `store-type` == metricScope()` -- not sure
> if this interpretation holds).
>
> If it's not part of `[store-type]` the supplier should not return it as
> `metricScope()` IMHO, but the `MeteredKeyValueStore` should add it
> together with `-id` suffix from my understanding.
>
> Thoughts?
>
>
>
> (5) Renaming to `streams` might imply that we need to rename _all_
> metrics, not just the store metrics that are affected, to avoid
> inconsistencies.
>
> If we really want to rename it, I would rather suggest to do it as part
> of KIP-444, that is a larger rework anyway. It seems to be out of scope
> for this KIP.
>
>
> -Matthias
>
>
> On 5/28/19 12:30 PM, Bruno Cadonna wrote:
> > Hi Matthias,
> >
> > 2)
> > Yes, this is how I meant it.
> > I am still in favour of `value-by-...` instead of `get`, because it
> > gives you immediately the meaning of the metric without the need to
> > know the API of the stores.
> >
> > 3)
> > I asked to avoid misunderstandings because in KIP-444, `-state-` is
> > stated explicitly in the tag.
> >
> > 5)
> > The question was not about correctness. I know that we use `stream` in
> > the metrics group. The question was rather if we want to change it.
> > The streams metrics interface is called `StreamsMetrics`. I understand
> > that this has low priority. Just wanted to mention it.
> >
> > Best,
> > Bruno
> >
> >
> > On Tue, May 28, 2019 at 9:08 PM Matthias J. Sax  
> > wrote:
> >>
> >> Thanks Bruno.
> >>
> >> I updated the KIP without changing the metric name yet -- want to
> >> discuss this first.
> >>
> >>
> >>
> >> 1) I am also ok to keep `all()`
> >>
> >>
> >> 2) That's a good idea. To make sure I understand your suggestion
> >> correctly, below the mapping between metric name an methods.
> >> (Some metric names are use by multiple stores):
> >>
> >> (ReadOnly)KeyValueStore:
> >>
>  value-by-key : ReadOnlyKeyValueStore#get(K key)
> >>
> >> -> replaces `get` metric
> >>
> >> I am actually not sure if I like this. Just keeping `get` instead of
> >> `value-by-key`, `value-by-key-and-time` seems more reasonable to me.
> >>
> >>
> >>
> >> (ReadOnly)WindowStore:
> >>
>  value-by-key-and-time : ReadOnlyWindowStore#get(K key, long 
>  windowStartTime)
> >>
> >> I think a simple `get` might be better
> >>
>  range-by-key-and-time-range : ReadOnlyWindowStore#range(K key, Instant 
>  from, Instant to)
>  range-by-key-and-time-range : WindowStore#range(K key, long fromTime, 
>  long toTime)
> >>
>  range-by-key-range-and-time-range : ReadOnlyWindowStore#range(K from, K 
>  to, Instant from, Instant to)
>  range-by-key-range-and-time-range : WindowStore#range(K from, K to, long 
>  fromTime, long toTime)
> >>
>  range-by-time-range : ReadOnlyWindowStore#range(Instant from, Instant to)
>  range-by-time-range : WindowStore#range(long fromTime, long toTime)
> >>
>  range-all : ReadOnlyWindowStore#all()
> >>
> >>
> >> (ReadOnly)SessionStore:
> >>
>  range-by-key : ReadOnlySessionStore#range(K key)
> >>
>  range-by-key-range : ReadOnlySessionStore#range(K from, K to)
>  range-by-key-and-time-range : SessionStore#range(K key, long 
>  earliestSessionEndTime, long latestSessionStartTime)
> >>
>  range-by-key-range-and-time-range : SessionStore#range(K keyFrom, K 
>  keyTo, long earliestSessionEndTime, long latestSessionStartTime)
> >>
>  value-by-key-and-time : SessionStore#get(K key, long sessionStartTime, 
>  long sessionEndTime)
> >>
> >> I think a simple `get` might be better
> >>
>  range-all : ReadOnlyKeyValueStore#all()
> >>
> >>
> >>
> >>
> >> 3) the `state-` part is already contained in

[jira] [Resolved] (KAFKA-8452) Possible Suppress buffer optimization: de-duplicate prior value

2019-06-20 Thread John Roesler (JIRA)



 [ 
https://issues.apache.org/jira/browse/KAFKA-8452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Roesler resolved KAFKA-8452.
-
Resolution: Fixed

> Possible Suppress buffer optimization: de-duplicate prior value
> ---
>
> Key: KAFKA-8452
> URL: https://issues.apache.org/jira/browse/KAFKA-8452
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: John Roesler
>Assignee: John Roesler
>Priority: Major
>
> As of KAFKA-8199, the suppression buffers have to track the "prior value" in 
> addition to the "old" and "new" values for each record, to support 
> transparent downstream views.
> In many cases, the prior value is actually the same as the old value, and we 
> could avoid storing it separately. The challenge is that the old and new 
> values are already serialized into a common array (as a Change via the 
> FullChangeSerde), so the "prior" value would actually be a slice on the 
> underlying array. But, of course, Java does not have array slices.
> To get around this, we either need to switch to ByteBuffers (which support 
> slices) or break apart the serialized Change into just serialized old and new 
> values.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Re: Preliminary blog post for the Apache Kafka 2.3.0 release

2019-06-20 Thread Colin McCabe

On Thu, Jun 20, 2019, at 00:23, Matthias J. Sax wrote:
> Great blog post, Colin!
> 
> Two comments:
> 
> 
> (1) KIP-258: "future features" -> "the ability to return the latest
> timestamp in Interactive Queries"
> 
> This is not a future feature, but the timestamp can be queried in 2.3
> already.

Thanks for the correction.  This should be fixed.

> 
> 
> (2) Why only listing KIP-428; KIP-445 is equally important.
> 

Good point.  I added information about KIP-445.

best,
Colin

> 
> 
> -Matthias
> 
> 
> On 6/19/19 7:02 AM, Ron Dagostino wrote:
> > Looks great, Colin.
> > 
> > I have also enjoyed Stephane Maarek's "What's New in Kafka..." series of
> > videos (e.g. https://www.youtube.com/watch?v=kaWbp1Cnfo4&t=10s).  Having
> > summaries like this in both formats -- blog and video -- for every release
> > would be helpful as different people have different preferences.
> > 
> > Ron
> > 
> > On Tue, Jun 18, 2019 at 8:20 PM Colin McCabe  wrote:
> > 
> >> Thanks, Konstantine.  I reworked the wording a bit -- take a look.
> >>
> >> best,
> >> C.
> >>
> >> On Tue, Jun 18, 2019, at 14:31, Konstantine Karantasis wrote:
> >>> Thanks Colin.
> >>> Great initiative!
> >>>
> >>> Here's a small correction (between **) for KIP-415 with a small
> >> suggestion
> >>> as well (between _ _):
> >>>
> >>> In Kafka Connect, worker tasks are distributed among the available worker
> >>> nodes. When a connector is reconfigured or a new connector is deployed
> >> _as
> >>> well as when a worker is added or removed_, the *tasks* must be
> >> rebalanced
> >>> across the Connect cluster to help ensure that all of the worker nodes
> >> are
> >>> doing a fair share of the Connect work. In 2.2 and earlier, a Connect
> >>> rebalance caused all worker threads to pause while the rebalance
> >> proceeded.
> >>> As of KIP-415, rebalancing is no longer a stop-the-world affair, making
> >>> configuration changes a more pleasant thing.
> >>>
> >>> Cheers,
> >>> Konstantine
> >>>
> >>> On Tue, Jun 18, 2019 at 1:50 PM Swen Moczarski  >>>
> >>> wrote:
> >>>
>  Nice overview!
> 
>  I found some typos:
>  rbmainder
>  bmits
>  implbmentation
> 
>  Am Di., 18. Juni 2019 um 22:43 Uhr schrieb Boyang Chen <
>  bche...@outlook.com
> > :
> 
> > One typo:
> > KIP-428: Add in-mbmory window store
> > should be
> > KIP-428: Add in-memory window store
> >
> >
> > 
> > From: Colin McCabe 
> > Sent: Wednesday, June 19, 2019 4:22 AM
> > To: dev@kafka.apache.org
> > Subject: Re: Preliminary blog post for the Apache Kafka 2.3.0 release
> >
> > Sorry, I copied the wrong URL at first.  Try this URL instead:
> >
> 
> >> https://blogs.apache.org/preview/kafka/?previewEntry=what-s-new-in-apache
> >
> > best,
> > Colin
> >
> > On Tue, Jun 18, 2019, at 13:17, Colin McCabe wrote:
> >> Hmm.  I'm looking to see if there's any way to open up the
> > permissions... :|
> >>
> >>
> >>
> >> On Tue, Jun 18, 2019, at 13:12, M. Manna wrote:
> >>> It’s asking for credentials...?
> >>>
> >>> On Tue, 18 Jun 2019 at 15:10, Colin McCabe 
>  wrote:
> >>>
>  Hi all,
> 
>  I've written up a preliminary blog post about the upcoming
> >> Apache
> > Kafka
>  2.3.0 release.  Take a look and let me know what you think.
> 
> 
> 
> >
> 
> >> https://blogs.apache.org/roller-ui/authoring/preview/kafka/?previewEntry=what-s-new-in-apache
> 
>  cheers,
>  Colin
> 
> >>>
> >>
> >
> 
> >>>
> >>
> > 
> 
> 
> Attachments:
> * signature.asc

Re: [VOTE] KIP-474: To deprecate WindowStore#put(key, value)

Not that it changes the outcome, but I'm also +1 (nonbinding).

Been wanting to do this for a while, thanks Omkar!

Now, if we can just get one more binding vote...

-John

On Sun, Jun 9, 2019 at 1:04 AM omkar mestry  wrote:
>
> Hi,
>
> Ok the voting thread is still open.
>
> Thanks Regards
> Omkar Mestry
>
> On Sun, 9 Jun 2019 at 11:33 AM, Matthias J. Sax 
> wrote:
>
> > Omkar,
> >
> > a KIP is accepted if there are 3 binding votes for it. So far, there are
> > only 2. Hence, the KIP is not accepted yet. The vote stays open.
> >
> > Just wait a little longer, until you get one more binding vote.
> >
> >
> > -Matthias
> >
> > On 6/7/19 3:33 AM, omkar mestry wrote:
> > > Thanks to everyone who have voted! I'm closing this vote thread with a
> > > final count:
> > >
> > > binding +1: 2 (Matthias, Guozhang)
> > >
> > > non-binding +1: 2 (Boyang, Dongjin)
> > >
> > > Thanks & Regards
> > > Omkar Mestry
> > >
> > > On Tue, Jun 4, 2019 at 3:05 AM Guozhang Wang  wrote:
> > >
> > >> +1 (binding).
> > >>
> > >> On Sat, Jun 1, 2019 at 3:19 PM Matthias J. Sax 
> > >> wrote:
> > >>
> > >>> +1 (binding)
> > >>>
> > >>> On 5/31/19 10:58 PM, Dongjin Lee wrote:
> >  +1 (non-binding).
> > 
> >  Thanks,
> >  Dongjin
> > 
> >  <
> > >>>
> > >>
> > https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail&utm_term=icon
> > 
> >  Virus-free.
> >  www.avast.com
> >  <
> > >>>
> > >>
> > https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail&utm_term=link
> > 
> >  <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2>
> > 
> >  On Sat, Jun 1, 2019 at 2:45 PM Boyang Chen 
> > >> wrote:
> > 
> > > Thanks omkar for taking the initiative, +1 (non-binding).
> > >
> > > 
> > > From: omkar mestry 
> > > Sent: Saturday, June 1, 2019 1:40 PM
> > > To: dev@kafka.apache.org
> > > Subject: [VOTE] KIP-474: To deprecate WindowStore#put(key, value)
> > >
> > > Hi all,
> > >
> > > Since we seem to have an agreement in the discussion I would like to
> > > start the vote on KIP-474.
> > >
> > > KIP 474 :-
> > >
> > >>>
> > >>
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=115526545
> > >
> > > Thanks & Regards
> > > Omkar Mestry
> > >
> > 
> > 
> > >>>
> > >>>
> > >>
> > >> --
> > >> -- Guozhang
> > >>
> > >
> >
> >

[jira] [Created] (KAFKA-8573) kafka-topics.cmd OOM when connecting to a secure cluster without SSL properties

2019-06-20 Thread Jorg Heymans (JIRA)

Jorg Heymans created KAFKA-8573:
---

 Summary: kafka-topics.cmd OOM when connecting to a secure cluster 
without SSL properties
 Key: KAFKA-8573
 URL: https://issues.apache.org/jira/browse/KAFKA-8573
 Project: Kafka
  Issue Type: Bug
  Components: tools
Affects Versions: 2.2.1
Reporter: Jorg Heymans


When using kafka-topics.cmd to connect to an SSL secured cluster, without 
specifying '--command-config=my-ssl.properties' on OOM is triggered:

 
{noformat}
[2019-06-20 14:25:07,998] ERROR Uncaught exception in thread 
'kafka-admin-client-thread | adminclient-1': 
(org.apache.kafka.common.utils.KafkaThread)
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
at org.apache.kafka.common.memory.MemoryPool$1.tryAllocate(MemoryPool.java:30)
at 
org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:112)
at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:424)
at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:385)
at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:651)
at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:572)
at org.apache.kafka.common.network.Selector.poll(Selector.java:483)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:535)
at 
org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.run(KafkaAdminClient.java:1131)
at java.lang.Thread.run(Thread.java:748){noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (KAFKA-8572) Broker reports not leader partition as an error

2019-06-20 Thread Antony Stubbs (JIRA)

Antony Stubbs created KAFKA-8572:


 Summary: Broker reports not leader partition as an error
 Key: KAFKA-8572
 URL: https://issues.apache.org/jira/browse/KAFKA-8572
 Project: Kafka
  Issue Type: Improvement
Reporter: Antony Stubbs


As this is an expected part of the broker protocol, is error an appropriate log 
level?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Resolved] (KAFKA-8564) NullPointerException when loading logs at startup

2019-06-20 Thread Edoardo Comar (JIRA)



 [ 
https://issues.apache.org/jira/browse/KAFKA-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar resolved KAFKA-8564.
--
   Resolution: Fixed
Fix Version/s: 2.2.2
   2.1.2
   2.3.0

> NullPointerException when loading logs at startup
> -
>
> Key: KAFKA-8564
> URL: https://issues.apache.org/jira/browse/KAFKA-8564
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 2.3.0, 2.2.1
>Reporter: Mickael Maison
>Assignee: Edoardo Comar
>Priority: Blocker
> Fix For: 2.3.0, 2.1.2, 2.2.2
>
>
> If brokers restart when topics are being deleted, it's possible to end up 
> with a partition folder with the deleted suffix but without any log segments:
> {quote}ls -la 
> ./kafka-logs/3part3rep5-1.f2ce83b86df9416abe50d2e2299009c2-delete/
> total 8
> drwxr-xr-x@  4 mickael  staff   128  6 Jun 14:35 .
> drwxr-xr-x@ 61 mickael  staff  1952  6 Jun 14:35 ..
> -rw-r--r--@  1 mickael  staff    10  6 Jun 14:32 23261863.snapshot
> -rw-r--r--@  1 mickael  staff 0  6 Jun 14:35 leader-epoch-checkpoint
> {quote}
> From 2.2.1, brokers fail to start when loading such folders:
> {quote}[2019-06-19 09:40:48,123] ERROR There was an error in one of the 
> threads during logs loading: java.lang.NullPointerException 
> (kafka.log.LogManager)
>  [2019-06-19 09:40:48,126] ERROR [KafkaServer id=1] Fatal error during 
> KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
>  java.lang.NullPointerException
>  at kafka.log.Log.activeSegment(Log.scala:1896)
>  at kafka.log.Log.(Log.scala:295)
>  at kafka.log.Log$.apply(Log.scala:2186)
>  at kafka.log.LogManager.loadLog(LogManager.scala:275)
>  at kafka.log.LogManager.$anonfun$loadLogs$12(LogManager.scala:345)
>  at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:63)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> {quote}
> With 2.2.0, upon loading such folders, brokers create a new empty log segment 
> and load that successfully.
> The change of behaviour was introduced in 
> [https://github.com/apache/kafka/commit/f000dab5442ce49c4852823c257b4fb0cdfe15aa]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Re: Preliminary blog post for the Apache Kafka 2.3.0 release