[GitHub] kafka pull request #3819: KAFKA-5576: RocksDB upgrade to 5.8, plus one bug f...

2017-10-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/3819


---


[GitHub] kafka pull request #4032: MINOR: Fix typo

2017-10-05 Thread jeffwidman
GitHub user jeffwidman opened a pull request:

https://github.com/apache/kafka/pull/4032

MINOR: Fix typo



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jeffwidman/kafka patch-3

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/4032.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4032


commit d136c4264e30204412f2df5c6ec5ed36f9a18f6c
Author: Jeff Widman 
Date:   2017-10-05T23:56:17Z

MINOR: Fix typo




---


[GitHub] kafka pull request #4031: MINOR: log4j improvements on assigned tasks and st...

2017-10-05 Thread guozhangwang
GitHub user guozhangwang opened a pull request:

https://github.com/apache/kafka/pull/4031

MINOR: log4j improvements on assigned tasks and store changelog reader



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/guozhangwang/kafka KMinor-assigned-task-log4j

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/4031.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4031


commit 6e29bc79de539163858c6594c6d3fae35f80d7be
Author: Guozhang Wang 
Date:   2017-10-05T23:49:27Z

log4j improvements




---


Re: [DISCUSS] KIP-205: Add getAllKeys() API to ReadOnlyWindowStore

2017-10-05 Thread Richard Yu
We should split KAFKA-4499 into several sub-issues, with 4499 being the
parent issue. Adding the implementation to CachingWindowStore,
RocksDBWindowStore, etc. will in each case require implementing the
methods and adding a test, which is not trivial. This way, it should be
easier to manage the progress of the KIP.


On Thu, Oct 5, 2017 at 2:58 PM, Matthias J. Sax 
wrote:

> Thanks for driving this and sorry for the late response. With the release
> deadline it was pretty busy lately.
>
> Can you please add a description for the suggested methods and what they
> are going to return? It's a little unclear to me atm.
>
> It would also be helpful to discuss for which use case each method is
> useful. This might also help to identify potential gaps for which
> another API might be more helpful.
>
> Also, we should talk about the guarantees provided when using those APIs
> with regard to consistency -- not saying that we need to provide strong
> guarantees, but the KIP should describe what users can expect.
>
>
> -Matthias
>
> On 9/24/17 8:11 PM, Richard Yu wrote:
> > Hello, I would like to solicit review and comment on this issue (link
> > below):
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 205%3A+Add+getAllKeys%28%29+API+to+ReadOnlyWindowStore
> >
>
>
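
For reference, the kind of API under discussion would look roughly like the
following sketch. The interface name is hypothetical, and the exact method
set and signatures are whatever KIP-205 finally specifies:

```
import org.apache.kafka.streams.kstream.Windowed;
import org.apache.kafka.streams.state.KeyValueIterator;

// Sketch only: possible "get all keys" additions to a read-only window
// store. Names and signatures are illustrative, not the final KIP-205 API.
public interface WindowStoreAllKeysSketch<K, V> {

    // Iterate over all windows of all keys in the store.
    KeyValueIterator<Windowed<K>, V> all();

    // Iterate over all keys whose windows start within [timeFrom, timeTo]
    // (epoch milliseconds).
    KeyValueIterator<Windowed<K>, V> fetchAll(long timeFrom, long timeTo);
}
```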


Re: [DISCUSS] URIs on Producer and Consumer

2017-10-05 Thread Ismael Juma
You can update your properties file in the same way with the current
clients, right? I don't understand how mapping properties into a single
string makes things easier.

Ismael

On Thu, Oct 5, 2017 at 11:22 PM, Clebert Suconic 
wrote:

> Another useful feature for this...
>
> Say that I am writing an application... if I put this URI in any of
> my internal properties.. I can tweak my Consumer or Producer without
> changing any code.
>
> Say you store the URI in your project's configuration XML.. read it
> and start your consumer... later on you could just update the XML, and
> your user wouldn't need to rewrite their application.
>
>
> Right now, if you (as a user) want to do such a thing.. you would have
> to write the parser yourself.
>
> On Thu, Oct 5, 2017 at 4:33 PM, Michael Pearce 
> wrote:
> > To me, this is a lot more in line with how many other systems handle
> > connections. Is the ability to use a single connection string / URI
> > really that left-field to suggest or want?
> >
> > If anything, this brings Kafka a more standardised approach IMO: a
> > unified resource identifier, protocol name, and a set schema for that.
> >
> > e.g.
> > Database connection strings like
> >
> > oracle:
> > jdbc:oracle:thin:@(description=(address_list=
> >(address=(protocol=tcp)(port=1521)(host=prodHost)))
> > (connect_data=(INSTANCE_NAME=ORCL)))
> >
> > or
> >
> > postgres:
> > "jdbc:postgresql://localhost/test?user=fred&password=secret&ssl=true"
> >
> > And then likewise on the messaging front, systems like
> >
> > rabbitmq
> > “amqp://myhost?heartbeat=10&connection_timeout=1”
> >
> >
> > I personally like the suggestion +1.
> >
> >
> > Cheers
> > Mike
> >
> >
> > On 05/10/2017, 20:10, "Clebert Suconic" 
> wrote:
> >
> > On Thu, Oct 5, 2017 at 2:20 PM, Colin McCabe 
> wrote:
> > > We used URIs as file paths in Hadoop.  I think it was a mistake, for a
> > > few different reasons.
> > >
> > > URIs are actually very complex.  You probably know about scheme, host,
> > > and port, but did you know about authority, user-info, query, fragment,
> > > scheme-specific-part?  Do you know what they do in Hadoop?  The mapping
> > > isn't obvious (and it wouldn't be obvious in Kafka either).
> >
> > URIs are just a hashmap of key=string.. just like Properties...
> >
> > The Consumer and Producer already take such a HashMap.. and these
> > values are easy to translate to boolean, integer.. etc. We would just
> > need to add such a mapping as part of this task. I don't see
> > anything difficult there.
> >
> >
> > >
> > > When you flip back and forth between URIs and strings (and you
> > > inevitably will do this, when serializing or sending things over the
> > > wire), you run into tons of really hard problems.  Should you preserve
> > > the "fragment" (the thing after the hash mark) for your URI, or not?  It
> > > may not do anything now, but maybe it will do something later.  URIs
> > > also have complex string escaping rules.  Parsing URIs is very messy,
> > > especially when you start talking about non-Java programming languages.
> >
> >
> > Why flip back and forth? URIs would generate the same HashMap that's
> > being generated today.. I don't see any mess here.
> > Besides... This would be an addition, not a replacement...
> >
> > And I'm talking only about the Java API now.
> >
> > Again, all the properties on ProducerConfig and ConsumerConfig seem
> > easy to map to primitive types (String, numbers.. booleans).
> >
> > Serialization shouldn't be a problem there. It would generate the
> > same properties it generates now.
> >
> > >
> > > URIs are designed for a world where you talk to a single host over a
> > > single port.  That isn't the world distributed systems live in.  You
> > > don't want your clients to fail to bootstrap because the single server
> > > you specified is having a bad day, even when the other 8 servers are up.
> >
> > I have seen a few projects using this style of URI; I would do the
> > same here:
> >
> > If you have multiple hosts:
> >
> > KafkaConsumer consumer = new
> > KafkaConsumer("kafka:(kafka://host1:port,kafka://host2:port)?property1=value");
> >
> > if you have a single host:
> > KafkaConsumer consumer = new
> > KafkaConsumer("kafka://host2:port?property1=value&property2=value2");
> >
> >
> > One example of an Apache project using a similar approach is qpid-jms:
> > http://qpid.apache.org/releases/qpid-jms-0.25.0/docs/index.html#failover-configuration-options
> >
> >
> > > The bottom line is that URIs are the wrong abstraction for the job.
> > > They just don't express what we really want, and they introduce a lot of
> > > complexity and ambiguity.
> >
> > I have seen the opposite, to be honest: this has been simpler for me
> > and users I know than using a HashMap; users in my experience tend
> > to write this faster.

[GitHub] kafka pull request #3607: [DO NOT MERGE] Existing StreamThread exception han...

2017-10-05 Thread guozhangwang
Github user guozhangwang closed the pull request at:

https://github.com/apache/kafka/pull/3607


---


Re: Contributing to kafka

2017-10-05 Thread Guozhang Wang
Hi Gilles,

Thanks for your interest in contributing; I have added you to the list.

Cheers,
Guozhang

On Thu, Oct 5, 2017 at 1:09 PM, Gilles Degols  wrote:

> Hello,
>
>
>
> As I use Kafka in my day job, I would like to contribute to the project
> in my spare time. According to the kafka.apache.org website, I need
> someone to add me to the contributor list so that I can assign myself
> some JIRA tickets.
>
> Could someone add me to the contributor list (username: gilles.degols)?
>
>
>
> Thank you very much,
>
> Regards,
>
> Gilles Degols
>
>


-- 
-- Guozhang


Re: [DISCUSS] URIs on Producer and Consumer

2017-10-05 Thread Clebert Suconic
Another useful feature for this...

Say that I am writing an application... if I put this URI in any of
my internal properties.. I can tweak my Consumer or Producer without
changing any code.

Say you store the URI in your project's configuration XML.. read it
and start your consumer... later on you could just update the XML, and
your user wouldn't need to rewrite their application.


Right now, if you (as a user) want to do such a thing.. you would have
to write the parser yourself.

On Thu, Oct 5, 2017 at 4:33 PM, Michael Pearce  wrote:
> To me, this is a lot more in line with how many other systems handle
> connections. Is the ability to use a single connection string / URI
> really that left-field to suggest or want?
>
> If anything, this brings Kafka a more standardised approach IMO: a
> unified resource identifier, protocol name, and a set schema for that.
>
> e.g.
> Database connection strings like
>
> oracle:
> jdbc:oracle:thin:@(description=(address_list=
>(address=(protocol=tcp)(port=1521)(host=prodHost)))
> (connect_data=(INSTANCE_NAME=ORCL)))
>
> or
>
> postgres:
> "jdbc:postgresql://localhost/test?user=fred&password=secret&ssl=true"
>
> And then likewise on the messaging front, systems like
>
> rabbitmq
> “amqp://myhost?heartbeat=10&connection_timeout=1”
>
>
> I personally like the suggestion +1.
>
>
> Cheers
> Mike
>
>
> On 05/10/2017, 20:10, "Clebert Suconic"  wrote:
>
> On Thu, Oct 5, 2017 at 2:20 PM, Colin McCabe  wrote:
> > We used URIs as file paths in Hadoop.  I think it was a mistake, for a
> > few different reasons.
> >
> > URIs are actually very complex.  You probably know about scheme, host,
> > and port, but did you know about authority, user-info, query, fragment,
> > scheme-specific-part?  Do you know what they do in Hadoop?  The mapping
> > isn't obvious (and it wouldn't be obvious in Kafka either).
>
> URIs are just a hashmap of key=string.. just like Properties...
>
> The Consumer and Producer already take such a HashMap.. and these
> values are easy to translate to boolean, integer.. etc. We would just
> need to add such a mapping as part of this task. I don't see
> anything difficult there.
>
>
> >
> > When you flip back and forth between URIs and strings (and you
> > inevitably will do this, when serializing or sending things over the
> > wire), you run into tons of really hard problems.  Should you preserve
> > the "fragment" (the thing after the hash mark) for your URI, or not?  It
> > may not do anything now, but maybe it will do something later.  URIs
> > also have complex string escaping rules.  Parsing URIs is very messy,
> > especially when you start talking about non-Java programming languages.
>
>
> Why flip back and forth? URIs would generate the same HashMap that's
> being generated today.. I don't see any mess here.
> Besides... This would be an addition, not a replacement...
>
> And I'm talking only about the Java API now.
>
> Again, all the properties on ProducerConfig and ConsumerConfig seem
> easy to map to primitive types (String, numbers.. booleans).
>
> Serialization shouldn't be a problem there. It would generate the same
> properties it generates now.
>
> >
> > URIs are designed for a world where you talk to a single host over a
> > single port.  That isn't the world distributed systems live in.  You
> > don't want your clients to fail to bootstrap because the single server
> > you specified is having a bad day, even when the other 8 servers are up.
>
> I have seen a few projects using this style of URI; I would do the
> same here:
>
> If you have multiple hosts:
>
> KafkaConsumer consumer = new
> 
> KafkaConsumer("kafka:(kafka://host1:port,kafka://host2:port)?property1=value");
>
> if you have a single host:
> KafkaConsumer consumer = new
> KafkaConsumer("kafka://host2:port?property1=value&property2=value2");
>
>
> One example of an Apache project using a similar approach is qpid-jms:
> 
> http://qpid.apache.org/releases/qpid-jms-0.25.0/docs/index.html#failover-configuration-options
>
>
> > The bottom line is that URIs are the wrong abstraction for the job.
> > They just don't express what we really want, and they introduce a lot of
> > complexity and ambiguity.
>
> I have seen the opposite, to be honest: this has been simpler for me
> and users I know than using a HashMap; users in my experience tend
> to write this faster.
>
> Users can certainly put up with the HashMap.. but this is easier to
> remember. I'm just proposing what I think is a simpler API.
>
>
>
>
> Perhaps we should move into the KIP discussion itself here.. I first
> intended to start this thread to see if it would make sense or not...
> But I don't have authorization to create the KIP page.. so again..
> based on the contributing page.. can someone grant me authorization to
> the WIKI space?

Re: [DISCUSS] KIP-171: Extend Consumer Group Reset Offset for Stream Application

2017-10-05 Thread Matthias J. Sax
Jorge,

KIP-198 (that got merged already) overlaps with this KIP. Can you please
update your KIP accordingly?

Also, while working on KIP-198, we identified some shortcomings in
AdminClient that do not allow us to move StreamsResetter out of the core
package. We want to address those shortcomings in another KIP to add the
missing functionality to the new AdminClient.

Having said this, and remembering a discussion about dependencies that
might be introduced by this KIP, it might be good to understand those
dependencies in detail. Maybe we can resolve those dependencies somehow
and thus be able to move StreamsResetter out of the core package. Could
you summarize those dependencies in the KIP or just as a reply?

Thanks!


-Matthias

On 9/11/17 3:02 PM, Jorge Esteban Quilcate Otoya wrote:
> Thanks Guozhang!
> 
> I have updated the KIP to:
> 
> 1. Only one scenario param is allowed. If none is given, `to-earliest`
> will be used, matching the current behavior.
> 
> 2.
>   1. An exception will be printed mentioning that no existing offsets
> are registered.
>   2. The inputTopics format could support defining partition numbers, as
> in the reset-offsets option of kafka-consumer-groups.
> 
> 3. That should be handled by KIP-198.
> 
> I will start the VOTE thread in a following email.
> 
> 
> On Wed, Aug 30, 2017 at 2:01 AM, Guozhang Wang ()
> wrote:
> 
>> Hi Jorge,
>>
>> Thanks for the KIP. It would be a great feature to add to the reset tools.
>> I made a pass over it and it looks good to me overall. I have a few
>> comments:
>>
>> 1. For all the scenarios, do we allow users to specify more than one
>> parameter? If not, could you make that clear in the wiki, e.g. we would
>> return an error message saying that only one is allowed; if yes, then
>> what precedence order are we following?
>>
>> 2. Personally I feel that "--by-duration", "--to-offset" and "--shift-by"
>> are a tad overkill, because 1) they assume there exists some committed
>> offset for each of the topics, but that may not always be true, and 2) a
>> single offset / time shift amount may not fit all topics universally,
>> i.e. one could imagine that we want to reset all input topics to their
>> offsets at a given time, but resetting all topics' offsets to the same
>> value, or shifting all of them by the same amount, is usually not
>> applicable. "--by-duration" seems like it could easily be supported by
>> "--to-date".
>>
>> For the general consumer group reset tool, where offsets can be set per
>> partition, these parameters may be more useful.
>>
>> 3. As for the implementation details, when removing the zookeeper config
>> in `kafka-streams-application-reset`, we should consider returning a
>> meaningful error message; otherwise it would just be an "unrecognized
>> config" error.
>>
>>
>> If you feel confident about the wiki after discussing about these points,
>> please feel free to move on to start a voting thread. Note that we are
>> about 3 weeks away from KIP deadline and 4 weeks away from feature
>> deadline.
>>
>>
>> Guozhang
>>
>>
>>
>>
>>
>> On Tue, Aug 22, 2017 at 1:45 PM, Matthias J. Sax 
>> wrote:
>>
>>> Thanks for the update Jorge.
>>>
>>> I don't have any further comments.
>>>
>>>
>>> -Matthias
>>>
>>> On 8/12/17 6:43 PM, Jorge Esteban Quilcate Otoya wrote:
 I have updated the KIP:

 - Change execution parameters, using `--dry-run`
 - Reference KAFKA-4327
 - And advise about changes to `StreamsResetter`

 It also covers a change to `ConsumerGroupCommand` to align execution
 options.

 On Sun, Jul 16, 2017 at 5:37 AM, Matthias J. Sax (<matth...@confluent.io>)
 wrote:

> Thanks a lot for the update!
>
> I like the KIP!
>
> One more question about `--dry-run` vs `--execute`: While I agree that
> we should use the same flag for both tools, I am not sure which one is
> the better one... My personal take is, that I like `--dry-run` better.
> Not sure what others think.
>
> One more comment: with the removal of ZK, we can also tackle this JIRA:
> https://issues.apache.org/jira/browse/KAFKA-4327 If we do so, I think we
> should mention it in the KIP.
>
> I am also not sure about backward compatibility issue for this case.
> Actually, I don't expect people to call `StreamsResetter` from Java
> code, but you can never know. So if we break this, we need to make
>> sure
> to cover it in the KIP and later on in the release notes.
>
>
> -Matthias
>
> On 7/14/17 7:15 AM, Jorge Esteban Quilcate Otoya wrote:
>> Hi,
>>
>> KIP is updated.
>> Changes:
>> 1. Approach discussed to keep both tools (streams application resetter
>> and consumer group reset offset).
>> 2. Options have been aligned between both tools.
>> 3. The Zookeeper option from streams-application-reset will be removed
>> and replaced internally with the Kafka AdminClient.
>>
>> Looki
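
For context, the scenario and execution flags discussed in this thread map
onto invocations along the following lines. This is a sketch with
hypothetical group and topic names; the exact option set is whatever the
KIP finally specifies:

```
# Dry run: print the offsets that would be reset, without committing.
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --group my-streams-app --topic my-input-topic \
  --reset-offsets --to-earliest --dry-run

# Execute the same scenario for real.
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --group my-streams-app --topic my-input-topic \
  --reset-offsets --to-earliest --execute
```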

Re: [DISCUSS] KIP-205: Add getAllKeys() API to ReadOnlyWindowStore

2017-10-05 Thread Matthias J. Sax
Thanks for driving this and sorry for the late response. With the release
deadline it was pretty busy lately.

Can you please add a description for the suggested methods and what they
are going to return? It's a little unclear to me atm.

It would also be helpful to discuss for which use case each method is
useful. This might also help to identify potential gaps for which
another API might be more helpful.

Also, we should talk about the guarantees provided when using those APIs
with regard to consistency -- not saying that we need to provide strong
guarantees, but the KIP should describe what users can expect.


-Matthias

On 9/24/17 8:11 PM, Richard Yu wrote:
> Hello, I would like to solicit review and comment on this issue (link
> below):
> 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-205%3A+Add+getAllKeys%28%29+API+to+ReadOnlyWindowStore
> 





[GitHub] kafka pull request #4030: KAFKA-5953: Register all jdbc drivers available in...

2017-10-05 Thread kkonstantine
GitHub user kkonstantine opened a pull request:

https://github.com/apache/kafka/pull/4030

KAFKA-5953: Register all jdbc drivers available in plugin and class paths



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kkonstantine/kafka 
KAFKA-5953-Connect-classloader-isolation-may-be-broken-for-JDBC-drivers

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/4030.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4030






---


[GitHub] kafka pull request #4029: KAFKA-6016: Make the reassign partitions system te...

2017-10-05 Thread apurvam
GitHub user apurvam opened a pull request:

https://github.com/apache/kafka/pull/4029

KAFKA-6016: Make the reassign partitions system test use the idempotent 
producer

With these changes, we ensure that the partitions being reassigned start
from non-zero offsets. We also ensure that every message in the log has a
producerId and a sequence number.

This means that it successfully reproduces 
https://issues.apache.org/jira/browse/KAFKA-6003, as can be seen below:

```

[2017-10-05 20:57:00,466] ERROR [ReplicaFetcher replicaId=1, leaderId=4, 
fetcherId=0] Error due to (kafka.server.ReplicaFetcherThread)
kafka.common.KafkaException: Error processing data for partition 
test_topic-16 offset 682
at 
kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:203)
at 
kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:171)
at scala.Option.foreach(Option.scala:257)
at 
kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:171)
at 
kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:168)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at 
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at 
kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply$mcV$sp(AbstractFetcherThread.scala:168)
at 
kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:168)
at 
kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:168)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:218)
at 
kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:166)
at 
kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:109)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:64)
Caused by: org.apache.kafka.common.errors.UnknownProducerIdException: Found 
no record of producerId=1000 on the broker. It is possible that the last 
message with the producerId=1000 has been removed due to hitting the retention 
limit.
```

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apurvam/kafka 
KAFKA-6016-add-idempotent-producer-to-reassign-partitions

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/4029.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4029


commit af48d74be4f2c4473d8f97664ff0f3e450bfe3ec
Author: Apurva Mehta 
Date:   2017-10-05T05:27:23Z

Initial commit trying to create the scenario where we are creating a
replica from scratch but starting from a non zero sequence when doing
so.

commit 9566f91b00a5a7c249823107e4792b844809ccca
Author: Apurva Mehta 
Date:   2017-10-05T05:52:24Z

Use retention bytes to force segment deletion

commit 6087b3ed01472d24677623c9b3ef92a3678da96f
Author: Apurva Mehta 
Date:   2017-10-05T21:16:47Z

Configure the log so that we can reproduce the case where we are building 
producer state from a non zero sequence
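
For reference, the idempotent producer this test now exercises is enabled
with a single config. A minimal sketch (bootstrap address and topic are
hypothetical):

```
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class IdempotentProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // The broker assigns a producerId, and every batch carries sequence
        // numbers; these are the fields the system test verifies in the log.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("test_topic", "key", "value"));
        }
    }
}
```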




---


Re: [DISCUSS] URIs on Producer and Consumer

2017-10-05 Thread Michael Pearce
To me, this is a lot more in line with how many other systems handle
connections. Is the ability to use a single connection string / URI
really that left-field to suggest or want?

If anything, this brings Kafka a more standardised approach IMO: a
unified resource identifier, protocol name, and a set schema for that.

e.g.
Database connection strings like

oracle:
jdbc:oracle:thin:@(description=(address_list=
   (address=(protocol=tcp)(port=1521)(host=prodHost)))
(connect_data=(INSTANCE_NAME=ORCL)))

or

postgres:
"jdbc:postgresql://localhost/test?user=fred&password=secret&ssl=true"

And then likewise on the messaging front, systems like

rabbitmq
“amqp://myhost?heartbeat=10&connection_timeout=1”


I personally like the suggestion +1.


Cheers
Mike


On 05/10/2017, 20:10, "Clebert Suconic"  wrote:

On Thu, Oct 5, 2017 at 2:20 PM, Colin McCabe  wrote:
> We used URIs as file paths in Hadoop.  I think it was a mistake, for a
> few different reasons.
>
> URIs are actually very complex.  You probably know about scheme, host,
> and port, but did you know about authority, user-info, query, fragment,
> scheme-specific-part?  Do you know what they do in Hadoop?  The mapping
> isn't obvious (and it wouldn't be obvious in Kafka either).

URIs are just a hashmap of key=string.. just like Properties...

The Consumer and Producer already take such a HashMap.. and these
values are easy to translate to boolean, integer.. etc. We would just
need to add such a mapping as part of this task. I don't see
anything difficult there.


>
> When you flip back and forth between URIs and strings (and you
> inevitably will do this, when serializing or sending things over the
> wire), you run into tons of really hard problems.  Should you preserve
> the "fragment" (the thing after the hash mark) for your URI, or not?  It
> may not do anything now, but maybe it will do something later.  URIs
> also have complex string escaping rules.  Parsing URIs is very messy,
> especially when you start talking about non-Java programming languages.


Why flip back and forth? URIs would generate the same HashMap that's
being generated today.. I don't see any mess here.
Besides... This would be an addition, not a replacement...

And I'm talking only about the Java API now.

Again, all the properties on ProducerConfig and ConsumerConfig seem
easy to map to primitive types (String, numbers.. booleans).

Serialization shouldn't be a problem there. It would generate the same
properties it generates now.

>
> URIs are designed for a world where you talk to a single host over a
> single port.  That isn't the world distributed systems live in.  You
> don't want your clients to fail to bootstrap because the single server
> you specified is having a bad day, even when the other 8 servers are up.

I have seen a few projects using this style of URI; I would do the
same here:

If you have multiple hosts:

KafkaConsumer consumer = new

KafkaConsumer("kafka:(kafka://host1:port,kafka://host2:port)?property1=value");

if you have a single host:
KafkaConsumer consumer = new
KafkaConsumer("kafka://host2:port?property1=value&property2=value2");


One example of an Apache project using a similar approach is qpid-jms:

http://qpid.apache.org/releases/qpid-jms-0.25.0/docs/index.html#failover-configuration-options


> The bottom line is that URIs are the wrong abstraction for the job.
> They just don't express what we really want, and they introduce a lot of
> complexity and ambiguity.

I have seen the opposite, to be honest: this has been simpler for me
and users I know than using a HashMap; users in my experience tend
to write this faster.

Users can certainly put up with the HashMap.. but this is easier to
remember. I'm just proposing what I think is a simpler API.




Perhaps we should move into the KIP discussion itself here.. I first
intended to start this thread to see if it would make sense or not...
But I don't have authorization to create the KIP page.. so again..
based on the contributing page.. can someone grant me authorization to
the WIKI space?



[GitHub] kafka pull request #4028: Update docs to reflect kafka trademark status

2017-10-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/4028


---


[GitHub] kafka-site pull request #90: Kafka registered trademark

2017-10-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka-site/pull/90


---


[GitHub] kafka-site issue #90: Kafka registered trademark

2017-10-05 Thread junrao
Github user junrao commented on the issue:

https://github.com/apache/kafka-site/pull/90
  
@derrickdoo : Thanks for the patch. LGTM


---


[GitHub] kafka pull request #4028: Update docs to reflect kafka trademark status

2017-10-05 Thread derrickdoo
GitHub user derrickdoo opened a pull request:

https://github.com/apache/kafka/pull/4028

Update docs to reflect kafka trademark status

Updated a couple of places in the docs with the 'registered' trademark symbol.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/derrickdoo/kafka kafka-trademark-status

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/4028.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4028


commit 2b3fb94e7d786cb60a164bed738614e4835edafb
Author: Derrick Or 
Date:   2017-10-05T20:11:37Z

update docs to reflect kafka trademark status




---


Contributing to kafka

2017-10-05 Thread Gilles Degols
Hello,

 

As I use Kafka in my day job, I would like to contribute to the project
in my spare time. According to the kafka.apache.org website, I need someone
to add me to the contributor list so that I can assign myself some JIRA
tickets.

Could someone add me to the contributor list (username: gilles.degols)?

 

Thank you very much,

Regards,

Gilles Degols



[GitHub] kafka pull request #4025: KAFKA-5989: resume consumption of tasks that have ...

2017-10-05 Thread dguy
Github user dguy closed the pull request at:

https://github.com/apache/kafka/pull/4025


---


[GitHub] kafka pull request #4027: MINOR: fix inconsistance

2017-10-05 Thread lisa2lisa
GitHub user lisa2lisa opened a pull request:

https://github.com/apache/kafka/pull/4027

MINOR: fix inconsistance



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lisa2lisa/kafka typo-fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/4027.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4027


commit eadb291fefa64163d0d6fbc5272fac314df8da4d
Author: Xin Li 
Date:   2017-10-05T19:43:15Z

MINOR: fix inconsistance




---


[jira] [Resolved] (KAFKA-2376) Add Kafka Connect metrics

2017-10-05 Thread Randall Hauch (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch resolved KAFKA-2376.
--
Resolution: Fixed

All of the subtasks have been completed, so marking this as fixed.

> Add Kafka Connect metrics
> -
>
> Key: KAFKA-2376
> URL: https://issues.apache.org/jira/browse/KAFKA-2376
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Affects Versions: 0.9.0.0
>Reporter: Ewen Cheslack-Postava
>Assignee: Randall Hauch
>Priority: Blocker
>  Labels: needs-kip
> Fix For: 1.0.0
>
>
> Kafka Connect needs good metrics for monitoring since that will be the 
> primary insight into the health of connectors as they copy data.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] kafka-site pull request #90: Kafka registered trademark

2017-10-05 Thread derrickdoo
GitHub user derrickdoo opened a pull request:

https://github.com/apache/kafka-site/pull/90

Kafka registered trademark

Update art assets and content to reflect registered status of the Kafka 
trademark

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/derrickdoo/kafka-site 
docs-110-registered-trademark

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka-site/pull/90.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #90


commit 72b3bc737c4d53a8a4685adca2ef0e5a0f65c355
Author: Joel Hamill 
Date:   2017-10-04T00:51:03Z

MINOR: Update verbiage on landing page

Author: Joel Hamill 
Author: Joel Hamill <11722533+joel-ham...@users.noreply.github.com>

Reviewers: Guozhang Wang , Michael G. Noll 
, Damian Guy 

Closes #77 from joel-hamill/joel-hamill/nav-fixes-streams

commit 2603fb667570593e1db06a91e22fb4c9a018
Author: Derrick Or 
Date:   2017-10-05T19:41:11Z

update art assets and content to reflect registered status of kafka 
trademark




---


[jira] [Resolved] (KAFKA-5904) Create Connect metrics for worker rebalances

2017-10-05 Thread Randall Hauch (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch resolved KAFKA-5904.
--
Resolution: Fixed

Resolved as part of the [PR](https://github.com/apache/kafka/pull/4011) for 
KAFKA-5903.

> Create Connect metrics for worker rebalances
> 
>
> Key: KAFKA-5904
> URL: https://issues.apache.org/jira/browse/KAFKA-5904
> Project: Kafka
>  Issue Type: Sub-task
>  Components: KafkaConnect
>Affects Versions: 0.11.0.0
>Reporter: Randall Hauch
>Assignee: Randall Hauch
>Priority: Blocker
> Fix For: 1.0.0
>
>
> See KAFKA-2376 for parent task and 
> [KIP-196|https://cwiki.apache.org/confluence/display/KAFKA/KIP-196%3A+Add+metrics+to+Kafka+Connect+framework]
>  for the details on the metrics. This subtask is to create the "Worker 
> Rebalance Metrics".



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] kafka pull request #4026: KAFKA-5746: Document new broker metrics added for ...

2017-10-05 Thread rajinisivaram
GitHub user rajinisivaram opened a pull request:

https://github.com/apache/kafka/pull/4026

KAFKA-5746: Document new broker metrics added for health checks



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rajinisivaram/kafka MINOR-KIP-188-metrics-docs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/4026.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4026


commit 924e3d2f56a6c24a42018f396612c95f02cc5fe1
Author: Rajini Sivaram 
Date:   2017-10-05T19:12:15Z

KAFKA-5746: Document new broker metrics added for health checks




---


Re: [DISCUSS] URIs on Producer and Consumer

2017-10-05 Thread Clebert Suconic
On Thu, Oct 5, 2017 at 2:20 PM, Colin McCabe  wrote:
> We used URIs as file paths in Hadoop.  I think it was a mistake, for a
> few different reasons.
>
> URIs are actually very complex.  You probably know about scheme, host,
> and port, but did you know about authority, user-info, query, fragment,
> scheme-specific-part?  Do you know what they do in Hadoop?  The mapping
> isn't obvious (and it wouldn't be obvious in Kafka either).

URIs are just a hashmap of key=string.. just like Properties...

The Consumer and Producer already take such a HashMap.. and these
values are easy to translate to boolean, integer.. etc. We would just
need to add such a mapping as part of this task. I don't see
anything difficult there.


>
> When you flip back and forth between URIs and strings (and you
> inevitably will do this, when serializing or sending things over the
> wire), you run into tons of really hard problems.  Should you preserve
> the "fragment" (the thing after the hash mark) for your URI, or not?  It
> may not do anything now, but maybe it will do something later.  URIs
> also have complex string escaping rules.  Parsing URIs is very messy,
> especially when you start talking about non-Java programming languages.


Why flip back and forth? URIs would generate the same HashMap that's
being generated today.. I don't see any mess here.
Besides... This would be an addition, not a replacement...

And I'm talking only about the Java API now.

Again, all the properties on ProducerConfig and ConsumerConfig seem
easy to map to primitive types (String, numbers.. booleans).

Serialization shouldn't be a problem there. It would generate the same
properties it generates now.

>
> URIs are designed for a world where you talk to a single host over a
> single port.  That isn't the world distributed systems live in.  You
> don't want your clients to fail to bootstrap because the single server
> you specified is having a bad day, even when the other 8 servers are up.

I have seen a few projects using this style of URI; I would do the
same here:

If you have multiple hosts:

KafkaConsumer consumer = new
KafkaConsumer("kafka:(kafka://host1:port,kafka://host2:port)?property1=value");

if you have a single host:
KafkaConsumer consumer = new
KafkaConsumer("kafka://host2:port?property1=value&property2=value2");


One example of an Apache project using a similar approach is qpid-jms:
http://qpid.apache.org/releases/qpid-jms-0.25.0/docs/index.html#failover-configuration-options


> The bottom line is that URIs are the wrong abstraction for the job.
> They just don't express what we really want, and they introduce a lot of
> complexity and ambiguity.

I have seen the opposite, to be honest: this has been simpler for me
and users I know than using a HashMap; users in my experience tend
to write this faster.

Users can certainly put up with the HashMap.. but this is easier to
remember. I'm just proposing what I think is a simpler API.




Perhaps we should move into the KIP discussion itself here.. I first
intended to start this thread to see if it would make sense or not...
But I don't have authorization to create the KIP page.. so again..
based on the contributing page.. can someone grant me authorization to
the WIKI space?
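
To make the mapping above concrete, here is a minimal sketch of how a URI
could be turned into the same Properties the clients consume today. The
class name and scheme handling are illustrative assumptions, not a
proposed API:

```
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;
import java.util.Properties;

// Minimal sketch: parse "kafka://host:port?key=value&key2=value2" into the
// same Properties map the clients already accept.
public class KafkaUriSketch {

    public static Properties parse(String uriString) throws UnsupportedEncodingException {
        URI uri = URI.create(uriString);
        Properties props = new Properties();
        // host:port becomes the bootstrap servers entry.
        props.put("bootstrap.servers", uri.getHost() + ":" + uri.getPort());
        // Each query parameter becomes a config entry, exactly as if the
        // user had put it into the Properties object directly.
        if (uri.getQuery() != null) {
            for (String pair : uri.getQuery().split("&")) {
                String[] kv = pair.split("=", 2);
                String value = kv.length > 1 ? URLDecoder.decode(kv[1], "UTF-8") : "";
                props.put(URLDecoder.decode(kv[0], "UTF-8"), value);
            }
        }
        return props;
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Equivalent to setting bootstrap.servers and receive.buffer.bytes
        // in a Properties object by hand.
        System.out.println(parse("kafka://localhost:9092?receive.buffer.bytes=-2"));
    }
}
```

A failover-style multi-host form like kafka:(kafka://host1,kafka://host2)
would only need one extra step to split the host list before parsing the
query, similar to what qpid-jms does.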


Re: [DISCUSS] KIP-207: Offsets returned by ListOffsetsResponse should be monotonically increasing even during a partition leader change

2017-10-05 Thread Tom Bentley
Hi Colin,

Is it really true that "the period when the offset is unavailable should be
brief"? I'm thinking about a producer with acks=1, so the old leader
returns the ProduceResponse immediately and is then replaced before it can
send a FetchResponse to any followers. The new leader is then waiting for
more messages from producers in order for its high watermark to increase
(because its log doesn't have the original messages in it, so its HW can't
catch up with them). This wait could be arbitrarily long.

I rather suspect this isn't a problem really and that I misunderstand the
precise details of the protocol, but it would be beneficial to me to
discover my misconceptions.

Thanks,

Tom



On 5 October 2017 at 19:23, Colin McCabe  wrote:

> Hi all,
>
> I created a KIP for discussion about fixing a corner case in
> ListOffsetsResponse.  Check it out at:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 207%3A+Offsets+returned+by+ListOffsetsResponse+should+be+
> monotonically+increasing+even+during+a+partition+leader+change
>
> cheers,
> Colin
>
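
For readers following along, the acks=1 scenario Tom describes corresponds
to a producer configured as in this sketch (the broker addresses are
hypothetical):

```
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

public class AcksOneSketch {
    public static Properties config() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092");
        // acks=1: the leader acknowledges as soon as the write hits its own
        // log, before any follower has fetched it. A leader change at that
        // moment is exactly the window described above.
        props.put(ProducerConfig.ACKS_CONFIG, "1");
        return props;
    }
}
```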


[GitHub] kafka pull request #4011: KAFKA-5903: Added Connect metrics to the worker an...

2017-10-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/4011


---


[jira] [Resolved] (KAFKA-5903) Create Connect metrics for workers

2017-10-05 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava resolved KAFKA-5903.
--
Resolution: Fixed

Issue resolved by pull request 4011
[https://github.com/apache/kafka/pull/4011]

> Create Connect metrics for workers
> --
>
> Key: KAFKA-5903
> URL: https://issues.apache.org/jira/browse/KAFKA-5903
> Project: Kafka
>  Issue Type: Sub-task
>  Components: KafkaConnect
>Affects Versions: 0.11.0.0
>Reporter: Randall Hauch
>Assignee: Randall Hauch
>Priority: Blocker
> Fix For: 1.0.0
>
>
> See KAFKA-2376 for parent task and 
> [KIP-196|https://cwiki.apache.org/confluence/display/KAFKA/KIP-196%3A+Add+metrics+to+Kafka+Connect+framework]
>  for the details on the metrics. This subtask is to create the "Worker 
> Metrics".



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[DISCUSS] KIP-207: Offsets returned by ListOffsetsResponse should be monotonically increasing even during a partition leader change

2017-10-05 Thread Colin McCabe
Hi all,

I created a KIP for discussion about fixing a corner case in
ListOffsetsResponse.  Check it out at:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-207%3A+Offsets+returned+by+ListOffsetsResponse+should+be+monotonically+increasing+even+during+a+partition+leader+change

cheers,
Colin


Re: [DISCUSS] URIs on Producer and Consumer

2017-10-05 Thread Colin McCabe
We used URIs as file paths in Hadoop.  I think it was a mistake, for a
few different reasons.

URIs are actually very complex.  You probably know about scheme, host,
and port, but did you know about authority, user-info, query, fragment,
scheme-specific-part?  Do you know what they do in Hadoop?  The mapping
isn't obvious (and it wouldn't be obvious in Kafka either).

When you flip back and forth between URIs and strings (and you
inevitably will do this, when serializing or sending things over the
wire), you run into tons of really hard problems.  Should you preserve
the "fragment" (the thing after the hash mark) for your URI, or not?  It
may not do anything now, but maybe it will do something later.  URIs
also have complex string escaping rules.  Parsing URIs is very messy,
especially when you start talking about non-Java programming languages.

URIs are designed for a world where you talk to a single host over a
single port.  That isn't the world distributed systems live in.  You
don't want your clients to fail to bootstrap because the single server
you specified is having a bad day, even when the other 8 servers are up.

What we want is a list of servers.  But URIs don't give us that.  That's
why in HDFS, we introduced another layer of indirection, so that the
"URI hostname" maps to an entry in a configuration file, which then maps
to a list of hostnames.

Later on, we found out that people wanted a unified namespace.  They
wanted to be able to access /foo without caring whether it was on s3,
the first hdfs cluster, or the second hdfs cluster.  But our use of URIs
for paths had made that impossible.  If the path was on s3, it had to be
accessed via s3://mybucketname/foo.  If it was on the first hdfs
cluster, it had to be accessed by hdfs://myhdfs1name/foo.  And so on. 
We had re-invented the equivalent of DOS drive letters: ugly, clunky
drive letter prefixes that had to chaperone every path name.

The bottom line is that URIs are the wrong abstraction for the job. 
They just don't express what we really want, and they introduce a lot of
complexity and ambiguity.

best,
Colin


On Thu, Oct 5, 2017, at 08:08, Clebert Suconic wrote:
> I can start a KIP discussion on this.. or not if you really think this
> is against basic rules...
> 
> 
> I will need authorization to create the page.. could you grant me access
> regardless, so I have it for next time?
> 
> On Thu, Oct 5, 2017 at 10:31 AM, Clebert Suconic
>  wrote:
> > Just as a facility for users... I think it would be easier to
> > prototype consumers and producers by simply doing new
> > Consumer("tcp://HOST:PORT") or new Producer("tcp://HOST:PORT")...
> >
> > on the other project I work on (ActiveMQ Artemis) we used to do it in a
> > similar way to what Kafka does.. we then provided the URI support, and
> > I now think the URI was a lot easier.
> >
> > I'm just trying to leverage my experience here... I'm an Apache
> > committer at ActiveMQ Artemis.. I think I could bring some goodies
> > into Kafka.. I see no reason to be a competitor.. instead I'm all up
> > to contribute here as well.  And I was looking for something small and
> > easy to start with.
> >
> >
> >
> >
> >
> >
> > On Thu, Oct 5, 2017 at 10:15 AM, Jay Kreps  wrote:
> >> Hey Clebert,
> >>
> >> Is there a motivation for adding a second way? We generally try to avoid
> >> having two ways to do something unless it's really needed...I suspect you
> >> have a reason for wanting this, though.
> >>
> >> -Jay
> >>
> >> On Mon, Oct 2, 2017 at 6:15 AM Clebert Suconic 
> >> wrote:
> >>
> >>> At ActiveMQ and ActiveMQ Artemis, ConnectionFactories have an
> >>> interesting feature where you can pass parameters through a URI.
> >>>
> >>> I was looking at the Producer and Consumer APIs, and these two classes
> >>> are using an approach that I considered old for Artemis, resembling HornetQ:
> >>>
> >>> Instead of passing a Properties (aka HashMaps), users would be able to
> >>> create a Consumer or Producer by simply doing:
> >>>
> >>> new Consumer("tcp::/host:port?properties=values;properties=values...etc");
> >>>
> >>> Example:
> >>>
> >>>
> >>> Instead of the following:
> >>>
> >>> Map config = new HashMap<>();
> >>> config.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:");
> >>> config.put(ConsumerConfig.RECEIVE_BUFFER_CONFIG, -2);
> >>> new KafkaConsumer<>(config, new ByteArrayDeserializer(), new
> >>> ByteArrayDeserializer());
> >>>
> >>>
> >>>
> >>> Someone could do
> >>>
> >>> new KafkaConsumer<>("tcp://localhost:?receive.buffer.bytes=-2",
> >>> new ByteArrayDeserializer(), new ByteArrayDeserializer());
> >>>
> >>>
> >>>
> >>> I don't know if that little API improvement would be welcomed. I would
> >>> be able to send a Pull Request, but I don't want to do it if it
> >>> wouldn't be welcomed in the first place.
> >>>
> >>>
> >>> Just an idea...  let me know if that is welcomed or not.
> >>>
> >>> If so I can forward the discussion into how I would implement it.
> >>>
> >
> >
> >
> > --
> >

[GitHub] kafka-site pull request #89: Back out changes to index

2017-10-05 Thread joel-hamill
Github user joel-hamill closed the pull request at:

https://github.com/apache/kafka-site/pull/89


---


[GitHub] kafka pull request #4017: Rename streams tutorial and quickstart

2017-10-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/4017


---


[jira] [Resolved] (KAFKA-6012) NoSuchElementException in markErrorMeter during TransactionsBounceTest

2017-10-05 Thread Rajini Sivaram (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajini Sivaram resolved KAFKA-6012.
---
   Resolution: Fixed
Fix Version/s: 1.1.0

Issue resolved by pull request 4024
[https://github.com/apache/kafka/pull/4024]

> NoSuchElementException in markErrorMeter during TransactionsBounceTest
> --
>
> Key: KAFKA-6012
> URL: https://issues.apache.org/jira/browse/KAFKA-6012
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ismael Juma
>Assignee: Rajini Sivaram
>Priority: Blocker
> Fix For: 1.1.0, 1.0.0
>
>
> I think this is probably a test issue, but setting as "Blocker" until we can 
> confirm that.
> {code}
> Error
> org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for 
> output-topic-0: 10467 ms has passed since batch creation plus linger time
> Stacktrace
> org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for 
> output-topic-0: 10467 ms has passed since batch creation plus linger time
> Standard Output
> [2017-10-05 00:29:31,327] ERROR ZKShutdownHandler is not registered, so 
> ZooKeeper server won't take any action on ERROR or SHUTDOWN server state 
> changes (org.apache.zookeeper.server.ZooKeeperServer:472)
> [2017-10-05 00:29:31,877] ERROR [ReplicaFetcher replicaId=0, leaderId=1, 
> fetcherId=0] Error for partition input-topic-0 to broker 
> %1:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. 
> (kafka.server.ReplicaFetcherThread:101)
> [2017-10-05 00:29:31,877] ERROR [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Error for partition input-topic-0 to broker 
> %1:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. 
> (kafka.server.ReplicaFetcherThread:101)
> [2017-10-05 00:29:31,877] ERROR [ReplicaFetcher replicaId=0, leaderId=2, 
> fetcherId=0] Error for partition input-topic-1 to broker 
> %2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. 
> (kafka.server.ReplicaFetcherThread:101)
> [2017-10-05 00:29:32,268] ERROR [ReplicaFetcher replicaId=0, leaderId=1, 
> fetcherId=0] Error for partition output-topic-1 to broker 
> %1:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. 
> (kafka.server.ReplicaFetcherThread:101)
> [2017-10-05 00:29:32,284] ERROR [ReplicaFetcher replicaId=2, leaderId=1, 
> fetcherId=0] Error for partition output-topic-1 to broker 
> %1:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. 
> (kafka.server.ReplicaFetcherThread:101)
> [2017-10-05 00:29:44,283] ERROR [KafkaApi-0] Error when handling request 
> {controller_id=0,controller_epoch=1,delete_partitions=false,partitions=[{topic=input-topic,partition=1}]}
>  (kafka.server.KafkaApis:107)
> java.util.NoSuchElementException: key not found: NONE
>   at scala.collection.MapLike$class.default(MapLike.scala:228)
>   at scala.collection.AbstractMap.default(Map.scala:59)
>   at scala.collection.mutable.HashMap.apply(HashMap.scala:65)
>   at kafka.network.RequestMetrics.markErrorMeter(RequestChannel.scala:410)
>   at 
> kafka.network.RequestChannel$$anonfun$updateErrorMetrics$1.apply(RequestChannel.scala:315)
>   at 
> kafka.network.RequestChannel$$anonfun$updateErrorMetrics$1.apply(RequestChannel.scala:314)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at 
> kafka.network.RequestChannel.updateErrorMetrics(RequestChannel.scala:314)
>   at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$sendResponse$1.apply(KafkaApis.scala:2092)
>   at 
> kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$sendResponse$1.apply(KafkaApis.scala:2092)
>   at scala.Option.foreach(Option.scala:257)
>   at 
> kafka.server.KafkaApis.kafka$server$KafkaApis$$sendResponse(KafkaApis.scala:2092)
>   at 
> kafka.server.KafkaApis.sendResponseExemptThrottle(KafkaApis.scala:2061)
>   at kafka.server.KafkaApis.handleStopReplicaRequest(KafkaApis.scala:202)
>   at kafka.server.KafkaApis.handle(KafkaApis.scala:104)
>   at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:65)
> {code}
> https://builds.apache.org/blue/organizations/jenkins/kafka-trunk-jdk8/detail/kafka-trunk-jdk8/2106/tests



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] kafka pull request #4024: KAFKA-6012: Close request metrics only after closi...

2017-10-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/4024


---


[GitHub] kafka pull request #4025: KAFKA-5989: resume consumption of tasks that have ...

2017-10-05 Thread dguy
GitHub user dguy opened a pull request:

https://github.com/apache/kafka/pull/4025

KAFKA-5989: resume consumption of tasks that have state stores but no 
changelogging

Stores where logging is disabled were never consumed, as the partitions
were paused but never resumed.
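
For context, a store with changelogging disabled (the case this fix
targets) is declared along the following lines; a sketch with hypothetical
topic and store names, using the 1.0 Materialized API:

```
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

public class LoggingDisabledSketch {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        // A table backed by a store with changelogging turned off. The
        // restore path pauses the store's partitions; the bug was that,
        // with logging disabled, they were never resumed afterwards.
        KTable<String, String> table = builder.table(
                "input-topic",
                Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("my-store")
                        .withKeySerde(Serdes.String())
                        .withValueSerde(Serdes.String())
                        .withLoggingDisabled());
    }
}
```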

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dguy/kafka 1.0

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/4025.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4025


commit 387cc7adb77bc3439cd2860870269a28a78c3c3d
Author: Damian Guy 
Date:   2017-10-05T15:28:35Z

applying patch from trunk




---


[GitHub] kafka pull request #4002: KAFKA-5989: resume consumption of tasks that have ...

2017-10-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/4002


---


Re: [DISCUSS] URIs on Producer and Consumer

2017-10-05 Thread Clebert Suconic
I can start a KIP discussion on this.. or not if you really think this
is against basic rules...


I will need authorization to create the page.. could you grant me access
regardless, so I have it for next time?

On Thu, Oct 5, 2017 at 10:31 AM, Clebert Suconic
 wrote:
> Just as a facility for users... I think it would be easier to
> prototype consumers and producers by simply doing new
> Consumer("tcp://HOST:PORT") or new Producer("tcp://HOST:PORT")...
>
> on the other project I work on (ActiveMQ Artemis) we used to do it in a
> similar way to what Kafka does.. we then provided the URI support, and
> I now think the URI was a lot easier.
>
> I'm just trying to leverage my experience here... I'm an Apache
> committer at ActiveMQ Artemis.. I think I could bring some goodies
> into Kafka.. I see no reason to be a competitor.. instead I'm all up
> to contribute here as well.  And I was looking for something small and
> easy to start with.
>
>
>
>
>
>
> On Thu, Oct 5, 2017 at 10:15 AM, Jay Kreps  wrote:
>> Hey Clebert,
>>
>> Is there a motivation for adding a second way? We generally try to avoid
>> having two ways to do something unless it's really needed...I suspect you
>> have a reason for wanting this, though.
>>
>> -Jay
>>
>> On Mon, Oct 2, 2017 at 6:15 AM Clebert Suconic 
>> wrote:
>>
>>> At ActiveMQ and ActiveMQ Artemis, ConnectionFactories have an
>>> interesting feature where you can pass parameters through a URI.
>>>
>>> I was looking at the Producer and Consumer APIs, and these two classes
>>> are using an approach that I considered old for Artemis, resembling HornetQ:
>>>
>>> Instead of passing a Properties (aka HashMaps), users would be able to
>>> create a Consumer or Producer by simply doing:
>>>
>>> new Consumer("tcp::/host:port?properties=values;properties=values...etc");
>>>
>>> Example:
>>>
>>>
>>> Instead of the following:
>>>
>>> Map config = new HashMap<>();
>>> config.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:");
>>> config.put(ConsumerConfig.RECEIVE_BUFFER_CONFIG, -2);
>>> new KafkaConsumer<>(config, new ByteArrayDeserializer(), new
>>> ByteArrayDeserializer());
>>>
>>>
>>>
>>> Someone could do
>>>
>>> new KafkaConsumer<>("tcp://localhost:?receive.buffer.bytes=-2",
>>> new ByteArrayDeserializer(), new ByteArrayDeserializer());
>>>
>>>
>>>
>>> I don't know if that little API improvement would be welcomed. I would
>>> be able to send a Pull Request, but I don't want to do it if it
>>> wouldn't be welcomed in the first place.
>>>
>>>
>>> Just an idea...  let me know if that is welcomed or not.
>>>
>>> If so, I can move the discussion forward into how I would implement it.
>>>
>
>
>
> --
> Clebert Suconic



-- 
Clebert Suconic


[jira] [Resolved] (KAFKA-5978) Transient failure in SslTransportLayerTest.testNetworkThreadTimeRecorded

2017-10-05 Thread Rajini Sivaram (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajini Sivaram resolved KAFKA-5978.
---
Resolution: Fixed
  Assignee: Rajini Sivaram

This could be because the times are too small; increased the message size
in KAFKA-6010 to avoid the failure.

> Transient failure in SslTransportLayerTest.testNetworkThreadTimeRecorded
> 
>
> Key: KAFKA-5978
> URL: https://issues.apache.org/jira/browse/KAFKA-5978
> Project: Kafka
>  Issue Type: Bug
>Reporter: Rajini Sivaram
>Assignee: Rajini Sivaram
>
> Stack trace:
> {quote}
> java.lang.AssertionError: Send time not recorded
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at 
> org.apache.kafka.common.network.SslTransportLayerTest.testNetworkThreadTimeRecorded(SslTransportLayerTest.java:602)
> {quote}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: [DISCUSS] URIs on Producer and Consumer

2017-10-05 Thread Clebert Suconic
Just as a facility for users... I think it would be easier to
prototype consumers and producers by simply doing new
Consumer("tcp://HOST:PORT") or new Producer("tcp://HOST:PORT")...

On the other project I work on (ActiveMQ Artemis) we used to do it in a
similar way to what Kafka does... we then provided the URI support and I
now think the URI was a lot easier.

I'm just trying to bring my experience in here... I'm an Apache
committer at ActiveMQ Artemis... I think I could bring some goodies
into Kafka... I see no reason to be a competitor... instead I'm all up
for contributing here as well. And I was looking for something small and
easy to start with.






On Thu, Oct 5, 2017 at 10:15 AM, Jay Kreps  wrote:
> Hey Clebert,
>
> Is there a motivation for adding a second way? We generally try to avoid
> having two ways to do something unless it's really needed...I suspect you
> have a reason for wanting this, though.
>
> -Jay
>
> On Mon, Oct 2, 2017 at 6:15 AM Clebert Suconic 
> wrote:
>
>> At ActiveMQ and ActiveMQ Artemis, ConnectionFactories have an
>> interesting feature where you can pass parameters through a URI.
>>
>> I was looking at Producer and Consumer APIs, and these two classes are
>> using a configuration style that I considered dated for Artemis, resembling HornetQ:
>>
>> Instead of passing a Properties (aka HashMaps), users would be able to
>> create a Consumer or Producer by simply doing:
>>
>> new Consumer("tcp::/host:port?properties=values;properties=values...etc");
>>
>> Example:
>>
>>
>> Instead of the following:
>>
>> Map<String, Object> config = new HashMap<>();
>> config.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:");
>> config.put(ConsumerConfig.RECEIVE_BUFFER_CONFIG, -2);
>> new KafkaConsumer<>(config, new ByteArrayDeserializer(), new
>> ByteArrayDeserializer());
>>
>>
>>
>> Someone could do
>>
>> new KafkaConsumer<>("tcp://localhost:?receive.buffer.bytes=-2",
>> new ByteArrayDeserializer(), new ByteArrayDeserializer());
>>
>>
>>
>> I don't know if that little API improvement would be welcomed. I would be
>> able to send a Pull Request, but I don't want to do it if it wouldn't
>> be welcomed in the first place.
>>
>>
>> Just an idea...  let me know if that is welcomed or not.
>>
>> If so I can forward the discussion into how I would implement it.
>>



-- 
Clebert Suconic


[GitHub] kafka pull request #4024: KAFKA-6012: Close request metrics only after closi...

2017-10-05 Thread rajinisivaram
GitHub user rajinisivaram opened a pull request:

https://github.com/apache/kafka/pull/4024

KAFKA-6012: Close request metrics only after closing request handlers



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rajinisivaram/kafka KAFKA-6012-error-metric

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/4024.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4024


commit 4a891452343d6176d48dcb7cab180ae2a650dc81
Author: Rajini Sivaram 
Date:   2017-10-05T14:19:10Z

KAFKA-6012: Close request metrics only after closing request handlers




---


Re: [DISCUSS] URIs on Producer and Consumer

2017-10-05 Thread Jay Kreps
Hey Clebert,

Is there a motivation for adding a second way? We generally try to avoid
having two ways to do something unless it's really needed...I suspect you
have a reason for wanting this, though.

-Jay

On Mon, Oct 2, 2017 at 6:15 AM Clebert Suconic 
wrote:

> At ActiveMQ and ActiveMQ Artemis, ConnectionFactories have an
> interesting feature where you can pass parameters through a URI.
>
> I was looking at Producer and Consumer APIs, and these two classes are
> using a configuration style that I considered dated for Artemis, resembling HornetQ:
>
> Instead of passing a Properties (aka HashMaps), users would be able to
> create a Consumer or Producer by simply doing:
>
> new Consumer("tcp::/host:port?properties=values;properties=values...etc");
>
> Example:
>
>
> Instead of the following:
>
> Map<String, Object> config = new HashMap<>();
> config.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:");
> config.put(ConsumerConfig.RECEIVE_BUFFER_CONFIG, -2);
> new KafkaConsumer<>(config, new ByteArrayDeserializer(), new
> ByteArrayDeserializer());
>
>
>
> Someone could do
>
> new KafkaConsumer<>("tcp://localhost:?receive.buffer.bytes=-2",
> new ByteArrayDeserializer(), new ByteArrayDeserializer());
>
>
>
> I don't know if that little API improvement would be welcomed. I would be
> able to send a Pull Request, but I don't want to do it if it wouldn't
> be welcomed in the first place.
>
>
> Just an idea...  let me know if that is welcomed or not.
>
> If so I can forward the discussion into how I would implement it.
>


[GitHub] kafka pull request #4023: KAFKA-5829: Only delete producer snapshots before ...

2017-10-05 Thread ijuma
GitHub user ijuma opened a pull request:

https://github.com/apache/kafka/pull/4023

KAFKA-5829: Only delete producer snapshots before the recovery point



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ijuma/kafka 
kafka-5829-avoid-reading-older-segments-on-hard-shutdown

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/4023.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4023


commit 2a98d68a85f97507423cc50cd002e2db27a47c60
Author: Ismael Juma 
Date:   2017-10-05T13:26:04Z

Several clean ups in Log, LogManager, etc.

commit 291c73e54b808d0c0af57c08107c3c8f357e8188
Author: Ismael Juma 
Date:   2017-10-05T13:57:07Z

KAFKA-5829: Only delete producer snapshots before the recovery point




---


Re: [DISCUSS] KIP-201: Rationalising Policy interfaces

2017-10-05 Thread Tom Bentley
I'd like to raise a somewhat subtle point about how the proposed API should
behave.

The current CreateTopicPolicy gets passed either the requested partition
count and replication factor, or the requested assignment. So if the
request specified a partition count and replication factor, the policy
sees a null replicasAssignments(). Likewise, if the request specified a
replica assignment, the policy gets back null from numPartitions() and
replicationFactor().

These semantics mean the policy can't inspect an assignment that happened
to be auto-generated (indeed, it is obvious to the policy when an
assignment was auto-generated, precisely because it can't see such
assignments), though it can reject a request because the assignment was
auto-generated, or vice versa.

Retaining these semantics makes the TopicState less symmetric between its
use in requestedState() and the current state available from the
ClusterState, and also less symmetric between its use for createTopic() and
for alterTopic(). This can make it harder to write a policy. For example,
if I want the policy "the number of partitions must be < 100" and
requestedState().numPartitions() can be null, I need to cope with that and
figure the count out by inspecting the replicasAssignments(). It would be
much better for the policy writer to just be able to write:

if (request.requestedState().numPartitions() >= 100)
    throw new PolicyViolationException("#partitions must be < 100");
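
For contrast, here is a minimal sketch of what the current API forces that
policy to look like, written against the existing CreateTopicPolicy and
PolicyViolationException classes (the null handling is the point):

import java.util.List;
import java.util.Map;
import org.apache.kafka.common.errors.PolicyViolationException;
import org.apache.kafka.server.policy.CreateTopicPolicy;

public class MaxPartitionsCreatePolicy implements CreateTopicPolicy {
    @Override
    public void validate(RequestMetadata request) {
        Integer numPartitions = request.numPartitions();
        if (numPartitions == null) {
            // The request carried an explicit assignment, so the partition
            // count has to be recovered from the assignment map.
            Map<Integer, List<Integer>> assignments = request.replicasAssignments();
            numPartitions = assignments == null ? 0 : assignments.size();
        }
        if (numPartitions >= 100)
            throw new PolicyViolationException("#partitions must be < 100");
    }

    @Override
    public void configure(Map<String, ?> configs) {}

    @Override
    public void close() {}
}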

An alternative would be to keep the symmetry (and thus
TopicState.replicasAssignments() would never return null, and
TopicState.numPartitions() and TopicState.replicationFactor() could each be
primitives), but expose the auto-generatedness of the replicasAssignments()
explicitly, perhaps by using a subtype of TopicState for the return type of
requestedState():

interface RequestedTopicState extends TopicState {
    /**
     * True if the {@link TopicState#replicasAssignments()}
     * in this request were generated by the broker, false if
     * they were explicitly requested by the client.
     */
    boolean generatedReplicaAssignments();
}

Thoughts?
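
For comparison, the same policy against the proposed shape, with
TopicState/RequestedTopicState as stand-ins for the KIP-201 types (they are
not existing Kafka classes):

import org.apache.kafka.common.errors.PolicyViolationException;

public class MaxPartitionsPolicySketch {

    interface TopicState {
        int numPartitions(); // primitive: never null, even for generated assignments
    }

    interface RequestedTopicState extends TopicState {
        /** True if the replica assignments were generated by the broker. */
        boolean generatedReplicaAssignments();
    }

    static void validate(RequestedTopicState requested) {
        // No null handling needed: the partition count is always populated,
        // and auto-generatedness is an explicit, separate flag.
        if (requested.numPartitions() >= 100)
            throw new PolicyViolationException("#partitions must be < 100");
    }
}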

On 4 October 2017 at 11:06, Tom Bentley  wrote:

> Good point. Then I guess I can do those items too. I would also need to do
> the same changes for DeleteRecordsRequest and Response.
>
> On 4 October 2017 at 10:37, Ismael Juma  wrote:
>
>> Those two points are related to policies in the following sense:
>>
>> 1. A policy that can't send errors to clients is much less useful
>> 2. Testing policies is much easier with `validateOnly`
>>
>> Ismael
>>
>> On Wed, Oct 4, 2017 at 9:20 AM, Tom Bentley 
>> wrote:
>>
>> > Thanks Edoardo,
>> >
>> > I've added that motivation to the KIP.
>> >
>> > KIP-201 doesn't address two points raised in KIP-170: adding a
>> > validationOnly flag to DeleteTopicRequest and adding an error message
>> > to DeleteTopicResponse. Since those are not policy-related I think
>> > they're best left out of KIP-201. I suppose it is up to you and Mickael
>> > whether to narrow the scope of KIP-170 to address those points.
>> >
>> > Thanks again,
>> >
>> > Tom
>> >
>> > On 4 October 2017 at 08:20, Edoardo Comar  wrote:
>> >
>> > > Thanks Tom,
>> > > looks good to me and KIP-201 could supersede KIP-170
>> > > but could you please add a missing motivation bullet that was behind
>> > > KIP-170:
>> > >
>> > > introducing ClusterState to allow validation of create/alter topic
>> > > requests not just against the request metadata but also against the
>> > > current amount of resources already used in the cluster (e.g. the
>> > > number of partitions).
>> > >
>> > > thanks
>> > > Edo
>> > > --
>> > >
>> > > Edoardo Comar
>> > >
>> > > IBM Message Hub
>> > >
>> > > IBM UK Ltd, Hursley Park, SO21 2JN
>> > >
>> > >
>> > >
>> > > From:   Tom Bentley 
>> > > To: dev@kafka.apache.org
>> > > Date:   02/10/2017 15:15
>> > > Subject:Re: [DISCUSS] KIP-201: Rationalising Policy interfaces
>> > >
>> > >
>> > >
>> > > Hi All,
>> > >
>> > > I've updated KIP-201 again so there is now a single policy interface
>> > > (and thus a single key by which to configure it) for topic creation,
>> > > modification, deletion and record deletion, which each have their own
>> > > validation method.
>> > >
>> > > There are still a few loose ends:
>> > >
>> > > 1. I currently propose validateAlterTopic(), but it would be possible
>> > > to be more fine-grained about this: validateAlterConfig(),
>> > > validAddPartitions() and validateReassignPartitions(), for example.
>> > > Obviously this results in a policy method per operation, and makes it
>> > > clearer what is being changed. I guess the downside is it's more work
>> > > for the implementer, and potentially makes it harder to change the
>> > > interface in the future.
>> > >
>> > > 2. A couple of TODOs about what the TopicState interface should return
>> > > when
>> 

[jira] [Resolved] (KAFKA-5877) Controller should only update reassignment znode if there is change in the reassignment data

2017-10-05 Thread Dong Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin resolved KAFKA-5877.
-
Resolution: Fixed

> Controller should only update reassignment znode if there is change in the 
> reassignment data
> 
>
> Key: KAFKA-5877
> URL: https://issues.apache.org/jira/browse/KAFKA-5877
> Project: Kafka
>  Issue Type: Bug
>Reporter: Dong Lin
>Assignee: Dong Lin
> Fix For: 1.1.0
>
>
> I encountered a scenario where the controller keeps printing the following
> stack trace repeatedly for a finite set of partitions. Although I have not
> fully figured out the cause of this event, it seems that the controller
> will update the reassignment znode even if the new data is the same as the
> existing data. This patch optimizes the controller behavior by only
> updating the reassignment znode if the data actually needs to change.
> 2017/09/12 20:34:05.842 [KafkaController] [Controller 1376005]: Error 
> completing reassignment of partition [FederatorResultEvent,202]
> kafka.common.KafkaException: Partition [FederatorResultEvent,202] to be 
> reassigned is already assigned to replicas 1367001,1384010,1386010. Ignoring 
> request for partition reassignment
> at 
> kafka.controller.KafkaController.initiateReassignReplicasForTopicPartition(KafkaController.scala:608)
>  ~[kafka_2.10-0.11.0.9.jar:?]
> at 
> kafka.controller.KafkaController$PartitionReassignment$$anonfun$process$14.apply(KafkaController.scala:1327)
>  ~[kafka_2.10-0.11.0.9.jar:?]
> at 
> kafka.controller.KafkaController$PartitionReassignment$$anonfun$process$14.apply(KafkaController.scala:1320)
>  ~[kafka_2.10-0.11.0.9.jar:?]
> at 
> scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:224) 
> ~[scala-library-2.10.4.jar:?]
> at 
> scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403) 
> ~[scala-library-2.10.4.jar:?]
> at 
> scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403) 
> ~[scala-library-2.10.4.jar:?]
> at 
> kafka.controller.KafkaController$PartitionReassignment.process(KafkaController.scala:1320)
>  ~[kafka_2.10-0.11.0.9.jar:?]
> at 
> kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply$mcV$sp(ControllerEventManager.scala:53)
>  ~[kafka_2.10-0.11.0.9.jar:?]
> at 
> kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:53)
>  ~[kafka_2.10-0.11.0.9.jar:?]
> at 
> kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:53)
>  ~[kafka_2.10-0.11.0.9.jar:?]
> at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31) 
> ~[kafka_2.10-0.11.0.9.jar:?]
> at 
> kafka.controller.ControllerEventManager$ControllerEventThread.doWork(ControllerEventManager.scala:52)
>  ~[kafka_2.10-0.11.0.9.jar:?]
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:64) 
> ~[kafka_2.10-0.11.0.9.jar:?]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
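
The fix described above boils down to a compare-before-write guard. A minimal
self-contained sketch of that pattern follows; the in-memory map stands in
for the reassignment znode, and the names are illustrative rather than the
real controller API (which is Scala and writes to ZooKeeper):

import java.util.List;
import java.util.Map;
import java.util.Objects;

public class IdempotentZnodeUpdateSketch {
    private Map<String, List<Integer>> storedReassignment; // znode contents

    // Returns true only if a write was actually issued.
    public boolean maybeUpdate(Map<String, List<Integer>> newReassignment) {
        if (Objects.equals(storedReassignment, newReassignment))
            return false; // identical payload: skip the write entirely
        storedReassignment = newReassignment; // stands in for zk.setData(...)
        return true;
    }

    public static void main(String[] args) {
        IdempotentZnodeUpdateSketch znode = new IdempotentZnodeUpdateSketch();
        // Map.of/List.of require Java 9+
        Map<String, List<Integer>> assignment = Map.of("topic-0", List.of(1, 2, 3));
        System.out.println(znode.maybeUpdate(assignment)); // true: first write
        System.out.println(znode.maybeUpdate(assignment)); // false: unchanged
    }
}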


[GitHub] kafka pull request #3839: KAFKA-5877; Controller should only update reassign...

2017-10-05 Thread lindong28
Github user lindong28 closed the pull request at:

https://github.com/apache/kafka/pull/3839


---